# Text2Img

A rough implementation of generating image embeddings using the methodology introduced in LLaVA.
### Structure

Image embeddings are produced by running images through a CLIP vision encoder and mapping the resulting features into the language model's embedding space with the pretrained LLaVA projection layer weights.
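A minimal sketch of that pipeline, not the repo's actual `image_embedder.py`. It assumes the CLIP ViT-L/14-336 vision tower, a 2-layer MLP projector (1024 -> 4096 -> 4096) stored as `mm_projector.bin` with `model.mm_projector.*` keys (as in the LLaVA-v1.5 release), and LLaVA's default feature selection; the exact file layout and key names may differ:

```python
# Sketch only. Assumptions: CLIP ViT-L/14-336 as the vision tower, a 2-layer MLP
# projector stored in mm_projector.bin with "model.mm_projector.*" keys, and LLaVA's
# default feature selection (penultimate layer, CLS token dropped).
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

CLIP_NAME = "openai/clip-vit-large-patch14-336"
PROJECTOR = "models/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin"

vision_tower = CLIPVisionModel.from_pretrained(CLIP_NAME).eval()
processor = CLIPImageProcessor.from_pretrained(CLIP_NAME)

# "mlp2x" projector: Linear -> GELU -> Linear.
projector = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 4096)).eval()
state = torch.load(PROJECTOR, map_location="cpu")
projector.load_state_dict({k.split("mm_projector.")[-1]: v for k, v in state.items()})

image = Image.open("example.jpg").convert("RGB")
pixels = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    hidden = vision_tower(pixels, output_hidden_states=True).hidden_states[-2]
    patch_features = hidden[:, 1:]               # drop the CLS token -> (1, 576, 1024)
    image_embedding = projector(patch_features)  # (1, 576, 4096) in the LLM embedding space
```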
### Prerequisites

1. Install the dependencies listed in `requirements.txt` (e.g. `pip install -r requirements.txt`).
2. Make sure you have [llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5](https://huggingface.co/liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/tree/main) under your **models** folder (one way to download it is sketched after this list).
3. For example image data, I used the COCO [2017 Val images (5K, 1 GB)](http://images.cocodataset.org/zips/val2017.zip) and [2017 Train/Val annotations (241 MB)](http://images.cocodataset.org/annotations/annotations_trainval2017.zip).
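One possible way to fetch the checkpoint, assuming `huggingface_hub` is available; the target folder name is a guess, so adjust it to match your **models** layout:

```python
# Download the LLaVA pretrain checkpoint into ./models (hypothetical folder name).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5",
    local_dir="models/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5",
)
```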
### Usage

For `image_embedder.py`:

1. Embed a single image (Print Only):

   ```
   python -m embed.image_embedder --image "C:\path\img.jpg" --no-save
   ```

2. Embed a single image (Save to File):

   ```
   python -m embed.image_embedder --image "C:\path\to\image.jpg" --out "C:\project\embeddings\image_embeddings.pkl"
   ```

3. Embed all images in a folder:

   ```
   python -m embed.image_embedder --folder "C:\path\to\images" --out "C:\project\embeddings\image_embeddings.pkl" --batch-size 32
   ```
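The `--out` runs above write a pickle file. Its exact layout is whatever `image_embedder.py` chooses to store, so the snippet below is only a generic way to peek at the result (the path matches the example commands):

```python
# Generic inspection of a saved embeddings pickle; adjust once you know the actual
# structure written by image_embedder.py (dict vs. list, tensors vs. arrays, etc.).
import pickle

with open(r"C:\project\embeddings\image_embeddings.pkl", "rb") as f:
    data = pickle.load(f)

print(type(data))
if isinstance(data, dict):
    for key, value in list(data.items())[:3]:
        print(key, getattr(value, "shape", value))
```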
For `text_embedder.py`:

1. Embed a Single Article (Print Only):

   ```
   python -m embed.text_embedder --text "This is my single-article input string."
   ```

2. Embed a Single Article (Save to File):

   ```
   python -m embed.text_embedder --text "This is my single-article input string." --out "C:\project\embeddings\text_embeddings.pkl"
   ```

3. Embed multiple articles from a file (one per line):

   ```
   python -m embed.text_embedder --file "C:\path\to\articles.txt" --out "C:\project\embeddings\text_embeddings.pkl" --batch-size 8
   ```
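Once both embedders have written their pickles, a typical next step is text-to-image retrieval by cosine similarity. The following is only a sketch: it assumes each pickle maps a name to a single 1-D NumPy vector and that both embedders project into a shared space, which depends on how the two scripts pool their outputs:

```python
# Hypothetical retrieval check over the two saved pickles; assumes {name: 1-D numpy vector}
# dicts in a shared embedding space.
import pickle
import numpy as np

with open(r"C:\project\embeddings\text_embeddings.pkl", "rb") as f:
    text_embs = pickle.load(f)
with open(r"C:\project\embeddings\image_embeddings.pkl", "rb") as f:
    image_embs = pickle.load(f)

# Use the first text embedding as the query and L2-normalize it.
query = next(iter(text_embs.values()))
query = query / np.linalg.norm(query)

# Cosine similarity against every image embedding, highest first.
scores = {name: float((vec / np.linalg.norm(vec)) @ query) for name, vec in image_embs.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{score:.3f}  {name}")
```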