# Text2Img

A rough implementation of generating image embeddings using the methodology introduced in LLaVA.
## Structure
We derive image embeddings with a CLIP vision encoder and map them into the language model's embedding space using the pretrained LLaVA projection layer's weights.
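As a rough illustration, the pipeline boils down to encoding the image with CLIP and pushing the patch features through a two-layer MLP. The following is a minimal sketch, not this repo's code: the checkpoint filename, state-dict key prefix, and layer sizes are assumptions (LLaVA-1.5 pairs CLIP ViT-L/14-336 with a 1024 -> 4096 -> 4096 "mlp2x" projector), and LLaVA itself selects features from a penultimate CLIP layer rather than the final hidden state used here for brevity.

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

# CLIP ViT-L/14 at 336px is the vision tower LLaVA-1.5 is built on.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336").eval()

# "mlp2x" projector: Linear -> GELU -> Linear, mapping CLIP's 1024-d patch
# features into Vicuna-7B's 4096-d embedding space.
projector = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 4096)).eval()

# Hypothetical checkpoint filename and key prefix; adjust to the actual file.
state = torch.load(
    "models/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin",
    map_location="cpu",
)
projector.load_state_dict({k.replace("model.mm_projector.", ""): v for k, v in state.items()})

image = Image.open("img.jpg").convert("RGB")
pixels = processor(images=image, return_tensors="pt").pixel_values
with torch.no_grad():
    patches = encoder(pixels).last_hidden_state[:, 1:]  # drop the CLS token
    embedding = projector(patches)  # shape (1, 576, 4096)
print(embedding.shape)
```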
## Prerequisites
- Install the dependencies: `pip install -r requirements.txt`
- Make sure `llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5` is under your `models` folder.
- For example image data, I used the COCO 2017 Val images (5K images / 1 GB) and the 2017 Train/Val annotations (241 MB).
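For reference, one plausible project layout consistent with the commands below (assumed, not taken from the repo):

```
.
├── requirements.txt
├── models/
│   └── llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/
└── embed/
    ├── image_embedder.py
    └── text_embedder.py
```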
## Usage
For `image_embedder.py`:
- Embed a single image (print only):

  ```
  python -m embed.image_embedder --image "C:\path\img.jpg" --no-save
  ```

- Embed a single image (save to file):

  ```
  python -m embed.image_embedder --image "C:\path\to\image.jpg" --out "C:\project\embeddings\image_embeddings.pkl"
  ```

- Embed a folder of images:

  ```
  python -m embed.image_embedder --folder "C:\path\to\images" --out "C:\project\embeddings\image_embeddings.pkl" --batch-size 32
  ```
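To sanity-check the output, the saved pickle can be inspected directly. A minimal sketch, assuming the file holds a dict mapping image paths to embedding arrays (the actual schema depends on `image_embedder.py`):

```python
import pickle

# Assumed schema: {image_path: embedding_array}; verify against image_embedder.py.
with open(r"C:\project\embeddings\image_embeddings.pkl", "rb") as f:
    embeddings = pickle.load(f)

for name, vec in list(embeddings.items())[:3]:
    print(name, getattr(vec, "shape", None))
```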
For `text_embedder.py`:
- Embed a single article (print only):

  ```
  python -m embed.text_embedder --text "This is my single-article input string."
  ```

- Embed a single article (save to file):

  ```
  python -m embed.text_embedder --text "This is my single-article input string." --out "C:\project\embeddings\text_embeddings.pkl"
  ```

- Embed multiple articles from a file (one per line):

  ```
  python -m embed.text_embedder --file "C:\path\to\articles.txt" --out "C:\project\embeddings\text_embeddings.pkl" --batch-size 8
  ```
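Since the goal is text-to-image matching, the two pickles can then be scored against each other with cosine similarity. A hypothetical sketch, assuming both files map keys to embeddings in the same projected space (multi-token image embeddings are mean-pooled to a single vector):

```python
import pickle

import numpy as np

def pool(v):
    # Mean-pool multi-token embeddings (e.g. per-patch features) to 1-D.
    v = np.asarray(v, dtype=np.float32)
    return v if v.ndim == 1 else v.reshape(-1, v.shape[-1]).mean(axis=0)

def cosine(a, b):
    a, b = pool(a), pool(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

with open(r"C:\project\embeddings\text_embeddings.pkl", "rb") as f:
    texts = pickle.load(f)
with open(r"C:\project\embeddings\image_embeddings.pkl", "rb") as f:
    images = pickle.load(f)

query, qvec = next(iter(texts.items()))  # first article as the query
ranked = sorted(images, key=lambda k: cosine(qvec, images[k]), reverse=True)
print(query, "->", ranked[:5])
```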