text2img/readme.md
2025-06-18 08:12:02 +00:00

# Text2Img
A rough implementation of image-embedding generation using the methods introduced in LLaVA.
### Structure
We derive the image embeddings by encoding each image with a CLIP encoder and mapping the resulting features through LLaVA's pretrained projection layer.
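
The mapping step can be illustrated with a small sketch. It assumes the projection is a two-layer MLP with a GELU activation in between, which is what the `mlp2x` in the checkpoint name suggests. The dimensions, random weights, and helper names (`gelu`, `linear`, `project_patches`, `random_matrix`) are toy placeholders chosen for illustration and are not taken from this repository. A real run would load the pretrained weights and use tensor operations instead of Python lists.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(vec, weight, bias):
    """Compute weight @ vec + bias, with weight stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def project_patches(patches, w1, b1, w2, b2):
    """Apply Linear -> GELU -> Linear to every patch feature independently."""
    projected = []
    for patch in patches:
        hidden = [gelu(h) for h in linear(patch, w1, b1)]
        projected.append(linear(hidden, w2, b2))
    return projected


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumption for illustration only; the real model is far larger).
CLIP_DIM, LLM_DIM, NUM_PATCHES = 4, 6, 3
rng = random.Random(0)
w1, b1 = random_matrix(LLM_DIM, CLIP_DIM, rng), [0.0] * LLM_DIM
w2, b2 = random_matrix(LLM_DIM, LLM_DIM, rng), [0.0] * LLM_DIM

# Stand-in for the CLIP encoder output: one feature vector per image patch.
patches = random_matrix(NUM_PATCHES, CLIP_DIM, rng)
image_embeddings = project_patches(patches, w1, b1, w2, b2)
print(len(image_embeddings), len(image_embeddings[0]))  # 3 6
```

Each patch feature is projected on its own, so the output keeps one embedding per patch and only the feature width changes.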
### Prerequisites
1. Install the dependencies listed in `requirements.txt` (for example, `pip install -r requirements.txt`).
2. Make sure you have [llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5](https://huggingface.co/liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/tree/main) under your **models** folder.
3. As example image data, I used [2017 Val images 5K/1GB](http://images.cocodataset.org/zips/val2017.zip) and [2017 Train/Val annotations 241MB](http://images.cocodataset.org/annotations/annotations_trainval2017.zip).
### Usage
For image_embedder.py:
1. Embed a single image (Print Only):
   `python -m embed.image_embedder --image "C:\path\img.jpg" --no-save`
2. Embed a single image (Save to File):
   `python -m embed.image_embedder --image "C:\path\to\image.jpg" --out "C:\project\embeddings\image_embeddings.pkl"`
3. Embed a single folder of images:
   `python -m embed.image_embedder --folder "C:\path\to\images" --out "C:\project\embeddings\image_embeddings.pkl" --batch-size 32`
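
To check what a saved `.pkl` file holds, a sketch like the following may help. The layout of the saved object is not documented here, so the helper (`describe_embeddings`, a hypothetical name) only reports the top-level type and length. The demo writes a made-up stand-in dictionary to a temporary file rather than reading real output from `image_embedder.py`.

```python
import pickle
import tempfile
from pathlib import Path


def describe_embeddings(path):
    """Load a pickle file and return (type name, length or None) of its top-level object."""
    with Path(path).open("rb") as fh:
        obj = pickle.load(fh)  # only unpickle files you created or trust
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "image_embeddings.pkl"
    # Hypothetical stand-in content; the real file layout may differ.
    demo_path.write_bytes(pickle.dumps({"img.jpg": [0.1, 0.2, 0.3]}))
    kind, size = describe_embeddings(demo_path)
    print(kind, size)  # dict 1
```

To inspect actual output, point `describe_embeddings` at the path passed to `--out`.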
For text_embedder.py:
1. Embed a single article (Print Only):
   `python -m embed.text_embedder --text "This is my single-article input string."`
2. Embed a single article (Save to File):
   `python -m embed.text_embedder --text "This is my single-article input string." --out "C:\project\embeddings\text_embeddings.pkl"`
3. Embed multiple articles from a file (one per line):
   `python -m embed.text_embedder --file "C:\path\to\articles.txt" --out "C:\project\embeddings\text_embeddings.pkl" --batch-size 8`
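
The "one per line" input with a batch size can be pictured with the sketch below. It is not the code in `text_embedder.py`. The helper names (`read_articles`, `batched`) are hypothetical, and skipping blank lines is an assumption about how such a file might reasonably be handled.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_articles(lines: Iterable[str]) -> Iterator[str]:
    """Yield one stripped article per non-empty line (blank lines are skipped by assumption)."""
    for line in lines:
        text = line.strip()
        if text:
            yield text


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group items into lists of at most batch_size elements, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# In-memory stand-in for the lines of an articles file.
sample = ["First article.\n", "\n", "Second article.\n", "Third article.\n"]
batches = list(batched(read_articles(sample), batch_size=2))
print(batches)  # [['First article.', 'Second article.'], ['Third article.']]
```

With a real file, the same functions accept an open file handle in place of `sample`, for example `open(path, encoding="utf-8")`.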