# Text2Img

A rough implementation of image-embedding generation, based on the approach introduced in LLaVA.

### Structure
We derive the image embeddings by encoding each image with a CLIP encoder and mapping the output through the projection layer of the pretrained LLaVA checkpoint.
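
To make the projection step concrete, here is a minimal NumPy sketch of a two-layer MLP projector (Linear, GELU, Linear), which is what the `mlp2x` in the checkpoint name suggests. Everything in it is an assumption rather than this repo's code: the shapes (576 patch tokens of size 1024 for CLIP ViT-L/14 at 336px, 4096 for Vicuna-7B), the helper names `gelu` and `project`, and the random arrays standing in for the real CLIP features and pretrained projector weights.

```python
import math

import numpy as np

# Assumed shapes; check them against the checkpoint you actually load.
NUM_PATCHES = 576  # (336 / 14) ** 2 patch tokens, CLS token excluded
CLIP_DIM = 1024    # assumed CLIP ViT-L/14 hidden size
LLM_DIM = 4096     # assumed Vicuna-7B hidden size


def gelu(x):
    """Tanh approximation of GELU (the exact form uses erf)."""
    return 0.5 * x * (1.0 + np.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))


def project(patch_features, w1, b1, w2, b2):
    """Hypothetical helper: Linear -> GELU -> Linear, applied to every patch token."""
    hidden = gelu(patch_features @ w1 + b1)
    return hidden @ w2 + b2


rng = np.random.default_rng(0)
# Random stand-ins: real features come from the CLIP encoder,
# real weights from the pretrained LLaVA projector.
patch_features = rng.standard_normal((NUM_PATCHES, CLIP_DIM), dtype=np.float32)
w1 = rng.standard_normal((CLIP_DIM, LLM_DIM), dtype=np.float32) * 0.02
b1 = np.zeros(LLM_DIM, dtype=np.float32)
w2 = rng.standard_normal((LLM_DIM, LLM_DIM), dtype=np.float32) * 0.02
b2 = np.zeros(LLM_DIM, dtype=np.float32)

image_embeddings = project(patch_features, w1, b1, w2, b2)
print(image_embeddings.shape)  # (576, 4096)
```

The output has one row per image patch, each in the language model's embedding space, so it can sit alongside token embeddings of the same width.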

### Prerequisites
1. Install the dependencies listed in `requirements.txt` (for example, `pip install -r requirements.txt`).
2. Make sure you have [llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5](https://huggingface.co/liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/tree/main) under your **models** folder.
3. As example image data, I used the COCO [2017 Val images 5K/1GB](http://images.cocodataset.org/zips/val2017.zip) and [2017 Train/Val annotations 241MB](http://images.cocodataset.org/annotations/annotations_trainval2017.zip).

### Usage

For image_embedder.py:

1. Embed a single image (print only):
`python -m embed.image_embedder --image "C:\path\img.jpg" --no-save`

2. Embed a single image (save to file):
`python -m embed.image_embedder --image "C:\path\to\image.jpg" --out "C:\project\embeddings\image_embeddings.pkl"`

3. Embed a single folder of images:
`python -m embed.image_embedder --folder "C:\path\to\images" --out "C:\project\embeddings\image_embeddings.pkl" --batch-size 32`

For text_embedder.py:
1. Embed a single article (print only):
`python -m embed.text_embedder --text "This is my single-article input string."`

2. Embed a single article (save to file):
`python -m embed.text_embedder --text "This is my single-article input string." --out "C:\project\embeddings\text_embeddings.pkl"`

3. Embed multiple articles from a file (one per line):
`python -m embed.text_embedder --file "C:\path\to\articles.txt" --out "C:\project\embeddings\text_embeddings.pkl" --batch-size 8`
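
To inspect a saved file afterwards, something like the following may work. It is only a sketch: it assumes the `.pkl` output is a standard Python pickle, and the dict layout used for the demo (file name mapped to a vector) is hypothetical, so check what the embedder scripts actually write. `load_embeddings` is an illustrative helper, not part of this repo.

```python
import os
import pickle
import tempfile


def load_embeddings(path):
    """Hypothetical helper: return whatever object was pickled at `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical layout for demonstration only: file name -> embedding vector.
example = {"img_001.jpg": [0.1, 0.2, 0.3], "img_002.jpg": [0.4, 0.5, 0.6]}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "image_embeddings.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump(example, f)
    loaded = load_embeddings(demo_path)

# Look at the container type and size before assuming a structure.
print(type(loaded).__name__, len(loaded))
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code. If the embeddings were stored as tensors, the library that created them probably needs to be importable when loading.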