# Text2Img

A rough implementation of image-embedding generation using the methodology introduced in LLaVA.

## Structure

We derive the image embeddings by encoding images with a CLIP vision encoder and mapping the resulting features through LLaVA's pretrained projection layer.
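
Below is a minimal sketch of that pipeline in PyTorch with Hugging Face `transformers`, assuming the standard LLaVA-1.5 setup: a CLIP ViT-L/14-336px vision tower, penultimate-layer patch features, and the `mlp2x` two-layer projector (1024 → 4096). The checkpoint file name and state-dict key prefix are assumptions based on the public LLaVA pretrain release, not this repo's actual code.

```python
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

# LLaVA-1.5 uses CLIP ViT-L/14 at 336px as its vision tower.
tower_id = "openai/clip-vit-large-patch14-336"
vision_tower = CLIPVisionModel.from_pretrained(tower_id).eval()
processor = CLIPImageProcessor.from_pretrained(tower_id)

# The "mlp2x" projector: a 2-layer MLP mapping CLIP's hidden size (1024)
# to Vicuna-7B's hidden size (4096).
projector = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
)
# Assumed file name and key prefix: the public pretrain checkpoint ships the
# projector as mm_projector.bin with keys prefixed "model.mm_projector.".
state = torch.load(
    "models/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin",
    map_location="cpu",
)
projector.load_state_dict({k.split("mm_projector.")[-1]: v for k, v in state.items()})
projector.eval()

image = Image.open("img.jpg").convert("RGB")
pixels = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    # LLaVA takes the penultimate layer's patch features and drops the CLS token.
    feats = vision_tower(pixels, output_hidden_states=True).hidden_states[-2][:, 1:]
    embeddings = projector(feats)  # shape: (1, 576, 4096)
```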

## Prerequisites

  1. Install the dependencies: `pip install -r requirements.txt`
  2. Make sure `llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5` is under your `models` folder.
  3. For example image data, I used the COCO 2017 Val images (5K images, 1 GB) and the 2017 Train/Val annotations (241 MB).

## Usage

For `image_embedder.py`:

  1. Embed a single image (print only):
     `python -m embed.image_embedder --image "C:\path\img.jpg" --no-save`

  2. Embed a single image (save to file):
     `python -m embed.image_embedder --image "C:\path\to\image.jpg" --out "C:\project\embeddings\image_embeddings.pkl"`

  3. Embed a folder of images:
     `python -m embed.image_embedder --folder "C:\path\to\images" --out "C:\project\embeddings\image_embeddings.pkl" --batch-size 32`

For `text_embedder.py`:

  1. Embed a single article (print only):
     `python -m embed.text_embedder --text "This is my single-article input string."`

  2. Embed a single article (save to file):
     `python -m embed.text_embedder --text "This is my single-article input string." --out "C:\project\embeddings\text_embeddings.pkl"`

  3. Embed multiple articles from a file (one per line):
     `python -m embed.text_embedder --file "C:\path\to\articles.txt" --out "C:\project\embeddings\text_embeddings.pkl" --batch-size 8`
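
Both embedders write a `.pkl` file. A minimal sketch for loading one back; the exact stored structure (dict, list, or array) is defined by this project's code, so inspect it before relying on a particular layout:

```python
import pickle

# Load embeddings saved by either embedder and check what was stored.
with open("image_embeddings.pkl", "rb") as f:
    embeddings = pickle.load(f)

print(type(embeddings))
```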