
Text2Img

A rough implementation of generating image embeddings using the methodology introduced in LLaVA

Structure

We derive image embeddings by encoding images with a CLIP encoder and projecting the resulting features through LLaVA's pretrained projection layer weights
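
A minimal sketch of that pipeline, assuming the original LLaVA checkpoint layout (the mm_projector key names, the choice of the second-to-last hidden layer, and dropping the CLS token are assumptions based on the original LLaVA release, not this project's actual code):

  # Encode an image with CLIP, then project it with LLaVA's pretrained projector.
  import torch
  from PIL import Image
  from transformers import CLIPImageProcessor, CLIPVisionModel

  processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
  vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")

  # Load only the projector weights from the checkpoint shard downloaded in step 2.
  state = torch.load("pytorch_model-00003-of-00003.bin", map_location="cpu")
  proj_w = state["model.mm_projector.weight"]  # assumed key; shape (llm_dim, clip_dim)
  proj_b = state["model.mm_projector.bias"]    # assumed key; shape (llm_dim,)

  pixels = processor(images=Image.open("img.jpg"), return_tensors="pt").pixel_values
  with torch.no_grad():
      # LLaVA takes patch features from a late hidden layer and drops the CLS token.
      feats = vision_tower(pixel_values=pixels, output_hidden_states=True).hidden_states[-2][:, 1:]
      embeddings = feats @ proj_w.T + proj_b  # (1, num_patches, llm_dim)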

Prerequisites

  1. Install the dependencies: pip install -r requirements.txt
  2. Make sure you have downloaded pytorch_model-00003-of-00003.bin (the LLaVA checkpoint shard that holds the projection weights)
  3. For example image data, I use the COCO 2017 Val images (5K images, ~1 GB) and the 2017 Train/Val annotations (~241 MB)
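
To confirm the downloaded shard actually contains the projection weights, its keys can be listed first (the mm_projector name is an assumption from the original LLaVA release):

  import torch
  state = torch.load("pytorch_model-00003-of-00003.bin", map_location="cpu")
  print([k for k in state if "mm_projector" in k])  # expect weight and bias entries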

Usage

For image_embedder.py:

  1. Embed a single image (print only): python -m embed.image_embedder --image "C:\path\img.jpg" --no-save

  2. Embed a single image (save to file): python -m embed.image_embedder --image "C:\path\to\image.jpg" --out "C:\project\embeddings\image_embeddings.pkl"

  3. Embed a folder of images: python -m embed.image_embedder --folder "C:\path\to\images" --out "C:\project\embeddings\image_embeddings.pkl" --batch-size 32
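
The --out file is a pickle and can be inspected afterwards; its exact contents aren't documented here, so treat this as a sketch:

  import pickle
  with open(r"C:\project\embeddings\image_embeddings.pkl", "rb") as f:
      embeddings = pickle.load(f)
  print(type(embeddings), len(embeddings))  # adapt to whatever image_embedder.py writes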

For text_embedder.py:

  1. Embed a single article (print only): python -m embed.text_embedder --text "This is my single-article input string."

  2. Embed a single article (save to file): python -m embed.text_embedder --text "This is my single-article input string." --out "C:\project\embeddings\text_embeddings.pkl"

  3. Embed multiple articles from a file (one article per line): python -m embed.text_embedder --file "C:\path\to\articles.txt" --out "C:\project\embeddings\text_embeddings.pkl" --batch-size 8
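
With both pickles on disk, a simple text-to-image retrieval pass can be sketched with cosine similarity. The mapping-style file structure and the shared embedding space are assumptions; adapt this to what the embedders actually write:

  import pickle
  import numpy as np

  with open(r"C:\project\embeddings\image_embeddings.pkl", "rb") as f:
      images = pickle.load(f)  # assumed: {image_name: vector or patch matrix}
  with open(r"C:\project\embeddings\text_embeddings.pkl", "rb") as f:
      texts = pickle.load(f)   # assumed: {article: vector}

  def pool(v):
      # Mean-pool per-patch embeddings into a single vector if needed.
      v = np.asarray(v, dtype=np.float32)
      return v.reshape(-1, v.shape[-1]).mean(axis=0) if v.ndim > 1 else v

  def cosine(a, b):
      a, b = pool(a), pool(b)
      return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

  query, qvec = next(iter(texts.items()))
  best = max(images, key=lambda name: cosine(qvec, images[name]))
  print(f"closest image to {query!r}: {best}")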
