
Img2Vec

A rough implementation that generates image embeddings using the approach introduced in LLaVA.

Structure

We derive the image embeddings by encoding each image with a CLIP vision encoder and mapping the result through the pretrained LLaVA projection weights.
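
As a rough illustration of the mapping step, the sketch below applies a linear projection to a few per-patch feature vectors. It assumes the projection is a single linear layer (weight plus bias), which may not match the checkpoint you use. The helper name project and the toy sizes are hypothetical and not taken from this repository. It is written in plain Python for readability; the real script would presumably do the same thing with tensors.

```python
def project(features, weight, bias):
    """Apply y = W x + b to each feature vector.

    features: list of vectors, each of length d_in
    weight:   d_out rows, each of length d_in
    bias:     vector of length d_out
    """
    projected = []
    for x in features:
        if len(x) != len(weight[0]):
            raise ValueError("feature size does not match projection input size")
        y = [
            sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)
        ]
        projected.append(y)
    return projected


# Toy sizes for readability (assumption). With openai/clip-vit-large-patch14-336
# the encoder hidden size is 1024, and --proj-dim 5120 would be the output size.
patch_features = [[1.0, 2.0], [0.0, -1.0]]   # 2 patches, d_in = 2
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # d_out = 3
b = [0.0, 0.5, -1.0]
embeddings = project(patch_features, W, b)
```

Each input vector of size d_in comes out as a vector of size d_out, so the number of vectors per image is unchanged and only their width grows to the projection dimension.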

Prerequisites

  1. Install the dependencies listed in requirements.txt
  2. Make sure you have downloaded pytorch_model-00003-of-00003.bin
  3. For example image data, I use the COCO 2017 Val images (5K/1GB) and 2017 Train/Val annotations (241MB)

Usage

Replace --image-dir and --llava-ckpt with the paths to your test image folder and to pytorch_model-00003-of-00003.bin, then run:

python convert_images_to_vectors.py --image-dir ./datasets/coco/val2017 --output-dir imgVecs --vision-model openai/clip-vit-large-patch14-336 --proj-dim 5120 --llava-ckpt ./datasets/pytorch_model-00003-of-00003.bin --batch-size 64
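
For readers who want to see how these flags fit together, here is a minimal sketch of an argument parser that accepts the same options as the command above. The flag names come from that command; the types, help strings, and which flags are required are assumptions, and the actual convert_images_to_vectors.py may define them differently.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags shown in the command above.
    parser = argparse.ArgumentParser(
        description="Convert a folder of images to embedding vectors"
    )
    parser.add_argument("--image-dir", required=True,
                        help="folder containing the input images")
    parser.add_argument("--output-dir", required=True,
                        help="folder where the vectors are written")
    parser.add_argument("--vision-model", required=True,
                        help="name of the CLIP vision model to load")
    parser.add_argument("--proj-dim", type=int, required=True,
                        help="output size of the LLaVA projection")
    parser.add_argument("--llava-ckpt", required=True,
                        help="path to the LLaVA checkpoint shard")
    parser.add_argument("--batch-size", type=int, required=True,
                        help="number of images processed per batch")
    return parser


# Parse the exact arguments from the example command.
args = build_parser().parse_args([
    "--image-dir", "./datasets/coco/val2017",
    "--output-dir", "imgVecs",
    "--vision-model", "openai/clip-vit-large-patch14-336",
    "--proj-dim", "5120",
    "--llava-ckpt", "./datasets/pytorch_model-00003-of-00003.bin",
    "--batch-size", "64",
])
```

argparse converts the dashes in flag names to underscores, so the values are available as args.image_dir, args.proj_dim, args.batch_size, and so on.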
