
# Img2Vec
A rough implementation of generating image embeddings using the methodology introduced in LLaVA.
### Structure
Image embeddings are derived by encoding images with a CLIP vision encoder and mapping the resulting features with LLaVA's pretrained projection weights.
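
Below is a minimal sketch of that mapping, not the repo's actual `starter.py`: it assumes the projection is a linear layer stored under the checkpoint key `model.mm_projector.weight` (CLIP hidden size 1024 → LLaVA-13B hidden size 5120) and that, as in LLaVA, the patch tokens from the penultimate CLIP layer are used.

```python
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

MODEL = "openai/clip-vit-large-patch14-336"
vision_model = CLIPVisionModel.from_pretrained(MODEL).eval()
processor = CLIPImageProcessor.from_pretrained(MODEL)

# Load only the projector from the LLaVA checkpoint shard.
# The key names below are assumptions based on LLaVA-13B; check starter.py.
ckpt = torch.load("./datasets/pytorch_model-00003-of-00003.bin", map_location="cpu")
proj_w = ckpt["model.mm_projector.weight"]      # assumed shape: (5120, 1024)
proj_b = ckpt.get("model.mm_projector.bias")    # may be absent

image = Image.open("example.jpg").convert("RGB")            # any test image
pixels = processor(images=image, return_tensors="pt")["pixel_values"]

with torch.no_grad():
    out = vision_model(pixels, output_hidden_states=True)
    feats = out.hidden_states[-2][:, 1:]        # penultimate layer, drop CLS token
    emb = feats @ proj_w.T                      # (1, num_patches, 5120)
    if proj_b is not None:
        emb = emb + proj_b
```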
### Prerequisites
1. Install the dependencies: `pip install -r requirements.txt`
2. Make sure you have downloaded [pytorch_model-00003-of-00003.bin](https://huggingface.co/liuhaotian/LLaVA-13b-delta-v1-1/blob/main/pytorch_model-00003-of-00003.bin)
3. For example image data, I use the COCO [2017 Val images (5K, 1 GB)](http://images.cocodataset.org/zips/val2017.zip) and [2017 Train/Val annotations (241 MB)](http://images.cocodataset.org/annotations/annotations_trainval2017.zip)
### Usage
Set **--image-dir** and **--llava-ckpt** to your **test image folder** and the path to **pytorch_model-00003-of-00003.bin**, then run:
```bash
python starter.py --image-dir ./datasets/coco/val2017 --output-dir imgVecs --vision-model openai/clip-vit-large-patch14-336 --proj-dim 5120 --llava-ckpt ./datasets/pytorch_model-00003-of-00003.bin --batch-size 64
```
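
If `starter.py` writes one embedding file per image into **--output-dir** (the exact output format is defined by the script; the one-`.npy`-per-image layout below is only an assumption), the vectors can be loaded back for downstream use like this:

```python
import numpy as np
from pathlib import Path

# Assumes imgVecs/ contains one .npy file per input image (hypothetical layout).
vecs = {p.stem: np.load(p) for p in Path("imgVecs").glob("*.npy")}
print(f"{len(vecs)} images embedded, vector shape: {next(iter(vecs.values())).shape}")
```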