This is the evaluation code for the Img2Prompt-VQA paper. We release the evaluation code publicly.
We include an interactive demo Colab notebook that walks through the Img2Prompt-VQA inference workflow:
- Image-question matching: compute the relevancy score of image patches with respect to the question, and remove noisy generated captions with low relevancy scores.
- Image captioning: generate question-guided captions based on the relevancy scores.
- Question generation: generate questions based on the synthetic answers and captions.
- Large language model: a pre-trained large language model, e.g. OPT/GPT-3, answers the question from the resulting prompt (a minimal prompting sketch follows this list).
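To make the last step concrete, here is a minimal, self-contained sketch of how a frozen LLM can be prompted with generated captions and synthetic QA pairs via Hugging Face `transformers`. It is an illustration only: the prompt template, example strings, and the small OPT checkpoint are assumptions chosen for readability, not the exact format or models used by the evaluation scripts in `VL_captioning`.

```python
# Minimal sketch of the LLM prompting step (illustrative only; the prompt
# template and example captions/QA pairs are assumptions, not the exact
# format used by the evaluation scripts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# A small OPT checkpoint stands in for the OPT-13B/30B/66B/175B models in the paper.
model_name = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()

# Question-guided captions and synthetic QA pairs produced by the earlier stages.
captions = ["a man riding a wave on a surfboard", "a surfer in a black wetsuit"]
synthetic_qa = [("What is the man standing on?", "surfboard"),
                ("What is the man wearing?", "wetsuit")]
question = "What sport is the man doing?"

# Assemble the zero-shot prompt: captions as context, synthetic QA pairs as
# in-context exemplars, then the target question.
prompt = "Contexts: " + " ".join(captions) + "\n"
for q, a in synthetic_qa:
    prompt += f"Question: {q} Answer: {a}\n"
prompt += f"Question: {question} Answer:"

inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens as the predicted answer.
answer = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:],
                          skip_special_tokens=True)
print(answer.strip())
```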
Model | End-to-End Training? | VQAv2 val | VQAv2 test | OK-VQA test | AOK-VQA val | AOK-VQA test |
---|---|---|---|---|---|---|
Frozen-7B | ✓ | 29.5 | - | 5.9 | - | - |
Flamingo-9B | ✓ | - | 51.8 | 44.7 | - | - |
Flamingo-80B | ✓ | - | 56.3 | 50.6 | - | - |
Img2Prompt-VQA-OPT13B | ✗ | 57.1 | 57.3 | 39.9 | 33.3 | 33.0 |
Img2Prompt-VQA-OPT30B | ✗ | 59.5 | 60.4 | 41.8 | 36.9 | 36.0 |
Img2Prompt-VQA-OPT66B | ✗ | 59.9 | 60.3 | 43.2 | 38.7 | 38.2 |
Img2Prompt-VQA-OPT175B | ✗ | 60.6 | 61.9 | 45.6 | 42.9 | 40.7 |
To reproduce the evaluation results of Img2Prompt-VQA (also referred to as Img2LLM-VQA) with different LLMs, follow the steps below:
First, download the generated caption and question files from this link and put them in the `caption_question_files` folder. For example, download 'okvqa_question.json', 'okvqa_caption.json', and 'okvqa_ans_to_cap_dict.json' to reproduce the OK-VQA results.
Then download the COCO 2014 val annotation file from the link and put it in the `annotation_new` folder.
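Before launching the evaluation, a quick check like the one below can confirm the files landed in the right folders. This is a minimal sketch: the OK-VQA file names are the ones listed above, and it assumes you run it from the repository root.

```python
# Quick sanity check before running the evaluation scripts.
# The OK-VQA file names come from the step above; adjust them for other datasets.
from pathlib import Path

required = [
    "caption_question_files/okvqa_question.json",
    "caption_question_files/okvqa_caption.json",
    "caption_question_files/okvqa_ans_to_cap_dict.json",
]
for path in map(Path, required):
    print(f"{path}: {'found' if path.is_file() else 'MISSING'}")

# annotation_new should hold the COCO 2014 val annotation file downloaded above.
annotation_dir = Path("annotation_new")
json_files = list(annotation_dir.glob("*.json")) if annotation_dir.is_dir() else []
print(f"{annotation_dir}: {len(json_files)} JSON file(s) found")
```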
Then run the shell script in the `VL_captioning` folder to reproduce the results, e.g.
$ ./run_okvqa.sh
If you find this code useful for your research, please consider citing:
@article{guo2022images,
  title={From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models},
  author={Guo, Jiaxian and Li, Junnan and Li, Dongxu and Tiong, Anthony Meng Huat and Li, Boyang and Tao, Dacheng and Hoi, Steven CH},
  journal={arXiv preprint arXiv:2212.10846},
  year={2022}
}