This is the evaluation code for the Img2Prompt-VQA paper. We release the evaluation code publicly.
We include an interactive demo Colab notebook that walks through the Img2Prompt-VQA inference workflow:
- Image-question matching: compute the relevancy score of image patches with respect to the question, and remove noisy generated captions with low relevancy scores.
- Image captioning: generate question-guided captions based on the relevancy scores.
- Question generation: generate questions based on the synthetic answers and captions.
- Large language model: a pre-trained large language model, e.g. OPT/GPT-3, answers the question from the resulting prompt (a minimal prompting sketch follows this list).
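To make the last step concrete, here is a minimal, self-contained sketch of how a frozen LLM can be prompted with generated captions and synthetic QA pairs via Hugging Face `transformers`. It is an illustration only: the prompt template, example strings, and the small OPT checkpoint are assumptions chosen for readability, not the exact format or models used by the evaluation scripts in `VL_captioning`.

```python
# Minimal sketch of the LLM prompting step (illustrative only; the prompt
# template and example captions/QA pairs are assumptions, not the exact
# format used by the evaluation scripts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# A small OPT checkpoint stands in for the OPT-13B/30B/66B/175B models in the paper.
model_name = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()

# Question-guided captions and synthetic QA pairs produced by the earlier stages.
captions = ["a man riding a wave on a surfboard", "a surfer in a black wetsuit"]
synthetic_qa = [("What is the man standing on?", "surfboard"),
                ("What is the man wearing?", "wetsuit")]
question = "What sport is the man doing?"

# Assemble the zero-shot prompt: captions as context, synthetic QA pairs as
# in-context exemplars, then the target question.
prompt = "Contexts: " + " ".join(captions) + "\n"
for q, a in synthetic_qa:
    prompt += f"Question: {q} Answer: {a}\n"
prompt += f"Question: {question} Answer:"

inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens as the predicted answer.
answer = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:],
                          skip_special_tokens=True)
print(answer.strip())
```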
Model | End-to-End Training? | VQAv2 val | VQAv2 test | OK-VQA test | AOK-VQA val | AOK-VQA test |
---|---|---|---|---|---|---|
Frozen-7B | ✓ | 29.5 | - | 5.9 | - | - |
Flamingo-9B | ✓ | - | 51.8 | 44.7 | - | - |
Flamingo-80B | ✓ | - | 56.3 | 50.6 | - | - |
Img2Prompt-VQA-OPT13B | ✗ | 57.1 | 57.3 | 39.9 | 33.3 | 33.0 |
Img2Prompt-VQA-OPT30B | ✗ | 59.5 | 60.4 | 41.8 | 36.9 | 36.0 |
Img2Prompt-VQA-OPT66B | ✗ | 59.9 | 60.3 | 43.2 | 38.7 | 38.2 |
Img2Prompt-VQA-OPT175B | ✗ | 60.6 | 61.9 | 45.6 | 42.9 | 40.7 |
To reproduce the evaluation results of Img2Prompt-VQA (also referred to as Img2LLM-VQA) with different LLMs, follow the steps below:
First, download the generated caption and question files from this link and put them in the `caption_question_files` folder. For example, download 'okvqa_question.json', 'okvqa_caption.json', and 'okvqa_ans_to_cap_dict.json' to reproduce the OK-VQA results.
Then download the COCO 2014 val annotation file from the link and put it in the `annotation_new` folder.
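Before launching the evaluation, a quick check like the one below can confirm the files landed in the right folders. This is a minimal sketch: the OK-VQA file names are the ones listed above, and it assumes you run it from the repository root.

```python
# Quick sanity check before running the evaluation scripts.
# The OK-VQA file names come from the step above; adjust them for other datasets.
from pathlib import Path

required = [
    "caption_question_files/okvqa_question.json",
    "caption_question_files/okvqa_caption.json",
    "caption_question_files/okvqa_ans_to_cap_dict.json",
]
for path in map(Path, required):
    print(f"{path}: {'found' if path.is_file() else 'MISSING'}")

# annotation_new should hold the COCO 2014 val annotation file downloaded above.
annotation_dir = Path("annotation_new")
json_files = list(annotation_dir.glob("*.json")) if annotation_dir.is_dir() else []
print(f"{annotation_dir}: {len(json_files)} JSON file(s) found")
```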
Then run the shell script in the `VL_captioning` folder to reproduce the results, e.g.
$ ./run_okvqa.sh
If you find this code useful for your research, please consider citing:
@article{guo2022images,
  title={From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models},
  author={Guo, Jiaxian and Li, Junnan and Li, Dongxu and Tiong, Anthony Meng Huat and Li, Boyang and Tao, Dacheng and Hoi, Steven CH},
  journal={arXiv preprint arXiv:2212.10846},
  year={2022}
}