This repo implements the idea in Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering. The visual retriever is built upon Dense Passage Retrieval (DPR) with new visual features and aims to retrieve knowledge for OKVQA (cite), a knowledge-based visual question answering benchmark. The visual reader is adapted from the Hugging Face question answering example.
- Caption-based retrieval: the caption is appended to the question.
- Image-based retrieval: the question encoder is based on LXMERT (cite). Its cross-modal representation of the question and image is used as the question representation, filling the same role as the question embedding in the original DPR.
- Visual extractive reader: the extractive reader is based on RoBERTa (cite), where the special word "unanswerable" and a caption of the image are added in front of the context.
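As a rough illustration of the two text-side input formats described above, the sketch below builds a caption-based retrieval query and a reader context. The function names, the single-space separator, and the exact order of the "unanswerable" word and the caption are assumptions made for illustration; they are not taken from this repository's code.

```python
def build_caption_query(question: str, caption: str) -> str:
    """Append the image caption to the question (caption-based retrieval)."""
    # Assumption: a single space separates the question and the caption.
    return f"{question.strip()} {caption.strip()}"


def build_reader_context(caption: str, knowledge: str,
                         no_answer_token: str = "unanswerable") -> str:
    """Put the special no-answer word and the caption in front of the passage."""
    # Assumption: the order is <no-answer word> <caption> <knowledge passage>.
    return f"{no_answer_token} {caption.strip()} {knowledge.strip()}"


if __name__ == "__main__":
    question = "What sport is this?"
    caption = "a man riding a wave on a surfboard"
    knowledge = "Surfing is a surface water sport."
    print(build_caption_query(question, caption))
    print(build_reader_context(caption, knowledge))
```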
Installation from the source. Python's virtual or Conda environments are recommended.
git clone https://github.com/luomancs/retriever_reader_for_okvqa.git
cd retriever_reader_for_okvqa
pip install -r requirements.txt
Visual-DPR is tested on Python 3.7 and PyTorch 1.7.1.
We provide four corpora, which can be downloaded from Google Drive:
- okvqa_train_corpus: collected based on the training data. Corpus size: 112,724.
- okvqa_full_corpus: collected based on the training data and testing data. Corpus size: 168,306.
- okvqa_train_clean_corpus: based on okvqa_train_corpus, but filtered with a process similar to the one used for T5; see the paper for details. Corpus size: 111,412.
- okvqa_full_clean_corpus: based on okvqa_full_corpus, cleaned with the same method as okvqa_train_clean_corpus. Corpus size: 166,390.
Training data: you need to prepare data for either retriever or reader training. Training data can be downloaded from here, and testing data can be downloaded from here.
- For the caption-retriever: the training data is from OKVQA, where we use OSCAR to generate the caption for the corresponding image.
- For the image-retriever: the image features are extracted using Mask-RCNN.
- For the reader: the training data includes the questions from OKVQA and the knowledge from the corpus.
The Caption-DPR and extractive reader models can be downloaded from here.
python DPR/generate_dense_embeddings.py \
model_file={path to biencoder checkpoint} \
ctx_src={name of the passages resource} \
shard_id={shard_num, 0-based} num_shards={total number of shards} \
out_file={folder to save the indexing} \
encoder=hf_bert
ctx_src: one of the corpus names (see the DPR/conf/ctx_sources/okvqa_sources.yaml file).
encoder: either hf_bert (caption-dpr) or hf_lxmert_bert (image-dpr).
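To make the shard_id and num_shards parameters more concrete, here is a minimal sketch of how a corpus could be divided among shards. It assumes passages are split into contiguous, near-equal slices; the helper name shard_bounds is hypothetical, and the actual script may partition the corpus differently.

```python
import math
from typing import Tuple


def shard_bounds(num_passages: int, shard_id: int, num_shards: int) -> Tuple[int, int]:
    """Return the [start, end) passage indices handled by one 0-based shard."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must satisfy 0 <= shard_id < num_shards")
    # Assumption: every shard gets ceil(N / num_shards) passages, except
    # possibly the last ones, which are clipped to the corpus size.
    shard_size = math.ceil(num_passages / num_shards)
    start = min(shard_id * shard_size, num_passages)
    end = min(start + shard_size, num_passages)
    return start, end


if __name__ == "__main__":
    # 112,724 is the size of okvqa_train_corpus listed in this README.
    for shard_id in range(4):
        print(shard_id, shard_bounds(112_724, shard_id, 4))
```

Under this assumption, running the embedding command once per shard_id from 0 to num_shards - 1 covers every passage exactly once.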
You can download the corpus embeddings already generated by our original caption-dpr model from Google Drive.
python DPR/caption_dense_retriever.py \
model_file={path to biencoder checkpoint} \
qa_dataset=okvqa_test \
ctx_datatsets=[{list of corpus sources}] \
encoded_ctx_files=[{list of encoded document files glob expression, comma separated without spaces}] \
out_file={path to output json file with results}
python DPR/image_dense_retriever.py \
model_file={path to biencoder checkpoint} \
qa_dataset=okvqa_test \
ctx_datatsets=[{list of corpus sources}] \
encoded_ctx_files=[{list of encoded document files glob expression, comma separated without spaces}] \
out_file={path to output json file with results}
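Conceptually, both retrieval commands rank corpus passages by the similarity between a question embedding and the precomputed passage embeddings; DPR uses the inner product as the similarity. The toy sketch below shows that ranking step with an exhaustive search over plain Python lists. The function names and the example vectors are made up for illustration, and the real scripts, which load the encoded files, are not reproduced here.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec: Sequence[float],
          passage_vecs: Sequence[Sequence[float]],
          k: int) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the k highest inner products."""
    scored = [(idx, dot(query_vec, vec)) for idx, vec in enumerate(passage_vecs)]
    # Sort by score, highest first; ties keep their original passage order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings, chosen only for illustration.
    query = [1.0, 0.0]
    passages = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
    for idx, score in top_k(query, passages, k=2):
        print(idx, score)
```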
python evaluation/predict_answer.py \
--model_path {path to the EReader} \
--retrieve_kn_file {path to the retrieved knowledge given by the retriever} \
--prediction_save_path {path to save the prediction} \
--cuda_id {GPU id, e.g. 0; use -1 to evaluate on CPU}
If you find this paper or this code useful, please cite this paper:
@inproceedings{luo2021weakly,
title={Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering},
author={Luo, Man and Zeng, Yankai and Banerjee, Pratyay and Baral, Chitta},
booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
pages={6417--6431},
year={2021}
}