Visual Dense Passage Retrieval (Vis-DPR)

License: MIT

This repo implements the idea in Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering. The visual retriever is built upon Dense Passage Retrieval (DPR) with new visual features and aims to retrieve knowledge for OKVQA (cite), a knowledge-based visual question answering benchmark. The visual reader is adapted from the Hugging Face question answering example.

Features

  1. Caption-based retrieval: the image caption is appended to the question.
  2. Image-based retrieval: the question encoder is based on LXMERT (cite), where a cross-representation of the question and image is used as the question representation, as in the original DPR.
  3. Visual extractive reader: the extractive reader is based on RoBERTa (cite), where the special word "unanswerable" and a caption of the image are added in front of the context.
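
The two text-side input constructions in the list above can be sketched as follows. This is a minimal illustration, not the repository's code: the function names, the single-space separator, and the exact order of "unanswerable" and the caption are assumptions made for the example.

```python
def build_caption_query(question: str, caption: str, sep: str = " ") -> str:
    """Caption-based retrieval: append the image caption to the question.

    The separator is an assumption; the repository may join the two differently.
    """
    return f"{question.strip()}{sep}{caption.strip()}"


def build_reader_context(caption: str, passage: str,
                         null_word: str = "unanswerable") -> str:
    """Visual reader: put the special word and the caption in front of the passage.

    The relative order of the special word and the caption is an assumption.
    """
    return " ".join([null_word, caption.strip(), passage.strip()])


question = "What sport is this?"
caption = "a man swinging a bat at a ball"
passage = "Baseball is a bat-and-ball game."

print(build_caption_query(question, caption))
print(build_reader_context(caption, passage))
```

Because "unanswerable" is part of the context string, an extractive reader can select it as a span when the retrieved passage does not contain the answer.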

Installation

Install from source. A Python virtual environment or a Conda environment is recommended.

git clone https://github.com/luomancs/retriever_reader_for_okvqa.git
cd retriever_reader_for_okvqa
pip install -r requirements.txt

Visual-DPR is tested on Python 3.7 and PyTorch 1.7.1.

Resources

Corpus

We provide four corpora, which can be downloaded from Google Drive:

  1. okvqa_train_corpus: collected based on the training data. Corpus size: 112,724.

  2. okvqa_full_corpus: collected based on the training and testing data. Corpus size: 168,306.

  3. okvqa_train_clean_corpus: okvqa_train_corpus filtered with a process similar to the one used for T5; see the paper for details. Corpus size: 111,412.

  4. okvqa_full_clean_corpus: okvqa_full_corpus filtered with the same cleaning method as corpus 3. Corpus size: 166,390.
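
As a quick check on what the cleaning step does to corpus size, the figures listed above can be compared directly. The sketch below uses only the sizes reported in this README; the helper name is hypothetical.

```python
from typing import Tuple

# Corpus sizes as reported in the list above.
SIZES = {
    "okvqa_train_corpus": 112_724,
    "okvqa_train_clean_corpus": 111_412,
    "okvqa_full_corpus": 168_306,
    "okvqa_full_clean_corpus": 166_390,
}


def removed_by_cleaning(raw: str, clean: str) -> Tuple[int, float]:
    """Return (number of passages removed, percentage of the raw corpus)."""
    removed = SIZES[raw] - SIZES[clean]
    return removed, 100.0 * removed / SIZES[raw]


train_removed, train_pct = removed_by_cleaning(
    "okvqa_train_corpus", "okvqa_train_clean_corpus")
full_removed, full_pct = removed_by_cleaning(
    "okvqa_full_corpus", "okvqa_full_clean_corpus")

print(f"train: {train_removed} passages removed ({train_pct:.2f}%)")
print(f"full:  {full_removed} passages removed ({full_pct:.2f}%)")
```

In both cases the filter removes a little over 1% of the passages (1,312 and 1,916 respectively).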

Training data: you need to prepare data for either retriever or reader training. Training data can be downloaded from here, and testing data can be downloaded from here.

  1. For the caption-retriever: the training data is from OKVQA, where we use OSCAR to generate the caption for the corresponding image.
  2. For the image-retriever: the image features are extracted using Mask-RCNN.
  3. For the reader: the training data includes the questions from OKVQA and the knowledge from the corpus.

Pretrained models

Caption-DPR and extractive reader can be downloaded from here

Retriever inference

Generate representation vectors for the entire corpus:

python DPR/generate_dense_embeddings.py \
	model_file={path to biencoder checkpoint} \
	ctx_src={name of the passages resource} \
	shard_id={shard_num, 0-based} num_shards={total number of shards} \
	out_file={folder to save the indexing} \
	encoder=hf_bert

ctx_src: one of the corpus names (see the DPR/conf/ctx_sources/okvqa_sources.yaml file).

encoder: either hf_bert (caption-dpr) or hf_lxmert_bert (image-dpr)
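
The shard_id and num_shards arguments let several processes each encode one slice of the corpus. The sketch below shows one way such a split can be computed, using contiguous, ceiling-sized slices. It is an assumption about the behaviour rather than a copy of the repository's code, and the function name is hypothetical; treat DPR/generate_dense_embeddings.py as the reference.

```python
import math
from typing import Tuple


def shard_bounds(num_passages: int, shard_id: int, num_shards: int) -> Tuple[int, int]:
    """Return the [start, end) passage range handled by one shard.

    Assumes contiguous shards of ceil(num_passages / num_shards) passages,
    with the last shard possibly shorter.
    """
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must satisfy 0 <= shard_id < num_shards")
    shard_size = math.ceil(num_passages / num_shards)
    start = min(shard_id * shard_size, num_passages)
    end = min(start + shard_size, num_passages)
    return start, end


# Example: split the 112,724-passage okvqa_train_corpus across 4 shards.
for shard_id in range(4):
    print(shard_id, shard_bounds(112_724, shard_id, 4))
```

Each shard writes its own embedding file, so the retriever is later pointed at all of them with a glob expression in encoded_ctx_files.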

You can download pre-generated corpus embeddings produced by our original caption-dpr model from google_drive.

Retriever evaluation against the entire set of documents:

Retrieve knowledge with Caption-DPR

python DPR/caption_dense_retriever.py \
	model_file={path to biencoder checkpoint} \
	qa_dataset=okvqa_test \
	ctx_datatsets=[{list of corpus sources}] \
	encoded_ctx_files=[{list of encoded document files glob expression, comma separated without spaces}] \
	out_file={path to output json file with results} 
	

Retrieve knowledge with Image-DPR

python DPR/image_dense_retriever.py \
	model_file={path to biencoder checkpoint} \
	qa_dataset=okvqa_test \
	ctx_datatsets=[{list of corpus sources}] \
	encoded_ctx_files=[{list of encoded document files glob expression, comma separated without spaces}] \
	out_file={path to output json file with results} 
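
Both retrieval scripts rank corpus passages against an encoded question. DPR scores a question-passage pair by the inner product of their embeddings, and the sketch below illustrates that ranking with a brute-force search in plain Python. It is only illustrative: the function names and the toy two-dimensional vectors are made up, and a real run would use the embedding files generated above together with an approximate or exact nearest-neighbour index rather than this loop.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_passages(query_vec: Sequence[float],
                   passage_vecs: Sequence[Sequence[float]],
                   k: int) -> List[Tuple[int, float]]:
    """Return the k (passage_index, score) pairs with the highest inner product."""
    scored = ((dot(query_vec, p), i) for i, p in enumerate(passage_vecs))
    return [(i, score) for score, i in heapq.nlargest(k, scored)]


# Toy embeddings standing in for encoded passages and one encoded question.
passages = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
query = [1.0, 0.0]

for index, score in top_k_passages(query, passages, k=2):
    print(index, score)
```

The only difference between Caption-DPR and Image-DPR at this stage is how the question vector is produced: from question plus caption text, or from the LXMERT cross-representation of question and image.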
	

EReader model inference

python evaluation/predict_answer.py \
--model_path {path to the EReader} \
--retrieve_kn_file {path to the retrieved knowledge given by the retriever} \
--prediction_save_path {path to save the prediction} \
--cuda_id {GPU id, e.g. 0; use -1 to evaluate on CPU}

Citation

If you find this paper or this code useful, please cite this paper:

@inproceedings{luo2021weakly,
  title={Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering},
  author={Luo, Man and Zeng, Yankai and Banerjee, Pratyay and Baral, Chitta},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  pages={6417--6431},
  year={2021}
}
