This repo provides the resources for our paper, which aims to 1) introduce a new benchmark for the multimodal-query retrieval task; 2) build an end-to-end multimodal retriever along with a multimodal pretraining task. Check out our paper for more details.
A dataset for multimodal-query retrieval. We turned WebQA into a multimodal-query retrieval task by augmenting the WebQA questions with images, forming new multimodal queries paired with a large text-based corpus.
You can download the data from this link.
We save the images in TSV format for storage efficiency. See the read_tsv_img.ipynb notebook for how to open an image.