This repository contains the code used for the experiments in "Object-centric vs. Scene-centric Image-Text Cross-modal Retrieval: A Reproducibility Study" published at ECIR 2023.
The contents of this repository are licensed under the MIT license. If you modify its contents in any way, please link back to this repository.
First off, install the dependencies:
pip install -r requirements.txt
Download the data from this repository. The repository contains two zip files: CLIP_data.zip
and X-VLM_data.zip
After unzipping CLIP_data
put the resulting data
folder in the CLIP
folder:
CLIP/
data/
datasets/
results/
For evaluating X-VLM, we need to have access to the original images from CUB-200 (CUB), Amazon Berkley Objects (ABO), Fashion200k, MS COCO, and Flickr30k. We cannot redistribute the images, therefore, we ask you to download the images yourself. The images should be added to the X-VLM/image
directory, each dataset in its own subfolder folder.
Overall the X-VLM folder should organized as follows:
X-VLM/
data/
json_splits/
models/
images/
cub/
001.Black_footed_Albatross/*.jpg
...
174.Palm_Warbler/*.jpg
fashion200k/
women/
dresses/*.jpg
jackets/*.jpg
pants/*.jpg
skirts/*.jpg
tops/*.jpg
abo/
00/*.jpg
...
fe/*.jpg
mscoco/*.jpg
flickr30k/*.jpg
Evaluate each model on each of the five datasets.
# CLIP evaluation
sh CLIP/jobs/evaluation/evaluate_cub.job
sh CLIP/jobs/evaluation/evaluate_abo.job
sh CLIP/jobs/evaluation/evaluate_fashion200k.job
sh CLIP/jobs/evaluation/evaluate_mscoco.job
sh CLIP/jobs/evaluation/evaluate_flickr30k.job
# printing the results for CLIP in one file
sh CLIP/jobs/postprocessing/results_printer.job
# X-VLM evaluation
sh X-VLM/jobs/evaluation/evaluate_flickr30k.job
sh X-VLM/jobs/evaluation/evaluate_cub.job
sh X-VLM/jobs/evaluation/evaluate_abo.job
sh X-VLM/jobs/evaluation/evaluate_fashion200k.job
sh X-VLM/jobs/evaluation/evaluate_mscoco.job
# printing the results for X-VLM in one file
sh X-VLM/jobs/postprocessing/results_printer.job
If you find this repository helpful, feel free to cite our paper "Object-centric vs. Scene-centric Cross-modal Retrieval: A Reproducibility Study":
@inproceedings{hendriksen-2023-object-centric,
author = {Hendriksen, Mariya and Vakulenko, Svitlana and Kuiper, Ernst and de Rijke, Maarten},
booktitle = {ECIR 2023: 45th European Conference on Information Retrieval},
month = {April},
publisher = {Springer},
title = {Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study},
year = {2023}}