Object-centric vs. Scene-centric Image-Text Cross-modal Retrieval

This repository contains the code used for the experiments in "Object-centric vs. Scene-centric Image-Text Cross-modal Retrieval: A Reproducibility Study" published at ECIR 2023.

License

The contents of this repository are licensed under the MIT license. If you modify its contents in any way, please link back to this repository.

Reproducing Experiments

First off, install the dependencies:

pip install -r requirements.txt

Download the data

Download the data from this repository. The repository contains two zip files: CLIP_data.zip and X-VLM_data.zip

After unzipping CLIP_data put the resulting data folder in the CLIP folder:

CLIP/
    data/
        datasets/
        results/

For evaluating X-VLM, we need to have access to the original images from CUB-200 (CUB), Amazon Berkley Objects (ABO), Fashion200k, MS COCO, and Flickr30k. We cannot redistribute the images, therefore, we ask you to download the images yourself. The images should be added to the X-VLM/image directory, each dataset in its own subfolder folder. Overall the X-VLM folder should organized as follows:

X-VLM/
    data/
        json_splits/
        models/
    images/
        cub/
            001.Black_footed_Albatross/*.jpg
            ...
            174.Palm_Warbler/*.jpg
        fashion200k/
            women/
                dresses/*.jpg
                jackets/*.jpg
                pants/*.jpg
                skirts/*.jpg
                tops/*.jpg
        abo/
            00/*.jpg
            ...
            fe/*.jpg
        mscoco/*.jpg
        flickr30k/*.jpg

Evaluate the models

Evaluate each model on each of the five datasets.

# CLIP evaluation
sh CLIP/jobs/evaluation/evaluate_cub.job
sh CLIP/jobs/evaluation/evaluate_abo.job
sh CLIP/jobs/evaluation/evaluate_fashion200k.job
sh CLIP/jobs/evaluation/evaluate_mscoco.job
sh CLIP/jobs/evaluation/evaluate_flickr30k.job
# printing the results for CLIP in one file
sh CLIP/jobs/postprocessing/results_printer.job

# X-VLM evaluation
sh X-VLM/jobs/evaluation/evaluate_flickr30k.job 
sh X-VLM/jobs/evaluation/evaluate_cub.job 
sh X-VLM/jobs/evaluation/evaluate_abo.job 
sh X-VLM/jobs/evaluation/evaluate_fashion200k.job 
sh X-VLM/jobs/evaluation/evaluate_mscoco.job 
# printing the results for X-VLM in one file
sh X-VLM/jobs/postprocessing/results_printer.job

Citing and Authors

If you find this repository helpful, feel free to cite our paper "Object-centric vs. Scene-centric Cross-modal Retrieval: A Reproducibility Study":

@inproceedings{hendriksen-2023-object-centric,
author = {Hendriksen, Mariya and Vakulenko, Svitlana and Kuiper, Ernst and de Rijke, Maarten},
booktitle = {ECIR 2023: 45th European Conference on Information Retrieval},
month = {April},
publisher = {Springer},
title = {Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study},
year = {2023}}

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
CLIP		CLIP
X-VLM		X-VLM
images		images
.gitignore		.gitignore
README.md		README.md
clip-ita.png		clip-ita.png
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Object-centric vs. Scene-centric Image-Text Cross-modal Retrieval

License

Reproducing Experiments

Download the data

Evaluate the models

Citing and Authors

About

Releases

Packages

Languages

mariyahendriksen/ecir23-object-centric-vs-scene-centric-CMR

Folders and files

Latest commit

History

Repository files navigation

Object-centric vs. Scene-centric Image-Text Cross-modal Retrieval

License

Reproducing Experiments

Download the data

Evaluate the models

Citing and Authors

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages