This repository contains the code for the TMLR paper *Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning* by Maurits Bleeker, Mariya Hendriksen, Andrew Yates, and Maarten de Rijke (University of Amsterdam, The Netherlands).
The implementation builds upon the codebase of Latent Target Decoding.
- Jul 2024: The paper has been accepted by TMLR
- Feb 2024: Initial arXiv release
To set up the environment, install the requirements using the provided YAML file:
```bash
conda env create --file src/environment.yaml
```

This command creates a conda environment named `contrastive-shortcuts`. Activate the created environment:

```bash
source activate contrastive-shortcuts
```
For local development, execute the following command:
```bash
python src/trainer.py --yaml_file src/configs/{f30k, coco}/development_local.yaml
```
To train a model, run `python src/trainer.py` and provide a base config in YAML format using `--yaml_file <config path.yaml>`.
Hyperparameters can be overridden using command line flags. For example:
```bash
python src/trainer.py --yaml_file src/configs/f30k/development_local.yaml --experiment.wandb_project <your project name>
```
The recommended approach is to have a fixed base config for each experiment and only modify specific hyperparameters for different training/evaluation settings.
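As an illustration of this override mechanism, a dotted flag such as `--experiment.wandb_project` can be merged into a base YAML config along the lines of the sketch below. This is a hypothetical example, not the repository's actual parsing code; `apply_override` and `load_config` are illustrative names.

```python
# Hypothetical sketch of merging dotted CLI overrides into a base YAML config;
# not the repository's actual implementation.
import sys
import yaml  # requires PyYAML


def apply_override(config: dict, dotted_key: str, value: str) -> None:
    """Set config['a']['b'] = value for a dotted key 'a.b'."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def load_config(yaml_file: str, overrides: list) -> dict:
    """Load a base YAML config and apply --key.subkey value overrides."""
    with open(yaml_file) as f:
        config = yaml.safe_load(f)
    for flag, value in zip(overrides[::2], overrides[1::2]):
        apply_override(config, flag.lstrip("-"), value)
    return config


if __name__ == "__main__":
    # e.g. python merge_config.py base.yaml --experiment.wandb_project my-project
    print(load_config(sys.argv[1], sys.argv[2:]))
```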
All training and evaluation were conducted using a SLURM-based scheduling system.
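The `.job` files under `src/jobs/` are submitted with `sbatch`; a minimal sketch of such a job file, with placeholder resource values and paths rather than the repository's actual settings, looks like this:

```bash
#!/bin/bash
#SBATCH --job-name=shortcuts-train    # placeholder job name
#SBATCH --partition=gpu               # adjust to your cluster's partition
#SBATCH --gres=gpu:1                  # placeholder resource request
#SBATCH --time=24:00:00               # placeholder time limit

source activate contrastive-shortcuts
python src/trainer.py --yaml_file src/configs/f30k/development_local.yaml
```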
We implemented a PyTorch `Dataloader` class that loads the images from the memory of the compute node on which training runs. The captions are loaded from either the Flickr30k or MS-COCO annotation file.
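A minimal sketch of what such an in-memory image-caption dataset can look like is given below; the class name, annotation format, and field names are hypothetical and do not mirror the repository's implementation.

```python
# Hypothetical sketch of an in-memory image-caption dataset; class name,
# annotation format, and field names are illustrative only.
import json

from PIL import Image
from torch.utils.data import Dataset


class InMemoryCaptionDataset(Dataset):
    def __init__(self, img_path, annotation_file, transform=None):
        # Assumes a JSON list of {"image": <filename>, "caption": <text>} entries.
        with open(annotation_file) as f:
            self.annotations = json.load(f)
        self.transform = transform
        # Decode every image into RAM once, so __getitem__ never touches disk.
        self.images = {
            ann["image"]: Image.open(f"{img_path}/{ann['image']}").convert("RGB")
            for ann in self.annotations
        }

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        image = self.images[ann["image"]]
        if self.transform is not None:
            image = self.transform(image)
        return image, ann["caption"]
```

Such a dataset can then be wrapped in a regular `torch.utils.data.DataLoader` for training.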
Update the `*.yaml` config with the right file paths:

```yaml
img_path:
annotation_file:
annotation_path:
```
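For example, the entries could be filled in as follows; the paths below are placeholders and should point to your own data locations.

```yaml
# Placeholder paths; replace with your own data locations.
img_path: /scratch/data/f30k/images
annotation_file: /scratch/data/f30k/annotations.json
annotation_path: /scratch/data/f30k/annotations
```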
To create the vocabulary class, run the following script with the appropriate input flags:

```bash
python utils/vocab.py
```
Job and hyperparameter files to reproduce the experiments can be found in `src/jobs/{coco, f30k}/`.
The shortcut experiments (Section 4) are in the `shortcuts` folder, the LTD experiments in the `LTD` folder, and the IFM experiments (Section 6) in the `IFM` folder.
To reproduce the results from Section 3, run the following evaluation script (ensure the file paths are correct):

```bash
sbatch src/jobs/{coco, f30k}/snellius/shortcuts/{clip, vse}/{clip, vse}_{coco, f30k}_shortcut_experiments_eval.job
```

Next, copy all the RSUM values into `notebooks/visualizations/visualization.ipynb` to generate the plot.
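RSUM commonly refers, in the image-caption retrieval literature, to the sum of Recall@1, Recall@5, and Recall@10 for both image-to-text and text-to-image retrieval. A minimal sketch of the computation, with arbitrary example numbers, is shown below.

```python
def rsum(i2t_recalls, t2i_recalls):
    """Sum of Recall@{1, 5, 10} for image-to-text and text-to-image retrieval."""
    return sum(i2t_recalls) + sum(t2i_recalls)


# Arbitrary example values (R@1, R@5, R@10), given as percentages.
print(rsum((58.4, 81.5, 88.1), (41.3, 71.1, 81.2)))  # 421.6
```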
The results from Section 6 are generated using `notebooks/Evaluation.ipynb`.
If you find this repository helpful, feel free to cite our paper "Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning":
```bibtex
@article{bleeker-2024-demonstrating,
  title   = {Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning},
  author  = {Bleeker, Maurits and Hendriksen, Mariya and Yates, Andrew and de Rijke, Maarten},
  journal = {Transactions on Machine Learning Research},
  url     = {https://openreview.net/forum?id=gfANevPraH},
  year    = {2024}
}
```