Authors: Milton L. Montero, Jeffrey S. Bowers, Rui Ponte Costa, Casimir J.H. Ludwig and Gaurav Malhotra.
Abstract: Recent research has shown that generative models with highly disentangled representations fail to generalise to unseen combinations of generative factor values. These findings contradict earlier research which showed improved performance in out-of-training-distribution settings when compared to entangled representations. Additionally, it is not clear if the reported failures are due to (a) encoders failing to map novel combinations to the proper regions of the latent space, or (b) novel combinations being mapped correctly but the decoder being unable to render the correct output for the unseen combinations. We investigate these alternatives by testing several models on a range of datasets and training settings. We find that (i) when models fail, their encoders also fail to map unseen combinations to correct regions of the latent space and (ii) when models succeed, it is either because the test conditions do not exclude enough examples, or because the excluded cases involve combinations of object properties with the object's shape. We argue that to generalise properly, models not only need to capture factors of variation, but also to understand how to invert the process that causes the visual input.
This repo contains the code necessary to run the experiments for the article. The code was tested on Python 3.9.12 and PyTorch 1.11. There are implementations for:
- three models:
  - CompositionNet: Solves the composition task using a variational autoencoder backbone.
  - CascadeVAE: Uses continuous and discrete variables in its latent space.
  - LieGroupVAE: Models interactions between latent variables using Group Theory.
- losses to train them (see the sketch just below this list):
  - VAE: Penalizes the conditional posterior.
  - $\beta$-VAE: Adds a capacity constraint.
  - WAE: Penalizes the marginal posterior.
  - Information Cascade: Progressively allows latent variables to become non-zero.
- five datasets to test the models on:
  - dSprites: Simple, uniform sprites on a black background to which several transformations are applied.
  - 3DShapes: 3D scenes with one object in a room, observed from different perspectives.
  - MPI3D: Different frames of objects being manipulated by a robot arm.
  - Circles: Dataset consisting of a circle in different positions of an image.
  - Simple: Extension of the Circles dataset containing two shapes instead of one.
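For orientation, here is a sketch of how the first three objectives differ, using the standard formulations from the literature rather than anything copied verbatim from this codebase ($q_\phi$ is the encoder, $p_\theta$ the decoder, $p(z)$ the prior, $c$ a reconstruction cost, and $\mathcal{D}$ a divergence):

$$\mathcal{L}_{\text{VAE}} = -\,\mathbb{E}_{q_\phi(z|x)}\!\left[\log p_\theta(x|z)\right] + D_{\mathrm{KL}}\!\left(q_\phi(z|x)\,\|\,p(z)\right)$$

$$\mathcal{L}_{\beta\text{-VAE}} = -\,\mathbb{E}_{q_\phi(z|x)}\!\left[\log p_\theta(x|z)\right] + \beta\, D_{\mathrm{KL}}\!\left(q_\phi(z|x)\,\|\,p(z)\right)$$

$$\mathcal{L}_{\text{WAE}} = \mathbb{E}_{q_\phi(z|x)}\!\left[c\big(x, g_\theta(z)\big)\right] + \lambda\,\mathcal{D}\!\left(q_\phi(z)\,\|\,p(z)\right)$$

The VAE and $\beta$-VAE regularise the conditional posterior $q_\phi(z|x)$, whereas the WAE penalises the marginal (aggregate) posterior $q_\phi(z) = \mathbb{E}_x\!\left[q_\phi(z|x)\right]$. The capacity-constrained $\beta$-VAE variant (Burgess et al., 2018) instead uses $\gamma\,\lvert D_{\mathrm{KL}} - C\rvert$ with a capacity $C$ that is annealed during training.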
It is also technically possible to train other common unsupervised models (like standard VAEs and WAEs) using the losses listed above.
Running these experiments requires (among others) the following libraries:
- PyTorch and Torchvision: Basic framework for Deep Learning models and training.
- Ignite: High-level framework to train models, eliminating the need for much boilerplate code.
- Sacred: Library used to define and run experiments in a systematic way.
- Matplotlib: For plotting.
- Jupyter: To produce the plots.
We recommend using the provided environment configuration file and installing with:
```bash
conda env create -f torchlab-env.yml
```
The repository is organized as follows:
```
data/
├── raw/
│   ├── dsprites/
│   ├── shapes3d/
│   └── mpi/
│       ├── ....
│       └── mpi3d_real.npz
└── sims/
    ├── disent/          # Runs will be added here; Sacred assigns increasing integers as run names
    └── composition/
scripts/
├── configs/
│   └── vaes.py          # An example config file with VAE architectures.
├── ingredients/
│   └── models.py        # Example ingredient that wraps model initialization
└── experiments/
    └── composition.py   # Experiment script for training disentangled models
src/
├── analysis/            # These folders contain the actual datasets, losses, model classes etc.
├── dataset/
├── models/
└── training/
```
The folder structure should be mostly self-explanatory. The main thing to note is that `src` contains the code for the models, losses and datasets used throughout the experiments, while the ingredients in `scripts` contain wrappers around these to initialize them from the configuration files. Simulation results are saved in `sims`. The results of the analysis were stored in a new folder (`results`, not shown). We attempted to use the models with the highest disentanglement in our analysis.
Datasets should appear in a subfolder as shown above. Right now there is no method for automatically downloading the data, but the datasets can be found in their corresponding repositories. Alternatively, altering the source file or passing the dataset root as a parameter can be used to look for the datasets in another location.[^1]
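For instance, something along these lines should work, assuming the dataset ingredient exposes the root directory as a config parameter (here hypothetically called `root`; check `scripts/ingredients/` for the actual name):

```bash
# Hypothetical override of the dataset location via Sacred's dotted config
# syntax; the parameter name `root` is an assumption, not confirmed by the repo.
python -m experiments.composition with dataset.dsprites dataset.root=/path/to/my/datasets
```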
The configuration folder contains the different parameter combinations used in the experiments. Following these should make it easy to define new experiments. Just remember to add the configurations to the appropriate ingredient using `ingredient.named_config(config_function/yaml_file)`.
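As a minimal sketch of what such a config could look like (the ingredient and config names here are hypothetical, not taken from the repo's files):

```python
from sacred import Ingredient

# Hypothetical ingredient; the repo defines its own in scripts/ingredients/.
model = Ingredient('model')

@model.named_config
def big_latent():
    """Named config that enlarges the latent space."""
    latent_size = 50  # overrides whatever default the ingredient config sets
```

Once registered, the config can be selected on the command line like the built-in options, e.g. `with model.big_latent`.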
To run an experiment, execute one of the scripts from the `scripts` folder with the appropriate options. We use Sacred to run and track experiments; you can check its online documentation to understand how it works. Below is the general command used, and more examples can be found in the `bin` folder.
```bash
cd ~/path/to/project/scripts/
python -m experiments.composition with dataset.<option> model.<option> training.<option>
```
Sacred also allows overriding individual parameters using keyword arguments. For example, we can change the latent size and the $\beta$ value of the loss:

```bash
python -m experiments.composition with dataset.dsprites model.kim training.factor model.latent_size=50 training.loss.params.beta=10
```
We would like to thank everyone who gave feedback on this research, especially the members of the Mind and Machine Research Lab and Neural and Machine Learning Group.
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 741134).
If the code here helps with your research, please cite it as:
```bibtex
@article{montero2022lost,
  title={Lost in Latent Space: Disentangled Models and the Challenge of Combinatorial Generalisation},
  author={Montero, Milton L and Bowers, Jeffrey S and Costa, Rui Ponte and Ludwig, Casimir JH and Malhotra, Gaurav},
  journal={arXiv preprint arXiv:2204.02283},
  year={2022}
}
```
[^1]: I might add code to automatically download the datasets and create the folders, but only if I have time.