This is the official implementation of "Training Diverse High-Dimensional Controllers by Scaling Covariance Matrix Adaptation MAP-Annealing" by Bryon Tjanaka, Matthew C. Fontaine, David H. Lee, Aniruddha Kalkar, and Stefanos Nikolaidis.
To cite this paper, please use the following BibTeX:

```bibtex
@misc{tjanaka2023training,
  title={Training Diverse High-Dimensional Controllers by Scaling Covariance Matrix Adaptation MAP-Annealing},
  author={Bryon Tjanaka and Matthew C. Fontaine and David H. Lee and Aniruddha Kalkar and Stefanos Nikolaidis},
  year={2023},
  eprint={2210.02622},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}
```
We primarily use the pyribs library in this implementation. If you use this code in your research, please also cite pyribs:

```bibtex
@inproceedings{10.1145/3583131.3590374,
  author = {Tjanaka, Bryon and Fontaine, Matthew C and Lee, David H and Zhang, Yulun and Balam, Nivedit Reddy and Dennler, Nathaniel and Garlanka, Sujay S and Klapsis, Nikitas Dimitri and Nikolaidis, Stefanos},
  title = {Pyribs: A Bare-Bones Python Library for Quality Diversity Optimization},
  year = {2023},
  isbn = {9798400701191},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3583131.3590374},
  doi = {10.1145/3583131.3590374},
  abstract = {Recent years have seen a rise in the popularity of quality diversity (QD) optimization, a branch of optimization that seeks to find a collection of diverse, high-performing solutions to a given problem. To grow further, we believe the QD community faces two challenges: developing a framework to represent the field's growing array of algorithms, and implementing that framework in software that supports a range of researchers and practitioners. To address these challenges, we have developed pyribs, a library built on a highly modular conceptual QD framework. By replacing components in the conceptual framework, and hence in pyribs, users can compose algorithms from across the QD literature; equally important, they can identify unexplored algorithm variations. Furthermore, pyribs makes this framework simple, flexible, and accessible, with a user-friendly API supported by extensive documentation and tutorials. This paper overviews the creation of pyribs, focusing on the conceptual framework that it implements and the design principles that have guided the library's development. Pyribs is available at https://pyribs.org},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference},
  pages = {220–229},
  numpages = {10},
  keywords = {framework, quality diversity, software library},
  location = {Lisbon, Portugal},
  series = {GECCO '23}
}
```
- Manifest
- Getting Started
- Running Experiments
- Ablation on Archive Learning Rate
- Optimization Benchmarks
- Results
- Implementation
- Miscellaneous
- License
- `config/`: gin configuration files.
- `docs/`: Additional documentation.
- `src/`: Python implementations and related tools.
- `scripts/`: Bash scripts.
- `optimization_benchmarks/`: Benchmarks on optimization problems.
- Clone the repo:

  ```
  git clone https://github.com/icaros-usc/scaling-cma-mae.git --recurse-submodules
  ```
- Install Singularity: Our primary code runs in a Singularity / Apptainer container. Refer to the Apptainer documentation for installation instructions.
- Build or download the container: Build the Singularity container with

  ```
  sudo make container.sif
  ```

  Alternatively, download the container and place it in the root directory of this repo.
- (Optional) Install NVIDIA drivers and CUDA: The node where the main script runs should have a GPU with NVIDIA drivers and CUDA installed (we have not included CUDA in the container). This is only necessary if you are running algorithms which use TD3.
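If you are unsure whether a node is GPU-ready, a quick check with the standard NVIDIA tooling is:

```bash
# Lists visible GPUs along with the installed driver and CUDA versions.
nvidia-smi
```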
There are two commands for running experiments.

`scripts/run_local.sh` runs on a local machine:

```
bash scripts/run_local.sh CONFIG SEED NUM_WORKERS
```

Here, `CONFIG` is a gin file in `config/`, `SEED` is a random integer seed (we used values from 1-100), and `NUM_WORKERS` is the number of worker processes.

`scripts/run_slurm.sh` runs on a Slurm cluster:

```
bash scripts/run_slurm.sh CONFIG SEED HPC_CONFIG
```

Here, `HPC_CONFIG` is the path to a config in `config/hpc`. It specifies the number of nodes on the cluster and the number of workers per node.
In our paper, we evaluated three CMA-MAE variants (sep-CMA-MAE, LM-MA-MAE, OpenAI-MAE) and five baselines (CMA-MEGA (ES), CMA-MEGA (TD3, ES), PGA-MAP-Elites, ME-ES, MAP-Elites) in four environments (QD Ant, QD Half-Cheetah, QD Hopper, QD Walker). We have included config files for all of these experiments. To replicate results from the paper, you will need to run each of the following commands with different random seeds.
```
# QD Ant
bash scripts/run_slurm.sh config/qd_ant/sep_cma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_ant/lm_ma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_ant/openai_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_ant/cma_mega_es.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_ant/cma_mega_td3_es.gin SEED config/hpc/100_gpu.sh
bash scripts/run_slurm.sh config/qd_ant/pga_me.gin SEED config/hpc/100_gpu.sh
bash scripts/run_slurm.sh config/qd_ant/me_es.gin SEED config/hpc/100_high_mem.sh
bash scripts/run_slurm.sh config/qd_ant/map_elites.gin SEED config/hpc/100.sh

# QD Half-Cheetah
bash scripts/run_slurm.sh config/qd_half_cheetah/sep_cma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_half_cheetah/lm_ma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_half_cheetah/openai_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_half_cheetah/cma_mega_es.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_half_cheetah/cma_mega_td3_es.gin SEED config/hpc/100_gpu.sh
bash scripts/run_slurm.sh config/qd_half_cheetah/pga_me.gin SEED config/hpc/100_gpu.sh
bash scripts/run_slurm.sh config/qd_half_cheetah/me_es.gin SEED config/hpc/100_high_mem.sh
bash scripts/run_slurm.sh config/qd_half_cheetah/map_elites.gin SEED config/hpc/100.sh

# QD Hopper
bash scripts/run_slurm.sh config/qd_hopper/sep_cma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_hopper/lm_ma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_hopper/openai_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_hopper/cma_mega_es.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_hopper/cma_mega_td3_es.gin SEED config/hpc/100_gpu.sh
bash scripts/run_slurm.sh config/qd_hopper/pga_me.gin SEED config/hpc/100_gpu.sh
bash scripts/run_slurm.sh config/qd_hopper/me_es.gin SEED config/hpc/100_high_mem.sh
bash scripts/run_slurm.sh config/qd_hopper/map_elites.gin SEED config/hpc/100.sh

# QD Walker
bash scripts/run_slurm.sh config/qd_walker/sep_cma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_walker/lm_ma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_walker/openai_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_walker/cma_mega_es.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_walker/cma_mega_td3_es.gin SEED config/hpc/100_gpu.sh
bash scripts/run_slurm.sh config/qd_walker/pga_me.gin SEED config/hpc/100_gpu.sh
bash scripts/run_slurm.sh config/qd_walker/me_es.gin SEED config/hpc/100_high_mem.sh
bash scripts/run_slurm.sh config/qd_walker/map_elites.gin SEED config/hpc/100.sh
```
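Since each command must be repeated with different random seeds, a small shell loop can help. This is only a sketch; the seed values and config chosen here are examples, not the exact settings we used:

```bash
# Example only: launch five seeds of one configuration on Slurm.
for SEED in 1 2 3 4 5; do
  bash scripts/run_slurm.sh config/qd_ant/sep_cma_mae.gin "$SEED" config/hpc/100.sh
done
```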
To run locally, replace `run_slurm.sh` with `run_local.sh` and pass a number of workers instead of an HPC config:

```
bash scripts/run_local.sh config/qd_ant/sep_cma_mae.gin SEED 100
```
Regardless of whether experiments are run locally or on a cluster, all results are placed in a logging directory under `logs/`. The directory's name is of the form `logs/%Y-%m-%d_%H-%M-%S_dashed-name`, e.g. `logs/2020-12-01_15-00-30_experiment-1`. Refer to the logging directory manifest (below) for a list of files in the directory. `run_local.sh` and `run_slurm.sh` additionally output a separate directory which stores the stdout of the scheduler and workers; see Running Locally and Running on Slurm for more info.
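Because the directory names start with a timestamp, they sort lexicographically by creation time, so the most recent run can be located with a one-liner. A minimal sketch, assuming the default `logs/` layout described above:

```bash
# Example only: find the most recently created logging directory.
LATEST_LOGDIR="$(ls -d logs/*/ | sort | tail -n 1)"
echo "Latest run: $LATEST_LOGDIR"
```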
Refer to `src/analysis/figures.py` and `src/analysis/supplemental.py` for how to analyze results and generate figures.
The remainder of this section provides useful info for running experiments.
Each logging directory contains the following files:

```
- config.gin           # All experiment config variables, lumped into one file.
- seed                 # Text file containing the seed for the experiment.
- reload.pkl           # Data necessary to reload the experiment if it fails.
- reload_td3.pkl       # Pickle data for TD3 (only applicable in some experiments).
- reload_td3.pth       # PyTorch models for TD3 (only applicable in some experiments).
- metrics.json         # Metrics like QD score; intended for MetricLogger.
- all_results.pkl      # All returns and BCs from function evaluations during the run.
- hpc_config.sh        # Same as the config in the Slurm dir, if Slurm is used.
- archive/             # Snapshots of the full archive, including solutions and
                       # metadata, in pickle format.
- archive_history.pkl  # Stores objective values and behavior values necessary
                       # to reconstruct the archive. Solutions and metadata are
                       # excluded to save memory.
- dashboard_status.txt # Job status which can be picked up by dashboard scripts.
                       # Only used during execution.
- local_YYYY-MM-DD_HH-MM-SS/ # Local log dir (only exists if running locally).
  - experiment.out           # Output of the experiment.
  - scheduler.out            # Output of the scheduler.
  - workers.out              # Output of the workers.
  - logdir                   # File containing the name of the main logdir.
- slurm_YYYY-MM-DD_HH-MM-SS/ # Slurm log dir (only exists if using Slurm).
                             # There can be a few of these if there were reloads.
  - config/
    - [config].sh            # Copied from config/hpc.
  - job_ids.txt              # Job IDs; can be used to cancel the job (scripts/slurm_cancel.sh).
  - logdir                   # File containing the name of the main logdir.
  - scheduler.slurm          # Slurm script for scheduler and experiment invocation.
  - scheduler.out            # stdout and stderr from running scheduler.slurm.
  - worker-{i}.slurm         # Slurm script for worker i.
  - worker-{i}.out           # stdout and stderr for worker i.
```
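As an illustration of the manifest above, a few quick checks on a finished run might look like this (the directory name is a hypothetical example):

```bash
# Example only: inspect a finished run using files from the manifest above.
LOGDIR=logs/2020-12-01_15-00-30_experiment-1
cat "$LOGDIR/seed"         # The seed used for the run.
head "$LOGDIR/config.gin"  # First lines of the merged gin config.
ls "$LOGDIR/archive"       # Archive snapshots saved during the run.
```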
In addition to a logging directory, `run_slurm.sh` outputs a Slurm directory with items like the content of stdout on the scheduler and workers. To move these into the logging directory, run `slurm_postprocess.sh` (see below).
There are a number of helpful utilities associated with Slurm scripts. These reminders are output on the command line by `run_slurm.sh` after it executes:

- `tail -f ...` - Use this to monitor stdout and stderr of the main experiment script.
- `bash scripts/slurm_cancel.sh ...` - This will cancel the job.
- `ssh -N ...` - This will set up a tunnel from the HPC to your laptop so you can monitor the Dask dashboard. Run this on your local machine (see the sketch below).
- `bash scripts/slurm_postprocess.sh ...` - This will move the Slurm logs into the logging directory. Run it after the experiment has finished.
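The exact `ssh -N` command depends on your cluster. As a sketch, with hypothetical hostnames and user (8787 is Dask's default dashboard port, but your scheduler may use another):

```bash
# Example only: forward the Dask dashboard from the cluster to your laptop.
ssh -N -L 8787:SCHEDULER_NODE:8787 USER@cluster.example.edu
# Then open http://localhost:8787 in a local browser.
```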
You can monitor the status of your Slurm experiments with:

```
watch scripts/slurm_dashboard.sh
```
Since the dashboard output can be quite long, it can be useful to scroll through it. For this, consider an alternative to `watch`, such as viddy.
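For example, assuming viddy is installed, the invocation mirrors `watch`:

```bash
# Example only: a scrollable alternative to watch.
viddy scripts/slurm_dashboard.sh
```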
Similar to `run_slurm.sh`, `run_local.sh` outputs a local directory with items like the content of stdout on the scheduler and workers. To move these into the logging directory, run `local_postprocess.sh`.
You can monitor the status of your local experiments with:

```
watch scripts/local_dashboard.sh
```
To test an experiment configuration with smaller settings, add `_test` to the end of the config name, e.g. `config/qd_ant/cma_mega_es.gin_test`. Then, both the original config (`config/qd_ant/cma_mega_es.gin`) and `config/test.gin` will be included.
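For example, a quick local smoke test might look like this (the seed and worker count here are arbitrary):

```bash
# Example only: run a reduced test configuration locally.
bash scripts/run_local.sh config/qd_ant/cma_mega_es.gin_test 42 4
```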
While the experiment is running, its state is saved to "reload files" (AKA checkpoints) in the logging directory. If the experiment fails, e.g. due to memory limits, time limits, or network connection issues, run this command with the name of the existing logging directory:

```
bash scripts/slurm_reload.sh LOGDIR
```
This will continue the job with the exact same configuration as before. For finer-grained control, refer to the `-r` flag in `run_slurm.sh`. `run_local.sh` also provides an option for reloading:

```
bash scripts/run_local.sh CONFIG SEED NUM_WORKERS LOGDIR
```
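For instance, resuming a hypothetical local run (directory name is an example) could look like:

```bash
# Example only: resume a local run from its existing logging directory.
bash scripts/run_local.sh config/qd_ant/sep_cma_mae.gin 42 100 \
  logs/2020-12-01_15-00-30_experiment-1
```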
Corrected metrics (we call these Robustness in our code) are computed after experiments are completed with `src/analysis/robustness.py`, via the shell script `scripts/run_robustness_local.sh`. Running the script outputs several files with the robustness computations in each logging directory. Analysis of these outputs may be done with `src/analysis/figures.py` as follows:
```
# Aggregates logging directory data into `figure_data_robust.json`.
python -m src.analysis.figures collect [YOUR_MANIFEST_FILE] --robust --output figure_data_robust.json

# We don't run `comparison` as there is no intermediate data to plot.

# Statistical analysis; output in `stats_tests_robust`.
python -m src.analysis.figures tests figure_data_robust.json --output stats_tests_robust

# Table; output in `results_single_table_robust.tex`.
python -m src.analysis.figures single_table figure_data_robust.json --output results_single_table_robust.tex
```
We also run an ablation examining how the choice of alpha affects the performance of sep-CMA-MAE. This ablation is run in the same manner as the experiments above, but with the following commands. Note that the default config for sep-CMA-MAE (`sep_cma_mae.gin`) uses alpha=0.001.
```
# QD Ant
bash scripts/run_slurm.sh config/qd_ant/sep_cma_mae_0-0.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_ant/sep_cma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_ant/sep_cma_mae_0-01.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_ant/sep_cma_mae_0-1.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_ant/sep_cma_mae_1-0.gin SEED config/hpc/100.sh

# QD Half-Cheetah
bash scripts/run_slurm.sh config/qd_half_cheetah/sep_cma_mae_0-0.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_half_cheetah/sep_cma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_half_cheetah/sep_cma_mae_0-01.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_half_cheetah/sep_cma_mae_0-1.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_half_cheetah/sep_cma_mae_1-0.gin SEED config/hpc/100.sh

# QD Hopper
bash scripts/run_slurm.sh config/qd_hopper/sep_cma_mae_0-0.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_hopper/sep_cma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_hopper/sep_cma_mae_0-01.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_hopper/sep_cma_mae_0-1.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_hopper/sep_cma_mae_1-0.gin SEED config/hpc/100.sh

# QD Walker
bash scripts/run_slurm.sh config/qd_walker/sep_cma_mae_0-0.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_walker/sep_cma_mae.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_walker/sep_cma_mae_0-01.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_walker/sep_cma_mae_0-1.gin SEED config/hpc/100.sh
bash scripts/run_slurm.sh config/qd_walker/sep_cma_mae_1-0.gin SEED config/hpc/100.sh
```
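As with the main experiments, these commands must be repeated across seeds. A sketch for one environment (the seed here is an example):

```bash
# Example only: run all five alpha settings for QD Ant with one seed.
for CFG in sep_cma_mae_0-0 sep_cma_mae sep_cma_mae_0-01 sep_cma_mae_0-1 sep_cma_mae_1-0; do
  bash scripts/run_slurm.sh "config/qd_ant/$CFG.gin" 1 config/hpc/100.sh
done
```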
To prevent name conflicts with the main results, the analysis should avoid the default output names:
```
# Aggregates logging directory data into `figure_data_alpha.json`.
python -m src.analysis.figures collect [YOUR_MANIFEST_FILE] --key AlphaAblation --output figure_data_alpha.json

# Plots comparison figures in the `comparison_alpha/` directory.
python -m src.analysis.figures comparison figure_data_alpha.json --output comparison_alpha

# Table; output in `results_single_table_alpha.tex`.
python -m src.analysis.figures single_table figure_data_alpha.json --output results_single_table_alpha.tex
```
The robustness / corrected-metrics computation can also be run as described earlier, with analysis performed as follows:
```
# Aggregates logging directory data into `figure_data_alpha_robust.json`.
python -m src.analysis.figures collect [YOUR_MANIFEST_FILE] --key AlphaAblation --robust --output figure_data_alpha_robust.json

# Table; output in `results_single_table_alpha_robust.tex`.
python -m src.analysis.figures single_table figure_data_alpha_robust.json --notime --output results_single_table_alpha_robust.tex
```
We also perform a study to understand the performance of CMA-MAE and its variants on low-dimensional optimization benchmarks. The source code for this study is located in `optimization_benchmarks/` and is run in a separate Conda environment. However, the analysis code is shared with this main repo and is run in the same container as the main experiments of this paper. This study may be run as follows.
- This code requires the Kheperax submodule. If you did not already pull the submodule, do so now with:

  ```
  git submodule update --init
  ```

- Change into `optimization_benchmarks/`:

  ```
  cd optimization_benchmarks
  ```

- Set up a separate Conda environment:

  ```
  conda create --prefix ./env python=3.8
  conda activate ./env
  ```

- Install the requirements:

  ```
  pip install -r requirements.txt
  pip install -r ../Kheperax/requirements.txt
  ```

- (Optional) Kheperax runs with JAX and will be significantly faster with GPU support. To install JAX with GPU support, refer to the JAX installation instructions.

- Run the experiments. To replicate our paper, first run 10 trials of CMA-MAE and its variants in both the sphere and arm domains, with both 100 and 1000 dimensions. `experiment_parallel.py` runs multiple experiments in parallel for convenience. The exact commands are as follows:

  ```
  python experiment_parallel.py sphere cma_mae 100 10
  python experiment_parallel.py sphere sep_cma_mae 100 10
  python experiment_parallel.py sphere lm_ma_mae 100 10
  python experiment_parallel.py sphere openai_mae 100 10
  python experiment_parallel.py sphere cma_mae 1000 10
  python experiment_parallel.py sphere sep_cma_mae 1000 10
  python experiment_parallel.py sphere lm_ma_mae 1000 10
  python experiment_parallel.py sphere openai_mae 1000 10
  python experiment_parallel.py arm cma_mae 100 10
  python experiment_parallel.py arm sep_cma_mae 100 10
  python experiment_parallel.py arm lm_ma_mae 100 10
  python experiment_parallel.py arm openai_mae 100 10
  python experiment_parallel.py arm cma_mae 1000 10
  python experiment_parallel.py arm sep_cma_mae 1000 10
  python experiment_parallel.py arm lm_ma_mae 1000 10
  python experiment_parallel.py arm openai_mae 1000 10
  ```

  Next, run 10 trials of each method on the maze domain. We have separated this script into `maze.py`. Each call to `maze.py` involves passing the algorithm and seed as shown below; you will need to run each command 10 times with different seeds (see the loop sketch after this list).

  ```
  python maze.py arm cma_mae [SEED]
  python maze.py arm sep_cma_mae [SEED]
  python maze.py arm lm_ma_mae [SEED]
  python maze.py arm openai_mae [SEED]
  ```
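A sketch of the full maze sweep over algorithms and seeds; the positional arguments mirror the commands above, and the seed range follows the 10-trial setup:

```bash
# Example only: 10 seeds of each algorithm on the maze domain.
for ALGO in cma_mae sep_cma_mae lm_ma_mae openai_mae; do
  for SEED in $(seq 1 10); do
    python maze.py arm "$ALGO" "$SEED"
  done
done
```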
If you wish to modify the configuration for the experiments, see the `CONFIG` variables in `experiment.py` and `maze.py`.
The analysis for the low-dimensional study is done in the main directory of this repo, within the Singularity container used in the main experiments. To perform this analysis, first create a manifest file containing the logging directories as described in `src/analysis/figures.py`. Then, run the following commands within the container. To start a shell in the container, you can use `make shell` (see the `Makefile`) or run `singularity shell container.sif`.
```
# Aggregates logging directory data into `figure_data_optbench.json`.
python -m src.analysis.figures_optbench collect [YOUR_MANIFEST_FILE]

# Plots comparison figures in the `comparison_optbench/` directory.
python -m src.analysis.figures_optbench comparison

# Statistical analysis; output in `stats_tests_optbench`.
python -m src.analysis.figures_optbench tests

# Table; output in `results_table_optbench.tex`.
python -m src.analysis.figures_optbench table
```
Refer to our paper for results in each study.
Each experiment is structured as follows: when we run an experiment, we connect to a Dask scheduler (Dask is the distributed compute library we use), which is in turn connected to one or more Dask workers. Each component runs in a Singularity container.
The algorithm implementations are primarily located in the following files:

- CMA-MAE variants: `src/emitters/annealing_emitter.py`, `src/emitters/opt`
- CMA-MEGA (ES) and CMA-MEGA (TD3, ES): `src/emitters/gradient_improvement_emitter.py`
- PGA-ME: `src/emitters/pga_emitter.py`, `src/emitters/gaussian_emitter.py`
- ME-ES: `src/me_es/` (adapted from the authors' implementation)
- MAP-Elites: `src/emitters/gaussian_emitter.py`

Other notable files:

- `src/main.py`: Entry point for all experiments.
- `src/manager.py`: Handles all experiments that are implemented with pyribs.
- `src/objectives/gym_control/`: Code that evaluates all solutions in the QDGym environments.
The Makefile has several useful commands. Run `make` for a full command reference.
There are some tests alongside the code to ensure basic correctness. To run these, start a Singularity container with:

```
make shell
```

Within that container, execute:

```
make test
```
To understand the code, it will be useful to be familiar with the libraries we build on, particularly pyribs (QD algorithms), Dask (distributed compute), and gin (configuration).
- In the codebase, we refer to `behavior_values` and `BCs` (behavior characteristics). These are synonymous with `measures` in the paper.
- We use `PGA-ME` and `PGA-MAP-Elites` interchangeably in the code.
- We also use `iterations` and `generations` interchangeably.
- In our code (specifically `src/manager.py`), we measure `Robustness` on every iteration. However, this metric is only the robustness of the best-performing solution. Instead, we compute the Corrected Metrics described in the paper in a separate script (`src/analysis/robustness.py`) after experiments are completed.
This code is released under the MIT License, with the following exceptions:

- The `src/me_es/` directory is derived from the repository of Colas 2020 and is released under the Uber Non-Commercial License.
- The `src/qd_gym/` directory is adapted from Olle Nilsson's QDgym and is released under the MIT License.