Goal: We analyze different optimization approaches and compare their performance across deep reinforcement learning (DRL) applications, aiming to understand why and how they perform differently. In particular, we focus on gradient-based approaches and gradient-free, evolution-based optimization methods.
| Environment | CartPole-v1 | BipedalWalker-v2 |
| --- | --- | --- |
| Gradient-based optimization | DQN * | TD3 ** |
| Gradient-free optimization | GA * | GA *** |
* feed-forward neural network consisting of 1 hidden layer with 24 units
** feed-forward neural networks consisting of 2 hidden layers with 512, 256 units
*** feed-forward neural networks consisting of 3 hidden layers with 128, 128, 3 units
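To give a feel for the size of the search space these architectures imply, the following sketch counts the trainable parameters of a fully connected network. It assumes one bias per unit, and it uses the standard CartPole-v1 spaces (4 observations, 2 discrete actions) as input and output sizes; `count_parameters` is a hypothetical helper, not a function from this repository.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input and output included.
    """
    return sum(
        n_in * n_out + n_out  # weight matrix plus one bias per output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Assumption: CartPole-v1 network with 4 inputs, 24 hidden units, 2 outputs.
cartpole_params = count_parameters([4, 24, 2])
print(cartpole_params)  # 170
```

Under these assumptions, a gradient-free method for CartPole-v1 searches a 170-dimensional weight space.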
Evaluation metrics: we compare the algorithms according to their convergence time, the agent's total reward, the distances between weights, and the Hessian of the reward function.
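As an illustration of the last two metrics, here is a minimal sketch that measures the Euclidean distance between two flattened weight vectors and estimates a Hessian by central finite differences. Both function names and the toy reward are assumptions made for this example; the analysis code in the repository may compute these quantities differently.

```python
import math


def weight_distance(w_a, w_b):
    """Euclidean distance between two flattened weight vectors."""
    if len(w_a) != len(w_b):
        raise ValueError("weight vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_a, w_b)))


def numerical_hessian(f, x, eps=1e-4):
    """Central finite-difference estimate of the Hessian of f at x."""
    n = len(x)
    hessian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f with coordinates i and j moved by +/- eps.
                y = list(x)
                y[i] += di * eps
                y[j] += dj * eps
                return f(y)

            hessian[i][j] = (
                shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)
            ) / (4 * eps ** 2)
    return hessian


def toy_reward(w):
    # Hypothetical stand-in for a reward surface; its exact Hessian is
    # [[-2, 1], [1, -6]] everywhere.
    return -(w[0] ** 2 + 3 * w[1] ** 2) + w[0] * w[1]


print(weight_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
H = numerical_hessian(toy_reward, [0.5, -0.2])
```

For a real agent the reward is noisy and the weight vector is large, so a full finite-difference Hessian like this one is only practical for small networks or low-dimensional slices of the weight space.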
Create a virtual environment and install all required packages:
conda create --name deep-learning python=3.6
source activate deep-learning
pip install -r requirements.txt
In config.yml, one can choose which OpenAI Gym environment and optimization algorithm to use (all available options are listed at the top of the file). For example:
name: 'CartPole-v1'
animate: False
algorithm: 'ga'
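A quick sanity check of these settings could look like the sketch below. It assumes the three keys sit at the top level of the parsed configuration, and the set of algorithm identifiers is guessed from the package names under src; only 'ga' is confirmed by the example above. `validate_config` is a hypothetical helper, not part of this repository.

```python
# Assumed algorithm identifiers, guessed from the package names in src;
# only 'ga' is confirmed by the example configuration.
ASSUMED_ALGORITHMS = {"a2c", "ddpg", "td3", "ga", "dqn", "es", "cma_es"}


def validate_config(config):
    """Return a list of problems found in a flat config mapping."""
    problems = []
    if not isinstance(config.get("name"), str):
        problems.append("'name' must be a Gym environment id such as 'CartPole-v1'")
    if not isinstance(config.get("animate"), bool):
        problems.append("'animate' must be True or False")
    if config.get("algorithm") not in ASSUMED_ALGORITHMS:
        problems.append("'algorithm' must be one of %s" % sorted(ASSUMED_ALGORITHMS))
    return problems


# The example values, as they would look once parsed from YAML.
config = {"name": "CartPole-v1", "animate": False, "algorithm": "ga"}
print(validate_config(config))  # []
```

The authoritative list of valid values is the one at the top of config.yml.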
For each environment, we defined a specific neural network architecture for the evolutionary algorithms in src/config/models.yml. This file also contains the optimization algorithm parameters we used to train the agents.
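To show what such parameters typically control, here is a minimal sketch of a genetic algorithm over a flat weight vector, using truncation selection, elitism, and Gaussian mutation. The parameter names (`pop_size`, `n_elite`, `sigma`, `generations`) and the toy fitness are assumptions for illustration; the GA package in this repository may use different operators and different names in models.yml.

```python
import random


def genetic_algorithm(fitness, dim, pop_size=20, n_elite=5, sigma=0.1,
                      generations=50, seed=0):
    """Truncation-selection GA with Gaussian mutation and no crossover."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank individuals by fitness, best first.
        population.sort(key=fitness, reverse=True)
        elite = population[:n_elite]
        # Keep the elite unchanged and refill with mutated copies of elite members.
        children = []
        while len(elite) + len(children) < pop_size:
            parent = rng.choice(elite)
            children.append([w + rng.gauss(0.0, sigma) for w in parent])
        population = elite + children
    return max(population, key=fitness)


def toy_fitness(weights):
    # Hypothetical stand-in for an episode return; maximal (0) at [1, 1, 1].
    return -sum((w - 1.0) ** 2 for w in weights)


best = genetic_algorithm(toy_fitness, dim=3)
print(best, toy_fitness(best))
```

In the DRL setting, `fitness` would run one or more episodes with the network weights set to the candidate vector and return the collected reward. Because the elite is carried over unchanged, the best fitness in this sketch never decreases between generations.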
Before starting training, make sure that you have set the desired environment and optimization algorithm in config.yml, then run:
python src/main.py
If you are using a machine without a display, please run the following instead:
xvfb-run -s "-screen 0 1400x900x24" python src/main.py
The analysis of the different DRL optimization algorithms can be found in results and notebooks.
├── config.yml # configuration file
├── src
│ ├── config # configuration loading package
│ ├── A2C # A2C package
│ ├── DDPG # Deep Deterministic Policy Gradients package
│ ├── TD3 # TD3 package
│ ├── GA # Genetic Algorithm package
│ ├── DQN # Deep Q Learning package
│ ├── ES # Evolution Strategies package
│ ├── CMA_ES # Covariance Matrix Adaptation ES package
│ ├── population # population package for evolutionary algorithms
│ ├── main.py # main
│ ├── optimizers.py # base gradient-free optimizer
│ ├── loss_analysis.py # functions for loss analysis
│ ├── visualization.py # visualization for analysis
│ └── utils.py # helper functions
├── notebooks # notebooks with results analysis
├── results # folder containing training results and analysis plots
├── runs.yml # runs log file
└── requirements.txt # list of all packages used
Note: the code has been tested on macOS and Ubuntu computers, on a Google Cloud Platform virtual machine, and partially on the ETH Leonhard cluster.