# PyTorch implementation of D4PG

This repository contains a PyTorch implementation of D4PG that uses IQN as the distributional critic instead of C51. The extensions Munchausen RL and D2RL are also included and can be combined with D4PG as needed.
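
For reference, an IQN-style critic is trained with a quantile Huber loss over sampled quantile fractions. The snippet below is a minimal sketch of that loss, not the repository's exact implementation; the tensor shapes, reduction order, and the `kappa` threshold are assumptions.

```python
import torch

def quantile_huber_loss(td_errors, taus, kappa=1.0):
    """Minimal sketch of the quantile Huber loss used by IQN-style critics.

    td_errors: (batch, n_target_quantiles, n_quantiles) pairwise TD errors
    taus:      (batch, n_quantiles) quantile fractions sampled uniformly in (0, 1)
    """
    # Element-wise Huber loss with threshold kappa
    huber = torch.where(td_errors.abs() <= kappa,
                        0.5 * td_errors.pow(2),
                        kappa * (td_errors.abs() - 0.5 * kappa))
    # Asymmetric quantile weighting: |tau - 1{td_error < 0}|
    weight = (taus.unsqueeze(1) - (td_errors.detach() < 0).float()).abs()
    return (weight * huber / kappa).sum(dim=2).mean(dim=1).mean()
```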

## Dependencies

Trained and tested on:

- Python 3.6
- PyTorch 1.4.0
- Numpy 1.15.2
- gym 0.10.11

## How to use

The script `run.py` combines all extensions; each add-on can be enabled simply by setting the corresponding flag.

`python run.py -info your_run_info`

To see all available options: `python run.py -h`

## Observe training results

`tensorboard --logdir=runs`

## Added Extensions

- Prioritized Experience Replay [X]
- N-Step Bootstrapping [X]
- D2RL [X]
- Distributional IQN Critic [X]
- Munchausen RL [X] (see the sketch below)
- Parallel Environments [X]
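
As an illustration of the Munchausen extension, the sketch below shows the general Munchausen-augmented TD target: a scaled, clipped log-policy bonus added to the reward plus a soft (entropy-regularised) next-state value. The variable names, the temperature `tau_ent`, the scaling `alpha`, and the clipping value are assumptions; the repository's actual implementation may differ.

```python
import torch

def munchausen_target(rewards, log_pi_a, next_q, next_log_pi, dones,
                      gamma=0.99, alpha=0.9, tau_ent=0.03, clip_min=-1.0):
    """Sketch of the Munchausen-augmented TD target (general form).

    log_pi_a:    log pi(a_t | s_t) of the action actually taken
    next_q:      critic estimate for the next state-action pair
    next_log_pi: log pi(a_{t+1} | s_{t+1}) for the soft value term
    """
    # Scaled log-policy bonus, clipped from below for numerical stability
    bonus = alpha * torch.clamp(tau_ent * log_pi_a, min=clip_min, max=0.0)
    # Soft (entropy-regularised) next-state value
    soft_next_value = next_q - tau_ent * next_log_pi
    return rewards + bonus + gamma * (1.0 - dones) * soft_next_value
```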

## Results

Environment: Pendulum

*(Pendulum results plot)*

Below you can see how IQN reduced the variance of the critic loss:

*(Critic loss plot)*

Environment: LunarLander

*(LunarLander results plot)*

## Notes

- Performance depends strongly on good hyperparameters. In particular, the soft-update coefficient tau should be larger with PER (1e-2 worked for Pendulum) than with a regular replay buffer (1e-3); see the soft-update sketch below.
- BatchNorm had a notably positive impact on overall performance.
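
For context on the tau note above, here is a minimal sketch of the standard soft target-network update (Polyak averaging) used in DDPG/D4PG-style agents; `local_net` and `target_net` are placeholder names, not the repository's identifiers.

```python
def soft_update(local_net, target_net, tau):
    """Polyak averaging: target <- tau * local + (1 - tau) * target."""
    for target_param, local_param in zip(target_net.parameters(),
                                         local_net.parameters()):
        target_param.data.copy_(tau * local_param.data +
                                (1.0 - tau) * target_param.data)
```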