PyTorch implementations of RL (Reinforcement Learning) algorithms with RNN (Recurrent Neural Network) and Experience Replay
Disclaimer: My code is based on the TD3 implementation of openai/spinningup.
This repository experiments with RL combining RNNs and Experience Replay, to better understand how the following techniques and parameters affect performance.
R2D2 incorporated RNNs into distributed reinforcement learning to achieve significant performance improvements on Atari tasks.
In that paper, the authors investigated training RNNs with Experience Replay and proposed the following techniques, which this repository applies to an Actor-Critic algorithm (a minimal code sketch follows the list).
- "Stored state": store the RNN hidden state in the replay buffer at rollout time, and use it to initialize the network during training.
- "Burn-in": unroll the network over a prefix of the stored sequence to produce a start hidden state, and compute the training loss only on the remaining timesteps.
TD3, an Actor-Critic algorithm that uses a replay buffer, is used for the following benchmarks:
- Difference between simply using stacked observations and using an RNN network on a POMDP task
- How the following techniques make a difference
- Stored state
- Burn-in process
- How the parameters of the above techniques affect performance
pip install -e .
Without RNN, using CPU:
python rnnrl/algos/pytorch/td3/td3.py --env Pendulum-v0 --seed=$i --device cpu
With RNN, using GPU:
python rnnrl/algos/pytorch/td3/td3.py --env Pendulum-v0 --seed=$i --device cuda --recurrent
Benchmarks are run in the Pendulum-v0 environment wrapped with PartialObservation.
PartialObservation is a wrapper that lets the policy receive a fresh observation only once every 3 steps, which turns the task into a POMDP. A naive technique to mitigate partial observability is to simply use a stack of recent observations as the observation at each step.
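The following is a rough sketch of such a masking wrapper, assuming the old gym step API used by Pendulum-v0. The class name `MaskedObservation` and the choice to zero out hidden observations are illustrative assumptions and may differ from the actual PartialObservation wrapper in this repository.

```python
import gym
import numpy as np


class MaskedObservation(gym.Wrapper):
    """Hypothetical wrapper: the agent sees a fresh observation only every
    `interval` steps; on the other steps it receives zeros, which makes the
    task partially observable."""

    def __init__(self, env, interval=3):
        super().__init__(env)
        self.interval = interval
        self._t = 0

    def reset(self, **kwargs):
        self._t = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._t += 1
        if self._t % self.interval != 0:
            obs = np.zeros_like(obs)  # hide the observation on masked steps
        return obs, reward, done, info


if __name__ == "__main__":
    env = MaskedObservation(gym.make("Pendulum-v0"), interval=3)
    obs = env.reset()
    for _ in range(5):
        obs, reward, done, _ = env.step(env.action_space.sample())
        print(obs)
```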