The first step towards the objective of the Master Thesis was the implementation of the algorithm in [1].
This first version is not capable to elaborate images but only continuous environments of OpenAI Gym such as MountainCarContinuous-v0.
The next step will be the implementation of the same algorithm using images as input, which is a similar context to the one of the Anki Cozmo environment.
CriticNN(
(linear1): Linear(in_features=3, out_features=256, bias=True)
(linear2): Linear(in_features=256, out_features=256, bias=True)
(linear3): Linear(in_features=256, out_features=1, bias=True)
)
ActorNN(
(linear1): Linear(in_features=2, out_features=256, bias=True)
(linear2): Linear(in_features=256, out_features=256, bias=True)
(linear3): Linear(in_features=256, out_features=1, bias=True)
)
mu=0.0
sigma=0.3
theta=0.15
# Decay of OUNoise
eps_start=0.9
eps_end=0.2
eps_decay=1000
# ReplayBuffer
batch_size=32
replay_min_size=10000
replay_max_size=1000000
# Simulation
n_episode=1000
episode_max_len=1000
# Update
discount=0.99
critic_weight_decay=0.
critic_update_method='adam'
critic_lr=1e-3
actor_weight_decay=0
actor_update_method='adam'
actor_lr=1e-4
soft_target_tau=0.001
n_updates_per_sample=1
# Test
eval_samples=10000
These plots refers to one run of the DDPG algorithm using the MountainCarContinuous-v0 environment provided by OpenAI Gym
As reported in OpenAI Gym Leaderboards the problem is considered solved if the mean of the last 100 episode is greater than 90. This code reaches this threshold after about 130 episode (plot). Taking into account that we are using a simple NN this is an accettable result, therefore it can be used as a good baseline to start the evolution of our project.
[1] Learning to Drive in a Day (Sep 2018) by Alex Kendall, Jeffrey Hawke, David Janz, Przemyslaw Mazur, Daniele Reda, John-Mark Allen, Vinh-Dieu Lam, Alex Bewley & Amar Shah
[2] Continuous Control with Deep Reinforcement Learning (Feb 2016) by Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver & Daan Wierstra