This is a PyTorch implementation of Imagination-Augmented Agents (I2A) with latent space environment models (Latent I2A).
This repository is based on a fork of the pytorch-a2c-ppo-acktr repository by Ilya Kostrikov (https://github.com/ikostrikov/pytorch-a2c-ppo-acktr).
To cite our work, please use the following BibTeX:
@misc{repo,
author = {Florian Klemt and Angela Denninger and Tim Meinhardt and Laura Leal{-}Taix{\'{e}}},
title = {PyTorch Latent I2A},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/FlorianKlemt/pytorch-latent-i2a.git}},
urldate = {2018-10-18}
}
If you have any questions or suggestions, write to us at [email protected] or [email protected].
Requirements:
- Python 3 (tested with Python 3.5)
- PyTorch (tested with version 0.4.1)
- OpenAI Gym with the Atari environments enabled
- Visdom
- To use the MiniPacman environments you also need to download and install the gym-minipacman repository.
To install MiniPacman, run:
# MiniPacman
git clone https://github.com/FlorianKlemt/gym-minipacman.git
cd gym-minipacman
pip3 install -e .
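After installation, a quick way to check that the MiniPacman environments are registered is to create one from Python. This is a minimal sketch; the explicit import of gym_minipacman to trigger environment registration is an assumption about the package, not taken from the repository.

import gym
import gym_minipacman  # assumed: importing the package registers the MiniPacman environments

# Create the hunt variant used in the training commands below and take one random step.
env = gym.make('HuntMiniPacmanNoFrameskip-v0')
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()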
To train an A2C model on the MiniPacman hunt environment:
python3 main.py --env-name HuntMiniPacmanNoFrameskip-v0 --algo a2c --num-stack 1
To train an environment model on MiniPacman (requires a pretrained A2C model, or the flag --no-policy-model-loading):
python3 main_train_environment_model.py --env-name HuntMiniPacmanNoFrameskip-v0 --environment-model MiniModelLabels --weight-decay 0
To train an I2A model on MiniPacman (requires a pretrained environment model):
python3 main.py --env-name HuntMiniPacmanNoFrameskip-v0 --algo i2a --environment-model MiniModelLabels --num-stack 1 --distill-coef 10 --entropy-coef 0.02
The copy model has the same number of weights as the I2A model, but does not imagine the future; therefore it does not need an environment model. To train it, run:
python3 main.py --environment-model CopyModel --env-name HuntMiniPacmanNoFrameskip-v0 --algo i2a --num-stack 1 --distill-coef 10 --entropy-coef 0.02
To train an A2C model on MsPacman:
python3 main.py --env-name MsPacmanNoFrameskip-v0 --algo a2c --train-on-200x160-pixel --num-stack 4
To train a latent space environment model (dSSM_DET, dSSM_VAE or sSSM) on MsPacman, run one of the following. This requires a pretrained A2C model, or the flag --no-policy-model-loading.
python3 main_train_environment_model.py --env-name MsPacmanNoFrameskip-v0 --environment-model dSSM_DET --lr 0.0001 --weight-decay 0 --rollout-steps 10
python3 main_train_environment_model.py --env-name MsPacmanNoFrameskip-v0 --environment-model dSSM_VAE --lr 0.0001 --weight-decay 0 --rollout-steps 10
python3 main_train_environment_model.py --env-name MsPacmanNoFrameskip-v0 --environment-model sSSM --lr 0.0001 --weight-decay 0 --rollout-steps 10
To train an I2A model on MsPacman (requires a pretrained latent space environment model):
python3 main.py --env-name MsPacmanNoFrameskip-v0 --algo i2a --distill-coef 10 --entropy-coef 0.01 --num-stack 4 --environment-model dSSM_DET
To see a visualization of the training curves during training, start a visdom server via:
python3 -m visdom.server -p 8097
The default port used both by visdom and our code is 8097.
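If no plots show up, you can check from Python that the server is reachable on that port. This is a minimal sketch using the visdom client; it is only a connectivity check and does not assume anything about the plot windows created by the training code.

from visdom import Visdom

# Connect to the default server used by the training code (localhost, port 8097).
viz = Visdom(port=8097)
assert viz.check_connection(), 'No visdom server reachable on port 8097'
viz.text('visdom connection works')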
To continue training from a pretrained model, use the --load-model flag. The model must lie in the folder specified via the --save-dir flag (default: ./trained_models/). I2A models must lie in the subfolder ./trained_models/i2a/, and A2C models in the subfolder ./trained_models/a2c/. The file must be named after the environment, with the file ending .pt.
Example:
python3 main.py --env-name MsPacmanNoFrameskip-v0 --algo i2a --distill-coef 10 --num-stack 4 --environment-model dSSM_DET --load-model
loads the model from ./trained_models/i2a/MsPacmanNoFrameskip-v0.pt. The --algo, --num-stack and --environment-model arguments must be the same as those used for the loaded model.
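For illustration, the naming convention above corresponds to a path like the following. This is only a sketch of where the checkpoint is expected to lie, not the repository's actual loading code; the use of torch.load here is an assumption for demonstration.

import os
import torch

save_dir = './trained_models'        # value of --save-dir (default)
algo = 'i2a'                         # value of --algo ('i2a' or 'a2c')
env_name = 'MsPacmanNoFrameskip-v0'  # value of --env-name

# Resolves to ./trained_models/i2a/MsPacmanNoFrameskip-v0.pt
model_path = os.path.join(save_dir, algo, env_name + '.pt')
checkpoint = torch.load(model_path, map_location='cpu')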
To play with a pretrained model without continuing to train, use the --no-training flag.
Example:
python3 main.py --env-name MsPacmanNoFrameskip-v0 --algo i2a --num-stack 4 --environment-model dSSM_DET --no-training