Stuck when training in MsPacman-v0 #31
I tried `python3.6 main.py --env Pong-v0 --workers 32` and the problem is still the same. My environment is PyTorch 0.4.1.
Did you try training longer? I was able to train Pong-v0 in about 20 minutes with a GTX 1080 Ti, but it took hours on a CPU. By default it does not use the GPU; use the `--gpu-ids` argument to set the GPU IDs.
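For example, to keep your 32-worker run but route the model onto one GPU, an invocation might look like this (assuming GPU 0 exists on your machine; only the `--gpu-ids` flag is taken from the comment above, the rest mirrors the command you already ran):

```
python main.py --env Pong-v0 --workers 32 --gpu-ids 0
```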
It's weird that MsPacman-v0 gets stuck. I found the reason could be that some of my worker processes become zombie processes (maybe because of a lack of CPU resources). Only 2 processes were still working and updating the neural network.
I've seen similar behavior when I ran out of memory; maybe you need to reduce the number of workers? Training MsPacman-v0 works for me; I used the command `python main.py --env MsPacman-v0 --workers 7`.
I guess you can also try to compensate for the smaller number of workers with a smaller learning rate.
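A hypothetical invocation along those lines (this assumes the learning rate is exposed as an `--lr` flag, consistent with the `lr=1e-4` setting quoted later in this thread; the value 5e-5 is only an illustration, not a recommendation from the repo):

```
python main.py --env MsPacman-v0 --workers 7 --lr 5e-5
```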
7 workers is quite a small number of workers and will hinder performance, and 4 mins of training with a low number of workers is hardly enough time to see real improvement, especially as the v0 environments are noticeably more challenging than the more commonly used versions. If you have access to a setup with, say, one GPU and 8 CPU cores with hyperthreading, you will get much better performance in terms of speed. In such a setup, using 16 workers with the A3G version, you should see scores of 15,000-20,000 in less than 12 hrs. Using 7 workers is probably hindering adequate exploration; reducing the learning rate may help in that regard, but if you have the resources to adequately support it, I really suggest using at minimum 16 workers, especially for the v0 environments. A reduction in the tau variable would also help with exploration; tau is the lambda variable in the generalized advantage estimation.
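For reference, here is a minimal sketch of how tau plays the role of lambda in generalized advantage estimation (GAE). It is illustrative only; the function and variable names are not taken from this repo:

```python
import torch

def gae(rewards, values, gamma=0.99, tau=0.92):
    # rewards: list of T scalar rewards
    # values:  list of T + 1 value estimates (last entry is the bootstrap value)
    # tau:     the GAE lambda; smaller values shorten the credit-assignment horizon
    advantages = torch.zeros(len(rewards))
    running_gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        running_gae = delta + gamma * tau * running_gae         # lambda-weighted sum of deltas
        advantages[t] = running_gae
    return advantages

# Toy example: 3-step rollout with a bootstrap value of 0.5
print(gae([1.0, 0.0, 1.0], [0.2, 0.4, 0.3, 0.5]))
```

Lowering tau makes the lambda-weighted sum decay faster, so each advantage leans more on nearby rewards, which is the exploration effect referred to above.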
MsPacman-v0 is fairly hard even on modern hardware; reaching a 5000 score occasionally took me 12 hours (the 100-episode moving-window average was 3858) with 36 agents, 12 cores, 3x 1080 Ti Turbos, and Adam with --amsgrad True. Do you have any parameter suggestions to speed up convergence? I used the following parameters: lr=1e-4, gamma=0.99, tau=0.92, num_steps=20, max_episode_len=10000. Would it make sense to implement a Huber (smooth L1) loss? How would I test this on your code base?
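One low-effort way to try that (a sketch under assumptions, not the repo's actual training code; the 0.5 * squared-error form is the usual A3C value loss and the variable names here are made up) is to swap the squared-error value loss for PyTorch's built-in smooth L1 (Huber) loss:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for one worker's critic output and n-step return
value = torch.tensor([2.0], requires_grad=True)  # V(s_t) from the critic head
ret = torch.tensor([3.5])                        # bootstrapped n-step return R

# Usual A3C-style value loss: 0.5 * squared error
mse_value_loss = 0.5 * (ret - value).pow(2).sum()

# Huber / smooth L1 alternative: quadratic near zero, linear for large errors,
# so outlier returns produce bounded gradients on the critic
huber_value_loss = F.smooth_l1_loss(value, ret)

huber_value_loss.backward()   # gradients flow into the critic as usual
print(mse_value_loss.item(), huber_value_loss.item())
```

Whether this actually speeds up convergence on MsPacman-v0 would need an A/B run; the Huber form mainly helps when the return targets are noisy or heavy-tailed.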
Hi @dgriff777. Thank you for your repo; it's great that it can achieve such a high score. But I ran into a problem when I tried to apply it to MsPacman-v0.
I simply used this command:
`python main.py --env MsPacman-v0 --workers 7`
Then I got test scores like this:
The test score is always 70, and it seems that the agent chooses the same path every time and stops in a corner.
Could you tell me how you trained the model to get the 6323.01 ± 116.91 score on MsPacman-v0? Are there any other parameters I should set?