add ddpg in tensorflow #51
Conversation
This is needed for using more than one MjViewer at a time
chainer setup is broken (on Ubuntu 16.04, at least), and the chainer devs direct users to newer versions because v1.18.0 is unsupported. See cupy/cupy#886. I can find no actual dependencies on chainer in the main rllab repo. If someone has downstream code with a hard chainer dependency, they're welcome to submit a (working) PR to add it back in.
StatefulPool doesn't support batching over class methods, because it always passes `G` as the first argument to the worker function. If one of the `run_` methods in StatefulPool is called with a class method, it can lead to a silent lock-up of the pool, which is very difficult to debug. Note: this bug does not appear unless `n_parallel > 1`. A minimal sketch of the failure mode follows.
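This sketch is illustrative, not rllab's actual pool code: a serial stand-in for a `run_map`-style call shows how a bound method's `self` slot collides with the `G` argument.

```python
class GlobalState:  # stand-in for rllab's shared global state G
    pass

def run_map(worker, args_list, g):
    # serial stand-in for a StatefulPool run_ method: always prepends g
    return [worker(g, *args) for args in args_list]

def free_function(g, x):   # the shape the pool expects
    return x * 2

class Sampler:
    def collect(self, x):  # bound method: no slot for G
        return x * 2

g = GlobalState()
print(run_map(free_function, [(1,), (2,)], g))  # OK: [2, 4]

# run_map(Sampler().collect, [(1,), (2,)], g) raises
# "collect() takes 2 positional arguments but 3 were given": G is forced
# into the method's argument list. In the real pool with n_parallel > 1
# this error occurs inside a child process, is never propagated, and the
# parent blocks forever on the result queue -- a silent lock-up.
```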
Adds TensorBoard support for basic key-value pairs. Anything logged via `logger.record_tabular()` is also available via TensorBoard.
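For illustration, a minimal sketch of how such a bridge can work, assuming TF1-style summary writers (the PR's actual wiring may differ):

```python
import tensorflow as tf

class TabularToTensorBoard:
    """Mirror logger.record_tabular-style key/value pairs to TensorBoard."""

    def __init__(self, log_dir):
        self.writer = tf.summary.FileWriter(log_dir)

    def record_tabular(self, key, value, step):
        # wrap the scalar in a Summary proto and write it at the given step
        summary = tf.Summary(
            value=[tf.Summary.Value(tag=key, simple_value=float(value))])
        self.writer.add_summary(summary, step)
        self.writer.flush()

# usage:
# tb = TabularToTensorBoard("/tmp/rllab_tb")
# tb.record_tabular("AverageReturn", 123.4, step=epoch)
```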
I can do a more detailed review later, but one overall comment: the purpose of adding more algorithms is also to add more building blocks to rllab. I need you to break the building blocks (e.g., the neural networks and the replay buffer) out into their own libraries in rllab, so they can be reused by future algorithms.
sandbox/rocky/tf/algos/ddpg/ddpg.py (outdated)

```python
import os

class DDPG(object):
```
Should inherit from `rllab.algos.base.RLAlgorithm`.
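For reference, a sketch of the suggested declaration (the body is a placeholder; only the base class comes from the comment above):

```python
from rllab.algos.base import RLAlgorithm

class DDPG(RLAlgorithm):
    def train(self):
        # RLAlgorithm subclasses provide train() as their entry point
        ...
```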
```python
action_range=(-1, 1),
actor_lr=1e-4,
critic_lr=1e-3,
reward_scale=1,
```
Can you leave detailed comments on what each of these parameters does? Even better if you can reference them back to the constants in the equations in a paper.
Done
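For illustration, the requested style of documentation might look like this (parameter values come from the diff above; the cited defaults match Lillicrap et al. 2015, "Continuous control with deep reinforcement learning"):

```python
def __init__(
        self,
        action_range=(-1, 1),  # (low, high) bounds the scaled actor output
        actor_lr=1e-4,         # actor (policy) learning rate; 1e-4 in the paper
        critic_lr=1e-3,        # critic (Q network) learning rate; 1e-3 in the paper
        reward_scale=1,        # multiplier on env rewards before TD targets
        **kwargs):
    ...
```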
Just to reiterate: this needs to be split into building blocks which are reusable in rllab. The issue also requests a regression test against the openai/baselines version of the algorithm.
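A regression test along these lines might be sketched as follows (the environment, tolerance, and helper functions are hypothetical, not existing code):

```python
from statistics import mean

def test_ddpg_matches_baselines():
    # train_ddpg and load_reference_returns are hypothetical helpers
    rllab_returns = train_ddpg(env_name="Pendulum-v0", n_epochs=50, seed=0)
    reference = load_reference_returns("baselines_ddpg_pendulum.json")
    # compare the tail of the learning curve within a tolerance, since the
    # two implementations differ in initialization and exploration noise
    assert abs(mean(rllab_returns[-5:]) - mean(reference[-5:])) < 50.0
```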
```python
import random

class ReplayBuffer:
```
I think you could just use the replay buffer from the DDPG in the Theano tree.
The replay buffer from the DDPG in the Theano tree is combined with the DDPG algorithm in the same file, so I can't reuse it without pulling in the whole DDPG file unless I separate them. Also, I think the sandbox tree should not use functions from the Theano tree, since they target different platforms (Theano vs. TensorFlow).
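For context, a minimal standalone replay buffer of the kind being discussed might look like this (a sketch, not the Theano tree's implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO buffer of (s, a, r, s', done) transitions."""

    def __init__(self, max_size=int(1e6)):
        self.buffer = deque(maxlen=max_size)

    def add(self, obs, action, reward, next_obs, terminal):
        self.buffer.append((obs, action, reward, next_obs, terminal))

    def sample(self, batch_size):
        # uniform random minibatch, without replacement within the batch
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```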
```python
return tc.layers.layer_norm(x, center=True, scale=True)

class Model(object):
```
For actor-critic, I think we could implement Q functions and policies directly in TensorFlow so that they are reusable.
I could not find Q function and policy implementations in TensorFlow. If they exist, could you point me to them?
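To illustrate what such a building block could look like, here is a sketch of a reusable Q function in TF1-style code (the class name and interface are assumptions, not existing rllab classes):

```python
import tensorflow as tf

class ContinuousMLPQFunction:
    """Q(s, a) as an MLP over the concatenated observation and action."""

    def __init__(self, obs_dim, action_dim, hidden_sizes=(64, 64), name="qf"):
        with tf.variable_scope(name):
            self.obs_ph = tf.placeholder(tf.float32, (None, obs_dim), "obs")
            self.action_ph = tf.placeholder(tf.float32, (None, action_dim), "action")
            h = tf.concat([self.obs_ph, self.action_ph], axis=1)
            for size in hidden_sizes:
                h = tf.layers.dense(h, size, activation=tf.nn.relu)
            self.q_val = tf.layers.dense(h, 1)  # scalar Q-value per (s, a) pair
```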
Uhh, deleting files in a…
Oops, ignore. Deleting files in a commit is one way to deal with a polluted tree, but it's more productive to use…
@cjcchen where can I find the latest code for this PR?
I created a new PR based on the master branch: #62
But it seems to fail the CI tests and I could not find the reason.
Can you please close either this PR or #62? I don't know which one to review.
Add a DDPG model based on TensorFlow to rllab.