add ddpg in tensorflow #51

Closed · wants to merge 34 commits

Conversation

@cjcchen commented Apr 14, 2018

Add a DDPG model based on TensorFlow to rllab.

ryanjulian and others added 30 commits February 27, 2018 15:56
This is needed for using more than one MjViewer at a time
chainer setup is broken (on Ubuntu 16.04, at least), and the chainer
devs direct users to newer versions, because v1.18.0 is unsupported.
See cupy/cupy#886.

I can find no actual dependencies to chainer in the main rllab repo. If
someone has some downstream code with a hard chainer dependency,
they're welcome to submit a (working) PR to add it back in.
StatefulPool doesn't support batching with class methods because it
always passes `G` as the first argument to the worker function. If one of the
`run_` methods in StatefulPool is called with a class method, it can lead to
a silent lock-up of the pool, which is very difficult to debug.

Note: this bug does not appear unless n_parallel > 1
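To make the constraint concrete, an illustrative sketch (made-up names, not rllab's actual StatefulPool internals):

```python
# OK: a plain module-level function fits the pool's runner(G, *args)
# calling convention, with the shared state G in the first slot.
def collect_worker(G, batch_size):
    return batch_size


class Sampler:
    # Not OK: a bound method doesn't fit the runner(G, *args) convention,
    # and with n_parallel > 1 the pool can lock up silently instead of
    # raising an error.
    def collect_worker(self, G, batch_size):
        return batch_size
```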
Adds TensorBoard support for basic key-value pairs. Anything logged via
`logger.record_tabular()` is also available via TensorBoard.
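For example, existing logging code along these lines now also produces TensorBoard scalars (the metric names here are only illustrative):

```python
from rllab.misc import logger

# Each record_tabular() call becomes a column in the tabular log and,
# with this change, a TensorBoard scalar summary as well.
for itr in range(3):
    logger.record_tabular('Iteration', itr)
    logger.record_tabular('AverageReturn', 0.0)  # placeholder metric
    logger.dump_tabular()
```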
@ryanjulian (Owner)

I can do a more detailed review later, but one overall comment: the purpose of adding more algorithms is also to add more building blocks to rllab. I need you to break the building blocks (e.g. the neural networks and the replay buffer) out into their own libraries in rllab, so they can be reused by future algorithms.

import os


class DDPG(object):
@ryanjulian (Owner)

Should inherit from rllab.algos.base.RLAlgorithm
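A minimal sketch of the suggested structure (the `train` body is a placeholder, not the PR's implementation):

```python
from rllab.algos.base import RLAlgorithm


class DDPG(RLAlgorithm):
    """DDPG exposed through rllab's common algorithm interface."""

    def train(self):
        # The DDPG training loop would go here.
        pass
```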

action_range=(-1, 1),
actor_lr=1e-4,
critic_lr=1e-3,
reward_scale=1,
@ryanjulian (Owner)

Can you leave detailed comments on what each of these parameters does? Even better if you can reference them back to the constants in the equations in a paper.
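For instance, something like the sketch below; the docstring text is illustrative, and the learning-rate defaults are the values reported in Lillicrap et al. 2015 ("Continuous control with deep reinforcement learning"):

```python
from rllab.algos.base import RLAlgorithm


class DDPG(RLAlgorithm):
    def __init__(self,
                 action_range=(-1, 1),
                 actor_lr=1e-4,
                 critic_lr=1e-3,
                 reward_scale=1.):
        """
        :param action_range: (low, high) bounds that policy outputs are
            clipped to before being sent to the environment.
        :param actor_lr: learning rate for the actor (policy) network;
            1e-4 in Lillicrap et al. 2015.
        :param critic_lr: learning rate for the critic (Q) network;
            1e-3 in Lillicrap et al. 2015.
        :param reward_scale: multiplier applied to rewards before they
            enter the critic's TD target.
        """
        self.action_range = action_range
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.reward_scale = reward_scale
```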

@cjcchen (Author)

Done

@ryanjulian (Owner)

Just to reiterate: this needs to be split into building blocks which are reusable within rllab.

The issue also requests a regression test against the openai/baselines version of the algorithm.

@ryanjulian requested a review from zhanpenghe May 2, 2018 22:07
import random


class ReplayBuffer:
@zhanpenghe (Collaborator) commented May 2, 2018

I think you could just use the replay buffer from the ddpg in the theano tree.

@cjcchen (Author)

The replay buffer from the DDPG in the Theano tree is combined with the DDPG algorithm in the same file, so I can't reuse it without also pulling in that DDPG implementation unless I separate them first. Also, I think the sandbox tree should not use functions from the Theano tree, since they target different platforms (Theano vs. TensorFlow).
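For reference, the standalone building block being asked for might look roughly like this (a sketch only; the class and method names are illustrative, not the PR's actual code):

```python
import random


class ReplayBuffer:
    """Fixed-capacity FIFO store of (obs, action, reward, next_obs, done)."""

    def __init__(self, max_size):
        self._storage = []
        self._max_size = max_size
        self._next_idx = 0

    def add(self, obs, action, reward, next_obs, terminal):
        transition = (obs, action, reward, next_obs, terminal)
        if self._next_idx < len(self._storage):
            self._storage[self._next_idx] = transition  # overwrite oldest
        else:
            self._storage.append(transition)
        self._next_idx = (self._next_idx + 1) % self._max_size

    def sample(self, batch_size):
        # Uniform random minibatch, without replacement within the batch.
        return random.sample(self._storage, batch_size)
```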

return tc.layers.layer_norm(x, center=True, scale=True)


class Model(object):
@zhanpenghe (Collaborator) commented May 2, 2018

For actor-critic, I think we could work on Q functions and policies that are implemented directly in TensorFlow, so that they are reusable.

@cjcchen (Author)

I could not find the Q function and policy implementations in the TensorFlow tree. If they exist, could you point me to them?
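For context, a reusable TensorFlow Q function in the spirit of this thread might look roughly like the sketch below (TF 1.x API; the class name and architecture are illustrative):

```python
import tensorflow as tf


class ContinuousMLPQFunction(object):
    """Sketch of a reusable Q(s, a) network for continuous actions."""

    def __init__(self, name, hidden_sizes=(64, 64)):
        self._name = name
        self._hidden_sizes = hidden_sizes

    def build(self, obs, action):
        # DDPG-style critic: feed the state-action pair through an MLP.
        with tf.variable_scope(self._name, reuse=tf.AUTO_REUSE):
            x = tf.concat([obs, action], axis=-1)
            for size in self._hidden_sizes:
                x = tf.layers.dense(x, size, activation=tf.nn.relu)
            return tf.layers.dense(x, 1)  # scalar Q value per pair
```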

@ryanjulian changed the base branch from integration to master May 8, 2018 17:31
@ryanjulian (Owner)

Uhh deleting files in a

@ryanjulian closed this May 9, 2018
@ryanjulian reopened this May 9, 2018
@ryanjulian (Owner)

Oops, ignore.

Deleting files in a commit is one way to deal with a polluted tree, but it's more productive to use git rebase. You need to rebase a lot in this repository, anyway.

https://git-scm.com/book/pt-br/v2/Git-Branching-Rebasing

@ryanjulian (Owner)

@cjcchen where can I find the latest code for this PR?

@cjcchen (Author) commented May 16, 2018

I created a new PR based on the master branch. #62

@cjcchen (Author) commented May 16, 2018

But it seems to fail the CI tests and I could not find the reason.

@ryanjulian added this to the Week of May 14 milestone May 16, 2018
@ryanjulian (Owner)

Can you please close either this PR or #62? I don't know which one to review.
