-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[drake_gym] Add info_handler callback #21900
base: master
Are you sure you want to change the base?
[drake_gym] Add info_handler callback #21900
Conversation
I am not very familiar with Gym, so let me ask a few questions and then also invite others to comment. (1) Is the (2) If not, then for users who want this kind of metadata, why would the following not be a better solution (in their own code, with no changes to Drake)? observation, reward, terminated, truncated, info = env.step(action)
info["timestamp"] = env.simulator.get_context().get_time() With this technique, users can add any metadata they like. Perhaps the problem is that the user is not directly calling (3) If we do need to populate +@JoseBarreiros-TRI for discussion and feature review, please. |
Here are my thoughts:
|
@jwnimmer-tri This is a great suggestion and works, but I'd prefer to support multiple gym environments, some not using Since it is not a convention to provide time data in If this PR is dismissed, as the desired behavior is possible with the current setup (thanks @JoseBarreiros-TRI ). That's fine too. |
Ah, thanks for the pointer; I'd missed that detail. It sounds like everyone is happy with the "info handler" proposal, so that's probably the way to proceed. |
info
return value from DrakeGymEnv.step(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samzapo Jenkins is showing linter errors. You can run the linters locally via bazel test --config=lint //bindings/pydrake/...
.
@JoseBarreiros-TRI do you have time/interest in serving as the reviewer here? If not, I can recruit someone else.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 256 at r2 (raw file):
terminated = False reward = 0 info = dict() if self.info_handler is None else self.info_handler(self.simulator)
nit Instead of copy-pasting the if-None-else logic multiple times, we can probably do something less brittle, e.g.
(1) Don't allow the info handler be None
; instead, change the default value in the constructor args to bedict
(i.e., the callable constructor that does what we want as a default).
(2) Add a private helper method def _info(self):
on DrakeGymEnv
with the None-handling logic, and change this to info = self._info()
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy to review this
Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 256 at r2 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
nit Instead of copy-pasting the if-None-else logic multiple times, we can probably do something less brittle, e.g.
(1) Don't allow the info handler be
None
; instead, change the default value in the constructor args to bedict
(i.e., the callable constructor that does what we want as a default).(2) Add a private helper method
def _info(self):
onDrakeGymEnv
with the None-handling logic, and change this toinfo = self._info()
or similar.
Done. I've chosen to do (1), see assignment to self.info_handler
logic
Fixing this in a moment. Immediately validating why the complicated logic on each line is hard to maintain. Code quote: info = dict() if self.info_handler is None else self.info_handler(self.simulator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 256 at r2 (raw file):
Previously, samzapo (Samuel Zapolsky) wrote…
Done. I've chosen to do (1), see assignment to
self.info_handler
logic
BTW I expect that foo = lambda _: dict()
can be shortened to just foo = dict
. The constructor is already a callable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 256 at r2 (raw file):
Previously, jwnimmer-tri (Jeremy Nimmer) wrote…
BTW I expect that
foo = lambda _: dict()
can be shortened to justfoo = dict
. The constructor is already a callable.
Would this work if self.info_handler(simulator)
is called (i.e., with a parameter)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 256 at r2 (raw file):
Previously, samzapo (Samuel Zapolsky) wrote…
Would this work if
self.info_handler(simulator)
is called (i.e., with a parameter)?
Oops, nevermind. I forgot the callback took argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/examples/gym/envs/cart_pole.py
line 418 at r4 (raw file):
observation_port_id="observations", reset_handler=reset_handler, info_handler=lambda _: dict(),
why not set this to None or better provide a nice example of an info handler
bindings/pydrake/gym/_drake_gym_env.py
line 257 at r4 (raw file):
terminated = False reward = 0 info = self.info_handler(self.simulator)
I'm not sure about the expected behavior here. I think that if the run gets to this point it means that the simulator faulted which would cause an unexpected behavior when called by the info_handler(). Consider setting the info to an empty dict or info["sim_runtime_error"]=True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samzapo I'm happy to help land this PR, let me know if you have any questions.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @JoseBarreiros-TRI , I apologize for the delay, I was on vacation last week. I'll resume working on this today.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits)
bindings/pydrake/gym/_drake_gym_env.py
line 257 at r4 (raw file):
Previously, JoseBarreiros-TRI wrote…
I'm not sure about the expected behavior here. I think that if the run gets to this point it means that the simulator faulted which would cause an unexpected behavior when called by the info_handler(). Consider setting the info to an empty dict or info["sim_runtime_error"]=True.
I don't want to add anything to info that isn't standard.
This is what motivated adding an info handler over explicitly adding info["timestamp"]
I think it's likely that simulator
will still behave normally after most runtime errors, but this is a good point
I'd go for just returning an empty dict in this case, and adding documentation that says, e.g.,
"if the simulator emits a runtime error while advancing, the provided info handler will not be called and will return an empty dict instead".
Would that work?
…o add_timestamp_to_step_info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 unresolved discussions, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/examples/gym/envs/cart_pole.py
line 418 at r4 (raw file):
Previously, JoseBarreiros-TRI wrote…
why not set this to None or better provide a nice example of an info handler
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 257 at r4 (raw file):
Previously, samzapo (Samuel Zapolsky) wrote…
I don't want to add anything to info that isn't standard.
This is what motivated adding an info handler over explicitly addinginfo["timestamp"]
I think it's likely that
simulator
will still behave normally after most runtime errors, but this is a good point
I'd go for just returning an empty dict in this case, and adding documentation that says, e.g.,
"if the simulator emits a runtime error while advancing, the provided info handler will not be called and will return an empty dict instead".Would that work?
yes, that is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee JoseBarreiros-TRI, needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
bindings/pydrake/gym/_drake_gym_env.py
line 257 at r4 (raw file):
Previously, JoseBarreiros-TRI wrote…
yes, that is reasonable.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: needs platform reviewer assigned, needs at least two assigned reviewers, commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @samzapo)
Closes #21899.
This change is