-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add partial support for dictionary observation spaces (bc, density) #785
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this design looks good to me. As you mentioned lots of things that needed to be cleaned up but I think it's going in the right direction. I did a fairly detailed review of data/types.py, data/rollout.py and algorithms/bc.py but just skimmed the rest so do highlight if there are any other important areas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to think about is how much we want to push dict observations through the code. In many cases for the imitation algorithms we could basically just flatten the dict to a NumPy array and "add" dict support with minimal additional code changes.
But that kind of defeats the purpose of adding dict support -- may as well just flatten the observations at the environment level. On the other hand, we have to flatten them at some point: the neural network will take a tensor as input not a dict. So, a lot of the design decision is deciding where to do the flattening.
It seems nice to be able to preserve the dict up until calling the policy. This gives flexibility to the user. In our case, InteractivePolicy can ignore the non-rendering component. In general, a user might want to preprocess different components of the observation differently.
tests/algorithms/test_bc.py
Outdated
@@ -371,3 +375,59 @@ def inc_batch_cnt(): | |||
|
|||
# THEN | |||
assert batch_cnt == no_yield_after_iter | |||
|
|||
|
|||
class FloatReward(gym.RewardWrapper): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this? shouldn't reward be a float already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see SimpleMultiObsEnvs sometimes returns 1 rather than 1.0. This is a bug we should probably fix upstream. I'm a maintainer of SB3 so if you make a PR I can review it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR here. Though the environment is more sketchy the more I look at it, e.g. they hardcode the possible transitions from each state in a way that doesn't depend on gridworld size. So maybe best to not depend on the environment much (or submit more fixes upstream).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SimpleMultiObsEnvs is just a test env, it is not meant to be used except in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, we'd only be using it in tests as well.
One sad thing to note is we don't support arbitrarily nested dictionaries. |
Co-authored-by: Adam Gleave <[email protected]>
I agree, I think the Policy is the place to handle the dict -> network input transition. This is consistent with SB3 (see here) although the type signatures in SB3 obfuscate this fact. |
This reverts commit ee83ec5.
This reverts commit a9b32bd.
This reverts commit ba6a6a7.
This reverts commit 6e5c3e8.
This reverts commit be79cf5.
…/imitation into support-dict-obs-space
This reverts commit f5288c6.
This reverts commit 5c1d751.
This reverts commit 5c6e5b8.
This reverts commit 6884538.
This reverts commit 15541cd.
This reverts commit 194ec1a.
Finished addressing your comments. Please take another look. I moved pytype upgrade related issue to #801 because it turns out to be more complicated than simply fixing some types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- one minor typo to fix but no need for a re-review. Can merge once we get CI green.
…support-dict-obs-space
…support-dict-obs-space
Codecov Report
@@ Coverage Diff @@
## master #785 +/- ##
==========================================
+ Coverage 96.33% 96.40% +0.07%
==========================================
Files 98 98
Lines 9177 9441 +264
==========================================
+ Hits 8841 9102 +261
- Misses 336 339 +3
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Appreciate adding this support. |
Description
Partially addresses #681, by adding support for dictionary observation spaces in:
types.py
,rollout.py
, etc.)Does not add support for any other algorithms, or trajectory saving / loading.
Testing
test_types.py
ObsDict
intest_types.py
test_rollout.py
test_bc.py
density.py