Add partial support for dictionary observation spaces (bc, density) #785

NixGD · 2023-09-13T20:46:59Z

Description

Partially addresses #681, by adding support for dictionary observation spaces in:

Core trajectory gathering and processing code (types.py, rollout.py, etc.)
Behavioral Cloning
Density based algorithms

Does not add support for any other algorithms, or trajectory saving / loading.
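
As a rough illustration of what dictionary observations mean for trajectory code, the sketch below stacks a list of per-step dict observations into one array per key. It is not the PR's implementation: the class name `StackedDictObs` is hypothetical, and only the method name `from_obs_list` and the matching-keys check are borrowed from the commit titles in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List

import numpy as np


@dataclass
class StackedDictObs:
    """Hypothetical container: one array per key, all sharing a leading time axis."""

    arrays: Dict[str, np.ndarray]

    def __post_init__(self) -> None:
        if not self.arrays:
            raise ValueError("need at least one key")
        lengths = {len(v) for v in self.arrays.values()}
        if len(lengths) != 1:
            raise ValueError(f"keys disagree on length: {lengths}")

    @classmethod
    def from_obs_list(cls, obs_list: List[Dict[str, np.ndarray]]) -> "StackedDictObs":
        """Stack per-step dict observations into one array per key."""
        if not obs_list:
            raise ValueError("obs_list is empty")
        keys = obs_list[0].keys()
        for obs in obs_list:
            if obs.keys() != keys:
                raise ValueError("all observations must share the same keys")
        return cls({k: np.stack([obs[k] for obs in obs_list]) for k in keys})

    def __len__(self) -> int:
        # Number of time steps, identical for every key by construction.
        return len(next(iter(self.arrays.values())))

    def __getitem__(self, idx: slice) -> "StackedDictObs":
        # Only slices are handled, so the leading time axis is always kept.
        if not isinstance(idx, slice):
            raise TypeError("this sketch only supports slice indexing")
        return StackedDictObs({k: v[idx] for k, v in self.arrays.items()})


# Usage: four steps, each with a vector and a small image.
steps = [{"pos": np.zeros(3), "img": np.ones((2, 2))} for _ in range(4)]
stacked = StackedDictObs.from_obs_list(steps)
window = stacked[1:3]
```

Keeping one array per key means the usual trajectory operations (length, slicing, concatenation) can be applied key by key without flattening anything.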

Testing

Add Dict space to observation space parameterization over trajectories in test_types.py
Add explicit tests of ObsDict in test_types.py
Add tests of dict observation spaces to test_rollout.py
Add tests of dict observation spaces to test_bc.py
Add tests of dict observation spaces to density.py

src/imitation/data/types.py

src/imitation/data/huggingface_utils.py

AdamGleave

Overall this design looks good to me. As you mentioned, lots of things still need to be cleaned up, but I think it's going in the right direction. I did a fairly detailed review of data/types.py, data/rollout.py and algorithms/bc.py but only skimmed the rest, so do highlight any other important areas.

src/imitation/data/types.py

src/imitation/data/rollout.py

src/imitation/algorithms/bc.py

AdamGleave

One thing to think about is how much we want to push dict observations through the code. In many cases for the imitation algorithms we could basically just flatten the dict to a NumPy array and "add" dict support with minimal additional code changes.

But that kind of defeats the purpose of adding dict support -- may as well just flatten the observations at the environment level. On the other hand, we have to flatten them at some point: the neural network will take a tensor as input not a dict. So, a lot of the design decision is deciding where to do the flattening.

It seems nice to be able to preserve the dict up until calling the policy. This gives flexibility to the user. In our case, InteractivePolicy can ignore the non-rendering component. In general, a user might want to preprocess different components of the observation differently.
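
To make the "flatten as late as possible" option concrete, here is a minimal sketch of flattening a batched dict observation at the policy boundary. The function `flatten_dict_obs` and its `preprocess` and `ignore` parameters are assumptions made for illustration; they are not part of imitation or SB3. It assumes every value carries a leading batch axis.

```python
from typing import Callable, Dict, Iterable, Optional

import numpy as np

Preprocessor = Callable[[np.ndarray], np.ndarray]


def flatten_dict_obs(
    obs: Dict[str, np.ndarray],
    preprocess: Optional[Dict[str, Preprocessor]] = None,
    ignore: Iterable[str] = (),
) -> np.ndarray:
    """Concatenate a batched dict observation into a (batch, features) array.

    Keys are visited in sorted order so the feature layout is deterministic.
    """
    preprocess = preprocess or {}
    ignored = set(ignore)
    parts = []
    for key in sorted(obs):
        if key in ignored:
            continue  # e.g. a rendering-only component the policy does not need
        value = np.asarray(obs[key], dtype=np.float32)
        if key in preprocess:
            value = preprocess[key](value)  # per-component preprocessing
        parts.append(value.reshape(value.shape[0], -1))
    if not parts:
        raise ValueError("no observation keys left after filtering")
    return np.concatenate(parts, axis=1)


# Usage: scale the image component, leave the state vector untouched.
batch = {
    "image": np.full((2, 4, 4), 255, dtype=np.uint8),
    "state": np.array([[1.0, 2.0], [3.0, 4.0]]),
}
features = flatten_dict_obs(batch, preprocess={"image": lambda x: x / 255.0})
state_only = flatten_dict_obs(batch, ignore={"image"})
```

Because the dict survives until this call, each caller can choose its own `preprocess` and `ignore` settings, which is the flexibility argued for above.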

src/imitation/algorithms/density.py

src/imitation/algorithms/mce_irl.py

src/imitation/data/huggingface_utils.py

AdamGleave · 2023-09-14T22:08:35Z

tests/algorithms/test_bc.py

@@ -371,3 +375,59 @@ def inc_batch_cnt():

    # THEN
    assert batch_cnt == no_yield_after_iter
+
+
+class FloatReward(gym.RewardWrapper):


Why do we need this? Shouldn't the reward already be a float?

Ah, I see SimpleMultiObsEnvs sometimes returns 1 rather than 1.0. This is a bug we should probably fix upstream. I'm a maintainer of SB3 so if you make a PR I can review it.

PR here. Though the environment looks sketchier the more I look at it, e.g. they hardcode the possible transitions from each state in a way that doesn't depend on the gridworld size. So it may be best not to depend on the environment much (or to submit more fixes upstream).

SimpleMultiObsEnvs is just a test env, it is not meant to be used except in tests.

Yep, we'd only be using it in tests as well.
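
The body of `FloatReward` is not shown in the hunk above, so the following is only a guess at the idea: a wrapper that casts whatever reward the environment returns to a Python `float`. To stay runnable without gym installed, it uses a plain duck-typed wrapper and a hypothetical stand-in environment (`IntRewardEnv`) instead of subclassing `gym.RewardWrapper`; the five-element `step` return is an assumption modelled on the Gymnasium API.

```python
from typing import Any, Dict, Tuple

StepResult = Tuple[Dict[str, int], Any, bool, bool, Dict[str, Any]]


class IntRewardEnv:
    """Hypothetical stand-in env that returns an int reward on its final step."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> Dict[str, int]:
        self.t = 0
        return {"state": self.t}

    def step(self, action: int) -> StepResult:
        self.t += 1
        terminated = self.t >= 3
        # Mixed types: 1 (int) at the end, 0.0 (float) otherwise.
        reward = 1 if terminated else 0.0
        return {"state": self.t}, reward, terminated, False, {}


class FloatRewardSketch:
    """Duck-typed wrapper that casts every reward to a Python float."""

    def __init__(self, env: IntRewardEnv) -> None:
        self.env = env

    def reset(self) -> Dict[str, int]:
        return self.env.reset()

    def step(self, action: int) -> StepResult:
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, float(reward), terminated, truncated, info


# Usage: collect the rewards of one three-step episode.
env = FloatRewardSketch(IntRewardEnv())
env.reset()
rewards = [env.step(0)[1] for _ in range(3)]
```

With a real gym environment the same cast would live in the `reward` method of a `gym.RewardWrapper` subclass, and an upstream fix would make the wrapper unnecessary.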

tests/algorithms/test_bc.py

NixGD · 2023-09-14T22:44:34Z

One sad thing to note is we don't support arbitrarily nested dictionaries.
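
One possible workaround, which is not part of this PR, is to flatten nested observations into a single-level dict before they reach the trajectory code. The helper below is a hypothetical sketch; the name `flatten_nested_obs` and the `/` separator are arbitrary choices.

```python
from typing import Any, Dict, Mapping


def flatten_nested_obs(obs: Mapping[str, Any], sep: str = "/") -> Dict[str, Any]:
    """Turn {"a": {"b": x}} into {"a/b": x}, recursing through nested mappings."""
    flat: Dict[str, Any] = {}

    def visit(prefix: str, mapping: Mapping[str, Any]) -> None:
        for key, value in mapping.items():
            name = f"{prefix}{sep}{key}" if prefix else str(key)
            if isinstance(value, Mapping):
                visit(name, value)
            elif name in flat:
                # Two different paths collapsed onto the same flattened key.
                raise ValueError(f"duplicate flattened key: {name!r}")
            else:
                flat[name] = value

    visit("", obs)
    return flat


# Usage: a nested camera observation next to a top-level state entry.
nested = {"camera": {"rgb": 1, "depth": 2}, "state": 3}
flat = flatten_nested_obs(nested)
```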

tests/data/test_types.py

NixGD · 2023-09-15T15:29:50Z

One thing to think about is how much we want to push dict observations through the code. In many cases for the imitation algorithms we could basically just flatten the dict to a NumPy array and "add" dict support with minimal additional code changes.

But that kind of defeats the purpose of adding dict support -- may as well just flatten the observations at the environment level. On the other hand, we have to flatten them at some point: the neural network will take a tensor as input not a dict. So, a lot of the design decision is deciding where to do the flattening.

It seems nice to be able to preserve the dict up until calling the policy. This gives flexibility to the user. In our case, InteractivePolicy can ignore the non-rendering component. In general, a user might want to preprocess different components of the observation differently.

I agree: I think the Policy is the right place to handle the dict -> network input transition. This is consistent with SB3 (see here), although the type signatures in SB3 obfuscate this fact.

ZiyueWang25 · 2023-10-04T14:43:32Z

Finished addressing your comments. Please take another look.

I moved the pytype upgrade to #801 because it turned out to be more complicated than simply fixing a few types.

AdamGleave

LGTM -- one minor typo to fix but no need for a re-review. Can merge once we get CI green.

src/imitation/data/rollout.py

src/imitation/algorithms/dagger.py

codecov · 2023-10-05T19:20:07Z

Codecov Report

Merging #785 (0af3037) into master (573b086) will increase coverage by 0.07%.
The diff coverage is 97.13%.

@@            Coverage Diff             @@
##           master     #785      +/-   ##
==========================================
+ Coverage   96.33%   96.40%   +0.07%     
==========================================
  Files          98       98              
  Lines        9177     9441     +264     
==========================================
+ Hits         8841     9102     +261     
- Misses        336      339       +3

Files                                                 Coverage Δ
src/imitation/algorithms/bc.py                        98.32% <100.00%> (+<0.01%) ⬆️
src/imitation/algorithms/preference_comparisons.py    99.13% <100.00%> (ø)
src/imitation/data/buffer.py                          95.38% <100.00%> (ø)
src/imitation/data/rollout.py                         100.00% <100.00%> (+1.38%) ⬆️
src/imitation/data/wrappers.py                        100.00% <100.00%> (ø)
src/imitation/policies/base.py                        98.07% <100.00%> (+0.16%) ⬆️
src/imitation/policies/exploration_wrapper.py         100.00% <100.00%> (ø)
src/imitation/rewards/reward_wrapper.py               98.41% <100.00%> (+0.07%) ⬆️
tests/algorithms/conftest.py                          100.00% <100.00%> (ø)
tests/algorithms/test_adversarial.py                  100.00% <ø> (ø)
... and 15 more

saeed349 · 2024-03-23T13:44:47Z

Appreciate adding this support.
Would be really good to see dict support extended to other algorithms as well.

NixGD added 5 commits September 13, 2023 13:40

first pass of dict obs functionality

5182ecf

cleanup DictObs

61d816b

add dict space to test_types.py, fix some problems

c3331f6

add dict-obs test for rollout

fc9838d

add bc.py test

fb9498b

NixGD commented Sep 14, 2023

src/imitation/data/types.py

NixGD commented Sep 14, 2023

src/imitation/data/types.py

NixGD commented Sep 14, 2023

src/imitation/data/types.py (outdated)

NixGD commented Sep 14, 2023

src/imitation/data/huggingface_utils.py

NixGD added 7 commits September 14, 2023 11:05

cleanup

e54c36c

small fixes

ee04383

small fixes

6e2218a

fix type error in interactive.py

68fe666

fix introduced error in mce_irl.py

9ad2aaf

fix minor ci complaint

67341d5

add basic dictobs tests

c497b56

AdamGleave reviewed Sep 14, 2023

NixGD added 2 commits September 14, 2023 15:40

change default bc policy for dict obs space

d3f79bf

refine rollout.py typechecks, comments

2de9e49

NixGD added 3 commits September 14, 2023 15:53

check rollout produces dictobs of correct shape

c47cca6

cleanup types and dictobs helpers

276294b

clean useless lines

071d2a7

NixGD commented Sep 14, 2023

tests/data/test_types.py

clean up print statements

a2ccd7e

NixGD commented Sep 14, 2023

tests/data/test_types.py

NixGD and others added 2 commits September 14, 2023 17:48

fix typos

93baa2d

Co-authored-by: Adam Gleave <[email protected]>

assert matching keys in from_obs_list

54f33af

ZiyueWang25 added 18 commits October 3, 2023 22:30

Revert "add RemoveHumanReadableWrapper and update ob space"

27f9dc8

This reverts commit ee83ec5.

Revert "adjust wrapper and not set render_mode inside"

d954fed

This reverts commit a9b32bd.

Revert "fix dict env observation space"

d1131d0

This reverts commit ba6a6a7.

Revert "Add HumanReadableWrapper"

31f8887

This reverts commit 6e5c3e8.

Revert "acts to obs for clarity"

ae9fa64

This reverts commit be79cf5.

Merge branch 'support-dict-obs-space' of github.com:HumanCompatibleAI/imitation into support-dict-obs-space

3dfafd0

address comments

7a2b7ce

new pytype need input directory or file

15541cd

fix np.dtype

6884538

ignore typed-dict-error

5c6e5b8

context manager related fix

5c1d751

keep pytype checking more failures

f5288c6

Revert "keep pytype checking more failures"

6e94dea

This reverts commit f5288c6.

Revert "context manager related fix"

bb1f9cd

This reverts commit 5c1d751.

Revert "ignore typed-dict-error"

a07ea26

This reverts commit 5c6e5b8.

Revert "fix np.dtype"

b2cca2e

This reverts commit 6884538.

Revert "new pytype need input directory or file"

1a24ae5

This reverts commit 15541cd.

Revert "Upgrade pytype and remove workaround for old versions"

b989af8

This reverts commit 194ec1a.

ZiyueWang25 mentioned this pull request Oct 4, 2023

Upgrade pytype #801

Merged

lint fix

4817c2f

ZiyueWang25 added 2 commits October 4, 2023 08:04

fix type check

94c3ecf

fix lint

d5d1918

AdamGleave approved these changes Oct 5, 2023

src/imitation/data/rollout.py

src/imitation/algorithms/dagger.py (outdated)

ZiyueWang25 added 2 commits October 4, 2023 18:55

Merge branch 'master' of github.com:HumanCompatibleAI/imitation into support-dict-obs-space

4df8f83

Merge branch 'master' of github.com:HumanCompatibleAI/imitation into support-dict-obs-space

0af3037

AdamGleave merged commit e6d8886 into master Oct 5, 2023
12 of 15 checks passed

AdamGleave deleted the support-dict-obs-space branch October 5, 2023 19:32

AdamGleave Sep 14, 2023

AdamGleave Sep 14, 2023

NixGD Sep 15, 2023

araffin Sep 16, 2023

NixGD Sep 18, 2023
