Add rgb observation to dagger #802

ZiyueWang25 · 2023-10-04T16:15:55Z

Description

Add an environment wrapper that keeps the original observation and its RGB version together for the interactive policy.
Remove the RGB observation and its space in the BC algorithm.
Test that the interactive policy works with the wrapped environments.

Testing

Unit test + test by examples.

Co-authored-by: Adam Gleave <[email protected]>


ZiyueWang25 · 2023-10-05T16:03:45Z

Some of the changes shown under "Files changed" come from the master branch. This is because this branch was created from an earlier commit of the support-dict-obs-space branch rather than from its latest state. The diff will be clearer once support-dict-obs-space is merged and I can change this PR's base to master.


codecov · 2023-10-05T21:19:24Z

Codecov Report

Merging #802 (33162b2) into master (e6d8886) will decrease coverage by 0.06%.
The diff coverage is 93.16%.

@@            Coverage Diff             @@
##           master     #802      +/-   ##
==========================================
- Coverage   96.40%   96.35%   -0.06%     
==========================================
  Files          98      100       +2     
  Lines        9441     9582     +141     
==========================================
+ Hits         9102     9233     +131     
- Misses        339      349      +10

Files	Coverage Δ
src/imitation/algorithms/dagger.py	`100.00% <100.00%> (ø)`
src/imitation/data/types.py	`97.36% <ø> (ø)`
tests/data/test_types.py	`100.00% <100.00%> (ø)`
tests/policies/test_interactive.py	`100.00% <100.00%> (ø)`
tests/policies/test_obs_update_wrapper.py	`100.00% <100.00%> (ø)`
tests/data/test_wrappers.py	`99.30% <97.56%> (-0.70%)`	⬇️
src/imitation/data/wrappers.py	`98.36% <87.50%> (-1.64%)`	⬇️
src/imitation/policies/interactive.py	`94.73% <66.66%> (-3.42%)`	⬇️
src/imitation/policies/obs_update_wrapper.py	`88.37% <88.37%> (ø)`


AdamGleave

Thanks for this PR! I like the implementation of HumanReadableWrapper and the test cases, very clean.

However, I don't understand the design decision behind how this was integrated with DAgger. As I see it, we're currently special-casing logic in BC and DAgger to catch (and strip out) RGB observations before they reach the learnable policy. Generally it seems preferable to have each part of the code responsible for a small, discrete chunk. One natural split is that the algorithms are responsible for the training process, the policies are responsible for how observations are processed and converted to actions, and other parts of the code are responsible for data collection etc. Putting the logic that strips out RGB observations into the algorithms breaks this abstraction hierarchy.

Sometimes it is necessary to break abstractions, but I'd like to better understand what we're gaining from this and what the alternatives look like before committing to it. To sketch one alternative (which I suspect will need some modification): what if we made policies (or the policies' feature extractors) responsible for removing RGB observations instead? We might be able to do this with a wrapper for a policy, or by just specifying a custom feature extractor (we might need some small changes to the algorithms to let the caller specify the feature extractor, but it'd be a more generic change). There may well be pitfalls to this approach as well -- would love to hear your thoughts.

src/imitation/algorithms/bc.py

src/imitation/algorithms/dagger.py

src/imitation/data/wrappers.py

tests/data/test_wrappers.py

ZiyueWang25 · 2023-10-06T21:19:34Z

Thank you for the detailed comments! I agree with your idea and I think the original design was flawed. I have updated the design to use a policy wrapper. I chose it over a feature extractor because a feature extractor works at the tensor level, while the wrapper can work at the np.ndarray and dict[str, np.ndarray] level, which is closer to the input. I don't have a strong preference about it though.

In this new design, like you said, we don't need to modify any algorithm level code to fix the data level issue. Please take another look.

AdamGleave

@ZiyueWang25 please don't make any further changes to this PR now your period with us is up -- safe travels back to Seattle and have a good weekend :) We'll find someone else to finish off the PR. Including this for informational value for you and for the next developer to pick up on.

It looks like HumanReadableWrapper ends up with an observation space that is inconsistent with the observations it returns (the space does not include the human-readable component that is added). This violates the Gymnasium API and is likely to cause problems somewhere down the road: e.g. if algorithms allocate buffers to store observations based on the declared observation space. This should be fixed.

I suspect having the observation space be unchanged made the policy wrapper easier. It's workable to do it without, but might need to mangle the observation space being fed into the underlying policy. This along with the need for policy wrapper to be specialized to particular kinds (ActorCriticPolicy, OffPolicy, maybe even SACPolicy) makes me think feature extractors are likely the cleaner solution here.

AdamGleave · 2023-10-07T00:49:26Z

examples/train_dagger_atari_interactive_policy.py

+
+def lr_schedule(_: float):
+    # Set lr_schedule to max value to force error if policy.optimizer
+    # is used by mistake (should use self.optimizer instead).


Copy-and-pasted comment doesn't make sense out of context (what is self here?)

AdamGleave · 2023-10-07T00:57:13Z

src/imitation/policies/obs_update_wrapper.py

+from imitation.data import wrappers as data_wrappers
+
+
+class Base(ActorCriticPolicy, abc.ABC):


Given the need to inherit from ActorCriticPolicy, I suggest we instead implement a feature extractor sub-classing CombinedExtractor, having it skip the key (I think this is as simple as modifying observation_space passed through to the constructor of CombinedExtractor). Then set that feature extractor in the policy, and it should work with any kind of policy.

Nit: Base is a bit vague, base of what? Base policy wrapper? Base actor critic policy wrapper?

AdamGleave · 2023-10-07T00:59:34Z

src/imitation/policies/obs_update_wrapper.py

+        else:
+            full_std = True
+            use_expln = False
+        super().__init__(


This is a bit hacky although I don't see a better way of doing this. It's a limitation/gap in the SB3 API that there isn't a PolicyWrapper class -- but this is perhaps also a sign that wrapping policies will fit awkwardly into the API (sorry for putting you on the wrong trail).

AdamGleave · 2023-10-07T01:03:25Z

src/imitation/policies/obs_update_wrapper.py

+        raise ValueError(
+            "Only human readable observation exists, can't remove it",
+        )
+    # keeps the original observation unchanged in case it is used elsewhere.


👍 yes good to avoid side effects where possible, and copying a dict should be cheap as it's just a shallow-copy
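
To illustrate the point about side effects, a minimal dependency-free sketch of the pattern, not the PR's actual code: the function name `remove_hr_obs` and the key `"HR_OBS"` are hypothetical. `dict(obs)` creates a new dictionary that shares the same values, so the caller's observation keeps its human-readable entry and no array data is copied.

```python
from typing import Any, Dict, Mapping

HR_OBS_KEY = "HR_OBS"  # hypothetical key name, not taken from the PR


def remove_hr_obs(obs: Mapping[str, Any], hr_key: str = HR_OBS_KEY) -> Dict[str, Any]:
    """Return a copy of `obs` without the human-readable entry."""
    if hr_key not in obs:
        return dict(obs)
    if len(obs) == 1:
        raise ValueError("Only human readable observation exists, can't remove it")
    # Shallow copy: new dict object, but the values themselves are shared.
    stripped = dict(obs)
    del stripped[hr_key]
    return stripped


original = {"ORI_OBS": [0.1, 0.2], HR_OBS_KEY: [[0, 0, 0]]}
stripped = remove_hr_obs(original)
```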

AdamGleave · 2023-10-07T01:07:16Z

tests/data/test_wrappers.py

-        )
-    assert isinstance(env.observation_space, gym.spaces.Dict)
-    _check_obs_or_space_equal(env.observation_space, expected_obs_space)
+    assert hr_env.observation_space == ori_env.observation_space


How can this be true? Does the observation returned by step() not belong to the observation space (that'd violate the Gymnasium API I think)? Or are we encoding the human readable information somewhere other than the observation?

AdamGleave · 2023-10-07T01:08:45Z

src/imitation/data/wrappers.py

@@ -235,30 +234,8 @@ def __init__(self, env: Env, original_obs_key: str = "ORI_OBS"):
            )
        self._original_obs_key = original_obs_key
        super().__init__(env)
-        self._update_obs_space()


I think we still need the observation space update. From the Gymnasium docs:

The transformation defined in that method must be reflected by the env observation space. Otherwise, you need to specify the new observation space of the wrapper by setting self.observation_space in the __init__() method of your wrapper.

NixGD and others added 30 commits September 13, 2023 13:40

first pass of dict obs functionality

5182ecf

cleanup DictObs

61d816b

add dict space to test_types.py, fix some problems

c3331f6

add dict-obs test for rollout

fc9838d

add bc.py test

fb9498b

cleanup

e54c36c

small fixes

ee04383

small fixes

6e2218a

fix type error in interactive.py

68fe666

fix introduced error in mce_irl.py

9ad2aaf

fix minor ci complaint

67341d5

add basic dictobs tests

c497b56

change default bc policy for dict obs space

d3f79bf

refine rollout.py typechecks, comments

2de9e49

check rollout produces dictobs of correct shape

c47cca6

cleanup types and dictobs helpers

276294b

clean useless lines

071d2a7

clean up print statements

a2ccd7e

fix typos

93baa2d

Co-authored-by: Adam Gleave <[email protected]>

assert matching keys in from_obs_list

54f33af

move maybe_wrap, clean rollout

c711abf

change policy callable to take dict[str, np.ndarray] not dictobs

58a0d70

rollout info wrapper supports dictobs

0f080d4

fix from_obs_list key consistency check

c4d3e11

xfail save/load tests with dictobs

b93294a

doc for dictobs wrapper

3f17ff2

don't error on int observations

0212e0e

lint fixes

070ebf9

cleanup bc test for dict obs

657e17e

cleanup bc.py unwrapping

1f8c12a

ZiyueWang25 mentioned this pull request Oct 5, 2023

Add partial support for dictionary observation spaces (bc, density) #785

Merged

5 tasks

ZiyueWang25 added 7 commits October 4, 2023 19:42

change ob to obs

5dd1699

allow not only dict type obs in dagger

8481890

fix lint and test

b0d8d4b

fix type and test

12b60b2

Merge branch 'master' of github.com:HumanCompatibleAI/imitation into …

afbbe46

…795-rgb-obs-2

fix type

5036fcf

resolve typing issue

3f23de1

ZiyueWang25 mentioned this pull request Oct 5, 2023

Part of solution to #795: add HumanReadableWrapper #796

Closed

ZiyueWang25 changed the title ~~795 rgb obs 2~~ Add rgb observation to dagger Oct 5, 2023

Remove wrong type annotation in test

a44b193

ZiyueWang25 requested a review from AdamGleave October 5, 2023 16:03

Base automatically changed from support-dict-obs-space to master October 5, 2023 19:32

ZiyueWang25 added 2 commits October 5, 2023 13:26

Merge branch 'master' of github.com:HumanCompatibleAI/imitation into …

1073967

…795-rgb-obs-2

resolve conflict

ae17588

AdamGleave reviewed Oct 6, 2023

ZiyueWang25 added 8 commits October 6, 2023 08:28

add policy wrapper

f140035

small fix

9a00e68

fix the data and policy wrappers

052cf00

Use ObservationWrapper

c63761c

update naming

468f621

update tests

3c6def5

update tests

68d1ac2

update demo

125d19d

ZiyueWang25 added 2 commits October 6, 2023 14:21

rgb to hr

f8ebbc4

small fix

33162b2

AdamGleave requested changes Oct 7, 2023
