Add partial support for dictionary observation spaces (bc, density) #785

Merged on Oct 5, 2023 (89 commits)
Changes from 24 commits
Commits
5182ecf
first pass of dict obs functionality
NixGD Sep 13, 2023
61d816b
cleanup DictObs
NixGD Sep 13, 2023
c3331f6
add dict space to test_types.py, fix some problems
NixGD Sep 14, 2023
fc9838d
add dict-obs test for rollout
NixGD Sep 14, 2023
fb9498b
add bc.py test
NixGD Sep 14, 2023
e54c36c
cleanup
NixGD Sep 14, 2023
ee04383
small fixes
NixGD Sep 14, 2023
6e2218a
small fixes
NixGD Sep 14, 2023
68fe666
fix type error in interactive.py
NixGD Sep 14, 2023
9ad2aaf
fix introduced error in mce_irl.py
NixGD Sep 14, 2023
67341d5
fix minor ci complaint
NixGD Sep 14, 2023
c497b56
add basic dictobs tests
NixGD Sep 14, 2023
d3f79bf
change default bc policy for dict obs space
NixGD Sep 14, 2023
2de9e49
refine rollout.py typechecks, comments
NixGD Sep 14, 2023
c47cca6
check rollout produces dictobs of correct shape
NixGD Sep 14, 2023
276294b
cleanup types and dictobs helpers
NixGD Sep 14, 2023
071d2a7
clean useless lines
NixGD Sep 14, 2023
a2ccd7e
clean up print statements
NixGD Sep 14, 2023
93baa2d
fix typos
NixGD Sep 15, 2023
54f33af
assert matching keys in from_obs_list
NixGD Sep 15, 2023
c711abf
move maybe_wrap, clean rollout
NixGD Sep 15, 2023
58a0d70
change policy callable to take dict[str, np.ndarray] not dictobs
NixGD Sep 15, 2023
0f080d4
rollout info wrapper supports dictobs
NixGD Sep 15, 2023
c4d3e11
fix from_obs_list key consistency check
NixGD Sep 15, 2023
b93294a
xfail save/load tests with dictobs
NixGD Sep 15, 2023
3f17ff2
doc for dictobs wrapper
NixGD Sep 15, 2023
0212e0e
don't error on int observations
NixGD Sep 15, 2023
070ebf9
lint fixes
NixGD Sep 15, 2023
657e17e
cleanup bc test for dict obs
NixGD Sep 15, 2023
1f8c12a
cleanup bc.py unwrapping
NixGD Sep 15, 2023
bd70ecd
cleanup rollout.py
NixGD Sep 15, 2023
bec464c
cleanup dictobs interface
NixGD Sep 15, 2023
bef19e6
small cleanups
NixGD Sep 15, 2023
9aaf73f
coverage fixes, test fix
NixGD Sep 15, 2023
5d6aa77
adjust error types
NixGD Sep 15, 2023
86fbcf1
docstrings for type helpers
NixGD Sep 15, 2023
8d1e0d6
add dict obs space support for density
NixGD Sep 15, 2023
96978d5
fix typos
NixGD Sep 15, 2023
e95df9d
Adam suggestions from code review
NixGD Sep 16, 2023
161ec95
small changes for code review
NixGD Sep 16, 2023
90bdf57
fix docstring
NixGD Sep 16, 2023
6aa25ff
remove FloatReward
ZiyueWang25 Oct 2, 2023
bf48c76
Merge remote-tracking branch 'origin/master' into support-dict-obs-space
ZiyueWang25 Oct 2, 2023
4ce1b57
Fix test_bc
ZiyueWang25 Oct 2, 2023
de1b1c8
Turn off GPU finding to avoid using gpu device
ZiyueWang25 Oct 2, 2023
1a1a458
Check None to ensure __add__ can work
ZiyueWang25 Oct 2, 2023
f7866f4
fix docstring
ZiyueWang25 Oct 2, 2023
daa838d
bypass pytype and lint test
ZiyueWang25 Oct 2, 2023
803eab0
format with black
ZiyueWang25 Oct 2, 2023
0ac6f54
Test dict space in density algo
ZiyueWang25 Oct 2, 2023
be9798b
black format
ZiyueWang25 Oct 2, 2023
c7e6809
small fix
ZiyueWang25 Oct 2, 2023
82fb558
Add DictObs into test_wrappers
ZiyueWang25 Oct 3, 2023
03714cc
fix format
ZiyueWang25 Oct 3, 2023
187e881
minor fix
ZiyueWang25 Oct 3, 2023
ae96521
type and lint fix
ZiyueWang25 Oct 3, 2023
535a986
Add policy training test
ZiyueWang25 Oct 3, 2023
de027c4
suppress line too long lint check on a line
ZiyueWang25 Oct 3, 2023
be79cf5
acts to obs for clarity
ZiyueWang25 Oct 3, 2023
6e5c3e8
Add HumanReadableWrapper
ZiyueWang25 Oct 3, 2023
ba6a6a7
fix dict env observation space
ZiyueWang25 Oct 3, 2023
a9b32bd
adjust wrapper and not set render_mode inside
ZiyueWang25 Oct 3, 2023
77eab66
Add additional obs check
AdamGleave Oct 4, 2023
194ec1a
Upgrade pytype and remove workaround for old versions
AdamGleave Oct 4, 2023
44b357e
Fix test_rollout test
AdamGleave Oct 4, 2023
ee83ec5
add RemoveHumanReadableWrapper and update ob space
ZiyueWang25 Oct 4, 2023
27f9dc8
Revert "add RemoveHumanReadableWrapper and update ob space"
ZiyueWang25 Oct 4, 2023
d954fed
Revert "adjust wrapper and not set render_mode inside"
ZiyueWang25 Oct 4, 2023
d1131d0
Revert "fix dict env observation space"
ZiyueWang25 Oct 4, 2023
31f8887
Revert "Add HumanReadableWrapper"
ZiyueWang25 Oct 4, 2023
ae9fa64
Revert "acts to obs for clarity"
ZiyueWang25 Oct 4, 2023
3dfafd0
Merge branch 'support-dict-obs-space' of github.com:HumanCompatibleAI…
ZiyueWang25 Oct 4, 2023
7a2b7ce
address comments
ZiyueWang25 Oct 4, 2023
15541cd
new pytype need input directory or file
ZiyueWang25 Oct 4, 2023
6884538
fix np.dtype
ZiyueWang25 Oct 4, 2023
5c6e5b8
ignore typed-dict-error
ZiyueWang25 Oct 4, 2023
5c1d751
context manager related fix
ZiyueWang25 Oct 4, 2023
f5288c6
keep pytype checking more failures
ZiyueWang25 Oct 4, 2023
6e94dea
Revert "keep pytype checking more failures"
ZiyueWang25 Oct 4, 2023
bb1f9cd
Revert "context manager related fix"
ZiyueWang25 Oct 4, 2023
a07ea26
Revert "ignore typed-dict-error"
ZiyueWang25 Oct 4, 2023
b2cca2e
Revert "fix np.dtype"
ZiyueWang25 Oct 4, 2023
1a24ae5
Revert "new pytype need input directory or file"
ZiyueWang25 Oct 4, 2023
b989af8
Revert "Upgrade pytype and remove workaround for old versions"
ZiyueWang25 Oct 4, 2023
4817c2f
lint fix
ZiyueWang25 Oct 4, 2023
94c3ecf
fix type check
ZiyueWang25 Oct 4, 2023
d5d1918
fix lint
ZiyueWang25 Oct 4, 2023
4df8f83
Merge branch 'master' of github.com:HumanCompatibleAI/imitation into …
ZiyueWang25 Oct 5, 2023
0af3037
Merge branch 'master' of github.com:HumanCompatibleAI/imitation into …
ZiyueWang25 Oct 5, 2023
2 changes: 1 addition & 1 deletion setup.py
@@ -17,7 +17,7 @@
 ATARI_REQUIRE = [
     "seals[atari]~=0.2.1",
 ]
-PYTYPE = ["pytype==2023.9.27"] if IS_NOT_WINDOWS else []
+PYTYPE = ["pytype==2022.7.26"] if IS_NOT_WINDOWS else []

 # Note: the versions of the test and doc requirements should be tightly pinned to known
 # working versions to make our CI/CD pipeline as stable as possible.
2 changes: 1 addition & 1 deletion src/imitation/algorithms/dagger.py
@@ -508,7 +508,7 @@ def create_trajectory_collector(self) -> InteractiveTrajectoryCollector:
         beta = self.beta_schedule(self.round_num)
         collector = InteractiveTrajectoryCollector(
             venv=self.venv,
-            get_robot_acts=lambda obs: self.bc_trainer.policy.predict(obs)[0],
+            get_robot_acts=lambda acts: self.bc_trainer.policy.predict(acts)[0],
             beta=beta,
             save_dir=save_dir,
             rng=self.rng,
9 changes: 5 additions & 4 deletions src/imitation/algorithms/density.py
@@ -145,14 +145,15 @@ def _get_demo_from_batch(
             )

         assert act_b.shape[1:] == self.venv.action_space.shape
-
+        ob_space = self.venv.observation_space
         if isinstance(obs_b, types.DictObs):
-            exp_shape = {k: v.shape for k, v in self.venv.observation_space.items()}  # type: ignore[attr-defined] # noqa: E501
-
+            exp_shape = {
+                k: v.shape for k, v in ob_space.items()  # type: ignore[attr-defined]
+            }
             obs_shape = {k: v.shape[1:] for k, v in obs_b.items()}
             assert exp_shape == obs_shape, f"Expected {exp_shape}, got {obs_shape}"
         else:
-            assert obs_b.shape[1:] == self.venv.observation_space.shape
+            assert obs_b.shape[1:] == ob_space.shape
         assert len(act_b) == len(obs_b)
         if next_obs_b is not None:
             assert next_obs_b.shape == obs_b.shape
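The dictionary branch of the check in density.py compares per-key shapes after dropping the leading batch axis. A minimal sketch of that comparison, using plain tuples in place of array and space shapes; the helper name `check_batch_shapes` and the example keys are hypothetical and not part of imitation:

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def check_batch_shapes(
    space_shapes: Dict[str, Shape],
    batch_shapes: Dict[str, Shape],
) -> None:
    """Raise AssertionError unless every batched shape matches its space shape.

    Each entry of batch_shapes carries a leading batch dimension, which is
    dropped before comparing (the role of `v.shape[1:]` in the diff).
    """
    obs_shape = {key: shape[1:] for key, shape in batch_shapes.items()}
    assert space_shapes == obs_shape, f"Expected {space_shapes}, got {obs_shape}"


# Hypothetical observation space with an image-like and a vector-like entry.
space = {"img": (1, 64, 64), "vec": (5,)}

# A batch of 32 observations with matching per-key shapes passes silently.
check_batch_shapes(space, {"img": (32, 1, 64, 64), "vec": (32, 5)})

# A channel-last image does not match the channel-first space shape.
try:
    check_batch_shapes(space, {"img": (32, 64, 64, 1), "vec": (32, 5)})
except AssertionError as err:
    mismatch_message = str(err)
```

Comparing whole dictionaries also catches missing or extra keys, since two dicts with different key sets are never equal.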
7 changes: 3 additions & 4 deletions src/imitation/data/rollout.py
@@ -490,10 +490,9 @@ def generate_trajectories(
                 assert v.shape is not None
                 exp_obs[k] = (n_steps + 1,) + v.shape
         else:
-            assert venv.observation_space.shape is not None
-            exp_obs = (
-                n_steps + 1,
-            ) + venv.observation_space.shape  # type: ignore[assignment]
+            obs_space_shape = venv.observation_space.shape
+            assert obs_space_shape is not None
+            exp_obs = (n_steps + 1,) + obs_space_shape  # type: ignore[assignment]
         real_obs = trajectory.obs.shape
         assert real_obs == exp_obs, f"expected shape {exp_obs}, got {real_obs}"
         assert venv.action_space.shape is not None
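The rollout.py check expects one more observation than actions in each trajectory, for both flat and dictionary observation spaces. A small sketch of that shape arithmetic, with shapes written as tuples; `expected_obs_shape` is a hypothetical helper for illustration only, not a function in imitation:

```python
from typing import Dict, Tuple, Union

Shape = Tuple[int, ...]


def expected_obs_shape(
    n_steps: int,
    space_shape: Union[Shape, Dict[str, Shape]],
) -> Union[Shape, Dict[str, Shape]]:
    """Return the observation shape expected for a trajectory of n_steps actions.

    A trajectory stores the observation before each action plus the final
    observation, hence n_steps + 1 along the leading axis.
    """
    if isinstance(space_shape, dict):
        return {key: (n_steps + 1,) + shape for key, shape in space_shape.items()}
    return (n_steps + 1,) + space_shape


flat = expected_obs_shape(10, (4,))
nested = expected_obs_shape(10, {"img": (1, 64, 64), "vec": (5,)})
```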
61 changes: 2 additions & 59 deletions src/imitation/data/wrappers.py
@@ -1,18 +1,14 @@
 """Environment wrappers for collecting rollouts."""

-from typing import List, Optional, Sequence, Tuple, Dict, Union
+from typing import List, Optional, Sequence, Tuple

 import gymnasium as gym
-from gymnasium.core import Env
 import numpy as np
 import numpy.typing as npt
 from stable_baselines3.common.vec_env import VecEnv, VecEnvWrapper

 from imitation.data import rollout, types

-# The key for human readable data in the observation.
-HR_OBS_KEY = "HR_OBS"
-

 class BufferingWrapper(VecEnvWrapper):
     """Saves transitions of underlying VecEnv.
@@ -174,7 +170,7 @@ def pop_transitions(self) -> types.TransitionsWithRew:


 class RolloutInfoWrapper(gym.Wrapper):
-    """Adds the entire episode's rewards and observations to `info` at episode end.
+    """Add the entire episode's rewards and observations to `info` at episode end.

     Whenever done=True, `info["rollouts"]` is a dict with keys "obs" and "rews", whose
     corresponding values hold the NumPy arrays containing the raw observations and
@@ -210,56 +206,3 @@ def step(self, action):
                 "rews": np.stack(self._rews),
             }
         return obs, rew, terminated, truncated, info
-
-
-class HumanReadableWrapper(gym.Wrapper):
-    """Adds human-readable observation to `obs` at every step."""
-
-    def __init__(self, env: Env, original_obs_key: str = "ORI_OBS"):
-        """Builds HumanReadableWrapper
-
-        Args:
-            env: Environment to wrap.
-            original_obs_key: The key for original observation if the original
-                observation is not in dict format.
-        """
-        env.render_mode = "rgb_array"
-        self._original_obs_key = original_obs_key
-        super().__init__(env)
-
-    def _add_hr_obs(
-        self, obs: Union[np.ndarray, Dict[str, np.ndarray]]
-    ) -> Dict[str, np.ndarray]:
-        """Adds human-readable observation to obs.
-
-        Transforms obs into dictionary if it is not already, and adds the human-readable
-        observation from `env.render()` under the key HR_OBS_KEY.
-
-        Args:
-            obs: Observation from environment.
-
-        Returns:
-            Observation dictionary with the human-readable data
-
-        Raises:
-            KeyError: When the key HR_OBS_KEY already exists in the observation
-                dictionary.
-        """
-        assert self.env.render_mode is not None
-        assert self.env.render_mode == "rgb_array"
-        hr_obs = self.env.render()
-        if not isinstance(obs, Dict):
-            obs = {self._original_obs_key: obs}
-
-        if HR_OBS_KEY in obs:
-            raise KeyError(f"{HR_OBS_KEY!r} already exists in observation dict")
-        obs[HR_OBS_KEY] = hr_obs
-        return obs
-
-    def reset(self, **kwargs):
-        obs, info = super().reset(**kwargs)
-        return self._add_hr_obs(obs), info
-
-    def step(self, action):
-        obs, rew, terminated, truncated, info = self.env.step(action)
-        return self._add_hr_obs(obs), rew, terminated, truncated, info
23 changes: 22 additions & 1 deletion tests/algorithms/conftest.py
@@ -1,9 +1,11 @@
 """Fixtures common across algorithm tests."""
 from typing import Sequence

+import gymnasium as gym
 import pytest
+from stable_baselines3.common import envs
 from stable_baselines3.common.policies import BasePolicy
-from stable_baselines3.common.vec_env import VecEnv
+from stable_baselines3.common.vec_env import DummyVecEnv, VecEnv

 from imitation.algorithms import bc
 from imitation.data.types import TrajectoryWithRew
@@ -109,3 +111,22 @@ def pendulum_single_venv(rng) -> VecEnv:
         post_wrappers=[lambda env, _: RolloutInfoWrapper(env)],
         rng=rng,
     )
+
+
+# TODO(GH#794): Remove after https://github.com/DLR-RM/stable-baselines3/pull/1676
+# merged and released.
+class FloatReward(gym.RewardWrapper):
+    """Typecasts reward to a float."""
+
+    def reward(self, reward):
+        return float(reward)
+
+
+@pytest.fixture
+def multi_obs_venv() -> VecEnv:
+    def make_env():
+        env = envs.SimpleMultiObsEnv(channel_last=False)
+        env = FloatReward(env)
+        return RolloutInfoWrapper(env)
+
+    return DummyVecEnv([make_env, make_env])
5 changes: 4 additions & 1 deletion tests/algorithms/test_base.py
@@ -41,7 +41,10 @@ def test_check_fixed_horizon_flag(custom_logger):


 def _make_and_iterate_loader(*args, **kwargs):
-    loader = base.make_data_loader(*args, **kwargs)
+    # our pytype version doesn't understand optional arguments in TypedDict
+    # this is fixed in 2023.04.11, but we require 2022.7.26
+    # See https://github.com/google/pytype/issues/1195
+    loader = base.make_data_loader(*args, **kwargs)  # pytype: disable=wrong-arg-types
     for batch in loader:
         pass

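The comment in test_base.py concerns optional keys in a `TypedDict` that is unpacked into keyword arguments. As an illustration of the general pattern only, here is a self-contained sketch; `LoaderKwargs` and `make_batches` are hypothetical names and do not reflect the actual signature of `base.make_data_loader`:

```python
from typing import List, TypedDict


class _RequiredLoaderKwargs(TypedDict):
    # Keys declared here must always be present.
    batch_size: int


class LoaderKwargs(_RequiredLoaderKwargs, total=False):
    # Keys declared under total=False may be omitted by the caller.
    drop_last: bool


def make_batches(
    data: List[int],
    batch_size: int,
    drop_last: bool = False,
) -> List[List[int]]:
    """Split data into consecutive batches, optionally dropping a short tail."""
    batches = [data[i : i + batch_size] for i in range(0, len(data), batch_size)]
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


# The optional key is left out; the function default applies.
minimal: LoaderKwargs = {"batch_size": 2}
kept_tail = make_batches([1, 2, 3, 4, 5], **minimal)

# The optional key is supplied explicitly.
full: LoaderKwargs = {"batch_size": 2, "drop_last": True}
dropped_tail = make_batches([1, 2, 3, 4, 5], **full)
```

A type checker without support for optional `TypedDict` keys may flag the first call as missing an argument, which is the kind of false positive the `pytype: disable` comment in the diff works around.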
30 changes: 7 additions & 23 deletions tests/algorithms/test_bc.py
@@ -10,7 +10,6 @@
 import numpy as np
 import pytest
 import torch as th
-from stable_baselines3.common import envs as sb_envs
 from stable_baselines3.common import evaluation
 from stable_baselines3.common import policies as sb_policies
 from stable_baselines3.common import vec_env
@@ -291,44 +290,29 @@ def test_that_policy_reconstruction_preserves_parameters(
         th.testing.assert_close(original, reconstructed)


-# TODO(GH#794): Remove after https://github.com/DLR-RM/stable-baselines3/pull/1676
-# merged and released.
-class FloatReward(gym.RewardWrapper):
-    """Typecasts reward to a float."""
-
-    def reward(self, reward):
-        return float(reward)
-
-
-def test_dict_space():
-    def make_env():
-        env = sb_envs.SimpleMultiObsEnv(channel_last=False)
-        env = FloatReward(env)
-        return RolloutInfoWrapper(env)
-
-    env = vec_env.DummyVecEnv([make_env, make_env])
-
+def test_dict_space(multi_obs_venv: vec_env.VecEnv):
     # multi-input policy to accept dict observations
+    assert isinstance(multi_obs_venv.observation_space, gym.spaces.Dict)
     policy = sb_policies.MultiInputActorCriticPolicy(
-        env.observation_space,
-        env.action_space,
+        multi_obs_venv.observation_space,
+        multi_obs_venv.action_space,
         lambda _: 0.001,
     )
     rng = np.random.default_rng()

     # sample random transitions
     rollouts = rollout.rollout(
         policy=None,
-        venv=env,
+        venv=multi_obs_venv,
         sample_until=rollout.make_sample_until(min_timesteps=None, min_episodes=50),
         rng=rng,
         unwrap=True,
     )
     transitions = rollout.flatten_trajectories(rollouts)
     bc_trainer = bc.BC(
-        observation_space=env.observation_space,
+        observation_space=multi_obs_venv.observation_space,
         policy=policy,
-        action_space=env.action_space,
+        action_space=multi_obs_venv.action_space,
         rng=rng,
         demonstrations=transitions,
     )
27 changes: 5 additions & 22 deletions tests/algorithms/test_density_baselines.py
@@ -7,13 +7,11 @@
 import numpy as np
 import pytest
 import stable_baselines3
-from stable_baselines3.common import envs as sb_envs
 from stable_baselines3.common import policies, vec_env

 from imitation.algorithms.density import DensityAlgorithm, DensityType
 from imitation.data import rollout, types
 from imitation.data.types import TrajectoryWithRew
-from imitation.data.wrappers import RolloutInfoWrapper
 from imitation.policies.base import RandomPolicy
 from imitation.testing import reward_improvement

@@ -172,27 +170,12 @@ def test_density_trainer_raises(
         density_trainer.set_demonstrations("foo")  # type: ignore[arg-type]


-# TODO(GH#794): Remove after https://github.com/DLR-RM/stable-baselines3/pull/1676
-# merged and released.
-class FloatReward(gym.RewardWrapper):
-    """Typecasts reward to a float."""
-
-    def reward(self, reward):
-        return float(reward)
-
-
-def test_dict_space():
-    def make_env():
-        env = sb_envs.SimpleMultiObsEnv(channel_last=False)
-        env = FloatReward(env)
-        return RolloutInfoWrapper(env)
-
-    venv = vec_env.DummyVecEnv([make_env, make_env])
-
+def test_dict_space(multi_obs_venv: vec_env.VecEnv):
     # multi-input policy to accept dict observations
+    assert isinstance(multi_obs_venv.observation_space, gym.spaces.Dict)
     rl_algo = stable_baselines3.PPO(
         policies.MultiInputActorCriticPolicy,
-        venv,
+        multi_obs_venv,
         n_steps=10,  # small value to make test faster
         n_epochs=2,  # small value to make test faster
     )
@@ -202,14 +185,14 @@ def make_env():
     sample_until = rollout.make_min_episodes(15)
     rollouts = rollout.rollout(
         policy=None,
-        venv=venv,
+        venv=multi_obs_venv,
         sample_until=sample_until,
         rng=rng,
     )
     density_trainer = DensityAlgorithm(
         demonstrations=rollouts,
         kernel="gaussian",
-        venv=venv,
+        venv=multi_obs_venv,
         rl_algo=rl_algo,
         kernel_bandwidth=0.2,
         standardise_inputs=True,
3 changes: 2 additions & 1 deletion tests/data/test_rollout.py
@@ -423,5 +423,6 @@ def test_dictionary_observations(rng):
     )
     for traj in trajs:
         assert isinstance(traj.obs, types.DictObs)
-        assert venv.observation_space.contains(obs)
+        for obs in traj.obs:
+            assert venv.observation_space.contains(dict(obs.items()))
         np.testing.assert_allclose(traj.obs.get("a") / 2, traj.obs.get("b"))
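The updated test in test_rollout.py walks a stacked dictionary observation one timestep at a time and checks each step against the observation space. A rough pure-Python sketch of that idea follows; `MiniDictObs` and `contains` are hypothetical stand-ins, not imitation's `DictObs` or gymnasium's `Space.contains`, and unlike the diff this sketch yields plain dicts directly when iterating:

```python
from typing import Dict, Iterator, List, Tuple


class MiniDictObs:
    """Hypothetical stand-in for a stacked dict observation."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        if not data:
            raise ValueError("need at least one key")
        lengths = {len(values) for values in data.values()}
        if len(lengths) != 1:
            raise ValueError(f"conflicting lengths: {sorted(lengths)}")
        self._data = {key: list(values) for key, values in data.items()}

    def __len__(self) -> int:
        # All keys share one length, so any entry gives the timestep count.
        return len(next(iter(self._data.values())))

    def __iter__(self) -> Iterator[Dict[str, float]]:
        # Yield one plain dict per timestep.
        for i in range(len(self)):
            yield {key: values[i] for key, values in self._data.items()}

    def get(self, key: str) -> List[float]:
        return self._data[key]


def contains(bounds: Dict[str, Tuple[float, float]], obs: Dict[str, float]) -> bool:
    """Check that obs has exactly the expected keys and each value is in range."""
    if obs.keys() != bounds.keys():
        return False
    return all(low <= obs[key] <= high for key, (low, high) in bounds.items())


bounds = {"a": (0.0, 10.0), "b": (0.0, 5.0)}
traj_obs = MiniDictObs({"a": [0.0, 2.0, 4.0], "b": [0.0, 1.0, 2.0]})

# Per-timestep containment check, analogous to the loop added in the diff.
all_valid = all(contains(bounds, obs) for obs in traj_obs)

# Whole-trajectory relation between keys, analogous to the final assertion.
halved = [a / 2 for a in traj_obs.get("a")]
```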