
Added a ready notebook for DAgger baseline #236

Draft: wants to merge 11 commits into base branch daffy
124 changes: 102 additions & 22 deletions in learning/imitation/iil-dagger/README.md
# Imitation Learning

## Introduction

In this baseline we train a small SqueezeNet model on expert trajectories to clone the behavior of the expert.
Using only the expert trajectories would result in a model unable to recover from non-optimal positions; instead, we use DAgger, a dataset-aggregation technique that mixes the expert and learner policies while collecting data.
This random mixing helps the model learn a more general trajectory than the optimal one provided by the expert alone.
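To make the mixing concrete, here is a minimal sketch of a DAgger collection loop. It is illustrative only: `DummyEnv`, `expert_act`, and `Learner` are hypothetical stand-ins, not this repo's classes, and `decay` plays the role of the `--decay` flag described under Training.

```python
import random

class DummyEnv:
    """Stub environment: the observation is a single float."""
    def reset(self):
        self.state = 1.0
        return self.state

    def step(self, action):
        self.state += action
        return self.state, 0.0, False, {}  # obs, reward, done, info

def expert_act(obs):
    return -0.1 * obs  # toy expert: push the state back toward zero

class Learner:
    def act(self, obs):
        return 0.0  # an untrained learner does nothing

    def fit(self, dataset):
        pass  # supervised training on (observation, expert action) pairs

env, learner = DummyEnv(), Learner()
alpha, decay = 1.0, 0.7  # expert-mixing probability and its per-episode decay
dataset = []             # the aggregated dataset

for episode in range(10):        # cf. --episode
    obs = env.reset()
    for _ in range(64):          # cf. --horizon
        expert_action = expert_act(obs)
        # Mixed policy: follow the expert with probability alpha, else the learner.
        action = expert_action if random.random() < alpha else learner.act(obs)
        # Always label with the expert's action, so the model learns to recover
        # from the non-optimal states the learner itself visits.
        dataset.append((obs, expert_action))
        obs, _, done, _ = env.step(action)
        if done:
            break
    learner.fit(dataset)  # retrain on everything collected so far
    alpha *= decay        # trust the learner more as it improves
```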

## Quick start

Use the Jupyter notebook `notebook.ipynb` to quickly start training and testing the DAgger imitation-learning baseline.

## Detailed Steps

### Clone the repo

Clone this [repo](https://github.com/duckietown/gym-duckietown):

    $ git clone https://github.com/duckietown/gym-duckietown.git
    $ cd gym-duckietown

### Installing Packages

    $ pip3 install -e .

## Training

    $ python -m learning.imitation.iil-dagger.train

### Arguments

* --episode: number of episodes
* --horizon: number of steps per episode
* --learning-rate: index of the learning rate in the list [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]
* --decay: mixing decay between expert and learner, chosen from [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95]
* --save-path: directory used to save the output model
* --map-name: name of the map used during training
* --num-outputs: number of outputs from the learner model; 1 predicts only angular velocity with a fixed speed, 2 predicts both
* --domain-rand: flag to enable domain randomization, so the trained model can transfer to the real world
* --randomize-map: randomize training maps on reset
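
For example, a training run combining these flags might look like the following; the values are illustrative, not tuned recommendations (index 2 selects the learning rate 1e-3):

    $ python -m learning.imitation.iil-dagger.train --episode 10 --horizon 64 --learning-rate 2 --decay 0.7 --save-path iil_baseline --map-name loop_empty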

## Testing

    $ python -m learning.imitation.iil-dagger.test

### Arguments

* --model-path: path of the model to be tested
* --episode: number of episodes
* --horizon: number of steps per episode
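
A hypothetical invocation, assuming the model was saved under `iil_baseline` (the file name below is a placeholder):

    $ python -m learning.imitation.iil-dagger.test --model-path iil_baseline/model.pt --episode 3 --horizon 128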

## Submitting

Use the [PyTorch RL Template](https://github.com/duckietown/challenge-aido_LF-template-pytorch) and replace its model with the one trained in model/squeezenet.py,
then use the following code snippet to convert speed and angular velocity to left and right PWM signals.
```python
velocity, omega = self.compute_action(self.current_image)

# assuming the same motor constant k for both motors
k_r = 27.0
k_l = 27.0
gain = 1.0
trim = 0.0

# adjust k by gain and trim
k_r_inv = (gain + trim) / k_r
k_l_inv = (gain - trim) / k_l

wheel_dist = 0.102  # distance between the wheels [m]
radius = 0.0318     # wheel radius [m]

# differential-drive inverse kinematics: wheel rotation rates from (v, omega)
omega_r = (velocity + 0.5 * omega * wheel_dist) / radius
omega_l = (velocity - 0.5 * omega * wheel_dist) / radius

# convert motor rotation rate to duty cycle
u_r = omega_r * k_r_inv
u_l = omega_l * k_l_inv

# clamp the output to the duty-cycle limit, which is 1.0 for the Duckiebot
pwm_right = max(min(u_r, 1), -1)
pwm_left = max(min(u_l, 1), -1)
```
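
Here, `gain` and `trim` are set to neutral defaults (1.0 and 0.0); on a physical Duckiebot these are per-robot calibration values, so it is safer to load them from the robot's kinematics calibration than to hard-code them.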

## Acknowledgment

* We started from previous work done by Manfred Díaz as a boilerplate, and we would like to thank him for his full support with the code and for answering our questions.

## Authors

* [Mostafa ElAraby](https://www.mostafaelaraby.com/)
  * [LinkedIn](https://linkedin.com/in/mostafaelaraby)
* Ramon Emiliani
  * [LinkedIn](https://www.linkedin.com/in/ramonemiliani)

## References
- Implementation idea and code skeleton based on Diaz Cabrera, Manfred Ramon (2018). Interactive and Uncertainty-aware Imitation Learning: Theory and Applications. Masters thesis, Concordia University.

```

@mastersthesis{diaz2018interactive,
  title={Interactive and Uncertainty-aware Imitation Learning: Theory and Applications},
  author={Diaz Cabrera, Manfred Ramon},
  year={2018},
  school={Concordia University}
}

@inproceedings{ross2011reduction,
  title={A reduction of imitation learning and structured prediction to no-regret online learning},
  author={Ross, St{\'e}phane and Gordon, Geoffrey and Bagnell, Drew},
  booktitle={Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
  pages={627--635},
  year={2011}
}

@article{iandola2016squeezenet,
  title={SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size},
  author={Iandola, Forrest N and Han, Song and Moskewicz, Matthew W and Ashraf, Khalid and Dally, William J and Keutzer, Kurt},
  journal={arXiv preprint arXiv:1602.07360},
  year={2016}
}
```
@@ -148,7 +148,7 @@ def _transform(self, observations, expert_actions):
]
)

- observations = [compose_obs(observation).numpy() for observation in observations]
+ observations = [compose_obs(observation).cpu().numpy() for observation in observations]
try:
# scaling velocity to become in 0-1 range which is multiplied by max speed to get actual vel
# also scaling steering angle to become in range -1 to 1 to make it easier to regress
@@ -158,7 +158,7 @@
]
except:
pass
- expert_actions = [torch.tensor(expert_action).numpy() for expert_action in expert_actions]
+ expert_actions = [torch.tensor(expert_action).cpu().numpy() for expert_action in expert_actions]

return observations, expert_actions

6 changes: 3 additions & 3 deletions in learning/imitation/iil-dagger/model/squeezenet.py

@@ -38,7 +38,7 @@ def __init__(self, num_outputs=2, max_velocity=0.7, max_steering=np.pi / 2):
self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
self.model = models.squeezenet1_1()
self.num_outputs = num_outputs
- self.max_velocity_tensor = torch.tensor(max_velocity).to(self._device)
+ self.max_velocity_tensor = torch.tensor([max_velocity]).to(self._device)
self.max_steering = max_steering

# using a subset of full squeezenet for input image features
@@ -117,12 +117,12 @@ def predict(self, *args):
output = self.model(images)
if self.num_outputs == 1:
omega = output
- v_tensor = self.max_velocity_tensor.clone()
+ v_tensor = self.max_velocity_tensor.clone().unsqueeze(1)
else:
v_tensor = output[:, 0].unsqueeze(1)
omega = output[:, 1].unsqueeze(1) * self.max_steering
output = torch.cat((v_tensor, omega), 1).squeeze().detach()
- return output
+ return output.cpu().numpy()


if __name__ == "__main__":