sanity check by training with sac or tqc #64

Open · Tracked by #51
Armandpl opened this issue Jan 23, 2024 · 8 comments · Fixed by #71
Comments


Armandpl commented Jan 23, 2024

I want to train the robot in very few steps and very quickly in wall-clock time, but I haven't completed a training run on the robot yet. I should do that first as a sanity check, to make sure there is nothing wrong with the robot, the laptop/robot comms, or the env code.

repro the training run from last time:

  • 200k steps, max theta speed 50 rad/s, took ~6 hours, used the action limiter
  • could use the exact same setup, but I'm tempted to try to make it slightly better
  • use TQC instead of SAC
  • write a DeadZone wrapper that remaps the action to leave a dead zone between -0.3 and 0.3 (see the sketch after this list)
  • what do we think about the action limiter? maybe set that up again? or just let the agent spin freely around theta and make the episodes shorter, e.g. 400 steps, to avoid the pendulum just spinning on itself? try in sim first
  • if this succeeds with the DeadZone wrapper, investigate what happens without it
    • use the mcap wrapper and train for ~5k steps
    • why is the action very low before learning starts? why is the action too big once learning starts?
    • is the fact that the init state is always zeros an issue?
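
A minimal sketch of what that DeadZone wrapper could look like, assuming the action space is [-1, 1] and that "leave a dead zone" means nonzero actions should map straight past ±0.3 (the class name and this reading are my assumptions, not the final design):

```python
import gymnasium as gym
import numpy as np


class DeadZoneWrapper(gym.ActionWrapper):
    """Remap actions in [-1, 1] so any nonzero command skips the motor's
    dead zone: |a| in (0, 1] maps into (deadzone, 1], sign preserved.
    Exact zero stays zero because np.sign(0) == 0."""

    def __init__(self, env: gym.Env, deadzone: float = 0.3):
        super().__init__(env)
        self.deadzone = deadzone

    def action(self, action: np.ndarray) -> np.ndarray:
        # Linearly squeeze [0, 1] magnitudes into [deadzone, 1].
        return np.sign(action) * (self.deadzone + np.abs(action) * (1.0 - self.deadzone))
```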
Armandpl mentioned this issue Jan 23, 2024
Armandpl commented:

  • find out how to change the stats window w/ sbx (see the sketch below)
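
In SB3 >= 2.0 the rolling window for ep_rew_mean/ep_len_mean is the stats_window_size constructor kwarg; since sbx builds on the SB3 base classes it should accept the same argument, but worth verifying (the env id below is a stand-in):

```python
from sbx import TQC

# stats_window_size shrinks the episode-stats rolling window from the
# default 100 episodes to 10; double-check that the sbx version in use
# actually forwards this kwarg.
model = TQC("MlpPolicy", "Pendulum-v1", stats_window_size=10)
```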


Armandpl commented Jan 25, 2024

The robot arm broke, so I can't secure it to the motor shaft anymore.
The motor also starts making weird noises when the action oscillates too much; it sounds like the gears are slipping. I disassembled the gearbox and it seems ok. Maybe there is a bit of play and the vibrations are making a gear pop out of place??

  • print a new arm
  • check that TQC is using gSDE
    • run 1000 training steps with SAC vs. TQC and log the results to mcap files to take a look at the action (see the sketch below)
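
A rough sketch of that comparison, using Pendulum-v1 as a stand-in for the real env and a plain .npy dump instead of the project's mcap wrapper (both stand-ins are assumptions):

```python
import numpy as np
from sbx import SAC, TQC

ENV_ID = "Pendulum-v1"  # stand-in for the furuta env

for algo in (SAC, TQC):
    model = algo("MlpPolicy", ENV_ID, use_sde=True, verbose=0)
    model.learn(total_timesteps=1_000)

    # Roll the policy for 100 steps and save the raw actions to compare
    # how noisy they are under each algorithm.
    env = model.get_env()
    obs = env.reset()
    actions = []
    for _ in range(100):
        action, _ = model.predict(obs, deterministic=False)
        actions.append(np.asarray(action).copy())
        obs, _, _, _ = env.step(action)
    np.save(f"{algo.__name__.lower()}_actions.npy", np.stack(actions))
```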


Armandpl commented Jan 25, 2024

TQC action in sbx:
[screenshot: Screenshot 2024-01-25 at 17 59 11]
SAC action in sb3:
[screenshot: Screenshot 2024-01-25 at 18 06 54]
I feel like gSDE isn't working with TQC? Check in the code; maybe open an issue in the sbx repo to ask the question?

Armandpl commented:

Try SAC w/ gSDE in sbx?


Armandpl commented Jan 29, 2024

Ok so I trained for 1000 steps using TQC in sbx, with gSDE on/off and with episodic training or not, and looked at the action over 100 steps.
gsde=True, train_freq=(1, "episode"):
[plot: tqc_episodic_gsde_sbx]
gsde=True, train_freq=1:
[plot: tqc_train_freq_1_gsde_sbx]
gsde=True, train_freq=100:
[plot: tqc_train_freq_100_gsde_sbx]
gsde=False, train_freq=(1, "episode"):
[plot: tqc_episodic_no_gsde_sbx]

Episodic it is, then? Or maybe once the policy converges the action gets less noisy and it's fine? (configs sketched below)
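
For reference, a sketch of the four configurations above as they'd be passed to sbx's TQC (the env id is a stand-in):

```python
from sbx import TQC

# train_freq controls how gradient updates interleave with env steps:
# (1, "episode") waits until the episode ends, an int N trains every N steps.
configs = {
    "episodic_gsde": dict(use_sde=True, train_freq=(1, "episode")),
    "train_freq_1_gsde": dict(use_sde=True, train_freq=1),
    "train_freq_100_gsde": dict(use_sde=True, train_freq=100),
    "episodic_no_gsde": dict(use_sde=False, train_freq=(1, "episode")),
}

for name, kwargs in configs.items():
    model = TQC("MlpPolicy", "Pendulum-v1", **kwargs)  # stand-in env
    model.learn(total_timesteps=1_000)
```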


Armandpl commented Jan 29, 2024

Ok so now we could bench TQC against SAC to gauge which one we should use on the real robot? But if we go this route we should probably also tune the hyperparameters. And is the tuning going to transfer, given it is still unclear how far/close the sim is to the actual robot?
Still worth trying I guess.

Maybe just go straight to hyperparameter tuning for TQC, since 'we know' it is better? (see the sketch below)
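
If we go the tuning route, a minimal Optuna sketch (the search space, budget, and env id are placeholders; rl-zoo3 has a much more complete version of this loop):

```python
import optuna
from stable_baselines3.common.evaluation import evaluate_policy
from sbx import TQC


def objective(trial: optuna.Trial) -> float:
    model = TQC(
        "MlpPolicy",
        "Pendulum-v1",  # stand-in for the sim env
        learning_rate=trial.suggest_float("learning_rate", 1e-4, 1e-3, log=True),
        gamma=trial.suggest_float("gamma", 0.95, 0.999),
        tau=trial.suggest_float("tau", 0.005, 0.05, log=True),
        use_sde=True,
        train_freq=(1, "episode"),
    )
    model.learn(total_timesteps=40_000)
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=5)
    return mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
```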


Armandpl commented Jan 30, 2024

In simulation, I can get the agent to converge in ~40k timesteps. 40k timesteps at 50 Hz is ~13 min in real life, but when training on the real robot it takes hours. It is slow in part because waiting for the pendulum to reset is slow. Maybe we shouldn't reset the robot and should instead let it learn for a long time??

  • no TimeLimit wrapper. remove the bounds on theta and add a reward term to keep it from spinning? quickly try in sim (see the sketch after this list)
    • I guess it will still go out of the speed bounds at first, but it might stop doing that early in training?
    • should we keep bounds on alpha though?
    • make sure the eval env has a time limit though
  • set up the training to update the policy every N steps, maybe 64? maybe 256? check if the update is still fast enough
    • can't get sbx to work on Metal if train_freq != 1 -> use longer episodes: 1500
  • is the gSDE noise going to be re-sampled if we never terminate the episode?
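
A quick sim sketch of the unbounded-theta idea: no TimeLimit on the training env, penalize arm speed instead of terminating on theta bounds, and keep a hard TimeLimit on the eval env. The env factory, info key, and penalty coefficient are all hypothetical:

```python
import gymnasium as gym


class SpinPenaltyWrapper(gym.Wrapper):
    """Subtract a small penalty proportional to the arm speed so the
    agent is discouraged from spinning freely around theta."""

    def __init__(self, env: gym.Env, coef: float = 0.01):
        super().__init__(env)
        self.coef = coef

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        theta_dot = info.get("theta_dot", 0.0)  # hypothetical info key
        reward -= self.coef * abs(theta_dot)
        return obs, reward, terminated, truncated, info


# Training env: no TimeLimit, no theta bounds, just the speed penalty.
train_env = SpinPenaltyWrapper(make_furuta_sim())  # hypothetical factory
# Eval env keeps a hard time limit so evaluation episodes always end.
eval_env = gym.wrappers.TimeLimit(make_furuta_sim(), max_episode_steps=400)
```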

Armandpl commented:

Need to update the way we reset the episode. I set up a PID but it is badly tuned, and I think it may have damaged the motor. (rough sketch below)
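
For the record, the reset controller is basically a textbook PID on the arm angle; a minimal sketch with placeholder gains and a hypothetical robot interface:

```python
class PID:
    """Plain PID loop. Gains are placeholders and need gentle tuning so
    the reset motion doesn't stress the motor."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def __call__(self, error: float) -> float:
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical usage at 50 Hz: drive theta back to 0 before each episode,
# clamping the command to protect the motor.
# pid = PID(kp=1.0, ki=0.0, kd=0.05, dt=0.02)
# while abs(theta) > 0.05:
#     command = max(-0.5, min(0.5, pid(-theta)))
#     robot.set_motor(command)   # hypothetical robot API
#     theta = robot.read_theta()
```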

Armandpl linked a pull request Feb 1, 2024 that will close this issue
Armandpl reopened this Feb 1, 2024