The way in which learning rate schedulers are parametrised has led to confusion.
In particular, torch's `LambdaLR` is used throughout Tianshou. While we use it correctly (as far as I can tell), the terminology we use can confuse users (and has confused a user in #1157). The reason for the confusion is that the learning rate scheduler has a different notion of "epoch" than Tianshou itself does:
For the learning rate scheduler, if its `step` method is called without the explicit `epoch` argument (which is what we do), the "epoch" is simply a counter of how many times `step` has been called so far. Since we call `step` once in every training step, the maximum "epoch" for the scheduler is the total number of training steps (which can be computed as `ceil(step_per_epoch / step_per_collect) * num_epochs`). We thus correctly parametrise the scheduler as follows (e.g. in `atari_ppo.py`, lines 128 to 132 at bd74273):
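The referenced snippet looks roughly like this (a sketch of the pattern; `args`, `optim` and the exact variable names come from the surrounding script and may differ slightly from the referenced lines):

```python
import numpy as np
from torch.optim.lr_scheduler import LambdaLR

if args.lr_decay:
    # Decay the learning rate linearly to 0 over the total number of training steps.
    max_update_num = np.ceil(args.step_per_epoch / args.step_per_collect) * args.epoch
    lr_scheduler = LambdaLR(optim, lr_lambda=lambda epoch: 1 - epoch / max_update_num)
```

A user thinking it is the Tianshou "epoch" semantics (like the reporter of #1157) will be tempted to do this instead (illustrative sketch):

```python
# WRONG: divides by the number of Tianshou epochs rather than the total number of scheduler steps
lr_scheduler = LambdaLR(optim, lr_lambda=lambda epoch: 1 - epoch / args.epoch)
```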
The fact that we use the term `epoch` in our code for parametrising schedulers is the main issue here.
Note that if a user defines the scheduler in the second (incorrect) way shown above, the lambda factor becomes negative, because the number of training steps is much larger than the number of epochs. This is not caught by `LambdaLR` and effectively causes the gradient to be reversed, resulting in fatal learning failure.
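To make the failure mode concrete, here is a back-of-the-envelope illustration with made-up numbers (a hypothetical configuration, not taken from any particular example script):

```python
import math

# Illustrative numbers only (hypothetical config)
num_epochs = 100
step_per_epoch = 100_000
step_per_collect = 1_000

# Total number of scheduler steps over the whole run
max_update_num = math.ceil(step_per_epoch / step_per_collect) * num_epochs  # 10_000

# Wrong lambda: divides by the number of Tianshou epochs instead of max_update_num
factor_at_last_step = 1 - (max_update_num - 1) / num_epochs
print(factor_at_last_step)  # ≈ -98.99, i.e. a large negative learning-rate multiplier
```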
I would suggest the following:
- Replace the term `epoch` with `update_num` or `training_step` (also in the high-level API).
- Try to catch errors where the learning rate would become negative (a possible check is sketched after this list).
- Improve documentation (where appropriate) to avoid this kind of confusion.
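As one possible shape for the second point, a wrapper around the LR lambda could fail fast instead of silently reversing updates. This is only a sketch, not existing Tianshou API; `non_negative` and the surrounding names are hypothetical:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

def non_negative(lr_lambda):
    """Wrap an LR lambda so a negative factor raises instead of silently flipping the update direction."""
    def checked(step_count: int) -> float:
        factor = lr_lambda(step_count)
        if factor < 0:
            raise ValueError(
                f"LR factor became negative ({factor:.3f}) at scheduler step {step_count}; "
                "the lambda should be parametrised with the total number of training steps."
            )
        return factor
    return checked

# Toy usage with a throwaway optimizer; max_update_num would be the value computed as above.
optim = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=2.5e-4)
max_update_num = 10_000
lr_scheduler = LambdaLR(optim, lr_lambda=non_negative(lambda t: 1 - t / max_update_num))
```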