Learning rate scheduler confusion #1197

Open

opcode81 opened this issue Aug 9, 2024 · 1 comment
Labels: documentation, enhancement (Feature that is not a new algorithm or an algorithm enhancement)

Comments

@opcode81 (Collaborator) commented Aug 9, 2024

The way in which learning rate schedulers are parametrised has led to confusion.
In particular, torch's LambdaLR is used throughout Tianshou. While we use it correctly (as far as I can tell), the terminology can confuse users (and has confused a user in #1157). The root of the confusion is that the learning rate scheduler has a different notion of "epoch" than Tianshou itself does:

For the learning rate scheduler, if its step method is called without the explicit epoch argument (as we do), the "epoch" is simply a counter of how many times step has been called so far. Since we call step once in every training step, the maximum "epoch" the scheduler sees is the total number of training steps (which can be computed as ceil(step_per_epoch / step_per_collect) * num_epochs). We thus correctly parametrise the scheduler as follows (e.g. in atari_ppo.py):

import numpy as np
from torch.optim.lr_scheduler import LambdaLR

if args.lr_decay:
    # decay learning rate to 0 linearly over the total number of updates
    max_update_num = np.ceil(args.step_per_epoch / args.step_per_collect) * args.epoch
    lr_scheduler = LambdaLR(optim, lr_lambda=lambda epoch: 1 - epoch / max_update_num)
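
To make the counting explicit, here is a minimal standalone sketch (not Tianshou code; the dummy optimizer and the value of max_update_num are made up) showing that the value passed to lr_lambda is simply the number of previous step() calls:

import torch
from torch.optim.lr_scheduler import LambdaLR

# Dummy optimizer; we only want to observe the learning rate.
optim = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=1e-3)

max_update_num = 100  # assumed total number of gradient updates
lr_scheduler = LambdaLR(optim, lr_lambda=lambda epoch: 1 - epoch / max_update_num)

for update in range(5):
    optim.step()         # one (here: no-op) gradient update
    lr_scheduler.step()  # increments the scheduler's internal "epoch" counter by one
    print(update, lr_scheduler.get_last_lr())  # base LR scaled by 1 - (update + 1) / 100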

A user who assumes that the lambda's argument follows Tianshou's "epoch" semantics (like the reporter of #1157) will be tempted to write this instead:

lr_scheduler = LambdaLR(optim, lr_lambda=lambda epoch: 1 - epoch / args.epoch)

The fact that we use the term epoch in our code for parametrising schedulers is the main issue here.

Note that if a user defines the scheduler as above, then, because the number of training steps is much larger than the number of Tianshou epochs, the lambda factor quickly becomes negative. LambdaLR does not catch this, and the resulting negative learning rate effectively reverses the gradient direction, causing fatal learning failure.
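
To illustrate with made-up numbers (not taken from any particular example script), plugging the scheduler's actual step counter into the incorrect lambda shows how quickly the factor turns negative:

# Hypothetical settings, chosen only to illustrate the failure mode.
step_per_epoch, step_per_collect, num_epochs = 100_000, 2_000, 100
updates_per_epoch = -(-step_per_epoch // step_per_collect)  # ceil division -> 50

wrong_lambda = lambda epoch: 1 - epoch / num_epochs  # "epoch" is really the update counter

for update in (0, 50, 100, 200, updates_per_epoch * num_epochs):
    print(update, wrong_lambda(update))
# 0 -> 1.0, 50 -> 0.5, 100 -> 0.0, 200 -> -1.0, 5000 -> -49.0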

I would suggest to:

  • Replace the term epoch with update_num or training_step (also in the high-level API)
  • Try to catch errors where the learning rate becomes negative (see the sketch below)
  • Improve documentation (where appropriate) in order to avoid this kind of confusion
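
Regarding the second point, one possible (purely illustrative) approach would be a thin wrapper around LambdaLR that raises as soon as any computed learning rate drops below zero; the class name is made up:

from torch.optim.lr_scheduler import LambdaLR

class NonNegativeLambdaLR(LambdaLR):
    # Hypothetical wrapper: raise instead of silently applying a negative learning rate.
    def get_lr(self):
        lrs = super().get_lr()
        if any(lr < 0 for lr in lrs):
            raise ValueError(
                f"Learning rate became negative ({lrs}); the scheduler is probably "
                "parametrised in Tianshou epochs instead of training steps."
            )
        return lrs

Alternatively, the lambda itself could clamp the factor, e.g. lambda epoch: max(0.0, 1 - epoch / max_update_num), but raising makes the misconfiguration visible instead of silently freezing learning.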
@opcode81 added the documentation and enhancement labels and removed the bug label on Aug 9, 2024
@MischaPanch (Collaborator) commented:
Loosely related: #1198
