Learning rate scheduler confusion #1197

Open

opcode81 opened this issue Aug 9, 2024 · 1 comment
Labels: documentation, enhancement (Feature that is not a new algorithm or an algorithm enhancement)

Comments

@opcode81 (Collaborator) commented Aug 9, 2024

The way in which learning rate schedulers are parametrised has led to confusion.
In particular, torch's LambdaLR is used throughout Tianshou. While we use it correctly (as far as I can tell), the terminology can confuse users (and has confused a user in #1157). The root of the confusion is that the learning rate scheduler has a different notion of "epoch" than Tianshou itself does:

For the learning rate scheduler, if its step method is called without the explicit epoch argument (as we do), the "epoch" is simply a counter of how many times step has been called so far. Since we call step once in every training step, the maximum "epoch" the scheduler sees is the total number of training steps (which can be computed as ceil(step_per_epoch / step_per_collect) * num_epochs). We thus correctly parametrise the scheduler as follows (e.g. in atari_ppo.py):

import numpy as np
from torch.optim.lr_scheduler import LambdaLR

if args.lr_decay:
    # decay learning rate to 0 linearly over the total number of updates
    max_update_num = np.ceil(args.step_per_epoch / args.step_per_collect) * args.epoch
    lr_scheduler = LambdaLR(optim, lr_lambda=lambda epoch: 1 - epoch / max_update_num)
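
To make the counting explicit, here is a minimal standalone sketch (not Tianshou code; the dummy optimizer and the value of max_update_num are made up) showing that the value passed to lr_lambda is simply the number of previous step() calls:

import torch
from torch.optim.lr_scheduler import LambdaLR

# Dummy optimizer; we only want to observe the learning rate.
optim = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=1e-3)

max_update_num = 100  # assumed total number of gradient updates
lr_scheduler = LambdaLR(optim, lr_lambda=lambda epoch: 1 - epoch / max_update_num)

for update in range(5):
    optim.step()         # one (here: no-op) gradient update
    lr_scheduler.step()  # increments the scheduler's internal "epoch" counter by one
    print(update, lr_scheduler.get_last_lr())  # base LR scaled by 1 - (update + 1) / 100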

A user who assumes that the lambda's argument follows Tianshou's "epoch" semantics (like the reporter of #1157) will be tempted to write this instead:

lr_scheduler = LambdaLR(optim, lr_lambda=lambda epoch: 1 - epoch / args.epoch)

The fact that we use the term epoch in our code for parametrising schedulers is the main issue here.

Note that if a user defines the scheduler as above, then, because the number of training steps is much larger than the number of Tianshou epochs, the lambda factor quickly becomes negative. LambdaLR does not catch this, and the resulting negative learning rate effectively reverses the gradient direction, causing fatal learning failure.
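
To illustrate with made-up numbers (not taken from any particular example script), plugging the scheduler's actual step counter into the incorrect lambda shows how quickly the factor turns negative:

# Hypothetical settings, chosen only to illustrate the failure mode.
step_per_epoch, step_per_collect, num_epochs = 100_000, 2_000, 100
updates_per_epoch = -(-step_per_epoch // step_per_collect)  # ceil division -> 50

wrong_lambda = lambda epoch: 1 - epoch / num_epochs  # "epoch" is really the update counter

for update in (0, 50, 100, 200, updates_per_epoch * num_epochs):
    print(update, wrong_lambda(update))
# 0 -> 1.0, 50 -> 0.5, 100 -> 0.0, 200 -> -1.0, 5000 -> -49.0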

I would suggest to:

  • Replace the term epoch with update_num or training_step (also in the high-level API)
  • Try to catch errors where the learning rate becomes negative (see the sketch below)
  • Improve documentation (where appropriate) in order to avoid this kind of confusion
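
Regarding the second point, one possible (purely illustrative) approach would be a thin wrapper around LambdaLR that raises as soon as any computed learning rate drops below zero; the class name is made up:

from torch.optim.lr_scheduler import LambdaLR

class NonNegativeLambdaLR(LambdaLR):
    # Hypothetical wrapper: raise instead of silently applying a negative learning rate.
    def get_lr(self):
        lrs = super().get_lr()
        if any(lr < 0 for lr in lrs):
            raise ValueError(
                f"Learning rate became negative ({lrs}); the scheduler is probably "
                "parametrised in Tianshou epochs instead of training steps."
            )
        return lrs

Alternatively, the lambda itself could clamp the factor, e.g. lambda epoch: max(0.0, 1 - epoch / max_update_num), but raising makes the misconfiguration visible instead of silently freezing learning.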
@opcode81 added the documentation and enhancement labels and removed the bug label on Aug 9, 2024
@MischaPanch (Collaborator) commented:
Loosely related: #1198
