[ADD] PyTorch: Tolstoi Char RNN #40
base: develop
Conversation
Working training parameters are:

- batch size ``50``
- ``200`` epochs
- SGD with a learning rate of :math:`\\approx 0.1` works
This is copied from TensorFlow and not verified.

In TensorFlow, the cross-entropy loss takes the mean across the time axis and the sum across the batch axis. Such an option does not exist in PyTorch: the only choices are `"sum"` or `"mean"`, applied across both axes. Currently `"mean"` is chosen. In this case, the learning rate should be a factor of `batch_size` bigger, because the gradients are a factor of `batch_size` smaller.
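A minimal sketch of this scaling argument (the shapes below are made up for illustration; only the batch size of 50 comes from the diff): summing the per-element losses and dividing only by the time-axis width, as TensorFlow does, yields gradients exactly `batch_size` times larger than PyTorch's `reduction="mean"`.

```python
import torch
import torch.nn.functional as F

# Made-up shapes for illustration.
batch_size, seq_length, vocab_size = 50, 10, 83
logits = torch.randn(batch_size, vocab_size, seq_length, requires_grad=True)
targets = torch.randint(vocab_size, (batch_size, seq_length))

# reduction="mean" divides the summed loss by batch_size * seq_length.
F.cross_entropy(logits, targets, reduction="mean").backward()
grad_mean = logits.grad.clone()

# TensorFlow convention: mean across time, sum across the batch,
# i.e. divide the summed loss by seq_length only.
logits.grad = None
(F.cross_entropy(logits, targets, reduction="sum") / seq_length).backward()
grad_tf = logits.grad

# The gradients differ by exactly a factor of batch_size, hence the
# learning rate under "mean" must be a factor of batch_size bigger.
assert torch.allclose(grad_tf, grad_mean * batch_size)
```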
Could you instead use `reduction="sum"` and divide by `seq_length` (or whatever the variable name for the width of the time axis is)? It would be great if running, e.g., SGD with `lr=0.1` produced similar results in PyTorch and TensorFlow.
This is exactly the idea we discussed in person. However, it turns out that this didn't work: the division by `seq_length` must happen only after the `CrossEntropyLoss`, so it cannot be part of the model. I see two possibilities:

- introduce a custom cross-entropy loss that divides by `seq_length` after applying `CrossEntropyLoss` (a sketch follows below)
- leave it as it is

In my opinion, both options are quite bad.
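A minimal sketch of the first option, assuming logits of shape `(batch, vocab, seq_length)`; the class name and constructor argument are made up for this illustration:

```python
import torch.nn as nn
import torch.nn.functional as F

class TimeAveragedCrossEntropyLoss(nn.Module):
    """Hypothetical wrapper: sum the cross-entropy over all elements, then
    divide by the sequence length, reproducing TensorFlow's
    mean-over-time / sum-over-batch reduction."""

    def __init__(self, seq_length):
        super().__init__()
        self.seq_length = seq_length

    def forward(self, logits, targets):
        # logits: (batch, vocab, seq_length), targets: (batch, seq_length)
        return F.cross_entropy(logits, targets, reduction="sum") / self.seq_length
```

Since the wrapper lives outside the model, the division happens after the loss as required, at the cost of hard-coding the sequence length into the loss object.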
Easiest would be to change the definition of the loss in the TensorFlow version to something compatible with PyTorch. Let me think about this; I will address and merge it once I find time for DeepOBS again.
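For illustration, such a change might look like the following sketch; the `losses` tensor and its shape are assumptions, not the actual DeepOBS code:

```python
import tensorflow as tf

# Hypothetical per-element cross-entropy values of shape (batch, seq_length).
losses = tf.random.uniform((50, 10))

# Current convention: mean across the time axis, then sum across the batch.
loss_current = tf.reduce_sum(tf.reduce_mean(losses, axis=1))

# PyTorch-compatible definition: plain mean over both axes,
# matching CrossEntropyLoss(reduction="mean").
loss_compatible = tf.reduce_mean(losses)
```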
Introduces a PyTorch version of the Tolstoi Char RNN (previously only available in TensorFlow).