
[ADD] PyTorch: Tolstoi Char RNN #40

Open · wants to merge 17 commits into develop

Conversation

schaefertim

Introduces a PyTorch version of the Tolstoi Char RNN (previously only available in TensorFlow).

Comment on lines +24 to +28
Working training parameters are:

- batch size ``50``
- ``200`` epochs
- SGD with a learning rate of :math:`\approx 0.1`
Author

This is copied from TensorFlow and has not been verified.
In TensorFlow, the CrossEntropyLoss takes the mean across the time axis and the sum across the batch axis.
Such an option does not exist in PyTorch: the only reductions are "sum" or "mean" over both axes at once. Currently "mean" is chosen.
In that case, the learning rate should be a factor of batch_size larger, because the gradients are a factor of batch_size smaller.
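For reference, a minimal sketch of the two reductions and the factor of batch_size between them (the shapes are illustrative assumptions, not the actual problem dimensions):

```python
import torch
import torch.nn.functional as F

batch_size, seq_len, vocab_size = 50, 100, 83  # illustrative shapes

logits = torch.randn(batch_size, seq_len, vocab_size)
targets = torch.randint(vocab_size, (batch_size, seq_len))

# Per-character losses, shape (batch_size, seq_len).
per_elem = F.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1), reduction="none"
).reshape(batch_size, seq_len)

# TensorFlow version: mean across the time axis, sum across the batch axis.
tf_style = per_elem.mean(dim=1).sum()

# PyTorch reduction="mean": mean across both axes at once.
torch_mean = per_elem.mean()

# They differ by exactly a factor of batch_size, hence the lr scaling.
assert torch.allclose(tf_style, torch_mean * batch_size)
```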

Owner

Could you instead use "sum" and divide by seq_length (or whatever the variable for the width of the time axis is called)?
It would be great if running, e.g., SGD with lr=0.1 produced similar results in PyTorch and TensorFlow.
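A minimal sketch of this suggestion, with the same illustrative shapes as above:

```python
import torch
import torch.nn as nn

batch_size, seq_len, vocab_size = 50, 100, 83  # illustrative shapes
logits = torch.randn(batch_size * seq_len, vocab_size)
targets = torch.randint(vocab_size, (batch_size * seq_len,))

# Sum over all batch and time elements, then divide by the sequence
# length: this reproduces the TensorFlow reduction (mean over time,
# sum over batch).
loss_fn = nn.CrossEntropyLoss(reduction="sum")
loss = loss_fn(logits, targets) / seq_len
```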

Author

This is exactly the idea we discussed in person. However, it turns out that this does not work:
the division by seq_length must happen only after the CrossEntropyLoss, so it cannot be part of the model.
I see two possibilities:

- introduce a custom CrossEntropyLoss that divides after applying CrossEntropyLoss (sketched below), or
- leave it as it is.

In my opinion, both options are quite bad.
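A minimal sketch of the first option (the name ScaledCrossEntropyLoss and the flattened-input convention are assumptions for illustration, not DeepOBS API):

```python
import torch.nn as nn

class ScaledCrossEntropyLoss(nn.Module):
    """Sum-reduced cross-entropy, divided by the sequence length
    only after the loss has been computed (first option above)."""

    def __init__(self, seq_len):
        super().__init__()
        self.seq_len = seq_len
        self._ce = nn.CrossEntropyLoss(reduction="sum")

    def forward(self, logits, targets):
        # logits: (batch * seq_len, vocab), targets: (batch * seq_len,)
        return self._ce(logits, targets) / self.seq_len
```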

Owner

Easiest would be to change the definition of the loss in the TensorFlow version to something compatible with PyTorch...
Let me think about this, and I will address and merge it once I find time for DeepOBS again.
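One possible shape of such a change, going the other way (a hedged sketch; per_char, targets, and logits are placeholder names, not the existing DeepOBS variables):

```python
import tensorflow as tf

# Reduce with a plain mean over both the batch and time axes on the
# TensorFlow side, matching PyTorch's reduction="mean".
per_char = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=targets, logits=logits
)  # shape (batch_size, seq_len)
loss = tf.reduce_mean(per_char)
```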
