[ADD] PyTorch: Tolstoi Char RNN #40
Open
schaefertim wants to merge 17 commits into fsschneider:develop from schaefertim:develop-tolstoi
Commits
588c323 [GIT] add PyCharm files to gitignore
c084691 [ADD] Tolstoi Char RNN testproblem
e5db1af [FIX] Tolstoi dataset
6701e1a [ADD] net_char_rnn: debug with print
a7e4071 [ADD] add TODO, fix parameters
8970099 [ADD] fix network, remove print
cfa7f8f [ADD] LSTM PyTorch: different parameter count
a899ed2 [ADD] SGD Runner
b46ccf3 [REF] adjust NR_PT_TESTPROBLEMS to 21
6ba92d0 [ADD] Tolstoi: PyTorch: redundant bias: set to zero and requires_grad…
a46b8ef [REF] Tolstoi, PyTorch: adjust dropout probability to tensorflow
ba2f002 [REF] adjust TODO
523e4a8 [REF] cleanup
b71d673 [FIX] denominator for 2d labels (like in Tolstoi)
3edd505 [REF] cleanup
7be0020 [DEL] remove default sgd
883202f [REF] separate training and validation data
# -*- coding: utf-8 -*-
"""A vanilla RNN architecture for Tolstoi."""
from torch import nn

from deepobs.pytorch.testproblems.testproblem import WeightRegularizedTestproblem
from .testproblems_modules import net_char_rnn
from ..datasets.tolstoi import tolstoi


class tolstoi_char_rnn(WeightRegularizedTestproblem):
    """DeepOBS test problem class for a two-layer LSTM for character-level
    language modelling (Char RNN) on Tolstoi's War and Peace.

    Some network characteristics:

    - ``128`` hidden units per LSTM cell
    - sequence length ``50``
    - cell state is automatically stored in variables between subsequent steps
    - when the phase placeholder switches its value from one step to the next,
      the cell state is set to its zero value (meaning that we reset the state
      to zero after each round of evaluation; it is therefore important to set
      the evaluation interval such that we evaluate after a full epoch)

    Working training parameters are:

    - batch size ``50``
    - ``200`` epochs
    - SGD with a learning rate of :math:`\\approx 0.1` works

    Args:
        batch_size (int): Batch size to use.
        l2_reg (float): L2-regularization factor. L2-regularization (weight
            decay) is used on the weights but not the biases.
            Defaults to ``5e-4``.

    Attributes:
        data: The dataset used by the test problem (datasets.DataSet instance).
        loss_function: The loss function for this test problem.
        net: The torch module (the neural network) that is trained.
    """

    def __init__(self, batch_size, l2_reg=0.0005):
        """Create a new char_rnn test problem instance on Tolstoi.

        Args:
            batch_size (int): Batch size to use.
            l2_reg (float): L2-regularization factor. L2-regularization
                (weight decay) is used on the weights but not the biases.
                Defaults to ``5e-4``.
        """
        super(tolstoi_char_rnn, self).__init__(batch_size, l2_reg)

    def set_up(self):
        """Set up the Char RNN test problem on Tolstoi."""
        self.data = tolstoi(self._batch_size)
        self.loss_function = nn.CrossEntropyLoss
        self.net = net_char_rnn(hidden_dim=128, num_layers=2, seq_len=50, vocab_size=83)
        self.net.to(self._device)
        self.regularization_groups = self.get_regularization_groups()
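
For context, a hypothetical usage sketch: only the constructor, set_up, and the attributes set above are taken from the file; the optimizer setup is illustrative (in practice a DeepOBS runner, such as the SGD runner added in this PR, would drive training).

import torch

# Instantiate the test problem with the docstring's suggested batch size.
problem = tolstoi_char_rnn(batch_size=50)
problem.set_up()

# loss_function is stored as a class (see set_up) and instantiated per use.
loss_fn = problem.loss_function()

# SGD with a learning rate of about 0.1, the working value named in the
# docstring above.
opt = torch.optim.SGD(problem.net.parameters(), lr=0.1)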
This is copied from TensorFlow and not verified.

In TensorFlow, the CrossEntropyLoss takes the mean across the time axis and the sum across the batch axis. Such an option does not exist in PyTorch; the only options are "sum" or "mean" for both axes. Currently "mean" is chosen. In this case, the learning rate should be a factor batch_size bigger, because the gradients are a factor batch_size smaller.
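For illustration, a small self-contained sketch of the reduction mismatch described above; the shapes match the test problem (batch size 50, sequence length 50, vocabulary size 83), but the tensors are random placeholders, not DeepOBS code.

import torch
from torch import nn

batch_size, seq_len, vocab_size = 50, 50, 83
# Flattened (batch * time) logits and targets, as a char-RNN loss sees them.
logits = torch.randn(batch_size * seq_len, vocab_size)
targets = torch.randint(vocab_size, (batch_size * seq_len,))

loss_mean = nn.CrossEntropyLoss(reduction="mean")(logits, targets)
loss_sum = nn.CrossEntropyLoss(reduction="sum")(logits, targets)

# The TensorFlow-style reduction (mean over time, sum over batch) equals
# the total sum divided by the sequence length ...
loss_tf_style = loss_sum / seq_len

# ... which is exactly batch_size times PyTorch's "mean" reduction, hence
# the factor batch_size in the gradients and the learning rate.
assert torch.allclose(loss_tf_style, loss_mean * batch_size)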
Could you instead use "sum" and divide by seq_length (or whatever the variable name for the width of the time axis is)? It would be great if running, e.g., SGD with lr=0.1 produced similar results in PyTorch and TensorFlow.
This is exactly the idea we discussed in person. However, it turns out that this didn't work. The division by seq_length must happen only after the CrossEntropyLoss; therefore, it cannot be part of the model.

I see two possibilities:

In my opinion, both options are quite bad.
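As a hypothetical illustration only (the two possibilities themselves did not survive in the thread), a wrapper implementing the "sum"-then-divide idea outside the model might look like this; the class name and default are invented:

from torch import nn


class SumBatchMeanTimeLoss(nn.Module):
    """Hypothetical wrapper (not part of this PR): emulate TensorFlow's
    reduction (mean over time, sum over batch) by summing over everything
    and dividing by the sequence length after CrossEntropyLoss."""

    def __init__(self, seq_len=50):
        super().__init__()
        self.seq_len = seq_len
        self._ce = nn.CrossEntropyLoss(reduction="sum")

    def forward(self, outputs, targets):
        # The division happens after the loss, outside the network,
        # which is why it cannot live inside the model itself.
        return self._ce(outputs, targets) / self.seq_len

The awkward part is that the sequence length has to be baked into the loss object rather than the network, which sits uneasily with how set_up above assigns loss_function as a bare class.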
The easiest would be to change the definition of the loss in the TensorFlow version to something compatible with PyTorch...

Let me think about this, and I will address and merge it once I find time for DeepOBS again.