Miscellaneous fixes to the x-transformers implementation #79

Open · wants to merge 8 commits into base: main
Conversation

@Waino (Collaborator) commented Oct 21, 2024

  • Validation no longer crashes (missing transposes have been added).
  • A distributed component now covers the parameters in the TransformerWrapper object, most notably to_logits.
  • Arguments of TransformerWrapper can be set through the config file (see the sketch after this list).
  • A fix to the contents of state dicts, avoiding duplicate storage of some parameters.
  • Removal of some obsolete opts.
  • Stats are handled correctly both with and without accuracy computation (the type of the initial value is inferred from the preceding stats object).
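
A minimal sketch of forwarding config values to TransformerWrapper, using the public x-transformers API. The option names and the shape of `opts` are illustrative assumptions, not the actual option names introduced in this PR.

```python
from x_transformers import TransformerWrapper, Decoder

def build_wrapper(opts: dict, vocab_size: int) -> TransformerWrapper:
    """Build a TransformerWrapper whose keyword arguments come from the config."""
    attn_layers = Decoder(
        dim=opts["model_dim"],
        depth=opts["layers"],
        heads=opts["heads"],
    )
    # Any extra TransformerWrapper arguments listed in the config are passed
    # straight through instead of being hard-coded (key name is hypothetical).
    extra_kwargs = opts.get("transformer_wrapper_opts", {})
    return TransformerWrapper(
        num_tokens=vocab_size,
        max_seq_len=opts["max_seq_len"],
        attn_layers=attn_layers,
        **extra_kwargs,
    )
```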

Skip the backward pass if the loss is NaN, and stop training if enough batches have been skipped.
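
A minimal sketch of that skip logic, assuming the model returns a scalar loss tensor; the threshold and names are illustrative, not the values used in the PR.

```python
import torch

MAX_SKIPPED_BATCHES = 100  # illustrative threshold, not necessarily the PR's value

class SkipCounter:
    """Tracks how many batches were skipped because of a NaN loss."""
    def __init__(self):
        self.skipped = 0

def training_step(model, optimizer, batch, counter: SkipCounter) -> None:
    loss = model(batch)  # assumed to return a scalar loss tensor
    if torch.isnan(loss):
        # Skip backward for this batch instead of corrupting the parameters.
        counter.skipped += 1
        if counter.skipped >= MAX_SKIPPED_BATCHES:
            raise RuntimeError(
                f"Stopping training: {counter.skipped} batches skipped due to NaN loss."
            )
        return
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```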
The default value of the accuracy statistic must be either zero or None, depending on whether accuracy is reported.
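
A minimal sketch of inferring that initial value from the preceding stats object; the `Stats` class here is an illustrative stand-in for the framework's statistics object.

```python
from typing import Optional

class Stats:
    """Minimal stand-in for the training statistics object (illustrative only)."""
    def __init__(self, loss: float = 0.0, n_correct: Optional[int] = None):
        self.loss = loss
        # n_correct is None when accuracy is not computed, an int otherwise.
        self.n_correct = n_correct

def initial_stats(previous: Optional[Stats]) -> Stats:
    """Infer the type of the initial accuracy value from the preceding stats object.

    If the previous stats tracked accuracy, start the new accumulator at zero;
    otherwise keep it as None so accuracy is not spuriously reported.
    """
    if previous is not None and previous.n_correct is not None:
        return Stats(loss=0.0, n_correct=0)
    return Stats(loss=0.0, n_correct=None)
```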
Parameters in the TransformerWrapper, e.g. to_logits, need their own
distributed component and optimizer.
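
A sketch of what such a component could look like: collect the parameters owned directly by the wrapper that no other component claims, and give them their own optimizer. How components are actually tracked is framework-specific; `covered_ids` is an illustrative stand-in.

```python
from torch import nn

def wrapper_level_parameters(wrapper: nn.Module, covered_ids: set) -> list:
    """Parameters owned by the TransformerWrapper itself (e.g. to_logits)
    that are not already assigned to another distributed component.

    `covered_ids` is an illustrative stand-in for however the framework tracks
    parameters already claimed by attention layers, embeddings, etc.
    """
    return [p for p in wrapper.parameters() if id(p) not in covered_ids]

# These parameters get their own distributed component and optimizer so that
# to_logits is actually updated and synchronized across ranks (sketch only):
# wrapper_params = wrapper_level_parameters(decoder_wrapper, covered_ids)
# wrapper_optimizer = torch.optim.Adam(wrapper_params, lr=2e-4)
```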
The adapter injection code was causing parameter duplication.
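
A minimal sketch of how such duplication can be detected in plain PyTorch: state_dict keys that share the same underlying storage indicate a parameter stored more than once.

```python
from torch import nn

def duplicated_state_dict_entries(module: nn.Module) -> dict:
    """Return state_dict keys whose tensors share storage with an earlier key.

    Adapter injection that re-registers an existing parameter under a second
    name shows up here as multiple keys pointing at the same storage, which
    inflates checkpoints; the fix is to register adapters so that each
    parameter is stored exactly once.
    """
    seen = {}        # data_ptr -> first key that used this storage
    duplicates = {}  # duplicate key -> original key
    for name, tensor in module.state_dict(keep_vars=True).items():
        ptr = tensor.data_ptr()
        if ptr in seen:
            duplicates[name] = seen[ptr]
        else:
            seen[ptr] = name
    return duplicates
```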

Another issue: to normalize or not to normalize?
We compute a normalization term based on either tokens or sentences, but never apply it. Its effect can be compensated for through the learning rate, as long as batches are approximately the same size. Learning rates that are too high trigger gradient clipping, which is especially detrimental here because each component is clipped individually.
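
A minimal sketch of actually applying that normalization: divide the summed loss by the token or sentence count before the backward pass, so gradient magnitudes stay comparable across batches instead of being compensated through the learning rate. The batch attribute names are illustrative assumptions.

```python
import torch

def normalized_backward(summed_loss: torch.Tensor, batch, normalization: str = "tokens") -> None:
    """Divide the summed loss by the computed normalization before backward.

    The batch attributes used here (`num_nonpad_tokens`, `num_sents`) are
    illustrative assumptions, not the actual field names in the codebase.
    """
    if normalization == "tokens":
        norm = batch.num_nonpad_tokens
    else:  # "sents"
        norm = batch.num_sents
    (summed_loss / max(norm, 1)).backward()
```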

Clipping deterministically requires one of the following:
- access to the gradients of all parameters of the entire model (infeasible)
- component-local clipping (the current approach)
- communicating a clipping factor across devices (maybe we should do this? see the sketch below)
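
A minimal sketch of the third option, assuming each parameter's gradient is held on exactly one rank: all-reduce the squared gradient norms to obtain the global norm, then every component scales its gradients by the same factor.

```python
import torch
import torch.distributed as dist

def shared_clip_factor(component_params, max_norm: float) -> torch.Tensor:
    """Compute a single clipping factor shared across devices/components.

    Every rank contributes the squared L2 norm of the gradients it holds;
    an all-reduce yields the global gradient norm, so all components scale
    their gradients by the same deterministic factor instead of each one
    clipping locally. Assumes no gradient is counted on more than one rank.
    """
    device = component_params[0].device
    local_sq = torch.zeros((), device=device)
    for p in component_params:
        if p.grad is not None:
            local_sq += p.grad.detach().float().pow(2).sum()
    dist.all_reduce(local_sq, op=dist.ReduceOp.SUM)
    total_norm = local_sq.sqrt()
    # Factor <= 1, identical on every rank.
    return (max_norm / (total_norm + 1e-6)).clamp(max=1.0)

# Usage (sketch): every component applies the identical factor.
# factor = shared_clip_factor(my_params, max_norm=1.0)
# for p in my_params:
#     if p.grad is not None:
#         p.grad.mul_(factor)
```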