Miscellaneous fixes to the x-transformers implementation #79

Open · wants to merge 8 commits into base: main
Conversation

@Waino (Collaborator) commented Oct 21, 2024

  • Validation no longer crashes (missing transposes have been added).
  • A distributed component now covers the parameters in the TransformerWrapper object, most notably to_logits.
  • Arguments of TransformerWrapper can be set through the config file (see the sketch after this list).
  • A fix to the contents of state dicts, avoiding duplicate storage of some parameters.
  • Removal of some obsolete opts.
  • Stats are handled correctly both with and without accuracy computation (the type of the initial value is inferred from the preceding stats object).
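
A minimal sketch of forwarding config values to TransformerWrapper, using the public x-transformers API. The option names and the shape of `opts` are illustrative assumptions, not the actual option names introduced in this PR.

```python
from x_transformers import TransformerWrapper, Decoder

def build_wrapper(opts: dict, vocab_size: int) -> TransformerWrapper:
    """Build a TransformerWrapper whose keyword arguments come from the config."""
    attn_layers = Decoder(
        dim=opts["model_dim"],
        depth=opts["layers"],
        heads=opts["heads"],
    )
    # Any extra TransformerWrapper arguments listed in the config are passed
    # straight through instead of being hard-coded (key name is hypothetical).
    extra_kwargs = opts.get("transformer_wrapper_opts", {})
    return TransformerWrapper(
        num_tokens=vocab_size,
        max_seq_len=opts["max_seq_len"],
        attn_layers=attn_layers,
        **extra_kwargs,
    )
```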

Skip the backward pass if the loss is NaN, and stop training if enough batches have been skipped.
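
A minimal sketch of that skip logic, assuming the model returns a scalar loss tensor; the threshold and names are illustrative, not the values used in the PR.

```python
import torch

MAX_SKIPPED_BATCHES = 100  # illustrative threshold, not necessarily the PR's value

class SkipCounter:
    """Tracks how many batches were skipped because of a NaN loss."""
    def __init__(self):
        self.skipped = 0

def training_step(model, optimizer, batch, counter: SkipCounter) -> None:
    loss = model(batch)  # assumed to return a scalar loss tensor
    if torch.isnan(loss):
        # Skip backward for this batch instead of corrupting the parameters.
        counter.skipped += 1
        if counter.skipped >= MAX_SKIPPED_BATCHES:
            raise RuntimeError(
                f"Stopping training: {counter.skipped} batches skipped due to NaN loss."
            )
        return
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```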
The default value of the accuracy statistic must be either zero or None, depending on whether accuracy is reported.
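
A minimal sketch of inferring that initial value from the preceding stats object; the `Stats` class here is an illustrative stand-in for the framework's statistics object.

```python
from typing import Optional

class Stats:
    """Minimal stand-in for the training statistics object (illustrative only)."""
    def __init__(self, loss: float = 0.0, n_correct: Optional[int] = None):
        self.loss = loss
        # n_correct is None when accuracy is not computed, an int otherwise.
        self.n_correct = n_correct

def initial_stats(previous: Optional[Stats]) -> Stats:
    """Infer the type of the initial accuracy value from the preceding stats object.

    If the previous stats tracked accuracy, start the new accumulator at zero;
    otherwise keep it as None so accuracy is not spuriously reported.
    """
    if previous is not None and previous.n_correct is not None:
        return Stats(loss=0.0, n_correct=0)
    return Stats(loss=0.0, n_correct=None)
```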
Parameters in the TransformerWrapper, e.g. to_logits, need their own
distributed component and optimizer.
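
A sketch of what such a component could look like: collect the parameters owned directly by the wrapper that no other component claims, and give them their own optimizer. How components are actually tracked is framework-specific; `covered_ids` is an illustrative stand-in.

```python
from torch import nn

def wrapper_level_parameters(wrapper: nn.Module, covered_ids: set) -> list:
    """Parameters owned by the TransformerWrapper itself (e.g. to_logits)
    that are not already assigned to another distributed component.

    `covered_ids` is an illustrative stand-in for however the framework tracks
    parameters already claimed by attention layers, embeddings, etc.
    """
    return [p for p in wrapper.parameters() if id(p) not in covered_ids]

# These parameters get their own distributed component and optimizer so that
# to_logits is actually updated and synchronized across ranks (sketch only):
# wrapper_params = wrapper_level_parameters(decoder_wrapper, covered_ids)
# wrapper_optimizer = torch.optim.Adam(wrapper_params, lr=2e-4)
```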
The adapter injection code was causing parameter duplication.
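
A minimal sketch of how such duplication can be detected in plain PyTorch: state_dict keys that share the same underlying storage indicate a parameter stored more than once.

```python
from torch import nn

def duplicated_state_dict_entries(module: nn.Module) -> dict:
    """Return state_dict keys whose tensors share storage with an earlier key.

    Adapter injection that re-registers an existing parameter under a second
    name shows up here as multiple keys pointing at the same storage, which
    inflates checkpoints; the fix is to register adapters so that each
    parameter is stored exactly once.
    """
    seen = {}        # data_ptr -> first key that used this storage
    duplicates = {}  # duplicate key -> original key
    for name, tensor in module.state_dict(keep_vars=True).items():
        ptr = tensor.data_ptr()
        if ptr in seen:
            duplicates[name] = seen[ptr]
        else:
            seen[ptr] = name
    return duplicates
```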

Another issue: to normalize or not to normalize?
We compute a normalization term based on either tokens or sentences, but never apply it. Its effect can be compensated for through the learning rate, as long as batches are approximately the same size. Learning rates that are too high trigger gradient clipping, which is especially detrimental here because each component is clipped individually.
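
A minimal sketch of actually applying that normalization: divide the summed loss by the token or sentence count before the backward pass, so gradient magnitudes stay comparable across batches instead of being compensated through the learning rate. The batch attribute names are illustrative assumptions.

```python
import torch

def normalized_backward(summed_loss: torch.Tensor, batch, normalization: str = "tokens") -> None:
    """Divide the summed loss by the computed normalization before backward.

    The batch attributes used here (`num_nonpad_tokens`, `num_sents`) are
    illustrative assumptions, not the actual field names in the codebase.
    """
    if normalization == "tokens":
        norm = batch.num_nonpad_tokens
    else:  # "sents"
        norm = batch.num_sents
    (summed_loss / max(norm, 1)).backward()
```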

Clipping deterministically requires one of the following:
- access to the gradients of all parameters of the entire model (infeasible)
- component-local clipping (the current approach)
- communicating a clipping factor across devices (maybe we should do this? see the sketch below)
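
A minimal sketch of the third option, assuming each parameter's gradient is held on exactly one rank: all-reduce the squared gradient norms to obtain the global norm, then every component scales its gradients by the same factor.

```python
import torch
import torch.distributed as dist

def shared_clip_factor(component_params, max_norm: float) -> torch.Tensor:
    """Compute a single clipping factor shared across devices/components.

    Every rank contributes the squared L2 norm of the gradients it holds;
    an all-reduce yields the global gradient norm, so all components scale
    their gradients by the same deterministic factor instead of each one
    clipping locally. Assumes no gradient is counted on more than one rank.
    """
    device = component_params[0].device
    local_sq = torch.zeros((), device=device)
    for p in component_params:
        if p.grad is not None:
            local_sq += p.grad.detach().float().pow(2).sum()
    dist.all_reduce(local_sq, op=dist.ReduceOp.SUM)
    total_norm = local_sq.sqrt()
    # Factor <= 1, identical on every rank.
    return (max_norm / (total_norm + 1e-6)).clamp(max=1.0)

# Usage (sketch): every component applies the identical factor.
# factor = shared_clip_factor(my_params, max_norm=1.0)
# for p in my_params:
#     if p.grad is not None:
#         p.grad.mul_(factor)
```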