-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add gradient accumulation scheduler #21
Conversation
@@ -15,4 +15,4 @@ model_config: | |||
block_size: 2048 | |||
vocab_size: 256000 | |||
separator_token: <bos> | |||
tokenizer_path: "chemlactica/tokenizer/GemmaTokenizer" | |||
tokenizer_path: "/auto/home/menuab/code/ChemLactica/chemlactica/tokenizer/GemmaTokenizer" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the relative path break something here? If not, let's keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was an issue with running from a notebook, should be okay now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this one, will fix with the next test run
last_far_loss = [ | ||
s["loss"] | ||
for s in state.log_history[ | ||
-20 * self.ga_delta_steps : -19 * self.ga_delta_steps # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-20 and -19 seem like magic values. Can we either have them as named constants or add a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, comments added
submit_run.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cleaning this up, these are very nice defaults.
adds gradient accumulation scheduler