
Update gemma_backbone.py for sharding config. #1491

Merged · 3 commits into master on Mar 14, 2024
Conversation

qlzh727 (Member) commented on Mar 6, 2024:

This is trying to address #1464.

The new setting is based on the internal Gemma training script.
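For context, Keras 3 expresses a sharding config as a `keras.distribution.LayoutMap` that maps weight-path regexes to partition specs over a device mesh. Here is a minimal sketch; the entries below are illustrative only, not the exact specs this PR lands in `GemmaBackbone.get_layout_map`:

```python
import keras

# A 1 x 8 mesh for a TPU v3-8: a "batch" axis for data parallelism and
# a "model" axis that shards weights across the 8 chips.
device_mesh = keras.distribution.DeviceMesh(
    shape=(1, 8),
    axis_names=("batch", "model"),
    devices=keras.distribution.list_devices(),
)

layout_map = keras.distribution.LayoutMap(device_mesh)

# Keys are regexes matched against weight paths; each value names the
# mesh axis that shards the corresponding weight dimension (None means
# replicated on that dimension). Illustrative entries only.
layout_map["token_embedding/embeddings"] = ("model", None)
layout_map["decoder_block.*attention.*(query|key|value).kernel"] = (
    None, "model", None
)
layout_map["decoder_block.*ffw_gating.kernel"] = (None, "model")
```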

Here are some performance benchmarks on TPU v3-8 (smaller values are better):

Baseline (current setting):
generate: 1342 ms per 100 tokens
finetune with LoRA: 125 ms/step

This PR's setting:
generate: 1245 ms per 100 tokens
finetune with LoRA: 64 ms/step
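For reference, here is a minimal sketch of how the layout map this PR tunes gets applied end to end, using the public Keras 3 distribution API as it stood when this PR landed; the preset name and LoRA rank are illustrative:

```python
import keras
import keras_nlp

# 1 x 8 mesh for a TPU v3-8.
device_mesh = keras.distribution.DeviceMesh(
    shape=(1, 8),
    axis_names=("batch", "model"),
    devices=keras.distribution.list_devices(),
)

# Fetch the sharding config that this PR updates.
layout_map = keras_nlp.models.GemmaBackbone.get_layout_map(device_mesh)

# Install the distribution *before* building the model so that the
# weights are created directly with the sharded layouts.
keras.distribution.set_distribution(
    keras.distribution.ModelParallel(
        device_mesh=device_mesh, layout_map=layout_map
    )
)

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

# The two paths benchmarked above: generation and LoRA finetuning.
gemma_lm.generate("What is sharding?", max_length=100)
gemma_lm.backbone.enable_lora(rank=4)  # then compile() and fit() as usual
```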

The github-actions bot added the Gemma (Gemma model specific issues) label on Mar 6, 2024.
mattdangerw self-requested a review on Mar 8, 2024.
qlzh727 (Member, Author) commented on Mar 12, 2024:

PTAL again.

mattdangerw (Member) left a comment:


This LGTM, though we still might want to check with other folks to help decide between the two.

qlzh727 (Member, Author) commented on Mar 12, 2024:

Ack, I will leave the PR here; feel free to merge it when ready.

mattdangerw (Member) left a comment:


looks good!

mattdangerw merged commit 4511580 into master on Mar 14, 2024
18 of 19 checks passed
abuelnasr0 pushed a commit to abuelnasr0/keras-nlp that referenced this pull request on Apr 2, 2024:
* Update gemma_backbone.py for sharding config.

* Update unit test and fix format.

* Update sharding spec for gemma based on gemma training.
mattdangerw deleted the qlzh727-patch-2 branch on Aug 22, 2024.
Labels: Gemma (Gemma model specific issues)