Add FalconBackbone
#1475
Conversation
alibi = self._build_alibi_tensor(
    self.num_attention_heads, decoder_padding_mask
)
@mattdangerw Right now I'm calculating alibi in every layer, even though it doesn't change from layer to layer. It would be more efficient to calculate it once in the backbone, but the backbone's `__init__` doesn't know the input shapes yet. If you have any suggestions for computing alibi only once, let me know.
Discussed this offline. This is fine for now because it doesn't seem to be a particularly compute-intensive part. Ideally, we would profile the impact of this repetition and avoid it if it turns out to have a significant effect on model performance.
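For context, here is a minimal NumPy sketch of how an ALiBi bias can be derived from a padding mask. `build_alibi_bias` is a hypothetical stand-in for `_build_alibi_tensor`; the exact slope handling and output shape in the PR may differ.

```python
import numpy as np


def build_alibi_bias(num_heads, padding_mask):
    """Hypothetical helper: build an ALiBi attention bias from a padding mask.

    `padding_mask` is a (batch, seq_len) array of 0/1 values. The returned
    bias has shape (batch, num_heads, 1, seq_len) and is added to the
    attention scores before the softmax.
    """
    # Standard ALiBi slopes for a power-of-two head count:
    # 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    base = 2.0 ** (-8.0 / num_heads)
    slopes = np.array([base ** (i + 1) for i in range(num_heads)])

    # Position of each non-padding token along the sequence, zeroed on pads.
    positions = (np.cumsum(padding_mask, axis=-1) - 1) * padding_mask

    # (1, num_heads, 1) * (batch, 1, seq_len) -> (batch, num_heads, seq_len)
    bias = slopes[None, :, None] * positions[:, None, :]
    return bias[:, :, None, :]


# Example: 2 heads, one sequence with a single padded position.
mask = np.array([[1, 1, 1, 0]])
print(build_alibi_bias(2, mask).shape)  # (1, 2, 1, 4)
```

Because this bias depends only on the head count and the padding mask, computing it once per forward pass and passing it to every layer would avoid the repetition discussed above.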
Looks great! Left a few comments.
class FalconCausalLM(GenerativeTask):
    def __init__(self, backbone, preprocessor=None, **kwargs):
        inputs = backbone.input
Use this form: https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/gemma/gemma_causal_lm.py#L148-L169
Though you could also leave this class as a follow-up.
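A sketch of that linked pattern, assuming the Falcon backbone exposes a tied `token_embedding` (a reversible embedding) the way the Gemma backbone does; imports are omitted and `GenerativeTask` is the base class already used above:

```python
class FalconCausalLM(GenerativeTask):
    def __init__(self, backbone, preprocessor=None, **kwargs):
        # === Layers ===
        self.backbone = backbone
        self.preprocessor = preprocessor

        # === Functional Model ===
        # Run the backbone over its own symbolic inputs, then project the
        # hidden states back to vocabulary logits with the tied embedding.
        inputs = backbone.input
        hidden_states = backbone(inputs)
        outputs = backbone.token_embedding(hidden_states, reverse=True)
        super().__init__(
            inputs=inputs,
            outputs=outputs,
            **kwargs,
        )
```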
Thanks for the reference.
Yeah! I was thinking of adding this as a follow-up PR.
Removed the file.
keras.config.disable_traceback_filtering()
def convert_checkpoints(hf_model):
You could consider modeling this after #1402 for a more fleshed-out conversion script (though the PR I'm linking is missing the numerics validation). It will save a full preset directory and print emojis, both very important.
Also, you could wait to switch to something like this until after we add the tokenizer. Either way.
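For the missing numerics validation, a minimal sketch of what such a check could look like, assuming `hf_model` is the base `transformers` Falcon model (no LM head) and the KerasNLP backbone takes the usual `token_ids`/`padding_mask` inputs; the prompt, tolerance, and helper name are illustrative:

```python
import numpy as np
from keras import ops


def validate_numerics(hf_model, hf_tokenizer, keras_backbone):
    """Compare final hidden states of the HF model and the KerasNLP backbone."""
    hf_inputs = hf_tokenizer(["What is Keras?"], return_tensors="pt")
    hf_hidden = hf_model(**hf_inputs).last_hidden_state

    keras_hidden = keras_backbone(
        {
            "token_ids": hf_inputs["input_ids"].numpy(),
            "padding_mask": hf_inputs["attention_mask"].numpy(),
        }
    )

    np.testing.assert_allclose(
        hf_hidden.detach().numpy(),
        ops.convert_to_numpy(keras_hidden),
        atol=1e-3,
    )
    print("✅ Numerics match")
```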
Thanks for the info! It would be great to add this in a later PR.
lgtm! feel free to merge if gpu testing looks good!
* Add Falcon backbone.
* Add docstring.
* Add dtype.
* Add checkpoint conversion script.
* Fix tests.
* Random fixes.
* Add cache.
* Cast cumsum to int32.
* Make sublayers public.
* Address backbone comments.
* Update attention computation to use einsum.
* Falcon only works with Keras 3.
* Fix tests.
* Remove falcon_causal_lm file.
* Remove commented-out/unused code.
This is part of addressing #1372 to add the Falcon model to KerasNLP. This PR adds the `FalconBackbone`.

Checkpoint conversion colab