[Bug]: NaN Values in fwd_prepare_wy_repr Output in GatedDeltaNet #99
Comments
I'm encountering an error that might be related. I get NaNs after hundreds of training steps using the following config with PyTorch 2.4.1 and Triton 3.0.0.
Thank you for the report. I'll take a look at this issue right now.
@xffxff @prolearner Hi, please check out cb36e67
Thanks for the quick fix!
@yzhangcs Thank you for the quick fix! I've tested the updated code locally, and the NaN issue has been resolved with the reproduction code I provided above. I'm now going to test it in my training job.
Thank you! We're checking the numerical stability of the exp gate with high priority.
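For readers wondering how an exponential gate can produce NaNs at all, here is a minimal sketch of the general failure mode. It is not the code from fla and not a description of what cb36e67 changed; the function names `naive_ratio` and `stable_ratio` and the sample values are made up for illustration. In float32, `exp` overflows to `inf` for inputs above roughly 88.7, and `inf / inf` evaluates to NaN, whereas subtracting in log space before exponentiating stays finite.

```python
import torch

def naive_ratio(a, b):
    # Exponentiate first, then divide: overflows to inf / inf = nan for large inputs.
    return torch.exp(a) / torch.exp(b)

def stable_ratio(a, b):
    # Subtract in log space first, then exponentiate once.
    return torch.exp(a - b)

# Hypothetical log-gate values; the second pair is large enough to overflow float32.
a = torch.tensor([1.0, 100.0], dtype=torch.float32)
b = torch.tensor([0.5, 99.0], dtype=torch.float32)

naive = naive_ratio(a, b)    # second entry is nan
stable = stable_ratio(a, b)  # both entries are finite

print("naive :", naive)
print("stable:", stable)
```

Both forms agree when the inputs are small; they only diverge once the intermediate exponentials leave the representable range.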
Describe the bug
When using fla's GatedDeltaNet, I encountered an issue where the output of fwd_prepare_wy_repr contains NaN values. I have located the problem and can reproduce it with the following code:
Steps to reproduce the bug
The output u contains NaN values. The relevant input .pt files are included in the attached debug.tar.gz archive.
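A check like the following can confirm whether a tensor contains non-finite values. This is only a sketch: `count_nonfinite` is a hypothetical helper, and the tensor `u` below is a small stand-in built by hand, not the actual output of fwd_prepare_wy_repr or the data in debug.tar.gz.

```python
import torch

def count_nonfinite(t):
    # Count NaN and Inf entries separately, since they often have different causes.
    return {
        "nan": int(torch.isnan(t).sum()),
        "inf": int(torch.isinf(t).sum()),
    }

# Stand-in tensor for illustration only; replace with the real output when debugging.
u = torch.tensor([[1.0, float("nan")], [float("inf"), 2.0]])

stats = count_nonfinite(u)
print(stats)
```

Running the same helper on the inputs as well as the output helps distinguish NaNs produced inside the kernel from NaNs that were already present in the inputs.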
Expected behavior
The output u should not contain any NaN values.
Environment info