Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ForwardKLWithChunkedOutputLoss should be averaged over total_non_pad_tokens #1865

Open
moussaKam opened this issue Oct 18, 2024 · 1 comment
Assignees

Comments

@moussaKam
Copy link

Hi there,

Currently the loss ForwardKLWithChunkedOutputLoss is averaged over the chunks which I think is not the right way to do it since there are chunks that contains only pad tokens. line

I think the right way to do it is to average over the total non pad tokens. I did the fix here but couldn't open a pull request because my commit didn't pass the test since the loss did not match with the expected loss.

@pbontrager
Copy link
Contributor

Thank you for pointing this out. You should be able to submit a PR even if the tests don't pass. Before landing the PR, you'll need to verify that training is improved and then you can update the loss values in the test too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants