Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add GPU sync tests #2194

Closed
wants to merge 1 commit into from
Closed

Conversation

iamzainhuda
Copy link
Contributor

@iamzainhuda iamzainhuda commented Jun 28, 2024

Summary:
TLDR: Significant clean up of recmetrics tests and fixing long standing testing issues with particular metrics. Because RecMetrics is community sourced, the quality of metric and test implementations can have high variance which leads to SEVs later on due to the high surface area RecMetrics operates on.

Added GPU sync tests to simulate gathering metric states on to rank 0 and computing. Tests don't cover this case before, which has resulted in SEVs in the past as users aren't aware of how RecMetrics collects and computes metrics.

Added numerical value tests, most metrics are do not have this which can result in issues down the line if metrics need to be changed/accommodate future changes. Also we've found inconsistencies sometimes from other methods, so always good to check here. We compare each metric to a reference implementation from literature to ensure the values are as expected.

Fixed and cleaned up some tests, particularly enforcing a standard of quality that can be referenced for future metric implementations and tests.

Reviewed By: henrylhtsang

Differential Revision: D59173140

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 28, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D59173140

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D59173140

iamzainhuda added a commit to iamzainhuda/torchrec that referenced this pull request Sep 25, 2024
Summary:
Pull Request resolved: pytorch#2194

Added GPU sync tests to simulate gathering metric states on to rank 0 and computing. Tests don't cover this case before, which has resulted in SEVs in the past as users aren't aware of how RecMetrics collects and computes metrics.

Added numerical value tests, most metrics are do not have this which can result in issues down the line if metrics need to be changed/accomodate future changes. Also we've found inconsistencies sometimes from other methods, so always good to check here. We compare each metric to a reference implementation from literature to ensure the values are as expected.

Reviewed By: henrylhtsang

Differential Revision: D59173140
iamzainhuda added a commit to iamzainhuda/torchrec that referenced this pull request Oct 15, 2024
Summary:

Added GPU sync tests to simulate gathering metric states on to rank 0 and computing. Tests don't cover this case before, which has resulted in SEVs in the past as users aren't aware of how RecMetrics collects and computes metrics.

Added numerical value tests, most metrics are do not have this which can result in issues down the line if metrics need to be changed/accomodate future changes. Also we've found inconsistencies sometimes from other methods, so always good to check here. We compare each metric to a reference implementation from literature to ensure the values are as expected.

Reviewed By: henrylhtsang

Differential Revision: D59173140
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D59173140

iamzainhuda added a commit to iamzainhuda/torchrec that referenced this pull request Oct 15, 2024
Summary:

Added GPU sync tests to simulate gathering metric states on to rank 0 and computing. Tests don't cover this case before, which has resulted in SEVs in the past as users aren't aware of how RecMetrics collects and computes metrics.

Added numerical value tests, most metrics are do not have this which can result in issues down the line if metrics need to be changed/accommodate future changes. Also we've found inconsistencies sometimes from other methods, so always good to check here. We compare each metric to a reference implementation from literature to ensure the values are as expected.

Fixed and cleaned up some tests too.

Reviewed By: henrylhtsang

Differential Revision: D59173140
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D59173140

iamzainhuda added a commit to iamzainhuda/torchrec that referenced this pull request Oct 15, 2024
Summary:

Added GPU sync tests to simulate gathering metric states on to rank 0 and computing. Tests don't cover this case before, which has resulted in SEVs in the past as users aren't aware of how RecMetrics collects and computes metrics.

Added numerical value tests, most metrics are do not have this which can result in issues down the line if metrics need to be changed/accommodate future changes. Also we've found inconsistencies sometimes from other methods, so always good to check here. We compare each metric to a reference implementation from literature to ensure the values are as expected.

Fixed and cleaned up some tests too.

Reviewed By: henrylhtsang

Differential Revision: D59173140
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D59173140

iamzainhuda added a commit to iamzainhuda/torchrec that referenced this pull request Oct 15, 2024
Summary:

Added GPU sync tests to simulate gathering metric states on to rank 0 and computing. Tests don't cover this case before, which has resulted in SEVs in the past as users aren't aware of how RecMetrics collects and computes metrics.

Added numerical value tests, most metrics are do not have this which can result in issues down the line if metrics need to be changed/accommodate future changes. Also we've found inconsistencies sometimes from other methods, so always good to check here. We compare each metric to a reference implementation from literature to ensure the values are as expected.

Fixed and cleaned up some tests too.

Reviewed By: henrylhtsang

Differential Revision: D59173140
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D59173140

iamzainhuda added a commit to iamzainhuda/torchrec that referenced this pull request Oct 16, 2024
Summary:

**TLDR: Significant clean up of recmetrics tests and fixing long standing testing issues with particular metrics. Because RecMetrics is community sourced, the quality of metric and test implementations can have high variance which leads to SEVs later on due to the high surface area RecMetrics operates on.**

Added GPU sync tests to simulate gathering metric states on to rank 0 and computing. Tests don't cover this case before, which has resulted in SEVs in the past as users aren't aware of how RecMetrics collects and computes metrics.

Added numerical value tests, most metrics are do not have this which can result in issues down the line if metrics need to be changed/accommodate future changes. Also we've found inconsistencies sometimes from other methods, so always good to check here. We compare each metric to a reference implementation from literature to ensure the values are as expected.

Fixed and cleaned up some tests, particularly enforcing a standard of quality that can be referenced for future metric implementations and tests.

Reviewed By: henrylhtsang

Differential Revision: D59173140
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D59173140

Summary:

**TLDR: Significant clean up of recmetrics tests and fixing long standing testing issues with particular metrics. Because RecMetrics is community sourced, the quality of metric and test implementations can have high variance which leads to SEVs later on due to the high surface area RecMetrics operates on.**

Added GPU sync tests to simulate gathering metric states on to rank 0 and computing. Tests don't cover this case before, which has resulted in SEVs in the past as users aren't aware of how RecMetrics collects and computes metrics.

Added numerical value tests, most metrics are do not have this which can result in issues down the line if metrics need to be changed/accommodate future changes. Also we've found inconsistencies sometimes from other methods, so always good to check here. We compare each metric to a reference implementation from literature to ensure the values are as expected.

Fixed and cleaned up some tests, particularly enforcing a standard of quality that can be referenced for future metric implementations and tests.

Reviewed By: henrylhtsang

Differential Revision: D59173140
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D59173140

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants