Temporary Fix: Skip TestAffineQuantizedTensorParallel on H100 #1001

jainapurva · 2024-10-03T01:58:32Z

The current aqt test runs on bfloat16, float16 and float32, but the test doesn't run on H100 for these dtypes. As a temporary fix, this PR skips the test when running on H100.

Created issue to track this: #1000

pytorch-bot · 2024-10-03T01:58:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1001

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 9dec2af with merge base 09b8b3c:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

Run Regression Tests / test (CUDA Nightly (Aug 30), linux.g5.12xlarge.nvidia.gpu, --pre torch==2.5.0.dev20240831+cu121 -... / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

msaroufim · 2024-10-03T19:11:53Z

test/dtypes/test_affine_quantized_tensor_parallel.py

+    if not is_H100:
+        run_tests()
+    else:
+        print("Skipping TestAffineQuantizedTensorParallel: not supported on H100")


I'd prefer we put an explicit skip on the test itself. Guarding run_tests() like this will also mess up the pytest output.
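A minimal sketch of what an explicit skip could look like, using the standard `unittest.skipIf` decorator so the skip shows up in the test report. The helper `_detect_h100`, the class name `TestAffineQuantizedTensorParallelSketch`, and the placeholder test are hypothetical and only for illustration; the check assumes an H100 can be identified by CUDA compute capability 9.0, which other Hopper GPUs also report. This is not necessarily how the PR defines `is_H100`.

```python
import unittest


def _detect_h100() -> bool:
    """Hypothetical helper: True if the current CUDA device reports capability 9.0."""
    try:
        import torch
    except ImportError:
        # Without torch there is no GPU to detect, so nothing is skipped.
        return False
    if not torch.cuda.is_available():
        return False
    # Assumption: compute capability (9, 0) is used as a stand-in for "is an H100".
    return bool(torch.cuda.get_device_capability() == (9, 0))


is_H100 = _detect_h100()


# Skipping at the class level keeps the test collected and reported as "skipped"
# instead of silently never running it.
@unittest.skipIf(is_H100, "Not supported on H100 yet, tracked in #1000")
class TestAffineQuantizedTensorParallelSketch(unittest.TestCase):
    def test_placeholder(self):
        # Stand-in for the real tensor-parallel test body.
        self.assertTrue(True)
```

With a decorator like this, the file can keep calling `run_tests()` unconditionally, and pytest should report the test as skipped along with the reason string.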

msaroufim · 2024-10-03T20:02:14Z

test/dtypes/test_affine_quantized_tensor_parallel.py

@@ -1,12 +1,17 @@
 from torchao.testing.utils import copy_tests, TorchAOTensorParallelTestCase
 from torch.testing._internal.common_utils import run_tests
 from torchao.quantization import int8_weight_only
+import torch


Do you still see the error if you move the import torch statement above any ao imports?

Moving torch to be the first import does resolve some issues, but there are still compatibility issues with H100. I'm working on another PR to add support for H100 (especially float8). #1003

Temporary Fix: Skip on H100

2b83972

facebook-github-bot added the CLA Signed label Oct 3, 2024

jainapurva added 2 commits October 2, 2024 19:01

Temporary Fix: Skip on H100

6bf6caa

Temporary Fix: Skip on H100

ce5273c

jainapurva requested a review from jerryzh168 October 3, 2024 02:03

jerryzh168 approved these changes Oct 3, 2024


msaroufim reviewed Oct 3, 2024


jainapurva added 2 commits October 3, 2024 13:08

Test cuda nightly failure

59a4113

Test cuda nightly failure

9dec2af
