Summary
We introduce a mixed-precision GEMM kernel for INT4 weights and INT8 activations (W4A8), implemented on top of Marlin GEMM. The kernel is designed to support our W4A8 quantization method, QQQ; for more details on the kernel implementation, please refer to our paper. The kernel delivers strong performance and has been merged into the official vLLM project (see vllm-project/vllm#5218).
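For intuition, here is a rough reference-semantics sketch, in plain PyTorch, of what a W4A8 GEMM computes in the simple per-channel case: the packed INT4 weights are unpacked and promoted to INT8 inside the fused kernel, the matmul runs on INT8 tensor cores with INT32 accumulation, and the accumulator is rescaled back to floating point. The function name, scale layout, and shapes below are illustrative assumptions, not the actual kernel interface (QQQ also supports per-group scales, which are omitted here).

```python
import torch

def w4a8_gemm_reference(a_int8, a_scale, w_int4, w_scale):
    """Reference semantics of a W4A8 GEMM (illustrative only, not the fused CUDA kernel).

    a_int8:  [M, K] int8 activations, quantized with per-tensor scale a_scale
    w_int4:  [K, N] weights holding signed 4-bit values in [-8, 7], stored as int8
    w_scale: [N]    per-output-channel weight scale
    """
    # The real kernel unpacks packed INT4 weights and promotes them to INT8 on the
    # fly, then runs the matmul on INT8 tensor cores with INT32 accumulation.
    # Here we emulate that with a plain int32 matmul.
    acc_int32 = a_int8.to(torch.int32) @ w_int4.to(torch.int32)  # [M, N]

    # Rescale the INT32 accumulator back to a floating-point output.
    return acc_int32.to(torch.float32) * a_scale * w_scale

# Toy usage (shapes and scales are made up for illustration):
M, K, N = 8, 64, 32
a_int8 = torch.randint(-128, 128, (M, K), dtype=torch.int8)
w_int4 = torch.randint(-8, 8, (K, N), dtype=torch.int8)
out = w4a8_gemm_reference(a_int8, 0.02, w_int4, torch.rand(N) * 0.01)
```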
We hope the W4A8 GEMM can also provide a practical speedup for other W4A8 quantization methods in the community.
Additionally, since torchao is widely used in frameworks like SGLang, we can extend support for W4A8 once the kernel is integrated into torchao.
Performance
Here is the speedup of the W4A8 GEMM over PyTorch FP16 GEMM (calling CUTLASS) for different numbers of input tokens. The weight matrix size is N=8192, K=21760. You can reproduce the benchmark results using bench_w4a8.py in my repo.
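For reference, a minimal benchmark sketch in the spirit of bench_w4a8.py (the actual script lives in the repo) is shown below. Timing uses CUDA events; the commented-out `w4a8_gemm(...)` call is a placeholder for whatever op the kernel is eventually exposed as in torchao, and is an assumption, not a real API.

```python
import torch

def bench(fn, iters=100, warmup=10):
    """Time a CUDA op with events; returns average milliseconds per call."""
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

N, K = 8192, 21760  # weight shape used in the reported numbers
for m in (1, 16, 64, 256, 1024, 4096):  # number of input tokens
    a = torch.randn(m, K, dtype=torch.float16, device="cuda")
    w = torch.randn(K, N, dtype=torch.float16, device="cuda")
    fp16_ms = bench(lambda: a @ w)
    # w4a8_ms = bench(lambda: w4a8_gemm(a_int8, w_packed_int4, scales))  # placeholder op
    print(f"M={m}: fp16 {fp16_ms:.3f} ms")
```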