Add fp8 quantization for conv and linear layers #277

nithinsubbiah · 2024-10-14T16:33:12Z

This is a follow-up to #202 that injects fp8 linear and convolution kernels into punet.


stellaraccident · 2024-10-17T23:36:33Z

sharktank/sharktank/kernels/batch_matmul_transpose_b.py

@@ -80,7 +80,7 @@ def generate(self, ksel: KernelSelection, kb: KernelBuilder):
        spec_sig = f"L{a_ident}_R{b_ident}"
        template_file = "batch_matmul_transpose_b.mlir"
        target_function_name = f"sharktank_batch_matmul_transpose_b_{spec_sig}"
-
+        cst_zero = "0." if "f" in str(accum_type) else "0"


Be specific. Don't just use stringy char-in-str.

In this case, I think you can say something like:

cst_zero = "0" if IntegerType.isa(accum_type) else "0."
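
To illustrate the review point above, here is a minimal, self-contained sketch of why a substring test on the printed type is fragile. The classes `IntType` and `FloatType` are hypothetical stand-ins, not the MLIR Python bindings used by sharktank, and the printed form `!foo.i8` is an invented example of an integer-like type whose text happens to contain the letter "f".

```python
from dataclasses import dataclass


# Hypothetical stand-ins for element types; NOT the real MLIR bindings.
@dataclass(frozen=True)
class IntType:
    text: str  # printed form, e.g. "i32"

    def __str__(self) -> str:
        return self.text


@dataclass(frozen=True)
class FloatType:
    text: str  # printed form, e.g. "f32" or "bf16"

    def __str__(self) -> str:
        return self.text


def zero_by_substring(accum_type) -> str:
    """The approach in the diff: guess float-ness from the printed name."""
    return "0." if "f" in str(accum_type) else "0"


def zero_by_type(accum_type) -> str:
    """The approach the review asks for: decide on the type itself."""
    return "0" if isinstance(accum_type, IntType) else "0."


if __name__ == "__main__":
    samples = [
        IntType("i32"),
        FloatType("f32"),
        FloatType("bf16"),
        IntType("!foo.i8"),  # invented printed form containing "f"
    ]
    for t in samples:
        print(f"{str(t):>8}  substring={zero_by_substring(t)!r}  typed={zero_by_type(t)!r}")
```

The two helpers agree on the common cases and diverge only on the last sample, where the substring test picks a float literal for an integer type. In the actual kernel the same idea would use the bindings' own type check, as the reviewer suggests; the exact method name should be confirmed against the bindings in use.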

nithinsubbiah marked this pull request as draft October 14, 2024 16:33

nithinsubbiah force-pushed the punet_f8 branch 2 times, most recently from 0a9185b to c2d02cc Compare October 14, 2024 23:55

nithinsubbiah requested a review from stellaraccident October 14, 2024 23:56

nithinsubbiah marked this pull request as ready for review October 14, 2024 23:56

nithinsubbiah requested a review from rsuderman October 14, 2024 23:57

nithinsubbiah force-pushed the punet_f8 branch 4 times, most recently from 6b49a8e to 4e84bbb Compare October 17, 2024 00:58

nithinsubbiah added 2 commits October 17, 2024 16:19

Add fp8 quantization for conv and linear layers

939416c

Update test signature

5e9aa44

nithinsubbiah force-pushed the punet_f8 branch from 984e342 to 33d45a0 Compare October 17, 2024 23:20

Fix formatting

33d45a0

nithinsubbiah enabled auto-merge (squash) October 17, 2024 23:32

rsuderman approved these changes Oct 17, 2024


sharktank/sharktank/kernels/templates/batch_matmul_transpose_b.mlir

sharktank/sharktank/ops/qconv_impls.py

nithinsubbiah merged commit ec5672e into nod-ai:main Oct 17, 2024
8 of 9 checks passed

nithinsubbiah added a commit that referenced this pull request Oct 17, 2024

Revert "Add fp8 quantization for conv and linear layers (#277)"

0022804

This reverts commit ec5672e.

stellaraccident reviewed Oct 17, 2024
