fix Half-Quadratic Quantization and Dequantization on CPU #873
Conversation
Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                   12          105          104            0            1
 Python                 52         2280         1940           68          272
 TOML                   20          630          564            2           64
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               38         2803            0         2132          671
 |- BASH                 6          103          100            0            3
 |- JSON                 1           12           12            0            0
 |- Python               5           92           82            0           10
 |- Rust                 9          322          274            0           48
 |- TOML                 2           75           63            0           12
 (Total)                           3407          531         2132          744
-------------------------------------------------------------------------------
 Rust                  271        79722        71594         1674         6454
 |- Markdown           132         1361           25         1241           95
 (Total)                          81083        71619         2915         6549
===============================================================================
 Total                 404        86072        74643         3878         7551
===============================================================================
Hi @haricot! Thanks for the PR. Can you please update it so it also tests 8 bit quantization? Thanks!
@haricot, were you planning on implementing HQQ for non-CUDA devices in this PR? The name seems to indicate so; I was just wondering!
Hi @EricLBuehler! My first goal was to make quantization work on my device. In fact, I could not quantize models at all: I ran out of memory (OOM). There is also a small optimization here: in the dequantize function, if the scales and zeros are f32, the output is dequantized to f32 even when another dtype is requested.
This would mean either integrating all scale dtypes into the UQFF format, or, more simply, changing the dtypes dynamically depending on the dtype the user chooses.
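To illustrate the point above, here is a minimal, hypothetical sketch (plain Rust, not mistral.rs's actual candle-based code; all names are invented) of a dequantize routine where the output dtype is chosen by the caller rather than being forced to f32 by the dtype of the scales and zeros:

```rust
// Hypothetical sketch: dequantization where the caller picks the output
// dtype via a generic parameter, instead of the result always being f32
// because the scales/zeros are stored as f32.
fn dequantize_to<T: From<f32>>(quantized: &[u8], scale: f32, zero: f32) -> Vec<T> {
    quantized
        .iter()
        // Compute in f32, then convert to the requested output dtype.
        .map(|&q| T::from((q as f32 - zero) * scale))
        .collect()
}

fn main() {
    let q = vec![0u8, 1, 2, 3];
    // Same packed data, two different target dtypes.
    let as_f32: Vec<f32> = dequantize_to(&q, 0.5, 1.0);
    let as_f64: Vec<f64> = dequantize_to(&q, 0.5, 1.0);
    println!("{:?}", as_f32); // [-0.5, 0.0, 0.5, 1.0]
    println!("{:?}", as_f64);
}
```

In the real code the conversion target would be a runtime dtype (e.g. f16/bf16) rather than a compile-time generic, but the principle is the same: the requested dtype, not the scales' storage dtype, should decide the output.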
Hi @haricot! All tests pass on CPU (the target of this PR) and the changes look good. Merging now, thanks for the contribution.
…er#873)

* test_bitpack cpu/cuda
* add test_bitpack 8 bit quantization cpu/cuda
* fix unnecessary nested cfg attributes
* fix alloc/init cpu dequantize hqq
* ensuring contiguous data slices
* Revert "ensuring contiguous data slices to see result in CI"
* code cleanup
* ensuring contiguous data slices
This confirms that test_bitpack was previously running only on non-CPU hardware. The fix addresses this by ensuring contiguous data slices.
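For context, here is a minimal, self-contained sketch of the kind of 4-bit pack/unpack round trip a test like test_bitpack exercises (hypothetical function names, plain Rust over contiguous slices; not the actual mistral.rs implementation):

```rust
// Hypothetical sketch: pack pairs of 4-bit values into single bytes and
// unpack them again, the basic operation a bit-packing test verifies.
// Both functions operate on contiguous slices.
fn pack_4bit(values: &[u8]) -> Vec<u8> {
    values
        .chunks(2)
        .map(|pair| {
            let lo = pair[0] & 0x0F;
            // If the input length is odd, pad the high nibble with zero.
            let hi = pair.get(1).copied().unwrap_or(0) & 0x0F;
            (hi << 4) | lo
        })
        .collect()
}

fn unpack_4bit(packed: &[u8]) -> Vec<u8> {
    packed
        .iter()
        .flat_map(|&byte| [byte & 0x0F, byte >> 4])
        .collect()
}

fn main() {
    let vals = vec![1u8, 2, 3, 4, 5, 6];
    let packed = pack_4bit(&vals);
    // Round trip must reproduce the original values.
    assert_eq!(unpack_4bit(&packed), vals);
    println!("round trip ok: {:?}", packed);
}
```

A CPU/CUDA test would run this same round trip on both device backends and compare results; the contiguity requirement matters because strided (non-contiguous) views would interleave unrelated bytes into the packed output.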