[WIP] SmoothQuant using tensor subclassing #1030
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1030
Note: Links to docs will display an error until the doc builds have been completed.
❌ 1 New Failure. As of commit fa1144c with merge base d4b2f33, the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
return insert_subclass

def save_smooth_quant_recipe(model: torch.nn.Module, save_path: str) -> Dict[str, torch.Tensor]:
Do we need this? Or is saving the state_dict of the observed model enough?
We want to have an API to modify (tune) quantization parameters, i.e. the recipe here. Do you have any concern about adding this API?
So the state_dict is supposed to be used by other APIs to tune quantization parameters? I think that's fine if you have this use case in mind. Is the model with SmoothQuantObservedLinear not serializable by itself?
SmoothQuantObservedLinear is serializable. However, a recipe is more flexible for tuning parameters. Thanks.
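For readers following along, a hypothetical sketch of what such a recipe save/load pair could look like. The field name `smoothing_factor` and the flat-dict layout are assumptions for illustration, not the PR's actual implementation:

```python
# Hypothetical sketch: collect per-module smoothing/quantization parameters
# ("the recipe") into a flat dict so they can be tuned offline and re-applied.
# The attribute name `smoothing_factor` is an assumption, not the PR's field.
from typing import Dict

import torch

def save_smooth_quant_recipe(model: torch.nn.Module, save_path: str) -> Dict[str, torch.Tensor]:
    recipe: Dict[str, torch.Tensor] = {}
    for name, module in model.named_modules():
        # Only observed linear layers carry calibration statistics.
        factor = getattr(module, "smoothing_factor", None)
        if factor is not None:
            recipe[name + ".smoothing_factor"] = factor.detach().cpu()
    torch.save(recipe, save_path)
    return recipe

def load_smooth_quant_recipe(model: torch.nn.Module, recipe_path: str) -> None:
    # Re-apply (possibly hand-tuned) parameters onto a fresh model.
    recipe = torch.load(recipe_path)
    for name, module in model.named_modules():
        key = name + ".smoothing_factor"
        if key in recipe:
            module.smoothing_factor = recipe[key]
```

The point of a separate recipe over a full state_dict is that it is small and editable: a user can tweak one tensor in the dict and re-load, without touching the model's weights.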
Hi @jerryzh168, I added a new tensor subclass. Do you have any concern about adding this new class? Thanks.
I think in this case we should be composing the tensor subclasses: at dispatch time, we first unwrap the outermost tensor subclass. Would this work? The naming for the different tensor subclasses is a bit confusing right now, I think; we should clean it up a bit later.
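A minimal sketch of the composition pattern described above, assuming two illustrative class names (not the ones in torchao): each subclass's `__torch_dispatch__` unwraps only itself, so an op on the outer wrapper falls through to the inner subclass's dispatch next.

```python
# Sketch: composing two wrapper tensor subclasses. At dispatch time the
# outermost subclass unwraps itself first, re-dispatching to the inner one.
# Class names here are illustrative, not the actual torchao classes.
import torch
from torch.utils._pytree import tree_map

class LayoutTensor(torch.Tensor):
    """Innermost subclass holding the actual data."""
    __torch_function__ = torch._C._disabled_torch_function_impl

    @staticmethod
    def __new__(cls, inner):
        return torch.Tensor._make_wrapper_subclass(
            cls, inner.shape, dtype=inner.dtype, device=inner.device
        )

    def __init__(self, inner):
        self.inner = inner

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        unwrap = lambda x: x.inner if isinstance(x, LayoutTensor) else x
        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))

class SmoothedTensor(torch.Tensor):
    """Outermost subclass; unwraps itself first, leaving the inner
    subclass to handle the op on the next dispatch."""
    __torch_function__ = torch._C._disabled_torch_function_impl

    @staticmethod
    def __new__(cls, inner):
        return torch.Tensor._make_wrapper_subclass(
            cls, inner.shape, dtype=inner.dtype, device=inner.device
        )

    def __init__(self, inner):
        self.inner = inner

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        unwrap = lambda x: x.inner if isinstance(x, SmoothedTensor) else x
        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))

t = SmoothedTensor(LayoutTensor(torch.ones(2, 2)))
out = t + 1  # outer unwraps -> inner unwraps -> plain aten add
```

In a real implementation each layer would do its own work (apply the smoothing factor, dequantize, etc.) before or after re-dispatching, rather than just unwrapping.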
It works. Thanks.
Hi @jerryzh168, it's weird that if I add these lines https://github.com/pytorch/ao/blob/f595ed41b99685cc16fc480ca2218965bb812bed/torchao/kernel/intmm.py#L142C1-L146C1 to avoid float16 overflow, there is a test failure.
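The overflow-avoidance idea being discussed can be sketched as follows; the function name and shapes are illustrative, not the actual code in `intmm.py`:

```python
# Sketch: an int32 matmul accumulator scaled directly in float16 can exceed
# fp16's max finite value (~65504) and become inf, even when the final scaled
# result fits in fp16. Upcasting to float32 for the scaling step avoids this.
import torch

def scaled_int_mm(a_int8: torch.Tensor, b_int8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Integer matmul in int32 to avoid int8 accumulator overflow.
    acc = torch.matmul(a_int8.to(torch.int32), b_int8.to(torch.int32))
    # Casting `acc` straight to fp16 would turn values > 65504 into inf;
    # do the scaling in fp32 and only cast the (smaller) result back to fp16.
    return (acc.to(torch.float32) * scale.to(torch.float32)).to(torch.float16)
```

Note that the extra fp32 round trip can shift results by one ULP relative to an all-fp16 path, which is exactly the kind of change that can trip a tight `allclose` tolerance in existing tests.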
What is the test failure? Is it possible to do the dtype conversion before the call instead?
I just saw the error; it mentions some Triton error.
The error is that the results are not all close; one element exceeds the tolerance by a small amount. As for the dtype conversion, I didn't make such changes in affine_quantized_tensor.py 🤔
Thanks for the info. Did you see which test case failed?
Still WIP
The implementation of SmoothQuant with tensor subclassing (AffineQuantizedTensor) is similar to that of AWQ with the following differences:
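For background, the core transformation both implementations build on can be sketched as a mathematically equivalent rescaling that moves activation outliers into the weights. The choice of smoothing factor below (square root of the per-channel activation max) is a simplification for illustration; the real recipe balances activation and weight ranges.

```python
# Sketch of the core SmoothQuant idea: pick a per-input-channel factor s so
# that (X / s) @ (s[:, None] * W) == X @ W exactly in math, while X / s has
# its outlier channels tamed and is therefore easier to quantize.
import torch

torch.manual_seed(0)
X = torch.randn(4, 8)
X[:, 2] *= 50.0                        # make channel 2 an activation outlier
W = torch.randn(8, 3)
# Illustrative smoothing factor: sqrt of the per-channel activation max.
s = X.abs().amax(dim=0).clamp(min=1e-5).sqrt()
Y_ref = X @ W                          # original linear layer
Y_smooth = (X / s) @ (s[:, None] * W)  # smoothed activations, rescaled weights
```

After smoothing, `X / s` has a much smaller dynamic range than `X`, so a per-tensor activation quantizer loses far less precision, while the rescaled weights remain easy to quantize per-channel.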