
Int8DynActInt4WeightQATQuantizer doesn't support qwen series #1080

Open · elfisworking opened this issue Oct 15, 2024 · 4 comments

elfisworking commented Oct 15, 2024

I use Int8DynActInt4WeightQATQuantizer to quantize the qwen2-1.5B model, but after the prepare function I find that bias is set to False.
This is my code:

from torchtune.models.qwen2 import qwen2_1_5b
model = qwen2_1_5b()
from torchao.quantization.prototype.qat.linear import Int8DynActInt4WeightQATQuantizer
qat_quantizer = Int8DynActInt4WeightQATQuantizer()
print("before prepare: ", model)
model = qat_quantizer.prepare(model)
print("after prepare: ", model)

The output is:

before prepare:  TransformerDecoder(
  (tok_embeddings): Embedding(151936, 1536)
  (layers): ModuleList(
    (0-27): 28 x TransformerSelfAttentionLayer(
      (attn): MultiHeadAttention(
        (q_proj): Linear(in_features=1536, out_features=1536, bias=True)
        (k_proj): Linear(in_features=1536, out_features=256, bias=True)
        (v_proj): Linear(in_features=1536, out_features=256, bias=True)
        (output_proj): Linear(in_features=1536, out_features=1536, bias=False)
        (pos_embeddings): Qwen2RotaryPositionalEmbeddings()
      )
      (mlp): FeedForward(
        (w1): Linear(in_features=1536, out_features=8960, bias=False)
        (w2): Linear(in_features=8960, out_features=1536, bias=False)
        (w3): Linear(in_features=1536, out_features=8960, bias=False)
        (activation): SiLU()
      )
      (sa_norm): RMSNorm()
      (mlp_norm): RMSNorm()
      (sa_scale): Identity()
      (mlp_scale): Identity()
    )
  )
  (norm): RMSNorm()
)
after prepare:  TransformerDecoder(
  (tok_embeddings): Embedding(151936, 1536)
  (layers): ModuleList(
    (0-27): 28 x TransformerSelfAttentionLayer(
      (attn): MultiHeadAttention(
        (q_proj): Int8DynActInt4WeightQATLinear(in_features=1536, out_features=1536, bias=False)
        (k_proj): Int8DynActInt4WeightQATLinear(in_features=1536, out_features=256, bias=False)
        (v_proj): Int8DynActInt4WeightQATLinear(in_features=1536, out_features=256, bias=False)
        (output_proj): Int8DynActInt4WeightQATLinear(in_features=1536, out_features=1536, bias=False)
        (pos_embeddings): Qwen2RotaryPositionalEmbeddings()
      )
      (mlp): FeedForward(
        (w1): Int8DynActInt4WeightQATLinear(in_features=1536, out_features=8960, bias=False)
        (w2): Int8DynActInt4WeightQATLinear(in_features=8960, out_features=1536, bias=False)
        (w3): Int8DynActInt4WeightQATLinear(in_features=1536, out_features=8960, bias=False)
        (activation): SiLU()
      )
      (sa_norm): RMSNorm()
      (mlp_norm): RMSNorm()
      (sa_scale): Identity()
      (mlp_scale): Identity()
    )
  )
  (norm): RMSNorm()
)

We can see that after the prepare function, (q_proj): Linear(in_features=1536, out_features=1536, bias=True) has been replaced with (q_proj): Int8DynActInt4WeightQATLinear(in_features=1536, out_features=1536, bias=False).
In the torchao code, the replacement is done in replacement_fn:

def replacement_fn(child: torch.nn.Module) -> torch.nn.Module:
    new_linear = linear_class(
        child.in_features,
        child.out_features,
        bias=False,
        device=child.weight.device,
        groupsize=groupsize,
        precision=precision,
        scales_precision=scales_precision,
    )

where bias is hard-coded to False, so the original bias is silently dropped.
Is there any solution to this problem?
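
For reference, a minimal sketch that lists which Linear layers in the model carry a bias, i.e. the layers whose bias would be dropped by prepare (it uses only standard torch.nn introspection):

import torch.nn as nn
from torchtune.models.qwen2 import qwen2_1_5b

model = qwen2_1_5b()
# Print every Linear layer that has a bias term; these are the layers whose
# bias would be dropped when they are swapped for Int8DynActInt4WeightQATLinear.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear) and module.bias is not None:
        print(name, tuple(module.bias.shape))
# Expected: q_proj, k_proj and v_proj in each of the 28 decoder layers.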

elfisworking commented Oct 15, 2024

I read the code. In the function filter_fn:

    def filter_fn(child: torch.nn.Module, cur_fqn:str) -> bool:
        return isinstance(child, nn.Linear) and (_check_linear_int4_k(child.in_features, groupsize) or padding_allowed)

Adding the condition child.bias is None might be a solution. For example:

    def filter_fn(child: torch.nn.Module, cur_fqn:str) -> bool:
        return isinstance(child, nn.Linear) and (_check_linear_int4_k(child.in_features, groupsize) or padding_allowed) and child.bias is None

This would skip the linear layers where bias is True.
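
A hypothetical sanity check, assuming the filter_fn change above is applied (Int8DynActInt4WeightQATLinear is imported from the same torchao module as the quantizer):

from torchtune.models.qwen2 import qwen2_1_5b
from torchao.quantization.prototype.qat.linear import (
    Int8DynActInt4WeightQATLinear,
    Int8DynActInt4WeightQATQuantizer,
)

model = qwen2_1_5b()
model = Int8DynActInt4WeightQATQuantizer().prepare(model)
attn = model.layers[0].attn
# q_proj has bias=True, so with the proposed filter it should be left alone ...
assert not isinstance(attn.q_proj, Int8DynActInt4WeightQATLinear)
assert attn.q_proj.bias is not None
# ... while output_proj has bias=False and should still be swapped.
assert isinstance(attn.output_proj, Int8DynActInt4WeightQATLinear)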

jerryzh168 commented:

cc @andrewor14 can you take a look

andrewor14 self-assigned this Oct 15, 2024

andrewor14 commented Oct 15, 2024

Hi @elfisworking, yes, the easy fix would be to skip the replacement when the linear layer has a bias. Would you like to submit a fix for this? If not, I can do it too.

The longer-term fix would be to actually support the bias=True case. This is currently not supported because the quantized linear used in the convert path (Int8DynActInt4WeightLinear) does not support bias. If we make the convert path call the tensor-subclass path (using quantize_(model, int8_dynamic_activation_int4_weight())) instead, this problem will be resolved. This is on my TODO list.
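
A minimal sketch of that tensor-subclass path, assuming the quantize_ API exported from torchao.quantization; group_size=32 is only an illustrative choice:

from torchtune.models.qwen2 import qwen2_1_5b
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight

model = qwen2_1_5b()
# quantize_ replaces the weight tensors in place with quantized tensor
# subclasses, so Linear layers that have a bias should keep it.
quantize_(model, int8_dynamic_activation_int4_weight(group_size=32))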

elfisworking commented:

@andrewor14 OK, I will submit a fix.
