Add generic fake quantized embedding for QAT #1085

andrewor14 · 2024-10-15T21:54:32Z

Summary: This is the nn.Embedding counterpart of #1020. This commit adds a generic fake quantized embedding module that replaces the existing, more specific QAT embedding modules. For example, Int4WeightOnlyQATEmbedding can be expressed as follows:

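import torch
from torchao.quantization.prototype.qat.api import FakeQuantizeConfig
from torchao.quantization.prototype.qat.embedding import FakeQuantizedEmbedding

# group_size is chosen by the caller; it is left as a placeholder here.
weight_config = FakeQuantizeConfig(
    dtype=torch.int4,
    group_size=group_size,
    is_symmetric=True,
)
# 16 embeddings of dimension 32, with weights fake quantized per weight_config
fq_embedding = FakeQuantizedEmbedding(16, 32, weight_config=weight_config)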
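
For readers unfamiliar with fake quantization, the following is a minimal sketch of what a symmetric, per-group int4 weight config asks for numerically: each group of weights is rounded to an int4 value and immediately dequantized, so the result stays in floating point. It is written in plain Python so it runs without torch. The function name `fake_quantize_symmetric` and the scale formula (`max_abs / qmax`) are assumptions made for illustration, not torchao's implementation, which may compute scales differently.

```python
def fake_quantize_symmetric(values, group_size, bits=4):
    """Quantize then dequantize `values` group by group (illustrative only).

    Hypothetical helper, not part of torchao.
    """
    if len(values) % group_size != 0:
        raise ValueError("len(values) must be divisible by group_size")
    # Signed integer range for the given bit width; int4 gives [-8, 7].
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Assumed scale choice: one scale per group, no zero point (symmetric).
        # An all-zero group falls back to scale 1.0 to avoid dividing by zero.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for v in group:
            q = max(qmin, min(qmax, round(v / scale)))  # round and clamp
            out.append(q * scale)  # dequantize back to float
    return out


row = [3.5, 0.6, -1.3, 0.0]
print(fake_quantize_symmetric(row, group_size=4))  # [3.5, 0.5, -1.5, 0.0]
```

The output keeps the input's length and float type; only the values are snapped to the grid defined by each group's scale, which is the rounding error that QAT exposes the model to during training.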

Test Plan:
python test/quantization/test_qat.py -k test_qat_4w_embedding
python test/quantization/test_qat.py -k test_fake_quantized_embedding_4w

pytorch-bot · 2024-10-15T21:54:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1085

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 53239e2 with merge base 48bc81c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

torchao/quantization/prototype/qat/embedding.py

andrewor14 requested a review from jerryzh168 October 15, 2024 21:54

facebook-github-bot added the CLA Signed label Oct 15, 2024

jerryzh168 approved these changes Oct 15, 2024

jerryzh168 reviewed Oct 15, 2024

torchao/quantization/prototype/qat/embedding.py (outdated, resolved)

jerryzh168 reviewed Oct 15, 2024

torchao/quantization/prototype/qat/embedding.py (outdated, resolved)

andrewor14 force-pushed the fq-embedding branch from e88f00e to 997e2ce October 16, 2024 02:52

andrewor14 force-pushed the fq-embedding branch from 997e2ce to 53239e2 October 16, 2024 02:52

andrewor14 merged commit 0b71b8d into main Oct 16, 2024
17 checks passed
