
[MOE Quantization] Warn against "undercalibrated" modules #2262

Open

dbogunowicz wants to merge 62 commits into main
Conversation

dbogunowicz (Contributor) commented May 2, 2024

Note: this branch requires neuralmagic/compressed-tensors#46 to land in compressed-tensors first.

Example Use:

from sparseml.transformers import SparseAutoModelForCausalLM, SparseAutoTokenizer, oneshot
import torch

model_name = "Isotonic/TinyMixtral-4x248M-MoE"

model = SparseAutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cuda:0",
    torch_dtype=torch.float16,
)
tokenizer = SparseAutoTokenizer.from_pretrained(model_name)

dataset = "open-platypus"
recipe = "tests/sparseml/transformers/compression/recipes/new_quant_full.yaml"

oneshot(
    model=model,
    dataset=dataset,
    overwrite_output_dir=True,
    output_dir="./output_one_shot",
    recipe=recipe,
    num_calibration_samples=4,
    pad_to_max_length=False,
    # warn when a module receives fewer than 30% of the calibration batch tokens
    min_tokens_per_group=0.3,
)

Output:
2024-05-15 12:15:13 sparseml.transformers.finetune.runner INFO     *** One Shot ***
2024-05-15 12:15:14 sparseml.core.recipe.recipe INFO     Loading recipe from file tests/sparseml/transformers/compression/recipes/new_quant_full.yaml
/root/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_kwargs" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
2024-05-15 12:15:14 sparseml.modifiers.quantization_vllm.pytorch INFO     Running vLLMQuantizationModifier calibration with 4 samples...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  4.19it/s]
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING  The module_name: model.layers.0.block_sparse_moe.experts.1.w1 received less than 30% of calibration batch tokens (212/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING  The module_name: model.layers.0.block_sparse_moe.experts.1.w2 received less than 30% of calibration batch tokens (212/970 tokens). This may harm the quantization quality.
...
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING  The module_name: model.layers.10.block_sparse_moe.experts.3.w3 received less than 30% of calibration batch tokens (233/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING  The module_name: model.layers.11.block_sparse_moe.experts.2.w1 received less than 30% of calibration batch tokens (21/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING  The module_name: model.layers.11.block_sparse_moe.experts.2.w2 received less than 30% of calibration batch tokens (21/970 tokens). This may harm the quantization quality.
2024-05-15 12:15:15 sparseml.modifiers.quantization_vllm.pytorch WARNING  The module_name: model.layers.11.block_sparse_moe.experts.2.w3 received less than 30% of calibration batch tokens (21/970 tokens). This may harm the quantization quality.
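
The warnings come from counting how many of the calibration batch tokens reach each expert module: with min_tokens_per_group=0.3 and 970 calibration tokens, any module that sees fewer than roughly 291 tokens is flagged, which is why the 212/970 and 21/970 experts above are reported. Below is a minimal sketch of such a check, assuming top-k routing over the router logits; the helper names are hypothetical and do not mirror the vLLMQuantizationModifier's actual implementation.

# Hypothetical sketch of the undercalibration check; helper names are made up
# for illustration and are not the modifier's actual code.
import logging
from collections import Counter

import torch

logging.basicConfig(level=logging.WARNING)
_LOGGER = logging.getLogger(__name__)


def count_expert_tokens(router_logits: torch.Tensor, top_k: int) -> Counter:
    # router_logits: (num_tokens, num_experts) gating scores for the batch;
    # each token is dispatched to its top_k highest-scoring experts
    _, expert_ids = torch.topk(router_logits, k=top_k, dim=-1)
    return Counter(expert_ids.flatten().tolist())


def warn_undercalibrated(
    counts: Counter, total_tokens: int, num_experts: int, min_tokens_per_group: float
) -> None:
    # flag every expert whose share of the calibration batch is below the threshold
    threshold = min_tokens_per_group * total_tokens
    for expert_id in range(num_experts):
        seen = counts[expert_id]  # Counter returns 0 for experts that saw no tokens
        if seen < threshold:
            _LOGGER.warning(
                f"Expert {expert_id} received less than "
                f"{min_tokens_per_group:.0%} of calibration batch tokens "
                f"({seen}/{total_tokens}); this may harm the quantization quality."
            )


if __name__ == "__main__":
    # toy batch: 970 tokens, 4 experts, top-2 routing; expert 3 is biased down
    # so it is rarely selected and therefore triggers the warning
    router_logits = torch.randn(970, 4)
    router_logits[:, 3] -= 5.0
    counts = count_expert_tokens(router_logits, top_k=2)
    warn_undercalibrated(counts, total_tokens=970, num_experts=4, min_tokens_per_group=0.3)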

@dbogunowicz dbogunowicz changed the base branch from main to sa/quant_mod_refactor May 2, 2024 11:35
@dbogunowicz dbogunowicz changed the title [One-shot MOE model] Support for Qwen2-MOE [WiP][One-shot MOE model] Support for Qwen2-MOE May 6, 2024
@dbogunowicz dbogunowicz changed the base branch from sa/quant_mod_refactor to feature/damian/bump_transformers_440 May 6, 2024 11:49
@dbogunowicz dbogunowicz changed the base branch from feature/damian/bump_transformers_440 to sa/quant_mod_refactor May 6, 2024 11:49
@dbogunowicz dbogunowicz changed the title [WiP][One-shot MOE model] Support for Qwen2-MOE [WiP][MOE Quantization] Support for Qwen2-MOE May 6, 2024
bfineran (Contributor) left a comment:

let's move this to an examples/integrations directory

Base automatically changed from sa/quant_mod_refactor to main May 6, 2024 20:02
@dbogunowicz dbogunowicz changed the base branch from main to feature/damian/bump_transformers_440 May 7, 2024 10:28
@dbogunowicz dbogunowicz changed the title [WiP][MOE Quantization] Support for Qwen2-MOE [MOE Quantization] Warn against "undercalibrated" modules May 7, 2024
@dbogunowicz dbogunowicz changed the base branch from feature/damian/bump_transformers_440 to main May 10, 2024 12:35