AutoGPTQ state dict converter #2315

Closed
wants to merge 3 commits

Conversation

@rahul-tuli (Member) commented Jun 4, 2024

PR Description

This pull request introduces the following enhancements:

  1. BaseConverter for Transforming Model Checkpoints:

    • A new BaseConverter class has been added to facilitate the transformation of model checkpoints.
  2. ExllamaToCompressedTensorConverter:

    • This new converter transforms an AutoGPTQ Exllama checkpoint into the CompressedTensors format, making it loadable in SparseAutoModel classes.
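
For context, a rough sketch of how these two pieces might fit together is shown below. The interface is simplified and partly hypothetical (only convert_from_safetensors and the transformation names appear in the example and log further down), so treat it as an illustration rather than the actual implementation:

from typing import Callable, Dict, List

import torch

# Hypothetical, simplified sketch of the converter pattern described above;
# the real classes live in sparseml.utils.pytorch and may differ in detail.
StateDict = Dict[str, torch.Tensor]
Transformation = Callable[[StateDict], StateDict]


class BaseConverter:
    """Loads a checkpoint, applies an ordered list of transformations, re-saves it."""

    @classmethod
    def transformations(cls) -> List[Transformation]:
        # Subclasses declare the state-dict transformations they apply, in order
        raise NotImplementedError

    @classmethod
    def convert_from_safetensors(cls, filepath: str, save_dir: str) -> str:
        # 1. load the source state dict from `filepath`
        # 2. pipe it through each transformation from `cls.transformations()`
        # 3. save the result (plus copied config/tokenizer files) to `save_dir`
        # 4. return the path of the converted checkpoint
        raise NotImplementedError


class ExllamaToCompressedTensorConverter(BaseConverter):
    """Maps AutoGPTQ/Exllama tensor packing and names to the CompressedTensors layout."""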

Test Code

Below is an example of how to use the ExllamaToCompressedTensorConverter:

from sparseml.utils.pytorch import ExllamaToCompressedTensorConverter


def local_test():
    # Path to an existing AutoGPTQ (Exllama) w4a16 checkpoint
    autogptq_model_path: str = "/network/rahul/tinyllama_1b_test_w4a16"

    # Convert the checkpoint into the CompressedTensors format
    new_path = ExllamaToCompressedTensorConverter.convert_from_safetensors(
        autogptq_model_path, save_dir="local/models/compressed_tensor_equi"
    )

    # Deferred import so the transformers stack is only loaded after conversion;
    # the converted checkpoint loads directly with SparseAutoModel classes
    from sparseml.transformers import SparseAutoModelForCausalLM

    model = SparseAutoModelForCausalLM.from_pretrained(new_path)


local_test()

Output:

python local/investigation.py
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:186: UserWarning: Field name "registry_requires_subclass" shadows an attribute in parent "RegistryMixin"; 
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:186: UserWarning: Field name "registry_requires_subclass" shadows an attribute in parent "SparsityCompressionConfig"; 
  warnings.warn(
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.converters INFO     Loading file: /network/rahul/tinyllama_1b_test_w4a16/model.safetensors
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.converters INFO     Applying transformations...
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.transformations INFO     Applying transformation: TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO     Transformation: TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS complete
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO     Applying transformation: TRANSFORM_EXLLAMA_NAMES
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO     Transformation: TRANSFORM_EXLLAMA_NAMES complete
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/quantize_config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer_config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/special_tokens_map.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer.model to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/recipe.yaml to local/models/compressed_tensor_equi
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_kwargs" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/transformers/utils/import_utils.py:521: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
  warnings.warn(
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Updating quantization config...
2024-06-07 14:34:50 sparseml.transformers.sparsification.sparse_model WARNING  The dtype of the loaded model: torch.float32 is different from from the dtype specified in the model config: torch.float16.To load the model in the format that it was previously saved in, set torch_dtype=`auto` in the SparseAutoModel creation call.
2024-06-07 14:34:50 sparseml.transformers.utils.helpers INFO     Found recipe in the model_path: local/models/compressed_tensor_equi/recipe.yaml
Logging all SparseML modifier-level logs to sparse_logs/07-06-2024_14.34.50.log
2024-06-07 14:34:50 sparseml.core.logger.logger INFO     Logging all SparseML modifier-level logs to sparse_logs/07-06-2024_14.34.50.log
2024-06-07 14:34:50 sparseml.core.recipe.recipe INFO     Loading recipe from file local/models/compressed_tensor_equi/recipe.yaml
2024-06-07 14:34:50 sparseml.modifiers.quantization.gptq.base WARNING  GPTQ quantization is set to True without an active quantization modifier.
2024-06-07 14:34:50 sparseml.modifiers.quantization.gptq.base INFO     Building quantization modifier with args: {'config_groups': {'group_0': QuantizationScheme(targets=['Linear'], weights=QuantizationArgs(num_bits=4, type=<QuantizationType.INT: 'int'>, symmetric=True, group_size=128, strategy=<QuantizationStrategy.GROUP: 'group'>, block_structure=None, dynamic=False, observer='minmax', observer_kwargs={}), input_activations=None, output_activations=None)}, 'ignore': ['lm_head', 'Embedding']}
manager stage: Model structure initialized
2024-06-07 14:34:50 sparseml.pytorch.model_load.helpers INFO     Applied an unstaged recipe to the model at local/models/compressed_tensor_equi
➜  sparseml git:(autogptq-compressed-tensors) 

Original Checkpoint:

➜  sparseml git:(autogptq-compressed-tensors) tree "/network/rahul/tinyllama_1b_test_w4a16"
/network/rahul/tinyllama_1b_test_w4a16
|-- config.json
|-- model.safetensors
|-- quantize_config.json
|-- recipe.yaml
|-- special_tokens_map.json
|-- tokenizer.json
|-- tokenizer.model
`-- tokenizer_config.json

0 directories, 8 files

New Checkpoint:

➜  sparseml git:(autogptq-compressed-tensors) tree "local/models/compressed_tensor_equi"
local/models/compressed_tensor_equi
|-- config.json
|-- model.safetensors
|-- quantize_config.json
|-- recipe.yaml
|-- special_tokens_map.json
|-- tokenizer.json
|-- tokenizer.model
`-- tokenizer_config.json

0 directories, 8 files
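
For reference on what the TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS step in the log above involves, here is a minimal sketch of unpacking AutoGPTQ's 4-bit qweight packing. It assumes the standard layout where each int32 packs eight 4-bit values along the input dimension; it is illustrative only and not the code used in this PR:

import torch


def unpack_autogptq_qweight_4bit(qweight: torch.Tensor) -> torch.Tensor:
    # qweight: int32 tensor of shape (in_features // 8, out_features), each int32
    # packing eight 4-bit weights in sequential nibble order (an assumption of
    # this sketch). Returns int32 weights of shape (in_features, out_features).
    shifts = torch.arange(0, 32, 4, device=qweight.device)            # 8 nibbles per int32
    unpacked = (qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & 0xF  # (in//8, 8, out)
    return unpacked.reshape(-1, qweight.shape[-1]).to(torch.int32)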

@rahul-tuli rahul-tuli force-pushed the autogptq-compressed-tensors branch from 31ef166 to 643598e Compare June 5, 2024 15:46
converting autogptq checkpoint to be loadable in sparseml, credits to @dbogunowicz for a dimension mismatch bugfix
Stitch converter with init files and bugfixes
@rahul-tuli rahul-tuli force-pushed the autogptq-compressed-tensors branch from 643598e to adc42f2 Compare June 7, 2024 14:29
@rahul-tuli rahul-tuli changed the title from "[WIP] AutoGptq state dict converter" to "AutoGPTQ state dict converter" Jun 7, 2024
@rahul-tuli rahul-tuli marked this pull request as ready for review June 7, 2024 14:38
@rahul-tuli rahul-tuli self-assigned this Jun 7, 2024
@dbogunowicz (Contributor) left a comment

lgtm

could we:

  1. turn local_test() into a unittest
  2. also make sure that the pre-translation and post-translation tests give the same outputs? just to make sure that the translation from one model to the other is not lossy in any way.
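
A minimal sketch of what such an equivalence check could look like (illustrative only; assumes auto-gptq is installed and both checkpoints fit on a single GPU, and the tolerance would need tuning since the fake-quant path and the exllama kernels may not be bit-identical):

import torch
from transformers import AutoTokenizer

from auto_gptq import AutoGPTQForCausalLM
from sparseml.transformers import SparseAutoModelForCausalLM

ORIGINAL = "/network/rahul/tinyllama_1b_test_w4a16"   # pre-translation checkpoint
CONVERTED = "local/models/compressed_tensor_equi"     # post-translation checkpoint


def test_conversion_is_lossless():
    tokenizer = AutoTokenizer.from_pretrained(ORIGINAL)
    inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda:0")

    original = AutoGPTQForCausalLM.from_quantized(ORIGINAL, device="cuda:0")
    converted = SparseAutoModelForCausalLM.from_pretrained(
        CONVERTED, torch_dtype="auto"
    ).to("cuda:0")

    with torch.no_grad():
        logits_a = original(**inputs).logits
        logits_b = converted(**inputs).logits

    # small tolerance for dtype / kernel differences between the two paths
    assert torch.allclose(logits_a, logits_b, atol=1e-3)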

@bfineran (Contributor) left a comment

let's run some evals against the produced models
and before we land, let's rewrite translate so we process layer by layer to save memory
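
One way the layer-by-layer rewrite could look (a minimal sketch under the assumption of a single model.safetensors file; transform_tensor is a hypothetical placeholder for the converter's per-tensor transformations, not an existing function):

import torch
from safetensors import safe_open
from safetensors.torch import save_file


def transform_tensor(name: str, tensor: torch.Tensor):
    # Hypothetical placeholder: the real converter would rename / repack here
    return name, tensor


def translate_streaming(src_path: str, dst_path: str) -> None:
    # Stream tensors one at a time instead of materializing the full source
    # state dict; only the tensor currently being converted is held alongside
    # the already-converted results.
    converted = {}
    with safe_open(src_path, framework="pt", device="cpu") as f:
        for name in f.keys():
            tensor = f.get_tensor(name)  # load just this tensor
            new_name, new_tensor = transform_tensor(name, tensor)
            converted[new_name] = new_tensor
            del tensor
    save_file(converted, dst_path)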

@rahul-tuli (Member, Author) commented:

Functionality moved to compressed tensors: neuralmagic/compressed-tensors#82

@rahul-tuli rahul-tuli closed this Jun 13, 2024