AutoGPTQ state dict converter #2315

Closed
wants to merge 3 commits

Conversation

@rahul-tuli (Member) commented Jun 4, 2024

PR Description

This pull request introduces the following enhancements:

  1. BaseConverter for Transforming Model Checkpoints:

    • A new BaseConverter class has been added to facilitate the transformation of model checkpoints.
  2. ExllamaToCompressedTensorConverter:

    • This new converter transforms an AutoGPTQ Exllama checkpoint into the CompressedTensors format, making it loadable in SparseAutoModel classes.
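
For context, a rough sketch of how these two pieces might fit together is shown below. The interface is simplified and partly hypothetical (only convert_from_safetensors and the transformation names appear in the example and log further down), so treat it as an illustration rather than the actual implementation:

from typing import Callable, Dict, List

import torch

# Hypothetical, simplified sketch of the converter pattern described above;
# the real classes live in sparseml.utils.pytorch and may differ in detail.
StateDict = Dict[str, torch.Tensor]
Transformation = Callable[[StateDict], StateDict]


class BaseConverter:
    """Loads a checkpoint, applies an ordered list of transformations, re-saves it."""

    @classmethod
    def transformations(cls) -> List[Transformation]:
        # Subclasses declare the state-dict transformations they apply, in order
        raise NotImplementedError

    @classmethod
    def convert_from_safetensors(cls, filepath: str, save_dir: str) -> str:
        # 1. load the source state dict from `filepath`
        # 2. pipe it through each transformation from `cls.transformations()`
        # 3. save the result (plus copied config/tokenizer files) to `save_dir`
        # 4. return the path of the converted checkpoint
        raise NotImplementedError


class ExllamaToCompressedTensorConverter(BaseConverter):
    """Maps AutoGPTQ/Exllama tensor packing and names to the CompressedTensors layout."""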

Test Code

Below is an example of how to use the ExllamaToCompressedTensorConverter:

from sparseml.utils.pytorch import ExllamaToCompressedTensorConverter


def local_test():
    # Path to an existing AutoGPTQ (Exllama) w4a16 checkpoint
    autogptq_model_path: str = "/network/rahul/tinyllama_1b_test_w4a16"

    # Convert the checkpoint into the CompressedTensors format
    new_path = ExllamaToCompressedTensorConverter.convert_from_safetensors(
        autogptq_model_path, save_dir="local/models/compressed_tensor_equi"
    )

    # Deferred import so the transformers stack is only loaded after conversion;
    # the converted checkpoint loads directly with SparseAutoModel classes
    from sparseml.transformers import SparseAutoModelForCausalLM

    model = SparseAutoModelForCausalLM.from_pretrained(new_path)


local_test()

Output:

python local/investigation.py
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:186: UserWarning: Field name "registry_requires_subclass" shadows an attribute in parent "RegistryMixin"; 
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:186: UserWarning: Field name "registry_requires_subclass" shadows an attribute in parent "SparsityCompressionConfig"; 
  warnings.warn(
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.converters INFO     Loading file: /network/rahul/tinyllama_1b_test_w4a16/model.safetensors
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.converters INFO     Applying transformations...
2024-06-07 14:33:54 sparseml.utils.pytorch.converters.transformations INFO     Applying transformation: TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO     Transformation: TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS complete
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO     Applying transformation: TRANSFORM_EXLLAMA_NAMES
2024-06-07 14:34:35 sparseml.utils.pytorch.converters.transformations INFO     Transformation: TRANSFORM_EXLLAMA_NAMES complete
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/quantize_config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer_config.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/special_tokens_map.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer.model to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/tokenizer.json to local/models/compressed_tensor_equi
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Copying file: /network/rahul/tinyllama_1b_test_w4a16/recipe.yaml to local/models/compressed_tensor_equi
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_fuse_fn_kwargs" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/root/projects/sparseml/.venv/lib/python3.10/site-packages/transformers/utils/import_utils.py:521: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
  warnings.warn(
2024-06-07 14:34:49 sparseml.utils.pytorch.converters.converters INFO     Updating quantization config...
2024-06-07 14:34:50 sparseml.transformers.sparsification.sparse_model WARNING  The dtype of the loaded model: torch.float32 is different from from the dtype specified in the model config: torch.float16.To load the model in the format that it was previously saved in, set torch_dtype=`auto` in the SparseAutoModel creation call.
2024-06-07 14:34:50 sparseml.transformers.utils.helpers INFO     Found recipe in the model_path: local/models/compressed_tensor_equi/recipe.yaml
Logging all SparseML modifier-level logs to sparse_logs/07-06-2024_14.34.50.log
2024-06-07 14:34:50 sparseml.core.logger.logger INFO     Logging all SparseML modifier-level logs to sparse_logs/07-06-2024_14.34.50.log
2024-06-07 14:34:50 sparseml.core.recipe.recipe INFO     Loading recipe from file local/models/compressed_tensor_equi/recipe.yaml
2024-06-07 14:34:50 sparseml.modifiers.quantization.gptq.base WARNING  GPTQ quantization is set to True without an active quantization modifier.
2024-06-07 14:34:50 sparseml.modifiers.quantization.gptq.base INFO     Building quantization modifier with args: {'config_groups': {'group_0': QuantizationScheme(targets=['Linear'], weights=QuantizationArgs(num_bits=4, type=<QuantizationType.INT: 'int'>, symmetric=True, group_size=128, strategy=<QuantizationStrategy.GROUP: 'group'>, block_structure=None, dynamic=False, observer='minmax', observer_kwargs={}), input_activations=None, output_activations=None)}, 'ignore': ['lm_head', 'Embedding']}
manager stage: Model structure initialized
2024-06-07 14:34:50 sparseml.pytorch.model_load.helpers INFO     Applied an unstaged recipe to the model at local/models/compressed_tensor_equi
➜  sparseml git:(autogptq-compressed-tensors) 

Original Checkpoint:

➜  sparseml git:(autogptq-compressed-tensors) tree "/network/rahul/tinyllama_1b_test_w4a16"
/network/rahul/tinyllama_1b_test_w4a16
|-- config.json
|-- model.safetensors
|-- quantize_config.json
|-- recipe.yaml
|-- special_tokens_map.json
|-- tokenizer.json
|-- tokenizer.model
`-- tokenizer_config.json

0 directories, 8 files

New Checkpoint:

➜  sparseml git:(autogptq-compressed-tensors) tree "local/models/compressed_tensor_equi"
local/models/compressed_tensor_equi
|-- config.json
|-- model.safetensors
|-- quantize_config.json
|-- recipe.yaml
|-- special_tokens_map.json
|-- tokenizer.json
|-- tokenizer.model
`-- tokenizer_config.json

0 directories, 8 files
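
For reference on what the TRANSFORM_AUTOGPTQ_WEIGHTS_AND_RESHAPE_TENSORS step in the log above involves, here is a minimal sketch of unpacking AutoGPTQ's 4-bit qweight packing. It assumes the standard layout where each int32 packs eight 4-bit values along the input dimension; it is illustrative only and not the code used in this PR:

import torch


def unpack_autogptq_qweight_4bit(qweight: torch.Tensor) -> torch.Tensor:
    # qweight: int32 tensor of shape (in_features // 8, out_features), each int32
    # packing eight 4-bit weights in sequential nibble order (an assumption of
    # this sketch). Returns int32 weights of shape (in_features, out_features).
    shifts = torch.arange(0, 32, 4, device=qweight.device)            # 8 nibbles per int32
    unpacked = (qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & 0xF  # (in//8, 8, out)
    return unpacked.reshape(-1, qweight.shape[-1]).to(torch.int32)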

@rahul-tuli rahul-tuli force-pushed the autogptq-compressed-tensors branch from 31ef166 to 643598e Compare June 5, 2024 15:46
converting autogptq checkpoint to be loadable in sparseml, credits to @dbogunowicz for a dimension mismatch bugfix
Stitch converter with init files and bugfixes
@rahul-tuli rahul-tuli force-pushed the autogptq-compressed-tensors branch from 643598e to adc42f2 Compare June 7, 2024 14:29
@rahul-tuli rahul-tuli changed the title from "[WIP] AutoGptq state dict converter" to "AutoGPTQ state dict converter" Jun 7, 2024
@rahul-tuli rahul-tuli marked this pull request as ready for review June 7, 2024 14:38
@rahul-tuli rahul-tuli self-assigned this Jun 7, 2024
@dbogunowicz (Contributor) left a comment

lgtm

could we:

  1. turn local_test() into a unittest
  2. also make sure that the pre-translation and post-translation tests give the same outputs? just to make sure that the translation from one model to the other is not lossy in any way.
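
A minimal sketch of what such an equivalence check could look like (illustrative only; assumes auto-gptq is installed and both checkpoints fit on a single GPU, and the tolerance would need tuning since the fake-quant path and the exllama kernels may not be bit-identical):

import torch
from transformers import AutoTokenizer

from auto_gptq import AutoGPTQForCausalLM
from sparseml.transformers import SparseAutoModelForCausalLM

ORIGINAL = "/network/rahul/tinyllama_1b_test_w4a16"   # pre-translation checkpoint
CONVERTED = "local/models/compressed_tensor_equi"     # post-translation checkpoint


def test_conversion_is_lossless():
    tokenizer = AutoTokenizer.from_pretrained(ORIGINAL)
    inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda:0")

    original = AutoGPTQForCausalLM.from_quantized(ORIGINAL, device="cuda:0")
    converted = SparseAutoModelForCausalLM.from_pretrained(
        CONVERTED, torch_dtype="auto"
    ).to("cuda:0")

    with torch.no_grad():
        logits_a = original(**inputs).logits
        logits_b = converted(**inputs).logits

    # small tolerance for dtype / kernel differences between the two paths
    assert torch.allclose(logits_a, logits_b, atol=1e-3)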

@bfineran (Contributor) left a comment

let's run some evals against the produced models
and before we land, let's rewrite translate so we process layer by layer to save memory
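
One way the layer-by-layer rewrite could look (a minimal sketch under the assumption of a single model.safetensors file; transform_tensor is a hypothetical placeholder for the converter's per-tensor transformations, not an existing function):

import torch
from safetensors import safe_open
from safetensors.torch import save_file


def transform_tensor(name: str, tensor: torch.Tensor):
    # Hypothetical placeholder: the real converter would rename / repack here
    return name, tensor


def translate_streaming(src_path: str, dst_path: str) -> None:
    # Stream tensors one at a time instead of materializing the full source
    # state dict; only the tensor currently being converted is held alongside
    # the already-converted results.
    converted = {}
    with safe_open(src_path, framework="pt", device="cpu") as f:
        for name in f.keys():
            tensor = f.get_tensor(name)  # load just this tensor
            new_name, new_tensor = transform_tensor(name, tensor)
            converted[new_name] = new_tensor
            del tensor
    save_file(converted, dst_path)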

@rahul-tuli (Member, Author) commented:

Functionality moved to compressed tensors: neuralmagic/compressed-tensors#82

@rahul-tuli rahul-tuli closed this Jun 13, 2024