To run the example scripts in this folder, you must first install `auto_gptq` as described in the main README.
## Quantization

Commands in this chapter should be run under the `quantization` folder.
To execute `basic_usage.py`, use a command like this:

```shell
python basic_usage.py
```
This script also showcases how to download/upload a quantized model from/to the 🤗 Hub; to enable those features, uncomment the relevant commented code.
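For orientation, here is a minimal sketch of the quantize → save → reload flow that `basic_usage.py` demonstrates; the calibration text, directory names, and the commented `huggingface_hub` upload are illustrative assumptions rather than the script's exact code:

```python
from huggingface_hub import HfApi
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "facebook/opt-125m"
quantized_model_dir = "opt-125m-4bit"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
# a tiny calibration set; real runs should use more (and more varied) examples
examples = [tokenizer("auto-gptq is an easy-to-use model quantization library.")]

quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize weights to 4-bit
    group_size=128,  # share quantization parameters per 128 weight columns
)

# load the fp16 model, quantize it with the calibration examples, save it
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)

# reload the quantized model onto a GPU for inference
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")

# optional Hub upload (assumes you ran `huggingface-cli login` and the
# repo id below is a placeholder for one you own)
# HfApi().upload_folder(folder_path=quantized_model_dir, repo_id="your-username/opt-125m-4bit")
```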
To execute `basic_usage_with_wikitext2.py`, use a command like this:

```shell
python basic_usage_with_wikitext2.py
```
Note: there is about a 0.6 perplexity (ppl) degradation on the opt-125m model when using AutoGPTQ, compared to GPTQ-for-LLaMa.
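For context, wikitext2 perplexity is typically computed by sliding a fixed-size window over the concatenated test text and exponentiating the mean token-level negative log-likelihood. A hedged sketch of that computation (the function name and window size are illustrative, not the script's exact code):

```python
import torch

@torch.no_grad()
def wikitext2_ppl(model, tokenizer, text, seq_len=2048, device="cuda:0"):
    """Perplexity via non-overlapping windows over one long token stream."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    total_nll = 0.0
    for i in range(0, ids.size(1) - 1, seq_len):
        chunk = ids[:, i : i + seq_len + 1]  # inputs plus next-token targets
        logits = model(chunk[:, :-1]).logits
        total_nll += torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            chunk[:, 1:].reshape(-1),
            reduction="sum",
        ).item()
    # average NLL over all predicted tokens, then exponentiate
    return torch.exp(torch.tensor(total_nll / (ids.size(1) - 1))).item()
```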
To execute `quant_with_alpaca.py`, use a command like this:

```shell
python quant_with_alpaca.py --pretrained_model_dir "facebook/opt-125m" --per_gpu_max_memory 4 --quant_batch_size 16
```
Use the `--help` flag to see detailed descriptions of the other command-line arguments.
The alpaca dataset used here is a cleaned version provided by gururise in AlpacaDataCleaned.
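Each record is rendered into an instruction-following prompt before tokenization for calibration. A hedged sketch of that formatting, using the standard Stanford alpaca template (the exact wording in `quant_with_alpaca.py` may differ):

```python
def make_prompt(example: dict) -> str:
    """Turn one AlpacaDataCleaned record into a calibration prompt."""
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            f"completes the request.\n\n### Instruction:\n{example['instruction']}"
            f"\n\n### Input:\n{example['input']}\n\n### Response:\n{example['output']}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        f"appropriately completes the request.\n\n### Instruction:\n"
        f"{example['instruction']}\n\n### Response:\n{example['output']}"
    )
```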
## Evaluation

Commands in this chapter should be run under the `evaluation` folder.
The `run_language_modeling_task.py` script gives an example of using `LanguageModelingTask` to evaluate a model's performance on a language modeling task, before and after quantization, using the `tatsu-lab/alpaca` dataset. To execute it, use a command like this:

```shell
CUDA_VISIBLE_DEVICES=0 python run_language_modeling_task.py --base_model_dir PATH/TO/BASE/MODEL/DIR --quantized_model_dir PATH/TO/QUANTIZED/MODEL/DIR
```
Use the `--help` flag to see detailed descriptions of the other command-line arguments.
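The core of the script is a before/after comparison: evaluate the fp16 base model, free it, then run the same task object against the quantized model. A condensed sketch, under the assumption that the `LanguageModelingTask` constructor accepts the data and column arguments shown (the real script also wires in custom data loading and preprocessing functions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM
from auto_gptq.eval_tasks import LanguageModelingTask

device = "cuda:0"
tokenizer = AutoTokenizer.from_pretrained("PATH/TO/BASE/MODEL/DIR")

# evaluate the fp16 base model first
base_model = AutoModelForCausalLM.from_pretrained("PATH/TO/BASE/MODEL/DIR").half().to(device)
task = LanguageModelingTask(
    model=base_model,
    tokenizer=tokenizer,
    data_name_or_path="tatsu-lab/alpaca",  # column names below are assumptions
    prompt_col_name="prompt",
    label_col_name="output",
)
print("base:", task.run())

# release the base model, then evaluate the quantized one with the same task
task.model = None
del base_model
torch.cuda.empty_cache()
quantized_model = AutoGPTQForCausalLM.from_quantized("PATH/TO/QUANTIZED/MODEL/DIR", device=device)
task.model = quantized_model
print("quantized:", task.run())
```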
The `run_sequence_classification_task.py` script gives an example of using `SequenceClassificationTask` to evaluate a model's performance on a sequence classification task, before and after quantization, using the `cardiffnlp/tweet_sentiment_multilingual` dataset. To execute it, use a command like this:

```shell
CUDA_VISIBLE_DEVICES=0 python run_sequence_classification_task.py --base_model_dir PATH/TO/BASE/MODEL/DIR --quantized_model_dir PATH/TO/QUANTIZED/MODEL/DIR
```
Use the `--help` flag to see detailed descriptions of the other command-line arguments.
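To see what the task consumes, you can peek at the dataset directly. A small sketch; the `"english"` config name and the class-label layout are assumptions about this dataset, not something the script guarantees:

```python
from datasets import load_dataset

ds = load_dataset("cardiffnlp/tweet_sentiment_multilingual", "english", split="test")
print(ds[0])                       # a tweet plus an integer sentiment label
print(ds.features["label"].names)  # e.g. ['negative', 'neutral', 'positive']
```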
The `run_text_summarization_task.py` script gives an example of using `TextSummarizationTask` to evaluate a model's performance on a text summarization task, before and after quantization, using the `samsum` dataset. To execute it, use a command like this:

```shell
CUDA_VISIBLE_DEVICES=0 python run_text_summarization_task.py --base_model_dir PATH/TO/BASE/MODEL/DIR --quantized_model_dir PATH/TO/QUANTIZED/MODEL/DIR
```
Use the `--help` flag to see detailed descriptions of the other command-line arguments.
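The `samsum` dataset pairs a chat-style dialogue with a human-written summary. A quick sketch to inspect it (note that samsum's loading script may additionally require the `py7zr` package):

```python
from datasets import load_dataset

ds = load_dataset("samsum", split="test")  # may require: pip install py7zr
sample = ds[0]
print(sample["dialogue"])  # the conversation to be summarized
print(sample["summary"])   # the reference summary
```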
## Benchmark

Commands in this chapter should be run under the `benchmark` folder.
The `generation_speed.py` script gives an example of how to benchmark the generation speed of the pretrained and quantized models that `auto_gptq` supports; it reports model generation speed as a tokens/s metric. To execute it, use a command like this:

```shell
CUDA_VISIBLE_DEVICES=0 python generation_speed.py --model_name_or_path PATH/TO/MODEL/DIR
```
Use the `--help` flag to see detailed descriptions of the other command-line arguments.
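For a quick sanity check outside the script, a tokens/s measurement can be as simple as timing `generate`. A hedged sketch (the model path and prompt are placeholders, and the script itself measures far more carefully):

```python
import time

import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "PATH/TO/QUANTIZED/MODEL/DIR"
model = AutoGPTQForCausalLM.from_quantized(model_dir, device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained(model_dir)

inputs = tokenizer("The benchmark begins:", return_tensors="pt").to("cuda:0")
torch.cuda.synchronize()
start = time.time()
output = model.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()
elapsed = time.time() - start

# count only newly generated tokens, not the prompt
new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.2f} tokens/s")
```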