Can Megatron-LLaMA support INT4-quantized models? #46

Jeff123z · 2023-10-17T12:30:11Z

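I only have two NVIDIA 3090 cards (installed in two machines on the same network). I have obtained the LLaMA2 70B GPTQ INT4-quantized model files (about 35 GB) and would like to convert them to Megatron format first. Is that possible? Here is what I tried: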

python ./tools/checkpoint_conversion/llama_checkpoint_conversion.py --load_path "path1" --save_path "output_path2" --target_tensor_model_parallel_size 2 --target_pipeline_model_parallel_size 1 --target_data_parallel_size 1 --make_vocab_size_divisible_by 1 --print-checkpoint-structure --megatron-path "./Megatron_LLaMA"

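After the conversion, I looked inside the output: each model_optim_rng.pt is only 2 GB, so the .pt files in the two directories add up to just 4 GB, far less than 35 GB. By contrast, the original unquantized LLaMA2 7B HF checkpoint (pytorch_model.bin format) is about 13 GB, and after converting it to Megatron format the .pt files in the two directories also total about 13 GB, which looks normal.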
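The size comparison above can be checked with a rough back-of-the-envelope calculation. The sketch below is only an estimate: it assumes nominal parameter counts (7e9 and 70e9), decimal gigabytes, and ignores the extra scales and zero-points that GPTQ files store. The helper names `expected_size_gb` and `looks_truncated` are hypothetical and not part of Megatron-LLaMA.

```python
def expected_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight size in decimal GB for a given precision."""
    return num_params * bits_per_param / 8 / 1e9


def looks_truncated(actual_gb: float, expected_gb: float, tolerance: float = 0.5) -> bool:
    """Flag a converted checkpoint that is far smaller than expected.

    `tolerance` is an arbitrary threshold chosen for illustration.
    """
    return actual_gb < expected_gb * tolerance


# Nominal parameter counts; real models differ slightly.
int4_70b_gb = expected_size_gb(70e9, 4)   # INT4: half a byte per weight
fp16_7b_gb = expected_size_gb(7e9, 16)    # FP16: two bytes per weight

print(f"70B INT4 ~ {int4_70b_gb:.1f} GB, 7B FP16 ~ {fp16_7b_gb:.1f} GB")
print("70B output suspicious:", looks_truncated(4.0, int4_70b_gb))
print("7B output suspicious:", looks_truncated(13.0, fp16_7b_gb))
```

Under these assumptions, a 4 GB result for the 70B INT4 checkpoint is well below the expected size, while the 13 GB result for the 7B checkpoint is in the expected range.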

li-yi-dong · 2023-10-18T02:06:57Z

GQA for LLaMA2 70B is not supported yet; it is under development.
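For context, here is a minimal sketch of how GQA (grouped-query attention) changes the attention projection shapes. It uses the published LLaMA2 configuration values (70B: hidden size 8192, 64 attention heads, 8 key/value heads; 7B: hidden size 4096, 32 heads, 32 key/value heads). The function `qkv_out_features` is a hypothetical helper for illustration, not part of the conversion script, and the sketch does not claim to explain the 4 GB output specifically.

```python
def qkv_out_features(hidden_size: int, num_heads: int, num_kv_heads: int) -> dict:
    """Output widths of the Q, K and V projections for one attention layer."""
    head_dim = hidden_size // num_heads
    return {
        "q": num_heads * head_dim,
        "k": num_kv_heads * head_dim,  # narrower than Q when num_kv_heads < num_heads
        "v": num_kv_heads * head_dim,
    }


# LLaMA2 7B uses standard multi-head attention: Q, K, V all have the same width.
print(qkv_out_features(4096, 32, 32))

# LLaMA2 70B uses GQA: K and V are 8x narrower than Q.
print(qkv_out_features(8192, 64, 8))
```

A converter that assumes equal Q, K and V widths would therefore need changes before it can handle the 70B layout.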
