GGUF: the file quantization type is not the GGMLQuantizationType. #794

Open · snowyu opened this issue on Jul 11, 2024 · 5 comments

Comments

@snowyu (Contributor) commented Jul 11, 2024

There are two kinds of quantization type in llama.cpp; don't confuse them:

  1. The per-tensor quantization type (ggml_type, i.e. GGMLQuantizationType), stored in each tensor's info.
  2. The file-level quantization type (llama_ftype, stored in the "general.file_type" metadata), whose values are the MOSTLY_* variants.

If "general.file_type" is not configured, llama.cpp uses the following algorithm to guess the file's quantization type (sketched below):

https://github.com/ggerganov/llama.cpp/blob/7a221b672e49dfae459b1af27210ba3f2b5419b6/src/llama.cpp#L3751C1-L3802C65
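A rough TypeScript sketch of that guess, ported to this package's types (guessFileQuantization and the MOSTLY_* string output are illustrative, not existing API; llama.cpp's real version has extra special cases):

```ts
import { GGMLQuantizationType, gguf } from "@huggingface/gguf";

// Mirror llama.cpp's fallback: the file is "mostly" whatever
// per-tensor quantization type the majority of tensors use.
async function guessFileQuantization(url: string): Promise<string> {
	const { tensorInfos } = await gguf(url);

	// Count how many tensors use each per-tensor quantization type.
	const counts = new Map<GGMLQuantizationType, number>();
	for (const t of tensorInfos) {
		counts.set(t.dtype, (counts.get(t.dtype) ?? 0) + 1);
	}

	// Pick the most frequent type as the file-level "mostly" type.
	let best = GGMLQuantizationType.F32;
	let bestCount = 0;
	for (const [dtype, count] of counts) {
		if (count > bestCount) {
			best = dtype;
			bestCount = count;
		}
	}
	// llama.cpp marks the result as "(guessed)" since it is heuristic.
	return `MOSTLY_${GGMLQuantizationType[best]} (guessed)`;
}
```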

@julien-c (Member)

Yes. What is the precise issue in this repo?

@snowyu (Contributor, Author) commented Jul 14, 2024

@julien-c

  1. "general.file_type" is missing a FileQuantizationType enum analogous to GGMLQuantizationType.
  2. The file quantization type should be guessed from the metadata when "general.file_type" is absent.

And some tests incorrectly use GGMLQuantizationType values for "general.file_type":

https://github.com/huggingface/huggingface.js/blame/8d6fe81cd25936f65975d65eade246064ad48f7b/packages/gguf/src/gguf.spec.ts#L137

https://github.com/huggingface/huggingface.js/blame/8d6fe81cd25936f65975d65eade246064ad48f7b/packages/gguf/src/gguf.spec.ts#L174
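The distinction matters because the two enums do not share values. An abridged sketch (values copied from llama.cpp's ggml_type and llama_ftype as I read them; check the upstream headers for the authoritative lists):

```ts
// Per-tensor types (ggml_type) -- abridged; already in @huggingface/gguf.
enum GGMLQuantizationType {
	F32 = 0, F16 = 1, Q4_0 = 2, Q4_1 = 3,
	Q5_0 = 6, Q5_1 = 7, Q8_0 = 8, // ...
}

// File-level types (llama_ftype) -- abridged; what the missing enum would hold.
enum GGMLFileQuantizationType {
	ALL_F32 = 0, MOSTLY_F16 = 1, MOSTLY_Q4_0 = 2, MOSTLY_Q4_1 = 3,
	MOSTLY_Q8_0 = 7, MOSTLY_Q5_0 = 8, MOSTLY_Q5_1 = 9, // ...
}

// The same raw value means different things: 8 decodes to Q8_0 as a
// tensor type but to MOSTLY_Q5_0 as a file type, so reusing
// GGMLQuantizationType for "general.file_type" silently misreports files.
```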

@julien-c (Member)

maybe cc @ngxson (not sure)

@ngxson (Member) commented Jul 16, 2024

Yes, you're correct @snowyu. general.file_type is the quantization scheme (i.e. MOSTLY_*). I will push a fix later (sorry, I'm quite busy at the moment).

@julien-c FYI, it's because quantized models usually use mixed tensor types. For example, norm layers can always stay at F32 or F16, while other tensors are Q*_K types. This improves model performance at a small cost in size. Hence the word "mostly" in the type names.
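The mix is easy to see by listing per-tensor dtypes with this package's parser; a minimal sketch, assuming any quantized GGUF URL (the one below is just an example):

```ts
import { GGMLQuantizationType, gguf } from "@huggingface/gguf";

// Example URL -- any quantized GGUF file will do.
const url = "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf";

const { tensorInfos } = await gguf(url);
for (const t of tensorInfos) {
	// Expect mostly Q4_K (plus some Q6_K), with norm tensors kept at F32 --
	// which is why the file-level label reads "mostly Q4_K".
	console.log(t.name, GGMLQuantizationType[t.dtype]);
}
```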

@mishig25 (Collaborator) commented Jul 16, 2024

@snowyu thanks for your comment!

> "general.file_type" is missing a FileQuantizationType enum analogous to GGMLQuantizationType.

Indeed, we should create an enum GGMLFileQuantizationType that looks similar to GGMLQuantizationType but is slightly different. The values for GGMLFileQuantizationType should come from llama_ftype (as you've suggested in the description).

> The file quantization type should be guessed from the metadata when "general.file_type" is absent.

Since this package is for parsing metadata (not a framework like llama.cpp), we should not guess anything. If the field exists, it exists. Otherwise, it does not, and we do not present anything that is not in the file itself.

> And some tests incorrectly use GGMLQuantizationType values for "general.file_type": here and here

Yep, after we add GGMLFileQuantizationType, we can use it instead of GGMLQuantizationType in those tests (sketched below).

The changes should be pretty straightforward. Please feel free to open a PR and tag me 🤗
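For example, the assertion would change roughly like this (a hypothetical sketch; the exact expected values depend on the fixture files in gguf.spec.ts):

```ts
import { describe, expect, it } from "vitest";
import { gguf } from "@huggingface/gguf";

// The proposed file-level enum; hypothetical here until the PR lands.
enum GGMLFileQuantizationType { ALL_F32 = 0, MOSTLY_F16 = 1, /* ... */ MOSTLY_Q4_K_M = 15 }

describe("gguf", () => {
	it("parses general.file_type as a file-level type", async () => {
		const { metadata } = await gguf("https://example.com/model.Q4_K_M.gguf"); // placeholder URL

		// Before (incorrect): asserting a per-tensor enum for a file-level field, e.g.
		// expect(metadata["general.file_type"]).toEqual(GGMLQuantizationType.Q4_K);

		// After (correct): the dedicated file-level enum (MOSTLY_Q4_K_M = 15 in llama_ftype).
		expect(metadata["general.file_type"]).toEqual(GGMLFileQuantizationType.MOSTLY_Q4_K_M);
	});
});
```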

snowyu added a commit to snowyu/huggingface.js that referenced this issue on Jul 17, 2024.

ngxson added a commit that referenced this issue on Aug 16, 2024: "@mishig25 that's it for #794" (co-authored by Xuan Son Nguyen).