GGUF: the file quantization type is not the GGMLQuantizationType. #794

Open · snowyu opened this issue on Jul 11, 2024 · 5 comments

Comments

@snowyu (Contributor) commented Jul 11, 2024

There are two kinds of quantization type in llama.cpp; don't confuse them:

  1. The per-tensor quantization type (ggml_type, i.e. GGMLQuantizationType), stored in each tensor's info.
  2. The file-level quantization type (llama_ftype, stored in the "general.file_type" metadata), whose values are the MOSTLY_* variants.

If "general.file_type" is not configured, llama.cpp uses the following algorithm to guess the file's quantization type (sketched below):

https://github.com/ggerganov/llama.cpp/blob/7a221b672e49dfae459b1af27210ba3f2b5419b6/src/llama.cpp#L3751C1-L3802C65
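A rough TypeScript sketch of that guess, ported to this package's types (guessFileQuantization and the MOSTLY_* string output are illustrative, not existing API; llama.cpp's real version has extra special cases):

```ts
import { GGMLQuantizationType, gguf } from "@huggingface/gguf";

// Mirror llama.cpp's fallback: the file is "mostly" whatever
// per-tensor quantization type the majority of tensors use.
async function guessFileQuantization(url: string): Promise<string> {
	const { tensorInfos } = await gguf(url);

	// Count how many tensors use each per-tensor quantization type.
	const counts = new Map<GGMLQuantizationType, number>();
	for (const t of tensorInfos) {
		counts.set(t.dtype, (counts.get(t.dtype) ?? 0) + 1);
	}

	// Pick the most frequent type as the file-level "mostly" type.
	let best = GGMLQuantizationType.F32;
	let bestCount = 0;
	for (const [dtype, count] of counts) {
		if (count > bestCount) {
			best = dtype;
			bestCount = count;
		}
	}
	// llama.cpp marks the result as "(guessed)" since it is heuristic.
	return `MOSTLY_${GGMLQuantizationType[best]} (guessed)`;
}
```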

@julien-c (Member)

Yes. What is the precise issue in this repo?

@snowyu (Contributor, Author) commented Jul 14, 2024

@julien-c

  1. "general.file_type" is missing a FileQuantizationType enum analogous to GGMLQuantizationType.
  2. The file quantization type should be guessed from the metadata when "general.file_type" is absent.

And some tests incorrectly use GGMLQuantizationType values for "general.file_type":

https://github.com/huggingface/huggingface.js/blame/8d6fe81cd25936f65975d65eade246064ad48f7b/packages/gguf/src/gguf.spec.ts#L137

https://github.com/huggingface/huggingface.js/blame/8d6fe81cd25936f65975d65eade246064ad48f7b/packages/gguf/src/gguf.spec.ts#L174
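The distinction matters because the two enums do not share values. An abridged sketch (values copied from llama.cpp's ggml_type and llama_ftype as I read them; check the upstream headers for the authoritative lists):

```ts
// Per-tensor types (ggml_type) -- abridged; already in @huggingface/gguf.
enum GGMLQuantizationType {
	F32 = 0, F16 = 1, Q4_0 = 2, Q4_1 = 3,
	Q5_0 = 6, Q5_1 = 7, Q8_0 = 8, // ...
}

// File-level types (llama_ftype) -- abridged; what the missing enum would hold.
enum GGMLFileQuantizationType {
	ALL_F32 = 0, MOSTLY_F16 = 1, MOSTLY_Q4_0 = 2, MOSTLY_Q4_1 = 3,
	MOSTLY_Q8_0 = 7, MOSTLY_Q5_0 = 8, MOSTLY_Q5_1 = 9, // ...
}

// The same raw value means different things: 8 decodes to Q8_0 as a
// tensor type but to MOSTLY_Q5_0 as a file type, so reusing
// GGMLQuantizationType for "general.file_type" silently misreports files.
```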

@julien-c (Member)

maybe cc @ngxson (not sure)

@ngxson (Member) commented Jul 16, 2024

Yes, you're correct @snowyu. general.file_type is the quantization scheme (i.e. MOSTLY_*). I will push a fix later (sorry, I'm quite busy at the moment).

@julien-c FYI, it's because quantized models usually use mixed tensor types. For example, norm layers can always stay at F32 or F16, while other tensors are Q*_K types. This improves model performance at a small cost in size. Hence the word "mostly" in the type names.
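The mix is easy to see by listing per-tensor dtypes with this package's parser; a minimal sketch, assuming any quantized GGUF URL (the one below is just an example):

```ts
import { GGMLQuantizationType, gguf } from "@huggingface/gguf";

// Example URL -- any quantized GGUF file will do.
const url = "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf";

const { tensorInfos } = await gguf(url);
for (const t of tensorInfos) {
	// Expect mostly Q4_K (plus some Q6_K), with norm tensors kept at F32 --
	// which is why the file-level label reads "mostly Q4_K".
	console.log(t.name, GGMLQuantizationType[t.dtype]);
}
```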

@mishig25 (Collaborator) commented Jul 16, 2024

@snowyu thanks for your comment!

> "general.file_type" is missing a FileQuantizationType enum analogous to GGMLQuantizationType.

Indeed, we should create an enum GGMLFileQuantizationType that looks similar to GGMLQuantizationType but is slightly different. The values for GGMLFileQuantizationType should come from llama_ftype (as you've suggested in the description).

> The file quantization type should be guessed from the metadata when "general.file_type" is absent.

Since this package is for parsing metadata (not a framework like llama.cpp), we should not guess anything. If the field exists, it exists. Otherwise, it does not, and we do not present anything that is not in the file itself.

> And some tests incorrectly use GGMLQuantizationType values for "general.file_type": here and here

Yep, after we add GGMLFileQuantizationType, we can use it instead of GGMLQuantizationType in those tests (sketched below).

The changes should be pretty straightforward. Please feel free to open a PR and tag me 🤗
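For example, the assertion would change roughly like this (a hypothetical sketch; the exact expected values depend on the fixture files in gguf.spec.ts):

```ts
import { describe, expect, it } from "vitest";
import { gguf } from "@huggingface/gguf";

// The proposed file-level enum; hypothetical here until the PR lands.
enum GGMLFileQuantizationType { ALL_F32 = 0, MOSTLY_F16 = 1, /* ... */ MOSTLY_Q4_K_M = 15 }

describe("gguf", () => {
	it("parses general.file_type as a file-level type", async () => {
		const { metadata } = await gguf("https://example.com/model.Q4_K_M.gguf"); // placeholder URL

		// Before (incorrect): asserting a per-tensor enum for a file-level field, e.g.
		// expect(metadata["general.file_type"]).toEqual(GGMLQuantizationType.Q4_K);

		// After (correct): the dedicated file-level enum (MOSTLY_Q4_K_M = 15 in llama_ftype).
		expect(metadata["general.file_type"]).toEqual(GGMLFileQuantizationType.MOSTLY_Q4_K_M);
	});
});
```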

snowyu added a commit to snowyu/huggingface.js that referenced this issue on Jul 17, 2024.

ngxson added a commit that referenced this issue on Aug 16, 2024: "@mishig25 that's it for #794" (co-authored by Xuan Son Nguyen).