Supporting inference with EETQ quantized model #391

thincal · 2024-04-05T14:23:12Z

Feature request

EETQ-quantized models give very good quality in my case, but loading is quite slow. If the base model is already quantized with EETQ, LoRAX should load it directly without JIT quantization. Currently, loading such a model fails because the related layers cannot be found.
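To illustrate the requested behavior, here is a minimal sketch of how a loader could decide, per layer, whether to load weights directly or fall back to JIT quantization. It is not LoRAX or EETQ code. The `.weight_scales` suffix, the `int8` dtype check, and the function names `is_prequantized` and `plan_loading` are all assumptions made for illustration; the real tensor names depend on how the checkpoint was exported.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical suffix for the per-layer scale tensor stored next to the
# quantized weight. The real name depends on the checkpoint exporter.
SCALE_SUFFIX = ".weight_scales"


def is_prequantized(tensor_dtypes: Mapping[str, str], prefix: str) -> bool:
    """Return True if the layer at `prefix` already looks quantized.

    Assumption: a pre-quantized layer stores an int8 weight together with
    a scale tensor, while an unquantized layer stores only a float weight.
    """
    weight_dtype = tensor_dtypes.get(f"{prefix}.weight")
    has_scale = f"{prefix}{SCALE_SUFFIX}" in tensor_dtypes
    return weight_dtype == "int8" and has_scale


def plan_loading(
    tensor_dtypes: Mapping[str, str], prefixes: Iterable[str]
) -> Dict[str, str]:
    """Map each layer prefix to 'load_direct' or 'jit_quantize'."""
    return {
        prefix: "load_direct" if is_prequantized(tensor_dtypes, prefix) else "jit_quantize"
        for prefix in prefixes
    }


# Example checkpoint index: tensor name -> dtype string (made-up names).
example_dtypes = {
    "q_proj.weight": "int8",
    "q_proj.weight_scales": "float16",
    "k_proj.weight": "float16",
}
example_plan = plan_loading(example_dtypes, ["q_proj", "k_proj"])
```

In this sketch, `q_proj` would be loaded directly and `k_proj` would still go through JIT quantization, so checkpoints that are only partly quantized are handled layer by layer.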

Motivation

Speed up the loading of EETQ-quantized models.

Your contribution

I will prepare a PR for review. I will also need some help with parts of the implementation.

thincal linked a pull request Apr 5, 2024 that will close this issue

feat: support loading eetq quantized model #393

Draft

3 tasks

tgaddair added the enhancement (New feature or request) label May 23, 2024
