
[Feature Request]: Use custom models #23

Closed
Neet-Nestor opened this issue May 31, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@Neet-Nestor
Collaborator

Neet-Nestor commented May 31, 2024

Problem Description

mlc-ai/web-llm#421

Users want to be able to upload their own models from their local machines.

Solution Description

The WebLLM engine is capable of loading any MLC-format model.

https://github.com/mlc-ai/web-llm/tree/main/examples/simple-chat-upload is an example of supporting local models in an app.

We want to do something similar to allow uploading.
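
For context, here is a minimal sketch of what loading a custom MLC-format model through the WebLLM npm package could look like. This is not the app's implementation; the Hugging Face URL, wasm URL, and model id are placeholders, and the exact `model_list` field names may vary across web-llm versions.

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Placeholder custom model entry; field names follow recent web-llm releases.
const appConfig = {
  model_list: [
    {
      model: "https://huggingface.co/your-org/your-model-MLC",       // URL to MLC weights
      model_id: "YourModel-q4f16_1-MLC",                             // id you pick
      model_lib: "https://your-host.example/your-model-webgpu.wasm", // compiled model library
    },
  ],
};

async function main() {
  // Downloads (and caches) the weights and wasm, then runs one chat turn.
  const engine = await CreateMLCEngine("YourModel-q4f16_1-MLC", { appConfig });
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello!" }],
  });
  console.log(reply.choices[0]?.message.content);
}

main();
```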

@0wwafa

0wwafa commented Jun 4, 2024

Hmm, no... in the example there is a list of models.
I wish to be able to upload a model from a local directory without downloading it from the internet.

@0wwafa

0wwafa commented Jun 4, 2024

For example, let's say I wish to have Mistral Instruct v0.3 quantized as f16 (output and embed) and q6_k for the other tensors. How should I proceed?

@Neet-Nestor
Collaborator Author

@0wwafa I understand the need here. Let me explain.

First, the prerequisite for custom models to run on WebLLM Chat is that the models must be compiled to MLC format. For more details, check the MLC LLM instructions here.

Once you have the MLC-format model on your local machine, the proposal here is to allow one of the following three ways to use it in the webapp:

  1. You select the weight files and wasm file on your local machine, then the webapp loads the files and uses them for inference;
  2. You upload the weight files and wasm file to Hugging Face, then input the URL into the webapp. The webapp will download the files from Hugging Face and use them for inference;
  3. You host your model on a local port using the mlc-llm CLI, then the webapp connects to that port and uses the model for inference (see the sketch below).

These are planned to be released in the coming months. Does any of these fulfill what you need?
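
For a sense of what option 3 could look like from the client side, here is a hedged sketch. It assumes a model is already hosted locally with something like `mlc_llm serve <your-MLC-model> --port 8000` and that the server exposes the usual OpenAI-compatible chat completions endpoint; the base URL and model name are placeholders.

```ts
// Sketch of option 3: talk to a locally hosted MLC-LLM server.
// Assumes something like `mlc_llm serve <your-MLC-model> --port 8000`
// is already running and exposes an OpenAI-compatible API.
// The base URL and model name below are placeholders.

async function chatWithLocalModel(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:8000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "your-local-mlc-model",                 // placeholder model id
      messages: [{ role: "user", content: prompt }],
      stream: false,
    }),
  });
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}

chatWithLocalModel("Hello!").then(console.log).catch(console.error);
```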

@0wwafa

0wwafa commented Jun 5, 2024

Well, I just wish to see how Mistral works in the web browser using one of my quantizations, specifically:
f16/q6, f16/q5, q8/q6, and q8/q5.
https://huggingface.co/ZeroWw/Test

In other words, I quantized the output and embed tensors to f16 (or q8) and the other tensors to q6 or q5.
This keeps the "understanding" and "expressing" parts at an almost lossless quantization (f16) while quantizing the other tensors in a reasonable way.
The results in my tests confirm that the model quantized this way is less degraded and works almost like the original.
I could not see any difference during interactive inference...

Neet-Nestor changed the title from "[Feature Request]: Upload local models" to "[Feature Request]: Use local models" on Jun 24, 2024
Neet-Nestor changed the title from "[Feature Request]: Use local models" to "[Feature Request]: Use custom models" on Jun 24, 2024
@Neet-Nestor
Collaborator Author

The app has been updated to support custom models through the MLC-LLM REST API by switching the model type in Settings.

2fb025c
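
For anyone trying this out: before switching the model type in Settings, a quick sanity check is to confirm the local MLC-LLM server is reachable at all, for example by listing its models. This sketch assumes the server follows the OpenAI-style /v1/models convention, and the port is a placeholder for whatever you passed to mlc_llm serve.

```ts
// Sanity-check sketch: confirm a locally running `mlc_llm serve` instance
// is reachable before pointing WebLLM Chat at it. Assumes the server follows
// the OpenAI-style /v1/models convention; adjust the base URL to your setup.

async function listLocalModels(baseUrl = "http://localhost:8000"): Promise<void> {
  const res = await fetch(`${baseUrl}/v1/models`);
  if (!res.ok) {
    throw new Error(`Server responded with ${res.status} ${res.statusText}`);
  }
  console.log("Models served locally:", await res.json());
}

listLocalModels().catch((err) =>
  console.error("Local MLC-LLM server not reachable:", err),
);
```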

@Neet-Nestor
Collaborator Author

Neet-Nestor commented Jun 24, 2024

@0wwafa Could you let me know whether the update above fulfills your use case by hosting your models with the mlc_llm serve command of MLC-LLM?

@0wwafa

0wwafa commented Jun 26, 2024

My models are available here. I still don't understand how to use them with mlc_llm.
