
[Feature Request]: Use custom models #23

Closed
Neet-Nestor opened this issue May 31, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@Neet-Nestor
Collaborator

Neet-Nestor commented May 31, 2024

Problem Description

mlc-ai/web-llm#421

Users want to be able to upload their own models from their local machines.

Solution Description

The WebLLM engine is capable of loading any MLC-format model.

https://github.com/mlc-ai/web-llm/tree/main/examples/simple-chat-upload is an example of supporting local models in an app.

We want to do something similar to allow uploading.
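
For context, here is a minimal sketch of what loading a custom MLC-format model through the WebLLM npm package could look like. This is not the app's implementation; the Hugging Face URL, wasm URL, and model id are placeholders, and the exact `model_list` field names may vary across web-llm versions.

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Placeholder custom model entry; field names follow recent web-llm releases.
const appConfig = {
  model_list: [
    {
      model: "https://huggingface.co/your-org/your-model-MLC",       // URL to MLC weights
      model_id: "YourModel-q4f16_1-MLC",                             // id you pick
      model_lib: "https://your-host.example/your-model-webgpu.wasm", // compiled model library
    },
  ],
};

async function main() {
  // Downloads (and caches) the weights and wasm, then runs one chat turn.
  const engine = await CreateMLCEngine("YourModel-q4f16_1-MLC", { appConfig });
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello!" }],
  });
  console.log(reply.choices[0]?.message.content);
}

main();
```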

@0wwafa

0wwafa commented Jun 4, 2024

Hmm, no... in the example there is a list of models.
I wish to be able to upload a model from a local directory without downloading it from the internet.

@0wwafa

0wwafa commented Jun 4, 2024

For example, let's say I wish to have Mistral Instruct v0.3 quantized as f16 (output and embed) and q6_k for the other tensors. How should I proceed?

@Neet-Nestor
Collaborator Author

@0wwafa I understand the need here. Let me explain.

First, the prerequisite for custom models to run on WebLLM Chat is that the models must be compiled to MLC format. For more details, check the MLC LLM instructions here.

Once you have the MLC-format model on your local machine, the proposal here is to allow one of the following three ways to use it in the webapp:

  1. You select the weight files and wasm file on your local machine, then the webapp loads the files and uses them for inference;
  2. You upload the weight files and wasm file to Hugging Face, then input the URL into the webapp. The webapp will download the files from Hugging Face and use them for inference;
  3. You host your model on a local port using the mlc-llm CLI, then the webapp connects to that port and uses the model for inference (see the sketch below).

These are planned to be released in the coming months. Does any of these fulfill what you need?
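
For a sense of what option 3 could look like from the client side, here is a hedged sketch. It assumes a model is already hosted locally with something like `mlc_llm serve <your-MLC-model> --port 8000` and that the server exposes the usual OpenAI-compatible chat completions endpoint; the base URL and model name are placeholders.

```ts
// Sketch of option 3: talk to a locally hosted MLC-LLM server.
// Assumes something like `mlc_llm serve <your-MLC-model> --port 8000`
// is already running and exposes an OpenAI-compatible API.
// The base URL and model name below are placeholders.

async function chatWithLocalModel(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:8000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "your-local-mlc-model",                 // placeholder model id
      messages: [{ role: "user", content: prompt }],
      stream: false,
    }),
  });
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}

chatWithLocalModel("Hello!").then(console.log).catch(console.error);
```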

@0wwafa

0wwafa commented Jun 5, 2024

Well, I just wish to see how Mistral works in the web browser using one of my quantizations, specifically:
f16/q6, f16/q5, q8/q6, and q8/q5.
https://huggingface.co/ZeroWw/Test

In other words, I quantized the output and embed tensors to f16 (or q8) and the other tensors to q6 or q5.
This keeps the "understanding" and "expressing" parts at an almost lossless quantization (f16) while quantizing the other tensors in a reasonable way.
The results in my tests confirm that the model quantized this way is less degraded and works almost like the original.
I could not see any difference during interactive inference...

Neet-Nestor changed the title from "[Feature Request]: Upload local models" to "[Feature Request]: Use local models" on Jun 24, 2024
Neet-Nestor changed the title from "[Feature Request]: Use local models" to "[Feature Request]: Use custom models" on Jun 24, 2024
@Neet-Nestor
Collaborator Author

The app has been updated to support custom models through the MLC-LLM REST API by switching the model type in Settings.

2fb025c
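
For anyone trying this out: before switching the model type in Settings, a quick sanity check is to confirm the local MLC-LLM server is reachable at all, for example by listing its models. This sketch assumes the server follows the OpenAI-style /v1/models convention, and the port is a placeholder for whatever you passed to mlc_llm serve.

```ts
// Sanity-check sketch: confirm a locally running `mlc_llm serve` instance
// is reachable before pointing WebLLM Chat at it. Assumes the server follows
// the OpenAI-style /v1/models convention; adjust the base URL to your setup.

async function listLocalModels(baseUrl = "http://localhost:8000"): Promise<void> {
  const res = await fetch(`${baseUrl}/v1/models`);
  if (!res.ok) {
    throw new Error(`Server responded with ${res.status} ${res.statusText}`);
  }
  console.log("Models served locally:", await res.json());
}

listLocalModels().catch((err) =>
  console.error("Local MLC-LLM server not reachable:", err),
);
```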

@Neet-Nestor
Collaborator Author

Neet-Nestor commented Jun 24, 2024

@0wwafa Could you let me know whether the update above fulfills your use case by hosting your models with the mlc_llm serve command of MLC-LLM?

@0wwafa

0wwafa commented Jun 26, 2024

My models are available here. I still don't understand how to use them with mlc_llm.
