⚙️ Request New Models
Nemotron-Mini-4B-Instruct
Nemotron-4

Is this model architecture supported by MLC-LLM? (see the list of supported models): No

Additional context
This request is to add support for the NVIDIA Nemotron architecture to MLC-LLM. The 4B Minitron SLM is a good target for edge deployment, and the NeMo team will continue training it. I am happy to help with the porting/verification effort, but I lack expertise in the current MLC/TVM model builder. Support has already been added to HF Transformers and llama.cpp, which can serve as references. Hoping for those sweet performance gains from MLC q4f16_ft quantization next! 😀

Nvidia's Llama-3.1-Nemotron-70B-Instruct model is strong and has been released. It requires four 40 GB GPUs or two 80 GB GPUs, so I think this is exactly the kind of case where MLC-LLM can add real value. I would suggest the team blog about the progress and difficulties of making Nemotron available on MLC-LLM; the community would learn a lot about MLC-LLM from that. I think it would be better than just requesting a result.
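The GPU figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes fp16 weights take 2 bytes per parameter and a 4-bit scheme such as q4f16 takes roughly 0.5 bytes per parameter; it ignores quantization-scale overhead, KV cache, and activation memory, so real requirements are somewhat higher.

```python
def weight_memory_gb(num_params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return num_params_billions * 1e9 * bytes_per_param / 1e9

# 70B model in fp16: ~140 GB of weights alone, consistent with
# needing four 40 GB GPUs or two 80 GB GPUs.
fp16_70b = weight_memory_gb(70, 2.0)

# The same model under rough 4-bit quantization: ~35 GB of weights.
q4_70b = weight_memory_gb(70, 0.5)

# A 4B model under 4-bit quantization: ~2 GB of weights, which is
# why it is an attractive target for edge deployment.
q4_4b = weight_memory_gb(4, 0.5)

print(f"70B fp16 weights:  ~{fp16_70b:.0f} GB")
print(f"70B 4-bit weights: ~{q4_70b:.0f} GB")
print(f"4B 4-bit weights:  ~{q4_4b:.0f} GB")
```

These are weight-only estimates; serving also needs room for the KV cache, which grows with context length and batch size.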