⚙️ Request New Models
Nemotron-Mini-4B-Instruct
Nemotron-4

Is this model architecture supported by MLC-LLM? (see the list of supported models): No

Additional context
This request is to add support for the NVIDIA Nemotron architecture to MLC-LLM. The 4B Minitron SLM is a good target for edge deployment, and the NeMo team will continue training it. I am happy to help with the porting/verification effort, but I lack expertise in the current MLC/TVM model builder. Support has already been added to HF Transformers and llama.cpp, which can serve as references. Hoping for those sweet performance gains from MLC q4f16_ft quantization next! 😀

Nvidia's Llama-3.1-Nemotron-70B-Instruct model is strong and has been released. It requires four 40 GB GPUs or two 80 GB GPUs, so I think this is exactly the kind of case where MLC-LLM can add real value. I would suggest the team blog about the progress and difficulties of making Nemotron available on MLC-LLM; the community would learn a lot about MLC-LLM from that. I think it would be better than just requesting a result.
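The GPU figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes fp16 weights take 2 bytes per parameter and a 4-bit scheme such as q4f16 takes roughly 0.5 bytes per parameter; it ignores quantization-scale overhead, KV cache, and activation memory, so real requirements are somewhat higher.

```python
def weight_memory_gb(num_params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return num_params_billions * 1e9 * bytes_per_param / 1e9

# 70B model in fp16: ~140 GB of weights alone, consistent with
# needing four 40 GB GPUs or two 80 GB GPUs.
fp16_70b = weight_memory_gb(70, 2.0)

# The same model under rough 4-bit quantization: ~35 GB of weights.
q4_70b = weight_memory_gb(70, 0.5)

# A 4B model under 4-bit quantization: ~2 GB of weights, which is
# why it is an attractive target for edge deployment.
q4_4b = weight_memory_gb(4, 0.5)

print(f"70B fp16 weights:  ~{fp16_70b:.0f} GB")
print(f"70B 4-bit weights: ~{q4_70b:.0f} GB")
print(f"4B 4-bit weights:  ~{q4_4b:.0f} GB")
```

These are weight-only estimates; serving also needs room for the KV cache, which grows with context length and batch size.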