[Model Request] Nemotron architecture #2901

Open
dusty-nv opened this issue Sep 13, 2024 · 1 comment
@dusty-nv

⚙️ Request New Models

Additional context

This request is to add MLC support for the NVIDIA Nemotron architecture. The 4B Minitron SLM is a good target for edge deployment, and the NeMo team will continue training it. I am happy to help with the porting and verification efforts, but I lack expertise in the current MLC/TVM model builder. Support has already been added to HF Transformers and llama.cpp, which can serve as references. Hoping for those sweet performance gains from MLC q4f16_ft quantization next! 😀

@wuxianliang

Nvidia's Llama-3.1-Nemotron-70B-Instruct model is strong and has been released. It requires four 40 GB GPUs or two 80 GB GPUs. I think this is exactly the kind of case where MLC-LLM can do some good. I would suggest that the team blog about the progress of making Nemotron available on MLC-LLM and the difficulties encountered along the way. The community would learn a lot about MLC-LLM from that. I think it is better than just requesting a result.

