Support for model in nemo format #54
Thank you. Your contribution is very welcome. Can you explain why the current version has limited support for this case? Is it because the parameters are fixed here? Do you have any insight into how to improve that?
The issue is that the parameters are fixed, which makes the model incompatible when exporting it to TensorRT-LLM and deploying it on Triton Server.
So what is needed is to specify
I don't have a specific idea yet. However, when I fine-tune models, I typically pull them from Hugging Face, and I've noticed that there is usually a file called config.json that already includes the model_type. Do you think implementing something similar could be a good approach?
That's not exactly the issue I was referring to. The parameters I'm talking about are in the file chat.rs. When deploying NeMo LLMs on Triton Server, the request format sent by openai_trtllm differs from the input Triton expects. To make them compatible, we need to rename the parameters, for example changing "text_input" to "prompts" and "text_output" to "outputs", and so on.
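A minimal sketch of the renaming described above, assuming the proxy is told which backend it is talking to (for example through a configuration option). The `TritonBackend` enum and the `from_name`, `tensor_name`, and `rename_inputs` functions are hypothetical names used for illustration; they are not part of openai_trtllm's actual chat.rs. Only the two renames named in this thread are included, and every other name passes through unchanged.

```rust
use std::collections::HashMap;

/// Hypothetical: which Triton model layout the proxy is talking to.
/// openai_trtllm does not necessarily expose such an enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TritonBackend {
    TensorRtLlm,
    NeMo,
}

impl TritonBackend {
    /// Parse a backend name, e.g. from a CLI flag or a config file field (assumed).
    pub fn from_name(name: &str) -> Option<Self> {
        match name.to_ascii_lowercase().as_str() {
            "trtllm" | "tensorrt_llm" => Some(Self::TensorRtLlm),
            "nemo" => Some(Self::NeMo),
            _ => None,
        }
    }

    /// Map a TensorRT-LLM backend tensor name to the name this backend expects.
    /// Only the renames mentioned in the thread are listed; any other name
    /// (or any name on the TensorRT-LLM backend) is returned unchanged.
    pub fn tensor_name<'a>(&self, trtllm_name: &'a str) -> &'a str {
        match (self, trtllm_name) {
            (Self::NeMo, "text_input") => "prompts",
            (Self::NeMo, "text_output") => "outputs",
            _ => trtllm_name,
        }
    }
}

/// Rename every (name, value) pair of a request for the chosen backend.
pub fn rename_inputs(
    backend: TritonBackend,
    inputs: Vec<(String, String)>,
) -> HashMap<String, String> {
    inputs
        .into_iter()
        .map(|(name, value)| (backend.tensor_name(&name).to_string(), value))
        .collect()
}
```

Keeping the mapping in a single `match` leaves the default TensorRT-LLM path untouched; further NeMo-specific renames could be added as extra arms once their exact names are confirmed against the NeMo deployment.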
So what about other parameters? Does NeMo support most of them?
For streaming, there's a parameter called --enable_streaming available when deploying with NeMo. You can find more information here. Hope this helps!
Hi,
I am currently working on an LLM project and have fine-tuned a model (e.g., Llama 3) using NVIDIA NeMo, which produced a model in .nemo format. I can deploy it by exporting it to TensorRT-LLM, but the current version of this repository does not yet support that workflow. I believe there's an opportunity to extend the project to support this backend.
I find this project fascinating and would love to contribute by adding compatibility for .nemo models served through an API. If possible, I would be happy to discuss how I can contribute to this effort.
Looking forward to your thoughts.