
Support for model in nemo format #54

Open
Minhhnh opened this issue Oct 1, 2024 · 7 comments

Minhhnh commented Oct 1, 2024

Hi,

I am currently working on an LLM project and have fine-tuned a model (e.g., Llama 3) using NVIDIA NeMo, resulting in a model in .nemo format. I can deploy it by exporting to TensorRT-LLM, but the current version of this repository does not yet support that workflow. I believe there's an opportunity to extend the project to support that backend.

I find this project fascinating and would love to contribute by adding compatibility for serving .nemo-format models through the API. If possible, I would be happy to discuss how I can contribute to this effort.

Looking forward to your thoughts.

npuichigo (Owner) commented

Thank you. Your contribution is very welcome.

Can you explain why the current version has limited support for this case? Is it because the parameters are fixed here? Do you have any insight into how to improve that?

Minhhnh (Author) commented Oct 2, 2024

The issue is that the parameters are fixed, which makes the requests incompatible with a model exported to TensorRT-LLM and deployed on Triton Server via NeMo.
You can refer to the details here: Export and Deploy a LLM Model.
Fixing this only requires minimal effort, mainly renaming the parameters.

npuichigo (Owner) commented

So what is needed is to specify a model_type to choose the parameters? I'd prefer a more elegant way than if-else chains to support that.

Minhhnh (Author) commented Oct 2, 2024

I don't have a specific idea yet. However, when I fine-tune models, I typically pull them from Hugging Face. I've noticed that there is usually a file called config.json (for example: config.json) that already includes the model_type. Do you think implementing something similar could be a good approach?
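For illustration, a minimal sketch of that idea, assuming the server reads a small config file with a `model_type` field and uses it to pick the parameter names. The `BackendKind`/`ModelConfig` names and the serde/serde_json/anyhow dependencies are my assumptions, not existing openai_trtllm code:

```rust
use serde::Deserialize;

// Hypothetical backend selector driven by a config file's `model_type`,
// so the name mapping lives in one place instead of scattered if-else checks.
#[derive(Debug, Clone, Copy, Deserialize)]
#[serde(rename_all = "snake_case")]
enum BackendKind {
    TrtLlm, // default tensorrt_llm parameter names
    Nemo,   // names expected by NeMo's Triton deployment
}

#[derive(Debug, Deserialize)]
struct ModelConfig {
    model_type: BackendKind,
}

impl BackendKind {
    // Name of the prompt field in the Triton request for this backend.
    fn prompt_field(self) -> &'static str {
        match self {
            BackendKind::TrtLlm => "text_input",
            BackendKind::Nemo => "prompts",
        }
    }
}

fn load_config(path: &str) -> anyhow::Result<ModelConfig> {
    let raw = std::fs::read_to_string(path)?;
    Ok(serde_json::from_str(&raw)?)
}
```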

Minhhnh (Author) commented Oct 2, 2024

That's not exactly the issue I was referring to. The parameters I'm talking about are in the file chat.rs. When deploying NeMo LLMs on a Triton Server, the request format from openai_trtllm differs from the input format Triton expects. To make them compatible, we need to rename the parameters, for example changing "text_input" to "prompts" and "text_output" to "outputs", etc.
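As a rough sketch of what the renaming looks like, assuming the payload is built with serde (the struct names here are illustrative, not the actual types in chat.rs):

```rust
use serde::Serialize;

// Request body shaped for the default tensorrt_llm backend.
#[derive(Serialize)]
struct TrtLlmRequest<'a> {
    text_input: &'a str,
}

// The same request, serialized with the field name NeMo's Triton
// deployment expects ("prompts" instead of "text_input").
#[derive(Serialize)]
struct NemoRequest<'a> {
    #[serde(rename = "prompts")]
    text_input: &'a str,
}
```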

npuichigo (Owner) commented

So what about other parameters? Does NeMo support most of them?

Minhhnh (Author) commented Oct 3, 2024

Here are the changes I made (screenshot of the parameter mapping omitted). I couldn't find equivalent parameters for presence_penalty and beam_width.
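As a sketch, the renames amount to something like this. It covers only the pairs mentioned in this thread, and the helper name is hypothetical:

```rust
// Maps a tensorrt_llm-style field name to the NeMo deployment's name.
// Returns None when no equivalent is known.
fn nemo_field_name(trtllm_name: &str) -> Option<&'static str> {
    match trtllm_name {
        "text_input" => Some("prompts"),
        "text_output" => Some("outputs"),
        // No equivalents found in the NeMo deployment for these:
        "presence_penalty" | "beam_width" => None,
        _ => None,
    }
}
```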

For streaming, there's a parameter called --enable_streaming available when deploying with NeMo. You can find more information here.

Hope this helps!
