
Support for model in nemo format #54

Open
Minhhnh opened this issue Oct 1, 2024 · 7 comments

Minhhnh commented Oct 1, 2024

Hi,

I am currently working on an LLM project and have fine-tuned a model (e.g., Llama 3) using NVIDIA NeMo, resulting in a model in .nemo format. I can deploy it by exporting to TensorRT-LLM, but the current version of this repository does not yet support that workflow. I believe there's an opportunity to extend the project to support that backend.

I find this project fascinating and would love to contribute by adding compatibility for serving .nemo-format models through the API. If possible, I would be happy to discuss how I can contribute to this effort.

Looking forward to your thoughts.

npuichigo (Owner) commented

Thank you. Your contribution is very welcome.

Can you explain why the current version has limited support for this case? Is it because the parameters are fixed here? Do you have any insight into how to improve that?

Minhhnh (Author) commented Oct 2, 2024

The issue is that the parameters are fixed, which makes the requests incompatible with a model exported to TensorRT-LLM and deployed on Triton Server via NeMo.
You can refer to the details here: Export and Deploy a LLM Model.
Fixing this only requires minimal effort, mainly renaming the parameters.

npuichigo (Owner) commented

So what is needed is to specify a model_type to choose the parameters? I'd prefer a more elegant way than if-else chains to support that.

Minhhnh (Author) commented Oct 2, 2024

I don't have a specific idea yet. However, when I fine-tune models, I typically pull them from Hugging Face. I've noticed that there is usually a file called config.json (for example: config.json) that already includes the model_type. Do you think implementing something similar could be a good approach?
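For illustration, a minimal sketch of that idea, assuming the server reads a small config file with a `model_type` field and uses it to pick the parameter names. The `BackendKind`/`ModelConfig` names and the serde/serde_json/anyhow dependencies are my assumptions, not existing openai_trtllm code:

```rust
use serde::Deserialize;

// Hypothetical backend selector driven by a config file's `model_type`,
// so the name mapping lives in one place instead of scattered if-else checks.
#[derive(Debug, Clone, Copy, Deserialize)]
#[serde(rename_all = "snake_case")]
enum BackendKind {
    TrtLlm, // default tensorrt_llm parameter names
    Nemo,   // names expected by NeMo's Triton deployment
}

#[derive(Debug, Deserialize)]
struct ModelConfig {
    model_type: BackendKind,
}

impl BackendKind {
    // Name of the prompt field in the Triton request for this backend.
    fn prompt_field(self) -> &'static str {
        match self {
            BackendKind::TrtLlm => "text_input",
            BackendKind::Nemo => "prompts",
        }
    }
}

fn load_config(path: &str) -> anyhow::Result<ModelConfig> {
    let raw = std::fs::read_to_string(path)?;
    Ok(serde_json::from_str(&raw)?)
}
```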

Minhhnh (Author) commented Oct 2, 2024

That's not exactly the issue I was referring to. The parameters I'm talking about are in the file chat.rs. When deploying NeMo LLMs on a Triton Server, the request format from openai_trtllm differs from the input format Triton expects. To make them compatible, we need to rename the parameters, for example changing "text_input" to "prompts" and "text_output" to "outputs", etc.
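As a rough sketch of what the renaming looks like, assuming the payload is built with serde (the struct names here are illustrative, not the actual types in chat.rs):

```rust
use serde::Serialize;

// Request body shaped for the default tensorrt_llm backend.
#[derive(Serialize)]
struct TrtLlmRequest<'a> {
    text_input: &'a str,
}

// The same request, serialized with the field name NeMo's Triton
// deployment expects ("prompts" instead of "text_input").
#[derive(Serialize)]
struct NemoRequest<'a> {
    #[serde(rename = "prompts")]
    text_input: &'a str,
}
```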

npuichigo (Owner) commented

So what about other parameters? Does NeMo support most of them?

Minhhnh (Author) commented Oct 3, 2024

Here are the changes I made (screenshot of the parameter mapping omitted). I couldn't find equivalent parameters for presence_penalty and beam_width.
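As a sketch, the renames amount to something like this. It covers only the pairs mentioned in this thread, and the helper name is hypothetical:

```rust
// Maps a tensorrt_llm-style field name to the NeMo deployment's name.
// Returns None when no equivalent is known.
fn nemo_field_name(trtllm_name: &str) -> Option<&'static str> {
    match trtllm_name {
        "text_input" => Some("prompts"),
        "text_output" => Some("outputs"),
        // No equivalents found in the NeMo deployment for these:
        "presence_penalty" | "beam_width" => None,
        _ => None,
    }
}
```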

For streaming, there's a parameter called --enable_streaming available when deploying with NeMo. You can find more information here.

Hope this helps!
