
[BFCL] Add dynamic max_token handling for locally hosted models #693

Open
wants to merge 2 commits into base: main
Conversation

yutongxie58

This PR adds a get_max_tokens function that dynamically sets the max_tokens limit based on the model name. It supports several model families (e.g., Llama, GLM, Phi, Hermes), with the logic based on each family's typical token limit. Tested locally with various model names.
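A minimal sketch of what such a mapping might look like; the family names and limit values here are illustrative assumptions, not necessarily the values used in the PR:

```python
# Illustrative sketch only: the model families and limits below are
# assumptions, not necessarily the values used in this PR.
def get_max_tokens(model_name: str) -> int:
    """Return a max_tokens limit based on the model family in the name."""
    name = model_name.lower()
    family_limits = {
        "llama": 8192,   # assumed typical limit
        "glm": 8192,     # assumed typical limit
        "phi": 4096,     # assumed typical limit
        "hermes": 8192,  # assumed typical limit
    }
    for family, limit in family_limits.items():
        if family in name:
            return limit
    return 4096  # conservative default for unrecognized models
```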

Collaborator

@HuanzhiMao left a comment


Hey @yutongxie58,

Thanks for the PR and welcome!

According to OpenAI's spec, max_tokens for the chat.completion endpoint is the maximum number of tokens that can be generated; it does not include the input token count. For example, if a model has a context length of 4096, our input message takes 1000 tokens, and you set max_tokens to 4096, the request will error because the total number of tokens (1000 for input plus 4096 requested for output) exceeds the model's context window. So what we want to do is: before calling the chat.completion endpoint, use the model's tokenizer (each model has its own tokenizer) to figure out how many tokens the input message formatted_prompt has used, subtract that amount from the model's maximum context length, and supply that value as the max_tokens argument, so that we never hit the maximum-length-exceeded error. In short, we want to allow the model to generate as much as possible, up to the limit of its context length. A rough sketch of this is below.
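A minimal sketch of that flow, assuming a Hugging Face AutoTokenizer; compute_max_tokens, model_path, and context_length are illustrative placeholders rather than existing BFCL names:

```python
from transformers import AutoTokenizer

def compute_max_tokens(model_path: str, formatted_prompt: str, context_length: int) -> int:
    """Count the prompt's tokens with the model's own tokenizer and return
    however much of the context window is left for generation."""
    # Each model ships its own tokenizer, so load it from the model path.
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    input_token_count = len(tokenizer.encode(formatted_prompt))
    # Whatever the prompt does not use can be requested as output tokens.
    return max(context_length - input_token_count, 1)
```

The returned value would then be passed as the max_tokens argument when calling the chat.completion endpoint.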

Let me know if this makes sense.

@HuanzhiMao added the BFCL-General (General BFCL Issue) label on Oct 14, 2024
@HuanzhiMao changed the title from "Add dynamic max_token handling for locally hosted models" to "[BFCL] Add dynamic max_token handling for locally hosted models" on Oct 14, 2024