
When ESBMC output is too big, a TPM Rate Limit error occurs. #93

Open
Yiannis128 opened this issue Nov 13, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@Yiannis128
Collaborator

Yiannis128 commented Nov 13, 2023

When ESBMC produces an output (counterexample) that is too big, the resulting prompt exceeds the token limit of the LLM. Since switching to LangChain, we no longer measure the token count or check whether the limit has been exceeded. When the error occurs, LangChain reports it; see the example below.

Example (from the FormAI dataset, FormAI_92991.c):

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for gpt-3.5-turbo-16k in organization on tokens per min (TPM): Limit 180000, Used 150144, Requested 53468. Please try again in 7.87s. Visit https://platform.openai.com/account/rate-limits to learn more..
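
A minimal sketch of a possible pre-flight token check (assuming the tiktoken package; the budget constant and helper names below are placeholders, not existing ESBMC-AI code):

import tiktoken

# Hypothetical pre-flight check -- not existing ESBMC-AI code.
# Count the tokens of the ESBMC counterexample before handing it to the LLM,
# so an oversized message can be truncated or rejected instead of triggering
# a context/rate-limit error.
MODEL = "gpt-3.5-turbo-16k"
MAX_PROMPT_TOKENS = 12_000  # assumed budget, leaving headroom in the 16k context window

def count_tokens(text: str, model: str = MODEL) -> int:
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def fits_in_budget(counterexample: str) -> bool:
    return count_tokens(counterexample) <= MAX_PROMPT_TOKENS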
Yiannis128 added the bug label on Nov 13, 2023
@ishaan-jaff

@Yiannis128
I'm the maintainer of LiteLLM. We allow you to increase your throughput by load balancing between multiple deployments (Azure, OpenAI).
I'd love your feedback, especially if this does not solve your problem.

Here's how to use it
Docs: https://docs.litellm.ai/docs/routing

import os

from litellm import Router

model_list = [{ # list of model deployments 
    "model_name": "gpt-3.5-turbo", # model alias 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-v-2", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-functioncalling", 
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "vllm/TheBloke/Marcoroni-70B-v1-AWQ", 
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = router.completion(model="gpt-3.5-turbo", 
                messages=[{"role": "user", "content": "Hey, how's it going?"}])

print(response)

@Yiannis128
Collaborator Author

I'm the maintainer of LiteLLM. We allow you to increase your throughput by load balancing between multiple deployments (Azure, OpenAI). [...]

Hi, thanks for the suggestion. Before I look at this, I would like to ask: do you have a Hugging Face model uploaded? I already have Hugging Face model support.

I will still look at it if you don't, but if you do, it will be much easier to implement.

@ishaan-jaff

Yes, we support Hugging Face LLMs. Are you trying to load balance between Hugging Face endpoints?

@Yiannis128
Collaborator Author

Yiannis128 commented Nov 21, 2023

Yes, we support Hugging Face LLMs. Are you trying to load balance between Hugging Face endpoints?

No, I only ask because I already have an interface for adding text-generation-inference-compatible models through Hugging Face. So if you do, that's great! Is this an alternative to LangChain? Could you tell me what advantages it has?

LangChain had some limitations when I implemented it (I'm not sure about now), so I'm weighing the cost/benefit of switching :)
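
For reference, LiteLLM can also call a self-hosted Hugging Face text-generation-inference endpoint directly; a rough sketch, where the model id and api_base are placeholder values rather than anything from this thread:

import os

from litellm import completion

# Sketch: call a Hugging Face text-generation-inference deployment through LiteLLM.
# The model id and api_base below are hypothetical placeholders.
response = completion(
    model="huggingface/bigcode/starcoder",
    messages=[{"role": "user", "content": "Explain this counterexample."}],
    api_base="https://my-tgi-endpoint.example.com",
    api_key=os.getenv("HUGGINGFACE_API_KEY"),
)

print(response)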
