
Feature request: Add Asynchronous Message Queue Support with Rate Limiting #149

Open
kaustavbecs opened this issue Dec 11, 2024 · 1 comment

@kaustavbecs

Use case

Common enterprise use cases involve spiky workloads that must be handled without violating LLM providers' rate limits. The Multi Agent Orchestrator should be able to:

  1. Store the incoming messages in a queue
  2. Check the classifier to identify top LLMs for the request
  3. Check the rate limits and current consumption
  4. Request the appropriate LLM endpoint
  5. Receive the response and send it back to the requester via WebSockets

Solution/User Experience

The Multi Agent Orchestrator should implement the queue-based flow above end to end: buffer each incoming message in a queue, classify it to identify the top LLMs, check rate limits and current consumption, call the selected LLM endpoint, and return the response to the requester over WebSockets (a sketch follows below).
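
A minimal sketch of that flow in Python with asyncio, assuming one token-bucket limiter per model; `classify`, `call_llm`, `send_ws`, and the message fields (`text`, `connection_id`) are hypothetical stand-ins for the orchestrator's real classifier, LLM client, and WebSocket layer:

```python
import asyncio
import time

class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    async def acquire(self) -> None:
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Not enough headroom: sleep until roughly one token has refilled.
            await asyncio.sleep((1 - self.tokens) / self.rate)


async def worker(queue: asyncio.Queue, classify, call_llm, send_ws, limits: dict) -> None:
    """Drain the queue: classify, wait for rate-limit headroom, call the LLM, reply via WebSocket."""
    while True:
        msg = await queue.get()                        # 1. message was buffered in the queue
        model = classify(msg["text"])                  # 2. classifier picks the best LLM
        await limits[model].acquire()                  # 3. block until within that model's rate limit
        response = await call_llm(model, msg["text"])  # 4. request the appropriate LLM endpoint
        await send_ws(msg["connection_id"], response)  # 5. send the response back over WebSockets
        queue.task_done()
```

Because the worker blocks on the limiter rather than dropping requests, spiky traffic simply accumulates in the queue and drains at the provider's allowed rate.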

Alternative solutions

Frameworks such as LlamaIndex support an async mode, but it targets non-blocking async HTTP I/O only. For a comprehensive, enterprise-grade Multi Agent Orchestrator, we need a custom solution.
@kaustavbecs kaustavbecs changed the title Feature request: Support for async Multi Agent Orchestrator with support for massive scale Feature request: Add Asynchronous Message Queue Support with Rate Limiting Dec 11, 2024
@cornelcroi cornelcroi self-assigned this Dec 12, 2024
@cornelcroi (Contributor)

Hi @kaustavbecs, thank you for the proposal.

To get async behaviour, I think the best way is to handle this per agent (not at the orchestrator level), because an agent is not necessarily an LLM and each agent could have different limits.
To implement this, you could create a custom agent and handle the asynchronous communication with the LLM inside it.
I could see this being implemented in a LambdaAgent, for example, where the Lambda handles the asynchronous communication with the LLM using queues or another mechanism.
It would be interesting for such an agent to become part of the built-in agents.
Happy to review a solution before you start the implementation, if you want to contribute to this repository.
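
A rough sketch of that per-agent idea; the `process_request` signature here is an assumption modeled on how a custom agent subclass might look, not the framework's exact API:

```python
import asyncio

class RateLimitedLLMAgent:
    """Custom agent that enforces its own limit around its LLM calls.

    Assumed shape: a real implementation would subclass the framework's
    Agent base class, whose method signature may differ from this sketch.
    """

    def __init__(self, name: str, call_llm, max_concurrent: int = 2):
        self.name = name
        self._call_llm = call_llm                        # coroutine: (prompt: str) -> str
        self._limit = asyncio.Semaphore(max_concurrent)  # per-agent limit, since limits differ per agent

    async def process_request(self, input_text: str, user_id: str,
                              session_id: str, chat_history: list) -> str:
        # The limit lives inside the agent, so non-LLM agents are unaffected
        # and each LLM-backed agent can carry its own quota.
        async with self._limit:
            return await self._call_llm(input_text)
```

Keeping the limit inside the agent matches the comment above: the orchestrator stays unchanged, and each agent (LLM-backed or not) decides its own queuing and throttling strategy.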
