
Support Bedrock Batch Inference #250

Open
bharven opened this issue Oct 22, 2024 · 3 comments
bharven commented Oct 22, 2024

Add support for Bedrock batch inference when calling BedrockLLM's batch() method, instead of making a series of calls with the synchronous API.

3coins (Collaborator) commented Oct 23, 2024

@bharven
Can you provide more info about the batch API support in Bedrock? Can you share the documentation for this feature, and is it supported in the current boto3 SDK?

bharven (Author) commented Oct 23, 2024

@3coins The expected behavior would be to use the native Bedrock batch inference capability instead of synchronous API calls where possible. Bedrock batch currently requires at least 1000 records per job, so ideally the batch() call would use the synchronous API for fewer than 1000 records and the batch API for 1000 or more. Bedrock batch enables processing of large (10k+) datasets without the risk of being rate-limited. It is supported in the current boto3 SDK.

AWS Documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference.html

Boto3 SDK link: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock/client/create_model_invocation_job.html
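The dispatch rule described above could be sketched roughly as follows. This is a hypothetical illustration, not LangChain code: `run_sync` and `run_batch_job` are placeholder callables standing in for the synchronous and batch code paths.

```python
# Hypothetical sketch of the requested behavior: route small workloads
# through the synchronous API and large ones through Bedrock batch
# inference, which currently requires at least 1000 records per job.

BEDROCK_BATCH_MIN_RECORDS = 1000  # current Bedrock batch job minimum

def dispatch(records, run_sync, run_batch_job):
    """Pick the invocation path based on workload size."""
    if len(records) < BEDROCK_BATCH_MIN_RECORDS:
        # Below the batch minimum: fall back to per-record sync calls.
        return [run_sync(record) for record in records]
    # At or above the minimum: hand the whole workload to a batch job.
    return run_batch_job(records)
```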

3coins removed the Needs Info label Oct 23, 2024
3coins commented Oct 24, 2024

@bharven
Thanks for providing the documentation for the batch API in Bedrock. Some considerations come to mind for this implementation:

  1. The create_model_invocation_job API uses the bedrock service client, not the bedrock-runtime client that the converse and invoke APIs use, which means we will need a second boto3 client to support batch.
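A minimal sketch of that two-client split, assuming callers pass in a boto3-like session (the stub below avoids any real AWS calls): the batch job operations (create_model_invocation_job, get_model_invocation_job) live on the "bedrock" service, while converse and invoke_model live on "bedrock-runtime".

```python
# Sketch (assumption: `session` provides the boto3 Session.client interface).
# Batch job management lives on the "bedrock" control-plane service;
# converse / invoke_model live on "bedrock-runtime".

def make_clients(session):
    """Return (control-plane client, runtime client) from one session."""
    bedrock = session.client("bedrock")                   # batch job APIs
    bedrock_runtime = session.client("bedrock-runtime")   # converse / invoke
    return bedrock, bedrock_runtime
```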

  2. From the documentation, it is not clear whether the data (messages) should be in the native format each model supports or in the format the converse API supports. This matters because we will need to convert the input messages (LangChain messages) to the format that Bedrock batch expects.

  3. To keep compatibility with LangChain messages as inputs, we should only support the messages part of the payload. For example, we won't be able to support recordId directly, but we can embed it in the message id and then reformat it to recordId when we send the data to S3.

    {
        "recordId": "CALL0000001", 
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31", 
            "max_tokens": 1024,
            "messages": [ 
                { 
                    "role": "user", 
                    "content": [
                        {
                            "type": "text", 
                            "text": "Summarize the following call transcript: ..." 
                        } 
                    ]
                }
            ]
        }
    }
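A hypothetical sketch of that conversion, producing the record format shown above. To keep this self-contained it represents a message as a plain dict with `id` and `text` fields rather than a real LangChain message class; those field names, and the helper names, are assumptions for illustration.

```python
import json

# Hypothetical sketch of point 3: build one Bedrock batch record per
# message, recovering the recordId that was embedded in the message id.
# A message is modeled here as {"id": ..., "text": ...} to avoid a hard
# dependency on langchain-core.

ANTHROPIC_VERSION = "bedrock-2023-05-31"  # version string from the example above

def to_batch_record(message, max_tokens=1024):
    """Build one Bedrock batch record from a message-like dict."""
    return {
        "recordId": message["id"],  # record id embedded in the message id
        "modelInput": {
            "anthropic_version": ANTHROPIC_VERSION,
            "max_tokens": max_tokens,
            "messages": [
                {
                    "role": "user",
                    "content": [{"type": "text", "text": message["text"]}],
                }
            ],
        },
    }

def to_jsonl(messages):
    """Serialize records as JSON Lines, the format batch jobs read from S3."""
    return "\n".join(json.dumps(to_batch_record(m)) for m in messages)
```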
  4. There are many configuration inputs required for the batch job, so we should use the kwargs parameter of the batch method to accept all of these inputs except the messages.

  5. The Bedrock batch API is a long-running, asynchronous API. How do you plan to implement polling for the job status and fetching the results within the batch method?
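One plausible answer to point 5 is a bounded polling loop. The sketch below is an assumption, not a proposed final design: `get_status` stands in for a call to the bedrock client's get_model_invocation_job, the terminal status names should be checked against the API reference, and the interval/timeout values are illustrative only.

```python
import time

# Hypothetical polling sketch for a long-running batch job. `get_status`
# is a zero-argument callable returning the job's current status string;
# `sleep` is injectable so the loop can be tested without real waiting.

TERMINAL_STATUSES = {"Completed", "Failed", "Stopped", "PartiallyCompleted", "Expired"}

def wait_for_job(get_status, interval=30.0, timeout=86400.0, sleep=time.sleep):
    """Poll until the batch job reaches a terminal status, then return it."""
    waited = 0.0
    while waited < timeout:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)
        waited += interval
    raise TimeoutError("batch job did not finish within the timeout")
```

On completion, the batch method would still need to download the job's output from S3 and map each result back to its recordId.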
