
Support Bedrock Batch Inference #250

Open
bharven opened this issue Oct 22, 2024 · 3 comments
bharven commented Oct 22, 2024

Add support for Bedrock batch inference when calling BedrockLLM's batch() method, instead of making a series of calls with the synchronous API.

3coins (Collaborator) commented Oct 23, 2024

@bharven
Can you provide more info about the batch API support in Bedrock? Can you share the documentation for this feature, and is it supported in the current boto3 SDK?

bharven (Author) commented Oct 23, 2024

@3coins The expected behavior would be to use the native Bedrock batch inference capability instead of synchronous API calls where possible. Bedrock batch currently requires at least 1000 records per job, so ideally the batch() call would use the synchronous API for fewer than 1000 records and the batch API for 1000 or more. Bedrock batch enables processing of large (10k+) datasets without the risk of being rate-limited. It is supported in the current boto3 SDK.

AWS Documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/batch-inference.html

Boto3 SDK link: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock/client/create_model_invocation_job.html
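The dispatch rule described above could be sketched roughly as follows. This is a hypothetical illustration, not LangChain code: `run_sync` and `run_batch_job` are placeholder callables standing in for the synchronous and batch code paths.

```python
# Hypothetical sketch of the requested behavior: route small workloads
# through the synchronous API and large ones through Bedrock batch
# inference, which currently requires at least 1000 records per job.

BEDROCK_BATCH_MIN_RECORDS = 1000  # current Bedrock batch job minimum

def dispatch(records, run_sync, run_batch_job):
    """Pick the invocation path based on workload size."""
    if len(records) < BEDROCK_BATCH_MIN_RECORDS:
        # Below the batch minimum: fall back to per-record sync calls.
        return [run_sync(record) for record in records]
    # At or above the minimum: hand the whole workload to a batch job.
    return run_batch_job(records)
```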

3coins removed the Needs Info label Oct 23, 2024
3coins commented Oct 24, 2024

@bharven
Thanks for providing the documentation for the batch API in Bedrock. Some considerations come to mind for this implementation:

  1. The create_model_invocation_job API uses the bedrock service client, not the bedrock-runtime client that the converse and invoke APIs use, which means we will need a second boto3 client to support batch.
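A minimal sketch of that two-client split, assuming callers pass in a boto3-like session (the stub below avoids any real AWS calls): the batch job operations (create_model_invocation_job, get_model_invocation_job) live on the "bedrock" service, while converse and invoke_model live on "bedrock-runtime".

```python
# Sketch (assumption: `session` provides the boto3 Session.client interface).
# Batch job management lives on the "bedrock" control-plane service;
# converse / invoke_model live on "bedrock-runtime".

def make_clients(session):
    """Return (control-plane client, runtime client) from one session."""
    bedrock = session.client("bedrock")                   # batch job APIs
    bedrock_runtime = session.client("bedrock-runtime")   # converse / invoke
    return bedrock, bedrock_runtime
```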

  2. From the documentation, it is not clear whether the data (messages) should be in the native format each model supports or in the format the converse API supports. This matters because we will need to convert the input messages (LangChain messages) to the format that Bedrock batch expects.

  3. To keep compatibility with LangChain messages as inputs, we should only support the messages part of the payload. For example, we won't be able to support recordId directly, but we can embed it in the message id and then reformat it to recordId when we send the data to S3.

    {
        "recordId": "CALL0000001", 
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31", 
            "max_tokens": 1024,
            "messages": [ 
                { 
                    "role": "user", 
                    "content": [
                        {
                            "type": "text", 
                            "text": "Summarize the following call transcript: ..." 
                        } 
                    ]
                }
            ]
        }
    }
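A hypothetical sketch of that conversion, producing the record format shown above. To keep this self-contained it represents a message as a plain dict with `id` and `text` fields rather than a real LangChain message class; those field names, and the helper names, are assumptions for illustration.

```python
import json

# Hypothetical sketch of point 3: build one Bedrock batch record per
# message, recovering the recordId that was embedded in the message id.
# A message is modeled here as {"id": ..., "text": ...} to avoid a hard
# dependency on langchain-core.

ANTHROPIC_VERSION = "bedrock-2023-05-31"  # version string from the example above

def to_batch_record(message, max_tokens=1024):
    """Build one Bedrock batch record from a message-like dict."""
    return {
        "recordId": message["id"],  # record id embedded in the message id
        "modelInput": {
            "anthropic_version": ANTHROPIC_VERSION,
            "max_tokens": max_tokens,
            "messages": [
                {
                    "role": "user",
                    "content": [{"type": "text", "text": message["text"]}],
                }
            ],
        },
    }

def to_jsonl(messages):
    """Serialize records as JSON Lines, the format batch jobs read from S3."""
    return "\n".join(json.dumps(to_batch_record(m)) for m in messages)
```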
  4. There are many configuration inputs required for the batch job, so we should use the kwargs parameter of the batch method to accept all of these inputs except the messages.

  5. The Bedrock batch API is a long-running, asynchronous API. How do you plan to implement polling for the job status and fetching the results within the batch method?
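One plausible answer to point 5 is a bounded polling loop. The sketch below is an assumption, not a proposed final design: `get_status` stands in for a call to the bedrock client's get_model_invocation_job, the terminal status names should be checked against the API reference, and the interval/timeout values are illustrative only.

```python
import time

# Hypothetical polling sketch for a long-running batch job. `get_status`
# is a zero-argument callable returning the job's current status string;
# `sleep` is injectable so the loop can be tested without real waiting.

TERMINAL_STATUSES = {"Completed", "Failed", "Stopped", "PartiallyCompleted", "Expired"}

def wait_for_job(get_status, interval=30.0, timeout=86400.0, sleep=time.sleep):
    """Poll until the batch job reaches a terminal status, then return it."""
    waited = 0.0
    while waited < timeout:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)
        waited += interval
    raise TimeoutError("batch job did not finish within the timeout")
```

On completion, the batch method would still need to download the job's output from S3 and map each result back to its recordId.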
