
Fixes token logging in callbacks when streaming=True is used. #241

Merged
merged 1 commit into langchain-ai:main on Oct 17, 2024

Conversation

@3coins (Collaborator) commented Oct 15, 2024

Fixes #240
Fixes #217

Code to verify

from langchain_aws import ChatBedrock
from langchain.callbacks.base import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate

streaming = True

class MyCustomHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"My custom handler, token: {token}")

prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal} in a few words."])

model = ChatBedrock(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    streaming=streaming,
    callbacks=[MyCustomHandler()]
)

chain = prompt | model

response = chain.invoke({"animal": "bears"})

Output

My custom handler, token: 
My custom handler, token: Bear
My custom handler, token: -
My custom handler, token: ly funny
My custom handler, token: .
My custom handler, token: 
My custom handler, token: 

@3coins marked this pull request as ready for review October 15, 2024 20:06
@3coins (Collaborator, Author) commented Oct 15, 2024

@wuodar @supreetkt
Can you folks verify? Thanks.

@supreetkt

Btw, this check also seems to be missing from the SagemakerEndpoint class, so there might be empty tokens there as well.

@3coins (Collaborator, Author) commented Oct 16, 2024

@supreetkt
Thanks for those suggestions. Upon checking some other partner implementations, there doesn't seem to be any filtering of messages in their code based on the content. There are valid cases in which other values in the callbacks might be useful, for example the response_metadata.

I will look separately into the responses to see if we are yielding extra chunks that shouldn't be exposed. As far as this PR is concerned, I believe this fixes the raised issue. Let me know if you have any other suggestions.

@supreetkt commented Oct 16, 2024

> @supreetkt Thanks for those suggestions. Upon checking some other partner implementations, there doesn't seem to be any filtering of messages in their code based on the content. There are valid cases in which other values in the callbacks might be useful, for example the response_metadata.
>
> I will look separately into the responses to see if we are yielding extra chunks that shouldn't be exposed. As far as this PR is concerned, I believe this fixes the raised issue. Let me know if you have any other suggestions.

@ihmaws raised a similar concern this morning, and when I took a look at the generated chunks, they only contained the stop reason; for our use case, an additional callback invocation for the websocket meant additional calls, which add to costs. I checked one partner implementation and they seemed to be checking only to ensure that the output is a string. Here is a sample empty generated chunk:

message=AIMessageChunk(content='', additional_kwargs={}, response_metadata={'stop_reason': 'end_turn', 'stop_sequence': None})

All this information is already packaged as a part of the final response:

content='Grizzly business.' additional_kwargs={} response_metadata={'stop_reason': 'end_turn', 'stop_sequence': None} id='run-xxx' usage_metadata={'input_tokens': 18, 'output_tokens': 9, 'total_tokens': 27}
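
For illustration only, a minimal sketch (not from this thread) of reading that metadata off the final message returned by the chain in the verification snippet above:

result = chain.invoke({"animal": "bears"})
# The metadata carried by the empty streamed chunk is also on the final AIMessage.
print(result.response_metadata)  # e.g. {'stop_reason': 'end_turn', 'stop_sequence': None}
print(result.usage_metadata)     # e.g. {'input_tokens': 18, 'output_tokens': 9, 'total_tokens': 27}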

In the end, it's your call based on your experience with more callback options. I can just handle the same logic on my end within our custom implementation of the on_llm_new_token too :)
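
For illustration only, a minimal sketch (not part of this PR; the class name is hypothetical) of doing that filtering inside a custom handler:

from langchain_core.callbacks import BaseCallbackHandler

class WebsocketTokenHandler(BaseCallbackHandler):
    # Hypothetical handler: forward only non-empty tokens, e.g. over a websocket.
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        if not token:
            # Skip metadata-only chunks such as the stop_reason chunk shown above.
            return
        print(f"forwarding token: {token}")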

I had one more concern: I tried using ChatBedrock with beta_use_converse_api set to True, and it doesn't stream responses. I think it's missing the usage of on_llm_new_token as well.

@3coins (Collaborator, Author) commented Oct 16, 2024

@supreetkt

> I checked one partner implementation and they seemed to be checking only to ensure that the output is a string. Here is a sample empty generated chunk

Can you provide a link to that implementation?

> I can just handle the same logic on my end within our custom implementation of the on_llm_new_token too :)

Yes, I think this can be handled easily by the callbacks based on their use case.

> I had one more concern: I tried using ChatBedrock with beta_use_converse_api set to True, and it doesn't stream responses. I think it's missing the usage of on_llm_new_token as well.

I will tackle this in a separate PR, but in general the streaming attribute is legacy and not required at all; it is only there for backwards compatibility, since streaming is natively supported by LCEL and Runnables, and the callbacks approach should be discouraged IMO.
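
For illustration only, a minimal sketch (not from this PR) of that native streaming path, which needs neither streaming=True nor a callback handler:

from langchain_aws import ChatBedrock
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal} in a few words."])
model = ChatBedrock(model_id="anthropic.claude-3-haiku-20240307-v1:0")
chain = prompt | model

for chunk in chain.stream({"animal": "bears"}):
    # Each chunk is an AIMessageChunk; content is empty for metadata-only chunks.
    if chunk.content:
        print(chunk.content, end="", flush=True)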

@supreetkt commented Oct 16, 2024

> @supreetkt
>
>> I checked one partner implementation and they seemed to be checking only to ensure that the output is a string. Here is a sample empty generated chunk
>
> Can you provide a link to that implementation?
>
>> I can just handle the same logic on my end within our custom implementation of the on_llm_new_token too :)
>
> Yes, I think this can be handled easily by the callbacks based on their use case.
>
>> I had one more concern: I tried using ChatBedrock with beta_use_converse_api set to True, and it doesn't stream responses. I think it's missing the usage of on_llm_new_token as well.
>
> I will tackle this in a separate PR, but in general the streaming attribute is legacy and not required at all; it is only there for backwards compatibility, since streaming is natively supported by LCEL and Runnables, and the callbacks approach should be discouraged IMO.

Here is the link. This logic allows for an empty string.

Also, if you're tackling the converse API issue in a separate PR, this LGTM.

@3coins 3coins merged commit 902252b into langchain-ai:main Oct 17, 2024
12 checks passed