When using glm_server.py as the LLM server and passing tools through an agent, streaming mode does not stream the output. #618

Open · jurnea opened this issue Oct 30, 2024 · 0 comments
jurnea commented Oct 30, 2024

System Info

pip install langchain==0.2.16
pip install langgraph==0.2.34
pip install langchain_openai==0.1.9

Who can help?

Premise:
glm_server.py is used as the chat model server.
Problem description:
An agent is built and tools are passed to it. Non-streaming output works fine; in streaming mode, however, once the tool call has been decided and the model is asked to summarize, the answer is not streamed: the final result is returned all at once.
Note: with the official Zhipu API (https://open.bigmodel.cn/api/paas/v4/) this works correctly and output is streamed.
Root cause analysis:
When the tool's answer is passed back to the model, inside the predict_stream method:

    elif (gen_params["tools"] and gen_params["tool_choice"] != "none") or is_function_call:
        continue

As long as tools are passed, this branch keeps hitting continue, so nothing is yielded until the model's answer is complete and it is returned in one piece; there is no streaming effect.
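A possible direction for a fix, sketched below as a self-contained toy (predict_stream_sketch, delta, and is_tool_fragment are illustrative names, not the actual glm_server.py variables): stream plain-text deltas through immediately and buffer only the fragments that belong to a tool call, instead of continuing on every chunk.

# Illustrative sketch only -- not the actual glm_server.py implementation.
# `chunks` stands in for the model's incremental deltas; `is_tool_fragment`
# marks fragments that belong to a tool call and must be buffered.
def predict_stream_sketch(chunks):
    tool_buffer = []
    for delta, is_tool_fragment in chunks:
        if is_tool_fragment:
            tool_buffer.append(delta)  # accumulate tool-call arguments
            continue
        yield delta  # stream ordinary text as soon as it arrives
    if tool_buffer:
        yield "".join(tool_buffer)  # emit the assembled tool call at the end

demo = [("Cloudy ", False), ("with hail.", False), ('{"query": "sf"}', True)]
for out in predict_stream_sketch(demo):
    print(repr(out))  # text deltas print one by one; only the tool call is held back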

Information

  • The official example scripts
  • My own modified scripts

Reproduction

The following code reproduces the issue (note: the official Zhipu API streams correctly):

import asyncio

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages
from langchain_core.tools import tool
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnableConfig
from langgraph.graph import END, START, StateGraph
from langchain_core.messages import AIMessageChunk, HumanMessage, SystemMessage, AnyMessage

"""
This script build an agent by langgraph and stream LLM tokens
pip install langchain==0.2.16
pip install langgraph==0.2.34
pip install langchain_openai==0.1.9
"""


class State(TypedDict):
    messages: Annotated[list, add_messages]


@tool
def search(query: str):
    """Call to surf the web."""
    return ["Cloudy with a chance of hail."]


tools = [search]

model = ChatOpenAI(
    temperature=0,
    # model="glm-4",
    model="GLM-4-Flash",
    openai_api_key="[Your Key]",
    # openai_api_base="https://open.bigmodel.cn/api/paas/v4/",  # streaming works with the official Zhipu endpoint
    openai_api_base="[Your glm_server.py URL]",
    streaming=True
)


class Agent:

    def __init__(self, model, tools, system=""):
        self.system = system
        workflow = StateGraph(State)
        workflow.add_node("agent", self.call_model)
        workflow.add_node("tools", ToolNode(tools))
        workflow.add_edge(START, "agent")
        workflow.add_conditional_edges(
            # First, we define the start node. We use `agent`.
            # This means these are the edges taken after the `agent` node is called.
            "agent",
            # Next, we pass in the function that will determine which node is called next.
            self.should_continue,
            # Next we pass in the path map - all the nodes this edge could go to
            ["tools", END],
        )
        workflow.add_edge("tools", "agent")
        self.model = model.bind_tools(tools)
        self.app = workflow.compile()

    def should_continue(self, state: State):
        messages = state["messages"]
        last_message = messages[-1]
        # If there is no function call, then we finish
        if not last_message.tool_calls:
            return END
        # Otherwise if there is, we continue
        else:
            return "tools"

    async def call_model(self, state: State, config: RunnableConfig):
        messages = state["messages"]
        if self.system:
            messages = [SystemMessage(content=self.system)] + messages
        response = await self.model.ainvoke(messages, config)
        # We return a list, because this will get added to the existing list
        return {"messages": response}

    async def query(self, user_input: str):
        inputs = [HumanMessage(content=user_input)]
        first = True
        async for msg, metadata in self.app.astream({"messages": inputs}, stream_mode="messages"):
            if msg.content and not isinstance(msg, HumanMessage):
                # tokens should print one by one here if streaming works
                print(msg.content, end="|", flush=True)

            if isinstance(msg, AIMessageChunk):
                if first:
                    gathered = msg
                    first = False
                else:
                    gathered = gathered + msg

                if msg.tool_call_chunks:
                    print('tool_call_chunks...', gathered.tool_calls)


if __name__ == '__main__':

    question = "what is the weather in sf"  # renamed from `input` to avoid shadowing the builtin
    prompt = """
    You are a smart research assistant. Use the search engine ...
    """
    agent = Agent(model, tools, prompt)
    asyncio.run(agent.query(question))
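As a narrowing step, a minimal direct check without langgraph may help (hypothetical snippet reusing the model and tools defined above): if the chunks also arrive as a single block here, the problem is in glm_server.py's predict_stream rather than in the graph wiring.

# Hypothetical diagnostic, reusing `model` and `tools` from the script above.
async def direct_check():
    bound = model.bind_tools(tools)
    async for chunk in bound.astream([HumanMessage(content="what is the weather in sf")]):
        print(repr(chunk.content))  # with working streaming, many small chunks print here

# asyncio.run(direct_check())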

Expected behavior

The model's answer should be streamed normally when tools are used.
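For illustration only (the repro script prints each chunk followed by "|"), working streaming would print token-sized pieces such as:

Cloudy| with| a| chance| of| hail.|

whereas glm_server.py currently emits the whole answer as one chunk:

Cloudy with a chance of hail.|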
