Memory leak in LangServe #717

lukasugar · 2024-07-26T15:12:31Z

I'm hosting a langserve app. The app is quite simple, but there seems to be a memory leak. Any ideas on why this is happening?

I'm seeing this error:

OSError: [Errno 24] Too many open files
socket.accept() out of system resource

seems like some clients are not closing connections. I'm using only ChatOpenAI in this app.

With every new request, RAM increases and doesn't go down:

The code is straightforward, I'm following examples from the docs.
Chain definitio in public_review.py:

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from app.prompts.public_review_analysis_prompt import (
    PUBLIC_REVIEW_ISSUE_GENERATOR_SYSTEM_PROMPT,
)

public_review_text_chain = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            PUBLIC_REVIEW_ISSUE_GENERATOR_SYSTEM_PROMPT,
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4o", temperature=0.03, model_kwargs={"seed": 13})


public_review_chain = (
    | public_review_text_chain
    | JsonOutputParser(pydantic_object=IssueList)
)

Chain is imported in routers.py:

# Chain added to router and router is then added to the app
from fastapi import APIRouter
from langserve import add_routes

from app.enrichment.aggregator import aggregator_review_chain, aggregator_text_chain
from app.enrichment.public_review import public_review_chain, public_review_text_chain
from app.enrichment.types import (
    InputFragment,
    InputFragmentList
  )

router = APIRouter()

add_routes(
    router,
    public_review_chain.with_types(input_type=InputFragmentList, output_type=IssueList),
    path="/api/v1/public_review",
)
add_routes(router, public_review_text_chain, path="/api/v1/public_review/text")

Any ideas what could be causing the leak? This is literally the entire code.

The text was updated successfully, but these errors were encountered:

eyurtsev · 2024-07-26T15:53:14Z

I don't see anything obvious. What does from app.enrichment.aggregator import aggregator_review_chain, aggregator_text_chain look like?

lukasugar · 2024-07-26T18:47:55Z

It's pretty basic as well:

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

from app.enrichment.types import IssueList
from app.prompts.issue_aggregator_prompt import ISSUE_AGGREGATOR_SYSTEM_PROMPT

aggregator_text_chain = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            ISSUE_AGGREGATOR_SYSTEM_PROMPT,
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4-turbo", temperature=0.03, model_kwargs={"seed": 13})


def _serialize_input(x: IssueList) -> str:
    """Helper function to serialize the input"""

    if isinstance(x, dict):
        _ifl = IssueList(issues=x["issues"])
        return _ifl.json()
    return x.json()


aggregator_review_chain = (
    {"text": RunnableLambda(_serialize_input)}
    | aggregator_text_chain
    | JsonOutputParser(pydantic_object=IssueList)
)

lukasugar · 2024-07-27T08:49:57Z

@eyurtsev any ideas on how to debug this?

Is ChatOpenAI closing its connections after calls?

eyurtsev · 2024-07-27T19:59:34Z

Ill read over the chat open ai implementation on Monday.

You could try deploying chat open ai as the sole runnable and verifying that you can recreate the problem if so that would help isolate the issue so we can rule out user code.

eyurtsev · 2024-07-27T20:00:25Z

Would you mind including output of

python -m langchain_core.sys_info

lukasugar · 2024-07-28T18:33:24Z

I'll try deploying chat open ai as the sole runnable and recreating the problem first thing tomorrow.
In the meantime, here's the output of the stuff command you asked:

# python -m langchain_core.sys_info

System Information
------------------
> OS:  Linux
> OS Version:  #1 SMP Mon Oct 9 16:21:24 UTC 2023
> Python Version:  3.11.9 (main, Jul 23 2024, 07:22:56) [GCC 12.2.0]

Package Information
-------------------
> langchain_core: 0.2.11
> langchain: 0.2.6
> langsmith: 0.1.83
> langchain_cli: 0.0.25
> langchain_openai: 0.1.14
> langchain_text_splitters: 0.2.2
> langgraph: 0.1.5
> langserve: 0.2.2
#

Additionally, these are the dependecies in the poetry file:

[tool.poetry.dependencies]
python = ">3.11, <3.12"
uvicorn = "^0.23.2"
langserve = "^0.2.2"
python-decouple = "^3.8"
mypy = "^1.10.0"
poetry-dotenv-plugin = "^0.2.0"
python-dotenv = "^1.0.1"
langchain-openai = "^0.1.14"
langchain-core = "^0.2.11"
langgraph = "^0.1.5"
langchain = "^0.2.6"
pydantic = "<2"
aiosqlite = "^0.20.0"


[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.15"

And this is the poetry.lock file: poetry.lock

lukasugar · 2024-07-29T13:07:33Z

@eyurtsev I've ran 4k requests to ChatOpenAI and I can see the memory leak.
Code:

# Chain definition
simple_chat_openai = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpfull assistant. Talk with user using pirate language.",
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4o-mini", temperature=0.03, model_kwargs={"seed": 13})

# server.py
add_routes(app, simple_chat_openai, path="/chat_openai")

Here's the RAM usage. The app uses ~200MB when started. The usage jumps to ~400MB, and stays there even after the requests are completed. The red line is the point in time when all the requests are completed.

lukasugar · 2024-07-29T14:06:03Z

I've continued running the endpoint and the memory continued leaking until the service broke:

eyurtsev · 2024-07-29T14:45:15Z

Here's the chat open AI implementation. It's creating httpx.AsyncClient.

https://github.com/langchain-ai/langchain/blob/b3a23ddf9378a2616e35077b6d82d8fd1ef60cbc/libs/partners/openai/langchain_openai/chat_models/base.py#L451-L451

The client has default limits of:

DEFAULT_LIMITS = Limits(max_connections=100, max_keepalive_connections=20)

So there should be a connection pool there.

@lukasugar

which endpoint are you hitting on the server? (ainvoke? astream?)
Do you have any additional model configuration? e.g., proxy set up? (I'm wondering if there's any configuration coming from env variables)

eyurtsev · 2024-07-29T14:50:46Z

@lukasugar while we're debugging, you can roll out a quick workaround using: https://www.uvicorn.org/settings/#resource-limits

lukasugar · 2024-07-29T14:52:00Z

Side note, I've tried using Anthropic chain, and got the same issue:

# chain definition
simple_chat_anthropic = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpfull assistant. Talk with user using pirate language.",
        ),
        ("user", "{text}"),
    ]
) | ChatAnthropic(model="claude-3-haiku-20240307")

# server.py
from app.enrichment.public_review import simple_chat_anthropic

add_routes(app, simple_chat_anthropic, path="/chat_anthropic")

The memory is constantly growing (ignore the orange line)

So this could be:

An issue with some base chat langchain class?
An issue with the way prompt templates are created in the code?

eyurtsev · 2024-07-29T14:56:47Z

An issue with some base chat langchain class?

Possibly, see if you can confirm the env configuration you have. I don't see anything suspicious in the chat model code right now as it looks like it uses a connection pool by default and is only initialized once.

An issue with the way prompt templates are created in the code?

I wonder if if we're seeing something from instantiation of pydantic models. LangChain relies on pydantic v1 namespace and we instantiate models both to create the prompts and also when we output the messages from the chat model.

The other possible source of issues is langserve itself as it does some stuff w/ request objects and it creates pydantic models

lukasugar · 2024-07-29T14:58:18Z

To answer your questions @eyurtsev :

I'm invoking the server through:

I'm calling the LangServe app from a javascript app and from python notebooks. In js, I'm using `fetch`:

      const aiResponse = await fetch("www.my/endpoint/chat_openai/invoke", {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(requestData),
      });

In python code, I'm using requests:

def post_invoke_langserve(path: str, payload: str):
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    _url = os.path.join(base_url, path)
    response = requests.post(_url, headers=headers, data=payload)

    return response

We don't have any additional model configs. Some models have specified seed and temperature, that's all:

ChatOpenAI(model="gpt-4-turbo", temperature=0.03, model_kwargs={"seed": 13})

lukasugar · 2024-07-29T15:03:00Z

What environment information do you need?

The Dockerfile is the same as in the LangServe documentation:

FROM python:3.11-slim

RUN pip install poetry==1.6.1

RUN poetry config virtualenvs.create false

WORKDIR /code

COPY ./pyproject.toml ./README.md ./poetry.lock* ./

COPY ./package[s] ./packages

RUN poetry install  --no-interaction --no-ansi --no-root

COPY ./app ./app

RUN poetry install --no-interaction --no-ansi

ARG PORT

EXPOSE ${PORT:-8080}

CMD exec uvicorn app.server:app --host 0.0.0.0 --port ${PORT:-8080}

Environment variables that I'm specifying:

LANGCHAIN_API_KEY=some_value
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
LANGCHAIN_PROJECT=langserve-staging
LANGCHAIN_TRACING_V2=true
OPENAI_API_KEY=some_value
ANTHROPIC_API_KEY=some_value

eyurtsev · 2024-07-29T15:20:13Z

@lukasugar cool that's complete.

I was looking for any env information that could possibly change the instantiation of the httpx client used in ChatOpenAI (e.g., OPENAI_PROXY) to see if by any chance gets rid of the max limit on the number of connections in the connection pool.

https://github.com/langchain-ai/langchain/blob/b3a23ddf9378a2616e35077b6d82d8fd1ef60cbc/libs/partners/openai/langchain_openai/chat_models/base.py#L410-L410

But that's not the case, and I don't think that's where the issue is from.

Are you able to isolate further and determine whether just deploying the prompt re-produces the issue?

# Chain definition
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpfull assistant. Talk with user using pirate language.",
        ),
        ("user", "{text}"),
    ]
)

# server.py
add_routes(app, prompt, path="/prompt")

lukasugar · 2024-07-29T15:26:05Z

I've tried running ChatOpenAI as the only runnable in the chain - and the memory is still leaking.

Here's the code:

from langchain_openai import ChatOpenAI
from langserve import add_routes

add_routes(app, ChatOpenAI(model="gpt-4o-mini"), path="/chat_openai_plain")

The chat now contains only ChatOpenAI and the memory is leaking (orange line). After few thousand requests, memory went from 200MB -> 500MB.

lukasugar · 2024-07-29T16:16:31Z

@eyurtsev I've ran the experiment with returning only the prompt.
The memory is leaking:

I'm using the exact code as in your example.

@lukasugar cool that's complete.

I was looking for any env information that could possibly change the instantiation of the httpx client used in ChatOpenAI (e.g., OPENAI_PROXY) to see if by any chance gets rid of the max limit on the number of connections in the connection pool.

https://github.com/langchain-ai/langchain/blob/b3a23ddf9378a2616e35077b6d82d8fd1ef60cbc/libs/partners/openai/langchain_openai/chat_models/base.py#L410-L410

But that's not the case, and I don't think that's where the issue is from.

Are you able to isolate further and determine whether just deploying the prompt re-produces the issue?
# Chain definition
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpfull assistant. Talk with user using pirate language.",
        ),
        ("user", "{text}"),
    ]
)

# server.py
add_routes(app, prompt, path="/prompt")

eyurtsev · 2024-07-29T16:41:46Z

OK great this rules out anything specific to chat models.

There's one more potential source which is the langsmith-sdk LANGCHAIN_TRACING_V2=false -- this also makes network connections so it could explain the oserror.

If it's also not that, I'll need a bit of time to dig in since it's either pydantic or glue code in langserve. If it's pydantic, you'll need to force restart the workers as a work-around and hope that it gets resolved when we upgrade to pydantic 2 (tentatively next month).

lukasugar · 2024-07-29T17:18:19Z

I'll disable tracing and check if it changes anything

lukasugar · 2024-07-29T19:01:38Z

I'm a bit confused... I disabled langsmith tracing (removed the environment variables).
It seems that there still is a memory leak, but it's less deterministic, it doesn't happen with all calls.

Memory:

Requests:

It looks like:

some requests are causing memory to increase. Later calls are not.
memory never drops

So, disabling langsmith tracing helps, but it's not the only reason for memory leaks.

I don't see a great solution:

swapping langsmith with some other logging service would help, but I like LangSmith
disabling logging - no way, we need the logs

And there still are some memory leaks...

lukasugar · 2024-07-29T19:07:14Z

@eyurtsev do you think the pydantic 2 update will fix the memory leaks? Could you please find someone from the LangSmith team look into the issue as well?
Thanks!

eyurtsev · 2024-07-29T19:45:47Z

@lukasugar thanks!

do you think the pydantic 2 update will fix the memory leaks?

I don't know since we still need to isolate exactly where it is. It could be that there's some easy to fix bug in core or langsmith or langserve that's not related to pydantic.

Could you please find someone from the LangSmith team look into the issue as well?

Yes of course!

eyurtsev · 2024-07-29T21:29:32Z

@lukasugar while we're investigating you should be able to use this work-around: https://www.uvicorn.org/settings/#resource-limits

--limit-max-requests

eyurtsev · 2024-07-30T02:12:28Z

@lukasugar I haven't been able to reproduce any issues as long as langsmith tracer is either disabled or else configured properly (i.e. not rate limited).

Could you configure a logger and check if you're getting warnings from the langsmith client about getting rate limited?

if you hammer at the server hard enough while being rate limited by langsmith, you could definitely see memory consumption increase as the tracer will hold on to the data in memory temporarily and do a few more retries to avoid losing tracing data.

import logging
import os

import psutil
from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate

from langserve import add_routes

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

logger = logging.getLogger(__name__)

app = FastAPI()


def get_memory_usage():
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    return mem_info.rss / 1024 / 1024  # Convert bytes to MB


prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're an assistant by the name of Bob." * 100),
        ("human", "{input}"),
    ]
)


@app.get("/memory-usage")
def memory_usage():
    memory = get_memory_usage()
    return {"memory_usage": memory}


add_routes(app, prompt, path="/prompt")


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=7999)

Here's a curl to issue a request:

random_string=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 16)
curl -X 'POST' \
  'http://localhost:7999/prompt/invoke' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{
  \"input\": {
    \"text\": \"$random_string\"
  },
  \"config\": {},
  \"kwargs\": {}
}"

And you can monitor the memory usage this way:

watch -n 1 curl -s localhost:7999/memory-usage

My environment:

System Information

OS: Linux
OS Version: #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2
Python Version: 3.11.4 (main, Sep 25 2023, 10:06:23) [GCC 11.4.0]

Package Information

langchain_core: 0.2.11
langchain: 0.2.6
langchain_community: 0.0.36
langsmith: 0.1.83
langchain_anthropic: 0.1.11
langchain_cli: 0.0.25
langchain_openai: 0.1.14
langchain_text_splitters: 0.2.2
langgraph: 0.1.5
langserve: 0.2.2

lukasugar · 2024-07-30T09:05:56Z

I can confirm that I was getting rate limited by langsmith:

Failed to batch ingest runs: LangSmithRateLimitError('Rate limit exceeded for https://api.smith.langchain.com/runs/batch.
 HTTPError(\'429 Client Error: Too Many Requests for url: https://api.smith.langchain.com/runs/batch\', \'{"detail":"Usage limit monthly_traces of 50000 exceeded"}\')')

I'll check the memory consumption the way you suggested, probably tomorrow.

eyurtsev · 2024-08-02T01:28:46Z

@lukasugar OK for me to close the issue for now?

lukasugar · 2024-08-02T11:16:08Z

@eyurtsev sorry, I'm overwhelmed with work the last few days... When I test the memory consumption the way you suggested, I'll re-open the ticket if the issue persists.

lukasugar · 2024-08-02T11:44:13Z

@lukasugar while we're investigating you should be able to use this work-around: https://www.uvicorn.org/settings/#resource-limits

--limit-max-requests

I've tried setting limit_max_requests to make the server restart after the max number of requests has been reached.

Here's the code:

if __name__ == "__main__":
    import uvicorn

    while True:
        uvicorn.run(
            app, host="0.0.0.0", port=8000, limit_max_requests=10
        )

        print(f"Restarting server")

Nothing happens after the server gets 10 (or even 50) requests.

I've tried simplified code, where it's expected that server will terminate after the limit is reached, it still doesn't work:

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(
        app, host="0.0.0.0", port=8000, limit_max_requests=10
    )

I can make as many requests as I want, and the service is still running:

Any idea why that's happening?

lukasugar · 2024-08-02T15:03:55Z

I can't verify that workers are restarted after limit_max_requests. Do you know how I could verify that?

lukasugar · 2024-08-05T09:11:48Z

LangServe takes dependency on uvicorn (>=0.23.2,<0.24.0). That's a year old version... I've tried updating to the latest, uvicorn 0.30, but I encountered an issue:

poetry add uvicorn@^0.30

...

Because no versions of langchain-cli match >0.0.15,<0.0.16 || >0.0.16,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28
 and langchain-cli (0.0.15) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.16 || >0.0.16,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.16) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.17) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.18) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.19) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.20) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.21) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.22) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.23) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.24) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.26) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.27) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.25 || >0.0.25,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.28) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.25) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15) requires uvicorn (>=0.23.2,<0.24.0).
So, because narrative-langserve depends on both uvicorn (^0.30) and langchain-cli (>=0.0.15), version solving failed.

I've tried updating to the latest uvicorn version, hoping it solves the issue. Is there any reason why langchain-cli takes dependency on an old version?

Omega-Centauri-21 · 2024-08-06T13:28:06Z

Can you try collecting garbarge by calling the garbage collector explicitly (gc.collect() ) after handling requests to free up memory?

siddicky · 2024-08-17T07:46:25Z

I'm hosting a langserve app. The app is quite simple, but there seems to be a memory leak. Any ideas on why this is happening?

I'm seeing this error:
OSError: [Errno 24] Too many open files
socket.accept() out of system resource
seems like some clients are not closing connections. I'm using only ChatOpenAI in this app.

With every new request, RAM increases and doesn't go down:
The code is straightforward, I'm following examples from the docs. Chain definitio in `public_review.py`:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from app.prompts.public_review_analysis_prompt import (
    PUBLIC_REVIEW_ISSUE_GENERATOR_SYSTEM_PROMPT,
)

public_review_text_chain = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            PUBLIC_REVIEW_ISSUE_GENERATOR_SYSTEM_PROMPT,
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4o", temperature=0.03, model_kwargs={"seed": 13})


public_review_chain = (
    | public_review_text_chain
    | JsonOutputParser(pydantic_object=IssueList)
)

Hi, just wanted to confirm if this chain works as intended? I see you're using JsonOutputParser(pydantic_object=IssueList) however in your implementation, you're not using .with_structured_output() or bind_tools() to enforce this.

If the goal is to get json output you should specify the json_mode in .bind_tools or .with_structured_output

michael81045 · 2024-08-20T06:20:47Z

Hello everyone,
Any suggestions or solutions? I'm having the same problem... ...
After running it 1500 times, my memory usage has remained on the peak.

lukasugar · 2024-08-21T11:48:37Z

@michael81045 can you provide more context so we can see how our projects overlap, and precisely identify the issue?

What's your system info, what environment are you using?
Are you using LangSmith logging?

pedrojrv · 2024-09-04T14:19:53Z

Same issue on our side :O

lukasugar · 2024-09-04T14:35:12Z

@eyurtsev this seems to be an issue that a lot of folks are facing... Any new ideas for the fix? 🙏

lukasugar · 2024-09-04T14:35:18Z

@eyurtsev this seems to be an issue that a lot of folks are facing... Any new ideas for the fix? 🙏

eyurtsev · 2024-09-13T15:54:43Z

Hi apologies I was on vacation and then working on the 0.3 release for langchain. I'll check what's constraining uvicorn (probably sse-starlette) and unpin.

@michael81045 , @pedrojrv , @lukasugar I still haven't seen a confirmation of what's actually causing the memory leak. Based on what I diagnosed above it was happening because of user misconfiguration of langsmith. (i.e., enabling the tracer, not sampling of traces etc). For folks seeing problems, can you confirm that it's not from a misconfiguration of langsmith?

eyurtsev · 2024-09-13T17:58:47Z

langserve does not pin uvicorn directly, and based on sub-deps I don't see any uvicorn version pinning (e.g., from sse-starlette).

sse-starlette==1.8.2
├── anyio [required: Any, installed: 4.4.0]
│ ├── idna [required: >=2.8, installed: 3.8]
│ └── sniffio [required: >=1.1, installed: 1.3.1]
├── fastapi [required: Any, installed: 0.114.1]
│ ├── pydantic [required: >=1.7.4,<3.0.0,!=2.1.0,!=2.0.1,!=2.0.0,!=1.8.1,!=1.8, installed: 2.9.1]
│ │ ├── annotated-types [required: >=0.6.0, installed: 0.7.0]
│ │ ├── pydantic_core [required: ==2.23.3, installed: 2.23.3]
│ │ │ └── typing_extensions [required: >=4.6.0,!=4.7.0, installed: 4.12.2]
│ │ └── typing_extensions [required: >=4.6.1, installed: 4.12.2]
│ ├── starlette [required: >=0.37.2,<0.39.0, installed: 0.38.5]
│ │ └── anyio [required: >=3.4.0,<5, installed: 4.4.0]
│ │ ├── idna [required: >=2.8, installed: 3.8]
│ │ └── sniffio [required: >=1.1, installed: 1.3.1]
│ └── typing_extensions [required: >=4.8.0, installed: 4.12.2]
├── starlette [required: Any, installed: 0.38.5]
│ └── anyio [required: >=3.4.0,<5, installed: 4.4.0]
│ ├── idna [required: >=2.8, installed: 3.8]
│ └── sniffio [required: >=1.1, installed: 1.3.1]
└── uvicorn [required: Any, installed: 0.23.2]
├── click [required: >=7.0, installed: 8.1.7]
└── h11 [required: >=0.8, installed: 0.14.0]

I suggest using pipdeptree to determine what's pinning the uvicorn version

eyurtsev self-assigned this Jul 26, 2024

eyurtsev mentioned this issue Aug 2, 2024

Scaling to production -> OSError: [Errno 24] Too many open files socket.accept() out of system resource #714

Closed

lukasugar closed this as completed Sep 4, 2024

lukasugar reopened this Sep 4, 2024

eyurtsev closed this as completed Sep 13, 2024

eyurtsev reopened this Sep 13, 2024

Memory leak in LangServe #717

Memory leak in LangServe #717

Comments

lukasugar commented Jul 26, 2024 • edited Loading

eyurtsev commented Jul 26, 2024

lukasugar commented Jul 26, 2024

lukasugar commented Jul 27, 2024

eyurtsev commented Jul 27, 2024

eyurtsev commented Jul 27, 2024

lukasugar commented Jul 28, 2024

lukasugar commented Jul 29, 2024

lukasugar commented Jul 29, 2024

eyurtsev commented Jul 29, 2024 • edited Loading

eyurtsev commented Jul 29, 2024

lukasugar commented Jul 29, 2024

eyurtsev commented Jul 29, 2024 • edited Loading

lukasugar commented Jul 29, 2024

lukasugar commented Jul 29, 2024

eyurtsev commented Jul 29, 2024

lukasugar commented Jul 29, 2024

lukasugar commented Jul 29, 2024

eyurtsev commented Jul 29, 2024

lukasugar commented Jul 29, 2024

lukasugar commented Jul 29, 2024

lukasugar commented Jul 29, 2024

eyurtsev commented Jul 29, 2024

eyurtsev commented Jul 29, 2024

eyurtsev commented Jul 30, 2024 • edited Loading

System Information

Package Information

lukasugar commented Jul 30, 2024 • edited Loading

eyurtsev commented Aug 2, 2024

lukasugar commented Aug 2, 2024

lukasugar commented Aug 2, 2024

lukasugar commented Aug 2, 2024

lukasugar commented Aug 5, 2024

Omega-Centauri-21 commented Aug 6, 2024

siddicky commented Aug 17, 2024

michael81045 commented Aug 20, 2024

lukasugar commented Aug 21, 2024

pedrojrv commented Sep 4, 2024

lukasugar commented Sep 4, 2024

lukasugar commented Sep 4, 2024

eyurtsev commented Sep 13, 2024 • edited Loading

eyurtsev commented Sep 13, 2024 • edited Loading

lukasugar commented Jul 26, 2024 •

edited

Loading

eyurtsev commented Jul 29, 2024 •

edited

Loading

eyurtsev commented Jul 29, 2024 •

edited

Loading

eyurtsev commented Jul 30, 2024 •

edited

Loading

lukasugar commented Jul 30, 2024 •

edited

Loading

eyurtsev commented Sep 13, 2024 •

edited

Loading

eyurtsev commented Sep 13, 2024 •

edited

Loading