Agent Protocol is our attempt at codifying the framework-agnostic APIs needed to serve LLM agents in production. This document explains the purpose of the protocol and makes the case for each of the endpoints in the spec. We finish by listing some roadmap items for the future.
See the full OpenAPI docs here and the JSON spec here.
LangGraph Platform implements a superset of this protocol, but we very much welcome other implementations from the community.
What is the right API to serve an LLM application in production? We believe it’s centered around 3 important concepts:
- Runs: APIs for executing an agent
- Threads: APIs to organize multi-turn executions of agents
- Store: APIs to work with long-term memory
Let’s dive deeper into each one, starting with the requirements, and then presenting the Protocol endpoints that meet these requirements.
What do we need out of an API to execute an agent?
- Support the two paradigms for launching a run
- Fire-and-forget, i.e. launch a run in the background, but don’t wait for it to finish
- Waiting on a reply (blocking or polling), i.e. launch a run and wait for or stream its output
- Support CRUD for agent executions
- List and get runs
- Cancel and delete runs
- Flexible ways to consume output
- Get the final state
- Multiple types of streaming output, e.g. token-by-token, intermediate steps, etc.
- Ability to reconnect to the output stream if disconnected
- Handling edge cases
- Failures should be handled gracefully, and retried if desired
- Bursty traffic should be queued up
Base Endpoints:
GET /threads/{thread_id}/runs
- List runs.
POST /threads/{thread_id}/runs
- Create a run.
GET /threads/{thread_id}/runs/{run_id}
- Get a run and its status.
POST /threads/{thread_id}/runs/{run_id}/cancel
- Cancel a run. If the run hasn’t started, cancel it immediately; if it’s currently running, cancel it as soon as possible.
DELETE /threads/{thread_id}/runs/{run_id}
- Delete a finished run. A pending run needs to be cancelled first (see the previous endpoint).
GET /threads/{thread_id}/runs/{run_id}/wait
- Wait for a run to finish and return its final output. If the run has already finished, return its final output immediately.
GET /threads/{thread_id}/runs/{run_id}/stream
- Join the output stream of an existing run. Only output produced after this endpoint is called will be streamed.
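To make these endpoints concrete, here is a minimal sketch of the background ("fire and forget") paradigm in Python. The base URL, thread ID, and the request/response field names (input, run_id, status) are illustrative assumptions rather than part of the protocol text above; the OpenAPI spec is the authoritative reference.

```python
# Sketch of launching a run in the background and checking on it later.
# Base URL, thread_id, and body/field names are assumptions for illustration.
import requests

BASE = "http://localhost:8123"   # hypothetical server implementing the protocol
thread_id = "my-thread-id"       # assumed to already exist (see Threads below)

# Launch a run in the background; the server returns immediately.
run = requests.post(
    f"{BASE}/threads/{thread_id}/runs",
    json={"input": {"query": "hi"}},
).json()
run_id = run["run_id"]  # assumed response field

# Later: check the run's status, and cancel it if it is no longer needed.
status = requests.get(f"{BASE}/threads/{thread_id}/runs/{run_id}").json()["status"]
if status == "pending":  # assumed status value
    requests.post(f"{BASE}/threads/{thread_id}/runs/{run_id}/cancel")
```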
Convenience Endpoints:
POST /threads/{thread_id}/runs/wait
- Create a run, and wait for its final output.
POST /threads/{thread_id}/runs/stream
- Create a run, and stream its output as it is produced.
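And a sketch of the "wait for a reply" paradigm using the convenience endpoints, again with illustrative field names and assuming the stream is delivered over a chunked/SSE-style response:

```python
# Sketch of the blocking and streaming paradigms via the convenience endpoints.
# Field names and the wire format of the stream are assumptions for illustration.
import requests

BASE = "http://localhost:8123"   # hypothetical server
thread_id = "my-thread-id"

# Blocking: create a run and get its final output in one call.
final = requests.post(
    f"{BASE}/threads/{thread_id}/runs/wait",
    json={"input": {"query": "hi"}},
).json()

# Streaming: create a run and consume output (tokens, intermediate steps, ...)
# as it is produced.
with requests.post(
    f"{BASE}/threads/{thread_id}/runs/stream",
    json={"input": {"query": "hi"}},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line:
            print(line.decode())
```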
What APIs do you need to enable multi-turn interactions?
- Persistent state
- Get and update state
- Track history of past states of a thread, modelled as an append-only log of states
- Optimize storage by storing only diffs between states
- Concurrency controls
- Ensure that only one run per thread is active at a time
- Customizable handling of concurrent runs (interrupt, enqueue, or rollback)
- CRUD endpoints for threads
- List threads by user, or other metadata
- List threads by status (idle, interrupted, errored, finished)
- Copy or delete threads
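As a rough illustration of the concurrency controls in the list above, here is what creating a competing run with an explicit conflict-resolution strategy might look like. The multitask_strategy field and its values are hypothetical placeholders used for illustration, not a confirmed part of the spec.

```python
# Sketch of concurrency handling when a second run is created on a thread
# that already has an active run. The "multitask_strategy" field and its
# values are assumptions for illustration only.
import requests

BASE = "http://localhost:8123"   # hypothetical server
thread_id = "my-thread-id"

# Start a long-running run on the thread.
requests.post(
    f"{BASE}/threads/{thread_id}/runs",
    json={"input": {"query": "long task"}},
)

# A second run arrives while the first is still active. The caller chooses how
# the conflict is resolved, e.g. interrupt the active run, enqueue the new one,
# or roll the thread back.
requests.post(
    f"{BASE}/threads/{thread_id}/runs",
    json={"input": {"query": "follow-up"}, "multitask_strategy": "enqueue"},
)
```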
Endpoints:
POST /threads
- Create a thread.
POST /threads/search
- Search threads.
GET /threads/{thread_id}
- Get a thread.
GET /threads/{thread_id}/state
- Get the latest state of a thread.
POST /threads/{thread_id}/state
- Create a new revision of the thread’s state.
GET /threads/{thread_id}/history
- Browse past revisions of a thread’s state. Revisions are created by runs, or through the endpoint just above.
POST /threads/{thread_id}/copy
- Create an independent copy of a thread.
DELETE /threads/{thread_id}
- Delete a thread.
PATCH /threads/{thread_id}
- Update the metadata for a thread.
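Putting these together, a multi-turn interaction might look like the following sketch. The metadata keys, state shape, and search filters are illustrative assumptions, not schemas defined by the protocol text above.

```python
# Sketch of a multi-turn interaction built on the thread endpoints.
# Metadata keys, state shape, and search filters are assumptions for illustration.
import requests

BASE = "http://localhost:8123"   # hypothetical server

# Create a thread tagged with the user it belongs to.
thread = requests.post(
    f"{BASE}/threads",
    json={"metadata": {"user_id": "user-123"}},
).json()
thread_id = thread["thread_id"]  # assumed response field

# Each turn is a run on the same thread, so state persists between turns.
requests.post(f"{BASE}/threads/{thread_id}/runs/wait", json={"input": {"query": "first turn"}})
requests.post(f"{BASE}/threads/{thread_id}/runs/wait", json={"input": {"query": "second turn"}})

# Inspect the latest state and the append-only log of past revisions.
state = requests.get(f"{BASE}/threads/{thread_id}/state").json()
history = requests.get(f"{BASE}/threads/{thread_id}/history").json()

# Find all of this user's threads (e.g. to render a conversation list).
threads = requests.post(
    f"{BASE}/threads/search",
    json={"metadata": {"user_id": "user-123"}},
).json()
```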
What do you need out of a memory API for agents?
- Customizable memory scopes
- Storing memory against the user, thread, assistant, company, etc.
- Accessing memory from different scopes in the same run
- Flexible storage
- Support simple text memories, as well as structured data
- CRUD operations for memories (create, read, update, delete)
- Search and retrieval
- Get a single memory by namespace and key
- List memories filtered by namespace or contents, sorted by time, etc.
Endpoints:
PUT /store/items
- Create or update a memory item, at a given namespace and key.
DELETE /store/items
- Delete a memory item, at a given namespace and key.
GET /store/items
- Get a memory item, at a given namespace and key.
POST /store/items/search
- Search memory items.
POST /store/namespaces
- List namespaces.
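Here is a sketch of how an agent might read and write long-term memory through these endpoints. The namespace layout, item shape, and query-parameter encoding are illustrative assumptions; consult the OpenAPI spec for the actual schemas.

```python
# Sketch of long-term memory reads and writes via the Store endpoints.
# Namespace layout, item shape, and query-parameter encoding are assumptions.
import requests

BASE = "http://localhost:8123"   # hypothetical server

# Write a memory scoped to a particular user (namespace as a path-like list).
requests.put(f"{BASE}/store/items", json={
    "namespace": ["users", "user-123"],
    "key": "preferences",
    "value": {"tone": "formal", "language": "en"},
})

# Read it back by namespace and key (how the namespace is encoded in query
# parameters is an assumption here).
item = requests.get(
    f"{BASE}/store/items",
    params={"namespace": "users.user-123", "key": "preferences"},
).json()

# List memories under a namespace prefix.
results = requests.post(f"{BASE}/store/items/search", json={
    "namespace_prefix": ["users", "user-123"],
    "limit": 10,
}).json()
```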
Roadmap:
- Add a Store endpoint to perform a vector search over memory entries
- Add a param to POST /threads/{thread_id}/runs/{run_id}/stream to replay events since a given event-id before streaming new events
- Add a param to POST /threads/{thread_id}/runs to optionally allow concurrent runs on the same thread (the current spec forbids this)
- (Open an issue and let us know what else should be here!)