Agent Protocol codifies the essential APIs for managing LLM agents in production. Designed around atomic executions, multi-turn interactions, and long-term memory, it streamlines API design for agent-based applications. Explore the full specification and contribute to its community-driven development.
Agent Protocol is an initiative to define a framework-agnostic API for deploying LLM (Large Language Model) agents in production environments. This documentation explains the core purpose of the protocol and the rationale for each of its endpoints. It also outlines our roadmap for future development and invites community contributions of alternative implementations beyond our own.
For a comprehensive view of the API, see the full OpenAPI documentation and the accompanying JSON specification.
Value Proposition of Agent Protocol
The primary question Agent Protocol tackles is: what is the optimal API for serving an LLM application in a production setting? We propose that the answer centers on three pivotal concepts:
- Runs: APIs dedicated to executing agents.
- Threads: APIs that structure multi-turn executions of agents.
- Store: APIs designed for managing long-term memory.
Features
Runs: Atomic Agent Executions
Our API supports two paradigms for launching agent runs:
- Fire and Forget: Initiate a run in the background without waiting for completion.
- Waiting on a Reply: Launch a run and either block until it finishes or poll for its output.
The API also includes CRUD functionality for agent executions, encompassing:
- Listing and retrieving runs.
- Canceling and deleting runs.
Flexible output consumption methods are also provided, including:
- Accessing the final state of runs.
- Multiple streaming output options, such as token-by-token streams or intermediate steps.
- Reconnecting to output streams in case of disconnections.
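To make the streaming options concrete, here is a minimal parser for Server-Sent Events, one common wire format for token-by-token and intermediate-step streams. The SSE framing is an assumption for illustration, not something mandated by the text above:

```python
def parse_sse(lines):
    """Minimal Server-Sent Events parser for a streamed run output.
    Takes an iterable of raw lines and yields (event, data) pairs;
    a blank line delimits each event."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # End of one event: emit it and reset for the next.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:
        yield event, "\n".join(data)
```

Because the parser only consumes an iterable of lines, the same logic works whether the lines come from a live HTTP response or from a stream that was reconnected after a disconnection.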
Key Endpoints Include:
- List Runs:
GET /threads/{thread_id}/runs
- Create a Run:
POST /threads/{thread_id}/runs
- Get a Run's Status:
GET /threads/{thread_id}/runs/{run_id}
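The "waiting on a reply" paradigm can be sketched as a polling loop over the run-status endpoint. The status names and the injected `get_status` callable are assumptions for illustration; a real client would back it with an HTTP GET to `/threads/{thread_id}/runs/{run_id}`:

```python
import time

def wait_for_run(get_status, thread_id, run_id, interval=1.0, timeout=60.0):
    """Poll a run's status until it leaves a pending state.
    `get_status(thread_id, run_id)` is any callable returning the run's
    status string, so the polling logic stays transport-agnostic."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(thread_id, run_id)
        if status not in ("pending", "running"):
            # Terminal state reached (e.g. success, error, cancelled).
            return status
        time.sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish within {timeout}s")
```

Fire-and-forget is the degenerate case: issue the create-run request and simply never poll.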
Threads: Multi-Turn Interactions
To facilitate multi-turn interactions, we offer:
- Persistent State Management: Get and modify the state of threads, track historical changes, and optimize storage through differential storage techniques.
- Concurrency Controls: Ensure that only one run per thread is active, alongside options for handling concurrent executions.
Endpoints Include:
- Create a Thread:
POST /threads
- Get Latest Thread State:
GET /threads/{thread_id}/state
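The concurrency control above — only one active run per thread — can be sketched as a per-thread gate on the server side. The rejection strategy shown (refuse the second run rather than queue it) is one of several possible policies and is an assumption, not the only behavior the protocol allows:

```python
import threading
from collections import defaultdict

class ThreadRunGate:
    """Sketch of the one-active-run-per-thread rule: a run may start on a
    thread only if no other run on that thread is still in flight."""
    def __init__(self):
        self._locks = defaultdict(threading.Lock)

    def try_start(self, thread_id: str) -> bool:
        # Non-blocking acquire mirrors a "reject concurrent run" strategy;
        # returns False if a run is already active on this thread.
        return self._locks[thread_id].acquire(blocking=False)

    def finish(self, thread_id: str) -> None:
        # Called when the active run reaches a terminal state.
        self._locks[thread_id].release()
```

Alternative policies (interrupt the running execution, or enqueue the new run) would replace the non-blocking acquire with different bookkeeping.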
Store: Long-Term Memory
Our memory API provides:
- Customizable Memory Scopes: Store and retrieve memories scoped to users, threads, assistants, and organizations.
- Flexible Storage Options: Support both simple and structured memory data with full CRUD operations.
- Efficient Search Capabilities: Retrieve memories using filters over their attributes.
Endpoint Overview:
- Create or Update Memory Item:
PUT /store/items
- Delete Memory Item:
DELETE /store/items
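The scoped-memory model can be sketched as a key-value store whose keys combine a namespace (the memory scope, e.g. a user or thread) with an item key. The namespace-tuple convention and the filter-based search below are illustrative assumptions, not details fixed by the text above:

```python
class MemoryStore:
    """In-memory sketch of the Store API: items are keyed by a namespace
    tuple (e.g. ("users", user_id)) plus a key, mirroring memory scoped
    to users, threads, assistants, or organizations."""
    def __init__(self):
        self._items = {}

    def put(self, namespace, key, value):
        # Upsert, as in PUT /store/items.
        self._items[(namespace, key)] = value

    def get(self, namespace, key):
        return self._items.get((namespace, key))

    def delete(self, namespace, key):
        # Idempotent delete, as in DELETE /store/items.
        self._items.pop((namespace, key), None)

    def search(self, namespace, **filters):
        """Return items in a namespace whose fields match every filter."""
        return [v for (ns, _), v in self._items.items()
                if ns == namespace
                and all(v.get(f) == want for f, want in filters.items())]
```

Keying on a namespace tuple keeps each scope's memories isolated while letting one store serve users, threads, assistants, and organizations uniformly.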
Roadmap
We continuously seek to enhance Agent Protocol. Roadmap items include:
- Implementing a Store endpoint for vector searches over memory entries.
- Adding functionality to
POST /threads/{thread_id}/runs/{run_id}/stream
for event replay capabilities.
- Allowing concurrent runs on the same thread through API parameter adjustments.
- Incorporating ideas and requirements contributed through community engagement.
By standardizing how LLM agents are served in production, Agent Protocol aims to make API usage in AI-driven applications more robust, portable, and efficient.