From ca371459da7842bbff5fc07598d1beb6f5e0ff3b Mon Sep 17 00:00:00 2001 From: Mahmoud Mabrouk Date: Thu, 18 Jun 2026 17:11:28 +0200 Subject: [PATCH 1/2] docs(agent): add agent protocol RFC (POST /messages, sessions, SSE streaming) Specify a new POST /messages endpoint for the chat agent, sitting next to the unchanged workflow /invoke. Covers an optional session_id in the request/response body with scoped resolution rules, a Server-Sent Events response in the Vercel AI UI Message Stream format, UIMessage inputs, and a load-session endpoint. Written IETF-style so it can be built against directly. --- .../agent-workflows/agent-protocol-rfc.md | 526 ++++++++++++++++++ 1 file changed, 526 insertions(+) create mode 100644 docs/design/agent-workflows/agent-protocol-rfc.md diff --git a/docs/design/agent-workflows/agent-protocol-rfc.md b/docs/design/agent-workflows/agent-protocol-rfc.md new file mode 100644 index 0000000000..912f882b8d --- /dev/null +++ b/docs/design/agent-workflows/agent-protocol-rfc.md @@ -0,0 +1,526 @@ +# RFC: Agenta Agent Protocol (`POST /messages`, Sessions and Streaming) + +| | | +| --- | --- | +| **Status** | Draft | +| **Version** | 0.1 | +| **Layer** | Frontend to backend, over HTTP/1.1 | +| **Defines** | `POST /messages`, `POST /load-session` | +| **Reuses** | The workflow response envelope (`WorkflowServiceResponse`) and revision resolution (`references`) | +| **Companion** | [streaming-and-sessions.md](streaming-and-sessions.md) (design rationale and trade-offs) | + +## Abstract + +This document specifies the wire protocol between an Agenta client (typically a browser +running the Vercel AI SDK `useChat` hook) and the Agenta backend for running an **agent** +workflow. It defines a new endpoint, `POST /messages`, for stateful, streaming chat. The +endpoint carries a session identifier in the request and response bodies, offers two response +modes (a single JSON response and a Server-Sent Events stream in the Vercel UI Message Stream +format), and takes the agent's inputs as a conversation (`messages`) plus named input +variables (`inputs`). A second endpoint, `POST /load-session`, returns a conversation's +history. + +`/messages` is a sibling of the existing workflow `/invoke`, not a change to it. The generic, +stateless `/invoke` is untouched. `/messages` exists because the chat contract differs: the +conversation is a first-class top-level member in the Vercel `UIMessage` shape, the response +can stream, and a turn belongs to a session. + +## 1. Conventions and terminology + +The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, +**SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this document are to be +interpreted as described in RFC 2119 and RFC 8174 when, and only when, they appear in all +capitals. + +JSON is defined in RFC 8259. Server-Sent Events (SSE) follow the WHATWG HTML `text/event- +stream` definition. All request and response bodies are UTF-8 encoded JSON unless a streaming +content type is negotiated. + +| Term | Definition | +| --- | --- | +| **Agent** | A workflow that runs a multi-step loop (model, tool, model, ...) and emits a stream of events before producing a final answer. | +| **Turn** | One request to `/messages`. A turn supplies new input and produces one assistant response (streamed or whole). | +| **Session** | A server-named conversation that groups turns. Identified by a `session_id`. | +| **`session_id`** | An opaque string that identifies a session within a project. Carried in the request and response bodies. | +| **`UIMessage`** | A message in Vercel AI SDK v5/v6 form: `{ id, role, parts[] }`. See Appendix B. | +| **Part** | One element of the UI Message Stream (for example `text-delta`, `tool-input-available`). See Section 6.2. | +| **`inputs`** | The agent's inputs for a turn: the conversation `messages` plus named input variables. See Section 5. | +| **Streaming edge** | The backend component that encodes the agent's internal `AgentEvent` stream into the UI Message Stream. | + +## 2. Protocol overview + +The protocol defines two endpoints: + +| Method | Path | Purpose | +| --- | --- | --- | +| `POST` | `/messages` | Run one agent turn. Returns one JSON response or an SSE stream, by content negotiation. | +| `POST` | `/load-session` | Return the history of a session. | + +A turn carries an OPTIONAL `session_id`. The server resolves it per Section 4. A turn's +response mode is selected by the `Accept` request header per Section 6. + +The agent's input for a turn is the conversation `data.messages` in `UIMessage` form, plus the +named input variables in `data.inputs`. The agent configuration travels as on `/invoke`, +either inline in `data.parameters` or resolved from `references` (Section 5). + +``` + ┌─────────────────────────── client (useChat) ───────────────────────────┐ + │ │ + POST /messages (Accept: text/event-stream) POST /load-session │ + │ │ │ + ▼ ▼ │ + ┌──────────────────┐ AgentEvent stream ┌───────────────────────────────────┐ │ + │ agent run │ ─────────────────────▶│ streaming edge → UI Message Stream │──┘ + │ (harness loop) │ └───────────────────────────────────┘ + └──────────────────┘ persists per turn + │ │ + └──────────────── trace store (ag.session.id) ◀───────┘ load-session reads here +``` + +## 3. Relationship to `/invoke` + +`/messages` is a new endpoint. It does not change `/invoke`. The generic, stateless workflow +invoke keeps its exact request and response, and a client that does not run a chat agent never +touches `/messages`. + +`/messages` reuses two things from the workflow contract so the backend does not fork: the +response envelope (`WorkflowServiceResponse`, with the answer in `data.outputs`) and revision +resolution (`references`). It diverges from `/invoke` in three ways, which is why it is its own +endpoint: + +1. The conversation is a first-class member, `data.messages`, in the `UIMessage` shape, rather + than nested in `data.inputs.messages` as `{role, content}`. +2. The response can stream as a UI Message Stream (Section 6.2). +3. A turn belongs to a session (`session_id`, Section 4). + +A server **SHOULD** map a `/messages` request onto the same internal agent invocation that +`/invoke` uses, after lifting `data.messages` and `data.inputs` into the handler's `messages` +and `inputs` arguments. + +## 4. Session model + +### 4.1 Identity + +A `session_id` is an opaque string scoped to a project. The pair `(project_id, session_id)` +**MUST** be unique. A bare `session_id` is not a global identifier. + +A client-supplied `session_id`: + +- **MUST** be treated as an opaque token. A server **MUST NOT** interpolate it into a storage + path, a query, or a trace attribute without escaping. +- **SHOULD** be constrained by the server to a bounded length and a restricted character set. + A server **MAY** reject an id outside those bounds with `400 Bad Request`. + +### 4.2 Resolution + +On receiving a turn, the server resolves the session as follows: + +1. If the request omits `session_id`, the server **MUST** mint a new unique id, associate the + turn with it, and return that id (Section 6). +2. If the request supplies a `session_id` that does not exist for the caller's project, the + server **MUST** create a session with that id and associate the turn with it. +3. If the request supplies a `session_id` that exists for the caller's project, the server + **MUST** associate the turn with that existing session. +4. If the request supplies a `session_id` that exists under a **different** project, the + server **MUST NOT** resume it. The server **MUST** treat it as case 2 within the caller's + own project, or reject the turn. A server **MUST NOT** disclose the existence of a session + the caller does not own. + +Rule 4 is the ownership boundary. "Resume if it exists" means "resume if it exists and +belongs to the caller." + +### 4.3 Continuation semantics for this version + +In this version, associating a turn with a session records the turn under that session for +tracing and later retrieval. The conversation context the model sees is supplied by the +`messages` in the request (Section 5.2), not reconstructed from the server's record. + +A future version MAY make the server's record authoritative, at which point a turn carries +only the new message and the server supplies the prior history. The request field is +unchanged by that evolution. See [streaming-and-sessions.md](streaming-and-sessions.md). + +### 4.4 Concurrency + +Two turns that create the same new `(project_id, session_id)` concurrently **MUST** resolve +to a single session. A server **SHOULD** enforce this with a unique constraint and treat the +losing creation as a resume (case 3). + +## 5. Request format (`POST /messages`) + +### 5.1 Envelope + +```jsonc +{ + "session_id": "sess_123", // OPTIONAL (Section 4) + "references": { ... }, // OPTIONAL: selects the workflow revision (as on /invoke) + "data": { + "messages": [ /* UIMessage[] */ ], // REQUIRED: the conversation (Section 5.2) + "inputs": { "": }, // OPTIONAL: named input variables (Section 5.3) + "parameters": { /* agent config */ } // OPTIONAL (Section 5.4) + } +} +``` + +`session_id` sits at the envelope top level, alongside the existing `trace_id` and `span_id`. +It **MUST NOT** be required in a request header. + +`data.messages`, `data.inputs`, and `data.parameters` are siblings. They map onto the agent +handler's `messages`, `inputs`, and `parameters` arguments. On `/invoke` the conversation is +nested at `data.inputs.messages`; on `/messages` it is lifted out to `data.messages`, because +the conversation is the primary input of this endpoint. + +### 5.2 `data.messages` + +`data.messages` is the conversation as an array of `UIMessage` objects (Appendix B). It is +REQUIRED. The last element is the new user turn. + +In this version the client **MUST** send the full conversation in `data.messages`. Each +element uses the parts-based `UIMessage` shape (Appendix B), not the `{role, content}` shape +of `/invoke`. + +### 5.3 `data.inputs` + +`data.inputs` carries the agent's named input variables for the turn: the workflow's declared +inputs and any per-turn context the caller supplies (for example a retrieved document or a +record id). Keys are input names; values are arbitrary JSON. This is the same `inputs` as the +workflow contract, with the conversation no longer nested inside it. + +`data.inputs` is OPTIONAL and MAY be sent on every turn, since its values can change between +turns. + +### 5.4 `data.parameters` and `references` + +The agent configuration (instructions, model, tools, harness, sandbox, permission policy) +travels as on `/invoke`: inline in `data.parameters.agent`, or resolved by the platform from +`references` when the request targets a stored revision. This protocol does not change that +resolution. + +### 5.5 Content negotiation + +The response mode is selected by the `Accept` request header: + +| `Accept` | Response | +| --- | --- | +| `application/json` (or absent) | Single JSON response (Section 6.1) | +| `text/event-stream` | UI Message Stream over SSE (Section 6.2) | + +A server that cannot satisfy the `Accept` header **MUST** respond `406 Not Acceptable`. + +## 6. Response formats + +### 6.1 Single JSON response + +For `Accept: application/json`, the server returns `200 OK` with a body extending +`WorkflowServiceResponse`: + +```jsonc +{ + "trace_id": "...", + "span_id": "...", + "session_id": "sess_123", // the resolved id (minted or echoed) + "status": { "code": 200 }, + "data": { "outputs": { "role": "assistant", "content": "Berlin." } } +} +``` + +The response **MUST** include `session_id`, set to the resolved session (Section 4). The +assistant answer rides in `data.outputs` as today. Token usage is not in the body; it is +recorded on the trace. + +### 6.2 UI Message Stream (SSE) + +For `Accept: text/event-stream`, the server returns `200 OK` and streams the run in the +Vercel UI Message Stream format (AI SDK v5/v6). + +#### 6.2.1 Response headers + +The response **MUST** set: + +``` +content-type: text/event-stream +x-vercel-ai-ui-message-stream: v1 +``` + +and **SHOULD** set: + +``` +cache-control: no-cache +connection: keep-alive +x-accel-buffering: no +``` + +`x-accel-buffering: no` disables proxy buffering so parts flush immediately. + +#### 6.2.2 Framing + +Each part is one SSE event: the literal bytes `data: `, followed by the part as compact JSON +(no insignificant whitespace), followed by `\n\n`. + +``` +data: {"type":"text-delta","id":"t1","delta":"Hello"}\n\n +``` + +The stream **MUST** terminate with the literal line `data: [DONE]\n\n`. + +#### 6.2.3 Part registry + +The parts a server emits, with their REQUIRED fields. Fields not listed are OPTIONAL and MAY +be omitted. + +| `type` | Required fields | Meaning | +| --- | --- | --- | +| `start` | none | Begin a message. Carries `messageId` and `messageMetadata` (Section 6.2.4). | +| `start-step` | none | Begin a step of the agent loop. | +| `finish-step` | none | End the current step. | +| `finish` | none | End the message. Carries `finishReason`, `messageMetadata`. | +| `text-start` | `id` | Begin a text block. | +| `text-delta` | `id`, `delta` | Append `delta` to the text block `id`. | +| `text-end` | `id` | End the text block. | +| `reasoning-start` | `id` | Begin a reasoning block. | +| `reasoning-delta` | `id`, `delta` | Append to the reasoning block. | +| `reasoning-end` | `id` | End the reasoning block. | +| `tool-input-start` | `toolCallId`, `toolName` | A tool call begins. | +| `tool-input-delta` | `toolCallId`, `inputTextDelta` | Append a fragment of the tool arguments (note: `inputTextDelta`, not `delta`). | +| `tool-input-available` | `toolCallId`, `toolName`, `input` | The full tool arguments are known. | +| `tool-output-available` | `toolCallId`, `output` | The tool result. | +| `tool-output-error` | `toolCallId`, `errorText` | The tool failed. | +| `file` | `url`, `mediaType` | A file or image. `url` MAY be an `https:` or `data:` URL. | +| `data-` | `data` | An application-defined part (generative UI). MAY carry `id` and `transient`. | +| `error` | `errorText` | A stream-level error (Section 8.2). | + +A server **MUST** order parts so that for any `id` or `toolCallId`, a `*-start` precedes its +deltas, which precede its `*-end` or `*-available`. Text and reasoning deltas are +concatenated by `id`. Tool parts are keyed by `toolCallId`. + +#### 6.2.4 Session id in the stream + +The server **MUST** convey the resolved `session_id` as `messageMetadata.sessionId` on the +`start` part, which is the first part of the stream: + +``` +data: {"type":"start","messageId":"msg_1","messageMetadata":{"sessionId":"sess_123"}} +``` + +A server **MAY** additionally mirror `session_id` to a response header. The body remains the +normative source. + +#### 6.2.5 Mapping from agent events + +The streaming edge consumes the agent's internal `AgentEvent` stream +(`services/agent/src/protocol.ts:74`) and emits parts as follows: + +| `AgentEvent` | Parts | +| --- | --- | +| run start (synthesized) | `start` (with `messageId`, `messageMetadata.sessionId`), then `start-step` | +| `message` | `text-start`, one or more `text-delta`, `text-end` | +| `thought` | `reasoning-start`, `reasoning-delta`, `reasoning-end` | +| `tool_call` | `tool-input-start`, then `tool-input-available` | +| `tool_result` with `isError=false` | `tool-output-available` | +| `tool_result` with `isError=true` | `tool-output-error` | +| `usage` | `messageMetadata` on the `finish` part | +| `error` | `error` (Section 8.2) | +| `done` | `finish-step`, then `finish` (`finishReason` = `stopReason`), then `[DONE]` | + +A harness that reports `capabilities.streamingDeltas` produces token-level `text-delta` +parts. A harness that does not produces one `text-delta` carrying the whole text. The wire +shape is identical, so the client does not distinguish them. + +The protocol streams deltas only. There is no full-message snapshot part. The client +assembles the final `UIMessage` from the parts. The server **SHOULD** record the assembled +turn on the trace (`ag.session.id`), which is the source `load-session` reads. + +## 7. The `load-session` endpoint (`POST /load-session`) + +Returns the history of a session so a client can rebuild a conversation it does not hold +locally. + +### 7.1 Request + +```jsonc +{ "session_id": "sess_123" } +``` + +`session_id` is REQUIRED. The server **MUST** apply the ownership rule of Section 4.2: if the +session does not exist for the caller's project, the server **MUST** respond `404 Not Found` +and **MUST NOT** reveal a session owned by another project. + +### 7.2 Response (default, `Accept: application/json`) + +The server returns `200 OK` with the conversation as `UIMessage` objects, the shape `useChat` +accepts as its initial `messages`: + +```jsonc +{ + "session_id": "sess_123", + "messages": [ + { "id": "m1", "role": "user", "parts": [ { "type": "text", "text": "capital of France?" } ] }, + { "id": "m2", "role": "assistant", "parts": [ { "type": "text", "text": "Paris." } ] } + ] +} +``` + +### 7.3 Response (negotiated replay, `Accept: text/event-stream`) + +A server **MAY** support a delta replay of the stored history under +`Accept: text/event-stream`, re-emitting the session as a UI Message Stream (Section 6.2). +This is OPTIONAL. Whether the folded form or the replay is the primary form is left open by +this draft; a conformant client **SHOULD** request `application/json` for rebuilding a static +view. + +## 8. Error handling + +### 8.1 Request and endpoint errors (JSON) + +Before a stream begins, the server reports errors with an HTTP status and the existing +`status` envelope (`WorkflowServiceStatus`: `code`, `message`, `type`, `stacktrace`): + +| Status | Condition | +| --- | --- | +| `400 Bad Request` | Malformed body, or a `session_id` that violates Section 4.1. | +| `401 Unauthorized` / `403 Forbidden` | Missing or invalid credentials. | +| `404 Not Found` | `load-session` on a session the caller does not own. | +| `406 Not Acceptable` | The `Accept` header cannot be satisfied. | +| `5xx` | Server failure before streaming starts. | + +### 8.2 In-stream errors + +A failure after the stream has started **MUST** be reported as an `error` part: + +``` +data: {"type":"error","errorText":"the agent run failed: ..."} +``` + +After emitting an `error` part, the server **SHOULD** terminate the stream. It **MAY** omit +the `finish` part. It **SHOULD** still emit `[DONE]` to close the SSE channel cleanly. The +client surfaces the error to the user. + +## 9. Security considerations + +- **Session ownership.** Section 4.2 rule 4 is a security requirement, not a convenience. + Because a client may supply a `session_id` for an unknown id (case 2), a server that keys + sessions on `session_id` alone would let a caller read or extend another tenant's + conversation. Servers **MUST** key on `(project_id, session_id)` and scope every resume, + every `load-session`, and every existence check to the caller's project. +- **Opaque ids.** A client-supplied `session_id` is untrusted input. See Section 4.1. +- **Secrets.** Provider keys and tool credentials travel and resolve as in the current + contract. This protocol adds no new secret-bearing field. `inputs` is caller-supplied + input and **MUST NOT** be used to smuggle credentials in place of the existing `secrets` + and signed-credential mechanisms. +- **Content negotiation and buffering.** A streaming response disables proxy buffering + (Section 6.2.1). Operators **MUST** ensure intermediaries do not re-buffer `text/event- + stream` responses, or streaming degrades to a single delayed flush. + +## 10. Interaction sequences + +### 10.1 New session, streaming turn + +``` +client server + │ POST /messages │ + │ Accept: text/event-stream │ + │ { data:{ messages:[...] } } │ (no session_id) + │───────────────────────────────────────▶│ + │ │ mint sess_123 + │ 200 text/event-stream │ + │ data: {"type":"start", │ + │ "messageMetadata": │ + │ {"sessionId":"sess_123"}} │ + │◀───────────────────────────────────────│ + │ data: {"type":"start-step"} ... │ + │ ... tool / text parts ... │ + │ data: {"type":"finish"} │ + │ data: [DONE] │ + │◀───────────────────────────────────────│ + │ (client stores sess_123 for next turn) │ +``` + +### 10.2 Returning to a known session + +``` +client server + │ POST /load-session │ + │ { "session_id": "sess_123" } │ + │───────────────────────────────────────▶│ check ownership + │ 200 { messages: [ UIMessage, ... ] } │ + │◀───────────────────────────────────────│ + │ (render history; hold it) │ + │ │ + │ POST /messages │ + │ Accept: text/event-stream │ + │ { session_id:"sess_123", │ + │ data:{ messages:[...full] } } │ + │───────────────────────────────────────▶│ resolve existing sess_123 + │ 200 text/event-stream → parts → [DONE] │ + │◀───────────────────────────────────────│ +``` + +## Appendix A: Full stream transcript + +One turn: the agent calls a weather tool, reads the result, and answers. Every `data:` line +in order, each followed by a blank line. + +``` +data: {"type":"start","messageId":"msg_1","messageMetadata":{"sessionId":"sess_123"}} + +data: {"type":"start-step"} + +data: {"type":"tool-input-start","toolCallId":"call_1","toolName":"getWeather"} + +data: {"type":"tool-input-available","toolCallId":"call_1","toolName":"getWeather","input":{"city":"Paris"}} + +data: {"type":"tool-output-available","toolCallId":"call_1","output":{"weather":"sunny","temp":24}} + +data: {"type":"finish-step"} + +data: {"type":"start-step"} + +data: {"type":"text-start","id":"t1"} + +data: {"type":"text-delta","id":"t1","delta":"It is sunny "} + +data: {"type":"text-delta","id":"t1","delta":"and 24°C in Paris."} + +data: {"type":"text-end","id":"t1"} + +data: {"type":"finish-step"} + +data: {"type":"finish","messageMetadata":{"usage":{"input":820,"output":36,"cost":0.004}}} + +data: [DONE] +``` + +## Appendix B: `UIMessage` schema + +A message accumulated by the client and accepted by `load-session`: + +```jsonc +{ + "id": "m2", + "role": "user | assistant | system", + "parts": [ + { "type": "text", "text": "..." }, + { "type": "reasoning", "text": "..." }, + { "type": "tool-", "toolCallId": "...", "state": "output-available", "input": {}, "output": {} }, + { "type": "file", "url": "...", "mediaType": "image/png" }, + { "type": "data-", "data": { } }, + { "type": "step-start" } + ], + "metadata": { } +} +``` + +A `UIMessage` carries no top-level `content` string in v5/v6. All content lives in `parts`. + +## Appendix C: References + +- RFC 2119, RFC 8174: requirement keywords. +- RFC 8259: JSON. +- WHATWG HTML, Server-Sent Events: `text/event-stream`. +- Vercel AI SDK UI Message Stream (v5/v6): https://ai-sdk.dev, and the chunk schema at + https://github.com/vercel/ai/blob/main/packages/ai/src/ui-message-stream/ui-message-chunks.ts +- Current contract: `sdks/python/agenta/sdk/models/workflows.py`, + `sdks/python/agenta/sdk/decorators/routing.py` (Accept negotiation at `:236`). +- Agent events and session id: `services/agent/src/protocol.ts:74`, + `services/oss/src/harness/ports.py`, `services/oss/src/agent/app.py`. +- Design rationale and trade-offs: [streaming-and-sessions.md](streaming-and-sessions.md). +``` From dd8ba2a0757e09cb03248ba05be008034414f7cc Mon Sep 17 00:00:00 2001 From: Arda Erzin Date: Thu, 18 Jun 2026 21:16:42 +0200 Subject: [PATCH 2/2] docs(agent): add tool approval + trace id to the agent protocol RFC Both are additive and already proven end-to-end by the agent-chat slice (this PR's companion); writing them into the spec: - 6.2.3: tool-approval-request + tool-output-denied parts (+ preliminary flag on tool-output-available) - 6.2.4: traceId on finish messageMetadata so the client can open the trace - 6.2.5: AgentEvent-map rows for the before-tool-hook approval pause + traceId - 6.2.6: new 'Tool approval (human-in-the-loop)' round-trip section; decision rides in data.messages (no new request field); marked an OPTIONAL extension --- .../agent-workflows/agent-protocol-rfc.md | 44 +++++++++++++++++-- 1 file changed, 41 insertions(+), 3 deletions(-) diff --git a/docs/design/agent-workflows/agent-protocol-rfc.md b/docs/design/agent-workflows/agent-protocol-rfc.md index 912f882b8d..ca6369e8f4 100644 --- a/docs/design/agent-workflows/agent-protocol-rfc.md +++ b/docs/design/agent-workflows/agent-protocol-rfc.md @@ -282,8 +282,10 @@ be omitted. | `tool-input-start` | `toolCallId`, `toolName` | A tool call begins. | | `tool-input-delta` | `toolCallId`, `inputTextDelta` | Append a fragment of the tool arguments (note: `inputTextDelta`, not `delta`). | | `tool-input-available` | `toolCallId`, `toolName`, `input` | The full tool arguments are known. | -| `tool-output-available` | `toolCallId`, `output` | The tool result. | +| `tool-output-available` | `toolCallId`, `output` | The tool result. MAY carry `preliminary` for streamed/partial results. | | `tool-output-error` | `toolCallId`, `errorText` | The tool failed. | +| `tool-approval-request` | `approvalId`, `toolCallId` | A tool call pauses for human approval (Section 6.2.6). | +| `tool-output-denied` | `toolCallId` | The user denied a tool; it was not executed (Section 6.2.6). | | `file` | `url`, `mediaType` | A file or image. `url` MAY be an `https:` or `data:` URL. | | `data-` | `data` | An application-defined part (generative UI). MAY carry `id` and `transient`. | | `error` | `errorText` | A stream-level error (Section 8.2). | @@ -292,7 +294,7 @@ A server **MUST** order parts so that for any `id` or `toolCallId`, a `*-start` deltas, which precede its `*-end` or `*-available`. Text and reasoning deltas are concatenated by `id`. Tool parts are keyed by `toolCallId`. -#### 6.2.4 Session id in the stream +#### 6.2.4 Session id and trace id in the stream The server **MUST** convey the resolved `session_id` as `messageMetadata.sessionId` on the `start` part, which is the first part of the stream: @@ -304,6 +306,17 @@ data: {"type":"start","messageId":"msg_1","messageMetadata":{"sessionId":"sess_1 A server **MAY** additionally mirror `session_id` to a response header. The body remains the normative source. +The server **SHOULD** also convey the run's **trace id** as `messageMetadata.traceId`, so a +client can open the trace for the turn (the assembled turn is recorded under `ag.session.id`, +Section 6.2.5). It rides `messageMetadata` on the `finish` part, where `usage` already travels: + +``` +data: {"type":"finish","messageMetadata":{"traceId":"<32-hex>","usage":{...}}} +``` + +A server **MAY** instead carry it on `start`. The trace id is the OTel trace id of the run +(the `trace_id` of the JSON response in Section 6.1); the client treats it as opaque. + #### 6.2.5 Mapping from agent events The streaming edge consumes the agent's internal `AgentEvent` stream @@ -317,7 +330,8 @@ The streaming edge consumes the agent's internal `AgentEvent` stream | `tool_call` | `tool-input-start`, then `tool-input-available` | | `tool_result` with `isError=false` | `tool-output-available` | | `tool_result` with `isError=true` | `tool-output-error` | -| `usage` | `messageMetadata` on the `finish` part | +| tool gated by the before-tool hook | `tool-input-start`/`-available`, then `tool-approval-request` (pause); on resume `tool-output-available` or `tool-output-denied` (Section 6.2.6) | +| `usage` | `messageMetadata` (`usage`) on the `finish` part; the run `traceId` rides there too (Section 6.2.4) | | `error` | `error` (Section 8.2) | | `done` | `finish-step`, then `finish` (`finishReason` = `stopReason`), then `[DONE]` | @@ -329,6 +343,30 @@ The protocol streams deltas only. There is no full-message snapshot part. The cl assembles the final `UIMessage` from the parts. The server **SHOULD** record the assembled turn on the trace (`ag.session.id`), which is the source `load-session` reads. +#### 6.2.6 Tool approval (human-in-the-loop) + +A tool MAY require human approval before it runs — the harness gates it at its before-tool +hook (`capabilities.permissions`). The round-trip uses the parts in Section 6.2.3 and the +re-sent conversation in Section 5.2; it adds no new request field. + +1. The server emits `tool-input-start` then `tool-input-available` for the call, then + `tool-approval-request { approvalId, toolCallId }`, and **ends the turn** (`finish`, + `[DONE]`) with that tool awaiting a decision. It **MUST NOT** emit a `tool-output-*` part + for that call in this turn. +2. The client records the user's decision on that tool part and re-sends the conversation + (`data.messages`, Section 5.2). Because the decision lives in the assistant message's tool + part (`UIMessage` form), the server reads it from the history — **no new request field is + required**. +3. On the resumed turn the server, for each decided call, either runs the tool (approved) and + emits `tool-output-available`, or skips it (denied) and emits `tool-output-denied + { toolCallId }`. The resumed output lands on the **same** logical tool call (matched by + `toolCallId`); the turn then continues normally. + +A server that never gates tools simply never emits `tool-approval-request` / +`tool-output-denied`. A client that does not render approvals MAY treat a turn that ends with +an outstanding `tool-approval-request` as complete. Approval is an extension to the base part +flow; a profile of this protocol MAY mark it OPTIONAL. + ## 7. The `load-session` endpoint (`POST /load-session`) Returns the history of a session so a client can rebuild a conversation it does not hold