feat(agent): run coding agents as an Agenta workflow over ACP by mmabrouk · Pull Request #4721 · Agenta-AI/agenta

mmabrouk · 2026-06-17T11:07:58Z

Context

Agenta runs prompt workflows today: completion, chat, and the LLM judge. Each calls a model
once and returns one answer. This PoC adds a different kind of workflow. An agent runs a
loop: it reads its instructions, calls a model, runs a tool, reads the result, and calls the
model again, until the task is done. It sits behind the same /invoke contract, traces into
the same spans, and reads its config from the same playground.

The PoC proves one claim: the agent and the place it runs are both config, not code. You
change a dropdown to swap Pi for Claude Code, or local for a Daytona cloud sandbox, and
nothing above the seam changes.

How it works

A Python workflow handler receives /invoke and calls a Node sidecar over HTTP. The sidecar
drives a coding agent through the open-source rivet sandbox-agent daemon over the Agent
Client Protocol (ACP). The daemon starts an ACP adapter, which starts the agent (Pi or
Claude Code). Two orthogonal config axes swap independently: harness (pi | claude) and
sandbox (local | daytona).

The full write-up lives in docs/design/agent-workflows/. Read it in this order:

architecture.md: the data flow, the two containers, and the vocabulary. Start here.
ports-and-adapters.md: the neutral seam, the wire contract, and how the service picks an
engine and a transport.
sessions.md: cold replay today, and the two paths to warm or persisted sessions tomorrow.
adapters/pi.md and adapters/claude-code.md: how each agent is wired, traced, and given
tools.

What this PoC includes

The agent workflow behind /invoke, with harness and sandbox as playground dropdowns.
Two harnesses: Pi (default) and Claude Code.
Two sandboxes: local (processes inside the sidecar) and Daytona (a throwaway cloud machine).
A neutral, session-shaped port (Environment, Harness, AgentSession) with two engines
behind it: rivet over ACP, and a legacy in-process Pi path.
Backend-resolved tools that keep the provider key and connection auth server-side. Pi takes
them natively through a Pi extension; an MCP-capable harness takes them over MCP.
Tracing that nests the agent's run under the caller's /invoke span, with token usage
rolled onto the workflow span.

What it defers (documented, not built)

A warm daemon and server-owned session storage (see sessions.md).
Live streaming to the client over the HTTP edge.
The multi-tenant filesystem jail for a shared daemon.
Registering the agent as a first-class backend workflow type with its own builtin URI.

Tests / notes

Verified live against the dockerized dev stack, with real /invoke requests through the
playground service route:

pi / local: 200, ~3s. Trace nests _agent → invoke_agent → turn 0 → chat, with the token
total on _agent.
claude / local: 200, same nested tree. Confirms the harness swap is one config value.
pi / daytona: 200, ~10s. Runner-built trace, token total on _agent.

Python passes ruff format and ruff check. TypeScript compiles under tsc --strict.

For reviewers:

The playground model choice is not honored inside the Daytona sandbox yet. The in-sandbox
Pi lacks the model catalog the local Pi loads. The model axis is honored on pi / local.
The Claude over MCP tool path is wired but not yet exercised end to end. Pi tools are
verified end to end.

This PR lands the full agent-workflows PoC into main: the WP-2 service, the WP-7 tools, the
WP-8 ACP runtime, and the session-shaped port. The smaller stacked PRs (for example #4718)
are intermediate stacking artifacts. Review this one as the PoC.

What to QA

In the agent playground, run with harness=pi, sandbox=local. You get a reply, and the trace
shows _agent → invoke_agent → turn 0 → chat with tokens on _agent.
Change harness to claude, run again. You still get a reply, and the trace shows the same
nested shape.
Set sandbox=daytona with harness=pi. The reply comes back in about 10s with the full trace.
Regression: a normal completion or chat playground run still traces and answers as before.

… docs - New agent workflow service wrapping the Pi harness, served same-origin like chat/completion at /services/agent/v0 (Python service + Node Pi sidecar, ports/adapters). - Builtin 'agent' app type + create-app template; config is model + AGENTS.md. - /inspect chat schema and OpenTelemetry tracing into Agenta. - EE dev compose agent-pi sidecar; design docs under docs/design/agent-workflows.

Re-platform the agent workflow service to drive coding harnesses (Pi, Claude Code) over the Agent Client Protocol through a rivet sandbox-agent daemon, behind the unchanged Harness port and /invoke contract. The harness (pi/claude) and sandbox (local/daytona) are editable playground config; tracing nests under the /invoke span; tools are delivered Pi-native via a bundled extension; and the model provider key resolves from the project vault.

Evolve the agent service ports toward the rivet sandbox-agent session shape, so the rivet (ACP) and legacy in-process Pi backends share one clean, capability-aware port. - ports.py: Environment + Harness seams, a first-class AgentSession (create/prompt/ destroy), HarnessCapabilities, ContentBlock, Message, AgentEvent, structured AgentResult. - harness.py: SubprocessHarness + HttpHarness share one wire contract (wire.py), replacing the pi_harness/pi_http_harness/rivet_harness trio. The engine is an env value. - TS: shared protocol.ts; runPi/runRivet return the enriched result; runRivet probes getAgent() capabilities and routes tools by mcpTools, not the harness name; usage flows on the rivet path (split from PromptResponse.usage); one shared toolClient.ts replaces the triplicated /tools/call client. - agent.py uses the session API; _select_backend upgrades pi/local to rivet when the selected harness/sandbox needs it. permission_policy added to /inspect. Verified live: pi, rivet+pi+local, rivet+claude+local, rivet+pi+daytona; playground run succeeds with usage; invoke_agent nests under the /invoke span. Design notes under docs/design/agent-workflows/harness-port-redesign/.

vercel · 2026-06-17T11:08:04Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 17, 2026 7:35pm

coderabbitai · 2026-06-17T11:08:08Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: cf0acae1-2003-4724-bc1b-bd9d9c65edd0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

✅ Review completed - (🔄 Check again to review again)

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/agent-harness-port

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…ss runtime Address the god-module and the misleading package name: - services/oss/src/harness/ (was agent_pi/): the engine-agnostic runtime — ports.py, transports.py (was harness.py), environment.py, wire.py. Named for the seam, not Pi; harness choice (pi/claude) lives inside the runtime, so there is no agent_claude. - services/oss/src/agent/ (was the 470-line agent.py god-module): the Agenta workflow app — app.py (thin handler + backend wiring), inputs.py (request parsing), tools.py, secrets.py, tracing.py, client.py (shared backend access), schemas.py, config.py. No behavior change. Verified live: a playground run answers 'REFACTOR-OK Lisbon' with usage.

+    "content-type": "application/json",
+    "content-length": Buffer.byteLength(payload),
+  });
+  res.end(payload);


…ite README The TS runner's src/ had grown one work package at a time into a flat folder of ten files with no signal of role. Group them and rewrite the stale README (it still called this a 'Pi wrapper' and pointed at the moved agent.py): src/cli.ts, server.ts, protocol.ts entrypoints + the wire contract src/engines/{pi,rivet}.ts the two engines (was runPi.ts / runRivet.ts) src/tracing/otel.ts the tracers (was agenta-otel.ts) src/tools/{client,mcp-bridge,mcp-server}.ts tool delivery (was toolClient/toolBridge*) src/extensions/agenta.ts the Pi extension (was piExtension.ts) No behavior change. Updated the fragile __dirname-relative paths in engines/rivet.ts (PKG_ROOT) and tools/mcp-bridge.ts (the tsx bin + server path) for the new depth, and the build-extension entry. Verified live: rivet+pi+local through the restarted sidecar answers 'Athens' with usage; tsc --strict and the extension build pass.

Replace the loose model/agents_md/harness/sandbox params with one `agent` config element (x-ag-type: agent_config) carrying instructions, model, tools, harness, sandbox, and permission policy. The playground renders it through a new AgentConfigControl that reuses the existing controls: the model selector, the tool picker (so Composio and builtin tools are finally selectable on the agent), the enum selects, and a textarea. The backend reads it via resolve_agent_config and falls back to the old shape so existing revisions keep running. Verified live: the element renders with the tool picker, and a GitHub Composio tool runs end to end on pi+local.

Tools worked locally but failed on Daytona. The in-sandbox Pi extension POSTed each tool call to Agenta's /tools/call, but a firewalled or private backend does not expose that to the remote cloud sandbox (the same reason tracing is built from the event stream on Daytona rather than in-sandbox OTLP). The sandbox has internet but cannot reach the backend, so the call failed and the model gave up. Route the call through the runner, which can reach Agenta. The extension writes the request to a file in a sandbox dir and polls for the response; the runner watches the dir over the daemon filesystem API, calls /tools/call, and writes the result back (tools/relay.ts). Local runs keep the direct path. Verified programmatically: rivet+pi+daytona with a GitHub Composio tool now returns the real login (was 'the tool failed twice'); local is unchanged.

Move the raw work-package material (wp-1..wp-8, harness-port-redesign, research) into scratch/ and add clean top-level pages a reviewer can read top to bottom: a README index, architecture, ports-and-adapters, sessions, and adapters/{pi,claude-code}. Update the three in-code references that pointed at the moved doc paths.

coderabbitai

Actionable comments posted: 19

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

services/agent/docker/Dockerfile.dev (1)
7-42: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run the container as a non-root user.

There is no USER directive, so the runtime process defaults to root.
💡 Proposed fix
 RUN pnpm run build:extension
+RUN chown -R node:node /app

 ENV NODE_ENV=development \
     PORT=8765

 EXPOSE 8765

 # Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the
 # container runs as a non-root host uid.
+USER node
 CMD ["node_modules/.bin/tsx", "watch", "src/server.ts"]
Source: Linters/SAST tools

🟡 Minor comments (22)

docs/design/agent-workflows/scratch/harness-port-redesign/research.md-3-7 (1)

3-7: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the “source verified” file paths to current module locations.

This header cites services/oss/src/agent_pi/ports.py and services/oss/src/agent.py, but current stack context and code snippets point to services/oss/src/harness/ports.py and split agent modules. Keeping this exact avoids audit drift.

docs/design/agent-workflows/scratch/wp-2-agent-service/README.md-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Refresh the status line to avoid stale planning state.

Status: not started. conflicts with the implementation-plan/history in this same WP folder and can misdirect readers. Consider marking this as historical/archived instead.

docs/design/agent-workflows/scratch/wp-2-agent-service/README.md-55-55 (1)

55-55: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the WP identifier typo.

Success for this WP1 should be WP-2 in this document.

docs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.md-153-170 (1)

153-170: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Specify a language for this fenced block.

This block is missing a language tag and will continue tripping MD040.

Source: Linters/SAST tools
docs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.md-267-271 (1)
267-271: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the cd path in the login verification snippet.

The path misses the scratch/ segment, so this command won’t resolve in the documented tree.
🛠️ Suggested fix
-cd docs/design/agent-workflows/wp-1-pi-tracing/poc
+cd docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.py-183-184 (1)
183-184: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard against missing flag values in arg().

If a flag is passed without a value (or followed by another flag), this currently throws IndexError.
✅ Suggested fix
 def arg(name: str, default: str) -> str:
-    return sys.argv[sys.argv.index(name) + 1] if name in sys.argv else default
+    if name not in sys.argv:
+        return default
+    idx = sys.argv.index(name) + 1
+    if idx >= len(sys.argv) or sys.argv[idx].startswith("--"):
+        raise SystemExit(f"{name} requires a value")
+    return sys.argv[idx]
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.py-259-266 (1)
259-266: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Quote shell arguments when assembling pi_cmd.

provider, model, and paths are interpolated directly into a shell command. Quoting avoids malformed commands when values include shell-significant characters.
🔒 Suggested fix
+import shlex
 ...
-            pi_cmd = (
-                f"cd {run_dir} && TMPDIR={run_dir}/tmp "
-                f"pi -p {json.dumps(PROMPT)} "
-                f"--mode json --approve --provider {provider} --model {model} "
+            q_run_dir = shlex.quote(run_dir)
+            q_prompt = shlex.quote(PROMPT)
+            q_provider = shlex.quote(provider)
+            q_model = shlex.quote(model)
+            pi_cmd = (
+                f"cd {q_run_dir} && TMPDIR={q_run_dir}/tmp "
+                f"pi -p {q_prompt} "
+                f"--mode json --approve --provider {q_provider} --model {q_model} "
                 f"-t read,bash,edit,write,ls "
-                f"--session-dir {run_dir}/.pi-sessions --name {session_id} "
+                f"--session-dir {q_run_dir}/.pi-sessions --name {shlex.quote(session_id)} "
                 f"< /dev/null"
             )
docs/design/agent-workflows/scratch/harness-port-redesign/research.md-148-152 (1)
148-152: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced code block.

The fenced block here is missing a language tag and will keep triggering MD040.
✏️ Suggested fix
-```
+```text
 commandExecution, errorEvents, fileAttachments, fileChanges, images, itemStarted,
 mcpTools, permissions, planMode, questions, reasoning, sessionLifecycle, sharedProcess,
 status, streamingDeltas, textMessages, toolCalls, toolResults
</details>



_Source: Linters/SAST tools_

</blockquote></details>
<details>
<summary>docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/bench_coldstart.py-33-42 (1)</summary><blockquote>

`33-42`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Ensure sandbox teardown runs even when a benchmark iteration errors.**

`daytona.delete(sb)` is currently best-effort on the happy path only; an exception after `create()` can leak a sandbox.




<details>
<summary>♻️ Suggested fix</summary>

```diff
 for snap in SNAPSHOTS:
     times: list[float] = []
     for i in range(N):
-        t = time.monotonic()
-        sb = daytona.create(
-            CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0),
-            timeout=120,
-        )
-        dt = time.monotonic() - t
-        times.append(dt)
-        print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s  state={sb.state}", flush=True)
-        daytona.delete(sb)
+        sb = None
+        t = time.monotonic()
+        try:
+            sb = daytona.create(
+                CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0),
+                timeout=120,
+            )
+            dt = time.monotonic() - t
+            times.append(dt)
+            print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s  state={sb.state}", flush=True)
+        finally:
+            if sb is not None:
+                daytona.delete(sb)
     results[snap] = times
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.py-62-71 (1)
62-71: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add error handling after POST commit.

Line 62–67 makes a POST to the commit endpoint but line 69 calls resp.json() without first checking for errors.

The GET request (lines 25–30) properly calls raise_for_status(), but the POST does not. If the commit fails, resp.json() could raise an exception if the response body is not JSON (e.g., on a 500 error).
         resp = client.post(
             f"{BASE}/api/workflows/revisions/commit",
             params={"project_id": PROJ},
             headers=H,
             json=body,
         )
+        resp.raise_for_status()
         print("commit status:", resp.status_code)
docs/design/agent-workflows/scratch/wp-1-pi-tracing/integrating-the-tracing-extension.md-23-23 (1)

23-23: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add languages to fenced code blocks to satisfy markdownlint.

These fences are missing language identifiers (MD040), which can fail doc lint pipelines.

Also applies to: 147-147, 161-161

Source: Linters/SAST tools

docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/README.md-15-15 (1)

15-15: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Specify language for fenced blocks to clear MD040.

Both fences should include a language token (for example text) to keep markdownlint clean.

Also applies to: 67-67

Source: Linters/SAST tools
docs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.md-41-42 (1)
41-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix Python source path references to services/oss/src/agent/app.py.

The doc currently points readers to services/oss/src/agent.py, but the _agent flow is implemented under services/oss/src/agent/app.py, so navigation/debugging from this guide will be misleading.
Suggested doc fix
-1. **Capture (Python, `services/oss/src/agent.py`).** Inside the instrumented
+1. **Capture (Python, `services/oss/src/agent/app.py`).** Inside the instrumented
...
-- `services/oss/src/agent.py` — `_trace_context()` captures the workflow span context.
+- `services/oss/src/agent/app.py` — `_trace_context()` captures the workflow span context.
Also applies to: 110-110
docs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.md-25-25 (1)

25-25: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers on fenced code blocks (MD040).

These code fences are missing language tags, which can break markdownlint in CI.

Also applies to: 101-101

Source: Linters/SAST tools
docs/design/agent-workflows/scratch/wp-1-pi-tracing/README.md-41-41 (1)
41-41: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align OTLP export docs with the implemented auth contract.

This line documents ?project_id=<uuid>, but the companion integration doc and POC exporter use API-key-only routing without project_id. Keeping both patterns in the same WP docs is confusing for adopters.
Suggested doc fix
-- Export over OTLP/HTTP protobuf with `Authorization: ApiKey <key>` and `?project_id=<uuid>`.
+- Export over OTLP/HTTP protobuf with `Authorization: ApiKey <key>` (project resolved from the key).
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/run.ts-173-178 (1)
173-178: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reset runConfig.traceId before each prompt iteration.

Without clearing it, a failed prompt can leave the previous run’s trace ID in place and be re-recorded as if it belonged to the current prompt.
Suggested fix
   for (let i = 0; i < scenario.prompts.length; i++) {
     const p = scenario.prompts[i];
     console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`);
+    runConfig.traceId = undefined;
     await session.prompt(p);
     if (runConfig.traceId) traceIds.push(runConfig.traceId);
   }
services/oss/src/agent/tools.py-100-105 (1)
100-105: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing error handling for malformed JSON response.

If the backend returns a non-JSON response (e.g., an HTML error page from a proxy), response.json() will raise JSONDecodeError. Consider wrapping in try/except to provide a clearer error message.
Proposed fix
     if response.status_code >= 400:
         raise RuntimeError(
             f"Tool resolution failed (HTTP {response.status_code}): {response.text[:500]}"
         )

-    data = response.json()
+    try:
+        data = response.json()
+    except Exception as exc:
+        raise RuntimeError(
+            f"Tool resolution returned invalid JSON: {response.text[:200]}"
+        ) from exc
services/oss/src/agent/config.py-66-70 (1)
66-70: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing error handling for malformed agent.json.

If agent.json exists but contains invalid JSON, json.loads will raise a JSONDecodeError that propagates up, potentially crashing the agent startup with an unclear error. Consider catching and logging/falling back gracefully, similar to how a missing file is handled.
Proposed fix
     meta_path = base / "agent.json"
     if meta_path.exists():
-        meta = json.loads(meta_path.read_text(encoding="utf-8"))
-        model = meta.get("model") or DEFAULT_MODEL
-        tools = meta.get("tools", []) or []
+        try:
+            meta = json.loads(meta_path.read_text(encoding="utf-8"))
+            model = meta.get("model") or DEFAULT_MODEL
+            tools = meta.get("tools", []) or []
+        except json.JSONDecodeError as exc:
+            from agenta.sdk.utils.logging import get_module_logger
+            get_module_logger(__name__).warning(
+                "agent: invalid agent.json, using defaults: %s", exc
+            )
services/agent/src/server.ts-71-74 (1)
71-74: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stack trace exposure in error responses.

The catch block exposes err.stack to the client, which can leak internal file paths, library versions, and implementation details. Consider logging the full error to stderr and returning only a generic message to the client.
Proposed fix
   } catch (err) {
-    const message = err instanceof Error ? err.stack ?? err.message : String(err);
+    const detail = err instanceof Error ? err.stack ?? err.message : String(err);
+    process.stderr.write(`[pi-wrapper] request error: ${detail}\n`);
+    const message = err instanceof Error ? err.message : "Internal server error";
     return send(res, 500, { ok: false, error: message });
   }
Source: Linters/SAST tools
sdks/python/agenta/sdk/engines/running/utils.py-248-252 (1)

248-252: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Catalog description is outdated and understates supported harnesses.

At Line 251, the description says this workflow runs on “the Pi harness,” but this PR wires selectable harness/sandbox combinations. The catalog text should be generalized to avoid misleading users.
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx-139-157 (1)
139-157: ⚠️ Potential issue | 🟡 Minor

Use stable keys for tool rows instead of array index.

Tools are added/deleted via handleAddTool and handleToolDelete, which mutates the array. Index-based keys cause React to remap component state to the wrong tool row after these mutations, producing stale editor state.

Use tool.agenta_metadata?.toolCode ?? toolName(tool) ?? index as a stable key, since toolCode is populated when tools are selected and toolName extracts the function name fallback.
Suggested fix
 {tools.map((tool, index) => {
+    const stableToolKey =
+        typeof tool === "object" && tool
+            ? String(
+                  (tool as Record<string, any>)?.agenta_metadata?.toolCode ??
+                      toolName(tool) ??
+                      index,
+              )
+            : String(index)
     const control = (
         <ToolItemControl
-            key={`tool-${index}`}
             value={tool}
             onChange={(v) => handleToolChange(index, v)}
             onDelete={disabled ? undefined : () => handleToolDelete(index)}
             disabled={disabled}
         />
     )
     return EditorProvider ? (
         <EditorProvider
-            key={`tool-editor-${index}`}
+            key={`tool-editor-${stableToolKey}`}
             codeOnly
             language="json"
             showToolbar={false}
             enableTokens={false}
-            id={`agent-tool-editor-${index}`}
+            id={`agent-tool-editor-${stableToolKey}`}
         >
             {control}
         </EditorProvider>
     ) : (
         control
     )
 })}
services/agent/README.md-24-40 (1)
24-40: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced block.

This fence triggers markdownlint MD040 and can be fixed by marking it as text.
💡 Proposed fix
-```
+```text
 src/
   cli.ts              entrypoint: stdin/stdout (subprocess transport)
   server.ts           entrypoint: HTTP sidecar on :8765
@@
-```
+```
Source: Linters/SAST tools

🧹 Nitpick comments (3)

docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/status.md (1)

94-98: ⚡ Quick win

Document the Daytona Pi model catalog limitation as a follow-up.

The known limitation (freshly-provisioned Pi advertises only model: default on Daytona) is well-documented here. Consider opening a follow-up issue to settle the in-sandbox Pi model config if not already tracked.

Do you want me to help create a follow-up issue to track the Daytona Pi model catalog setup?
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.py (1)
35-53: 💤 Low value

Add defensive checks for schema structure.

Lines 35–53 directly access nested dict keys (data["schemas"]["parameters"]["properties"]) without validating the schema structure. If the revision data has a different structure, a KeyError will be raised.

For a PoC this is acceptable, but for production robustness, add:
+        schemas = data.get("schemas", {})
+        parameters = schemas.get("parameters", {})
+        props = parameters.get("properties", {})
-        props = data["schemas"]["parameters"]["properties"]
services/oss/src/agent/schemas.py (1)
13-20: ⚡ Quick win

Consider importing defaults from config.py to reduce duplication.

_DEFAULT_MODEL and _DEFAULT_AGENTS_MD are duplicated here and in config.py. While the comment acknowledges the need to "keep in sync," importing from config.py would eliminate drift risk.
-_DEFAULT_MODEL = "gpt-5.5"
-_DEFAULT_AGENTS_MD = (
-    "You are a friendly hello-world agent running on the Agenta agent service.\n\n"
-    "- Greet the user warmly.\n"
-    "- Answer the user's message in one or two short sentences."
-)
+from oss.src.agent.config import DEFAULT_MODEL as _DEFAULT_MODEL, DEFAULT_AGENTS_MD as _DEFAULT_AGENTS_MD

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 27cfac53-0d07-4c8c-b8eb-9188f96ee1e4

📥 Commits

Reviewing files that changed from the base of the PR and between 5722ff7 and 7be759e.

⛔ Files ignored due to path filters (2)

docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
services/agent/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (102)

.gitignore
api/oss/src/apis/fastapi/tools/models.py
api/oss/src/apis/fastapi/tools/router.py
api/oss/src/core/tools/dtos.py
api/oss/src/core/tools/exceptions.py
api/oss/src/core/tools/service.py
docs/design/agent-workflows/README.md
docs/design/agent-workflows/adapters/claude-code.md
docs/design/agent-workflows/adapters/pi.md
docs/design/agent-workflows/architecture.md
docs/design/agent-workflows/ports-and-adapters.md
docs/design/agent-workflows/scratch/harness-port-redesign/README.md
docs/design/agent-workflows/scratch/harness-port-redesign/implementation.md
docs/design/agent-workflows/scratch/harness-port-redesign/plan.md
docs/design/agent-workflows/scratch/harness-port-redesign/proposal.md
docs/design/agent-workflows/scratch/harness-port-redesign/research.md
docs/design/agent-workflows/scratch/harness-port-redesign/status.md
docs/design/agent-workflows/scratch/research/auth-secrets.md
docs/design/agent-workflows/scratch/research/daytona-sandbox.md
docs/design/agent-workflows/scratch/research/diskless-in-memory-config.md
docs/design/agent-workflows/scratch/research/open-questions.md
docs/design/agent-workflows/scratch/research/otel-instrumentation.md
docs/design/agent-workflows/scratch/research/pi-interaction.md
docs/design/agent-workflows/scratch/research/sandbox-sharing.md
docs/design/agent-workflows/scratch/wp-1-pi-tracing/README.md
docs/design/agent-workflows/scratch/wp-1-pi-tracing/integrating-the-tracing-extension.md
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/.env.example
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/README.md
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/agenta-otel.ts
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/package.json
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/run.ts
docs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.md
docs/design/agent-workflows/scratch/wp-2-agent-service/README.md
docs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.md
docs/design/agent-workflows/scratch/wp-2-agent-service/qa.md
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/README.md
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/README.md
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/bench_coldstart.py
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/build_snapshot.py
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/cleanup.py
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.py
docs/design/agent-workflows/scratch/wp-4-multi-message-output/README.md
docs/design/agent-workflows/scratch/wp-5-chat-vs-completion/README.md
docs/design/agent-workflows/scratch/wp-6-workflow-type-and-template/README.md
docs/design/agent-workflows/scratch/wp-7-tools/README.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/README.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/architecture.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/context.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/isolation-and-fork.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/plan.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.py
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.py
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/debug-events.ts
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/dump-full.ts
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/package.json
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/spike.ts
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/research.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/status.md
docs/design/agent-workflows/sessions.md
hosting/docker-compose/ee/docker-compose.dev.yml
sdks/python/agenta/sdk/engines/running/interfaces.py
sdks/python/agenta/sdk/engines/running/utils.py
services/agent/.dockerignore
services/agent/README.md
services/agent/config/AGENTS.md
services/agent/config/agent.json
services/agent/docker/Dockerfile.dev
services/agent/package.json
services/agent/scripts/build-extension.mjs
services/agent/src/cli.ts
services/agent/src/engines/pi.ts
services/agent/src/engines/rivet.ts
services/agent/src/extensions/agenta.ts
services/agent/src/protocol.ts
services/agent/src/server.ts
services/agent/src/tools/client.ts
services/agent/src/tools/mcp-bridge.ts
services/agent/src/tools/mcp-server.ts
services/agent/src/tools/relay.ts
services/agent/src/tracing/otel.ts
services/agent/tsconfig.json
services/entrypoints/main.py
services/oss/src/agent/__init__.py
services/oss/src/agent/app.py
services/oss/src/agent/client.py
services/oss/src/agent/config.py
services/oss/src/agent/inputs.py
services/oss/src/agent/schemas.py
services/oss/src/agent/secrets.py
services/oss/src/agent/tools.py
services/oss/src/agent/tracing.py
services/oss/src/harness/__init__.py
services/oss/src/harness/environment.py
services/oss/src/harness/ports.py
services/oss/src/harness/transports.py
services/oss/src/harness/wire.py
web/oss/src/components/pages/app-management/components/CreateAppDropdown/index.tsx
web/oss/src/components/pages/app-management/modals/CreateAppTypeModal/index.tsx
web/oss/src/components/pages/prompts/assets/iconHelpers.tsx
web/packages/agenta-entities/src/workflow/state/appUtils.ts
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/SchemaPropertyRenderer.tsx

coderabbitai · 2026-06-17T19:51:44Z

+    async def resolve_agent_tools(
+        self,
+        *,
+        project_id: UUID,
+        tools: List[AgentToolReference],
+    ) -> AgentToolsResolution:
+        """Resolve an agent's tool references into model-ready specs.
+
+        ``builtin`` references pass through as names. ``composio`` references are
+        validated against the project's connections up front and enriched from the
+        catalog (description + input schema), so the model never sees a stale schema
+        and the invoke fails fast on a missing/invalid connection rather than mid-loop.
+        """
+        builtins: List[str] = []
+        custom: List[ResolvedAgentTool] = []
+
+        for ref in tools:
+            if isinstance(ref, AgentBuiltinTool):
+                if ref.name:
+                    builtins.append(ref.name)
+                continue
+
+            if isinstance(ref, AgentComposioTool):
+                custom.append(
+                    await self._resolve_composio_tool(
+                        project_id=project_id,
+                        ref=ref,
+                    )
+                )
+
+        return AgentToolsResolution(builtins=builtins, custom=custom)
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce unique resolved tool names to avoid cross-connection misrouting.

Line 559 defaults names to {integration}__{action} and drops connection, so two refs for different connections can resolve to the same name. Downstream consumers key by name (e.g., MCP map), which can overwrite one tool with another and execute against the wrong account.

💡 Suggested fix

async def resolve_agent_tools( @@ - builtins: List[str] = [] - custom: List[ResolvedAgentTool] = [] + builtins: List[str] = [] + custom: List[ResolvedAgentTool] = [] + seen_names: set[str] = set() @@ if isinstance(ref, AgentComposioTool): - custom.append( - await self._resolve_composio_tool( - project_id=project_id, - ref=ref, - ) - ) + resolved = await self._resolve_composio_tool( + project_id=project_id, + ref=ref, + ) + if resolved.name in seen_names: + raise ToolSlugInvalidError( + slug=resolved.name, + detail="Duplicate tool name in resolved tools list", + ) + seen_names.add(resolved.name) + custom.append(resolved) @@ - name = ref.name or f"{ref.integration}__{ref.action}" + name = ref.name or f"{ref.integration}__{ref.action}__{ref.connection}"

Also applies to: 559-562

coderabbitai · 2026-06-17T19:51:44Z

@@ -0,0 +1,7 @@
+# Agenta collector (the runner also falls back to the repo-root .env.test.local).
+AGENTA_HOST=http://144.76.237.122:8280/


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a neutral placeholder for AGENTA_HOST in the sample env.

A hardcoded real host in .env.example can unintentionally route traces to an external environment during quick-start runs.

Suggested change

-AGENTA_HOST=http://144.76.237.122:8280/ +AGENTA_HOST=http://localhost:8280

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

AGENTA_HOST=http://144.76.237.122:8280/

AGENTA_HOST=http://localhost:8280

coderabbitai · 2026-06-17T19:51:44Z

+  pi.on("agent_end", async (event: any) => {
+    if (!agentSpan) return;
+    setOutput(agentSpan, lastAssistantText(event?.messages));
+    agentSpan.end();
+    agentSpan = undefined;
+    agentCtx = undefined;
+    lastContextMessages = undefined;
+  });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Close and clear dangling tool spans at run end.

Run teardown does not currently drain toolSpans, so interrupted runs can leave stale spans in memory and contaminate later runs in the same process.

Suggested fix

pi.on("agent_end", async (event: any) => { if (!agentSpan) return; + for (const span of toolSpans.values()) { + span.setStatus({ code: SpanStatusCode.ERROR, message: "closed on agent_end" }); + span.end(); + } + toolSpans.clear(); + + if (llmSpan) { + llmSpan.setStatus({ code: SpanStatusCode.ERROR, message: "closed on agent_end" }); + llmSpan.end(); + llmSpan = undefined; + } + setOutput(agentSpan, lastAssistantText(event?.messages)); agentSpan.end(); agentSpan = undefined; agentCtx = undefined; lastContextMessages = undefined; + pendingPrompt = undefined; });

coderabbitai · 2026-06-17T19:51:44Z

+  const { session } = await createAgentSession({
+    cwd,
+    model,
+    authStorage,
+    modelRegistry,
+    tools: ["read", "bash", "edit", "write", "ls"],
+    sessionManager: SessionManager.inMemory(cwd),
+    resourceLoader: loader,
+  });
+
+  // Hand the session id + model to the extension so spans carry them.
+  runConfig.sessionId = session.sessionId;
+  runConfig.provider = model.provider;
+  runConfig.requestModel = model.id;
+
+  session.subscribe((event: any) => {
+    if (
+      event.type === "message_update" &&
+      event.assistantMessageEvent?.type === "text_delta"
+    ) {
+      process.stdout.write(event.assistantMessageEvent.delta);
+    } else if (event.type === "tool_execution_start") {
+      process.stdout.write(`\n[tool] ${event.toolName}\n`);
+    }
+  });
+
+  const traceIds: string[] = [];
+  for (let i = 0; i < scenario.prompts.length; i++) {
+    const p = scenario.prompts[i];
+    console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`);
+    await session.prompt(p);
+    if (runConfig.traceId) traceIds.push(runConfig.traceId);
+  }
+
+  console.log("\n\n[run] flushing spans to Agenta...");
+  session.dispose();
+  await shutdownTracing();
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move session disposal and tracing shutdown into finally.

If any prompt throws, the current flow skips session.dispose() and shutdownTracing(), which can lose final spans and leave session resources open.

Suggested fix

const { session } = await createAgentSession({ cwd, model, authStorage, modelRegistry, tools: ["read", "bash", "edit", "write", "ls"], sessionManager: SessionManager.inMemory(cwd), resourceLoader: loader, }); - // Hand the session id + model to the extension so spans carry them. - runConfig.sessionId = session.sessionId; - runConfig.provider = model.provider; - runConfig.requestModel = model.id; - - session.subscribe((event: any) => { - ... - }); - - const traceIds: string[] = []; - for (let i = 0; i < scenario.prompts.length; i++) { - ... - await session.prompt(p); - if (runConfig.traceId) traceIds.push(runConfig.traceId); - } - - console.log("\n\n[run] flushing spans to Agenta..."); - session.dispose(); - await shutdownTracing(); + try { + runConfig.sessionId = session.sessionId; + runConfig.provider = model.provider; + runConfig.requestModel = model.id; + + session.subscribe((event: any) => { + ... + }); + + const traceIds: string[] = []; + for (let i = 0; i < scenario.prompts.length; i++) { + ... + await session.prompt(p); + if (runConfig.traceId) traceIds.push(runConfig.traceId); + } + } finally { + session.dispose(); + await shutdownTracing(); + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const { session } = await createAgentSession({

cwd,

model,

authStorage,

modelRegistry,

tools: ["read", "bash", "edit", "write", "ls"],

sessionManager: SessionManager.inMemory(cwd),

resourceLoader: loader,

});

// Hand the session id + model to the extension so spans carry them.

runConfig.sessionId = session.sessionId;

runConfig.provider = model.provider;

runConfig.requestModel = model.id;

session.subscribe((event: any) => {

if (

event.type === "message_update" &&

event.assistantMessageEvent?.type === "text_delta"

) {

process.stdout.write(event.assistantMessageEvent.delta);

} else if (event.type === "tool_execution_start") {

process.stdout.write(`\n[tool] ${event.toolName}\n`);

}

});

const traceIds: string[] = [];

for (let i = 0; i < scenario.prompts.length; i++) {

const p = scenario.prompts[i];

console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`);

await session.prompt(p);

if (runConfig.traceId) traceIds.push(runConfig.traceId);

}

console.log("\n\n[run] flushing spans to Agenta...");

session.dispose();

await shutdownTracing();

const { session } = await createAgentSession({

cwd,

model,

authStorage,

modelRegistry,

tools: ["read", "bash", "edit", "write", "ls"],

sessionManager: SessionManager.inMemory(cwd),

resourceLoader: loader,

});

try {

// Hand the session id + model to the extension so spans carry them.

runConfig.sessionId = session.sessionId;

runConfig.provider = model.provider;

runConfig.requestModel = model.id;

session.subscribe((event: any) => {

if (

event.type === "message_update" &&

event.assistantMessageEvent?.type === "text_delta"

) {

process.stdout.write(event.assistantMessageEvent.delta);

} else if (event.type === "tool_execution_start") {

process.stdout.write(`\n[tool] ${event.toolName}\n`);

}

});

const traceIds: string[] = [];

for (let i = 0; i < scenario.prompts.length; i++) {

const p = scenario.prompts[i];

console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`);

await session.prompt(p);

if (runConfig.traceId) traceIds.push(runConfig.traceId);

}

} finally {

console.log("\n\n[run] flushing spans to Agenta...");

session.dispose();

await shutdownTracing();

}

coderabbitai · 2026-06-17T19:51:45Z

+            AGENTA_HOST: ${AGENTA_HOST:-http://144.76.237.122:8280}
+            AGENTA_API_KEY: ${AGENTA_API_KEY:-}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid a public-IP fallback for AGENTA_HOST in dev.

Line 446 defaults to an external host, which can unintentionally send local trace/tool payloads outside the dev stack when AGENTA_HOST is unset.

Suggested fix

- AGENTA_HOST: ${AGENTA_HOST:-http://144.76.237.122:8280} + AGENTA_HOST: ${AGENTA_HOST:-http://api:8000}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

AGENTA_HOST: ${AGENTA_HOST:-http://144.76.237.122:8280}

AGENTA_API_KEY: ${AGENTA_API_KEY:-}

AGENTA_HOST: ${AGENTA_HOST:-http://api:8000}

AGENTA_API_KEY: ${AGENTA_API_KEY:-}

coderabbitai · 2026-06-17T19:51:46Z

+/** Build a parent Context from a W3C traceparent string, or undefined if absent/invalid. */
+function parentContext(traceparent?: string): Context | undefined {
+  if (!traceparent) return undefined;
+  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
+    traceparent.trim(),
+  );
+  if (!match) return undefined;
+  const [, traceId, spanId, flags] = match;
+  const spanContext: SpanContext = {
+    traceId,
+    spanId,
+    // Honor the incoming sampled bit; default to sampled so child spans record.
+    traceFlags: (parseInt(flags, 16) & 1) === 1 ? TraceFlags.SAMPLED : TraceFlags.NONE,
+    isRemote: true,
+  };
+  return trace.setSpanContext(ROOT_CONTEXT, spanContext);
+}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash rg -n --type=ts '\bbaggage\b' services/agent/src/engines/pi.ts services/agent/src/engines/rivet.ts services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 376

🏁 Script executed:

# Check the current implementation of parentContext and surrounding context sed -n '220,245p' services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 1035

🏁 Script executed:

# Check the RunConfig type definition and the lines mentioned sed -n '250,260p' services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 569

🏁 Script executed:

# Check the specific lines mentioned: 423-431, 468-469 sed -n '420,435p' services/agent/src/tracing/otel.ts sed -n '465,475p' services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 1212

🏁 Script executed:

# Check the imports to see what's currently imported from OpenTelemetry head -30 services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 1886

🏁 Script executed:

# Check if propagation is imported and what the complete imports look like sed -n '1,50p' services/agent/src/tracing/otel.ts | grep -A 20 "^import"

Repository: Agenta-AI/agenta

Length of output: 478

🏁 Script executed:

# Check line 726-727 to see the other call site sed -n '720,732p' services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 692

🏁 Script executed:

# Check rivet.ts to see if it has the same pattern issue sed -n '715,730p' services/agent/src/engines/rivet.ts

Repository: Agenta-AI/agenta

Length of output: 729

🏁 Script executed:

# Check if propagation is available in `@opentelemetry/api` by searching the codebase for its usage rg -n 'propagation\.' services/agent/src --type=ts

Repository: Agenta-AI/agenta

Length of output: 42

🏁 Script executed:

# Also check what version of `@opentelemetry/api` is being used grep -A 5 '"`@opentelemetry/api`"' package.json

Repository: Agenta-AI/agenta

Length of output: 106

🏁 Script executed:

# Verify the RunConfig object assignment at lines 423-431 - check what fields are actually assigned sed -n '418,435p' services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 614

🏁 Script executed:

# Find package.json in the repo find . -name "package.json" -type f | head -5

Repository: Agenta-AI/agenta

Length of output: 339

🏁 Script executed:

# Check if propagation is being used anywhere else in the codebase rg 'propagation' services/agent/src --type=ts -A 2 -B 2

Repository: Agenta-AI/agenta

Length of output: 42

🏁 Script executed:

# Look for any W3C baggage related imports or patterns rg 'baggage|Baggage' services/agent/src --type=ts -B 2 -A 2

Repository: Agenta-AI/agenta

Length of output: 1583

baggage is wired upstream but never propagated into tracing context.

RunConfig accepts baggage from callers (pi.ts:180, rivet.ts:724) but it is not stored in the config object during initialization, not passed to parentContext(), and not handled by parentContext() which only parses traceparent. Baggage context is dropped on both Pi and Rivet paths.

The fix requires:

Adding baggage: init.baggage to the config object assignment (line 423-431)

Updating parentContext() signature to accept baggage?: string

Using OpenTelemetry's standard propagation.extract() API to handle both W3C traceparent and baggage headers

Passing config.baggage to all parentContext() calls (lines 468, 726-727)

Suggested fix

import { context, + propagation, ROOT_CONTEXT, trace, @@ -function parentContext(traceparent?: string): Context | undefined { - if (!traceparent) return undefined; - const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec( - traceparent.trim(), - ); - if (!match) return undefined; - const [, traceId, spanId, flags] = match; - const spanContext: SpanContext = { - traceId, - spanId, - traceFlags: (parseInt(flags, 16) & 1) === 1 ? TraceFlags.SAMPLED : TraceFlags.NONE, - isRemote: true, - }; - return trace.setSpanContext(ROOT_CONTEXT, spanContext); +function parentContext(traceparent?: string, baggage?: string): Context | undefined { + const carrier: Record<string, string> = {}; + if (traceparent?.trim()) carrier.traceparent = traceparent.trim(); + if (baggage?.trim()) carrier.baggage = baggage.trim(); + if (!carrier.traceparent && !carrier.baggage) return undefined; + const extracted = propagation.extract(ROOT_CONTEXT, carrier); + return trace.getSpanContext(extracted) ? extracted : undefined; } @@ const config: RunConfig = { @@ + baggage: init.baggage, @@ - const parent = parentContext(config.traceparent); + const parent = parentContext(config.traceparent, config.baggage); @@ - const parent = parentContext(init.traceparent); + const parent = parentContext(init.traceparent, init.baggage);

🧰 Tools

🪛 OpenGrep (1.22.0)

[ERROR] 228-230: Dynamic command passed to child_process.exec/execSync. Use child_process.execFile or spawn with an argument array instead.

(coderabbit.command-injection.exec-js)

coderabbitai · 2026-06-17T19:51:46Z

+    try:
+        session = harness.create_session(session_config)
+        result = await session.prompt(msgs)
+        await session.destroy()
+    finally:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Session cleanup is skipped on prompt failures.

At Line 94-97, session.destroy() executes only on success. If session.prompt(...) raises, session resources remain undisposed.

Suggested fix

await harness.setup() try: session = harness.create_session(session_config) - result = await session.prompt(msgs) - await session.destroy() + try: + result = await session.prompt(msgs) + finally: + await session.destroy() finally: await harness.shutdown()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

try:

session = harness.create_session(session_config)

result = await session.prompt(msgs)

await session.destroy()

finally:

await harness.setup()

try:

session = harness.create_session(session_config)

try:

result = await session.prompt(msgs)

finally:

await session.destroy()

finally:

await harness.shutdown()

coderabbitai · 2026-06-17T19:51:46Z

+            instructions=agent.get("instructions") or config.agents_md,
+            model=agent.get("model") or config.model,
+            tools=_as_list(agent.get("tools")),
+            harness=(


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Default tool configuration is dropped in the agent path.

At Line 45, omitting agent.tools resolves to [], while the legacy path falls back to config.tools. This makes tool availability depend on request shape and can silently disable configured tools.

Suggested fix

if isinstance(agent, dict): + raw_tools = agent.get("tools") return RunConfig( instructions=agent.get("instructions") or config.agents_md, model=agent.get("model") or config.model, - tools=_as_list(agent.get("tools")), + tools=_as_list(config.tools if raw_tools is None else raw_tools), harness=( agent.get("harness") or os.getenv("AGENTA_AGENT_HARNESS", "pi") ).lower(),

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

instructions=agent.get("instructions") or config.agents_md,

model=agent.get("model") or config.model,

tools=_as_list(agent.get("tools")),

harness=(

if isinstance(agent, dict):

raw_tools = agent.get("tools")

return RunConfig(

instructions=agent.get("instructions") or config.agents_md,

model=agent.get("model") or config.model,

tools=_as_list(config.tools if raw_tools is None else raw_tools),

harness=(

coderabbitai · 2026-06-17T19:51:46Z

+        content = raw.get("content", "")
+        if isinstance(content, list):
+            content = [ContentBlock.from_raw(block) for block in content]
+        return cls(role=str(raw["role"]), content=content)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Message.from_raw can produce invalid wire payloads for dict content.

At Line 164-167, dict content is not normalized. Later (Line 146-150), to_wire treats non-string content as a list, so dict content is iterated as keys, corrupting the outgoing message.

Suggested fix

content = raw.get("content", "") - if isinstance(content, list): + if isinstance(content, dict): + content = [ContentBlock.from_raw(content)] + elif isinstance(content, list): content = [ContentBlock.from_raw(block) for block in content] + elif not isinstance(content, str): + content = str(content) return cls(role=str(raw["role"]), content=content)

coderabbitai · 2026-06-17T19:51:46Z

+_DEFAULT_TIMEOUT = float(os.getenv("AGENTA_AGENT_TIMEOUT", "180"))
+_DEFAULT_COMMAND = ["pnpm", "exec", "tsx", "src/cli.ts"]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Import-time timeout parsing can crash the service on bad env values.

At Line 32, a non-numeric AGENTA_AGENT_TIMEOUT raises ValueError during import, taking down startup instead of falling back safely.

Suggested fix

-_DEFAULT_TIMEOUT = float(os.getenv("AGENTA_AGENT_TIMEOUT", "180")) +def _resolve_default_timeout() -> float: + raw = os.getenv("AGENTA_AGENT_TIMEOUT", "180") + try: + value = float(raw) + return value if value > 0 else 180.0 + except (TypeError, ValueError): + log.warning("Invalid AGENTA_AGENT_TIMEOUT=%r; using 180s", raw) + return 180.0 + +_DEFAULT_TIMEOUT = _resolve_default_timeout()

mmabrouk added 4 commits June 16, 2026 20:08

feat(agent): runnable tools as agent configuration (WP-7)

59bf20a

vercel Bot deployed to Preview June 17, 2026 11:08 View deployment

mmabrouk changed the base branch from feat/agent-rivet-acp-wp8 to main June 17, 2026 11:37

vercel Bot deployed to Preview June 17, 2026 11:54 View deployment

github-advanced-security AI found potential problems Jun 17, 2026

View reviewed changes

Comment thread services/agent/src/server.ts

"content-type": "application/json",

"content-length": Buffer.byteLength(payload),

});

res.end(payload);

vercel Bot deployed to Preview June 17, 2026 12:19 View deployment

mmabrouk added 2 commits June 17, 2026 15:46

vercel Bot deployed to Preview June 17, 2026 13:48 View deployment

mmabrouk changed the title ~~feat(agent): session-shaped harness/runtime port (rivet + legacy Pi)~~ feat(agent): run coding agents as an Agenta workflow over ACP Jun 17, 2026

vercel Bot deployed to Preview June 17, 2026 19:35 View deployment

mmabrouk requested a review from jp-agenta June 17, 2026 19:36

coderabbitai Bot reviewed Jun 17, 2026

View reviewed changes

		@@ -0,0 +1,7 @@
		# Agenta collector (the runner also falls back to the repo-root .env.test.local).
		AGENTA_HOST=http://144.76.237.122:8280/

	AGENTA_HOST=http://144.76.237.122:8280/
	AGENTA_HOST=http://localhost:8280

		AGENTA_HOST: ${AGENTA_HOST:-http://144.76.237.122:8280}
		AGENTA_API_KEY: ${AGENTA_API_KEY:-}

		_DEFAULT_TIMEOUT = float(os.getenv("AGENTA_AGENT_TIMEOUT", "180"))
		_DEFAULT_COMMAND = ["pnpm", "exec", "tsx", "src/cli.ts"]

Conversation

mmabrouk commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

How it works

What this PoC includes

What it defers (documented, not built)

Tests / notes

What to QA

Uh oh!

vercel Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mmabrouk commented Jun 17, 2026 •

edited

Loading

vercel Bot commented Jun 17, 2026 •

edited

Loading

coderabbitai Bot commented Jun 17, 2026 •

edited

Loading