Skip to content

feat(agent): run coding agents as an Agenta workflow over ACP#4721

Draft
mmabrouk wants to merge 9 commits into
mainfrom
feat/agent-harness-port
Draft

feat(agent): run coding agents as an Agenta workflow over ACP#4721
mmabrouk wants to merge 9 commits into
mainfrom
feat/agent-harness-port

Conversation

@mmabrouk

@mmabrouk mmabrouk commented Jun 17, 2026

Copy link
Copy Markdown
Member

Context

Agenta runs prompt workflows today: completion, chat, and the LLM judge. Each calls a model
once and returns one answer. This PoC adds a different kind of workflow. An agent runs a
loop: it reads its instructions, calls a model, runs a tool, reads the result, and calls the
model again, until the task is done. It sits behind the same /invoke contract, traces into
the same spans, and reads its config from the same playground.

The PoC proves one claim: the agent and the place it runs are both config, not code. You
change a dropdown to swap Pi for Claude Code, or local for a Daytona cloud sandbox, and
nothing above the seam changes.

How it works

A Python workflow handler receives /invoke and calls a Node sidecar over HTTP. The sidecar
drives a coding agent through the open-source rivet sandbox-agent daemon over the Agent
Client Protocol (ACP). The daemon starts an ACP adapter, which starts the agent (Pi or
Claude Code). Two orthogonal config axes swap independently: harness (pi | claude) and
sandbox (local | daytona).

The full write-up lives in docs/design/agent-workflows/. Read it in this order:

  • architecture.md: the data flow, the two containers, and the vocabulary. Start here.
  • ports-and-adapters.md: the neutral seam, the wire contract, and how the service picks an
    engine and a transport.
  • sessions.md: cold replay today, and the two paths to warm or persisted sessions tomorrow.
  • adapters/pi.md and adapters/claude-code.md: how each agent is wired, traced, and given
    tools.

What this PoC includes

  • The agent workflow behind /invoke, with harness and sandbox as playground dropdowns.
  • Two harnesses: Pi (default) and Claude Code.
  • Two sandboxes: local (processes inside the sidecar) and Daytona (a throwaway cloud machine).
  • A neutral, session-shaped port (Environment, Harness, AgentSession) with two engines
    behind it: rivet over ACP, and a legacy in-process Pi path.
  • Backend-resolved tools that keep the provider key and connection auth server-side. Pi takes
    them natively through a Pi extension; an MCP-capable harness takes them over MCP.
  • Tracing that nests the agent's run under the caller's /invoke span, with token usage
    rolled onto the workflow span.

What it defers (documented, not built)

  • A warm daemon and server-owned session storage (see sessions.md).
  • Live streaming to the client over the HTTP edge.
  • The multi-tenant filesystem jail for a shared daemon.
  • Registering the agent as a first-class backend workflow type with its own builtin URI.

Tests / notes

Verified live against the dockerized dev stack, with real /invoke requests through the
playground service route:

  • pi / local: 200, ~3s. Trace nests _agent → invoke_agent → turn 0 → chat, with the token
    total on _agent.
  • claude / local: 200, same nested tree. Confirms the harness swap is one config value.
  • pi / daytona: 200, ~10s. Runner-built trace, token total on _agent.

Python passes ruff format and ruff check. TypeScript compiles under tsc --strict.

For reviewers:

  • The playground model choice is not honored inside the Daytona sandbox yet. The in-sandbox
    Pi lacks the model catalog the local Pi loads. The model axis is honored on pi / local.
  • The Claude over MCP tool path is wired but not yet exercised end to end. Pi tools are
    verified end to end.

This PR lands the full agent-workflows PoC into main: the WP-2 service, the WP-7 tools, the
WP-8 ACP runtime, and the session-shaped port. The smaller stacked PRs (for example #4718)
are intermediate stacking artifacts. Review this one as the PoC.

What to QA

  • In the agent playground, run with harness=pi, sandbox=local. You get a reply, and the trace
    shows _agent → invoke_agent → turn 0 → chat with tokens on _agent.
  • Change harness to claude, run again. You still get a reply, and the trace shows the same
    nested shape.
  • Set sandbox=daytona with harness=pi. The reply comes back in about 10s with the full trace.
  • Regression: a normal completion or chat playground run still traces and answers as before.

mmabrouk added 4 commits June 16, 2026 20:08
… docs

- New agent workflow service wrapping the Pi harness, served same-origin like
  chat/completion at /services/agent/v0 (Python service + Node Pi sidecar, ports/adapters).
- Builtin 'agent' app type + create-app template; config is model + AGENTS.md.
- /inspect chat schema and OpenTelemetry tracing into Agenta.
- EE dev compose agent-pi sidecar; design docs under docs/design/agent-workflows.
Re-platform the agent workflow service to drive coding harnesses (Pi, Claude
Code) over the Agent Client Protocol through a rivet sandbox-agent daemon,
behind the unchanged Harness port and /invoke contract. The harness (pi/claude)
and sandbox (local/daytona) are editable playground config; tracing nests under
the /invoke span; tools are delivered Pi-native via a bundled extension; and the
model provider key resolves from the project vault.
Evolve the agent service ports toward the rivet sandbox-agent session shape, so the
rivet (ACP) and legacy in-process Pi backends share one clean, capability-aware port.

- ports.py: Environment + Harness seams, a first-class AgentSession (create/prompt/
  destroy), HarnessCapabilities, ContentBlock, Message, AgentEvent, structured AgentResult.
- harness.py: SubprocessHarness + HttpHarness share one wire contract (wire.py),
  replacing the pi_harness/pi_http_harness/rivet_harness trio. The engine is an env value.
- TS: shared protocol.ts; runPi/runRivet return the enriched result; runRivet probes
  getAgent() capabilities and routes tools by mcpTools, not the harness name; usage flows
  on the rivet path (split from PromptResponse.usage); one shared toolClient.ts replaces
  the triplicated /tools/call client.
- agent.py uses the session API; _select_backend upgrades pi/local to rivet when the
  selected harness/sandbox needs it. permission_policy added to /inspect.

Verified live: pi, rivet+pi+local, rivet+claude+local, rivet+pi+daytona; playground run
succeeds with usage; invoke_agent nests under the /invoke span. Design notes under
docs/design/agent-workflows/harness-port-redesign/.
@vercel

vercel Bot commented Jun 17, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jun 17, 2026 7:35pm

Request Review

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: cf0acae1-2003-4724-bc1b-bd9d9c65edd0

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent-harness-port

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mmabrouk mmabrouk changed the base branch from feat/agent-rivet-acp-wp8 to main June 17, 2026 11:37
…ss runtime

Address the god-module and the misleading package name:

- services/oss/src/harness/ (was agent_pi/): the engine-agnostic runtime — ports.py,
  transports.py (was harness.py), environment.py, wire.py. Named for the seam, not Pi;
  harness choice (pi/claude) lives inside the runtime, so there is no agent_claude.
- services/oss/src/agent/ (was the 470-line agent.py god-module): the Agenta workflow app
  — app.py (thin handler + backend wiring), inputs.py (request parsing), tools.py,
  secrets.py, tracing.py, client.py (shared backend access), schemas.py, config.py.

No behavior change. Verified live: a playground run answers 'REFACTOR-OK Lisbon' with usage.
"content-type": "application/json",
"content-length": Buffer.byteLength(payload),
});
res.end(payload);
…ite README

The TS runner's src/ had grown one work package at a time into a flat folder of ten
files with no signal of role. Group them and rewrite the stale README (it still called
this a 'Pi wrapper' and pointed at the moved agent.py):

  src/cli.ts, server.ts, protocol.ts   entrypoints + the wire contract
  src/engines/{pi,rivet}.ts            the two engines (was runPi.ts / runRivet.ts)
  src/tracing/otel.ts                  the tracers (was agenta-otel.ts)
  src/tools/{client,mcp-bridge,mcp-server}.ts   tool delivery (was toolClient/toolBridge*)
  src/extensions/agenta.ts             the Pi extension (was piExtension.ts)

No behavior change. Updated the fragile __dirname-relative paths in engines/rivet.ts
(PKG_ROOT) and tools/mcp-bridge.ts (the tsx bin + server path) for the new depth, and the
build-extension entry. Verified live: rivet+pi+local through the restarted sidecar answers
'Athens' with usage; tsc --strict and the extension build pass.
mmabrouk added 2 commits June 17, 2026 15:46
Replace the loose model/agents_md/harness/sandbox params with one `agent`
config element (x-ag-type: agent_config) carrying instructions, model, tools,
harness, sandbox, and permission policy. The playground renders it through a new
AgentConfigControl that reuses the existing controls: the model selector, the
tool picker (so Composio and builtin tools are finally selectable on the agent),
the enum selects, and a textarea. The backend reads it via resolve_agent_config
and falls back to the old shape so existing revisions keep running.

Verified live: the element renders with the tool picker, and a GitHub Composio
tool runs end to end on pi+local.
Tools worked locally but failed on Daytona. The in-sandbox Pi extension POSTed
each tool call to Agenta's /tools/call, but a firewalled or private backend does
not expose that to the remote cloud sandbox (the same reason tracing is built
from the event stream on Daytona rather than in-sandbox OTLP). The sandbox has
internet but cannot reach the backend, so the call failed and the model gave up.

Route the call through the runner, which can reach Agenta. The extension writes
the request to a file in a sandbox dir and polls for the response; the runner
watches the dir over the daemon filesystem API, calls /tools/call, and writes the
result back (tools/relay.ts). Local runs keep the direct path.

Verified programmatically: rivet+pi+daytona with a GitHub Composio tool now
returns the real login (was 'the tool failed twice'); local is unchanged.
@mmabrouk mmabrouk changed the title feat(agent): session-shaped harness/runtime port (rivet + legacy Pi) feat(agent): run coding agents as an Agenta workflow over ACP Jun 17, 2026
Move the raw work-package material (wp-1..wp-8, harness-port-redesign,
research) into scratch/ and add clean top-level pages a reviewer can read
top to bottom: a README index, architecture, ports-and-adapters, sessions,
and adapters/{pi,claude-code}. Update the three in-code references that
pointed at the moved doc paths.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 19

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
services/agent/docker/Dockerfile.dev (1)

7-42: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run the container as a non-root user.

There is no USER directive, so the runtime process defaults to root.

💡 Proposed fix
 RUN pnpm run build:extension
+RUN chown -R node:node /app

 ENV NODE_ENV=development \
     PORT=8765

 EXPOSE 8765

 # Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the
 # container runs as a non-root host uid.
+USER node
 CMD ["node_modules/.bin/tsx", "watch", "src/server.ts"]

Source: Linters/SAST tools

🟡 Minor comments (22)
docs/design/agent-workflows/scratch/harness-port-redesign/research.md-3-7 (1)

3-7: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the “source verified” file paths to current module locations.

This header cites services/oss/src/agent_pi/ports.py and services/oss/src/agent.py, but current stack context and code snippets point to services/oss/src/harness/ports.py and split agent modules. Keeping this exact avoids audit drift.

docs/design/agent-workflows/scratch/wp-2-agent-service/README.md-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Refresh the status line to avoid stale planning state.

Status: not started. conflicts with the implementation-plan/history in this same WP folder and can misdirect readers. Consider marking this as historical/archived instead.

docs/design/agent-workflows/scratch/wp-2-agent-service/README.md-55-55 (1)

55-55: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the WP identifier typo.

Success for this WP1 should be WP-2 in this document.

docs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.md-153-170 (1)

153-170: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Specify a language for this fenced block.

This block is missing a language tag and will continue tripping MD040.

Source: Linters/SAST tools

docs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.md-267-271 (1)

267-271: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the cd path in the login verification snippet.

The path misses the scratch/ segment, so this command won’t resolve in the documented tree.

🛠️ Suggested fix
-cd docs/design/agent-workflows/wp-1-pi-tracing/poc
+cd docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.py-183-184 (1)

183-184: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard against missing flag values in arg().

If a flag is passed without a value (or followed by another flag), this currently throws IndexError.

✅ Suggested fix
 def arg(name: str, default: str) -> str:
-    return sys.argv[sys.argv.index(name) + 1] if name in sys.argv else default
+    if name not in sys.argv:
+        return default
+    idx = sys.argv.index(name) + 1
+    if idx >= len(sys.argv) or sys.argv[idx].startswith("--"):
+        raise SystemExit(f"{name} requires a value")
+    return sys.argv[idx]
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.py-259-266 (1)

259-266: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Quote shell arguments when assembling pi_cmd.

provider, model, and paths are interpolated directly into a shell command. Quoting avoids malformed commands when values include shell-significant characters.

🔒 Suggested fix
+import shlex
 ...
-            pi_cmd = (
-                f"cd {run_dir} && TMPDIR={run_dir}/tmp "
-                f"pi -p {json.dumps(PROMPT)} "
-                f"--mode json --approve --provider {provider} --model {model} "
+            q_run_dir = shlex.quote(run_dir)
+            q_prompt = shlex.quote(PROMPT)
+            q_provider = shlex.quote(provider)
+            q_model = shlex.quote(model)
+            pi_cmd = (
+                f"cd {q_run_dir} && TMPDIR={q_run_dir}/tmp "
+                f"pi -p {q_prompt} "
+                f"--mode json --approve --provider {q_provider} --model {q_model} "
                 f"-t read,bash,edit,write,ls "
-                f"--session-dir {run_dir}/.pi-sessions --name {session_id} "
+                f"--session-dir {q_run_dir}/.pi-sessions --name {shlex.quote(session_id)} "
                 f"< /dev/null"
             )
docs/design/agent-workflows/scratch/harness-port-redesign/research.md-148-152 (1)

148-152: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced code block.

The fenced block here is missing a language tag and will keep triggering MD040.

✏️ Suggested fix
-```
+```text
 commandExecution, errorEvents, fileAttachments, fileChanges, images, itemStarted,
 mcpTools, permissions, planMode, questions, reasoning, sessionLifecycle, sharedProcess,
 status, streamingDeltas, textMessages, toolCalls, toolResults
</details>

<!-- cr-comment:v1:1b544a4fad3cea700526d400 -->

_Source: Linters/SAST tools_

</blockquote></details>
<details>
<summary>docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/bench_coldstart.py-33-42 (1)</summary><blockquote>

`33-42`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_

**Ensure sandbox teardown runs even when a benchmark iteration errors.**

`daytona.delete(sb)` is currently best-effort on the happy path only; an exception after `create()` can leak a sandbox.




<details>
<summary>♻️ Suggested fix</summary>

```diff
 for snap in SNAPSHOTS:
     times: list[float] = []
     for i in range(N):
-        t = time.monotonic()
-        sb = daytona.create(
-            CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0),
-            timeout=120,
-        )
-        dt = time.monotonic() - t
-        times.append(dt)
-        print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s  state={sb.state}", flush=True)
-        daytona.delete(sb)
+        sb = None
+        t = time.monotonic()
+        try:
+            sb = daytona.create(
+                CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0),
+                timeout=120,
+            )
+            dt = time.monotonic() - t
+            times.append(dt)
+            print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s  state={sb.state}", flush=True)
+        finally:
+            if sb is not None:
+                daytona.delete(sb)
     results[snap] = times
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.py-62-71 (1)

62-71: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add error handling after POST commit.

Line 62–67 makes a POST to the commit endpoint but line 69 calls resp.json() without first checking for errors.

The GET request (lines 25–30) properly calls raise_for_status(), but the POST does not. If the commit fails, resp.json() could raise an exception if the response body is not JSON (e.g., on a 500 error).

         resp = client.post(
             f"{BASE}/api/workflows/revisions/commit",
             params={"project_id": PROJ},
             headers=H,
             json=body,
         )
+        resp.raise_for_status()
         print("commit status:", resp.status_code)
docs/design/agent-workflows/scratch/wp-1-pi-tracing/integrating-the-tracing-extension.md-23-23 (1)

23-23: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add languages to fenced code blocks to satisfy markdownlint.

These fences are missing language identifiers (MD040), which can fail doc lint pipelines.

Also applies to: 147-147, 161-161

Source: Linters/SAST tools

docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/README.md-15-15 (1)

15-15: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Specify language for fenced blocks to clear MD040.

Both fences should include a language token (for example text) to keep markdownlint clean.

Also applies to: 67-67

Source: Linters/SAST tools

docs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.md-41-42 (1)

41-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix Python source path references to services/oss/src/agent/app.py.

The doc currently points readers to services/oss/src/agent.py, but the _agent flow is implemented under services/oss/src/agent/app.py, so navigation/debugging from this guide will be misleading.

Suggested doc fix
-1. **Capture (Python, `services/oss/src/agent.py`).** Inside the instrumented
+1. **Capture (Python, `services/oss/src/agent/app.py`).** Inside the instrumented
...
-- `services/oss/src/agent.py` — `_trace_context()` captures the workflow span context.
+- `services/oss/src/agent/app.py` — `_trace_context()` captures the workflow span context.

Also applies to: 110-110

docs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.md-25-25 (1)

25-25: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers on fenced code blocks (MD040).

These code fences are missing language tags, which can break markdownlint in CI.

Also applies to: 101-101

Source: Linters/SAST tools

docs/design/agent-workflows/scratch/wp-1-pi-tracing/README.md-41-41 (1)

41-41: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align OTLP export docs with the implemented auth contract.

This line documents ?project_id=<uuid>, but the companion integration doc and POC exporter use API-key-only routing without project_id. Keeping both patterns in the same WP docs is confusing for adopters.

Suggested doc fix
-- Export over OTLP/HTTP protobuf with `Authorization: ApiKey <key>` and `?project_id=<uuid>`.
+- Export over OTLP/HTTP protobuf with `Authorization: ApiKey <key>` (project resolved from the key).
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/run.ts-173-178 (1)

173-178: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reset runConfig.traceId before each prompt iteration.

Without clearing it, a failed prompt can leave the previous run’s trace ID in place and be re-recorded as if it belonged to the current prompt.

Suggested fix
   for (let i = 0; i < scenario.prompts.length; i++) {
     const p = scenario.prompts[i];
     console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`);
+    runConfig.traceId = undefined;
     await session.prompt(p);
     if (runConfig.traceId) traceIds.push(runConfig.traceId);
   }
services/oss/src/agent/tools.py-100-105 (1)

100-105: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing error handling for malformed JSON response.

If the backend returns a non-JSON response (e.g., an HTML error page from a proxy), response.json() will raise JSONDecodeError. Consider wrapping in try/except to provide a clearer error message.

Proposed fix
     if response.status_code >= 400:
         raise RuntimeError(
             f"Tool resolution failed (HTTP {response.status_code}): {response.text[:500]}"
         )

-    data = response.json()
+    try:
+        data = response.json()
+    except Exception as exc:
+        raise RuntimeError(
+            f"Tool resolution returned invalid JSON: {response.text[:200]}"
+        ) from exc
services/oss/src/agent/config.py-66-70 (1)

66-70: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing error handling for malformed agent.json.

If agent.json exists but contains invalid JSON, json.loads will raise a JSONDecodeError that propagates up, potentially crashing the agent startup with an unclear error. Consider catching and logging/falling back gracefully, similar to how a missing file is handled.

Proposed fix
     meta_path = base / "agent.json"
     if meta_path.exists():
-        meta = json.loads(meta_path.read_text(encoding="utf-8"))
-        model = meta.get("model") or DEFAULT_MODEL
-        tools = meta.get("tools", []) or []
+        try:
+            meta = json.loads(meta_path.read_text(encoding="utf-8"))
+            model = meta.get("model") or DEFAULT_MODEL
+            tools = meta.get("tools", []) or []
+        except json.JSONDecodeError as exc:
+            from agenta.sdk.utils.logging import get_module_logger
+            get_module_logger(__name__).warning(
+                "agent: invalid agent.json, using defaults: %s", exc
+            )
services/agent/src/server.ts-71-74 (1)

71-74: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stack trace exposure in error responses.

The catch block exposes err.stack to the client, which can leak internal file paths, library versions, and implementation details. Consider logging the full error to stderr and returning only a generic message to the client.

Proposed fix
   } catch (err) {
-    const message = err instanceof Error ? err.stack ?? err.message : String(err);
+    const detail = err instanceof Error ? err.stack ?? err.message : String(err);
+    process.stderr.write(`[pi-wrapper] request error: ${detail}\n`);
+    const message = err instanceof Error ? err.message : "Internal server error";
     return send(res, 500, { ok: false, error: message });
   }

Source: Linters/SAST tools

sdks/python/agenta/sdk/engines/running/utils.py-248-252 (1)

248-252: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Catalog description is outdated and understates supported harnesses.

At Line 251, the description says this workflow runs on “the Pi harness,” but this PR wires selectable harness/sandbox combinations. The catalog text should be generalized to avoid misleading users.

web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx-139-157 (1)

139-157: ⚠️ Potential issue | 🟡 Minor

Use stable keys for tool rows instead of array index.

Tools are added/deleted via handleAddTool and handleToolDelete, which mutates the array. Index-based keys cause React to remap component state to the wrong tool row after these mutations, producing stale editor state.

Use tool.agenta_metadata?.toolCode ?? toolName(tool) ?? index as a stable key, since toolCode is populated when tools are selected and toolName extracts the function name fallback.

Suggested fix
 {tools.map((tool, index) => {
+    const stableToolKey =
+        typeof tool === "object" && tool
+            ? String(
+                  (tool as Record<string, any>)?.agenta_metadata?.toolCode ??
+                      toolName(tool) ??
+                      index,
+              )
+            : String(index)
     const control = (
         <ToolItemControl
-            key={`tool-${index}`}
             value={tool}
             onChange={(v) => handleToolChange(index, v)}
             onDelete={disabled ? undefined : () => handleToolDelete(index)}
             disabled={disabled}
         />
     )
     return EditorProvider ? (
         <EditorProvider
-            key={`tool-editor-${index}`}
+            key={`tool-editor-${stableToolKey}`}
             codeOnly
             language="json"
             showToolbar={false}
             enableTokens={false}
-            id={`agent-tool-editor-${index}`}
+            id={`agent-tool-editor-${stableToolKey}`}
         >
             {control}
         </EditorProvider>
     ) : (
         control
     )
 })}
services/agent/README.md-24-40 (1)

24-40: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced block.

This fence triggers markdownlint MD040 and can be fixed by marking it as text.

💡 Proposed fix
-```
+```text
 src/
   cli.ts              entrypoint: stdin/stdout (subprocess transport)
   server.ts           entrypoint: HTTP sidecar on :8765
@@
-```
+```

Source: Linters/SAST tools

🧹 Nitpick comments (3)
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/status.md (1)

94-98: ⚡ Quick win

Document the Daytona Pi model catalog limitation as a follow-up.

The known limitation (freshly-provisioned Pi advertises only model: default on Daytona) is well-documented here. Consider opening a follow-up issue to settle the in-sandbox Pi model config if not already tracked.

Do you want me to help create a follow-up issue to track the Daytona Pi model catalog setup?

docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.py (1)

35-53: 💤 Low value

Add defensive checks for schema structure.

Lines 35–53 directly access nested dict keys (data["schemas"]["parameters"]["properties"]) without validating the schema structure. If the revision data has a different structure, a KeyError will be raised.

For a PoC this is acceptable, but for production robustness, add:

+        schemas = data.get("schemas", {})
+        parameters = schemas.get("parameters", {})
+        props = parameters.get("properties", {})
-        props = data["schemas"]["parameters"]["properties"]
services/oss/src/agent/schemas.py (1)

13-20: ⚡ Quick win

Consider importing defaults from config.py to reduce duplication.

_DEFAULT_MODEL and _DEFAULT_AGENTS_MD are duplicated here and in config.py. While the comment acknowledges the need to "keep in sync," importing from config.py would eliminate drift risk.

-_DEFAULT_MODEL = "gpt-5.5"
-_DEFAULT_AGENTS_MD = (
-    "You are a friendly hello-world agent running on the Agenta agent service.\n\n"
-    "- Greet the user warmly.\n"
-    "- Answer the user's message in one or two short sentences."
-)
+from oss.src.agent.config import DEFAULT_MODEL as _DEFAULT_MODEL, DEFAULT_AGENTS_MD as _DEFAULT_AGENTS_MD

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 27cfac53-0d07-4c8c-b8eb-9188f96ee1e4

📥 Commits

Reviewing files that changed from the base of the PR and between 5722ff7 and 7be759e.

⛔ Files ignored due to path filters (2)
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
  • services/agent/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (102)
  • .gitignore
  • api/oss/src/apis/fastapi/tools/models.py
  • api/oss/src/apis/fastapi/tools/router.py
  • api/oss/src/core/tools/dtos.py
  • api/oss/src/core/tools/exceptions.py
  • api/oss/src/core/tools/service.py
  • docs/design/agent-workflows/README.md
  • docs/design/agent-workflows/adapters/claude-code.md
  • docs/design/agent-workflows/adapters/pi.md
  • docs/design/agent-workflows/architecture.md
  • docs/design/agent-workflows/ports-and-adapters.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/README.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/implementation.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/plan.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/proposal.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/research.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/status.md
  • docs/design/agent-workflows/scratch/research/auth-secrets.md
  • docs/design/agent-workflows/scratch/research/daytona-sandbox.md
  • docs/design/agent-workflows/scratch/research/diskless-in-memory-config.md
  • docs/design/agent-workflows/scratch/research/open-questions.md
  • docs/design/agent-workflows/scratch/research/otel-instrumentation.md
  • docs/design/agent-workflows/scratch/research/pi-interaction.md
  • docs/design/agent-workflows/scratch/research/sandbox-sharing.md
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/README.md
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/integrating-the-tracing-extension.md
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/.env.example
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/README.md
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/agenta-otel.ts
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/package.json
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/run.ts
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.md
  • docs/design/agent-workflows/scratch/wp-2-agent-service/README.md
  • docs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.md
  • docs/design/agent-workflows/scratch/wp-2-agent-service/qa.md
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/README.md
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/README.md
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/bench_coldstart.py
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/build_snapshot.py
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/cleanup.py
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.py
  • docs/design/agent-workflows/scratch/wp-4-multi-message-output/README.md
  • docs/design/agent-workflows/scratch/wp-5-chat-vs-completion/README.md
  • docs/design/agent-workflows/scratch/wp-6-workflow-type-and-template/README.md
  • docs/design/agent-workflows/scratch/wp-7-tools/README.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/README.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/architecture.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/context.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/isolation-and-fork.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/plan.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.py
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.py
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/debug-events.ts
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/dump-full.ts
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/package.json
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/spike.ts
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/research.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/status.md
  • docs/design/agent-workflows/sessions.md
  • hosting/docker-compose/ee/docker-compose.dev.yml
  • sdks/python/agenta/sdk/engines/running/interfaces.py
  • sdks/python/agenta/sdk/engines/running/utils.py
  • services/agent/.dockerignore
  • services/agent/README.md
  • services/agent/config/AGENTS.md
  • services/agent/config/agent.json
  • services/agent/docker/Dockerfile.dev
  • services/agent/package.json
  • services/agent/scripts/build-extension.mjs
  • services/agent/src/cli.ts
  • services/agent/src/engines/pi.ts
  • services/agent/src/engines/rivet.ts
  • services/agent/src/extensions/agenta.ts
  • services/agent/src/protocol.ts
  • services/agent/src/server.ts
  • services/agent/src/tools/client.ts
  • services/agent/src/tools/mcp-bridge.ts
  • services/agent/src/tools/mcp-server.ts
  • services/agent/src/tools/relay.ts
  • services/agent/src/tracing/otel.ts
  • services/agent/tsconfig.json
  • services/entrypoints/main.py
  • services/oss/src/agent/__init__.py
  • services/oss/src/agent/app.py
  • services/oss/src/agent/client.py
  • services/oss/src/agent/config.py
  • services/oss/src/agent/inputs.py
  • services/oss/src/agent/schemas.py
  • services/oss/src/agent/secrets.py
  • services/oss/src/agent/tools.py
  • services/oss/src/agent/tracing.py
  • services/oss/src/harness/__init__.py
  • services/oss/src/harness/environment.py
  • services/oss/src/harness/ports.py
  • services/oss/src/harness/transports.py
  • services/oss/src/harness/wire.py
  • web/oss/src/components/pages/app-management/components/CreateAppDropdown/index.tsx
  • web/oss/src/components/pages/app-management/modals/CreateAppTypeModal/index.tsx
  • web/oss/src/components/pages/prompts/assets/iconHelpers.tsx
  • web/packages/agenta-entities/src/workflow/state/appUtils.ts
  • web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx
  • web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/SchemaPropertyRenderer.tsx

Comment on lines +489 to +520
async def resolve_agent_tools(
self,
*,
project_id: UUID,
tools: List[AgentToolReference],
) -> AgentToolsResolution:
"""Resolve an agent's tool references into model-ready specs.

``builtin`` references pass through as names. ``composio`` references are
validated against the project's connections up front and enriched from the
catalog (description + input schema), so the model never sees a stale schema
and the invoke fails fast on a missing/invalid connection rather than mid-loop.
"""
builtins: List[str] = []
custom: List[ResolvedAgentTool] = []

for ref in tools:
if isinstance(ref, AgentBuiltinTool):
if ref.name:
builtins.append(ref.name)
continue

if isinstance(ref, AgentComposioTool):
custom.append(
await self._resolve_composio_tool(
project_id=project_id,
ref=ref,
)
)

return AgentToolsResolution(builtins=builtins, custom=custom)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce unique resolved tool names to avoid cross-connection misrouting.

Line 559 defaults names to {integration}__{action} and drops connection, so two refs for different connections can resolve to the same name. Downstream consumers key by name (e.g., MCP map), which can overwrite one tool with another and execute against the wrong account.

💡 Suggested fix
 async def resolve_agent_tools(
@@
-        builtins: List[str] = []
-        custom: List[ResolvedAgentTool] = []
+        builtins: List[str] = []
+        custom: List[ResolvedAgentTool] = []
+        seen_names: set[str] = set()
@@
             if isinstance(ref, AgentComposioTool):
-                custom.append(
-                    await self._resolve_composio_tool(
-                        project_id=project_id,
-                        ref=ref,
-                    )
-                )
+                resolved = await self._resolve_composio_tool(
+                    project_id=project_id,
+                    ref=ref,
+                )
+                if resolved.name in seen_names:
+                    raise ToolSlugInvalidError(
+                        slug=resolved.name,
+                        detail="Duplicate tool name in resolved tools list",
+                    )
+                seen_names.add(resolved.name)
+                custom.append(resolved)
@@
-        name = ref.name or f"{ref.integration}__{ref.action}"
+        name = ref.name or f"{ref.integration}__{ref.action}__{ref.connection}"

Also applies to: 559-562

@@ -0,0 +1,7 @@
# Agenta collector (the runner also falls back to the repo-root .env.test.local).
AGENTA_HOST=http://144.76.237.122:8280/

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a neutral placeholder for AGENTA_HOST in the sample env.

A hardcoded real host in .env.example can unintentionally route traces to an external environment during quick-start runs.

Suggested change
-AGENTA_HOST=http://144.76.237.122:8280/
+AGENTA_HOST=http://localhost:8280
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
AGENTA_HOST=http://144.76.237.122:8280/
AGENTA_HOST=http://localhost:8280

Comment on lines +397 to +404
pi.on("agent_end", async (event: any) => {
if (!agentSpan) return;
setOutput(agentSpan, lastAssistantText(event?.messages));
agentSpan.end();
agentSpan = undefined;
agentCtx = undefined;
lastContextMessages = undefined;
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Close and clear dangling tool spans at run end.

Run teardown does not currently drain toolSpans, so interrupted runs can leave stale spans in memory and contaminate later runs in the same process.

Suggested fix
   pi.on("agent_end", async (event: any) => {
     if (!agentSpan) return;
+    for (const span of toolSpans.values()) {
+      span.setStatus({ code: SpanStatusCode.ERROR, message: "closed on agent_end" });
+      span.end();
+    }
+    toolSpans.clear();
+
+    if (llmSpan) {
+      llmSpan.setStatus({ code: SpanStatusCode.ERROR, message: "closed on agent_end" });
+      llmSpan.end();
+      llmSpan = undefined;
+    }
+
     setOutput(agentSpan, lastAssistantText(event?.messages));
     agentSpan.end();
     agentSpan = undefined;
     agentCtx = undefined;
     lastContextMessages = undefined;
+    pendingPrompt = undefined;
   });

Comment on lines +146 to +183
const { session } = await createAgentSession({
cwd,
model,
authStorage,
modelRegistry,
tools: ["read", "bash", "edit", "write", "ls"],
sessionManager: SessionManager.inMemory(cwd),
resourceLoader: loader,
});

// Hand the session id + model to the extension so spans carry them.
runConfig.sessionId = session.sessionId;
runConfig.provider = model.provider;
runConfig.requestModel = model.id;

session.subscribe((event: any) => {
if (
event.type === "message_update" &&
event.assistantMessageEvent?.type === "text_delta"
) {
process.stdout.write(event.assistantMessageEvent.delta);
} else if (event.type === "tool_execution_start") {
process.stdout.write(`\n[tool] ${event.toolName}\n`);
}
});

const traceIds: string[] = [];
for (let i = 0; i < scenario.prompts.length; i++) {
const p = scenario.prompts[i];
console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`);
await session.prompt(p);
if (runConfig.traceId) traceIds.push(runConfig.traceId);
}

console.log("\n\n[run] flushing spans to Agenta...");
session.dispose();
await shutdownTracing();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move session disposal and tracing shutdown into finally.

If any prompt throws, the current flow skips session.dispose() and shutdownTracing(), which can lose final spans and leave session resources open.

Suggested fix
   const { session } = await createAgentSession({
     cwd,
     model,
     authStorage,
     modelRegistry,
     tools: ["read", "bash", "edit", "write", "ls"],
     sessionManager: SessionManager.inMemory(cwd),
     resourceLoader: loader,
   });

-  // Hand the session id + model to the extension so spans carry them.
-  runConfig.sessionId = session.sessionId;
-  runConfig.provider = model.provider;
-  runConfig.requestModel = model.id;
-
-  session.subscribe((event: any) => {
-    ...
-  });
-
-  const traceIds: string[] = [];
-  for (let i = 0; i < scenario.prompts.length; i++) {
-    ...
-    await session.prompt(p);
-    if (runConfig.traceId) traceIds.push(runConfig.traceId);
-  }
-
-  console.log("\n\n[run] flushing spans to Agenta...");
-  session.dispose();
-  await shutdownTracing();
+  try {
+    runConfig.sessionId = session.sessionId;
+    runConfig.provider = model.provider;
+    runConfig.requestModel = model.id;
+
+    session.subscribe((event: any) => {
+      ...
+    });
+
+    const traceIds: string[] = [];
+    for (let i = 0; i < scenario.prompts.length; i++) {
+      ...
+      await session.prompt(p);
+      if (runConfig.traceId) traceIds.push(runConfig.traceId);
+    }
+  } finally {
+    session.dispose();
+    await shutdownTracing();
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const { session } = await createAgentSession({
cwd,
model,
authStorage,
modelRegistry,
tools: ["read", "bash", "edit", "write", "ls"],
sessionManager: SessionManager.inMemory(cwd),
resourceLoader: loader,
});
// Hand the session id + model to the extension so spans carry them.
runConfig.sessionId = session.sessionId;
runConfig.provider = model.provider;
runConfig.requestModel = model.id;
session.subscribe((event: any) => {
if (
event.type === "message_update" &&
event.assistantMessageEvent?.type === "text_delta"
) {
process.stdout.write(event.assistantMessageEvent.delta);
} else if (event.type === "tool_execution_start") {
process.stdout.write(`\n[tool] ${event.toolName}\n`);
}
});
const traceIds: string[] = [];
for (let i = 0; i < scenario.prompts.length; i++) {
const p = scenario.prompts[i];
console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`);
await session.prompt(p);
if (runConfig.traceId) traceIds.push(runConfig.traceId);
}
console.log("\n\n[run] flushing spans to Agenta...");
session.dispose();
await shutdownTracing();
const { session } = await createAgentSession({
cwd,
model,
authStorage,
modelRegistry,
tools: ["read", "bash", "edit", "write", "ls"],
sessionManager: SessionManager.inMemory(cwd),
resourceLoader: loader,
});
try {
// Hand the session id + model to the extension so spans carry them.
runConfig.sessionId = session.sessionId;
runConfig.provider = model.provider;
runConfig.requestModel = model.id;
session.subscribe((event: any) => {
if (
event.type === "message_update" &&
event.assistantMessageEvent?.type === "text_delta"
) {
process.stdout.write(event.assistantMessageEvent.delta);
} else if (event.type === "tool_execution_start") {
process.stdout.write(`\n[tool] ${event.toolName}\n`);
}
});
const traceIds: string[] = [];
for (let i = 0; i < scenario.prompts.length; i++) {
const p = scenario.prompts[i];
console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`);
await session.prompt(p);
if (runConfig.traceId) traceIds.push(runConfig.traceId);
}
} finally {
console.log("\n\n[run] flushing spans to Agenta...");
session.dispose();
await shutdownTracing();
}

Comment on lines +446 to +447
AGENTA_HOST: ${AGENTA_HOST:-http://144.76.237.122:8280}
AGENTA_API_KEY: ${AGENTA_API_KEY:-}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid a public-IP fallback for AGENTA_HOST in dev.

Line 446 defaults to an external host, which can unintentionally send local trace/tool payloads outside the dev stack when AGENTA_HOST is unset.

Suggested fix
-            AGENTA_HOST: ${AGENTA_HOST:-http://144.76.237.122:8280}
+            AGENTA_HOST: ${AGENTA_HOST:-http://api:8000}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
AGENTA_HOST: ${AGENTA_HOST:-http://144.76.237.122:8280}
AGENTA_API_KEY: ${AGENTA_API_KEY:-}
AGENTA_HOST: ${AGENTA_HOST:-http://api:8000}
AGENTA_API_KEY: ${AGENTA_API_KEY:-}

Comment on lines +225 to +241
/** Build a parent Context from a W3C traceparent string, or undefined if absent/invalid. */
function parentContext(traceparent?: string): Context | undefined {
if (!traceparent) return undefined;
const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
traceparent.trim(),
);
if (!match) return undefined;
const [, traceId, spanId, flags] = match;
const spanContext: SpanContext = {
traceId,
spanId,
// Honor the incoming sampled bit; default to sampled so child spans record.
traceFlags: (parseInt(flags, 16) & 1) === 1 ? TraceFlags.SAMPLED : TraceFlags.NONE,
isRemote: true,
};
return trace.setSpanContext(ROOT_CONTEXT, spanContext);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n --type=ts '\bbaggage\b' services/agent/src/engines/pi.ts services/agent/src/engines/rivet.ts services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 376


🏁 Script executed:

# Check the current implementation of parentContext and surrounding context
sed -n '220,245p' services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 1035


🏁 Script executed:

# Check the RunConfig type definition and the lines mentioned
sed -n '250,260p' services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 569


🏁 Script executed:

# Check the specific lines mentioned: 423-431, 468-469
sed -n '420,435p' services/agent/src/tracing/otel.ts
sed -n '465,475p' services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 1212


🏁 Script executed:

# Check the imports to see what's currently imported from OpenTelemetry
head -30 services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 1886


🏁 Script executed:

# Check if propagation is imported and what the complete imports look like
sed -n '1,50p' services/agent/src/tracing/otel.ts | grep -A 20 "^import"

Repository: Agenta-AI/agenta

Length of output: 478


🏁 Script executed:

# Check line 726-727 to see the other call site
sed -n '720,732p' services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 692


🏁 Script executed:

# Check rivet.ts to see if it has the same pattern issue
sed -n '715,730p' services/agent/src/engines/rivet.ts

Repository: Agenta-AI/agenta

Length of output: 729


🏁 Script executed:

# Check if propagation is available in `@opentelemetry/api` by searching the codebase for its usage
rg -n 'propagation\.' services/agent/src --type=ts

Repository: Agenta-AI/agenta

Length of output: 42


🏁 Script executed:

# Also check what version of `@opentelemetry/api` is being used
grep -A 5 '"`@opentelemetry/api`"' package.json

Repository: Agenta-AI/agenta

Length of output: 106


🏁 Script executed:

# Verify the RunConfig object assignment at lines 423-431 - check what fields are actually assigned
sed -n '418,435p' services/agent/src/tracing/otel.ts

Repository: Agenta-AI/agenta

Length of output: 614


🏁 Script executed:

# Find package.json in the repo
find . -name "package.json" -type f | head -5

Repository: Agenta-AI/agenta

Length of output: 339


🏁 Script executed:

# Check if propagation is being used anywhere else in the codebase
rg 'propagation' services/agent/src --type=ts -A 2 -B 2

Repository: Agenta-AI/agenta

Length of output: 42


🏁 Script executed:

# Look for any W3C baggage related imports or patterns
rg 'baggage|Baggage' services/agent/src --type=ts -B 2 -A 2

Repository: Agenta-AI/agenta

Length of output: 1583


baggage is wired upstream but never propagated into tracing context.

RunConfig accepts baggage from callers (pi.ts:180, rivet.ts:724) but it is not stored in the config object during initialization, not passed to parentContext(), and not handled by parentContext() which only parses traceparent. Baggage context is dropped on both Pi and Rivet paths.

The fix requires:

  • Adding baggage: init.baggage to the config object assignment (line 423-431)
  • Updating parentContext() signature to accept baggage?: string
  • Using OpenTelemetry's standard propagation.extract() API to handle both W3C traceparent and baggage headers
  • Passing config.baggage to all parentContext() calls (lines 468, 726-727)
Suggested fix
 import {
   context,
+  propagation,
   ROOT_CONTEXT,
   trace,
@@
-function parentContext(traceparent?: string): Context | undefined {
-  if (!traceparent) return undefined;
-  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
-    traceparent.trim(),
-  );
-  if (!match) return undefined;
-  const [, traceId, spanId, flags] = match;
-  const spanContext: SpanContext = {
-    traceId,
-    spanId,
-    traceFlags: (parseInt(flags, 16) & 1) === 1 ? TraceFlags.SAMPLED : TraceFlags.NONE,
-    isRemote: true,
-  };
-  return trace.setSpanContext(ROOT_CONTEXT, spanContext);
+function parentContext(traceparent?: string, baggage?: string): Context | undefined {
+  const carrier: Record<string, string> = {};
+  if (traceparent?.trim()) carrier.traceparent = traceparent.trim();
+  if (baggage?.trim()) carrier.baggage = baggage.trim();
+  if (!carrier.traceparent && !carrier.baggage) return undefined;
+  const extracted = propagation.extract(ROOT_CONTEXT, carrier);
+  return trace.getSpanContext(extracted) ? extracted : undefined;
 }
@@
   const config: RunConfig = {
@@
+    baggage: init.baggage,
@@
-      const parent = parentContext(config.traceparent);
+      const parent = parentContext(config.traceparent, config.baggage);
@@
-    const parent = parentContext(init.traceparent);
+    const parent = parentContext(init.traceparent, init.baggage);
🧰 Tools
🪛 OpenGrep (1.22.0)

[ERROR] 228-230: Dynamic command passed to child_process.exec/execSync. Use child_process.execFile or spawn with an argument array instead.

(coderabbit.command-injection.exec-js)

Comment on lines +93 to +97
try:
session = harness.create_session(session_config)
result = await session.prompt(msgs)
await session.destroy()
finally:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Session cleanup is skipped on prompt failures.

At Line 94-97, session.destroy() executes only on success. If session.prompt(...) raises, session resources remain undisposed.

Suggested fix
     await harness.setup()
     try:
         session = harness.create_session(session_config)
-        result = await session.prompt(msgs)
-        await session.destroy()
+        try:
+            result = await session.prompt(msgs)
+        finally:
+            await session.destroy()
     finally:
         await harness.shutdown()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
session = harness.create_session(session_config)
result = await session.prompt(msgs)
await session.destroy()
finally:
await harness.setup()
try:
session = harness.create_session(session_config)
try:
result = await session.prompt(msgs)
finally:
await session.destroy()
finally:
await harness.shutdown()

Comment on lines +43 to +46
instructions=agent.get("instructions") or config.agents_md,
model=agent.get("model") or config.model,
tools=_as_list(agent.get("tools")),
harness=(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Default tool configuration is dropped in the agent path.

At Line 45, omitting agent.tools resolves to [], while the legacy path falls back to config.tools. This makes tool availability depend on request shape and can silently disable configured tools.

Suggested fix
     if isinstance(agent, dict):
+        raw_tools = agent.get("tools")
         return RunConfig(
             instructions=agent.get("instructions") or config.agents_md,
             model=agent.get("model") or config.model,
-            tools=_as_list(agent.get("tools")),
+            tools=_as_list(config.tools if raw_tools is None else raw_tools),
             harness=(
                 agent.get("harness") or os.getenv("AGENTA_AGENT_HARNESS", "pi")
             ).lower(),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
instructions=agent.get("instructions") or config.agents_md,
model=agent.get("model") or config.model,
tools=_as_list(agent.get("tools")),
harness=(
if isinstance(agent, dict):
raw_tools = agent.get("tools")
return RunConfig(
instructions=agent.get("instructions") or config.agents_md,
model=agent.get("model") or config.model,
tools=_as_list(config.tools if raw_tools is None else raw_tools),
harness=(

Comment on lines +164 to +167
content = raw.get("content", "")
if isinstance(content, list):
content = [ContentBlock.from_raw(block) for block in content]
return cls(role=str(raw["role"]), content=content)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Message.from_raw can produce invalid wire payloads for dict content.

At Line 164-167, dict content is not normalized. Later (Line 146-150), to_wire treats non-string content as a list, so dict content is iterated as keys, corrupting the outgoing message.

Suggested fix
         content = raw.get("content", "")
-        if isinstance(content, list):
+        if isinstance(content, dict):
+            content = [ContentBlock.from_raw(content)]
+        elif isinstance(content, list):
             content = [ContentBlock.from_raw(block) for block in content]
+        elif not isinstance(content, str):
+            content = str(content)
         return cls(role=str(raw["role"]), content=content)

Comment on lines +32 to +33
_DEFAULT_TIMEOUT = float(os.getenv("AGENTA_AGENT_TIMEOUT", "180"))
_DEFAULT_COMMAND = ["pnpm", "exec", "tsx", "src/cli.ts"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Import-time timeout parsing can crash the service on bad env values.

At Line 32, a non-numeric AGENTA_AGENT_TIMEOUT raises ValueError during import, taking down startup instead of falling back safely.

Suggested fix
-_DEFAULT_TIMEOUT = float(os.getenv("AGENTA_AGENT_TIMEOUT", "180"))
+def _resolve_default_timeout() -> float:
+    raw = os.getenv("AGENTA_AGENT_TIMEOUT", "180")
+    try:
+        value = float(raw)
+        return value if value > 0 else 180.0
+    except (TypeError, ValueError):
+        log.warning("Invalid AGENTA_AGENT_TIMEOUT=%r; using 180s", raw)
+        return 180.0
+
+_DEFAULT_TIMEOUT = _resolve_default_timeout()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants