feat(agent): run coding agents as an Agenta workflow over ACP#4721
feat(agent): run coding agents as an Agenta workflow over ACP#4721mmabrouk wants to merge 9 commits into
Conversation
… docs - New agent workflow service wrapping the Pi harness, served same-origin like chat/completion at /services/agent/v0 (Python service + Node Pi sidecar, ports/adapters). - Builtin 'agent' app type + create-app template; config is model + AGENTS.md. - /inspect chat schema and OpenTelemetry tracing into Agenta. - EE dev compose agent-pi sidecar; design docs under docs/design/agent-workflows.
Re-platform the agent workflow service to drive coding harnesses (Pi, Claude Code) over the Agent Client Protocol through a rivet sandbox-agent daemon, behind the unchanged Harness port and /invoke contract. The harness (pi/claude) and sandbox (local/daytona) are editable playground config; tracing nests under the /invoke span; tools are delivered Pi-native via a bundled extension; and the model provider key resolves from the project vault.
Evolve the agent service ports toward the rivet sandbox-agent session shape, so the rivet (ACP) and legacy in-process Pi backends share one clean, capability-aware port. - ports.py: Environment + Harness seams, a first-class AgentSession (create/prompt/ destroy), HarnessCapabilities, ContentBlock, Message, AgentEvent, structured AgentResult. - harness.py: SubprocessHarness + HttpHarness share one wire contract (wire.py), replacing the pi_harness/pi_http_harness/rivet_harness trio. The engine is an env value. - TS: shared protocol.ts; runPi/runRivet return the enriched result; runRivet probes getAgent() capabilities and routes tools by mcpTools, not the harness name; usage flows on the rivet path (split from PromptResponse.usage); one shared toolClient.ts replaces the triplicated /tools/call client. - agent.py uses the session API; _select_backend upgrades pi/local to rivet when the selected harness/sandbox needs it. permission_policy added to /inspect. Verified live: pi, rivet+pi+local, rivet+claude+local, rivet+pi+daytona; playground run succeeds with usage; invoke_agent nests under the /invoke span. Design notes under docs/design/agent-workflows/harness-port-redesign/.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…ss runtime Address the god-module and the misleading package name: - services/oss/src/harness/ (was agent_pi/): the engine-agnostic runtime — ports.py, transports.py (was harness.py), environment.py, wire.py. Named for the seam, not Pi; harness choice (pi/claude) lives inside the runtime, so there is no agent_claude. - services/oss/src/agent/ (was the 470-line agent.py god-module): the Agenta workflow app — app.py (thin handler + backend wiring), inputs.py (request parsing), tools.py, secrets.py, tracing.py, client.py (shared backend access), schemas.py, config.py. No behavior change. Verified live: a playground run answers 'REFACTOR-OK Lisbon' with usage.
| "content-type": "application/json", | ||
| "content-length": Buffer.byteLength(payload), | ||
| }); | ||
| res.end(payload); |
…ite README
The TS runner's src/ had grown one work package at a time into a flat folder of ten
files with no signal of role. Group them and rewrite the stale README (it still called
this a 'Pi wrapper' and pointed at the moved agent.py):
src/cli.ts, server.ts, protocol.ts entrypoints + the wire contract
src/engines/{pi,rivet}.ts the two engines (was runPi.ts / runRivet.ts)
src/tracing/otel.ts the tracers (was agenta-otel.ts)
src/tools/{client,mcp-bridge,mcp-server}.ts tool delivery (was toolClient/toolBridge*)
src/extensions/agenta.ts the Pi extension (was piExtension.ts)
No behavior change. Updated the fragile __dirname-relative paths in engines/rivet.ts
(PKG_ROOT) and tools/mcp-bridge.ts (the tsx bin + server path) for the new depth, and the
build-extension entry. Verified live: rivet+pi+local through the restarted sidecar answers
'Athens' with usage; tsc --strict and the extension build pass.
Replace the loose model/agents_md/harness/sandbox params with one `agent` config element (x-ag-type: agent_config) carrying instructions, model, tools, harness, sandbox, and permission policy. The playground renders it through a new AgentConfigControl that reuses the existing controls: the model selector, the tool picker (so Composio and builtin tools are finally selectable on the agent), the enum selects, and a textarea. The backend reads it via resolve_agent_config and falls back to the old shape so existing revisions keep running. Verified live: the element renders with the tool picker, and a GitHub Composio tool runs end to end on pi+local.
Tools worked locally but failed on Daytona. The in-sandbox Pi extension POSTed each tool call to Agenta's /tools/call, but a firewalled or private backend does not expose that to the remote cloud sandbox (the same reason tracing is built from the event stream on Daytona rather than in-sandbox OTLP). The sandbox has internet but cannot reach the backend, so the call failed and the model gave up. Route the call through the runner, which can reach Agenta. The extension writes the request to a file in a sandbox dir and polls for the response; the runner watches the dir over the daemon filesystem API, calls /tools/call, and writes the result back (tools/relay.ts). Local runs keep the direct path. Verified programmatically: rivet+pi+daytona with a GitHub Composio tool now returns the real login (was 'the tool failed twice'); local is unchanged.
Move the raw work-package material (wp-1..wp-8, harness-port-redesign,
research) into scratch/ and add clean top-level pages a reviewer can read
top to bottom: a README index, architecture, ports-and-adapters, sessions,
and adapters/{pi,claude-code}. Update the three in-code references that
pointed at the moved doc paths.
There was a problem hiding this comment.
Actionable comments posted: 19
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
services/agent/docker/Dockerfile.dev (1)
7-42:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRun the container as a non-root user.
There is no
USERdirective, so the runtime process defaults to root.💡 Proposed fix
RUN pnpm run build:extension +RUN chown -R node:node /app ENV NODE_ENV=development \ PORT=8765 EXPOSE 8765 # Call the local tsx binary directly to avoid pnpm/corepack HOME writes when the # container runs as a non-root host uid. +USER node CMD ["node_modules/.bin/tsx", "watch", "src/server.ts"]Source: Linters/SAST tools
🟡 Minor comments (22)
docs/design/agent-workflows/scratch/harness-port-redesign/research.md-3-7 (1)
3-7:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the “source verified” file paths to current module locations.
This header cites
services/oss/src/agent_pi/ports.pyandservices/oss/src/agent.py, but current stack context and code snippets point toservices/oss/src/harness/ports.pyand split agent modules. Keeping this exact avoids audit drift.docs/design/agent-workflows/scratch/wp-2-agent-service/README.md-3-3 (1)
3-3:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRefresh the status line to avoid stale planning state.
Status: not started.conflicts with the implementation-plan/history in this same WP folder and can misdirect readers. Consider marking this as historical/archived instead.docs/design/agent-workflows/scratch/wp-2-agent-service/README.md-55-55 (1)
55-55:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix the WP identifier typo.
Success for this WP1should beWP-2in this document.docs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.md-153-170 (1)
153-170:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSpecify a language for this fenced block.
This block is missing a language tag and will continue tripping MD040.
Source: Linters/SAST tools
docs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.md-267-271 (1)
267-271:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix the
cdpath in the login verification snippet.The path misses the
scratch/segment, so this command won’t resolve in the documented tree.🛠️ Suggested fix
-cd docs/design/agent-workflows/wp-1-pi-tracing/poc +cd docs/design/agent-workflows/scratch/wp-1-pi-tracing/pocdocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.py-183-184 (1)
183-184:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winGuard against missing flag values in
arg().If a flag is passed without a value (or followed by another flag), this currently throws
IndexError.✅ Suggested fix
def arg(name: str, default: str) -> str: - return sys.argv[sys.argv.index(name) + 1] if name in sys.argv else default + if name not in sys.argv: + return default + idx = sys.argv.index(name) + 1 + if idx >= len(sys.argv) or sys.argv[idx].startswith("--"): + raise SystemExit(f"{name} requires a value") + return sys.argv[idx]docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.py-259-266 (1)
259-266:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winQuote shell arguments when assembling
pi_cmd.
provider,model, and paths are interpolated directly into a shell command. Quoting avoids malformed commands when values include shell-significant characters.🔒 Suggested fix
+import shlex ... - pi_cmd = ( - f"cd {run_dir} && TMPDIR={run_dir}/tmp " - f"pi -p {json.dumps(PROMPT)} " - f"--mode json --approve --provider {provider} --model {model} " + q_run_dir = shlex.quote(run_dir) + q_prompt = shlex.quote(PROMPT) + q_provider = shlex.quote(provider) + q_model = shlex.quote(model) + pi_cmd = ( + f"cd {q_run_dir} && TMPDIR={q_run_dir}/tmp " + f"pi -p {q_prompt} " + f"--mode json --approve --provider {q_provider} --model {q_model} " f"-t read,bash,edit,write,ls " - f"--session-dir {run_dir}/.pi-sessions --name {session_id} " + f"--session-dir {q_run_dir}/.pi-sessions --name {shlex.quote(session_id)} " f"< /dev/null" )docs/design/agent-workflows/scratch/harness-port-redesign/research.md-148-152 (1)
148-152:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd a language identifier to the fenced code block.
The fenced block here is missing a language tag and will keep triggering MD040.
✏️ Suggested fix
-``` +```text commandExecution, errorEvents, fileAttachments, fileChanges, images, itemStarted, mcpTools, permissions, planMode, questions, reasoning, sessionLifecycle, sharedProcess, status, streamingDeltas, textMessages, toolCalls, toolResults</details> <!-- cr-comment:v1:1b544a4fad3cea700526d400 --> _Source: Linters/SAST tools_ </blockquote></details> <details> <summary>docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/bench_coldstart.py-33-42 (1)</summary><blockquote> `33-42`: _⚠️ Potential issue_ | _🟡 Minor_ | _⚡ Quick win_ **Ensure sandbox teardown runs even when a benchmark iteration errors.** `daytona.delete(sb)` is currently best-effort on the happy path only; an exception after `create()` can leak a sandbox. <details> <summary>♻️ Suggested fix</summary> ```diff for snap in SNAPSHOTS: times: list[float] = [] for i in range(N): - t = time.monotonic() - sb = daytona.create( - CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0), - timeout=120, - ) - dt = time.monotonic() - t - times.append(dt) - print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s state={sb.state}", flush=True) - daytona.delete(sb) + sb = None + t = time.monotonic() + try: + sb = daytona.create( + CreateSandboxFromSnapshotParams(snapshot=snap, auto_stop_interval=0), + timeout=120, + ) + dt = time.monotonic() - t + times.append(dt) + print(f"{snap:20} run {i + 1}/{N}: {dt:.2f}s state={sb.state}", flush=True) + finally: + if sb is not None: + daytona.delete(sb) results[snap] = timesdocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.py-62-71 (1)
62-71:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd error handling after POST commit.
Line 62–67 makes a POST to the commit endpoint but line 69 calls
resp.json()without first checking for errors.The GET request (lines 25–30) properly calls
raise_for_status(), but the POST does not. If the commit fails,resp.json()could raise an exception if the response body is not JSON (e.g., on a 500 error).resp = client.post( f"{BASE}/api/workflows/revisions/commit", params={"project_id": PROJ}, headers=H, json=body, ) + resp.raise_for_status() print("commit status:", resp.status_code)docs/design/agent-workflows/scratch/wp-1-pi-tracing/integrating-the-tracing-extension.md-23-23 (1)
23-23:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd languages to fenced code blocks to satisfy markdownlint.
These fences are missing language identifiers (MD040), which can fail doc lint pipelines.
Also applies to: 147-147, 161-161
Source: Linters/SAST tools
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/README.md-15-15 (1)
15-15:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSpecify language for fenced blocks to clear MD040.
Both fences should include a language token (for example
text) to keep markdownlint clean.Also applies to: 67-67
Source: Linters/SAST tools
docs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.md-41-42 (1)
41-42:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix Python source path references to
services/oss/src/agent/app.py.The doc currently points readers to
services/oss/src/agent.py, but the_agentflow is implemented underservices/oss/src/agent/app.py, so navigation/debugging from this guide will be misleading.Suggested doc fix
-1. **Capture (Python, `services/oss/src/agent.py`).** Inside the instrumented +1. **Capture (Python, `services/oss/src/agent/app.py`).** Inside the instrumented ... -- `services/oss/src/agent.py` — `_trace_context()` captures the workflow span context. +- `services/oss/src/agent/app.py` — `_trace_context()` captures the workflow span context.Also applies to: 110-110
docs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.md-25-25 (1)
25-25:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd language identifiers on fenced code blocks (MD040).
These code fences are missing language tags, which can break markdownlint in CI.
Also applies to: 101-101
Source: Linters/SAST tools
docs/design/agent-workflows/scratch/wp-1-pi-tracing/README.md-41-41 (1)
41-41:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign OTLP export docs with the implemented auth contract.
This line documents
?project_id=<uuid>, but the companion integration doc and POC exporter use API-key-only routing withoutproject_id. Keeping both patterns in the same WP docs is confusing for adopters.Suggested doc fix
-- Export over OTLP/HTTP protobuf with `Authorization: ApiKey <key>` and `?project_id=<uuid>`. +- Export over OTLP/HTTP protobuf with `Authorization: ApiKey <key>` (project resolved from the key).docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/run.ts-173-178 (1)
173-178:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReset
runConfig.traceIdbefore each prompt iteration.Without clearing it, a failed prompt can leave the previous run’s trace ID in place and be re-recorded as if it belonged to the current prompt.
Suggested fix
for (let i = 0; i < scenario.prompts.length; i++) { const p = scenario.prompts[i]; console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`); + runConfig.traceId = undefined; await session.prompt(p); if (runConfig.traceId) traceIds.push(runConfig.traceId); }services/oss/src/agent/tools.py-100-105 (1)
100-105:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMissing error handling for malformed JSON response.
If the backend returns a non-JSON response (e.g., an HTML error page from a proxy),
response.json()will raiseJSONDecodeError. Consider wrapping in try/except to provide a clearer error message.Proposed fix
if response.status_code >= 400: raise RuntimeError( f"Tool resolution failed (HTTP {response.status_code}): {response.text[:500]}" ) - data = response.json() + try: + data = response.json() + except Exception as exc: + raise RuntimeError( + f"Tool resolution returned invalid JSON: {response.text[:200]}" + ) from excservices/oss/src/agent/config.py-66-70 (1)
66-70:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMissing error handling for malformed
agent.json.If
agent.jsonexists but contains invalid JSON,json.loadswill raise aJSONDecodeErrorthat propagates up, potentially crashing the agent startup with an unclear error. Consider catching and logging/falling back gracefully, similar to how a missing file is handled.Proposed fix
meta_path = base / "agent.json" if meta_path.exists(): - meta = json.loads(meta_path.read_text(encoding="utf-8")) - model = meta.get("model") or DEFAULT_MODEL - tools = meta.get("tools", []) or [] + try: + meta = json.loads(meta_path.read_text(encoding="utf-8")) + model = meta.get("model") or DEFAULT_MODEL + tools = meta.get("tools", []) or [] + except json.JSONDecodeError as exc: + from agenta.sdk.utils.logging import get_module_logger + get_module_logger(__name__).warning( + "agent: invalid agent.json, using defaults: %s", exc + )services/agent/src/server.ts-71-74 (1)
71-74:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winStack trace exposure in error responses.
The catch block exposes
err.stackto the client, which can leak internal file paths, library versions, and implementation details. Consider logging the full error to stderr and returning only a generic message to the client.Proposed fix
} catch (err) { - const message = err instanceof Error ? err.stack ?? err.message : String(err); + const detail = err instanceof Error ? err.stack ?? err.message : String(err); + process.stderr.write(`[pi-wrapper] request error: ${detail}\n`); + const message = err instanceof Error ? err.message : "Internal server error"; return send(res, 500, { ok: false, error: message }); }Source: Linters/SAST tools
sdks/python/agenta/sdk/engines/running/utils.py-248-252 (1)
248-252:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCatalog description is outdated and understates supported harnesses.
At Line 251, the description says this workflow runs on “the Pi harness,” but this PR wires selectable harness/sandbox combinations. The catalog text should be generalized to avoid misleading users.
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx-139-157 (1)
139-157:⚠️ Potential issue | 🟡 MinorUse stable keys for tool rows instead of array index.
Tools are added/deleted via
handleAddToolandhandleToolDelete, which mutates the array. Index-based keys cause React to remap component state to the wrong tool row after these mutations, producing stale editor state.Use
tool.agenta_metadata?.toolCode ?? toolName(tool) ?? indexas a stable key, sincetoolCodeis populated when tools are selected andtoolNameextracts the function name fallback.Suggested fix
{tools.map((tool, index) => { + const stableToolKey = + typeof tool === "object" && tool + ? String( + (tool as Record<string, any>)?.agenta_metadata?.toolCode ?? + toolName(tool) ?? + index, + ) + : String(index) const control = ( <ToolItemControl - key={`tool-${index}`} value={tool} onChange={(v) => handleToolChange(index, v)} onDelete={disabled ? undefined : () => handleToolDelete(index)} disabled={disabled} /> ) return EditorProvider ? ( <EditorProvider - key={`tool-editor-${index}`} + key={`tool-editor-${stableToolKey}`} codeOnly language="json" showToolbar={false} enableTokens={false} - id={`agent-tool-editor-${index}`} + id={`agent-tool-editor-${stableToolKey}`} > {control} </EditorProvider> ) : ( control ) })}services/agent/README.md-24-40 (1)
24-40:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd a language identifier to the fenced block.
This fence triggers markdownlint MD040 and can be fixed by marking it as
text.💡 Proposed fix
-``` +```text src/ cli.ts entrypoint: stdin/stdout (subprocess transport) server.ts entrypoint: HTTP sidecar on :8765 @@ -``` +```Source: Linters/SAST tools
🧹 Nitpick comments (3)
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/status.md (1)
94-98: ⚡ Quick winDocument the Daytona Pi model catalog limitation as a follow-up.
The known limitation (freshly-provisioned Pi advertises only
model: defaulton Daytona) is well-documented here. Consider opening a follow-up issue to settle the in-sandbox Pi model config if not already tracked.Do you want me to help create a follow-up issue to track the Daytona Pi model catalog setup?
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.py (1)
35-53: 💤 Low valueAdd defensive checks for schema structure.
Lines 35–53 directly access nested dict keys (
data["schemas"]["parameters"]["properties"]) without validating the schema structure. If the revision data has a different structure, a KeyError will be raised.For a PoC this is acceptable, but for production robustness, add:
+ schemas = data.get("schemas", {}) + parameters = schemas.get("parameters", {}) + props = parameters.get("properties", {}) - props = data["schemas"]["parameters"]["properties"]services/oss/src/agent/schemas.py (1)
13-20: ⚡ Quick winConsider importing defaults from
config.pyto reduce duplication.
_DEFAULT_MODELand_DEFAULT_AGENTS_MDare duplicated here and inconfig.py. While the comment acknowledges the need to "keep in sync," importing fromconfig.pywould eliminate drift risk.-_DEFAULT_MODEL = "gpt-5.5" -_DEFAULT_AGENTS_MD = ( - "You are a friendly hello-world agent running on the Agenta agent service.\n\n" - "- Greet the user warmly.\n" - "- Answer the user's message in one or two short sentences." -) +from oss.src.agent.config import DEFAULT_MODEL as _DEFAULT_MODEL, DEFAULT_AGENTS_MD as _DEFAULT_AGENTS_MD
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 27cfac53-0d07-4c8c-b8eb-9188f96ee1e4
⛔ Files ignored due to path filters (2)
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/pnpm-lock.yamlis excluded by!**/pnpm-lock.yamlservices/agent/pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (102)
.gitignoreapi/oss/src/apis/fastapi/tools/models.pyapi/oss/src/apis/fastapi/tools/router.pyapi/oss/src/core/tools/dtos.pyapi/oss/src/core/tools/exceptions.pyapi/oss/src/core/tools/service.pydocs/design/agent-workflows/README.mddocs/design/agent-workflows/adapters/claude-code.mddocs/design/agent-workflows/adapters/pi.mddocs/design/agent-workflows/architecture.mddocs/design/agent-workflows/ports-and-adapters.mddocs/design/agent-workflows/scratch/harness-port-redesign/README.mddocs/design/agent-workflows/scratch/harness-port-redesign/implementation.mddocs/design/agent-workflows/scratch/harness-port-redesign/plan.mddocs/design/agent-workflows/scratch/harness-port-redesign/proposal.mddocs/design/agent-workflows/scratch/harness-port-redesign/research.mddocs/design/agent-workflows/scratch/harness-port-redesign/status.mddocs/design/agent-workflows/scratch/research/auth-secrets.mddocs/design/agent-workflows/scratch/research/daytona-sandbox.mddocs/design/agent-workflows/scratch/research/diskless-in-memory-config.mddocs/design/agent-workflows/scratch/research/open-questions.mddocs/design/agent-workflows/scratch/research/otel-instrumentation.mddocs/design/agent-workflows/scratch/research/pi-interaction.mddocs/design/agent-workflows/scratch/research/sandbox-sharing.mddocs/design/agent-workflows/scratch/wp-1-pi-tracing/README.mddocs/design/agent-workflows/scratch/wp-1-pi-tracing/integrating-the-tracing-extension.mddocs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/.env.exampledocs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/README.mddocs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/agenta-otel.tsdocs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/package.jsondocs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/run.tsdocs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.mddocs/design/agent-workflows/scratch/wp-2-agent-service/README.mddocs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.mddocs/design/agent-workflows/scratch/wp-2-agent-service/qa.mddocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/README.mddocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/README.mddocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/bench_coldstart.pydocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/build_snapshot.pydocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/cleanup.pydocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.pydocs/design/agent-workflows/scratch/wp-4-multi-message-output/README.mddocs/design/agent-workflows/scratch/wp-5-chat-vs-completion/README.mddocs/design/agent-workflows/scratch/wp-6-workflow-type-and-template/README.mddocs/design/agent-workflows/scratch/wp-7-tools/README.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/README.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/architecture.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/context.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/isolation-and-fork.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/plan.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.pydocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.pydocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/debug-events.tsdocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/dump-full.tsdocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/package.jsondocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/spike.tsdocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/research.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/status.mddocs/design/agent-workflows/sessions.mdhosting/docker-compose/ee/docker-compose.dev.ymlsdks/python/agenta/sdk/engines/running/interfaces.pysdks/python/agenta/sdk/engines/running/utils.pyservices/agent/.dockerignoreservices/agent/README.mdservices/agent/config/AGENTS.mdservices/agent/config/agent.jsonservices/agent/docker/Dockerfile.devservices/agent/package.jsonservices/agent/scripts/build-extension.mjsservices/agent/src/cli.tsservices/agent/src/engines/pi.tsservices/agent/src/engines/rivet.tsservices/agent/src/extensions/agenta.tsservices/agent/src/protocol.tsservices/agent/src/server.tsservices/agent/src/tools/client.tsservices/agent/src/tools/mcp-bridge.tsservices/agent/src/tools/mcp-server.tsservices/agent/src/tools/relay.tsservices/agent/src/tracing/otel.tsservices/agent/tsconfig.jsonservices/entrypoints/main.pyservices/oss/src/agent/__init__.pyservices/oss/src/agent/app.pyservices/oss/src/agent/client.pyservices/oss/src/agent/config.pyservices/oss/src/agent/inputs.pyservices/oss/src/agent/schemas.pyservices/oss/src/agent/secrets.pyservices/oss/src/agent/tools.pyservices/oss/src/agent/tracing.pyservices/oss/src/harness/__init__.pyservices/oss/src/harness/environment.pyservices/oss/src/harness/ports.pyservices/oss/src/harness/transports.pyservices/oss/src/harness/wire.pyweb/oss/src/components/pages/app-management/components/CreateAppDropdown/index.tsxweb/oss/src/components/pages/app-management/modals/CreateAppTypeModal/index.tsxweb/oss/src/components/pages/prompts/assets/iconHelpers.tsxweb/packages/agenta-entities/src/workflow/state/appUtils.tsweb/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsxweb/packages/agenta-entity-ui/src/DrillInView/SchemaControls/SchemaPropertyRenderer.tsx
| async def resolve_agent_tools( | ||
| self, | ||
| *, | ||
| project_id: UUID, | ||
| tools: List[AgentToolReference], | ||
| ) -> AgentToolsResolution: | ||
| """Resolve an agent's tool references into model-ready specs. | ||
|
|
||
| ``builtin`` references pass through as names. ``composio`` references are | ||
| validated against the project's connections up front and enriched from the | ||
| catalog (description + input schema), so the model never sees a stale schema | ||
| and the invoke fails fast on a missing/invalid connection rather than mid-loop. | ||
| """ | ||
| builtins: List[str] = [] | ||
| custom: List[ResolvedAgentTool] = [] | ||
|
|
||
| for ref in tools: | ||
| if isinstance(ref, AgentBuiltinTool): | ||
| if ref.name: | ||
| builtins.append(ref.name) | ||
| continue | ||
|
|
||
| if isinstance(ref, AgentComposioTool): | ||
| custom.append( | ||
| await self._resolve_composio_tool( | ||
| project_id=project_id, | ||
| ref=ref, | ||
| ) | ||
| ) | ||
|
|
||
| return AgentToolsResolution(builtins=builtins, custom=custom) | ||
|
|
There was a problem hiding this comment.
Enforce unique resolved tool names to avoid cross-connection misrouting.
Line 559 defaults names to {integration}__{action} and drops connection, so two refs for different connections can resolve to the same name. Downstream consumers key by name (e.g., MCP map), which can overwrite one tool with another and execute against the wrong account.
💡 Suggested fix
async def resolve_agent_tools(
@@
- builtins: List[str] = []
- custom: List[ResolvedAgentTool] = []
+ builtins: List[str] = []
+ custom: List[ResolvedAgentTool] = []
+ seen_names: set[str] = set()
@@
if isinstance(ref, AgentComposioTool):
- custom.append(
- await self._resolve_composio_tool(
- project_id=project_id,
- ref=ref,
- )
- )
+ resolved = await self._resolve_composio_tool(
+ project_id=project_id,
+ ref=ref,
+ )
+ if resolved.name in seen_names:
+ raise ToolSlugInvalidError(
+ slug=resolved.name,
+ detail="Duplicate tool name in resolved tools list",
+ )
+ seen_names.add(resolved.name)
+ custom.append(resolved)
@@
- name = ref.name or f"{ref.integration}__{ref.action}"
+ name = ref.name or f"{ref.integration}__{ref.action}__{ref.connection}"Also applies to: 559-562
| @@ -0,0 +1,7 @@ | |||
| # Agenta collector (the runner also falls back to the repo-root .env.test.local). | |||
| AGENTA_HOST=http://144.76.237.122:8280/ | |||
There was a problem hiding this comment.
Use a neutral placeholder for AGENTA_HOST in the sample env.
A hardcoded real host in .env.example can unintentionally route traces to an external environment during quick-start runs.
Suggested change
-AGENTA_HOST=http://144.76.237.122:8280/
+AGENTA_HOST=http://localhost:8280📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| AGENTA_HOST=http://144.76.237.122:8280/ | |
| AGENTA_HOST=http://localhost:8280 |
| pi.on("agent_end", async (event: any) => { | ||
| if (!agentSpan) return; | ||
| setOutput(agentSpan, lastAssistantText(event?.messages)); | ||
| agentSpan.end(); | ||
| agentSpan = undefined; | ||
| agentCtx = undefined; | ||
| lastContextMessages = undefined; | ||
| }); |
There was a problem hiding this comment.
Close and clear dangling tool spans at run end.
Run teardown does not currently drain toolSpans, so interrupted runs can leave stale spans in memory and contaminate later runs in the same process.
Suggested fix
pi.on("agent_end", async (event: any) => {
if (!agentSpan) return;
+ for (const span of toolSpans.values()) {
+ span.setStatus({ code: SpanStatusCode.ERROR, message: "closed on agent_end" });
+ span.end();
+ }
+ toolSpans.clear();
+
+ if (llmSpan) {
+ llmSpan.setStatus({ code: SpanStatusCode.ERROR, message: "closed on agent_end" });
+ llmSpan.end();
+ llmSpan = undefined;
+ }
+
setOutput(agentSpan, lastAssistantText(event?.messages));
agentSpan.end();
agentSpan = undefined;
agentCtx = undefined;
lastContextMessages = undefined;
+ pendingPrompt = undefined;
});| const { session } = await createAgentSession({ | ||
| cwd, | ||
| model, | ||
| authStorage, | ||
| modelRegistry, | ||
| tools: ["read", "bash", "edit", "write", "ls"], | ||
| sessionManager: SessionManager.inMemory(cwd), | ||
| resourceLoader: loader, | ||
| }); | ||
|
|
||
| // Hand the session id + model to the extension so spans carry them. | ||
| runConfig.sessionId = session.sessionId; | ||
| runConfig.provider = model.provider; | ||
| runConfig.requestModel = model.id; | ||
|
|
||
| session.subscribe((event: any) => { | ||
| if ( | ||
| event.type === "message_update" && | ||
| event.assistantMessageEvent?.type === "text_delta" | ||
| ) { | ||
| process.stdout.write(event.assistantMessageEvent.delta); | ||
| } else if (event.type === "tool_execution_start") { | ||
| process.stdout.write(`\n[tool] ${event.toolName}\n`); | ||
| } | ||
| }); | ||
|
|
||
| const traceIds: string[] = []; | ||
| for (let i = 0; i < scenario.prompts.length; i++) { | ||
| const p = scenario.prompts[i]; | ||
| console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`); | ||
| await session.prompt(p); | ||
| if (runConfig.traceId) traceIds.push(runConfig.traceId); | ||
| } | ||
|
|
||
| console.log("\n\n[run] flushing spans to Agenta..."); | ||
| session.dispose(); | ||
| await shutdownTracing(); | ||
|
|
There was a problem hiding this comment.
Move session disposal and tracing shutdown into finally.
If any prompt throws, the current flow skips session.dispose() and shutdownTracing(), which can lose final spans and leave session resources open.
Suggested fix
const { session } = await createAgentSession({
cwd,
model,
authStorage,
modelRegistry,
tools: ["read", "bash", "edit", "write", "ls"],
sessionManager: SessionManager.inMemory(cwd),
resourceLoader: loader,
});
- // Hand the session id + model to the extension so spans carry them.
- runConfig.sessionId = session.sessionId;
- runConfig.provider = model.provider;
- runConfig.requestModel = model.id;
-
- session.subscribe((event: any) => {
- ...
- });
-
- const traceIds: string[] = [];
- for (let i = 0; i < scenario.prompts.length; i++) {
- ...
- await session.prompt(p);
- if (runConfig.traceId) traceIds.push(runConfig.traceId);
- }
-
- console.log("\n\n[run] flushing spans to Agenta...");
- session.dispose();
- await shutdownTracing();
+ try {
+ runConfig.sessionId = session.sessionId;
+ runConfig.provider = model.provider;
+ runConfig.requestModel = model.id;
+
+ session.subscribe((event: any) => {
+ ...
+ });
+
+ const traceIds: string[] = [];
+ for (let i = 0; i < scenario.prompts.length; i++) {
+ ...
+ await session.prompt(p);
+ if (runConfig.traceId) traceIds.push(runConfig.traceId);
+ }
+ } finally {
+ session.dispose();
+ await shutdownTracing();
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const { session } = await createAgentSession({ | |
| cwd, | |
| model, | |
| authStorage, | |
| modelRegistry, | |
| tools: ["read", "bash", "edit", "write", "ls"], | |
| sessionManager: SessionManager.inMemory(cwd), | |
| resourceLoader: loader, | |
| }); | |
| // Hand the session id + model to the extension so spans carry them. | |
| runConfig.sessionId = session.sessionId; | |
| runConfig.provider = model.provider; | |
| runConfig.requestModel = model.id; | |
| session.subscribe((event: any) => { | |
| if ( | |
| event.type === "message_update" && | |
| event.assistantMessageEvent?.type === "text_delta" | |
| ) { | |
| process.stdout.write(event.assistantMessageEvent.delta); | |
| } else if (event.type === "tool_execution_start") { | |
| process.stdout.write(`\n[tool] ${event.toolName}\n`); | |
| } | |
| }); | |
| const traceIds: string[] = []; | |
| for (let i = 0; i < scenario.prompts.length; i++) { | |
| const p = scenario.prompts[i]; | |
| console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`); | |
| await session.prompt(p); | |
| if (runConfig.traceId) traceIds.push(runConfig.traceId); | |
| } | |
| console.log("\n\n[run] flushing spans to Agenta..."); | |
| session.dispose(); | |
| await shutdownTracing(); | |
| const { session } = await createAgentSession({ | |
| cwd, | |
| model, | |
| authStorage, | |
| modelRegistry, | |
| tools: ["read", "bash", "edit", "write", "ls"], | |
| sessionManager: SessionManager.inMemory(cwd), | |
| resourceLoader: loader, | |
| }); | |
| try { | |
| // Hand the session id + model to the extension so spans carry them. | |
| runConfig.sessionId = session.sessionId; | |
| runConfig.provider = model.provider; | |
| runConfig.requestModel = model.id; | |
| session.subscribe((event: any) => { | |
| if ( | |
| event.type === "message_update" && | |
| event.assistantMessageEvent?.type === "text_delta" | |
| ) { | |
| process.stdout.write(event.assistantMessageEvent.delta); | |
| } else if (event.type === "tool_execution_start") { | |
| process.stdout.write(`\n[tool] ${event.toolName}\n`); | |
| } | |
| }); | |
| const traceIds: string[] = []; | |
| for (let i = 0; i < scenario.prompts.length; i++) { | |
| const p = scenario.prompts[i]; | |
| console.log(`\n[run] prompt ${i + 1}/${scenario.prompts.length}: ${p}\n`); | |
| await session.prompt(p); | |
| if (runConfig.traceId) traceIds.push(runConfig.traceId); | |
| } | |
| } finally { | |
| console.log("\n\n[run] flushing spans to Agenta..."); | |
| session.dispose(); | |
| await shutdownTracing(); | |
| } |
| AGENTA_HOST: ${AGENTA_HOST:-http://144.76.237.122:8280} | ||
| AGENTA_API_KEY: ${AGENTA_API_KEY:-} |
There was a problem hiding this comment.
Avoid a public-IP fallback for AGENTA_HOST in dev.
Line 446 defaults to an external host, which can unintentionally send local trace/tool payloads outside the dev stack when AGENTA_HOST is unset.
Suggested fix
- AGENTA_HOST: ${AGENTA_HOST:-http://144.76.237.122:8280}
+ AGENTA_HOST: ${AGENTA_HOST:-http://api:8000}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| AGENTA_HOST: ${AGENTA_HOST:-http://144.76.237.122:8280} | |
| AGENTA_API_KEY: ${AGENTA_API_KEY:-} | |
| AGENTA_HOST: ${AGENTA_HOST:-http://api:8000} | |
| AGENTA_API_KEY: ${AGENTA_API_KEY:-} |
| /** Build a parent Context from a W3C traceparent string, or undefined if absent/invalid. */ | ||
| function parentContext(traceparent?: string): Context | undefined { | ||
| if (!traceparent) return undefined; | ||
| const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec( | ||
| traceparent.trim(), | ||
| ); | ||
| if (!match) return undefined; | ||
| const [, traceId, spanId, flags] = match; | ||
| const spanContext: SpanContext = { | ||
| traceId, | ||
| spanId, | ||
| // Honor the incoming sampled bit; default to sampled so child spans record. | ||
| traceFlags: (parseInt(flags, 16) & 1) === 1 ? TraceFlags.SAMPLED : TraceFlags.NONE, | ||
| isRemote: true, | ||
| }; | ||
| return trace.setSpanContext(ROOT_CONTEXT, spanContext); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n --type=ts '\bbaggage\b' services/agent/src/engines/pi.ts services/agent/src/engines/rivet.ts services/agent/src/tracing/otel.tsRepository: Agenta-AI/agenta
Length of output: 376
🏁 Script executed:
# Check the current implementation of parentContext and surrounding context
sed -n '220,245p' services/agent/src/tracing/otel.tsRepository: Agenta-AI/agenta
Length of output: 1035
🏁 Script executed:
# Check the RunConfig type definition and the lines mentioned
sed -n '250,260p' services/agent/src/tracing/otel.tsRepository: Agenta-AI/agenta
Length of output: 569
🏁 Script executed:
# Check the specific lines mentioned: 423-431, 468-469
sed -n '420,435p' services/agent/src/tracing/otel.ts
sed -n '465,475p' services/agent/src/tracing/otel.tsRepository: Agenta-AI/agenta
Length of output: 1212
🏁 Script executed:
# Check the imports to see what's currently imported from OpenTelemetry
head -30 services/agent/src/tracing/otel.tsRepository: Agenta-AI/agenta
Length of output: 1886
🏁 Script executed:
# Check if propagation is imported and what the complete imports look like
sed -n '1,50p' services/agent/src/tracing/otel.ts | grep -A 20 "^import"Repository: Agenta-AI/agenta
Length of output: 478
🏁 Script executed:
# Check line 726-727 to see the other call site
sed -n '720,732p' services/agent/src/tracing/otel.tsRepository: Agenta-AI/agenta
Length of output: 692
🏁 Script executed:
# Check rivet.ts to see if it has the same pattern issue
sed -n '715,730p' services/agent/src/engines/rivet.tsRepository: Agenta-AI/agenta
Length of output: 729
🏁 Script executed:
# Check if propagation is available in `@opentelemetry/api` by searching the codebase for its usage
rg -n 'propagation\.' services/agent/src --type=tsRepository: Agenta-AI/agenta
Length of output: 42
🏁 Script executed:
# Also check what version of `@opentelemetry/api` is being used
grep -A 5 '"`@opentelemetry/api`"' package.jsonRepository: Agenta-AI/agenta
Length of output: 106
🏁 Script executed:
# Verify the RunConfig object assignment at lines 423-431 - check what fields are actually assigned
sed -n '418,435p' services/agent/src/tracing/otel.tsRepository: Agenta-AI/agenta
Length of output: 614
🏁 Script executed:
# Find package.json in the repo
find . -name "package.json" -type f | head -5Repository: Agenta-AI/agenta
Length of output: 339
🏁 Script executed:
# Check if propagation is being used anywhere else in the codebase
rg 'propagation' services/agent/src --type=ts -A 2 -B 2Repository: Agenta-AI/agenta
Length of output: 42
🏁 Script executed:
# Look for any W3C baggage related imports or patterns
rg 'baggage|Baggage' services/agent/src --type=ts -B 2 -A 2Repository: Agenta-AI/agenta
Length of output: 1583
baggage is wired upstream but never propagated into tracing context.
RunConfig accepts baggage from callers (pi.ts:180, rivet.ts:724) but it is not stored in the config object during initialization, not passed to parentContext(), and not handled by parentContext() which only parses traceparent. Baggage context is dropped on both Pi and Rivet paths.
The fix requires:
- Adding
baggage: init.baggageto the config object assignment (line 423-431) - Updating
parentContext()signature to acceptbaggage?: string - Using OpenTelemetry's standard
propagation.extract()API to handle both W3C traceparent and baggage headers - Passing
config.baggageto allparentContext()calls (lines 468, 726-727)
Suggested fix
import {
context,
+ propagation,
ROOT_CONTEXT,
trace,
@@
-function parentContext(traceparent?: string): Context | undefined {
- if (!traceparent) return undefined;
- const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
- traceparent.trim(),
- );
- if (!match) return undefined;
- const [, traceId, spanId, flags] = match;
- const spanContext: SpanContext = {
- traceId,
- spanId,
- traceFlags: (parseInt(flags, 16) & 1) === 1 ? TraceFlags.SAMPLED : TraceFlags.NONE,
- isRemote: true,
- };
- return trace.setSpanContext(ROOT_CONTEXT, spanContext);
+function parentContext(traceparent?: string, baggage?: string): Context | undefined {
+ const carrier: Record<string, string> = {};
+ if (traceparent?.trim()) carrier.traceparent = traceparent.trim();
+ if (baggage?.trim()) carrier.baggage = baggage.trim();
+ if (!carrier.traceparent && !carrier.baggage) return undefined;
+ const extracted = propagation.extract(ROOT_CONTEXT, carrier);
+ return trace.getSpanContext(extracted) ? extracted : undefined;
}
@@
const config: RunConfig = {
@@
+ baggage: init.baggage,
@@
- const parent = parentContext(config.traceparent);
+ const parent = parentContext(config.traceparent, config.baggage);
@@
- const parent = parentContext(init.traceparent);
+ const parent = parentContext(init.traceparent, init.baggage);🧰 Tools
🪛 OpenGrep (1.22.0)
[ERROR] 228-230: Dynamic command passed to child_process.exec/execSync. Use child_process.execFile or spawn with an argument array instead.
(coderabbit.command-injection.exec-js)
| try: | ||
| session = harness.create_session(session_config) | ||
| result = await session.prompt(msgs) | ||
| await session.destroy() | ||
| finally: |
There was a problem hiding this comment.
Session cleanup is skipped on prompt failures.
At Line 94-97, session.destroy() executes only on success. If session.prompt(...) raises, session resources remain undisposed.
Suggested fix
await harness.setup()
try:
session = harness.create_session(session_config)
- result = await session.prompt(msgs)
- await session.destroy()
+ try:
+ result = await session.prompt(msgs)
+ finally:
+ await session.destroy()
finally:
await harness.shutdown()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| session = harness.create_session(session_config) | |
| result = await session.prompt(msgs) | |
| await session.destroy() | |
| finally: | |
| await harness.setup() | |
| try: | |
| session = harness.create_session(session_config) | |
| try: | |
| result = await session.prompt(msgs) | |
| finally: | |
| await session.destroy() | |
| finally: | |
| await harness.shutdown() |
| instructions=agent.get("instructions") or config.agents_md, | ||
| model=agent.get("model") or config.model, | ||
| tools=_as_list(agent.get("tools")), | ||
| harness=( |
There was a problem hiding this comment.
Default tool configuration is dropped in the agent path.
At Line 45, omitting agent.tools resolves to [], while the legacy path falls back to config.tools. This makes tool availability depend on request shape and can silently disable configured tools.
Suggested fix
if isinstance(agent, dict):
+ raw_tools = agent.get("tools")
return RunConfig(
instructions=agent.get("instructions") or config.agents_md,
model=agent.get("model") or config.model,
- tools=_as_list(agent.get("tools")),
+ tools=_as_list(config.tools if raw_tools is None else raw_tools),
harness=(
agent.get("harness") or os.getenv("AGENTA_AGENT_HARNESS", "pi")
).lower(),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| instructions=agent.get("instructions") or config.agents_md, | |
| model=agent.get("model") or config.model, | |
| tools=_as_list(agent.get("tools")), | |
| harness=( | |
| if isinstance(agent, dict): | |
| raw_tools = agent.get("tools") | |
| return RunConfig( | |
| instructions=agent.get("instructions") or config.agents_md, | |
| model=agent.get("model") or config.model, | |
| tools=_as_list(config.tools if raw_tools is None else raw_tools), | |
| harness=( |
| content = raw.get("content", "") | ||
| if isinstance(content, list): | ||
| content = [ContentBlock.from_raw(block) for block in content] | ||
| return cls(role=str(raw["role"]), content=content) |
There was a problem hiding this comment.
Message.from_raw can produce invalid wire payloads for dict content.
At Line 164-167, dict content is not normalized. Later (Line 146-150), to_wire treats non-string content as a list, so dict content is iterated as keys, corrupting the outgoing message.
Suggested fix
content = raw.get("content", "")
- if isinstance(content, list):
+ if isinstance(content, dict):
+ content = [ContentBlock.from_raw(content)]
+ elif isinstance(content, list):
content = [ContentBlock.from_raw(block) for block in content]
+ elif not isinstance(content, str):
+ content = str(content)
return cls(role=str(raw["role"]), content=content)| _DEFAULT_TIMEOUT = float(os.getenv("AGENTA_AGENT_TIMEOUT", "180")) | ||
| _DEFAULT_COMMAND = ["pnpm", "exec", "tsx", "src/cli.ts"] |
There was a problem hiding this comment.
Import-time timeout parsing can crash the service on bad env values.
At Line 32, a non-numeric AGENTA_AGENT_TIMEOUT raises ValueError during import, taking down startup instead of falling back safely.
Suggested fix
-_DEFAULT_TIMEOUT = float(os.getenv("AGENTA_AGENT_TIMEOUT", "180"))
+def _resolve_default_timeout() -> float:
+ raw = os.getenv("AGENTA_AGENT_TIMEOUT", "180")
+ try:
+ value = float(raw)
+ return value if value > 0 else 180.0
+ except (TypeError, ValueError):
+ log.warning("Invalid AGENTA_AGENT_TIMEOUT=%r; using 180s", raw)
+ return 180.0
+
+_DEFAULT_TIMEOUT = _resolve_default_timeout()
Context
Agenta runs prompt workflows today: completion, chat, and the LLM judge. Each calls a model
once and returns one answer. This PoC adds a different kind of workflow. An agent runs a
loop: it reads its instructions, calls a model, runs a tool, reads the result, and calls the
model again, until the task is done. It sits behind the same
/invokecontract, traces intothe same spans, and reads its config from the same playground.
The PoC proves one claim: the agent and the place it runs are both config, not code. You
change a dropdown to swap Pi for Claude Code, or local for a Daytona cloud sandbox, and
nothing above the seam changes.
How it works
A Python workflow handler receives
/invokeand calls a Node sidecar over HTTP. The sidecardrives a coding agent through the open-source rivet
sandbox-agentdaemon over the AgentClient Protocol (ACP). The daemon starts an ACP adapter, which starts the agent (Pi or
Claude Code). Two orthogonal config axes swap independently:
harness(pi | claude) andsandbox(local | daytona).The full write-up lives in
docs/design/agent-workflows/. Read it in this order:architecture.md: the data flow, the two containers, and the vocabulary. Start here.ports-and-adapters.md: the neutral seam, the wire contract, and how the service picks anengine and a transport.
sessions.md: cold replay today, and the two paths to warm or persisted sessions tomorrow.adapters/pi.mdandadapters/claude-code.md: how each agent is wired, traced, and giventools.
What this PoC includes
/invoke, withharnessandsandboxas playground dropdowns.Environment,Harness,AgentSession) with two enginesbehind it: rivet over ACP, and a legacy in-process Pi path.
them natively through a Pi extension; an MCP-capable harness takes them over MCP.
/invokespan, with token usagerolled onto the workflow span.
What it defers (documented, not built)
sessions.md).Tests / notes
Verified live against the dockerized dev stack, with real
/invokerequests through theplayground service route:
_agent → invoke_agent → turn 0 → chat, with the tokentotal on
_agent._agent.Python passes
ruff formatandruff check. TypeScript compiles undertsc --strict.For reviewers:
modelchoice is not honored inside the Daytona sandbox yet. The in-sandboxPi lacks the model catalog the local Pi loads. The model axis is honored on pi / local.
verified end to end.
This PR lands the full agent-workflows PoC into
main: the WP-2 service, the WP-7 tools, theWP-8 ACP runtime, and the session-shaped port. The smaller stacked PRs (for example #4718)
are intermediate stacking artifacts. Review this one as the PoC.
What to QA
shows
_agent → invoke_agent → turn 0 → chatwith tokens on_agent.nested shape.