Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 7 additions & 9 deletions .go-arch-lint.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@ commonComponents:
- pkg-httpx
- pkg-output
- pkg-registry
- pkg-acpserver

vendors:
go-stdlib:
Expand Down Expand Up @@ -163,6 +162,11 @@ vendors:
- github.com/modelcontextprotocol/go-sdk/mcp
- github.com/modelcontextprotocol/go-sdk/**

go-sdk-acp:
in:
- github.com/coder/acp-go-sdk
- github.com/coder/acp-go-sdk/**

components:
# DOMAIN LAYER
domain-workflow:
Expand Down Expand Up @@ -211,9 +215,6 @@ components:
pkg-validation:
in: ../pkg/validation

pkg-acpserver:
in: ../pkg/acpserver

# PROTOBUF
proto-plugin:
in: ../proto/plugin/v1
Expand Down Expand Up @@ -611,9 +612,10 @@ deps:
- domain-plugin
- infra-agents
- infra-logger
- pkg-acpserver
- application
canUse:
- go-stdlib
- go-sdk-acp

infra-mcp:
mayDependOn:
Expand All @@ -626,10 +628,6 @@ deps:
# (see internal/infrastructure/mcp/doc.go). Keep it out of canUse so an
# accidental direct import is caught as an architecture violation.

pkg-acpserver:
canUse:
- go-stdlib

infra-tools:
mayDependOn:
- domain-ports
Expand Down

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
:- module(pr_feature_f105_acp_server_migration_to_coderacp_go_s, []).
% ─── PR Tracking Schema ──────────────────────────────────────────────────────
% Memory segment: pr_<branch>
% Lifecycle: created at implement start, gated before commit, archived on merge.
%
% Facts (asserted by scan scripts and LLM):
% pr_file(Path, ChangeType) — file in PR scope (changed | added | test)
% todo(Id, File, Line, Desc) — TODO/FIXME found in changed code
% stub(Id, File, Symbol) — stub/placeholder implementation
% mock(Id, File, Symbol) — mock that should be replaced with real impl
% not_impl(Id, File, Desc) — "not yet implemented" marker
% resolved(Type, Id) — marks a tracked issue as resolved
%
% Dynamic declarations (required by Trealla Prolog for runtime assertion).
:- dynamic(pr_file/2).
:- dynamic(todo/4).
:- dynamic(stub/3).
:- dynamic(mock/3).
:- dynamic(not_impl/3).
:- dynamic(resolved/2).

% ─── Unresolved queries ─────────────────────────────────────────────────────
% Convenience predicates for querying unresolved issues by type.
unresolved_todo(Id, File, Line, Desc) :-
todo(Id, File, Line, Desc), \+ resolved(todo, Id).
unresolved_stub(Id, File, Symbol) :-
stub(Id, File, Symbol), \+ resolved(stub, Id).
unresolved_mock(Id, File, Symbol) :-
mock(Id, File, Symbol), \+ resolved(mock, Id).
unresolved_not_impl(Id, File, Desc) :-
not_impl(Id, File, Desc), \+ resolved(not_impl, Id).

% A blocking issue is any tracked issue that has not been resolved.
blocking_issue(Id, todo, File, Desc) :-
todo(Id, File, _, Desc), \+ resolved(todo, Id).
blocking_issue(Id, stub, File, Symbol) :-
stub(Id, File, Symbol), \+ resolved(stub, Id).
blocking_issue(Id, mock, File, Symbol) :-
mock(Id, File, Symbol), \+ resolved(mock, Id).
blocking_issue(Id, not_impl, File, Desc) :-
not_impl(Id, File, Desc), \+ resolved(not_impl, Id).

% PR is ready ONLY when zero blocking issues remain.
pr_ready :- \+ blocking_issue(_, _, _, _).

% Health summary — counts by category.
pr_health(blocking, N) :-
findall(I, blocking_issue(I, _, _, _), L), length(L, N).
pr_health(resolved, N) :-
findall(I, resolved(_, I), L), length(L, N).
pr_health(files, N) :-
findall(F, pr_file(F, _), L), length(L, N).

% Coverage gap: source file changed without corresponding test file.
coverage_gap(File) :-
pr_file(File, changed),
\+ pr_file(File, test),
\+ test_file(File, _).

% List all blocking issues as Id-Type-File-Desc tuples.
all_blockers(Blockers) :-
findall(blocker(Id, Type, File, Desc), blocking_issue(Id, Type, File, Desc), Blockers).
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Changed

- **F105**: ACP server migrated from the homegrown `pkg/acpserver/` JSON-RPC engine (~1620 LOC across 10 files) to the official `github.com/coder/acp-go-sdk` v0.13.0, wrapped in a new `internal/infrastructure/acp/` adapter package implementing `acp.Agent` with the four wired methods (`Initialize`, `NewSession`, `Prompt`, `Cancel`) and seven typed "method not supported" stubs (`Authenticate`, `LoadSession`, `SetSessionMode`, `SetSessionModel`, `ExtMethod`, `ExtNotification`, `CancelNotification`). The `awf acp-serve` lifecycle is rebuilt around `acp.NewAgentSideConnection(agent, os.Stdout, os.Stdin)` + `<-conn.Done()` with `conn.SetLogger(stderr)` keeping stdout exclusively for protocol frames. `ports.ACPClient.RequestPermission` is now live, wired through `internal/infrastructure/acp/permission.go` to `conn.RequestPermission`, with the stdout serialization invariant preserved under concurrent `SessionUpdate` + `RequestPermission` traffic (mirroring the legacy `TestServer_OutboundWritesDoNotInterleave` guarantee). Domain `ACPHandlerError` taxonomy (`ACPErrInvalidParams`, `ACPErrInternal`, `ACPErrMethodNotFound`, `ACPErrUnsupportedContentBlock`) stays in `application/acp_errors.go` and is mapped to SDK error variants by `toACPError` in `internal/infrastructure/acp/errors.go`. The SDK import is confined to `internal/infrastructure/acp/` and jointly enforced by `.go-arch-lint.yml` (new `go-sdk-acp` vendor entry, `infra-acp.canUse = [go-stdlib, go-sdk-acp]`) and a new AST-based `architecture_test.go` mirroring the F104 MCP pattern. `pkg/acpserver/` is fully deleted (10 files); `internal/infrastructure/acp/message.go` and `acp_wiring.go:acpErrorCode()` are removed as their bespoke types are subsumed by SDK `SessionUpdate` constructors and `toACPError` respectively. User-facing behavior is iso-functional with the legacy engine: editors (Zed, acp.nvim) see identical session lifecycle, slash-command discovery, multi-turn parking, and approval-gate semantics. Panic isolation via SDK-independent `defer recover()` wrapper with named returns (`fmt.Sprintf("panic recovered: %v", r)`, no stack trace exfiltration). Adapter test coverage > 85% with `make test-race` green. SDK 0.x risk and rollback rationale documented in [ADR-020](docs/ADR/020-acp-server-migration-to-coder-sdk.md), which supersedes the `pkg/acpserver/` implementation detail of [ADR-018](docs/ADR/018-acp-transparent-agent-server-protocol.md) (the ACP protocol and per-session subprocess topology decisions in ADR-018 stand unchanged). Unblocks F107 (facade refactor) and F108 (Axis B permission gate).

### Fixed

- **F103**: Codex provider output parity — `state.Output` and `ConversationResult.Output` for the `codex` provider now contain clean aggregated assistant text instead of raw NDJSON, regardless of `output_format` value (`json`, `stream-json`, `text`, or absent), closing the gap left by B015 in 0.7.1 which only covered Claude, Gemini, and OpenCode. Aggregation is presence-aware: when the NDJSON stream contains ≥1 `assistant_message` event, the extracted text overwrites `Output` (empty `assistant_message` yields `Output == ""`); when no events are parsed (plain-text mocks, pre-result lifecycle events only), the base output is preserved. `state.Response` and `ConversationResult.Response` are now populated via `tryParseJSONResponse` on the aggregated assistant text when it is a valid JSON object — matching Gemini/OpenCode semantics. Estimated token recount runs on the extracted text rather than the raw NDJSON length. Conversation parity achieved through a Codex-targeted override in `ExecuteConversation` plus an `extractTextContent` hook wired into `cliProviderHooks`; base provider and other CLI providers (Claude, Gemini, OpenCode, GitHub Copilot) are bit-identical. NUL bytes, multi-event streams, empty assistant messages, and non-JSON plain text are all handled without panic or truncation. Workflow authors can now reference `{{.states.codex_step.Output}}` and `{{.states.codex_step.Response}}` with the same semantics as other providers, unblocking multi-step workflows that pipe Codex output downstream.
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ A Go CLI tool for orchestrating AI agents (Claude, Gemini, Codex, GitHub Copilot
- **Built-in Notification Plugin** - Workflow completion alerts via desktop and webhooks with configurable backends
- **Terminal User Interface (TUI)** - Full-screen interactive dashboard (`awf tui`) with tab-based navigation for workflow browsing, real-time execution monitoring, history exploration, agent conversation rendering, and Claude Code session tailing; built on Bubble Tea with Lip Gloss styling and Glamour Markdown rendering
- **HTTP REST API Server** - `awf serve` exposes workflow discovery, async execution, SSE event streaming, lifecycle control, and execution history over HTTP with auto-generated OpenAPI 3.1 spec and Swagger UI at `/docs`; built on Huma v2 + chi v5; defaults to `127.0.0.1:2511` (loopback-only) with `--host`/`--port` overrides
- **ACP Transparent Agent Server** - `awf acp-serve` (hidden) exposes workflows as an [ACP (Agent Client Protocol)](https://agentclientprotocol.com) agent over stdio, enabling ACP-compatible editors (Zed, acp.nvim) to spawn AWF as a transparent agent subprocess; full workflow execution per `session/prompt` (not single-provider passthrough) with multi-step progress projected as `tool_call` / `tool_call_update` notifications; workflow discovery via slash commands (`available_commands_update`); native `session/request_permission` for approval gates; mid-workflow user input via turn-boundary resume; `session/cancel` with 5s SIGTERM→SIGKILL grace; editor-provided `mcpServers` merged with per-step MCP proxy config (editor wins on collision); stdlib-only `pkg/acpserver` engine mirroring the `pkg/mcpserver` invariant. See [ADR-018](docs/ADR/018-acp-transparent-agent-server-protocol.md).
- **ACP Transparent Agent Server** - `awf acp-serve` (hidden) exposes workflows as an [ACP (Agent Client Protocol)](https://agentclientprotocol.com) agent over stdio, enabling ACP-compatible editors (Zed, acp.nvim) to spawn AWF as a transparent agent subprocess; full workflow execution per `session/prompt` (not single-provider passthrough) with multi-step progress projected as `tool_call` / `tool_call_update` notifications; workflow discovery via slash commands (`available_commands_update`); native `session/request_permission` for approval gates; mid-workflow user input via turn-boundary resume; `session/cancel` with 5s SIGTERM→SIGKILL grace; editor-provided `mcpServers` merged with per-step MCP proxy config (editor wins on collision); powered by the official `github.com/coder/acp-go-sdk` with infrastructure adapter pattern. See [ADR-018](docs/ADR/018-acp-transparent-agent-server-protocol.md) and [ADR-020](docs/ADR/020-acp-server-migration-to-coder-sdk.md).

## Installation

Expand Down
2 changes: 1 addition & 1 deletion docs/ADR/018-acp-transparent-agent-server-protocol.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ title: "018: ACP Transparent Agent Server via JSON-RPC 2.0 stdio Subprocess"
**Date**: 2026-05-30
**Issue**: F102
**Supersedes**: N/A
**Superseded by**: N/A
**Superseded by**: ADR 020 (implementation detail, not decision)

## Context

Expand Down
120 changes: 120 additions & 0 deletions docs/ADR/020-acp-server-migration-to-coder-sdk.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
---
title: "020: ACP Server Migration to Official coder/acp-go-sdk"
---

**Status**: Accepted
**Date**: 2026-06-05
**Issue**: F105
**Supersedes**: ADR 018 (implementation detail, not decision)
**Superseded by**: N/A

## Context

AWF's ACP server (introduced in ADR 018) was initially implemented as a custom JSON-RPC 2.0 server in `pkg/acpserver/` (~1620 lines across 10 files). This custom implementation:

1. Duplicates protocol conformance logic already solved by the official SDK
2. Increases maintenance burden when the ACP spec evolves or becomes semver-public
3. Blocks extensions that depend on SDK features (e.g., structured protocol updates in F108)
4. Requires custom error handling, session lifecycle management, and stdout serialization guarantees
5. Provides no advantage over the battle-tested official implementation

The official `github.com/coder/acp-go-sdk` (v0.13.x+) provides:

- Complete ACP protocol implementation with proper error handling
- Maintained by Coder with tight spec alignment
- Stdio transport with configurable payload caps
- Panic-safe handler execution primitives
- Regular updates aligned with official ACP releases
- Clean handler signatures supporting the required methods (Initialize, NewSession, Prompt, Cancel)

## Decision

Migrate the ACP server implementation from the custom `pkg/acpserver/` to the official SDK, wrapped in a new `internal/infrastructure/acp/` adapter package that:

1. Implements the `acp.Agent` interface from the SDK, delegating to `internal/application/acp_errors.go` taxonomy and application-layer `ACPSessionService`
2. Exposes the SDK connection lifecycle via `acp.NewAgentSideConnection(agent, stdout, stdin)` with proper `<-conn.Done()` cleanup
3. Wires `RequestPermission` transport through `ports.ACPClient` to `conn.RequestPermission`
4. Isolates SDK-specific types from the CLI layer, maintaining hexagonal architecture
5. Preserves 100% user-facing behavior parity with the legacy implementation (iso-functional)
6. Maintains panic isolation via `defer recover()` in handler wrappers with SDK-independent error recovery
7. Includes comprehensive test coverage (>85%) exercising the SDK's transport layer and concurrency invariants

## Rationale

### Architecture Compliance

The migration preserves the hexagonal layering principle by placing the SDK adapter in `internal/infrastructure/acp/` rather than directly using the SDK in `interfaces/cli/`. This allows:

- **Substitutability**: Future SDK upgrades or replacements require changes in one package only
- **Type isolation**: SDK types stay within the adapter; the CLI depends only on domain ports (ports.ACPClient for permission transport)
- **Clear ownership**: Protocol implementation logic is cleanly separated from command wiring and session coordination
- **Error taxonomy preservation**: Application layer `acp_errors.go` remains the single source of truth for `ACPHandlerError` kinds, mapped to SDK error variants in infra-only `toACPError`

This pattern mirrors the successful F104 MCP server migration (ADR 019) and follows the project's architectural rules. The `RequestPermission` transport binding is wired as an infrastructure adapter (`internal/infrastructure/acp/permission.go`) following the ports-and-adapters pattern.

### Gating and Risk Mitigation

A mandatory SPIKE (US3 in F105 spec) validates eight protocol-shape unknowns before any production code change or deletion:

1. SDK's `acp.Agent` interface signature and connection lifecycle (`NewAgentSideConnection`, `Done()`, `SetLogger`)
2. Handler signatures for Initialize, NewSession, Prompt, Cancel
3. Parking semantics for multi-turn prompts (per-prompt completion hook support)
4. SessionUpdate emission API (typed variants vs free-form payload)
5. Payload cap configuration (10 MiB read limit)
6. RequestPermission outbound call signature and stdout serialization
7. Error type mappings and SDK error variants
8. Protocol version number and minimum Go version requirements

SPIKE failure (any unknown unresolved) aborts F105 entirely per FR-014; this gate makes the big-bang migration approach safe.

## Consequences

**What becomes easier:**

- ACP server implementation gains maintenance parity with MCP (both via official SDKs)
- Future ACP spec additions (e.g., new session methods, structured content types for F108) are covered by SDK releases rather than custom protocol code
- Payload cap, error handling, and concurrent dispatch safety are guaranteed by the SDK rather than custom invariant tests
- Handler panics are caught and translated to proper ACP errors without exposing stack traces
- Two nearly-identical serve scaffolds (`mcp_serve.go` + `acp_serve.go`) can now follow identical SDK patterns, setting up a future DRY extraction

**What becomes harder:**

- Debugging ACP session issues requires familiarity with the SDK's internal error paths (though these are well-documented)
- Each `awf acp-serve` process consumes ~10 MB RSS. Long-lived editor sessions that never close their ACP process will hold that memory until the editor exits or explicitly closes the session (same as before)
- SDK version lock (v0.13.x) must be actively maintained; point releases are evaluated for breakage before upgrade
- Windows support remains deferred: signal-aware shutdown and process-group cleanup use POSIX-only syscalls (`Setpgid`, `syscall.Kill(-pgid, ...)`); ACP integration tests gate on `//go:build integration && !windows`

## Constitution Compliance

| Principle | Status | Justification |
|-----------|--------|---------------|
| Hexagonal Architecture | Compliant | SDK confined to `internal/infrastructure/acp/` via AST-based architecture test; domain gains no SDK types; application gets infra adapter for `RequestPermission` via `ports.ACPClient`; error taxonomy (`ACPHandlerError`) stays in application layer (24 in-package consumers); `.go-arch-lint.yml` updated with `go-sdk-acp` vendor and `infra-acp` component |
| Go Idioms | Compliant | `context.Context` threads from `acp_serve.RunE` through `conn.Serve` and handler dispatch; goroutine+channel for shutdown coordination; defer panic recovery with named returns following F104 pattern |
| Minimal Abstraction | Compliant | SDK adapter is infrastructure-only; single `ports.ACPClient` port method for permission requests; no new domain ports or abstractions |
| Error Taxonomy | Compliant | Application layer `ACPHandlerError` kinds map to SDK error variants in `toACPError`; five `USER.ACP.*` codes preserved (INVALID_PARAMS, UNSUPPORTED_BLOCK, PROMPT_IN_FLIGHT, UNKNOWN_SESSION, PROTOCOL_VERSION_UNSUPPORTED) |
| Security First | Compliant | `SecretMasker.MaskText` applied to all `agent_message_chunk`, `agent_thought_chunk`, and `tool_call` args before emission; 10 MiB `bufio.Scanner` ceiling (verified / configured against SDK default) prevents OOM; `signal.NotifyContext` SIGTERM→SIGKILL prevents zombie processes |
| Test-Driven Development | Compliant | SPIKE harness validates all 8 unknowns before production code starts; adapter coverage >85% on `internal/infrastructure/acp/` required (NFR-001); `make test-race` mandatory for concurrency-heavy code |
| Documentation Co-location | Compliant | `internal/infrastructure/acp/doc.go` ≥145 lines documenting Purpose, Public Surface, Internal Layout, Threat Model, Error Taxonomy, Dependency Contract, SDK Substitution patterns |

## Notes

**Deletion of `pkg/acpserver/`:**

The entire `pkg/acpserver/` package (10 files: doc.go, protocol.go, server.go, types.go, protocol_test.go, server_test.go, types_test.go, architecture_test.go, goroutine_leak_test.go, writeframe_internal_test.go) is deleted as part of F105 completion. This is the intended outcome: the custom implementation is fully replaced by the SDK adapter.

**ADR-018 Relationship:**

ADR-018 decided on the ACP protocol and the per-session subprocess architecture (`awf acp-serve`). This decision stands unchanged and is not superseded by F105. What F105 supersedes is the implementation detail in ADR-018's "Public package" section: moving from stdlib-only `pkg/acpserver/` to SDK-wrapped `internal/infrastructure/acp/` with `internal/domain/ports/acp_client.go` for the permission transport port.

**Comparison to F104 (MCP Migration):**

This migration follows the identical playbook as F104 (ADR 019):
- Mandatory SPIKE gate resolving SDK unknowns before production code
- New infrastructure adapter in `internal/infrastructure/{service}/`
- AST architecture test enforcing SDK confinement
- Panic isolation via defer recover wrappers
- Per-step or per-handler renderer/emitter preservation
- `.go-arch-lint.yml` updated with vendor stanza and component registration
- Big-bang approach with SPIKE failure abort gate

The F105 spec explicitly notes "F104 (MCP migration to the official go-sdk, commit 9740292) is the live blueprint for this work."
Loading
Loading