Skip to content

feat(backend): add Cobra CLI foundation#53

Merged
AgentWrapper merged 12 commits into
mainfrom
feat/cli-foundation
May 31, 2026
Merged

feat(backend): add Cobra CLI foundation#53
AgentWrapper merged 12 commits into
mainfrom
feat/cli-foundation

Conversation

@illegalcall
Copy link
Copy Markdown
Collaborator

Summary

  • adds a Go/Cobra ao CLI entrypoint under backend/cmd/ao
  • extracts daemon startup into internal/daemon and keeps go run . as a compatibility wrapper
  • implements daemon-control commands: hidden ao daemon, ao start, ao stop, ao status, ao doctor, ao completion, and ao version
  • documents the current CLI surface, what works now, and the next product-command steps

What works now

  • ao start starts the daemon and waits for /readyz
  • ao status / ao status --json report daemon state from running.json and health probes
  • ao stop gracefully terminates the daemon and cleans stale run-files
  • ao doctor checks config, data dir, SQLite migrations, daemon state, and local tool availability
  • shell completions generate for bash, zsh, fish, and powershell

Next steps

  1. Add /api/v1/projects over a small project service.
  2. Implement ao project list/add/show/remove.
  3. Wire Session Manager production dependencies.
  4. Add /api/v1/sessions and implement ao spawn, ao session ..., and ao send.
  5. Add /events SSE/list reads and implement ao events tail/list.

Verification

  • git diff --check
  • gofmt -l . && go build ./... && go test ./...
  • go vet ./...
  • go test -race ./...
  • E2E matrix with isolated temp AO_RUN_FILE, AO_DATA_DIR, and AO_PORT: --help, version, --version, all completion shells, stopped status --json, doctor, start, idempotent start --json, ready status, ready status --json, stop, final stopped status --json

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 31, 2026

Greptile Summary

This PR adds a Cobra-based ao CLI foundation to the Go backend, extracting daemon startup into internal/daemon and implementing daemon-control commands (start, stop, status, doctor, completion, version). The previous review findings — Windows processAlive always returning true and the concurrent-start run-file clobber race — are both addressed.

  • internal/cli introduces injectable Deps for testability, a two-tier exit-code scheme (usage=2 / runtime=1), and a layered status state machine (stopped → stale → unhealthy → not_ready → ready) with PID+service-name cross-verification against /healthz and /readyz to avoid mis-signalling a reused PID.
  • internal/httpd/router.go adds POST /shutdown guarded by localControlRequest (blocks cross-Origin and non-loopback Host headers to prevent CSRF/DNS-rebinding), and enriches healthz/readyz responses with service and pid fields consumed by the CLI.
  • A two-tier CI matrix (ubuntu/macos/windows native Go E2E + container smoke test) covers the OS-specific detach paths and freshly-installed binary behaviour.

Confidence Score: 5/5

Safe to merge. The daemon-control commands, shutdown guard, and ownership-verification logic are all well-tested, and the two prior review findings (Windows process-liveness and concurrent-start run-file clobber) are addressed.

All changed paths are covered by unit tests (root_test, stop_test, exitcode_test, control_test, server_test) and a cross-platform E2E suite. The shutdown guard, PID reuse protection, and stale run-file handling are each tested by dedicated cases. No logic errors were found in the production code paths.

No files require special attention. The one nit is in e2e_test.go's httpGet helper (single Read call), which is test code only and unlikely to cause flakes given the small response sizes.

Important Files Changed

Filename Overview
backend/internal/cli/e2e_test.go Comprehensive E2E suite covering lifecycle, stale run-file, exit codes, completions, shutdown guard, and idempotent start. httpGet uses a single Read call instead of io.ReadAll, which could produce a truncated body under load.
backend/internal/cli/start.go Implements daemon start with readiness polling. Correctly handles stale run-files, idempotent start for an already-ready daemon, and timeout propagation.
backend/internal/cli/stop.go Daemon shutdown via POST /shutdown with an ownership-verified gate before signalling. The concurrent-start race fix (only remove run-file when info.PID == stoppedPID) is in place and covered by the new stop_test.go.
backend/internal/cli/status.go Status inspection with layered state machine (stopped → stale → unhealthy → not_ready → ready). Uses service-name and PID cross-check from healthz/readyz to prevent mis-signalling a reused PID.
backend/internal/cli/process_windows.go Fixes the prior review finding: now uses WaitForSingleObject(handle, 0) instead of os.FindProcess to correctly distinguish live from exited processes on Windows.
backend/internal/httpd/router.go Adds POST /shutdown with a localControlRequest guard (rejects cross-origin and non-loopback Host). healthz/readyz now include service name and PID for ownership verification by the CLI.
backend/internal/daemon/daemon.go Extracts daemon startup from backend/main.go into internal/daemon. Explicit shutdown ordering (stop() → lcStack.Stop() → cdcPipe.Stop()) avoids the LIFO defer trap.
backend/internal/cli/root.go Root command factory with injectable Deps for testability. UsageError / ExitCode separation (exit 2 for misuse, exit 1 for runtime failures) is clean and tested.
backend/internal/cli/doctor.go Doctor checks config, data dir, SQLite file presence (without opening/migrating it), daemon state, and tool availability. Correctly avoids being a second store writer.
.github/workflows/cli-e2e.yml Two-tier CI: native Go E2E on ubuntu/macos/windows and a container smoke test on a slim image. fail-fast: false ensures all OS results are visible.

Sequence Diagram

sequenceDiagram
    participant User
    participant ao_start as ao start
    participant ao_stop as ao stop
    participant ao_status as ao status
    participant RunFile as running.json
    participant Daemon as ao daemon

    User->>ao_start: ao start
    ao_start->>RunFile: Read (inspect state)
    alt already ready
        RunFile-->>ao_start: PID+Port (live, owned)
        ao_start-->>User: daemon ready (pid, port)
    else stopped or stale
        ao_start->>RunFile: Remove (if stale)
        ao_start->>Daemon: spawn subprocess (detached)
        loop poll /readyz every 100ms
            ao_start->>Daemon: "GET /healthz -> verify service+pid"
            ao_start->>Daemon: "GET /readyz  -> verify service+pid"
        end
        Daemon->>RunFile: Write PID+Port
        ao_start-->>User: daemon ready (pid, port)
    end

    User->>ao_status: ao status [--json]
    ao_status->>RunFile: Read
    ao_status->>Daemon: GET /healthz (verify owner)
    ao_status->>Daemon: GET /readyz  (verify owner)
    ao_status-->>User: "state: ready | stale | unhealthy | not_ready | stopped"

    User->>ao_stop: ao stop
    ao_stop->>RunFile: Read + inspectDaemon
    ao_stop->>Daemon: POST /shutdown (localControlRequest gate)
    Daemon->>RunFile: Remove
    loop poll until run-file gone or PID dead
        ao_stop->>RunFile: Read
        ao_stop->>Daemon: ProcessAlive(pid)?
    end
    ao_stop-->>User: daemon stopped
Loading

Reviews (8): Last reviewed commit: "test(cli): port the E2E suite to cross-p..." | Re-trigger Greptile

Comment thread backend/internal/cli/process_windows.go
Comment thread docs/cli/README.md Outdated
Comment thread backend/internal/cli/completion.go Outdated
Comment thread backend/internal/cli/stop.go
illegalcall and others added 6 commits June 1, 2026 01:44
The shutdown endpoint test was authored against the pre-rebase
httpd.New(cfg, log) signature. After rebasing onto main, the terminal
manager (from #50) made termMgr a required third arg. Pass nil — the
test exercises /shutdown, not /mux, so the terminal surface stays off.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… store

Addresses review findings on PR #53 (on top of the rebase onto main).

- doctor: stop opening/migrating SQLite. The daemon is the sole store
  writer/migrator (architecture.md §7); the CLI must not run migrations or
  open a second writer against a DB a live daemon owns. doctor now reports
  database-file presence and gains --json.
- stop: only remove running.json when it still belongs to the PID we
  stopped, so a concurrent `ao start` that wrote a new run-file is not
  clobbered into looking stopped.
- httpd: gate POST /shutdown to loopback callers with no Origin header,
  closing the CSRF / DNS-rebinding vector against an unauthenticated,
  state-changing endpoint.
- start: detach the spawned daemon into its own session/process group so a
  Ctrl-C while `ao start` waits for readiness doesn't also kill it.
- cli: exit 2 for usage errors (bad flag / arg count) vs 1 for runtime
  failures.
- daemon: unexport newLogger (only used in-package).
- tests: /shutdown guard (cross-origin + rebinding) and stop run-file
  ownership guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AgentWrapper
Copy link
Copy Markdown
Contributor

Took ownership of this PR to get it green and address the review.

Rebase / conflict resolution. The branch is now rebased onto current main (the daemon refactor had forked before #50 terminal streaming and #52 Session Manager landed). The daemon startup in internal/daemon/daemon.go builds the tmux runtime + terminal Manager and starts the Session Manager, threading termMgr through httpd.New so the /mux WebSocket terminal surface and SM wiring are preserved. PR is now mergeable.

Review fixes (2d00e46):

  • doctor no longer opens/migrates SQLite — the daemon is the sole store writer/migrator (architecture.md §7), so the CLI must not run migrations or open a second writer against a DB a live daemon owns. It now reports database-file presence and gained --json.
  • stop only removes running.json when it still belongs to the stopped PID, so a concurrent ao start is not clobbered (greptile P1).
  • POST /shutdown is gated to loopback callers with no Origin header, closing the CSRF / DNS-rebinding vector on an unauthenticated, state-changing endpoint.
  • start detaches the spawned daemon into its own session/process group so a Ctrl-C while waiting for readiness doesn't kill it.
  • usage errors now exit 2 vs 1 for runtime failures; newLogger unexported.
  • Added tests for the /shutdown guard and the stop run-file ownership guard.

gofmt, go vet, and go test -race ./... all pass; verified the built binary end-to-end (start → status → stop, plus a cross-origin /shutdown returning 403 with the daemon surviving).

i-trytoohard and others added 5 commits June 1, 2026 02:33
Adds a fresh-machine, install→use→verify E2E test for the `ao` CLI and wires
it into CI. The suite drives the real binary (start/status/doctor/stop + the
daemon-control HTTP surface) against fully isolated state — its own temp
run-file, data dir, and an auto-picked free loopback port — so it never
collides with a developer's real AO install or daemon.

- test/cli/smoke.sh: 40 assertions covering install resolution, version/help
  (daemon hidden), doctor text+json (and that it does NOT migrate SQLite),
  status stopped/stale/ready, start fresh+idempotent, daemon-created store,
  /healthz identity, the /shutdown CSRF + DNS-rebinding guard (403 + survives),
  graceful/stale/idempotent stop, run-file ownership cleanup, exit codes
  (2 usage / 1 runtime), and completion for all four shells. It deliberately
  ignores an inherited AO_PORT and self-allocates a free port for isolation.
- test/cli/Dockerfile: models installing ao on a fresh machine — builds the
  binary, drops it on PATH in a clean Debian image with only runtime deps
  (git/tmux/curl), runs the suite as a non-root user.
- test/cli/run-local.sh: build-from-source + native run convenience wrapper.
- .github/workflows/cli-e2e.yml: two tiers — `native` runs the suite on a
  ubuntu+macos runner matrix (the real VMs, to cover the unix setsid detach and
  macOS os.UserConfigDir paths a Linux container can't), and `container` runs
  the fresh-machine Docker image with --init (real PID-1 reaper so the
  stale-daemon assertion is reliable).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The stale-daemon assertion does not depend on container PID-1 reaping — it
writes a fabricated dead PID rather than killing a real process. --init is
still run so the real daemon spawned by the `start` test (detached via setsid)
is reaped after `stop` instead of lingering as a zombie. Reword the README,
Dockerfile, and workflow comments to say that accurately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the one nit from the regression audit: the exit-code wiring was correct
and covered end-to-end by the smoke test, but not pinned by a unit test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tput

run-local.sh -v (or AO_SMOKE_VERBOSE=1) makes smoke.sh echo every command and
its complete output, indented, alongside the PASS/FAIL — for auditing exactly
what the suite runs and what the CLI returns. Default output is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…arness

Replaces the growing bash smoke test with a Go os/exec suite behind the `e2e`
build tag (backend/internal/cli/e2e_test.go). It builds the real binary and
drives start/status/doctor/stop + the daemon-control HTTP surface against
isolated state (temp dir + OS-assigned free port), and now runs natively on
ubuntu + macOS + WINDOWS in CI — finally covering the Windows
CREATE_NEW_PROCESS_GROUP detach path and per-OS os.UserConfigDir resolution
that a Linux container can't observe. `go test -tags e2e -v` logs every command
and its output, replacing the bash -v flag.

- backend/internal/cli/e2e_test.go: 8 table-style TestE2E_* cases; strips any
  inherited AO_* env so a real daemon's AO_PORT can't leak in.
- test/cli/install-check.sh: small, linear fresh-install proof the Dockerfile
  runs (binary on PATH, no toolchain) — kept as the hardening tier.
- test/cli/Dockerfile: run install-check.sh instead of the full bash suite.
- .github/workflows/cli-e2e.yml: `native` is now a go test matrix over
  ubuntu+macos+windows; `container` builds the image and runs it with --init.
- Removes test/cli/smoke.sh and test/cli/run-local.sh (superseded by `go test`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AgentWrapper AgentWrapper merged commit 721888f into main May 31, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants