test: isolated onboarding harness for published @aoagents/ao#2049
test: isolated onboarding harness for published @aoagents/ao#2049i-trytoohard wants to merge 2 commits into
Conversation
Adds a local end-to-end onboarding test for a PUBLISHED npm version of ao: npm install -g → ao start → dashboard serves → ao stop, fully isolated from any real ao install/runtime on the machine. - scripts/test-onboarding.sh: configurable (--version latest|nightly|exact, --mode fresh|coexist, --port, --keep). Isolates via temp npm --prefix, sandbox HOME, auto free ports, runtime: process (no shared tmux), throwaway git repo, AO_CALLER_TYPE=agent, single temp root cleaned on trap EXIT. coexist mode snapshots real running.json + tmux + listening ports before/after and fails if perturbed. - skills/onboarding-test/SKILL.md (+ README entry): lets any agent onboarding-test its own PR/version, documenting both modes, isolation guarantees, and cleanup. Run locally / on demand (not gated per-PR — the published package, not the PR's source, is what it exercises). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test Coverage ReportNo TypeScript source files changed in this PR. |
Greptile SummaryThis PR adds a local, on-demand end-to-end onboarding harness that installs a published
Confidence Score: 4/5Safe to merge with a fix to the coexist lsof snapshot — the current implementation will produce spurious failures on any active development machine. The coexist mode snapshots all listening TCP ports system-wide and diffs them before/after. On an active development machine — the explicit intended audience for coexist — any unrelated service binding or releasing a port during the test window will register as an isolation breach, making the mode unreliable in its primary use case. scripts/test-onboarding.sh — specifically the lsof snapshot in snapshot_real and the diff -u comparison in the coexist block.
|
| Filename | Overview |
|---|---|
| scripts/test-onboarding.sh | New E2E onboarding test harness. Solid isolation design; coexist mode's system-wide lsof snapshot will produce false positives from unrelated TCP port activity on busy machines. |
| skills/onboarding-test/SKILL.md | New skill document describing the onboarding test harness — well-written, accurately reflects the script's behavior and isolation guarantees. |
| skills/README.md | Adds a single table row registering the new onboarding-test skill — no issues. |
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
scripts/test-onboarding.sh:183-187
**`coexist` lsof snapshot will produce false positives from unrelated port activity**
The before/after diff covers every TCP LISTEN socket on the entire machine. On an active development machine — the exact use case `coexist` mode is designed for — any other service that binds or releases a port during the ~15–30 s test window (a restarting dev server, a Docker container, `npx` ephemeral installs, etc.) will change the snapshot and fail the check. The false positive cannot be distinguished from a real isolation breach. Filtering the `lsof` snapshot to only ao's specific ports would eliminate the false-positive surface while still catching any actual sandbox leakage.
### Issue 2 of 2
scripts/test-onboarding.sh:79-82
`--port` accepts any string but only guards against three literal reserved values. Passing a non-numeric value (e.g. `--port foo`) silently slips through validation and produces a cryptic failure deep inside `ao start` rather than a clear argument error at the entry point.
```suggestion
case "$MODE" in
fresh|coexist) ;;
*) echo "Invalid --mode: $MODE (want fresh|coexist)" >&2; exit 2 ;;
esac
if [ -n "$FIXED_PORT" ]; then
case "$FIXED_PORT" in
''|*[!0-9]*) echo "--port must be a positive integer, got: $FIXED_PORT" >&2; exit 2 ;;
esac
fi
```
Reviews (2): Last reviewed commit: "test: support testing a branch via --reg..." | Re-trigger Greptile
| free_port() { | ||
| node -e 'const n=require("net");const s=n.createServer();s.listen(0,"127.0.0.1",()=>{const p=s.address().port;s.close(()=>console.log(p));});' | ||
| } |
There was a problem hiding this comment.
Port TOCTOU and missing duplicate check
free_port() asks the kernel for a free port, closes the server immediately, then returns the number. The port is now unbound — another process on a busy machine can claim it before ao start binds to it. More critically, the three sequential free_port() calls are not checked for duplicates: if the kernel recycles the just-freed port on the next call, PORT, TERM_PORT, and DIRECT_TERM_PORT could collide. ao start would then try to bind two identical ports, fail to start, and the test would produce a misleading "daemon never appeared" error rather than a clear allocation failure.
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/test-onboarding.sh
Line: 151-153
Comment:
**Port TOCTOU and missing duplicate check**
`free_port()` asks the kernel for a free port, closes the server immediately, then returns the number. The port is now unbound — another process on a busy machine can claim it before `ao start` binds to it. More critically, the three sequential `free_port()` calls are not checked for duplicates: if the kernel recycles the just-freed port on the next call, `PORT`, `TERM_PORT`, and `DIRECT_TERM_PORT` could collide. `ao start` would then try to bind two identical ports, fail to start, and the test would produce a misleading "daemon never appeared" error rather than a clear allocation failure.
How can I resolve this? If you propose a fix, please make it concise.| step "Verifying clean shutdown" | ||
| for _ in $(seq 1 40); do | ||
| kill -0 "$RUNNING_PID" 2>/dev/null || break | ||
| sleep 0.5 | ||
| done | ||
| kill -0 "$RUNNING_PID" 2>/dev/null && die "daemon pid $RUNNING_PID still alive after stop" | ||
| [ -f "$SANDBOX_RUNNING" ] && die "sandbox running.json still present after stop" | ||
| curl -sf "http://127.0.0.1:$PORT/api/sessions" >/dev/null 2>&1 \ | ||
| && die "dashboard still serving on $PORT after stop" || true | ||
| ok "Sandbox daemon stopped, running.json cleared, port released" |
There was a problem hiding this comment.
kill -0 succeeds for zombie processes
After ao stop signals the daemon, the daemon can briefly enter the zombie state (exited but not yet reaped by its parent). kill -0 <zombie_pid> returns 0 on Linux until the parent calls wait(), so the poll loop runs all 40 iterations (20 s) before the final kill -0 check at line 329 can then pass or fail depending on reap timing. On fast machines the daemon's parent is init/systemd (since it daemonized), which reaps zombies quickly, but on slower machines this can add unexpected wall-clock time to the test.
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/test-onboarding.sh
Line: 324-333
Comment:
**`kill -0` succeeds for zombie processes**
After `ao stop` signals the daemon, the daemon can briefly enter the zombie state (exited but not yet reaped by its parent). `kill -0 <zombie_pid>` returns 0 on Linux until the parent calls `wait()`, so the poll loop runs all 40 iterations (20 s) before the final `kill -0` check at line 329 can then pass or fail depending on reap timing. On fast machines the daemon's parent is `init`/`systemd` (since it daemonized), which reaps zombies quickly, but on slower machines this can add unexpected wall-clock time to the test.
How can I resolve this? If you propose a fix, please make it concise.…route Adds --registry <url> so an unmerged branch published to a throwaway local registry (verdaccio) can be onboarding-tested with full packaging fidelity, keeping every isolation guarantee. Documents that merged commits are already covered by the auto-published per-commit nightly (--version 0.x.y-nightly-<sha>), and why a source/npm-link build cannot reproduce packaging bugs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What
A local end-to-end onboarding test for a published
@aoagents/aonpm version —npm install -g→ao start→ dashboard serves →ao stop— kept completely isolated from any real ao install/runtime on the machine.Deliverables
scripts/test-onboarding.sh— configurable:--version latest|nightly|exact,--mode fresh|coexist,--port(auto if omitted),--keep. Clear pass/fail, non-zero exit on failure.skills/onboarding-test/SKILL.md(+ README entry) — any agent can onboarding-test its own PR/version; documents both modes, isolation guarantees, cleanup.Isolation (all enforced by the script)
--prefixfirst onPATH— real global modules untouched.HOME→ separate~/.agent-orchestrator; sandboxao stoponly ever sees the sandboxrunning.json.3001/14800/14801.runtime: process(no shared tmux server) — cannot see/kill real tmux sessions.git initrepo; non-interactive (AO_CALLER_TYPE=agent+--no-orchestrator, no LLM key needed).trap EXIT(success/failure/interrupt): graceful sandboxao stop --all, kill recorded daemon pid tree,pkill -f <unique-temp-root>safety net,rm -rf.coexistmode additionally snapshots the realrunning.json+tmux ls+ listening ports before/after and fails loudly if anything moved.Two modes
Validation (run locally, both modes)
--version 0.8.0 --mode coexist→ PASS: dashboard/api/sessions200, clean stop, and real ao state IDENTICAL before/after (running.json, tmux, ports).--version 0.8.0 --mode fresh→ PASS.--version nightly --mode fresh→ PASS.Bug the harness caught 🐛
--version latest(0.9.1) → FAIL: a fresh install returns HTTP 500 on the dashboard because the bundledagent-grokplugin throwsCannot find module '../package.json'. This is exactly whatcdd1030d (#2040)fixes onmain— but that fix is not in the published 0.9.1 release. The harness surfaced a real, ship-blocking onboarding regression in the current published package.🤖 Generated with Claude Code