Skip to content

test: isolated onboarding harness for published @aoagents/ao#2049

Open
i-trytoohard wants to merge 2 commits into
mainfrom
test/onboarding-harness
Open

test: isolated onboarding harness for published @aoagents/ao#2049
i-trytoohard wants to merge 2 commits into
mainfrom
test/onboarding-harness

Conversation

@i-trytoohard
Copy link
Copy Markdown
Contributor

What

A local end-to-end onboarding test for a published @aoagents/ao npm version — npm install -gao start → dashboard serves → ao stop — kept completely isolated from any real ao install/runtime on the machine.

Deliverables

  • scripts/test-onboarding.sh — configurable: --version latest|nightly|exact, --mode fresh|coexist, --port (auto if omitted), --keep. Clear pass/fail, non-zero exit on failure.
  • skills/onboarding-test/SKILL.md (+ README entry) — any agent can onboarding-test its own PR/version; documents both modes, isolation guarantees, cleanup.

The GitHub Actions workflow was dropped at the maintainer's request — this is run locally / on demand. The SKILL documents the one-line CI step for wiring it up later. Per-PR source-build onboarding remains covered by the existing "Test Fresh Onboarding" job.

Isolation (all enforced by the script)

  • Temp npm --prefix first on PATH — real global modules untouched.
  • Sandbox HOME → separate ~/.agent-orchestrator; sandbox ao stop only ever sees the sandbox running.json.
  • Auto-allocated free dashboard + terminal WS ports; refuses reserved 3001/14800/14801.
  • runtime: process (no shared tmux server) — cannot see/kill real tmux sessions.
  • Throwaway git init repo; non-interactive (AO_CALLER_TYPE=agent + --no-orchestrator, no LLM key needed).
  • One temp root, removed on trap EXIT (success/failure/interrupt): graceful sandbox ao stop --all, kill recorded daemon pid tree, pkill -f <unique-temp-root> safety net, rm -rf.

coexist mode additionally snapshots the real running.json + tmux ls + listening ports before/after and fails loudly if anything moved.

Two modes

  • fresh — canonical clean onboarding (clean CI / new machine).
  • coexist — machine already runs ao; proves zero change to the real install.

Validation (run locally, both modes)

  • --version 0.8.0 --mode coexistPASS: dashboard /api/sessions 200, clean stop, and real ao state IDENTICAL before/after (running.json, tmux, ports).
  • --version 0.8.0 --mode freshPASS.
  • --version nightly --mode freshPASS.
  • Real daemon (pid 49919, port 3001, 5 tmux sessions) verified alive and unchanged after every run; no leftover temp dirs.

Bug the harness caught 🐛

--version latest (0.9.1) → FAIL: a fresh install returns HTTP 500 on the dashboard because the bundled agent-grok plugin throws Cannot find module '../package.json'. This is exactly what cdd1030d (#2040) fixes on main — but that fix is not in the published 0.9.1 release. The harness surfaced a real, ship-blocking onboarding regression in the current published package.

🤖 Generated with Claude Code

Adds a local end-to-end onboarding test for a PUBLISHED npm version of ao:
npm install -g → ao start → dashboard serves → ao stop, fully isolated
from any real ao install/runtime on the machine.

- scripts/test-onboarding.sh: configurable (--version latest|nightly|exact,
  --mode fresh|coexist, --port, --keep). Isolates via temp npm --prefix,
  sandbox HOME, auto free ports, runtime: process (no shared tmux), throwaway
  git repo, AO_CALLER_TYPE=agent, single temp root cleaned on trap EXIT.
  coexist mode snapshots real running.json + tmux + listening ports
  before/after and fails if perturbed.
- skills/onboarding-test/SKILL.md (+ README entry): lets any agent
  onboarding-test its own PR/version, documenting both modes, isolation
  guarantees, and cleanup.

Run locally / on demand (not gated per-PR — the published package, not the
PR's source, is what it exercises).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Test Coverage Report

No TypeScript source files changed in this PR.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 23, 2026

Greptile Summary

This PR adds a local, on-demand end-to-end onboarding harness that installs a published @aoagents/ao npm version into an isolated temp prefix, starts the dashboard, verifies it serves, and stops cleanly — with an accompanying SKILL.md so agents can invoke it autonomously.

  • scripts/test-onboarding.sh: New script with two modes (fresh / coexist) and a clean isolation model (sandbox HOME, temp npm prefix, process runtime, throwaway git repo, single trap EXIT cleanup root).
  • skills/onboarding-test/SKILL.md + skills/README.md: Documents the harness, both modes, isolation guarantees, and how to test unmerged branches via verdaccio.

Confidence Score: 4/5

Safe to merge with a fix to the coexist lsof snapshot — the current implementation will produce spurious failures on any active development machine.

The coexist mode snapshots all listening TCP ports system-wide and diffs them before/after. On an active development machine — the explicit intended audience for coexist — any unrelated service binding or releasing a port during the test window will register as an isolation breach, making the mode unreliable in its primary use case.

scripts/test-onboarding.sh — specifically the lsof snapshot in snapshot_real and the diff -u comparison in the coexist block.

Important Files Changed

Filename Overview
scripts/test-onboarding.sh New E2E onboarding test harness. Solid isolation design; coexist mode's system-wide lsof snapshot will produce false positives from unrelated TCP port activity on busy machines.
skills/onboarding-test/SKILL.md New skill document describing the onboarding test harness — well-written, accurately reflects the script's behavior and isolation guarantees.
skills/README.md Adds a single table row registering the new onboarding-test skill — no issues.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
scripts/test-onboarding.sh:183-187
**`coexist` lsof snapshot will produce false positives from unrelated port activity**

The before/after diff covers every TCP LISTEN socket on the entire machine. On an active development machine — the exact use case `coexist` mode is designed for — any other service that binds or releases a port during the ~15–30 s test window (a restarting dev server, a Docker container, `npx` ephemeral installs, etc.) will change the snapshot and fail the check. The false positive cannot be distinguished from a real isolation breach. Filtering the `lsof` snapshot to only ao's specific ports would eliminate the false-positive surface while still catching any actual sandbox leakage.

### Issue 2 of 2
scripts/test-onboarding.sh:79-82
`--port` accepts any string but only guards against three literal reserved values. Passing a non-numeric value (e.g. `--port foo`) silently slips through validation and produces a cryptic failure deep inside `ao start` rather than a clear argument error at the entry point.

```suggestion
case "$MODE" in
  fresh|coexist) ;;
  *) echo "Invalid --mode: $MODE (want fresh|coexist)" >&2; exit 2 ;;
esac

if [ -n "$FIXED_PORT" ]; then
  case "$FIXED_PORT" in
    ''|*[!0-9]*) echo "--port must be a positive integer, got: $FIXED_PORT" >&2; exit 2 ;;
  esac
fi
```

Reviews (2): Last reviewed commit: "test: support testing a branch via --reg..." | Re-trigger Greptile

Comment on lines +151 to +153
free_port() {
node -e 'const n=require("net");const s=n.createServer();s.listen(0,"127.0.0.1",()=>{const p=s.address().port;s.close(()=>console.log(p));});'
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Port TOCTOU and missing duplicate check

free_port() asks the kernel for a free port, closes the server immediately, then returns the number. The port is now unbound — another process on a busy machine can claim it before ao start binds to it. More critically, the three sequential free_port() calls are not checked for duplicates: if the kernel recycles the just-freed port on the next call, PORT, TERM_PORT, and DIRECT_TERM_PORT could collide. ao start would then try to bind two identical ports, fail to start, and the test would produce a misleading "daemon never appeared" error rather than a clear allocation failure.

Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/test-onboarding.sh
Line: 151-153

Comment:
**Port TOCTOU and missing duplicate check**

`free_port()` asks the kernel for a free port, closes the server immediately, then returns the number. The port is now unbound — another process on a busy machine can claim it before `ao start` binds to it. More critically, the three sequential `free_port()` calls are not checked for duplicates: if the kernel recycles the just-freed port on the next call, `PORT`, `TERM_PORT`, and `DIRECT_TERM_PORT` could collide. `ao start` would then try to bind two identical ports, fail to start, and the test would produce a misleading "daemon never appeared" error rather than a clear allocation failure.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread scripts/test-onboarding.sh
Comment on lines +324 to +333
step "Verifying clean shutdown"
for _ in $(seq 1 40); do
kill -0 "$RUNNING_PID" 2>/dev/null || break
sleep 0.5
done
kill -0 "$RUNNING_PID" 2>/dev/null && die "daemon pid $RUNNING_PID still alive after stop"
[ -f "$SANDBOX_RUNNING" ] && die "sandbox running.json still present after stop"
curl -sf "http://127.0.0.1:$PORT/api/sessions" >/dev/null 2>&1 \
&& die "dashboard still serving on $PORT after stop" || true
ok "Sandbox daemon stopped, running.json cleared, port released"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 kill -0 succeeds for zombie processes

After ao stop signals the daemon, the daemon can briefly enter the zombie state (exited but not yet reaped by its parent). kill -0 <zombie_pid> returns 0 on Linux until the parent calls wait(), so the poll loop runs all 40 iterations (20 s) before the final kill -0 check at line 329 can then pass or fail depending on reap timing. On fast machines the daemon's parent is init/systemd (since it daemonized), which reaps zombies quickly, but on slower machines this can add unexpected wall-clock time to the test.

Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/test-onboarding.sh
Line: 324-333

Comment:
**`kill -0` succeeds for zombie processes**

After `ao stop` signals the daemon, the daemon can briefly enter the zombie state (exited but not yet reaped by its parent). `kill -0 <zombie_pid>` returns 0 on Linux until the parent calls `wait()`, so the poll loop runs all 40 iterations (20 s) before the final `kill -0` check at line 329 can then pass or fail depending on reap timing. On fast machines the daemon's parent is `init`/`systemd` (since it daemonized), which reaps zombies quickly, but on slower machines this can add unexpected wall-clock time to the test.

How can I resolve this? If you propose a fix, please make it concise.

…route

Adds --registry <url> so an unmerged branch published to a throwaway local
registry (verdaccio) can be onboarding-tested with full packaging fidelity,
keeping every isolation guarantee. Documents that merged commits are already
covered by the auto-published per-commit nightly (--version 0.x.y-nightly-<sha>),
and why a source/npm-link build cannot reproduce packaging bugs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant