Worker orchestrator smoke test: message flow works, CLI PATH and zellij test gaps remain #89

@harshitsinghbhandari

This was generated by AI during triage.

Summary

Manual smoke testing of feat/worker-orchestrator confirmed that the core worker-orchestrator flow works when launched with the locally built daemon binary, but exposed three repo issues that should be fixed before treating the flow as fully shippable:

  • spawned agents resolve ao from the user's PATH, which can point at an older installed CLI whose syntax does not match the generated prompts;
  • starting the daemon with the freshly built binary prefixed onto PATH makes orchestrator creation fail with only a generic SESSION_OPERATION_FAILED error;
  • zellij integration tests can fail on macOS because generated socket paths exceed zellij's IPC path limit.

Checkout Tested

cd /Users/harshitsinghbhandari/.agent-orchestrator/projects/ao-agents_af10fe8c96/worktrees/aa-47
git switch feat/worker-orchestrator
cd backend
go build -o /tmp/ao ./cmd/ao

The requested aa-47 worktree did not exist initially, so it was created with:

git worktree add /Users/harshitsinghbhandari/.agent-orchestrator/projects/ao-agents_af10fe8c96/worktrees/aa-47 feat/worker-orchestrator

Isolated Daemon State Used

export AO_PORT=3131
export AO_RUN_FILE=/tmp/ao-worker-orch-running.json
export AO_DATA_DIR=/tmp/ao-worker-orch-data
export AO_AGENT=codex

rm -rf "$AO_DATA_DIR" "$AO_RUN_FILE"

Daemon startup worked:

/tmp/ao start --log-file /tmp/ao-worker-orch.log
/tmp/ao status

Observed status included:

AO daemon: ready
  pid: 31911
  port: 3131
  run file: /tmp/ao-worker-orch-running.json
  data dir: /tmp/ao-worker-orch-data
  healthz: ok
  readyz: ready

Commands That Worked

Project registration worked:

/tmp/ao project add \
  --path /Users/harshitsinghbhandari/.agent-orchestrator/projects/ao-agents_af10fe8c96/worktrees/aa-47 \
  --id testao

Observed:

registered project testao at /Users/harshitsinghbhandari/.agent-orchestrator/projects/ao-agents_af10fe8c96/worktrees/aa-47

Orchestrator creation via API worked:

curl -sS -X POST "http://127.0.0.1:3131/api/v1/orchestrators" \
  -H 'Content-Type: application/json' \
  -d '{"projectId":"testao"}' | jq

Observed:

{
  "orchestrator": {
    "id": "testao-1",
    "projectId": "testao"
  }
}

Worker spawn worked:

/tmp/ao spawn \
  --project testao \
  --prompt "Create a tiny test file named ORCH_TEST.md saying hello from worker"

Observed:

spawned session testao-2 (idle)
attach with: ZELLIJ_SOCKET_DIR=/tmp/ao-zellij-501 zellij attach testao-2

Manual worker-to-orchestrator send worked:

AO_SESSION_ID=testao-2 /tmp/ao send \
  --session testao-1 \
  --message "I am blocked; please coordinate"

The orchestrator pane visibly received:

[from testao-2] I am blocked; please coordinate

The worker completed the requested test task and created:

ORCH_TEST.md

with content:

hello from worker

Prompt Verification

The generated prompts were checked using zellij screen dumps and pane metadata:

ZELLIJ_SOCKET_DIR=/tmp/ao-zellij-501 zellij --session testao-1 action dump-screen --pane-id terminal_0 --full
ZELLIJ_SOCKET_DIR=/tmp/ao-zellij-501 zellij --session testao-2 action dump-screen --pane-id terminal_0 --full
ZELLIJ_SOCKET_DIR=/tmp/ao-zellij-501 zellij --session testao-1 action list-panes --json --all --command --state --tab
ZELLIJ_SOCKET_DIR=/tmp/ao-zellij-501 zellij --session testao-2 action list-panes --json --all --command --state --tab

Confirmed:

  • testao-1 prompt says it is the coordinator and should spawn/message workers.
  • testao-2 prompt includes the orchestrator coordination block.
  • Both runtime commands export AO_SESSION_ID.
  • Worker prompt includes:
An active orchestrator session exists for this project.

and:

ao send --session testao-1 --message "<your message>"

Issue 1: Spawned Agents Can Resolve The Wrong ao

Inside spawned Codex panes, plain ao resolved to the installed Homebrew binary:

command -v ao
ao --version

Observed:

/opt/homebrew/bin/ao
0.9.2

That older CLI does not support the new prompt syntax. When the orchestrator attempted to message the worker using the prompt's command:

ao send --session testao-2 --message "..."

it failed with:

error: unknown option '--session'
(Did you mean --version?)

The local built binary did support the expected syntax:

/tmp/ao send --help

showed:

Flags:
  --message string   Message body (required)
  --session string   Session id (required)

Expected

Spawned workers and orchestrators should use the AO CLI matching the daemon/branch under test, or the prompt should invoke an explicit configured AO CLI path.

Actual

The generated prompts hardcode plain ao, so spawned agents can pick up an unrelated installed version from PATH.

Likely Fix Direction

Add an explicit AO CLI path to runtime config/env/prompt generation, or ensure the runtime launch environment puts the daemon-matching AO binary ahead of any globally installed ao.
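
A minimal shell sketch of the explicit-path option: the runtime would export the daemon-matching binary in a variable, and generated prompts would call a resolver instead of plain ao. AO_CLI_PATH and resolve_ao are hypothetical names used only for illustration; neither exists in the repo today.

```shell
# resolve_ao prints the AO CLI that a spawned agent should invoke.
# AO_CLI_PATH is a hypothetical variable the runtime would export when
# launching a session; it is not an existing AO setting.
resolve_ao() {
  if [ -n "${AO_CLI_PATH:-}" ] && [ -x "$AO_CLI_PATH" ]; then
    # Prefer the explicitly configured, daemon-matching binary.
    printf '%s\n' "$AO_CLI_PATH"
  else
    # Fall back to whatever PATH provides (the behavior observed today).
    command -v ao
  fi
}
```

A generated prompt could then use "$(resolve_ao)" send --session testao-1 --message "<your message>" in place of plain ao. This approach leaves PATH itself untouched, which may matter given the failure described in Issue 2.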

Issue 2: PATH Prefixing A Test Binary Made Orchestrator Creation Fail

Attempts to start the daemon with a PATH prefix pointing at the freshly built binary caused orchestrator creation to fail:

PATH=/tmp:$PATH AO_PORT=3131 AO_RUN_FILE=/tmp/ao-worker-orch-running.json AO_DATA_DIR=/tmp/ao-worker-orch-data AO_AGENT=codex /tmp/ao start --log-file /tmp/ao-worker-orch.log

and later with a repo-local binary dir:

PATH=/Users/harshitsinghbhandari/.agent-orchestrator/projects/ao-agents_af10fe8c96/worktrees/aa-47/.ao-test-bin:$PATH .../.ao-test-bin/ao start --log-file /tmp/ao-worker-orch.log

Both resulted in the orchestrator API returning:

{
  "error": "internal",
  "code": "SESSION_OPERATION_FAILED",
  "message": "Session operation failed"
}

GET /api/v1/sessions then showed testao-1 inserted but immediately terminated:

{
  "id": "testao-1",
  "projectId": "testao",
  "kind": "orchestrator",
  "activity": { "state": "exited" },
  "isTerminated": true,
  "status": "terminated"
}

The daemon log only showed the 500 response, without enough diagnostic detail to identify why the runtime launch exited.

Expected

Putting a branch-local AO binary earlier in PATH should not cause the spawned Codex session to exit during orchestrator creation.

Actual

The session exits immediately, and the API returns only a generic operation failure.

Likely Fix Direction

Improve runtime/spawn error logging around agent process startup and determine whether the injected PATH affects Codex launch, hook startup, or zellij layout execution.

Issue 3: Zellij Test Path Length Failure On macOS

Running the repo validation with default settings failed in backend/internal/terminal:

npm run lint

Failure:

--- FAIL: TestSessionStreamsRealZellijPane
Create: zellij runtime: create session ao-term-it-TestSessionStreamsRealZellijPane: exit status 1: Error: the IPC socket path is too long (162 bytes, max 103)

Even with shorter temp roots:

TMPDIR=/tmp/ao-test-tmp ZELLIJ_SOCKET_DIR=/tmp/ao-zellij-test npm run lint

it still failed:

IPC socket path is too long (130 bytes, max 103)
/tmp/ao-test-tmp/ao-term-it-TestSessionStreamsRealZellijPane-socket/contract_version_1/ao-term-it-TestSessionStreamsRealZellijPane

The test constructs a long session/socket name from t.Name(), and zellij repeats the session name in its IPC path.
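
The 130-byte figure can be reproduced from the path in the error output, which has the form <socket dir>/contract_version_1/<session name>. The sketch below assumes that layout (inferred from this error message alone, not from zellij documentation) and ASCII-only names, so character count equals byte count. zellij_ipc_path_len is a hypothetical helper name.

```shell
# Length of the IPC socket path zellij would use, assuming the layout
# <socket_dir>/contract_version_1/<session> seen in the error output.
zellij_ipc_path_len() {
  socket_dir=$1
  session=$2
  path="$socket_dir/contract_version_1/$session"
  # ${#path} counts characters, which equals bytes for ASCII-only paths.
  printf '%s\n' "${#path}"
}

session="ao-term-it-TestSessionStreamsRealZellijPane"
zellij_ipc_path_len "/tmp/ao-test-tmp/${session}-socket" "$session"   # prints 130
```

Because the 43-byte session name appears both in the socket directory and as the final path component, every byte trimmed from the name saves two bytes of path.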

Expected

The zellij integration test should use a short socket/session name that fits zellij's IPC path limit on macOS.

Actual

The test fails before exercising terminal behavior due to path length.

Likely Fix Direction

Shorten the generated test session name/socket directory in backend/internal/terminal/session_integration_test.go, or skip with a clear message when the computed zellij socket path would exceed the platform limit.
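
A minimal sketch of the shortening idea, written in shell to show the shape rather than the Go change itself: derive a short, deterministic session name from the long test name using a checksum. short_session_name and the ao-it- prefix are hypothetical; the real fix would do the equivalent inside the Go test.

```shell
# Derive a short, stable session name from a long test name.
# cksum (POSIX) prints "<crc> <byte count>" for stdin; only the CRC is kept.
short_session_name() {
  crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  # A CRC-32 has at most 10 decimal digits, so the name is at most 16 bytes.
  printf 'ao-it-%s\n' "$crc"
}

short_session_name "TestSessionStreamsRealZellijPane"
```

If the test also uses a short socket directory, a name of at most 16 bytes keeps the full path well under the 103-byte limit. Checksum collisions between test names are possible in principle but unlikely within a small test suite.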

Validation Results

With zellij hidden from PATH so that the zellij integration tests are skipped, backend tests and golangci-lint passed after cleaning the stale golangci-lint cache:

go run github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.12.2 cache clean
PATH=/tmp/ao-no-zellij-bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin npm run lint

Observed:

ok   github.com/aoagents/agent-orchestrator/backend/internal/terminal
0 issues.

Frontend typecheck initially failed because frontend/node_modules was missing:

Cannot find module 'electron'
Cannot find name 'process'

After installing dependencies:

cd frontend
npm ci
cd ..
npm run frontend:typecheck

it passed:

tsc --noEmit

Note: npm ci reported one high-severity npm audit issue.

Cleanup Performed

/tmp/ao stop
rm -rf /tmp/ao-worker-orch-data /tmp/ao-worker-orch-running.json /tmp/ao-worker-orch.log
ZELLIJ_SOCKET_DIR=/tmp/ao-zellij-501 zellij delete-session --force testao-1
ZELLIJ_SOCKET_DIR=/tmp/ao-zellij-501 zellij delete-session --force testao-2

The aa-47 worktree remains and is clean.

Labels: bug, needs-triage
