On cloud providers (daytona/hetzner/vercel/e2b) the bare `agentbox
claude`/`codex`/`opencode` commands and `agentbox fork` route through
`cloudAgentCreate`, which on `--no-attach` returned before the agent's
tmux session was ever created. The cloud session is created lazily by the
attach step, so skipping attach skipped the agent entirely — the box came
up with no agent running, contradicting the documented behavior ("create
the box and start the agent session, but do not attach"). Docker was
unaffected: it creates the session before the attach check.
Fix `cloudAgentCreate` to call `cloudAgentStartDetached` in the
`attach === false` branch — the same helper the `-i` queue worker uses,
which starts a detached tmux session and verifies it stayed up (fail-loud
on immediate exit / credential rejection). The `<agent> start` subcommand
cloud branches did the same lazy no-op (printing "started lazily on
attach"); they now resolve args first, then start the detached session in
background mode. Cloud now matches docker on every path.
Verified end-to-end: docker `--no-attach` still starts the session
(regression guard), and e2b `--no-attach` now brings up a live, logged-in
claude tmux session where before it created an empty box.
Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
cloudAgentStartDetached launched the agent with raw extraArgs only, so a background `agentbox <agent> start <box> --no-attach` (and idle-resumed creates) started a fresh agent instead of resuming the box's recorded claude/codex session — the interactive cloudAgentAttach path applies agentResumeArgs when args are empty, but the detached path did not. Apply the same resume-args resolution in cloudAgentStartDetached so the detached path is symmetric with attach. The `-i` queue path always seeds a prompt (non-empty extraArgs), so this no-ops there. Found by Cursor Bugbot on PR #116. Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
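The symmetry described above can be sketched as a small pure function. This is a minimal illustration, not the project's code: `resolveDetachedArgs` and the `ResumeLookup` callback are hypothetical names standing in for whatever `cloudAgentStartDetached` does around `agentResumeArgs`.

```typescript
// Hypothetical lookup: returns the box's recorded resume arguments, or null if none.
type ResumeLookup = () => string[] | null;

// Resolve the agent arguments for a detached start.
// Caller-supplied arguments always win; resume args apply only when none were given,
// which is why a queue worker that seeds a prompt is unaffected.
function resolveDetachedArgs(extraArgs: string[], lookupResume: ResumeLookup): string[] {
  if (extraArgs.length > 0) return extraArgs;
  const resume = lookupResume();
  return resume !== null ? resume : extraArgs;
}
```

The lookup is only consulted when `extraArgs` is empty, so the non-empty path pays no cost for it.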
…te_*.sqlite
We stopped seeding Codex's state_*.sqlite index (commit 2eaf2b4), so Codex now creates it at startup instead of receiving it pre-uploaded. The create failed with a permission error because the directory wasn't owned by vscode (the user the agent runs as). Two distinct ownership defects:
1. The agent home dirs (~/.codex, ~/.claude, ~/.local/share/opencode) were not reliably vscode-owned in cloud templates (E2B's base image ships a `node` user; the root `npm install -g @openai/codex` bake step left ~/.codex as node:node). This breaks even a plain `agentbox codex` start. Fixed with a cheap, idempotent create-time chown (ensureAgentHomeDirsOwned), with no re-bake.
2. The upload primitives only chowned the final landed path, not the parent directory chain they mkdir -p'd as root. Session-teleport lands a rollout at ~/.codex/sessions/YYYY/MM/DD/, leaving that chain root-owned so Codex can't write a new rollout / its sqlite index. Mirror the carry.ts parent-walk fix in both upload primitives (cloud-cp.ts + docker box-cp.ts), gated on the dest being under /home/vscode.
Chowns are name/id-derived (vscode / id -un), not hardcoded 1000, since the vscode uid varies per provider (docker/hetzner=1000, vercel=1001, e2b=1002).
Claude-Session: https://claude.ai/code/session_0152GmbNW3e7QpXNkQFd3MB2
Bugbot: when an upload's resolved finalPath was exactly /home/vscode, the `=== BOX_HOME` branch of the gate let the parent walk run with dirname=/home, reassigning /home itself to the agent user. Gate strictly on `startsWith(BOX_HOME + '/')` (a trailing segment), matching carry.ts. Applies to both cloud-cp.ts and docker box-cp.ts; adds a regression test. Claude-Session: https://claude.ai/code/session_0152GmbNW3e7QpXNkQFd3MB2
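A minimal sketch of the strict gate, assuming normalized absolute paths. `parentDirsToChown` is a hypothetical helper name, and whether the real parent walk also touches `/home/vscode` itself is not stated in the commit; this sketch stops below it.

```typescript
const BOX_HOME = '/home/vscode';

// Return the parent directories (deepest first) that a parent-walk chown may touch.
// The gate requires a trailing segment under BOX_HOME, so a finalPath that is exactly
// BOX_HOME (whose dirname is /home) never enters the walk.
function parentDirsToChown(finalPath: string): string[] {
  const prefix = BOX_HOME + '/';
  if (!finalPath.startsWith(prefix)) return [];
  const dirs: string[] = [];
  let dir = finalPath.slice(0, finalPath.lastIndexOf('/'));
  while (dir.startsWith(prefix)) {
    dirs.push(dir);
    dir = dir.slice(0, dir.lastIndexOf('/'));
  }
  return dirs;
}
```

Because the loop condition reuses the same prefix test as the gate, the walk can never climb to `/home` or above.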
…SH (#118)
* feat(cli): add `agentbox shell --ssh-config` for Codex / Claude desktop SSH
Write a `~/.ssh/config` alias on demand so external apps (the Codex app, Claude desktop) can connect to a box over plain SSH, and surface the identity-file path + a Codex deep link.
- New `agentbox shell <box> --ssh-config` (+ `--json`): brings the box online, writes the alias, prints alias/host/user/identity + the `codex://settings/connections/ssh/add?name=<alias>` link and Claude desktop instructions. Gated to providers with a persistent per-box key (Hetzner); Docker/Daytona/Vercel/E2B exit cleanly without writing.
- Extract the shared bring-online → buildAttach → parseSshTarget → write alias flow into `cloud-ssh.ts` (`resolveCloudSshTarget` / `ensureCloudSshAlias`); reuse it from `code` and `open`.
- Alias is now the box name (clean `ssh <box>` + Codex `name=` param); add `readAgentboxSshAlias` and surface `ssh alias` / `ssh identity` in cloud `inspect`.
- Document the flow in the agentbox-info and fork skills.
Claude-Session: https://claude.ai/code/session_011VoAz7mUaUGh6dKAvr7kAP
* fix(cli): address Cursor Bugbot findings on SSH-config
- Check SSH support (`buildAttach`) before any lifecycle action so an unsupported box (e.g. a stopped Docker box) errors without being started; add a `bringOnline` option to skip a redundant lifecycle pass.
- `agentbox code` now passes `bringOnline: false` (it already brings the box online + waits); this removes the duplicate resume/start.
- Migrate away legacy `agentbox-cloud-<box>` blocks on write/remove so the box-name rename doesn't leave stale Host entries behind.
- Warn (don't fail) when `~/.ssh/config` already has a user-authored `Host <box>` that could shadow agentbox's entry.
- Tests for legacy-block migration and conflict detection.
Claude-Session: https://claude.ai/code/session_011VoAz7mUaUGh6dKAvr7kAP
…erride.md
The box "system prompt" baked at /etc/claude-code/CLAUDE.md (sandbox facts: DinD, per-box worktree, push/PR/cp via the host relay, identity in /etc/agentbox/box.env) previously only reached Claude. Codex got none of it.
Codex loads a global personal-instructions file from CODEX_HOME, first-match of ~/.codex/AGENTS.override.md then ~/.codex/AGENTS.md (no concat, no @import). At create time we now regenerate ~/.codex/AGENTS.override.md = sentinel + box facts (read fresh) + the user's own AGENTS.md / authored override folded in beneath, so the in-box Codex agent reads the same facts. A line-1 sentinel makes it idempotent and preserves user content (the host ~/.codex is re-synced before each seed, restoring the source). No-op when the facts file is absent.
One shared generator (buildCodexAgentsOverrideScript) drives both paths:
- docker: seedCodexAgentsOverride() seeds the codex-config volume, called after seedCodexHooks() in create.ts (post host rsync).
- cloud (daytona/hetzner/vercel/e2b): ensureCodexAgentsOverride() runs the same script in-box via backend.exec, wired into cloud-provider.ts.
Verified: codex debug prompt-input shows the box facts in Codex's model-visible prompt; compose + authored-override + facts-only + no-op cases all check out.
Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
Cloud backends signal script failure via a non-zero CloudExecResult.exitCode
rather than throwing, so the prior try/await/log('seeded') reported success even
when the `set -e` script aborted (perms, missing paths) — a box could boot
without box facts while create logs looked healthy. Read the exitCode and log
the failure (still best-effort, never fails create). Found by Cursor Bugbot.
Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
…ritten The shared generator exits 0 in the no-op case (box-facts file absent) without writing the override, so a bare exitCode===0 check logged a false "seeded" on the cloud path. Move the success signal into the script as a stdout marker (CODEX_OVERRIDE_WROTE_MARKER) printed only after the write; both docker and cloud now key their "seeded" log off the marker, and cloud logs an explicit "skipped: box-facts file absent" otherwise. Found by Cursor Bugbot. Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
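The two fixes above (read the exit code, then key success off a stdout marker) combine into a three-way classification. The sketch below is illustrative only: the marker string value and the `classifySeed` and `ExecResult` names are assumptions, not the project's definitions.

```typescript
// Hypothetical marker value; the real CODEX_OVERRIDE_WROTE_MARKER string is not shown in the commit.
const CODEX_OVERRIDE_WROTE_MARKER = '__CODEX_OVERRIDE_WROTE__';

// Minimal stand-in for the shape a cloud exec returns (assumed: exit code plus captured stdout).
interface ExecResult {
  exitCode: number;
  stdout: string;
}

type SeedOutcome = 'seeded' | 'skipped' | 'failed';

// A non-zero exit means the `set -e` script aborted. A zero exit without the marker means
// the script took its no-op branch (box-facts file absent) and wrote nothing.
function classifySeed(result: ExecResult): SeedOutcome {
  if (result.exitCode !== 0) return 'failed';
  return result.stdout.includes(CODEX_OVERRIDE_WROTE_MARKER) ? 'seeded' : 'skipped';
}
```

Printing the marker only after the write keeps the success signal inside the script, so callers never have to infer it from the exit code alone.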
…lhost works
When the host Portless proxy runs in TLS mode, the symmetric <box>.localhost URL is served inside the box by a self-signed CA the box doesn't trust:
- hetzner: the in-VPS mirror's own CA at /root/.portless/ca.pem
- docker: the host CA bind-mounted at /home/vscode/.portless/ca.pem
portless proxy start only trusts the CA in the Linux system store, not the box user's NSS db, so the in-box VNC Chromium window and Playwright (via Codex) rejected the cert with an HTTPS error.
Fix by trusting the CA across every in-box client:
- New baked helper agentbox-portless-trust: installs a CA into the system store (update-ca-certificates) + the box user's NSS db (certutil), idempotent and best-effort, prints the system CA path for NODE_EXTRA_CA_CERTS.
- Bake libnss3-tools (certutil) into the hetzner snapshot + docker base image.
- hetzner startInBoxPortless: when tls, run the helper on /root/.portless/ca.pem and export NODE_EXTRA_CA_CERTS via /etc/profile.d.
- docker create: when the resolved portless URL is https, run the helper on the bind-mounted host CA + drop the same profile.d export.
No-TLS host proxies (the --no-tls -p 1355 default) serve plain http and skip this entirely. Requires a snapshot re-bake / docker image rebuild to pick up libnss3-tools + the helper.
Claude-Session: https://claude.ai/code/session_0152GmbNW3e7QpXNkQFd3MB2
…ugin
Installing agentbox left the Codex plugin (the `agentbox`/`agentbox-info` skills) a manual chore: `codex plugin marketplace add` + `codex plugin add`, then a toggle in the Codex app. Automate all three.
New `install-codex.ts` (mirrors install-herdr.ts), gated on Codex being present (~/.codex + `codex` on PATH), best-effort so it never aborts `agentbox install`:
- `codex plugin marketplace add madarco/agentbox` + `codex plugin add agentbox@agentbox`
- enable by default: append `[plugins."agentbox@agentbox"] enabled = true` to ~/.codex/config.toml when no such table exists. Codex has no `plugin enable` CLI; this is the same key the TUI toggle writes.
Robustness:
- Skips the network re-add when already installed (codex plugin add re-enables a deliberately-disabled plugin), so a user's explicit disable is respected; --force bypasses to re-enable.
- The config write only appends when the key is absent; it never duplicates the table (a second table is a TOML parse error) and respects enabled = false.
Wired into the `agentbox install` wizard (runs when Codex is detected) and a standalone `agentbox install codex`. Adds smol-toml to apps/cli (read-only parse for the presence check). Unit tests cover the upsert/respect-disable logic; e2e verified against codex 0.142 (fresh -> enabled, disabled re-run -> stays off, --force -> re-enabled). Docs: cli.mdx + plugin README.
Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
Addresses the CI failure and Cursor Bugbot findings on the enable step:
- commands.test.ts asserted the exact `install` subcommand list; add `codex`.
- The managed-block approach conflicted with the `[plugins."agentbox@agentbox"]` table Codex itself writes (`plugin add` / TUI toggle): a second table is a TOML parse error, and the strip+regenerate path could (a) leave a stale duplicate unwritten and (b) override a disable set inside the block. Replace it: parse config.toml read-only and
  - append a plain `[plugins."agentbox@agentbox"] enabled = true` table only when the key is ABSENT (Codex usually writes it itself on `plugin add`);
  - respect a present value (enabled defaults true; only explicit `false` is off);
  - with --force, flip a disabled entry to true via a targeted in-place line edit that preserves the rest of the file (comments/order/formatting).
  Always write when the text changed (fixes the "cleanup not written" gap).
E2E re-verified vs codex 0.142: fresh -> enabled; disabled re-run -> stays off with comments preserved; --force -> re-enabled, file still single-table valid TOML.
Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
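The enable rules above reduce to a small decision table. The following is a sketch under assumptions: `decideEnableAction` and its types are hypothetical, and the input is taken to be the already-parsed `plugins."agentbox@agentbox"` table (or `undefined` when the key is absent), however the real code obtains it.

```typescript
// Parsed state of the plugin table; undefined means the key is absent from config.toml.
type PluginEntry = { enabled?: boolean } | undefined;

type EnableAction = 'append-table' | 'leave' | 'flip-to-true';

// Decide what to do with ~/.codex/config.toml without ever adding a second table.
function decideEnableAction(entry: PluginEntry, force: boolean): EnableAction {
  if (entry === undefined) return 'append-table';
  // `enabled` defaults to true; only an explicit false counts as disabled.
  const enabled = entry.enabled !== false;
  if (enabled) return 'leave';
  // A user's explicit disable is respected unless --force was passed.
  return force ? 'flip-to-true' : 'leave';
}
```

Keeping the decision separate from the file edit makes the respect-disable behaviour easy to unit test without touching a real config file.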
From a source checkout, `agentbox install codex` now points the Codex marketplace at the local repo (`source_type = "local"`) instead of the `madarco/agentbox` GitHub slug, re-syncs the bundle skill copy, stages the plugin from the working tree, then symlinks the staged per-skill SKILL.md back to the repo so skill edits go live on the next Codex restart. A published npm install is unaffected (no `plugins/`/`.agents/` ships, so it always uses GitHub).
- New `apps/cli/src/lib/source-checkout.ts`: shared `isSourceCheckout`, `resolveHostSkillsDir`, and `resolveDevRepoRoot` (repo root only in a checkout carrying the Codex sources, else null). `install.ts` now imports these.
- `--no-dev` forces the published GitHub path even inside a checkout; a source-type conflict (local <-> git) is handled by remove-then-add.
- Symlink the skill *files* only: a whole-dir symlink makes Codex report the plugin "not installed".
- Tests for the source-checkout helpers and `marketplaceSource`; docs in docs/development.md.
Claude-Session: https://claude.ai/code/session_01An9tT8HqjQoGKKVWuYg4bb
…mpts Tell the in-box agent that `agentbox.yaml` services/containers start automatically (check with `agentbox-ctl status`) and that web services are reachable at https://<AGENTBOX_BOX_HOST> from both box and host. Wording is tailored per provider (local proxy for docker, cloud URL for the cloud ones). Claude-Session: https://claude.ai/code/session_01An9tT8HqjQoGKKVWuYg4bb
- Attempt the dev skill symlink whenever a staged dir exists, not only when `codex plugin add` exits 0: a non-zero exit is treated as "already installed" and continues, leaving a staged cache we still want live-symlinked.
- Never symlink into a non-existent manifest-version path: if the manifest version dir is absent and the cache doesn't hold exactly one child, bail with "staged plugin dir not found" instead of mkdir-ing an orphan tree Codex ignores.
Claude-Session: https://claude.ai/code/session_01An9tT8HqjQoGKKVWuYg4bb
…r on failure The Claude native installer (`curl https://claude.ai/install.sh | bash`) can get an intermittent HTTP 403 from Cloudflare on cloud-datacenter egress IPs (Hetzner among them) under load. The bare `curl | bash` masked it (pipeline exit = bash's 0), so a 403 silently baked a snapshot with no `claude` — boxes from it had no agent, so the in-box tmux session died instantly and `attach` crash-looped on "no server running on /tmp/tmux-1000/default". Replace the masked install with a `retry_backoff` helper: retry the native installer 3x with 60s then 240s backoff (~5 min), keep `set -o pipefail`, and fold `command -v claude` into the retried command so a "succeeded but absent" result also retries. If all attempts fail, abort the bake (exit 71) rather than ship a claude-less snapshot. Applied to hetzner / vercel / e2b bake scripts and the docker Dockerfile (sh/dash plain-loop variant). No npm fallback: `npm install -g @anthropic-ai/claude-code` lacks native-only features and lands at /usr/bin/claude, mismatching the host-seeded installMethod=native and tripping Claude Code's "missing or broken" doctor warning. `prepareHetzner` now special-cases exit 71 with an actionable message (the generic one showed empty stderr because the install runs `bash -x ... 2>&1 | tee`, merging stderr into stdout). Known gap (deferred): the 403 can outlast the ~5-min retry window, so prepare can still fail and need a manual re-run. The validated reliable fix (host-proxies the native binary download, places it at ~/.local/bin/claude) is documented in docs/hertzner_backlog.md as a follow-up. Claude-Session: https://claude.ai/code/session_019m5WHxP4vmsoXaHUhQdY9e
fix(hetzner): retry Claude native installer with backoff + clear error
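The retry schedule described above (three attempts, 60s then 240s between them, abort with exit 71 when all fail) can be modelled as follows. The real helper is a shell function in the bake scripts; this is a synchronous TypeScript sketch with injected `attempt` and `sleep` callbacks, and both function names are hypothetical.

```typescript
// Exit code the bake uses when every install attempt fails (from the commit message).
const BAKE_ABORT_EXIT = 71;

// Run `attempt` once, then once more after each delay, stopping at the first success.
// `attempt` should fold in the "binary is actually present" check so that a
// "succeeded but absent" result also counts as a failure and retries.
function retryBackoff(
  attempt: () => boolean,
  delaysSec: number[],
  sleep: (seconds: number) => void,
): boolean {
  if (attempt()) return true;
  for (const delay of delaysSec) {
    sleep(delay);
    if (attempt()) return true;
  }
  return false;
}

// Map the final result to the bake's exit code: never ship a snapshot without the agent.
function bakeExitCode(installed: boolean): number {
  return installed ? 0 : BAKE_ABORT_EXIT;
}
```

Calling `retryBackoff(tryInstall, [60, 240], sleepFn)` gives the three-attempt schedule; injecting `sleep` keeps the schedule testable without waiting five minutes.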
Adding a box as a remote host (Codex app, VS Code Remote-SSH, plain `ssh <box>`) opened the session in /home/vscode instead of the project at /workspace. SSH config has no start-directory directive, and the only client-side option (RemoteCommand) would break scp/sftp and VS Code's remote bootstrap on the same shared alias. So fix it server-side: make interactive login shells cd into /workspace via the existing /etc/profile.d/agentbox.sh shim every box installs. Guarded to interactive shells only (scp/sftp and `ssh box <cmd>` untouched) and only when still at $HOME (never overrides a caller-chosen dir, e.g. agentbox's own tmux `-c /workspace`). Applied to all provider shims for consistency: hetzner, vercel, e2b, and the canonical docker Dockerfile.box. Claude-Session: https://claude.ai/code/session_01An9tT8HqjQoGKKVWuYg4bb
feat(box): land interactive SSH/login shells in /workspace
The host `~/.codex` is ~1.1 GB and was being synced into boxes almost whole: ~485 MB of macOS aarch64 standalone release binaries (`packages/`), a ~238 MB plugin app-server runtime (`plugins/.plugin-appserver`), the macOS `Codex Computer Use.app` bundle (`computer-use/`), host session archives, and regenerable caches (`.tmp` ~213 MB, `tmp`, `cache`, `vendor_imports`, `sqlite`, `models_cache.json`). None of it is usable in a Linux box — the in-box codex is npm-installed and rebuilds these caches on demand. Exclude all of it from both codex staging paths: - `CODEX_RSYNC_EXCLUDES` (host-stage.ts) — the cloud bake path (hetzner/vercel/ e2b/daytona `stageCodexStaticForUpload`). Dry-run: staged tarball 820 MB -> 482 KB. - the docker `agentbox-codex-config` volume rsync (codex.ts), plus its `rm -rf` purge so existing shared volumes get cleaned on the next sync. Verified live: a fresh docker box's `~/.codex` dropped 1.5 GB -> 59 MB and `codex exec` still returns a real turn. Config / auth / skills / prompts / rules / memories / plugins are still synced, so codex keeps working — just without the host-only ballast. Claude-Session: https://claude.ai/code/session_019m5WHxP4vmsoXaHUhQdY9e
fix(codex): drop heavy host-only artifacts from codex config staging
Cloud boxes (hetzner/daytona/vercel/e2b) have no global env primitive, so the in-box agent — launched via a tmux login shell — only saw the relay token through /etc/agentbox/box.env. Commit b9e4ebf made the ctl daemon overwrite box.env without the token (correctly, to keep secrets out of a 0644 file), which severed the agent's only channel: `agentbox-ctl git push` failed with "no relay configured". Persist the per-box relay URL + token to a 0600 /run/agentbox/relay.env (tmpfs, never snapshotted) written by the daemon once it validates its own token, and have agentbox-ctl's relay clients (postRpc, RelayClient) fall back to it when env is absent. agentbox-ctl is the only in-box relay consumer, so a single on-demand file read fixes every path — the agent and the host-driven `agentbox git push` on all backends — without spraying the token into every login shell's env. The bridge token stays daemon-only. Also stop hetzner cloud-init from copying the relay/bridge tokens into the 0644 box.env (they now travel via relay.env / the daemon process env). Claude-Session: https://claude.ai/code/session_01SAturA5Fs2XHzzondT6DDv
Correct environment.mdx's docker-only box.env claim, and document in host-relay.md / cloud-providers.md / the hetzner+vercel backlogs that the cloud relay token now reaches agentbox-ctl via a 0600 /run/agentbox/relay.env (read by resolveRelayEnv), not login-shell env — guarding the b9e4ebf regression. Bridge token stays daemon-only. Claude-Session: https://claude.ai/code/session_01SAturA5Fs2XHzzondT6DDv
fix(cloud): restore relay token for in-box agent via 0600 relay.env
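The env-then-file fallback described above might look like the sketch below. It is not the project's `resolveRelayEnv`: the variable names `AGENTBOX_RELAY_URL` and `AGENTBOX_RELAY_TOKEN`, the `KEY=VALUE` file format, and the injected file reader are all assumptions made for illustration.

```typescript
interface RelayEnv {
  url: string;
  token: string;
}

type EnvMap = { [key: string]: string | undefined };

// Parse simple KEY=VALUE lines (assumed format of /run/agentbox/relay.env).
function parseEnvFile(text: string): EnvMap {
  const out: EnvMap = {};
  for (const line of text.split('\n')) {
    const eq = line.indexOf('=');
    if (eq > 0) out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}

// Hypothetical variable names; both must be present for a usable relay config.
function pickRelay(map: EnvMap): RelayEnv | null {
  const url = map['AGENTBOX_RELAY_URL'];
  const token = map['AGENTBOX_RELAY_TOKEN'];
  return url && token ? { url, token } : null;
}

// Prefer the process env; fall back to the 0600 file only when env is incomplete.
// `readRelayFile` returns the file contents, or null when the file is missing/unreadable.
function resolveRelayEnvSketch(env: EnvMap, readRelayFile: () => string | null): RelayEnv | null {
  const fromEnv = pickRelay(env);
  if (fromEnv !== null) return fromEnv;
  const text = readRelayFile();
  return text === null ? null : pickRelay(parseEnvFile(text));
}
```

Reading the file on demand, rather than exporting the token into every login shell, keeps the secret confined to the one consumer that needs it.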
A box's host-side state (the relay's in-memory registry + CloudBoxPoller,
the Hetzner SSH ControlMaster + port forwards, the host Portless aliases,
the detached agent tmux session) is separate from the box and is lost on a
host reboot / relay restart / new CLI process while the sandbox keeps
running. `start`/`unpause` only fix this by power-cycling the box and can't
touch a box missing from local state at all.
`agentbox recover [box]`:
- ensures the host relay is up and rehydrates every box into it,
- calls the new `Provider.reconnect(box)` — the no-power-cycle sibling of
`start`: cloud re-runs `reEnsureCloudBox` (refresh preview URLs, re-open
the Hetzner tunnel, re-register Portless + the relay poller, relaunch
in-box daemons) without `backend.start`; docker re-runs the idempotent
`startBox`,
- relaunches the agent the box was running (resuming, or starting
`box.lastAgent` fresh) and attaches.
Adds `BoxRecord.lastAgent` (claude/codex/opencode), written on every agent
launch (foreground + queued via `recordLastAgent`) — durable, unlike the
in-box session pointers cleared on stop, so recover knows which agent to
bring back.
`recover --provider <cloud> --adopt [ref]` rebuilds local state for a
sandbox missing from this host (from `backend.list()` + the agentbox.name
tag), minting fresh relay/bridge tokens that reach the in-box agent when
reconnect relaunches the ctl daemon (it writes /run/agentbox/relay.env).
Hetzner adoption needs the box's per-host SSH key; a box created elsewhere
can't be controlled and recover says so.
Works across all five providers. Docs updated (cli.mdx, state.md,
host-relay.md, cloud-providers.md). Unit tests for recordLastAgent and the
restoreAgentSessions launch-fresh path; docker reconnect + lastAgent +
fresh-launch verified live (StartedAt unchanged → no power-cycle).
Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup
…ession Bugbot: `recover` passed the target agent to `restoreAgentSessions` only to gate the fresh-launch pass, while pass 1 still resumed every resumable agent that had an in-box pointer — so recover could resurrect an unrelated Claude/ Codex session (possibly from a stale pointer) alongside the intended agent. Rework `restoreAgentSessions`: `restoreOnly` (was `launchFresh`) now scopes the whole restore to that one agent — resume it if there's a live/resumable session, else start it fresh — and touches nothing else. `start`/`unpause` (no `restoreOnly`) keep the resume-every-running-agent semantics. A box created before `lastAgent` existed passes `undefined` and so falls back to resume-all. Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup
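The reworked scoping can be pictured as a planning step that runs before any agent is launched. This sketch is an assumption-laden illustration: `planRestore` and its types are hypothetical, and `resumable` stands for whichever agents have a live or resumable in-box session.

```typescript
type Agent = 'claude' | 'codex' | 'opencode';

interface RestoreStep {
  agent: Agent;
  mode: 'resume' | 'fresh';
}

// With `restoreOnly` set (the recover path), the plan covers exactly that one agent:
// resume it if it has a resumable session, otherwise start it fresh.
// Without it (start/unpause, or a box created before lastAgent existed), resume everything resumable.
function planRestore(resumable: Agent[], restoreOnly?: Agent): RestoreStep[] {
  if (restoreOnly === undefined) {
    return resumable.map((agent): RestoreStep => ({ agent, mode: 'resume' }));
  }
  const canResume = resumable.some((agent) => agent === restoreOnly);
  return [{ agent: restoreOnly, mode: canResume ? 'resume' : 'fresh' }];
}
```

Because the scoped branch returns a single-element plan, a stale pointer for another agent can no longer cause an unrelated session to be resurrected.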
… start
Bugbot: docker `reconnect` always delegated to `startBox` (`docker start`),
which errors on a paused container ("cannot start a paused container"). So
`agentbox recover` on a paused docker box left it frozen while reporting the
relay/portless recovery as success — exec, agent restore, and attach kept
failing until the user ran `unpause`.
Probe state first: paused -> `unpauseBox` (resumes the still-frozen
ctl/dockerd/vnc; the portless alias survives a pause); running/stopped ->
`startBox` (idempotent, relaunches dead daemons + re-aliases portless);
missing/destroyed -> clear error. Mirrors the cloud provider's state-routed
reconnect.
Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup
feat(recover): add `agentbox recover` to reconnect a box without power-cycling
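The state-routed docker reconnect can be summarised as a single switch. The state names and the returned action labels below mirror the commit text, but `reconnectAction` itself is a hypothetical helper, not the provider's actual function.

```typescript
type ContainerState = 'paused' | 'running' | 'stopped' | 'missing';

type ReconnectAction = 'unpauseBox' | 'startBox';

// Route a reconnect by the container's probed state instead of always calling startBox,
// which fails on a paused container.
function reconnectAction(state: ContainerState): ReconnectAction {
  switch (state) {
    case 'paused':
      return 'unpauseBox'; // resumes the still-frozen in-box daemons
    case 'running':
    case 'stopped':
      return 'startBox'; // idempotent: relaunches dead daemons, re-aliases portless
    case 'missing':
      throw new Error('box container is missing or destroyed; cannot reconnect');
  }
}
```

Throwing on `missing` gives `recover` a clear error to report instead of claiming success on a box that cannot be reached.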
…-up) `launchCloudCtlDaemon` spawned `agentbox-ctl daemon` unconditionally on every create/start/unpause/resume/recover (via `reEnsureCloudBox`). The daemon has no singleton guard — when :8788 is already bound it catches EADDRINUSE and keeps running — so each relaunch leaked another idle daemon (a live Hetzner box had 9, only one serving the relay). `recover`, which targets already-running boxes, hit this every time. Guard the spawn with a liveness probe: a healthy ctl daemon already serving the box relay's /healthz (port parsed from relayUrl, default 8788) means the launch is skipped — it already has the right env (stable across a host-only reconnect), and only a real sandbox restart kills it (probe fails -> relaunch). Probe via node (guaranteed present; curl may not be). Same guard for docker's `launchCtlDaemon` (`pgrep -f 'agentbox-ctl daemon$'`, end-anchored so it can't match the `sh -c` wrapper): docker doesn't pile up today because the daemon dies with the container, but the new docker `reconnect` runs `startBox` against a possibly-running container and would otherwise double-launch. Verified live: `recover` twice on the Hetzner box left the daemon count flat and :8788 still healthy on the original pid. Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup
fix(ctl): make the in-box ctl-daemon launch idempotent (stop the pile-up)
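A sketch of the guard's two inputs: the port to probe and the launch decision. The function names and the `127.0.0.1` probe host are assumptions; only the `/healthz` path and the 8788 default come from the commit message. The regex handles plain host:port URLs and does not attempt IPv6 literals.

```typescript
const DEFAULT_RELAY_PORT = 8788;

// Extract the port from a relay URL such as http://host:9000/rpc, falling back to the default
// when the URL is absent, has no explicit port, or the port is out of range.
function relayPort(relayUrl: string | undefined): number {
  if (!relayUrl) return DEFAULT_RELAY_PORT;
  const match = /^[a-z][a-z0-9+.-]*:\/\/[^/:]+:(\d+)/i.exec(relayUrl);
  if (match === null) return DEFAULT_RELAY_PORT;
  const port = Number(match[1]);
  return port >= 1 && port <= 65535 ? port : DEFAULT_RELAY_PORT;
}

// URL the liveness probe would hit from inside the box (host is an assumption).
function healthzUrl(relayUrl: string | undefined): string {
  return `http://127.0.0.1:${relayPort(relayUrl)}/healthz`;
}

// Spawn only when the probe failed; a healthy daemon already holds the right env.
function shouldLaunchDaemon(probeHealthy: boolean): boolean {
  return !probeHealthy;
}
```

Since the daemon tolerates EADDRINUSE instead of exiting, the guard has to live in the launcher: checking health first is what keeps repeated `recover` runs from stacking idle processes.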
A Hetzner box's firewall locks SSH to the host's egress IP at create time and is never re-synced. When the host IP changes (laptop moves networks), every comms op fails with an opaque `ssh ControlMaster failed … Operation timed out` and the user has to know to run `agentbox hetzner firewall sync`.
Two fixes, both gated to the connection-failure path so the happy path never pays the egress-detect cost, and the firewall is re-synced ONLY when the IP actually changed:
1. Hint (read-only): wrap `tunnels.open` in `ensureTunnel`, the one choke point all of exec/scp/forward/poller/attach funnel through. On a real mismatch it turns the opaque timeout into "firewall allows X but your egress is now Y — run `firewall sync`/`recover`". Safe on a checkpoint drop (box merely stopped, IP unchanged → no hint).
2. Auto-sync, scoped to connection ESTABLISHMENT only. New optional `repairReachability` on CloudBackend/Provider (Hetzner-only): re-syncs the firewall to the current egress, but only when it changed (else changed:false). A `withFirewallRepair` CLI helper retries the attempt once iff something changed, wired at the two establish sites: `recover` (provider.reconnect) and the INITIAL attach connect (`_cloud-attach` buildAttach). Deliberately NOT the mid-session reconnect closure: a checkpoint stops the box and drops the PTY, which must not be mistaken for an IP change.
`--no-firewall-sync` opts out on recover (shared/untrusted egress). A short-TTL egress cache avoids probe storms across retries / `recover --all`. `0.0.0.0/0` (explicit dynamic-IP opt-in) is never hinted or synced.
Verified live on a Hetzner box: locking the firewall to a bogus IP makes `shell` fail with the hint (no auto-repair), `recover` auto-syncs back + reconnects, and `--no-firewall-sync` leaves it locked. Unit tests cover firewallNeedsSync + the egress TTL cache.
Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup
…establishes
1. Stale egress cache could mask a real IP change: cut the cache TTL from 60s to 5s. It only exists to dedup a burst of failure-path probes (poller backoff, `recover --all`), not to remember the IP over time; a long TTL would hide the very IP change we're detecting.
2. The firewall self-heal wrapped only the final buildAttach, but the resume probe and the detached pre-start connect first; a firewall block there aborted the attach (or silently dropped the resumed session) before repair ran. Move the repair to a single up-front warm-up (`exec true`, Hetzner-only) that opens the tunnel + self-heals BEFORE any later establish touch, which then reuse the live master.
Verified live: a locked firewall is now auto-synced on `claude attach` before it connects.
Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup
…re path Bugbot (round 2): even a 5s-TTL cache could read a just-changed egress IP as "unchanged" in the firewall comparison and skip the heal — the exact mismatch this exists to catch. The cache only dedup'd failure-path probes, but the cloud poller already de-dupes its recover calls and `recover --all` is sequential, so a fresh `detectEgressIp` in `firewallEgressStatus` won't storm. Remove the cache entirely; correctness over a marginal probe dedup. Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup
feat(hetzner): self-heal the per-box firewall on a host egress-IP change
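The comparison at the heart of the self-heal can be sketched as a pure predicate. The commit names a `firewallNeedsSync` unit under test but does not show its signature, so the version below is a guess at its shape: it assumes the firewall rule is a single IPv4 CIDR and the egress is a bare IPv4 address. The hint wording follows the commit's paraphrase.

```typescript
// The explicit dynamic-IP opt-in: never hinted, never synced.
const OPEN_TO_ALL = '0.0.0.0/0';

// True only when the firewall is locked to a specific address that differs from the
// host's current egress IP. Assumes a /32 rule for a single IPv4 host.
function firewallNeedsSync(allowedCidr: string, egressIp: string): boolean {
  if (allowedCidr === OPEN_TO_ALL) return false;
  return allowedCidr !== `${egressIp}/32`;
}

// Build the actionable hint shown instead of an opaque SSH timeout, or null when
// there is no real mismatch (e.g. the box merely stopped and the IP is unchanged).
function mismatchHint(allowedCidr: string, egressIp: string): string | null {
  if (!firewallNeedsSync(allowedCidr, egressIp)) return null;
  return `firewall allows ${allowedCidr} but your egress is now ${egressIp}; run firewall sync or recover`;
}
```

Taking a freshly detected egress IP as input, rather than a cached one, matches the final round of the fix, where the cache was removed so a just-changed IP is never read as unchanged.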
… --host-only) Add a `--host-only` flag to `agentbox git push <box>` and in-box `agentbox-ctl git push` that makes the box's branch available in the host's *local* repo without pushing to any remote — nothing is published online. The destination branch defaults to the box's current branch name; `--as <branch>` overrides it and `--force` allows a non-fast-forward overwrite. `--host-only` is incompatible with `--remote` (exit 64). Because nothing leaves the host, the relay skips the push-confirm gate / host-initiated-token path entirely. Docker copies the box branch ref within the shared bind-mounted `.git/` via a self-fetch (`handleGitSaveToHost`); cloud reuses the push flow's git-bundle pull-back, stopping before the remote push (`runGitRpc` short-circuit), so all four cloud providers are covered. The in-box agent system prompt (custom-system-CLAUDE.md, all providers) and the docs (host-relay, features, web cli/sync-and-git) document the new mode. Claude-Session: https://claude.ai/code/session_01TmyXca2hNF9TtK6q9MAh1L
Making `--force` a known option on `push` (for `--host-only`) meant Commander consumed it for every push, but only the host-only path forwarded it — a normal `agentbox git push <box> --force` silently dropped `--force` (`params.force` is only honored on the host-only land path). Re-append `--force` to the forwarded git args on a remote push, in both the host CLI and in-box ctl, so the relay appends it to `git push <remote> <branch>`. The host CLI's predicted params hash stays in lockstep with ctl's normalized args tail. Add a pure unit test for ctl's buildParams covering remote --force re-forwarding vs host-only params. Verified live (docker): non-ff push rejected without --force (exit 1, remote unchanged), forced update succeeds with --force; host-only land still works. Claude-Session: https://claude.ai/code/session_01TmyXca2hNF9TtK6q9MAh1L
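The re-forwarding fix can be illustrated with a small parameter builder. This is not ctl's actual `buildParams`: the types, field names, and return shape below are hypothetical, chosen only to show the two paths diverging on how `--force` travels.

```typescript
interface PushFlags {
  hostOnly: boolean;
  force: boolean;
  as?: string;
  remote?: string;
}

type PushParams =
  | { kind: 'host-only'; branch: string | undefined; force: boolean }
  | { kind: 'remote'; gitArgs: string[] };

// The option parser consumes --force for every push, so the remote path must put it
// back onto the forwarded git args; the host-only path carries it as a structured param.
function buildPushParams(flags: PushFlags, passthrough: string[]): PushParams {
  if (flags.hostOnly) {
    if (flags.remote !== undefined) {
      throw new Error('--host-only is incompatible with --remote');
    }
    return { kind: 'host-only', branch: flags.as, force: flags.force };
  }
  const gitArgs = flags.force ? [...passthrough, '--force'] : [...passthrough];
  return { kind: 'remote', gitArgs };
}
```

Appending `--force` at the tail in one shared place is what lets the host CLI's predicted params hash stay aligned with ctl's normalized args.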
`agentbox cp` and the in-box `agentbox-ctl cp toHost|fromHost` took a single
source per call; copying several files meant several invocations (and, from
inside a box, several host approval prompts). Both now accept multiple sources
in one call — list files/dirs before the destination, which must then be a
directory. Wildcards are handled by the invoking shell (host shell for host-side
sources, box shell for in-box sources), so no glob expansion lives in the code
and there's no box->host shell-injection surface.
- Host CLI: `cp <paths...>` (variadic, arity-split). One arg keeps the
download-to-cwd back-compat; a single source keeps full docker-cp rename
semantics. >=2 sources require a directory dest. All sources must be on the
side opposite the dest, and box sources must name one box. Size guard runs
per source.
- Providers: `uploadPath`/`downloadPath` take `string[]`. Docker groups sources
by parent dir (one tar per group), hoists mkdir/parent-chain-chown, chowns
each landed entry. Cloud loops the single-source primitive serially.
- Wire: `cp.*` RPC carries `{sources[], dest}` with backward-tolerant
normalization of the legacy `{boxPath, hostPath}` shape; shared in cp-rpc.ts.
- Relay: the cloud `runCpRpc` now re-shells `agentbox cp` like the docker path
instead of calling the cloud primitives directly — so excludes and the size
guard are honored on every provider (the cloud path silently dropped them
before, while the consent prompt advertised them). Consent prompt lists all
sources.
- Docs/skills/system prompts updated; new unit tests for parseArgs, the cp-rpc
wire helpers, and cloud multi-source orchestration.
Verified live (docker): multi-source + wildcard upload, multi-source download,
dest-not-a-dir / box-to-box errors, single-source rename + download-to-cwd
back-compat, and the in-box ctl variadic parsing.
Claude-Session: https://claude.ai/code/session_01XvuW3YwvHzvCrmXMJyC33W
feat(cp): copy multiple files/dirs in one call
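The arity split for the variadic `cp <paths...>` form can be sketched as below. `splitCpArgs` is a hypothetical name, and the sketch covers only the positional split; the side checks (all sources opposite the dest, one box only) and the per-source size guard are left out.

```typescript
interface CpArgs {
  sources: string[];
  // null means "no explicit destination": the single-argument download-to-cwd form.
  dest: string | null;
  // With two or more sources the destination must be an existing directory.
  destMustBeDir: boolean;
}

// Split positional paths: the last path is the destination, everything before it is a source.
function splitCpArgs(paths: string[]): CpArgs {
  if (paths.length === 0) throw new Error('cp requires at least one path');
  if (paths.length === 1) {
    return { sources: [...paths], dest: null, destMustBeDir: false };
  }
  const sources = paths.slice(0, -1);
  const dest = paths[paths.length - 1] ?? null;
  return { sources, dest, destMustBeDir: sources.length >= 2 };
}
```

A single source keeps rename semantics because `destMustBeDir` stays false; wildcards never reach this function unexpanded, since the invoking shell has already turned them into separate paths.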
Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 1d72ceb. Configure here.
    const resume = await agentResumeArgs(provider, box, args.binary);
    if (resume) extraArgs = resume;
  }
  const command = buildCloudAttachInnerCommand(args.binary, extraArgs);
Hetzner firewall skipped detached start
Medium Severity
cloudAgentAttach warms the Hetzner SSH tunnel with withFirewallRepair before exec/buildAttach, but cloudAgentStartDetached—now used for cloud --no-attach, background creates, and queue workers—does not. After a host egress IP change, detached agent startup can fail while an interactive attach on the same box succeeds.
  }
  const alias = agentboxAliasFor(box.name);
  return { alias, host: target.host, user: target.user, identityFile: target.identityFile };
}
SSH config lacks firewall heal
Medium Severity
New resolveCloudSshTarget (used by agentbox shell --ssh-config and shared alias helpers) brings the box online and calls buildAttach without the Hetzner withFirewallRepair pass added to cloudAgentAttach. A stale firewall after an egress IP change can make --ssh-config fail even though attach self-heals.
Note
Medium Risk
Touches cloud reconnect, Hetzner firewall self-heal, SSH config alias semantics, and agent session restore—important for connectivity and git workflows but mostly additive CLI paths with tests on cp/ssh/restore.
Overview
This nightly batch extends the CLI around reconnection, file copy, git, and external-app SSH, plus Codex onboarding and small cloud attach fixes.
`agentbox recover` rehydrates the relay, calls `provider.reconnect` (no power-cycle when the sandbox is already up), restores the box's `lastAgent` via `restoreAgentSessions` (resume or fresh start, including OpenCode), optionally attaches, and supports `--all` and `--provider … --adopt` for sandboxes missing from local state. Hetzner connect failures can trigger a one-shot firewall sync through `withFirewallRepair` (establish paths only, not mid-session reconnect).
`agentbox cp` is now variadic: multiple host or box sources with the last path as the destination; upload size limits apply per source.
`git push --host-only` (and `--as`/`--force`) lands the box branch in the host repo without hitting a remote.
`agentbox shell --ssh-config` (Hetzner with a persistent key) writes `~/.ssh/config` using the box name as the Host alias (with legacy `agentbox-cloud-*` cleanup), shared with `code`/`open` via `ensureCloudSshAlias`. Cloud `--no-attach` agent launches now start detached tmux immediately (aligned with Docker). Agent launches record `recordLastAgent` for recover.
`agentbox install codex` (and hook in `install`) registers the Codex marketplace/plugin and enables it in `~/.codex/config.toml`; dev checkouts can use a local marketplace and skill symlinks. Runtime staging adds `agentbox-portless-trust` for TLS Portless trust inside boxes. Docs/skills cover recover, multi-cp, host-only push, and Codex/Claude SSH links.