Nightly by madarco · Pull Request #132 · madarco/agentbox

madarco · 2026-06-30T07:31:35Z

Note

Medium Risk
Touches cloud reconnect, Hetzner firewall self-heal, SSH config alias semantics, and agent session restore—important for connectivity and git workflows but mostly additive CLI paths with tests on cp/ssh/restore.

Overview
This nightly batch extends the CLI around reconnection, file copy, git, and external-app SSH, plus Codex onboarding and small cloud attach fixes.

agentbox recover rehydrates the relay, calls provider.reconnect (no power-cycle when the sandbox is already up), restores the box’s lastAgent via restoreAgentSessions (resume or fresh start, including OpenCode), and optionally attaches. It supports --all, plus --provider … --adopt for sandboxes missing from local state. Hetzner connect failures can trigger a one-shot firewall sync through withFirewallRepair (on connection-establish paths only, not on mid-session reconnect).

agentbox cp is now variadic: multiple host or box sources with the last path as the destination; upload size limits apply per source. git push --host-only (and --as / --force) lands the box branch in the host repo without hitting a remote.

agentbox shell --ssh-config (Hetzner with a persistent key) writes ~/.ssh/config using the box name as the Host alias (with legacy agentbox-cloud-* cleanup), shared with code / open via ensureCloudSshAlias. Cloud --no-attach agent launches now start detached tmux immediately (aligned with Docker). Agent launches record recordLastAgent for recover.

agentbox install codex (and hook in install) registers the Codex marketplace/plugin and enables it in ~/.codex/config.toml; dev checkouts can use a local marketplace and skill symlinks. Runtime staging adds agentbox-portless-trust for TLS Portless trust inside boxes. Docs/skills cover recover, multi-cp, host-only push, and Codex/Claude SSH links.

^{Reviewed by Cursor Bugbot for commit 1d72ceb. Configure here.}

On cloud providers (daytona/hetzner/vercel/e2b) the bare `agentbox claude`/`codex`/`opencode` commands and `agentbox fork` route through `cloudAgentCreate`, which on `--no-attach` returned before the agent's tmux session was ever created. The cloud session is created lazily by the attach step, so skipping attach skipped the agent entirely — the box came up with no agent running, contradicting the documented behavior ("create the box and start the agent session, but do not attach"). Docker was unaffected: it creates the session before the attach check. Fix `cloudAgentCreate` to call `cloudAgentStartDetached` in the `attach === false` branch — the same helper the `-i` queue worker uses, which starts a detached tmux session and verifies it stayed up (fail-loud on immediate exit / credential rejection). The `<agent> start` subcommand cloud branches did the same lazy no-op (printing "started lazily on attach"); they now resolve args first, then start the detached session in background mode. Cloud now matches docker on every path. Verified end-to-end: docker `--no-attach` still starts the session (regression guard), and e2b `--no-attach` now brings up a live, logged-in claude tmux session where before it created an empty box. Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs

cloudAgentStartDetached launched the agent with raw extraArgs only, so a background `agentbox <agent> start <box> --no-attach` (and idle-resumed creates) started a fresh agent instead of resuming the box's recorded claude/codex session — the interactive cloudAgentAttach path applies agentResumeArgs when args are empty, but the detached path did not. Apply the same resume-args resolution in cloudAgentStartDetached so the detached path is symmetric with attach. The `-i` queue path always seeds a prompt (non-empty extraArgs), so this no-ops there. Found by Cursor Bugbot on PR #116. Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
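
The resume-args rule described above can be sketched as a small pure function. This is an illustration only: `resolveLaunchArgs`, the `ResumeLookup` type, and the placeholder resume flags are hypothetical names, not the agentbox source, and the real lookup is asynchronous.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the agentbox code.
// A lookup that returns the recorded session's resume args, or null if none.
type ResumeLookup = (binary: string) => string[] | null;

function resolveLaunchArgs(
  binary: string,
  extraArgs: string[],
  lookupResume: ResumeLookup,
): string[] {
  // Explicit args (for example a prompt seeded by the -i queue) always win.
  if (extraArgs.length > 0) return extraArgs;
  // No explicit args: fall back to the recorded session's resume args, if any.
  const resume = lookupResume(binary);
  return resume ?? extraArgs;
}

// Placeholder resume args; the real values come from the box's session pointer.
const resumed = resolveLaunchArgs("claude", [], () => ["--resume", "session-123"]);
console.log(resumed);
```

Using one resolver for both the attach and detached paths is what keeps them symmetric; the queue path passes non-empty args, so the lookup never runs there.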

…te_*.sqlite

We stopped seeding Codex's state_*.sqlite index (commit 2eaf2b4), so Codex now creates it at startup instead of receiving it pre-uploaded. The create failed with a permission error because the directory wasn't owned by vscode (the user the agent runs as).

Two distinct ownership defects:

1. The agent home dirs (~/.codex, ~/.claude, ~/.local/share/opencode) were not reliably vscode-owned in cloud templates (E2B's base image ships a `node` user; the root `npm install -g @openai/codex` bake step left ~/.codex as node:node). This breaks even a plain `agentbox codex` start. Fixed with a cheap, idempotent create-time chown (ensureAgentHomeDirsOwned), with no re-bake.
2. The upload primitives only chowned the final landed path, not the parent directory chain they mkdir -p'd as root. Session-teleport lands a rollout at ~/.codex/sessions/YYYY/MM/DD/, leaving that chain root-owned so Codex can't write a new rollout / its sqlite index. Mirror the carry.ts parent-walk fix in both upload primitives (cloud-cp.ts + docker box-cp.ts), gated on the dest being under /home/vscode.

Chowns are name/id-derived (vscode / id -un), not hardcoded 1000, since the vscode uid varies per provider (docker/hetzner=1000, vercel=1001, e2b=1002).

Claude-Session: https://claude.ai/code/session_0152GmbNW3e7QpXNkQFd3MB2

Bugbot: when an upload's resolved finalPath was exactly /home/vscode, the `=== BOX_HOME` branch of the gate let the parent walk run with dirname=/home, reassigning /home itself to the agent user. Gate strictly on `startsWith(BOX_HOME + '/')` (a trailing segment), matching carry.ts. Applies to both cloud-cp.ts and docker box-cp.ts; adds a regression test. Claude-Session: https://claude.ai/code/session_0152GmbNW3e7QpXNkQFd3MB2
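
A minimal sketch of the strict gate, assuming the parent walk collects directories between the landed path and the box home. `parentChainToChown` is a hypothetical helper name, and stopping below `BOX_HOME` itself is an assumption of this sketch rather than confirmed behavior.

```typescript
const BOX_HOME = "/home/vscode";

// Returns the parent directories (deepest first) that would need a chown,
// or [] when the destination is not strictly inside BOX_HOME.
function parentChainToChown(finalPath: string): string[] {
  // Strict gate: require a trailing segment, so "/home/vscode" itself
  // (whose dirname is "/home") never starts the walk.
  if (!finalPath.startsWith(BOX_HOME + "/")) return [];
  const chain: string[] = [];
  let dir = finalPath.slice(0, finalPath.lastIndexOf("/"));
  // Walk upward, stopping before BOX_HOME (assumption: home is already owned).
  while (dir.length > BOX_HOME.length && dir.startsWith(BOX_HOME + "/")) {
    chain.push(dir);
    dir = dir.slice(0, dir.lastIndexOf("/"));
  }
  return chain;
}

console.log(parentChainToChown("/home/vscode/.codex/sessions/2026/rollout.jsonl"));
console.log(parentChainToChown("/home/vscode"));
```

With the earlier `=== BOX_HOME` branch, the second call would have walked to `/home`; with the strict prefix check it yields an empty chain.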

…SH (#118)

* feat(cli): add `agentbox shell --ssh-config` for Codex / Claude desktop SSH

Write a `~/.ssh/config` alias on demand so external apps (the Codex app, Claude desktop) can connect to a box over plain SSH, and surface the identity-file path + a Codex deep link.

- New `agentbox shell <box> --ssh-config` (+ `--json`): brings the box online, writes the alias, prints alias/host/user/identity + the `codex://settings/connections/ssh/add?name=<alias>` link and Claude desktop instructions. Gated to providers with a persistent per-box key (Hetzner); Docker/Daytona/Vercel/E2B exit cleanly without writing.
- Extract the shared bring-online → buildAttach → parseSshTarget → write alias flow into `cloud-ssh.ts` (`resolveCloudSshTarget` / `ensureCloudSshAlias`); reuse it from `code` and `open`.
- Alias is now the box name (clean `ssh <box>` + Codex `name=` param); add `readAgentboxSshAlias` and surface `ssh alias` / `ssh identity` in cloud `inspect`.
- Document the flow in the agentbox-info and fork skills.

Claude-Session: https://claude.ai/code/session_011VoAz7mUaUGh6dKAvr7kAP

* fix(cli): address Cursor Bugbot findings on SSH-config

- Check SSH support (`buildAttach`) before any lifecycle action so an unsupported box (e.g. a stopped Docker box) errors without being started; add a `bringOnline` option to skip a redundant lifecycle pass.
- `agentbox code` now passes `bringOnline: false` (it already brings the box online + waits), which removes the duplicate resume/start.
- Migrate away legacy `agentbox-cloud-<box>` blocks on write/remove so the box-name rename doesn't leave stale Host entries behind.
- Warn (don't fail) when `~/.ssh/config` already has a user-authored `Host <box>` that could shadow agentbox's entry.
- Tests for legacy-block migration and conflict detection.

Claude-Session: https://claude.ai/code/session_011VoAz7mUaUGh6dKAvr7kAP

…erride.md The box "system prompt" baked at /etc/claude-code/CLAUDE.md (sandbox facts: DinD, per-box worktree, push/PR/cp via the host relay, identity in /etc/agentbox/box.env) previously only reached Claude. Codex got none of it. Codex loads a global personal-instructions file from CODEX_HOME, first-match of ~/.codex/AGENTS.override.md then ~/.codex/AGENTS.md (no concat, no @import). At create time we now regenerate ~/.codex/AGENTS.override.md = sentinel + box facts (read fresh) + the user's own AGENTS.md / authored override folded in beneath, so the in-box Codex agent reads the same facts. A line-1 sentinel makes it idempotent and preserves user content (the host ~/.codex is re-synced before each seed, restoring the source). No-op when the facts file is absent. One shared generator (buildCodexAgentsOverrideScript) drives both paths: - docker: seedCodexAgentsOverride() seeds the codex-config volume, called after seedCodexHooks() in create.ts (post host rsync). - cloud (daytona/hetzner/vercel/e2b): ensureCodexAgentsOverride() runs the same script in-box via backend.exec, wired into cloud-provider.ts. Verified: codex debug prompt-input shows the box facts in Codex's model-visible prompt; compose + authored-override + facts-only + no-op cases all check out. Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
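
The composition rule above (sentinel, then fresh box facts, then user content) can be sketched as a pure function. The real generator is a shell script produced by `buildCodexAgentsOverrideScript`; the sentinel text, the function name `composeOverride`, and the ordering of AGENTS.md before an authored override are all assumptions made for this sketch.

```typescript
// Placeholder sentinel: the real line-1 marker text is not shown in the source.
const SENTINEL = "<!-- agentbox: generated AGENTS.override.md -->";

// Returns the override file content, or null for the no-op case
// (box-facts file absent).
function composeOverride(
  boxFacts: string | null,
  userAgents: string | null,
  existingOverride: string | null,
): string | null {
  if (boxFacts === null) return null;
  // An existing override that starts with the sentinel was generated earlier,
  // so it is regenerated rather than folded in again (idempotence).
  const authored =
    existingOverride !== null && !existingOverride.startsWith(SENTINEL)
      ? existingOverride
      : null;
  const parts = [SENTINEL, boxFacts.trim()];
  if (userAgents) parts.push(userAgents.trim());
  if (authored) parts.push(authored.trim());
  return parts.join("\n\n") + "\n";
}

const first = composeOverride("Box facts: DinD available.", "Prefer pnpm.", null);
console.log(first);
```

Feeding the generated file back in as `existingOverride` produces the same output, which is the property the line-1 sentinel is meant to provide.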

Cloud backends signal script failure via a non-zero CloudExecResult.exitCode rather than throwing, so the prior try/await/log('seeded') reported success even when the `set -e` script aborted (perms, missing paths) — a box could boot without box facts while create logs looked healthy. Read the exitCode and log the failure (still best-effort, never fails create). Found by Cursor Bugbot. Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs

…ritten The shared generator exits 0 in the no-op case (box-facts file absent) without writing the override, so a bare exitCode===0 check logged a false "seeded" on the cloud path. Move the success signal into the script as a stdout marker (CODEX_OVERRIDE_WROTE_MARKER) printed only after the write; both docker and cloud now key their "seeded" log off the marker, and cloud logs an explicit "skipped: box-facts file absent" otherwise. Found by Cursor Bugbot. Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
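
A minimal sketch of keying the log line off a stdout marker instead of the exit code alone. The `SeedExecResult` shape mirrors the `exitCode` field mentioned above, but the marker value and the `classifySeed` helper are hypothetical.

```typescript
// Assumed shape: the source only states that cloud exec results carry exitCode.
interface SeedExecResult {
  exitCode: number;
  stdout: string;
}

// Placeholder value: the real CODEX_OVERRIDE_WROTE_MARKER string is not shown.
const WROTE_MARKER = "AGENTBOX_CODEX_OVERRIDE_WROTE";

type SeedOutcome = "seeded" | "skipped" | "failed";

function classifySeed(result: SeedExecResult): SeedOutcome {
  // A non-zero exit means the `set -e` script aborted.
  if (result.exitCode !== 0) return "failed";
  // Exit 0 alone is ambiguous: the no-op case also exits 0.
  // Only the marker, printed after the write, proves the file was written.
  const wrote = result.stdout.split("\n").some((line) => line.trim() === WROTE_MARKER);
  return wrote ? "seeded" : "skipped";
}

console.log(classifySeed({ exitCode: 0, stdout: "" }));
```

The three outcomes map to the three log messages described above: "seeded", "skipped: box-facts file absent", and a logged failure that still does not abort create.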

…lhost works

When the host Portless proxy runs in TLS mode, the symmetric <box>.localhost URL is served inside the box by a self-signed CA the box doesn't trust:

- hetzner: the in-VPS mirror's own CA at /root/.portless/ca.pem
- docker: the host CA bind-mounted at /home/vscode/.portless/ca.pem

`portless proxy start` only trusts the CA in the Linux system store, not the box user's NSS db, so the in-box VNC Chromium window and Playwright (via Codex) rejected the cert with an HTTPS error.

Fix by trusting the CA across every in-box client:

- New baked helper agentbox-portless-trust: installs a CA into the system store (update-ca-certificates) + the box user's NSS db (certutil), idempotent and best-effort, prints the system CA path for NODE_EXTRA_CA_CERTS.
- Bake libnss3-tools (certutil) into the hetzner snapshot + docker base image.
- hetzner startInBoxPortless: when tls, run the helper on /root/.portless/ca.pem and export NODE_EXTRA_CA_CERTS via /etc/profile.d.
- docker create: when the resolved portless URL is https, run the helper on the bind-mounted host CA + drop the same profile.d export.

No-TLS host proxies (the --no-tls -p 1355 default) serve plain http and skip this entirely. Requires a snapshot re-bake / docker image rebuild to pick up libnss3-tools + the helper.

Claude-Session: https://claude.ai/code/session_0152GmbNW3e7QpXNkQFd3MB2

…ugin

Installing agentbox left the Codex plugin (the `agentbox`/`agentbox-info` skills) a manual chore: `codex plugin marketplace add` + `codex plugin add`, then a toggle in the Codex app. Automate all three.

New `install-codex.ts` (mirrors install-herdr.ts), gated on Codex being present (~/.codex + `codex` on PATH), best-effort so it never aborts `agentbox install`:

- `codex plugin marketplace add madarco/agentbox` + `codex plugin add agentbox@agentbox`
- enable by default: append `[plugins."agentbox@agentbox"] enabled = true` to ~/.codex/config.toml when no such table exists. Codex has no `plugin enable` CLI; this is the same key the TUI toggle writes.

Robustness:

- Skips the network re-add when already installed (codex plugin add re-enables a deliberately-disabled plugin), so a user's explicit disable is respected; --force bypasses to re-enable.
- The config write only appends when the key is absent: it never duplicates the table (a second table is a TOML parse error) and respects enabled = false.

Wired into the `agentbox install` wizard (runs when Codex is detected) and a standalone `agentbox install codex`. Adds smol-toml to apps/cli (read-only parse for the presence check). Unit tests cover the upsert/respect-disable logic; e2e verified against codex 0.142 (fresh -> enabled, disabled re-run -> stays off, --force -> re-enabled). Docs: cli.mdx + plugin README.

Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs

Addresses the CI failure and Cursor Bugbot findings on the enable step:

- commands.test.ts asserted the exact `install` subcommand list: add `codex`.
- The managed-block approach conflicted with the `[plugins."agentbox@agentbox"]` table Codex itself writes (`plugin add` / TUI toggle): a second table is a TOML parse error, and the strip+regenerate path could (a) leave a stale duplicate unwritten and (b) override a disable set inside the block. Replace it: parse config.toml read-only and
  - append a plain `[plugins."agentbox@agentbox"] enabled = true` table only when the key is ABSENT (Codex usually writes it itself on `plugin add`);
  - respect a present value (enabled defaults true; only explicit `false` is off);
  - with --force, flip a disabled entry to true via a targeted in-place line edit that preserves the rest of the file (comments/order/formatting).

Always write when the text changed (fixes the "cleanup not written" gap). E2E re-verified vs codex 0.142: fresh -> enabled; disabled re-run -> stays off with comments preserved; --force -> re-enabled, file still single-table valid TOML.

Claude-Session: https://claude.ai/code/session_01PTY4KwAeZdAVvgSWxjpYfs
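
The three rules above (append when absent, respect a present value, flip only with --force) can be illustrated with a line-based sketch. This is a simplification: the actual change parses config.toml with smol-toml for the presence check, whereas this hypothetical `upsertEnable` only recognizes the plain table-header form and would miss dotted-key or inline-table spellings.

```typescript
const HEADER = '[plugins."agentbox@agentbox"]';

// Returns the new config text; returns the input unchanged when nothing
// needs to be written.
function upsertEnable(text: string, force: boolean): string {
  const lines = text.split("\n");
  const start = lines.findIndex((line) => line.trim() === HEADER);
  if (start === -1) {
    // Key absent: append a single table, never a duplicate.
    const prefix =
      text.length === 0 ? "" : text.endsWith("\n") ? `${text}\n` : `${text}\n\n`;
    return `${prefix}${HEADER}\nenabled = true\n`;
  }
  // Key present: respect it unless --force was given.
  if (!force) return text;
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].trim().startsWith("[")) break; // next table starts here
    if (/^\s*enabled\s*=\s*false\b/.test(lines[i])) {
      // Targeted in-place edit: only this value changes, trailing comments stay.
      lines[i] = lines[i].replace(/false/, "true");
      return lines.join("\n");
    }
  }
  return text; // no explicit false: enabled already defaults to true
}

const disabled = '# my config\n[plugins."agentbox@agentbox"]\nenabled = false # off\n';
console.log(upsertEnable(disabled, true));
```

Editing one line in place, rather than re-serializing the parsed document, is what preserves the user's comments, ordering, and formatting.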

From a source checkout, `agentbox install codex` now points the Codex marketplace at the local repo (`source_type = "local"`) instead of the `madarco/agentbox` GitHub slug, re-syncs the bundle skill copy, stages the plugin from the working tree, then symlinks the staged per-skill SKILL.md back to the repo so skill edits go live on the next Codex restart. A published npm install is unaffected (no `plugins/`/`.agents/` ships, so it always uses GitHub).

- New `apps/cli/src/lib/source-checkout.ts`: shared `isSourceCheckout`, `resolveHostSkillsDir`, and `resolveDevRepoRoot` (repo root only in a checkout carrying the Codex sources, else null). `install.ts` now imports these.
- `--no-dev` forces the published GitHub path even inside a checkout; a source-type conflict (local <-> git) is handled by remove-then-add.
- Symlink the skill *files* only: a whole-dir symlink makes Codex report the plugin "not installed".
- Tests for the source-checkout helpers and `marketplaceSource`; docs in docs/development.md.

Claude-Session: https://claude.ai/code/session_01An9tT8HqjQoGKKVWuYg4bb

…mpts Tell the in-box agent that `agentbox.yaml` services/containers start automatically (check with `agentbox-ctl status`) and that web services are reachable at https://<AGENTBOX_BOX_HOST> from both box and host. Wording is tailored per provider (local proxy for docker, cloud URL for the cloud ones). Claude-Session: https://claude.ai/code/session_01An9tT8HqjQoGKKVWuYg4bb

- Attempt the dev skill symlink whenever a staged dir exists, not only when `codex plugin add` exits 0: a non-zero exit is treated as "already installed" and continues, leaving a staged cache we still want live-symlinked.
- Never symlink into a non-existent manifest-version path: if the manifest version dir is absent and the cache doesn't hold exactly one child, bail with "staged plugin dir not found" instead of mkdir-ing an orphan tree Codex ignores.

Claude-Session: https://claude.ai/code/session_01An9tT8HqjQoGKKVWuYg4bb

…r on failure The Claude native installer (`curl https://claude.ai/install.sh | bash`) can get an intermittent HTTP 403 from Cloudflare on cloud-datacenter egress IPs (Hetzner among them) under load. The bare `curl | bash` masked it (pipeline exit = bash's 0), so a 403 silently baked a snapshot with no `claude` — boxes from it had no agent, so the in-box tmux session died instantly and `attach` crash-looped on "no server running on /tmp/tmux-1000/default". Replace the masked install with a `retry_backoff` helper: retry the native installer 3x with 60s then 240s backoff (~5 min), keep `set -o pipefail`, and fold `command -v claude` into the retried command so a "succeeded but absent" result also retries. If all attempts fail, abort the bake (exit 71) rather than ship a claude-less snapshot. Applied to hetzner / vercel / e2b bake scripts and the docker Dockerfile (sh/dash plain-loop variant). No npm fallback: `npm install -g @anthropic-ai/claude-code` lacks native-only features and lands at /usr/bin/claude, mismatching the host-seeded installMethod=native and tripping Claude Code's "missing or broken" doctor warning. `prepareHetzner` now special-cases exit 71 with an actionable message (the generic one showed empty stderr because the install runs `bash -x ... 2>&1 | tee`, merging stderr into stdout). Known gap (deferred): the 403 can outlast the ~5-min retry window, so prepare can still fail and need a manual re-run. The validated reliable fix (host-proxies the native binary download, places it at ~/.local/bin/claude) is documented in docs/hertzner_backlog.md as a follow-up. Claude-Session: https://claude.ai/code/session_019m5WHxP4vmsoXaHUhQdY9e

fix(hetzner): retry Claude native installer with backoff + clear error

Adding a box as a remote host (Codex app, VS Code Remote-SSH, plain `ssh <box>`) opened the session in /home/vscode instead of the project at /workspace. SSH config has no start-directory directive, and the only client-side option (RemoteCommand) would break scp/sftp and VS Code's remote bootstrap on the same shared alias. So fix it server-side: make interactive login shells cd into /workspace via the existing /etc/profile.d/agentbox.sh shim every box installs. Guarded to interactive shells only (scp/sftp and `ssh box <cmd>` untouched) and only when still at $HOME (never overrides a caller-chosen dir, e.g. agentbox's own tmux `-c /workspace`). Applied to all provider shims for consistency: hetzner, vercel, e2b, and the canonical docker Dockerfile.box. Claude-Session: https://claude.ai/code/session_01An9tT8HqjQoGKKVWuYg4bb

feat(box): land interactive SSH/login shells in /workspace

The host `~/.codex` is ~1.1 GB and was being synced into boxes almost whole: ~485 MB of macOS aarch64 standalone release binaries (`packages/`), a ~238 MB plugin app-server runtime (`plugins/.plugin-appserver`), the macOS `Codex Computer Use.app` bundle (`computer-use/`), host session archives, and regenerable caches (`.tmp` ~213 MB, `tmp`, `cache`, `vendor_imports`, `sqlite`, `models_cache.json`). None of it is usable in a Linux box — the in-box codex is npm-installed and rebuilds these caches on demand. Exclude all of it from both codex staging paths: - `CODEX_RSYNC_EXCLUDES` (host-stage.ts) — the cloud bake path (hetzner/vercel/ e2b/daytona `stageCodexStaticForUpload`). Dry-run: staged tarball 820 MB -> 482 KB. - the docker `agentbox-codex-config` volume rsync (codex.ts), plus its `rm -rf` purge so existing shared volumes get cleaned on the next sync. Verified live: a fresh docker box's `~/.codex` dropped 1.5 GB -> 59 MB and `codex exec` still returns a real turn. Config / auth / skills / prompts / rules / memories / plugins are still synced, so codex keeps working — just without the host-only ballast. Claude-Session: https://claude.ai/code/session_019m5WHxP4vmsoXaHUhQdY9e

fix(codex): drop heavy host-only artifacts from codex config staging

Cloud boxes (hetzner/daytona/vercel/e2b) have no global env primitive, so the in-box agent — launched via a tmux login shell — only saw the relay token through /etc/agentbox/box.env. Commit b9e4ebf made the ctl daemon overwrite box.env without the token (correctly, to keep secrets out of a 0644 file), which severed the agent's only channel: `agentbox-ctl git push` failed with "no relay configured". Persist the per-box relay URL + token to a 0600 /run/agentbox/relay.env (tmpfs, never snapshotted) written by the daemon once it validates its own token, and have agentbox-ctl's relay clients (postRpc, RelayClient) fall back to it when env is absent. agentbox-ctl is the only in-box relay consumer, so a single on-demand file read fixes every path — the agent and the host-driven `agentbox git push` on all backends — without spraying the token into every login shell's env. The bridge token stays daemon-only. Also stop hetzner cloud-init from copying the relay/bridge tokens into the 0644 box.env (they now travel via relay.env / the daemon process env). Claude-Session: https://claude.ai/code/session_01SAturA5Fs2XHzzondT6DDv
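
A rough sketch of the env-then-file fallback described above. The variable names `AGENTBOX_RELAY_URL` and `AGENTBOX_RELAY_TOKEN`, the `KEY=VALUE` file format, and the helper names are assumptions for illustration; the file read is injected so the sketch stays free of filesystem access.

```typescript
interface RelayEnv {
  url: string;
  token: string;
}

// Parses simple KEY=VALUE lines, skipping blanks and # comments.
function parseEnvFile(content: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of content.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    out[line.slice(0, eq)] = line.slice(eq + 1);
  }
  return out;
}

// Process env wins; otherwise fall back to the 0600 relay.env content.
// readRelayFile returns null when the file does not exist.
function resolveRelay(
  env: Record<string, string | undefined>,
  readRelayFile: () => string | null,
): RelayEnv | null {
  if (env.AGENTBOX_RELAY_URL && env.AGENTBOX_RELAY_TOKEN) {
    return { url: env.AGENTBOX_RELAY_URL, token: env.AGENTBOX_RELAY_TOKEN };
  }
  const content = readRelayFile();
  if (content === null) return null;
  const file = parseEnvFile(content);
  if (!file.AGENTBOX_RELAY_URL || !file.AGENTBOX_RELAY_TOKEN) return null;
  return { url: file.AGENTBOX_RELAY_URL, token: file.AGENTBOX_RELAY_TOKEN };
}

const fromFile = resolveRelay({}, () => "AGENTBOX_RELAY_URL=http://relay:8788\nAGENTBOX_RELAY_TOKEN=abc\n");
console.log(fromFile);
```

Reading the file on demand inside the one relay consumer avoids exporting the token into every login shell, which is the trade-off the commit message argues for.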

Correct environment.mdx's docker-only box.env claim, and document in host-relay.md / cloud-providers.md / the hetzner+vercel backlogs that the cloud relay token now reaches agentbox-ctl via a 0600 /run/agentbox/relay.env (read by resolveRelayEnv), not login-shell env — guarding the b9e4ebf regression. Bridge token stays daemon-only. Claude-Session: https://claude.ai/code/session_01SAturA5Fs2XHzzondT6DDv

fix(cloud): restore relay token for in-box agent via 0600 relay.env

A box's host-side state (the relay's in-memory registry + CloudBoxPoller, the Hetzner SSH ControlMaster + port forwards, the host Portless aliases, the detached agent tmux session) is separate from the box and is lost on a host reboot / relay restart / new CLI process while the sandbox keeps running. `start`/`unpause` only fix this by power-cycling the box and can't touch a box missing from local state at all.

`agentbox recover [box]`:

- ensures the host relay is up and rehydrates every box into it,
- calls the new `Provider.reconnect(box)`, the no-power-cycle sibling of `start`: cloud re-runs `reEnsureCloudBox` (refresh preview URLs, re-open the Hetzner tunnel, re-register Portless + the relay poller, relaunch in-box daemons) without `backend.start`; docker re-runs the idempotent `startBox`,
- relaunches the agent the box was running (resuming, or starting `box.lastAgent` fresh) and attaches.

Adds `BoxRecord.lastAgent` (claude/codex/opencode), written on every agent launch (foreground + queued via `recordLastAgent`). It is durable, unlike the in-box session pointers cleared on stop, so recover knows which agent to bring back.

`recover --provider <cloud> --adopt [ref]` rebuilds local state for a sandbox missing from this host (from `backend.list()` + the agentbox.name tag), minting fresh relay/bridge tokens that reach the in-box agent when reconnect relaunches the ctl daemon (it writes /run/agentbox/relay.env). Hetzner adoption needs the box's per-host SSH key; a box created elsewhere can't be controlled and recover says so.

Works across all five providers. Docs updated (cli.mdx, state.md, host-relay.md, cloud-providers.md). Unit tests for recordLastAgent and the restoreAgentSessions launch-fresh path; docker reconnect + lastAgent + fresh-launch verified live (StartedAt unchanged → no power-cycle).

Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup

…ession Bugbot: `recover` passed the target agent to `restoreAgentSessions` only to gate the fresh-launch pass, while pass 1 still resumed every resumable agent that had an in-box pointer — so recover could resurrect an unrelated Claude/ Codex session (possibly from a stale pointer) alongside the intended agent. Rework `restoreAgentSessions`: `restoreOnly` (was `launchFresh`) now scopes the whole restore to that one agent — resume it if there's a live/resumable session, else start it fresh — and touches nothing else. `start`/`unpause` (no `restoreOnly`) keep the resume-every-running-agent semantics. A box created before `lastAgent` existed passes `undefined` and so falls back to resume-all. Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup
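
The scoping rule can be sketched as a planning function. `planRestore` and `RestorePlan` are hypothetical names; the real `restoreAgentSessions` performs the launches itself and reads the in-box pointers, which this sketch takes as an input list.

```typescript
type AgentName = "claude" | "codex" | "opencode";

interface RestorePlan {
  resume: AgentName[];
  startFresh: AgentName[];
}

// resumable: agents that currently have a live or resumable in-box session.
// restoreOnly: the single agent recover wants back, or undefined for the
// start/unpause behavior (and for boxes created before lastAgent existed).
function planRestore(resumable: AgentName[], restoreOnly?: AgentName): RestorePlan {
  if (restoreOnly === undefined) {
    // Resume-all semantics: every resumable agent comes back, nothing is started fresh.
    return { resume: [...resumable], startFresh: [] };
  }
  // Scoped semantics: touch only the requested agent.
  return resumable.includes(restoreOnly)
    ? { resume: [restoreOnly], startFresh: [] }
    : { resume: [], startFresh: [restoreOnly] };
}

console.log(planRestore(["claude", "codex"], "codex"));
```

In the scoped case a stale Claude pointer is simply ignored, which is the behavior the Bugbot finding asked for.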

… start Bugbot: docker `reconnect` always delegated to `startBox` (`docker start`), which errors on a paused container ("cannot start a paused container"). So `agentbox recover` on a paused docker box left it frozen while reporting the relay/portless recovery as success — exec, agent restore, and attach kept failing until the user ran `unpause`. Probe state first: paused -> `unpauseBox` (resumes the still-frozen ctl/dockerd/vnc; the portless alias survives a pause); running/stopped -> `startBox` (idempotent, relaunches dead daemons + re-aliases portless); missing/destroyed -> clear error. Mirrors the cloud provider's state-routed reconnect. Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup
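
The state routing reads naturally as a switch. This sketch only returns the name of the helper to call; the `DockerBoxState` union and `routeDockerReconnect` are illustrative names, while `unpauseBox` and `startBox` are the helpers named in the commit message.

```typescript
type DockerBoxState = "paused" | "running" | "stopped" | "missing";
type ReconnectAction = "unpauseBox" | "startBox";

function routeDockerReconnect(state: DockerBoxState): ReconnectAction {
  switch (state) {
    case "paused":
      // `docker start` fails on a paused container, so unpause instead.
      return "unpauseBox";
    case "running":
    case "stopped":
      // startBox is idempotent: it relaunches dead daemons and re-aliases portless.
      return "startBox";
    case "missing":
      throw new Error("box container is missing or destroyed; cannot reconnect");
  }
}

console.log(routeDockerReconnect("paused"));
```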

feat(recover): add `agentbox recover` to reconnect a box without power-cycling

…-up) `launchCloudCtlDaemon` spawned `agentbox-ctl daemon` unconditionally on every create/start/unpause/resume/recover (via `reEnsureCloudBox`). The daemon has no singleton guard — when :8788 is already bound it catches EADDRINUSE and keeps running — so each relaunch leaked another idle daemon (a live Hetzner box had 9, only one serving the relay). `recover`, which targets already-running boxes, hit this every time. Guard the spawn with a liveness probe: a healthy ctl daemon already serving the box relay's /healthz (port parsed from relayUrl, default 8788) means the launch is skipped — it already has the right env (stable across a host-only reconnect), and only a real sandbox restart kills it (probe fails -> relaunch). Probe via node (guaranteed present; curl may not be). Same guard for docker's `launchCtlDaemon` (`pgrep -f 'agentbox-ctl daemon$'`, end-anchored so it can't match the `sh -c` wrapper): docker doesn't pile up today because the daemon dies with the container, but the new docker `reconnect` runs `startBox` against a possibly-running container and would otherwise double-launch. Verified live: `recover` twice on the Hetzner box left the daemon count flat and :8788 still healthy on the original pid. Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup
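
A small sketch of the two decisions in the guard: which port to probe, and whether to spawn. The regular expression is a simplification chosen for this example, and both function names are hypothetical; the real probe runs inside the box via node against `/healthz`.

```typescript
const DEFAULT_CTL_PORT = 8788;

// Extracts the explicit port from a relay URL, falling back to the default
// when the URL is absent or carries no port.
function relayPort(relayUrl: string | undefined): number {
  if (!relayUrl) return DEFAULT_CTL_PORT;
  // Matches "scheme://host:port" with an optional path, query, or fragment.
  const match = /^\w+:\/\/[^/?#]*:(\d+)(?:[/?#]|$)/.exec(relayUrl);
  return match ? Number(match[1]) : DEFAULT_CTL_PORT;
}

// Spawn only when the liveness probe failed; a healthy daemon is left alone.
function shouldLaunchDaemon(probeHealthy: boolean): boolean {
  return !probeHealthy;
}

console.log(relayPort("http://127.0.0.1:9000"), shouldLaunchDaemon(true));
```

Because a healthy daemon short-circuits the launch, repeated `recover` runs against a running box leave the daemon count unchanged.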

fix(ctl): make the in-box ctl-daemon launch idempotent (stop the pile-up)

A Hetzner box's firewall locks SSH to the host's egress IP at create time and is never re-synced. When the host IP changes (laptop moves networks), every comms op fails with an opaque `ssh ControlMaster failed … Operation timed out` and the user has to know to run `agentbox hetzner firewall sync`.

Two fixes, both gated to the connection-failure path so the happy path never pays the egress-detect cost, and the firewall is re-synced ONLY when the IP actually changed:

1. Hint (read-only): wrap `tunnels.open` in `ensureTunnel`, the one choke point all of exec/scp/forward/poller/attach funnel through. On a real mismatch it turns the opaque timeout into "firewall allows X but your egress is now Y — run `firewall sync`/`recover`". Safe on a checkpoint drop (box merely stopped, IP unchanged → no hint).
2. Auto-sync, scoped to connection ESTABLISHMENT only. New optional `repairReachability` on CloudBackend/Provider (Hetzner-only): re-syncs the firewall to the current egress, but only when it changed (else changed:false). A `withFirewallRepair` CLI helper retries the attempt once iff something changed, wired at the two establish sites: `recover` (provider.reconnect) and the INITIAL attach connect (`_cloud-attach` buildAttach). Deliberately NOT the mid-session reconnect closure: a checkpoint stops the box and drops the PTY, which must not be mistaken for an IP change.

`--no-firewall-sync` opts out on recover (shared/untrusted egress). A short-TTL egress cache avoids probe storms across retries / `recover --all`. `0.0.0.0/0` (explicit dynamic-IP opt-in) is never hinted or synced.

Verified live on a Hetzner box: locking the firewall to a bogus IP makes `shell` fail with the hint (no auto-repair), `recover` auto-syncs back + reconnects, and `--no-firewall-sync` leaves it locked. Unit tests cover firewallNeedsSync + the egress TTL cache.

Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup

…establishes

1. Stale egress cache could mask a real IP change: cut the cache TTL from 60s to 5s. It only exists to dedup a burst of failure-path probes (poller backoff, `recover --all`), not to remember the IP over time; a long TTL would hide the very IP change we're detecting.
2. The firewall self-heal wrapped only the final buildAttach, but the resume probe and the detached pre-start connect first, so a firewall block there aborted the attach (or silently dropped the resumed session) before repair ran. Move the repair to a single up-front warm-up (`exec true`, Hetzner-only) that opens the tunnel + self-heals BEFORE any later establish touch, which then reuse the live master.

Verified live: a locked firewall is now auto-synced on `claude attach` before it connects.

Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup

…re path Bugbot (round 2): even a 5s-TTL cache could read a just-changed egress IP as "unchanged" in the firewall comparison and skip the heal — the exact mismatch this exists to catch. The cache only dedup'd failure-path probes, but the cloud poller already de-dupes its recover calls and `recover --all` is sequential, so a fresh `detectEgressIp` in `firewallEgressStatus` won't storm. Remove the cache entirely; correctness over a marginal probe dedup. Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup
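
The self-heal logic across these firewall commits can be sketched in two pieces: the mismatch check and the retry-once wrapper. Both are simplified stand-ins. The `/32` rule format is an assumption about how the firewall stores the host IP, `needsFirewallSync` is not the project's `firewallNeedsSync` signature, and `withRepairOnce` is synchronous for brevity where the real `withFirewallRepair` presumably awaits network calls.

```typescript
// True when the firewall should be re-synced to the current egress IP.
function needsFirewallSync(allowedCidrs: string[], egressIp: string): boolean {
  // Explicit dynamic-IP opt-in: never hinted or synced.
  if (allowedCidrs.includes("0.0.0.0/0")) return false;
  // Assumption: the host IP is stored as a single-address /32 rule.
  return !allowedCidrs.includes(`${egressIp}/32`);
}

// Runs attempt; on failure runs repair and retries once, but only if the
// repair reports that it actually changed something.
function withRepairOnce<T>(attempt: () => T, repair: () => { changed: boolean }): T {
  try {
    return attempt();
  } catch (err) {
    if (!repair().changed) throw err; // nothing changed: surface the original error
    return attempt();
  }
}

console.log(needsFirewallSync(["203.0.113.7/32"], "198.51.100.9"));
```

Retrying only when the repair changed something keeps an unrelated failure (a stopped box, for instance) from being retried pointlessly, and with the cache removed each check compares against a freshly detected egress IP.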

feat(hetzner): self-heal the per-box firewall on a host egress-IP change

… --host-only) Add a `--host-only` flag to `agentbox git push <box>` and in-box `agentbox-ctl git push` that makes the box's branch available in the host's *local* repo without pushing to any remote — nothing is published online. The destination branch defaults to the box's current branch name; `--as <branch>` overrides it and `--force` allows a non-fast-forward overwrite. `--host-only` is incompatible with `--remote` (exit 64). Because nothing leaves the host, the relay skips the push-confirm gate / host-initiated-token path entirely. Docker copies the box branch ref within the shared bind-mounted `.git/` via a self-fetch (`handleGitSaveToHost`); cloud reuses the push flow's git-bundle pull-back, stopping before the remote push (`runGitRpc` short-circuit), so all four cloud providers are covered. The in-box agent system prompt (custom-system-CLAUDE.md, all providers) and the docs (host-relay, features, web cli/sync-and-git) document the new mode. Claude-Session: https://claude.ai/code/session_01TmyXca2hNF9TtK6q9MAh1L

Making `--force` a known option on `push` (for `--host-only`) meant Commander consumed it for every push, but only the host-only path forwarded it — a normal `agentbox git push <box> --force` silently dropped `--force` (`params.force` is only honored on the host-only land path). Re-append `--force` to the forwarded git args on a remote push, in both the host CLI and in-box ctl, so the relay appends it to `git push <remote> <branch>`. The host CLI's predicted params hash stays in lockstep with ctl's normalized args tail. Add a pure unit test for ctl's buildParams covering remote --force re-forwarding vs host-only params. Verified live (docker): non-ff push rejected without --force (exit 1, remote unchanged), forced update succeeds with --force; host-only land still works. Claude-Session: https://claude.ai/code/session_01TmyXca2hNF9TtK6q9MAh1L
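
A minimal sketch of the re-forwarding rule, assuming a simplified params shape. `PushOptions`, `PushParams`, and `buildPushParams` are hypothetical; the source only confirms that ctl has a `buildParams` with a unit test covering this behavior.

```typescript
interface PushOptions {
  hostOnly: boolean;
  force: boolean;
  as?: string;
  gitArgs: string[]; // remaining args after the CLI parser consumed known options
}

type PushParams =
  | { mode: "host-only"; force: boolean; as?: string }
  | { mode: "remote"; args: string[] };

function buildPushParams(opts: PushOptions): PushParams {
  if (opts.hostOnly) {
    // Host-only land: force travels as a structured param.
    return { mode: "host-only", force: opts.force, as: opts.as };
  }
  // Remote push: the parser swallowed --force, so put it back on the arg tail.
  const args =
    opts.force && !opts.gitArgs.includes("--force")
      ? [...opts.gitArgs, "--force"]
      : [...opts.gitArgs];
  return { mode: "remote", args };
}

console.log(buildPushParams({ hostOnly: false, force: true, gitArgs: ["origin", "main"] }));
```

Running the same normalization in the host CLI and in the in-box ctl is what keeps the predicted params hash aligned with the args the relay actually receives.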

`agentbox cp` and the in-box `agentbox-ctl cp toHost|fromHost` took a single source per call; copying several files meant several invocations (and, from inside a box, several host approval prompts). Both now accept multiple sources in one call: list files/dirs before the destination, which must then be a directory. Wildcards are handled by the invoking shell (host shell for host-side sources, box shell for in-box sources), so no glob expansion lives in the code and there's no box->host shell-injection surface.

- Host CLI: `cp <paths...>` (variadic, arity-split). One arg keeps the download-to-cwd back-compat; a single source keeps full docker-cp rename semantics. >=2 sources require a directory dest. All sources must be on the side opposite the dest, and box sources must name one box. Size guard runs per source.
- Providers: `uploadPath`/`downloadPath` take `string[]`. Docker groups sources by parent dir (one tar per group), hoists mkdir/parent-chain-chown, chowns each landed entry. Cloud loops the single-source primitive serially.
- Wire: `cp.*` RPC carries `{sources[], dest}` with backward-tolerant normalization of the legacy `{boxPath, hostPath}` shape; shared in cp-rpc.ts.
- Relay: the cloud `runCpRpc` now re-shells `agentbox cp` like the docker path instead of calling the cloud primitives directly, so excludes and the size guard are honored on every provider (the cloud path silently dropped them before, while the consent prompt advertised them). Consent prompt lists all sources.
- Docs/skills/system prompts updated; new unit tests for parseArgs, the cp-rpc wire helpers, and cloud multi-source orchestration.

Verified live (docker): multi-source + wildcard upload, multi-source download, dest-not-a-dir / box-to-box errors, single-source rename + download-to-cwd back-compat, and the in-box ctl variadic parsing.

Claude-Session: https://claude.ai/code/session_01XvuW3YwvHzvCrmXMJyC33W
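
The arity-split rules from the Host CLI bullet can be sketched as follows. The `<box>:<path>` syntax for box paths is an assumption borrowed from docker-cp, and `parseEndpoint` / `planCp` are hypothetical names; the directory-destination check for two or more sources is omitted because it needs filesystem access.

```typescript
interface Endpoint {
  box: string | null; // null means a host path
  path: string;
}

interface CpPlan {
  direction: "upload" | "download";
  box: string;
  sources: string[];
  dest: string;
}

// Assumption: "<box>:<path>" marks a box path; anything else is a host path.
function parseEndpoint(arg: string): Endpoint {
  const i = arg.indexOf(":");
  if (i > 0 && !arg.slice(0, i).includes("/")) {
    return { box: arg.slice(0, i), path: arg.slice(i + 1) };
  }
  return { box: null, path: arg };
}

function planCp(args: string[]): CpPlan {
  if (args.length === 0) throw new Error("cp: at least one path is required");
  // One arg: back-compat download into the current directory.
  const all = args.length === 1 ? [args[0], "."] : args;
  const dest = parseEndpoint(all[all.length - 1]);
  const sources = all.slice(0, -1).map(parseEndpoint);
  if (dest.box !== null) {
    // Upload: every source must be on the host side.
    if (sources.some((s) => s.box !== null)) throw new Error("cp: box-to-box copy is not supported");
    return { direction: "upload", box: dest.box, sources: sources.map((s) => s.path), dest: dest.path };
  }
  // Download: every source must be in a box, and in the same box.
  const boxes = new Set(sources.map((s) => s.box));
  if (boxes.has(null)) throw new Error("cp: sources must be on the side opposite the destination");
  if (boxes.size !== 1) throw new Error("cp: box sources must name one box");
  return { direction: "download", box: sources[0].box as string, sources: sources.map((s) => s.path), dest: dest.path };
}

console.log(planCp(["a.txt", "b.txt", "mybox:/workspace/"]));
```

Since the plan works on already-expanded arguments, wildcard handling stays with the invoking shell, as the commit message describes.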

feat(cp): copy multiple files/dirs in one call
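The backward-tolerant wire shape from this commit could look something like the sketch below. It is a guess at the idea, not the contents of cp-rpc.ts: the type names, the function name `normalizeCpParams`, and the mapping of the legacy `{boxPath, hostPath}` fields onto source and destination per direction are all assumptions.

```typescript
type CpDirection = "toHost" | "fromHost";

interface CpRequest {
  sources: string[];
  dest: string;
}

function normalizeCpParams(direction: CpDirection, raw: unknown): CpRequest {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("cp: params must be an object");
  }
  const params = raw as Record<string, unknown>;
  const sources = params["sources"];
  const dest = params["dest"];
  const boxPath = params["boxPath"];
  const hostPath = params["hostPath"];

  // New shape: { sources: string[], dest: string }
  if (
    Array.isArray(sources) &&
    sources.length > 0 &&
    sources.every((s) => typeof s === "string") &&
    typeof dest === "string"
  ) {
    return { sources: [...sources], dest };
  }

  // Legacy shape: { boxPath, hostPath }. ASSUMPTION: toHost copies the box
  // path to the host path, fromHost copies the host path to the box path.
  if (typeof boxPath === "string" && typeof hostPath === "string") {
    return direction === "toHost"
      ? { sources: [boxPath], dest: hostPath }
      : { sources: [hostPath], dest: boxPath };
  }

  throw new Error("cp: unrecognized params shape");
}
```

Normalizing at the boundary means everything downstream of the RPC handler only ever sees the multi-source shape, whichever client version sent the request.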

vercel · 2026-06-30T07:31:43Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agentbox-web | Skipped | | Jun 30, 2026 11:33am |

cursor

Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.

^{Reviewed by Cursor Bugbot for commit 1d72ceb. Configure here.}

cursor · 2026-06-30T07:32:54Z

+    const resume = await agentResumeArgs(provider, box, args.binary);
+    if (resume) extraArgs = resume;
+  }
+  const command = buildCloudAttachInnerCommand(args.binary, extraArgs);


Hetzner firewall skipped detached start

Medium Severity

`cloudAgentAttach` warms the Hetzner SSH tunnel with `withFirewallRepair` before `exec`/`buildAttach`, but `cloudAgentStartDetached` (now used for cloud `--no-attach`, background creates, and queue workers) does not. After a host egress IP change, detached agent startup can fail while an interactive attach on the same box succeeds.
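One way to read this finding: the detached path would need the same one-shot "try, repair once, retry" wrapper that the attach path already has. A minimal sketch of that pattern follows. It is not the project's `withFirewallRepair`, whose signature is not shown in this PR; `withOneShotRepair` and both callback parameters are hypothetical.

```typescript
// Hypothetical one-shot repair wrapper.
// connect: the operation that may fail behind a stale firewall rule.
// repair:  re-syncs the firewall (for example, to the current egress IP).
async function withOneShotRepair<T>(
  connect: () => Promise<T>,
  repair: () => Promise<void>,
): Promise<T> {
  try {
    return await connect();
  } catch {
    // Repair exactly once, then retry. A second failure propagates to the
    // caller, so a genuinely broken box still fails loudly.
    await repair();
    return await connect();
  }
}
```

Under this sketch, a detached start would be wrapped the same way as an interactive attach, so both paths recover from a changed egress IP instead of only one of them.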

cursor · 2026-06-30T07:32:54Z

+  }
+  const alias = agentboxAliasFor(box.name);
+  return { alias, host: target.host, user: target.user, identityFile: target.identityFile };
+}


SSH config lacks firewall heal

Medium Severity

New `resolveCloudSshTarget` (used by `agentbox shell --ssh-config` and shared alias helpers) brings the box online and calls `buildAttach` without the Hetzner `withFirewallRepair` pass added to `cloudAgentAttach`. A stale firewall after an egress IP change can make `--ssh-config` fail even though attach self-heals.

Additional Locations (1)

apps/cli/src/commands/shell.ts#L224-L226
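For context, the object returned in the diff above (`alias`, `host`, `user`, `identityFile`) maps directly onto an `~/.ssh/config` entry. A minimal sketch of that rendering is shown below; `renderSshConfigEntry`, the `SshTarget` type name, and the sample values in the usage note are hypothetical, and the real `ensureCloudSshAlias` may write more options than these.

```typescript
interface SshTarget {
  alias: string; // the box name, used as the Host alias
  host: string;
  user: string;
  identityFile: string;
}

// Render one Host block using standard ssh_config keywords.
function renderSshConfigEntry(target: SshTarget): string {
  return [
    `Host ${target.alias}`,
    `  HostName ${target.host}`,
    `  User ${target.user}`,
    `  IdentityFile ${target.identityFile}`,
    "",
  ].join("\n");
}
```

With made-up values such as alias `mybox` and host `203.0.113.10`, the entry would let `ssh mybox` reach the box, but only while the box's firewall still admits the host's current IP, which is the gap this review comment points at.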

Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup

madarco added 30 commits June 26, 2026 11:43

feat() small reference to full skill

e80f1b1

Merge branch 'nightly' of github.com:madarco/agentbox into nightly

6052abf

feat() Updated icon in codex

e6964d3

Merge pull request #123 from madarco/fix/hetzner-claude-install-retry

0991c33

fix(hetzner): retry Claude native installer with backoff + clear error

Merge pull request #124 from madarco/ssh-login-start-in-workspace

b17399a

feat(box): land interactive SSH/login shells in /workspace

Merge pull request #125 from madarco/fix/codex-static-trim

968aac8

fix(codex): drop heavy host-only artifacts from codex config staging

Merge pull request #126 from madarco/fix/cloud-relay-token-file

395a03c

fix(cloud): restore relay token for in-box agent via 0600 relay.env

Merge pull request #127 from madarco/feat/recover-command

3c611d4

feat(recover): add `agentbox recover` to reconnect a box without power-cycling

madarco added 10 commits June 29, 2026 22:28

Merge pull request #128 from madarco/fix/ctl-daemon-idempotent

59f6848

fix(ctl): make the in-box ctl-daemon launch idempotent (stop the pile-up)

Merge pull request #129 from madarco/fix/hetzner-firewall-egress-resync

6c13700

feat(hetzner): self-heal the per-box firewall on a host egress-IP change

Merge pull request #131 from madarco/feat/cp-multi-source

1d72ceb

feat(cp): copy multiple files/dirs in one call

cursor Bot reviewed Jun 30, 2026

release: v0.21.0

f444457

Claude-Session: https://claude.ai/code/session_01Ja5HgEjwyER5BhhFCpPUup

vercel Bot temporarily deployed to Preview June 30, 2026 11:33 Inactive

madarco merged commit f444457 into main Jun 30, 2026
2 of 3 checks passed

vercel Bot deployed to Production June 30, 2026 11:35 View deployment

Nightly #132
madarco merged 41 commits into main from nightly
