fix(tmux): real reboot-cycle — clean-machine rig init sets up working tmux persistence (tmux v2)#28

Open
alex-mextner wants to merge 3 commits into main from tmux-v2-reboot-fix
@alex-mextner

Owner

What

A clean-machine rig init now sets up FULLY-WORKING tmux persistence. Fixes all 6 reboot-cycle
defects a real reboot exposed (the #24/#26 provisioning was unit-only — never run through an actual
apply→save→reboot→restore on a live machine).

Spec: agent-tools ROADMAP "rig tmux v2 — REAL reboot cycle broke".

The 6 defects, fixed

  1. Boot launchd ran an EMPTY server. The agent ran tmux start-server (no conf, no plugins →
    continuum-restore never fired). Now it runs a generated boot script doing tmux new-session -d (loads ~/.tmux.conf → sourced rig.tmux.conf → continuum → restore), and rig apply
    runs launchctl load -w on the agent so it actually fires at login.
  2. cc-save detected cc by pane_current_command — but cc shows up as its VERSION (e.g.
    2.1.178); the real claude is a CHILD of the pane shell. cc-save now walks the pane's
    process tree (pane_pid descendants) and matches a claude process → the map is non-empty.
  3. default-command '' restored NON-login shells (no ~/.zprofile/PATH). Now sets a
    login-shell default-command (shell path baked at generation — NOT a tmux ${SHELL} ref,
    which tmux rejects and would abort the whole config). A non-empty override must be absolute.
  4. ~/.tmux/resurrect/ was absent → no snapshot ever written. apply creates it.
  5. Old continuum boot competed (osx_*_start_tmux.sh Login Items + a stale Tmux.Start
    launchd agent). apply runs continuum's osx_disable.sh + launchctl bootout/removes the old
    plist; keeps @continuum-boot off.
  6. Clean machine had no plugins / no first save. apply clones tpm + resurrect + continuum into
    ~/.tmux/plugins and takes a first resurrect save. PLUGINS is now
    {dir: (url, real_entrypoint)} — ONE source so activation / completeness / drift agree
    (resurrect ships resurrect.tmux, NOT tmux-resurrect.tmux — the old check re-cloned a real
    checkout every apply).
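
The process-tree walk from defect 2 can be sketched as below. This is a minimal illustration, not the code in this PR: the function names and the `(pid, ppid, comm)` table shape are hypothetical, and the table is assumed to come from something like `ps -A -o pid=,ppid=,comm=`.

```python
import os
from collections import defaultdict


def parse_ps(text):
    """Parse `pid ppid comm` rows (assumed ps output) into tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return rows


def descendants(root_pid, table):
    """Return every (pid, comm) below root_pid, however deeply nested."""
    children = defaultdict(list)
    for pid, ppid, comm in table:
        children[ppid].append((pid, comm))
    found, stack, seen = [], [root_pid], {root_pid}
    while stack:
        parent = stack.pop()
        for pid, comm in children.get(parent, []):
            if pid not in seen:  # guard against malformed, cyclic tables
                seen.add(pid)
                found.append((pid, comm))
                stack.append(pid)
    return found


def find_claude_pid(pane_pid, table, name="claude"):
    """Match on the basename of comm: macOS reports a full path, Linux a bare name."""
    for pid, comm in descendants(pane_pid, table):
        if os.path.basename(comm) == name:
            return pid
    return None
```

Matching on descendants rather than on `pane_current_command` means the pane shell itself never has to be named `claude`; any process below `pane_pid` with that basename is enough.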

Live activation is idempotent + non-fatal (offline degrades, never aborts) and gated behind
RIG_TMUX_DRY_RUN (unit suite + CI set it). drift/status honors the same flag so apply ↔ status
stay consistent. The first-save probe uses tmux list-sessions (a bare has-session without -t
resolves the current session and fails outside tmux, silently skipping the save).
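
A hedged sketch of the first-save probe described above: it asks the server for its session list instead of calling `has-session` without a target. The function name, the optional `-L` socket argument, and the injectable `run` parameter are assumptions made for illustration.

```python
import subprocess


def server_has_sessions(socket_name=None, run=subprocess.run):
    """Return True when a tmux server is reachable and lists at least one session."""
    cmd = ["tmux"]
    if socket_name:
        cmd += ["-L", socket_name]  # private socket, as the e2e uses
    cmd.append("list-sessions")
    try:
        result = run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return False  # tmux is not installed: degrade instead of aborting
    # list-sessions exits non-zero when no server is running.
    return result.returncode == 0 and bool(result.stdout.strip())
```

Because the probe does not depend on a "current session", it behaves the same whether it runs inside or outside tmux.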

Tests

  • Unit + a REAL e2e (tests/test_tmux_e2e.py): drives a real tmux server on a private -L
    socket — apply -> EXECUTE the boot script (no reboot) -> assert a server comes up WITH config + a
    session; a fake claude child under a pane -> NON-EMPTY cc-map; login-shell default-command; a
    real resurrect save writes a .txt snapshot; old-boot cleanup. The e2e is opt-in
    (RIG_TMUX_E2E=1, needs network for the real plugin clones) so the default pytest -q stays
    hermetic; a dedicated CI job (tmux-e2e) runs it.
  • smoke exercises the tmux catalog through the real CLI (asserts new-session -d + a non-comment
    set -g default-command) and drops the dead mcp.items.review slot (removed upstream; it
    made smoke fail with "unknown mcp item").

Local: bash tests/smoke.sh -> green (500 unit passed, smoke OK); RIG_TMUX_E2E=1 pytest tests/test_tmux_e2e.py -> 5 passed.

Needs a real reboot

The cold-boot proof (power-cycle -> tmux comes up restored by the launchd agent) needs an actual user
reboot — the e2e simulates everything up to that by executing the boot script directly.
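
For orientation, the boot path that the e2e executes directly can be pictured as a generated script plus a launchd agent definition. The sketch below is illustrative only: the label, the tmux path, and both function names are hypothetical, and the real generated files may differ.

```python
import plistlib


def render_boot_script(tmux_path="/opt/homebrew/bin/tmux"):
    """Render a boot script that starts a detached session (assumed tmux path).

    Creating a session, rather than a bare server, is what makes tmux load
    ~/.tmux.conf and, through it, the plugins that trigger the restore.
    """
    return "\n".join([
        "#!/bin/sh",
        f'exec "{tmux_path}" new-session -d',
        "",
    ])


def render_boot_plist(script_path, label="com.example.rig.tmux-boot"):
    """Render a launchd agent that runs the boot script at login."""
    agent = {
        "Label": label,  # hypothetical label
        "ProgramArguments": ["/bin/sh", script_path],
        "RunAtLoad": True,
    }
    return plistlib.dumps(agent)
```

Executing the rendered script by hand covers everything except launchd actually firing the agent at login, which is the part that still needs a real reboot.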

🤖 Generated with Claude Code

… tmux persistence (HYP tmux v2)

A reboot exposed 6 defects in the #24/#26 tmux provisioning (unit-only, never run
through a real apply->save->reboot->restore). Fix so a CLEAN-machine `rig init` does
everything with NO manual steps:

1. boot: the launchd agent ran `tmux start-server` (an EMPTY server — conf/plugins load
   only on the first session, so continuum-restore never fired). Now it runs a generated
   boot script doing `tmux new-session -d` (loads the conf -> continuum -> restore), and
   `rig apply` `launchctl load -w`s the agent so it actually fires at login.
2. cc-save: detected cc via `pane_current_command == claude`, but cc shows as its VERSION
   (e.g. 2.1.178); the real `claude` is a CHILD of the pane shell. cc-save now walks the
   pane's process TREE (pane_pid descendants) -> the map is non-empty -> cc resumes.
3. default-command: `''` restored NON-login shells (no ~/.zprofile/PATH). Set a login-shell
   default-command (shell path baked at generation, not a tmux ${SHELL} ref tmux would
   reject). A non-empty override must be an absolute path.
4. resurrect dir: `~/.tmux/resurrect` was absent -> no snapshot written. apply creates it.
5. old continuum boot: its osx_*_start_tmux.sh Login Items + a stale Tmux.Start launchd
   agent competed with rig's boot. apply runs continuum's osx_disable.sh + bootout/removes
   the old plist; keeps @continuum-boot off.
6. plugins + first save: apply clones tpm + resurrect + continuum into ~/.tmux/plugins and
   takes a first resurrect save. PLUGINS is now {dir: (url, real_entrypoint)} — ONE source
   so activation/completeness/drift agree (resurrect ships resurrect.tmux, NOT
   tmux-resurrect.tmux — the old check re-cloned a real checkout every apply).
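
The single-source `PLUGINS` idea from item 6 might look roughly like this. It is a sketch under assumptions: the directory keys, the helper names, and the tpm and continuum entrypoint names are not taken from this PR (only `resurrect.tmux` is stated above).

```python
from pathlib import Path

# {dir: (url, real_entrypoint)}: one table that cloning, completeness
# checks, and drift detection all read, so they cannot disagree.
PLUGINS = {
    "tpm": ("https://github.com/tmux-plugins/tpm", "tpm"),
    "tmux-resurrect": ("https://github.com/tmux-plugins/tmux-resurrect", "resurrect.tmux"),
    "tmux-continuum": ("https://github.com/tmux-plugins/tmux-continuum", "continuum.tmux"),
}


def missing_plugins(plugins_root):
    """A plugin counts as present only if its real entrypoint file exists."""
    root = Path(plugins_root)
    return [
        name
        for name, (_url, entrypoint) in PLUGINS.items()
        if not (root / name / entrypoint).is_file()
    ]


def clone_commands(plugins_root):
    """Build git clone argv lists for the missing plugins only."""
    root = Path(plugins_root)
    return [
        ["git", "clone", "--depth", "1", PLUGINS[name][0], str(root / name)]
        for name in missing_plugins(root)
    ]
```

Checking the real entrypoint is what prevents the re-clone loop: a check for a wrong filename such as `tmux-resurrect.tmux` would report a valid checkout as incomplete on every apply.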

Live activation is idempotent + non-fatal (offline degrades, never aborts) and gated behind
RIG_TMUX_DRY_RUN (unit suite + CI set it). drift/status honors the same flag so apply and
status stay consistent. The first-save probe uses `tmux list-sessions` (a bare `has-session`
without -t resolves the current session and fails outside tmux, silently skipping the save).

Tests: unit + a REAL e2e (tests/test_tmux_e2e.py) that drives a real tmux server on a private
socket — apply -> EXECUTE the boot script (no reboot) -> assert a server comes up WITH config +
a session; a fake `claude` child under a pane -> non-empty cc-map; login-shell default-command;
a real resurrect .txt snapshot; old-boot cleanup. The e2e is opt-in (RIG_TMUX_E2E=1, needs
network for the real plugin clones) so the default `pytest -q` stays hermetic; a dedicated CI
job runs it. smoke now exercises the tmux catalog through the real CLI and drops the dead
`mcp.items.review` slot (removed upstream — it made smoke fail `unknown mcp item`).

The cold-boot proof (power-cycle -> tmux comes up restored by the launchd agent) needs a real
user reboot; the e2e simulates everything up to that by executing the boot script directly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ddb671ceea

Comment thread riglib/actions/runner.py Outdated
Comment thread riglib/actions/runner.py
alex-mextner and others added 2 commits June 17, 2026 00:11
…ross-platform cc-save e2e)

The cc-save e2e (`test_cc_save_populates_map_from_a_real_claude_child`) needs a process whose
`comm` basename is `claude` on BOTH macOS and Linux. The two single-OS tricks each fail on the
other: `exec -a claude sleep` rewrites only argv[0], so Linux `comm` still reads `sleep` (the
descendant was invisible to the tree-walk → the job FAILED on ubuntu CI); a COPY of the `sleep`
binary won't run on macOS (SIP refuses to exec an unsigned copy of a system binary). A SYMLINK
named `claude` → the real `sleep` works on both: the kernel sets `comm` from the invoked name, so
`comm`'s basename is `claude` on Linux AND macOS. Verified: 5/5 e2e pass locally with RIG_TMUX_E2E=1.
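
A rough sketch of the symlink trick, assuming a POSIX system with `ps` available. The helper names are hypothetical and this is not the test fixture from the PR.

```python
import os
import shutil
import subprocess


def make_fake_claude(directory, target=None):
    """Create a symlink named `claude` that points at a real binary (default: sleep)."""
    target = target or shutil.which("sleep")
    if target is None:
        raise RuntimeError("no sleep binary found on PATH")
    link = os.path.join(directory, "claude")
    os.symlink(target, link)
    return link


def comm_basename(pid):
    """Read the process's comm via ps and strip any leading directory."""
    out = subprocess.run(
        ["ps", "-o", "comm=", "-p", str(pid)],
        capture_output=True, text=True, check=False,
    ).stdout
    return os.path.basename(out.strip())
```

A test would then start `subprocess.Popen([link, "30"])` under the pane shell and check `comm_basename(proc.pid)`, which the commit message reports as `claude` on both macOS and Linux. Note the approach assumes `sleep` is a standalone binary; a multi-call binary that dispatches on the invoked name would not accept being run as `claude`.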

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ex P1/P2 review threads

Addresses the two codex review threads on PR #28 plus related boot/drift hardening:

- Gate stale-continuum-boot cleanup on rig's REPLACEMENT boot being active (codex P1): a new
  `rig_boot_active` flag in `_tmux_activate` tracks whether rig's launchd boot agent is actually
  loaded-and-enabled after the load block (freshly loaded, or already-loaded-and-safe). Step 5 now
  only removes continuum's own autostart (Login Items / old Tmux.Start) when `rig_boot_active` is
  true — so a conflict-skip / offline / launchctl-failure apply never strips the last tmux autostart
  while our replacement isn't in place.

- The generated-config conflict is already in the boot-load safety expression (codex P2):
  `boot_load_safe = not (boot_plist_conflicted or boot_script_conflicted or conf_conflicted)`, so a
  skip-left stale `rig.tmux.conf` already suppresses the launchctl-load (now the cleanup too).

- Resolve the login shell ONCE at plan time (`plan._build_tmux`), not per render: a per-render
  `$SHELL` resolve made `rig apply` (with $SHELL) and `rig status` (launchd/cron, empty $SHELL)
  render different `default-command` lines → permanent flapping drift. The concrete path is baked
  into the action so render/drift are deterministic.

- Reload the boot agent when the plist content changed (unload-then-load-w), so a steady-state
  re-apply is a no-op but a changed plist is picked up by launchd.
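
The "resolve once, bake the path" point can be illustrated as follows. This is a sketch only: the function names, the `/bin/sh` fallback, the `-l` flag, and the exact rendered line are assumptions rather than the behaviour of `plan._build_tmux`.

```python
import os
import shlex


def resolve_login_shell(env=None, fallback="/bin/sh"):
    """Resolve the shell ONCE, at plan time; only an absolute path is accepted."""
    env = os.environ if env is None else env
    shell = env.get("SHELL", "")
    return shell if os.path.isabs(shell) else fallback


def render_default_command(shell_path):
    """Render from the baked path only; never consult the environment here."""
    if not os.path.isabs(shell_path):
        raise ValueError(f"default-command shell must be absolute: {shell_path!r}")
    return f"set -g default-command {shlex.quote(shell_path + ' -l')}"
```

Since `render_default_command` takes the already-resolved path, an apply run from an interactive shell and a status run from launchd or cron produce the same line, so the drift check has nothing to flap on.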

New test asserts the cleanup is suppressed on a conflict-skipped boot. All green: 507 unit + 5 e2e
(RIG_TMUX_E2E=1) + smoke OK.
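
The gating that the new test exercises can be summarised in a few lines. The `boot_load_safe` expression is taken from the commit message; the dataclass, its extra fields, and the helper names are hypothetical simplifications of `_tmux_activate`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootState:
    boot_plist_conflicted: bool = False
    boot_script_conflicted: bool = False
    conf_conflicted: bool = False
    launchctl_load_ok: bool = False  # freshly loaded during this apply
    already_loaded: bool = False     # loaded and enabled by an earlier apply


def boot_load_safe(state):
    """Any conflict-skipped generated file makes loading the agent unsafe."""
    return not (
        state.boot_plist_conflicted
        or state.boot_script_conflicted
        or state.conf_conflicted
    )


def rig_boot_active(state):
    """True only when rig's replacement boot is really in place."""
    return boot_load_safe(state) and (state.launchctl_load_ok or state.already_loaded)


def should_remove_old_continuum_boot(state):
    """Never strip the old autostart unless the replacement is active."""
    return rig_boot_active(state)
```

With this shape, a conflict-skipped, offline, or failed-launchctl apply leaves continuum's own autostart untouched, so the machine always keeps some tmux autostart.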

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>