fix(tmux): real reboot-cycle — clean-machine rig init sets up working tmux persistence (tmux v2) #28
Open
alex-mextner wants to merge 3 commits into
… tmux persistence (HYP tmux v2)

A reboot exposed 6 defects in the #24/#26 tmux provisioning (unit-only, never run through a real apply->save->reboot->restore). Fix so a CLEAN-machine `rig init` does everything with NO manual steps:

1. boot: the launchd agent ran `tmux start-server` (an EMPTY server: conf/plugins load only on the first session, so continuum-restore never fired). Now it runs a generated boot script doing `tmux new-session -d` (loads the conf -> continuum -> restore), and `rig apply` loads the agent with `launchctl load -w` so it actually fires at login.
2. cc-save: detected cc via `pane_current_command == claude`, but cc shows as its VERSION (e.g. 2.1.178); the real `claude` is a CHILD of the pane shell. cc-save now walks the pane's process TREE (`pane_pid` descendants) -> the map is non-empty -> cc resumes.
3. default-command: `''` restored NON-login shells (no ~/.zprofile/PATH). Set a login-shell default-command (shell path baked at generation, not a tmux `${SHELL}` ref, which tmux would reject). A non-empty override must be an absolute path.
4. resurrect dir: `~/.tmux/resurrect` was absent -> no snapshot written. apply creates it.
5. old continuum boot: its `osx_*_start_tmux.sh` Login Items + a stale `Tmux.Start` launchd agent competed with rig's boot. apply runs continuum's `osx_disable.sh` + bootout/removes the old plist; keeps `@continuum-boot` off.
6. plugins + first save: apply clones tpm + resurrect + continuum into `~/.tmux/plugins` and takes a first resurrect save. `PLUGINS` is now `{dir: (url, real_entrypoint)}`: ONE source so activation/completeness/drift agree (resurrect ships `resurrect.tmux`, NOT `tmux-resurrect.tmux`; the old check re-cloned a real checkout every apply).

Live activation is idempotent + non-fatal (offline degrades, never aborts) and gated behind `RIG_TMUX_DRY_RUN` (the unit suite and CI set it). drift/status honor the same flag so apply and status stay consistent. The first-save probe uses `tmux list-sessions` (a bare `has-session` without `-t` resolves the current session and fails outside tmux, silently skipping the save).

Tests: unit + a REAL e2e (`tests/test_tmux_e2e.py`) that drives a real tmux server on a private socket, covering apply -> EXECUTE the boot script (no reboot) -> assert a server comes up WITH config + a session; a fake `claude` child under a pane -> non-empty cc-map; login-shell default-command; a real resurrect .txt snapshot; old-boot cleanup. The e2e is opt-in (`RIG_TMUX_E2E=1`, needs network for the real plugin clones) so the default `pytest -q` stays hermetic; a dedicated CI job runs it.

smoke now exercises the tmux catalog through the real CLI and drops the dead `mcp.items.review` slot (removed upstream; it made smoke fail with `unknown mcp item`).

The cold-boot proof (power-cycle -> tmux comes up restored by the launchd agent) needs a real user reboot; the e2e simulates everything up to that by executing the boot script directly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
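A minimal Python sketch of defect 3 as described: resolve the shell once and bake its literal path into the rendered line, rejecting a relative override. The helper names (`resolve_login_shell`, `render_default_command`), the `$SHELL` then passwd then `/bin/sh` fallback order, the `override` parameter (a hypothetical stand-in for the PR's non-empty override), and the exact `exec <shell> -l` form of the line are all assumptions, not rig's actual code.

```python
import os
import pwd
import shlex


def resolve_login_shell(environ=None):
    """Pick the login shell ONCE (plan time); callers pass the result to render.

    Assumed fallback order: an absolute $SHELL, then the passwd entry, then /bin/sh.
    """
    environ = os.environ if environ is None else environ
    shell = environ.get("SHELL", "")
    if os.path.isabs(shell):
        return shell
    try:
        pw_shell = pwd.getpwuid(os.getuid()).pw_shell
    except KeyError:  # uid with no passwd entry (e.g. some containers)
        pw_shell = ""
    return pw_shell if os.path.isabs(pw_shell) else "/bin/sh"


def render_default_command(shell, override=""):
    """Render a tmux line that starts a LOGIN shell from a literal, baked-in path.

    No ${SHELL} reference reaches tmux; a non-empty override must be absolute.
    """
    path = override or shell
    if not os.path.isabs(path):
        raise ValueError(f"default-command shell must be an absolute path: {path!r}")
    # tmux runs a non-empty default-command via `<default-shell> -c`; `exec ... -l`
    # replaces that wrapper with a login shell, so ~/.zprofile (and PATH) is read.
    return f'set -g default-command "exec {shlex.quote(path)} -l"'
```

Because the path is resolved once and passed in, `rig apply` (with `$SHELL`) and `rig status` (launchd/cron, empty `$SHELL`) would render byte-identical lines, which is what keeps drift from flapping.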
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ddb671ceea
…ross-platform cc-save e2e)

The cc-save e2e (`test_cc_save_populates_map_from_a_real_claude_child`) needs a process whose `comm` basename is `claude` on BOTH macOS and Linux. The two single-OS tricks each fail on the other:
- `exec -a claude sleep` rewrites only argv[0], so Linux `comm` still reads `sleep` (the descendant was invisible to the tree-walk → the job FAILED on ubuntu CI);
- a COPY of the `sleep` binary won't run on macOS (SIP refuses to exec an unsigned copy of a system binary).

A SYMLINK named `claude` → the real `sleep` works on both: the kernel sets `comm` from the invoked name, so `comm`'s basename is `claude` on Linux AND macOS.

Verified: 5/5 e2e pass locally with RIG_TMUX_E2E=1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
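The symlink trick can be sketched in Python as follows: exec the symlink path itself, then read `comm` back (from `/proc/<pid>/comm` on Linux, falling back to `ps -o comm=` elsewhere). The function names are hypothetical, and the sketch assumes `sleep` on `PATH` is a standalone binary; a multi-call binary such as busybox dispatches on the invoked name and would refuse to run as `claude`.

```python
import os
import shutil
import subprocess


def spawn_fake_claude(workdir, seconds="30"):
    """Start a long-lived process whose comm basename is `claude`.

    Exec goes through a symlink named `claude` that points at the real `sleep`,
    so the kernel derives comm from the invoked name.
    """
    real_sleep = shutil.which("sleep")
    if real_sleep is None:
        raise RuntimeError("`sleep` not found on PATH")
    link = os.path.join(workdir, "claude")
    os.symlink(real_sleep, link)
    # Exec the symlink path itself; resolving it first would make comm `sleep`.
    return subprocess.Popen([link, seconds])


def read_comm(pid):
    """comm of `pid`: /proc on Linux, `ps -o comm=` elsewhere (basename taken)."""
    proc_file = f"/proc/{pid}/comm"
    if os.path.exists(proc_file):
        with open(proc_file) as fh:
            return fh.read().strip()
    out = subprocess.run(["ps", "-o", "comm=", "-p", str(pid)],
                         capture_output=True, text=True, check=True)
    return os.path.basename(out.stdout.strip())
```

In a test, kill the process in a `finally` so a failed assertion does not leak a 30-second sleeper.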
…ex P1/P2 review threads

Addresses the two codex review threads on PR #28 plus related boot/drift hardening:

- Gate stale-continuum-boot cleanup on rig's REPLACEMENT boot being active (codex P1): a new `rig_boot_active` flag in `_tmux_activate` tracks whether rig's launchd boot agent is actually loaded-and-enabled after the load block (freshly loaded, or already-loaded-and-safe). Step 5 now removes continuum's own autostart (Login Items / old Tmux.Start) only when `rig_boot_active` is true, so a conflict-skip / offline / launchctl-failure apply never strips the last tmux autostart while our replacement isn't in place.
- The generated-config conflict is already in the boot-load safety expression (codex P2): `boot_load_safe = not (boot_plist_conflicted or boot_script_conflicted or conf_conflicted)`, so a skip-left stale `rig.tmux.conf` already suppresses the launchctl load (and now the cleanup too).
- Resolve the login shell ONCE at plan time (`plan._build_tmux`), not per render: a per-render `$SHELL` resolve made `rig apply` (with $SHELL) and `rig status` (launchd/cron, empty $SHELL) render different `default-command` lines → permanent flapping drift. The concrete path is baked into the action so render/drift are deterministic.
- Reload the boot agent when the plist content changed (unload, then `load -w`), so a steady-state re-apply is a no-op but a changed plist is picked up by launchd.

New test asserts the cleanup is suppressed on a conflict-skipped boot.

All green: 507 unit + 5 e2e (RIG_TMUX_E2E=1) + smoke OK.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
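The gating and reload rules above fit in one pure decision function. This is a sketch only: `boot_load_safe`, `rig_boot_active` and the three `*_conflicted` inputs come from the commit message, while `BootState`, `plan_boot`, `agent_loaded`, `plist_changed` and `launchctl_ok` are hypothetical names for what apply observes; the real `_tmux_activate` may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class BootState:
    """What apply observed. The *_conflicted names follow the commit message;
    agent_loaded / plist_changed / launchctl_ok are hypothetical inputs."""
    boot_plist_conflicted: bool = False
    boot_script_conflicted: bool = False
    conf_conflicted: bool = False
    agent_loaded: bool = False   # rig's agent already loaded + enabled
    plist_changed: bool = False  # rendered plist differs from the one launchd has
    launchctl_ok: bool = True    # every launchctl call below succeeded


def plan_boot(state):
    """Return (launchctl steps, rig_boot_active). Cleanup of continuum's own
    autostart (Login Items, old Tmux.Start agent) runs only if rig_boot_active."""
    boot_load_safe = not (state.boot_plist_conflicted
                          or state.boot_script_conflicted
                          or state.conf_conflicted)
    steps = []
    if boot_load_safe:
        if not state.agent_loaded:
            steps = ["load -w"]
        elif state.plist_changed:
            # launchd keeps the old job definition until the job is reloaded.
            steps = ["unload", "load -w"]
        # else: steady state -> no-op re-apply.
    ran_ok = state.launchctl_ok or not steps
    rig_boot_active = boot_load_safe and ran_ok and (state.agent_loaded or bool(steps))
    return steps, rig_boot_active
```

Keeping the decision pure makes the new test's scenario (cleanup suppressed on a conflict-skipped boot) a plain assertion with no launchctl mocking.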
What
A clean-machine `rig init` now sets up FULLY-WORKING tmux persistence. Fixes all 6 reboot-cycle defects a real reboot exposed (the #24/#26 provisioning was unit-only, never run through an actual apply→save→reboot→restore on a live machine).
Spec: agent-tools ROADMAP "rig tmux v2 — REAL reboot cycle broke".
The 6 defects, fixed
1. **boot**: the launchd agent ran `tmux start-server` (no conf, no plugins → `continuum-restore` never fired). Now it runs a generated boot script doing `tmux new-session -d` (loads `~/.tmux.conf` → sourced `rig.tmux.conf` → continuum → restore), and `rig apply` loads the agent with **`launchctl load -w`** so it actually fires at login.
2. **cc-save**: detected cc via `pane_current_command`, but cc shows up as its VERSION (e.g. `2.1.178`); the real `claude` is a CHILD of the pane shell. cc-save now walks the pane's process tree (`pane_pid` descendants) and matches a `claude` process → the map is non-empty.
3. **default-command**: `default-command ''` restored NON-login shells (no `~/.zprofile`/PATH). Now sets a login-shell `default-command` (shell path baked at generation, NOT a tmux `${SHELL}` ref, which tmux rejects, aborting the whole config). A non-empty override must be absolute.
4. **resurrect dir**: `~/.tmux/resurrect/` was absent → no snapshot ever written. apply creates it.
5. **old continuum boot**: continuum's own autostart (its `osx_*_start_tmux.sh` Login Items + a stale `Tmux.Start` launchd agent) competed with rig's boot. apply runs continuum's `osx_disable.sh` + `launchctl bootout`/removes the old plist; keeps `@continuum-boot` off.
6. **plugins + first save**: apply clones tpm + resurrect + continuum into `~/.tmux/plugins` and takes a first `resurrect save`. `PLUGINS` is now `{dir: (url, real_entrypoint)}`, ONE source so activation / completeness / drift agree (resurrect ships `resurrect.tmux`, NOT `tmux-resurrect.tmux`; the old check re-cloned a real checkout every apply).
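A sketch of the process-tree walk from defect 2, in Python: parse `ps -A -o pid=,ppid=,comm=` once, index children by parent, and breadth-first search from the pane's `pane_pid` for a descendant whose `comm` basename is `claude`. The function names are hypothetical; `pane_pid` itself would come from something like `tmux list-panes -a -F '#{pane_id} #{pane_pid}'`.

```python
import os
import subprocess
from collections import defaultdict, deque


def parse_ps(text):
    """Parse `ps -A -o pid=,ppid=,comm=` output into {pid: (ppid, comm)}."""
    procs = {}
    for line in text.splitlines():
        parts = line.split(None, 2)  # comm may contain spaces: keep the rest whole
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            procs[int(parts[0])] = (int(parts[1]), parts[2].strip())
    return procs


def find_descendant(pane_pid, procs, name="claude"):
    """BFS over pane_pid's descendants; first pid whose comm basename == name."""
    children = defaultdict(list)
    for pid, (ppid, _comm) in procs.items():
        children[ppid].append(pid)
    queue = deque(children[pane_pid])
    seen = set()
    while queue:
        pid = queue.popleft()
        if pid in seen:
            continue
        seen.add(pid)
        if os.path.basename(procs[pid][1]) == name:
            return pid
        queue.extend(children[pid])
    return None


def snapshot():
    """One `ps` call for all panes (the same flags work on macOS and Linux)."""
    out = subprocess.run(["ps", "-A", "-o", "pid=,ppid=,comm="],
                         capture_output=True, text=True, check=True)
    return parse_ps(out.stdout)
```

Matching on the basename also covers macOS, where `ps` may report `comm` as a full executable path.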
Live activation is idempotent + non-fatal (offline degrades, never aborts) and gated behind `RIG_TMUX_DRY_RUN` (the unit suite and CI set it). `drift`/`status` honor the same flag so apply ↔ status stay consistent. The first-save probe uses `tmux list-sessions` (a bare `has-session` without `-t` resolves the current session and fails outside tmux, silently skipping the save).
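The activation rules (one `PLUGINS` table, entrypoint-based completeness, a dry-run gate, non-fatal clones, a `list-sessions` probe) can be sketched as follows. Only the `{dir: (url, real_entrypoint)}` shape and `resurrect.tmux` come from the PR; the other directory names, URLs and entrypoints, the `scripts/save.sh` path, and treating any `RIG_TMUX_DRY_RUN` value other than empty or `0` as "on" are assumptions.

```python
import os
import subprocess

# One table drives activation, completeness and drift: {dir: (url, real_entrypoint)}.
PLUGINS = {
    "tpm": ("https://github.com/tmux-plugins/tpm", "tpm"),
    "tmux-resurrect": ("https://github.com/tmux-plugins/tmux-resurrect", "resurrect.tmux"),
    "tmux-continuum": ("https://github.com/tmux-plugins/tmux-continuum", "continuum.tmux"),
}


def dry_run_enabled(environ):
    # Assumption: any value except empty or "0" turns live activation off.
    return environ.get("RIG_TMUX_DRY_RUN", "") not in ("", "0")


def missing_plugins(plugins_root):
    """A checkout is complete iff its REAL entrypoint exists (no spurious re-clones)."""
    return [d for d, (_url, entry) in PLUGINS.items()
            if not os.path.isfile(os.path.join(plugins_root, d, entry))]


def sessions_exist(tmux=("tmux",)):
    """`list-sessions` works outside tmux; a bare `has-session` (no -t) does not."""
    try:
        result = subprocess.run([*tmux, "list-sessions"], capture_output=True)
    except FileNotFoundError:  # no tmux binary: treat as "no sessions"
        return False
    return result.returncode == 0


def activate(plugins_root, environ=None, tmux=("tmux",)):
    """Idempotent, non-fatal activation; returns what it did (or would do)."""
    environ = os.environ if environ is None else environ
    dry_run = dry_run_enabled(environ)
    log = []
    for d in missing_plugins(plugins_root):
        if dry_run:
            log.append(f"would clone {d}")
            continue
        url = PLUGINS[d][0]
        try:
            r = subprocess.run(["git", "clone", "--depth", "1", url,
                                os.path.join(plugins_root, d)], capture_output=True)
            ok = r.returncode == 0
        except OSError:  # no git at all
            ok = False
        # Offline (or a partial dir blocking the clone) degrades to a note, never aborts.
        log.append(f"cloned {d}" if ok else f"skipped {d}: clone failed")
    if not dry_run and sessions_exist(tmux):
        # Assumption: tmux-resurrect's scripts/save.sh saves the running server.
        save = os.path.join(plugins_root, "tmux-resurrect", "scripts", "save.sh")
        try:
            ok = subprocess.run([save], capture_output=True).returncode == 0
        except OSError:  # missing or non-executable script
            ok = False
        log.append("first save taken" if ok else "first save skipped")
    return log
```

Passing `tmux` as an argv prefix lets the same probe target a private socket, e.g. `sessions_exist(("tmux", "-L", "rig-e2e"))`.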
Tests
- Real e2e (`tests/test_tmux_e2e.py`) drives a real tmux server on a private `-L` socket: apply -> EXECUTE the boot script (no reboot) -> assert a server comes up WITH config + a session; a fake `claude` child under a pane -> NON-EMPTY cc-map; login-shell `default-command`; a real `resurrect save` writes a `.txt` snapshot; old-boot cleanup. The e2e is opt-in (`RIG_TMUX_E2E=1`, needs network for the real plugin clones) so the default `pytest -q` stays hermetic; a dedicated CI job (`tmux-e2e`) runs it.
- smoke now exercises the tmux catalog through the real CLI (… `new-session -d` + a non-comment `set -g default-command`) and drops the dead `mcp.items.review` slot (removed upstream; it made smoke fail with `unknown mcp item`).

Local: `bash tests/smoke.sh` -> green (500 unit passed, smoke OK); `RIG_TMUX_E2E=1 pytest tests/test_tmux_e2e.py` -> 5 passed.

Needs a real reboot
The cold-boot proof (power-cycle -> tmux comes up restored by the launchd agent) needs an actual user
reboot — the e2e simulates everything up to that by executing the boot script directly.
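A sketch of the two generated artifacts from defect 1: a boot script that starts a real detached session (so the conf, and with it continuum, actually loads) and a launchd agent that runs it at login. Taking the tmux command as an argv list lets an e2e point the same script at a private `-L` socket and execute it directly, as described above. The session name `main`, the early-exit `list-sessions` guard, the label and all paths are hypothetical; the plist is built with the standard-library `plistlib`.

```python
import plistlib
import shlex


def render_boot_script(tmux_argv, session="main"):
    """Boot script: create a REAL detached session so tmux sources its config.

    An empty server loads nothing, so continuum's restore would never run.
    """
    tmux = shlex.join(tmux_argv)  # e.g. ["tmux", "-L", "rig-e2e"] for a private socket
    return (
        "#!/bin/sh\n"
        "# Hypothetical guard: leave an already-running server alone.\n"
        f"{tmux} list-sessions >/dev/null 2>&1 && exit 0\n"
        f"exec {tmux} new-session -d -s {shlex.quote(session)}\n"
    )


def render_launch_agent(label, script_path):
    """launchd agent (plist bytes) that runs the boot script once at login."""
    return plistlib.dumps({
        "Label": label,
        "ProgramArguments": ["/bin/sh", script_path],
        "RunAtLoad": True,
    })
```

Once the plist is written under `~/Library/LaunchAgents/`, apply would still need `launchctl load -w` for it to fire at the next login; writing the file alone does nothing.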
🤖 Generated with Claude Code