
fix(tmux): cc-save detects the versioned-binary claude install (resume after reboot) #33

Merged: alex-mextner merged 2 commits into main from w2/rig-tmux-provisioning on Jun 17, 2026

@alex-mextner

Owner

What

The 2026-06-17 tmux incident's headline defect: after a reboot, Claude Code panes were not resumed because the cc-session map was empty. Root cause: Claude Code installs as a symlink

~/.local/bin/claude -> ~/.local/share/claude/versions/<version>

so the real executable file is named after its version (2.1.179). When Claude Code (cc) is launched by the resolved path rather than the claude symlink, the process name is the version string, not claude. cc-save.sh's process-tree walk matched only claude or */claude, so it missed that process, exactly as the original pane_current_command == claude filter did. As a result cc-sessions.map stayed empty and cc never resumed.
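To make the naming mismatch concrete, here is a minimal sketch that rebuilds the symlink layout in a temporary directory and compares the basename seen through the symlink with the basename of the resolved target. The directory names mirror the install described above, but the temporary root and the helper name `resolved_basename` are assumptions for illustration only.

```python
import os
import tempfile


def resolved_basename(link_path: str) -> str:
    """Basename of the file a symlink ultimately points to."""
    return os.path.basename(os.path.realpath(link_path))


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/bin/claude -> <root>/share/claude/versions/2.1.179
    versions = os.path.join(root, "share", "claude", "versions")
    bindir = os.path.join(root, "bin")
    os.makedirs(versions)
    os.makedirs(bindir)

    target = os.path.join(versions, "2.1.179")
    open(target, "w").close()  # empty stand-in for the real binary

    link = os.path.join(bindir, "claude")
    os.symlink(target, link)

    via_symlink = os.path.basename(link)   # name when launched through the symlink
    via_resolved = resolved_basename(link)  # name when launched by the resolved path

print(via_symlink, via_resolved)  # prints: claude 2.1.179
```

A matcher that only looks for a basename of `claude` sees the first name and never the second, which is the gap this PR closes.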

This finishes the cc-save detection that tmux v2 (#28) started: detect cc by the pane PID-tree, now robust to the versioned-binary install.

Scope note: the 5 INCIDENT fixes — (1) PID-tree detection, (2) login-shell panes, (3) boot new-session -d, (4) legacy Tmux.Start.plist cleanup, (5) single attach-or-create — were largely landed by #28. This PR closes the one remaining real gap in (1): the versioned-binary case the INCIDENT explicitly calls out (2.1.179).

Fix

pane_has_claude now reads ps -eo args (the full command line) instead of ps -eo comm:

  • Portability: macOS comm is the full executable PATH, but Linux comm is the 15-char-truncated basename with no path — so the …/claude/versions/ segment is invisible to a comm match on Linux. args carries the full path on both platforms.
  • Takes argv[0] (executable = the line up to the first space) and matches it against claude / */claude (symlink launch) OR a path under …/claude/versions/ (direct-path launch of the versioned binary).
  • Keys on argv[0] ONLY, not the whole args line, so a claude or claude/versions/ token appearing in an argument (grep -r x …/claude/versions/, cp /opt/claude /tmp, vim claude.md) never false-matches and never writes a bogus map entry. (A pre-commit review with two models caught a whole-line-match false-positive bug in an earlier revision; this is the corrected approach.)
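The matching rule in the bullets above can be sketched in Python as follows. This is not the generated cc-save.sh code, which is shell and is not shown in this PR description; the function name `argv0_is_claude` and the exact string tests are assumptions that follow the stated rule (argv[0] is the text up to the first space, matched against claude, */claude, or a path under …/claude/versions/).

```python
def argv0_is_claude(args_line: str) -> bool:
    """Return True if argv[0] of a `ps -eo args` line looks like Claude Code.

    Assumption: argv[0] is everything up to the first space, as described
    in the PR. Arguments after that space are never inspected.
    """
    argv0 = args_line.lstrip().split(" ", 1)[0]

    # Symlink launch: bare `claude` or any path ending in /claude.
    if argv0 == "claude" or argv0.endswith("/claude"):
        return True

    # Direct-path launch: a file somewhere under .../claude/versions/.
    _, sep, rest = argv0.rpartition("/claude/versions/")
    return bool(sep and rest)


examples = [
    "claude",
    "/home/u/.local/bin/claude --resume abc",
    "/home/u/.local/share/claude/versions/2.1.179 --resume abc",
    "grep -r x /home/u/.local/share/claude/versions/",
    "cp /opt/claude /tmp",
    "vim claude.md",
    "/opt/notclaude/versions/1.0",
]
for line in examples:
    print(f"{argv0_is_claude(line)!s:5}  {line}")
```

Only the first three example lines match. The remaining four carry a claude token in an argument or in a differently named directory, so keying on argv[0] rejects them.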

Accepted, pinned limitations

  • An install path containing a space is cut at the first space, so the truncated argv[0] does not match (ps args output cannot distinguish a space inside argv[0] from an argument separator, and the whole-line match that would cover this case reintroduces the argument false-positives).
  • A wrapper launch that rewrites argv[0] (npx claude, node …/cli.js) puts the real claude in argv[1+], which the matcher does not inspect.

Both are absent from the canonical direct-exec install this targets, and both are pinned by tests that assert the non-match so the trade-off can't silently change.

Acceptance evidence

Hermetic unit (tests/test_tmux.py) — BFS run against synthetic ps snapshots:

  • versioned binary detected: bare, with args, deep in the tree
  • argument false-positive guards (grep/ls/tar/cp/find over claude paths → no match)
  • …/notclaude/versions/… → no match
  • spaced-install / wrapper-launch limitations pinned as no-match
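The hermetic cases listed above can be pictured as a breadth-first walk over a synthetic process table. The sketch below is an assumption about the shape of such a test, not the contents of tests/test_tmux.py: the `(pid, ppid, args)` row layout, the names `pane_has_match` and `argv0_matches`, and the sample PIDs are all hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[int, int, str]  # (pid, ppid, args) as a synthetic `ps` snapshot row


def argv0_matches(args: str) -> bool:
    """Hypothetical predicate: match on argv[0] only (text before the first space)."""
    argv0 = args.split(" ", 1)[0]
    return (
        argv0 == "claude"
        or argv0.endswith("/claude")
        or "/claude/versions/" in argv0
    )


def pane_has_match(
    pane_pid: int, snapshot: Iterable[Row], is_match: Callable[[str], bool]
) -> bool:
    """Breadth-first search of the pane's process tree for a matching process."""
    children: Dict[int, List[int]] = {}
    args_of: Dict[int, str] = {}
    for pid, ppid, args in snapshot:
        children.setdefault(ppid, []).append(pid)
        args_of[pid] = args

    queue = deque([pane_pid])
    seen = set()
    while queue:
        pid = queue.popleft()
        if pid in seen:
            continue  # guard against malformed snapshots with cycles
        seen.add(pid)
        if is_match(args_of.get(pid, "")):
            return True
        queue.extend(children.get(pid, []))
    return False


snapshot = [
    (100, 1, "-zsh"),
    (110, 100, "bash ./launch.sh"),
    (120, 110, "/home/u/.local/share/claude/versions/2.1.179 --resume abc"),
    (200, 1, "-zsh"),
    (210, 200, "grep -r x /home/u/.local/share/claude/versions/"),
    (300, 1, "-zsh"),
    (310, 300, "npx claude"),  # wrapper launch: pinned as a non-match
]

print(pane_has_match(100, snapshot, argv0_matches))  # versioned binary deep in the tree
print(pane_has_match(200, snapshot, argv0_matches))  # argument false-positive guard
print(pane_has_match(300, snapshot, argv0_matches))  # wrapper-launch limitation
```

Passing the predicate in as a parameter keeps the tree walk and the matching rule separately testable, which is one way to pin a limitation as an explicit non-match.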

Real-tmux e2e (tests/test_tmux_e2e.py, RIG_TMUX_E2E=1), test_cc_save_detects_the_versioned_binary_install: runs a …/claude/versions/<v> symlink by its resolved path as a real pane-shell descendant, runs the unmodified generated cc-save.sh, and asserts the map is non-empty with the right cwd+session-id. Verified that it FAILS on the old basename-only match (map EMPTY, the exact incident) and passes with the fix.

673 passed, 7 skipped        # full hermetic suite (pytest -q), zero warnings
7 passed                     # real-tmux e2e (RIG_TMUX_E2E=1)
smoke OK                     # bash tests/smoke.sh
All checks passed!           # ruff on changed files

CI runs the same: hermetic pytest -q, smoke.sh, and the opt-in tmux-e2e job (RIG_TMUX_E2E=1) on Linux.

Not done (out of scope, ship=HOLD)

Nothing applied to the live machine / running tmux / launchd — this PR only changes the rig TEMPLATE that generates cc-save.sh. The live machine picks it up on the next rig apply.

🤖 Generated with Claude Code

alex-mextner and others added 2 commits June 17, 2026 13:45
…e after reboot)

THE 2026-06-17 incident: Claude Code installs as a symlink
~/.local/bin/claude -> ~/.local/share/claude/versions/<version>. Launched by
the resolved path (not the symlink), the process's name is the VERSION string
(2.1.179), NOT `claude`. cc-save's tree walk matched only `claude`/`*/claude`,
so it missed that process exactly as the old `pane_current_command == claude`
filter did -> the cc-session map stayed empty -> cc never resumed after a reboot.

Fix: the tree walk now reads `ps -eo args` (the full command line — argv[0] is
the full PATH on both macOS and Linux, unlike `comm`, whose Linux value is the
15-char-truncated basename with no path), takes argv[0], and matches it against
`claude`/`*/claude` (symlink launch) OR a path under `.../claude/versions/`
(direct-path launch of the versioned binary). Keying on argv[0] ONLY (not the
whole args line) keeps a `claude`/`claude/versions/` token appearing in an
ARGUMENT (`grep .../claude/versions/`, `cp /opt/claude /tmp`, `vim claude.md`)
from a bogus match — a false positive a pre-commit review caught.

Accepted, pinned limitations: a space in the install path, or a wrapper launch
that rewrites argv[0] (`npx claude`, `node .../cli.js`), is not detected — both
are absent from the canonical direct-exec install this targets, and covering
them reintroduces the argument false positives.

Tests: hermetic BFS tests for the versioned binary (bare + with args, deep in
tree), the argument false-positive guards, and the spaced-install / wrapper
limitations; a REAL-tmux e2e that runs the versioned binary by its resolved path
and asserts cc-save writes a non-empty map (fails on the old basename-only match,
proving the guard). 673 unit + 7 real-tmux e2e green, smoke OK, ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lake)

The versioned-binary e2e waited on a flat host `ps -eo args=` substring probe for
the version string before running cc-save. That probe diverged from how the
shimmed-socket tree walk sees the pane and timed out on the Linux CI runner
(`the versioned claude descendant never appeared in the process table`), even
though the production matcher is correct.

Replace the separate probe with a poll on the REAL generated cc-save's own
output: run cc-save in a loop (15s) and break as soon as OUR versioned pane
(this cwd + sid) appears in the map. This validates the exact production matcher
on the exact platform and absorbs the launch->visibility race, instead of
asserting on a proxy `ps` view that can disagree with the pane tree.

Break ONLY on the target line (this cwd + sid), not on any non-empty map — a
review caught that cc-save scans ALL panes, so an unrelated claude process on the
host could populate the map before our descendant is visible and flake the final
assertion. Seed the projects session file before starting the session so the id
is present on the first scan.
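A minimal sketch of the polling pattern described above, assuming the map is plain text with one pane per line and that cwd and session id appear as whitespace-separated fields. The real map format is not shown in this PR, so that layout, the function name `wait_for_target_line`, and the injected `run_cc_save` / `read_map` callables are all hypothetical.

```python
import time
from typing import Callable


def wait_for_target_line(
    run_cc_save: Callable[[], None],
    read_map: Callable[[], str],
    cwd: str,
    sid: str,
    timeout: float = 15.0,
    interval: float = 0.2,
) -> bool:
    """Re-run cc-save until the line for (cwd, sid) appears, or time out.

    Breaks only on the target line, never on "map is non-empty", so an
    unrelated pane that shows up first cannot end the wait early.
    """
    deadline = time.monotonic() + timeout
    while True:
        run_cc_save()
        for line in read_map().splitlines():
            fields = line.split()
            if cwd in fields and sid in fields:
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Fake cc-save: the first two scans see only an unrelated pane.
scans = iter([
    "/other/project other-sid\n",
    "/other/project other-sid\n",
    "/other/project other-sid\n/work/demo sid-123\n",
])
state = {"map": "", "runs": 0}


def fake_run() -> None:
    state["map"] = next(scans, state["map"])
    state["runs"] += 1


found = wait_for_target_line(
    fake_run, lambda: state["map"], "/work/demo", "sid-123",
    timeout=5.0, interval=0.0,
)
print(found, state["runs"])  # prints: True 3
```

Had the loop stopped at the first non-empty map, it would have returned after one scan with only the unrelated pane present, which is the flake the review identified.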

7 real-tmux e2e green locally; full hermetic suite + smoke unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@alex-mextner alex-mextner force-pushed the w2/rig-tmux-provisioning branch from b6bb7ef to 10050a8 Compare June 17, 2026 11:45
@alex-mextner alex-mextner merged commit 1daedfe into main Jun 17, 2026
12 checks passed
@alex-mextner alex-mextner deleted the w2/rig-tmux-provisioning branch June 17, 2026 12:21