diff --git a/docs/config-schema.md b/docs/config-schema.md index c707715..c5fea80 100644 --- a/docs/config-schema.md +++ b/docs/config-schema.md @@ -570,8 +570,17 @@ scripts and wires them via `@resurrect-hook-post-save-all` / `@resurrect-hook-po `window/pane → cwd → session_id` map. **Detection is by the process TREE, not the command string:** Claude Code shows up in `pane_current_command` as its VERSION (e.g. `2.1.178`), and the real `claude` process is a CHILD of the pane's shell — so cc-save walks the pane's - descendants (`ps -eo pid,ppid,comm`) for a process whose command is `claude`. (Filtering on - `pane_current_command == claude` matched nothing → an empty map → cc never resumed.) + descendants (`ps -eo pid,ppid,args`) for a process whose **executable** (argv[0]) is `claude`. + (Filtering on `pane_current_command == claude` matched nothing → an empty map → cc never + resumed.) **It matches the versioned install too:** cc installs as a symlink + `~/.local/bin/claude → …/claude/versions/`, so, when launched by its resolved path, the + process name is the *version* (`2.1.179`), not `claude`; cc-save also matches an argv[0] under + `…/claude/versions/`. It reads the full `args` (not `comm`) so the path is visible on **both + macOS and Linux** (Linux `comm` is the truncated basename, with no path), and keys on argv[0] + only, so a `claude` that is merely an *argument* (`vim claude.md`, `grep …/claude/versions/`) + never false-matches. *Accepted limitations:* an install path containing a **space**, or a + **wrapper** launch that rewrites argv[0] (`npx claude`, `node …/cli.js`), is not detected — both + are absent from the canonical direct-exec install this targets. **Encoding (verified against real on-disk dirs):** the projects-dir name is the cwd with **every `/` and `.` replaced by `-`** (e.g. `/Users/u/.files` → `-Users-u--files`). 
- **`cc-restore.sh`** — after a reboot, for each mapped window run `claude --resume ` — diff --git a/riglib/tmux.py b/riglib/tmux.py index 2f6c6ac..23b62f6 100644 --- a/riglib/tmux.py +++ b/riglib/tmux.py @@ -340,8 +340,32 @@ def render_cc_save(self) -> str: REAL ``claude`` process is a CHILD of the pane's shell. Filtering on ``pane_current_command == claude`` therefore matched NOTHING → the map stayed empty → cc never resumed. So cc-save now walks the pane's process TREE: it takes ``pane_pid`` and - recursively descends children (``ps -eo pid,ppid,comm``) looking for a process whose - command basename is ``claude``. A pane with a ``claude`` descendant is a cc pane. + recursively descends children (``ps -eo pid,ppid,args``) looking for the ``claude`` + process. A pane with a ``claude`` descendant is a cc pane. + + WHY the match is on the EXECUTABLE PATH, not just the basename ``claude`` (the 2026-06-17 + incident): Claude Code installs as a SYMLINK ``~/.local/bin/claude`` → + ``…/claude/versions/``, and the real executable FILE is named by its VERSION + (``2.1.179``). Launched via the ``claude`` symlink the kernel keeps the invoked name + ``claude`` — but launched by the RESOLVED path the process's name is the version string, + NOT ``claude``. A basename-only ``claude`` / ``*/claude`` match misses THAT process exactly + as the old command-string filter did → the map stays empty → cc never resumes after a + reboot (the live incident). So the tree walk reads the full command line (``ps -eo args``), + takes argv[0] (the executable = the line up to the first space), and matches THAT against: + ``claude`` / ``*/claude`` (symlink launch) OR a path under ``…/claude/versions/`` + (direct-path launch of the versioned binary). Reading ``args`` — not ``comm`` (the + basename-only, 15-char-truncated value on Linux) — makes the versioned PATH visible on both + macOS and Linux. 
Keying on argv[0] ONLY (not the whole args line) is load-bearing: a + ``claude`` / ``claude/versions/`` token appearing in an ARGUMENT (``vim claude.md``, + ``grep -r x …/claude/versions/``, ``cp /opt/claude /tmp``) must NOT mark the pane as cc — + whole-line matching would write a bogus cc-map entry. (Limitations, both accepted: (a) an + install path with a SPACE truncates argv[0] at the space — isolating a spaced argv[0] from + ``ps args`` is not possible, and the whole-line match that would cover it reintroduces the + false positives; the default ``~/.local/share/claude/versions/`` path has no space. (b) a + WRAPPER launch that rewrites argv[0] — ``npx claude`` / ``node …/cli.js`` / a shell function + — puts the real claude in argv[1+], so the pane is missed; the canonical installs this fix + targets exec the binary directly (``claude`` symlink or the versioned path), so argv[0] is + the claude executable. Matching argv[1+] would resurrect the argument false-positives.) Encoding (VERIFIED on a real machine, see module/test docs): the projects dir name is the cwd with every ``/`` AND ``.`` replaced by ``-`` (so ``/Users/u/.files`` → @@ -363,7 +387,14 @@ def render_cc_save(self) -> str: # so cc-restore can relaunch `claude --resume ` in the right window after a reboot. # WHY a tree walk and not `pane_current_command == claude`: Claude Code shows up in # `pane_current_command` as its VERSION (e.g. 2.1.178); the real `claude` process is a CHILD of -# the pane's shell. So we descend the pane PID's children and match a process named `claude`. +# the pane's shell. So we descend the pane PID's children and match the `claude` process. +# WHY argv[0]'s PATH, not just basename `claude`: cc installs as a symlink +# ~/.local/bin/claude -> .../claude/versions/; launched by the RESOLVED path the process +# name is the version (2.1.179), NOT `claude`. 
So we read the full `args` (argv[0] is the full +# path on macOS AND Linux, unlike `comm` which is the truncated basename on Linux), take argv[0] +# (up to the first space), and match `claude`/`*/claude` OR a path under `.../claude/versions/`. +# Matching argv[0] ONLY (not the whole args) keeps a `claude`/`claude/versions/` token in an +# ARGUMENT (`grep .../claude/versions/`, `vim claude.md`) from causing a bogus match. # Encoding: the ~/.claude/projects/ dir name is the pane cwd with every '/' and '.' # replaced by '-' (verified against real on-disk dirs). # Limitation: the session id is per-CWD (newest jsonl), not strictly per-pane — two claude @@ -391,26 +422,49 @@ def render_cc_save(self) -> str: basename "$newest" .jsonl }} -# Snapshot the whole process table ONCE (pid ppid comm) — walking the tree per pane against a -# live `ps` each time would race; one snapshot is consistent and cheap. `comm` is the basename -# tmux/ps report (`claude`), not the full argv, so a path like /opt/homebrew/bin/claude still -# reports `claude`. Stored as parallel maps pid->ppid and pid->comm for an O(depth) descent. -PS_SNAPSHOT=$(ps -eo pid=,ppid=,comm= 2>/dev/null || true) +# Snapshot the whole process table ONCE (pid ppid args) — walking the tree per pane against a +# live `ps` each time would race; one snapshot is consistent and cheap. We capture `args` (the +# full command line: executable path + argv), NOT `comm`, for PORTABILITY: macOS `comm` is the +# full executable PATH, but LINUX `comm` is the 15-char-truncated BASENAME with no path — so the +# versioned-binary install (.../claude/versions/, basename = the VERSION) is INVISIBLE +# to a comm match on Linux (comm would read `2.1.179`, no path). `args` carries the full path on +# BOTH platforms, so the `.../claude/versions/` segment is matchable everywhere. The `read -r pid +# ppid rest` below puts the WHOLE remaining line (the args) into `rest`, so variable-width argv +# never breaks the 2-field key parse. 
+PS_SNAPSHOT=$(ps -eo pid=,ppid=,args= 2>/dev/null || true) pane_has_claude() {{ - # BFS over the descendants of the pane's pid; return 0 if any descendant's command is `claude`. + # BFS over the descendants of the pane's pid; return 0 if any descendant IS a `claude` process. local root="$1" local -a queue=("$root") - local pid ppid comm cur + local pid ppid rest exe cur while [ "${{#queue[@]}}" -gt 0 ]; do cur="${{queue[0]}}" queue=("${{queue[@]:1}}") # scan the snapshot for: (a) `cur`'s own command, and (b) `cur`'s direct children to enqueue. - while read -r pid ppid comm; do + while read -r pid ppid rest; do [ -n "$pid" ] || continue if [ "$pid" = "$cur" ]; then - case "$comm" in + # `rest` is the full command line: argv[0] (the EXECUTABLE) then its args. We match the + # EXECUTABLE ONLY — `exe=${{rest%% *}}` is argv[0] up to the first space — NOT the whole + # `rest`. Matching the whole line would let a `claude`/`claude/versions/` token appearing + # in an ARGUMENT false-positive (`grep -r x .../claude/versions/`, `cp /opt/claude /tmp`, + # `vim claude.md`) and write a bogus cc-map entry. Keying on argv[0] is the documented + # contract. (Limitation: an install path containing a SPACE truncates argv[0] here — but + # isolating a spaced argv[0] is impossible from `ps args` alone, and tolerating it would + # require the whole-line match that reintroduces the false positives, so we accept the rare + # spaced-install miss over false positives for everyone. The default install path + # ~/.local/share/claude/versions/ has no space.) + exe=${{rest%% *}} + case "$exe" in + # a binary/symlink named `claude` (the symlink-launch case: argv[0] basename is `claude`). claude|*/claude) return 0 ;; + # direct-path launch of the VERSIONED binary (e.g. ~/.local/share/claude/versions/2.1.179 + # — its name is the VERSION, not `claude`). 
Matchable on macOS AND Linux because we read + # the full `args` PATH for argv[0], not `comm` (Linux `comm` is the truncated basename, + # no path). The leading `*/` requires `claude/versions/` to be a real path segment, so + # `…/notclaude/versions/…` never matches. + */claude/versions/*) return 0 ;; esac fi if [ "$ppid" = "$cur" ]; then diff --git a/tests/test_tmux.py b/tests/test_tmux.py index f436d48..631ce43 100644 --- a/tests/test_tmux.py +++ b/tests/test_tmux.py @@ -274,6 +274,15 @@ def test_cc_save_matches_claude_basename_in_the_tree(): assert "MAP_FILE" in body and ".jsonl" in body +def test_cc_save_matches_the_versioned_binary_path(): + """The generated match must also cover the versioned-binary install (the 2026-06-17 incident): + cc launched by its resolved path has an argv[0] (`ps args`) of `.../claude/versions/`, whose basename is + the version, not `claude`. The case-glob must include the `*/claude/versions/*` arm or that + process is invisible to the tree walk (the map stays empty → cc never resumes).""" + body = _plan().render_cc_save() + assert "*/claude/versions/*" in body + + def test_cc_save_still_records_cwd_and_session_id(): """Detection changed (tree walk), but the RECORDED data is unchanged: pane addr, cwd, id.""" body = _plan().render_cc_save() @@ -286,6 +295,8 @@ def _run_pane_has_claude(snapshot: str, root: str) -> int: """Extract the REAL `pane_has_claude` BFS from the generated cc-save script and run it against a SYNTHETIC ps snapshot — HERMETIC (no tmux/network/real processes), so the tree-walk logic is covered even when the e2e is opted out (opus finding: the BFS was only exercised in the e2e). + The snapshot lines are ` ` (args = the full command line, as the production + `ps -eo pid=,ppid=,args=` emits — argv[0] is the executable the matcher keys on). 
Returns the function's exit code (0 = a `claude` descendant of `root` was found).""" import shlex import subprocess @@ -324,6 +335,126 @@ def test_pane_has_claude_matches_absolute_path_basename(): assert _run_pane_has_claude(snap, "100") == 0 +def test_pane_has_claude_matches_versioned_binary_under_claude_versions(): + """THE 2026-06-17 INCIDENT: cc installs as a symlink ~/.local/bin/claude -> + .../claude/versions/. Launched by the RESOLVED path, argv[0] in `ps args` is the full path + whose basename is the VERSION string (`2.1.179`), not `claude` — so a basename-only match + missed it and the cc map stayed empty (cc never resumed after a reboot). The tree walk must + catch a descendant whose path is under `.../claude/versions/`.""" + snap = "100 1 /bin/zsh\n200 100 /Users/u/.local/share/claude/versions/2.1.179\n300 200 sleep\n" + assert _run_pane_has_claude(snap, "100") == 0 + + +def test_pane_has_claude_versioned_binary_deep_in_tree(): + """The versioned cc binary several levels below the pane shell is still found by the BFS.""" + snap = ( + "100 1 zsh\n" + "200 100 node\n" + "300 200 /Users/u/.local/share/claude/versions/3.0.0-beta.1\n" + ) + assert _run_pane_has_claude(snap, "100") == 0 + + +def test_pane_has_claude_no_false_positive_on_unrelated_versioned_path(): + """A numeric-named process NOT under `claude/versions/` (e.g.
a runtime under its own + `versions/` dir, or a bare version-named binary) must NOT match — the `claude/versions/` + path segment is required, so the rule can't be tripped by any dotted-numeric basename.""" + snap = ( + "100 1 bash\n" + "200 100 /opt/node/versions/20.11.0/bin/node\n" # node, not claude + "300 100 /usr/local/foo/2.1.179\n" # bare version, no claude/versions/ + ) + assert _run_pane_has_claude(snap, "100") != 0 + + +def test_pane_has_claude_matches_versioned_binary_with_args(): + """The versioned binary launched WITH arguments (`…/claude/versions/2.1.179 --resume`) still + matches — the matcher keys on argv[0] (the executable path), ignoring the trailing args.""" + snap = "100 1 /bin/zsh\n200 100 /Users/u/.local/share/claude/versions/2.1.179 --resume\n" + assert _run_pane_has_claude(snap, "100") == 0 + + +def test_pane_has_claude_ignores_claude_only_in_arguments(): + """`claude` appearing only as an ARGUMENT (not argv[0]) must NOT match — the matcher reads + argv[0] only, so `vim claude-notes.md` / `grep claude` / `cat ~/claude/x` is not a cc pane.""" + snap = ( + "100 1 bash\n" + "200 100 /usr/bin/vim claude-notes.md\n" + "300 100 /usr/bin/grep claude /var/log/x\n" + ) + assert _run_pane_has_claude(snap, "100") != 0 + + +def test_pane_has_claude_matches_symlink_launch_with_args(): + """The common live case: `claude --resume` (argv[0] == `claude`, launched via the symlink) + matches even with trailing args — argv[0] basename equality.""" + snap = "100 1 zsh\n200 100 claude --resume\n" + assert _run_pane_has_claude(snap, "100") == 0 + + +def test_pane_has_claude_does_not_false_match_claude_versions_in_an_argument(): + """REGRESSION GUARD (review finding, 2 models): `claude/versions/` appearing in an ARGUMENT + (not argv[0]) must NOT mark the pane as cc — else a routine command writes a bogus cc-map entry. 
+ The matcher keys on argv[0] (the executable) only, so a `grep`/`ls`/`tar` over the versions dir, + or a `cp` of a `claude` file, is correctly ignored.""" + for argline in ( + "/bin/grep -r foo /home/u/.local/share/claude/versions/", + "/bin/ls /home/u/.local/share/claude/versions/", + "/usr/bin/tar czf b.tgz /home/u/.local/share/claude/versions/2.1.179", + "/bin/cp /opt/claude /tmp/", + "/usr/bin/find / -path */claude/versions*", + ): + snap = f"100 1 bash\n200 100 {argline}\n" + assert _run_pane_has_claude(snap, "100") != 0, argline + + +def test_pane_has_claude_no_match_on_notclaude_versions_path(): + """The `*/claude/versions/*` glob requires `claude` to be a real path SEGMENT: a sibling + project like `/opt/notclaude/versions/2.0.0` (or `myclaude`) must NOT false-match.""" + snap = "100 1 bash\n200 100 /opt/notclaude/versions/2.0.0 --x\n" + assert _run_pane_has_claude(snap, "100") != 0 + + +def test_pane_has_claude_spaced_install_path_is_a_known_limitation(): + """DOCUMENTED LIMITATION (review finding): an install path containing a SPACE + (`/Users/J D/.local/share/claude/versions/2.1.179`) is NOT detected — argv[0] cannot be + isolated from `ps args` when it contains a space, and the whole-line match that would cover it + reintroduces the argument false-positives above. The default `~/.local/share/claude/versions/` + path has no space, so this never bites a normal install. This test PINS the accepted behavior + (not an aspiration): if a future change makes spaced paths match, revisit the false-positive + trade-off in `pane_has_claude` deliberately.""" + snap = "100 1 /bin/zsh\n200 100 /Users/J D/.local/share/claude/versions/2.1.179 --resume\n" + assert _run_pane_has_claude(snap, "100") != 0 + + +def test_pane_has_claude_spaced_path_symlink_arm_is_a_known_limitation(): + """Same accepted limitation for the plain `*/claude` symlink arm: a `claude` binary under a + path with a SPACE (`/Users/J D/bin/claude`) is truncated at the space and NOT detected. 
Pinned + so the symlink arm's contract can't silently change either (paired with the versions-arm pin).""" + snap = "100 1 zsh\n200 100 /Users/J D/bin/claude --resume\n" + assert _run_pane_has_claude(snap, "100") != 0 + + +def test_pane_has_claude_ignores_wrapper_launch_with_claude_in_argv1(): + """DOCUMENTED LIMITATION: a WRAPPER that rewrites argv[0] (`npx claude`, `node …/cli.js`) puts + the real claude in argv[1+], so it is NOT detected — matching argv[1+] would resurrect the + argument false-positives. The canonical installs exec the binary directly (argv[0] = claude), + so this is an accepted miss. Pinned to make the trade-off explicit.""" + snap = ( + "100 1 bash\n" + "200 100 /usr/bin/node /home/u/.local/share/claude/cli.js\n" + "300 100 /usr/bin/npx claude --resume\n" + ) + assert _run_pane_has_claude(snap, "100") != 0 + + +def test_pane_has_claude_tolerates_empty_or_bracketed_args_lines(): + """A degenerate snapshot line — an empty `args` (kernel/zombie) or a bracketed kernel-thread + name (`[kthreadd]`) — must neither match nor error out the BFS (it just isn't a cc process).""" + snap = "100 1 bash\n200 100 [kthreadd]\n300 100 \n" # 300 has empty args + assert _run_pane_has_claude(snap, "100") != 0 + + def test_pane_has_claude_no_match_when_absent(): """No `claude` anywhere in the tree → not found (exit non-zero).""" snap = "100 1 bash\n200 100 vim\n300 100 less\n" diff --git a/tests/test_tmux_e2e.py b/tests/test_tmux_e2e.py index aa6c81d..aa127e8 100644 --- a/tests/test_tmux_e2e.py +++ b/tests/test_tmux_e2e.py @@ -351,6 +351,81 @@ def test_cc_save_populates_map_from_a_real_claude_child(tmux_env, monkeypatch): assert any(str(work) in ln and sid in ln for ln in lines), lines +def test_cc_save_detects_the_versioned_binary_install(tmux_env, monkeypatch): + """THE 2026-06-17 INCIDENT (versioned-binary half of DEFECT 2): cc installs as a symlink + ``~/.local/bin/claude`` → ``…/claude/versions/``. 
Launched by the RESOLVED path (not + the ``claude`` symlink), the process's name is the VERSION string (``2.1.179``), NOT ``claude`` + — so a basename-only ``claude``/``*/claude`` match missed it and the map stayed empty (cc never + resumed after a reboot, the live incident). cc-save must still detect it via the + ``…/claude/versions/`` path arm of the tree-walk match (which reads ``ps -o args`` so the path + is visible on both macOS and Linux — Linux ``comm`` is the truncated basename with no path). + + We reproduce the EXACT install shape: a ``claude/versions/`` symlink → ``sleep`` run by + its resolved PATH, as a real child of a pane shell, so its ``args`` (argv[0]) is + ``…/claude/versions/``. A SYMLINK (not a copy) is used so the launcher works on macOS + too: macOS SIP refuses to exec an unsigned COPY of a protected system binary, but exec'ing a + symlink to it is allowed, and ``args`` reflects the invoked path either way. + """ + home, socket, run = tmux_env + _apply_with_real_plugins(home, monkeypatch) + gen = home / ".config" / "rig" / "tmux" + + work = home / "verproj" + work.mkdir() + version = "2.1.179" + versions_dir = home / ".local" / "share" / "claude" / "versions" + versions_dir.mkdir(parents=True) + versioned = versions_dir / version + real_sleep = shutil.which("sleep") or "/bin/sleep" + versioned.symlink_to(real_sleep) # argv[0] == the versioned path under claude/versions/ + launcher = home / "launch-ver.sh" + # run the versioned binary BY ITS RESOLVED PATH (the failing production case), backgrounded so + # it stays a genuine descendant of the launcher (which keeps the pane shell alive). + launcher.write_text( + f"#!/usr/bin/env bash\n{shlex.quote(str(versioned))} 300 &\nsleep 300\n", encoding="utf-8" + ) + launcher.chmod(0o755) + + # seed a Claude Code session file for that cwd so cc-save has an id to record. 
+ enc = str(work).replace("/", "-").replace(".", "-") + proj = home / ".claude" / "projects" / enc + proj.mkdir(parents=True) + sid = "99999999-8888-7777-6666-555555555555" + (proj / f"{sid}.jsonl").write_text("{}\n", encoding="utf-8") + + run(["tmux", "new-session", "-d", "-s", "main", "-c", str(work), str(launcher)]) + # POLL the REAL generated cc-save until it records the pane (or time out). We assert on cc-save's + # OWN output — the production acceptance criterion — instead of a separate `ps` probe, so the + # test validates the exact production matcher on the exact platform (a flat `ps args` probe + # diverged from how the shimmed-socket tree walk sees the pane and flaked on CI). This loop also + # absorbs the launch→ps-visibility race. If cc-save can detect the versioned descendant, the map + # is non-empty; if it can't (the regression / a real platform gap), the loop times out and fails. + map_file = gen / "cc-sessions.map" + deadline = time.time() + 15 + lines: list[str] = [] + found = False + while time.time() < deadline: + r = run(["bash", str(gen / "cc-save.sh")]) + assert r.returncode == 0, f"cc-save failed: {r.stderr}" + # cc-save has finished writing (sequential — `run` returns before we read), so the read is + # not racing a concurrent writer. Break ONLY when OUR versioned pane (this cwd + sid) is in + # the map — not merely on any non-empty map: cc-save scans ALL panes, so an unrelated claude + # process on the host (a parallel test, the dev's own session) could populate it without our + # descendant yet being visible, and an early break would then flake the final assertion. + if map_file.is_file(): + lines = [ln for ln in map_file.read_text().splitlines() if ln.strip()] + if any(str(work) in ln and sid in ln for ln in lines): + found = True + break + time.sleep(0.3) + # the production matcher (the `.../claude/versions/` arm) found OUR versioned pane and recorded + # its exact cwd→session-id. 
A timeout here is the INCIDENT regression (or a real platform gap). + assert found, ( + "INCIDENT: cc-save did NOT record the VERSIONED-binary cc pane " + f"(cwd={work}, sid={sid}); map lines={lines!r}" + ) + + def test_cc_restore_relaunches_claude_resume_into_fresh_shell(tmux_env, monkeypatch): """DEFECT 2 (restore half): with a seeded map, cc-restore sends `cd && claude --resume ` into a FRESH shell pane (never on top of a running claude / an editor)."""
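You can experiment with the argv[0] rule and the tree walk outside tmux by mirroring them in Python. This is a hypothetical sketch, not the generated bash or anything in `riglib`: `argv0`, `is_claude_exe`, and this Python `pane_has_claude` are illustrative names. It assumes the `ps -eo pid=,ppid=,args=` line shape (pid, ppid, then the full command line) and reproduces the three `case` arms, including the accepted miss on spaced install paths. Unlike the bash, it also keeps a `seen` set as a defensive guard.

```python
from collections import deque
from fnmatch import fnmatchcase


def argv0(args: str) -> str:
    """argv[0] as the bash `${rest%% *}` expansion yields it: the args line up to the first space."""
    return args.strip().partition(" ")[0]


def is_claude_exe(args: str) -> bool:
    """Mirror of the three `case "$exe"` arms: `claude`, `*/claude`, `*/claude/versions/*`."""
    exe = argv0(args)
    return (
        exe == "claude"                              # symlink launch, bare name
        or exe.endswith("/claude")                   # symlink launch, by path
        or fnmatchcase(exe, "*/claude/versions/*")   # versioned binary by resolved path
    )


def pane_has_claude(snapshot: str, root: str) -> bool:
    """BFS from `root` over a `ps -eo pid=,ppid=,args=` snapshot; True if any node is claude."""
    procs = []
    for line in snapshot.splitlines():
        fields = line.split(None, 2)  # pid, ppid, and the rest of the line kept intact
        if not fields:
            continue  # blank line, like `[ -n "$pid" ] || continue`
        pid = fields[0]
        ppid = fields[1] if len(fields) > 1 else ""
        args = fields[2] if len(fields) > 2 else ""
        procs.append((pid, ppid, args))

    queue = deque([root])
    seen = {root}  # defensive; a real ps tree has no cycles
    while queue:
        cur = queue.popleft()
        for pid, ppid, args in procs:
            if pid == cur and is_claude_exe(args):
                return True
            if ppid == cur and pid not in seen:
                seen.add(pid)
                queue.append(pid)
    return False
```

Given the deep-tree snapshot from `test_pane_has_claude_versioned_binary_deep_in_tree`, this sketch returns `True`. If `/usr/bin/vim claude-notes.md` is the only child, it returns `False`, because the sketch consults only argv[0].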
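The session-id lookup can also be sketched in Python. The lookup encodes the pane cwd by replacing every `/` and `.` with `-`, then takes the newest `*.jsonl` in that `~/.claude/projects/` subdirectory. This sketch is illustrative only: `encode_cwd` and `newest_session_id` are hypothetical names, not functions in `riglib`. Ranking by file modification time is an assumption about how the generated script picks `$newest`.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def encode_cwd(cwd: str) -> str:
    """Projects-dir name for a cwd: every '/' and '.' replaced by '-'."""
    return cwd.replace("/", "-").replace(".", "-")


def newest_session_id(projects_root: Path, cwd: str) -> Optional[str]:
    """Stem of the newest *.jsonl in <projects_root>/<encoded cwd>, or None if there is none."""
    proj = projects_root / encode_cwd(cwd)
    if not proj.is_dir():
        return None
    sessions = list(proj.glob("*.jsonl"))
    if not sessions:
        return None
    # Assumption: "newest" means most recent modification time.
    newest = max(sessions, key=lambda p: p.stat().st_mtime)
    return newest.stem  # same result as `basename "$newest" .jsonl`


if __name__ == "__main__":
    # Throwaway projects root with two sessions for one cwd; the later mtime wins.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        proj = root / encode_cwd("/Users/u/.files")  # -Users-u--files
        proj.mkdir()
        old, new = proj / "old-id.jsonl", proj / "new-id.jsonl"
        old.write_text("{}\n")
        new.write_text("{}\n")
        os.utime(old, (1_000, 1_000))
        os.utime(new, (2_000, 2_000))
        print(newest_session_id(root, "/Users/u/.files"))  # new-id
```

Because `/` and `.` both map to `-`, the encoding is lossy. For example, `/a.b` and `/a/b` both become `-a-b`, so a per-cwd lookup cannot tell those two directories apart.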