ci(mutants): add --in-place to dodge cargo-mutants v27.0.0 #611 tmp-tree bug #41

Merged
mcarvin8 merged 1 commit into main from cursor/mutation-yml-in-place-workaround
May 13, 2026

Conversation

@mcarvin8
Owner

Summary

Both full mutation sweeps run after #37/#38/#39/#40 landed -- run 25807382458 and run 25812733214 -- died with the exact same upstream error after passing the actual mutation score check (missed.txt was empty in both):

ERROR cargo_mutants::lab: Worker thread failed:
  "/tmp/cargo-mutants-config-disassembler-<rand>.tmp/src/xml/cli.rs"
  is not a file

The worker thread crashes before cargo-mutants can write its summary, the step exit code is 1, and the workflow goes red even though the underlying score is 100%.

Root cause (upstream)

This is cargo-mutants v27.0.0 issue #611, a regression introduced by #557 in v26.0.0 (Dec 2025). cargo-mutants copies the source tree into a per-mutant scratch directory under /tmp using reflink::reflink, which preserves the source file's mtime exactly. On macOS, /usr/libexec/dirhelper periodically scans /tmp and unlinks regular files whose mtime is older than CLEAN_FILES_OLDER_THAN_DAYS (default 3); on Linux, systemd-tmpfiles does the same on a configurable cadence. Any file in the repo with a sufficiently old mtime therefore lands in the scratch tree as a reaper target and can be silently unlinked mid-run. The next mutant's BuildDir::overwrite_file then trips this check at src/build_dir.rs:96:

ensure!(full_path.is_file(), "{full_path:?} is not a file");

The bug is load-dependent: short runs that finish between reaper invocations don't trigger it. Our debug.log on both failing runs shows the failure immediately after the revert of parse_reassemble_args -> (None, None, true), which is consistently late enough in the run for the reaper to have already fired and unlinked src/xml/cli.rs (the file in our repo with the right combination of stale source mtime + position in cargo-mutants' deterministic mutant order).
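
To make the mtime mechanism concrete, here is a minimal sketch of the kind of age test a /tmp reaper applies. It is not taken from dirhelper, systemd-tmpfiles, or cargo-mutants; the function name `is_reaper_target` and its parameters are hypothetical, and only the Rust standard library is used.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Returns true if the mtime of `path` is more than `max_age` before `now`.
/// Hypothetical helper: it models an age-based reaper check, it is not the
/// real implementation of any reaper service.
fn is_reaper_target(path: &Path, max_age: Duration, now: SystemTime) -> io::Result<bool> {
    let mtime = fs::metadata(path)?.modified()?;
    // `duration_since` fails when mtime is in the future; treat that as fresh.
    Ok(now
        .duration_since(mtime)
        .map(|age| age > max_age)
        .unwrap_or(false))
}
```

A scratch copy that inherits a four-day-old source mtime would return true here for a three-day threshold the moment it is created, which is why the age of the copy itself does not protect it.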

The fix landed upstream in #613, merged May 11 2026, which bumps dest mtime to now() after every reflink and gives a clearer error message. But the latest tagged cargo-mutants release (v27.0.0, 2026-03-07) does not contain the fix, and taiki-e/install-action installs from tagged binary releases via cargo-binstall.
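
The idea behind the upstream fix can be sketched as follows. This is an illustration under stated assumptions, not the code from #613: `copy_and_touch` is a hypothetical name, and `std::fs::copy` stands in for `reflink::reflink` so the example needs only the standard library (`File::set_modified` requires Rust 1.75 or later).

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Copies `src` to `dest`, then sets the mtime of `dest` to the current time
/// so an age-based /tmp reaper sees the copy as fresh.
/// Assumption: `fs::copy` replaces `reflink::reflink` for this sketch.
fn copy_and_touch(src: &Path, dest: &Path) -> io::Result<()> {
    fs::copy(src, dest)?;
    // Open for writing without truncating; this fails if `dest` inherited
    // read-only permissions, which a real implementation would need to handle.
    let file = File::options().write(true).open(dest)?;
    file.set_modified(SystemTime::now())
}
```

Whatever mtime the source carried, the destination ends up stamped with the time of the copy, so the reaper's age check measures the age of the scratch tree rather than the age of the last edit.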

Workaround

The upstream issue explicitly recommends --in-place for users not ready to switch off v27.0.0: it bypasses the scratch-tree copy entirely and runs mutations against the workspace source files in the runner's checkout directory. On ephemeral CI runners that's exactly what we want -- the runner is thrown away after the job, so "mutating the actual source tree" has no downside. --in-place takes the same --in-diff / --file filters and produces the same report shape as the default mode.

Both the full and incremental jobs are updated:

  • full: cargo mutants --no-shuffle --in-place
  • incremental fast path: cargo mutants --no-shuffle --in-place --in-diff mutation.diff
  • incremental test-only-diff fallback: cargo mutants --no-shuffle --in-place --file ...

Each command grows a single comment block linking back to upstream #611 / #613 so the next maintainer can drop the flag once cargo-mutants ships a fixed release.

Test plan

  • Merge.
  • Manually trigger Full mutation testing (gh workflow run mutation.yml -f full=true).
  • Expected: no "is not a file" worker error, exit code 0, the workflow step turns green, and the mutants.out artifact still reports missed=0 with the same caught / timeout / unviable buckets as the prior post-#40 run (#40: refactor(xml/cli): use iterator-based loop in parse_disassemble_args).

…ree bug

Both full sweeps run after #37/#38/#39/#40 landed
(https://github.com/mcarvin8/config-disassembler/actions/runs/25807382458
and
https://github.com/mcarvin8/config-disassembler/actions/runs/25812733214)
died with the exact same upstream error:

  ERROR cargo_mutants::lab: Worker thread failed:
    "/tmp/cargo-mutants-config-disassembler-<rand>.tmp/src/xml/cli.rs"
    is not a file

Even though missed.txt was empty in both reports, the worker thread
crashed before cargo-mutants could write its summary, exit code was
1, and the GitHub Actions step was marked failed.

Root cause (upstream)
---------------------

This is cargo-mutants v27.0.0 issue
sourcefrog/cargo-mutants#611, a regression
introduced by PR sourcefrog/cargo-mutants#557
in v26.0.0 (Dec 2025). cargo-mutants copies the source tree into a
per-mutant scratch directory under `/tmp` using `reflink::reflink`,
which preserves the *source* file's mtime exactly. On macOS,
`/usr/libexec/dirhelper` periodically scans `/tmp` and unlinks
regular files whose mtime is older than CLEAN_FILES_OLDER_THAN_DAYS
(default 3). The systemd-tmpfiles equivalent on Linux does the same
thing on a configurable cadence. Any source file in the repo that
hasn't been edited in three days ends up in the scratch tree with a
stale mtime, the reaper unlinks it mid-run, and the next mutant's
`BuildDir::overwrite_file` trips
`ensure!(full_path.is_file(), "{full_path:?} is not a file")` at
src/build_dir.rs:96 and the worker thread dies.

The bug is load-dependent: short runs that finish between reaper
invocations do not trigger it. cargo-mutants' debug.log on our two
failing runs shows the failure both times immediately after the
revert of `parse_reassemble_args -> (None, None, true)`. By that
point src/xml/cli.rs, the file in the repo with sufficiently stale
mtime to be a consistent reaper target on the GitHub Actions
ubuntu-latest runner, has already been unlinked.

The fix landed upstream in PR
sourcefrog/cargo-mutants#613, merged May 11
2026, which bumps dest mtime to now after every reflink and gives a
clearer error message. But the latest tagged cargo-mutants release
(v27.0.0, 2026-03-07) does **not** contain the fix, and
`taiki-e/install-action` installs from tagged binary releases via
cargo-binstall.

Workaround
----------

The upstream issue explicitly recommends `--in-place` for users not
ready to switch off v27.0.0: it bypasses the scratch-tree copy
entirely and runs mutations against the workspace source files in
the runner's checkout directory. On ephemeral CI runners that's
exactly what we want -- the runner is thrown away after the job, so
"mutating the actual source tree" has no downside. `--in-place`
takes the same `--in-diff` / `--file` filters and produces the same
report shape as the default mode.

Both the `full` and `incremental` jobs are updated:

* `full` uses `cargo mutants --no-shuffle --in-place`.
* `incremental` uses `cargo mutants --no-shuffle --in-place --in-diff
  mutation.diff` for the fast path and `cargo mutants --no-shuffle
  --in-place --file ...` for the test-only-diff fallback.

Each command grows a single comment block linking back to upstream
#611/#613 so the next maintainer can drop the flag once cargo-mutants
ships a fixed release.

Verification plan
-----------------

After merging this PR, trigger the `Full mutation testing` workflow
manually (`gh workflow run mutation.yml -f full=true`). Expected
outcome: no `is not a file` worker error, exit code 0, the workflow
step turns green, and the mutants.out artifact still reports
missed=0 with the same caught / timeout buckets as the previous
runs.

Co-authored-by: Cursor <cursoragent@cursor.com>
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

@mcarvin8 mcarvin8 merged commit c0c7d89 into main May 13, 2026
9 checks passed
@mcarvin8 mcarvin8 deleted the cursor/mutation-yml-in-place-workaround branch May 13, 2026 17:41
mcarvin8 added a commit that referenced this pull request May 13, 2026
Switch both the `incremental` and `full` jobs of the mutation workflow
from `taiki-e/install-action@v2` (which installs the latest tagged
binary release, v27.0.0 from 2026-03-07) to `cargo install --git
sourcefrog/cargo-mutants --rev cbdfe8a` (the merge commit of upstream
PR sourcefrog/cargo-mutants#613, merged
May 11 2026).

Why
---

Three consecutive full mutation sweeps on `main` after #37/#38/#39/#40
landed all crashed at the same point with cargo-mutants v27.0.0:

  ERROR Worker thread failed: ".../src/xml/cli.rs" is not a file
  Error: ".../src/xml/cli.rs" is not a file

Even though `missed.txt` was empty (0 missed mutants -- the score goal
was met), the worker thread died before cargo-mutants could write its
summary and the workflow step turned red. Investigation:

* The first two failures had the missing-file path inside the per-
  mutant scratch tempdir
  (`/tmp/cargo-mutants-config-disassembler-XXX.tmp/src/xml/cli.rs`).
  These match upstream issue
  sourcefrog/cargo-mutants#611 exactly:
  cargo-mutants v26.0.0+ uses `reflink::reflink` for the scratch-tree
  copy, which preserves the source mtime. Systemd-tmpfiles (Linux) and
  `/usr/libexec/dirhelper` (macOS) periodically delete files in `/tmp`
  with mtime older than a configurable threshold. Long-running sweeps
  cross that interval and have files silently unlinked from under
  them mid-run.
* The third failure came from a follow-up attempt that switched to
  `--in-place` (#41 as originally proposed). The missing-file path
  this time was the *workspace* path
  (`/home/runner/work/.../src/xml/cli.rs`), which can't be a
  `dirhelper`/`systemd-tmpfiles` artifact. The same misleading error
  message in `BuildDir::overwrite_file` covers a separate failure
  mode that v27.0.0's `ensure!(full_path.is_file(), ...)` cannot
  distinguish.

Upstream PR #613 fixes both halves of this:

1. `src/copy_tree.rs`: bumps `dest` mtime to `now()` after every
   successful `reflink::reflink`, so reaper services see freshly-
   touched files and leave them alone. Closes #611.
2. `src/build_dir.rs`: rewrites `BuildDir::overwrite_file` to use
   `symlink_metadata` and emit specific error messages distinguishing
   `is a symlink` / `is not a regular file (type is X)` / `does not
   exist` / `failed to stat`. Whatever's actually going wrong on our
   `--in-place` run will finally surface as a useful diagnostic
   instead of "is not a file".
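
The second change can be illustrated with a minimal sketch of a `symlink_metadata`-based check. This is not the upstream implementation: `check_overwritable` is a hypothetical name, and the message formats only approximate the wording quoted above.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns Ok if `path` is a regular file, otherwise a message naming the
/// specific reason. Hypothetical helper; message wording is illustrative.
fn check_overwritable(path: &Path) -> Result<(), String> {
    // `symlink_metadata` does not follow a final symlink, so a link is
    // reported as a link rather than as whatever it points to.
    match fs::symlink_metadata(path) {
        Ok(meta) if meta.file_type().is_symlink() => Err(format!("{path:?} is a symlink")),
        Ok(meta) if meta.is_file() => Ok(()),
        Ok(meta) => Err(format!(
            "{path:?} is not a regular file (type is {:?})",
            meta.file_type()
        )),
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            Err(format!("{path:?} does not exist"))
        }
        Err(e) => Err(format!("failed to stat {path:?}: {e}")),
    }
}
```

Compared with a single `is_file()` test, each branch names a different condition, so a missing file, a directory, and a symlink no longer collapse into the same "is not a file" message.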

What changed
------------

Both jobs:

* Replace `uses: taiki-e/install-action@v2 / tool: cargo-mutants` with
  `run: cargo install --locked --git https://github.com/sourcefrog/cargo-mutants
   --rev cbdfe8a574566e01cef9ffaa7475dfaf69c88440 cargo-mutants`. The
  rev is pinned to the exact merge commit of #613 so the install is
  reproducible; `cargo install --locked` honours the upstream
  `Cargo.lock` for the same reason.
* Drop the `--in-place` flag that the previous version of this PR
  added: the underlying cause is the mtime issue (or whatever the
  improved error message surfaces), not the copy-vs-in-place mode.
  Default copy mode is the upstream-recommended default and gets the
  full benefit of the mtime fix.

Cost
----

`cargo install --git` builds cargo-mutants from source, which takes
~2-3 minutes on a cold runner. `Swatinem/rust-cache@v2` (already in
the workflow) caches `~/.cargo/registry`, `~/.cargo/git`, and the
build's `target/`, so warm runs are much faster -- typically under
a minute. Revisit once a 27.x.x release ships with this fix and
switch back to `taiki-e/install-action@v2`.

Verification plan
-----------------

After merging this PR, trigger the `Full mutation testing` workflow
manually (`gh workflow run mutation.yml -f full=true`). Expected
outcomes:

* `cargo-mutants` reports its version as a post-v27.0.0 git build
  (`cargo-mutants 27.0.0+...` or similar).
* `missed.txt` stays at 0.
* No `is not a file` worker error.
* If a *different* error surfaces, it'll be the new specific message
  from #613 (`is a symlink`, `is not a regular file (type is ...)`,
  `does not exist, refusing to create it`, etc.), which will tell us
  exactly what to fix next.

Co-authored-by: Cursor <cursoragent@cursor.com>