build: pre-commit Rust pipeline spuriously fails on a cold cache (parallel clippy --all-features + capped cargo test contend on package-cache lock) #629

@bpowers

Problem

The pre-commit hook (`scripts/pre-commit`) can spuriously fail its Rust pipeline on a cold/partially-cold build cache, even for commits that touch no Rust at all. Observed on 2026-05-22 while committing only Markdown design-plan files under `docs/`: the hook reported "Rust tests failed" / "cargo test exceeded the 180s wall-clock cap", yet the change was docs-only.

Root cause / mechanism

Pipeline A (Rust) runs clippy and the test suite concurrently, sharing one build cache (`scripts/pre-commit` around lines 125-140):

cargo clippy --all-targets --all-features -- -D warnings > "$CLIPPY_TMP" 2>&1 &
PID_CLIPPY=$!
RUST_BACKTRACE=1 "$TIMEOUT_CMD" --kill-after=30 180 cargo test > "$TEST_TMP" 2>&1 &
PID_TEST=$!

`cargo test` is wrapped in a 180s wall-clock cap (SIGTERM at 180s, SIGKILL 30s later). On a cold cache the two cargo invocations contend on the package-cache file lock ("Blocking waiting for file lock on package cache"), and clippy's `--all-features` build and `cargo test`'s default-feature build cannot fully share compiled artifacts. Building under that contention pushes `cargo test` past the 180s cap, `timeout` kills it (exit 124 after SIGTERM, 137 if SIGKILL was needed), and the hook reports the kill as a test failure via the diagnostic at lines 157-167.
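To make the exit statuses above concrete, here is a minimal sketch of how a hook could tell a `timeout` kill apart from a genuine test failure. The function name `classify_test_exit` and its output labels are hypothetical and do not come from `scripts/pre-commit`; the 124 and 137 values are the ones GNU `timeout` reports for a plain timeout and for a follow-up SIGKILL.

```shell
# Hypothetical helper (not part of scripts/pre-commit): map the exit status
# of a timeout-wrapped `cargo test` to a coarse cause.
classify_test_exit() {
    case "$1" in
        0)   echo "passed" ;;
        124) echo "timeout-term" ;;   # cap reached, SIGTERM was enough
        137) echo "timeout-kill" ;;   # 128 + 9: SIGKILL after the grace period
        *)   echo "test-failure" ;;   # any other non-zero status
    esac
}
```

A status of 137 can also come from an unrelated SIGKILL (for example the kernel OOM killer), so a diagnostic built on this would still need to hedge its wording.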

Evidence it is spurious, not a real regression

Immediately after the failure, each step was run standalone with a warm cache:

  • `cargo fmt --check` passed
  • `cargo clippy --all-targets --all-features -- -D warnings` passed
  • `cargo test` passed in ~22s wall-clock (vs the 180s cap) with zero failures

Re-running the identical commit with the cache warm passed all checks. So nothing in the suite actually regressed; the cap was tripped by cold-cache build + lock contention, not by slow tests.

Why it matters (developer experience)

  • Spurious pre-commit failures and wasted full-suite reruns, most likely after feature-flag-affecting changes or a fresh checkout / `cargo clean`.
  • Especially misleading because the hook prints "The project passed all pre-commit checks before your changes. Any failing checks reported are the result of your changes..." (line 13) even when the change is docs-only and the failure is environmental. This actively misdirects debugging.

Component(s) affected

  • `scripts/pre-commit` (Pipeline A: Rust -- the concurrent clippy/test launch and the 180s `cargo test` cap)

Possible approaches for resolution (for maintainers to weigh; not prescribing)

  1. Run clippy and test sequentially in the hook, eliminating the package-cache lock contention (at the cost of some wall-clock on warm caches).
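  2. Give the test step its own `--target-dir` so the two cargo invocations don't contend on the same build-directory lock (at the cost of extra disk + a second cold build). Note that the package-cache lock lives under `CARGO_HOME`, not the target directory, so this alone would not remove the "package cache" wait.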
  3. Exclude build/lock-wait time from the 180s budget -- e.g. a warm-up `cargo test --no-run` (or `cargo build --all-targets`) before the timed test run, so the cap measures test execution rather than compilation-under-contention.
  4. Optionally soften the line-13 banner / the exit-124/137 diagnostic to acknowledge that a cold-cache timeout may be environmental rather than caused by the staged change.

The 180s cap itself is intentional (it guards against the PR #461 pathological-runtime regression class; see the comment at lines 130-137), so the fix should preserve that guard rail while not letting cold-cache compile time count against it.
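As an illustration of approach 3, the sketch below compiles the test binaries untimed and only then starts the capped run, so the 180s budget covers test execution alone. The function name `run_rust_tests_capped` and the `CARGO` and `TEST_CAP_SECS` variables are assumptions made for this example; only `TIMEOUT_CMD`, the `--kill-after=30 180` arguments, and `RUST_BACKTRACE=1` mirror the excerpt from the hook.

```shell
# Assumed, overridable settings (CARGO and TEST_CAP_SECS are hypothetical names).
CARGO="${CARGO:-cargo}"
TIMEOUT_CMD="${TIMEOUT_CMD:-timeout}"
TEST_CAP_SECS="${TEST_CAP_SECS:-180}"

run_rust_tests_capped() {
    # Untimed warm-up: build the test binaries without running them, so
    # cold-cache compilation and lock waits do not count against the cap.
    "$CARGO" test --no-run || return 1

    # Timed run: the cap now measures (almost only) test execution.
    RUST_BACKTRACE=1 "$TIMEOUT_CMD" --kill-after=30 "$TEST_CAP_SECS" "$CARGO" test
}
```

If clippy still runs concurrently, the warm-up can itself block on the shared locks, but that wait then happens outside the timed window, which keeps the guard rail against pathological test runtimes intact.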

How it was discovered

Observed on 2026-05-22 while committing only Markdown design-plan files under `docs/`; the docs-only commit tripped the Rust `cargo test` 180s cap on a cold cache, and a warm-cache rerun of the same commit passed cleanly (cargo test ~22s).
