feat(errors): structured what/why/fix errors + stable exit codes (error-system v2)#27
feat(errors): structured what/why/fix errors + stable exit codes (error-system v2)#27alex-mextner wants to merge 1 commit into
Conversation
…or-system v2) Born from two same-day prod failures whose errors were thin and undiagnosable: `unknown mcp item(s): review (known: none)` (no hint it was a REMOVED slot) and a dead hook path surfacing only as a generic harness "PreToolUse error". This makes every rig error a consistent WHAT / WHY / HOW-to-fix block with a stable per-class exit code (a PUBLIC CONTRACT documented in `rig --help`). What's in: - riglib/errors.py: RigError(what/why/fix/exit_code) + per-class codes (2 config, 3 drift, 4 unknown-item, 5 missing-target, 6 not-a-repo, 127 missing-dep), a single top-level `guard()` renderer, did-you-mean (Levenshtein), and a removed-slot registry (mcp.items.review -> names agent-tools #32 + the fix). - riglib/layers.py: GLOBAL vs REPO classification — single source of truth so `rig status` groups drift by layer and names WHICH config file declares each item. - riglib/missing_target.py: proactively scans the harness settings.json for hook commands pointing at a script gone from disk (the dead-rtk-hook case). Resolves the invoked SCRIPT (skips bare/absolute interpreters like /usr/bin/env, honors `-c` only as an interpreter flag) and checks only that token, so a runtime output arg never false-flags. - cli.py: `rig status` renders the 3-part errors + layer grouping; a non-git dir shows "not a git repository — repo layer N/A" (never the "should be committed" nag), and degrades gracefully when the agent-tools catalog can't be resolved. plan.py raises the structured UnknownItemError. config.py adds primary_config_path for provenance. `rig doctor` honors the documented 127 missing-dependency exit code. Tests: 504 pytest passing; tests/smoke.sh fully green (incl. the non-git + dead-hook + removed-slot legs). New: test_errors, test_missing_target, test_plan_errors, test_status_layers; smoke gains non-git / clean-sample / removed-slot assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d4de60aaa3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if self.repo_path is not None: | ||
| return self.repo_path |
There was a problem hiding this comment.
Track which layer declared unknown items
When both a global config and a repo rig.yaml are loaded, an unknown item can come solely from ~/.config/rig/config.yaml, but this always reports the repo file because repo_path exists. In that scenario the new structured error tells the user to remove a key from ./rig.yaml even though the key is only in the global config, making the fix path misleading for stale global entries such as removed MCP slots. The loader needs per-layer/key provenance (or at least category provenance) instead of assuming the repo file is the offender.
Useful? React with 👍 / 👎.
| if via_interpreter and tok == "-c": | ||
| return [] | ||
| if _looks_like_script_path(tok): | ||
| p = Path(tok).expanduser() | ||
| return [tok] if not p.exists() else [] |
There was a problem hiding this comment.
Skip module invocations when scanning hook commands
For interpreter commands that use -m, e.g. python3 -m my_hook --out /tmp/result.json, there is no script file to verify, but this loop only special-cases -c and then flags the first absolute argument as a missing hook script if it does not exist. That makes healthy module-based Claude hooks produce false missing-target errors and causes rig status/rig doctor to exit non-zero. Treat -m <module> like an interpreter mode and stop looking for a script path there.
Useful? React with 👍 / 👎.
| if dead_targets and report.in_sync: | ||
| # no config↔disk drift, but a dead hook reference IS a problem worth a non-zero exit. | ||
| print(_dim("\n (no config↔disk drift, but a missing target above needs attention)")) | ||
| return errors.EXIT_MISSING_TARGET |
There was a problem hiding this comment.
Return missing-target exit code even with drift
This only returns EXIT_MISSING_TARGET when the normal drift report is otherwise clean; if a dead hook path is found alongside any config↔disk drift, the function falls through and returns 3. In that mixed case scripts following the documented stable exit-code contract cannot detect the missing-target class until all unrelated drift is fixed, even though the missing target was found and printed. Give missing targets an explicit precedence (or otherwise document a multi-error precedence) so exit code 5 is not masked by ordinary drift.
Useful? React with 👍 / 👎.
error-system v2 — structured what/why/fix errors + stable exit codes
Born from two same-day prod failures whose errors were thin and undiagnosable:
unknown mcp item(s): review (known: none)(no hint it was a removed slot) and adead hook path surfacing only as a generic harness "PreToolUse error". This makes every
rig error a consistent WHAT / WHY / HOW-to-fix block with a stable per-class exit
code (a public contract, documented in
rig --help).What's in
riglib/errors.py—RigError(what/why/fix/exit_code)+ per-class codes(
2config,3drift,4unknown-item,5missing-target,6not-a-repo,127missing-dep), a single top-levelguard()renderer, did-you-mean(Levenshtein-nearest), and a removed-slot registry (
mcp.items.review→ namesagent-tools #32 + the exact fix instead of "known: none").
riglib/layers.py— GLOBAL vs REPO classification, single source of truth sorig statusgroups drift by layer and names which config file declares each item.riglib/missing_target.py— proactively scans the harnesssettings.jsonforhook commands pointing at a script gone from disk (the dead-rtk-hook case). Resolves
the invoked script (skips bare/absolute interpreters like
/usr/bin/env,/opt/homebrew/bin/python3; honors-conly as an interpreter flag), checks only thattoken so a runtime output arg never false-flags.
cli.py— 3-part error rendering + layer-groupedrig status; a non-git dirshows
not a git repository — repo layer N/A(never the "should be committed" nag) anddegrades gracefully when the agent-tools catalog can't be resolved.
rig doctorhonorsthe documented
127missing-dependency exit code.plan.pyraises the structuredUnknownItemError;config.pyaddsprimary_config_pathfor error provenance.Verification
uv run --with pytest python -m pytest tests/→ 504 passed.bash tests/smoke.sh→ fully green (incl. the non-git, clean-sample, removed-slot,and dead-hook legs). Fixed a latent smoke bug: the non-git test dir was nested under the
smoke's own
git init-ed$TMP, so it wasn't actually non-git.review --staged(Opus / Codex); their findings on the newmodules were addressed (HOME-isolated the new status tests; tightened the missing-target
script resolution for absolute interpreters and
-c).Known follow-ups (out of scope for this PR; CTO to triage)
primary_config_pathnames the repo file even when thebad key came from the global layer.
rig statushas both drift and a dead target, itreturns
3(drift) and the5(missing-target) signal is masked.errors.ConfigErrorexists butconfig.ConfigErroris still what'sraised for malformed config (so YAML errors still render the old one-line form).
~/.claudehook is surfaced in normal status but not in the non-git + no-catalog path).
🤖 Generated with Claude Code