Skip to content

feat(errors): structured what/why/fix errors + stable exit codes (error-system v2)#27

Open
alex-mextner wants to merge 1 commit into
mainfrom
error-system-v2
Open

feat(errors): structured what/why/fix errors + stable exit codes (error-system v2)#27
alex-mextner wants to merge 1 commit into
mainfrom
error-system-v2

Conversation

@alex-mextner

Copy link
Copy Markdown
Owner

error-system v2 — structured what/why/fix errors + stable exit codes

Born from two same-day prod failures whose errors were thin and undiagnosable:
unknown mcp item(s): review (known: none) (no hint it was a removed slot) and a
dead hook path surfacing only as a generic harness "PreToolUse error". This makes every
rig error a consistent WHAT / WHY / HOW-to-fix block with a stable per-class exit
code
(a public contract, documented in rig --help).

What's in

  • riglib/errors.pyRigError(what/why/fix/exit_code) + per-class codes
    (2 config, 3 drift, 4 unknown-item, 5 missing-target, 6 not-a-repo,
    127 missing-dep), a single top-level guard() renderer, did-you-mean
    (Levenshtein-nearest), and a removed-slot registry (mcp.items.review → names
    agent-tools #32 + the exact fix instead of "known: none").
  • riglib/layers.py — GLOBAL vs REPO classification, single source of truth so
    rig status groups drift by layer and names which config file declares each item.
  • riglib/missing_target.py — proactively scans the harness settings.json for
    hook commands pointing at a script gone from disk (the dead-rtk-hook case). Resolves
    the invoked script (skips bare/absolute interpreters like /usr/bin/env,
    /opt/homebrew/bin/python3; honors -c only as an interpreter flag), checks only that
    token so a runtime output arg never false-flags.
  • cli.py — 3-part error rendering + layer-grouped rig status; a non-git dir
    shows not a git repository — repo layer N/A (never the "should be committed" nag) and
    degrades gracefully when the agent-tools catalog can't be resolved. rig doctor honors
    the documented 127 missing-dependency exit code.
  • plan.py raises the structured UnknownItemError; config.py adds
    primary_config_path for error provenance.

Verification

  • uv run --with pytest python -m pytest tests/ → 504 passed.
  • bash tests/smoke.sh → fully green (incl. the non-git, clean-sample, removed-slot,
    and dead-hook legs). Fixed a latent smoke bug: the non-git test dir was nested under the
    smoke's own git init-ed $TMP, so it wasn't actually non-git.
  • Reviewed across models with review --staged (Opus / Codex); their findings on the new
    modules were addressed (HOME-isolated the new status tests; tightened the missing-target
    script resolution for absolute interpreters and -c).

Known follow-ups (out of scope for this PR; CTO to triage)

  • Per-key config provenance: primary_config_path names the repo file even when the
    bad key came from the global layer.
  • Exit-code class precedence: when rig status has both drift and a dead target, it
    returns 3 (drift) and the 5 (missing-target) signal is masked.
  • The new structured errors.ConfigError exists but config.ConfigError is still what's
    raised for malformed config (so YAML errors still render the old one-line form).
  • The non-git CatalogError fallback skips the missing-target scan (a dead ~/.claude
    hook is surfaced in normal status but not in the non-git + no-catalog path).

🤖 Generated with Claude Code

…or-system v2)

Born from two same-day prod failures whose errors were thin and undiagnosable:
`unknown mcp item(s): review (known: none)` (no hint it was a REMOVED slot) and a
dead hook path surfacing only as a generic harness "PreToolUse error". This makes
every rig error a consistent WHAT / WHY / HOW-to-fix block with a stable per-class
exit code (a PUBLIC CONTRACT documented in `rig --help`).

What's in:
- riglib/errors.py: RigError(what/why/fix/exit_code) + per-class codes (2 config,
  3 drift, 4 unknown-item, 5 missing-target, 6 not-a-repo, 127 missing-dep), a
  single top-level `guard()` renderer, did-you-mean (Levenshtein), and a
  removed-slot registry (mcp.items.review -> names agent-tools #32 + the fix).
- riglib/layers.py: GLOBAL vs REPO classification — single source of truth so
  `rig status` groups drift by layer and names WHICH config file declares each item.
- riglib/missing_target.py: proactively scans the harness settings.json for hook
  commands pointing at a script gone from disk (the dead-rtk-hook case). Resolves the
  invoked SCRIPT (skips bare/absolute interpreters like /usr/bin/env, honors `-c` only
  as an interpreter flag) and checks only that token, so a runtime output arg never
  false-flags.
- cli.py: `rig status` renders the 3-part errors + layer grouping; a non-git dir shows
  "not a git repository — repo layer N/A" (never the "should be committed" nag), and
  degrades gracefully when the agent-tools catalog can't be resolved. plan.py raises
  the structured UnknownItemError. config.py adds primary_config_path for provenance.
  `rig doctor` honors the documented 127 missing-dependency exit code.

Tests: 504 pytest passing; tests/smoke.sh fully green (incl. the non-git + dead-hook
+ removed-slot legs). New: test_errors, test_missing_target, test_plan_errors,
test_status_layers; smoke gains non-git / clean-sample / removed-slot assertions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4de60aaa3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread riglib/config.py
Comment on lines +131 to +132
if self.repo_path is not None:
return self.repo_path

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Track which layer declared unknown items

When both a global config and a repo rig.yaml are loaded, an unknown item can come solely from ~/.config/rig/config.yaml, but this always reports the repo file because repo_path exists. In that scenario the new structured error tells the user to remove a key from ./rig.yaml even though the key is only in the global config, making the fix path misleading for stale global entries such as removed MCP slots. The loader needs per-layer/key provenance (or at least category provenance) instead of assuming the repo file is the offender.

Useful? React with 👍 / 👎.

Comment thread riglib/missing_target.py
Comment on lines +75 to +79
if via_interpreter and tok == "-c":
return []
if _looks_like_script_path(tok):
p = Path(tok).expanduser()
return [tok] if not p.exists() else []

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip module invocations when scanning hook commands

For interpreter commands that use -m, e.g. python3 -m my_hook --out /tmp/result.json, there is no script file to verify, but this loop only special-cases -c and then flags the first absolute argument as a missing hook script if it does not exist. That makes healthy module-based Claude hooks produce false missing-target errors and causes rig status/rig doctor to exit non-zero. Treat -m <module> like an interpreter mode and stop looking for a script path there.

Useful? React with 👍 / 👎.

Comment thread riglib/cli.py
Comment on lines +540 to +543
if dead_targets and report.in_sync:
# no config↔disk drift, but a dead hook reference IS a problem worth a non-zero exit.
print(_dim("\n (no config↔disk drift, but a missing target above needs attention)"))
return errors.EXIT_MISSING_TARGET

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Return missing-target exit code even with drift

This only returns EXIT_MISSING_TARGET when the normal drift report is otherwise clean; if a dead hook path is found alongside any config↔disk drift, the function falls through and returns 3. In that mixed case scripts following the documented stable exit-code contract cannot detect the missing-target class until all unrelated drift is fixed, even though the missing target was found and printed. Give missing targets an explicit precedence (or otherwise document a multi-error precedence) so exit code 5 is not masked by ordinary drift.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant