diff --git a/.specify/feature.json b/.specify/feature.json
index a124639..eb767d7 100644
--- a/.specify/feature.json
+++ b/.specify/feature.json
@@ -1,3 +1,3 @@
 {
-  "feature_directory": "specs/011-remediation-safety-rename"
+  "feature_directory": "specs/012-remediation-safety"
 }
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 08cf521..d80b68e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+
+- A ruleset analyser (`analyseRuleset()`) that assigns every rule in a loaded
+  ruleset a risk level (`low`/`medium`/`high`), a confidence level
+  (`high`/`medium`/`low`), and a derived remediation safety level
+  (`safe`/`humanreview`/`unsafe`), with provenance and a human-readable
+  rationale. See
+  [automated_remediation_safety_algorithm_spec.md](specs/algorithms/automated_remediation_safety_algorithm_spec.md).
+- `--remediation-safety ` (CLI) and the `grade-api-remediation-safety`
+  MCP tool's `level` parameter now accept all three levels — `safe`,
+  `humanreview`, and `unsafe` — instead of only `safe`. Every returned item now
+  also carries `riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, and
+  `staleFingerprintWarning`. `safe` membership is unchanged from prior
+  behavior.
+- A new CLI subcommand, `ruleset-analysis [--ruleset-path ] [--format
+  json|human]`, and a new MCP tool, `analyse-ruleset-safety`, expose the
+  analyser's output independent of grading any specific spec.
+- `ruleset-analysis correct --rule-id --level ` persists a
+  human-confirmed correction for one rule, colocated with the ruleset (or as a
+  personal override when the ruleset's location isn't locally writable), and
+  reloaded automatically on future runs against the same ruleset — including by
+  teammates pointed at the same shared ruleset.
+ ### Changed - **Breaking**: the CLI's `--quick-fixes-only` flag is renamed to diff --git a/CLAUDE.md b/CLAUDE.md index 67e91d7..c2087e1 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,5 +1,5 @@ For additional context about technologies to be used, project structure, shell commands, and other important information, read the current plan -at specs/011-remediation-safety-rename/plan.md +at specs/012-remediation-safety/plan.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 11132b3..dc13d7c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -68,7 +68,7 @@ tests/ |---------|------|-------------| | `@dawmatt/api-grade` | `/` (root) | CLI tool (`api-grade` binary) | | `@dawmatt/api-grade-core` | `packages/api-grade-core/` | Standalone grading library used by all other packages | -| `@dawmatt/api-grade-mcp` | `packages/api-grade-mcp/` | MCP server exposing six AI tools (`grade-api`, `grade-api-detailed`, `assert-api-grade`, `grade-api-quick-fixes-only`, `set-ruleset-config`, `get-ruleset-config`) | +| `@dawmatt/api-grade-mcp` | `packages/api-grade-mcp/` | MCP server exposing seven AI tools (`grade-api`, `grade-api-detailed`, `assert-api-grade`, `grade-api-remediation-safety`, `analyse-ruleset-safety`, `set-ruleset-config`, `get-ruleset-config`) | | `@dawmatt/backstage-plugin-api-grade` | `packages/backstage-plugin-api-grade/` | Backstage frontend card plugin | | `@dawmatt/backstage-plugin-api-grade-backend` | `packages/backstage-plugin-api-grade-backend/` | Backstage backend grading plugin | diff --git a/docs/cli/commands.md b/docs/cli/commands.md index 137c13e..2c47edc 100644 --- a/docs/cli/commands.md +++ b/docs/cli/commands.md @@ -26,7 +26,7 @@ api-grade [options] | `--token ` | GitHub Personal Access Token used to authenticate a remote ruleset fetch (only consulted when `--auth-type github-pat`) | | `--format ` | Output format: `human` (default) or `json` | | `--top ` | Show only the top N diagnostics (useful for large specs) | -| `--remediation-safety ` | 
Filter diagnostics to the given remediation safety level (currently: `safe`) | +| `--remediation-safety ` | Filter diagnostics to the given remediation safety level: `safe`, `humanreview`, or `unsafe` | | `--verbose` | Print the full error stack when a runtime error occurs | | `-V, --version` | Print the version number | | `-h, --help` | Show usage information | @@ -319,9 +319,10 @@ api-grade openapi.yaml --ruleset my-rules.yaml > `diagnosticCounts` wrapper. See [CHANGELOG.md](../../CHANGELOG.md) for the > old → new field mapping. -When using `--format json`, the output is a JSON object with the same flat field -names used by the MCP server's `grade-api` / `grade-api-detailed` tools — one -parser works for both: +When using `--format json`, the output is **pretty-printed** (two-space indented, like every +other end-user-visible JSON document this CLI prints — no compact/minified output) and is a +JSON object with the same flat field names used by the MCP server's `grade-api` / +`grade-api-detailed` tools — one parser works for both: ```json { @@ -351,13 +352,29 @@ parser works for both: "message": "\"version\" property must be string.", "severity": "error", "path": ["info", "version"], - "range": { "start": { "line": 3, "character": 0 }, "end": { "line": 3, "character": 5 } } + "range": { "start": { "line": 3, "character": 0 }, "end": { "line": 3, "character": 5 } }, + "source": "openapi.yaml", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "staleFingerprintWarning": null } ], "rulesetSource": "default" } ``` +Every diagnostic always carries `riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, and +`staleFingerprintWarning` — the same per-violation remediation-safety signals described below +under [Remediation Safety](#remediation-safety---remediation-safety-level) — computed from the +ruleset analyser against the spec's effective ruleset. 
This is **not** gated behind +`--remediation-safety`; it is present on every `--format json` (and, as a one-line +`safety=... risk=... confidence=...` annotation under each finding, every `--format human`) +grading run, so a regular user can see at a glance how risky each fix is without requesting a +separate filtered view. `--remediation-safety ` (below) additionally *filters* the +diagnostics down to one level and reshapes them into `remediationItems`; absent that flag, the +full, unfiltered diagnostic list is annotated in place instead. + `truncated: true` is added only when `--top` actually drops entries from `diagnostics`. `rulesetPath` is added only when a custom ruleset was used. @@ -365,38 +382,63 @@ parser works for both: ## Remediation Safety (`--remediation-safety `) -`--remediation-safety safe` filters diagnostics down to the non-breaking, -safely-automatable subset — the same classification used by the MCP server's -`grade-api-remediation-safety` tool. It is a *filter*, independent of `--format`, so it -works with either output format. Only `safe` is accepted today; any other value is -rejected with a non-zero exit code. +`--remediation-safety ` filters diagnostics down to one of three remediation-safety +levels — the same classification used by the MCP server's `grade-api-remediation-safety` +tool — and is computed by the ruleset analyser (see `ruleset-analysis` below). It is a +*filter*, independent of `--format`, so it works with either output format. 
+ +| Level | Meaning | +|---|---| +| `safe` | Non-breaking, safe to auto-apply without per-change human review | +| `humanreview` | Typically additive/clarifying, but should be confirmed by a human before applying at scale | +| `unsafe` | Could change request/response validation, required fields, types, or the parameter surface — requires human (or explicitly-confirmed agent) review | + +Any other value is rejected with `Error: --remediation-safety must be one of: safe, +humanreview, unsafe.` and a non-zero exit code. **Machine-readable:** ```bash -api-grade openapi.yaml --remediation-safety safe --format json +api-grade openapi.yaml --remediation-safety humanreview --format json ``` +This output is pretty-printed the same as the regular `--format json` output above: + ```json { "specPath": "openapi.yaml", "format": "openapi-3", "totalViolations": 22, - "quickFixCount": 3, - "quickFixes": [ + "requestedLevel": "humanreview", + "remediationItemCount": 3, + "remediationItems": [ { - "ruleId": "info-contact", - "message": "Info object must have \"contact\" object.", + "ruleId": "operation-operationId", + "message": "Operation must have \"operationId\".", "severity": "warn", - "path": ["info"], - "location": "info", + "path": ["paths", "/pets", "get"], + "location": "paths./pets.get", + "range": { "start": { "line": 11, "character": 2 }, "end": { "line": 11, "character": 5 } }, "currentValue": null, - "expectedImprovement": "Add a `contact` object to the info block with name, email, or url" + "expectedImprovement": "Fix: Operation must have \"operationId\". 
Add or update `operationId` as required", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "staleFingerprintWarning": null } ] } ``` +Each item carries `riskLevel` (`low`/`medium`/`high`) and `confidenceLevel` +(`high`/`medium`/`low`) alongside `remediationSafetyLevel` — the field +`--remediation-safety`/`requestedLevel` filters against — and a `staleFingerprintWarning` +that is non-null only when a human-assessed rule classification's underlying rule +definition has since changed (see `ruleset-analysis` below). `severity` is the diagnostic's +actual severity (`error`/`warn`/`info`/`hint`, not a fixed placeholder), and `range` carries +the same line/character location as the regular diagnostics output — both are required to +act on a remediation item without losing the line-number context a linter normally provides. + **Human-readable** (default, or with `--format human`): ```bash @@ -405,11 +447,42 @@ api-grade openapi.yaml --remediation-safety safe Prints the same filtered list as readable text instead of JSON. -`--remediation-safety safe` has no effect on `--min-grade` — the gate still evaluates the +`--remediation-safety ` has no effect on `--min-grade` — the gate still evaluates the spec's actual letter grade from the full, unfiltered diagnostics. --- +## Ruleset Analysis (`ruleset-analysis`) + +Inspects a ruleset's per-rule remediation-safety analysis independent of grading any spec: + +```bash +api-grade ruleset-analysis --format human +api-grade ruleset-analysis --ruleset-path ./my-ruleset.yaml --format json +``` + +`--format human` (default) prints a table with rule id, risk level, confidence level, +remediation safety level, assessed by (`human`/`automated`), and rationale — plus a +fingerprint-mismatch warning line for any human-assessed rule whose underlying definition +has changed since it was last reviewed. `--format json` returns the full `RulesetAnalysis` +document. 
Without `--ruleset-path`, analyses the built-in ruleset. + +To persist a human-confirmed correction for one rule (reloaded automatically on future runs +against the same ruleset): + +```bash +api-grade ruleset-analysis correct --rule-id operation-operationId --level safe \ + --ruleset-path ./my-ruleset.yaml +``` + +For a local ruleset, this writes a colocated `.remediation-safety.json` file next +to the ruleset (commit it so your team shares the same judgements). For a non-writable +ruleset location (e.g. a GitHub-hosted ruleset, or the built-in ruleset), the correction is +recorded locally as a personal override instead, and the equivalent shared-file content is +printed for you to commit yourself. + +--- + ## Structured `--min-grade` Outcome in JSON Mode When `--min-grade ` is combined with `--format json`, the CLI prints a diff --git a/docs/getting-started.md b/docs/getting-started.md index 1398a7c..21b5b6e 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -55,7 +55,7 @@ Two Backstage plugin packages that display API grades directly on your Backstage ### MCP Server (`@dawmatt/api-grade-mcp`) -An MCP (Model Context Protocol) server that exposes api-grade as six AI tools: `grade-api`, `grade-api-detailed`, `assert-api-grade`, `grade-api-remediation-safety`, `set-ruleset-config`, and `get-ruleset-config`. Register it in Claude Code, GitHub Copilot (VS Code Agent mode), or any MCP-compatible AI host and let the AI grade specs directly. +An MCP (Model Context Protocol) server that exposes api-grade as seven AI tools: `grade-api`, `grade-api-detailed`, `assert-api-grade`, `grade-api-remediation-safety`, `analyse-ruleset-safety`, `set-ruleset-config`, and `get-ruleset-config`. Register it in Claude Code, GitHub Copilot (VS Code Agent mode), or any MCP-compatible AI host and let the AI grade specs directly. 
```bash claude mcp add api-grade -- npx -y @dawmatt/api-grade-mcp diff --git a/docs/index.md b/docs/index.md index a0300d9..e6f3509 100644 --- a/docs/index.md +++ b/docs/index.md @@ -15,9 +15,9 @@ | [Package Usage Guide](package/usage-guide.md) | Common integration patterns and worked examples | | [Package API Reference](package/api-reference.md) | All exported functions, classes, and types | | [API Diagnostic Algorithm Specification](../specs/algorithms/api_diagnostic_algorithm_spec.md) | How scores, grades, and recommendations are computed | -| [Quick-Fixes Algorithm Specification](../specs/algorithms/quick_fixes_algorithm_spec.md) | How non-breaking, safely-automatable violations are identified | +| [Automated Remediation Safety Algorithm Specification](../specs/algorithms/automated_remediation_safety_algorithm_spec.md) | How risk, confidence, and remediation safety are determined per rule | | [MCP Server](mcp/README.md) | Grade specs from AI tools via MCP | -| [MCP Server Overview](package/api-grade-mcp.md) | All six MCP tools and their inputs/outputs | +| [MCP Server Overview](package/api-grade-mcp.md) | All MCP tools and their inputs/outputs | | [MCP Quick Start](mcp/quick-start.md) | Install and configure the MCP server in minutes | | [MCP Configuration Reference](mcp/configuration.md) | Default rulesets, auth, and scope precedence | | [MCP GitHub Token Setup](mcp/github-pat-setup.md) | One-time GitHub PAT creation for `github-pat` ruleset auth | diff --git a/docs/mcp/quick-start.md b/docs/mcp/quick-start.md index 9d1d4c0..fee6505 100644 --- a/docs/mcp/quick-start.md +++ b/docs/mcp/quick-start.md @@ -175,7 +175,8 @@ Reload Cursor after saving. 
| `grade-api` | Quick grade: letter grade, numeric score, and summary | | `grade-api-detailed` | Full grade with all violations, diagnostics, and recommendations | | `assert-api-grade` | Pass/fail assertion for a minimum grade threshold | -| `grade-api-remediation-safety` | Classified list of diagnostics filtered by remediation safety level (`safe`: non-breaking improvements) for AI-assisted correction | +| `grade-api-remediation-safety` | Classified list of diagnostics filtered by remediation safety level (`safe`, `humanreview`, or `unsafe`), each with a risk/confidence indicator, for AI-assisted correction | +| `analyse-ruleset-safety` | Per-rule risk, confidence, and remediation-safety analysis for a ruleset, independent of grading any spec | | `set-ruleset-config` | Set the default Spectral ruleset at session, workspace, or global scope | | `get-ruleset-config` | Get the active Spectral ruleset and which scope is effective | @@ -207,7 +208,7 @@ To confirm the server starts correctly: echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | npx -y @dawmatt/api-grade-mcp ``` -You should see a JSON response listing all six tools. +You should see a JSON response listing all the tools above. 
--- diff --git a/docs/package/README.md b/docs/package/README.md index 12170d1..c0c43b1 100644 --- a/docs/package/README.md +++ b/docs/package/README.md @@ -79,6 +79,6 @@ After building, the `api-grade-core` package is available under `packages/api-gr - [Usage Guide](usage-guide.md) — common integration patterns and worked examples - [API Reference](api-reference.md) — all exported functions, classes, and types - [API Diagnostic Algorithm Specification](../../specs/algorithms/api_diagnostic_algorithm_spec.md) — how scores, grades, and recommendations are computed -- [Quick-Fixes Algorithm Specification](../../specs/algorithms/quick_fixes_algorithm_spec.md) — how non-breaking, safely-automatable violations are identified +- [Automated Remediation Safety Algorithm Specification](../../specs/algorithms/automated_remediation_safety_algorithm_spec.md) — how risk, confidence, and remediation safety are determined per rule - [Documentation Index](../index.md) — full navigation across all docs - [CLI Tool](../cli/README.md) — use api-grade from the command line diff --git a/docs/package/api-grade-mcp.md b/docs/package/api-grade-mcp.md index 79f00fc..9fbd19b 100644 --- a/docs/package/api-grade-mcp.md +++ b/docs/package/api-grade-mcp.md @@ -121,14 +121,24 @@ Assert that an API specification meets a minimum grade threshold (A > B > C > D ### `grade-api-remediation-safety` -Return a classified, AI-actionable list of diagnostics filtered by remediation safety level. The `safe` level covers improvements that can be made via non-breaking changes (those that do not alter paths, methods, required parameters, schema types, or response structures). Each result includes `ruleId`, `path`, `location`, `currentValue`, and `expectedImprovement`. 
+Return a classified, AI-actionable list of diagnostics filtered by remediation safety level: `safe` (non-breaking, safe to auto-apply), `humanreview` (typically additive/clarifying but should be confirmed by a human before applying at scale), or `unsafe` (could change request/response validation, required fields, types, or the parameter surface — requires human or explicitly-confirmed-agent review). Each result includes `ruleId`, `severity`, `path`, `location`, `range`, `currentValue`, `expectedImprovement`, and a confidence indicator (`riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `staleFingerprintWarning`) — `severity` and `range` are carried over unchanged from the underlying diagnostic, so no line-number/severity context is lost relative to `grade-api-detailed`. -**Input**: `specPath` (required), `level` (required: `safe`), `rulesetPath` (optional), `recoveryOption` (optional) +**Input**: `specPath` (required), `level` (required: `safe`/`humanreview`/`unsafe`), `rulesetPath` (optional), `recoveryOption` (optional) **Use when**: Asking the AI to generate fixes for documentation and metadata issues without risking breaking changes. Use this tool instead of `grade-api-detailed` when the goal is AI-assisted correction. --- +### `analyse-ruleset-safety` + +Inspect a Spectral ruleset's per-rule remediation-safety analysis (`riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `assessedBy`, `rationale`) without grading any specific API specification. Returns the same `RulesetAnalysis` document the CLI's `ruleset-analysis` subcommand produces. + +**Input**: `rulesetPath` (optional), `recoveryOption` (optional) + +**Use when**: You want to understand how risky it would be to auto-remediate violations of each rule in a ruleset before running `grade-api-remediation-safety` against a real spec. + +--- + ### `set-ruleset-config` Set the default Spectral ruleset at session, workspace, or global scope. 
The configured default applies to all subsequent grading requests without needing to supply `rulesetPath` each time. diff --git a/docs/package/api-reference.md b/docs/package/api-reference.md index a77c082..9e2e509 100644 --- a/docs/package/api-reference.md +++ b/docs/package/api-reference.md @@ -67,38 +67,47 @@ console.log(result.numericScore); // 74 --- -## `formatJson(result: GradeResult): string` +## `formatJson(result: GradeResult, top?: number, rulesetAnalysis?: RulesetAnalysis): string` -Serialises a `GradeResult` to a JSON string suitable for machine-readable output. The output shape matches the `--format json` CLI output. +Serialises a `GradeResult` to a **pretty-printed** (two-space indented) JSON string suitable +for both machine-readable and human-readable output — every JSON document this package emits +is pretty-printed, never minified. The output shape matches the `--format json` CLI output. **Parameters:** | Name | Type | Required | Description | |------|------|----------|-------------| | `result` | `GradeResult` | Yes | The result returned by `grade()` or `gradeContent()` | +| `top` | `number` | No | Truncate `diagnostics` to the first N entries (sets `truncated: true` if entries were dropped) | +| `rulesetAnalysis` | `RulesetAnalysis` | No | When supplied (see `analyseRuleset()` below), each diagnostic is decorated in place with `riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, and `staleFingerprintWarning` — the same remediation-safety signals `buildRemediationSafetyOutput()` filters on, but applied to every diagnostic, not just one level | -**Returns:** `string` — a formatted JSON string +**Returns:** `string` — a pretty-printed JSON string **Example:** ```typescript -import { GradeEngine, formatJson } from '@dawmatt/api-grade-core'; +import { GradeEngine, formatJson, analyseRuleset, loadRuleset } from '@dawmatt/api-grade-core'; const engine = new GradeEngine(); const result = await engine.grade({ specPath: './openapi.yaml' }); 
-console.log(formatJson(result)); +const rulesetAnalysis = await analyseRuleset(await loadRuleset(result.format, result.rulesetPath)); +console.log(formatJson(result, undefined, rulesetAnalysis)); // diagnostics include safety info ``` --- -## `formatHuman(result: GradeResult): string` +## `formatHuman(result: GradeResult, top?: number, rulesetAnalysis?: RulesetAnalysis): string` -Serialises a `GradeResult` to a human-readable text string. The output matches the default CLI output. +Serialises a `GradeResult` to a human-readable text string. The output matches the default CLI +output. When `rulesetAnalysis` is supplied, a `safety=... risk=... confidence=...` line is +printed under each diagnostic, same as `formatJson`'s decoration. **Parameters:** | Name | Type | Required | Description | |------|------|----------|-------------| | `result` | `GradeResult` | Yes | The result returned by `grade()` or `gradeContent()` | +| `top` | `number` | No | Show only the first N diagnostics | +| `rulesetAnalysis` | `RulesetAnalysis` | No | When supplied, annotates each printed diagnostic with its remediation-safety signals | **Returns:** `string` — a formatted human-readable report @@ -112,7 +121,7 @@ Serialises a `GradeResult` to a human-readable text string. The output matches t > these exact field names — see [CLI Commands](../cli/commands.md#json-output-schema) > and [MCP Server Tool Reference](api-grade-mcp.md) for where each shape is used. -### `buildCommonGradeOutput(result: GradeResult, options?: { top?: number }): CommonGradeOutput` +### `buildCommonGradeOutput(result: GradeResult, options?: { top?: number; rulesetAnalysis?: RulesetAnalysis }): CommonGradeOutput` Shapes a `GradeResult` for "grade a spec, give me everything" output. Used by the CLI's `--format json`, MCP's `grade-api`, and MCP's `grade-api-detailed`. 
@@ -125,13 +134,27 @@ interface CommonGradeOutput { gradeLabel: GradeLabel; numericScore: number; summary: DiagnosticSummary; - diagnostics: Diagnostic[]; + diagnostics: Diagnostic[] | DiagnosticWithSafety[]; truncated?: boolean; // present only when `options.top` actually dropped entries rulesetSource: 'default' | 'custom'; rulesetPath?: string; // present only when a custom ruleset was used } + +interface DiagnosticWithSafety extends Diagnostic { + riskLevel: RiskLevel | null; + confidenceLevel: ConfidenceLevel; + remediationSafetyLevel: RemediationSafetyLevel; + staleFingerprintWarning: StaleFingerprintWarning | null; +} ``` +`diagnostics` is `DiagnosticWithSafety[]` whenever `options.rulesetAnalysis` is supplied — +each entry is the original `Diagnostic` plus the four remediation-safety fields, looked up via +`getRemediationSafety()` (below) — and plain `Diagnostic[]` otherwise. Callers that always +want safety info on regular grade output (not just the `--remediation-safety`-filtered view) +should always pass `rulesetAnalysis`; the CLI's `--format json`/`--format human` paths do this +unconditionally. + Tool-specific data (e.g. MCP's `largeSpecWarning`, `recoveryOptions`) is layered additively on top of this shape by the consuming package — it is never renamed or restructured. @@ -151,43 +174,95 @@ interface AssertOutput { } ``` -### `buildQuickFixOutput(result: GradeResult, specContent: string): QuickFixOutput` +### `analyseRuleset(loadedRuleset: LoadedRuleset, options?: { auth?: AuthConfig | null }): Promise` + +Computes a per-rule remediation-safety analysis for a loaded ruleset — risk level, +confidence level, and the derived remediation safety level for every rule, with +provenance (`assessedBy`, `source`) and a rationale. Checks persisted/bundled stores +(workspace override → global override → colocated shared analysis → bundled default for +the built-in ruleset) before falling through to the automated heuristic. 
See the +[Automated Remediation Safety Algorithm Specification](../../specs/algorithms/automated_remediation_safety_algorithm_spec.md) +for the full algorithm. + +```typescript +interface RulesetAnalysis { + rulesetSource: 'default' | 'custom'; + rulesetPath?: string; + rules: RuleAnalysis[]; +} + +interface RuleAnalysis { + ruleId: string; + riskLevel: RiskLevel | null; // "low" | "medium" | "high" + confidenceLevel: ConfidenceLevel; // "high" | "medium" | "low" + remediationSafetyLevel: RemediationSafetyLevel; // "safe" | "humanreview" | "unsafe" + assessedBy: AssessmentOrigin; // "human" | "automated" + staleFingerprintWarning: StaleFingerprintWarning | null; + rationale: string; + source: AnalysisSource; // "persisted" | "bundled-default" | "heuristic" | "fallback" +} +``` + +### `getRemediationSafety(diagnostic: Diagnostic, rulesetAnalysis: RulesetAnalysis)` + +Looks up a single violation's remediation safety against a previously computed +`RulesetAnalysis`, by `ruleId`. Defaults to `{ riskLevel: "high", confidenceLevel: "low", +remediationSafetyLevel: "unsafe", staleFingerprintWarning: null }` when the rule isn't +covered by the analysis. -Shapes the "safely-automatable fixes" subset. Used by MCP's -`grade-api-remediation-safety` and the CLI's `--remediation-safety safe --format json`. +### `buildRemediationSafetyOutput(result: GradeResult, specContent: string, rulesetAnalysis: RulesetAnalysis, requestedLevel: RemediationSafetyLevel): RemediationSafetyOutput` + +Shapes the diagnostics matching one remediation-safety level. Used by MCP's +`grade-api-remediation-safety` and the CLI's `--remediation-safety --format json`. 
```typescript -interface QuickFixOutput { +interface RemediationSafetyOutput { specPath: string; format: ApiFormat; totalViolations: number; - quickFixCount: number; - quickFixes: QuickFix[]; + remediationItemCount: number; + remediationItems: RemediationItem[]; + requestedLevel: RemediationSafetyLevel; } -interface QuickFix { +interface RemediationItem { ruleId: string; message: string; - severity: string; + severity: DiagnosticSeverity; // "error" | "warn" | "info" | "hint" — the diagnostic's actual severity path: string[]; location: string; // dot-joined `path` + range: Diagnostic['range']; // line/character location, carried over from the source diagnostic currentValue: string | null; expectedImprovement: string; + riskLevel: RiskLevel | null; + confidenceLevel: ConfidenceLevel; + remediationSafetyLevel: RemediationSafetyLevel; + staleFingerprintWarning: StaleFingerprintWarning | null; } ``` -### `formatQuickFixesHuman(result: GradeResult, specContent: string): string` +`severity` and `range` are carried over unchanged from the underlying `Diagnostic` — a +`RemediationItem` is never missing the line-number/severity context a regular diagnostic has, +even though it's filtered down to one remediation-safety level and reshaped with +remediation-specific fields (`location`, `currentValue`, `expectedImprovement`). + +The JSON returned by `buildRemediationSafetyOutput()` is pretty-printed by every caller +(`JSON.stringify(output, null, 2)`), matching `formatJson()`'s output style — it is not +minified. -Renders the same filtered `QuickFix[]` list used by `buildQuickFixOutput()` as -human-readable text. Used by the CLI's `--remediation-safety safe` with `--format human` -(the default). 
+### `formatRemediationSafetyHuman(result: GradeResult, specContent: string, rulesetAnalysis: RulesetAnalysis, requestedLevel: RemediationSafetyLevel): string` -### `classifyViolation(diagnostic: Diagnostic): ViolationClass` +Renders the same filtered `RemediationItem[]` list used by `buildRemediationSafetyOutput()` +as human-readable text, including each item's line number (`Line N`) when `range` is present. +Used by the CLI's `--remediation-safety ` with `--format human` (the default). -Classifies a single diagnostic as `'nonBreaking' | 'breaking' | 'unknown'`. The -classification basis for `buildQuickFixOutput()`'s filtering. See the -[Quick-Fixes Algorithm Specification](../../specs/algorithms/quick_fixes_algorithm_spec.md) -for the full rationale behind which violations are classified which way. +### `persistRuleAnalysisCorrection(loadedRuleset, ruleId, remediationSafetyLevel, scope?)` + +Persists a human-confirmed remediation-safety correction for one rule, written to the +colocated shared analysis file (default, for a writable local ruleset) or a personal +override (workspace/global scope, or as a fallback for a non-writable remote/built-in +ruleset location). Reloaded automatically by `analyseRuleset()` on future runs against the +same ruleset. --- @@ -282,6 +357,11 @@ interface Diagnostic { } ``` +See `DiagnosticWithSafety` (under `buildCommonGradeOutput` above) for the shape a `Diagnostic` +takes on once decorated with remediation-safety fields, and `RemediationItem` (under +`buildRemediationSafetyOutput` above) for the shape it takes on once filtered to one +remediation-safety level — both preserve `severity` and `range` unchanged from this base type. 
+
 ---
 
 ### `RuleMetadata`
@@ -305,8 +385,8 @@ interface RuleMetadata {
 
 - [Usage Guide](usage-guide.md) — common patterns and worked examples
 - [Package Overview](README.md) — installation and minimal usage
-- [MCP Server Tool Reference](api-grade-mcp.md) — all six MCP tools including `recoveryOption`
+- [MCP Server Tool Reference](api-grade-mcp.md) — all MCP tools including `recoveryOption`
 - [CLI Commands](../cli/commands.md#json-output-schema) — CLI-specific usage of the JSON Output Schema above
 - [API Diagnostic Algorithm Specification](../../specs/algorithms/api_diagnostic_algorithm_spec.md) — full scoring/grading/recommendation algorithm
-- [Quick-Fixes Algorithm Specification](../../specs/algorithms/quick_fixes_algorithm_spec.md) — full non-breaking-vs-breaking classification algorithm
+- [Automated Remediation Safety Algorithm Specification](../../specs/algorithms/automated_remediation_safety_algorithm_spec.md) — full risk/confidence/remediation-safety classification algorithm
 - [Documentation Index](../index.md) — full navigation across all docs
diff --git a/eslint.config.mjs b/eslint.config.mjs
index 9a6f025..40092db 100644
--- a/eslint.config.mjs
+++ b/eslint.config.mjs
@@ -26,7 +26,7 @@ export default tseslint.config(
       'node_modules/**',
       'coverage/**',
       'packages/*/coverage/**',
-      'scripts/**',
+      '**/scripts/**',
       '**/*.config.ts',
       '**/*.config.mjs',
     ],
diff --git a/package-lock.json b/package-lock.json
index 930237b..df61400 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -1,12 +1,12 @@
 {
   "name": "@dawmatt/api-grade",
-  "version": "1.0.0",
+  "version": "0.4.0",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "@dawmatt/api-grade",
-      "version": "1.0.0",
+      "version": "0.4.0",
       "license": "MIT",
       "workspaces": [
         "packages/*"
@@ -23379,7 +23379,7 @@
     },
     "packages/api-grade-core": {
       "name": "@dawmatt/api-grade-core",
-      "version": "1.0.0",
+      "version": "0.4.0",
       "license": "MIT",
       "dependencies": {
         "@stoplight/spectral-core": "^1.23.0",
@@ -23402,7 +23402,7 @@
     },
     "packages/api-grade-mcp": {
       "name": "@dawmatt/api-grade-mcp",
-      "version": "1.0.0",
+      "version": "0.4.0",
       "license": "MIT",
       "dependencies": {
         "@dawmatt/api-grade-core": "*",
@@ -23424,7 +23424,7 @@
     },
     "packages/backstage-plugin-api-grade": {
       "name": "@dawmatt/backstage-plugin-api-grade",
-      "version": "1.0.0",
+      "version": "0.4.0",
       "license": "MIT",
       "dependencies": {
         "@dawmatt/api-grade-core": "*"
@@ -23459,7 +23459,7 @@
     },
     "packages/backstage-plugin-api-grade-backend": {
       "name": "@dawmatt/backstage-plugin-api-grade-backend",
-      "version": "1.0.0",
+      "version": "0.4.0",
       "license": "MIT",
       "dependencies": {
         "@dawmatt/api-grade-core": "*",
diff --git a/package.json b/package.json
index ad35b77..f1be86a 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@dawmatt/api-grade",
-  "version": "0.4.0",
+  "version": "0.5.0",
   "description": "Grade API quality and share diagnostics using Spectral-compatible linting",
   "keywords": [
     "api",
diff --git a/packages/api-grade-core/package.json b/packages/api-grade-core/package.json
index e2f99db..31a532e 100644
--- a/packages/api-grade-core/package.json
+++ b/packages/api-grade-core/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@dawmatt/api-grade-core",
-  "version": "0.4.0",
+  "version": "0.5.0",
   "description": "Core grading library for api-grade — standalone, framework-agnostic",
   "keywords": [
     "api",
@@ -31,6 +31,7 @@
   },
   "scripts": {
     "build": "tsc",
+    "postbuild": "node scripts/copy-bundled-analysis.mjs",
     "test": "vitest run",
     "test:watch": "vitest",
     "test:coverage": "vitest run --coverage",
diff --git a/packages/api-grade-core/scripts/copy-bundled-analysis.mjs b/packages/api-grade-core/scripts/copy-bundled-analysis.mjs
new file mode 100644
index 0000000..696c750
--- /dev/null
+++ b/packages/api-grade-core/scripts/copy-bundled-analysis.mjs
@@ -0,0 +1,10 @@
+import { cpSync, mkdirSync } from 'node:fs';
+import { dirname, join } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const srcDir = join(__dirname, '..', 'src', 'rulesets', 'bundled-analysis');
+const destDir = join(__dirname, '..', 'dist', 'rulesets', 'bundled-analysis');
+
+mkdirSync(destDir, { recursive: true });
+cpSync(srcDir, destDir, { recursive: true, filter: (src) => src.endsWith('.json') || !src.includes('.') });
diff --git a/packages/api-grade-core/scripts/generate-bundled-analysis.mjs b/packages/api-grade-core/scripts/generate-bundled-analysis.mjs
new file mode 100644
index 0000000..9093779
--- /dev/null
+++ b/packages/api-grade-core/scripts/generate-bundled-analysis.mjs
@@ -0,0 +1,75 @@
+// Maintenance utility: regenerates src/rulesets/bundled-analysis/{openapi,asyncapi}.json by
+// running the real analyseRuleset() Stage 1/2 engine (dist/remediation-safety.js) over the
+// entire built-in OpenAPI/AsyncAPI rulesets, so the built-in ruleset's analysis never requires
+// per-rule computation at request time (SC-007).
+//
+// Run manually after bumping @stoplight/spectral-rulesets or after changing the analyser's
+// heuristic. Requires `npm run build` to have run first (reads dist/).
+//
+// Human review (FR-020) is recorded directly in the JSON output files, not in this script: edit
+// an entry's assessedBy to "human" (and set remediationSafetyLevel/rationale to match the
+// reviewer's conclusion) after actually reading that rule's definition. This script reads the
+// existing JSON before writing and leaves any "human" entry untouched — it only recomputes
+// entries that are still "automated". If a left-alone human entry's fingerprint no longer
+// matches the rule's current definition, the rule changed since it was reviewed; this script
+// prints those as "stale rules to consider for re-review" rather than silently recalculating
+// them.
+import { existsSync, readFileSync, writeFileSync } from 'node:fs'; +import { dirname, join } from 'node:path'; +import { fileURLToPath } from 'node:url'; +import { oas, asyncapi } from '@stoplight/spectral-rulesets'; +import { analyseRuleset, computeRuleFingerprint } from '../dist/remediation-safety.js'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const outDir = join(__dirname, '..', 'src', 'rulesets', 'bundled-analysis'); + +async function generate(ruleset, fileName) { + const outPath = join(outDir, fileName); + const existing = existsSync(outPath) ? JSON.parse(readFileSync(outPath, 'utf-8')) : { rules: {} }; + + // rulesetSource: 'custom' (not 'default') so analyseRuleset() doesn't try to read this very + // file via its own Stage 0 bundled-lookup branch while we're regenerating it. + const loadedRuleset = { ruleset, rulesetSource: 'custom' }; + const analysis = await analyseRuleset(loadedRuleset); + + const rules = {}; + const staleHumanRuleIds = []; + + for (const ruleAnalysis of analysis.rules) { + const { ruleId } = ruleAnalysis; + const fingerprint = computeRuleFingerprint(ruleId, ruleset.rules[ruleId]); + const existingEntry = existing.rules[ruleId]; + + if (existingEntry?.assessedBy === 'human') { + rules[ruleId] = existingEntry; + if (existingEntry.fingerprint !== fingerprint) { + staleHumanRuleIds.push(ruleId); + } + continue; + } + + rules[ruleId] = { + ruleId, + riskLevel: ruleAnalysis.riskLevel, + confidenceLevel: ruleAnalysis.confidenceLevel, + remediationSafetyLevel: ruleAnalysis.remediationSafetyLevel, + assessedBy: 'automated', + staleFingerprintWarning: null, + rationale: ruleAnalysis.rationale, + source: 'bundled-default', + fingerprint, + }; + } + + writeFileSync(outPath, JSON.stringify({ rules }, null, 2) + '\n', 'utf-8'); + console.log(`Wrote ${outPath} (${Object.keys(rules).length} entries)`); + if (staleHumanRuleIds.length > 0) { + console.log(` Stale rules to consider for re-review (human-reviewed, definition changed 
since):`); + for (const ruleId of staleHumanRuleIds) { + console.log(` - ${ruleId}`); + } + } +} + +await generate(oas, 'openapi.json'); +await generate(asyncapi, 'asyncapi.json'); diff --git a/packages/api-grade-core/src/config/personal-ruleset-override.ts b/packages/api-grade-core/src/config/personal-ruleset-override.ts new file mode 100644 index 0000000..68ff683 --- /dev/null +++ b/packages/api-grade-core/src/config/personal-ruleset-override.ts @@ -0,0 +1,46 @@ +import { readFile, writeFile, mkdir } from 'node:fs/promises'; +import { dirname, join } from 'node:path'; +import { homedir } from 'node:os'; +import type { PersonalRulesetAnalysisOverride } from '../types.js'; +import { ConfigWriteError } from './ruleset-config.js'; + +export function getWorkspaceOverridePath(): string { + return join(process.cwd(), '.api-grade', 'ruleset-analysis-override.json'); +} + +export function getGlobalOverridePath(): string { + return join(homedir(), '.api-grade', 'ruleset-analysis-override.json'); +} + +async function loadOverride(filePath: string): Promise<PersonalRulesetAnalysisOverride | null> { + try { + const data = await readFile(filePath, 'utf-8'); + return JSON.parse(data) as PersonalRulesetAnalysisOverride; + } catch { + return null; + } +} + +export async function loadWorkspaceRulesetAnalysisOverride(): Promise<PersonalRulesetAnalysisOverride | null> { + return loadOverride(getWorkspaceOverridePath()); +} + +export async function loadGlobalRulesetAnalysisOverride(): Promise<PersonalRulesetAnalysisOverride | null> { + return loadOverride(getGlobalOverridePath()); +} + +export async function saveRulesetAnalysisOverride( + scope: 'workspace' | 'global', + override: PersonalRulesetAnalysisOverride +): Promise<void> { + const filePath = scope === 'workspace' ? getWorkspaceOverridePath() : getGlobalOverridePath(); + try { + await mkdir(dirname(filePath), { recursive: true }); + await writeFile(filePath, JSON.stringify(override, null, 2), 'utf-8'); + } catch (err) { + throw new ConfigWriteError( + `Could not write ${scope} ruleset analysis override to ${filePath}: ${err instanceof Error ? 
err.message : String(err)}`, + err + ); + } +} diff --git a/packages/api-grade-core/src/config/shared-ruleset-analysis.ts b/packages/api-grade-core/src/config/shared-ruleset-analysis.ts new file mode 100644 index 0000000..a038d63 --- /dev/null +++ b/packages/api-grade-core/src/config/shared-ruleset-analysis.ts @@ -0,0 +1,56 @@ +import { readFile, writeFile } from 'node:fs/promises'; +import { existsSync } from 'node:fs'; +import type { AuthConfig, SharedRulesetAnalysis } from '../types.js'; +import { fetchRulesetContent, INITIAL_FETCH_TIMEOUT_MS } from '../auth/github.js'; + +const SHARED_ANALYSIS_SUFFIX = '.remediation-safety.json'; + +// Colocated location, derived deterministically from the ruleset's own path/URL — never a +// separately-tracked or registered location (FR-016/FR-017). +export function deriveSharedAnalysisLocation(rulesetPathOrUrl: string): string { + return `${rulesetPathOrUrl}${SHARED_ANALYSIS_SUFFIX}`; +} + +export async function loadLocalSharedRulesetAnalysis(rulesetPath: string): Promise<SharedRulesetAnalysis | null> { + const location = deriveSharedAnalysisLocation(rulesetPath); + if (!existsSync(location)) return null; + try { + const data = await readFile(location, 'utf-8'); + return JSON.parse(data) as SharedRulesetAnalysis; + } catch { + return null; + } +} + +export async function saveLocalSharedRulesetAnalysis( + rulesetPath: string, + analysis: SharedRulesetAnalysis +): Promise<void> { + const location = deriveSharedAnalysisLocation(rulesetPath); + await writeFile(location, JSON.stringify(analysis, null, 2), 'utf-8'); +} + +// Read-only for a GitHub-hosted ruleset — reuses the same resolution/auth flow already used to +// fetch the ruleset itself (FR-017); never written automatically (FR-019). +export async function loadRemoteSharedRulesetAnalysis( + rulesetUrl: string, + auth: AuthConfig | null +): Promise<SharedRulesetAnalysis | null> { + const location = deriveSharedAnalysisLocation(rulesetUrl); + try { + const token = auth?.type === 'github-pat' ? auth.githubToken ?? 
process.env.GITHUB_TOKEN : undefined; + const content = await fetchRulesetContent(location, token, INITIAL_FETCH_TIMEOUT_MS); + return JSON.parse(content) as SharedRulesetAnalysis; + } catch { + return null; + } +} + +export async function loadSharedRulesetAnalysis( + rulesetPath: string | undefined, + auth: AuthConfig | null +): Promise<SharedRulesetAnalysis | null> { + if (!rulesetPath) return null; + if (rulesetPath.startsWith('http')) return loadRemoteSharedRulesetAnalysis(rulesetPath, auth); + return loadLocalSharedRulesetAnalysis(rulesetPath); +} diff --git a/packages/api-grade-core/src/formatter.ts b/packages/api-grade-core/src/formatter.ts index a075ef8..91f5698 100644 --- a/packages/api-grade-core/src/formatter.ts +++ b/packages/api-grade-core/src/formatter.ts @@ -1,6 +1,7 @@ import chalk from 'chalk'; -import type { GradeResult, DiagnosticSeverity } from './types.js'; +import type { GradeResult, DiagnosticSeverity, RulesetAnalysis } from './types.js'; import { buildCommonGradeOutput } from './json-output.js'; +import { getRemediationSafety } from './remediation-safety.js'; const SEVERITY_COLORS: Record string> = { error: chalk.red, @@ -9,7 +10,7 @@ const SEVERITY_COLORS: Record string> = { hint: chalk.gray, }; -export function formatHuman(result: GradeResult, top?: number): string { +export function formatHuman(result: GradeResult, top?: number, rulesetAnalysis?: RulesetAnalysis): string { const lines: string[] = []; // Section 1: Grade line @@ -59,6 +60,12 @@ export function formatHuman(result: GradeResult, top?: number): string { ` ${color(d.severity.padEnd(5))} ${d.ruleId.padEnd(42)} ${pathStr}${lineNum}` ); lines.push(` ${d.message}`); + if (rulesetAnalysis) { + const safety = getRemediationSafety(d, rulesetAnalysis); + lines.push( + ` safety=${safety.remediationSafetyLevel} risk=${safety.riskLevel ?? 
'n/a'} confidence=${safety.confidenceLevel}` + ); + } } if (remaining > 0) { @@ -73,7 +80,7 @@ export function formatHuman(result: GradeResult, top?: number): string { return lines.join('\n'); } -export function formatJson(result: GradeResult, top?: number): string { - const output = buildCommonGradeOutput(result, { top }); +export function formatJson(result: GradeResult, top?: number, rulesetAnalysis?: RulesetAnalysis): string { + const output = buildCommonGradeOutput(result, { top, rulesetAnalysis }); return JSON.stringify(output, null, 2); } diff --git a/packages/api-grade-core/src/index.ts b/packages/api-grade-core/src/index.ts index 8dc6510..253794e 100644 --- a/packages/api-grade-core/src/index.ts +++ b/packages/api-grade-core/src/index.ts @@ -3,7 +3,20 @@ export { formatHuman, formatJson } from './formatter.js'; export { computeScore, LETTER_GRADE_ORDER, gradeToNumber } from './scorer.js'; export { extractCategory } from './types.js'; export { buildCommonGradeOutput, buildAssertOutput } from './json-output.js'; -export { classifyViolation, buildQuickFix, buildQuickFixOutput, formatQuickFixesHuman } from './quick-fixes.js'; +export { + analyseRuleset, + getRemediationSafety, + buildRemediationItem, + buildRemediationSafetyOutput, + formatRemediationSafetyHuman, + decisionMatrix, + computeRuleFingerprint, + persistRuleAnalysisCorrection, +} from './remediation-safety.js'; +export type { + PersistRulesetAnalysisCorrectionScope, + PersistRulesetAnalysisCorrectionResult, +} from './remediation-safety.js'; export type { ApiFormat, @@ -19,11 +32,22 @@ export type { ImpactLevel, LetterGrade, RuleMetadata, - QuickFix, - ViolationClass, + RemediationItem, + RemediationSafetyLevel, + RiskLevel, + ConfidenceLevel, + AssessmentOrigin, + AnalysisSource, + RuleAnalysis, + RulesetAnalysis, + StaleFingerprintWarning, CommonGradeOutput, AssertOutput, - QuickFixOutput, + RemediationSafetyOutput, + PersistedRuleEntry, + SharedRulesetAnalysis, + PersonalRulesetAnalysisOverride, 
+ BundledRulesetAnalysis, } from './types.js'; export type { @@ -53,3 +77,22 @@ export { } from './config/ruleset-config.js'; export { resolveRuleset } from './config/resolve-ruleset.js'; + +export { loadRuleset, loadRulesetFromUrl, getDefaultRuleset } from './rulesets/loader.js'; +export type { LoadedRuleset } from './rulesets/loader.js'; + +export { + deriveSharedAnalysisLocation, + loadLocalSharedRulesetAnalysis, + saveLocalSharedRulesetAnalysis, + loadRemoteSharedRulesetAnalysis, + loadSharedRulesetAnalysis, +} from './config/shared-ruleset-analysis.js'; + +export { + getWorkspaceOverridePath, + getGlobalOverridePath, + loadWorkspaceRulesetAnalysisOverride, + loadGlobalRulesetAnalysisOverride, + saveRulesetAnalysisOverride, +} from './config/personal-ruleset-override.js'; diff --git a/packages/api-grade-core/src/json-output.ts b/packages/api-grade-core/src/json-output.ts index 6ea3941..a7ce2ae 100644 --- a/packages/api-grade-core/src/json-output.ts +++ b/packages/api-grade-core/src/json-output.ts @@ -1,13 +1,19 @@ -import type { GradeResult, CommonGradeOutput, AssertOutput, LetterGrade } from './types.js'; +import type { GradeResult, CommonGradeOutput, AssertOutput, LetterGrade, RulesetAnalysis } from './types.js'; import { gradeToNumber } from './scorer.js'; +import { getRemediationSafety } from './remediation-safety.js'; export function buildCommonGradeOutput( result: GradeResult, - options?: { top?: number } + options?: { top?: number; rulesetAnalysis?: RulesetAnalysis } ): CommonGradeOutput { const top = options?.top; - const diagnostics = top !== undefined ? result.diagnostics.slice(0, top) : result.diagnostics; - const truncated = top !== undefined && diagnostics.length < result.diagnostics.length; + const sourceDiagnostics = top !== undefined ? 
result.diagnostics.slice(0, top) : result.diagnostics; + const truncated = top !== undefined && sourceDiagnostics.length < result.diagnostics.length; + + const rulesetAnalysis = options?.rulesetAnalysis; + const diagnostics = rulesetAnalysis + ? sourceDiagnostics.map((d) => ({ ...d, ...getRemediationSafety(d, rulesetAnalysis) })) + : sourceDiagnostics; const output: CommonGradeOutput = { specPath: result.specPath, diff --git a/packages/api-grade-core/src/quick-fixes.ts b/packages/api-grade-core/src/quick-fixes.ts deleted file mode 100644 index dcd5913..0000000 --- a/packages/api-grade-core/src/quick-fixes.ts +++ /dev/null @@ -1,169 +0,0 @@ -import type { Diagnostic, GradeResult, ViolationClass, QuickFix, QuickFixOutput } from './types.js'; - -const RULE_ID_NON_BREAKING_PREFIXES = [ - 'operation-description', - 'operation-summary', - 'info-contact', - 'info-description', - 'info-license', - 'oas3-examples-', - 'tag-description', -]; - -const NON_BREAKING_SEGMENTS = new Set([ - 'description', - 'summary', - 'title', - 'contact', - 'license', - 'termsOfService', - 'externalDocs', - 'example', - 'examples', - 'tags', - 'info', -]); - -const BREAKING_SEGMENTS = new Set([ - 'required', - 'type', - 'format', -]); - -function isNonBreakingPath(path: string[]): boolean { - for (const segment of path) { - if (segment.startsWith('x-')) return true; - if (NON_BREAKING_SEGMENTS.has(segment)) return true; - } - return false; -} - -function isBreakingPath(path: string[]): boolean { - for (const segment of path) { - if (BREAKING_SEGMENTS.has(segment)) return true; - if (segment === 'parameters') return true; - } - return false; -} - -export function classifyViolation(diagnostic: Diagnostic): ViolationClass { - // Rule ID overrides take priority - for (const prefix of RULE_ID_NON_BREAKING_PREFIXES) { - if (diagnostic.ruleId.startsWith(prefix)) return 'nonBreaking'; - } - - const path = diagnostic.path ?? 
[]; - - if (isBreakingPath(path)) return 'breaking'; - if (isNonBreakingPath(path)) return 'nonBreaking'; - return 'unknown'; -} - -const SEVERITY_LABELS: Record = { - 0: 'error', - 1: 'warn', - 2: 'info', - 3: 'hint', -}; - -export function buildQuickFix( - diagnostic: Diagnostic, - specContent: string -): QuickFix { - const path = (diagnostic.path ?? []) as string[]; - const location = path.join('.'); - - let currentValue: string | null = null; - try { - if (path.length > 0) { - const parsed: unknown = JSON.parse(specContent); - let node: unknown = parsed; - for (const segment of path) { - if (node === null || typeof node !== 'object') { - node = undefined; - break; - } - node = (node as Record)[segment]; - } - if (node !== undefined && node !== null) { - currentValue = typeof node === 'string' ? node : JSON.stringify(node); - } - } - } catch { - // JSON parse failed (e.g. YAML spec) — leave currentValue as null - } - - const lastSegment = path[path.length - 1] ?? 'field'; - const expectedImprovement = deriveExpectedImprovement(diagnostic.ruleId, diagnostic.message, lastSegment, path); - - const severityNum = typeof diagnostic.severity === 'number' ? diagnostic.severity : 1; - - return { - ruleId: diagnostic.ruleId, - message: diagnostic.message, - severity: SEVERITY_LABELS[severityNum] ?? 'warn', - path, - location, - currentValue, - expectedImprovement, - }; -} - -function deriveExpectedImprovement( - ruleId: string, - message: string, - lastSegment: string, - path: string[] -): string { - if (ruleId.includes('description')) { - const entity = path.length > 1 ? 
path[path.length - 2] : 'item'; - return `Add a \`description\` field that explains the purpose of this ${entity}`; - } - if (ruleId.includes('summary')) { - return `Add a \`summary\` field with a brief one-line description`; - } - if (ruleId.includes('contact')) { - return `Add a \`contact\` object to the info block with name, email, or url`; - } - if (ruleId.includes('license')) { - return `Add a \`license\` object to the info block with name and url`; - } - if (ruleId.includes('example')) { - return `Add an \`example\` or \`examples\` field illustrating expected values`; - } - if (ruleId.includes('tag-description')) { - return `Add a \`description\` field to this tag explaining its purpose`; - } - return `Fix: ${message}. Add or update \`${lastSegment}\` as required`; -} - -export function buildQuickFixOutput(result: GradeResult, specContent: string): QuickFixOutput { - const quickFixes = result.diagnostics - .filter((d) => classifyViolation(d) === 'nonBreaking') - .map((d) => buildQuickFix(d, specContent)); - - return { - specPath: result.specPath, - format: result.format, - totalViolations: result.diagnostics.length, - quickFixCount: quickFixes.length, - quickFixes, - }; -} - -export function formatQuickFixesHuman(result: GradeResult, specContent: string): string { - const { quickFixes } = buildQuickFixOutput(result, specContent); - const lines: string[] = []; - - lines.push(`Quick Fixes (${quickFixes.length} of ${result.diagnostics.length} total violations):`); - - for (const fix of quickFixes) { - lines.push(''); - const location = fix.location || '(root)'; - lines.push(` ${fix.severity.padEnd(5)} ${fix.ruleId.padEnd(42)} ${location}`); - lines.push(` ${fix.message}`); - lines.push(` ${fix.expectedImprovement}`); - } - - return lines.join('\n'); -} diff --git a/packages/api-grade-core/src/remediation-safety.ts b/packages/api-grade-core/src/remediation-safety.ts new file mode 100644 index 0000000..9af09b1 --- /dev/null +++ 
b/packages/api-grade-core/src/remediation-safety.ts @@ -0,0 +1,711 @@ +import { createHash } from 'node:crypto'; +import type { + AnalysisSource, + AuthConfig, + ConfidenceLevel, + Diagnostic, + GradeResult, + PersistedRuleEntry, + RemediationItem, + RemediationSafetyLevel, + RemediationSafetyOutput, + RiskLevel, + RuleAnalysis, + RulesetAnalysis, +} from './types.js'; +import type { LoadedRuleset } from './rulesets/loader.js'; +import { + loadSharedRulesetAnalysis, + deriveSharedAnalysisLocation, + loadLocalSharedRulesetAnalysis, + saveLocalSharedRulesetAnalysis, +} from './config/shared-ruleset-analysis.js'; +import { + loadWorkspaceRulesetAnalysisOverride, + loadGlobalRulesetAnalysisOverride, + saveRulesetAnalysisOverride, +} from './config/personal-ruleset-override.js'; +import { loadBundledRulesetAnalysis } from './rulesets/bundled-analysis.js'; + +type Tier = 'safe' | 'humanreview' | 'unsafe'; + +interface StageResult { + riskLevel: RiskLevel; + confidenceLevel: ConfidenceLevel; + rationale: string; + source: AnalysisSource; +} + +interface SpectralThen { + // Spectral resolves `then.function` to an actual function reference once a ruleset is + // loaded (e.g. via @stoplight/spectral-rulesets); only hand-authored YAML rulesets parsed + // before bundling carry it as a plain string. Both forms must be handled. 
+ function?: string | { name?: string }; + field?: string; + functionOptions?: Record; +} + +interface SpectralRule { + given?: string | string[]; + then?: SpectralThen | SpectralThen[]; + severity?: unknown; + description?: string; +} + +const UNSAFE_SEGMENTS = new Set([ + 'required', + 'type', + 'format', + 'parameters', + 'address', + 'action', + 'messages', + 'payload', +]); + +const HUMANREVIEW_SEGMENTS = new Set([ + 'enum', + 'default', + 'security', + 'servers', + 'operationId', + 'additionalProperties', + 'responses', + 'channels', + 'operations', + 'reply', +]); + +const SAFE_SEGMENTS = new Set([ + 'description', + 'summary', + 'title', + 'contact', + 'license', + 'termsOfService', + 'externalDocs', + 'example', + 'examples', + 'tags', + 'info', +]); + +const ADDITIVE_FUNCTIONS = new Set(['truthy', 'defined']); +const RENAME_FUNCTIONS = new Set(['pattern', 'casing']); +const KNOWN_FUNCTIONS = new Set([ + ...ADDITIVE_FUNCTIONS, + ...RENAME_FUNCTIONS, + 'alphabetical', + 'enumeration', + 'falsy', + 'length', + 'schema', + 'undefined', + 'unreferencedReusableObject', + 'xor', +]); + +function tokenize(given: string): string[] { + return given.match(/[A-Za-z_][A-Za-z0-9_-]*/g) ?? []; +} + +function isKeySelector(given: string): boolean { + return /~\s*$/.test(given.trim()); +} + +function givenExprsOf(rule: SpectralRule): string[] { + if (!rule.given) return []; + return Array.isArray(rule.given) ? rule.given : [rule.given]; +} + +// Spectral's built-in rulesets (and many custom ones) express `given` via macro aliases — +// e.g. "#OperationObject" — rather than literal JSONPath. An alias resolves to one or more +// JSONPath expressions, declared at the ruleset level either as a plain array or as +// { targets: [{ given: [...] }] }, and may itself reference other aliases recursively +// (e.g. "OperationObject" -> "#PathItem[get,put,...]" -> "$.paths[*]"). Without resolving +// these, segment/key-selector matching never sees a real path for most built-in rules. 
+type AliasDefinition = string[] | { targets?: Array<{ given?: string | string[] }> }; +type AliasMap = Record; + +const ALIAS_REF_RE = /^#([A-Za-z0-9_]+)(.*)$/; + +function resolveGivenExpr(expr: string, aliases: AliasMap, depth = 0): string[] { + if (depth > 10) return [expr]; + const match = ALIAS_REF_RE.exec(expr.trim()); + if (!match) return [expr]; + const [, aliasName, suffix] = match; + const aliasDef = aliases[aliasName]; + if (!aliasDef) return [expr]; + + const bases = Array.isArray(aliasDef) + ? aliasDef + : (aliasDef.targets ?? []).flatMap((t) => (Array.isArray(t.given) ? t.given : t.given ? [t.given] : [])); + + const resolved: string[] = []; + for (const base of bases) { + resolved.push(...resolveGivenExpr(`${base}${suffix}`, aliases, depth + 1)); + } + return resolved.length > 0 ? resolved : [expr]; +} + +function resolvedGivenExprsOf(rule: SpectralRule, aliases: AliasMap): string[] { + return givenExprsOf(rule).flatMap((expr) => resolveGivenExpr(expr, aliases)); +} + +function functionNameOf(fn: SpectralThen['function']): string | undefined { + if (typeof fn === 'string') return fn; + if (typeof fn === 'function') return (fn as { name?: string }).name || undefined; + if (fn && typeof fn === 'object' && typeof fn.name === 'string') return fn.name; + return undefined; +} + +function functionNamesOf(rule: SpectralRule): string[] { + const then = rule.then; + if (!then) return []; + const thens = Array.isArray(then) ? then : [then]; + return thens.map((t) => functionNameOf(t?.function)).filter((f): f is string => typeof f === 'string' && f.length > 0); +} + +// `then.field` names the specific sub-field a function actually targets (e.g. given +// "#OperationObject", field "operationId") — segment matching must consider it alongside +// `given`, since two rules sharing the same `given` (e.g. "operationId" vs "description" on +// the same OperationObject) are only distinguishable by their field. 
+function fieldTokensOf(rule: SpectralRule): string[] { + const then = rule.then; + if (!then) return []; + const thens = Array.isArray(then) ? then : [then]; + return thens.flatMap((t) => (typeof t?.field === 'string' ? tokenize(t.field) : [])); +} + +function fieldNamesOf(rule: SpectralRule): string[] { + const then = rule.then; + if (!then) return []; + const thens = Array.isArray(then) ? then : [then]; + return thens.map((t) => t?.field).filter((f): f is string => typeof f === 'string'); +} + +// `pattern` with `notMatch`-only (no `match`) is an existence/validity check — it asserts the +// field does NOT contain a bad value (empty object, trailing slash, example.com, etc.) rather +// than enforcing a specific format or naming convention. Semantically closer to `falsy`/`truthy` +// than to `casing` or a format-match pattern. When `match` is also present the intent is +// ambiguous so we fall through to the rename/reformat classification. +function isPatternExistenceCheck(rule: SpectralRule): boolean { + const then = rule.then; + if (!then) return false; + const thens = Array.isArray(then) ? 
then : [then]; + return thens.some((t) => { + if (functionNameOf(t?.function) !== 'pattern') return false; + const opts = t?.functionOptions; + return typeof opts === 'object' && opts !== null && 'notMatch' in opts && !('match' in opts); + }); +} + +function matchedTiers(givenExprs: string[], extraSegments: string[] = []): Set<Tier> { + const tiers = new Set<Tier>(); + const scan = (segment: string): void => { + if (segment.startsWith('x-')) tiers.add('safe'); + if (UNSAFE_SEGMENTS.has(segment)) tiers.add('unsafe'); + if (HUMANREVIEW_SEGMENTS.has(segment)) tiers.add('humanreview'); + if (SAFE_SEGMENTS.has(segment)) tiers.add('safe'); + }; + for (const given of givenExprs) { + for (const segment of tokenize(given)) scan(segment); + } + for (const segment of extraSegments) scan(segment); + return tiers; +} + +function mostConservativeTier(tiers: Set<Tier>): Tier | null { + if (tiers.has('unsafe')) return 'unsafe'; + if (tiers.has('humanreview')) return 'humanreview'; + if (tiers.has('safe')) return 'safe'; + return null; +} + +function tierToRisk(tier: Tier): RiskLevel { + return tier === 'unsafe' ? 'high' : tier === 'humanreview' ? 'medium' : 'low'; +} + +// Stage 1a: a `given` expression that selects path/channel object keys directly (via the JSONPath +// `~` key-selector), OR a rule using Spectral's `then.field: "@key"` on a paths/channels +// collection — the function-based equivalent of the `~` key-selector. In AsyncAPI 2.x the channel +// key IS the routing address; in OpenAPI the path key is the route. Both forms carry identical +// semantic risk: any satisfying edit renames a public path or channel. 
+function stage1a(givenExprs: string[], fieldNames: string[] = []): StageResult | null { + for (const given of givenExprs) { + if (!isKeySelector(given)) continue; + const tokens = tokenize(given); + if (tokens.includes('paths') || tokens.includes('channels')) { + return { + riskLevel: 'high', + confidenceLevel: 'high', + rationale: + 'given path selects path/channel object keys directly — any satisfying edit renames a public path or channel', + source: 'heuristic', + }; + } + } + if (fieldNames.includes('@key')) { + const givenTokens = givenExprs.flatMap(tokenize); + if (givenTokens.includes('paths') || givenTokens.includes('channels')) { + return { + riskLevel: 'high', + confidenceLevel: 'high', + rationale: + 'then.field "@key" on paths/channels collection — equivalent to a path/channel key-selector; any satisfying edit renames a public path or channel', + source: 'heuristic', + }; + } + } + return null; +} + +// Stage 1b: classify by the rule's `then.function` mechanics. +function stage1b( + givenExprs: string[], + functionNames: string[], + fieldTokens: string[], + patternIsExistenceCheck: boolean, +): StageResult | null { + if (functionNames.length === 0) return null; + const tiers = matchedTiers(givenExprs, fieldTokens); + + // When then.field names a token that is exclusively in SAFE_SEGMENTS (e.g. `description`, + // `summary`), the field's tier overrides the parent path context for additive-style + // operations. Adding a documentation field is safe regardless of the parent object it + // lives in — `$.operations.*.description` is no riskier than `$.info.description`. + const fieldOnlyTiers = fieldTokens.length > 0 ? matchedTiers([], fieldTokens) : new Set<Tier>(); + const fieldIsExclusivelySafe = + fieldOnlyTiers.size > 0 && + fieldOnlyTiers.has('safe') && + !fieldOnlyTiers.has('unsafe') && + !fieldOnlyTiers.has('humanreview'); + + for (const fn of functionNames) { + if (ADDITIVE_FUNCTIONS.has(fn)) { + const effectiveTiers = fieldIsExclusivelySafe ? 
fieldOnlyTiers : tiers; + let riskLevel: RiskLevel = 'low'; + if (effectiveTiers.has('unsafe')) riskLevel = 'high'; + else if (effectiveTiers.has('humanreview')) riskLevel = 'medium'; + const confidenceLevel: ConfidenceLevel = effectiveTiers.size <= 1 ? 'high' : 'medium'; + return { + riskLevel, + confidenceLevel, + rationale: `\`${fn}\` function (additive — add/populate a field) on a target matching the ${riskLevel} tier`, + source: 'heuristic', + }; + } + if (fn === 'pattern' && patternIsExistenceCheck) { + // notMatch-only pattern: existence/validity check, not rename/reformat. Risk escalates the + // same as additive on recognized tiers; falls back to medium (not low) on an unrecognized + // target so that e.g. `pattern` on a bare `$` with `field: host` stays conservative. + const effectiveTiers = fieldIsExclusivelySafe ? fieldOnlyTiers : tiers; + let riskLevel: RiskLevel = effectiveTiers.size === 0 ? 'medium' : 'low'; + if (effectiveTiers.has('unsafe')) riskLevel = 'high'; + else if (effectiveTiers.has('humanreview')) riskLevel = 'medium'; + const confidenceLevel: ConfidenceLevel = effectiveTiers.size <= 1 ? 'high' : 'medium'; + return { + riskLevel, + confidenceLevel, + rationale: `\`pattern\` function (existence/validity check — \`notMatch\` validates content is present and correctly formed) on a target matching the ${riskLevel} tier`, + source: 'heuristic', + }; + } + if (RENAME_FUNCTIONS.has(fn)) { + let riskLevel: RiskLevel = 'medium'; + if (tiers.has('unsafe')) riskLevel = 'high'; + else if (tiers.size === 1 && tiers.has('safe')) riskLevel = 'low'; + const confidenceLevel: ConfidenceLevel = tiers.size <= 1 ? 
'high' : 'medium'; + return { + riskLevel, + confidenceLevel, + rationale: `\`${fn}\` function (rename/reformat) on a target matching the ${riskLevel} tier`, + source: 'heuristic', + }; + } + if (!KNOWN_FUNCTIONS.has(fn)) { + return { + riskLevel: 'high', + confidenceLevel: 'low', + rationale: `custom function \`${fn}\` — mechanics cannot be inferred statically`, + source: 'heuristic', + }; + } + } + return null; +} + +// Stage 1c: generic segment-membership fallback within Stage 1. +function stage1c(givenExprs: string[], fieldTokens: string[]): StageResult | null { + const tiers = matchedTiers(givenExprs, fieldTokens); + const tier = mostConservativeTier(tiers); + if (tier === null) return null; + const riskLevel = tierToRisk(tier); + const confidenceLevel: ConfidenceLevel = tiers.size === 1 ? 'medium' : 'low'; + const rationale = + tiers.size === 1 + ? `given path matched the ${tier} segment set` + : `given path matched multiple tiers (${[...tiers].join(', ')}) — conservative match, ambiguous`; + return { riskLevel, confidenceLevel, rationale, source: 'heuristic' }; +} + +const STAGE2_FALLBACK: StageResult = { + riskLevel: 'high', + confidenceLevel: 'low', + rationale: 'no recognizable rule-id, function, or path signal', + source: 'fallback', +}; + +function classifyRuleStages1And2(rule: SpectralRule, aliases: AliasMap): StageResult { + const givenExprs = resolvedGivenExprsOf(rule, aliases); + const fieldNames = fieldNamesOf(rule); + const fieldTokens = fieldTokensOf(rule); + const a = stage1a(givenExprs, fieldNames); + if (a) return a; + const functionNames = functionNamesOf(rule); + const patternIsExistenceCheck = isPatternExistenceCheck(rule); + const b = stage1b(givenExprs, functionNames, fieldTokens, patternIsExistenceCheck); + if (b) return b; + const c = stage1c(givenExprs, fieldTokens); + if (c) return c; + return STAGE2_FALLBACK; +} + +export function decisionMatrix(riskLevel: RiskLevel, confidenceLevel: ConfidenceLevel): RemediationSafetyLevel { + if 
(riskLevel === 'low' && (confidenceLevel === 'high' || confidenceLevel === 'medium')) return 'safe'; + if (riskLevel === 'medium' && confidenceLevel === 'high') return 'humanreview'; + if (riskLevel === 'high') return 'unsafe'; + return 'humanreview'; +} + +// Stage 0: a stable identifier for "this exact rule definition" — hash over the rule's own +// content (ruleId, given, then.function, severity, description), never the ruleset path/URL. +export function computeRuleFingerprint(ruleId: string, rule: SpectralRule): string { + const given = givenExprsOf(rule).join(','); + const fn = functionNamesOf(rule).join(','); + const severity = String(rule.severity ?? ''); + const description = rule.description ?? ''; + const raw = `${ruleId}|${given}|${fn}|${severity}|${description}`; + return createHash('sha256').update(raw).digest('hex'); +} + +function buildStaleFingerprintWarning(stored: string, current: string): RuleAnalysis['staleFingerprintWarning'] { + return { + storedFingerprint: stored, + currentFingerprint: current, + message: `rule changed since this was last reviewed (stored fingerprint ${stored.slice(0, 8)}..., current ${current.slice(0, 8)}...)`, + }; +} + +// Stage 0 lookup precedence: workspace override -> global override -> shared colocated analysis +// -> bundled default (built-in ruleset only). An `assessedBy: "human"` entry is used as soon as +// it is found, fingerprint match or not (flagged via staleFingerprintWarning on mismatch). An +// `assessedBy: "automated"` entry is only used on a fingerprint match; otherwise the lookup +// continues to the next store in precedence order. 
+function lookupStage0( + ruleId: string, + fingerprint: string, + stores: Array<Record<string, PersistedRuleEntry> | null | undefined> +): RuleAnalysis | null { + for (const store of stores) { + const entry = store?.[ruleId]; + if (!entry) continue; + + if (entry.assessedBy === 'human') { + const stale = entry.fingerprint !== fingerprint; + return { + ruleId, + riskLevel: entry.riskLevel, + confidenceLevel: entry.confidenceLevel, + remediationSafetyLevel: entry.remediationSafetyLevel, + assessedBy: 'human', + rationale: entry.rationale, + source: entry.source, + staleFingerprintWarning: stale ? buildStaleFingerprintWarning(entry.fingerprint, fingerprint) : null, + }; + } + + if (entry.fingerprint === fingerprint) { + return { + ruleId, + riskLevel: entry.riskLevel, + confidenceLevel: entry.confidenceLevel, + remediationSafetyLevel: entry.remediationSafetyLevel, + assessedBy: 'automated', + rationale: entry.rationale, + source: entry.source, + staleFingerprintWarning: null, + }; + } + // automated entry, stale fingerprint -> not found; keep checking lower-precedence stores + } + return null; +} + +export async function analyseRuleset( + loadedRuleset: LoadedRuleset, + options?: { auth?: AuthConfig | null } +): Promise<RulesetAnalysis> { + const rulesMap = (loadedRuleset.ruleset?.rules ?? {}) as Record<string, SpectralRule>; + const aliases = (loadedRuleset.ruleset?.aliases ?? {}) as AliasMap; + const ruleIds = Object.keys(rulesMap); + const isBuiltIn = loadedRuleset.rulesetSource === 'default'; + + const [workspaceOverride, globalOverride, sharedAnalysis, bundledAnalysis] = await Promise.all([ + loadWorkspaceRulesetAnalysisOverride(), + loadGlobalRulesetAnalysisOverride(), + loadSharedRulesetAnalysis(loadedRuleset.rulesetPath, options?.auth ?? null), + isBuiltIn ? 
loadBundledRulesetAnalysis(ruleIds) : Promise.resolve(null), + ]); + + const rules: RuleAnalysis[] = ruleIds.map((ruleId) => { + const rule = rulesMap[ruleId]; + const fingerprint = computeRuleFingerprint(ruleId, rule); + + const stage0 = lookupStage0(ruleId, fingerprint, [ + workspaceOverride?.rules, + globalOverride?.rules, + sharedAnalysis?.rules, + bundledAnalysis?.rules, + ]); + if (stage0) return stage0; + + const { riskLevel, confidenceLevel, rationale, source } = classifyRuleStages1And2(rule, aliases); + return { + ruleId, + riskLevel, + confidenceLevel, + remediationSafetyLevel: decisionMatrix(riskLevel, confidenceLevel), + assessedBy: 'automated', + staleFingerprintWarning: null, + rationale, + source, + }; + }); + + return { + rulesetSource: loadedRuleset.rulesetSource, + ...(loadedRuleset.rulesetPath !== undefined ? { rulesetPath: loadedRuleset.rulesetPath } : {}), + rules, + }; +} + +export type PersistRulesetAnalysisCorrectionScope = 'shared' | 'personal-workspace' | 'personal-global'; + +export interface PersistRulesetAnalysisCorrectionResult { + written: 'shared' | 'personal' | 'personal-fallback'; + sharedFileContent?: string; +} + +// Stage 4: an explicit, user-initiated write of a human-confirmed classification into one of +// the stores Stage 0 reads from. Defaults to the colocated shared file for a local, writable +// ruleset; falls back to a personal override (plus emitted shared-file content) for a remote or +// built-in ruleset location that isn't locally writable (FR-019). +export async function persistRuleAnalysisCorrection( + loadedRuleset: LoadedRuleset, + ruleId: string, + remediationSafetyLevel: RemediationSafetyLevel, + scope: PersistRulesetAnalysisCorrectionScope = 'shared' +): Promise<PersistRulesetAnalysisCorrectionResult> { + const rulesMap = (loadedRuleset.ruleset?.rules ?? 
{}) as Record<string, SpectralRule>; + const rule = rulesMap[ruleId]; + if (!rule) { + throw new Error(`Rule '${ruleId}' was not found in this ruleset.`); + } + const fingerprint = computeRuleFingerprint(ruleId, rule); + + const entry: PersistedRuleEntry = { + ruleId, + riskLevel: null, + confidenceLevel: 'high', + remediationSafetyLevel, + assessedBy: 'human', + staleFingerprintWarning: null, + rationale: 'user-confirmed override', + source: 'persisted', + fingerprint, + }; + + if (scope === 'personal-workspace' || scope === 'personal-global') { + const overrideScope = scope === 'personal-workspace' ? 'workspace' : 'global'; + const existing = + overrideScope === 'workspace' + ? await loadWorkspaceRulesetAnalysisOverride() + : await loadGlobalRulesetAnalysisOverride(); + await saveRulesetAnalysisOverride(overrideScope, { + scope: overrideScope, + rules: { ...(existing?.rules ?? {}), [ruleId]: entry }, + }); + return { written: 'personal' }; + } + + const rulesetPath = loadedRuleset.rulesetPath; + const isRemote = rulesetPath?.startsWith('http'); + + if (rulesetPath && !isRemote) { + const existing = await loadLocalSharedRulesetAnalysis(rulesetPath); + await saveLocalSharedRulesetAnalysis(rulesetPath, { + location: deriveSharedAnalysisLocation(rulesetPath), + rules: { ...(existing?.rules ?? {}), [ruleId]: entry }, + }); + return { written: 'shared' }; + } + + // Remote (GitHub-hosted) or built-in ruleset location — never write automatically (FR-019). + const existing = await loadWorkspaceRulesetAnalysisOverride(); + const mergedRules = { ...(existing?.rules ?? {}), [ruleId]: entry }; + await saveRulesetAnalysisOverride('workspace', { scope: 'workspace', rules: mergedRules }); + const sharedFileContent = JSON.stringify( + { location: rulesetPath ? 
deriveSharedAnalysisLocation(rulesetPath) : undefined, rules: mergedRules }, + null, + 2 + ); + return { written: 'personal-fallback', sharedFileContent }; +} + +export function getRemediationSafety( + diagnostic: Diagnostic, + rulesetAnalysis: RulesetAnalysis +): Pick & { safetyRationale: string } { + const entry = rulesetAnalysis.rules.find((r) => r.ruleId === diagnostic.ruleId); + if (entry) { + return { + riskLevel: entry.riskLevel, + confidenceLevel: entry.confidenceLevel, + remediationSafetyLevel: entry.remediationSafetyLevel, + safetyRationale: entry.rationale, + staleFingerprintWarning: entry.staleFingerprintWarning, + }; + } + return { riskLevel: 'high', confidenceLevel: 'low', remediationSafetyLevel: 'unsafe', safetyRationale: 'no recognizable rule-id, function, or path signal', staleFingerprintWarning: null }; +} + +function deriveExpectedImprovement( + ruleId: string, + message: string, + lastSegment: string, + path: string[] +): string { + if (ruleId.includes('description')) { + const entity = path.length > 1 ? path[path.length - 2] : 'item'; + return `Add a \`description\` field that explains the purpose of this ${entity}`; + } + if (ruleId.includes('summary')) { + return `Add a \`summary\` field with a brief one-line description`; + } + if (ruleId.includes('contact')) { + return `Add a \`contact\` object to the info block with name, email, or url`; + } + if (ruleId.includes('license')) { + return `Add a \`license\` object to the info block with name and url`; + } + if (ruleId.includes('example')) { + return `Add an \`example\` or \`examples\` field illustrating expected values`; + } + if (ruleId.includes('tag-description')) { + return `Add a \`description\` field to this tag explaining its purpose`; + } + return `Fix: ${message}. 
Add or update \`${lastSegment}\` as required`; +} + +export function buildRemediationItem( + diagnostic: Diagnostic, + specContent: string, + rulesetAnalysis: RulesetAnalysis +): RemediationItem { + const path = (diagnostic.path ?? []) as string[]; + const location = path.join('.'); + + let currentValue: string | null = null; + try { + if (path.length > 0) { + const parsed: unknown = JSON.parse(specContent); + let node: unknown = parsed; + for (const segment of path) { + if (node === null || typeof node !== 'object') { + node = undefined; + break; + } + node = (node as Record)[segment]; + } + if (node !== undefined && node !== null) { + currentValue = typeof node === 'string' ? node : JSON.stringify(node); + } + } + } catch { + // JSON parse failed (e.g. YAML spec) — leave currentValue as null + } + + const lastSegment = path[path.length - 1] ?? 'field'; + const expectedImprovement = deriveExpectedImprovement(diagnostic.ruleId, diagnostic.message, lastSegment, path); + + const safety = getRemediationSafety(diagnostic, rulesetAnalysis); + + return { + ruleId: diagnostic.ruleId, + message: diagnostic.message, + severity: diagnostic.severity, + path, + location, + range: diagnostic.range, + currentValue, + expectedImprovement, + riskLevel: safety.riskLevel, + confidenceLevel: safety.confidenceLevel, + remediationSafetyLevel: safety.remediationSafetyLevel, + safetyRationale: safety.safetyRationale, + staleFingerprintWarning: safety.staleFingerprintWarning, + }; +} + +export function buildRemediationSafetyOutput( + result: GradeResult, + specContent: string, + rulesetAnalysis: RulesetAnalysis, + requestedLevel: RemediationSafetyLevel +): RemediationSafetyOutput { + const remediationItems = result.diagnostics + .map((d) => buildRemediationItem(d, specContent, rulesetAnalysis)) + .filter((item) => item.remediationSafetyLevel === requestedLevel); + + return { + specPath: result.specPath, + format: result.format, + totalViolations: result.diagnostics.length, + 
remediationItemCount: remediationItems.length, + remediationItems, + requestedLevel, + }; +} + +export function formatRemediationSafetyHuman( + result: GradeResult, + specContent: string, + rulesetAnalysis: RulesetAnalysis, + requestedLevel: RemediationSafetyLevel +): string { + const { remediationItems, totalViolations } = buildRemediationSafetyOutput( + result, + specContent, + rulesetAnalysis, + requestedLevel + ); + const lines: string[] = []; + + lines.push(`Remediation Safety: ${requestedLevel} (${remediationItems.length} of ${totalViolations} total violations):`); + + for (const item of remediationItems) { + lines.push(''); + const location = item.location || '(root)'; + const lineNum = item.range?.start?.line !== undefined ? ` Line ${item.range.start.line + 1}` : ''; + lines.push(` ${item.severity.padEnd(5)} ${item.ruleId.padEnd(42)} ${location}${lineNum}`); + lines.push(` risk=${item.riskLevel ?? 'n/a'} confidence=${item.confidenceLevel} safety=${item.remediationSafetyLevel}`); + lines.push(` ${item.message}`); + lines.push(` ${item.expectedImprovement}`); + if (item.staleFingerprintWarning) { + lines.push(` WARNING: ${item.staleFingerprintWarning.message}`); + } + } + + return lines.join('\n'); +} diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis.ts b/packages/api-grade-core/src/rulesets/bundled-analysis.ts new file mode 100644 index 0000000..91b8f1b --- /dev/null +++ b/packages/api-grade-core/src/rulesets/bundled-analysis.ts @@ -0,0 +1,31 @@ +import { readFile } from 'node:fs/promises'; +import { dirname, join } from 'node:path'; +import { fileURLToPath } from 'node:url'; +import type { BundledRulesetAnalysis } from '../types.js'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); + +let openapiCache: BundledRulesetAnalysis | null | undefined; +let asyncapiCache: BundledRulesetAnalysis | null | undefined; + +async function loadJson(fileName: string): Promise { + try { + const data = await readFile(join(__dirname, 
'bundled-analysis', fileName), 'utf-8'); + return JSON.parse(data) as BundledRulesetAnalysis; + } catch { + return null; + } +} + +// The built-in ruleset's pre-calculated analysis, shipped with the package (FR-012). Detects +// OpenAPI vs. AsyncAPI by the presence of an "asyncapi"-prefixed rule id, since the built-in +// LoadedRuleset does not otherwise carry the API format back to the analyser. +export async function loadBundledRulesetAnalysis(ruleIds: string[]): Promise { + const isAsyncApi = ruleIds.some((id) => id.startsWith('asyncapi')); + if (isAsyncApi) { + if (asyncapiCache === undefined) asyncapiCache = await loadJson('asyncapi.json'); + return asyncapiCache; + } + if (openapiCache === undefined) openapiCache = await loadJson('openapi.json'); + return openapiCache; +} diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json b/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json new file mode 100644 index 0000000..0517b1b --- /dev/null +++ b/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json @@ -0,0 +1,609 @@ +{ + "rules": { + "asyncapi-channel-no-empty-parameter": { + "ruleId": "asyncapi-channel-no-empty-parameter", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Defining parameters is a breaking change only when types and parameter constraints/defaults are defined", + "source": "bundled-default", + "fingerprint": "faef5a6495551bbb7fa3f2d0d85ba9e4703ba0fe0cbccf3c40a8c502ba81f98b" + }, + "asyncapi-3-channel-no-empty-parameter": { + "ruleId": "asyncapi-3-channel-no-empty-parameter", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Defining parameters is a breaking change only when types and parameter constraints/defaults are defined", + "source": 
"bundled-default", + "fingerprint": "7f20f52e4806ac9b686d5530ed6e7ff83965a6fc3afb47a2391ffad96d227031" + }, + "asyncapi-channel-no-query-nor-fragment": { + "ruleId": "asyncapi-channel-no-query-nor-fragment", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Breaking change as impacts channel contract", + "source": "bundled-default", + "fingerprint": "73eff0af4d20d3818101af39362816099cd3a524492d55d905d50b2a3c3cbff5" + }, + "asyncapi-3-channel-no-query-nor-fragment": { + "ruleId": "asyncapi-3-channel-no-query-nor-fragment", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Breaking change as impacts channel contract", + "source": "bundled-default", + "fingerprint": "fbcc987e5a67e86b9220fee87b3226bd0b6d923fec9eb169c4aaad7d31b611ac" + }, + "asyncapi-channel-no-trailing-slash": { + "ruleId": "asyncapi-channel-no-trailing-slash", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Breaking change as impacts channel contract", + "source": "bundled-default", + "fingerprint": "6e35de3017f8bd11b0aeefee5728c0a7d1d31af3765c0345f79c89c6d87a2953" + }, + "asyncapi-3-channel-no-trailing-slash": { + "ruleId": "asyncapi-3-channel-no-trailing-slash", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Breaking change as impacts channel contract", + "source": "bundled-default", + "fingerprint": "f3bc3a732e2ceb56f9ab5cfd9b75a12399480b456916d70cc609661a49c5868a" + }, + "asyncapi-channel-parameters": { + "ruleId": "asyncapi-channel-parameters", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": 
"humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Defining parameters is a breaking change only when types and parameter constraints/defaults are defined", + "source": "bundled-default", + "fingerprint": "546d72f461b4d23e69fc9ea1785691431fc74a4e799a9f35d766d18dbc45ecf1" + }, + "asyncapi-channel-servers": { + "ruleId": "asyncapi-channel-servers", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Breaking change as constrains available servers", + "source": "bundled-default", + "fingerprint": "501921363ba8348262034651379c72bde32f843bc1a06d1cce4807ba0c98ca7d" + }, + "asyncapi-3-channel-servers": { + "ruleId": "asyncapi-3-channel-servers", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Breaking change as constrains available servers", + "source": "bundled-default", + "fingerprint": "f41d5f90903b2d7d6a4c32d07522249fd76bac6c3f4092bff84761c1b0b439c3" + }, + "asyncapi-headers-schema-type-object": { + "ruleId": "asyncapi-headers-schema-type-object", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Breaking change as headers schema must be changed to type object", + "source": "bundled-default", + "fingerprint": "aecc325f9987b406d8cb954032bc0ed99526c56f30c01acbe0265f3120347bb7" + }, + "asyncapi-3-headers-schema-type-object": { + "ruleId": "asyncapi-3-headers-schema-type-object", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Breaking change as headers schema must be changed to type object", + "source": "bundled-default", + "fingerprint": 
"d8a064576204e0cb9fec501dadb6c7a9b6088cd1211e8a1879296eabe6e17da9" + }, + "asyncapi-info-contact-properties": { + "ruleId": "asyncapi-info-contact-properties", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "720f6ac8c0660070d880cfbf10a400c8d47372d10666ff823b959accdeb0afc8" + }, + "asyncapi-info-contact": { + "ruleId": "asyncapi-info-contact", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "40da564631e6b29b1e7f8d7eb4452e87d3e39c0cd98ca71b865b83a666f3fb17" + }, + "asyncapi-info-description": { + "ruleId": "asyncapi-info-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "9d50c4f6e3f86a323ad186cc157699d9336a68b33b3d6d1ffe41e48852443ffe" + }, + "asyncapi-info-license-url": { + "ruleId": "asyncapi-info-license-url", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "4539169bb97df454320a01bdc44ff8e717d2e6a02baf9e3da1543f45108a61d1" + }, + "asyncapi-info-license": { + "ruleId": "asyncapi-info-license", + "riskLevel": "low", + "confidenceLevel": "high", + 
"remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "564c2cc47a5d30e0fcb8190c317617f3085b480df2d9d04fc154d171380b3404" + }, + "asyncapi-latest-version": { + "ruleId": "asyncapi-latest-version", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Breaking change as impacts basis of contract", + "source": "bundled-default", + "fingerprint": "e51e3e20cc1d8725ffc63beaa95d81c229652871f7355d452d8d3b1a7457f2bd" + }, + "asyncapi-message-examples": { + "ruleId": "asyncapi-message-examples", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Message examples do not change contract surface, but can impact some other tooling", + "source": "bundled-default", + "fingerprint": "5cc61f794e110895be4e9998553f3a772c1ed5482703f754c05595732e22d438" + }, + "asyncapi-message-messageId-uniqueness": { + "ruleId": "asyncapi-message-messageId-uniqueness", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "messageId uniqueueness is a breaking change as impacts contract surface, and code generation tooling", + "source": "bundled-default", + "fingerprint": "406375ccdcedf2a4bdcee791741bc01313b5be03a9c211db25945a4aec6f41b3" + }, + "asyncapi-operation-description": { + "ruleId": "asyncapi-operation-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Operation description does not change contract surface", + "source": "bundled-default", + 
"fingerprint": "ab7f3fff78bcbb5051374ab1644d3f2312d91bc081d9a7cb5210f33ac745df86" + }, + "asyncapi-3-operation-description": { + "ruleId": "asyncapi-3-operation-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Operation description does not change contract surface", + "source": "bundled-default", + "fingerprint": "9db022057bde86c48e4fd30348bf498c995d7dcbf1cba7a2ab77588ec38ec468" + }, + "asyncapi-operation-operationId-uniqueness": { + "ruleId": "asyncapi-operation-operationId-uniqueness", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "operationId uniqueueness is a breaking change as impacts contract surface, and code generation tooling", + "source": "bundled-default", + "fingerprint": "ef6c87aaf69cec04cb0d91955ae4371d0b83df17954c3f35cb73e04a93c43080" + }, + "asyncapi-operation-operationId": { + "ruleId": "asyncapi-operation-operationId", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Adding operationId is a breaking change as impacts contract surface, and code generation tooling", + "source": "bundled-default", + "fingerprint": "0b46aef66a0b399e3c55e73100ba41afbb42c3479f2d073dbb57966760b4602a" + }, + "asyncapi-operation-security": { + "ruleId": "asyncapi-operation-security", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Referencing security schemes is a breaking change as impacts contract surface", + "source": "bundled-default", + "fingerprint": "6560182f944e99a49785c408312a0b8dd7c74c236693c7dd197d2ec19c765191" + }, + "asyncapi-3-operation-security": { + "ruleId": 
"asyncapi-3-operation-security", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Referencing security schemes is a breaking change as impacts contract surface", + "source": "bundled-default", + "fingerprint": "a90de728f8e6ce307bbdfaf172004d4e70bdaf20dc91c619312b63b57587eae2" + }, + "asyncapi-parameter-description": { + "ruleId": "asyncapi-parameter-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Operation description does not change contract surface", + "source": "bundled-default", + "fingerprint": "ea3bea998a82498d71473dd6a0babd13afe45883b80fad10cf05c9743bcab37e" + }, + "asyncapi-payload-default": { + "ruleId": "asyncapi-payload-default", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Changing default payload is a breaking change as impacts contract surface", + "source": "bundled-default", + "fingerprint": "06e469803d27d75bb24daf00d492172c5f94fb119614fcbf28577f9939ba9585" + }, + "asyncapi-payload-examples": { + "ruleId": "asyncapi-payload-examples", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Examples do not change contract surface, but can impact some other tooling", + "source": "bundled-default", + "fingerprint": "8b1f583bcbbf360262fd9a906a367202f1d83910632b5c39d7550dcd75d888b8" + }, + "asyncapi-payload-unsupported-schemaFormat": { + "ruleId": "asyncapi-payload-unsupported-schemaFormat", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Changing payload schema 
format is a breaking change if it does not currently match the default schema format", + "source": "bundled-default", + "fingerprint": "70806490c4dcacc9e31a8391ef69dd0304466bd2b6812d41b6806315b6e46a41" + }, + "asyncapi-3-payload-unsupported-schemaFormat": { + "ruleId": "asyncapi-3-payload-unsupported-schemaFormat", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Changing payload schema format is a breaking change if it does not currently match the default schema format", + "source": "bundled-default", + "fingerprint": "a7efbd604af1044c108ed1579e2895c54818373add14e9e0992e17dc55690cc0" + }, + "asyncapi-payload": { + "ruleId": "asyncapi-payload", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "Changing payload is a breaking change as impacts contract surface", + "source": "bundled-default", + "fingerprint": "be86ce0d6e6f458480dd16a3325a73dfa034ba8cab33a3a1f2fa4008a0cae0d6" + }, + "asyncapi-schema-default": { + "ruleId": "asyncapi-schema-default", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Changing default to align with schema is a breaking change as impacts contract surface", + "source": "bundled-default", + "fingerprint": "53561388f54e7f445a8c3e6aa246421a07a1d4006dfffa859fe6184b513ce06c" + }, + "asyncapi-schema-examples": { + "ruleId": "asyncapi-schema-examples", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Examples do not change contract surface, but can impact some other tooling", + "source": "bundled-default", + "fingerprint": 
"0d83ca445e8b385d3bde04a8516d6f16127043e8bd7ee455aa1979a000cdd379" + }, + "asyncapi-schema": { + "ruleId": "asyncapi-schema", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Current AsyncAPI specification is invalid. Some minor issues may be correctable without functionally impacting contract surface for most tools.", + "source": "bundled-default", + "fingerprint": "2447324def94a87a5cd8a6ce78b8674b0dd401c06d8f6cf48a4fee7c5547a15f" + }, + "asyncapi-server-variables": { + "ruleId": "asyncapi-server-variables", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Defining variables is a breaking change only when types and variable constraints/defaults are defined", + "source": "bundled-default", + "fingerprint": "a86cf9c53c3ed6f23c470d150edf449fe2af6bc3bb2cf00f6120dee1e15fbc37" + }, + "asyncapi-server-no-empty-variable": { + "ruleId": "asyncapi-server-no-empty-variable", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Defining variables is a breaking change only when types and variable constraints/defaults are defined", + "source": "bundled-default", + "fingerprint": "995dd4127b0111b66a8ccd95296c472a7ea005236c7bfb64abfdb9938dfaa45e" + }, + "asyncapi-3-server-no-empty-variable": { + "ruleId": "asyncapi-3-server-no-empty-variable", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Defining variables is a breaking change only when types and variable constraints/defaults are defined", + "source": "bundled-default", + "fingerprint": 
"c03357aacd4594dd77bd6823adeb8d811b559a59eb591c242ce499ec68064cd7" + }, + "asyncapi-server-no-trailing-slash": { + "ruleId": "asyncapi-server-no-trailing-slash", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but most tooling will handle this gracefully", + "source": "bundled-default", + "fingerprint": "73217cc58744bd1271f647e40904a6b5f65919124927e109532cf19cb540d2a2" + }, + "asyncapi-3-server-no-trailing-slash": { + "ruleId": "asyncapi-3-server-no-trailing-slash", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but most tooling will handle this gracefully", + "source": "bundled-default", + "fingerprint": "dbd318934d060020a4b9dd9ca039da253d8a9f5c5803c29997461ec59407c12d" + }, + "asyncapi-server-not-example-com": { + "ruleId": "asyncapi-server-not-example-com", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but anything relying upon this invalid server URL is already broken so safe to change.", + "source": "bundled-default", + "fingerprint": "6d16fc8ff55cfbbd2bce5a0ea27a6f480ef42bb7b3affe275bcabd86bd7d7b9c" + }, + "asyncapi-3-server-not-example-com": { + "ruleId": "asyncapi-3-server-not-example-com", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but anything relying upon this invalid server URL is already broken so safe to change.", + "source": "bundled-default", + "fingerprint": 
"4ee30c83abcb62482873a1ce7892a7b20fc10af9f6b05fb7d542701bd3297eb1" + }, + "asyncapi-server-security": { + "ruleId": "asyncapi-server-security", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Referencing security schemes is a breaking change as impacts contract surface", + "source": "bundled-default", + "fingerprint": "be7582c1a1a3477d540e542f3108e47fa54227c518701511b2ec6845841262b6" + }, + "asyncapi-servers": { + "ruleId": "asyncapi-servers", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but most tooling will handle this gracefully", + "source": "bundled-default", + "fingerprint": "2e0f0eb538127a28f1225999b2b3f64cd883d2cfdadcdacdc1fcbdd506647864" + }, + "asyncapi-tag-description": { + "ruleId": "asyncapi-tag-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "473d407d2bfd3d2b87255230584e1d84a2593bdf3da8564188a3b4ad89523f4c" + }, + "asyncapi-3-tag-description": { + "ruleId": "asyncapi-3-tag-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "73c4edb519ef9fedc1f27aec4e998c494655c83cef71aec29ac8b3081a95166f" + }, + "asyncapi-tags-alphabetical": { + "ruleId": "asyncapi-tags-alphabetical", + "riskLevel": "low", + "confidenceLevel": "medium", + 
"remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the safe segment set", + "source": "bundled-default", + "fingerprint": "4be99ae3283c046c2bb1a4230f04773fdfcc70e86cb407e0b98a4774bab3fc7b" + }, + "asyncapi-3-tags-alphabetical": { + "ruleId": "asyncapi-3-tags-alphabetical", + "riskLevel": "low", + "confidenceLevel": "medium", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the safe segment set", + "source": "bundled-default", + "fingerprint": "ac440f78a0fcf503a6e8edd5122cd24c24f8876da489497f2496c753532f5721" + }, + "asyncapi-tags-uniqueness": { + "ruleId": "asyncapi-tags-uniqueness", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Tag uniqueness may impact some code generation tooling, but doesn't impact contract surface itself so is safe to change.", + "source": "bundled-default", + "fingerprint": "e44055bacacef48a6edabbe0f1234dbaea5dd767771fc04ddfe50f44e3c9b16e" + }, + "asyncapi-3-tags-uniqueness": { + "ruleId": "asyncapi-3-tags-uniqueness", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Tag uniqueness may impact some code generation tooling, but doesn't impact contract surface itself so is safe to change.", + "source": "bundled-default", + "fingerprint": "c84b94e3b094e2d7c179172f8fa23a4a3109edba85d059b6fcc041076fd39436" + }, + "asyncapi-tags": { + "ruleId": "asyncapi-tags", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Tags may impact some code generation tooling, but doesn't impact contract surface itself so is safe to change.", + "source": 
"bundled-default", + "fingerprint": "25d9843051c90a80ec8b2ac7abb63b36c62cf7189778bf775e7862459490ce6b" + }, + "asyncapi-3-tags": { + "ruleId": "asyncapi-3-tags", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Tags may impact some code generation tooling, but doesn't impact contract surface itself so is safe to change.", + "source": "bundled-default", + "fingerprint": "61bd1ce568ee9740fb1909b0b9e1d10c4523ce3f7d920590268b13dbe830d6f1" + }, + "asyncapi-unused-components-schema": { + "ruleId": "asyncapi-unused-components-schema", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Potentially unused components are identified for removal. Will impact contract surface if they are actually used.", + "source": "bundled-default", + "fingerprint": "ad2114ff35932d819129dd83aacc5af5b998faab8c59cf8a3628f97f2cd7696f" + }, + "asyncapi-unused-components-server": { + "ruleId": "asyncapi-unused-components-server", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Potentially unused components are identified for removal. Will impact contract surface if they are actually used.", + "source": "bundled-default", + "fingerprint": "9b139f05f9c36f82841256dd8df2c4558fc8568f7051698347cb8723fc22b2b0" + }, + "asyncapi-3-document-resolved": { + "ruleId": "asyncapi-3-document-resolved", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Current AsyncAPI specification is invalid. 
Some minor issues may be correctable without functionally impacting contract surface for most tools.", + "source": "bundled-default", + "fingerprint": "000fa2e885172a0f0776e07e8b1ac25ed6e8993f5d9e4eff7f95afb82e8269ac" + }, + "asyncapi-3-document-unresolved": { + "ruleId": "asyncapi-3-document-unresolved", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Current AsyncAPI specification is invalid. Some minor issues may be correctable without functionally impacting contract surface for most tools.", + "source": "bundled-default", + "fingerprint": "4b40f541a13e64f709ff52b761001fd6e6bd0fbd1f99188b09cc498079aceb6c" + } + } +} diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json b/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json new file mode 100644 index 0000000..4730090 --- /dev/null +++ b/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json @@ -0,0 +1,620 @@ +{ + "rules": { + "operation-success-response": { + "ruleId": "operation-success-response", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but may be documenting a missing success response that already exists.", + "source": "bundled-default", + "fingerprint": "61fd168d9bd22c73ec4f5a8b1b0c3b7cef34953e54fe968d2bfcf652e15b9035" + }, + "oas2-operation-formData-consume-check": { + "ruleId": "oas2-operation-formData-consume-check", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Breaking change as impacts contract surface, and code generation tooling", + "source": "bundled-default", + "fingerprint": 
"031f30e3251314ff995ec48893ac7bb71f2cd1ea501aa99206f4835e96efd905" + }, + "operation-operationId-unique": { + "ruleId": "operation-operationId-unique", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "operationId uniqueueness is a breaking change as impacts contract surface, and code generation tooling", + "source": "bundled-default", + "fingerprint": "193b1b016bbb37d15bb9cea1a27551b57e17d9e40144c2b7cbe43746dcc2564a" + }, + "operation-parameters": { + "ruleId": "operation-parameters", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Defining parameters is a breaking change only when types and parameter constraints/defaults are defined", + "source": "bundled-default", + "fingerprint": "b19bdf139c3a32e2216b0ef46ab992fccaba72810530d778dc14ffab59a1c6cf" + }, + "operation-tag-defined": { + "ruleId": "operation-tag-defined", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Tags may impact some code generation tooling, but doesn't impact contract surface itself so is safe to change.", + "source": "bundled-default", + "fingerprint": "8d46785b548305e07643e22ed832b2cce697964b3e24edabce26427f203a4642" + }, + "path-params": { + "ruleId": "path-params", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Defining parameters is a breaking change only when types and parameter constraints/defaults are defined", + "source": "bundled-default", + "fingerprint": "3e1ede2366a6dbfac8d8bb13681255e89ec40881316ddf844ace566bb46f9499" + }, + "contact-properties": { + "ruleId": "contact-properties", + "riskLevel": "low", + 
"confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive - add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "d59768d6529d4838dd2ea7ef0e9ff6527c897b12acffc53b949c22ebfaa1c923" + }, + "duplicated-entry-in-enum": { + "ruleId": "duplicated-entry-in-enum", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but most tooling will handle this gracefully", + "source": "bundled-default", + "fingerprint": "81a9374e76a50934f9c77375cfcd8af39454642a446e7ff0a68d94a5d98a5691" + }, + "info-contact": { + "ruleId": "info-contact", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive - add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "bfd79d377660294e00adc373539b65c8e6b03a4c6cb29033758e421ce4f5557d" + }, + "info-description": { + "ruleId": "info-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive - add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "f898530e669aaddf554b30a434a28412d61f3f47e196e4093cf73acfcde645e6" + }, + "info-license": { + "ruleId": "info-license", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive - add/populate a field) on a target matching the low tier", + "source": "bundled-default", + 
"fingerprint": "23d1efd67b85ecf0f54155777e27c019d4cb2a29bb0cff11efd849cc386c379f" + }, + "license-url": { + "ruleId": "license-url", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive - add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "735eb6f03143a91473b9573ab7e1526c94c10543ced941a88183a65b4d6c8d82" + }, + "no-eval-in-markdown": { + "ruleId": "no-eval-in-markdown", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (existence/validity check - `notMatch` validates content is present and correctly formed) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "d3ca7e2f67d2e957e30655eb4c6d567998bdbbdcbf390a12e71a23fdce3157fd" + }, + "no-script-tags-in-markdown": { + "ruleId": "no-script-tags-in-markdown", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (existence/validity check - `notMatch` validates content is present and correctly formed) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "be2229ebfd803a42d313318291544db932781c34973e60620fffda348fd0f394" + }, + "openapi-tags-alphabetical": { + "ruleId": "openapi-tags-alphabetical", + "riskLevel": "low", + "confidenceLevel": "medium", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the safe segment set", + "source": "bundled-default", + "fingerprint": "4403fea510bcccabfa5c4c1154134a92b2c61acc1421668162e163f4791790d4" + }, + "openapi-tags-uniqueness": { + "ruleId": "openapi-tags-uniqueness", + "riskLevel": 
"low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Tag uniqueness may impact some code generation tooling, but doesn't impact contract surface itself so is safe to change.", + "source": "bundled-default", + "fingerprint": "dfaddcf0cbfe5b642ebcdaaeb580a9020a1db9ed6d484744e0a9088da0132aae" + }, + "openapi-tags": { + "ruleId": "openapi-tags", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Tags may impact some code generation tooling, but doesn't impact contract surface itself so is safe to change.", + "source": "bundled-default", + "fingerprint": "995a098be9924c375e7b258224804477fe7b175f2cc70e785782672b1e3c72a7" + }, + "operation-description": { + "ruleId": "operation-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Operation description does not change contract surface", + "source": "bundled-default", + "fingerprint": "163bd18a22ce5304b12d4d35f4c12b430d66147a04a3cd27505a9a61ddaa9dc8" + }, + "operation-operationId": { + "ruleId": "operation-operationId", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Adding operationId is a breaking change as impacts contract surface, and code generation tooling", + "source": "bundled-default", + "fingerprint": "d4a660a6541790935065ad7a944cb0b3abf3a347e616d9737944fd1277065a88" + }, + "operation-operationId-valid-in-url": { + "ruleId": "operation-operationId-valid-in-url", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Changing an operationId is a breaking 
change as impacts contract surface, and code generation tooling", + "source": "bundled-default", + "fingerprint": "e66dd1ecb7649ac12ae77fcf35f033f4d7bffc423d43e8f1ad4a3e08470d3d10" + }, + "operation-singular-tag": { + "ruleId": "operation-singular-tag", + "riskLevel": "low", + "confidenceLevel": "medium", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the safe segment set", + "source": "bundled-default", + "fingerprint": "dba7be8211664f594c6467645bea5fb72955f15e009e96808fc3d37b997627ef" + }, + "operation-tags": { + "ruleId": "operation-tags", + "riskLevel": "low", + "confidenceLevel": "medium", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the safe segment set", + "source": "bundled-default", + "fingerprint": "eaf01c3f7f1b15840d54fe8dd6e238795022e175776ee4e71e5c039d46aff1ac" + }, + "path-declarations-must-exist": { + "ruleId": "path-declarations-must-exist", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Impacts contract surface, and code generation tooling", + "source": "bundled-default", + "fingerprint": "f43a9d6a968dc24d5717f606028c828be8ea73057afcfdcb173d6f2528335408" + }, + "path-keys-no-trailing-slash": { + "ruleId": "path-keys-no-trailing-slash", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Breaking change as impacts path contract", + "source": "bundled-default", + "fingerprint": "ea8d153efe6ac9cc8942c5f0292402751fa472ee5776e8a9a8dcafa7f99826a4" + }, + "path-not-include-query": { + "ruleId": "path-not-include-query", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + 
"staleFingerprintWarning": null, + "rationale": "Breaking change as impacts path contract", + "source": "bundled-default", + "fingerprint": "e2738a25c076ae7f0ef2187431847d76fcab2f3e3d9f543b392f1ab9fa7138c6" + }, + "tag-description": { + "ruleId": "tag-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive - add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "5119c1fc513d486764bfe8f50d2479aaee0d096de0e79cee023e63848df5f71c" + }, + "no-$ref-siblings": { + "ruleId": "no-$ref-siblings", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Stylistic change only, does not impact contract surface", + "source": "bundled-default", + "fingerprint": "908ccb23bbe35f2ea80f23a30cdbb6ddd56f1b62d0df483600c7f94a35a39199" + }, + "array-items": { + "ruleId": "array-items", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Impacts contract surface but functionally may not change expected behavior. 
Human review required to determine if change is safe.", + "source": "bundled-default", + "fingerprint": "3aa712ec537da47f9024300f105626c3bd25cad6951981e4f985e89f5c3dbb98" + }, + "typed-enum": { + "ruleId": "typed-enum", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Impacts contract surface and code generation tooling", + "source": "bundled-default", + "fingerprint": "ee1665cbcfeb49e3c434e3f04c6adcede44ec1e306f875701236dd747bd27c5e" + }, + "oas2-api-host": { + "ruleId": "oas2-api-host", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Impacts contract, but does not prevent use of other hosts so is safe to change.", + "source": "bundled-default", + "fingerprint": "606618dc6b0cd08c07fbe6aa8cccb58d0126d1ee387a43acb60adae316456d6d" + }, + "oas2-api-schemes": { + "ruleId": "oas2-api-schemes", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but can just be reflecting current behavior.", + "source": "bundled-default", + "fingerprint": "8341742f2c75ae195b1e699c4625a845827a82db83eca103a900ebe8bb8d220d" + }, + "oas2-discriminator": { + "ruleId": "oas2-discriminator", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Impacts contract surface and code generation tooling", + "source": "bundled-default", + "fingerprint": "4f852c5d830bd5f2b2705ed4f41555099dbab8d164776ecaf03dde7a66a7525d" + }, + "oas2-host-not-example": { + "ruleId": "oas2-host-not-example", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": 
"human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but anything relying upon this invalid server URL is already broken so safe to change.", + "source": "bundled-default", + "fingerprint": "411bdde4f2ea237f4c2871f690551e6f5f6de96501448981823681e2b8dc9e5e" + }, + "oas2-host-trailing-slash": { + "ruleId": "oas2-host-trailing-slash", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but most tooling will handle this gracefully", + "source": "bundled-default", + "fingerprint": "ab7f3cd618bb07568e09911593a5eeab78ee8c22d04a0048696320863d562a16" + }, + "oas2-parameter-description": { + "ruleId": "oas2-parameter-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Parameter description does not change contract surface", + "source": "bundled-default", + "fingerprint": "a71cf43534a262ed4e89f50f8f68892d376e09e6c7575cdec3d873c58a204aa9" + }, + "oas2-operation-security-defined": { + "ruleId": "oas2-operation-security-defined", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Referencing security schemes is a breaking change as impacts contract surface", + "source": "bundled-default", + "fingerprint": "099d37d08607e8e42d0548e69312ec7e1499e8566c0d35b7e73e2ae14c7eb8ef" + }, + "oas2-valid-schema-example": { + "ruleId": "oas2-valid-schema-example", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Examples do not change contract surface, but can impact some other tooling", + "source": "bundled-default", + 
"fingerprint": "0c45706676b8341bb78ab9224b84e55bd7603b459e42c9bfa7a4634707af3f03" + }, + "oas2-valid-media-example": { + "ruleId": "oas2-valid-media-example", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Examples do not change contract surface, but can impact some other tooling", + "source": "bundled-default", + "fingerprint": "aa301d7f693d177527187bad6ab0080c3dbbfec2b1a7e544f72b0a5b8b24b436" + }, + "oas2-anyOf": { + "ruleId": "oas2-anyOf", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Current specification is invalid. Impacts contract surface but current behavior is undefined.", + "source": "bundled-default", + "fingerprint": "83239f968899d5a54b93e5f9406fd2ec2d6da0848c6139219e6900aad75ba877" + }, + "oas2-oneOf": { + "ruleId": "oas2-oneOf", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Current specification is invalid. Impacts contract surface but current behavior is undefined.", + "source": "bundled-default", + "fingerprint": "482d2dceaa7d6683611a89e9b8408672b68c8e027a99788b35c19ab028ea48e5" + }, + "oas2-schema": { + "ruleId": "oas2-schema", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Current OpenAPI specification is invalid. 
Some minor issues may be correctable without functionally impacting contract surface for most tools.", + "source": "bundled-default", + "fingerprint": "b0f746a1d1d1b5f4aca6963162ac4c9d2ca2e7011302be6e03e6ad7e4163f50a" + }, + "oas2-unused-definition": { + "ruleId": "oas2-unused-definition", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Potentially unused components are identified for removal. Will impact contract surface if they are actually used.", + "source": "bundled-default", + "fingerprint": "db2cdd3c98956ad2642445a26bbcbbe616ae5e376b25f765d6646d3bc930ebb7" + }, + "oas3-api-servers": { + "ruleId": "oas3-api-servers", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the humanreview segment set", + "source": "bundled-default", + "fingerprint": "2db1e86eb4b72586a0316dc0ca473b738159b3175909eb12064ee62d9097d2a3" + }, + "oas3-examples-value-or-externalValue": { + "ruleId": "oas3-examples-value-or-externalValue", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Examples do not change contract surface, but can impact some other tooling", + "source": "bundled-default", + "fingerprint": "9ade61fceb521b0eb63ad4ca568a614cc05eded709048f2da4520116006cfe61" + }, + "oas3-operation-security-defined": { + "ruleId": "oas3-operation-security-defined", + "riskLevel": "high", + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Referencing security schemes is a breaking change as impacts contract surface", + "source": "bundled-default", + "fingerprint": 
"99c4d02ae16e11a76b219fb92f75c67c4a3ce8150e6574f09c0c49c4c1af85cd" + }, + "oas3-parameter-description": { + "ruleId": "oas3-parameter-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Parameter description does not change contract surface", + "source": "bundled-default", + "fingerprint": "2c00927d4b0b8bcb9bc3746ca56a9f65dd3bd3f06bc189ae7d51d9d630d01640" + }, + "oas3-server-not-example.com": { + "ruleId": "oas3-server-not-example.com", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but anything relying upon this invalid server URL is already broken so safe to change.", + "source": "bundled-default", + "fingerprint": "006381a3500034d8448234c9a2bc150b35b8eb1de914cd3d14fa8d83a176e172" + }, + "oas3-server-trailing-slash": { + "ruleId": "oas3-server-trailing-slash", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Technically a breaking change as impacts contract, but most tooling will handle this gracefully", + "source": "bundled-default", + "fingerprint": "842d08729f0957b1bd259f16efbba29ae28dfa018339113170bde28633a0db7f" + }, + "oas3-valid-media-example": { + "ruleId": "oas3-valid-media-example", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Examples do not change contract surface, but can impact some other tooling", + "source": "bundled-default", + "fingerprint": "d9020856ee9e770f11c51458177ca355c8f96a93877862fbad3dc2d0af38ac54" + }, + "oas3-valid-schema-example": { + "ruleId": "oas3-valid-schema-example", + "riskLevel": "low", + 
"confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Examples do not change contract surface, but can impact some other tooling", + "source": "bundled-default", + "fingerprint": "5c7f466d75908f95e4f652d5adfc7b08d6dd9b1b2bea7c1987a0798c9a3dcd10" + }, + "oas3-schema": { + "ruleId": "oas3-schema", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Current OpenAPI specification is invalid. Some minor issues may be correctable without functionally impacting contract surface for most tools.", + "source": "bundled-default", + "fingerprint": "2ec6ef11c1e00ad817c6f9b838bcb2739c4c72a699c03a631d1171dc063f4034" + }, + "oas3-unused-component": { + "ruleId": "oas3-unused-component", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Potentially unused components are identified for removal. 
Will impact contract surface if they are actually used.", + "source": "bundled-default", + "fingerprint": "c975b2b51bddaa30766251d29190e6a4ae6d99a2a623dcf56a171cb8ce1255a6" + }, + "oas3-server-variables": { + "ruleId": "oas3-server-variables", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Defining variables is a breaking change only when types and variable constraints/defaults are defined", + "source": "bundled-default", + "fingerprint": "d258d7ca2a322070e5314d24bdf7d8838a42cfab024ff4a3e39e560574ac8dde" + }, + "oas3-callbacks-in-callbacks": { + "ruleId": "oas3-callbacks-in-callbacks", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Impacts contract surface but current specification is invalid so human review is required to determine if change is safe.", + "source": "bundled-default", + "fingerprint": "8239e6a7a792354786211a3a29afc4ab82620ae5a15dd5e384a51ed12e265bfe" + }, + "oas3_1-servers-in-webhook": { + "ruleId": "oas3_1-servers-in-webhook", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Impacts contract surface but current specification is invalid so human review is required to determine if change is safe.", + "source": "bundled-default", + "fingerprint": "557e24a270abf7b2b600bc993949675b105927fb3238b8534c493888cc73b6cd" + }, + "oas3_1-callbacks-in-webhook": { + "ruleId": "oas3_1-callbacks-in-webhook", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "human", + "staleFingerprintWarning": null, + "rationale": "Impacts contract surface but current specification is invalid so human review is required to determine if change is 
safe.", + "source": "bundled-default", + "fingerprint": "b806759bcaa2062f1331821f0635ace112d38957a954e967b7308c21b56260c3" + } + } +} diff --git a/packages/api-grade-core/src/types.ts b/packages/api-grade-core/src/types.ts index 1366ef5..7db8e3e 100644 --- a/packages/api-grade-core/src/types.ts +++ b/packages/api-grade-core/src/types.ts @@ -110,16 +110,61 @@ export interface SessionState { sessionRulesetOverride: 'builtin' | null; } -export type ViolationClass = 'nonBreaking' | 'breaking' | 'unknown'; +export type RemediationSafetyLevel = 'safe' | 'humanreview' | 'unsafe'; -export interface QuickFix { +export type RiskLevel = 'low' | 'medium' | 'high'; + +export type ConfidenceLevel = 'high' | 'medium' | 'low'; + +export type AssessmentOrigin = 'human' | 'automated'; + +export type AnalysisSource = 'persisted' | 'bundled-default' | 'heuristic' | 'fallback'; + +export interface StaleFingerprintWarning { + storedFingerprint: string; + currentFingerprint: string; + message: string; +} + +export interface RuleAnalysis { + ruleId: string; + riskLevel: RiskLevel | null; + confidenceLevel: ConfidenceLevel; + remediationSafetyLevel: RemediationSafetyLevel; + assessedBy: AssessmentOrigin; + staleFingerprintWarning: StaleFingerprintWarning | null; + rationale: string; + source: AnalysisSource; +} + +export interface RulesetAnalysis { + rulesetSource: 'default' | 'custom'; + rulesetPath?: string; + rules: RuleAnalysis[]; +} + +export interface RemediationItem { ruleId: string; message: string; - severity: string; + severity: DiagnosticSeverity; path: string[]; location: string; + range: Diagnostic['range']; currentValue: string | null; expectedImprovement: string; + riskLevel: RiskLevel | null; + confidenceLevel: ConfidenceLevel; + remediationSafetyLevel: RemediationSafetyLevel; + safetyRationale: string; + staleFingerprintWarning: StaleFingerprintWarning | null; +} + +export interface DiagnosticWithSafety extends Diagnostic { + riskLevel: RiskLevel | null; + confidenceLevel: 
ConfidenceLevel; + remediationSafetyLevel: RemediationSafetyLevel; + safetyRationale: string; + staleFingerprintWarning: StaleFingerprintWarning | null; } export interface CommonGradeOutput { @@ -129,7 +174,7 @@ export interface CommonGradeOutput { gradeLabel: GradeLabel; numericScore: number; summary: DiagnosticSummary; - diagnostics: Diagnostic[]; + diagnostics: Diagnostic[] | DiagnosticWithSafety[]; truncated?: boolean; rulesetSource: 'default' | 'custom'; rulesetPath?: string; @@ -143,10 +188,29 @@ export interface AssertOutput { numericScore: number; } -export interface QuickFixOutput { +export interface PersistedRuleEntry extends RuleAnalysis { + fingerprint: string; +} + +export interface SharedRulesetAnalysis { + location: string; + rules: Record; +} + +export interface PersonalRulesetAnalysisOverride { + scope: 'workspace' | 'global'; + rules: Record; +} + +export interface BundledRulesetAnalysis { + rules: Record; +} + +export interface RemediationSafetyOutput { specPath: string; format: ApiFormat; totalViolations: number; - quickFixCount: number; - quickFixes: QuickFix[]; + remediationItemCount: number; + remediationItems: RemediationItem[]; + requestedLevel: RemediationSafetyLevel; } diff --git a/packages/api-grade-core/tests/unit/quick-fixes.test.ts b/packages/api-grade-core/tests/unit/quick-fixes.test.ts deleted file mode 100644 index 961334c..0000000 --- a/packages/api-grade-core/tests/unit/quick-fixes.test.ts +++ /dev/null @@ -1,114 +0,0 @@ -import { describe, it, expect } from 'vitest'; -import { classifyViolation, buildQuickFix, buildQuickFixOutput, formatQuickFixesHuman } from '../../src/quick-fixes.js'; -import type { Diagnostic, GradeResult } from '../../src/types.js'; - -function makeDiagnostic(overrides: Partial<Diagnostic>): Diagnostic { - return { - ruleId: 'test-rule', - message: 'test message', - severity: 'warn', - path: [], - range: { start: { line: 0, character: 0 }, end: { line: 0, character: 0 } }, - source: 'test.yaml', - ...overrides, - }; -}
- -describe('classifyViolation()', () => { - it('classifies operation-description as nonBreaking (rule ID override)', () => { - const d = makeDiagnostic({ ruleId: 'operation-description', path: ['paths', '/pets', 'get'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies violation at required field as breaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['paths', '/pets', 'get', 'parameters', '0', 'required'] }); - expect(classifyViolation(d)).toBe('breaking'); - }); - - it('classifies info-contact as nonBreaking (rule ID override)', () => { - const d = makeDiagnostic({ ruleId: 'info-contact', path: ['info'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies violation with x- extension path as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['info', 'x-logo'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies unknown path with no recognised segments as unknown', () => { - const d = makeDiagnostic({ ruleId: 'obscure-rule', path: ['components', 'securitySchemes', 'oauth2'] }); - expect(classifyViolation(d)).toBe('unknown'); - }); - - it('classifies oas3-examples-* rules as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'oas3-examples-value-or-externalValue', path: ['paths', '/pets', 'get'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies description path segment as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['info', 'description'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies type path segment as breaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['components', 'schemas', 'Pet', 'type'] }); - expect(classifyViolation(d)).toBe('breaking'); - }); -}); - -describe('buildQuickFix()', () => { - it('builds a QuickFix from a diagnostic', () => { - const d = makeDiagnostic({ ruleId: 'info-contact', 
message: 'Missing contact', path: ['info'], severity: 'warn' }); - const fix = buildQuickFix(d, '{}'); - expect(fix.ruleId).toBe('info-contact'); - expect(fix.location).toBe('info'); - expect(fix.expectedImprovement).toContain('contact'); - }); -}); - -const baseResult: GradeResult = { - specPath: 'test.yaml', - format: 'openapi-3', - letterGrade: 'B', - gradeLabel: 'Good', - numericScore: 85, - summary: { - tone: 'Good', - severityLevel: 'INFO', - errorCount: 0, - warnCount: 2, - infoCount: 0, - hintCount: 0, - commentary: 'Good.', - text: 'Good.', - focusRules: [], - recommendations: [], - }, - diagnostics: [ - makeDiagnostic({ ruleId: 'info-contact', message: 'Missing contact', path: ['info'], severity: 'warn' }), - makeDiagnostic({ ruleId: 'some-rule', message: 'Required field missing', path: ['paths', '/pets', 'get', 'required'], severity: 'error' }), - ], - rulesetSource: 'default', -}; - -describe('buildQuickFixOutput()', () => { - it('filters diagnostics to the nonBreaking subset and counts totals', () => { - const output = buildQuickFixOutput(baseResult, '{}'); - expect(output.specPath).toBe('test.yaml'); - expect(output.format).toBe('openapi-3'); - expect(output.totalViolations).toBe(2); - expect(output.quickFixCount).toBe(1); - expect(output.quickFixes).toHaveLength(1); - expect(output.quickFixes[0].ruleId).toBe('info-contact'); - }); -}); - -describe('formatQuickFixesHuman()', () => { - it('renders the filtered quick-fix list as human-readable text', () => { - const text = formatQuickFixesHuman(baseResult, '{}'); - expect(text).toContain('Quick Fixes'); - expect(text).toContain('info-contact'); - expect(text).toContain('Missing contact'); - expect(text).not.toContain('some-rule'); - }); -}); diff --git a/packages/api-grade-core/tests/unit/remediation-safety.test.ts b/packages/api-grade-core/tests/unit/remediation-safety.test.ts new file mode 100644 index 0000000..1310262 --- /dev/null +++ b/packages/api-grade-core/tests/unit/remediation-safety.test.ts 
@@ -0,0 +1,499 @@ +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import { mkdtempSync, rmSync, writeFileSync, existsSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; +import { + analyseRuleset, + getRemediationSafety, + buildRemediationItem, + buildRemediationSafetyOutput, + formatRemediationSafetyHuman, + decisionMatrix, + computeRuleFingerprint, + persistRuleAnalysisCorrection, +} from '../../src/remediation-safety.js'; +import { deriveSharedAnalysisLocation } from '../../src/config/shared-ruleset-analysis.js'; +import { getWorkspaceOverridePath } from '../../src/config/personal-ruleset-override.js'; +import type { Diagnostic, GradeResult } from '../../src/types.js'; +import type { LoadedRuleset } from '../../src/rulesets/loader.js'; + +function makeDiagnostic(overrides: Partial<Diagnostic>): Diagnostic { + return { + ruleId: 'test-rule', + message: 'test message', + severity: 'warn', + path: [], + range: { start: { line: 0, character: 0 }, end: { line: 0, character: 0 } }, + source: 'test.yaml', + ...overrides, + }; +} + +function makeRuleset(rules: Record, rulesetSource: 'default' | 'custom' = 'custom', rulesetPath?: string): LoadedRuleset { + return { ruleset: { rules }, rulesetSource, ...(rulesetPath !== undefined ?
{ rulesetPath } : {}) }; +} + +describe('decisionMatrix()', () => { + it('low risk + high/medium confidence => safe', () => { + expect(decisionMatrix('low', 'high')).toBe('safe'); + expect(decisionMatrix('low', 'medium')).toBe('safe'); + }); + it('medium risk + high confidence => humanreview', () => { + expect(decisionMatrix('medium', 'high')).toBe('humanreview'); + }); + it('high risk (any confidence) => unsafe', () => { + expect(decisionMatrix('high', 'high')).toBe('unsafe'); + expect(decisionMatrix('high', 'low')).toBe('unsafe'); + }); + it('every other combination => humanreview', () => { + expect(decisionMatrix('low', 'low')).toBe('humanreview'); + expect(decisionMatrix('medium', 'medium')).toBe('humanreview'); + expect(decisionMatrix('medium', 'low')).toBe('humanreview'); + }); +}); + +describe('analyseRuleset() — Stage 1a key-selector check', () => { + it('classifies a path-key-selector rule as unsafe/high', async () => { + const ruleset = makeRuleset({ + 'custom-naming-convention': { given: '$.paths[*]~', then: { function: 'casing' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('high'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + expect(rules[0].source).toBe('heuristic'); + }); + + it('classifies a channel-key-selector rule as unsafe/high', async () => { + const ruleset = makeRuleset({ + 'custom-channel-rename': { given: '$.channels[*]~', then: { function: 'casing' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('high'); + }); + + it('classifies then.field "@key" on $.channels as unsafe/high (AsyncAPI 2.x pattern)', async () => { + const ruleset = makeRuleset({ + 'asyncapi-channel-no-empty-parameter': { + given: '$.channels', + then: { field: '@key', function: 'pattern' }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + 
expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('high'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + expect(rules[0].source).toBe('heuristic'); + }); + + it('classifies then.field "@key" on $.paths as unsafe/high', async () => { + const ruleset = makeRuleset({ + 'custom-path-key-rule': { + given: '$.paths', + then: { field: '@key', function: 'pattern' }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('high'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + }); + + it('does NOT apply @key check when given does not target paths/channels', async () => { + const ruleset = makeRuleset({ + 'custom-schema-key-rule': { + given: '$.components.schemas', + then: { field: '@key', function: 'pattern' }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + // @key on $.components.schemas → not paths/channels → falls through to Stage 1b + // pattern fn, schemas has no tier match → medium risk, high confidence (single tier) + expect(rules[0].riskLevel).toBe('medium'); + expect(rules[0].remediationSafetyLevel).toBe('humanreview'); + }); +}); + +describe('analyseRuleset() — Stage 1b function-mechanics classification', () => { + it('additive function on a safe segment => low risk => safe', async () => { + const ruleset = makeRuleset({ + 'operation-description': { given: '$.paths[*][*]', then: { field: 'description', function: 'truthy' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('low'); + expect(rules[0].remediationSafetyLevel).toBe('safe'); + }); + + it('additive function on an unsafe segment => high risk => unsafe', async () => { + const ruleset = makeRuleset({ + 'custom-required-truthy': { given: '$.paths[*][*].parameters[*].required', then: { function: 'truthy' } }, + }); + const { rules } = await analyseRuleset(ruleset); + 
expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + }); + + it('additive function targeting a safe field within a humanreview parent => low risk (field overrides parent)', async () => { + // $.operations.* has `operations` in HUMANREVIEW_SEGMENTS, but then.field=description is + // exclusively in SAFE_SEGMENTS — adding a description is safe regardless of parent. + const ruleset = makeRuleset({ + 'asyncapi-3-operation-description': { given: '$.operations.*', then: { field: 'description', function: 'truthy' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('low'); + expect(rules[0].confidenceLevel).toBe('high'); + expect(rules[0].remediationSafetyLevel).toBe('safe'); + }); + + it('additive function targeting a safe field within an unsafe parent => low risk (field overrides parent)', async () => { + // $.channels.*.parameters.* has `parameters` in UNSAFE_SEGMENTS, but then.field=description + // is exclusively safe — describing a parameter does not alter the contract. + const ruleset = makeRuleset({ + 'asyncapi-parameter-description': { + given: ['$.components.parameters.*', '$.channels.*.parameters.*'], + then: { field: 'description', function: 'truthy' }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('low'); + expect(rules[0].confidenceLevel).toBe('high'); + expect(rules[0].remediationSafetyLevel).toBe('safe'); + }); + + it('additive function targeting a humanreview field is NOT overridden (operationId in operations parent)', async () => { + // then.field=operationId is in HUMANREVIEW_SEGMENTS itself, so field-override does not apply. 
+ const ruleset = makeRuleset({ + 'asyncapi-operation-operationId': { given: '$.operations.*', then: { field: 'operationId', function: 'truthy' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('medium'); + expect(rules[0].remediationSafetyLevel).toBe('humanreview'); + }); + + it('rename function (pattern/casing) on default target => medium risk', async () => { + const ruleset = makeRuleset({ + 'custom-pattern-rule': { given: '$.components.schemas', then: { function: 'pattern' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('medium'); + }); + + it('pattern with match functionOption => rename/reformat classification (not existence check)', async () => { + const ruleset = makeRuleset({ + 'custom-format-rule': { + given: '$.paths[*][*]', + then: { field: 'operationId', function: 'pattern', functionOptions: { match: '^[a-z-]+$' } }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('medium'); + expect(rules[0].rationale).toContain('rename/reformat'); + }); + + it('pattern with notMatch-only => existence/validity check classification', async () => { + const ruleset = makeRuleset({ + 'asyncapi-3-channel-no-empty-parameter': { + given: '$.channels.*', + then: { field: 'address', function: 'pattern', functionOptions: { notMatch: '{}' } }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + // address in UNSAFE_SEGMENTS → high risk regardless of function mode + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + expect(rules[0].rationale).toContain('existence/validity check'); + }); + + it('pattern with notMatch-only on safe segment => low risk (additive)', async () => { + const ruleset = makeRuleset({ + 'custom-no-script-in-description': { + given: '$.paths[*][*]', + then: { field: 'description', function: 'pattern', functionOptions: { notMatch: ' conservative medium (not 
low)', async () => { + const ruleset = makeRuleset({ + 'custom-host-check': { + given: '$', + then: { field: 'host', function: 'pattern', functionOptions: { notMatch: 'example\\.com' } }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + // host has no tier match; empty tiers → conservative medium, not low + expect(rules[0].riskLevel).toBe('medium'); + expect(rules[0].remediationSafetyLevel).toBe('humanreview'); + }); + + it('pattern with both match and notMatch => rename/reformat (not existence check)', async () => { + const ruleset = makeRuleset({ + 'custom-ambiguous-pattern': { + given: '$.paths[*][*]', + then: { field: 'operationId', function: 'pattern', functionOptions: { match: '^[a-z]', notMatch: '__' } }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].rationale).toContain('rename/reformat'); + }); + + it('custom (unrecognized) function => high risk, low confidence', async () => { + const ruleset = makeRuleset({ + 'my-custom-rule': { given: '$.info', then: { function: 'myCustomFn' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('low'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + }); +}); + +describe('analyseRuleset() — Stage 1c generic segment fallback', () => { + it('unrecognized function, given matches single unsafe segment => high/medium confidence', async () => { + const ruleset = makeRuleset({ + 'custom-required-header': { + given: "$.paths[*][*].parameters[?(@.in=='header')].required", + then: { function: 'schema' }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('medium'); + expect(rules[0].source).toBe('heuristic'); + }); + + it('given matches multiple tiers => ambiguous, low confidence', async () => { + const ruleset = makeRuleset({ + 'custom-ambiguous': { + given: 
'$.paths[*][*].description.required', + then: { function: 'schema' }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].confidenceLevel).toBe('low'); + }); +}); + +describe('analyseRuleset() — Stage 2 whole-document fallback', () => { + it('no rule-id, function, or path signal at all => unsafe/low', async () => { + const ruleset = makeRuleset({ + 'oas3-schema': { given: '$', then: { function: 'schema' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('low'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + expect(rules[0].source).toBe('fallback'); + }); +}); + +describe('analyseRuleset() — total coverage (SC-005)', () => { + it('produces exactly one RuleAnalysis entry per rule key, no omissions', async () => { + const ruleset = makeRuleset({ + 'rule-a': { given: '$.info', then: { function: 'truthy' } }, + 'rule-b': { given: '$', then: { function: 'unknownFn' } }, + 'rule-c': { given: '$.paths[*]~', then: { function: 'casing' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules).toHaveLength(3); + expect(rules.map((r) => r.ruleId).sort()).toEqual(['rule-a', 'rule-b', 'rule-c']); + }); +}); + +describe('getRemediationSafety()', () => { + it('returns the rule analysis fields verbatim for a recognized ruleId', async () => { + const ruleset = makeRuleset({ + 'operation-description': { given: '$.paths[*][*]', then: { field: 'description', function: 'truthy' } }, + }); + const rulesetAnalysis = await analyseRuleset(ruleset); + const d = makeDiagnostic({ ruleId: 'operation-description' }); + const result = getRemediationSafety(d, rulesetAnalysis); + expect(result.remediationSafetyLevel).toBe('safe'); + expect(result.riskLevel).toBe('low'); + }); + + it('FR-009: defaults to unsafe/low/high on lookup miss', async () => { + const ruleset = makeRuleset({ + 'operation-description': { given: '$.paths[*][*]', then: { 
field: 'description', function: 'truthy' } }, + }); + const rulesetAnalysis = await analyseRuleset(ruleset); + const d = makeDiagnostic({ ruleId: 'never-seen-rule' }); + const result = getRemediationSafety(d, rulesetAnalysis); + expect(result).toEqual({ + riskLevel: 'high', + confidenceLevel: 'low', + remediationSafetyLevel: 'unsafe', + safetyRationale: 'no recognizable rule-id, function, or path signal', + staleFingerprintWarning: null, + }); + }); +}); + +describe('buildRemediationItem()', () => { + it('builds a RemediationItem from a diagnostic with safety fields attached', async () => { + const ruleset = makeRuleset({ + 'info-contact': { given: '$.info', then: { function: 'truthy' } }, + }); + const rulesetAnalysis = await analyseRuleset(ruleset); + const d = makeDiagnostic({ ruleId: 'info-contact', message: 'Missing contact', path: ['info'], severity: 'warn' }); + const item = buildRemediationItem(d, '{}', rulesetAnalysis); + expect(item.ruleId).toBe('info-contact'); + expect(item.location).toBe('info'); + expect(item.remediationSafetyLevel).toBe('safe'); + expect(item.riskLevel).toBe('low'); + }); +}); + +describe('buildRemediationSafetyOutput() / formatRemediationSafetyHuman()', () => { + const baseRuleset = makeRuleset({ + 'info-contact': { given: '$.info', then: { function: 'truthy' } }, + 'custom-required-rule': { given: '$.paths[*][*].required', then: { function: 'schema' } }, + }); + + const baseResult: GradeResult = { + specPath: 'test.yaml', + format: 'openapi-3', + letterGrade: 'B', + gradeLabel: 'Good', + numericScore: 85, + summary: { + tone: 'Good', + severityLevel: 'INFO', + errorCount: 0, + warnCount: 2, + infoCount: 0, + hintCount: 0, + commentary: 'Good.', + text: 'Good.', + focusRules: [], + recommendations: [], + }, + diagnostics: [ + makeDiagnostic({ ruleId: 'info-contact', message: 'Missing contact', path: ['info'], severity: 'warn' }), + makeDiagnostic({ ruleId: 'custom-required-rule', message: 'Required field missing', path: ['paths', 
'/pets', 'get', 'required'], severity: 'error' }), + ], + rulesetSource: 'default', + }; + + it('filters diagnostics to the requested level and counts totals', async () => { + const rulesetAnalysis = await analyseRuleset(baseRuleset); + const output = buildRemediationSafetyOutput(baseResult, '{}', rulesetAnalysis, 'safe'); + expect(output.specPath).toBe('test.yaml'); + expect(output.format).toBe('openapi-3'); + expect(output.totalViolations).toBe(2); + expect(output.requestedLevel).toBe('safe'); + expect(output.remediationItemCount).toBe(1); + expect(output.remediationItems).toHaveLength(1); + expect(output.remediationItems[0].ruleId).toBe('info-contact'); + }); + + it('safe membership is unchanged from pre-feature classifyViolation() behavior (FR-007)', async () => { + const rulesetAnalysis = await analyseRuleset(baseRuleset); + const output = buildRemediationSafetyOutput(baseResult, '{}', rulesetAnalysis, 'safe'); + expect(output.remediationItems.map((i) => i.ruleId)).toEqual(['info-contact']); + }); + + it('renders the filtered list as human-readable text', async () => { + const rulesetAnalysis = await analyseRuleset(baseRuleset); + const text = formatRemediationSafetyHuman(baseResult, '{}', rulesetAnalysis, 'safe'); + expect(text).toContain('Remediation Safety: safe'); + expect(text).toContain('info-contact'); + expect(text).toContain('Missing contact'); + expect(text).not.toContain('custom-required-rule'); + }); +}); + +describe('computeRuleFingerprint()', () => { + it('is stable for the same rule definition', () => { + const rule = { given: '$.info', then: { function: 'truthy' }, severity: 1, description: 'd' }; + expect(computeRuleFingerprint('rule-a', rule)).toBe(computeRuleFingerprint('rule-a', rule)); + }); + + it('changes when the rule definition changes', () => { + const ruleA = { given: '$.info', then: { function: 'truthy' }, severity: 1, description: 'd' }; + const ruleB = { given: '$.info', then: { function: 'truthy' }, severity: 1, description: 
'changed' }; + expect(computeRuleFingerprint('rule-a', ruleA)).not.toBe(computeRuleFingerprint('rule-a', ruleB)); + }); +}); + +describe('Stage 0 precedence, fingerprint staleness, and persisted corrections', () => { + let workDir: string; + let originalCwd: typeof process.cwd; + + beforeEach(() => { + workDir = mkdtempSync(join(tmpdir(), 'api-grade-stage0-')); + originalCwd = process.cwd; + process.cwd = () => workDir; + }); + + afterEach(() => { + process.cwd = originalCwd; + rmSync(workDir, { recursive: true, force: true }); + }); + + it('honors a colocated shared analysis entry over the automated heuristic', async () => { + const rulesetPath = join(workDir, 'custom.yaml'); + writeFileSync(rulesetPath, 'extends: []\n'); + const ruleset = makeRuleset( + { 'custom-rule': { given: '$', then: { function: 'schema' } } }, + 'custom', + rulesetPath + ); + + const result = await persistRuleAnalysisCorrection(ruleset, 'custom-rule', 'safe', 'shared'); + expect(result.written).toBe('shared'); + expect(existsSync(deriveSharedAnalysisLocation(rulesetPath))).toBe(true); + + const analysis = await analyseRuleset(ruleset); + const entry = analysis.rules.find((r) => r.ruleId === 'custom-rule'); + expect(entry?.remediationSafetyLevel).toBe('safe'); + expect(entry?.assessedBy).toBe('human'); + expect(entry?.source).toBe('persisted'); + expect(entry?.staleFingerprintWarning).toBeNull(); + }); + + it('flags a human-assessed entry whose fingerprint no longer matches, but still honors it', async () => { + const rulesetPath = join(workDir, 'custom.yaml'); + writeFileSync(rulesetPath, 'extends: []\n'); + const originalRule = { given: '$', then: { function: 'schema' }, description: 'original' }; + const ruleset = makeRuleset({ 'custom-rule': originalRule }, 'custom', rulesetPath); + + await persistRuleAnalysisCorrection(ruleset, 'custom-rule', 'safe', 'shared'); + + const editedRule = { given: '$', then: { function: 'schema' }, description: 'edited since review' }; + const editedRuleset 
= makeRuleset({ 'custom-rule': editedRule }, 'custom', rulesetPath); + + const analysis = await analyseRuleset(editedRuleset); + const entry = analysis.rules.find((r) => r.ruleId === 'custom-rule'); + expect(entry?.remediationSafetyLevel).toBe('safe'); + expect(entry?.assessedBy).toBe('human'); + expect(entry?.staleFingerprintWarning).not.toBeNull(); + expect(entry?.staleFingerprintWarning?.storedFingerprint).not.toBe(entry?.staleFingerprintWarning?.currentFingerprint); + }); + + it('personal workspace override takes precedence over the shared colocated analysis', async () => { + const rulesetPath = join(workDir, 'custom.yaml'); + writeFileSync(rulesetPath, 'extends: []\n'); + const ruleset = makeRuleset({ 'custom-rule': { given: '$', then: { function: 'schema' } } }, 'custom', rulesetPath); + + await persistRuleAnalysisCorrection(ruleset, 'custom-rule', 'humanreview', 'shared'); + await persistRuleAnalysisCorrection(ruleset, 'custom-rule', 'safe', 'personal-workspace'); + expect(existsSync(getWorkspaceOverridePath())).toBe(true); + + const analysis = await analyseRuleset(ruleset); + const entry = analysis.rules.find((r) => r.ruleId === 'custom-rule'); + expect(entry?.remediationSafetyLevel).toBe('safe'); + }); + + it('falls back to a personal-override write for a non-writable (built-in) ruleset location', async () => { + const ruleset = makeRuleset({ 'operation-description': { given: '$.info', then: { function: 'truthy' } } }, 'default'); + const result = await persistRuleAnalysisCorrection(ruleset, 'operation-description', 'unsafe', 'shared'); + expect(result.written).toBe('personal-fallback'); + expect(result.sharedFileContent).toBeDefined(); + expect(existsSync(getWorkspaceOverridePath())).toBe(true); + }); +}); diff --git a/packages/api-grade-mcp/README.md b/packages/api-grade-mcp/README.md index 588dfb9..bb35be7 100644 --- a/packages/api-grade-mcp/README.md +++ b/packages/api-grade-mcp/README.md @@ -1,6 +1,6 @@ # @dawmatt/api-grade-mcp -MCP (Model Context 
Protocol) server that exposes api-grade capabilities as six AI tools — grade OpenAPI and AsyncAPI specifications directly from Claude Code, GitHub Copilot, or any MCP-compatible AI host. +MCP (Model Context Protocol) server that exposes api-grade capabilities as seven AI tools — grade OpenAPI and AsyncAPI specifications directly from Claude Code, GitHub Copilot, or any MCP-compatible AI host. ## Installation @@ -60,7 +60,8 @@ Create `.vscode/mcp.json` in your project root: | `grade-api` | Letter grade, score, and summary — token-efficient overview | | `grade-api-detailed` | Full grade with all violations and diagnostics | | `assert-api-grade` | Pass/fail assertion for a minimum grade threshold | -| `grade-api-remediation-safety` | Classified list of diagnostics filtered by remediation safety level (`safe`: non-breaking improvements) for AI-assisted correction | +| `grade-api-remediation-safety` | Classified list of diagnostics filtered by remediation safety level (`safe`, `humanreview`, or `unsafe`), each with a risk/confidence indicator, for AI-assisted correction | +| `analyse-ruleset-safety` | Per-rule risk, confidence, and remediation-safety analysis for a ruleset, independent of grading any spec | | `set-ruleset-config` | Set the default Spectral ruleset at session, workspace, or global scope | | `get-ruleset-config` | Get the active Spectral ruleset and which scope is effective | diff --git a/packages/api-grade-mcp/package.json b/packages/api-grade-mcp/package.json index 2b322b6..d35cf8f 100644 --- a/packages/api-grade-mcp/package.json +++ b/packages/api-grade-mcp/package.json @@ -1,6 +1,6 @@ { "name": "@dawmatt/api-grade-mcp", - "version": "0.4.0", + "version": "0.5.0", "description": "MCP server exposing api-grade capabilities for LLMs and agentic AI tooling", "keywords": [ "api", diff --git a/packages/api-grade-mcp/src/server.ts b/packages/api-grade-mcp/src/server.ts index 95721bd..4b7f612 100644 --- a/packages/api-grade-mcp/src/server.ts +++ 
b/packages/api-grade-mcp/src/server.ts @@ -5,9 +5,10 @@ import { resolve, dirname } from 'node:path'; import { registerGradeTool } from './tools/grade.js'; import { registerAssertGradeTool } from './tools/assert-grade.js'; import { registerGradeDetailedTool } from './tools/grade-detailed.js'; -import { registerQuickFixesOnlyTool } from './tools/quick-fixes-only.js'; +import { registerRemediationSafetyTool } from './tools/remediation-safety.js'; import { registerSetRulesetConfigTool } from './tools/set-ruleset-config.js'; import { registerGetRulesetConfigTool } from './tools/get-ruleset-config.js'; +import { registerAnalyseRulesetSafetyTool } from './tools/analyse-ruleset-safety.js'; import type { SessionState } from './types.js'; function getVersion(): string { @@ -27,8 +28,9 @@ export function createServer(): McpServer { registerGradeTool(server, sessionState); registerAssertGradeTool(server, sessionState); registerGradeDetailedTool(server, sessionState); - registerQuickFixesOnlyTool(server, sessionState); + registerRemediationSafetyTool(server, sessionState); registerSetRulesetConfigTool(server, sessionState); registerGetRulesetConfigTool(server, sessionState); + registerAnalyseRulesetSafetyTool(server, sessionState); return server; } diff --git a/packages/api-grade-mcp/src/tools/analyse-ruleset-safety.ts b/packages/api-grade-mcp/src/tools/analyse-ruleset-safety.ts new file mode 100644 index 0000000..9428219 --- /dev/null +++ b/packages/api-grade-mcp/src/tools/analyse-ruleset-safety.ts @@ -0,0 +1,106 @@ +import { statSync, writeFileSync, unlinkSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; +import { z } from 'zod'; +import { + loadWorkspaceConfig, + loadGlobalConfig, + resolveRuleset, + fetchRulesetContent, + RulesetAuthError, + INITIAL_FETCH_TIMEOUT_MS, + RETRY_FETCH_TIMEOUT_MS, + analyseRuleset, + loadRuleset, +} from '@dawmatt/api-grade-core'; +import type { McpServer } from 
'@modelcontextprotocol/sdk/server/mcp.js'; +import { mcpError, buildRulesetFetchFailureResponse, describeFetchFailureReason, ERROR_CODES } from '../utils/errors.js'; +import type { SessionState } from '@dawmatt/api-grade-core'; + +export function registerAnalyseRulesetSafetyTool(server: McpServer, sessionState: SessionState): void { + server.tool( + 'analyse-ruleset-safety', + "Inspect a Spectral ruleset's per-rule remediation-safety analysis (riskLevel, confidenceLevel, remediationSafetyLevel, assessedBy, rationale) without grading any specific API specification. Use this to understand how risky it would be to auto-remediate violations of each rule in a ruleset before running grade-api-remediation-safety against a real spec.", + { + rulesetPath: z + .string() + .optional() + .describe('Optional path to a custom Spectral-compatible ruleset file; omit to analyse the configured default or built-in ruleset'), + recoveryOption: z + .enum(['retry', 'use-builtin-once', 'use-builtin-session', 'cancel']) + .optional() + .describe( + 'Recovery action when the configured default ruleset is inaccessible. Only supply in response to a RULESET_AUTH_FAILED response. On receiving that response, present its recoveryOptions to the user verbatim and wait for their explicit choice before setting this field — do not select use-builtin-once or use-builtin-session on the user’s behalf.' + ), + }, + async ({ rulesetPath, recoveryOption }) => { + if (recoveryOption === 'cancel') { + return mcpError(ERROR_CODES.REQUEST_CANCELLED, 'Ruleset analysis cancelled by user.', {}); + } + + if (recoveryOption === 'use-builtin-session') { + sessionState.sessionRulesetOverride = 'builtin'; + } + + const workspaceConfig = await loadWorkspaceConfig(); + const globalConfig = await loadGlobalConfig(); + const resolved = resolveRuleset(rulesetPath, sessionState, workspaceConfig, globalConfig); + + let effectiveRulesetPath: string | undefined = resolved.rulesetPath ?? 
undefined; + let tempRulesetFile: string | undefined; + + if (resolved.rulesetPath?.startsWith('http')) { + if (recoveryOption === 'use-builtin-once') { + effectiveRulesetPath = undefined; + } else { + const timeoutMs = recoveryOption === 'retry' ? RETRY_FETCH_TIMEOUT_MS : INITIAL_FETCH_TIMEOUT_MS; + try { + let content: string; + if (resolved.auth?.type === 'github-pat') { + const token = resolved.auth.githubToken ?? process.env.GITHUB_TOKEN ?? ''; + content = await fetchRulesetContent(resolved.rulesetPath, token || undefined, timeoutMs); + } else { + content = await fetchRulesetContent(resolved.rulesetPath, undefined, timeoutMs); + } + tempRulesetFile = join(tmpdir(), `api-grade-ruleset-${Date.now()}.yaml`); + writeFileSync(tempRulesetFile, content); + effectiveRulesetPath = tempRulesetFile; + } catch (err) { + const reason = err instanceof RulesetAuthError ? err.reason : 'network-unreachable'; + return buildRulesetFetchFailureResponse( + reason, + resolved.rulesetPath, + resolved.scope, + `Could not fetch ruleset from '${resolved.rulesetPath}' (${resolved.scope} default): ${describeFetchFailureReason(reason)}.` + ); + } + } + } else if (effectiveRulesetPath) { + try { + statSync(effectiveRulesetPath); + } catch { + return mcpError( + ERROR_CODES.RULESET_NOT_FOUND, + `The ruleset file '${effectiveRulesetPath}' does not exist. Check the path and try again.`, + { rulesetPath: effectiveRulesetPath } + ); + } + } + + try { + const loadedRuleset = await loadRuleset('openapi-3', effectiveRulesetPath); + const analysis = await analyseRuleset(loadedRuleset); + return { content: [{ type: 'text', text: JSON.stringify(analysis) }] }; + } catch (err) { + const message = err instanceof Error ? 
err.message : String(err); + return mcpError( + ERROR_CODES.GRADE_ENGINE_ERROR, + `Ruleset analysis error: ${message}`, + { rulesetPath: effectiveRulesetPath } + ); + } finally { + if (tempRulesetFile) try { unlinkSync(tempRulesetFile); } catch { /* ignore */ } + } + } + ); +} diff --git a/packages/api-grade-mcp/src/tools/quick-fixes-only.ts b/packages/api-grade-mcp/src/tools/remediation-safety.ts similarity index 79% rename from packages/api-grade-mcp/src/tools/quick-fixes-only.ts rename to packages/api-grade-mcp/src/tools/remediation-safety.ts index c27f45f..9166175 100644 --- a/packages/api-grade-mcp/src/tools/quick-fixes-only.ts +++ b/packages/api-grade-mcp/src/tools/remediation-safety.ts @@ -11,18 +11,21 @@ import { RulesetAuthError, INITIAL_FETCH_TIMEOUT_MS, RETRY_FETCH_TIMEOUT_MS, - buildQuickFixOutput, + analyseRuleset, + buildRemediationSafetyOutput, + loadRuleset, } from '@dawmatt/api-grade-core'; +import type { RemediationSafetyLevel } from '@dawmatt/api-grade-core'; import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'; import { mcpError, buildRulesetFetchFailureResponse, describeFetchFailureReason, ERROR_CODES } from '../utils/errors.js'; import type { SessionState } from '@dawmatt/api-grade-core'; const LARGE_SPEC_THRESHOLD_BYTES = 500_000; -export function registerQuickFixesOnlyTool(server: McpServer, sessionState: SessionState): void { +export function registerRemediationSafetyTool(server: McpServer, sessionState: SessionState): void { server.tool( 'grade-api-remediation-safety', - 'Return a classified, AI-actionable list of diagnostics filtered by remediation safety level. The `safe` level covers improvements that can be made via non-breaking changes — those that do not alter the API interface contract (paths, methods, required parameters, schema types, or response structures). 
Use this tool (not grade-api-detailed) when the goal is for the AI to safely resolve violations; the AI generates the corrected specification content and the MCP server does not modify files.', + 'Return a classified, AI-actionable list of diagnostics filtered by remediation safety level: `safe` (non-breaking, safe to auto-apply), `humanreview` (typically additive/clarifying but should be confirmed by a human before applying at scale), or `unsafe` (could change request/response validation, required fields, types, or the parameter surface — requires human or explicitly-confirmed-agent review). Each returned item also carries a confidence indicator (riskLevel/confidenceLevel) explaining how sure the analyser is in its classification. Use this tool (not grade-api-detailed) when the goal is for the AI to safely resolve violations; the AI generates the corrected specification content and the MCP server does not modify files.', { specPath: z .string() @@ -30,8 +33,8 @@ export function registerQuickFixesOnlyTool(server: McpServer, sessionState: Sess 'Absolute or relative path to the OpenAPI or AsyncAPI specification file (YAML or JSON)' ), level: z - .enum(['safe']) - .describe('Remediation safety level to filter diagnostics by. Only "safe" is supported today.'), + .enum(['safe', 'humanreview', 'unsafe']) + .describe('Remediation safety level to filter diagnostics by.'), rulesetPath: z .string() .optional() @@ -43,7 +46,7 @@ export function registerQuickFixesOnlyTool(server: McpServer, sessionState: Sess 'Recovery action when the configured default ruleset is inaccessible. Only supply in response to a RULESET_AUTH_FAILED response. On receiving that response, present its recoveryOptions to the user verbatim and wait for their explicit choice before setting this field — do not select use-builtin-once or use-builtin-session on the user’s behalf.' 
), }, - async ({ specPath, rulesetPath, recoveryOption }) => { + async ({ specPath, level, rulesetPath, recoveryOption }) => { if (recoveryOption === 'cancel') { return mcpError(ERROR_CODES.REQUEST_CANCELLED, 'Grading request cancelled by user.', { specPath }); } @@ -118,8 +121,12 @@ export function registerQuickFixesOnlyTool(server: McpServer, sessionState: Sess try { const engine = new GradeEngine(); const result = await engine.grade({ specPath, rulesetPath: effectiveRulesetPath }); + const loadedRuleset = await loadRuleset(result.format, effectiveRulesetPath); + const rulesetAnalysis = await analyseRuleset(loadedRuleset); - const response: Record = { ...buildQuickFixOutput(result, specContent) }; + const response: Record = { + ...buildRemediationSafetyOutput(result, specContent, rulesetAnalysis, level as RemediationSafetyLevel), + }; if (largeSpecWarning) { response.largeSpecWarning = largeSpecWarning; diff --git a/packages/api-grade-mcp/src/utils/classify.ts b/packages/api-grade-mcp/src/utils/classify.ts index 9337817..7bed85f 100644 --- a/packages/api-grade-mcp/src/utils/classify.ts +++ b/packages/api-grade-mcp/src/utils/classify.ts @@ -1,2 +1,2 @@ -export { classifyViolation, buildQuickFix } from '@dawmatt/api-grade-core'; -export type { QuickFix, ViolationClass } from '@dawmatt/api-grade-core'; +export { analyseRuleset, getRemediationSafety } from '@dawmatt/api-grade-core'; +export type { RuleAnalysis, RemediationSafetyLevel, RiskLevel, ConfidenceLevel } from '@dawmatt/api-grade-core'; diff --git a/packages/api-grade-mcp/tests/integration/analyse-ruleset-safety.test.ts b/packages/api-grade-mcp/tests/integration/analyse-ruleset-safety.test.ts new file mode 100644 index 0000000..c383430 --- /dev/null +++ b/packages/api-grade-mcp/tests/integration/analyse-ruleset-safety.test.ts @@ -0,0 +1,38 @@ +import { describe, it, expect } from 'vitest'; +import { createServer } from '../../src/server.js'; + +type ToolRegistry = Record, extra: unknown) => Promise }>; + 
+async function callTool(server: ReturnType, toolName: string, args: Record) { + const tools = (server as unknown as { _registeredTools: ToolRegistry })._registeredTools; + const tool = tools[toolName]; + if (!tool) throw new Error(`${toolName} tool not registered`); + return tool.handler(args, {}) as Promise<{ content: [{ type: string; text: string }]; isError?: boolean }>; +} + +describe('analyse-ruleset-safety tool', () => { + it('returns a RulesetAnalysis document for the built-in ruleset', async () => { + const server = createServer(); + const result = await callTool(server, 'analyse-ruleset-safety', {}); + expect(result.isError).toBeFalsy(); + const body = JSON.parse(result.content[0].text); + expect(body).toHaveProperty('rulesetSource', 'default'); + expect(Array.isArray(body.rules)).toBe(true); + expect(body.rules.length).toBeGreaterThan(0); + for (const rule of body.rules) { + expect(rule).toHaveProperty('ruleId'); + expect(rule).toHaveProperty('confidenceLevel'); + expect(rule).toHaveProperty('remediationSafetyLevel'); + expect(rule).toHaveProperty('assessedBy'); + expect(rule).toHaveProperty('rationale'); + } + }); + + it('returns RULESET_NOT_FOUND for a non-existent custom ruleset', async () => { + const server = createServer(); + const result = await callTool(server, 'analyse-ruleset-safety', { rulesetPath: '/nonexistent/ruleset.yaml' }); + expect(result.isError).toBe(true); + const body = JSON.parse(result.content[0].text); + expect(body.error).toBe('RULESET_NOT_FOUND'); + }); +}); diff --git a/packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts b/packages/api-grade-mcp/tests/integration/remediation-safety.test.ts similarity index 77% rename from packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts rename to packages/api-grade-mcp/tests/integration/remediation-safety.test.ts index 2c8f7c7..2e16563 100644 --- a/packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts +++ 
b/packages/api-grade-mcp/tests/integration/remediation-safety.test.ts @@ -18,23 +18,24 @@ async function callTool(server: ReturnType, toolName: strin } describe('grade-api-remediation-safety tool', () => { - it('returns non-empty quickFixes for a spec with documentation gaps (quick fix opportunities)', async () => { + it.each(['safe', 'humanreview', 'unsafe'])('returns the RemediationSafetyOutput shape for level=%s', async (level) => { const server = createServer(); - const result = await callTool(server, 'grade-api-remediation-safety', { specPath: OPENAPI_POOR, level: 'safe' }); + const result = await callTool(server, 'grade-api-remediation-safety', { specPath: OPENAPI_POOR, level }); expect(result.isError).toBeFalsy(); const body = JSON.parse(result.content[0].text); - expect(body).toHaveProperty('quickFixes'); - expect(body).toHaveProperty('quickFixCount'); + expect(body).toHaveProperty('remediationItems'); + expect(body).toHaveProperty('remediationItemCount'); expect(body).toHaveProperty('totalViolations'); + expect(body).toHaveProperty('requestedLevel', level); }); - it('each violation has all required fields', async () => { + it('each remediation item has all required fields', async () => { const server = createServer(); const result = await callTool(server, 'grade-api-remediation-safety', { specPath: OPENAPI_POOR, level: 'safe' }); expect(result.isError).toBeFalsy(); const body = JSON.parse(result.content[0].text); - if (body.quickFixes.length > 0) { - const v = body.quickFixes[0]; + if (body.remediationItems.length > 0) { + const v = body.remediationItems[0]; expect(v).toHaveProperty('ruleId'); expect(v).toHaveProperty('message'); expect(v).toHaveProperty('severity'); @@ -42,29 +43,33 @@ describe('grade-api-remediation-safety tool', () => { expect(v).toHaveProperty('location'); expect(v).toHaveProperty('currentValue'); expect(v).toHaveProperty('expectedImprovement'); + expect(v).toHaveProperty('riskLevel'); + expect(v).toHaveProperty('confidenceLevel'); + 
expect(v).toHaveProperty('remediationSafetyLevel', 'safe'); + expect(v).toHaveProperty('staleFingerprintWarning', null); expect(typeof v.expectedImprovement).toBe('string'); expect(v.expectedImprovement.length).toBeGreaterThan(0); } }); - it('no violation in quickFixes is a breaking change', async () => { + it('no violation in the safe level is a breaking change', async () => { const server = createServer(); const result = await callTool(server, 'grade-api-remediation-safety', { specPath: OPENAPI_POOR, level: 'safe' }); expect(result.isError).toBeFalsy(); const body = JSON.parse(result.content[0].text); - for (const v of body.quickFixes) { + for (const v of body.remediationItems) { expect(v.path).not.toContain('required'); expect(v.path).not.toContain('type'); } }); - it('quickFixCount matches quickFixes length', async () => { + it('remediationItemCount matches remediationItems length', async () => { const server = createServer(); const result = await callTool(server, 'grade-api-remediation-safety', { specPath: OPENAPI_MUSEUM, level: 'safe' }); expect(result.isError).toBeFalsy(); const body = JSON.parse(result.content[0].text); - expect(typeof body.quickFixCount).toBe('number'); - expect(body.quickFixCount).toBe(body.quickFixes.length); + expect(typeof body.remediationItemCount).toBe('number'); + expect(body.remediationItemCount).toBe(body.remediationItems.length); }); it('returns RULESET_NOT_FOUND for non-existent local ruleset', async () => { @@ -91,6 +96,6 @@ describe('grade-api-remediation-safety tool', () => { const server = createServer(); const tools = (server as unknown as { _registeredTools: ToolRegistry })._registeredTools; const tool = tools['grade-api-remediation-safety'] as unknown as { inputSchema: { parse: (v: unknown) => unknown } }; - expect(() => tool.inputSchema.parse({ specPath: OPENAPI_POOR, level: 'unsafe' })).toThrow(); + expect(() => tool.inputSchema.parse({ specPath: OPENAPI_POOR, level: 'breaking' })).toThrow(); }); }); diff --git 
a/packages/api-grade-mcp/tests/unit/classify.test.ts b/packages/api-grade-mcp/tests/unit/classify.test.ts index 7a39ea5..49de743 100644 --- a/packages/api-grade-mcp/tests/unit/classify.test.ts +++ b/packages/api-grade-mcp/tests/unit/classify.test.ts @@ -1,5 +1,5 @@ import { describe, it, expect } from 'vitest'; -import { classifyViolation } from '../../src/utils/classify.js'; +import { analyseRuleset, getRemediationSafety } from '../../src/utils/classify.js'; import type { Diagnostic } from '@dawmatt/api-grade-core'; function makeDiagnostic(overrides: Partial): Diagnostic { @@ -14,44 +14,40 @@ function makeDiagnostic(overrides: Partial): Diagnostic { }; } -describe('classifyViolation()', () => { - it('classifies operation-description as nonBreaking (rule ID override)', () => { +describe('classify.ts re-exports', () => { + it('analyseRuleset() classifies a safe rule', async () => { + const loadedRuleset = { + ruleset: { + rules: { + 'operation-description': { given: '$.paths[*][*]', then: { field: 'description', function: 'truthy' } }, + }, + }, + rulesetSource: 'custom' as const, + }; + const analysis = await analyseRuleset(loadedRuleset); + expect(analysis.rules[0].remediationSafetyLevel).toBe('safe'); + }); + + it('getRemediationSafety() looks up a violation against a RulesetAnalysis', async () => { + const loadedRuleset = { + ruleset: { + rules: { + 'operation-description': { given: '$.paths[*][*]', then: { field: 'description', function: 'truthy' } }, + }, + }, + rulesetSource: 'custom' as const, + }; + const analysis = await analyseRuleset(loadedRuleset); const d = makeDiagnostic({ ruleId: 'operation-description', path: ['paths', '/pets', 'get'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); + const result = getRemediationSafety(d, analysis); + expect(result.remediationSafetyLevel).toBe('safe'); }); - it('classifies violation at required field as breaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['paths', '/pets', 'get', 
'parameters', '0', 'required'] }); - expect(classifyViolation(d)).toBe('breaking'); - }); - - it('classifies info-contact as nonBreaking (rule ID override)', () => { - const d = makeDiagnostic({ ruleId: 'info-contact', path: ['info'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies violation with x- extension path as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['info', 'x-logo'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies unknown path with no recognised segments as unknown', () => { - const d = makeDiagnostic({ ruleId: 'obscure-rule', path: ['components', 'securitySchemes', 'oauth2'] }); - expect(classifyViolation(d)).toBe('unknown'); - }); - - it('classifies oas3-examples-* rules as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'oas3-examples-value-or-externalValue', path: ['paths', '/pets', 'get'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies description path segment as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['info', 'description'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies type path segment as breaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['components', 'schemas', 'Pet', 'type'] }); - expect(classifyViolation(d)).toBe('breaking'); + it('getRemediationSafety() defaults to unsafe/low on lookup miss (FR-009)', () => { + const analysis = { rulesetSource: 'custom' as const, rules: [] }; + const d = makeDiagnostic({ ruleId: 'never-seen' }); + const result = getRemediationSafety(d, analysis); + expect(result.remediationSafetyLevel).toBe('unsafe'); + expect(result.confidenceLevel).toBe('low'); }); }); diff --git a/packages/backstage-plugin-api-grade-backend/package.json b/packages/backstage-plugin-api-grade-backend/package.json index a57647f..0fb3d81 100644 --- 
a/packages/backstage-plugin-api-grade-backend/package.json +++ b/packages/backstage-plugin-api-grade-backend/package.json @@ -1,6 +1,6 @@ { "name": "@dawmatt/backstage-plugin-api-grade-backend", - "version": "0.4.0", + "version": "0.5.0", "description": "Backstage backend plugin — grades API entity specs and returns results via HTTP", "keywords": [ "backstage", diff --git a/packages/backstage-plugin-api-grade/package.json b/packages/backstage-plugin-api-grade/package.json index b638cf3..0e631ba 100644 --- a/packages/backstage-plugin-api-grade/package.json +++ b/packages/backstage-plugin-api-grade/package.json @@ -1,6 +1,6 @@ { "name": "@dawmatt/backstage-plugin-api-grade", - "version": "0.4.0", + "version": "0.5.0", "description": "Backstage frontend plugin — displays API quality grades on API entity pages", "keywords": [ "backstage", diff --git a/specs/012-remediation-safety/checklists/requirements.md b/specs/012-remediation-safety/checklists/requirements.md new file mode 100644 index 0000000..792f444 --- /dev/null +++ b/specs/012-remediation-safety/checklists/requirements.md @@ -0,0 +1,34 @@ +# Specification Quality Checklist: Remediation Safety (Ruleset Analyser & Multi-Level Safety) + +**Purpose**: Validate specification completeness and quality before proceeding to planning +**Created**: 2026-06-23 +**Feature**: [spec.md](../spec.md) + +## Content Quality + +- [x] No implementation details (languages, frameworks, APIs) +- [x] Focused on user value and business needs +- [x] Written for non-technical stakeholders +- [x] All mandatory sections completed + +## Requirement Completeness + +- [x] No [NEEDS CLARIFICATION] markers remain +- [x] Requirements are testable and unambiguous +- [x] Success criteria are measurable +- [x] Success criteria are technology-agnostic (no implementation details) +- [x] All acceptance scenarios are defined +- [x] Edge cases are identified +- [x] Scope is clearly bounded +- [x] Dependencies and assumptions identified + +## Feature 
Readiness + +- [x] All functional requirements have clear acceptance criteria +- [x] User scenarios cover primary flows +- [x] Feature meets measurable outcomes defined in Success Criteria +- [x] No implementation details leak into specification + +## Notes + +- All checklist items pass on first draft; no [NEEDS CLARIFICATION] markers were required — reasonable defaults (documented in Assumptions) were used for the confidence-level scale, rationale field shape, and Backstage plugin scope. diff --git a/specs/012-remediation-safety/clarification-algorithm.md b/specs/012-remediation-safety/clarification-algorithm.md new file mode 100644 index 0000000..081633a --- /dev/null +++ b/specs/012-remediation-safety/clarification-algorithm.md @@ -0,0 +1,297 @@ +# Clarification - Remediation Safety + +## Document Purpose + +This document advises how to produce the specification for the remediation safety algorithm. The resultant specification will be called `automated_remediation_safety_algorithm_spec.md` and stored in the `specs/algorithms` folder. + +The specification will provide both a human-readable description of the algorithm, including the rationale for its key aspects, and the proposed algorithm itself. This specification will drive the implementation of the algorithm. + +## Purpose + +We are checking the output of a linter (Spectral) to estimate how safe it is to remediate each of the errors, warnings, and other findings returned after linting an OpenAPI or AsyncAPI specification. The estimated remediation safety levels are safe, human review, or unsafe. Human review indicates that we cannot confidently assess the safety level, so a person should check the remediation before it is applied. + +The safety level indicates how likely it is that remediation will result in `breaking changes` to the API, requiring API consumers to change their code before they can continue to use the API. 
+ +The goal of determining the safety level is to identify which API lint issue remediations could be safely automated using AI without impacting API consumers. + +## Constraints + +- The Spectral ruleset being assessed is configurable by the user. It may not have been seen before and the estimation of remediation safety level will usually need to be performed programmatically. +- At the time we are estimating the remediation safety level we do not have a modified specification available, so we cannot verify whether a specific remediation attempt represents a breaking change relative to the original API. +- You can **estimate** risk automatically, but you **cannot guarantee correctness** for an arbitrary new Spectral ruleset with zero human intervention. The reason is that Spectral rules can use JSONPath selectors plus built-in functions, **and** they can also use **custom JavaScript functions**. That means some rules are simple and machine-interpretable, while others are effectively arbitrary code whose remediation intent cannot be derived perfectly from the rule declaration alone. +- Use a conservative operating mode where “unclear” means a safety level of human review or unsafe. + +## Recommended High Level Approach + +1. When possible, load pre-calculated risks and safety levels for known rulesets. At a minimum this will include the default ruleset, but provide an option for users to pre-configure their own risk and safety levels for their rulesets. + +2. When no pre-calculated risks and safety levels are available, pre-process the ruleset. Use an **automated risk estimator** that infers the likely remediation, and its likely consumer impact, for every rule and outputs both a **risk score** and a **confidence score**. + +3. Use the **risk score** and **confidence score** to calculate a **safety level** for every rule in the ruleset. + +4. 
Allow this information to be persisted, so users can clarify and enrich the risk levels so they can be reloaded for use in future. + +## Recommended High Level Estimating Model Approach + +The most effective fully automated approach is: + +1. **Parse the ruleset itself** to understand `given`, `then`, `function`, `field`, `formats`, `message`, and `description`. Spectral rules are explicitly structured that way. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) +2. **Map the targeted document locations** such as `$.paths`, parameters, request bodies, responses, channel addresses, operations, or metadata to a **contract-surface ontology** for OpenAPI and AsyncAPI. OpenAPI and AsyncAPI define these structures formally. [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html), [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/adding-operations) +3. **Infer the minimal satisfying edit** that would make the finding disappear, using rule semantics where possible. Spectral findings are generated by applying functions to selected document locations, so the rule shape gives strong clues about what kind of edit would satisfy it. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[docs.devne...cademy.com\]](https://docs.devnet-academy.com/docs/postman/api-governance/configurable-rules/spectral/index.html) +4. **Estimate whether that likely edit touches public contract elements** such as paths, parameters, request or response schemas, channel addresses, or send/receive operations. Those elements are what consumers depend on. 
[\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html), [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/dynamic-channel-address), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/adding-operations) +5. **Downgrade confidence** when the rule uses a custom function or when the remediation is ambiguous. Spectral explicitly supports custom JavaScript functions, so some rules are not safely explainable by static metadata alone. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) + +For an unknown ruleset, the system should be designed to produce: + +* `estimatedRisk` (low, medium, high) +* `confidence` (low, medium, high) +* `remediationSafetyLevel` (safe, humanreview, unsafe) + +rather than a false claim of certainty. + +## Recommended approach for a new ruleset with no human intervention + +### 1. Treat the ruleset as data to be analysed + +Your system should ingest the ruleset itself, not just the lint output. Spectral rules expose at least: + +* `given` +* `then` +* `function` +* `field` +* `severity` +* `formats` +* `description` +* `message` [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md) + +That is enough to infer a lot about intent for many rules. + +#### Example + +A rule like this: + +```yaml +given: $.paths[*]~ +then: + function: pattern + functionOptions: + match: "^(\\/|[a-z0-9-.]+|{[a-zA-Z0-9_]+})+$" +``` + +clearly targets **path keys** and enforces a **naming convention**. Spectral’s own ruleset tutorial uses exactly this kind of example. 
[\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md) + +That tells you immediately: + +* the finding touches a **public URI surface** +* the likely remediation is a **rename** +* renaming a real path is **consumer-affecting**. OpenAPI treats paths and path templating as part of the API contract. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +*** + +### 2. Build a format-aware contract-surface ontology + +Your system needs a built-in understanding of which parts of OpenAPI and AsyncAPI are more likely to affect consumers. + +#### For OpenAPI + +OpenAPI formally describes HTTP API structure including `paths`, operations, parameters, request bodies, and responses, and says the description is used by documentation generators, code generators, and testing tools. 
[\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +So you should classify targeted locations roughly like this: + +##### High consumer-impact areas + +* `paths` keys +* path template variables +* parameters +* request bodies +* response bodies +* response codes +* security requirements +* reusable schemas referenced by the above [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +##### Medium consumer-impact areas + +* `operationId` +* tags or names used by codegen and docs +* component identifiers used in client generation [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +##### Low consumer-impact areas + +* descriptions +* contact metadata +* licence metadata +* summaries, where not used as identifiers [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) + +##### Important example + +OpenAPI path templating requires that each template expression correspond to a path parameter. That means a rule targeting path-template consistency is touching a real contract concern, even if worded as “correctness” rather than “compatibility”. [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +*** + +#### For AsyncAPI + +AsyncAPI formally describes channels, operations, messages, and action semantics such as `send` and `receive`, and also describes dynamic channel addresses and parameters. 
[\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/dynamic-channel-address), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/adding-operations) + +So you should classify: + +##### High consumer-impact areas + +* channel `address` +* channel parameters tied to address placeholders +* operation `action` +* operation-channel relationship +* messages and payload schemas +* reply or operation semantics if covered by the ruleset [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/dynamic-channel-address), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/adding-operations) + +##### Low consumer-impact areas + +* metadata such as descriptions and contact details [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0) + +*** + +### 3. Infer the likely remediation from the rule mechanics + +This is the key step. + +#### If the rule uses a built-in structural function + +Many rules are understandable from the rule body alone. + +Examples: + +* `truthy` on a descriptive field usually means “add the missing field” or “make it non-empty” [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) +* `pattern` usually means “rename or reformat the targeted value” [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.devne...cademy.com\]](https://docs.devnet-academy.com/docs/postman/api-governance/configurable-rules/spectral/index.html) +* `field` plus `truthy` usually means “add a missing sub-field” [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) + +That lets you classify the likely remediation with no manual rule catalogue. 
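
The built-in cases above could be sketched as a small lookup from function name to likely remediation. This is a minimal illustration, not the analyser's actual implementation: `LikelyRemediation`, `RuleShape`, `BUILT_IN_REMEDIATIONS`, and `inferLikelyRemediation` are hypothetical names, and the rule shape is simplified to a single `given` string and a single `then` clause (real Spectral rules also allow arrays for both).

```typescript
// Hypothetical labels for the kind of edit most likely to satisfy a rule.
type LikelyRemediation = 'add-missing-value' | 'rename-or-reformat' | 'unknown';

// Simplified, assumed view of a Spectral rule: one `given` path, one `then` clause.
interface RuleShape {
  given: string;
  then: { function: string; field?: string };
}

// Built-in functions named in the text above, mapped to their usual fix.
const BUILT_IN_REMEDIATIONS = new Map<string, LikelyRemediation>([
  ['truthy', 'add-missing-value'],
  ['pattern', 'rename-or-reformat'],
]);

// Any function not in the map (including custom functions) is reported as 'unknown'.
function inferLikelyRemediation(rule: RuleShape): LikelyRemediation {
  return BUILT_IN_REMEDIATIONS.get(rule.then.function) ?? 'unknown';
}

// Example rules modelled on the shapes quoted in this document.
const descriptionRule: RuleShape = {
  given: '$.paths[*][*]',
  then: { field: 'description', function: 'truthy' },
};
const pathNamingRule: RuleShape = {
  given: '$.paths[*]~',
  then: { function: 'pattern' },
};
// Hypothetical custom function name, used only to show the fallback.
const customRule: RuleShape = {
  given: '$.paths',
  then: { function: 'myCustomCheck' },
};
```

Returning `'unknown'` for anything outside the lookup keeps the sketch conservative: an unrecognised function never gets treated as a simple additive fix.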
+ +#### If the rule uses a custom JavaScript function + +Then you must assume lower explainability, because Spectral custom functions are arbitrary JavaScript functions with access to input, options, and rule context. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) + +In that case, your system should: + +* inspect metadata such as `description`, `message`, `given`, `formats`, and `path` +* inspect the custom function name and file if available +* attempt static analysis only if safe and possible +* otherwise classify as `UnknownSemantics` with low confidence. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) + +*** + +### 4. Use “minimal satisfying edit” as the risk basis + +The best no-human-intervention heuristic is: + +> **Estimate risk from the least invasive valid edit that would satisfy the rule.** + +Why this works: + +* lint findings do not directly encode the final remediation +* but many rules imply a **smallest-change fix** +* breakage risk is driven by what that smallest plausible fix changes + +#### Examples + +##### Example A: missing description + +A rule targeting `$.info.description` with `truthy` implies adding documentation text. That is low contract risk. [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md) + +##### Example B: path kebab-case + +A rule targeting `$.paths[*]~` with `pattern` implies renaming path keys to match the pattern. If that key is a public endpoint path, the minimal satisfying edit changes the API surface and is therefore high risk. 
[\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +##### Example C: AsyncAPI channel parameter rule + +A rule targeting channel parameters may be satisfiable either by adding the missing parameter definition or by changing the address. The first is probably safer than the second. AsyncAPI documentation explicitly describes address placeholders and parameter declarations together. [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/dynamic-channel-address), [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0) + +This is exactly why your algorithm must output **risk plus confidence**, not just risk. + +*** + +### 5. Separate risk from confidence from safety level + +This is essential. + +A rule can be: + +* **high risk, high confidence** + Example: pattern rule directly targeting `$.paths[*]~` and requiring literal path rename. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[swagger.io\]](https://swagger.io/specification/) + +* **high risk, low confidence** + Example: custom function on `$.paths` with an unclear remediation path. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) + +* **low risk, high confidence** + Example: truthy rule on `$.info.description`. [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md) + +Without this separation, the system will either over-block harmless changes or under-block dangerous ones. 
+ +The implementation shall decide remediation safety as follows: +If estimatedRisk = Low and confidence in {High, Medium}: + remediationSafety = safe + +Else if estimatedRisk = Medium and confidence = High: + remediationSafety = human review + +Else if estimatedRisk = High: + remediationSafety = unsafe + +Else: + remediationSafety = human review + + +## References + +- [OpenAPI Breaking Changes: The Complete List of Rules | oasdiff](https://www.oasdiff.com/docs/breaking-changes) +- [Backward Compatibility Rules | Specmatic](https://docs.specmatic.io/contract_driven_development/backward_compatibility_rules) +- [Spectral Ruleset Functions](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) + +## Additional Context + + +### Spectral Rule Facts + +* Spectral rules are built from selectors and functions, and rulesets can extend built-in format-specific support for OpenAPI and AsyncAPI. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) +* Spectral supports custom JavaScript functions. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) +* OpenAPI descriptions define paths, operations, parameters, requests, and responses, and are used by documentation, testing, and code generation tooling. [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) +* AsyncAPI descriptions define channels, operations, actions, and channel-address parameters.
[\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/dynamic-channel-address), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/adding-operations) + + +### Risk estimation not verification + +The main obstacle is that Spectral is not just a fixed linter with a closed catalogue of built-in checks. It is a **ruleset engine** where rules consist of selectors and functions, and those functions can be **custom JavaScript**. A rule can target any JSON or YAML location with JSONPath and apply either a core function or custom logic. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) + +That means a new ruleset may contain: + +* obvious style rules such as “path should be kebab-case”, where the likely remediation is easy to infer from the `pattern` function and the path selector, or [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) +* arbitrary business-specific rules implemented in code, where the likely remediation may not be derivable statically with high confidence. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) + +Therefore, a zero-human system **must be conservative by design**: + +* **estimate risk automatically** +* **attach confidence** +* **default to safer outcomes when semantics are unclear**. 
[\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) + +### What changes when the ruleset is unknown? + +With a known ruleset, you can maintain a curated mapping from rule ID to likely remediation class. With an unknown ruleset, you cannot rely on rule names or prior human tagging. Instead, you must infer risk from the **rule structure** and the **API object model**. Spectral provides enough rule structure to do that in many cases because rules expose selectors, functions, and optional textual descriptions. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[docs.devne...cademy.com\]](https://docs.devnet-academy.com/docs/postman/api-governance/configurable-rules/spectral/index.html) + +So the problem becomes: + +> **Given a Spectral rule and a finding, what is the most likely minimal edit that would satisfy the rule, and would that edit alter the public contract surface?** + +That is the right abstraction for unknown rulesets. + +### Expected user-specific ruleset usage behaviour + +This tool is open source so we can't be certain which rulesets users will choose to use as the basis for grading. We need to allow for a wide range of rulesets when estimating remediation safety. + +For any given user, it is reasonable to expect they will choose a limited number of rulesets (typically 1) they will use when they perform API grading. This implies we will likely end up automatically assessing the remediation safety of a ruleset many times for a single user. 
+ +We recommend adding the ability for the user to: +- persist the ruleset remediation safety assessment; +- update the ruleset remediation safety assessment to align with their interpretation of remediation safety; and +- automatically load the ruleset remediation safety assessment when next using this ruleset. + +Benefits of this feature: +- A "human review" safety level indicates we can't safely estimate the risk. The user can perform this review once and then encode the correct safety level for this rule in this ruleset. +- Improved performance by avoiding the need to estimate ruleset safety on every run diff --git a/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md b/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md new file mode 100644 index 0000000..cfd9ab0 --- /dev/null +++ b/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md @@ -0,0 +1,49 @@ +# Contract: Remediation Safety Surfaces (Multi-Level + Ruleset Analyser) + +Supersedes `specs/011-remediation-safety-rename/contracts/remediation-safety-surfaces.md` for the surfaces below; that document remains a historical record of the Feature 11 rename. This contract covers the full implementation: three risk levels, confidence, and the new ruleset-analysis surfaces. + +## CLI: `--remediation-safety ` + +| Before this feature | After this feature | +|---|---| +| Accepts only `safe`; any other value rejected with `Error: --remediation-safety must be "safe".` | Accepts `safe`, `humanreview`, `unsafe`. Any other value rejected with `Error: --remediation-safety must be one of: safe, humanreview, unsafe.` | +| Filtered output built by `buildQuickFixOutput`/`formatQuickFixesHuman`, shape `QuickFixOutput` (`quickFixCount`, `quickFixes`). | Filtered output built by `buildRemediationSafetyOutput`/`formatRemediationSafetyHuman`, shape `RemediationSafetyOutput` (`remediationItemCount`, `remediationItems`, `requestedLevel`).
Each item additionally carries `riskLevel` (`low`/`medium`/`high`), `confidenceLevel`, `remediationSafetyLevel` (`safe`/`humanreview`/`unsafe` — a field in its own right, not the same field/type as `riskLevel`), and `staleFingerprintWarning` (`null` unless the rule's classification is human-assessed and its fingerprint no longer matches — FR-021). | +| `--remediation-safety safe` output identical to pre-Feature-12 `safe` output in violation membership. | Unchanged for `safe` membership (FR-007); new fields (`riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `requestedLevel`) are additive. `--remediation-safety`/`requestedLevel` filter against `remediationSafetyLevel`, not `riskLevel`. | +| — | `severity` and `range` on each `RemediationItem` MUST be carried over unchanged from the underlying `Diagnostic` (FR-022/SC-010) — filtering/reshaping into a `RemediationItem` is strictly additive, never lossy, relative to the regular diagnostic. | +| — | `--format json` output (this filtered shape, the regular `CommonGradeOutput` shape, and `ruleset-analysis`'s `RulesetAnalysis` shape) is always pretty-printed (FR-023/SC-011) — `JSON.stringify(value, null, 2)`, never a single compact line. | +| — | Per-violation `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` are also surfaced on the **regular**, unfiltered `--format json`/`--format human` output (FR-024/SC-012) — i.e. without `--remediation-safety` supplied at all — by decorating `CommonGradeOutput.diagnostics` (now typed `Diagnostic[] | DiagnosticWithSafety[]`) in place. | + +## CLI: new `ruleset-analysis` subcommand + +```text +api-grade ruleset-analysis [--ruleset-path ] [--format json|human] +``` + +- Without `--ruleset-path`, analyses the built-in default ruleset for the relevant format(s). +- `--format json` returns a `RulesetAnalysis` JSON document. 
+- `--format human` (default) prints a table: rule id, risk level, confidence level, remediation safety level, assessed by (`human`/`automated` — FR-020), rationale, plus a fingerprint-mismatch warning line for any human-assessed rule whose stored fingerprint no longer matches (FR-021). Risk level and confidence level are the two independent signals the analyser produces (FR-003); remediation safety level is a field in its own right, derived from them via the decision matrix in `automated_remediation_safety_algorithm_spec.md`, not assigned directly — except for `assessed by: human` rows, which store `remediationSafetyLevel` directly and have no `riskLevel` to derive it from. +- Exits non-zero only on a genuine error (e.g. ruleset file not found / unparseable) — analysis itself never partially fails (every rule gets an entry, per FR-001/SC-005). + +## MCP: `grade-api-remediation-safety` tool — `level` parameter + +| Before this feature | After this feature | +|---|---| +| `level: z.enum(['safe'])` | `level: z.enum(['safe', 'humanreview', 'unsafe'])` | +| Response payload: `QuickFixOutput` shape under different field names (`quickFixCount`, `quickFixes`) | Response payload: `RemediationSafetyOutput` shape (`remediationItemCount`, `remediationItems`, `requestedLevel`); each item includes `riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `staleFingerprintWarning` | +| Tool description silent on confidence/risk-tier concept | Tool description updated to mention all three levels and that each returned item carries a confidence indicator | + +## MCP: new `analyse-ruleset-safety` tool + +```text +Tool: analyse-ruleset-safety +Input: { rulesetPath?: string, recoveryOption?: 'retry' | 'use-builtin-once' | 'use-builtin-session' | 'cancel' } +Output: RulesetAnalysis JSON (rulesetSource, rulesetPath?, rules[]) +``` + +- Follows the same ruleset-resolution/recovery-option flow already used by `grade-api-remediation-safety` and `set-ruleset-config`/`get-ruleset-config` 
(reuses `resolveRuleset`, `RulesetAuthError`, `mcpError`/`ERROR_CODES`). +- Self-describing per the constitution's AI Integration Requirements: description alone is sufficient for an MCP host to know when to call it (inspecting a ruleset's remediation risk without grading any spec). + +## Out of scope for this contract + +- No change to how a ruleset is supplied/located (file path, GitHub PAT, workspace/global config) — only to what is computed once it's loaded. +- Backstage plugin packages are not touched — they do not currently surface quick-fix/remediation-safety information (confirmed: no "quick fix" references found in `packages/backstage-plugin-*`). diff --git a/specs/012-remediation-safety/data-model.md b/specs/012-remediation-safety/data-model.md new file mode 100644 index 0000000..a88e4d7 --- /dev/null +++ b/specs/012-remediation-safety/data-model.md @@ -0,0 +1,180 @@ +# Data Model: Remediation Safety (Ruleset Analyser & Multi-Level Safety) + +## RemediationSafetyLevel + +Enum string: `"safe"` | `"humanreview"` | `"unsafe"`. Ordered from least to most cautious. Replaces the prior two-class `ViolationClass` (`nonBreaking`/`breaking`/`unknown`). + +## RiskLevel + +Enum string: `"low"` | `"medium"` | `"high"`. The analyser's **estimate of consumer-impact likelihood** for the minimal edit that would satisfy a rule — independent of how confident the analyser is in that estimate (`ConfidenceLevel`, below) and independent of the `RemediationSafetyLevel` it resolves to (see Decision Matrix). Carried on the `riskLevel` field of both `RuleAnalysis` and `RemediationItem` — a deliberately distinct field, with a distinct type and distinct values, from each entity's separate `remediationSafetyLevel` field. An earlier version of this document conflated the two under one field also named `riskLevel` but typed as `RemediationSafetyLevel`; that was incorrect and is corrected throughout this document. + +## ConfidenceLevel + +Enum string: `"high"` | `"medium"` | `"low"`. 
Describes how confident the ruleset analyser is in a `RuleAnalysis`'s/`RemediationItem`'s assigned `riskLevel` — **not** directly in `remediationSafetyLevel`, though it feeds into deriving that value via the Decision Matrix below. + +- `high` — the rule's `given` selected path/channel object keys directly (Stage 1a); a recognized function (`truthy`/`pattern`/etc.) targeted an ontology area matching exactly one tier (Stage 1b); or the entry came from a Stage 0 lookup (persisted user correction, shared colocated analysis, or bundled pre-calculated default). +- `medium` — a recognized function's target spanned more than one ontology tier (Stage 1b), or the generic segment fallback matched a single, unambiguous tier (Stage 1c). +- `low` — the rule's function is unrecognized/custom (Stage 1b), the generic segment fallback matched more than one tier (Stage 1c, genuine ambiguity), or no recognizable signal at all (Stage 2 fallback). + +## AssessmentOrigin + +Enum string: `"human"` | `"automated"`. Who produced a `RuleAnalysis` entry's `remediationSafetyLevel` judgement — carried on the `assessedBy` field, independent of `source` (which stage/store produced the entry) and independent of `confidenceLevel` (how confident an *automated* judgement is in itself). `"human"` means a person explicitly reviewed and persisted this rule's classification (FR-013), including the built-in ruleset's well-known rules, which are authored by a maintainer rather than computed by Stage 1/2 — see `research.md` §3/§8. `"automated"` means Stage 1 or Stage 2 produced it with no human review, whether freshly computed or pre-computed and cached in `BundledRulesetAnalysis` at release time. This distinction governs fingerprint-staleness handling: see `RuleFingerprint` below. + +## AnalysisSource + +Enum string: `"persisted"` | `"bundled-default"` | `"heuristic"` | `"fallback"`. 
Provenance of a `RuleAnalysis` entry — which stage of the algorithm (`automated_remediation_safety_algorithm_spec.md`) produced it. `"persisted"` covers both `SharedRulesetAnalysis` and `PersonalRulesetAnalysisOverride` lookups; `"heuristic"`/`"fallback"` are Stage 1/Stage 2 respectively. There is no `"curated"` value — the former hard-coded curated rule-id table no longer exists as a separate stage; its content is now `BundledRulesetAnalysis` entries with `source: "bundled-default"`, `assessedBy: "human"` (see `AssessmentOrigin`). Not used for classification logic itself, but surfaced so a user inspecting analyser output (FR-011) can tell which store/stage produced an entry — `assessedBy` is the complementary field for *who* (human vs. automated) made the judgement. + +## RuleAnalysis + +One entry per rule in an analysed ruleset. + +| Field | Type | Description | +|---|---|---| +| `ruleId` | string | The rule's identifier within its ruleset. | +| `riskLevel` | `RiskLevel` \| `null` | The analyser's estimate of consumer-impact likelihood (Stages 1–2 only — see Decision Matrix below). `null` for `source: "persisted"` / `"bundled-default"` entries, which store a human-confirmed or pre-computed `remediationSafetyLevel` directly rather than deriving it from a risk estimate. | +| `confidenceLevel` | `ConfidenceLevel` | Confidence in `riskLevel` (Stages 1–2), or the confidence carried over from a Stage 0 entry. | +| `remediationSafetyLevel` | `RemediationSafetyLevel` | A field in its own right — the final assigned safety level for auto-remediating violations of this rule. For Stages 1–2, derived from `riskLevel` + `confidenceLevel` via the Decision Matrix below — never assigned directly by a stage. For Stage 0 entries, this is the stored value itself. 
| +| `assessedBy` | `AssessmentOrigin` | Who produced this judgement — `"human"` for Stage 0 entries written via a persisted correction or authored by a maintainer for the built-in ruleset; `"automated"` for everything else (Stage 1, Stage 2, and any pre-computed `BundledRulesetAnalysis` entry without a maintainer judgement behind it). | +| `staleFingerprintWarning` | `{ storedFingerprint: string; currentFingerprint: string; message: string }` \| `null` | Set only when this entry came from a Stage 0 lookup with `assessedBy: "human"` whose stored `RuleFingerprint` no longer matches the rule's current fingerprint (see `RuleFingerprint` below) — the entry is still used, but flagged. `null` in every other case, including a fingerprint match. | +| `rationale` | string | Short human-readable explanation of why this level/confidence was assigned (e.g. "maintainer-confirmed safe classification" or "`pattern` function on a `paths` object key — public-surface rename"). | +| `source` | `AnalysisSource` | Which stage produced this entry — see above. | + +**Validation rules**: every rule present in the input ruleset MUST produce exactly one `RuleAnalysis` entry (FR-001, SC-005) — no rule is ever omitted from analyser output. For `source` values `"heuristic"` and `"fallback"` (Stages 1–2), `remediationSafetyLevel` MUST equal `decisionMatrix(riskLevel, confidenceLevel)` — it is a derived value, not independently settable, and `assessedBy` MUST be `"automated"`. For `source` values `"persisted"` and `"bundled-default"`, `remediationSafetyLevel` is the stored value and `assessedBy` MUST reflect how that stored value was produced (see `AssessmentOrigin`). 
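The `decisionMatrix(riskLevel, confidenceLevel)` derivation named in the validation rules could be written roughly as follows. The type names and enum values are the ones this document defines; the function body is a sketch of the four-branch rule (low risk with high or medium confidence is `safe`, medium risk with high confidence is `humanreview`, high risk is `unsafe`, everything else is `humanreview`), not a copy of the shipped implementation.

```typescript
type RiskLevel = "low" | "medium" | "high";
type ConfidenceLevel = "high" | "medium" | "low";
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";

// Sketch of the decision matrix; branch order mirrors the written rule.
function decisionMatrix(
  riskLevel: RiskLevel,
  confidenceLevel: ConfidenceLevel,
): RemediationSafetyLevel {
  // Low risk is only trusted when confidence is high or medium.
  if (riskLevel === "low" && confidenceLevel !== "low") return "safe";
  // Medium risk never resolves to "safe", even at high confidence.
  if (riskLevel === "medium" && confidenceLevel === "high") return "humanreview";
  // High risk is unsafe regardless of confidence.
  if (riskLevel === "high") return "unsafe";
  // Remaining pairs: low/low, medium/medium, medium/low.
  return "humanreview";
}
```

Because the final branch is unconditional, the function is total over all nine input pairs, which a table-driven test can confirm by enumerating them.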
+ +### Decision Matrix + +The single function shared by Stages 1–2 to derive `remediationSafetyLevel` from `riskLevel` and `confidenceLevel`, taken verbatim from `clarification-algorithm.md` §5 (see `research.md` §3 for how each stage produces its `riskLevel`/`confidenceLevel` inputs): + +``` +If riskLevel = low and confidenceLevel in {high, medium}: remediationSafetyLevel = safe +Else if riskLevel = medium and confidenceLevel = high: remediationSafetyLevel = humanreview +Else if riskLevel = high: remediationSafetyLevel = unsafe +Else: remediationSafetyLevel = humanreview +``` + +This table is total over the 3×3 `(riskLevel, confidenceLevel)` space — every combination not explicitly listed (`low`/`low`, `medium`/`medium`, `medium`/`low`) falls into the final `Else` and resolves to `humanreview`, so there is no input pair this function leaves unresolved. + +## RuleFingerprint + +A stable identifier for "this exact rule definition" (FR-014), used as part of the lookup/storage key for persisted entries. Deliberately scoped to one rule, not the whole ruleset — see `research.md` §8 for why a whole-ruleset hash was rejected (it would invalidate an entire shared analysis file on any single rule edit). + +| Field | Type | Description | +|---|---|---| +| `value` | string | Hash over one rule's own content: `ruleId`, `given`, `then.function`, `severity`, `description`. | + +**Relationships**: Computed independently per `ruleId`; never derived from `rulesetPath`/`rulesetUrl`. Reuse of a stored entry on fingerprint mismatch depends on `AssessmentOrigin` (per direct user feedback): +- `assessedBy: "automated"` entry, fingerprint mismatch → treated as not found; skipped per-rule (not per-ruleset), falls through to Stages 1–2 (spec.md Edge Cases). 
+- `assessedBy: "human"` entry, fingerprint mismatch → still reused as-is (the stored `remediationSafetyLevel`/`riskLevel`/`confidenceLevel` are returned unchanged), but the returned `RuleAnalysis.staleFingerprintWarning` is populated with both the stored and current fingerprint values, so callers can detect and surface that the rule changed since a human reviewed it without the system second-guessing that review. +- Either origin, fingerprint match → reused with `staleFingerprintWarning: null`. + +## RulesetAnalysis + +| Field | Type | Description | +|---|---|---| +| `rulesetSource` | `"default" \| "custom"` | Mirrors `GradeResult.rulesetSource`. | +| `rulesetPath` | string (optional) | Present when `rulesetSource === "custom"`. | +| `rules` | `RuleAnalysis[]` | One entry per rule, see above. May be assembled from a mix of `source` values — some rules from Stage 0 (persisted/shared/bundled), the rest from Stages 1–2. | + +**Relationships**: Computed once per distinct rule definition (keyed by `RuleFingerprint`, see `SharedRulesetAnalysis`/`PersonalRulesetAnalysisOverride` below), not merely cached for the lifetime of one process — this corrects the original design, which assumed no cross-invocation persistence (see `research.md` §8, added after reassessment against `clarification-algorithm.md` and further revised after direct user input on the sharing requirement). `GradeEngine` (or a caller wrapping it) holds the `RulesetAnalysis` alongside the loaded ruleset for the duration of a single run and consults it when building remediation-safety output, rather than recomputing per violation; across separate runs — and across different users pointed at the same ruleset — the persisted/shared/bundled layer (Stage 0) is what avoids recomputing per-rule classification for rules it covers. 
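The per-rule reuse rules listed under `RuleFingerprint` can be illustrated with a small pure function. `StoredEntry`, `ReuseDecision`, and `decideReuse` are hypothetical names introduced for this sketch, and the warning message text is a placeholder; only the `assessedBy` values and the `staleFingerprintWarning` shape come from this document.

```typescript
type AssessmentOrigin = "human" | "automated";

// Hypothetical, reduced view of a stored Stage 0 entry.
interface StoredEntry {
  assessedBy: AssessmentOrigin;
  storedFingerprint: string;
}

interface StaleFingerprintWarning {
  storedFingerprint: string;
  currentFingerprint: string;
  message: string;
}

type ReuseDecision =
  | { reuse: true; staleFingerprintWarning: StaleFingerprintWarning | null }
  | { reuse: false };

function decideReuse(entry: StoredEntry, currentFingerprint: string): ReuseDecision {
  // Either origin, fingerprint match: reuse with no warning.
  if (entry.storedFingerprint === currentFingerprint) {
    return { reuse: true, staleFingerprintWarning: null };
  }
  // Human-assessed entry on mismatch: still reused, but flagged.
  if (entry.assessedBy === "human") {
    return {
      reuse: true,
      staleFingerprintWarning: {
        storedFingerprint: entry.storedFingerprint,
        currentFingerprint,
        message: "Rule definition changed since it was human-assessed.", // placeholder text
      },
    };
  }
  // Automated entry on mismatch: treated as not found for this rule only.
  return { reuse: false };
}
```

A `reuse: false` result would send that single rule on to the next store or to Stages 1–2; other rules in the same ruleset are unaffected.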
+ +## SharedRulesetAnalysis (new) + +A partial or full `RulesetAnalysis`, **colocated with the ruleset itself** (FR-016/FR-017) so it can be reloaded automatically on future runs against the same ruleset — by anyone who can read that ruleset, not just the user who created it. + +| Field | Type | Description | +|---|---|---| +| `location` | string | Derived deterministically from the ruleset's own path/URL via a fixed naming convention (e.g. appending a suffix to the ruleset's filename) — never a separately-tracked or registered location. | +| `rules` | `Record` | Keyed by `ruleId`. May cover all or only some of a ruleset's rules — uncovered rules are simply absent from the map. Each entry carries the `RuleFingerprint.value` it was captured against, for staleness detection, and an `assessedBy` value (see `AssessmentOrigin`) that determines whether a later fingerprint mismatch invalidates the entry or merely warns. | + +**Validation rules**: every `RuleAnalysis` value in `rules` MUST have `source: "persisted"`. `assessedBy` is typically `"human"` here — writing to this file is the act of a user persisting a correction (FR-013) — but is not constrained to `"human"` by the shape itself, since a future automated caching write would use the same record shape with `assessedBy: "automated"`. For a local ruleset this file lives on disk next to the ruleset and is read/written directly; for a GitHub-hosted ruleset it is *read* via the same `resolveRuleset`/`fetchRulesetContent` flow already used to fetch the ruleset (FR-017), but is never *written* automatically (FR-019) — see `PersonalRulesetAnalysisOverride` for what happens when a write is requested against a non-writable location. + +## PersonalRulesetAnalysisOverride (new, replaces the original PersistedRulesetAnalysis) + +A user-local correction (FR-018) that does not modify `SharedRulesetAnalysis`. 
Reuses the existing workspace/global config-file scope already established for `RulesetConfig` (`packages/api-grade-core/src/config/ruleset-config.ts`), narrowed to this role rather than serving as the primary persistence mechanism. + +| Field | Type | Description | +|---|---|---| +| `scope` | `"workspace" \| "global"` | Storage scope, reusing the precedence already established by `RulesetScope`/`RulesetResolution` for ruleset *selection* (workspace checked before global). | +| `rules` | `Record` | Keyed by `ruleId`, same shape as `SharedRulesetAnalysis.rules`. | + +**Validation rules**: every `RuleAnalysis` value in `rules` MUST have `source: "persisted"`. This is also the write target when a correction is requested against a ruleset whose location is not writable (e.g. GitHub-hosted) — see Stage 4 of the algorithm spec for the exact fallback behavior. + +## BundledRulesetAnalysis (new) + +The built-in ruleset's pre-calculated analysis, shipped with the package (FR-012's "at a minimum the default ruleset" baseline). Same shape as `RulesetAnalysis`, committed alongside the package source and not regenerated at runtime. Every entry has `source: "bundled-default"`, but `assessedBy` varies per entry: well-known built-in rules (the ones a hard-coded curated table used to cover, before being folded into this mechanism per direct user feedback — see `research.md` §3/§8) are authored directly by a maintainer and stored as `assessedBy: "human"`; the remainder are generated once at release time by running Stages 1–2 over the built-in ruleset and stored as `assessedBy: "automated"`. There is no separate hard-coded table for the human-authored entries — they are ordinary `BundledRulesetAnalysis` records, edited the same way a maintainer would edit any other persisted analysis file. 
+ +## Lookup precedence (Stage 0) + +For a given `ruleId`, checked in order: workspace-scoped `PersonalRulesetAnalysisOverride` → global-scoped `PersonalRulesetAnalysisOverride` → `SharedRulesetAnalysis` colocated with the ruleset → `BundledRulesetAnalysis` (only if this is the built-in ruleset) → fall through to Stages 1–2 of the algorithm. Personal overrides are checked first because they represent the most specific, most recently expressed intent for that user. + +A store entry is used "until one matches a current `RuleFingerprint`" only for `assessedBy: "automated"` entries — an `assessedBy: "human"` entry is used as soon as it is found, fingerprint match or not (with `staleFingerprintWarning` populated on mismatch, per `RuleFingerprint` above). This means an earlier-precedence human entry always wins over a later-precedence store, even across a fingerprint mismatch; only when an entry is `"automated"` and its fingerprint is stale does the lookup continue to the next store in precedence order, exactly as before this revision. + +## RemediationItem (was `QuickFix`) + +| Field | Type | Description | +|---|---|---| +| `ruleId` | string | Unchanged from today's `QuickFix.ruleId`. | +| `message` | string | Unchanged. | +| `severity` | `DiagnosticSeverity` (`"error"` \| `"warn"` \| `"info"` \| `"hint"`) | The violation's actual severity, carried over unchanged from `Diagnostic.severity`. **Regression note**: an earlier implementation derived this from a numeric-severity assumption (`typeof diagnostic.severity === 'number'`) that no longer held once `Diagnostic.severity` became a string enum, so every item silently reported `"warn"` regardless of true severity. `buildRemediationItem()` MUST assign `severity: diagnostic.severity` directly — never re-derive it via a numeric lookup table. | +| `path` | string[] | Unchanged. | +| `location` | string | Unchanged. 
| +| `range` | `Diagnostic['range']` | **Restored** — carried over unchanged from `Diagnostic.range` (line/character start/end). Without it, a `RemediationItem` cannot be located in the source file by line number, which defeats the "actionable" requirement (FR-008/Principle VI) once a violation has been filtered out of the regular diagnostics list. `formatRemediationSafetyHuman()` MUST render it (`Line N`) the same way `formatHuman()` does for plain diagnostics. | +| `currentValue` | string \| null | Unchanged. | +| `expectedImprovement` | string | Unchanged. | +| `riskLevel` | `RiskLevel` \| `null` | **New** — the violation's rule-level estimated risk (`low`/`medium`/`high`), looked up from the rule's `RuleAnalysis`. `null` when the lookup hit a Stage 0 entry that has no `riskLevel` of its own (see `RuleAnalysis`). | +| `confidenceLevel` | `ConfidenceLevel` | **New** — confidence behind `riskLevel`, from the same lookup. | +| `remediationSafetyLevel` | `RemediationSafetyLevel` | **New** — a field in its own right, distinct from `riskLevel` both in name and in type/values (`safe`/`humanreview`/`unsafe`, not `low`/`medium`/`high`). The violation's computed remediation safety, looked up from the rule's `RuleAnalysis.remediationSafetyLevel`. This is the field `--remediation-safety`/`level` filtering matches against. | +| `staleFingerprintWarning` | `{ storedFingerprint: string; currentFingerprint: string; message: string }` \| `null` | **New** — carried over verbatim from the rule's `RuleAnalysis.staleFingerprintWarning`, so a CI pipeline or human reading per-violation output sees the same "this rule changed since a human reviewed it" warning without needing to separately inspect the ruleset analysis. | + +## DiagnosticWithSafety (regular grade output, additive) + +| Field | Type | Description | +|---|---|---| +| *(all `Diagnostic` fields)* | — | Unchanged — `ruleId`, `message`, `severity`, `path`, `range`, `source`. 
| +| `riskLevel` | `RiskLevel` \| `null` | Looked up via `getRemediationSafety(diagnostic, rulesetAnalysis)`, same as `RemediationItem.riskLevel`. | +| `confidenceLevel` | `ConfidenceLevel` | Same lookup. | +| `remediationSafetyLevel` | `RemediationSafetyLevel` | Same lookup. | +| `staleFingerprintWarning` | same shape as `RemediationItem.staleFingerprintWarning` | Same lookup. | + +`CommonGradeOutput.diagnostics` is `DiagnosticWithSafety[]` whenever the caller supplies a +`RulesetAnalysis` to `buildCommonGradeOutput(result, { top, rulesetAnalysis })` (equivalently, +to `formatJson`/`formatHuman`'s `rulesetAnalysis` parameter), and plain `Diagnostic[]` +otherwise. This is **independent of `--remediation-safety`/`level` filtering** — it is the +mechanism by which a *regular* (unfiltered) grading request also surfaces per-violation +remediation-safety information, so a user does not have to make a second, filtered request +just to learn how risky each finding is to fix. The CLI's default (non-`--remediation-safety`) +`--format json` and `--format human` paths MUST always supply `rulesetAnalysis`, computed via +the same `loadRuleset()`/`analyseRuleset()` call already made for `--remediation-safety`, so +this is not an opt-in flag — it is the default shape of regular grading output going forward. + +## Output formatting contract (all surfaces) + +Every JSON document any tool in this project prints to an end user — CLI (`--format json`, +`ruleset-analysis [correct]`, `config`/`set-ruleset`/`get-ruleset` error and success payloads) +and any future surface reusing these core functions — MUST be pretty-printed +(`JSON.stringify(value, null, 2)`), never minified. This was already true of the main +`formatJson()` grade output; `buildRemediationSafetyOutput()`'s JSON and the +`ruleset-analysis`/`ruleset-analysis correct` JSON output regressed to compact, single-line +JSON when first implemented, which is the specific regression this note exists to prevent. 
+MCP tool responses (`grade-api`, `grade-api-detailed`, `grade-api-remediation-safety`, +`analyse-ruleset-safety`, etc.) are explicitly exempt — their JSON stays compact/minified by +design, for token efficiency in an AI-agent context, not for end-user reading. + +## RemediationSafetyOutput (was `QuickFixOutput`) + +| Field | Type | Description | +|---|---|---| +| `specPath` | string | Unchanged. | +| `format` | `ApiFormat` | Unchanged. | +| `totalViolations` | number | Unchanged. | +| `remediationItemCount` | number | Renamed from `quickFixCount` — count of violations matching the requested `level`. | +| `remediationItems` | `RemediationItem[]` | Renamed from `quickFixes`. | +| `requestedLevel` | `RemediationSafetyLevel` | **New** — echoes the level that was filtered for, since there are now three possible values instead of one implicit one. | + +**State transitions**: `RemediationItem`/`RemediationSafetyOutput` are computed fresh per grading/analysis request and never persisted — only the per-rule `RuleAnalysis` entries behind them (via `SharedRulesetAnalysis`/`PersonalRulesetAnalysisOverride`/`BundledRulesetAnalysis`) are persisted, and only at the granularity of "one rule's classification, keyed by that rule's fingerprint," not as a snapshot of any specific request's output. This corrects the original assumption (carried over from Feature 11's request-scoped data model) that nothing in this feature persists across requests — `clarification-algorithm.md`, and the project's own goal of letting a team share judgements rather than each configuring their own copy, both require the per-rule analysis layer to survive across requests and across users. 
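The output shape and the formatting contract above can be illustrated with a minimal sketch. It assumes the field names from the `RemediationSafetyOutput` table; the helper names (`buildOutputSketch`, `serializeForCli`, `serializeForMcp`) and the trimmed-down item type are hypothetical and are not the project's actual `buildRemediationSafetyOutput()` implementation. It also assumes every violation has already been turned into an item before filtering, so `totalViolations` is simply the unfiltered count.

```typescript
// Sketch only: field names follow the tables above; helper names are hypothetical.
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";

// Reduced stand-in for RemediationItem (the real type carries more fields).
interface RemediationItemSketch {
  ruleId: string;
  message: string;
  remediationSafetyLevel: RemediationSafetyLevel;
}

interface RemediationSafetyOutputSketch {
  specPath: string;
  format: string; // stands in for ApiFormat
  totalViolations: number;
  requestedLevel: RemediationSafetyLevel;
  remediationItemCount: number;
  remediationItems: RemediationItemSketch[];
}

// Filter on remediationSafetyLevel (not riskLevel) and echo the requested level.
function buildOutputSketch(
  specPath: string,
  format: string,
  allItems: RemediationItemSketch[],
  requestedLevel: RemediationSafetyLevel,
): RemediationSafetyOutputSketch {
  const remediationItems = allItems.filter(
    (item) => item.remediationSafetyLevel === requestedLevel,
  );
  return {
    specPath,
    format,
    totalViolations: allItems.length, // assumption: one item per violation
    requestedLevel,
    remediationItemCount: remediationItems.length,
    remediationItems,
  };
}

// CLI surfaces: pretty-printed with two-space indentation.
function serializeForCli(value: unknown): string {
  return JSON.stringify(value, null, 2);
}

// MCP tool responses: compact, for token efficiency.
function serializeForMcp(value: unknown): string {
  return JSON.stringify(value);
}
```

Keeping the two serializers as separate, named functions is one way to make the CLI/MCP distinction explicit at each call site, so a compact-JSON regression on a CLI path would show up as a visibly wrong function name rather than a missing argument.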
+ +## Lookup / default behavior + +`getRemediationSafety(diagnostic, rulesetAnalysis) -> { riskLevel, confidenceLevel, remediationSafetyLevel, staleFingerprintWarning }`: +- If `rulesetAnalysis.rules` contains an entry for `diagnostic.ruleId`, return its `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` verbatim — all four are carried through to `RemediationItem` unchanged. +- Otherwise (FR-009), return `{ riskLevel: "high", confidenceLevel: "low", remediationSafetyLevel: "unsafe", staleFingerprintWarning: null }` — equivalent to a synthetic Stage 2 entry (`assessedBy: "automated"`) run through the Decision Matrix. diff --git a/specs/012-remediation-safety/plan.md b/specs/012-remediation-safety/plan.md new file mode 100644 index 0000000..76e8542 --- /dev/null +++ b/specs/012-remediation-safety/plan.md @@ -0,0 +1,137 @@ +# Implementation Plan: Remediation Safety (Ruleset Analyser & Multi-Level Safety) + +**Branch**: `012-remediation-safety` | **Date**: 2026-06-23 | **Spec**: [spec.md](./spec.md) + +**Input**: Feature specification from `/specs/012-remediation-safety/spec.md` + +**Note**: This template is filled in by the `/speckit-plan` command. See `.specify/templates/plan-template.md` for the execution workflow. + +## Summary + +Build a deterministic, rule-metadata-driven ruleset analyser (`analyseRuleset()`) that assigns every rule in a loaded Spectral ruleset a risk level (`low`/`medium`/`high`) and a confidence level (`high`/`medium`/`low`), deriving a remediation safety level (`safe`/`humanreview`/`unsafe`) from those two signals via the decision matrix in [`automated_remediation_safety_algorithm_spec.md`](../algorithms/automated_remediation_safety_algorithm_spec.md). Extend `--remediation-safety` (CLI) and the `grade-api-remediation-safety` MCP tool's `level` parameter from the single `safe` value (Feature 11) to all three levels, computed via per-violation lookup against the analyser's cached result. 
Add a new CLI subcommand (`ruleset-analysis`) and MCP tool (`analyse-ruleset-safety`) so the analyser's output is inspectable independent of grading a spec. Complete the internal rename Feature 11 deferred: no source, test, or current documentation file may reference "quick fix(es)" in any form afterward (historical `CHANGELOG.md`/`GOAL.md` entries excluded as accurate historical record). + +## Technical Context + +**Language/Version**: TypeScript (Node.js, ES modules), per existing `packages/api-grade-core`, `packages/api-grade-mcp`, and `src/cli` packages + +**Primary Dependencies**: `@stoplight/spectral-rulesets` / `@stoplight/spectral-ruleset-bundler` (already used by `rulesets/loader.ts` to load rule metadata — the analyser reads `LoadedRuleset.ruleset.rules`, no new parsing dependency needed), `commander` (new `ruleset-analysis` CLI subcommand), `zod` (new `analyse-ruleset-safety` MCP tool schema + extended `level` enum), `@modelcontextprotocol/sdk` + +**Storage**: Three-tier persistence for ruleset analysis results — (1) bundled pre-calculated `BundledRulesetAnalysis` shipped with the package for the built-in rulesets (FR-012), (2) shared colocated `SharedRulesetAnalysis` stored alongside the ruleset file/URL for team sharing (FR-016/FR-017), (3) workspace/global `PersonalRulesetAnalysisOverride` for user-local corrections that take precedence without modifying shared data (FR-018). All three are read at Stage 0 of `analyseRuleset()` before heuristic stages run; writes occur only for local/writable rulesets (FR-019). `RulesetAnalysis` is otherwise ephemeral within a process invocation (not cached to disk beyond these stores). + +**Testing**: Vitest (`vitest run`). 
New unit tests for `analyseRuleset()`/`getRemediationSafety()` in `packages/api-grade-core/tests/unit/remediation-safety.test.ts` (replacing `quick-fixes.test.ts`); updated CLI integration test `tests/integration/cli-remediation-safety.test.ts` (replacing `cli-quick-fixes.test.ts`) covering all three levels plus the new `ruleset-analysis` subcommand; updated MCP integration test `packages/api-grade-mcp/tests/integration/remediation-safety.test.ts` (replacing `quick-fixes-only.test.ts`) plus a new test for `analyse-ruleset-safety` + +**Target Platform**: Cross-platform Node.js CLI and MCP server (Windows/macOS), per constitution Principle V + +**Project Type**: CLI + MCP server + core library packages within an existing npm workspace monorepo + +**Performance Goals**: Ruleset analysis is O(rules) and runs once per loaded ruleset per process invocation; per-violation lookup is O(1) (map lookup by `ruleId`) — no measurable change to existing grading throughput + +**Constraints**: Must not regress `--remediation-safety safe` membership (FR-007); must classify 100% of rules in any ruleset, built-in or custom, with no omissions (SC-005); zero new monetary-cost dependencies (constitution Principle V) — the analyser is pure rule-metadata inspection, no external service or model call + +**Scale/Scope**: Touches `packages/api-grade-core` (new `remediation-safety.ts` replacing `quick-fixes.ts`, `types.ts` additions, `index.ts` exports), `packages/api-grade-mcp` (rename + extend existing tool, add new `analyse-ruleset-safety` tool, `server.ts` registration), `src/cli` (extend `--remediation-safety`, add new `ruleset-analysis-cli.ts` subcommand), and documentation (`docs/cli/commands.md`, `docs/mcp/quick-start.md`, `docs/package/api-grade-mcp.md`, `docs/package/README.md`, `docs/package/api-reference.md`, `docs/index.md`, `docs/getting-started.md`, `packages/api-grade-mcp/README.md`, `CONTRIBUTING.md`). 
New algorithm spec document at `specs/algorithms/automated_remediation_safety_algorithm_spec.md` (already authored as part of this planning phase). No Backstage plugin changes — they do not currently surface quick-fix/remediation-safety information. + +## Constitution Check + +*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.* + +| Principle | Gate | Status | +|-----------|------|--------| +| I. Multi-Format API Support | Analyser/remediation-safety must not be scoped to one spec format | PASS — operates on ruleset rule metadata (`ruleId`, `given`), uniform across the OpenAPI and AsyncAPI built-in rulesets and custom rulesets; no format-specific branching | +| II. Core-First Architecture | CLI and MCP must consume shared core logic, not duplicate classification | PASS — `analyseRuleset()`/`getRemediationSafety()` live in `@dawmatt/api-grade-core`; CLI and MCP both call the same core functions, mirroring how `buildQuickFixOutput` is shared today | +| III. Spectral-Ruleset Based Grading | Must not alter scoring/diagnostic generation; custom rulesets must remain supported | PASS — analyser is a separate, additive computation; no change to `grader.ts`/`scorer.ts`; works against any Spectral-compatible ruleset (built-in or custom), including ones sourced via GitHub PAT (existing `resolveRuleset`/`fetchRulesetContent` flow, unchanged) | +| IV. Test-Driven Quality | New algorithm and renamed surfaces need test coverage written alongside implementation | PASS — plan specifies new/renamed unit and integration tests covering all three levels, the analyser's total-coverage guarantee (SC-005), and the fallback/lookup-miss default (FR-009) | +| V. Cross-Platform & Zero-Cost Prerequisites | No new paid dependencies or platform-specific behavior | PASS — reuses existing `@stoplight/spectral-rulesets`/`commander`/`zod`; analyser is pure in-process logic, no external service | +| VI. 
Educational Excellence | Diagnostic-adjacent output should explain *why*, not just *what* | PASS — every `RuleAnalysis` carries a `rationale` field explaining the classification (FR-003), satisfying the "actionable, explained" principle for this new diagnostic surface too | + +No violations — Complexity Tracking section is not needed. + +## Project Structure + +### Documentation (this feature) + +```text +specs/012-remediation-safety/ +├── plan.md # This file (/speckit-plan command output) +├── research.md # Phase 0 output (/speckit-plan command) +├── data-model.md # Phase 1 output (/speckit-plan command) +├── quickstart.md # Phase 1 output (/speckit-plan command) +├── contracts/ # Phase 1 output (/speckit-plan command) +└── tasks.md # Phase 2 output (/speckit-tasks command - NOT created by /speckit-plan) + +specs/algorithms/ +└── automated_remediation_safety_algorithm_spec.md # New domain algorithm spec (FR-004), authored in this planning phase +``` + +### Source Code (repository root) + +```text +packages/api-grade-core/src/ +├── remediation-safety.ts # NEW — replaces quick-fixes.ts: analyseRuleset(), getRemediationSafety(), buildRemediationItem(), buildRemediationSafetyOutput(), formatRemediationSafetyHuman(); buildRemediationItem() carries severity/range over from Diagnostic unchanged (FR-022) +├── rulesets/loader.ts # unchanged — analyser consumes its LoadedRuleset.ruleset.rules +├── types.ts # add RemediationSafetyLevel, ConfidenceLevel, RuleAnalysis, RulesetAnalysis, RemediationItem (incl. 
range), RemediationSafetyOutput, DiagnosticWithSafety; remove ViolationClass, QuickFix, QuickFixOutput +├── json-output.ts # buildCommonGradeOutput() accepts options.rulesetAnalysis; decorates diagnostics via getRemediationSafety() when supplied (FR-024) +├── formatter.ts # formatJson()/formatHuman() accept an optional rulesetAnalysis param, threaded into buildCommonGradeOutput()/per-diagnostic safety annotation; JSON output remains pretty-printed (FR-023) +└── index.ts # export new remediation-safety.ts symbols/types in place of quick-fixes.ts ones + +packages/api-grade-core/tests/unit/ +├── remediation-safety.test.ts # replaces quick-fixes.test.ts; adds analyseRuleset()/getRemediationSafety() coverage for all 3 levels + confidence + SC-005 total-coverage check +├── json-output.test.ts # buildCommonGradeOutput() with/without rulesetAnalysis +└── formatter.test.ts # formatJson()/formatHuman() with/without rulesetAnalysis + +src/cli/ +├── index.ts # extend --remediation-safety to accept safe|humanreview|unsafe; call renamed core functions; always compute rulesetAnalysis and pass to formatJson()/formatHuman() on the regular (non-filtered) path too (FR-024); all printed JSON pretty-printed (FR-023) +├── ruleset-analysis-cli.ts # NEW — `ruleset-analysis` subcommand (mirrors ruleset-config-cli.ts pattern); JSON output pretty-printed +└── ruleset-config-cli.ts # JSON output pretty-printed for consistency (FR-023) + +tests/integration/ +├── cli-remediation-safety.test.ts # replaces cli-quick-fixes.test.ts; covers all 3 levels + ruleset-analysis subcommand +└── cli-json-output.test.ts # updated to parse multiple back-to-back pretty-printed JSON documents from stdout (brace-depth split) instead of one compact JSON object per line + +packages/api-grade-mcp/src/ +├── server.ts # register renamed tool + new analyse-ruleset-safety tool +└── tools/ + ├── remediation-safety.ts # renamed from quick-fixes-only.ts; level enum extended to 3 values + └── analyse-ruleset-safety.ts # 
NEW — exposes analyseRuleset() independent of grading + +packages/api-grade-mcp/tests/integration/ +├── remediation-safety.test.ts # renamed from quick-fixes-only.test.ts; covers all 3 levels +└── analyse-ruleset-safety.test.ts # NEW + +packages/api-grade-mcp/src/utils/classify.ts # update re-exports to new core names + +docs/ +├── cli/commands.md # --remediation-safety 3-level reference + ruleset-analysis subcommand +├── mcp/quick-start.md # renamed/extended tool + new analyse-ruleset-safety tool +├── package/api-grade-mcp.md # tool reference updates +├── package/README.md # remove remaining "quick fix" mentions +├── package/api-reference.md # core API reference: new types/functions +├── index.md # remove remaining "quick fix" mentions +└── getting-started.md # tool list mention update + +packages/api-grade-mcp/README.md # tool table update +CONTRIBUTING.md # package/tool table correction (still names pre-Feature-11 tool) +``` + +**Structure Decision**: Single-project monorepo (existing `src/cli` + `packages/*` workspaces), unchanged from Feature 11. The analyser and remediation-safety calculation live entirely in `@dawmatt/api-grade-core` (Core-First Architecture, Principle II); CLI and MCP packages each add one new thin surface (a subcommand, a tool) that calls the shared core functions, matching the existing `ruleset-config`/`get-ruleset-config`/`set-ruleset-config` pattern rather than introducing a new architectural layer. + +## Complexity Tracking + +> **Fill ONLY if Constitution Check has violations that must be justified** + +No violations — section not applicable. + +## Post-implementation Corrections + +Three issues were discovered during the Phase 6 manual walkthrough (T040) and corrected in subsequent phases. They are documented here so the plan remains an accurate record of what was built. 
+ +### Phase 7 — Output-shape regression fix + +`buildRemediationItem()` inherited a stale numeric-severity assumption from before `Diagnostic.severity` became a string enum, causing every `RemediationItem` to report `severity: "warn"` regardless of actual severity. `range` was also dropped from `RemediationItem`. Separately, `--remediation-safety` and `ruleset-analysis` JSON output regressed to compact (single-line) formatting, and regular (unfiltered) grading output never surfaced per-violation safety signals (FR-024/SC-012). Corrections: assign `severity: diagnostic.severity` directly; add `range` to `RemediationItem`; pretty-print all CLI JSON via `JSON.stringify(value, null, 2)`; decorate `CommonGradeOutput.diagnostics` with safety signals on the unfiltered path (new `DiagnosticWithSafety` type, `buildCommonGradeOutput()` / `formatJson()` / `formatHuman()` accept optional `rulesetAnalysis`). + +### Phase 8 — Heuristic correctness: `then.field "@key"` on paths/channels + +Stage 1a of the heuristic only recognised the JSONPath `~` key-selector form (e.g. `$.channels[*]~`) as targeting path/channel keys. Spectral's built-in rulesets also express the same semantics via `then.field: "@key"` on `given: "$.channels"` or `given: "$.paths"`. Without this check, those rules fell into Stage 1b's `pattern`/`casing` default and received `medium/high` risk (humanreview) instead of the correct `high/high` (unsafe) — affecting 6 AsyncAPI channel rules and 3 OpenAPI path-key rules. Correction: added `fieldNamesOf()` helper and a `then.field: "@key"` branch to `stage1a()`; regenerated bundled analysis; updated `automated_remediation_safety_algorithm_spec.md` Stage 2a. + +### Phase 9 — Heuristic correctness: `pattern notMatch`-only is an existence check + +Stage 1b classified all `pattern` function uses as "rename/reformat", defaulting to `medium` risk. 
`pattern` with `notMatch`-only in `functionOptions` is semantically an existence/validity check (closer to `falsy`/`truthy`), producing incorrect rationale text ("rename/reformat" for emptiness checks) and mis-classifying custom `pattern`+`notMatch` rules on `SAFE_SEGMENTS` targets as `medium` instead of `low`. Correction: added `isPatternExistenceCheck()` helper; `stage1b()` applies additive-style tier escalation when the flag is set; rationale text updated in bundled analysis; no risk-level changes to built-in rulesets (tier lookup dominates). diff --git a/specs/012-remediation-safety/quickstart.md b/specs/012-remediation-safety/quickstart.md new file mode 100644 index 0000000..2ed59c6 --- /dev/null +++ b/specs/012-remediation-safety/quickstart.md @@ -0,0 +1,68 @@ +# Quickstart: Remediation Safety (Ruleset Analyser & Multi-Level Safety) + +## 1. CLI: filter by any of the three levels + +```bash +api-grade openapi.yaml --remediation-safety safe # unchanged behavior from Feature 11 +api-grade openapi.yaml --remediation-safety humanreview # new +api-grade openapi.yaml --remediation-safety unsafe # new +``` + +Each returned item now includes `riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, and `staleFingerprintWarning` (usually `null`). `riskLevel` is `low`/`medium`/`high`; `remediationSafetyLevel` is `safe`/`humanreview`/`unsafe` and is what `--remediation-safety`/`requestedLevel` filters against: + +```json +{ + "specPath": "openapi.yaml", + "format": "openapi-3", + "totalViolations": 12, + "requestedLevel": "humanreview", + "remediationItemCount": 2, + "remediationItems": [ + { + "ruleId": "operation-operationId", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "staleFingerprintWarning": null, + "...": "..." + } + ] +} +``` + +## 2. 
CLI: inspect a ruleset's remediation risk without grading a spec + +```bash +api-grade ruleset-analysis --format human +# rule id risk level confidence remediation safety assessed by rationale +# operation-description low high safe automated `truthy` function (additive — add/populate a field) on a target matching the low tier +# operation-operationId medium high humanreview automated `truthy` function (additive — add/populate a field) on a target matching the medium tier +# oas3-schema high low unsafe automated custom function `oasDocumentSchema` — mechanics cannot be inferred statically +# custom-team-rule-007 low high safe human WARNING: fingerprint mismatch (stored a1b2c3..., current d4e5f6...) — rule changed since this was last reviewed; persisted classification still honored + +api-grade ruleset-analysis --ruleset-path ./my-ruleset.yaml --format json +``` + +The built-in ruleset's bundled entries (FR-012/FR-020) are pre-computed by running the Stage 1/2 heuristic over every rule once at build time — `assessedBy: "automated"` — not a maintainer's reviewed judgement; this only avoids recomputing the heuristic per request (SC-007), it is not a substitute for review. A maintainer who actually reviews a rule and runs `ruleset-analysis correct` on it (the same mechanism a user would use to persist a correction for their own ruleset, FR-013) produces a genuine `assessedBy: "human"` entry that overrides the bundled default. The last row illustrates FR-021: a human-assessed entry whose rule definition has since changed is still honored, but flagged with both the stored and current fingerprint rather than silently discarded. + +## 3. 
MCP: same filtering, plus a dedicated ruleset-analysis tool + +```text +Tool: grade-api-remediation-safety +Input: { "specPath": "/workspace/my-api/openapi.yaml", "level": "humanreview" } + +Tool: analyse-ruleset-safety +Input: { "rulesetPath": "/workspace/my-ruleset.yaml" } +Output: { "rulesetSource": "custom", "rulesetPath": "...", "rules": [ ... ] } +``` + +## 4. Verify the "quick fixes" cleanup is complete + +```bash +grep -rniE "quick.?fix" --include="*.ts" --include="*.md" \ + src/ packages/api-grade-core/src packages/api-grade-mcp/src \ + packages/api-grade-core/tests packages/api-grade-mcp/tests tests/ \ + docs/ packages/api-grade-mcp/README.md CONTRIBUTING.md +``` + +This should return zero matches (SC-003). `CHANGELOG.md` and `GOAL.md` historical entries describing what shipped in past releases are intentionally excluded — they are an accurate record of the past, not current documentation. diff --git a/specs/012-remediation-safety/research.md b/specs/012-remediation-safety/research.md new file mode 100644 index 0000000..4535c96 --- /dev/null +++ b/specs/012-remediation-safety/research.md @@ -0,0 +1,121 @@ +# Research: Remediation Safety (Ruleset Analyser & Multi-Level Safety) + +## 1. Rule-level vs. violation-level classification + +**Decision**: The ruleset analyser classifies at the **rule** level (one risk + confidence pair per `ruleId`), computed once per loaded ruleset and cached. Remediation safety for a specific violation is a lookup against this cache by `ruleId`, not a fresh per-instance computation. + +**Reassessed against `clarification-algorithm.md`**: confirmed and unchanged. The clarification document's "Recommended High Level Approach" frames the problem the same way — analyse the ruleset once, "for every rule," and reuse that output — and explicitly motivates this with the same performance argument given here ("avoiding the need to estimate ruleset safety on every run"). 
One addition: the clarification document expects this cache to outlive a single process/grading run, and (per direct user input, see §8) to be shared across an entire team, not just across one user's invocations — "computed once per loaded ruleset" should be read as "computed once per rule definition, persisted colocated with the ruleset, and reused by anyone who reads that ruleset," not merely cached for the lifetime of one process or one user's machine. + +**Rationale**: FR-001 and the spec's Key Entities explicitly scope the analyser to "for each rule". This is also what makes FR-011 possible (inspecting ruleset risk independent of grading any specific spec) and keeps the calculation O(1) per violation at grading time instead of re-running heuristics per occurrence. + +**Alternatives considered**: Per-violation classification (today's `classifyViolation`, keyed on the diagnostic's instance `path`) gives finer granularity for generic rules (e.g. `oas3-schema`, which fires on both breaking and cosmetic mismatches) but cannot satisfy FR-011 (no spec to derive an instance path from) and isn't "ruleset analysis" — it's per-result analysis. Rejected in favor of rule-level, accepting the coarser-granularity tradeoff called out in the spec's Edge Cases (a rule spanning levels gets one conservative classification, flagged with reduced confidence — see §3). + +## 2. Confidence scale + +**Decision**: Three discrete levels — `high`, `medium`, `low` — mirroring the project's existing preference for small, explainable categories over numeric scores (same pattern as `ImpactLevel`, `DiagnosticSeverityLevel`). The `riskLevel` field (§3) uses the same three-level shape (`low`/`medium`/`high`) for consistency, since both are inputs to the same decision matrix and a mismatched scale (e.g. risk on a 5-point scale, confidence on 3) would have no principled justification. 
+ +**Rationale**: Constitution Principle VI favors explanation over raw scores; a numeric 0–1 confidence would need its own thresholds restated everywhere it's displayed, with no added value for a binary "trust this / verify this" decision a user actually makes. `confidenceLevel` is not merely descriptive metadata — per §3's decision matrix it is one of two inputs that determine `remediationSafetyLevel`, so its discreteness also keeps the matrix a small, exhaustively-enumerable table rather than a continuous function needing its own threshold tuning. + +**Alternatives considered**: Continuous 0–100 confidence score — rejected as over-precise for a heuristic, rule-metadata-only analyser, and inconsistent with how grades/impact are presented elsewhere in the project. + +## 3. Three-tier risk classification algorithm + +**Decision**: Extend the existing two-stage quick-fixes algorithm (`specs/algorithms/quick_fixes_algorithm_spec.md`) from two outcome classes (`nonBreaking`/`breaking`, plus `unknown`) to a model with three **independent** signals per rule — `riskLevel` (`low`/`medium`/`high`), `confidenceLevel` (`high`/`medium`/`low`), and `remediationSafetyLevel` (`safe`/`humanreview`/`unsafe`) **derived from the other two via a fixed decision matrix, as a field in its own right** — operating on **rule metadata** (`ruleId`, the rule's `given` JSONPath expression(s), `then.function`) rather than a violation's instance path: + +- **Stage 0 — persisted/bundled lookup** (confidence: as stored; `assessedBy` as stored — see §8): checked first, before any automated computation. There is no separate hard-coded "curated rule-id table" stage. The classifications that used to live in such a table (e.g. 
`operation-description` is `safe`, `operation-operationId` is `humanreview`) are, for the built-in ruleset, simply entries in `BundledRulesetAnalysis` (§8) with `assessedBy: "human"` — the same persisted-analysis mechanism used for any user-supplied ruleset, not a separate code path or data shape. This removes the duplication of having one mechanism for "rules we know about" and a different one for "rules a user has told us about," and means a maintainer correcting a built-in rule's classification edits the same kind of record a user would. Anything not covered by Stage 0 falls through to Stage 1. +- **Stage 1 — automated estimation from rule mechanics + contract-surface ontology** (confidence: `high`/`medium`/`low`, see below), checked in order: + - **1a. Key-selector check**: a `given` expression that selects object *keys* (the JSONPath Plus `~` modifier) under `paths` or `channels` → `riskLevel: high`, confidence `high`. Renaming a path or channel key is a public-surface rename by construction (clarification document Example B), independent of which function is applied. + - **1b. Function-mechanics classification of `then.function`** — infers the *likely minimal edit*, per the clarification document's "infer the minimal satisfying edit" step, then estimates risk from where that edit lands on the contract-surface ontology (§"Build a format-aware contract-surface ontology"): + - *Additive* functions (`truthy`, `defined`, `field`+`truthy` on a sub-field) imply "add/populate a field". Base `riskLevel: low`; escalated to `medium` if the targeted field is itself in `HUMANREVIEW_SEGMENTS`, or to `high` if in `UNSAFE_SEGMENTS` (e.g. `truthy` on a `parameters` entry is a different risk than `truthy` on `$.info.description`). + - *Rename/reformat* functions (`pattern`, `casing`) imply "rename or reformat the targeted value". 
Base `riskLevel: medium`; escalated to `high` when the target is a high-impact ontology area (path/channel keys, parameters, security, request/response schemas); de-escalated to `low` only for low-impact metadata (description, contact, licence). + - Confidence for both: `high` when the function+target combination matches exactly one ontology tier unambiguously; `medium` when it matches but spans tiers; `low` when the function is unrecognized. + - *Custom JavaScript* functions: mechanics cannot be inferred statically (clarification document, "If the rule uses a custom JavaScript function"). `riskLevel: high`, confidence `low` — conservative by construction, per the constraint that custom functions "are arbitrary JavaScript functions" whose remediation intent cannot be derived from the declaration alone. + - **1c. Generic segment fallback within Stage 1**: for a rule whose function isn't recognized as additive/rename/custom but whose `given` still matches a known segment tier, `riskLevel` follows the matched tier (`UNSAFE_SEGMENTS` → `high`, `HUMANREVIEW_SEGMENTS` → `medium`, `SAFE_SEGMENTS` → `low`), confidence `medium` for a single unambiguous tier match, downgraded to `low` if the `given` matches segments from more than one tier (genuine ambiguity). + - `UNSAFE_SEGMENTS`/`HUMANREVIEW_SEGMENTS`/`SAFE_SEGMENTS` are the same three tiers as before, extended for AsyncAPI (see the corrected paragraph below). +- **Stage 2 — fallback** (confidence: `low`): no rule-id, function, or path signal recognized at all (e.g. a whole-document rule like `given: "$"`) → `riskLevel: high`. Conservative-by-default, matching the existing project philosophy ("absence of a safety signal is never treated as evidence of safety"); the decision matrix below resolves `high` risk to `unsafe` regardless of confidence, so this is equivalent to today's hard-coded fallback. Always `assessedBy: "automated"` — there is no human judgement behind a Stage 2 entry. 
+- **Decision matrix** (applied uniformly to the output of Stages 1–2 to produce `remediationSafetyLevel`, taken verbatim from `clarification-algorithm.md` §5; Stage 0 entries store `remediationSafetyLevel` directly and never pass through this matrix): + ``` + If riskLevel = low and confidenceLevel in {high, medium}: remediationSafetyLevel = safe + Else if riskLevel = medium and confidenceLevel = high: remediationSafetyLevel = humanreview + Else if riskLevel = high: remediationSafetyLevel = unsafe + Else: remediationSafetyLevel = humanreview + ``` + This table is total over the 3×3 input space (every other combination — e.g. `low`/`low`, `medium`/`medium`, `medium`/`low` — falls into the `Else` branch and resolves to `humanreview`), so no additional default is needed beyond Stage 2's `high`/`low` fallback. + +**Rationale**: Reuses a proven, explainable, deterministic pattern already accepted by the project (and by users reading `quick_fixes_algorithm_spec.md`) rather than inventing a new paradigm; running every automated stage's output through one shared decision-matrix function (rather than having a curated table assign `remediationSafetyLevel` directly) keeps `riskLevel` and `confidenceLevel` as genuinely independent signals end-to-end, satisfying FR-002/FR-003 without a separate code path per stage, and keeps `remediationSafetyLevel` a field in its own right rather than a relabeling of `riskLevel`. Folding the former curated table into Stage 0/`BundledRulesetAnalysis` (per direct user feedback) additionally means the built-in ruleset's known-good classifications benefit from the same fingerprint-staleness and human-override handling (§8) as any other persisted entry, instead of being a hard-coded table that silently drifts out of sync with the built-in ruleset's actual rule definitions. 
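The decision matrix described above is small enough to express as one pure function. The following is a minimal sketch, assuming the three string-literal scales named in this document; the function name `deriveRemediationSafetyLevel` is hypothetical and is not taken from the codebase. The branches are written in the same order as the matrix so the two can be compared line by line.

```typescript
// Sketch only: the function name is hypothetical; level names follow this document.
type RiskLevel = "low" | "medium" | "high";
type ConfidenceLevel = "high" | "medium" | "low";
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";

// Applies the four-branch decision matrix to the output of Stages 1-2.
// Stage 0 entries store remediationSafetyLevel directly and would bypass this.
function deriveRemediationSafetyLevel(
  riskLevel: RiskLevel,
  confidenceLevel: ConfidenceLevel,
): RemediationSafetyLevel {
  if (riskLevel === "low" && (confidenceLevel === "high" || confidenceLevel === "medium")) {
    return "safe";
  }
  if (riskLevel === "medium" && confidenceLevel === "high") {
    return "humanreview";
  }
  if (riskLevel === "high") {
    return "unsafe";
  }
  // Remaining combinations: low/low, medium/medium, medium/low.
  return "humanreview";
}

// Enumerate the full 3x3 input space to show the function is total.
const riskLevels: RiskLevel[] = ["low", "medium", "high"];
const confidenceLevels: ConfidenceLevel[] = ["high", "medium", "low"];
for (const risk of riskLevels) {
  for (const confidence of confidenceLevels) {
    console.log(`${risk}/${confidence} -> ${deriveRemediationSafetyLevel(risk, confidence)}`);
  }
}
```

Because both inputs are closed unions of three values each, an exhaustive table-driven test over all nine combinations is cheap to write, which is one practical benefit of keeping both scales discrete.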
+ +**Alternatives considered**: Statistical/ML classification over rule descriptions — rejected: nondeterministic, costly, and violates Constitution Principle V (zero-cost prerequisites) if it requires an external model; also harder to explain ("rationale" requirement, FR-003) than a deterministic rule table. Having a curated rule-id table assign `remediationSafetyLevel` directly as its own hard-coded code-level stage (the original design, "Stage 1") — rejected after reassessment below: besides conflating risk and safety level, it duplicated the persisted-analysis mechanism (§8) with a second, parallel "known rules" concept that had its own staleness story (none) instead of reusing fingerprinting. + +**Reassessed against `clarification-algorithm.md` — gap found and corrected**: the prior segment tables (`UNSAFE_SEGMENTS`/`HUMANREVIEW_SEGMENTS`/`SAFE_SEGMENTS`) were carried over unchanged from the OpenAPI-only quick-fixes algorithm and contained no AsyncAPI-specific terms, despite Constitution Principle I requiring format-neutral treatment and the clarification document dedicating an explicit section ("Build a format-aware contract-surface ontology") to AsyncAPI's high-impact surfaces — channel `address`, channel parameters, operation `action`, the operation-channel relationship, and messages/payload schemas. The clarification document's worked example (Example B: a `pattern` rule on `$.paths[*]~`, the object-*key* selector) is also not caught by plain segment-membership matching, since `paths`/`channels` were never in any tier as bare segments — adding them as ordinary segments would over-match every rule that merely reads something nested under a path/channel (including safe ones like `operation-description`). 
Two corrections folded into the updated algorithm spec: (a) extend `UNSAFE_SEGMENTS` with AsyncAPI's high-impact segment terms (`address`, `action`, `messages`, `payload`) alongside the existing OpenAPI ones, and add `channels`/`operations`/`reply` to `HUMANREVIEW_SEGMENTS` as broader/ambiguous AsyncAPI surfaces; (b) add a dedicated **key-selector check** ahead of generic segment matching (Stage 1a above), since renaming a path or channel key is a public-surface rename by construction, matching the clarification document's Example B directly. + +**Reassessed a third time, per direct user feedback — curated table folded into Stage 0**: the former "Stage 1 — curated rule-id tables" has been removed as a distinct, hard-coded stage. Its content (the `safe`-prefix and `humanreview`-prefix rule-id mappings) is migrated into `BundledRulesetAnalysis` (§8) as ordinary persisted entries with `assessedBy: "human"`, looked up via the existing Stage 0 precedence rather than a separate code table. Stages formerly numbered 2 and 3 are renumbered to 1 and 2 accordingly. This is a pure mechanism consolidation, not a behavior change for the built-in ruleset's existing classifications — but it does mean those classifications now benefit from fingerprint staleness detection and the human-override warning behavior (§8) that a hard-coded table could never have had (a hard-coded table has no fingerprint to compare against, so it could silently go stale against a built-in ruleset edit with no detection at all). + +**Reassessed a second time against `clarification-algorithm.md` — second gap found and corrected**: an earlier pass of this document claimed the risk/confidence/safety separation was "confirmed and unchanged," reasoning that confidence-as-an-explanatory-annotation alongside a directly-assigned three-value `riskLevel` field already satisfied the clarification document's §5 ("Separate risk from confidence from safety level"). 
That reassessment was made before `clarification-algorithm.md` was edited to add its explicit `estimatedRisk`/`confidence`/`remediationSafetyLevel` field list and decision-matrix pseudocode (§5), and did not survive contact with that addition: the original design had no risk-estimate field independent of the final classification at all — Stages 1–3 assigned the final three-value level directly under the name `riskLevel`, and `confidence` never changed which bucket a rule landed in, only how the assignment was explained. That is precisely the conflation the clarification document warns against: a system without the separation cannot represent "high risk, low confidence" or "low risk, high confidence" cases (its own worked examples in §5). The revision above restores the three independent signals — renaming the low/medium/high estimate to `riskLevel` and giving the derived safe/humanreview/unsafe value its own field, `remediationSafetyLevel`, rather than the two sharing one overloaded field name — and the literal decision matrix, and also folds in the clarification document's "Recommended High Level Estimating Model Approach" steps 3–4 (infer the minimal satisfying edit; estimate whether it touches public-contract elements) and the "Recommended approach for a new ruleset" step 3 (infer likely remediation from rule mechanics — function semantics, not just rule-id/segment matching), neither of which the prior design implemented either. + +## 4. Default/fallback behavior when a rule has no analysis + +**Decision**: Any violation whose `ruleId` is absent from the cached `RulesetAnalysis` (e.g.
ruleset changed between analysis and grading) defaults to `riskLevel: high` / `confidenceLevel: low` / `remediationSafetyLevel: unsafe` / `assessedBy: "automated"` at lookup time, not just at Stage 2 of the analyser itself — this is the same quadruple Stage 2 produces, computed directly rather than via the decision matrix, since there is no rule metadata at all to run the matrix against. + +**Rationale**: Directly required by FR-009 and the spec's first Edge Case; keeps the conservative-by-default guarantee end-to-end, not just inside the analyser. + +**Reassessed against `clarification-algorithm.md`**: confirmed and unchanged, and now also governs persisted/pre-calculated entries (§8): a persisted analysis covering only some rules (FR-015) is exactly the "absent from `RulesetAnalysis`" case for the rules it doesn't cover — they fall through to Stages 0–2, then to this same lookup-miss default if still unclassified. One lookup-miss path, reused for every reason a rule might be unclassified (unanalysed ruleset, an automated entry invalidated by a fingerprint mismatch, or partial persisted coverage). Note this does **not** apply to a human-assessed (`assessedBy: "human"`) entry whose fingerprint no longer matches — per §8, that entry is still used (with a warning), not treated as absent. + +## 5. 
Internal naming cleanup (completing the Feature 11 deferral) + +**Decision**: Rename internal identifiers wholesale — no backward-compatible aliases (pre-v1.0, consistent with Feature 11's precedent): + +| Old | New | +|---|---| +| `packages/api-grade-core/src/quick-fixes.ts` | `packages/api-grade-core/src/remediation-safety.ts` | +| `classifyViolation()` | `classifyViolation()` removed; replaced by `analyseRuleset()` (new) + `getRemediationSafety(diagnostic, rulesetAnalysis)` (lookup) | +| `buildQuickFix()` | `buildRemediationItem()` | +| `buildQuickFixOutput()` | `buildRemediationSafetyOutput()` | +| `formatQuickFixesHuman()` | `formatRemediationSafetyHuman()` | +| Types: `QuickFix`, `QuickFixOutput`, `ViolationClass` | `RemediationItem`, `RemediationSafetyOutput`, `RemediationSafetyLevel` (3-value, the `remediationSafetyLevel` field), plus new `RiskLevel` (3-value, the `riskLevel` field — distinct field and type from `RemediationSafetyLevel`), `ConfidenceLevel`, `AssessmentOrigin` (`"human"` \| `"automated"`, the `assessedBy` field), `RuleAnalysis`, `RulesetAnalysis` | +| `quick_fixes_algorithm_spec.md`'s hard-coded `safe`/`humanreview`-prefix rule-id tables | Removed as a distinct code-level stage; migrated into `BundledRulesetAnalysis` (§8) as `assessedBy: "human"` persisted entries, consulted via the same Stage 0 lookup used for any other ruleset | +| `packages/api-grade-mcp/src/tools/quick-fixes-only.ts`, `registerQuickFixesOnlyTool` | `remediation-safety.ts`, `registerRemediationSafetyTool` | +| `tests/integration/cli-quick-fixes.test.ts`, `packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts`, `packages/api-grade-core/tests/unit/quick-fixes.test.ts` | renamed to `cli-remediation-safety.test.ts`, `remediation-safety.test.ts`, `remediation-safety.test.ts` | + +`CHANGELOG.md` entries describing **past** releases are historical record and are not rewritten (an accurate record of what a prior version did); `CONTRIBUTING.md`'s package/tool table is 
**current-state** documentation and is in scope for correction (it still names the pre-Feature-11 tool, which is already stale today). + +## 6. Exposing the ruleset analyser independent of grading (FR-011) + +**Decision**: Add one new surface per tool, following each tool's existing naming convention: + +- **CLI**: new subcommand `ruleset-analysis [--ruleset-path ] [--format json|human]`, implemented alongside the existing `config` subcommand (`src/cli/ruleset-config-cli.ts` pattern) in a new `src/cli/ruleset-analysis-cli.ts`. Defaults to analysing the built-in ruleset when no path is given. +- **MCP**: new tool `analyse-ruleset-safety`, following the `get-ruleset-config`/`set-ruleset-config` `-ruleset-` naming convention, accepting an optional `rulesetPath`. + +**Rationale**: Matches existing patterns exactly rather than inventing a new naming scheme; keeps the tool list self-describing per the constitution's AI Integration Requirements (no extra docs needed to discover it). + +## 7. Filtering semantics for the new levels + +**Decision**: `--remediation-safety ` / MCP `level` parameter remains an **exact-match filter** (return only violations whose computed `remediationSafetyLevel` equals the requested level), not a cumulative "at or below" filter. + +**Rationale**: Preserves FR-007 (identical behavior for `safe`) without redefining what the existing parameter means; a cumulative mode is not requested by the spec and would be a separate, additive feature if ever needed (YAGNI per the constitution's Development Workflow). + +## 8. Persisting and reloading ruleset analysis (FR-012–FR-019) + +**Decision**: This is a correction to the original plan/data-model, prompted directly by `clarification-algorithm.md`'s "Recommended High Level Approach" (steps 1 and 4) and its "Expected user-specific ruleset usage behaviour" section, neither of which was reflected in the initial design.
Both sections explicitly call for (a) loading pre-calculated risk/safety levels for known rulesets — "at a minimum... the default ruleset" — before running automated analysis, and (b) letting users persist corrections so they are reloaded automatically the next time the same ruleset is used. The original data-model statement that "nothing is persisted across requests" directly contradicts this and is superseded. + +**Revised again, per direct user input**: the first version of this decision made the workspace/global per-user config the primary store, keyed by a whole-ruleset content hash. The user reviewing this plan pointed out the actual goal is broader than personal reuse: colleagues in the same organisation should be able to share one set of judgements without each separately configuring their own copy, and the mechanism needs to work for both local and GitHub-hosted rulesets. A per-user config file cannot do that — it lives on one machine and is never seen by a colleague pointed at the same ruleset. The design below replaces the *primary* mechanism with a colocated, shared file, and narrows the original workspace/global design to a secondary, personal-override role. + +Design: + +- **Primary mechanism — colocated Shared Ruleset Analysis (FR-016/FR-017)**: the persisted analysis for a ruleset lives next to the ruleset itself, at a location derived deterministically from the ruleset's own path/URL by a naming convention (e.g. appending a fixed suffix to the ruleset's filename). Presence/absence is therefore a direct lookup at that derived location — no separate index or registry needs to be consulted or kept in sync. For a local ruleset, this is a sibling file in the same directory.
For a GitHub-hosted ruleset, this is a sibling file at the same repo path/ref, fetched through the *same* resolution and auth flow already used to fetch the ruleset (`resolveRuleset`/`fetchRulesetContent`, reusing whatever `AuthConfig`/GitHub PAT was already supplied) — no new auth concept. Because it is colocated and (for GitHub-hosted rulesets) typically version-controlled in the same repository, anyone who can read the ruleset automatically sees the same shared analysis — satisfying the sharing goal without any per-user setup (SC-008). +- **Per-rule Fingerprint, not a whole-ruleset hash**: each entry in the shared file is keyed by `ruleId` and carries a fingerprint of that rule's own content (`given`/`then.function`/`severity`/`description`). Recomputing the current ruleset's rules and comparing fingerprints per-`ruleId` means one edited rule invalidates only its own entry, not the whole shared file — important precisely because this file is meant to be shared and incrementally maintained by a team over time, not regenerated wholesale on every edit. (A whole-ruleset hash, as in the original design, would invalidate every entry the moment any one rule changed — too coarse for a file meant to accumulate team knowledge.) +- **`assessedBy: "human" | "automated"` on every persisted entry**: each entry in the shared file, the personal override, and `BundledRulesetAnalysis` carries who produced the stored judgement, not just what it is. A correction written via FR-013 (a user explicitly persisting their own judgement) is always `assessedBy: "human"`. An entry written by the tool itself without a human in the loop — e.g. a future caching optimization that persists an automated Stage 1/2 result purely to avoid recomputing it — would be `assessedBy: "automated"`. This distinction is what the fingerprint-mismatch handling below keys off. 
+- **Fingerprint mismatch on a human-assessed entry is honored, not discarded, but warned about**: per direct user feedback, a stale fingerprint on an `assessedBy: "automated"` entry is treated as today (§4) — not found, falls through to Stages 1–2. A stale fingerprint on an `assessedBy: "human"` entry is **still used** — a human's judgement about a rule's remediation safety does not become wrong just because the rule's `given`/`then.function`/`severity`/`description` text changed; the human may well have already accounted for the kind of change that occurred, and silently discarding their explicit correction on every minor rule edit would force them to re-confirm it indefinitely, undermining the entire point of FR-013. Instead the analyser surfaces a warning carrying both fingerprints — the one the entry was captured against and the rule's current one — so a user/CI pipeline can see that *something* about the rule changed since a human last reviewed it, without the tool unilaterally deciding the human's prior judgement no longer applies. The warning is attached to that rule's `RuleAnalysis` entry (visible via FR-011) and to the rule's `RemediationItem` for any violation it produces during grading, rather than only logged — so it survives into JSON output for programmatic consumers (e.g. a CI job that wants to fail/flag when this occurs) and not just human-readable text. +- **Secondary mechanism — Personal Ruleset Analysis Override (FR-018)**: the original workspace/global config-file design (`RulesetConfig`-style storage) is retained, but repurposed: it now holds only a user's *personal* corrections, which take precedence over the shared colocated file and over automated analysis, without writing to the shared file. This covers the case where a user disagrees with the team's shared judgement, or can read but not write the ruleset's location (e.g. a GitHub-hosted ruleset they have only read access to). 
Lookup precedence: personal override (workspace, then global) → shared colocated analysis → bundled default (built-in ruleset only) → Stages 1–2. +- **Partial coverage**: both the shared file and the personal override are maps keyed by `ruleId`; only rules present in a given map short-circuit Stages 1–2 for this ruleset (FR-015). This is the same mechanism as the existing lookup-miss default (§4) — a rule not in either map is simply not a hit, and analysis proceeds normally for it. +- **Writing a correction (FR-013/FR-018/FR-019)**: for a *local* ruleset, persisting "to the shared file" is a normal local file write — low-risk and reversible, no different from any other local file edit the tool makes, and the resulting diff is something the user can review/commit/PR through their normal process. For a *GitHub-hosted* (or otherwise remote, non-writable) ruleset, the tool does not push a commit automatically (FR-019) — pushing a change to a shared, version-controlled artifact that other colleagues rely on is a different risk class than editing a local file, and should go through the user's normal review process, not a silent automated write. In that case the correction is recorded as a Personal Override locally, and the tool can additionally emit the updated shared-file content for the user to commit themselves. +- **The former curated rule-id table, migrated**: `BundledRulesetAnalysis` (the built-in ruleset's pre-calculated analysis) is no longer "Stages 1–3 run once at release time over the built-in ruleset" for its well-known rules — those well-known rules' entries are authored directly by a maintainer (the same act of judgement the old hard-coded table represented) and stored with `assessedBy: "human"`, exactly like any other persisted correction. The remaining built-in rules without a maintainer judgement are still pre-computed by Stages 1–2 at release time and stored with `assessedBy: "automated"`. 
Both kinds of entry live in the same file, looked up the same way — there is no longer a separate "is this rule in the curated table" code path. + +**Rationale**: Both source documents (`clarification-algorithm.md` and the user's own architectural input) converge on the same underlying need — avoid re-deriving (and re-human-reviewing) the same ruleset's classifications repeatedly, for every person who uses it. Colocation is the simpler, more direct way to satisfy "the user can perform this review once and then encode the correct safety level for this rule in this ruleset" *for an entire team*, rather than once per person. It also reuses existing project capability (the ruleset resolution/fetch path already handles "give me a file or URL, with optional auth") rather than inventing a new shared-storage concept. Honoring `assessedBy: "human"` entries across a fingerprint mismatch (with a warning) extends that same "review once, trust it" goal to surviving small, incidental rule edits — a maintainer rewording a rule's `description` should not silently un-confirm a human's safety judgement about that rule's `given`/`function` semantics. + +**Alternatives considered**: +- Keying the shared file by a whole-ruleset content hash (the original decision) — rejected per the fingerprint discussion above: too coarse for a file intended to be incrementally maintained. +- A purely per-user workspace/global store as the *only* mechanism (the original decision) — rejected: cannot satisfy the stated organisational-sharing goal; a teammate pointed at the same ruleset would never see it. +- A separate index/registry file mapping ruleset identities to analysis file locations — rejected in favor of a pure naming-convention derivation: an index is one more thing that can drift out of sync with the rulesets it describes, where a deterministic derived path cannot. 
+- Automatically committing corrections back to a GitHub-hosted ruleset's repository — rejected: writing to a shared, remote artifact without an explicit human review step is a meaningfully different risk than a local file edit, and not something this feature should do silently. +- Discarding a human-assessed entry on any fingerprint mismatch, same as an automated entry (the original design) — rejected per direct user feedback: it treats a human's explicit, persisted judgement as no more durable than a heuristic guess, defeating the purpose of letting a human confirm a classification at all. +- Silently honoring a human-assessed entry on mismatch with no warning at all — rejected: a user/CI pipeline has a legitimate interest in knowing the rule changed since a human last looked at it, even though the prior judgement is still being trusted; surfacing the two fingerprints costs little and preserves an audit trail. diff --git a/specs/012-remediation-safety/spec.md b/specs/012-remediation-safety/spec.md new file mode 100644 index 0000000..91fefdc --- /dev/null +++ b/specs/012-remediation-safety/spec.md @@ -0,0 +1,147 @@ +# Feature Specification: Remediation Safety (Ruleset Analyser & Multi-Level Safety) + +**Feature Branch**: `012-remediation-safety` + +**Created**: 2026-06-23 + +**Status**: Draft + +**Input**: User description: "Feature 12 - Remediation safety (from GOAL.md): Build a ruleset analyser that determines the level of risk associated with remediating violations identified by each of its rules, along with a confidence level in that determination. Extend remediation safety to support additional levels (humanreview, unsafe) beyond the existing safe level, calculated from the analyser's output, in alignment with a new automated_remediation_safety_algorithm_spec.md. Surface remediation safety in JSON and human output across tools and packages. 
Complete the refactor away from the 'quick fixes only' concept — started superficially in Feature 11 — so 'quick fixes' no longer appears anywhere in the code base or user-facing documentation." + +## User Scenarios & Testing *(mandatory)* + +### User Story 1 - Developer sees a risk-graded remediation plan, not just a flat safe/unsafe split (Priority: P1) + +A developer grading their API spec wants to know, for every violation, how risky it would be to auto-remediate it: which violations can be fixed with no review, which need a human to sanity-check the result, and which should not be auto-remediated at all. Today the tool only distinguishes "safe" from "everything else"; the developer wants the middle ground made visible so they can triage efficiently instead of treating every non-trivial fix as equally risky. + +**Why this priority**: This is the core value of the feature — without the three-level split, the rest of the feature (analyser, confidence, spec doc) has no visible user benefit. + +**Independent Test**: Grade a sample spec containing violations that are clearly auto-fixable (e.g. missing descriptions), violations that need human judgement (e.g. missing operation `operationId`), and violations that should never be auto-fixed (e.g. removing a required field). Confirm the three groups are reported under `safe`, `humanreview`, and `unsafe` respectively, with counts matching expectations. + +**Acceptance Scenarios**: + +1. **Given** a spec with violations spanning all three risk categories, **When** the user requests remediation-safety output (CLI, MCP, or package API), **Then** each violation is labeled with exactly one of `safe`, `humanreview`, or `unsafe`. +2. **Given** the user filters output to `--remediation-safety safe`, **When** results are returned, **Then** only violations classified `safe` are included, identical in scope to today's behavior. +3. 
**Given** the user filters output to `--remediation-safety humanreview`, **When** results are returned, **Then** only violations classified as `humanreview` are included (a new level introduced by this feature). +4. **Given** the user filters output to `--remediation-safety unsafe`, **When** results are returned, **Then** only violations classified as `unsafe` are included (a new level introduced by this feature). + +--- + +### User Story 2 - Ruleset maintainer trusts the analyser's classification because confidence is shown (Priority: P2) + +A team supplies its own custom Spectral ruleset. They want to understand, rule by rule, why the analyser assigned a given remediation safety level, and how confident the analyser is in that assignment, so they can spot-check or override classifications they disagree with before relying on them in CI. + +**Why this priority**: Confidence is what makes the analyser trustworthy for custom/third-party rulesets where the built-in heuristics may not apply cleanly; without it, users have no way to judge whether a `safe` label is well-founded. + +**Independent Test**: Run the ruleset analyser against both the built-in ruleset and a custom ruleset containing rules with no recognizable pattern. Confirm every rule receives a risk level, a confidence level, and a remediation safety level, and that unrecognized/ambiguous rules receive a visibly lower confidence than well-known rules. + +**Acceptance Scenarios**: + +1. **Given** a ruleset is analysed, **When** the analysis completes, **Then** every rule in the ruleset has an assigned remediation safety level (`safe`, `humanreview`, or `unsafe`) and a confidence level for that assignment. +2. **Given** a rule the analyser cannot confidently classify (e.g.
a custom rule with no recognizable id pattern or schema path), **When** it is analysed, **Then** it is still assigned a remediation safety level (defaulting to the more conservative `unsafe` or `humanreview`) but with a low confidence indicator, rather than being silently omitted. +3. **Given** the analyser's output for a ruleset, **When** a user inspects it (JSON or human format), **Then** they can see, per rule, the risk level, confidence level, remediation safety level, and a brief rationale. +4. **Given** a user disagrees with a rule's assigned remediation safety level and persists a correction for it, **When** the same ruleset is analysed again in a later, separate invocation, **Then** the corrected remediation safety level is returned for that rule without requiring the correction to be re-applied. + +--- + +### User Story 3 - Documentation and code no longer mention "quick fixes" anywhere (Priority: P3) + +A new contributor or documentation reader should encounter "remediation safety" consistently everywhere — command help, MCP tool descriptions, READMEs, internal function/type names, test names — with no leftover "quick fixes" terminology anywhere, since Feature 11 deliberately left internal naming unchanged pending this feature. + +**Why this priority**: Lower priority than the functional capability itself, but required to close out the deferred work from Feature 11 and avoid the codebase carrying two names for the same concept indefinitely. + +**Independent Test**: Search the entire repository (source, tests, docs, package metadata) for "quick fix" / "quickFix" / "quick-fix" (case-insensitive) and confirm zero matches. + +**Acceptance Scenarios**: + +1. **Given** the full repository (all packages, docs, tests), **When** searched for "quick fix" in any casing or separator style, **Then** no matches are found. +2. 
**Given** the CLI, MCP server, and package public APIs, **When** their exported names, types, and option/tool names are inspected, **Then** all of them use "remediation safety" terminology exclusively. + +--- + +### Edge Cases + +- What happens when a violation's rule was never analysed (e.g. the ruleset changed between analysis and grading, or a dynamically generated rule id)? The system must assign the most conservative default (`unsafe`) rather than crash or silently drop the violation from output. +- How does the analyser handle a rule that legitimately spans multiple risk levels depending on context (e.g. a rule that sometimes flags a breaking change and sometimes a cosmetic one)? The specification (`automated_remediation_safety_algorithm_spec.md`) must define how such rules are classified — at the rule level, the analyser assigns one risk/confidence pair per rule; finer-grained, per-violation distinctions are out of scope for the analyser itself. +- What happens when a custom/private ruleset is supplied that the analyser has never seen before? It must still produce a complete classification (risk level + confidence level + derived remediation safety level) for every rule, with confidence honestly reflecting the lack of prior knowledge, rather than failing the grading run. +- What happens when risk level and confidence level disagree (e.g. `medium` risk with only `low` confidence, or `low` risk with `low` confidence)? The decision matrix (FR-003, `automated_remediation_safety_algorithm_spec.md`) resolves any `low`- or `medium`-risk estimate held with only `low` confidence to `humanreview`, not `safe` (a `high`-risk estimate resolves to `unsafe` regardless of confidence) — low confidence in a risk estimate is never, by itself, grounds for the most permissive classification, consistent with the project's conservative-by-default posture (FR-009). +- What happens to existing consumers (CI pipelines, scripts) that depend on today's binary `safe` vs "not safe" filtering?
`--remediation-safety safe` (and equivalent MCP/package usage) must continue to mean exactly what it means today; the new levels are additive, not a breaking redefinition of `safe`. +- What happens when a rule's definition changes after a colocated/shared or personal analysis entry was captured for it? If that entry was produced automatically (no human review), its Fingerprint no longer matches, so it is treated as not found for that rule alone (FR-014) and falls back to automated analysis — unaffected rules elsewhere in the same shared file remain trusted. If that entry was assessed by a human (FR-020), it is still honored as-is (FR-021) — a human's judgement is not invalidated by a rule edit the same way an automated guess is — but a warning naming the rule and both fingerprints (old and current) is surfaced, so the discrepancy is visible rather than silent. +- What happens when only some rules in a ruleset have a pre-calculated, shared, or personal entry? Every rule still gets a classification (FR-015) — covered rules use the matching entry, the rest go through automated analysis as if no persisted analysis existed for them. +- What happens on a machine/environment where no persisted analysis exists yet for a ruleset that has never been analysed before (including the very first time anyone on a team uses it)? The system performs the existing automated analysis (Stages 1–2) and proceeds normally; persistence is an optimization and trust-building mechanism, never a precondition for producing output. +- What happens when a user wants to disagree with a colleague's shared classification for a rule, but doesn't have write access to the ruleset's location (e.g. it's a GitHub-hosted ruleset they can read but not push to) or doesn't want to change what the rest of the team sees? They persist a Personal Ruleset Analysis Override (FR-018) instead, which is honored for them locally without modifying the shared, colocated data. 
+- What happens when the ruleset is GitHub-hosted and a user persists a correction intended to be shared? The system does not push a commit to the remote location automatically (FR-019); it can still produce the updated shared-analysis content for the user to commit themselves through their normal review process, and in the meantime the correction is honored locally as a Personal Ruleset Analysis Override. + +## Requirements *(mandatory)* + +### Functional Requirements + +- **FR-001**: The system MUST provide a ruleset analyser that, given a Spectral-compatible ruleset, produces for each rule a remediation safety level describing how safe it would be to automatically remediate violations of that rule. +- **FR-002**: The remediation safety level produced for each rule MUST be one of exactly three values: `safe`, `humanreview`, or `unsafe`. +- **FR-003**: The ruleset analyser MUST produce, for each rule, a risk level (`low`/`medium`/`high`, reflecting how likely the rule's minimal satisfying remediation is to touch consumer-facing contract surface) and a confidence level (`high`/`medium`/`low`) in that estimate, as two independent signals — not as a single combined value. The rule's remediation safety level (FR-002) MUST be a field in its own right, derived from these two signals via the decision matrix defined in `automated_remediation_safety_algorithm_spec.md`, not assigned directly by any analysis stage and not merely a relabeling of the risk level. +- **FR-004**: The ruleset analyser's classification logic MUST be implemented in alignment with a new specification document, `automated_remediation_safety_algorithm_spec.md`, authored as part of this feature and stored alongside the existing algorithm specs (`specs/algorithms/`). 
+- **FR-005**: Remediation safety for a given violation MUST be calculated by looking up the remediation safety level (and risk/confidence signals) the ruleset analyser assigned to that violation's rule, rather than via the prior ad hoc rule-id-prefix/path heuristic. +- **FR-006**: The `--remediation-safety` CLI option (and equivalent MCP/package parameters) MUST accept all three levels: `safe`, `humanreview`, and `unsafe`. +- **FR-007**: Requesting `--remediation-safety safe` MUST produce output equivalent in scope to today's pre-feature behavior (no regression for existing users of the `safe` level). +- **FR-008**: Remediation safety information (remediation safety level per violation, and the rule-level risk and confidence signals behind it) MUST be included in both the JSON output and the human-readable output of every tool that currently reports remediation-safety/quick-fix information (CLI, MCP server tools, and any consuming packages such as the Backstage plugin where applicable). +- **FR-009**: When a violation's rule has no analyser result available at grading time, the system MUST default that violation to the most conservative remediation safety level (`unsafe`) rather than omitting it or failing. +- **FR-010**: All source code, tests, type/function/tool names, package metadata, and user-facing or contributor-facing documentation across the repository MUST be updated so that no "quick fix" terminology (in any casing or separator style) remains. +- **FR-011**: The ruleset analyser's per-rule results (risk level, confidence level, remediation safety level, and rationale) MUST be inspectable by users, in both JSON and human-readable form, independent of grading a specific API spec (i.e. "analyse this ruleset" is a capability in its own right, not only an internal implementation detail) — so a user can see not just the final classification but the two independent signals (FR-003) it was derived from, to judge whether they agree with it. 
+- **FR-012**: Before running the automated analysis stages, the system MUST check for a previously computed or pre-calculated ruleset analysis for the loaded ruleset and, when found, use it directly instead of recomputing from rule metadata. At minimum, the built-in ruleset MUST ship with such a pre-calculated analysis. +- **FR-013**: Users MUST be able to persist a correction to a rule's remediation safety level (and, implicitly, raise its confidence to reflect human confirmation) for a specific ruleset, such that the corrected classification is automatically loaded and used the next time that same ruleset is analysed or graded against, without requiring the correction to be re-entered. +- **FR-014**: The system MUST be able to recognize, for a given rule within a ruleset, whether a pre-calculated or persisted classification for that exact rule definition is still valid (i.e. the rule hasn't changed since the classification was captured). For a classification that was produced automatically (no human review), a stale entry for a changed rule MUST NOT be silently reused — it MUST be treated as not found for that rule, falling back to automated analysis, and MUST NOT prevent the other, unchanged rules' entries from being used. For a classification that a human explicitly assessed and persisted (FR-013/FR-020), staleness MUST NOT cause the entry to be discarded (FR-021). +- **FR-015**: When a persisted or pre-calculated analysis only covers some of the rules in the currently loaded ruleset (e.g. the ruleset gained rules since the analysis was captured), the system MUST still produce a complete classification for every rule (FR-001/SC-005) — covered rules use the persisted/pre-calculated entry, uncovered rules fall through to automated analysis. 
+- **FR-016**: The system MUST support storing a ruleset's persisted analysis **colocated with the ruleset itself**, using a deterministic naming convention derived from the ruleset's own location, so that (a) presence or absence of persisted data for a given ruleset can be determined by a direct lookup at that derived location rather than a separate index, and (b) the persisted data can be shared between colleagues simply by it living alongside the ruleset (e.g. committed in the same repository), rather than each person having to separately configure their own copy. +- **FR-017**: The colocated lookup (FR-016) MUST work uniformly whether the ruleset is supplied as a local file path or fetched from a remote/GitHub-hosted location, reusing the same resolution and authentication mechanism already used to fetch the ruleset itself. +- **FR-018**: In addition to the shared, colocated analysis (FR-016), a user MUST be able to persist a personal correction that does not modify the shared colocated data — for cases where they lack write access to the ruleset's location, or want to apply their own judgement locally without changing what their colleagues see. A personal correction MUST take precedence, for that user, over both the shared colocated analysis and the automated analysis stages for the rule(s) it covers. +- **FR-019**: When the ruleset's location is not writable by the system directly (e.g. a GitHub-hosted ruleset), the system MUST NOT automatically write or commit a correction back to that remote location. It MAY still read any existing colocated shared analysis there (FR-017), and MAY produce the content a user would need to commit themselves to update the shared analysis. 
+- **FR-020**: Every persisted or pre-calculated per-rule classification (shared colocated analysis, personal override, or bundled default — FR-012/FR-016/FR-018) MUST record whether it was assessed by a human (an explicit correction persisted via FR-013, including a maintainer's judgement for a well-known built-in rule) or produced automatically with no human review. This distinction MUST be inspectable wherever per-rule results are inspectable (FR-011). There is no separate hard-coded table of "known" classifications distinct from this persisted mechanism — the built-in ruleset's pre-curated classifications are persisted entries assessed by a human, like any other. +- **FR-021**: When a rule's definition changes after a human-assessed classification (FR-020) was captured for it, the system MUST continue to honor that classification rather than treating it as stale and falling back to automated analysis (contrast FR-014's handling of an automated classification under the same circumstance). The system MUST also surface a warning for that rule — in both JSON and human-readable output, at both the ruleset-analysis level (FR-011) and the per-violation level (FR-008) — identifying the rule and including both the fingerprint the classification was captured against and the rule's current fingerprint, so a user can tell the rule changed since a human last reviewed it even though the prior judgement is still being trusted. +- **FR-022**: A per-violation remediation-safety item (FR-008) MUST NOT drop any field a regular (non-filtered) diagnostic for the same violation would have carried — at minimum, its actual severity (`error`/`warn`/`info`/`hint`, not a fixed placeholder) and its source-location `range` (line/character). Filtering diagnostics down to one remediation-safety level and reshaping them with remediation-specific fields (`location`, `currentValue`, `expectedImprovement`) MUST be strictly additive to a regular diagnostic's fields, never lossy. 
+- **FR-023**: Every JSON document a CLI command prints for an end user to read (as opposed to an MCP tool response consumed by an AI agent) MUST be pretty-printed (human-legible, multi-line, indented) — consistent across the regular grade output, the remediation-safety-filtered output, and the ruleset-analysis output. MCP tool JSON responses are explicitly out of scope for this requirement, since they are optimized for token efficiency in an AI-agent context rather than direct human reading. +- **FR-024**: Per-violation remediation-safety information (FR-008) MUST be available on a *regular*, unfiltered grading request — not only when `--remediation-safety`/`level` is explicitly supplied. A user grading a spec with no remediation-safety filter applied MUST still be able to see, per diagnostic, the same `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` signals, in both JSON and human-readable output, so they do not need to make a second, filtered request just to learn how risky a finding is to fix. + +### Key Entities *(include if feature involves data)* + +- **Ruleset Analyser Result**: Per analysed ruleset, a collection of per-rule entries. Each entry references a rule id and carries the rule's risk level, confidence level, remediation safety level (a field in its own right, derived from the first two via a decision matrix — see FR-003), a short human-readable rationale for the assignment, and where that assignment came from (freshly computed, pre-calculated/bundled, or a persisted user correction). For freshly-computed entries, "risk level" reflects the analyser's inference of the *likely minimal remediation* for the rule and whether that remediation would touch consumer-facing contract surface — not the final safe/humanreview/unsafe classification itself. 
+- **Remediation Safety Level**: The rule's *final* classification — one of `safe`, `humanreview`, `unsafe` — describing how safe it is to automatically remediate a violation of a given rule without human review. For automated entries this is always derived from Risk Level and Confidence Level via a fixed decision matrix, never assigned directly, so that a rule's classification and the analyser's certainty in it remain independently visible (see Edge Cases). +- **Risk Level**: One of `low`, `medium`, `high` — the analyser's estimate, independent of its confidence, of how likely the rule's minimal satisfying remediation is to alter consumer-facing contract surface (paths/channels, parameters, request/response or message schemas, security) versus only low-impact metadata. Distinct field, distinct type, distinct values from Remediation Safety Level. +- **Confidence Level**: Describes how confident the analyser is in a rule's assigned Risk Level (e.g. driven by how well-known/recognizable the rule's id or function is versus how custom/ambiguous it is, or whether a human has explicitly confirmed it). A rule can be high-risk/low-confidence (e.g. a custom function targeting `$.paths`) or low-risk/high-confidence (e.g. a `truthy` check on `$.info.description`) — these are not the same axis, and the decision matrix treats them as such. +- **Remediation Safety (per violation)**: The remediation safety level applied to a specific violation found during grading, derived from the ruleset analyser's result for that violation's rule. +- **Rule Fingerprint**: A stable identifier for "this exact rule definition," derived from a rule's content (`given`, `then.function`, `severity`, `description`), used to detect whether a pre-calculated/persisted entry for that `ruleId` is still valid for the rule as currently defined (FR-014), independent of where the ruleset as a whole is stored or fetched from. 
A mismatch invalidates an automated entry but not a human-assessed one (FR-020/FR-021) — for the latter it instead produces a Fingerprint Mismatch Warning. +- **Assessment Origin**: Whether a per-rule classification was produced by a human (an explicit, persisted correction, including a maintainer's judgement for a well-known built-in rule) or automatically with no human review (FR-020). Governs whether a Rule Fingerprint mismatch invalidates the classification (automated) or merely produces a Fingerprint Mismatch Warning (human). +- **Fingerprint Mismatch Warning**: Produced when a human-assessed classification's stored Rule Fingerprint no longer matches the rule's current one (FR-021). Carries the rule id and both fingerprint values (the one the classification was captured against, and the rule's current one), and is surfaced wherever that rule's classification is shown — ruleset-analysis output and per-violation remediation output alike — without changing the classification itself. +- **Shared Ruleset Analysis**: A ruleset analysis (in full or in part) stored colocated with the ruleset itself via a deterministic naming convention (FR-016), readable by anyone who can read the ruleset (whether local file or GitHub-hosted, FR-017). This is the primary mechanism for a team/organisation to share one set of remediation-safety judgements instead of each person maintaining their own. +- **Personal Ruleset Analysis Override**: A user-local correction (FR-018) that takes precedence over the Shared Ruleset Analysis and the automated analysis stages for the rule(s) it covers, without modifying the shared, colocated data. + +## Success Criteria *(mandatory)* + +### Measurable Outcomes + +- **SC-001**: Users grading any spec can distinguish all three remediation-safety levels (`safe`, `humanreview`, `unsafe`) in both JSON and human output, for both the built-in ruleset and a supplied custom ruleset. 
+- **SC-002**: For the built-in ruleset, every rule has a documented risk level, confidence level, and remediation safety level traceable to the `automated_remediation_safety_algorithm_spec.md` specification. +- **SC-003**: A repository-wide search for "quick fix" (any casing/separator) returns zero matches after the feature is complete. +- **SC-004**: Existing `--remediation-safety safe` users observe no behavioral change in the set of violations returned, compared to before this feature. +- **SC-005**: For an arbitrary, previously-unseen custom ruleset, the analyser completes and returns a risk and confidence level for 100% of its rules (no rule left unclassified). +- **SC-006**: A user-corrected (human-assessed) remediation safety level for a rule in a given ruleset is honored (returned without re-running automated analysis for that rule) on a subsequent, separate invocation against the same rule definition, **and continues to be honored** even after that specific rule's definition changes — accompanied by a visible fingerprint-mismatch warning (FR-021) rather than being discarded. An automated (non-human-assessed) entry, by contrast, is no longer honored once the rule's definition changes. +- **SC-007**: The built-in ruleset's analysis is available without any per-rule automated computation having to run at request time (served from a pre-calculated/bundled result), for both the CLI and MCP surfaces. +- **SC-008**: Two different users pointed at the same ruleset location (local path or GitHub-hosted) see identical classifications for every rule covered by that ruleset's shared, colocated analysis, without either of them having separately configured it. +- **SC-009**: For every rule whose classification came from a human-assessed entry with a fingerprint mismatch, both the rule's stored fingerprint and its current fingerprint are visible in the output, in both JSON and human-readable form, at both the ruleset-analysis and per-violation surfaces. 
+- **SC-010**: A `--remediation-safety`-filtered remediation item for a given violation carries the same severity and line/character location as the unfiltered diagnostic for that same violation would — verifiable by grading the same spec with and without the filter and comparing the matching violations' `severity`/`range` fields. +- **SC-011**: Every CLI-printed JSON document (regular grade output, remediation-safety-filtered output, ruleset-analysis output, and config/error payloads) is valid multi-line, indented JSON — verifiable by confirming it is not a single line. +- **SC-012**: Grading a spec with no `--remediation-safety`/`level` filter still returns, per diagnostic, the same `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` fields a filtered request would show for that violation — in both JSON and human-readable output. + +## Assumptions + +- The three remediation safety levels (`safe`, `humanreview`, `unsafe`) and their relative ordering (in terms of caution) were fixed by Feature 11 and GOAL.md and are not renegotiated here. The `low`/`medium`/`high` risk level introduced by this feature (FR-003) is a separate, new scale, not a renaming or expansion of these three. +- "Confidence level" is assumed to be a small ordered set (e.g. high/medium/low) rather than a continuous numeric score, consistent with how grades and other diagnostics in this project favor discrete, explainable categories over raw scores; the exact scale is defined in `automated_remediation_safety_algorithm_spec.md` during planning. +- The ruleset analyser operates on rule definitions/metadata (id, applied path/schema patterns, severity, description) rather than on a corpus of historical remediation outcomes — there is no assumption of a training/feedback loop in this feature. 
+- "Rationale" per rule is a short, human-readable explanation (not a separate structured field requiring its own schema beyond a text string) sufficient for users to understand why a level was assigned. +- Backstage plugin packages are in scope for surfacing remediation safety only insofar as they already surface quick-fix/remediation-safety information today; if they do not yet do so, extending them is out of scope for this feature. +- This feature does not change how a custom ruleset is supplied (file path, GitHub PAT, etc.) — only how its rules are risk-classified once available. +- The primary persistence mechanism (FR-016/FR-017) colocates shared analysis data with the ruleset itself via a naming convention, rather than a separate per-user store — this is a deliberate choice to make sharing across a team/organisation the default, not an opt-in synchronization step. The existing workspace/global config scope (`RulesetConfig`/`RulesetResolution`) is retained, but narrowed to the Personal Ruleset Analysis Override role (FR-018) rather than being the primary persistence layer originally assumed. +- Rule Fingerprinting is computed from individual rule content (e.g. a hash of `given`/`then.function`/`severity`/`description` for that `ruleId`), not from the ruleset as a whole and not from the path/URL the ruleset was supplied with — this gives per-rule staleness detection (one changed rule doesn't invalidate an entire shared analysis file) and survives the ruleset being relocated or re-fetched from a mirror. +- The built-in ruleset's well-known, previously hard-coded classifications (e.g. "this rule id is typically `safe`") are not a separate code-level table — per direct user feedback, they are ordinary bundled persisted entries (FR-012/FR-020) assessed by a human (a maintainer), reusing the same persistence/fingerprinting/warning mechanism (FR-013–FR-021) as a correction any other user would persist for any other ruleset. 
+- The exact naming convention (FR-016) and file format are implementation details for planning, not fixed by this specification, but MUST satisfy: derivable from the ruleset's own path/URL alone (no separate index/registry to consult first), and human-readable/diffable enough to be code-reviewed when shared via a pull request. +- Automatic write-back to a remote/GitHub-hosted ruleset location (FR-019) is explicitly out of scope for this feature — sharing a correction to such a location is a human action (a commit/PR), which the tool may assist by producing the content but does not perform itself. +- "Persist a correction" (FR-013/FR-018) refers to the data being saved for reuse; *how* a user supplies that correction (a CLI flag, an MCP tool call, hand-editing the colocated/override file) is an implementation detail for planning, not fixed by this specification. diff --git a/specs/012-remediation-safety/tasks.md b/specs/012-remediation-safety/tasks.md new file mode 100644 index 0000000..f6bf0ca --- /dev/null +++ b/specs/012-remediation-safety/tasks.md @@ -0,0 +1,257 @@ +--- + +description: "Task list for Feature 12: Remediation Safety (Ruleset Analyser & Multi-Level Safety)" +--- + +# Tasks: Remediation Safety (Ruleset Analyser & Multi-Level Safety) + +**Input**: Design documents from `/specs/012-remediation-safety/` + +**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/remediation-safety-surfaces.md, quickstart.md, `specs/algorithms/automated_remediation_safety_algorithm_spec.md` *(authored during the `/speckit-plan` phase — not a task output; see plan.md Scale/Scope note)* + +**Tests**: Included — Constitution Principle IV (Test-Driven Quality) and plan.md's Technical Context both mandate test coverage written alongside this feature's implementation. + +**Organization**: Tasks are grouped by user story (spec.md priorities P1/P2/P3) to enable independent implementation and testing of each story. + +## Format: `[ID] [P?] 
[Story] Description` + +- **[P]**: Can run in parallel (different files, no dependencies) +- **[Story]**: Which user story this task belongs to (US1/US2/US3) +- Exact file paths are given in every description + +--- + +## Phase 1: Setup + +No new project scaffolding is required — this feature extends the existing `packages/api-grade-core`, `packages/api-grade-mcp`, and `src/cli` workspaces. Setup work is folded into Phase 2 (the rename/replacement of `quick-fixes.ts` is itself the foundational setup for this feature). + +--- + +## Phase 2: Foundational (Blocking Prerequisites) + +**Purpose**: The ruleset analyser engine (Stages 1–2 of `automated_remediation_safety_algorithm_spec.md`) and its supporting types. Both User Story 1 (filtering) and User Story 2 (inspection) call the same `analyseRuleset()`/`getRemediationSafety()` functions, so this must exist before either story's surfaces can be built. + +**⚠️ CRITICAL**: No user story work can begin until this phase is complete. + +- [X] T001 Replace `ViolationClass`/`QuickFix`/`QuickFixOutput` with the new type set in `packages/api-grade-core/src/types.ts`: add `RemediationSafetyLevel` (`"safe"|"humanreview"|"unsafe"`), `RiskLevel` (`"low"|"medium"|"high"`), `ConfidenceLevel` (`"high"|"medium"|"low"`), `AssessmentOrigin` (`"human"|"automated"`), `AnalysisSource` (`"persisted"|"bundled-default"|"heuristic"|"fallback"`), `RuleAnalysis`, `RulesetAnalysis`, `RemediationItem` (was `QuickFix`, with new `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` fields), `RemediationSafetyOutput` (was `QuickFixOutput`, with `remediationItemCount`/`remediationItems`/`requestedLevel`) — per data-model.md +- [X] T002 [P] Write failing unit tests for `analyseRuleset()` Stage 1/2 heuristics in `packages/api-grade-core/tests/unit/remediation-safety.test.ts` (new file, replaces `quick-fixes.test.ts`): key-selector check (1a), additive/rename/custom function-mechanics classification (1b), generic segment 
fallback (1c), Stage 2 whole-document fallback, and the decision matrix table from research.md §3 — depends on T001 +- [X] T003 Implement `analyseRuleset(loadedRuleset: LoadedRuleset): RulesetAnalysis` in `packages/api-grade-core/src/remediation-safety.ts` (new file) implementing Stages 1–2 of `specs/algorithms/automated_remediation_safety_algorithm_spec.md` (key-selector check, function-mechanics classification with extended AsyncAPI segment tiers, generic segment fallback, whole-document fallback, and the shared decision matrix) so T002 passes — depends on T002 +- [X] T004 Implement `getRemediationSafety(diagnostic: Diagnostic, rulesetAnalysis: RulesetAnalysis): { riskLevel, confidenceLevel, remediationSafetyLevel, staleFingerprintWarning }` in `packages/api-grade-core/src/remediation-safety.ts`, including the FR-009 lookup-miss default (`riskLevel: "high"`, `confidenceLevel: "low"`, `remediationSafetyLevel: "unsafe"`) — depends on T003 +- [X] T005 Implement `buildRemediationItem()`, `buildRemediationSafetyOutput()`, `formatRemediationSafetyHuman()` in `packages/api-grade-core/src/remediation-safety.ts`, replacing `buildQuickFix()`/`buildQuickFixOutput()`/`formatQuickFixesHuman()` — filters by `remediationSafetyLevel` against a requested level, preserving FR-007 (`safe` membership unchanged) — depends on T004 +- [X] T006 Delete `packages/api-grade-core/src/quick-fixes.ts` and `packages/api-grade-core/tests/unit/quick-fixes.test.ts`; update `packages/api-grade-core/src/index.ts` to remove the `quick-fixes.js` export line and the `QuickFix`/`ViolationClass`/`QuickFixOutput` type exports, replacing them with `analyseRuleset`, `getRemediationSafety`, `buildRemediationItem`, `buildRemediationSafetyOutput`, `formatRemediationSafetyHuman` and the new types from T001 — depends on T005 + +- [X] T006b Fix Stage 2b field-override rule in `packages/api-grade-core/src/remediation-safety.ts` (`stage1b`): for additive and existence-check functions, when `then.field` resolves 
exclusively to `SAFE_SEGMENTS` (e.g. `description`, `summary`), use the field-only tier as the effective tier instead of the combined path+field tiers — prevents parent path segments like `operations` or `parameters` from escalating the risk of adding a documentation-only field. Update `specs/algorithms/automated_remediation_safety_algorithm_spec.md` Stage 2b to document the `field_is_exclusively_safe` rule and the `classify_by_function` override. Correct the two affected bundled analysis entries: `asyncapi-3-operation-description` (was medium/medium/humanreview, now low/high/safe) and `asyncapi-parameter-description` (was high/medium/unsafe, now low/high/safe). Add three unit tests covering: (a) additive + safe field in humanreview parent, (b) additive + safe field in unsafe parent, (c) additive + humanreview field is NOT overridden — depends on T003 + +**Checkpoint**: `analyseRuleset()`/`getRemediationSafety()` exist, are unit-tested, and are exported from `@dawmatt/api-grade-core`. User story work can now begin. + +--- + +## Phase 3: User Story 1 - Developer sees a risk-graded remediation plan (Priority: P1) 🎯 MVP + +**Goal**: `--remediation-safety` (CLI) and the `grade-api-remediation-safety` MCP tool accept and filter on all three levels (`safe`/`humanreview`/`unsafe`), with `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` visible on every returned item, in both JSON and human output. + +**Independent Test**: Grade a sample spec with violations spanning all three categories; confirm `--remediation-safety safe|humanreview|unsafe` each return the expected, correctly-labeled subset, and `safe` output is unchanged from pre-feature behavior. 
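The filtering behaviour this phase targets can be sketched as follows. This is a minimal illustration, not the implementation in `remediation-safety.ts`: the type names follow T001, but the field layouts, the `Map`-based analysis shape, the helper names `lookupSafety`/`filterByLevel`, the exact-match filter, and the sample risk/confidence values are all assumptions made for the example. Only the FR-009 lookup-miss default (`high`/`low`/`unsafe`) is taken directly from T004.

```typescript
// Sketch only: type names follow T001; every field layout below is an
// assumption for illustration, not the shape defined in data-model.md.
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type RiskLevel = "low" | "medium" | "high";
type ConfidenceLevel = "high" | "medium" | "low";

interface RuleSignals {
  riskLevel: RiskLevel;
  confidenceLevel: ConfidenceLevel;
  remediationSafetyLevel: RemediationSafetyLevel;
}

// Hypothetical minimal diagnostic: only the fields this sketch needs.
interface SketchDiagnostic {
  ruleId: string;
  severity: "error" | "warn" | "info" | "hint";
  message: string;
}

// Hypothetical analysis shape: per-rule signals keyed by rule id.
type SketchRulesetAnalysis = Map<string, RuleSignals>;

// FR-009: a rule with no analyser result gets the most conservative values.
const LOOKUP_MISS_DEFAULT: RuleSignals = {
  riskLevel: "high",
  confidenceLevel: "low",
  remediationSafetyLevel: "unsafe",
};

function lookupSafety(
  diagnostic: SketchDiagnostic,
  analysis: SketchRulesetAnalysis,
): RuleSignals {
  return analysis.get(diagnostic.ruleId) ?? LOOKUP_MISS_DEFAULT;
}

// Spreads the original diagnostic first so no field is dropped (FR-022),
// then keeps only items whose level equals the requested one (assumed rule).
function filterByLevel(
  diagnostics: SketchDiagnostic[],
  analysis: SketchRulesetAnalysis,
  requested: RemediationSafetyLevel,
): Array<SketchDiagnostic & RuleSignals> {
  return diagnostics
    .map((d) => ({ ...d, ...lookupSafety(d, analysis) }))
    .filter((item) => item.remediationSafetyLevel === requested);
}

// Sample data; the risk/confidence values here are invented for the example.
const sampleAnalysis: SketchRulesetAnalysis = new Map<string, RuleSignals>([
  ["operation-description", { riskLevel: "low", confidenceLevel: "high", remediationSafetyLevel: "safe" }],
  ["operation-operationId", { riskLevel: "medium", confidenceLevel: "medium", remediationSafetyLevel: "humanreview" }],
]);

const sampleDiagnostics: SketchDiagnostic[] = [
  { ruleId: "operation-description", severity: "warn", message: "Missing description" },
  { ruleId: "operation-operationId", severity: "error", message: "Missing operationId" },
  { ruleId: "custom-unknown-rule", severity: "warn", message: "Rule absent from the analysis" },
];
```

With this data, requesting `unsafe` returns only the `custom-unknown-rule` diagnostic, because it falls through to the FR-009 default; the other two rules are returned for `safe` and `humanreview` respectively, each with its original `severity` intact.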
+ +### Tests for User Story 1 + +- [X] T007 [P] [US1] Integration test for the extended `--remediation-safety` CLI flag (accepts `safe`/`humanreview`/`unsafe`, rejects other values with the 3-value error message, `safe` membership unchanged) in `tests/integration/cli-remediation-safety.test.ts` (new file, replaces `tests/integration/cli-quick-fixes.test.ts`) +- [X] T008 [P] [US1] Integration test for the `grade-api-remediation-safety` MCP tool's extended `level` enum and `RemediationSafetyOutput` response shape (`remediationItemCount`, `remediationItems`, `requestedLevel`, per-item `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning`) in `packages/api-grade-mcp/tests/integration/remediation-safety.test.ts` (new file, replaces `packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts`) + +### Implementation for User Story 1 + +- [X] T009 [US1] Update `src/cli/index.ts`: extend the `--remediation-safety ` option to accept `safe|humanreview|unsafe`, update the rejection error message to `Error: --remediation-safety must be one of: safe, humanreview, unsafe.`, load the ruleset via `analyseRuleset()`, and call `buildRemediationSafetyOutput()`/`formatRemediationSafetyHuman()` in place of the removed `buildQuickFixOutput()`/`formatQuickFixesHuman()` (lines ~14-15, ~80, ~116, ~183-184) — depends on T006, makes T007 pass +- [X] T010 [US1] Rename `packages/api-grade-mcp/src/tools/quick-fixes-only.ts` to `packages/api-grade-mcp/src/tools/remediation-safety.ts`: rename `registerQuickFixesOnlyTool` to `registerRemediationSafetyTool`, extend the `level` Zod enum to `['safe', 'humanreview', 'unsafe']`, call `analyseRuleset()` + `buildRemediationSafetyOutput()` instead of `buildQuickFixOutput()`, and update the tool description per `contracts/remediation-safety-surfaces.md` (mention all three levels and the confidence indicator) — depends on T006, makes T008 pass +- [X] T011 [US1] Update `packages/api-grade-mcp/src/server.ts`: replace the 
`registerQuickFixesOnlyTool` import/registration (lines 8, 30) with `registerRemediationSafetyTool` from `./tools/remediation-safety.js` — depends on T010 +- [X] T012 [US1] Update `packages/api-grade-mcp/src/utils/classify.ts`: replace the `classifyViolation`/`buildQuickFix`/`QuickFix`/`ViolationClass` re-exports with `analyseRuleset`/`getRemediationSafety` and the `RuleAnalysis`/`RemediationSafetyLevel`/`RiskLevel`/`ConfidenceLevel` type re-exports from `@dawmatt/api-grade-core` — depends on T006 + +**Checkpoint**: `--remediation-safety`/`grade-api-remediation-safety` fully support all three levels end-to-end (CLI + MCP, JSON + human), with `safe` behavior unchanged (FR-007, SC-001, SC-004). + +--- + +## Phase 4: User Story 2 - Ruleset maintainer trusts the analyser via confidence + persistence (Priority: P2) + +**Goal**: The analyser's per-rule output (risk, confidence, remediation safety, rationale, `assessedBy`) is inspectable independent of grading (`ruleset-analysis` CLI subcommand, `analyse-ruleset-safety` MCP tool); classifications can be persisted (bundled default, shared colocated, personal override) and reloaded automatically, with fingerprint-staleness handling that honors human-assessed entries across rule edits. + +**Independent Test**: Run the ruleset analyser against the built-in ruleset and a custom ruleset with unrecognizable rules; confirm every rule gets risk/confidence/safety/rationale, low-confidence on unrecognized rules, and that persisting a correction is honored — including after the rule's definition changes, with a visible fingerprint-mismatch warning. 
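The fingerprint-staleness handling described in this goal (FR-014 versus FR-021) can be sketched as below. This is an illustration under stated assumptions, not the project's code: the entry and warning field names, the `Resolution` union, the helper names, and the FNV-1a hash are all hypothetical stand-ins. The hashed fields follow the spec's Rule Fingerprint entity (`given`, `then.function`, `severity`, `description`); the real hash function and field set are defined in data-model.md.

```typescript
// Sketch only: all shapes and names below are assumptions for illustration.
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type AssessmentOrigin = "human" | "automated";

interface SketchRule {
  ruleId: string;
  given: string;
  thenFunction: string;
  severity: string;
  description: string;
}

interface PersistedEntry {
  ruleId: string;
  fingerprint: string;
  assessedBy: AssessmentOrigin;
  remediationSafetyLevel: RemediationSafetyLevel;
}

interface StaleFingerprintWarning {
  ruleId: string;
  storedFingerprint: string;
  currentFingerprint: string;
}

type Resolution =
  | { kind: "use-entry"; entry: PersistedEntry; staleFingerprintWarning: StaleFingerprintWarning | null }
  | { kind: "fall-through" }; // continue to automated analysis (Stages 1-2)

// Stand-in hash (32-bit FNV-1a) over the rule's content fields. It is
// deterministic and independent of where the ruleset was loaded from.
function fingerprint(rule: SketchRule): string {
  const canonical = JSON.stringify([rule.given, rule.thenFunction, rule.severity, rule.description]);
  let hash = 0x811c9dc5;
  for (let i = 0; i < canonical.length; i++) {
    hash ^= canonical.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

function resolvePersistedEntry(
  entry: PersistedEntry | undefined,
  currentFingerprint: string,
): Resolution {
  // No persisted entry for this rule: nothing to reuse.
  if (entry === undefined) return { kind: "fall-through" };

  // Fingerprint still matches: the entry is valid regardless of its origin.
  if (entry.fingerprint === currentFingerprint) {
    return { kind: "use-entry", entry, staleFingerprintWarning: null };
  }

  // Stale automated entry: treated as not found for this rule only (FR-014).
  if (entry.assessedBy === "automated") return { kind: "fall-through" };

  // Stale human-assessed entry: still honored, with both fingerprints
  // surfaced in a warning (FR-021).
  return {
    kind: "use-entry",
    entry,
    staleFingerprintWarning: {
      ruleId: entry.ruleId,
      storedFingerprint: entry.fingerprint,
      currentFingerprint,
    },
  };
}
```

Returning a tagged union keeps the two outcomes explicit for the caller: a `fall-through` result affects only the rule being resolved, so other entries in the same shared file are unaffected, and the classification of a stale human-assessed entry is returned unchanged alongside its warning.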
+ +### Implementation for User Story 2 + +- [X] T013 [P] [US2] Implement `RuleFingerprint` computation (hash over a rule's `ruleId`/`given`/`then.function`/`severity`/`description`) in `packages/api-grade-core/src/remediation-safety.ts` per data-model.md `RuleFingerprint` — depends on T006 +- [X] T014 [P] [US2] Implement colocated `SharedRulesetAnalysis` read/write for local rulesets (deterministic filename derived from the ruleset's own path, e.g. sibling file with a fixed suffix) in `packages/api-grade-core/src/config/shared-ruleset-analysis.ts` (new file) per data-model.md `SharedRulesetAnalysis` — depends on T013 +- [X] T015 [US2] Extend `packages/api-grade-core/src/config/shared-ruleset-analysis.ts` to read (never write) the colocated `SharedRulesetAnalysis` for a GitHub-hosted ruleset by reusing `resolveRuleset`/`fetchRulesetContent` (`packages/api-grade-core/src/config/resolve-ruleset.ts`, `packages/api-grade-core/src/auth/github.ts`) with the same `AuthConfig` already supplied for the ruleset itself (FR-017, FR-019) — depends on T014 +- [X] T016 [P] [US2] Implement `PersonalRulesetAnalysisOverride` storage (workspace/global scope, same precedence as `RulesetConfig`) in `packages/api-grade-core/src/config/personal-ruleset-override.ts` (new file), reusing the `loadConfig`/`saveConfig`/`getWorkspaceConfigPath`/`getGlobalConfigPath` pattern from `packages/api-grade-core/src/config/ruleset-config.ts` — depends on T013 +- [X] T017 [US2] Author `BundledRulesetAnalysis` for the built-in OpenAPI and AsyncAPI rulesets in `packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json` and `.../asyncapi.json` (new files): migrate the former `RULE_ID_NON_BREAKING_PREFIXES`-style curated mappings (e.g. 
`operation-description` → `safe`, `operation-operationId` → `humanreview`) into `assessedBy: "human"` entries with maintainer-authored rationale, per research.md §3/§8 and FR-012/FR-020 — depends on T013 +- [X] T018 [US2] Wire Stage 0 lookup precedence into `analyseRuleset()` in `packages/api-grade-core/src/remediation-safety.ts`: workspace `PersonalRulesetAnalysisOverride` → global `PersonalRulesetAnalysisOverride` → colocated `SharedRulesetAnalysis` → `BundledRulesetAnalysis` (built-in ruleset only) → fall through to Stages 1–2; implement fingerprint-staleness handling (an `assessedBy: "automated"` entry with a stale fingerprint is treated as not found and falls through; an `assessedBy: "human"` entry with a stale fingerprint is still used, with `staleFingerprintWarning` populated) — depends on T014, T015, T016, T017 +- [X] T019 [US2] Implement persisting a correction (FR-013/FR-018/FR-019) — a function that writes an `assessedBy: "human"`, `confidenceLevel: "high"` entry to the colocated `SharedRulesetAnalysis` for a writable/local ruleset, or to the workspace-scoped `PersonalRulesetAnalysisOverride` when the ruleset's location is not writable (e.g. GitHub-hosted) — in `packages/api-grade-core/src/remediation-safety.ts` — depends on T018 +- [X] T020 [P] [US2] Update `packages/api-grade-core/src/index.ts` to export the new Stage 0/persistence symbols and types from T013-T019 (`SharedRulesetAnalysis`, `PersonalRulesetAnalysisOverride`, `BundledRulesetAnalysis`, `RuleFingerprint`, `AssessmentOrigin`, `AnalysisSource`, and the persist-correction function) — depends on T019 +- [X] T021 [P] [US2] Unit tests for Stage 0 precedence, fingerprint staleness (automated-discarded vs. 
human-honored-with-warning), and persisting a correction in `packages/api-grade-core/tests/unit/remediation-safety.test.ts` (extends T002's file) covering SC-006, SC-008, SC-009 — depends on T020 +- [X] T022 [P] [US2] New CLI subcommand `ruleset-analysis [--ruleset-path ] [--format json|human]` in `src/cli/ruleset-analysis-cli.ts` (new file, mirrors `src/cli/ruleset-config-cli.ts`), registered in `src/cli/index.ts`; `--format human` prints rule id, risk level, confidence level, remediation safety level, assessed by, rationale, and any fingerprint-mismatch warning per quickstart.md §2 — depends on T020 +- [X] T023 [P] [US2] Add a `correct` action to `src/cli/ruleset-analysis-cli.ts` for persisting a correction (FR-013), e.g. `api-grade ruleset-analysis correct --rule-id --level [--ruleset-path ]`, calling the persist function from T019 — depends on T019, T022 +- [X] T024 [P] [US2] New MCP tool `analyse-ruleset-safety` in `packages/api-grade-mcp/src/tools/analyse-ruleset-safety.ts` (new file, mirrors `packages/api-grade-mcp/src/tools/get-ruleset-config.ts`), input `{ rulesetPath?: string, recoveryOption?: ... 
}`, output a `RulesetAnalysis` JSON document, reusing the `resolveRuleset`/`RulesetAuthError`/`mcpError` flow already used by `grade-api-remediation-safety`; register it in `packages/api-grade-mcp/src/server.ts` — depends on T020 +- [X] T025 [P] [US2] Integration test for the `ruleset-analysis` CLI subcommand (human + json format, fingerprint-mismatch warning display, `correct` action) in `tests/integration/cli-remediation-safety.test.ts` — depends on T022, T023 +- [X] T026 [P] [US2] Integration test for the `analyse-ruleset-safety` MCP tool in `packages/api-grade-mcp/tests/integration/analyse-ruleset-safety.test.ts` (new file) — depends on T024 +- [X] T027 [US2] Verify `staleFingerprintWarning` is threaded through `RemediationItem`/`RemediationSafetyOutput` (built in T004/T005) into the CLI (`--remediation-safety`, T009) and MCP (`grade-api-remediation-safety`, T010) human + JSON output, satisfying FR-021/SC-009 at the per-violation surface, not just the ruleset-analysis surface — depends on T018, T009, T010 + +**Checkpoint**: `ruleset-analysis`/`analyse-ruleset-safety` expose full per-rule analysis with confidence and provenance; persisted corrections (shared, personal, bundled) are loaded automatically and survive rule edits when human-assessed (FR-011 through FR-021, SC-002, SC-005 through SC-009). + +--- + +## Phase 5: User Story 3 - "Quick fixes" terminology fully removed (Priority: P3) + +**Goal**: No source, test, type/function/tool name, package metadata, or current documentation references "quick fix" in any casing/separator style (historical `CHANGELOG.md`/`GOAL.md` entries excluded). + +**Independent Test**: `grep -rniE "quick.?fix"` across the repository (excluding historical changelog/goal entries) returns zero matches. 
+ +### Implementation for User Story 3 + +- [X] T028 [P] [US3] Update `docs/cli/commands.md`: document the 3-level `--remediation-safety` reference and the new `ruleset-analysis` subcommand +- [X] T029 [P] [US3] Update `docs/mcp/quick-start.md`: document the renamed/extended `grade-api-remediation-safety` tool and the new `analyse-ruleset-safety` tool +- [X] T030 [P] [US3] Update `docs/package/api-grade-mcp.md`: tool reference updates for both tools above +- [X] T031 [P] [US3] Update `docs/package/README.md`: remove remaining "quick fix" mentions +- [X] T032 [P] [US3] Update `docs/package/api-reference.md`: document the new core API (`analyseRuleset`, `getRemediationSafety`, `RuleAnalysis`, `RulesetAnalysis`, `RemediationItem`, `RemediationSafetyOutput`, etc.) in place of the removed `QuickFix`/`QuickFixOutput`/`classifyViolation` +- [X] T033 [P] [US3] Update `docs/index.md`: remove remaining "quick fix" mentions +- [X] T034 [P] [US3] Update `docs/getting-started.md`: update the tool list mention +- [X] T035 [P] [US3] Update `packages/api-grade-mcp/README.md`: tool table update (`grade-api-remediation-safety`, `analyse-ruleset-safety`) +- [X] T036 [P] [US3] Update `CONTRIBUTING.md`: correct the package/tool table entry that still names the pre-Feature-11 tool +- [X] T037 [US3] Run `grep -rniE "quick.?fix" --include="*.ts" --include="*.md" src/ packages/api-grade-core/src packages/api-grade-mcp/src packages/api-grade-core/tests packages/api-grade-mcp/tests tests/ docs/ packages/api-grade-mcp/README.md CONTRIBUTING.md` (per quickstart.md §4) and fix any remaining matches until it returns zero (SC-003) — depends on T009-T036 + +**Checkpoint**: SC-003 satisfied — zero "quick fix" references remain anywhere in current source, tests, or documentation. + +--- + +## Phase 6: Polish & Cross-Cutting Concerns + +**Purpose**: Final validation across all three stories. 
+ +- [X] T038 [P] Add a new `CHANGELOG.md` entry for this feature (do not modify historical entries) +- [X] T039 Run `vitest run` across all workspaces, `tsc --noEmit`, and lint; fix any failures +- [X] T040 Manually walk through `quickstart.md` end-to-end (all 4 sections) against a real local ruleset and a GitHub-hosted ruleset to confirm SC-001 through SC-009 +- [ ] T057 Verify `analyse-ruleset-safety` and the extended `grade-api-remediation-safety` function correctly when invoked from **Claude Code** (CLI) and **GitHub Copilot VS Code Agent mode** — the two primary MCP targets required by the constitution's AI Integration Requirements. At minimum: call `analyse-ruleset-safety` (no args, built-in ruleset) and `grade-api-remediation-safety` (`level: "humanreview"`) from each client and confirm a valid response is returned with no tool-call errors. Record pass/fail per client in a brief comment on this task before marking `[X]`. + - **Claude Code** ✅ PASS (2026-06-27) — `analyse-ruleset-safety` returned full `RulesetAnalysis` (57 rules, all with `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`assessedBy`/`rationale`/`staleFingerprintWarning`). `grade-api-remediation-safety` (`level: "humanreview"`, `poor-quality.yaml`) returned 2 items each with `severity`, `range`, and all safety fields populated correctly. No tool-call errors. + - **GitHub Copilot VS Code Agent mode** — ⏳ PENDING manual verification by user. 
+ +--- + +## Phase 7: Output-shape regression fix (post-T040) + +**Purpose**: T009's initial `--remediation-safety` implementation regressed the output shape +relative to the regular `--format json`/`--format human` diagnostics: every `RemediationItem` +reported `severity: "warn"` regardless of actual severity (a stale numeric-severity assumption +left over from before `Diagnostic.severity` became a string enum), `range` was dropped +entirely, and the safety JSON/`ruleset-analysis` JSON output regressed to compact (non-pretty) +formatting, unlike the main `formatJson()` output. Separately, regular (unfiltered) grading +output never surfaced the per-violation safety signals this feature computes, so a user had to +make a second, `--remediation-safety`-filtered request just to see how risky a finding was to +fix. See `data-model.md` "Output formatting contract (all surfaces)" and "DiagnosticWithSafety" +sections for the corrected contract. + +- [X] T041 Fix `buildRemediationItem()` in `packages/api-grade-core/src/remediation-safety.ts`: assign `severity: diagnostic.severity` directly (remove the `SEVERITY_LABELS`/numeric-severity lookup) and add `range: diagnostic.range` to the returned `RemediationItem` — depends on T005 +- [X] T042 Add `range: Diagnostic['range']` to the `RemediationItem` type (`packages/api-grade-core/src/types.ts`) and render it (`Line N`) in `formatRemediationSafetyHuman()` — depends on T041 +- [X] T043 Pretty-print every CLI-printed JSON document with `JSON.stringify(value, null, 2)`: `buildRemediationSafetyOutput()`/`ruleset-analysis`/`ruleset-analysis correct` output in `src/cli/index.ts` and `src/cli/ruleset-analysis-cli.ts` (and, for consistency across all CLI JSON output, `src/cli/ruleset-config-cli.ts`'s `config`/`set-ruleset`/`get-ruleset` JSON payloads) — MCP tool JSON responses are explicitly exempt (kept compact for AI-agent token efficiency) — depends on T041 +- [X] T044 Add `DiagnosticWithSafety` to 
`packages/api-grade-core/src/types.ts` (extends `Diagnostic` with `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning`); extend `buildCommonGradeOutput()` (`json-output.ts`), `formatJson()`, and `formatHuman()` (`formatter.ts`) to accept an optional `rulesetAnalysis` and, when supplied, decorate each diagnostic via `getRemediationSafety()` — depends on T004 +- [X] T045 Wire `src/cli/index.ts`'s default (non-`--remediation-safety`) `--format json`/`--format human` path to always compute `rulesetAnalysis` (via the same `loadRuleset()`/`analyseRuleset()` call already made for `--remediation-safety`) and pass it to `formatJson()`/`formatHuman()`, so regular grading output always includes per-violation safety info — depends on T044 +- [X] T046 Update `tests/integration/cli-json-output.test.ts`'s `--min-grade --format json` test to parse multiple back-to-back pretty-printed JSON documents from stdout (brace-depth splitting) instead of assuming one compact JSON object per line — depends on T043 +- [X] T047 [P] Update docs to match: `docs/cli/commands.md` (pretty-print note, `range`/safety fields in both JSON Output Schema and Remediation Safety examples), `docs/package/api-reference.md` (`formatJson`/`formatHuman`/`buildCommonGradeOutput` signatures, `DiagnosticWithSafety`, `RemediationItem.range`/`severity` correction), `docs/package/api-grade-mcp.md` (`grade-api-remediation-safety` description mentions `severity`/`range`), `data-model.md` (this file) — depends on T041-T045 + +**Checkpoint**: `vitest run` (all workspaces) and `tsc --noEmit` pass; a manual CLI run confirms `severity` reflects true diagnostic severity, `range`/line numbers appear in `--remediation-safety` output, all CLI JSON is pretty-printed, and regular (non-filtered) `--format json`/`--format human` output includes per-diagnostic `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning`. 
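The brace-depth splitting that T046 refers to could look roughly like the sketch below. This is an illustration only, not the test's actual helper: `splitJsonDocuments` is a hypothetical name, the sample field names are made up, and the sketch assumes every top-level document on stdout is a JSON object.

```typescript
// Hypothetical helper (not from the repository): splits stdout that contains
// several back-to-back pretty-printed JSON objects into one string per document.
function splitJsonDocuments(stdout: string): string[] {
  const documents: string[] = [];
  let depth = 0; // current nesting depth of { }
  let start = -1; // index where the current top-level document began
  let inString = false; // true while inside a JSON string literal
  let escaped = false; // true when the previous character was a backslash in a string

  for (let i = 0; i < stdout.length; i++) {
    const ch = stdout[i];
    if (inString) {
      // Braces inside string values must not affect the depth counter.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0) {
        documents.push(stdout.slice(start, i + 1));
        start = -1;
      }
    }
  }
  return documents;
}

// Illustrative input: two pretty-printed documents, one with a brace inside a string.
const sample =
  JSON.stringify({ grade: "B", note: "has } brace" }, null, 2) +
  "\n" +
  JSON.stringify({ passed: true }, null, 2) +
  "\n";
const parsed = splitJsonDocuments(sample).map((doc) => JSON.parse(doc));
```

Tracking string state matters here because diagnostic messages can legitimately contain `{` or `}` characters, which a naive depth counter would miscount.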
+ +--- + +## Phase 8: Heuristic correctness — `then.field "@key"` on paths/channels (post-Phase 7) + +**Purpose**: Stage 2a of the heuristic only recognised the JSONPath `~` key-selector form (e.g. +`$.channels[*]~`) as targeting path/channel keys. Spectral's built-in rulesets also express the +same semantics via `then.field: "@key"` on `given: "$.channels"` or `given: "$.paths"` (the +function-based equivalent). Without this check, those rules were falling into Stage 1b's +`pattern`/`casing` default logic and receiving `medium/high` risk (humanreview) instead of the +correct `high/high` (unsafe) — e.g. `asyncapi-channel-no-empty-parameter` and its siblings, and +the OpenAPI `path-keys-no-trailing-slash` / `path-not-include-query` / +`path-declarations-must-exist` rules. + +- [X] T048 Extend Stage 1a in `packages/api-grade-core/src/remediation-safety.ts`: add `fieldNamesOf()` helper (raw `then.field` strings, not tokenized), pass field names to `stage1a()`, and check for `then.field: "@key"` on a `given` that tokenizes to include `"paths"` or `"channels"` — returning `high/high/unsafe` with the key-selector-equivalent rationale; update `classifyRuleStages1And2()` to pass `fieldNames` — depends on T006 +- [X] T049 [P] Add three unit tests to `packages/api-grade-core/tests/unit/remediation-safety.test.ts` in the Stage 1a describe block: `$.channels` + `field: "@key"` → `high/high/unsafe`; `$.paths` + `field: "@key"` → `high/high/unsafe`; `$.components.schemas` + `field: "@key"` → still `medium` (control, no paths/channels) — depends on T048 +- [X] T050 [P] Update `specs/algorithms/automated_remediation_safety_algorithm_spec.md` Stage 2a: document both the `~` and the `@key` checks, add rationale for why `@key` carries identical risk in AsyncAPI 2.x (channel key IS the routing address) and OpenAPI (path key IS the route); update the Example table to include `asyncapi-channel-no-empty-parameter` — depends on T048 +- [X] T051 Rebuild `packages/api-grade-core`
(`npm run build`) and regenerate the bundled analysis (`node scripts/generate-bundled-analysis.mjs`): 6 AsyncAPI 2.x channel rules and 3 OpenAPI path-key rules are upgraded from `medium/high/humanreview` to `high/high/unsafe` in `src/rulesets/bundled-analysis/{asyncapi,openapi}.json` — depends on T048, T050 + +**Checkpoint**: `vitest run` (all workspaces) passes; bundled analysis reflects corrected `@key` classifications for all 9 affected rules. + +--- + +## Phase 9: Heuristic correctness — `pattern` `notMatch`-only is existence check, not rename (post-Phase 8) + +**Purpose**: Stage 1b classified all `pattern` uses as "rename/reformat", defaulting to `medium` risk. But `pattern` with `notMatch`-only in `functionOptions` is semantically an existence/validity check (closer to `falsy`/`truthy`) — the fix adds content, not reformats it. This produced accurate risk levels for the built-in rulesets (target tiers dominate) but incorrect rationale text ("rename/reformat" for emptiness checks like `notMatch: '{}'`). It also mis-classified custom `pattern`+`notMatch` rules on SAFE_SEGMENTS targets as `medium` (rename default) instead of `low` (additive). 
+ +- [X] T052 Add `functionOptions` field to `SpectralThen` interface in `packages/api-grade-core/src/remediation-safety.ts`; add `isPatternExistenceCheck()` helper (true when any `then.function: "pattern"` has `notMatch` in `functionOptions` and no `match`) — depends on T048 +- [X] T053 Update `stage1b()` in `packages/api-grade-core/src/remediation-safety.ts`: add `patternIsExistenceCheck` parameter; for `pattern` when that flag is set, apply additive-style tier escalation with a conservative `medium` fallback on empty tiers (unknown target); update `classifyRuleStages1And2()` to compute and pass the flag — depends on T052 +- [X] T054 [P] Add five unit tests to `packages/api-grade-core/tests/unit/remediation-safety.test.ts` in the Stage 1b describe block: `pattern`+`match` → rename rationale; `pattern`+`notMatch` on unsafe segment → high/unsafe with existence-check rationale; `pattern`+`notMatch` on safe segment → low/safe; `pattern`+`notMatch` on unknown target → conservative medium; `pattern`+both `match`+`notMatch` → rename rationale — depends on T053 +- [X] T055 [P] Update `specs/algorithms/automated_remediation_safety_algorithm_spec.md` Stage 2: add Stage 2a(ii) documenting the `pattern` function-mode distinction (`notMatch`-only vs `match`/no-options) with rationale; no risk-level changes in built-in rulesets (tier lookup dominates), but rationale text and custom-rule handling are corrected — depends on T053 +- [X] T056 Rebuild `packages/api-grade-core` and regenerate bundled analysis: rationale text updated for all `notMatch`-only `pattern` rules (no risk-level changes); risk levels confirmed stable via test suite — depends on T053, T055 + +**Checkpoint**: `vitest run` (all workspaces) passes (376 tests); bundled analysis shows "existence/validity check" rationale for `notMatch`-only `pattern` rules; risk levels unchanged. 
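The `isPatternExistenceCheck()` helper described in T052 could be sketched as follows. The `SpectralThen` shape and the helper's signature are assumptions made for illustration; the actual definitions in `remediation-safety.ts` may differ.

```typescript
// Assumed minimal shape; the real SpectralThen interface may carry more fields.
interface SpectralThen {
  function: string;
  field?: string;
  functionOptions?: Record<string, unknown>;
}

// True when any `pattern` entry has `notMatch` and no `match` in its options,
// i.e. the rule rejects a bad value rather than enforcing a required format.
function isPatternExistenceCheck(then: SpectralThen | SpectralThen[]): boolean {
  const entries = Array.isArray(then) ? then : [then];
  return entries.some((entry) => {
    if (entry.function !== "pattern") return false;
    const options = entry.functionOptions ?? {};
    return "notMatch" in options && !("match" in options);
  });
}

// Illustrative rules (option values are examples, not taken from a real ruleset).
const emptinessCheck: SpectralThen = { function: "pattern", functionOptions: { notMatch: "{}" } };
const formatCheck: SpectralThen = { function: "pattern", functionOptions: { match: "^[a-z]+$" } };
const bothOptions: SpectralThen = {
  function: "pattern",
  functionOptions: { match: "^[a-z]+$", notMatch: "tmp" },
};
const noOptions: SpectralThen = { function: "pattern" };
```

Under this sketch only `emptinessCheck` counts as an existence check; a rule with `match`, with both options, or with no options keeps the rename/reformat treatment, in line with the cases listed in T054.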
+ +--- + +## Dependencies & Execution Order + +### Phase Dependencies + +- **Foundational (Phase 2)**: No dependencies — start immediately. BLOCKS all user stories. +- **User Story 1 (Phase 3)**: Depends on Foundational completion only. +- **User Story 2 (Phase 4)**: Depends on Foundational completion only — testable independently of US1, though T027 also touches US1's CLI/MCP surfaces to thread `staleFingerprintWarning` through. +- **User Story 3 (Phase 5)**: Depends on US1 and US2 implementation tasks being complete (T009–T027) so the docs/grep sweep has nothing left to rename. +- **Polish (Phase 6)**: Depends on all prior phases. + +### Within Each Phase + +- Tests before the implementation tasks they validate (T002 before T003; T007/T008 before T009-T012). T021 is the exception: it is written after T013-T020 because it covers their integration. +- Types (T001) before engine (T003) before lookup (T004) before output builders (T005) before deletion/export cleanup (T006). + +### Parallel Opportunities + +- T002 has no code dependency on other Phase 2 tasks besides T001, but is sequenced before T003 (TDD). +- Within US1: T007 and T008 in parallel; T009-T012 are mostly sequential (T011 depends on T010; T012 is independent of T009-T011 and can run in parallel with them). +- Within US2: T013, T016 in parallel; T014→T015 sequential; T017 parallel with T013-T016; T020-T026 can run in parallel only where marked [P] in Phase 4. +- All of US3 (T028-T036) can run in parallel — different files; T037 must run last.
+ +--- + +## Parallel Example: Foundational Phase + +```bash +# T002 depends on T001 only: +Task: "Write failing unit tests for analyseRuleset() in packages/api-grade-core/tests/unit/remediation-safety.test.ts" +``` + +## Parallel Example: User Story 3 + +```bash +# All documentation files are independent — launch together: +Task: "Update docs/cli/commands.md" +Task: "Update docs/mcp/quick-start.md" +Task: "Update docs/package/api-grade-mcp.md" +Task: "Update docs/package/README.md" +Task: "Update docs/package/api-reference.md" +Task: "Update docs/index.md" +Task: "Update docs/getting-started.md" +Task: "Update packages/api-grade-mcp/README.md" +Task: "Update CONTRIBUTING.md" +``` + +--- + +## Implementation Strategy + +### MVP First (User Story 1 Only) + +1. Complete Phase 2: Foundational (analyser engine, types, rename). +2. Complete Phase 3: User Story 1 — three-level filtering in CLI + MCP. +3. **STOP and VALIDATE**: Run `tests/integration/cli-remediation-safety.test.ts` and `packages/api-grade-mcp/tests/integration/remediation-safety.test.ts`; confirm `safe` output is byte-for-byte unchanged from pre-feature behavior (SC-004). +4. This is a usable, demoable increment: developers can already triage by all three levels, even before confidence/persistence/inspection (US2) or the terminology cleanup (US3) land. + +### Incremental Delivery + +1. Foundational → US1 (MVP, three-level filtering) → US2 (confidence, inspection, persistence) → US3 (terminology cleanup) → Polish. +2. US2 can be developed in parallel with US1 by a second contributor once Foundational is done, since both consume but don't modify each other's surfaces — except T027, which touches US1's files and must land after both T009/T010 (US1) and T018 (US2) exist. +3. US3 is intentionally last: it depends on the renamed/new surfaces from US1/US2 actually existing before the documentation and grep sweep can be final. 
diff --git a/specs/algorithms/automated_remediation_safety_algorithm_spec.md b/specs/algorithms/automated_remediation_safety_algorithm_spec.md new file mode 100644 index 0000000..c024702 --- /dev/null +++ b/specs/algorithms/automated_remediation_safety_algorithm_spec.md @@ -0,0 +1,357 @@ +# Automated Remediation Safety Algorithm Specification + +**Version:** 1.0.0 | **Scope:** Spectral-compatible rulesets (OpenAPI 3.0+, AsyncAPI 3.0+) + +--- + +## Overview + +Determines, for every **rule** in a loaded ruleset, how risky it would be to automatically apply a fix for any violation of that rule (`riskLevel`), and how confident that determination is (`confidenceLevel`). Runs once per loaded ruleset (the "ruleset analyser"), independent of grading any specific API spec. A violation's remediation safety is then a cached lookup against this per-rule result, not a fresh computation. + +This algorithm supersedes the two-class `classifyViolation()` algorithm described in [`quick_fixes_algorithm_spec.md`](./quick_fixes_algorithm_spec.md), extending it from a binary `nonBreaking`/`breaking` split (with `unknown` as an exclusion bucket) to three first-class risk levels with an explicit confidence dimension. It consumes rule **metadata** (`ruleId`, the rule's `given` JSONPath expression(s), `then.function`) from a loaded Spectral ruleset object — it does not consume `Diagnostic[]` directly; diagnostics are matched to their rule's pre-computed result by `ruleId` when remediation safety is needed for a grading run. + +Before running automated classification, the analyser first checks, per rule, for a previously persisted or pre-calculated analysis (Stage 0): a baseline bundled for the built-in ruleset, a **Shared Ruleset Analysis** colocated with the ruleset itself so a team shares one set of judgements automatically, and a personal override layer for individual corrections. 
This means the same ruleset is not re-estimated and re-reviewed from scratch on every run, or by every person who uses it. + +--- + +## Risk Levels + +A rule is classified into exactly one of three risk levels: + +- **`safe`** — fixing violations of this rule only adds or corrects descriptive metadata. No client, server, or contract test validates against it. Safe to apply automatically, including by an AI agent acting without per-change human review. +- **`humanreview`** — fixing violations of this rule is typically additive or clarifying (e.g. adding a missing `operationId`, declaring a security requirement, adjusting an `enum`/`default`), but could plausibly change generated-client behavior, routing, or validation in ways a human should confirm before applying at scale. +- **`unsafe`** — fixing violations of this rule could change request/response validation, required fields, types, or the parameter surface, or the rule's risk could not be determined with any confidence. Requires human (or explicitly-confirmed agent) review before applying. + +**Design principle (inherited from the quick-fixes algorithm):** classification is positive-evidence-only for `safe`, and conservative-by-default everywhere else. A rule becomes `safe` only when a specific signal says it's safe. A rule with no signal, or with signals spanning multiple tiers, is never assumed safe — it falls to the more cautious level. + +## Confidence Levels + +Each rule's risk level carries a confidence level: + +- **`high`** — the rule id matched a curated table (Stage 1), or the rule targets `paths`/`channels` keys directly (Stage 2a key-selector check). +- **`medium`** — the rule id was unrecognized, but the rule's `given` JSONPath unambiguously matched exactly one risk tier's segment set (Stage 2). +- **`low`** — either no recognizable signal at all (Stage 3 fallback to `unsafe`), or the `given` path matched segments from **more than one** tier (Stage 2 ambiguity, downgraded from `medium`).
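The conservative-by-default principle and the `medium`/`low` confidence rules above can be sketched as a small resolution function. This is a hedged illustration rather than the implementation: `resolveHeuristicTiers` and `CAUTION_ORDER` are hypothetical names, and the input is assumed to be the list of tiers whose segment sets matched a rule's `given` path.

```typescript
type RiskLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low";

// Ordered from least to most cautious (assumed ordering, per the level descriptions).
const CAUTION_ORDER: RiskLevel[] = ["safe", "humanreview", "unsafe"];

// Hypothetical helper: turns the tiers matched by the path heuristic into a result.
function resolveHeuristicTiers(
  matchedTiers: RiskLevel[],
): { riskLevel: RiskLevel; confidenceLevel: ConfidenceLevel } {
  const distinct = Array.from(new Set(matchedTiers));

  // No recognizable signal: fall back to the most cautious level with low confidence.
  if (distinct.length === 0) {
    return { riskLevel: "unsafe", confidenceLevel: "low" };
  }
  // Exactly one tier matched: unambiguous, medium confidence.
  if (distinct.length === 1) {
    return { riskLevel: distinct[0], confidenceLevel: "medium" };
  }
  // Several tiers matched: take the more cautious level and downgrade confidence.
  const mostCautious = distinct.reduce((a, b) =>
    CAUTION_ORDER.indexOf(b) > CAUTION_ORDER.indexOf(a) ? b : a,
  );
  return { riskLevel: mostCautious, confidenceLevel: "low" };
}

const noSignal = resolveHeuristicTiers([]);
const singleTier = resolveHeuristicTiers(["safe", "safe"]);
const mixedTiers = resolveHeuristicTiers(["safe", "humanreview"]);
```

In this sketch a rule is only ever reported as `safe` when every matched signal says so; `high` confidence is deliberately never produced here, since it is reserved for the earlier, more authoritative checks.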
+ +--- + +## Input & Output + +**Input:** a loaded Spectral ruleset object (`LoadedRuleset.ruleset` from `packages/api-grade-core/src/rulesets/loader.ts`), specifically its `rules` map: `{ [ruleId]: { given: string | string[], then: { function: string }, severity, description, recommended } }`, plus the ruleset's resolved location (local path or remote/GitHub URL). A per-rule **Fingerprint** (see Stage 0) is derived from each rule's own content and used to look up any persisted or bundled pre-calculated analysis for that rule. + +**Output:** `analyseRuleset(ruleset) -> RulesetAnalysis`: +- `rulesetSource: 'default' | 'custom'`, `rulesetPath?: string` — mirrors the input `LoadedRuleset`. +- `rules: RuleAnalysis[]` — exactly one entry per rule key in the input ruleset (no omissions — see Implementation Notes). + +Each `RuleAnalysis`: `{ ruleId, riskLevel, confidenceLevel, rationale, source }`, where `source` is one of `'persisted'` (Stage 0a/0b), `'bundled-default'` (Stage 0c), `'curated'` (Stage 1), `'heuristic'` (Stage 2), or `'fallback'` (Stage 3) — see Data Model for the full enum. + +A second function, `getRemediationSafety(diagnostic, rulesetAnalysis) -> { riskLevel, confidenceLevel }`, performs the per-violation lookup at grading time (see Stage 5). + +A third function, `persistRuleAnalysisOverride(ruleset, ruleId, riskLevel, scope)`, writes a user correction for one rule into either the colocated Shared Ruleset Analysis (`scope: 'shared'`) or the personal-override store (`scope: 'personal'`, `workspace` | `global`), to be picked up by Stage 0 on future runs (see Stage 4 for the write-target rules, including the remote/GitHub-hosted fallback). + +--- + +## Stage 0: Persisted / Pre-Calculated Lookup + +Runs **before** Stage 1, once per loaded ruleset (not per rule). For each rule, computes a **Rule Fingerprint** — a hash over that rule's own content: `hash(ruleId + '|' + given + '|' + then.function + '|' + severity + '|' + description)`. 
Fingerprinting is per-rule, not a single whole-ruleset hash, so that editing one rule invalidates only that rule's persisted entry, not every entry in a shared analysis file that a team may have spent time curating (FR-014). + +Checked in order; the first hit for a given `ruleId` (with a matching Fingerprint) is used, and per-`ruleId` lookup continues independently — a ruleset's overall analysis is typically assembled from a mix of sources: + +``` +0a. Personal Ruleset Analysis Override (workspace-scoped, then global-scoped) for this ruleId+fingerprint, if present +0b. Shared Ruleset Analysis colocated with the ruleset, for this ruleId+fingerprint, if present +0c. bundled pre-calculated analysis, ONLY if this is the built-in ruleset +``` + +For each `ruleId` covered by a hit: `RETURN { riskLevel, confidenceLevel, rationale, source: 'persisted' | 'bundled-default' }` using the stored values as-is (a `persisted` entry's `confidenceLevel` is whatever was stored when the correction was made — typically `high`, since a human confirmed it). + +Any `ruleId` **not** covered by Stage 0 — no hit in 0a/0b/0c, or a hit exists but its stored Fingerprint no longer matches the rule's current definition (the rule was edited since the entry was captured) — falls through to Stage 1. This is the same "lookup miss → keep going" behavior as the per-violation lookup in Stage 5; Stage 0 never blocks or fails the analysis, it only short-circuits the rules it has valid prior knowledge of (FR-012, FR-014, FR-015). + +**Shared Ruleset Analysis location (0b) — colocated via naming convention (FR-016/FR-017):** derived deterministically from the ruleset's own path/URL (e.g. appending a fixed suffix to the ruleset's filename), so presence is a direct lookup at that location, not a separate index. 
For a local ruleset this is a sibling file on disk; for a GitHub-hosted ruleset this is a sibling file fetched via the same resolution/auth flow already used to fetch the ruleset (`resolveRuleset`/`fetchRulesetContent`). Anyone who can read the ruleset can read this file, so a team sharing a ruleset automatically shares its analysis (SC-008) with no per-user configuration step. + +**Personal Ruleset Analysis Override (0a):** checked first because it represents the most specific, most recently expressed intent — a user actively disagreeing with or supplementing the shared analysis for themselves, without writing to the shared file (FR-018). Stored using the existing workspace/global config-file scope (`packages/api-grade-core/src/config/ruleset-config.ts` pattern), repurposed for this narrower role. + +**Bundled pre-calculated analysis for the built-in ruleset (0c):** shipped with the package (generated by running Stages 1–3 once over the built-in ruleset at release time and committing the result), so `ruleset-analysis`/`analyse-ruleset-safety` against the built-in ruleset never requires per-rule computation at request time (SC-007), and so the built-in ruleset itself satisfies the "at a minimum the default ruleset" baseline the clarification document calls for. + +**Rationale:** directly required by `clarification-algorithm.md`'s "Recommended High Level Approach" (steps 1 and 4) and by the project's own stated goal of letting an organisation share one set of judgements rather than each person separately configuring their own copy. Colocation (rather than a per-user store as the primary mechanism) is what makes that sharing automatic; per-rule fingerprinting (rather than a whole-ruleset hash) is what keeps a shared file useful as the ruleset evolves incrementally instead of being invalidated wholesale by any single edit. + +--- + +## Stage 1: Curated Rule-ID Tables + +Checked first among the automated classification stages, per rule, for any rule that Stage 0 did not resolve. Short-circuits the rest of classification when it matches.
Three disjoint, curated tables (extending the single `RULE_ID_NON_BREAKING_PREFIXES` table from the quick-fixes algorithm into three tiers): + +``` +SAFE_RULE_ID_PREFIXES = [ + "operation-description", "operation-summary", + "info-contact", "info-description", "info-license", + "oas3-examples-", "tag-description" +] + +HUMANREVIEW_RULE_ID_PREFIXES = [ + "operation-operationId", "operation-2xx-response", + "oas3-server-not-example-com", "oas3-server-trailing-slash", + "operation-security-defined" +] + +UNSAFE_RULE_ID_PREFIXES = [ + "oas3-schema", "oas3-valid-schema-example" +] + +FOR EACH rule IN ruleset.rules: + FOR EACH prefix IN SAFE_RULE_ID_PREFIXES: + IF rule.id.startsWith(prefix): RETURN { riskLevel: "safe", confidenceLevel: "high", rationale: "rule id matched curated safe-prefix table", source: "curated" } + FOR EACH prefix IN HUMANREVIEW_RULE_ID_PREFIXES: + IF rule.id.startsWith(prefix): RETURN { riskLevel: "humanreview", confidenceLevel: "high", rationale: "rule id matched curated humanreview-prefix table", source: "curated" } + FOR EACH prefix IN UNSAFE_RULE_ID_PREFIXES: + IF rule.id.startsWith(prefix): RETURN { riskLevel: "unsafe", confidenceLevel: "high", rationale: "rule id matched curated unsafe-prefix table", source: "curated" } +``` + +**Rationale:** identical justification to the quick-fixes algorithm's Stage 1 — these rule IDs are curated from the built-in rulesets and the curators have direct knowledge of what each rule actually validates, making the rule ID itself an authoritative signal that outranks any generic path heuristic. + +**Maintenance note:** these tables are expected to grow as the project encounters new well-known rules (including from popular custom/community rulesets). Adding an entry is a config-only change, not an algorithm change. + +--- + +## Stage 2: Path-Segment Heuristic on the Rule's `given` + +Runs only when Stage 1 doesn't match. 
Two checks, in order: a structural **key-selector check** (2a), then the **segment-membership heuristic** (2b). Both operate on the rule's `given` JSONPath expression(s) — the schema location(s) the rule applies to — and are format-aware, covering both OpenAPI's and AsyncAPI's contract-surface ontologies per `clarification-algorithm.md`'s "Build a format-aware contract-surface ontology" guidance. + +### Stage 2a: Key-Selector Check + +Catches rules that target the *key* of a `paths` or `channels` collection — any such rule cannot be satisfied without renaming a public route or channel. Two equivalent spellings exist in Spectral: the JSONPath `~` key-selector on the `given` expression, and `then.field: "@key"` on a `given` that targets the collection itself. Both are checked here; neither appears in the segment-membership set in 2b. + +``` +IS_KEY_SELECTOR(given) = given ends with the JSONPath "~" modifier + (e.g. "$.paths[*]~", "$.channels[*]~") + +IS_KEY_FIELD(rule) = rule.then.field == "@key" + (Spectral's function-based equivalent of the "~" key-selector, + used e.g. by AsyncAPI 2.x channel rules such as asyncapi-channel-no-empty-parameter) + +FOR EACH given_expr IN rule.given: + IF IS_KEY_SELECTOR(given_expr) AND given_expr contains "paths" or "channels": + RETURN { riskLevel: "high", confidenceLevel: "high", + rationale: "given path selects path/channel object keys directly — any satisfying edit renames a public path or channel", + source: "heuristic" } + +IF IS_KEY_FIELD(rule) AND (rule.given tokens include "paths" or "channels"): + RETURN { riskLevel: "high", confidenceLevel: "high", + rationale: "then.field \"@key\" on paths/channels collection — equivalent to a path/channel key-selector; any satisfying edit renames a public path or channel", + source: "heuristic" } +``` + +**Rationale for `~` check:** a rule targeting the *keys* of `paths`/`channels` (e.g. 
a kebab-case naming convention) cannot be satisfied without renaming a real, public path or channel — by construction the riskiest, highest-confidence case the heuristic can recognize. + +**Rationale for `@key` check:** Spectral's built-in rulesets often use `given: "$.paths"` or `given: "$.channels"` with `then.field: "@key"` rather than `given: "$.paths[*]~"` — these are semantically identical (both select the collection key), but the `~` check above would miss them. In AsyncAPI 2.x the channel key *is* the routing address; in OpenAPI the path key *is* the route. Both target the same renaming risk. `paths` and `channels` are deliberately *not* included as bare segments in 2b, since most rules with those tokens in their `given` reach into the collection's content (e.g. `operation-description` → `$.paths[*][*].description`) and must not be over-classified as unsafe. + +### Stage 2a(ii): `pattern` Function-Mode Distinction + +Before applying the rename/reformat classification to a `pattern` function, the implementation checks `then.functionOptions` to distinguish two semantically different uses of `pattern`: + +``` +IS_EXISTENCE_CHECK(rule) = + rule.then.function == "pattern" + AND "notMatch" in rule.then.functionOptions + AND "match" NOT in rule.then.functionOptions +``` + +- **`notMatch`-only** (`IS_EXISTENCE_CHECK` = true): the rule asserts that the field does NOT contain a bad value (empty object `{}`, trailing slash, `example.com`, `