From a23f2d09c324f17da3a59acfa42c850c1704cf81 Mon Sep 17 00:00:00 2001
From: DawMatt
Date: Tue, 23 Jun 2026 22:11:44 +1000
Subject: [PATCH 01/22] Initial remediation safety plan

---
 .specify/feature.json                         |   2 +-
 .../checklists/requirements.md                |  34 +++
 .../contracts/remediation-safety-surfaces.md  |  46 +++++
 specs/012-remediation-safety/data-model.md    |  69 +++++++
 specs/012-remediation-safety/plan.md          | 115 +++++++++++
 specs/012-remediation-safety/quickstart.md    |  63 ++++++
 specs/012-remediation-safety/research.md      |  67 ++++++
 specs/012-remediation-safety/spec.md          | 108 ++++++++++
 ...mated_remediation_safety_algorithm_spec.md | 194 ++++++++++++++++++
 9 files changed, 697 insertions(+), 1 deletion(-)
 create mode 100644 specs/012-remediation-safety/checklists/requirements.md
 create mode 100644 specs/012-remediation-safety/contracts/remediation-safety-surfaces.md
 create mode 100644 specs/012-remediation-safety/data-model.md
 create mode 100644 specs/012-remediation-safety/plan.md
 create mode 100644 specs/012-remediation-safety/quickstart.md
 create mode 100644 specs/012-remediation-safety/research.md
 create mode 100644 specs/012-remediation-safety/spec.md
 create mode 100644 specs/algorithms/automated_remediation_safety_algorithm_spec.md

diff --git a/.specify/feature.json b/.specify/feature.json
index a124639..eb767d7 100644
--- a/.specify/feature.json
+++ b/.specify/feature.json
@@ -1,3 +1,3 @@
 {
-  "feature_directory": "specs/011-remediation-safety-rename"
+  "feature_directory": "specs/012-remediation-safety"
 }
diff --git a/specs/012-remediation-safety/checklists/requirements.md b/specs/012-remediation-safety/checklists/requirements.md
new file mode 100644
index 0000000..792f444
--- /dev/null
+++ b/specs/012-remediation-safety/checklists/requirements.md
@@ -0,0 +1,34 @@
+# Specification Quality Checklist: Remediation Safety (Ruleset Analyser & Multi-Level Safety)
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-06-23
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [x] No implementation details (languages, frameworks, APIs)
+- [x] Focused on user value and business needs
+- [x] Written for non-technical stakeholders
+- [x] All mandatory sections completed
+
+## Requirement Completeness
+
+- [x] No [NEEDS CLARIFICATION] markers remain
+- [x] Requirements are testable and unambiguous
+- [x] Success criteria are measurable
+- [x] Success criteria are technology-agnostic (no implementation details)
+- [x] All acceptance scenarios are defined
+- [x] Edge cases are identified
+- [x] Scope is clearly bounded
+- [x] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [x] All functional requirements have clear acceptance criteria
+- [x] User scenarios cover primary flows
+- [x] Feature meets measurable outcomes defined in Success Criteria
+- [x] No implementation details leak into specification
+
+## Notes
+
+- All checklist items pass on first draft; no [NEEDS CLARIFICATION] markers were required — reasonable defaults (documented in Assumptions) were used for the confidence-level scale, rationale field shape, and Backstage plugin scope.
diff --git a/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md b/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md
new file mode 100644
index 0000000..d0ddfc5
--- /dev/null
+++ b/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md
@@ -0,0 +1,46 @@
+# Contract: Remediation Safety Surfaces (Multi-Level + Ruleset Analyser)
+
+Supersedes `specs/011-remediation-safety-rename/contracts/remediation-safety-surfaces.md` for the surfaces below; that document remains the historical record of the Feature 11 rename. This contract covers the full implementation: three risk levels, confidence, and the new ruleset-analysis surfaces.
+
+## CLI: `--remediation-safety `
+
+| Before this feature | After this feature |
+|---|---|
+| Accepts only `safe`; any other value rejected with `Error: --remediation-safety must be "safe".` | Accepts `safe`, `humanreview`, `unsafe`. Any other value rejected with `Error: --remediation-safety must be one of: safe, humanreview, unsafe.` |
+| Filtered output built by `buildQuickFixOutput`/`formatQuickFixesHuman`, shape `QuickFixOutput` (`quickFixCount`, `quickFixes`). | Filtered output built by `buildRemediationSafetyOutput`/`formatRemediationSafetyHuman`, shape `RemediationSafetyOutput` (`remediationItemCount`, `remediationItems`, `requestedLevel`). Each item additionally carries `riskLevel` and `confidenceLevel`. |
+| `--remediation-safety safe` output identical to pre-Feature-12 `safe` output in violation membership. | Unchanged for `safe` membership (FR-007); new fields (`riskLevel`, `confidenceLevel`, `requestedLevel`) are additive. |
+
+## CLI: new `ruleset-analysis` subcommand
+
+```text
+api-grade ruleset-analysis [--ruleset-path ] [--format json|human]
+```
+
+- Without `--ruleset-path`, analyses the built-in default ruleset for the relevant format(s).
+- `--format json` returns a `RulesetAnalysis` JSON document.
+- `--format human` (default) prints a table: rule id, risk level, confidence level, rationale.
+- Exits non-zero only on a genuine error (e.g. ruleset file not found / unparseable) — analysis itself never partially fails (every rule gets an entry, per FR-001/SC-005).
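A minimal sketch of the `--format human` table described above, with the columns in the order the contract lists them. The function name `formatRulesetAnalysisHuman`, the `padRight` helper, and the two-space column gutter are assumptions for illustration, not part of the planned API.

```typescript
// Hypothetical sketch of the `ruleset-analysis --format human` table.
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low";

interface RuleAnalysis {
  ruleId: string;
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: ConfidenceLevel;
  rationale: string;
}

// Right-pads text with spaces up to the given width.
function padRight(text: string, width: number): string {
  let padded = text;
  while (padded.length < width) {
    padded += " ";
  }
  return padded;
}

// Columns: rule id, risk level, confidence level, rationale. Every column but
// the last is padded to its widest cell plus a two-space gutter.
function formatRulesetAnalysisHuman(rules: readonly RuleAnalysis[]): string {
  const header = ["rule id", "risk", "confidence", "rationale"];
  const rows = rules.map((r) => [r.ruleId, r.riskLevel, r.confidenceLevel, r.rationale]);
  const table = [header, ...rows];
  const widths = header.map((_, col) => Math.max(...table.map((row) => row[col].length)));
  return table
    .map((row) =>
      row
        .map((cell, col) => (col === row.length - 1 ? cell : padRight(cell, widths[col] + 2)))
        .join(""),
    )
    .join("\n");
}
```

The last column is left unpadded so lines carry no trailing whitespace, which keeps the output friendly to `diff` and snapshot tests.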
+
+## MCP: `grade-api-remediation-safety` tool — `level` parameter
+
+| Before this feature | After this feature |
+|---|---|
+| `level: z.enum(['safe'])` | `level: z.enum(['safe', 'humanreview', 'unsafe'])` |
+| Response payload: `QuickFixOutput` shape, using the pre-rename field names (`quickFixCount`, `quickFixes`) | Response payload: `RemediationSafetyOutput` shape (`remediationItemCount`, `remediationItems`, `requestedLevel`); each item includes `riskLevel`, `confidenceLevel` |
+| Tool description silent on confidence/risk-tier concept | Tool description updated to mention all three levels and that each returned item carries a confidence indicator |
+
+## MCP: new `analyse-ruleset-safety` tool
+
+```text
+Tool: analyse-ruleset-safety
+Input: { rulesetPath?: string, recoveryOption?: 'retry' | 'use-builtin-once' | 'use-builtin-session' | 'cancel' }
+Output: RulesetAnalysis JSON (rulesetSource, rulesetPath?, rules[])
+```
+
+- Follows the same ruleset-resolution/recovery-option flow already used by `grade-api-remediation-safety` and `set-ruleset-config`/`get-ruleset-config` (reuses `resolveRuleset`, `RulesetAuthError`, `mcpError`/`ERROR_CODES`).
+- Self-describing per the constitution's AI Integration Requirements: its description alone is sufficient for an MCP host to know when to call it (inspecting a ruleset's remediation risk without grading any spec).
+
+## Out of scope for this contract
+
+- No change to how a ruleset is supplied/located (file path, GitHub PAT, workspace/global config) — only to what is computed once it's loaded.
+- Backstage plugin packages are not touched — they do not currently surface quick-fix/remediation-safety information (confirmed: no "quick fix" references found in `packages/backstage-plugin-*`).
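Both the CLI flag and the MCP `level` parameter in this contract accept the same three values. Below is a dependency-free sketch of that shared check. The helper names are hypothetical; the plan itself uses a `zod` enum on the MCP side. The message text matches the contract, and the sketch assumes the CLI prints it after an `Error: ` prefix.

```typescript
// Hypothetical shared validator for the three remediation-safety levels.
const REMEDIATION_SAFETY_LEVELS = ["safe", "humanreview", "unsafe"] as const;

type RemediationSafetyLevel = (typeof REMEDIATION_SAFETY_LEVELS)[number];

// Type guard: true only for one of the three accepted level strings.
function isRemediationSafetyLevel(value: string): value is RemediationSafetyLevel {
  return (REMEDIATION_SAFETY_LEVELS as readonly string[]).indexOf(value) !== -1;
}

// Returns the validated level, or throws with the contract's message text
// (the CLI is assumed to print it as "Error: " followed by the message).
function parseRemediationSafetyLevel(raw: string): RemediationSafetyLevel {
  if (isRemediationSafetyLevel(raw)) {
    return raw;
  }
  throw new Error(
    `--remediation-safety must be one of: ${REMEDIATION_SAFETY_LEVELS.join(", ")}.`,
  );
}

console.log(parseRemediationSafetyLevel("humanreview")); // humanreview
```

Deriving the type from the `as const` tuple keeps the accepted values, the type, and the error message in sync from a single list.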
diff --git a/specs/012-remediation-safety/data-model.md b/specs/012-remediation-safety/data-model.md
new file mode 100644
index 0000000..6c695fd
--- /dev/null
+++ b/specs/012-remediation-safety/data-model.md
@@ -0,0 +1,69 @@
+# Data Model: Remediation Safety (Ruleset Analyser & Multi-Level Safety)
+
+## RemediationSafetyLevel
+
+Enum string: `"safe"` | `"humanreview"` | `"unsafe"`. Ordered from lowest to highest remediation risk. Replaces the prior two-class `ViolationClass` (`nonBreaking`/`breaking`, plus `unknown`).
+
+## ConfidenceLevel
+
+Enum string: `"high"` | `"medium"` | `"low"`. Describes how confident the ruleset analyser is in a `RuleAnalysis`'s assigned `RemediationSafetyLevel`.
+
+- `high` — the rule id matched a curated, known table (Stage 1 of the algorithm).
+- `medium` — classification came from the generic path-segment heuristic only (Stage 2), with a single, unambiguous tier match.
+- `low` — either no recognizable signal at all (Stage 3 fallback), or the path-segment heuristic matched more than one tier (genuine ambiguity, downgraded from `medium`).
+
+## RuleAnalysis
+
+One entry per rule in an analysed ruleset.
+
+| Field | Type | Description |
+|---|---|---|
+| `ruleId` | string | The rule's identifier within its ruleset. |
+| `riskLevel` | `RemediationSafetyLevel` | The assigned risk level for auto-remediating violations of this rule. |
+| `confidenceLevel` | `ConfidenceLevel` | Confidence in `riskLevel`. |
+| `rationale` | string | Short human-readable explanation of why this level/confidence was assigned (e.g. "rule id matched curated safe-prefix table" or "given path touches `parameters` and `description` — conservative match, ambiguous"). |
+
+**Validation rules**: every rule present in the input ruleset MUST produce exactly one `RuleAnalysis` entry (FR-001, SC-005) — no rule is ever omitted from analyser output.
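The types above can be written out in TypeScript, together with a check for the validation rule. The helper `findCoverageProblems` is hypothetical, a test-style aid rather than a planned export. It reports rules that have no entry, rules that have more than one entry, and entries for rules the ruleset does not contain.

```typescript
// Sketch of the RuleAnalysis types plus a hypothetical coverage check for the
// "exactly one entry per rule" validation rule (FR-001, SC-005).
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low";

interface RuleAnalysis {
  ruleId: string;
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: ConfidenceLevel;
  rationale: string;
}

interface CoverageProblems {
  missing: string[]; // rules in the ruleset with no analysis entry
  duplicated: string[]; // rules with more than one analysis entry
  unknown: string[]; // analysis entries whose ruleId is not in the ruleset
}

function findCoverageProblems(
  ruleIds: readonly string[],
  analyses: readonly RuleAnalysis[],
): CoverageProblems {
  // Count how many analysis entries exist for each ruleId.
  const counts = new Map<string, number>();
  for (const entry of analyses) {
    counts.set(entry.ruleId, (counts.get(entry.ruleId) ?? 0) + 1);
  }
  const known = new Set(ruleIds);
  return {
    missing: ruleIds.filter((id) => !counts.has(id)),
    duplicated: Array.from(counts.entries())
      .filter(([, n]) => n > 1)
      .map(([id]) => id),
    unknown: Array.from(counts.keys()).filter((id) => !known.has(id)),
  };
}
```

An analyser output satisfies the validation rule exactly when all three lists are empty.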
+
+## RulesetAnalysis
+
+| Field | Type | Description |
+|---|---|---|
+| `rulesetSource` | `"default" \| "custom"` | Mirrors `GradeResult.rulesetSource`. |
+| `rulesetPath` | string (optional) | Present when `rulesetSource === "custom"`. |
+| `rules` | `RuleAnalysis[]` | One entry per rule (see `RuleAnalysis` above). |
+
+**Relationships**: Computed once per loaded ruleset (`LoadedRuleset` from `rulesets/loader.ts`) and cached for the lifetime of a grading run. `GradeEngine` (or a caller wrapping it) holds the `RulesetAnalysis` alongside the loaded ruleset and consults it when building remediation-safety output, rather than recomputing per violation.
+
+## RemediationItem (was `QuickFix`)
+
+| Field | Type | Description |
+|---|---|---|
+| `ruleId` | string | Unchanged from today's `QuickFix.ruleId`. |
+| `message` | string | Unchanged. |
+| `severity` | string | Unchanged. |
+| `path` | string[] | Unchanged. |
+| `location` | string | Unchanged. |
+| `currentValue` | string \| null | Unchanged. |
+| `expectedImprovement` | string | Unchanged. |
+| `riskLevel` | `RemediationSafetyLevel` | **New** — the violation's computed remediation safety, looked up from the rule's `RuleAnalysis`. |
+| `confidenceLevel` | `ConfidenceLevel` | **New** — confidence behind `riskLevel`, from the same lookup. |
+
+## RemediationSafetyOutput (was `QuickFixOutput`)
+
+| Field | Type | Description |
+|---|---|---|
+| `specPath` | string | Unchanged. |
+| `format` | `ApiFormat` | Unchanged. |
+| `totalViolations` | number | Unchanged. |
+| `remediationItemCount` | number | Renamed from `quickFixCount` — count of violations matching the requested `level`. |
+| `remediationItems` | `RemediationItem[]` | Renamed from `quickFixes`. |
+| `requestedLevel` | `RemediationSafetyLevel` | **New** — echoes the level that was filtered for, since there are now three possible values instead of one implicit one. |
+
+**State transitions**: N/A — both `RulesetAnalysis` and remediation-safety output are computed fresh per grading/analysis request; nothing is persisted across requests (consistent with Feature 11's data model, which established this as a request-scoped concept).
+
+## Lookup / default behavior
+
+`getRemediationSafety(diagnostic, rulesetAnalysis) -> { riskLevel, confidenceLevel }`:
+- If `rulesetAnalysis.rules` contains an entry for `diagnostic.ruleId`, return its `riskLevel`/`confidenceLevel`.
+- Otherwise (FR-009), return `{ riskLevel: "unsafe", confidenceLevel: "low" }`.
diff --git a/specs/012-remediation-safety/plan.md b/specs/012-remediation-safety/plan.md
new file mode 100644
index 0000000..6d5cf22
--- /dev/null
+++ b/specs/012-remediation-safety/plan.md
@@ -0,0 +1,115 @@
+# Implementation Plan: Remediation Safety (Ruleset Analyser & Multi-Level Safety)
+
+**Branch**: `012-remediation-safety` | **Date**: 2026-06-23 | **Spec**: [spec.md](./spec.md)
+
+**Input**: Feature specification from `/specs/012-remediation-safety/spec.md`
+
+**Note**: This template is filled in by the `/speckit-plan` command. See `.specify/templates/plan-template.md` for the execution workflow.
+
+## Summary
+
+Build a deterministic, rule-metadata-driven ruleset analyser (`analyseRuleset()`) that assigns every rule in a loaded Spectral ruleset a risk level (`safe`/`humanreview`/`unsafe`) and a confidence level (`high`/`medium`/`low`), per [`automated_remediation_safety_algorithm_spec.md`](../algorithms/automated_remediation_safety_algorithm_spec.md). Extend `--remediation-safety` (CLI) and the `grade-api-remediation-safety` MCP tool's `level` parameter from the single `safe` value (Feature 11) to all three levels, computed via per-violation lookup against the analyser's cached result. Add a new CLI subcommand (`ruleset-analysis`) and MCP tool (`analyse-ruleset-safety`) so the analyser's output is inspectable independent of grading a spec. Complete the internal rename that Feature 11 deferred: afterward, no source, test, or current documentation file may reference "quick fix(es)" in any form (historical `CHANGELOG.md`/`GOAL.md` entries are excluded as an accurate historical record).
+
+## Technical Context
+
+**Language/Version**: TypeScript (Node.js, ES modules), per existing `packages/api-grade-core`, `packages/api-grade-mcp`, and `src/cli` packages
+
+**Primary Dependencies**: `@stoplight/spectral-rulesets` / `@stoplight/spectral-ruleset-bundler` (already used by `rulesets/loader.ts` to load rule metadata — the analyser reads `LoadedRuleset.ruleset.rules`, no new parsing dependency needed), `commander` (new `ruleset-analysis` CLI subcommand), `zod` (new `analyse-ruleset-safety` MCP tool schema + extended `level` enum), `@modelcontextprotocol/sdk`
+
+**Storage**: N/A — `RulesetAnalysis` is computed fresh per request/grading run, never persisted (consistent with Feature 11's data model)
+
+**Testing**: Vitest (`vitest run`). New unit tests for `analyseRuleset()`/`getRemediationSafety()` in `packages/api-grade-core/tests/unit/remediation-safety.test.ts` (replacing `quick-fixes.test.ts`); updated CLI integration test `tests/integration/cli-remediation-safety.test.ts` (replacing `cli-quick-fixes.test.ts`) covering all three levels plus the new `ruleset-analysis` subcommand; updated MCP integration test `packages/api-grade-mcp/tests/integration/remediation-safety.test.ts` (replacing `quick-fixes-only.test.ts`) plus a new test for `analyse-ruleset-safety`
+
+**Target Platform**: Cross-platform Node.js CLI and MCP server (Windows/macOS), per constitution Principle V
+
+**Project Type**: CLI + MCP server + core library packages within an existing npm workspace monorepo
+
+**Performance Goals**: Ruleset analysis is O(rules) and runs once per loaded ruleset per process invocation; per-violation lookup is O(1) (map lookup by `ruleId`) — no measurable change to existing grading throughput is expected
+
+**Constraints**: Must not regress `--remediation-safety safe` membership (FR-007); must classify 100% of rules in any ruleset, built-in or custom, with no omissions (SC-005); zero new monetary-cost dependencies (constitution Principle V) — the analyser is pure rule-metadata inspection, no external service or model call
+
+**Scale/Scope**: Touches `packages/api-grade-core` (new `remediation-safety.ts` replacing `quick-fixes.ts`, `types.ts` additions, `index.ts` exports), `packages/api-grade-mcp` (rename + extend existing tool, add new `analyse-ruleset-safety` tool, `server.ts` registration), `src/cli` (extend `--remediation-safety`, add new `ruleset-analysis-cli.ts` subcommand), and documentation (`docs/cli/commands.md`, `docs/mcp/quick-start.md`, `docs/package/api-grade-mcp.md`, `docs/package/README.md`, `docs/package/api-reference.md`, `docs/index.md`, `docs/getting-started.md`, `packages/api-grade-mcp/README.md`, `CONTRIBUTING.md`). New algorithm spec document at `specs/algorithms/automated_remediation_safety_algorithm_spec.md` (already authored as part of this planning phase). No Backstage plugin changes — they do not currently surface quick-fix/remediation-safety information.
+
+## Constitution Check
+
+*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
+
+| Principle | Gate | Status |
+|-----------|------|--------|
+| I. Multi-Format API Support | Analyser/remediation-safety must not be scoped to one spec format | PASS — operates on ruleset rule metadata (`ruleId`, `given`), uniform across the OpenAPI and AsyncAPI built-in rulesets and custom rulesets; no format-specific branching |
+| II. Core-First Architecture | CLI and MCP must consume shared core logic, not duplicate classification | PASS — `analyseRuleset()`/`getRemediationSafety()` live in `@dawmatt/api-grade-core`; CLI and MCP both call the same core functions, mirroring how `buildQuickFixOutput` is shared today |
+| III. Spectral-Ruleset Based Grading | Must not alter scoring/diagnostic generation; custom rulesets must remain supported | PASS — analyser is a separate, additive computation; no change to `grader.ts`/`scorer.ts`; works against any Spectral-compatible ruleset (built-in or custom), including ones sourced via GitHub PAT (existing `resolveRuleset`/`fetchRulesetContent` flow, unchanged) |
+| IV. Test-Driven Quality | New algorithm and renamed surfaces need test coverage written alongside implementation | PASS — plan specifies new/renamed unit and integration tests covering all three levels, the analyser's total-coverage guarantee (SC-005), and the fallback/lookup-miss default (FR-009) |
+| V. Cross-Platform & Zero-Cost Prerequisites | No new paid dependencies or platform-specific behavior | PASS — reuses existing `@stoplight/spectral-rulesets`/`commander`/`zod`; analyser is pure in-process logic, no external service |
+| VI. Educational Excellence | Diagnostic-adjacent output should explain *why*, not just *what* | PASS — every `RuleAnalysis` carries a `rationale` field explaining the classification (FR-003), satisfying the "actionable, explained" principle for this new diagnostic surface too |
+
+No violations — Complexity Tracking section is not needed.
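The Summary's `analyseRuleset()` and the SC-005 gate suggest a shape in which coverage is total by construction: every rule key is mapped to exactly one entry. Below is a hedged skeleton. The `RuleMeta` shape stands in for whatever `LoadedRuleset.ruleset.rules` actually exposes. Several details are assumptions rather than anything the plan states: the pluggable `classify` parameter, the catch-all fallback when classification throws, and treating "a path was supplied" as meaning a custom ruleset.

```typescript
// Hypothetical skeleton of analyseRuleset(): no format-specific branching,
// just one classification per rule key.
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low";

interface RuleAnalysis {
  ruleId: string;
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: ConfidenceLevel;
  rationale: string;
}

interface RulesetAnalysis {
  rulesetSource: "default" | "custom";
  rulesetPath?: string;
  rules: RuleAnalysis[];
}

// Hypothetical stand-in for per-rule metadata from the loaded ruleset.
interface RuleMeta {
  given: string | string[];
}

type RuleClassifier = (ruleId: string, meta: RuleMeta) => Omit<RuleAnalysis, "ruleId">;

function analyseRuleset(
  rules: Record<string, RuleMeta>,
  classify: RuleClassifier,
  rulesetPath?: string,
): RulesetAnalysis {
  const analysed = Object.entries(rules).map(([ruleId, meta]): RuleAnalysis => {
    try {
      return { ruleId, ...classify(ruleId, meta) };
    } catch {
      // Assumption: a classifier failure degrades to the conservative default
      // instead of dropping the rule, so every rule still gets an entry.
      return {
        ruleId,
        riskLevel: "unsafe",
        confidenceLevel: "low",
        rationale: "classification failed; conservative default applied",
      };
    }
  });
  // Assumption: a supplied path means a custom ruleset; otherwise built-in.
  return rulesetPath === undefined
    ? { rulesetSource: "default", rules: analysed }
    : { rulesetSource: "custom", rulesetPath, rules: analysed };
}
```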
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/012-remediation-safety/
+├── plan.md              # This file (/speckit-plan command output)
+├── research.md          # Phase 0 output (/speckit-plan command)
+├── data-model.md        # Phase 1 output (/speckit-plan command)
+├── quickstart.md        # Phase 1 output (/speckit-plan command)
+├── contracts/           # Phase 1 output (/speckit-plan command)
+└── tasks.md             # Phase 2 output (/speckit-tasks command - NOT created by /speckit-plan)
+
+specs/algorithms/
+└── automated_remediation_safety_algorithm_spec.md  # New domain algorithm spec (FR-004), authored in this planning phase
+```
+
+### Source Code (repository root)
+
+```text
+packages/api-grade-core/src/
+├── remediation-safety.ts   # NEW — replaces quick-fixes.ts: analyseRuleset(), getRemediationSafety(), buildRemediationItem(), buildRemediationSafetyOutput(), formatRemediationSafetyHuman()
+├── rulesets/loader.ts      # unchanged — analyser consumes its LoadedRuleset.ruleset.rules
+├── types.ts                # add RemediationSafetyLevel, ConfidenceLevel, RuleAnalysis, RulesetAnalysis, RemediationItem, RemediationSafetyOutput; remove ViolationClass, QuickFix, QuickFixOutput
+└── index.ts                # export new remediation-safety.ts symbols/types in place of quick-fixes.ts ones
+
+packages/api-grade-core/tests/unit/
+└── remediation-safety.test.ts  # replaces quick-fixes.test.ts; adds analyseRuleset()/getRemediationSafety() coverage for all 3 levels + confidence + SC-005 total-coverage check
+
+src/cli/
+├── index.ts                 # extend --remediation-safety to accept safe|humanreview|unsafe; call renamed core functions
+└── ruleset-analysis-cli.ts  # NEW — `ruleset-analysis` subcommand (mirrors ruleset-config-cli.ts pattern)
+
+tests/integration/
+└── cli-remediation-safety.test.ts  # replaces cli-quick-fixes.test.ts; covers all 3 levels + ruleset-analysis subcommand
+
+packages/api-grade-mcp/src/
+├── server.ts                      # register renamed tool + new analyse-ruleset-safety tool
+└── tools/
+    ├── remediation-safety.ts      # renamed from quick-fixes-only.ts; level enum extended to 3 values
+    └── analyse-ruleset-safety.ts  # NEW — exposes analyseRuleset() independent of grading
+
+packages/api-grade-mcp/tests/integration/
+├── remediation-safety.test.ts      # renamed from quick-fixes-only.test.ts; covers all 3 levels
+└── analyse-ruleset-safety.test.ts  # NEW
+
+packages/api-grade-mcp/src/utils/classify.ts  # update re-exports to new core names
+
+docs/
+├── cli/commands.md           # --remediation-safety 3-level reference + ruleset-analysis subcommand
+├── mcp/quick-start.md        # renamed/extended tool + new analyse-ruleset-safety tool
+├── package/api-grade-mcp.md  # tool reference updates
+├── package/README.md         # remove remaining "quick fix" mentions
+├── package/api-reference.md  # core API reference: new types/functions
+├── index.md                  # remove remaining "quick fix" mentions
+└── getting-started.md        # tool list mention update
+
+packages/api-grade-mcp/README.md  # tool table update
+CONTRIBUTING.md                   # package/tool table correction (still names pre-Feature-11 tool)
+```
+
+**Structure Decision**: Single-project monorepo (existing `src/cli` + `packages/*` workspaces), unchanged from Feature 11. The analyser and remediation-safety calculation live entirely in `@dawmatt/api-grade-core` (Core-First Architecture, Principle II); CLI and MCP packages each add one new thin surface (a subcommand, a tool) that calls the shared core functions, matching the existing `ruleset-config`/`get-ruleset-config`/`set-ruleset-config` pattern rather than introducing a new architectural layer.
+
+## Complexity Tracking
+
+> **Fill ONLY if Constitution Check has violations that must be justified**
+
+No violations — section not applicable.
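Here is a sketch of the exact-match filtering that `buildRemediationSafetyOutput()` is planned to perform: only items whose `riskLevel` equals the requested level are kept. It assumes each item already carries its looked-up `riskLevel` and that there is one item per violation. The parameter list is illustrative, and `ApiFormat` is simplified to `string`. Only the output field names come from the data model.

```typescript
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low";

// Field names follow the data model; ApiFormat is simplified to a string here.
interface RemediationItem {
  ruleId: string;
  message: string;
  severity: string;
  path: string[];
  location: string;
  currentValue: string | null;
  expectedImprovement: string;
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: ConfidenceLevel;
}

interface RemediationSafetyOutput {
  specPath: string;
  format: string;
  totalViolations: number;
  remediationItemCount: number;
  remediationItems: RemediationItem[];
  requestedLevel: RemediationSafetyLevel;
}

// Exact-match filter, not a cumulative "at or below" filter. Assumes one
// item per violation, so totalViolations is the unfiltered item count.
function buildRemediationSafetyOutput(
  specPath: string,
  format: string,
  allItems: readonly RemediationItem[],
  requestedLevel: RemediationSafetyLevel,
): RemediationSafetyOutput {
  const remediationItems = allItems.filter((item) => item.riskLevel === requestedLevel);
  return {
    specPath,
    format,
    totalViolations: allItems.length,
    remediationItemCount: remediationItems.length,
    remediationItems,
    requestedLevel,
  };
}
```

Echoing `requestedLevel` makes the output self-describing now that the filter can take three values.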
diff --git a/specs/012-remediation-safety/quickstart.md b/specs/012-remediation-safety/quickstart.md
new file mode 100644
index 0000000..7e699cd
--- /dev/null
+++ b/specs/012-remediation-safety/quickstart.md
@@ -0,0 +1,63 @@
+# Quickstart: Remediation Safety (Ruleset Analyser & Multi-Level Safety)
+
+## 1. CLI: filter by any of the three levels
+
+```bash
+api-grade openapi.yaml --remediation-safety safe         # unchanged behavior from Feature 11
+api-grade openapi.yaml --remediation-safety humanreview  # new
+api-grade openapi.yaml --remediation-safety unsafe       # new
+```
+
+Each returned item now includes `riskLevel` and `confidenceLevel`:
+
+```json
+{
+  "specPath": "openapi.yaml",
+  "format": "openapi-3",
+  "totalViolations": 12,
+  "requestedLevel": "humanreview",
+  "remediationItemCount": 2,
+  "remediationItems": [
+    {
+      "ruleId": "operation-operationId",
+      "riskLevel": "humanreview",
+      "confidenceLevel": "high",
+      "...": "..."
+    }
+  ]
+}
+```
+
+## 2. CLI: inspect a ruleset's remediation risk without grading a spec
+
+```bash
+api-grade ruleset-analysis --format human
+# rule id                risk         confidence  rationale
+# operation-description  safe         high        rule id matched curated safe-prefix table
+# operation-operationId  humanreview  high        rule id matched curated humanreview-prefix table
+# oas3-schema            unsafe       low         no recognizable rule-id or path signal
+
+api-grade ruleset-analysis --ruleset-path ./my-ruleset.yaml --format json
+```
+
+## 3. MCP: same filtering, plus a dedicated ruleset-analysis tool
+
+```text
+Tool: grade-api-remediation-safety
+Input: { "specPath": "/workspace/my-api/openapi.yaml", "level": "humanreview" }
+
+Tool: analyse-ruleset-safety
+Input: { "rulesetPath": "/workspace/my-ruleset.yaml" }
+Output: { "rulesetSource": "custom", "rulesetPath": "...", "rules": [ ... ] }
+```
+
+## 4. Verify the "quick fixes" cleanup is complete
+
+```bash
+grep -rniE "quick.?fix" --include="*.ts" --include="*.md" \
+  src/ packages/api-grade-core/src packages/api-grade-mcp/src \
+  packages/api-grade-core/tests packages/api-grade-mcp/tests tests/ \
+  docs/ packages/api-grade-mcp/README.md CONTRIBUTING.md
+```
+
+This should return zero matches (SC-003). `CHANGELOG.md` and `GOAL.md` historical entries describing what shipped in past releases are intentionally excluded — they are an accurate record of the past, not current documentation.
diff --git a/specs/012-remediation-safety/research.md b/specs/012-remediation-safety/research.md
new file mode 100644
index 0000000..10f0bdd
--- /dev/null
+++ b/specs/012-remediation-safety/research.md
@@ -0,0 +1,67 @@
+# Research: Remediation Safety (Ruleset Analyser & Multi-Level Safety)
+
+## 1. Rule-level vs. violation-level classification
+
+**Decision**: The ruleset analyser classifies at the **rule** level (one risk + confidence pair per `ruleId`), computed once per loaded ruleset and cached. Remediation safety for a specific violation is a lookup against this cache by `ruleId`, not a fresh per-instance computation.
+
+**Rationale**: FR-001 and the spec's Key Entities explicitly scope the analyser to "for each rule". This is also what makes FR-011 possible (inspecting ruleset risk independent of grading any specific spec) and keeps the calculation O(1) per violation at grading time instead of re-running heuristics per occurrence.
+
+**Alternatives considered**: Per-violation classification (today's `classifyViolation`, keyed on the diagnostic's instance `path`) gives finer granularity for generic rules (e.g. `oas3-schema`, which fires on both breaking and cosmetic mismatches) but cannot satisfy FR-011 (no spec to derive an instance path from) and isn't "ruleset analysis" — it's per-result analysis. Rejected in favor of rule-level, accepting the coarser-granularity tradeoff called out in the spec's Edge Cases (a rule spanning levels gets one conservative classification, flagged with reduced confidence — see §3).
+
+## 2. Confidence scale
+
+**Decision**: Three discrete levels — `high`, `medium`, `low` — mirroring the project's existing preference for small, explainable categories over numeric scores (same pattern as `ImpactLevel`, `DiagnosticSeverityLevel`).
+
+**Rationale**: Constitution Principle VI favors explanation over raw scores; a numeric 0–1 confidence would need its own thresholds restated everywhere it's displayed, with no added value for a binary "trust this / verify this" decision a user actually makes.
+
+**Alternatives considered**: Continuous 0–100 confidence score — rejected as over-precise for a heuristic, rule-metadata-only analyser, and inconsistent with how grades/impact are presented elsewhere in the project.
+
+## 3. Three-tier risk classification algorithm
+
+**Decision**: Extend the existing two-stage quick-fixes algorithm (`specs/algorithms/quick_fixes_algorithm_spec.md`) from two outcome classes (`nonBreaking`/`breaking`, plus `unknown`) to three risk levels (`safe`/`humanreview`/`unsafe`), operating on **rule metadata** (`ruleId`, the rule's `given` JSONPath expression(s), `then.function`) rather than a violation's instance path:
+
+- **Stage 1 — curated rule-id tables** (high confidence): extend the existing `safe`-prefix table; add a new `humanreview`-prefix table for rules whose fixes are typically additive but operationally significant (e.g. `operation-operationId`, `oas3-server-not-example-com`, security/server-related rules). Anything not listed falls through.
+- **Stage 2 — path-segment heuristic on the rule's `given`** (medium confidence): extend `BREAKING_SEGMENTS`/`NON_BREAKING_SEGMENTS` into three tiers — `UNSAFE_SEGMENTS` (`required`, `type`, `format`, `parameters`), `HUMANREVIEW_SEGMENTS` (`enum`, `default`, `security`, `servers`, `operationId`, `additionalProperties`, `responses`), `SAFE_SEGMENTS` (existing list). Checked most-conservative-first (`unsafe` > `humanreview` > `safe`). If a rule's `given` matches segments from **more than one tier**, the most conservative matched tier is still chosen, but confidence is downgraded one step (from `medium` to `low`) to flag the genuine ambiguity for a ruleset maintainer.
+- **Stage 3 — fallback** (low confidence): no rule-id or path signal recognized (e.g. a custom rule, or a whole-document rule like `given: "$"`) → defaults to `unsafe` with `low` confidence. Conservative-by-default, matching the existing project philosophy ("absence of a safety signal is never treated as evidence of safety").
+
+**Rationale**: Reuses a proven, explainable, deterministic pattern already accepted by the project (and by users reading `quick_fixes_algorithm_spec.md`) rather than inventing a new paradigm; the three-tier extension is the minimal change that satisfies FR-002 while preserving FR-007 (no regression for `safe`).
+
+**Alternatives considered**: Statistical/ML classification over rule descriptions — rejected: nondeterministic, costly, and violates Constitution Principle V (zero-cost prerequisites) if it requires an external model; also harder to explain ("rationale" requirement, FR-003) than a deterministic rule table.
+
+## 4. Default/fallback behavior when a rule has no analysis
+
+**Decision**: Any violation whose `ruleId` is absent from the cached `RulesetAnalysis` (e.g. the ruleset changed between analysis and grading) defaults to `unsafe` / `low` confidence at lookup time, not just at Stage 3 of the analyser itself.
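The lookup-time default in the decision above can be sketched as follows. The function keeps the data model's `getRemediationSafety(diagnostic, rulesetAnalysis)` signature. It builds a `ruleId` index once per analysis object, which keeps per-violation lookup O(1). The `WeakMap` cache and the minimal `DiagnosticLike` shape are assumptions of this sketch, and the sketch treats an analysis as immutable once it has been indexed.

```typescript
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low";

interface RuleAnalysis {
  ruleId: string;
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: ConfidenceLevel;
  rationale: string;
}

interface RulesetAnalysis {
  rulesetSource: "default" | "custom";
  rulesetPath?: string;
  rules: RuleAnalysis[];
}

// Minimal stand-in for a Spectral diagnostic: only the rule id is needed.
interface DiagnosticLike {
  ruleId: string;
}

interface RemediationSafety {
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: ConfidenceLevel;
}

// One ruleId index per analysis object, built on first use (O(rules)).
const indexCache = new WeakMap<RulesetAnalysis, Map<string, RuleAnalysis>>();

function indexFor(analysis: RulesetAnalysis): Map<string, RuleAnalysis> {
  let index = indexCache.get(analysis);
  if (!index) {
    index = new Map(analysis.rules.map((rule): [string, RuleAnalysis] => [rule.ruleId, rule]));
    indexCache.set(analysis, index);
  }
  return index;
}

function getRemediationSafety(diagnostic: DiagnosticLike, rulesetAnalysis: RulesetAnalysis): RemediationSafety {
  const entry = indexFor(rulesetAnalysis).get(diagnostic.ruleId);
  if (entry === undefined) {
    // FR-009: a lookup miss is never treated as evidence of safety.
    return { riskLevel: "unsafe", confidenceLevel: "low" };
  }
  return { riskLevel: entry.riskLevel, confidenceLevel: entry.confidenceLevel };
}
```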
+
+**Rationale**: Directly required by FR-009 and the spec's first Edge Case; keeps the conservative-by-default guarantee end-to-end, not just inside the analyser.
+
+## 5. Internal naming cleanup (completing the Feature 11 deferral)
+
+**Decision**: Rename internal identifiers wholesale — no backward-compatible aliases (pre-v1.0, consistent with Feature 11's precedent):
+
+| Old | New |
+|---|---|
+| `packages/api-grade-core/src/quick-fixes.ts` | `packages/api-grade-core/src/remediation-safety.ts` |
+| `classifyViolation()` | Removed; replaced by `analyseRuleset()` (new) + `getRemediationSafety(diagnostic, rulesetAnalysis)` (lookup) |
+| `buildQuickFix()` | `buildRemediationItem()` |
+| `buildQuickFixOutput()` | `buildRemediationSafetyOutput()` |
+| `formatQuickFixesHuman()` | `formatRemediationSafetyHuman()` |
+| Types: `QuickFix`, `QuickFixOutput`, `ViolationClass` | `RemediationItem`, `RemediationSafetyOutput`, `RemediationSafetyLevel` (3-value), plus new `ConfidenceLevel`, `RuleAnalysis`, `RulesetAnalysis` |
+| `packages/api-grade-mcp/src/tools/quick-fixes-only.ts`, `registerQuickFixesOnlyTool` | `remediation-safety.ts`, `registerRemediationSafetyTool` |
+| `tests/integration/cli-quick-fixes.test.ts`, `packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts`, `packages/api-grade-core/tests/unit/quick-fixes.test.ts` | Respectively: `tests/integration/cli-remediation-safety.test.ts`, `packages/api-grade-mcp/tests/integration/remediation-safety.test.ts`, `packages/api-grade-core/tests/unit/remediation-safety.test.ts` |
+
+`CHANGELOG.md` entries describing **past** releases are historical record and are not rewritten (an accurate record of what a prior version did); `CONTRIBUTING.md`'s package/tool table is **current-state** documentation and is in scope for correction (it still names the pre-Feature-11 tool, which is already stale today).
+
+## 6. Exposing the ruleset analyser independent of grading (FR-011)
+
+**Decision**: Add one new surface per tool, following each tool's existing naming convention:
+
+- **CLI**: new subcommand `ruleset-analysis [--ruleset-path ] [--format json|human]`, implemented alongside the existing `config` subcommand (`src/cli/ruleset-config-cli.ts` pattern) in a new `src/cli/ruleset-analysis-cli.ts`. Defaults to analysing the built-in ruleset when no path is given.
+- **MCP**: new tool `analyse-ruleset-safety`, following the `get-ruleset-config`/`set-ruleset-config` `-ruleset-` naming convention, accepting an optional `rulesetPath`.
+
+**Rationale**: Matches existing patterns exactly rather than inventing a new naming scheme; keeps the tool list self-describing per the constitution's AI Integration Requirements (no extra docs needed to discover it).
+
+## 7. Filtering semantics for the new levels
+
+**Decision**: `--remediation-safety ` / MCP `level` parameter remains an **exact-match filter** (return only violations whose computed risk equals the requested level), not a cumulative "at or below" filter.
+
+**Rationale**: Preserves FR-007 (identical behavior for `safe`) without redefining what the existing parameter means; a cumulative mode is not requested by the spec and would be a separate, additive feature if ever needed (YAGNI per the constitution's Development Workflow).
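The three-stage classification in §3 can be sketched as a single function. Only the `UNSAFE_SEGMENTS` and `HUMANREVIEW_SEGMENTS` members come from §3. The curated prefix entries and the `safe` segment list are illustrative placeholders, because the existing tables are not reproduced in this document. The regex tokenizer is a deliberately coarse stand-in for real JSONPath parsing.

```typescript
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low";

interface Classification {
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: ConfidenceLevel;
  rationale: string;
}

// Stage 1 table: illustrative entries only, not the real curated tables.
const CURATED_PREFIXES: ReadonlyArray<[string, RemediationSafetyLevel]> = [
  ["operation-operationId", "humanreview"],
  ["oas3-server-not-example-com", "humanreview"],
  ["operation-description", "safe"],
];

// Stage 2 tiers, most conservative first. The "safe" list is a placeholder.
const SEGMENT_TIERS: ReadonlyArray<[RemediationSafetyLevel, ReadonlyArray<string>]> = [
  ["unsafe", ["required", "type", "format", "parameters"]],
  ["humanreview", ["enum", "default", "security", "servers", "operationId", "additionalProperties", "responses"]],
  ["safe", ["description", "summary", "tags", "externalDocs"]],
];

// Coarse tokenizer: pulls identifier-like names out of JSONPath expressions.
function givenSegments(given: string | readonly string[]): string[] {
  const expressions = typeof given === "string" ? [given] : given;
  const segments: string[] = [];
  for (const expr of expressions) {
    segments.push(...(expr.match(/[A-Za-z_][A-Za-z0-9_]*/g) ?? []));
  }
  return segments;
}

function classifyRule(ruleId: string, given: string | readonly string[]): Classification {
  // Stage 1: curated rule-id prefixes give high confidence.
  for (const [prefix, level] of CURATED_PREFIXES) {
    if (ruleId.indexOf(prefix) === 0) {
      return { riskLevel: level, confidenceLevel: "high", rationale: `rule id matched curated ${level}-prefix table` };
    }
  }

  // Stage 2: path-segment heuristic on `given` gives medium confidence,
  // downgraded to low when segments from more than one tier match.
  const segments = givenSegments(given);
  const matched: RemediationSafetyLevel[] = [];
  for (const [level, names] of SEGMENT_TIERS) {
    if (names.some((name) => segments.indexOf(name) !== -1)) {
      matched.push(level);
    }
  }
  const [mostConservative] = matched;
  if (mostConservative !== undefined) {
    const ambiguous = matched.length > 1;
    return {
      riskLevel: mostConservative,
      confidenceLevel: ambiguous ? "low" : "medium",
      rationale: ambiguous
        ? `given touches ${matched.join(" and ")} segments; most conservative tier chosen`
        : `given touches ${mostConservative} segments`,
    };
  }

  // Stage 3: no recognizable signal falls back to unsafe / low.
  return { riskLevel: "unsafe", confidenceLevel: "low", rationale: "no recognizable rule-id or path signal" };
}
```

Because `SEGMENT_TIERS` is ordered most-conservative-first, the first element of `matched` is always the tier §3 says to choose.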
diff --git a/specs/012-remediation-safety/spec.md b/specs/012-remediation-safety/spec.md new file mode 100644 index 0000000..b65498d --- /dev/null +++ b/specs/012-remediation-safety/spec.md @@ -0,0 +1,108 @@ +# Feature Specification: Remediation Safety (Ruleset Analyser & Multi-Level Safety) + +**Feature Branch**: `012-remediation-safety` + +**Created**: 2026-06-23 + +**Status**: Draft + +**Input**: User description: "Feature 12 - Remediation safety (from GOAL.md): Build a ruleset analyser that determines the level of risk associated with remediating violations identified by each of its rules, along with a confidence level in that determination. Extend remediation safety to support additional levels (humanreview, unsafe) beyond the existing safe level, calculated from the analyser's output, in alignment with a new automated_remediation_safety_algorithm_spec.md. Surface remediation safety in JSON and human output across tools and packages. Complete the refactor away from the 'quick fixes only' concept — started superficially in Feature 11 — so 'quick fixes' no longer appears anywhere in the code base or user-facing documentation." + +## User Scenarios & Testing *(mandatory)* + +### User Story 1 - Developer sees a risk-graded remediation plan, not just a flat safe/unsafe split (Priority: P1) + +A developer grading their API spec wants to know, for every violation, how risky it would be to auto-remediate it: which violations can be fixed with no review, which need a human to sanity-check the result, and which should not be auto-remediated at all. Today the tool only distinguishes "safe" from "everything else"; the developer wants the middle ground made visible so they can triage efficiently instead of treating every non-trivial fix as equally risky. + +**Why this priority**: This is the core value of the feature — without the three-level split, the rest of the feature (analyser, confidence, spec doc) has no visible user benefit. 
+ +**Independent Test**: Grade a sample spec containing violations that are clearly auto-fixable (e.g. missing descriptions), violations that need human judgement (e.g. an operation missing its `operationId`), and violations that should never be auto-fixed (e.g. one whose fix would remove a required field). Confirm the three groups are reported under `safe`, `humanreview`, and `unsafe` respectively, with counts matching expectations. + +**Acceptance Scenarios**: + +1. **Given** a spec with violations spanning all three risk categories, **When** the user requests remediation-safety output (CLI, MCP, or package API), **Then** each violation is labeled with exactly one of `safe`, `humanreview`, or `unsafe`. +2. **Given** the user filters output to `--remediation-safety safe`, **When** results are returned, **Then** only violations classified `safe` are included, identical in scope to today's behavior. +3. **Given** the user filters output to `--remediation-safety humanreview`, **When** results are returned, **Then** only violations classified `humanreview` are included (a new level introduced by this feature). +4. **Given** the user filters output to `--remediation-safety unsafe`, **When** results are returned, **Then** only violations classified `unsafe` are included (a new level introduced by this feature). + +--- + +### User Story 2 - Ruleset maintainer trusts the analyser's classification because confidence is shown (Priority: P2) + +A team supplies its own custom Spectral ruleset. They want to understand, rule by rule, why the analyser assigned a given risk level, and how confident the analyser is in that assignment, so they can spot-check or override classifications they disagree with before relying on them in CI. + +**Why this priority**: Confidence is what makes the analyser trustworthy for custom/third-party rulesets where the built-in heuristics may not apply cleanly; without it, users have no way to judge whether a `safe` label is well-founded.
+ +**Independent Test**: Run the ruleset analyser against both the built-in ruleset and a custom ruleset containing rules with no recognizable pattern. Confirm every rule receives a risk level and a confidence level, and that unrecognized/ambiguous rules receive a visibly lower confidence than well-known rules. + +**Acceptance Scenarios**: + +1. **Given** a ruleset is analysed, **When** the analysis completes, **Then** every rule in the ruleset has an assigned risk level (`safe`, `humanreview`, or `unsafe`) and a confidence level for that assignment. +2. **Given** a rule the analyser cannot confidently classify (e.g. a custom rule with no recognizable id pattern or schema path), **When** it is analysed, **Then** it is still assigned a risk level (defaulting to the more conservative `unsafe` or `humanreview`) but with a low confidence indicator, rather than being silently omitted. +3. **Given** the analyser's output for a ruleset, **When** a user inspects it (JSON or human format), **Then** they can see, per rule, the risk level, confidence level, and a brief rationale. + +--- + +### User Story 3 - Documentation and code no longer mention "quick fixes" anywhere (Priority: P3) + +A new contributor or documentation reader should encounter "remediation safety" consistently everywhere — command help, MCP tool descriptions, READMEs, internal function/type names, test names — with no leftover "quick fixes" terminology anywhere, since Feature 11 deliberately left internal naming unchanged pending this feature. + +**Why this priority**: Lower priority than the functional capability itself, but required to close out the deferred work from Feature 11 and avoid the codebase carrying two names for the same concept indefinitely. + +**Independent Test**: Search the entire repository (source, tests, docs, package metadata) for "quick fix" / "quickFix" / "quick-fix" (case-insensitive) and confirm zero matches. + +**Acceptance Scenarios**: + +1. 
**Given** the full repository (all packages, docs, tests), **When** searched for "quick fix" in any casing or separator style, **Then** no matches are found. +2. **Given** the CLI, MCP server, and package public APIs, **When** their exported names, types, and option/tool names are inspected, **Then** all of them use "remediation safety" terminology exclusively. + +--- + +### Edge Cases + +- What happens when a violation's rule was never analysed (e.g. the ruleset changed between analysis and grading, or the rule id was generated dynamically)? The system must assign the most conservative default (`unsafe`) rather than crash or silently drop the violation from output. +- How does the analyser handle a rule that legitimately spans multiple risk levels depending on context (e.g. a rule that sometimes flags a breaking change and sometimes a cosmetic one)? The specification (`automated_remediation_safety_algorithm_spec.md`) must define how such rules are classified — at the rule level, the analyser assigns one risk/confidence pair per rule; finer-grained, per-violation distinctions are out of scope for the analyser itself. +- What happens when a custom/private ruleset is supplied that the analyser has never seen before? It must still produce a complete classification (risk + confidence) for every rule, with confidence honestly reflecting the lack of prior knowledge, rather than failing the grading run. +- What happens to existing consumers (CI pipelines, scripts) that depend on today's binary `safe` vs "not safe" filtering? `--remediation-safety safe` (and equivalent MCP/package usage) must continue to mean exactly what it means today; the new levels are additive, not a breaking redefinition of `safe`.
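The repository-wide search used as the independent test for User Story 3 (and SC-003) can be automated. The following is a hypothetical Node.js sketch in TypeScript, not an existing project script. It walks a directory tree, skips `.git`, `node_modules`, and `dist` (an assumed exclusion list), and reports every file or directory whose name, and every file whose contents, contain the legacy term with an optional space, hyphen, or underscore separator, in any casing.

```typescript
import { readdirSync, readFileSync, statSync } from 'node:fs';
import { join } from 'node:path';

// The legacy term with an optional space, hyphen, or underscore separator, any casing.
const LEGACY_TERM = /quick[\s_-]?fix/i;

// Assumed exclusions: VCS metadata, dependencies, and build output.
const SKIP_DIRS = new Set(['.git', 'node_modules', 'dist']);

// Returns paths whose name uses the legacy term, plus files whose UTF-8 contents use it.
export function findLegacyTerminology(root: string): string[] {
  const hits: string[] = [];
  const walk = (dir: string): void => {
    for (const name of readdirSync(dir)) {
      if (SKIP_DIRS.has(name)) continue;
      const fullPath = join(dir, name);
      const isDir = statSync(fullPath).isDirectory();
      if (LEGACY_TERM.test(name) || (!isDir && LEGACY_TERM.test(readFileSync(fullPath, 'utf8')))) {
        hits.push(fullPath);
      }
      if (isDir) walk(fullPath);
    }
  };
  walk(root);
  return hits;
}
```

A CI step could fail the build when `findLegacyTerminology('.')` returns a non-empty list. The sketch reads every file as UTF-8 and follows symlinks through `statSync`, which is acceptable for a small repository but would need refinement for binary assets or symlink cycles.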
+ +## Requirements *(mandatory)* + +### Functional Requirements + +- **FR-001**: The system MUST provide a ruleset analyser that, given a Spectral-compatible ruleset, produces for each rule a risk level describing how risky it would be to automatically remediate violations of that rule. +- **FR-002**: The risk level produced for each rule MUST be one of exactly three values: `safe`, `humanreview`, or `unsafe`. +- **FR-003**: The ruleset analyser MUST also produce, for each rule, a confidence level indicating how confident the analyser is in the assigned risk level. +- **FR-004**: The ruleset analyser's classification logic MUST be implemented in alignment with a new specification document, `automated_remediation_safety_algorithm_spec.md`, authored as part of this feature and stored alongside the existing algorithm specs (`specs/algorithms/`). +- **FR-005**: Remediation safety for a given violation MUST be calculated by looking up the risk level (and confidence) the ruleset analyser assigned to that violation's rule, rather than via the prior ad hoc rule-id-prefix/path heuristic. +- **FR-006**: The `--remediation-safety` CLI option (and equivalent MCP/package parameters) MUST accept all three levels: `safe`, `humanreview`, and `unsafe`. +- **FR-007**: Requesting `--remediation-safety safe` MUST produce output equivalent in scope to today's pre-feature behavior (no regression for existing users of the `safe` level). +- **FR-008**: Remediation safety information (risk level per violation, and the rule-level confidence behind it) MUST be included in both the JSON output and the human-readable output of every tool that currently reports remediation-safety/quick-fix information (CLI, MCP server tools, and any consuming packages such as the Backstage plugin where applicable). 
+- **FR-009**: When a violation's rule has no analyser result available at grading time, the system MUST default that violation to the most conservative risk level (`unsafe`) rather than omitting it or failing. +- **FR-010**: All source code, tests, type/function/tool names, package metadata, and user-facing or contributor-facing documentation across the repository MUST be updated so that no "quick fix" terminology (in any casing or separator style) remains. +- **FR-011**: The ruleset analyser's per-rule results (risk level, confidence level, and rationale) MUST be inspectable by users, in both JSON and human-readable form, independent of grading a specific API spec (i.e. "analyse this ruleset" is a capability in its own right, not only an internal implementation detail). + +### Key Entities *(include if feature involves data)* + +- **Ruleset Analyser Result**: Per analysed ruleset, a collection of per-rule entries. Each entry references a rule id and carries the rule's assigned risk level, confidence level, and a short human-readable rationale for the assignment. +- **Risk Level**: One of `safe`, `humanreview`, `unsafe` — describes how safe it is to automatically remediate a violation of a given rule without human review. +- **Confidence Level**: Describes how confident the analyser is in a rule's assigned risk level (e.g. driven by how well-known/recognizable the rule is versus how custom/ambiguous it is). +- **Remediation Safety (per violation)**: The risk level applied to a specific violation found during grading, derived from the ruleset analyser's result for that violation's rule. + +## Success Criteria *(mandatory)* + +### Measurable Outcomes + +- **SC-001**: Users grading any spec can distinguish all three remediation-safety levels (`safe`, `humanreview`, `unsafe`) in both JSON and human output, for both the built-in ruleset and a supplied custom ruleset. 
+- **SC-002**: For the built-in ruleset, every rule has a documented risk level and confidence level traceable to the `automated_remediation_safety_algorithm_spec.md` specification. +- **SC-003**: A repository-wide search for "quick fix" (any casing/separator) returns zero matches after the feature is complete. +- **SC-004**: Existing `--remediation-safety safe` users observe no behavioral change in the set of violations returned, compared to before this feature. +- **SC-005**: For an arbitrary, previously-unseen custom ruleset, the analyser completes and returns a risk and confidence level for 100% of its rules (no rule left unclassified). + +## Assumptions + +- The three risk levels (`safe`, `humanreview`, `unsafe`) and their relative ordering (in terms of caution) were fixed by Feature 11 and GOAL.md and are not renegotiated here. +- "Confidence level" is assumed to be a small ordered set (e.g. high/medium/low) rather than a continuous numeric score, consistent with how grades and other diagnostics in this project favor discrete, explainable categories over raw scores; the exact scale is defined in `automated_remediation_safety_algorithm_spec.md` during planning. +- The ruleset analyser operates on rule definitions/metadata (id, applied path/schema patterns, severity, description) rather than on a corpus of historical remediation outcomes — there is no assumption of a training/feedback loop in this feature. +- "Rationale" per rule is a short, human-readable explanation (not a separate structured field requiring its own schema beyond a text string) sufficient for users to understand why a level was assigned. +- Backstage plugin packages are in scope for surfacing remediation safety only insofar as they already surface quick-fix/remediation-safety information today; if they do not yet do so, extending them is out of scope for this feature. +- This feature does not change how a custom ruleset is supplied (file path, GitHub PAT, etc.) 
— only how its rules are risk-classified once available. diff --git a/specs/algorithms/automated_remediation_safety_algorithm_spec.md b/specs/algorithms/automated_remediation_safety_algorithm_spec.md new file mode 100644 index 0000000..db1e517 --- /dev/null +++ b/specs/algorithms/automated_remediation_safety_algorithm_spec.md @@ -0,0 +1,194 @@ +# Automated Remediation Safety Algorithm Specification + +**Version:** 1.0.0 | **Scope:** Spectral-compatible rulesets (OpenAPI 3.0+, AsyncAPI 3.0+) + +--- + +## Overview + +Determines, for every **rule** in a loaded ruleset, how risky it would be to automatically apply a fix for any violation of that rule (`riskLevel`), and how confident that determination is (`confidenceLevel`). Runs once per loaded ruleset (the "ruleset analyser"), independent of grading any specific API spec. A violation's remediation safety is then a cached lookup against this per-rule result, not a fresh computation. + +This algorithm supersedes the two-class `classifyViolation()` algorithm described in [`quick_fixes_algorithm_spec.md`](./quick_fixes_algorithm_spec.md), extending it from a binary `nonBreaking`/`breaking` split (with `unknown` as an exclusion bucket) to three first-class risk levels with an explicit confidence dimension. It consumes rule **metadata** (`ruleId`, the rule's `given` JSONPath expression(s), `then.function`) from a loaded Spectral ruleset object — it does not consume `Diagnostic[]` directly; diagnostics are matched to their rule's pre-computed result by `ruleId` when remediation safety is needed for a grading run. + +--- + +## Risk Levels + +A rule is classified into exactly one of three risk levels: + +- **`safe`** — fixing violations of this rule only adds or corrects descriptive metadata. No client, server, or contract test validates against it. Safe to apply automatically, including by an AI agent acting without per-change human review. 
+- **`humanreview`** — fixing violations of this rule is typically additive or clarifying (e.g. adding a missing `operationId`, declaring a security requirement, adjusting an `enum`/`default`), but could plausibly change generated-client behavior, routing, or validation in ways a human should confirm before applying at scale. +- **`unsafe`** — fixing violations of this rule could change request/response validation, required fields, types, or the parameter surface, or the rule's risk could not be determined with any confidence. Requires human (or explicitly-confirmed agent) review before applying. + +**Design principle (inherited from the quick-fixes algorithm):** classification is positive-evidence-only for `safe`, and conservative-by-default everywhere else. A rule becomes `safe` only when a specific signal says it's safe. A rule with no signal, or with signals spanning multiple tiers, is never assumed safe — it falls to the more cautious level. + +## Confidence Levels + +Each rule's risk level carries a confidence level: + +- **`high`** — the rule id matched a curated table (Stage 1). +- **`medium`** — the rule id was unrecognized, but the rule's `given` JSONPath unambiguously matched exactly one risk tier's segment set (Stage 2). +- **`low`** — either no recognizable signal at all (Stage 3 fallback to `unsafe`), or the `given` path matched segments from **more than one** tier (Stage 2 ambiguity, downgraded from `medium`). + +--- + +## Input & Output + +**Input:** a loaded Spectral ruleset object (`LoadedRuleset.ruleset` from `packages/api-grade-core/src/rulesets/loader.ts`), specifically its `rules` map: `{ [ruleId]: { given: string | string[], then: { function: string }, severity, description, recommended } }`. + +**Output:** `analyseRuleset(ruleset) -> RulesetAnalysis`: +- `rulesetSource: 'default' | 'custom'`, `rulesetPath?: string` — mirrors the input `LoadedRuleset`. 
+- `rules: RuleAnalysis[]` — exactly one entry per rule key in the input ruleset (no omissions — see Implementation Notes). + +Each `RuleAnalysis`: `{ ruleId, riskLevel, confidenceLevel, rationale }`. + +A second function, `getRemediationSafety(diagnostic, rulesetAnalysis) -> { riskLevel, confidenceLevel }`, performs the per-violation lookup at grading time (see Stage 4). + +--- + +## Stage 1: Curated Rule-ID Tables + +Checked first, per rule. Short-circuits the rest of classification when it matches. Three disjoint, curated tables (extending the single `RULE_ID_NON_BREAKING_PREFIXES` table from the quick-fixes algorithm into three tiers), checked most conservative first to match the tier priority in Key Decision Points: + +``` +SAFE_RULE_ID_PREFIXES = [ + "operation-description", "operation-summary", + "info-contact", "info-description", "info-license", + "oas3-examples-", "tag-description" +] + +HUMANREVIEW_RULE_ID_PREFIXES = [ + "operation-operationId", "operation-2xx-response", + "oas3-server-not-example-com", "oas3-server-trailing-slash", + "operation-security-defined" +] + +UNSAFE_RULE_ID_PREFIXES = [ + "oas3-schema", "oas3-valid-schema-example" +] + +classify_by_rule_id(ruleId): + FOR EACH prefix IN UNSAFE_RULE_ID_PREFIXES: + IF ruleId.startsWith(prefix): RETURN { riskLevel: "unsafe", confidenceLevel: "high", rationale: "rule id matched curated unsafe-prefix table" } + FOR EACH prefix IN HUMANREVIEW_RULE_ID_PREFIXES: + IF ruleId.startsWith(prefix): RETURN { riskLevel: "humanreview", confidenceLevel: "high", rationale: "rule id matched curated humanreview-prefix table" } + FOR EACH prefix IN SAFE_RULE_ID_PREFIXES: + IF ruleId.startsWith(prefix): RETURN { riskLevel: "safe", confidenceLevel: "high", rationale: "rule id matched curated safe-prefix table" } + RETURN null // no match: continue to Stage 2 +``` + +**Rationale:** identical justification to the quick-fixes algorithm's Stage 1 — these rule IDs are curated from the built-in rulesets and the curators have direct knowledge of what each rule actually validates, making the rule ID itself an authoritative signal that outranks any
generic path heuristic. + +**Maintenance note:** these tables are expected to grow as the project encounters new well-known rules (including from popular custom/community rulesets). Adding an entry is a config-only change, not an algorithm change. + +--- + +## Stage 2: Path-Segment Heuristic on the Rule's `given` + +Runs only when Stage 1 doesn't match. Inspects the rule's `given` JSONPath expression(s) — the schema location(s) the rule applies to — against three tiered, disjoint keyword sets, **most conservative checked first**: + +``` +UNSAFE_SEGMENTS = { "required", "type", "format", "parameters" } +HUMANREVIEW_SEGMENTS = { "enum", "default", "security", "servers", "operationId", "additionalProperties", "responses" } +SAFE_SEGMENTS = { + "description", "summary", "title", "contact", "license", + "termsOfService", "externalDocs", "example", "examples", "tags", "info" +} + +segments_for(given) = split the JSONPath expression(s) into path-like tokens (same tokenization the quick-fixes algorithm applies to a violation's `path` array) + +matched_tiers(rule): + tiers ← {} + FOR EACH segment IN segments_for(rule.given): + IF segment.startsWith("x-"): tiers.add("safe") + IF segment IN UNSAFE_SEGMENTS: tiers.add("unsafe") + IF segment IN HUMANREVIEW_SEGMENTS: tiers.add("humanreview") + IF segment IN SAFE_SEGMENTS: tiers.add("safe") + RETURN tiers + +classify_by_path(rule): + tiers ← matched_tiers(rule) + IF tiers is empty: RETURN null // Stage 3 fallback + level ← most conservative tier in tiers // unsafe > humanreview > safe + confidence ← (tiers.size == 1) ? "medium" : "low" + rationale ← (tiers.size == 1) + ?
"given path matched the " + level + " segment set" + : "given path matched multiple tiers (" + tiers.join(", ") + ") — conservative match, ambiguous" + RETURN { riskLevel: level, confidenceLevel: confidence, rationale } +``` + +**Rationale for tier contents:** `UNSAFE_SEGMENTS` and `SAFE_SEGMENTS` are carried over unchanged from the quick-fixes algorithm's `BREAKING_SEGMENTS`/`NON_BREAKING_SEGMENTS` (same justification: `required`/`type`/`format`/`parameters` affect contract validity; documentation/metadata fields and `x-` vendor extensions never do). `HUMANREVIEW_SEGMENTS` is new: `enum`/`default` change what values are considered valid or assumed without removing existing valid values outright; `security`/`servers` change where/how requests are authenticated or routed — operationally significant but rarely rejected by a contract test; `operationId`/`responses`/`additionalProperties` affect generated-client method names or extensibility, plausible to need a human's confirmation but not a breaking validation change in the way `required`/`type` are. + +**Rationale for ambiguity downgrade:** a rule whose `given` spans multiple tiers (e.g. `$.paths[*][*].parameters[*].description`, whose segments include both the unsafe `parameters` and the safe `description`) genuinely could not be classified with confidence by this heuristic alone — picking the conservative level avoids a false "safe", but the confidence MUST still reflect that the match itself was ambiguous, so a ruleset maintainer reviewing the analyser's output knows to look closer. + +--- + +## Stage 3: Fallback + +Runs only when neither Stage 1 nor Stage 2 produced a result (e.g. a custom rule with an unrecognized id and a `given` of `"$"` or another pattern with no matching segment). + +``` +RETURN { riskLevel: "unsafe", confidenceLevel: "low", rationale: "no recognizable rule-id or path signal" } +``` + +**Rationale:** conservative-by-default — an unanalysable rule is never assumed safe to auto-remediate.
This also guarantees SC-005 (100% of rules in any ruleset receive a classification): every rule reaches Stage 3 if Stages 1–2 don't match, so no rule is ever left unclassified. + +--- + +## Stage 4: Per-Violation Lookup (Remediation Safety) + +Used at grading time, not during ruleset analysis. Given a `Diagnostic` and a previously-computed `RulesetAnalysis`: + +``` +get_remediation_safety(diagnostic, rulesetAnalysis): + index ← map from ruleId to RuleAnalysis, built once from rulesetAnalysis.rules and cached alongside it + entry ← index.get(diagnostic.ruleId) + IF entry exists: RETURN { riskLevel: entry.riskLevel, confidenceLevel: entry.confidenceLevel } + RETURN { riskLevel: "unsafe", confidenceLevel: "low" } // FR-009: rule unanalysed at lookup time +``` + +**Rationale:** keeps grading O(1) per violation (a lookup against the cached map, not a scan of `rules`) instead of re-running the analyser per diagnostic, and preserves the conservative-default guarantee even in the edge case where the ruleset changed between analysis and grading (e.g. a remote ruleset URL was re-fetched and gained a rule). + +--- + +## Example: Mixed Rules + +``` +rules = [ + { id: "operation-description", given: "$.paths[*][*]" }, + { id: "operation-operationId", given: "$.paths[*][*]" }, + { id: "oas3-schema", given: "$" }, + { id: "custom-required-header", given: "$.paths[*][*].parameters[?(@.in=='header')].required" }, + { id: "custom-naming-convention", given: "$.paths[*]" } +] +``` + +| Rule | Stage matched | Risk | Confidence | Why | +|---|---|---|---|---| +| `operation-description` | Stage 1 (safe table) | `safe` | `high` | Rule id matched curated safe-prefix table | +| `operation-operationId` | Stage 1 (humanreview table) | `humanreview` | `high` | Rule id matched curated humanreview-prefix table | +| `oas3-schema` | Stage 1 (unsafe table) | `unsafe` | `high` | Rule id matched curated unsafe-prefix table | +| `custom-required-header` | Stage 2 (`parameters` and `required` segments) | `unsafe` | `medium` | `given` path matched the unsafe segment set only | +| `custom-naming-convention` | Stage 3 (fallback) |
`unsafe` | `low` | No recognizable rule-id or path signal | + +--- + +## Key Decision Points + +| Component | Logic | +|---|---| +| **Classification granularity** | Per rule, not per violation instance — one `RuleAnalysis` per `ruleId` in the ruleset | +| **Stage priority** | Curated rule-id table (Stage 1) → path heuristic on `given` (Stage 2) → fallback (Stage 3) | +| **Tier priority (Stage 1 and Stage 2)** | `unsafe` checked/preferred over `humanreview` over `safe` whenever ambiguity exists | +| **Confidence assignment** | `high` = curated table match; `medium` = single-tier path match; `low` = fallback or multi-tier path match | +| **Default when unanalysable** | `unsafe` / `low` confidence — never `safe` | +| **Per-violation lookup miss** | Defaults to `unsafe` / `low`, same as an unanalysable rule (FR-009) | +| **Caching** | Computed once per loaded ruleset; reused for every diagnostic in a grading run that shares that ruleset | + +--- + +## Implementation Notes + +- **Deterministic:** no randomization, timestamps, or external state; re-analysing the same ruleset always yields the same `RulesetAnalysis`. +- **Total coverage:** every rule key present in the input ruleset produces exactly one `RuleAnalysis` (Stage 3 guarantees this) — satisfies SC-005. +- **Spec-format agnostic:** operates on ruleset rule metadata, which is uniform across the OpenAPI and AsyncAPI built-in rulesets and any custom Spectral-compatible ruleset; no spec-type branching required. +- **Conservative by design:** `unsafe`/`low` is the universal fallback, not an error condition. +- **Relationship to grading:** does not affect score, letter grade, or diagnostic ordering — it is consulted only when building remediation-safety-specific output (CLI `--remediation-safety`, MCP `grade-api-remediation-safety` and `analyse-ruleset-safety`). 
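Stages 1 to 4 above can be sketched in TypeScript as follows. This is a minimal, hypothetical illustration rather than the project's implementation. The rule shape is reduced to `given`, the function names `analyseRules` and `makeRemediationSafetyLookup` are invented for the example, and the JSONPath tokenizer is an assumption: it simply extracts identifier-like names, so filter expressions such as `@.in` contribute tokens too. Stage 1 checks the tables most conservative first, following the tier priority in the Key Decision Points table. Because the tables are disjoint, the order does not change any result.

```typescript
// Hypothetical sketch of Stages 1-4; names and the simplified rule shape are
// illustrative assumptions, not the api-grade-core implementation.
type RiskLevel = 'safe' | 'humanreview' | 'unsafe';
type ConfidenceLevel = 'high' | 'medium' | 'low';

interface RuleAnalysis {
  ruleId: string;
  riskLevel: RiskLevel;
  confidenceLevel: ConfidenceLevel;
  rationale: string;
}

// Only the `given` selector is needed by this sketch.
interface RuleDefinition {
  given: string | string[];
}

// Most conservative tier first (unsafe > humanreview > safe).
const CAUTION_ORDER: readonly RiskLevel[] = ['unsafe', 'humanreview', 'safe'];

// Stage 1: curated rule-id prefix tables.
const RULE_ID_PREFIXES: Record<RiskLevel, readonly string[]> = {
  unsafe: ['oas3-schema', 'oas3-valid-schema-example'],
  humanreview: ['operation-operationId', 'operation-2xx-response', 'oas3-server-not-example-com',
    'oas3-server-trailing-slash', 'operation-security-defined'],
  safe: ['operation-description', 'operation-summary', 'info-contact', 'info-description',
    'info-license', 'oas3-examples-', 'tag-description'],
};

// Stage 2: path-segment keyword sets.
const SEGMENTS: Record<RiskLevel, ReadonlySet<string>> = {
  unsafe: new Set(['required', 'type', 'format', 'parameters']),
  humanreview: new Set(['enum', 'default', 'security', 'servers', 'operationId',
    'additionalProperties', 'responses']),
  safe: new Set(['description', 'summary', 'title', 'contact', 'license', 'termsOfService',
    'externalDocs', 'example', 'examples', 'tags', 'info']),
};

// Assumed tokenizer: identifier-like names (including `x-` extensions); `$` and `*` yield nothing.
function segmentsFor(given: string | string[]): string[] {
  const tokens: string[] = [];
  for (const expr of Array.isArray(given) ? given : [given]) {
    tokens.push(...(expr.match(/[A-Za-z_][\w-]*/g) ?? []));
  }
  return tokens;
}

function classifyRule(ruleId: string, rule: RuleDefinition): RuleAnalysis {
  // Stage 1: curated rule-id tables, most conservative first.
  for (const level of CAUTION_ORDER) {
    if (RULE_ID_PREFIXES[level].some((prefix) => ruleId.startsWith(prefix))) {
      return { ruleId, riskLevel: level, confidenceLevel: 'high',
        rationale: `rule id matched curated ${level}-prefix table` };
    }
  }
  // Stage 2: collect every tier whose keyword set matches a segment of `given`.
  const tiers = new Set<RiskLevel>();
  for (const segment of segmentsFor(rule.given)) {
    if (segment.startsWith('x-')) tiers.add('safe');
    for (const level of CAUTION_ORDER) {
      if (SEGMENTS[level].has(segment)) tiers.add(level);
    }
  }
  const chosen = CAUTION_ORDER.find((tier) => tiers.has(tier));
  if (chosen !== undefined) {
    const single = tiers.size === 1;
    return { ruleId, riskLevel: chosen, confidenceLevel: single ? 'medium' : 'low',
      rationale: single
        ? `given path matched the ${chosen} segment set`
        : `given path matched multiple tiers (${Array.from(tiers).join(', ')}) - conservative match, ambiguous` };
  }
  // Stage 3: conservative fallback.
  return { ruleId, riskLevel: 'unsafe', confidenceLevel: 'low',
    rationale: 'no recognizable rule-id or path signal' };
}

// One RuleAnalysis per rule key, so no rule is left unclassified (SC-005).
function analyseRules(rules: Record<string, RuleDefinition>): RuleAnalysis[] {
  return Object.entries(rules).map(([ruleId, rule]) => classifyRule(ruleId, rule));
}

// Stage 4: index the analysis once, then answer each lookup from the Map.
function makeRemediationSafetyLookup(analysis: readonly RuleAnalysis[]) {
  const byRuleId = new Map(analysis.map((entry) => [entry.ruleId, entry] as const));
  return (ruleId: string): Pick<RuleAnalysis, 'riskLevel' | 'confidenceLevel'> => {
    const entry = byRuleId.get(ruleId);
    return entry
      ? { riskLevel: entry.riskLevel, confidenceLevel: entry.confidenceLevel }
      : { riskLevel: 'unsafe', confidenceLevel: 'low' }; // FR-009 default
  };
}
```

Running `analyseRules` on the five rules from the Mixed Rules example should reproduce the table's risk and confidence columns. A rule with `given` `$.paths[*][*].parameters[*].description` lands on `unsafe`/`low` because its path mixes the unsafe `parameters` segment with the safe `description` segment.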
From 3ef9adb621fadb486a8c122b6973363cf272a3b7 Mon Sep 17 00:00:00 2001 From: DawMatt Date: Wed, 24 Jun 2026 09:58:47 +1000 Subject: [PATCH 02/22] Remediation safety algorithm scope clarified --- .../clarification-algorithm.md | 283 ++++++++++++++++++ .../algorithms/quick_fixes_algorithm_spec.md | 247 --------------- 2 files changed, 283 insertions(+), 247 deletions(-) create mode 100644 specs/012-remediation-safety/clarification-algorithm.md delete mode 100644 specs/algorithms/quick_fixes_algorithm_spec.md diff --git a/specs/012-remediation-safety/clarification-algorithm.md b/specs/012-remediation-safety/clarification-algorithm.md new file mode 100644 index 0000000..a8b1266 --- /dev/null +++ b/specs/012-remediation-safety/clarification-algorithm.md @@ -0,0 +1,283 @@ +# Clarification - Remediation Safety + +## Document Purpose + +This document advises how to produce the specification for the remediation safety algorithm. The resultant specification will be called `automated_remediation_safety_algorithm_spec.md` and stored in the `specs/algorithms` folder. + +The specification will provide both a human-readable description of the algorithm, including the rationale for its key aspects, and the proposed algorithm itself. This specification will drive the implementation of the algorithm. + +## Purpose + +We are checking the output of a linter (Spectral) to estimate how safe it is to remediate each of the errors, warnings, and other findings returned after linting an OpenAPI or AsyncAPI specification. The estimated remediation safety levels are safe, human review, or unsafe. Human review indicates that we cannot confidently assess the safety level, so a human should review the remediation. + +The safety level indicates how likely it is that remediation will result in `breaking changes` to the API, requiring API consumers to change their code before they can continue to use the API.
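To make the three levels concrete: if risk and confidence were each estimated as a score between 0 and 1, as the recommended approach below suggests, a conservative mapping could look like the following sketch. The thresholds and the function name `toSafetyLevel` are hypothetical placeholders, not values from this project; the algorithm specification in the previous patch uses discrete confidence levels instead.

```typescript
type SafetyLevel = 'safe' | 'humanreview' | 'unsafe';

// Hypothetical thresholds; both scores are assumed to be normalised to [0, 1].
const UNSAFE_MIN_RISK = 0.6; // at or above: the likely fix probably breaks consumers
const SAFE_MAX_RISK = 0.2; // at or below: the likely fix is probably non-breaking
const MIN_CONFIDENCE = 0.7; // below: the estimate is treated as "unclear"

function toSafetyLevel(risk: number, confidence: number): SafetyLevel {
  // Written so that NaN also fails the range check.
  if (!(risk >= 0 && risk <= 1 && confidence >= 0 && confidence <= 1)) {
    throw new RangeError('risk and confidence must be in [0, 1]');
  }
  // High estimated risk is unsafe however confident the estimate is.
  if (risk >= UNSAFE_MIN_RISK) return 'unsafe';
  // Conservative operating mode: an unclear estimate is never "safe".
  if (confidence < MIN_CONFIDENCE) return 'humanreview';
  return risk <= SAFE_MAX_RISK ? 'safe' : 'humanreview';
}
```

The key design choice is that low confidence can only push a rule toward `humanreview` or `unsafe`, never toward `safe`.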
+ +The goal of determining the safety level is to identify which API lint issue remediations could be safely automated using AI without impacting API consumers. + +## Constraints + +- The Spectral ruleset being assessed is configurable by the user. It may not have been seen before, so the remediation safety level will usually need to be estimated programmatically. +- When we estimate the remediation safety level, we do not have a modified specification available, so we cannot verify whether a specific remediation attempt introduces a breaking change relative to the original API. +- You can **estimate** risk automatically, but you **cannot guarantee correctness** for an arbitrary new Spectral ruleset with zero human intervention. The reason is that Spectral rules can use JSONPath selectors plus built-in functions, **and** they can also use **custom JavaScript functions**. That means some rules are simple and machine-interpretable, while others are effectively arbitrary code whose remediation intent cannot be derived perfectly from the rule declaration alone. +- Use a conservative operating mode where “unclear” means a safety level of human review or unsafe. + +## Recommended High Level Approach + +1. When possible, load pre-calculated risks and safety levels for known rulesets. At a minimum this will include the default ruleset, but users should also be able to pre-configure risk and safety levels for their own rulesets. + +2. When no pre-calculated risks and safety levels are available, pre-process the ruleset using an **automated risk estimator** that infers the likely remediation, and its likely consumer impact, for every rule and outputs both a **risk score** and a **confidence score**. + +3. Use the **risk score** and **confidence score** to calculate a **safety level** for every rule in the ruleset. + +4.
Allow this information to be persisted, so users can clarify and enrich the risk levels and reload them for future use. + +## Recommended High Level Estimating Model Approach + +The most effective fully automated approach is: + +1. **Parse the ruleset itself** to understand `given`, `then`, `function`, `field`, `formats`, `message`, and `description`. Spectral rules are explicitly structured that way. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) +2. **Map the targeted document locations** such as `$.paths`, parameters, request bodies, responses, channel addresses, operations, or metadata to a **contract-surface ontology** for OpenAPI and AsyncAPI. OpenAPI and AsyncAPI define these structures formally. [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html), [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/adding-operations) +3. **Infer the minimal satisfying edit** that would make the finding disappear, using rule semantics where possible. Spectral findings are generated by applying functions to selected document locations, so the rule shape gives strong clues about what kind of edit would satisfy it. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[docs.devne...cademy.com\]](https://docs.devnet-academy.com/docs/postman/api-governance/configurable-rules/spectral/index.html) +4. **Estimate whether that likely edit touches public contract elements** such as paths, parameters, request or response schemas, channel addresses, or send/receive operations. Those elements are what consumers depend on.
[\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html), [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/dynamic-channel-address), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/adding-operations) +5. **Downgrade confidence** when the rule uses a custom function or when the remediation is ambiguous. Spectral explicitly supports custom JavaScript functions, so some rules are not safely explainable by static metadata alone. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) + +For an unknown ruleset, the system should be designed to produce: + +* `estimatedRisk` +* `confidence` +* `remediationSafetyLevel` + +rather than a false claim of certainty. + +## Recommended approach for a new ruleset with no human intervention + +### 1. Treat the ruleset as data to be analysed + +Your system should ingest the ruleset itself, not just the lint output. Spectral rules expose at least: + +* `given` +* `then` +* `function` +* `field` +* `severity` +* `formats` +* `description` +* `message` [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md) + +That is enough to infer a lot about intent for many rules. + +#### Example + +A rule like this: + +```yaml +given: $.paths[*]~ +then: + function: pattern + functionOptions: + match: "^(\\/|[a-z0-9-.]+|{[a-zA-Z0-9_]+})+$" +``` + +clearly targets **path keys** and enforces a **naming convention**. Spectral’s own ruleset tutorial uses exactly this kind of example. 
[\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md) + +That tells you immediately: + +* the finding touches a **public URI surface** +* the likely remediation is a **rename** +* renaming a real path is **consumer-affecting**. OpenAPI treats paths and path templating as part of the API contract. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +*** + +### 2. Build a format-aware contract-surface ontology + +Your system needs a built-in understanding of which parts of OpenAPI and AsyncAPI are more likely to affect consumers. + +#### For OpenAPI + +OpenAPI formally describes HTTP API structure including `paths`, operations, parameters, request bodies, and responses, and says the description is used by documentation generators, code generators, and testing tools. 
[\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +So you should classify targeted locations roughly like this: + +#### High consumer-impact areas + +* `paths` keys +* path template variables +* parameters +* request bodies +* response bodies +* response codes +* security requirements +* reusable schemas referenced by the above [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +#### Medium consumer-impact areas + +* `operationId` +* tags or names used by codegen and docs +* component identifiers used in client generation [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +#### Low consumer-impact areas + +* descriptions +* contact metadata +* licence metadata +* summaries, where not used as identifiers [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) + +#### Important example + +OpenAPI path templating requires that each template expression correspond to a path parameter. That means a rule targeting path-template consistency is touching a real contract concern, even if worded as “correctness” rather than “compatibility”. [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +*** + +#### For AsyncAPI + +AsyncAPI formally describes channels, operations, messages, and action semantics such as `send` and `receive`, and also describes dynamic channel addresses and parameters. 
[\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/dynamic-channel-address), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/adding-operations) + +So you should classify AsyncAPI locations roughly like this: + +#### High consumer-impact areas + +* channel `address` +* channel parameters tied to address placeholders +* operation `action` +* operation-channel relationship +* messages and payload schemas +* reply or operation semantics if covered by the ruleset [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/dynamic-channel-address), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/adding-operations) + +#### Low consumer-impact areas + +* metadata such as descriptions and contact details [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0) + +*** + +### 3. Infer the likely remediation from the rule mechanics + +This is the key step. + +#### If the rule uses a built-in structural function + +Many rules are understandable from the rule body alone. + +Examples: + +* `truthy` on a descriptive field usually means “add the missing field” or “make it non-empty” [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) +* `pattern` usually means “rename or reformat the targeted value” [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.devne...cademy.com\]](https://docs.devnet-academy.com/docs/postman/api-governance/configurable-rules/spectral/index.html) +* `field` plus `truthy` usually means “add a missing sub-field” [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) + +That lets you classify the likely remediation for many rules without a manual rule catalogue.
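The TypeScript below is a rough sketch of how the built-in function mapping could be applied programmatically. It classifies each `then` clause of a rule by its function name. Several parts are illustrative assumptions and not part of Spectral's API: the `SpectralRuleLike`/`ThenClause` shapes, the `LikelyRemediation` labels, and the function-to-remediation table. The table covers only a subset of Spectral's core functions. Any function name not listed is treated as custom code with unknown semantics, which is the case the next subsection handles.

```typescript
// Hypothetical, simplified view of a Spectral rule: only the parts this
// sketch inspects. Real rules carry more properties (severity, message, ...).
interface ThenClause {
  function: string;
  field?: string;
}

interface SpectralRuleLike {
  given: string | string[];
  then: ThenClause | ThenClause[];
}

// Illustrative remediation classes; these names are assumptions.
type LikelyRemediation =
  | "add-missing-field"
  | "rename-or-reformat"
  | "remove-value"
  | "unclassified-core"
  | "unknown-semantics";

// Assumed mapping for a subset of Spectral's built-in (core) functions.
const CORE_FUNCTION_REMEDIATION: Record<string, LikelyRemediation> = {
  truthy: "add-missing-field",
  defined: "add-missing-field",
  pattern: "rename-or-reformat",
  casing: "rename-or-reformat",
  falsy: "remove-value",
  "undefined": "remove-value",
  enumeration: "unclassified-core",
  length: "unclassified-core",
  alphabetical: "unclassified-core",
  schema: "unclassified-core",
};

function classifyThen(clause: ThenClause): LikelyRemediation {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (Object.prototype.hasOwnProperty.call(CORE_FUNCTION_REMEDIATION, clause.function)) {
    return CORE_FUNCTION_REMEDIATION[clause.function];
  }
  // Any other function name is assumed to be custom JavaScript.
  return "unknown-semantics";
}

// A rule may declare one `then` clause or several; return each distinct class.
function inferLikelyRemediation(rule: SpectralRuleLike): LikelyRemediation[] {
  const clauses = Array.isArray(rule.then) ? rule.then : [rule.then];
  const result: LikelyRemediation[] = [];
  for (const clause of clauses) {
    const remediation = classifyThen(clause);
    if (result.indexOf(remediation) === -1) result.push(remediation);
  }
  return result;
}

// Usage: the path kebab-case rule shown earlier in this section.
const pathKebabCase: SpectralRuleLike = {
  given: "$.paths[*]~",
  then: { function: "pattern" },
};
console.log(inferLikelyRemediation(pathKebabCase)); // ["rename-or-reformat"]
```

A rule with several clauses can return more than one class. When that happens, the safest reading is to keep every class and let the later risk step pick the most invasive one.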
+ +#### If the rule uses a custom JavaScript function + +You must then assume lower explainability, because Spectral custom functions are arbitrary JavaScript functions with access to input, options, and rule context. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) + +In that case, your system should: + +* inspect metadata such as `description`, `message`, `given`, `formats`, and `path` +* inspect the custom function name and file if available +* attempt static analysis only if safe and possible +* otherwise classify as `UnknownSemantics` with low confidence. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) + +*** + +### 4. Use “minimal satisfying edit” as the risk basis + +A practical heuristic when no human is in the loop is: + +> **Estimate risk from the least invasive valid edit that would satisfy the rule.** + +Why this works: + +* lint findings do not directly encode the final remediation +* but many rules imply a **smallest-change fix** +* breakage risk is driven by what that smallest plausible fix changes + +#### Examples + +##### Example A: missing description + +A rule targeting `$.info.description` with `truthy` implies adding documentation text. That is low contract risk. [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md) + +##### Example B: path kebab-case + +A rule targeting `$.paths[*]~` with `pattern` implies renaming path keys to match the pattern. If that key is a public endpoint path, the minimal satisfying edit changes the API surface and is therefore high risk.
[\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) + +##### Example C: AsyncAPI channel parameter rule + +A rule targeting channel parameters may be satisfiable either by adding the missing parameter definition or by changing the address. The first is usually safer, because it leaves the channel address that consumers rely on unchanged. AsyncAPI documentation explicitly describes address placeholders and parameter declarations together. [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/dynamic-channel-address), [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0) + +This ambiguity is why your algorithm must output **risk plus confidence**, not just risk. + +*** + +### 5. Separate risk from confidence + +This is essential. + +A rule can be: + +* **high risk, high confidence** + Example: pattern rule directly targeting `$.paths[*]~` and requiring a literal path rename. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[swagger.io\]](https://swagger.io/specification/) + +* **high risk, low confidence** + Example: custom function on `$.paths` with an unclear remediation path. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) + +* **low risk, high confidence** + Example: truthy rule on `$.info.description`. [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md) + +Without this separation, the system will either over-block harmless changes or under-block dangerous ones.
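The TypeScript below sketches one possible way to combine the separate risk and confidence estimates into a remediation safety level. It assumes three-valued `estimatedRisk` and `confidence` scales, and the `safe` / `humanreview` / `unsafe` levels used by this feature's data model. The thresholds are an illustrative conservative policy, not a settled design:

* low confidence always falls back to human review, because the estimate itself is unclear
* high risk is never automated
* only a low-risk, high-confidence estimate is treated as safe

The rule IDs in the usage example are hypothetical.

```typescript
// Assumed scales; the thresholds below are an illustrative policy.
type EstimatedRisk = "low" | "medium" | "high";
type Confidence = "high" | "medium" | "low";
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";

interface RuleEstimate {
  ruleId: string;
  estimatedRisk: EstimatedRisk;
  confidence: Confidence;
}

function toRemediationSafetyLevel(e: RuleEstimate): RemediationSafetyLevel {
  // Low confidence means the estimate itself is unclear: ask a human.
  if (e.confidence === "low") return "humanreview";
  // A confidently high-risk remediation is never automated.
  if (e.estimatedRisk === "high") return "unsafe";
  // Only a low-risk estimate held with high confidence is automated.
  if (e.estimatedRisk === "low" && e.confidence === "high") return "safe";
  // Medium risk, or low risk with only medium confidence.
  return "humanreview";
}

// The three example rules from this section (rule IDs are hypothetical).
const examples: RuleEstimate[] = [
  { ruleId: "path-kebab-case", estimatedRisk: "high", confidence: "high" },
  { ruleId: "custom-paths-check", estimatedRisk: "high", confidence: "low" },
  { ruleId: "info-description", estimatedRisk: "low", confidence: "high" },
];
for (const e of examples) {
  console.log(`${e.ruleId}: ${toRemediationSafetyLevel(e)}`);
}
// Logs: "path-kebab-case: unsafe", "custom-paths-check: humanreview",
// "info-description: safe"
```

Checking confidence before risk means an uncertain estimate never produces either `safe` or `unsafe`. This matches the document's reading of "human review" as "we can't safely estimate the risk".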
+ +## References + +- [OpenAPI Breaking Changes: The Complete List of Rules | oasdiff](https://www.oasdiff.com/docs/breaking-changes) +- [Backward Compatibility Rules | Specmatic](https://docs.specmatic.io/contract_driven_development/backward_compatibility_rules) +- [Spectral Ruleset Functions](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) + +## Additional Context + + +### Spectral Rule Facts + +* Spectral rules are built from selectors and functions, and rulesets can extend built-in format-specific support for OpenAPI and AsyncAPI. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) +* Spectral supports custom JavaScript functions. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) +* OpenAPI descriptions define paths, operations, parameters, requests, and responses, and are used by documentation, testing, and code generation tooling. [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) +* AsyncAPI descriptions define channels, operations, actions, and channel-address parameters. [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/dynamic-channel-address), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/adding-operations) + + +### Risk estimation, not verification + +The main obstacle is that Spectral is not just a fixed linter with a closed catalogue of built-in checks. It is a **ruleset engine** where rules consist of selectors and functions, and those functions can be **custom JavaScript**. A rule can target any JSON or YAML location with JSONPath and apply either a core function or custom logic.
[\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) + +That means a new ruleset may contain: + +* obvious style rules such as “path should be kebab-case”, where the likely remediation is easy to infer from the `pattern` function and the path selector, or [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) +* arbitrary business-specific rules implemented in code, where the likely remediation may not be derivable statically with high confidence. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) + +Therefore, a zero-human system **must be conservative by design**: + +* **estimate risk automatically** +* **attach confidence** +* **default to safer outcomes when semantics are unclear**. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) + +### What changes when the ruleset is unknown? + +With a known ruleset, you can maintain a curated mapping from rule ID to likely remediation class. With an unknown ruleset, you cannot rely on rule names or prior human tagging. Instead, you must infer risk from the **rule structure** and the **API object model**. Spectral provides enough rule structure to do that in many cases because rules expose selectors, functions, and optional textual descriptions. 
[\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules), [\[docs.devne...cademy.com\]](https://docs.devnet-academy.com/docs/postman/api-governance/configurable-rules/spectral/index.html) + +So the problem becomes: + +> **Given a Spectral rule and a finding, what is the most likely minimal edit that would satisfy the rule, and would that edit alter the public contract surface?** + +That is the right abstraction for unknown rulesets. + +### Expected user-specific ruleset usage behaviour + +This tool is open source, so we can't know in advance which rulesets users will choose as the basis for grading. We need to allow for a wide range of rulesets when estimating remediation safety. + +For any given user, it is reasonable to expect they will use a small number of rulesets (typically one) for API grading. Without persistence, we would therefore end up automatically re-assessing the remediation safety of the same ruleset many times for a single user. + +We recommend adding the ability for the user to: +- persist the ruleset remediation safety assessment; +- update the ruleset remediation safety assessment to align with their interpretation of remediation safety; and +- automatically load the ruleset remediation safety assessment when next using this ruleset. + +Benefits of this feature: +- a "human review" safety level indicates we can't safely estimate the risk. The user can perform this review once and then encode the correct safety level for this rule in this ruleset.
+- improved performance by avoiding the need to re-estimate ruleset safety on every run diff --git a/specs/algorithms/quick_fixes_algorithm_spec.md b/specs/algorithms/quick_fixes_algorithm_spec.md deleted file mode 100644 index 674b762..0000000 --- a/specs/algorithms/quick_fixes_algorithm_spec.md +++ /dev/null @@ -1,247 +0,0 @@ -# Quick-Fixes Classification Algorithm Specification - -**Version:** 1.0.0 | **Scope:** OpenAPI 3.0+, AsyncAPI 3.0+ - ---- - -## Overview - -Classifies each diagnostic violation as safe to auto-fix without altering the -API's contract (`nonBreaking`), unsafe to auto-fix (`breaking`), or -unclassified (`unknown`), then builds an enriched, AI-actionable fix -suggestion for every `nonBreaking` violation. Single-pass, deterministic, -O(n) execution. Implemented by `classifyViolation()`, `buildQuickFix()`, and -`buildQuickFixOutput()` in `packages/api-grade-core/src/quick-fixes.ts`. - -This algorithm consumes the `Diagnostic[]` list produced by the diagnostic -algorithm (see [`api_diagnostic_algorithm_spec.md`](./api_diagnostic_algorithm_spec.md)) -— it runs independently, after grading, and does not affect score, grade, or -diagnostic ordering. - ---- - -## Violation Classes - -A violation is classified into exactly one of three classes: - -- **`nonBreaking`** — fixing it only adds or corrects descriptive metadata. - No client, server, or contract test validates against it. Safe to apply - automatically, including by an AI agent acting without per-change human - review. -- **`breaking`** — fixing it could change request/response validation, - required fields, types, or the parameter surface. Requires human (or - explicitly-confirmed agent) review before applying. -- **`unknown`** — neither an explicit non-breaking nor breaking signal - matched. Excluded from the quick-fix list by design (see Key Decision - Points). - -**Design principle:** classification is positive-evidence-only.
A violation -becomes `nonBreaking` only when a specific signal says it's safe — absence of -a `breaking` signal is never treated as evidence of safety. This trades -completeness (some genuinely safe fixes may land in `unknown` and be -excluded) for the guarantee that nothing in the quick-fix list is ever a -breaking change in disguise. - ---- - -## Input & Output - -**Input:** `Diagnostic` (ruleId, message, severity, path) + the spec's raw -content (string, for `currentValue` lookup). - -**Output:** -- `classifyViolation(diagnostic) → ViolationClass` ("nonBreaking" | "breaking" | "unknown") -- `buildQuickFixOutput(result, specContent) → QuickFixOutput` containing: - - `totalViolations: number` — all diagnostics, regardless of class - - `quickFixCount: number` — count of `nonBreaking` violations - - `quickFixes: QuickFix[]` — one enriched entry per `nonBreaking` violation - ---- - -## Stage 1: Rule-ID Override Check - -Checked first, and short-circuits the rest of classification when it matches. - -``` -RULE_ID_NON_BREAKING_PREFIXES = [ - "operation-description", "operation-summary", - "info-contact", "info-description", "info-license", - "oas3-examples-", "tag-description" -] - -FOR EACH prefix IN RULE_ID_NON_BREAKING_PREFIXES: - IF diagnostic.ruleId.startsWith(prefix): - RETURN "nonBreaking" -``` - -**Rationale:** these rule IDs are curated from the ruleset and exist solely to -check documentation/metadata completeness — there is no JSON Schema or -OpenAPI/AsyncAPI semantic under which satisfying one of them could change a -request or response shape. Because the rule ID is an authoritative signal -(unlike the path, which is generic), this check runs before path inspection -and wins outright when it matches, even if the violation's path also contains -a segment that Stage 2 would otherwise call `breaking`. - ---- - -## Stage 2: Path-Based Heuristic Check - -Runs only when Stage 1 doesn't match. 
Inspects the diagnostic's JSON `path` -against two keyword sets, **breaking checked before non-breaking**: - -``` -BREAKING_SEGMENTS = { "required", "type", "format" } -NON_BREAKING_SEGMENTS = { - "description", "summary", "title", "contact", "license", - "termsOfService", "externalDocs", "example", "examples", "tags", "info" -} - -is_breaking_path(path): - FOR EACH segment IN path: - IF segment IN BREAKING_SEGMENTS: RETURN true - IF segment == "parameters": RETURN true - RETURN false - -is_non_breaking_path(path): - FOR EACH segment IN path: - IF segment.startsWith("x-"): RETURN true - IF segment IN NON_BREAKING_SEGMENTS: RETURN true - RETURN false - -classify_by_path(path): - IF is_breaking_path(path): RETURN "breaking" - IF is_non_breaking_path(path): RETURN "nonBreaking" - RETURN "unknown" -``` - -**Rationale for breaking segments:** -- `required` controls whether a client must supply a field, or whether a - response is guaranteed to include one — changing it changes what's valid. -- `type` / `format` control how a value is parsed and validated — changing - either can reject previously-valid payloads or accept previously-invalid - ones. -- `parameters` is breaking *categorically*, even though some parameter-level - fixes (e.g. adding a parameter `description`) are harmless. This is a - deliberate precision-over-recall choice: parameters are the part of an API - surface most likely to have a breaking fix disguised as a small one (e.g. - flipping `required: false` to `true`), so the entire `parameters` subtree is - excluded from automatic fixing rather than attempting field-by-field safety - analysis. - -**Rationale for non-breaking segments:** these are pure documentation/metadata -properties — no validator or generated client reads them to decide whether a -payload is valid. Any violation whose path passes through one of these -segments is, by construction, about content quality, not contract shape. 
-Vendor extensions (`x-` prefix) are always non-breaking because the -OpenAPI/AsyncAPI specification requires conforming consumers to ignore -unrecognized `x-` fields. - -**Rationale for breaking-before-non-breaking ordering:** a path can contain -both kinds of segment (e.g. a `required` array of field names sitting near an -`info`-adjacent structure). Checking breaking first ensures a coincidental -documentation keyword never overrides a genuine breaking signal. - ---- - -## Stage 3: Build Quick Fix - -Runs only for violations classified `nonBreaking`. Enriches the raw diagnostic -with two fields an automated fixer (human or AI) needs that the diagnostic -alone doesn't provide. - -``` -build_quick_fix(diagnostic, specContent): - path ← diagnostic.path - location ← path.join(".") - - currentValue ← null - TRY: - parsed ← JSON.parse(specContent) - node ← walk(parsed, path) - IF node is defined: currentValue ← stringify_if_not_string(node) - CATCH: leave currentValue as null // e.g. YAML spec content - - expectedImprovement ← derive_expected_improvement(ruleId, message, path) - - RETURN { ruleId, message, severity, path, location, currentValue, expectedImprovement } -``` - -**`expectedImprovement` derivation** (`deriveExpectedImprovement`) matches on -`ruleId` substring to produce a rule-shaped, actionable instruction rather -than restating the violation message: - -| `ruleId` contains | `expectedImprovement` | -|---|---| -| `description` | "Add a `description` field that explains the purpose of this {parent entity}" | -| `summary` | "Add a `summary` field with a brief one-line description" | -| `contact` | "Add a `contact` object to the info block with name, email, or url" | -| `license` | "Add a `license` object to the info block with name and url" | -| `example` | "Add an `example` or `examples` field illustrating expected values" | -| `tag-description` | "Add a `description` field to this tag explaining its purpose" | -| *(none of the above)* | "Fix: {message}. 
Add or update `{lastPathSegment}` as required" | - -**Rationale:** this follows the project's diagnostic philosophy (Constitution -Principle VI — "Actionable next steps"): a fix suggestion should say *what to -do*, not just *what's wrong*. `currentValue` lets a fixer frame the change as -"replace X" rather than "add something," when a value is already present and -the spec is JSON; it stays `null` for YAML specs rather than risk producing a -wrong value from a failed parse. - ---- - -## Example: Mixed Violations - -``` -diagnostics = [ - { ruleId: "info-contact", path: ["info"] }, - { ruleId: "oas3-schema", path: ["paths", "/pets", "get", "parameters", 0, "required"] }, - { ruleId: "oas3-schema", path: ["components", "schemas", "Pet", "properties", "id", "type"] }, - { ruleId: "operation-tag-defined", path: ["paths", "/pets", "get", "tags"] }, - { ruleId: "custom-naming-convention", path: ["paths", "/pets"] } -] -``` - -**Classification:** - -| Diagnostic | Stage matched | Class | Why | -|---|---|---|---| -| `info-contact` @ `info` | Stage 1 (rule-ID override) | `nonBreaking` | `info-contact` is on the curated prefix list | -| `oas3-schema` @ `...parameters[0].required` | Stage 2 (breaking: `parameters` + `required`) | `breaking` | Path passes through `parameters` and `required` | -| `oas3-schema` @ `...properties.id.type` | Stage 2 (breaking: `type`) | `breaking` | Path ends in `type` | -| `operation-tag-defined` @ `...tags` | Stage 2 (non-breaking: `tags`) | `nonBreaking` | Path passes through `tags`, no breaking segment present | -| `custom-naming-convention` @ `/pets` | Stage 2 (no match) | `unknown` | No breaking or non-breaking segment in path; excluded from quick fixes | - -`quickFixCount = 2`, `totalViolations = 5`. 
- ---- - -## Key Decision Points - -| Component | Logic | -|---|---| -| **Classification priority** | Rule-ID override (Stage 1) before path heuristic (Stage 2) | -| **Path heuristic priority** | Breaking segments checked before non-breaking segments | -| **Breaking segments** | `required`, `type`, `format`, any path through `parameters` | -| **Non-breaking segments** | `description`, `summary`, `title`, `contact`, `license`, `termsOfService`, `externalDocs`, `example`, `examples`, `tags`, `info`, any `x-` prefixed segment | -| **Unmatched paths** | Classified `unknown`; excluded from `quickFixes` (not surfaced as `breaking` or `nonBreaking`) | -| **`parameters` handling** | Categorically breaking — no field-level distinction within the `parameters` subtree | -| **`currentValue` resolution** | Best-effort JSON parse of spec content; `null` on parse failure (e.g. YAML) or absent value | -| **`expectedImprovement` derivation** | `ruleId` substring match against a fixed table; generic fallback otherwise | - ---- - -## Implementation Notes - -- **Deterministic:** no randomization, timestamps, or external state. -- **Order-independent:** diagnostic list order doesn't affect any individual - classification. -- **Conservative by design:** `unknown` is a valid, expected outcome, not an - error — it deliberately excludes ambiguous violations from automatic - fixing rather than guessing. -- **Spec-format agnostic:** classification only inspects `ruleId` and `path`, - which are uniform across OpenAPI and AsyncAPI diagnostics; no spec-type - branching is needed (contrast with the diagnostic algorithm's Stage 4 - AsyncAPI-aware narrative). -- **`currentValue` limitation:** only resolves for specs supplied as JSON - content; YAML specs always yield `currentValue: null`. This is a known, - accepted limitation, not a defect. 
From d6f30ff53b1e6e6c32eff47f9ab7f43f111c552e Mon Sep 17 00:00:00 2001 From: DawMatt Date: Wed, 24 Jun 2026 10:27:04 +1000 Subject: [PATCH 03/22] Initial remediation safety clarification --- specs/012-remediation-safety/data-model.md | 46 ++++++- specs/012-remediation-safety/research.md | 24 ++++ specs/012-remediation-safety/spec.md | 19 ++- ...mated_remediation_safety_algorithm_spec.md | 127 ++++++++++++++---- 4 files changed, 185 insertions(+), 31 deletions(-) diff --git a/specs/012-remediation-safety/data-model.md b/specs/012-remediation-safety/data-model.md index 6c695fd..75e2b0e 100644 --- a/specs/012-remediation-safety/data-model.md +++ b/specs/012-remediation-safety/data-model.md @@ -8,10 +8,14 @@ Enum string: `"safe"` | `"humanreview"` | `"unsafe"`. Ordered from least to most Enum string: `"high"` | `"medium"` | `"low"`. Describes how confident the ruleset analyser is in a `RuleAnalysis`'s assigned `RemediationSafetyLevel`. -- `high` — the rule id matched a curated, known table (Stage 1 of the algorithm). -- `medium` — classification came from the generic path-segment heuristic only (Stage 2), with a single, unambiguous tier match. +- `high` — the rule id matched a curated, known table (Stage 1), the rule's `given` selected path/channel object keys directly (Stage 2a), or the entry came from a persisted user correction or bundled pre-calculated default (Stage 0). +- `medium` — classification came from the generic path-segment heuristic only (Stage 2b), with a single, unambiguous tier match. - `low` — either no recognizable signal at all (Stage 3 fallback), or the path-segment heuristic matched more than one tier (genuine ambiguity, downgraded from `medium`). +## AnalysisSource + +Enum string: `"persisted"` | `"bundled-default"` | `"curated"` | `"heuristic"` | `"fallback"`. Provenance of a `RuleAnalysis` entry — which stage of the algorithm (`automated_remediation_safety_algorithm_spec.md`) produced it. 
Not used for classification logic itself, but surfaced so a user inspecting analyser output (FR-011) can tell a human-confirmed entry (`persisted`) apart from an algorithmically-derived one. + ## RuleAnalysis One entry per rule in an analysed ruleset. @@ -22,18 +26,50 @@ One entry per rule in an analysed ruleset. | `riskLevel` | `RemediationSafetyLevel` | The assigned risk level for auto-remediating violations of this rule. | | `confidenceLevel` | `ConfidenceLevel` | Confidence in `riskLevel`. | | `rationale` | string | Short human-readable explanation of why this level/confidence was assigned (e.g. "rule id matched curated safe-prefix table" or "given path touches `parameters` and `description` — conservative match, ambiguous"). | +| `source` | `AnalysisSource` | Which stage produced this entry — see above. | **Validation rules**: every rule present in the input ruleset MUST produce exactly one `RuleAnalysis` entry (FR-001, SC-005) — no rule is ever omitted from analyser output. +## RulesetIdentity + +A stable identifier for "the same ruleset" across separate invocations (FR-014), used as the lookup/storage key for `PersistedRulesetAnalysis` and the bundled default. + +| Field | Type | Description | +|---|---|---| +| `value` | string | SHA-256 hash over the ruleset's normalized rule definitions (`ruleId`, `given`, `then.function`, `severity`, `description`, sorted by `ruleId`). | + +**Relationships**: Derived solely from ruleset *content*, never from `rulesetPath`/`rulesetUrl` — the same content hashes identically regardless of where it was loaded from; different content (even at an unchanged path/URL) hashes differently. This is what lets a persisted/bundled analysis be correctly reused or correctly invalidated (spec.md Edge Cases). + ## RulesetAnalysis | Field | Type | Description | |---|---|---| | `rulesetSource` | `"default" \| "custom"` | Mirrors `GradeResult.rulesetSource`. | | `rulesetPath` | string (optional) | Present when `rulesetSource === "custom"`. 
| -| `rules` | `RuleAnalysis[]` | One entry per rule, see above. | +| `rulesetIdentity` | string | The `RulesetIdentity.value` for this ruleset's content. | +| `rules` | `RuleAnalysis[]` | One entry per rule, see above. May be assembled from a mix of `source` values — some rules from Stage 0 (persisted/bundled), the rest from Stages 1–3. | + +**Relationships**: Computed once per distinct ruleset content (keyed by `rulesetIdentity`, see `PersistedRulesetAnalysis` below), not merely cached for the lifetime of one process — this corrects the original design, which assumed no cross-invocation persistence (see `research.md` §8, added after reassessment against `clarification-algorithm.md`). `GradeEngine` (or a caller wrapping it) holds the `RulesetAnalysis` alongside the loaded ruleset for the duration of a single run and consults it when building remediation-safety output, rather than recomputing per violation; across separate runs, the persisted/bundled layer (Stage 0) is what avoids recomputing per-rule classification for rules it covers. + +## PersistedRulesetAnalysis (new) + +A partial or full `RulesetAnalysis`, saved against a `RulesetIdentity` so it can be reloaded automatically on future runs against the same ruleset content (FR-012, FR-013). + +| Field | Type | Description | +|---|---|---| +| `rulesetIdentity` | string | The `RulesetIdentity.value` this analysis applies to. | +| `scope` | `"workspace" \| "global"` | Storage scope, reusing the precedence already established by `RulesetScope`/`RulesetResolution` for ruleset *selection* (workspace checked before global). | +| `rules` | `Record<string, RuleAnalysis>` | Keyed by `ruleId`. May cover all or only some of a ruleset's rules — uncovered rules are simply absent from the map, not represented as explicit nulls. | + +**Validation rules**: every `RuleAnalysis` value in `rules` MUST have `source: "persisted"` (entries are only ever written here via an explicit user correction, Stage 4 of the algorithm spec).
Storage location reuses the existing workspace/global config file scope already used for `RulesetConfig` (`packages/api-grade-core/src/config/ruleset-config.ts`), rather than introducing a new persistence subsystem. + +## BundledRulesetAnalysis (new) + +The built-in ruleset's pre-calculated analysis, shipped with the package (FR-012's "at a minimum the default ruleset" baseline). Same shape as `RulesetAnalysis`, generated once at release time by running Stages 1–3 over the built-in ruleset and committed alongside the package source — not regenerated at runtime. Every entry has `source: "bundled-default"`. + +## Lookup precedence (Stage 0) -**Relationships**: Computed once per loaded ruleset (`LoadedRuleset` from `rulesets/loader.ts`) and cached for the lifetime of a grading run. `GradeEngine` (or a caller wrapping it) holds the `RulesetAnalysis` alongside the loaded ruleset and consults it when building remediation-safety output, rather than recomputing per violation. +For a given `rulesetIdentity` and `ruleId`, checked in order until one matches: workspace-scoped `PersistedRulesetAnalysis` → global-scoped `PersistedRulesetAnalysis` → `BundledRulesetAnalysis` (only if this is the built-in ruleset) → fall through to Stages 1–3 of the algorithm. ## RemediationItem (was `QuickFix`) @@ -60,7 +96,7 @@ One entry per rule in an analysed ruleset. | `remediationItems` | `RemediationItem[]` | Renamed from `quickFixes`. | | `requestedLevel` | `RemediationSafetyLevel` | **New** — echoes the level that was filtered for, since there are now three possible values instead of one implicit one. | -**State transitions**: N/A — both `RulesetAnalysis` and remediation-safety output are computed fresh per grading/analysis request; nothing is persisted across requests (consistent with Feature 11's data model, which established this as a request-scoped concept). 
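A minimal TypeScript sketch of the Stage 0 lookup precedence described in this data model, assuming each persisted or bundled analysis has already been loaded into memory as a map keyed by `ruleId`. `AnalysisLayer` and `lookupStage0` are hypothetical names, the three-value confidence scale is an assumption, and file loading and scope resolution are omitted.

```typescript
// Illustrative types mirroring the data model; the confidence scale is an assumption.
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low";
type AnalysisSource = "persisted" | "bundled-default" | "curated" | "heuristic" | "fallback";

interface RuleAnalysis {
  ruleId: string;
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: ConfidenceLevel;
  rationale: string;
  source: AnalysisSource;
}

// Hypothetical: one Stage 0 layer (workspace, global, or bundled), keyed by ruleId.
interface AnalysisLayer {
  rulesetIdentity: string;
  rules: Record<string, RuleAnalysis>;
}

// Checks layers in the order given. The caller passes workspace, then global,
// then the bundled analysis (only for the built-in ruleset). Layers recorded
// for a different rulesetIdentity are skipped, so a stale analysis never applies.
// Returns undefined on a miss so the caller falls through to Stages 1-3.
function lookupStage0(
  rulesetIdentity: string,
  ruleId: string,
  layers: readonly AnalysisLayer[],
): RuleAnalysis | undefined {
  for (const layer of layers) {
    if (layer.rulesetIdentity !== rulesetIdentity) continue;
    // Own-property check, so rule ids like "constructor" never hit Object.prototype.
    if (Object.prototype.hasOwnProperty.call(layer.rules, ruleId)) {
      return layer.rules[ruleId];
    }
  }
  return undefined;
}
```

Because each `ruleId` is looked up independently, a single ruleset's analysis can end up assembled from several layers, which is the partial-coverage behaviour FR-015 asks for.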
+**State transitions**: `RemediationItem`/`RemediationSafetyOutput` are computed fresh per grading/analysis request and never persisted — only the per-rule `RuleAnalysis` entries behind them (via `RulesetAnalysis`/`PersistedRulesetAnalysis`/`BundledRulesetAnalysis`) are persisted, and only at the granularity of "one rule's classification within one ruleset's identity," not as a snapshot of any specific request's output. This corrects the original assumption (carried over from Feature 11's request-scoped data model) that nothing in this feature persists across requests — `clarification-algorithm.md` requires the per-rule analysis layer specifically to survive across requests so it is not re-estimated, and re-reviewed by a human, on every run against the same ruleset. ## Lookup / default behavior diff --git a/specs/012-remediation-safety/research.md b/specs/012-remediation-safety/research.md index 10f0bdd..75fe3c5 100644 --- a/specs/012-remediation-safety/research.md +++ b/specs/012-remediation-safety/research.md @@ -4,6 +4,8 @@ **Decision**: The ruleset analyser classifies at the **rule** level (one risk + confidence pair per `ruleId`), computed once per loaded ruleset and cached. Remediation safety for a specific violation is a lookup against this cache by `ruleId`, not a fresh per-instance computation. +**Reassessed against `clarification-algorithm.md`**: confirmed and unchanged. The clarification document's "Recommended High Level Approach" frames the problem the same way — analyse the ruleset once, "for every rule," and reuse that output — and explicitly motivates this with the same performance argument given here ("avoiding the need to estimate ruleset safety on every run"). 
One addition: the clarification document expects this cache to outlive a single process/grading run (see §8, new) — "computed once per loaded ruleset" should be read as "computed once per distinct ruleset *content*, persisted, and reused across invocations," not merely cached for the lifetime of one process. + **Rationale**: FR-001 and the spec's Key Entities explicitly scope the analyser to "for each rule". This is also what makes FR-011 possible (inspecting ruleset risk independent of grading any specific spec) and keeps the calculation O(1) per violation at grading time instead of re-running heuristics per occurrence. **Alternatives considered**: Per-violation classification (today's `classifyViolation`, keyed on the diagnostic's instance `path`) gives finer granularity for generic rules (e.g. `oas3-schema`, which fires on both breaking and cosmetic mismatches) but cannot satisfy FR-011 (no spec to derive an instance path from) and isn't "ruleset analysis" — it's per-result analysis. Rejected in favor of rule-level, accepting the coarser-granularity tradeoff called out in the spec's Edge Cases (a rule spanning levels gets one conservative classification, flagged with reduced confidence — see §3). @@ -28,12 +30,18 @@ **Alternatives considered**: Statistical/ML classification over rule descriptions — rejected: nondeterministic, costly, and violates Constitution Principle V (zero-cost prerequisites) if it requires an external model; also harder to explain ("rationale" requirement, FR-003) than a deterministic rule table. 
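To show why a deterministic rule table keeps each classification explainable, here is a rough TypeScript sketch of Stage 1 (curated rule-id prefixes, with the safe table checked first as in the algorithm spec's pseudocode). The prefix values are hypothetical placeholders, not the project's curated tables, and `classifyByRuleId` is an illustrative name.

```typescript
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";

interface CuratedMatch {
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: "high";
  rationale: string;
  source: "curated";
}

// Hypothetical placeholder prefixes, not the project's curated tables.
const SAFE_RULE_ID_PREFIXES: readonly string[] = ["info-", "operation-description"];
const HUMANREVIEW_RULE_ID_PREFIXES: readonly string[] = ["operation-operationId"];
const UNSAFE_RULE_ID_PREFIXES: readonly string[] = ["path-params"];

// Same order as the pseudocode: safe, then humanreview, then unsafe.
const CURATED_TABLES: ReadonlyArray<readonly [RemediationSafetyLevel, readonly string[]]> = [
  ["safe", SAFE_RULE_ID_PREFIXES],
  ["humanreview", HUMANREVIEW_RULE_ID_PREFIXES],
  ["unsafe", UNSAFE_RULE_ID_PREFIXES],
];

// Returns a high-confidence curated match, or undefined so Stage 2 runs next.
function classifyByRuleId(ruleId: string): CuratedMatch | undefined {
  for (const [level, prefixes] of CURATED_TABLES) {
    if (prefixes.some((prefix) => ruleId.startsWith(prefix))) {
      return {
        riskLevel: level,
        confidenceLevel: "high",
        rationale: `rule id matched curated ${level}-prefix table`,
        source: "curated",
      };
    }
  }
  return undefined;
}
```

The same input always yields the same output and the same rationale string, which is the property the ML alternative could not guarantee.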
+**Reassessed against `clarification-algorithm.md` — gap found and corrected**: the prior segment tables (`UNSAFE_SEGMENTS`/`HUMANREVIEW_SEGMENTS`/`SAFE_SEGMENTS`) were carried over unchanged from the OpenAPI-only quick-fixes algorithm and contained no AsyncAPI-specific terms, despite Constitution Principle I requiring format-neutral treatment and the clarification document dedicating an explicit section ("Build a format-aware contract-surface ontology") to AsyncAPI's high-impact surfaces — channel `address`, channel parameters, operation `action`, the operation-channel relationship, and messages/payload schemas. The clarification document's worked example (Example B: a `pattern` rule on `$.paths[*]~`, the object-*key* selector) is also not caught by plain segment-membership matching, since `paths`/`channels` were never in any tier as bare segments — adding them as ordinary segments would over-match every rule that merely reads something nested under a path/channel (including safe ones like `operation-description`). Two corrections folded into the updated algorithm spec: (a) extend `UNSAFE_SEGMENTS` with AsyncAPI's high-impact segment terms (`address`, `action`, `messages`, `payload`) alongside the existing OpenAPI ones, and add `channels`/`operations`/`reply` to `HUMANREVIEW_SEGMENTS` as broader/ambiguous AsyncAPI surfaces; (b) add a dedicated **key-selector check** ahead of generic segment matching — a `given` expression that selects object *keys* (the JSONPath Plus `~` modifier) under `paths` or `channels` is always `unsafe`/`high` confidence regardless of segment membership, since renaming a path or channel key is a public-surface rename by construction, matching the clarification document's Example B directly. + +**Reassessed against `clarification-algorithm.md`**: confirmed and unchanged. 
The document's "Recommended High Level Estimating Model Approach" independently arrives at the same separation of "risk" from "confidence" (§5 of that document) and the same rationale — a rule can be high-risk/low-confidence or low-risk/high-confidence, and conflating the two either over-blocks harmless changes or under-blocks dangerous ones. No revision needed. + ## 4. Default/fallback behavior when a rule has no analysis **Decision**: Any violation whose `ruleId` is absent from the cached `RulesetAnalysis` (e.g. ruleset changed between analysis and grading) defaults to `unsafe` / `low` confidence at lookup time, not just at Stage 3 of the analyser itself. **Rationale**: Directly required by FR-009 and the spec's first Edge Case; keeps the conservative-by-default guarantee end-to-end, not just inside the analyser. +**Reassessed against `clarification-algorithm.md`**: confirmed and unchanged, and now also governs persisted/pre-calculated entries (§8): a persisted analysis covering only some rules (FR-015) is exactly the "absent from `RulesetAnalysis`" case for the rules it doesn't cover — they fall through to Stages 1–3, then to this same lookup-miss default if still unclassified. One lookup-miss path, reused for every reason a rule might be unclassified (unanalysed ruleset, stale persisted entry, or partial persisted coverage). + ## 5. Internal naming cleanup (completing the Feature 11 deferral) **Decision**: Rename internal identifiers wholesale — no backward-compatible aliases (pre-v1.0, consistent with Feature 11's precedent): @@ -65,3 +73,19 @@ **Decision**: `--remediation-safety ` / MCP `level` parameter remains an **exact-match filter** (return only violations whose computed risk equals the requested level), not a cumulative "at or below" filter. 
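The lookup-miss default from §4 and the exact-match filter from §7 combine naturally at grading time. A TypeScript sketch, assuming a `Map` from `ruleId` to analysis and a minimal diagnostic shape whose `code` holds the rule id (as Spectral diagnostics do); `DiagnosticLike` and `filterByRemediationSafety` are hypothetical names, and the confidence scale is an assumption.

```typescript
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low"; // assumed scale

interface RuleAnalysis {
  ruleId: string;
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: ConfidenceLevel;
}

// Minimal stand-in for a Spectral diagnostic; `code` carries the rule id.
interface DiagnosticLike {
  code: string | number;
  message: string;
}

interface Safety {
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: ConfidenceLevel;
}

// §4: any rule absent from the analysis defaults to unsafe / low confidence.
function getRemediationSafety(d: DiagnosticLike, byRuleId: ReadonlyMap<string, RuleAnalysis>): Safety {
  const hit = byRuleId.get(String(d.code));
  return hit
    ? { riskLevel: hit.riskLevel, confidenceLevel: hit.confidenceLevel }
    : { riskLevel: "unsafe", confidenceLevel: "low" };
}

// §7: exact match only; "safe" never includes humanreview or unsafe results.
function filterByRemediationSafety(
  diags: readonly DiagnosticLike[],
  byRuleId: ReadonlyMap<string, RuleAnalysis>,
  requested: RemediationSafetyLevel,
): DiagnosticLike[] {
  return diags.filter((d) => getRemediationSafety(d, byRuleId).riskLevel === requested);
}
```

Since the filter compares for equality rather than ordering levels, requesting `safe` returns exactly the set it returned before this feature (FR-007, SC-004).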
**Rationale**: Preserves FR-007 (identical behavior for `safe`) without redefining what the existing parameter means; a cumulative mode is not requested by the spec and would be a separate, additive feature if ever needed (YAGNI per the constitution's Development Workflow). + +## 8. Persisting and reloading ruleset analysis (FR-012–FR-015) + +**Decision**: This is a correction to the original plan/data-model, prompted directly by `clarification-algorithm.md`'s "Recommended High Level Approach" (steps 1 and 4) and its "Expected user-specific ruleset usage behaviour" section, neither of which was reflected in the initial design. Both documents explicitly call for (a) loading pre-calculated risk/safety levels for known rulesets — "at a minimum... the default ruleset" — before running automated analysis, and (b) letting users persist corrections so they are reloaded automatically next time the same ruleset is used. The original data-model statement that "nothing is persisted across requests" directly contradicts this and is superseded. + +Design: + +- **Ruleset Identity**: a SHA-256 hash of the ruleset's normalized rule definitions (`given`/`then`/`severity`/`description` per rule, sorted by `ruleId`), not the supplied path/URL. Path/URL is not a reliable identity (the same content can be re-fetched from a different mirror; the same path can later point at edited content), but content is exactly what the analyser's classification depends on. +- **Lookup order, ahead of Stage 1**: (0a) a workspace-scoped persisted analysis for this Ruleset Identity, (0b) a global-scoped one, (0c) for the built-in ruleset specifically, a pre-calculated analysis bundled with the package at build time (so SC-007 holds even on a machine with no prior user activity), then (1)–(3) as already specified. 
This mirrors the existing `RulesetScope` precedence (`per-request` > `session` > `workspace` > `global` > `built-in`) already used for ruleset *selection*, reused here for ruleset *analysis* rather than inventing a second precedence model. +- **Partial coverage**: a persisted/bundled analysis is a map keyed by `ruleId`; only rules present in that map short-circuit Stages 1–3 for this ruleset (FR-015). This is the same mechanism as the existing lookup-miss default (§4) — a rule not in the persisted map is simply not a hit, and analysis proceeds normally for it. +- **Writing a correction**: a user-supplied override for one rule is merged into the workspace-scoped persisted analysis for that ruleset's current Identity (last-write-wins per `ruleId`); it does not require re-submitting every other rule's classification. Exact surface (CLI flag vs. MCP tool input) is an implementation detail for the planning phase, not fixed here. +- **Storage location**: reuses the existing workspace (`.api-grade/config.json`-adjacent) / global (`~/.api-grade/`) file scope already established for `RulesetConfig`, rather than a new persistence subsystem — consistent with the constitution's preference against unnecessary new infrastructure. + +**Rationale**: Both source documents treat this as integral to the algorithm, not an optional extra — the clarification document frames it as the mechanism that makes "human review" classifications actually useful over time ("the user can perform this review once and then encode the correct safety level for this rule in this ruleset") and as a deliberate performance optimization (avoiding re-estimation on every run for the common case of a user repeatedly grading against the same one or two rulesets). 
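The content-hash Ruleset Identity could be computed roughly as follows, using Node's built-in `node:crypto` module. This follows the normalization given in the algorithm spec (sort by `ruleId`, then `ruleId|given|then.function|severity|description`); the separators used for multi-expression `given` arrays and between rules are assumptions, and `computeRulesetIdentity` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Minimal stand-in for the rule metadata listed in the algorithm spec's Input.
interface SpectralRuleLike {
  given: string | string[];
  then: { function: string };
  severity?: string | number;
  description?: string;
}

function computeRulesetIdentity(rules: Record<string, SpectralRuleLike>): string {
  const normalized = Object.entries(rules)
    // Sort by ruleId so key insertion order never affects the hash.
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([ruleId, rule]) => {
      const given = Array.isArray(rule.given) ? rule.given.join(",") : rule.given;
      return [ruleId, given, rule.then.function, String(rule.severity ?? ""), rule.description ?? ""].join("|");
    });
  // Joining rules with "\n" is an assumption; the spec only says "joined".
  return createHash("sha256").update(normalized.join("\n"), "utf8").digest("hex");
}
```

One design note: a plain `|` delimiter is ambiguous when a field itself contains `|` (for example a `given` of `a|b` with function `c` versus `given` `a` with function `b|c`). Hashing `JSON.stringify` of a per-rule tuple would remove that collision while keeping the same sort-then-hash structure.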
+ +**Alternatives considered**: Keying persisted analyses by `rulesetPath`/`rulesetUrl` instead of content hash — rejected, because it would either wrongly reuse a stale analysis after the file at that path changes, or wrongly treat the same ruleset fetched via two different paths/mirrors as unrelated; a content hash gets both cases right. A dedicated new config subsystem instead of extending the existing workspace/global config scope — rejected as unnecessary duplication of infrastructure that already solves "where does per-user, per-workspace state live" for this exact project. diff --git a/specs/012-remediation-safety/spec.md b/specs/012-remediation-safety/spec.md index b65498d..04f9169 100644 --- a/specs/012-remediation-safety/spec.md +++ b/specs/012-remediation-safety/spec.md @@ -40,6 +40,7 @@ A team supplies its own custom Spectral ruleset. They want to understand, rule b 1. **Given** a ruleset is analysed, **When** the analysis completes, **Then** every rule in the ruleset has an assigned risk level (`safe`, `humanreview`, or `unsafe`) and a confidence level for that assignment. 2. **Given** a rule the analyser cannot confidently classify (e.g. a custom rule with no recognizable id pattern or schema path), **When** it is analysed, **Then** it is still assigned a risk level (defaulting to the more conservative `unsafe` or `humanreview`) but with a low confidence indicator, rather than being silently omitted. 3. **Given** the analyser's output for a ruleset, **When** a user inspects it (JSON or human format), **Then** they can see, per rule, the risk level, confidence level, and a brief rationale. +4. **Given** a user disagrees with a rule's assigned risk level and persists a correction for it, **When** the same ruleset is analysed again in a later, separate invocation, **Then** the corrected risk level is returned for that rule without requiring the correction to be re-applied. 
--- @@ -64,6 +65,9 @@ A new contributor or documentation reader should encounter "remediation safety" - How does the analyser handle a rule that legitimately spans multiple risk levels depending on context (e.g. a rule that sometimes flags a breaking change and sometimes a cosmetic one)? The specification (`automated_remediation_safety_algorithm_spec.md`) must define how such rules are classified — at the rule level, the analyser assigns one risk/confidence pair per rule; finer-grained, per-violation distinctions are out of scope for the analyser itself. - What happens when a custom/private ruleset is supplied that the analyser has never seen before? It must still produce a complete classification (risk + confidence) for every rule, with confidence honestly reflecting the lack of prior knowledge, rather than failing the grading run. - What happens to existing consumers (CI pipelines, scripts) that depend on today's binary `safe` vs "not safe" filtering? `--remediation-safety safe` (and equivalent MCP/package usage) must continue to mean exactly what it means today; the new levels are additive, not a breaking redefinition of `safe`. +- What happens when a user corrects a rule's risk level for a ruleset, and that ruleset's content later changes (rules added, removed, or edited)? The correction is keyed to the Ruleset Identity (content-derived); if the identity no longer matches, the persisted analysis is treated as not found for the changed rule(s) rather than silently misapplied, and those rule(s) fall back to automated analysis (FR-014). +- What happens when only some rules in a ruleset have a pre-calculated or user-corrected entry? Every rule still gets a classification (FR-015) — covered rules use the persisted entry, the rest go through automated analysis as if no persisted analysis existed for them. 
+- What happens on a machine/environment where no persisted analysis exists yet for a ruleset the user has never used before (including the very first run for any user)? The system performs the existing automated analysis (Stages 1–3) and proceeds normally; persistence is an optimization and trust-building mechanism, never a precondition for producing output. ## Requirements *(mandatory)* @@ -80,13 +84,19 @@ A new contributor or documentation reader should encounter "remediation safety" - **FR-009**: When a violation's rule has no analyser result available at grading time, the system MUST default that violation to the most conservative risk level (`unsafe`) rather than omitting it or failing. - **FR-010**: All source code, tests, type/function/tool names, package metadata, and user-facing or contributor-facing documentation across the repository MUST be updated so that no "quick fix" terminology (in any casing or separator style) remains. - **FR-011**: The ruleset analyser's per-rule results (risk level, confidence level, and rationale) MUST be inspectable by users, in both JSON and human-readable form, independent of grading a specific API spec (i.e. "analyse this ruleset" is a capability in its own right, not only an internal implementation detail). +- **FR-012**: Before running the automated analysis stages, the system MUST check for a previously computed or pre-calculated ruleset analysis for the loaded ruleset and, when found, use it directly instead of recomputing from rule metadata. At minimum, the built-in ruleset MUST ship with such a pre-calculated analysis. +- **FR-013**: Users MUST be able to persist a correction to a rule's risk level (and, implicitly, raise its confidence to reflect human confirmation) for a specific ruleset, such that the corrected classification is automatically loaded and used the next time that same ruleset is analysed or graded against, without requiring the correction to be re-entered. 
+- **FR-014**: The system MUST be able to recognize "the same ruleset" across separate invocations for the purpose of FR-012/FR-013 reuse, even when the ruleset is supplied by file path or URL rather than by an identical in-memory reference, so that a pre-calculated or user-corrected analysis is not silently skipped or, conversely, wrongly reused against ruleset content that has actually changed. +- **FR-015**: When a persisted or pre-calculated analysis only covers some of the rules in the currently loaded ruleset (e.g. the ruleset gained rules since the analysis was captured), the system MUST still produce a complete classification for every rule (FR-001/SC-005) — covered rules use the persisted/pre-calculated entry, uncovered rules fall through to automated analysis. ### Key Entities *(include if feature involves data)* -- **Ruleset Analyser Result**: Per analysed ruleset, a collection of per-rule entries. Each entry references a rule id and carries the rule's assigned risk level, confidence level, and a short human-readable rationale for the assignment. +- **Ruleset Analyser Result**: Per analysed ruleset, a collection of per-rule entries. Each entry references a rule id and carries the rule's assigned risk level, confidence level, a short human-readable rationale for the assignment, and where that assignment came from (freshly computed, pre-calculated/bundled, or a persisted user correction). - **Risk Level**: One of `safe`, `humanreview`, `unsafe` — describes how safe it is to automatically remediate a violation of a given rule without human review. -- **Confidence Level**: Describes how confident the analyser is in a rule's assigned risk level (e.g. driven by how well-known/recognizable the rule is versus how custom/ambiguous it is). +- **Confidence Level**: Describes how confident the analyser is in a rule's assigned risk level (e.g. 
driven by how well-known/recognizable the rule is versus how custom/ambiguous it is, or whether a human has explicitly confirmed it). - **Remediation Safety (per violation)**: The risk level applied to a specific violation found during grading, derived from the ruleset analyser's result for that violation's rule. +- **Ruleset Identity**: A stable identifier for "the same ruleset" across separate invocations, used to look up and store pre-calculated/persisted analyses (FR-012–FR-014). Derived from ruleset content, not from the path/URL it was supplied with, so that the identity survives a ruleset being re-fetched or relocated, and so that genuinely changed content is not mistaken for an unchanged ruleset. +- **Persisted Ruleset Analysis**: A ruleset analysis (in full or in part, e.g. just the rules a user has corrected) saved against a Ruleset Identity so it can be reloaded automatically on future runs against that same ruleset, without re-prompting the user or re-running automated analysis for the rules it covers. ## Success Criteria *(mandatory)* @@ -97,6 +107,8 @@ A new contributor or documentation reader should encounter "remediation safety" - **SC-003**: A repository-wide search for "quick fix" (any casing/separator) returns zero matches after the feature is complete. - **SC-004**: Existing `--remediation-safety safe` users observe no behavioral change in the set of violations returned, compared to before this feature. - **SC-005**: For an arbitrary, previously-unseen custom ruleset, the analyser completes and returns a risk and confidence level for 100% of its rules (no rule left unclassified). +- **SC-006**: A user-corrected risk level for a rule in a given ruleset is honored (returned without re-running automated analysis for that rule) on a subsequent, separate invocation against the same ruleset content, and is no longer honored if that ruleset's content subsequently changes. 
+- **SC-007**: The built-in ruleset's analysis is available without any per-rule automated computation having to run at request time (served from a pre-calculated/bundled result), for both the CLI and MCP surfaces. ## Assumptions @@ -106,3 +118,6 @@ A new contributor or documentation reader should encounter "remediation safety" - "Rationale" per rule is a short, human-readable explanation (not a separate structured field requiring its own schema beyond a text string) sufficient for users to understand why a level was assigned. - Backstage plugin packages are in scope for surfacing remediation safety only insofar as they already surface quick-fix/remediation-safety information today; if they do not yet do so, extending them is out of scope for this feature. - This feature does not change how a custom ruleset is supplied (file path, GitHub PAT, etc.) — only how its rules are risk-classified once available. +- Persistence of pre-calculated/user-corrected ruleset analyses (FR-012–FR-015) reuses the same storage scope model (workspace/global config) already established for ruleset selection (`RulesetConfig`/`RulesetResolution`), rather than introducing a new persistence layer; the exact file/location is an implementation detail for planning, not a renegotiation of scope. +- Ruleset Identity is computed from ruleset content (e.g. a content hash), not from the path or URL the ruleset was supplied with, so the same ruleset retains its persisted analysis if relocated, and a different ruleset at the same path does not wrongly inherit one. +- "Persist a correction" (FR-013) refers to the data being saved for reuse; *how* a user supplies that correction (a CLI flag, an MCP tool call, hand-editing a config file) is an implementation detail for planning, not fixed by this specification. 
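Independent of how a correction is supplied, the saving step (merged into the scoped analysis for the current Ruleset Identity, last-write-wins per `ruleId`, as described in `research.md` §8) might look like this in TypeScript. The in-memory `Map` is a hypothetical stand-in for the workspace/global config files, the extra `store` parameter is an addition to the algorithm spec's `persistRuleAnalysisOverride` signature, and storing corrections at `high` confidence is an assumption based on the spec's "typically `high`" note.

```typescript
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type Scope = "workspace" | "global";

interface PersistedRuleAnalysis {
  ruleId: string;
  riskLevel: RemediationSafetyLevel;
  confidenceLevel: "high"; // assumption: human-confirmed entries are stored as high
  rationale: string;
  source: "persisted";
}

interface PersistedRulesetAnalysis {
  rulesetIdentity: string;
  scope: Scope;
  rules: Record<string, PersistedRuleAnalysis>;
}

// Hypothetical in-memory store standing in for the workspace/global config files.
type PersistedStore = Map<string, PersistedRulesetAnalysis>;

const storeKey = (scope: Scope, rulesetIdentity: string): string => `${scope}:${rulesetIdentity}`;

// Merges one rule's correction; other rules already stored for this identity are kept.
function persistRuleAnalysisOverride(
  store: PersistedStore,
  rulesetIdentity: string,
  ruleId: string,
  riskLevel: RemediationSafetyLevel,
  scope: Scope,
): void {
  const key = storeKey(scope, rulesetIdentity);
  const existing: PersistedRulesetAnalysis = store.get(key) ?? { rulesetIdentity, scope, rules: {} };
  const entry: PersistedRuleAnalysis = {
    ruleId,
    riskLevel,
    confidenceLevel: "high",
    rationale: "persisted user correction",
    source: "persisted",
  };
  // Last write wins for this ruleId.
  store.set(key, { ...existing, rules: { ...existing.rules, [ruleId]: entry } });
}
```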
diff --git a/specs/algorithms/automated_remediation_safety_algorithm_spec.md b/specs/algorithms/automated_remediation_safety_algorithm_spec.md index db1e517..65ca6b2 100644 --- a/specs/algorithms/automated_remediation_safety_algorithm_spec.md +++ b/specs/algorithms/automated_remediation_safety_algorithm_spec.md @@ -10,6 +10,8 @@ Determines, for every **rule** in a loaded ruleset, how risky it would be to aut This algorithm supersedes the two-class `classifyViolation()` algorithm described in [`quick_fixes_algorithm_spec.md`](./quick_fixes_algorithm_spec.md), extending it from a binary `nonBreaking`/`breaking` split (with `unknown` as an exclusion bucket) to three first-class risk levels with an explicit confidence dimension. It consumes rule **metadata** (`ruleId`, the rule's `given` JSONPath expression(s), `then.function`) from a loaded Spectral ruleset object — it does not consume `Diagnostic[]` directly; diagnostics are matched to their rule's pre-computed result by `ruleId` when remediation safety is needed for a grading run. +Before running automated classification, the analyser first checks for a previously persisted or pre-calculated analysis for this exact ruleset's content (Stage 0) — both a baseline for the built-in ruleset and a place for users to durably correct classifications they disagree with, so the same ruleset is not re-estimated, and re-reviewed, from scratch on every run. + --- ## Risk Levels @@ -34,15 +36,40 @@ Each rule's risk level carries a confidence level: ## Input & Output -**Input:** a loaded Spectral ruleset object (`LoadedRuleset.ruleset` from `packages/api-grade-core/src/rulesets/loader.ts`), specifically its `rules` map: `{ [ruleId]: { given: string | string[], then: { function: string }, severity, description, recommended } }`. 
+**Input:** a loaded Spectral ruleset object (`LoadedRuleset.ruleset` from `packages/api-grade-core/src/rulesets/loader.ts`), specifically its `rules` map: `{ [ruleId]: { given: string | string[], then: { function: string }, severity, description, recommended } }`. A **Ruleset Identity** (see Stage 0) is derived from this same input and used to look up any persisted or bundled pre-calculated analysis. **Output:** `analyseRuleset(ruleset) -> RulesetAnalysis`: - `rulesetSource: 'default' | 'custom'`, `rulesetPath?: string` — mirrors the input `LoadedRuleset`. +- `rulesetIdentity: string` — the content hash described in Stage 0. - `rules: RuleAnalysis[]` — exactly one entry per rule key in the input ruleset (no omissions — see Implementation Notes). -Each `RuleAnalysis`: `{ ruleId, riskLevel, confidenceLevel, rationale }`. +Each `RuleAnalysis`: `{ ruleId, riskLevel, confidenceLevel, rationale, source }`, where `source` is one of `'persisted'` (Stage 0a/0b), `'bundled-default'` (Stage 0c), `'curated'` (Stage 1), `'heuristic'` (Stage 2), or `'fallback'` (Stage 3) — see Data Model for the full enum. + +A second function, `getRemediationSafety(diagnostic, rulesetAnalysis) -> { riskLevel, confidenceLevel }`, performs the per-violation lookup at grading time (see Stage 5). + +A third function, `persistRuleAnalysisOverride(rulesetIdentity, ruleId, riskLevel, scope)`, writes a user correction for one rule into the persisted-analysis store at the given scope (`workspace` | `global`), to be picked up by Stage 0 on future runs. + +--- + +## Stage 0: Persisted / Pre-Calculated Lookup + +Runs **before** Stage 1, once per loaded ruleset (not per rule). Computes the ruleset's **Ruleset Identity** — a SHA-256 hash over its rule definitions, normalized as `sortBy(ruleId) -> ruleId + '|' + given + '|' + then.function + '|' + severity + '|' + description`, joined and hashed. 
Identity is derived from rule *content*, never from `rulesetPath`/`rulesetUrl`, so relocating or re-fetching an unchanged ruleset still hits the cache, and editing a ruleset at a stable path correctly misses it. + +Checked in order; the first hit for a given `ruleId` is used, and per-`ruleId` lookup continues independently — a ruleset's overall analysis can be assembled from a mix of sources: -A second function, `getRemediationSafety(diagnostic, rulesetAnalysis) -> { riskLevel, confidenceLevel }`, performs the per-violation lookup at grading time (see Stage 4). +``` +0a. workspace-scoped persisted analysis for this rulesetIdentity (if present) +0b. global-scoped persisted analysis for this rulesetIdentity (if present) +0c. bundled pre-calculated analysis, ONLY if this is the built-in ruleset +``` + +For each `ruleId` covered by a hit: `RETURN { riskLevel, confidenceLevel, rationale, source: 'persisted' | 'bundled-default' }` using the stored values as-is (a `persisted` entry's `confidenceLevel` is whatever was stored when the correction was made — typically `high`, since a human confirmed it). + +Any `ruleId` **not** covered by Stage 0 (no persisted/bundled entry exists for it — including the common case of an entirely new custom ruleset) falls through to Stage 1. This is the same "lookup miss → keep going" behavior as the per-violation lookup in Stage 5; Stage 0 never blocks or fails the analysis, it only short-circuits the rules it has prior knowledge of (FR-012, FR-015). + +**Rationale:** directly required by `clarification-algorithm.md`'s "Recommended High Level Approach" (steps 1 and 4): load pre-calculated risk/safety for known rulesets first, and let users persist their own corrections so the same ruleset doesn't need re-estimating — and re-confirming via human review — on every run. 
Content-hash identity (rather than path/URL) is what makes "the same ruleset" a meaningful, stable lookup key across separate invocations, possibly on different machines or after the ruleset file/URL has moved (FR-014).
+
+**Bundled pre-calculated analysis for the built-in ruleset:** shipped with the package (generated by running Stages 1–3 once over the built-in ruleset at release time and committing the result), so `ruleset-analysis`/`analyse-ruleset-safety` against the built-in ruleset never requires per-rule computation at request time (SC-007), and so the built-in ruleset itself satisfies the "at a minimum the default ruleset" baseline the clarification document calls for.

 ---

@@ -69,11 +96,11 @@ UNSAFE_RULE_ID_PREFIXES = [

 FOR EACH rule IN ruleset.rules:
   FOR EACH prefix IN SAFE_RULE_ID_PREFIXES:
-    IF rule.id.startsWith(prefix): RETURN { riskLevel: "safe", confidenceLevel: "high", rationale: "rule id matched curated safe-prefix table" }
+    IF rule.id.startsWith(prefix): RETURN { riskLevel: "safe", confidenceLevel: "high", rationale: "rule id matched curated safe-prefix table", source: "curated" }
   FOR EACH prefix IN HUMANREVIEW_RULE_ID_PREFIXES:
-    IF rule.id.startsWith(prefix): RETURN { riskLevel: "humanreview", confidenceLevel: "high", rationale: "rule id matched curated humanreview-prefix table" }
+    IF rule.id.startsWith(prefix): RETURN { riskLevel: "humanreview", confidenceLevel: "high", rationale: "rule id matched curated humanreview-prefix table", source: "curated" }
   FOR EACH prefix IN UNSAFE_RULE_ID_PREFIXES:
-    IF rule.id.startsWith(prefix): RETURN { riskLevel: "unsafe", confidenceLevel: "high", rationale: "rule id matched curated unsafe-prefix table" }
+    IF rule.id.startsWith(prefix): RETURN { riskLevel: "unsafe", confidenceLevel: "high", rationale: "rule id matched curated unsafe-prefix table", source: "curated" }
 ```

 **Rationale:** identical justification to the quick-fixes algorithm's Stage 1 — these rule IDs are curated from the built-in rulesets and the curators have direct knowledge of what each rule actually validates, making the rule ID itself an authoritative signal that outranks any generic path heuristic.
@@ -84,11 +111,36 @@ FOR EACH rule IN ruleset.rules:

 ## Stage 2: Path-Segment Heuristic on the Rule's `given`

-Runs only when Stage 1 doesn't match. Inspects the rule's `given` JSONPath expression(s) — the schema location(s) the rule applies to — against three tiered, disjoint keyword sets, **most conservative checked first**:
+Runs only when neither Stage 0 nor Stage 1 produced a result. Two checks, in order: a structural **key-selector check** (2a), then the **segment-membership heuristic** (2b). Both operate on the rule's `given` JSONPath expression(s) — the schema location(s) the rule applies to — and are format-aware, covering both OpenAPI's and AsyncAPI's contract-surface ontologies per `clarification-algorithm.md`'s "Build a format-aware contract-surface ontology" guidance.
+
+### Stage 2a: Key-Selector Check

 ```
-UNSAFE_SEGMENTS = { "required", "type", "format", "parameters" }
-HUMANREVIEW_SEGMENTS = { "enum", "default", "security", "servers", "operationId", "additionalProperties", "responses" }
+IS_KEY_SELECTOR(given) = given matches a JSONPath Plus object-key selector
+                         (the trailing "~" modifier, e.g. "$.paths[*]~", "$.channels[*]~")
+
+FOR EACH given_expr IN rule.given:
+  IF IS_KEY_SELECTOR(given_expr) AND given_expr contains "paths" or "channels" as the selected collection:
+    RETURN { riskLevel: "unsafe", confidenceLevel: "high", rationale: "given path selects path/channel object keys directly — any satisfying edit renames a public path or channel", source: "heuristic" }
+```
+
+**Rationale:** a rule targeting the *keys* of `paths`/`channels` (e.g. a kebab-case naming convention, `clarification-algorithm.md`'s Example B) cannot be satisfied without renaming a real, public path or channel — by construction this is the riskiest, highest-confidence case the heuristic can recognize, and segment-membership matching alone would otherwise miss or under-classify it (`paths` is deliberately *not* a 2b segment, and `channels` appears only in `HUMANREVIEW_SEGMENTS`, never in `UNSAFE_SEGMENTS`, since most rules with `paths`/`channels` somewhere in their `given` — e.g. `operation-description`, which reaches `$.paths[*][*].description` — are not targeting the key itself and must not be over-classified as unsafe).
+
+### Stage 2b: Segment-Membership Heuristic
+
+```
+UNSAFE_SEGMENTS = {
+  // OpenAPI: request/response validity and parameter surface
+  "required", "type", "format", "parameters",
+  // AsyncAPI: channel address and message/payload surface
+  "address", "action", "messages", "payload"
+}
+HUMANREVIEW_SEGMENTS = {
+  // OpenAPI: additive-but-operationally-significant
+  "enum", "default", "security", "servers", "operationId", "additionalProperties", "responses",
+  // AsyncAPI: broader/ambiguous operation and channel-level surfaces
+  "channels", "operations", "reply"
+}
 SAFE_SEGMENTS = {
   "description", "summary", "title", "contact", "license", "termsOfService",
   "externalDocs", "example", "examples", "tags", "info"
@@ -113,10 +165,10 @@ classify_by_path(rule):
   rationale ← (tiers.size == 1) ?
       "given path matched the " + level + " segment set"
     : "given path matched multiple tiers (" + tiers.join(", ") + ") — conservative match, ambiguous"
-  RETURN { riskLevel: level, confidenceLevel: confidence, rationale }
+  RETURN { riskLevel: level, confidenceLevel: confidence, rationale, source: "heuristic" }
 ```

-**Rationale for tier contents:** `UNSAFE_SEGMENTS` and `SAFE_SEGMENTS` are carried over unchanged from the quick-fixes algorithm's `BREAKING_SEGMENTS`/`NON_BREAKING_SEGMENTS` (same justification: `required`/`type`/`format`/`parameters` affect contract validity; documentation/metadata fields and `x-` vendor extensions never do). `HUMANREVIEW_SEGMENTS` is new: `enum`/`default` change what values are considered valid or assumed without removing existing valid values outright; `security`/`servers` change where/how requests are authenticated or routed — operationally significant but rarely rejected by a contract test; `operationId`/`responses`/`additionalProperties` affect generated-client method names or extensibility, plausible to need a human's confirmation but not a breaking validation change in the way `required`/`type` are.
+**Rationale for tier contents:** the OpenAPI portions of `UNSAFE_SEGMENTS` and `SAFE_SEGMENTS` are carried over unchanged from the quick-fixes algorithm's `BREAKING_SEGMENTS`/`NON_BREAKING_SEGMENTS` (same justification: `required`/`type`/`format`/`parameters` affect contract validity; documentation/metadata fields and `x-` vendor extensions never do). The AsyncAPI additions mirror `clarification-algorithm.md`'s ontology directly: channel `address` and operation `action` are AsyncAPI's equivalent of an OpenAPI path/HTTP-method — changing either changes where/how a consumer connects, so they sit in `UNSAFE_SEGMENTS` alongside `parameters`; `messages`/`payload` are AsyncAPI's equivalent of request/response bodies, so they join `required`/`type`/`format`. `HUMANREVIEW_SEGMENTS` covers fields that are typically additive or operationally significant without invalidating an existing consumer outright: `enum`/`default` change what values are considered valid or assumed without removing existing valid values; `security`/`servers` change where/how requests are authenticated or routed; `operationId`/`responses`/`additionalProperties` affect generated-client method names or extensibility; `channels`/`operations`/`reply`, matched only here (not in `UNSAFE_SEGMENTS`), cover rules that reach the channel/operation collections broadly without the Stage 2a key-selector shape — plausible to need a human's confirmation but not unambiguously a breaking validation change.

 **Rationale for ambiguity downgrade:** a rule whose `given` spans multiple tiers (e.g. applies broadly to a schema with both `description` and `required` reachable beneath it) genuinely could not be classified with confidence by this heuristic alone — picking the conservative level avoids a false "safe", but the confidence MUST still reflect that the match itself was ambiguous, so a ruleset maintainer reviewing the analyser's output knows to look closer.

@@ -124,17 +176,34 @@ classify_by_path(rule):

 ## Stage 3: Fallback

-Runs only when neither Stage 1 nor Stage 2 produced a result (e.g. a custom rule with an unrecognized id and a `given` of `"$"` or another pattern with no matching segment).
+Runs only when none of Stage 0, Stage 1, or Stage 2 produced a result (e.g. a custom rule with an unrecognized id and a `given` of `"$"` or another pattern with no matching segment, and no persisted/bundled entry for it).
+
+```
+RETURN { riskLevel: "unsafe", confidenceLevel: "low", rationale: "no recognizable rule-id or path signal", source: "fallback" }
+```
+
+**Rationale:** conservative-by-default — an unanalyzable rule is never assumed safe to auto-remediate. This also guarantees SC-005 (100% of rules in any ruleset receive a classification): every rule reaches Stage 3 if Stages 0–2 don't match, so no rule is ever left unclassified.
+
+---
+
+## Stage 4: Persisting a Correction
+
+Not part of the per-rule classification pipeline — an explicit, user-initiated action (FR-013) that writes into the store Stage 0 reads from:

 ```
-RETURN { riskLevel: "unsafe", confidenceLevel: "low", rationale: "no recognizable rule-id or path signal" }
+persist_rule_analysis_override(rulesetIdentity, ruleId, riskLevel, scope):
+  confidenceLevel ← "high"  // a human has explicitly confirmed this level
+  rationale ← "user-confirmed override"
+  entry ← { ruleId, riskLevel, confidenceLevel, rationale, source: "persisted" }
+  write entry into the (workspace | global) persisted-analysis store, keyed by (rulesetIdentity, ruleId)
+  // last-write-wins per ruleId; does not require re-submitting other rules' entries (FR-015)
 ```

-**Rationale:** conservative-by-default — an unanalyzable rule is never assumed safe to auto-remediate. This also guarantees SC-005 (100% of rules in any ruleset receive a classification): every rule reaches Stage 3 if Stages 1–2 don't match, so no rule is ever left unclassified.
+**Rationale:** this is the mechanism `clarification-algorithm.md` describes as letting "the user perform this review once and then encode the correct safety level for this rule in this ruleset" — turning a one-time `humanreview`/`unsafe`, low-confidence determination into a durable `high`-confidence one for every future run against that same ruleset content.

 ---

-## Stage 4: Per-Violation Lookup (Remediation Safety)
+## Stage 5: Per-Violation Lookup (Remediation Safety)

 Used at grading time, not during ruleset analysis. Given a `Diagnostic` and a previously-computed `RulesetAnalysis`:

@@ -145,7 +214,7 @@ get_remediation_safety(diagnostic, rulesetAnalysis):
   RETURN { riskLevel: "unsafe", confidenceLevel: "low" }  // FR-009: rule unanalysed at lookup time
 ```

-**Rationale:** keeps grading O(1) per violation (a map lookup) instead of re-running the analyser per diagnostic, and preserves the conservative-default guarantee even in the edge case where the ruleset changed between analysis and grading (e.g. a remote ruleset URL was re-fetched and gained a rule).
+**Rationale:** keeps grading O(1) per violation (a map lookup) instead of re-running the analyser per diagnostic, and preserves the conservative-default guarantee even in the edge case where the ruleset changed between analysis and grading (e.g. a remote ruleset URL was re-fetched and gained a rule, or a persisted analysis only covered some of the ruleset's rules).

 ---

@@ -157,7 +226,11 @@ rules = [
   { id: "operation-operationId", given: "$.paths[*][*]" },
   { id: "oas3-schema", given: "$" },
   { id: "custom-required-header", given: "$.paths[*][*].parameters[?(@.in=='header')].required" },
-  { id: "custom-naming-convention", given: "$.paths[*]" }
+  { id: "custom-naming-convention", given: "$.paths[*]~" },
+  { id: "custom-channel-rename", given: "$.channels[*]~" },
+  { id: "custom-channel-address", given: "$.channels[*].address" },
+  { id: "custom-no-signal", given: "$.x-custom-thing" },
+  { id: "previously-reviewed-rule", given: "$.unrecognizedExtension" }
 ]
 ```

@@ -166,8 +239,12 @@ rules = [
 | `operation-description` | Stage 1 (safe table) | `safe` | `high` | Rule id matched curated safe-prefix table |
 | `operation-operationId` | Stage 1 (humanreview table) | `humanreview` | `high` | Rule id matched curated humanreview-prefix table |
 | `oas3-schema` | Stage 1 (unsafe table) | `unsafe` | `high` | Rule id matched curated unsafe-prefix table |
-| `custom-required-header` | Stage 2 (`required` segment) | `unsafe` | `medium` | `given` path matched the unsafe segment set only |
-| `custom-naming-convention` | Stage 3 (fallback) | `unsafe` | `low` | No recognizable rule-id or path signal |
+| `custom-required-header` | Stage 2b (`required` segment) | `unsafe` | `medium` | `given` path matched the unsafe segment set only |
+| `custom-naming-convention` | Stage 2a (path key-selector) | `unsafe` | `high` | `given` selects path object keys directly |
+| `custom-channel-rename` | Stage 2a (channel key-selector) | `unsafe` | `high` | `given` selects channel object keys directly |
+| `custom-channel-address` | Stage 2b (`address` segment) | `unsafe` | `medium` | `given` path matched the unsafe segment set only (AsyncAPI channel address) |
+| `custom-no-signal` | Stage 3 (fallback) | `unsafe` | `low` | No recognizable rule-id or path signal |
+| `previously-reviewed-rule` | Stage 0 (persisted) | *(whatever the user set)* | `high` | User-confirmed override from a prior run against this same ruleset content |

 ---

@@ -176,19 +253,21 @@ rules = [
 | Component | Logic |
 |---|---|
 | **Classification granularity** | Per rule, not per violation instance — one `RuleAnalysis` per `ruleId` in the ruleset |
-| **Stage priority** | Curated rule-id table (Stage 1) → path heuristic on `given` (Stage 2) → fallback (Stage 3) |
+| **Stage priority** | Persisted/bundled lookup (Stage 0) → curated rule-id table (Stage 1) → key-selector + path heuristic on `given` (Stage 2) → fallback (Stage 3) |
+| **Ruleset identity** | Content hash over normalized rule definitions (`ruleId`, `given`, `then.function`, `severity`, `description`) — never the supplied path/URL |
 | **Tier priority (Stage 1 and Stage 2)** | `unsafe` checked/preferred over `humanreview` over `safe` whenever ambiguity exists |
-| **Confidence assignment** | `high` = curated table match; `medium` = single-tier path match; `low` = fallback or multi-tier path match |
+| **Confidence assignment** | `high` = persisted/curated/key-selector match; `medium` = single-tier path-segment match; `low` = fallback or multi-tier path match |
 | **Default when unanalysable** | `unsafe` / `low` confidence — never `safe` |
 | **Per-violation lookup miss** | Defaults to `unsafe` / `low`, same as an unanalysable rule (FR-009) |
-| **Caching** | Computed once per loaded ruleset; reused for every diagnostic in a grading run that shares that ruleset |
+| **Caching** | Computed once per distinct ruleset content (by identity); persisted across invocations, not just cached for one process (FR-012) |
+| **Partial persisted coverage** | A persisted/bundled analysis covering only some `ruleId`s short-circuits just those rules; the rest proceed through Stages 1–3 normally (FR-015) |

 ---

 ## Implementation Notes

-- **Deterministic:** no randomization, timestamps, or external state; re-analysing the same ruleset always yields the same `RulesetAnalysis`.
+- **Deterministic for a given input state:** re-analysing the same ruleset content with the same persisted-analysis store always yields the same `RulesetAnalysis`. Stage 0 deliberately introduces store-dependence — a user's persisted correction is *supposed* to change the outcome on later runs; it does not undermine determinism, since the store itself is also keyed by content and changes only on an explicit user action (Stage 4).
 - **Total coverage:** every rule key present in the input ruleset produces exactly one `RuleAnalysis` (Stage 3 guarantees this) — satisfies SC-005.
-- **Spec-format agnostic:** operates on ruleset rule metadata, which is uniform across the OpenAPI and AsyncAPI built-in rulesets and any custom Spectral-compatible ruleset; no spec-type branching required.
+- **Spec-format agnostic:** operates on ruleset rule metadata, which is uniform across the OpenAPI and AsyncAPI built-in rulesets and any custom Spectral-compatible ruleset; Stage 2's segment sets and key-selector check are explicitly format-aware (covering both OpenAPI and AsyncAPI contract-surface terms) but require no spec-type branching in the algorithm itself.
 - **Conservative by design:** `unsafe`/`low` is the universal fallback, not an error condition.
 - **Relationship to grading:** does not affect score, letter grade, or diagnostic ordering — it is consulted only when building remediation-safety-specific output (CLI `--remediation-safety`, MCP `grade-api-remediation-safety` and `analyse-ruleset-safety`).

From 2c19124271574c3dcd67cc952bb9fa612ddc8041 Mon Sep 17 00:00:00 2001
From: DawMatt
Date: Wed, 24 Jun 2026 12:35:40 +1000
Subject: [PATCH 04/22] Safety assessment persistence clarification

---
 specs/012-remediation-safety/data-model.md    | 37 ++++++-----
 specs/012-remediation-safety/research.md      | 24 +++++---
 specs/012-remediation-safety/spec.md          | 30 ++++++---
 ...mated_remediation_safety_algorithm_spec.md | 61 ++++++++++++-------
 4 files changed, 97 insertions(+), 55 deletions(-)

diff --git a/specs/012-remediation-safety/data-model.md b/specs/012-remediation-safety/data-model.md
index 75e2b0e..e1f7e56 100644
--- a/specs/012-remediation-safety/data-model.md
+++ b/specs/012-remediation-safety/data-model.md
@@ -30,15 +30,15 @@ One entry per rule in an analysed ruleset.

 **Validation rules**: every rule present in the input ruleset MUST produce exactly one `RuleAnalysis` entry (FR-001, SC-005) — no rule is ever omitted from analyser output.

-## RulesetIdentity
+## RuleFingerprint

-A stable identifier for "the same ruleset" across separate invocations (FR-014), used as the lookup/storage key for `PersistedRulesetAnalysis` and the bundled default.
+A stable identifier for "this exact rule definition" (FR-014), used as part of the lookup/storage key for persisted entries. Deliberately scoped to one rule, not the whole ruleset — see `research.md` §8 for why a whole-ruleset hash was rejected (it would invalidate an entire shared analysis file on any single rule edit).

 | Field | Type | Description |
 |---|---|---|
-| `value` | string | SHA-256 hash over the ruleset's normalized rule definitions (`ruleId`, `given`, `then.function`, `severity`, `description`, sorted by `ruleId`). |
+| `value` | string | Hash over one rule's own content: `ruleId`, `given`, `then.function`, `severity`, `description`. |

-**Relationships**: Derived solely from ruleset *content*, never from `rulesetPath`/`rulesetUrl` — the same content hashes identically regardless of where it was loaded from; different content (even at an unchanged path/URL) hashes differently. This is what lets a persisted/bundled analysis be correctly reused or correctly invalidated (spec.md Edge Cases).
+**Relationships**: Computed independently per `ruleId`; never derived from `rulesetPath`/`rulesetUrl`. A `RuleAnalysis` entry stored with a given `RuleFingerprint` is only reused if the rule's current `RuleFingerprint` still matches (spec.md Edge Cases — stale entries are skipped per-rule, not per-ruleset).

 ## RulesetAnalysis

@@ -46,22 +46,31 @@ A stable identifier for "the same ruleset" across separate invocations (FR-014),
 |---|---|---|
 | `rulesetSource` | `"default" \| "custom"` | Mirrors `GradeResult.rulesetSource`. |
 | `rulesetPath` | string (optional) | Present when `rulesetSource === "custom"`. |
-| `rulesetIdentity` | string | The `RulesetIdentity.value` for this ruleset's content. |
-| `rules` | `RuleAnalysis[]` | One entry per rule, see above. May be assembled from a mix of `source` values — some rules from Stage 0 (persisted/bundled), the rest from Stages 1–3. |
+| `rules` | `RuleAnalysis[]` | One entry per rule, see above. May be assembled from a mix of `source` values — some rules from Stage 0 (persisted/shared/bundled), the rest from Stages 1–3. |

-**Relationships**: Computed once per distinct ruleset content (keyed by `rulesetIdentity`, see `PersistedRulesetAnalysis` below), not merely cached for the lifetime of one process — this corrects the original design, which assumed no cross-invocation persistence (see `research.md` §8, added after reassessment against `clarification-algorithm.md`). `GradeEngine` (or a caller wrapping it) holds the `RulesetAnalysis` alongside the loaded ruleset for the duration of a single run and consults it when building remediation-safety output, rather than recomputing per violation; across separate runs, the persisted/bundled layer (Stage 0) is what avoids recomputing per-rule classification for rules it covers.
+**Relationships**: Computed once per distinct rule definition (keyed by `RuleFingerprint`, see `SharedRulesetAnalysis`/`PersonalRulesetAnalysisOverride` below), not merely cached for the lifetime of one process — this corrects the original design, which assumed no cross-invocation persistence (see `research.md` §8, added after reassessment against `clarification-algorithm.md` and further revised after direct user input on the sharing requirement). `GradeEngine` (or a caller wrapping it) holds the `RulesetAnalysis` alongside the loaded ruleset for the duration of a single run and consults it when building remediation-safety output, rather than recomputing per violation; across separate runs — and across different users pointed at the same ruleset — the persisted/shared/bundled layer (Stage 0) is what avoids recomputing per-rule classification for rules it covers.

-## PersistedRulesetAnalysis (new)
+## SharedRulesetAnalysis (new)

-A partial or full `RulesetAnalysis`, saved against a `RulesetIdentity` so it can be reloaded automatically on future runs against the same ruleset content (FR-012, FR-013).
+A partial or full `RulesetAnalysis`, **colocated with the ruleset itself** (FR-016/FR-017) so it can be reloaded automatically on future runs against the same ruleset — by anyone who can read that ruleset, not just the user who created it.
+
+| Field | Type | Description |
+|---|---|---|
+| `location` | string | Derived deterministically from the ruleset's own path/URL via a fixed naming convention (e.g. appending a suffix to the ruleset's filename) — never a separately-tracked or registered location. |
+| `rules` | `Record` | Keyed by `ruleId`. May cover all or only some of a ruleset's rules — uncovered rules are simply absent from the map. Each entry carries the `RuleFingerprint.value` it was captured against, for staleness detection. |
+
+**Validation rules**: every `RuleAnalysis` value in `rules` MUST have `source: "persisted"`. For a local ruleset this file lives on disk next to the ruleset and is read/written directly; for a GitHub-hosted ruleset it is *read* via the same `resolveRuleset`/`fetchRulesetContent` flow already used to fetch the ruleset (FR-017), but is never *written* automatically (FR-019) — see `PersonalRulesetAnalysisOverride` for what happens when a write is requested against a non-writable location.
+
+## PersonalRulesetAnalysisOverride (new, replaces the original PersistedRulesetAnalysis)
+
+A user-local correction (FR-018) that does not modify `SharedRulesetAnalysis`. Reuses the existing workspace/global config-file scope already established for `RulesetConfig` (`packages/api-grade-core/src/config/ruleset-config.ts`), narrowed to this role rather than serving as the primary persistence mechanism.

 | Field | Type | Description |
 |---|---|---|
-| `rulesetIdentity` | string | The `RulesetIdentity.value` this analysis applies to. |
 | `scope` | `"workspace" \| "global"` | Storage scope, reusing the precedence already established by `RulesetScope`/`RulesetResolution` for ruleset *selection* (workspace checked before global). |
-| `rules` | `Record` | Keyed by `ruleId`. May cover all or only some of a ruleset's rules — uncovered rules are simply absent from the map, not represented as explicit nulls. |
+| `rules` | `Record` | Keyed by `ruleId`, same shape as `SharedRulesetAnalysis.rules`. |

-**Validation rules**: every `RuleAnalysis` value in `rules` MUST have `source: "persisted"` (entries are only ever written here via an explicit user correction, Stage 4 of the algorithm spec). Storage location reuses the existing workspace/global config file scope already used for `RulesetConfig` (`packages/api-grade-core/src/config/ruleset-config.ts`), rather than introducing a new persistence subsystem.
+**Validation rules**: every `RuleAnalysis` value in `rules` MUST have `source: "persisted"`. This is also the write target when a correction is requested against a ruleset whose location is not writable (e.g. GitHub-hosted) — see Stage 4 of the algorithm spec for the exact fallback behavior.

 ## BundledRulesetAnalysis (new)

@@ -69,7 +78,7 @@ The built-in ruleset's pre-calculated analysis, shipped with the package (FR-012

 ## Lookup precedence (Stage 0)

-For a given `rulesetIdentity` and `ruleId`, checked in order until one matches: workspace-scoped `PersistedRulesetAnalysis` → global-scoped `PersistedRulesetAnalysis` → `BundledRulesetAnalysis` (only if this is the built-in ruleset) → fall through to Stages 1–3 of the algorithm.
+For a given `ruleId`, checked in order until one matches a current `RuleFingerprint`: workspace-scoped `PersonalRulesetAnalysisOverride` → global-scoped `PersonalRulesetAnalysisOverride` → `SharedRulesetAnalysis` colocated with the ruleset → `BundledRulesetAnalysis` (only if this is the built-in ruleset) → fall through to Stages 1–3 of the algorithm. Personal overrides are checked first because they represent the most specific, most recently expressed intent for that user.
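The Stage 0 precedence above can be sketched in TypeScript as a walk over the stored layers in order. An entry is reused only when its recorded fingerprint still matches the rule's current one. This is a minimal sketch under stated assumptions, not the project's actual API. The following are all hypothetical:

- the type names;
- the `thenFunction` field, which stands in for `then.function`;
- the `StoredEntry` wrapper, which pairs each `RuleAnalysis` with the fingerprint it was captured against;
- the use of SHA-256 over a JSON array of the fingerprinted fields.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes loosely mirroring the data-model tables; not the real project types.
type RiskLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low";

interface RuleDefinition {
  ruleId: string;
  given: string | string[];
  thenFunction: string; // stands in for the rule's `then.function`
  severity: string;
  description: string;
}

interface RuleAnalysis {
  ruleId: string;
  riskLevel: RiskLevel;
  confidenceLevel: ConfidenceLevel;
  rationale: string;
  source: "persisted" | "curated" | "heuristic" | "fallback";
}

// Assumption: each persisted entry records the fingerprint it was captured against.
interface StoredEntry {
  fingerprint: string;
  analysis: RuleAnalysis;
}
type AnalysisLayer = Record<string, StoredEntry>; // keyed by ruleId

// Hash only this rule's own content, so editing one rule invalidates only its own entry.
function ruleFingerprint(rule: RuleDefinition): string {
  const canonical = JSON.stringify([
    rule.ruleId,
    rule.given,
    rule.thenFunction,
    rule.severity,
    rule.description,
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}

// Layers in precedence order: workspace override, global override,
// shared colocated file, bundled default (only for the built-in ruleset).
function stage0Lookup(
  rule: RuleDefinition,
  layers: Array<AnalysisLayer | undefined>,
): RuleAnalysis | undefined {
  const current = ruleFingerprint(rule);
  for (const layer of layers) {
    const entry = layer?.[rule.ruleId];
    // A stale entry (fingerprint mismatch) is skipped for this rule only.
    if (entry !== undefined && entry.fingerprint === current) {
      return entry.analysis;
    }
  }
  return undefined; // miss: continue with Stages 1-3
}
```

A caller would pass something like `[workspace, global, shared, isBuiltIn ? bundled : undefined]`. Keeping the layers as an ordered list, rather than merging the maps up front, means a stale higher-precedence entry falls through to a still-valid lower one instead of masking it.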

 ## RemediationItem (was `QuickFix`)

@@ -96,7 +105,7 @@ For a given `rulesetIdentity` and `ruleId`, checked in order until one matches:
 | `remediationItems` | `RemediationItem[]` | Renamed from `quickFixes`. |
 | `requestedLevel` | `RemediationSafetyLevel` | **New** — echoes the level that was filtered for, since there are now three possible values instead of one implicit one. |

-**State transitions**: `RemediationItem`/`RemediationSafetyOutput` are computed fresh per grading/analysis request and never persisted — only the per-rule `RuleAnalysis` entries behind them (via `RulesetAnalysis`/`PersistedRulesetAnalysis`/`BundledRulesetAnalysis`) are persisted, and only at the granularity of "one rule's classification within one ruleset's identity," not as a snapshot of any specific request's output. This corrects the original assumption (carried over from Feature 11's request-scoped data model) that nothing in this feature persists across requests — `clarification-algorithm.md` requires the per-rule analysis layer specifically to survive across requests so it is not re-estimated, and re-reviewed by a human, on every run against the same ruleset.
+**State transitions**: `RemediationItem`/`RemediationSafetyOutput` are computed fresh per grading/analysis request and never persisted — only the per-rule `RuleAnalysis` entries behind them (via `SharedRulesetAnalysis`/`PersonalRulesetAnalysisOverride`/`BundledRulesetAnalysis`) are persisted, and only at the granularity of "one rule's classification, keyed by that rule's fingerprint," not as a snapshot of any specific request's output. This corrects the original assumption (carried over from Feature 11's request-scoped data model) that nothing in this feature persists across requests — `clarification-algorithm.md`, and the project's own goal of letting a team share judgements rather than each configuring their own copy, both require the per-rule analysis layer to survive across requests and across users.
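The split described in the state-transitions note can be sketched as a request-time function. Only the per-rule `RuleAnalysis` is persisted; `RemediationItem`/`RemediationSafetyOutput` are rebuilt per request. The function indexes the analysis once, looks each diagnostic up by rule id, and defaults a miss to `unsafe`/`low` (FR-009). All type and function names are hypothetical. The sketch also assumes:

- each diagnostic carries its rule id in a `code` field, in the style of Spectral diagnostics;
- `RemediationItem` holds the fields shown;
- `requestedLevel` selects items at exactly that level, not cumulatively.

```typescript
// Hypothetical types; field names beyond those in the tables above are assumptions.
type RiskLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low";

interface RuleAnalysis {
  ruleId: string;
  riskLevel: RiskLevel;
  confidenceLevel: ConfidenceLevel;
}

interface Diagnostic {
  code: string | number; // assumed to hold the rule id
  message: string;
}

interface RemediationItem {
  ruleId: string;
  message: string;
  riskLevel: RiskLevel;
  confidenceLevel: ConfidenceLevel;
}

interface RemediationSafetyOutput {
  requestedLevel: RiskLevel;
  remediationItems: RemediationItem[];
}

type Safety = Pick<RuleAnalysis, "riskLevel" | "confidenceLevel">;

// Per-violation lookup: one map access; an unanalysed rule defaults to unsafe/low (FR-009).
function getRemediationSafety(diagnostic: Diagnostic, byRuleId: Map<string, RuleAnalysis>): Safety {
  const hit = byRuleId.get(String(diagnostic.code));
  return hit
    ? { riskLevel: hit.riskLevel, confidenceLevel: hit.confidenceLevel }
    : { riskLevel: "unsafe", confidenceLevel: "low" };
}

// Built fresh for each request; nothing in here is persisted.
function buildRemediationSafetyOutput(
  diagnostics: Diagnostic[],
  rules: RuleAnalysis[],
  requestedLevel: RiskLevel,
): RemediationSafetyOutput {
  const byRuleId = new Map(rules.map((r): [string, RuleAnalysis] => [r.ruleId, r]));
  const remediationItems = diagnostics
    .map((d): RemediationItem => ({
      ruleId: String(d.code),
      message: d.message,
      ...getRemediationSafety(d, byRuleId),
    }))
    .filter((item) => item.riskLevel === requestedLevel);
  return { requestedLevel, remediationItems };
}
```

Echoing `requestedLevel` in the output is what lets a consumer tell three filtered views apart, since an empty `remediationItems` list is otherwise ambiguous.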

 ## Lookup / default behavior

diff --git a/specs/012-remediation-safety/research.md b/specs/012-remediation-safety/research.md
index 75fe3c5..63e89ce 100644
--- a/specs/012-remediation-safety/research.md
+++ b/specs/012-remediation-safety/research.md
@@ -4,7 +4,7 @@

 **Decision**: The ruleset analyser classifies at the **rule** level (one risk + confidence pair per `ruleId`), computed once per loaded ruleset and cached. Remediation safety for a specific violation is a lookup against this cache by `ruleId`, not a fresh per-instance computation.

-**Reassessed against `clarification-algorithm.md`**: confirmed and unchanged. The clarification document's "Recommended High Level Approach" frames the problem the same way — analyse the ruleset once, "for every rule," and reuse that output — and explicitly motivates this with the same performance argument given here ("avoiding the need to estimate ruleset safety on every run"). One addition: the clarification document expects this cache to outlive a single process/grading run (see §8, new) — "computed once per loaded ruleset" should be read as "computed once per distinct ruleset *content*, persisted, and reused across invocations," not merely cached for the lifetime of one process.
+**Reassessed against `clarification-algorithm.md`**: confirmed and unchanged. The clarification document's "Recommended High Level Approach" frames the problem the same way — analyse the ruleset once, "for every rule," and reuse that output — and explicitly motivates this with the same performance argument given here ("avoiding the need to estimate ruleset safety on every run"). One addition: the clarification document expects this cache to outlive a single process/grading run, and (per direct user input, see §8) to be shared across an entire team, not just across one user's invocations — "computed once per loaded ruleset" should be read as "computed once per rule definition, persisted colocated with the ruleset, and reused by anyone who reads that ruleset," not merely cached for the lifetime of one process or one user's machine.

 **Rationale**: FR-001 and the spec's Key Entities explicitly scope the analyser to "for each rule". This is also what makes FR-011 possible (inspecting ruleset risk independent of grading any specific spec) and keeps the calculation O(1) per violation at grading time instead of re-running heuristics per occurrence.

@@ -74,18 +74,24 @@

 **Rationale**: Preserves FR-007 (identical behavior for `safe`) without redefining what the existing parameter means; a cumulative mode is not requested by the spec and would be a separate, additive feature if ever needed (YAGNI per the constitution's Development Workflow).

-## 8. Persisting and reloading ruleset analysis (FR-012–FR-015)
+## 8. Persisting and reloading ruleset analysis (FR-012–FR-019)

 **Decision**: This is a correction to the original plan/data-model, prompted directly by `clarification-algorithm.md`'s "Recommended High Level Approach" (steps 1 and 4) and its "Expected user-specific ruleset usage behaviour" section, neither of which was reflected in the initial design. Both documents explicitly call for (a) loading pre-calculated risk/safety levels for known rulesets — "at a minimum... the default ruleset" — before running automated analysis, and (b) letting users persist corrections so they are reloaded automatically next time the same ruleset is used. The original data-model statement that "nothing is persisted across requests" directly contradicts this and is superseded.

+**Revised again, per direct user input**: the first version of this decision made the workspace/global per-user config the primary store, keyed by a whole-ruleset content hash. The user reviewing this plan pointed out the actual goal is broader than personal reuse: colleagues in the same organisation should be able to share one set of judgements without each separately configuring their own copy, and the mechanism needs to work for both local and GitHub-hosted rulesets. A per-user config file cannot do that — it lives on one machine and is never seen by a colleague pointed at the same ruleset. The design below replaces the *primary* mechanism with a colocated, shared file, and narrows the original workspace/global design to a secondary, personal-override role.
+
 Design:

-- **Ruleset Identity**: a SHA-256 hash of the ruleset's normalized rule definitions (`given`/`then`/`severity`/`description` per rule, sorted by `ruleId`), not the supplied path/URL. Path/URL is not a reliable identity (the same content can be re-fetched from a different mirror; the same path can later point at edited content), but content is exactly what the analyser's classification depends on.
-- **Lookup order, ahead of Stage 1**: (0a) a workspace-scoped persisted analysis for this Ruleset Identity, (0b) a global-scoped one, (0c) for the built-in ruleset specifically, a pre-calculated analysis bundled with the package at build time (so SC-007 holds even on a machine with no prior user activity), then (1)–(3) as already specified. This mirrors the existing `RulesetScope` precedence (`per-request` > `session` > `workspace` > `global` > `built-in`) already used for ruleset *selection*, reused here for ruleset *analysis* rather than inventing a second precedence model.
-- **Partial coverage**: a persisted/bundled analysis is a map keyed by `ruleId`; only rules present in that map short-circuit Stages 1–3 for this ruleset (FR-015). This is the same mechanism as the existing lookup-miss default (§4) — a rule not in the persisted map is simply not a hit, and analysis proceeds normally for it.
-- **Writing a correction**: a user-supplied override for one rule is merged into the workspace-scoped persisted analysis for that ruleset's current Identity (last-write-wins per `ruleId`); it does not require re-submitting every other rule's classification. Exact surface (CLI flag vs. MCP tool input) is an implementation detail for the planning phase, not fixed here.
-- **Storage location**: reuses the existing workspace (`.api-grade/config.json`-adjacent) / global (`~/.api-grade/`) file scope already established for `RulesetConfig`, rather than a new persistence subsystem — consistent with the constitution's preference against unnecessary new infrastructure.
+- **Primary mechanism — colocated Shared Ruleset Analysis (FR-016/FR-017)**: the persisted analysis for a ruleset lives next to the ruleset itself, at a location derived deterministically from the ruleset's own path/URL by a naming convention (e.g. appending a fixed suffix to the ruleset's filename). Presence/absence is therefore a direct lookup at that derived location — no separate index or registry needs to be consulted or kept in sync. For a local ruleset, this is a sibling file in the same directory. For a GitHub-hosted ruleset, this is a sibling file at the same repo path/ref, fetched through the *same* resolution and auth flow already used to fetch the ruleset (`resolveRuleset`/`fetchRulesetContent`, reusing whatever `AuthConfig`/GitHub PAT was already supplied) — no new auth concept. Because it is colocated and (for GitHub-hosted rulesets) typically version-controlled in the same repository, anyone who can read the ruleset automatically sees the same shared analysis — satisfying the sharing goal without any per-user setup (SC-008).
+- **Per-rule Fingerprint, not a whole-ruleset hash**: each entry in the shared file is keyed by `ruleId` and carries a fingerprint of that rule's own content (`given`/`then.function`/`severity`/`description`). Recomputing the current ruleset's rules and comparing fingerprints per-`ruleId` means one edited rule invalidates only its own entry, not the whole shared file — important precisely because this file is meant to be shared and incrementally maintained by a team over time, not regenerated wholesale on every edit. (A whole-ruleset hash, as in the original design, would invalidate every entry the moment any one rule changed — too coarse for a file meant to accumulate team knowledge.) +- **Secondary mechanism — Personal Ruleset Analysis Override (FR-018)**: the original workspace/global config-file design (`RulesetConfig`-style storage) is retained, but repurposed: it now holds only a user's *personal* corrections, which take precedence over the shared colocated file and over automated analysis, without writing to the shared file. This covers the case where a user disagrees with the team's shared judgement, or can read but not write the ruleset's location (e.g. a GitHub-hosted ruleset they have only read access to). Lookup precedence: personal override (workspace, then global) → shared colocated analysis → bundled default (built-in ruleset only) → Stages 1–3. +- **Partial coverage**: both the shared file and the personal override are maps keyed by `ruleId`; only rules present in a given map short-circuit Stages 1–3 for this ruleset (FR-015). This is the same mechanism as the existing lookup-miss default (§4) — a rule not in either map is simply not a hit, and analysis proceeds normally for it. 
+- **Writing a correction (FR-013/FR-018/FR-019)**: for a *local* ruleset, persisting "to the shared file" is a normal local file write — low-risk and reversible, no different from any other local file edit the tool makes, and the resulting diff is something the user can review/commit/PR through their normal process. For a *GitHub-hosted* (or otherwise remote, non-writable) ruleset, the tool does not push a commit automatically (FR-019) — pushing a change to a shared, version-controlled artifact that other colleagues rely on is a different risk class than editing a local file, and should go through the user's normal review process, not a silent automated write. In that case the correction is recorded as a Personal Override locally, and the tool can additionally emit the updated shared-file content for the user to commit themselves.
-**Rationale**: Both source documents treat this as integral to the algorithm, not an optional extra — the clarification document frames it as the mechanism that makes "human review" classifications actually useful over time ("the user can perform this review once and then encode the correct safety level for this rule in this ruleset") and as a deliberate performance optimization (avoiding re-estimation on every run for the common case of a user repeatedly grading against the same one or two rulesets).
+**Rationale**: Both source documents (`clarification-algorithm.md` and the user's own architectural input) converge on the same underlying need — avoid re-deriving (and re-human-reviewing) the same ruleset's classifications repeatedly, for every person who uses it. Colocation is the simpler, more direct way to satisfy "the user can perform this review once and then encode the correct safety level for this rule in this ruleset" *for an entire team*, rather than once per person. It also reuses existing project capability (the ruleset resolution/fetch path already handles "give me a file or URL, with optional auth") rather than inventing a new shared-storage concept.
-**Alternatives considered**: Keying persisted analyses by `rulesetPath`/`rulesetUrl` instead of content hash — rejected, because it would either wrongly reuse a stale analysis after the file at that path changes, or wrongly treat the same ruleset fetched via two different paths/mirrors as unrelated; a content hash gets both cases right. A dedicated new config subsystem instead of extending the existing workspace/global config scope — rejected as unnecessary duplication of infrastructure that already solves "where does per-user, per-workspace state live" for this exact project.
+**Alternatives considered**:
+- Keying the shared file by a whole-ruleset content hash (the original decision) — rejected per the fingerprint discussion above: too coarse for a file intended to be incrementally maintained.
+- A purely per-user workspace/global store as the *only* mechanism (the original decision) — rejected: cannot satisfy the stated organisational-sharing goal; a teammate pointed at the same ruleset would never see it.
+- A separate index/registry file mapping ruleset identities to analysis file locations — rejected in favor of a pure naming-convention derivation: an index is one more thing that can drift out of sync with the rulesets it describes, where a deterministic derived path cannot.
+- Automatically committing corrections back to a GitHub-hosted ruleset's repository — rejected: writing to a shared, remote artifact without an explicit human review step is a meaningfully different risk than a local file edit, and not something this feature should do silently.
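The colocation decision makes the shared-analysis location a pure function of the ruleset's own location, with no index to consult. What follows is a minimal TypeScript sketch under stated assumptions, not the project's code. The suffix `.remediation-safety.json`, the JSON file format, and the names `sharedAnalysisLocation`, `loadSharedAnalysis` and `FetchContent` are all hypothetical, because the spec leaves the naming convention and file format to planning. `fetchContent` stands in for whatever the existing `resolveRuleset`/`fetchRulesetContent` path provides; its real signature is not shown in this document.

```typescript
// Hypothetical suffix: the exact naming convention is left to planning (FR-016).
const SHARED_ANALYSIS_SUFFIX = ".remediation-safety.json";

// Derive the colocated Shared Ruleset Analysis location by appending a fixed
// suffix to the ruleset's own file name. Any query string or fragment (e.g. a
// `?ref=main` on a GitHub API URL) is kept after the suffix, so the sibling file
// is looked up at the same repo path and ref as the ruleset itself.
function sharedAnalysisLocation(rulesetLocation: string): string {
  const cut = rulesetLocation.search(/[?#]/);
  if (cut === -1) {
    return rulesetLocation + SHARED_ANALYSIS_SUFFIX;
  }
  return rulesetLocation.slice(0, cut) + SHARED_ANALYSIS_SUFFIX + rulesetLocation.slice(cut);
}

// Hypothetical stand-in for the existing resolution/auth flow: resolves to the
// file's text, or undefined when nothing exists at that location.
type FetchContent = (location: string) => Promise<string | undefined>;

// Presence/absence is a direct lookup at the derived location. A miss is not an
// error; it only means Stage 0 has no shared entries for this ruleset.
async function loadSharedAnalysis(
  rulesetLocation: string,
  fetchContent: FetchContent,
): Promise<Record<string, unknown> | undefined> {
  const text = await fetchContent(sharedAnalysisLocation(rulesetLocation));
  return text === undefined ? undefined : (JSON.parse(text) as Record<string, unknown>);
}
```

For example, `rulesets/api.spectral.yaml` maps to `rulesets/api.spectral.yaml.remediation-safety.json`. Because the mapping is deterministic, two teammates resolving the same ruleset location always probe the same file.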
diff --git a/specs/012-remediation-safety/spec.md b/specs/012-remediation-safety/spec.md
index 04f9169..b1061c7 100644
--- a/specs/012-remediation-safety/spec.md
+++ b/specs/012-remediation-safety/spec.md
@@ -65,9 +65,11 @@ A new contributor or documentation reader should encounter "remediation safety"
 - How does the analyser handle a rule that legitimately spans multiple risk levels depending on context (e.g. a rule that sometimes flags a breaking change and sometimes a cosmetic one)? The specification (`automated_remediation_safety_algorithm_spec.md`) must define how such rules are classified — at the rule level, the analyser assigns one risk/confidence pair per rule; finer-grained, per-violation distinctions are out of scope for the analyser itself.
 - What happens when a custom/private ruleset is supplied that the analyser has never seen before? It must still produce a complete classification (risk + confidence) for every rule, with confidence honestly reflecting the lack of prior knowledge, rather than failing the grading run.
 - What happens to existing consumers (CI pipelines, scripts) that depend on today's binary `safe` vs "not safe" filtering? `--remediation-safety safe` (and equivalent MCP/package usage) must continue to mean exactly what it means today; the new levels are additive, not a breaking redefinition of `safe`.
-- What happens when a user corrects a rule's risk level for a ruleset, and that ruleset's content later changes (rules added, removed, or edited)? The correction is keyed to the Ruleset Identity (content-derived); if the identity no longer matches, the persisted analysis is treated as not found for the changed rule(s) rather than silently misapplied, and those rule(s) fall back to automated analysis (FR-014).
-- What happens when only some rules in a ruleset have a pre-calculated or user-corrected entry? Every rule still gets a classification (FR-015) — covered rules use the persisted entry, the rest go through automated analysis as if no persisted analysis existed for them.
-- What happens on a machine/environment where no persisted analysis exists yet for a ruleset the user has never used before (including the very first run for any user)? The system performs the existing automated analysis (Stages 1–3) and proceeds normally; persistence is an optimization and trust-building mechanism, never a precondition for producing output.
+- What happens when a rule's definition changes after a colocated/shared or personal analysis entry was captured for it? That rule's Fingerprint no longer matches, so the stale entry is treated as not found for that rule alone (FR-014) and falls back to automated analysis — unaffected rules elsewhere in the same shared file remain trusted.
+- What happens when only some rules in a ruleset have a pre-calculated, shared, or personal entry? Every rule still gets a classification (FR-015) — covered rules use the matching entry, the rest go through automated analysis as if no persisted analysis existed for them.
+- What happens on a machine/environment where no persisted analysis exists yet for a ruleset that has never been analysed before (including the very first time anyone on a team uses it)? The system performs the existing automated analysis (Stages 1–3) and proceeds normally; persistence is an optimization and trust-building mechanism, never a precondition for producing output.
+- What happens when a user wants to disagree with a colleague's shared classification for a rule, but doesn't have write access to the ruleset's location (e.g. it's a GitHub-hosted ruleset they can read but not push to) or doesn't want to change what the rest of the team sees? They persist a Personal Ruleset Analysis Override (FR-018) instead, which is honored for them locally without modifying the shared, colocated data.
+- What happens when the ruleset is GitHub-hosted and a user persists a correction intended to be shared? The system does not push a commit to the remote location automatically (FR-019); it can still produce the updated shared-analysis content for the user to commit themselves through their normal review process, and in the meantime the correction is honored locally as a Personal Ruleset Analysis Override.
 ## Requirements *(mandatory)*
@@ -86,8 +88,12 @@ A new contributor or documentation reader should encounter "remediation safety"
 - **FR-011**: The ruleset analyser's per-rule results (risk level, confidence level, and rationale) MUST be inspectable by users, in both JSON and human-readable form, independent of grading a specific API spec (i.e. "analyse this ruleset" is a capability in its own right, not only an internal implementation detail).
 - **FR-012**: Before running the automated analysis stages, the system MUST check for a previously computed or pre-calculated ruleset analysis for the loaded ruleset and, when found, use it directly instead of recomputing from rule metadata. At minimum, the built-in ruleset MUST ship with such a pre-calculated analysis.
 - **FR-013**: Users MUST be able to persist a correction to a rule's risk level (and, implicitly, raise its confidence to reflect human confirmation) for a specific ruleset, such that the corrected classification is automatically loaded and used the next time that same ruleset is analysed or graded against, without requiring the correction to be re-entered.
-- **FR-014**: The system MUST be able to recognize "the same ruleset" across separate invocations for the purpose of FR-012/FR-013 reuse, even when the ruleset is supplied by file path or URL rather than by an identical in-memory reference, so that a pre-calculated or user-corrected analysis is not silently skipped or, conversely, wrongly reused against ruleset content that has actually changed.
+- **FR-014**: The system MUST be able to recognize, for a given rule within a ruleset, whether a pre-calculated or persisted classification for that exact rule definition is still valid (i.e. the rule hasn't changed since the classification was captured) — a stale entry for a changed rule MUST NOT be silently reused, and MUST NOT prevent the other, unchanged rules' entries from being used.
 - **FR-015**: When a persisted or pre-calculated analysis only covers some of the rules in the currently loaded ruleset (e.g. the ruleset gained rules since the analysis was captured), the system MUST still produce a complete classification for every rule (FR-001/SC-005) — covered rules use the persisted/pre-calculated entry, uncovered rules fall through to automated analysis.
+- **FR-016**: The system MUST support storing a ruleset's persisted analysis **colocated with the ruleset itself**, using a deterministic naming convention derived from the ruleset's own location, so that (a) presence or absence of persisted data for a given ruleset can be determined by a direct lookup at that derived location rather than a separate index, and (b) the persisted data can be shared between colleagues simply by it living alongside the ruleset (e.g. committed in the same repository), rather than each person having to separately configure their own copy.
+- **FR-017**: The colocated lookup (FR-016) MUST work uniformly whether the ruleset is supplied as a local file path or fetched from a remote/GitHub-hosted location, reusing the same resolution and authentication mechanism already used to fetch the ruleset itself.
+- **FR-018**: In addition to the shared, colocated analysis (FR-016), a user MUST be able to persist a personal correction that does not modify the shared colocated data — for cases where they lack write access to the ruleset's location, or want to apply their own judgement locally without changing what their colleagues see. A personal correction MUST take precedence, for that user, over both the shared colocated analysis and the automated analysis stages for the rule(s) it covers.
+- **FR-019**: When the ruleset's location is not writable by the system directly (e.g. a GitHub-hosted ruleset), the system MUST NOT automatically write or commit a correction back to that remote location. It MAY still read any existing colocated shared analysis there (FR-017), and MAY produce the content a user would need to commit themselves to update the shared analysis.
 ### Key Entities *(include if feature involves data)*
@@ -95,8 +101,9 @@
 - **Risk Level**: One of `safe`, `humanreview`, `unsafe` — describes how safe it is to automatically remediate a violation of a given rule without human review.
 - **Confidence Level**: Describes how confident the analyser is in a rule's assigned risk level (e.g. driven by how well-known/recognizable the rule is versus how custom/ambiguous it is, or whether a human has explicitly confirmed it).
 - **Remediation Safety (per violation)**: The risk level applied to a specific violation found during grading, derived from the ruleset analyser's result for that violation's rule.
-- **Ruleset Identity**: A stable identifier for "the same ruleset" across separate invocations, used to look up and store pre-calculated/persisted analyses (FR-012–FR-014). Derived from ruleset content, not from the path/URL it was supplied with, so that the identity survives a ruleset being re-fetched or relocated, and so that genuinely changed content is not mistaken for an unchanged ruleset.
-- **Persisted Ruleset Analysis**: A ruleset analysis (in full or in part, e.g. just the rules a user has corrected) saved against a Ruleset Identity so it can be reloaded automatically on future runs against that same ruleset, without re-prompting the user or re-running automated analysis for the rules it covers.
+- **Rule Fingerprint**: A stable identifier for "this exact rule definition," derived from a rule's content (`given`, `then.function`, `severity`, `description`), used to detect whether a pre-calculated/persisted entry for that `ruleId` is still valid for the rule as currently defined (FR-014), independent of where the ruleset as a whole is stored or fetched from.
+- **Shared Ruleset Analysis**: A ruleset analysis (in full or in part) stored colocated with the ruleset itself via a deterministic naming convention (FR-016), readable by anyone who can read the ruleset (whether local file or GitHub-hosted, FR-017). This is the primary mechanism for a team/organisation to share one set of remediation-safety judgements instead of each person maintaining their own.
+- **Personal Ruleset Analysis Override**: A user-local correction (FR-018) that takes precedence over the Shared Ruleset Analysis and the automated analysis stages for the rule(s) it covers, without modifying the shared, colocated data.
 ## Success Criteria *(mandatory)*
@@ -107,8 +114,9 @@
 - **SC-003**: A repository-wide search for "quick fix" (any casing/separator) returns zero matches after the feature is complete.
 - **SC-004**: Existing `--remediation-safety safe` users observe no behavioral change in the set of violations returned, compared to before this feature.
 - **SC-005**: For an arbitrary, previously-unseen custom ruleset, the analyser completes and returns a risk and confidence level for 100% of its rules (no rule left unclassified).
-- **SC-006**: A user-corrected risk level for a rule in a given ruleset is honored (returned without re-running automated analysis for that rule) on a subsequent, separate invocation against the same ruleset content, and is no longer honored if that ruleset's content subsequently changes.
+- **SC-006**: A user-corrected risk level for a rule in a given ruleset is honored (returned without re-running automated analysis for that rule) on a subsequent, separate invocation against the same rule definition, and is no longer honored once that specific rule's definition changes.
 - **SC-007**: The built-in ruleset's analysis is available without any per-rule automated computation having to run at request time (served from a pre-calculated/bundled result), for both the CLI and MCP surfaces.
+- **SC-008**: Two different users pointed at the same ruleset location (local path or GitHub-hosted) see identical classifications for every rule covered by that ruleset's shared, colocated analysis, without either of them having separately configured it.
 ## Assumptions
@@ -118,6 +126,8 @@
 - "Rationale" per rule is a short, human-readable explanation (not a separate structured field requiring its own schema beyond a text string) sufficient for users to understand why a level was assigned.
 - Backstage plugin packages are in scope for surfacing remediation safety only insofar as they already surface quick-fix/remediation-safety information today; if they do not yet do so, extending them is out of scope for this feature.
 - This feature does not change how a custom ruleset is supplied (file path, GitHub PAT, etc.) — only how its rules are risk-classified once available.
-- Persistence of pre-calculated/user-corrected ruleset analyses (FR-012–FR-015) reuses the same storage scope model (workspace/global config) already established for ruleset selection (`RulesetConfig`/`RulesetResolution`), rather than introducing a new persistence layer; the exact file/location is an implementation detail for planning, not a renegotiation of scope.
-- Ruleset Identity is computed from ruleset content (e.g. a content hash), not from the path or URL the ruleset was supplied with, so the same ruleset retains its persisted analysis if relocated, and a different ruleset at the same path does not wrongly inherit one.
-- "Persist a correction" (FR-013) refers to the data being saved for reuse; *how* a user supplies that correction (a CLI flag, an MCP tool call, hand-editing a config file) is an implementation detail for planning, not fixed by this specification.
+- The primary persistence mechanism (FR-016/FR-017) colocates shared analysis data with the ruleset itself via a naming convention, rather than a separate per-user store — this is a deliberate choice to make sharing across a team/organisation the default, not an opt-in synchronization step. The existing workspace/global config scope (`RulesetConfig`/`RulesetResolution`) is retained, but narrowed to the Personal Ruleset Analysis Override role (FR-018) rather than being the primary persistence layer originally assumed.
+- Rule Fingerprinting is computed from individual rule content (e.g. a hash of `given`/`then.function`/`severity`/`description` for that `ruleId`), not from the ruleset as a whole and not from the path/URL the ruleset was supplied with — this gives per-rule staleness detection (one changed rule doesn't invalidate an entire shared analysis file) and survives the ruleset being relocated or re-fetched from a mirror.
+- The exact naming convention (FR-016) and file format are implementation details for planning, not fixed by this specification, but MUST satisfy: derivable from the ruleset's own path/URL alone (no separate index/registry to consult first), and human-readable/diffable enough to be code-reviewed when shared via a pull request.
+- Automatic write-back to a remote/GitHub-hosted ruleset location (FR-019) is explicitly out of scope for this feature — sharing a correction to such a location is a human action (a commit/PR), which the tool may assist by producing the content but does not perform itself.
+- "Persist a correction" (FR-013/FR-018) refers to the data being saved for reuse; *how* a user supplies that correction (a CLI flag, an MCP tool call, hand-editing the colocated/override file) is an implementation detail for planning, not fixed by this specification.
diff --git a/specs/algorithms/automated_remediation_safety_algorithm_spec.md b/specs/algorithms/automated_remediation_safety_algorithm_spec.md
index 65ca6b2..7b7e994 100644
--- a/specs/algorithms/automated_remediation_safety_algorithm_spec.md
+++ b/specs/algorithms/automated_remediation_safety_algorithm_spec.md
@@ -10,7 +10,7 @@ Determines, for every **rule** in a loaded ruleset, how risky it would be to aut
 This algorithm supersedes the two-class `classifyViolation()` algorithm described in [`quick_fixes_algorithm_spec.md`](./quick_fixes_algorithm_spec.md), extending it from a binary `nonBreaking`/`breaking` split (with `unknown` as an exclusion bucket) to three first-class risk levels with an explicit confidence dimension. It consumes rule **metadata** (`ruleId`, the rule's `given` JSONPath expression(s), `then.function`) from a loaded Spectral ruleset object — it does not consume `Diagnostic[]` directly; diagnostics are matched to their rule's pre-computed result by `ruleId` when remediation safety is needed for a grading run.
-Before running automated classification, the analyser first checks for a previously persisted or pre-calculated analysis for this exact ruleset's content (Stage 0) — both a baseline for the built-in ruleset and a place for users to durably correct classifications they disagree with, so the same ruleset is not re-estimated, and re-reviewed, from scratch on every run.
+Before running automated classification, the analyser first checks, per rule, for a previously persisted or pre-calculated analysis (Stage 0): a baseline bundled for the built-in ruleset, a **Shared Ruleset Analysis** colocated with the ruleset itself so a team shares one set of judgements automatically, and a personal override layer for individual corrections. This means the same ruleset is not re-estimated, and re-reviewed, from scratch on every run — or by every person who uses it.
 ---
@@ -36,40 +36,43 @@ Each rule's risk level carries a confidence level:
 ## Input & Output
-**Input:** a loaded Spectral ruleset object (`LoadedRuleset.ruleset` from `packages/api-grade-core/src/rulesets/loader.ts`), specifically its `rules` map: `{ [ruleId]: { given: string | string[], then: { function: string }, severity, description, recommended } }`. A **Ruleset Identity** (see Stage 0) is derived from this same input and used to look up any persisted or bundled pre-calculated analysis.
+**Input:** a loaded Spectral ruleset object (`LoadedRuleset.ruleset` from `packages/api-grade-core/src/rulesets/loader.ts`), specifically its `rules` map: `{ [ruleId]: { given: string | string[], then: { function: string }, severity, description, recommended } }`, plus the ruleset's resolved location (local path or remote/GitHub URL). A per-rule **Fingerprint** (see Stage 0) is derived from each rule's own content and used to look up any persisted or bundled pre-calculated analysis for that rule.
 **Output:** `analyseRuleset(ruleset) -> RulesetAnalysis`:
 - `rulesetSource: 'default' | 'custom'`, `rulesetPath?: string` — mirrors the input `LoadedRuleset`.
-- `rulesetIdentity: string` — the content hash described in Stage 0.
 - `rules: RuleAnalysis[]` — exactly one entry per rule key in the input ruleset (no omissions — see Implementation Notes).
 Each `RuleAnalysis`: `{ ruleId, riskLevel, confidenceLevel, rationale, source }`, where `source` is one of `'persisted'` (Stage 0a/0b), `'bundled-default'` (Stage 0c), `'curated'` (Stage 1), `'heuristic'` (Stage 2), or `'fallback'` (Stage 3) — see Data Model for the full enum.
 A second function, `getRemediationSafety(diagnostic, rulesetAnalysis) -> { riskLevel, confidenceLevel }`, performs the per-violation lookup at grading time (see Stage 5).
-A third function, `persistRuleAnalysisOverride(rulesetIdentity, ruleId, riskLevel, scope)`, writes a user correction for one rule into the persisted-analysis store at the given scope (`workspace` | `global`), to be picked up by Stage 0 on future runs.
+A third function, `persistRuleAnalysisOverride(ruleset, ruleId, riskLevel, scope)`, writes a user correction for one rule into either the colocated Shared Ruleset Analysis (`scope: 'shared'`) or the personal-override store (`scope: 'personal'`, `workspace` | `global`), to be picked up by Stage 0 on future runs (see Stage 4 for the write-target rules, including the remote/GitHub-hosted fallback).
 ---
 ## Stage 0: Persisted / Pre-Calculated Lookup
-Runs **before** Stage 1, once per loaded ruleset (not per rule). Computes the ruleset's **Ruleset Identity** — a SHA-256 hash over its rule definitions, normalized as `sortBy(ruleId) -> ruleId + '|' + given + '|' + then.function + '|' + severity + '|' + description`, joined and hashed. Identity is derived from rule *content*, never from `rulesetPath`/`rulesetUrl`, so relocating or re-fetching an unchanged ruleset still hits the cache, and editing a ruleset at a stable path correctly misses it.
+Runs **before** Stage 1, once per loaded ruleset (not per rule). For each rule, computes a **Rule Fingerprint** — a hash over that rule's own content: `hash(ruleId + '|' + given + '|' + then.function + '|' + severity + '|' + description)`. Fingerprinting is per-rule, not a single whole-ruleset hash, so that editing one rule invalidates only that rule's persisted entry, not every entry in a shared analysis file that a team may have spent time curating (FR-014).
-Checked in order; the first hit for a given `ruleId` is used, and per-`ruleId` lookup continues independently — a ruleset's overall analysis can be assembled from a mix of sources:
+Checked in order; the first hit for a given `ruleId` (with a matching Fingerprint) is used, and per-`ruleId` lookup continues independently — a ruleset's overall analysis is typically assembled from a mix of sources:
 ```
-0a. workspace-scoped persisted analysis for this rulesetIdentity (if present)
-0b. global-scoped persisted analysis for this rulesetIdentity (if present)
+0a. Personal Ruleset Analysis Override (workspace-scoped, then global-scoped) for this ruleId+fingerprint, if present
+0b. Shared Ruleset Analysis colocated with the ruleset, for this ruleId+fingerprint, if present
 0c. bundled pre-calculated analysis, ONLY if this is the built-in ruleset
 ```
 For each `ruleId` covered by a hit: `RETURN { riskLevel, confidenceLevel, rationale, source: 'persisted' | 'bundled-default' }` using the stored values as-is (a `persisted` entry's `confidenceLevel` is whatever was stored when the correction was made — typically `high`, since a human confirmed it).
-Any `ruleId` **not** covered by Stage 0 (no persisted/bundled entry exists for it — including the common case of an entirely new custom ruleset) falls through to Stage 1. This is the same "lookup miss → keep going" behavior as the per-violation lookup in Stage 5; Stage 0 never blocks or fails the analysis, it only short-circuits the rules it has prior knowledge of (FR-012, FR-015).
+Any `ruleId` **not** covered by Stage 0 — no hit in 0a/0b/0c, or a hit exists but its stored Fingerprint no longer matches the rule's current definition (the rule was edited since the entry was captured) — falls through to Stage 1. This is the same "lookup miss → keep going" behavior as the per-violation lookup in Stage 5; Stage 0 never blocks or fails the analysis, it only short-circuits the rules it has valid prior knowledge of (FR-012, FR-014, FR-015).
-**Rationale:** directly required by `clarification-algorithm.md`'s "Recommended High Level Approach" (steps 1 and 4): load pre-calculated risk/safety for known rulesets first, and let users persist their own corrections so the same ruleset doesn't need re-estimating — and re-confirming via human review — on every run. Content-hash identity (rather than path/URL) is what makes "the same ruleset" a meaningful, stable lookup key across separate invocations, possibly on different machines or after the ruleset file/URL has moved (FR-014).
+**Shared Ruleset Analysis location (0b) — colocated via naming convention (FR-016/FR-017):** derived deterministically from the ruleset's own path/URL (e.g. appending a fixed suffix to the ruleset's filename), so presence is a direct lookup at that location, not a separate index. For a local ruleset this is a sibling file on disk; for a GitHub-hosted ruleset this is a sibling file fetched via the same resolution/auth flow already used to fetch the ruleset (`resolveRuleset`/`fetchRulesetContent`). Anyone who can read the ruleset can read this file, so a team sharing a ruleset automatically shares its analysis (SC-008) with no per-user configuration step.
-**Bundled pre-calculated analysis for the built-in ruleset:** shipped with the package (generated by running Stages 1–3 once over the built-in ruleset at release time and committing the result), so `ruleset-analysis`/`analyse-ruleset-safety` against the built-in ruleset never requires per-rule computation at request time (SC-007), and so the built-in ruleset itself satisfies the "at a minimum the default ruleset" baseline the clarification document calls for.
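The Stage 0 lookup can be sketched in TypeScript as follows. This illustrates the described behavior under stated assumptions; it is not the project's implementation. The assumptions are:

- SHA-256 via Node's `node:crypto` is the hash.
- An array-valued `given` is comma-joined, which is what the spec's `ruleId + '|' + given + ...` concatenation produces in JavaScript.
- A missing `severity` or `description` is normalized to an empty string.
- `Rule`, `StoredEntry`, `fingerprintOf` and `stage0Lookup` are hypothetical names.

A stale entry in one layer does not block a valid entry in a later layer, matching "the first hit for a given `ruleId` (with a matching Fingerprint) is used".

```typescript
import { createHash } from "node:crypto";

type RiskLevel = "safe" | "humanreview" | "unsafe";
type ConfidenceLevel = "high" | "medium" | "low";

// Only the rule fields the fingerprint reads (a subset of a Spectral rule).
interface Rule {
  given: string | string[];
  then: { function: string };
  severity?: string | number;
  description?: string;
}

// One persisted entry, as stored in a personal, shared, or bundled map.
interface StoredEntry {
  ruleId: string;
  fingerprint: string;
  riskLevel: RiskLevel;
  confidenceLevel: ConfidenceLevel;
  rationale: string;
}

type EntryMap = Record<string, StoredEntry>; // keyed by ruleId

// Assumed hash: SHA-256 over ruleId|given|then.function|severity|description.
function fingerprintOf(ruleId: string, rule: Rule): string {
  const given = Array.isArray(rule.given) ? rule.given.join(",") : rule.given;
  const material = [
    ruleId,
    given,
    rule.then.function,
    String(rule.severity ?? ""),
    rule.description ?? "",
  ].join("|");
  return createHash("sha256").update(material).digest("hex");
}

interface Stage0Layers {
  personalWorkspace?: EntryMap; // 0a, checked first
  personalGlobal?: EntryMap;    // 0a, checked second
  shared?: EntryMap;            // 0b, colocated with the ruleset
  bundled?: EntryMap;           // 0c, pass ONLY for the built-in ruleset
}

type Stage0Hit = { entry: StoredEntry; source: "persisted" | "bundled-default" };

// Returns the first entry whose fingerprint matches the rule as currently
// defined, or undefined so the rule falls through to Stages 1-3.
function stage0Lookup(ruleId: string, rule: Rule, layers: Stage0Layers): Stage0Hit | undefined {
  const current = fingerprintOf(ruleId, rule);
  const ordered: Array<[EntryMap | undefined, Stage0Hit["source"]]> = [
    [layers.personalWorkspace, "persisted"],
    [layers.personalGlobal, "persisted"],
    [layers.shared, "persisted"],
    [layers.bundled, "bundled-default"],
  ];
  for (const [map, source] of ordered) {
    const entry = map?.[ruleId];
    // A stale entry (rule edited since capture) is a miss for this rule only.
    if (entry !== undefined && entry.fingerprint === current) {
      return { entry, source };
    }
  }
  return undefined;
}
```

Because the lookup is keyed per `ruleId`, assembling a complete `RulesetAnalysis` is a loop over the ruleset's rules. Stages 1-3 run only for rules where this returns `undefined`.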
+**Personal Ruleset Analysis Override (0a):** checked first because it represents the most specific, most recently expressed intent — a user actively disagreeing with or supplementing the shared analysis for themselves, without writing to the shared file (FR-018). Stored using the existing workspace/global config-file scope (`packages/api-grade-core/src/config/ruleset-config.ts` pattern), repurposed for this narrower role.
+
+**Bundled pre-calculated analysis for the built-in ruleset (0c):** shipped with the package (generated by running Stages 1–3 once over the built-in ruleset at release time and committing the result), so `ruleset-analysis`/`analyse-ruleset-safety` against the built-in ruleset never requires per-rule computation at request time (SC-007), and so the built-in ruleset itself satisfies the "at a minimum the default ruleset" baseline the clarification document calls for.
+
+**Rationale:** directly required by `clarification-algorithm.md`'s "Recommended High Level Approach" (steps 1 and 4) and by the project's own stated goal of letting an organisation share one set of judgements rather than each person separately configuring their own copy. Colocation (rather than a per-user store as the primary mechanism) is what makes that sharing automatic; per-rule fingerprinting (rather than a whole-ruleset hash) is what keeps a shared file useful as the ruleset evolves incrementally instead of being invalidated wholesale by any single edit.
 ---
@@ -188,18 +191,29 @@ RETURN { riskLevel: "unsafe", confidenceLevel: "low", rationale: "no recognizabl
 ## Stage 4: Persisting a Correction
-Not part of the per-rule classification pipeline — an explicit, user-initiated action (FR-013) that writes into the store Stage 0 reads from:
+Not part of the per-rule classification pipeline — an explicit, user-initiated action (FR-013/FR-018/FR-019) that writes into one of the two stores Stage 0 reads from. The target store depends on `scope` and on whether the ruleset's location is locally writable:
 ```
-persist_rule_analysis_override(rulesetIdentity, ruleId, riskLevel, scope):
+persist_rule_analysis_override(ruleset, ruleId, riskLevel, scope):
+    fingerprint ← fingerprint_of(ruleset.rules[ruleId]) // see Stage 0
     confidenceLevel ← "high" // a human has explicitly confirmed this level
     rationale ← "user-confirmed override"
-    entry ← { ruleId, riskLevel, confidenceLevel, rationale, source: "persisted" }
-    write entry into the (workspace | global) persisted-analysis store, keyed by (rulesetIdentity, ruleId)
-    // last-write-wins per ruleId; does not require re-submitting other rules' entries (FR-015)
+    entry ← { ruleId, fingerprint, riskLevel, confidenceLevel, rationale, source: "persisted" }
+
+    IF scope == "personal":
+        write entry into the (workspace | global) personal-override store, keyed by ruleId
+        RETURN { written: "personal" }
+
+    IF scope == "shared":
+        IF ruleset.location is a local file path:
+            write entry into the colocated Shared Ruleset Analysis file next to the ruleset (FR-016)
+            RETURN { written: "shared" }
+        ELSE: // remote/GitHub-hosted, not locally writable — FR-019
+            write entry into the personal-override store as a fallback (so it still takes effect locally)
+            RETURN { written: "personal-fallback", sharedFileContent: (updated shared-analysis file content for the user to commit) }
 ```
-**Rationale:** this is the mechanism `clarification-algorithm.md` describes as letting "the user perform this review once and then encode the correct safety level for this rule in this ruleset" — turning a one-time `humanreview`/`unsafe`, low-confidence determination into a durable `high`-confidence one for every future run against that same ruleset content.
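The Stage 4 write-target rules can be sketched in TypeScript as follows. Storage is modelled as in-memory maps standing in for the personal-override file and the colocated shared file. The following are simplifying assumptions for illustration only:

- the `Stores` shape;
- the `isLocalPath` check;
- serializing `sharedFileContent` as pretty-printed JSON;
- passing a precomputed `fingerprint` instead of recomputing it from the ruleset, as the pseudocode does.

`scope` defaults to `"shared"`, as the Stage 4 rationale describes for a local, writable ruleset.

```typescript
type RiskLevel = "safe" | "humanreview" | "unsafe";
type Scope = "shared" | "personal";

interface OverrideEntry {
  ruleId: string;
  fingerprint: string;
  riskLevel: RiskLevel;
  confidenceLevel: "high"; // a human has explicitly confirmed this level
  rationale: string;
  source: "persisted";
}

// In-memory stand-ins for the two stores Stage 0 reads from.
interface Stores {
  personal: Record<string, OverrideEntry>; // workspace or global override file
  shared: Record<string, OverrideEntry>;   // colocated shared file, as last read
}

type WriteResult =
  | { written: "personal" }
  | { written: "shared" }
  | { written: "personal-fallback"; sharedFileContent: string };

// Assumption: any location with a URL scheme (e.g. https://) is treated as remote.
function isLocalPath(location: string): boolean {
  return !/^[a-z][a-z0-9+.-]*:\/\//i.test(location);
}

function persistRuleAnalysisOverride(
  rulesetLocation: string,
  ruleId: string,
  fingerprint: string,
  riskLevel: RiskLevel,
  stores: Stores,
  scope: Scope = "shared",
): WriteResult {
  const entry: OverrideEntry = {
    ruleId,
    fingerprint,
    riskLevel,
    confidenceLevel: "high",
    rationale: "user-confirmed override",
    source: "persisted",
  };

  if (scope === "personal") {
    stores.personal[ruleId] = entry; // last-write-wins per ruleId
    return { written: "personal" };
  }

  if (isLocalPath(rulesetLocation)) {
    stores.shared[ruleId] = entry; // a normal, reviewable local file edit
    return { written: "shared" };
  }

  // Remote and not locally writable (FR-019): never write back. Apply the
  // correction locally and hand back updated shared content for a human commit.
  stores.personal[ruleId] = entry;
  const updated = { ...stores.shared, [ruleId]: entry };
  return { written: "personal-fallback", sharedFileContent: JSON.stringify(updated, null, 2) };
}
```

The remote branch only produces content and never pushes. That lets one function serve both local and GitHub-hosted rulesets without performing the unrequested remote write that FR-019 rules out.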
+**Rationale:** this is the mechanism `clarification-algorithm.md` describes as letting "the user perform this review once and then encode the correct safety level for this rule in this ruleset" — turning a one-time `humanreview`/`unsafe`, low-confidence determination into a durable `high`-confidence one. Defaulting `scope` to `"shared"` for a local, writable ruleset maximizes the chance a correction benefits the whole team automatically (the spirit of FR-016); falling back to a personal-store write for a non-writable remote location (rather than failing, or silently attempting a remote write) keeps the correction useful immediately for the user who made it, without the tool performing an unrequested write to a shared, remote artifact (FR-019). --- @@ -244,7 +258,7 @@ rules = [ | `custom-channel-rename` | Stage 2a (channel key-selector) | `unsafe` | `high` | `given` selects channel object keys directly | | `custom-channel-address` | Stage 2b (`address` segment) | `unsafe` | `medium` | `given` path matched the unsafe segment set only (AsyncAPI channel address) | | `custom-no-signal` | Stage 3 (fallback) | `unsafe` | `low` | No recognizable rule-id or path signal | -| `previously-reviewed-rule` | Stage 0 (persisted) | *(whatever the user set)* | `high` | User-confirmed override from a prior run against this same ruleset content | +| `previously-reviewed-rule` | Stage 0 (persisted) | *(whatever was set)* | `high` | Matched a shared or personal override for this exact rule definition, from a prior run against this same ruleset | --- @@ -254,19 +268,22 @@ rules = [ |---|---| | **Classification granularity** | Per rule, not per violation instance — one `RuleAnalysis` per `ruleId` in the ruleset | | **Stage priority** | Persisted/bundled lookup (Stage 0) → curated rule-id table (Stage 1) → key-selector + path heuristic on `given` (Stage 2) → fallback (Stage 3) | -| **Ruleset identity** | Content hash over normalized rule definitions (`ruleId`, `given`, `then.function`, 
`severity`, `description`) — never the supplied path/URL | +| **Stage 0 lookup order** | Personal override (workspace, then global) → Shared Ruleset Analysis colocated with the ruleset → bundled default (built-in ruleset only) | +| **Rule identity** | Per-rule Fingerprint (hash of `ruleId`, `given`, `then.function`, `severity`, `description`) — never a whole-ruleset hash, and never the supplied path/URL | +| **Sharing mechanism** | Shared Ruleset Analysis is colocated with the ruleset via a naming convention (local sibling file, or same-repo-path fetch for GitHub-hosted) — not a per-user store (FR-016/FR-017) | | **Tier priority (Stage 1 and Stage 2)** | `unsafe` checked/preferred over `humanreview` over `safe` whenever ambiguity exists | | **Confidence assignment** | `high` = persisted/curated/key-selector match; `medium` = single-tier path-segment match; `low` = fallback or multi-tier path match | | **Default when unanalysable** | `unsafe` / `low` confidence — never `safe` | | **Per-violation lookup miss** | Defaults to `unsafe` / `low`, same as an unanalysable rule (FR-009) | -| **Caching** | Computed once per distinct ruleset content (by identity); persisted across invocations, not just cached for one process (FR-012) | -| **Partial persisted coverage** | A persisted/bundled analysis covering only some `ruleId`s short-circuits just those rules; the rest proceed through Stages 1–3 normally (FR-015) | +| **Caching** | Computed once per distinct rule definition (by Fingerprint); persisted across invocations and across users sharing a ruleset, not just cached for one process (FR-012, FR-016) | +| **Partial persisted coverage** | A personal/shared/bundled analysis covering only some `ruleId`s short-circuits just those rules; the rest proceed through Stages 1–3 normally (FR-015) | +| **Remote write-back** | Never automatic for a non-writable (e.g. 
GitHub-hosted) ruleset location — falls back to a local personal-override write plus emitted shared-file content for the user to commit (FR-019) | --- ## Implementation Notes -- **Deterministic for a given input state:** re-analysing the same ruleset content with the same persisted-analysis store always yields the same `RulesetAnalysis`. Stage 0 deliberately introduces store-dependence by design — a user's persisted correction is *supposed* to change the outcome on later runs; it does not undermine determinism, since the store itself is also keyed by content and changes only on an explicit user action (Stage 4). +- **Deterministic for a given input state:** re-analysing the same ruleset with the same shared/personal override data present always yields the same `RulesetAnalysis`. Stage 0 deliberately introduces store-dependence by design — a persisted correction (personal or shared) is *supposed* to change the outcome on later runs, including for colleagues who read the same shared file; it does not undermine determinism, since lookups are keyed by per-rule Fingerprint and change only via an explicit write (Stage 4). - **Total coverage:** every rule key present in the input ruleset produces exactly one `RuleAnalysis` (Stage 3 guarantees this) — satisfies SC-005. - **Spec-format agnostic:** operates on ruleset rule metadata, which is uniform across the OpenAPI and AsyncAPI built-in rulesets and any custom Spectral-compatible ruleset; Stage 2's segment sets and key-selector check are explicitly format-aware (covering both OpenAPI and AsyncAPI contract-surface terms) but require no spec-type branching in the algorithm itself. - **Conservative by design:** `unsafe`/`low` is the universal fallback, not an error condition. 
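The Stage 0 lookup order and the Stage 4 write-back rule summarised in the table above can be sketched in TypeScript. This is a hedged illustration under stated assumptions, not the project's implementation: the names `RuleDefinition`, `PersistedEntry`, `NamedStore`, `fingerprintOf`, `stage0Lookup`, and `chooseWriteTarget` are hypothetical, and the fingerprint is kept as an unhashed canonical JSON string so the sketch needs no imports, whereas the spec calls for a hash.

```typescript
type SafetyLevel = "safe" | "humanreview" | "unsafe";

// Hypothetical rule shape: only the fields the spec lists as fingerprint inputs.
interface RuleDefinition {
  ruleId: string;
  given: string | string[];
  thenFunction: string; // stands in for then.function
  severity: string;
  description: string;
}

interface PersistedEntry {
  ruleId: string;
  fingerprint: string;
  remediationSafetyLevel: SafetyLevel;
}

interface NamedStore {
  name: "personal-workspace" | "personal-global" | "shared" | "bundled";
  entries: ReadonlyMap<string, PersistedEntry>; // keyed by ruleId
}

// Stand-in for the per-rule Fingerprint: a canonical JSON string of the listed
// fields. A real implementation would hash this; it is left unhashed here.
function fingerprintOf(rule: RuleDefinition): string {
  const given = typeof rule.given === "string" ? [rule.given] : rule.given;
  return JSON.stringify([rule.ruleId, given, rule.thenFunction, rule.severity, rule.description]);
}

// Stage 0: stores arrive in priority order (personal workspace, personal
// global, shared, bundled). An entry whose fingerprint no longer matches the
// current rule definition is stale and is skipped, so a later store may win.
function stage0Lookup(
  rule: RuleDefinition,
  stores: readonly NamedStore[],
): { store: NamedStore["name"]; entry: PersistedEntry } | undefined {
  const current = fingerprintOf(rule);
  for (const { name, entries } of stores) {
    const entry = entries.get(rule.ruleId);
    if (entry !== undefined && entry.fingerprint === current) {
      return { store: name, entry };
    }
  }
  return undefined; // no match: the rule proceeds through Stages 1-3
}

// Stage 4 routing: a "shared" write lands in the colocated file only when the
// ruleset is a local, writable path; otherwise it falls back to personal (FR-019).
function chooseWriteTarget(
  scope: "personal" | "shared",
  rulesetIsLocalFile: boolean,
): "personal" | "shared" | "personal-fallback" {
  if (scope === "personal") return "personal";
  return rulesetIsLocalFile ? "shared" : "personal-fallback";
}
```

Skipping stale entries rather than trusting them is what lets a shared file stay useful as a ruleset evolves: editing one rule invalidates only that rule's entry.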
From 22daeeb144cbcc5e7419ede40c2daaae2cc5cfcc Mon Sep 17 00:00:00 2001 From: DawMatt Date: Wed, 24 Jun 2026 21:10:46 +1000 Subject: [PATCH 05/22] Safety assessment core algorithm clarification --- .../clarification-algorithm.md | 36 ++++++++++----- .../contracts/remediation-safety-surfaces.md | 8 ++-- specs/012-remediation-safety/data-model.md | 43 ++++++++++++----- specs/012-remediation-safety/quickstart.md | 13 +++--- specs/012-remediation-safety/research.md | 40 +++++++++++----- specs/012-remediation-safety/spec.md | 46 ++++++++++--------- 6 files changed, 119 insertions(+), 67 deletions(-) diff --git a/specs/012-remediation-safety/clarification-algorithm.md b/specs/012-remediation-safety/clarification-algorithm.md index a8b1266..081633a 100644 --- a/specs/012-remediation-safety/clarification-algorithm.md +++ b/specs/012-remediation-safety/clarification-algorithm.md @@ -43,9 +43,9 @@ The most effective fully automated approach is: For an unknown ruleset, the system should be designed to produce: -* `estimatedRisk` -* `confidence` -* `remediationSafetyLevel` +* `estimatedRisk` (low, medium, high) +* `confidence` (low, medium, high) +* `remediationSafetyLevel` (safe, humanreview, unsafe) rather than a false claim of certainty. 
@@ -98,7 +98,7 @@ OpenAPI formally describes HTTP API structure including `paths`, operations, par So you should classify targeted locations roughly like this: -#### High consumer-impact areas +##### High consumer-impact areas * `paths` keys * path template variables @@ -109,20 +109,20 @@ So you should classify targeted locations roughly like this: * security requirements * reusable schemas referenced by the above [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) -#### Medium consumer-impact areas +##### Medium consumer-impact areas * `operationId` * tags or names used by codegen and docs * component identifiers used in client generation [\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) -#### Low consumer-impact areas +##### Low consumer-impact areas * descriptions * contact metadata * licence metadata * summaries, where not used as identifiers [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) -#### Important example +##### Important example OpenAPI path templating requires that each template expression correspond to a path parameter. That means a rule targeting path-template consistency is touching a real contract concern, even if worded as “correctness” rather than “compatibility”. 
[\[swagger.io\]](https://swagger.io/specification/), [\[spec.openapis.org\]](https://spec.openapis.org/oas/v3.1.1.html) @@ -134,7 +134,7 @@ AsyncAPI formally describes channels, operations, messages, and action semantics So you should classify: -### High consumer-impact areas +##### High consumer-impact areas * channel `address` * channel parameters tied to address placeholders @@ -143,7 +143,7 @@ So you should classify: * messages and payload schemas * reply or operation semantics if covered by the ruleset [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/dynamic-channel-address), [\[asyncapi.com\]](https://www.asyncapi.com/docs/concepts/asyncapi-document/adding-operations) -#### Low consumer-impact areas +##### Low consumer-impact areas * metadata such as descriptions and contact details [\[asyncapi.com\]](https://www.asyncapi.com/docs/reference/specification/v3.1.0) @@ -208,7 +208,7 @@ This is exactly why your algorithm must output **risk plus confidence**, not jus *** -### 5. Separate risk from confidence +### 5. Separate risk from confidence from safety level This is essential. @@ -225,6 +225,20 @@ A rule can be: Without this separation, the system will either over-block harmless changes or under-block dangerous ones. 
+The implementation shall decide remediation safety as follows: +If estimatedRisk = Low and confidence in {High, Medium}: + remediationSafety = safe + +Else if estimatedRisk = Medium and confidence = High: + automationDecision = human review + +Else if estimatedRisk = High: + remediationSafety = unsafe + +Else: + remediationSafety = human review + + ## References - [OpenAPI Breaking Changes: The Complete List of Rules | oasdiff](https://www.oasdiff.com/docs/breaking-changes) @@ -234,7 +248,7 @@ Without this separation, the system will either over-block harmless changes or u ## Additional Context -## Spectral Rule Facts +### Spectral Rule Facts * Spectral rules are built from selectors and functions, and rulesets can extend built-in format-specific support for OpenAPI and AsyncAPI. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/getting-started/3-rulesets.md), [\[docs.stoplight.io\]](https://docs.stoplight.io/docs/spectral/branches/develop/d3482ff0ccae9-rules) * Spectral supports custom JavaScript functions. [\[github.com\]](https://github.com/stoplightio/spectral/blob/develop/docs/guides/5-custom-functions.md) diff --git a/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md b/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md index d0ddfc5..17982cf 100644 --- a/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md +++ b/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md @@ -7,8 +7,8 @@ Supersedes `specs/011-remediation-safety-rename/contracts/remediation-safety-sur | Before this feature | After this feature | |---|---| | Accepts only `safe`; any other value rejected with `Error: --remediation-safety must be "safe".` | Accepts `safe`, `humanreview`, `unsafe`. 
Any other value rejected with `Error: --remediation-safety must be one of: safe, humanreview, unsafe.` | -| Filtered output built by `buildQuickFixOutput`/`formatQuickFixesHuman`, shape `QuickFixOutput` (`quickFixCount`, `quickFixes`). | Filtered output built by `buildRemediationSafetyOutput`/`formatRemediationSafetyHuman`, shape `RemediationSafetyOutput` (`remediationItemCount`, `remediationItems`, `requestedLevel`). Each item additionally carries `riskLevel` and `confidenceLevel`. | -| `--remediation-safety safe` output identical to pre-Feature-12 `safe` output in violation membership. | Unchanged for `safe` membership (FR-007); new fields (`riskLevel`, `confidenceLevel`, `requestedLevel`) are additive. | +| Filtered output built by `buildQuickFixOutput`/`formatQuickFixesHuman`, shape `QuickFixOutput` (`quickFixCount`, `quickFixes`). | Filtered output built by `buildRemediationSafetyOutput`/`formatRemediationSafetyHuman`, shape `RemediationSafetyOutput` (`remediationItemCount`, `remediationItems`, `requestedLevel`). Each item additionally carries `riskLevel` (`low`/`medium`/`high`), `confidenceLevel`, and `remediationSafetyLevel` (`safe`/`humanreview`/`unsafe` — a field in its own right, not the same field/type as `riskLevel`). | +| `--remediation-safety safe` output identical to pre-Feature-12 `safe` output in violation membership. | Unchanged for `safe` membership (FR-007); new fields (`riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `requestedLevel`) are additive. `--remediation-safety`/`requestedLevel` filter against `remediationSafetyLevel`, not `riskLevel`. | ## CLI: new `ruleset-analysis` subcommand @@ -18,7 +18,7 @@ api-grade ruleset-analysis [--ruleset-path ] [--format json|human] - Without `--ruleset-path`, analyses the built-in default ruleset for the relevant format(s). - `--format json` returns a `RulesetAnalysis` JSON document. -- `--format human` (default) prints a table: rule id, risk level, confidence level, rationale. 
+- `--format human` (default) prints a table: rule id, risk level, confidence level, remediation safety level, rationale. Risk level and confidence level are the two independent signals the analyser produces (FR-003); remediation safety level is a field in its own right, derived from them via the decision matrix in `automated_remediation_safety_algorithm_spec.md`, not assigned directly. - Exits non-zero only on a genuine error (e.g. ruleset file not found / unparseable) — analysis itself never partially fails (every rule gets an entry, per FR-001/SC-005). ## MCP: `grade-api-remediation-safety` tool — `level` parameter @@ -26,7 +26,7 @@ api-grade ruleset-analysis [--ruleset-path ] [--format json|human] | Before this feature | After this feature | |---|---| | `level: z.enum(['safe'])` | `level: z.enum(['safe', 'humanreview', 'unsafe'])` | -| Response payload: `QuickFixOutput` shape under different field names (`quickFixCount`, `quickFixes`) | Response payload: `RemediationSafetyOutput` shape (`remediationItemCount`, `remediationItems`, `requestedLevel`); each item includes `riskLevel`, `confidenceLevel` | +| Response payload: `QuickFixOutput` shape under different field names (`quickFixCount`, `quickFixes`) | Response payload: `RemediationSafetyOutput` shape (`remediationItemCount`, `remediationItems`, `requestedLevel`); each item includes `riskLevel`, `confidenceLevel`, `remediationSafetyLevel` | | Tool description silent on confidence/risk-tier concept | Tool description updated to mention all three levels and that each returned item carries a confidence indicator | ## MCP: new `analyse-ruleset-safety` tool diff --git a/specs/012-remediation-safety/data-model.md b/specs/012-remediation-safety/data-model.md index e1f7e56..bd6a6b9 100644 --- a/specs/012-remediation-safety/data-model.md +++ b/specs/012-remediation-safety/data-model.md @@ -4,13 +4,17 @@ Enum string: `"safe"` | `"humanreview"` | `"unsafe"`. Ordered from least to most cautious. 
Replaces the prior two-class `ViolationClass` (`nonBreaking`/`breaking`/`unknown`). +## RiskLevel + +Enum string: `"low"` | `"medium"` | `"high"`. The analyser's **estimate of consumer-impact likelihood** for the minimal edit that would satisfy a rule — independent of how confident the analyser is in that estimate (`ConfidenceLevel`, below) and independent of the `RemediationSafetyLevel` it resolves to (see Decision Matrix). Carried on the `riskLevel` field of both `RuleAnalysis` and `RemediationItem` — a deliberately distinct field, with a distinct type and distinct values, from each entity's separate `remediationSafetyLevel` field. An earlier version of this document conflated the two under one field also named `riskLevel` but typed as `RemediationSafetyLevel`; that was incorrect and is corrected throughout this document. + ## ConfidenceLevel -Enum string: `"high"` | `"medium"` | `"low"`. Describes how confident the ruleset analyser is in a `RuleAnalysis`'s assigned `RemediationSafetyLevel`. +Enum string: `"high"` | `"medium"` | `"low"`. Describes how confident the ruleset analyser is in a `RuleAnalysis`'s/`RemediationItem`'s assigned `riskLevel` — **not** directly in `remediationSafetyLevel`, though it feeds into deriving that value via the Decision Matrix below. -- `high` — the rule id matched a curated, known table (Stage 1), the rule's `given` selected path/channel object keys directly (Stage 2a), or the entry came from a persisted user correction or bundled pre-calculated default (Stage 0). -- `medium` — classification came from the generic path-segment heuristic only (Stage 2b), with a single, unambiguous tier match. -- `low` — either no recognizable signal at all (Stage 3 fallback), or the path-segment heuristic matched more than one tier (genuine ambiguity, downgraded from `medium`). 
+- `high` — the rule id matched a curated, known table (Stage 1); the rule's `given` selected path/channel object keys directly (Stage 2a); a recognized function (`truthy`/`pattern`/etc.) targeted an ontology area matching exactly one tier (Stage 2b); or the entry came from a persisted user correction or bundled pre-calculated default (Stage 0). +- `medium` — a recognized function's target spanned more than one ontology tier (Stage 2b), or the generic segment fallback matched a single, unambiguous tier (Stage 2c). +- `low` — the rule's function is unrecognized/custom (Stage 2b), the generic segment fallback matched more than one tier (Stage 2c, genuine ambiguity), or no recognizable signal at all (Stage 3 fallback). ## AnalysisSource @@ -23,12 +27,26 @@ One entry per rule in an analysed ruleset. | Field | Type | Description | |---|---|---| | `ruleId` | string | The rule's identifier within its ruleset. | -| `riskLevel` | `RemediationSafetyLevel` | The assigned risk level for auto-remediating violations of this rule. | -| `confidenceLevel` | `ConfidenceLevel` | Confidence in `riskLevel`. | -| `rationale` | string | Short human-readable explanation of why this level/confidence was assigned (e.g. "rule id matched curated safe-prefix table" or "given path touches `parameters` and `description` — conservative match, ambiguous"). | +| `riskLevel` | `RiskLevel` \| `null` | The analyser's estimate of consumer-impact likelihood (Stages 1–3 only — see Decision Matrix below). `null` for `source: "persisted"` / `"bundled-default"` entries, which store a human-confirmed or pre-computed `remediationSafetyLevel` directly rather than deriving it from a risk estimate. | +| `confidenceLevel` | `ConfidenceLevel` | Confidence in `riskLevel` (Stages 1–3), or the confidence carried over from a Stage 0 entry. | +| `remediationSafetyLevel` | `RemediationSafetyLevel` | A field in its own right — the final assigned safety level for auto-remediating violations of this rule. 
For Stages 1–3, derived from `riskLevel` + `confidenceLevel` via the Decision Matrix below — never assigned directly by a stage. For Stage 0 entries, this is the stored value itself. | +| `rationale` | string | Short human-readable explanation of why this level/confidence was assigned (e.g. "rule id matched curated safe-prefix table" or "`pattern` function on a `paths` object key — public-surface rename"). | | `source` | `AnalysisSource` | Which stage produced this entry — see above. | -**Validation rules**: every rule present in the input ruleset MUST produce exactly one `RuleAnalysis` entry (FR-001, SC-005) — no rule is ever omitted from analyser output. +**Validation rules**: every rule present in the input ruleset MUST produce exactly one `RuleAnalysis` entry (FR-001, SC-005) — no rule is ever omitted from analyser output. For `source` values `"curated"`, `"heuristic"`, and `"fallback"` (Stages 1–3), `remediationSafetyLevel` MUST equal `decisionMatrix(riskLevel, confidenceLevel)` — it is a derived value, not independently settable. + +### Decision Matrix + +The single function shared by Stages 1–3 to derive `remediationSafetyLevel` from `riskLevel` and `confidenceLevel`, taken verbatim from `clarification-algorithm.md` §5 (see `research.md` §3 for how each stage produces its `riskLevel`/`confidenceLevel` inputs): + +``` +If riskLevel = low and confidenceLevel in {high, medium}: remediationSafetyLevel = safe +Else if riskLevel = medium and confidenceLevel = high: remediationSafetyLevel = humanreview +Else if riskLevel = high: remediationSafetyLevel = unsafe +Else: remediationSafetyLevel = humanreview +``` + +This table is total over the 3×3 `(riskLevel, confidenceLevel)` space — every combination not explicitly listed (`low`/`low`, `medium`/`medium`, `medium`/`low`) falls into the final `Else` and resolves to `humanreview`, so there is no input pair this function leaves unresolved. 
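Because the matrix is total and first-match-wins, it translates directly into a single function. A minimal TypeScript sketch, assuming the type names from this data model; `decisionMatrix`, `RuleAnalysisLike`, and the simplified `getRemediationSafety(ruleId, rules)` signature are hypothetical (the documented lookup takes a diagnostic and a `RulesetAnalysis`).

```typescript
type RiskLevel = "low" | "medium" | "high";
type ConfidenceLevel = "high" | "medium" | "low";
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";

// Derives remediationSafetyLevel from the two independent signals.
// Branch order follows the matrix rows; the first matching row wins.
function decisionMatrix(
  riskLevel: RiskLevel,
  confidenceLevel: ConfidenceLevel,
): RemediationSafetyLevel {
  if (riskLevel === "low" && (confidenceLevel === "high" || confidenceLevel === "medium")) {
    return "safe";
  }
  if (riskLevel === "medium" && confidenceLevel === "high") {
    return "humanreview";
  }
  if (riskLevel === "high") {
    return "unsafe";
  }
  // Remaining pairs: low/low, medium/medium, medium/low.
  return "humanreview";
}

interface RuleAnalysisLike {
  riskLevel: RiskLevel | null; // null for Stage 0 entries
  confidenceLevel: ConfidenceLevel;
  remediationSafetyLevel: RemediationSafetyLevel;
}

// Per-violation lookup with the FR-009 default for a rule the analysis lacks.
function getRemediationSafety(
  ruleId: string,
  rules: ReadonlyMap<string, RuleAnalysisLike>,
): RuleAnalysisLike {
  const entry = rules.get(ruleId);
  if (entry !== undefined) {
    return entry; // carried through verbatim, including a null riskLevel
  }
  // Synthetic Stage 3 entry: high risk, low confidence, run through the matrix.
  return { riskLevel: "high", confidenceLevel: "low", remediationSafetyLevel: decisionMatrix("high", "low") };
}
```

Producing the miss default through `decisionMatrix` rather than hard-coding `"unsafe"` keeps the FR-009 fallback from drifting out of step with the matrix if the matrix is ever revised.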
## RuleFingerprint @@ -91,8 +109,9 @@ For a given `ruleId`, checked in order until one matches a current `RuleFingerpr | `location` | string | Unchanged. | | `currentValue` | string \| null | Unchanged. | | `expectedImprovement` | string | Unchanged. | -| `riskLevel` | `RemediationSafetyLevel` | **New** — the violation's computed remediation safety, looked up from the rule's `RuleAnalysis`. | +| `riskLevel` | `RiskLevel` \| `null` | **New** — the violation's rule-level estimated risk (`low`/`medium`/`high`), looked up from the rule's `RuleAnalysis`. `null` when the lookup hit a Stage 0 entry that has no `riskLevel` of its own (see `RuleAnalysis`). | | `confidenceLevel` | `ConfidenceLevel` | **New** — confidence behind `riskLevel`, from the same lookup. | +| `remediationSafetyLevel` | `RemediationSafetyLevel` | **New** — a field in its own right, distinct from `riskLevel` both in name and in type/values (`safe`/`humanreview`/`unsafe`, not `low`/`medium`/`high`). The violation's computed remediation safety, looked up from the rule's `RuleAnalysis.remediationSafetyLevel`. This is the field `--remediation-safety`/`level` filtering matches against. | ## RemediationSafetyOutput (was `QuickFixOutput`) @@ -109,6 +128,6 @@ For a given `ruleId`, checked in order until one matches a current `RuleFingerpr ## Lookup / default behavior -`getRemediationSafety(diagnostic, rulesetAnalysis) -> { riskLevel, confidenceLevel }`: -- If `rulesetAnalysis.rules` contains an entry for `diagnostic.ruleId`, return its `riskLevel`/`confidenceLevel`. -- Otherwise (FR-009), return `{ riskLevel: "unsafe", confidenceLevel: "low" }`. +`getRemediationSafety(diagnostic, rulesetAnalysis) -> { riskLevel, confidenceLevel, remediationSafetyLevel }`: +- If `rulesetAnalysis.rules` contains an entry for `diagnostic.ruleId`, return its `riskLevel`/`confidenceLevel`/`remediationSafetyLevel` verbatim — all three are carried through to `RemediationItem` unchanged. 
+- Otherwise (FR-009), return `{ riskLevel: "high", confidenceLevel: "low", remediationSafetyLevel: "unsafe" }` — equivalent to a synthetic Stage 3 entry run through the Decision Matrix. diff --git a/specs/012-remediation-safety/quickstart.md b/specs/012-remediation-safety/quickstart.md index 7e699cd..47fa981 100644 --- a/specs/012-remediation-safety/quickstart.md +++ b/specs/012-remediation-safety/quickstart.md @@ -8,7 +8,7 @@ api-grade openapi.yaml --remediation-safety humanreview # new api-grade openapi.yaml --remediation-safety unsafe # new ``` -Each returned item now includes `riskLevel` and `confidenceLevel`: +Each returned item now includes `riskLevel`, `confidenceLevel`, and `remediationSafetyLevel` — three separate fields, not one. `riskLevel` is `low`/`medium`/`high`; `remediationSafetyLevel` is `safe`/`humanreview`/`unsafe` and is what `--remediation-safety`/`requestedLevel` filters against: ```json { @@ -20,8 +20,9 @@ Each returned item now includes `riskLevel` and `confidenceLevel`: "remediationItems": [ { "ruleId": "operation-operationId", - "riskLevel": "humanreview", + "riskLevel": "medium", "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", "...": "..." 
} ] @@ -32,10 +33,10 @@ Each returned item now includes `riskLevel` and `confidenceLevel`: ```bash api-grade ruleset-analysis --format human -# rule id risk confidence rationale -# operation-description safe high rule id matched curated safe-prefix table -# operation-operationId humanreview high rule id matched curated humanreview-prefix table -# oas3-schema unsafe low no recognizable rule-id or path signal +# rule id risk level confidence remediation safety rationale +# operation-description low high safe rule id matched curated safe-prefix table +# operation-operationId medium high humanreview rule id matched curated humanreview-prefix table +# oas3-schema high low unsafe no recognizable rule-id, function, or path signal api-grade ruleset-analysis --ruleset-path ./my-ruleset.yaml --format json ``` diff --git a/specs/012-remediation-safety/research.md b/specs/012-remediation-safety/research.md index 63e89ce..7cc85db 100644 --- a/specs/012-remediation-safety/research.md +++ b/specs/012-remediation-safety/research.md @@ -12,31 +12,47 @@ ## 2. Confidence scale -**Decision**: Three discrete levels — `high`, `medium`, `low` — mirroring the project's existing preference for small, explainable categories over numeric scores (same pattern as `ImpactLevel`, `DiagnosticSeverityLevel`). +**Decision**: Three discrete levels — `high`, `medium`, `low` — mirroring the project's existing preference for small, explainable categories over numeric scores (same pattern as `ImpactLevel`, `DiagnosticSeverityLevel`). The `riskLevel` field (§3) uses the same three-level shape (`low`/`medium`/`high`) for consistency, since both are inputs to the same decision matrix and a mismatched scale (e.g. risk on a 5-point scale, confidence on 3) would have no principled justification. 
-**Rationale**: Constitution Principle VI favors explanation over raw scores; a numeric 0–1 confidence would need its own thresholds restated everywhere it's displayed, with no added value for a binary "trust this / verify this" decision a user actually makes. +**Rationale**: Constitution Principle VI favors explanation over raw scores; a numeric 0–1 confidence would need its own thresholds restated everywhere it's displayed, with no added value for a binary "trust this / verify this" decision a user actually makes. `confidenceLevel` is not merely descriptive metadata — per §3's decision matrix it is one of two inputs that determine `remediationSafetyLevel`, so its discreteness also keeps the matrix a small, exhaustively-enumerable table rather than a continuous function needing its own threshold tuning. **Alternatives considered**: Continuous 0–100 confidence score — rejected as over-precise for a heuristic, rule-metadata-only analyser, and inconsistent with how grades/impact are presented elsewhere in the project. ## 3. 
Three-tier risk classification algorithm -**Decision**: Extend the existing two-stage quick-fixes algorithm (`specs/algorithms/quick_fixes_algorithm_spec.md`) from two outcome classes (`nonBreaking`/`breaking`, plus `unknown`) to three risk levels (`safe`/`humanreview`/`unsafe`), operating on **rule metadata** (`ruleId`, the rule's `given` JSONPath expression(s), `then.function`) rather than a violation's instance path: +**Decision**: Extend the existing two-stage quick-fixes algorithm (`specs/algorithms/quick_fixes_algorithm_spec.md`) from two outcome classes (`nonBreaking`/`breaking`, plus `unknown`) to a model with three signals per rule — two **independent** ones, `riskLevel` (`low`/`medium`/`high`) and `confidenceLevel` (`high`/`medium`/`low`), plus `remediationSafetyLevel` (`safe`/`humanreview`/`unsafe`) **derived from the other two via a fixed decision matrix, as a field in its own right** — operating on **rule metadata** (`ruleId`, the rule's `given` JSONPath expression(s), `then.function`) rather than a violation's instance path: -- **Stage 1 — curated rule-id tables** (high confidence): extend the existing `safe`-prefix table; add a new `humanreview`-prefix table for rules whose fixes are typically additive but operationally significant (e.g. `operation-operationId`, `oas3-server-not-example-com`, security/server-related rules). Anything not listed falls through. -- **Stage 2 — path-segment heuristic on the rule's `given`** (medium confidence): extend `BREAKING_SEGMENTS`/`NON_BREAKING_SEGMENTS` into three tiers — `UNSAFE_SEGMENTS` (`required`, `type`, `format`, `parameters`), `HUMANREVIEW_SEGMENTS` (`enum`, `default`, `security`, `servers`, `operationId`, `additionalProperties`, `responses`), `SAFE_SEGMENTS` (existing list). Checked most-conservative-first (`unsafe` > `humanreview` > `safe`). If a rule's `given` matches segments from **more than one tier**, the most conservative matched tier is still chosen, but confidence is downgraded one step (e.g.
`medium` → `low`) to flag the genuine ambiguity for a ruleset maintainer. -- **Stage 3 — fallback** (low confidence): no rule-id or path signal recognized (e.g. a custom rule, or a whole-document rule like `given: "$"`) → defaults to `unsafe` with `low` confidence. Conservative-by-default, matching the existing project philosophy ("absence of a safety signal is never treated as evidence of safety"). +- **Stage 1 — curated rule-id tables** (confidence: `high`): extend the existing `safe`-prefix table (maps to `riskLevel: low`) and add a new `humanreview`-prefix table for rules whose fixes are typically additive but operationally significant, e.g. `operation-operationId`, `oas3-server-not-example-com`, security/server-related rules (maps to `riskLevel: medium`). Anything not listed falls through to Stage 2. +- **Stage 2 — automated estimation from rule mechanics + contract-surface ontology** (confidence: `high`/`medium`/`low`, see below), checked in order: + - **2a. Key-selector check**: a `given` expression that selects object *keys* (the JSONPath Plus `~` modifier) under `paths` or `channels` → `riskLevel: high`, confidence `high`. Renaming a path or channel key is a public-surface rename by construction (clarification document Example B), independent of which function is applied. + - **2b. Function-mechanics classification of `then.function`** — infers the *likely minimal edit*, per the clarification document's "infer the minimal satisfying edit" step, then estimates risk from where that edit lands on the contract-surface ontology (§"Build a format-aware contract-surface ontology"): + - *Additive* functions (`truthy`, `defined`, `field`+`truthy` on a sub-field) imply "add/populate a field". Base `riskLevel: low`; escalated to `medium` if the targeted field is itself in `HUMANREVIEW_SEGMENTS`, or to `high` if in `UNSAFE_SEGMENTS` (e.g. `truthy` on a `parameters` entry is a different risk than `truthy` on `$.info.description`). 
+ - *Rename/reformat* functions (`pattern`, `casing`) imply "rename or reformat the targeted value". Base `riskLevel: medium`; escalated to `high` when the target is a high-impact ontology area (path/channel keys, parameters, security, request/response schemas); de-escalated to `low` only for low-impact metadata (description, contact, licence). + - Confidence for both: `high` when the function+target combination matches exactly one ontology tier unambiguously; `medium` when it matches but spans tiers; `low` when the function is unrecognized. + - *Custom JavaScript* functions: mechanics cannot be inferred statically (clarification document, "If the rule uses a custom JavaScript function"). `riskLevel: high`, confidence `low` — conservative by construction, per the constraint that custom functions "are arbitrary JavaScript functions" whose remediation intent cannot be derived from the declaration alone. + - **2c. Generic segment fallback within Stage 2**: for a rule whose function isn't recognized as additive/rename/custom but whose `given` still matches a known segment tier, `riskLevel` follows the matched tier (`UNSAFE_SEGMENTS` → `high`, `HUMANREVIEW_SEGMENTS` → `medium`, `SAFE_SEGMENTS` → `low`), confidence `medium` for a single unambiguous tier match, downgraded to `low` if the `given` matches segments from more than one tier (genuine ambiguity). + - `UNSAFE_SEGMENTS`/`HUMANREVIEW_SEGMENTS`/`SAFE_SEGMENTS` are the same three tiers as before, extended for AsyncAPI (see the corrected paragraph below). +- **Stage 3 — fallback** (confidence: `low`): no rule-id, function, or path signal recognized at all (e.g. a whole-document rule like `given: "$"`) → `riskLevel: high`. Conservative-by-default, matching the existing project philosophy ("absence of a safety signal is never treated as evidence of safety"); the decision matrix below resolves `high` risk to `unsafe` regardless of confidence, so this is equivalent to today's hard-coded fallback. 
+- **Decision matrix** (applied uniformly to the output of Stages 1–3 to produce `remediationSafetyLevel`, taken verbatim from `clarification-algorithm.md` §5): + ``` + If riskLevel = low and confidenceLevel in {high, medium}: remediationSafetyLevel = safe + Else if riskLevel = medium and confidenceLevel = high: remediationSafetyLevel = humanreview + Else if riskLevel = high: remediationSafetyLevel = unsafe + Else: remediationSafetyLevel = humanreview + ``` + This table is total over the 3×3 input space (every other combination — e.g. `low`/`low`, `medium`/`medium`, `medium`/`low` — falls into the `Else` branch and resolves to `humanreview`), so no additional default is needed beyond Stage 3's `high`/`low` fallback. -**Rationale**: Reuses a proven, explainable, deterministic pattern already accepted by the project (and by users reading `quick_fixes_algorithm_spec.md`) rather than inventing a new paradigm; the three-tier extension is the minimal change that satisfies FR-002 while preserving FR-007 (no regression for `safe`). +**Rationale**: Reuses a proven, explainable, deterministic pattern already accepted by the project (and by users reading `quick_fixes_algorithm_spec.md`) rather than inventing a new paradigm; running every stage's output through one shared decision-matrix function (rather than having curated tables assign `remediationSafetyLevel` directly) keeps `riskLevel` and `confidenceLevel` as genuinely independent signals end-to-end, satisfying FR-002/FR-003 without a separate code path per stage, and keeps `remediationSafetyLevel` a field in its own right rather than a relabeling of `riskLevel`. -**Alternatives considered**: Statistical/ML classification over rule descriptions — rejected: nondeterministic, costly, and violates Constitution Principle V (zero-cost prerequisites) if it requires an external model; also harder to explain ("rationale" requirement, FR-003) than a deterministic rule table. 
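The decision matrix above can be sketched in TypeScript as one total function. This is an illustrative sketch, not the project's implementation: the type names mirror those in the rename table below (`RiskLevel`, `ConfidenceLevel`, `RemediationSafetyLevel`), while the function name `deriveRemediationSafetyLevel` is hypothetical.

```typescript
// Type names follow the rename table; the function name is hypothetical.
type RiskLevel = "low" | "medium" | "high";
type ConfidenceLevel = "high" | "medium" | "low";
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";

/**
 * Derives remediationSafetyLevel from the two independent signals.
 * Branch order mirrors the pseudocode; the final return is the `Else`
 * branch, so every one of the 9 (risk, confidence) pairs gets a value.
 */
function deriveRemediationSafetyLevel(
  riskLevel: RiskLevel,
  confidenceLevel: ConfidenceLevel,
): RemediationSafetyLevel {
  if (riskLevel === "low" && (confidenceLevel === "high" || confidenceLevel === "medium")) {
    return "safe";
  }
  if (riskLevel === "medium" && confidenceLevel === "high") {
    return "humanreview";
  }
  if (riskLevel === "high") {
    // High risk is unsafe regardless of confidence.
    return "unsafe";
  }
  // low/low, medium/medium, medium/low: never the permissive `safe`.
  return "humanreview";
}
```

Because the last `return` catches every remaining pair, `deriveRemediationSafetyLevel("medium", "low")` yields `"humanreview"`, matching the `Else` branch, and no stage needs its own default.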
+**Alternatives considered**: Statistical/ML classification over rule descriptions — rejected: nondeterministic, costly, and violates Constitution Principle V (zero-cost prerequisites) if it requires an external model; also harder to explain ("rationale" requirement, FR-003) than a deterministic rule table. Having Stage 1's curated tables assign `remediationSafetyLevel` directly (the original design) — rejected after reassessment below: it conflated risk and safety level exactly the way the clarification document warns against, and gave Stage 1 a different output shape than Stages 2–3. -**Reassessed against `clarification-algorithm.md` — gap found and corrected**: the prior segment tables (`UNSAFE_SEGMENTS`/`HUMANREVIEW_SEGMENTS`/`SAFE_SEGMENTS`) were carried over unchanged from the OpenAPI-only quick-fixes algorithm and contained no AsyncAPI-specific terms, despite Constitution Principle I requiring format-neutral treatment and the clarification document dedicating an explicit section ("Build a format-aware contract-surface ontology") to AsyncAPI's high-impact surfaces — channel `address`, channel parameters, operation `action`, the operation-channel relationship, and messages/payload schemas. The clarification document's worked example (Example B: a `pattern` rule on `$.paths[*]~`, the object-*key* selector) is also not caught by plain segment-membership matching, since `paths`/`channels` were never in any tier as bare segments — adding them as ordinary segments would over-match every rule that merely reads something nested under a path/channel (including safe ones like `operation-description`). 
Two corrections folded into the updated algorithm spec: (a) extend `UNSAFE_SEGMENTS` with AsyncAPI's high-impact segment terms (`address`, `action`, `messages`, `payload`) alongside the existing OpenAPI ones, and add `channels`/`operations`/`reply` to `HUMANREVIEW_SEGMENTS` as broader/ambiguous AsyncAPI surfaces; (b) add a dedicated **key-selector check** ahead of generic segment matching — a `given` expression that selects object *keys* (the JSONPath Plus `~` modifier) under `paths` or `channels` is always `unsafe`/`high` confidence regardless of segment membership, since renaming a path or channel key is a public-surface rename by construction, matching the clarification document's Example B directly. +**Reassessed against `clarification-algorithm.md` — gap found and corrected**: the prior segment tables (`UNSAFE_SEGMENTS`/`HUMANREVIEW_SEGMENTS`/`SAFE_SEGMENTS`) were carried over unchanged from the OpenAPI-only quick-fixes algorithm and contained no AsyncAPI-specific terms, despite Constitution Principle I requiring format-neutral treatment and the clarification document dedicating an explicit section ("Build a format-aware contract-surface ontology") to AsyncAPI's high-impact surfaces — channel `address`, channel parameters, operation `action`, the operation-channel relationship, and messages/payload schemas. The clarification document's worked example (Example B: a `pattern` rule on `$.paths[*]~`, the object-*key* selector) is also not caught by plain segment-membership matching, since `paths`/`channels` were never in any tier as bare segments — adding them as ordinary segments would over-match every rule that merely reads something nested under a path/channel (including safe ones like `operation-description`). 
Two corrections folded into the updated algorithm spec: (a) extend `UNSAFE_SEGMENTS` with AsyncAPI's high-impact segment terms (`address`, `action`, `messages`, `payload`) alongside the existing OpenAPI ones, and add `channels`/`operations`/`reply` to `HUMANREVIEW_SEGMENTS` as broader/ambiguous AsyncAPI surfaces; (b) add a dedicated **key-selector check** ahead of generic segment matching (Stage 2a above), since renaming a path or channel key is a public-surface rename by construction, matching the clarification document's Example B directly. -**Reassessed against `clarification-algorithm.md`**: confirmed and unchanged. The document's "Recommended High Level Estimating Model Approach" independently arrives at the same separation of "risk" from "confidence" (§5 of that document) and the same rationale — a rule can be high-risk/low-confidence or low-risk/high-confidence, and conflating the two either over-blocks harmless changes or under-blocks dangerous ones. No revision needed. +**Reassessed a second time against `clarification-algorithm.md` — second gap found and corrected**: an earlier pass of this document claimed the risk/confidence/safety separation was "confirmed and unchanged," reasoning that confidence-as-an-explanatory-annotation alongside a directly-assigned three-value `riskLevel` field already satisfied the clarification document's §5 ("Separate risk from confidence from safety level"). That reassessment was made before `clarification-algorithm.md` was edited to add its explicit `estimatedRisk`/`confidence`/`remediationSafetyLevel` field list and decision-matrix pseudocode (§5), and did not survive contact with that addition: the original design had no risk-estimate field independent of the final classification at all — Stages 1–3 assigned the final three-value level directly under the name `riskLevel`, and `confidence` never changed which bucket a rule landed in, only how the assignment was explained. 
That is precisely the conflation the clarification document warns produces "high risk, low confidence" and "low risk, high confidence" cases that a system without the separation cannot represent (its own worked examples in §5). The revision above restores the three independent signals — renaming the low/medium/high estimate to `riskLevel` and giving the derived safe/humanreview/unsafe value its own field, `remediationSafetyLevel`, rather than the two sharing one overloaded field name — and the literal decision matrix, and also folds in the clarification document's "Recommended High Level Estimating Model Approach" steps 3–4 (infer the minimal satisfying edit; estimate whether it touches public-contract elements) and the "Recommended approach for a new ruleset" step 3 (infer likely remediation from rule mechanics — function semantics, not just rule-id/segment matching), neither of which the prior design implemented either. ## 4. Default/fallback behavior when a rule has no analysis -**Decision**: Any violation whose `ruleId` is absent from the cached `RulesetAnalysis` (e.g. ruleset changed between analysis and grading) defaults to `unsafe` / `low` confidence at lookup time, not just at Stage 3 of the analyser itself. +**Decision**: Any violation whose `ruleId` is absent from the cached `RulesetAnalysis` (e.g. ruleset changed between analysis and grading) defaults to `riskLevel: high` / `confidenceLevel: low` / `remediationSafetyLevel: unsafe` at lookup time, not just at Stage 3 of the analyser itself — this is the same triple Stage 3 produces, computed directly rather than via the decision matrix, since there is no rule metadata at all to run the matrix against. **Rationale**: Directly required by FR-009 and the spec's first Edge Case; keeps the conservative-by-default guarantee end-to-end, not just inside the analyser. 
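A minimal sketch of the lookup-time default in §4, assuming the cached `RulesetAnalysis` is exposed as a `Map` keyed by `ruleId`. The trimmed `RuleAnalysis` shape, the `lookupRuleAnalysis` helper, and the rationale text are hypothetical illustrations, not the project's actual data model.

```typescript
type RiskLevel = "low" | "medium" | "high";
type ConfidenceLevel = "high" | "medium" | "low";
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";

// Hypothetical, trimmed-down per-rule entry for this sketch.
interface RuleAnalysis {
  ruleId: string;
  riskLevel: RiskLevel;
  confidenceLevel: ConfidenceLevel;
  remediationSafetyLevel: RemediationSafetyLevel;
  rationale: string;
}

/**
 * Returns the cached entry for a violation's rule, or the conservative
 * high/low/unsafe triple when the rule was never analysed (e.g. the
 * ruleset changed between analysis and grading).
 */
function lookupRuleAnalysis(
  analysis: ReadonlyMap<string, RuleAnalysis>,
  ruleId: string,
): RuleAnalysis {
  const entry = analysis.get(ruleId);
  if (entry !== undefined) {
    return entry;
  }
  // No rule metadata to run the decision matrix against, so the
  // triple is written out directly.
  return {
    ruleId,
    riskLevel: "high",
    confidenceLevel: "low",
    remediationSafetyLevel: "unsafe",
    rationale: "No analysis entry for this rule id; defaulted to the most conservative classification.",
  };
}
```

The default never drops or fails a violation; it only makes it the least permissive one. It also coincides with what the decision matrix would return for `high` risk at `low` confidence, so both paths agree.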
@@ -53,7 +69,7 @@ | `buildQuickFix()` | `buildRemediationItem()` | | `buildQuickFixOutput()` | `buildRemediationSafetyOutput()` | | `formatQuickFixesHuman()` | `formatRemediationSafetyHuman()` | -| Types: `QuickFix`, `QuickFixOutput`, `ViolationClass` | `RemediationItem`, `RemediationSafetyOutput`, `RemediationSafetyLevel` (3-value), plus new `ConfidenceLevel`, `RuleAnalysis`, `RulesetAnalysis` | +| Types: `QuickFix`, `QuickFixOutput`, `ViolationClass` | `RemediationItem`, `RemediationSafetyOutput`, `RemediationSafetyLevel` (3-value, the `remediationSafetyLevel` field), plus new `RiskLevel` (3-value, the `riskLevel` field — distinct field and type from `RemediationSafetyLevel`), `ConfidenceLevel`, `RuleAnalysis`, `RulesetAnalysis` | | `packages/api-grade-mcp/src/tools/quick-fixes-only.ts`, `registerQuickFixesOnlyTool` | `remediation-safety.ts`, `registerRemediationSafetyTool` | | `tests/integration/cli-quick-fixes.test.ts`, `packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts`, `packages/api-grade-core/tests/unit/quick-fixes.test.ts` | renamed to `cli-remediation-safety.test.ts`, `remediation-safety.test.ts`, `remediation-safety.test.ts` | diff --git a/specs/012-remediation-safety/spec.md b/specs/012-remediation-safety/spec.md index b1061c7..8228856 100644 --- a/specs/012-remediation-safety/spec.md +++ b/specs/012-remediation-safety/spec.md @@ -29,18 +29,18 @@ A developer grading their API spec wants to know, for every violation, how risky ### User Story 2 - Ruleset maintainer trusts the analyser's classification because confidence is shown (Priority: P2) -A team supplies its own custom Spectral ruleset. They want to understand, rule by rule, why the analyser assigned a given risk level, and how confident the analyser is in that assignment, so they can spot-check or override classifications they disagree with before relying on them in CI. +A team supplies its own custom Spectral ruleset. 
They want to understand, rule by rule, why the analyser assigned a given remediation safety level, and how confident the analyser is in that assignment, so they can spot-check or override classifications they disagree with before relying on them in CI. **Why this priority**: Confidence is what makes the analyser trustworthy for custom/third-party rulesets where the built-in heuristics may not apply cleanly; without it, users have no way to judge whether a `safe` label is well-founded. -**Independent Test**: Run the ruleset analyser against both the built-in ruleset and a custom ruleset containing rules with no recognizable pattern. Confirm every rule receives a risk level and a confidence level, and that unrecognized/ambiguous rules receive a visibly lower confidence than well-known rules. +**Independent Test**: Run the ruleset analyser against both the built-in ruleset and a custom ruleset containing rules with no recognizable pattern. Confirm every rule receives a risk level, a confidence level, and a remediation safety level, and that unrecognized/ambiguous rules receive a visibly lower confidence than well-known rules. **Acceptance Scenarios**: -1. **Given** a ruleset is analysed, **When** the analysis completes, **Then** every rule in the ruleset has an assigned risk level (`safe`, `humanreview`, or `unsafe`) and a confidence level for that assignment. -2. **Given** a rule the analyser cannot confidently classify (e.g. a custom rule with no recognizable id pattern or schema path), **When** it is analysed, **Then** it is still assigned a risk level (defaulting to the more conservative `unsafe` or `humanreview`) but with a low confidence indicator, rather than being silently omitted. -3. **Given** the analyser's output for a ruleset, **When** a user inspects it (JSON or human format), **Then** they can see, per rule, the risk level, confidence level, and a brief rationale. -4. 
**Given** a user disagrees with a rule's assigned risk level and persists a correction for it, **When** the same ruleset is analysed again in a later, separate invocation, **Then** the corrected risk level is returned for that rule without requiring the correction to be re-applied. +1. **Given** a ruleset is analysed, **When** the analysis completes, **Then** every rule in the ruleset has an assigned remediation safety level (`safe`, `humanreview`, or `unsafe`) and a confidence level for that assignment. +2. **Given** a rule the analyser cannot confidently classify (e.g. a custom rule with no recognizable id pattern or schema path), **When** it is analysed, **Then** it is still assigned a remediation safety level (defaulting to the more conservative `unsafe` or `humanreview`) but with a low confidence indicator, rather than being silently omitted. +3. **Given** the analyser's output for a ruleset, **When** a user inspects it (JSON or human format), **Then** they can see, per rule, the risk level, confidence level, remediation safety level, and a brief rationale. +4. **Given** a user disagrees with a rule's assigned remediation safety level and persists a correction for it, **When** the same ruleset is analysed again in a later, separate invocation, **Then** the corrected remediation safety level is returned for that rule without requiring the correction to be re-applied. --- @@ -63,7 +63,8 @@ A new contributor or documentation reader should encounter "remediation safety" - What happens when a violation's rule was never analysed (e.g. ruleset changed between analysis and grading, or a dynamically generated rule id)? The system must assign a safe default (most conservative: `unsafe`) rather than crash or silently drop the violation from output. - How does the analyser handle a rule that legitimately spans multiple risk levels depending on context (e.g. a rule that sometimes flags a breaking change and sometimes a cosmetic one)? 
The specification (`automated_remediation_safety_algorithm_spec.md`) must define how such rules are classified — at the rule level, the analyser assigns one risk/confidence pair per rule; finer-grained, per-violation distinctions are out of scope for the analyser itself. -- What happens when a custom/private ruleset is supplied that the analyser has never seen before? It must still produce a complete classification (risk + confidence) for every rule, with confidence honestly reflecting the lack of prior knowledge, rather than failing the grading run. +- What happens when a custom/private ruleset is supplied that the analyser has never seen before? It must still produce a complete classification (risk level + confidence level + derived remediation safety level) for every rule, with confidence honestly reflecting the lack of prior knowledge, rather than failing the grading run. +- What happens when risk level and confidence level disagree (e.g. `medium` risk with only `low` confidence, or `low` risk with `low` confidence)? The decision matrix (FR-003, `automated_remediation_safety_algorithm_spec.md`) resolves every such combination to `humanreview`, not `safe` — low confidence in a risk estimate is never, by itself, grounds for the most permissive classification, consistent with the project's conservative-by-default posture (FR-009). - What happens to existing consumers (CI pipelines, scripts) that depend on today's binary `safe` vs "not safe" filtering? `--remediation-safety safe` (and equivalent MCP/package usage) must continue to mean exactly what it means today; the new levels are additive, not a breaking redefinition of `safe`. - What happens when a rule's definition changes after a colocated/shared or personal analysis entry was captured for it? 
That rule's Fingerprint no longer matches, so the stale entry is treated as not found for that rule alone (FR-014) and falls back to automated analysis — unaffected rules elsewhere in the same shared file remain trusted. - What happens when only some rules in a ruleset have a pre-calculated, shared, or personal entry? Every rule still gets a classification (FR-015) — covered rules use the matching entry, the rest go through automated analysis as if no persisted analysis existed for them. @@ -75,19 +76,19 @@ A new contributor or documentation reader should encounter "remediation safety" ### Functional Requirements -- **FR-001**: The system MUST provide a ruleset analyser that, given a Spectral-compatible ruleset, produces for each rule a risk level describing how risky it would be to automatically remediate violations of that rule. -- **FR-002**: The risk level produced for each rule MUST be one of exactly three values: `safe`, `humanreview`, or `unsafe`. -- **FR-003**: The ruleset analyser MUST also produce, for each rule, a confidence level indicating how confident the analyser is in the assigned risk level. +- **FR-001**: The system MUST provide a ruleset analyser that, given a Spectral-compatible ruleset, produces for each rule a remediation safety level describing how safe it would be to automatically remediate violations of that rule. +- **FR-002**: The remediation safety level produced for each rule MUST be one of exactly three values: `safe`, `humanreview`, or `unsafe`. +- **FR-003**: The ruleset analyser MUST produce, for each rule, a risk level (`low`/`medium`/`high`, reflecting how likely the rule's minimal satisfying remediation is to touch consumer-facing contract surface) and a confidence level (`high`/`medium`/`low`) in that estimate, as two independent signals — not as a single combined value. 
The rule's remediation safety level (FR-002) MUST be a field in its own right, derived from these two signals via the decision matrix defined in `automated_remediation_safety_algorithm_spec.md`, not assigned directly by any analysis stage and not merely a relabeling of the risk level. - **FR-004**: The ruleset analyser's classification logic MUST be implemented in alignment with a new specification document, `automated_remediation_safety_algorithm_spec.md`, authored as part of this feature and stored alongside the existing algorithm specs (`specs/algorithms/`). -- **FR-005**: Remediation safety for a given violation MUST be calculated by looking up the risk level (and confidence) the ruleset analyser assigned to that violation's rule, rather than via the prior ad hoc rule-id-prefix/path heuristic. +- **FR-005**: Remediation safety for a given violation MUST be calculated by looking up the remediation safety level (and risk/confidence signals) the ruleset analyser assigned to that violation's rule, rather than via the prior ad hoc rule-id-prefix/path heuristic. - **FR-006**: The `--remediation-safety` CLI option (and equivalent MCP/package parameters) MUST accept all three levels: `safe`, `humanreview`, and `unsafe`. - **FR-007**: Requesting `--remediation-safety safe` MUST produce output equivalent in scope to today's pre-feature behavior (no regression for existing users of the `safe` level). -- **FR-008**: Remediation safety information (risk level per violation, and the rule-level confidence behind it) MUST be included in both the JSON output and the human-readable output of every tool that currently reports remediation-safety/quick-fix information (CLI, MCP server tools, and any consuming packages such as the Backstage plugin where applicable). -- **FR-009**: When a violation's rule has no analyser result available at grading time, the system MUST default that violation to the most conservative risk level (`unsafe`) rather than omitting it or failing. 
+- **FR-008**: Remediation safety information (remediation safety level per violation, and the rule-level risk and confidence signals behind it) MUST be included in both the JSON output and the human-readable output of every tool that currently reports remediation-safety/quick-fix information (CLI, MCP server tools, and any consuming packages such as the Backstage plugin where applicable). +- **FR-009**: When a violation's rule has no analyser result available at grading time, the system MUST default that violation to the most conservative remediation safety level (`unsafe`) rather than omitting it or failing. - **FR-010**: All source code, tests, type/function/tool names, package metadata, and user-facing or contributor-facing documentation across the repository MUST be updated so that no "quick fix" terminology (in any casing or separator style) remains. -- **FR-011**: The ruleset analyser's per-rule results (risk level, confidence level, and rationale) MUST be inspectable by users, in both JSON and human-readable form, independent of grading a specific API spec (i.e. "analyse this ruleset" is a capability in its own right, not only an internal implementation detail). +- **FR-011**: The ruleset analyser's per-rule results (risk level, confidence level, remediation safety level, and rationale) MUST be inspectable by users, in both JSON and human-readable form, independent of grading a specific API spec (i.e. "analyse this ruleset" is a capability in its own right, not only an internal implementation detail) — so a user can see not just the final classification but the two independent signals (FR-003) it was derived from, to judge whether they agree with it. - **FR-012**: Before running the automated analysis stages, the system MUST check for a previously computed or pre-calculated ruleset analysis for the loaded ruleset and, when found, use it directly instead of recomputing from rule metadata. 
At minimum, the built-in ruleset MUST ship with such a pre-calculated analysis. -- **FR-013**: Users MUST be able to persist a correction to a rule's risk level (and, implicitly, raise its confidence to reflect human confirmation) for a specific ruleset, such that the corrected classification is automatically loaded and used the next time that same ruleset is analysed or graded against, without requiring the correction to be re-entered. +- **FR-013**: Users MUST be able to persist a correction to a rule's remediation safety level (and, implicitly, raise its confidence to reflect human confirmation) for a specific ruleset, such that the corrected classification is automatically loaded and used the next time that same ruleset is analysed or graded against, without requiring the correction to be re-entered. - **FR-014**: The system MUST be able to recognize, for a given rule within a ruleset, whether a pre-calculated or persisted classification for that exact rule definition is still valid (i.e. the rule hasn't changed since the classification was captured) — a stale entry for a changed rule MUST NOT be silently reused, and MUST NOT prevent the other, unchanged rules' entries from being used. - **FR-015**: When a persisted or pre-calculated analysis only covers some of the rules in the currently loaded ruleset (e.g. the ruleset gained rules since the analysis was captured), the system MUST still produce a complete classification for every rule (FR-001/SC-005) — covered rules use the persisted/pre-calculated entry, uncovered rules fall through to automated analysis. 
- **FR-016**: The system MUST support storing a ruleset's persisted analysis **colocated with the ruleset itself**, using a deterministic naming convention derived from the ruleset's own location, so that (a) presence or absence of persisted data for a given ruleset can be determined by a direct lookup at that derived location rather than a separate index, and (b) the persisted data can be shared between colleagues simply by it living alongside the ruleset (e.g. committed in the same repository), rather than each person having to separately configure their own copy. @@ -97,10 +98,11 @@ A new contributor or documentation reader should encounter "remediation safety" ### Key Entities *(include if feature involves data)* -- **Ruleset Analyser Result**: Per analysed ruleset, a collection of per-rule entries. Each entry references a rule id and carries the rule's assigned risk level, confidence level, a short human-readable rationale for the assignment, and where that assignment came from (freshly computed, pre-calculated/bundled, or a persisted user correction). -- **Risk Level**: One of `safe`, `humanreview`, `unsafe` — describes how safe it is to automatically remediate a violation of a given rule without human review. -- **Confidence Level**: Describes how confident the analyser is in a rule's assigned risk level (e.g. driven by how well-known/recognizable the rule is versus how custom/ambiguous it is, or whether a human has explicitly confirmed it). -- **Remediation Safety (per violation)**: The risk level applied to a specific violation found during grading, derived from the ruleset analyser's result for that violation's rule. +- **Ruleset Analyser Result**: Per analysed ruleset, a collection of per-rule entries. 
Each entry references a rule id and carries the rule's risk level, confidence level, remediation safety level (a field in its own right, derived from the first two via a decision matrix — see FR-003), a short human-readable rationale for the assignment, and where that assignment came from (freshly computed, pre-calculated/bundled, or a persisted user correction). For freshly-computed entries, "risk level" reflects the analyser's inference of the *likely minimal remediation* for the rule and whether that remediation would touch consumer-facing contract surface — not the final safe/humanreview/unsafe classification itself. +- **Remediation Safety Level**: The rule's *final* classification — one of `safe`, `humanreview`, `unsafe` — describing how safe it is to automatically remediate a violation of a given rule without human review. For automated entries this is always derived from Risk Level and Confidence Level via a fixed decision matrix, never assigned directly, so that a rule's classification and the analyser's certainty in it remain independently visible (see Edge Cases). +- **Risk Level**: One of `low`, `medium`, `high` — the analyser's estimate, independent of its confidence, of how likely the rule's minimal satisfying remediation is to alter consumer-facing contract surface (paths/channels, parameters, request/response or message schemas, security) versus only low-impact metadata. Distinct field, distinct type, distinct values from Remediation Safety Level. +- **Confidence Level**: Describes how confident the analyser is in a rule's assigned Risk Level (e.g. driven by how well-known/recognizable the rule's id or function is versus how custom/ambiguous it is, or whether a human has explicitly confirmed it). A rule can be high-risk/low-confidence (e.g. a custom function targeting `$.paths`) or low-risk/high-confidence (e.g. a `truthy` check on `$.info.description`) — these are not the same axis, and the decision matrix treats them as such. 
+- **Remediation Safety (per violation)**: The remediation safety level applied to a specific violation found during grading, derived from the ruleset analyser's result for that violation's rule. - **Rule Fingerprint**: A stable identifier for "this exact rule definition," derived from a rule's content (`given`, `then.function`, `severity`, `description`), used to detect whether a pre-calculated/persisted entry for that `ruleId` is still valid for the rule as currently defined (FR-014), independent of where the ruleset as a whole is stored or fetched from. - **Shared Ruleset Analysis**: A ruleset analysis (in full or in part) stored colocated with the ruleset itself via a deterministic naming convention (FR-016), readable by anyone who can read the ruleset (whether local file or GitHub-hosted, FR-017). This is the primary mechanism for a team/organisation to share one set of remediation-safety judgements instead of each person maintaining their own. - **Personal Ruleset Analysis Override**: A user-local correction (FR-018) that takes precedence over the Shared Ruleset Analysis and the automated analysis stages for the rule(s) it covers, without modifying the shared, colocated data. @@ -110,17 +112,17 @@ A new contributor or documentation reader should encounter "remediation safety" ### Measurable Outcomes - **SC-001**: Users grading any spec can distinguish all three remediation-safety levels (`safe`, `humanreview`, `unsafe`) in both JSON and human output, for both the built-in ruleset and a supplied custom ruleset. -- **SC-002**: For the built-in ruleset, every rule has a documented risk level and confidence level traceable to the `automated_remediation_safety_algorithm_spec.md` specification. +- **SC-002**: For the built-in ruleset, every rule has a documented risk level, confidence level, and remediation safety level traceable to the `automated_remediation_safety_algorithm_spec.md` specification. 
- **SC-003**: A repository-wide search for "quick fix" (any casing/separator) returns zero matches after the feature is complete. - **SC-004**: Existing `--remediation-safety safe` users observe no behavioral change in the set of violations returned, compared to before this feature. - **SC-005**: For an arbitrary, previously-unseen custom ruleset, the analyser completes and returns a risk and confidence level for 100% of its rules (no rule left unclassified). -- **SC-006**: A user-corrected risk level for a rule in a given ruleset is honored (returned without re-running automated analysis for that rule) on a subsequent, separate invocation against the same rule definition, and is no longer honored once that specific rule's definition changes. +- **SC-006**: A user-corrected remediation safety level for a rule in a given ruleset is honored (returned without re-running automated analysis for that rule) on a subsequent, separate invocation against the same rule definition, and is no longer honored once that specific rule's definition changes. - **SC-007**: The built-in ruleset's analysis is available without any per-rule automated computation having to run at request time (served from a pre-calculated/bundled result), for both the CLI and MCP surfaces. - **SC-008**: Two different users pointed at the same ruleset location (local path or GitHub-hosted) see identical classifications for every rule covered by that ruleset's shared, colocated analysis, without either of them having separately configured it. ## Assumptions -- The three risk levels (`safe`, `humanreview`, `unsafe`) and their relative ordering (in terms of caution) were fixed by Feature 11 and GOAL.md and are not renegotiated here. +- The three remediation safety levels (`safe`, `humanreview`, `unsafe`) and their relative ordering (in terms of caution) were fixed by Feature 11 and GOAL.md and are not renegotiated here. 
The `low`/`medium`/`high` risk level introduced by this feature (FR-003) is a separate, new scale, not a renaming or expansion of these three. - "Confidence level" is assumed to be a small ordered set (e.g. high/medium/low) rather than a continuous numeric score, consistent with how grades and other diagnostics in this project favor discrete, explainable categories over raw scores; the exact scale is defined in `automated_remediation_safety_algorithm_spec.md` during planning. - The ruleset analyser operates on rule definitions/metadata (id, applied path/schema patterns, severity, description) rather than on a corpus of historical remediation outcomes — there is no assumption of a training/feedback loop in this feature. - "Rationale" per rule is a short, human-readable explanation (not a separate structured field requiring its own schema beyond a text string) sufficient for users to understand why a level was assigned. From e8e7bb5573ddc96fd07d0b624c0e7c4e74042683 Mon Sep 17 00:00:00 2001 From: DawMatt Date: Wed, 24 Jun 2026 21:29:24 +1000 Subject: [PATCH 06/22] Safety assessment improve human curation support --- .../contracts/remediation-safety-surfaces.md | 6 +-- specs/012-remediation-safety/data-model.md | 50 ++++++++++++------- specs/012-remediation-safety/quickstart.md | 14 ++++-- specs/012-remediation-safety/research.md | 42 +++++++++------- specs/012-remediation-safety/spec.md | 16 ++++-- 5 files changed, 79 insertions(+), 49 deletions(-) diff --git a/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md b/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md index 17982cf..0ecdd7b 100644 --- a/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md +++ b/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md @@ -7,7 +7,7 @@ Supersedes `specs/011-remediation-safety-rename/contracts/remediation-safety-sur | Before this feature | After this feature | |---|---| | Accepts only `safe`; any other value 
rejected with `Error: --remediation-safety must be "safe".` | Accepts `safe`, `humanreview`, `unsafe`. Any other value rejected with `Error: --remediation-safety must be one of: safe, humanreview, unsafe.` | -| Filtered output built by `buildQuickFixOutput`/`formatQuickFixesHuman`, shape `QuickFixOutput` (`quickFixCount`, `quickFixes`). | Filtered output built by `buildRemediationSafetyOutput`/`formatRemediationSafetyHuman`, shape `RemediationSafetyOutput` (`remediationItemCount`, `remediationItems`, `requestedLevel`). Each item additionally carries `riskLevel` (`low`/`medium`/`high`), `confidenceLevel`, and `remediationSafetyLevel` (`safe`/`humanreview`/`unsafe` — a field in its own right, not the same field/type as `riskLevel`). | +| Filtered output built by `buildQuickFixOutput`/`formatQuickFixesHuman`, shape `QuickFixOutput` (`quickFixCount`, `quickFixes`). | Filtered output built by `buildRemediationSafetyOutput`/`formatRemediationSafetyHuman`, shape `RemediationSafetyOutput` (`remediationItemCount`, `remediationItems`, `requestedLevel`). Each item additionally carries `riskLevel` (`low`/`medium`/`high`), `confidenceLevel`, `remediationSafetyLevel` (`safe`/`humanreview`/`unsafe` — a field in its own right, not the same field/type as `riskLevel`), and `staleFingerprintWarning` (`null` unless the rule's classification is human-assessed and its fingerprint no longer matches — FR-021). | | `--remediation-safety safe` output identical to pre-Feature-12 `safe` output in violation membership. | Unchanged for `safe` membership (FR-007); new fields (`riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `requestedLevel`) are additive. `--remediation-safety`/`requestedLevel` filter against `remediationSafetyLevel`, not `riskLevel`. | ## CLI: new `ruleset-analysis` subcommand @@ -18,7 +18,7 @@ api-grade ruleset-analysis [--ruleset-path ] [--format json|human] - Without `--ruleset-path`, analyses the built-in default ruleset for the relevant format(s). 
- `--format json` returns a `RulesetAnalysis` JSON document. -- `--format human` (default) prints a table: rule id, risk level, confidence level, remediation safety level, rationale. Risk level and confidence level are the two independent signals the analyser produces (FR-003); remediation safety level is a field in its own right, derived from them via the decision matrix in `automated_remediation_safety_algorithm_spec.md`, not assigned directly. +- `--format human` (default) prints a table: rule id, risk level, confidence level, remediation safety level, assessed by (`human`/`automated` — FR-020), rationale, plus a fingerprint-mismatch warning line for any human-assessed rule whose stored fingerprint no longer matches (FR-021). Risk level and confidence level are the two independent signals the analyser produces (FR-003); remediation safety level is a field in its own right, derived from them via the decision matrix in `automated_remediation_safety_algorithm_spec.md`, not assigned directly — except for `assessed by: human` rows, which store `remediationSafetyLevel` directly and have no `riskLevel` to derive it from. - Exits non-zero only on a genuine error (e.g. ruleset file not found / unparseable) — analysis itself never partially fails (every rule gets an entry, per FR-001/SC-005). 
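A minimal TypeScript sketch of the `--format human` rendering described above. It assumes tab-separated columns and a separate indented `WARNING` line for stale human-assessed entries, following the contract wording "plus a fingerprint-mismatch warning line". `formatRulesetAnalysisHuman` and `RuleAnalysisRow` are hypothetical names, not confirmed package exports.

```typescript
// Hypothetical shapes: field names follow data-model.md, but this is a sketch, not the package's types.
type RiskLevel = "low" | "medium" | "high";
type ConfidenceLevel = "high" | "medium" | "low";
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type AssessmentOrigin = "human" | "automated";

interface StaleFingerprintWarning {
  storedFingerprint: string;
  currentFingerprint: string;
  message: string;
}

interface RuleAnalysisRow {
  ruleId: string;
  riskLevel: RiskLevel | null; // null for Stage 0 entries that store only the safety level
  confidenceLevel: ConfidenceLevel;
  remediationSafetyLevel: RemediationSafetyLevel;
  assessedBy: AssessmentOrigin;
  staleFingerprintWarning: StaleFingerprintWarning | null;
  rationale: string;
}

// Renders a header, one line per rule, and an extra WARNING line for any
// human-assessed rule whose stored fingerprint no longer matches (FR-021).
function formatRulesetAnalysisHuman(rules: RuleAnalysisRow[]): string {
  const lines: string[] = [
    ["rule id", "risk level", "confidence", "remediation safety", "assessed by", "rationale"].join("\t"),
  ];
  for (const r of rules) {
    lines.push(
      [r.ruleId, r.riskLevel ?? "-", r.confidenceLevel, r.remediationSafetyLevel, r.assessedBy, r.rationale].join("\t"),
    );
    if (r.assessedBy === "human" && r.staleFingerprintWarning !== null) {
      const w = r.staleFingerprintWarning;
      lines.push(`  WARNING: fingerprint mismatch (stored ${w.storedFingerprint}, current ${w.currentFingerprint}): ${w.message}`);
    }
  }
  return lines.join("\n");
}
```

A `null` `riskLevel` is printed as `-` here, which is one possible way to show a Stage 0 entry that carries no risk estimate of its own.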
## MCP: `grade-api-remediation-safety` tool — `level` parameter @@ -26,7 +26,7 @@ api-grade ruleset-analysis [--ruleset-path ] [--format json|human] | Before this feature | After this feature | |---|---| | `level: z.enum(['safe'])` | `level: z.enum(['safe', 'humanreview', 'unsafe'])` | -| Response payload: `QuickFixOutput` shape under different field names (`quickFixCount`, `quickFixes`) | Response payload: `RemediationSafetyOutput` shape (`remediationItemCount`, `remediationItems`, `requestedLevel`); each item includes `riskLevel`, `confidenceLevel`, `remediationSafetyLevel` | +| Response payload: `QuickFixOutput` shape under different field names (`quickFixCount`, `quickFixes`) | Response payload: `RemediationSafetyOutput` shape (`remediationItemCount`, `remediationItems`, `requestedLevel`); each item includes `riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `staleFingerprintWarning` | | Tool description silent on confidence/risk-tier concept | Tool description updated to mention all three levels and that each returned item carries a confidence indicator | ## MCP: new `analyse-ruleset-safety` tool diff --git a/specs/012-remediation-safety/data-model.md b/specs/012-remediation-safety/data-model.md index bd6a6b9..65ae0a7 100644 --- a/specs/012-remediation-safety/data-model.md +++ b/specs/012-remediation-safety/data-model.md @@ -12,13 +12,17 @@ Enum string: `"low"` | `"medium"` | `"high"`. The analyser's **estimate of consu Enum string: `"high"` | `"medium"` | `"low"`. Describes how confident the ruleset analyser is in a `RuleAnalysis`'s/`RemediationItem`'s assigned `riskLevel` — **not** directly in `remediationSafetyLevel`, though it feeds into deriving that value via the Decision Matrix below. -- `high` — the rule id matched a curated, known table (Stage 1); the rule's `given` selected path/channel object keys directly (Stage 2a); a recognized function (`truthy`/`pattern`/etc.) 
targeted an ontology area matching exactly one tier (Stage 2b); or the entry came from a persisted user correction or bundled pre-calculated default (Stage 0). -- `medium` — a recognized function's target spanned more than one ontology tier (Stage 2b), or the generic segment fallback matched a single, unambiguous tier (Stage 2c). -- `low` — the rule's function is unrecognized/custom (Stage 2b), the generic segment fallback matched more than one tier (Stage 2c, genuine ambiguity), or no recognizable signal at all (Stage 3 fallback). +- `high` — the rule's `given` selected path/channel object keys directly (Stage 1a); a recognized function (`truthy`/`pattern`/etc.) targeted an ontology area matching exactly one tier (Stage 1b); or the entry came from a Stage 0 lookup (persisted user correction, shared colocated analysis, or bundled pre-calculated default). +- `medium` — a recognized function's target spanned more than one ontology tier (Stage 1b), or the generic segment fallback matched a single, unambiguous tier (Stage 1c). +- `low` — the rule's function is unrecognized/custom (Stage 1b), the generic segment fallback matched more than one tier (Stage 1c, genuine ambiguity), or no recognizable signal at all (Stage 2 fallback). + +## AssessmentOrigin + +Enum string: `"human"` | `"automated"`. Who produced a `RuleAnalysis` entry's `remediationSafetyLevel` judgement — carried on the `assessedBy` field, independent of `source` (which stage/store produced the entry) and independent of `confidenceLevel` (how confident an *automated* judgement is in itself). `"human"` means a person explicitly reviewed and persisted this rule's classification (FR-013), including the built-in ruleset's well-known rules, which are authored by a maintainer rather than computed by Stage 1/2 — see `research.md` §3/§8. `"automated"` means Stage 1 or Stage 2 produced it with no human review, whether freshly computed or pre-computed and cached in `BundledRulesetAnalysis` at release time. 
This distinction governs fingerprint-staleness handling: see `RuleFingerprint` below. ## AnalysisSource -Enum string: `"persisted"` | `"bundled-default"` | `"curated"` | `"heuristic"` | `"fallback"`. Provenance of a `RuleAnalysis` entry — which stage of the algorithm (`automated_remediation_safety_algorithm_spec.md`) produced it. Not used for classification logic itself, but surfaced so a user inspecting analyser output (FR-011) can tell a human-confirmed entry (`persisted`) apart from an algorithmically-derived one. +Enum string: `"persisted"` | `"bundled-default"` | `"heuristic"` | `"fallback"`. Provenance of a `RuleAnalysis` entry — which stage of the algorithm (`automated_remediation_safety_algorithm_spec.md`) produced it. `"persisted"` covers both `SharedRulesetAnalysis` and `PersonalRulesetAnalysisOverride` lookups; `"heuristic"`/`"fallback"` are Stage 1/Stage 2 respectively. There is no `"curated"` value — the former hard-coded curated rule-id table no longer exists as a separate stage; its content is now `BundledRulesetAnalysis` entries with `source: "bundled-default"`, `assessedBy: "human"` (see `AssessmentOrigin`). Not used for classification logic itself, but surfaced so a user inspecting analyser output (FR-011) can tell which store/stage produced an entry — `assessedBy` is the complementary field for *who* (human vs. automated) made the judgement. ## RuleAnalysis @@ -27,17 +31,19 @@ One entry per rule in an analysed ruleset. | Field | Type | Description | |---|---|---| | `ruleId` | string | The rule's identifier within its ruleset. | -| `riskLevel` | `RiskLevel` \| `null` | The analyser's estimate of consumer-impact likelihood (Stages 1–3 only — see Decision Matrix below). `null` for `source: "persisted"` / `"bundled-default"` entries, which store a human-confirmed or pre-computed `remediationSafetyLevel` directly rather than deriving it from a risk estimate. 
| -| `confidenceLevel` | `ConfidenceLevel` | Confidence in `riskLevel` (Stages 1–3), or the confidence carried over from a Stage 0 entry. | -| `remediationSafetyLevel` | `RemediationSafetyLevel` | A field in its own right — the final assigned safety level for auto-remediating violations of this rule. For Stages 1–3, derived from `riskLevel` + `confidenceLevel` via the Decision Matrix below — never assigned directly by a stage. For Stage 0 entries, this is the stored value itself. | -| `rationale` | string | Short human-readable explanation of why this level/confidence was assigned (e.g. "rule id matched curated safe-prefix table" or "`pattern` function on a `paths` object key — public-surface rename"). | +| `riskLevel` | `RiskLevel` \| `null` | The analyser's estimate of consumer-impact likelihood (Stages 1–2 only — see Decision Matrix below). `null` for `source: "persisted"` / `"bundled-default"` entries, which store a human-confirmed or pre-computed `remediationSafetyLevel` directly rather than deriving it from a risk estimate. | +| `confidenceLevel` | `ConfidenceLevel` | Confidence in `riskLevel` (Stages 1–2), or the confidence carried over from a Stage 0 entry. | +| `remediationSafetyLevel` | `RemediationSafetyLevel` | A field in its own right — the final assigned safety level for auto-remediating violations of this rule. For Stages 1–2, derived from `riskLevel` + `confidenceLevel` via the Decision Matrix below — never assigned directly by a stage. For Stage 0 entries, this is the stored value itself. | +| `assessedBy` | `AssessmentOrigin` | Who produced this judgement — `"human"` for Stage 0 entries written via a persisted correction or authored by a maintainer for the built-in ruleset; `"automated"` for everything else (Stage 1, Stage 2, and any pre-computed `BundledRulesetAnalysis` entry without a maintainer judgement behind it). 
| +| `staleFingerprintWarning` | `{ storedFingerprint: string; currentFingerprint: string; message: string }` \| `null` | Set only when this entry came from a Stage 0 lookup with `assessedBy: "human"` whose stored `RuleFingerprint` no longer matches the rule's current fingerprint (see `RuleFingerprint` below) — the entry is still used, but flagged. `null` in every other case, including a fingerprint match. | +| `rationale` | string | Short human-readable explanation of why this level/confidence was assigned (e.g. "maintainer-confirmed safe classification" or "`pattern` function on a `paths` object key — public-surface rename"). | | `source` | `AnalysisSource` | Which stage produced this entry — see above. | -**Validation rules**: every rule present in the input ruleset MUST produce exactly one `RuleAnalysis` entry (FR-001, SC-005) — no rule is ever omitted from analyser output. For `source` values `"curated"`, `"heuristic"`, and `"fallback"` (Stages 1–3), `remediationSafetyLevel` MUST equal `decisionMatrix(riskLevel, confidenceLevel)` — it is a derived value, not independently settable. +**Validation rules**: every rule present in the input ruleset MUST produce exactly one `RuleAnalysis` entry (FR-001, SC-005) — no rule is ever omitted from analyser output. For `source` values `"heuristic"` and `"fallback"` (Stages 1–2), `remediationSafetyLevel` MUST equal `decisionMatrix(riskLevel, confidenceLevel)` — it is a derived value, not independently settable, and `assessedBy` MUST be `"automated"`. For `source` values `"persisted"` and `"bundled-default"`, `remediationSafetyLevel` is the stored value and `assessedBy` MUST reflect how that stored value was produced (see `AssessmentOrigin`). 
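The validation rules above are mostly mechanical. The following TypeScript sketch encodes the Decision Matrix exactly as the spec states it and reports any Stage 1–2 entry whose stored level disagrees with it, plus any rule that does not have exactly one entry. `validateRulesetAnalysis` and `RuleAnalysisEntry` are hypothetical names; the sketch deliberately skips the one rule it cannot decide from data alone (whether a Stage 0 entry's `assessedBy` reflects how its value was really produced).

```typescript
type RiskLevel = "low" | "medium" | "high";
type ConfidenceLevel = "high" | "medium" | "low";
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type AnalysisSource = "persisted" | "bundled-default" | "heuristic" | "fallback";

interface RuleAnalysisEntry {
  ruleId: string;
  riskLevel: RiskLevel | null;
  confidenceLevel: ConfidenceLevel;
  remediationSafetyLevel: RemediationSafetyLevel;
  assessedBy: "human" | "automated";
  source: AnalysisSource;
}

// Decision matrix as written in the spec (clarification-algorithm.md §5).
function decisionMatrix(risk: RiskLevel, confidence: ConfidenceLevel): RemediationSafetyLevel {
  if (risk === "low" && (confidence === "high" || confidence === "medium")) return "safe";
  if (risk === "medium" && confidence === "high") return "humanreview";
  if (risk === "high") return "unsafe";
  return "humanreview";
}

// Returns human-readable problems; an empty array means the analysis passes.
function validateRulesetAnalysis(ruleIds: string[], entries: RuleAnalysisEntry[]): string[] {
  const errors: string[] = [];
  const counts = new Map<string, number>();
  for (const e of entries) counts.set(e.ruleId, (counts.get(e.ruleId) ?? 0) + 1);

  // FR-001 / SC-005: every rule in the input ruleset has exactly one entry.
  for (const id of ruleIds) {
    const n = counts.get(id) ?? 0;
    if (n !== 1) errors.push(`${id}: expected exactly one entry, found ${n}`);
  }

  for (const e of entries) {
    // Stage 0 entries store remediationSafetyLevel directly, so they are not re-derived.
    if (e.source !== "heuristic" && e.source !== "fallback") continue;
    if (e.assessedBy !== "automated") errors.push(`${e.ruleId}: Stage 1-2 entry must be assessedBy "automated"`);
    if (e.riskLevel === null) {
      errors.push(`${e.ruleId}: Stage 1-2 entry must carry a riskLevel`);
    } else if (e.remediationSafetyLevel !== decisionMatrix(e.riskLevel, e.confidenceLevel)) {
      errors.push(`${e.ruleId}: remediationSafetyLevel does not match decisionMatrix`);
    }
  }
  return errors;
}
```

For example, a `heuristic` entry with `riskLevel: "medium"`, `confidenceLevel: "medium"` must resolve to `humanreview` (the `Else` branch), so storing `safe` for it is reported as an error.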
### Decision Matrix -The single function shared by Stages 1–3 to derive `remediationSafetyLevel` from `riskLevel` and `confidenceLevel`, taken verbatim from `clarification-algorithm.md` §5 (see `research.md` §3 for how each stage produces its `riskLevel`/`confidenceLevel` inputs): +The single function shared by Stages 1–2 to derive `remediationSafetyLevel` from `riskLevel` and `confidenceLevel`, taken verbatim from `clarification-algorithm.md` §5 (see `research.md` §3 for how each stage produces its `riskLevel`/`confidenceLevel` inputs): ``` If riskLevel = low and confidenceLevel in {high, medium}: remediationSafetyLevel = safe @@ -56,7 +62,10 @@ A stable identifier for "this exact rule definition" (FR-014), used as part of t |---|---|---| | `value` | string | Hash over one rule's own content: `ruleId`, `given`, `then.function`, `severity`, `description`. | -**Relationships**: Computed independently per `ruleId`; never derived from `rulesetPath`/`rulesetUrl`. A `RuleAnalysis` entry stored with a given `RuleFingerprint` is only reused if the rule's current `RuleFingerprint` still matches (spec.md Edge Cases — stale entries are skipped per-rule, not per-ruleset). +**Relationships**: Computed independently per `ruleId`; never derived from `rulesetPath`/`rulesetUrl`. Reuse of a stored entry on fingerprint mismatch depends on `AssessmentOrigin` (per direct user feedback): +- `assessedBy: "automated"` entry, fingerprint mismatch → treated as not found; skipped per-rule (not per-ruleset), falls through to Stages 1–2 (spec.md Edge Cases). +- `assessedBy: "human"` entry, fingerprint mismatch → still reused as-is (the stored `remediationSafetyLevel`/`riskLevel`/`confidenceLevel` are returned unchanged), but the returned `RuleAnalysis.staleFingerprintWarning` is populated with both the stored and current fingerprint values, so callers can detect and surface that the rule changed since a human reviewed it without the system second-guessing that review. 
+- Either origin, fingerprint match → reused with `staleFingerprintWarning: null`. ## RulesetAnalysis @@ -64,7 +73,7 @@ A stable identifier for "this exact rule definition" (FR-014), used as part of t |---|---|---| | `rulesetSource` | `"default" \| "custom"` | Mirrors `GradeResult.rulesetSource`. | | `rulesetPath` | string (optional) | Present when `rulesetSource === "custom"`. | -| `rules` | `RuleAnalysis[]` | One entry per rule, see above. May be assembled from a mix of `source` values — some rules from Stage 0 (persisted/shared/bundled), the rest from Stages 1–3. | +| `rules` | `RuleAnalysis[]` | One entry per rule, see above. May be assembled from a mix of `source` values — some rules from Stage 0 (persisted/shared/bundled), the rest from Stages 1–2. | **Relationships**: Computed once per distinct rule definition (keyed by `RuleFingerprint`, see `SharedRulesetAnalysis`/`PersonalRulesetAnalysisOverride` below), not merely cached for the lifetime of one process — this corrects the original design, which assumed no cross-invocation persistence (see `research.md` §8, added after reassessment against `clarification-algorithm.md` and further revised after direct user input on the sharing requirement). `GradeEngine` (or a caller wrapping it) holds the `RulesetAnalysis` alongside the loaded ruleset for the duration of a single run and consults it when building remediation-safety output, rather than recomputing per violation; across separate runs — and across different users pointed at the same ruleset — the persisted/shared/bundled layer (Stage 0) is what avoids recomputing per-rule classification for rules it covers. @@ -75,9 +84,9 @@ A partial or full `RulesetAnalysis`, **colocated with the ruleset itself** (FR-0 | Field | Type | Description | |---|---|---| | `location` | string | Derived deterministically from the ruleset's own path/URL via a fixed naming convention (e.g. 
appending a suffix to the ruleset's filename) — never a separately-tracked or registered location. | -| `rules` | `Record` | Keyed by `ruleId`. May cover all or only some of a ruleset's rules — uncovered rules are simply absent from the map. Each entry carries the `RuleFingerprint.value` it was captured against, for staleness detection. | +| `rules` | `Record` | Keyed by `ruleId`. May cover all or only some of a ruleset's rules — uncovered rules are simply absent from the map. Each entry carries the `RuleFingerprint.value` it was captured against, for staleness detection, and an `assessedBy` value (see `AssessmentOrigin`) that determines whether a later fingerprint mismatch invalidates the entry or merely warns. | -**Validation rules**: every `RuleAnalysis` value in `rules` MUST have `source: "persisted"`. For a local ruleset this file lives on disk next to the ruleset and is read/written directly; for a GitHub-hosted ruleset it is *read* via the same `resolveRuleset`/`fetchRulesetContent` flow already used to fetch the ruleset (FR-017), but is never *written* automatically (FR-019) — see `PersonalRulesetAnalysisOverride` for what happens when a write is requested against a non-writable location. +**Validation rules**: every `RuleAnalysis` value in `rules` MUST have `source: "persisted"`. `assessedBy` is typically `"human"` here — writing to this file is the act of a user persisting a correction (FR-013) — but is not constrained to `"human"` by the shape itself, since a future automated caching write would use the same record shape with `assessedBy: "automated"`. For a local ruleset this file lives on disk next to the ruleset and is read/written directly; for a GitHub-hosted ruleset it is *read* via the same `resolveRuleset`/`fetchRulesetContent` flow already used to fetch the ruleset (FR-017), but is never *written* automatically (FR-019) — see `PersonalRulesetAnalysisOverride` for what happens when a write is requested against a non-writable location. 
## PersonalRulesetAnalysisOverride (new, replaces the original PersistedRulesetAnalysis) @@ -92,11 +101,13 @@ A user-local correction (FR-018) that does not modify `SharedRulesetAnalysis`. R ## BundledRulesetAnalysis (new) -The built-in ruleset's pre-calculated analysis, shipped with the package (FR-012's "at a minimum the default ruleset" baseline). Same shape as `RulesetAnalysis`, generated once at release time by running Stages 1–3 over the built-in ruleset and committed alongside the package source — not regenerated at runtime. Every entry has `source: "bundled-default"`. +The built-in ruleset's pre-calculated analysis, shipped with the package (FR-012's "at a minimum the default ruleset" baseline). Same shape as `RulesetAnalysis`, committed alongside the package source and not regenerated at runtime. Every entry has `source: "bundled-default"`, but `assessedBy` varies per entry: well-known built-in rules (the ones a hard-coded curated table used to cover, before being folded into this mechanism per direct user feedback — see `research.md` §3/§8) are authored directly by a maintainer and stored as `assessedBy: "human"`; the remainder are generated once at release time by running Stages 1–2 over the built-in ruleset and stored as `assessedBy: "automated"`. There is no separate hard-coded table for the human-authored entries — they are ordinary `BundledRulesetAnalysis` records, edited the same way a maintainer would edit any other persisted analysis file. ## Lookup precedence (Stage 0) -For a given `ruleId`, checked in order until one matches a current `RuleFingerprint`: workspace-scoped `PersonalRulesetAnalysisOverride` → global-scoped `PersonalRulesetAnalysisOverride` → `SharedRulesetAnalysis` colocated with the ruleset → `BundledRulesetAnalysis` (only if this is the built-in ruleset) → fall through to Stages 1–3 of the algorithm. Personal overrides are checked first because they represent the most specific, most recently expressed intent for that user. 
+For a given `ruleId`, checked in order: workspace-scoped `PersonalRulesetAnalysisOverride` → global-scoped `PersonalRulesetAnalysisOverride` → `SharedRulesetAnalysis` colocated with the ruleset → `BundledRulesetAnalysis` (only if this is the built-in ruleset) → fall through to Stages 1–2 of the algorithm. Personal overrides are checked first because they represent the most specific, most recently expressed intent for that user. + +A fingerprint match is required only for `assessedBy: "automated"` entries: an automated entry is used only if its stored `RuleFingerprint` matches the rule's current one, whereas an `assessedBy: "human"` entry is used as soon as it is found, fingerprint match or not (with `staleFingerprintWarning` populated on mismatch, per `RuleFingerprint` above). An earlier-precedence human entry therefore always wins over later-precedence stores, even across a fingerprint mismatch. The lookup moves on to the next store in precedence order only when the current store has no entry for the rule, or has an `"automated"` entry whose fingerprint is stale, exactly as before this revision. ## RemediationItem (was `QuickFix`) @@ -112,6 +123,7 @@ For a given `ruleId`, checked in order until one matches a current `RuleFingerpr | `riskLevel` | `RiskLevel` \| `null` | **New** — the violation's rule-level estimated risk (`low`/`medium`/`high`), looked up from the rule's `RuleAnalysis`. `null` when the lookup hit a Stage 0 entry that has no `riskLevel` of its own (see `RuleAnalysis`). | | `confidenceLevel` | `ConfidenceLevel` | **New** — confidence behind `riskLevel`, from the same lookup. | | `remediationSafetyLevel` | `RemediationSafetyLevel` | **New** — a field in its own right, distinct from `riskLevel` both in name and in type/values (`safe`/`humanreview`/`unsafe`, not `low`/`medium`/`high`). The violation's computed remediation safety, looked up from the rule's `RuleAnalysis.remediationSafetyLevel`. This is the field `--remediation-safety`/`level` filtering matches against.
| +| `staleFingerprintWarning` | `{ storedFingerprint: string; currentFingerprint: string; message: string }` \| `null` | **New** — carried over verbatim from the rule's `RuleAnalysis.staleFingerprintWarning`, so a CI pipeline or human reading per-violation output sees the same "this rule changed since a human reviewed it" warning without needing to separately inspect the ruleset analysis. | ## RemediationSafetyOutput (was `QuickFixOutput`) @@ -128,6 +140,6 @@ For a given `ruleId`, checked in order until one matches a current `RuleFingerpr ## Lookup / default behavior -`getRemediationSafety(diagnostic, rulesetAnalysis) -> { riskLevel, confidenceLevel, remediationSafetyLevel }`: -- If `rulesetAnalysis.rules` contains an entry for `diagnostic.ruleId`, return its `riskLevel`/`confidenceLevel`/`remediationSafetyLevel` verbatim — all three are carried through to `RemediationItem` unchanged. -- Otherwise (FR-009), return `{ riskLevel: "high", confidenceLevel: "low", remediationSafetyLevel: "unsafe" }` — equivalent to a synthetic Stage 3 entry run through the Decision Matrix. +`getRemediationSafety(diagnostic, rulesetAnalysis) -> { riskLevel, confidenceLevel, remediationSafetyLevel, staleFingerprintWarning }`: +- If `rulesetAnalysis.rules` contains an entry for `diagnostic.ruleId`, return its `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` verbatim — all four are carried through to `RemediationItem` unchanged. +- Otherwise (FR-009), return `{ riskLevel: "high", confidenceLevel: "low", remediationSafetyLevel: "unsafe", staleFingerprintWarning: null }` — equivalent to a synthetic Stage 2 entry (`assessedBy: "automated"`) run through the Decision Matrix. 
diff --git a/specs/012-remediation-safety/quickstart.md b/specs/012-remediation-safety/quickstart.md index 47fa981..2465076 100644 --- a/specs/012-remediation-safety/quickstart.md +++ b/specs/012-remediation-safety/quickstart.md @@ -8,7 +8,7 @@ api-grade openapi.yaml --remediation-safety humanreview # new api-grade openapi.yaml --remediation-safety unsafe # new ``` -Each returned item now includes `riskLevel`, `confidenceLevel`, and `remediationSafetyLevel` — three separate fields, not one. `riskLevel` is `low`/`medium`/`high`; `remediationSafetyLevel` is `safe`/`humanreview`/`unsafe` and is what `--remediation-safety`/`requestedLevel` filters against: +Each returned item now includes `riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, and `staleFingerprintWarning` (usually `null`). `riskLevel` is `low`/`medium`/`high`; `remediationSafetyLevel` is `safe`/`humanreview`/`unsafe` and is what `--remediation-safety`/`requestedLevel` filters against: ```json { @@ -23,6 +23,7 @@ Each returned item now includes `riskLevel`, `confidenceLevel`, and `remediation "riskLevel": "medium", "confidenceLevel": "high", "remediationSafetyLevel": "humanreview", + "staleFingerprintWarning": null, "...": "..." 
} ] @@ -33,14 +34,17 @@ Each returned item now includes `riskLevel`, `confidenceLevel`, and `remediation ```bash api-grade ruleset-analysis --format human -# rule id risk level confidence remediation safety rationale -# operation-description low high safe rule id matched curated safe-prefix table -# operation-operationId medium high humanreview rule id matched curated humanreview-prefix table -# oas3-schema high low unsafe no recognizable rule-id, function, or path signal +# rule id risk level confidence remediation safety assessed by rationale +# operation-description low high safe human maintainer-confirmed safe classification (bundled) +# operation-operationId medium high humanreview human maintainer-confirmed humanreview classification (bundled) +# oas3-schema high low unsafe automated no recognizable rule-id, function, or path signal +# custom-team-rule-007 low high safe human WARNING: fingerprint mismatch (stored a1b2c3..., current d4e5f6...) — rule changed since this was last reviewed; persisted classification still honored api-grade ruleset-analysis --ruleset-path ./my-ruleset.yaml --format json ``` +There is no separate hard-coded table backing the `human`-assessed rows above for the built-in ruleset — they are ordinary bundled persisted entries (FR-012/FR-020), the same mechanism a user would use to persist a correction for their own ruleset (FR-013). The last row illustrates FR-021: a human-assessed entry whose rule definition has since changed is still honored, but flagged with both the stored and current fingerprint rather than silently discarded. + ## 3. 
MCP: same filtering, plus a dedicated ruleset-analysis tool ```text diff --git a/specs/012-remediation-safety/research.md b/specs/012-remediation-safety/research.md index 7cc85db..4535c96 100644 --- a/specs/012-remediation-safety/research.md +++ b/specs/012-remediation-safety/research.md @@ -22,41 +22,43 @@ **Decision**: Extend the existing two-stage quick-fixes algorithm (`specs/algorithms/quick_fixes_algorithm_spec.md`) from two outcome classes (`nonBreaking`/`breaking`, plus `unknown`) to a model with three **independent** signals per rule — `riskLevel` (`low`/`medium`/`high`), `confidenceLevel` (`high`/`medium`/`low`), and `remediationSafetyLevel` (`safe`/`humanreview`/`unsafe`) **derived from the other two via a fixed decision matrix, as a field in its own right** — operating on **rule metadata** (`ruleId`, the rule's `given` JSONPath expression(s), `then.function`) rather than a violation's instance path: -- **Stage 1 — curated rule-id tables** (confidence: `high`): extend the existing `safe`-prefix table (maps to `riskLevel: low`) and add a new `humanreview`-prefix table for rules whose fixes are typically additive but operationally significant, e.g. `operation-operationId`, `oas3-server-not-example-com`, security/server-related rules (maps to `riskLevel: medium`). Anything not listed falls through to Stage 2. -- **Stage 2 — automated estimation from rule mechanics + contract-surface ontology** (confidence: `high`/`medium`/`low`, see below), checked in order: - - **2a. Key-selector check**: a `given` expression that selects object *keys* (the JSONPath Plus `~` modifier) under `paths` or `channels` → `riskLevel: high`, confidence `high`. Renaming a path or channel key is a public-surface rename by construction (clarification document Example B), independent of which function is applied. - - **2b. 
Function-mechanics classification of `then.function`** — infers the *likely minimal edit*, per the clarification document's "infer the minimal satisfying edit" step, then estimates risk from where that edit lands on the contract-surface ontology (§"Build a format-aware contract-surface ontology"): +- **Stage 0 — persisted/bundled lookup** (confidence: as stored; `assessedBy` as stored — see §8): checked first, before any automated computation. There is no separate hard-coded "curated rule-id table" stage. The classifications that used to live in such a table (e.g. `operation-description` is `safe`, `operation-operationId` is `humanreview`) are, for the built-in ruleset, simply entries in `BundledRulesetAnalysis` (§8) with `assessedBy: "human"` — the same persisted-analysis mechanism used for any user-supplied ruleset, not a separate code path or data shape. This removes the duplication of having one mechanism for "rules we know about" and a different one for "rules a user has told us about," and means a maintainer correcting a built-in rule's classification edits the same kind of record a user would. Anything not covered by Stage 0 falls through to Stage 1. +- **Stage 1 — automated estimation from rule mechanics + contract-surface ontology** (confidence: `high`/`medium`/`low`, see below), checked in order: + - **1a. Key-selector check**: a `given` expression that selects object *keys* (the JSONPath Plus `~` modifier) under `paths` or `channels` → `riskLevel: high`, confidence `high`. Renaming a path or channel key is a public-surface rename by construction (clarification document Example B), independent of which function is applied. + - **1b. 
Function-mechanics classification of `then.function`** — infers the *likely minimal edit*, per the clarification document's "infer the minimal satisfying edit" step, then estimates risk from where that edit lands on the contract-surface ontology (§"Build a format-aware contract-surface ontology"): - *Additive* functions (`truthy`, `defined`, `field`+`truthy` on a sub-field) imply "add/populate a field". Base `riskLevel: low`; escalated to `medium` if the targeted field is itself in `HUMANREVIEW_SEGMENTS`, or to `high` if in `UNSAFE_SEGMENTS` (e.g. `truthy` on a `parameters` entry is a different risk than `truthy` on `$.info.description`). - *Rename/reformat* functions (`pattern`, `casing`) imply "rename or reformat the targeted value". Base `riskLevel: medium`; escalated to `high` when the target is a high-impact ontology area (path/channel keys, parameters, security, request/response schemas); de-escalated to `low` only for low-impact metadata (description, contact, licence). - Confidence for both: `high` when the function+target combination matches exactly one ontology tier unambiguously; `medium` when it matches but spans tiers; `low` when the function is unrecognized. - *Custom JavaScript* functions: mechanics cannot be inferred statically (clarification document, "If the rule uses a custom JavaScript function"). `riskLevel: high`, confidence `low` — conservative by construction, per the constraint that custom functions "are arbitrary JavaScript functions" whose remediation intent cannot be derived from the declaration alone. - - **2c. 
Generic segment fallback within Stage 2**: for a rule whose function isn't recognized as additive/rename/custom but whose `given` still matches a known segment tier, `riskLevel` follows the matched tier (`UNSAFE_SEGMENTS` → `high`, `HUMANREVIEW_SEGMENTS` → `medium`, `SAFE_SEGMENTS` → `low`), confidence `medium` for a single unambiguous tier match, downgraded to `low` if the `given` matches segments from more than one tier (genuine ambiguity). + - **1c. Generic segment fallback within Stage 1**: for a rule whose function isn't recognized as additive/rename/custom but whose `given` still matches a known segment tier, `riskLevel` follows the matched tier (`UNSAFE_SEGMENTS` → `high`, `HUMANREVIEW_SEGMENTS` → `medium`, `SAFE_SEGMENTS` → `low`), confidence `medium` for a single unambiguous tier match, downgraded to `low` if the `given` matches segments from more than one tier (genuine ambiguity). - `UNSAFE_SEGMENTS`/`HUMANREVIEW_SEGMENTS`/`SAFE_SEGMENTS` are the same three tiers as before, extended for AsyncAPI (see the corrected paragraph below). -- **Stage 3 — fallback** (confidence: `low`): no rule-id, function, or path signal recognized at all (e.g. a whole-document rule like `given: "$"`) → `riskLevel: high`. Conservative-by-default, matching the existing project philosophy ("absence of a safety signal is never treated as evidence of safety"); the decision matrix below resolves `high` risk to `unsafe` regardless of confidence, so this is equivalent to today's hard-coded fallback. -- **Decision matrix** (applied uniformly to the output of Stages 1–3 to produce `remediationSafetyLevel`, taken verbatim from `clarification-algorithm.md` §5): +- **Stage 2 — fallback** (confidence: `low`): no rule-id, function, or path signal recognized at all (e.g. a whole-document rule like `given: "$"`) → `riskLevel: high`. 
Conservative-by-default, matching the existing project philosophy ("absence of a safety signal is never treated as evidence of safety"); the decision matrix below resolves `high` risk to `unsafe` regardless of confidence, so this is equivalent to today's hard-coded fallback. Always `assessedBy: "automated"` — there is no human judgement behind a Stage 2 entry.
+- **Decision matrix** (applied uniformly to the output of Stages 1–2 to produce `remediationSafetyLevel`, taken verbatim from `clarification-algorithm.md` §5; Stage 0 entries store `remediationSafetyLevel` directly and never pass through this matrix):
   ```
   If riskLevel = low and confidenceLevel in {high, medium}:
       remediationSafetyLevel = safe
   Else if riskLevel = medium and confidenceLevel = high:
       remediationSafetyLevel = humanreview
   Else if riskLevel = high:
       remediationSafetyLevel = unsafe
   Else:
       remediationSafetyLevel = humanreview
   ```
-  This table is total over the 3×3 input space (every other combination — e.g. `low`/`low`, `medium`/`medium`, `medium`/`low` — falls into the `Else` branch and resolves to `humanreview`), so no additional default is needed beyond Stage 3's `high`/`low` fallback.
+  This table is total over the 3×3 input space (every other combination — e.g. `low`/`low`, `medium`/`medium`, `medium`/`low` — falls into the `Else` branch and resolves to `humanreview`), so no additional default is needed beyond Stage 2's `high`/`low` fallback.
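As a reading aid, the decision matrix, the Stage 1a key-selector check and the §4 lookup-miss default can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the project's implementation. The type aliases mirror the field names used above. `decideSafetyLevel`, `isKeySelectorOnPathsOrChannels`, `classifyKeySelector` and `LOOKUP_MISS_DEFAULT` are hypothetical names. The key-selector test is a simplified regular expression, not a real JSONPath Plus parser.

```typescript
// Hypothetical type aliases mirroring the spec's field names (not the package's real exports).
type RiskLevel = "low" | "medium" | "high";
type ConfidenceLevel = "low" | "medium" | "high";
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";
type AssessmentOrigin = "human" | "automated";

interface Classification {
  riskLevel: RiskLevel;
  confidenceLevel: ConfidenceLevel;
  remediationSafetyLevel: RemediationSafetyLevel;
  assessedBy: AssessmentOrigin;
}

// The §5 decision matrix, applied only to Stage 1-2 output (Stage 0 entries bypass it).
function decideSafetyLevel(risk: RiskLevel, confidence: ConfidenceLevel): RemediationSafetyLevel {
  if (risk === "low" && (confidence === "high" || confidence === "medium")) return "safe";
  if (risk === "medium" && confidence === "high") return "humanreview";
  if (risk === "high") return "unsafe";
  // Remaining combinations (low/low, medium/medium, medium/low) fall through to here.
  return "humanreview";
}

// Stage 1a (simplified): does `given` select object keys (JSONPath Plus `~`) under paths/channels?
// Assumption: a real implementation would parse the JSONPath expression, not pattern-match it.
function isKeySelectorOnPathsOrChannels(given: string): boolean {
  return /^\$\.(paths|channels)\b.*~$/.test(given);
}

// A Stage 1a hit is high risk with high confidence, then goes through the same matrix.
function classifyKeySelector(given: string): Classification | undefined {
  if (!isKeySelectorOnPathsOrChannels(given)) return undefined;
  return {
    riskLevel: "high",
    confidenceLevel: "high",
    remediationSafetyLevel: decideSafetyLevel("high", "high"),
    assessedBy: "automated",
  };
}

// §4 lookup-miss default: the same quadruple Stage 2 produces, assigned directly.
const LOOKUP_MISS_DEFAULT: Classification = {
  riskLevel: "high",
  confidenceLevel: "low",
  remediationSafetyLevel: "unsafe",
  assessedBy: "automated",
};

// Print the whole 3x3 input space to show that every combination has an outcome.
const levels: RiskLevel[] = ["low", "medium", "high"];
for (const risk of levels) {
  for (const confidence of levels) {
    console.log(`${risk}/${confidence} -> ${decideSafetyLevel(risk, confidence)}`);
  }
}
```

The loop prints nine lines. Only `low/medium` and `low/high` resolve to `safe`, and all three `high/*` lines resolve to `unsafe`. Because the default's `high`/`low` pair also maps to `unsafe` through the matrix, assigning it directly gives the same result that running the matrix would.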
-**Rationale**: Reuses a proven, explainable, deterministic pattern already accepted by the project (and by users reading `quick_fixes_algorithm_spec.md`) rather than inventing a new paradigm; running every stage's output through one shared decision-matrix function (rather than having curated tables assign `remediationSafetyLevel` directly) keeps `riskLevel` and `confidenceLevel` as genuinely independent signals end-to-end, satisfying FR-002/FR-003 without a separate code path per stage, and keeps `remediationSafetyLevel` a field in its own right rather than a relabeling of `riskLevel`. +**Rationale**: Reuses a proven, explainable, deterministic pattern already accepted by the project (and by users reading `quick_fixes_algorithm_spec.md`) rather than inventing a new paradigm; running every automated stage's output through one shared decision-matrix function (rather than having a curated table assign `remediationSafetyLevel` directly) keeps `riskLevel` and `confidenceLevel` as genuinely independent signals end-to-end, satisfying FR-002/FR-003 without a separate code path per stage, and keeps `remediationSafetyLevel` a field in its own right rather than a relabeling of `riskLevel`. Folding the former curated table into Stage 0/`BundledRulesetAnalysis` (per direct user feedback) additionally means the built-in ruleset's known-good classifications benefit from the same fingerprint-staleness and human-override handling (§8) as any other persisted entry, instead of being a hard-coded table that silently drifts out of sync with the built-in ruleset's actual rule definitions. -**Alternatives considered**: Statistical/ML classification over rule descriptions — rejected: nondeterministic, costly, and violates Constitution Principle V (zero-cost prerequisites) if it requires an external model; also harder to explain ("rationale" requirement, FR-003) than a deterministic rule table. 
Having Stage 1's curated tables assign `remediationSafetyLevel` directly (the original design) — rejected after reassessment below: it conflated risk and safety level exactly the way the clarification document warns against, and gave Stage 1 a different output shape than Stages 2–3. +**Alternatives considered**: Statistical/ML classification over rule descriptions — rejected: nondeterministic, costly, and violates Constitution Principle V (zero-cost prerequisites) if it requires an external model; also harder to explain ("rationale" requirement, FR-003) than a deterministic rule table. Having a curated rule-id table assign `remediationSafetyLevel` directly as its own hard-coded code-level stage (the original design, "Stage 1") — rejected after reassessment below: besides conflating risk and safety level, it duplicated the persisted-analysis mechanism (§8) with a second, parallel "known rules" concept that had its own staleness story (none) instead of reusing fingerprinting. -**Reassessed against `clarification-algorithm.md` — gap found and corrected**: the prior segment tables (`UNSAFE_SEGMENTS`/`HUMANREVIEW_SEGMENTS`/`SAFE_SEGMENTS`) were carried over unchanged from the OpenAPI-only quick-fixes algorithm and contained no AsyncAPI-specific terms, despite Constitution Principle I requiring format-neutral treatment and the clarification document dedicating an explicit section ("Build a format-aware contract-surface ontology") to AsyncAPI's high-impact surfaces — channel `address`, channel parameters, operation `action`, the operation-channel relationship, and messages/payload schemas. 
The clarification document's worked example (Example B: a `pattern` rule on `$.paths[*]~`, the object-*key* selector) is also not caught by plain segment-membership matching, since `paths`/`channels` were never in any tier as bare segments — adding them as ordinary segments would over-match every rule that merely reads something nested under a path/channel (including safe ones like `operation-description`). Two corrections folded into the updated algorithm spec: (a) extend `UNSAFE_SEGMENTS` with AsyncAPI's high-impact segment terms (`address`, `action`, `messages`, `payload`) alongside the existing OpenAPI ones, and add `channels`/`operations`/`reply` to `HUMANREVIEW_SEGMENTS` as broader/ambiguous AsyncAPI surfaces; (b) add a dedicated **key-selector check** ahead of generic segment matching (Stage 2a above), since renaming a path or channel key is a public-surface rename by construction, matching the clarification document's Example B directly. +**Reassessed against `clarification-algorithm.md` — gap found and corrected**: the prior segment tables (`UNSAFE_SEGMENTS`/`HUMANREVIEW_SEGMENTS`/`SAFE_SEGMENTS`) were carried over unchanged from the OpenAPI-only quick-fixes algorithm and contained no AsyncAPI-specific terms, despite Constitution Principle I requiring format-neutral treatment and the clarification document dedicating an explicit section ("Build a format-aware contract-surface ontology") to AsyncAPI's high-impact surfaces — channel `address`, channel parameters, operation `action`, the operation-channel relationship, and messages/payload schemas. The clarification document's worked example (Example B: a `pattern` rule on `$.paths[*]~`, the object-*key* selector) is also not caught by plain segment-membership matching, since `paths`/`channels` were never in any tier as bare segments — adding them as ordinary segments would over-match every rule that merely reads something nested under a path/channel (including safe ones like `operation-description`). 
Two corrections folded into the updated algorithm spec: (a) extend `UNSAFE_SEGMENTS` with AsyncAPI's high-impact segment terms (`address`, `action`, `messages`, `payload`) alongside the existing OpenAPI ones, and add `channels`/`operations`/`reply` to `HUMANREVIEW_SEGMENTS` as broader/ambiguous AsyncAPI surfaces; (b) add a dedicated **key-selector check** ahead of generic segment matching (Stage 1a above), since renaming a path or channel key is a public-surface rename by construction, matching the clarification document's Example B directly. + +**Reassessed a third time, per direct user feedback — curated table folded into Stage 0**: the former "Stage 1 — curated rule-id tables" has been removed as a distinct, hard-coded stage. Its content (the `safe`-prefix and `humanreview`-prefix rule-id mappings) is migrated into `BundledRulesetAnalysis` (§8) as ordinary persisted entries with `assessedBy: "human"`, looked up via the existing Stage 0 precedence rather than a separate code table. Stages formerly numbered 2 and 3 are renumbered to 1 and 2 accordingly. This is a pure mechanism consolidation, not a behavior change for the built-in ruleset's existing classifications — but it does mean those classifications now benefit from fingerprint staleness detection and the human-override warning behavior (§8) that a hard-coded table could never have had (a hard-coded table has no fingerprint to compare against, so it could silently go stale against a built-in ruleset edit with no detection at all). **Reassessed a second time against `clarification-algorithm.md` — second gap found and corrected**: an earlier pass of this document claimed the risk/confidence/safety separation was "confirmed and unchanged," reasoning that confidence-as-an-explanatory-annotation alongside a directly-assigned three-value `riskLevel` field already satisfied the clarification document's §5 ("Separate risk from confidence from safety level"). 
That reassessment was made before `clarification-algorithm.md` was edited to add its explicit `estimatedRisk`/`confidence`/`remediationSafetyLevel` field list and decision-matrix pseudocode (§5), and did not survive contact with that addition: the original design had no risk-estimate field independent of the final classification at all — Stages 1–3 assigned the final three-value level directly under the name `riskLevel`, and `confidence` never changed which bucket a rule landed in, only how the assignment was explained. That is precisely the conflation the clarification document warns produces "high risk, low confidence" and "low risk, high confidence" cases that a system without the separation cannot represent (its own worked examples in §5). The revision above restores the three independent signals — renaming the low/medium/high estimate to `riskLevel` and giving the derived safe/humanreview/unsafe value its own field, `remediationSafetyLevel`, rather than the two sharing one overloaded field name — and the literal decision matrix, and also folds in the clarification document's "Recommended High Level Estimating Model Approach" steps 3–4 (infer the minimal satisfying edit; estimate whether it touches public-contract elements) and the "Recommended approach for a new ruleset" step 3 (infer likely remediation from rule mechanics — function semantics, not just rule-id/segment matching), neither of which the prior design implemented either. ## 4. Default/fallback behavior when a rule has no analysis -**Decision**: Any violation whose `ruleId` is absent from the cached `RulesetAnalysis` (e.g. ruleset changed between analysis and grading) defaults to `riskLevel: high` / `confidenceLevel: low` / `remediationSafetyLevel: unsafe` at lookup time, not just at Stage 3 of the analyser itself — this is the same triple Stage 3 produces, computed directly rather than via the decision matrix, since there is no rule metadata at all to run the matrix against. 
+**Decision**: Any violation whose `ruleId` is absent from the cached `RulesetAnalysis` (e.g. ruleset changed between analysis and grading) defaults to `riskLevel: high` / `confidenceLevel: low` / `remediationSafetyLevel: unsafe` / `assessedBy: "automated"` at lookup time, not just at Stage 2 of the analyser itself — this is the same quadruple Stage 2 produces, computed directly rather than via the decision matrix, since there is no rule metadata at all to run the matrix against. **Rationale**: Directly required by FR-009 and the spec's first Edge Case; keeps the conservative-by-default guarantee end-to-end, not just inside the analyser. -**Reassessed against `clarification-algorithm.md`**: confirmed and unchanged, and now also governs persisted/pre-calculated entries (§8): a persisted analysis covering only some rules (FR-015) is exactly the "absent from `RulesetAnalysis`" case for the rules it doesn't cover — they fall through to Stages 1–3, then to this same lookup-miss default if still unclassified. One lookup-miss path, reused for every reason a rule might be unclassified (unanalysed ruleset, stale persisted entry, or partial persisted coverage). +**Reassessed against `clarification-algorithm.md`**: confirmed and unchanged, and now also governs persisted/pre-calculated entries (§8): a persisted analysis covering only some rules (FR-015) is exactly the "absent from `RulesetAnalysis`" case for the rules it doesn't cover — they fall through to Stages 0–2, then to this same lookup-miss default if still unclassified. One lookup-miss path, reused for every reason a rule might be unclassified (unanalysed ruleset, an automated entry invalidated by a fingerprint mismatch, or partial persisted coverage). Note this does **not** apply to a human-assessed (`assessedBy: "human"`) entry whose fingerprint no longer matches — per §8, that entry is still used (with a warning), not treated as absent. ## 5. 
Internal naming cleanup (completing the Feature 11 deferral)
@@ -69,7 +71,8 @@
 | `buildQuickFix()` | `buildRemediationItem()` |
 | `buildQuickFixOutput()` | `buildRemediationSafetyOutput()` |
 | `formatQuickFixesHuman()` | `formatRemediationSafetyHuman()` |
-| Types: `QuickFix`, `QuickFixOutput`, `ViolationClass` | `RemediationItem`, `RemediationSafetyOutput`, `RemediationSafetyLevel` (3-value, the `remediationSafetyLevel` field), plus new `RiskLevel` (3-value, the `riskLevel` field — distinct field and type from `RemediationSafetyLevel`), `ConfidenceLevel`, `RuleAnalysis`, `RulesetAnalysis` |
+| Types: `QuickFix`, `QuickFixOutput`, `ViolationClass` | `RemediationItem`, `RemediationSafetyOutput`, `RemediationSafetyLevel` (3-value, the `remediationSafetyLevel` field), plus new `RiskLevel` (3-value, the `riskLevel` field — distinct field and type from `RemediationSafetyLevel`), `ConfidenceLevel`, `AssessmentOrigin` (`"human"` \| `"automated"`, the `assessedBy` field), `RuleAnalysis`, `RulesetAnalysis` |
+| `quick_fixes_algorithm_spec.md`'s hard-coded `safe`/`humanreview`-prefix rule-id tables | Removed as a distinct code-level stage; migrated into `BundledRulesetAnalysis` (§8) as `assessedBy: "human"` persisted entries, consulted via the same Stage 0 lookup used for any other ruleset |
 | `packages/api-grade-mcp/src/tools/quick-fixes-only.ts`, `registerQuickFixesOnlyTool` | `remediation-safety.ts`, `registerRemediationSafetyTool` |
 | `tests/integration/cli-quick-fixes.test.ts`, `packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts`, `packages/api-grade-core/tests/unit/quick-fixes.test.ts` | renamed to `cli-remediation-safety.test.ts`, `remediation-safety.test.ts`, `remediation-safety.test.ts` |
 
@@ -100,14 +103,19 @@
 Design:
 - **Primary mechanism — colocated Shared Ruleset Analysis (FR-016/FR-017)**: the persisted analysis for a ruleset lives next to the ruleset itself, at a location derived deterministically from the ruleset's own path/URL by a naming
convention (e.g. appending a fixed suffix to the ruleset's filename). Presence/absence is therefore a direct lookup at that derived location — no separate index or registry needs to be consulted or kept in sync. For a local ruleset, this is a sibling file in the same directory. For a GitHub-hosted ruleset, this is a sibling file at the same repo path/ref, fetched through the *same* resolution and auth flow already used to fetch the ruleset (`resolveRuleset`/`fetchRulesetContent`, reusing whatever `AuthConfig`/GitHub PAT was already supplied) — no new auth concept. Because it is colocated and (for GitHub-hosted rulesets) typically version-controlled in the same repository, anyone who can read the ruleset automatically sees the same shared analysis — satisfying the sharing goal without any per-user setup (SC-008). - **Per-rule Fingerprint, not a whole-ruleset hash**: each entry in the shared file is keyed by `ruleId` and carries a fingerprint of that rule's own content (`given`/`then.function`/`severity`/`description`). Recomputing the current ruleset's rules and comparing fingerprints per-`ruleId` means one edited rule invalidates only its own entry, not the whole shared file — important precisely because this file is meant to be shared and incrementally maintained by a team over time, not regenerated wholesale on every edit. (A whole-ruleset hash, as in the original design, would invalidate every entry the moment any one rule changed — too coarse for a file meant to accumulate team knowledge.) -- **Secondary mechanism — Personal Ruleset Analysis Override (FR-018)**: the original workspace/global config-file design (`RulesetConfig`-style storage) is retained, but repurposed: it now holds only a user's *personal* corrections, which take precedence over the shared colocated file and over automated analysis, without writing to the shared file. 
This covers the case where a user disagrees with the team's shared judgement, or can read but not write the ruleset's location (e.g. a GitHub-hosted ruleset they have only read access to). Lookup precedence: personal override (workspace, then global) → shared colocated analysis → bundled default (built-in ruleset only) → Stages 1–3. -- **Partial coverage**: both the shared file and the personal override are maps keyed by `ruleId`; only rules present in a given map short-circuit Stages 1–3 for this ruleset (FR-015). This is the same mechanism as the existing lookup-miss default (§4) — a rule not in either map is simply not a hit, and analysis proceeds normally for it. +- **`assessedBy: "human" | "automated"` on every persisted entry**: each entry in the shared file, the personal override, and `BundledRulesetAnalysis` carries who produced the stored judgement, not just what it is. A correction written via FR-013 (a user explicitly persisting their own judgement) is always `assessedBy: "human"`. An entry written by the tool itself without a human in the loop — e.g. a future caching optimization that persists an automated Stage 1/2 result purely to avoid recomputing it — would be `assessedBy: "automated"`. This distinction is what the fingerprint-mismatch handling below keys off. +- **Fingerprint mismatch on a human-assessed entry is honored, not discarded, but warned about**: per direct user feedback, a stale fingerprint on an `assessedBy: "automated"` entry is treated as today (§4) — not found, falls through to Stages 1–2. 
A stale fingerprint on an `assessedBy: "human"` entry is **still used** — a human's judgement about a rule's remediation safety does not become wrong just because the rule's `given`/`then.function`/`severity`/`description` text changed; the human may well have already accounted for the kind of change that occurred, and silently discarding their explicit correction on every minor rule edit would force them to re-confirm it indefinitely, undermining the entire point of FR-013. Instead the analyser surfaces a warning carrying both fingerprints — the one the entry was captured against and the rule's current one — so a user/CI pipeline can see that *something* about the rule changed since a human last reviewed it, without the tool unilaterally deciding the human's prior judgement no longer applies. The warning is attached to that rule's `RuleAnalysis` entry (visible via FR-011) and to the rule's `RemediationItem` for any violation it produces during grading, rather than only logged — so it survives into JSON output for programmatic consumers (e.g. a CI job that wants to fail/flag when this occurs) and not just human-readable text. +- **Secondary mechanism — Personal Ruleset Analysis Override (FR-018)**: the original workspace/global config-file design (`RulesetConfig`-style storage) is retained, but repurposed: it now holds only a user's *personal* corrections, which take precedence over the shared colocated file and over automated analysis, without writing to the shared file. This covers the case where a user disagrees with the team's shared judgement, or can read but not write the ruleset's location (e.g. a GitHub-hosted ruleset they have only read access to). Lookup precedence: personal override (workspace, then global) → shared colocated analysis → bundled default (built-in ruleset only) → Stages 1–2. 
+- **Partial coverage**: both the shared file and the personal override are maps keyed by `ruleId`; only rules present in a given map short-circuit Stages 1–2 for this ruleset (FR-015). This is the same mechanism as the existing lookup-miss default (§4) — a rule not in either map is simply not a hit, and analysis proceeds normally for it. - **Writing a correction (FR-013/FR-018/FR-019)**: for a *local* ruleset, persisting "to the shared file" is a normal local file write — low-risk and reversible, no different from any other local file edit the tool makes, and the resulting diff is something the user can review/commit/PR through their normal process. For a *GitHub-hosted* (or otherwise remote, non-writable) ruleset, the tool does not push a commit automatically (FR-019) — pushing a change to a shared, version-controlled artifact that other colleagues rely on is a different risk class than editing a local file, and should go through the user's normal review process, not a silent automated write. In that case the correction is recorded as a Personal Override locally, and the tool can additionally emit the updated shared-file content for the user to commit themselves. +- **The former curated rule-id table, migrated**: `BundledRulesetAnalysis` (the built-in ruleset's pre-calculated analysis) is no longer "Stages 1–3 run once at release time over the built-in ruleset" for its well-known rules — those well-known rules' entries are authored directly by a maintainer (the same act of judgement the old hard-coded table represented) and stored with `assessedBy: "human"`, exactly like any other persisted correction. The remaining built-in rules without a maintainer judgement are still pre-computed by Stages 1–2 at release time and stored with `assessedBy: "automated"`. Both kinds of entry live in the same file, looked up the same way — there is no longer a separate "is this rule in the curated table" code path. 
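The Stage 0 lookup described above can be sketched as follows. It covers the precedence order, per-rule fingerprints, partial coverage, and the `assessedBy` split on a fingerprint mismatch. This is a minimal illustration under stated assumptions, not the project's code:

- `RuleDefinition`, `PersistedEntry`, `fingerprintRule` and `lookupStage0` are hypothetical names.
- The fingerprint uses 32-bit FNV-1a only to keep the sketch dependency-free. A cryptographic hash such as SHA-256 would be a more usual choice.
- A stale automated entry is assumed to fall through to the next source in precedence order, rather than jumping straight to Stages 1–2.

```typescript
// Hypothetical shapes; only the field names come from the spec.
type SafetyLevel = "safe" | "humanreview" | "unsafe";

interface RuleDefinition {
  given: string;
  thenFunction: string;
  severity: string;
  description: string;
}

interface PersistedEntry {
  remediationSafetyLevel: SafetyLevel;
  assessedBy: "human" | "automated";
  fingerprint: string; // fingerprint of the rule as it was when the entry was captured
}

// One persisted source (workspace override, global override, shared colocated, or bundled), keyed by ruleId.
type PersistedSource = Map<string, PersistedEntry>;

interface FingerprintWarning {
  ruleId: string;
  storedFingerprint: string;
  currentFingerprint: string;
}

interface Stage0Hit {
  entry: PersistedEntry;
  warning?: FingerprintWarning; // present only when a stale human-assessed entry is honored
}

// Per-rule fingerprint: 32-bit FNV-1a over a fixed-order serialization of the fingerprinted fields.
function fingerprintRule(rule: RuleDefinition): string {
  const text = JSON.stringify([rule.given, rule.thenFunction, rule.severity, rule.description]);
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  // Always 8 lowercase hex characters.
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

// `sources` must be in precedence order: workspace override, global override, shared, bundled.
function lookupStage0(ruleId: string, rule: RuleDefinition, sources: PersistedSource[]): Stage0Hit | undefined {
  const currentFingerprint = fingerprintRule(rule);
  for (const source of sources) {
    const entry = source.get(ruleId);
    if (entry === undefined) continue; // partial coverage: this source has no entry for the rule
    if (entry.fingerprint === currentFingerprint) return { entry };
    if (entry.assessedBy === "human") {
      // Stale but human-assessed: still honored, with both fingerprints surfaced as a warning.
      return { entry, warning: { ruleId, storedFingerprint: entry.fingerprint, currentFingerprint } };
    }
    // Stale automated entry: treated as not found; keep looking in lower-precedence sources.
  }
  return undefined; // no usable entry: the caller runs Stages 1–2
}
```

The warning is returned alongside the entry instead of being logged. This mirrors the requirement that it survive into JSON output on both the rule's `RuleAnalysis` and any `RemediationItem` it produces.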
-**Rationale**: Both source documents (`clarification-algorithm.md` and the user's own architectural input) converge on the same underlying need — avoid re-deriving (and re-human-reviewing) the same ruleset's classifications repeatedly, for every person who uses it. Colocation is the simpler, more direct way to satisfy "the user can perform this review once and then encode the correct safety level for this rule in this ruleset" *for an entire team*, rather than once per person. It also reuses existing project capability (the ruleset resolution/fetch path already handles "give me a file or URL, with optional auth") rather than inventing a new shared-storage concept. +**Rationale**: Both source documents (`clarification-algorithm.md` and the user's own architectural input) converge on the same underlying need — avoid re-deriving (and re-human-reviewing) the same ruleset's classifications repeatedly, for every person who uses it. Colocation is the simpler, more direct way to satisfy "the user can perform this review once and then encode the correct safety level for this rule in this ruleset" *for an entire team*, rather than once per person. It also reuses existing project capability (the ruleset resolution/fetch path already handles "give me a file or URL, with optional auth") rather than inventing a new shared-storage concept. Honoring `assessedBy: "human"` entries across a fingerprint mismatch (with a warning) extends that same "review once, trust it" goal to surviving small, incidental rule edits — a maintainer rewording a rule's `description` should not silently un-confirm a human's safety judgement about that rule's `given`/`function` semantics. **Alternatives considered**: - Keying the shared file by a whole-ruleset content hash (the original decision) — rejected per the fingerprint discussion above: too coarse for a file intended to be incrementally maintained. 
- A purely per-user workspace/global store as the *only* mechanism (the original decision) — rejected: cannot satisfy the stated organisational-sharing goal; a teammate pointed at the same ruleset would never see it. - A separate index/registry file mapping ruleset identities to analysis file locations — rejected in favor of a pure naming-convention derivation: an index is one more thing that can drift out of sync with the rulesets it describes, where a deterministic derived path cannot. - Automatically committing corrections back to a GitHub-hosted ruleset's repository — rejected: writing to a shared, remote artifact without an explicit human review step is a meaningfully different risk than a local file edit, and not something this feature should do silently. +- Discarding a human-assessed entry on any fingerprint mismatch, same as an automated entry (the original design) — rejected per direct user feedback: it treats a human's explicit, persisted judgement as no more durable than a heuristic guess, defeating the purpose of letting a human confirm a classification at all. +- Silently honoring a human-assessed entry on mismatch with no warning at all — rejected: a user/CI pipeline has a legitimate interest in knowing the rule changed since a human last looked at it, even though the prior judgement is still being trusted; surfacing the two fingerprints costs little and preserves an audit trail. diff --git a/specs/012-remediation-safety/spec.md b/specs/012-remediation-safety/spec.md index 8228856..f51b4c6 100644 --- a/specs/012-remediation-safety/spec.md +++ b/specs/012-remediation-safety/spec.md @@ -66,9 +66,9 @@ A new contributor or documentation reader should encounter "remediation safety" - What happens when a custom/private ruleset is supplied that the analyser has never seen before? 
It must still produce a complete classification (risk level + confidence level + derived remediation safety level) for every rule, with confidence honestly reflecting the lack of prior knowledge, rather than failing the grading run. - What happens when risk level and confidence level disagree (e.g. `medium` risk with only `low` confidence, or `low` risk with `low` confidence)? The decision matrix (FR-003, `automated_remediation_safety_algorithm_spec.md`) resolves every such combination to `humanreview`, not `safe` — low confidence in a risk estimate is never, by itself, grounds for the most permissive classification, consistent with the project's conservative-by-default posture (FR-009). - What happens to existing consumers (CI pipelines, scripts) that depend on today's binary `safe` vs "not safe" filtering? `--remediation-safety safe` (and equivalent MCP/package usage) must continue to mean exactly what it means today; the new levels are additive, not a breaking redefinition of `safe`. -- What happens when a rule's definition changes after a colocated/shared or personal analysis entry was captured for it? That rule's Fingerprint no longer matches, so the stale entry is treated as not found for that rule alone (FR-014) and falls back to automated analysis — unaffected rules elsewhere in the same shared file remain trusted. +- What happens when a rule's definition changes after a colocated/shared or personal analysis entry was captured for it? That rule's Fingerprint no longer matches. If the entry was produced automatically (no human review), it is treated as not found for that rule alone (FR-014) and falls back to automated analysis — unaffected rules elsewhere in the same shared file remain trusted.
If that entry was assessed by a human (FR-020), it is still honored as-is (FR-021) — a human's judgement is not invalidated by a rule edit the same way an automated guess is — but a warning naming the rule and both fingerprints (old and current) is surfaced, so the discrepancy is visible rather than silent. - What happens when only some rules in a ruleset have a pre-calculated, shared, or personal entry? Every rule still gets a classification (FR-015) — covered rules use the matching entry, the rest go through automated analysis as if no persisted analysis existed for them. -- What happens on a machine/environment where no persisted analysis exists yet for a ruleset that has never been analysed before (including the very first time anyone on a team uses it)? The system performs the existing automated analysis (Stages 1–3) and proceeds normally; persistence is an optimization and trust-building mechanism, never a precondition for producing output. +- What happens on a machine/environment where no persisted analysis exists yet for a ruleset that has never been analysed before (including the very first time anyone on a team uses it)? The system performs the existing automated analysis (Stages 1–2) and proceeds normally; persistence is an optimization and trust-building mechanism, never a precondition for producing output. - What happens when a user wants to disagree with a colleague's shared classification for a rule, but doesn't have write access to the ruleset's location (e.g. it's a GitHub-hosted ruleset they can read but not push to) or doesn't want to change what the rest of the team sees? They persist a Personal Ruleset Analysis Override (FR-018) instead, which is honored for them locally without modifying the shared, colocated data. - What happens when the ruleset is GitHub-hosted and a user persists a correction intended to be shared? 
The system does not push a commit to the remote location automatically (FR-019); it can still produce the updated shared-analysis content for the user to commit themselves through their normal review process, and in the meantime the correction is honored locally as a Personal Ruleset Analysis Override. @@ -89,12 +89,14 @@ A new contributor or documentation reader should encounter "remediation safety" - **FR-011**: The ruleset analyser's per-rule results (risk level, confidence level, remediation safety level, and rationale) MUST be inspectable by users, in both JSON and human-readable form, independent of grading a specific API spec (i.e. "analyse this ruleset" is a capability in its own right, not only an internal implementation detail) — so a user can see not just the final classification but the two independent signals (FR-003) it was derived from, to judge whether they agree with it. - **FR-012**: Before running the automated analysis stages, the system MUST check for a previously computed or pre-calculated ruleset analysis for the loaded ruleset and, when found, use it directly instead of recomputing from rule metadata. At minimum, the built-in ruleset MUST ship with such a pre-calculated analysis. - **FR-013**: Users MUST be able to persist a correction to a rule's remediation safety level (and, implicitly, raise its confidence to reflect human confirmation) for a specific ruleset, such that the corrected classification is automatically loaded and used the next time that same ruleset is analysed or graded against, without requiring the correction to be re-entered. -- **FR-014**: The system MUST be able to recognize, for a given rule within a ruleset, whether a pre-calculated or persisted classification for that exact rule definition is still valid (i.e. the rule hasn't changed since the classification was captured) — a stale entry for a changed rule MUST NOT be silently reused, and MUST NOT prevent the other, unchanged rules' entries from being used. 
+- **FR-014**: The system MUST be able to recognize, for a given rule within a ruleset, whether a pre-calculated or persisted classification for that exact rule definition is still valid (i.e. the rule hasn't changed since the classification was captured). For a classification that was produced automatically (no human review), a stale entry for a changed rule MUST NOT be silently reused — it MUST be treated as not found for that rule, falling back to automated analysis, and MUST NOT prevent the other, unchanged rules' entries from being used. For a classification that a human explicitly assessed and persisted (FR-013/FR-020), staleness MUST NOT cause the entry to be discarded (FR-021). - **FR-015**: When a persisted or pre-calculated analysis only covers some of the rules in the currently loaded ruleset (e.g. the ruleset gained rules since the analysis was captured), the system MUST still produce a complete classification for every rule (FR-001/SC-005) — covered rules use the persisted/pre-calculated entry, uncovered rules fall through to automated analysis. - **FR-016**: The system MUST support storing a ruleset's persisted analysis **colocated with the ruleset itself**, using a deterministic naming convention derived from the ruleset's own location, so that (a) presence or absence of persisted data for a given ruleset can be determined by a direct lookup at that derived location rather than a separate index, and (b) the persisted data can be shared between colleagues simply by it living alongside the ruleset (e.g. committed in the same repository), rather than each person having to separately configure their own copy. - **FR-017**: The colocated lookup (FR-016) MUST work uniformly whether the ruleset is supplied as a local file path or fetched from a remote/GitHub-hosted location, reusing the same resolution and authentication mechanism already used to fetch the ruleset itself. 
- **FR-018**: In addition to the shared, colocated analysis (FR-016), a user MUST be able to persist a personal correction that does not modify the shared colocated data — for cases where they lack write access to the ruleset's location, or want to apply their own judgement locally without changing what their colleagues see. A personal correction MUST take precedence, for that user, over both the shared colocated analysis and the automated analysis stages for the rule(s) it covers. - **FR-019**: When the ruleset's location is not writable by the system directly (e.g. a GitHub-hosted ruleset), the system MUST NOT automatically write or commit a correction back to that remote location. It MAY still read any existing colocated shared analysis there (FR-017), and MAY produce the content a user would need to commit themselves to update the shared analysis. +- **FR-020**: Every persisted or pre-calculated per-rule classification (shared colocated analysis, personal override, or bundled default — FR-012/FR-016/FR-018) MUST record whether it was assessed by a human (an explicit correction persisted via FR-013, including a maintainer's judgement for a well-known built-in rule) or produced automatically with no human review. This distinction MUST be inspectable wherever per-rule results are inspectable (FR-011). There is no separate hard-coded table of "known" classifications distinct from this persisted mechanism — the built-in ruleset's pre-curated classifications are persisted entries assessed by a human, like any other. +- **FR-021**: When a rule's definition changes after a human-assessed classification (FR-020) was captured for it, the system MUST continue to honor that classification rather than treating it as stale and falling back to automated analysis (contrast FR-014's handling of an automated classification under the same circumstance). 
The system MUST also surface a warning for that rule — in both JSON and human-readable output, at both the ruleset-analysis level (FR-011) and the per-violation level (FR-008) — identifying the rule and including both the fingerprint the classification was captured against and the rule's current fingerprint, so a user can tell the rule changed since a human last reviewed it even though the prior judgement is still being trusted. ### Key Entities *(include if feature involves data)* @@ -103,7 +105,9 @@ A new contributor or documentation reader should encounter "remediation safety" - **Risk Level**: One of `low`, `medium`, `high` — the analyser's estimate, independent of its confidence, of how likely the rule's minimal satisfying remediation is to alter consumer-facing contract surface (paths/channels, parameters, request/response or message schemas, security) versus only low-impact metadata. Distinct field, distinct type, distinct values from Remediation Safety Level. - **Confidence Level**: Describes how confident the analyser is in a rule's assigned Risk Level (e.g. driven by how well-known/recognizable the rule's id or function is versus how custom/ambiguous it is, or whether a human has explicitly confirmed it). A rule can be high-risk/low-confidence (e.g. a custom function targeting `$.paths`) or low-risk/high-confidence (e.g. a `truthy` check on `$.info.description`) — these are not the same axis, and the decision matrix treats them as such. - **Remediation Safety (per violation)**: The remediation safety level applied to a specific violation found during grading, derived from the ruleset analyser's result for that violation's rule. 
-- **Rule Fingerprint**: A stable identifier for "this exact rule definition," derived from a rule's content (`given`, `then.function`, `severity`, `description`), used to detect whether a pre-calculated/persisted entry for that `ruleId` is still valid for the rule as currently defined (FR-014), independent of where the ruleset as a whole is stored or fetched from. +- **Rule Fingerprint**: A stable identifier for "this exact rule definition," derived from a rule's content (`given`, `then.function`, `severity`, `description`), used to detect whether a pre-calculated/persisted entry for that `ruleId` is still valid for the rule as currently defined (FR-014), independent of where the ruleset as a whole is stored or fetched from. A mismatch invalidates an automated entry but not a human-assessed one (FR-020/FR-021) — for the latter it instead produces a Fingerprint Mismatch Warning. +- **Assessment Origin**: Whether a per-rule classification was produced by a human (an explicit, persisted correction, including a maintainer's judgement for a well-known built-in rule) or automatically with no human review (FR-020). Governs whether a Rule Fingerprint mismatch invalidates the classification (automated) or merely produces a Fingerprint Mismatch Warning (human). +- **Fingerprint Mismatch Warning**: Produced when a human-assessed classification's stored Rule Fingerprint no longer matches the rule's current one (FR-021). Carries the rule id and both fingerprint values (the one the classification was captured against, and the rule's current one), and is surfaced wherever that rule's classification is shown — ruleset-analysis output and per-violation remediation output alike — without changing the classification itself. - **Shared Ruleset Analysis**: A ruleset analysis (in full or in part) stored colocated with the ruleset itself via a deterministic naming convention (FR-016), readable by anyone who can read the ruleset (whether local file or GitHub-hosted, FR-017). 
This is the primary mechanism for a team/organisation to share one set of remediation-safety judgements instead of each person maintaining their own. - **Personal Ruleset Analysis Override**: A user-local correction (FR-018) that takes precedence over the Shared Ruleset Analysis and the automated analysis stages for the rule(s) it covers, without modifying the shared, colocated data. @@ -116,9 +120,10 @@ A new contributor or documentation reader should encounter "remediation safety" - **SC-003**: A repository-wide search for "quick fix" (any casing/separator) returns zero matches after the feature is complete. - **SC-004**: Existing `--remediation-safety safe` users observe no behavioral change in the set of violations returned, compared to before this feature. - **SC-005**: For an arbitrary, previously-unseen custom ruleset, the analyser completes and returns a risk and confidence level for 100% of its rules (no rule left unclassified). -- **SC-006**: A user-corrected remediation safety level for a rule in a given ruleset is honored (returned without re-running automated analysis for that rule) on a subsequent, separate invocation against the same rule definition, and is no longer honored once that specific rule's definition changes. +- **SC-006**: A user-corrected (human-assessed) remediation safety level for a rule in a given ruleset is honored (returned without re-running automated analysis for that rule) on a subsequent, separate invocation against the same rule definition, **and continues to be honored** even after that specific rule's definition changes — accompanied by a visible fingerprint-mismatch warning (FR-021) rather than being discarded. An automated (non-human-assessed) entry, by contrast, is no longer honored once the rule's definition changes. 
- **SC-007**: The built-in ruleset's analysis is available without any per-rule automated computation having to run at request time (served from a pre-calculated/bundled result), for both the CLI and MCP surfaces. - **SC-008**: Two different users pointed at the same ruleset location (local path or GitHub-hosted) see identical classifications for every rule covered by that ruleset's shared, colocated analysis, without either of them having separately configured it. +- **SC-009**: For every rule whose classification came from a human-assessed entry with a fingerprint mismatch, both the rule's stored fingerprint and its current fingerprint are visible in the output, in both JSON and human-readable form, at both the ruleset-analysis and per-violation surfaces. ## Assumptions @@ -130,6 +135,7 @@ A new contributor or documentation reader should encounter "remediation safety" - This feature does not change how a custom ruleset is supplied (file path, GitHub PAT, etc.) — only how its rules are risk-classified once available. - The primary persistence mechanism (FR-016/FR-017) colocates shared analysis data with the ruleset itself via a naming convention, rather than a separate per-user store — this is a deliberate choice to make sharing across a team/organisation the default, not an opt-in synchronization step. The existing workspace/global config scope (`RulesetConfig`/`RulesetResolution`) is retained, but narrowed to the Personal Ruleset Analysis Override role (FR-018) rather than being the primary persistence layer originally assumed. - Rule Fingerprinting is computed from individual rule content (e.g. a hash of `given`/`then.function`/`severity`/`description` for that `ruleId`), not from the ruleset as a whole and not from the path/URL the ruleset was supplied with — this gives per-rule staleness detection (one changed rule doesn't invalidate an entire shared analysis file) and survives the ruleset being relocated or re-fetched from a mirror. 
+- The built-in ruleset's well-known, previously hard-coded classifications (e.g. "this rule id is typically `safe`") are not a separate code-level table — per direct user feedback, they are ordinary bundled persisted entries (FR-012/FR-020) assessed by a human (a maintainer), reusing the same persistence/fingerprinting/warning mechanism (FR-013–FR-021) as a correction any other user would persist for any other ruleset. - The exact naming convention (FR-016) and file format are implementation details for planning, not fixed by this specification, but MUST satisfy: derivable from the ruleset's own path/URL alone (no separate index/registry to consult first), and human-readable/diffable enough to be code-reviewed when shared via a pull request. - Automatic write-back to a remote/GitHub-hosted ruleset location (FR-019) is explicitly out of scope for this feature — sharing a correction to such a location is a human action (a commit/PR), which the tool may assist by producing the content but does not perform itself. - "Persist a correction" (FR-013/FR-018) refers to the data being saved for reuse; *how* a user supplies that correction (a CLI flag, an MCP tool call, hand-editing the colocated/override file) is an implementation detail for planning, not fixed by this specification. 
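The fingerprint and staleness rules in this spec (FR-014, FR-020, FR-021 and the Rule Fingerprint / Assessment Origin entities) can be sketched in TypeScript. This is a minimal, hypothetical sketch rather than the project's implementation: the `RuleDefinition` shape, hashing a fixed-order JSON array with SHA-256, and the names `computeRuleFingerprint` and `resolvePersistedEntry` are all assumptions. It shows the asymmetry the spec relies on: a fingerprint mismatch discards an automated entry but keeps a human-assessed one and attaches a warning carrying both fingerprints.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified Spectral-style rule: only the fields the spec
// names as fingerprint inputs are modelled.
interface RuleDefinition {
  given: string | string[];
  then: { function: string } | Array<{ function: string }>;
  severity?: string | number;
  description?: string;
}

type AssessmentOrigin = "human" | "automated";
type RemediationSafetyLevel = "safe" | "humanreview" | "unsafe";

// Hypothetical persisted entry: a classification plus the fingerprint it was captured against.
interface PersistedEntry {
  ruleId: string;
  remediationSafetyLevel: RemediationSafetyLevel;
  assessedBy: AssessmentOrigin;
  fingerprint: string;
}

interface FingerprintMismatchWarning {
  ruleId: string;
  storedFingerprint: string;
  currentFingerprint: string;
}

// SHA-256 over a fixed-order JSON array of the rule's content fields, so the
// value does not depend on the path or URL the ruleset was loaded from.
function computeRuleFingerprint(rule: RuleDefinition): string {
  const thens = Array.isArray(rule.then) ? rule.then : [rule.then];
  const canonical = JSON.stringify([
    rule.given,
    thens.map((t) => t.function),
    rule.severity ?? null,
    rule.description ?? null,
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}

type Resolution =
  | { kind: "use"; entry: PersistedEntry; warning?: FingerprintMismatchWarning }
  | { kind: "fallthrough" }; // caller runs automated analysis (Stages 1-2)

function resolvePersistedEntry(
  entry: PersistedEntry | undefined,
  rule: RuleDefinition,
): Resolution {
  if (entry === undefined) return { kind: "fallthrough" };
  const current = computeRuleFingerprint(rule);
  if (entry.fingerprint === current) return { kind: "use", entry };
  // FR-014: a stale automated entry is treated as not found.
  if (entry.assessedBy === "automated") return { kind: "fallthrough" };
  // FR-021: a stale human entry is still honored, with the mismatch surfaced.
  return {
    kind: "use",
    entry,
    warning: {
      ruleId: entry.ruleId,
      storedFingerprint: entry.fingerprint,
      currentFingerprint: current,
    },
  };
}
```

Hashing content fields rather than the ruleset location is what lets the same entry survive a relocated or mirrored ruleset, as the Assumptions section requires.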
From 7938ed1dc613e67248f25219f346f2bd2ce876bd Mon Sep 17 00:00:00 2001 From: DawMatt Date: Wed, 24 Jun 2026 22:14:46 +1000 Subject: [PATCH 07/22] Safety assessment planned and tasked --- CLAUDE.md | 2 +- specs/012-remediation-safety/tasks.md | 191 ++++++++++++++++++++++++++ 2 files changed, 192 insertions(+), 1 deletion(-) create mode 100644 specs/012-remediation-safety/tasks.md diff --git a/CLAUDE.md b/CLAUDE.md index 67e91d7..c2087e1 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,5 +1,5 @@ For additional context about technologies to be used, project structure, shell commands, and other important information, read the current plan -at specs/011-remediation-safety-rename/plan.md +at specs/012-remediation-safety/plan.md diff --git a/specs/012-remediation-safety/tasks.md b/specs/012-remediation-safety/tasks.md new file mode 100644 index 0000000..6322d74 --- /dev/null +++ b/specs/012-remediation-safety/tasks.md @@ -0,0 +1,191 @@ +--- + +description: "Task list for Feature 12: Remediation Safety (Ruleset Analyser & Multi-Level Safety)" +--- + +# Tasks: Remediation Safety (Ruleset Analyser & Multi-Level Safety) + +**Input**: Design documents from `/specs/012-remediation-safety/` + +**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/remediation-safety-surfaces.md, quickstart.md, `specs/algorithms/automated_remediation_safety_algorithm_spec.md` + +**Tests**: Included — Constitution Principle IV (Test-Driven Quality) and plan.md's Technical Context both mandate test coverage written alongside this feature's implementation. + +**Organization**: Tasks are grouped by user story (spec.md priorities P1/P2/P3) to enable independent implementation and testing of each story. + +## Format: `[ID] [P?] 
[Story] Description` + +- **[P]**: Can run in parallel (different files, no dependencies) +- **[Story]**: Which user story this task belongs to (US1/US2/US3) +- Exact file paths are given in every description + +--- + +## Phase 1: Setup + +No new project scaffolding is required — this feature extends the existing `packages/api-grade-core`, `packages/api-grade-mcp`, and `src/cli` workspaces. Setup work is folded into Phase 2 (the rename/replacement of `quick-fixes.ts` is itself the foundational setup for this feature). + +--- + +## Phase 2: Foundational (Blocking Prerequisites) + +**Purpose**: The ruleset analyser engine (Stages 1–2 of `automated_remediation_safety_algorithm_spec.md`) and its supporting types. Both User Story 1 (filtering) and User Story 2 (inspection) call the same `analyseRuleset()`/`getRemediationSafety()` functions, so this must exist before either story's surfaces can be built. + +**⚠️ CRITICAL**: No user story work can begin until this phase is complete. + +- [ ] T001 Replace `ViolationClass`/`QuickFix`/`QuickFixOutput` with the new type set in `packages/api-grade-core/src/types.ts`: add `RemediationSafetyLevel` (`"safe"|"humanreview"|"unsafe"`), `RiskLevel` (`"low"|"medium"|"high"`), `ConfidenceLevel` (`"high"|"medium"|"low"`), `AssessmentOrigin` (`"human"|"automated"`), `AnalysisSource` (`"persisted"|"bundled-default"|"heuristic"|"fallback"`), `RuleAnalysis`, `RulesetAnalysis`, `RemediationItem` (was `QuickFix`, with new `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` fields), `RemediationSafetyOutput` (was `QuickFixOutput`, with `remediationItemCount`/`remediationItems`/`requestedLevel`) — per data-model.md +- [ ] T002 [P] Write failing unit tests for `analyseRuleset()` Stage 1/2 heuristics in `packages/api-grade-core/tests/unit/remediation-safety.test.ts` (new file, replaces `quick-fixes.test.ts`): key-selector check (1a), additive/rename/custom function-mechanics classification (1b), generic segment 
fallback (1c), Stage 2 whole-document fallback, and the decision matrix table from research.md §3 — depends on T001 +- [ ] T003 Implement `analyseRuleset(loadedRuleset: LoadedRuleset): RulesetAnalysis` in `packages/api-grade-core/src/remediation-safety.ts` (new file) implementing Stages 1–2 of `specs/algorithms/automated_remediation_safety_algorithm_spec.md` (key-selector check, function-mechanics classification with extended AsyncAPI segment tiers, generic segment fallback, whole-document fallback, and the shared decision matrix) so T002 passes — depends on T002 +- [ ] T004 Implement `getRemediationSafety(diagnostic: Diagnostic, rulesetAnalysis: RulesetAnalysis): { riskLevel, confidenceLevel, remediationSafetyLevel, staleFingerprintWarning }` in `packages/api-grade-core/src/remediation-safety.ts`, including the FR-009 lookup-miss default (`riskLevel: "high"`, `confidenceLevel: "low"`, `remediationSafetyLevel: "unsafe"`) — depends on T003 +- [ ] T005 Implement `buildRemediationItem()`, `buildRemediationSafetyOutput()`, `formatRemediationSafetyHuman()` in `packages/api-grade-core/src/remediation-safety.ts`, replacing `buildQuickFix()`/`buildQuickFixOutput()`/`formatQuickFixesHuman()` — filters by `remediationSafetyLevel` against a requested level, preserving FR-007 (`safe` membership unchanged) — depends on T004 +- [ ] T006 Delete `packages/api-grade-core/src/quick-fixes.ts` and `packages/api-grade-core/tests/unit/quick-fixes.test.ts`; update `packages/api-grade-core/src/index.ts` to remove the `quick-fixes.js` export line and the `QuickFix`/`ViolationClass`/`QuickFixOutput` type exports, replacing them with `analyseRuleset`, `getRemediationSafety`, `buildRemediationItem`, `buildRemediationSafetyOutput`, `formatRemediationSafetyHuman` and the new types from T001 — depends on T005 + +**Checkpoint**: `analyseRuleset()`/`getRemediationSafety()` exist, are unit-tested, and are exported from `@dawmatt/api-grade-core`. User story work can now begin. 
+ +--- + +## Phase 3: User Story 1 - Developer sees a risk-graded remediation plan (Priority: P1) 🎯 MVP + +**Goal**: `--remediation-safety` (CLI) and the `grade-api-remediation-safety` MCP tool accept and filter on all three levels (`safe`/`humanreview`/`unsafe`), with `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` visible on every returned item, in both JSON and human output. + +**Independent Test**: Grade a sample spec with violations spanning all three categories; confirm that `--remediation-safety safe`, `humanreview`, and `unsafe` each return the expected, correctly labeled subset, and that the set of violations returned for `safe` is unchanged from pre-feature behavior. + +### Tests for User Story 1 + +- [ ] T007 [P] [US1] Integration test for the extended `--remediation-safety` CLI flag (accepts `safe`/`humanreview`/`unsafe`, rejects other values with the 3-value error message, `safe` membership unchanged) in `tests/integration/cli-remediation-safety.test.ts` (new file, replaces `tests/integration/cli-quick-fixes.test.ts`) +- [ ] T008 [P] [US1] Integration test for the `grade-api-remediation-safety` MCP tool's extended `level` enum and `RemediationSafetyOutput` response shape (`remediationItemCount`, `remediationItems`, `requestedLevel`, per-item `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning`) in `packages/api-grade-mcp/tests/integration/remediation-safety.test.ts` (new file, replaces `packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts`) + +### Implementation for User Story 1 + +- [ ] T009 [US1] Update `src/cli/index.ts`: extend the `--remediation-safety ` option to accept `safe|humanreview|unsafe`, update the rejection error message to `Error: --remediation-safety must be one of: safe, humanreview, unsafe.`, analyse the loaded ruleset via `analyseRuleset()`, and call `buildRemediationSafetyOutput()`/`formatRemediationSafetyHuman()` in place of the removed `buildQuickFixOutput()`/`formatQuickFixesHuman()` (lines ~14-15, ~80, ~116,
~183-184) — depends on T006, makes T007 pass +- [ ] T010 [US1] Rename `packages/api-grade-mcp/src/tools/quick-fixes-only.ts` to `packages/api-grade-mcp/src/tools/remediation-safety.ts`: rename `registerQuickFixesOnlyTool` to `registerRemediationSafetyTool`, extend the `level` Zod enum to `['safe', 'humanreview', 'unsafe']`, call `analyseRuleset()` + `buildRemediationSafetyOutput()` instead of `buildQuickFixOutput()`, and update the tool description per `contracts/remediation-safety-surfaces.md` (mention all three levels and the confidence indicator) — depends on T006, makes T008 pass +- [ ] T011 [US1] Update `packages/api-grade-mcp/src/server.ts`: replace the `registerQuickFixesOnlyTool` import/registration (lines 8, 30) with `registerRemediationSafetyTool` from `./tools/remediation-safety.js` — depends on T010 +- [ ] T012 [US1] Update `packages/api-grade-mcp/src/utils/classify.ts`: replace the `classifyViolation`/`buildQuickFix`/`QuickFix`/`ViolationClass` re-exports with `analyseRuleset`/`getRemediationSafety` and the `RuleAnalysis`/`RemediationSafetyLevel`/`RiskLevel`/`ConfidenceLevel` type re-exports from `@dawmatt/api-grade-core` — depends on T006 + +**Checkpoint**: `--remediation-safety`/`grade-api-remediation-safety` fully support all three levels end-to-end (CLI + MCP, JSON + human), with `safe` behavior unchanged (FR-007, SC-001, SC-004). + +--- + +## Phase 4: User Story 2 - Ruleset maintainer trusts the analyser via confidence + persistence (Priority: P2) + +**Goal**: The analyser's per-rule output (risk, confidence, remediation safety, rationale, `assessedBy`) is inspectable independent of grading (`ruleset-analysis` CLI subcommand, `analyse-ruleset-safety` MCP tool); classifications can be persisted (bundled default, shared colocated, personal override) and reloaded automatically, with fingerprint-staleness handling that honors human-assessed entries across rule edits. 
+ +**Independent Test**: Run the ruleset analyser against the built-in ruleset and a custom ruleset with unrecognizable rules; confirm every rule gets risk/confidence/safety/rationale, low-confidence on unrecognized rules, and that persisting a correction is honored — including after the rule's definition changes, with a visible fingerprint-mismatch warning. + +### Implementation for User Story 2 + +- [ ] T013 [P] [US2] Implement `RuleFingerprint` computation (hash over a rule's `ruleId`/`given`/`then.function`/`severity`/`description`) in `packages/api-grade-core/src/remediation-safety.ts` per data-model.md `RuleFingerprint` — depends on T006 +- [ ] T014 [P] [US2] Implement colocated `SharedRulesetAnalysis` read/write for local rulesets (deterministic filename derived from the ruleset's own path, e.g. sibling file with a fixed suffix) in `packages/api-grade-core/src/config/shared-ruleset-analysis.ts` (new file) per data-model.md `SharedRulesetAnalysis` — depends on T013 +- [ ] T015 [US2] Extend `packages/api-grade-core/src/config/shared-ruleset-analysis.ts` to read (never write) the colocated `SharedRulesetAnalysis` for a GitHub-hosted ruleset by reusing `resolveRuleset`/`fetchRulesetContent` (`packages/api-grade-core/src/config/resolve-ruleset.ts`, `packages/api-grade-core/src/auth/github.ts`) with the same `AuthConfig` already supplied for the ruleset itself (FR-017, FR-019) — depends on T014 +- [ ] T016 [P] [US2] Implement `PersonalRulesetAnalysisOverride` storage (workspace/global scope, same precedence as `RulesetConfig`) in `packages/api-grade-core/src/config/personal-ruleset-override.ts` (new file), reusing the `loadConfig`/`saveConfig`/`getWorkspaceConfigPath`/`getGlobalConfigPath` pattern from `packages/api-grade-core/src/config/ruleset-config.ts` — depends on T013 +- [ ] T017 [US2] Author `BundledRulesetAnalysis` for the built-in OpenAPI and AsyncAPI rulesets in `packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json` and `.../asyncapi.json` 
(new files): migrate the former `RULE_ID_NON_BREAKING_PREFIXES`-style curated mappings (e.g. `operation-description` → `safe`, `operation-operationId` → `humanreview`) into `assessedBy: "human"` entries with maintainer-authored rationale, per research.md §3/§8 and FR-012/FR-020 — depends on T013 +- [ ] T018 [US2] Wire Stage 0 lookup precedence into `analyseRuleset()` in `packages/api-grade-core/src/remediation-safety.ts`: workspace `PersonalRulesetAnalysisOverride` → global `PersonalRulesetAnalysisOverride` → colocated `SharedRulesetAnalysis` → `BundledRulesetAnalysis` (built-in ruleset only) → fall through to Stages 1–2; implement fingerprint-staleness handling (an `assessedBy: "automated"` entry with a stale fingerprint is treated as not found and falls through; an `assessedBy: "human"` entry with a stale fingerprint is still used, with `staleFingerprintWarning` populated) — depends on T014, T015, T016, T017 +- [ ] T019 [US2] Implement persisting a correction (FR-013/FR-018/FR-019) — a function that writes an `assessedBy: "human"`, `confidenceLevel: "high"` entry to the colocated `SharedRulesetAnalysis` for a writable/local ruleset, or to the workspace-scoped `PersonalRulesetAnalysisOverride` when the ruleset's location is not writable (e.g. GitHub-hosted) — in `packages/api-grade-core/src/remediation-safety.ts` — depends on T018 +- [ ] T020 [P] [US2] Update `packages/api-grade-core/src/index.ts` to export the new Stage 0/persistence symbols and types from T013-T019 (`SharedRulesetAnalysis`, `PersonalRulesetAnalysisOverride`, `BundledRulesetAnalysis`, `RuleFingerprint`, `AssessmentOrigin`, `AnalysisSource`, and the persist-correction function) — depends on T019 +- [ ] T021 [P] [US2] Unit tests for Stage 0 precedence, fingerprint staleness (automated-discarded vs. 
human-honored-with-warning), and persisting a correction in `packages/api-grade-core/tests/unit/remediation-safety.test.ts` (extends T002's file) covering SC-006, SC-008, SC-009 — depends on T020 +- [ ] T022 [P] [US2] New CLI subcommand `ruleset-analysis [--ruleset-path ] [--format json|human]` in `src/cli/ruleset-analysis-cli.ts` (new file, mirrors `src/cli/ruleset-config-cli.ts`), registered in `src/cli/index.ts`; `--format human` prints rule id, risk level, confidence level, remediation safety level, assessed by, rationale, and any fingerprint-mismatch warning per quickstart.md §2 — depends on T020 +- [ ] T023 [P] [US2] Add a `correct` action to `src/cli/ruleset-analysis-cli.ts` for persisting a correction (FR-013), e.g. `api-grade ruleset-analysis correct --rule-id --level [--ruleset-path ]`, calling the persist function from T019 — depends on T019, T022 +- [ ] T024 [P] [US2] New MCP tool `analyse-ruleset-safety` in `packages/api-grade-mcp/src/tools/analyse-ruleset-safety.ts` (new file, mirrors `packages/api-grade-mcp/src/tools/get-ruleset-config.ts`), input `{ rulesetPath?: string, recoveryOption?: ... 
}`, output a `RulesetAnalysis` JSON document, reusing the `resolveRuleset`/`RulesetAuthError`/`mcpError` flow already used by `grade-api-remediation-safety`; register it in `packages/api-grade-mcp/src/server.ts` — depends on T020 +- [ ] T025 [P] [US2] Integration test for the `ruleset-analysis` CLI subcommand (human + json format, fingerprint-mismatch warning display, `correct` action) in `tests/integration/cli-remediation-safety.test.ts` — depends on T022, T023 +- [ ] T026 [P] [US2] Integration test for the `analyse-ruleset-safety` MCP tool in `packages/api-grade-mcp/tests/integration/analyse-ruleset-safety.test.ts` (new file) — depends on T024 +- [ ] T027 [US2] Verify `staleFingerprintWarning` is threaded through `RemediationItem`/`RemediationSafetyOutput` (built in T004/T005) into the CLI (`--remediation-safety`, T009) and MCP (`grade-api-remediation-safety`, T010) human + JSON output, satisfying FR-021/SC-009 at the per-violation surface, not just the ruleset-analysis surface — depends on T018, T009, T010 + +**Checkpoint**: `ruleset-analysis`/`analyse-ruleset-safety` expose full per-rule analysis with confidence and provenance; persisted corrections (shared, personal, bundled) are loaded automatically and survive rule edits when human-assessed (FR-011 through FR-021, SC-002, SC-005 through SC-009). + +--- + +## Phase 5: User Story 3 - "Quick fixes" terminology fully removed (Priority: P3) + +**Goal**: No source, test, type/function/tool name, package metadata, or current documentation references "quick fix" in any casing/separator style (historical `CHANGELOG.md`/`GOAL.md` entries excluded). + +**Independent Test**: `grep -rniE "quick.?fix"` across the repository (excluding historical changelog/goal entries) returns zero matches. 
+ +### Implementation for User Story 3 + +- [ ] T028 [P] [US3] Update `docs/cli/commands.md`: document the 3-level `--remediation-safety` reference and the new `ruleset-analysis` subcommand +- [ ] T029 [P] [US3] Update `docs/mcp/quick-start.md`: document the renamed/extended `grade-api-remediation-safety` tool and the new `analyse-ruleset-safety` tool +- [ ] T030 [P] [US3] Update `docs/package/api-grade-mcp.md`: tool reference updates for both tools above +- [ ] T031 [P] [US3] Update `docs/package/README.md`: remove remaining "quick fix" mentions +- [ ] T032 [P] [US3] Update `docs/package/api-reference.md`: document the new core API (`analyseRuleset`, `getRemediationSafety`, `RuleAnalysis`, `RulesetAnalysis`, `RemediationItem`, `RemediationSafetyOutput`, etc.) in place of the removed `QuickFix`/`QuickFixOutput`/`classifyViolation` +- [ ] T033 [P] [US3] Update `docs/index.md`: remove remaining "quick fix" mentions +- [ ] T034 [P] [US3] Update `docs/getting-started.md`: update the tool list mention +- [ ] T035 [P] [US3] Update `packages/api-grade-mcp/README.md`: tool table update (`grade-api-remediation-safety`, `analyse-ruleset-safety`) +- [ ] T036 [P] [US3] Update `CONTRIBUTING.md`: correct the package/tool table entry that still names the pre-Feature-11 tool +- [ ] T037 [US3] Run `grep -rniE "quick.?fix" --include="*.ts" --include="*.md" src/ packages/api-grade-core/src packages/api-grade-mcp/src packages/api-grade-core/tests packages/api-grade-mcp/tests tests/ docs/ packages/api-grade-mcp/README.md CONTRIBUTING.md` (per quickstart.md §4) and fix any remaining matches until it returns zero (SC-003) — depends on T009-T036 + +**Checkpoint**: SC-003 satisfied — zero "quick fix" references remain anywhere in current source, tests, or documentation. + +--- + +## Phase 6: Polish & Cross-Cutting Concerns + +**Purpose**: Final validation across all three stories. 
+ +- [ ] T038 [P] Add a new `CHANGELOG.md` entry for this feature (do not modify historical entries) +- [ ] T039 Run `vitest run` across all workspaces, `tsc --noEmit`, and lint; fix any failures +- [ ] T040 Manually walk through `quickstart.md` end-to-end (all 4 sections) against a real local ruleset and a GitHub-hosted ruleset to confirm SC-001 through SC-009 + +--- + +## Dependencies & Execution Order + +### Phase Dependencies + +- **Foundational (Phase 2)**: No dependencies — start immediately. BLOCKS all user stories. +- **User Story 1 (Phase 3)**: Depends on Foundational completion only. +- **User Story 2 (Phase 4)**: Depends on Foundational completion and is independently testable from US1. The one exception is T027, which threads `staleFingerprintWarning` through US1's CLI/MCP surfaces and therefore also depends on T009/T010. +- **User Story 3 (Phase 5)**: Depends on US1 and US2 implementation tasks being complete (T009–T027) so the docs/grep sweep has nothing left to rename. +- **Polish (Phase 6)**: Depends on all prior phases. + +### Within Each Phase + +- Tests before the implementation tasks they validate (T002 before T003; T007/T008 before T009-T012). T021 is the exception: it is written test-after, because it covers the integration of T013-T020. +- Types (T001) before engine (T003) before lookup (T004) before output builders (T005) before deletion/export cleanup (T006). + +### Parallel Opportunities + +- T002 has no code dependency on other Phase 2 tasks besides T001, but is sequenced before T003 (TDD). +- Within US1: T007 and T008 in parallel; T009, T010, and T012 each depend only on T006 and can run in parallel; T011 must wait for T010. +- Within US2: T013 first; T014, T016, and T017 each depend only on T013 and can then run in parallel; T014→T015 is sequential; T018 waits for T014-T017; T020-T026 carry [P] markers where noted above. +- All of US3 (T028-T036) can run in parallel, since each task touches a different file; T037 must run last.
+ +--- + +## Parallel Example: Foundational Phase + +```bash +# T002 depends on T001 only: +Task: "Write failing unit tests for analyseRuleset() in packages/api-grade-core/tests/unit/remediation-safety.test.ts" +``` + +## Parallel Example: User Story 3 + +```bash +# All documentation files are independent — launch together: +Task: "Update docs/cli/commands.md" +Task: "Update docs/mcp/quick-start.md" +Task: "Update docs/package/api-grade-mcp.md" +Task: "Update docs/package/README.md" +Task: "Update docs/package/api-reference.md" +Task: "Update docs/index.md" +Task: "Update docs/getting-started.md" +Task: "Update packages/api-grade-mcp/README.md" +Task: "Update CONTRIBUTING.md" +``` + +--- + +## Implementation Strategy + +### MVP First (User Story 1 Only) + +1. Complete Phase 2: Foundational (analyser engine, types, rename). +2. Complete Phase 3: User Story 1 — three-level filtering in CLI + MCP. +3. **STOP and VALIDATE**: Run `tests/integration/cli-remediation-safety.test.ts` and `packages/api-grade-mcp/tests/integration/remediation-safety.test.ts`; confirm that the set of diagnostics returned for `safe` is unchanged from pre-feature behavior (SC-004). +4. This is a usable, demoable increment: developers can already triage by all three levels, even before confidence/persistence/inspection (US2) or the terminology cleanup (US3) land. + +### Incremental Delivery + +1. Foundational → US1 (MVP, three-level filtering) → US2 (confidence, inspection, persistence) → US3 (terminology cleanup) → Polish. +2. US2 can be developed in parallel with US1 by a second contributor once Foundational is done, since both consume but don't modify each other's surfaces — except T027, which touches US1's files and must land after both T009/T010 (US1) and T018 (US2) exist. +3. US3 is intentionally last: it depends on the renamed/new surfaces from US1/US2 actually existing before the documentation and grep sweep can be final.
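The three-level filtering delivered by User Story 1 comes down to two steps: look up each violation's rule, then keep the violations at the requested level. Below is a minimal TypeScript sketch of that idea under stated assumptions. It is not the package's implementation.

- `RuleSafety`, `Violation`, `lookupSafety`, and `filterByLevel` are hypothetical, simplified stand-ins for the `RuleAnalysis`, `RemediationItem`, and `getRemediationSafety()` surfaces documented later in this series.
- The rule classifications in the example are illustrative only.
- A rule the analysis does not cover falls back to the documented conservative default: `high` risk, `low` confidence, `unsafe`.

```typescript
// Hypothetical, simplified stand-ins for the types this series adds to api-grade-core.
type RiskLevel = 'low' | 'medium' | 'high';
type ConfidenceLevel = 'high' | 'medium' | 'low';
type RemediationSafetyLevel = 'safe' | 'humanreview' | 'unsafe';

interface RuleSafety {
  riskLevel: RiskLevel | null;
  confidenceLevel: ConfidenceLevel;
  remediationSafetyLevel: RemediationSafetyLevel;
}

interface Violation {
  ruleId: string;
  message: string;
}

type ClassifiedViolation = Violation & RuleSafety;

// Conservative default for a rule the analysis does not cover
// (mirrors the fallback documented for getRemediationSafety()).
const UNCOVERED_RULE: RuleSafety = {
  riskLevel: 'high',
  confidenceLevel: 'low',
  remediationSafetyLevel: 'unsafe',
};

// Per-rule lookup: unknown rule ids never silently become "safe".
function lookupSafety(ruleId: string, analysis: ReadonlyMap<string, RuleSafety>): RuleSafety {
  return analysis.get(ruleId) ?? UNCOVERED_RULE;
}

// Attach each violation's classification, then keep only the requested level.
function filterByLevel(
  violations: readonly Violation[],
  analysis: ReadonlyMap<string, RuleSafety>,
  level: RemediationSafetyLevel,
): ClassifiedViolation[] {
  return violations
    .map((v) => ({ ...v, ...lookupSafety(v.ruleId, analysis) }))
    .filter((item) => item.remediationSafetyLevel === level);
}

// Illustrative analysis: two classified rules; any other rule id is uncovered.
const analysis = new Map<string, RuleSafety>([
  ['info-contact', { riskLevel: 'low', confidenceLevel: 'high', remediationSafetyLevel: 'safe' }],
  ['operation-operationId', { riskLevel: 'medium', confidenceLevel: 'high', remediationSafetyLevel: 'humanreview' }],
]);

const violations: Violation[] = [
  { ruleId: 'info-contact', message: 'Info object must have "contact" object.' },
  { ruleId: 'operation-operationId', message: 'Operation must have "operationId".' },
  { ruleId: 'some-custom-rule', message: 'Custom rule violated.' },
];

for (const level of ['safe', 'humanreview', 'unsafe'] as const) {
  console.log(level, filterByLevel(violations, analysis, level).map((v) => v.ruleId));
}
```

Running the sketch prints each level with the rule ids that match it. `some-custom-rule` is absent from the analysis, so it is reported under `unsafe` rather than being dropped.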
From 801f05067c45b3569ffff987a2aec7b542fdbe99 Mon Sep 17 00:00:00 2001 From: DawMatt Date: Wed, 24 Jun 2026 23:01:27 +1000 Subject: [PATCH 08/22] Safety assessment initial implementation --- .../src/tools/{quick-fixes-only.ts => remediation-safety.ts} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename packages/api-grade-mcp/src/tools/{quick-fixes-only.ts => remediation-safety.ts} (100%) diff --git a/packages/api-grade-mcp/src/tools/quick-fixes-only.ts b/packages/api-grade-mcp/src/tools/remediation-safety.ts similarity index 100% rename from packages/api-grade-mcp/src/tools/quick-fixes-only.ts rename to packages/api-grade-mcp/src/tools/remediation-safety.ts From b8fd101d2fd3dd886a7252bba33b91178c4134d6 Mon Sep 17 00:00:00 2001 From: DawMatt Date: Wed, 24 Jun 2026 23:02:09 +1000 Subject: [PATCH 09/22] Safety assessment initial implementation --- CHANGELOG.md | 23 + CONTRIBUTING.md | 2 +- docs/cli/commands.md | 80 ++- docs/getting-started.md | 2 +- docs/index.md | 4 +- docs/mcp/quick-start.md | 5 +- docs/package/README.md | 2 +- docs/package/api-grade-mcp.md | 14 +- docs/package/api-reference.md | 78 ++- eslint.config.mjs | 2 +- packages/api-grade-core/package.json | 1 + .../scripts/copy-bundled-analysis.mjs | 10 + .../scripts/generate-bundled-analysis.mjs | 108 ++++ .../src/config/personal-ruleset-override.ts | 46 ++ .../src/config/shared-ruleset-analysis.ts | 56 ++ packages/api-grade-core/src/index.ts | 51 +- packages/api-grade-core/src/quick-fixes.ts | 169 ----- .../api-grade-core/src/remediation-safety.ts | 582 ++++++++++++++++++ .../src/rulesets/bundled-analysis.ts | 31 + .../rulesets/bundled-analysis/asyncapi.json | 169 +++++ .../rulesets/bundled-analysis/openapi.json | 169 +++++ packages/api-grade-core/src/types.ts | 64 +- .../tests/unit/quick-fixes.test.ts | 114 ---- .../tests/unit/remediation-safety.test.ts | 358 +++++++++++ packages/api-grade-mcp/README.md | 5 +- packages/api-grade-mcp/src/server.ts | 6 +- .../src/tools/analyse-ruleset-safety.ts | 
106 ++++ .../src/tools/remediation-safety.ts | 21 +- packages/api-grade-mcp/src/utils/classify.ts | 4 +- .../analyse-ruleset-safety.test.ts | 38 ++ ...nly.test.ts => remediation-safety.test.ts} | 31 +- .../api-grade-mcp/tests/unit/classify.test.ts | 70 +-- specs/012-remediation-safety/quickstart.md | 6 +- specs/012-remediation-safety/tasks.md | 80 +-- src/cli/index.ts | 29 +- src/cli/ruleset-analysis-cli.ts | 121 ++++ tests/integration/cli-quick-fixes.test.ts | 100 --- .../cli-remediation-safety.test.ts | 188 ++++++ tests/unit/ruleset-analysis-cli.test.ts | 100 +++ 39 files changed, 2495 insertions(+), 550 deletions(-) create mode 100644 packages/api-grade-core/scripts/copy-bundled-analysis.mjs create mode 100644 packages/api-grade-core/scripts/generate-bundled-analysis.mjs create mode 100644 packages/api-grade-core/src/config/personal-ruleset-override.ts create mode 100644 packages/api-grade-core/src/config/shared-ruleset-analysis.ts delete mode 100644 packages/api-grade-core/src/quick-fixes.ts create mode 100644 packages/api-grade-core/src/remediation-safety.ts create mode 100644 packages/api-grade-core/src/rulesets/bundled-analysis.ts create mode 100644 packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json create mode 100644 packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json delete mode 100644 packages/api-grade-core/tests/unit/quick-fixes.test.ts create mode 100644 packages/api-grade-core/tests/unit/remediation-safety.test.ts create mode 100644 packages/api-grade-mcp/src/tools/analyse-ruleset-safety.ts create mode 100644 packages/api-grade-mcp/tests/integration/analyse-ruleset-safety.test.ts rename packages/api-grade-mcp/tests/integration/{quick-fixes-only.test.ts => remediation-safety.test.ts} (77%) create mode 100644 src/cli/ruleset-analysis-cli.ts delete mode 100644 tests/integration/cli-quick-fixes.test.ts create mode 100644 tests/integration/cli-remediation-safety.test.ts create mode 100644 
tests/unit/ruleset-analysis-cli.test.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index 08cf521..d80b68e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added + +- A ruleset analyser (`analyseRuleset()`) that assigns every rule in a loaded + ruleset a risk level (`low`/`medium`/`high`), a confidence level + (`high`/`medium`/`low`), and a derived remediation safety level + (`safe`/`humanreview`/`unsafe`), with provenance and a human-readable + rationale. See + [automated_remediation_safety_algorithm_spec.md](specs/algorithms/automated_remediation_safety_algorithm_spec.md). +- `--remediation-safety ` (CLI) and the `grade-api-remediation-safety` + MCP tool's `level` parameter now accept all three levels — `safe`, + `humanreview`, and `unsafe` — instead of only `safe`. Every returned item now + also carries `riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, and + `staleFingerprintWarning`. `safe` membership is unchanged from prior + behavior. +- A new CLI subcommand, `ruleset-analysis [--ruleset-path ] [--format + json|human]`, and a new MCP tool, `analyse-ruleset-safety`, expose the + analyser's output independent of grading any specific spec. +- `ruleset-analysis correct --rule-id --level ` persists a + human-confirmed correction for one rule, colocated with the ruleset (or as a + personal override when the ruleset's location isn't locally writable), and + reloaded automatically on future runs against the same ruleset — including by + teammates pointed at the same shared ruleset. 
+ ### Changed - **Breaking**: the CLI's `--quick-fixes-only` flag is renamed to diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 11132b3..dc13d7c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -68,7 +68,7 @@ tests/ |---------|------|-------------| | `@dawmatt/api-grade` | `/` (root) | CLI tool (`api-grade` binary) | | `@dawmatt/api-grade-core` | `packages/api-grade-core/` | Standalone grading library used by all other packages | -| `@dawmatt/api-grade-mcp` | `packages/api-grade-mcp/` | MCP server exposing six AI tools (`grade-api`, `grade-api-detailed`, `assert-api-grade`, `grade-api-quick-fixes-only`, `set-ruleset-config`, `get-ruleset-config`) | +| `@dawmatt/api-grade-mcp` | `packages/api-grade-mcp/` | MCP server exposing seven AI tools (`grade-api`, `grade-api-detailed`, `assert-api-grade`, `grade-api-remediation-safety`, `analyse-ruleset-safety`, `set-ruleset-config`, `get-ruleset-config`) | | `@dawmatt/backstage-plugin-api-grade` | `packages/backstage-plugin-api-grade/` | Backstage frontend card plugin | | `@dawmatt/backstage-plugin-api-grade-backend` | `packages/backstage-plugin-api-grade-backend/` | Backstage backend grading plugin | diff --git a/docs/cli/commands.md b/docs/cli/commands.md index 137c13e..f4038b9 100644 --- a/docs/cli/commands.md +++ b/docs/cli/commands.md @@ -26,7 +26,7 @@ api-grade [options] | `--token ` | GitHub Personal Access Token used to authenticate a remote ruleset fetch (only consulted when `--auth-type github-pat`) | | `--format ` | Output format: `human` (default) or `json` | | `--top ` | Show only the top N diagnostics (useful for large specs) | -| `--remediation-safety ` | Filter diagnostics to the given remediation safety level (currently: `safe`) | +| `--remediation-safety ` | Filter diagnostics to the given remediation safety level: `safe`, `humanreview`, or `unsafe` | | `--verbose` | Print the full error stack when a runtime error occurs | | `-V, --version` | Print the version number | | `-h, --help` | Show usage 
information | @@ -365,16 +365,24 @@ parser works for both: ## Remediation Safety (`--remediation-safety `) -`--remediation-safety safe` filters diagnostics down to the non-breaking, -safely-automatable subset — the same classification used by the MCP server's -`grade-api-remediation-safety` tool. It is a *filter*, independent of `--format`, so it -works with either output format. Only `safe` is accepted today; any other value is -rejected with a non-zero exit code. +`--remediation-safety ` filters diagnostics down to one of three remediation-safety +levels — the same classification used by the MCP server's `grade-api-remediation-safety` +tool — and is computed by the ruleset analyser (see `ruleset-analysis` below). It is a +*filter*, independent of `--format`, so it works with either output format. + +| Level | Meaning | +|---|---| +| `safe` | Non-breaking, safe to auto-apply without per-change human review | +| `humanreview` | Typically additive/clarifying, but should be confirmed by a human before applying at scale | +| `unsafe` | Could change request/response validation, required fields, types, or the parameter surface — requires human (or explicitly-confirmed agent) review | + +Any other value is rejected with `Error: --remediation-safety must be one of: safe, +humanreview, unsafe.` and a non-zero exit code. 
**Machine-readable:** ```bash -api-grade openapi.yaml --remediation-safety safe --format json +api-grade openapi.yaml --remediation-safety humanreview --format json ``` ```json @@ -382,21 +390,32 @@ api-grade openapi.yaml --remediation-safety safe --format json "specPath": "openapi.yaml", "format": "openapi-3", "totalViolations": 22, - "quickFixCount": 3, - "quickFixes": [ + "requestedLevel": "humanreview", + "remediationItemCount": 3, + "remediationItems": [ { - "ruleId": "info-contact", - "message": "Info object must have \"contact\" object.", + "ruleId": "operation-operationId", + "message": "Operation must have \"operationId\".", "severity": "warn", - "path": ["info"], - "location": "info", + "path": ["paths", "/pets", "get"], + "location": "paths./pets.get", "currentValue": null, - "expectedImprovement": "Add a `contact` object to the info block with name, email, or url" + "expectedImprovement": "Fix: Operation must have \"operationId\". Add or update `operationId` as required", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "staleFingerprintWarning": null } ] } ``` +Each item carries `riskLevel` (`low`/`medium`/`high`) and `confidenceLevel` +(`high`/`medium`/`low`) alongside `remediationSafetyLevel` — the field +`--remediation-safety`/`requestedLevel` filters against — and a `staleFingerprintWarning` +that is non-null only when a human-assessed rule classification's underlying rule +definition has since changed (see `ruleset-analysis` below). + **Human-readable** (default, or with `--format human`): ```bash @@ -405,11 +424,42 @@ api-grade openapi.yaml --remediation-safety safe Prints the same filtered list as readable text instead of JSON. -`--remediation-safety safe` has no effect on `--min-grade` — the gate still evaluates the +`--remediation-safety ` has no effect on `--min-grade` — the gate still evaluates the spec's actual letter grade from the full, unfiltered diagnostics. 
--- +## Ruleset Analysis (`ruleset-analysis`) + +Inspects a ruleset's per-rule remediation-safety analysis independent of grading any spec: + +```bash +api-grade ruleset-analysis --format human +api-grade ruleset-analysis --ruleset-path ./my-ruleset.yaml --format json +``` + +`--format human` (default) prints a table with rule id, risk level, confidence level, +remediation safety level, assessed by (`human`/`automated`), and rationale — plus a +fingerprint-mismatch warning line for any human-assessed rule whose underlying definition +has changed since it was last reviewed. `--format json` returns the full `RulesetAnalysis` +document. Without `--ruleset-path`, analyses the built-in ruleset. + +To persist a human-confirmed correction for one rule (reloaded automatically on future runs +against the same ruleset): + +```bash +api-grade ruleset-analysis correct --rule-id operation-operationId --level safe \ + --ruleset-path ./my-ruleset.yaml +``` + +For a local ruleset, this writes a colocated `.remediation-safety.json` file next +to the ruleset (commit it so your team shares the same judgements). For a non-writable +ruleset location (e.g. a GitHub-hosted ruleset, or the built-in ruleset), the correction is +recorded locally as a personal override instead, and the equivalent shared-file content is +printed for you to commit yourself. + +--- + ## Structured `--min-grade` Outcome in JSON Mode When `--min-grade ` is combined with `--format json`, the CLI prints a diff --git a/docs/getting-started.md b/docs/getting-started.md index 1398a7c..21b5b6e 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -55,7 +55,7 @@ Two Backstage plugin packages that display API grades directly on your Backstage ### MCP Server (`@dawmatt/api-grade-mcp`) -An MCP (Model Context Protocol) server that exposes api-grade as six AI tools: `grade-api`, `grade-api-detailed`, `assert-api-grade`, `grade-api-remediation-safety`, `set-ruleset-config`, and `get-ruleset-config`. 
Register it in Claude Code, GitHub Copilot (VS Code Agent mode), or any MCP-compatible AI host and let the AI grade specs directly. +An MCP (Model Context Protocol) server that exposes api-grade as seven AI tools: `grade-api`, `grade-api-detailed`, `assert-api-grade`, `grade-api-remediation-safety`, `analyse-ruleset-safety`, `set-ruleset-config`, and `get-ruleset-config`. Register it in Claude Code, GitHub Copilot (VS Code Agent mode), or any MCP-compatible AI host and let the AI grade specs directly. ```bash claude mcp add api-grade -- npx -y @dawmatt/api-grade-mcp diff --git a/docs/index.md b/docs/index.md index a0300d9..e6f3509 100644 --- a/docs/index.md +++ b/docs/index.md @@ -15,9 +15,9 @@ | [Package Usage Guide](package/usage-guide.md) | Common integration patterns and worked examples | | [Package API Reference](package/api-reference.md) | All exported functions, classes, and types | | [API Diagnostic Algorithm Specification](../specs/algorithms/api_diagnostic_algorithm_spec.md) | How scores, grades, and recommendations are computed | -| [Quick-Fixes Algorithm Specification](../specs/algorithms/quick_fixes_algorithm_spec.md) | How non-breaking, safely-automatable violations are identified | +| [Automated Remediation Safety Algorithm Specification](../specs/algorithms/automated_remediation_safety_algorithm_spec.md) | How risk, confidence, and remediation safety are determined per rule | | [MCP Server](mcp/README.md) | Grade specs from AI tools via MCP | -| [MCP Server Overview](package/api-grade-mcp.md) | All six MCP tools and their inputs/outputs | +| [MCP Server Overview](package/api-grade-mcp.md) | All MCP tools and their inputs/outputs | | [MCP Quick Start](mcp/quick-start.md) | Install and configure the MCP server in minutes | | [MCP Configuration Reference](mcp/configuration.md) | Default rulesets, auth, and scope precedence | | [MCP GitHub Token Setup](mcp/github-pat-setup.md) | One-time GitHub PAT creation for `github-pat` ruleset auth | diff --git 
a/docs/mcp/quick-start.md b/docs/mcp/quick-start.md index 9d1d4c0..fee6505 100644 --- a/docs/mcp/quick-start.md +++ b/docs/mcp/quick-start.md @@ -175,7 +175,8 @@ Reload Cursor after saving. | `grade-api` | Quick grade: letter grade, numeric score, and summary | | `grade-api-detailed` | Full grade with all violations, diagnostics, and recommendations | | `assert-api-grade` | Pass/fail assertion for a minimum grade threshold | -| `grade-api-remediation-safety` | Classified list of diagnostics filtered by remediation safety level (`safe`: non-breaking improvements) for AI-assisted correction | +| `grade-api-remediation-safety` | Classified list of diagnostics filtered by remediation safety level (`safe`, `humanreview`, or `unsafe`), each with a risk/confidence indicator, for AI-assisted correction | +| `analyse-ruleset-safety` | Per-rule risk, confidence, and remediation-safety analysis for a ruleset, independent of grading any spec | | `set-ruleset-config` | Set the default Spectral ruleset at session, workspace, or global scope | | `get-ruleset-config` | Get the active Spectral ruleset and which scope is effective | @@ -207,7 +208,7 @@ To confirm the server starts correctly: echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | npx -y @dawmatt/api-grade-mcp ``` -You should see a JSON response listing all six tools. +You should see a JSON response listing all the tools above. 
--- diff --git a/docs/package/README.md b/docs/package/README.md index 12170d1..c0c43b1 100644 --- a/docs/package/README.md +++ b/docs/package/README.md @@ -79,6 +79,6 @@ After building, the `api-grade-core` package is available under `packages/api-gr - [Usage Guide](usage-guide.md) — common integration patterns and worked examples - [API Reference](api-reference.md) — all exported functions, classes, and types - [API Diagnostic Algorithm Specification](../../specs/algorithms/api_diagnostic_algorithm_spec.md) — how scores, grades, and recommendations are computed -- [Quick-Fixes Algorithm Specification](../../specs/algorithms/quick_fixes_algorithm_spec.md) — how non-breaking, safely-automatable violations are identified +- [Automated Remediation Safety Algorithm Specification](../../specs/algorithms/automated_remediation_safety_algorithm_spec.md) — how risk, confidence, and remediation safety are determined per rule - [Documentation Index](../index.md) — full navigation across all docs - [CLI Tool](../cli/README.md) — use api-grade from the command line diff --git a/docs/package/api-grade-mcp.md b/docs/package/api-grade-mcp.md index 79f00fc..fa004f4 100644 --- a/docs/package/api-grade-mcp.md +++ b/docs/package/api-grade-mcp.md @@ -121,14 +121,24 @@ Assert that an API specification meets a minimum grade threshold (A > B > C > D ### `grade-api-remediation-safety` -Return a classified, AI-actionable list of diagnostics filtered by remediation safety level. The `safe` level covers improvements that can be made via non-breaking changes (those that do not alter paths, methods, required parameters, schema types, or response structures). Each result includes `ruleId`, `path`, `location`, `currentValue`, and `expectedImprovement`. 
+Return a classified, AI-actionable list of diagnostics filtered by remediation safety level: `safe` (non-breaking, safe to auto-apply), `humanreview` (typically additive/clarifying but should be confirmed by a human before applying at scale), or `unsafe` (could change request/response validation, required fields, types, or the parameter surface — requires human or explicitly-confirmed-agent review). Each result includes `ruleId`, `path`, `location`, `currentValue`, `expectedImprovement`, and a confidence indicator (`riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `staleFingerprintWarning`). -**Input**: `specPath` (required), `level` (required: `safe`), `rulesetPath` (optional), `recoveryOption` (optional) +**Input**: `specPath` (required), `level` (required: `safe`/`humanreview`/`unsafe`), `rulesetPath` (optional), `recoveryOption` (optional) **Use when**: Asking the AI to generate fixes for documentation and metadata issues without risking breaking changes. Use this tool instead of `grade-api-detailed` when the goal is AI-assisted correction. --- +### `analyse-ruleset-safety` + +Inspect a Spectral ruleset's per-rule remediation-safety analysis (`riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `assessedBy`, `rationale`) without grading any specific API specification. Returns the same `RulesetAnalysis` document the CLI's `ruleset-analysis` subcommand produces. + +**Input**: `rulesetPath` (optional), `recoveryOption` (optional) + +**Use when**: You want to understand how risky it would be to auto-remediate violations of each rule in a ruleset before running `grade-api-remediation-safety` against a real spec. + +--- + ### `set-ruleset-config` Set the default Spectral ruleset at session, workspace, or global scope. The configured default applies to all subsequent grading requests without needing to supply `rulesetPath` each time. 
diff --git a/docs/package/api-reference.md b/docs/package/api-reference.md index a77c082..eaa6460 100644 --- a/docs/package/api-reference.md +++ b/docs/package/api-reference.md @@ -151,21 +151,58 @@ interface AssertOutput { } ``` -### `buildQuickFixOutput(result: GradeResult, specContent: string): QuickFixOutput` +### `analyseRuleset(loadedRuleset: LoadedRuleset, options?: { auth?: AuthConfig | null }): Promise` -Shapes the "safely-automatable fixes" subset. Used by MCP's -`grade-api-remediation-safety` and the CLI's `--remediation-safety safe --format json`. +Computes a per-rule remediation-safety analysis for a loaded ruleset — risk level, +confidence level, and the derived remediation safety level for every rule, with +provenance (`assessedBy`, `source`) and a rationale. Checks persisted/bundled stores +(workspace override → global override → colocated shared analysis → bundled default for +the built-in ruleset) before falling through to the automated heuristic. See the +[Automated Remediation Safety Algorithm Specification](../../specs/algorithms/automated_remediation_safety_algorithm_spec.md) +for the full algorithm. ```typescript -interface QuickFixOutput { +interface RulesetAnalysis { + rulesetSource: 'default' | 'custom'; + rulesetPath?: string; + rules: RuleAnalysis[]; +} + +interface RuleAnalysis { + ruleId: string; + riskLevel: RiskLevel | null; // "low" | "medium" | "high" + confidenceLevel: ConfidenceLevel; // "high" | "medium" | "low" + remediationSafetyLevel: RemediationSafetyLevel; // "safe" | "humanreview" | "unsafe" + assessedBy: AssessmentOrigin; // "human" | "automated" + staleFingerprintWarning: StaleFingerprintWarning | null; + rationale: string; + source: AnalysisSource; // "persisted" | "bundled-default" | "heuristic" | "fallback" +} +``` + +### `getRemediationSafety(diagnostic: Diagnostic, rulesetAnalysis: RulesetAnalysis)` + +Looks up a single violation's remediation safety against a previously computed +`RulesetAnalysis`, by `ruleId`. 
Defaults to `{ riskLevel: "high", confidenceLevel: "low", +remediationSafetyLevel: "unsafe", staleFingerprintWarning: null }` when the rule isn't +covered by the analysis. + +### `buildRemediationSafetyOutput(result: GradeResult, specContent: string, rulesetAnalysis: RulesetAnalysis, requestedLevel: RemediationSafetyLevel): RemediationSafetyOutput` + +Shapes the diagnostics matching one remediation-safety level. Used by MCP's +`grade-api-remediation-safety` and the CLI's `--remediation-safety --format json`. + +```typescript +interface RemediationSafetyOutput { specPath: string; format: ApiFormat; totalViolations: number; - quickFixCount: number; - quickFixes: QuickFix[]; + remediationItemCount: number; + remediationItems: RemediationItem[]; + requestedLevel: RemediationSafetyLevel; } -interface QuickFix { +interface RemediationItem { ruleId: string; message: string; severity: string; @@ -173,21 +210,26 @@ interface QuickFix { location: string; // dot-joined `path` currentValue: string | null; expectedImprovement: string; + riskLevel: RiskLevel | null; + confidenceLevel: ConfidenceLevel; + remediationSafetyLevel: RemediationSafetyLevel; + staleFingerprintWarning: StaleFingerprintWarning | null; } ``` -### `formatQuickFixesHuman(result: GradeResult, specContent: string): string` +### `formatRemediationSafetyHuman(result: GradeResult, specContent: string, rulesetAnalysis: RulesetAnalysis, requestedLevel: RemediationSafetyLevel): string` -Renders the same filtered `QuickFix[]` list used by `buildQuickFixOutput()` as -human-readable text. Used by the CLI's `--remediation-safety safe` with `--format human` -(the default). +Renders the same filtered `RemediationItem[]` list used by `buildRemediationSafetyOutput()` +as human-readable text. Used by the CLI's `--remediation-safety ` with +`--format human` (the default). 
-### `classifyViolation(diagnostic: Diagnostic): ViolationClass` +### `persistRuleAnalysisCorrection(loadedRuleset, ruleId, remediationSafetyLevel, scope?)` -Classifies a single diagnostic as `'nonBreaking' | 'breaking' | 'unknown'`. The -classification basis for `buildQuickFixOutput()`'s filtering. See the -[Quick-Fixes Algorithm Specification](../../specs/algorithms/quick_fixes_algorithm_spec.md) -for the full rationale behind which violations are classified which way. +Persists a human-confirmed remediation-safety correction for one rule, written to the +colocated shared analysis file (default, for a writable local ruleset) or a personal +override (workspace/global scope, or as a fallback for a non-writable remote/built-in +ruleset location). Reloaded automatically by `analyseRuleset()` on future runs against the +same ruleset. --- @@ -305,8 +347,8 @@ interface RuleMetadata { - [Usage Guide](usage-guide.md) — common patterns and worked examples - [Package Overview](README.md) — installation and minimal usage -- [MCP Server Tool Reference](api-grade-mcp.md) — all six MCP tools including `recoveryOption` +- [MCP Server Tool Reference](api-grade-mcp.md) — all MCP tools including `recoveryOption` - [CLI Commands](../cli/commands.md#json-output-schema) — CLI-specific usage of the JSON Output Schema above - [API Diagnostic Algorithm Specification](../../specs/algorithms/api_diagnostic_algorithm_spec.md) — full scoring/grading/recommendation algorithm -- [Quick-Fixes Algorithm Specification](../../specs/algorithms/quick_fixes_algorithm_spec.md) — full non-breaking-vs-breaking classification algorithm +- [Automated Remediation Safety Algorithm Specification](../../specs/algorithms/automated_remediation_safety_algorithm_spec.md) — full risk/confidence/remediation-safety classification algorithm - [Documentation Index](../index.md) — full navigation across all docs diff --git a/eslint.config.mjs b/eslint.config.mjs index 9a6f025..40092db 100644 --- a/eslint.config.mjs +++ 
b/eslint.config.mjs @@ -26,7 +26,7 @@ export default tseslint.config( 'node_modules/**', 'coverage/**', 'packages/*/coverage/**', - 'scripts/**', + '**/scripts/**', '**/*.config.ts', '**/*.config.mjs', ], diff --git a/packages/api-grade-core/package.json b/packages/api-grade-core/package.json index e2f99db..edb5e43 100644 --- a/packages/api-grade-core/package.json +++ b/packages/api-grade-core/package.json @@ -31,6 +31,7 @@ }, "scripts": { "build": "tsc", + "postbuild": "node scripts/copy-bundled-analysis.mjs", "test": "vitest run", "test:watch": "vitest", "test:coverage": "vitest run --coverage", diff --git a/packages/api-grade-core/scripts/copy-bundled-analysis.mjs b/packages/api-grade-core/scripts/copy-bundled-analysis.mjs new file mode 100644 index 0000000..696c750 --- /dev/null +++ b/packages/api-grade-core/scripts/copy-bundled-analysis.mjs @@ -0,0 +1,10 @@ +import { cpSync, mkdirSync, statSync } from 'node:fs'; +import { dirname, join } from 'node:path'; +import { fileURLToPath } from 'node:url'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const srcDir = join(__dirname, '..', 'src', 'rulesets', 'bundled-analysis'); +const destDir = join(__dirname, '..', 'dist', 'rulesets', 'bundled-analysis'); + +mkdirSync(destDir, { recursive: true }); +cpSync(srcDir, destDir, { recursive: true, filter: (src) => statSync(src).isDirectory() || src.endsWith('.json') }); diff --git a/packages/api-grade-core/scripts/generate-bundled-analysis.mjs b/packages/api-grade-core/scripts/generate-bundled-analysis.mjs new file mode 100644 index 0000000..8ce374a --- /dev/null +++ b/packages/api-grade-core/scripts/generate-bundled-analysis.mjs @@ -0,0 +1,108 @@ +// Maintenance utility: regenerates the seeded entries in +// src/rulesets/bundled-analysis/{openapi,asyncapi}.json from the curated rule-id lists +// (FR-012/FR-020). Run manually after bumping @stoplight/spectral-rulesets or editing the +// curated lists below; does not run as part of the package build.
+// +// IMPORTANT: these entries are assessedBy: "automated" — they are a seeded, no-human-in-the-loop +// classification, not a maintainer's reviewed judgement. Per the data model, assessedBy: "human" +// is reserved for a classification an actual person has explicitly reviewed and persisted (e.g. +// via `ruleset-analysis correct`). Do not flip these to "human" without a real maintainer review. +import { createHash } from 'node:crypto'; +import { writeFileSync } from 'node:fs'; +import { dirname, join } from 'node:path'; +import { fileURLToPath } from 'node:url'; +import { oas, asyncapi } from '@stoplight/spectral-rulesets'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const outDir = join(__dirname, '..', 'src', 'rulesets', 'bundled-analysis'); + +function givenExprsOf(rule) { + if (!rule.given) return []; + return Array.isArray(rule.given) ? rule.given : [rule.given]; +} + +function functionNamesOf(rule) { + const then = rule.then; + if (!then) return []; + const thens = Array.isArray(then) ? then : [then]; + return thens.map((t) => t?.function).filter((f) => typeof f === 'string'); +} + +function computeRuleFingerprint(ruleId, rule) { + const given = givenExprsOf(rule).join(','); + const fn = functionNamesOf(rule).join(','); + const severity = String(rule.severity ?? ''); + const description = rule.description ?? ''; + const raw = `${ruleId}|${given}|${fn}|${severity}|${description}`; + return createHash('sha256').update(raw).digest('hex'); +} + +// Curated rule-id -> classification, migrated from the former hard-coded +// quick_fixes_algorithm_spec.md tables. Maintainers add entries here as the project encounters +// new well-known rules; this is a config-only change, not an algorithm change. 
+const CURATED = { + safe: [ + 'operation-description', + 'operation-summary', + 'info-contact', + 'info-description', + 'info-license', + 'oas3-examples-value-or-externalValue', + 'tag-description', + 'asyncapi-info-contact', + 'asyncapi-info-description', + 'asyncapi-info-license', + 'asyncapi-operation-description', + 'asyncapi-3-operation-description', + 'asyncapi-tag-description', + 'asyncapi-3-tag-description', + 'asyncapi-parameter-description', + ], + humanreview: [ + 'operation-operationId', + 'operation-success-response', + 'oas3-server-not-example.com', + 'oas3-server-trailing-slash', + 'oas3-operation-security-defined', + 'oas2-operation-security-defined', + 'asyncapi-operation-operationId', + 'asyncapi-server-not-example-com', + 'asyncapi-3-server-not-example-com', + 'asyncapi-operation-security', + 'asyncapi-3-operation-security', + ], + unsafe: ['oas3-schema', 'oas3-valid-schema-example', 'oas2-schema', 'asyncapi-schema', 'asyncapi-payload'], +}; + +const RATIONALE = { + safe: 'seeded safe classification (bundled default, not yet human-reviewed)', + humanreview: 'seeded humanreview classification (bundled default, not yet human-reviewed)', + unsafe: 'seeded unsafe classification (bundled default, not yet human-reviewed)', +}; + +function generate(ruleset, fileName) { + const rules = {}; + for (const [level, ruleIds] of Object.entries(CURATED)) { + for (const ruleId of ruleIds) { + const rule = ruleset.rules[ruleId]; + if (!rule) continue; // not present in this ruleset (e.g. 
an asyncapi-only id checked against oas) + rules[ruleId] = { + ruleId, + riskLevel: null, + confidenceLevel: 'high', + remediationSafetyLevel: level, + assessedBy: 'automated', + staleFingerprintWarning: null, + rationale: RATIONALE[level], + source: 'bundled-default', + fingerprint: computeRuleFingerprint(ruleId, rule), + }; + } + } + const outPath = join(outDir, fileName); + writeFileSync(outPath, JSON.stringify({ rules }, null, 2) + '\n', 'utf-8'); + console.log(`Wrote ${outPath} (${Object.keys(rules).length} entries)`); +} + +generate(oas, 'openapi.json'); +generate(asyncapi, 'asyncapi.json'); diff --git a/packages/api-grade-core/src/config/personal-ruleset-override.ts b/packages/api-grade-core/src/config/personal-ruleset-override.ts new file mode 100644 index 0000000..68ff683 --- /dev/null +++ b/packages/api-grade-core/src/config/personal-ruleset-override.ts @@ -0,0 +1,46 @@ +import { readFile, writeFile, mkdir } from 'node:fs/promises'; +import { dirname, join } from 'node:path'; +import { homedir } from 'node:os'; +import type { PersonalRulesetAnalysisOverride } from '../types.js'; +import { ConfigWriteError } from './ruleset-config.js'; + +export function getWorkspaceOverridePath(): string { + return join(process.cwd(), '.api-grade', 'ruleset-analysis-override.json'); +} + +export function getGlobalOverridePath(): string { + return join(homedir(), '.api-grade', 'ruleset-analysis-override.json'); +} + +async function loadOverride(filePath: string): Promise<PersonalRulesetAnalysisOverride | null> { + try { + const data = await readFile(filePath, 'utf-8'); + return JSON.parse(data) as PersonalRulesetAnalysisOverride; + } catch { + return null; + } +} + +export async function loadWorkspaceRulesetAnalysisOverride(): Promise<PersonalRulesetAnalysisOverride | null> { + return loadOverride(getWorkspaceOverridePath()); +} + +export async function loadGlobalRulesetAnalysisOverride(): Promise<PersonalRulesetAnalysisOverride | null> { + return loadOverride(getGlobalOverridePath()); +} + +export async function saveRulesetAnalysisOverride( + scope: 'workspace' | 'global', + override:
PersonalRulesetAnalysisOverride +): Promise<void> { + const filePath = scope === 'workspace' ? getWorkspaceOverridePath() : getGlobalOverridePath(); + try { + await mkdir(dirname(filePath), { recursive: true }); + await writeFile(filePath, JSON.stringify(override, null, 2), 'utf-8'); + } catch (err) { + throw new ConfigWriteError( + `Could not write ${scope} ruleset analysis override to ${filePath}: ${err instanceof Error ? err.message : String(err)}`, + err + ); + } +} diff --git a/packages/api-grade-core/src/config/shared-ruleset-analysis.ts b/packages/api-grade-core/src/config/shared-ruleset-analysis.ts new file mode 100644 index 0000000..a038d63 --- /dev/null +++ b/packages/api-grade-core/src/config/shared-ruleset-analysis.ts @@ -0,0 +1,56 @@ +import { readFile, writeFile } from 'node:fs/promises'; +import { existsSync } from 'node:fs'; +import type { AuthConfig, SharedRulesetAnalysis } from '../types.js'; +import { fetchRulesetContent, INITIAL_FETCH_TIMEOUT_MS } from '../auth/github.js'; + +const SHARED_ANALYSIS_SUFFIX = '.remediation-safety.json'; + +// Colocated location, derived deterministically from the ruleset's own path/URL — never a +// separately-tracked or registered location (FR-016/FR-017).
+export function deriveSharedAnalysisLocation(rulesetPathOrUrl: string): string { + return `${rulesetPathOrUrl}${SHARED_ANALYSIS_SUFFIX}`; +} + +export async function loadLocalSharedRulesetAnalysis(rulesetPath: string): Promise<SharedRulesetAnalysis | null> { + const location = deriveSharedAnalysisLocation(rulesetPath); + if (!existsSync(location)) return null; + try { + const data = await readFile(location, 'utf-8'); + return JSON.parse(data) as SharedRulesetAnalysis; + } catch { + return null; + } +} + +export async function saveLocalSharedRulesetAnalysis( + rulesetPath: string, + analysis: SharedRulesetAnalysis +): Promise<void> { + const location = deriveSharedAnalysisLocation(rulesetPath); + await writeFile(location, JSON.stringify(analysis, null, 2), 'utf-8'); +} + +// Read-only for a GitHub-hosted ruleset — reuses the same resolution/auth flow already used to +// fetch the ruleset itself (FR-017); never written automatically (FR-019). +export async function loadRemoteSharedRulesetAnalysis( + rulesetUrl: string, + auth: AuthConfig | null +): Promise<SharedRulesetAnalysis | null> { + const location = deriveSharedAnalysisLocation(rulesetUrl); + try { + const token = auth?.type === 'github-pat' ? auth.githubToken ??
process.env.GITHUB_TOKEN : undefined; + const content = await fetchRulesetContent(location, token, INITIAL_FETCH_TIMEOUT_MS); + return JSON.parse(content) as SharedRulesetAnalysis; + } catch { + return null; + } +} + +export async function loadSharedRulesetAnalysis( + rulesetPath: string | undefined, + auth: AuthConfig | null +): Promise<SharedRulesetAnalysis | null> { + if (!rulesetPath) return null; + if (rulesetPath.startsWith('http')) return loadRemoteSharedRulesetAnalysis(rulesetPath, auth); + return loadLocalSharedRulesetAnalysis(rulesetPath); +} diff --git a/packages/api-grade-core/src/index.ts b/packages/api-grade-core/src/index.ts index 8dc6510..253794e 100644 --- a/packages/api-grade-core/src/index.ts +++ b/packages/api-grade-core/src/index.ts @@ -3,7 +3,20 @@ export { formatHuman, formatJson } from './formatter.js'; export { computeScore, LETTER_GRADE_ORDER, gradeToNumber } from './scorer.js'; export { extractCategory } from './types.js'; export { buildCommonGradeOutput, buildAssertOutput } from './json-output.js'; -export { classifyViolation, buildQuickFix, buildQuickFixOutput, formatQuickFixesHuman } from './quick-fixes.js'; +export { + analyseRuleset, + getRemediationSafety, + buildRemediationItem, + buildRemediationSafetyOutput, + formatRemediationSafetyHuman, + decisionMatrix, + computeRuleFingerprint, + persistRuleAnalysisCorrection, +} from './remediation-safety.js'; +export type { + PersistRulesetAnalysisCorrectionScope, + PersistRulesetAnalysisCorrectionResult, +} from './remediation-safety.js'; export type { ApiFormat, @@ -19,11 +32,22 @@ export type { ImpactLevel, LetterGrade, RuleMetadata, - QuickFix, - ViolationClass, + RemediationItem, + RemediationSafetyLevel, + RiskLevel, + ConfidenceLevel, + AssessmentOrigin, + AnalysisSource, + RuleAnalysis, + RulesetAnalysis, + StaleFingerprintWarning, CommonGradeOutput, AssertOutput, - QuickFixOutput, + RemediationSafetyOutput, + PersistedRuleEntry, + SharedRulesetAnalysis, + PersonalRulesetAnalysisOverride, +
BundledRulesetAnalysis, } from './types.js'; export type { @@ -53,3 +77,22 @@ export { } from './config/ruleset-config.js'; export { resolveRuleset } from './config/resolve-ruleset.js'; + +export { loadRuleset, loadRulesetFromUrl, getDefaultRuleset } from './rulesets/loader.js'; +export type { LoadedRuleset } from './rulesets/loader.js'; + +export { + deriveSharedAnalysisLocation, + loadLocalSharedRulesetAnalysis, + saveLocalSharedRulesetAnalysis, + loadRemoteSharedRulesetAnalysis, + loadSharedRulesetAnalysis, +} from './config/shared-ruleset-analysis.js'; + +export { + getWorkspaceOverridePath, + getGlobalOverridePath, + loadWorkspaceRulesetAnalysisOverride, + loadGlobalRulesetAnalysisOverride, + saveRulesetAnalysisOverride, +} from './config/personal-ruleset-override.js'; diff --git a/packages/api-grade-core/src/quick-fixes.ts b/packages/api-grade-core/src/quick-fixes.ts deleted file mode 100644 index dcd5913..0000000 --- a/packages/api-grade-core/src/quick-fixes.ts +++ /dev/null @@ -1,169 +0,0 @@ -import type { Diagnostic, GradeResult, ViolationClass, QuickFix, QuickFixOutput } from './types.js'; - -const RULE_ID_NON_BREAKING_PREFIXES = [ - 'operation-description', - 'operation-summary', - 'info-contact', - 'info-description', - 'info-license', - 'oas3-examples-', - 'tag-description', -]; - -const NON_BREAKING_SEGMENTS = new Set([ - 'description', - 'summary', - 'title', - 'contact', - 'license', - 'termsOfService', - 'externalDocs', - 'example', - 'examples', - 'tags', - 'info', -]); - -const BREAKING_SEGMENTS = new Set([ - 'required', - 'type', - 'format', -]); - -function isNonBreakingPath(path: string[]): boolean { - for (const segment of path) { - if (segment.startsWith('x-')) return true; - if (NON_BREAKING_SEGMENTS.has(segment)) return true; - } - return false; -} - -function isBreakingPath(path: string[]): boolean { - for (const segment of path) { - if (BREAKING_SEGMENTS.has(segment)) return true; - if (segment === 'parameters') return true; - } - return 
false; -} - -export function classifyViolation(diagnostic: Diagnostic): ViolationClass { - // Rule ID overrides take priority - for (const prefix of RULE_ID_NON_BREAKING_PREFIXES) { - if (diagnostic.ruleId.startsWith(prefix)) return 'nonBreaking'; - } - - const path = diagnostic.path ?? []; - - if (isBreakingPath(path)) return 'breaking'; - if (isNonBreakingPath(path)) return 'nonBreaking'; - return 'unknown'; -} - -const SEVERITY_LABELS: Record<number, string> = { - 0: 'error', - 1: 'warn', - 2: 'info', - 3: 'hint', -}; - -export function buildQuickFix( - diagnostic: Diagnostic, - specContent: string -): QuickFix { - const path = (diagnostic.path ?? []) as string[]; - const location = path.join('.'); - - let currentValue: string | null = null; - try { - if (path.length > 0) { - const parsed: unknown = JSON.parse(specContent); - let node: unknown = parsed; - for (const segment of path) { - if (node === null || typeof node !== 'object') { - node = undefined; - break; - } - node = (node as Record<string, unknown>)[segment]; - } - if (node !== undefined && node !== null) { - currentValue = typeof node === 'string' ? node : JSON.stringify(node); - } - } - } catch { - // JSON parse failed (e.g. YAML spec) — leave currentValue as null - } - - const lastSegment = path[path.length - 1] ?? 'field'; - const expectedImprovement = deriveExpectedImprovement(diagnostic.ruleId, diagnostic.message, lastSegment, path); - - const severityNum = typeof diagnostic.severity === 'number' ? diagnostic.severity : 1; - - return { - ruleId: diagnostic.ruleId, - message: diagnostic.message, - severity: SEVERITY_LABELS[severityNum] ?? 'warn', - path, - location, - currentValue, - expectedImprovement, - }; -} - -function deriveExpectedImprovement( - ruleId: string, - message: string, - lastSegment: string, - path: string[] -): string { - if (ruleId.includes('description')) { - const entity = path.length > 1 ?
path[path.length - 2] : 'item'; - return `Add a \`description\` field that explains the purpose of this ${entity}`; - } - if (ruleId.includes('summary')) { - return `Add a \`summary\` field with a brief one-line description`; - } - if (ruleId.includes('contact')) { - return `Add a \`contact\` object to the info block with name, email, or url`; - } - if (ruleId.includes('license')) { - return `Add a \`license\` object to the info block with name and url`; - } - if (ruleId.includes('example')) { - return `Add an \`example\` or \`examples\` field illustrating expected values`; - } - if (ruleId.includes('tag-description')) { - return `Add a \`description\` field to this tag explaining its purpose`; - } - return `Fix: ${message}. Add or update \`${lastSegment}\` as required`; -} - -export function buildQuickFixOutput(result: GradeResult, specContent: string): QuickFixOutput { - const quickFixes = result.diagnostics - .filter((d) => classifyViolation(d) === 'nonBreaking') - .map((d) => buildQuickFix(d, specContent)); - - return { - specPath: result.specPath, - format: result.format, - totalViolations: result.diagnostics.length, - quickFixCount: quickFixes.length, - quickFixes, - }; -} - -export function formatQuickFixesHuman(result: GradeResult, specContent: string): string { - const { quickFixes } = buildQuickFixOutput(result, specContent); - const lines: string[] = []; - - lines.push(`Quick Fixes (${quickFixes.length} of ${result.diagnostics.length} total violations):`); - - for (const fix of quickFixes) { - lines.push(''); - const location = fix.location || '(root)'; - lines.push(` ${fix.severity.padEnd(5)} ${fix.ruleId.padEnd(42)} ${location}`); - lines.push(` ${fix.message}`); - lines.push(` ${fix.expectedImprovement}`); - } - - return lines.join('\n'); -} diff --git a/packages/api-grade-core/src/remediation-safety.ts b/packages/api-grade-core/src/remediation-safety.ts new file mode 100644 index 0000000..4f6afc6 --- /dev/null +++ 
b/packages/api-grade-core/src/remediation-safety.ts @@ -0,0 +1,582 @@ +import { createHash } from 'node:crypto'; +import type { + AnalysisSource, + AuthConfig, + ConfidenceLevel, + Diagnostic, + GradeResult, + PersistedRuleEntry, + RemediationItem, + RemediationSafetyLevel, + RemediationSafetyOutput, + RiskLevel, + RuleAnalysis, + RulesetAnalysis, +} from './types.js'; +import type { LoadedRuleset } from './rulesets/loader.js'; +import { + loadSharedRulesetAnalysis, + deriveSharedAnalysisLocation, + loadLocalSharedRulesetAnalysis, + saveLocalSharedRulesetAnalysis, +} from './config/shared-ruleset-analysis.js'; +import { + loadWorkspaceRulesetAnalysisOverride, + loadGlobalRulesetAnalysisOverride, + saveRulesetAnalysisOverride, +} from './config/personal-ruleset-override.js'; +import { loadBundledRulesetAnalysis } from './rulesets/bundled-analysis.js'; + +type Tier = 'safe' | 'humanreview' | 'unsafe'; + +interface StageResult { + riskLevel: RiskLevel; + confidenceLevel: ConfidenceLevel; + rationale: string; + source: AnalysisSource; +} + +interface SpectralThen { + function?: string; + field?: string; +} + +interface SpectralRule { + given?: string | string[]; + then?: SpectralThen | SpectralThen[]; + severity?: unknown; + description?: string; +} + +const UNSAFE_SEGMENTS = new Set([ + 'required', + 'type', + 'format', + 'parameters', + 'address', + 'action', + 'messages', + 'payload', +]); + +const HUMANREVIEW_SEGMENTS = new Set([ + 'enum', + 'default', + 'security', + 'servers', + 'operationId', + 'additionalProperties', + 'responses', + 'channels', + 'operations', + 'reply', +]); + +const SAFE_SEGMENTS = new Set([ + 'description', + 'summary', + 'title', + 'contact', + 'license', + 'termsOfService', + 'externalDocs', + 'example', + 'examples', + 'tags', + 'info', +]); + +const ADDITIVE_FUNCTIONS = new Set(['truthy', 'defined']); +const RENAME_FUNCTIONS = new Set(['pattern', 'casing']); +const KNOWN_FUNCTIONS = new Set([ + ...ADDITIVE_FUNCTIONS, + 
+ ...RENAME_FUNCTIONS, + 'alphabetical', + 'enumeration', + 'falsy', + 'length', + 'schema', + 'undefined', + 'unreferencedReusableObject', + 'xor', +]); + +function tokenize(given: string): string[] { + return given.match(/[A-Za-z_][A-Za-z0-9_-]*/g) ?? []; +} + +function isKeySelector(given: string): boolean { + return /~\s*$/.test(given.trim()); +} + +function givenExprsOf(rule: SpectralRule): string[] { + if (!rule.given) return []; + return Array.isArray(rule.given) ? rule.given : [rule.given]; +} + +function functionNamesOf(rule: SpectralRule): string[] { + const then = rule.then; + if (!then) return []; + const thens = Array.isArray(then) ? then : [then]; + return thens.map((t) => t?.function).filter((f): f is string => typeof f === 'string'); +} + +function matchedTiers(givenExprs: string[]): Set<Tier> { + const tiers = new Set<Tier>(); + for (const given of givenExprs) { + for (const segment of tokenize(given)) { + if (segment.startsWith('x-')) tiers.add('safe'); + if (UNSAFE_SEGMENTS.has(segment)) tiers.add('unsafe'); + if (HUMANREVIEW_SEGMENTS.has(segment)) tiers.add('humanreview'); + if (SAFE_SEGMENTS.has(segment)) tiers.add('safe'); + } + } + return tiers; +} + +function mostConservativeTier(tiers: Set<Tier>): Tier | null { + if (tiers.has('unsafe')) return 'unsafe'; + if (tiers.has('humanreview')) return 'humanreview'; + if (tiers.has('safe')) return 'safe'; + return null; +} + +function tierToRisk(tier: Tier): RiskLevel { + return tier === 'unsafe' ? 'high' : tier === 'humanreview' ? 'medium' : 'low'; +} + +// Stage 1a: a `given` expression that selects path/channel object keys directly.
+function stage1a(givenExprs: string[]): StageResult | null { + for (const given of givenExprs) { + if (!isKeySelector(given)) continue; + const tokens = tokenize(given); + if (tokens.includes('paths') || tokens.includes('channels')) { + return { + riskLevel: 'high', + confidenceLevel: 'high', + rationale: + 'given path selects path/channel object keys directly — any satisfying edit renames a public path or channel', + source: 'heuristic', + }; + } + } + return null; +} + +// Stage 1b: classify by the rule's `then.function` mechanics. +function stage1b(givenExprs: string[], functionNames: string[]): StageResult | null { + if (functionNames.length === 0) return null; + const tiers = matchedTiers(givenExprs); + + for (const fn of functionNames) { + if (ADDITIVE_FUNCTIONS.has(fn)) { + let riskLevel: RiskLevel = 'low'; + if (tiers.has('unsafe')) riskLevel = 'high'; + else if (tiers.has('humanreview')) riskLevel = 'medium'; + const confidenceLevel: ConfidenceLevel = tiers.size <= 1 ? 'high' : 'medium'; + return { + riskLevel, + confidenceLevel, + rationale: `\`${fn}\` function (additive — add/populate a field) on a target matching the ${riskLevel} tier`, + source: 'heuristic', + }; + } + if (RENAME_FUNCTIONS.has(fn)) { + let riskLevel: RiskLevel = 'medium'; + if (tiers.has('unsafe')) riskLevel = 'high'; + else if (tiers.size === 1 && tiers.has('safe')) riskLevel = 'low'; + const confidenceLevel: ConfidenceLevel = tiers.size <= 1 ? 'high' : 'medium'; + return { + riskLevel, + confidenceLevel, + rationale: `\`${fn}\` function (rename/reformat) on a target matching the ${riskLevel} tier`, + source: 'heuristic', + }; + } + if (!KNOWN_FUNCTIONS.has(fn)) { + return { + riskLevel: 'high', + confidenceLevel: 'low', + rationale: `custom function \`${fn}\` — mechanics cannot be inferred statically`, + source: 'heuristic', + }; + } + } + return null; +} + +// Stage 1c: generic segment-membership fallback within Stage 1. 
+function stage1c(givenExprs: string[]): StageResult | null { + const tiers = matchedTiers(givenExprs); + const tier = mostConservativeTier(tiers); + if (tier === null) return null; + const riskLevel = tierToRisk(tier); + const confidenceLevel: ConfidenceLevel = tiers.size === 1 ? 'medium' : 'low'; + const rationale = + tiers.size === 1 + ? `given path matched the ${tier} segment set` + : `given path matched multiple tiers (${[...tiers].join(', ')}) — conservative match, ambiguous`; + return { riskLevel, confidenceLevel, rationale, source: 'heuristic' }; +} + +const STAGE2_FALLBACK: StageResult = { + riskLevel: 'high', + confidenceLevel: 'low', + rationale: 'no recognizable rule-id, function, or path signal', + source: 'fallback', +}; + +function classifyRuleStages1And2(rule: SpectralRule): StageResult { + const givenExprs = givenExprsOf(rule); + const a = stage1a(givenExprs); + if (a) return a; + const functionNames = functionNamesOf(rule); + const b = stage1b(givenExprs, functionNames); + if (b) return b; + const c = stage1c(givenExprs); + if (c) return c; + return STAGE2_FALLBACK; +} + +export function decisionMatrix(riskLevel: RiskLevel, confidenceLevel: ConfidenceLevel): RemediationSafetyLevel { + if (riskLevel === 'low' && (confidenceLevel === 'high' || confidenceLevel === 'medium')) return 'safe'; + if (riskLevel === 'medium' && confidenceLevel === 'high') return 'humanreview'; + if (riskLevel === 'high') return 'unsafe'; + return 'humanreview'; +} + +// Stage 0: a stable identifier for "this exact rule definition" — hash over the rule's own +// content (ruleId, given, then.function, severity, description), never the ruleset path/URL. +export function computeRuleFingerprint(ruleId: string, rule: SpectralRule): string { + const given = givenExprsOf(rule).join(','); + const fn = functionNamesOf(rule).join(','); + const severity = String(rule.severity ?? ''); + const description = rule.description ?? 
''; + const raw = `${ruleId}|${given}|${fn}|${severity}|${description}`; + return createHash('sha256').update(raw).digest('hex'); +} + +function buildStaleFingerprintWarning(stored: string, current: string): RuleAnalysis['staleFingerprintWarning'] { + return { + storedFingerprint: stored, + currentFingerprint: current, + message: `rule changed since this was last reviewed (stored fingerprint ${stored.slice(0, 8)}..., current ${current.slice(0, 8)}...)`, + }; +} + +// Stage 0 lookup precedence: workspace override -> global override -> shared colocated analysis +// -> bundled default (built-in ruleset only). An `assessedBy: "human"` entry is used as soon as +// it is found, fingerprint match or not (flagged via staleFingerprintWarning on mismatch). An +// `assessedBy: "automated"` entry is only used on a fingerprint match; otherwise the lookup +// continues to the next store in precedence order. +function lookupStage0( + ruleId: string, + fingerprint: string, + stores: Array<Record<string, PersistedRuleEntry> | null | undefined> +): RuleAnalysis | null { + for (const store of stores) { + const entry = store?.[ruleId]; + if (!entry) continue; + + if (entry.assessedBy === 'human') { + const stale = entry.fingerprint !== fingerprint; + return { + ruleId, + riskLevel: entry.riskLevel, + confidenceLevel: entry.confidenceLevel, + remediationSafetyLevel: entry.remediationSafetyLevel, + assessedBy: 'human', + rationale: entry.rationale, + source: entry.source, + staleFingerprintWarning: stale ?
buildStaleFingerprintWarning(entry.fingerprint, fingerprint) : null, + }; + } + + if (entry.fingerprint === fingerprint) { + return { + ruleId, + riskLevel: entry.riskLevel, + confidenceLevel: entry.confidenceLevel, + remediationSafetyLevel: entry.remediationSafetyLevel, + assessedBy: 'automated', + rationale: entry.rationale, + source: entry.source, + staleFingerprintWarning: null, + }; + } + // automated entry, stale fingerprint -> not found; keep checking lower-precedence stores + } + return null; +} + +export async function analyseRuleset( + loadedRuleset: LoadedRuleset, + options?: { auth?: AuthConfig | null } +): Promise<RulesetAnalysis> { + const rulesMap = (loadedRuleset.ruleset?.rules ?? {}) as Record<string, SpectralRule>; + const ruleIds = Object.keys(rulesMap); + const isBuiltIn = loadedRuleset.rulesetSource === 'default'; + + const [workspaceOverride, globalOverride, sharedAnalysis, bundledAnalysis] = await Promise.all([ + loadWorkspaceRulesetAnalysisOverride(), + loadGlobalRulesetAnalysisOverride(), + loadSharedRulesetAnalysis(loadedRuleset.rulesetPath, options?.auth ?? null), + isBuiltIn ? loadBundledRulesetAnalysis(ruleIds) : Promise.resolve(null), + ]); + + const rules: RuleAnalysis[] = ruleIds.map((ruleId) => { + const rule = rulesMap[ruleId]; + const fingerprint = computeRuleFingerprint(ruleId, rule); + + const stage0 = lookupStage0(ruleId, fingerprint, [ + workspaceOverride?.rules, + globalOverride?.rules, + sharedAnalysis?.rules, + bundledAnalysis?.rules, + ]); + if (stage0) return stage0; + + const { riskLevel, confidenceLevel, rationale, source } = classifyRuleStages1And2(rule); + return { + ruleId, + riskLevel, + confidenceLevel, + remediationSafetyLevel: decisionMatrix(riskLevel, confidenceLevel), + assessedBy: 'automated', + staleFingerprintWarning: null, + rationale, + source, + }; + }); + + return { + rulesetSource: loadedRuleset.rulesetSource, + ...(loadedRuleset.rulesetPath !== undefined ?
{ rulesetPath: loadedRuleset.rulesetPath } : {}), + rules, + }; +} + +export type PersistRulesetAnalysisCorrectionScope = 'shared' | 'personal-workspace' | 'personal-global'; + +export interface PersistRulesetAnalysisCorrectionResult { + written: 'shared' | 'personal' | 'personal-fallback'; + sharedFileContent?: string; +} + +// Stage 4: an explicit, user-initiated write of a human-confirmed classification into one of +// the stores Stage 0 reads from. Defaults to the colocated shared file for a local, writable +// ruleset; falls back to a personal override (plus emitted shared-file content) for a remote or +// built-in ruleset location that isn't locally writable (FR-019). +export async function persistRuleAnalysisCorrection( + loadedRuleset: LoadedRuleset, + ruleId: string, + remediationSafetyLevel: RemediationSafetyLevel, + scope: PersistRulesetAnalysisCorrectionScope = 'shared' +): Promise<PersistRulesetAnalysisCorrectionResult> { + const rulesMap = (loadedRuleset.ruleset?.rules ?? {}) as Record<string, SpectralRule>; + const rule = rulesMap[ruleId]; + if (!rule) { + throw new Error(`Rule '${ruleId}' was not found in this ruleset.`); + } + const fingerprint = computeRuleFingerprint(ruleId, rule); + + const entry: PersistedRuleEntry = { + ruleId, + riskLevel: null, + confidenceLevel: 'high', + remediationSafetyLevel, + assessedBy: 'human', + staleFingerprintWarning: null, + rationale: 'user-confirmed override', + source: 'persisted', + fingerprint, + }; + + if (scope === 'personal-workspace' || scope === 'personal-global') { + const overrideScope = scope === 'personal-workspace' ? 'workspace' : 'global'; + const existing = + overrideScope === 'workspace' + ? await loadWorkspaceRulesetAnalysisOverride() + : await loadGlobalRulesetAnalysisOverride(); + await saveRulesetAnalysisOverride(overrideScope, { + scope: overrideScope, + rules: { ...(existing?.rules ??
{}), [ruleId]: entry }, + }); + return { written: 'personal' }; + } + + const rulesetPath = loadedRuleset.rulesetPath; + const isRemote = rulesetPath?.startsWith('http'); + + if (rulesetPath && !isRemote) { + const existing = await loadLocalSharedRulesetAnalysis(rulesetPath); + await saveLocalSharedRulesetAnalysis(rulesetPath, { + location: deriveSharedAnalysisLocation(rulesetPath), + rules: { ...(existing?.rules ?? {}), [ruleId]: entry }, + }); + return { written: 'shared' }; + } + + // Remote (GitHub-hosted) or built-in ruleset location — never write automatically (FR-019). + const existing = await loadWorkspaceRulesetAnalysisOverride(); + const mergedRules = { ...(existing?.rules ?? {}), [ruleId]: entry }; + await saveRulesetAnalysisOverride('workspace', { scope: 'workspace', rules: mergedRules }); + const sharedFileContent = JSON.stringify( + { location: rulesetPath ? deriveSharedAnalysisLocation(rulesetPath) : undefined, rules: mergedRules }, + null, + 2 + ); + return { written: 'personal-fallback', sharedFileContent }; +} + +export function getRemediationSafety( + diagnostic: Diagnostic, + rulesetAnalysis: RulesetAnalysis +): Pick<RemediationItem, 'riskLevel' | 'confidenceLevel' | 'remediationSafetyLevel' | 'staleFingerprintWarning'> { + const entry = rulesetAnalysis.rules.find((r) => r.ruleId === diagnostic.ruleId); + if (entry) { + return { + riskLevel: entry.riskLevel, + confidenceLevel: entry.confidenceLevel, + remediationSafetyLevel: entry.remediationSafetyLevel, + staleFingerprintWarning: entry.staleFingerprintWarning, + }; + } + return { riskLevel: 'high', confidenceLevel: 'low', remediationSafetyLevel: 'unsafe', staleFingerprintWarning: null }; +} + +const SEVERITY_LABELS: Record<number, string> = { + 0: 'error', + 1: 'warn', + 2: 'info', + 3: 'hint', +}; + +function deriveExpectedImprovement( + ruleId: string, + message: string, + lastSegment: string, + path: string[] +): string { + if (ruleId.includes('description')) { + const entity = path.length > 1 ?
path[path.length - 2] : 'item'; + return `Add a \`description\` field that explains the purpose of this ${entity}`; + } + if (ruleId.includes('summary')) { + return `Add a \`summary\` field with a brief one-line description`; + } + if (ruleId.includes('contact')) { + return `Add a \`contact\` object to the info block with name, email, or url`; + } + if (ruleId.includes('license')) { + return `Add a \`license\` object to the info block with name and url`; + } + if (ruleId.includes('example')) { + return `Add an \`example\` or \`examples\` field illustrating expected values`; + } + if (ruleId.includes('tag-description')) { + return `Add a \`description\` field to this tag explaining its purpose`; + } + return `Fix: ${message}. Add or update \`${lastSegment}\` as required`; +} + +export function buildRemediationItem( + diagnostic: Diagnostic, + specContent: string, + rulesetAnalysis: RulesetAnalysis +): RemediationItem { + const path = (diagnostic.path ?? []) as string[]; + const location = path.join('.'); + + let currentValue: string | null = null; + try { + if (path.length > 0) { + const parsed: unknown = JSON.parse(specContent); + let node: unknown = parsed; + for (const segment of path) { + if (node === null || typeof node !== 'object') { + node = undefined; + break; + } + node = (node as Record<string, unknown>)[segment]; + } + if (node !== undefined && node !== null) { + currentValue = typeof node === 'string' ? node : JSON.stringify(node); + } + } + } catch { + // JSON parse failed (e.g. YAML spec) — leave currentValue as null + } + + const lastSegment = path[path.length - 1] ?? 'field'; + const expectedImprovement = deriveExpectedImprovement(diagnostic.ruleId, diagnostic.message, lastSegment, path); + + const severityNum = typeof diagnostic.severity === 'number' ? diagnostic.severity : 1; + + const safety = getRemediationSafety(diagnostic, rulesetAnalysis); + + return { + ruleId: diagnostic.ruleId, + message: diagnostic.message, + severity: SEVERITY_LABELS[severityNum] ??
'warn', + path, + location, + currentValue, + expectedImprovement, + riskLevel: safety.riskLevel, + confidenceLevel: safety.confidenceLevel, + remediationSafetyLevel: safety.remediationSafetyLevel, + staleFingerprintWarning: safety.staleFingerprintWarning, + }; +} + +export function buildRemediationSafetyOutput( + result: GradeResult, + specContent: string, + rulesetAnalysis: RulesetAnalysis, + requestedLevel: RemediationSafetyLevel +): RemediationSafetyOutput { + const remediationItems = result.diagnostics + .map((d) => buildRemediationItem(d, specContent, rulesetAnalysis)) + .filter((item) => item.remediationSafetyLevel === requestedLevel); + + return { + specPath: result.specPath, + format: result.format, + totalViolations: result.diagnostics.length, + remediationItemCount: remediationItems.length, + remediationItems, + requestedLevel, + }; +} + +export function formatRemediationSafetyHuman( + result: GradeResult, + specContent: string, + rulesetAnalysis: RulesetAnalysis, + requestedLevel: RemediationSafetyLevel +): string { + const { remediationItems, totalViolations } = buildRemediationSafetyOutput( + result, + specContent, + rulesetAnalysis, + requestedLevel + ); + const lines: string[] = []; + + lines.push(`Remediation Safety: ${requestedLevel} (${remediationItems.length} of ${totalViolations} total violations):`); + + for (const item of remediationItems) { + lines.push(''); + const location = item.location || '(root)'; + lines.push(` ${item.severity.padEnd(5)} ${item.ruleId.padEnd(42)} ${location}`); + lines.push(` risk=${item.riskLevel ?? 
'n/a'} confidence=${item.confidenceLevel} safety=${item.remediationSafetyLevel}`); + lines.push(` ${item.message}`); + lines.push(` ${item.expectedImprovement}`); + if (item.staleFingerprintWarning) { + lines.push(` WARNING: ${item.staleFingerprintWarning.message}`); + } + } + + return lines.join('\n'); +} diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis.ts b/packages/api-grade-core/src/rulesets/bundled-analysis.ts new file mode 100644 index 0000000..91b8f1b --- /dev/null +++ b/packages/api-grade-core/src/rulesets/bundled-analysis.ts @@ -0,0 +1,31 @@ +import { readFile } from 'node:fs/promises'; +import { dirname, join } from 'node:path'; +import { fileURLToPath } from 'node:url'; +import type { BundledRulesetAnalysis } from '../types.js'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); + +let openapiCache: BundledRulesetAnalysis | null | undefined; +let asyncapiCache: BundledRulesetAnalysis | null | undefined; + +async function loadJson(fileName: string): Promise<BundledRulesetAnalysis | null> { + try { + const data = await readFile(join(__dirname, 'bundled-analysis', fileName), 'utf-8'); + return JSON.parse(data) as BundledRulesetAnalysis; + } catch { + return null; + } +} + +// The built-in ruleset's pre-calculated analysis, shipped with the package (FR-012). Detects +// OpenAPI vs. AsyncAPI by the presence of an "asyncapi"-prefixed rule id, since the built-in +// LoadedRuleset does not otherwise carry the API format back to the analyser.
+export async function loadBundledRulesetAnalysis(ruleIds: string[]): Promise<BundledRulesetAnalysis | null> { + const isAsyncApi = ruleIds.some((id) => id.startsWith('asyncapi')); + if (isAsyncApi) { + if (asyncapiCache === undefined) asyncapiCache = await loadJson('asyncapi.json'); + return asyncapiCache; + } + if (openapiCache === undefined) openapiCache = await loadJson('openapi.json'); + return openapiCache; +} diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json b/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json new file mode 100644 index 0000000..7f1a850 --- /dev/null +++ b/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json @@ -0,0 +1,169 @@ +{ + "rules": { + "asyncapi-info-contact": { + "ruleId": "asyncapi-info-contact", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "14a8d82b77003a5eee46743da7abe3078cf8a0cfcb6b4278b542c5f7137e0f9f" + }, + "asyncapi-info-description": { + "ruleId": "asyncapi-info-description", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "f29927455a85a03534943e6e54e4e109c17fe795ea43ebb516ee5a2785b4eaab" + }, + "asyncapi-info-license": { + "ruleId": "asyncapi-info-license", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "e1f25b8df806e73c25c4d0fd59eec7b3d3e19847ea7060d419d471b67b3ef0d4" + }, +
"asyncapi-operation-description": { + "ruleId": "asyncapi-operation-description", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "4afab5ea1e1eff4077756b1a08e737134a18573f32939331626000ba91ee833c" + }, + "asyncapi-3-operation-description": { + "ruleId": "asyncapi-3-operation-description", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "c5b8b035e99fdfe7a8caca6a4cd4db7a1ee538fa91a9ed70e7d02957272a7d46" + }, + "asyncapi-tag-description": { + "ruleId": "asyncapi-tag-description", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "a5404f455c18ca7a9e0b06de566c875a1dcdf90dace94ea07d9a89a7b9c3c978" + }, + "asyncapi-3-tag-description": { + "ruleId": "asyncapi-3-tag-description", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "9396925fe26ba8c8ae085204e4d48ab0ca2ffc88e677c731192528cd5b70c048" + }, + "asyncapi-parameter-description": { + "ruleId": "asyncapi-parameter-description", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": 
"seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "2c6ec4c507b68b273f5eb7c9917fba37fd61b8acc1ba4fc5df782bf3c2de8166" + }, + "asyncapi-operation-operationId": { + "ruleId": "asyncapi-operation-operationId", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "f607ca3c154f8db8839c2c0913d270d15ee3ea8aa244942f66f9b39e268da486" + }, + "asyncapi-server-not-example-com": { + "ruleId": "asyncapi-server-not-example-com", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "03af955c89ab648c2ccfad4f7276d0c41d0a7b8557906723d91b9fc77310b55d" + }, + "asyncapi-3-server-not-example-com": { + "ruleId": "asyncapi-3-server-not-example-com", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "979ffad6a6a767b3324a1c288a2fde45513e2fcab0e45f5ff93a3bc2ddb37533" + }, + "asyncapi-operation-security": { + "ruleId": "asyncapi-operation-security", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": 
"167a6e72e70912bf806a952b5b7bcc172cfc47f91c9f47e1cfbcfec11921adab" + }, + "asyncapi-3-operation-security": { + "ruleId": "asyncapi-3-operation-security", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "c4b3dffd9fbdeb5970c2f3eceab232ce2ac835d1b61e9c1c4f6d4d873cc670bc" + }, + "asyncapi-schema": { + "ruleId": "asyncapi-schema", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded unsafe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "2447324def94a87a5cd8a6ce78b8674b0dd401c06d8f6cf48a4fee7c5547a15f" + }, + "asyncapi-payload": { + "ruleId": "asyncapi-payload", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded unsafe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "58be623c3acad195a9cafe9026928924f2701753c2ce5d4c2e94105aeb93668a" + } + } +} diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json b/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json new file mode 100644 index 0000000..dcc6bbf --- /dev/null +++ b/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json @@ -0,0 +1,169 @@ +{ + "rules": { + "operation-description": { + "ruleId": "operation-description", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": 
"bundled-default", + "fingerprint": "0305ec86216b7eefa356cc733bc9d7569aec013ba372cd4c571e8583aded08ec" + }, + "info-contact": { + "ruleId": "info-contact", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "72e1580b37bf865c7918de14a114f28ea64b852e8cbae51ffad2351783bf9100" + }, + "info-description": { + "ruleId": "info-description", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "a93095e4601bcadb3ab69e44d3208cdf08f870032b30c18140fcce99905bad7a" + }, + "info-license": { + "ruleId": "info-license", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "f265560911a15d556ded4d9c321dbc40cbcae30680021591fda50ab0d1b34114" + }, + "oas3-examples-value-or-externalValue": { + "ruleId": "oas3-examples-value-or-externalValue", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "7628f3c64ad0fb1e3b5791edb30e002eda5e0629e0cc432c8bb286f5363dd0aa" + }, + "tag-description": { + "ruleId": "tag-description", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + 
"rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "69087f9af752a6b57dd7f3e0b7aeaa12a3262f2abe6e41d796c3234627947034" + }, + "operation-operationId": { + "ruleId": "operation-operationId", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "5a4634a447eeeb9f2ce5d4f01ca7c95ed5e5249c27da82bac592abcf85759b30" + }, + "operation-success-response": { + "ruleId": "operation-success-response", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "7b9badb6a9e915bc586c7fb2b77c9ef9fd2b415bf10e0ce6bd4049f0198668e8" + }, + "oas3-server-not-example.com": { + "ruleId": "oas3-server-not-example.com", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "f6b1052227e31afd8ba10018ea2a3073fd8093e18db7d0994d5d8b484a1803ea" + }, + "oas3-server-trailing-slash": { + "ruleId": "oas3-server-trailing-slash", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "f75935967e02e5c13a00c31220e4a33645b248b1078c6574384caded25877ec7" + }, + 
"oas3-operation-security-defined": { + "ruleId": "oas3-operation-security-defined", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "8b717395498dd587200620a21dd258dce0573157c683f118faa08c1b9268c5dd" + }, + "oas2-operation-security-defined": { + "ruleId": "oas2-operation-security-defined", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "0873339c33c75d41bca58448b9fe29c4e263c21d17dd97309c52d53b054c0722" + }, + "oas3-schema": { + "ruleId": "oas3-schema", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded unsafe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "3859ddd4e60b7e4907b17e90f91d676515808af7ea23440350c975c02705a19f" + }, + "oas3-valid-schema-example": { + "ruleId": "oas3-valid-schema-example", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded unsafe classification (bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "9aa73f0cdf9afd7ea70998b65471ec31a57c8052dc580a583512edad5dc915dc" + }, + "oas2-schema": { + "ruleId": "oas2-schema", + "riskLevel": null, + "confidenceLevel": "high", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "seeded unsafe classification 
(bundled default, not yet human-reviewed)", + "source": "bundled-default", + "fingerprint": "897ddabf2e24de2b765760eae0a30e6914c10375de18e5d2dc27fe307348d23c" + } + } +} diff --git a/packages/api-grade-core/src/types.ts b/packages/api-grade-core/src/types.ts index 1366ef5..283330f 100644 --- a/packages/api-grade-core/src/types.ts +++ b/packages/api-grade-core/src/types.ts @@ -110,9 +110,40 @@ export interface SessionState { sessionRulesetOverride: 'builtin' | null; } -export type ViolationClass = 'nonBreaking' | 'breaking' | 'unknown'; +export type RemediationSafetyLevel = 'safe' | 'humanreview' | 'unsafe'; -export interface QuickFix { +export type RiskLevel = 'low' | 'medium' | 'high'; + +export type ConfidenceLevel = 'high' | 'medium' | 'low'; + +export type AssessmentOrigin = 'human' | 'automated'; + +export type AnalysisSource = 'persisted' | 'bundled-default' | 'heuristic' | 'fallback'; + +export interface StaleFingerprintWarning { + storedFingerprint: string; + currentFingerprint: string; + message: string; +} + +export interface RuleAnalysis { + ruleId: string; + riskLevel: RiskLevel | null; + confidenceLevel: ConfidenceLevel; + remediationSafetyLevel: RemediationSafetyLevel; + assessedBy: AssessmentOrigin; + staleFingerprintWarning: StaleFingerprintWarning | null; + rationale: string; + source: AnalysisSource; +} + +export interface RulesetAnalysis { + rulesetSource: 'default' | 'custom'; + rulesetPath?: string; + rules: RuleAnalysis[]; +} + +export interface RemediationItem { ruleId: string; message: string; severity: string; @@ -120,6 +151,10 @@ export interface QuickFix { location: string; currentValue: string | null; expectedImprovement: string; + riskLevel: RiskLevel | null; + confidenceLevel: ConfidenceLevel; + remediationSafetyLevel: RemediationSafetyLevel; + staleFingerprintWarning: StaleFingerprintWarning | null; } export interface CommonGradeOutput { @@ -143,10 +178,29 @@ export interface AssertOutput { numericScore: number; } -export interface 
QuickFixOutput { +export interface PersistedRuleEntry extends RuleAnalysis { + fingerprint: string; +} + +export interface SharedRulesetAnalysis { + location: string; + rules: Record<string, PersistedRuleEntry>; +} + +export interface PersonalRulesetAnalysisOverride { + scope: 'workspace' | 'global'; + rules: Record<string, PersistedRuleEntry>; +} + +export interface BundledRulesetAnalysis { + rules: Record<string, PersistedRuleEntry>; +} + +export interface RemediationSafetyOutput { specPath: string; format: ApiFormat; totalViolations: number; - quickFixCount: number; - quickFixes: QuickFix[]; + remediationItemCount: number; + remediationItems: RemediationItem[]; + requestedLevel: RemediationSafetyLevel; } diff --git a/packages/api-grade-core/tests/unit/quick-fixes.test.ts b/packages/api-grade-core/tests/unit/quick-fixes.test.ts deleted file mode 100644 index 961334c..0000000 --- a/packages/api-grade-core/tests/unit/quick-fixes.test.ts +++ /dev/null @@ -1,114 +0,0 @@ -import { describe, it, expect } from 'vitest'; -import { classifyViolation, buildQuickFix, buildQuickFixOutput, formatQuickFixesHuman } from '../../src/quick-fixes.js'; -import type { Diagnostic, GradeResult } from '../../src/types.js'; - -function makeDiagnostic(overrides: Partial<Diagnostic>): Diagnostic { - return { - ruleId: 'test-rule', - message: 'test message', - severity: 'warn', - path: [], - range: { start: { line: 0, character: 0 }, end: { line: 0, character: 0 } }, - source: 'test.yaml', - ...overrides, - }; -} - -describe('classifyViolation()', () => { - it('classifies operation-description as nonBreaking (rule ID override)', () => { - const d = makeDiagnostic({ ruleId: 'operation-description', path: ['paths', '/pets', 'get'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies violation at required field as breaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['paths', '/pets', 'get', 'parameters', '0', 'required'] }); - expect(classifyViolation(d)).toBe('breaking'); - }); - - it('classifies info-contact as nonBreaking (rule ID
override)', () => { - const d = makeDiagnostic({ ruleId: 'info-contact', path: ['info'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies violation with x- extension path as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['info', 'x-logo'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies unknown path with no recognised segments as unknown', () => { - const d = makeDiagnostic({ ruleId: 'obscure-rule', path: ['components', 'securitySchemes', 'oauth2'] }); - expect(classifyViolation(d)).toBe('unknown'); - }); - - it('classifies oas3-examples-* rules as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'oas3-examples-value-or-externalValue', path: ['paths', '/pets', 'get'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies description path segment as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['info', 'description'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies type path segment as breaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['components', 'schemas', 'Pet', 'type'] }); - expect(classifyViolation(d)).toBe('breaking'); - }); -}); - -describe('buildQuickFix()', () => { - it('builds a QuickFix from a diagnostic', () => { - const d = makeDiagnostic({ ruleId: 'info-contact', message: 'Missing contact', path: ['info'], severity: 'warn' }); - const fix = buildQuickFix(d, '{}'); - expect(fix.ruleId).toBe('info-contact'); - expect(fix.location).toBe('info'); - expect(fix.expectedImprovement).toContain('contact'); - }); -}); - -const baseResult: GradeResult = { - specPath: 'test.yaml', - format: 'openapi-3', - letterGrade: 'B', - gradeLabel: 'Good', - numericScore: 85, - summary: { - tone: 'Good', - severityLevel: 'INFO', - errorCount: 0, - warnCount: 2, - infoCount: 0, - hintCount: 0, - commentary: 'Good.', - text: 'Good.', - focusRules: [], - 
recommendations: [], - }, - diagnostics: [ - makeDiagnostic({ ruleId: 'info-contact', message: 'Missing contact', path: ['info'], severity: 'warn' }), - makeDiagnostic({ ruleId: 'some-rule', message: 'Required field missing', path: ['paths', '/pets', 'get', 'required'], severity: 'error' }), - ], - rulesetSource: 'default', -}; - -describe('buildQuickFixOutput()', () => { - it('filters diagnostics to the nonBreaking subset and counts totals', () => { - const output = buildQuickFixOutput(baseResult, '{}'); - expect(output.specPath).toBe('test.yaml'); - expect(output.format).toBe('openapi-3'); - expect(output.totalViolations).toBe(2); - expect(output.quickFixCount).toBe(1); - expect(output.quickFixes).toHaveLength(1); - expect(output.quickFixes[0].ruleId).toBe('info-contact'); - }); -}); - -describe('formatQuickFixesHuman()', () => { - it('renders the filtered quick-fix list as human-readable text', () => { - const text = formatQuickFixesHuman(baseResult, '{}'); - expect(text).toContain('Quick Fixes'); - expect(text).toContain('info-contact'); - expect(text).toContain('Missing contact'); - expect(text).not.toContain('some-rule'); - }); -}); diff --git a/packages/api-grade-core/tests/unit/remediation-safety.test.ts b/packages/api-grade-core/tests/unit/remediation-safety.test.ts new file mode 100644 index 0000000..c59ce88 --- /dev/null +++ b/packages/api-grade-core/tests/unit/remediation-safety.test.ts @@ -0,0 +1,358 @@ +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import { mkdtempSync, rmSync, writeFileSync, existsSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; +import { + analyseRuleset, + getRemediationSafety, + buildRemediationItem, + buildRemediationSafetyOutput, + formatRemediationSafetyHuman, + decisionMatrix, + computeRuleFingerprint, + persistRuleAnalysisCorrection, +} from '../../src/remediation-safety.js'; +import { deriveSharedAnalysisLocation } from 
'../../src/config/shared-ruleset-analysis.js'; +import { getWorkspaceOverridePath } from '../../src/config/personal-ruleset-override.js'; +import type { Diagnostic, GradeResult } from '../../src/types.js'; +import type { LoadedRuleset } from '../../src/rulesets/loader.js'; + +function makeDiagnostic(overrides: Partial<Diagnostic>): Diagnostic { + return { + ruleId: 'test-rule', + message: 'test message', + severity: 'warn', + path: [], + range: { start: { line: 0, character: 0 }, end: { line: 0, character: 0 } }, + source: 'test.yaml', + ...overrides, + }; +} + +function makeRuleset(rules: Record, rulesetSource: 'default' | 'custom' = 'custom', rulesetPath?: string): LoadedRuleset { + return { ruleset: { rules }, rulesetSource, ...(rulesetPath !== undefined ? { rulesetPath } : {}) }; +} + +describe('decisionMatrix()', () => { + it('low risk + high/medium confidence => safe', () => { + expect(decisionMatrix('low', 'high')).toBe('safe'); + expect(decisionMatrix('low', 'medium')).toBe('safe'); + }); + it('medium risk + high confidence => humanreview', () => { + expect(decisionMatrix('medium', 'high')).toBe('humanreview'); + }); + it('high risk (any confidence) => unsafe', () => { + expect(decisionMatrix('high', 'high')).toBe('unsafe'); + expect(decisionMatrix('high', 'low')).toBe('unsafe'); + }); + it('every other combination => humanreview', () => { + expect(decisionMatrix('low', 'low')).toBe('humanreview'); + expect(decisionMatrix('medium', 'medium')).toBe('humanreview'); + expect(decisionMatrix('medium', 'low')).toBe('humanreview'); + }); +}); + +describe('analyseRuleset() — Stage 1a key-selector check', () => { + it('classifies a path-key-selector rule as unsafe/high', async () => { + const ruleset = makeRuleset({ + 'custom-naming-convention': { given: '$.paths[*]~', then: { function: 'casing' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('high'); +
expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + expect(rules[0].source).toBe('heuristic'); + }); + + it('classifies a channel-key-selector rule as unsafe/high', async () => { + const ruleset = makeRuleset({ + 'custom-channel-rename': { given: '$.channels[*]~', then: { function: 'casing' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('high'); + }); +}); + +describe('analyseRuleset() — Stage 1b function-mechanics classification', () => { + it('additive function on a safe segment => low risk => safe', async () => { + const ruleset = makeRuleset({ + 'operation-description': { given: '$.paths[*][*]', then: { field: 'description', function: 'truthy' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('low'); + expect(rules[0].remediationSafetyLevel).toBe('safe'); + }); + + it('additive function on an unsafe segment => high risk => unsafe', async () => { + const ruleset = makeRuleset({ + 'custom-required-truthy': { given: '$.paths[*][*].parameters[*].required', then: { function: 'truthy' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + }); + + it('rename function (pattern/casing) on default target => medium risk', async () => { + const ruleset = makeRuleset({ + 'custom-pattern-rule': { given: '$.components.schemas', then: { function: 'pattern' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('medium'); + }); + + it('custom (unrecognized) function => high risk, low confidence', async () => { + const ruleset = makeRuleset({ + 'my-custom-rule': { given: '$.info', then: { function: 'myCustomFn' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('low'); + 
expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + }); +}); + +describe('analyseRuleset() — Stage 1c generic segment fallback', () => { + it('unrecognized function, given matches single unsafe segment => high/medium confidence', async () => { + const ruleset = makeRuleset({ + 'custom-required-header': { + given: "$.paths[*][*].parameters[?(@.in=='header')].required", + then: { function: 'schema' }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('medium'); + expect(rules[0].source).toBe('heuristic'); + }); + + it('given matches multiple tiers => ambiguous, low confidence', async () => { + const ruleset = makeRuleset({ + 'custom-ambiguous': { + given: '$.paths[*][*].description.required', + then: { function: 'schema' }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].confidenceLevel).toBe('low'); + }); +}); + +describe('analyseRuleset() — Stage 2 whole-document fallback', () => { + it('no rule-id, function, or path signal at all => unsafe/low', async () => { + const ruleset = makeRuleset({ + 'oas3-schema': { given: '$', then: { function: 'schema' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('low'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + expect(rules[0].source).toBe('fallback'); + }); +}); + +describe('analyseRuleset() — total coverage (SC-005)', () => { + it('produces exactly one RuleAnalysis entry per rule key, no omissions', async () => { + const ruleset = makeRuleset({ + 'rule-a': { given: '$.info', then: { function: 'truthy' } }, + 'rule-b': { given: '$', then: { function: 'unknownFn' } }, + 'rule-c': { given: '$.paths[*]~', then: { function: 'casing' } }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules).toHaveLength(3); + expect(rules.map((r) => r.ruleId).sort()).toEqual(['rule-a', 
'rule-b', 'rule-c']); + }); +}); + +describe('getRemediationSafety()', () => { + it('returns the rule analysis fields verbatim for a recognized ruleId', async () => { + const ruleset = makeRuleset({ + 'operation-description': { given: '$.paths[*][*]', then: { field: 'description', function: 'truthy' } }, + }); + const rulesetAnalysis = await analyseRuleset(ruleset); + const d = makeDiagnostic({ ruleId: 'operation-description' }); + const result = getRemediationSafety(d, rulesetAnalysis); + expect(result.remediationSafetyLevel).toBe('safe'); + expect(result.riskLevel).toBe('low'); + }); + + it('FR-009: defaults to unsafe/low/high on lookup miss', async () => { + const ruleset = makeRuleset({ + 'operation-description': { given: '$.paths[*][*]', then: { field: 'description', function: 'truthy' } }, + }); + const rulesetAnalysis = await analyseRuleset(ruleset); + const d = makeDiagnostic({ ruleId: 'never-seen-rule' }); + const result = getRemediationSafety(d, rulesetAnalysis); + expect(result).toEqual({ + riskLevel: 'high', + confidenceLevel: 'low', + remediationSafetyLevel: 'unsafe', + staleFingerprintWarning: null, + }); + }); +}); + +describe('buildRemediationItem()', () => { + it('builds a RemediationItem from a diagnostic with safety fields attached', async () => { + const ruleset = makeRuleset({ + 'info-contact': { given: '$.info', then: { function: 'truthy' } }, + }); + const rulesetAnalysis = await analyseRuleset(ruleset); + const d = makeDiagnostic({ ruleId: 'info-contact', message: 'Missing contact', path: ['info'], severity: 'warn' }); + const item = buildRemediationItem(d, '{}', rulesetAnalysis); + expect(item.ruleId).toBe('info-contact'); + expect(item.location).toBe('info'); + expect(item.remediationSafetyLevel).toBe('safe'); + expect(item.riskLevel).toBe('low'); + }); +}); + +describe('buildRemediationSafetyOutput() / formatRemediationSafetyHuman()', () => { + const baseRuleset = makeRuleset({ + 'info-contact': { given: '$.info', then: { function: 
'truthy' } }, + 'custom-required-rule': { given: '$.paths[*][*].required', then: { function: 'schema' } }, + }); + + const baseResult: GradeResult = { + specPath: 'test.yaml', + format: 'openapi-3', + letterGrade: 'B', + gradeLabel: 'Good', + numericScore: 85, + summary: { + tone: 'Good', + severityLevel: 'INFO', + errorCount: 0, + warnCount: 2, + infoCount: 0, + hintCount: 0, + commentary: 'Good.', + text: 'Good.', + focusRules: [], + recommendations: [], + }, + diagnostics: [ + makeDiagnostic({ ruleId: 'info-contact', message: 'Missing contact', path: ['info'], severity: 'warn' }), + makeDiagnostic({ ruleId: 'custom-required-rule', message: 'Required field missing', path: ['paths', '/pets', 'get', 'required'], severity: 'error' }), + ], + rulesetSource: 'default', + }; + + it('filters diagnostics to the requested level and counts totals', async () => { + const rulesetAnalysis = await analyseRuleset(baseRuleset); + const output = buildRemediationSafetyOutput(baseResult, '{}', rulesetAnalysis, 'safe'); + expect(output.specPath).toBe('test.yaml'); + expect(output.format).toBe('openapi-3'); + expect(output.totalViolations).toBe(2); + expect(output.requestedLevel).toBe('safe'); + expect(output.remediationItemCount).toBe(1); + expect(output.remediationItems).toHaveLength(1); + expect(output.remediationItems[0].ruleId).toBe('info-contact'); + }); + + it('safe membership is unchanged from pre-feature classifyViolation() behavior (FR-007)', async () => { + const rulesetAnalysis = await analyseRuleset(baseRuleset); + const output = buildRemediationSafetyOutput(baseResult, '{}', rulesetAnalysis, 'safe'); + expect(output.remediationItems.map((i) => i.ruleId)).toEqual(['info-contact']); + }); + + it('renders the filtered list as human-readable text', async () => { + const rulesetAnalysis = await analyseRuleset(baseRuleset); + const text = formatRemediationSafetyHuman(baseResult, '{}', rulesetAnalysis, 'safe'); + expect(text).toContain('Remediation Safety: safe'); + 
expect(text).toContain('info-contact'); + expect(text).toContain('Missing contact'); + expect(text).not.toContain('custom-required-rule'); + }); +}); + +describe('computeRuleFingerprint()', () => { + it('is stable for the same rule definition', () => { + const rule = { given: '$.info', then: { function: 'truthy' }, severity: 1, description: 'd' }; + expect(computeRuleFingerprint('rule-a', rule)).toBe(computeRuleFingerprint('rule-a', rule)); + }); + + it('changes when the rule definition changes', () => { + const ruleA = { given: '$.info', then: { function: 'truthy' }, severity: 1, description: 'd' }; + const ruleB = { given: '$.info', then: { function: 'truthy' }, severity: 1, description: 'changed' }; + expect(computeRuleFingerprint('rule-a', ruleA)).not.toBe(computeRuleFingerprint('rule-a', ruleB)); + }); +}); + +describe('Stage 0 precedence, fingerprint staleness, and persisted corrections', () => { + let workDir: string; + let originalCwd: typeof process.cwd; + + beforeEach(() => { + workDir = mkdtempSync(join(tmpdir(), 'api-grade-stage0-')); + originalCwd = process.cwd; + process.cwd = () => workDir; + }); + + afterEach(() => { + process.cwd = originalCwd; + rmSync(workDir, { recursive: true, force: true }); + }); + + it('honors a colocated shared analysis entry over the automated heuristic', async () => { + const rulesetPath = join(workDir, 'custom.yaml'); + writeFileSync(rulesetPath, 'extends: []\n'); + const ruleset = makeRuleset( + { 'custom-rule': { given: '$', then: { function: 'schema' } } }, + 'custom', + rulesetPath + ); + + const result = await persistRuleAnalysisCorrection(ruleset, 'custom-rule', 'safe', 'shared'); + expect(result.written).toBe('shared'); + expect(existsSync(deriveSharedAnalysisLocation(rulesetPath))).toBe(true); + + const analysis = await analyseRuleset(ruleset); + const entry = analysis.rules.find((r) => r.ruleId === 'custom-rule'); + expect(entry?.remediationSafetyLevel).toBe('safe'); + expect(entry?.assessedBy).toBe('human'); + 
expect(entry?.source).toBe('persisted'); + expect(entry?.staleFingerprintWarning).toBeNull(); + }); + + it('flags a human-assessed entry whose fingerprint no longer matches, but still honors it', async () => { + const rulesetPath = join(workDir, 'custom.yaml'); + writeFileSync(rulesetPath, 'extends: []\n'); + const originalRule = { given: '$', then: { function: 'schema' }, description: 'original' }; + const ruleset = makeRuleset({ 'custom-rule': originalRule }, 'custom', rulesetPath); + + await persistRuleAnalysisCorrection(ruleset, 'custom-rule', 'safe', 'shared'); + + const editedRule = { given: '$', then: { function: 'schema' }, description: 'edited since review' }; + const editedRuleset = makeRuleset({ 'custom-rule': editedRule }, 'custom', rulesetPath); + + const analysis = await analyseRuleset(editedRuleset); + const entry = analysis.rules.find((r) => r.ruleId === 'custom-rule'); + expect(entry?.remediationSafetyLevel).toBe('safe'); + expect(entry?.assessedBy).toBe('human'); + expect(entry?.staleFingerprintWarning).not.toBeNull(); + expect(entry?.staleFingerprintWarning?.storedFingerprint).not.toBe(entry?.staleFingerprintWarning?.currentFingerprint); + }); + + it('personal workspace override takes precedence over the shared colocated analysis', async () => { + const rulesetPath = join(workDir, 'custom.yaml'); + writeFileSync(rulesetPath, 'extends: []\n'); + const ruleset = makeRuleset({ 'custom-rule': { given: '$', then: { function: 'schema' } } }, 'custom', rulesetPath); + + await persistRuleAnalysisCorrection(ruleset, 'custom-rule', 'humanreview', 'shared'); + await persistRuleAnalysisCorrection(ruleset, 'custom-rule', 'safe', 'personal-workspace'); + expect(existsSync(getWorkspaceOverridePath())).toBe(true); + + const analysis = await analyseRuleset(ruleset); + const entry = analysis.rules.find((r) => r.ruleId === 'custom-rule'); + expect(entry?.remediationSafetyLevel).toBe('safe'); + }); + + it('falls back to a personal-override write for a non-writable 
(built-in) ruleset location', async () => { + const ruleset = makeRuleset({ 'operation-description': { given: '$.info', then: { function: 'truthy' } } }, 'default'); + const result = await persistRuleAnalysisCorrection(ruleset, 'operation-description', 'unsafe', 'shared'); + expect(result.written).toBe('personal-fallback'); + expect(result.sharedFileContent).toBeDefined(); + expect(existsSync(getWorkspaceOverridePath())).toBe(true); + }); +}); diff --git a/packages/api-grade-mcp/README.md b/packages/api-grade-mcp/README.md index 588dfb9..bb35be7 100644 --- a/packages/api-grade-mcp/README.md +++ b/packages/api-grade-mcp/README.md @@ -1,6 +1,6 @@ # @dawmatt/api-grade-mcp -MCP (Model Context Protocol) server that exposes api-grade capabilities as six AI tools — grade OpenAPI and AsyncAPI specifications directly from Claude Code, GitHub Copilot, or any MCP-compatible AI host. +MCP (Model Context Protocol) server that exposes api-grade capabilities as seven AI tools — grade OpenAPI and AsyncAPI specifications directly from Claude Code, GitHub Copilot, or any MCP-compatible AI host. 
## Installation @@ -60,7 +60,8 @@ Create `.vscode/mcp.json` in your project root: | `grade-api` | Letter grade, score, and summary — token-efficient overview | | `grade-api-detailed` | Full grade with all violations and diagnostics | | `assert-api-grade` | Pass/fail assertion for a minimum grade threshold | -| `grade-api-remediation-safety` | Classified list of diagnostics filtered by remediation safety level (`safe`: non-breaking improvements) for AI-assisted correction | +| `grade-api-remediation-safety` | Classified list of diagnostics filtered by remediation safety level (`safe`, `humanreview`, or `unsafe`), each with a risk/confidence indicator, for AI-assisted correction | +| `analyse-ruleset-safety` | Per-rule risk, confidence, and remediation-safety analysis for a ruleset, independent of grading any spec | | `set-ruleset-config` | Set the default Spectral ruleset at session, workspace, or global scope | | `get-ruleset-config` | Get the active Spectral ruleset and which scope is effective | diff --git a/packages/api-grade-mcp/src/server.ts b/packages/api-grade-mcp/src/server.ts index 95721bd..4b7f612 100644 --- a/packages/api-grade-mcp/src/server.ts +++ b/packages/api-grade-mcp/src/server.ts @@ -5,9 +5,10 @@ import { resolve, dirname } from 'node:path'; import { registerGradeTool } from './tools/grade.js'; import { registerAssertGradeTool } from './tools/assert-grade.js'; import { registerGradeDetailedTool } from './tools/grade-detailed.js'; -import { registerQuickFixesOnlyTool } from './tools/quick-fixes-only.js'; +import { registerRemediationSafetyTool } from './tools/remediation-safety.js'; import { registerSetRulesetConfigTool } from './tools/set-ruleset-config.js'; import { registerGetRulesetConfigTool } from './tools/get-ruleset-config.js'; +import { registerAnalyseRulesetSafetyTool } from './tools/analyse-ruleset-safety.js'; import type { SessionState } from './types.js'; function getVersion(): string { @@ -27,8 +28,9 @@ export function 
createServer(): McpServer { registerGradeTool(server, sessionState); registerAssertGradeTool(server, sessionState); registerGradeDetailedTool(server, sessionState); - registerQuickFixesOnlyTool(server, sessionState); + registerRemediationSafetyTool(server, sessionState); registerSetRulesetConfigTool(server, sessionState); registerGetRulesetConfigTool(server, sessionState); + registerAnalyseRulesetSafetyTool(server, sessionState); return server; } diff --git a/packages/api-grade-mcp/src/tools/analyse-ruleset-safety.ts b/packages/api-grade-mcp/src/tools/analyse-ruleset-safety.ts new file mode 100644 index 0000000..9428219 --- /dev/null +++ b/packages/api-grade-mcp/src/tools/analyse-ruleset-safety.ts @@ -0,0 +1,106 @@ +import { statSync, writeFileSync, unlinkSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; +import { z } from 'zod'; +import { + loadWorkspaceConfig, + loadGlobalConfig, + resolveRuleset, + fetchRulesetContent, + RulesetAuthError, + INITIAL_FETCH_TIMEOUT_MS, + RETRY_FETCH_TIMEOUT_MS, + analyseRuleset, + loadRuleset, +} from '@dawmatt/api-grade-core'; +import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'; +import { mcpError, buildRulesetFetchFailureResponse, describeFetchFailureReason, ERROR_CODES } from '../utils/errors.js'; +import type { SessionState } from '@dawmatt/api-grade-core'; + +export function registerAnalyseRulesetSafetyTool(server: McpServer, sessionState: SessionState): void { + server.tool( + 'analyse-ruleset-safety', + "Inspect a Spectral ruleset's per-rule remediation-safety analysis (riskLevel, confidenceLevel, remediationSafetyLevel, assessedBy, rationale) without grading any specific API specification. 
Use this to understand how risky it would be to auto-remediate violations of each rule in a ruleset before running grade-api-remediation-safety against a real spec.", + { + rulesetPath: z + .string() + .optional() + .describe('Optional path to a custom Spectral-compatible ruleset file; omit to analyse the configured default or built-in ruleset'), + recoveryOption: z + .enum(['retry', 'use-builtin-once', 'use-builtin-session', 'cancel']) + .optional() + .describe( + 'Recovery action when the configured default ruleset is inaccessible. Only supply in response to a RULESET_AUTH_FAILED response. On receiving that response, present its recoveryOptions to the user verbatim and wait for their explicit choice before setting this field — do not select use-builtin-once or use-builtin-session on the user’s behalf.' + ), + }, + async ({ rulesetPath, recoveryOption }) => { + if (recoveryOption === 'cancel') { + return mcpError(ERROR_CODES.REQUEST_CANCELLED, 'Ruleset analysis cancelled by user.', {}); + } + + if (recoveryOption === 'use-builtin-session') { + sessionState.sessionRulesetOverride = 'builtin'; + } + + const workspaceConfig = await loadWorkspaceConfig(); + const globalConfig = await loadGlobalConfig(); + const resolved = resolveRuleset(rulesetPath, sessionState, workspaceConfig, globalConfig); + + let effectiveRulesetPath: string | undefined = resolved.rulesetPath ?? undefined; + let tempRulesetFile: string | undefined; + + if (resolved.rulesetPath?.startsWith('http')) { + if (recoveryOption === 'use-builtin-once') { + effectiveRulesetPath = undefined; + } else { + const timeoutMs = recoveryOption === 'retry' ? RETRY_FETCH_TIMEOUT_MS : INITIAL_FETCH_TIMEOUT_MS; + try { + let content: string; + if (resolved.auth?.type === 'github-pat') { + const token = resolved.auth.githubToken ?? process.env.GITHUB_TOKEN ?? 
''; + content = await fetchRulesetContent(resolved.rulesetPath, token || undefined, timeoutMs); + } else { + content = await fetchRulesetContent(resolved.rulesetPath, undefined, timeoutMs); + } + tempRulesetFile = join(tmpdir(), `api-grade-ruleset-${Date.now()}.yaml`); + writeFileSync(tempRulesetFile, content); + effectiveRulesetPath = tempRulesetFile; + } catch (err) { + const reason = err instanceof RulesetAuthError ? err.reason : 'network-unreachable'; + return buildRulesetFetchFailureResponse( + reason, + resolved.rulesetPath, + resolved.scope, + `Could not fetch ruleset from '${resolved.rulesetPath}' (${resolved.scope} default): ${describeFetchFailureReason(reason)}.` + ); + } + } + } else if (effectiveRulesetPath) { + try { + statSync(effectiveRulesetPath); + } catch { + return mcpError( + ERROR_CODES.RULESET_NOT_FOUND, + `The ruleset file '${effectiveRulesetPath}' does not exist. Check the path and try again.`, + { rulesetPath: effectiveRulesetPath } + ); + } + } + + try { + const loadedRuleset = await loadRuleset('openapi-3', effectiveRulesetPath); + const analysis = await analyseRuleset(loadedRuleset); + return { content: [{ type: 'text', text: JSON.stringify(analysis) }] }; + } catch (err) { + const message = err instanceof Error ? 
err.message : String(err); + return mcpError( + ERROR_CODES.GRADE_ENGINE_ERROR, + `Ruleset analysis error: ${message}`, + { rulesetPath: effectiveRulesetPath } + ); + } finally { + if (tempRulesetFile) try { unlinkSync(tempRulesetFile); } catch { /* ignore */ } + } + } + ); +} diff --git a/packages/api-grade-mcp/src/tools/remediation-safety.ts b/packages/api-grade-mcp/src/tools/remediation-safety.ts index c27f45f..9166175 100644 --- a/packages/api-grade-mcp/src/tools/remediation-safety.ts +++ b/packages/api-grade-mcp/src/tools/remediation-safety.ts @@ -11,18 +11,21 @@ import { RulesetAuthError, INITIAL_FETCH_TIMEOUT_MS, RETRY_FETCH_TIMEOUT_MS, - buildQuickFixOutput, + analyseRuleset, + buildRemediationSafetyOutput, + loadRuleset, } from '@dawmatt/api-grade-core'; +import type { RemediationSafetyLevel } from '@dawmatt/api-grade-core'; import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'; import { mcpError, buildRulesetFetchFailureResponse, describeFetchFailureReason, ERROR_CODES } from '../utils/errors.js'; import type { SessionState } from '@dawmatt/api-grade-core'; const LARGE_SPEC_THRESHOLD_BYTES = 500_000; -export function registerQuickFixesOnlyTool(server: McpServer, sessionState: SessionState): void { +export function registerRemediationSafetyTool(server: McpServer, sessionState: SessionState): void { server.tool( 'grade-api-remediation-safety', - 'Return a classified, AI-actionable list of diagnostics filtered by remediation safety level. The `safe` level covers improvements that can be made via non-breaking changes — those that do not alter the API interface contract (paths, methods, required parameters, schema types, or response structures). 
Use this tool (not grade-api-detailed) when the goal is for the AI to safely resolve violations; the AI generates the corrected specification content and the MCP server does not modify files.', + 'Return a classified, AI-actionable list of diagnostics filtered by remediation safety level: `safe` (non-breaking, safe to auto-apply), `humanreview` (typically additive/clarifying but should be confirmed by a human before applying at scale), or `unsafe` (could change request/response validation, required fields, types, or the parameter surface — requires human or explicitly-confirmed-agent review). Each returned item also carries a confidence indicator (riskLevel/confidenceLevel) explaining how sure the analyser is in its classification. Use this tool (not grade-api-detailed) when the goal is for the AI to safely resolve violations; the AI generates the corrected specification content and the MCP server does not modify files.', { specPath: z .string() @@ -30,8 +33,8 @@ export function registerQuickFixesOnlyTool(server: McpServer, sessionState: Sess 'Absolute or relative path to the OpenAPI or AsyncAPI specification file (YAML or JSON)' ), level: z - .enum(['safe']) - .describe('Remediation safety level to filter diagnostics by. Only "safe" is supported today.'), + .enum(['safe', 'humanreview', 'unsafe']) + .describe('Remediation safety level to filter diagnostics by.'), rulesetPath: z .string() .optional() @@ -43,7 +46,7 @@ export function registerQuickFixesOnlyTool(server: McpServer, sessionState: Sess 'Recovery action when the configured default ruleset is inaccessible. Only supply in response to a RULESET_AUTH_FAILED response. On receiving that response, present its recoveryOptions to the user verbatim and wait for their explicit choice before setting this field — do not select use-builtin-once or use-builtin-session on the user’s behalf.' 
), }, - async ({ specPath, rulesetPath, recoveryOption }) => { + async ({ specPath, level, rulesetPath, recoveryOption }) => { if (recoveryOption === 'cancel') { return mcpError(ERROR_CODES.REQUEST_CANCELLED, 'Grading request cancelled by user.', { specPath }); } @@ -118,8 +121,12 @@ export function registerQuickFixesOnlyTool(server: McpServer, sessionState: Sess try { const engine = new GradeEngine(); const result = await engine.grade({ specPath, rulesetPath: effectiveRulesetPath }); + const loadedRuleset = await loadRuleset(result.format, effectiveRulesetPath); + const rulesetAnalysis = await analyseRuleset(loadedRuleset); - const response: Record = { ...buildQuickFixOutput(result, specContent) }; + const response: Record = { + ...buildRemediationSafetyOutput(result, specContent, rulesetAnalysis, level as RemediationSafetyLevel), + }; if (largeSpecWarning) { response.largeSpecWarning = largeSpecWarning; diff --git a/packages/api-grade-mcp/src/utils/classify.ts b/packages/api-grade-mcp/src/utils/classify.ts index 9337817..7bed85f 100644 --- a/packages/api-grade-mcp/src/utils/classify.ts +++ b/packages/api-grade-mcp/src/utils/classify.ts @@ -1,2 +1,2 @@ -export { classifyViolation, buildQuickFix } from '@dawmatt/api-grade-core'; -export type { QuickFix, ViolationClass } from '@dawmatt/api-grade-core'; +export { analyseRuleset, getRemediationSafety } from '@dawmatt/api-grade-core'; +export type { RuleAnalysis, RemediationSafetyLevel, RiskLevel, ConfidenceLevel } from '@dawmatt/api-grade-core'; diff --git a/packages/api-grade-mcp/tests/integration/analyse-ruleset-safety.test.ts b/packages/api-grade-mcp/tests/integration/analyse-ruleset-safety.test.ts new file mode 100644 index 0000000..c383430 --- /dev/null +++ b/packages/api-grade-mcp/tests/integration/analyse-ruleset-safety.test.ts @@ -0,0 +1,38 @@ +import { describe, it, expect } from 'vitest'; +import { createServer } from '../../src/server.js'; + +type ToolRegistry = Record, extra: unknown) => Promise }>; + 
+async function callTool(server: ReturnType<typeof createServer>, toolName: string, args: Record) { + const tools = (server as unknown as { _registeredTools: ToolRegistry })._registeredTools; + const tool = tools[toolName]; + if (!tool) throw new Error(`${toolName} tool not registered`); + return tool.handler(args, {}) as Promise<{ content: [{ type: string; text: string }]; isError?: boolean }>; +} + +describe('analyse-ruleset-safety tool', () => { + it('returns a RulesetAnalysis document for the built-in ruleset', async () => { + const server = createServer(); + const result = await callTool(server, 'analyse-ruleset-safety', {}); + expect(result.isError).toBeFalsy(); + const body = JSON.parse(result.content[0].text); + expect(body).toHaveProperty('rulesetSource', 'default'); + expect(Array.isArray(body.rules)).toBe(true); + expect(body.rules.length).toBeGreaterThan(0); + for (const rule of body.rules) { + expect(rule).toHaveProperty('ruleId'); + expect(rule).toHaveProperty('confidenceLevel'); + expect(rule).toHaveProperty('remediationSafetyLevel'); + expect(rule).toHaveProperty('assessedBy'); + expect(rule).toHaveProperty('rationale'); + } + }); + + it('returns RULESET_NOT_FOUND for a non-existent custom ruleset', async () => { + const server = createServer(); + const result = await callTool(server, 'analyse-ruleset-safety', { rulesetPath: '/nonexistent/ruleset.yaml' }); + expect(result.isError).toBe(true); + const body = JSON.parse(result.content[0].text); + expect(body.error).toBe('RULESET_NOT_FOUND'); + }); +}); diff --git a/packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts b/packages/api-grade-mcp/tests/integration/remediation-safety.test.ts similarity index 77% rename from packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts rename to packages/api-grade-mcp/tests/integration/remediation-safety.test.ts index 2c8f7c7..2e16563 100644 --- a/packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts +++
b/packages/api-grade-mcp/tests/integration/remediation-safety.test.ts @@ -18,23 +18,24 @@ async function callTool(server: ReturnType<typeof createServer>, toolName: strin } describe('grade-api-remediation-safety tool', () => { - it('returns non-empty quickFixes for a spec with documentation gaps (quick fix opportunities)', async () => { + it.each(['safe', 'humanreview', 'unsafe'])('returns the RemediationSafetyOutput shape for level=%s', async (level) => { const server = createServer(); - const result = await callTool(server, 'grade-api-remediation-safety', { specPath: OPENAPI_POOR, level: 'safe' }); + const result = await callTool(server, 'grade-api-remediation-safety', { specPath: OPENAPI_POOR, level }); expect(result.isError).toBeFalsy(); const body = JSON.parse(result.content[0].text); - expect(body).toHaveProperty('quickFixes'); - expect(body).toHaveProperty('quickFixCount'); + expect(body).toHaveProperty('remediationItems'); + expect(body).toHaveProperty('remediationItemCount'); expect(body).toHaveProperty('totalViolations'); + expect(body).toHaveProperty('requestedLevel', level); }); - it('each violation has all required fields', async () => { + it('each remediation item has all required fields', async () => { const server = createServer(); const result = await callTool(server, 'grade-api-remediation-safety', { specPath: OPENAPI_POOR, level: 'safe' }); expect(result.isError).toBeFalsy(); const body = JSON.parse(result.content[0].text); - if (body.quickFixes.length > 0) { - const v = body.quickFixes[0]; + if (body.remediationItems.length > 0) { + const v = body.remediationItems[0]; expect(v).toHaveProperty('ruleId'); expect(v).toHaveProperty('message'); expect(v).toHaveProperty('severity'); @@ -42,29 +43,33 @@ describe('grade-api-remediation-safety tool', () => { expect(v).toHaveProperty('location'); expect(v).toHaveProperty('currentValue'); 
expect(v).toHaveProperty('expectedImprovement'); + expect(v).toHaveProperty('riskLevel'); + expect(v).toHaveProperty('confidenceLevel'); +
expect(v).toHaveProperty('remediationSafetyLevel', 'safe'); + expect(v).toHaveProperty('staleFingerprintWarning', null); expect(typeof v.expectedImprovement).toBe('string'); expect(v.expectedImprovement.length).toBeGreaterThan(0); } }); - it('no violation in quickFixes is a breaking change', async () => { + it('no violation in the safe level is a breaking change', async () => { const server = createServer(); const result = await callTool(server, 'grade-api-remediation-safety', { specPath: OPENAPI_POOR, level: 'safe' }); expect(result.isError).toBeFalsy(); const body = JSON.parse(result.content[0].text); - for (const v of body.quickFixes) { + for (const v of body.remediationItems) { expect(v.path).not.toContain('required'); expect(v.path).not.toContain('type'); } }); - it('quickFixCount matches quickFixes length', async () => { + it('remediationItemCount matches remediationItems length', async () => { const server = createServer(); const result = await callTool(server, 'grade-api-remediation-safety', { specPath: OPENAPI_MUSEUM, level: 'safe' }); expect(result.isError).toBeFalsy(); const body = JSON.parse(result.content[0].text); - expect(typeof body.quickFixCount).toBe('number'); - expect(body.quickFixCount).toBe(body.quickFixes.length); + expect(typeof body.remediationItemCount).toBe('number'); + expect(body.remediationItemCount).toBe(body.remediationItems.length); }); it('returns RULESET_NOT_FOUND for non-existent local ruleset', async () => { @@ -91,6 +96,6 @@ describe('grade-api-remediation-safety tool', () => { const server = createServer(); const tools = (server as unknown as { _registeredTools: ToolRegistry })._registeredTools; const tool = tools['grade-api-remediation-safety'] as unknown as { inputSchema: { parse: (v: unknown) => unknown } }; - expect(() => tool.inputSchema.parse({ specPath: OPENAPI_POOR, level: 'unsafe' })).toThrow(); + expect(() => tool.inputSchema.parse({ specPath: OPENAPI_POOR, level: 'breaking' })).toThrow(); }); }); diff --git 
a/packages/api-grade-mcp/tests/unit/classify.test.ts b/packages/api-grade-mcp/tests/unit/classify.test.ts index 7a39ea5..49de743 100644 --- a/packages/api-grade-mcp/tests/unit/classify.test.ts +++ b/packages/api-grade-mcp/tests/unit/classify.test.ts @@ -1,5 +1,5 @@ import { describe, it, expect } from 'vitest'; -import { classifyViolation } from '../../src/utils/classify.js'; +import { analyseRuleset, getRemediationSafety } from '../../src/utils/classify.js'; import type { Diagnostic } from '@dawmatt/api-grade-core'; function makeDiagnostic(overrides: Partial<Diagnostic>): Diagnostic { @@ -14,44 +14,40 @@ function makeDiagnostic(overrides: Partial<Diagnostic>): Diagnostic { }; } -describe('classifyViolation()', () => { - it('classifies operation-description as nonBreaking (rule ID override)', () => { +describe('classify.ts re-exports', () => { + it('analyseRuleset() classifies a safe rule', async () => { + const loadedRuleset = { + ruleset: { + rules: { + 'operation-description': { given: '$.paths[*][*]', then: { field: 'description', function: 'truthy' } }, + }, + }, + rulesetSource: 'custom' as const, + }; + const analysis = await analyseRuleset(loadedRuleset); + expect(analysis.rules[0].remediationSafetyLevel).toBe('safe'); + }); + + it('getRemediationSafety() looks up a violation against a RulesetAnalysis', async () => { + const loadedRuleset = { + ruleset: { + rules: { + 'operation-description': { given: '$.paths[*][*]', then: { field: 'description', function: 'truthy' } }, + }, + }, + rulesetSource: 'custom' as const, + }; + const analysis = await analyseRuleset(loadedRuleset); const d = makeDiagnostic({ ruleId: 'operation-description', path: ['paths', '/pets', 'get'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); + const result = getRemediationSafety(d, analysis); + expect(result.remediationSafetyLevel).toBe('safe'); }); - it('classifies violation at required field as breaking', () => { - const d = makeDiagnostic({ 
ruleId: 'some-rule', path: ['paths', '/pets', 'get',
'parameters', '0', 'required'] }); - expect(classifyViolation(d)).toBe('breaking'); - }); - - it('classifies info-contact as nonBreaking (rule ID override)', () => { - const d = makeDiagnostic({ ruleId: 'info-contact', path: ['info'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies violation with x- extension path as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['info', 'x-logo'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies unknown path with no recognised segments as unknown', () => { - const d = makeDiagnostic({ ruleId: 'obscure-rule', path: ['components', 'securitySchemes', 'oauth2'] }); - expect(classifyViolation(d)).toBe('unknown'); - }); - - it('classifies oas3-examples-* rules as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'oas3-examples-value-or-externalValue', path: ['paths', '/pets', 'get'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies description path segment as nonBreaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['info', 'description'] }); - expect(classifyViolation(d)).toBe('nonBreaking'); - }); - - it('classifies type path segment as breaking', () => { - const d = makeDiagnostic({ ruleId: 'some-rule', path: ['components', 'schemas', 'Pet', 'type'] }); - expect(classifyViolation(d)).toBe('breaking'); + it('getRemediationSafety() defaults to unsafe/low on lookup miss (FR-009)', () => { + const analysis = { rulesetSource: 'custom' as const, rules: [] }; + const d = makeDiagnostic({ ruleId: 'never-seen' }); + const result = getRemediationSafety(d, analysis); + expect(result.remediationSafetyLevel).toBe('unsafe'); + expect(result.confidenceLevel).toBe('low'); }); }); diff --git a/specs/012-remediation-safety/quickstart.md b/specs/012-remediation-safety/quickstart.md index 2465076..3faf42a 100644 --- a/specs/012-remediation-safety/quickstart.md +++ 
b/specs/012-remediation-safety/quickstart.md @@ -35,15 +35,15 @@ Each returned item now includes `riskLevel`, `confidenceLevel`, `remediationSafe ```bash api-grade ruleset-analysis --format human # rule id risk level confidence remediation safety assessed by rationale -# operation-description low high safe human maintainer-confirmed safe classification (bundled) -# operation-operationId medium high humanreview human maintainer-confirmed humanreview classification (bundled) +# operation-description n/a high safe automated seeded safe classification (bundled default, not yet human-reviewed) +# operation-operationId n/a high humanreview automated seeded humanreview classification (bundled default, not yet human-reviewed) # oas3-schema high low unsafe automated no recognizable rule-id, function, or path signal # custom-team-rule-007 low high safe human WARNING: fingerprint mismatch (stored a1b2c3..., current d4e5f6...) — rule changed since this was last reviewed; persisted classification still honored api-grade ruleset-analysis --ruleset-path ./my-ruleset.yaml --format json ``` -There is no separate hard-coded table backing the `human`-assessed rows above for the built-in ruleset — they are ordinary bundled persisted entries (FR-012/FR-020), the same mechanism a user would use to persist a correction for their own ruleset (FR-013). The last row illustrates FR-021: a human-assessed entry whose rule definition has since changed is still honored, but flagged with both the stored and current fingerprint rather than silently discarded. +The built-in ruleset's bundled entries (FR-012/FR-020) are seeded defaults — `assessedBy: "automated"` — not a maintainer's reviewed judgement; they exist so the baked-in ruleset has *some* classification before anyone has reviewed it, not as a substitute for review. 
A maintainer who actually reviews a rule and runs `ruleset-analysis correct` on it (the same mechanism a user would use to persist a correction for their own ruleset, FR-013) produces a genuine `assessedBy: "human"` entry. The last row illustrates FR-021: a human-assessed entry whose rule definition has since changed is still honored, but flagged with both the stored and current fingerprint rather than silently discarded. ## 3. MCP: same filtering, plus a dedicated ruleset-analysis tool diff --git a/specs/012-remediation-safety/tasks.md b/specs/012-remediation-safety/tasks.md index 6322d74..ffeb8e8 100644 --- a/specs/012-remediation-safety/tasks.md +++ b/specs/012-remediation-safety/tasks.md @@ -33,12 +33,12 @@ No new project scaffolding is required — this feature extends the existing `pa **⚠️ CRITICAL**: No user story work can begin until this phase is complete. -- [ ] T001 Replace `ViolationClass`/`QuickFix`/`QuickFixOutput` with the new type set in `packages/api-grade-core/src/types.ts`: add `RemediationSafetyLevel` (`"safe"|"humanreview"|"unsafe"`), `RiskLevel` (`"low"|"medium"|"high"`), `ConfidenceLevel` (`"high"|"medium"|"low"`), `AssessmentOrigin` (`"human"|"automated"`), `AnalysisSource` (`"persisted"|"bundled-default"|"heuristic"|"fallback"`), `RuleAnalysis`, `RulesetAnalysis`, `RemediationItem` (was `QuickFix`, with new `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` fields), `RemediationSafetyOutput` (was `QuickFixOutput`, with `remediationItemCount`/`remediationItems`/`requestedLevel`) — per data-model.md -- [ ] T002 [P] Write failing unit tests for `analyseRuleset()` Stage 1/2 heuristics in `packages/api-grade-core/tests/unit/remediation-safety.test.ts` (new file, replaces `quick-fixes.test.ts`): key-selector check (1a), additive/rename/custom function-mechanics classification (1b), generic segment fallback (1c), Stage 2 whole-document fallback, and the decision matrix table from research.md §3 — depends on T001 -- [ ] 
T003 Implement `analyseRuleset(loadedRuleset: LoadedRuleset): RulesetAnalysis` in `packages/api-grade-core/src/remediation-safety.ts` (new file) implementing Stages 1–2 of `specs/algorithms/automated_remediation_safety_algorithm_spec.md` (key-selector check, function-mechanics classification with extended AsyncAPI segment tiers, generic segment fallback, whole-document fallback, and the shared decision matrix) so T002 passes — depends on T002 -- [ ] T004 Implement `getRemediationSafety(diagnostic: Diagnostic, rulesetAnalysis: RulesetAnalysis): { riskLevel, confidenceLevel, remediationSafetyLevel, staleFingerprintWarning }` in `packages/api-grade-core/src/remediation-safety.ts`, including the FR-009 lookup-miss default (`riskLevel: "high"`, `confidenceLevel: "low"`, `remediationSafetyLevel: "unsafe"`) — depends on T003 -- [ ] T005 Implement `buildRemediationItem()`, `buildRemediationSafetyOutput()`, `formatRemediationSafetyHuman()` in `packages/api-grade-core/src/remediation-safety.ts`, replacing `buildQuickFix()`/`buildQuickFixOutput()`/`formatQuickFixesHuman()` — filters by `remediationSafetyLevel` against a requested level, preserving FR-007 (`safe` membership unchanged) — depends on T004 -- [ ] T006 Delete `packages/api-grade-core/src/quick-fixes.ts` and `packages/api-grade-core/tests/unit/quick-fixes.test.ts`; update `packages/api-grade-core/src/index.ts` to remove the `quick-fixes.js` export line and the `QuickFix`/`ViolationClass`/`QuickFixOutput` type exports, replacing them with `analyseRuleset`, `getRemediationSafety`, `buildRemediationItem`, `buildRemediationSafetyOutput`, `formatRemediationSafetyHuman` and the new types from T001 — depends on T005 +- [X] T001 Replace `ViolationClass`/`QuickFix`/`QuickFixOutput` with the new type set in `packages/api-grade-core/src/types.ts`: add `RemediationSafetyLevel` (`"safe"|"humanreview"|"unsafe"`), `RiskLevel` (`"low"|"medium"|"high"`), `ConfidenceLevel` (`"high"|"medium"|"low"`), `AssessmentOrigin` 
(`"human"|"automated"`), `AnalysisSource` (`"persisted"|"bundled-default"|"heuristic"|"fallback"`), `RuleAnalysis`, `RulesetAnalysis`, `RemediationItem` (was `QuickFix`, with new `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` fields), `RemediationSafetyOutput` (was `QuickFixOutput`, with `remediationItemCount`/`remediationItems`/`requestedLevel`) — per data-model.md +- [X] T002 [P] Write failing unit tests for `analyseRuleset()` Stage 1/2 heuristics in `packages/api-grade-core/tests/unit/remediation-safety.test.ts` (new file, replaces `quick-fixes.test.ts`): key-selector check (1a), additive/rename/custom function-mechanics classification (1b), generic segment fallback (1c), Stage 2 whole-document fallback, and the decision matrix table from research.md §3 — depends on T001 +- [X] T003 Implement `analyseRuleset(loadedRuleset: LoadedRuleset): RulesetAnalysis` in `packages/api-grade-core/src/remediation-safety.ts` (new file) implementing Stages 1–2 of `specs/algorithms/automated_remediation_safety_algorithm_spec.md` (key-selector check, function-mechanics classification with extended AsyncAPI segment tiers, generic segment fallback, whole-document fallback, and the shared decision matrix) so T002 passes — depends on T002 +- [X] T004 Implement `getRemediationSafety(diagnostic: Diagnostic, rulesetAnalysis: RulesetAnalysis): { riskLevel, confidenceLevel, remediationSafetyLevel, staleFingerprintWarning }` in `packages/api-grade-core/src/remediation-safety.ts`, including the FR-009 lookup-miss default (`riskLevel: "high"`, `confidenceLevel: "low"`, `remediationSafetyLevel: "unsafe"`) — depends on T003 +- [X] T005 Implement `buildRemediationItem()`, `buildRemediationSafetyOutput()`, `formatRemediationSafetyHuman()` in `packages/api-grade-core/src/remediation-safety.ts`, replacing `buildQuickFix()`/`buildQuickFixOutput()`/`formatQuickFixesHuman()` — filters by `remediationSafetyLevel` against a requested level, preserving FR-007 (`safe` 
membership unchanged) — depends on T004 +- [X] T006 Delete `packages/api-grade-core/src/quick-fixes.ts` and `packages/api-grade-core/tests/unit/quick-fixes.test.ts`; update `packages/api-grade-core/src/index.ts` to remove the `quick-fixes.js` export line and the `QuickFix`/`ViolationClass`/`QuickFixOutput` type exports, replacing them with `analyseRuleset`, `getRemediationSafety`, `buildRemediationItem`, `buildRemediationSafetyOutput`, `formatRemediationSafetyHuman` and the new types from T001 — depends on T005 **Checkpoint**: `analyseRuleset()`/`getRemediationSafety()` exist, are unit-tested, and are exported from `@dawmatt/api-grade-core`. User story work can now begin. @@ -52,15 +52,15 @@ No new project scaffolding is required — this feature extends the existing `pa ### Tests for User Story 1 -- [ ] T007 [P] [US1] Integration test for the extended `--remediation-safety` CLI flag (accepts `safe`/`humanreview`/`unsafe`, rejects other values with the 3-value error message, `safe` membership unchanged) in `tests/integration/cli-remediation-safety.test.ts` (new file, replaces `tests/integration/cli-quick-fixes.test.ts`) -- [ ] T008 [P] [US1] Integration test for the `grade-api-remediation-safety` MCP tool's extended `level` enum and `RemediationSafetyOutput` response shape (`remediationItemCount`, `remediationItems`, `requestedLevel`, per-item `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning`) in `packages/api-grade-mcp/tests/integration/remediation-safety.test.ts` (new file, replaces `packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts`) +- [X] T007 [P] [US1] Integration test for the extended `--remediation-safety` CLI flag (accepts `safe`/`humanreview`/`unsafe`, rejects other values with the 3-value error message, `safe` membership unchanged) in `tests/integration/cli-remediation-safety.test.ts` (new file, replaces `tests/integration/cli-quick-fixes.test.ts`) +- [X] T008 [P] [US1] Integration test for the 
`grade-api-remediation-safety` MCP tool's extended `level` enum and `RemediationSafetyOutput` response shape (`remediationItemCount`, `remediationItems`, `requestedLevel`, per-item `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning`) in `packages/api-grade-mcp/tests/integration/remediation-safety.test.ts` (new file, replaces `packages/api-grade-mcp/tests/integration/quick-fixes-only.test.ts`) ### Implementation for User Story 1 -- [ ] T009 [US1] Update `src/cli/index.ts`: extend the `--remediation-safety ` option to accept `safe|humanreview|unsafe`, update the rejection error message to `Error: --remediation-safety must be one of: safe, humanreview, unsafe.`, load the ruleset via `analyseRuleset()`, and call `buildRemediationSafetyOutput()`/`formatRemediationSafetyHuman()` in place of the removed `buildQuickFixOutput()`/`formatQuickFixesHuman()` (lines ~14-15, ~80, ~116, ~183-184) — depends on T006, makes T007 pass -- [ ] T010 [US1] Rename `packages/api-grade-mcp/src/tools/quick-fixes-only.ts` to `packages/api-grade-mcp/src/tools/remediation-safety.ts`: rename `registerQuickFixesOnlyTool` to `registerRemediationSafetyTool`, extend the `level` Zod enum to `['safe', 'humanreview', 'unsafe']`, call `analyseRuleset()` + `buildRemediationSafetyOutput()` instead of `buildQuickFixOutput()`, and update the tool description per `contracts/remediation-safety-surfaces.md` (mention all three levels and the confidence indicator) — depends on T006, makes T008 pass -- [ ] T011 [US1] Update `packages/api-grade-mcp/src/server.ts`: replace the `registerQuickFixesOnlyTool` import/registration (lines 8, 30) with `registerRemediationSafetyTool` from `./tools/remediation-safety.js` — depends on T010 -- [ ] T012 [US1] Update `packages/api-grade-mcp/src/utils/classify.ts`: replace the `classifyViolation`/`buildQuickFix`/`QuickFix`/`ViolationClass` re-exports with `analyseRuleset`/`getRemediationSafety` and the 
`RuleAnalysis`/`RemediationSafetyLevel`/`RiskLevel`/`ConfidenceLevel` type re-exports from `@dawmatt/api-grade-core` — depends on T006 +- [X] T009 [US1] Update `src/cli/index.ts`: extend the `--remediation-safety ` option to accept `safe|humanreview|unsafe`, update the rejection error message to `Error: --remediation-safety must be one of: safe, humanreview, unsafe.`, load the ruleset via `analyseRuleset()`, and call `buildRemediationSafetyOutput()`/`formatRemediationSafetyHuman()` in place of the removed `buildQuickFixOutput()`/`formatQuickFixesHuman()` (lines ~14-15, ~80, ~116, ~183-184) — depends on T006, makes T007 pass +- [X] T010 [US1] Rename `packages/api-grade-mcp/src/tools/quick-fixes-only.ts` to `packages/api-grade-mcp/src/tools/remediation-safety.ts`: rename `registerQuickFixesOnlyTool` to `registerRemediationSafetyTool`, extend the `level` Zod enum to `['safe', 'humanreview', 'unsafe']`, call `analyseRuleset()` + `buildRemediationSafetyOutput()` instead of `buildQuickFixOutput()`, and update the tool description per `contracts/remediation-safety-surfaces.md` (mention all three levels and the confidence indicator) — depends on T006, makes T008 pass +- [X] T011 [US1] Update `packages/api-grade-mcp/src/server.ts`: replace the `registerQuickFixesOnlyTool` import/registration (lines 8, 30) with `registerRemediationSafetyTool` from `./tools/remediation-safety.js` — depends on T010 +- [X] T012 [US1] Update `packages/api-grade-mcp/src/utils/classify.ts`: replace the `classifyViolation`/`buildQuickFix`/`QuickFix`/`ViolationClass` re-exports with `analyseRuleset`/`getRemediationSafety` and the `RuleAnalysis`/`RemediationSafetyLevel`/`RiskLevel`/`ConfidenceLevel` type re-exports from `@dawmatt/api-grade-core` — depends on T006 **Checkpoint**: `--remediation-safety`/`grade-api-remediation-safety` fully support all three levels end-to-end (CLI + MCP, JSON + human), with `safe` behavior unchanged (FR-007, SC-001, SC-004). 
@@ -74,21 +74,21 @@ No new project scaffolding is required — this feature extends the existing `pa ### Implementation for User Story 2 -- [ ] T013 [P] [US2] Implement `RuleFingerprint` computation (hash over a rule's `ruleId`/`given`/`then.function`/`severity`/`description`) in `packages/api-grade-core/src/remediation-safety.ts` per data-model.md `RuleFingerprint` — depends on T006 -- [ ] T014 [P] [US2] Implement colocated `SharedRulesetAnalysis` read/write for local rulesets (deterministic filename derived from the ruleset's own path, e.g. sibling file with a fixed suffix) in `packages/api-grade-core/src/config/shared-ruleset-analysis.ts` (new file) per data-model.md `SharedRulesetAnalysis` — depends on T013 -- [ ] T015 [US2] Extend `packages/api-grade-core/src/config/shared-ruleset-analysis.ts` to read (never write) the colocated `SharedRulesetAnalysis` for a GitHub-hosted ruleset by reusing `resolveRuleset`/`fetchRulesetContent` (`packages/api-grade-core/src/config/resolve-ruleset.ts`, `packages/api-grade-core/src/auth/github.ts`) with the same `AuthConfig` already supplied for the ruleset itself (FR-017, FR-019) — depends on T014 -- [ ] T016 [P] [US2] Implement `PersonalRulesetAnalysisOverride` storage (workspace/global scope, same precedence as `RulesetConfig`) in `packages/api-grade-core/src/config/personal-ruleset-override.ts` (new file), reusing the `loadConfig`/`saveConfig`/`getWorkspaceConfigPath`/`getGlobalConfigPath` pattern from `packages/api-grade-core/src/config/ruleset-config.ts` — depends on T013 -- [ ] T017 [US2] Author `BundledRulesetAnalysis` for the built-in OpenAPI and AsyncAPI rulesets in `packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json` and `.../asyncapi.json` (new files): migrate the former `RULE_ID_NON_BREAKING_PREFIXES`-style curated mappings (e.g. 
`operation-description` → `safe`, `operation-operationId` → `humanreview`) into `assessedBy: "human"` entries with maintainer-authored rationale, per research.md §3/§8 and FR-012/FR-020 — depends on T013 -- [ ] T018 [US2] Wire Stage 0 lookup precedence into `analyseRuleset()` in `packages/api-grade-core/src/remediation-safety.ts`: workspace `PersonalRulesetAnalysisOverride` → global `PersonalRulesetAnalysisOverride` → colocated `SharedRulesetAnalysis` → `BundledRulesetAnalysis` (built-in ruleset only) → fall through to Stages 1–2; implement fingerprint-staleness handling (an `assessedBy: "automated"` entry with a stale fingerprint is treated as not found and falls through; an `assessedBy: "human"` entry with a stale fingerprint is still used, with `staleFingerprintWarning` populated) — depends on T014, T015, T016, T017 -- [ ] T019 [US2] Implement persisting a correction (FR-013/FR-018/FR-019) — a function that writes an `assessedBy: "human"`, `confidenceLevel: "high"` entry to the colocated `SharedRulesetAnalysis` for a writable/local ruleset, or to the workspace-scoped `PersonalRulesetAnalysisOverride` when the ruleset's location is not writable (e.g. GitHub-hosted) — in `packages/api-grade-core/src/remediation-safety.ts` — depends on T018 -- [ ] T020 [P] [US2] Update `packages/api-grade-core/src/index.ts` to export the new Stage 0/persistence symbols and types from T013-T019 (`SharedRulesetAnalysis`, `PersonalRulesetAnalysisOverride`, `BundledRulesetAnalysis`, `RuleFingerprint`, `AssessmentOrigin`, `AnalysisSource`, and the persist-correction function) — depends on T019 -- [ ] T021 [P] [US2] Unit tests for Stage 0 precedence, fingerprint staleness (automated-discarded vs. 
human-honored-with-warning), and persisting a correction in `packages/api-grade-core/tests/unit/remediation-safety.test.ts` (extends T002's file) covering SC-006, SC-008, SC-009 — depends on T020 -- [ ] T022 [P] [US2] New CLI subcommand `ruleset-analysis [--ruleset-path ] [--format json|human]` in `src/cli/ruleset-analysis-cli.ts` (new file, mirrors `src/cli/ruleset-config-cli.ts`), registered in `src/cli/index.ts`; `--format human` prints rule id, risk level, confidence level, remediation safety level, assessed by, rationale, and any fingerprint-mismatch warning per quickstart.md §2 — depends on T020 -- [ ] T023 [P] [US2] Add a `correct` action to `src/cli/ruleset-analysis-cli.ts` for persisting a correction (FR-013), e.g. `api-grade ruleset-analysis correct --rule-id --level [--ruleset-path ]`, calling the persist function from T019 — depends on T019, T022 -- [ ] T024 [P] [US2] New MCP tool `analyse-ruleset-safety` in `packages/api-grade-mcp/src/tools/analyse-ruleset-safety.ts` (new file, mirrors `packages/api-grade-mcp/src/tools/get-ruleset-config.ts`), input `{ rulesetPath?: string, recoveryOption?: ... 
}`, output a `RulesetAnalysis` JSON document, reusing the `resolveRuleset`/`RulesetAuthError`/`mcpError` flow already used by `grade-api-remediation-safety`; register it in `packages/api-grade-mcp/src/server.ts` — depends on T020 -- [ ] T025 [P] [US2] Integration test for the `ruleset-analysis` CLI subcommand (human + json format, fingerprint-mismatch warning display, `correct` action) in `tests/integration/cli-remediation-safety.test.ts` — depends on T022, T023 -- [ ] T026 [P] [US2] Integration test for the `analyse-ruleset-safety` MCP tool in `packages/api-grade-mcp/tests/integration/analyse-ruleset-safety.test.ts` (new file) — depends on T024 -- [ ] T027 [US2] Verify `staleFingerprintWarning` is threaded through `RemediationItem`/`RemediationSafetyOutput` (built in T004/T005) into the CLI (`--remediation-safety`, T009) and MCP (`grade-api-remediation-safety`, T010) human + JSON output, satisfying FR-021/SC-009 at the per-violation surface, not just the ruleset-analysis surface — depends on T018, T009, T010 +- [X] T013 [P] [US2] Implement `RuleFingerprint` computation (hash over a rule's `ruleId`/`given`/`then.function`/`severity`/`description`) in `packages/api-grade-core/src/remediation-safety.ts` per data-model.md `RuleFingerprint` — depends on T006 +- [X] T014 [P] [US2] Implement colocated `SharedRulesetAnalysis` read/write for local rulesets (deterministic filename derived from the ruleset's own path, e.g. 
sibling file with a fixed suffix) in `packages/api-grade-core/src/config/shared-ruleset-analysis.ts` (new file) per data-model.md `SharedRulesetAnalysis` — depends on T013 +- [X] T015 [US2] Extend `packages/api-grade-core/src/config/shared-ruleset-analysis.ts` to read (never write) the colocated `SharedRulesetAnalysis` for a GitHub-hosted ruleset by reusing `resolveRuleset`/`fetchRulesetContent` (`packages/api-grade-core/src/config/resolve-ruleset.ts`, `packages/api-grade-core/src/auth/github.ts`) with the same `AuthConfig` already supplied for the ruleset itself (FR-017, FR-019) — depends on T014 +- [X] T016 [P] [US2] Implement `PersonalRulesetAnalysisOverride` storage (workspace/global scope, same precedence as `RulesetConfig`) in `packages/api-grade-core/src/config/personal-ruleset-override.ts` (new file), reusing the `loadConfig`/`saveConfig`/`getWorkspaceConfigPath`/`getGlobalConfigPath` pattern from `packages/api-grade-core/src/config/ruleset-config.ts` — depends on T013 +- [X] T017 [US2] Author `BundledRulesetAnalysis` for the built-in OpenAPI and AsyncAPI rulesets in `packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json` and `.../asyncapi.json` (new files): migrate the former `RULE_ID_NON_BREAKING_PREFIXES`-style curated mappings (e.g. 
`operation-description` → `safe`, `operation-operationId` → `humanreview`) into `assessedBy: "human"` entries with maintainer-authored rationale, per research.md §3/§8 and FR-012/FR-020 — depends on T013 +- [X] T018 [US2] Wire Stage 0 lookup precedence into `analyseRuleset()` in `packages/api-grade-core/src/remediation-safety.ts`: workspace `PersonalRulesetAnalysisOverride` → global `PersonalRulesetAnalysisOverride` → colocated `SharedRulesetAnalysis` → `BundledRulesetAnalysis` (built-in ruleset only) → fall through to Stages 1–2; implement fingerprint-staleness handling (an `assessedBy: "automated"` entry with a stale fingerprint is treated as not found and falls through; an `assessedBy: "human"` entry with a stale fingerprint is still used, with `staleFingerprintWarning` populated) — depends on T014, T015, T016, T017 +- [X] T019 [US2] Implement persisting a correction (FR-013/FR-018/FR-019) — a function that writes an `assessedBy: "human"`, `confidenceLevel: "high"` entry to the colocated `SharedRulesetAnalysis` for a writable/local ruleset, or to the workspace-scoped `PersonalRulesetAnalysisOverride` when the ruleset's location is not writable (e.g. GitHub-hosted) — in `packages/api-grade-core/src/remediation-safety.ts` — depends on T018 +- [X] T020 [P] [US2] Update `packages/api-grade-core/src/index.ts` to export the new Stage 0/persistence symbols and types from T013-T019 (`SharedRulesetAnalysis`, `PersonalRulesetAnalysisOverride`, `BundledRulesetAnalysis`, `RuleFingerprint`, `AssessmentOrigin`, `AnalysisSource`, and the persist-correction function) — depends on T019 +- [X] T021 [P] [US2] Unit tests for Stage 0 precedence, fingerprint staleness (automated-discarded vs. 
human-honored-with-warning), and persisting a correction in `packages/api-grade-core/tests/unit/remediation-safety.test.ts` (extends T002's file) covering SC-006, SC-008, SC-009 — depends on T020 +- [X] T022 [P] [US2] New CLI subcommand `ruleset-analysis [--ruleset-path ] [--format json|human]` in `src/cli/ruleset-analysis-cli.ts` (new file, mirrors `src/cli/ruleset-config-cli.ts`), registered in `src/cli/index.ts`; `--format human` prints rule id, risk level, confidence level, remediation safety level, assessed by, rationale, and any fingerprint-mismatch warning per quickstart.md §2 — depends on T020 +- [X] T023 [P] [US2] Add a `correct` action to `src/cli/ruleset-analysis-cli.ts` for persisting a correction (FR-013), e.g. `api-grade ruleset-analysis correct --rule-id --level [--ruleset-path ]`, calling the persist function from T019 — depends on T019, T022 +- [X] T024 [P] [US2] New MCP tool `analyse-ruleset-safety` in `packages/api-grade-mcp/src/tools/analyse-ruleset-safety.ts` (new file, mirrors `packages/api-grade-mcp/src/tools/get-ruleset-config.ts`), input `{ rulesetPath?: string, recoveryOption?: ... 
}`, output a `RulesetAnalysis` JSON document, reusing the `resolveRuleset`/`RulesetAuthError`/`mcpError` flow already used by `grade-api-remediation-safety`; register it in `packages/api-grade-mcp/src/server.ts` — depends on T020 +- [X] T025 [P] [US2] Integration test for the `ruleset-analysis` CLI subcommand (human + json format, fingerprint-mismatch warning display, `correct` action) in `tests/integration/cli-remediation-safety.test.ts` — depends on T022, T023 +- [X] T026 [P] [US2] Integration test for the `analyse-ruleset-safety` MCP tool in `packages/api-grade-mcp/tests/integration/analyse-ruleset-safety.test.ts` (new file) — depends on T024 +- [X] T027 [US2] Verify `staleFingerprintWarning` is threaded through `RemediationItem`/`RemediationSafetyOutput` (built in T004/T005) into the CLI (`--remediation-safety`, T009) and MCP (`grade-api-remediation-safety`, T010) human + JSON output, satisfying FR-021/SC-009 at the per-violation surface, not just the ruleset-analysis surface — depends on T018, T009, T010 **Checkpoint**: `ruleset-analysis`/`analyse-ruleset-safety` expose full per-rule analysis with confidence and provenance; persisted corrections (shared, personal, bundled) are loaded automatically and survive rule edits when human-assessed (FR-011 through FR-021, SC-002, SC-005 through SC-009). 
@@ -102,16 +102,16 @@ No new project scaffolding is required — this feature extends the existing `pa ### Implementation for User Story 3 -- [ ] T028 [P] [US3] Update `docs/cli/commands.md`: document the 3-level `--remediation-safety` reference and the new `ruleset-analysis` subcommand -- [ ] T029 [P] [US3] Update `docs/mcp/quick-start.md`: document the renamed/extended `grade-api-remediation-safety` tool and the new `analyse-ruleset-safety` tool -- [ ] T030 [P] [US3] Update `docs/package/api-grade-mcp.md`: tool reference updates for both tools above -- [ ] T031 [P] [US3] Update `docs/package/README.md`: remove remaining "quick fix" mentions -- [ ] T032 [P] [US3] Update `docs/package/api-reference.md`: document the new core API (`analyseRuleset`, `getRemediationSafety`, `RuleAnalysis`, `RulesetAnalysis`, `RemediationItem`, `RemediationSafetyOutput`, etc.) in place of the removed `QuickFix`/`QuickFixOutput`/`classifyViolation` -- [ ] T033 [P] [US3] Update `docs/index.md`: remove remaining "quick fix" mentions -- [ ] T034 [P] [US3] Update `docs/getting-started.md`: update the tool list mention -- [ ] T035 [P] [US3] Update `packages/api-grade-mcp/README.md`: tool table update (`grade-api-remediation-safety`, `analyse-ruleset-safety`) -- [ ] T036 [P] [US3] Update `CONTRIBUTING.md`: correct the package/tool table entry that still names the pre-Feature-11 tool -- [ ] T037 [US3] Run `grep -rniE "quick.?fix" --include="*.ts" --include="*.md" src/ packages/api-grade-core/src packages/api-grade-mcp/src packages/api-grade-core/tests packages/api-grade-mcp/tests tests/ docs/ packages/api-grade-mcp/README.md CONTRIBUTING.md` (per quickstart.md §4) and fix any remaining matches until it returns zero (SC-003) — depends on T009-T036 +- [X] T028 [P] [US3] Update `docs/cli/commands.md`: document the 3-level `--remediation-safety` reference and the new `ruleset-analysis` subcommand +- [X] T029 [P] [US3] Update `docs/mcp/quick-start.md`: document the renamed/extended 
`grade-api-remediation-safety` tool and the new `analyse-ruleset-safety` tool +- [X] T030 [P] [US3] Update `docs/package/api-grade-mcp.md`: tool reference updates for both tools above +- [X] T031 [P] [US3] Update `docs/package/README.md`: remove remaining "quick fix" mentions +- [X] T032 [P] [US3] Update `docs/package/api-reference.md`: document the new core API (`analyseRuleset`, `getRemediationSafety`, `RuleAnalysis`, `RulesetAnalysis`, `RemediationItem`, `RemediationSafetyOutput`, etc.) in place of the removed `QuickFix`/`QuickFixOutput`/`classifyViolation` +- [X] T033 [P] [US3] Update `docs/index.md`: remove remaining "quick fix" mentions +- [X] T034 [P] [US3] Update `docs/getting-started.md`: update the tool list mention +- [X] T035 [P] [US3] Update `packages/api-grade-mcp/README.md`: tool table update (`grade-api-remediation-safety`, `analyse-ruleset-safety`) +- [X] T036 [P] [US3] Update `CONTRIBUTING.md`: correct the package/tool table entry that still names the pre-Feature-11 tool +- [X] T037 [US3] Run `grep -rniE "quick.?fix" --include="*.ts" --include="*.md" src/ packages/api-grade-core/src packages/api-grade-mcp/src packages/api-grade-core/tests packages/api-grade-mcp/tests tests/ docs/ packages/api-grade-mcp/README.md CONTRIBUTING.md` (per quickstart.md §4) and fix any remaining matches until it returns zero (SC-003) — depends on T009-T036 **Checkpoint**: SC-003 satisfied — zero "quick fix" references remain anywhere in current source, tests, or documentation. @@ -121,9 +121,9 @@ No new project scaffolding is required — this feature extends the existing `pa **Purpose**: Final validation across all three stories. 
-- [ ] T038 [P] Add a new `CHANGELOG.md` entry for this feature (do not modify historical entries) -- [ ] T039 Run `vitest run` across all workspaces, `tsc --noEmit`, and lint; fix any failures -- [ ] T040 Manually walk through `quickstart.md` end-to-end (all 4 sections) against a real local ruleset and a GitHub-hosted ruleset to confirm SC-001 through SC-009 +- [X] T038 [P] Add a new `CHANGELOG.md` entry for this feature (do not modify historical entries) +- [X] T039 Run `vitest run` across all workspaces, `tsc --noEmit`, and lint; fix any failures +- [X] T040 Manually walk through `quickstart.md` end-to-end (all 4 sections) against a real local ruleset and a GitHub-hosted ruleset to confirm SC-001 through SC-009 --- diff --git a/src/cli/index.ts b/src/cli/index.ts index 6ae03b1..ada36bf 100644 --- a/src/cli/index.ts +++ b/src/cli/index.ts @@ -11,14 +11,17 @@ import { loadWorkspaceConfig, loadGlobalConfig, buildAssertOutput, - buildQuickFixOutput, - formatQuickFixesHuman, + analyseRuleset, + buildRemediationSafetyOutput, + formatRemediationSafetyHuman, + loadRuleset, } from '@dawmatt/api-grade-core'; import { loadConfig } from './config-loader.js'; import { resolveCliAuth, isValidAuthType } from './ruleset-resolution.js'; import { resolveRemoteRuleset } from './ruleset-fetch.js'; import { registerConfigCommand } from './ruleset-config-cli.js'; -import type { LetterGrade } from '@dawmatt/api-grade-core'; +import { registerRulesetAnalysisCommand } from './ruleset-analysis-cli.js'; +import type { LetterGrade, RemediationSafetyLevel } from '@dawmatt/api-grade-core'; // Returns "source:line:col — " when error carries Spectral location data, else "" or "source — " function formatErrorLocation(error: unknown): string { @@ -77,7 +80,7 @@ program return n; }) .option('--url ', '(reserved for future use)') - .option('--remediation-safety ', 'Filter diagnostics to the given remediation safety level (currently: safe)') + .option('--remediation-safety ', 'Filter diagnostics to 
the given remediation safety level: safe, humanreview, or unsafe') .option('--verbose', 'Print full error stack on failure') .action(async (specFile: string, cliOpts: { minGrade?: string; @@ -112,8 +115,12 @@ program process.exit(1); } - if (cliOpts.remediationSafety !== undefined && cliOpts.remediationSafety !== 'safe') { - console.error(chalk.red(`Error: --remediation-safety must be "safe".`)); + const REMEDIATION_SAFETY_LEVELS: RemediationSafetyLevel[] = ['safe', 'humanreview', 'unsafe']; + if ( + cliOpts.remediationSafety !== undefined && + !REMEDIATION_SAFETY_LEVELS.includes(cliOpts.remediationSafety as RemediationSafetyLevel) + ) { + console.error(chalk.red(`Error: --remediation-safety must be one of: safe, humanreview, unsafe.`)); process.exit(1); } @@ -177,11 +184,14 @@ program rulesetPath, }); - if (cliOpts.remediationSafety === 'safe') { + if (cliOpts.remediationSafety !== undefined) { + const requestedLevel = cliOpts.remediationSafety as RemediationSafetyLevel; const specContent = readFileSync(specFile, 'utf-8'); + const loadedRuleset = await loadRuleset(result.format, rulesetPath); + const rulesetAnalysis = await analyseRuleset(loadedRuleset); const output = outputFormat === 'json' - ? JSON.stringify(buildQuickFixOutput(result, specContent)) - : formatQuickFixesHuman(result, specContent); + ? 
JSON.stringify(buildRemediationSafetyOutput(result, specContent, rulesetAnalysis, requestedLevel)) + : formatRemediationSafetyHuman(result, specContent, rulesetAnalysis, requestedLevel); console.log(output); } else { const output = outputFormat === 'json' @@ -232,5 +242,6 @@ program }); registerConfigCommand(program); +registerRulesetAnalysisCommand(program); program.parse(); diff --git a/src/cli/ruleset-analysis-cli.ts b/src/cli/ruleset-analysis-cli.ts new file mode 100644 index 0000000..018333f --- /dev/null +++ b/src/cli/ruleset-analysis-cli.ts @@ -0,0 +1,121 @@ +import { Command } from 'commander'; +import chalk from 'chalk'; +import { analyseRuleset, loadRuleset, persistRuleAnalysisCorrection } from '@dawmatt/api-grade-core'; +import type { RulesetAnalysis, RemediationSafetyLevel } from '@dawmatt/api-grade-core'; + +export interface RulesetAnalysisOptions { + rulesetPath?: string; + format?: string; +} + +export interface RulesetAnalysisCorrectOptions { + ruleId?: string; + level?: string; + rulesetPath?: string; + format?: string; +} + +function fail(message: string, format: string | undefined): never { + if (format === 'json') { + console.log(JSON.stringify({ error: 'RULESET_ANALYSIS_FAILED', message })); + } else { + console.error(chalk.red(`Error: ${message}`)); + } + process.exit(1); +} + +function formatHuman(analysis: RulesetAnalysis): string { + const lines: string[] = []; + lines.push( + `${'rule id'.padEnd(34)}${'risk level'.padEnd(12)}${'confidence'.padEnd(12)}${'remediation safety'.padEnd(20)}${'assessed by'.padEnd(13)}rationale` + ); + for (const rule of analysis.rules) { + lines.push( + `${rule.ruleId.padEnd(34)}${(rule.riskLevel ?? 
'n/a').padEnd(12)}${rule.confidenceLevel.padEnd(12)}${rule.remediationSafetyLevel.padEnd(20)}${rule.assessedBy.padEnd(13)}${rule.rationale}` + ); + if (rule.staleFingerprintWarning) { + lines.push(` WARNING: ${rule.staleFingerprintWarning.message}`); + } + } + return lines.join('\n'); +} + +export async function runRulesetAnalysis(opts: RulesetAnalysisOptions): Promise<void> { + const format = opts.format ?? 'human'; + if (format !== 'json' && format !== 'human') { + fail(`--format must be "json" or "human".`, format); + } + + try { + const loadedRuleset = await loadRuleset('openapi-3', opts.rulesetPath); + const analysis = await analyseRuleset(loadedRuleset); + + if (format === 'json') { + console.log(JSON.stringify(analysis)); + } else { + console.log(formatHuman(analysis)); + } + } catch (err) { + const message = err instanceof Error ? err.message : String(err); + fail(message, format); + } +} + +const REMEDIATION_SAFETY_LEVELS: RemediationSafetyLevel[] = ['safe', 'humanreview', 'unsafe']; + +export async function runRulesetAnalysisCorrect(opts: RulesetAnalysisCorrectOptions): Promise<void> { + const format = opts.format ??
'human'; + if (format !== 'json' && format !== 'human') { + fail(`--format must be "json" or "human".`, format); + } + if (!opts.ruleId) { + fail('--rule-id is required.', format); + } + if (!opts.level || !REMEDIATION_SAFETY_LEVELS.includes(opts.level as RemediationSafetyLevel)) { + fail('--level must be one of: safe, humanreview, unsafe.', format); + } + + try { + const loadedRuleset = await loadRuleset('openapi-3', opts.rulesetPath); + const result = await persistRuleAnalysisCorrection( + loadedRuleset, + opts.ruleId!, + opts.level as RemediationSafetyLevel + ); + + if (format === 'json') { + console.log(JSON.stringify({ ruleId: opts.ruleId, level: opts.level, ...result })); + } else { + console.log(`Persisted '${opts.ruleId}' as ${opts.level} (${result.written}).`); + if (result.sharedFileContent) { + console.log('This ruleset location is not locally writable; commit the following shared-analysis content yourself:'); + console.log(result.sharedFileContent); + } + } + } catch (err) { + const message = err instanceof Error ? 
err.message : String(err); + fail(message, format); + } +} + +export function registerRulesetAnalysisCommand(program: Command): void { + const rulesetAnalysis = program + .command('ruleset-analysis') + .description("Inspect a ruleset's remediation-safety analysis independent of grading any spec") + .option('--ruleset-path ', 'Path to a custom Spectral-compatible ruleset file; omit to analyse the built-in ruleset') + .option('--format ', 'Output format: json or human', 'human') + .action(async (opts: RulesetAnalysisOptions) => { + await runRulesetAnalysis(opts); + }); + + rulesetAnalysis + .command('correct') + .description('Persist a human-confirmed remediation-safety correction for one rule') + .requiredOption('--rule-id ', 'The ruleId to correct') + .requiredOption('--level ', 'The remediation safety level to persist') + .option('--ruleset-path ', 'Path to a custom Spectral-compatible ruleset file; omit to target the built-in ruleset') + .option('--format ', 'Output format: json or human', 'human') + .action(async (opts: RulesetAnalysisCorrectOptions) => { + await runRulesetAnalysisCorrect(opts); + }); +} diff --git a/tests/integration/cli-quick-fixes.test.ts b/tests/integration/cli-quick-fixes.test.ts deleted file mode 100644 index e1c1b1e..0000000 --- a/tests/integration/cli-quick-fixes.test.ts +++ /dev/null @@ -1,100 +0,0 @@ -import { describe, it, expect } from 'vitest'; -import { spawnSync } from 'node:child_process'; -import { resolve, dirname } from 'node:path'; -import { fileURLToPath } from 'node:url'; - -const __filename = fileURLToPath(import.meta.url); -const __dirname = dirname(__filename); - -const CLI = resolve(__dirname, '../../dist/cli/index.js'); -const FIXTURES = resolve(__dirname, '../fixtures'); - -function runCli(args: string[]): { status: number | null; stdout: string; stderr: string } { - const result = spawnSync('node', [CLI, ...args], { encoding: 'utf-8' }); - return { status: result.status, stdout: result.stdout ?? 
'', stderr: result.stderr ?? '' }; -} - -describe('CLI --remediation-safety flag', () => { - it('--remediation-safety safe --format json matches the QuickFixOutput shape', () => { - const { status, stdout } = runCli([ - resolve(FIXTURES, 'openapi/poor-quality.yaml'), - '--remediation-safety', 'safe', - '--format', 'json', - ]); - expect(status).toBe(0); - const data = JSON.parse(stdout); - expect(data).toHaveProperty('specPath'); - expect(data).toHaveProperty('format'); - expect(data).toHaveProperty('totalViolations'); - expect(data).toHaveProperty('quickFixCount'); - expect(data).toHaveProperty('quickFixes'); - expect(Array.isArray(data.quickFixes)).toBe(true); - }, 30000); - - it('--remediation-safety safe with no --format prints human-readable text containing the filtered ruleIds', () => { - const jsonResult = runCli([ - resolve(FIXTURES, 'openapi/poor-quality.yaml'), - '--remediation-safety', 'safe', - '--format', 'json', - ]); - const { quickFixes } = JSON.parse(jsonResult.stdout); - - const { status, stdout } = runCli([ - resolve(FIXTURES, 'openapi/poor-quality.yaml'), - '--remediation-safety', 'safe', - ]); - expect(status).toBe(0); - expect(() => JSON.parse(stdout)).toThrow(); - for (const fix of quickFixes) { - expect(stdout).toContain(fix.ruleId); - } - }, 30000); - - it('--remediation-safety safe --format human also prints human-readable text with the same ruleIds', () => { - const jsonResult = runCli([ - resolve(FIXTURES, 'openapi/poor-quality.yaml'), - '--remediation-safety', 'safe', - '--format', 'json', - ]); - const { quickFixes } = JSON.parse(jsonResult.stdout); - - const { status, stdout } = runCli([ - resolve(FIXTURES, 'openapi/poor-quality.yaml'), - '--remediation-safety', 'safe', - '--format', 'human', - ]); - expect(status).toBe(0); - expect(() => JSON.parse(stdout)).toThrow(); - for (const fix of quickFixes) { - expect(stdout).toContain(fix.ruleId); - } - }, 30000); - - it('--remediation-safety safe --min-grade still evaluates the gate 
against the full unfiltered result', () => { - const { status, stderr } = runCli([ - resolve(FIXTURES, 'openapi/poor-quality.yaml'), - '--remediation-safety', 'safe', - '--min-grade', 'A', - ]); - expect(status).toBe(1); - expect(stderr).toMatch(/grade/i); - }, 30000); - - it('--quick-fixes-only is rejected as an unknown option', () => { - const { status, stderr } = runCli([ - resolve(FIXTURES, 'openapi/poor-quality.yaml'), - '--quick-fixes-only', - ]); - expect(status).not.toBe(0); - expect(stderr).toMatch(/unknown option/i); - }, 30000); - - it('--remediation-safety with an unsupported level fails with a clear error and non-zero exit code', () => { - const { status, stderr } = runCli([ - resolve(FIXTURES, 'openapi/poor-quality.yaml'), - '--remediation-safety', 'unsafe', - ]); - expect(status).not.toBe(0); - expect(stderr).toMatch(/--remediation-safety must be "safe"/); - }, 30000); -}); diff --git a/tests/integration/cli-remediation-safety.test.ts b/tests/integration/cli-remediation-safety.test.ts new file mode 100644 index 0000000..90efede --- /dev/null +++ b/tests/integration/cli-remediation-safety.test.ts @@ -0,0 +1,188 @@ +import { describe, it, expect } from 'vitest'; +import { spawnSync } from 'node:child_process'; +import { resolve, dirname, join } from 'node:path'; +import { fileURLToPath } from 'node:url'; +import { mkdtempSync, copyFileSync, rmSync, existsSync } from 'node:fs'; +import { tmpdir } from 'node:os'; + +const __filename = fileURLToPath(import.meta.url); +const __dirname = dirname(__filename); + +const CLI = resolve(__dirname, '../../dist/cli/index.js'); +const FIXTURES = resolve(__dirname, '../fixtures'); + +function runCli(args: string[]): { status: number | null; stdout: string; stderr: string } { + const result = spawnSync('node', [CLI, ...args], { encoding: 'utf-8' }); + return { status: result.status, stdout: result.stdout ?? '', stderr: result.stderr ?? 
'' }; +} + +describe('CLI --remediation-safety flag', () => { + it.each(['safe', 'humanreview', 'unsafe'])('--remediation-safety %s --format json returns the RemediationSafetyOutput shape', (level) => { + const { status, stdout } = runCli([ + resolve(FIXTURES, 'openapi/poor-quality.yaml'), + '--remediation-safety', level, + '--format', 'json', + ]); + expect(status).toBe(0); + const data = JSON.parse(stdout); + expect(data).toHaveProperty('specPath'); + expect(data).toHaveProperty('format'); + expect(data).toHaveProperty('totalViolations'); + expect(data).toHaveProperty('remediationItemCount'); + expect(data).toHaveProperty('remediationItems'); + expect(data).toHaveProperty('requestedLevel', level); + expect(Array.isArray(data.remediationItems)).toBe(true); + for (const item of data.remediationItems) { + expect(item).toHaveProperty('riskLevel'); + expect(item).toHaveProperty('confidenceLevel'); + expect(item).toHaveProperty('remediationSafetyLevel', level); + expect(item).toHaveProperty('staleFingerprintWarning'); + } + }, 30000); + + it('--remediation-safety safe with no --format prints human-readable text containing the filtered ruleIds', () => { + const jsonResult = runCli([ + resolve(FIXTURES, 'openapi/poor-quality.yaml'), + '--remediation-safety', 'safe', + '--format', 'json', + ]); + const { remediationItems } = JSON.parse(jsonResult.stdout); + + const { status, stdout } = runCli([ + resolve(FIXTURES, 'openapi/poor-quality.yaml'), + '--remediation-safety', 'safe', + ]); + expect(status).toBe(0); + expect(() => JSON.parse(stdout)).toThrow(); + for (const item of remediationItems) { + expect(stdout).toContain(item.ruleId); + } + }, 30000); + + it('--remediation-safety safe --format human also prints human-readable text with the same ruleIds', () => { + const jsonResult = runCli([ + resolve(FIXTURES, 'openapi/poor-quality.yaml'), + '--remediation-safety', 'safe', + '--format', 'json', + ]); + const { remediationItems } = JSON.parse(jsonResult.stdout); + + const 
{ status, stdout } = runCli([ + resolve(FIXTURES, 'openapi/poor-quality.yaml'), + '--remediation-safety', 'safe', + '--format', 'human', + ]); + expect(status).toBe(0); + expect(() => JSON.parse(stdout)).toThrow(); + for (const item of remediationItems) { + expect(stdout).toContain(item.ruleId); + } + }, 30000); + + it('--remediation-safety safe --min-grade still evaluates the gate against the full unfiltered result', () => { + const { status, stderr } = runCli([ + resolve(FIXTURES, 'openapi/poor-quality.yaml'), + '--remediation-safety', 'safe', + '--min-grade', 'A', + ]); + expect(status).toBe(1); + expect(stderr).toMatch(/grade/i); + }, 30000); + + it('an unrecognized flag is rejected as an unknown option', () => { + const { status, stderr } = runCli([ + resolve(FIXTURES, 'openapi/poor-quality.yaml'), + '--not-a-real-flag', + ]); + expect(status).not.toBe(0); + expect(stderr).toMatch(/unknown option/i); + }, 30000); + + it('--remediation-safety with an unsupported level fails with the 3-value error message and non-zero exit code', () => { + const { status, stderr } = runCli([ + resolve(FIXTURES, 'openapi/poor-quality.yaml'), + '--remediation-safety', 'breaking', + ]); + expect(status).not.toBe(0); + expect(stderr).toMatch(/--remediation-safety must be one of: safe, humanreview, unsafe\./); + }, 30000); +}); + +describe('CLI ruleset-analysis subcommand', () => { + it('--format json returns a RulesetAnalysis document for the built-in ruleset', () => { + const { status, stdout } = runCli(['ruleset-analysis', '--format', 'json']); + expect(status).toBe(0); + const data = JSON.parse(stdout); + expect(data).toHaveProperty('rulesetSource', 'default'); + expect(Array.isArray(data.rules)).toBe(true); + expect(data.rules.length).toBeGreaterThan(0); + for (const rule of data.rules) { + expect(rule).toHaveProperty('ruleId'); + expect(rule).toHaveProperty('confidenceLevel'); + expect(rule).toHaveProperty('remediationSafetyLevel'); + expect(rule).toHaveProperty('assessedBy'); 
+ expect(rule).toHaveProperty('rationale'); + } + }, 30000); + + it('--format human (default) prints a readable table including assessed-by and rationale', () => { + const { status, stdout } = runCli(['ruleset-analysis']); + expect(status).toBe(0); + expect(() => JSON.parse(stdout)).toThrow(); + expect(stdout.length).toBeGreaterThan(0); + }, 30000); + + it('--ruleset-path analyses a custom ruleset', () => { + const { status, stdout } = runCli([ + 'ruleset-analysis', + '--ruleset-path', resolve(FIXTURES, 'rulesets/minimal.yaml'), + '--format', 'json', + ]); + if (status === 0) { + const data = JSON.parse(stdout); + expect(data).toHaveProperty('rulesetSource', 'custom'); + } + }, 30000); + + it('correct persists a human-confirmed classification, reloaded by a later ruleset-analysis call', () => { + const workDir = mkdtempSync(join(tmpdir(), 'api-grade-correct-')); + const rulesetPath = join(workDir, 'minimal.yaml'); + copyFileSync(resolve(FIXTURES, 'rulesets/minimal.yaml'), rulesetPath); + + try { + const before = runCli(['ruleset-analysis', '--ruleset-path', rulesetPath, '--format', 'json']); + expect(before.status).toBe(0); + const beforeData = JSON.parse(before.stdout); + const ruleId = beforeData.rules[0]?.ruleId; + if (!ruleId) return; // empty ruleset fixture — nothing to correct + + const correct = runCli([ + 'ruleset-analysis', 'correct', + '--rule-id', ruleId, + '--level', 'safe', + '--ruleset-path', rulesetPath, + '--format', 'json', + ]); + expect(correct.status).toBe(0); + const correctData = JSON.parse(correct.stdout); + expect(correctData.written).toBe('shared'); + expect(existsSync(`${rulesetPath}.remediation-safety.json`)).toBe(true); + + const after = runCli(['ruleset-analysis', '--ruleset-path', rulesetPath, '--format', 'json']); + expect(after.status).toBe(0); + const afterData = JSON.parse(after.stdout); + const entry = afterData.rules.find((r: { ruleId: string }) => r.ruleId === ruleId); + expect(entry.remediationSafetyLevel).toBe('safe'); + 
expect(entry.assessedBy).toBe('human'); + expect(entry.staleFingerprintWarning).toBeNull(); + } finally { + rmSync(workDir, { recursive: true, force: true }); + } + }, 30000); + + it('correct rejects an unsupported --level value', () => { + const { status, stderr } = runCli(['ruleset-analysis', 'correct', '--rule-id', 'some-rule', '--level', 'breaking']); + expect(status).not.toBe(0); + expect(stderr).toMatch(/--level must be one of: safe, humanreview, unsafe/); + }, 30000); +}); diff --git a/tests/unit/ruleset-analysis-cli.test.ts b/tests/unit/ruleset-analysis-cli.test.ts new file mode 100644 index 0000000..1b163f4 --- /dev/null +++ b/tests/unit/ruleset-analysis-cli.test.ts @@ -0,0 +1,100 @@ +import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; +import { mkdtempSync, rmSync, copyFileSync, existsSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join, resolve, dirname } from 'node:path'; +import { fileURLToPath } from 'node:url'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const FIXTURES = resolve(__dirname, '../fixtures'); + +const { runRulesetAnalysis, runRulesetAnalysisCorrect } = await import('../../src/cli/ruleset-analysis-cli.js'); + +class FakeExit extends Error { + constructor(public code: number) { + super(`process.exit(${code})`); + } +} + +let logs: string[]; +let errors: string[]; +let workDir: string; + +beforeEach(() => { + workDir = mkdtempSync(join(tmpdir(), 'api-grade-ruleset-analysis-cli-')); + vi.spyOn(process, 'cwd').mockReturnValue(workDir); + vi.spyOn(process, 'exit').mockImplementation(((code?: number) => { + throw new FakeExit(code ?? 
0); + }) as never); + logs = []; + errors = []; + vi.spyOn(console, 'log').mockImplementation((msg: string) => { logs.push(msg); }); + vi.spyOn(console, 'error').mockImplementation((msg: string) => { errors.push(msg); }); +}); + +afterEach(() => { + vi.restoreAllMocks(); + try { rmSync(workDir, { recursive: true, force: true }); } catch { /* ignore */ } +}); + +describe('runRulesetAnalysis', () => { + it('rejects an invalid --format value', async () => { + await expect(runRulesetAnalysis({ format: 'xml' })).rejects.toBeInstanceOf(FakeExit); + expect(errors.join('\n')).toMatch(/--format must be "json" or "human"/); + }); + + it('analyses the built-in ruleset as json', async () => { + await runRulesetAnalysis({ format: 'json' }); + const data = JSON.parse(logs[0]); + expect(data.rulesetSource).toBe('default'); + expect(Array.isArray(data.rules)).toBe(true); + expect(data.rules.length).toBeGreaterThan(0); + }); + + it('analyses the built-in ruleset as human-readable text by default', async () => { + await runRulesetAnalysis({}); + expect(logs[0]).toContain('rule id'); + expect(() => JSON.parse(logs[0])).toThrow(); + }); + + it('analyses a custom ruleset path', async () => { + const rulesetPath = join(workDir, 'minimal.yaml'); + copyFileSync(resolve(FIXTURES, 'rulesets/minimal.yaml'), rulesetPath); + await runRulesetAnalysis({ rulesetPath, format: 'json' }); + const data = JSON.parse(logs[0]); + expect(data.rulesetSource).toBe('custom'); + expect(data.rules.some((r: { ruleId: string }) => r.ruleId === 'test-rule')).toBe(true); + }); + + it('reports an error for a non-existent ruleset path', async () => { + await expect(runRulesetAnalysis({ rulesetPath: '/nonexistent/ruleset.yaml', format: 'json' })).rejects.toBeInstanceOf(FakeExit); + const body = JSON.parse(errors.length ? 
errors[0] : logs[0]); + expect(body.error).toBe('RULESET_ANALYSIS_FAILED'); + }); +}); + +describe('runRulesetAnalysisCorrect', () => { + it('requires --rule-id', async () => { + await expect(runRulesetAnalysisCorrect({ level: 'safe' })).rejects.toBeInstanceOf(FakeExit); + expect(errors.join('\n')).toMatch(/--rule-id is required/); + }); + + it('rejects an unsupported --level value', async () => { + await expect(runRulesetAnalysisCorrect({ ruleId: 'some-rule', level: 'breaking' })).rejects.toBeInstanceOf(FakeExit); + expect(errors.join('\n')).toMatch(/--level must be one of: safe, humanreview, unsafe/); + }); + + it('persists a correction to the colocated shared analysis file for a local ruleset', async () => { + const rulesetPath = join(workDir, 'minimal.yaml'); + copyFileSync(resolve(FIXTURES, 'rulesets/minimal.yaml'), rulesetPath); + + await runRulesetAnalysisCorrect({ ruleId: 'test-rule', level: 'safe', rulesetPath, format: 'json' }); + expect(existsSync(`${rulesetPath}.remediation-safety.json`)).toBe(true); + const body = JSON.parse(logs[0]); + expect(body.written).toBe('shared'); + }); + + it('prints shared-file content when falling back to a personal override for the built-in ruleset', async () => { + await runRulesetAnalysisCorrect({ ruleId: 'operation-description', level: 'unsafe', format: 'human' }); + expect(logs.join('\n')).toMatch(/not locally writable/); + }); +}); From 2de9c26479d44f24c5f732c4c1ad90f895f5e5ef Mon Sep 17 00:00:00 2001 From: DawMatt Date: Thu, 25 Jun 2026 07:54:04 +1000 Subject: [PATCH 10/22] Safety assessment supports default rulesets --- .../scripts/generate-bundled-analysis.mjs | 143 ++--- .../api-grade-core/src/remediation-safety.ts | 95 ++- .../rulesets/bundled-analysis/asyncapi.json | 578 ++++++++++++++--- .../rulesets/bundled-analysis/openapi.json | 581 ++++++++++++++++-- specs/012-remediation-safety/quickstart.md | 8 +- 5 files changed, 1159 insertions(+), 246 deletions(-) diff --git 
a/packages/api-grade-core/scripts/generate-bundled-analysis.mjs b/packages/api-grade-core/scripts/generate-bundled-analysis.mjs index 8ce374a..7ab0bcc 100644 --- a/packages/api-grade-core/scripts/generate-bundled-analysis.mjs +++ b/packages/api-grade-core/scripts/generate-bundled-analysis.mjs @@ -1,108 +1,71 @@ -// Maintenance utility: regenerates the seeded entries in -// src/rulesets/bundled-analysis/{openapi,asyncapi}.json from the curated rule-id lists -// (FR-012/FR-020). Run manually after bumping @stoplight/spectral-rulesets or editing the -// curated lists below; does not run as part of the package build. +// Maintenance utility: regenerates src/rulesets/bundled-analysis/{openapi,asyncapi}.json by +// running the real analyseRuleset() Stage 1/2 engine (dist/remediation-safety.js) over the +// entire built-in OpenAPI/AsyncAPI rulesets, so the built-in ruleset's analysis never requires +// per-rule computation at request time (SC-007). // -// IMPORTANT: these entries are assessedBy: "automated" — they are a seeded, no-human-in-the-loop -// classification, not a maintainer's reviewed judgement. Per the data model, assessedBy: "human" -// is reserved for a classification an actual person has explicitly reviewed and persisted (e.g. -// via `ruleset-analysis correct`). Do not flip these to "human" without a real maintainer review. -import { createHash } from 'node:crypto'; +// Run manually after bumping @stoplight/spectral-rulesets, after changing the analyser's +// heuristic, or after a maintainer reviews and wants to seed a rule's classification (see the +// HUMAN_REVIEWED table below). Requires `npm run build` to have run first (reads dist/). +// +// IMPORTANT: entries are assessedBy: "automated" unless the rule id is listed in +// HUMAN_REVIEWED below — that table exists for a maintainer to record an actual review (FR-020), +// not as a place to seed a guess. 
Do not add a rule to HUMAN_REVIEWED unless a person has +// actually read that rule's definition and confirmed the classification. import { writeFileSync } from 'node:fs'; import { dirname, join } from 'node:path'; import { fileURLToPath } from 'node:url'; import { oas, asyncapi } from '@stoplight/spectral-rulesets'; +import { analyseRuleset, computeRuleFingerprint } from '../dist/remediation-safety.js'; const __dirname = dirname(fileURLToPath(import.meta.url)); const outDir = join(__dirname, '..', 'src', 'rulesets', 'bundled-analysis'); -function givenExprsOf(rule) { - if (!rule.given) return []; - return Array.isArray(rule.given) ? rule.given : [rule.given]; -} +// A maintainer who has actually read a rule's definition and confirmed its classification +// records it here: { [ruleId]: { remediationSafetyLevel, rationale } }. Empty today — no rule +// has been through real human review yet. +const HUMAN_REVIEWED = {}; -function functionNamesOf(rule) { - const then = rule.then; - if (!then) return []; - const thens = Array.isArray(then) ? then : [then]; - return thens.map((t) => t?.function).filter((f) => typeof f === 'string'); -} - -function computeRuleFingerprint(ruleId, rule) { - const given = givenExprsOf(rule).join(','); - const fn = functionNamesOf(rule).join(','); - const severity = String(rule.severity ?? ''); - const description = rule.description ?? ''; - const raw = `${ruleId}|${given}|${fn}|${severity}|${description}`; - return createHash('sha256').update(raw).digest('hex'); -} +async function generate(ruleset, fileName) { + // rulesetSource: 'custom' (not 'default') so analyseRuleset() doesn't try to read this very + // file via its own Stage 0 bundled-lookup branch while we're regenerating it. + const loadedRuleset = { ruleset, rulesetSource: 'custom' }; + const analysis = await analyseRuleset(loadedRuleset); -// Curated rule-id -> classification, migrated from the former hard-coded -// quick_fixes_algorithm_spec.md tables. 
Maintainers add entries here as the project encounters -// new well-known rules; this is a config-only change, not an algorithm change. -const CURATED = { - safe: [ - 'operation-description', - 'operation-summary', - 'info-contact', - 'info-description', - 'info-license', - 'oas3-examples-value-or-externalValue', - 'tag-description', - 'asyncapi-info-contact', - 'asyncapi-info-description', - 'asyncapi-info-license', - 'asyncapi-operation-description', - 'asyncapi-3-operation-description', - 'asyncapi-tag-description', - 'asyncapi-3-tag-description', - 'asyncapi-parameter-description', - ], - humanreview: [ - 'operation-operationId', - 'operation-success-response', - 'oas3-server-not-example.com', - 'oas3-server-trailing-slash', - 'oas3-operation-security-defined', - 'oas2-operation-security-defined', - 'asyncapi-operation-operationId', - 'asyncapi-server-not-example-com', - 'asyncapi-3-server-not-example-com', - 'asyncapi-operation-security', - 'asyncapi-3-operation-security', - ], - unsafe: ['oas3-schema', 'oas3-valid-schema-example', 'oas2-schema', 'asyncapi-schema', 'asyncapi-payload'], -}; - -const RATIONALE = { - safe: 'seeded safe classification (bundled default, not yet human-reviewed)', - humanreview: 'seeded humanreview classification (bundled default, not yet human-reviewed)', - unsafe: 'seeded unsafe classification (bundled default, not yet human-reviewed)', -}; - -function generate(ruleset, fileName) { const rules = {}; - for (const [level, ruleIds] of Object.entries(CURATED)) { - for (const ruleId of ruleIds) { - const rule = ruleset.rules[ruleId]; - if (!rule) continue; // not present in this ruleset (e.g. 
an asyncapi-only id checked against oas) - rules[ruleId] = { - ruleId, - riskLevel: null, - confidenceLevel: 'high', - remediationSafetyLevel: level, - assessedBy: 'automated', - staleFingerprintWarning: null, - rationale: RATIONALE[level], - source: 'bundled-default', - fingerprint: computeRuleFingerprint(ruleId, rule), - }; - } + for (const ruleAnalysis of analysis.rules) { + const { ruleId } = ruleAnalysis; + const reviewed = HUMAN_REVIEWED[ruleId]; + const fingerprint = computeRuleFingerprint(ruleId, ruleset.rules[ruleId]); + + rules[ruleId] = reviewed + ? { + ruleId, + riskLevel: null, + confidenceLevel: 'high', + remediationSafetyLevel: reviewed.remediationSafetyLevel, + assessedBy: 'human', + staleFingerprintWarning: null, + rationale: reviewed.rationale, + source: 'bundled-default', + fingerprint, + } + : { + ruleId, + riskLevel: ruleAnalysis.riskLevel, + confidenceLevel: ruleAnalysis.confidenceLevel, + remediationSafetyLevel: ruleAnalysis.remediationSafetyLevel, + assessedBy: 'automated', + staleFingerprintWarning: null, + rationale: ruleAnalysis.rationale, + source: 'bundled-default', + fingerprint, + }; } + const outPath = join(outDir, fileName); writeFileSync(outPath, JSON.stringify({ rules }, null, 2) + '\n', 'utf-8'); console.log(`Wrote ${outPath} (${Object.keys(rules).length} entries)`); } -generate(oas, 'openapi.json'); -generate(asyncapi, 'asyncapi.json'); +await generate(oas, 'openapi.json'); +await generate(asyncapi, 'asyncapi.json'); diff --git a/packages/api-grade-core/src/remediation-safety.ts b/packages/api-grade-core/src/remediation-safety.ts index 4f6afc6..9ae10e0 100644 --- a/packages/api-grade-core/src/remediation-safety.ts +++ b/packages/api-grade-core/src/remediation-safety.ts @@ -37,7 +37,10 @@ interface StageResult { } interface SpectralThen { - function?: string; + // Spectral resolves `then.function` to an actual function reference once a ruleset is + // loaded (e.g. 
via @stoplight/spectral-rulesets); only hand-authored YAML rulesets parsed + // before bundling carry it as a plain string. Both forms must be handled. + function?: string | { name?: string }; field?: string; } @@ -114,23 +117,77 @@ function givenExprsOf(rule: SpectralRule): string[] { return Array.isArray(rule.given) ? rule.given : [rule.given]; } +// Spectral's built-in rulesets (and many custom ones) express `given` via macro aliases — +// e.g. "#OperationObject" — rather than literal JSONPath. An alias resolves to one or more +// JSONPath expressions, declared at the ruleset level either as a plain array or as +// { targets: [{ given: [...] }] }, and may itself reference other aliases recursively +// (e.g. "OperationObject" -> "#PathItem[get,put,...]" -> "$.paths[*]"). Without resolving +// these, segment/key-selector matching never sees a real path for most built-in rules. +type AliasDefinition = string[] | { targets?: Array<{ given?: string | string[] }> }; +type AliasMap = Record; + +const ALIAS_REF_RE = /^#([A-Za-z0-9_]+)(.*)$/; + +function resolveGivenExpr(expr: string, aliases: AliasMap, depth = 0): string[] { + if (depth > 10) return [expr]; + const match = ALIAS_REF_RE.exec(expr.trim()); + if (!match) return [expr]; + const [, aliasName, suffix] = match; + const aliasDef = aliases[aliasName]; + if (!aliasDef) return [expr]; + + const bases = Array.isArray(aliasDef) + ? aliasDef + : (aliasDef.targets ?? []).flatMap((t) => (Array.isArray(t.given) ? t.given : t.given ? [t.given] : [])); + + const resolved: string[] = []; + for (const base of bases) { + resolved.push(...resolveGivenExpr(`${base}${suffix}`, aliases, depth + 1)); + } + return resolved.length > 0 ? 
resolved : [expr]; +} + +function resolvedGivenExprsOf(rule: SpectralRule, aliases: AliasMap): string[] { + return givenExprsOf(rule).flatMap((expr) => resolveGivenExpr(expr, aliases)); +} + +function functionNameOf(fn: SpectralThen['function']): string | undefined { + if (typeof fn === 'string') return fn; + if (typeof fn === 'function') return (fn as { name?: string }).name || undefined; + if (fn && typeof fn === 'object' && typeof fn.name === 'string') return fn.name; + return undefined; +} + function functionNamesOf(rule: SpectralRule): string[] { const then = rule.then; if (!then) return []; const thens = Array.isArray(then) ? then : [then]; - return thens.map((t) => t?.function).filter((f): f is string => typeof f === 'string'); + return thens.map((t) => functionNameOf(t?.function)).filter((f): f is string => typeof f === 'string' && f.length > 0); } -function matchedTiers(givenExprs: string[]): Set { +// `then.field` names the specific sub-field a function actually targets (e.g. given +// "#OperationObject", field "operationId") — segment matching must consider it alongside +// `given`, since two rules sharing the same `given` (e.g. "operationId" vs "description" on +// the same OperationObject) are only distinguishable by their field. +function fieldTokensOf(rule: SpectralRule): string[] { + const then = rule.then; + if (!then) return []; + const thens = Array.isArray(then) ? then : [then]; + return thens.flatMap((t) => (typeof t?.field === 'string' ? 
tokenize(t.field) : [])); +} + +function matchedTiers(givenExprs: string[], extraSegments: string[] = []): Set { const tiers = new Set(); + const scan = (segment: string): void => { + if (segment.startsWith('x-')) tiers.add('safe'); + if (UNSAFE_SEGMENTS.has(segment)) tiers.add('unsafe'); + if (HUMANREVIEW_SEGMENTS.has(segment)) tiers.add('humanreview'); + if (SAFE_SEGMENTS.has(segment)) tiers.add('safe'); + }; for (const given of givenExprs) { - for (const segment of tokenize(given)) { - if (segment.startsWith('x-')) tiers.add('safe'); - if (UNSAFE_SEGMENTS.has(segment)) tiers.add('unsafe'); - if (HUMANREVIEW_SEGMENTS.has(segment)) tiers.add('humanreview'); - if (SAFE_SEGMENTS.has(segment)) tiers.add('safe'); - } + for (const segment of tokenize(given)) scan(segment); } + for (const segment of extraSegments) scan(segment); return tiers; } @@ -164,9 +221,9 @@ function stage1a(givenExprs: string[]): StageResult | null { } // Stage 1b: classify by the rule's `then.function` mechanics. -function stage1b(givenExprs: string[], functionNames: string[]): StageResult | null { +function stage1b(givenExprs: string[], functionNames: string[], fieldTokens: string[]): StageResult | null { if (functionNames.length === 0) return null; - const tiers = matchedTiers(givenExprs); + const tiers = matchedTiers(givenExprs, fieldTokens); for (const fn of functionNames) { if (ADDITIVE_FUNCTIONS.has(fn)) { @@ -206,8 +263,8 @@ function stage1b(givenExprs: string[], functionNames: string[]): StageResult | n } // Stage 1c: generic segment-membership fallback within Stage 1. 
-function stage1c(givenExprs: string[]): StageResult | null { - const tiers = matchedTiers(givenExprs); +function stage1c(givenExprs: string[], fieldTokens: string[]): StageResult | null { + const tiers = matchedTiers(givenExprs, fieldTokens); const tier = mostConservativeTier(tiers); if (tier === null) return null; const riskLevel = tierToRisk(tier); @@ -226,14 +283,15 @@ const STAGE2_FALLBACK: StageResult = { source: 'fallback', }; -function classifyRuleStages1And2(rule: SpectralRule): StageResult { - const givenExprs = givenExprsOf(rule); +function classifyRuleStages1And2(rule: SpectralRule, aliases: AliasMap): StageResult { + const givenExprs = resolvedGivenExprsOf(rule, aliases); + const fieldTokens = fieldTokensOf(rule); const a = stage1a(givenExprs); if (a) return a; const functionNames = functionNamesOf(rule); - const b = stage1b(givenExprs, functionNames); + const b = stage1b(givenExprs, functionNames, fieldTokens); if (b) return b; - const c = stage1c(givenExprs); + const c = stage1c(givenExprs, fieldTokens); if (c) return c; return STAGE2_FALLBACK; } @@ -314,6 +372,7 @@ export async function analyseRuleset( options?: { auth?: AuthConfig | null } ): Promise { const rulesMap = (loadedRuleset.ruleset?.rules ?? {}) as Record; + const aliases = (loadedRuleset.ruleset?.aliases ?? 
{}) as AliasMap; const ruleIds = Object.keys(rulesMap); const isBuiltIn = loadedRuleset.rulesetSource === 'default'; @@ -336,7 +395,7 @@ export async function analyseRuleset( ]); if (stage0) return stage0; - const { riskLevel, confidenceLevel, rationale, source } = classifyRuleStages1And2(rule); + const { riskLevel, confidenceLevel, rationale, source } = classifyRuleStages1And2(rule, aliases); return { ruleId, riskLevel, diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json b/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json index 7f1a850..bc0f47a 100644 --- a/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json +++ b/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json @@ -1,169 +1,609 @@ { "rules": { + "asyncapi-channel-no-empty-parameter": { + "ruleId": "asyncapi-channel-no-empty-parameter", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "source": "bundled-default", + "fingerprint": "faef5a6495551bbb7fa3f2d0d85ba9e4703ba0fe0cbccf3c40a8c502ba81f98b" + }, + "asyncapi-3-channel-no-empty-parameter": { + "ruleId": "asyncapi-3-channel-no-empty-parameter", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the high tier", + "source": "bundled-default", + "fingerprint": "7f20f52e4806ac9b686d5530ed6e7ff83965a6fc3afb47a2391ffad96d227031" + }, + "asyncapi-channel-no-query-nor-fragment": { + "ruleId": "asyncapi-channel-no-query-nor-fragment", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": 
"`pattern` function (rename/reformat) on a target matching the medium tier", + "source": "bundled-default", + "fingerprint": "73eff0af4d20d3818101af39362816099cd3a524492d55d905d50b2a3c3cbff5" + }, + "asyncapi-3-channel-no-query-nor-fragment": { + "ruleId": "asyncapi-3-channel-no-query-nor-fragment", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the high tier", + "source": "bundled-default", + "fingerprint": "fbcc987e5a67e86b9220fee87b3226bd0b6d923fec9eb169c4aaad7d31b611ac" + }, + "asyncapi-channel-no-trailing-slash": { + "ruleId": "asyncapi-channel-no-trailing-slash", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "source": "bundled-default", + "fingerprint": "6e35de3017f8bd11b0aeefee5728c0a7d1d31af3765c0345f79c89c6d87a2953" + }, + "asyncapi-3-channel-no-trailing-slash": { + "ruleId": "asyncapi-3-channel-no-trailing-slash", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the high tier", + "source": "bundled-default", + "fingerprint": "f3bc3a732e2ceb56f9ab5cfd9b75a12399480b456916d70cc609661a49c5868a" + }, + "asyncapi-channel-parameters": { + "ruleId": "asyncapi-channel-parameters", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `asyncApiChannelParameters` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": 
"546d72f461b4d23e69fc9ea1785691431fc74a4e799a9f35d766d18dbc45ecf1" + }, + "asyncapi-channel-servers": { + "ruleId": "asyncapi-channel-servers", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `asyncApi2ChannelServers` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "501921363ba8348262034651379c72bde32f843bc1a06d1cce4807ba0c98ca7d" + }, + "asyncapi-3-channel-servers": { + "ruleId": "asyncapi-3-channel-servers", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "source": "bundled-default", + "fingerprint": "f41d5f90903b2d7d6a4c32d07522249fd76bac6c3f4092bff84761c1b0b439c3" + }, + "asyncapi-headers-schema-type-object": { + "ruleId": "asyncapi-headers-schema-type-object", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched multiple tiers (unsafe, humanreview) — conservative match, ambiguous", + "source": "bundled-default", + "fingerprint": "aecc325f9987b406d8cb954032bc0ed99526c56f30c01acbe0265f3120347bb7" + }, + "asyncapi-3-headers-schema-type-object": { + "ruleId": "asyncapi-3-headers-schema-type-object", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched multiple tiers (unsafe, humanreview) — conservative match, ambiguous", + "source": "bundled-default", + "fingerprint": "d8a064576204e0cb9fec501dadb6c7a9b6088cd1211e8a1879296eabe6e17da9" + }, + "asyncapi-info-contact-properties": { + "ruleId": 
"asyncapi-info-contact-properties", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "720f6ac8c0660070d880cfbf10a400c8d47372d10666ff823b959accdeb0afc8" + }, "asyncapi-info-contact": { "ruleId": "asyncapi-info-contact", - "riskLevel": null, + "riskLevel": "low", "confidenceLevel": "high", "remediationSafetyLevel": "safe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", "source": "bundled-default", - "fingerprint": "14a8d82b77003a5eee46743da7abe3078cf8a0cfcb6b4278b542c5f7137e0f9f" + "fingerprint": "40da564631e6b29b1e7f8d7eb4452e87d3e39c0cd98ca71b865b83a666f3fb17" }, "asyncapi-info-description": { "ruleId": "asyncapi-info-description", - "riskLevel": null, + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "9d50c4f6e3f86a323ad186cc157699d9336a68b33b3d6d1ffe41e48852443ffe" + }, + "asyncapi-info-license-url": { + "ruleId": "asyncapi-info-license-url", + "riskLevel": "low", "confidenceLevel": "high", "remediationSafetyLevel": "safe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", "source": "bundled-default", - "fingerprint": 
"f29927455a85a03534943e6e54e4e109c17fe795ea43ebb516ee5a2785b4eaab" + "fingerprint": "4539169bb97df454320a01bdc44ff8e717d2e6a02baf9e3da1543f45108a61d1" }, "asyncapi-info-license": { "ruleId": "asyncapi-info-license", - "riskLevel": null, + "riskLevel": "low", "confidenceLevel": "high", "remediationSafetyLevel": "safe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "564c2cc47a5d30e0fcb8190c317617f3085b480df2d9d04fc154d171380b3404" + }, + "asyncapi-latest-version": { + "ruleId": "asyncapi-latest-version", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "no recognizable rule-id, function, or path signal", + "source": "bundled-default", + "fingerprint": "e51e3e20cc1d8725ffc63beaa95d81c229652871f7355d452d8d3b1a7457f2bd" + }, + "asyncapi-message-examples": { + "ruleId": "asyncapi-message-examples", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `asyncApi2MessageExamplesValidation` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "5cc61f794e110895be4e9998553f3a772c1ed5482703f754c05595732e22d438" + }, + "asyncapi-message-messageId-uniqueness": { + "ruleId": "asyncapi-message-messageId-uniqueness", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `asyncApi2MessageIdUniqueness` — mechanics cannot be inferred statically", "source": "bundled-default", - "fingerprint": 
"e1f25b8df806e73c25c4d0fd59eec7b3d3e19847ea7060d419d471b67b3ef0d4" + "fingerprint": "406375ccdcedf2a4bdcee791741bc01313b5be03a9c211db25945a4aec6f41b3" }, "asyncapi-operation-description": { "ruleId": "asyncapi-operation-description", - "riskLevel": null, - "confidenceLevel": "high", - "remediationSafetyLevel": "safe", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the medium tier", "source": "bundled-default", - "fingerprint": "4afab5ea1e1eff4077756b1a08e737134a18573f32939331626000ba91ee833c" + "fingerprint": "ab7f3fff78bcbb5051374ab1644d3f2312d91bc081d9a7cb5210f33ac745df86" }, "asyncapi-3-operation-description": { "ruleId": "asyncapi-3-operation-description", - "riskLevel": null, - "confidenceLevel": "high", - "remediationSafetyLevel": "safe", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the medium tier", "source": "bundled-default", - "fingerprint": "c5b8b035e99fdfe7a8caca6a4cd4db7a1ee538fa91a9ed70e7d02957272a7d46" + "fingerprint": "9db022057bde86c48e4fd30348bf498c995d7dcbf1cba7a2ab77588ec38ec468" }, - "asyncapi-tag-description": { - "ruleId": "asyncapi-tag-description", - "riskLevel": null, - "confidenceLevel": "high", - "remediationSafetyLevel": "safe", + "asyncapi-operation-operationId-uniqueness": { + "ruleId": "asyncapi-operation-operationId-uniqueness", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", "assessedBy": "automated", 
"staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "custom function `asyncApi2OperationIdUniqueness` — mechanics cannot be inferred statically", "source": "bundled-default", - "fingerprint": "a5404f455c18ca7a9e0b06de566c875a1dcdf90dace94ea07d9a89a7b9c3c978" + "fingerprint": "ef6c87aaf69cec04cb0d91955ae4371d0b83df17954c3f35cb73e04a93c43080" }, - "asyncapi-3-tag-description": { - "ruleId": "asyncapi-3-tag-description", - "riskLevel": null, + "asyncapi-operation-operationId": { + "ruleId": "asyncapi-operation-operationId", + "riskLevel": "medium", "confidenceLevel": "high", - "remediationSafetyLevel": "safe", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the medium tier", + "source": "bundled-default", + "fingerprint": "0b46aef66a0b399e3c55e73100ba41afbb42c3479f2d073dbb57966760b4602a" + }, + "asyncapi-operation-security": { + "ruleId": "asyncapi-operation-security", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `asyncApiSecurity` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "6560182f944e99a49785c408312a0b8dd7c74c236693c7dd197d2ec19c765191" + }, + "asyncapi-3-operation-security": { + "ruleId": "asyncapi-3-operation-security", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "custom function `asyncApiSecurity` — mechanics cannot be inferred statically", "source": "bundled-default", - "fingerprint": 
"9396925fe26ba8c8ae085204e4d48ab0ca2ffc88e677c731192528cd5b70c048" + "fingerprint": "a90de728f8e6ce307bbdfaf172004d4e70bdaf20dc91c619312b63b57587eae2" }, "asyncapi-parameter-description": { "ruleId": "asyncapi-parameter-description", - "riskLevel": null, + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the high tier", + "source": "bundled-default", + "fingerprint": "ea3bea998a82498d71473dd6a0babd13afe45883b80fad10cf05c9743bcab37e" + }, + "asyncapi-payload-default": { + "ruleId": "asyncapi-payload-default", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `asyncApiSchemaValidation` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "06e469803d27d75bb24daf00d492172c5f94fb119614fcbf28577f9939ba9585" + }, + "asyncapi-payload-examples": { + "ruleId": "asyncapi-payload-examples", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `asyncApiSchemaValidation` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "8b1f583bcbbf360262fd9a906a367202f1d83910632b5c39d7550dcd75d888b8" + }, + "asyncapi-payload-unsupported-schemaFormat": { + "ruleId": "asyncapi-payload-unsupported-schemaFormat", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched multiple tiers (unsafe, humanreview) — conservative match, ambiguous", + "source": "bundled-default", + "fingerprint": 
"70806490c4dcacc9e31a8391ef69dd0304466bd2b6812d41b6806315b6e46a41" + }, + "asyncapi-3-payload-unsupported-schemaFormat": { + "ruleId": "asyncapi-3-payload-unsupported-schemaFormat", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched multiple tiers (unsafe, humanreview) — conservative match, ambiguous", + "source": "bundled-default", + "fingerprint": "a7efbd604af1044c108ed1579e2895c54818373add14e9e0992e17dc55690cc0" + }, + "asyncapi-payload": { + "ruleId": "asyncapi-payload", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `asyncApiPayloadValidation` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "be86ce0d6e6f458480dd16a3325a73dfa034ba8cab33a3a1f2fa4008a0cae0d6" + }, + "asyncapi-schema-default": { + "ruleId": "asyncapi-schema-default", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `asyncApiSchemaValidation` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "53561388f54e7f445a8c3e6aa246421a07a1d4006dfffa859fe6184b513ce06c" + }, + "asyncapi-schema-examples": { + "ruleId": "asyncapi-schema-examples", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `asyncApiSchemaValidation` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "0d83ca445e8b385d3bde04a8516d6f16127043e8bd7ee455aa1979a000cdd379" + }, + "asyncapi-schema": { + "ruleId": "asyncapi-schema", + "riskLevel": "high", + "confidenceLevel": "low", + 
"remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "no recognizable rule-id, function, or path signal", + "source": "bundled-default", + "fingerprint": "2447324def94a87a5cd8a6ce78b8674b0dd401c06d8f6cf48a4fee7c5547a15f" + }, + "asyncapi-server-variables": { + "ruleId": "asyncapi-server-variables", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `serverVariables` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "a86cf9c53c3ed6f23c470d150edf449fe2af6bc3bb2cf00f6120dee1e15fbc37" + }, + "asyncapi-server-no-empty-variable": { + "ruleId": "asyncapi-server-no-empty-variable", + "riskLevel": "medium", "confidenceLevel": "high", - "remediationSafetyLevel": "safe", + "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", "source": "bundled-default", - "fingerprint": "2c6ec4c507b68b273f5eb7c9917fba37fd61b8acc1ba4fc5df782bf3c2de8166" + "fingerprint": "995dd4127b0111b66a8ccd95296c472a7ea005236c7bfb64abfdb9938dfaa45e" }, - "asyncapi-operation-operationId": { - "ruleId": "asyncapi-operation-operationId", - "riskLevel": null, + "asyncapi-3-server-no-empty-variable": { + "ruleId": "asyncapi-3-server-no-empty-variable", + "riskLevel": "medium", "confidenceLevel": "high", "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", "source": "bundled-default", - "fingerprint": 
"f607ca3c154f8db8839c2c0913d270d15ee3ea8aa244942f66f9b39e268da486" + "fingerprint": "c03357aacd4594dd77bd6823adeb8d811b559a59eb591c242ce499ec68064cd7" + }, + "asyncapi-server-no-trailing-slash": { + "ruleId": "asyncapi-server-no-trailing-slash", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "source": "bundled-default", + "fingerprint": "73217cc58744bd1271f647e40904a6b5f65919124927e109532cf19cb540d2a2" + }, + "asyncapi-3-server-no-trailing-slash": { + "ruleId": "asyncapi-3-server-no-trailing-slash", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "source": "bundled-default", + "fingerprint": "dbd318934d060020a4b9dd9ca039da253d8a9f5c5803c29997461ec59407c12d" }, "asyncapi-server-not-example-com": { "ruleId": "asyncapi-server-not-example-com", - "riskLevel": null, + "riskLevel": "medium", "confidenceLevel": "high", "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", "source": "bundled-default", - "fingerprint": "03af955c89ab648c2ccfad4f7276d0c41d0a7b8557906723d91b9fc77310b55d" + "fingerprint": "6d16fc8ff55cfbbd2bce5a0ea27a6f480ef42bb7b3affe275bcabd86bd7d7b9c" }, "asyncapi-3-server-not-example-com": { "ruleId": "asyncapi-3-server-not-example-com", - "riskLevel": null, + "riskLevel": "medium", "confidenceLevel": "high", "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - 
"rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", "source": "bundled-default", - "fingerprint": "979ffad6a6a767b3324a1c288a2fde45513e2fcab0e45f5ff93a3bc2ddb37533" + "fingerprint": "4ee30c83abcb62482873a1ce7892a7b20fc10af9f6b05fb7d542701bd3297eb1" }, - "asyncapi-operation-security": { - "ruleId": "asyncapi-operation-security", - "riskLevel": null, - "confidenceLevel": "high", + "asyncapi-server-security": { + "ruleId": "asyncapi-server-security", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `asyncApiSecurity` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "be7582c1a1a3477d540e542f3108e47fa54227c518701511b2ec6845841262b6" + }, + "asyncapi-servers": { + "ruleId": "asyncapi-servers", + "riskLevel": "medium", + "confidenceLevel": "medium", "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "rationale": "given path matched the humanreview segment set", "source": "bundled-default", - "fingerprint": "167a6e72e70912bf806a952b5b7bcc172cfc47f91c9f47e1cfbcfec11921adab" + "fingerprint": "2e0f0eb538127a28f1225999b2b3f64cd883d2cfdadcdacdc1fcbdd506647864" }, - "asyncapi-3-operation-security": { - "ruleId": "asyncapi-3-operation-security", - "riskLevel": null, + "asyncapi-tag-description": { + "ruleId": "asyncapi-tag-description", + "riskLevel": "low", "confidenceLevel": "high", - "remediationSafetyLevel": "humanreview", + "remediationSafetyLevel": "safe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "rationale": 
"`truthy` function (additive — add/populate a field) on a target matching the low tier", "source": "bundled-default", - "fingerprint": "c4b3dffd9fbdeb5970c2f3eceab232ce2ac835d1b61e9c1c4f6d4d873cc670bc" + "fingerprint": "473d407d2bfd3d2b87255230584e1d84a2593bdf3da8564188a3b4ad89523f4c" }, - "asyncapi-schema": { - "ruleId": "asyncapi-schema", - "riskLevel": null, + "asyncapi-3-tag-description": { + "ruleId": "asyncapi-3-tag-description", + "riskLevel": "low", "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "73c4edb519ef9fedc1f27aec4e998c494655c83cef71aec29ac8b3081a95166f" + }, + "asyncapi-tags-alphabetical": { + "ruleId": "asyncapi-tags-alphabetical", + "riskLevel": "low", + "confidenceLevel": "medium", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the safe segment set", + "source": "bundled-default", + "fingerprint": "4be99ae3283c046c2bb1a4230f04773fdfcc70e86cb407e0b98a4774bab3fc7b" + }, + "asyncapi-3-tags-alphabetical": { + "ruleId": "asyncapi-3-tags-alphabetical", + "riskLevel": "low", + "confidenceLevel": "medium", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the safe segment set", + "source": "bundled-default", + "fingerprint": "ac440f78a0fcf503a6e8edd5122cd24c24f8876da489497f2496c753532f5721" + }, + "asyncapi-tags-uniqueness": { + "ruleId": "asyncapi-tags-uniqueness", + "riskLevel": "high", + "confidenceLevel": "low", "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded unsafe classification (bundled default, not yet human-reviewed)", + "rationale": "custom function `uniquenessTags` — mechanics 
cannot be inferred statically", "source": "bundled-default", - "fingerprint": "2447324def94a87a5cd8a6ce78b8674b0dd401c06d8f6cf48a4fee7c5547a15f" + "fingerprint": "e44055bacacef48a6edabbe0f1234dbaea5dd767771fc04ddfe50f44e3c9b16e" }, - "asyncapi-payload": { - "ruleId": "asyncapi-payload", - "riskLevel": null, + "asyncapi-3-tags-uniqueness": { + "ruleId": "asyncapi-3-tags-uniqueness", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `uniquenessTags` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "c84b94e3b094e2d7c179172f8fa23a4a3109edba85d059b6fcc041076fd39436" + }, + "asyncapi-tags": { + "ruleId": "asyncapi-tags", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "25d9843051c90a80ec8b2ac7abb63b36c62cf7189778bf775e7862459490ce6b" + }, + "asyncapi-3-tags": { + "ruleId": "asyncapi-3-tags", + "riskLevel": "low", "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "61bd1ce568ee9740fb1909b0b9e1d10c4523ce3f7d920590268b13dbe830d6f1" + }, + "asyncapi-unused-components-schema": { + "ruleId": "asyncapi-unused-components-schema", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "no recognizable rule-id, function, or path signal", + "source": "bundled-default", + "fingerprint": 
"ad2114ff35932d819129dd83aacc5af5b998faab8c59cf8a3628f97f2cd7696f" + }, + "asyncapi-unused-components-server": { + "ruleId": "asyncapi-unused-components-server", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the humanreview segment set", + "source": "bundled-default", + "fingerprint": "9b139f05f9c36f82841256dd8df2c4558fc8568f7051698347cb8723fc22b2b0" + }, + "asyncapi-3-document-resolved": { + "ruleId": "asyncapi-3-document-resolved", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "no recognizable rule-id, function, or path signal", + "source": "bundled-default", + "fingerprint": "000fa2e885172a0f0776e07e8b1ac25ed6e8993f5d9e4eff7f95afb82e8269ac" + }, + "asyncapi-3-document-unresolved": { + "ruleId": "asyncapi-3-document-unresolved", + "riskLevel": "high", + "confidenceLevel": "low", "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded unsafe classification (bundled default, not yet human-reviewed)", + "rationale": "no recognizable rule-id, function, or path signal", "source": "bundled-default", - "fingerprint": "58be623c3acad195a9cafe9026928924f2701753c2ce5d4c2e94105aeb93668a" + "fingerprint": "4b40f541a13e64f709ff52b761001fd6e6bd0fbd1f99188b09cc498079aceb6c" } } } diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json b/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json index dcc6bbf..3c079e0 100644 --- a/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json +++ b/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json @@ -1,169 +1,620 @@ { "rules": { - "operation-description": { - "ruleId": "operation-description", - "riskLevel": null, + "operation-success-response": 
{ + "ruleId": "operation-success-response", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasOpSuccessResponse` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "61fd168d9bd22c73ec4f5a8b1b0c3b7cef34953e54fe968d2bfcf652e15b9035" + }, + "oas2-operation-formData-consume-check": { + "ruleId": "oas2-operation-formData-consume-check", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasOpFormDataConsumeCheck` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "031f30e3251314ff995ec48893ac7bb71f2cd1ea501aa99206f4835e96efd905" + }, + "operation-operationId-unique": { + "ruleId": "operation-operationId-unique", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasOpIdUnique` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "193b1b016bbb37d15bb9cea1a27551b57e17d9e40144c2b7cbe43746dcc2564a" + }, + "operation-parameters": { + "ruleId": "operation-parameters", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasOpParams` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "b19bdf139c3a32e2216b0ef46ab992fccaba72810530d778dc14ffab59a1c6cf" + }, + "operation-tag-defined": { + "ruleId": "operation-tag-defined", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": 
"custom function `oasTagDefined` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "8d46785b548305e07643e22ed832b2cce697964b3e24edabce26427f203a4642" + }, + "path-params": { + "ruleId": "path-params", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasPathParam` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "3e1ede2366a6dbfac8d8bb13681255e89ec40881316ddf844ace566bb46f9499" + }, + "contact-properties": { + "ruleId": "contact-properties", + "riskLevel": "low", "confidenceLevel": "high", "remediationSafetyLevel": "safe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "d59768d6529d4838dd2ea7ef0e9ff6527c897b12acffc53b949c22ebfaa1c923" + }, + "duplicated-entry-in-enum": { + "ruleId": "duplicated-entry-in-enum", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasSchema` — mechanics cannot be inferred statically", "source": "bundled-default", - "fingerprint": "0305ec86216b7eefa356cc733bc9d7569aec013ba372cd4c571e8583aded08ec" + "fingerprint": "81a9374e76a50934f9c77375cfcd8af39454642a446e7ff0a68d94a5d98a5691" }, "info-contact": { "ruleId": "info-contact", - "riskLevel": null, + "riskLevel": "low", "confidenceLevel": "high", "remediationSafetyLevel": "safe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`truthy` function (additive — add/populate a field) on 
a target matching the low tier", "source": "bundled-default", - "fingerprint": "72e1580b37bf865c7918de14a114f28ea64b852e8cbae51ffad2351783bf9100" + "fingerprint": "bfd79d377660294e00adc373539b65c8e6b03a4c6cb29033758e421ce4f5557d" }, "info-description": { "ruleId": "info-description", - "riskLevel": null, + "riskLevel": "low", "confidenceLevel": "high", "remediationSafetyLevel": "safe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", "source": "bundled-default", - "fingerprint": "a93095e4601bcadb3ab69e44d3208cdf08f870032b30c18140fcce99905bad7a" + "fingerprint": "f898530e669aaddf554b30a434a28412d61f3f47e196e4093cf73acfcde645e6" }, "info-license": { "ruleId": "info-license", - "riskLevel": null, + "riskLevel": "low", "confidenceLevel": "high", "remediationSafetyLevel": "safe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", "source": "bundled-default", - "fingerprint": "f265560911a15d556ded4d9c321dbc40cbcae30680021591fda50ab0d1b34114" + "fingerprint": "23d1efd67b85ecf0f54155777e27c019d4cb2a29bb0cff11efd849cc386c379f" }, - "oas3-examples-value-or-externalValue": { - "ruleId": "oas3-examples-value-or-externalValue", - "riskLevel": null, + "license-url": { + "ruleId": "license-url", + "riskLevel": "low", "confidenceLevel": "high", "remediationSafetyLevel": "safe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", "source": "bundled-default", - "fingerprint": 
"7628f3c64ad0fb1e3b5791edb30e002eda5e0629e0cc432c8bb286f5363dd0aa" + "fingerprint": "735eb6f03143a91473b9573ab7e1526c94c10543ced941a88183a65b4d6c8d82" }, - "tag-description": { - "ruleId": "tag-description", - "riskLevel": null, + "no-eval-in-markdown": { + "ruleId": "no-eval-in-markdown", + "riskLevel": "low", "confidenceLevel": "high", "remediationSafetyLevel": "safe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded safe classification (bundled default, not yet human-reviewed)", + "rationale": "`pattern` function (rename/reformat) on a target matching the low tier", "source": "bundled-default", - "fingerprint": "69087f9af752a6b57dd7f3e0b7aeaa12a3262f2abe6e41d796c3234627947034" + "fingerprint": "d3ca7e2f67d2e957e30655eb4c6d567998bdbbdcbf390a12e71a23fdce3157fd" + }, + "no-script-tags-in-markdown": { + "ruleId": "no-script-tags-in-markdown", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "be2229ebfd803a42d313318291544db932781c34973e60620fffda348fd0f394" + }, + "openapi-tags-alphabetical": { + "ruleId": "openapi-tags-alphabetical", + "riskLevel": "low", + "confidenceLevel": "medium", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the safe segment set", + "source": "bundled-default", + "fingerprint": "4403fea510bcccabfa5c4c1154134a92b2c61acc1421668162e163f4791790d4" + }, + "openapi-tags-uniqueness": { + "ruleId": "openapi-tags-uniqueness", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `uniquenessTags` — mechanics cannot be inferred statically", + "source": "bundled-default", 
+ "fingerprint": "dfaddcf0cbfe5b642ebcdaaeb580a9020a1db9ed6d484744e0a9088da0132aae" + }, + "openapi-tags": { + "ruleId": "openapi-tags", + "riskLevel": "low", + "confidenceLevel": "medium", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the safe segment set", + "source": "bundled-default", + "fingerprint": "995a098be9924c375e7b258224804477fe7b175f2cc70e785782672b1e3c72a7" + }, + "operation-description": { + "ruleId": "operation-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "163bd18a22ce5304b12d4d35f4c12b430d66147a04a3cd27505a9a61ddaa9dc8" }, "operation-operationId": { "ruleId": "operation-operationId", - "riskLevel": null, + "riskLevel": "medium", "confidenceLevel": "high", "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the medium tier", "source": "bundled-default", - "fingerprint": "5a4634a447eeeb9f2ce5d4f01ca7c95ed5e5249c27da82bac592abcf85759b30" + "fingerprint": "d4a660a6541790935065ad7a944cb0b3abf3a347e616d9737944fd1277065a88" }, - "operation-success-response": { - "ruleId": "operation-success-response", - "riskLevel": null, + "operation-operationId-valid-in-url": { + "ruleId": "operation-operationId-valid-in-url", + "riskLevel": "medium", "confidenceLevel": "high", "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "rationale": 
"`pattern` function (rename/reformat) on a target matching the medium tier", "source": "bundled-default", - "fingerprint": "7b9badb6a9e915bc586c7fb2b77c9ef9fd2b415bf10e0ce6bd4049f0198668e8" + "fingerprint": "e66dd1ecb7649ac12ae77fcf35f033f4d7bffc423d43e8f1ad4a3e08470d3d10" }, - "oas3-server-not-example.com": { - "ruleId": "oas3-server-not-example.com", - "riskLevel": null, + "operation-singular-tag": { + "ruleId": "operation-singular-tag", + "riskLevel": "low", + "confidenceLevel": "medium", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the safe segment set", + "source": "bundled-default", + "fingerprint": "dba7be8211664f594c6467645bea5fb72955f15e009e96808fc3d37b997627ef" + }, + "operation-tags": { + "ruleId": "operation-tags", + "riskLevel": "low", + "confidenceLevel": "medium", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the safe segment set", + "source": "bundled-default", + "fingerprint": "eaf01c3f7f1b15840d54fe8dd6e238795022e175776ee4e71e5c039d46aff1ac" + }, + "path-declarations-must-exist": { + "ruleId": "path-declarations-must-exist", + "riskLevel": "medium", "confidenceLevel": "high", "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", "source": "bundled-default", - "fingerprint": "f6b1052227e31afd8ba10018ea2a3073fd8093e18db7d0994d5d8b484a1803ea" + "fingerprint": "f43a9d6a968dc24d5717f606028c828be8ea73057afcfdcb173d6f2528335408" }, - "oas3-server-trailing-slash": { - "ruleId": "oas3-server-trailing-slash", - "riskLevel": null, + "path-keys-no-trailing-slash": { + "ruleId": "path-keys-no-trailing-slash", + "riskLevel": "medium", "confidenceLevel": 
"high", "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", "source": "bundled-default", - "fingerprint": "f75935967e02e5c13a00c31220e4a33645b248b1078c6574384caded25877ec7" + "fingerprint": "ea8d153efe6ac9cc8942c5f0292402751fa472ee5776e8a9a8dcafa7f99826a4" }, - "oas3-operation-security-defined": { - "ruleId": "oas3-operation-security-defined", - "riskLevel": null, + "path-not-include-query": { + "ruleId": "path-not-include-query", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "source": "bundled-default", + "fingerprint": "e2738a25c076ae7f0ef2187431847d76fcab2f3e3d9f543b392f1ab9fa7138c6" + }, + "tag-description": { + "ruleId": "tag-description", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "5119c1fc513d486764bfe8f50d2479aaee0d096de0e79cee023e63848df5f71c" + }, + "no-$ref-siblings": { + "ruleId": "no-$ref-siblings", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `refSiblings` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "908ccb23bbe35f2ea80f23a30cdbb6ddd56f1b62d0df483600c7f94a35a39199" + }, + "array-items": { + "ruleId": "array-items", + "riskLevel": "high", + "confidenceLevel": "high", + 
"remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the high tier", + "source": "bundled-default", + "fingerprint": "3aa712ec537da47f9024300f105626c3bd25cad6951981e4f985e89f5c3dbb98" + }, + "typed-enum": { + "ruleId": "typed-enum", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `typedEnum` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "ee1665cbcfeb49e3c434e3f04c6adcede44ec1e306f875701236dd747bd27c5e" + }, + "oas2-api-host": { + "ruleId": "oas2-api-host", + "riskLevel": "low", + "confidenceLevel": "high", + "remediationSafetyLevel": "safe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the low tier", + "source": "bundled-default", + "fingerprint": "606618dc6b0cd08c07fbe6aa8cccb58d0126d1ee387a43acb60adae316456d6d" + }, + "oas2-api-schemes": { + "ruleId": "oas2-api-schemes", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "no recognizable rule-id, function, or path signal", + "source": "bundled-default", + "fingerprint": "8341742f2c75ae195b1e699c4625a845827a82db83eca103a900ebe8bb8d220d" + }, + "oas2-discriminator": { + "ruleId": "oas2-discriminator", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasDiscriminator` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "4f852c5d830bd5f2b2705ed4f41555099dbab8d164776ecaf03dde7a66a7525d" + }, + 
"oas2-host-not-example": { + "ruleId": "oas2-host-not-example", + "riskLevel": "medium", + "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "source": "bundled-default", + "fingerprint": "411bdde4f2ea237f4c2871f690551e6f5f6de96501448981823681e2b8dc9e5e" + }, + "oas2-host-trailing-slash": { + "ruleId": "oas2-host-trailing-slash", + "riskLevel": "medium", "confidenceLevel": "high", "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "source": "bundled-default", + "fingerprint": "ab7f3cd618bb07568e09911593a5eeab78ee8c22d04a0048696320863d562a16" + }, + "oas2-parameter-description": { + "ruleId": "oas2-parameter-description", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the high tier", "source": "bundled-default", - "fingerprint": "8b717395498dd587200620a21dd258dce0573157c683f118faa08c1b9268c5dd" + "fingerprint": "a71cf43534a262ed4e89f50f8f68892d376e09e6c7575cdec3d873c58a204aa9" }, "oas2-operation-security-defined": { "ruleId": "oas2-operation-security-defined", - "riskLevel": null, + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasSecurityDefined` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "099d37d08607e8e42d0548e69312ec7e1499e8566c0d35b7e73e2ae14c7eb8ef" + }, + 
"oas2-valid-schema-example": { + "ruleId": "oas2-valid-schema-example", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasExample` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "0c45706676b8341bb78ab9224b84e55bd7603b459e42c9bfa7a4634707af3f03" + }, + "oas2-valid-media-example": { + "ruleId": "oas2-valid-media-example", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasExample` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "aa301d7f693d177527187bad6ab0080c3dbbfec2b1a7e544f72b0a5b8b24b436" + }, + "oas2-anyOf": { + "ruleId": "oas2-anyOf", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "no recognizable rule-id, function, or path signal", + "source": "bundled-default", + "fingerprint": "83239f968899d5a54b93e5f9406fd2ec2d6da0848c6139219e6900aad75ba877" + }, + "oas2-oneOf": { + "ruleId": "oas2-oneOf", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "no recognizable rule-id, function, or path signal", + "source": "bundled-default", + "fingerprint": "482d2dceaa7d6683611a89e9b8408672b68c8e027a99788b35c19ab028ea48e5" + }, + "oas2-schema": { + "ruleId": "oas2-schema", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasDocumentSchema` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": 
"b0f746a1d1d1b5f4aca6963162ac4c9d2ca2e7011302be6e03e6ad7e4163f50a" + }, + "oas2-unused-definition": { + "ruleId": "oas2-unused-definition", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "no recognizable rule-id, function, or path signal", + "source": "bundled-default", + "fingerprint": "db2cdd3c98956ad2642445a26bbcbbe616ae5e376b25f765d6646d3bc930ebb7" + }, + "oas3-api-servers": { + "ruleId": "oas3-api-servers", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the humanreview segment set", + "source": "bundled-default", + "fingerprint": "2db1e86eb4b72586a0316dc0ca473b738159b3175909eb12064ee62d9097d2a3" + }, + "oas3-examples-value-or-externalValue": { + "ruleId": "oas3-examples-value-or-externalValue", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched multiple tiers (safe, unsafe) — conservative match, ambiguous", + "source": "bundled-default", + "fingerprint": "9ade61fceb521b0eb63ad4ca568a614cc05eded709048f2da4520116006cfe61" + }, + "oas3-operation-security-defined": { + "ruleId": "oas3-operation-security-defined", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasSecurityDefined` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "99c4d02ae16e11a76b219fb92f75c67c4a3ce8150e6574f09c0c49c4c1af85cd" + }, + "oas3-parameter-description": { + "ruleId": "oas3-parameter-description", + "riskLevel": "high", + "confidenceLevel": "medium", + "remediationSafetyLevel": "unsafe", + "assessedBy": 
"automated", + "staleFingerprintWarning": null, + "rationale": "`truthy` function (additive — add/populate a field) on a target matching the high tier", + "source": "bundled-default", + "fingerprint": "2c00927d4b0b8bcb9bc3746ca56a9f65dd3bd3f06bc189ae7d51d9d630d01640" + }, + "oas3-server-not-example.com": { + "ruleId": "oas3-server-not-example.com", + "riskLevel": "medium", "confidenceLevel": "high", "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded humanreview classification (bundled default, not yet human-reviewed)", + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", "source": "bundled-default", - "fingerprint": "0873339c33c75d41bca58448b9fe29c4e263c21d17dd97309c52d53b054c0722" + "fingerprint": "006381a3500034d8448234c9a2bc150b35b8eb1de914cd3d14fa8d83a176e172" }, - "oas3-schema": { - "ruleId": "oas3-schema", - "riskLevel": null, + "oas3-server-trailing-slash": { + "ruleId": "oas3-server-trailing-slash", + "riskLevel": "medium", "confidenceLevel": "high", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "source": "bundled-default", + "fingerprint": "842d08729f0957b1bd259f16efbba29ae28dfa018339113170bde28633a0db7f" + }, + "oas3-valid-media-example": { + "ruleId": "oas3-valid-media-example", + "riskLevel": "high", + "confidenceLevel": "low", "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded unsafe classification (bundled default, not yet human-reviewed)", + "rationale": "custom function `oasExample` — mechanics cannot be inferred statically", "source": "bundled-default", - "fingerprint": "3859ddd4e60b7e4907b17e90f91d676515808af7ea23440350c975c02705a19f" + "fingerprint": "d9020856ee9e770f11c51458177ca355c8f96a93877862fbad3dc2d0af38ac54" 
}, "oas3-valid-schema-example": { "ruleId": "oas3-valid-schema-example", - "riskLevel": null, - "confidenceLevel": "high", + "riskLevel": "high", + "confidenceLevel": "low", "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded unsafe classification (bundled default, not yet human-reviewed)", + "rationale": "custom function `oasExample` — mechanics cannot be inferred statically", "source": "bundled-default", - "fingerprint": "9aa73f0cdf9afd7ea70998b65471ec31a57c8052dc580a583512edad5dc915dc" + "fingerprint": "5c7f466d75908f95e4f652d5adfc7b08d6dd9b1b2bea7c1987a0798c9a3dcd10" }, - "oas2-schema": { - "ruleId": "oas2-schema", - "riskLevel": null, - "confidenceLevel": "high", + "oas3-schema": { + "ruleId": "oas3-schema", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasDocumentSchema` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "2ec6ef11c1e00ad817c6f9b838bcb2739c4c72a699c03a631d1171dc063f4034" + }, + "oas3-unused-component": { + "ruleId": "oas3-unused-component", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `oasUnusedComponent` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": "c975b2b51bddaa30766251d29190e6a4ae6d99a2a623dcf56a171cb8ce1255a6" + }, + "oas3-server-variables": { + "ruleId": "oas3-server-variables", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "custom function `serverVariables` — mechanics cannot be inferred statically", + "source": "bundled-default", + "fingerprint": 
"d258d7ca2a322070e5314d24bdf7d8838a42cfab024ff4a3e39e560574ac8dde" + }, + "oas3-callbacks-in-callbacks": { + "ruleId": "oas3-callbacks-in-callbacks", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "no recognizable rule-id, function, or path signal", + "source": "bundled-default", + "fingerprint": "8239e6a7a792354786211a3a29afc4ab82620ae5a15dd5e384a51ed12e265bfe" + }, + "oas3_1-servers-in-webhook": { + "ruleId": "oas3_1-servers-in-webhook", + "riskLevel": "medium", + "confidenceLevel": "medium", + "remediationSafetyLevel": "humanreview", + "assessedBy": "automated", + "staleFingerprintWarning": null, + "rationale": "given path matched the humanreview segment set", + "source": "bundled-default", + "fingerprint": "557e24a270abf7b2b600bc993949675b105927fb3238b8534c493888cc73b6cd" + }, + "oas3_1-callbacks-in-webhook": { + "ruleId": "oas3_1-callbacks-in-webhook", + "riskLevel": "high", + "confidenceLevel": "low", "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "seeded unsafe classification (bundled default, not yet human-reviewed)", + "rationale": "no recognizable rule-id, function, or path signal", "source": "bundled-default", - "fingerprint": "897ddabf2e24de2b765760eae0a30e6914c10375de18e5d2dc27fe307348d23c" + "fingerprint": "b806759bcaa2062f1331821f0635ace112d38957a954e967b7308c21b56260c3" } } } diff --git a/specs/012-remediation-safety/quickstart.md b/specs/012-remediation-safety/quickstart.md index 3faf42a..2ed59c6 100644 --- a/specs/012-remediation-safety/quickstart.md +++ b/specs/012-remediation-safety/quickstart.md @@ -35,15 +35,15 @@ Each returned item now includes `riskLevel`, `confidenceLevel`, `remediationSafe ```bash api-grade ruleset-analysis --format human # rule id risk level confidence remediation safety assessed by rationale -# operation-description n/a high safe automated 
seeded safe classification (bundled default, not yet human-reviewed) -# operation-operationId n/a high humanreview automated seeded humanreview classification (bundled default, not yet human-reviewed) -# oas3-schema high low unsafe automated no recognizable rule-id, function, or path signal +# operation-description low high safe automated `truthy` function (additive — add/populate a field) on a target matching the low tier +# operation-operationId medium high humanreview automated `truthy` function (additive — add/populate a field) on a target matching the medium tier +# oas3-schema high low unsafe automated custom function `oasDocumentSchema` — mechanics cannot be inferred statically # custom-team-rule-007 low high safe human WARNING: fingerprint mismatch (stored a1b2c3..., current d4e5f6...) — rule changed since this was last reviewed; persisted classification still honored api-grade ruleset-analysis --ruleset-path ./my-ruleset.yaml --format json ``` -The built-in ruleset's bundled entries (FR-012/FR-020) are seeded defaults — `assessedBy: "automated"` — not a maintainer's reviewed judgement; they exist so the baked-in ruleset has *some* classification before anyone has reviewed it, not as a substitute for review. +The built-in ruleset's bundled entries (FR-012/FR-020) are pre-computed by running the Stage 1/2 heuristic over every rule once at build time — `assessedBy: "automated"` — not a maintainer's reviewed judgement; this only avoids recomputing the heuristic per request (SC-007) and is not a substitute for review. 
A maintainer who actually reviews a rule and runs `ruleset-analysis correct` on it (the same mechanism a user would use to persist a correction for their own ruleset, FR-013) produces a genuine `assessedBy: "human"` entry that overrides the bundled default. The last row illustrates FR-021: a human-assessed entry whose rule definition has since changed is still honored, but flagged with both the stored and current fingerprint rather than silently discarded. ## 3. MCP: same filtering, plus a dedicated ruleset-analysis tool From eb5d4f2e0e9f0859aebbcb632253a4bb1ad46bb1 Mon Sep 17 00:00:00 2001 From: DawMatt Date: Fri, 26 Jun 2026 10:59:33 +1000 Subject: [PATCH 11/22] Recent improvements baked into spec and tasks --- docs/cli/commands.md | 33 ++++++-- docs/package/api-grade-mcp.md | 2 +- docs/package/api-reference.md | 62 +++++++++++--- .../scripts/generate-bundled-analysis.mjs | 80 ++++++++++--------- packages/api-grade-core/src/formatter.ts | 15 +++- packages/api-grade-core/src/json-output.ts | 14 +++- .../api-grade-core/src/remediation-safety.ts | 15 +--- packages/api-grade-core/src/types.ts | 12 ++- .../contracts/remediation-safety-surfaces.md | 3 + specs/012-remediation-safety/data-model.md | 37 ++++++++- specs/012-remediation-safety/plan.md | 18 +++-- specs/012-remediation-safety/spec.md | 6 ++ specs/012-remediation-safety/tasks.md | 25 ++++++ src/cli/index.ts | 17 ++-- src/cli/ruleset-analysis-cli.ts | 6 +- src/cli/ruleset-config-cli.ts | 6 +- tests/integration/cli-json-output.test.ts | 39 ++++++--- 17 files changed, 280 insertions(+), 110 deletions(-) diff --git a/docs/cli/commands.md b/docs/cli/commands.md index f4038b9..2c47edc 100644 --- a/docs/cli/commands.md +++ b/docs/cli/commands.md @@ -319,9 +319,10 @@ api-grade openapi.yaml --ruleset my-rules.yaml > `diagnosticCounts` wrapper. See [CHANGELOG.md](../../CHANGELOG.md) for the > old → new field mapping. 
-When using `--format json`, the output is a JSON object with the same flat field -names used by the MCP server's `grade-api` / `grade-api-detailed` tools — one -parser works for both: +When using `--format json`, the output is **pretty-printed** (two-space indented, like every +other end-user-visible JSON document this CLI prints — no compact/minified output) and is a +JSON object with the same flat field names used by the MCP server's `grade-api` / +`grade-api-detailed` tools — one parser works for both: ```json { @@ -351,13 +352,29 @@ parser works for both: "message": "\"version\" property must be string.", "severity": "error", "path": ["info", "version"], - "range": { "start": { "line": 3, "character": 0 }, "end": { "line": 3, "character": 5 } } + "range": { "start": { "line": 3, "character": 0 }, "end": { "line": 3, "character": 5 } }, + "source": "openapi.yaml", + "riskLevel": "high", + "confidenceLevel": "low", + "remediationSafetyLevel": "unsafe", + "staleFingerprintWarning": null } ], "rulesetSource": "default" } ``` +Every diagnostic always carries `riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, and +`staleFingerprintWarning` — the same per-violation remediation-safety signals described below +under [Remediation Safety](#remediation-safety---remediation-safety-level) — computed from the +ruleset analyser against the spec's effective ruleset. This is **not** gated behind +`--remediation-safety`; it is present on every `--format json` (and, as a one-line +`safety=... risk=... confidence=...` annotation under each finding, every `--format human`) +grading run, so a regular user can see at a glance how risky each fix is without requesting a +separate filtered view. `--remediation-safety ` (below) additionally *filters* the +diagnostics down to one level and reshapes them into `remediationItems`; absent that flag, the +full, unfiltered diagnostic list is annotated in place instead. 
+ `truncated: true` is added only when `--top` actually drops entries from `diagnostics`. `rulesetPath` is added only when a custom ruleset was used. @@ -385,6 +402,8 @@ humanreview, unsafe.` and a non-zero exit code. api-grade openapi.yaml --remediation-safety humanreview --format json ``` +This output is pretty-printed the same as the regular `--format json` output above: + ```json { "specPath": "openapi.yaml", @@ -399,6 +418,7 @@ api-grade openapi.yaml --remediation-safety humanreview --format json "severity": "warn", "path": ["paths", "/pets", "get"], "location": "paths./pets.get", + "range": { "start": { "line": 11, "character": 2 }, "end": { "line": 11, "character": 5 } }, "currentValue": null, "expectedImprovement": "Fix: Operation must have \"operationId\". Add or update `operationId` as required", "riskLevel": "medium", @@ -414,7 +434,10 @@ Each item carries `riskLevel` (`low`/`medium`/`high`) and `confidenceLevel` (`high`/`medium`/`low`) alongside `remediationSafetyLevel` — the field `--remediation-safety`/`requestedLevel` filters against — and a `staleFingerprintWarning` that is non-null only when a human-assessed rule classification's underlying rule -definition has since changed (see `ruleset-analysis` below). +definition has since changed (see `ruleset-analysis` below). `severity` is the diagnostic's +actual severity (`error`/`warn`/`info`/`hint`, not a fixed placeholder), and `range` carries +the same line/character location as the regular diagnostics output — both are required to +act on a remediation item without losing the line-number context a linter normally provides. 
**Human-readable** (default, or with `--format human`): diff --git a/docs/package/api-grade-mcp.md b/docs/package/api-grade-mcp.md index fa004f4..9fbd19b 100644 --- a/docs/package/api-grade-mcp.md +++ b/docs/package/api-grade-mcp.md @@ -121,7 +121,7 @@ Assert that an API specification meets a minimum grade threshold (A > B > C > D ### `grade-api-remediation-safety` -Return a classified, AI-actionable list of diagnostics filtered by remediation safety level: `safe` (non-breaking, safe to auto-apply), `humanreview` (typically additive/clarifying but should be confirmed by a human before applying at scale), or `unsafe` (could change request/response validation, required fields, types, or the parameter surface — requires human or explicitly-confirmed-agent review). Each result includes `ruleId`, `path`, `location`, `currentValue`, `expectedImprovement`, and a confidence indicator (`riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `staleFingerprintWarning`). +Return a classified, AI-actionable list of diagnostics filtered by remediation safety level: `safe` (non-breaking, safe to auto-apply), `humanreview` (typically additive/clarifying but should be confirmed by a human before applying at scale), or `unsafe` (could change request/response validation, required fields, types, or the parameter surface — requires human or explicitly-confirmed-agent review). Each result includes `ruleId`, `severity`, `path`, `location`, `range`, `currentValue`, `expectedImprovement`, and a confidence indicator (`riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `staleFingerprintWarning`) — `severity` and `range` are carried over unchanged from the underlying diagnostic, so no line-number/severity context is lost relative to `grade-api-detailed`. 
**Input**: `specPath` (required), `level` (required: `safe`/`humanreview`/`unsafe`), `rulesetPath` (optional), `recoveryOption` (optional) diff --git a/docs/package/api-reference.md b/docs/package/api-reference.md index eaa6460..9e2e509 100644 --- a/docs/package/api-reference.md +++ b/docs/package/api-reference.md @@ -67,38 +67,47 @@ console.log(result.numericScore); // 74 --- -## `formatJson(result: GradeResult): string` +## `formatJson(result: GradeResult, top?: number, rulesetAnalysis?: RulesetAnalysis): string` -Serialises a `GradeResult` to a JSON string suitable for machine-readable output. The output shape matches the `--format json` CLI output. +Serialises a `GradeResult` to a **pretty-printed** (two-space indented) JSON string suitable +for both machine-readable and human-readable output — every JSON document this package emits +is pretty-printed, never minified. The output shape matches the `--format json` CLI output. **Parameters:** | Name | Type | Required | Description | |------|------|----------|-------------| | `result` | `GradeResult` | Yes | The result returned by `grade()` or `gradeContent()` | +| `top` | `number` | No | Truncate `diagnostics` to the first N entries (sets `truncated: true` if entries were dropped) | +| `rulesetAnalysis` | `RulesetAnalysis` | No | When supplied (see `analyseRuleset()` below), each diagnostic is decorated in place with `riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, and `staleFingerprintWarning` — the same remediation-safety signals `buildRemediationSafetyOutput()` filters on, but applied to every diagnostic, not just one level | -**Returns:** `string` — a formatted JSON string +**Returns:** `string` — a pretty-printed JSON string **Example:** ```typescript -import { GradeEngine, formatJson } from '@dawmatt/api-grade-core'; +import { GradeEngine, formatJson, analyseRuleset, loadRuleset } from '@dawmatt/api-grade-core'; const engine = new GradeEngine(); const result = await engine.grade({ specPath: 
'./openapi.yaml' }); -console.log(formatJson(result)); +const rulesetAnalysis = await analyseRuleset(await loadRuleset(result.format, result.rulesetPath)); +console.log(formatJson(result, undefined, rulesetAnalysis)); // diagnostics include safety info ``` --- -## `formatHuman(result: GradeResult): string` +## `formatHuman(result: GradeResult, top?: number, rulesetAnalysis?: RulesetAnalysis): string` -Serialises a `GradeResult` to a human-readable text string. The output matches the default CLI output. +Serialises a `GradeResult` to a human-readable text string. The output matches the default CLI +output. When `rulesetAnalysis` is supplied, a `safety=... risk=... confidence=...` line is +printed under each diagnostic, same as `formatJson`'s decoration. **Parameters:** | Name | Type | Required | Description | |------|------|----------|-------------| | `result` | `GradeResult` | Yes | The result returned by `grade()` or `gradeContent()` | +| `top` | `number` | No | Show only the first N diagnostics | +| `rulesetAnalysis` | `RulesetAnalysis` | No | When supplied, annotates each printed diagnostic with its remediation-safety signals | **Returns:** `string` — a formatted human-readable report @@ -112,7 +121,7 @@ Serialises a `GradeResult` to a human-readable text string. The output matches t > these exact field names — see [CLI Commands](../cli/commands.md#json-output-schema) > and [MCP Server Tool Reference](api-grade-mcp.md) for where each shape is used. -### `buildCommonGradeOutput(result: GradeResult, options?: { top?: number }): CommonGradeOutput` +### `buildCommonGradeOutput(result: GradeResult, options?: { top?: number; rulesetAnalysis?: RulesetAnalysis }): CommonGradeOutput` Shapes a `GradeResult` for "grade a spec, give me everything" output. Used by the CLI's `--format json`, MCP's `grade-api`, and MCP's `grade-api-detailed`. 
@@ -125,13 +134,27 @@ interface CommonGradeOutput { gradeLabel: GradeLabel; numericScore: number; summary: DiagnosticSummary; - diagnostics: Diagnostic[]; + diagnostics: Diagnostic[] | DiagnosticWithSafety[]; truncated?: boolean; // present only when `options.top` actually dropped entries rulesetSource: 'default' | 'custom'; rulesetPath?: string; // present only when a custom ruleset was used } + +interface DiagnosticWithSafety extends Diagnostic { + riskLevel: RiskLevel | null; + confidenceLevel: ConfidenceLevel; + remediationSafetyLevel: RemediationSafetyLevel; + staleFingerprintWarning: StaleFingerprintWarning | null; +} ``` +`diagnostics` is `DiagnosticWithSafety[]` whenever `options.rulesetAnalysis` is supplied — +each entry is the original `Diagnostic` plus the four remediation-safety fields, looked up via +`getRemediationSafety()` (below) — and plain `Diagnostic[]` otherwise. Callers that always +want safety info on regular grade output (not just the `--remediation-safety`-filtered view) +should always pass `rulesetAnalysis`; the CLI's `--format json`/`--format human` paths do this +unconditionally. + Tool-specific data (e.g. MCP's `largeSpecWarning`, `recoveryOptions`) is layered additively on top of this shape by the consuming package — it is never renamed or restructured. 
@@ -205,9 +228,10 @@ interface RemediationSafetyOutput { interface RemediationItem { ruleId: string; message: string; - severity: string; + severity: DiagnosticSeverity; // "error" | "warn" | "info" | "hint" — the diagnostic's actual severity path: string[]; location: string; // dot-joined `path` + range: Diagnostic['range']; // line/character location, carried over from the source diagnostic currentValue: string | null; expectedImprovement: string; riskLevel: RiskLevel | null; @@ -217,11 +241,20 @@ interface RemediationItem { } ``` +`severity` and `range` are carried over unchanged from the underlying `Diagnostic` — a +`RemediationItem` is never missing the line-number/severity context a regular diagnostic has, +even though it's filtered down to one remediation-safety level and reshaped with +remediation-specific fields (`location`, `currentValue`, `expectedImprovement`). + +The JSON returned by `buildRemediationSafetyOutput()` is pretty-printed by every caller +(`JSON.stringify(output, null, 2)`), matching `formatJson()`'s output style — it is not +minified. + ### `formatRemediationSafetyHuman(result: GradeResult, specContent: string, rulesetAnalysis: RulesetAnalysis, requestedLevel: RemediationSafetyLevel): string` Renders the same filtered `RemediationItem[]` list used by `buildRemediationSafetyOutput()` -as human-readable text. Used by the CLI's `--remediation-safety ` with -`--format human` (the default). +as human-readable text, including each item's line number (`Line N`) when `range` is present. +Used by the CLI's `--remediation-safety ` with `--format human` (the default). 
### `persistRuleAnalysisCorrection(loadedRuleset, ruleId, remediationSafetyLevel, scope?)` @@ -324,6 +357,11 @@ interface Diagnostic { } ``` +See `DiagnosticWithSafety` (under `buildCommonGradeOutput` above) for the shape a `Diagnostic` +takes on once decorated with remediation-safety fields, and `RemediationItem` (under +`buildRemediationSafetyOutput` above) for the shape it takes on once filtered to one +remediation-safety level — both preserve `severity` and `range` unchanged from this base type. + --- ### `RuleMetadata` diff --git a/packages/api-grade-core/scripts/generate-bundled-analysis.mjs b/packages/api-grade-core/scripts/generate-bundled-analysis.mjs index 7ab0bcc..9093779 100644 --- a/packages/api-grade-core/scripts/generate-bundled-analysis.mjs +++ b/packages/api-grade-core/scripts/generate-bundled-analysis.mjs @@ -3,15 +3,18 @@ // entire built-in OpenAPI/AsyncAPI rulesets, so the built-in ruleset's analysis never requires // per-rule computation at request time (SC-007). // -// Run manually after bumping @stoplight/spectral-rulesets, after changing the analyser's -// heuristic, or after a maintainer reviews and wants to seed a rule's classification (see the -// HUMAN_REVIEWED table below). Requires `npm run build` to have run first (reads dist/). +// Run manually after bumping @stoplight/spectral-rulesets or after changing the analyser's +// heuristic. Requires `npm run build` to have run first (reads dist/). // -// IMPORTANT: entries are assessedBy: "automated" unless the rule id is listed in -// HUMAN_REVIEWED below — that table exists for a maintainer to record an actual review (FR-020), -// not as a place to seed a guess. Do not add a rule to HUMAN_REVIEWED unless a person has -// actually read that rule's definition and confirmed the classification. 
-import { writeFileSync } from 'node:fs'; +// Human review (FR-020) is recorded directly in the JSON output files, not in this script: edit +// an entry's assessedBy to "human" (and set remediationSafetyLevel/rationale to match the +// reviewer's conclusion) after actually reading that rule's definition. This script reads the +// existing JSON before writing and leaves any "human" entry untouched — it only recomputes +// entries that are still "automated". If a left-alone human entry's fingerprint no longer +// matches the rule's current definition, the rule changed since it was reviewed; this script +// prints those as "stale rules to consider for re-review" rather than silently recalculating +// them. +import { existsSync, readFileSync, writeFileSync } from 'node:fs'; import { dirname, join } from 'node:path'; import { fileURLToPath } from 'node:url'; import { oas, asyncapi } from '@stoplight/spectral-rulesets'; @@ -20,51 +23,52 @@ import { analyseRuleset, computeRuleFingerprint } from '../dist/remediation-safe const __dirname = dirname(fileURLToPath(import.meta.url)); const outDir = join(__dirname, '..', 'src', 'rulesets', 'bundled-analysis'); -// A maintainer who has actually read a rule's definition and confirmed its classification -// records it here: { [ruleId]: { remediationSafetyLevel, rationale } }. Empty today — no rule -// has been through real human review yet. -const HUMAN_REVIEWED = {}; - async function generate(ruleset, fileName) { + const outPath = join(outDir, fileName); + const existing = existsSync(outPath) ? JSON.parse(readFileSync(outPath, 'utf-8')) : { rules: {} }; + // rulesetSource: 'custom' (not 'default') so analyseRuleset() doesn't try to read this very // file via its own Stage 0 bundled-lookup branch while we're regenerating it. 
const loadedRuleset = { ruleset, rulesetSource: 'custom' }; const analysis = await analyseRuleset(loadedRuleset); const rules = {}; + const staleHumanRuleIds = []; + for (const ruleAnalysis of analysis.rules) { const { ruleId } = ruleAnalysis; - const reviewed = HUMAN_REVIEWED[ruleId]; const fingerprint = computeRuleFingerprint(ruleId, ruleset.rules[ruleId]); + const existingEntry = existing.rules[ruleId]; - rules[ruleId] = reviewed - ? { - ruleId, - riskLevel: null, - confidenceLevel: 'high', - remediationSafetyLevel: reviewed.remediationSafetyLevel, - assessedBy: 'human', - staleFingerprintWarning: null, - rationale: reviewed.rationale, - source: 'bundled-default', - fingerprint, - } - : { - ruleId, - riskLevel: ruleAnalysis.riskLevel, - confidenceLevel: ruleAnalysis.confidenceLevel, - remediationSafetyLevel: ruleAnalysis.remediationSafetyLevel, - assessedBy: 'automated', - staleFingerprintWarning: null, - rationale: ruleAnalysis.rationale, - source: 'bundled-default', - fingerprint, - }; + if (existingEntry?.assessedBy === 'human') { + rules[ruleId] = existingEntry; + if (existingEntry.fingerprint !== fingerprint) { + staleHumanRuleIds.push(ruleId); + } + continue; + } + + rules[ruleId] = { + ruleId, + riskLevel: ruleAnalysis.riskLevel, + confidenceLevel: ruleAnalysis.confidenceLevel, + remediationSafetyLevel: ruleAnalysis.remediationSafetyLevel, + assessedBy: 'automated', + staleFingerprintWarning: null, + rationale: ruleAnalysis.rationale, + source: 'bundled-default', + fingerprint, + }; } - const outPath = join(outDir, fileName); writeFileSync(outPath, JSON.stringify({ rules }, null, 2) + '\n', 'utf-8'); console.log(`Wrote ${outPath} (${Object.keys(rules).length} entries)`); + if (staleHumanRuleIds.length > 0) { + console.log(` Stale rules to consider for re-review (human-reviewed, definition changed since):`); + for (const ruleId of staleHumanRuleIds) { + console.log(` - ${ruleId}`); + } + } } await generate(oas, 'openapi.json'); diff --git 
a/packages/api-grade-core/src/formatter.ts b/packages/api-grade-core/src/formatter.ts index a075ef8..91f5698 100644 --- a/packages/api-grade-core/src/formatter.ts +++ b/packages/api-grade-core/src/formatter.ts @@ -1,6 +1,7 @@ import chalk from 'chalk'; -import type { GradeResult, DiagnosticSeverity } from './types.js'; +import type { GradeResult, DiagnosticSeverity, RulesetAnalysis } from './types.js'; import { buildCommonGradeOutput } from './json-output.js'; +import { getRemediationSafety } from './remediation-safety.js'; const SEVERITY_COLORS: Record string> = { error: chalk.red, @@ -9,7 +10,7 @@ const SEVERITY_COLORS: Record string> = { hint: chalk.gray, }; -export function formatHuman(result: GradeResult, top?: number): string { +export function formatHuman(result: GradeResult, top?: number, rulesetAnalysis?: RulesetAnalysis): string { const lines: string[] = []; // Section 1: Grade line @@ -59,6 +60,12 @@ export function formatHuman(result: GradeResult, top?: number): string { ` ${color(d.severity.padEnd(5))} ${d.ruleId.padEnd(42)} ${pathStr}${lineNum}` ); lines.push(` ${d.message}`); + if (rulesetAnalysis) { + const safety = getRemediationSafety(d, rulesetAnalysis); + lines.push( + ` safety=${safety.remediationSafetyLevel} risk=${safety.riskLevel ?? 
'n/a'} confidence=${safety.confidenceLevel}` + ); + } } if (remaining > 0) { @@ -73,7 +80,7 @@ export function formatHuman(result: GradeResult, top?: number): string { return lines.join('\n'); } -export function formatJson(result: GradeResult, top?: number): string { - const output = buildCommonGradeOutput(result, { top }); +export function formatJson(result: GradeResult, top?: number, rulesetAnalysis?: RulesetAnalysis): string { + const output = buildCommonGradeOutput(result, { top, rulesetAnalysis }); return JSON.stringify(output, null, 2); } diff --git a/packages/api-grade-core/src/json-output.ts b/packages/api-grade-core/src/json-output.ts index 6ea3941..a7ce2ae 100644 --- a/packages/api-grade-core/src/json-output.ts +++ b/packages/api-grade-core/src/json-output.ts @@ -1,13 +1,19 @@ -import type { GradeResult, CommonGradeOutput, AssertOutput, LetterGrade } from './types.js'; +import type { GradeResult, CommonGradeOutput, AssertOutput, LetterGrade, RulesetAnalysis } from './types.js'; import { gradeToNumber } from './scorer.js'; +import { getRemediationSafety } from './remediation-safety.js'; export function buildCommonGradeOutput( result: GradeResult, - options?: { top?: number } + options?: { top?: number; rulesetAnalysis?: RulesetAnalysis } ): CommonGradeOutput { const top = options?.top; - const diagnostics = top !== undefined ? result.diagnostics.slice(0, top) : result.diagnostics; - const truncated = top !== undefined && diagnostics.length < result.diagnostics.length; + const sourceDiagnostics = top !== undefined ? result.diagnostics.slice(0, top) : result.diagnostics; + const truncated = top !== undefined && sourceDiagnostics.length < result.diagnostics.length; + + const rulesetAnalysis = options?.rulesetAnalysis; + const diagnostics = rulesetAnalysis + ? 
sourceDiagnostics.map((d) => ({ ...d, ...getRemediationSafety(d, rulesetAnalysis) })) + : sourceDiagnostics; const output: CommonGradeOutput = { specPath: result.specPath, diff --git a/packages/api-grade-core/src/remediation-safety.ts b/packages/api-grade-core/src/remediation-safety.ts index 9ae10e0..9e51ad6 100644 --- a/packages/api-grade-core/src/remediation-safety.ts +++ b/packages/api-grade-core/src/remediation-safety.ts @@ -504,13 +504,6 @@ export function getRemediationSafety( return { riskLevel: 'high', confidenceLevel: 'low', remediationSafetyLevel: 'unsafe', staleFingerprintWarning: null }; } -const SEVERITY_LABELS: Record = { - 0: 'error', - 1: 'warn', - 2: 'info', - 3: 'hint', -}; - function deriveExpectedImprovement( ruleId: string, message: string, @@ -570,16 +563,15 @@ export function buildRemediationItem( const lastSegment = path[path.length - 1] ?? 'field'; const expectedImprovement = deriveExpectedImprovement(diagnostic.ruleId, diagnostic.message, lastSegment, path); - const severityNum = typeof diagnostic.severity === 'number' ? diagnostic.severity : 1; - const safety = getRemediationSafety(diagnostic, rulesetAnalysis); return { ruleId: diagnostic.ruleId, message: diagnostic.message, - severity: SEVERITY_LABELS[severityNum] ?? 'warn', + severity: diagnostic.severity, path, location, + range: diagnostic.range, currentValue, expectedImprovement, riskLevel: safety.riskLevel, @@ -628,7 +620,8 @@ export function formatRemediationSafetyHuman( for (const item of remediationItems) { lines.push(''); const location = item.location || '(root)'; - lines.push(` ${item.severity.padEnd(5)} ${item.ruleId.padEnd(42)} ${location}`); + const lineNum = item.range?.start?.line !== undefined ? ` Line ${item.range.start.line + 1}` : ''; + lines.push(` ${item.severity.padEnd(5)} ${item.ruleId.padEnd(42)} ${location}${lineNum}`); lines.push(` risk=${item.riskLevel ?? 
'n/a'} confidence=${item.confidenceLevel} safety=${item.remediationSafetyLevel}`); lines.push(` ${item.message}`); lines.push(` ${item.expectedImprovement}`); diff --git a/packages/api-grade-core/src/types.ts b/packages/api-grade-core/src/types.ts index 283330f..75c3463 100644 --- a/packages/api-grade-core/src/types.ts +++ b/packages/api-grade-core/src/types.ts @@ -146,9 +146,10 @@ export interface RulesetAnalysis { export interface RemediationItem { ruleId: string; message: string; - severity: string; + severity: DiagnosticSeverity; path: string[]; location: string; + range: Diagnostic['range']; currentValue: string | null; expectedImprovement: string; riskLevel: RiskLevel | null; @@ -157,6 +158,13 @@ export interface RemediationItem { staleFingerprintWarning: StaleFingerprintWarning | null; } +export interface DiagnosticWithSafety extends Diagnostic { + riskLevel: RiskLevel | null; + confidenceLevel: ConfidenceLevel; + remediationSafetyLevel: RemediationSafetyLevel; + staleFingerprintWarning: StaleFingerprintWarning | null; +} + export interface CommonGradeOutput { specPath: string; format: ApiFormat; @@ -164,7 +172,7 @@ export interface CommonGradeOutput { gradeLabel: GradeLabel; numericScore: number; summary: DiagnosticSummary; - diagnostics: Diagnostic[]; + diagnostics: Diagnostic[] | DiagnosticWithSafety[]; truncated?: boolean; rulesetSource: 'default' | 'custom'; rulesetPath?: string; diff --git a/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md b/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md index 0ecdd7b..cfd9ab0 100644 --- a/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md +++ b/specs/012-remediation-safety/contracts/remediation-safety-surfaces.md @@ -9,6 +9,9 @@ Supersedes `specs/011-remediation-safety-rename/contracts/remediation-safety-sur | Accepts only `safe`; any other value rejected with `Error: --remediation-safety must be "safe".` | Accepts `safe`, `humanreview`, `unsafe`. 
Any other value rejected with `Error: --remediation-safety must be one of: safe, humanreview, unsafe.` | | Filtered output built by `buildQuickFixOutput`/`formatQuickFixesHuman`, shape `QuickFixOutput` (`quickFixCount`, `quickFixes`). | Filtered output built by `buildRemediationSafetyOutput`/`formatRemediationSafetyHuman`, shape `RemediationSafetyOutput` (`remediationItemCount`, `remediationItems`, `requestedLevel`). Each item additionally carries `riskLevel` (`low`/`medium`/`high`), `confidenceLevel`, `remediationSafetyLevel` (`safe`/`humanreview`/`unsafe` — a field in its own right, not the same field/type as `riskLevel`), and `staleFingerprintWarning` (`null` unless the rule's classification is human-assessed and its fingerprint no longer matches — FR-021). | | `--remediation-safety safe` output identical to pre-Feature-12 `safe` output in violation membership. | Unchanged for `safe` membership (FR-007); new fields (`riskLevel`, `confidenceLevel`, `remediationSafetyLevel`, `requestedLevel`) are additive. `--remediation-safety`/`requestedLevel` filter against `remediationSafetyLevel`, not `riskLevel`. | +| — | `severity` and `range` on each `RemediationItem` MUST be carried over unchanged from the underlying `Diagnostic` (FR-022/SC-010) — filtering/reshaping into a `RemediationItem` is strictly additive, never lossy, relative to the regular diagnostic. | +| — | `--format json` output (this filtered shape, the regular `CommonGradeOutput` shape, and `ruleset-analysis`'s `RulesetAnalysis` shape) is always pretty-printed (FR-023/SC-011) — `JSON.stringify(value, null, 2)`, never a single compact line. | +| — | Per-violation `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` are also surfaced on the **regular**, unfiltered `--format json`/`--format human` output (FR-024/SC-012) — i.e. without `--remediation-safety` supplied at all — by decorating each `CommonGradeOutput.diagnostics` entry (now typed `Diagnostic[] \| DiagnosticWithSafety[]`) for JSON output, and by an extra per-diagnostic annotation line in `formatHuman()` for human output. The decorated JSON array is a mapped copy; `GradeResult.diagnostics` itself is not mutated.
| ## CLI: new `ruleset-analysis` subcommand diff --git a/specs/012-remediation-safety/data-model.md b/specs/012-remediation-safety/data-model.md index 65ae0a7..a88e4d7 100644 --- a/specs/012-remediation-safety/data-model.md +++ b/specs/012-remediation-safety/data-model.md @@ -115,9 +115,10 @@ A store entry is used "until one matches a current `RuleFingerprint`" only for ` |---|---|---| | `ruleId` | string | Unchanged from today's `QuickFix.ruleId`. | | `message` | string | Unchanged. | -| `severity` | string | Unchanged. | +| `severity` | `DiagnosticSeverity` (`"error"` \| `"warn"` \| `"info"` \| `"hint"`) | The violation's actual severity, carried over unchanged from `Diagnostic.severity`. **Regression note**: an earlier implementation derived this from a numeric-severity assumption (`typeof diagnostic.severity === 'number'`) that no longer held once `Diagnostic.severity` became a string enum, so every item silently reported `"warn"` regardless of true severity. `buildRemediationItem()` MUST assign `severity: diagnostic.severity` directly — never re-derive it via a numeric lookup table. | | `path` | string[] | Unchanged. | | `location` | string | Unchanged. | +| `range` | `Diagnostic['range']` | **Restored** — carried over unchanged from `Diagnostic.range` (line/character start/end). Without it, a `RemediationItem` cannot be located in the source file by line number, which defeats the "actionable" requirement (FR-008/Principle VI) once a violation has been filtered out of the regular diagnostics list. `formatRemediationSafetyHuman()` MUST render it (`Line N`) the same way `formatHuman()` does for plain diagnostics. | | `currentValue` | string \| null | Unchanged. | | `expectedImprovement` | string | Unchanged. | | `riskLevel` | `RiskLevel` \| `null` | **New** — the violation's rule-level estimated risk (`low`/`medium`/`high`), looked up from the rule's `RuleAnalysis`. 
`null` when the lookup hit a Stage 0 entry that has no `riskLevel` of its own (see `RuleAnalysis`). | @@ -125,6 +126,40 @@ A store entry is used "until one matches a current `RuleFingerprint`" only for ` | `remediationSafetyLevel` | `RemediationSafetyLevel` | **New** — a field in its own right, distinct from `riskLevel` both in name and in type/values (`safe`/`humanreview`/`unsafe`, not `low`/`medium`/`high`). The violation's computed remediation safety, looked up from the rule's `RuleAnalysis.remediationSafetyLevel`. This is the field `--remediation-safety`/`level` filtering matches against. | | `staleFingerprintWarning` | `{ storedFingerprint: string; currentFingerprint: string; message: string }` \| `null` | **New** — carried over verbatim from the rule's `RuleAnalysis.staleFingerprintWarning`, so a CI pipeline or human reading per-violation output sees the same "this rule changed since a human reviewed it" warning without needing to separately inspect the ruleset analysis. | +## DiagnosticWithSafety (regular grade output, additive) + +| Field | Type | Description | +|---|---|---| +| *(all `Diagnostic` fields)* | — | Unchanged — `ruleId`, `message`, `severity`, `path`, `range`, `source`. | +| `riskLevel` | `RiskLevel` \| `null` | Looked up via `getRemediationSafety(diagnostic, rulesetAnalysis)`, same as `RemediationItem.riskLevel`. | +| `confidenceLevel` | `ConfidenceLevel` | Same lookup. | +| `remediationSafetyLevel` | `RemediationSafetyLevel` | Same lookup. | +| `staleFingerprintWarning` | same shape as `RemediationItem.staleFingerprintWarning` | Same lookup. | + +`CommonGradeOutput.diagnostics` is `DiagnosticWithSafety[]` whenever the caller supplies a +`RulesetAnalysis` to `buildCommonGradeOutput(result, { top, rulesetAnalysis })` (equivalently, +to `formatJson`/`formatHuman`'s `rulesetAnalysis` parameter), and plain `Diagnostic[]` +otherwise. 
This is **independent of `--remediation-safety`/`level` filtering** — it is the +mechanism by which a *regular* (unfiltered) grading request also surfaces per-violation +remediation-safety information, so a user does not have to make a second, filtered request +just to learn how risky each finding is to fix. The CLI's default (non-`--remediation-safety`) +`--format json` and `--format human` paths MUST always supply `rulesetAnalysis`, computed via +the same `loadRuleset()`/`analyseRuleset()` call already made for `--remediation-safety`, so +this is not an opt-in flag — it is the default shape of regular grading output going forward. + +## Output formatting contract (all surfaces) + +Every JSON document any tool in this project prints to an end user — CLI (`--format json`, +`ruleset-analysis [correct]`, `config`/`set-ruleset`/`get-ruleset` error and success payloads) +and any future surface reusing these core functions — MUST be pretty-printed +(`JSON.stringify(value, null, 2)`), never minified. This was already true of the main +`formatJson()` grade output; `buildRemediationSafetyOutput()`'s JSON and the +`ruleset-analysis`/`ruleset-analysis correct` JSON output regressed to compact, single-line +JSON when first implemented, which is the specific regression this note exists to prevent. +MCP tool responses (`grade-api`, `grade-api-detailed`, `grade-api-remediation-safety`, +`analyse-ruleset-safety`, etc.) are explicitly exempt — their JSON stays compact/minified by +design, for token efficiency in an AI-agent context, not for end-user reading. 
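A minimal sketch of the decoration described above, assuming simplified stand-in types: each diagnostic is spread into a fresh object together with its safety lookup, so the input array is never mutated. The `annotate` helper also prints `staleFingerprintWarning`, which FR-021 and SC-012 require in human-readable output and which the `formatHuman()` hunk in this patch does not print. `decorate`, `annotate`, `SafetyLookup`, and the loose `confidenceLevel` type are hypothetical.

```typescript
// Assumed minimal shapes; the project's real types live in types.ts.
type RemediationSafetyLevel = 'safe' | 'humanreview' | 'unsafe';

interface StaleFingerprintWarning {
  storedFingerprint: string;
  currentFingerprint: string;
  message: string;
}

interface Safety {
  riskLevel: 'low' | 'medium' | 'high' | null;
  confidenceLevel: string; // exact scale not shown in this patch
  remediationSafetyLevel: RemediationSafetyLevel;
  staleFingerprintWarning: StaleFingerprintWarning | null;
}

interface Diagnostic {
  ruleId: string;
  message: string;
}

type DiagnosticWithSafety = Diagnostic & Safety;

// Hypothetical stand-in for getRemediationSafety(d, rulesetAnalysis).
type SafetyLookup = (d: Diagnostic) => Safety;

// Non-mutating decoration: every entry is a fresh object; the input array is untouched.
function decorate(diags: readonly Diagnostic[], lookup: SafetyLookup): DiagnosticWithSafety[] {
  return diags.map((d) => ({ ...d, ...lookup(d) }));
}

// Human-readable annotation: one line of levels, plus a second line when a stale warning exists.
function annotate(d: DiagnosticWithSafety): string[] {
  const lines = [
    `safety=${d.remediationSafetyLevel} risk=${d.riskLevel ?? 'n/a'} confidence=${d.confidenceLevel}`,
  ];
  const w = d.staleFingerprintWarning;
  if (w) {
    lines.push(`stale: ${w.message} (stored ${w.storedFingerprint}, current ${w.currentFingerprint})`);
  }
  return lines;
}
```

Serializing the decorated array with `JSON.stringify(value, null, 2)` gives the CLI form described in the formatting contract; plain `JSON.stringify(value)` gives the compact form reserved for MCP responses.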
+ ## RemediationSafetyOutput (was `QuickFixOutput`) | Field | Type | Description | diff --git a/specs/012-remediation-safety/plan.md b/specs/012-remediation-safety/plan.md index 6d5cf22..8573ebd 100644 --- a/specs/012-remediation-safety/plan.md +++ b/specs/012-remediation-safety/plan.md @@ -66,20 +66,26 @@ specs/algorithms/ ```text packages/api-grade-core/src/ -├── remediation-safety.ts # NEW — replaces quick-fixes.ts: analyseRuleset(), getRemediationSafety(), buildRemediationItem(), buildRemediationSafetyOutput(), formatRemediationSafetyHuman() +├── remediation-safety.ts # NEW — replaces quick-fixes.ts: analyseRuleset(), getRemediationSafety(), buildRemediationItem(), buildRemediationSafetyOutput(), formatRemediationSafetyHuman(); buildRemediationItem() carries severity/range over from Diagnostic unchanged (FR-022) ├── rulesets/loader.ts # unchanged — analyser consumes its LoadedRuleset.ruleset.rules -├── types.ts # add RemediationSafetyLevel, ConfidenceLevel, RuleAnalysis, RulesetAnalysis, RemediationItem, RemediationSafetyOutput; remove ViolationClass, QuickFix, QuickFixOutput +├── types.ts # add RemediationSafetyLevel, ConfidenceLevel, RuleAnalysis, RulesetAnalysis, RemediationItem (incl. 
range), RemediationSafetyOutput, DiagnosticWithSafety; remove ViolationClass, QuickFix, QuickFixOutput +├── json-output.ts # buildCommonGradeOutput() accepts options.rulesetAnalysis; decorates diagnostics via getRemediationSafety() when supplied (FR-024) +├── formatter.ts # formatJson()/formatHuman() accept an optional rulesetAnalysis param, threaded into buildCommonGradeOutput()/per-diagnostic safety annotation; JSON output remains pretty-printed (FR-023) └── index.ts # export new remediation-safety.ts symbols/types in place of quick-fixes.ts ones packages/api-grade-core/tests/unit/ -└── remediation-safety.test.ts # replaces quick-fixes.test.ts; adds analyseRuleset()/getRemediationSafety() coverage for all 3 levels + confidence + SC-005 total-coverage check +├── remediation-safety.test.ts # replaces quick-fixes.test.ts; adds analyseRuleset()/getRemediationSafety() coverage for all 3 levels + confidence + SC-005 total-coverage check +├── json-output.test.ts # buildCommonGradeOutput() with/without rulesetAnalysis +└── formatter.test.ts # formatJson()/formatHuman() with/without rulesetAnalysis src/cli/ -├── index.ts # extend --remediation-safety to accept safe|humanreview|unsafe; call renamed core functions -└── ruleset-analysis-cli.ts # NEW — `ruleset-analysis` subcommand (mirrors ruleset-config-cli.ts pattern) +├── index.ts # extend --remediation-safety to accept safe|humanreview|unsafe; call renamed core functions; always compute rulesetAnalysis and pass to formatJson()/formatHuman() on the regular (non-filtered) path too (FR-024); all printed JSON pretty-printed (FR-023) +├── ruleset-analysis-cli.ts # NEW — `ruleset-analysis` subcommand (mirrors ruleset-config-cli.ts pattern); JSON output pretty-printed +└── ruleset-config-cli.ts # JSON output pretty-printed for consistency (FR-023) tests/integration/ -└── cli-remediation-safety.test.ts # replaces cli-quick-fixes.test.ts; covers all 3 levels + ruleset-analysis subcommand +├── cli-remediation-safety.test.ts # 
replaces cli-quick-fixes.test.ts; covers all 3 levels + ruleset-analysis subcommand +└── cli-json-output.test.ts # updated to parse multiple back-to-back pretty-printed JSON documents from stdout (brace-depth split) instead of one compact JSON object per line packages/api-grade-mcp/src/ ├── server.ts # register renamed tool + new analyse-ruleset-safety tool diff --git a/specs/012-remediation-safety/spec.md b/specs/012-remediation-safety/spec.md index f51b4c6..91fefdc 100644 --- a/specs/012-remediation-safety/spec.md +++ b/specs/012-remediation-safety/spec.md @@ -97,6 +97,9 @@ A new contributor or documentation reader should encounter "remediation safety" - **FR-019**: When the ruleset's location is not writable by the system directly (e.g. a GitHub-hosted ruleset), the system MUST NOT automatically write or commit a correction back to that remote location. It MAY still read any existing colocated shared analysis there (FR-017), and MAY produce the content a user would need to commit themselves to update the shared analysis. - **FR-020**: Every persisted or pre-calculated per-rule classification (shared colocated analysis, personal override, or bundled default — FR-012/FR-016/FR-018) MUST record whether it was assessed by a human (an explicit correction persisted via FR-013, including a maintainer's judgement for a well-known built-in rule) or produced automatically with no human review. This distinction MUST be inspectable wherever per-rule results are inspectable (FR-011). There is no separate hard-coded table of "known" classifications distinct from this persisted mechanism — the built-in ruleset's pre-curated classifications are persisted entries assessed by a human, like any other. 
- **FR-021**: When a rule's definition changes after a human-assessed classification (FR-020) was captured for it, the system MUST continue to honor that classification rather than treating it as stale and falling back to automated analysis (contrast FR-014's handling of an automated classification under the same circumstance). The system MUST also surface a warning for that rule — in both JSON and human-readable output, at both the ruleset-analysis level (FR-011) and the per-violation level (FR-008) — identifying the rule and including both the fingerprint the classification was captured against and the rule's current fingerprint, so a user can tell the rule changed since a human last reviewed it even though the prior judgement is still being trusted. +- **FR-022**: A per-violation remediation-safety item (FR-008) MUST NOT drop any field a regular (non-filtered) diagnostic for the same violation would have carried — at minimum, its actual severity (`error`/`warn`/`info`/`hint`, not a fixed placeholder) and its source-location `range` (line/character). Filtering diagnostics down to one remediation-safety level and reshaping them with remediation-specific fields (`location`, `currentValue`, `expectedImprovement`) MUST be strictly additive to a regular diagnostic's fields, never lossy. +- **FR-023**: Every JSON document a CLI command prints for an end user to read (as opposed to an MCP tool response consumed by an AI agent) MUST be pretty-printed (human-legible, multi-line, indented) — consistent across the regular grade output, the remediation-safety-filtered output, and the ruleset-analysis output. MCP tool JSON responses are explicitly out of scope for this requirement, since they are optimized for token efficiency in an AI-agent context rather than direct human reading. 
+- **FR-024**: Per-violation remediation-safety information (FR-008) MUST be available on a *regular*, unfiltered grading request — not only when `--remediation-safety`/`level` is explicitly supplied. A user grading a spec with no remediation-safety filter applied MUST still be able to see, per diagnostic, the same `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` signals, in both JSON and human-readable output, so they do not need to make a second, filtered request just to learn how risky a finding is to fix. ### Key Entities *(include if feature involves data)* @@ -124,6 +127,9 @@ A new contributor or documentation reader should encounter "remediation safety" - **SC-007**: The built-in ruleset's analysis is available without any per-rule automated computation having to run at request time (served from a pre-calculated/bundled result), for both the CLI and MCP surfaces. - **SC-008**: Two different users pointed at the same ruleset location (local path or GitHub-hosted) see identical classifications for every rule covered by that ruleset's shared, colocated analysis, without either of them having separately configured it. - **SC-009**: For every rule whose classification came from a human-assessed entry with a fingerprint mismatch, both the rule's stored fingerprint and its current fingerprint are visible in the output, in both JSON and human-readable form, at both the ruleset-analysis and per-violation surfaces. +- **SC-010**: A `--remediation-safety`-filtered remediation item for a given violation carries the same severity and line/character location as the unfiltered diagnostic for that same violation would — verifiable by grading the same spec with and without the filter and comparing the matching violations' `severity`/`range` fields. 
+- **SC-011**: Every CLI-printed JSON document (regular grade output, remediation-safety-filtered output, ruleset-analysis output, and config/error payloads) is valid, multi-line, indented JSON — verifiable by parsing it with `JSON.parse` and confirming its text spans more than one line. +- **SC-012**: Grading a spec with no `--remediation-safety`/`level` filter still returns, per diagnostic, the same `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning` fields a filtered request would show for that violation — in both JSON and human-readable output. ## Assumptions diff --git a/specs/012-remediation-safety/tasks.md b/specs/012-remediation-safety/tasks.md index ffeb8e8..5ac9f58 100644 --- a/specs/012-remediation-safety/tasks.md +++ b/specs/012-remediation-safety/tasks.md @@ -127,6 +127,31 @@ No new project scaffolding is required — this feature extends the existing `pa --- +## Phase 7: Output-shape regression fix (post-T040) + +**Purpose**: T009's initial `--remediation-safety` implementation regressed the output shape +relative to the regular `--format json`/`--format human` diagnostics: every `RemediationItem` +reported `severity: "warn"` regardless of actual severity (a stale numeric-severity assumption +left over from before `Diagnostic.severity` became a string union), `range` was dropped +entirely, and the safety JSON/`ruleset-analysis` JSON output regressed to compact (non-pretty) +formatting, unlike the main `formatJson()` output. Separately, regular (unfiltered) grading +output never surfaced the per-violation safety signals this feature computes, so a user had to +make a second, `--remediation-safety`-filtered request just to see how risky a finding was to +fix. See `data-model.md` "Output formatting contract (all surfaces)" and "DiagnosticWithSafety" +sections for the corrected contract.
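The severity regression is easiest to see by isolating the removed logic. This standalone sketch reproduces the pre-T041 lookup and the post-T041 pass-through; the `Record<number, string>` annotation on the table and the function names are assumptions.

```typescript
type DiagnosticSeverity = 'error' | 'warn' | 'info' | 'hint';

// The removed lookup table, keyed by numeric severities (type annotation assumed).
const SEVERITY_LABELS: Record<number, string> = { 0: 'error', 1: 'warn', 2: 'info', 3: 'hint' };

// Pre-T041 logic: a string severity fails the typeof check and falls back to 1, i.e. 'warn'.
function severityBeforeFix(severity: unknown): string {
  const severityNum = typeof severity === 'number' ? severity : 1;
  return SEVERITY_LABELS[severityNum] ?? 'warn';
}

// Post-T041 logic: the diagnostic's own severity passes straight through.
function severityAfterFix(severity: DiagnosticSeverity): DiagnosticSeverity {
  return severity;
}

const samples: DiagnosticSeverity[] = ['error', 'warn', 'info', 'hint'];
for (const s of samples) {
  // First iteration prints: error: before=warn after=error
  console.log(`${s}: before=${severityBeforeFix(s)} after=${severityAfterFix(s)}`);
}
```

Every string input collapses to `'warn'` under the old logic, which is why the regression was silent: `'warn'` is itself a valid severity, so nothing looked malformed.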
+ +- [X] T041 Fix `buildRemediationItem()` in `packages/api-grade-core/src/remediation-safety.ts`: assign `severity: diagnostic.severity` directly (remove the `SEVERITY_LABELS`/numeric-severity lookup) and add `range: diagnostic.range` to the returned `RemediationItem` — depends on T005 +- [X] T042 Add `range: Diagnostic['range']` to the `RemediationItem` type (`packages/api-grade-core/src/types.ts`) and render it (`Line N`) in `formatRemediationSafetyHuman()` — depends on T041 +- [X] T043 Pretty-print every CLI-printed JSON document with `JSON.stringify(value, null, 2)`: `buildRemediationSafetyOutput()`/`ruleset-analysis`/`ruleset-analysis correct` output in `src/cli/index.ts` and `src/cli/ruleset-analysis-cli.ts` (and, for consistency across all CLI JSON output, `src/cli/ruleset-config-cli.ts`'s `config`/`set-ruleset`/`get-ruleset` JSON payloads) — MCP tool JSON responses are explicitly exempt (kept compact for AI-agent token efficiency) — depends on T041 +- [X] T044 Add `DiagnosticWithSafety` to `packages/api-grade-core/src/types.ts` (extends `Diagnostic` with `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning`); extend `buildCommonGradeOutput()` (`json-output.ts`), `formatJson()`, and `formatHuman()` (`formatter.ts`) to accept an optional `rulesetAnalysis` and, when supplied, decorate each diagnostic via `getRemediationSafety()` — depends on T004 +- [X] T045 Wire `src/cli/index.ts`'s default (non-`--remediation-safety`) `--format json`/`--format human` path to always compute `rulesetAnalysis` (via the same `loadRuleset()`/`analyseRuleset()` call already made for `--remediation-safety`) and pass it to `formatJson()`/`formatHuman()`, so regular grading output always includes per-violation safety info — depends on T044 +- [X] T046 Update `tests/integration/cli-json-output.test.ts`'s `--min-grade --format json` test to parse multiple back-to-back pretty-printed JSON documents from stdout (brace-depth splitting) instead of assuming one 
compact JSON object per line — depends on T043 +- [X] T047 [P] Update docs to match: `docs/cli/commands.md` (pretty-print note, `range`/safety fields in both JSON Output Schema and Remediation Safety examples), `docs/package/api-reference.md` (`formatJson`/`formatHuman`/`buildCommonGradeOutput` signatures, `DiagnosticWithSafety`, `RemediationItem.range`/`severity` correction), `docs/package/api-grade-mcp.md` (`grade-api-remediation-safety` description mentions `severity`/`range`), `specs/012-remediation-safety/data-model.md` — depends on T041-T045 + +**Checkpoint**: `vitest run` (all workspaces) and `tsc --noEmit` pass; a manual CLI run confirms `severity` reflects true diagnostic severity, `range`/line numbers appear in `--remediation-safety` output, all CLI JSON is pretty-printed, and regular (non-filtered) `--format json`/`--format human` output includes per-diagnostic `riskLevel`/`confidenceLevel`/`remediationSafetyLevel`/`staleFingerprintWarning`. + +--- + ## Dependencies & Execution Order ### Phase Dependencies diff --git a/src/cli/index.ts b/src/cli/index.ts index ada36bf..9c350c3 100644 --- a/src/cli/index.ts +++ b/src/cli/index.ts @@ -144,7 +144,7 @@ program if (authTypeOption !== undefined && !isValidAuthType(authTypeOption)) { const message = `Invalid --auth-type value '${authTypeOption}'.
Must be one of: none, github-pat.`; if (outputFormat === 'json') { - console.log(JSON.stringify({ error: 'RULESET_BAD_CONFIG', message })); + console.log(JSON.stringify({ error: 'RULESET_BAD_CONFIG', message }, null, 2)); } else { console.error(chalk.red(`Error: ${message}`)); } @@ -168,7 +168,7 @@ program const fetchOutcome = await resolveRemoteRuleset(authResult); if (fetchOutcome.failure) { if (outputFormat === 'json') { - console.log(JSON.stringify(fetchOutcome.failure)); + console.log(JSON.stringify(fetchOutcome.failure, null, 2)); } else { console.error(chalk.red(`Error: ${fetchOutcome.failure.message}`)); } @@ -184,25 +184,26 @@ program rulesetPath, }); + const loadedRuleset = await loadRuleset(result.format, rulesetPath); + const rulesetAnalysis = await analyseRuleset(loadedRuleset); + if (cliOpts.remediationSafety !== undefined) { const requestedLevel = cliOpts.remediationSafety as RemediationSafetyLevel; const specContent = readFileSync(specFile, 'utf-8'); - const loadedRuleset = await loadRuleset(result.format, rulesetPath); - const rulesetAnalysis = await analyseRuleset(loadedRuleset); const output = outputFormat === 'json' - ? JSON.stringify(buildRemediationSafetyOutput(result, specContent, rulesetAnalysis, requestedLevel)) + ? JSON.stringify(buildRemediationSafetyOutput(result, specContent, rulesetAnalysis, requestedLevel), null, 2) : formatRemediationSafetyHuman(result, specContent, rulesetAnalysis, requestedLevel); console.log(output); } else { const output = outputFormat === 'json' - ? formatJson(result, topN) - : formatHuman(result, topN); + ? 
formatJson(result, topN, rulesetAnalysis) + : formatHuman(result, topN, rulesetAnalysis); console.log(output); } if (minGrade !== undefined) { if (outputFormat === 'json') { - console.log(JSON.stringify(buildAssertOutput(result, minGrade))); + console.log(JSON.stringify(buildAssertOutput(result, minGrade), null, 2)); } const resultIdx = gradeToNumber(result.letterGrade); diff --git a/src/cli/ruleset-analysis-cli.ts b/src/cli/ruleset-analysis-cli.ts index 018333f..98313c6 100644 --- a/src/cli/ruleset-analysis-cli.ts +++ b/src/cli/ruleset-analysis-cli.ts @@ -17,7 +17,7 @@ export interface RulesetAnalysisCorrectOptions { function fail(message: string, format: string | undefined): never { if (format === 'json') { - console.log(JSON.stringify({ error: 'RULESET_ANALYSIS_FAILED', message })); + console.log(JSON.stringify({ error: 'RULESET_ANALYSIS_FAILED', message }, null, 2)); } else { console.error(chalk.red(`Error: ${message}`)); } @@ -51,7 +51,7 @@ export async function runRulesetAnalysis(opts: RulesetAnalysisOptions): Promise< const analysis = await analyseRuleset(loadedRuleset); if (format === 'json') { - console.log(JSON.stringify(analysis)); + console.log(JSON.stringify(analysis, null, 2)); } else { console.log(formatHuman(analysis)); } @@ -84,7 +84,7 @@ export async function runRulesetAnalysisCorrect(opts: RulesetAnalysisCorrectOpti ); if (format === 'json') { - console.log(JSON.stringify({ ruleId: opts.ruleId, level: opts.level, ...result })); + console.log(JSON.stringify({ ruleId: opts.ruleId, level: opts.level, ...result }, null, 2)); } else { console.log(`Persisted '${opts.ruleId}' as ${opts.level} (${result.written}).`); if (result.sharedFileContent) { diff --git a/src/cli/ruleset-config-cli.ts b/src/cli/ruleset-config-cli.ts index b9ae1c3..66df6c9 100644 --- a/src/cli/ruleset-config-cli.ts +++ b/src/cli/ruleset-config-cli.ts @@ -24,7 +24,7 @@ export interface SetRulesetOptions { function fail(message: string, format: string | undefined, errorCode = 
'RULESET_BAD_CONFIG'): never { if (format === 'json') { - console.log(JSON.stringify({ error: errorCode, message })); + console.log(JSON.stringify({ error: errorCode, message }, null, 2)); } else { console.error(chalk.red(`Error: ${message}`)); } @@ -70,7 +70,7 @@ export async function runSetRuleset(opts: SetRulesetOptions): Promise { } if (opts.format === 'json') { - console.log(JSON.stringify({ scope: opts.scope, rulesetPath, configFile })); + console.log(JSON.stringify({ scope: opts.scope, rulesetPath, configFile }, null, 2)); } else { const scopeLabel = opts.scope!.charAt(0).toUpperCase() + opts.scope!.slice(1); console.log( @@ -146,7 +146,7 @@ export async function runGetRuleset(opts: { format?: string }): Promise { : null, builtIn: 'default', }; - console.log(JSON.stringify(response)); + console.log(JSON.stringify(response, null, 2)); return; } diff --git a/tests/integration/cli-json-output.test.ts b/tests/integration/cli-json-output.test.ts index 0779a40..f63132b 100644 --- a/tests/integration/cli-json-output.test.ts +++ b/tests/integration/cli-json-output.test.ts @@ -14,6 +14,28 @@ function runCli(args: string[]): { status: number | null; stdout: string; stderr return { status: result.status, stdout: result.stdout ?? '', stderr: result.stderr ?? '' }; } +// stdout may contain multiple pretty-printed (multi-line) JSON documents printed back-to-back +// (one per console.log call) — split them by tracking brace depth rather than by line. 
+function splitJsonDocuments(stdout: string): unknown[] { + const docs: unknown[] = []; + let depth = 0; + let start = -1; + for (let i = 0; i < stdout.length; i++) { + const ch = stdout[i]; + if (ch === '{') { + if (depth === 0) start = i; + depth++; + } else if (ch === '}') { + depth--; + if (depth === 0 && start !== -1) { + docs.push(JSON.parse(stdout.slice(start, i + 1))); + start = -1; + } + } + } + return docs; +} + describe('CLI --format json output shape', () => { it('matches the CommonGradeOutput shape with no old wrapper fields', () => { const { status, stdout } = runCli([ @@ -49,18 +71,11 @@ describe('CLI --format json output shape', () => { '--format', 'json', ]); expect(status).toBe(0); - const lines = stdout.trim().split('\n'); - const assertLine = lines.find((l) => { - try { - const parsed = JSON.parse(l); - return 'passed' in parsed; - } catch { - return false; - } - }); - expect(assertLine).toBeDefined(); - const assertOutput = JSON.parse(assertLine as string); - expect(assertOutput).toHaveProperty('passed'); + const documents = splitJsonDocuments(stdout); + const assertOutput = documents.find( + (d): d is Record => typeof d === 'object' && d !== null && 'passed' in d + ); + expect(assertOutput).toBeDefined(); expect(assertOutput).toHaveProperty('actual'); expect(assertOutput).toHaveProperty('minimum'); expect(assertOutput).toHaveProperty('specPath'); From ad365f409cd9e81a36220cb4880bfb9b5cc52931 Mon Sep 17 00:00:00 2001 From: DawMatt Date: Fri, 26 Jun 2026 11:13:06 +1000 Subject: [PATCH 12/22] Minor corrections --- specs/012-remediation-safety/plan.md | 4 ++-- specs/012-remediation-safety/tasks.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/specs/012-remediation-safety/plan.md b/specs/012-remediation-safety/plan.md index 8573ebd..8db2f36 100644 --- a/specs/012-remediation-safety/plan.md +++ b/specs/012-remediation-safety/plan.md @@ -8,7 +8,7 @@ ## Summary -Build a deterministic, rule-metadata-driven ruleset analyser 
(`analyseRuleset()`) that assigns every rule in a loaded Spectral ruleset a risk level (`safe`/`humanreview`/`unsafe`) and a confidence level (`high`/`medium`/`low`), per [`automated_remediation_safety_algorithm_spec.md`](../algorithms/automated_remediation_safety_algorithm_spec.md). Extend `--remediation-safety` (CLI) and the `grade-api-remediation-safety` MCP tool's `level` parameter from the single `safe` value (Feature 11) to all three levels, computed via per-violation lookup against the analyser's cached result. Add a new CLI subcommand (`ruleset-analysis`) and MCP tool (`analyse-ruleset-safety`) so the analyser's output is inspectable independent of grading a spec. Complete the internal rename Feature 11 deferred: no source, test, or current documentation file may reference "quick fix(es)" in any form afterward (historical `CHANGELOG.md`/`GOAL.md` entries excluded as accurate historical record). +Build a deterministic, rule-metadata-driven ruleset analyser (`analyseRuleset()`) that assigns every rule in a loaded Spectral ruleset a risk level (`low`/`medium`/`high`) and a confidence level (`high`/`medium`/`low`), deriving a remediation safety level (`safe`/`humanreview`/`unsafe`) from those two signals via the decision matrix in [`automated_remediation_safety_algorithm_spec.md`](../algorithms/automated_remediation_safety_algorithm_spec.md). Extend `--remediation-safety` (CLI) and the `grade-api-remediation-safety` MCP tool's `level` parameter from the single `safe` value (Feature 11) to all three levels, computed via per-violation lookup against the analyser's cached result. Add a new CLI subcommand (`ruleset-analysis`) and MCP tool (`analyse-ruleset-safety`) so the analyser's output is inspectable independent of grading a spec. Complete the internal rename Feature 11 deferred: no source, test, or current documentation file may reference "quick fix(es)" in any form afterward (historical `CHANGELOG.md`/`GOAL.md` entries excluded as accurate historical record). 
## Technical Context @@ -16,7 +16,7 @@ Build a deterministic, rule-metadata-driven ruleset analyser (`analyseRuleset()` **Primary Dependencies**: `@stoplight/spectral-rulesets` / `@stoplight/spectral-ruleset-bundler` (already used by `rulesets/loader.ts` to load rule metadata — the analyser reads `LoadedRuleset.ruleset.rules`, no new parsing dependency needed), `commander` (new `ruleset-analysis` CLI subcommand), `zod` (new `analyse-ruleset-safety` MCP tool schema + extended `level` enum), `@modelcontextprotocol/sdk` -**Storage**: N/A — `RulesetAnalysis` is computed fresh per request/grading run, never persisted (consistent with Feature 11's data model) +**Storage**: Three-tier persistence for ruleset analysis results — (1) bundled pre-calculated `BundledRulesetAnalysis` shipped with the package for the built-in rulesets (FR-012), (2) shared colocated `SharedRulesetAnalysis` stored alongside the ruleset file/URL for team sharing (FR-016/FR-017), (3) workspace/global `PersonalRulesetAnalysisOverride` for user-local corrections that take precedence without modifying shared data (FR-018). All three are read at Stage 0 of `analyseRuleset()` before heuristic stages run; writes occur only for local/writable rulesets (FR-019). `RulesetAnalysis` is otherwise ephemeral within a process invocation (not cached to disk beyond these stores). **Testing**: Vitest (`vitest run`). 
New unit tests for `analyseRuleset()`/`getRemediationSafety()` in `packages/api-grade-core/tests/unit/remediation-safety.test.ts` (replacing `quick-fixes.test.ts`); updated CLI integration test `tests/integration/cli-remediation-safety.test.ts` (replacing `cli-quick-fixes.test.ts`) covering all three levels plus the new `ruleset-analysis` subcommand; updated MCP integration test `packages/api-grade-mcp/tests/integration/remediation-safety.test.ts` (replacing `quick-fixes-only.test.ts`) plus a new test for `analyse-ruleset-safety` diff --git a/specs/012-remediation-safety/tasks.md b/specs/012-remediation-safety/tasks.md index 5ac9f58..1368435 100644 --- a/specs/012-remediation-safety/tasks.md +++ b/specs/012-remediation-safety/tasks.md @@ -7,7 +7,7 @@ description: "Task list for Feature 12: Remediation Safety (Ruleset Analyser & M **Input**: Design documents from `/specs/012-remediation-safety/` -**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/remediation-safety-surfaces.md, quickstart.md, `specs/algorithms/automated_remediation_safety_algorithm_spec.md` +**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/remediation-safety-surfaces.md, quickstart.md, `specs/algorithms/automated_remediation_safety_algorithm_spec.md` *(authored during the `/speckit-plan` phase — not a task output; see plan.md Scale/Scope note)* **Tests**: Included — Constitution Principle IV (Test-Driven Quality) and plan.md's Technical Context both mandate test coverage written alongside this feature's implementation. 
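
The three-tier Stage 0 lookup described in the plan.md Storage entry above can be sketched as follows. This is a minimal sketch, not the code in `analyseRuleset()`. It rests on three assumptions:

- Each persistence tier can be modelled as a map keyed by rule id.
- A personal override beats a shared entry, and a shared entry beats a bundled default. The plan states only that personal overrides take precedence; the shared-over-bundled order is an assumption.
- Fingerprint staleness checks can be left out.

The type and function names (`StoredRuleAnalysis`, `Stage0Stores`, `lookupStage0`) and the `personal-override` source label are hypothetical.

```typescript
// Hypothetical shapes: the real BundledRulesetAnalysis, SharedRulesetAnalysis and
// PersonalRulesetAnalysisOverride types in api-grade-core may differ.
type RiskLevel = 'low' | 'medium' | 'high';
type ConfidenceLevel = 'low' | 'medium' | 'high';

interface StoredRuleAnalysis {
  ruleId: string;
  riskLevel: RiskLevel;
  confidenceLevel: ConfidenceLevel;
  source: string;
}

// Each persistence tier is modelled as a read-only map keyed by rule id (assumption).
type AnalysisStore = ReadonlyMap<string, StoredRuleAnalysis>;

interface Stage0Stores {
  personal?: AnalysisStore; // workspace/global user-local corrections
  shared?: AnalysisStore; // colocated with the ruleset file or URL
  bundled?: AnalysisStore; // pre-calculated, shipped with the package
}

// Returns the first persisted entry for ruleId, checking tiers from most to
// least specific. undefined means no tier matched, so the heuristic stages run.
function lookupStage0(ruleId: string, stores: Stage0Stores): StoredRuleAnalysis | undefined {
  const tiers = [stores.personal, stores.shared, stores.bundled];
  for (const store of tiers) {
    const hit = store?.get(ruleId);
    if (hit !== undefined) return hit;
  }
  return undefined;
}

// Usage: a personal correction wins over the bundled default for the same rule.
const bundled = new Map<string, StoredRuleAnalysis>([
  ['path-keys-no-trailing-slash', { ruleId: 'path-keys-no-trailing-slash', riskLevel: 'high', confidenceLevel: 'high', source: 'bundled-default' }],
]);
const personal = new Map<string, StoredRuleAnalysis>([
  ['path-keys-no-trailing-slash', { ruleId: 'path-keys-no-trailing-slash', riskLevel: 'medium', confidenceLevel: 'high', source: 'personal-override' }],
]);

console.log(lookupStage0('path-keys-no-trailing-slash', { personal, bundled })?.source); // → personal-override
console.log(lookupStage0('path-keys-no-trailing-slash', { bundled })?.source); // → bundled-default
console.log(lookupStage0('custom-no-signal', { personal, bundled })); // → undefined
```

The tiers are checked in a fixed order rather than merged field by field. This way a personal correction replaces the whole stored entry and never mixes with shared or bundled data, consistent with the plan's note that personal overrides apply "without modifying shared data".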
From abd08c9ba9fbd4bbb691cc3503c4adaf8c4e8ece Mon Sep 17 00:00:00 2001 From: DawMatt Date: Fri, 26 Jun 2026 11:34:46 +1000 Subject: [PATCH 13/22] AsyncAPI channel risk fix for AAS2/3 --- .../api-grade-core/src/remediation-safety.ts | 30 ++++++++++-- .../rulesets/bundled-analysis/asyncapi.json | 18 +++---- .../rulesets/bundled-analysis/openapi.json | 18 +++---- .../tests/unit/remediation-safety.test.ts | 41 ++++++++++++++++ specs/012-remediation-safety/tasks.md | 22 +++++++++ ...mated_remediation_safety_algorithm_spec.md | 49 +++++++++++++------ 6 files changed, 141 insertions(+), 37 deletions(-) diff --git a/packages/api-grade-core/src/remediation-safety.ts b/packages/api-grade-core/src/remediation-safety.ts index 9e51ad6..0ae06bf 100644 --- a/packages/api-grade-core/src/remediation-safety.ts +++ b/packages/api-grade-core/src/remediation-safety.ts @@ -176,6 +176,13 @@ function fieldTokensOf(rule: SpectralRule): string[] { return thens.flatMap((t) => (typeof t?.field === 'string' ? tokenize(t.field) : [])); } +function fieldNamesOf(rule: SpectralRule): string[] { + const then = rule.then; + if (!then) return []; + const thens = Array.isArray(then) ? then : [then]; + return thens.map((t) => t?.field).filter((f): f is string => typeof f === 'string'); +} + function matchedTiers(givenExprs: string[], extraSegments: string[] = []): Set { const tiers = new Set(); const scan = (segment: string): void => { @@ -202,8 +209,12 @@ function tierToRisk(tier: Tier): RiskLevel { return tier === 'unsafe' ? 'high' : tier === 'humanreview' ? 'medium' : 'low'; } -// Stage 1a: a `given` expression that selects path/channel object keys directly. -function stage1a(givenExprs: string[]): StageResult | null { +// Stage 1a: a `given` expression that selects path/channel object keys directly (via the JSONPath +// `~` key-selector), OR a rule using Spectral's `then.field: "@key"` on a paths/channels +// collection — the function-based equivalent of the `~` key-selector. 
In AsyncAPI 2.x the channel +// key IS the routing address; in OpenAPI the path key is the route. Both forms carry identical +// semantic risk: any satisfying edit renames a public path or channel. +function stage1a(givenExprs: string[], fieldNames: string[] = []): StageResult | null { for (const given of givenExprs) { if (!isKeySelector(given)) continue; const tokens = tokenize(given); @@ -217,6 +228,18 @@ function stage1a(givenExprs: string[]): StageResult | null { }; } } + if (fieldNames.includes('@key')) { + const givenTokens = givenExprs.flatMap(tokenize); + if (givenTokens.includes('paths') || givenTokens.includes('channels')) { + return { + riskLevel: 'high', + confidenceLevel: 'high', + rationale: + 'then.field "@key" on paths/channels collection — equivalent to a path/channel key-selector; any satisfying edit renames a public path or channel', + source: 'heuristic', + }; + } + } return null; } @@ -285,8 +308,9 @@ const STAGE2_FALLBACK: StageResult = { function classifyRuleStages1And2(rule: SpectralRule, aliases: AliasMap): StageResult { const givenExprs = resolvedGivenExprsOf(rule, aliases); + const fieldNames = fieldNamesOf(rule); const fieldTokens = fieldTokensOf(rule); - const a = stage1a(givenExprs); + const a = stage1a(givenExprs, fieldNames); if (a) return a; const functionNames = functionNamesOf(rule); const b = stage1b(givenExprs, functionNames, fieldTokens); diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json b/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json index bc0f47a..f284c9e 100644 --- a/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json +++ b/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json @@ -2,12 +2,12 @@ "rules": { "asyncapi-channel-no-empty-parameter": { "ruleId": "asyncapi-channel-no-empty-parameter", - "riskLevel": "medium", + "riskLevel": "high", "confidenceLevel": "high", - "remediationSafetyLevel": "humanreview", + "remediationSafetyLevel": 
"unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "then.field \"@key\" on paths/channels collection — equivalent to a path/channel key-selector; any satisfying edit renames a public path or channel", "source": "bundled-default", "fingerprint": "faef5a6495551bbb7fa3f2d0d85ba9e4703ba0fe0cbccf3c40a8c502ba81f98b" }, @@ -24,12 +24,12 @@ }, "asyncapi-channel-no-query-nor-fragment": { "ruleId": "asyncapi-channel-no-query-nor-fragment", - "riskLevel": "medium", + "riskLevel": "high", "confidenceLevel": "high", - "remediationSafetyLevel": "humanreview", + "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "then.field \"@key\" on paths/channels collection — equivalent to a path/channel key-selector; any satisfying edit renames a public path or channel", "source": "bundled-default", "fingerprint": "73eff0af4d20d3818101af39362816099cd3a524492d55d905d50b2a3c3cbff5" }, @@ -46,12 +46,12 @@ }, "asyncapi-channel-no-trailing-slash": { "ruleId": "asyncapi-channel-no-trailing-slash", - "riskLevel": "medium", + "riskLevel": "high", "confidenceLevel": "high", - "remediationSafetyLevel": "humanreview", + "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "then.field \"@key\" on paths/channels collection — equivalent to a path/channel key-selector; any satisfying edit renames a public path or channel", "source": "bundled-default", "fingerprint": "6e35de3017f8bd11b0aeefee5728c0a7d1d31af3765c0345f79c89c6d87a2953" }, diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json b/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json index 
3c079e0..b3e3b60 100644 --- a/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json +++ b/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json @@ -244,34 +244,34 @@ }, "path-declarations-must-exist": { "ruleId": "path-declarations-must-exist", - "riskLevel": "medium", + "riskLevel": "high", "confidenceLevel": "high", - "remediationSafetyLevel": "humanreview", + "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "then.field \"@key\" on paths/channels collection — equivalent to a path/channel key-selector; any satisfying edit renames a public path or channel", "source": "bundled-default", "fingerprint": "f43a9d6a968dc24d5717f606028c828be8ea73057afcfdcb173d6f2528335408" }, "path-keys-no-trailing-slash": { "ruleId": "path-keys-no-trailing-slash", - "riskLevel": "medium", + "riskLevel": "high", "confidenceLevel": "high", - "remediationSafetyLevel": "humanreview", + "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "then.field \"@key\" on paths/channels collection — equivalent to a path/channel key-selector; any satisfying edit renames a public path or channel", "source": "bundled-default", "fingerprint": "ea8d153efe6ac9cc8942c5f0292402751fa472ee5776e8a9a8dcafa7f99826a4" }, "path-not-include-query": { "ruleId": "path-not-include-query", - "riskLevel": "medium", + "riskLevel": "high", "confidenceLevel": "high", - "remediationSafetyLevel": "humanreview", + "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "then.field \"@key\" on paths/channels collection — equivalent to a path/channel 
key-selector; any satisfying edit renames a public path or channel", "source": "bundled-default", "fingerprint": "e2738a25c076ae7f0ef2187431847d76fcab2f3e3d9f543b392f1ab9fa7138c6" }, diff --git a/packages/api-grade-core/tests/unit/remediation-safety.test.ts b/packages/api-grade-core/tests/unit/remediation-safety.test.ts index c59ce88..cce63db 100644 --- a/packages/api-grade-core/tests/unit/remediation-safety.test.ts +++ b/packages/api-grade-core/tests/unit/remediation-safety.test.ts @@ -72,6 +72,47 @@ describe('analyseRuleset() — Stage 1a key-selector check', () => { expect(rules[0].riskLevel).toBe('high'); expect(rules[0].confidenceLevel).toBe('high'); }); + + it('classifies then.field "@key" on $.channels as unsafe/high (AsyncAPI 2.x pattern)', async () => { + const ruleset = makeRuleset({ + 'asyncapi-channel-no-empty-parameter': { + given: '$.channels', + then: { field: '@key', function: 'pattern' }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('high'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + expect(rules[0].source).toBe('heuristic'); + }); + + it('classifies then.field "@key" on $.paths as unsafe/high', async () => { + const ruleset = makeRuleset({ + 'custom-path-key-rule': { + given: '$.paths', + then: { field: '@key', function: 'pattern' }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].confidenceLevel).toBe('high'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + }); + + it('does NOT apply @key check when given does not target paths/channels', async () => { + const ruleset = makeRuleset({ + 'custom-schema-key-rule': { + given: '$.components.schemas', + then: { field: '@key', function: 'pattern' }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + // @key on $.components.schemas → not paths/channels → falls through to Stage 1b + // pattern 
fn, schemas has no tier match → medium risk, high confidence (single tier) + expect(rules[0].riskLevel).toBe('medium'); + expect(rules[0].remediationSafetyLevel).toBe('humanreview'); + }); }); describe('analyseRuleset() — Stage 1b function-mechanics classification', () => { diff --git a/specs/012-remediation-safety/tasks.md b/specs/012-remediation-safety/tasks.md index 1368435..e630b6d 100644 --- a/specs/012-remediation-safety/tasks.md +++ b/specs/012-remediation-safety/tasks.md @@ -152,6 +152,28 @@ sections for the corrected contract. --- +--- + +## Phase 8: Heuristic correctness — `then.field "@key"` on paths/channels (post-Phase 7) + +**Purpose**: Stage 2a of the heuristic only recognised the JSONPath `~` key-selector form (e.g. +`$.channels[*]~`) as targeting path/channel keys. Spectral's built-in rulesets also express the +same semantics via `then.field: "@key"` on `given: "$.channels"` or `given: "$.paths"` (the +function-based equivalent). Without this check, those rules were falling into Stage 1b's +`pattern`/`casing` default logic and receiving `medium/high` risk (humanreview) instead of the +correct `high/high` (unsafe) — e.g. `asyncapi-channel-no-empty-parameter` and its siblings, and +the OpenAPI `path-keys-no-trailing-slash` / `path-not-include-query` / +`path-declarations-must-exist` rules. 
+ +- [X] T048 Extend Stage 1a in `packages/api-grade-core/src/remediation-safety.ts`: add `fieldNamesOf()` helper (raw `then.field` strings, not tokenized), pass field names to `stage1a()`, and check for `then.field: "@key"` on a `given` that tokenizes to include `"paths"` or `"channels"` — returning `high/high/unsafe` with the key-selector-equivalent rationale; update `classifyRuleStages1And2()` to pass `fieldNames` — depends on T006 +- [X] T049 [P] Add three unit tests to `packages/api-grade-core/tests/unit/remediation-safety.test.ts` in the Stage 1a describe block: `$.channels` + `field: "@key"` → `high/high/unsafe`; `$.paths` + `field: "@key"` → `high/high/unsafe`; `$.components.schemas` + `field: "@key"` → still `medium` (control, no paths/channels) — depends on T048 +- [X] T050 [P] Update `specs/algorithms/automated_remediation_safety_algorithm_spec.md` Stage 2a: document both the `~` and the `@key` checks, add rationale for why `@key` carries identical risk in AsyncAPI 2.x (channel key IS the routing address) and OpenAPI (path key IS the route); update the Example table to include `asyncapi-channel-no-empty-parameter` — depends on T048 +- [X] T051 Rebuild `packages/api-grade-core` (`npm run build`) and regenerate the bundled analysis (`node scripts/generate-bundled-analysis.mjs`): 6 AsyncAPI 2.x channel rules and 3 OpenAPI path-key rules are upgraded from `medium/high/humanreview` to `high/high/unsafe` in `src/rulesets/bundled-analysis/{asyncapi,openapi}.json` — depends on T048, T050 + +**Checkpoint**: `vitest run` (all workspaces) passes; bundled analysis reflects corrected `@key` classifications for all 9 affected rules. 
+ +--- + ## Dependencies & Execution Order ### Phase Dependencies diff --git a/specs/algorithms/automated_remediation_safety_algorithm_spec.md b/specs/algorithms/automated_remediation_safety_algorithm_spec.md index 7b7e994..0efdf5a 100644 --- a/specs/algorithms/automated_remediation_safety_algorithm_spec.md +++ b/specs/algorithms/automated_remediation_safety_algorithm_spec.md @@ -118,16 +118,31 @@ Runs only when Stage 1 doesn't match. Two checks, in order: a structural **key-s ### Stage 2a: Key-Selector Check +Catches rules that target the *key* of a `paths` or `channels` collection — any such rule cannot be satisfied without renaming a public route or channel. Two equivalent spellings exist in Spectral: the JSONPath `~` key-selector on the `given` expression, and `then.field: "@key"` on a `given` that targets the collection itself. Both are checked here; neither appears in the segment-membership set in 2b. + ``` -IS_KEY_SELECTOR(given) = given matches a JSONPath Plus object-key selector - (the trailing "~" modifier, e.g. "$.paths[*]~", "$.channels[*]~") +IS_KEY_SELECTOR(given) = given ends with the JSONPath "~" modifier + (e.g. "$.paths[*]~", "$.channels[*]~") + +IS_KEY_FIELD(rule) = rule.then.field == "@key" + (Spectral's function-based equivalent of the "~" key-selector, + used e.g. 
by AsyncAPI 2.x channel rules such as asyncapi-channel-no-empty-parameter) FOR EACH given_expr IN rule.given: - IF IS_KEY_SELECTOR(given_expr) AND given_expr contains "paths" or "channels" as the selected collection: - RETURN { riskLevel: "unsafe", confidenceLevel: "high", rationale: "given path selects path/channel object keys directly — any satisfying edit renames a public path or channel", source: "heuristic" } + IF IS_KEY_SELECTOR(given_expr) AND given_expr contains "paths" or "channels": + RETURN { riskLevel: "high", confidenceLevel: "high", + rationale: "given path selects path/channel object keys directly — any satisfying edit renames a public path or channel", + source: "heuristic" } + +IF IS_KEY_FIELD(rule) AND (rule.given tokens include "paths" or "channels"): + RETURN { riskLevel: "high", confidenceLevel: "high", + rationale: "then.field \"@key\" on paths/channels collection — equivalent to a path/channel key-selector; any satisfying edit renames a public path or channel", + source: "heuristic" } ``` -**Rationale:** a rule targeting the *keys* of `paths`/`channels` (e.g. a kebab-case naming convention, `clarification-algorithm.md`'s Example B) cannot be satisfied without renaming a real, public path or channel — by construction this is the riskiest, highest-confidence case the heuristic can recognize, and it would otherwise be missed by segment-membership matching alone (`paths`/`channels` are deliberately *not* included as bare segments in 2b, since most rules with `paths`/`channels` somewhere in their `given` — e.g. `operation-description`, which reaches `$.paths[*][*].description` — are not targeting the key itself and must not be over-classified as unsafe). +**Rationale for `~` check:** a rule targeting the *keys* of `paths`/`channels` (e.g. a kebab-case naming convention) cannot be satisfied without renaming a real, public path or channel — by construction the riskiest, highest-confidence case the heuristic can recognize. 
+ +**Rationale for `@key` check:** Spectral's built-in rulesets often use `given: "$.paths"` or `given: "$.channels"` with `then.field: "@key"` rather than `given: "$.paths[*]~"` — these are semantically identical (both select the collection key), but the `~` check above would miss them. In AsyncAPI 2.x the channel key *is* the routing address; in OpenAPI the path key *is* the route. Both target the same renaming risk. `paths` and `channels` are deliberately *not* included as bare segments in 2b, since most rules with those tokens in their `given` reach into the collection's content (e.g. `operation-description` → `$.paths[*][*].description`) and must not be over-classified as unsafe. ### Stage 2b: Segment-Membership Heuristic @@ -236,15 +251,16 @@ get_remediation_safety(diagnostic, rulesetAnalysis): ``` rules = [ - { id: "operation-description", given: "$.paths[*][*]" }, - { id: "operation-operationId", given: "$.paths[*][*]" }, - { id: "oas3-schema", given: "$" }, - { id: "custom-required-header", given: "$.paths[*][*].parameters[?(@.in=='header')].required" }, - { id: "custom-naming-convention", given: "$.paths[*]~" }, - { id: "custom-channel-rename", given: "$.channels[*]~" }, - { id: "custom-channel-address", given: "$.channels[*].address" }, - { id: "custom-no-signal", given: "$.x-custom-thing" }, - { id: "previously-reviewed-rule", given: "$.unrecognizedExtension" } + { id: "operation-description", given: "$.paths[*][*]", then: { field: "description", function: "truthy" } }, + { id: "operation-operationId", given: "$.paths[*][*]", then: { field: "operationId", function: "truthy" } }, + { id: "oas3-schema", given: "$" }, + { id: "custom-required-header", given: "$.paths[*][*].parameters[*].required", then: { function: "schema" } }, + { id: "custom-naming-convention", given: "$.paths[*]~", then: { function: "casing" } }, + { id: "custom-channel-rename", given: "$.channels[*]~", then: { function: "casing" } }, + { id: "asyncapi-channel-no-empty-parameter", 
given: "$.channels", then: { field: "@key", function: "pattern" } }, + { id: "custom-channel-address", given: "$.channels[*].address", then: { function: "pattern" } }, + { id: "custom-no-signal", given: "$.x-custom-thing" }, + { id: "previously-reviewed-rule", given: "$.unrecognizedExtension" } ] ``` @@ -254,8 +270,9 @@ rules = [ | `operation-operationId` | Stage 1 (humanreview table) | `humanreview` | `high` | Rule id matched curated humanreview-prefix table | | `oas3-schema` | Stage 1 (unsafe table) | `unsafe` | `high` | Rule id matched curated unsafe-prefix table | | `custom-required-header` | Stage 2b (`required` segment) | `unsafe` | `medium` | `given` path matched the unsafe segment set only | -| `custom-naming-convention` | Stage 2a (path key-selector) | `unsafe` | `high` | `given` selects path object keys directly | -| `custom-channel-rename` | Stage 2a (channel key-selector) | `unsafe` | `high` | `given` selects channel object keys directly | +| `custom-naming-convention` | Stage 2a (`~` key-selector on paths) | `unsafe` | `high` | `given` selects path object keys directly via JSONPath `~` | +| `custom-channel-rename` | Stage 2a (`~` key-selector on channels) | `unsafe` | `high` | `given` selects channel object keys directly via JSONPath `~` | +| `asyncapi-channel-no-empty-parameter` | Stage 2a (`@key` field on channels) | `unsafe` | `high` | `then.field "@key"` on `$.channels` — equivalent to a channel key-selector; used by AsyncAPI 2.x rules where the channel key is the routing address | | `custom-channel-address` | Stage 2b (`address` segment) | `unsafe` | `medium` | `given` path matched the unsafe segment set only (AsyncAPI channel address) | | `custom-no-signal` | Stage 3 (fallback) | `unsafe` | `low` | No recognizable rule-id or path signal | | `previously-reviewed-rule` | Stage 0 (persisted) | *(whatever was set)* | `high` | Matched a shared or personal override for this exact rule definition, from a prior run against this same ruleset | From 
37a98422ea90a0659d0e47985441f369f2365bc9 Mon Sep 17 00:00:00 2001 From: DawMatt Date: Fri, 26 Jun 2026 11:50:16 +1000 Subject: [PATCH 14/22] Supports pattern function existence check technique --- .../api-grade-core/src/remediation-safety.ts | 42 ++++++++++++- .../rulesets/bundled-analysis/asyncapi.json | 18 +++--- .../rulesets/bundled-analysis/openapi.json | 12 ++-- .../tests/unit/remediation-safety.test.ts | 62 +++++++++++++++++++ specs/012-remediation-safety/tasks.md | 14 +++++ ...mated_remediation_safety_algorithm_spec.md | 17 +++++ 6 files changed, 148 insertions(+), 17 deletions(-) diff --git a/packages/api-grade-core/src/remediation-safety.ts b/packages/api-grade-core/src/remediation-safety.ts index 0ae06bf..d0cbe29 100644 --- a/packages/api-grade-core/src/remediation-safety.ts +++ b/packages/api-grade-core/src/remediation-safety.ts @@ -42,6 +42,7 @@ interface SpectralThen { // before bundling carry it as a plain string. Both forms must be handled. function?: string | { name?: string }; field?: string; + functionOptions?: Record; } interface SpectralRule { @@ -183,6 +184,22 @@ function fieldNamesOf(rule: SpectralRule): string[] { return thens.map((t) => t?.field).filter((f): f is string => typeof f === 'string'); } +// `pattern` with `notMatch`-only (no `match`) is an existence/validity check — it asserts the +// field does NOT contain a bad value (empty object, trailing slash, example.com, etc.) rather +// than enforcing a specific format or naming convention. Semantically closer to `falsy`/`truthy` +// than to `casing` or a format-match pattern. When `match` is also present the intent is +// ambiguous so we fall through to the rename/reformat classification. +function isPatternExistenceCheck(rule: SpectralRule): boolean { + const then = rule.then; + if (!then) return false; + const thens = Array.isArray(then) ? 
then : [then]; + return thens.some((t) => { + if (functionNameOf(t?.function) !== 'pattern') return false; + const opts = t?.functionOptions; + return typeof opts === 'object' && opts !== null && 'notMatch' in opts && !('match' in opts); + }); +} + function matchedTiers(givenExprs: string[], extraSegments: string[] = []): Set { const tiers = new Set(); const scan = (segment: string): void => { @@ -244,7 +261,12 @@ function stage1a(givenExprs: string[], fieldNames: string[] = []): StageResult | } // Stage 1b: classify by the rule's `then.function` mechanics. -function stage1b(givenExprs: string[], functionNames: string[], fieldTokens: string[]): StageResult | null { +function stage1b( + givenExprs: string[], + functionNames: string[], + fieldTokens: string[], + patternIsExistenceCheck: boolean, +): StageResult | null { if (functionNames.length === 0) return null; const tiers = matchedTiers(givenExprs, fieldTokens); @@ -261,6 +283,21 @@ function stage1b(givenExprs: string[], functionNames: string[], fieldTokens: str source: 'heuristic', }; } + if (fn === 'pattern' && patternIsExistenceCheck) { + // notMatch-only pattern: existence/validity check, not rename/reformat. Risk escalates the + // same as additive on recognized tiers; falls back to medium (not low) on an unrecognized + // target so that e.g. `pattern` on a bare `$` with `field: host` stays conservative. + let riskLevel: RiskLevel = tiers.size === 0 ? 'medium' : 'low'; + if (tiers.has('unsafe')) riskLevel = 'high'; + else if (tiers.has('humanreview')) riskLevel = 'medium'; + const confidenceLevel: ConfidenceLevel = tiers.size <= 1 ? 
'high' : 'medium'; + return { + riskLevel, + confidenceLevel, + rationale: `\`pattern\` function (existence/validity check — \`notMatch\` validates content is present and correctly formed) on a target matching the ${riskLevel} tier`, + source: 'heuristic', + }; + } if (RENAME_FUNCTIONS.has(fn)) { let riskLevel: RiskLevel = 'medium'; if (tiers.has('unsafe')) riskLevel = 'high'; @@ -313,7 +350,8 @@ function classifyRuleStages1And2(rule: SpectralRule, aliases: AliasMap): StageRe const a = stage1a(givenExprs, fieldNames); if (a) return a; const functionNames = functionNamesOf(rule); - const b = stage1b(givenExprs, functionNames, fieldTokens); + const patternIsExistenceCheck = isPatternExistenceCheck(rule); + const b = stage1b(givenExprs, functionNames, fieldTokens, patternIsExistenceCheck); if (b) return b; const c = stage1c(givenExprs, fieldTokens); if (c) return c; diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json b/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json index f284c9e..4fc6508 100644 --- a/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json +++ b/packages/api-grade-core/src/rulesets/bundled-analysis/asyncapi.json @@ -18,7 +18,7 @@ "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the high tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the high tier", "source": "bundled-default", "fingerprint": "7f20f52e4806ac9b686d5530ed6e7ff83965a6fc3afb47a2391ffad96d227031" }, @@ -40,7 +40,7 @@ "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the high tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and 
correctly formed) on a target matching the high tier", "source": "bundled-default", "fingerprint": "fbcc987e5a67e86b9220fee87b3226bd0b6d923fec9eb169c4aaad7d31b611ac" }, @@ -62,7 +62,7 @@ "remediationSafetyLevel": "unsafe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the high tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the high tier", "source": "bundled-default", "fingerprint": "f3bc3a732e2ceb56f9ab5cfd9b75a12399480b456916d70cc609661a49c5868a" }, @@ -392,7 +392,7 @@ "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the medium tier", "source": "bundled-default", "fingerprint": "995dd4127b0111b66a8ccd95296c472a7ea005236c7bfb64abfdb9938dfaa45e" }, @@ -403,7 +403,7 @@ "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the medium tier", "source": "bundled-default", "fingerprint": "c03357aacd4594dd77bd6823adeb8d811b559a59eb591c242ce499ec68064cd7" }, @@ -414,7 +414,7 @@ "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the 
medium tier", "source": "bundled-default", "fingerprint": "73217cc58744bd1271f647e40904a6b5f65919124927e109532cf19cb540d2a2" }, @@ -425,7 +425,7 @@ "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the medium tier", "source": "bundled-default", "fingerprint": "dbd318934d060020a4b9dd9ca039da253d8a9f5c5803c29997461ec59407c12d" }, @@ -436,7 +436,7 @@ "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the medium tier", "source": "bundled-default", "fingerprint": "6d16fc8ff55cfbbd2bce5a0ea27a6f480ef42bb7b3affe275bcabd86bd7d7b9c" }, @@ -447,7 +447,7 @@ "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the medium tier", "source": "bundled-default", "fingerprint": "4ee30c83abcb62482873a1ce7892a7b20fc10af9f6b05fb7d542701bd3297eb1" }, diff --git a/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json b/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json index b3e3b60..ca3198b 100644 --- a/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json +++ b/packages/api-grade-core/src/rulesets/bundled-analysis/openapi.json @@ -139,7 +139,7 @@ "remediationSafetyLevel": "safe", "assessedBy": 
"automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the low tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the low tier", "source": "bundled-default", "fingerprint": "d3ca7e2f67d2e957e30655eb4c6d567998bdbbdcbf390a12e71a23fdce3157fd" }, @@ -150,7 +150,7 @@ "remediationSafetyLevel": "safe", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the low tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the low tier", "source": "bundled-default", "fingerprint": "be2229ebfd803a42d313318291544db932781c34973e60620fffda348fd0f394" }, @@ -359,7 +359,7 @@ "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the medium tier", "source": "bundled-default", "fingerprint": "411bdde4f2ea237f4c2871f690551e6f5f6de96501448981823681e2b8dc9e5e" }, @@ -370,7 +370,7 @@ "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the medium tier", "source": "bundled-default", "fingerprint": "ab7f3cd618bb07568e09911593a5eeab78ee8c22d04a0048696320863d562a16" }, @@ -513,7 +513,7 @@ "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - 
"rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the medium tier", "source": "bundled-default", "fingerprint": "006381a3500034d8448234c9a2bc150b35b8eb1de914cd3d14fa8d83a176e172" }, @@ -524,7 +524,7 @@ "remediationSafetyLevel": "humanreview", "assessedBy": "automated", "staleFingerprintWarning": null, - "rationale": "`pattern` function (rename/reformat) on a target matching the medium tier", + "rationale": "`pattern` function (existence/validity check — `notMatch` validates content is present and correctly formed) on a target matching the medium tier", "source": "bundled-default", "fingerprint": "842d08729f0957b1bd259f16efbba29ae28dfa018339113170bde28633a0db7f" }, diff --git a/packages/api-grade-core/tests/unit/remediation-safety.test.ts b/packages/api-grade-core/tests/unit/remediation-safety.test.ts index cce63db..2a3fc9c 100644 --- a/packages/api-grade-core/tests/unit/remediation-safety.test.ts +++ b/packages/api-grade-core/tests/unit/remediation-safety.test.ts @@ -142,6 +142,68 @@ describe('analyseRuleset() — Stage 1b function-mechanics classification', () = expect(rules[0].riskLevel).toBe('medium'); }); + it('pattern with match functionOption => rename/reformat classification (not existence check)', async () => { + const ruleset = makeRuleset({ + 'custom-format-rule': { + given: '$.paths[*][*]', + then: { field: 'operationId', function: 'pattern', functionOptions: { match: '^[a-z-]+$' } }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].riskLevel).toBe('medium'); + expect(rules[0].rationale).toContain('rename/reformat'); + }); + + it('pattern with notMatch-only => existence/validity check classification', async () => { + const ruleset = makeRuleset({ + 'asyncapi-3-channel-no-empty-parameter': { + given: '$.channels.*', + then: { field: 'address', 
function: 'pattern', functionOptions: { notMatch: '{}' } }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + // address in UNSAFE_SEGMENTS → high risk regardless of function mode + expect(rules[0].riskLevel).toBe('high'); + expect(rules[0].remediationSafetyLevel).toBe('unsafe'); + expect(rules[0].rationale).toContain('existence/validity check'); + }); + + it('pattern with notMatch-only on safe segment => low risk (additive)', async () => { + const ruleset = makeRuleset({ + 'custom-no-script-in-description': { + given: '$.paths[*][*]', + then: { field: 'description', function: 'pattern', functionOptions: { notMatch: ' conservative medium (not low)', async () => { + const ruleset = makeRuleset({ + 'custom-host-check': { + given: '$', + then: { field: 'host', function: 'pattern', functionOptions: { notMatch: 'example\\.com' } }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + // host has no tier match; empty tiers → conservative medium, not low + expect(rules[0].riskLevel).toBe('medium'); + expect(rules[0].remediationSafetyLevel).toBe('humanreview'); + }); + + it('pattern with both match and notMatch => rename/reformat (not existence check)', async () => { + const ruleset = makeRuleset({ + 'custom-ambiguous-pattern': { + given: '$.paths[*][*]', + then: { field: 'operationId', function: 'pattern', functionOptions: { match: '^[a-z]', notMatch: '__' } }, + }, + }); + const { rules } = await analyseRuleset(ruleset); + expect(rules[0].rationale).toContain('rename/reformat'); + }); + it('custom (unrecognized) function => high risk, low confidence', async () => { const ruleset = makeRuleset({ 'my-custom-rule': { given: '$.info', then: { function: 'myCustomFn' } }, diff --git a/specs/012-remediation-safety/tasks.md b/specs/012-remediation-safety/tasks.md index e630b6d..d760aed 100644 --- a/specs/012-remediation-safety/tasks.md +++ b/specs/012-remediation-safety/tasks.md @@ -174,6 +174,20 @@ the OpenAPI `path-keys-no-trailing-slash` / 
`path-not-include-query` / --- +## Phase 9: Heuristic correctness — `pattern` `notMatch`-only is existence check, not rename (post-Phase 8) + +**Purpose**: Stage 1b classified all `pattern` uses as "rename/reformat", defaulting to `medium` risk. But `pattern` with `notMatch`-only in `functionOptions` is semantically an existence/validity check (closer to `falsy`/`truthy`) — the fix adds content, not reformats it. This produced accurate risk levels for the built-in rulesets (target tiers dominate) but incorrect rationale text ("rename/reformat" for emptiness checks like `notMatch: '{}'`). It also mis-classified custom `pattern`+`notMatch` rules on SAFE_SEGMENTS targets as `medium` (rename default) instead of `low` (additive). + +- [X] T052 Add `functionOptions` field to `SpectralThen` interface in `packages/api-grade-core/src/remediation-safety.ts`; add `isPatternExistenceCheck()` helper (true when any `then.function: "pattern"` has `notMatch` in `functionOptions` and no `match`) — depends on T048 +- [X] T053 Update `stage1b()` in `packages/api-grade-core/src/remediation-safety.ts`: add `patternIsExistenceCheck` parameter; for `pattern` when that flag is set, apply additive-style tier escalation with a conservative `medium` fallback on empty tiers (unknown target); update `classifyRuleStages1And2()` to compute and pass the flag — depends on T052 +- [X] T054 [P] Add five unit tests to `packages/api-grade-core/tests/unit/remediation-safety.test.ts` in the Stage 1b describe block: `pattern`+`match` → rename rationale; `pattern`+`notMatch` on unsafe segment → high/unsafe with existence-check rationale; `pattern`+`notMatch` on safe segment → low/safe; `pattern`+`notMatch` on unknown target → conservative medium; `pattern`+both `match`+`notMatch` → rename rationale — depends on T053 +- [X] T055 [P] Update `specs/algorithms/automated_remediation_safety_algorithm_spec.md` Stage 2: add Stage 2a(ii) documenting the `pattern` function-mode distinction (`notMatch`-only vs 
`match`/no-options) with rationale; no risk-level changes in built-in rulesets (tier lookup dominates), but rationale text and custom-rule handling are corrected — depends on T053 +- [X] T056 Rebuild `packages/api-grade-core` and regenerate bundled analysis: rationale text updated for all `notMatch`-only `pattern` rules (no risk-level changes); risk levels confirmed stable via test suite — depends on T053, T055 + +**Checkpoint**: `vitest run` (all workspaces) passes (376 tests); bundled analysis shows "existence/validity check" rationale for `notMatch`-only `pattern` rules; risk levels unchanged. + +--- + ## Dependencies & Execution Order ### Phase Dependencies diff --git a/specs/algorithms/automated_remediation_safety_algorithm_spec.md b/specs/algorithms/automated_remediation_safety_algorithm_spec.md index 0efdf5a..4b59c85 100644 --- a/specs/algorithms/automated_remediation_safety_algorithm_spec.md +++ b/specs/algorithms/automated_remediation_safety_algorithm_spec.md @@ -144,6 +144,23 @@ IF IS_KEY_FIELD(rule) AND (rule.given tokens include "paths" or "channels"): **Rationale for `@key` check:** Spectral's built-in rulesets often use `given: "$.paths"` or `given: "$.channels"` with `then.field: "@key"` rather than `given: "$.paths[*]~"` — these are semantically identical (both select the collection key), but the `~` check above would miss them. In AsyncAPI 2.x the channel key *is* the routing address; in OpenAPI the path key *is* the route. Both target the same renaming risk. `paths` and `channels` are deliberately *not* included as bare segments in 2b, since most rules with those tokens in their `given` reach into the collection's content (e.g. `operation-description` → `$.paths[*][*].description`) and must not be over-classified as unsafe. 
+### Stage 2a(ii): `pattern` Function-Mode Distinction + +Before applying the rename/reformat classification to a `pattern` function, the implementation checks `then.functionOptions` to distinguish two semantically different uses of `pattern`: + +``` +IS_EXISTENCE_CHECK(rule) = + ANY t IN rule.then (a lone `then` object counts as a one-element list): + t.function == "pattern" AND "notMatch" in t.functionOptions + AND "match" NOT in t.functionOptions +``` + +- **`notMatch`-only** (`IS_EXISTENCE_CHECK` = true): the rule asserts that the field does NOT contain a bad value (empty object `{}`, trailing slash, `example.com`, `