Prompt-eval self-improve loop, ADP/HK doc remedies, and viewer UX polish by ffedoroff · Pull Request #17 · ffedoroff/code-ranker

ffedoroff · 2026-06-24T12:57:02Z

Accumulated work on this branch. Main groups:

Prompt-eval (self-improving fix-prompt loop)

New/expanded contrib/prompting-self-improve.md playbook: encode session corrections, branch = build-dir naming, new_cycles false-positive on shrunk cycles, prompt-gap-vs-capability-ceiling rule, and workspace-scope verification + sweep-mode lessons (cargo check --workspace + cargo test --workspace --no-run, not just per-crate).
contrib/prompt-eval-metrics.py: focus-aware collector for non-cycle metrics.

Rust ADP/HK doc remedies

ADP: visibility-narrowing as the explicit step-1 remedy (before extract/move).
HK: ordered remedies (narrow artificial fan-in, then split by role); an audiences-first lever was trialed and reverted (capability ceiling on god-modules).
AI.md: reading --doc <principle> is a non-optional fix-loop step; scaffolding front-loads --doc {id}.

CLI / docs

Fix: metric docs route through doc_overrides, not always base/.
e2e report goldens regenerated for the front-loaded doc_note.

Viewer UX

Header brand reads code-ranker (not upper-cased); exact tool version (snapshot.versions["code-ranker"]) shown small, centred, under the brand on hover.
stat button toggles the diff-summary popup (second click closes) and stays highlighted (.active) while open.

Docs / README

README lists supported languages (Rust production-ready; the other eight beta).
Viewer DESIGN/PRD synced for the brand/version + stat toggle.

Validation: make all green (build / test / clippy / lint / self-check; coverage 96.3% lines), markdownlint + lychee clean.

Map interaction fixes surfaced while dogfooding the report: - Flicker: drop the opacity transition on edges. Animating opacity on hundreds of edge path/polygon elements promoted each to its own compositor layer for the tween and flickered the whole page (worse the more elements there are). Snapping is a single repaint. Confirmed via a performance trace (7408 opacity Animation events on edges). - Hover de-flicker: ignore the synthetic re-mouseenter that raisePaint's reparent fires on the same node; ignore a cluster mouseleave while the pointer is still inside the cluster bbox (crossing sibling nodes/edges painted over the folder background no longer toggles the highlight). - Navigation: clicking a single-file box (files-level, key == node id) opens the file modal instead of drilling into a degenerate single-file group (which also leaked a {target}-prefixed id into the breadcrumb); Shift-click now selects file-node boxes in the overview too. - Open-source modifier: Alt/Option on every platform (was Cmd on macOS / Ctrl elsewhere — Cmd clashed with copy/paste, Ctrl is right-click on macOS); rename the cursor class .ctrl-link -> .src-link. - Crate cluster at dig>0 rendered neutral grey (red/pink reserved for the crate overview at dig 0). - Layout: .svg-frame min-height 420px; zoom buttons and the shortcut legend lifted clear of the bottom status bar; tier-menu options share one style (drop the active .on accent). Docs synced: docs/code-ranker-viewer/{PRD,DESIGN}.md.

…line --doc catalog Make the vocabulary consistent across CLI, config, snapshot, viewer and docs, and make every principle/metric doc reachable offline. Breaking by design (few clients yet); SARIF's standard `rules` field is the only "rules" kept. Terminology - preset → **principle** everywhere: Rust types (`Preset`→`Principle`, `preset.rs`→`principle.rs`), snapshot JSON field `presets`→`principles`, config `[[presets]]`/`[presets.<ID>]` → `[[principles]]`/`[principles.<ID>]`, viewer JS (`snap.principles`) + CSS classes (`exp-principle-*`), docs. - `--focus-rule` → **`--focus`** (old name removed, no alias). value_name fixed (`check`: RULE|GROUP, `report`: METRIC | PRINCIPLE); dropped the false "Mirrors check's --focus-rule" docstring (the two are different operations). - **rule** is reserved for the `check` gate only (`[rules]`, rule ids, SARIF `rules`); principles/metrics are no longer called "rules". - **lens**/**axis** (the framing concept) → **focus**. Kept the unrelated viewer "reveal-depth lens chip" and OCP's "axis of change". Offline docs - `remediation` now points at `code-ranker report --doc <ID>` instead of a GitHub download URL; `--doc` resolves any base doc by filename stem (`Fan-in`, `metrics`, `AI`) and `--doc cycle` → the ADP doc. - New **`--doc AI`**: an AI-agent overview + a catalog auto-assembled from every base doc's TL;DR (`languages/base/AI.md` + `templates.rs::tldr_index`). - The `cycle` focus prompt now mirrors the ADP prompt (cycle is ADP's metric lens) — `synth_metric_principle` borrows the ADP principle's framing. Goldens regenerated (snapshot key + remediation strings). New unit tests cover the cycle→ADP borrow, `--doc cycle`→ADP, the AI index expansion and filename fallback. `make all` + `make e2e` green.

…to 4.0.0-alpha.1 Three independent format versions, each its own constant, bumped on its own criterion (documented in docs/versions.md): - **app** — Cargo `[workspace.package] version` → 4.0.0-alpha.1. - **config + CLI** — `code_ranker_graph::version::CONFIG_VERSION` ("4.0"). - **JSON snapshot + viewer** — `…::version::SCHEMA_VERSION` ("4.0", was "3"). config: - `code-ranker.toml` now MUST declare a matching `version`; `config::load` rejects a missing/mismatched value with a directional hint (older → migrate, newer → upgrade) instead of a cryptic `unknown field`. `CONFIG_SCHEMA_VERSION` aliases `CONFIG_VERSION`. Added `version = "4.0"` to the root config + 9 samples. json + viewer: - snapshot `schema_version` synced to `SCHEMA_VERSION`; the renderer injects `window.SCHEMA_VERSION` and the viewer rejects an incompatible swapped-in snapshot (snap-controls). The number lives only at its constant — config/e2e fixtures `format!` it from the constant, never hardcode it. Goldens regenerated (`schema_version`). docs: new docs/versions.md (the 3 surfaces, where each constant lives, when/how to bump, branch discipline); config.md documents the required `version`; PRD/DESIGN/ node_schema/e2e schema_version refs synced to 4.0. The /update-docs checklist gains a format-compatibility/version-bump step (moved early, Step 3). make all + make e2e green.

The product PRD should carry no on-disk-format or implementation detail: - §5.1 Snapshot File Format: drop the JSON example + field-by-field spec; keep a business note pointing at node_schema.md / DESIGN.md for the shape. - §7.3 Graph JSON Schema: removed (the JSON contract lives in node_schema.md). - §7.2 Plugin Model: replaced the trait / inventory / metric-pipeline internals with a business description (supported languages, --plugin, in-process/offline, no external loading); the mechanism is in DESIGN.md. - Repointed the node_schema.md cross-refs that named the removed PRD §7.3. JSON-shape and internals now live solely in node_schema.md / DESIGN.md.

Replace the maintenance-only `docs` subcommand (and its GitHub Pages corpus step) with a project-facing `ai` command, and stop tests littering .code-ranker/. - remove `docs` subcommand + `templates::build_corpus` + the pages.yml corpus-compose step; the corpus stays binary-embedded (`--doc <ID>`), no longer served over a URL. The HTML report publish in pages.yml stays. - add `ai`: prints the offline AI-agent playbook (`base/AI.md`), no analysis, always exits 0. Resolves the language plugin only to pick output — full playbook + principle/metric catalog when resolved; a brief intro + a dynamic 'Select a language' section (template authored in AI.md, values filled by `ai::fill_select`) when none resolves, with the catalog withheld. - AI.md: expanded product description + a Commands section (check/report/ai/help). - e2e: run report-emitting helpers from a temp cwd so the default json+html pair (always written, since the built-in [output.*] paths are set) no longer lands in the repo's own .code-ranker/. - docs: CLI.md/PRD/DESIGN/ai-skill/templates.md synced. No version bump: the removed/added CLI surface rides this branch's existing CONFIG_VERSION 4.0 (already a major vs main 3.0.2).

The diff-coverage gate flagged 7 added lines with no test: - report.rs `--output.prompt --focus <metric>|<principle>`: two e2e tests for the metric-lens and principle arms of the prompt builder. - templates.rs: extract `catalog_entry` (no-summary arm unreachable via the real corpus) and a shared `with_trailing_newline` helper (DRY with the ai and report `--doc` stdout paths); unit-test both arms. diff-coverage vs origin/main: every added/changed Rust line covered.

When a cycle's only back-edges to the module root are `pub(in crate::…::<ancestor>)` visibility paths (not `use` statements), the smallest fix is to tighten the visibility (`pub(super)`/`pub(crate)`): Code Ranker models the visibility path as a dependency edge, so narrowing it removes the edge with no restructure. Guarded on every consumer already living in the narrower scope (else extract/invert) so cheaper models don't narrow blindly and silence the metric. Surfaced by the prompt-eval reference runs.

- Move the prompt-self-improvement playbook docs/ -> contrib/ and rework it into a closed, self-improving loop: top-level goal, three objectives (quality/cost/clarity), a meta-loop that edits the playbook itself, a metrics.csv schema + scoring rubric, the full run procedure (incl. collecting the agent's own writes into the run dir), and a one-mechanism-per-sweep note. - Add contrib/prompt-eval-metrics.py: extracts a run's objective metrics from its artifacts (chat.jsonl, before/after.json, branch git diff) and appends a row to prompt-eval/metrics.csv; judged columns via flags; format-aware token extraction; --dry-run.

…n TODO notes Prompt-eval branches in PROJECT are flat and recur across builds, so the run id <model>-<focus>-<n> collides between prompt versions. Suffix the branch with the code-ranker commit hash -> <model>-<focus>-<n>-<CR_SHA>, unique per prompt version and tied to the run's build dir. The collector now defaults --branch to <run>-<cr_sha> (cr_sha derived from the run path), so loc/files come from the right branch without passing it; same-commit re-runs bump <n>. Also add rust/todo.md: research notes on the Rust plugin's unsafe detection, the two extension paths (numeric counter vs CEL fact-rule), candidate anti-patterns, and the opt-in per-function analysis level. Reference only; not yet acted on.

- meta-loop: a user correction is the strongest signal; fix THIS file before continuing, not after the sweep (the file not changing IS the bug) - algorithm: drive the loop to convergence; a lever edit is unfinished without its verifying re-run; don't pause for permission between iterations - tuning rule: diagnose from the transcript by hand, not from aggregate counts - launching: sub-agent cwd = code-ranker repo makes it 'cargo run' instead of the PATH binary, inflating cost columns

…ng source doc_note read as a soft mid-prompt aside; cheaper tiers explored the code first and re-derived the mechanism the doc already states (verbatim, with a worked example for this very project). Reframe as a directive first step that promises the payoff (cause + smallest fix + worked example). Hypothesis (prompt-eval): cuts pre-edit exploration on the next sonnet-ADP sweep — first_edit_turn / commands / tool_calls / output_tokens drop vs the f6ddc88 sonnet-ADP-1 baseline. Verify-or-revert.

… of base A lever for a cheaper tier's missing remedy must not leak language-specifics (Rust pub(in ...)) into languages/base/AI.md or the neutral defaults.toml; the base lever stays generic and points at the per-language doc.

…fix-loop step Haiku skipped the deep doc (read_doc_focus=0) despite used_generated_prompt=1 and a front-loaded doc_note — the base AI.md fix loop had no doc-read step, so the agent treated it as optional and reached for a heavy crate-level extraction (ADP 16->9 only, new_cycle, -15 tests). Add it as step 3 (not optional), language-neutral; the per-language cause/remedy lives in languages/<lang>/<ID>.md. Hypothesis (prompt-eval): haiku-ADP-2 read_doc_focus 0->1, fix shape -> in-place visibility narrowing, focus_delta -7->-11, new_cycles 1->0, tests preserved.

…efore extract/move haiku-ADP-2 read --doc ADP and produced a correct fix, but over-extracted (moved transformers.rs to a leaf, 11 files) where in-place pub(in...)->pub(super) narrowing (opus/sonnet, +5/-5) was the minimal fix. The visibility remedy was present but sat before prominent extract-to-leaf shapes; cheaper tiers picked the heavier pattern. Reframe as an ordered procedure: find back-edges -> narrow visibility paths first -> extract/split only for genuine use back-edges. Hypothesis (prompt-eval): haiku-ADP-3 keeps quality (cycle gone, tests intact) but shrinks the fix to in-place narrowing -> loc/files near opus/sonnet, quality 4->5.

A cycle whose membership only shrank (subset of a pre-existing cycle the fix partially cleared) registers as new_cycles; it nearly mis-scored haiku-ADP-3 (dc06762) down despite a clean minimal fix. Diff cycle node-sets before scoring.

Branch = <ts>_<CR_SHA> (e.g. 20260623T1849Z_dc06762), matching the evidence folder name 1:1; UTC <ts> makes each run unique (no bump <n>). One run per build dir; pass --branch to the collector (no longer <run>-<CR_SHA>).

…branches A <ts>_<CR_SHA> branch carries no ticket, so a project prepare-commit-msg hook rejects the commit; --no-verify does not skip prepare-commit-msg. Prefix eval commits with a pseudo-ticket (PROMPTEVAL-N:).

Run 'cfs init' (engine v1.5.9, kit sdlc v1.2.1): adds root AGENTS.md/CLAUDE.md navigation blocks and gitignore entries for cf-* agent integrations. The .cf-studio/ dir stays gitignored (pre-existing rule).

cfs validate-toc flagged 15 headings (h3 subsections) missing from their TOCs. Regenerated with 'cfs toc'; additive only (+15 entries), now all TOCs validate.

…etrics) from_snapshots was hardcoded to the cycle metric, so an HK/sloc/cognitive run reported ADP cycle counts in focus_*/worst_*/new_cycles. Now: cycle FOCUS keeps the SCC logic; a metric FOCUS reads the metric off the module nodes -> worst_* = worst module's value (the --top 1 target), focus_* = project-wide sum (flat total beside a dropped worst = relocated, not dissolved), new_cycles blank. Direction from node_attributes schema (higher_better -> worst is min). read_doc_focus alias set is now focus-aware too. Playbook column docs updated to match. Exposed by the first HK run (opus max): worst HK 390825->140697, total 603969->353313.

…it BY ROLE Reading 3 HK runs' real diffs: opus/sonnet narrowed pub(in) visibility (root cause, 1 line); haiku followed the doc's only stated remedy ('prefer the split') and did a mechanical facade split (trait away from impl) that shaved sloc and even widened field visibility -- HK number down, coupling not separated. Fix: (1) note pub(in <ancestor>) inflates fan_in artificially (consistent with ADP.md); (2) state the highest-value HK fix is splitting a multi-ROLE hub by responsibility (cuts real fan_in AND fan_out), NOT shaving sloc by moving a decl from its impl. Ordered: narrow artificial fan-in -> split by role -> never mechanical-split a cohesive role. Hypothesis (prompt-eval): haiku-HK-2 switches from facade split to visibility narrowing / role split -> q4->q5, footprint collapses. Verify-or-revert.

doc_rel_path hardcoded metric docs to base/<doc>.md, so a per-language metric doc (languages/rust/HK.md) was embedded but NEVER served: --doc HK always returned the neutral base doc, while --doc ADP (a principle) correctly composed rust/ADP.md. Metric docs now reuse the override the plugin already applied to its principle docs (override_lang, inferred from principle doc_urls): serve <lang>/<doc>.md when present, else base/. Verified: --doc HK now composes rust/HK.md; --doc ADP unchanged; base fallback clean; 145 unit tests pass.

The b32aee6 scaffolding change (doc_note front-load) is embedded verbatim in each report's prompt block, which the e2e goldens compare exactly — so all 9 went stale. Regenerated per docs/e2e.md (report + frozen header). Also re-freezes the rust golden's header, which had been committed with real machine values (branch/paths), and syncs versions 3.0.0-alpha.1 -> 4.0.0-alpha.1. Full workspace test suite green.

haiku-HK on cyberfabric-core's gear.rs (HK 149M) read the doc, picked 'split by role', but split the hub by its own internal seams (one file per trait impl) -> shaved sloc 761->345, fan_in ROSE 9->13, HK only -34%, gear still #1. opus/sonnet instead ran the audiences check, found 8-9 consumers all reached in for one wrong-audience type (AppServices), moved it to a leaf -> fan_in ->1/0, HK ~-100%. Reframe 'Remedies, in order' as audiences-FIRST: trace what each fan-in consumer imports; the answer's shape picks the remedy (same item for many -> move to leaf; visibility path -> narrow; different parts -> split by role). Add an explicit trap: splitting a hub by its internal seams shaves sloc without cutting coupling. Hypothesis (prompt-eval): haiku-HK-2 on gear.rs runs the audiences check and does the wrong-audience extraction (gear.rs HK ~-100%, q2->q5) instead of the capability split.

…t step" This reverts commit 919de6c.

If the agent reads the lever and still doesn't do the named step (and a stronger model on the same prompt does), the gap is model capability, not the prompt: revert the lever (failed hypothesis), record a capability ceiling, stop -- don't burn the 3rd iteration. Observed on cyberfabric-core gear.rs HK (haiku, two reverted levers).

From the cyberfabric-core cycle sweep (20 cycles -> 0, 28 Haiku passes): - step 5 / tests_pass: per-crate green is NOT enough on a multi-crate workspace; gate on cargo check --workspace AND cargo test --workspace --no-run (feature- unified + cfg(test) paths break while the touched crate stays green). Also watch the test COUNT (a fix can drop tests and leave survivors green). - new 'Sweeps' subsection: track net progress (stall on 2 no-net-decrease passes; cap iters), gate compilation at workspace scope between passes / checkpoint, measure from artifacts not the agent's summary, and the cheap-tier-converges-on- cycles-but-not-HK-hubs finding.

- header brand now reads 'code-ranker' (not upper-cased); the exact tool version (snapshot.versions["code-ranker"]) appears small, centred, under the brand on header hover (injected by lib.rs for the __CR_VERSION__ placeholder) - the 'stat' button now toggles the diff-summary popup (second click closes) and carries an .active highlight while open docs: README lists supported languages (Rust stable, the other eight beta); viewer DESIGN/PRD updated for the brand/version + stat toggle.

github-actions · 2026-06-24T12:57:46Z

code-ranker

Verdict vs main: ❔ unknown

No new violations vs the baseline. 🎉

📦 Full HTML report: see the code-ranker-report artifact on this run.

codecov · 2026-06-24T12:58:02Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.79%. Comparing base (c2fb6ac) to head (2901617).

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #17      +/-   ##
==========================================
+ Coverage   97.68%   97.79%   +0.10%     
==========================================
  Files         121      123       +2     
  Lines       13971    14319     +348     
==========================================
+ Hits        13648    14003     +355     
+ Misses        323      316       -7

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Add a doc_rel_path test for the metric-doc <lang>/ override branch (566fb23): a rust-routing principle + an hk metric whose rust/HK.md exists now asserts the override is served (was only exercised on the base/ fallback path). Also fold the inner early-return into an Option chain (map -> filter -> unwrap_or_else) so the closing brace after the return is gone — llvm-cov flagged it as an uncoverable region. Behaviour unchanged; diff-coverage vs origin/main is now clean.

README: top-row badges for the live code-ranker report, the project website (code-ranker.com) and the GitHub App install; a CI-integration note framing the App as the zero-config per-PR-report option. GitHub Actions Rust guide gets a matching 'prefer zero config -> install the App' cross-reference. Docs-only; no code or format change.

`--config levels.functions=true` was rejected ("unknown config key") — the inline KEY=VALUE dispatcher had no arm for it, so the functions level could only be enabled via a TOML file. Add the arm (parse_on_off), extend the inline-keys test, and note the inline form in config.md.

Bump 4.0.0-alpha.1 -> 4.0.0 (make bump VERSION=4.0.0): workspace Cargo.toml/Cargo.lock and doc version refs. Workspace compiles clean.

Drop the leftover 4.0.0-alpha.1 strings the bump didn't touch: the 9 e2e golden snapshots' embedded code-ranker version, the docs/versions.md examples, and the prompting-self-improve example rows. README status updated from 'pre-alpha' to 4.0.0 (Rust production-ready, other languages beta). e2e + markdownlint green.

ffedoroff added 29 commits June 22, 2026 23:23

docs: add prompt self-improvement loop playbook

6bc1468

chore(studio): initialize Constructor Studio (cfs init + sdlc kit)

d655564

Run 'cfs init' (engine v1.5.9, kit sdlc v1.2.1): adds root AGENTS.md/CLAUDE.md navigation blocks and gitignore entries for cf-* agent integrations. The .cf-studio/ dir stays gitignored (pre-existing rule).

docs(toc): sync tables of contents via cfs toc

d848b28

cfs validate-toc flagged 15 headings (h3 subsections) missing from their TOCs. Regenerated with 'cfs toc'; additive only (+15 entries), now all TOCs validate.

Revert "doc(rust HK): make the audiences diagnosis the mandatory firs…

7a84339

…t step" This reverts commit 919de6c.

ffedoroff added 6 commits June 24, 2026 16:18

style(cli/doc): rustfmt the new doc_rel_path test

b4c9fc2

chore(release): v4.0.0

eb3b0b9

Bump 4.0.0-alpha.1 -> 4.0.0 (make bump VERSION=4.0.0): workspace Cargo.toml/Cargo.lock and doc version refs. Workspace compiles clean.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prompt-eval self-improve loop, ADP/HK doc remedies, and viewer UX polish#17

Prompt-eval self-improve loop, ADP/HK doc remedies, and viewer UX polish#17
ffedoroff wants to merge 35 commits into
mainfrom
refactor/principle-focus-terminology

ffedoroff commented Jun 24, 2026

Uh oh!

github-actions Bot commented Jun 24, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 24, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ffedoroff commented Jun 24, 2026

Prompt-eval (self-improving fix-prompt loop)

Rust ADP/HK doc remedies

CLI / docs

Viewer UX

Docs / README

Uh oh!

github-actions Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

code-ranker

Uh oh!

codecov Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Jun 24, 2026 •

edited

Loading

codecov Bot commented Jun 24, 2026 •

edited

Loading