feat: orc report — render traces as HTML artifacts by Thormatt · Pull Request #10 · Thormatt/orc

Thormatt · 2026-06-12T16:56:04Z

Context

The product's proof artifact existed only as a hand-built mockup in site/ while real runs produced JSON. orc report closes that gap: any trace (or set of traces) renders into the designed, self-contained HTML artifact — verdict pills, cited evidence chunks, ledger, token usage, replay lineage.

Stacked on #8 (propose CLI) — merge that first; this retargets automatically.

Changes

src/orc/rendering/trace_html.py: f-string assembly, no new deps; site/trace.css and site/trace.js copied verbatim as package data (wheel inclusion verified via uv build + unzip).
orc report RUN_ID... [-o out.html] [--open] — multi-run reports supported (one claim article per trace).
Every trace-derived string is HTML-escaped — evidence text is untrusted corpus content; script-injection pinned by test.
Docs commit: the verification gate's coverage ceiling (hallucinated citations: caught reliably / unsupported claims: caught partially / faithful-but-wrong corpus: not caught — provenance controls) in README, EU AI Act doc, and competitive positioning. Framing: "every claim is traceable to a cited source," not "every claim is true."

Testing

16 new tests, all RED-first; full suite 324 passed, ruff clean; wheel asset check PASS.
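
The wheel asset check can also be scripted rather than done by hand with unzip. The sketch below is an assumption about how such a check might look, not the check this PR runs: `missing_assets` is a hypothetical helper, and the asset names are taken from the description above.

```python
import zipfile


def missing_assets(wheel, required):
    """Return the required asset names that no archive member ends with.

    `wheel` may be a path or a binary file-like object, as accepted by
    zipfile.ZipFile. A wheel is a zip archive, so no extra tooling is needed.
    """
    with zipfile.ZipFile(wheel) as archive:
        names = archive.namelist()
    return [
        asset
        for asset in required
        # Match on a path suffix so the package directory layout is not assumed.
        if not any(name.endswith("/" + asset) for name in names)
    ]
```

Calling it on the built wheel with `["trace.css", "trace.js"]` should return an empty list when both assets are packaged.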

🤖 Generated with Claude Code

A trace is only defensible if a reviewer can read it without installing orc. `orc report RUN_ID... [-o PATH] [--open]` renders one or more run traces into a single self-contained HTML file, with CSS and JS inlined from packaged copies of site/trace.{css,js}, so the artifact matches the public site's design and survives email, archival, and air-gapped review with zero external requests. Every trace-derived string is html.escape()d: evidence text is untrusted corpus content and must not become markup in the report. Sparse traces (failed runs, non-verify skills) render rather than crash, and unknown verdict labels fall back to the neutral "nf" pill. Wheel build verified to include the assets via the existing hatch packages config.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
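
A minimal sketch of the escaping and fallback behaviour described in this commit, assuming f-string assembly as in trace_html.py. The function names, CSS class names, and the set of known verdict labels are hypothetical; only html.escape and the "nf" fallback come from the commit message.

```python
import html

# Hypothetical set of verdict labels that have a dedicated pill style.
# The commit only confirms that unknown labels fall back to "nf".
KNOWN_VERDICTS = {"supported", "refuted", "nf"}


def render_pill(verdict: str) -> str:
    """Return a verdict pill, falling back to the neutral "nf" pill."""
    label = verdict if verdict in KNOWN_VERDICTS else "nf"
    return f'<span class="pill pill-{label}">{html.escape(label)}</span>'


def render_evidence(text: str, source: str) -> str:
    """Render one evidence chunk with every trace-derived string escaped."""
    return (
        '<blockquote class="evidence">'
        f"<p>{html.escape(text)}</p>"
        f"<cite>{html.escape(source)}</cite>"
        "</blockquote>"
    )
```

Because html.escape also escapes quotes by default, the same call is safe for values placed inside attribute values, not only element text.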

A buyer who reads "verification runtime" can over-read the guarantee. Name the three failure modes and their coverage explicitly: hallucinated citations are caught reliably (structural filter + downgrade), unsupported claims are caught partially (LLM-judge limits, F1 0.864 is the measured rate), and faithful-but-wrong corpus content is not caught at all; that case is mitigated by corpus provenance and freshness controls, not by the gate. The framing sentence (orc guarantees "every claim is traceable to a cited source," not "every claim is true") lands in the README next to the invariants table, in the EU AI Act doc's "What Orc is NOT" list (tied to the deployer's Article 10 duties), and in the competitive doc's honest-gaps section (post-hoc judges share the same ceiling).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Real traces carry unbreakable tokens the mockup never had: URLs, DOIs, uppercase file paths, 26-char run ids. They push the centered grid's min-content width past the viewport, and margin:0 auto centering clips the overflow off the LEFT edge where no scrollbar can reach it. A documented override block (the copied asset stays verbatim) adds overflow-wrap:anywhere to the affected text surfaces and caps both grid columns with minmax(0,...). Topbar and <title> now summarize multi-run reports ("13 runs <first> +12") instead of dumping every id.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
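
The multi-run summary format quoted above can be sketched as follows. `summarize_runs` is a hypothetical name, and the single-run and empty-list behaviour are assumptions; only the "13 runs <first> +12" shape comes from the commit message.

```python
def summarize_runs(run_ids: list[str]) -> str:
    """Summarize run ids for the topbar and <title>.

    Assumed behaviour: one run shows its id as-is; several runs show the
    count, the first id, and how many more follow.
    """
    if not run_ids:
        return "no runs"  # assumption: the commit does not cover this case
    if len(run_ids) == 1:
        return run_ids[0]
    return f"{len(run_ids)} runs {run_ids[0]} +{len(run_ids) - 1}"
```

The returned string is still trace-derived, so it would need the same html.escape treatment as every other field before it is written into the page.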

build_report_html escapes every trace field into the HTML, but trace.js read the claim title back via .textContent (which decodes those entities) and re-injected it through innerHTML when building the ledger and the verdict pill, re-opening exactly the injection the server-side escaping closed. A claim of `<img src=x onerror=...>` executed arbitrary JS when the report was opened; since reports are meant to be emailed/filed as trustworthy compliance artifacts and claim text is attacker-influenceable (verified web/corpus content), this was a real shipping defect. The ledger row and pill are now built with createElement + textContent only. A test pins the contract: no innerHTML in trace.js may carry trace text (bare container clears excepted). Verified end-to-end in a real browser: the payload now renders as inert text.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
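
One way a contract test like the one described might work is a static scan of the JS source. This is a sketch under assumptions, not the test in this PR: `unsafe_inner_html_writes` is a hypothetical helper, and because a regex cannot tell which values carry trace text, it is stricter than the stated contract and flags every innerHTML write that is not an empty-string clear.

```python
import re

# Matches `.innerHTML = <rhs>` or `.innerHTML += <rhs>` up to the end of the
# statement. The lookahead skips `==` / `===` comparisons.
_INNER_HTML = re.compile(r"\.innerHTML\s*(\+?=)(?!=)\s*([^;\n]*)")
_EMPTY_LITERALS = {'""', "''", "``"}


def unsafe_inner_html_writes(js_source: str) -> list[str]:
    """Return innerHTML writes that are not bare container clears."""
    offenders = []
    for match in _INNER_HTML.finditer(js_source):
        operator, rhs = match.group(1), match.group(2).strip()
        if operator == "=" and rhs in _EMPTY_LITERALS:
            continue  # bare container clear, e.g. el.innerHTML = "";
        offenders.append(match.group(0).strip())
    return offenders
```

A test would read the packaged trace.js and assert that the returned list is empty, so any future innerHTML write fails the suite until it is reviewed.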

The CHANGELOG claimed "[0.2.0] First PyPI release" but orc-ai was never published, tagged, or released, which is a falsifiable release claim in a project whose pitch is "every claim is traceable." Mark 0.2.0 unreleased and describe the publish trigger accurately. Add the shipped-but-unreleased wave-3 work (hybrid retrieval, orc propose, orc report) to an Added section instead of leaving hybrid retrieval under Planned in the same tree that implements it. Refresh the README roadmap (hybrid retrieval is shipped opt-in) and roadmap.md's code-state line (v0.1.4 -> v0.2.0).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Thormatt and others added 6 commits June 12, 2026 12:52

docs: list orc report in the commands reference

8feec27

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Thormatt changed the base branch from feat/effects-propose-cli to main June 12, 2026 17:46

Thormatt merged commit 996152a into main Jun 12, 2026
3 checks passed

Thormatt mentioned this pull request Jun 12, 2026

fix(site): close DOM-XSS in the deployed trace.js #12

Merged

Thormatt merged 6 commits into main from feat/trace-report
