feat: orc report — render traces as HTML artifacts #10
Merged
A trace is only defensible if a reviewer can read it without
installing orc. `orc report RUN_ID... [-o PATH] [--open]` renders
one or more run traces into a single self-contained HTML file —
CSS and JS inlined from packaged copies of site/trace.{css,js} so
the artifact matches the public site's design and survives email,
archival, and air-gapped review with zero external requests.
Every trace-derived string is html.escape()d: evidence text is
untrusted corpus content and must not become markup in the report.
Sparse traces (failed runs, non-verify skills) render rather than
crash, and unknown verdict labels fall back to the neutral "nf"
pill. Wheel build verified to include the assets via the existing
hatch packages config.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
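As a rough illustration of the escaping and fallback rules in this commit message, the sketch below escapes untrusted evidence text with `html.escape` and maps unknown verdict labels to the neutral "nf" pill. The function names, the trace dict shape, the CSS class names, and the `KNOWN_VERDICTS` set are assumptions for illustration, not the actual contents of `trace_html.py`.

```python
import html

# Hypothetical label set: the commit message only names the neutral "nf" fallback.
KNOWN_VERDICTS = {"supported", "refuted", "nf"}


def render_pill(verdict):
    """Return a verdict pill; unknown or missing labels fall back to "nf"."""
    known = isinstance(verdict, str) and verdict in KNOWN_VERDICTS
    label = verdict if known else "nf"
    return f'<span class="pill pill-{label}">{html.escape(label)}</span>'


def render_evidence(trace):
    """Render evidence chunks from a possibly sparse trace dict (assumed shape)."""
    chunks = trace.get("evidence") or []  # failed runs may carry no evidence
    items = "".join(
        f"<li>{html.escape(str(chunk.get('text', '')))}</li>" for chunk in chunks
    )
    return f"<ul>{items}</ul>"
```

A sparse trace such as `{}` renders an empty list instead of raising, and markup inside evidence text comes out as entities rather than tags.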
A buyer who reads "verification runtime" can over-read the guarantee. Name the three failure modes and their coverage explicitly: hallucinated citations are caught reliably (structural filter + downgrade), unsupported claims are caught partially (LLM-judge limits; F1 0.864 is the measured rate), and faithful-but-wrong corpus content is not caught at all. That last mode is mitigated by corpus provenance and freshness controls, not by the gate.
The framing sentence (orc guarantees "every claim is traceable to a cited source," not "every claim is true") lands in the README next to the invariants table, in the EU AI Act doc's "What Orc is NOT" list (tied to the deployer's Article 10 duties), and in the competitive doc's honest-gaps section (post-hoc judges share the same ceiling).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Real traces carry unbreakable tokens the mockup never had — URLs,
DOIs, uppercase file paths, 26-char run ids. They push the centered
grid's min-content width past the viewport, and margin:0 auto
centering clips the overflow off the LEFT edge where no scrollbar can
reach it. A documented override block (the copied asset stays
verbatim) adds overflow-wrap:anywhere to the affected text surfaces
and caps both grid columns with minmax(0,...). Topbar and <title> now
summarize multi-run reports ("13 runs <first> +12") instead of
dumping every id.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
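The multi-run summary described above could be produced along these lines. This is a minimal sketch: the function name is hypothetical, and the output format is inferred only from the one example in the commit message ("13 runs <first> +12"); the empty and single-run cases are assumptions.

```python
def summarize_runs(run_ids):
    """Summarize run ids for the topbar and <title> instead of listing them all."""
    if not run_ids:
        return "no runs"  # assumed placeholder; the source does not cover this case
    if len(run_ids) == 1:
        return run_ids[0]  # assumed: a single run is shown by its id
    # e.g. 13 ids -> "13 runs <first> +12"
    return f"{len(run_ids)} runs {run_ids[0]} +{len(run_ids) - 1}"
```

The result is still trace-derived text, so a caller would need to escape it before embedding it in the page.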
build_report_html escapes every trace field into the HTML, but trace.js read the claim title back via .textContent (which decodes those entities) and re-injected it through innerHTML when building the ledger and the verdict pill, re-opening exactly the injection the server-side escaping closed. A claim of `<img src=x onerror=...>` executed arbitrary JS when the report was opened; since reports are meant to be emailed/filed as trustworthy compliance artifacts and claim text is attacker-influenceable (verified web/corpus content), this was a real shipping defect.
The ledger row and pill are now built with createElement + textContent only. A test pins the contract: no innerHTML in trace.js may carry trace text (bare container clears excepted). Verified end-to-end in a real browser: the payload now renders as inert text.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
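One way such a contract test might look is sketched below: scan the JS source for `innerHTML` writes and allow only bare clears with an empty string literal. The helper name and the regex are assumptions, not the test that shipped in this PR, and a pattern match like this is a coarse guard rather than a JavaScript parser.

```python
import re

# Matches `.innerHTML = <rhs>` and `.innerHTML += <rhs>`, but not `==` comparisons.
INNER_HTML_WRITE = re.compile(r"\.innerHTML\s*\+?=(?!=)\s*([^;\n]*)")
EMPTY_LITERALS = {'""', "''", "``"}


def unsafe_inner_html_writes(js_source):
    """Return right-hand sides of innerHTML writes that are not bare clears."""
    return [
        rhs.strip()
        for rhs in INNER_HTML_WRITE.findall(js_source)
        if rhs.strip() not in EMPTY_LITERALS
    ]
```

A test could then read the packaged `trace.js` and assert that the returned list is empty.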
The CHANGELOG claimed "[0.2.0] First PyPI release" but orc-ai was never published, tagged, or released: a falsifiable release claim in a project whose pitch is "every claim is traceable." Mark 0.2.0 unreleased and describe the publish trigger accurately. Add the shipped-but-unreleased wave-3 work (hybrid retrieval, orc propose, orc report) to an Added section instead of leaving hybrid retrieval under Planned in the same tree that implements it. Refresh the README roadmap (hybrid retrieval is shipped opt-in) and roadmap.md's code-state line (v0.1.4 -> v0.2.0).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Context
The product's proof artifact existed only as a hand-built mockup in `site/` while real runs produced JSON. `orc report` closes that gap: any trace (or set of traces) renders into the designed, self-contained HTML artifact, with verdict pills, cited evidence chunks, ledger, token usage, and replay lineage.
Stacked on #8 (propose CLI): merge that first; this retargets automatically.
Changes
- `src/orc/rendering/trace_html.py`: f-string assembly, no new deps; `site/trace.css` / `trace.js` copied verbatim as package data (wheel inclusion verified via `uv build` + unzip).
- `orc report RUN_ID... [-o out.html] [--open]`: multi-run reports supported (one claim article per trace).
Testing
16 new tests, all RED-first; full suite 324 passed, ruff clean; wheel asset check PASS.
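The wheel asset check mentioned above could be scripted roughly as follows. This is a sketch under assumptions: the helper name is hypothetical, and it matches assets by file name only, because the in-wheel directory of `trace.css` and `trace.js` is not stated here.

```python
import zipfile


def missing_wheel_assets(wheel, required=("trace.css", "trace.js")):
    """Return required asset file names that are absent from the wheel archive.

    `wheel` may be a path or a binary file-like object; a wheel is a zip file.
    """
    with zipfile.ZipFile(wheel) as archive:
        basenames = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return [asset for asset in required if asset not in basenames]
```

Run against the wheel produced by `uv build`, an empty list would correspond to the check passing.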
🤖 Generated with Claude Code