feat: orc report — render traces as HTML artifacts#10

Merged: Thormatt merged 6 commits into main from feat/trace-report on Jun 12, 2026

@Thormatt

Owner

Context

The product's proof artifact existed only as a hand-built mockup in site/ while real runs produced JSON. orc report closes that gap: any trace (or set of traces) renders into the designed, self-contained HTML artifact — verdict pills, cited evidence chunks, ledger, token usage, replay lineage.

Stacked on #8 (propose CLI) — merge that first; this retargets automatically.

Changes

  • src/orc/rendering/trace_html.py: f-string assembly, no new deps; site/trace.css and site/trace.js copied verbatim as package data (wheel inclusion verified via uv build + unzip).
  • orc report RUN_ID... [-o out.html] [--open] — multi-run reports supported (one claim article per trace).
  • Every trace-derived string is HTML-escaped — evidence text is untrusted corpus content; script-injection pinned by test.
  • Docs commit: the verification gate's coverage ceiling (hallucinated citations: caught reliably / unsupported claims: caught partially / faithful-but-wrong corpus: not caught — provenance controls) in README, EU AI Act doc, and competitive positioning. Framing: "every claim is traceable to a cited source," not "every claim is true."
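
The escaping and inlining described in the list above can be sketched roughly as follows. This is a minimal illustration, not the contents of `trace_html.py`: the function names `render_claim` and `render_page` and the simplified claim/evidence shape are assumptions made for the example.

```python
import html


def render_claim(claim: str, evidence: list[str]) -> str:
    """Render one claim article (hypothetical shape); every trace-derived string is escaped."""
    items = "".join(f"<li>{html.escape(chunk)}</li>" for chunk in evidence)
    return f"<article><h2>{html.escape(claim)}</h2><ul>{items}</ul></article>"


def render_page(articles: list[str], css: str, js: str) -> str:
    """Assemble a self-contained page: CSS and JS are inlined, so no external requests."""
    body = "\n".join(articles)
    return (
        "<!doctype html>\n"
        f'<html><head><meta charset="utf-8"><style>{css}</style></head>\n'
        f"<body>{body}<script>{js}</script></body></html>"
    )
```

Only the trace-derived strings pass through `html.escape`; the CSS and JS are trusted package assets and are inserted as-is.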

Testing

16 new tests, all RED-first; full suite 324 passed, ruff clean; wheel asset check PASS.

🤖 Generated with Claude Code

Thormatt and others added 6 commits June 12, 2026 12:52
A trace is only defensible if a reviewer can read it without
installing orc. `orc report RUN_ID... [-o PATH] [--open]` renders
one or more run traces into a single self-contained HTML file —
CSS and JS inlined from packaged copies of site/trace.{css,js} so
the artifact matches the public site's design and survives email,
archival, and air-gapped review with zero external requests.

Every trace-derived string is html.escape()d: evidence text is
untrusted corpus content and must not become markup in the report.
Sparse traces (failed runs, non-verify skills) render rather than
crash, and unknown verdict labels fall back to the neutral "nf"
pill. Wheel build verified to include the assets via the existing
hatch packages config.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
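
A rough sketch of the sparse-trace and unknown-verdict behavior this commit describes. Only the neutral "nf" fallback comes from the commit message; the `verdict` key, the `KNOWN_PILLS` mapping, its labels, and the CSS class names are hypothetical stand-ins.

```python
import html

# Hypothetical label-to-class mapping; the real labels are not shown in this PR.
KNOWN_PILLS = {"supported": "ok", "refuted": "bad"}


def verdict_pill(trace: dict) -> str:
    """Return pill markup; missing or unknown verdicts use the neutral "nf" pill."""
    label = trace.get("verdict")  # sparse traces may lack this key entirely
    css_class = KNOWN_PILLS.get(label, "nf") if isinstance(label, str) else "nf"
    text = html.escape(str(label)) if label is not None else "no verdict"
    return f'<span class="pill {css_class}">{text}</span>'
```

Reading with `dict.get` and a default class means a failed run or a non-verify skill still produces markup instead of raising `KeyError`.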
A buyer who reads "verification runtime" can over-read the
guarantee. Name the three failure modes and their coverage
explicitly: hallucinated citations are caught reliably (structural
filter + downgrade), unsupported claims are caught partially
(LLM-judge limits; the measured F1 is 0.864), and
faithful-but-wrong corpus content is not caught at all — mitigated
by corpus provenance and freshness controls, not by the gate.

The framing sentence — orc guarantees "every claim is traceable to
a cited source," not "every claim is true" — lands in the README
next to the invariants table, in the EU AI Act doc's "What Orc is
NOT" list (tied to the deployer's Article 10 duties), and in the
competitive doc's honest-gaps section (post-hoc judges share the
same ceiling).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Real traces carry unbreakable tokens the mockup never had — URLs,
DOIs, uppercase file paths, 26-char run ids. They push the centered
grid's min-content width past the viewport, and margin:0 auto
centering clips the overflow off the LEFT edge where no scrollbar can
reach it. A documented override block (the copied asset stays
verbatim) adds overflow-wrap:anywhere to the affected text surfaces
and caps both grid columns with minmax(0,...). Topbar and <title> now
summarize multi-run reports ("13 runs <first> +12") instead of
dumping every id.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
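
The multi-run summary could look something like this. The "13 runs <first> +12" shape is taken from the commit message; the function name `summarize_runs`, the single-run behavior, and the empty-list text are assumptions.

```python
def summarize_runs(run_ids: list[str]) -> str:
    """Summarize run ids for the topbar and <title> instead of listing every id."""
    if not run_ids:
        return "no runs"  # assumption: the commit does not say what an empty report shows
    first = run_ids[0]
    if len(run_ids) == 1:
        return first  # assumption: a single run is shown by its id alone
    return f"{len(run_ids)} runs {first} +{len(run_ids) - 1}"
```

The returned string is still trace-derived, so it would need the same HTML escaping as every other field before it is placed in the page.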
build_report_html escapes every trace field into the HTML, but
trace.js read the claim title back via .textContent (which decodes
those entities) and re-injected it through innerHTML when building the
ledger and the verdict pill — re-opening exactly the injection the
server-side escaping closed. A claim of
`<img src=x onerror=...>` executed arbitrary JS when the report was
opened; since reports are meant to be emailed/filed as trustworthy
compliance artifacts and claim text is attacker-influenceable
(verified web/corpus content), this was a real shipping defect.

The ledger row and pill are now built with createElement + textContent
only. A test pins the contract: no innerHTML in trace.js may carry
trace text (bare container clears excepted). Verified end-to-end in a
real browser — the payload now renders as inert text.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
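
One way to pin a contract like this is a source scan, sketched below. It is an assumed approach, not the PR's actual test, and it is stricter than the stated rule: it flags every `innerHTML` assignment whose right-hand side is not an empty string literal, whether or not it carries trace text. The names `INNER_HTML_ASSIGN`, `BARE_CLEARS`, and `unsafe_innerhtml_uses` are hypothetical.

```python
import re

# Captures the right-hand side of `x.innerHTML = ...` or `+=`, up to ';' or end of line.
INNER_HTML_ASSIGN = re.compile(r"\.innerHTML\s*\+?=\s*([^;\n]*)")

# Empty string literals count as bare container clears and are allowed.
BARE_CLEARS = {"''", '""', "``"}


def unsafe_innerhtml_uses(js_source: str) -> list[str]:
    """Return right-hand sides of innerHTML assignments that are not bare clears."""
    return [
        rhs.strip()
        for rhs in INNER_HTML_ASSIGN.findall(js_source)
        if rhs.strip() not in BARE_CLEARS
    ]
```

A test would read the packaged `trace.js` and assert that this function returns an empty list. A regex scan is a coarse check and cannot replace the in-browser verification the commit mentions.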
The CHANGELOG claimed "[0.2.0] First PyPI release" but orc-ai was
never published, tagged, or released — a falsifiable release claim in
a project whose pitch is "every claim is traceable." Mark 0.2.0
unreleased and describe the publish trigger accurately. Add the
shipped-but-unreleased wave-3 work (hybrid retrieval, orc propose, orc
report) to an Added section instead of leaving hybrid retrieval under
Planned in the same tree that implements it. Refresh the README
roadmap (hybrid retrieval is shipped opt-in) and roadmap.md's code-state
line (v0.1.4 -> v0.2.0).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Thormatt changed the base branch from feat/effects-propose-cli to main on June 12, 2026 17:46
Thormatt merged commit 996152a into main on Jun 12, 2026
3 checks passed