Thormatt · Jun 12, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,20 +7,32 @@ Version numbers follow [SemVer](https://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
 
+### Added (not yet released)
+
+- **Hybrid retrieval** — opt-in BM25 + dense-vector retrieval fused with
+  Reciprocal Rank Fusion. Local `sentence-transformers` embedder by default
+  (no API key), pluggable `Embedder` protocol. `orc workspace create
+  --embeddings`, `orc workspace embed` backfill. BM25 stays the default.
+- **`orc propose`** — stage an allow-listed effect for human approval from the
+  CLI (the approval queue's producer surface); `orc approve list --json`.
+- **`orc report <run_id>...`** — render trace(s) into a self-contained HTML
+  artifact reusing the trace design language.
+
 ### Planned
 
 - `gads` directive (Google Ads agentic analysis: lens-based decomposition,
   read-only MCP integration, evidence-bound recommendation verification).
 - `orc eval consistency|perturb|retrieval|regression` reliability commands.
-- Voyage-AI or local-`sentence-transformers` embeddings + hybrid retrieval (RRF over BM25 + vector).
+- Voyage-AI / OpenAI embedding backends behind the existing `Embedder` protocol.
 - Hosted runtime (scheduled triggers, web dashboard, team workspaces).
 - Decomposition + arithmetic combined for DROP-shaped multi-step claims.
 
-## [0.2.0] — 2026-06-11
+## [0.2.0] — unreleased
 
-First PyPI release. The distribution is named **`orc-ai`** — `orc` is taken on
-PyPI by an unrelated project — but the import package (`import orc`) and the
-CLI command (`orc`) are unchanged.
+Packaged for PyPI as **`orc-ai`** (`orc` is taken by an unrelated project);
+the import package (`import orc`) and CLI command (`orc`) are unchanged. The
+release workflow publishes on a `v0.2.0` tag once the trusted publisher is
+configured — not yet tagged or published.
 
 ### Added
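
The hybrid retrieval entry in the changelog hunk above names Reciprocal Rank Fusion as the way BM25 and dense-vector rankings are combined. The following is a minimal sketch of the general technique, not Orc's implementation: the function name `rrf_fuse`, the chunk IDs, and the constant `k = 60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; an ID absent from a list contributes nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk IDs, best hit first in each list.
bm25_hits = ["c1", "c2", "c3"]
dense_hits = ["c3", "c1", "c4"]
fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)
```

Because only ranks enter the score, the two retrievers do not need comparable score scales, which is the usual reason to prefer RRF over a weighted sum of raw scores.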
 

diff --git a/README.md b/README.md
@@ -19,6 +19,16 @@ Bind every claim to evidence you own. Cite real sources only. Replay every decis
 | **Replay** | Every call writes a trace: retrieval set, every LLM call's tokens and cache hits, the structured output. LLM sampling is pinned to `temperature=0` and the corpus is pinned by version, so `orc replay <run_id>` re-issues the original decision against the same snapshot rather than a fresh sample (best-effort against residual model nondeterminism). |
 | **Approval** | Anything that would mutate the outside world is routed to an approval queue first. Skills can only *propose* a typed, schema-validated, allow-listed action; a **separate process** holding the write credentials — which the analysis plane never sees — carries out human-approved actions and records the result, either one-shot (`orc execute`) or via the auto-drain daemon (`orc worker`, with leasing + idempotency + retry/backoff). *(Hosted row-level authz per plane is Phase 3; see [docs/design/0001-isolated-write-paths.md](docs/design/0001-isolated-write-paths.md).)* |
 
+### What the gate does and does not catch
+
+Orc's guarantee is **"every claim is traceable to a cited source"** — not "every claim is true." Three failure modes, three different answers:
+
+| Failure mode | Coverage |
+|---|---|
+| **Hallucinated citations** — the model cites a chunk that doesn't exist | **Caught reliably.** Fabricated chunk IDs are filtered structurally before the verdict ships; a verdict left with no valid grounding is downgraded to `not_found`. |
+| **Unsupported claims** — the model says `supported` when the cited evidence doesn't actually back the claim | **Caught partially.** This is an LLM-judge decision, with LLM-judge limits — the faithfulness benchmark score (F1 0.864) is a measured accuracy level, not a guarantee. |
+| **Faithful-but-wrong** — the corpus itself is wrong, stale, or poisoned, and the claim cites it faithfully | **Not caught.** Orc verifies against your corpus, not against the world. Mitigate with corpus provenance and freshness controls: ingest only sources you trust (sha256 + source path are recorded automatically) and re-verify with `orc replay --live` after corpus updates. |
+
 Built for **research analysts, editorial teams, legal & compliance, agentic-workflow engineers** — anyone whose AI work product has to survive a second reviewer six months later.
 
 ## Quickstart
@@ -69,6 +79,7 @@ orc verify "<claim>" [-w <name>]       verify a single claim
 orc verify --file <path>               extract + verify every claim in a draft
 orc verify --url <url>                 same, from a URL
 orc research "<topic>" [-w <name>]     corpus-grounded synthesis with citations
+orc report <run_id>... [-o out.html]   render trace(s) as a shareable HTML report
 orc trace show <run_id>                full trace JSON
 orc trace list [-w <name>]             recent runs
 orc replay <run_id> [--live]           re-execute a recorded run
@@ -165,7 +176,7 @@ git clone https://github.com/Thormatt/orc.git
 cd orc
 uv sync --extra dev
 
-uv run pytest                           # 260+ tests, <5s
+uv run pytest                           # 360+ tests, <5s
 uv run ruff check src tests
 uv run orc --version
 ```
@@ -174,8 +185,8 @@ Live LLM tests are gated behind `ORC_TEST_ALLOW_LIVE_LLM=1` and require a real A
 
 ## Roadmap
 
-- Embedding-based retrieval (hybrid BM25 + vector via `sqlite-vec`)
 - OCR for scanned/image-only PDFs
+- Voyage/OpenAI embedding backends (the `Embedder` protocol is pluggable; local `sentence-transformers` hybrid retrieval shipped as opt-in)
 - Long-running directives (scheduled triggers, cloud execution)
 - `marketing` directive (assisted-only at first, autonomous behind approval gates later)
 - `legal` / `gads` / `code-review` directives — same runtime, new skill packages
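
The "hallucinated citations" row in the README table above says fabricated chunk IDs are filtered structurally and that a verdict left with no valid grounding is downgraded to `not_found`. A rough sketch of that rule follows, assuming a hypothetical `Verdict` shape and a hypothetical `filter_citations` helper; Orc's real types and field names are not shown in this diff.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Verdict:
    status: str                       # e.g. "supported" or "not_found"
    cited_chunk_ids: tuple[str, ...]  # IDs the model claims to cite


def filter_citations(verdict: Verdict, retrieved_ids: set[str]) -> Verdict:
    """Keep only citations that point at chunks that were actually retrieved."""
    valid = tuple(c for c in verdict.cited_chunk_ids if c in retrieved_ids)
    if not valid:
        # No real evidence left: the verdict cannot stand as "supported".
        return replace(verdict, status="not_found", cited_chunk_ids=())
    return replace(verdict, cited_chunk_ids=valid)


retrieved = {"c1", "c2"}
partly_real = filter_citations(Verdict("supported", ("c1", "zz")), retrieved)
all_fake = filter_citations(Verdict("supported", ("zz",)), retrieved)
print(partly_real, all_fake)
```

The check is a set-membership test, so it does not depend on model judgment. That is why this failure mode can be caught reliably while the other two rows cannot.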

diff --git a/docs/business/roadmap.md b/docs/business/roadmap.md
@@ -7,7 +7,7 @@ validated against real customer demand. Stage 0 is "land 3 pilots and learn
 what to charge for"; everything past Stage 1 will be revised based on what
 those pilots teach us.
 
-Last updated: 2026-05-19. Code state: v0.1.4 (F1 = 0.864 on a stratified
+Last updated: 2026-05-19. Code state: v0.2.0 — hybrid retrieval, PDF ingest, propose/report CLIs shipped (unreleased). Benchmark F1 = 0.864 on a stratified
 504-item HaluBench subsample — competitive with Lynx-70B's published
 home-court 0.85, not a same-set head-to-head; see
 [competitive.md](../positioning/competitive.md) for caveats).

diff --git a/docs/compliance/eu-ai-act.md b/docs/compliance/eu-ai-act.md
@@ -261,6 +261,17 @@ Honest framing matters here.
    obligations fall on Anthropic; Orc passes through whatever transparency
    information the upstream provider supplies.
 
+5. **Orc verifies traceability, not truth.** The guarantee is "every claim is
+   traceable to a cited source," not "every claim is true." Three failure
+   modes, three different coverages: hallucinated citations are caught
+   reliably (fabricated chunk IDs are structurally filtered, ungrounded
+   verdicts downgraded); unsupported claims are caught partially (an
+   LLM-judge decision, with LLM-judge error rates — see the faithfulness
+   benchmarks); faithful-but-wrong corpus content is not caught at all — if
+   the corpus is wrong, stale, or poisoned, a claim that cites it faithfully
+   will pass. The mitigation is the Article 10 data-governance work above:
+   corpus provenance, freshness, and review remain the deployer's obligation.
+
 ---
 
 ## Runbook for deployers
@@ -336,5 +347,5 @@ For procurement, conformity-assessment, or compliance-pilot inquiries:
 [thormatt@gmail.com](mailto:thormatt@gmail.com)
 
 Source: [github.com/Thormatt/orc](https://github.com/Thormatt/orc) · Last updated:
-2026-05-17. This document is part of the repository and is versioned with the
+2026-06-12. This document is part of the repository and is versioned with the
 runtime it describes.
diff --git a/docs/positioning/competitive.md b/docs/positioning/competitive.md
@@ -260,6 +260,14 @@ Honest gaps, kept current so prospects know what they're buying:
   will publish ours once the HHEM tokenizer-load issue is resolved.
 - **No multi-tenancy or team workspace primitives in 0.1.x.** Each
   workspace is owned by one filesystem.
+- **Truth of the corpus.** The runtime guarantee is "every claim is
+  traceable to a cited source," not "every claim is true." Hallucinated
+  citations are caught structurally; unsupported claims are caught at
+  LLM-judge accuracy (the F1 numbers above); faithful-but-wrong corpus
+  content — wrong, stale, or poisoned sources cited faithfully — is not
+  caught at all. Corpus provenance and freshness controls are the
+  mitigation. Post-hoc judges share the same ceiling: they score
+  consistency with the provided context, not the truth of the context.
 
 ---
 
@@ -291,4 +299,4 @@ Updates land via PR with the rationale captured in the commit message.
 The latest reproducible benchmark numbers always live in
 [`docs/benchmarks/`](../benchmarks/).
 
-Last updated: 2026-05-19 (Orc 0.1.4).
+Last updated: 2026-06-12 (Orc 0.2.0).
diff --git a/src/orc/cli.py b/src/orc/cli.py
@@ -8,6 +8,7 @@
 from orc.cli_commands import mcp as mcp_cmd
 from orc.cli_commands import propose as propose_cmd
 from orc.cli_commands import replay as replay_cmd
+from orc.cli_commands import report as report_cmd
 from orc.cli_commands import research as research_cmd
 from orc.cli_commands import search as search_cmd
 from orc.cli_commands import trace as trace_cmd
@@ -29,6 +30,7 @@ def main() -> None:
 main.add_command(research_cmd.research_command)
 main.add_command(trace_cmd.trace_group)
 main.add_command(replay_cmd.replay_command)
+main.add_command(report_cmd.report_command)
 main.add_command(approve_cmd.approve_group)
 main.add_command(propose_cmd.propose_command)
 main.add_command(execute_cmd.execute_command)

diff --git a/src/orc/cli_commands/report.py b/src/orc/cli_commands/report.py
@@ -0,0 +1,51 @@
+"""`orc report RUN_ID...` — render traces as a self-contained HTML artifact."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import click
+
+from orc.errors import TraceNotFoundError
+from orc.rendering.trace_html import build_report_html
+from orc.storage.trace_store import load_trace
+
+
+@click.command("report")
+@click.argument("run_ids", nargs=-1, required=True)
+@click.option(
+    "-o",
+    "--output",
+    "output_path",
+    type=click.Path(dir_okay=False, writable=True, path_type=Path),
+    default=None,
+    help="Write the report to PATH instead of stdout.",
+)
+@click.option(
+    "--open",
+    "open_after",
+    is_flag=True,
+    help="Open the written report in the default browser (requires -o).",
+)
+def report_command(
+    run_ids: tuple[str, ...],
+    output_path: Path | None,
+    open_after: bool,
+) -> None:
+    """Render one or more run traces as a self-contained HTML report."""
+    # Fail before rendering: there is no file to open when writing to stdout,
+    # and silently ignoring the flag would hide a typo in the invocation.
+    if open_after and output_path is None:
+        raise click.ClickException("--open requires -o/--output (stdout cannot be opened)")
+    try:
+        traces = [load_trace(run_id) for run_id in run_ids]
+    except TraceNotFoundError as exc:
+        raise click.ClickException(str(exc)) from exc
+    html_doc = build_report_html(traces)
+    if output_path is None:
+        click.echo(html_doc)
+        return
+    output_path.write_text(html_doc, encoding="utf-8")
+    click.echo(str(output_path))
+    if open_after:
+        click.launch(str(output_path))
diff --git a/src/orc/rendering/__init__.py b/src/orc/rendering/__init__.py
@@ -0,0 +1 @@
+"""Rendering: turn persisted trace JSON into human-facing artifacts."""
diff --git a/src/orc/rendering/assets/__init__.py b/src/orc/rendering/assets/__init__.py
@@ -0,0 +1,7 @@
+"""Static assets (trace.css, trace.js) inlined into generated reports.
+
+A real package (not bare data files) so importlib.resources can locate the
+assets from a wheel, a zipapp, or an editable install alike. trace.css and
+trace.js are verbatim copies of site/trace.css and site/trace.js — the report
+artifact and the public site must render traces identically.
+"""
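
The docstring above explains that the assets live in a real package so `importlib.resources` can find them from a wheel, a zipapp, or an editable install. The sketch below shows how such assets might be read and inlined into a single HTML document. It is an illustration only: `inline_assets` and the throwaway `demo_assets` package are hypothetical, and the actual `build_report_html` is not part of this diff.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path


def inline_assets(package: str, body: str) -> str:
    """Build one HTML document with the package's CSS and JS inlined."""
    root = resources.files(package)  # works for directories and zipped wheels
    css = root.joinpath("trace.css").read_text(encoding="utf-8")
    js = root.joinpath("trace.js").read_text(encoding="utf-8")
    return (
        "<!doctype html><html><head>"
        f"<style>{css}</style></head>"
        f"<body>{body}<script>{js}</script></body></html>"
    )


# Demo: create a throwaway package so the sketch runs without Orc installed.
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "demo_assets"
pkg.mkdir()
(pkg / "__init__.py").write_text("", encoding="utf-8")
(pkg / "trace.css").write_text("body{margin:0}", encoding="utf-8")
(pkg / "trace.js").write_text("console.log('trace')", encoding="utf-8")
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

html_doc = inline_assets("demo_assets", "<main>run</main>")
print(html_doc)
```

Resolving the files through the package rather than through `__file__` paths is what keeps the lookup working when the code is imported from an archive instead of a directory.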