37 changes: 17 additions & 20 deletions CHANGELOG.md
@@ -7,7 +7,23 @@ Version numbers follow [SemVer](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added (not yet released)
### Planned

- `gads` directive (Google Ads agentic analysis: lens-based decomposition,
read-only MCP integration, evidence-bound recommendation verification).
- `orc eval consistency|perturb|retrieval|regression` reliability commands.
- Voyage-AI / OpenAI embedding backends behind the existing `Embedder` protocol.
- Hosted runtime (scheduled triggers, web dashboard, team workspaces).
- Decomposition + arithmetic combined for DROP-shaped multi-step claims.

## [0.2.0] — unreleased

Packaged for PyPI as **`orc-ai`** (`orc` is taken by an unrelated project);
the import package (`import orc`) and CLI command (`orc`) are unchanged. The
release workflow publishes on a `v0.2.0` tag once the trusted publisher is
configured — not yet tagged or published.

### Added

- **Hybrid retrieval** — opt-in BM25 + dense-vector retrieval fused with
Reciprocal Rank Fusion. Local `sentence-transformers` embedder by default
@@ -28,25 +44,6 @@ Version numbers follow [SemVer](https://semver.org/spec/v2.0.0.html).
- **`orc eval calibrate`** — derive the tiered escalation threshold from the
gold set (lowest cutoff meeting `--target`, default 0.95), with an
achievability guard that refuses to silently configure always-escalate.
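
The hybrid retrieval entry above names Reciprocal Rank Fusion as the way BM25 and dense-vector rankings are combined. As a rough illustration only, and not orc's actual implementation, RRF scores each item as the sum of `1 / (k + rank)` over the rankings it appears in. The helper name `rrf_fuse`, the chunk ids, and the constant `k=60` (a commonly used default for RRF) are assumptions made for this sketch.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists with Reciprocal Rank Fusion (hypothetical helper)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
bm25_hits = ["c1", "c2", "c3"]
dense_hits = ["c3", "c1", "c4"]
fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)
```

Items that both retrievers rank highly rise to the top, while an item seen by only one retriever still keeps a nonzero score.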

### Planned

- `gads` directive (Google Ads agentic analysis: lens-based decomposition,
read-only MCP integration, evidence-bound recommendation verification).
- `orc eval consistency|perturb|retrieval|regression` reliability commands.
- Voyage-AI / OpenAI embedding backends behind the existing `Embedder` protocol.
- Hosted runtime (scheduled triggers, web dashboard, team workspaces).
- Decomposition + arithmetic combined for DROP-shaped multi-step claims.

## [0.2.0] — unreleased

Packaged for PyPI as **`orc-ai`** (`orc` is taken by an unrelated project);
the import package (`import orc`) and CLI command (`orc`) are unchanged. The
release workflow publishes on a `v0.2.0` tag once the trusted publisher is
configured — not yet tagged or published.

### Added

- **PDF ingestion** — `orc ingest report.pdf` now works alongside markdown,
text, json, and URLs. Text is extracted page-by-page via `pypdf`, and the
PDF metadata title is used when the body carries no markdown-style heading
13 changes: 13 additions & 0 deletions README.md
@@ -75,6 +75,7 @@ claude mcp add orc -- uv run --directory $(pwd) orc mcp serve
```
orc workspace create <name> create a new workspace
orc workspace list list workspaces
orc workspace embed [-w <name>] backfill vector embeddings (embeddings extra)
orc ingest <path-or-url> [-w <name>] add evidence (md, txt, json, pdf, urls)
orc search "<query>" [-w <name>] BM25 retrieval, no LLM
orc verify "<claim>" [-w <name>] verify a single claim
@@ -86,14 +87,19 @@ orc verify "<claim>" --mode tiered cheap judge first, escalate only when uns
orc eval import <file.yaml> [-w <n>] seed a labelled gold set
orc eval label <run_id> --verdict <v> promote/correct a real verdict into gold
orc eval run [-w <name>] [--json] score the gate (accuracy, calibration, recall)
orc eval show [-w <name>] reprint a persisted eval report
orc eval calibrate [-w <name>] tune the tiered escalation threshold
orc trace show <run_id> full trace JSON
orc trace list [-w <name>] recent runs
orc replay <run_id> [--live] re-execute a recorded run
orc propose <executor> --params <json> stage an action for human approval
orc approve list [-w <name>] [--json] list pending approval items
orc approve show <id> full payload for an approval
orc approve accept <id> [--note] accept a pending recommendation
orc approve reject <id> [--note] reject one
orc execute <id> [-w <name>] execute one approved action
orc worker [-w <name>] auto-drain daemon for approved actions
orc audit export [-w <name>] bundle traces + evidence for an auditor
orc mcp serve start the MCP stdio server
```

@@ -117,6 +123,13 @@ orc approve accept <id> -w research
orc execute <id> -w research # lands in ~/.orc/workspaces/research/out/
```

> **Approver identity is self-reported.** `orc approve accept --by <name>`
> records whatever name the caller passes (default `$USER`) — orc does not
> authenticate it. Multi-approver gates (`approvers_required > 1`, e.g. for EU
> AI Act Article 14(5)) are an honor system on a shared shell; for a real
> separation-of-duties guarantee, route decisions through an authenticated
> surface that pins `--by` to a verified identity.
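
One way to read "pins `--by` to a verified identity" is a thin wrapper that never accepts an approver name from the caller and instead fills it in from an identity that some authenticated layer has already established. The sketch below only builds the argument list for `orc approve accept`. The function name `build_accept_command`, the example id `apr_123`, and the idea that `verified_identity` comes from something like an SSO session are all assumptions, and nothing here performs authentication itself.

```python
from __future__ import annotations

import shlex


def build_accept_command(
    verified_identity: str,
    approval_id: str,
    workspace: str | None = None,
    note: str | None = None,
) -> list[str]:
    """Build an `orc approve accept` argv with --by pinned (hypothetical helper).

    `verified_identity` is assumed to come from an authenticated layer;
    this function does not verify it.
    """
    if not verified_identity.strip():
        raise ValueError("refusing to approve without a verified identity")
    # --by is always set here, never taken from user-supplied arguments.
    argv = ["orc", "approve", "accept", approval_id, "--by", verified_identity]
    if workspace is not None:
        argv += ["-w", workspace]
    if note is not None:
        argv += ["--note", note]
    return argv


argv = build_accept_command("alice@example.com", "apr_123", workspace="research")
print(shlex.join(argv))
```

The wrapper is only as trustworthy as the layer that supplies `verified_identity`; anyone who can still run `orc approve accept` directly on the shared shell bypasses it.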

## Architecture

```
17 changes: 17 additions & 0 deletions src/orc/cli_commands/_shared.py
@@ -0,0 +1,17 @@
"""Helpers shared by CLI command modules."""

from __future__ import annotations

import click

from orc.errors import WorkspaceNotFoundError
from orc.storage import workspace as ws_module


def resolve_workspace(name: str | None) -> ws_module.Workspace:
    """Resolve a workspace name (or the env default) to a Workspace, mapping
    WorkspaceNotFoundError to a clean CLI error."""
    try:
        return ws_module.resolve(name)
    except WorkspaceNotFoundError as exc:
        raise click.ClickException(str(exc)) from exc
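
To see what the helper buys each command, here is a minimal, self-contained sketch of the same pattern exercised through Click's test runner. `WorkspaceNotFoundError`, `fake_resolve`, and `demo_command` are stand-ins invented for this example (the real ones live in `orc.errors` and `orc.storage.workspace`), and the helper returns a plain string here rather than a `Workspace`.

```python
from __future__ import annotations

import click
from click.testing import CliRunner


class WorkspaceNotFoundError(Exception):
    """Stand-in for orc.errors.WorkspaceNotFoundError."""


def fake_resolve(name: str | None) -> str:
    """Stand-in for orc.storage.workspace.resolve; knows one workspace."""
    if name != "research":
        raise WorkspaceNotFoundError(f"workspace not found: {name!r}")
    return name


def resolve_workspace(name: str | None) -> str:
    # Same shape as the shared helper: translate the domain error once so
    # every command reports "Error: ..." instead of a traceback.
    try:
        return fake_resolve(name)
    except WorkspaceNotFoundError as exc:
        raise click.ClickException(str(exc)) from exc


@click.command()
@click.option("--workspace", "-w", default=None)
def demo_command(workspace: str | None) -> None:
    ws = resolve_workspace(workspace)
    click.echo(f"using {ws}")


runner = CliRunner()
ok = runner.invoke(demo_command, ["-w", "research"])
missing = runner.invoke(demo_command, ["-w", "nope"])
print(ok.exit_code, missing.exit_code)
```

Because the translation happens in one place, a command body shrinks to a single `resolve_workspace(...)` call, which is the substitution the remaining hunks in this diff make.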
44 changes: 27 additions & 17 deletions src/orc/cli_commands/approve.py
@@ -8,14 +8,13 @@
from rich.console import Console
from rich.table import Table

from orc.errors import WorkspaceNotFoundError
from orc.cli_commands._shared import resolve_workspace
from orc.queue import approval as approval_module
from orc.queue.approval import (
ApprovalAlreadyDecidedError,
ApprovalNotFoundError,
DuplicateApproverError,
)
from orc.storage import workspace as ws_module

console = Console()

@@ -44,10 +43,7 @@ def approve_group() -> None:
@click.option("--json", "as_json", is_flag=True, help="Machine-readable JSON output")
def list_command(workspace: str | None, status: str, limit: int, as_json: bool) -> None:
"""List approvals."""
-    try:
-        ws = ws_module.resolve(workspace)
-    except WorkspaceNotFoundError as exc:
-        raise click.ClickException(str(exc)) from exc
+    ws = resolve_workspace(workspace)
items = approval_module.list_approvals(
ws.name, status=None if status == "all" else status, limit=limit
)
@@ -109,10 +105,7 @@ def list_command(workspace: str | None, status: str, limit: int, as_json: bool)
@click.option("--workspace", "-w", default=None)
def show_command(approval_id: str, workspace: str | None) -> None:
"""Print full payload for an approval."""
-    try:
-        ws = ws_module.resolve(workspace)
-    except WorkspaceNotFoundError as exc:
-        raise click.ClickException(str(exc)) from exc
+    ws = resolve_workspace(workspace)
try:
a = approval_module.get(ws.name, approval_id)
except ApprovalNotFoundError as exc:
@@ -158,19 +151,39 @@ def show_command(approval_id: str, workspace: str | None) -> None:
@click.argument("approval_id")
@click.option("--workspace", "-w", default=None)
@click.option("--note", default=None, help="Optional decision note")
@click.option("--by", "decided_by", default=None, help="Who decided (defaults to $USER)")
@click.option(
"--by",
"decided_by",
default=None,
help="Who decided (defaults to $USER). Self-reported and unauthenticated: "
"anyone with shell access can pass any name, so multi-approver gates are "
"honor-system unless an authenticated layer supplies this value.",
)
def accept_command(
approval_id: str, workspace: str | None, note: str | None, decided_by: str | None
) -> None:
"""Accept a pending approval."""
"""Accept a pending approval.

The recorded approver name comes from --by (or $USER) and is not
authenticated by orc. Deployments using approvers_required > 1 as a
compliance control (e.g. EU AI Act Article 14(5)) must ensure decisions
are submitted through an authenticated surface that pins --by to a
verified identity.
"""
_decide(approval_id, workspace, note, decided_by, accept=True)


@approve_group.command("reject")
@click.argument("approval_id")
@click.option("--workspace", "-w", default=None)
@click.option("--note", default=None, help="Optional decision note")
@click.option("--by", "decided_by", default=None)
@click.option(
"--by",
"decided_by",
default=None,
help="Who decided (defaults to $USER). Self-reported and unauthenticated; "
"see `orc approve accept --help`.",
)
def reject_command(
approval_id: str, workspace: str | None, note: str | None, decided_by: str | None
) -> None:
@@ -190,10 +203,7 @@ def _decide(

if decided_by is None:
decided_by = os.environ.get("USER") or "user"
-    try:
-        ws = ws_module.resolve(workspace)
-    except WorkspaceNotFoundError as exc:
-        raise click.ClickException(str(exc)) from exc
+    ws = resolve_workspace(workspace)
try:
if accept:
a = approval_module.accept(ws.name, approval_id, decided_by=decided_by, note=note)
7 changes: 2 additions & 5 deletions src/orc/cli_commands/audit.py
@@ -62,12 +62,9 @@ def export_command(
"""Bundle a workspace's traces, run rows, evidence manifest, approvals,
and runtime metadata into a single tar.gz for handoff to a regulator,
auditor, or customer."""
-    from orc.storage import workspace as ws_module
+    from orc.cli_commands._shared import resolve_workspace

-    try:
-        ws = ws_module.resolve(workspace)
-    except Exception as exc:  # noqa: BLE001 — surface as ClickException
-        raise click.ClickException(str(exc)) from exc
+    ws = resolve_workspace(workspace)

if output_path is None:
from orc.core.clock import now_iso
21 changes: 7 additions & 14 deletions src/orc/cli_commands/eval_cmd.py
@@ -8,9 +8,8 @@
import click
import yaml

from orc.errors import WorkspaceNotFoundError
from orc.cli_commands._shared import resolve_workspace
from orc.eval import gold
from orc.storage import workspace as ws_module
from orc.storage.trace_store import load_trace

_LABELS = ["supported", "contradicted", "not_found", "partial"]
@@ -26,7 +25,7 @@ def eval_group() -> None:
@click.option("--workspace", "-w", default=None, help="Workspace name (env: ORC_DEFAULT_WORKSPACE)")
def import_command(path: Path, workspace: str | None) -> None:
"""Seed gold claims from a YAML file (id/text/expected[/relevant_chunk_ids/note])."""
ws = _resolve(workspace)
ws = resolve_workspace(workspace)
items = yaml.safe_load(path.read_text()) or []
n = 0
for item in items:
@@ -69,7 +68,7 @@ def label_command(
raise click.ClickException(f"Run {run_id} has no claim to label")
# Resolve the workspace (not just read its name from the trace) so a
# workspace whose db predates schema v2 gets migrated before we write gold.
_resolve(trace["workspace"])
resolve_workspace(trace["workspace"])
gold.add(
trace["workspace"],
claim=claim,
@@ -92,7 +91,7 @@ def run_command(workspace: str | None, mode: str, k: int, as_json: bool) -> None
"""Score the gate against the workspace's gold set."""
from orc.eval.runner import run_eval

ws = _resolve(workspace)
ws = resolve_workspace(workspace)
try:
report = run_eval(ws.name, mode=mode, k=k)
except ValueError as exc:
@@ -128,7 +127,7 @@ def show_command(eval_id: str, workspace: str | None, as_json: bool) -> None:
"""Reprint a persisted eval report."""
from orc.eval.runner import load_eval

ws = _resolve(workspace)
ws = resolve_workspace(workspace)
try:
report = load_eval(ws.name, eval_id)
except KeyError as exc:
@@ -165,7 +164,7 @@ def calibrate_command(
from orc.eval.calibrate import DEFAULT_TIER1_MODEL, DEFAULT_TIER2_MODEL, calibrate
from orc.eval.policy import save_policy

ws = _resolve(workspace)
ws = resolve_workspace(workspace)
t1 = tier1_model or DEFAULT_TIER1_MODEL
t2 = tier2_model or DEFAULT_TIER2_MODEL
result = calibrate(ws.name, target=target, tier1_model=t1)
@@ -207,7 +206,7 @@ def calibrate_command(
@click.option("--json", "as_json", is_flag=True)
def gold_command(action: str, workspace: str | None, as_json: bool) -> None:
"""Inspect the gold set (currently: list)."""
ws = _resolve(workspace)
ws = resolve_workspace(workspace)
items = gold.list_gold(ws.name)
stale = {
g.gold_id
@@ -239,9 +238,3 @@ def gold_command(action: str, workspace: str | None, as_json: bool) -> None:
flag = " [stale chunk labels]" if g.gold_id in stale else ""
click.echo(f"{g.gold_id} {g.expected_label:<12} {g.claim[:60]}{flag}")


-def _resolve(workspace: str | None) -> ws_module.Workspace:
-    try:
-        return ws_module.resolve(workspace)
-    except WorkspaceNotFoundError as exc:
-        raise click.ClickException(str(exc)) from exc
8 changes: 2 additions & 6 deletions src/orc/cli_commands/execute.py
@@ -13,17 +13,16 @@
from rich.console import Console

from orc import effects
from orc.cli_commands._shared import resolve_workspace
from orc.effects.action import Action
from orc.effects.base import MissingCredentialError
from orc.errors import WorkspaceNotFoundError
from orc.queue import approval as approval_module
from orc.queue.approval import (
ActionDeadError,
AlreadyExecutedError,
ApprovalNotFoundError,
NotApprovedError,
)
from orc.storage import workspace as ws_module

console = Console()

@@ -33,10 +32,7 @@
@click.option("--workspace", "-w", default=None, help="Workspace name (env: ORC_DEFAULT_WORKSPACE)")
def execute_command(approval_id: str, workspace: str | None) -> None:
"""Execute an approved action by approval_id."""
-    try:
-        ws = ws_module.resolve(workspace)
-    except WorkspaceNotFoundError as exc:
-        raise click.ClickException(str(exc)) from exc
+    ws = resolve_workspace(workspace)

existing = approval_module.get_execution(ws.name, approval_id)
if existing is not None and existing["exec_status"] == "succeeded":
9 changes: 3 additions & 6 deletions src/orc/cli_commands/ingest.py
@@ -5,9 +5,9 @@
import click
from rich.console import Console

from orc.errors import IngestError, WorkspaceNotFoundError
from orc.cli_commands._shared import resolve_workspace
from orc.errors import IngestError
from orc.ingest.pipeline import ingest as do_ingest
from orc.storage import workspace as ws_module

console = Console()

@@ -18,10 +18,7 @@
@click.option("--no-recursive", is_flag=True, help="Skip recursing into subdirectories")
def ingest_command(source: str, workspace: str | None, no_recursive: bool) -> None:
"""Ingest a file, directory, or URL into the workspace's evidence corpus."""
-    try:
-        ws = ws_module.resolve(workspace)
-    except WorkspaceNotFoundError as exc:
-        raise click.ClickException(str(exc)) from exc
+    ws = resolve_workspace(workspace)
try:
ids = do_ingest(ws, source, recursive=not no_recursive)
except IngestError as exc:
8 changes: 2 additions & 6 deletions src/orc/cli_commands/propose.py
@@ -15,10 +15,9 @@
import click

from orc import effects
from orc.errors import WorkspaceNotFoundError
from orc.cli_commands._shared import resolve_workspace
from orc.paths import config_path
from orc.runs import open_run
from orc.storage import workspace as ws_module


def _load_params(raw: str) -> dict[str, Any]:
@@ -61,10 +60,7 @@ def propose_command(
wrong (it would silently stage a duplicate effect).
"""
params = _load_params(params_raw)
-    try:
-        ws = ws_module.resolve(workspace)
-    except WorkspaceNotFoundError as exc:
-        raise click.ClickException(str(exc)) from exc
+    ws = resolve_workspace(workspace)

with open_run(
ws,
8 changes: 2 additions & 6 deletions src/orc/cli_commands/research.py
@@ -8,9 +8,8 @@
from rich.console import Console

from orc import directives
from orc.errors import WorkspaceNotFoundError
from orc.cli_commands._shared import resolve_workspace
from orc.runs import open_run
from orc.storage import workspace as ws_module

console = Console()

@@ -29,10 +28,7 @@ def research_command(
as_json: bool,
) -> None:
"""Research a topic against the workspace's evidence corpus."""
-    try:
-        ws = ws_module.resolve(workspace)
-    except WorkspaceNotFoundError as exc:
-        raise click.ClickException(str(exc)) from exc
+    ws = resolve_workspace(workspace)

spec = directives.get("research")
skill = spec.skills["research_topic"]