From c90e8a5ccaa861eedef5789c08b5a4d932afae4b Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Mon, 4 May 2026 15:26:48 +0000 Subject: [PATCH 01/30] docs(deploy): add Fly.io fly.toml and deployment steps Include optional persistent volume mount template, health check on /health, and security notes for FLIGHTDECK_LOCAL_API_TOKEN and read-only UI builds. Co-authored-by: Gottam Sai Bharath --- examples/README.md | 2 +- examples/deploy/README.md | 37 ++++++++++++++++++++++++++++++++ examples/deploy/fly.toml | 44 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 82 insertions(+), 1 deletion(-) create mode 100644 examples/deploy/fly.toml diff --git a/examples/README.md b/examples/README.md index e990f1b..82c6829 100644 --- a/examples/README.md +++ b/examples/README.md @@ -30,7 +30,7 @@ Use this as a **discoverability** pass for the **[ROADMAP.md](../ROADMAP.md)** s |------|---------| | [quickstart/](quickstart/) | Minimal workspace used by `flightdeck-quickstart-verify`. | | [ci/](ci/README.md) | Policy gate script, sample policy YAML, GitHub Actions job snippets. | -| [deploy/](deploy/README.md) | Dockerfile and compose for `flightdeck serve`. | +| [deploy/](deploy/README.md) | Dockerfile and compose for `flightdeck serve`; optional **Fly.io** (`fly.toml`). | | [integration/](integration/README.md) | Sample event emitter for HTTP ingest. | | [integration/adoption/](integration/adoption/README.md) | OpenAI, Anthropic, LangChain, Agents SDK, CrewAI-style totals, Temporal labels → `RunEvent`. | | [fleet/](fleet/README.md) | Multi-workspace naming, optional catalog path, approval workflow notes. 
| diff --git a/examples/deploy/README.md b/examples/deploy/README.md index 68d4127..066b475 100644 --- a/examples/deploy/README.md +++ b/examples/deploy/README.md @@ -40,6 +40,43 @@ Inside the Compose stack, **`exec`** into the running container with **`/workspa Set **`FLIGHTDECK_LOCAL_API_TOKEN`** in your environment before `docker compose up` (or in an `.env` file beside `docker-compose.yml`). Clients must send **`Authorization: Bearer …`** for **ledger writes**: **`POST /v1/promote*`**, **`POST /v1/rollback`**, and **`POST /v1/events`**. With no token configured, those routes accept only **loopback** callers. **`POST /v1/diff`** stays unauthenticated (read-only); still treat network placement as a trust boundary. +For **Fly.io** (public HTTPS demo or staging), see **[Fly.io](#flyio)** below. + +## Fly.io + +Deploy the same Docker image to [Fly Machines](https://fly.io/docs/machines/). This gives you a URL you can open from any browser; treat it as **trusted** or lock it down with **`FLIGHTDECK_LOCAL_API_TOKEN`** (see **[SECURITY.md](../../SECURITY.md)**). + +### One-time setup + +1. Install [`flyctl`](https://fly.io/docs/hands-on/install-flyctl/) and run **`fly auth login`**. +2. From **`examples/deploy/`**: + - Edit **`fly.toml`**: set **`app`** to a unique name (or run **`fly apps create `** and match). + - Optional **persistent ledger**: create a volume in the **same region** as **`primary_region`**: + ```bash + fly volumes create fd_workspace --region iad --size 1 + ``` + Uncomment the **`[mounts]`** block at the bottom of **`fly.toml`** (`source = "fd_workspace"`, `destination = "/workspace"`). +3. **Secrets** (recommended once you expose the app on the internet): + ```bash + fly secrets set FLIGHTDECK_LOCAL_API_TOKEN="$(openssl rand -hex 24)" + ``` + The server then expects **`Authorization: Bearer …`** for ledger writes from non-loopback clients. 
The stock **`examples/deploy` image** does not embed a browser token; use either **read-only UI** (`VITE_FLIGHTDECK_UI_READ_ONLY=true` in a custom image build — see **`docs/web-ui.md`**) or rebuild the image with **`VITE_FLIGHTDECK_LOCAL_API_TOKEN`** matching your secret so the bundled UI can call promote/diff when **`read_auth`** is bearer-gated. + +### Deploy + +```bash +cd examples/deploy +fly deploy --remote-only +``` + +Open **`https://.fly.dev/`** — static UI and **`/v1/*`** on the same origin. + +### Notes + +- **Cold starts:** **`fly.toml`** allows **`min_machines_running = 0`**; first request may wake the Machine. +- **Demo-only UI:** ship a build with **`VITE_FLIGHTDECK_UI_READ_ONLY=true`** if you only want read-only navigation (rebuild **`web/`** and static bundle per **`docs/web-ui.md`**). +- **Maintainers:** this repo cannot run **`fly deploy`** for you; use your own Fly org and the steps above. + ## Helm (optional single-replica chart) A minimal chart lives under **`chart/flightdeck/`**. It runs one replica of **`flightdeck serve`** with an **`emptyDir`** workspace (ephemeral); for a persistent ledger, replace the volume in **`templates/deployment.yaml`** with a PVC or mount your own image init. diff --git a/examples/deploy/fly.toml b/examples/deploy/fly.toml new file mode 100644 index 0000000..5e6355d --- /dev/null +++ b/examples/deploy/fly.toml @@ -0,0 +1,44 @@ +# Fly.io — deploy `flightdeck serve` from this directory (`examples/deploy`). +# +# Prerequisites: `flyctl auth login`, then either: +# fly apps create +# and set `app` below to match, or run `fly launch` once and merge settings. +# +# Ephemeral ledger: omit [mounts] (data may reset when Fly replaces the Machine). +# Persistent SQLite: create a volume and uncomment [mounts] (see README). 
+ +app = "flightdeck-demo" +primary_region = "iad" + +[build] + dockerfile = "Dockerfile" + +[env] + # Strongly recommended for any non-loopback deploy (see SECURITY.md): + # fly secrets set FLIGHTDECK_LOCAL_API_TOKEN="$(openssl rand -hex 24)" + # Do not commit tokens; use Fly secrets only. + +[http_service] + internal_port = 8765 + force_https = true + auto_stop_machines = true + auto_start_machines = true + min_machines_running = 0 + +[[http_service.checks]] + grace_period = "20s" + interval = "30s" + method = "GET" + timeout = "5s" + path = "/health" + +[[vm]] + memory = "512mb" + cpu_kind = "shared" + cpus = 1 + +# Uncomment after: fly volumes create fd_workspace --region iad --size 1 +# (region must match primary_region; size is GB) +# [mounts] +# source = "fd_workspace" +# destination = "/workspace" From 80b8da684db8df715766547a9c7bc019f61d8f78 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Mon, 4 May 2026 15:33:45 +0000 Subject: [PATCH 02/30] feat(deploy): Railway docs, railway.toml, and PORT-aware entrypoint Railway injects PORT at runtime; entrypoint binds flightdeck serve to PORT with 8765 fallback for local Compose. Dockerfile healthcheck follows PORT. Add railway.toml (Dockerfile builder, /health check) and README section with pricing caveat, volume, and token guidance. Co-authored-by: Gottam Sai Bharath --- examples/README.md | 2 +- examples/deploy/Dockerfile | 3 ++- examples/deploy/README.md | 25 ++++++++++++++++++++++++- examples/deploy/entrypoint.sh | 4 +++- examples/deploy/railway.toml | 14 ++++++++++++++ 5 files changed, 44 insertions(+), 4 deletions(-) create mode 100644 examples/deploy/railway.toml diff --git a/examples/README.md b/examples/README.md index e990f1b..6bfdabb 100644 --- a/examples/README.md +++ b/examples/README.md @@ -30,7 +30,7 @@ Use this as a **discoverability** pass for the **[ROADMAP.md](../ROADMAP.md)** s |------|---------| | [quickstart/](quickstart/) | Minimal workspace used by `flightdeck-quickstart-verify`. 
| | [ci/](ci/README.md) | Policy gate script, sample policy YAML, GitHub Actions job snippets. | -| [deploy/](deploy/README.md) | Dockerfile and compose for `flightdeck serve`. | +| [deploy/](deploy/README.md) | Dockerfile and compose for `flightdeck serve`; optional **Railway** (`railway.toml`). | | [integration/](integration/README.md) | Sample event emitter for HTTP ingest. | | [integration/adoption/](integration/adoption/README.md) | OpenAI, Anthropic, LangChain, Agents SDK, CrewAI-style totals, Temporal labels → `RunEvent`. | | [fleet/](fleet/README.md) | Multi-workspace naming, optional catalog path, approval workflow notes. | diff --git a/examples/deploy/Dockerfile b/examples/deploy/Dockerfile index 837c767..2e904c3 100644 --- a/examples/deploy/Dockerfile +++ b/examples/deploy/Dockerfile @@ -11,7 +11,8 @@ RUN chmod +x /entrypoint.sh EXPOSE 8765 +# Respect PORT at runtime (e.g. Railway); default matches local Compose. HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 \ - CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8765/health').read()" + CMD python -c "import os,urllib.request; p=os.environ.get('PORT','8765'); urllib.request.urlopen(f'http://127.0.0.1:{p}/health').read()" ENTRYPOINT ["/entrypoint.sh"] diff --git a/examples/deploy/README.md b/examples/deploy/README.md index 68d4127..9ebd407 100644 --- a/examples/deploy/README.md +++ b/examples/deploy/README.md @@ -10,7 +10,7 @@ Build (from this directory): docker build -t flightdeck-serve:local . ``` -The image installs **`flightdeck-ai`** from PyPI and runs **`flightdeck serve`** on **`0.0.0.0:8765`** inside the container. +The image installs **`flightdeck-ai`** from PyPI and runs **`flightdeck serve`** on **`0.0.0.0`** using port **`8765`** by default. On platforms that set **`PORT`** (for example **Railway**), **`entrypoint.sh`** binds to **`$PORT`** instead. 
**`entrypoint.sh`** creates a default **`flightdeck.yaml`** in `/workspace` on first start (`flightdeck init`) if the mounted volume is empty. @@ -40,6 +40,29 @@ Inside the Compose stack, **`exec`** into the running container with **`/workspa Set **`FLIGHTDECK_LOCAL_API_TOKEN`** in your environment before `docker compose up` (or in an `.env` file beside `docker-compose.yml`). Clients must send **`Authorization: Bearer …`** for **ledger writes**: **`POST /v1/promote*`**, **`POST /v1/rollback`**, and **`POST /v1/events`**. With no token configured, those routes accept only **loopback** callers. **`POST /v1/diff`** stays unauthenticated (read-only); still treat network placement as a trust boundary. +## Railway + +[Railway](https://railway.app/) often suits **small demos**; pricing and free allowances change — confirm **[Railway pricing](https://railway.com/pricing)** before relying on **`$0/month`** long term. + +### Deploy from this repo + +1. Create a **new project** → **Deploy from GitHub** (or **`railway init`** / **`railway link`** with the [CLI](https://docs.railway.app/guides/cli)). +2. Set the service **root directory** to **`examples/deploy`** so Railway builds **`Dockerfile`** and picks up **`railway.toml`** (config-as-code). + If the dashboard root cannot be a subdirectory, set [**`RAILWAY_DOCKERFILE_PATH`**](https://docs.railway.app/guides/dockerfiles) (service variable) to **`examples/deploy/Dockerfile`** and point **config as code** at **`examples/deploy/railway.toml`** per [config-as-code](https://docs.railway.app/guides/config-as-code). +3. **Networking:** enable **Public Networking** and **Generate Domain** (HTTPS). Railway routes traffic to the **`PORT`** your process listens on; **`entrypoint.sh`** uses **`PORT`** automatically. +4. **Variables (recommended for any public URL):** add **`FLIGHTDECK_LOCAL_API_TOKEN`** (random secret). 
The stock PyPI image does **not** embed that token in the browser bundle; use **read-only UI** (`VITE_FLIGHTDECK_UI_READ_ONLY=true` in a **custom image build**) or rebuild static assets with **`VITE_FLIGHTDECK_LOCAL_API_TOKEN`** so the UI can authenticate when **`read_auth`** is bearer-gated; see **`docs/web-ui.md`** and **[SECURITY.md](../../SECURITY.md)**. +5. **Persistent SQLite (optional):** add a [Railway volume](https://docs.railway.app/guides/volumes) mounted at **`/workspace`** so redeploys keep **`.flightdeck/`**. Without a volume, the ledger may reset when the container is recreated. + +CLI sketch (from **`examples/deploy`** after **`railway link`**): + +```bash +railway login +cd examples/deploy +railway variable set FLIGHTDECK_LOCAL_API_TOKEN="$(openssl rand -hex 24)" +railway up +railway domain # generate .railway.app URL if needed +``` + ## Helm (optional single-replica chart) A minimal chart lives under **`chart/flightdeck/`**. It runs one replica of **`flightdeck serve`** with an **`emptyDir`** workspace (ephemeral); for a persistent ledger, replace the volume in **`templates/deployment.yaml`** with a PVC or mount your own image init. diff --git a/examples/deploy/entrypoint.sh b/examples/deploy/entrypoint.sh index d7543bf..5c4efeb 100644 --- a/examples/deploy/entrypoint.sh +++ b/examples/deploy/entrypoint.sh @@ -4,4 +4,6 @@ cd /workspace if [ ! -f flightdeck.yaml ]; then flightdeck init fi -exec flightdeck serve --host 0.0.0.0 --port 8765 "$@" +# Railway (and some other PaaS) inject PORT; fall back to 8765, the port the Dockerfile EXPOSEs, for local Compose. +FD_PORT="${PORT:-8765}" +exec flightdeck serve --host 0.0.0.0 --port "$FD_PORT" "$@" diff --git a/examples/deploy/railway.toml b/examples/deploy/railway.toml new file mode 100644 index 0000000..7442f1d --- /dev/null +++ b/examples/deploy/railway.toml @@ -0,0 +1,14 @@ +# Railway config-as-code for FlightDeck reference image. 
+# Deploy with repo root directory set to `examples/deploy` (GitHub integration), +# or run `railway up` from this directory after `railway link`. +# +# Docs: https://docs.railway.app/guides/config-as-code + +[build] +builder = "DOCKERFILE" +dockerfilePath = "Dockerfile" + +[deploy] +healthcheckPath = "/health" +healthcheckTimeout = 300 +restartPolicyType = "ON_FAILURE" From d542d8358928f8ee81d898a368758da2fe727c24 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Tue, 5 May 2026 19:51:48 +0000 Subject: [PATCH 03/30] Add flightdeck demo for one-command quickstart onboarding Introduce `flightdeck demo` to run the examples/quickstart ledger flow in a temp workspace without sed or fixture paths. Ship quickstart fixtures in the wheel via Hatch force-include to `_bundled_quickstart` for PyPI installs. Refactor quickstart_smoke to share demo_flow helpers; document FLIGHTDECK_QUICKSTART_ROOT. Co-authored-by: Gottam Sai Bharath --- CHANGELOG.md | 1 + DEVELOPMENT.md | 11 ++ README.md | 23 +++ docs/cli.md | 25 ++- examples/quickstart/README.md | 8 +- pyproject.toml | 1 + .../_bundled_quickstart/__init__.py | 1 + src/flightdeck/cli/main.py | 60 ++++++ src/flightdeck/demo_flow.py | 181 ++++++++++++++++++ src/flightdeck/quickstart_smoke.py | 68 +------ tests/test_demo_flow.py | 76 ++++++++ 11 files changed, 389 insertions(+), 66 deletions(-) create mode 100644 src/flightdeck/_bundled_quickstart/__init__.py create mode 100644 src/flightdeck/demo_flow.py create mode 100644 tests/test_demo_flow.py diff --git a/CHANGELOG.md b/CHANGELOG.md index b6204d0..dfa01b7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,7 @@ This project follows [Semantic Versioning](https://semver.org/). 
From **v1.0.0** ### Added +- **`flightdeck demo`** — runs the packaged **examples/quickstart** workflow (init → pricing → policy → register → ingest → diff → promote → history) in a **temp workspace**, with no **`sed`** or repo paths; wheels ship fixtures under **`flightdeck/_bundled_quickstart`** via Hatch **`force-include`**. **`FLIGHTDECK_QUICKSTART_ROOT`** overrides fixture resolution for CI or forks. - **Web UI (`flightdeck serve`):** **`/#/settings`** for appearance (Light / Dark / System, **`flightdeck-theme`**); collapsible sidebar (**`flightdeck-sidebar-collapsed`**); **offline system font stack** (no remote font CSS); sidebar + favicon use **bundled** **`/assets/flightdeck-icon-*.png`** with stable **`GET /flightdeck-icon.png`** fallback; **`html[data-theme="dark"]`** tokens and Playwright **`web/e2e/`** (`smoke` icon checks, `theme.spec.ts`, `sidebar.spec.ts`). - **`flightdeck pricing check`** — reports **`flightdeck-bundled-*`** snapshot age vs **`--max-age-days`** (default **90**); **`--fail`** for CI. **`release diff`** / **`POST /v1/diff`** append **`pricing.warnings`** when bundled snapshots exceed the same age threshold. - **`flightdeck.integrations.telemetry.configure_otel_tracing()`** — optional OTLP HTTP **`TracerProvider`** wiring when the **`telemetry`** extra is installed (see **`docs/sdk-integrations.md`**). diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md index f1d2c5c..60f02a0 100644 --- a/DEVELOPMENT.md +++ b/DEVELOPMENT.md @@ -73,9 +73,12 @@ python -m ruff check src tests python -m pytest flightdeck --help flightdeck doctor +flightdeck demo flightdeck-quickstart-verify ``` +**Fast path for contributors:** **`flightdeck demo`** runs the same core ledger steps as below in a **temp workspace** (fixtures from **`examples/quickstart`**, or **`flightdeck/_bundled_quickstart`** inside an installed wheel). **`flightdeck-quickstart-verify`** adds **`release verify`** + **`doctor`**. 
+ Match **CI**’s CLI smoke: **`flightdeck --help`** must run successfully after changes to the CLI surface. Full command flags and exit codes: [README.md](https://github.com/flightdeckdev/flightdeck/blob/main/README.md). Cross-platform quickstart parity: **`flightdeck-quickstart-verify`** / **`python -m flightdeck.quickstart_smoke`** (also run in CI). HTTP API reference: **[docs/http-api.md](docs/http-api.md)**. Python SDK: **[docs/sdk.md](docs/sdk.md)**. @@ -151,6 +154,14 @@ If **PyPI** rejects **attestations** for your project, set **`attestations: fals ## Local Demo +**One command** (uses bundled **`examples/quickstart`** fixtures; no **`sed`**): + +```bash +flightdeck demo +``` + +**Manual** (same story as **`flightdeck demo`**, in your cwd): + ```bash flightdeck init flightdeck pricing import examples/quickstart/pricing-baseline.yaml diff --git a/README.md b/README.md index 8501379..6121b69 100644 --- a/README.md +++ b/README.md @@ -67,11 +67,33 @@ flowchart LR --- +## Fast start + +After **`pip install flightdeck-ai`** (or **`uv tool install flightdeck-ai`**): + +```bash +flightdeck demo +``` + +**`flightdeck demo`** runs the full quickstart ledger flow in a disposable temp workspace—no **`sed`**, no fixture paths—using **`examples/quickstart`** from your checkout or packaged **`flightdeck/_bundled_quickstart`** from PyPI. + +**Web UI** (needs a workspace in the current directory): + +```bash +flightdeck init +flightdeck serve +``` + +Open **http://127.0.0.1:8765/**. Same end-to-end checks CI uses: **`flightdeck-quickstart-verify`** (contributors: **`uv run flightdeck-quickstart-verify`**). 
+ +--- + ## Install and smoke-test ```bash uv sync --extra dev uv run flightdeck --help +uv run flightdeck demo uv run flightdeck-quickstart-verify ``` @@ -138,6 +160,7 @@ Bundled pricing from `init` is a **convenience snapshot**—`flightdeck pricing uv sync --frozen --extra dev uv run python -m ruff check src tests uv run python -m pytest +uv run flightdeck demo uv run flightdeck-quickstart-verify uv run flightdeck --help ``` diff --git a/docs/cli.md b/docs/cli.md index 5591085..7177af9 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -15,10 +15,12 @@ serve` see [http-api.md](http-api.md). | `--version` | Print the installed version and exit | | `--help` | Print help for any command or subcommand | -All commands require a `flightdeck.yaml` in the working directory (or the default path +Most commands require `flightdeck.yaml` in the working directory (or the default path `./flightdeck.yaml`). Run `flightdeck init` to create one. **`flightdeck init`** writes the config, then loads it to migrate the ledger and (by default) import bundled pricing. +**`flightdeck demo`** is an exception: it creates a **temporary** workspace and does not read `./flightdeck.yaml` from your shell cwd. + ## Actor resolution Several commands that write to the audit ledger (`release promote`, `release rollback`, @@ -76,6 +78,27 @@ set **`database_url`** to a `postgresql://…` (or `postgres://…`) DSN and ins --- +## `flightdeck demo` + +Run the **examples/quickstart** workflow end-to-end in a **disposable temp directory**: **`init`** → custom **`pricing import`** (both YAMLs) → **`policy set`** → **`release register`** (both bundles) → substitute **`release_id`** placeholders in JSONL → **`runs ingest`** → **`release diff`** → **`release promote`** (baseline under policy) → **`release history`**. + +Does **not** require **`flightdeck.yaml`** in the current directory. 
Fixtures resolve in order: **`--quickstart-root`**, **`FLIGHTDECK_QUICKSTART_ROOT`**, **`examples/quickstart`** relative to a git checkout, then **`flightdeck/_bundled_quickstart`** packaged in the wheel (PyPI installs). + +```bash +flightdeck demo [--quickstart-root DIR] [--verify / --no-verify] [--doctor / --no-doctor] [--keep-workspace] +``` + +| Option | Default | Description | +|--------|---------|-------------| +| `--quickstart-root` | (see above) | Directory containing `policy.yaml`, pricing YAMLs, `*-events.jsonl`, and `baseline-release` / `candidate-release` | +| `--verify` | off | Also run **`release verify`** on the baseline bundle (parity with **`flightdeck-quickstart-verify`**) | +| `--doctor` | off | Also run **`flightdeck doctor`** | +| `--keep-workspace` | off | Keep the temp workspace and print its path | + +On success, prints a short confirmation. Exit **0** on success, **1** on failure (same as subprocess failures from underlying CLI steps). + +--- + ## `flightdeck doctor` Run read-only health checks on the workspace ledger (SQLite file or PostgreSQL when diff --git a/examples/quickstart/README.md b/examples/quickstart/README.md index c2a361c..77a1206 100644 --- a/examples/quickstart/README.md +++ b/examples/quickstart/README.md @@ -7,7 +7,13 @@ These files are meant to be copied or substituted locally: - `policy.yaml` is an example active policy used by `release diff` and `release promote`. - `*-events.jsonl` contain placeholder `release_id` values (`__BASELINE_RELEASE_ID__`, `__CANDIDATE_RELEASE_ID__`). 
-Fastest path (from **repository root**, with **uv**): +Fastest path after **`pip install flightdeck-ai`**: + +```bash +flightdeck demo +``` + +Full CI parity (verify + doctor; from **repository root** with **uv**): ```bash uv run flightdeck-quickstart-verify diff --git a/pyproject.toml b/pyproject.toml index 5faa96e..17cfad1 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -78,6 +78,7 @@ flightdeck-quickstart-verify = "flightdeck.quickstart_smoke:quickstart_verify_ma [tool.hatch.build.targets.wheel] packages = ["src/flightdeck"] +force-include = { "examples/quickstart" = "src/flightdeck/_bundled_quickstart" } [tool.uv] # Contributor installs: `uv sync --extra dev` (see DEVELOPMENT.md). After changing diff --git a/src/flightdeck/_bundled_quickstart/__init__.py b/src/flightdeck/_bundled_quickstart/__init__.py new file mode 100644 index 0000000..1af0f92 --- /dev/null +++ b/src/flightdeck/_bundled_quickstart/__init__.py @@ -0,0 +1 @@ +"""Bundled quickstart fixtures (wheel: see pyproject hatch force-include).""" diff --git a/src/flightdeck/cli/main.py b/src/flightdeck/cli/main.py index a1e832b..7238ea6 100644 --- a/src/flightdeck/cli/main.py +++ b/src/flightdeck/cli/main.py @@ -14,6 +14,7 @@ from flightdeck import __version__ from flightdeck.bundle import bundle_checksum +from flightdeck.demo_flow import demo_session from flightdeck.bundled_pricing_bootstrap import ( BUNDLED_PRICING_VERSION, DEFAULT_CATALOG_RELATIVE_PATH, @@ -106,6 +107,65 @@ def init(path_: str, no_bundled_pricing: bool) -> None: ) +@cli.command() +@click.option( + "--quickstart-root", + "quickstart_root_opt", + type=click.Path(exists=True, file_okay=False, path_type=Path), + default=None, + help="Directory with quickstart YAML/JSONL fixtures (default: repo examples/ or bundled wheel copy).", +) +@click.option( + "--verify/--no-verify", + default=False, + show_default=True, + help="Also run release verify on the baseline bundle (matches flightdeck-quickstart-verify).", +) +@click.option( + 
"--doctor/--no-doctor", + default=False, + show_default=True, + help="Also run flightdeck doctor after the workflow.", +) +@click.option( + "--keep-workspace", + is_flag=True, + default=False, + help="Keep the temp workspace and print its path (for inspection).", +) +def demo( + quickstart_root_opt: Path | None, + verify: bool, + doctor: bool, + keep_workspace: bool, +) -> None: + """Run the bundled quickstart end-to-end in a disposable workspace (no manual sed). + + Typical install: ``pip install flightdeck-ai`` then ``flightdeck demo``. Next: ``flightdeck init`` + in your project and wire ``runs ingest`` / ``release diff`` from real agents. + """ + ws = demo_session( + verify=verify, + doctor=doctor, + qs_dir=str(quickstart_root_opt) if quickstart_root_opt is not None else None, + promote_reason="demo", + keep_workspace=keep_workspace, + ) + click.echo( + "Demo OK — workspace initialized, releases registered, runs ingested, " + "diff computed, baseline promoted under policy." + ) + extras = [] + if verify: + extras.append("verify") + if doctor: + extras.append("doctor") + if extras: + click.echo(f"(also ran: {', '.join(extras)})") + if keep_workspace and ws is not None: + click.echo(f"Workspace: {ws}") + + @cli.command("doctor") @click.option( "--backup", diff --git a/src/flightdeck/demo_flow.py b/src/flightdeck/demo_flow.py new file mode 100644 index 0000000..f540303 --- /dev/null +++ b/src/flightdeck/demo_flow.py @@ -0,0 +1,181 @@ +"""Shared quickstart demo / CI verification (fixtures + subprocess CLI calls).""" + +from __future__ import annotations + +import os +import shutil +import subprocess +import sys +import tempfile +from pathlib import Path + +BASELINE_PH = "__BASELINE_RELEASE_ID__" +CANDIDATE_PH = "__CANDIDATE_RELEASE_ID__" + + +def flightdeck_argv() -> list[str]: + exe = shutil.which("flightdeck") + if exe: + return [exe] + return [sys.executable, "-m", "flightdeck.cli.main"] + + +def quickstart_root(*, env_dir: str | None = None) -> Path: + 
"""Resolve the directory containing quickstart YAML/JSONL fixtures. + + Order: explicit ``env_dir``, ``FLIGHTDECK_QUICKSTART_ROOT``, repo + ``examples/quickstart``, then wheel-bundled ``_bundled_quickstart``. + """ + if env_dir: + p = Path(env_dir).expanduser() + if not p.is_dir(): + msg = f"Not a directory: {p}" + raise FileNotFoundError(msg) + return p.resolve() + + env = os.environ.get("FLIGHTDECK_QUICKSTART_ROOT") + if env: + p = Path(env).expanduser() + if not p.is_dir(): + msg = ( + f"FLIGHTDECK_QUICKSTART_ROOT is not a directory: {p}. " + "Unset it or point it at examples/quickstart." + ) + raise FileNotFoundError(msg) + return p.resolve() + + repo = Path(__file__).resolve().parents[2] + examples = repo / "examples" / "quickstart" + if examples.is_dir(): + return examples + + bundled = Path(__file__).resolve().parent / "_bundled_quickstart" + if bundled.is_dir() and (bundled / "policy.yaml").is_file(): + return bundled + + msg = ( + "Quickstart fixtures not found. Clone the repo or set FLIGHTDECK_QUICKSTART_ROOT " + "to a copy of examples/quickstart." 
+ ) + raise FileNotFoundError(msg) + + +def _run(fd: list[str], *args: str, cwd: Path) -> subprocess.CompletedProcess[str]: + return subprocess.run( + [*fd, *args], + cwd=cwd, + check=True, + text=True, + capture_output=True, + ) + + +def run_quickstart_verify( + workspace: Path, + qs: Path, + fd: list[str] | None = None, + *, + verify: bool = True, + doctor: bool = True, + promote_reason: str = "quickstart smoke", +) -> None: + """Run the full quickstart workflow used by CI (temp workspace must exist).""" + fd = fd or flightdeck_argv() + baseline_events = workspace / "baseline-events.jsonl" + candidate_events = workspace / "candidate-events.jsonl" + + _run(fd, "init", cwd=workspace) + _run(fd, "pricing", "import", str(qs / "pricing-baseline.yaml"), cwd=workspace) + _run(fd, "pricing", "import", str(qs / "pricing-candidate.yaml"), cwd=workspace) + _run(fd, "policy", "set", str(qs / "policy.yaml"), cwd=workspace) + + reg_b = _run(fd, "release", "register", str(qs / "baseline-release"), cwd=workspace) + baseline_id = reg_b.stdout.strip() + reg_c = _run(fd, "release", "register", str(qs / "candidate-release"), cwd=workspace) + candidate_id = reg_c.stdout.strip() + + baseline_events.write_text( + (qs / "baseline-events.jsonl").read_text(encoding="utf-8").replace(BASELINE_PH, baseline_id), + encoding="utf-8", + ) + candidate_events.write_text( + (qs / "candidate-events.jsonl").read_text(encoding="utf-8").replace(CANDIDATE_PH, candidate_id), + encoding="utf-8", + ) + + _run(fd, "runs", "ingest", str(baseline_events), cwd=workspace) + _run(fd, "runs", "ingest", str(candidate_events), cwd=workspace) + _run(fd, "release", "diff", baseline_id, candidate_id, "--window", "7d", cwd=workspace) + _run( + fd, + "release", + "promote", + baseline_id, + "--env", + "local", + "--window", + "7d", + "--reason", + promote_reason, + cwd=workspace, + ) + _run(fd, "release", "history", "--agent", "agent_support", "--env", "local", cwd=workspace) + if verify: + _run(fd, "release", "verify", 
baseline_id, "--path", str(qs / "baseline-release"), cwd=workspace) + if doctor: + _run(fd, "doctor", cwd=workspace) + + +def run_demo_happy_path( + workspace: Path, + qs: Path, + fd: list[str] | None = None, + *, + verify: bool = False, + doctor: bool = False, + promote_reason: str = "demo", +) -> None: + """Minimal demo: same ledger steps as CI verify, optional verify/doctor.""" + run_quickstart_verify( + workspace, + qs, + fd, + verify=verify, + doctor=doctor, + promote_reason=promote_reason, + ) + + +def demo_session( + *, + verify: bool, + doctor: bool, + qs_dir: str | None, + promote_reason: str, + keep_workspace: bool, +) -> Path | None: + """Create a temp workspace, run the demo. + + Removes the workspace when ``keep_workspace`` is false (unless setup fails). + Returns the workspace path when ``keep_workspace`` is true; otherwise ``None``. + """ + qs = quickstart_root(env_dir=qs_dir) + fd = flightdeck_argv() + tmp_s = tempfile.mkdtemp(prefix="flightdeck_demo_") + workspace = Path(tmp_s) + try: + run_demo_happy_path( + workspace, + qs, + fd, + verify=verify, + doctor=doctor, + promote_reason=promote_reason, + ) + except Exception: + shutil.rmtree(workspace, ignore_errors=True) + raise + if keep_workspace: + return workspace + shutil.rmtree(workspace, ignore_errors=True) + return None diff --git a/src/flightdeck/quickstart_smoke.py b/src/flightdeck/quickstart_smoke.py index deb03fd..c1c9b80 100644 --- a/src/flightdeck/quickstart_smoke.py +++ b/src/flightdeck/quickstart_smoke.py @@ -2,80 +2,20 @@ from __future__ import annotations -import shutil import subprocess import sys import tempfile from pathlib import Path -REPO = Path(__file__).resolve().parents[2] -QS = REPO / "examples" / "quickstart" -BASELINE_PH = "__BASELINE_RELEASE_ID__" -CANDIDATE_PH = "__CANDIDATE_RELEASE_ID__" - - -def _flightdeck_cmd() -> list[str]: - exe = shutil.which("flightdeck") - if exe: - return [exe] - return [sys.executable, "-m", "flightdeck.cli.main"] - - -def _run(fd: list[str], 
*args: str, cwd: Path) -> subprocess.CompletedProcess[str]: - return subprocess.run( - [*fd, *args], - cwd=cwd, - check=True, - text=True, - capture_output=True, - ) +from flightdeck.demo_flow import flightdeck_argv, quickstart_root, run_quickstart_verify def main() -> None: - fd = _flightdeck_cmd() + fd = flightdeck_argv() + qs = quickstart_root() with tempfile.TemporaryDirectory(prefix="fd_qs_", ignore_cleanup_errors=True) as tmp_s: tmp = Path(tmp_s) - baseline_events = tmp / "baseline-events.jsonl" - candidate_events = tmp / "candidate-events.jsonl" - - _run(fd, "init", cwd=tmp) - _run(fd, "pricing", "import", str(QS / "pricing-baseline.yaml"), cwd=tmp) - _run(fd, "pricing", "import", str(QS / "pricing-candidate.yaml"), cwd=tmp) - _run(fd, "policy", "set", str(QS / "policy.yaml"), cwd=tmp) - - reg_b = _run(fd, "release", "register", str(QS / "baseline-release"), cwd=tmp) - baseline_id = reg_b.stdout.strip() - reg_c = _run(fd, "release", "register", str(QS / "candidate-release"), cwd=tmp) - candidate_id = reg_c.stdout.strip() - - baseline_events.write_text( - (QS / "baseline-events.jsonl").read_text(encoding="utf-8").replace(BASELINE_PH, baseline_id), - encoding="utf-8", - ) - candidate_events.write_text( - (QS / "candidate-events.jsonl").read_text(encoding="utf-8").replace(CANDIDATE_PH, candidate_id), - encoding="utf-8", - ) - - _run(fd, "runs", "ingest", str(baseline_events), cwd=tmp) - _run(fd, "runs", "ingest", str(candidate_events), cwd=tmp) - _run(fd, "release", "diff", baseline_id, candidate_id, "--window", "7d", cwd=tmp) - _run( - fd, - "release", - "promote", - baseline_id, - "--env", - "local", - "--window", - "7d", - "--reason", - "quickstart smoke", - cwd=tmp, - ) - _run(fd, "release", "history", "--agent", "agent_support", "--env", "local", cwd=tmp) - _run(fd, "release", "verify", baseline_id, "--path", str(QS / "baseline-release"), cwd=tmp) - _run(fd, "doctor", cwd=tmp) + run_quickstart_verify(tmp, qs, fd) print("quickstart_smoke: OK") diff --git 
a/tests/test_demo_flow.py b/tests/test_demo_flow.py new file mode 100644 index 0000000..6d8dce9 --- /dev/null +++ b/tests/test_demo_flow.py @@ -0,0 +1,76 @@ +"""Tests for bundled quickstart resolution and demo flow helpers.""" + +from __future__ import annotations + +from pathlib import Path + +import pytest +from click.testing import CliRunner + +from flightdeck import demo_flow +from flightdeck.cli.main import cli + + +def test_quickstart_root_prefers_repo_examples(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + monkeypatch.delenv("FLIGHTDECK_QUICKSTART_ROOT", raising=False) + repo_root = Path(demo_flow.__file__).resolve().parents[2] + examples = repo_root / "examples" / "quickstart" + if not examples.is_dir(): + pytest.skip("examples/quickstart not present in this checkout") + + assert demo_flow.quickstart_root() == examples.resolve() + + +def test_quickstart_root_env_override(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + qs = tmp_path / "qs" + qs.mkdir() + (qs / "policy.yaml").write_text("policy_id: x\n", encoding="utf-8") + monkeypatch.setenv("FLIGHTDECK_QUICKSTART_ROOT", str(qs)) + + assert demo_flow.quickstart_root() == qs.resolve() + + +def test_demo_session_keep_workspace(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + repo_root = Path(demo_flow.__file__).resolve().parents[2] + examples = repo_root / "examples" / "quickstart" + if not examples.is_dir(): + pytest.skip("examples/quickstart not present") + + monkeypatch.delenv("FLIGHTDECK_QUICKSTART_ROOT", raising=False) + ws = demo_flow.demo_session( + verify=False, + doctor=False, + qs_dir=None, + promote_reason="pytest demo", + keep_workspace=True, + ) + assert ws is not None + cfg = ws / "flightdeck.yaml" + assert cfg.is_file() + + +def test_demo_cli_exits_zero(monkeypatch: pytest.MonkeyPatch) -> None: + repo_root = Path(__file__).resolve().parents[1] + monkeypatch.chdir(repo_root) + runner = CliRunner() + res = runner.invoke(cli, ["demo"]) + assert res.exit_code == 
0, res.output + assert "Demo OK" in res.output + + +def test_demo_session_cleanup(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + repo_root = Path(demo_flow.__file__).resolve().parents[2] + examples = repo_root / "examples" / "quickstart" + if not examples.is_dir(): + pytest.skip("examples/quickstart not present") + + monkeypatch.delenv("FLIGHTDECK_QUICKSTART_ROOT", raising=False) + ws = demo_flow.demo_session( + verify=False, + doctor=False, + qs_dir=None, + promote_reason="pytest demo", + keep_workspace=False, + ) + assert ws is None + From d452f3954aeb27a7375d543ba552330c3b0ccd95 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Mon, 11 May 2026 07:09:05 +0000 Subject: [PATCH 04/30] docs: align web-ui, cli, sdk-integrations, and ops-policy with shipped features - web-ui.md: document refactored DiffPage component tree (DiffVerdictStack, DiffReleaseTwin, DiffPolicyPanel, DiffChangeImpact/DiffPricingExpand, DiffDecisionCard, diffPayload.tsx), URL deep-linking for all pages, OverviewPage focused-release hero (?release= param), ReleaseLifecycleStrip, CopyTextButton, urlSearch.ts helpers, updated routing table, new CSS classes for DiffPage and OverviewPage additions - cli.md: add flightdeck pricing check subcommand (--max-age-days, --fail) with example output and CI usage pattern - pricing-catalog.md: link flightdeck pricing check reference to cli.md - operations-and-policy.md: add schema migration v4 (promotion_requests table); update storage schema table from 7 to 8 tables - sdk-integrations.md: add Module reference section documenting make_run_end_event, temporal_labels, and per-integration public APIs (openai_chat, anthropic_messages, openai_agents, langchain_callback, crewai_bridge) Co-authored-by: Gottam Sai Bharath --- docs/cli.md | 31 ++++++ docs/operations-and-policy.md | 4 +- docs/pricing-catalog.md | 2 +- docs/sdk-integrations.md | 94 ++++++++++++++++ docs/web-ui.md | 203 +++++++++++++++++++++++++++------- 5 files changed, 294 insertions(+), 40 
deletions(-) diff --git a/docs/cli.md b/docs/cli.md index 5591085..efd4186 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -412,6 +412,37 @@ flightdeck pricing show --provider PROVIDER --version VERSION Both flags are required. If the table does not exist, exits 1 with an error message. +### `flightdeck pricing check` + +Check the age of **`flightdeck-bundled-*`** pricing tables in the ledger. Prints one line +per bundled snapshot with its anchor date and approximate age. Non-bundled tables are +ignored. + +```bash +flightdeck pricing check [--max-age-days N] [--fail] +``` + +| Option | Default | Description | +|--------|---------|-------------| +| `--max-age-days` | `90` | Threshold in days. Tables older than this print `STALE` to stderr (and count toward `--fail`). Tables at or under the limit print `OK`. | +| `--fail` | off | Exit 1 if any bundled table exceeds `--max-age-days`. Useful as a CI gate. | + +**Example output:** +``` +OK flightdeck-bundled-2026-05 (~11 days old; max 90) +``` + +If no `flightdeck-bundled-*` tables are in the ledger (e.g. after `flightdeck init --no-bundled-pricing`), +exits 0 and prints `No flightdeck-bundled-* pricing tables in the ledger.` + +Use in CI to surface stale bundled snapshots before they silently affect cost estimates: +```bash +flightdeck pricing check --max-age-days 90 --fail +``` + +See [pricing-catalog.md](pricing-catalog.md) for the bundled snapshot lifecycle and when +to replace with `flightdeck pricing import`. 
+ --- ## `flightdeck policy` diff --git a/docs/operations-and-policy.md b/docs/operations-and-policy.md index 0e53a88..cec11e0 100644 --- a/docs/operations-and-policy.md +++ b/docs/operations-and-policy.md @@ -521,7 +521,7 @@ endpoints (`GET /v1/releases`, `GET /v1/promoted`, `GET /v1/actions`) and intern ## SQLite storage schema -The operations layer reads and writes seven tables (via `src/flightdeck/storage.py`): +The operations layer reads and writes eight tables (via `src/flightdeck/storage.py`): | Table | Purpose | |-------|---------| @@ -532,6 +532,7 @@ The operations layer reads and writes seven tables (via `src/flightdeck/storage. | `active_policy` | Single-row table holding the active `Policy` JSON | | `promoted_releases` | Current promoted pointer per `(agent_id, environment)` | | `release_actions` | Append-only audit ledger; `audit_seq` is monotonically increasing | +| `promotion_requests` | Pending / completed / cancelled approval requests (added in migration v4); used when `promotion_requires_approval: true` | `Storage.migrate()` runs forward-only numbered migrations. `flightdeck doctor` verifies that migrations are applied through `LATEST_SCHEMA_MIGRATION_VERSION` and that @@ -593,6 +594,7 @@ Migrations are numbered and forward-only; they are never reversed. 
| 1 | Initial schema (all base tables via `CREATE TABLE IF NOT EXISTS`) | | 2 | `CREATE INDEX … ON run_events(release_id, timestamp)` — speeds up diff/query | | 3 | `ALTER TABLE release_actions ADD COLUMN audit_seq INTEGER`; backfill existing rows; add unique index | +| 4 | `CREATE TABLE IF NOT EXISTS promotion_requests` — adds the approval request/confirm workflow (columns: `request_id`, `status`, `release_id`, `agent_id`, `environment`, `window`, `reason`, `actor`, `baseline_release_id`, `policy_result_json`, `created_at`, `resolved_at`, `completed_action_id`) | New migrations must increment `LATEST_SCHEMA_MIGRATION_VERSION` in `storage.py` and add a corresponding check in `test_schemas.py` (or `test_doctor.py`). diff --git a/docs/pricing-catalog.md b/docs/pricing-catalog.md index ce150c3..dc00b0c 100644 --- a/docs/pricing-catalog.md +++ b/docs/pricing-catalog.md @@ -24,7 +24,7 @@ your own YAML (and optionally **`--replace`** with **`--reason`**). Bundled table YAML in the wheel includes **comment links** to each provider’s official list-pricing page so you can spot-check rates between FlightDeck releases. -**Staleness guardrails:** list prices change often. Run **`flightdeck pricing check`** to see whether any **`flightdeck-bundled-*`** table in the ledger is older than **`--max-age-days`** (default **90**); pass **`--fail`** for CI. **`flightdeck release diff`** and **`POST /v1/diff`** add **`pricing.warnings`** when baseline or candidate **`pricing_version`** is a stale bundled snapshot so economics do not look authoritative after the snapshot has aged out. +**Staleness guardrails:** list prices change often. Run **`flightdeck pricing check`** to see whether any **`flightdeck-bundled-*`** table in the ledger is older than **`--max-age-days`** (default **90**); pass **`--fail`** for CI. 
**`flightdeck release diff`** and **`POST /v1/diff`** add entries to **`pricing.warnings`** when baseline or candidate **`pricing_version`** is a stale bundled snapshot so economics do not look authoritative after the snapshot has aged out. See [cli.md § flightdeck pricing check](cli.md#flightdeck-pricing-check) for the full option reference. **Maintainer cadence:** the bundled snapshot is **updated on each minor release** when vendor public list pricing changes materially (see **[ROADMAP.md](../ROADMAP.md)**). Operators in production should still treat **`flightdeck pricing import`** as the source of truth. diff --git a/docs/sdk-integrations.md b/docs/sdk-integrations.md index 7c0a329..b0dcfdb 100644 --- a/docs/sdk-integrations.md +++ b/docs/sdk-integrations.md @@ -52,6 +52,100 @@ batch processor. Set **`OTEL_EXPORTER_OTLP_ENDPOINT`** (for example FlightDeck does not auto-instrument **`httpx`** or the Python SDK; create spans in your app or attach upstream auto-instrumentation if you need request-level traces. +## Module reference + +Each submodule under `flightdeck.integrations` has a single responsibility: map +third-party SDK output into a `RunEvent`. Import only the submodule you need. + +### `flightdeck.integrations.common` (no extras required) + +Available as `from flightdeck.integrations import make_run_end_event, temporal_labels`. + +#### `make_run_end_event(**kwargs) -> RunEvent` + +Convenience constructor for a `type=run_end` `RunEvent`. All named parameters map +directly to fields on the v1 wire shape: + +| Parameter | Required | Description | +|-----------|----------|-------------| +| `agent_id` | yes | Stable agent ID | +| `release_id` | yes | Release ID from `flightdeck release register` | +| `run_id` | yes | Unique identifier; duplicates are skipped at ingest | +| `tenant_id` | yes | Tenant scoping dimension | +| `task_id` | yes | Task type dimension | +| `environment` | yes | Deployment environment | +| `provider` | yes | LLM provider (e.g. 
`"openai"`) | +| `model` | yes | Model name (e.g. `"gpt-4o"`) | +| `input_tokens` | yes | Prompt token count | +| `output_tokens` | yes | Completion token count | +| `cached_input_tokens` | no | Cached-prompt token count (default `0`) | +| `latency_ms` | no | End-to-end latency in milliseconds | +| `success` | no | Whether the run succeeded (default `True`) | +| `error_type` | no | Optional error class string | +| `trace_id`, `session_id`, `span_id` | no | Tracing identifiers (stored in `request.*`) | +| `labels` | no | Arbitrary string labels dict | +| `timestamp` | no | Event timestamp (defaults to `datetime.now(UTC)`) | +| `workspace_id` | no | Workspace identifier (default `"ws_local"`) | + +#### `temporal_labels(*, workflow_id, workflow_run_id=None) -> dict[str, str]` + +Returns a `labels` dict with `temporal.workflow_id` (and optionally `temporal.run_id`) +for tagging run events emitted from Temporal workflows. Pass the result as the `labels=` +argument to `make_run_end_event`. + +### `flightdeck.integrations.openai_chat` (no extra needed; `openai` extra for the SDK itself) + +#### `run_event_from_openai_chat_completion(response, *, agent_id, release_id, run_id, tenant_id, task_id, environment, **kwargs) -> RunEvent` + +Constructs a `RunEvent` from an `openai.types.chat.ChatCompletion` response object. +Extracts `model`, `input_tokens`, `output_tokens`, and `cached_input_tokens` from +`response.usage`. Extra `kwargs` are passed to `make_run_end_event` (e.g. `latency_ms`, +`trace_id`). See `examples/integration/adoption/openai_chat/emit_run.py`. + +### `flightdeck.integrations.anthropic_messages` (no extra needed; `anthropic` extra for the SDK itself) + +#### `run_event_from_anthropic_message(message, *, agent_id, release_id, run_id, tenant_id, task_id, environment, **kwargs) -> RunEvent` + +Constructs a `RunEvent` from an `anthropic.types.Message` object. Extracts `model`, +`input_tokens`, `output_tokens`, and `cache_read_input_tokens` from `message.usage`. 
+See `examples/integration/adoption/anthropic_messages/emit_run.py`. + +### `flightdeck.integrations.openai_agents` (`integrations-openai-agents` extra) + +#### `run_event_from_openai_agents_result(result, *, agent_id, release_id, run_id, tenant_id, task_id, environment, **kwargs) -> RunEvent` + +Constructs a `RunEvent` from an OpenAI Agents SDK `RunResult` (or compatible object). +Aggregates token usage across all items in `result.raw_responses`. See +`examples/integration/adoption/openai_agents/emit_run.py`. + +### `flightdeck.integrations.langchain_callback` (`integrations-langchain` extra) + +#### `FlightDeckLangChainCallbackHandler` + +A `BaseCallbackHandler` subclass. Pass an instance to LangChain chains or agents as +`callbacks=[handler]`. On `on_llm_end`, extracts token usage from the LLM result and +appends a `RunEvent` to `handler.events` (a list). After the chain completes, call +`client.ingest_run_events(handler.events)`. Constructor parameters: + +| Parameter | Description | +|-----------|-------------| +| `agent_id` | Stable agent ID | +| `release_id` | Release ID | +| `run_id` | Unique run identifier (used for all events this handler captures) | +| `tenant_id`, `task_id`, `environment` | Standard scoping dimensions | + +See `examples/integration/adoption/langchain/emit_run.py`. + +### `flightdeck.integrations.crewai_bridge` (no extra; install `crewai` in your app env) + +#### `run_event_from_crew_token_totals(input_tokens, output_tokens, *, model, provider, agent_id, release_id, run_id, tenant_id, task_id, environment, **kwargs) -> RunEvent` + +Constructs a `RunEvent` from manually collected CrewAI token totals (no direct dependency +on CrewAI's internal classes). Collect totals from your crew's result callbacks and pass +them here. See `examples/integration/adoption/crewai/emit_totals.py`. + +--- + ## Trust boundaries Anyone who can reach **`POST /v1/events`** can append ledger rows. 
Keep **`flightdeck serve`** diff --git a/docs/web-ui.md b/docs/web-ui.md index 5dd4df4..73e494b 100644 --- a/docs/web-ui.md +++ b/docs/web-ui.md @@ -54,11 +54,11 @@ The app uses **HashRouter** (`react-router-dom`) so all navigation stays within | Hash path | Component | HTTP calls | Notes | |-----------|-----------|-----------|-------| -| `#/` | `OverviewPage` | `GET /v1/releases`, `GET /v1/promoted`, `GET /v1/actions`, `GET /v1/metrics` (parallel where applicable) | Ledger metrics (read-only); short per-counter hints; skeleton on first load; **auto-refresh** every 30s when the tab is visible + on timeline **`generation`** bump; links to Diff/Runs | -| `#/diff` | `DiffPage` | `POST /v1/diff` | Sections: policy gate (incl. `evaluated_at`), evidence window, pricing/catalog/hints (incl. provider/version skew callout when sides differ), per-1k prices when present, cost/quality rollups; raw JSON panel | +| `#/` | `OverviewPage` | `GET /v1/releases`, `GET /v1/promoted`, `GET /v1/actions`, `GET /v1/metrics` | `ReleaseLifecycleStrip` + optional `?release=` hero; promoted table first; releases table with filter row and copy/diff shortcuts; collapsible ledger metrics; **auto-refresh** every 30 s while tab is visible + on timeline **`generation`** bump | +| `#/diff` | `DiffPage` | `POST /v1/diff` | URL params prefill form (`baseline`, `candidate`, `window`, `environment`); result rendered through `DiffVerdictStack` → `DiffReleaseTwin` → `DiffPolicyPanel` → `DiffChangeImpact` (with collapsible `DiffPricingExpand`) → `DiffDecisionCard` + **Continue to promote** link → raw JSON panel | | `#/runs` | `RunsPage` | `GET /v1/releases` (for datalist), `GET /v1/runs`, `GET /v1/runs/export` | Forensics: filters, table (trace/status, trace band rows or **Group by trace_id**), **View** drawer (focus trap, session/span ids), typed **run-query error** card with **Retry**, empty/offset/truncation hints, NDJSON download | | `#/settings` | `SettingsPage` | *(none)* | **Color theme** (Light / 
Dark / System) via `ThemeToggle`; more preferences later. | -| `#/actions` | `ActionsPage` | `GET /v1/workspace`, `GET /v1/promotion-requests` (when `promotion_requires_approval`), `POST /v1/promote` **or** `POST /v1/promote/request` + `POST /v1/promote/confirm`, `POST /v1/rollback` | Workspace skeleton then strip; approval path: numbered steps, pending **Refresh list** / **Use for confirm**; **Rollback** danger-styled; see **ActionsPage** below | +| `#/actions` | `ActionsPage` | `GET /v1/workspace`, `GET /v1/promotion-requests` (when `promotion_requires_approval`), `POST /v1/promote` **or** `POST /v1/promote/request` + `POST /v1/promote/confirm`, `POST /v1/rollback` | URL params prefill form (`release_id`, `environment`, `window`); workspace skeleton then strip; approval path: numbered steps, pending **Refresh list** / **Use for confirm**; **Rollback** danger-styled | | `#/*` (any other) | — | Redirects to `#/` | | `App.tsx` declares the route tree. `AppShell` is the layout wrapper rendered for all routes. @@ -81,7 +81,20 @@ ThemePreferenceProvider (`App.tsx`) ├── aside.fd-sidebar (brand, collapse chevron, primary nav, footer nav → Settings) └── div.fd-shell__content ├── SecurityStatusBar - └── main#main-content → OverviewPage | DiffPage | RunsPage | ActionsPage | SettingsPage + └── main#main-content + ├── OverviewPage + │ ├── ReleaseLifecycleStrip + │ └── focused release hero (when ?release= is set) + ├── DiffPage + │ ├── DiffVerdictStack + │ ├── DiffReleaseTwin + │ ├── DiffPolicyPanel + │ ├── DiffChangeImpact → DiffPricingExpand + │ ├── DiffDecisionCard + │ └── JsonPanel + ├── RunsPage + ├── ActionsPage + └── SettingsPage ``` --- @@ -172,20 +185,50 @@ fail. This is a configuration hint only — the server enforces the actual gate. ## `OverviewPage` (`web/src/pages/OverviewPage.tsx`) -Read-only dashboard. Renders a **Ledger metrics** card from `fetchMetrics()` plus three tables from `loadTimeline()` output: +Read-only dashboard. 
Layout: -| Block | Source | Content | -|-------|--------|---------| -| Ledger metrics | `GET /v1/metrics` | Releases, pricing tables, run events, promoted pointers, and actions totals (plus `actions_by_action` breakdown), `schema_version`, `generated_at` | -| Releases | `GET /v1/releases` | Release ID, Agent, Version, Environment, Checksum, Created | -| Promoted | `GET /v1/promoted` | Agent, Environment, Active release | -| Recent actions | `GET /v1/actions` | When, Action, Policy (PASS/FAIL badge), Release, Environment, Reason | +1. **`ReleaseLifecycleStrip`** — horizontal workflow guide showing the four stages + (Register → Ingest → Diff & policy → Promote & rollback) as linked steps. Each step + links to the relevant page; the Promote step is static (no link) in read-only builds. + Includes a note that deep links prefill forms but do not auto-submit. + +2. **Focused release hero** — when `?release=` is present in the URL, a hero + section appears above the tables. It shows agent, version, environment, abbreviated + release ID (with **Copy ID** button), checksum, and the current promoted baseline for + that agent/environment pair (or a note that no pointer exists). Action buttons link to + Diff, Runs, and Promote with the release, environment, and a default `7d` window + pre-filled. A **Clear focus** button removes the `?release=` param. If the ID does not + match any registered release, a warning is shown instead. + +3. **Promoted releases table** — lists current `(agent_id, environment)` → `release_id` + pointers. Each row has a **View** link to `#/?release=` to focus that release. + +4. **Releases table** — lists all registered releases with Agent, Version, Environment, ID, + Checksum, and Created columns. A **Status** badge shows **Live** (the release matches the + current promoted pointer for that agent/environment) or **Registered**. 
A filter row + (agent substring, environment substring, and Live / Not live / All dropdown) reduces the + table without re-fetching. **Copy** buttons (via `CopyTextButton`) copy the release ID. + Each row has a **Diff** shortcut (links to `#/diff` with baseline = promoted pointer, + candidate = this release, environment and `7d` window pre-filled) and a **Focus** link. + +5. **Recent actions table** — promote/rollback audit rows: When, Action, Policy badge, + Release, Environment, Reason. + +6. **Ledger metrics** — collapsible panel (collapsed by default, toggle via button). Shows + raw counters from `GET /v1/metrics`: releases, pricing tables, run events, promoted + pointers, actions totals + breakdown, `schema_version`, `generated_at`. Long IDs are abbreviated with `shortId(id, keepStart, keepEnd)` and shown in full on hover via the HTML `title` attribute. +**URL params for OverviewPage:** + +| Param | Effect | +|-------|--------| +| `?release=` | Activates the focused release hero. The releases table filter and tables remain visible below. | + **Refresh:** while the document tab is visible, the page **auto-polls** metrics and the -timeline on an interval and uses **silent** fetches after the first load. The `generation` +timeline every 30 s and uses **silent** fetches after the first load. The `generation` counter from `TimelineRefreshContext` triggers an immediate refresh after mutations from `ActionsPage`. @@ -193,14 +236,18 @@ counter from `TimelineRefreshContext` triggers an immediate refresh after mutati ## `DiffPage` (`web/src/pages/DiffPage.tsx`) -Form-based interface for `POST /v1/diff`. Fields mirror the request body: +Form-based interface for `POST /v1/diff`. 
The page reads initial field values from URL +search params and writes them back on each submission, enabling **deep links** that +pre-fill the form: -| Field | Default | Maps to | -|-------|---------|---------| -| Baseline release ID | (empty) | `baseline_release_id` | -| Candidate release ID | (empty) | `candidate_release_id` | -| Window | `7d` | `window` | -| Environment | `local` | `environment` (sent as `null` when empty) | +| URL param | Form field | Default | +|-----------|-----------|---------| +| `baseline` | Baseline release ID | (empty) | +| `candidate` | Candidate release ID | (empty) | +| `window` | Time window | `7d` | +| `environment` | Environment | `local` | + +Example: `#/diff?baseline=rel_abc&candidate=rel_xyz&window=7d&environment=production` `tenant_id` and `task_id` are **not exposed** in the UI form. To run a diff narrowed to a specific tenant or task, use the CLI (`flightdeck release diff --tenant --task `) @@ -209,25 +256,25 @@ or call `POST /v1/diff` directly with the `tenant_id` and `task_id` fields. See [operations-and-policy.md § compute_diff vs. promote_release filter scope](operations-and-policy.md#compute_diff-vs-promote_release--rollback_release-filter-scope) for details on what those filters affect. -On submit, the raw diff response is parsed and rendered as: - -- **Summary card:** policy badge (PASS / FAIL), failure reasons list, sample counts and - confidence label (including `confidence_reason` when present). -- **Pricing table warnings:** when `pricing.warnings` is a non-empty string array, a - `fd-alert--warn` list is shown above the pricing/model-change banner (diagnostic only). -- **Catalog / hints:** when `pricing.catalog` or `pricing.hints` is present, the UI surfaces - catalog enabled state, lines, and hint strings (see [pricing-catalog.md](pricing-catalog.md)). 
-- **Pricing change warning:** when the diff response includes a `pricing` block with - `pricing_or_model_changed: true`, a `fd-alert--warn` banner is shown in the summary - card. It names the baseline and candidate provider/version/model so the user knows the - cost delta includes pricing assumption changes, not just usage changes. When the response - also includes a `pricing.prices` block with all four per-1k token rates present, the - banner additionally shows a **Per-1k token prices** line (baseline → candidate, input and - output separately) so the user can separate tariff moves from token volume changes in the - cost delta. Rates are rendered to six decimal places via `toFixed(6)`. -- **Metric cards:** cost/run (USD), latency avg (ms), error rate — each showing baseline, - candidate, and delta. -- **Raw diff JSON** panel (collapsed by default via `JsonPanel`). +On submit, the response is parsed via helpers in `diffPayload.tsx` and rendered through a +sequence of dedicated components: + +1. **`DiffVerdictStack`** — full-width strip at the top. Shows a **Blocked** banner with the + first policy reason when policy fails, then a **verdict strip** (green PASS / red FAIL + with a short narrative). If the diff response contains no `policy` block, a warning is + shown instead. +2. **`DiffReleaseTwin`** — side-by-side baseline vs candidate IDs, environment, window, and + resolved `provider/version model` lines from each side's pricing block. +3. **`DiffPolicyPanel`** — card showing the policy PASS/FAIL badge, `evaluated_at` + timestamp, and full reasons list. +4. **`DiffChangeImpact`** — card with three sub-sections: + - **Sample coverage** — baseline/candidate run counts and confidence label (with `confidence_reason` when present). + - **Cost and quality rollups** — `DiffMetric` cards for cost/run (USD), latency avg (ms), error rate, each with baseline → candidate and delta. 
+ - **`DiffPricingExpand`** — collapsible pricing & model section (collapsed on each new diff result). Shows baseline vs candidate `provider/version model` inline. Expands to reveal: provider/version skew warning, `pricing.warnings` list, `pricing.hints` list, pricing catalog detail (when enabled), and per-1k token prices (input/output, baseline → candidate) when all four rates are present and pricing changed. +5. **`DiffDecisionCard`** — summarizes the gate outcome in plain English and, when policy + passes and the candidate release ID is known, shows a **Continue to promote** link to + `#/actions` with `release_id`, `environment`, and `window` pre-filled. +6. **Raw diff JSON** panel (`JsonPanel`, collapsed by default). The **Compute diff** button is disabled while the request is in flight (`busy` state). Errors from the API are shown as an inline `fd-alert--error` element. @@ -235,6 +282,23 @@ Errors from the API are shown as an inline `fd-alert--error` element. Note: `POST /v1/diff` is a **read-only computation** and does not require a mutation token. See [http-api.md](http-api.md) for the full response schema. +### Diff component subtree + +``` +DiffPage +├── DiffVerdictStack (full-width verdict/block strip) +├── DiffReleaseTwin (baseline vs candidate identity, env, pricing line) +├── DiffPolicyPanel (policy badge + reasons) +├── DiffChangeImpact (samples, metric rollups, expandable pricing) +│ └── DiffPricingExpand (collapsed; shows per-1k prices, warnings, catalog) +├── DiffDecisionCard (verdict copy + "Continue to promote" link) +└── JsonPanel (raw diff JSON, collapsed by default) +``` + +Shared data extraction: `web/src/components/diff/diffPayload.tsx` exports typed helpers +(`pickPolicy`, `pickPricing`, `pricingLine`, `DiffMetric`) that isolate JSON traversal from +rendering. 
+ --- ## `ActionsPage` (`web/src/pages/ActionsPage.tsx`) @@ -276,6 +340,27 @@ After a successful **promote** or **rollback** (or **confirm**): --- +## `urlSearch.ts` (`web/src/urlSearch.ts`) + +Helpers for hash-router deep-linking. `DiffPage`, `OverviewPage`, `RunsPage`, and +`ActionsPage` all use these to read from and write to `URLSearchParams`: + +| Export | Description | +|--------|-------------| +| `pickTrimmedSearch(searchParams, key)` | Returns `searchParams.get(key)?.trim() ?? ""`. Never returns `null`. | +| `searchParamsFromRecord(rec)` | Builds a `?key=value` string from a `Record`, omitting entries with empty values. Returns `""` when all values are empty. | + +**Deep-link examples:** + +| Page | URL | Effect | +|------|-----|--------| +| Overview | `#/?release=rel_abc123` | Activates focused release hero | +| Diff | `#/diff?baseline=rel_a&candidate=rel_b&window=7d&environment=production` | Pre-fills the diff form | +| Runs | `#/runs?release_id=rel_abc&window=24h&environment=staging` | Pre-fills release and filters | +| Actions | `#/actions?release_id=rel_abc&environment=production&window=7d` | Pre-fills promote/rollback form | + +--- + ## `api.ts` (`web/src/api.ts`) Typed client helpers shared across pages. @@ -371,6 +456,29 @@ Calls `GET /v1/promotion-requests` with optional query parameters. Used by `Acti ## Shared components +### `CopyTextButton` (`web/src/components/CopyTextButton.tsx`) + +Inline button that copies a string to the clipboard. Uses `navigator.clipboard.writeText` +with an `execCommand` fallback for headless or insecure contexts (so Playwright E2E tests +also work). Status cycles through `idle → "Copied" → idle` (2 s) or `idle → "Failed" → +idle` (2.5 s). Props: + +| Prop | Type | Default | Description | +|------|------|---------|-------------| +| `label` | string | — | Accessible label prefix (e.g.
`"Release ID"`) | +| `value` | string | — | String to copy | +| `buttonText` | string | `"Copy"` | Visible button text when idle | +| `className` | string | `"fd-btn fd-btn--ghost fd-copy-btn"` | CSS class | +| `testId` | string | — | Optional `data-testid` for E2E | + +### `ReleaseLifecycleStrip` (`web/src/components/ReleaseLifecycleStrip.tsx`) + +Horizontal `