diff --git a/CHANGELOG.md b/CHANGELOG.md index 03ce390..2b91063 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,7 @@ This project follows [Semantic Versioning](https://semver.org/). From **v1.0.0** ### Added +- **`GET /metrics`:** Prometheus text exposition format scrape endpoint (no new dependencies). Same Bearer / loopback gate as `GET /v1/metrics`. Configure Prometheus with `metrics_path: /metrics` and `bearer_token: ` (when token is set). Metrics: `flightdeck_releases_total`, `flightdeck_run_events_total`, `flightdeck_promoted_pointers_total`, `flightdeck_actions_total`, `flightdeck_actions_by_type{action=...}`, `flightdeck_pricing_tables_total`, `flightdeck_schema_version`. - **Identity passthrough → audit actor:** HTTP mutating routes (`POST /v1/promote*`, `POST /v1/rollback`) now read **`X-FlightDeck-Actor`** (first) and **`X-Forwarded-User`** (second) before falling back to the body `actor` field. Lets a reverse-proxy / SSO layer stamp the audit ledger authoritatively. Trust model documented in **`SECURITY.md`** and **`docs/http-api.md`**. - **`flightdeck workspace info [--json]`** — one-screen snapshot of workspace path, database backend, schema version, ledger counters (releases / promoted / actions / run events), configuration (policy, pricing catalog, approval mode), and webhook count. JSON mode for CI / chatops pipelines. - **`flightdeck version [--json]`** — explicit version subcommand alongside the existing `--version` flag; `--json` emits `{"name":"flightdeck-ai","version":"1.3.0"}` for scripts and dashboards. @@ -236,11 +237,11 @@ This project follows [Semantic Versioning](https://semver.org/). From **v1.0.0** ### Added -- **[docs/cli.md](https://github.com/flightdeckdev/flightdeck/blob/main/docs/cli.md)**: CLI reference (synopsis, flags, exit codes, pointers to quickstart examples). +- **[docs/cli.md](https://flightdeckdev.github.io/flightdeck/cli/)**: CLI reference (synopsis, flags, exit codes, pointers to quickstart examples). 
- **`scripts/quickstart_smoke.py`**: cross-platform quickstart smoke (**no bash**): temp workspace, Python placeholder substitution, **`release verify`**, **`doctor`**. - **CI:** run quickstart smoke on **Ubuntu** and **Windows** matrix jobs (alongside pytest and schema drift). - **Tests:** `tests/test_quickstart_smoke.py` exercises the smoke script. -- **0.8 milestone planning** (CLI + CI): archived under **`docs/reviews/`** in development clones; shipped CLI narrative is **[docs/cli.md](https://github.com/flightdeckdev/flightdeck/blob/main/docs/cli.md)** on the canonical repo; this tree ships **`scripts/quickstart_smoke.py`** / **`flightdeck-quickstart-verify`** (see **Unreleased**). +- **0.8 milestone planning** (CLI + CI): archived under **`docs/reviews/`** in development clones; shipped CLI narrative is **[docs/cli.md](https://flightdeckdev.github.io/flightdeck/cli/)** on the canonical repo; this tree ships **`scripts/quickstart_smoke.py`** / **`flightdeck-quickstart-verify`** (see **Unreleased**). ### Changed diff --git a/docs/cli.md b/docs/cli.md index e531c98..e8d1805 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -8,6 +8,83 @@ For the on-disk formats the CLI reads and writes see [release-artifact.md](release-artifact.md). For the HTTP API exposed by `flightdeck serve` see [http-api.md](http-api.md). +## Common patterns + +Copy-paste recipes for the most frequent workflows. Full flag reference follows below. 
+ +### Pattern 1: Register, ingest, and diff in one shell session + +The core loop from scratch — useful when you already have event JSONL files ready: + +```bash +flightdeck init +BASELINE=$(flightdeck release register ./baseline-release) +CANDIDATE=$(flightdeck release register ./candidate-release) + +# Substitute the placeholder release IDs in your JSONL files, then ingest: +sed "s/__BASELINE_RELEASE_ID__/${BASELINE}/g" baseline-events.jsonl > /tmp/be.jsonl +sed "s/__CANDIDATE_RELEASE_ID__/${CANDIDATE}/g" candidate-events.jsonl > /tmp/ce.jsonl +flightdeck runs ingest /tmp/be.jsonl +flightdeck runs ingest /tmp/ce.jsonl + +flightdeck release diff "$BASELINE" "$CANDIDATE" --window 7d +``` + +### Pattern 2: Policy-gated CI step + +Exit 1 when policy fails — wire this as a blocking CI step: + +```bash +flightdeck release diff "$BASELINE" "$CANDIDATE" --window 7d --fail-on-policy +# Exit 0 → policy passed; Exit 1 → blocked (diff output already printed) +``` + +For JSON output suitable for `jq` parsing in CI: + +```bash +flightdeck release diff "$BASELINE" "$CANDIDATE" --window 7d \ + --fail-on-policy --output json | jq '.policy.passed' +``` + +See `examples/ci/ledger_gate.py` for the complete cross-platform CI pattern and +`examples/ci/github-actions/` for GitHub Actions workflow templates. + +### Pattern 3: Weekly health check + +Run on a schedule to confirm the ledger is intact and pricing tables are not stale: + +```bash +flightdeck doctor +flightdeck pricing check --max-age-days 90 --fail +``` + +`doctor` exits 0 when all schema, pointer, and sequence checks pass. `pricing check` exits 1 +when any bundled table is older than `--max-age-days` and `--fail` is set. Together they make +a clean recurring health probe. 
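The two commands in Pattern 3 could also be driven from a small scheduled script. Below is a minimal Python sketch, assuming only the exit-code behavior described above (0 on success, non-zero on failure). The `run_health_probe` helper, its `runner` parameter, and `HEALTH_COMMANDS` are hypothetical names introduced for illustration; they are not part of FlightDeck.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Commands copied from the Pattern 3 recipe; adjust --max-age-days as needed.
HEALTH_COMMANDS: list[list[str]] = [
    ["flightdeck", "doctor"],
    ["flightdeck", "pricing", "check", "--max-age-days", "90", "--fail"],
]


def _exit_code(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code (hypothetical default runner)."""
    return subprocess.run(list(cmd), check=False).returncode


def run_health_probe(
    commands: Sequence[Sequence[str]] = HEALTH_COMMANDS,
    runner: Callable[[Sequence[str]], int] = _exit_code,
) -> int:
    """Run every check without short-circuiting; return 0 only if all exit 0."""
    failed = [" ".join(cmd) for cmd in commands if runner(cmd) != 0]
    for name in failed:
        print(f"health check failed: {name}", file=sys.stderr)
    return 1 if failed else 0
```

A cron or CI entry point could end with `sys.exit(run_health_probe())`. Running both checks before reporting means a stale pricing table is still reported when `doctor` also fails.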
+ +### Pattern 4: Setting up outbound webhooks for Slack + +Get a Slack notification on every successful promote, successful rollback, or blocked promotion: + +```bash +# Register the webhook (prints the secret once — save it) +flightdeck webhook add \ + --url https://your-adapter.example.com/flightdeck \ + --event promote.succeeded \ + --event rollback.succeeded \ + --event promote.blocked \ + --description "Slack #releases" + +# Test delivery before going live +flightdeck webhook test +``` + +FlightDeck sends HMAC-signed `POST` requests; most chat tools need a thin adapter to reshape +the payload. See [SDK integrations — Slack](sdk-integrations.md#slack) for a ready-made +Cloudflare Worker example. + +--- + ## Global flags | Flag | Description | diff --git a/docs/getting-started.md b/docs/getting-started.md new file mode 100644 index 0000000..81306fb --- /dev/null +++ b/docs/getting-started.md @@ -0,0 +1,386 @@ +# Getting started + +## What the demo showed you + +`flightdeck demo` ran the full FlightDeck loop in a throw-away temp directory. It registered +two agent releases (a baseline and a candidate), ingested one batch of run events into each, +diffed them to compute cost-per-run, latency, and error-rate deltas with a confidence label, +then promoted the baseline to `production`. Nothing left your machine — the workspace was +just a SQLite file in `/tmp`. That is the complete loop. The steps below wire the same loop +to your real agent. + +--- + +## Before you start + +You need a `flightdeck.yaml` workspace config in your working directory. Create one now: + +```bash +pip install flightdeck-ai # skip if already installed +flightdeck init +``` + +`init` writes `flightdeck.yaml`, creates `.flightdeck/flightdeck.db`, and imports bundled +OpenAI / Anthropic / Google pricing tables (`flightdeck-bundled-2026-05`) so you can run +diffs without assembling pricing YAMLs from scratch. 
+ +--- + +## Step 1 — Register your first release + +A **release** is an immutable snapshot of your agent configuration: which model it uses, which +prompts, what pricing reference to apply. Every subsequent diff, promote, and rollback refers +back to these snapshots by ID. + +Create a file called `release.yaml` alongside your agent code: + +```yaml +api_version: v1 +kind: Release +metadata: + name: my-support-agent # human label — shows up in `release list` output + version: "1.0.0" # free-form version string +spec: + agent: + agent_id: my-agent # stable identifier — must match across all releases for the same agent + runtime: + provider: openai + model: gpt-4o-mini # must exist in the pricing table you imported + prompts: + system_ref: prompts/system.md # path relative to the bundle directory + pricing_reference: + provider: openai + pricing_version: flightdeck-bundled-2026-05 # matches the bundled table from `flightdeck init` +``` + +The only truly required fields are `api_version`, `kind`, `metadata.name`, `metadata.version`, +`spec.agent.agent_id`, `spec.runtime.provider`, `spec.runtime.model`, and +`spec.pricing_reference`. Everything else is optional. + +Register it: + +```bash +BASELINE=$(flightdeck release register ./release.yaml) +echo "Baseline release: $BASELINE" +# Baseline release: rel_abc123def456 +``` + +`release register` accepts a single `release.yaml` file or a bundle directory containing +one. The ID it prints (`rel_…`) is what you will pass to ingest, diff, and promote. + +See [release-artifact.md](release-artifact.md) for the full `release.yaml` field reference. + +--- + +## Step 2 — Ingest run events from your agent + +FlightDeck needs runtime evidence — cost, latency, error rate — before it can compute a +meaningful diff. 
Choose the path that fits your agent: + +=== "Python SDK" + + Start `flightdeck serve` first, then emit events directly from your agent process: + + ```bash + flightdeck serve & # starts on 127.0.0.1:8765 + ``` + + ```python + import uuid + from datetime import datetime, timezone + from flightdeck.sdk import FlightdeckClient + from flightdeck.models import RunEvent + + client = FlightdeckClient("http://127.0.0.1:8765") + + # Call once per agent run, right after the LLM responds + event = RunEvent( + timestamp=datetime.now(timezone.utc), + agent_id="my-agent", # must match spec.agent.agent_id in release.yaml + release_id="rel_abc123def456", # from `flightdeck release register` + run_id=str(uuid.uuid4()), # unique per run — duplicates are silently skipped + tenant_id="tenant_a", + task_id="support_ticket", + environment="production", + usage={ + "model": { + "provider": "openai", + "model": "gpt-4o-mini", + "input_tokens": 850, + "output_tokens": 320, + } + }, + metrics={"success": True, "latency_ms": 740}, + ) + client.ingest_run_events([event]) + ``` + + See [Python SDK](sdk.md) for the full client reference. + +=== "HTTP POST /v1/events" + + If you prefer curl or are not using Python: + + ```bash + curl -s -X POST http://127.0.0.1:8765/v1/events \ + -H "Content-Type: application/json" \ + -d '{ + "events": [{ + "timestamp": "2026-05-01T12:00:00Z", + "agent_id": "my-agent", + "release_id": "rel_abc123def456", + "run_id": "run_unique_001", + "tenant_id": "tenant_a", + "task_id": "support_ticket", + "environment": "production", + "usage": { + "model": { + "provider": "openai", + "model": "gpt-4o-mini", + "input_tokens": 850, + "output_tokens": 320 + } + }, + "metrics": {"success": true, "latency_ms": 740} + }] + }' + ``` + + See [HTTP API reference](http-api.md#post-v1events) for the full field list. 
+ +=== "OpenAI integration" + + Wrap an existing `openai.chat.completions.create` call: + + ```python + import uuid + from flightdeck.sdk import FlightdeckClient + from flightdeck.integrations.openai_chat import run_event_from_openai_chat_completion + + client = FlightdeckClient("http://127.0.0.1:8765") + release_id = "rel_abc123def456" # from `flightdeck release register` + + # Your existing OpenAI call (unchanged): + response = openai_client.chat.completions.create( + model="gpt-4o-mini", + messages=[{"role": "user", "content": user_message}], + ) + + # Map the response to a RunEvent and emit it: + event = run_event_from_openai_chat_completion( + response, + agent_id="my-agent", + release_id=release_id, + run_id=str(uuid.uuid4()), + tenant_id="tenant_a", + task_id="support_ticket", + environment="production", + ) + client.ingest_run_events([event]) + ``` + + Install the extra: `pip install 'flightdeck-ai[openai]'`. See + [SDK integrations](sdk-integrations.md) for Anthropic, LangChain, CrewAI, and Temporal. + +--- + +## Step 3 — Run your first real diff + +Once you have events for both a baseline and a candidate release, compare them: + +```bash +flightdeck release diff $BASELINE $CANDIDATE --window 7d +``` + +Example output: + +``` +Window: 7d (2026-04-24T12:00:00+00:00 .. 2026-05-01T12:00:00+00:00) +Filters: env=local tenant=* task=* +Baseline pricing: openai/flightdeck-bundled-2026-05 (model=gpt-4o-mini) +Candidate pricing: openai/flightdeck-bundled-2026-05 (model=gpt-4o-mini) +Samples: baseline=420 candidate=380 +Confidence: MEDIUM + +Estimated model token cost/run (USD): 0.000312 -> 0.000289 (delta -0.000023, -7.37%) +Latency avg (ms): 820.00 -> 756.50 (delta -63.50) +Error rate: 0.0095 -> 0.0071 (delta -0.0024) + +Policy: PASS +``` + +**What "Confidence: LOW" or "Confidence: MEDIUM" means:** FlightDeck compares your event +counts against the thresholds in your active policy (or the workspace defaults: +`min_candidate_runs=500`, `min_baseline_runs=500`). 
Below those thresholds the confidence +degrades to MEDIUM or LOW — the numbers are real but the sample is small. To get to HIGH: + +1. Ingest more events (let the agent run longer). +2. Or lower the thresholds in your policy for a staging environment: + +```yaml +# policy-staging.yaml +policy_id: staging +min_candidate_runs: 50 +min_baseline_runs: 50 +min_low_runs: 0 +require_high_diff_confidence: false +``` + +The `--window` flag controls how far back events are pulled. Use `24h` for a daily gate or +`7d` for a weekly one. See [Operations & policy](operations-and-policy.md) for the full +confidence algorithm. + +--- + +## Step 4 — Set a policy + +A policy sets upper limits on the candidate's cost, latency, and error rate; exceeding any +limit blocks promotion. Copy this to `policy.yaml` and tune the numbers to match your +agent's SLO: + +```yaml +policy_id: prod-v1 +max_cost_per_run_usd: 0.005 # block if candidate costs more than $0.005/run +max_error_rate: 0.02 # block if error rate exceeds 2% +max_latency_ms: 2000 # block if average latency exceeds 2 s +require_high_diff_confidence: true +min_candidate_runs: 200 +min_baseline_runs: 200 +min_low_runs: 20 +``` + +Load it: + +```bash +flightdeck policy set policy.yaml +flightdeck policy show # confirm the active policy +``` + +All `max_*` fields are optional — omit any you do not want to gate on. Only one policy is +active at a time. Setting a new one replaces the previous one. + +See [Operations & policy](operations-and-policy.md#policy-system) for the full policy model +and how all constraints are evaluated simultaneously. + +--- + +## Step 5 — Promote when policy passes + +The first promotion for an agent/environment is unconditional (no baseline exists yet to +diff against). 
After that, every promotion runs the active policy: + +```bash +# First promotion — establishes the baseline pointer +flightdeck release promote $BASELINE --env production --window 7d \ + --reason "initial baseline for v1.0.0" + +# Later: promote a candidate after policy passes +flightdeck release promote $CANDIDATE --env production --window 7d \ + --reason "v1.1.0: latency and cost improvements validated in staging" +``` + +What happens: + +- The currently promoted release becomes the **baseline** for the diff. +- FlightDeck runs policy against the diff. +- On **PASS**: the promoted pointer is updated and an audit record is written. +- On **FAIL**: the attempt is still recorded (intent captured) but the pointer is not moved. + +Check the history afterward: + +```bash +flightdeck release history --agent my-agent --env production +``` + +The audit ledger is append-only. Every attempt — pass or fail — is recorded with timestamp, +actor, reason, and policy outcome. + +--- + +## Next: CI integration + +The `examples/ci/ledger_gate.py` script shows the canonical CI pattern: create a fresh +workspace, register both releases, ingest events, run `release diff --fail-on-policy`, then +clean up. The `--fail-on-policy` flag exits 1 when the diff's policy result is FAIL, which +makes CI block the deployment. GitHub Actions examples live in +`examples/ci/github-actions/`. + +```bash +# The core CI gate in one shell session: +flightdeck init +BASELINE=$(flightdeck release register ./baseline-release) +CANDIDATE=$(flightdeck release register ./candidate-release) +flightdeck runs ingest baseline-events.jsonl +flightdeck runs ingest candidate-events.jsonl +flightdeck release diff $BASELINE $CANDIDATE --window 7d --fail-on-policy +``` + +See the [CLI reference](cli.md#common-patterns) for copy-paste recipes including policy-gated +CI steps and Slack webhook setup. + +--- + +## Next: Web UI + +Run `flightdeck serve` to open the web UI at `http://127.0.0.1:8765/`. 
The UI shows your +registered releases, promoted pointers, diff results, run forensics, and the audit ledger. +The `#/diff` page accepts `baseline`, `candidate`, `window`, and `environment` as URL +parameters so you can share a specific comparison as a link. + +See [Web UI](web-ui.md) for the full page and component reference. + +--- + +## Production checklist + +Before running `flightdeck serve` with real team traffic: + +### Switch to PostgreSQL (recommended for teams) + +SQLite works great for a single developer or CI. For multi-user teams or +anything you'd call production, switch to PostgreSQL: + +```bash +# Install the PostgreSQL extra +pip install "flightdeck-ai[postgres]" + +# Set your connection URL in flightdeck.yaml +# (or via environment variable FLIGHTDECK_DATABASE_URL) +``` + +Add to `flightdeck.yaml`: + +```yaml +database_url: postgresql://user:password@localhost:5432/flightdeck +``` + +Or set the environment variable and omit `database_url` from the YAML: + +```bash +export FLIGHTDECK_DATABASE_URL=postgresql://user:password@host:5432/flightdeck +flightdeck serve +``` + +Schema migrations run automatically on startup — same as SQLite. + +> **Backup:** use `pg_dump` for PostgreSQL. `flightdeck doctor --backup` +> only works for SQLite. Add `pg_dump` to your cron / systemd schedule. + +### Set a Bearer token for remote access + +When `flightdeck serve` is exposed beyond localhost, set a secret: + +```bash +export FLIGHTDECK_LOCAL_API_TOKEN="$(openssl rand -hex 32)" +flightdeck serve --host 0.0.0.0 +``` + +The Python SDK and HTTP clients must then pass `Authorization: Bearer `. +CLI commands running on the same machine still work without it +(loopback bypass stays active). + +### Use a process supervisor + +Run `flightdeck serve` under systemd, supervisor, or as a Docker container +(see `examples/deploy/` for Docker Compose and Fly.io recipes). Configure +a health check against `GET /health` for restart-on-failure. 
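A supervisor health check against `GET /health` can be very small. The sketch below is a minimal Python example, assuming only that a healthy server answers `GET /health` with HTTP 200 on the default `127.0.0.1:8765` address (the response body is not inspected). The `is_healthy` helper is a hypothetical name for illustration, not part of FlightDeck.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://127.0.0.1:8765", timeout: float = 2.0) -> bool:
    """Return True when GET {base_url}/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error responses all count as unhealthy.
        return False
```

A wrapper script can exit non-zero when `is_healthy()` returns `False`, so that systemd or Docker restarts the service.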
diff --git a/docs/http-api.md b/docs/http-api.md index 734eb89..0b6a20e 100644 --- a/docs/http-api.md +++ b/docs/http-api.md @@ -116,6 +116,66 @@ Read-only JSON snapshot of aggregate counts in the local SQLite ledger (releases --- +## `GET /metrics` + +Prometheus text exposition format scrape endpoint. Returns the same ledger counters as `GET /v1/metrics` but formatted as Prometheus `text/plain; version=0.0.4; charset=utf-8` — the standard content-type expected by a Prometheus scrape. + +**Auth:** Same as `GET /v1/metrics` — `Authorization: Bearer ` required when `FLIGHTDECK_LOCAL_API_TOKEN` is set; open when no token is configured (see [Authentication and access control](#authentication-and-access-control)). + +**Content-Type:** `text/plain; version=0.0.4; charset=utf-8` + +**Sample output** + +``` +# generated_at 2026-05-03T12:00:00+00:00 + +# HELP flightdeck_releases_total Registered release artifacts +# TYPE flightdeck_releases_total gauge +flightdeck_releases_total 3 + +# HELP flightdeck_run_events_total Ingested RunEvent records +# TYPE flightdeck_run_events_total gauge +flightdeck_run_events_total 120 + +# HELP flightdeck_promoted_pointers_total Active promotion pointers +# TYPE flightdeck_promoted_pointers_total gauge +flightdeck_promoted_pointers_total 1 + +# HELP flightdeck_actions_total Ledger actions (promote + rollback) +# TYPE flightdeck_actions_total gauge +flightdeck_actions_total 5 + +# HELP flightdeck_actions_by_type Ledger actions broken down by type +# TYPE flightdeck_actions_by_type gauge +flightdeck_actions_by_type{action="promote"} 4 +flightdeck_actions_by_type{action="rollback"} 1 + +# HELP flightdeck_pricing_tables_total Imported pricing tables +# TYPE flightdeck_pricing_tables_total gauge +flightdeck_pricing_tables_total 1 + +# HELP flightdeck_schema_version Current ledger schema migration version +# TYPE flightdeck_schema_version gauge +flightdeck_schema_version 5 +``` + +**Prometheus configuration** + +```yaml +scrape_configs: + - 
job_name: flightdeck + metrics_path: /metrics + static_configs: + - targets: ["127.0.0.1:8765"] + # Include the next two lines only when FLIGHTDECK_LOCAL_API_TOKEN is set: + authorization: + credentials: +``` + +No new runtime dependencies — the Prometheus text format is generated inline. All metric values are valid floats as required by the Prometheus exposition format specification. + +--- + ## `GET /v1/workspace` Read-only flags derived from `flightdeck.yaml` plus the running package version. Used by the web UI and automation to choose **direct promote** vs **request/confirm** without embedding workspace YAML in the client. No secrets and no catalog file contents — only whether a **non-empty** `pricing_catalog_path` is set (`pricing_catalog_configured`). diff --git a/docs/index.md b/docs/index.md index dc7a7dd..a3b0744 100644 --- a/docs/index.md +++ b/docs/index.md @@ -28,6 +28,21 @@ The same contract works from the CLI, the HTTP API (`POST /v1/promote`), and the --- +## Where to start + +| I want to… | Go here | +|---|---| +| Try it in 30 seconds | `pip install flightdeck-ai && flightdeck demo` | +| Wire it to my real agent | [Getting started](getting-started.md) | +| See all CLI commands | [CLI reference](cli.md) | +| Use the HTTP API | [HTTP API](http-api.md) | +| Understand policy gates | [Operations & policy](operations-and-policy.md) | +| Self-host `flightdeck serve` | [HTTP API — starting the server](http-api.md#starting-the-server) | +| Deploy for a team in production | [Getting started → Production checklist](getting-started.md#production-checklist) | +| Something broke | [Troubleshooting](troubleshooting.md) | + +--- + ## Quick reference | Topic | Doc | diff --git a/docs/operations-and-policy.md b/docs/operations-and-policy.md index cec11e0..f25db0c 100644 --- a/docs/operations-and-policy.md +++ b/docs/operations-and-policy.md @@ -8,6 +8,15 @@ For the on-disk formats these operations consume — `release.yaml`, bundle layo algorithm, workspace config, 
and pricing table YAML — see [release-artifact.md](release-artifact.md). +## Storage backends + +| Backend | When to use | How to configure | +|---|---|---| +| SQLite (default) | Local dev, single user, CI | `db_path` in `flightdeck.yaml` — no extra packages | +| PostgreSQL | Teams, production, multi-writer | `database_url: postgresql://...` + `pip install "flightdeck-ai[postgres]"` | + +See [SQLite concurrency and PostgreSQL](#sqlite-concurrency-and-postgresql) below for migration notes and concurrency caveats. + ## Architecture: single operations layer ``` diff --git a/docs/sdk.md b/docs/sdk.md index 08a8f09..47f8f7d 100644 --- a/docs/sdk.md +++ b/docs/sdk.md @@ -12,6 +12,88 @@ For most workflows the CLI is sufficient. Use the SDK when you need to: - drive diff / promote / rollback from Python (CI automation, notebooks) - integrate FlightDeck into an async service +## Quickstart + +A complete end-to-end example: start the server, register two releases, emit events, run a diff, +and promote when policy passes, all from Python in one short script. + +```python +import subprocess, uuid +from datetime import datetime, timezone +from flightdeck.sdk import FlightdeckClient +from flightdeck.models import RunEvent + +# --- 1. Start the server in a subprocess (or run `flightdeck serve` separately) --- +# Assumes `flightdeck init` has already been run in the working directory. +server = subprocess.Popen(["flightdeck", "serve"]) + +# --- 2. Register two releases via the CLI and capture their IDs --- +baseline_id = subprocess.check_output( + ["flightdeck", "release", "register", "./baseline-release"], text=True +).strip() +candidate_id = subprocess.check_output( + ["flightdeck", "release", "register", "./candidate-release"], text=True +).strip() + +# --- 3. Create the SDK client --- +client = FlightdeckClient("http://127.0.0.1:8765") +print(client.health()) # {"status": "ok", ...} + +# --- 4. 
Emit run events for the baseline release --- +events = [ + RunEvent( + timestamp=datetime.now(timezone.utc), + agent_id="my-agent", # must match spec.agent.agent_id in release.yaml + release_id=baseline_id, + run_id=str(uuid.uuid4()), # unique per run; duplicates are silently skipped + tenant_id="tenant_a", + task_id="support_ticket", + environment="production", + usage={ + "model": { + "provider": "openai", + "model": "gpt-4o-mini", + "input_tokens": 900, + "output_tokens": 310, + } + }, + metrics={"success": True, "latency_ms": 820}, + ) + # In production, collect hundreds of events before diffing + for _ in range(5) +] +inserted = client.ingest_run_events(events) +print(f"Inserted {inserted} baseline events") + +# --- 5. Compute a diff --- +diff = client.post_diff( + baseline_release_id=baseline_id, + candidate_release_id=candidate_id, + window="7d", + environment="production", +) +print("Confidence:", diff["samples"]["confidence"]) +print("Policy passed:", diff["policy"]["passed"]) +# Policy reasons (when blocked): diff["policy"]["reasons"] + +# --- 6. Promote if policy passed --- +if diff["policy"]["passed"]: + result = client.post_promote( + release_id=candidate_id, + environment="production", + window="7d", + reason="candidate validated via SDK quickstart", + ) + print("Promoted:", result["promoted_pointer_changed"]) + +client.close() +server.terminate() +``` + +The server-start and CLI register steps are usually done outside Python (e.g. in a Makefile +or CI step). The SDK's job is emitting events and driving diff / promote from within your +agent process or notebook. + ## Installation ```bash diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 1e6afeb..dc020f6 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -4,6 +4,32 @@ This document covers common problems encountered while developing FlightDeck or `flightdeck serve`. 
For the operational runbook (SQLite busy errors, backup/restore, interpreting `doctor` failures) see [operations-and-policy.md](operations-and-policy.md). +## First 5 things to check + +Run through this checklist before reading the detailed sections. Most failures come from +one of these five causes: + +1. **Python version too old.** FlightDeck requires CPython 3.11+. Run `python --version` + or `uv python list`. If you see 3.10 or earlier, install a newer interpreter. + +2. **`flightdeck.yaml` not found.** Almost all CLI commands read `flightdeck.yaml` from + the current working directory. If you see `Workspace config not found: flightdeck.yaml`, + either run `flightdeck init` to create one or `cd` to the directory that contains it. + +3. **`flightdeck` not on PATH.** After `pip install` or `uv sync`, the console script is + only on PATH inside the active venv. Use `uv run flightdeck --help` or + `source .venv/bin/activate` first. + +4. **Missing pricing table.** `flightdeck release diff` needs a pricing table for each + release's `spec.pricing_reference`. If you skipped the pricing import step, run + `flightdeck pricing import ` — or re-run `flightdeck init` to import + the bundled tables automatically. + +5. **`release_id` placeholder not substituted.** If `runs ingest` reports + `Inserted 0 events` on a non-empty file, check whether the `release_id` values in + the file still contain a placeholder like `__BASELINE_RELEASE_ID__` instead of the + real ID from `flightdeck release register`. + --- ## Developer environment diff --git a/examples/deploy/README.md b/examples/deploy/README.md index 025c8c6..a9e5eee 100644 --- a/examples/deploy/README.md +++ b/examples/deploy/README.md @@ -26,6 +26,46 @@ docker compose up --build - **Compose healthcheck:** `docker-compose.yml` probes **`/health`** so orchestrators can mark the service ready (see `healthcheck:` in that file). 
- **Data:** named Docker volume **`fd_workspace`** (SQLite under **`.flightdeck/`** inside the volume). Remove with `docker compose down -v` when you want a clean ledger. +### Using PostgreSQL + +For team or production deployments, add a `postgres` service alongside `flightdeck serve` +and wire the connection URL via `FLIGHTDECK_DATABASE_URL`. Create a +`docker-compose.postgres.yml` override (or extend the existing `docker-compose.yml`): + +```yaml +services: + db: + image: postgres:16-alpine + environment: + POSTGRES_DB: flightdeck + POSTGRES_USER: flightdeck + POSTGRES_PASSWORD: changeme + volumes: + - pg_data:/var/lib/postgresql/data + healthcheck: + test: ["CMD-SHELL", "pg_isready -U flightdeck"] + interval: 5s + retries: 5 + + flightdeck: + build: . + environment: + FLIGHTDECK_DATABASE_URL: postgresql://flightdeck:changeme@db:5432/flightdeck + FLIGHTDECK_LOCAL_API_TOKEN: "${FLIGHTDECK_LOCAL_API_TOKEN}" + depends_on: + db: + condition: service_healthy + ports: + - "8765:8765" + +volumes: + pg_data: +``` + +Install the PostgreSQL extra in the image by adding `pip install "flightdeck-ai[postgres]"` +to the `Dockerfile`, or set it as a build arg. Schema migrations run automatically on +startup. Back up with `pg_dump flightdeck` on your preferred schedule. + ### SQLite backups FlightDeck stores the ledger in **`.flightdeck/flightdeck.db`** under the workspace root. 
For a **hot copy** while the server is stopped or idle, run from the workspace directory: diff --git a/mkdocs.yml b/mkdocs.yml index 92107ac..96cff03 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -19,6 +19,7 @@ validation: nav: - Home: index.md + - Getting started: getting-started.md - CLI: cli.md - HTTP API: http-api.md - Operations & policy: operations-and-policy.md diff --git a/sdk-ts/README.md b/sdk-ts/README.md new file mode 100644 index 0000000..4bd1df3 --- /dev/null +++ b/sdk-ts/README.md @@ -0,0 +1,127 @@ +# flightdeck-ai (Node.js / TypeScript SDK) + +TypeScript/Node.js SDK for [FlightDeck](https://github.com/flightdeckdev/flightdeck) — AI release governance for production agents. + +FlightDeck versions agent releases, ingests runtime evidence (cost/latency/errors), diffs builds, and policy-gates promote/rollback decisions. This SDK lets Node.js and TypeScript agents ingest evidence without writing raw HTTP. + +## Install + +```bash +npm install flightdeck-ai +``` + +Requires Node.js 18+ (uses the built-in `fetch` API). Zero runtime dependencies. 
+ +If you use the OpenAI adapter, install `openai` as a peer dependency: + +```bash +npm install openai +``` + +## Quickstart + +```typescript +import { FlightDeckClient } from "flightdeck-ai"; + +const flightdeck = new FlightDeckClient({ + serverUrl: "http://localhost:7777", // or your deployed FlightDeck URL + apiToken: process.env.FLIGHTDECK_TOKEN, + releaseId: "rel-my-agent-v1.2.0", +}); + +// Ingest a run event after your agent completes a task +await flightdeck.ingestEvent({ + timestamp: new Date().toISOString(), + agent_id: "my-agent", + release_id: "rel-my-agent-v1.2.0", + run_id: crypto.randomUUID(), + tenant_id: "tenant-acme", + task_id: "order-summary-42", + environment: "production", + metrics: { latency_ms: 312, success: true }, + usage: { + model: { + provider: "openai", + model: "gpt-4o", + input_tokens: 850, + output_tokens: 220, + }, + }, +}); + +// Diff two releases and check the policy gate before promoting +const diff = await flightdeck.diff({ + baselineReleaseId: "rel-my-agent-v1.1.0", + candidateReleaseId: "rel-my-agent-v1.2.0", + window: "24h", + environment: "production", +}); + +if (diff.policy.passed) { + console.log("Policy passed — safe to promote!"); + console.log(`Cost delta: ${diff.metrics.delta_cost_per_run_usd.toFixed(6)} USD/run`); +} else { + console.error("Policy blocked promotion:", diff.policy.reasons); +} +``` + +## OpenAI Adapter + +Drop-in wrapper for `openai.chat.completions.create()`. Automatically measures latency, extracts token counts, and ingests a `RunEvent` into FlightDeck without modifying the response your code receives. 
+ +```typescript +import OpenAI from "openai"; +import { FlightDeckClient } from "flightdeck-ai"; +import { FlightDeckOpenAIAdapter } from "flightdeck-ai/adapters/openai"; + +const openai = new OpenAI(); +const flightdeck = new FlightDeckClient({ serverUrl: "http://localhost:7777", releaseId: "rel-v1" }); +const adapter = new FlightDeckOpenAIAdapter(flightdeck, { environment: "production" }); + +// Use exactly like openai.chat.completions.create — the response is unchanged +const completion = await adapter.chatCompletionsCreate(openai, { + model: "gpt-4o", + messages: [{ role: "user", content: "Summarise this order for me." }], +}); +``` + +Ingest errors are swallowed so they never surface to the caller — your agent keeps running even if the FlightDeck server is unreachable. + +## API Reference + +### `FlightDeckClient` + +| Method | Description | +|--------|-------------| +| `ingestEvent(event)` | Ingest a single `RunEvent` via `POST /v1/events` | +| `ingestEvents(events[])` | Ingest a batch of `RunEvent`s in one request | +| `diff(options)` | Diff two releases via `POST /v1/diff`, returns a `DiffResult` | +| `getWorkspace()` | Read workspace metadata via `GET /v1/workspace` | +| `health()` | Health check via `GET /health` | + +All methods retry on `429`, `500`, `502`, `503`, and `504` responses and on network-level failures, with exponential backoff (max 3 attempts). Failed requests throw `FlightDeckError` with `statusCode`, `detail`, and `requestId` fields. 
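As a rough illustration of the retry timing, the sketch below reproduces the backoff formula noted in `sdk-ts/src/client.ts` (200 ms times 2^attempt, with ±10 % jitter, capped at 10 s). The helper names `backoffBounds` and `worstCaseWaitMs` are hypothetical and exist only for this example; the SDK does not export them.

```typescript
// Sketch only: mirrors the backoff formula described in sdk-ts/src/client.ts.
// `backoffBounds` and `worstCaseWaitMs` are illustrative names, not SDK APIs.
const MAX_ATTEMPTS = 3; // assumed to match the client's "max 3 attempts"

interface DelayBounds {
  minMs: number;
  maxMs: number;
}

/** Bounds of the delay slept before retry number `retry` (0-indexed). */
function backoffBounds(retry: number): DelayBounds {
  const base = 200 * Math.pow(2, retry);
  return {
    // Jitter moves the delay by at most 10 % of the base in either direction.
    minMs: Math.min(base * 0.9, 10_000),
    maxMs: Math.min(base * 1.1, 10_000),
  };
}

/** Upper bound on the total time spent sleeping across all retries. */
function worstCaseWaitMs(maxAttempts: number): number {
  let total = 0;
  // The first attempt is immediate; every later attempt sleeps first.
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    total += backoffBounds(retry).maxMs;
  }
  return total;
}

console.log(backoffBounds(0)); // about 180 ms to 220 ms
console.log(backoffBounds(1)); // about 360 ms to 440 ms
console.log(worstCaseWaitMs(MAX_ATTEMPTS)); // about 660 ms
```

Under these assumptions, a call that exhausts all three attempts sleeps for roughly 0.66 s at most, not counting the time spent in the requests themselves, before `FlightDeckError` is thrown.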
+ +### `FlightDeckOpenAIAdapter` + +| Method | Description | +|--------|-------------| +| `chatCompletionsCreate(openai, params)` | Wraps `openai.chat.completions.create()`, ingests a `RunEvent`, returns the original response | + +### Constructor options + +```typescript +new FlightDeckClient({ + serverUrl: string; // FlightDeck server base URL (required) + releaseId: string; // Default release ID for events (required) + apiToken?: string; // Bearer token for authenticated endpoints + actor?: string; // Audit actor label (default: "node-sdk") +}) +``` + +## Full HTTP API reference + +See the [FlightDeck Python docs](https://github.com/flightdeckdev/flightdeck) for the complete HTTP API, CLI reference, release artifact schema, and policy configuration. + +## License + +Apache-2.0 diff --git a/sdk-ts/package.json b/sdk-ts/package.json new file mode 100644 index 0000000..bf972f2 --- /dev/null +++ b/sdk-ts/package.json @@ -0,0 +1,40 @@ +{ + "name": "flightdeck-ai", + "version": "0.1.0", + "description": "TypeScript SDK for FlightDeck — AI release governance for production agents.", + "type": "module", + "main": "./dist/index.cjs", + "module": "./dist/index.js", + "types": "./dist/index.d.ts", + "exports": { + ".": { + "import": "./dist/index.js", + "require": "./dist/index.cjs", + "types": "./dist/index.d.ts" + }, + "./adapters/openai": { + "import": "./dist/adapters/openai.js", + "require": "./dist/adapters/openai.cjs", + "types": "./dist/adapters/openai.d.ts" + } + }, + "scripts": { + "build": "tsup src/index.ts src/adapters/openai.ts --format esm,cjs --dts", + "test": "vitest run", + "typecheck": "tsc --noEmit" + }, + "keywords": ["flightdeck", "ai", "agents", "governance", "llmops"], + "license": "Apache-2.0", + "peerDependencies": { + "openai": ">=4.0.0" + }, + "peerDependenciesMeta": { + "openai": { "optional": true } + }, + "devDependencies": { + "tsup": "^8.0.0", + "typescript": "^5.4.0", + "vitest": "^2.0.0", + "openai": "^4.0.0" + } +} diff --git 
a/sdk-ts/src/adapters/openai.ts b/sdk-ts/src/adapters/openai.ts new file mode 100644 index 0000000..7c9cb62 --- /dev/null +++ b/sdk-ts/src/adapters/openai.ts @@ -0,0 +1,122 @@ +/** + * FlightDeck adapter for the OpenAI Node.js SDK. + * + * This adapter wraps `openai.chat.completions.create()` (non-streaming), + * measures latency, extracts token counts, and ingests a RunEvent into + * FlightDeck — all without modifying the return value seen by the caller. + * + * The `openai` package is an optional peer dependency; the adapter imports it + * as a type-only import so the rest of the SDK remains zero-dependency. + */ + +import type OpenAI from "openai"; +import { FlightDeckClient } from "../client.js"; +import type { RunEvent } from "../types.js"; + +export interface OpenAIAdapterOptions { + /** + * Override the FlightDeck agent_id for events recorded by this adapter. + * Defaults to "openai-agent". + */ + agentId?: string; + /** + * Override the tenant_id recorded on events. + * Defaults to "default". + */ + tenantId?: string; + /** + * Override the task_id recorded on events. + * Defaults to "openai-chat". + */ + taskId?: string; + /** + * Deployment environment (e.g. "production", "staging"). + * Defaults to "production". + */ + environment?: string; +} + +export class FlightDeckOpenAIAdapter { + private readonly client: FlightDeckClient; + private readonly agentId: string; + private readonly tenantId: string; + private readonly taskId: string; + private readonly environment: string; + + constructor(client: FlightDeckClient, options: OpenAIAdapterOptions = {}) { + this.client = client; + this.agentId = options.agentId ?? "openai-agent"; + this.tenantId = options.tenantId ?? "default"; + this.taskId = options.taskId ?? "openai-chat"; + this.environment = options.environment ?? "production"; + } + + /** + * Drop-in wrapper for `openai.chat.completions.create()` (non-streaming). + * + * Behaviour: + * - Calls the OpenAI API exactly once. 
+ * - Records wall-clock latency, token counts, model name, and finish_reason. + * - Ingests a `run_end` RunEvent into FlightDeck (fire-and-forget; errors + * are swallowed so they never surface to the caller). + * - Returns the original `ChatCompletion` object unchanged. + * + * @param openai An initialised `OpenAI` instance. + * @param params Non-streaming chat completion params (same as you'd pass + * to `openai.chat.completions.create` directly). + */ + async chatCompletionsCreate( + openai: OpenAI, + params: OpenAI.Chat.ChatCompletionCreateParamsNonStreaming + ): Promise<OpenAI.Chat.ChatCompletion> { + const startMs = Date.now(); + const runId = crypto.randomUUID(); + const timestamp = new Date(startMs).toISOString(); + + const completion = await openai.chat.completions.create(params); + + const latencyMs = Date.now() - startMs; + + const usage = completion.usage; + const finishReason = completion.choices[0]?.finish_reason; + + const event: RunEvent = { + api_version: "v1", + type: "run_end", + timestamp, + agent_id: this.agentId, + release_id: this.client.defaultReleaseId, + run_id: runId, + tenant_id: this.tenantId, + task_id: this.taskId, + environment: this.environment, + metrics: { + latency_ms: latencyMs, + success: finishReason === "stop", + error_type: finishReason !== "stop" ? `finish_reason:${finishReason}` : null, + }, + usage: { + model: { + provider: "openai", + model: params.model, + input_tokens: usage?.prompt_tokens ?? 0, + output_tokens: usage?.completion_tokens ?? 0, + cached_input_tokens: + (usage as OpenAI.CompletionUsage & { prompt_tokens_details?: { cached_tokens?: number } }) + ?.prompt_tokens_details?.cached_tokens ?? 
0, + }, + tools: [], + }, + labels: { + sdk_adapter: "openai", + }, + }; + + // Fire-and-forget — ingest errors must not surface to the caller + this.client.ingestEvent(event).catch(() => { + // Intentionally swallowed + }); + + return completion; + } +} diff --git a/sdk-ts/src/client.ts b/sdk-ts/src/client.ts new file mode 100644 index 0000000..19c7f86 --- /dev/null +++ b/sdk-ts/src/client.ts @@ -0,0 +1,227 @@ +import { randomUUID } from "node:crypto"; +import type { + DiffResult, + FlightDeckClientOptions, + RunEvent, + WorkspaceInfo, +} from "./types.js"; + +const MAX_RETRIES = 3; +const RETRYABLE_STATUS = new Set([429, 500, 502, 503, 504]); + +/** @internal Delay helper — resolves after `ms` milliseconds. */ +function sleep(ms: number): Promise<void> { + return new Promise((resolve) => setTimeout(resolve, ms)); +} + +/** @internal Compute exponential-backoff delay for attempt `n` (0-indexed). */ +function backoffMs(attempt: number): number { + // 200 ms * 2^attempt with ±10 % jitter, capped at 10 s + const base = 200 * Math.pow(2, attempt); + const jitter = base * 0.1 * (Math.random() * 2 - 1); + return Math.min(base + jitter, 10_000); +} + +export class FlightDeckClient { + private readonly serverUrl: string; + private readonly apiToken: string | undefined; + private readonly releaseId: string; + private readonly actor: string; + + constructor(options: FlightDeckClientOptions) { + // Strip trailing slashes without a quantified regex — CodeQL flags /\/+$/ on + // user-supplied strings as a potential ReDoS vector (CWE-1333). + let url = options.serverUrl; + while (url.endsWith("/")) url = url.slice(0, -1); + this.serverUrl = url; + this.apiToken = options.apiToken; + this.releaseId = options.releaseId; + this.actor = options.actor ?? 
"node-sdk"; + } + + // --------------------------------------------------------------------------- + // Low-level HTTP helpers + // --------------------------------------------------------------------------- + + /** @internal Build the standard request headers. */ + private buildHeaders(requestId: string): Record<string, string> { + const headers: Record<string, string> = { + "Content-Type": "application/json", + "X-FlightDeck-Actor": this.actor, + "X-Request-Id": requestId, + }; + if (this.apiToken) { + headers["Authorization"] = `Bearer ${this.apiToken}`; + } + return headers; + } + + /** + * @internal Make an HTTP request with retry on 429/5xx. + * Throws a FlightDeckError on non-retryable failure or after exhausting retries. + */ + private async request<T>( + method: "GET" | "POST", + path: string, + body?: unknown + ): Promise<T> { + const url = `${this.serverUrl}${path}`; + const requestId = randomUUID(); + const headers = this.buildHeaders(requestId); + + let lastError: FlightDeckError | undefined; + + for (let attempt = 0; attempt < MAX_RETRIES; attempt++) { + if (attempt > 0) { + await sleep(backoffMs(attempt - 1)); + } + + let response: Response; + try { + response = await fetch(url, { + method, + headers, + body: body !== undefined ? 
JSON.stringify(body) : undefined, + }); + } catch (err) { + // Network-level failure — retry + lastError = new FlightDeckError( + `Network error calling ${method} ${path}: ${String(err)}`, + 0, + null, + requestId + ); + continue; + } + + if (response.ok) { + // Parse JSON only if there's a body (204 No Content has none) + if (response.status === 204) { + return undefined as unknown as T; + } + return (await response.json()) as T; + } + + if (RETRYABLE_STATUS.has(response.status)) { + let detail: unknown = null; + try { + detail = await response.json(); + } catch { + // ignore parse failure + } + lastError = new FlightDeckError( + `FlightDeck server returned ${response.status} for ${method} ${path}`, + response.status, + detail, + requestId + ); + continue; + } + + // Non-retryable HTTP error — surface immediately + let detail: unknown = null; + try { + detail = await response.json(); + } catch { + // ignore + } + throw new FlightDeckError( + `FlightDeck server returned ${response.status} for ${method} ${path}`, + response.status, + detail, + requestId + ); + } + + throw ( + lastError ?? + new FlightDeckError(`Failed after ${MAX_RETRIES} attempts`, 0, null, requestId) + ); + } + + // --------------------------------------------------------------------------- + // Public API + // --------------------------------------------------------------------------- + + /** + * Ingest a batch of RunEvents. + * Wraps `POST /v1/events { events: [...] }`. + */ + async ingestEvents(events: RunEvent[]): Promise<void> { + await this.request<{ inserted: number }>("POST", "/v1/events", { events }); + } + + /** + * Convenience wrapper — ingest a single RunEvent. + */ + async ingestEvent(event: RunEvent): Promise<void> { + return this.ingestEvents([event]); + } + + /** + * Run a diff between two releases. + * Wraps `POST /v1/diff`. 
+ */ + async diff(options: { + baselineReleaseId: string; + candidateReleaseId: string; + window?: string; + environment?: string; + tenantId?: string; + taskId?: string; + }): Promise<DiffResult> { + return this.request<DiffResult>("POST", "/v1/diff", { + baseline_release_id: options.baselineReleaseId, + candidate_release_id: options.candidateReleaseId, + window: options.window ?? "24h", + environment: options.environment ?? null, + tenant_id: options.tenantId ?? null, + task_id: options.taskId ?? null, + }); + } + + /** + * Retrieve read-only workspace metadata. + * Wraps `GET /v1/workspace`. + */ + async getWorkspace(): Promise<WorkspaceInfo> { + return this.request<WorkspaceInfo>("GET", "/v1/workspace"); + } + + /** + * Check server health. + * Wraps `GET /health` (unauthenticated on most deployments). + */ + async health(): Promise<{ status: string; mutation_auth: string; read_auth: string }> { + return this.request<{ status: string; mutation_auth: string; read_auth: string }>( + "GET", + "/health" + ); + } + + /** The default releaseId this client was constructed with. */ + get defaultReleaseId(): string { + return this.releaseId; + } +} + +// --------------------------------------------------------------------------- +// Error class +// --------------------------------------------------------------------------- + +export class FlightDeckError extends Error { + /** HTTP status code; 0 for network-level failures. */ + readonly statusCode: number; + /** Parsed response body from the server, if available. */ + readonly detail: unknown; + /** Request-Id sent with the failing request (useful for server-side tracing). 
*/ + readonly requestId: string; + + constructor(message: string, statusCode: number, detail: unknown, requestId: string) { + super(message); + this.name = "FlightDeckError"; + this.statusCode = statusCode; + this.detail = detail; + this.requestId = requestId; + } +} diff --git a/sdk-ts/src/index.ts b/sdk-ts/src/index.ts new file mode 100644 index 0000000..b77fbea --- /dev/null +++ b/sdk-ts/src/index.ts @@ -0,0 +1,46 @@ +// Core client and error +export { FlightDeckClient, FlightDeckError } from "./client.js"; + +// Adapters +export { FlightDeckOpenAIAdapter } from "./adapters/openai.js"; +export type { OpenAIAdapterOptions } from "./adapters/openai.js"; + +// All wire types +export type { + // RunEvent hierarchy + RunEvent, + RunEventRequest, + RunEventMetrics, + RunEventModelUsage, + RunEventToolUsage, + RunEventUsage, + // Release artifact + ReleaseArtifact, + ReleaseMetadata, + ReleaseSpec, + ReleaseSpecAgent, + ReleaseSpecRuntime, + ReleaseSpecPrompts, + ReleaseSpecTools, + ReleaseSpecRouting, + ReleaseSpecRoutingFallback, + ReleaseSpecSafety, + ReleaseSpecSafetyRetryPolicy, + ReleaseSpecSafetyTimeouts, + ReleasePricingReference, + // Diff result + DiffResult, + DiffFilters, + DiffMetrics, + DiffSamples, + DiffPricing, + DiffPricingPrices, + DiffPricingCatalog, + PolicyResult, + // Workspace + WorkspaceInfo, + // Webhook + WebhookPayload, + // Client options + FlightDeckClientOptions, +} from "./types.js"; diff --git a/sdk-ts/src/types.ts b/sdk-ts/src/types.ts new file mode 100644 index 0000000..96c3206 --- /dev/null +++ b/sdk-ts/src/types.ts @@ -0,0 +1,322 @@ +/** + * Wire types for the FlightDeck v1 HTTP API. 
+ * + * Field names and required/optional designations are derived from: + * - schemas/v1/run_event.schema.json + * - src/flightdeck/models.py (RunEvent, RunEventMetrics, RunEventModelUsage, + * RunEventToolUsage, RunEventUsage, RunEventRequest, + * ReleaseArtifact, PolicyResult) + * - src/flightdeck/operations.py (DiffOutcome / diff_outcome_to_public_dict) + */ + +// --------------------------------------------------------------------------- +// RunEvent sub-types +// --------------------------------------------------------------------------- + +/** Distributed-tracing context attached to a run (all optional). */ +export interface RunEventRequest { + session_id?: string | null; + trace_id?: string | null; + span_id?: string | null; +} + +/** Success/latency/error classification for one run. */ +export interface RunEventMetrics { + /** Wall-clock latency of the run in milliseconds. */ + latency_ms?: number | null; + /** Whether the run completed successfully. Defaults to true. */ + success?: boolean; + /** Error classifier string (e.g. "rate_limit", "timeout"). */ + error_type?: string | null; +} + +/** Token usage and pricing context for the model call within a run. */ +export interface RunEventModelUsage { + /** LLM provider identifier (e.g. "openai", "anthropic"). */ + provider: string; + /** Model name as returned by the provider (e.g. "gpt-4o"). */ + model: string; + /** Number of prompt/input tokens consumed. */ + input_tokens: number; + /** Number of completion/output tokens generated. */ + output_tokens: number; + /** Cached/prefix-cache input tokens (default 0). */ + cached_input_tokens?: number; +} + +/** Usage record for a single tool invoked during a run. */ +export interface RunEventToolUsage { + /** Canonical tool name. */ + tool_name: string; + /** Number of times the tool was called (default 0). */ + invocations?: number; + /** Vendor-specific cost units (default 0). 
*/ + cost_units?: number; +} + +/** Aggregate resource usage for a run (model + optional tools). */ +export interface RunEventUsage { + model: RunEventModelUsage; + /** Tool usage records (empty array if no tools were called). */ + tools?: RunEventToolUsage[]; +} + +// --------------------------------------------------------------------------- +// RunEvent — top-level event ingested via POST /v1/events +// --------------------------------------------------------------------------- + +/** + * A single agent run event. Required fields match the JSON schema at + * schemas/v1/run_event.schema.json. + */ +export interface RunEvent { + /** Always "v1". */ + api_version?: "v1"; + /** "run_start" or "run_end" (default "run_end"). */ + type?: "run_start" | "run_end"; + /** ISO-8601 datetime with timezone. */ + timestamp: string; + + /** Workspace identifier (default "ws_local"). */ + workspace_id?: string; + /** Stable agent identifier (matches the release artifact). */ + agent_id: string; + /** Release identifier this run was executed under. */ + release_id: string; + /** Unique identifier for this agent run. */ + run_id: string; + + /** Tenant identifier (for multi-tenant deployments). */ + tenant_id: string; + /** Task/request identifier correlating multiple runs. */ + task_id: string; + /** Deployment environment (e.g. "production", "staging"). */ + environment: string; + + /** Optional distributed-tracing context. */ + request?: RunEventRequest | null; + /** Run outcome metrics. */ + metrics?: RunEventMetrics; + /** Resource usage for the run (required). */ + usage: RunEventUsage; + /** Arbitrary string key-value labels for filtering. 
*/ + labels?: Record<string, string>; +} + +// --------------------------------------------------------------------------- +// ReleaseArtifact metadata +// --------------------------------------------------------------------------- + +export interface ReleaseMetadata { + name: string; + version: string; + description?: string | null; + created_by?: string | null; + created_at?: string | null; +} + +export interface ReleaseSpecAgent { + agent_id: string; + entrypoint?: string | null; +} + +export interface ReleaseSpecRuntime { + provider: string; + model: string; + temperature?: number | null; + max_output_tokens?: number | null; +} + +export interface ReleaseSpecPrompts { + system_ref: string; + template_refs?: string[]; +} + +export interface ReleaseSpecTools { + manifest_ref?: string | null; + tool_names?: string[]; +} + +export interface ReleaseSpecRoutingFallback { + model: string; + on_error?: boolean; +} + +export interface ReleaseSpecRouting { + strategy?: "single_model" | "fallback_model"; + fallback?: ReleaseSpecRoutingFallback | null; +} + +export interface ReleaseSpecSafetyRetryPolicy { + max_retries?: number; + backoff_ms?: number | null; +} + +export interface ReleaseSpecSafetyTimeouts { + model_call?: number | null; + tool_call?: number | null; +} + +export interface ReleaseSpecSafety { + retry_policy?: ReleaseSpecSafetyRetryPolicy; + timeouts_ms?: ReleaseSpecSafetyTimeouts | null; +} + +export interface ReleasePricingReference { + provider: string; + pricing_version: string; +} + +export interface ReleaseSpec { + agent: ReleaseSpecAgent; + runtime: ReleaseSpecRuntime; + prompts: ReleaseSpecPrompts; + tools?: ReleaseSpecTools | null; + routing?: ReleaseSpecRouting | null; + safety?: ReleaseSpecSafety | null; + pricing_reference: ReleasePricingReference; + tags?: Record<string, string>; +} + +/** Versioned release artifact (mirrors models.ReleaseArtifact). 
*/ +export interface ReleaseArtifact { + api_version?: "v1"; + kind?: "Release"; + metadata: ReleaseMetadata; + spec: ReleaseSpec; +} + +// --------------------------------------------------------------------------- +// DiffResult — response from POST /v1/diff +// --------------------------------------------------------------------------- + +/** Policy evaluation result embedded in a DiffResult. */ +export interface PolicyResult { + passed: boolean; + reasons: string[]; + evaluated_at: string; +} + +export interface DiffFilters { + environment: string; + tenant_id: string | null; + task_id: string | null; +} + +export interface DiffPricingCatalog { + enabled: boolean; + catalog_version: string | null; + baseline_slot_id: string | null; + candidate_slot_id: string | null; + baseline_cost_per_run_usd: number | null; + candidate_cost_per_run_usd: number | null; + delta_cost_per_run_usd: number | null; + warnings: string[]; +} + +export interface DiffPricingPrices { + baseline_input_usd_per_1k_tokens: number | null; + baseline_output_usd_per_1k_tokens: number | null; + baseline_cached_input_usd_per_1k_tokens: number | null; + candidate_input_usd_per_1k_tokens: number | null; + candidate_output_usd_per_1k_tokens: number | null; + candidate_cached_input_usd_per_1k_tokens: number | null; +} + +export interface DiffPricing { + baseline_provider: string; + baseline_version: string; + baseline_model: string; + candidate_provider: string; + candidate_version: string; + candidate_model: string; + pricing_or_model_changed: boolean; + prices: DiffPricingPrices; + warnings: string[]; + hints: string[]; + catalog: DiffPricingCatalog; +} + +export interface DiffSamples { + baseline_runs: number; + candidate_runs: number; + /** "HIGH", "LOW", or "INSUFFICIENT" */ + confidence: string; + confidence_reason: string | null; +} + +export interface DiffMetrics { + baseline_cost_per_run_usd: number; + candidate_cost_per_run_usd: number; + delta_cost_per_run_usd: number; + 
delta_cost_per_run_pct: number | null; + baseline_latency_ms_avg: number | null; + candidate_latency_ms_avg: number | null; + delta_latency_ms_avg: number | null; + baseline_error_rate: number; + candidate_error_rate: number; + delta_error_rate: number; +} + +/** + * Full response from POST /v1/diff. + * The `policy` field is the key gate: check `policy.passed` before promoting. + */ +export interface DiffResult { + window: string; + since: string; + until: string; + filters: DiffFilters; + pricing: DiffPricing; + samples: DiffSamples; + metrics: DiffMetrics; + policy: PolicyResult; +} + +// --------------------------------------------------------------------------- +// Workspace +// --------------------------------------------------------------------------- + +/** Response from GET /v1/workspace. */ +export interface WorkspaceInfo { + api_version: "v1"; + kind: "WorkspacePublic"; + promotion_requires_approval: boolean; + pricing_catalog_configured: boolean; + server_version: string; +} + +// --------------------------------------------------------------------------- +// Webhook payload envelope +// --------------------------------------------------------------------------- + +/** Envelope shape for outbound webhook payloads dispatched by FlightDeck. */ +export interface WebhookPayload { + /** Webhook event type (e.g. "promote.succeeded", "rollback.succeeded"). */ + event: string; + /** ISO-8601 timestamp when the event was dispatched. */ + dispatched_at: string; + /** Webhook registration identifier. */ + webhook_id: string; + /** Arbitrary event-specific payload body. */ + data: Record<string, unknown>; +} + +// --------------------------------------------------------------------------- +// Client options +// --------------------------------------------------------------------------- + +/** Options for constructing a FlightDeckClient. */ +export interface FlightDeckClientOptions { + /** Base URL of the FlightDeck server (e.g. "http://localhost:7777"). 
*/ + serverUrl: string; + /** Bearer token for authenticated endpoints. */ + apiToken?: string; + /** Default release ID attached to events when not overridden per-event. */ + releaseId: string; + /** + * Actor identity written to audit logs for mutation requests. + * Defaults to "node-sdk". + */ + actor?: string; +} diff --git a/sdk-ts/tests/client.test.ts b/sdk-ts/tests/client.test.ts new file mode 100644 index 0000000..57ccd80 --- /dev/null +++ b/sdk-ts/tests/client.test.ts @@ -0,0 +1,584 @@ +/** + * Tests for FlightDeckClient and FlightDeckOpenAIAdapter. + * Uses vitest with vi.stubGlobal to mock the native fetch API. + */ + +import { describe, it, expect, vi, beforeEach, afterEach } from "vitest"; +import { FlightDeckClient, FlightDeckError } from "../src/client.js"; +import { FlightDeckOpenAIAdapter } from "../src/adapters/openai.js"; +import type { RunEvent, DiffResult } from "../src/types.js"; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +/** Build a minimal valid RunEvent for tests. */ +function makeRunEvent(overrides: Partial<RunEvent> = {}): RunEvent { + return { + timestamp: "2024-06-01T12:00:00.000Z", + agent_id: "test-agent", + release_id: "rel-abc123", + run_id: "run-001", + tenant_id: "tenant-1", + task_id: "task-42", + environment: "test", + usage: { + model: { + provider: "openai", + model: "gpt-4o", + input_tokens: 100, + output_tokens: 50, + }, + }, + ...overrides, + }; +} + +/** Construct a FlightDeckClient pointed at a fake server. */ +function makeClient(overrides: { apiToken?: string; actor?: string } = {}) { + return new FlightDeckClient({ + serverUrl: "http://localhost:7777", + releaseId: "rel-abc123", + ...overrides, + }); +} + +/** Create a mock fetch Response. 
*/ +function mockResponse( + body: unknown, + status = 200 +): Response { + return new Response(JSON.stringify(body), { + status, + headers: { "Content-Type": "application/json" }, + }); +} + +// --------------------------------------------------------------------------- +// FlightDeckClient — ingestEvent / ingestEvents +// --------------------------------------------------------------------------- + +describe("FlightDeckClient.ingestEvent", () => { + beforeEach(() => { + vi.stubGlobal("fetch", vi.fn()); + }); + + afterEach(() => { + vi.unstubAllGlobals(); + }); + + it("sends POST /v1/events with events array", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse({ inserted: 1 })); + + const client = makeClient(); + const event = makeRunEvent(); + await client.ingestEvent(event); + + expect(mockFetch).toHaveBeenCalledOnce(); + const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit]; + expect(url).toBe("http://localhost:7777/v1/events"); + expect(init.method).toBe("POST"); + + const body = JSON.parse(init.body as string) as { events: RunEvent[] }; + expect(body.events).toHaveLength(1); + expect(body.events[0].agent_id).toBe("test-agent"); + expect(body.events[0].run_id).toBe("run-001"); + }); + + it("serialises all required RunEvent fields", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse({ inserted: 1 })); + + const client = makeClient(); + const event = makeRunEvent({ + metrics: { latency_ms: 250, success: true }, + labels: { region: "eu-west-1" }, + }); + await client.ingestEvent(event); + + const [, init] = mockFetch.mock.calls[0] as [string, RequestInit]; + const body = JSON.parse(init.body as string) as { events: RunEvent[] }; + const sent = body.events[0]; + + expect(sent.tenant_id).toBe("tenant-1"); + expect(sent.task_id).toBe("task-42"); + expect(sent.environment).toBe("test"); + expect(sent.usage.model.input_tokens).toBe(100); + 
expect(sent.usage.model.output_tokens).toBe(50); + expect(sent.metrics?.latency_ms).toBe(250); + expect(sent.labels?.region).toBe("eu-west-1"); + }); + + it("sets Authorization header when apiToken is provided", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse({ inserted: 1 })); + + const client = makeClient({ apiToken: "sk-secret-token" }); + await client.ingestEvent(makeRunEvent()); + + const [, init] = mockFetch.mock.calls[0] as [string, RequestInit]; + const headers = init.headers as Record<string, string>; + expect(headers["Authorization"]).toBe("Bearer sk-secret-token"); + }); + + it("omits Authorization header when apiToken is not provided", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse({ inserted: 1 })); + + const client = makeClient(); + await client.ingestEvent(makeRunEvent()); + + const [, init] = mockFetch.mock.calls[0] as [string, RequestInit]; + const headers = init.headers as Record<string, string>; + expect(headers["Authorization"]).toBeUndefined(); + }); + + it("sets X-FlightDeck-Actor header to 'node-sdk' by default", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse({ inserted: 1 })); + + const client = makeClient(); + await client.ingestEvent(makeRunEvent()); + + const [, init] = mockFetch.mock.calls[0] as [string, RequestInit]; + const headers = init.headers as Record<string, string>; + expect(headers["X-FlightDeck-Actor"]).toBe("node-sdk"); + }); + + it("allows overriding X-FlightDeck-Actor via actor option", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse({ inserted: 1 })); + + const client = makeClient({ actor: "my-service" }); + await client.ingestEvent(makeRunEvent()); + + const [, init] = mockFetch.mock.calls[0] as [string, RequestInit]; + const headers = init.headers as Record<string, string>; + 
expect(headers["X-FlightDeck-Actor"]).toBe("my-service"); + }); + + it("includes X-Request-Id header on 
every request", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse({ inserted: 1 })); + + const client = makeClient(); + await client.ingestEvent(makeRunEvent()); + + const [, init] = mockFetch.mock.calls[0] as [string, RequestInit]; + const headers = init.headers as Record<string, string>; + expect(headers["X-Request-Id"]).toBeTruthy(); + // Should be a UUID-like string + expect(headers["X-Request-Id"]).toMatch( + /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i + ); + }); + + it("sends a batch of events via ingestEvents", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse({ inserted: 3 })); + + const client = makeClient(); + const events = [ + makeRunEvent({ run_id: "run-001" }), + makeRunEvent({ run_id: "run-002" }), + makeRunEvent({ run_id: "run-003" }), + ]; + await client.ingestEvents(events); + + const [, init] = mockFetch.mock.calls[0] as [string, RequestInit]; + const body = JSON.parse(init.body as string) as { events: RunEvent[] }; + expect(body.events).toHaveLength(3); + expect(body.events.map((e) => e.run_id)).toEqual(["run-001", "run-002", "run-003"]); + }); + + it("throws FlightDeckError on 4xx non-retryable response", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce( + mockResponse({ detail: "Unauthorized" }, 401) + ); + + const client = makeClient(); + await expect(client.ingestEvent(makeRunEvent())).rejects.toThrow(FlightDeckError); + }); + + it("retries on 429 and succeeds on third attempt", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch + .mockResolvedValueOnce(mockResponse({ detail: "rate limited" }, 429)) + .mockResolvedValueOnce(mockResponse({ detail: "rate limited" }, 429)) + .mockResolvedValueOnce(mockResponse({ inserted: 1 })); + + // Patch sleep so the test doesn't wait for real backoff + vi.stubGlobal( + "setTimeout", + (fn: () => void) => { fn(); return 0; } + ); + + 
const client = 
makeClient(); + await client.ingestEvent(makeRunEvent()); // should not throw + + expect(mockFetch).toHaveBeenCalledTimes(3); + }); +}); + +// --------------------------------------------------------------------------- +// FlightDeckClient — diff() +// --------------------------------------------------------------------------- + +describe("FlightDeckClient.diff", () => { + beforeEach(() => { + vi.stubGlobal("fetch", vi.fn()); + }); + + afterEach(() => { + vi.unstubAllGlobals(); + }); + + const fakeDiffResult: DiffResult = { + window: "24h", + since: "2024-05-31T12:00:00Z", + until: "2024-06-01T12:00:00Z", + filters: { + environment: "production", + tenant_id: null, + task_id: null, + }, + pricing: { + baseline_provider: "openai", + baseline_version: "2024-01", + baseline_model: "gpt-4o", + candidate_provider: "openai", + candidate_version: "2024-01", + candidate_model: "gpt-4o", + pricing_or_model_changed: false, + prices: { + baseline_input_usd_per_1k_tokens: 0.005, + baseline_output_usd_per_1k_tokens: 0.015, + baseline_cached_input_usd_per_1k_tokens: null, + candidate_input_usd_per_1k_tokens: 0.005, + candidate_output_usd_per_1k_tokens: 0.015, + candidate_cached_input_usd_per_1k_tokens: null, + }, + warnings: [], + hints: [], + catalog: { + enabled: false, + catalog_version: null, + baseline_slot_id: null, + candidate_slot_id: null, + baseline_cost_per_run_usd: null, + candidate_cost_per_run_usd: null, + delta_cost_per_run_usd: null, + warnings: [], + }, + }, + samples: { + baseline_runs: 1200, + candidate_runs: 850, + confidence: "HIGH", + confidence_reason: null, + }, + metrics: { + baseline_cost_per_run_usd: 0.0012, + candidate_cost_per_run_usd: 0.0011, + delta_cost_per_run_usd: -0.0001, + delta_cost_per_run_pct: -8.33, + baseline_latency_ms_avg: 320, + candidate_latency_ms_avg: 295, + delta_latency_ms_avg: -25, + baseline_error_rate: 0.01, + candidate_error_rate: 0.008, + delta_error_rate: -0.002, + }, + policy: { + passed: true, + reasons: [], + 
evaluated_at: "2024-06-01T12:00:00Z", + }, + }; + + it("sends POST /v1/diff with correct body", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse(fakeDiffResult)); + + const client = makeClient(); + await client.diff({ + baselineReleaseId: "rel-v1", + candidateReleaseId: "rel-v2", + window: "48h", + environment: "production", + }); + + const [url, init] = mockFetch.mock.calls[0] as [string, RequestInit]; + expect(url).toBe("http://localhost:7777/v1/diff"); + expect(init.method).toBe("POST"); + + const body = JSON.parse(init.body as string) as Record<string, unknown>; + expect(body.baseline_release_id).toBe("rel-v1"); + expect(body.candidate_release_id).toBe("rel-v2"); + expect(body.window).toBe("48h"); + expect(body.environment).toBe("production"); + }); + + it("defaults window to '24h' when not specified", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse(fakeDiffResult)); + + const client = makeClient(); + await client.diff({ + baselineReleaseId: "rel-v1", + candidateReleaseId: "rel-v2", + }); + + const [, init] = mockFetch.mock.calls[0] as [string, RequestInit]; + const body = JSON.parse(init.body as string) as Record<string, unknown>; + expect(body.window).toBe("24h"); + }); + + it("parses DiffResult and exposes policy.passed", async () => { + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse(fakeDiffResult)); + + const client = makeClient(); + const result = await client.diff({ + baselineReleaseId: "rel-v1", + candidateReleaseId: "rel-v2", + }); + + expect(result.policy.passed).toBe(true); + expect(result.policy.reasons).toHaveLength(0); + expect(result.samples.confidence).toBe("HIGH"); + expect(result.metrics.delta_cost_per_run_usd).toBe(-0.0001); + }); + + it("surfaces policy failure reasons correctly", async () => { + const failedDiff: DiffResult = { + ...fakeDiffResult, + policy: { + passed: false, + reasons: ["candidate error_rate 0.0500 exceeds max 0.0200"], 
+ evaluated_at: "2024-06-01T12:00:00Z", + }, + }; + const mockFetch = vi.mocked(fetch); + mockFetch.mockResolvedValueOnce(mockResponse(failedDiff)); + + const client = makeClient(); + const result = await client.diff({ + baselineReleaseId: "rel-v1", + candidateReleaseId: "rel-v2", + }); + + expect(result.policy.passed).toBe(false); + expect(result.policy.reasons[0]).toContain("error_rate"); + }); +}); + +// --------------------------------------------------------------------------- +// FlightDeckOpenAIAdapter +// --------------------------------------------------------------------------- + +describe("FlightDeckOpenAIAdapter", () => { + beforeEach(() => { + vi.stubGlobal("fetch", vi.fn()); + }); + + afterEach(() => { + vi.unstubAllGlobals(); + }); + + /** Minimal mock OpenAI client that resolves immediately. */ + function makeOpenAIMock(overrides: { + prompt_tokens?: number; + completion_tokens?: number; + finish_reason?: string; + model?: string; + cached_tokens?: number; + } = {}) { + const { + prompt_tokens = 200, + completion_tokens = 80, + finish_reason = "stop", + model = "gpt-4o", + cached_tokens = 0, + } = overrides; + + const completion = { + id: "chatcmpl-test123", + object: "chat.completion", + created: Math.floor(Date.now() / 1000), + model, + choices: [ + { + index: 0, + message: { role: "assistant", content: "Hello!" 
}, + finish_reason, + }, + ], + usage: { + prompt_tokens, + completion_tokens, + total_tokens: prompt_tokens + completion_tokens, + prompt_tokens_details: { cached_tokens }, + }, + }; + + return { + chat: { + completions: { + create: vi.fn().mockResolvedValue(completion), + }, + }, + }; + } + + it("returns the original OpenAI response unchanged", async () => { + // Suppress the fire-and-forget ingest call + vi.mocked(fetch).mockResolvedValue(mockResponse({ inserted: 1 })); + + const client = makeClient({ apiToken: "tok" }); + const adapter = new FlightDeckOpenAIAdapter(client); + const openai = makeOpenAIMock(); + + const result = await adapter.chatCompletionsCreate( + openai as unknown as import("openai").default, + { model: "gpt-4o", messages: [{ role: "user", content: "Hi" }] } + ); + + expect(result.choices[0].finish_reason).toBe("stop"); + expect(result.model).toBe("gpt-4o"); + }); + + it("records input_tokens and output_tokens from OpenAI usage", async () => { + vi.mocked(fetch).mockResolvedValue(mockResponse({ inserted: 1 })); + + const client = makeClient(); + const adapter = new FlightDeckOpenAIAdapter(client); + const openai = makeOpenAIMock({ prompt_tokens: 150, completion_tokens: 60 }); + + await adapter.chatCompletionsCreate( + openai as unknown as import("openai").default, + { model: "gpt-4o", messages: [{ role: "user", content: "Hi" }] } + ); + + // Give the fire-and-forget promise a tick to settle + await new Promise((r) => setTimeout(r, 0)); + + expect(fetch).toHaveBeenCalledOnce(); + const [, init] = vi.mocked(fetch).mock.calls[0] as [string, RequestInit]; + const body = JSON.parse(init.body as string) as { events: RunEvent[] }; + const modelUsage = body.events[0].usage.model; + + expect(modelUsage.input_tokens).toBe(150); + expect(modelUsage.output_tokens).toBe(60); + expect(modelUsage.provider).toBe("openai"); + expect(modelUsage.model).toBe("gpt-4o"); + }); + + it("records latency_ms > 0", async () => { + 
vi.mocked(fetch).mockResolvedValue(mockResponse({ inserted: 1 })); + + const client = makeClient(); + const adapter = new FlightDeckOpenAIAdapter(client); + + // Delay the OpenAI mock by 10ms so latency is measurable + const openai = { + chat: { + completions: { + create: vi.fn().mockImplementation( + () => + new Promise((resolve) => + setTimeout( + () => + resolve({ + id: "chatcmpl-x", + object: "chat.completion", + created: Math.floor(Date.now() / 1000), + model: "gpt-4o", + choices: [{ index: 0, message: { role: "assistant", content: "Hi" }, finish_reason: "stop" }], + usage: { prompt_tokens: 10, completion_tokens: 5, total_tokens: 15 }, + }), + 10 + ) + ) + ), + }, + }, + }; + + await adapter.chatCompletionsCreate( + openai as unknown as import("openai").default, + { model: "gpt-4o", messages: [{ role: "user", content: "Hi" }] } + ); + + await new Promise((r) => setTimeout(r, 0)); + + const [, init] = vi.mocked(fetch).mock.calls[0] as [string, RequestInit]; + const body = JSON.parse(init.body as string) as { events: RunEvent[] }; + const latency = body.events[0].metrics?.latency_ms; + + expect(latency).toBeGreaterThan(0); + }); + + it("marks success=false when finish_reason is not 'stop'", async () => { + vi.mocked(fetch).mockResolvedValue(mockResponse({ inserted: 1 })); + + const client = makeClient(); + const adapter = new FlightDeckOpenAIAdapter(client); + const openai = makeOpenAIMock({ finish_reason: "length" }); + + await adapter.chatCompletionsCreate( + openai as unknown as import("openai").default, + { model: "gpt-4o", messages: [{ role: "user", content: "Hi" }] } + ); + + await new Promise((r) => setTimeout(r, 0)); + + const [, init] = vi.mocked(fetch).mock.calls[0] as [string, RequestInit]; + const body = JSON.parse(init.body as string) as { events: RunEvent[] }; + const metrics = body.events[0].metrics; + + expect(metrics?.success).toBe(false); + expect(metrics?.error_type).toBe("finish_reason:length"); + }); + + it("extracts cached_input_tokens from 
prompt_tokens_details", async () => { + vi.mocked(fetch).mockResolvedValue(mockResponse({ inserted: 1 })); + + const client = makeClient(); + const adapter = new FlightDeckOpenAIAdapter(client); + const openai = makeOpenAIMock({ cached_tokens: 50 }); + + await adapter.chatCompletionsCreate( + openai as unknown as import("openai").default, + { model: "gpt-4o", messages: [{ role: "user", content: "Hi" }] } + ); + + await new Promise((r) => setTimeout(r, 0)); + + const [, init] = vi.mocked(fetch).mock.calls[0] as [string, RequestInit]; + const body = JSON.parse(init.body as string) as { events: RunEvent[] }; + expect(body.events[0].usage.model.cached_input_tokens).toBe(50); + }); + + it("does not throw when FlightDeck ingest fails (fire-and-forget)", async () => { + // Simulate a server error — adapter should not surface this to the caller + vi.mocked(fetch).mockResolvedValue(mockResponse({ detail: "server error" }, 500)); + + // Also mock additional retries as 500 + vi.mocked(fetch).mockResolvedValue(mockResponse({ detail: "server error" }, 500)); + + const client = makeClient(); + const adapter = new FlightDeckOpenAIAdapter(client); + const openai = makeOpenAIMock(); + + // Should resolve without throwing despite ingest failure + await expect( + adapter.chatCompletionsCreate( + openai as unknown as import("openai").default, + { model: "gpt-4o", messages: [{ role: "user", content: "Hi" }] } + ) + ).resolves.toBeDefined(); + }); +}); diff --git a/sdk-ts/tsconfig.json b/sdk-ts/tsconfig.json new file mode 100644 index 0000000..95560ec --- /dev/null +++ b/sdk-ts/tsconfig.json @@ -0,0 +1,13 @@ +{ + "compilerOptions": { + "target": "ES2020", + "module": "NodeNext", + "moduleResolution": "NodeNext", + "strict": true, + "declaration": true, + "outDir": "dist", + "rootDir": "src", + "skipLibCheck": true + }, + "include": ["src"] +} diff --git a/src/flightdeck/server/routes/metrics.py b/src/flightdeck/server/routes/metrics.py index 5bc28bf..0c90b86 100644 --- 
a/src/flightdeck/server/routes/metrics.py +++ b/src/flightdeck/server/routes/metrics.py @@ -1,6 +1,7 @@ from __future__ import annotations from fastapi import APIRouter, Depends, Request +from fastapi.responses import PlainTextResponse from flightdeck.models import utc_now from flightdeck.server.mutation_access import require_protected_read_access @@ -19,3 +20,82 @@ def get_metrics(request: Request) -> dict[str, object]: "schema_version": LATEST_SCHEMA_MIGRATION_VERSION, "generated_at": utc_now().isoformat(), } + + +def _to_prometheus_text(counters: dict, schema_version: int) -> str: + """Format ledger counters as Prometheus text exposition format (version 0.0.4).""" + lines: list[str] = [] + + def _metric(name: str, description: str, value: int | float, labels: dict[str, str] | None = None) -> None: + lines.append(f"# HELP {name} {description}") + lines.append(f"# TYPE {name} gauge") + if labels: + label_str = ",".join(f'{k}="{v}"' for k, v in labels.items()) + lines.append(f"{name}{{{label_str}}} {value}") + else: + lines.append(f"{name} {value}") + lines.append("") + + _metric( + "flightdeck_releases_total", + "Registered release artifacts", + counters.get("releases_total", 0), + ) + _metric( + "flightdeck_run_events_total", + "Ingested RunEvent records", + counters.get("run_events_total", 0), + ) + _metric( + "flightdeck_promoted_pointers_total", + "Active promotion pointers", + counters.get("promoted_pointers_total", 0), + ) + _metric( + "flightdeck_actions_total", + "Ledger actions (promote + rollback)", + counters.get("actions_total", 0), + ) + + # actions_by_type — one labeled series per action type. + # Skip the block entirely when empty: the Prometheus 0.0.4 spec forbids + # # HELP / # TYPE headers with zero time-series beneath them. 
+ actions_by_action: dict[str, int] = counters.get("actions_by_action", {}) + if actions_by_action: + lines.append("# HELP flightdeck_actions_by_type Ledger actions broken down by type") + lines.append("# TYPE flightdeck_actions_by_type gauge") + for action_name, action_count in actions_by_action.items(): + lines.append(f'flightdeck_actions_by_type{{action="{action_name}"}} {action_count}') + lines.append("") + + _metric( + "flightdeck_pricing_tables_total", + "Imported pricing tables", + counters.get("pricing_tables_total", 0), + ) + _metric( + "flightdeck_schema_version", + "Current ledger schema migration version", + schema_version, + ) + + return "\n".join(lines) + + +@router.get("/metrics") +def get_prometheus_metrics(request: Request) -> PlainTextResponse: + """Prometheus text exposition format scrape endpoint (no new dependencies). + + Returns metrics in ``text/plain; version=0.0.4; charset=utf-8`` as required by the + Prometheus scrape protocol. Same Bearer / loopback gate as ``GET /v1/metrics``. + Configure Prometheus with ``metrics_path: /metrics`` and + ``bearer_token: `` when a token is set. 
+ """ + _, storage = ensure_app_state(request) + counters = storage.get_ledger_counters() + generated_at = utc_now().isoformat() + body = f"# generated_at {generated_at}\n\n" + _to_prometheus_text(counters, LATEST_SCHEMA_MIGRATION_VERSION) + return PlainTextResponse( + content=body, + media_type="text/plain; version=0.0.4; charset=utf-8", + ) diff --git a/tests/test_prometheus_metrics.py b/tests/test_prometheus_metrics.py new file mode 100644 index 0000000..8deca24 --- /dev/null +++ b/tests/test_prometheus_metrics.py @@ -0,0 +1,108 @@ +from __future__ import annotations + +from pathlib import Path + +import pytest +from click.testing import CliRunner +from fastapi.testclient import TestClient + +from flightdeck.cli.main import cli +from flightdeck.server.app import create_app +from flightdeck.storage import LATEST_SCHEMA_MIGRATION_VERSION + + +def test_prometheus_metrics_200_no_token(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + """GET /metrics returns 200 with no token set (open read access).""" + monkeypatch.delenv("FLIGHTDECK_LOCAL_API_TOKEN", raising=False) + monkeypatch.chdir(tmp_path) + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 + with TestClient(create_app()) as client: + r = client.get("/metrics") + assert r.status_code == 200 + + +def test_prometheus_metrics_content_type(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + """GET /metrics content-type starts with 'text/plain; version=0.0.4'.""" + monkeypatch.delenv("FLIGHTDECK_LOCAL_API_TOKEN", raising=False) + monkeypatch.chdir(tmp_path) + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 + with TestClient(create_app()) as client: + r = client.get("/metrics") + assert r.status_code == 200 + assert r.headers.get("content-type", "").startswith("text/plain; version=0.0.4") + + +def test_prometheus_metrics_contains_releases_total(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + """Response body contains 
flightdeck_releases_total.""" + monkeypatch.delenv("FLIGHTDECK_LOCAL_API_TOKEN", raising=False) + monkeypatch.chdir(tmp_path) + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 + with TestClient(create_app()) as client: + r = client.get("/metrics") + assert "flightdeck_releases_total" in r.text + + +def test_prometheus_metrics_contains_schema_version(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + """Response body contains flightdeck_schema_version.""" + monkeypatch.delenv("FLIGHTDECK_LOCAL_API_TOKEN", raising=False) + monkeypatch.chdir(tmp_path) + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 + with TestClient(create_app()) as client: + r = client.get("/metrics") + assert "flightdeck_schema_version" in r.text + + +def test_prometheus_metrics_values_are_parseable_floats(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + """All metric value tokens in the response are parseable as floats (Prometheus requirement).""" + monkeypatch.delenv("FLIGHTDECK_LOCAL_API_TOKEN", raising=False) + monkeypatch.chdir(tmp_path) + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 + with TestClient(create_app()) as client: + r = client.get("/metrics") + assert r.status_code == 200 + for line in r.text.splitlines(): + line = line.strip() + if not line or line.startswith("#"): + continue + # Last token on a metric line is the value + value_token = line.rsplit(" ", 1)[-1] + float(value_token) # raises ValueError if not parseable + + +def test_prometheus_metrics_schema_version_value(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + """flightdeck_schema_version value matches LATEST_SCHEMA_MIGRATION_VERSION.""" + monkeypatch.delenv("FLIGHTDECK_LOCAL_API_TOKEN", raising=False) + monkeypatch.chdir(tmp_path) + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 + with TestClient(create_app()) as client: + r = client.get("/metrics") + assert 
r.status_code == 200 + for line in r.text.splitlines(): + if line.startswith("flightdeck_schema_version "): + value = float(line.split()[-1]) + assert value == float(LATEST_SCHEMA_MIGRATION_VERSION) + break + else: + pytest.fail("flightdeck_schema_version metric line not found") + + +def test_prometheus_metrics_401_without_bearer_when_token_set(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + """GET /metrics returns 401 when token is set and no Bearer is provided.""" + monkeypatch.setenv("FLIGHTDECK_LOCAL_API_TOKEN", "prom-read-gate") + monkeypatch.chdir(tmp_path) + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 + with TestClient(create_app()) as client: + r = client.get("/metrics") + assert r.status_code == 401 + assert "read route" in r.json()["detail"] + + +def test_prometheus_metrics_200_with_bearer_when_token_set(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: + """GET /metrics returns 200 when correct Bearer token is provided.""" + monkeypatch.setenv("FLIGHTDECK_LOCAL_API_TOKEN", "prom-read-ok") + monkeypatch.chdir(tmp_path) + assert CliRunner().invoke(cli, ["init", "--no-bundled-pricing"]).exit_code == 0 + with TestClient(create_app()) as client: + r = client.get("/metrics", headers={"Authorization": "Bearer prom-read-ok"}) + assert r.status_code == 200 + assert "flightdeck_releases_total" in r.text diff --git a/uv.lock b/uv.lock index 413a4f0..29b00bd 100644 --- a/uv.lock +++ b/uv.lock @@ -463,7 +463,7 @@ wheels = [ [[package]] name = "flightdeck-ai" -version = "1.2.0" +version = "1.3.0" source = { editable = "." } dependencies = [ { name = "aiosqlite" },