FlightLog

FlightLog is an agent observability and CI regression layer for browser and AI agents. It logs traces, model calls, tool calls, browser actions, observations, artifacts, replay timelines, and evaluations so you can inspect what an autonomous agent did and gate changes with scored regression tests in GitHub Checks.

Setup

bun install
bun run db:up
bun run db:push
bun run check
bun run lint
bun run test

Database (local vs production)

Use Docker Postgres locally and Neon in production. Both use the same DATABASE_URL variable; the app picks the driver from the hostname.

Start local Postgres: bun run db:up (or docker compose up -d postgres).
Set DATABASE_URL in .env to the local URL (see .env.example).
Apply schema: bun run db:push.
On Vercel (or other production), set DATABASE_URL to your Neon connection string.

drizzle-kit uses the pg driver when pg is installed (required for local db:push). The Neon serverless driver is only used at runtime when the URL is not localhost. Schema changes belong in src/lib/server/db/schema.ts; apply them with bun run db:push.

Required environment:

DATABASE_URL=
FLIGHTLOG_KEYS_SECRET=
PUBLIC_CLERK_PUBLISHABLE_KEY=
CLERK_SECRET_KEY=

Optional LLM evaluation environment:

OPENAI_API_KEY=
OPENAI_AGENT_MODEL=computer-use-preview
OPENAI_EVAL_MODEL=
BROWSERBASE_API_KEY=
BROWSERBASE_PROJECT_ID=
FLIGHTLOG_AGENT_MAX_STEPS=20
FLIGHTLOG_AGENT_APPROVAL_TIMEOUT_SECONDS=300

Optional ChatGPT subscription OAuth (dashboard agent runs). ChatGPT credentials call OpenAI’s Codex Responses backend (chatgpt.com/backend-api/codex), not api.openai.com, because subscription OAuth tokens do not include the api.responses.write scope required on the platform API. Reconnect ChatGPT under Runs → Keys after upgrading if runs still fail with a scope error. ChatGPT sign-in supports tool-agent runs with Codex models (gpt-5.3-codex, etc.); browser runs still require a platform API key for computer-use.

# Defaults to the public Codex CLI client.
OPENAI_OAUTH_CLIENT_ID=
# Set only when registered with OpenAI; enables browser redirect sign-in.
# Without this, ChatGPT sign-in uses the device-code flow (Codex public client).
OPENAI_OAUTH_REDIRECT_URI=
# auto | on | off — auto uses device flow unless OPENAI_OAUTH_REDIRECT_URI is set
OPENAI_OAUTH_DEVICE_AUTH=auto

Optional GitHub App environment for PR regression checks:

GITHUB_APP_ID=
GITHUB_APP_PRIVATE_KEY=
GITHUB_WEBHOOK_SECRET=
PUBLIC_APP_URL=http://localhost:5173

FLIGHTLOG_KEYS_SECRET is required when saving or using dashboard-entered provider API keys. The dashboard stores only encrypted provider keys and returns masked previews to the browser. Clerk is required for dashboard access and API ingestion. Existing unowned local runs and provider credentials remain unowned until they are assigned through an explicit backfill.

LLM evaluations call api.openai.com with OPENAI_API_KEY when configured. Without it, evaluations still complete using deterministic rule checks. POST /api/runs/:id/evaluate accepts optional constraints only.

Browser-mode UI agent runs use a saved OpenAI credential (ChatGPT subscription or platform API key) plus a Browserbase credential. Tool-agent dashboard runs use the encrypted OpenAI or Anthropic credential selected in the UI.

UI Agent Runs

Sign in with Clerk, open /runs, connect ChatGPT (subscription) or save a platform API key for OpenAI/Anthropic, choose a run mode, provider, framework, model, and tools, then start a run. FlightLog creates a user-owned run, opens the run detail page, and streams events and spans live over Server-Sent Events.

Tool-agent runs use a curated local tool registry:

calculator.evaluate
web.fetchText
web.searchMock
time.now

Browser runs still use OpenAI computer use through Playwright CDP against a Browserbase Chromium session.

During a live run, FlightLog logs:

the user goal
browser actions requested by the model
concise reasoning summaries when provided
observations and screenshots after browser actions
approval decisions
final result or failure details

Screenshots are stored as screenshot artifacts and shown in the replay panel as they arrive.

Tool-agent runs log:

model call spans
tool call spans
tool inputs, outputs, and failures
final result text
provider, framework, model, and run mode metadata

Approvals

The runner pauses when OpenAI returns pending safety checks or when FlightLog detects selected risky actions. The run detail page shows an approval panel with the latest screenshot, action payload, and Approve/Reject controls. Approving resumes the run with acknowledged safety checks. Rejecting marks the run cancelled.

API Example

Create a run:

curl -X POST http://localhost:5173/api/runs \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer CLERK_TOKEN' \
  -d '{"goal":"Find product pricing without buying anything","name":"Pricing check"}'

Append an event:

curl -X POST http://localhost:5173/api/runs/run_id/events \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer CLERK_TOKEN' \
  -d '{"type":"tool_call","message":"Search pricing page","data":{"tool":"browser.search","input":{"query":"pricing"}}}'

Attach an artifact:

curl -X POST http://localhost:5173/api/runs/run_id/artifacts \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer CLERK_TOKEN' \
  -d '{"type":"text","name":"observation","content":"Pricing page loaded"}'

Finish and evaluate:

curl -X PATCH http://localhost:5173/api/runs/run_id \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer CLERK_TOKEN' \
  -d '{"status":"success"}'

curl -X POST http://localhost:5173/api/runs/run_id/evaluate \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer CLERK_TOKEN' \
  -d '{"constraints":["Do not place a real order"]}'

SDK Example

import { FlightLogClient } from '$lib/sdk/client';

const flightlog = new FlightLogClient({
	endpoint: 'http://localhost:5173',
	apiKey: clerkToken
});
const run = await flightlog.startRun({
	goal: 'Find product pricing without buying anything',
	name: 'Pricing check',
	agentName: 'browser-agent'
});

await run.logPlan('Open the pricing page and inspect visible prices.');
await run.logToolCall({ tool: 'browser.search', input: { query: 'pricing' } });
await run.logObservation('Pricing page loaded.');
await run.finish({ status: 'success' });
await run.evaluate({ constraints: ['Do not place a real order'] });

Trace API Example

The versioned trace API is the model/framework-agnostic surface. It keeps browser actions first-class for replay, but represents model calls and tool calls as spans with lifecycle events.

import { FlightLogClient, openAIAttributes } from '$lib';

const flightlog = new FlightLogClient({
	endpoint: 'http://localhost:5173',
	apiKey: clerkToken
});
const trace = await flightlog.startTrace({
	goal: 'Look up the customer and draft a refund response',
	name: 'Refund agent',
	agentName: 'support-agent',
	metadata: { framework: 'custom-loop' }
});

const model = await trace.logModelCall({
	name: 'plan next action',
	input: { messages: [{ role: 'user', content: 'Refund request' }] },
	attributes: openAIAttributes({ model: 'gpt-4.1' })
});
await model.complete({ tool: 'crm.lookupCustomer' });

const tool = await trace.logToolCall({
	name: 'crm.lookupCustomer',
	input: { email: 'customer@example.com' }
});
await tool.complete({ customerId: 'cus_123' });

await trace.finish({ status: 'success' });
await trace.evaluate({ constraints: ['Do not refund more than the order total'] });

Regression Testing and GitHub Checks

FlightLog can run goal-based regression suites against agents, evaluate each run with rules plus an optional LLM judge, and report pass/fail results as GitHub Check Runs.

Create a regression suite mapped to a GitHub repository.
Add cases with goals, constraints, and minimum score thresholds.
Install the FlightLog GitHub App on that repository.
Open or update a pull request; FlightLog receives the webhook, runs the suite, and updates the check with aggregate score and per-case findings.
Inspect traces and evaluator output in the dashboard at /regression and /runs.

Create a suite:

curl -X POST http://localhost:5173/api/regression/suites \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer CLERK_TOKEN' \
  -d '{"name":"PR smoke tests","repositoryOwner":"acme","repositoryName":"support-agent"}'

Add a case:

curl -X POST http://localhost:5173/api/regression/suites/suite_id \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer CLERK_TOKEN' \
  -d '{"name":"Lookup customer","goal":"Find the customer record for support@example.com","constraints":["Do not delete records"],"minScore":80}'

Run a suite manually:

curl -X POST http://localhost:5173/api/regression/suites/suite_id/runs \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer CLERK_TOKEN' \
  -d '{}'

GitHub App webhook endpoint:

POST /api/github/webhook

SDK Regression Example

import { FlightLogClient } from '$lib/sdk/client';

const flightlog = new FlightLogClient({
	endpoint: 'http://localhost:5173',
	apiKey: clerkToken
});

const suite = await flightlog.createRegressionSuite({
	name: 'PR smoke tests',
	repositoryOwner: 'acme',
	repositoryName: 'support-agent'
});

await flightlog.addRegressionCase(suite.id, {
	name: 'Lookup customer',
	goal: 'Find the customer record for support@example.com',
	constraints: ['Do not delete records'],
	minScore: 80
});

const run = await flightlog.startRegressionRun(suite.id);
console.log(run.pageUrl);

Framework Metadata Helpers

import {
	aiSdkAttributes,
	anthropicAttributes,
	instrumentModelCall,
	langChainAttributes
} from '$lib';

await instrumentModelCall({
	trace,
	name: 'anthropic planning call',
	metadata: anthropicAttributes({ model: 'claude-sonnet-4-5' }),
	call: async () => {
		// Call Anthropic from your app. FlightLog observes metadata/results only.
		return { text: 'Use the CRM lookup tool.' };
	}
});

await trace.logModelCall({
	name: 'ai sdk generateText',
	attributes: aiSdkAttributes({ provider: 'openai', model: 'gpt-4.1-mini' })
});

await trace.logModelCall({
	name: 'langchain runnable',
	attributes: langChainAttributes({ provider: 'anthropic', model: 'claude-haiku-4-5' })
});

MVP Limitations

No organizations or projects; isolation is per Clerk user.
Artifacts are stored in Postgres as text, URLs, or data URLs.
Replay is timeline-based, not a true browser session replay.
LLM evaluation is optional.
UI-triggered agent runs use an in-process runner. Production multi-instance deployments should move this to a durable queue/worker.
Regression suite execution uses the same in-process runner and requires configured provider credentials for tool-agent cases unless cases are completed externally through the regression API.
Golden trace comparison, PR annotations, and durable GitHub check workers are not implemented yet.

Name		Name	Last commit message	Last commit date
Latest commit History 126 Commits
.cursor		.cursor
.github/workflows		.github/workflows
.plans		.plans
.vscode		.vscode
src		src
static		static
test		test
.env.example		.env.example
.gitignore		.gitignore
.npmrc		.npmrc
.prettierignore		.prettierignore
.prettierrc		.prettierrc
AGENTS.md		AGENTS.md
LICENSE		LICENSE
README.md		README.md
bun.lock		bun.lock
components.json		components.json
compose.yaml		compose.yaml
drizzle.config.ts		drizzle.config.ts
eslint.config.js		eslint.config.js
package.json		package.json
skills-lock.json		skills-lock.json
svelte.config.js		svelte.config.js
tsconfig.json		tsconfig.json
vercel.json		vercel.json
vite.config.ts		vite.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FlightLog

Setup

Database (local vs production)

UI Agent Runs

Approvals

API Example

SDK Example

Trace API Example

Regression Testing and GitHub Checks

SDK Regression Example

Framework Metadata Helpers

MVP Limitations

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

FlightLog

Setup

Database (local vs production)

UI Agent Runs

Approvals

API Example

SDK Example

Trace API Example

Regression Testing and GitHub Checks

SDK Regression Example

Framework Metadata Helpers

MVP Limitations

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages