Devpost: "Building Agents for Real-World Challenges" — Arize track submission. Gemini 3-powered background agents for BUFI (a stablecoin-first financial workspace for global teams), with production-grade Arize Phoenix observability and a working self-improvement loop: the agents read their own traces, recall curated fixes, and get measurably better every morning.
BUFI's ops team drowns in recurring engineering/back-office missions (reconciliation checks, incident triage, repo chores). This fork turns Vercel's open-agents into BUFI's autonomous "minion" workforce:
Linear daily plan ──(cron 12:00)──► BUFI bridge ──► open-agents (Gemini 3 Pro)
│ plan (todo) · subagents (task)
│ human gate (ask_user_question)
▼
OpenInference traces ──► Arize Phoenix
│ ▲
agent introspects ───┘ │ LLM-as-judge evals
(Phoenix MCP + │ (gemini-3-flash)
recall/find_resolved_gap)
▼
morning digest (cron 13:00) ──► Slack #bu-minions
+ promotes successes/fixes to Phoenix datasets ⟲ self-improvement
| Requirement | Implementation |
|---|---|
| Code-owned Gemini agent runtime | packages/agent/open-agent.ts — AI SDK v6 ToolLoopAgent, default google/gemini-3-pro-preview via AI Gateway |
| OpenInference instrumentation → Phoenix | packages/arize-phoenix/otel.ts + apps/web/instrumentation.ts + experimental_telemetry (sessionId/chatId/source/linearTaskId/repo metadata) |
| Phoenix MCP in the agent | packages/agent/tools/phoenix-mcp.ts — @arizeai/phoenix-mcp over stdio via @ai-sdk/mcp, 14 trace/span/session/dataset tools merged into the toolset (PHOENIX_MCP_ENABLED) |
| Evals on traces | scripts/phoenix-eval.ts — Gemini LLM-as-judge → Phoenix span annotations + eval badges in the sessions UI |
| Self-improvement loop | packages/agent/tools/phoenix-introspection.ts (recall_similar_runs, find_resolved_gap) + app/api/bufi/digest/promote (auto-curation to bufi-recall / bufi-resolved-gaps datasets) |
| Multi-step + human-in-control | todo_write planning, task subagents, ask_user_question HITL, sessions UI |
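
The telemetry metadata listed in the table can be sketched as a small builder. This is a minimal illustration, not the code in packages/arize-phoenix/otel.ts: the `RunContext` type, the set of `source` values, and the `functionId` string are assumptions made for the example. Only the metadata key names and the `isEnabled` / `functionId` / `metadata` shape of the AI SDK's `experimental_telemetry` option come from the source or the SDK.

```typescript
// Hypothetical per-run context; field names mirror the metadata keys in the table.
interface RunContext {
  sessionId: string;
  chatId: string;
  source: "linear" | "chat" | "cron"; // assumed set of sources
  linearTaskId?: string;
  repo?: string;
}

// Builds an object in the shape accepted by the AI SDK's `experimental_telemetry`
// option. The functionId value is a placeholder chosen for this sketch.
function buildTelemetry(ctx: RunContext) {
  const metadata: Record<string, string> = {
    sessionId: ctx.sessionId,
    chatId: ctx.chatId,
    source: ctx.source,
  };
  // Attach optional keys only when present so spans carry no empty attributes.
  if (ctx.linearTaskId) metadata.linearTaskId = ctx.linearTaskId;
  if (ctx.repo) metadata.repo = ctx.repo;
  return { isEnabled: true, functionId: "bufi-minion", metadata };
}
```

The returned object would be passed as `experimental_telemetry` on each model call, so that every span exported to Phoenix can be filtered by session, chat, or Linear task.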
Demo flow (scripts/demo-dispatch.ts + scripts/phoenix-seed-resolved-gap.ts): a
treasury-reconciliation mission fails on run 1 (missing script), an engineer curates the
fix into the bufi-resolved-gaps dataset, and run 2 self-heals — the agent's
find_resolved_gap call returns the curated fix and it completes the mission, citing its
own trace history.
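
The self-healing step can be pictured as a lookup of the current failure against curated entries. The sketch below is an assumption about how such a match could work, using simple token overlap. It is not the implementation in packages/agent/tools/phoenix-introspection.ts, and the `ResolvedGap` shape, the `minOverlap` threshold, and the sample data are all hypothetical.

```typescript
// Hypothetical record shape for an entry in the bufi-resolved-gaps dataset.
interface ResolvedGap {
  signature: string; // short description of the failure that was fixed
  fix: string; // curated remediation text
  sourceTraceId: string; // trace the fix was curated from
}

// Lowercase, split on non-alphanumerics, and drop very short tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2);
}

// Returns the curated gap whose signature tokens are best covered by the error
// text, or undefined when no entry reaches the overlap threshold.
function findResolvedGap(error: string, gaps: ResolvedGap[], minOverlap = 0.5): ResolvedGap | undefined {
  const errorTokens = tokenize(error);
  let best: { gap: ResolvedGap; score: number } | undefined;
  for (const gap of gaps) {
    const sigTokens = tokenize(gap.signature);
    if (sigTokens.length === 0) continue;
    const hits = sigTokens.filter((t) => errorTokens.indexOf(t) !== -1).length;
    const score = hits / sigTokens.length;
    if (score >= minOverlap && (!best || score > best.score)) best = { gap, score };
  }
  return best ? best.gap : undefined;
}

// Made-up sample data for illustration only.
const sampleGaps: ResolvedGap[] = [
  { signature: "missing script reconcile-treasury", fix: "add the reconcile-treasury script", sourceTraceId: "trace-1" },
  { signature: "rate limit exceeded linear api", fix: "retry with backoff", sourceTraceId: "trace-2" },
];
```

On run 2 of the demo, a call like `findResolvedGap(errorMessage, sampleGaps)` would return the curated fix along with the trace it came from, which is what lets the agent cite its own history.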
Hosted at https://open-agents-bay.vercel.app · BUFI-side crons live in
BuFi007/desk-v1 (apps/app/src/app/api/cron/daily-plan-coffee,
coffee-digest). Cloud Run-deployable (standard Next.js standalone build); hosted on
Vercel for the demo.
Open Agents is an open-source reference app for building and running background coding agents on Vercel. It includes the web UI, the agent runtime, sandbox orchestration, and the GitHub integration needed to go from prompt to code changes without keeping your laptop involved.
The repo is meant to be forked and adapted, not treated as a black box.
Open Agents is a three-layer system:
Web -> Agent workflow -> Sandbox VM
- The web app handles auth, sessions, chat, and streaming UI.
- The agent runs as a durable workflow on Vercel.
- The sandbox is the execution environment: filesystem, shell, git, dev servers, and preview ports.
The agent does not run inside the VM. It runs outside the sandbox and interacts with it through tools like file reads, edits, search, and shell commands.
That separation is the main point of the project:
- agent execution is not tied to a single request lifecycle
- sandbox lifecycle can hibernate and resume independently
- model/provider choices and sandbox implementation can evolve separately
- the VM stays a plain execution environment instead of becoming the control plane
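
To make the separation concrete, here is a minimal sketch of a tool that runs in the agent process and reaches the sandbox only through a narrow interface. The `Sandbox` interface, `makeReadFileTool`, and `makeFakeSandbox` are hypothetical names for this example; the real abstraction lives in packages/sandbox and may look different.

```typescript
// Hypothetical minimal sandbox surface, assumed for illustration.
interface Sandbox {
  readFile(path: string): Promise<string>;
  exec(command: string): Promise<{ exitCode: number; stdout: string }>;
}

// The tool holds a sandbox handle rather than touching a local filesystem,
// so the agent can run anywhere while the VM stays a plain execution target.
function makeReadFileTool(sandbox: Sandbox) {
  return {
    description: "Read a file from the sandbox",
    execute: async (input: { path: string }) => {
      const content = await sandbox.readFile(input.path);
      return { path: input.path, content };
    },
  };
}

// In-memory stand-in for a sandbox, useful for tests and local experiments.
function makeFakeSandbox(files: Record<string, string>): Sandbox {
  return {
    readFile: async (path: string) => {
      const content = files[path];
      if (content === undefined) throw new Error("No such file: " + path);
      return content;
    },
    exec: async (command: string) => ({ exitCode: 0, stdout: "ran: " + command }),
  };
}
```

Because the tool depends only on the interface, swapping the sandbox implementation or resuming a hibernated one does not require changes on the agent side.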
- chat-driven coding agent with file, search, shell, task, skill, and web tools
- durable multi-step execution with Workflow SDK-backed runs, streaming, and cancellation
- isolated Vercel sandboxes with snapshot-based resume
- repo cloning and branch work inside the sandbox
- optional auto-commit, push, and PR creation after a successful run
- session sharing via read-only links
- optional voice input via ElevenLabs transcription
A few details that matter for understanding the current implementation:
- Chat requests start a workflow run instead of executing the agent inline.
- Each agent turn can continue across many persisted workflow steps.
- Active runs can be resumed by reconnecting to the stream for the existing workflow.
- Sandboxes expose ports 3000, 5173, 4321, 8000, and 5001, use a build-prewarmed deployment template on Vercel, and hibernate after inactivity.
- Auto-commit and auto-PR are supported, but they are preference-driven features, not always-on behavior.
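
The start-or-reconnect behavior described above can be sketched as follows. This is an in-memory illustration under stated assumptions: the `WorkflowRun` shape, the `runs` registry, and `startOrReconnect` are hypothetical, and the real app tracks runs through the Workflow SDK rather than a local array.

```typescript
type RunStatus = "running" | "completed" | "cancelled";

// Hypothetical record of a workflow run tied to a chat.
interface WorkflowRun {
  runId: string;
  chatId: string;
  status: RunStatus;
}

// In-memory stand-in for wherever active runs are tracked.
const runs: WorkflowRun[] = [];
let nextRunNumber = 1;

// A chat request either attaches to the run already in flight for that chat
// or starts a new one; the agent is never executed inline in the request.
function startOrReconnect(chatId: string): { run: WorkflowRun; reconnected: boolean } {
  const active = runs.filter((r) => r.chatId === chatId && r.status === "running")[0];
  if (active) return { run: active, reconnected: true };
  const run: WorkflowRun = { runId: "run-" + nextRunNumber++, chatId, status: "running" };
  runs.push(run);
  return { run, reconnected: false };
}
```

A client that drops its connection and calls `startOrReconnect` again for the same chat gets the existing run back, which mirrors reconnecting to the stream of an active workflow.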
See apps/web/.env.example for the full list. Summary:
POSTGRES_URL=
BETTER_AUTH_SECRET=
NEXT_PUBLIC_VERCEL_APP_CLIENT_ID=
VERCEL_APP_CLIENT_SECRET=
NEXT_PUBLIC_GITHUB_CLIENT_ID=
GITHUB_CLIENT_SECRET=
GITHUB_APP_ID=
GITHUB_APP_PRIVATE_KEY=
NEXT_PUBLIC_GITHUB_APP_SLUG=
GITHUB_WEBHOOK_SECRET=
REDIS_URL=
KV_URL=
OPEN_AGENTS_RESOURCE_PROFILE=
VERCEL_PROJECT_PRODUCTION_URL=
NEXT_PUBLIC_VERCEL_PROJECT_PRODUCTION_URL=
VERCEL_SANDBOX_BASE_SNAPSHOT_ID=
ELEVENLABS_API_KEY=

- REDIS_URL / KV_URL: optional skills metadata cache (falls back to in-memory when not configured).
- OPEN_AGENTS_RESOURCE_PROFILE: optional deployment resource profile. Set to hobby to use Hobby-compatible defaults for chat and sandbox resources; leave unset for standard behavior.
- VERCEL_PROJECT_PRODUCTION_URL / NEXT_PUBLIC_VERCEL_PROJECT_PRODUCTION_URL: canonical production URL for metadata and some callback behavior.
- VERCEL_SANDBOX_BASE_SNAPSHOT_ID: optional explicit base snapshot override for fresh sandboxes. Vercel deployments normally resolve their automatically prewarmed named template without this value. Outside a Vercel deployment, leaving it unset starts from the standard Sandbox runtime.
- ELEVENLABS_API_KEY: voice transcription.
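
The optional-versus-required behavior of these variables can be sketched as a small startup check. This is illustrative only: the helper names are hypothetical, treating POSTGRES_URL and BETTER_AUTH_SECRET as the minimum required set is an assumption, and so is the choice to prefer REDIS_URL over KV_URL when both are set.

```typescript
type Env = Record<string, string | undefined>;

// Assumed minimum set for a first boot; the full list is in apps/web/.env.example.
const REQUIRED_VARS = ["POSTGRES_URL", "BETTER_AUTH_SECRET"];

// Returns the names of required variables that are unset or empty.
function missingRequired(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]);
}

// Picks the skills metadata cache backend, falling back to in-memory
// when neither Redis variable is configured.
function resolveSkillsCache(env: Env): { kind: "redis"; url: string } | { kind: "memory" } {
  const url = env.REDIS_URL || env.KV_URL; // precedence is an assumption
  return url ? { kind: "redis", url } : { kind: "memory" };
}

// True only when the Hobby-compatible resource profile is explicitly requested.
function isHobbyProfile(env: Env): boolean {
  return env.OPEN_AGENTS_RESOURCE_PROFILE === "hobby";
}
```

In a Node process these helpers would be called with `process.env`; taking the environment as a parameter keeps them easy to test.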
To deploy your own copy:
- Fork this repo.
- Import the repo into Vercel. Neon Postgres is auto-provisioned if you use the deploy button above.
- Generate a secret for session signing:
  openssl rand -base64 32 # BETTER_AUTH_SECRET
- Add env vars in Vercel project settings:
  POSTGRES_URL=
  BETTER_AUTH_SECRET=
- Deploy once to get a stable production URL.
- Create a Vercel OAuth app with callback URL:
  https://YOUR_DOMAIN/api/auth/callback/vercel
- Add these env vars and redeploy:
  NEXT_PUBLIC_VERCEL_APP_CLIENT_ID=
  VERCEL_APP_CLIENT_SECRET=
- If you want the full GitHub-enabled coding-agent flow, create a GitHub App using:
  - Homepage URL: https://YOUR_DOMAIN
  - Callback URL: https://YOUR_DOMAIN/api/auth/callback/github
  - Setup URL: https://YOUR_DOMAIN/api/github/app/callback

  In the GitHub App settings:
  - use the GitHub App's Client ID and Client Secret for NEXT_PUBLIC_GITHUB_CLIENT_ID and GITHUB_CLIENT_SECRET
  - make the app public if you want org installs to work cleanly
- Add the GitHub App env vars and redeploy.
- Optionally add Redis/KV, OPEN_AGENTS_RESOURCE_PROFILE=hobby for Hobby-compatible resource defaults, the canonical production URL vars, and VERCEL_SANDBOX_BASE_SNAPSHOT_ID only if you need to override the automatically prewarmed sandbox template.

To run the app locally:
- Install dependencies:
  corepack enable
  pnpm install
- Create your local env file:
  cp apps/web/.env.example apps/web/.env
- Fill in the required values in apps/web/.env.
- Start the app:
  pnpm web

If you already have a linked Vercel project, you can pull env vars locally with vc env pull.
Authentication is handled by Better Auth with Vercel and GitHub as social providers. All auth routes are served from the /api/auth/[...all] catchall.
Create a Vercel OAuth app and use this callback:
https://YOUR_DOMAIN/api/auth/callback/vercel
For local development, use:
http://localhost:3000/api/auth/callback/vercel
Then set:
NEXT_PUBLIC_VERCEL_APP_CLIENT_ID=...
VERCEL_APP_CLIENT_SECRET=...

You do not need a separate GitHub OAuth app. Open Agents uses the GitHub App's OAuth credentials as a Better Auth social provider, plus the App's installation tokens for repo access.
Create a GitHub App for installation-based repo access and configure:
- Homepage URL: https://YOUR_DOMAIN
- Callback URL: https://YOUR_DOMAIN/api/auth/callback/github
- Setup URL: https://YOUR_DOMAIN/api/github/app/callback
- make the app public if you want org installs to work cleanly

For local development, use http://localhost:3000 as the homepage URL, http://localhost:3000/api/auth/callback/github as the callback URL, and http://localhost:3000/api/github/app/callback as the setup URL.
Then set:
NEXT_PUBLIC_GITHUB_CLIENT_ID=... # GitHub App Client ID
GITHUB_CLIENT_SECRET=... # GitHub App Client Secret
GITHUB_APP_ID=...
GITHUB_APP_PRIVATE_KEY=...
NEXT_PUBLIC_GITHUB_APP_SLUG=...
GITHUB_WEBHOOK_SECRET=...

GITHUB_APP_PRIVATE_KEY can be stored as the PEM contents with escaped newlines or as a base64-encoded PEM.
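
Accepting both storage formats for the private key could look roughly like the sketch below. The `normalizePrivateKey` helper is a hypothetical name and not necessarily how the app handles it; the detection rule (a value that starts with a PEM header is treated as escaped PEM, anything else as base64) is an assumption.

```typescript
// Normalizes GITHUB_APP_PRIVATE_KEY into PEM text with real newlines.
function normalizePrivateKey(raw: string): string {
  const value = raw.trim();
  if (value.indexOf("-----BEGIN") === 0) {
    // PEM contents stored on one line with literal "\n" sequences.
    return value.replace(/\\n/g, "\n");
  }
  // Otherwise assume the whole PEM was base64-encoded.
  const decoded = atob(value).trim();
  if (decoded.indexOf("-----BEGIN") !== 0) {
    throw new Error("GITHUB_APP_PRIVATE_KEY is neither PEM nor base64-encoded PEM");
  }
  return decoded;
}
```

Either form then yields the same PEM string, so the rest of the GitHub App code only has to deal with one representation.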
pnpm web # run dev server
pnpm check # lint + format check
pnpm fix # lint + format fix
pnpm typecheck # typecheck all packages
pnpm run ci # full CI: check, typecheck, tests, migration check
pnpm harness:smoke:sandbox:create # create a caller-owned sandbox for harness smoke tests
pnpm harness:smoke:codex # run one Codex turn against an existing sandbox
pnpm sandbox:snapshot-base # manually layer a new sandbox snapshot from an existing snapshot

apps/web          Next.js app, workflows, auth, chat UI
packages/agent    agent implementation, tools, subagents, skills
packages/sandbox  sandbox abstraction and Vercel sandbox integration
packages/shared   shared utilities