timaxorum/open_data_apps

 
 

Tracecast: Open Source Generative Data Apps

This project lets you generate interactive data apps on top of your data, using a Cursor-style AI chat. It stitches together Marimo, LangGraph agents, and data warehouse connectors.

Demo Screenshot

Inspiration

The main inspiration for this project was Marimo, an exciting open source Python notebook that can be "queried with SQL, run as a script, and deployed as an app" (source). The recent release of Marimo Pair (source) demonstrated the power of connecting AI agents like Claude Code directly to Marimo notebooks. This project builds on that work. It incorporates a LangGraph agent with two key abilities: (1) executing queries against a connected data warehouse (such as Snowflake), and (2) writing Marimo notebooks. This approach intentionally decouples exploratory data analysis from writing a finished Marimo notebook.

This project intentionally hides the Marimo edit mode. That means that the end user only ever sees a finished, read-only data app. Ease of use and trust in AI output were the main drivers behind this decision.

Quickstart

git clone https://github.com/tracecast/open_data_apps.git
cd open_data_apps
cp .env.example .env
# Required: generate an encryption key for your data-source credentials.
echo "ENCRYPTION_KEY=$(openssl rand -base64 32)" >> .env
docker compose -f deploy/docker/docker-compose.yml up --build

If ENCRYPTION_KEY is missing or shorter than 32 characters the web service refuses to start. This is deliberate — data-source credentials (Snowflake passwords, BigQuery service-account JSON, etc.) are AES-GCM encrypted at rest with a key derived from this value.
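The startup check described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the real check lives in the web service, and SHA-256 is an assumed stand-in for whatever key derivation Tracecast actually uses.

```python
import base64
import hashlib
import secrets

MIN_KEY_CHARS = 32  # threshold stated in this README


def generate_encryption_key() -> str:
    """Python equivalent of `openssl rand -base64 32`."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def require_encryption_key(env: dict) -> str:
    """Fail fast when the key is missing or too short (hypothetical helper)."""
    value = env.get("ENCRYPTION_KEY", "")
    if len(value) < MIN_KEY_CHARS:
        raise RuntimeError("ENCRYPTION_KEY is missing or shorter than 32 characters")
    return value


def derive_aes_key(value: str) -> bytes:
    """Derive a 32-byte AES key from the configured value.

    Assumption: SHA-256 is used here for illustration only; the actual
    derivation in Tracecast may differ.
    """
    return hashlib.sha256(value.encode("utf-8")).digest()
```

A key produced by the Quickstart command is 44 base64 characters long, so it comfortably passes the 32-character check.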

Once the stack is up, open http://localhost:3000 and go to Settings → Models to add an LLM provider and API key. The agent reads model configuration from the database, not from .env, so this step is required before the chat will respond.

The local stack starts:

  • Postgres on 5432
  • Local Marimo runtime API on 8010, dashboards on 2719-2819
  • LangGraph agent API on 8000
  • Next.js web app on 3000

docker compose up waits for each service's healthcheck before starting the next, so the web app never starts requesting LangGraph endpoints before the agent server is ready.
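If you want to confirm from the host that the stack is reachable, a small script can poll the ports listed above. This is a rough sketch, not part of the repository: the service labels are made up for the example, and a successful TCP connection is a weaker signal than the Compose healthchecks.

```python
import socket
import time

# Ports from the list above; the labels are illustrative only.
SERVICES = [
    ("postgres", 5432),
    ("marimo-runtime", 8010),
    ("agent", 8000),
    ("web", 3000),
]


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, deadline_s: float = 60.0,
                  interval_s: float = 0.5) -> bool:
    """Poll until the port accepts connections or the deadline passes."""
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        if port_is_open(host, port):
            return True
        time.sleep(interval_s)
    return False


def check_stack(host: str = "localhost", deadline_s: float = 5.0) -> dict:
    """Map each service label to whether its port became reachable."""
    return {name: wait_for_port(host, port, deadline_s) for name, port in SERVICES}
```

Calling `check_stack()` after `docker compose up` returns a dictionary such as `{"postgres": True, ...}`; any `False` entry points at the service to inspect with `docker compose logs`.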

Features

Generate data apps from your data, with a Cursor-style chat

demo_video.mp4

Edit and refine data apps using natural language

edit_demo.mp4

Connect a data source

Data connector image

Bring your own model

Models page image

Data connectors

Five data sources are currently supported: Snowflake, BigQuery, Postgres, Metabase, and CSV. The code for the database query tools was derived from Google's open source MCP Toolbox for Databases. CSV uploads are queried in-process via DuckDB, so you can join multiple CSVs with normal SQL.

There is currently no support for MCP. Instead, data query tools are hardcoded. This decision was made to ensure high quality AI queries and limit tool bloat.

Architecture

┌────────────────────┐    SSE / JSON    ┌────────────────────┐
│  Next.js web app   │ ───────────────▶ │   Agent server     │
│  (chat + iframe)   │ ◀─────────────── │ (FastAPI, custom   │
│                    │                  │  LangGraph SDK     │
│                    │                  │  wire impl)        │
└──────────┬─────────┘                  └──────────┬─────────┘
           │                                       │
           │ start session / write notebook        │ AsyncPostgresSaver
           ▼                                       │
┌────────────────────┐                             ▼
│ runtime-local      │                  ┌────────────────────────┐
│ (FastAPI + Marimo) │                  │   Postgres 16          │
│ `marimo run --watch│ ◀──────────────▶ │  - dashboards          │
│  --headless`       │ shared notebook  │  - dashboard_versions  │
│                    │     volume       │  - chat_threads        │
│                    │                  │  - data_sources        │
│                    │                  │  - model_configs       │
│                    │                  │  - checkpoint tables   │
└────────────────────┘                  └────────────────────────┘

  • Web app (apps/web): Next.js 15. Renders dashboards, hosts the chat panel via @langchain/langgraph-sdk@0.1.0's useStream, and proxies SDK calls through /api/ch-chat so encrypted data-source credentials and dashboard context are injected server-side.
  • Agent server (apps/agent-server): FastAPI app that implements the LangGraph Platform SDK wire protocol (/threads, /threads/{id}/runs/stream, /threads/{id}/state, /threads/{id}/history, ...) over a LangGraph-compiled ReAct agent. Checkpoints are persisted via AsyncPostgresSaver, so chat history and branching survive restarts. SSE events are encoded as messages/partial, messages/complete, updates, values.
  • Runtime (apps/runtime-local): one FastAPI server that owns the per-dashboard marimo run --watch processes, plus validate-notebook / test-notebook for the agent's tool flow.
  • tc (packages/tc): the Python package that notebooks import. It exposes one query helper per source: tc.postgres.query(sql), tc.metabase.query(db, sql), tc.bigquery.query(sql), tc.snowflake.query(sql), and tc.csv.query(sql) (DuckDB over uploaded CSV files).
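To make the Agent server bullet more concrete, here is a minimal sketch of how a server-sent event with one of the event names above might be framed. The `encode_sse` helper and the payload shape are assumptions for illustration; the repository's encoder is in apps/agent-server/agent_server/sse.py and may differ.

```python
import json


def encode_sse(event: str, data) -> str:
    """Frame one server-sent event: an event line, a data line, a blank line.

    json.dumps escapes newlines, so the payload always fits on a single
    `data:` line.
    """
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


# Illustrative payload; the real message schema may carry more fields.
frame = encode_sse("messages/partial", [{"type": "ai", "content": "Hi"}])
print(frame, end="")
```

The blank line at the end of each frame is what tells an SSE client that the event is complete.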

Security model

Tracecast is currently local-first and single-tenant. The OSS build is designed to run on your laptop, behind your OS firewall, for one user at a time. It is not hardened for hostile environments.

Data-source credentials

Tracecast sends SQL written by an LLM to your warehouse. The trust boundary is the database role you provide when you add a connection.

Always connect each data source with a least-privilege, read-only user. A prompt-injection payload smuggled through data the agent reads could otherwise convince the model to issue DELETE / DROP / UPDATE against your warehouse. A read-only role makes those writes impossible at the database layer, which is the only enforcement Tracecast relies on.

See docs/data-sources.md for copy-pasteable least-privilege role recipes for Postgres, Snowflake, BigQuery, and Metabase.

Adding data sources

Open http://localhost:3000/dashboard/data-sources and add a Postgres / Metabase / BigQuery / Snowflake connection, or upload a CSV. Database credentials are encrypted at rest with ENCRYPTION_KEY and decrypted only inside the Next.js server when the agent fires a run. The agent reads them from a private __runtime_credentials config slot and never echoes them to the LLM prompt. CSV uploads are stored unencrypted on a shared Docker volume (csv-uploads); see docs/data-sources.md for the CSV-specific threat model.
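One way to picture the private config slot is a run config whose double-underscore keys are available to tools but filtered out of anything that becomes prompt text. The sketch below is hypothetical: only the `__runtime_credentials` name comes from this README, while the helper names, the `configurable` layout, and the prefix convention are assumptions.

```python
PRIVATE_PREFIX = "__"  # assumed convention for keys that must never reach the prompt


def build_run_config(thread_id: str, credentials: dict) -> dict:
    """Attach decrypted credentials to the run config (hypothetical layout)."""
    return {
        "configurable": {
            "thread_id": thread_id,
            "__runtime_credentials": credentials,
        }
    }


def prompt_safe_view(config: dict) -> dict:
    """Return only the config entries that are safe to show to the model."""
    configurable = config.get("configurable", {})
    return {k: v for k, v in configurable.items() if not k.startswith(PRIVATE_PREFIX)}


def credentials_for_tool(config: dict, source_id: str) -> dict:
    """Tools read credentials directly from the private slot."""
    return config["configurable"]["__runtime_credentials"][source_id]
```

The design point is that credentials travel alongside the run rather than inside the conversation, so they are never part of the message history the model sees.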

Chat → dashboard flow

  1. User opens /dashboard/analytics/<id> and types into the chat panel.
  2. The web proxy adds runtime credentials + dashboard context to the SDK's runs/stream request.
  3. The agent server runs the dashboard ReAct loop. The agent calls read_dashboard_skills, inspects schemas, runs small sample queries, and passes a complete Marimo notebook to the update_analytics_dashboard tool.
  4. update_analytics_dashboard validates the notebook (marimo check), writes it to the runtime, starts (or hot-reloads) the viewer, executes the notebook once to surface runtime errors, then persists the new source to dashboards.notebook_source.
  5. The chat panel detects the matched AI tool-call + tool-result pair and bumps the iframe's refreshTrigger, so MarimoEmbed re-fetches a fresh session URL from /api/dashboards/<id>/session. The runtime answer is cached: the notebook is not re-written on every session start, eliminating the per-turn flicker.
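The detection in step 5 amounts to pairing each update_analytics_dashboard tool call with its tool result. The actual logic lives in apps/web/components/ChatPanel.tsx; the Python sketch below only illustrates the idea, and the message dictionaries and helper names are assumptions modelled on the usual LangChain message shape.

```python
def completed_tool_call_ids(messages, tool_name="update_analytics_dashboard"):
    """Return ids of tool calls to `tool_name` that already have a tool result."""
    pending = set()
    completed = []
    for msg in messages:
        if msg.get("type") == "ai":
            for call in msg.get("tool_calls", []):
                if call.get("name") == tool_name:
                    pending.add(call["id"])
        elif msg.get("type") == "tool" and msg.get("tool_call_id") in pending:
            pending.discard(msg["tool_call_id"])
            completed.append(msg["tool_call_id"])
    return completed


def next_refresh_trigger(current: int, seen_ids: set, messages) -> int:
    """Bump the trigger once per newly completed call; `seen_ids` is updated in place."""
    new_ids = [i for i in completed_tool_call_ids(messages) if i not in seen_ids]
    seen_ids.update(new_ids)
    return current + len(new_ids)
```

Tracking the ids that were already seen keeps the iframe from reloading again when the same messages are re-rendered.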

Branching, history, and editing

Because chat threads are checkpointed in Postgres, you can:

  • Edit a past message — useStream forks a new branch from that checkpoint.
  • Retry the last AI response — same branching mechanic.
  • Switch between branches via the inline branch picker on each human message.

State survives docker compose down followed by docker compose up.
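Branching is easiest to picture as a tree of checkpoints, where editing a message creates a sibling of the original rather than overwriting it. The toy model below is an assumption-laden sketch: the `Thread` and `Checkpoint` classes are invented for this example and do not mirror the AsyncPostgresSaver schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkpoint:
    id: str
    parent_id: Optional[str]
    message: str


class Thread:
    """In-memory stand-in for a checkpointed chat thread."""

    def __init__(self):
        self.checkpoints = {}
        self._counter = 0

    def append(self, parent_id, message):
        """Add a checkpoint under `parent_id` and return its id."""
        self._counter += 1
        cp = Checkpoint(f"cp{self._counter}", parent_id, message)
        self.checkpoints[cp.id] = cp
        return cp.id

    def edit(self, checkpoint_id, new_message):
        """Fork a new branch: the edit becomes a sibling of the original."""
        parent = self.checkpoints[checkpoint_id].parent_id
        return self.append(parent, new_message)

    def branches(self, checkpoint_id):
        """All siblings of a checkpoint, i.e. what a branch picker would list."""
        parent = self.checkpoints[checkpoint_id].parent_id
        return [c.id for c in self.checkpoints.values() if c.parent_id == parent]

    def history(self, checkpoint_id):
        """Messages from the root down to `checkpoint_id`."""
        out = []
        cur = checkpoint_id
        while cur is not None:
            cp = self.checkpoints[cur]
            out.append(cp.message)
            cur = cp.parent_id
        return list(reversed(out))
```

Because nothing is deleted on edit, switching branches is just a matter of choosing a different leaf and walking back to the root.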

Supported models

Tracecast is provider-agnostic. You can pick a model and add API keys entirely through the Settings page in the Web UI.

The OpenAI-compatible adapter (ChatOpenAICompatibleWithReasoning in apps/agent-server/agent_server/models.py) is a thin subclass of ChatOpenAI that preserves provider-specific reasoning_content (which vanilla ChatOpenAI drops). That single adapter covers every modern OpenAI-compatible reasoning endpoint without provider-specific code.
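The core idea of the adapter can be shown without LangChain at all: when converting a provider response into an internal message, carry reasoning_content along instead of discarding it. This is a plain-Python illustration, not the actual ChatOpenAICompatibleWithReasoning code; the `to_ai_message` helper and the choice of `additional_kwargs` as the storage location are assumptions.

```python
def to_ai_message(raw: dict) -> dict:
    """Convert an OpenAI-compatible response message into an internal AI message.

    Assumption: reasoning text is kept under `additional_kwargs`; the real
    adapter may store it elsewhere.
    """
    message = {
        "type": "ai",
        "content": raw.get("content") or "",
        "additional_kwargs": {},
    }
    reasoning = raw.get("reasoning_content")
    if reasoning is not None:
        # A converter that only copies `content` would silently lose this field.
        message["additional_kwargs"]["reasoning_content"] = reasoning
    return message
```

Providers that do not return reasoning_content produce a message with empty `additional_kwargs`, so the same code path serves both kinds of endpoint.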

Files of interest

  • apps/agent-server/agent_server/main.py — SDK-compatible HTTP surface and SSE runner.
  • apps/agent-server/agent_server/graph.py — compiled agent graph with AsyncPostgresSaver.
  • apps/agent-server/agent_server/analytics_tools.py — fully async dashboard + per-source tools.
  • apps/agent-server/agent_server/sse.py — wire-format-correct SSE encoder.
  • apps/web/app/api/ch-chat/[...path]/route.ts — credentialed proxy to the agent.
  • apps/web/components/ChatPanel.tsx — chat UI and dashboard-readiness detection.
  • apps/runtime-local/runtime_local/main.py — Marimo viewer manager.

See docs/architecture.md for more.
