Skardi is an open-source data plane for AI agents — every tool call your agent makes hits a Skardi pipeline: declarative SQL, served over REST or shell, with retrieval primitives built in. Build RAG, hybrid search, memory, and data APIs across databases, files, data lakes, and vector stores.
It borrows Spark's shape — one engine over every data source — but tilts toward online serving, not analytics. Your agent and your pipeline YAMLs are the control plane; Skardi is the data plane every tool call traverses, designed for how agents actually use data: schemas they can read, outputs they can parse, tools they can discover, writes they can trust.
- `skardi` CLI — federated SQL + parameterized pipelines as shell commands. Drop it into any agent that has a Bash tool (Claude Code, Cursor, custom loops) and it's wired.
- `skardi-server` — two peer surfaces on one engine: online serving (declarative SQL pipelines as parameterized REST endpoints) and offline jobs (async batch writes into Lance or any read-write DB, with atomic commit + run ledger).
- Soon — skills generation for auto-discovery, MCP binding for non-Claude hosts, a first-class memory primitive (structured + vector + FTS + provenance + TTL), lineage, and agent-scoped governance.
Beta. Skardi is under active development. APIs may move. Hit us on Discord if you want to co-design a POC.
Agents don't lack intelligence — they lack data autonomy. Hand an LLM a raw schema dump and it hallucinates; hand it a bag of bespoke REST endpoints and it gets lost; hand it a vector store and it still can't JOIN. The gap isn't the model. The gap is that today's data stack was designed for humans writing queries, not agents calling tools.
Skardi closes that gap with three deliberate choices:
- One engine over every source. DataFusion-based single-node federation. An agent can `JOIN` a CSV against Postgres against a Lance dataset in one query (sketched after this list).
- Online serving. Parameterized SQL served synchronously as REST endpoints; the low-latency path every agent tool call hits.
- Offline jobs. The same SQL shape run asynchronously into a durable destination, with a run ledger, atomic commit, and submit / poll / cancel.
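As a rough sketch of the first claim (every table and path name here is illustrative, assuming a ctx that registers `customers` from Postgres and `product_vectors` from Lance):

```sql
-- Illustrative federated query: a local CSV joined against a registered
-- Postgres table and a registered Lance dataset in one statement.
-- The names (orders.csv, customers, product_vectors) are assumptions,
-- not fixtures that ship with Skardi.
SELECT c.name, o.total, v.product_id
FROM './data/orders.csv' AS o
JOIN customers AS c       ON c.id = o.customer_id
JOIN product_vectors AS v ON v.product_id = o.product_id
WHERE o.total > 100
LIMIT 10;
```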
Read the full narrative in docs/agent_data_plane.md.
# From source (recommended during beta)
git clone https://github.com/SkardiLabs/skardi.git
cd skardi
cargo install --locked --path crates/cli

Or grab a pre-built binary:
curl -fSL "https://github.com/SkardiLabs/skardi/releases/latest/download/skardi-$(uname -m | sed 's/arm64/aarch64/')-$(uname -s | sed 's/Linux/unknown-linux-gnu/' | sed 's/Darwin/apple-darwin/').tar.gz" | tar xz
sudo mv skardi /usr/local/bin/

| Platform | Target |
|---|---|
| Linux x86_64 | skardi-x86_64-unknown-linux-gnu.tar.gz |
| Linux ARM64 | skardi-aarch64-unknown-linux-gnu.tar.gz |
| macOS ARM64 (Apple Silicon) | skardi-aarch64-apple-darwin.tar.gz |
macOS Intel binaries are not published. Build from source if you need one.
# 1. Ad-hoc SQL across local + remote data — no server, no pre-registration
skardi query --sql "SELECT * FROM './data/products.csv' LIMIT 10"
skardi query --sql "SELECT * FROM 's3://mybucket/events.parquet' LIMIT 10"
# 2. Register named sources in a ctx, query them by name
skardi query --ctx ./ctx.yaml --sql "SELECT * FROM products LIMIT 10"
# 3. Turn a parameterized SQL into an agent-callable verb (alias + pipeline)
# — now any agent with a shell can call it:
skardi grep "turing machine computation" --limit=10Drop skardi into a Claude Code or Cursor session and the agent can already use any pipeline you've declared as a tool via its Bash integration. No MCP config, no separate server — that's the MVP design intent.
cargo run --bin skardi-server -- \
--ctx ctx.yaml \
--pipeline pipelines/ \
--jobs jobs/ \
--port 8080

# Pipelines: synchronous answer
curl -X POST http://localhost:8080/product-search-demo/execute \
-H "Content-Type: application/json" \
-d '{"brand": null, "max_price": 100.0, "limit": 5}'
# Jobs: submit an async write-to-destination
skardi job run backfill-to-lake --param from_date='2026-01-01'
skardi job status <run_id>

(A sketch of a job definition appears after the reference list below.)

Full reference:
- CLI — docs/cli.md
- Server — docs/server.md
- Pipelines (online serving) — docs/pipelines.md
- Jobs (offline batch) — docs/jobs.md
- Why an agent data plane — docs/agent_data_plane.md
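To make the offline-job shape concrete, here is a sketch of what a job definition such as `backfill-to-lake` might look like (field names are illustrative; docs/jobs.md is authoritative):

```yaml
# Illustrative sketch; see docs/jobs.md for the actual job format.
name: backfill-to-lake
params:
  from_date: { type: string }
sql: |
  SELECT * FROM events
  WHERE event_date >= $from_date
destination:
  type: lance
  path: ./lake/events.lance   # committed atomically, recorded in the run ledger
```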
For end-to-end walkthroughs — RAG, recommendations, an agent-native wiki, a simple REST backend — see the demo/ directory. Each demo ships as a self-contained ctx.yaml plus pipelines (and sometimes jobs), so reading the YAML shows the Skardi shape in practice. Full list in Demo & Examples below.
The auto_knowledge_base skill turns a directory of documents into a queryable RAG with one command — chunking, embedding, indexing, and hybrid search exposed as a skardi grep verb. Zero infra by default (SQLite + local embeddings), so any Claude Code or Cursor session gets a grounded, citable local knowledge base.
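Hybrid search of the kind this skill exposes is plain SQL underneath: an RRF merge of FTS and KNN rankings. The sketch below illustrates the pattern only; the `sqlite_fts` / `sqlite_knn` / `candle()` call signatures and table names are assumptions, so treat the per-source docs as authoritative:

```sql
-- Illustrative RRF (Reciprocal Rank Fusion) merge; k = 60 is the usual constant.
-- Table names (chunks, chunks_vec) and UDF signatures are assumptions.
WITH fts AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC) AS r
  FROM sqlite_fts('chunks', 'turing machine computation')
),
knn AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY distance ASC) AS r
  FROM sqlite_knn('chunks_vec', candle('turing machine computation'))
)
SELECT COALESCE(fts.id, knn.id) AS id,
       COALESCE(1.0 / (60 + fts.r), 0)
     + COALESCE(1.0 / (60 + knn.r), 0) AS rrf_score
FROM fts
FULL OUTER JOIN knn ON fts.id = knn.id
ORDER BY rrf_score DESC
LIMIT 10;
```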
| Type | CRUD | Description | Docs |
|---|---|---|---|
| CSV | Read | Local or remote CSV files | docs/server.md |
| Parquet | Read | Local or remote Parquet files | docs/server.md |
| JSON / NDJSON | Read | Local or remote JSON files | docs/cli.md |
| PostgreSQL | Full | Table or catalog registration, pgvector KNN | docs/postgres/ |
| MySQL | Full | Table or catalog registration | docs/mysql/ |
| SQLite | Full | Table or catalog registration, sqlite-vec KNN, FTS | docs/sqlite/ |
| MongoDB | Full | Collections with point lookups | docs/mongo/ |
| Redis | Full | Hashes mapped to SQL rows | docs/redis/ |
| SeekDB | Full | MySQL-wire CRUD, native FULLTEXT FTS, HNSW VECTOR KNN | docs/seekdb/ |
| Apache Iceberg | Read | Schema evolution, partition pruning | docs/iceberg/ |
| Lance | Read (job-write) | KNN vector search, BM25 FTS; job destination | docs/lance/ |
| S3 / GCS / Azure | Read | CSV, Parquet, Lance from object stores | docs/S3_USAGE.md |
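The "table or catalog registration" rows above mean you choose the granularity per source. A sketch of both styles for Postgres (keys are illustrative; docs/postgres/ has the real config):

```yaml
# Illustrative sketch; see docs/postgres/ for the actual registration keys.
sources:
  customers:                 # one table, exposed under one SQL name
    type: postgres
    url: postgres://app:secret@db:5432/shop
    table: public.customers
  shop:                      # the whole database, mounted as a catalog
    type: postgres
    url: postgres://app:secret@db:5432/shop
    catalog: true            # then query as shop.public.<table>
```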
- Federated queries — JOIN across different source types. See docs/federated-queries.md.
- Authentication — session-based via better-auth + SQLite. See docs/auth/.
- ONNX inference — inline model predictions in SQL (sketched after this list). See docs/onnx_predict.md.
- Embedding inference — GGUF, Candle, or remote APIs. See docs/embeddings/.
- Observability — OTel traces / metrics / logs with Grafana. See docs/observability.md.
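For a feel of inline inference in SQL, a sketch using the `candle()` and `onnx_predict` UDFs named above (argument shapes and table names are assumptions; the linked docs define the real signatures):

```sql
-- Illustrative: embed at ingest time so content + vector land on the same row.
INSERT INTO chunks (id, content, embedding)
SELECT id, content, candle(content)
FROM staged_chunks;

-- Illustrative: score candidates inline with an ONNX model.
SELECT user_id, product_id,
       onnx_predict('ncf', user_id, product_id) AS score
FROM candidate_pairs
ORDER BY score DESC
LIMIT 20;
```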
# Build
docker build -t skardi .
docker build -t skardi --build-arg FEATURES=embedding .
# Or pull pre-built
docker pull ghcr.io/skardilabs/skardi/skardi-server:latest
# Run
docker run --rm \
-v /path/to/your/ctx.yaml:/config/ctx.yaml \
-v /path/to/your/pipelines:/config/pipelines \
-p 8080:8080 \
skardi \
--ctx /config/ctx.yaml \
--pipeline /config/pipelines \
--port 8080

The fastest cloud path is the Sealos template in skardi-skills — our growing library of ready-to-use Skardi setups. One-click launch, no local setup.
git clone https://github.com/SkardiLabs/skardi.git
cd skardi
cargo build --release -p skardi-cli
cargo build --release -p skardi-server
# With embedding support (ONNX, GGUF, Candle, remote embed)
cargo build --release -p skardi-server --features embedding

| Directory | Description |
|---|---|
| demo/llm_wiki/ | Agent-native wiki (server + CLI flavors) — hybrid search, inline embeddings, agent verbs |
| demo/simple_backend/ | REST backend with SQLite and optional auth |
| demo/rag/ | Retrieval-augmented generation pipeline |
| demo/movie_recommendation/ | Movie recommendations with ONNX NCF model |
For data-source-specific demos, see the entries in Supported Data Sources.
We're building in public. [x] means shipped today, [ ] means open for contribution. Open an issue or hop into Discord on anything unchecked.
1. Federated SQL engine
- [x] DataFusion single-node federation across CSV, Parquet, JSON, S3 / GCS / Azure, Postgres, MySQL, SQLite, MongoDB, Redis, Iceberg, Lance, SeekDB — all joinable in one query
- [x] Register by table, or load an entire DB (Postgres / MySQL / SQLite) as a DataFusion catalog — one config line either way
- [ ] Graph database sources (Neo4j / Kuzu) — native federation to unlock graphRAG patterns alongside vector / FTS retrieval

2. Retrieval primitives
- [x] Vector search — `pg_knn` (pgvector), `sqlite_knn` (sqlite-vec), Lance KNN, SeekDB HNSW
- [x] Full-text search — `pg_fts`, `sqlite_fts`, Lance BM25 inverted indexes, SeekDB FULLTEXT
- [x] Hybrid search — RRF merge of FTS + KNN in plain SQL
- [x] Inline embeddings — `candle()` UDF (GGUF / Candle / remote embed APIs) runs directly inside SQL; content + vector stay on the same row atomically
- [x] ONNX inference — `onnx_predict` UDF for inline model predictions in SQL
- [x] Chunking UDF — character / token / markdown / code splitters (via `text-splitter`) so ingestion can chunk inline in SQL
- [ ] Memory primitive — hybrid access + TTL + provenance + consolidation collapsed into one declarative macro

3. Online serving (pipelines)
- [x] Declarative YAML → parameterized REST endpoint with inferred request / response schema
- [x] Built-in pipeline dashboard
- [x] CLI pipeline binding + aliases — `skardi run <pipeline> --param=…` and user-defined verb aliases (#90)
- [x] CLI federated SQL — `skardi query` against files, object stores, data lake formats, and databases with no server required

4. Offline jobs
- [x] Async batch execution with submit / poll / cancel (#98)
- [x] Lance dataset destinations with atomic commit + crash recovery
- [x] SQL-DML destinations (Postgres / MySQL / SQLite)
- [x] SQLite-backed run ledger with submit-time schema diff

5. Agent-facing bindings
- [x] REST — every pipeline served as a parameterized HTTP endpoint
- [x] Shell — every pipeline runnable as a `skardi` command; works in Claude Code, Cursor, and any agent with a Bash tool
- [ ] Skills generator — `skardi skills generate --ctx <ctx.yaml> --out .claude/skills/` emits a skill Markdown per pipeline for Claude Code / Desktop auto-discovery
- [ ] MCP binding — the same pipeline YAML projected to MCP tools for non-Claude hosts

6. Governance & lineage
- [ ] Catalog with semantics — NL `description` on catalog / table / column; an agent-callable `describe` pipeline
- [ ] Lineage capture — `agent_id`, `session_id`, `tool_call_id`, `timestamp` on writes; queryable from metadata tables
- [ ] Agent identity passthrough — any binding injects client identity into a SQL context var pipelines can read
- [ ] Snapshot-as-branch / agent checkpoints — Iceberg / Lance-backed; `git checkout`-like semantics for destructive agent experiments

7. Ops
- [x] Session auth — drop-in user auth via better-auth backed by SQLite
- [x] Observability — OpenTelemetry traces / metrics / logs with a pre-configured Grafana stack
- [x] Docker + pre-built binaries — Linux x86_64 / ARM64, macOS ARM64
- Federated SQL across every major source — CSV, Parquet, JSON, S3 / GCS / Azure, Postgres, MySQL, SQLite, MongoDB, Redis, Iceberg, Lance, SeekDB — all joinable in one query.
- Register by table or by catalog — pick per source: expose a single named table, or load an entire Postgres / MySQL / SQLite database as a DataFusion catalog. One config line either way.
- Vector search — native KNN via Lance, `pg_knn` (pgvector), `sqlite_knn` (sqlite-vec), SeekDB HNSW.
- Full-text search — Lance BM25 inverted indexes, `pg_fts`, `sqlite_fts`, SeekDB native FULLTEXT.
- Inline embeddings — `candle()` UDF (GGUF / Candle / remote embed APIs) directly inside SQL, so content + vector stay on the same row atomically.
- ONNX inference — `onnx_predict` UDF for inline model predictions in SQL.
- Hybrid search — RRF merge of FTS + KNN in plain SQL (see the llm_wiki demo).
- CLI `skardi run <pipeline>` — parameterized pipeline invocation from any shell; works in Claude Code / Cursor / any agent with a Bash tool.
- User-defined aliases — `skardi grep "…"` → `run wiki-search-hybrid`. Collapses multi-line SQL into agent-ergonomic verbs.
- Online serving — YAML → parameterized HTTP endpoint, with an inferred request / response schema and a built-in dashboard.
- Offline jobs — an async pipeline that commits to Lance or a DB destination, with a SQLite run ledger and submit / poll / cancel. (#98)
- Session auth — drop-in user auth via better-auth backed by SQLite.
- Observability — OpenTelemetry traces / metrics / logs with a pre-configured Grafana stack.
- Docker + pre-built binaries — Linux x86_64 / ARM64, macOS ARM64.
Building an agent on top of Skardi, or want to influence the roadmap above? Join us on Discord, file an issue, or open a PR. We read everything.
Apache 2.0 — see LICENSE.

