A fully native .NET reimplementation of Apache Spark — with first-class Delta tables and Kubernetes-native distributed execution. No JVM.
🚧 Early development. DeltaSharp is greenfield: the architecture and roadmap are defined (see
docs/adr/), but there is no released code yet. The API snippets below show the target API. Contributions and design feedback are very welcome — see Contributing.
DeltaSharp brings the Apache Spark programming model — SparkSession,
DataFrame/Dataset<T>, columns, and SQL — to idiomatic C#/.NET, executing
natively (no JVM bridge). It is built around four pillars:
- Apache Spark parity — match Spark's API and execution semantics so code and concepts port over directly.
- Native Delta tables — Delta Lake (transaction log, ACID, time travel, schema evolution, deletion vectors, CDF) implemented in .NET.
- Kubernetes-native — distributed driver/executor execution managed by a custom Operator and CRDs.
- Open source — Apache-2.0, community-driven (ADR-0015).
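
To make the Delta pillar concrete, here is a minimal sketch of log replay, the mechanism behind snapshots and time travel in the Delta protocol. It is not DeltaSharp code. `LogReplay` and `ActiveFiles` are hypothetical names, and the commits are in-memory stand-ins for the JSON files under `_delta_log/`. Only `add` and `remove` actions are handled; metadata, protocol actions, checkpoints, and deletion vectors are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Three in-memory commits standing in for _delta_log/<version>.json files.
// Each string is one JSON action line, as in a Delta commit file.
string[][] commits =
{
    new[] { """{"add":{"path":"part-0.parquet"}}""", """{"add":{"path":"part-1.parquet"}}""" },
    new[] { """{"remove":{"path":"part-0.parquet"}}""", """{"add":{"path":"part-2.parquet"}}""" },
    new[] { """{"add":{"path":"part-3.parquet"}}""" },
};

// Latest snapshot: part-1.parquet, part-2.parquet, part-3.parquet
Console.WriteLine(string.Join(", ", LogReplay.ActiveFiles(commits, 2)));
// Time travel to version 0: part-0.parquet, part-1.parquet
Console.WriteLine(string.Join(", ", LogReplay.ActiveFiles(commits, 0)));

// Hypothetical helper, for illustration only.
static class LogReplay
{
    // Replays commits 0..version in order and returns the data files
    // that are live in that snapshot.
    public static SortedSet<string> ActiveFiles(IReadOnlyList<string[]> commits, long version)
    {
        var active = new SortedSet<string>(StringComparer.Ordinal);
        for (var v = 0; v <= version && v < commits.Count; v++)
        {
            foreach (var line in commits[v])
            {
                using var action = JsonDocument.Parse(line);
                var root = action.RootElement;
                if (root.TryGetProperty("add", out var add))
                    active.Add(add.GetProperty("path").GetString()!);
                else if (root.TryGetProperty("remove", out var remove))
                    active.Remove(remove.GetProperty("path").GetString()!);
            }
        }
        return active;
    }
}
```

Because a snapshot is a pure function of the ordered commits, reading an older version only means stopping the replay earlier.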
.NET for Apache Spark is a JVM bridge: every DataFrame call is a round-trip into a JVM-hosted Spark, and all execution, memory management, shuffle, and Parquet I/O happen in the JVM. DeltaSharp is a native engine — it implements the optimizer, vectorized execution, shuffle, Delta, and Parquet in .NET, enabling:
- Native AOT executors with fast cold start and low memory for ephemeral Kubernetes pods (ADR-0014).
- Vectorized columnar execution with SIMD kernels over Arrow-compatible batches (ADR-0001, ADR-0002).
- A .NET-native remote shuffle service designed for spot/scale-down resilience (ADR-0004).
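
As an illustration of what a vectorized kernel can look like in .NET, the sketch below computes a filtered sum over a contiguous column buffer with `System.Numerics.Vector<T>`. It is not taken from DeltaSharp. `Kernels` and `SumGreaterThan` are hypothetical names, and the buffer is assumed to contain no nulls (a real Arrow-style column would also carry a validity bitmap).

```csharp
using System;
using System.Numerics;

long[] amounts = { 5, 12, 7, 30, 1, 18, 9, 25, 3, 40 };
Console.WriteLine(Kernels.SumGreaterThan(amounts, 10)); // prints 125

// Hypothetical kernel, for illustration only.
static class Kernels
{
    // Sums every value strictly greater than threshold, one SIMD vector at a time.
    public static long SumGreaterThan(ReadOnlySpan<long> values, long threshold)
    {
        var limit = new Vector<long>(threshold);   // threshold broadcast to all lanes
        var acc = Vector<long>.Zero;               // per-lane running sums
        var width = Vector<long>.Count;            // lanes per vector on this hardware
        var i = 0;

        for (; i <= values.Length - width; i += width)
        {
            var lane = new Vector<long>(values.Slice(i, width));
            var mask = Vector.GreaterThan(lane, limit);  // all bits set where lane > limit
            acc += Vector.ConditionalSelect(mask, lane, Vector<long>.Zero);
        }

        var total = Vector.Sum(acc);               // horizontal add of the lanes

        // Scalar tail for the elements that did not fill a whole vector.
        for (; i < values.Length; i++)
        {
            if (values[i] > threshold) total += values[i];
        }
        return total;
    }
}
```

The predicate becomes a mask and a select rather than a per-row branch, which is the usual way to keep such loops branch-free.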
DeltaSharp follows Spark's layered model. The API lazily builds an immutable logical plan; a Catalyst-style analyzer and optimizer (rule-based rewrites, a cost-based optimizer, and Adaptive Query Execution) turns it into a physical plan; and actions trigger distributed execution across executor pods. The defining invariant: transformations are lazy, actions are eager.
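
The lazy/eager invariant can be shown with a toy model. The sketch below is not DeltaSharp's API. `Plan`, `Scan`, `Filter`, `Optimizer`, and `Executor` are hypothetical names, and a single rewrite rule (merging adjacent filters) stands in for a full rule-based optimizer. Transformations only build immutable plan nodes; the source is read once, when the action runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var scans = 0;
// Hypothetical source; the counter shows when data is actually read.
IEnumerable<int> Source() { scans++; return Enumerable.Range(1, 10); }

// Transformations: each call returns a new plan node and reads nothing.
var plan = new Scan(Source)
    .Where(x => x % 2 == 0)
    .Where(x => x > 4);
Console.WriteLine(scans);                     // prints 0

// Optimize, then run the action, which is the only step that touches data.
var optimized = Optimizer.MergeFilters(plan);
var rows = Executor.Collect(optimized);
Console.WriteLine(string.Join(",", rows));    // prints 6,8,10
Console.WriteLine(scans);                     // prints 1

// Immutable logical plan nodes.
abstract record Plan
{
    public Plan Where(Func<int, bool> predicate) => new Filter(this, predicate);
}
sealed record Scan(Func<IEnumerable<int>> Read) : Plan;
sealed record Filter(Plan Child, Func<int, bool> Predicate) : Plan;

static class Optimizer
{
    // Rule: Filter(Filter(child, p), q) becomes Filter(child, p && q).
    public static Plan MergeFilters(Plan plan) => plan switch
    {
        Filter { Child: Filter inner } outer =>
            MergeFilters(new Filter(inner.Child, x => inner.Predicate(x) && outer.Predicate(x))),
        Filter f => new Filter(MergeFilters(f.Child), f.Predicate),
        _ => plan,
    };
}

static class Executor
{
    // Action: evaluates the plan and materializes the result.
    public static List<int> Collect(Plan plan) => Run(plan).ToList();

    static IEnumerable<int> Run(Plan plan) => plan switch
    {
        Scan s => s.Read(),
        Filter f => Run(f.Child).Where(f.Predicate),
        _ => throw new NotSupportedException(plan.GetType().Name),
    };
}
```

Because the plan is plain data until an action runs, an optimizer is free to rewrite it first; that is what makes the lazy side of the invariant useful.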

    using DeltaSharp.Sql;
    using static DeltaSharp.Sql.Functions;

    using var spark = SparkSession.Builder()
        .AppName("quickstart")
        .GetOrCreate();

    // Read a Delta table, transform lazily, act eagerly.
    var df = spark.Read().Format("delta").Load("/data/events");
    df.Filter(Col("country") == "US")
        .GroupBy("device")
        .Agg(Count("*").As("events"))
        .OrderBy(Col("events").Desc())
        .Show();

    // SQL is a first-class door into the same engine.
    spark.Sql("SELECT device, COUNT(*) AS n FROM delta.`/data/events` GROUP BY device")
        .Show();

Decisions are recorded as Architecture Decision Records, which are the source of truth:
- docs/adr/: 15 ADRs (execution, columnar format, transport, shuffle, catalog, optimizer/AQE, SQL, types, operator, streaming, Delta protocol, plan serialization, memory, target framework, OSS).
- docs/engineering/design/engine-architecture.md: the overview with diagrams.
- .github/copilot-instructions.md: the conventions summary.
See ROADMAP.md — milestones mapped to the ADRs.
DeltaSharp targets .NET 10 (engine) and multi-targets public libraries
(net8.0;net10.0). Once the solution is scaffolded:

    dotnet restore
    dotnet build -c Release
    dotnet test
    dotnet format --verify-no-changes

We welcome code, tests, docs, triage, and design proposals.
- CONTRIBUTING.md — dev setup, DCO sign-off, PR process.
- CODE_OF_CONDUCT.md — Contributor Covenant.
- GOVERNANCE.md — how decisions are made and how to become a maintainer.
- docs/rfcs/ — the RFC process for substantial changes.
Please report vulnerabilities privately — see SECURITY.md.
Apache License 2.0 — see LICENSE and NOTICE. Contributions are accepted under the same license via DCO sign-off.