khaines/deltasharp

DeltaSharp

A fully native .NET reimplementation of Apache Spark — with first-class Delta tables and Kubernetes-native distributed execution. No JVM.

License: Apache 2.0 · Status: early development

🚧 Early development. DeltaSharp is greenfield: the architecture and roadmap are defined (see docs/adr/), but there is no released code yet. The API snippets below show the target API. Contributions and design feedback are very welcome — see Contributing.

What is DeltaSharp?

DeltaSharp brings the Apache Spark programming model — SparkSession, DataFrame/Dataset<T>, columns, and SQL — to idiomatic C#/.NET, executing natively (no JVM bridge). It is built around four pillars:

  1. Apache Spark parity — match Spark's API and execution semantics so code and concepts port over directly.
  2. Native Delta tables — Delta Lake (transaction log, ACID, time travel, schema evolution, deletion vectors, CDF) implemented in .NET.
  3. Kubernetes-native — distributed driver/executor execution managed by a custom Operator and CRDs.
  4. Open source — Apache-2.0, community-driven (ADR-0015).

Why DeltaSharp?

.NET for Apache Spark is a JVM bridge: every DataFrame call is a round-trip into a JVM-hosted Spark, and all execution, memory management, shuffle, and Parquet I/O happen in the JVM. DeltaSharp is a native engine — it implements the optimizer, vectorized execution, shuffle, Delta, and Parquet in .NET, enabling:

  • Native AOT executors with fast cold start and low memory for ephemeral Kubernetes pods (ADR-0014).
  • Vectorized columnar execution with SIMD kernels over Arrow-compatible batches (ADR-0001, ADR-0002).
  • A .NET-native remote shuffle service designed for spot/scale-down resilience (ADR-0004).
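To make "vectorized columnar execution with SIMD kernels" concrete, here is a minimal sketch of an element-wise add over two integer columns using `System.Numerics.Vector<T>`. The `ColumnKernels` class and its `Add` method are hypothetical names chosen for illustration; they are not DeltaSharp APIs, and the real kernels described in the ADRs may look quite different.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration only: not a DeltaSharp type.
public static class ColumnKernels
{
    // Adds two int columns element-wise and writes the sums into dest.
    public static void Add(ReadOnlySpan<int> left, ReadOnlySpan<int> right, Span<int> dest)
    {
        if (left.Length != right.Length || left.Length != dest.Length)
            throw new ArgumentException("Columns must have equal length.");

        int width = Vector<int>.Count; // lanes per hardware vector (platform dependent)
        int i = 0;

        // SIMD body: process one full vector of values per iteration.
        for (; i <= left.Length - width; i += width)
        {
            var a = new Vector<int>(left.Slice(i, width));
            var b = new Vector<int>(right.Slice(i, width));
            (a + b).CopyTo(dest.Slice(i, width));
        }

        // Scalar tail: handle the values that do not fill a whole vector.
        for (; i < left.Length; i++)
            dest[i] = left[i] + right[i];
    }
}
```

The point of the pattern is that the loop operates on a contiguous column rather than row by row, so the hardware can process several values per instruction. Arrays convert implicitly to spans, so `ColumnKernels.Add(a, b, result)` works directly on `int[]` buffers.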

The big idea

DeltaSharp follows Spark's layered model: the API builds an immutable logical plan (lazy), a Catalyst-style analyzer + optimizer (rule-based plus a cost-based optimizer and Adaptive Query Execution) produces a physical plan, and actions trigger distributed execution across executor pods. The defining invariant: transformations are lazy, actions are eager.
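The lazy/eager invariant can be illustrated with a small, self-contained sketch. Everything here (`Plan`, `Scan`, `Filter`, `LazyFrame`) is a hypothetical stand-in invented for this example, not part of DeltaSharp; there is no analyzer, optimizer, or distribution, only the core idea that transformations extend an immutable plan while an action walks it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical plan nodes for illustration only: not DeltaSharp types.
public abstract record Plan;
public sealed record Scan(IReadOnlyList<int> Rows) : Plan;
public sealed record Filter(Plan Child, Func<int, bool> Predicate) : Plan;

public sealed class LazyFrame
{
    // The immutable logical plan accumulated so far.
    public Plan Root { get; }

    public LazyFrame(Plan root) => Root = root;

    // Transformation: wraps the current plan in a Filter node. No rows are read.
    public LazyFrame Where(Func<int, bool> predicate) =>
        new(new Filter(Root, predicate));

    // Action: evaluates the whole plan and materializes the result.
    public List<int> Collect() => Execute(Root).ToList();

    private static IEnumerable<int> Execute(Plan plan) => plan switch
    {
        Scan s => s.Rows,
        Filter f => Execute(f.Child).Where(f.Predicate),
        _ => throw new NotSupportedException(plan.GetType().Name),
    };
}
```

With this sketch, `new LazyFrame(new Scan(new[] { 1, 2, 3, 4 })).Where(x => x % 2 == 0)` only builds a two-node plan; the predicate is not invoked until `Collect()` is called, at which point it runs once per row. Because the plan is plain data until an action runs, an optimizer has the opportunity to rewrite it first.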

Example (target API)

using DeltaSharp.Sql;
using static DeltaSharp.Sql.Functions;

using var spark = SparkSession.Builder()
    .AppName("quickstart")
    .GetOrCreate();

// Read a Delta table, transform lazily, act eagerly.
var df = spark.Read().Format("delta").Load("/data/events");

df.Filter(Col("country") == "US")
  .GroupBy("device")
  .Agg(Count("*").As("events"))
  .OrderBy(Col("events").Desc())
  .Show();

// SQL is a first-class door into the same engine.
spark.Sql("SELECT device, COUNT(*) AS n FROM delta.`/data/events` GROUP BY device")
     .Show();

Architecture & design

Decisions are recorded as Architecture Decision Records (see docs/adr/), which are the source of truth.

Roadmap

See ROADMAP.md — milestones mapped to the ADRs.

Building from source

The DeltaSharp engine targets .NET 10, and the public libraries multi-target net8.0 and net10.0. Once the solution is scaffolded:

dotnet restore
dotnet build -c Release
dotnet test
dotnet format --verify-no-changes

Contributing

We welcome code, tests, docs, triage, and design proposals.

Security

Please report vulnerabilities privately — see SECURITY.md.

License

Apache License 2.0 — see LICENSE and NOTICE. Contributions are accepted under the same license via DCO sign-off.
