feat(executor): physical planning bridge to EPIC-03 backend + local query executor (STORY-04.6.2) #414
…uery executor (STORY-04.6.2)

Rebased onto merged main (#172 optimizer, #173/#177 actions+Row+seam). This lane now consumes the REAL merged Core seam instead of its earlier stand-ins:

- LocalQueryExecutor implements DeltaSharp.Execution.IQueryExecutor (Collect(LogicalPlan)->IReadOnlyList<Row>, Count(LogicalPlan)->long).
- RowMaterializer builds the real DeltaSharp.Row from ColumnBatch results using the analyzed plan's output StructType, null-aware and DataType-mapped.
- ExecutorRegistration wires the real SparkSession.RegisterQueryExecutorFactory hook via InternalsVisibleTo("DeltaSharp.Executor").

The #414 Core-seam stand-ins (Row, IQueryExecutor, UnregisteredQueryExecutor, SparkSession hook edits, InternalsVisibleTo, PublicAPI.Unshipped Row entries) were dropped; those types are owned by merged #173. This PR adds only the Executor-side physical-planning bridge and its tests; no Core changes. The optimizer is intentionally NOT wired into execution (respects #415); LocalQueryExecutor.Collect receives the analyzed plan and does physical planning only.

Closes #174

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
07d1e5e to
dda7aee
…dup-name diagnostic, coverage + docs (STORY-04.6.2 council)

MEDIUM dup-name->UnsupportedPlanException(#419); decimal new-decimal(scale-preserving)+reject scale>28/>96-bit; Date/Timestamp->DateOnly/DateTime(UTC) roundtrip; batch-ownership invariant(#420); LogicalOutput self-check strengthen(name/type, #421); type-matrix + selection + all-null/empty + unsupported-expr tests; nits (dead AnsiMode, TryGetBatches nullable out, no-op discard, ConcurrentDictionary); doc Row->#177 provenance + xrefs + backend-parity reframe

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
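The decimal rule in this commit (scale-preserving construction, rejecting scale > 28 or a magnitude wider than 96 bits) can be illustrated with a minimal Python sketch. It is an analogue, not the PR's C# code: the `(unscaled, scale)` input shape, the `read_decimal` name, and the `UnsupportedPlanError` class are assumptions made for illustration. The 96-bit and scale 0..28 limits are those of .NET's `System.Decimal`.

```python
from decimal import Decimal

MAX_SCALE = 28                  # System.Decimal supports scales 0..28
MAX_UNSCALED = (1 << 96) - 1    # System.Decimal stores a 96-bit integer magnitude


class UnsupportedPlanError(Exception):
    """Hypothetical stand-in for the deterministic UnsupportedPlanException."""


def read_decimal(unscaled: int, scale: int) -> Decimal:
    """Build a decimal from (unscaled, scale), rejecting unrepresentable values."""
    if not 0 <= scale <= MAX_SCALE:
        raise UnsupportedPlanError(f"decimal scale {scale} is outside 0..{MAX_SCALE}")
    if abs(unscaled) > MAX_UNSCALED:
        raise UnsupportedPlanError(f"decimal value {unscaled} does not fit in 96 bits")
    # Construct from (sign, digits, exponent) so no context rounding is applied
    # and trailing zeros (the scale) are preserved exactly.
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(ch) for ch in str(abs(unscaled)))
    return Decimal((sign, digits, -scale))
```

Under these assumptions, `read_decimal(100, 2)` keeps its scale and prints as `1.00`, while a scale of 29 or a magnitude of 2^96 raises the deterministic error instead of a raw overflow.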
…ccurate backend-parity framing + timestamp guard (STORY-04.6.2 council r2)

MEDIUM: PlanDistinct now derives a collision-proof internal probe name (UniqueProbeName, reserved '__distinct_count' with a numeric suffix on the improbable child-schema collision) instead of a hardcoded 'count', so df.GroupBy(x).Count().Distinct() dedups and returns rows (Spark parity) rather than throwing SchemaValidationException on the intermediate [x, count, count] schema.

MEDIUM: reframed the backend-parity check (physical-planning.md §10, LocalQueryExecutor.OptionsFor, EndToEndExecutionTests) as a real interpreted-vs-compiled EXPRESSION-evaluation differential: both selections share InterpretedOperators dispatch, Default resolves to CompiledBackend (ADR-0001 codegen tier, STORY-03.4.2) which fuses scalar expressions via Expression.Compile under dynamic code (identical under AOT). Dropped every closed-#148 citation; operator-level codegen referenced as out of scope (ADR-0001 Follow-ups / EPIC-13, #309/#310).

Hardening: ReadTimestamp now guards the epoch-micros -> DateTime conversion (checked *10 + range check) and throws a deterministic UnsupportedPlanException instead of a raw ArgumentOutOfRangeException / silent mis-decode, mirroring the decimal path. Added a multi-batch accumulation test enforcing the PhysicalRuntime.Run batch-ownership invariant (2+ source batches -> all rows in global order after drain+dispose). Tightened PhysicalRuntime wording (fresh, independently-owned output; ExecutionContext owns an inert lazy spill store today, #420).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
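The collision-proof probe name described in this commit can be sketched as follows. This is a hedged Python illustration, not the PR's UniqueProbeName implementation: the function name, the `_N` suffix format, and the case-sensitive comparison are assumptions (the commit message only says the reserved name gets "a numeric suffix" on collision).

```python
from typing import Iterable

RESERVED_PROBE_NAME = "__distinct_count"  # reserved name quoted in the commit message


def unique_probe_name(child_columns: Iterable[str],
                      reserved: str = RESERVED_PROBE_NAME) -> str:
    """Return a column name guaranteed not to collide with the child schema."""
    taken = set(child_columns)  # assumption: names compare case-sensitively
    if reserved not in taken:
        return reserved
    # Improbable path: the child already uses the reserved name, so append
    # the first numeric suffix that is still free.
    suffix = 1
    while f"{reserved}_{suffix}" in taken:
        suffix += 1
    return f"{reserved}_{suffix}"
```

With a child schema of `[x, count]` (the GroupBy(x).Count() output), the probe column becomes `__distinct_count`, so the intermediate schema no longer contains two `count` columns.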
…p out-of-range guard

Round-3 council fix for PR #414 (STORY-04.6.2 / #174). Writer and QueryExec re-score seats both independently held at 4/5 on the round-2 backend-parity reframe, which overstated the executor end-to-end check (InterpretedAndDefaultBackends_ProduceIdenticalRows) as a "genuine interpreted-vs-compiled expression differential." That is false against the code: both CompiledBackend.Open and InterpretedVectorizedBackend.Open delegate to the same InterpretedOperators.Open, which always builds interpreted ExpressionEvaluators (backend name only attributes exceptions). CompiledBackend's Expression.Compile scalar fusion (STORY-03.4.2) is not wired into the operator Open() path, so both selections currently run byte-identical interpreted code: a plumbing/smoke cross-check, not a differential. The genuine expression-level differential lives in the Engine BackendParityOracle (#154), which calls BuildExpressionEvaluator directly.

Reverts the three framing sites to the accurate wording (design doc §10, LocalQueryExecutor.OptionsFor comment, EndToEndExecutionTests comment), keeping the corrected forward trackers (EPIC-13 / #309/#310, no stale #148).

Also documents the already-implemented TimestampType out-of-range guard (RowMaterializer.ReadTimestamp -> deterministic UnsupportedPlanException, backed by Timestamp_OutOfDateTimeRange_ThrowsDeterministicUnsupported) in design doc §7/§8, mirroring the decimal not-representable diagnostic (Writer Finding 2).

Docs/comments only; no code behavior change. Build clean both TFMs (-warnaserror), format clean, Executor 47/47, PublicAPI delta empty.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
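The TimestampType out-of-range guard (a checked multiply by 10 followed by a range check) can be modelled with a short Python sketch. It is an illustration under stated assumptions, not the RowMaterializer.ReadTimestamp code: Python integers do not overflow, so the checked multiply is emulated with an explicit 64-bit bound, and the `read_timestamp` and `UnsupportedPlanError` names are hypothetical. The tick constants are the .NET DateTime values (100 ns ticks counted from 0001-01-01).

```python
from datetime import datetime, timedelta, timezone

INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1
UNIX_EPOCH_TICKS = 621_355_968_000_000_000   # DateTime ticks at 1970-01-01T00:00:00Z
MAX_TICKS = 3_155_378_975_999_999_999        # DateTime.MaxValue.Ticks
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class UnsupportedPlanError(Exception):
    """Hypothetical stand-in for the deterministic UnsupportedPlanException."""


def read_timestamp(epoch_micros: int) -> datetime:
    """Convert epoch microseconds to a UTC datetime, or fail deterministically."""
    ticks_since_epoch = epoch_micros * 10  # 1 microsecond = 10 ticks of 100 ns
    # Emulates the checked multiply: reject what would overflow a 64-bit long.
    if not INT64_MIN <= ticks_since_epoch <= INT64_MAX:
        raise UnsupportedPlanError(f"timestamp {epoch_micros} us overflows 64-bit ticks")
    # Range check against the representable DateTime window.
    ticks = UNIX_EPOCH_TICKS + ticks_since_epoch
    if not 0 <= ticks <= MAX_TICKS:
        raise UnsupportedPlanError(f"timestamp {epoch_micros} us is outside the DateTime range")
    return EPOCH + timedelta(microseconds=epoch_micros)
```

Doing both checks before constructing the value is what turns a raw range exception (or a silent mis-decode after wraparound) into one deterministic diagnostic that names the offending input.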
…och-day (raw-exception leak)

Red-team MISS-FOUND on PR #414 (STORY-04.6.2 / #174). RowMaterializer.ReadDate had the same raw-exception-leak class the council fixed for Timestamp and Decimal, but the guard was never applied to Date: an epoch-day whose date falls outside DateOnly's representable range (e.g. int.MaxValue, ~5.9M years past 1970) made UnixEpochDate.AddDays(epochDay) throw a raw System.ArgumentOutOfRangeException straight to the caller, breaking the bridge's contract that an unrepresentable materialized value must surface a deterministic UnsupportedPlanException (as Timestamp and Decimal already do).

Fix mirrors the ReadTimestamp guard exactly: catch the ArgumentOutOfRangeException from DateOnly.AddDays and rethrow OutOfRangeDate -> deterministic UnsupportedPlanException naming the offending epoch-day and DateOnly.

Materializer sweep confirms this was the only remaining unguarded path: Boolean/Byte/numeric are exact primitive reads, Binary is a byte copy, String uses Encoding.UTF8.GetString (replacement fallback, never throws on bad bytes), and Decimal/Timestamp were already guarded.

Adds Date_OutOfDateOnlyRange_ThrowsDeterministicUnsupported (int.MaxValue) and Date_MinIntEpochDay_ThrowsDeterministicUnsupported (int.MinValue) proving the guard, and documents the Date out-of-range diagnostic in design doc §7/§8 alongside the Timestamp/Decimal ones.

Executor-only, no Core/Engine change. Build clean both TFMs (-warnaserror), format clean, Executor 49/49 (was 47), PublicAPI delta empty.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
…-over-eager)

Codifies the positive-boundary property the council + red-team verified for the ReadDate out-of-range guard (Quality re-score nit): the extreme IN-RANGE epoch-days materialize to the correct calendar date, while one day past either bound is a deterministic UnsupportedPlanException. DateOnly's representable window (0001-01-01..9999-12-31) is exactly epoch-days [-719162, 2932896] (days since 1970-01-01); the test asserts both extremes round-trip to DateOnly.Min/MaxValue and that maxEpochDay+1 / minEpochDay-1 each throw.

Guards against a future refactor making the guard over-eager (rejecting valid dates), the exact concern several council seats verified analytically (QueryExec: full-int-range proof; Columnar: exhaustive 2^32 sweep, 0 silent in-range returns).

Test-only, no production change (production behavior certified at 0b84922: unanimous 7x5/5 council + red-team NO-MISS-CERTIFIED). Build clean both TFMs (-warnaserror), Executor 50/50 (was 49), format clean, PublicAPI delta empty.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
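The boundary property above can be checked independently with a small Python sketch. Python's `datetime.date` happens to cover the same 0001-01-01..9999-12-31 window as DateOnly, so it works as an analogue; the `read_date` and `UnsupportedPlanError` names are hypothetical and this is not the PR's C# test.

```python
from datetime import date, timedelta

UNIX_EPOCH_DATE = date(1970, 1, 1)
# Epoch-day bounds of the representable window, derived rather than hard-coded.
MIN_EPOCH_DAY = (date.min - UNIX_EPOCH_DATE).days  # -719162
MAX_EPOCH_DAY = (date.max - UNIX_EPOCH_DATE).days  # 2932896


class UnsupportedPlanError(Exception):
    """Hypothetical stand-in for the deterministic UnsupportedPlanException."""


def read_date(epoch_day: int) -> date:
    """Materialize an epoch-day, converting the raw range error into a stable one."""
    try:
        # Both the timedelta constructor and the addition raise OverflowError
        # when the result cannot be represented.
        return UNIX_EPOCH_DATE + timedelta(days=epoch_day)
    except OverflowError as exc:
        raise UnsupportedPlanError(
            f"epoch-day {epoch_day} is outside the representable date range"
        ) from exc
```

Asserting that both extremes round-trip, and that one day past either bound fails, pins the guard from both sides: it must not leak the raw exception, and it must not reject valid dates.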
🟢 Review-Fix-Loop: PASS (unanimous 7×5/5 + red-team NO-MISS-CERTIFIED, orchestrator anti-forgery re-verified)

PR #414: This PR delivers the physical-planning bridge that lowers the Core logical plan → EPIC-03 columnar operators and materializes results back to Core

Progression
Council composition (verified)
Spine = Opus 4.8 ×5 + GPT-5.5 (Quality); red-team decorrelated on Gemini (a family used by no voting seat). All execution-eligible (C7) claims were RUN in scratch clones outside the worktree.

Findings (all resolved)

🟠 HIGH: Raw-exception leak in
Summary
STORY-04.6.2 (#174): the physical-planning bridge that maps Core's LogicalPlan → Engine's EPIC-03 executable operators, making a DeltaSharp query run end-to-end for the first time.

- PhysicalPlanner + PhysicalPlan model (src/DeltaSharp.Executor/Physical/): optimized/analyzed LogicalPlan → tree of physical operators, one strategy per supported M1 node.
- IExecutionBackend (interpreted vectorized default per ADR-0001), pulling ColumnBatches.
- ColumnBatch → Row materialization (RowMaterializer): null-aware, DataType-mapped per ADR-0002.
- LocalQueryExecutor : IQueryExecutor registered into SparkSession via a [ModuleInitializer], so a session in any app referencing DeltaSharp.Executor executes for real.
- createDataFrame is #158 (STORY-04.1.2: Read door and DataFrame creation from local inputs), deferred.
- docs/engineering/design/physical-planning.md (doc-first).

LogicalPlan node → EPIC-03 operator mapping
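| LogicalPlan node | Physical plan | EPIC-03 operator |
| --- | --- | --- |
| … | ScanPlan | InMemoryScanOperator |
| Project | ProjectPlan | ProjectOperator |
| Filter | FilterPlan | FilterOperator |
| Aggregate | AggregatePlan | AggregateOperator |
| Join (equi) | JoinPlan | JoinOperator |
| Sort | SortPlan | SortOperator |
| Limit | LimitPlan | (bridge) SelectionVector |
| Distinct | ProjectPlan(AggregatePlan(group-by-all, COUNT(*))) | AggregateOperator |
| Union | UnionPlan | (bridge) … ProjectOperator) |
| … | … | UnsupportedPlanException (deterministic) |

AC → test map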
- PhysicalPlanShapeTests (shape per node) + EndToEndExecutionTests (FilterThenProject, GroupByAgg, InnerJoin, OrderByDescThenLimit, Limit, Distinct, Union, compose) asserting exact Row values + schema over the real backend.
- UnsupportedPlanTests (cross join, no-condition cartesian, theta predicate; determinism check).
- EndToEndExecutionTests.Count_MatchesCollectCount_*.
- InterpretedAndDefaultBackends_ProduceIdenticalRows.
- SessionRegistrationTests (session resolves LocalQueryExecutor; executes end-to-end through it).

Validation
- dotnet build -c Release -warnaserror: clean (0/0).
- dotnet test: all green. Executor 29, Engine 2781, Core 573 (×net8.0/net10.0). No regressions.
- dotnet format --verify-no-changes: clean.
- … net10.0, non-packable (no PublicAPI baseline there).

… origin/main)

- … Row, internal IQueryExecutor, SparkSession registration hook (src/DeltaSharp.Core/Execution/, SparkSession.cs). Confirm exact signatures against merged #173 (STORY-04.6.1: Collect, count, and show actions) and drop stand-ins if #173 owns them.
- … DeltaSharp.Row.* in PublicAPI.Unshipped.txt): Row belongs to #173; expected to become empty after rebase.
- … LogicalOutput.cs) mirrors the analyzer's non-idempotent ExprId minting, the top fragility/rebase risk against the optimizer from #172 (STORY-04.5.3: Minimal logical optimization rules for local execution). Self-checking: a resolution miss throws UnsupportedPlanException. See design doc §5/§10.

Closes #174