feat(core): collect/count/show actions and Row materialization (STORY-04.6.1, STORY-04.7.1) #412
…-04.6.1, STORY-04.7.1)

Add the first DataFrame actions (Collect/Count/Show) and the public Row materialization contract, wired through a dependency-inversion IQueryExecutor seam so packable net8.0;net10.0 Core drives execution without referencing the net10.0-only engine.

- Row (public, #177): schema-carrying, immutable; ordinal + by-name access, IsNullAt/AnyNull, GetAs<T>, FieldIndex, ToString; deterministic error model and ADR-0008 null/type semantics.
- IQueryExecutor (internal seam) + UnsupportedQueryExecutor default that throws a clear public QueryExecutionException until DeltaSharp.Executor (#174) registers a backend. Core grants InternalsVisibleTo("DeltaSharp.Executor").
- SparkSession holds a catalog + IQueryExecutor with a per-session setter and a process-wide RegisterQueryExecutorFactory hook for #174.
- DataFrame.Collect/Count/Show (+ internal ShowString): analyze -> optimize seam (#172, identity today) -> executor; Show renders a Spark-style table without mutating the plan. Session propagates through all transformations.
- Lazy/eager proven via the #169 audit seam: transformations record no stages; each action records exactly one Analyzer stage (Analyzer->Planner->Backend).
- Design doc docs/engineering/design/actions-and-row.md; PublicAPI.Unshipped updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
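The fail-closed seam described in this commit can be sketched roughly as follows. This is a minimal illustration only: the type names come from the commit message, but the `Execute` signature, its string plan argument, and the row shape are hypothetical placeholders, since the real shapes are documented in the design doc rather than here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Public exception type named in the commit; the constructor shape is assumed.
public sealed class QueryExecutionException : Exception
{
    public QueryExecutionException(string message) : base(message) { }
}

// Internal seam: Core depends on this abstraction, never on the engine.
// The Execute signature is a hypothetical stand-in for illustration.
internal interface IQueryExecutor
{
    IReadOnlyList<object?[]> Execute(string plan);
}

// Default executor: fails closed with a clear public exception
// until a real backend is registered.
internal sealed class UnsupportedQueryExecutor : IQueryExecutor
{
    public IReadOnlyList<object?[]> Execute(string plan) =>
        throw new QueryExecutionException(
            "No query executor is registered; reference DeltaSharp.Executor.");
}
```

The point of the default is that an action called without a backend surfaces one deterministic, public error instead of a null reference or a missing-assembly failure.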
1cb3080 to d81b877
… thread-safe executor, correct optimizer-seam docs (STORY-04.6.1/04.7.1 council)

A. Limit/Distinct now thread Session (single-arg ctor dropped it -> bound-frame actions threw); table-driven guard asserts every transformation keeps the same non-null session and stays executable.
B. QueryExecutor getter publishes atomically via Volatile.Read + Interlocked.CompareExchange; _queryExecutorFactory marked volatile; fail-closed default preserved.
C. Row gains structural Equals/GetHashCode (schema + null-aware ordinal element equality).
D. Optimizer-seam narrative corrected: standalone #172 Optimizer already merged; action-pipeline Optimize is an intentional identity pass in M1, wiring deferred to #174 gated on #415; AC1 phrasing fixed (no Optimizer ExecutionStage).
E. GetAs out-of-range + by-name missing-field tests.
F. Stopped-session guard in RequireSession, arch-guard also bans Executor refs, Show header derived from analyzed output schema (empty result still renders headers), PublicAPI sorted, doc trivia.

Tracks #416/#417/#418.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
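Item B's atomic publication can be illustrated with a small generic holder. This is a sketch under assumptions, not the PR's code: `LazyPublished<T>` and its members are hypothetical names, and only `Volatile.Read` and `Interlocked.CompareExchange` are taken from the commit message.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical holder: lazily creates a value and publishes it atomically.
public sealed class LazyPublished<T> where T : class
{
    private readonly Func<T> _factory;
    private T? _value;

    public LazyPublished(Func<T> factory) =>
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));

    public T Value
    {
        get
        {
            // Fast path: an already-published value is read with acquire semantics.
            T? current = Volatile.Read(ref _value);
            if (current is not null) return current;

            // Slow path: racing threads may each build a candidate...
            T candidate = _factory();

            // ...but only the first CompareExchange stores one. It returns the
            // previous slot content: null means we won, otherwise the winner.
            return Interlocked.CompareExchange(ref _value, candidate, null) ?? candidate;
        }
    }
}
```

Under contention the factory may run more than once, but every caller observes the same published instance and the losing candidates are discarded.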
…uality + hash coverage, doc/init nits (STORY-04.6.1/04.7.1 council r2)
1. Show derives its header from a dup-name-tolerant (name,type,nullable) output list (Analyzer.Resolve out overload) instead of the dup-rejecting StructType ctor, so df.Join(other).Show() and df.Select(Col("name"),Col("name")).Show() render duplicate headers like Spark where Collect/Count already succeed; Row-materialization policy tracked by #419.
2. Add distinct-hash test pinning Row.GetHashCode against a constant-hash mutation.
3. Row.Equals/GetHashCode compare byte[] (BinaryType) by content (Spark parity); nested-type deep equality folded into #418.
4. Doc Row equality as schema-inclusive, intentionally stricter than Spark's values-only Row.equals.
5. Row IReadOnlyList ctor shares one single-clone init path.
6. Soften factory 'never double-invokes' doc/comment to 'publishes exactly one executor (loser discarded)'.
7. Guard test covers GroupBy.Count/GroupBy.Agg via RelationalGroupedDataset.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
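Item 3's content-based byte[] comparison can be sketched as below. The helper name `RowValueComparer` and its methods are hypothetical, and the sketch compares values only; the schema-inclusive part of Row equality described in item 4 is deliberately left out.

```csharp
#nullable enable
using System;

// Hypothetical helper: null-aware, ordinal element equality where
// byte[] values compare by content rather than by reference.
public static class RowValueComparer
{
    public static bool ValuesEqual(object?[] left, object?[] right)
    {
        if (left.Length != right.Length) return false;
        for (int i = 0; i < left.Length; i++)
        {
            if (!ElementEquals(left[i], right[i])) return false;
        }
        return true;
    }

    private static bool ElementEquals(object? a, object? b)
    {
        if (a is null || b is null) return a is null && b is null;
        if (a is byte[] x && b is byte[] y) return x.AsSpan().SequenceEqual(y);
        return a.Equals(b);
    }

    // Hash must agree with equality, so byte[] contributes its bytes,
    // not its reference identity.
    public static int ValuesHash(object?[] values)
    {
        var hash = new HashCode();
        foreach (object? value in values)
        {
            if (value is byte[] bytes) hash.AddBytes(bytes);
            else hash.Add(value);
        }
        return hash.ToHashCode();
    }
}
```

Keeping the hash in step with equality matters here: two rows holding distinct byte[] instances with identical content must land in the same hash bucket.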
… council nit)

Mirrors the Collect/Count single-analyze-stage audits; confirms Show's header-from-analyzed-schema derives columns from the one ResolveCore pass (no second Analyzer stage). QueryExec nit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
🌟 Review-Fix-Loop Report: PR #412 (STORY-04.6.1 #173 + STORY-04.7.1 #177)

Run identity · HEAD

Council composition (verified)

Progression (2 fix rounds)

Findings & dispositions

🔴 HIGH (all 7 seats independently found it):
🟠 MEDIUM (fixed):
🔵 LOW/coverage (fixed):
🟠 Deferrals (filed + orchestrator-verified OPEN): #415 (engine eager

Adversarial red-team (gemini-3.1-pro-preview, decorrelated, shell): NO-MISS-CERTIFIED

Falsification attempts all failed: a reflection enumeration of every

Orchestrator anti-forgery: independently confirmed source-exhaustively that all 13 frame constructions thread

Validation evidence

Recommendation: APPROVE / merge-ready (squash).
…uery executor (#414)

Delivers the physical-planning bridge that lowers the Core logical plan to EPIC-03 columnar operators and materializes results back to Core Rows: DeltaSharp's first end-to-end local query execution (Collect/Count over an in-memory relation), the final piece of Batch N (after #411 optimizer + #412 actions/Row). Executor-only (Core/Engine untouched, PublicAPI delta empty).

Every unmaterializable value or unmapped plan node surfaces a deterministic UnsupportedPlanException; decimal is scale-preserving, Date/Timestamp round-trip the CLR temporal types lit() produces, and Distinct lowering uses a collision-proof probe name.

Reviewed to a unanimous 7x5/5 council + decorrelated red-team NO-MISS-CERTIFIED (which caught a raw-exception leak in ReadDate the voting council missed) + orchestrator anti-forgery. Deferrals tracked: #419, #420, #421, #422.

Closes #174

Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implements the first DataFrame actions (Collect/Count/Show) and the public Row materialization contract, together as one design-doc-first change.

Closes #173
Closes #177
What landed
- Row (public, #177, STORY-04.7.1: Row and schema materialization): schema-carrying, immutable; ordinal + by-name access, IsNullAt/AnyNull, GetAs<T>, FieldIndex, ToString. Null-aware, deterministic error model.
- IQueryExecutor (internal dependency-inversion seam) + UnsupportedQueryExecutor default that throws a clear public QueryExecutionException ("reference DeltaSharp.Executor") until #174 (STORY-04.6.2: Physical planning bridge to EPIC-03 backend) registers a real backend. Core grants InternalsVisibleTo("DeltaSharp.Executor").
- SparkSession holds a catalog + IQueryExecutor, with a per-session setter and a process-wide RegisterQueryExecutorFactory hook for #174.
- DataFrame.Collect()/Count()/Show() (+ internal ShowString): analyze → optimize seam (#172, STORY-04.5.3: Minimal logical optimization rules for local execution; identity today) → executor; Show renders a Spark-style table without mutating the plan. Session propagates through all transformations.

The invariant (proven)
Transformations stay lazy, actions are eager. Verified via the #169 ExecutionAudit seam: a transformation chain records 0 Analyzer stages; each action records exactly 1 (Analyzer→Planner→Backend).

Layering seam for #174
DeltaSharp.Executor (physical planning, sibling lane) implements IQueryExecutor and materializes Engine ColumnBatch → Row. Exact shapes documented in docs/engineering/design/actions-and-row.md.

Validation
- dotnet build -c Release -warnaserror: clean, net8.0 + net10.0
- dotnet test: Core 611/611 both TFMs; full solution green
- dotnet format --verify-no-changes: clean
- PublicAPI.Unshipped.txt updated (expected diff)

Design doc: docs/engineering/design/actions-and-row.md
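The lazy/eager invariant stated in this description can be shown in miniature. Everything in this sketch is a hypothetical stand-in (`AuditLog`, `MiniFrame`, and their members are not DeltaSharp APIs): transformations only extend a stored plan, while an action records exactly one analyzer stage and then executes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an audit seam: counts analyzer stages.
public sealed class AuditLog
{
    public int AnalyzerStages { get; private set; }
    public void RecordAnalyzerStage() => AnalyzerStages++;
}

// Hypothetical miniature frame over integers.
public sealed class MiniFrame
{
    private readonly IReadOnlyList<int> _source;
    private readonly IReadOnlyList<Func<int, bool>> _filters;
    private readonly AuditLog _audit;

    public MiniFrame(IReadOnlyList<int> source, AuditLog audit)
        : this(source, Array.Empty<Func<int, bool>>(), audit) { }

    private MiniFrame(
        IReadOnlyList<int> source,
        IReadOnlyList<Func<int, bool>> filters,
        AuditLog audit)
    {
        _source = source;
        _filters = filters;
        _audit = audit;
    }

    // Transformation: returns a new frame with a longer plan; nothing runs,
    // and the audit log is carried along unchanged.
    public MiniFrame Where(Func<int, bool> predicate) =>
        new MiniFrame(_source, _filters.Append(predicate).ToArray(), _audit);

    // Action: records one analyzer stage, then evaluates the whole plan.
    public long Count()
    {
        _audit.RecordAnalyzerStage();
        return _source.LongCount(value => _filters.All(filter => filter(value)));
    }
}
```

A test in this style builds a transformation chain, asserts the audit count is still 0, runs one action, and asserts the count is exactly 1.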