feat(core): collect/count/show actions and Row materialization (STORY-04.6.1, STORY-04.7.1)#412

Merged: khaines merged 4 commits into main from khaines/feat-04.6.1-actions on Jul 3, 2026


khaines (Owner) commented Jul 3, 2026

Implements the first DataFrame actions (Collect/Count/Show) and the public Row materialization contract, together as one design-doc-first change.

Closes #173
Closes #177

What landed

The invariant (proven)

Transformations stay lazy; actions are eager. This is verified through the #169 ExecutionAudit seam: a transformation chain records 0 Analyzer stages, and each action records exactly 1 (Analyzer→Planner→Backend).
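
As a rough illustration of the invariant above, the sketch below models a frame whose transformations only compose a deferred plan, while each action bumps an audit counter once and then executes. `MiniFrame` and `Audit` are hypothetical stand-ins invented for this example; they are not DeltaSharp's DataFrame or ExecutionAudit API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical audit: counts how many times an action triggered analysis.
public sealed class Audit
{
    public int AnalyzerStages { get; private set; }
    public void RecordAnalyzerStage() => AnalyzerStages++;
}

// Hypothetical frame over ints: the "plan" is just a deferred function.
public sealed class MiniFrame
{
    private readonly IEnumerable<int> _source;
    private readonly Func<IEnumerable<int>, IEnumerable<int>> _plan;
    private readonly Audit _audit;

    public MiniFrame(IEnumerable<int> source, Audit audit)
        : this(source, rows => rows, audit) { }

    private MiniFrame(IEnumerable<int> source,
                      Func<IEnumerable<int>, IEnumerable<int>> plan,
                      Audit audit)
    {
        _source = source;
        _plan = plan;
        _audit = audit;
    }

    // Transformations: compose the plan only; nothing is analyzed or run.
    public MiniFrame Filter(Func<int, bool> predicate)
    {
        var plan = _plan;
        return new MiniFrame(_source, rows => plan(rows).Where(predicate), _audit);
    }

    public MiniFrame Limit(int n)
    {
        var plan = _plan;
        return new MiniFrame(_source, rows => plan(rows).Take(n), _audit);
    }

    // Actions: record exactly one analyzer stage, then execute eagerly.
    public List<int> Collect()
    {
        _audit.RecordAnalyzerStage();
        return _plan(_source).ToList();
    }

    public long Count()
    {
        _audit.RecordAnalyzerStage();
        return _plan(_source).LongCount();
    }
}

public static class LazyEagerDemo
{
    public static void Main()
    {
        var audit = new Audit();
        var frame = new MiniFrame(Enumerable.Range(1, 10), audit)
            .Filter(x => x % 2 == 0)
            .Limit(3);
        Console.WriteLine(audit.AnalyzerStages);          // 0: chain stayed lazy
        Console.WriteLine(string.Join(",", frame.Collect())); // 2,4,6
        Console.WriteLine(audit.AnalyzerStages);          // 1: one stage per action
    }
}
```

The counter is shared by every frame derived from the same root, which is roughly what lets a test assert "zero stages after the chain, one more after each action".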

Layering seam for #174

DeltaSharp.Executor (physical planning, sibling lane) implements IQueryExecutor and materializes Engine ColumnBatch → Row. The exact shapes are documented in docs/engineering/design/actions-and-row.md.

Validation

  • dotnet build -c Release -warnaserror — clean, net8.0 + net10.0
  • dotnet test — Core 611/611 both TFMs; full solution green
  • dotnet format --verify-no-changes — clean
  • PublicAPI.Unshipped.txt updated (expected diff)

Design doc: docs/engineering/design/actions-and-row.md

…-04.6.1, STORY-04.7.1)

Add the first DataFrame actions (Collect/Count/Show) and the public Row
materialization contract, wired through a dependency-inversion IQueryExecutor
seam so packable net8.0;net10.0 Core drives execution without referencing the
net10.0-only engine.

- Row (public, #177): schema-carrying, immutable; ordinal + by-name access,
  IsNullAt/AnyNull, GetAs<T>, FieldIndex, ToString; deterministic error model
  and ADR-0008 null/type semantics.
- IQueryExecutor (internal seam) + UnsupportedQueryExecutor default that throws
  a clear public QueryExecutionException until DeltaSharp.Executor (#174)
  registers a backend. Core grants InternalsVisibleTo("DeltaSharp.Executor").
- SparkSession holds a catalog + IQueryExecutor with a per-session setter and a
  process-wide RegisterQueryExecutorFactory hook for #174.
- DataFrame.Collect/Count/Show(+internal ShowString): analyze -> optimize seam
  (#172, identity today) -> executor; Show renders a Spark-style table without
  mutating the plan. Session propagates through all transformations.
- Lazy/eager proven via the #169 audit seam: transformations record no stages;
  each action records exactly one Analyzer stage (Analyzer->Planner->Backend).
- Design doc docs/engineering/design/actions-and-row.md; PublicAPI.Unshipped
  updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
… thread-safe executor, correct optimizer-seam docs (STORY-04.6.1/04.7.1 council)

A. Limit/Distinct now thread Session (single-arg ctor dropped it -> bound-frame actions threw); table-driven guard asserts every transformation keeps the same non-null session and stays executable.
B. QueryExecutor getter publishes atomically via Volatile.Read + Interlocked.CompareExchange; _queryExecutorFactory marked volatile; fail-closed default preserved.
C. Row gains structural Equals/GetHashCode (schema + null-aware ordinal element equality).
D. Optimizer-seam narrative corrected: standalone #172 Optimizer already merged; action-pipeline Optimize is an intentional identity pass in M1, wiring deferred to #174 gated on #415; AC1 phrasing fixed (no Optimizer ExecutionStage).
E. GetAs out-of-range + by-name missing-field tests.
F. Stopped-session guard in RequireSession; arch-guard also bans Executor refs; Show header derived from analyzed output schema (empty result still renders headers); PublicAPI sorted; doc trivia.

Tracks #416/#417/#418.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
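
The atomic-publish pattern named in item B of the commit above (Volatile.Read plus Interlocked.CompareExchange) can be sketched roughly as follows. `MiniSession`, `IExecutor`, and `NamedExecutor` are hypothetical names made up for this example and do not reproduce the SparkSession code; the point is only that concurrent first readers all observe one published instance, and that a losing candidate is discarded.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical executor abstraction for the sketch.
public interface IExecutor { string Name { get; } }

public sealed class NamedExecutor : IExecutor
{
    public NamedExecutor(string name) => Name = name;
    public string Name { get; }
}

public sealed class MiniSession
{
    private readonly Func<IExecutor> _factory;
    private IExecutor? _executor;

    public MiniSession(Func<IExecutor> factory) => _factory = factory;

    public IExecutor Executor
    {
        get
        {
            // Fast path: an executor has already been published.
            IExecutor? current = Volatile.Read(ref _executor);
            if (current is not null) return current;

            // Slow path: build a candidate, then publish it only if the
            // field is still null. The factory may run on several racing
            // threads, but only one candidate is ever published.
            IExecutor candidate = _factory();
            IExecutor? winner = Interlocked.CompareExchange(ref _executor, candidate, null);
            return winner ?? candidate; // non-null winner means we lost the race
        }
    }
}

public static class PublishDemo
{
    public static void Main()
    {
        int created = 0;
        var session = new MiniSession(
            () => new NamedExecutor($"executor-{Interlocked.Increment(ref created)}"));

        var seen = new IExecutor[16];
        Parallel.For(0, seen.Length, i => seen[i] = session.Executor);

        bool allSame = Array.TrueForAll(seen, e => ReferenceEquals(e, seen[0]));
        Console.WriteLine($"same instance: {allSame}"); // same instance: True
    }
}
```

The number of factory invocations is intentionally not asserted here: under contention it can exceed one, which matches the softened "publishes exactly one executor (loser discarded)" wording used later in this PR.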
khaines and others added 2 commits July 3, 2026 01:01
…uality + hash coverage, doc/init nits (STORY-04.6.1/04.7.1 council r2)

1. Show derives its header from a dup-name-tolerant (name,type,nullable) output list (Analyzer.Resolve out overload) instead of the dup-rejecting StructType ctor, so df.Join(other).Show() and df.Select(Col("name"),Col("name")).Show() render duplicate headers like Spark where Collect/Count already succeed; Row-materialization policy tracked by #419.
2. Add distinct-hash test pinning Row.GetHashCode against a constant-hash mutation.
3. Row.Equals/GetHashCode compare byte[] (BinaryType) by content (Spark parity); nested-type deep equality folded into #418.
4. Doc Row equality as schema-inclusive, intentionally stricter than Spark's values-only Row.equals.
5. Row IReadOnlyList ctor shares one single-clone init path.
6. Soften factory 'never double-invokes' doc/comment to 'publishes exactly one executor (loser discarded)'.
7. Guard test covers GroupBy.Count/GroupBy.Agg via RelationalGroupedDataset.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
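
A minimal sketch of the equality rules described in the two commits above (schema-inclusive, null-aware, byte[] compared by content) might look like this. `MiniRow` is a hypothetical type for illustration, and its schema is reduced to an array of field names; the real Row carries a full schema and its implementation may differ.

```csharp
#nullable enable
using System;

// Hypothetical row: field names stand in for the schema.
public sealed class MiniRow : IEquatable<MiniRow>
{
    private readonly string[] _fieldNames;
    private readonly object?[] _values;

    public MiniRow(string[] fieldNames, object?[] values)
    {
        if (fieldNames.Length != values.Length)
            throw new ArgumentException("Schema and values must have the same length.");
        // Defensive clones keep the row immutable from the caller's side.
        _fieldNames = (string[])fieldNames.Clone();
        _values = (object?[])values.Clone();
    }

    public bool Equals(MiniRow? other)
    {
        if (other is null) return false;
        if (_fieldNames.Length != other._fieldNames.Length) return false;
        for (int i = 0; i < _fieldNames.Length; i++)
        {
            // Schema-inclusive: names must match, not just values.
            if (_fieldNames[i] != other._fieldNames[i]) return false;
            if (!ValueEquals(_values[i], other._values[i])) return false;
        }
        return true;
    }

    public override bool Equals(object? obj) => Equals(obj as MiniRow);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        foreach (var name in _fieldNames) hash.Add(name);
        foreach (var value in _values)
        {
            // Hash byte[] by content so equal rows hash equally.
            if (value is byte[] bytes) hash.AddBytes(bytes);
            else hash.Add(value);
        }
        return hash.ToHashCode();
    }

    // Null-aware element equality; byte[] is compared by content.
    private static bool ValueEquals(object? a, object? b) =>
        a is byte[] x && b is byte[] y
            ? x.AsSpan().SequenceEqual(y)
            : object.Equals(a, b);
}

public static class RowEqualityDemo
{
    public static void Main()
    {
        var a = new MiniRow(new[] { "id", "blob" }, new object?[] { 1, new byte[] { 1, 2 } });
        var b = new MiniRow(new[] { "id", "blob" }, new object?[] { 1, new byte[] { 1, 2 } });
        var c = new MiniRow(new[] { "id", "data" }, new object?[] { 1, new byte[] { 1, 2 } });

        Console.WriteLine(a.Equals(b));                        // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
        Console.WriteLine(a.Equals(c));                        // False (schema differs)
    }
}
```

Hashing byte[] by content is what keeps GetHashCode consistent with Equals; hashing the array reference instead would let two equal rows land in different hash buckets.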
… council nit)

Mirrors the Collect/Count single-analyze-stage audits; confirms Show's header-from-analyzed-schema derives columns from the one ResolveCore pass (no second Analyzer stage). QueryExec nit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Ken Haines <1144092+khaines@users.noreply.github.com>
khaines (Owner, Author) commented Jul 3, 2026

🌟 Review-Fix-Loop Report — PR #412 (STORY-04.6.1 #173 + STORY-04.7.1 #177)

Run identity · HEAD f975ced · base 2855279 (merged #172) · collect/count/show actions + Row + the IQueryExecutor dependency-inversion seam (Batch N Lane 2).
Result: ✅ VERIFIED PASS — all 7 voting seats unanimous 5/5, decorrelated red-team NO-MISS-CERTIFIED, orchestrator anti-forgery re-verified.

Council composition (verified)

| Seat | agent_type | Model | Final |
|---|---|---|---|
| Architect | general-purpose | claude-opus-4.8 | 5/5 |
| Balanced | general-purpose | claude-opus-4.8 | 5/5 |
| Security | general-purpose | claude-opus-4.8 | 5/5 |
| Quality | general-purpose | gpt-5.5 | 5/5 |
| DX-API (specialist) | developer-experience-api-engineer | claude-opus-4.8 | 5/5 |
| QueryExec (specialist) | query-execution-engine-engineer | claude-opus-4.8 | 5/5 |
| Writer (specialist) | technical-writer | claude-opus-4.8 | 5/5 |
| Red-team (gate) | general-purpose | gemini-3.1-pro-preview (decorrelated) | NO-MISS-CERTIFIED |

Progression (2 fix rounds)

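| Seat | R1 | Fix r1 | Fix r2 | Final |
|---|---|---|---|---|
| Architect | 3/5 | 5/5 | | 5/5 |
| Security | 3/5 | 5/5 | | 5/5 |
| Writer | 3/5 | 5/5 | | 5/5 |
| Quality | 4/5 | 4/5 | 5/5 | 5/5 |
| Balanced | 3/5 | 4/5 | 5/5 | 5/5 |
| DX-API | 4/5 | 4/5 | 5/5 | 5/5 |
| QueryExec | 3/5 | 4/5 | 5/5 | 5/5 |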

Findings & dispositions

🔴 HIGH (all 7 seats independently found it): Limit/Distinct used a session-less constructor → dropped the owning session → df.limit(n).collect() / df.distinct().count() threw InvalidOperationException on a bound frame. Every other transformation threaded Session; these two were missed and the tests never ran an action after a transformation. Fixed: thread Session + a table-driven guard (EveryTransformation_ThreadsTheSameNonNullSession_AndStaysExecutable, incl. GroupBy/Agg) so no future transformation can silently drop it. Red-team + orchestrator confirmed source-exhaustively that no frame construction drops the session.
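
To make the failure mode and the table-driven guard concrete, here is a small sketch under stated assumptions. `Frame`, `Session`, `BrokenLimit`, and `SessionGuard` are hypothetical names invented for this example; the deliberately buggy `BrokenLimit` mimics a transformation that uses a session-less constructor, and the guard walks a table of transformations and reports any that fail to hand back the same session instance.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed class Session { }

// Hypothetical frame: the plan is a string, the session is optional so that
// a session-less construction is possible (that is the bug being modeled).
public sealed class Frame
{
    public Frame(string plan, Session? session = null)
    {
        Plan = plan;
        Session = session;
    }

    public string Plan { get; }
    public Session? Session { get; }

    public Frame Select(string column) => new Frame($"Project({column}, {Plan})", Session);
    public Frame Distinct() => new Frame($"Distinct({Plan})", Session);

    // Deliberately buggy: forgets to pass the session along.
    public Frame BrokenLimit(int n) => new Frame($"Limit({n}, {Plan})");
}

public static class SessionGuard
{
    // Returns the names of transformations that dropped or swapped the session.
    public static List<string> FindOffenders(
        Frame root,
        IEnumerable<(string Name, Func<Frame, Frame> Apply)> table)
    {
        var offenders = new List<string>();
        foreach (var (name, apply) in table)
        {
            Frame result = apply(root);
            if (root.Session is null || !ReferenceEquals(result.Session, root.Session))
                offenders.Add(name);
        }
        return offenders;
    }

    public static void Main()
    {
        var root = new Frame("Relation", new Session());
        var table = new (string Name, Func<Frame, Frame> Apply)[]
        {
            ("Select", f => f.Select("id")),
            ("Distinct", f => f.Distinct()),
            ("BrokenLimit", f => f.BrokenLimit(5)),
        };

        Console.WriteLine(string.Join(", ", FindOffenders(root, table))); // BrokenLimit
    }
}
```

Because the table is data, adding a new transformation to it is a one-line change, which is the property that makes this style of guard useful against silently dropped state.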

🟠 MEDIUM (fixed):

🔵 LOW/coverage (fixed): Row.GetAs<T> out-of-range + GetHashCode distinct-hash + byte[] equality tests; stopped-session guard (Collect after Stop() throws); arch-guard extended to ban "Executor"; empty-Show renders real headers (from the analyzed output schema, single analyze pass); Show single-Analyzer-stage audit; Row single-clone ctor; doc corrections (schema-inclusive-stricter-than-Spark equality; factory double-publish; trivia).

🟠 Deferrals — filed + orchestrator-verified OPEN: #415 (engine eager And/Or vs ANSI — gates optimizer wiring), #416 (seam CancellationToken + resource bounds), #417 (DataFrame plan memoization), #418 (Row/Show Spark-parity backlog incl. nested-type deep equality), #419 (duplicate-output-name StructType/Row policy).

Adversarial red-team (gemini-3.1-pro-preview, decorrelated, shell): NO-MISS-CERTIFIED

Falsification attempts all failed: a reflection enumeration of every DataFrame-returning method confirmed all thread the session (no other HIGH-class bug); lazy/eager + single-Analyzer-stage hold; Row equality/hash consistent (byte[]/NaN/schema-nullability); Show dup-name tolerant; optimizer un-wired (Optimize => analyzedPlan); PublicAPI matches; 4152 tests pass. C7 in an out-of-tree scratch clone.

Orchestrator anti-forgery: independently confirmed source-exhaustively that all 13 frame constructions thread Session (zero session-less new(new X(...))), worktree clean, #415 through #419 OPEN.

Validation evidence

  • Build: -c Release -warnaserror 0W/0E both net8.0+net10.0. Tests: Core 684×2 (was 611; +73 across 2 fix rounds), Engine 2781, Executor 3. Format clean. PublicAPI matches the added surface (Row, Collect/Count/Show, exceptions). Layering: Core ⊄ Engine (reflection-verified); InternalsVisibleTo("DeltaSharp.Executor") the sole coupling.

Recommendation: APPROVE / merge-ready (squash). Closes #173, Closes #177.

@khaines khaines merged commit a810e31 into main Jul 3, 2026
5 checks passed
@khaines khaines deleted the khaines/feat-04.6.1-actions branch July 3, 2026 13:16
khaines added a commit that referenced this pull request Jul 3, 2026
…uery executor (#414)

Delivers the physical-planning bridge that lowers the Core logical plan to EPIC-03
columnar operators and materializes results back to Core Rows — DeltaSharp's first
end-to-end local query execution (Collect/Count over an in-memory relation), the
final piece of Batch N (after #411 optimizer + #412 actions/Row).

Executor-only (Core/Engine untouched, PublicAPI delta empty). Every unmaterializable
value or unmapped plan node surfaces a deterministic UnsupportedPlanException; decimal
is scale-preserving, Date/Timestamp round-trip the CLR temporal types lit() produces,
and Distinct lowering uses a collision-proof probe name.

Reviewed to a unanimous 7x5/5 council + decorrelated red-team NO-MISS-CERTIFIED (which
caught a raw-exception leak in ReadDate the voting council missed) + orchestrator
anti-forgery. Deferrals tracked: #419, #420, #421, #422.

Closes #174

Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

  • [Story] STORY-04.7.1: Row and schema materialization
  • [Story] STORY-04.6.1: Collect, count, and show actions