feature - implement RFC 023 approximate aggregates (#40) #53

Merged
dannymeijer merged 3 commits into main from feature/40-rfc023-approximate-functions-v2
May 30, 2026

@dannymeijer
Collaborator

Summary

Implements the portable RFC 023 approximate aggregate slice from current main:

  • Adds approx_count_distinct(...) and approx_percentile(..., accuracy=10000) as registry-backed aggregate helpers.
  • Adds approximation policy metadata and stable InQL Substrait extension anchors.
  • Carries aggregate argument lists through Prism/Substrait so approx_percentile keeps percentile and accuracy parameters.
  • Keeps DataFusion implementation-name rewrites adapter-local (approx_distinct, approx_percentile_cont) without changing emitted InQL Substrait names.
  • Adds Draft RFC 025 as the typed sketch logical value follow-up, tracked in #51 (RFC 025: Typed sketch logical values), so RFC 023 does not smuggle sketch state in as strings/bytes.
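
The adapter-local rewrite described in the list above can be pictured with a minimal Python sketch. This is not the InQL implementation: `ExtensionDecl` and `adapt_for_datafusion` are hypothetical names invented for illustration, and real Substrait declarations carry more than an anchor and a name. Only the two name pairs come from this PR.

```python
from dataclasses import dataclass, replace

# Name pairs taken from the PR summary: InQL name -> DataFusion implementation name.
DATAFUSION_NAMES = {
    "approx_count_distinct": "approx_distinct",
    "approx_percentile": "approx_percentile_cont",
}


@dataclass(frozen=True)
class ExtensionDecl:
    """Hypothetical stand-in for a Substrait extension function declaration."""
    anchor: int
    name: str


def adapt_for_datafusion(decls):
    """Return a rewritten copy for the consumer; the emitted declarations are not mutated."""
    return [replace(d, name=DATAFUSION_NAMES.get(d.name, d.name)) for d in decls]


# The plan as emitted keeps the InQL names.
emitted = [
    ExtensionDecl(1, "approx_count_distinct"),
    ExtensionDecl(2, "approx_percentile"),
    ExtensionDecl(3, "sum"),
]

# Only the copy handed to the DataFusion consumer sees implementation names.
adapted = adapt_for_datafusion(emitted)
```

Because the rewrite produces a new list at the adapter boundary, anything serialized from `emitted` still uses the stable InQL names, which is the property the fourth bullet calls out.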

Replaces closed stale PR #50 with a clean branch based on the merged RFC 022 mainline.

Type of change

  • Bug fix
  • New feature
  • Refactor / maintenance
  • Documentation
  • CI / tooling
  • RFC (adds/updates docs/rfcs/*)

Area(s)

  • Package & tests
  • Specification (RFCs)
  • Documentation
  • Automation & repo config
  • Other

Key details

  • User-facing behavior: users can build approximate grouped aggregates with approx_count_distinct(expr) and approx_percentile(expr, percentile, accuracy=10000). Approximate percentile output names include percentile and accuracy so distinct estimates over the same input column do not collide.
  • Internals: aggregate measures now carry explicit argument lists; Prism structural equality compares those arguments; Substrait aggregate lowering uses the argument list rather than rebuilding from the primary expression; DataFusion consumer-plan adaptation only rewrites extension declarations at the adapter boundary.
  • Risks: DataFusion approximate aggregate behavior is implementation-dependent, so docs describe approximation semantics without promising bit-for-bit portability. Sketch-state helpers remain RFC 025 Draft rather than lowerable RFC 023 functions.
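
As a rough illustration of why measures carry explicit argument lists, the following Python sketch models a measure whose equality and output name both depend on every argument. `AggregateMeasure` and `output_name` are hypothetical, and the underscore-joined naming format is an assumption made for the example, not the format InQL emits.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Arg = Union[str, int, float]


@dataclass(frozen=True)
class AggregateMeasure:
    """Hypothetical measure: function name plus its full argument list."""
    function: str
    args: Tuple[Arg, ...]


def approx_percentile(column: str, percentile: float, accuracy: int = 10000) -> AggregateMeasure:
    # Keep percentile and accuracy alongside the input column so they survive lowering.
    return AggregateMeasure("approx_percentile", (column, percentile, accuracy))


def output_name(measure: AggregateMeasure) -> str:
    # Assumed format: join the function name and all arguments with underscores.
    return "_".join([measure.function, *(str(a) for a in measure.args)])


p50 = approx_percentile("latency", 0.5)
p99 = approx_percentile("latency", 0.99)

# Dataclass equality compares the argument tuples, so these two measures differ
# even though they share the same input column.
same_input_distinct_measures = p50 != p99
```

With the arguments included, `output_name(p50)` and `output_name(p99)` produce different strings, so two percentile estimates over the same column do not collide in the result schema.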

Testing / verification

  • make fmt INCAN=/Users/danny/Development/encero/incan/target/debug/incan
  • incan test tests/test_approximate_functions.incn
  • incan test tests/test_function_registry.incn
  • incan test tests/test_substrait_plan.incn
  • incan test tests/test_prism.incn
  • incan test tests/test_session_aggregates.incn
  • make fmt-check INCAN=/Users/danny/Development/encero/incan/target/debug/incan
  • make test-style
  • make registry-metadata INCAN=/Users/danny/Development/encero/incan/target/debug/incan
  • make build INCAN=/Users/danny/Development/encero/incan/target/debug/incan
  • make test INCAN=/Users/danny/Development/encero/incan/target/debug/incan (186 passed)
  • make smoke-consumer INCAN=/Users/danny/Development/encero/incan/target/debug/incan
  • make pre-commit INCAN=/Users/danny/Development/encero/incan/target/debug/incan
  • Manual verification described below

Manual verification notes:

  • Rebuilt local Incan from merged #718 before testing; smoke-consumer needed a network-enabled rerun after sandbox DNS blocked crates.io lockfile generation.
  • Review reports were recorded in the local InQL worktree under .agents/state/ and not in the central Incan findings corpus.

Docs impact

  • No docs changes needed
  • Docs updated
  • Docs follow Divio intent (tutorial/how-to/reference/explanation) where applicable

If docs updated:

  • Link(s): docs/language/reference/functions/approximate.md, docs/language/reference/builders/aggregates.md, docs/rfcs/023_approximate_sketch_functions.md, docs/rfcs/025_typed_sketch_logical_values.md, docs/rfcs/README.md, docs/release_notes/v0_1.md

Checklist

  • I kept public docs user-focused and moved internals to contributing docs when appropriate
  • I avoided duplicating canonical install/run instructions in multiple places
  • I added/updated tests where it materially reduces regressions

Closes #40.
Refs #51.

@incan-triage-bot (bot) added the documentation, package, and specification labels May 30, 2026
@dannymeijer marked this pull request as ready for review May 30, 2026 14:52
@dannymeijer merged commit 41a0405 into main May 30, 2026
4 checks passed
@dannymeijer deleted the feature/40-rfc023-approximate-functions-v2 branch May 30, 2026 15:15

Linked issue: RFC 023: Approximate and sketch functions