
fix: exact-precision SUM/AVG over xsd:decimal and xsd:integer #1249

Merged: bplatz merged 1 commit into main from fix/aggregate-sum-avg-decimal-promotion on May 21, 2026.

bplatz commented on May 20, 2026

Summary

SPARQL SUM(?x) over an xsd:decimal-typed variable returned 0, and
AVG(?x) returned an unbound result. Casting the values to xsd:double first
worked, and MIN/MAX/COUNT over the same decimals also worked: only the
arithmetic aggregates were broken, and only on the decimal path.
SUM over xsd:integer values was fine.

The four numeric extractors used by SUM/AVG (in aggregate.rs and
group_aggregate.rs) only matched FlakeValue::Long / Double / Boolean
and had a fall-through _ => None that silently dropped FlakeValue::Decimal
(BigDecimal) and FlakeValue::BigInt. The streaming SUM accumulator
starts with total = 0.0 and has_int_only = true, so with every decimal value
filtered out it finalized to Long(0) typed as xsd:integer. AVG's count
stayed at 0, so it finalized to Unbound. That explains the asymmetric
"SUM = 0, AVG = empty" failure mode exactly.

MIN/MAX were unaffected because they compare bindings directly via
compare_for_minmax rather than going through the numeric extractor.
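To make the failure mode concrete, here is a hypothetical, dependency-free model of the pre-fix SUM/AVG path; it is not the actual Fluree source. Value stands in for FlakeValue (Decimal and BigInt carry strings instead of BigDecimal/BigInt), and counting Boolean as integer-only in the SUM type check is an assumption. Because the _ => None arm drops every decimal, SUM finalizes to Long(0) and AVG to Unbound.

```rust
// Hypothetical stand-in for FlakeValue; Decimal/BigInt are modeled as strings.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Value {
    Long(i64),
    Double(f64),
    Boolean(bool),
    Decimal(String),
    BigInt(String),
}

#[derive(Debug, PartialEq)]
enum Out {
    Long(i64),
    Double(f64),
    Unbound,
}

/// Pre-fix extractor shape: only Long / Double / Boolean are matched.
fn extract_numeric_old(v: &Value) -> Option<f64> {
    match v {
        Value::Long(n) => Some(*n as f64),
        Value::Double(d) => Some(*d),
        Value::Boolean(b) => Some(if *b { 1.0 } else { 0.0 }),
        _ => None, // silently drops Decimal and BigInt
    }
}

/// Streaming SUM as described: total = 0.0, has_int_only = true.
fn sum_old(values: &[Value]) -> Out {
    let mut total = 0.0;
    let mut has_int_only = true;
    for v in values {
        if let Some(x) = extract_numeric_old(v) {
            if !matches!(v, Value::Long(_) | Value::Boolean(_)) {
                has_int_only = false;
            }
            total += x;
        }
    }
    // Nothing was added, so the additive identity comes back as an integer.
    if has_int_only { Out::Long(total as i64) } else { Out::Double(total) }
}

/// AVG as described: count stays 0 when every value is filtered out.
fn avg_old(values: &[Value]) -> Out {
    let (mut total, mut count) = (0.0, 0u64);
    for v in values {
        if let Some(x) = extract_numeric_old(v) {
            total += x;
            count += 1;
        }
    }
    if count == 0 { Out::Unbound } else { Out::Double(total / count as f64) }
}

fn main() {
    let decimals = vec![Value::Decimal("1.5".into()), Value::Decimal("2.5".into())];
    println!("{:?}", sum_old(&decimals)); // Long(0)
    println!("{:?}", avg_old(&decimals)); // Unbound
}
```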

What changed

  • Added a shared NumericAcc accumulator in fluree-db-query/src/aggregate.rs
    that handles every XSD numeric class — Long, BigInt, Decimal, Double,
    Boolean — and follows the W3C XPath numeric promotion lattice:
    Integer → Decimal → Double, sticky upward.

    • Accumulation is exact via BigDecimal for the integer/decimal path.
    • Only xsd:double inputs collapse the accumulator into f64.
    • Both the streaming aggregate (group_aggregate.rs AggState::Sum / Avg)
      and the non-streaming aggregate (aggregate.rs agg_sum / agg_avg)
      share the same accumulator, eliminating divergence between the two
      paths.
  • The streaming path additionally decodes EncodedLit of kind NUM_BIG
    (the per-predicate arena handle for BigInt/BigDecimal) via
    BinaryGraphView::decode_value_from_kind, so arena-encoded values reach
    the numeric accumulator instead of being dropped.

  • MEDIAN / VARIANCE / STDDEV (which intrinsically operate in f64)
    now also see decimal and BigInt inputs via the shared
    binding_to_numeric helper — previously they were silently dropped too.

  • Added Sid::xsd_decimal() helper to fluree-db-core/src/sid.rs, mirroring
    the existing xsd_integer / xsd_double / xsd_string accessors.
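A minimal sketch of a sticky-upward accumulator in the spirit of NumericAcc, assuming a dependency-free representation: exact values are an i128 mantissa plus a decimal scale rather than BigDecimal, so BigInt inputs beyond i128 range are out of scope. The names Num, NumClass and format_decimal, and treating Boolean as 0/1 in the integer class, are illustrative assumptions rather than the PR's API. The class only moves up the Integer → Decimal → Double lattice, and the exact sum is converted to f64 only when the first xsd:double arrives.

```rust
/// Numeric class in the XPath promotion lattice: Integer -> Decimal -> Double.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum NumClass { Integer, Decimal, Double }

/// Hypothetical inputs. An exact decimal is mantissa * 10^-scale;
/// the real code uses BigInt / BigDecimal instead of i128.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum Num {
    Long(i64),
    Decimal { mantissa: i128, scale: u32 },
    Double(f64),
    Boolean(bool),
}

/// Exact while Integer/Decimal; collapses to f64 once a double is seen.
struct NumericAcc {
    class: NumClass,
    exact: (i128, u32), // (mantissa, scale), used while class < Double
    float: f64,         // used once class == Double
}

impl NumericAcc {
    fn new() -> Self {
        NumericAcc { class: NumClass::Integer, exact: (0, 0), float: 0.0 }
    }

    /// Align both operands to the larger scale, then add exactly.
    /// (i128 can overflow for huge inputs; BigDecimal would not.)
    fn add_exact(&mut self, m: i128, s: u32) {
        let (acc_m, acc_s) = self.exact;
        let scale = acc_s.max(s);
        let sum = acc_m * 10i128.pow(scale - acc_s) + m * 10i128.pow(scale - s);
        self.exact = (sum, scale);
    }

    fn push(&mut self, v: Num) {
        let class = match v {
            Num::Long(_) | Num::Boolean(_) => NumClass::Integer,
            Num::Decimal { .. } => NumClass::Decimal,
            Num::Double(_) => NumClass::Double,
        };
        // Sticky upward: the accumulator's class never moves back down.
        if class > self.class {
            if class == NumClass::Double {
                // First double: collapse the exact running sum into f64.
                let (m, s) = self.exact;
                self.float = m as f64 / 10f64.powi(s as i32);
            }
            self.class = class;
        }
        match (self.class, v) {
            (NumClass::Double, Num::Long(n)) => self.float += n as f64,
            (NumClass::Double, Num::Boolean(b)) => self.float += b as u8 as f64,
            (NumClass::Double, Num::Decimal { mantissa, scale }) => {
                self.float += mantissa as f64 / 10f64.powi(scale as i32)
            }
            (_, Num::Double(d)) => self.float += d,
            (_, Num::Long(n)) => self.add_exact(n as i128, 0),
            (_, Num::Boolean(b)) => self.add_exact(b as i128, 0),
            (_, Num::Decimal { mantissa, scale }) => self.add_exact(mantissa, scale),
        }
    }

    /// SUM result as (lexical form, datatype).
    fn sum(&self) -> (String, &'static str) {
        match self.class {
            NumClass::Integer => (self.exact.0.to_string(), "xsd:integer"),
            NumClass::Decimal => (format_decimal(self.exact), "xsd:decimal"),
            NumClass::Double => (self.float.to_string(), "xsd:double"),
        }
    }
}

/// Render mantissa * 10^-scale as a plain decimal string.
fn format_decimal((m, s): (i128, u32)) -> String {
    let sign = if m < 0 { "-" } else { "" };
    let digits = m.unsigned_abs().to_string();
    if s == 0 {
        return format!("{sign}{digits}");
    }
    let s = s as usize;
    // Left-pad so at least one digit precedes the decimal point.
    let padded = format!("{:0>width$}", digits, width = s + 1);
    let (int_part, frac_part) = padded.split_at(padded.len() - s);
    format!("{sign}{int_part}.{frac_part}")
}

fn main() {
    let mut acc = NumericAcc::new();
    acc.push(Num::Decimal { mantissa: 15, scale: 1 }); // 1.5
    acc.push(Num::Decimal { mantissa: 25, scale: 1 }); // 2.5
    acc.push(Num::Long(3));
    println!("{:?}", acc.sum()); // ("7.0", "xsd:decimal")
}
```

Summing the decimals 0.1 and 0.2 this way yields exactly 0.3, whereas an f64 accumulator produces 0.30000000000000004.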

Behavior change to be aware of

AVG of xsd:integer inputs now returns xsd:decimal instead of
xsd:double. This matches the W3C / XPath rule that
op:numeric-divide(xsd:integer, xsd:integer) yields xsd:decimal — the
previous double result was non-compliant.

In JSON-LD output this means the AVG cell serializes as a string
("20", "42.333...") rather than a JSON number, because JSON-LD
formatting of xsd:decimal renders as a string for exactness. SPARQL JSON
output gains the explicit xsd:decimal datatype. Four existing tests that
asserted the old double-typed behavior were updated.

To prevent recurring decimals from expanding to BigDecimal's default
100-digit precision (e.g. 1/3 rendered as 0.333… with 100 threes), AVG
division is capped at AVG_DECIMAL_PRECISION = 34 significant digits (the
precision of IEEE-754 decimal128), and
trailing zeros are stripped via .normalized().
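The AVG step can be sketched as exact division of the accumulated sum by the count, cut off at 34 significant digits, followed by trailing-zero stripping. This is a hypothetical std-only version over the same mantissa/scale idea; avg_decimal is an illustrative name. The sketch truncates at the precision cap, whereas the rounding mode of the real BigDecimal-based code is not stated in the PR, and it keeps the integer part whole even if that alone exceeds 34 digits.

```rust
/// Mirrors the PR's AVG_DECIMAL_PRECISION (IEEE-754 decimal128 precision).
const AVG_DECIMAL_PRECISION: usize = 34;

/// Hypothetical: divide an exact sum (mantissa * 10^-scale) by `count`,
/// keeping at most AVG_DECIMAL_PRECISION significant digits (truncating),
/// then strip trailing zeros as .normalized() would.
fn avg_decimal(mantissa: i128, scale: u32, count: u64) -> Option<String> {
    if count == 0 {
        return None; // AVG over an empty group is unbound
    }
    let negative = mantissa < 0;
    let num = mantissa.unsigned_abs();
    let den = count as u128 * 10u128.pow(scale);

    let int_part = num / den;
    let mut rem = num % den;
    let int_str = int_part.to_string();
    // A zero integer part contributes no significant digits.
    let mut sig = if int_part == 0 { 0 } else { int_str.len() };

    // Long division, one fractional digit at a time.
    let mut frac = String::new();
    while rem != 0 && sig < AVG_DECIMAL_PRECISION {
        rem *= 10;
        let digit = (rem / den) as u8;
        rem %= den;
        // Leading zeros after the point are not significant.
        if sig > 0 || digit != 0 {
            sig += 1;
        }
        frac.push((b'0' + digit) as char);
    }
    // Drop trailing zeros, and the decimal point if nothing remains.
    let frac = frac.trim_end_matches('0');
    let sign = if negative { "-" } else { "" };
    Some(if frac.is_empty() {
        format!("{sign}{int_str}")
    } else {
        format!("{sign}{int_str}.{frac}")
    })
}

fn main() {
    // AVG(10, 20, 30): integer inputs, exact sum 60, count 3.
    println!("{:?}", avg_decimal(60, 0, 3)); // Some("20")
    // AVG(42, 42, 43): sum 127, count 3, a recurring decimal.
    println!("{:?}", avg_decimal(127, 0, 3));
    println!("{:?}", avg_decimal(0, 0, 0)); // None
}
```

With integer inputs 10, 20, 30 this yields the decimal 20 rather than 20.0 typed as xsd:double, consistent with the "20" JSON-LD serialization described above.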

The numeric extractors used by SPARQL SUM and AVG only recognized
xsd:long / xsd:double / xsd:boolean and silently dropped FlakeValue::Decimal
and FlakeValue::BigInt. SUM over a column of xsd:decimal values therefore
returned 0 (the additive identity in the streaming accumulator), and AVG
returned Unbound (count stayed at 0). MIN/MAX/COUNT were unaffected
because they don't go through the numeric extractor.

Replace the four ad-hoc extractors with a shared NumericAcc in
fluree-db-query/src/aggregate.rs that handles every XSD numeric class
(Long, BigInt, Decimal, Double, Boolean) and follows the W3C XPath
numeric promotion lattice — Integer → Decimal → Double, sticky upward.
Accumulation is exact via BigDecimal for the integer/decimal path; only
xsd:double inputs collapse the accumulator to f64. Both the streaming
(group_aggregate.rs AggState) and non-streaming (aggregate.rs agg_*)
paths share the same accumulator, eliminating divergence between them.

The streaming path also gains NUM_BIG decoding via BinaryGraphView, so
arena-encoded BigInt/Decimal values from the predicate dictionary are
picked up by SUM/AVG (previously they would have been dropped).

Behavior change visible to users: AVG of xsd:integer inputs now yields
xsd:decimal (per XPath op:numeric-divide on integers) instead of
xsd:double. In JSON-LD output this serializes as a string rather than a
number. AVG division precision is capped at 34 significant digits
(decimal128) to keep recurring decimals from expanding to BigDecimal's
default 100-digit precision.

W3C SPARQL 1.1 eval suite: +7 newly passing tests
(agg-avg-01/02/distinct, agg-sum-01/02/distinct, agg-err-02), zero
regressions. Workspace nextest green.
bplatz requested review from aaj3f and zonotope on May 20, 2026 03:32.

zonotope left a comment: 🌫️

bplatz merged commit 9bab51b into main on May 21, 2026; 14 checks passed.
bplatz deleted the fix/aggregate-sum-avg-decimal-promotion branch on May 21, 2026 09:38.
bplatz mentioned this pull request on May 21, 2026.