fix: exact-precision SUM/AVG over xsd:decimal and xsd:integer #1249
Merged
The numeric extractors used by SPARQL SUM and AVG only recognized xsd:long / xsd:double / xsd:boolean and silently dropped FlakeValue::Decimal and FlakeValue::BigInt. SUM over a column of xsd:decimal values therefore returned 0 (the additive identity in the streaming accumulator), and AVG returned Unbound (count stayed at 0). MIN/MAX/COUNT were unaffected because they don't go through the numeric extractor.

Replace the four ad-hoc extractors with a shared NumericAcc in fluree-db-query/src/aggregate.rs that handles every XSD numeric class (Long, BigInt, Decimal, Double, Boolean) and follows the W3C XPath numeric promotion lattice: Integer → Decimal → Double, sticky upward. Accumulation is exact via BigDecimal for the integer/decimal path; only xsd:double inputs collapse the accumulator to f64.

Both the streaming (group_aggregate.rs AggState) and non-streaming (aggregate.rs agg_*) paths share the same accumulator, eliminating divergence between them. The streaming path also gains NUM_BIG decoding via BinaryGraphView, so arena-encoded BigInt/Decimal values from the predicate dictionary are picked up by SUM/AVG (previously they would have been dropped).

Behavior change visible to users: AVG of xsd:integer inputs now yields xsd:decimal (per XPath op:numeric-divide on integers) instead of xsd:double. In JSON-LD output this serializes as a string rather than a number. AVG division precision is capped at 34 significant digits (decimal128) to keep recurring decimals from expanding to BigDecimal's default 100-digit precision.

W3C SPARQL 1.1 eval suite: +7 newly passing tests (agg-avg-01/02/distinct, agg-sum-01/02/distinct, agg-err-02), zero regressions. Workspace nextest green.
Summary
SPARQL SUM(?x) over an xsd:decimal-typed variable returned 0, and AVG(?x) returned an unbound result. Casting to xsd:double worked, and MIN/MAX/COUNT over the same decimals worked: only the arithmetic aggregates were broken, and only on the decimal path. SUM(xsd:integer) was fine.
The four numeric extractors used by SUM/AVG (in aggregate.rs and group_aggregate.rs) only matched FlakeValue::Long/Double/Boolean and had a fall-through _ => None that silently dropped FlakeValue::Decimal (BigDecimal) and FlakeValue::BigInt. The streaming SUM accumulator initializes total = 0.0, has_int_only = true, so with every decimal value filtered out it finalized to Long(0) typed as xsd:integer. AVG kept count = 0 and finalized to Unbound. That explains the asymmetric "SUM = 0, AVG = empty" failure mode exactly.
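The failure mode described above can be reproduced in miniature. The following is a rough sketch, not the project's code: the `Value` and `Out` enums and the `old_extract`, `old_sum`, and `old_avg` functions are hypothetical stand-ins, the decimal payloads are plain strings instead of BigDecimal, and the rule that only a double input clears `has_int_only` is an assumption.

```rust
// Hypothetical stand-in for the value enum; payload types are simplified.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Long(i64),
    Double(f64),
    Boolean(bool),
    Decimal(String), // stand-in for a BigDecimal payload
    BigInt(String),  // stand-in for a big-integer payload
}

// Hypothetical stand-in for an aggregate result.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Out {
    Long(i64),
    Double(f64),
    Unbound,
}

// Pre-fix shape: the catch-all arm silently drops Decimal and BigInt.
fn old_extract(v: &Value) -> Option<f64> {
    match v {
        Value::Long(n) => Some(*n as f64),
        Value::Double(d) => Some(*d),
        Value::Boolean(b) => Some(if *b { 1.0 } else { 0.0 }),
        _ => None,
    }
}

// Streaming SUM: starts at total = 0.0, has_int_only = true.
fn old_sum(values: &[Value]) -> Out {
    let mut total = 0.0_f64;
    let mut has_int_only = true;
    for v in values {
        if let Some(x) = old_extract(v) {
            // Assumption: only a double input clears the integer-only flag.
            if matches!(v, Value::Double(_)) {
                has_int_only = false;
            }
            total += x;
        }
    }
    // With every input dropped, this finalizes to Long(0).
    if has_int_only { Out::Long(total as i64) } else { Out::Double(total) }
}

// Streaming AVG: count never advances when every input is dropped.
fn old_avg(values: &[Value]) -> Out {
    let mut total = 0.0_f64;
    let mut count = 0_u64;
    for v in values {
        if let Some(x) = old_extract(v) {
            total += x;
            count += 1;
        }
    }
    if count == 0 { Out::Unbound } else { Out::Double(total / count as f64) }
}
```

Under these assumptions, a slice holding only `Value::Decimal` entries makes `old_sum` return `Out::Long(0)` and `old_avg` return `Out::Unbound`, while a slice of `Value::Long` entries sums as expected.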
MIN/MAX were unaffected because they compare bindings directly via compare_for_minmax rather than going through the numeric extractor.

What changed
Added a shared NumericAcc accumulator in fluree-db-query/src/aggregate.rs that handles every XSD numeric class (Long, BigInt, Decimal, Double, Boolean) and follows the W3C XPath numeric promotion lattice: Integer → Decimal → Double, sticky upward.
Accumulation is exact via BigDecimal for the integer/decimal path. Only xsd:double inputs collapse the accumulator into f64.
Both the streaming aggregate (group_aggregate.rs AggState::Sum/Avg) and the non-streaming aggregate (aggregate.rs agg_sum/agg_avg) share the same accumulator, eliminating divergence between the two paths.
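To make the "sticky upward" promotion concrete, here is a minimal sketch of such an accumulator. It is not the actual NumericAcc: the `Acc` and `Input` types and the `add` function are hypothetical, Boolean and BigInt inputs are omitted, and a scaled `i128` stands in for BigDecimal, so unlike the real accumulator it can overflow on very large inputs.

```rust
// Accumulator state, ordered Integer < Decimal < Double.
#[derive(Debug, Clone, PartialEq)]
enum Acc {
    Integer(i128),
    // Represents unscaled / 10^scale; a stand-in for BigDecimal.
    Decimal { unscaled: i128, scale: u32 },
    Double(f64),
}

// One input value in the same three numeric classes.
#[derive(Debug, Clone, Copy)]
enum Input {
    Integer(i64),
    Decimal { unscaled: i64, scale: u32 },
    Double(f64),
}

fn pow10(n: u32) -> i128 {
    10_i128.pow(n)
}

fn acc_to_f64(acc: &Acc) -> f64 {
    match acc {
        Acc::Integer(n) => *n as f64,
        Acc::Decimal { unscaled, scale } => *unscaled as f64 / 10f64.powi(*scale as i32),
        Acc::Double(d) => *d,
    }
}

fn input_to_f64(input: Input) -> f64 {
    match input {
        Input::Integer(n) => n as f64,
        Input::Decimal { unscaled, scale } => unscaled as f64 / 10f64.powi(scale as i32),
        Input::Double(d) => d,
    }
}

// Add one input; the state only ever moves up the lattice, never back down.
fn add(acc: Acc, input: Input) -> Acc {
    match (acc, input) {
        // Once in Double, stay in Double.
        (Acc::Double(d), i) => Acc::Double(d + input_to_f64(i)),
        // A double input collapses any exact state to f64.
        (a, Input::Double(x)) => Acc::Double(acc_to_f64(&a) + x),
        (Acc::Integer(a), Input::Integer(b)) => Acc::Integer(a + b as i128),
        // A decimal input promotes Integer to Decimal, still exact.
        (Acc::Integer(a), Input::Decimal { unscaled, scale }) => Acc::Decimal {
            unscaled: a * pow10(scale) + unscaled as i128,
            scale,
        },
        (Acc::Decimal { unscaled, scale }, Input::Integer(b)) => Acc::Decimal {
            unscaled: unscaled + b as i128 * pow10(scale),
            scale,
        },
        // Align both operands to the larger scale before adding.
        (Acc::Decimal { unscaled: a, scale: sa }, Input::Decimal { unscaled: b, scale: sb }) => {
            let scale = sa.max(sb);
            Acc::Decimal {
                unscaled: a * pow10(scale - sa) + b as i128 * pow10(scale - sb),
                scale,
            }
        }
    }
}
```

Starting from `Acc::Integer(0)` and folding `add` over the inputs, the state stays `Integer` for integer-only input, moves to `Decimal` on the first decimal, and moves to `Double` on the first double, after which later integer or decimal inputs no longer bring it back to an exact representation.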
The streaming path additionally decodes EncodedLit of kind NUM_BIG (the per-predicate arena handle for BigInt/BigDecimal) via BinaryGraphView::decode_value_from_kind, so arena-encoded values reach the numeric accumulator instead of being dropped.
MEDIAN/VARIANCE/STDDEV (which intrinsically operate in f64) now also see decimal and BigInt inputs via the shared binding_to_numeric helper; previously they were silently dropped too.
Added Sid::xsd_decimal() helper to fluree-db-core/src/sid.rs, mirroring the existing xsd_integer/xsd_double/xsd_string accessors.

Behavior change to be aware of
AVG of xsd:integer inputs now returns xsd:decimal instead of xsd:double. This matches the W3C / XPath rule that op:numeric-divide(xsd:integer, xsd:integer) yields xsd:decimal; the previous double result was non-compliant.
In JSON-LD output this means the AVG cell serializes as a string ("20", "42.333...") rather than a JSON number, because JSON-LD formatting of xsd:decimal renders as a string for exactness. SPARQL JSON output gains the explicit xsd:decimal datatype. Four existing tests that asserted the old double-typed behavior were updated.
To prevent recurring decimals from expanding to BigDecimal's default 100-digit precision (e.g. 0.3333… × 100), AVG division is capped at AVG_DECIMAL_PRECISION = 34 significant digits (IEEE-754 decimal128), and trailing zeros are stripped via .normalized().
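The effect of the precision cap can be illustrated with plain long division. This is only a sketch under stated assumptions: `avg_to_string` is a hypothetical helper, it handles non-negative integer sums only, it truncates rather than rounds (the rounding mode used by the real code is not stated here), and it assumes the integer part fits within the cap and that `count` is small enough for `rem * 10` not to overflow `u128`. Only the constant name `AVG_DECIMAL_PRECISION` and its value come from the description above.

```rust
const AVG_DECIMAL_PRECISION: usize = 34;

// Divide `sum` by `count`, emitting at most `precision` significant digits
// and stripping trailing fractional zeros. Returns None for an empty group.
fn avg_to_string(sum: u128, count: u128, precision: usize) -> Option<String> {
    if count == 0 {
        return None;
    }
    let int_part = sum / count;
    let mut rem = sum % count;
    let mut out = int_part.to_string();
    // Significant digits consumed so far; a lone leading 0 does not count.
    let mut sig = if int_part == 0 { 0 } else { out.len() };
    let mut frac = String::new();
    // Long division: one fractional digit per iteration until the remainder
    // is exhausted or the significant-digit budget is used up.
    while rem != 0 && sig < precision {
        rem *= 10;
        let digit = (rem / count) as u8;
        rem %= count;
        // Zeros before the first non-zero digit are not significant.
        if sig > 0 || digit != 0 {
            sig += 1;
        }
        frac.push((b'0' + digit) as char);
    }
    // Mirror the trailing-zero stripping step.
    let frac = frac.trim_end_matches('0');
    if !frac.is_empty() {
        out.push('.');
        out.push_str(frac);
    }
    Some(out)
}
```

With this sketch, `avg_to_string(60, 3, AVG_DECIMAL_PRECISION)` yields `"20"`, and `avg_to_string(127, 3, AVG_DECIMAL_PRECISION)` yields `42.` followed by 32 threes, 34 significant digits in total, rather than a 100-digit expansion.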