
Support function calls where args have differing row orders #2635

Open
albertlockett wants to merge 3 commits into open-telemetry:main from albertlockett:albert/func-calls-with-misaligned-data


@albertlockett
Member

Change Summary

Adds the capability for the columnar query engine to invoke functions whose argument columns have different "row orders".

Consider calling a function in OPL or KQL such as my_func(severity_text, attributes["x"]). Here, severity_text has the "row order" of the root record batch, whereas attributes["x"] has the "row order" in which the parent_ids are sorted among the Log Attrs rows where key == "x".

(Here, "row order" means the order in which we encounter the rows belonging to each signal, e.g. a log or span, as we scan the column.)
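To make "row order" concrete, the following is a minimal sketch in plain Rust, using Vecs instead of Arrow arrays. `RootBatch`, `AttrsBatch`, and `align_attr_to_root` are hypothetical names, not the engine's API, and the sketch assumes the root batch carries an `id` column that the attrs batch references through `parent_id`. It shows that the values of attributes["x"], scanned in the attrs batch's own order, only line up with severity_text after being looked up by `parent_id` for each root row.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified root (log) batch: one row per log record.
struct RootBatch {
    id: Vec<u16>,
    severity_text: Vec<String>,
}

/// Hypothetical, simplified Log Attrs batch: one row per attribute,
/// linked to its log record through `parent_id`.
struct AttrsBatch {
    parent_id: Vec<u16>,
    key: Vec<String>,
    str_value: Vec<String>,
}

/// Returns `attributes[key]` re-ordered into the root batch's row order.
/// Logs without the attribute get `None` (left-join semantics).
fn align_attr_to_root(root: &RootBatch, attrs: &AttrsBatch, key: &str) -> Vec<Option<String>> {
    // Index the matching attribute rows by parent_id. Their scan order follows
    // the attrs batch (e.g. sorted by parent_id), not the root batch.
    let mut by_parent: HashMap<u16, &str> = HashMap::new();
    for i in 0..attrs.parent_id.len() {
        if attrs.key[i] == key {
            by_parent.insert(attrs.parent_id[i], attrs.str_value[i].as_str());
        }
    }
    // Walk the root rows in their own order and look up each log's value.
    root.id
        .iter()
        .map(|id| by_parent.get(id).map(|v| v.to_string()))
        .collect()
}
```

For example, with root ids [2, 0, 1, 3] and "x" attributes for parents 0, 1 and 2, the aligned column comes back in the order 2, 0, 1, 3, with `None` for log 3, which has no "x" attribute.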

Currently, the columnar query engine does not handle function arguments with different row orders and simply returns an error. This PR resolves that.

It does this by determining whether the arguments to a function have different row orders (in the parlance of the engine's expression planning, they have incompatible DataScopes). If the scopes are incompatible, the rows in each column must be aligned by performing one or more joins before the function is called.
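The planning decision can be sketched as follows. This assumes a hypothetical two-variant `DataScope` (the engine's real type is likely richer) and treats "compatible" as plain equality, which is a simplification: arguments drawn from the root batch and from different attribute keys each count as separate scopes. `FuncInput` and `plan_func_input` are also hypothetical names.

```rust
/// Hypothetical stand-in for the engine's DataScope: which batch (and which
/// attribute key) a column's rows come from.
#[derive(Debug, Clone, PartialEq, Eq)]
enum DataScope {
    Root,
    Attrs { key: String },
}

/// What must happen before the function can be called.
#[derive(Debug, PartialEq, Eq)]
enum FuncInput {
    /// All args share one scope: evaluate directly on that data.
    SingleScope(DataScope),
    /// Scopes differ: the rows must be aligned with a join first.
    NeedsJoin(Vec<DataScope>),
}

fn plan_func_input(arg_scopes: &[DataScope]) -> FuncInput {
    match arg_scopes.split_first() {
        // Every argument has the same scope as the first: no join needed.
        Some((first, rest)) if rest.iter().all(|s| s == first) => {
            FuncInput::SingleScope(first.clone())
        }
        // At least two distinct scopes: the rows must be aligned first.
        Some(_) => FuncInput::NeedsJoin(arg_scopes.to_vec()),
        // Assumption for this sketch: a zero-arg call runs on the root scope.
        None => FuncInput::SingleScope(DataScope::Root),
    }
}
```

In this model, my_func(severity_text, attributes["x"]) maps to `[DataScope::Root, DataScope::Attrs { key: "x" }]` and therefore to `NeedsJoin`.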

Note: the engine's planned expressions form a tree of "scoped exprs", where each node represents a DataFusion expression operating on a single data scope. When evaluating each node of the tree, we take data from the node's data source and, depending on that source, possibly perform joins on the input to build the record batch passed to the DataFusion expression. A 2-way join was already implemented for binary expressions (used in arithmetic, for example).
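The per-node evaluation pattern described in the note can be sketched as a small recursive evaluator. This is a toy model under stated assumptions: intermediate results are (row id, value) pairs instead of RecordBatches, values are i64, the 2-way join is an inner join on row id that keeps the left side's order, and `ScopedExpr`, `Source`, `eval`, `first`, and `add` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical intermediate result: (row id, value) pairs in some row order.
type Scoped = Vec<(u32, i64)>;

/// Hypothetical data source for a scoped-expr node.
enum Source {
    /// A column that is already in a single scope.
    Column(Scoped),
    /// Evaluate two child nodes and inner-join their outputs on row id.
    Join2(Box<ScopedExpr>, Box<ScopedExpr>),
}

/// A node: build the input from `source`, then apply `op` to each row's args.
struct ScopedExpr {
    source: Source,
    op: fn(&[i64]) -> i64,
}

fn eval(node: &ScopedExpr) -> Scoped {
    // Step 1: build the input "batch" (one Vec of arg values per row).
    let input: Vec<(u32, Vec<i64>)> = match &node.source {
        Source::Column(col) => col.iter().map(|&(id, v)| (id, vec![v])).collect(),
        Source::Join2(l, r) => {
            let left = eval(l);
            let right: HashMap<u32, i64> = eval(r).into_iter().collect();
            // Keep the left side's row order; drop ids missing on the right.
            left.into_iter()
                .filter_map(|(id, lv)| right.get(&id).map(|&rv| (id, vec![lv, rv])))
                .collect()
        }
    };
    // Step 2: apply the node's expression to each aligned row.
    input.into_iter().map(|(id, args)| (id, (node.op)(&args[..]))).collect()
}

/// Identity expression for leaf nodes.
fn first(args: &[i64]) -> i64 {
    args[0]
}

/// Binary addition over the two joined args.
fn add(args: &[i64]) -> i64 {
    args[0] + args[1]
}
```

Adding a column ordered by ids [2, 0, 1] to one ordered by [0, 1, 2] works here only because `Join2` aligns the rows before `add` sees them.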

This PR adds a new kind of data source, MultiJoin, representing an arbitrary number of input expressions that must be joined, with the convention that the resulting record batch's columns are named "arg0", "arg1", ..., "argn". The planner now uses this kind of join as the data source of a scoped expr node that evaluates a function whenever any of the function's args have incompatible DataScopes, and the scoped expr node has new logic to perform the MultiJoin when it encounters a data source of this type. The multi-join logic is implemented in the pipeline::expr::join module, using the same join utilities as binary expressions.
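A MultiJoin can be sketched in the same toy model: align every argument to one target row order with a 2-way join, then name the output columns positionally. The sketch assumes the target order is supplied explicitly (for example, the root batch's ids), that missing rows become `None` (left-join semantics), and that `join2` and `multi_join` are hypothetical helpers rather than the functions in pipeline::expr::join.

```rust
use std::collections::HashMap;

/// Hypothetical column in some scope: (row id, value) pairs.
type ScopedCol = Vec<(u32, Option<i64>)>;

/// Hypothetical 2-way join: aligns `right` to the order given by `left_ids`.
/// Ids absent from `right` become None (left-join semantics, an assumption).
fn join2(left_ids: &[u32], right: &ScopedCol) -> Vec<Option<i64>> {
    let index: HashMap<u32, Option<i64>> = right.iter().cloned().collect();
    left_ids.iter().map(|id| index.get(id).copied().flatten()).collect()
}

/// N-way join: each argument is aligned to `row_ids` with the same 2-way join.
/// Output columns are named "arg0", "arg1", ... in argument order.
fn multi_join(row_ids: &[u32], args: &[ScopedCol]) -> Vec<(String, Vec<Option<i64>>)> {
    args.iter()
        .enumerate()
        .map(|(i, col)| (format!("arg{i}"), join2(row_ids, col)))
        .collect()
}
```

Naming the columns "arg0", "arg1", ... lets the function expression be planned against fixed column names, regardless of which scopes the arguments originally came from.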

What issue does this PR close?

How are these changes tested?

Unit tests

Are there any user-facing changes?

Yes. Users of the transform processor can now pass columns that would have different row orders in OTAP as function arguments.

github-actions bot added the labels rust (Pull requests that update Rust code), query-engine (Query Engine / Transform related tasks), and query-engine-columnar (Columnar query engine which uses DataFusion to process OTAP Batches) on Apr 12, 2026
albertlockett changed the title from "support funciton calls on misaligned data scopes" to "Support function calls where args have differing row orders" on Apr 12, 2026
albertlockett marked this pull request as ready for review on April 12, 2026 at 13:50
albertlockett requested a review from a team as a code owner on April 12, 2026 at 13:50

codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 89.83051% with 66 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.14%. Comparing base (5868ff1) to head (6fdda33).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2635      +/-   ##
==========================================
- Coverage   88.14%   88.14%   -0.01%     
==========================================
  Files         633      633              
  Lines      235806   236383     +577     
==========================================
+ Hits       207862   208367     +505     
- Misses      27420    27492      +72     
  Partials      524      524              
Components Coverage Δ
otap-dataflow 89.84% <89.83%> (-0.01%) ⬇️
query_abstraction 80.61% <ø> (ø)
query_engine 90.74% <ø> (ø)
otel-arrow-go 52.45% <ø> (ø)
quiver 92.27% <ø> (ø)


Development

Successfully merging this pull request may close these issues.

Columnar Query Engine function calls with columns of different row order
