Support function calls where args have differing row orders #2635
Open
albertlockett wants to merge 3 commits into open-telemetry:main
Codecov Report
Coverage diff, main vs. #2635:
Metric      main      #2635     +/-
Coverage    88.14%    88.14%    -0.01%
Files       633       633
Lines       235806    236383    +577
Hits        207862    208367    +505
Misses      27420     27492     +72
Partials    524       524
Change Summary
Adds the capability for the columnar query engine to invoke functions where the columns passed as arguments have different "row orders".
Consider calling a function in OPL or KQL like `my_func(severity_text, attributes["x"])`. In this case, `severity_text` would have the "row order" of the root record batch, whereas `attributes["x"]` would have the "row order" of however the `parent_id`s were sorted in the Log Attrs batch rows where `key == "x"`.
(We can think of "row order" here as the order in which we'd encounter the row belonging to each signal (e.g. log, span, etc.) as we scan the column.)
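To make the mismatch concrete, here is a minimal sketch in plain Rust. The `LogRow` and `AttrRow` structs and the sample data are hypothetical stand-ins for the root and Log Attrs record batches, not the engine's real types. It shows that scanning each column visits the log ids in a different sequence, so the two columns cannot simply be paired row by row.

```rust
// Hypothetical, simplified stand-ins for OTAP record batches (assumption:
// these are NOT the engine's real types, only an illustration).

/// One row of the root logs batch.
struct LogRow {
    id: u32,
    severity_text: &'static str,
}

/// One row of the log attributes batch; `parent_id` refers to `LogRow::id`.
struct AttrRow {
    parent_id: u32,
    key: &'static str,
    value: &'static str,
}

/// Returns (log id, severity_text) in the order a scan of the root batch sees them.
fn root_column(logs: &[LogRow]) -> Vec<(u32, &'static str)> {
    logs.iter().map(|r| (r.id, r.severity_text)).collect()
}

/// Returns (log id, value) in the order a scan of the attributes batch
/// encounters rows whose key matches `key`.
fn attr_column(attrs: &[AttrRow], key: &str) -> Vec<(u32, &'static str)> {
    attrs
        .iter()
        .filter(|a| a.key == key)
        .map(|a| (a.parent_id, a.value))
        .collect()
}

/// Made-up sample data: log 2 has no "x" attribute, and the attribute rows
/// are not sorted by parent id.
fn sample() -> (Vec<LogRow>, Vec<AttrRow>) {
    let logs = vec![
        LogRow { id: 1, severity_text: "INFO" },
        LogRow { id: 2, severity_text: "WARN" },
        LogRow { id: 3, severity_text: "ERROR" },
    ];
    let attrs = vec![
        AttrRow { parent_id: 3, key: "x", value: "c" },
        AttrRow { parent_id: 1, key: "y", value: "q" },
        AttrRow { parent_id: 1, key: "x", value: "a" },
    ];
    (logs, attrs)
}

fn main() {
    let (logs, attrs) = sample();
    // The root column visits logs 1, 2, 3; the "x" attribute column visits 3, 1.
    println!("severity_text:   {:?}", root_column(&logs));
    println!("attributes[\"x\"]: {:?}", attr_column(&attrs, "x"));
}
```

In this sample the first row of `severity_text` belongs to log 1, while the first row of `attributes["x"]` belongs to log 3, and the columns do not even have the same length.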
Currently the columnar query engine doesn't handle the case where function args have different "row order" and will just return an error. This PR resolves the issue.
It does this by determining whether the arguments to a function have different "row order" (in the parlance of the engine's expression planning, we say they have incompatible `DataScope`s). If the scopes are incompatible, we must align the rows in each column by performing one or more joins before calling the function.
Note: the engine's planned expressions are a tree of "scoped exprs", where each node in the tree represents a DataFusion expression operating on a single data scope. While evaluating the tree, at each node we take data from the source and possibly perform "joins" on the input (depending on the data source for this node) to create an input record batch for the DataFusion expression. A 2-way join was already implemented for binary expressions (used in arithmetic, for example).
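The check-then-align idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `Scope`, `ScopedColumn`, `needs_join`, and `align` are hypothetical names, the scope model is reduced to two cases, and the sketch assumes the first argument drives the output row order (a left join keyed by signal id), which may differ from what the engine does.

```rust
use std::collections::HashMap;

// Hypothetical, simplified model (assumption: names and semantics here are
// illustrative only and do not mirror the engine's `DataScope` API).

/// Where a column's rows come from, which determines its row order.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Scope {
    Root,
    Attributes { key: String },
}

/// A column tagged with its scope. Each value is paired with the id of the
/// signal (e.g. the log record) it belongs to.
struct ScopedColumn {
    scope: Scope,
    rows: Vec<(u32, String)>,
}

/// Arguments can be passed positionally only if they all share one scope.
fn needs_join(args: &[ScopedColumn]) -> bool {
    args.windows(2).any(|w| w[0].scope != w[1].scope)
}

/// Aligns every argument to the row order of the first argument by looking
/// up each id in the other columns. A missing value becomes `None`.
/// Output row `i` holds one entry per argument, in argument order.
fn align(args: &[ScopedColumn]) -> Vec<Vec<Option<String>>> {
    let first = match args.first() {
        Some(col) => col,
        None => return Vec::new(),
    };
    // Build one id -> value lookup per argument.
    let lookups: Vec<HashMap<u32, &str>> = args
        .iter()
        .map(|col| {
            col.rows
                .iter()
                .map(|(id, v)| (*id, v.as_str()))
                .collect::<HashMap<_, _>>()
        })
        .collect();
    first
        .rows
        .iter()
        .map(|(id, _)| {
            lookups
                .iter()
                .map(|m| m.get(id).map(|v| v.to_string()))
                .collect::<Vec<_>>()
        })
        .collect()
}

/// Made-up sample arguments: a root column and an attribute column whose
/// rows are in a different order and which has no value for id 2.
fn sample_args() -> Vec<ScopedColumn> {
    let owned = |rows: &[(u32, &str)]| -> Vec<(u32, String)> {
        rows.iter().map(|(id, v)| (*id, v.to_string())).collect()
    };
    vec![
        ScopedColumn {
            scope: Scope::Root,
            rows: owned(&[(1, "INFO"), (2, "WARN"), (3, "ERROR")]),
        },
        ScopedColumn {
            scope: Scope::Attributes { key: "x".to_string() },
            rows: owned(&[(3, "c"), (1, "a")]),
        },
    ]
}

fn main() {
    let args = sample_args();
    if needs_join(&args) {
        // Each printed row lines up the values that belong to the same signal.
        for row in align(&args) {
            println!("{:?}", row);
        }
    }
}
```

Once the rows are aligned like this, a function can be evaluated row by row over its arguments, with absent values represented explicitly rather than shifting later rows out of place.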
This PR adds a new kind of data source called `MultiJoin`, representing an arbitrary number of input expressions that must be joined, with the convention that the resulting record batch columns are named "arg0", "arg1", ... "argn". The planner now creates this kind of join as the data source for a scoped expr node that evaluates a function if any of the args have incompatible `DataScope`s. The scoped expr node now has logic to perform this `MultiJoin` when it encounters a data source of this type. The multi-join logic is implemented in the `pipeline::expr::join` module, using the same join utilities we use for binary expressions.
What issue does this PR close?
How are these changes tested?
Unit tests
Are there any user-facing changes?
Yes. Users of the transform processor can now pass columns that would have different row orders in OTAP as arguments to functions.