Subquery (SubPlan) pushdown#289
Conversation
0f1fab8 to
8c506c3
Compare
Deparse the correlated and uncorrelated SubPlans that pull_up_sublinks cannot flatten into joins, so they execute on ClickHouse instead of being evaluated locally by Postgres. Covers correlated scalar subqueries (TPC-H Q2/Q17), uncorrelated scalar subqueries in WHERE/HAVING (Q11/Q22), IN with GROUP BY/HAVING (Q18), and NOT IN (Q16, deparsed as LEFT ANTI JOIN). Gate the pushdown on the ClickHouse server version: only 25.8+ supports these correlated shapes; older analyzers error on the SQL, so below 25.8 the SubPlan runs locally. Verified the emitted SQL against ClickHouse 25.8, 26.3, and 26.5. Tests pin the deparsed Remote SQL via EXPLAIN. The correlated executions are EXPLAIN-only, since they run only on 25.8+ and would otherwise need version-split expected output.
8c506c3 to
d9c6d11
Compare
Use an asterisk to indicate full pushdown but with multiple foreign scans. Outcome: 11 now recorded as pushed down, and 2, 17, & 22 newly push down.
theory
left a comment
There was a problem hiding this comment.
Hell yeah, this is looking great. I have a few nitpicks, nothing serious. Thanks for working on this!
| jobs: | ||
| build: | ||
| env: { pg: 19 } | ||
| env: { pg: 19, CH_RELEASE: "${{ matrix.ch }}" } |
There was a problem hiding this comment.
What's the reason for this change?
| /* | ||
| * SubPlan-interior aliases. plan_id is unique across PlannedStmt.subplans, | ||
| * so q{plan_id}_{varno} can never collide with the outer query's r{N}/s{N} | ||
| * aliases nor with a sibling SubPlan's tables. See deparseSubPlanQuery. |
There was a problem hiding this comment.
| * aliases nor with a sibling SubPlan's tables. See deparseSubPlanQuery. | |
| * aliases, nor with a sibling SubPlan's tables. See deparseSubPlanQuery. |
| if (!IS_UPPER_REL(glob_cxt->foreignrel)) { | ||
| /* | ||
| * Not safe to pushdown when not in grouping context. Inside a | ||
| * SubPlan's Query a bare aggregate is legal (it is the |
There was a problem hiding this comment.
| * SubPlan's Query a bare aggregate is legal (it is the | |
| * SubPlan's Query a bare aggregate is legal (it's the |
| * Unsupported (and why): | ||
| * ALL_SUBLINK ClickHouse lacks a direct ALL; NOT IN arrives as | ||
| * NOT(ANY) and is handled by the BoolExpr case. | ||
| * ROWCOMPARE_SUBLINK multi-column compares not deparsed. |
There was a problem hiding this comment.
Could they be in the future? Add "currently" if so.
| * ALL_SUBLINK ClickHouse lacks a direct ALL; NOT IN arrives as | ||
| * NOT(ANY) and is handled by the BoolExpr case. |
There was a problem hiding this comment.
So can we not handle it as NOT IN(...)?
| /* | ||
| * Emit a pushed-down SubPlan subquery's FROM list (sans the FROM keyword). | ||
| * | ||
| * This looks like deparseFromExpr's job but cannot use it: there is no planned |
There was a problem hiding this comment.
| * This looks like deparseFromExpr's job but cannot use it: there is no planned | |
| * This looks like deparseFromExpr but we cannot use it: there is no planned |
| Foreign Scan | ||
| Output: i.item_id, s.amount | ||
| Relations: (items i) INNER JOIN (sales s) | ||
| Remote SQL: SELECT r1.item_id, r2.amount FROM subplan_test.items r1 ALL INNER JOIN subplan_test.sales r2 ON (((r1.item_id = r2.item_id))) WHERE (((SELECT max(q1_1.amount) FROM subplan_test.sales q1_1 WHERE ((q1_1.item_id = (r1.item_id)))) = r2.amount)) ORDER BY r1.item_id ASC NULLS LAST |
There was a problem hiding this comment.
We should also execute the queries and verify the results are correct. Same for #3 above.
| (2 rows) | ||
|
|
||
| -- ============================================================ | ||
| -- 6. Scalar subquery in HAVING (TPC-H Q11 shape) |
| -- Query 2. EXPLAIN-only: this correlated subquery executes only on ClickHouse | ||
| -- 25.8+, so we pin the deparsed plan (version-independent) rather than the | ||
| -- execution (which is not). |
There was a problem hiding this comment.
Can we restore execution? Maybe check the version and don't run it on < 25.8.
| # The subquery-pushdown tests exercise SubPlan pushdown, which is gated on | ||
| # ClickHouse 25.8+ (older analyzers error on the correlated SQL). When CH_RELEASE | ||
| # names an older server, drop those tests and emit a GitHub Actions warning | ||
| # rather than carry a second set of "not pushed down" expected files. CH_RELEASE | ||
| # is unset for local runs and the Postgres matrix (which use the latest CH), so | ||
| # they keep the tests. |
There was a problem hiding this comment.
This confuses me. CH_RELEASE is currently set in the GitHub workflow but nowhere else. Someone downloading it and running the tests won't have it set. I don't have it set! The Makefile has not and should not have any idea of the ClickHouse version IMO. Nor should it have any awareness of GitHub Actions.
Folds correlated/uncorrelated scalar (EXPR), EXISTS, and ANY/IN subqueries, plus NOT EXISTS/NOT IN (ANTI joins), into the remote SQL. Rebased onto current main; supersedes the retired subquery-pushdown-v3 with all review feedback folded in. Draft pending one more pass.