fix: ordered_param_types and add unit test by cemremengu · Pull Request #357 · datafusion-contrib/datafusion-postgres

cemremengu · 2026-06-15T21:08:18Z

ordered_param_types() received inferred parameter types from DataFusion as a HashMap keyed by strings such as $1, $2, ..., $10. Because HashMaps are unordered, the code sorted the keys before building the positional parameter type list returned to the client.

However, the keys were sorted lexicographically:

$1, $10, $2, $3, ...

instead of numerically:

$1, $2, $3, ..., $10

For queries with more than nine placeholders, this caused parameter types to be assigned to the wrong positions. For example, the type inferred for $10 could be reported or used as if it belonged to $2. As a result, clients could bind or deserialize values using incorrect PostgreSQL types, such as treating a UUID parameter as a timestamp.

The fix was to sort parameters by their numeric suffix:

types.sort_by_key(|(key, _)| {
    key.trim_start_matches('$')
        .parse::<u32>()
        .unwrap_or(u32::MAX)
});

Refactor sorting of parameter types to sort numerically based on placeholder keys. Add unit test to verify correct sorting of parameter types.

sunng87

LGTM. Sounds good.

* chore(deps): bump rust_decimal from 1.42.0 to 1.42.1 (#354) Bumps [rust_decimal](https://github.com/paupino/rust-decimal) from 1.42.0 to 1.42.1. - [Release notes](https://github.com/paupino/rust-decimal/releases) - [Changelog](https://github.com/paupino/rust-decimal/blob/master/CHANGELOG.md) - [Commits](paupino/rust-decimal@1.42.0...1.42.1) --- updated-dependencies: - dependency-name: rust_decimal dependency-version: 1.42.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump getset from 0.1.6 to 0.1.7 (#353) Bumps [getset](https://github.com/jbaublitz/getset) from 0.1.6 to 0.1.7. - [Release notes](https://github.com/jbaublitz/getset/releases) - [Commits](jbaublitz/getset@v0.1.6...v0.1.7) --- updated-dependencies: - dependency-name: getset dependency-version: 0.1.7 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump postgres-types from 0.2.13 to 0.2.14 (#356) Bumps [postgres-types](https://github.com/rust-postgres/rust-postgres) from 0.2.13 to 0.2.14. - [Release notes](https://github.com/rust-postgres/rust-postgres/releases) - [Commits](rust-postgres/rust-postgres@postgres-types-v0.2.13...postgres-types-v0.2.14) --- updated-dependencies: - dependency-name: postgres-types dependency-version: 0.2.14 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * feat: update regclass cast to subquery * feat: add array_upper/array_lower UDFs DataFusion ships `array_length` but not `array_upper`/`array_lower`, so Postgres client queries using them only "worked" via hardcoded blacklist token substitution -- any non-byte-matching variant failed at planning with `Invalid function 'array_upper'`. Implement both as scalar UDFs (`array_bounds_udf.rs`) with Postgres semantics: * arrow lists are always 1-based, so array_lower(arr,1) -> 1 and array_upper(arr,1) -> array_length(arr,1) * NULL for null array, out-of-range dim, or empty array * accepts List/LargeList/FixedSizeList + Int32/Int64 dims, returns Int32 Register them in `setup_pg_catalog`. The dbeaver startup test went from FAILED -> PASS (18/19 queries now run for real). The two dbeaver queries that still can't run are blacklisted: one needs dynamic `generate_series` bounds (DF only takes literal Int64), the other needs Postgres regnamespace/oid-alias string resolution. Neither is fixable by adding the UDF; both get type-compatible NULL stubs. Includes unit + end-to-end integration tests. * feat: rewrite regclass/regnamespace/regtype/regproc casts to oid lookups These are all Postgres `oid` (int4) alias types: casting a *name* to one yields the matching catalog oid. DataFusion has no such types, so a bare `'trigger'::regtype` was stripped to the string `'trigger'` and then crashed an oid comparison with "Cannot cast string 'trigger' to Int32". Generalize the existing regclass-only rewrite rule into `RewriteRegCastToSubquery`, with a registry of name->oid lookup queries: * regclass -> pg_class.oid (schema-qualified relation name) * regnamespace -> pg_namespace.oid (schema name) * regtype -> pg_type.oid (schema-qualified type name) * regproc -> pg_proc.oid (schema-qualified function name) Direction guard: only the *forward* direction (name -> oid) is rewritten, detected by a string-literal or `$1` placeholder operand. The *reverse* direction (`<oid-column>::regtype`, e.g. `prorettype::regtype::text` for display, or the left side of `prorettype::regtype != 'trigger'::regtype`) has a column operand and is left untouched; it is stripped to the bare oid column by RemoveUnsupportedTypes, which is correct since the column already is an oid. `regclass` is re-added to the unsupported-types set so its reverse direction is stripped too (it was removed earlier for the forward case, now handled by the rewrite). Results -- client startup tests, base branch -> this branch: * pgcli : FAILED -> PASS (the `'trigger'::regtype` crash is fixed) * dbeaver: PASS (already green from the array_upper work) * psql : 3 failures -> 1 failure; the two regclass/regtype-related queries now pass. The remaining failure is a DataFusion internal "ScalarSubqueryExpr evaluated before the subquery was executed" bug in `IN (subquery UNION ALL VALUES(...))`, which is blacklisted here as it can't be fixed at the SQL-rewrite layer. The full workspace test suite is green. * chore(deps): bump log from 0.4.32 to 0.4.33 (#361) Bumps [log](https://github.com/rust-lang/log) from 0.4.32 to 0.4.33. - [Release notes](https://github.com/rust-lang/log/releases) - [Changelog](https://github.com/rust-lang/log/blob/master/CHANGELOG.md) - [Commits](rust-lang/log@0.4.32...0.4.33) --- updated-dependencies: - dependency-name: log dependency-version: 0.4.33 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix: ordered_param_types and add unit test (#357) * fix ordered_param_types and add unit test Refactor sorting of parameter types to sort numerically based on placeholder keys. Add unit test to verify correct sorting of parameter types. * fix cargo fmt * chore(deps): bump pgwire from 0.40.1 to 0.40.3 (#359) Bumps [pgwire](https://github.com/sunng87/pgwire) from 0.40.1 to 0.40.3. - [Release notes](https://github.com/sunng87/pgwire/releases) - [Changelog](https://github.com/sunng87/pgwire/blob/master/CHANGELOG.md) - [Commits](sunng87/pgwire@v0.40.1...v0.40.3) --- updated-dependencies: - dependency-name: pgwire dependency-version: 0.40.3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * refactor: replace 2 blacklists with proper rewrite rules The previous commit added 3 BLACKLIST_SQL_MAPPING entries to keep client startup tests green. Two of them can be replaced with general, correct rewrite rules; only the one that genuinely needs schema awareness remains. 1. dbeaver type lookup (REMOVED): `generate_series(1, array_upper(current_schemas(false), 1))` failed because our array_upper UDF returns Int32 (correct Postgres int4) but DataFusion's generate_series wants Int64. Postgres implicitly coerces int4->int8 here; DF does not. New rule `CastArrayBoundsForGenerateSeries` wraps array_upper/array_lower args of generate_series in ::bigint, the way Postgres would. 2. psql foreign-key \d lookup (REMOVED): `VALUES ('16417'::pg_catalog.regclass)` broke DF's subquery decorrelator because the regclass rewrite embedded a (SELECT ...) inside VALUES. Fix it at the source: a *numeric* string operand like '16417' should resolve directly to that oid (Postgres behavior), not do a name lookup. RewriteRegCastToSubquery now emits a literal oid for numeric operands, which both is more correct and avoids the nested subquery. 3. dbeaver relation-size lookup (KEPT): `WHERE c.relnamespace='public'` compares an oid column to a bare string, needing Postgres' implicit string->oid coercion. The explicit `'public'::regnamespace` form is already handled, but detecting that a bare string needs coercing requires knowing the column is an oid column -- schema awareness the AST-rewrite layer lacks. Hardcoding pg_catalog column names would risk regressing user tables with same-named columns, so this single entry stays, with an accurate comment. Net: BLACKLIST_SQL_MAPPING goes from +3 (previous commit) to +1 vs the base branch, and that one is the genuinely hard case. Full test suite + clippy are green. * chore: format * chore(deps): bump bytes from 1.11.1 to 1.12.0 (#360) Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.11.1 to 1.12.0. - [Release notes](https://github.com/tokio-rs/bytes/releases) - [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md) - [Commits](tokio-rs/bytes@v1.11.1...v1.12.0) --- updated-dependencies: - dependency-name: bytes dependency-version: 1.12.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * feat: oid coercion analyzer rule (#363) * feat: postgres-correct oid/string comparison via Field metadata + analyzer rule Postgres catalog columns like pg_class.relnamespace or pg_attribute.atttypid are object identifiers stored as Int32, but clients compare them against string literals (WHERE relnamespace = 'public', WHERE atttypid = '23') and expect Postgres name/numeric oid resolution. DataFusion has no notion of "oid": it coerces Int32_col = Utf8('public') to CAST(col AS Utf8) = 'public', silently returning zero rows for name strings. This implements the design as a standalone feature: 1. oid_field.rs -- annotate oid/oid-alias columns via Arrow Field metadata (key "pg.oid_alias" -> kind: oid/regclass/regnamespace/regtype/regproc). The metadata survives into the logical plan schema, readable via ExprSchema::metadata(). pg_namespace/pg_class/pg_attribute oid columns are annotated. 2. oid_coercion_rule.rs -- an AnalyzerRule (OidStringCoercion) that rewrites oid-alias column comparisons against string literals using Postgres semantics: - numeric string -> literal Int32 oid (e.g. '16417' -> 16417) - name string -> scalar subquery SELECT oid FROM <table> WHERE <namecol> = $s (regnamespace->pg_namespace, regclass->pg_class, regproc->pg_proc) The rule is order-independent (handles both the pre-coercion `col = 'x'` form and the post-coercion `CAST(col AS Utf8) = 'x'` form) and supports =, <>, and IN (...). Name resolution works in the sync analyzer because PgCatalogSchemaProvider can build a table provider synchronously (dynamic tables defer data fetch to execution via a streaming source). The rule captures an OidLookupProvider handle to build the lookup subquery plans; data materializes lazily at execution. 3. Registered in setup_pg_catalog. Verified end-to-end: relnamespace = '<name>' and relnamespace = '<oid>' return the same rows as the int-oid baseline (which previously returned empty). 29 lib tests + full workspace green. * feat: stamp oid-alias metadata into pg_catalog feather exports The oid-coercion analyzer rule relies on Arrow field metadata (`pg.oid_alias=<kind>`) to know which Int32 columns are object identifiers. Previously only the handful of dynamic tables (pg_namespace, pg_class, pg_attribute) were annotated by hand. The ~50 static tables loaded from the embedded feather files (pg_type, pg_proc, pg_aggregate, pg_operator, ...) had no metadata, so string-vs-oid comparisons on them silently returned zero rows. This moves the annotation to the source of truth: the export scripts that read the real schema from PostgreSQL. export_pg_catalog_arrow.sh (embedded exporter): - Map every oid-alias type (oid, regproc, regprocedure, regoper, regoperator, regclass, regtype, regnamespace, regrole, regconfig, regdictionary, regcollation) to int32 (the raw oid), not string. - reg* columns are cast with ::oid at SELECT time so Postgres returns the integer, not its display (name) form. - Attach field metadata `pg.oid_alias=<pg_type>` to those columns. Arrow IPC round-trips this metadata faithfully (verified pyarrow -> feather -> arrow-rs). pg_to_arrow.py (generic exporter): same treatment, detecting oid-alias columns from pg_type at runtime. Regenerated all 65 feather files. 251 oid-alias columns are now annotated (216 oid, 35 regproc); the 21 reg* columns previously stored as string (e.g. pg_type.typinput, pg_aggregate.aggfnoid) are now int32 + metadata. The dynamic tables (pg_namespace/pg_class/pg_attribute) keep their hand-annotated oid_field() calls since they build schemas at runtime; the feather metadata covers everything else transparently. Added end-to-end test feather_metadata_reaches_rule_for_static_tables: pg_type.typinput (regproc, annotated solely via the feather file) resolves a proc-name string via pg_proc name->oid lookup, matching the int-oid baseline. 60 tests pass. * chore: format and cleanup * refactor: remove RewriteRegclassCastToSubquery, rely on oid-coercion rule The oid-coercion analyzer rule + Field metadata (added in be40d6b/0300310) now handle `'name'::regclass::oid` casts in comparison predicates natively at the logical-plan layer, making the SQL-rewrite rule redundant for that purpose. Remove it. To support this, fix RemoveUnsupportedTypes to peel nested casts to a fixed point. Previously the visitor replaced an `Expr::Cast` with its inner expression in a single pass and never re-visited the replacement, so `'x'::regclass::oid` became `'x'::regclass` (the residual REGCLASS then being rejected by DataFusion as an unsupported SQL type at plan time). Looping to a fixed point fully unwinds it to `'x'`, which the oid-coercion rule resolves against any oid-alias column it is compared with. Net: -162 lines. The `'x'::regclass::oid` pattern in comparisons (the only shape exercised by client queries -- pgadbc, psql) is now covered by: 1. RemoveUnsupportedTypes -> peels to bare 'x' 2. OidStringCoercion rule -> resolves 'x' via pg_namespace/pg_class/pg_proc Trade-off: schema-qualified names in casts (e.g. `oid = 'pg_catalog.pg_type'::regclass::oid`) no longer split into schema+name. The old rule attempted this via parse_ident()+current_schema(), but that path was never verified to resolve correctly and relies on UDFs DataFusion may not support. Bare-oid columns never supported name resolution in either approach by design (no names-for-oid table). All 59 remaining tests pass (the removed rule's own test was deleted). Added regression assertions to test_remove_unsupported_types pinning the full-peel behavior. * refactor: unify oid-alias type names, rename RemoveUnsupportedTypes Two cleanups to the oid machinery: 1. Single source of truth. The list of oid-alias type names previously lived in two places that could drift: * RemoveUnsupportedTypes had a hardcoded set of 6 names to strip (regclass, regproc, regtype, regtype[], regnamespace, oid) * oid_field::kind had 5 named constants for metadata annotation Both must enumerate the same Postgres oid-alias family, so consolidate them into one constant `oid_field::OID_ALIAS_TYPE_NAMES`, now covering the full pg_catalog oid-alias set (oid, regproc, regprocedure, regoper, regoperator, regclass, regtype, regnamespace, regrole, regconfig, regdictionary, regcollation) -- matching what the feather export scripts stamp as metadata. RemoveOidTypeCast derives its strip set from it, generating the bare / `pg_catalog.`-qualified / array spellings. Net effect: cast types like `::regrole` that DF rejects are now stripped too (previously would error); no client query regressed (over-stripping is harmless since these casts can only ever resolve to oid behavior). 2. Rename RemoveUnsupportedTypes -> RemoveOidTypeCast. The old name was a misnomer inherited from when the rule was generic; it now exists for one specific purpose: stripping oid-alias cast type names so they survive DataFusion's sql_to_plan (which has no notion of oid types) and reach the oid-coercion analyzer rule at the logical-plan layer. Visitor renamed accordingly. oid_field()'s debug_assert now validates against OID_ALIAS_TYPE_NAMES instead of a separate match. Added regression assertions: newly-covered `::regrole` and the array variant `::regtype[]` (the latter used by pgcli's `proallargtypes::regtype[]`). 59 tests pass. * chore: format * fix: resolve master merge for DataFusion 54 (oid-cast + analyzer rule) Merging master's oid-coercion work (PR #363) into the DF54 branch broke compilation and the dbeaver e2e test. Two distinct problems, both rooted in the DF53->DF54 upgrade: 1. DataFusion 54 API: Cast/TryCast lost `data_type` in favor of `field: FieldRef`. Updated the oid-coercion-rule tests; the production code already used `Cast { expr, .. }`. 2. Type-coercion direction change: DF53 coerced `Int32 = Utf8` leniently (column -> string, returning empty); DF54 coerces the other way (string -> Int32) and ERRORS on non-numeric names. Master removed the SQL-layer RewriteRegCastToSubquery relying on DF53's leniency, but on DF54 that removal makes explicit `'name'::regclass` casts (stripped to bare strings by RemoveOidTypeCast) hit the broken coercion path. Restore RewriteRegCastToSubquery (run before RemoveOidTypeCast) so explicit forward oid-alias casts resolve to an oid (literal for numeric strings, name-lookup subquery otherwise) in ANY position -- projection, join ON, filter. This is complementary to master's oid-coercion analyzer rule, which handles the implicit `oid_col = 'name'` case via Field metadata. Also: - Extend the oid-coercion analyzer rule to rewrite LogicalPlan::Join filters (not just Filter), and to peek through CAST('str' AS int) so it stays order-independent under DF54's appended-after-TypeCoercion ordering. - Remove stale RewriteRegCastToSubquery tests left mis-merged, then re-add them against the restored rule; refresh the now-inaccurate parser.rs blacklist comment. - Drop redundant PostgreSqlDialect/Parser imports from the test module (now at crate top level). 70 tests pass (38 lib + 32 integration), no new clippy warnings. * refactor: replace both oid SQL rewrite rules with a metadata-aware TypePlanner Eliminates RewriteRegCastToSubquery and RemoveOidTypeCast (two SQL/AST-layer rewrite rules, ~570 lines) in favor of a single, metadata-aware mechanism that resolves oid-alias casts and comparisons at the analyzer layer -- the architectural goal of pushing oid resolution off the AST layer entirely. Key insight: DataFusion 54 exposes a TypePlanner extension point consulted *before* the default type conversion that rejects `regclass`/`regtype`/etc. Registering a TypePlanner that maps each oid-alias type name to an int4 Field carrying `pg.oid_alias` metadata means DF accepts the casts (no more "Unsupported SQL type") AND the kind survives into the logical plan inside the resulting Expr::Cast.field. The oid-coercion analyzer rule then reads that metadata to resolve names -> oids. Changes: * oid_type_planner.rs (new): PgOidTypePlanner maps regclass/regproc/regtype/ regnamespace/oid (+ pg_catalog.-prefixed) to int4 Field with kind metadata. Registered via SessionStateBuilder in setup_pg_catalog. * oid_coercion_rule.rs: PlanRewriter now uses map_expressions (covers every plan node -- projections, filters, join ON filters -- not just Filter/Join). Added a Cast/TryCast arm that reads the cast's OWN field metadata (not just the column's), so oid-column casts like `classoid = 'pg_namespace'::regclass` resolve even though classoid is bare `oid`-kind. Added regtype -> pg_type name resolution. Added kind_from_field + immediate_str_literal helpers. * rules.rs: deleted RewriteRegCastToSubquery, RemoveOidTypeCast, and their 4 test functions. * parser.rs: dropped both rules from the registration list + their imports. * oid_field.rs: refreshed OID_ALIAS_TYPE_NAMES doc (set is now consumed by the TypePlanner, not a strip rule). Net: -387 lines, 69 tests pass (was 70; the 4 deleted SQL-rewrite tests are superseded by analyzer-rule + type-planner tests), zero new clippy warnings. This completes the goal of removing all SQL-level oid-cast rewriting: both the cast-stripping (RemoveOidTypeCast) and the cast-rebuilding (RewriteRegCastToSubquery) are gone, leaving resolution entirely to the metadata-aware analyzer rule. * refactor: drop the dbeaver relation-size blacklist entry The oid-coercion analyzer rule (added with the TypePlanner work) now resolves `c.relnamespace = 'public'` -- a bare string compared against an oid-alias Int32 column -- to the namespace oid via a pg_namespace lookup, the way Postgres does. That was the sole reason this query was blacklisted (the rest of it -- pg_total_relation_size / pg_relation_size -- already worked), so the NULL-fallback entry is no longer needed: the real query executes directly. 6 of 7 blacklist entries remain. Each is blocked by an unrelated, non-oid DataFusion limitation, verified by probing the exact original SQL through the full pipeline: - pgcli FK columns: unnest() of a subquery result - pgcli types: correlated scalar subqueries - psql \d policies: `array(SELECT ...)` constructor unsupported - statistics: `any(stxkind)` -> array_has type mismatch - publications: duplicate NULL projection names - grafana search_path: missing current_setting() UDF 69 tests pass (dbeaver sends the un-blacklisted query and succeeds). --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Cemre Mengu <cemremengu@gmail.com>

cemremengu and others added 2 commits June 16, 2026 00:07

fix ordered_param_types and add unit test

084c66f

Refactor sorting of parameter types to sort numerically based on placeholder keys. Add unit test to verify correct sorting of parameter types.

fix cargo fmt

696f3cf

sunng87 approved these changes Jun 22, 2026

View reviewed changes

sunng87 merged commit 21f5e43 into datafusion-contrib:master Jun 22, 2026
7 checks passed

cemremengu deleted the patch-1 branch June 22, 2026 12:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: ordered_param_types and add unit test#357

fix: ordered_param_types and add unit test#357
sunng87 merged 2 commits into
datafusion-contrib:masterfrom
cemremengu:patch-1

cemremengu commented Jun 15, 2026

Uh oh!

sunng87 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cemremengu commented Jun 15, 2026

Uh oh!

sunng87 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants