feat(python): expose register_table_function for Paimon UDTFs#324

Open
shyjsarah wants to merge 2 commits into apache:main from shyjsarah:feat/register-table-function

@shyjsarah
Contributor

Purpose

Linked issue: close #xxx

paimon-datafusion provides register_vector_search / register_full_text_search to register table-valued functions (UDTFs) on a session. However, the Python binding PySQLContext exposed only register_catalog / set_current_* / register_batch / sql, so there was no way to reach register_udtf from Python, and these UDTFs were unusable from pypaimon.

This PR exposes a single registration entry point to the Python binding.

Brief change log

  • bindings/python/src/context.rs: add SQLContext.register_table_function(name, default_database=None). A single dispatch method (rather than one method per function) keeps the Python API surface stable — it matches on the function name, currently handling vector_search and full_text_search, and raises a clear ValueError for an unknown name. The function is bound to the current catalog.
  • crates/integrations/datafusion/src/sql_context.rs: change SQLContext::current_catalog from private to pub. The binding needs the registered Arc<dyn Catalog> to pass to register_*; exposing the accessor lets it read from SQLContext instead of keeping a duplicate catalog handle.
  • bindings/python/Cargo.toml: enable the fulltext feature on paimon-datafusion (pulls in tantivy + tempfile, both pure-Rust) so register_full_text_search is compiled into the binding.

Once register_referenced_files_size / register_physical_files_size land on main, wiring them is a two-line addition to the match — the Python signature does not change.
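The dispatch described above can be modelled in plain Python. This is only a sketch of the behaviour, not the binding's Rust code: `ModelContext`, `attach_catalog`, `_make_registrar`, and `_REGISTRARS` are hypothetical names, the registrars merely record calls instead of invoking the real register_* functions, and both the `RuntimeError` for a missing catalog and the order of the two checks are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Signature of a registrar: (context, default_database) -> None.
Registrar = Callable[["ModelContext", Optional[str]], None]


def _make_registrar(function_name: str) -> Registrar:
    """Build a stand-in for a Rust register_* function (hypothetical helper)."""

    def registrar(ctx: "ModelContext", default_database: Optional[str]) -> None:
        # The real function would register a UDTF; here we only record the call.
        ctx.registered.append((function_name, ctx.current_catalog, default_database))

    return registrar


# Name -> registrar table. Supporting a new function means adding one entry;
# the signature of register_table_function stays the same.
_REGISTRARS: Dict[str, Registrar] = {
    "vector_search": _make_registrar("vector_search"),
    "full_text_search": _make_registrar("full_text_search"),
}


class ModelContext:
    """Toy model of the single-entry-point dispatch; not the real SQLContext."""

    def __init__(self) -> None:
        self.current_catalog: Optional[str] = None
        self.registered: List[Tuple[str, Optional[str], Optional[str]]] = []

    def attach_catalog(self, name: str) -> None:
        # Hypothetical stand-in for registering a catalog on the session.
        self.current_catalog = name

    def register_table_function(
        self, name: str, default_database: Optional[str] = None
    ) -> None:
        # Assumption: the catalog check comes first and raises RuntimeError.
        if self.current_catalog is None:
            raise RuntimeError("no catalog registered")
        registrar = _REGISTRARS.get(name)
        if registrar is None:
            supported = ", ".join(sorted(_REGISTRARS))
            raise ValueError(
                f"unknown table function {name!r}; supported: {supported}"
            )
        registrar(self, default_database)
```

In this model, wiring a future function such as referenced_files_size would be a single new entry in `_REGISTRARS`, which mirrors the "two-line addition to the match" noted above.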

Tests

bindings/python/tests/test_datafusion.py — 5 new tests:

  • vector_search / full_text_search register without error
  • the optional default_database keyword is accepted
  • an unknown function name raises a clear error
  • calling it before any catalog is registered raises an error

Registration touches neither the Lumina nor the Tantivy runtime, so the tests are deterministic and need no index fixtures.
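For readers who want the shape of these checks without building the binding, here is a rough sketch written against a stub. `StubContext`, `SUPPORTED`, and `raises` are hypothetical, the exception type for the missing-catalog case is an assumption, and splitting the cases into these five functions is one plausible reading of "5 new tests", not a copy of test_datafusion.py.

```python
from typing import List, Optional, Tuple

SUPPORTED = {"vector_search", "full_text_search"}


class StubContext:
    """Hypothetical stand-in for the binding's SQLContext."""

    def __init__(self, has_catalog: bool = True) -> None:
        self.has_catalog = has_catalog
        self.calls: List[Tuple[str, Optional[str]]] = []

    def register_table_function(
        self, name: str, default_database: Optional[str] = None
    ) -> None:
        if not self.has_catalog:
            raise RuntimeError("no catalog registered")  # assumed error type
        if name not in SUPPORTED:
            raise ValueError(f"unknown table function: {name}")
        self.calls.append((name, default_database))


def raises(exc_type, fn, *args, **kwargs) -> bool:
    """Return True if calling fn raises exc_type, False if it returns normally."""
    try:
        fn(*args, **kwargs)
    except exc_type:
        return True
    return False


def test_register_vector_search() -> None:
    ctx = StubContext()
    ctx.register_table_function("vector_search")
    assert ctx.calls == [("vector_search", None)]


def test_register_full_text_search() -> None:
    ctx = StubContext()
    ctx.register_table_function("full_text_search")
    assert ctx.calls == [("full_text_search", None)]


def test_default_database_keyword_is_accepted() -> None:
    ctx = StubContext()
    ctx.register_table_function("vector_search", default_database="db")
    assert ctx.calls == [("vector_search", "db")]


def test_unknown_function_name_raises() -> None:
    ctx = StubContext()
    assert raises(ValueError, ctx.register_table_function, "no_such_function")


def test_register_before_catalog_raises() -> None:
    ctx = StubContext(has_catalog=False)
    assert raises(RuntimeError, ctx.register_table_function, "vector_search")
```

None of these cases executes a query, which is why registration-only tests need no vector or full-text index fixtures.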

API and Format

  • New Python API: SQLContext.register_table_function.
  • New public Rust API: SQLContext::current_catalog (previously private).
  • Build: the binding now enables paimon-datafusion/fulltext (adds tantivy).
  • No storage format change.

Documentation

New Python-facing API. The Rust-side docs/src/sql.md already documents the underlying register_* functions; the pypaimon-facing docs live in the apache/paimon repo and can be updated as a follow-up.

shyjsarah and others added 2 commits May 18, 2026 01:38
Add `SQLContext.register_table_function(name, default_database=None)`
to the Python binding so Paimon table-valued functions can be
registered from Python — the binding previously had no way to reach
`register_udtf`.

A single dispatch method keeps the API surface stable: it currently
supports `vector_search` and `full_text_search`, and the same `match`
will pick up `referenced_files_size` / `physical_files_size` once
those land, without changing the Python signature.

The function binds to the current catalog. So the binding can obtain
that catalog without keeping a duplicate handle of its own,
`SQLContext::current_catalog` is made public. The binding also enables
the `fulltext` feature so `register_full_text_search` is available.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add tests for `SQLContext.register_table_function`:
- vector_search / full_text_search register without error
- the optional default_database keyword is accepted
- an unknown function name raises a clear error
- calling it before any catalog is registered raises

Registration alone touches neither the Lumina nor Tantivy runtime,
so these tests are deterministic and need no index fixtures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>