Fix reconcile schema fetch failure on Foreign Catalogs (Lakehouse Federation)#2367

Open
moomindani wants to merge 3 commits into databrickslabs:main from moomindani:fix/foreign-catalog-schema-fallback

Conversation

@moomindani
Contributor

@moomindani moomindani commented Apr 9, 2026

Summary

  • Fixes [BUG]: Reconciliation fails with UNRESOLVED_COLUMN error on Foreign Catalogs (Lakehouse Federation) #2366
  • Foreign Catalogs (created via Lakehouse Federation) lack the Databricks-specific full_data_type column in information_schema.columns, causing UNRESOLVED_COLUMN errors for all reconcile report types (schema, data, row, all)
  • Split DatabricksDataSource into two subclasses per review feedback:
    • DatabricksSourceDataSource: uses DESCRIBE TABLE (works for hive_metastore, global_temp, and Foreign Catalogs)
    • DatabricksTargetDataSource: uses information_schema with full_data_type (for native Unity Catalog tables)
  • The source_adapter and utils now create the appropriate subclass based on source vs target role
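The subclass split described above can be sketched roughly as follows. This is a hypothetical illustration, assuming illustrative class and method names; the actual lakebridge classes have more responsibilities than query construction:

```python
# Hypothetical sketch of the source/target subclass split; real lakebridge
# class and method names may differ.
class DatabricksDataSource:
    def get_schema_query(self, catalog: str, schema: str, table: str) -> str:
        raise NotImplementedError


class DatabricksSourceDataSource(DatabricksDataSource):
    # DESCRIBE TABLE works for hive_metastore, global_temp, and Foreign
    # Catalogs, none of which expose full_data_type in information_schema.
    def get_schema_query(self, catalog: str, schema: str, table: str) -> str:
        return f"describe table {catalog}.{schema}.{table}"


class DatabricksTargetDataSource(DatabricksDataSource):
    # Native Unity Catalog tables expose the Databricks-specific
    # full_data_type column, which carries the complete type string.
    def get_schema_query(self, catalog: str, schema: str, table: str) -> str:
        return (
            "select column_name, full_data_type "
            f"from {catalog}.information_schema.columns "
            f"where table_schema = '{schema}' and table_name = '{table}'"
        )
```

With this shape, the source vs target role decides the class once at adapter-creation time, so no query-time branching or exception handling is needed.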

Test plan

  • Existing unit tests pass
  • New tests: test_get_schema_target, test_get_schema_source, test_get_schema_source_foreign_catalog, test_get_schema_target_exception_handling, test_get_schema_source_exception_handling
  • Source adapter tests updated: test_create_adapter_for_databricks_dialect_source, test_create_adapter_for_databricks_dialect_target
  • Integration test: Reconcile with Foreign Catalog (Lakebase PostgreSQL via Lakehouse Federation) as direct source

Fix reconcile schema fetch failure on Foreign Catalogs (Lakehouse Federation)

Foreign Catalogs created via Lakehouse Federation do not have the
Databricks-specific `full_data_type` column in `information_schema.columns`,
causing an UNRESOLVED_COLUMN error for all report types. This adds a
DESCRIBE TABLE fallback when the `full_data_type` column is not found.

Co-authored-by: Isaac
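The fallback flow this commit describes can be sketched as below. The function and connection names are hypothetical, used only to illustrate the exception-driven fallback that the review later replaced with subclasses:

```python
def get_schema_with_fallback(connection, catalog: str, schema: str, table: str):
    """Hypothetical sketch: try information_schema, fall back to DESCRIBE TABLE."""
    schema_query = (
        "select column_name, full_data_type "
        f"from {catalog}.information_schema.columns "
        f"where table_schema = '{schema}' and table_name = '{table}'"
    )
    try:
        return connection.execute(schema_query)
    except Exception as e:
        # Re-raise unrelated errors; only a missing full_data_type column
        # signals a catalog without the Databricks-specific schema view.
        if "full_data_type" not in str(e):
            raise
        # Foreign Catalogs lack full_data_type; DESCRIBE TABLE still works.
        return connection.execute(f"describe table {catalog}.{schema}.{table}")
```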
@moomindani moomindani requested a review from a team as a code owner April 9, 2026 06:19
@sundarshankar89
Collaborator

@BesikiML and @bishwajit-db can you take a look at this when you get a chance

@asnare added labels bug (Something isn't working) and feat/recon (making sure that remorphed query produces the same results as original) on Apr 9, 2026
Contributor

@bishwajit-db bishwajit-db left a comment


Overall LGTM. Please have a look at the comment https://github.com/databrickslabs/lakebridge/pull/2367/changes#r3056652489

  if "full_data_type" not in str(e):
      return self.log_and_throw_exception(e, "schema", schema_query)

  # Fallback to DESCRIBE TABLE for catalogs that lack the full_data_type column
Contributor


Can we add an optional parameter to _get_schema_query?

  def _get_schema_query(catalog: str, schema: str, table: str, force_describe: bool = False):
      if catalog == "hive_metastore" or force_describe:
          return f"describe table {catalog}.{schema}.{table}"
      ...

Then the fallback becomes:

describe_query = _get_schema_query(catalog_str, schema, table, force_describe=True)

Contributor Author


Thanks for the review! Updated to reuse _get_schema_query with a force_describe parameter instead of a separate function. Please take another look.

Address review feedback from @bishwajit-db — reuse _get_schema_query
with a force_describe parameter instead of a separate
_get_describe_table_query function.

Co-authored-by: Isaac
Contributor

@m-abulazm m-abulazm left a comment


Thanks for raising this. I documented a different approach, since scaling this might require dedicated logic for Hive vs Foreign Catalog vs native catalog. To stay open for future changes, I would prefer a subclass instead of conditional control.



  - def _get_schema_query(catalog: str, schema: str, table: str):
  + def _get_schema_query(catalog: str, schema: str, table: str, force_describe: bool = False):
Contributor


I would prefer to split this into _get_schema_query_for_source, which uses DESCRIBE TABLE exclusively, since we know Databricks sources can only be Hive, global views, or Foreign Catalogs, which only support DESCRIBE TABLE.

and _get_schema_query_for_target, which uses the SELECT query only, as that is always the needed behavior.

This requires distinguishing source vs target Databricks data sources, so I would implement a subclass instead of nested conditionals and control through exception handling.

Also note that this rubs a bit against PR #2362. It should not block this PR, but keep in mind that there might be conflicts.

Contributor Author


Thanks for the detailed feedback! Refactored to split DatabricksDataSource into two subclasses:

  • DatabricksSourceDataSource: always uses DESCRIBE TABLE (works for hive_metastore, global_temp, and Foreign Catalogs)
  • DatabricksTargetDataSource: uses information_schema with full_data_type (for native UC tables)

The source_adapter and utils now create the appropriate subclass based on source vs target role. This eliminates the need for exception-based fallback.

Also verified with an integration test: a reconcile against a Foreign Catalog (Lakebase PostgreSQL via Lakehouse Federation) as the direct source succeeded.

Noted the potential conflict with #2362 — will keep an eye on it.

Address review feedback from @m-abulazm — instead of exception-based
fallback, split DatabricksDataSource into:
- DatabricksSourceDataSource: uses DESCRIBE TABLE (works for hive,
  global_temp, and Foreign Catalogs via Lakehouse Federation)
- DatabricksTargetDataSource: uses information_schema with full_data_type
  (optimal for native Unity Catalog tables)

The source_adapter and utils are updated to create the appropriate
subclass based on source vs target role.

Co-authored-by: Isaac