
Adding feature to integrate Teradata profiler with Lakebridge #2343

Open
dey-abhishek wants to merge 26 commits into main from feature/teradata_profiler

dey-abhishek commented Mar 23, 2026

Changes

What does this PR do?

Adds end-to-end Teradata profiler support and hardens the Teradata assessment workflow with:

  • extraction + ingestion + Lakeview dashboard publishing
  • PDCR-aware runtime behavior with DBQL-core fallback
  • improved Teradata dashboard UX and query quality
  • optional security/governance controls for SQLTextInfo in Unity Catalog

Relevant implementation details

  • Added Teradata assessment resources and pipeline config under resources/assessments/teradata/:
    • system, storage, object, UDF, PDCR, and DBQL extraction SQL/DDL assets
  • Added Teradata dashboard template:
    • lakebridge_teradata_profiler_summary.lvdash.json
    • includes KPI/system/storage/object/UDF/workload views
    • increased Top Queries SQL preview truncation from 120 to 300 chars
  • Added PDCR handling behavior:
    • optional use_pdcr in credentials
    • runtime fallback to DBQL-core when PDCR SQL step is disabled/unavailable
  • Updated CLI behavior:
    • create-profiler-dashboard now auto-resolves extract path from pipeline config when --extract-file is omitted
  • Ingestion (assessments/dashboards/execute.py) improvements:
    • robust multilingual text normalization
    • sensitivity metadata tagging for td_dbql_core_info_extract.SQLTextInfo
    • optional best-effort post-ingestion SQL masking:
      • LAKEBRIDGE_ENABLE_SQLTEXT_MASK=true
      • optional bypass group via LAKEBRIDGE_SQLTEXT_MASK_BYPASS_GROUP
  • Added/updated tests:
    • Teradata unit tests for profiler, validator, dashboard execute/manager, config
    • Teradata integration test coverage for assessment/profiler flow
    • governance test coverage for SQLTextInfo comment/tag/mask behavior
  • Added docs:
    • Teradata profiler guide with prerequisites, PDCR mode, multilingual handling
    • new Security/Governance options section for tagging + masking + UC guidance
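The PDCR fallback described above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the `Step` class, the `select_workload_step` helper, and the step names `"pdcr"` and `"dbql_core"` are hypothetical, and treating a missing `use_pdcr` key as false is an assumption. Only the `use_pdcr` credential key and the fallback behavior come from the description.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for one extraction step in the pipeline config."""
    name: str
    enabled: bool = True


def select_workload_step(credentials: Mapping[str, object], steps: List[Step]) -> str:
    """Pick the workload extraction step, preferring PDCR when it is usable.

    Assumption: a missing `use_pdcr` key means PDCR was not requested.
    """
    wants_pdcr = bool(credentials.get("use_pdcr", False))
    # PDCR counts as available only if its step exists and is enabled.
    pdcr_available = any(s.name == "pdcr" and s.enabled for s in steps)
    # Fall back to DBQL-core when PDCR is not requested, disabled, or absent.
    return "pdcr" if wants_pdcr and pdcr_available else "dbql_core"
```

With this shape, a disabled PDCR step and a missing PDCR step take the same DBQL-core path, so callers do not need to tell the two cases apart.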

Caveats/things to watch out for when reviewing:

  • Synapse integration tests under tests/integration/assessments/test_pipeline.py require TEST_TSQL_JDBC; they are unrelated to Teradata but may fail in environments where it is not set.
  • SQL masking in ingestion is opt-in and best-effort:
    • ingestion should not fail if UC permissions for function/mask/tag/comment are missing.
  • SQLTextInfo may still contain sensitive business SQL if masking is not enabled.
  • create-profiler-dashboard now infers extract path from the configured pipeline; custom extract paths still require explicit --extract-file.
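To make the "opt-in and best-effort" caveat concrete, here is a minimal sketch of how such a gate could look. It is an illustration under stated assumptions, not the implementation in execute.py: the helper names `masking_enabled` and `apply_sqltext_governance` and the `apply_mask` callback are hypothetical, and comparing the flag case-insensitively is an assumption. The two environment variable names are taken from the description above.

```python
import logging
import os
from typing import Callable, Mapping, Optional

logger = logging.getLogger(__name__)


def masking_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Masking is opt-in: only an explicit 'true' turns it on (assumed case-insensitive)."""
    env = os.environ if env is None else env
    return env.get("LAKEBRIDGE_ENABLE_SQLTEXT_MASK", "").strip().lower() == "true"


def apply_sqltext_governance(
    apply_mask: Callable[[Optional[str]], None],
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Run the (hypothetical) masking callback; return True only if it was applied."""
    env = os.environ if env is None else env
    if not masking_enabled(env):
        return False
    # An empty or missing bypass group is passed to the callback as None.
    bypass_group = env.get("LAKEBRIDGE_SQLTEXT_MASK_BYPASS_GROUP") or None
    try:
        apply_mask(bypass_group)
    except Exception as exc:  # best-effort: a governance failure must not fail ingestion
        logger.warning("SQLTextInfo masking skipped: %s", exc)
        return False
    return True
```

The broad `except` is deliberate in this sketch: a missing Unity Catalog permission surfaces as a warning, and the caller can still report whether masking actually took effect.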

Linked issues

Resolves #..

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge create-profiler-dashboard
  • ... +add your own:
    • added Teradata profiler extraction resources and dashboard template
    • added optional post-ingestion SQL masking for SQLTextInfo
    • added sensitivity metadata tagging/commenting for SQLTextInfo

Tests

  • manually tested
  • added unit tests
  • added integration tests

codecov Bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 70.24221% with 86 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.53%. Comparing base (9586bc8) to head (4c99133).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../labs/lakebridge/assessments/dashboards/execute.py | 65.32% | 32 Missing and 11 partials ⚠️ |
| src/databricks/labs/lakebridge/cli.py | 26.08% | 17 Missing ⚠️ |
| ...databricks/labs/lakebridge/assessments/profiler.py | 85.45% | 4 Missing and 4 partials ⚠️ |
| ...ks/labs/lakebridge/connections/database_manager.py | 57.89% | 6 Missing and 2 partials ⚠️ |
| ...databricks/labs/lakebridge/assessments/pipeline.py | 41.66% | 7 Missing ⚠️ |
| ...databricks/labs/lakebridge/deployment/dashboard.py | 75.00% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2343      +/-   ##
==========================================
+ Coverage   66.10%   68.53%   +2.43%     
==========================================
  Files          99       99              
  Lines        9291     9539     +248     
  Branches      989     1037      +48     
==========================================
+ Hits         6142     6538     +396     
+ Misses       2970     2781     -189     
- Partials      179      220      +41     

github-actions Bot commented Mar 23, 2026

✅ 143/143 passed, 11 flaky, 20 skipped, 1h11m58s total

Flaky tests:

  • 🤪 test_installs_and_runs_local_bladebridge (21.24s)
  • 🤪 test_installs_and_runs_pypi_bladebridge (29.728s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (16.917s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (17.094s)
  • 🤪 test_transpile_teradata_sql (19.991s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (5.934s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (6.088s)
  • 🤪 test_databricks_read_schema_happy_TEST_CATALOG (10m23.176s)
  • 🤪 test_transpiles_informatica_to_sparksql (15.015s)
  • 🤪 test_snowflake_read_schema_happy (10m12.944s)
  • 🤪 test_reconcile_data_with_mismatches_and_missing (10m15.371s)

Running from acceptance #4086

dey-abhishek added the feat/profiler label (Issues related to profilers) Mar 24, 2026

dey-abhishek (Author) commented

Hi @sundarshankar89, please share your review comments.
