Add Redshift connector to Recon #2339
Conversation
Force-pushed 4a9181b to 611e267
Force-pushed 611e267 to fb2d451
Codecov Report

❌ Patch coverage is …

Additional details and impacted files

@@ Coverage Diff @@
## main #2339 +/- ##
==========================================
+ Coverage 66.10% 66.30% +0.19%
==========================================
Files 99 100 +1
Lines 9291 9358 +67
Branches 989 993 +4
==========================================
+ Hits 6142 6205 +63
- Misses 2970 2973 +3
- Partials 179 180 +1

☔ View full report in Codecov by Sentry.
❌ 59/147 passed, 88 failed, 6 skipped, 1h19m6s total. Running from acceptance #4136.

Failures, grouped by cause:

- RuntimeError: Pipeline execution failed due to errors in steps (empty_result_step; inventory, usage): test_run_empty_result_pipeline, test_run_pipeline_with_combined_ddl, test_run_pipeline_with_ddl, test_run_pipeline
- pyodbc.InterfaceError ('28000', ... 18456): "Login failed for user 'REDSHIFT_DATABASE-CLOUD_ENV-TEST_CATALOG-admin'" on Synapse/SQL Server connections: test_synapse_query_execution, test_connection_test, test_mssql_connector_execute_query, test_profiler_connection_synapse_success (CLI output assertion fails on the same login error), test_synapse_connector_execute_query, test_synapse_connection_check, test_synapse_with_credential_format
- DataSourceRuntimeException while fetching schema via the INFORMATION_SCHEMA.COLUMNS query, ending in [RETRIES_EXCEEDED]: test_snowflake_read_schema_happy, test_sql_server_read_schema_happy
- pyspark.errors.exceptions.base.RetriesExceeded: [RETRIES_EXCEEDED] The maximum number of retries has been exceeded, across the reconcile suite: test_recon_for_report_type_schema, test_mock_data_source_no_catalog, test_reconcile_data_without_mismatches_and_missing, test_schema_recon_with_data_source_exception, test_mock_data_source_happy, test_data_recon_with_source_exception, test_reconcile_data_with_threshold_and_row_report_type, test_reconcile_data_with_mismatches_and_missing, test_compare_data_for_report_all, test_reconcile_data_with_mismatch_and_no_missing, test_build_query_for_snowflake_src_for_non_integer_primary_keys, test_capture_mismatch_data_and_cols, test_schema_recon_with_general_exception, test_build_query_for_oracle_src, test_recon_for_wrong_report_type, test_build_query_for_snowflake_src, test_reconcile_aggregate_data_mismatch_and_missing_records, test_compare_data_for_report_hash, test_reconcile_data_missing_and_no_mismatch, test_aggregates_reconcile_store_aggregate_metrics, test_capture_mismatch_data_and_cols_no_mismatch, test_data_recon_with_general_exception, test_build_query_for_databricks_src, test_capture_mismatch_data_and_cols_fail, test_capture_mismatch_data_and_cols_special_column_names, test_recon_capture_start_snowflake_all, test_databricks_read_schema_happy, test_generate_final_reconcile_output_row, test_recon_for_report_type_is_data, test_reconcile_aggregate_data_missing_records, test_test_recon_capture_start_databricks_row, test_recon_capture_start_oracle_with_exception, test_build_query_for_snowflake_without_transformations, test_compare_data_special_column_names, test_apply_threshold_for_mismatch_with_missing, test_test_recon_capture_start_databricks_data, test_generate_final_reconcile_output_data, test_generate_final_reconcile_output_schema, test_generate_final_reconcile_output_all, test_recon_capture_start_oracle_schema, test_recon_capture_start_with_exception, test_generate_final_reconcile_output_exception, test_apply_threshold_for_mismatch_with_true_absolute, test_apply_threshold_for_mismatch_with_schema_fail, test_apply_threshold_for_mismatch_with_wrong_absolute_bound, test_apply_threshold_for_mismatch_with_true_percentage_bound, test_apply_threshold_for_mismatch_with_invalid_bounds, test_apply_threshold_for_only_threshold_mismatch_with_true_absolute, test_random_sampler_count, test_stratified_sampler_count, test_stratified_sampler_negative_count, test_random_sampler_negative_count, test_redshift_schema_compare, test_snowflake_schema_compare, test_apply_threshold_for_mismatch_with_wrong_percentage_bound, test_schema_compare, test_tsql_schema_compare, test_oracle_schema_compare, test_databricks_schema_compare
- Transpiler install/discovery failures (assert None is not None, assert False, ValueError: No such transpiler: Morpheus): test_installs_and_runs_maven_morpheus, test_installs_and_runs_pypi_bladebridge, test_transpiles_informatica_to_sparksql, test_transpiles_informatica_to_sparksql_non_interactive[True], test_transpiles_informatica_to_sparksql_non_interactive[False], test_transpile_teradata_sql, test_transpile_teradata_sql_non_interactive[True], test_transpile_teradata_sql_non_interactive[False], test_gets_maven_artifact_version, test_downloads_from_maven, test_gets_pypi_artifact_version, test_transpiles_all_dbt_project_files, test_transpile_sql_file
- databricks.sdk OperationFailed: reconciliation workflow runs ended in RunLifeCycleState.INTERNAL_ERROR: test_recon_redshift_job_succeeds (library installation failed due to user error), test_recon_snowflake_job_succeeds and test_recon_sql_server_job_succeeds (workload failed, see run output)
Force-pushed fb2d451 to e952d15
Force-pushed e952d15 to b095af4
m-abulazm left a comment:
Looks good. We need to run the integration tests to be sure.
**Summary**

Thanks for adding Redshift support to Recon.

**Scope**

The change set is large relative to the PR title ("Add Redshift connector to Recon"): it includes many unrelated areas (e.g. profiler/Synapse, workflows, broader config/telemetry). For reviewability and release clarity, consider updating the title/description to reflect the full scope.

**Verify before merge**

**Suggestions (non-blocking)**
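For reviewers new to the connector pattern, here is a minimal sketch of the shape a Recon-style Redshift schema read typically takes. This is illustrative only: `RedshiftDataSource`, `ColumnInfo`, and `execute_query` are hypothetical placeholder names, not this PR's actual API.

```python
# Hypothetical sketch of the connector shape; not this PR's actual code.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ColumnInfo:
    name: str
    data_type: str


class RedshiftDataSource:
    """Reads column metadata for schema compare from a Redshift connection."""

    def __init__(self, execute_query: Callable[[str, tuple], Sequence[tuple]]):
        # execute_query: runs parameterized SQL and returns rows of tuples,
        # e.g. a thin wrapper over a DB-API cursor (placeholder signature).
        self._execute_query = execute_query

    def read_schema(self, schema: str, table: str) -> list[ColumnInfo]:
        # SVV_COLUMNS is Redshift's system view for column metadata; precision
        # and scale are folded into the type string so numeric types can be
        # compared against their normalized Databricks counterparts.
        sql = """
            SELECT column_name,
                   CASE
                     WHEN numeric_precision IS NOT NULL AND numeric_scale IS NOT NULL
                       THEN data_type || '(' || CAST(numeric_precision AS VARCHAR)
                            || ',' || CAST(numeric_scale AS VARCHAR) || ')'
                     ELSE data_type
                   END AS data_type
            FROM svv_columns
            WHERE LOWER(table_schema) = LOWER(%s)
              AND LOWER(table_name) = LOWER(%s)
            ORDER BY ordinal_position
        """
        rows = self._execute_query(sql, (schema, table))
        return [ColumnInfo(name=row[0], data_type=row[1]) for row in rows]
```

The same pattern (one query per source, normalized into a common column model) is what makes schema compare dialect-agnostic downstream.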
**Scope**: The PR doesn't seem to affect any part of the code apart from recon. Please elaborate.
Thanks for the follow-up; a few clarifications on my earlier review.

**Scope**

Looking at the current branch again, the changes are reconcile-scoped (Redshift connector, adapter wiring, hash/query bits, constants, docs, tests). My comment about the PR being much broader than the title does not apply to this revision; I was likely looking at an older diff or mixed it up with another PR. No action needed from you on that, unless you still have unrelated commits that haven't been pushed to this PR yet.
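Since the "hash/query bits" came up, here is a toy illustration of the kind of dialect-aware row-hash expression a reconcile query builder produces. Every name below is hypothetical and the dialect choices (SHA2 on Databricks, MD5 on Redshift) are picked only for illustration; this sketches the general technique, not the PR's implementation.

```python
# Toy sketch of dialect-aware row-hash building (all names hypothetical).
# The idea: both sides must hash identical, NULL-normalized row content so
# that row-level compare can join source and target on the hash.
HASH_FN = {
    "databricks": lambda expr: f"SHA2({expr}, 256)",
    "redshift": lambda expr: f"MD5({expr})",  # MD5 chosen only for illustration
}


def build_hash_expr(columns: list[str], dialect: str) -> str:
    # Deterministic column order plus NULL -> '' keeps the hash stable across
    # engines; real builders also handle per-dialect casts and formatting.
    normalized = [f"COALESCE(CAST({col} AS VARCHAR), '')" for col in sorted(columns)]
    return HASH_FN[dialect](" || ".join(normalized))


if __name__ == "__main__":
    print(build_hash_expr(["id", "amount"], "redshift"))
    # MD5(COALESCE(CAST(amount AS VARCHAR), '') || COALESCE(CAST(id AS VARCHAR), ''))
```

Any per-source normalization mismatch (casting, NULL handling, column order) shows up as false mismatches in the data recon, which is why this piece deserves dedicated tests per dialect.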