[feat](streaming-job) support SSL and align MySQL CDC source with PG#3
Open
JNSimba wants to merge 29 commits into
Open
[feat](streaming-job) support SSL and align MySQL CDC source with PG#3JNSimba wants to merge 29 commits into
JNSimba wants to merge 29 commits into
Conversation
|
|
|
|
- Wire ssl_mode / ssl_rootcert from job properties into Debezium's
database.ssl.{mode,truststore,truststore.password} in MySqlSourceReader.
- SmallFileMgr gains getPkcs12TruststorePath() which converts the uploaded
PEM CA cert to a PKCS12 truststore (MySQL JDBC rejects raw PEM),
reusing the same content-addressed md5 cache as PG.
- mysql-5.7 compose template now mounts shared-with-PG root CA / server
cert / key and starts mysqld with --ssl-*. require_secure_transport is
NOT set, so existing useSSL=false tests keep working.
- Regression tests mirror the PG ones for ssl / table_mapping / col_filter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add 4 JUnit cases in SmallFileMgrTest covering single cert, chain alias safety, memory cache hit and invalid PEM rejection. - Refactor SmallFileMgr.getPkcs12TruststorePath to expose a 4-arg package-private overload (mirrors getFilePath) so tests can inject a local directory. - Replace the 60s sleep in test_streaming_mysql_job_ssl with an Awaitility poll on the expected incremental state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pshot JDBC verify
- FE validates ssl_mode against {disable, require, verify-ca} and requires
ssl_rootcert when ssl_mode=verify-ca is set.
- cdc_client normalizes PG-style ssl_mode to Debezium MySQL's underscore
spelling, and mirrors SSL config into jdbcProperties using Connector/J
native names (sslMode / trustCertificateKeyStoreUrl / Type / Password)
so the snapshot JDBC path actually performs CA verification. Flink CDC's
forked MySqlConnection drops Debezium SSL props from the snapshot URL,
so without this mirror snapshot JDBC silently skips CA verification.
- SmallFileMgr's PKCS12 cache fast-path now re-checks File.exists() so a
deleted truststore is regenerated on next use instead of returning a
stale path.
- Tests: add DataSourceConfigValidatorTest and MySqlSourceReaderTest;
split SmallFileMgrTest's cache-hit case into file-present and
regenerate-when-missing; fix lexicographic '2' <= ... comparison in
the MySQL SSL regression suite; add the three missing .out files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… rootcert requirement verify-ca needs a companion ssl_rootcert after the cross-field check, so exercising it in testSslModeLegalValues (which passes only ssl_mode) now fails. Leave that case to testVerifyCaWithRootcertPasses which already supplies ssl_rootcert. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eaks The new docker/.../mysql/certs/server.key is a self-signed test-only private key (same style as the existing postgresql fixture). Extend the gitleaks private-key rule with an allowlist scoped strictly to that path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ecc49bd to
cb87814
Compare
Resolve add/add conflict on DataSourceConfigValidatorTest.java: keep both PR's SSL validation tests and master's slot/publication identifier tests (apache#62526), and update PR's tests to call the new two-argument validateSource(input, dataSourceType) signature introduced by master in apache#62490. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JNSimba
pushed a commit
that referenced
this pull request
Apr 27, 2026
…pache#62759) Related PR: apache#58002 Problem Summary: `arrow::Decimal256Array::Value()` returns raw bytes (`const uint8_t*`), we previously reinterpret_cast it to `Decimal256*` which is a misaligned access . ```text ../src/core/data_type_serde/data_type_decimal_serde.cpp:376:21: runtime error: reference binding to misaligned address 0x1338f3345b12 for type 'const FieldType' (aka 'const Decimal<integer<256, int>>'), which requires 8 byte alignment 0x1338f3345b12: note: pointer points here 00 c2 3e 20 00 00 10 63 2d 5e c7 6b 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ #0 0x56203d1eb751 in doris::DataTypeDecimalSerDe<(doris::PrimitiveType)35>::read_column_from_arrow(doris::IColumn&, arrow::Array const*, long, long, cctz::time_zone const&) const be/build_ASAN/../src/core/data_type_serde/data_type_decimal_serde.cpp:375:25 #1 0x56203d9c6d6b in doris::DataTypeNullableSerDe::read_column_from_arrow(doris::IColumn&, arrow::Array const*, long, long, cctz::time_zone const&) const be/build_ASAN/../src/core/data_type_serde/data_type_nullable_serde.cpp:350:26 #2 0x562055dd83b5 in doris::FromRecordBatchToBlockConverter::convert(doris::Block*) be/build_ASAN/../src/format/arrow/arrow_block_convertor.cpp:131:9 #3 0x562055dd9229 in doris::convert_from_arrow_batch(std::shared_ptr<arrow::RecordBatch> const&, std::vector<std::shared_ptr<doris::IDataType const>, ```
…irectory (apache#62935) In worktree dir, `.git` is not a directory but a file. before: ``` Version: doris-0.0.0-Unknown ``` now: ``` Version: doris-0.0.0-1f18be0330e ```
…daptive flush controller (apache#62744) ### What problem does this PR solve? Related PR: apache#60617 Problem Summary: `AdaptiveThreadPoolController::is_cpu_busy()` previously used normalized 1-minute load average as a proxy for CPU usage. This could misclassify IO wait pressure as CPU pressure and make adaptive flush thread shrinking too aggressive. This PR changes CPU pressure detection to use deltas of the existing `doris_be_cpu` metrics from `SystemMetrics`. It aggregates the existing CPU counters, computes CPU busy ratio from consecutive samples, and treats `iowait` as idle time for this decision path.
…che#62900) ### What problem does this PR solve? Related PR: apache#17581 Problem Summary: When add lambda function support, we forgot to add SerDe to ColumnRefExpr. In Create Alias Function, we only check serialization, not check deserialization. So we may write a invalid Expr's Json into Image or EditLog
### What problem does this PR solve?
Problem Summary:
Constraint changes can change MV rewrite eligibility, especially for
PK/FK/UK-based reasoning.
Example:
-- MV
SELECT o.o_orderkey
FROM orders o
INNER JOIN lineitem l
ON o.o_orderkey = l.l_orderkey
Without the relevant PK/FK metadata, the rewrite may fail.
After:
ALTER TABLE lineitem ADD CONSTRAINT lineitem_pk PRIMARY KEY
(l_orderkey);
ALTER TABLE orders ADD CONSTRAINT orders_fk
FOREIGN KEY (o_orderkey) REFERENCES lineitem(l_orderkey);
the same query may become rewritable because the optimizer can prove the
join relationship through constraints.
After dropping either the FK or the referenced PK, that reasoning is no
longer valid, so the rewrite should stop again.
This patch invalidates dependent MTMV rewrite caches after ADD/DROP
CONSTRAINT. The invalidation is driven by the analyzed table name and
affected base-table infos, so it does not depend on rewrite cache
contents that may have been built from old constraint metadata.
It also prevents an in-flight rewrite cache built before the constraint
change from being published after invalidation, avoiding stale PK/FK/UK
metadata from being reused by later rewrites.
---------
Co-authored-by: yangtao555 <yangtao555@jd.com>
1. use gpt-5.5 to replace gpt-5.4 xhigh to enhancement both review quality and speed 2. avoid visit out of directory since we forbid it in apache#62898 validated by zclllyybb#29
…t-to-trigger-teamcity workflow (apache#62934) ### What problem does this PR solve? Issue Number: N/A Problem Summary: When writing `COMMENT_BODY` to `GITHUB_OUTPUT` using the simple `echo "COMMENT_BODY='${COMMENT_BODY}'"` form, special characters in the PR comment body (single quotes, newlines, dollar signs, etc.) can break the `GITHUB_OUTPUT` format, potentially causing truncated values or unexpected parsing in subsequent steps. Replace it with GitHub's recommended multiline EOF-delimiter syntax using a random delimiter, which safely handles arbitrary content without any character escaping issues. **Before:** ```bash echo "COMMENT_BODY='${COMMENT_BODY}'" | tee -a "$GITHUB_OUTPUT" ``` **After:** ```bash COMMENT_BODY_DELIMITER=$(dd if=/dev/urandom bs=15 count=1 status=none | base64) { echo "COMMENT_BODY<<${COMMENT_BODY_DELIMITER}" echo "${COMMENT_BODY}" echo "${COMMENT_BODY_DELIMITER}" } | tee -a "$GITHUB_OUTPUT" ``` ### Release note None ### Check List (For Author) - Test: No need to test (security improvement to workflow script only) - Behavior changed: No - Does this need documentation: No --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…oad.* properties (apache#62680) ### What problem does this PR solve? Three categories of CREATE/ALTER JOB properties are accepted by the validator but silently ignored by the streaming runtime. Users set them, see no error, yet observe no effect — an easy source of confusion and hard-to-diagnose drift. Reject them upfront so the failure is visible. 1. **from-to path**: `schema` is a PG source-identity namespace (peer of `database`). The TVF path (`checkUnmodifiableProperties` / `cdc_stream`) already rejects it; the from-to path (`checkUnmodifiableSourceProperties`) did not. Now aligned. 2. **cdc_stream TVF path**: `snapshot_split_key`, `snapshot_split_size` and `snapshot_parallelism` are materialized into split metadata (`remainingSplits`, `chunkHighWatermarkMap`) on first fetch and never re-read at runtime. ALTER on these is a silent no-op; reject. 3. **target properties (`load.*`)**: `StreamingMultiTblTask#generateStreamLoadProps` only honors `load.max_filter_ratio` and `load.strict_mode`. Any other `load.*` (e.g. `load.where`, `load.columns`, `load.merge_type`) is accepted by `DataSourceConfigValidator.validateTarget` but dropped at runtime. Added an allow-list and reject unsupported `load.*` keys.
Problem Summary: Remove the remaining Pipeline Tracing packaging artifacts and repository tools so the codebase no longer ships dead tracing directories or conversion scripts after the runtime feature deletion.
…ngJob (apache#62747) ### What problem does this PR solve? StreamingInsertJob currently has no way to bind to a specific compute_group (cloud cluster). In multi-cluster cloud deployments this forces all streaming jobs to share whatever cluster the Coordinator happens to resolve at task time, making per-job resource isolation impossible. This PR adds a `compute_group` job property so users can pin a StreamingInsertJob to a specific compute_group, consistent with RoutineLoad's name-based binding model (aligned with PR apache#52911 which migrated RoutineLoad to name-based cluster lookup).
…pache#62804) ### What problem does this PR solve? `InsertIntoTableCommand.applyInsertPlanStatistic` populated `LoadStatistic.fileNum` from `FileScanNode.getSelectedSplitNum()`, i.e. the BE **split count**, not the number of physical input files. When a file crossed the split-size threshold (default `max_initial_file_split_size × 1.1 ≈ 35.2MB`) and was cut into multiple splits, both `jobs("type"="insert").LoadStatistic.FileNumber` and `tasks("type"="insert").LoadStatistic.FileNumber` reported a value larger than the actual file list. In the user-reported scenario, 8 input files appeared as `FileNumber = 16` because each 42MB file was split in two. Data correctness is unaffected; only the displayed statistic was misleading. This affects both streaming insert jobs and regular `INSERT INTO ... SELECT FROM S3/HDFS/Hive`.
…ne script tampering (apache#62953) ### What problem does this PR solve? Issue Number: N/A Problem Summary: **1. COMMENT_BODY output injection** Writing `COMMENT_BODY` to `GITHUB_OUTPUT` using `echo "COMMENT_BODY='${COMMENT_BODY}'"` breaks when PR comments contain special characters (single quotes, newlines, `$`, etc.), causing truncated or incorrectly parsed values in subsequent steps. Fixed by using GitHub's recommended EOF-delimiter multiline syntax. **2. Poisoned Pipeline Execution (PPE) via pipeline script tampering** Shell scripts under `regression-test/pipeline/` are sourced by TeamCity builds that have access to credentials (oss_ak/oss_sk, tokens, etc.). A malicious PR author could modify these scripts to exfiltrate secrets when any committer triggers a single test (e.g. `run p0`). Added a security guard step that detects if a PR modifies any `.sh` file under `regression-test/pipeline/` (excluding `conf/` subdirectories). If so, single test triggers are blocked with an explanatory PR comment. Only `run buildall` is allowed, requiring the committer to explicitly acknowledge they have reviewed the pipeline script changes. The detection logic is intentionally written inline in the workflow YAML (not sourced from any repository file), so it cannot be bypassed by modifying files in the PR. GitHub guarantees that `issue_comment` event workflows always run from the default branch. ### Release note None ### Check List (For Author) - Test: No need to test (security improvements to workflow only) - Behavior changed: Yes — PRs that modify `.sh` files under `regression-test/pipeline/` (excl. `conf/`) can only be triggered via `run buildall` - Does this need documentation: No --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ire (apache#62805) ## Proposed changes Fix two related bugs on the event-driven warm-up path. Together they stall BE's heavy work pool when a warm-up job is cancelled or expired while BE still has its id in `_tablet_replica_cache`. ### 1. BE — `CANCELED` vs `CANCELLED` typo `be/src/cloud/cloud_warm_up_manager.cpp` used `st.is<CANCELED>()` (one L). `CANCELED` is not in `ErrorCode`; ADL resolved it to `PCacheStatus::CANCELED = 9` from a proto enum, so the check compared against 9 and was always false. When FE returned `TStatusCode.CANCELLED` (value 1) to tell BE a job was done, BE never pushed the `job_id` into `cancelled_jobs`, leaving a zombie entry in `_tablet_replica_cache` that every subsequent `commit_rowset` re-queried. Fix: use `st.is<ErrorCode::CANCELLED>()`, matching the same namespace-qualified form used elsewhere in the file. ### 2. FE — NPE in `getTabletReplicaInfos` after job removal ```java if (job == null || job.isDone()) { LOG.info("warmup job {} is not running, notify caller BE {} to cancel job", job.getJobId(), clientAddr); // NPE when job == null ... } ``` Once a cancelled job was removed from `cloudWarmUpJobs` past `history_cloud_warm_up_job_keep_max_second`, `job` is null and the log call NPE'd. With bug #1 keeping the stale id in BE cache, BE kept hitting this RPC forever; each failure took the `apache::thrift::TException` branch in `thrift_rpc_helper.cpp`, which sleeps 2s inside `CloudWarmUpManager::_mtx`. That serialised `bthread_fork_join(commit_rowset)`, blocked heavy-pool threads in `CloudTabletsChannel::close`, and backed up the heavy-pool queue — leading to load timeouts and query `Fragment RPC Phase1` latency in the 10s range. Fix: log `request.getWarmUpJobId()` instead; it is guaranteed set by the enclosing `request.isSetWarmUpJobId()` check.
Problem Summary: CORE-6050 reports that canceling a cloud warm up job can fail to notify surviving BEs when client initialization fails on an unavailable BE. Keep the normal warmup RPC path fail-fast, but make CLEAR_JOB client initialization skip unavailable BEs and continue clearing reachable BEs.
…pache#62886) apache#58044 changed the error message when tablet is not found in cache but don't change the condition in `CloudInternalServiceImpl::warm_up_rowset`. This will cause passive warm up don't retry because the status code is not `TABLE_NOT_FOUND`. This PR fix it.
…gned memory reads (apache#62918) Data type serde code contains many manual memcpy calls that read scalar values (offsets, decimals, timestamps, serialized fields) from potentially unaligned memory addresses (Arrow buffers, binary deserialization streams). These are functionally correct but obscure the intent. The codebase already provides unaligned_load<T>() in util/unaligned.h which wraps the same memcpy with a clear typed interface. This change replaces all such manual memcpy(&local_var, ptr, sizeof(T)) patterns with auto local_var = unaligned_load<T>(ptr), making the unaligned-read semantics explicit and reducing boilerplate. Additionally: - unaligned_load now has static_assert(std::is_trivially_copyable_v<T>) matching the existing unaligned_store. - PackedInt128 / PackedUInt128 were missing explicit copy constructors, which caused a C++20 -Wdeprecated-copy error (promoted to -Werror) when unaligned_load returned by value.
…path (apache#62872) ### What problem does this PR solve? Related PR: apache#59786 Problem Summary: PR apache#59786 (commit e78d089) refactored `Scanner::try_append_late_arrival_runtime_filter()` and accidentally removed the trailing `_applied_rf_num = arrived_rf_num;` assignment. As a result `_applied_rf_num` is permanently 0 after construction: * the fast-path `_applied_rf_num == _total_rf_num` early return at the top of the function never fires, so every batch goes through the full late-arrival check; * `arrived_rf_num == _applied_rf_num` only short-circuits when no runtime filter has ever arrived. Once any RF arrives, every subsequent call needlessly `_conjuncts.clear()`s, re-clones the conjunct ctxs and appends the old ones into `_stale_expr_ctxs` — wasted CPU plus slow memory growth; * the `ApplyAllRuntimeFilters=True` info string in scanner profile (`file_scanner.cpp:384`) is never emitted; * `DCHECK(_applied_rf_num < _total_rf_num)` is effectively dead because the left-hand side is always 0. Restore the single missing assignment after cloning the new conjunct ctxs. ### Release note None ### Check List (For Author) - Test: No need to test (one-line restoration of removed assignment; behavior covered by existing runtime-filter regression tests) - Behavior changed: No - Does this need documentation: No --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Today SQL_BLOCK_RULE can block SQL by regex/sqlHash or by scan scale
thresholds such as
`partition_num`, `tablet_num`, and `cardinality`, but it cannot directly
reject queries that
scan a partitioned table without using any partition filter.
This PR adds a new scan-based SQL block rule option:
`require_partition_filter`.
When this option is enabled on a rule, Doris rejects scans on supported
partitioned tables if
the query does not contain any usable partition predicate. This helps
prevent accidental full
partition scans caused by missing partition filters.
The rule is intended for workloads where partition filters are mandatory
for safety or cost
control.
### User-visible behavior
Users can now create a SQL block rule like this:
```sql
CREATE SQL_BLOCK_RULE require_partition_filter_rule
PROPERTIES(
"require_partition_filter" = "true",
"global" = "true",
"enable" = "true"
);
Or bind it to specific users:
CREATE SQL_BLOCK_RULE require_partition_filter_rule
PROPERTIES(
"require_partition_filter" = "true",
"global" = "false",
"enable" = "true"
);
SET PROPERTY FOR 'test_user' 'sql_block_rules' =
'require_partition_filter_rule';
If a supported partitioned table is scanned without any partition
predicate, Doris returns an
error like:
sql hits sql block rule: require_partition_filter_rule, missing
partition filter
### Scope
This new rule currently applies to:
- Partitioned internal tables
- Partitioned Hive external tables
This rule does not apply to:
- Non-partitioned internal tables
- Non-partitioned Hive tables
- Iceberg tables
- Other external table types not wired into this rule yet
### Rule semantics
The new property is:
- require_partition_filter = true|false
Behavior:
- The rule only takes effect when require_partition_filter=true
- For supported partitioned tables, the scan is allowed if the query
hits any partition column
in partition pruning predicates
- Filters on non-partition columns do not count
- The rule applies to scan-producing statements, such as:
- SELECT
- INSERT INTO ... SELECT ...
- EXPLAIN is not blocked
Examples:
Blocked:
SELECT * FROM part_tbl;
SELECT * FROM part_tbl WHERE non_partition_col = 1;
INSERT INTO dst SELECT * FROM part_tbl;
Allowed:
SELECT * FROM part_tbl WHERE dt = '2026-04-09';
SELECT * FROM part_tbl WHERE dt = '2026-04-09' AND hh = '10';
INSERT INTO dst SELECT * FROM part_tbl WHERE dt = '2026-04-09';
### Compatibility and validation
require_partition_filter is treated as a scan-based SQL block condition.
It can be used together with existing scan-based limits such as:
- partition_num
- tablet_num
- cardinality
It cannot be mixed with text-based block conditions such as:
- sql
- sqlHash
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
The MySQL CDC streaming source currently lacks several configuration knobs that the PostgreSQL path already supports. This PR brings the MySQL side up to parity:
ssl_modeandssl_rootcertproperties are now wired from job config into Debezium'sdatabase.ssl.{mode,truststore,truststore.password}. Because MySQL JDBC does not accept raw PEM,SmallFileMgrgains agetPkcs12TruststorePath()helper that converts the uploaded PEM CA cert into a PKCS12 truststore on first use, reusing the same content-addressed md5 cache as the PG path. The truststore password is a fixedchangeitplaceholder (public-CA-only truststore; password is not a security boundary, just a JCA API requirement).mysql-5.7.yaml.tplnow mounts the same root CA / server cert / key as PG and startsmysqldwith--ssl-ca/cert/key.require_secure_transportis intentionally NOT set so existinguseSSL=falsetests keep working.test_streaming_mysql_job_ssl,test_streaming_mysql_job_table_mapping,test_streaming_mysql_job_col_filter.SmallFileMgrTestcovers the new PEM → PKCS12 conversion: single cert, chain alias safety (soca0/ca1don't overwrite each other), memory cache hit, invalid PEM rejection.Release note
Support SSL (`ssl_mode`, `ssl_rootcert`) for MySQL CDC streaming insert jobs.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)