Skip to content

Fix memory leak in mongo-c-driver retryable command retry loop#107448

Open
groeneai wants to merge 1 commit into
ClickHouse:masterfrom
groeneai:groeneai/fix-mongo-c-driver-reply-leak
Open

Fix memory leak in mongo-c-driver retryable command retry loop#107448
groeneai wants to merge 1 commit into
ClickHouse:masterfrom
groeneai:groeneai/fix-mongo-c-driver-reply-leak

Conversation

@groeneai

Copy link
Copy Markdown
Contributor

Related: ClickHouse/mongo-c-driver#13
Related: #107429

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed a memory leak in the bundled mongo-c-driver that could occur when reading from MongoDB (for example via a MongoDB dictionary or the mongodb table function) if a retryable read error was followed by a failed retry server selection.

Description

Bumps the mongo-c-driver submodule by one commit to fix a reply memory leak in the retryable command retry loop.

_mongoc_retryable_cmd_run reuses *reply across retry attempts. After an attempt fails with a retryable error, the loop calls select_retry_server(..., reply, ...) while reply still owns that attempt's allocation. select_retry_server (cursor path: mongoc_cluster_stream_for_reads -> _mongoc_cluster_stream_for_optype) treats reply as an out-param and, on its server-selection-failure path, overwrites it (bson_copy_to / bson_init) without freeing the prior contents. The loop's own bson_destroy(reply) sits after the if (!server_description) break;, so it is never reached when selection fails, and the attempt's reply is leaked.

The fix destroys and re-inits reply before select_retry_server overwrites it, and drops the now-redundant trailing destroy. See ClickHouse/mongo-c-driver#13 for the driver change.

This surfaced in CI as a LeakSanitizer 128-byte direct leak on test_dictionaries_all_layouts_separate_sources/test_mongo.py::test_simple_ssl (Integration tests (amd_asan_ubsan, ...)):

Direct leak of 128 byte(s) in 1 object(s) allocated from:
    bson_malloc                     contrib/mongo-c-driver/src/libbson/src/bson/memory.c:106
    bson_copy_to                    contrib/mongo-c-driver/src/libbson/src/bson/bson.c:2106
    _mongoc_cluster_run_opmsg_recv  contrib/mongo-c-driver/src/libmongoc/src/mongoc/mongoc-cluster.c:3313
    mongoc_cluster_run_opmsg / run_command_monitored / mongoc_cluster_run_command_monitored
    _retryable_cursor_command_execute / _mongoc_retryable_cmd_run
    _mongoc_cursor_run_command / _mongoc_cursor_response_refresh / _prime / mongoc_cursor_next
    DB::MongoDBSource::generate()   src/Processors/Sources/MongoDBSource.cpp:215

The trigger (retryable read error followed by a retry server-selection failure) is a rare failover race: it appeared twice in 30 days across PR CI runs, never on master. Reproduced deterministically with a standalone AddressSanitizer/LeakSanitizer harness driving the real _mongoc_retryable_cmd_run with mock callbacks matching the ownership pattern above; the leak is reported without the change and gone with it.

Found by @alexey-milovidov on #107429 and fixed here on his request.

Report: https://s3.amazonaws.com/clickhouse-test-reports/107429/index.html (Integration tests, amd_asan_ubsan, db disk, old analyzer, 3/6)

@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

# Question Answer
a Deterministic repro? Yes. A standalone ASan/LeakSanitizer harness drives the real _mongoc_retryable_cmd_run with mock execute/select_retry_server callbacks matching the ownership pattern. The leak is reported 100% of runs without the fix; clean with it.
b Root cause explained? Yes. The retry loop calls select_retry_server(..., reply, ...) while reply still owns the previous attempt's bson_copy_to allocation. select_retry_server (stream_for_reads -> _mongoc_cluster_stream_for_optype) overwrites reply as an out-param on its selection-failure path without freeing it, and the loop's bson_destroy(reply) sits after the !server_description break, so it is never reached. The attempt's reply leaks.
c Fix matches root cause? Yes. bson_destroy(reply); bson_init(reply); is moved to just before select_retry_server overwrites it (the existing reset idiom in mongoc-cluster.c OIDC reauth path), and the now-redundant trailing destroy is removed. It targets the lifetime bug directly, not a symptom.
d Test intent preserved / new tests added? The ASan/LSan harness is the regression proof. Note: ClickHouse does not build the mongo-c-driver test suite (only test.util phony targets), so a driver-internal C test would never run in ClickHouse CI; the leak is instead exercised by the existing test_mongo.py integration tests under sanitizers.
e Both directions demonstrated? Yes. Without fix: LeakSanitizer: detected memory leaks ... bson_malloc <- _mongoc_retryable_cmd_run, exit 1. With fix: no leak, exit 0. Same allocation site as the CI report.
f Fix is general, not a narrow patch? Yes. The fix is at the retry loop, the single owner of the cross-attempt reply lifetime, not a guard at the crash site. Sibling out-param writers (_handle_network_error, _mongoc_cluster_stream_for_optype) all assume a caller-owned empty reply; the retry loop was the one place violating that contract.

Session id: cron:clickhouse-worker-slot-2:20260614-184800

@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label Jun 14, 2026
@groeneai

Copy link
Copy Markdown
Contributor Author

cc @alexey-milovidov @rschu1ze — could you review this? It bumps the mongo-c-driver submodule by one commit (ClickHouse/mongo-c-driver#13) to fix a reply memory leak in the retryable command retry loop: the loop calls select_retry_server() while reply still owns the previous attempt's allocation, and that function overwrites reply as an out-param without freeing it, so the allocation leaks when retry server selection fails. This is the test_simple_ssl LeakSanitizer failure from #107429.

@clickhouse-gh

clickhouse-gh Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [59168d5]


AI Review

Summary

This PR bumps contrib/mongo-c-driver by one commit to fix a bson_t reply leak in _mongoc_retryable_cmd_run. The leak fix targets a real ownership bug, but the current generic reset also changes retryable-read error semantics for command helpers that rely on preserving the first failed server reply when retry server selection itself fails.

Missing context / blind spots
  • ⚠️ The direct PR CI report currently reports Total: 0 test results, so completed PR CI is not available as validation evidence in this review.
Findings
  • ⚠️ Majors
    • [contrib/mongo-c-driver:1] The submodule bump moves src/libmongoc/src/mongoc/mongoc-retryable-cmd.c to a version that destroys and reinitializes reply before every select_retry_server callback. That violates the retryable-read command path's behavior of returning the original retryable error when retrying is impossible: _retryable_read_select_retry_server ignores reply and calls mongoc_cluster_stream_for_reads(..., NULL, NULL), so on selection failure mongoc_client_read_command_with_opts now returns {} instead of the first failed command's server reply, losing errorLabels and server error fields.
    • Suggested fix: keep the reset local to callbacks that actually pass reply as an out-param, such as _retryable_cursor_commmand_select_retry_server, or otherwise preserve the previous reply until a select callback replaces it.
Tests
  • ⚠️ Add focused regression coverage for a retryable read command failure followed by retry server selection failure, asserting the caller-visible reply still contains the original server error document.
Final Verdict

Status: ⚠️ Request changes

Minimum required actions: preserve the original reply for retry-server callbacks that do not overwrite it, and add focused coverage for that error path or point to automated coverage that would fail on the regression above.

@clickhouse-gh clickhouse-gh Bot added pr-bugfix Pull request with bugfix, not backported by default submodule changed At least one submodule changed in this PR. labels Jun 14, 2026
Comment thread contrib/mongo-c-driver Outdated
@@ -1 +1 @@
Subproject commit a26c8e251fec147258e89b7e85e9adfa259fc42f
Subproject commit 5aad6ab39aea8a2aa425404bc3d82bbc6fe581cd

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generic reset drops the original server reply for retryable read commands when retry server selection fails. _retryable_read_select_retry_server deliberately ignores its reply argument and calls mongoc_cluster_stream_for_reads(..., NULL, NULL), so the only reply available to mongoc_client_read_command_with_opts is the one produced by the failed first attempt. Before this submodule commit, a selection failure took the !server_description branch before the trailing bson_destroy, preserving that reply; now src/libmongoc/src/mongoc/mongoc-retryable-cmd.c:106 replaces it with {} before the callback can fail. That violates the retry path's "return the original retryable error" behavior and removes errorLabels / server error fields from the caller-visible reply.

Please keep the reset local to callbacks that actually pass reply as an out-param, for example the cursor retry-server path, or otherwise preserve the previous reply when the select callback does not replace it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, thanks. I confirmed it against the driver source.

Of the three select_retry_server callbacks, only the cursor one passes reply as a real out-param:

  • _retryable_read_select_retry_server (client.c): BSON_UNUSED(reply), calls mongoc_cluster_stream_for_reads(..., NULL /* reply */, NULL).
  • _retryable_write_select_retry_server (cluster.c): BSON_UNUSED(reply), calls mongoc_cluster_stream_for_writes(..., NULL /* reply */, NULL).
  • _retryable_cursor_commmand_select_retry_server (cursor.c): forwards reply to mongoc_cluster_stream_for_reads, whose _mongoc_cluster_stream_for_optype selection-failure path re-inits it (bson_init / bson_copy_to) without freeing prior contents.

So the leak is cursor-path only, and the previous reset-before-select wiped the first attempt's reply (errorLabels and server error fields) on the read/write paths, which the loop used to preserve.

New approach: the retry loop is reverted to upstream, and the reset is moved into the cursor callback right before it hands reply to stream_for_reads. Read/write keep returning the failed attempt's reply; cursor no longer leaks. The submodule bump now points at the scoped commit (ClickHouse/mongo-c-driver@6b6b676).

Verified with a standalone ASan/LSan harness driving the loop's reply lifecycle with real libbson, one scenario per process:

  orig/read      leak=0 errorLabels=1   (baseline)
  orig/cursor    leak=1 errorLabels=0   <- the leak (256B)
  v1/read        leak=0 errorLabels=0   <- the regression you flagged
  scoped/read    leak=0 errorLabels=1   <- fix: errorLabels preserved
  scoped/cursor  leak=0 errorLabels=0   <- fix: leak gone

@rschu1ze

Copy link
Copy Markdown
Member

@groeneai Send the fix first upstream before integrating it in ClickHouse.

…ry loop

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai groeneai force-pushed the groeneai/fix-mongo-c-driver-reply-leak branch from 59168d5 to 6d09e59 Compare June 14, 2026 20:58
@clickhouse-gh clickhouse-gh Bot added the manual approve Manual approve required to run CI label Jun 14, 2026
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (updated for the scoped fix, commit 6d09e59)

Revised after @clickhouse-gh[bot]'s review. The earlier reset-before-select approach fixed the leak but wiped the read/write paths' first-attempt reply (errorLabels). This version moves the reset into the cursor callback only.

# Question Answer
a Deterministic repro? Yes. A standalone ASan/LSan harness drives the _mongoc_retryable_cmd_run reply lifecycle with real libbson, one scenario per process. orig/cursor leaks 256 bytes 100% of runs; v1/read loses errorLabels 100% of runs.
b Root cause explained? Yes. The loop reuses *reply across attempts. Only the cursor select_retry_server callback forwards reply to mongoc_cluster_stream_for_reads; on its selection-failure path _mongoc_cluster_stream_for_optype re-inits reply (bson_init/bson_copy_to) without freeing prior contents, and the loop's trailing bson_destroy(reply) sits after the !server_description break, so the attempt's reply leaks. The read and write callbacks BSON_UNUSED(reply) and pass NULL downstream, so they never overwrite it.
c Fix matches root cause? Yes. The reset (bson_destroy(reply); bson_init(reply);) is placed inside the cursor callback, immediately before it hands reply to stream_for_reads (the single point where prior contents are about to be orphaned). The retry loop is reverted to upstream byte-for-byte.
d Test intent preserved / new tests added? The scoped fix restores the read/write paths' prior behavior (they keep returning the failed attempt's reply with errorLabels) that the v1 change broke. The ASan/LSan harness is the regression proof. ClickHouse does not build the mongo-c-driver test suite, so the leak is exercised by the existing test_mongo.py integration tests under sanitizers, same as how it was caught.
e Both directions demonstrated? Yes. Harness, one scenario per process under LSan: orig/cursor leak=1 errorLabels=0 (the original leak); v1/read leak=0 errorLabels=0 (the regression flagged in review); scoped/read leak=0 errorLabels=1 (errorLabels preserved); scoped/cursor leak=0 errorLabels=0 (leak gone).
f Fix is general, not a narrow patch? Yes. I checked every caller of mongoc_cluster_stream_for_reads/_for_writes: all others pass NULL or a fresh caller-owned reply. The only contract violation (reusing a live reply across the out-param boundary) is the retry loop, and within it only the cursor callback forwards reply. The fix is at that single owner, not a guard at the symptom.

Session id: cron:clickhouse-worker-slot-0:20260614-200800

@alexey-milovidov

Copy link
Copy Markdown
Member

Ok. One concern is that upstream could be managed by AI-haters. This often happens in open-source. So they can insult and humiliate our bot. Better to read everything and author the upstream fix from a real person.

@groeneai

Copy link
Copy Markdown
Contributor Author

On sending the fix upstream first: agreed, and I won't open a PR against mongodb/mongo-c-driver myself given the concern about how an AI-authored PR would land there.

The change ports cleanly for a human to author upstream. The retry loop is left byte-for-byte as upstream; the single reset (bson_destroy(reply); bson_init(reply);) moves into _retryable_cursor_commmand_select_retry_server in mongoc-cursor.c, just before it forwards reply to mongoc_cluster_stream_for_reads. That is the only select_retry_server callback that passes reply as an out-param, so it is the only leaking path; the read and write callbacks keep returning the failed attempt's reply with its errorLabels. The diff is in ClickHouse/mongo-c-driver#13, and I can hand over a clean git format-patch for whoever submits it. The standalone ASan/LSan harness confirms the leak is gone and errorLabels is preserved on the read path (results above).

Holding this PR until the upstream fix lands; I'll rebase the submodule bump onto whatever commit is accepted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors manual approve Manual approve required to run CI pr-bugfix Pull request with bugfix, not backported by default submodule changed At least one submodule changed in this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants