Drain active_connections_ in McpServer::shutdown and cover the contract#218
Merged
Merged
Conversation
added 3 commits
April 20, 2026 10:56
#218) shutdown() previously flipped server_running_ = false on the caller thread and then posted a cleanup task that disabled listeners and cleared the connection-manager list -- but never cleared active_connections_ or lifecycle_callbacks_. Those survived into ~McpServer, whose destruction runs on whichever thread owns the server (typically the test thread or the app-main thread). Each ConnectionImpl destructor fires closeSocket(LocalClose), which forwards through the per-connection ConnectionLifecycleCallbacks adapter into McpServer::onConnectionLifecycleEvent -- which asserts it is running on the main dispatcher thread. The process then crashes at teardown even when the body of work completed cleanly. Fix: drain active_connections_ and lifecycle_callbacks_ inside the cleanup post, so each destruction fires on the dispatcher thread. Two details that make this safe: - server_running_ is already false by the time the post runs, so the container-unwind branch inside onConnectionLifecycleEvent is skipped. Otherwise each synchronous LocalClose would try to erase its own element from the vector we are actively clearing, invalidating iterators and possibly dropping adapters that are mid-callback. - Connections are cleared before their lifecycle adapters. The connection's callback list still references the adapter during the last closeSocket in ~ConnectionImpl; destroying the adapter first would be a use-after-free on that last callback.
Two real-IO integration tests against a loopback McpServer.
RepeatedConnectCloseCyclesStayStable drives three sequential
connect/close cycles against the listener. Each cycle relies on the
per-connection ConnectionLifecycleCallbacks adapter firing RemoteClose
on the dispatcher thread and erasing its entry from active_connections_
and lifecycle_callbacks_ via the deferred-delete queue. A fourth
connect after the three cycles confirms the listener is still
accepting.
ShutdownClosesLiveConnections is the sharp one. It holds a client TCP
connection open, calls McpServer::shutdown(), and asserts two observable
consequences of the drain path:
- The client's socket reads EOF. Only the dispatcher-thread drain of
active_connections_ could have closed the server-side fd during
shutdown; without the drain, the server fd would survive until
~McpServer and the client wouldn't see FIN until much later.
- server_.reset() returns normally. Pre-drain, destroying the server
with a live connection in active_connections_ fired
~ConnectionImpl -> closeSocket(LocalClose) on the test thread,
which hit the "onConnectionLifecycleEvent off dispatcher" assert
inside the adapter and crashed the process at teardown. Running
this test against the pre-fix revision of mcp_server.cc reproduces
the abort on the very first test; under the fix both tests pass.
A raw TCP client is used instead of a full McpClient because the
lifecycle adapter only observes transport-level ConnectionEvent values,
and keeping the client dumb isolates the test to the server-side
callback-routing and drain behavior.
The earlier revision of this test ended TearDown with server_.release() because the McpServer teardown bug -- active_connections_ surviving into ~McpServer on the caller thread -- would trip the dispatcher-thread assert inside onConnectionLifecycleEvent whenever a connection was still live at shutdown. The companion drain fix earlier in this branch clears active_connections_ and lifecycle_callbacks_ on the dispatcher thread inside the shutdown post, so by the time server_.reset() runs on the test thread the containers are empty and ~McpServer is a straightforward destruction. As a side effect, teardown no longer needs the 500ms safety-margin drain sleep either; the test completes in roughly a quarter of the earlier wall time.
c88e7c5 to
e7dd596
Compare
caleb2h
approved these changes
Apr 20, 2026
gophergogo
pushed a commit
that referenced
this pull request
Apr 20, 2026
#218) shutdown() previously flipped server_running_ = false on the caller thread and then posted a cleanup task that disabled listeners and cleared the connection-manager list -- but never cleared active_connections_ or lifecycle_callbacks_. Those survived into ~McpServer, whose destruction runs on whichever thread owns the server (typically the test thread or the app-main thread). Each ConnectionImpl destructor fires closeSocket(LocalClose), which forwards through the per-connection ConnectionLifecycleCallbacks adapter into McpServer::onConnectionLifecycleEvent -- which asserts it is running on the main dispatcher thread. The process then crashes at teardown even when the body of work completed cleanly. Fix: drain active_connections_ and lifecycle_callbacks_ inside the cleanup post, so each destruction fires on the dispatcher thread. Two details that make this safe: - server_running_ is already false by the time the post runs, so the container-unwind branch inside onConnectionLifecycleEvent is skipped. Otherwise each synchronous LocalClose would try to erase its own element from the vector we are actively clearing, invalidating iterators and possibly dropping adapters that are mid-callback. - Connections are cleared before their lifecycle adapters. The connection's callback list still references the adapter during the last closeSocket in ~ConnectionImpl; destroying the adapter first would be a use-after-free on that last callback.
gophergogo
pushed a commit
that referenced
this pull request
Apr 20, 2026
Two real-IO integration tests against a loopback McpServer.
RepeatedConnectCloseCyclesStayStable drives three sequential
connect/close cycles against the listener. Each cycle relies on the
per-connection ConnectionLifecycleCallbacks adapter firing RemoteClose
on the dispatcher thread and erasing its entry from active_connections_
and lifecycle_callbacks_ via the deferred-delete queue. A fourth
connect after the three cycles confirms the listener is still
accepting.
ShutdownClosesLiveConnections is the sharp one. It holds a client TCP
connection open, calls McpServer::shutdown(), and asserts two observable
consequences of the drain path:
- The client's socket reads EOF. Only the dispatcher-thread drain of
active_connections_ could have closed the server-side fd during
shutdown; without the drain, the server fd would survive until
~McpServer and the client wouldn't see FIN until much later.
- server_.reset() returns normally. Pre-drain, destroying the server
with a live connection in active_connections_ fired
~ConnectionImpl -> closeSocket(LocalClose) on the test thread,
which hit the "onConnectionLifecycleEvent off dispatcher" assert
inside the adapter and crashed the process at teardown. Running
this test against the pre-fix revision of mcp_server.cc reproduces
the abort on the very first test; under the fix both tests pass.
A raw TCP client is used instead of a full McpClient because the
lifecycle adapter only observes transport-level ConnectionEvent values,
and keeping the client dumb isolates the test to the server-side
callback-routing and drain behavior.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix the pre-existing
McpServerteardown bug and cover the lifecycle+drain contract end-to-end, then drop the leak workaround the previous integration test needed around it.Three commits, each independently reviewable:
McpServer::shutdowndrain fix (src/server/mcp_server.cc) —shutdown()now clearsactive_connections_andlifecycle_callbacks_inside the cleanup post on the dispatcher thread, so eachConnectionImpldestructor firescloseSocket(LocalClose)there rather than on whichever thread ends up running~McpServer. Two correctness details are documented in the commit message:server_running_is alreadyfalseby the time the post runs, so the container-unwind branch insideonConnectionLifecycleEventis skipped — otherwise each synchronousLocalClosewould try to erase its own element from the vector we're actively clearing.closeSocketin~ConnectionImpl; destroying the adapter first would be a use-after-free on that last callback.Integration test (
tests/integration/test_mcp_server_connection_lifecycle.cc) — two real-IO tests against a loopbackMcpServer:RepeatedConnectCloseCyclesStayStable— three sequential connect/close cycles, then verify the listener is still accepting.ShutdownClosesLiveConnections— hold a client TCP connection open, callshutdown(), and assert both that the client reads EOF (proving the drain actually closed the server-side fd) and thatserver_.reset()returns normally (proving the drain emptied the containers before~McpServer).I confirmed the second test reproduces the exact
onConnectionLifecycleEvent off dispatcherassert atmcp_server.cc:804against the pre-fixmcp_server.cc; under the fix both tests pass.Drop the leak workaround in the initialize-routing test — the earlier integration test deliberately
release()d the server because shutdown couldn't clean up connections. With the drain fix it can, so the workaround goes away. As a side benefit the 500 ms drain sleep also disappears, cutting the test's wall time from ~650 ms to ~140 ms.Test plan
ctest -R \"[Mm]cp|[Ss]erver|[Ss]se|[Hh]ttp\"— 38/38 passline 804assert, confirming the test is actually guarding the contract