Drain active_connections_ in McpServer::shutdown and cover the contract by gophergogo · Pull Request #218 · GopherSecurity/gopher-mcp

gophergogo · 2026-04-20T17:48:34Z

Summary

Fix the pre-existing McpServer teardown bug and cover the lifecycle+drain contract end-to-end, then drop the leak workaround the previous integration test needed around it.

Three commits, each independently reviewable:

McpServer::shutdown drain fix (src/server/mcp_server.cc) — shutdown() now clears active_connections_ and lifecycle_callbacks_ inside the cleanup post on the dispatcher thread, so each ConnectionImpl destructor fires closeSocket(LocalClose) there rather than on whichever thread ends up running ~McpServer. Two correctness details are documented in the commit message:
- server_running_ is already false by the time the post runs, so the container-unwind branch inside onConnectionLifecycleEvent is skipped — otherwise each synchronous LocalClose would try to erase its own element from the vector we're actively clearing.
- Connections are cleared before their lifecycle adapters. The connection's callback list still references the adapter during the final closeSocket in ~ConnectionImpl; destroying the adapter first would be a use-after-free on that last callback.
Integration test (tests/integration/test_mcp_server_connection_lifecycle.cc) — two real-IO tests against a loopback McpServer:
- RepeatedConnectCloseCyclesStayStable — three sequential connect/close cycles, then verify the listener is still accepting.
- ShutdownClosesLiveConnections — hold a client TCP connection open, call shutdown(), and assert both that the client reads EOF (proving the drain actually closed the server-side fd) and that server_.reset() returns normally (proving the drain emptied the containers before ~McpServer).
I confirmed the second test reproduces the exact onConnectionLifecycleEvent off dispatcher assert at mcp_server.cc:804 against the pre-fix mcp_server.cc; under the fix both tests pass.
Drop the leak workaround in the initialize-routing test — the earlier integration test deliberately release()d the server because shutdown couldn't clean up connections. With the drain fix it can, so the workaround goes away. As a side benefit the 500 ms drain sleep also disappears, cutting the test's wall time from ~650 ms to ~140 ms.

Test plan

ctest -R \"[Mm]cp|[Ss]erver|[Ss]se|[Hh]ttp\" — 38/38 pass
New lifecycle test passes 5× in a row (~270 ms each)
Reverted the fix locally and re-ran the new lifecycle test — reproduces the pre-fix line 804 assert, confirming the test is actually guarding the contract

#218) shutdown() previously flipped server_running_ = false on the caller thread and then posted a cleanup task that disabled listeners and cleared the connection-manager list -- but never cleared active_connections_ or lifecycle_callbacks_. Those survived into ~McpServer, whose destruction runs on whichever thread owns the server (typically the test thread or the app-main thread). Each ConnectionImpl destructor fires closeSocket(LocalClose), which forwards through the per-connection ConnectionLifecycleCallbacks adapter into McpServer::onConnectionLifecycleEvent -- which asserts it is running on the main dispatcher thread. The process then crashes at teardown even when the body of work completed cleanly. Fix: drain active_connections_ and lifecycle_callbacks_ inside the cleanup post, so each destruction fires on the dispatcher thread. Two details that make this safe: - server_running_ is already false by the time the post runs, so the container-unwind branch inside onConnectionLifecycleEvent is skipped. Otherwise each synchronous LocalClose would try to erase its own element from the vector we are actively clearing, invalidating iterators and possibly dropping adapters that are mid-callback. - Connections are cleared before their lifecycle adapters. The connection's callback list still references the adapter during the last closeSocket in ~ConnectionImpl; destroying the adapter first would be a use-after-free on that last callback.

Two real-IO integration tests against a loopback McpServer. RepeatedConnectCloseCyclesStayStable drives three sequential connect/close cycles against the listener. Each cycle relies on the per-connection ConnectionLifecycleCallbacks adapter firing RemoteClose on the dispatcher thread and erasing its entry from active_connections_ and lifecycle_callbacks_ via the deferred-delete queue. A fourth connect after the three cycles confirms the listener is still accepting. ShutdownClosesLiveConnections is the sharp one. It holds a client TCP connection open, calls McpServer::shutdown(), and asserts two observable consequences of the drain path: - The client's socket reads EOF. Only the dispatcher-thread drain of active_connections_ could have closed the server-side fd during shutdown; without the drain, the server fd would survive until ~McpServer and the client wouldn't see FIN until much later. - server_.reset() returns normally. Pre-drain, destroying the server with a live connection in active_connections_ fired ~ConnectionImpl -> closeSocket(LocalClose) on the test thread, which hit the "onConnectionLifecycleEvent off dispatcher" assert inside the adapter and crashed the process at teardown. Running this test against the pre-fix revision of mcp_server.cc reproduces the abort on the very first test; under the fix both tests pass. A raw TCP client is used instead of a full McpClient because the lifecycle adapter only observes transport-level ConnectionEvent values, and keeping the client dumb isolates the test to the server-side callback-routing and drain behavior.

The earlier revision of this test ended TearDown with server_.release() because the McpServer teardown bug -- active_connections_ surviving into ~McpServer on the caller thread -- would trip the dispatcher-thread assert inside onConnectionLifecycleEvent whenever a connection was still live at shutdown. The companion drain fix earlier in this branch clears active_connections_ and lifecycle_callbacks_ on the dispatcher thread inside the shutdown post, so by the time server_.reset() runs on the test thread the containers are empty and ~McpServer is a straightforward destruction. As a side effect, teardown no longer needs the 500ms safety-margin drain sleep either; the test completes in roughly a quarter of the earlier wall time.

#218) shutdown() previously flipped server_running_ = false on the caller thread and then posted a cleanup task that disabled listeners and cleared the connection-manager list -- but never cleared active_connections_ or lifecycle_callbacks_. Those survived into ~McpServer, whose destruction runs on whichever thread owns the server (typically the test thread or the app-main thread). Each ConnectionImpl destructor fires closeSocket(LocalClose), which forwards through the per-connection ConnectionLifecycleCallbacks adapter into McpServer::onConnectionLifecycleEvent -- which asserts it is running on the main dispatcher thread. The process then crashes at teardown even when the body of work completed cleanly. Fix: drain active_connections_ and lifecycle_callbacks_ inside the cleanup post, so each destruction fires on the dispatcher thread. Two details that make this safe: - server_running_ is already false by the time the post runs, so the container-unwind branch inside onConnectionLifecycleEvent is skipped. Otherwise each synchronous LocalClose would try to erase its own element from the vector we are actively clearing, invalidating iterators and possibly dropping adapters that are mid-callback. - Connections are cleared before their lifecycle adapters. The connection's callback list still references the adapter during the last closeSocket in ~ConnectionImpl; destroying the adapter first would be a use-after-free on that last callback.

Two real-IO integration tests against a loopback McpServer. RepeatedConnectCloseCyclesStayStable drives three sequential connect/close cycles against the listener. Each cycle relies on the per-connection ConnectionLifecycleCallbacks adapter firing RemoteClose on the dispatcher thread and erasing its entry from active_connections_ and lifecycle_callbacks_ via the deferred-delete queue. A fourth connect after the three cycles confirms the listener is still accepting. ShutdownClosesLiveConnections is the sharp one. It holds a client TCP connection open, calls McpServer::shutdown(), and asserts two observable consequences of the drain path: - The client's socket reads EOF. Only the dispatcher-thread drain of active_connections_ could have closed the server-side fd during shutdown; without the drain, the server fd would survive until ~McpServer and the client wouldn't see FIN until much later. - server_.reset() returns normally. Pre-drain, destroying the server with a live connection in active_connections_ fired ~ConnectionImpl -> closeSocket(LocalClose) on the test thread, which hit the "onConnectionLifecycleEvent off dispatcher" assert inside the adapter and crashed the process at teardown. Running this test against the pre-fix revision of mcp_server.cc reproduces the abort on the very first test; under the fix both tests pass. A raw TCP client is used instead of a full McpClient because the lifecycle adapter only observes transport-level ConnectionEvent values, and keeping the client dumb isolates the test to the server-side callback-routing and drain behavior.

gophergogo force-pushed the main branch from 2fbb883 to 7c3ba00 Compare April 20, 2026 17:53

gophergogo added 3 commits April 20, 2026 10:56

gophergogo force-pushed the feature/mcp-server-lifecycle-shutdown-drain branch from c88e7c5 to e7dd596 Compare April 20, 2026 17:57

caleb2h approved these changes Apr 20, 2026

View reviewed changes

gophergogo merged commit af8ca80 into main Apr 20, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Drain active_connections_ in McpServer::shutdown and cover the contract#218

Drain active_connections_ in McpServer::shutdown and cover the contract#218
gophergogo merged 3 commits into
mainfrom
feature/mcp-server-lifecycle-shutdown-drain

gophergogo commented Apr 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gophergogo commented Apr 20, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants