fix: improve cancellation handling for shell and MCP flows#3288
Open
sunipan wants to merge 2 commits intotailcallhq:mainfrom
Open
fix: improve cancellation handling for shell and MCP flows#3288sunipan wants to merge 2 commits intotailcallhq:mainfrom
sunipan wants to merge 2 commits intotailcallhq:mainfrom
Conversation
Co-Authored-By: ForgeCode <noreply@forgecode.dev>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Improve cancellation handling for shell commands and MCP server initialization so Forge can recover more gracefully when a model/tool flow is interrupted mid-run.
Context
This PR was prepared with AI assistance by me after repeatedly hitting a bad cancellation state in ForgeCode.
The failure mode I was seeing locally:
CTRL+CnorCTRL+Dwould get me out.I am using
cmux, so that environment may be part of the trigger or may make the failure mode easier to reproduce. However, with these changes, I was still able to get the Python crash dialog, but ForgeCode itself appeared to keep functioning correctly after a mid-flow cancellation.This PR does not try to fix the upstream Python-side crash directly. Instead, it makes Forge’s process and MCP cleanup paths more resilient when cancellation interrupts an in-flight shell/MCP operation.
Changes
RunningCommandwrapper for shell command children so dropped/cancelled commands are cleaned up explicitly.SIGINTbefore escalating tokillafter a short grace period.kill_on_dropbehavior for Forge-managed shell commands so cancellation can follow the graceful cleanup path.kill_on_dropfor stdio MCP child commands so the MCP transport can attempt graceful shutdown.Key Implementation Details
Shell command cancellation is now managed by a wrapper that sends
SIGINTon Unix and then waits up to 3 seconds before killing the child process if it has not exited.MCP server loading now runs per-server initialization behind each server’s configured timeout. Failed or timed-out servers are collected into the
McpServersfailure map, while successful servers are still published.Reloading MCP clears the persisted cache and invalidates the config hash, but it does not eagerly wipe the in-memory tool registry. That keeps previously working tools available until the next initialization pass completes.
Use Cases
Testing
Ran the focused tests and checks locally:
Results:
forge_infraexecutor tests: 13 passedforge_servicesMCP service tests: 9 passedcargo check: passed for changed packagescargo clippy: passed for changed packagesNotes for Reviewers
This is an AI-assisted PR from a real local failure case. The cancellation trigger may involve my local
cmuxsetup, but the behavior this changes is within Forge’s shell/MCP cleanup path and should improve resilience generally.