feat: MCP tool error isolation — individual tool failures should not abort entire agent run #7851

Description

@tcconnally

Problem

When an AutoGen agent uses MCP tools (via the mcp-tools integration) and a single tool fails, the error propagates up and aborts the entire agent run. This is problematic because:

  1. One bad tool kills the session: If the agent calls 5 tools and the 3rd fails (e.g., an MCP server timeout or a tool-specific error), the remaining 2 tool calls are lost.
  2. No per-tool error reporting: The agent receives a generic "MCP tool call failed" error, with no structured information about which tool failed or why.
  3. No retry/fallback: There is no mechanism for the agent to retry the failed tool or fall back to an alternative.

In production multi-agent scenarios, individual tool failures should be surfaced as tool-level errors that the agent can reason about and recover from, not as fatal exceptions.
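The difference between a fatal exception and a tool-level error can be sketched in plain Python. This is only an illustration of the two behaviors described above, not AutoGen's actual dispatch code: `ToolOutcome`, `run_fail_fast`, `run_isolated`, and the tool names `t1` to `t5` are all hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

# Hypothetical stand-in for a structured tool result; not an AutoGen class.
@dataclass
class ToolOutcome:
    name: str
    content: Any
    is_error: bool = False

ToolFn = Callable[[], Awaitable[Any]]

async def run_fail_fast(calls: list[tuple[str, ToolFn]]) -> list[ToolOutcome]:
    """Behavior described in the problem: the first exception aborts the batch."""
    outcomes = []
    for name, fn in calls:
        outcomes.append(ToolOutcome(name, await fn()))
    return outcomes

async def run_isolated(calls: list[tuple[str, ToolFn]]) -> list[ToolOutcome]:
    """Requested behavior: a failure becomes one error entry and the batch continues."""
    outcomes = []
    for name, fn in calls:
        try:
            outcomes.append(ToolOutcome(name, await fn()))
        except Exception as exc:  # asyncio.CancelledError is not caught here
            outcomes.append(
                ToolOutcome(name, f"Tool {name} failed: {exc}", is_error=True)
            )
    return outcomes

def make_calls() -> list[tuple[str, ToolFn]]:
    """Five fake tool calls; the third one simulates an MCP server timeout."""
    async def ok() -> str:
        return "ok"

    async def timeout() -> str:
        raise TimeoutError("Connection timeout")

    return [("t1", ok), ("t2", ok), ("t3", timeout), ("t4", ok), ("t5", ok)]

if __name__ == "__main__":
    for outcome in asyncio.run(run_isolated(make_calls())):
        print(outcome)
```

With `run_fail_fast`, the `TimeoutError` from `t3` escapes and `t4` and `t5` never run. With `run_isolated`, all five calls produce an entry and only the third is marked as an error.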

Expected Behavior

  1. MCP tool errors should be caught and returned as a structured ToolResult with is_error=True rather than raised as exceptions
  2. The agent should see a message such as: "Tool search_docs failed: Connection timeout (MCP server at localhost:9000)"
  3. The agent run should continue with the remaining tool calls
  4. Optional: a configurable retry policy per tool/MCP server
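One possible shape for items 1, 2, and 4 is sketched below, assuming a simple exponential backoff. Nothing here is an existing AutoGen or MCP API: `RetryPolicy`, `ToolOutcome`, `call_with_policy`, and their fields are hypothetical names chosen for illustration, and the `server` string is passed in by the caller only to build the error message.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

# Hypothetical per-tool or per-server retry configuration.
@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3
    base_delay: float = 0.1  # seconds; doubled after each failed attempt

    def __post_init__(self) -> None:
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")

# Hypothetical structured result; mirrors the is_error flag requested above.
@dataclass
class ToolOutcome:
    name: str
    content: Any
    is_error: bool = False
    attempts: int = 1

async def call_with_policy(
    name: str,
    fn: Callable[[], Awaitable[Any]],
    server: str,
    policy: RetryPolicy = RetryPolicy(),
) -> ToolOutcome:
    """Call fn, retrying on failure; never raises for ordinary exceptions."""
    last_exc: Exception | None = None
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return ToolOutcome(name, await fn(), attempts=attempt)
        except Exception as exc:
            last_exc = exc
            if attempt < policy.max_attempts:
                # Back off before the next attempt: base, 2x base, 4x base, ...
                await asyncio.sleep(policy.base_delay * 2 ** (attempt - 1))
    message = f"Tool {name} failed: {last_exc} (MCP server at {server})"
    return ToolOutcome(name, message, is_error=True, attempts=policy.max_attempts)

if __name__ == "__main__":
    async def always_times_out() -> str:
        raise TimeoutError("Connection timeout")

    result = asyncio.run(
        call_with_policy(
            "search_docs", always_times_out, "localhost:9000",
            RetryPolicy(max_attempts=2, base_delay=0.0),
        )
    )
    print(result.content)
```

Returning the error as data rather than raising keeps the decision about what to do next (retry with other arguments, pick another tool, give up) with the agent, which is the recovery path the issue asks for.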

Current Workaround

Users must wrap every MCP tool call in try/except in their agent logic, which defeats the purpose of declarative tool registration.
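For reference, the workaround tends to look something like the following. This is a minimal sketch, assuming each tool is reachable as an async callable; `report_errors` and the `search_docs` body are hypothetical and stand in for whatever wrapper users write by hand.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable

def report_errors(
    fn: Callable[..., Awaitable[Any]],
) -> Callable[..., Awaitable[Any]]:
    """Hand-written guard: turn an exception into an error string for the agent."""
    @functools.wraps(fn)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return await fn(*args, **kwargs)
        except Exception as exc:
            return f"Tool {fn.__name__} failed: {exc}"
    return wrapper

# The guard has to be repeated for every tool the agent can call.
@report_errors
async def search_docs(query: str) -> str:
    # Hypothetical tool body that simulates an unreachable MCP server.
    raise ConnectionError("Connection timeout")

if __name__ == "__main__":
    print(asyncio.run(search_docs("error isolation")))
```

Because the wrapper returns a plain string, the agent cannot distinguish a failure from a successful result that happens to contain similar text, which is one reason a structured is_error flag is preferable.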

Related
