Problem
When an AutoGen agent uses MCP tools (via the mcp-tools integration) and a single tool fails, the error propagates up and aborts the entire agent run. This is problematic because:
- One bad tool kills the session: If the agent calls 5 tools and the 3rd fails (e.g., MCP server timeout, tool-specific error), the remaining 2 tool calls are lost
- No per-tool error reporting: The agent receives a generic "MCP tool call failed" error, not structured error information about which tool failed and why
- No retry/fallback: There's no mechanism for the agent to retry the failed tool or use an alternative
In production multi-agent scenarios, individual tool failures should be surfaced as tool-level errors that the agent can reason about and recover from, not as fatal exceptions.
Expected Behavior
- MCP tool errors should be caught and returned as a structured ToolResult with is_error=True rather than raised as exceptions
- The agent should see: "Tool search_docs failed: Connection timeout (MCP server at localhost:9000)"
- The agent run should continue with remaining tool calls
- Optional: configurable retry policy per tool/MCP server
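A minimal sketch of the behavior requested above, written without any AutoGen or MCP imports so it runs on its own. Everything here is an assumption for illustration: the ToolResult dataclass is a hypothetical stand-in rather than AutoGen's actual type, and call_tool, run_tools, and max_attempts are made-up names. It only shows the shape of the idea: a failing call becomes an error result, optionally after retries, and the remaining calls still run.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class ToolResult:
    """Hypothetical stand-in for a structured tool result (not AutoGen's type)."""
    tool_name: str
    content: str
    is_error: bool = False


async def call_tool(
    name: str,
    tool: Callable[[], Awaitable[Any]],
    server: str,
    max_attempts: int = 1,
) -> ToolResult:
    """Run one tool, retrying up to max_attempts times; tool failures never raise."""
    last_error: Optional[Exception] = None
    for _ in range(max(1, max_attempts)):
        try:
            return ToolResult(name, str(await tool()))
        except Exception as exc:  # broad on purpose: any tool failure becomes data
            last_error = exc
    message = f"Tool {name} failed: {last_error} (MCP server at {server})"
    return ToolResult(name, message, is_error=True)


async def run_tools() -> list[ToolResult]:
    """Call three fake tools in order; the second one always times out."""
    async def ok() -> str:
        return "ok"

    async def timeout() -> str:
        raise TimeoutError("Connection timeout")

    calls = [("list_files", ok), ("search_docs", timeout), ("read_file", ok)]
    return [await call_tool(n, t, server="localhost:9000") for n, t in calls]


if __name__ == "__main__":
    for result in asyncio.run(run_tools()):
        print(result)
```

With this shape, the failure of search_docs is visible to the agent as an ordinary result with is_error=True, and read_file still executes afterwards.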
Current Workaround
Users must wrap every MCP tool call in try/except in their agent logic, which defeats the purpose of declarative tool registration.
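For illustration, the manual wrapping described above tends to look something like the following. This is a sketch under assumptions, not code from AutoGen: errors_as_text is a hypothetical helper, and search_docs is a fake tool that always fails in place of a real MCP call.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable


def errors_as_text(fn: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
    """Hypothetical helper: return a failure message instead of raising."""
    @functools.wraps(fn)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return await fn(*args, **kwargs)
        except Exception as exc:
            # The failure is flattened to a string; no is_error flag survives.
            return f"Tool {fn.__name__} failed: {exc}"
    return wrapper


@errors_as_text
async def search_docs(query: str) -> str:
    # Stand-in for a real MCP tool call; always fails in this sketch.
    raise TimeoutError("Connection timeout")


if __name__ == "__main__":
    print(asyncio.run(search_docs("retry policy")))
```

Every tool has to be wrapped by hand this way, and the agent can only tell success from failure by parsing the returned text.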
Related
autogen-ext[mcp]