fix(federation): drive filigree-mcp over newline JSON-RPC (was Content-Length) #78
Merged
…t-Length)

filigree.rs's MCP stdio client (the observation_create / observation_dismiss path, used by propose_guidance and guidance promotion) framed requests with Loomweave's Content-Length plugin framing, but filigree-mcp uses the official MCP Python SDK (`mcp.server.stdio.stdio_server`), whose stdio transport is NEWLINE-delimited JSON-RPC. Same bug class as the Warpline churn consumer (#77).

Verified empirically against the installed filigree-mcp: a newline-delimited initialize gets a clean result; a Content-Length-framed one makes filigree-mcp emit an "Internal Server Error" notification, after which loomweave's Content-Length reader cannot parse filigree's newline responses and the call hangs. The HTTP read path (issues_for / entity-associations) was unaffected; only the stdio observation seam was broken.

Fix (mirrors the warpline transport fix):
- write_mcp_json / read_mcp_json now use newline framing: one compact JSON line + \n; responses read line-by-line, skipping non-matching ids (the init result and the notification's id:null error), EOF-before-match surfaced as an error.
- Extracted run_mcp_tool_over_command(program, args, root, timeout, tool, args): the handshake+call runs on a worker thread bounded by recv_timeout + kill, so a hung filigree-mcp degrades instead of blocking forever (FILIGREE_MCP_TIMEOUT, 10s). stderr -> Stdio::null so a large traceback can't block the child. The resolved command is a parameter, so the transport is unit-testable with an injected fake newline server (no env mutation: set_var is unsafe under edition 2024 + unsafe_code=deny).
- Last-resort launcher fallback `("filigree", ["mcp"])` -> `filigree-mcp` (the real binary); `filigree mcp` is not a valid subcommand. The happy path still resolves `python -m filigree.mcp_server` via `filigree mcp-status`.

TDD: newline-framing helper round-trip + EOF-error, fallback-command guard, and a real-subprocess happy-path + timeout-not-hang test driving a fake newline server.

131 federation tests pass; fmt + clippy (federation/mcp/cli, -D warnings) + cargo doc clean. Live-probed against the real filigree-mcp on /home/john/lacuna (newline initialize+tools/call round-trips; Content-Length errors).

Residual (pre-existing, NOT bounded here): resolve_filigree_mcp_command runs `filigree mcp-status --json` via a plain blocking .output() before the timeout-bounded section, so a hung mcp-status is an unbounded wait outside the new deadline. Short-lived in practice; bounding it is a follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
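The newline framing described in the commit message can be illustrated with a small std-only sketch. This is not the code in filigree.rs: the names `write_line_framed`, `read_response_for`, and `has_id` are hypothetical, and `has_id` is a crude textual stand-in for real JSON parsing (a production client would presumably parse each line with a JSON library and compare the `id` field).

```rust
use std::io::{self, BufRead, Write};

/// Write one already-serialized JSON-RPC message as a single line plus '\n'.
/// Hypothetical helper; rejects input that would break the one-line framing.
fn write_line_framed<W: Write>(out: &mut W, compact_json: &str) -> io::Result<()> {
    if compact_json.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "message must be a single line",
        ));
    }
    out.write_all(compact_json.as_bytes())?;
    out.write_all(b"\n")?;
    out.flush()
}

/// Crude stand-in for JSON parsing (assumption for this sketch only):
/// true if the line contains `"id":<want>` not followed by another digit.
fn has_id(line: &str, want: u64) -> bool {
    let compact: String = line.chars().filter(|c| !c.is_whitespace()).collect();
    let needle = format!("\"id\":{want}");
    compact.match_indices(&needle).any(|(i, _)| {
        compact[i + needle.len()..]
            .chars()
            .next()
            .map_or(true, |c| !c.is_ascii_digit())
    })
}

/// Read newline-delimited messages until one carries the wanted id.
/// Lines with other ids (or `id: null`) are skipped; EOF is an error.
fn read_response_for<R: BufRead>(input: &mut R, want: u64) -> io::Result<String> {
    let mut line = String::new();
    loop {
        line.clear();
        if input.read_line(&mut line)? == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                format!("EOF before a response with id {want}"),
            ));
        }
        let trimmed = line.trim();
        if !trimmed.is_empty() && has_id(trimmed, want) {
            return Ok(trimmed.to_owned());
        }
    }
}
```

Fed a stream holding an `id: 1` init result, an `id: null` error notification, and an `id: 2` result, `read_response_for(&mut reader, 2)` skips the first two lines and returns the third; on a stream that ends first it returns an `UnexpectedEof` error instead of blocking.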
tachyon-beep added a commit that referenced this pull request Jun 28, 2026
…, warpline #77 in flight); PDR-0006
Problem
`crates/loomweave-federation/src/filigree.rs`'s MCP stdio client framed requests with Loomweave's Content-Length plugin framing (ADR-002), but `filigree-mcp` uses the official MCP Python SDK (`mcp.server.stdio.stdio_server`), whose stdio transport is newline-delimited JSON-RPC. Same bug class as the Warpline churn consumer (#77), found while reviewing that fix.

Verified empirically against the installed `filigree-mcp`:
- Newline-delimited `initialize` → clean result, exit 0.
- Content-Length-framed `initialize` → filigree-mcp emits an "Internal Server Error" notification; loomweave's Content-Length reader then can't parse filigree's newline responses → the call hangs.

Blast radius (narrow): only the stdio observation seam, i.e. `create_observation` (propose_guidance) and `dismiss_observation` (guidance promotion). The main filigree read path (`issues_for` / entity-associations) is HTTP and was unaffected. Tracked as `clarion-a5bfcf5ef9`.

Fix (mirrors the warpline transport fix in #77)
- `write_mcp_json` / `read_mcp_json` → newline framing: one compact JSON line + `\n`; responses read line-by-line, skipping non-matching ids (the init result, the notification's `id: null` error); EOF-before-match surfaced as an error.
- Extracted `run_mcp_tool_over_command(program, args, root, timeout, tool, args)`: the handshake+call runs on a worker thread bounded by `recv_timeout` + kill, so a hung `filigree-mcp` degrades instead of blocking forever (`FILIGREE_MCP_TIMEOUT`, 10s). `stderr → Stdio::null`. The resolved command is a parameter, so the transport is unit-testable with an injected fake newline server (no env mutation: `set_var` is unsafe under edition 2024 + `unsafe_code = deny`).
- Last-resort launcher fallback `("filigree", ["mcp"])` → `filigree-mcp` (the real binary; `filigree mcp` is not a valid subcommand). The happy path still resolves `python -m filigree.mcp_server` via `filigree mcp-status`.

Tests / validation
- TDD: newline-framing helper round-trip + EOF-error, fallback-command guard, and a real-subprocess happy-path + timeout-not-hang test driving a fake newline server.
- `fmt` + `clippy` (federation/mcp/cli, `-D warnings`) + `cargo doc -D warnings` clean.
- Live-probed against the real `filigree-mcp` on `/home/john/lacuna`: newline `initialize` + `tools/call` round-trip cleanly; Content-Length errors (the bug).
- `loomweave-federation` (warpline "fix(churn): speak warpline-mcp's newline JSON-RPC + honest paging/keying disclosure" #77 + this): bug class closed.

Residual (pre-existing, not bounded here)
`resolve_filigree_mcp_command` runs `filigree mcp-status --json` via a plain blocking `.output()` before the timeout-bounded section, so a hung `mcp-status` is an unbounded wait outside the new deadline. Short-lived in practice; bounding it is a follow-up.

Independent of #77 (different file); merges cleanly to `main`.

🤖 Generated with Claude Code
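The "worker thread bounded by `recv_timeout` + kill" pattern from the Fix section can be sketched roughly as below. This is an illustration under stated assumptions, not the PR's `run_mcp_tool_over_command`: `run_bounded` and `exchange` are hypothetical names, the exchange is reduced to one request line and one response line (no MCP handshake), and the usage note assumes a Unix-like system where `cat` and `sleep` are on `PATH`.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::process::{ChildStdin, ChildStdout, Command, Stdio};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// One newline-framed exchange: send a line, read a line back.
fn exchange(mut stdin: ChildStdin, stdout: ChildStdout, request: &str) -> io::Result<String> {
    stdin.write_all(request.as_bytes())?;
    stdin.write_all(b"\n")?;
    stdin.flush()?;
    let mut line = String::new();
    if BufReader::new(stdout).read_line(&mut line)? == 0 {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "EOF before response"));
    }
    Ok(line.trim_end().to_owned())
}

/// Run the exchange on a worker thread and wait at most `timeout`.
/// The child is always killed and reaped afterwards, so a hung server
/// yields an error instead of blocking the caller forever.
fn run_bounded(
    program: &str,
    args: &[&str],
    request: &str,
    timeout: Duration,
) -> Result<String, String> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::null()) // unread stderr output cannot fill a pipe and stall the child
        .spawn()
        .map_err(|e| format!("spawn failed: {e}"))?;
    let stdin = child.stdin.take().ok_or("child has no stdin")?;
    let stdout = child.stdout.take().ok_or("child has no stdout")?;
    let request = request.to_owned();

    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore send errors.
        let _ = tx.send(exchange(stdin, stdout, &request));
    });

    let outcome = match rx.recv_timeout(timeout) {
        Ok(Ok(line)) => Ok(line),
        Ok(Err(e)) => Err(format!("io error: {e}")),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(format!("timed out after {timeout:?}")),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err("worker exited without a result".into()),
    };
    // Killing the child closes its pipes, which unblocks a worker stuck in read_line.
    let _ = child.kill();
    let _ = child.wait();
    outcome
}
```

Because the program is a parameter, a test can inject a stand-in server: `run_bounded("cat", &[], request, Duration::from_secs(5))` echoes the request line back, while `run_bounded("sleep", &["30"], "{}", Duration::from_millis(300))` returns a timeout error after roughly 300 ms rather than hanging.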