fix(federation): drive filigree-mcp over newline JSON-RPC (was Content-Length) #78

Merged: tachyon-beep merged 1 commit into main from fix/filigree-mcp-newline-transport on Jun 28, 2026
@tachyon-beep (Collaborator)

Problem

crates/loomweave-federation/src/filigree.rs's MCP stdio client framed requests with Loomweave's Content-Length plugin framing (ADR-002), but filigree-mcp uses the official MCP Python SDK (mcp.server.stdio.stdio_server), whose stdio transport is newline-delimited JSON-RPC. Same bug class as the Warpline churn consumer (#77), found while reviewing that fix.

Verified empirically against the installed filigree-mcp:

  • newline-delimited initialize → clean result, exit 0
  • Content-Length-framed initialize → filigree-mcp emits an "Internal Server Error" notification; loomweave's Content-Length reader then can't parse filigree's newline responses → the call hangs.
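
To make the mismatch concrete, here is a minimal sketch of the two framings applied to the same message. The function names are hypothetical, and the header shape (`Content-Length: N` followed by a blank line) is an assumption modeled on LSP-style framing; ADR-002 is not quoted here.

```rust
/// Newline-delimited framing: one compact JSON message per line.
/// Assumes `body` is already compact JSON with no raw newlines inside it.
fn frame_newline(body: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(body.len() + 1);
    out.extend_from_slice(body.as_bytes());
    out.push(b'\n');
    out
}

/// Header-based framing (assumed LSP-style): byte length, blank line, body.
/// Note there is no trailing newline after the body.
fn frame_content_length(body: &str) -> Vec<u8> {
    let mut out = format!("Content-Length: {}\r\n\r\n", body.len()).into_bytes();
    out.extend_from_slice(body.as_bytes());
    out
}
```

A line-oriented reader that receives the second form sees `Content-Length: ...` as its first "message", which is not JSON. That is consistent with the error notification observed above.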

Blast radius (narrow): only the stdio observation seam — create_observation (propose_guidance) and dismiss_observation (guidance promotion). The main filigree read path (issues_for / entity-associations) is HTTP and was unaffected. Tracked as clarion-a5bfcf5ef9.

Fix (mirrors the warpline transport fix in #77)

  • write_mcp_json / read_mcp_json → newline framing: one compact JSON line + \n; responses read line-by-line, skipping non-matching ids (the init result, the notification's id: null error); EOF-before-match surfaced as an error.
  • Extracted run_mcp_tool_over_command(program, args, root, timeout, tool, args): the handshake+call runs on a worker thread bounded by recv_timeout + kill, so a hung filigree-mcp degrades instead of blocking forever (FILIGREE_MCP_TIMEOUT, 10s). stderr → Stdio::null so a large traceback can't block the child. The resolved command is a parameter, so the transport is unit-testable with an injected fake newline server (no env mutation: set_var is unsafe under edition 2024 + unsafe_code = deny).
  • Last-resort launcher fallback ("filigree", ["mcp"]) → filigree-mcp (the real binary; filigree mcp is not a valid subcommand). The happy path still resolves python -m filigree.mcp_server via filigree mcp-status.
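
The worker-thread-plus-deadline shape from the second bullet can be sketched as below. This is an illustration of the pattern under stated assumptions, not the code in filigree.rs: `run_bounded` and the `exchange` closure are hypothetical names, and the real function takes the tool name and arguments rather than a closure.

```rust
use std::io::{self, BufReader};
use std::process::{ChildStdin, ChildStdout, Command, Stdio};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Spawn `cmd`, run `exchange` against its pipes on a worker thread, and give
/// up after `timeout`. The child is always killed and reaped before returning.
fn run_bounded<T, F>(mut cmd: Command, timeout: Duration, exchange: F) -> io::Result<T>
where
    T: Send + 'static,
    F: FnOnce(ChildStdin, BufReader<ChildStdout>) -> io::Result<T> + Send + 'static,
{
    let mut child = cmd
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        // Discard stderr so a large traceback cannot fill a pipe and stall the child.
        .stderr(Stdio::null())
        .spawn()?;
    let stdin = child.stdin.take().expect("stdin was piped");
    let stdout = BufReader::new(child.stdout.take().expect("stdout was piped"));

    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(exchange(stdin, stdout));
    });

    let outcome = match rx.recv_timeout(timeout) {
        Ok(result) => result,
        // Covers both the deadline passing and the worker dying without sending.
        Err(_) => Err(io::Error::new(
            io::ErrorKind::TimedOut,
            "child did not answer before the deadline",
        )),
    };

    // Killing the child closes its stdout, which unblocks a worker stuck in a read.
    let _ = child.kill();
    let _ = child.wait();
    outcome
}
```

Keeping the `Child` handle on the calling thread and moving only the pipes into the worker is what lets the caller kill the process while the worker is still blocked on a read.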

Tests / validation

TDD: newline-framing helper round-trip + EOF-error, fallback-command guard, and a real-subprocess happy-path + timeout-not-hang test driving a fake newline server.

  • 131 federation tests pass (+6 new); fmt + clippy (federation/mcp/cli, -D warnings) + cargo doc -D warnings clean.
  • Live-probed against the real filigree-mcp on /home/john/lacuna: newline-framed initialize + tools/call round-trip cleanly; Content-Length-framed requests error out (the bug).
  • Confirmed this was the last Content-Length stdio client in loomweave-federation (the warpline fix in #77 + this), so the bug class is closed.
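
For readers who want to see what the round-trip and EOF-error cases exercise, here is a minimal standard-library sketch of newline-framed helpers. The names `write_line` and `read_matching_line` are hypothetical, and matching is delegated to a caller-supplied predicate as a stand-in; the real helpers presumably parse the JSON and compare ids.

```rust
use std::io::{self, BufRead, Write};

/// Write one compact JSON message followed by a newline, then flush.
fn write_line<W: Write>(writer: &mut W, compact_json: &str) -> io::Result<()> {
    writer.write_all(compact_json.as_bytes())?;
    writer.write_all(b"\n")?;
    writer.flush()
}

/// Read lines until `is_match` accepts one, skipping the rest
/// (for example an earlier result or an `id: null` error).
/// Reaching EOF first is reported as an error rather than a silent hang.
fn read_matching_line<R: BufRead>(
    reader: &mut R,
    is_match: impl Fn(&str) -> bool,
) -> io::Result<String> {
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "stream closed before a matching response",
            ));
        }
        let trimmed = line.trim_end();
        if is_match(trimmed) {
            return Ok(trimmed.to_owned());
        }
    }
}
```

Both helpers are generic over `Write` and `BufRead`, so they can be tested against an in-memory buffer such as `std::io::Cursor` without spawning a process.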

Residual (pre-existing, not bounded here)

resolve_filigree_mcp_command runs filigree mcp-status --json via a plain blocking .output() before the timeout-bounded section, so a hung mcp-status is an unbounded wait outside the new deadline. Short-lived in practice; bounding it is a follow-up.

Independent of #77 (different file); merges cleanly to main.

🤖 Generated with Claude Code

…t-Length)

filigree.rs's MCP stdio client (the observation_create / observation_dismiss
path, used by propose_guidance and guidance promotion) framed requests with
Loomweave's Content-Length plugin framing, but filigree-mcp uses the official MCP
Python SDK (`mcp.server.stdio.stdio_server`), whose stdio transport is
NEWLINE-delimited JSON-RPC. Same bug class as the Warpline churn consumer (#77).

Verified empirically against the installed filigree-mcp: a newline-delimited
initialize gets a clean result; a Content-Length-framed one makes filigree-mcp
emit an "Internal Server Error" notification, after which loomweave's
Content-Length reader cannot parse filigree's newline responses and the call
hangs. The HTTP read path (issues_for / entity-associations) was unaffected —
only the stdio observation seam was broken.

Fix (mirrors the warpline transport fix):
- write_mcp_json / read_mcp_json now use newline framing: one compact JSON line +
  \n; responses read line-by-line, skipping non-matching ids (the init result and
  the notification's id:null error), EOF-before-match surfaced as an error.
- Extracted run_mcp_tool_over_command(program, args, root, timeout, tool, args):
  the handshake+call runs on a worker thread bounded by recv_timeout + kill, so a
  hung filigree-mcp degrades instead of blocking forever (FILIGREE_MCP_TIMEOUT,
  10s). stderr -> Stdio::null so a large traceback can't block the child. The
  resolved command is a parameter, so the transport is unit-testable with an
  injected fake newline server (no env mutation — set_var is unsafe under
  edition 2024 + unsafe_code=deny).
- Last-resort launcher fallback `("filigree", ["mcp"])` -> `filigree-mcp` (the
  real binary); `filigree mcp` is not a valid subcommand. The happy path still
  resolves `python -m filigree.mcp_server` via `filigree mcp-status`.

TDD: newline-framing helper round-trip + EOF-error, fallback-command guard, and
a real-subprocess happy-path + timeout-not-hang test driving a fake newline
server. 131 federation tests pass; fmt + clippy (federation/mcp/cli, -D warnings)
+ cargo doc clean. Live-probed against the real filigree-mcp on /home/john/lacuna
(newline initialize+tools/call round-trips; Content-Length errors).

Residual (pre-existing, NOT bounded here): resolve_filigree_mcp_command runs
`filigree mcp-status --json` via a plain blocking .output() before the
timeout-bounded section, so a hung mcp-status is an unbounded wait outside the
new deadline. Short-lived in practice; bounding it is a follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tachyon-beep tachyon-beep merged commit b5aabe8 into main Jun 28, 2026
4 checks passed
tachyon-beep added a commit that referenced this pull request Jun 28, 2026
@tachyon-beep tachyon-beep deleted the fix/filigree-mcp-newline-transport branch June 28, 2026 12:20