Skip to content

fix(client): always force WS reconnect on iOS foreground return#368

Merged
dimakis merged 2 commits into
mainfrom
fix/ios-first-message-lost-v2
Jun 5, 2026
Merged

fix(client): always force WS reconnect on iOS foreground return#368
dimakis merged 2 commits into
mainfrom
fix/ios-first-message-lost-v2

Conversation

@dimakis
Copy link
Copy Markdown
Owner

@dimakis dimakis commented Jun 5, 2026

Summary

  • iOS kills WebSocket connections during background without updating readyState
  • checkAndReconnect() trusted readyState — if the socket appeared OPEN, it did nothing
  • First message sent on the dead socket was silently lost (regression)
  • Fix: always tear down the old socket and force a fresh connection on foreground return
  • Sends during the reconnect window queue in pendingSends and flush after the welcome handshake

Root cause

connection.ts:328-336 — the readyState !== OPEN guard:

// BEFORE (broken)
checkAndReconnect(): void {
    if (!this.ws || this.ws.readyState !== WS_READY_STATE.OPEN) {
        // only reconnects if socket is visibly dead
    }
    // if readyState appears OPEN → does NOTHING
}

iOS kills the TCP connection during background but the JS readyState property stays OPEN until a send/recv actually fails. The first user message fires into the void.

Test plan

  • Background Mitzo on iOS, wait 30+ seconds, return to foreground
  • Send a message — verify it reaches the agent on the first try
  • Verify rapid foreground/background cycling doesn't cause double connections (reconnectTimer guard)
  • Verify desktop browser tab switching still works (visibilitychange path)

Re-opened from #367 which was merged without Centaur reviews.

🤖 Generated with Claude Code

iOS kills WebSocket connections during background without updating
readyState. checkAndReconnect() previously trusted readyState, so if
the socket appeared OPEN the function did nothing — first message sent
on the dead socket was silently lost.

Remove the readyState guard: always tear down the old socket and create
a fresh connection. Sends during the reconnect window queue in
pendingSends and flush after the welcome handshake completes.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@dimakis dimakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

Found 3 issue(s) (3 warning).

packages/client/src/connection.ts

Correct fix for the iOS stale-socket problem, but orphans live sockets without closing them and forces unnecessary reconnect churn on desktop — both worth addressing.

  • 🟡 regressions (L337): On desktop browsers, every tab switch (visibilitychange → visible) now tears down a healthy WebSocket and creates a new one. The old code preserved live sockets. This adds reconnect latency, a brief disconnected flash, and extra server load on every foreground return — even when the socket is fine. The reconnect + seq replay machinery makes this safe from a correctness standpoint, but it's wasteful on non-iOS platforms. Consider guarding with a Capacitor/iOS platform check, or at minimum accepting this as a documented trade-off. [fixable]
  • 🟡 bugs (L339): defuseOldWs() nulls out handlers and drops the reference but never calls ws.close(). On desktop where the socket IS still alive, this orphans an open WebSocket — the server keeps the old connection object until TCP timeout. Previously this was rare (only triggered when readyState was already broken), but now it happens on every foreground transition. The orphaned socket should be explicitly closed before being abandoned to avoid server-side connection leaks. [fixable]
  • 🟡 missing_tests: MitzoConnection has zero test coverage — no test or spec files exist under packages/client/. This PR changes reconnect behavior on all platforms (not just iOS). Key scenarios that should be tested: (1) checkAndReconnect always calls doConnect even when ws.readyState is OPEN, (2) checkAndReconnect is a no-op when reconnectTimer is set, (3) sends during the reconnect window queue in pendingSends and flush after welcome, (4) defuseOldWs prevents stale onclose from clobbering the new socket. [fixable]

Comment thread packages/client/src/connection.ts Outdated
* sends that arrive during the reconnect window queue in pendingSends
* and flush after the welcome handshake completes.
*/
checkAndReconnect(): void {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 regressions: On desktop browsers, every tab switch (visibilitychange → visible) now tears down a healthy WebSocket and creates a new one. The old code preserved live sockets. This adds reconnect latency, a brief disconnected flash, and extra server load on every foreground return — even when the socket is fine. The reconnect + seq replay machinery makes this safe from a correctness standpoint, but it's wasteful on non-iOS platforms. Consider guarding with a Capacitor/iOS platform check, or at minimum accepting this as a documented trade-off. [fixable]

this.doConnect();
}
if (this.reconnectTimer) return;
this.defuseOldWs();
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 bugs: defuseOldWs() nulls out handlers and drops the reference but never calls ws.close(). On desktop where the socket IS still alive, this orphans an open WebSocket — the server keeps the old connection object until TCP timeout. Previously this was rare (only triggered when readyState was already broken), but now it happens on every foreground transition. The orphaned socket should be explicitly closed before being abandoned to avoid server-side connection leaks. [fixable]

- Add force parameter to checkAndReconnect(): iOS (via forceReconnect)
  always forces, desktop browser visibility events use default (false)
  which preserves healthy sockets
- Close old socket in defuseOldWs() to prevent server-side connection
  leak from orphaned live sockets
- Add 6 tests covering: force vs default behavior, reconnectTimer
  guard, pending send flush after forced reconnect, socket close

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@dimakis dimakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centaur Review

LGTM — no issues found.

@dimakis dimakis merged commit d6ab610 into main Jun 5, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant