Skip to content

fix: prevent panic on send to closed channel in WebSocket.Disconnect()#43

Open
kramer65 wants to merge 1 commit into
krakenfx:mainfrom
kramer65:fix-race
Open

fix: prevent panic on send to closed channel in WebSocket.Disconnect()#43
kramer65 wants to merge 1 commit into
krakenfx:mainfrom
kramer65:fix-race

Conversation

@kramer65

Copy link
Copy Markdown

Problem

Disconnect() contains a race condition that causes a panic
(send on closed channel) in production under reconnect load.

The sequence that triggers it:

  1. Disconnect() creates an unbuffered done channel and registers a
    Recurring callback on OnDisconnected that sends true to done.
  2. defer close(done) is registered, so done is closed when
    Disconnect() returns.
  3. The OnDisconnected event can fire from the read() goroutine
    concurrently with (or after) the defer close(done) runs.
  4. When the callback tries done <- true on an already-closed channel:
    panic: send on closed channel.

This panic is not recoverable and crashes the entire process, taking
down any other goroutines or connections that were running concurrently.

Root cause

Two mistakes compound to cause the panic:

  • An unbuffered channel whose sender is not the goroutine that
    closes it. Any timing that lets the sender run after the closer
    is fatal.
  • defer close(done). Closing a channel you are still potentially
    writing to violates Go's rule that only the sender should close a
    channel.

Fix

Three small changes together make Disconnect() race-free:

  1. Buffered channel (capacity 1): the callback can always complete
    its send without blocking, regardless of when it fires relative to
    the function returning.

  2. Remove defer close(done): the channel only needs to be closed
    if something is ranging over it; here we only receive one value.
    Removing the close eliminates the hazardous write-after-close window.

  3. Non-blocking send in the callback (select/default): defensive
    guard in case the OnDisconnected event fires more than once (e.g.
    on a rapid reconnect cycle); the first send wins, subsequent calls
    are silently dropped instead of blocking.

How to reproduce (before fix)

Run the connector under any workload that triggers repeated reconnects
(network blip, exchange-side disconnect, explicit Close() during an
active connection). With high enough reconnect frequency the goroutine
scheduler will expose the window and the panic will occur within minutes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant