Skip to content

Fix goroutine leak in reader on disconnect#37

Open
rradu92 wants to merge 1 commit intoivahaev:masterfrom
rradu92:fix/reader-goroutine-leak
Open

Fix goroutine leak in reader on disconnect#37
rradu92 wants to merge 1 commit intoivahaev:masterfrom
rradu92:fix/reader-goroutine-leak

Conversation

@rradu92
Copy link
Copy Markdown

@rradu92 rradu92 commented May 4, 2026

Problem

The inner goroutine in (*amiAdapter).reader blocks indefinitely on chanEvents <- event (or chanErr <- err) when the outer reader exits via <-stop. Nobody drains those channels after stop, and the inner goroutine has no way to observe stop directly, so it parks forever holding the bufio reader and the underlying socket.

Each AMI reconnect therefore leaks one goroutine.

Reproduction

A long-running service with Keepalive: true and intermittent network issues accumulated 100+ leaked reader goroutines visible in /debug/pprof/goroutine, all parked on chanEvents <- event at ami.go:429. In one production capture, 167 of 345 total goroutines (~48%) were stuck in this single stack — one per historical reconnect since process start.

167 @ ...
#       github.com/ivahaev/amigo.(*amiAdapter).reader.func1+0x1ac      ami.go:429

Fix

  • Buffer chanErr and chanEvents with size 1.
  • Use select-on-stop in the inner goroutine when sending, so it can bail out instead of blocking when the outer reader has already exited.
  • In a defer, close the connection so any in-flight readMessage returns an error, then wait on innerDone to confirm the inner goroutine has fully exited before reader returns. This makes the cleanup deterministic for callers.

The outer reader's behavior is unchanged for the happy path; only the disconnect/stop path is now leak-free.

Tests

Existing tests pass (go test ./...).

The inner goroutine in `(*amiAdapter).reader` blocks indefinitely on
`chanEvents <- event` (or `chanErr <- err`) when the outer reader exits
via `<-stop`, because nobody drains those channels after stop and the
inner goroutine has no way to observe stop directly. Each reconnect of
the AMI client therefore leaks one goroutine that holds the bufio
reader and the underlying socket forever.

This was reproducible in production: a long-running service with
Keepalive=true accumulated 100+ leaked reader goroutines visible in
`/debug/pprof/goroutine` after a few hours of intermittent network
issues, all parked on `chanEvents <- event` at ami.go:429.

Fix:
  - Use buffered channels (size 1) for chanErr and chanEvents.
  - Use select-on-stop in the inner goroutine when sending, so it can
    bail out instead of blocking.
  - Close the connection in a defer so any in-flight readMessage call
    returns an error, then wait on innerDone to confirm the inner
    goroutine has fully exited before reader returns.

Existing tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant