Fix goroutine leak in reader on disconnect#37
Open
rradu92 wants to merge 1 commit intoivahaev:masterfrom
Open
Conversation
The inner goroutine in `(*amiAdapter).reader` blocks indefinitely on
`chanEvents <- event` (or `chanErr <- err`) when the outer reader exits
via `<-stop`, because nobody drains those channels after stop and the
inner goroutine has no way to observe stop directly. Each reconnect of
the AMI client therefore leaks one goroutine that holds the bufio
reader and the underlying socket forever.
This was reproducible in production: a long-running service with
Keepalive=true accumulated 100+ leaked reader goroutines visible in
`/debug/pprof/goroutine` after a few hours of intermittent network
issues, all parked on `chanEvents <- event` at ami.go:429.
Fix:
- Use buffered channels (size 1) for chanErr and chanEvents.
- Use select-on-stop in the inner goroutine when sending, so it can
bail out instead of blocking.
- Close the connection in a defer so any in-flight readMessage call
returns an error, then wait on innerDone to confirm the inner
goroutine has fully exited before reader returns.
Existing tests pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The inner goroutine in
(*amiAdapter).readerblocks indefinitely onchanEvents <- event(orchanErr <- err) when the outer reader exits via<-stop. Nobody drains those channels after stop, and the inner goroutine has no way to observe stop directly, so it parks forever holding the bufio reader and the underlying socket.Each AMI reconnect therefore leaks one goroutine.
Reproduction
A long-running service with
Keepalive: trueand intermittent network issues accumulated 100+ leaked reader goroutines visible in/debug/pprof/goroutine, all parked onchanEvents <- eventatami.go:429. In one production capture, 167 of 345 total goroutines (~48%) were stuck in this single stack — one per historical reconnect since process start.Fix
chanErrandchanEventswith size 1.select-on-stop in the inner goroutine when sending, so it can bail out instead of blocking when the outer reader has already exited.defer, close the connection so any in-flightreadMessagereturns an error, then wait oninnerDoneto confirm the inner goroutine has fully exited beforereaderreturns. This makes the cleanup deterministic for callers.The outer reader's behavior is unchanged for the happy path; only the disconnect/stop path is now leak-free.
Tests
Existing tests pass (
go test ./...).