fix: graceful TCP close to stop RST on early error responses (#185) #389

Merged
emil916 merged 2 commits into master from fix/issue-185-graceful-close on Jun 22, 2026
@bushidocodes bushidocodes commented Jun 16, 2026

Closes #185

Summary

Fixes the root cause behind #185: hey logging Unsolicited response received on idle HTTP channel starting with "HTTP/1.1 503 ...".

SLEdge often answers a request before it has finished reading it: the short-circuit error responses (400/404/429/500/503) in the listener thread write the response and close() the socket while the client may still be sending its request body. tcp_session_close() did a bare close(), and on Linux a close() with unread data in the kernel receive buffer discards it and emits a TCP RST instead of a graceful FIN. The client surfaces that RST as connection reset by peer or, on a pooled keepalive connection, as the exact Unsolicited response received on idle HTTP channel warning in the issue. This matches the maintainers' own hypothesis in the issue thread ("we write a response before reading the entire HTTP request").
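
The condition described above (unread bytes still queued when close() runs) can be observed directly. The following is a minimal sketch, assuming Linux; pending_inbound_bytes is a hypothetical helper for illustration and is not part of SLEdge:

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical diagnostic helper (not SLEdge code).
 * Returns the number of bytes currently queued in the kernel receive
 * buffer for fd, or -1 if the ioctl fails.
 */
static int
pending_inbound_bytes(int fd)
{
    int pending = 0;

    if (ioctl(fd, FIONREAD, &pending) < 0) return -1;
    return pending;
}
```

A nonzero result immediately before close() is the situation in which Linux discards the queued data and sends RST rather than FIN.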

Change

tcp_session_close() now closes gracefully:

  1. shutdown(SHUT_WR) — send our FIN so the client (now holding a complete response) stops sending.
  2. Drain already-buffered inbound data with a non-blocking, bounded loop (32 × 8 KiB) so close() never sees unread data, while guaranteeing the single listener thread can't block or spin on a slow/malicious client.

One file, +41 lines, no change to the success path.
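
The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: graceful_close and the two constants are hypothetical names, and only the 32 × 8 KiB bound is taken from the description.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define DRAIN_MAX_READS 32   /* bound taken from the PR description */
#define DRAIN_CHUNK     8192 /* 8 KiB per read */

/* Hypothetical stand-in for the patched tcp_session_close(). */
static int
graceful_close(int fd)
{
    char buf[DRAIN_CHUNK];

    /* 1. Half-close the write side so the peer receives our FIN.
     *    A failure here (e.g. ENOTCONN) is ignored; we still close below. */
    (void)shutdown(fd, SHUT_WR);

    /* 2. Discard inbound data that is already buffered. MSG_DONTWAIT keeps
     *    each read non-blocking and the loop runs at most 32 times, so a
     *    slow or hostile client cannot stall the caller. */
    for (int i = 0; i < DRAIN_MAX_READS; i++) {
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
        if (n > 0) continue;                    /* discarded a chunk */
        if (n < 0 && errno == EINTR) continue;  /* interrupted, retry */
        break; /* 0: peer sent FIN; EAGAIN: buffer empty; otherwise: error */
    }

    return close(fd);
}
```

If the client has more than 32 × 8 KiB in flight, or keeps sending after the drain finishes, data can still be unread at close(), which is the residual case covered under Known limitation.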

Verification

Reproduced and verified in the Docker dev environment with hey:

| Case | Before | After |
| --- | --- | --- |
| Normal 200 traffic (5000 req) | 200×5000, ~15k rps | unchanged, 0 warnings |
| GET → 429/503 (originally reported scenario) | already clean | clean |
| POST 100 KB body → 404 route (connection reset by peer) | ~159 / 2400 req | 0 |

Note: the originally reported bodyless-GET scenario is already clean on current master (admissions/scheduler 429s now fire after the full request is read). The reproducible residual was a request with a body hitting an early-error route, which this fixes.

Known limitation

A client that keeps streaming a large body to a rejected route may still observe its own write abort as broken pipe (EPIPE). Fully suppressing that would require lingering on the request in the event loop (a draining state in the listener), which is disproportionate for this log-level issue — left out of scope and documented in the code comment.
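
For clients affected by this limitation, one possible way to tolerate the aborted write is sketched below. It is an assumption about client-side handling, not something this PR implements; send_body_tolerant is a hypothetical name. MSG_NOSIGNAL suppresses SIGPIPE, so the failure surfaces only as an errno value.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical client-side helper (not part of SLEdge or hey).
 * Sends up to len bytes of body. Returns the number of bytes sent, which is
 * less than len if the server has already closed the connection (EPIPE or
 * ECONNRESET). Returns -1 on any other error.
 */
static ssize_t
send_body_tolerant(int fd, const char *body, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(fd, body + sent, len - sent, MSG_NOSIGNAL);
        if (n >= 0) {
            sent += (size_t)n;
            continue;
        }
        if (errno == EINTR) continue;
        /* Server stopped reading: stop sending and let the caller read
         * whatever response is already buffered. */
        if (errno == EPIPE || errno == ECONNRESET) break;
        return -1;
    }

    return (ssize_t)sent;
}
```

A short return value tells the caller to stop uploading and read the early error response instead of treating the write failure as fatal.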

🤖 Generated with Claude Code

SLEdge frequently answers a request before it has finished reading it:
the short-circuit error responses (400/404/429/500/503) in the listener
thread write the response and close the socket while the client may still
be sending its request body.

tcp_session_close() did a bare close(). On Linux, close() with unread data
still in the kernel receive buffer discards that data and emits a TCP RST
instead of a graceful FIN. The client's HTTP stack reports this as
"connection reset by peer" or, on a pooled keepalive connection, as the Go
net/http warning "Unsolicited response received on idle HTTP channel" that
issue #185 describes.

Close gracefully instead: half-close the write side so the client receives
our FIN (and, holding a complete response, stops sending), then drain any
already-buffered inbound data so close() no longer sees unread data. The
drain is non-blocking and bounded (32 x 8 KiB) so the single listener
thread can never block or spin on a slow client.

Verified in the Docker dev environment with hey: POSTing a 100 KB body to a
rejected (404) route went from ~159 "connection reset by peer" errors per
2400 requests to 0, with no change to normal 200-response throughput
(~15k rps) or the bodyless GET -> 429/503 path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@bushidocodes bushidocodes force-pushed the fix/issue-185-graceful-close branch from 138add3 to 9e66463 June 16, 2026 01:46
@bushidocodes bushidocodes requested a review from emil916 June 16, 2026 01:49
@bushidocodes bushidocodes marked this pull request as ready for review June 16, 2026 01:49

@emil916 emil916 left a comment

I've never run into this issue. The server doesn't send any of the mentioned error responses (400/404/429/500/503), unless the request is fully received, no? I don't see how this is ever useful.

bushidocodes added a commit that referenced this pull request Jun 20, 2026
Self-contained script that reproduces the "connection reset by peer" /
"Unsolicited response received on idle HTTP channel" failure behind #185:
launches sledgert with a single valid route, then POSTs a sizeable body to a
non-existent route so the server emits a 404 (matched at the request line, in
on_client_request_receiving) and close()s while the body is still in flight,
producing a RST. Reports the reset count and prints BUG REPRODUCED when nonzero.

Run on an unpatched checkout to see the bug; on this branch the graceful-close
fix drops the reset count to 0. Helps reviewers reproduce the concrete failure
case requested in PR #389 review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
emil916 commented Jun 22, 2026

I see. I remember now, I used to get these 503 errors back in 2022 when using hey (now I am pretty much using my own loadtest).

@emil916 emil916 merged commit 1e7bd47 into master Jun 22, 2026
0 of 4 checks passed
@emil916 emil916 deleted the fix/issue-185-graceful-close branch June 22, 2026 20:25
Development

Successfully merging this pull request may close these issues.

Hey complains when it receives 503s.
