Skip to content

Mark connections broken on protocol I/O failures#4549

Open
lh0156 wants to merge 9 commits into
redis:masterfrom
lh0156:fix-4278-error-broken-connection
Open

Mark connections broken on protocol I/O failures#4549
lh0156 wants to merge 9 commits into
redis:masterfrom
lh0156:fix-4278-error-broken-connection

Conversation

@lh0156

@lh0156 lh0156 commented May 30, 2026

Copy link
Copy Markdown

Fixes #4278.

Summary

  • Mark Connection broken when protocol send/read/push paths fail with JedisConnectionException or Error.
  • Guard reads and writes after a connection has already been marked broken.
  • Add focused unit coverage for serialization/write failures, protocol read failures, push read failures, pool invalidation, and Redis data errors that should not break the connection.

Motivation

If a pooled connection fails while RESP bytes are being written or read, the connection can retain partial or stale protocol state. Returning that connection to the pool as healthy can allow a later borrower to read stale data.

Redis error replies are still valid protocol responses, so JedisDataException paths continue to leave the connection reusable.

Notes

  • The existing best-effort Redis error-line diagnostic in sendCommand is preserved.
  • The connection is marked broken before the diagnostic read, so a failure during diagnostic enrichment cannot return a suspect connection to the pool.

Testing

  • /Users/developseop/.m2/wrapper/dists/apache-maven-3.9.15/9925cc1d/bin/mvn -B -Dwith-param-names=true -Dtest=ConnectionErrorHandlingTest test
  • /Users/developseop/.m2/wrapper/dists/apache-maven-3.9.15/9925cc1d/bin/mvn -B -Dwith-param-names=true -Dnet.bytebuddy.experimental=true test

Note

High Risk
Changes core connection lifecycle used by every pooled client; incorrect broken vs healthy classification could discard good connections or return stale RESP state to the pool.

Overview
Connection now consistently marks itself broken on RESP send/read/push failures (JedisConnectionException, I/O/RuntimeException during writes, and Error), while JedisDataException and post-read builder failures still leave the connection reusable.

New markBroken helpers centralize that behavior; sendCommand checks broken before writing (skipping diagnostic Redis error reads on already-broken sockets); flush and readProtocolWithCheckingBroken reject further I/O on broken connections; readReplyAfterCommandSent wraps reply reads so Error during executeCommand also breaks the connection. Write failures still run the optional Redis -ERR line enrichment only after marking broken.

Adds ConnectionErrorHandlingTest (fake sockets, pool invalidation vs healthy reuse on data errors) and registers it in the formatter pom. touch integration tests now assert OBJECT IDLETIME is in 0–1 after touch instead of exactly 0.

Reviewed by Cursor Bugbot for commit 6641f2c. Bugbot is set up for automated code reviews on this repo. Configure here.

@jit-ci

jit-ci Bot commented May 30, 2026

Copy link
Copy Markdown

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

@ggivo

ggivo commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

Thanks for the PR — the team will take a look as soon as we have free cycles.

In the meantime, to get the formatting checks passing, please add the new files to the formatter config in pom.xml and run mvn format over them. That should resolve the current CI failures.

@lh0156

lh0156 commented Jun 3, 2026

Copy link
Copy Markdown
Author

Thanks for the guidance. I added the new test file to the formatter config, ran the formatter, and pushed the fix in 724d5ee.

I also verified locally with:

  • mvn formatter:validate
  • mvn -Dtest=ConnectionErrorHandlingTest test

The GitHub Actions workflows on the latest commit currently show as action_required, so I’m waiting for them to be approved/rerun.

@ggivo

ggivo commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

@lh0156
Thanks. Just triggered the CI

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens redis.clients.jedis.Connection error handling so that protocol I/O failures (write/flush/read/push paths) reliably mark a connection as broken, preventing pooled connections with potentially contaminated RESP state from being reused (fixes #4278).

Changes:

  • Update Connection to mark broken on protocol send/flush/read/push failures (notably JedisConnectionException and Error) and to reject reads/writes once broken.
  • Preserve the existing “read Redis error line if possible” diagnostic when sendCommand fails, while ensuring the connection is already marked broken first.
  • Add comprehensive unit coverage for serialization/write failures, protocol read failures, push read failures, and pool invalidation behavior; register the test in the formatter plugin includes.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/main/java/redis/clients/jedis/Connection.java Marks connections broken on key protocol failure paths and adds broken-state guards.
src/test/java/redis/clients/jedis/ConnectionErrorHandlingTest.java Adds focused tests covering broken-state behavior across send/flush/read/push and pool invalidation scenarios.
pom.xml Registers the new test file in the formatter plugin includes list.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/main/java/redis/clients/jedis/Connection.java
@ggivo

ggivo commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

@lh0156
Made one round of review, I think the review looks good. Want to make another round over the tests introduced before final approval, since it does not seem that we had good coverage over error propagations.

adding also @atakavci for another pair of review

@atakavci atakavci left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i suggest to investigate if connect outside of tyry-block 💯 safe

Comment thread src/main/java/redis/clients/jedis/Connection.java Outdated
Comment thread src/main/java/redis/clients/jedis/Connection.java
@ggivo ggivo added waiting-for-feedback We need additional information before we can continue and removed status: waiting-for-review labels Jun 10, 2026

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit cd826b2. Configure here.

Comment thread src/main/java/redis/clients/jedis/Connection.java
@lh0156 lh0156 force-pushed the fix-4278-error-broken-connection branch from 9d725fa to cd826b2 Compare June 13, 2026 10:46
@lh0156 lh0156 requested review from atakavci and ggivo June 14, 2026 04:59
@ggivo ggivo added status: waiting-for-review and removed waiting-for-feedback We need additional information before we can continue labels Jun 15, 2026
@ggivo

ggivo commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

@lh0156

Nice work!

I did another review pass, and I think the main remaining gap is hardening the Error (OutOfMemoryError example) handling described in the original issue.

Specifically, there is still a window in executeCommand() where an Error can occur after the command has been successfully sent (sendCommand(args)) but before the response is consumed (getOne()). In that case, the connection may be left in an unknown protocol state and later returned to the pool without being marked as broken.

@ggivo ggivo added waiting-for-feedback We need additional information before we can continue and removed status: waiting-for-review labels Jun 17, 2026
@lh0156

lh0156 commented Jun 28, 2026

Copy link
Copy Markdown
Author

Thanks for the review. I addressed the remaining executeCommand() window.

After a command is sent, reply consumption now goes through readReplyAfterCommandSent(), which marks the connection broken if an Error is thrown before or during reply consumption.

For CommandObject execution, the code now reads the reply first and only invokes the response builder after the reply has been consumed. That keeps the protocol-state boundary explicit: Errors while consuming the Redis reply mark the connection broken, but builder lookup/build failures after the reply is consumed do not.

I also added regression coverage for the builder lookup case and verified locally with:

mvn -B -Dwith-param-names=true -Dtest=ConnectionErrorHandlingTest test

@lh0156 lh0156 force-pushed the fix-4278-error-broken-connection branch from 0fb52b7 to a90040e Compare June 28, 2026 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

waiting-for-feedback We need additional information before we can continue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Input Stream Not Cleaned Up on Errors May Lead to Buffer Corruption

4 participants