cluster: fix host state pool refresh races #854

Open
dkropachev wants to merge 1 commit into master from dk/standalone-host-state-fixes
Conversation

@dkropachev
Collaborator

Summary

  • make Cluster.signal_connection_failure() report whether down handling actually ran, so defunct control connections reconnect when host-down handling is discounted
  • skip Session.update_created_pools() pool creation for hosts already being handled by Cluster.on_up()
  • make the no-pool-future Cluster.on_up() path notify listeners and reconcile session pools like the async completion path
  • add focused unit coverage for all three host-state cases
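
The first item can be illustrated with a minimal sketch. This is a toy model, not the driver's code: `ToyHost`, `ToyCluster`, `ToyControlConnection`, the `discount_down_events` flag, and the `reconnects` counter are all hypothetical names invented for illustration. It only shows the idea that the caller receives a boolean saying whether down handling ran, and reconnects itself when it did not.

```python
class ToyHost:
    """Hypothetical host record; not cassandra.pool.Host."""

    def __init__(self, address):
        self.address = address
        self.is_up = True


class ToyCluster:
    """Hypothetical stand-in; not cassandra.cluster.Cluster."""

    def __init__(self, discount_down_events):
        # Assumption: a single flag models "host-down handling is discounted".
        self.discount_down_events = discount_down_events
        self.down_handled = []

    def on_down(self, host):
        # Return True only when down handling actually ran.
        if self.discount_down_events:
            return False
        host.is_up = False
        self.down_handled.append(host.address)
        return True

    def signal_connection_failure(self, host):
        # Assumption: the conviction policy always says "down" in this toy,
        # so the result reflects only whether handling executed.
        return self.on_down(host)


class ToyControlConnection:
    """Hypothetical stand-in for the control connection."""

    def __init__(self, cluster, host):
        self.cluster = cluster
        self.host = host
        self.reconnects = 0

    def signal_error(self):
        handled = self.cluster.signal_connection_failure(self.host)
        if not handled:
            # Down handling was skipped, so nothing else will trigger a
            # reconnect of the defunct connection; do it here.
            self.reconnects += 1
        return handled
```

In this sketch, a control connection whose cluster discounts the down event reconnects on its own, while one whose cluster runs down handling leaves the reconnect to that path.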

Closes #847
Closes #851
Closes #852

Testing

  • Added unit test coverage for the fixed cases.
  • repo-ci fast was attempted. workflow-prepare and workflow-test passed, but the overall run failed when repo-ci hit its internal 300s timeout while a build step was still running. Artifact: 7707c4d2abc4992ec7b76d3fcfdc50e79369210e6713c10b27ac85272b70a580.


Copilot AI left a comment

Pull request overview

This PR addresses race conditions and missed reconciliation/notification paths during host up/down transitions in cassandra.cluster.Cluster, ensuring control connections and session pools converge correctly even when host-down handling is discounted.

Changes:

  • Adjust Cluster.signal_connection_failure() / Cluster.on_down() so callers can distinguish “down handling executed” vs “conviction policy said down,” enabling ControlConnection._signal_error() to reconnect when down handling is discounted.
  • Prevent Session.update_created_pools() from scheduling pool creation for hosts currently being processed by Cluster.on_up().
  • Make the Cluster.on_up() “no pool futures” path notify listeners and reconcile session pools, matching the async completion behavior.
  • Add unit tests covering the discounted-down control connection case, the on-up no-future notification/reconciliation case, and the pool-refresh skip during node-up handling.
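
The second and third changes can be sketched together. Again this is a toy model under stated assumptions, not the code in `cassandra/cluster.py`: `ToySession`, `ToyCluster`, the `handling_up` set, the `has_pool_futures` flag, and the `_finish_on_up` helper are hypothetical. The sketch shows a pool refresh that skips hosts still inside node-up handling, and an on-up path without pool futures that runs the same notify-and-reconcile step as the asynchronous completion path.

```python
class ToySession:
    """Hypothetical stand-in; not cassandra.cluster.Session."""

    def __init__(self):
        self.pools = {}
        self.created = []  # order in which pools were created

    def update_created_pools(self, known_hosts, hosts_in_on_up):
        for host in sorted(known_hosts):
            if host in hosts_in_on_up:
                # Node-up handling owns pool creation for this host.
                continue
            if host not in self.pools:
                self.pools[host] = object()
                self.created.append(host)


class ToyCluster:
    """Hypothetical stand-in; not cassandra.cluster.Cluster."""

    def __init__(self, session):
        self.session = session
        self.known_hosts = set()
        self.handling_up = set()
        self.listener_events = []

    def _finish_on_up(self, host):
        # Shared tail: notify listeners, then reconcile session pools.
        self.listener_events.append(host)
        self.handling_up.discard(host)
        self.session.update_created_pools(self.known_hosts, self.handling_up)

    def on_up(self, host, has_pool_futures):
        self.known_hosts.add(host)
        self.handling_up.add(host)
        if not has_pool_futures:
            # No-future path: finish immediately with the same tail.
            self._finish_on_up(host)
            return None
        # Async path: the caller invokes this when the futures complete.
        return lambda: self._finish_on_up(host)
```

Routing both paths through one tail function is a design choice of this sketch; it makes it hard for the two paths to drift apart again.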

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| cassandra/cluster.py | Refines host up/down transition semantics, prevents duplicate pool refresh during node-up handling, and ensures listener notification + pool reconciliation in all on-up paths. |
| tests/unit/test_control_connection.py | Adds coverage ensuring defunct control connections reconnect when host-down handling is discounted. |
| tests/unit/test_cluster.py | Adds coverage for on-up notification/reconciliation without pool futures and for skipping update_created_pools() work while node-up handling is in progress. |

