
bugfix: improve shutdown and stop PoA sooner #3289

Open
MitchTurner wants to merge 17 commits into master from bugfix/shutdown-poa-faster

Conversation

Contributor

@MitchTurner MitchTurner commented May 4, 2026

Linked Issues/PRs

Description

Because the services were being shut down in the same order they were started, the PoA service could be held up by earlier services. This meant that our producers could be nominally "shut down" while the PoA service kept running for a while. The old leader would still hold on to its lock, but it wouldn't be broadcasting blocks anymore.

The solution here is to shut down all of the services in parallel, so that none of them blocks the others.
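
To illustrate why parallel shutdown helps, here is a minimal sketch that is not the code in this PR. It uses standard-library threads instead of the async `join_all` the PR uses, so that it stays self-contained. `FakeService`, its `stop_delay` field, and `stop_in_parallel` are hypothetical names made up for the example.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a sub-service. `stop_delay` simulates how long
/// the service takes to stop once it is asked to.
struct FakeService {
    name: &'static str,
    stop_delay: Duration,
}

impl FakeService {
    fn stop(&self) {
        thread::sleep(self.stop_delay);
    }
}

/// Asks every service to stop at the same time and reports, in input order,
/// how long after the shutdown request each one finished.
fn stop_in_parallel(services: &[FakeService]) -> Vec<(&'static str, Duration)> {
    let start = Instant::now();
    thread::scope(|scope| {
        // Spawn all stop calls first so that no service waits for another.
        let handles: Vec<_> = services
            .iter()
            .map(|service| {
                scope.spawn(move || {
                    service.stop();
                    (service.name, start.elapsed())
                })
            })
            .collect();
        // Only then wait for each of them to finish.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("stop thread panicked"))
            .collect()
    })
}
```

With a slow service listed first and a fast one second, the fast one finishes after roughly its own delay rather than after the slow one's. That is the property we want for PoA.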

The other part of the PR takes care of the GraphQL server, which was the culprit for PoA never being shut down. We've added a timeout to force it to shut down if it doesn't shut down gracefully.
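
The timeout idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: it uses a standard-library thread and channel rather than a spawned async task, and `ShutdownOutcome` and `shutdown_with_timeout` are hypothetical names. A plain thread cannot be aborted, so on timeout the sketch only stops waiting, whereas an async task could actually be aborted.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical result type for the sketch.
#[derive(Debug, PartialEq)]
enum ShutdownOutcome {
    /// The graceful shutdown finished within the timeout.
    Graceful,
    /// The graceful shutdown did not report completion in time.
    Forced,
}

/// Runs `graceful_shutdown` on its own thread and waits at most `timeout`
/// for it to report completion.
fn shutdown_with_timeout<F>(graceful_shutdown: F, timeout: Duration) -> ShutdownOutcome
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        graceful_shutdown();
        // The receiver may already be gone if the caller gave up; ignore that.
        let _ = done_tx.send(());
    });

    match done_rx.recv_timeout(timeout) {
        Ok(()) => ShutdownOutcome::Graceful,
        // Either the timeout elapsed or the worker died without signalling.
        // The detached thread is left behind; it cannot be aborted here.
        Err(_) => ShutdownOutcome::Forced,
    }
}
```

The caller always gets an answer within `timeout`, so a stalled server can no longer hold up the rest of the shutdown.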

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

After merging, notify other teams

[Add or remove entries as needed]


cursor Bot commented May 4, 2026

PR Summary

Medium Risk
Changes node shutdown sequencing and introduces forced abort of the GraphQL server after a timeout, which could affect graceful shutdown semantics and in-flight requests. Dependency bumps/lockfile churn also carry some integration risk, especially around wasmtime/cranelift updates.

Overview
Ensures node shutdown does not block on slow sub-services by stopping all sub-services in parallel, improving PoA shutdown promptness and adding more detailed shutdown logging.

Hardens GraphQL shutdown by running the server in a spawned task and applying a 2s timeout; if graceful shutdown stalls, the task is aborted and the result is mapped consistently.

Improves test stability by waiting for node interfaces to become reachable before proceeding and retrying txpool inserts on transient ServiceCommunicationFailed errors.
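
A retry on a transient error could look roughly like the sketch below. It is not taken from the PR's tests: `InsertError` is a hypothetical stand-in that borrows only the `ServiceCommunicationFailed` name, and `retry_on_transient`, its attempt limit, and its delay are assumptions for illustration.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical error type; only `ServiceCommunicationFailed` counts as transient.
#[derive(Debug, PartialEq)]
enum InsertError {
    ServiceCommunicationFailed,
    Rejected,
}

/// Calls `op` until it succeeds, fails with a non-transient error, or has been
/// attempted `max_attempts` times. Sleeps for `delay` between attempts.
fn retry_on_transient<T>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, InsertError>,
) -> Result<T, InsertError> {
    let mut attempt = 1;
    loop {
        match op() {
            // Transient failure with attempts left: wait and try again.
            Err(InsertError::ServiceCommunicationFailed) if attempt < max_attempts => {
                attempt += 1;
                thread::sleep(delay);
            }
            // Success, a permanent error, or the last attempt: return as-is.
            other => return other,
        }
    }
}
```

Permanent errors such as the hypothetical `Rejected` are returned immediately, so the retry does not hide genuine test failures.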

Includes minor dependency/maintenance updates: adds a cargo-audit advisory ignore entry, bumps wasmtime to 43.0.2, and updates Cargo.lock plus some formatting-only Cargo.toml changes.

Reviewed by Cursor Bugbot for commit 9e14736.

@MitchTurner MitchTurner self-assigned this May 4, 2026
Comment thread .cargo/audit.toml Outdated
Comment thread crates/fuel-core/src/service.rs
Comment thread crates/fuel-core/src/graphql_api/api_service.rs
@MitchTurner MitchTurner marked this pull request as ready for review May 7, 2026 14:48
@MitchTurner MitchTurner requested review from a team, Dentosal and xgreenx as code owners May 7, 2026 14:48
Comment thread crates/fuel-core/src/graphql_api/api_service.rs

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 11328e1.

Comment thread crates/fuel-core/src/graphql_api/api_service.rs
let total_services = self.services.len();
futures::future::join_all((1_usize..).zip(self.services.iter()).map(
    |(service_num, service)| {
        stop_sub_service(service_num, total_services, service.as_ref())
Contributor Author


I'd prefer to use the actual service name here, but I don't want to touch too many pieces of code in a bugfix like this. This code doesn't have access to RunnableService::NAME.

@MitchTurner MitchTurner changed the title bugfix: stop poa first bugfix: improve shutdown and stop PoA sooner May 7, 2026