[clustermgtd] Add retry to compute fleet status retrieval in clustermgtd #704

Open
gmarciani wants to merge 1 commit into aws:develop from gmarciani:wip/mgiacomo/3160/fleet-status-retry-0506-1

Conversation

@gmarciani
Contributor

@gmarciani gmarciani commented May 6, 2026

Description of changes

Add a retry to the retrieval of the compute fleet status in clustermgtd.
The retry mitigates the impact of networking glitches, such as transient unavailability of IMDS.
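
For readers unfamiliar with the pattern, here is a minimal sketch of retrying a status lookup on a transient error. It is not the code in this PR: the `retry` decorator, the `FleetStatusFetcher` class, the choice of `ConnectionError`, and the attempt count are all hypothetical and only illustrate the idea using the Python standard library.

```python
import functools
import time


def retry(max_attempts=3, wait_seconds=0.0, exceptions=(Exception,)):
    """Hypothetical helper: retry the wrapped callable on the given exceptions."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait_seconds)  # fixed pause before the next attempt

        return wrapper

    return decorator


class FleetStatusFetcher:
    """Hypothetical stand-in for a status lookup that can fail transiently."""

    def __init__(self, failures_before_success):
        self.calls = 0
        self._failures = failures_before_success

    @retry(max_attempts=3, wait_seconds=0.0, exceptions=(ConnectionError,))
    def get_status(self):
        self.calls += 1
        if self.calls <= self._failures:
            # Simulates a transient failure, e.g. a metadata endpoint being unreachable
            raise ConnectionError("simulated transient failure")
        return "RUNNING"


if __name__ == "__main__":
    fetcher = FleetStatusFetcher(failures_before_success=2)
    print(fetcher.get_status(), "after", fetcher.calls, "calls")
```

In this sketch the error is re-raised once the attempts are exhausted, so a persistent outage still surfaces to the caller instead of being silently swallowed.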

Tests

SUCCESS: test_slurm_scaling, the test that originally detected the clustermgtd error caused by IMDS failures.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@gmarciani gmarciani added the 3.x label May 6, 2026
@gmarciani gmarciani requested review from a team as code owners May 6, 2026 22:59
@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/fleet-status-retry-0506-1 branch from 7c32088 to d174d8d Compare May 6, 2026 23:10