Fix state_dict handling for max_iters in Engine #3729

Draft
TahaZahid05 wants to merge 9 commits into pytorch:master from TahaZahid05:fix-taha/max-iters

Conversation

@TahaZahid05
Collaborator

Fixes #1521

Description:
This PR builds on #3439 to implement max_iters handling in state serialization and deserialization during Engine runs.

Key Changes:

  • Engine.state_dict() now correctly exports exactly one of max_iters or max_epochs depending on which condition the Engine run was configured with.
  • Restructured _state_dict_one_of_opt_keys to hold groups of mutually exclusive keys, so that Engine.load_state_dict() accepts either iteration/max_iters or epoch/max_epochs and resumes the engine correctly from either.
  • Added validation checks directly to Engine to prevent loading impossible future iterations while safely resuming training.
  • Added local execution and Checkpointer tests that verify mutually exclusive cross-parameter resumption.
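
The grouped validation described above can be sketched in isolation. This is a minimal illustration, not the PR's code: the names `ONE_OF_GROUPS`, `validate_one_of_groups`, and `validate_progress` are hypothetical, the split into one progress group and one termination group is an assumption, and the reading of "impossible future iterations" as "a saved iteration greater than the saved max_iters" is also an assumption.

```python
from collections.abc import Mapping

# Hypothetical grouping: exactly one key from each inner tuple must be present.
ONE_OF_GROUPS = (("iteration", "epoch"), ("max_epochs", "max_iters"))


def validate_one_of_groups(state_dict, groups=ONE_OF_GROUPS):
    """Raise ValueError unless exactly one key of every group is present."""
    if not isinstance(state_dict, Mapping):
        raise TypeError(f"state_dict should be a mapping, got {type(state_dict)}")
    for group in groups:
        present = [key for key in group if key in state_dict]
        if len(present) != 1:
            raise ValueError(
                f"state_dict should contain exactly one of {group}, found {present}"
            )


def validate_progress(state_dict):
    """Reject a saved iteration that lies beyond the saved max_iters (assumed rule)."""
    iteration = state_dict.get("iteration")
    max_iters = state_dict.get("max_iters")
    if iteration is not None and max_iters is not None and iteration > max_iters:
        raise ValueError(
            f"iteration ({iteration}) cannot be larger than max_iters ({max_iters})"
        )


# A state dict keyed by iteration and max_iters passes both checks.
good = {"iteration": 250, "epoch_length": 100, "max_iters": 1000}
validate_one_of_groups(good)
validate_progress(good)
```

Keeping the group check generic means a new pair of mutually exclusive keys only needs a new tuple, while value-level checks such as the iteration bound stay in a separate function.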

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

github-actions bot added the module: engine and module: base labels on Apr 10, 2026
Comment thread ignite/base/mixins.py Outdated
@TahaZahid05
Collaborator Author

@vfdev-5 done!

More descriptive way to show tuple of tuples

Co-authored-by: vfdev <vfdev.5@gmail.com>
@vfdev-5
Collaborator

vfdev-5 commented Apr 12, 2026 via email

TahaZahid05 requested a review from vfdev-5 on April 12, 2026 14:58
Comment thread ignite/engine/engine.py
- keys: tuple[str, ...] = self._state_dict_all_req_keys + (self._state_dict_one_of_opt_keys[0],)
+ keys: tuple[str, ...] = self._state_dict_all_req_keys
+ # We add iteration by default to get an exact measure of progress
+ keys += ("iteration",)
Collaborator

Why don't we add it to self._state_dict_all_req_keys directly?

Collaborator Author

@vfdev-5 @aaishwarymishra Wouldn't that break backward compatibility (BC)? Any existing code that relies only on epoch rather than iteration might fail.
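
The backward-compatibility concern can be made concrete with a small sketch. The key tuples and the helper `missing_required` are illustrative assumptions rather than Ignite's actual definitions: the point is only that a checkpoint which tracks progress by epoch would start failing a required-key check if iteration were made mandatory on load.

```python
# Illustrative key sets only; the exact tuples in Ignite may differ.
REQUIRED_TODAY = ("epoch_length", "max_epochs")
REQUIRED_IF_ITERATION_ADDED = REQUIRED_TODAY + ("iteration",)


def missing_required(state_dict, required):
    """Return the required keys that are absent from state_dict."""
    return [key for key in required if key not in state_dict]


# A checkpoint saved by code that tracked progress by epoch only.
legacy_checkpoint = {"epoch": 2, "epoch_length": 100, "max_epochs": 5}

print(missing_required(legacy_checkpoint, REQUIRED_TODAY))               # []
print(missing_required(legacy_checkpoint, REQUIRED_IF_ITERATION_ADDED))  # ['iteration']
```

Appending iteration only when exporting, as the diff above does, changes what is written without tightening what is accepted on load.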

Contributor

Copilot AI left a comment

Pull request overview

This PR extends Ignite’s Engine state (de)serialization to properly support runs terminated by max_iters, including mutually-exclusive termination/progress keys and additional validation to prevent invalid resume states.

Changes:

  • Update Engine.state_dict() to serialize exactly one termination key (max_epochs or max_iters) alongside progress.
  • Update Serializable.load_state_dict() to validate groups of mutually exclusive optional keys (e.g., (iteration|epoch) and (max_epochs|max_iters)).
  • Add/expand tests covering max_iters serialization, resumption, validation errors, and checkpointing.
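
As a rough sketch of the first change, exporting exactly one termination key alongside progress could look like the following. `RunState` and `export_state` are hypothetical stand-ins and not Ignite's `State` class or `Engine.state_dict()`; including `epoch_length` in the output is an assumption.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunState:
    """Hypothetical stand-in for the engine's state; not Ignite's State class."""

    iteration: int
    epoch_length: int
    max_epochs: Optional[int] = None
    max_iters: Optional[int] = None


def export_state(state: RunState) -> OrderedDict:
    """Serialize progress plus exactly one of max_epochs / max_iters."""
    # Exactly one termination condition must be configured.
    if (state.max_epochs is None) == (state.max_iters is None):
        raise ValueError("exactly one of max_epochs or max_iters must be set")
    out = OrderedDict(epoch_length=state.epoch_length, iteration=state.iteration)
    if state.max_iters is not None:
        out["max_iters"] = state.max_iters
    else:
        out["max_epochs"] = state.max_epochs
    return out


# A run configured with max_iters exports max_iters and omits max_epochs.
print(list(export_state(RunState(iteration=250, epoch_length=100, max_iters=1000))))
# ['epoch_length', 'iteration', 'max_iters']
```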

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| ignite/engine/engine.py | Adds max_iters serialization/deserialization support, validation helpers, and run-time handling of mutually exclusive termination params. |
| ignite/base/mixins.py | Changes optional-key validation to support grouped “one-of” requirements. |
| tests/ignite/engine/test_engine_state_dict.py | Expands integration/unit tests for max_iters state dict behavior, resume cases, and error validation. |
| tests/ignite/base/test_mixins.py | Adds tests for the new grouped optional-key validation logic in Serializable.load_state_dict(). |

vfdev-5 requested a review from aaishwarymishra on April 15, 2026 15:33
Comment thread ignite/engine/engine.py
- keys: tuple[str, ...] = self._state_dict_all_req_keys + (self._state_dict_one_of_opt_keys[0],)
+ keys: tuple[str, ...] = self._state_dict_all_req_keys
+ # We add iteration by default to get an exact measure of progress
+ keys += ("iteration",)
Collaborator

If iteration is always included, why don't we just add it to _state_dict_all_req_keys?

TahaZahid05 marked this pull request as draft on April 21, 2026 23:04
Labels

module: base, module: engine

Development

Successfully merging this pull request may close these issues.

Possible issues with max_iters when loading/saving engine's state

5 participants