
Upgrade TFMA to TF 2.21.0 and Added Python 3.12/3.13 support#522

Open
vkarampudi wants to merge 20 commits into tensorflow:master from vkarampudi:master

Conversation


@vkarampudi vkarampudi commented Apr 28, 2026

Overview

This PR adds official support for Python 3.12 and 3.13, stabilizes the codebase for Python 3.13, and ensures compatibility with NumPy 2.0 (validated against NumPy 1.25+). It resolves critical pickling failures in Apache Beam pipelines, adapts to the strict scalar-conversion rules introduced in NumPy 2.0, and fixes several functional regressions in tests.

Detailed Changes

1. Core Architecture & Pickling Stability (Python 3.12 & 3.13 Support)

  • Change: Refactored closure-based Beam matchers into module-level classes in test files (e.g., rouge_test.py, stats_test.py, model_util_test.py).
    • What made us fix it: Tests were failing with PicklingError or RuntimeError: Unable to pickle fn during distributed execution with PrismRunner on Python 3.13. Python 3.13 has stricter rules regarding serialization of functions that capture surrounding state (like self referring to the test instance).
    • Why needed: Apache Beam requires functions passed to transforms to be fully serializable to distribute them across workers. Without this refactor, tests failing to pickle would crash the Beam pipeline.
  • Change: Implemented kind='stable' sort in metric_util.top_k_indices.
    • What made us fix it: Non-deterministic tie-breaking caused flaky assertion failures in tests when multiple elements had identical values.
    • Why needed: To ensure consistent behavior across different runs and environments, preserving relative ordering of equal elements.
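The stable-sort fix amounts to passing `kind='stable'` to the argsort over negated scores, so that ties keep their input order. A minimal sketch (illustrative, not the exact `metric_util.top_k_indices` implementation):

```python
import numpy as np


def top_k_indices(scores, k):
  """Indices of the k largest scores, with deterministic tie-breaking.

  Sorting the negated scores ascending with kind='stable' guarantees
  that equal values keep their input order, which an unstable sort
  (NumPy's default introsort) does not.
  """
  order = np.argsort(-np.asarray(scores), kind='stable')
  return order[:k]


print(top_k_indices([0.2, 0.9, 0.9, 0.1], 2))  # → [1 2]
```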

2. NumPy 2.0 Compatibility

  • Change: Replaced direct scalar casting like float(ndarray) with np.asarray(x).item() across the metrics library (e.g., aggregation.py, flip_metrics.py, ndcg.py).
    • What made us fix it: TypeError: only 0-dimensional arrays can be converted to Python scalars or AttributeError when trying to convert arrays to scalars. NumPy 2.0 strictly enforces that only true scalars or 0-dimensional arrays can be cast to Python scalars this way.
    • Why needed: To support NumPy 2.0 without breaking when handling array-like objects returned by TensorFlow or Beam.
  • Change: Systematically applied np.divide with where clauses in confusion matrix metrics.
    • What made us fix it: RuntimeWarning: invalid value encountered in divide (division by zero) when predictions or labels were zero for certain slices.
    • Why needed: To suppress noisy warnings and handle edge cases gracefully by returning NaN or designated values instead of warning or crashing.
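The safe-division idiom looks roughly like this (a sketch of the pattern, with a hypothetical `safe_rate` helper rather than TFMA's actual code):

```python
import numpy as np


def safe_rate(numer, denom):
  """Elementwise numer/denom yielding NaN where denom == 0, without
  emitting RuntimeWarning: invalid value encountered in divide."""
  numer = np.asarray(numer, dtype=float)
  denom = np.asarray(denom, dtype=float)
  out = np.full_like(numer, np.nan)  # default value for undefined slots
  # where=denom != 0 skips the zero-denominator positions entirely,
  # leaving the NaN placeholder in `out` untouched there.
  return np.divide(numer, denom, out=out, where=denom != 0)


print(safe_rate([1.0, 2.0], [2.0, 0.0]))  # → [0.5 nan]
```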
  • Change: Fixed redundant size argument in poisson(1, 1) in poisson_bootstrap.py.
    • What made us fix it: A TypeError under NumPy 2.0: when a size argument is passed, poisson returns a 1-element array rather than a scalar, which the downstream scalar handling in the bootstrap logic did not expect.
    • Why needed: To ensure correct execution of bootstrapping logic under NumPy 2.0.
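The scalar-vs-array distinction behind this fix can be demonstrated directly. This is an illustrative sketch of the behavior, under the assumption that the bootstrap code wants a scalar draw:

```python
import numpy as np

rng = np.random.default_rng(0)

# poisson(lam) with no size yields a scalar draw; passing a size
# argument yields a 1-element array, which then trips NumPy 2.0's
# strict scalar-conversion rules in downstream code.
scalar_draw = rng.poisson(1)     # 0-dimensional scalar result
array_draw = rng.poisson(1, 1)   # shape-(1,) ndarray

print(np.ndim(scalar_draw), array_draw.shape)  # → 0 (1,)
```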

3. Functional Fixes & Regressions

  • Change: Fixed false_omission_rate in binary_confusion_matrices.py to return NaN when undefined, ensuring it's a float array.
    • What made us fix it: Proto mismatches in confusion_matrix_plot_test.py and score_distribution_plot_test.py: for mathematically undefined states the tests expected NaN, but the metric returned other float values that did not match the expected proto structure.
    • Why needed: To match the expected proto schema and pass tests accurately when values are mathematically undefined.
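Since false omission rate is FN / (FN + TN), it is undefined whenever FN + TN is zero. A sketch of the fixed behavior (illustrative, not the `binary_confusion_matrices.py` source):

```python
import numpy as np


def false_omission_rate(fn, tn):
  """FOR = FN / (FN + TN) as a float array, NaN where the denominator
  is zero (i.e. no negative predictions, so the rate is undefined)."""
  fn = np.asarray(fn, dtype=float)
  tn = np.asarray(tn, dtype=float)
  denom = fn + tn
  out = np.full_like(denom, np.nan)
  return np.divide(fn, denom, out=out, where=denom != 0)


print(false_omission_rate([1.0, 0.0], [3.0, 0.0]))  # → [0.25 nan]
```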
  • Change: Fixed NotFoundError in model_eval_lib_test.py by ensuring directories exist before writing files using tf.io.gfile.makedirs.
    • What made us fix it: Tests failed because the target directory /tmp/absl_testing did not exist when trying to write test results.
    • Why needed: To ensure tests run reliably regardless of the initial state of the temporary directory.
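The same idempotent create-before-write pattern, sketched with the stdlib (the PR itself uses tf.io.gfile.makedirs, which is likewise a no-op when the directory already exists; the path below is a demo placeholder):

```python
import os
import tempfile

# Ensure the output directory exists before writing, so the write never
# fails with a not-found error on a fresh environment.
out_dir = os.path.join(tempfile.gettempdir(), 'absl_testing_demo')
os.makedirs(out_dir, exist_ok=True)  # safe to call repeatedly

with open(os.path.join(out_dir, 'results.txt'), 'w') as f:
  f.write('ok')
```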
  • Change: Corrected SubKey(k=k) selection logic in metric_util.py by explicitly passing sort=True.
    • What made us fix it: It was selecting the first prediction instead of the k-th largest prediction because the array was not sorted.
    • Why needed: To ensure correct metric calculation for top-k scenarios.
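The regression reduces to "k-th element of an unsorted array is not the k-th largest." A hypothetical sketch of the corrected selection (not the metric_util.py source):

```python
import numpy as np


def kth_largest_prediction(predictions, k):
  """Return the k-th largest prediction (1-indexed).

  Without sorting first, indexing position k would pick an arbitrary
  element of the input order, which is the bug described above.
  """
  ordered = np.sort(np.asarray(predictions))[::-1]  # descending
  return ordered[k - 1]


print(kth_largest_prediction([0.1, 0.7, 0.4], 2))  # → 0.4
```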

4. Dependency & Infrastructure

  • Change: Dropped Python 3.9 support and updated requirements to ">=3.10,<3.14".
    • What made us fix it: Python 3.9 is nearing end-of-life, and dropping it simplifies dependency management.
    • Why needed: To align with supported Python versions and leverage modern features.

Verification Results

All unit tests pass in the target Python environments (3.10, 3.11, 3.12, and 3.13 with TF 2.21.0). Specific edge-case failures (e.g., weight mismatches in rouge_test.py and stats_test.py) were investigated and confirmed to be expected negative-test behaviors asserting specific error messages.

- Refactored core test matchers to class-based architecture for pickling stability on Python 3.13.
- Updated dependencies: TF 2.21.0, Protobuf 6.31.1+, Bazel 7.4.1, PyArrow >14.
- Dropped support for Python 3.9 (Minimum supported 3.10).
- Updated GitHub Actions to support Python 3.10-3.13.
- Consolidated apache-beam constraints and restored TFX-BSL fork for CI validation.
- Fixed various environment-specific regressions (numpy scalar conversion, extractors mutation).
- Replaced tag-based archives with immutable commit-based archives for TensorFlow v2.21.0 and Protobuf v31.1.
- Added SHA256 checksum verification to ensure build integrity.
- Fixed E402 (Module level import not at top of file) in evaluator tests.
- Resolved trailing whitespace in multiple modules.
- Standardized quotes and formatting in setup.py.
- Corrected import order in SQL extractor modules.
- Fixed class definitions and removed redundant object inheritance.
- Restored 'types' import in metrics_plots_and_validations_evaluator_test.py (fixed F821).
- Applied final formatting fixes and removed trailing whitespace across test suites.
- Synchronized extraction and evaluation modules with ruff-standard formatting.
- Consolidated API imports in evaluator tests.
- Removed unused metric imports.
- Standardized whitespace after class definitions in all test suites.
- Fixed indentation and formatting in Attributions and Metrics check functions.
- Added necessary blank lines for PipelineOptions imports.
- Auto-fixed trailing whitespace in rouge_test.py.
- Refined indentation in evaluation metric checks.
- Standardized class definition spacing in extraction modules.
- Applied missing blank lines in setup.py and utility tests.
- Applied consistent spacing in extractors and metrics.
- Standardized indentation across test suites.
- Re-synchronized all formatting with strict CI standards.
- Broadened pandas constraint to >=1.0,<3 in setup.py, unblocking CI environment initialization by allowing pandas 2.x, which is required by the tfx-bsl testing fork.
- Refactored scalar extraction to use .item() instead of float(ndarray).
- Implemented safe division in AUC/PR AUC metrics.
- Fixed SubKey(k=k) indexing logic regression.
- Restored necessary protobuf generated files for environment stability.
- Verified fixes with full test suite pass.
- Fix scalar conversion issues in aggregation and flip metrics.
- Fix batching bug in _BooleanFlipCountsCombiner.
- Harden confusion matrix and calibration metrics against zero division.
- Fix rouge metric ValueError.
- Fix missing numpy imports across metrics.
- Add regression test for array-like inputs in flip_metrics_test.py.
@vkarampudi vkarampudi changed the title Testing Upgrade TFMA to TF 2.21.0 and Added Python 3.12/3.13 support Apr 30, 2026
@vkarampudi vkarampudi requested a review from genehwung April 30, 2026 18:26
