
Upgrade TFMA to TF 2.21.0 and Added Python 3.12/3.13 support#522

Open
vkarampudi wants to merge 20 commits into tensorflow:master from vkarampudi:master

Conversation


@vkarampudi vkarampudi commented Apr 28, 2026

Overview

This PR adds official support for Python 3.12 and 3.13, stabilizes the codebase for Python 3.13, and ensures compatibility with NumPy 2.0 (validated against NumPy 1.25+). It resolves critical pickling failures in Apache Beam pipelines, adapts to the strict scalar-conversion rules introduced in NumPy 2.0, and fixes several functional regressions in tests.

Detailed Changes

1. Core Architecture & Pickling Stability (Python 3.12 & 3.13 Support)

  • Change: Refactored closure-based Beam matchers into module-level classes in test files (e.g., rouge_test.py, stats_test.py, model_util_test.py).
    • What made us fix it: Tests were failing with PicklingError or RuntimeError: Unable to pickle fn during distributed execution with PrismRunner on Python 3.13. Python 3.13 has stricter rules regarding serialization of functions that capture surrounding state (like self referring to the test instance).
    • Why needed: Apache Beam requires functions passed to transforms to be fully serializable to distribute them across workers. Without this refactor, tests failing to pickle would crash the Beam pipeline.
  • Change: Implemented kind='stable' sort in metric_util.top_k_indices.
    • What made us fix it: Non-deterministic tie-breaking caused flaky assertion failures in tests when multiple elements had identical values.
    • Why needed: To ensure consistent behavior across different runs and environments, preserving relative ordering of equal elements.
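The stable-sort fix amounts to passing `kind='stable'` to the argsort over negated scores, so that ties keep their input order. A minimal sketch (illustrative, not the exact `metric_util.top_k_indices` implementation):

```python
import numpy as np


def top_k_indices(scores, k):
  """Indices of the k largest scores, with deterministic tie-breaking.

  Sorting the negated scores ascending with kind='stable' guarantees
  that equal values keep their input order, which an unstable sort
  (NumPy's default introsort) does not.
  """
  order = np.argsort(-np.asarray(scores), kind='stable')
  return order[:k]


print(top_k_indices([0.2, 0.9, 0.9, 0.1], 2))  # → [1 2]
```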

2. NumPy 2.0 Compatibility

  • Change: Replaced direct scalar casting like float(ndarray) with np.asarray(x).item() across the metrics library (e.g., aggregation.py, flip_metrics.py, ndcg.py).
    • What made us fix it: TypeError: only 0-dimensional arrays can be converted to Python scalars or AttributeError when trying to convert arrays to scalars. NumPy 2.0 strictly enforces that only true scalars or 0-dimensional arrays can be cast to Python scalars this way.
    • Why needed: To support NumPy 2.0 without breaking when handling array-like objects returned by TensorFlow or Beam.
  • Change: Systematically applied np.divide with where clauses in confusion matrix metrics.
    • What made us fix it: RuntimeWarning: invalid value encountered in divide (division by zero) when predictions or labels were zero for certain slices.
    • Why needed: To suppress noisy warnings and handle edge cases gracefully by returning NaN or designated values instead of warning or crashing.
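The safe-division idiom looks roughly like this (a sketch of the pattern, with a hypothetical `safe_rate` helper rather than TFMA's actual code):

```python
import numpy as np


def safe_rate(numer, denom):
  """Elementwise numer/denom yielding NaN where denom == 0, without
  emitting RuntimeWarning: invalid value encountered in divide."""
  numer = np.asarray(numer, dtype=float)
  denom = np.asarray(denom, dtype=float)
  out = np.full_like(numer, np.nan)  # default value for undefined slots
  # where=denom != 0 skips the zero-denominator positions entirely,
  # leaving the NaN placeholder in `out` untouched there.
  return np.divide(numer, denom, out=out, where=denom != 0)


print(safe_rate([1.0, 2.0], [2.0, 0.0]))  # → [0.5 nan]
```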
  • Change: Fixed redundant size argument in poisson(1, 1) in poisson_bootstrap.py.
    • What made us fix it: A TypeError under NumPy 2.0: when a size argument is passed, poisson returns a 1-element array rather than a scalar, which the downstream scalar handling in the bootstrap logic did not expect.
    • Why needed: To ensure correct execution of bootstrapping logic under NumPy 2.0.
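The scalar-vs-array distinction behind this fix can be demonstrated directly. This is an illustrative sketch of the behavior, under the assumption that the bootstrap code wants a scalar draw:

```python
import numpy as np

rng = np.random.default_rng(0)

# poisson(lam) with no size yields a scalar draw; passing a size
# argument yields a 1-element array, which then trips NumPy 2.0's
# strict scalar-conversion rules in downstream code.
scalar_draw = rng.poisson(1)     # 0-dimensional scalar result
array_draw = rng.poisson(1, 1)   # shape-(1,) ndarray

print(np.ndim(scalar_draw), array_draw.shape)  # → 0 (1,)
```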

3. Functional Fixes & Regressions

  • Change: Fixed false_omission_rate in binary_confusion_matrices.py to return NaN when undefined, ensuring it's a float array.
    • What made us fix it: Proto mismatches in confusion_matrix_plot_test.py and score_distribution_plot_test.py: for mathematically undefined states the tests expected NaN, but the metric returned other float values that did not match the expected proto structure.
    • Why needed: To match the expected proto schema and pass tests accurately when values are mathematically undefined.
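Since false omission rate is FN / (FN + TN), it is undefined whenever FN + TN is zero. A sketch of the fixed behavior (illustrative, not the `binary_confusion_matrices.py` source):

```python
import numpy as np


def false_omission_rate(fn, tn):
  """FOR = FN / (FN + TN) as a float array, NaN where the denominator
  is zero (i.e. no negative predictions, so the rate is undefined)."""
  fn = np.asarray(fn, dtype=float)
  tn = np.asarray(tn, dtype=float)
  denom = fn + tn
  out = np.full_like(denom, np.nan)
  return np.divide(fn, denom, out=out, where=denom != 0)


print(false_omission_rate([1.0, 0.0], [3.0, 0.0]))  # → [0.25 nan]
```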
  • Change: Fixed NotFoundError in model_eval_lib_test.py by ensuring directories exist before writing files using tf.io.gfile.makedirs.
    • What made us fix it: Tests failed because the target directory /tmp/absl_testing did not exist when trying to write test results.
    • Why needed: To ensure tests run reliably regardless of the initial state of the temporary directory.
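The same idempotent create-before-write pattern, sketched with the stdlib (the PR itself uses tf.io.gfile.makedirs, which is likewise a no-op when the directory already exists; the path below is a demo placeholder):

```python
import os
import tempfile

# Ensure the output directory exists before writing, so the write never
# fails with a not-found error on a fresh environment.
out_dir = os.path.join(tempfile.gettempdir(), 'absl_testing_demo')
os.makedirs(out_dir, exist_ok=True)  # safe to call repeatedly

with open(os.path.join(out_dir, 'results.txt'), 'w') as f:
  f.write('ok')
```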
  • Change: Corrected SubKey(k=k) selection logic in metric_util.py by explicitly passing sort=True.
    • What made us fix it: It was selecting the first prediction instead of the k-th largest prediction because the array was not sorted.
    • Why needed: To ensure correct metric calculation for top-k scenarios.
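The regression reduces to "k-th element of an unsorted array is not the k-th largest." A hypothetical sketch of the corrected selection (not the metric_util.py source):

```python
import numpy as np


def kth_largest_prediction(predictions, k):
  """Return the k-th largest prediction (1-indexed).

  Without sorting first, indexing position k would pick an arbitrary
  element of the input order, which is the bug described above.
  """
  ordered = np.sort(np.asarray(predictions))[::-1]  # descending
  return ordered[k - 1]


print(kth_largest_prediction([0.1, 0.7, 0.4], 2))  # → 0.4
```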

4. Dependency & Infrastructure

  • Change: Dropped Python 3.9 support and updated requirements to ">=3.10,<3.14".
    • What made us fix it: Python 3.9 is nearing end-of-life, and dropping it simplifies dependency management.
    • Why needed: To align with supported Python versions and leverage modern features.

Verification Results

All unit tests pass in the target Python environments (3.10, 3.11, 3.12, and 3.13 with TF 2.21.0). Specific edge-case failures (e.g., weight mismatches in rouge_test.py and stats_test.py) were investigated and confirmed to be expected negative-test behaviors asserting specific error messages.

- Refactored core test matchers to class-based architecture for pickling stability on Python 3.13.
- Updated dependencies: TF 2.21.0, Protobuf 6.31.1+, Bazel 7.4.1, PyArrow >14.
- Dropped support for Python 3.9 (Minimum supported 3.10).
- Updated GitHub Actions to support Python 3.10-3.13.
- Consolidated apache-beam constraints and restored TFX-BSL fork for CI validation.
- Fixed various environment-specific regressions (numpy scalar conversion, extractors mutation).
- Replaced tag-based archives with immutable commit-based archives for TensorFlow v2.21.0 and Protobuf v31.1.
- Added SHA256 checksum verification to ensure build integrity.
- Fixed E402 (Module level import not at top of file) in evaluator tests.
- Resolved trailing whitespace in multiple modules.
- Standardized quotes and formatting in setup.py.
- Corrected import order in SQL extractor modules.
- Fixed class definitions and removed redundant object inheritance.
- Restored 'types' import in metrics_plots_and_validations_evaluator_test.py (fixed F821).
- Applied final formatting fixes and removed trailing whitespace across test suites.
- Synchronized extraction and evaluation modules with ruff-standard formatting.
- Consolidated API imports in evaluator tests.
- Removed unused metric imports.
- Standardized whitespace after class definitions in all test suites.
- Fixed indentation and formatting in Attributions and Metrics check functions.
- Added necessary blank lines for PipelineOptions imports.
- Auto-fixed trailing whitespace in rouge_test.py.
- Refined indentation in evaluation metric checks.
- Standardized class definition spacing in extraction modules.
- Applied missing blank lines in setup.py and utility tests.
- Applied consistent spacing in extractors and metrics.
- Standardized indentation across test suites.
- Re-synchronized all formatting with strict CI standards.
- Broadened pandas constraint to >=1.0,<3 in setup.py, unblocking CI environment initialization by allowing pandas 2.x, which is required by the tfx-bsl testing fork.
- Refactored scalar extraction to use .item() instead of float(ndarray).
- Implemented safe division in AUC/PR AUC metrics.
- Fixed SubKey(k=k) indexing logic regression.
- Restored necessary protobuf generated files for environment stability.
- Verified fixes with full test suite pass.
- Fix scalar conversion issues in aggregation and flip metrics.
- Fix batching bug in _BooleanFlipCountsCombiner.
- Harden confusion matrix and calibration metrics against zero division.
- Fix rouge metric ValueError.
- Fix missing numpy imports across metrics.
- Add regression test for array-like inputs in flip_metrics_test.py.
@vkarampudi vkarampudi changed the title Testing Upgrade TFMA to TF 2.21.0 and Added Python 3.12/3.13 support Apr 30, 2026
@vkarampudi vkarampudi requested a review from genehwung April 30, 2026 18:26
