Skip to content

Aggregate FindJavaVersion to one row per repository#1105

Merged
sambsnyd merged 3 commits into
mainfrom
find-java-version-per-project
May 20, 2026
Merged

Aggregate FindJavaVersion to one row per repository#1105
sambsnyd merged 3 commits into
mainfrom
find-java-version-per-project

Conversation

@timtebeek
Copy link
Copy Markdown
Member

@timtebeek timtebeek commented May 20, 2026

Summary

  • Convert FindJavaVersion to a ScanningRecipe keyed on GitProvenance.origin, so each git repository contributes exactly one row regardless of how many modules or compilation units it contains
  • When modules in a repository disagree on JDK, retain the row with the lowest target compatibility (tiebreak: lowest source) — the migration floor for the repository
  • No schema change: JavaVersionTable.Row keeps its two original columns, sourceVersion and targetVersion

Problem

The recipe originally kept a transient HashSet<JavaVersion> seen on the recipe instance. Two things conspired to drop rows:

  1. JavaVersion excludes its UUID from equals/hashCode.
  2. The Moderne CLI reuses one recipe instance across every repository in a working-set run (verified empirically — see below).

The combination silently dropped every repository's row except the first one processed. The previous attempt on this branch removed the dedup and emitted one row per compilation unit — correct counts, but thousands of identical rows per project and no aggregation at the repository level (the unit that actually matters for portfolio-wide JDK inventory).

Solution

  • ScanningRecipe<Map<String, Row>>. The scanner reads JavaVersion and GitProvenance markers from each JavaSourceFile and merges into the accumulator.
  • Key: origin:<GitProvenance.origin>project:<JavaProject.getId()>version:<JavaVersion.getId()> (in that fallback order). Prefixed string keys keep the three namespaces from colliding and make the accumulator readable at a glance.
  • Merge: lower targetVersion wins; tiebreak on lower sourceVersion.
  • generate flushes accumulated rows to the data table. No recipe-instance mutable state.

Validation

Unit tests

  • multiModuleRepositoryCollapsesToOneRow — modules sharing one GitProvenance produce a single row
  • heterogeneousVersionsInOneRepoPickLowestTarget — Java 8 + Java 17 modules in one repo produce a single row showing Java 8
  • identicalJavaVersionMarkersAcrossRepositoriesAreEachReported — three repos sharing one JavaVersion produce three rows (regression for moderneinc/customer-requests#2409)
  • withoutGitProvenanceFallsBackToPerProject — no GitProvenance → one row per JavaProject
  • ./gradlew test --tests '*FindJavaVersionTest*' passes

CLI reproduction on real repositories

Ran mod run working-set --recipe org.openrewrite.java.migrate.search.FindJavaVersion against the 8 repositories in the Spinnaker organization (clouddriver, echo, fiat, front50, halyard, kayenta, kork, spinnaker-gradle-project — 4448 source files in clouddriver alone, varied submodule structures).

To confirm the bug is real on the CLI (not just a theoretical concern) I swapped in the pre-fix code from commit 4beeb8a1 and reran identically:

Code version Rows in aggregated CSV Per-repo distribution
Pre-fix (transient seen HashSet) 1 Only clouddriver (processed first) wrote a JavaVersionTable.csv.gz. All 7 other repos: no data table file emitted at all.
This PR (ScanningRecipe) 8 One row per repo. 7 × Java 17, 1 × Java 11 (spinnaker-gradle-project).

This proves three things on real LSTs:

  1. The CLI does reuse the recipe instance across repositories in a working-set run. The buggy seen HashSet leaked across all 8 repos and silently dropped 7 of them.
  2. Multi-module aggregation works: clouddriver (4448 source files, ~30 submodules) collapses to a single row under the fix. Previously this would have been the only row, masking the much worse cross-repo bug.
  3. Cross-repo non-collapse works (regression for #2409): 7 repositories sharing Java 17 each appear as their own row instead of merging into one.

The "lowest version wins" branch (heterogeneous JDKs within one repo) isn't exercised by Spinnaker — every repo's submodules agree on a single JDK — but it's covered deterministically by the unit test.

Fixes moderneinc/customer-requests#2409

The recipe used a `transient Set<JavaVersion> seen` HashSet on the recipe
instance to deduplicate rows. Because `JavaVersion` excludes its UUID from
`equals`, two compilation units from different projects built with the same
JDK compare equal, and recipe instances are reused across repositories within
a single run, so identical Java versions in separate projects collapsed into
a single data-table row.

Drop the deduplication and add a `projectName` column derived from the
`JavaProject` marker so that each project contributes its own row.

Fixes moderneinc/customer-requests#2409
@github-project-automation github-project-automation Bot moved this to In Progress in OpenRewrite May 20, 2026
@timtebeek timtebeek marked this pull request as draft May 20, 2026 18:00
@timtebeek timtebeek changed the title Record one row per project in FindJavaVersion Emit one row per project in FindJavaVersion via ScanningRecipe May 20, 2026
@timtebeek timtebeek changed the title Emit one row per project in FindJavaVersion via ScanningRecipe Emit one row per git repository in FindJavaVersion May 20, 2026
@timtebeek timtebeek changed the title Emit one row per git repository in FindJavaVersion Aggregate FindJavaVersion to one row per repository May 20, 2026
timtebeek added 2 commits May 20, 2026 20:26
Convert FindJavaVersion to a ScanningRecipe keyed on GitProvenance.origin,
so each git repository contributes exactly one row regardless of how many
modules or compilation units it contains. When modules in a repository
disagree on JDK, retain the row with the lowest target compatibility
(tiebreak: lowest source) — the migration floor for the repository.

Fallback key order: GitProvenance.origin -> JavaProject.getId() ->
JavaVersion.getId(), so disconnected sources never silently merge.

Revert the projectName column added in e5a310b; JavaVersionTable.Row
is back to its original two columns (sourceVersion, targetVersion).

Fixes moderneinc/customer-requests#2409
Address PR review feedback:
- Replace `Map<Object, Row>` with `Map<String, Row>`. The three key
  namespaces (git origin URL, JavaProject UUID, JavaVersion UUID) now get
  explicit `origin:` / `project:` / `version:` prefixes so they can never
  collide with each other.
- Collapse the if/else key-derivation into a single Optional chain.
@timtebeek timtebeek marked this pull request as ready for review May 20, 2026 21:13
@timtebeek timtebeek marked this pull request as draft May 20, 2026 21:18
@timtebeek timtebeek marked this pull request as ready for review May 20, 2026 21:26
@timtebeek timtebeek requested a review from sambsnyd May 20, 2026 21:26
@github-project-automation github-project-automation Bot moved this from In Progress to Ready to Review in OpenRewrite May 20, 2026
@sambsnyd sambsnyd merged commit a08ce01 into main May 20, 2026
1 check passed
@sambsnyd sambsnyd deleted the find-java-version-per-project branch May 20, 2026 21:27
@github-project-automation github-project-automation Bot moved this from Ready to Review to Done in OpenRewrite May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants