Aggregate FindJavaVersion to one row per repository #1105
Merged
The recipe used a `transient Set<JavaVersion> seen` HashSet on the recipe instance to deduplicate rows. Because `JavaVersion` excludes its UUID from `equals`, the markers of two compilation units from different projects built with the same JDK compare equal; and because recipe instances are reused across repositories within a single run, identical Java versions in separate projects collapsed into a single data-table row. Drop the deduplication and add a `projectName` column derived from the `JavaProject` marker so that each project contributes its own row. Fixes moderneinc/customer-requests#2409
Convert `FindJavaVersion` to a `ScanningRecipe` keyed on `GitProvenance.origin`, so each git repository contributes exactly one row regardless of how many modules or compilation units it contains. When modules in a repository disagree on JDK, retain the row with the lowest target compatibility (tiebreak: lowest source), which is the migration floor for the repository. Fallback key order: `GitProvenance.origin` -> `JavaProject.getId()` -> `JavaVersion.getId()`, so disconnected sources never silently merge. Revert the `projectName` column added in e5a310b; `JavaVersionTable.Row` is back to its original two columns (`sourceVersion`, `targetVersion`). Fixes moderneinc/customer-requests#2409
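The "migration floor" rule in this commit (lowest target compatibility, tiebreak on lowest source) maps naturally onto a `Comparator`. The sketch below is illustrative only: `MigrationFloor`, `pickFloor`, and `floorOf` are hypothetical names, and `Row` here is a stand-in that stores major versions as `int`s, which is an assumption (the actual column types of `JavaVersionTable.Row` are not shown in this PR).

```java
import java.util.Comparator;
import java.util.List;

public class MigrationFloor {

    /** Hypothetical row: major Java versions as ints (an assumption, not the real column types). */
    public record Row(int sourceVersion, int targetVersion) {}

    /** Lowest target compatibility first; ties broken by lowest source compatibility. */
    static final Comparator<Row> FLOOR_ORDER =
            Comparator.comparingInt(Row::targetVersion)
                      .thenComparingInt(Row::sourceVersion);

    /** Keeps whichever of two rows is the repository's migration floor. */
    public static Row pickFloor(Row a, Row b) {
        return FLOOR_ORDER.compare(a, b) <= 0 ? a : b;
    }

    /** Reduces all module rows of one repository to a single floor row. */
    public static Row floorOf(List<Row> moduleRows) {
        return moduleRows.stream().reduce(MigrationFloor::pickFloor).orElseThrow();
    }

    public static void main(String[] args) {
        // Three modules of one hypothetical repository with disagreeing JDK settings.
        List<Row> modules = List.of(new Row(17, 17), new Row(11, 11), new Row(8, 11));
        System.out.println(floorOf(modules));
    }
}
```

With modules at 17/17, 11/11, and 8/11, the floor is source 8 / target 11: the two lowest rows tie on target 11, and the lower source breaks the tie.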
Address PR review feedback:
- Replace `Map<Object, Row>` with `Map<String, Row>`. The three key namespaces (git origin URL, JavaProject UUID, JavaVersion UUID) now get explicit `origin:` / `project:` / `version:` prefixes so they can never collide with each other.
- Collapse the if/else key-derivation into a single Optional chain.
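A minimal sketch of the prefixed-key derivation as a single `Optional` chain, assuming the three marker values have already been pulled out of the source file (`null` standing in for an absent marker). `RepositoryKey` and `keyFor` are hypothetical names; the real recipe reads these values from OpenRewrite markers rather than taking them as parameters, and the origin URL below is only an example value.

```java
import java.util.Optional;
import java.util.UUID;

public class RepositoryKey {

    /**
     * Derives the accumulator key with the fallback order origin -> project -> version.
     * Parameters are hypothetical stand-ins for the marker values; null means "marker absent".
     */
    public static String keyFor(String gitOrigin, UUID javaProjectId, UUID javaVersionId) {
        return Optional.ofNullable(gitOrigin)
                .map(origin -> "origin:" + origin)
                .or(() -> Optional.ofNullable(javaProjectId).map(id -> "project:" + id))
                // Assumes the JavaVersion id is always available, so it is the last resort.
                .orElseGet(() -> "version:" + javaVersionId);
    }

    public static void main(String[] args) {
        UUID project = UUID.fromString("00000000-0000-0000-0000-000000000001");
        UUID version = UUID.fromString("00000000-0000-0000-0000-000000000002");
        System.out.println(keyFor("https://github.com/spinnaker/kork.git", project, version));
        System.out.println(keyFor(null, project, version));
        System.out.println(keyFor(null, null, version));
    }
}
```

Because each namespace carries its own prefix, a `project:` key and a `version:` key can never be equal even if the underlying UUIDs happened to coincide.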
sambsnyd approved these changes on May 20, 2026
Summary
- Convert `FindJavaVersion` to a `ScanningRecipe` keyed on `GitProvenance.origin`, so each git repository contributes exactly one row regardless of how many modules or compilation units it contains
- `JavaVersionTable.Row` keeps its two original columns, `sourceVersion` and `targetVersion`
Problem
The recipe originally kept a `transient HashSet<JavaVersion> seen` on the recipe instance. Two things conspired to drop rows:
- `JavaVersion` excludes its UUID from `equals`/`hashCode`.
- Recipe instances are reused across repositories within a single run, so the `seen` set carried over from one repository to the next.
The combination silently dropped every repository's row except the first one processed. The previous attempt on this branch removed the dedup and emitted one row per compilation unit: correct counts, but thousands of identical rows per project and no aggregation at the repository level (the unit that actually matters for portfolio-wide JDK inventory).
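The failure mode can be reproduced without OpenRewrite at all. This is a minimal sketch assuming a hypothetical `VersionMarker` class that stands in for `JavaVersion` (it is not the real type): its `equals`/`hashCode` ignore the UUID, and a single `seen` set lives on the instance that processes every repository, the way a field on a reused recipe instance would. The JDK version strings are illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

public class SharedSeenSetDemo {

    /** Hypothetical stand-in for the JavaVersion marker: equality ignores the UUID. */
    static final class VersionMarker {
        final UUID id;
        final String source;
        final String target;

        VersionMarker(UUID id, String source, String target) {
            this.id = id;
            this.source = source;
            this.target = target;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof VersionMarker)) return false;
            VersionMarker other = (VersionMarker) o;
            // The id is deliberately excluded, mirroring the behavior described above.
            return source.equals(other.source) && target.equals(other.target);
        }

        @Override
        public int hashCode() {
            return Objects.hash(source, target);
        }
    }

    /** Models the pre-fix recipe: one instance-level set reused for every repository. */
    private final Set<VersionMarker> seen = new HashSet<>();

    /** Returns the repositories that managed to emit a row. */
    List<String> run(List<String> repos) {
        List<String> emitted = new ArrayList<>();
        for (String repo : repos) {
            // Each repository gets its own marker (distinct UUID) built with the same JDK.
            VersionMarker marker = new VersionMarker(UUID.randomUUID(), "17", "17");
            if (seen.add(marker)) {
                emitted.add(repo);
            }
        }
        return emitted;
    }

    public static List<String> simulate(List<String> repos) {
        return new SharedSeenSetDemo().run(repos);
    }

    public static void main(String[] args) {
        System.out.println(simulate(List.of("clouddriver", "echo", "fiat")));
    }
}
```

Running `main` with three repositories built on the same JDK emits a row only for the first one, matching the pre-fix behavior described above.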
Solution
- `FindJavaVersion` is now a `ScanningRecipe<Map<String, Row>>`. The scanner reads `JavaVersion` and `GitProvenance` markers from each `JavaSourceFile` and merges into the accumulator.
- Key: `origin:<GitProvenance.origin>` → `project:<JavaProject.getId()>` → `version:<JavaVersion.getId()>` (in that fallback order). Prefixed string keys keep the three namespaces from colliding and make the accumulator readable at a glance.
- When modules in a repository disagree, the lower `targetVersion` wins; tiebreak on lower `sourceVersion`.
- `generate` flushes accumulated rows to the data table. No recipe-instance mutable state.
Validation
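The scan-then-generate flow can also be checked in isolation against the scenarios the unit tests name. The following is a plain-Java model, not the OpenRewrite `ScanningRecipe` API: `ScanThenGenerate`, `Unit`, and `scan` are hypothetical, keys are assumed to be derived already, and versions are modeled as `int`s. The accumulator is a fresh map per run, and `Map.merge` applies the lower-target/lower-source rule whenever two files share a key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class ScanThenGenerate {

    /** Hypothetical row with major versions as ints (column types assumed). */
    public record Row(int sourceVersion, int targetVersion) {}

    /** Hypothetical per-file input: the already-derived key plus that file's versions. */
    public record Unit(String key, int sourceVersion, int targetVersion) {}

    /** Lower target wins; equal targets fall back to the lower source. */
    static final BinaryOperator<Row> FLOOR = (a, b) ->
            a.targetVersion() != b.targetVersion()
                    ? (a.targetVersion() < b.targetVersion() ? a : b)
                    : (a.sourceVersion() <= b.sourceVersion() ? a : b);

    /** Scanning phase: fold one source file into the accumulator. */
    static void scan(Map<String, Row> acc, Unit unit) {
        acc.merge(unit.key(), new Row(unit.sourceVersion(), unit.targetVersion()), FLOOR);
    }

    /** One run: a fresh accumulator, a scan pass, then a flush of one row per key. */
    public static List<Row> run(List<Unit> units) {
        Map<String, Row> acc = new LinkedHashMap<>(); // per-run state, never a recipe field
        for (Unit u : units) {
            scan(acc, u);
        }
        return new ArrayList<>(acc.values());
    }

    public static void main(String[] args) {
        List<Unit> units = List.of(
                new Unit("origin:repo-a", 17, 17),  // module 1 of repo-a
                new Unit("origin:repo-a", 8, 8),    // module 2 of repo-a, older JDK
                new Unit("origin:repo-b", 17, 17),  // same versions as repo-c...
                new Unit("origin:repo-c", 17, 17)); // ...but a different repository
        System.out.println(run(units));
    }
}
```

With two modules of `repo-a` (Java 17 and Java 8) plus `repo-b` and `repo-c` on identical Java 17 settings, the model yields three rows: Java 8 for `repo-a`, and one Java 17 row each for `repo-b` and `repo-c`.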
Unit tests
- `multiModuleRepositoryCollapsesToOneRow`: modules sharing one `GitProvenance` produce a single row
- `heterogeneousVersionsInOneRepoPickLowestTarget`: Java 8 + Java 17 modules in one repo produce a single row showing Java 8
- `identicalJavaVersionMarkersAcrossRepositoriesAreEachReported`: three repos sharing one `JavaVersion` produce three rows (regression for moderneinc/customer-requests#2409)
- `withoutGitProvenanceFallsBackToPerProject`: no `GitProvenance` → one row per `JavaProject`
- `./gradlew test --tests '*FindJavaVersionTest*'` passes
CLI reproduction on real repositories
Ran `mod run working-set --recipe org.openrewrite.java.migrate.search.FindJavaVersion` against the 8 repositories in the Spinnaker organization (clouddriver, echo, fiat, front50, halyard, kayenta, kork, spinnaker-gradle-project; 4448 source files in clouddriver alone, varied submodule structures).
To confirm the bug is real on the CLI (not just a theoretical concern), I swapped in the pre-fix code from commit `4beeb8a1` and reran identically:
- Pre-fix (`transient seen` HashSet): only `clouddriver` (processed first) wrote a `JavaVersionTable.csv.gz`. All 7 other repos: no data table file emitted at all.
- With the fix (`ScanningRecipe`): … (… `spinnaker-gradle-project`).
This proves three things on real LSTs:
- The `seen` HashSet leaked across all 8 repos and silently dropped 7 of them.
- `clouddriver` (4448 source files, ~30 submodules) collapses to a single row under the fix. Previously this would have been the only row, masking the much worse cross-repo bug.
The "lowest version wins" branch (heterogeneous JDKs within one repo) isn't exercised by Spinnaker, since every repo's submodules agree on a single JDK, but it's covered deterministically by the unit test.
Fixes moderneinc/customer-requests#2409