[MINOR] Add multi-block HBase-reader compat test for native HFile writer #19080
Draft
shangxinli wants to merge 2 commits into
added 2 commits
June 26, 2026 15:53
The existing TestHFileCompatibility#testHbaseReaderSucceedsWhenKeyValueVersionIsSetTo1 covers the single-data-block happy path with 5 records, which does not stress the load-on-open data structures (data-block index, meta-block index) that HFileWriterImpl's trailer points at.

Add a parameterized cross-version compat test that writes 5,000 records across 5 key/value shapes representative of the Metadata Table (MDT) partitions (FILES, COLUMN_STATS, RECORD_INDEX, SECONDARY_INDEX, BLOOM_FILTERS), reopens each file with HBase 2.4.x HFile.createReader, and asserts:
- trailer parses (no CorruptHFileException),
- getEntries() matches the count written,
- a full forward scan returns every key in order.

A future trailer-layout drift in HFileWriterImpl (field reorder, missed field, width change) would fail at least one assertion across these shapes, before the change reaches MDT files in production.

No production code changes; test-only.

Signed-off-by: Xinli Shang <shangxinli@apache.org>
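The count and ordering assertions listed in this commit message can be sketched without HBase on the classpath. The class below is a hypothetical illustration, not code from the PR: an in-memory list stands in for the keys returned by the HBase scanner, and the names `ScanOrderSketch`, `verifyScan`, `firstOutOfOrder`, and `sampleKeys` are invented here. It assumes keys are ordered as unsigned lexicographic bytes and requires Java 9+ for `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the PR: an in-memory stand-in for the scan checks.
final class ScanOrderSketch {

    // Returns the index of the first key that is not strictly greater than
    // its predecessor, or -1 when every key ascends.
    static int firstOutOfOrder(List<byte[]> keys) {
        for (int i = 1; i < keys.size(); i++) {
            // Unsigned lexicographic byte comparison (Java 9+).
            if (Arrays.compareUnsigned(keys.get(i - 1), keys.get(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    // Throws AssertionError when the scanned keys differ from the written ones
    // in count, content, or strict ascending order.
    static void verifyScan(List<byte[]> written, List<byte[]> scanned) {
        if (scanned.size() != written.size()) {
            throw new AssertionError(
                "entry count mismatch: wrote " + written.size() + ", scanned " + scanned.size());
        }
        for (int i = 0; i < written.size(); i++) {
            if (!Arrays.equals(written.get(i), scanned.get(i))) {
                throw new AssertionError("key mismatch at index " + i);
            }
        }
        int bad = firstOutOfOrder(scanned);
        if (bad >= 0) {
            throw new AssertionError("keys not strictly ascending at index " + bad);
        }
    }

    // Builds n fixed-width ASCII keys; the "key-" prefix is an arbitrary placeholder.
    static List<byte[]> sampleKeys(int n) {
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(String.format("key-%010d", i).getBytes(StandardCharsets.UTF_8));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<byte[]> written = sampleKeys(5000);
        verifyScan(written, new ArrayList<>(written));
        System.out.println("scan checks passed for " + written.size() + " keys");
    }
}
```

Because the order check is strict, a generator that emits duplicate keys is also flagged by `firstOutOfOrder`.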
Three issues found by second-pass review on the previous commit:

1. BLOOM_FILTERS shape had keyLen=24 with a prefix of "BLOOM_FILTERS::" (15 bytes), leaving only 9 bytes for the 10-digit idx after the prefix. The earlier truncation path silently dropped the last digit, producing 10 identical keys per group of consecutive indices. Bump to keyLen=32 and harden the generator: it now throws IllegalArgumentException when keyLen is too small for prefix + idx instead of truncating.

2. assertNotNull(scanner.seekTo()) was a no-op guard; seekTo() returns a primitive (boolean in HBase 2.4.13), autoboxed to a non-null wrapper in all cases. Replace with assertTrue(scanner.seekTo(), ...).

3. The redundant assertDoesNotThrow(() -> HFile.createReader(...)) block opened a Reader that was never closed and immediately reopened one in the try-with-resources below. Remove it; the try-with-resources already proves trailer parse success.

Signed-off-by: Xinli Shang <shangxinli@apache.org>
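The key-length arithmetic in issue 1 can be reproduced in isolation. The sketch below is a hypothetical reconstruction, not the generator from the PR: `truncatingKey` models the earlier behavior as described above, and `strictKey` models the hardened one. How the real generator fills the bytes left over once keyLen exceeds prefix + 10 digits is not stated in the commit message, so left-padding the index with zeros is an assumption made here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical reconstruction of the key generator described in issue 1.
final class KeyGenSketch {
    static final int IDX_DIGITS = 10;

    // Models the earlier behavior: build prefix + 10-digit index, then
    // silently cut the result down to keyLen characters.
    static String truncatingKey(String prefix, int idx, int keyLen) {
        String full = prefix + String.format("%010d", idx);
        return full.length() > keyLen ? full.substring(0, keyLen) : full;
    }

    // Models the hardened behavior: refuse a keyLen that cannot hold the
    // prefix plus all index digits. Assumption: spare width is filled by
    // zero-padding the index on the left. idx is assumed non-negative.
    static String strictKey(String prefix, int idx, int keyLen) {
        int width = keyLen - prefix.length();
        if (width < IDX_DIGITS) {
            throw new IllegalArgumentException(
                "keyLen " + keyLen + " too small for prefix of " + prefix.length()
                    + " bytes plus " + IDX_DIGITS + "-digit index");
        }
        return prefix + String.format("%0" + width + "d", idx);
    }

    // Counts distinct keys the truncating generator yields for indices [0, count).
    static int distinctTruncated(String prefix, int count, int keyLen) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < count; i++) {
            seen.add(truncatingKey(prefix, i, keyLen));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String prefix = "BLOOM_FILTERS::"; // 15 ASCII bytes
        System.out.println("distinct keys at keyLen=24: " + distinctTruncated(prefix, 5000, 24));
        System.out.println("strict key at keyLen=32: " + strictKey(prefix, 4999, 32));
    }
}
```

With keyLen=24 only the first 9 of the 10 index digits survive, so indices 0 through 9 collapse into one key and 5,000 indices yield 500 distinct keys. With keyLen=32 every index maps to a distinct 32-character key, and `strictKey` rejects keyLen=24 outright.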
FYI: cherry-picked internally at Uber to gate our 0.14 → 1.2 cutover ahead of the OSS review cycle: uber-code/data-hoodie_oss#273. Any feedback here will be back-ported.
Describe the issue this Pull Request addresses
The existing TestHFileCompatibility#testHbaseReaderSucceedsWhenKeyValueVersionIsSetTo1 exercises HBase-reads-Hudi-written HFile compatibility, but only with 5 records (single data block). It does not stress the multi-block load-on-open data structures (data-block index, meta-block index) that HFileWriterImpl's trailer points at, and those are the structures most exposed to a trailer-layout drift.

A future change to HFileWriterImpl that reorders, drops, or resizes any trailer field would break every reader that still uses HBase 2.4.x HFile.createReader() to open Metadata Table files. Such a regression is not currently covered by CI.

Summary and Changelog
Adds a parameterized cross-version compatibility test that writes 5,000 records across 5 key/value shapes representative of the Metadata Table (MDT) partitions, reopens each file with HBase 2.4.x HFile.createReader, and asserts:
- trailer parses (no CorruptHFileException),
- getEntries() matches the count written,
- a full forward scan returns every key in order.

Shapes covered: FILES, COLUMN_STATS, RECORD_INDEX, SECONDARY_INDEX, BLOOM_FILTERS, each with realistic key and value sizes so multi-block index paths in the trailer are actually exercised.

Test-only. No production code changes. New test file: hudi-io/src/test/java/org/apache/hudi/io/hfile/TestHudiHFileMdtHbaseReadCompatibility.java.

Impact
None on runtime behavior. CI runs ~2 seconds longer in the hudi-io module.

Risk Level
none — test-only, no production code touched.
Documentation Update
none
Contributor's checklist