Skip to content

[MINOR] Add multi-block HBase-reader compat test for native HFile writer#19080

Draft
shangxinli wants to merge 2 commits into
apache:masterfrom
shangxinli:shangx/hudi-cross-version-hfile-compat
Draft

[MINOR] Add multi-block HBase-reader compat test for native HFile writer#19080
shangxinli wants to merge 2 commits into
apache:masterfrom
shangxinli:shangx/hudi-cross-version-hfile-compat

Conversation

@shangxinli

@shangxinli shangxinli commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Describe the issue this Pull Request addresses

The existing TestHFileCompatibility#testHbaseReaderSucceedsWhenKeyValueVersionIsSetTo1 exercises HBase-reads-Hudi-written HFile compatibility, but only with 5 records (single data block). It does not stress the multi-block load-on-open data structures (data-block index, meta-block index) that HFileWriterImpl's trailer points at — and those are the structures most exposed to a trailer-layout drift.

A future change to HFileWriterImpl that reorders, drops, or resizes any trailer field would break every reader that still uses HBase 2.4.x HFile.createReader() to open Metadata Table files. Such a regression is currently uncovered by CI.

Summary and Changelog

Adds a parameterized cross-version compatibility test that writes 5,000 records across 5 key/value shapes representative of the Metadata Table (MDT) partitions, reopens each file with HBase 2.4.x HFile.createReader, and asserts:

  • the trailer parses (no CorruptHFileException),
  • getEntries() matches the count written,
  • a full forward scan returns every key in the order written.

Shapes covered: FILES, COLUMN_STATS, RECORD_INDEX, SECONDARY_INDEX, BLOOM_FILTERS — each with realistic key and value sizes so multi-block index paths in the trailer are actually exercised.

Test-only. No production code changes. New test file: hudi-io/src/test/java/org/apache/hudi/io/hfile/TestHudiHFileMdtHbaseReadCompatibility.java.

Impact

None on runtime behavior. CI runs ~2 seconds longer in the hudi-io module.

Risk Level

none — test-only, no production code touched.

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable (this PR is the test)
  • CI passed

Xinli Shang added 2 commits June 26, 2026 15:53
The existing TestHFileCompatibility#testHbaseReaderSucceedsWhenKeyValueVersionIsSetTo1
covers the single-data-block happy path with 5 records, which does not
stress the load-on-open data structures (data-block index, meta-block
index) that HFileWriterImpl's trailer points at.

Add a parameterized cross-version compat test that writes 5,000 records
across 5 key/value shapes representative of the Metadata Table (MDT)
partitions (FILES, COLUMN_STATS, RECORD_INDEX, SECONDARY_INDEX,
BLOOM_FILTERS), reopens each file with HBase 2.4.x HFile.createReader,
and asserts:

- trailer parses (no CorruptHFileException),
- getEntries() matches the count written,
- a full forward scan returns every key in order.

A future trailer-layout drift in HFileWriterImpl (field reorder, missed
field, width change) would fail at least one assertion across these
shapes, before the change reaches MDT files in production.

No production code changes; test-only.

Signed-off-by: Xinli Shang <shangxinli@apache.org>
Three issues found by second-pass review on the previous commit:

1. BLOOM_FILTERS shape had keyLen=24 with a prefix of "BLOOM_FILTERS::"
   (15 bytes), leaving only 9 bytes for the 10-digit idx after the prefix.
   The earlier truncation path silently dropped the last digit, producing
   10 identical keys per group of consecutive indices. Bump to keyLen=32
   and harden the generator: it now throws IllegalArgumentException when
   keyLen is too small for prefix + idx instead of truncating.

2. assertNotNull(scanner.seekTo()) was a no-op guard; seekTo() returns a
   primitive (boolean in HBase 2.4.13), autoboxed to a non-null wrapper
   in all cases. Replace with assertTrue(scanner.seekTo(), ...).

3. The redundant assertDoesNotThrow(() -> HFile.createReader(...))
   block opened a Reader that was never closed and immediately reopened
   one in the try-with-resources below. Remove it; the try-with-resources
   already proves trailer parse success.

Signed-off-by: Xinli Shang <shangxinli@apache.org>
@github-actions github-actions Bot added the size:M PR with lines of changes in (100, 300] label Jun 26, 2026
@shangxinli

Copy link
Copy Markdown
Contributor Author

FYI — cherry-picked internally at Uber to gate our 0.14 → 1.2 cutover ahead of the OSS review cycle: uber-code/data-hoodie_oss#273. Any feedback here will be back-ported.

@hudi-bot

Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M PR with lines of changes in (100, 300]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants