fix(storage-format): emit HBase-readable block-index keys in the native HFile writer #19071

Draft
yihua wants to merge 10 commits into apache:master from yihua:hfile-hbase-readable-block-index

@yihua yihua commented Jun 25, 2026

Contributor

Change Logs

The native hudi-io HFile writer stored each block-index first key as a bare [2-byte len][row] with no HBase KeyValue suffix. An HBase-based HFile reader (such as the Hudi 0.x reader) parsed it as a full KeyValue and read the family-length byte one past the end, throwing ArrayIndexOutOfBoundsException in KeyValue.getFamilyLength during a point or prefix lookup of a data-block boundary key. Full scans worked; only the block-index binary search crashed.
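To illustrate the failure mode, here is a minimal Java sketch that does not use HBase itself. The class and method names (BareKeyOverrunSketch, bareKey, familyLength) are hypothetical, and the sketch only assumes what is described above: a KeyValue-style parser expects the family-length byte immediately after the row.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: mimics how a KeyValue-style parser locates the family-length byte. */
final class BareKeyOverrunSketch {

  /** Builds the old bare block-index key: [2-byte row length][row]. */
  static byte[] bareKey(String row) {
    byte[] rowBytes = row.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(Short.BYTES + rowBytes.length);
    buf.putShort((short) rowBytes.length);
    buf.put(rowBytes);
    return buf.array();
  }

  /** Reads the family-length byte where a full KeyValue key would store it. */
  static byte familyLength(byte[] key) {
    int rowLength = ByteBuffer.wrap(key).getShort(0) & 0xffff;
    int familyLengthOffset = Short.BYTES + rowLength; // first byte after the row
    return key[familyLengthOffset];                   // one past the end of a bare key
  }

  public static void main(String[] args) {
    byte[] key = bareKey("row-0042");
    try {
      familyLength(key);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("family-length read overran the bare key: " + e.getMessage());
    }
  }
}
```

A full scan never performs this read on an index key, which is consistent with only the block-index binary search failing.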

Fix: HFileRootIndexBlock now writes each block-index first key as a full HBase KeyValue key ([2-byte rowLen][row][cfLen=0][ts=LATEST][type=Put]), matching the data-block entries. Both the data block and the root index block serialize this key through a single shared helper (introduced as HFileUtils.writeKeyValueKey, later moved to the HFileBlock base class as writeKey), so the column-family length, timestamp, and key type are single-sourced and cannot drift between the two blocks.
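The key layout can be sketched in plain Java as follows. This is not the helper from this PR: KeyValueKeySketch and its members are hypothetical names, and the constant values are assumptions taken from HBase (HConstants.LATEST_TIMESTAMP is Long.MAX_VALUE, and the KeyValue.Type.Put code is 4).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative encoder for [2-byte rowLen][row][cfLen=0][ts=LATEST][type=Put]. */
final class KeyValueKeySketch {
  // Assumed values, mirroring HBase's HConstants.LATEST_TIMESTAMP and KeyValue.Type.Put.
  static final long LATEST_TIMESTAMP = Long.MAX_VALUE;
  static final byte TYPE_PUT = 4;
  // Suffix after the row: 1-byte family length + 8-byte timestamp + 1-byte key type.
  static final int SUFFIX_LENGTH = 1 + Long.BYTES + 1;

  static int keyLength(int rowLength) {
    return Short.BYTES + rowLength + SUFFIX_LENGTH;
  }

  static byte[] encode(byte[] row) {
    ByteBuffer buf = ByteBuffer.allocate(keyLength(row.length));
    buf.putShort((short) row.length); // 2-byte row length
    buf.put(row);                     // row bytes
    buf.put((byte) 0);                // column-family length: no family
    buf.putLong(LATEST_TIMESTAMP);    // 8-byte timestamp
    buf.put(TYPE_PUT);                // 1-byte key type
    return buf.array();
  }

  public static void main(String[] args) {
    byte[] row = "abc".getBytes(StandardCharsets.UTF_8);
    System.out.println("bare key bytes: " + (Short.BYTES + row.length));
    System.out.println("full key bytes: " + encode(row).length);
  }
}
```

For a 3-byte row the bare key is 5 bytes and the full key is 15 bytes, which is where the 10-byte-per-entry growth comes from.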

Backward compatibility: the native hudi-io reader is unchanged and reads both old bare-key and new full-key files (it compares only the row via the leading 2-byte length prefix), so no reader change and no format-version flag are needed. Only files written after this change are affected; existing bare-key metadata tables are not retroactively fixed.
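A rough sketch of why a row-only comparison is indifferent to the suffix is shown below. It is not the native reader's code: RowOnlyCompareSketch and its helpers are hypothetical, and the suffix constants are the same HBase-derived assumptions (latest timestamp as Long.MAX_VALUE, Put type code 4).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: compares index keys by row, ignoring anything after the row. */
final class RowOnlyCompareSketch {

  /** Reads the leading 2-byte big-endian row length. */
  static int rowLength(byte[] key) {
    return ((key[0] & 0xff) << 8) | (key[1] & 0xff);
  }

  /** Unsigned lexicographic comparison of the rows of two keys. */
  static int compareRows(byte[] a, byte[] b) {
    int aLen = rowLength(a);
    int bLen = rowLength(b);
    int common = Math.min(aLen, bLen);
    for (int i = 0; i < common; i++) {
      int diff = (a[2 + i] & 0xff) - (b[2 + i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return aLen - bLen;
  }

  /** Old format: [2-byte row length][row]. */
  static byte[] bareKey(byte[] row) {
    return ByteBuffer.allocate(2 + row.length).putShort((short) row.length).put(row).array();
  }

  /** New format: bare key plus [cfLen=0][ts][type] (assumed HBase constant values). */
  static byte[] fullKey(byte[] row) {
    return ByteBuffer.allocate(2 + row.length + 10)
        .putShort((short) row.length)
        .put(row)
        .put((byte) 0)
        .putLong(Long.MAX_VALUE)
        .put((byte) 4)
        .array();
  }

  public static void main(String[] args) {
    byte[] row = "abc".getBytes(StandardCharsets.UTF_8);
    System.out.println("bare vs full, same row: " + compareRows(bareKey(row), fullKey(row)));
  }
}
```

Because the comparison stops at the row boundary given by the length prefix, an old bare key and a new full key for the same row compare as equal.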

Tests (hudi-io):

  • TestHFileReadCompatibility: an HBase reader performs a point lookup of every key (including block-boundary keys) in a native-written multi-block file; cells written by the native and HBase writers are byte-identical when read through the HBase reader; and a native file reads identically through the native and HBase readers. The point-lookup and byte-comparison cases run under both NONE and GZIP compression.
  • TestHFileWriter: a strict golden byte comparison of the data block and root block-index block locks the on-disk format; the single index entry grows by the 10-byte KeyValue suffix (4537 to 4547).
  • TestHFileDataBlock and TestHFileRootIndexBlock: byte-level unit tests asserting the exact serialized bytes of each data-block record and each root-index entry field by field (key and value lengths, row, column-family length, timestamp, key type, MVCC).
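As an illustration of what a field-by-field check of the key portion looks like, here is a small standalone decoder. It is a sketch, not the test code in this PR: KeyValueKeyFieldsSketch is a hypothetical name, it covers only the key (not value length or MVCC), and it assumes the zero-length column family described above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative decoder for [2-byte rowLen][row][cfLen][8-byte ts][1-byte type]. */
final class KeyValueKeyFieldsSketch {
  final byte[] row;
  final int familyLength;
  final long timestamp;
  final byte type;

  private KeyValueKeyFieldsSketch(byte[] row, int familyLength, long timestamp, byte type) {
    this.row = row;
    this.familyLength = familyLength;
    this.timestamp = timestamp;
    this.type = type;
  }

  static KeyValueKeyFieldsSketch decode(byte[] key) {
    ByteBuffer buf = ByteBuffer.wrap(key);
    int rowLength = buf.getShort() & 0xffff;
    byte[] row = new byte[rowLength];
    buf.get(row);
    int familyLength = buf.get() & 0xff;
    if (familyLength != 0) {
      // This sketch only handles the empty column family used by these keys.
      throw new IllegalArgumentException("unexpected column-family length: " + familyLength);
    }
    long timestamp = buf.getLong();
    byte type = buf.get();
    if (buf.hasRemaining()) {
      throw new IllegalArgumentException("unexpected trailing bytes: " + buf.remaining());
    }
    return new KeyValueKeyFieldsSketch(row, familyLength, timestamp, type);
  }

  public static void main(String[] args) {
    byte[] row = "k1".getBytes(StandardCharsets.UTF_8);
    // Timestamp and type values are assumptions taken from HBase (latest timestamp, Put).
    byte[] key = ByteBuffer.allocate(2 + row.length + 10)
        .putShort((short) row.length).put(row)
        .put((byte) 0).putLong(Long.MAX_VALUE).put((byte) 4)
        .array();
    KeyValueKeyFieldsSketch fields = decode(key);
    System.out.println("row=" + new String(fields.row, StandardCharsets.UTF_8)
        + " cfLen=" + fields.familyLength
        + " ts=" + fields.timestamp
        + " type=" + fields.type);
  }
}
```

Decoding a bare key with this sketch fails with a BufferUnderflowException, since the suffix fields are missing.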

The HBase HFile writer is used only in test scope (hudi-io already declares HBase as a test dependency); production hudi-io has no HBase dependency.

Impact

New metadata-table HFiles written by the native writer become point-lookup-readable by an HBase-based HFile reader. The native read path is unchanged, and the only change to the native write path and the on-disk layout is the 10-byte KeyValue suffix on each block-index entry.

Risk level (write none, low, medium or high below)

low. The change is additive (a fixed suffix on block-index keys), covered by byte-exact and cross-reader tests under both compression modes, and the native reader reads both formats.

Documentation Update

None.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

yihua added 2 commits June 25, 2026 10:34
…ve HFile writer

The native hudi-io HFile writer stored each block-index first key as a bare
[2-byte len][row] with no HBase KeyValue suffix. An HBase-based HFile reader
parsed it as a full KeyValue and read the family-length byte one past the end,
throwing ArrayIndexOutOfBoundsException in KeyValue.getFamilyLength during a
point or prefix lookup of a data-block boundary key. Full scans worked; only the
block-index binary search crashed.

Write each block-index first key as a full HBase KeyValue key
([2-byte rowLen][row][cfLen=0][ts=LATEST][type=Put]), matching the data-block
entries. The native reader is unaffected: it compares only the row via the 2-byte
length prefix, so it reads both old bare-key and new full-key files. Only files
written after this change are affected.

Tests (hudi-io):
- TestHFileReadCompatibility: an HBase reader point-looks-up every key, including
  block-boundary keys, in a native-written multi-block file; native-vs-HBase
  writer cells are byte-identical under the HBase reader; and a native file reads
  identically through the native and HBase readers. The point-lookup and
  byte-comparison cases run under both NONE and GZIP compression.
- TestHFileWriter: a golden byte comparison of the data block and root block-index
  block locks the on-disk format; the single index entry grows by the 10-byte
  KeyValue suffix.

Both the data block and the root index block now serialize the HBase KeyValue
key via HFileUtils.writeKeyValueKey / keyValueKeyLength, so the column-family
length (0), timestamp (latest), and key type (Put) are single-sourced and cannot
drift between the two blocks (a point lookup compares index keys against data
keys, so a mismatch would break it). Behavior-preserving: the on-disk bytes are
unchanged and the byte-exact format-lock golden still passes.
@github-actions github-actions Bot added the size:L PR with lines of changes in (300, 1000] label Jun 25, 2026
yihua added 8 commits June 25, 2026 15:32
Validate the exact serialized bytes of each data-block record and each
root-index entry field by field (key and value lengths, row, column-family
length, timestamp, key type, and MVCC), pinning the HBase KeyValue framing
at the block level.

…lock base class

Move keyValueKeyLength and writeKey (renamed from writeKeyValueKey) out of
HFileUtils and into the shared HFileBlock base class, next to the existing
getVariableLengthEncodedBytes helper, so the data block and the root index
block share them without a util class. The timestamp, key type, and suffix
length constants are now private to HFileBlock. Restore the original data
block comment and drop the redundant inline comment in the root index block;
the rationale now lives in the keyValueKeyLength javadoc.

- Refer to the field as the KeyValue key (file-format-specific, not
  HBase-specific) in HFileBlock javadocs and comments.
- writeKey now takes the row length explicitly rather than the backing-array
  length, since a Key's array can be larger than its content.
- Document the KeyValue key metadata suffix (column-family length, timestamp,
  key type) in hfile_format.md for the data block and the data index entry.

writeKey now takes the row offset as well as the length and writes
row[offset, offset + length), so a key that is a view into a larger buffer
(non-zero offset) serializes correctly. The root index caller passes the
key's offset; the data block passes 0. No behavior change for current callers
(write-path keys start at offset 0), but it removes the offset-0 assumption.

Describe what the block-index point-lookup and midKey tests validate (the
HFile key encoding) rather than referencing before/after behavior.

…fied in the read-compat test

Switch the assertion calls to individually static-imported methods (matching the
other hudi-io tests) and, since this file is in the org.apache.hudi.io.hfile
package, reference the native HFileContext unqualified while fully-qualifying the
HBase HFileContext.

…format-lock test

Write the fixed records with the HBase HFile writer (NONE compression, NULL
checksum, latest timestamp, Put type) and assert its data block and root
block-index bytes equal the same golden, confirming the native and HBase writers
produce byte-identical blocks.
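
The offset-aware writeKey change described in the commits above can be sketched roughly as follows. This is not the HFileBlock method itself: OffsetAwareWriteKeySketch and its signature are hypothetical, and the timestamp and type values are assumptions taken from HBase (latest timestamp as Long.MAX_VALUE, Put type code 4).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Illustrative only: serializes row[offset, offset + length) followed by the key suffix. */
final class OffsetAwareWriteKeySketch {

  static void writeKey(ByteBuffer out, byte[] row, int offset, int length) {
    out.putShort((short) length);  // row length, not the backing-array length
    out.put(row, offset, length);  // only the row's slice of the backing array
    out.put((byte) 0);             // column-family length
    out.putLong(Long.MAX_VALUE);   // assumed latest-timestamp value
    out.put((byte) 4);             // assumed Put type code
  }

  public static void main(String[] args) {
    // The row "abc" stored as a view into a larger buffer at offset 2.
    byte[] backing = {'x', 'x', 'a', 'b', 'c', 'y', 'y'};
    ByteBuffer fromView = ByteBuffer.allocate(15);
    writeKey(fromView, backing, 2, 3);

    ByteBuffer fromWhole = ByteBuffer.allocate(15);
    writeKey(fromWhole, new byte[] {'a', 'b', 'c'}, 0, 3);

    System.out.println("same bytes: " + Arrays.equals(fromView.array(), fromWhole.array()));
  }
}
```

Passing the offset and length explicitly means a key backed by a larger array serializes to the same bytes as a standalone copy of the row.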
@hudi-bot

Collaborator

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build
