Skip to content

fix(hadoop): use codec-aware metadata filenames#1326

Open
fallintoplace wants to merge 2 commits into
apache:mainfrom
fallintoplace:fix/hadoop-compressed-metadata-paths
Open

fix(hadoop): use codec-aware metadata filenames#1326
fallintoplace wants to merge 2 commits into
apache:mainfrom
fallintoplace:fix/hadoop-compressed-metadata-paths

Conversation

@fallintoplace

Copy link
Copy Markdown
Contributor

Summary

  • write Hadoop metadata files with suffixes that match the configured metadata compression codec
  • load the actual discovered metadata file path instead of reconstructing an uncompressed-looking path
  • teach Hadoop metadata discovery to recognize plain, gzip, and zstd metadata files consistently

Why

Hadoop currently always writes metadata to vN.metadata.json even when write.metadata.compression-codec requests gzip or zstd. The table metadata reader infers decompression from the filename suffix, so compressed bytes behind a plain .metadata.json path can fail to reload.

Using codec-aware filenames on write and carrying the discovered metadata path through LoadTable keeps create, commit, and reload consistent for plain, gzip, and zstd metadata.

Testing

  • go test ./catalog/hadoop -run 'TestHadoopCatalogTestSuite/(TestMetadataFilePathForCompression|TestVersionPatternMatches|TestIsTableDirWithZstdMetadata|TestFindVersionGzipOnlyWithHint|TestFindVersionMixedGzipAndPlain|TestFindVersionZstdOnlyWithHint|TestFindVersionMixedZstdAndPlain|TestCreateTableGzipMetadata|TestCreateTableZstdMetadata|TestCommitTableGzipMetadata|TestCommitTableZstdMetadata)$' -count=1
  • go test ./catalog/hadoop -count=1
  • go test ./catalog/... -count=1
  • go test ./... -run '^$' -count=1
  • go run github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.11.4 run --timeout=10m ./catalog/hadoop/...
  • git diff --check

@fallintoplace fallintoplace requested a review from zeroshade as a code owner June 27, 2026 09:55

@tanmayrauth tanmayrauth left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good — write suffix now matches the read-path codec inference in table.go, and the create→load / commit→load round-trip tests cover both gzip and zstd. One tiny nit.

Comment thread catalog/hadoop/hadoop.go Outdated
@@ -188,6 +190,22 @@ func (c *Catalog) metadataFilePath(ident table.Identifier, version int) string {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change metadataFilePath no longer has a production caller, CreateTable/CommitTable/LoadTable all go through metadataFilePathForCompression / findMetadataLocation now. It's still referenced from the tests so it's not dead, but might be worth either keeping it explicitly as a test-only helper or pointing those tests at the new helper to avoid drift.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a valid comment, I will amend the PR. Have a nice weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants