Skip to content

fix(catalog/hadoop): publish metadata without replacement#1300

Open
fallintoplace wants to merge 1 commit into
apache:mainfrom
fallintoplace:fix/hadoop-no-replace-metadata-commit
Open

fix(catalog/hadoop): publish metadata without replacement#1300
fallintoplace wants to merge 1 commit into
apache:mainfrom
fallintoplace:fix/hadoop-no-replace-metadata-commit

Conversation

@fallintoplace

Copy link
Copy Markdown
Contributor

Summary

  • add a no-replace rename path for Hadoop metadata publishes
  • use it when creating v1.metadata.json and when committing the next table metadata version
  • return ErrTableAlreadyExists for create races and table.ErrCommitFailed for commit races
  • add deterministic concurrent create/commit coverage where two writers publish the same metadata version

Why

Hadoop commits wrote metadata to a temp file, checked whether the final path existed, then used Rename to publish it. On local filesystems os.Rename replaces the destination, so another writer could win between the existence check and rename and then be overwritten by the later commit.

The metadata file publish now fails if the final path already exists. version-hint.text stays best-effort after the metadata file is committed.

Testing

  • go test ./catalog/hadoop -run 'TestHadoopCatalogTestSuite/(TestCreateTableConcurrentMetadataPublishConflict|TestCommitTableConcurrentMetadataPublishConflict|TestCommitTableConflictDetection)' -count=1
  • go test ./io -run '^TestLocalFSRenameNoReplaceDoesNotOverwrite$' -count=1
  • go test ./catalog/hadoop -count=1
  • go test ./catalog/... -count=1
  • go test ./io -count=1
  • go run github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.11.4 run --timeout=10m ./catalog/hadoop/...
  • go run github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.11.4 run --timeout=10m ./io
  • git diff --check

@fallintoplace fallintoplace requested a review from zeroshade as a code owner June 24, 2026 17:06

@zeroshade zeroshade left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Correct fix for a real TOCTOU race, well tested.

Bug: CreateTable/CommitTable did Stat-then-Rename, and os.Rename replaces the destination on local FS, so two writers racing the same version could both "succeed" with the later overwriting the winner.

Fix: RenameNoReplace publishes via os.Link (atomic create-or-fail) and maps fs.ErrExist to ErrTableAlreadyExists (create) / table.ErrCommitFailed (commit). The atomic gate is the os.Link step itself, so conflict detection does not depend on the link+remove pair being atomic.

Verified locally: errors.Is(os.Link-EEXIST, fs.ErrExist) is true (Go maps syscall.EEXIST to ErrExist, and *os.LinkError implements Unwrap), so the conflict mapping is sound. Concurrency tests pass and deterministically assert exactly 1 success + 1 conflict; ./io and ./catalog/hadoop tests pass; golangci-lint clean. Adding RenameNoReplaceIO to HadoopCatalogFS is safe since the catalog only supports LocalFS (compile-time assertion).

Two non-blocking nits inline.

Comment thread io/local.go
oldpath = strings.TrimPrefix(oldpath, "file://")
newpath = strings.TrimPrefix(newpath, "file://")

if err := os.Link(oldpath, newpath); err != nil {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.Link fails with EXDEV across filesystems. Here temp and final share the metadata dir (same FS), so it is fine in practice, but a one-line doc note on the method would help since this is a new exported io primitive. On Linux, unix.Renameat2(..., RENAME_NOREPLACE) is the truly-atomic kernel primitive; the link-based approach is the conventional portable fallback. Reasonable as-is.

Comment thread io/local.go
return err
}

_ = os.Remove(oldpath)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped os.Remove error: the publish already succeeded via os.Link, so this only risks leaking a stray temp file if removal fails (correctness is unaffected). A short comment noting it is intentional best-effort cleanup would prevent a future "unchecked error" flag.

@fallintoplace fallintoplace force-pushed the fix/hadoop-no-replace-metadata-commit branch from 370f7b5 to 2538056 Compare June 24, 2026 20:47
@fallintoplace fallintoplace reopened this Jun 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants