
fix(hadoop): publish metadata without replacing existing files #1325

Closed
fallintoplace wants to merge 1 commit into apache:main from fallintoplace:fix/hadoop-no-clobber-metadata-publish


@fallintoplace


Summary

  • publish staged metadata by hard-linking into place instead of renaming over the target
  • use the same no-replace publish path for both CreateTable and CommitTable
  • add regression coverage that a rival metadata file wins without being overwritten

Why

CreateTable and CommitTable currently stage metadata to a temp file and then rename it into place. On POSIX, rename can replace an existing destination, so a concurrent writer that publishes the same metadata version first can still be overwritten by the loser.

Using link(temp, final) keeps the publish step atomic while making an existing destination fail with ErrExist instead of being replaced.

Testing

  • go test ./catalog/hadoop -run 'TestHadoopCatalogTestSuite/(TestCreateTableDoesNotOverwriteConcurrentMetadataFile|TestCommitTableDoesNotOverwriteConcurrentMetadataFile|TestCommitTableConflictDetection|TestCommitTableNoOrphanedTempFiles|TestCreateTableAlreadyExists)$' -count=1
  • go test ./catalog/hadoop -count=1
  • go test ./catalog/... -count=1
  • go test ./... -run '^$' -count=1
  • go test ./io -count=1
  • go run github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.11.4 run --timeout=10m ./catalog/hadoop/...
  • git diff --check
