fix(catalog/hadoop): publish metadata without replacement#1300
fix(catalog/hadoop): publish metadata without replacement#1300fallintoplace wants to merge 1 commit into
Conversation
zeroshade
left a comment
There was a problem hiding this comment.
Approve. Correct fix for a real TOCTOU race, well tested.
Bug: CreateTable/CommitTable did Stat-then-Rename, and os.Rename replaces the destination on local FS, so two writers racing the same version could both "succeed" with the later overwriting the winner.
Fix: RenameNoReplace publishes via os.Link (atomic create-or-fail) and maps fs.ErrExist to ErrTableAlreadyExists (create) / table.ErrCommitFailed (commit). The atomic gate is the os.Link step itself, so conflict detection does not depend on the link+remove pair being atomic.
Verified locally: errors.Is(os.Link-EEXIST, fs.ErrExist) is true (Go maps syscall.EEXIST to ErrExist, and *os.LinkError implements Unwrap), so the conflict mapping is sound. Concurrency tests pass and deterministically assert exactly 1 success + 1 conflict; ./io and ./catalog/hadoop tests pass; golangci-lint clean. Adding RenameNoReplaceIO to HadoopCatalogFS is safe since the catalog only supports LocalFS (compile-time assertion).
Two non-blocking nits inline.
| oldpath = strings.TrimPrefix(oldpath, "file://") | ||
| newpath = strings.TrimPrefix(newpath, "file://") | ||
|
|
||
| if err := os.Link(oldpath, newpath); err != nil { |
There was a problem hiding this comment.
os.Link fails with EXDEV across filesystems. Here temp and final share the metadata dir (same FS), so it is fine in practice, but a one-line doc note on the method would help since this is a new exported io primitive. On Linux, unix.Renameat2(..., RENAME_NOREPLACE) is the truly-atomic kernel primitive; the link-based approach is the conventional portable fallback. Reasonable as-is.
| return err | ||
| } | ||
|
|
||
| _ = os.Remove(oldpath) |
There was a problem hiding this comment.
Dropped os.Remove error: the publish already succeeded via os.Link, so this only risks leaking a stray temp file if removal fails (correctness is unaffected). A short comment noting it is intentional best-effort cleanup would prevent a future "unchecked error" flag.
370f7b5 to
2538056
Compare
Summary
v1.metadata.jsonand when committing the next table metadata versionErrTableAlreadyExistsfor create races andtable.ErrCommitFailedfor commit racesWhy
Hadoop commits wrote metadata to a temp file, checked whether the final path existed, then used
Renameto publish it. On local filesystemsos.Renamereplaces the destination, so another writer could win between the existence check and rename and then be overwritten by the later commit.The metadata file publish now fails if the final path already exists.
version-hint.textstays best-effort after the metadata file is committed.Testing