fix(glue): revalidate source table during rename#1338
Conversation
tanmayrauth
left a comment
There was a problem hiding this comment.
Solid fix and the test coverage is good. One question on the residual window and a couple of small notes
| return nil, fmt.Errorf("failed to create the table %s.%s: %w", fromDatabase, fromTable, err) | ||
| } | ||
|
|
||
| currentFromGlueTable, err := c.getTable(ctx, fromDatabase, fromTable) |
There was a problem hiding this comment.
This closes most of the race, but there's still a TOCTOU gap between this revalidation GetTable and the DeleteTable below — a writer that commits in that sub-window still loses its commit. The summary already notes Glue has no conditional DeleteTable, so I think this is the best available mitigation (and create-then-delete is the right order). Could you add a line to the comment making clear it narrows rather than eliminates the race? Just so a future reader doesn't assume it's airtight.
|
|
||
| originalVersionID := aws.ToString(original.VersionId) | ||
| currentVersionID := aws.ToString(current.VersionId) | ||
| if originalVersionID != "" || currentVersionID != "" { |
There was a problem hiding this comment.
The version-first / metadata-location-fallback logic looks right to me, VersionId bumps on every Glue UpdateTable, so it's strictly stronger than comparing metadata_location, and only falling back when both
versions are empty is sensible. This does rely on two back-to-back GetTable calls returning the same VersionId absent a write (true for Glue), so no spurious rename failures — worth a one-line comment to
capture that assumption.
| Name: aws.String(tableName), | ||
| }) | ||
| if rollbackErr != nil { | ||
| fmt.Printf("failed to rollback the new table %s.%s: %v", database, tableName, rollbackErr) |
There was a problem hiding this comment.
nit: the Printf on rollback failure is pre-existing code you've just relocated, so not a blocker but since it's moving anyway, worth routing through a logger and adding the missing trailing newline.
Summary
Why
Glue RenameTable currently snapshots the source table, creates the destination table from that snapshot, and then deletes the source table. If another writer commits between the initial GetTable and DeleteTable, the destination can point at stale metadata while the source entry holding the newer commit is deleted.
Glue exposes VersionId optimistic locking on UpdateTable, but not on DeleteTable, so this change hardens rename by revalidating the source table immediately before delete and failing closed if it drifted.
Testing
go test ./catalog/glue -run 'TestGlueRenameTable|TestGlueRenameTable_DeleteTableFailureRollback|TestGlueRenameTable_SourceTableChangedRollback|TestIsConcurrentModificationException' -count=1go test ./catalog/glue -count=1git diff --check