Skip to content

fix(glue): revalidate source table during rename#1338

Open
fallintoplace wants to merge 2 commits into
apache:mainfrom
fallintoplace:fix/glue-rename-concurrent-commit-loss
Open

fix(glue): revalidate source table during rename#1338
fallintoplace wants to merge 2 commits into
apache:mainfrom
fallintoplace:fix/glue-rename-concurrent-commit-loss

Conversation

@fallintoplace

Copy link
Copy Markdown
Contributor

Summary

  • re-fetch the source Glue table after creating the rename destination and before deleting the source
  • roll back the destination table and fail the rename if the source table's VersionId or metadata location changed during the rename window
  • add regression coverage for source-table drift during rename alongside the existing rollback path

Why

Glue RenameTable currently snapshots the source table, creates the destination table from that snapshot, and then deletes the source table. If another writer commits between the initial GetTable and DeleteTable, the destination can point at stale metadata while the source entry holding the newer commit is deleted.

Glue exposes VersionId optimistic locking on UpdateTable, but not on DeleteTable, so this change hardens rename by revalidating the source table immediately before delete and failing closed if it drifted.

Testing

  • go test ./catalog/glue -run 'TestGlueRenameTable|TestGlueRenameTable_DeleteTableFailureRollback|TestGlueRenameTable_SourceTableChangedRollback|TestIsConcurrentModificationException' -count=1
  • go test ./catalog/glue -count=1
  • git diff --check

@fallintoplace fallintoplace requested a review from zeroshade as a code owner June 27, 2026 18:08

@tanmayrauth tanmayrauth left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid fix and the test coverage is good. One question on the residual window and a couple of small notes

Comment thread catalog/glue/glue.go
return nil, fmt.Errorf("failed to create the table %s.%s: %w", fromDatabase, fromTable, err)
}

currentFromGlueTable, err := c.getTable(ctx, fromDatabase, fromTable)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This closes most of the race, but there's still a TOCTOU gap between this revalidation GetTable and the DeleteTable below — a writer that commits in that sub-window still loses its commit. The summary already notes Glue has no conditional DeleteTable, so I think this is the best available mitigation (and create-then-delete is the right order). Could you add a line to the comment making clear it narrows rather than eliminates the race? Just so a future reader doesn't assume it's airtight.

Comment thread catalog/glue/glue.go

originalVersionID := aws.ToString(original.VersionId)
currentVersionID := aws.ToString(current.VersionId)
if originalVersionID != "" || currentVersionID != "" {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version-first / metadata-location-fallback logic looks right to me, VersionId bumps on every Glue UpdateTable, so it's strictly stronger than comparing metadata_location, and only falling back when both
versions are empty is sensible. This does rely on two back-to-back GetTable calls returning the same VersionId absent a write (true for Glue), so no spurious rename failures — worth a one-line comment to
capture that assumption.

Comment thread catalog/glue/glue.go Outdated
Name: aws.String(tableName),
})
if rollbackErr != nil {
fmt.Printf("failed to rollback the new table %s.%s: %v", database, tableName, rollbackErr)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the Printf on rollback failure is pre-existing code you've just relocated, so not a blocker but since it's moving anyway, worth routing through a logger and adding the missing trailing newline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants