Skip to content

fix(table): detect concurrently-removed data files in row-level deletes#1298

Open
laskoviymishka wants to merge 1 commit into
apache:mainfrom
laskoviymishka:fix/mor-delete-data-file-existence
Open

fix(table): detect concurrently-removed data files in row-level deletes#1298
laskoviymishka wants to merge 1 commit into
apache:mainfrom
laskoviymishka:fix/mor-delete-data-file-existence

Conversation

@laskoviymishka

Copy link
Copy Markdown
Contributor

Two related fixes for optimistic-concurrency validation of position deletes.

  1. validateDataFilesExist counted a referenced data file as present whenever any manifest entry for its path existed on the branch head, because it iterated head.dataFiles (discardDeleted=false). A concurrent compaction or overwrite that rewrote the file leaves a DELETED tombstone, so the check passed and the orphaned position delete (its rows had moved to a new file) was committed silently. Iterate data-manifest entries and treat a DELETED entry as absent — only a live ADDED/EXISTING entry satisfies existence, mirroring Java's MergingSnapshotProducer. This corrects the check for every caller, including RowDelta.

  2. performMergeOnReadDeletion never ran that check at all: it builds an overwriteFiles producer whose only validator checks partition-filter overlap, and appended its position deletes without registering validateDataFilesExist (wired only into RowDelta.validate). So a merge-on-read delete whose referenced data file was concurrently removed went undetected. writePositionDeletesForFiles now returns the referenced data-file paths of the position deletes it appends, and performMergeOnReadDeletion registers a validateDataFilesExist validator over them, mirroring RowDelta.validate: unconditional (no isolation gating) and additive to the producer's partition-filter validator. A position delete that does not record its referenced data file is skipped, matching RowDelta and Java.

Tests: TestValidateDataFilesExist_TreatsConcurrentlyRemovedFileAsMissing pins the status-aware check (a compacted-away file is missing; a live file is present); TestMergeOnReadDeleteConflict_ReferencedDataFileRemoved drives a merge-on-read delete whose referenced file a concurrent compaction removes and asserts the commit fails with ErrDataFilesMissing; the no-conflict control still commits.

Two related fixes for optimistic-concurrency validation of position deletes.

1. validateDataFilesExist counted a referenced data file as present whenever
   any manifest entry for its path existed on the branch head, because it
   iterated head.dataFiles (discardDeleted=false). A concurrent compaction or
   overwrite that rewrote the file leaves a DELETED tombstone, so the check
   passed and the orphaned position delete (its rows had moved to a new file)
   was committed silently. Iterate data-manifest entries and treat a DELETED
   entry as absent — only a live ADDED/EXISTING entry satisfies existence,
   mirroring Java's MergingSnapshotProducer. This corrects the check for every
   caller, including RowDelta.

2. performMergeOnReadDeletion never ran that check at all: it builds an
   overwriteFiles producer whose only validator checks partition-filter
   overlap, and appended its position deletes without registering
   validateDataFilesExist (wired only into RowDelta.validate). So a
   merge-on-read delete whose referenced data file was concurrently removed
   went undetected. writePositionDeletesForFiles now returns the referenced
   data-file paths of the position deletes it appends, and
   performMergeOnReadDeletion registers a validateDataFilesExist validator over
   them, mirroring RowDelta.validate: unconditional (no isolation gating) and
   additive to the producer's partition-filter validator. A position delete
   that does not record its referenced data file is skipped, matching RowDelta
   and Java.

Tests: TestValidateDataFilesExist_TreatsConcurrentlyRemovedFileAsMissing pins
the status-aware check (a compacted-away file is missing; a live file is
present); TestMergeOnReadDeleteConflict_ReferencedDataFileRemoved drives a
merge-on-read delete whose referenced file a concurrent compaction removes and
asserts the commit fails with ErrDataFilesMissing; the no-conflict control
still commits.
@laskoviymishka laskoviymishka marked this pull request as ready for review June 24, 2026 22:59

@zeroshade zeroshade left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The status-aware existence check (a DELETED tombstone no longer counts as present) matches Java's MergingSnapshotProducer semantics, and wiring the unconditional validateDataFilesExist into the MoR delete path closes the gap cleanly. Tests are well targeted. One tiny doc-comment nit inline.


// newV3MoRConflictTestTable builds a v3 table that deletes via merge-on-read
// (so position deletes are written as Puffin deletion vectors that record their
// referenced data file) and never retries commits, so a single conflict surfaces

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the doc comment says the table 'never retries commits', but CommitNumRetriesKey is set to 2 (with short backoff) just below - the conflict actually surfaces through a retry/refresh rather than by skipping retries. Worth rewording so the comment matches the config (e.g. note it uses short retries so the stale first attempt refreshes and the conflict surfaces terminally).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants