fix(table): validate pos delete position chunks#1337
Conversation
tanmayrauth
left a comment
There was a problem hiding this comment.
LGTM, nice catch turning the blind pos cast into a clean schema error. Just one nit and a question, details inline.
| func collectPosDeletePositions(positionalDeletes positionDeletes) (set[int64], error) { | ||
| deletes := set[int64]{} | ||
| for _, chunk := range positionalDeletes { | ||
| if chunk == nil { |
There was a problem hiding this comment.
Is the nil-chunk guard meant purely as defensiveness? groupPosDeletesByFilePath only ever stores retained, non-nil chunks into the map, so this branch shouldn't be reachable from the scan path — just want to make sure I'm not missing a caller that can pass a nil chunk in.
| return nil, fmt.Errorf("%w: unsupported pos column type %s in position delete file", | ||
| iceberg.ErrInvalidSchema, chunk.DataType()) | ||
| } | ||
| for _, arr := range chunk.Chunks() { |
There was a problem hiding this comment.
The inner arr.(*array.Int64) assertion is unreachable here — once the outer chunk.DataType().ID() == arrow.INT64 check at line 401 passes, every chunk of that column is already *array.Int64, so the
!ok branch and its separate "unsupported pos chunk array type" message can't fire. Not a problem to leave as defensive code, but you could drop the , ok check (or fold the two error messages into one) to
keep it tight.
Summary
poscasts with a shared validated collectorErrInvalidSchemawhen a delete file exposes a non-int64poschunkposchunk typesWhy
The Arrow scanner assumed every positional-delete
poschunk was*array.Int64and cast it directly in both scan paths. If a malformed delete file carried the wrong Arrow type, the scanner panicked instead of returning a schema error.Testing
go test ./table -run 'Test(GroupPosDeletesByFilePathSupportsStringLayouts|GroupPosDeletesByFilePathRejectsUnsupportedFilePathLayout|CollectPosDeletePositionsRejectsUnsupportedPosType|ProcessPositionalDeletesAcrossBatches)$' -count=1go test ./table -count=1