feat(schema): refactor UpdateSchemaAction for schema evolution in Iceberg #2451
TwinklerG wants to merge 6 commits into
…dateColumn` struct; promotion type check in Column Update; remove useless test case
I'm not sure this is the right direction: it changes the update_schema API, and I don't think it adds much value for users or developers. Could you share more about your reasoning here?
Well, I have been following issues #2119 and #697 and the related PRs. I noticed that the merged PR #2120 only added the ability to add and delete columns, which is only a small part of schema evolution, and I don't think the original implementation was a good one. Therefore, I followed the iceberg-java implementation and completed this refactoring.
Extends UpdateSchemaAction with a rename(RenameColumn) builder method,
completing the add/delete/rename triad for non-incompatible schema
evolution.
API shape:
action.rename(
RenameColumn::builder().name("old").new_name("new").build(),
);
Mirrors the typed-builder shape used by apache#2451's broader schema-evolution
refactor, so callers adopting this PR today can migrate to apache#2451 with
no source change once it lands.
Renames preserve field IDs — only the leaf name changes. RenameColumn::name
uses SCHEMA_NAME_DELIMITER for nested fields (e.g. "person.name");
RenameColumn::new_name must be unqualified.
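To illustrate the "only the leaf name changes" behaviour, here is a minimal, std-only sketch. The `Field` type, `rename_leaf`, `sample_schema`, and the `DELIMITER` constant are hypothetical stand-ins for illustration, not the iceberg-rust types or the code in this commit; the sketch also assumes field names themselves contain no delimiter.

```rust
/// Assumed to play the role of SCHEMA_NAME_DELIMITER.
const DELIMITER: char = '.';

/// Hypothetical stand-in for a schema field; not the iceberg-rust type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: String,
    children: Vec<Field>,
}

impl Field {
    fn new(id: i32, name: &str, children: Vec<Field>) -> Self {
        Field { id, name: name.to_string(), children }
    }
}

/// Renames the field addressed by a delimiter-qualified `path`
/// (e.g. "person.name") to the unqualified `new_name`.
/// Returns the ID of the renamed field, or `None` if the path does not resolve.
/// The field ID and every parent field are left untouched.
fn rename_leaf(fields: &mut [Field], path: &str, new_name: &str) -> Option<i32> {
    // Split off the first path segment; `rest` is the remainder, if any.
    let (head, rest) = match path.split_once(DELIMITER) {
        Some((head, rest)) => (head, Some(rest)),
        None => (path, None),
    };
    let field = fields.iter_mut().find(|f| f.name == head)?;
    match rest {
        // More segments left: descend into the nested struct.
        Some(rest) => rename_leaf(&mut field.children, rest, new_name),
        // Last segment: change only the name, keep the ID.
        None => {
            field.name = new_name.to_string();
            Some(field.id)
        }
    }
}

/// Small example schema: `id` (1) and `person` (2) with children
/// `name` (3) and `age` (4).
fn sample_schema() -> Vec<Field> {
    vec![
        Field::new(1, "id", vec![]),
        Field::new(
            2,
            "person",
            vec![Field::new(3, "name", vec![]), Field::new(4, "age", vec![])],
        ),
    ]
}
```

Calling `rename_leaf(&mut schema, "person.name", "full_name")` on `sample_schema()` changes the name of field 3 and nothing else; the parent `person` keeps its name and ID.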
Ordering in commit():
1. deletes validated
2. renames validated — runs after deletes so a rename can re-use a
name being deleted, before additions so an addition can re-use a
name being renamed away
3. additions validated + ID-assigned
4. schema tree rebuilt with renames threaded through
rebuild_fields / rebuild_field
Validation rules (match pyiceberg's UpdateSchema.rename_column):
- missing source field → PreconditionFailed
- source field also staged for deletion → PreconditionFailed
- new_name contains SCHEMA_NAME_DELIMITER → PreconditionFailed
- new_name collides with a non-deleted, non-renamed sibling →
PreconditionFailed
- same field renamed twice → last rename wins
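The ordering and validation rules above can be sketched on a single level of sibling fields. This is an illustrative model only: `FlatSchema`, `evolve`, `id_of`, and the `PreconditionFailed` enum are hypothetical names, not the code in this commit, and new IDs are assigned from the highest existing ID as a simplification of however the real implementation tracks its last column ID.

```rust
use std::collections::{BTreeMap, HashSet};

/// Assumed stand-in for SCHEMA_NAME_DELIMITER.
const DELIMITER: &str = ".";

/// Hypothetical flat schema (one level of siblings): field ID -> name.
type FlatSchema = BTreeMap<i32, String>;

/// Hypothetical error type; one variant per rejected condition.
#[derive(Debug, PartialEq)]
enum PreconditionFailed {
    MissingField(String),
    RenameOfDeletedField(String),
    QualifiedNewName(String),
    NameCollision(String),
}

fn id_of(schema: &FlatSchema, name: &str) -> Option<i32> {
    schema.iter().find(|(_, n)| n.as_str() == name).map(|(id, _)| *id)
}

fn evolve(
    base: &FlatSchema,
    deletes: &[&str],
    renames: &[(&str, &str)],
    adds: &[&str],
) -> Result<FlatSchema, PreconditionFailed> {
    use PreconditionFailed::*;

    // 1. Deletes: every deleted name must exist in the base schema.
    let mut deleted: HashSet<i32> = HashSet::new();
    for name in deletes {
        deleted.insert(id_of(base, name).ok_or_else(|| MissingField(name.to_string()))?);
    }

    // 2. Renames: a later rename of the same field overwrites an earlier one.
    let mut renamed: BTreeMap<i32, String> = BTreeMap::new();
    for (old, new) in renames {
        let id = id_of(base, old).ok_or_else(|| MissingField(old.to_string()))?;
        if deleted.contains(&id) {
            return Err(RenameOfDeletedField(old.to_string()));
        }
        if new.contains(DELIMITER) {
            return Err(QualifiedNewName(new.to_string()));
        }
        renamed.insert(id, new.to_string());
    }

    // Names still held by siblings that are neither deleted nor renamed.
    let mut taken: HashSet<String> = base
        .iter()
        .filter(|(id, _)| !deleted.contains(*id) && !renamed.contains_key(*id))
        .map(|(_, name)| name.clone())
        .collect();
    for name in renamed.values() {
        if !taken.insert(name.clone()) {
            return Err(NameCollision(name.clone()));
        }
    }

    // 3. Additions: may reuse names freed by deletes or renames.
    let mut next_id = base.keys().max().copied().unwrap_or(0) + 1;
    let mut out = FlatSchema::new();
    for name in adds {
        if !taken.insert(name.to_string()) {
            return Err(NameCollision(name.to_string()));
        }
        out.insert(next_id, name.to_string());
        next_id += 1;
    }

    // 4. Rebuild: surviving fields keep their IDs; renamed ones get new names.
    for (id, name) in base {
        if deleted.contains(id) {
            continue;
        }
        out.insert(*id, renamed.get(id).unwrap_or(name).clone());
    }
    Ok(out)
}
```

With a base of `{1: a, 2: b, 3: c}`, deleting `a`, renaming `b` to `a`, and adding `b` succeeds in this model because each phase sees the names freed by the previous one; renaming `a` to `b` with no other change is rejected as a collision.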
Identifier-field handling falls out for free — Rust keys identifier
fields by ID, not name, so with_identifier_field_ids(base_schema
.identifier_field_ids()) propagates the set unchanged across rename.
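A small sketch of why ID-keyed identifier fields need no special handling on rename. `SchemaSketch`, `with_rename`, and `identifier_field_names` are hypothetical names for illustration; only the idea of copying the identifier ID set unchanged comes from the description above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, flattened stand-in for a schema; not the iceberg-rust type.
#[derive(Debug, Clone)]
struct SchemaSketch {
    /// Field ID -> field name.
    fields: BTreeMap<i32, String>,
    /// Identifier fields are recorded by ID, not by name.
    identifier_field_ids: BTreeSet<i32>,
}

impl SchemaSketch {
    /// Returns a copy with field `id` renamed. The identifier set is copied
    /// verbatim because a rename never changes a field ID.
    fn with_rename(&self, id: i32, new_name: &str) -> SchemaSketch {
        let mut fields = self.fields.clone();
        if let Some(name) = fields.get_mut(&id) {
            *name = new_name.to_string();
        }
        SchemaSketch {
            fields,
            identifier_field_ids: self.identifier_field_ids.clone(),
        }
    }

    /// Resolves identifier IDs to their current names.
    fn identifier_field_names(&self) -> Vec<&str> {
        self.identifier_field_ids
            .iter()
            .filter_map(|id| self.fields.get(id))
            .map(|name| name.as_str())
            .collect()
    }
}
```

Renaming an identifier field leaves `identifier_field_ids` equal to the original set, while name-based lookups through the IDs pick up the new name.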
Tests: 10 new tests covering simple root rename, nested rename,
missing field, delete-conflict, sibling-collision, dotted-name
rejection, identifier-field preservation, rename-frees-old-name
(combined with add), repeated-rename-last-wins, no-op self-rename.
Full iceberg lib suite: 1306/1306 passing. Clippy + rustfmt clean.
Just a heads-up that #2563 adds column rename to the existing
Which issue does this PR close?
SchemaUpdate logic to Iceberg-Rust #697
What changes are included in this PR?
Add a schema_update function and refactor UpdateSchemaAction to support schema evolution operations on Iceberg tables. This is similar to PR #2120, which only supports adding and deleting columns, but refactors that approach to add more features.
Are these changes tested?
Yes. I also added some complex test cases ported from iceberg-java; a few of them are still to be implemented.