feat(schema): refactor UpdateSchemaAction for schema evolution in Iceberg #2451

Open
TwinklerG wants to merge 6 commits into apache:main from TwinklerG:twinklerg/update_schema

@TwinklerG
Contributor

Which issue does this PR close?

What changes are included in this PR?

Add schema_update function and refactor UpdateSchemaAction to support schema evolution operations on Iceberg tables.

This is similar to PR #2120, which only added support for adding and deleting columns; this PR refactors that work to support more schema evolution features.

Are these changes tested?

Yes. I also added some complex test cases ported from iceberg-java; a few of them are still to be implemented.

@TwinklerG TwinklerG changed the title feat: refactor UpdateSchemaAction for schema evolution in Iceberg feat: refactor UpdateSchemaAction for schema evolution in Iceberg May 14, 2026
@TwinklerG TwinklerG changed the title feat: refactor UpdateSchemaAction for schema evolution in Iceberg feat(schema): refactor UpdateSchemaAction for schema evolution in Iceberg May 14, 2026
@CTTY
Collaborator

CTTY commented May 18, 2026

I'm not sure if this is the correct direction: it changes the update_schema API, and I don't think it adds much value for users/devs

Could you share more about your reasoning here?

@TwinklerG
Contributor Author

TwinklerG commented May 19, 2026

> I'm not sure if this is the correct direction: it changes the update_schema API, and I don't think it adds much value for users/devs
>
> Could you share more about your reasoning here?

I have been tracking issues #2119 and #697 and the related PRs. The merged PR #2120 only added the ability to add and delete columns, which is only a small part of schema evolution, and I don't think the original implementation was a good one. I therefore followed the iceberg-java implementation and completed this refactoring.

nazq added a commit to nazq/iceberg-rust that referenced this pull request Jun 2, 2026
Extends UpdateSchemaAction with a rename(RenameColumn) builder method,
completing the add/delete/rename triad for compatible schema
evolution.

API shape:

    action.rename(
        RenameColumn::builder().name("old").new_name("new").build(),
    );

Mirrors the typed-builder shape used by apache#2451's broader schema-evolution
refactor, so callers adopting this PR today can migrate to apache#2451 with
no source change once it lands.

Renames preserve field IDs — only the leaf name changes. RenameColumn::name
uses SCHEMA_NAME_DELIMITER for nested fields (e.g. "person.name");
RenameColumn::new_name must be unqualified.
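
The rename semantics described above can be sketched as follows. This is a minimal, self-contained illustration and not the iceberg-rust implementation: the `Field` struct, the `DELIMITER` constant (standing in for SCHEMA_NAME_DELIMITER, assumed here to be "."), and the `rename_leaf` helper are all hypothetical names introduced for the example.

```rust
/// Hypothetical stand-in for a schema field; not the iceberg-rust type.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub id: i32,
    pub name: String,
    pub children: Vec<Field>,
}

/// Assumed delimiter for qualified names such as "person.name".
pub const DELIMITER: char = '.';

/// Renames the field found at `path` to `new_name`.
/// Field IDs are never touched; only the leaf name changes.
/// Returns false if `path` does not resolve or `new_name` is qualified.
pub fn rename_leaf(fields: &mut [Field], path: &str, new_name: &str) -> bool {
    // The new name must be unqualified.
    if new_name.contains(DELIMITER) {
        return false;
    }
    // Split off the first path segment; the rest (if any) is resolved recursively.
    let (head, rest) = match path.split_once(DELIMITER) {
        Some((head, rest)) => (head, Some(rest)),
        None => (path, None),
    };
    let Some(field) = fields.iter_mut().find(|f| f.name == head) else {
        return false;
    };
    match rest {
        Some(rest) => rename_leaf(&mut field.children, rest, new_name),
        None => {
            field.name = new_name.to_string();
            true
        }
    }
}
```

Under these assumptions, renaming "person.name" to "full_name" changes only the nested field's name and leaves its ID as it was.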

Ordering in commit():
  1. deletes validated
  2. renames validated — runs after deletes so a rename can re-use a
     name being deleted, before additions so an addition can re-use a
     name being renamed away
  3. additions validated + ID-assigned
  4. schema tree rebuilt with renames threaded through
     rebuild_fields / rebuild_field
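
The effect of this ordering can be sketched on a flat list of columns. The sketch below is an assumption-laden illustration, not the code in commit(): `Column` and `apply_changes` are hypothetical names, nested fields are ignored, and errors are plain strings rather than PreconditionFailed.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical flat column; not the iceberg-rust field type.
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub id: i32,
    pub name: String,
}

/// Applies staged changes in the order deletes, renames, additions.
pub fn apply_changes(
    base: &[Column],
    deletes: &[&str],
    renames: &[(&str, &str)],
    adds: &[&str],
    last_column_id: i32,
) -> Result<Vec<Column>, String> {
    // 1. Deletes first: their names become free for the later steps.
    let deleted: HashSet<&str> = deletes.iter().copied().collect();

    // 2. Renames next: a later rename of the same column overwrites an earlier one.
    let mut renamed: HashMap<&str, &str> = HashMap::new();
    for &(old, new) in renames {
        if deleted.contains(old) {
            return Err(format!("cannot rename deleted column {old}"));
        }
        renamed.insert(old, new);
    }

    // Surviving columns keep their IDs; only names may change.
    let mut names: HashSet<String> = HashSet::new();
    let mut out = Vec::new();
    for col in base {
        if deleted.contains(col.name.as_str()) {
            continue;
        }
        let name = renamed
            .get(col.name.as_str())
            .copied()
            .unwrap_or(col.name.as_str());
        if !names.insert(name.to_string()) {
            return Err(format!("duplicate column name {name}"));
        }
        out.push(Column { id: col.id, name: name.to_string() });
    }

    // 3. Additions last: they see the names freed by deletes and renames.
    let mut next_id = last_column_id;
    for &name in adds {
        if !names.insert(name.to_string()) {
            return Err(format!("duplicate column name {name}"));
        }
        next_id += 1;
        out.push(Column { id: next_id, name: name.to_string() });
    }
    Ok(out)
}
```

With columns `a` (ID 1) and `b` (ID 2), deleting `a`, renaming `b` to `a`, and adding a new `b` all succeed in one pass because each step runs after the one that frees the name it needs.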

Validation rules (match pyiceberg's UpdateSchema.rename_column):
  - missing source field → PreconditionFailed
  - source field also staged for deletion → PreconditionFailed
  - new_name contains SCHEMA_NAME_DELIMITER → PreconditionFailed
  - new_name collides with a non-deleted, non-renamed sibling →
    PreconditionFailed
  - same field renamed twice → last rename wins
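
As a rough sketch of how the rules above might be checked for one group of sibling fields: the `RenameError` enum and `validate_renames` function below are hypothetical and stand in for the PreconditionFailed errors named in the list; the delimiter is assumed to be ".". Collisions between two rename targets are not covered by the listed rules and are not checked here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical error type; the commit reports these cases as PreconditionFailed.
#[derive(Debug, PartialEq)]
pub enum RenameError {
    MissingSource(String),
    StagedForDeletion(String),
    QualifiedNewName(String),
    SiblingCollision(String),
}

/// Validates renames against one set of sibling names.
/// Returns the effective old -> new mapping, where the last rename wins.
pub fn validate_renames<'a>(
    siblings: &[&'a str],
    deleted: &HashSet<&str>,
    renames: &[(&'a str, &'a str)],
) -> Result<HashMap<&'a str, &'a str>, RenameError> {
    let mut effective: HashMap<&'a str, &'a str> = HashMap::new();
    for &(old, new) in renames {
        if !siblings.contains(&old) {
            return Err(RenameError::MissingSource(old.to_string()));
        }
        if deleted.contains(old) {
            return Err(RenameError::StagedForDeletion(old.to_string()));
        }
        // Assumed delimiter: new names must be unqualified.
        if new.contains('.') {
            return Err(RenameError::QualifiedNewName(new.to_string()));
        }
        // Inserting again for the same source overwrites: last rename wins.
        effective.insert(old, new);
    }
    // A new name may not match a sibling that is neither deleted nor renamed.
    for &new in effective.values() {
        let collides = siblings
            .iter()
            .any(|&s| s == new && !deleted.contains(s) && !effective.contains_key(s));
        if collides {
            return Err(RenameError::SiblingCollision(new.to_string()));
        }
    }
    Ok(effective)
}
```

Returning the effective mapping rather than a bare success value lets a later rebuild step apply the renames without re-resolving which one won.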

Identifier-field handling falls out for free — Rust keys identifier
fields by ID, not name, so with_identifier_field_ids(base_schema
.identifier_field_ids()) propagates the set unchanged across rename.
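
To illustrate why keying identifier fields by ID makes renames safe, here is a small sketch with hypothetical `Field` and `Schema` types (they are not the iceberg-rust types, and `with_rename` is an invented helper). The identifier set is copied verbatim, yet it resolves to the new name afterwards.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in types for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub id: i32,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
    /// Identifier fields are tracked by field ID, not by name.
    pub identifier_field_ids: BTreeSet<i32>,
}

impl Schema {
    /// Builds a new schema with `old` renamed to `new`.
    /// The identifier ID set is carried over unchanged.
    pub fn with_rename(&self, old: &str, new: &str) -> Schema {
        let fields = self
            .fields
            .iter()
            .map(|f| Field {
                id: f.id,
                name: if f.name == old { new.to_string() } else { f.name.clone() },
            })
            .collect();
        Schema {
            fields,
            identifier_field_ids: self.identifier_field_ids.clone(),
        }
    }

    /// Resolves identifier IDs to the current field names.
    pub fn identifier_field_names(&self) -> Vec<&str> {
        self.fields
            .iter()
            .filter(|f| self.identifier_field_ids.contains(&f.id))
            .map(|f| f.name.as_str())
            .collect()
    }
}
```

A name-keyed identifier set would need to be rewritten on every rename; the ID-keyed form needs no such bookkeeping.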

Tests: 10 new tests covering simple root rename, nested rename,
missing field, delete-conflict, sibling-collision, dotted-name
rejection, identifier-field preservation, rename-frees-old-name
(combined with add), repeated-rename-last-wins, no-op self-rename.

Full iceberg lib suite: 1306/1306 passing. Clippy + rustfmt clean.
@nazq

nazq commented Jun 2, 2026

Just a heads-up that #2563 adds column rename to the existing UpdateSchemaAction, using a RenameColumn typed-builder that matches the shape in this PR — so callers adopting either land on the same surface for that operation. Mentioning it in case a smaller incremental step is useful while this larger refactor is under discussion.

Development

Successfully merging this pull request may close these issues.

Add support for adding new fields to an iceberg table
