
feat(schema): add rename_column to UpdateSchemaAction#2563

Open
nazq wants to merge 1 commit into apache:main from nazq:update-schema-rename


@nazq nazq commented Jun 1, 2026

Which issue does this PR close?

What changes are included in this PR?

Adds column rename to UpdateSchemaAction, completing the add/delete/rename triad for compatible (non-breaking) schema evolution in transaction::update_schema.

Relationship to #2451 — the bigger refactor

Since opening this PR, @TwinklerG's #2451 has proposed a broader refactor of UpdateSchemaAction into an operations-buffer model that subsumes rename plus update_column, move, type promotion, and allow_incompatible_changes — the full PyIceberg / Java UpdateSchema surface that #697 calls for. I think #2451 is the architecturally sound long-term direction, mirroring what @Fokko built in pyiceberg.

To make this PR a clean stepping stone rather than a divergent path, I've aligned the rename API to #2451's RenameColumn typed-builder shape. The user-facing call is now:

let tx = Transaction::new(&table);
let action = tx.update_schema()
    .add_column(AddColumn::optional("new_col", Type::Primitive(PrimitiveType::Int)))
    .rename(RenameColumn::builder().name("old_name").new_name("new_name").build())
    .rename(RenameColumn::builder().name("person.name").new_name("fullname").build())
    .delete_column("dead_col");
let tx = action.apply(tx).unwrap();
let table = tx.commit(&catalog).await.unwrap();

This call is identical to how rename works under #2451, so callers who adopt this PR today won't need a source change when #2451 lands; only the dispatch underneath the .rename() method changes (from imperative-vec to operations-buffer). Field IDs are preserved across rename; only the leaf name changes.

Implementation

A new renames: Vec<(String, String)> accumulator on UpdateSchemaAction, plus a new step in commit() between deletes and additions:

  • After deletes so a rename can re-use a name being deleted in the same action.
  • Before additions so an addition can re-use a name being renamed away.
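The ordering argument above can be illustrated with a toy model. This is a minimal sketch, not the PR's code: `Staged` and `apply_in_order` are hypothetical names, the schema is flattened to a list of column names, and renames are applied one at a time rather than resolved against the base schema.

```rust
/// Hypothetical stand-in for the staged changes; not the real UpdateSchemaAction.
struct Staged {
    deletes: Vec<String>,
    renames: Vec<(String, String)>,
    adds: Vec<String>,
}

/// Applies staged changes to a flat list of column names in the order
/// deletes -> renames -> additions, rejecting any name collision.
fn apply_in_order(columns: &[&str], staged: &Staged) -> Result<Vec<String>, String> {
    let mut names: Vec<String> = columns.iter().map(|c| c.to_string()).collect();

    // 1. Deletes run first, which frees the deleted names.
    for d in &staged.deletes {
        let pos = names
            .iter()
            .position(|n| n == d)
            .ok_or_else(|| format!("cannot delete missing column: {}", d))?;
        names.remove(pos);
    }

    // 2. Renames run second, so a rename may take a name freed in step 1.
    for (from, to) in &staged.renames {
        let pos = names
            .iter()
            .position(|n| n == from)
            .ok_or_else(|| format!("cannot rename missing column: {}", from))?;
        if from == to {
            continue; // self-rename is a no-op
        }
        if names.iter().any(|n| n == to) {
            return Err(format!("rename target already exists: {}", to));
        }
        names[pos] = to.clone();
    }

    // 3. Additions run last, so an addition may take a name renamed away in step 2.
    for a in &staged.adds {
        if names.iter().any(|n| n == a) {
            return Err(format!("added column already exists: {}", a));
        }
        names.push(a.clone());
    }

    Ok(names)
}
```

With columns a, b, c, staging "delete b", "rename a to b", and "add a" succeeds under this ordering; running the rename before the delete would instead report a collision on b.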

The renames are resolved to a HashMap<i32, String> keyed by field ID, then threaded through rebuild_fields / rebuild_field, the same recursive walk that handles deletes and nested additions. Each rebuild call site that previously copied field.name.clone() now looks up the rename map first and falls back to the original name. Identifier-field handling falls out for free because iceberg-rust keys identifier_field_ids by ID, not name: the existing with_identifier_field_ids(base_schema.identifier_field_ids()) step propagates the set unchanged.
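As a rough illustration of that walk, here is a minimal sketch over simplified stand-in types. `Field` and `FieldType` are hypothetical and much smaller than the crate's real schema types, and the signatures of `rebuild_fields` / `rebuild_field` are assumptions; only the function names and the ID-keyed rename map come from the description above.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a schema type (hypothetical, not the crate's Type).
#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Primitive(&'static str),
    Struct(Vec<Field>),
}

/// Simplified stand-in for a nested field (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: String,
    field_type: FieldType,
}

/// Rebuilds one field: look up the rename map by field ID first,
/// fall back to the original name, then recurse into struct children.
fn rebuild_field(field: &Field, renames: &HashMap<i32, String>) -> Field {
    let name = renames
        .get(&field.id)
        .cloned()
        .unwrap_or_else(|| field.name.clone());
    let field_type = match &field.field_type {
        FieldType::Struct(children) => FieldType::Struct(rebuild_fields(children, renames)),
        other => other.clone(),
    };
    // The field ID is copied unchanged; only the name can differ.
    Field { id: field.id, name, field_type }
}

/// Rebuilds a list of sibling fields with the same rename map.
fn rebuild_fields(fields: &[Field], renames: &HashMap<i32, String>) -> Vec<Field> {
    fields.iter().map(|f| rebuild_field(f, renames)).collect()
}
```

Because the map is keyed by ID, anything else that refers to fields by ID (such as the identifier-field set) needs no adjustment after the walk.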

RenameColumn itself is a TypedBuilder struct with name / new_name fields, both setter(into), re-exported from iceberg::transaction alongside AddColumn. Same shape as #2451's RenameColumn.

Semantics

Modeled on pyiceberg.table.update.schema.UpdateSchema.rename_column:

  • name must exist in the current schema → otherwise PreconditionFailed.
  • A field cannot be both renamed and deleted in the same action → PreconditionFailed (intent is ambiguous; matches PyIceberg).
  • new_name cannot contain SCHEMA_NAME_DELIMITER → PreconditionFailed (the unqualified-name requirement; this prevents a rename from silently looking like a move across structs).
  • new_name cannot collide with a sibling that is not itself being deleted or renamed away → PreconditionFailed.
  • Same field renamed twice → last rename wins (matches PyIceberg's "stack on prior update" behavior).
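The rules above can be sketched as a small resolver. This is an illustration under stated assumptions, not the PR's implementation: `resolve_renames`, `parent_of`, and `leaf_of` are hypothetical helpers, fields are modeled as (ID, dotted path) pairs, the delimiter is assumed to be '.', and errors are plain strings standing in for PreconditionFailed.

```rust
use std::collections::HashMap;

/// Assumed value of the schema name delimiter for this sketch.
const DELIMITER: char = '.';

/// Parent path of a dotted path ("" for root-level fields).
fn parent_of(path: &str) -> &str {
    match path.rfind(DELIMITER) {
        Some(i) => &path[..i],
        None => "",
    }
}

/// Leaf name of a dotted path.
fn leaf_of(path: &str) -> &str {
    match path.rfind(DELIMITER) {
        Some(i) => &path[i + 1..],
        None => path,
    }
}

/// Resolves staged renames to a map of field ID -> new leaf name.
fn resolve_renames(
    fields: &[(i32, &str)],
    deletes: &[&str],
    renames: &[(&str, &str)],
) -> Result<HashMap<i32, String>, String> {
    let by_path: HashMap<&str, i32> = fields.iter().map(|&(id, path)| (path, id)).collect();
    let mut resolved: HashMap<i32, String> = HashMap::new();

    // Pass 1: per-rename checks. Later entries overwrite earlier ones (last wins).
    for &(name, new_name) in renames {
        let id = *by_path
            .get(name)
            .ok_or_else(|| format!("cannot rename missing column: {}", name))?;
        if deletes.iter().any(|d| *d == name) {
            return Err(format!("column is both renamed and deleted: {}", name));
        }
        if new_name.contains(DELIMITER) {
            return Err(format!("new name must be unqualified: {}", new_name));
        }
        resolved.insert(id, new_name.to_string());
    }

    // Pass 2: the final name must not collide with a sibling that is
    // neither deleted nor itself renamed away.
    for &(name, _) in renames {
        let id = by_path[name];
        let new_name = resolved[&id].as_str();
        let parent = parent_of(name);
        for &(sib_id, sib_path) in fields {
            if sib_id == id || parent_of(sib_path) != parent {
                continue;
            }
            if deletes.iter().any(|d| *d == sib_path) {
                continue;
            }
            let sib_final = resolved
                .get(&sib_id)
                .map(|s| s.as_str())
                .unwrap_or(leaf_of(sib_path));
            if sib_final == new_name {
                return Err(format!("rename target collides with sibling: {}", new_name));
            }
        }
    }

    Ok(resolved)
}
```

Checking collisions in a second pass, against each sibling's final name, is what lets two fields swap names or reuse the name of a deleted sibling in the same action.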

Scope

Rename only. PyIceberg's UpdateSchema has other ops (update_column, make_column_required, move_*, set_identifier_fields) — those are #2451's scope, not this PR's. Treat this as the rename on-ramp; treat #2451 as the destination.

Are these changes tested?

Yes — 10 new unit tests in transaction::update_schema::tests, matching the style and depth of the existing test_*_add_column* / test_*_delete_column* tests:

| Test | What it covers |
|---|---|
| test_rename_column | Simple root-level rename, verifies field ID preserved |
| test_rename_nested_column | Nested rename (person.name → person.fullname), preserves sibling fields |
| test_rename_missing_column_fails | Missing source field → PreconditionFailed |
| test_rename_and_delete_same_column_fails | Rename + delete on same column → PreconditionFailed |
| test_rename_to_existing_sibling_fails | Collision with a non-deleted, non-renamed sibling → PreconditionFailed |
| test_rename_path_with_dot_in_new_name_fails | Dotted new_name rejected → PreconditionFailed |
| test_rename_preserves_identifier_field | Identifier-field ID survives rename |
| test_rename_frees_name_for_addition | Rename z → z_old, then add new z in same action |
| test_rename_same_column_twice_last_wins | Repeated rename of same field; last one wins |
| test_rename_self_is_noop | Rename z → z is a no-op (no schema-version bump emitted) |

All 18 existing transaction::update_schema tests still pass. Full iceberg lib suite: 1306/1306. Clippy + rustfmt clean.

We've been carrying this on a fork branch for a downstream consumer; happy to migrate that consumer to #2451's API surface as soon as it lands.

@TwinklerG
Contributor

Implemented most of the schema evolution features in #2451.

@nazq nazq force-pushed the update-schema-rename branch 2 times, most recently from 411703e to c42b685, June 2, 2026 15:24
Extends UpdateSchemaAction with a rename(RenameColumn) builder method,
completing the add/delete/rename triad for non-incompatible schema
evolution.

API shape:

    action.rename(
        RenameColumn::builder().name("old").new_name("new").build(),
    );

Mirrors the typed-builder shape used by apache#2451's broader schema-evolution
refactor, so callers adopting this PR today can migrate to apache#2451 with
no source change once it lands.

Renames preserve field IDs — only the leaf name changes. RenameColumn::name
uses SCHEMA_NAME_DELIMITER for nested fields (e.g. "person.name");
RenameColumn::new_name must be unqualified.

Ordering in commit():
  1. deletes validated
  2. renames validated — runs after deletes so a rename can re-use a
     name being deleted, before additions so an addition can re-use a
     name being renamed away
  3. additions validated + ID-assigned
  4. schema tree rebuilt with renames threaded through
     rebuild_fields / rebuild_field

Validation rules (match pyiceberg's UpdateSchema.rename_column):
  - missing source field → PreconditionFailed
  - source field also staged for deletion → PreconditionFailed
  - new_name contains SCHEMA_NAME_DELIMITER → PreconditionFailed
  - new_name collides with a non-deleted, non-renamed sibling →
    PreconditionFailed
  - same field renamed twice → last rename wins

Identifier-field handling falls out for free — Rust keys identifier
fields by ID, not name, so with_identifier_field_ids(base_schema
.identifier_field_ids()) propagates the set unchanged across rename.

Tests: 10 new tests covering simple root rename, nested rename,
missing field, delete-conflict, sibling-collision, dotted-name
rejection, identifier-field preservation, rename-frees-old-name
(combined with add), repeated-rename-last-wins, no-op self-rename.

Full iceberg lib suite: 1306/1306 passing. Clippy + rustfmt clean.
@nazq nazq force-pushed the update-schema-rename branch from c42b685 to eca2ba4, June 2, 2026 15:25
@nazq
Author

nazq commented Jun 2, 2026

Thanks for pointing at #2451, @TwinklerG — I went and read through it carefully.

After studying both PRs, I think your refactor is the more architecturally sound long-term direction. The operations-buffer-then-schema_update shape is what lets UpdateSchemaAction correctly model dependent operations (rename-then-update, add-then-make-required, etc.) — and it's the shape that mirrors what @Fokko built in PyIceberg's UpdateSchema and the Java SchemaUpdate, which is exactly what #697 asks for. The current add_column / delete_column style from #2120 can't grow into that without a breaking change, so paying the API churn cost once now is the right call.

To make that easier, I've updated this PR to use the RenameColumn typed-builder shape from your #2451, so the user-facing call is now identical between the two:

action.rename(RenameColumn::builder().name("old").new_name("new").build())

That way this PR functions as a clean stepping stone: callers adopting it today won't need a source change when #2451 lands — only the dispatch underneath .rename() changes (imperative-vec → operations-buffer). If maintainers want a lighter incremental win now, this PR is ready; either way, #2451 is the destination.

We've been carrying this on a fork for a downstream consumer; happy to migrate to #2451's full API as soon as it merges. If contributing test coverage for any of your #[ignore = "not yet implemented"] cases would help land it faster, let me know which ones — we have real-world data for case-insensitive resolution and decimal-precision widening that could plug in.

@TwinklerG
Contributor

Thanks @nazq for taking the time to review my PR. Your PR is a brilliant move to ensure a smooth transition. I really appreciate your validation of the architecture in #2451.
The test cases, including the #[ignore = "not yet implemented"] ones, are ported from iceberg-java's core/src/test/java/org/apache/iceberg/TestSchemaUpdate.java, and I haven't implemented them all yet. I'd be glad if you could contribute more cases.

Development

Successfully merging this pull request may close these issues.

Add rename_column support to UpdateSchemaAction

2 participants