Skip to content

Unify RewriteManifestsAction with SnapshotProducer once Replace/Delete ops are supported #2556

@SreeramGarlapati

Description

@SreeramGarlapati

Problem

RewriteManifestsAction (added in #2543) intentionally does not plug into SnapshotProducer. The producer path has four cross-cutting assumptions that block a clean integration today:

  1. update_snapshot_summaries rejects Operation::Replacecrates/iceberg/src/spec/snapshot_summary.rs:337-346 returns ErrorKind::DataInvalid for any operation other than Append / Overwrite / Delete. SnapshotProducer::commit() calls update_snapshot_summaries unconditionally, so the producer cannot emit a Replace snapshot at all today.
  2. ManifestProcess::process_manifests is synchronouscrates/iceberg/src/transaction/snapshot.rs:119-125. Rewrite-manifests must write new manifests (async I/O via ManifestWriter), then return them as the post-process output. There is no async seam.
  3. Summary keys are driven by added_data_files onlycrates/iceberg/src/transaction/snapshot.rs:373-379. For rewrite-manifests added_data_files is empty by definition, so no total-* carry-forward happens, and there is no hook for the rewrite-specific summary keys (manifests-created, manifests-replaced, manifests-kept, entries-processed).
  4. manifest_file() shape mismatch — the producer's manifest_file() assumes "if added_data_files non-empty, write one manifest from those files." Rewrite-manifests produces an N-to-M rewrite of existing manifest entries; the shape is fundamentally different.

Scope

Refactor once the underlying primitives land:

  • Add Operation::Replace (and Operation::Delete) to update_snapshot_summaries' allow-list, with summary-key carry-forward semantics that mirror Java BaseRewriteManifests.
  • Make ManifestProcess::process_manifests async (or add a parallel async hook) so producers can emit manifests written via ManifestWriter.
  • Decouple summary-key emission from added_data_files so rewrite-style ops can declare their own summary additions.
  • Migrate RewriteManifestsAction onto the unified path and delete the duplicated commit logic.

Out of scope for this issue: the rewrite_if predicate, cluster_by, custom spec_id, custom staging_location, and the iceberg-datafusion SQL-procedure layer — all separately deferrable from #2543.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions