feat(datafusion): report IcebergTableScan metrics by geoffreyclaude · Pull Request #2521 · apache/iceberg-rust

geoffreyclaude · 2026-05-28T08:04:24Z

Which issue does this PR close?

Closes #2364 (Report DataFusion operator metrics in IcebergTableScan).

What changes are included in this PR?

IcebergTableScan now owns a DataFusion ExecutionPlanMetricsSet, returns it from ExecutionPlan::metrics(), and resets it from ExecutionPlan::reset_state().

The scan output stream is wrapped in a small poll_fn adapter that records BaselineMetrics while the Iceberg stream is polled. This exposes the standard DataFusion operator metrics such as elapsed_compute, output_rows, output_batches, output_bytes, and completion timestamps in EXPLAIN ANALYZE.
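To make the adapter idea concrete, here is a minimal std-only sketch of the same pattern: a closure-based wrapper that times each pull from the inner source and counts rows and batches as they pass through. It is a synchronous analogue (`std::iter::from_fn` over an `Iterator`) rather than the async `futures::stream::poll_fn` used in the PR. `ScanMetrics`, `with_metrics`, and the `Vec<i64>` "batch" type are hypothetical stand-ins, not DataFusion or Iceberg APIs.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::Instant;

/// Hypothetical stand-in for a baseline-metrics handle (not DataFusion's type).
/// Clones share the same counters through `Arc`.
#[derive(Clone, Default)]
pub struct ScanMetrics {
    output_rows: Arc<AtomicUsize>,
    output_batches: Arc<AtomicUsize>,
    elapsed_nanos: Arc<AtomicU64>,
}

impl ScanMetrics {
    pub fn output_rows(&self) -> usize {
        self.output_rows.load(Ordering::Relaxed)
    }

    pub fn output_batches(&self) -> usize {
        self.output_batches.load(Ordering::Relaxed)
    }

    pub fn elapsed_nanos(&self) -> u64 {
        self.elapsed_nanos.load(Ordering::Relaxed)
    }

    /// Record one pull result and pass it through unchanged.
    fn record_poll(&self, item: Option<Vec<i64>>) -> Option<Vec<i64>> {
        if let Some(batch) = &item {
            self.output_rows.fetch_add(batch.len(), Ordering::Relaxed);
            self.output_batches.fetch_add(1, Ordering::Relaxed);
        }
        item
    }
}

/// Wrap `inner` so that every pull is timed and counted.
/// The closure takes `&self` methods only, so the captured handle is used
/// directly without cloning on each call.
pub fn with_metrics<I>(mut inner: I, metrics: ScanMetrics) -> impl Iterator<Item = Vec<i64>>
where
    I: Iterator<Item = Vec<i64>>,
{
    std::iter::from_fn(move || {
        let start = Instant::now();
        let item = inner.next();
        let nanos = start.elapsed().as_nanos() as u64;
        metrics.elapsed_nanos.fetch_add(nanos, Ordering::Relaxed);
        metrics.record_poll(item)
    })
}
```

Draining `with_metrics(batches.into_iter(), metrics.clone())` over two batches holding three rows in total should leave `metrics.output_rows()` at 3 and `metrics.output_batches()` at 2, since the clone handed to the wrapper shares its counters with the caller's handle.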

Focused tests cover the metrics wrapper directly and the catalog-backed provider execution path, including reset-state behavior.

Are these changes tested?

cargo test -p iceberg-datafusion stream_with_baseline_metrics_records_rows_and_compute --locked
cargo test -p iceberg-datafusion test_catalog_backed_provider_scan_reports_metrics --locked
cargo check -p iceberg-datafusion --locked
cargo test -p iceberg-datafusion --locked
cargo clippy -p iceberg-datafusion --all-targets --locked

mbutrovich · 2026-05-29T20:19:44Z

+    baseline_metrics: BaselineMetrics,
+) -> Pin<Box<dyn Stream<Item = DFResult<RecordBatch>> + Send>> {
+    futures::stream::poll_fn(move |cx| {
+        let baseline_metrics = baseline_metrics.clone();


I don't think baseline_metrics.clone() is needed. BaselineMetrics::elapsed_compute() and record_poll() both take &self, so the captured value can be used directly. The clone is cheap (Arc/Count copies) but it's dead weight on every poll and reads as if there's a borrow problem to work around. Suggest:

    futures::stream::poll_fn(move |cx| {
        let _timer = baseline_metrics.elapsed_compute().timer();
        let poll = stream.as_mut().poll_next(cx);
        baseline_metrics.record_poll(poll)
    })

geoffreyclaude · Jun 1, 2026

Good catch. Removed the per-poll clone() in 19db703.

This is cleaner and also avoids dropping a cloned BaselineMetrics on every poll, which could record end_timestamp earlier than intended.
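The early end_timestamp hazard can be illustrated with a small std-only model. It assumes, as the reply above implies, that the metrics handle stamps an end marker when dropped if none has been set yet. `Handle`, its logical tick clock, and `run` are hypothetical and only mimic that behaviour; they are not DataFusion types.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical handle: clones share state, and dropping any clone stamps
/// the end marker if it is still unset.
#[derive(Clone)]
pub struct Handle {
    end_tick: Arc<Mutex<Option<u32>>>,
    clock: Arc<Mutex<u32>>,
}

impl Handle {
    pub fn new() -> Self {
        Handle {
            end_tick: Arc::new(Mutex::new(None)),
            clock: Arc::new(Mutex::new(0)),
        }
    }

    /// Advance the logical clock by one poll.
    pub fn tick(&self) {
        *self.clock.lock().unwrap() += 1;
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        let now = *self.clock.lock().unwrap();
        let mut end = self.end_tick.lock().unwrap();
        if end.is_none() {
            *end = Some(now);
        }
    }
}

/// Simulate `polls` polls and return the end tick recorded once the
/// long-lived handle has been dropped.
pub fn run(polls: u32, clone_per_poll: bool) -> Option<u32> {
    let handle = Handle::new();
    let observer = handle.end_tick.clone();
    for _ in 0..polls {
        handle.tick();
        if clone_per_poll {
            // The temporary clone is dropped at the end of this poll and
            // stamps the shared end marker long before the stream finishes.
            let per_poll = handle.clone();
            drop(per_poll);
        }
    }
    drop(handle);
    let end = *observer.lock().unwrap();
    end
}
```

In this model `run(3, true)` returns `Some(1)`, because the first per-poll clone stamps the end marker, while `run(3, false)` returns `Some(3)`, the tick at which the long-lived handle is finally dropped.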

mbutrovich · 2026-05-29T20:20:21Z

+        let metrics = metrics.clone_inner();
+        assert_eq!(metrics.output_rows(), Some(3));
+        assert!(
+            metrics.elapsed_compute().is_some_and(|elapsed| elapsed > 0),


stream_with_baseline_metrics_records_rows_and_compute asserts output_rows and elapsed_compute. The PR description also lists output_batches, output_bytes, and completion timestamps as exposed. Those are in fact recorded — BaselineMetrics::record_poll → batch.record_output(...) updates output_batches and output_bytes (see datafusion/physical-expr-common/src/metrics/baseline.rs:331) — so adding assert!(metrics.output_batches() == Some(1)) and assert!(metrics.output_bytes().is_some_and(|b| b > 0)) is a cheap regression guard that matches the documented contract.

geoffreyclaude · Jun 1, 2026

Agreed on covering the rest of the baseline metrics. I added assertions for output_batches, output_bytes, start_timestamp, and end_timestamp in the focused stream_with_baseline_metrics test in 19db703.

I adapted this through MetricsSet::sum(...) + MetricValue matching because DataFusion 53.1 does not expose MetricsSet::output_batches() / output_bytes() convenience methods.
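The "filter by variant, then sum" approach can be sketched roughly as follows. This is a std-only illustration with a hypothetical `Value` enum and `sum_where` helper standing in for the MetricValue matching described above. It does not reproduce DataFusion's actual `MetricsSet::sum` signature or its variant set.

```rust
/// Hypothetical stand-in for a per-partition metric value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    OutputRows(usize),
    OutputBatches(usize),
    OutputBytes(usize),
}

/// Sum every value accepted by `pred`; returns `None` when nothing matches,
/// so a missing metric is distinguishable from a metric that sums to zero.
pub fn sum_where<F>(values: &[Value], pred: F) -> Option<usize>
where
    F: Fn(&Value) -> bool,
{
    let mut total: Option<usize> = None;
    for v in values.iter().filter(|v| pred(*v)) {
        let n = match v {
            Value::OutputRows(n) | Value::OutputBatches(n) | Value::OutputBytes(n) => *n,
        };
        total = Some(total.unwrap_or(0) + n);
    }
    total
}
```

With values `[OutputRows(2), OutputBatches(1), OutputRows(1)]`, selecting `OutputRows` yields `Some(3)`, and selecting `OutputBytes` yields `None`.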

mbutrovich

Minor suggestions, thanks for tackling this @geoffreyclaude!

geoffreyclaude · 2026-06-01T08:34:17Z

> Minor suggestions, thanks for tackling this @geoffreyclaude!

@mbutrovich Thanks for the quick review! PR should now be updated with your suggestions, as two additional commits.

I also had to add a small public API snapshot update in f94d61f after rebasing onto current main, since the new check-public-api CI validates crates/integrations/datafusion/public-api.txt.

geoffreyclaude force-pushed the fix/iceberg-scan-metrics branch from 9f036f5 to fe4a322 Compare May 28, 2026 08:25

geoffreyclaude marked this pull request as ready for review May 28, 2026 08:33

mbutrovich reviewed May 29, 2026

mbutrovich suggested changes May 29, 2026

geoffreyclaude force-pushed the fix/iceberg-scan-metrics branch 3 times, most recently from fb0a0ae to 6dc9f99 Compare June 1, 2026 08:32

geoffreyclaude requested a review from mbutrovich June 1, 2026 08:33

geoffreyclaude force-pushed the fix/iceberg-scan-metrics branch from 6dc9f99 to e3293de Compare June 1, 2026 11:33

geoffreyclaude added 4 commits June 1, 2026 13:38

feat(datafusion): report IcebergTableScan metrics (be8d1ff)
fix(datafusion): avoid cloning scan metrics per poll (19db703)
test(datafusion): cover all scan baseline metrics (aaaa615)
chore(datafusion): update public api snapshot (f94d61f)

geoffreyclaude force-pushed the fix/iceberg-scan-metrics branch from e3293de to f94d61f Compare June 1, 2026 11:39

geoffreyclaude wants to merge 4 commits into apache:main from geoffreyclaude:fix/iceberg-scan-metrics