bench(reader): add ArrowReader benchmark harness by viirya · Pull Request #2558 · apache/iceberg-rust

viirya · 2026-05-31T17:32:58Z

Which issue does this PR close?

Closes #2557 (Add an in-repo benchmark harness for ArrowReader).

What changes are included in this PR?

Adds a criterion-based benchmark harness for the ArrowReader at crates/iceberg/benches/arrow_reader.rs. Until now there were no in-repo reader benchmarks, so every performance claim in the perf epic (#2172) had to be validated against external workloads. This harness writes Parquet files to a local temp dir and reads them back through the normal FileIO path, measuring the per-FileScanTask overhead that dominates scans of tables with many small files. Because it runs on the local FS, it isolates CPU / per-task work rather than network latency.
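One rough way to turn scan timings into a per-FileScanTask overhead figure is to assume that elapsed time is approximately a fixed setup cost plus a constant cost per file. Under that assumption, the slope between two runs with different file counts estimates the per-file overhead. The std-only sketch below illustrates this linear model. `files_per_sec` and `per_file_overhead` are hypothetical helpers, not part of the harness or of criterion.

```rust
use std::time::Duration;

/// Files scanned per second for one run (what a files/sec throughput reports).
pub fn files_per_sec(files: u64, elapsed: Duration) -> f64 {
    files as f64 / elapsed.as_secs_f64()
}

/// Estimated per-file overhead as the slope between two runs, under the
/// simplifying assumption: elapsed ~= fixed_setup + files * per_file.
/// Hypothetical helper for illustration, not part of the harness.
pub fn per_file_overhead(smaller: (u64, Duration), larger: (u64, Duration)) -> Duration {
    let (n1, t1) = smaller;
    let (n2, t2) = larger;
    assert!(n2 > n1, "the larger run must scan more files");
    // The fixed setup cost cancels out in the difference between the two runs.
    let extra = t2
        .checked_sub(t1)
        .expect("the larger run should not finish faster than the smaller one");
    // File-count differences in these benchmarks are small, so u32 is enough.
    extra / (n2 - n1) as u32
}
```

The linear model is only an approximation. Caching, allocation, and scheduling effects can make the per-file cost vary with file count, so comparing several pairs (for example 16→64 and 64→256) is a useful sanity check.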

Benchmark groups, chosen to map onto the epic's code paths:

- many_small_files: scans 16/64/256 small files, reporting files/sec so per-file overhead is directly visible.
- concurrency: a fixed corpus read at concurrency 1/4/16, exercising both the single-concurrency fast path and the buffered/flattened multi-task path.
- migrated_table: files without embedded field IDs read via name mapping, isolating the migrated-table schema-resolution cost (the path targeted by #2176, perf(reader): Avoid second create_parquet_record_batch_stream_builder() call for migrated tables).
- same_file_splits: one multi-row-group file read as 1/8/32 byte-range tasks, surfacing the redundant per-split metadata fetch targeted by the same-file metadata caching proposed in #2172 (Epic: ArrowReader performance improvements for split FileScanTasks) and attempted in #2100 (feat(reader): cache Parquet metadata for when FileScanTasks read the same file).
- with_predicate: scans carrying a bound predicate with row-group filtering and row selection enabled, exercising the per-task row-filter setup.
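In the same_file_splits group, one file is read as N byte-range tasks. The sketch below shows a minimal way to produce such a division. It assumes each task is described by a (start, length) pair and that the pairs together cover the file with no gaps or overlap. `byte_range_splits` is a hypothetical helper, and the harness may compute its ranges differently.

```rust
/// Split a file of `file_len` bytes into `n` contiguous (start, length) byte
/// ranges that cover the whole file without overlap. Hypothetical helper.
pub fn byte_range_splits(file_len: u64, n: u64) -> Vec<(u64, u64)> {
    assert!(n > 0, "need at least one split");
    let base = file_len / n;
    let rem = file_len % n;
    let mut start = 0u64;
    (0..n)
        .map(|i| {
            // Spread the remainder over the first `rem` splits, one byte each.
            let len = base + if i < rem { 1 } else { 0 };
            let range = (start, len);
            start += len;
            range
        })
        .collect()
}
```

Every resulting range still points at the same Parquet file. Without caching, each of the N tasks therefore fetches that file's metadata on its own.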

This adds criterion (with the async_tokio feature) as a workspace dev-dependency, plus an arrow_reader [[bench]] entry in the iceberg crate.

Measuring `bytes_read`, not just time

Wall-clock time on the local FS cannot show the benefit of optimizations that save round-trips (e.g. the metadata size hint, #2173, or range coalescing, #2181). Those need a latency-bearing backend, which is left as a follow-up. The same-file-split problem is different: its cost is redundant bytes fetched, not latency, so it is measurable even locally.

The same_file_splits group therefore reports ScanMetrics::bytes_read as its throughput basis (a deterministic count, not a sampled measurement) and prints the read amplification per split count. On a ~1.2 MB (about 1.13 MiB) file:

same_file_splits/1:  bytes_read = 1,694,258  (1.43x file size)
same_file_splits/8:  bytes_read = 5,529,051  (4.66x file size)
same_file_splits/32: bytes_read = 18,674,303 (15.72x file size)
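A quick arithmetic check on the three counts above shows a nearly constant cost per split. Going from 1 to 8 splits, and again from 8 to 32, each additional split adds about 548,000 bytes of bytes_read. The std-only sketch below reproduces that calculation. The `Sample` struct and both helpers are illustrative and are not part of the harness output format.

```rust
/// One row of the same_file_splits output: split count and reported bytes_read.
/// Illustrative type only; the harness prints these values rather than exposing a struct.
#[derive(Clone, Copy, Debug)]
pub struct Sample {
    pub splits: u64,
    pub bytes_read: u64,
}

/// The three rows reported above.
pub const SAMPLES: [Sample; 3] = [
    Sample { splits: 1, bytes_read: 1_694_258 },
    Sample { splits: 8, bytes_read: 5_529_051 },
    Sample { splits: 32, bytes_read: 18_674_303 },
];

/// Average extra bytes fetched per additional split between two runs (integer division).
pub fn marginal_bytes_per_split(a: Sample, b: Sample) -> u64 {
    assert!(b.splits > a.splits, "second sample must use more splits");
    let extra = b
        .bytes_read
        .checked_sub(a.bytes_read)
        .expect("bytes_read should not shrink as splits increase");
    extra / (b.splits - a.splits)
}

/// bytes_read of `s` relative to a baseline run (e.g. the single-split run).
pub fn relative_to_baseline(baseline: Sample, s: Sample) -> f64 {
    s.bytes_read as f64 / baseline.bytes_read as f64
}
```

Both intervals give roughly the same increment per added split. That is the pattern a fixed, independent per-split fetch would produce.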

Reading one file as 32 byte-range splits fetches ~15.7x the file size, because each split re-fetches the Parquet metadata independently. This is machine-independent evidence for the cost that same-file metadata caching targets. #2100 was closed partly because its author could not demonstrate a benefit on their workload, a ~1:1 task-to-file table where same-file caching has nothing to hit. With this benchmark, a caching implementation would show the amplification dropping back toward 1x.
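A toy model makes the "nothing to hit" point concrete. It assumes a fixed metadata cost per fetch and ignores data bytes. Without a cache, that cost is paid once per task. With a cache keyed by file path, it is paid once per distinct file. `metadata_bytes_fetched` is a hypothetical function for illustration, not the design used in #2100.

```rust
use std::collections::HashSet;

/// Toy model: total metadata bytes fetched for a list of scan tasks, each identified
/// by the file path it reads. `metadata_bytes` is an assumed fixed per-fetch cost.
/// Without `cache`, every task fetches metadata; with it, each distinct path is fetched once.
pub fn metadata_bytes_fetched(task_paths: &[&str], metadata_bytes: u64, cache: bool) -> u64 {
    if !cache {
        return task_paths.len() as u64 * metadata_bytes;
    }
    let distinct: HashSet<&str> = task_paths.iter().copied().collect();
    distinct.len() as u64 * metadata_bytes
}
```

For 32 splits of one file, the cache cuts metadata fetches from 32 to 1. For a table with one task per file, the cached and uncached totals are identical, which is consistent with the workload described for #2100.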

Run with:

cargo bench -p iceberg --bench arrow_reader

Are these changes tested?

This PR only adds a benchmark; it does not change library code. The harness compiles and runs (cargo bench -p iceberg --bench arrow_reader) and reuses the same Parquet-writing and reader-driving patterns as the existing reader unit tests. cargo fmt and cargo clippy -p iceberg --benches are clean.

There were no in-repo benchmarks for the ArrowReader, so every performance claim in the perf epic (apache#2172) had to be validated against external workloads. This adds a criterion harness that writes Parquet files to a local temp dir and reads them back through the normal FileIO path, measuring the per-FileScanTask overhead that dominates scans of tables with many small files.

Benchmark groups:

- many_small_files: scans 16/64/256 small files, reporting files/sec so per-file overhead is directly visible.
- concurrency: a fixed corpus read at concurrency 1/4/16, exercising both the single-concurrency fast path and the buffered/flattened multi-task path.
- migrated_table: files without embedded field IDs read via name mapping, isolating the migrated-table schema-resolution cost.
- same_file_splits: one multi-row-group file read as 1/8/32 byte-range tasks, surfacing the redundant per-split metadata fetch that metadata caching (item apache#5) targets.
- with_predicate: scans carrying a bound predicate with row-group filtering and row selection enabled, exercising the per-task row-filter setup.

This gives a reproducible baseline for evaluating reader optimizations such as operator caching (apache#2177) and metadata reuse.

Wall-clock time on the local FS cannot show the benefit of round-trip-saving optimizations, but it also undersells the same-file-split problem, whose cost is redundant *bytes fetched* rather than latency. Report ScanMetrics::bytes_read as the same_file_splits throughput basis (a deterministic count, not a sample), and print the read amplification per split count. Reading one ~1.2MB file as 32 byte-range splits fetches ~15.7x the file size, because each split re-fetches the Parquet metadata independently. This is the machine-independent evidence for the cost that same-file metadata caching (proposed in apache#2172, attempted in apache#2100) targets; a caching implementation would drive the amplification back toward 1x.

mbutrovich · 2026-06-01T13:23:33Z

Thanks for doing this @viirya! This would have made my life working on #2172 much easier. I will take a proper pass through it today.

viirya force-pushed the bench/arrow-reader-perf branch from e1e4060 to d2ac37e May 31, 2026 17:38

mbutrovich self-requested a review June 1, 2026 11:57

viirya wants to merge 2 commits into apache:main from viirya:bench/arrow-reader-perf
