bench: cycle distinct prefilled blocks to remove per-append fill #2

Closed
nclack wants to merge 1 commit into main from bench-150-prefill-blocks

nclack commented Jun 18, 2026

Owner

Closes acquire-project#150.

Streaming benches spent roughly half their wall time generating their own input via the per-append fill() in pump_data / pump_data_interleaved. That understated library throughput by ~2x and let the fill thread contend with the library for host memory bandwidth (medfmt/zstd regressed ~18% purely from contention).
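
A rough worked version of the ~2x figure, under the simplifying assumption that fill and append are serialized on the producer and that contention is ignored. The symbols are introduced here for illustration and do not come from the bench code: $B$ is bytes appended, $t_\text{gen}$ is time spent in fill(), $t_\text{lib}$ is time spent in the library, and $f$ is the fill share of wall time.

```math
T_\text{meas} = \frac{B}{t_\text{gen} + t_\text{lib}}, \qquad
f = \frac{t_\text{gen}}{t_\text{gen} + t_\text{lib}}, \qquad
T_\text{lib} = \frac{B}{t_\text{lib}} = \frac{T_\text{meas}}{1 - f}
```

With $f \approx 0.5$ this gives $T_\text{lib} \approx 2\,T_\text{meas}$, which is the gap described above.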

Changes:

  • New default pump mode cycles through up to 8 distinct pre-generated blocks (PUMP_CYCLE_BLOCK_COUNT). Each block is filled once, with staggered offsets so that contents differ block-to-block but keep the same value distribution. This gives realistic compression ratios with zero per-append fill cost. Applied to both the single-stream pump and the two-stream interleaved pump.
  • --busy-producer keeps the original regenerate-every-append behavior as a host-bandwidth stress variant.
  • --prefill (single identical block) is retained for back-compat and as a writer-only ceiling.
  • Both pumps now report Data gen time (seconds and % of wall), so harness cost is visible instead of being folded into throughput.
  • Incidental: the single-stream path previously ignored --dtype (always bpe=2); it now uses dtype_bpe(dtype), matching the two-stream path.
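
The three pump modes listed above can be sketched roughly as follows. This is an illustrative Python model, not the bench's actual code: `fill`, `pump`, `OFFSET_STRIDE`, and the byte-ramp pattern are hypothetical stand-ins, and only `PUMP_CYCLE_BLOCK_COUNT` is a name taken from this PR. Counting the one-time prefill toward data gen time is also an assumption of the sketch.

```python
import time

PUMP_CYCLE_BLOCK_COUNT = 8  # name from the PR ("up to 8" distinct blocks)
OFFSET_STRIDE = 37          # hypothetical stagger between consecutive blocks


def fill(n_bytes: int, offset: int) -> bytes:
    """Stand-in for the bench's fill(): a byte ramp shifted by `offset`."""
    return bytes((i + offset) & 0xFF for i in range(n_bytes))


def pump(append, n_appends: int, block_bytes: int, mode: str = "cycle"):
    """Call append(block) n_appends times; return (gen_seconds, wall_seconds).

    mode="cycle":   rotate through pre-generated distinct blocks (default)
    mode="prefill": reuse one identical pre-generated block
    mode="busy":    regenerate the block before every append
    """
    t_start = time.perf_counter()
    gen = 0.0
    blocks = []
    if mode != "busy":
        count = 1 if mode == "prefill" else PUMP_CYCLE_BLOCK_COUNT
        g0 = time.perf_counter()
        # Filled once; staggered offsets make contents differ per block.
        blocks = [fill(block_bytes, k * OFFSET_STRIDE) for k in range(count)]
        gen += time.perf_counter() - g0
    for i in range(n_appends):
        if mode == "busy":
            g0 = time.perf_counter()
            block = fill(block_bytes, i * OFFSET_STRIDE)
            gen += time.perf_counter() - g0
        else:
            block = blocks[i % len(blocks)]  # no per-append fill cost
        append(block)
    wall = time.perf_counter() - t_start
    return gen, wall


if __name__ == "__main__":
    sink = []
    gen, wall = pump(sink.append, 64, 1 << 16, mode="cycle")
    print(f"Data gen time: {gen:.4f} s ({100 * gen / wall:.1f}% of wall)")
```

Because the ramp wraps every 256 bytes, any block whose size is a multiple of 256 has the same byte histogram regardless of offset, which is one simple way to get "different contents, same value distribution".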

A/B to validate: (1) compare throughput under default vs --busy-producer (expect a ~2x gap, with the medfmt/zstd contention regression gone under default); (2) check that compressed bytes match between default and busy modes, whereas --prefill should show an inflated compression ratio.
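
A toy version of the compressed-bytes check, as a sketch only. It uses Python's zlib as a stand-in for the bench's codecs, compares a single pass through the blocks, and all names (`make_block`, `compressed_size`, the seeds and sizes) are hypothetical. It illustrates why one identical block inflates the ratio while distinct blocks drawn from the same distribution behave like freshly generated data.

```python
import random
import zlib

BLOCK_BYTES = 4096
N_BLOCKS = 8


def make_block(seed: int, n: int = BLOCK_BYTES) -> bytes:
    """Block of values drawn uniformly from 0..31; contents depend on seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(32) for _ in range(n))


def compressed_size(blocks) -> int:
    """Stream all blocks through one zlib compressor; return output bytes."""
    comp = zlib.compressobj()
    total = sum(len(comp.compress(b)) for b in blocks)
    return total + len(comp.flush())


streams = {
    # default mode: distinct pre-generated blocks
    "cycle": [make_block(s) for s in range(N_BLOCKS)],
    # busy mode: fresh data on every append (different seeds)
    "busy": [make_block(1000 + s) for s in range(N_BLOCKS)],
    # prefill mode: the same block appended repeatedly
    "prefill": [make_block(0)] * N_BLOCKS,
}
sizes = {name: compressed_size(blocks) for name, blocks in streams.items()}

if __name__ == "__main__":
    raw = BLOCK_BYTES * N_BLOCKS
    for name, size in sizes.items():
        print(f"{name:8s} {size:6d} bytes  ratio {raw / size:.2f}")
```

In this toy setup the repeated block falls inside zlib's match window, so the prefill stream compresses far better than either of the other two, while cycle and busy land close to each other.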

nclack commented Jun 18, 2026

Owner Author

Superseded by acquire-project#159 (PR retargeted to upstream).

nclack closed this Jun 18, 2026

Development

Successfully merging this pull request may close these issues.

Streaming benches spend ~half their time generating input; throughput conflates harness and library
