perf(zaino-state): Concurrent block fetches #1241

Open
emersonian wants to merge 3 commits into zingolabs:rc_0_4_0_slow_sync_plus_updates from zecrocks:be/async_block_fetches

@emersonian emersonian commented Jun 17, 2026

Contributor

This PR pipelines the finalized bulk-sync fetch path: blocks are fetched concurrently and committed strictly in height order. It switches the validator RPC client to HTTP/2 (prior-knowledge h2c) so a single connection multiplexes all in-flight requests, removing the connection-per-request churn the new concurrency would otherwise put on Zebra, and retries transient connection errors (with exponential backoff in the sync loop).
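
The retry behaviour described above can be illustrated with a minimal, standard-library-only sketch of retrying transient errors with capped exponential backoff. It is not the PR's code: `FetchError`, `backoff_delay`, `retry_with_backoff`, and every numeric parameter are hypothetical names and values chosen for illustration, and the real client is asynchronous rather than blocking.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type: only `Transient` failures are worth retrying.
#[derive(Debug, PartialEq)]
enum FetchError {
    Transient(String),
    Fatal(String),
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Call `op` until it succeeds, fails fatally, or `max_retries` retries are used up.
fn retry_with_backoff<T>(
    max_retries: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, FetchError>,
) -> Result<T, FetchError> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Transient failure with retries left: wait, then try again.
            Err(FetchError::Transient(_)) if attempt < max_retries => {
                sleep(backoff_delay(base, max, attempt));
                attempt += 1;
            }
            // Fatal failure, or retries exhausted: give up with the last error.
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    let mut calls = 0;
    // Simulated RPC that fails twice with a transient error, then succeeds.
    let result = retry_with_backoff(5, Duration::from_millis(1), Duration::from_millis(8), || {
        calls += 1;
        if calls < 3 {
            Err(FetchError::Transient("connection reset".into()))
        } else {
            Ok(calls)
        }
    });
    println!("{result:?}");
}
```

Capping the delay keeps a long outage from stretching the wait without bound, and separating transient from fatal errors avoids retrying requests that cannot succeed.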

Background: the bulk-sync loop in write_blocks_to_height was serial and fetch-latency-bound. Each block awaited two sequential validator RPCs (get_block + get_commitment_tree_roots) with no overlap, so in the sandblast band we observed throughput collapse to ~1 blk/s while both Zaino and Zebra sat practically CPU-idle.

This update is an attempt to help Zaino sync past the sandblasted blocks faster.

build_indexed_block_from_source was split into fetch_block_data (the parallelizable fetch) and assemble_indexed_block (the sequential chainwork + IndexedBlock build), and the v1 bulk path was driven through a futures buffered(N) stream that keeps up to N fetches in flight while yielding them in height order, so only the fetch wait was overlapped. The v0 and experimental paths kept the serial helper.
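
The "up to N in flight, yielded in height order" behaviour can be sketched without any async runtime. The example below is an analogue built on standard-library threads, not the futures buffered(N) stream the PR uses: `BlockData`, `sync_range`, and the simulated sleep are assumptions made for illustration, and only the name `fetch_block_data` is borrowed from the description above.

```rust
use std::collections::VecDeque;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the data fetched for one block.
#[derive(Debug, PartialEq)]
struct BlockData {
    height: u32,
}

/// Parallelizable step: stands in for the per-block validator RPCs.
fn fetch_block_data(height: u32) -> BlockData {
    // Higher heights sleep less, so fetches tend to complete out of order.
    thread::sleep(Duration::from_millis(u64::from(20u32.saturating_sub(height))));
    BlockData { height }
}

/// Keep up to `window` fetches in flight; hand results to `commit` in height order.
fn sync_range(start: u32, end: u32, window: usize, mut commit: impl FnMut(BlockData)) {
    let mut in_flight: VecDeque<JoinHandle<BlockData>> = VecDeque::new();
    let mut heights = start..=end;
    loop {
        // Top up the window with new fetches.
        while in_flight.len() < window.max(1) {
            match heights.next() {
                Some(h) => in_flight.push_back(thread::spawn(move || fetch_block_data(h))),
                None => break,
            }
        }
        // Wait for the oldest fetch only, so commits stay strictly ordered.
        match in_flight.pop_front() {
            Some(handle) => commit(handle.join().expect("fetch thread panicked")),
            None => break,
        }
    }
}

fn main() {
    let mut committed = Vec::new();
    // The sequential stage (chainwork, block assembly, write) would run inside `commit`.
    sync_range(1, 10, 4, |block| committed.push(block.height));
    println!("{committed:?}");
}
```

Because the loop only ever joins the oldest outstanding fetch, a slow block delays commits behind it but never lets a later block be committed first, which is the ordering property the description relies on.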

sync_fetch_concurrency (default 32) was added and documented alongside sync_write_batch_bytes in the example config.
@emersonian emersonian changed the title from "perf(zaino-state): Asynchronous block fetches" to "perf(zaino-state): Concurrent block fetches" Jun 17, 2026
@zancas

zancas commented Jun 19, 2026

Member

@emersonian I am considering redirecting this to dev, opinions.

@emersonian

Contributor Author

It sped things up overall in my testing, and I think it is similar to the sync speedup in our lightwalletd fork, but it did not help much through the sandblasting period.

@emersonian

Contributor Author

If anything, HTTP/2 should definitely be merged

@AloeareV

Contributor

> @emersonian I am considering redirecting this to dev, opinions.

I am inclined to agree this should be moved to dev.

@zancas zancas self-assigned this Jun 26, 2026
@oolu4236 oolu4236 assigned idky137 and unassigned zancas Jun 30, 2026
@oolu4236 oolu4236 added the "Low Priority" label Jul 1, 2026
zancas added a commit that referenced this pull request Jul 3, 2026
Three moves toward the pilot's acceptance bar (ADR 0006: a mainnet
benchmark of an optimally fast index sync), informed by emersonian's
PR #1241, which measured the monolith's serial two-await-per-block
fetch collapsing to ~1 blk/s in the sandblast band with both zaino and
zebra CPU-idle:

- Instrument the build. run() returns SpendIndexBuildStats — worker
  count, blocks, spends, and wall-clock per stage (stream/extract,
  collate, bulk-load). Collate and load are identical across build
  variants, so comparing runs isolates the streaming stage. spawn_build
  logs the stats and gains loud, ignorable-when-unset benchmark env
  knobs: ZAINO_SPEND_INDEX_START_HEIGHT / _END_HEIGHT bound the built
  range; ZAINO_SPEND_INDEX_WORKERS sets the fan-out.

- Parallelize the streaming stage. Workers pull fixed-size 1000-block
  chunks from a shared atomic queue — chunk-pulling self-balances the
  block-weight skew (sandblast blocks dwarf 2016-era ones) — and
  extract into worker-local buffers; one global sort and one MDB_APPEND
  pass follow, so workers never touch the store (the single-writer
  discipline whose violation PR #1275 diagnosed as LMDB SIGSEGV).
  There is one code path, not two: workers = 1 IS the serial baseline,
  so serial-vs-parallel comparisons vary only the fan-out. The
  move-only single-build guarantee is untouched.

- Make the fetch roots-free. Workers extract directly from the zebra
  block (extract_spends_from_zebra_block): one get_block per height,
  no get_commitment_tree_roots await, no compact conversion of the
  discarded shielded data. The compact-form extractor remains as the
  test oracle, so the existing sync-loop and presence tests now
  cross-check the two extraction paths over the same chains.
  SpendIndexSync drops its network field (it only fed activation
  heights the extractor no longer needs).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
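
The chunk-pulling scheme in the second bullet of the commit message can be sketched with a shared atomic cursor and scoped threads. This is an illustration under stated assumptions, not the commit's code: `extract_block`, `build`, and the fake key derivation are hypothetical, and the final sort stands in for the collate step that precedes the single ordered bulk load.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical per-block extraction: returns (key, height) pairs for one block.
fn extract_block(height: u64) -> Vec<(u64, u64)> {
    // Fake "spends": a key derived from the height so the output is unsorted.
    vec![(height.wrapping_mul(2_654_435_761) % 1_000, height)]
}

/// Extract `start..end` with `workers` threads pulling `chunk`-sized ranges
/// from a shared atomic cursor, then sort once on the calling thread.
fn build(start: u64, end: u64, chunk: u64, workers: usize) -> Vec<(u64, u64)> {
    let chunk = chunk.max(1);
    let next = AtomicU64::new(start);
    let mut all: Vec<(u64, u64)> = thread::scope(|scope| {
        let mut handles = Vec::new();
        for _ in 0..workers.max(1) {
            handles.push(scope.spawn(|| {
                let mut local = Vec::new(); // worker-local buffer, no shared store
                loop {
                    // Claim the next chunk; fetch_add hands out disjoint ranges.
                    let lo = next.fetch_add(chunk, Ordering::Relaxed);
                    if lo >= end {
                        break;
                    }
                    let hi = lo.saturating_add(chunk).min(end);
                    for height in lo..hi {
                        local.extend(extract_block(height));
                    }
                }
                local
            }));
        }
        // Gather every worker's buffer on the calling thread.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    });
    // Single global sort; a real build would follow with one ordered bulk load.
    all.sort_unstable();
    all
}

fn main() {
    // workers = 1 is the serial baseline; only the fan-out differs between runs.
    let serial = build(0, 5_000, 1_000, 1);
    let parallel = build(0, 5_000, 1_000, 4);
    println!("same output: {}", serial == parallel);
}
```

Workers that land on cheap chunks simply come back for more, so heavy ranges do not pin the whole build to one thread, and because only the calling thread sees the merged buffer, the single-writer discipline holds by construction.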
Labels

Low Priority Low priority issues that are not currently on our roadmap

5 participants