perf(zaino-state): Concurrent block fetches #1241
Open
emersonian wants to merge 3 commits into
The bulk-sync loop in write_blocks_to_height was serial and fetch-latency-bound, awaiting two sequential validator RPCs per block (get_block + get_commitment_tree_roots) with no overlap. build_indexed_block_from_source was split into fetch_block_data (the parallelizable fetch) and assemble_indexed_block (the sequential chainwork + IndexedBlock build), and the v1 bulk path was driven through a futures buffered(N) stream that keeps up to N fetches in flight while yielding them in height order, so only the fetch wait was overlapped. The v0 and experimental paths kept the serial helper. sync_fetch_concurrency (default 32) was added and documented alongside sync_write_batch_bytes in the example config.
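The ordering property described above (up to N fetches in flight, results consumed strictly in height order) can be illustrated with a minimal std-only sketch. This is not the PR's code: the PR uses a futures buffered(N) stream, while this analogue swaps in OS threads and a sliding window of join handles so it runs without external crates. `fetch_block_data_stub`, `fetch_in_height_order`, and the `Fetched` type are hypothetical names introduced only for illustration.

```rust
use std::collections::VecDeque;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical payload: (height, block bytes stand-in).
type Fetched = (u32, String);

/// Hypothetical stand-in for the parallelizable fetch step.
/// The sleep varies with height so completions arrive out of order.
fn fetch_block_data_stub(height: u32) -> Fetched {
    thread::sleep(Duration::from_millis(u64::from(height * 7 % 5)));
    (height, format!("block-{height}"))
}

/// Keeps at most `concurrency` fetches in flight and returns results
/// in input order, mirroring the in-order yield of a buffered stream.
fn fetch_in_height_order(
    heights: impl IntoIterator<Item = u32>,
    concurrency: usize,
) -> Vec<Fetched> {
    let concurrency = concurrency.max(1);
    let mut pending = heights.into_iter().fuse();
    let mut in_flight: VecDeque<JoinHandle<Fetched>> = VecDeque::new();
    let mut committed = Vec::new();

    loop {
        // Top up the window until it is full or the input is exhausted.
        while in_flight.len() < concurrency {
            match pending.next() {
                Some(h) => in_flight.push_back(thread::spawn(move || fetch_block_data_stub(h))),
                None => break,
            }
        }
        // Always wait on the oldest fetch first, so the sequential step
        // (chainwork + assembly in the PR) sees blocks in height order.
        match in_flight.pop_front() {
            Some(handle) => committed.push(handle.join().expect("fetch thread panicked")),
            None => break,
        }
    }
    committed
}
```

Calling `fetch_in_height_order(100..110, 4)` returns the ten stub blocks in ascending height order even though individual fetches finish at different times; only the wait is overlapped, not the commit.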
…e RPC timeout to 30s
Member
@emersonian I am considering redirecting this to dev. Opinions?
Contributor
Author
It sped things up overall in my testing, and I think it is similar to the sync speedup in our lightwalletd fork, but it did not help much through the sandblasting period.
Contributor
Author
If anything, HTTP/2 should definitely be merged.
Contributor
I am inclined to agree this should be moved to dev.
zancas added a commit that referenced this pull request Jul 3, 2026
Three moves toward the pilot's acceptance bar (ADR 0006: a mainnet benchmark of an optimally fast index sync), informed by emersonian's PR #1241, which measured the monolith's serial two-await-per-block fetch collapsing to ~1 blk/s in the sandblast band with both zaino and zebra CPU-idle:

- Instrument the build. run() returns SpendIndexBuildStats: worker count, blocks, spends, and wall-clock per stage (stream/extract, collate, bulk-load). Collate and load are identical across build variants, so comparing runs isolates the streaming stage. spawn_build logs the stats and gains loud, ignorable-when-unset benchmark env knobs: ZAINO_SPEND_INDEX_START_HEIGHT / _END_HEIGHT bound the built range; ZAINO_SPEND_INDEX_WORKERS sets the fan-out.
- Parallelize the streaming stage. Workers pull fixed-size 1000-block chunks from a shared atomic queue (chunk-pulling self-balances the block-weight skew, since sandblast blocks dwarf 2016-era ones) and extract into worker-local buffers; one global sort and one MDB_APPEND pass follow, so workers never touch the store (the single-writer discipline whose violation PR #1275 diagnosed as LMDB SIGSEGV). There is one code path, not two: workers = 1 IS the serial baseline, so serial-vs-parallel comparisons vary only the fan-out. The move-only single-build guarantee is untouched.
- Make the fetch roots-free. Workers extract directly from the zebra block (extract_spends_from_zebra_block): one get_block per height, no get_commitment_tree_roots await, no compact conversion of the discarded shielded data. The compact-form extractor remains as the test oracle, so the existing sync-loop and presence tests now cross-check the two extraction paths over the same chains. SpendIndexSync drops its network field (it only fed activation heights the extractor no longer needs).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
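The chunk-pulling scheme in the second bullet can be sketched roughly as follows, assuming a simple atomic counter as the "shared atomic queue" and scoped std threads as the workers. Everything named here (`extract_spends_stub`, `build_sorted_records`, `SpendRecord`, `CHUNK_BLOCKS`) is hypothetical and stands in for the real extractor and store; the final sort plays the role of the single collate step that precedes the one append pass.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Chunk size taken from the commit message (1000 blocks per pull).
const CHUNK_BLOCKS: u64 = 1000;

/// Hypothetical record: (sort key, height).
type SpendRecord = (u64, u64);

/// Hypothetical extractor: one synthetic record per block, with keys
/// deliberately not in height order so the global sort does real work.
fn extract_spends_stub(height: u64) -> Vec<SpendRecord> {
    vec![(height * 7919 % 10_007, height)]
}

/// Workers claim chunks by bumping a shared counter, fill worker-local
/// buffers, and never touch shared output; one global sort follows.
fn build_sorted_records(start: u64, end: u64, workers: usize) -> Vec<SpendRecord> {
    let next_chunk = AtomicU64::new(start);

    let mut records: Vec<SpendRecord> = thread::scope(|scope| {
        let mut handles = Vec::new();
        for _ in 0..workers.max(1) {
            handles.push(scope.spawn(|| {
                let mut local = Vec::new();
                loop {
                    // Claim the next unclaimed chunk; slow chunks simply
                    // mean this worker claims fewer of them overall.
                    let lo = next_chunk.fetch_add(CHUNK_BLOCKS, Ordering::Relaxed);
                    if lo >= end {
                        break;
                    }
                    let hi = (lo + CHUNK_BLOCKS).min(end);
                    for height in lo..hi {
                        local.extend(extract_spends_stub(height));
                    }
                }
                local
            }));
        }
        // Merge the worker-local buffers on the calling thread.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    // Single global sort, so the result is independent of the fan-out.
    records.sort_unstable();
    records
}
```

With this shape, `build_sorted_records(0, 3500, 1)` and `build_sorted_records(0, 3500, 4)` produce identical output, which is the property that lets a workers = 1 run serve as the serial baseline.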
This PR pipelines the finalized bulk-sync fetch path: blocks are fetched concurrently and committed strictly in height order. It switches the validator RPC client to HTTP/2 (prior-knowledge h2c) so a single connection multiplexes all in-flight requests, removing the connection-per-request churn the new concurrency would otherwise put on Zebra, and retries transient connection errors (with exponential backoff in the sync loop).
Background: write_blocks_to_height was serial and fetch-latency-bound. Each block awaited two sequential RPCs (get_block + get_commitment_tree_roots) with zero overlap, so in the sandblast band we observed throughput collapse to ~1 blk/s while both Zaino and Zebra sat practically CPU-idle.
This update is an attempt to help Zaino sync past the sandblasted blocks faster.
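For readers unfamiliar with the retry behavior mentioned in the description, a minimal sketch of retrying transient errors with exponential backoff might look like the following. It is a blocking, std-only illustration and not the PR's implementation; `retry_with_backoff`, its parameters, and the `is_transient` predicate are assumptions introduced here, and the real sync loop is async and may classify errors differently.

```rust
use std::thread;
use std::time::Duration;

/// Runs `op` until it succeeds, a non-transient error occurs, or
/// `max_attempts` is reached. The delay doubles after each transient
/// failure and is clamped to `cap`.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    cap: Duration,
    mut op: impl FnMut() -> Result<T, E>,
    is_transient: impl Fn(&E) -> bool,
) -> Result<T, E> {
    let mut delay = base;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retry only transient failures, and only while attempts remain.
            Err(err) if attempt < max_attempts && is_transient(&err) => {
                thread::sleep(delay);
                delay = (delay * 2).min(cap);
                attempt += 1;
            }
            // Permanent error or attempts exhausted: surface it to the caller.
            Err(err) => return Err(err),
        }
    }
}
```

A caller would wrap a single fetch in the closure and pass a predicate that recognizes connection-level failures; errors the predicate rejects are returned immediately without sleeping.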