Single outpoint index by zancas · Pull Request #1330 · zingolabs/zaino

zancas · 2026-06-29T21:13:49Z

The front-door test set is named for the `packages/` directory it runs, so the plural is the faithful name. Hard-cut: `makers test package` now errors with `unknown set` rather than aliasing. Updates every reference site: Makefile.toml (default, case matcher, dispatch, descriptions, comment block), help.sh, docs/testing.md, README.md, and the live-tests/CONTEXT.md glossary term (now records the retired singular under _Avoid_). ADR-0004 gets a one-line supersession footer; its historical text is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Introduce the default-off `outp_to_spend_index` experimental feature gating a self-contained module for the parallel-buildable finalised spend-index POC. Adds the pure, statically read-free extractor extract_spends(&[IndexedBlock]) -> Vec<(Outpoint, TransactionHash)>, mapping each consumed transparent outpoint to its spending txid (coinbase null-prevout inputs skipped), plus a unit test for the coinbase-skip rule. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
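
To illustrate the coinbase-skip rule described above, here is a minimal sketch. The types are hypothetical, simplified stand-ins for zaino's `IndexedBlock`, `Outpoint` and `TransactionHash`; the real definitions are not reproduced here. The only chain rule assumed is that a coinbase input carries the null prevout (all-zero txid, index `u32::MAX`).

```rust
/// Hypothetical stand-in for a transparent outpoint (not zaino's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Outpoint {
    pub prev_txid: [u8; 32],
    pub index: u32,
}

impl Outpoint {
    /// A coinbase input spends nothing: its prevout is the null outpoint.
    pub fn is_null(&self) -> bool {
        self.prev_txid == [0u8; 32] && self.index == u32::MAX
    }
}

/// Hypothetical stand-in for a 32-byte transaction id.
pub type TransactionHash = [u8; 32];

/// Hypothetical stand-in for one transaction's transparent inputs.
pub struct Transaction {
    pub txid: TransactionHash,
    pub inputs: Vec<Outpoint>,
}

/// Hypothetical stand-in for an indexed block.
pub struct IndexedBlock {
    pub transactions: Vec<Transaction>,
}

/// Pure extractor: maps every consumed outpoint to the txid that spends it.
/// It reads nothing but its argument, and skips coinbase null prevouts.
pub fn extract_spends(blocks: &[IndexedBlock]) -> Vec<(Outpoint, TransactionHash)> {
    let mut spends = Vec::new();
    for block in blocks {
        for tx in &block.transactions {
            for input in &tx.inputs {
                if input.is_null() {
                    continue; // coinbase input: no outpoint is consumed
                }
                spends.push((*input, tx.txid));
            }
        }
    }
    spends
}
```

A block holding one coinbase transaction and one transaction spending its first output yields exactly one record, keyed by the spent outpoint.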

Add `collate`: encode each spend record as (outpoint key, bare 32-byte txid value), sort by key bytes (LMDB memcmp order, ready for MDB_APPEND), and reject duplicate keys (the disjoint-key invariant). Adds the EncodedSpend type and collate tests. The extractor carries a TODO for the buffer-reuse variant, to land with the per-worker loop. No integrity commitment (not MVP); the MDB_APPEND write folds into the loop slice, which opens the index's LMDB environment. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
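
The collate step can be sketched roughly as follows. The 36-byte key layout (32-byte previous txid followed by a 4-byte little-endian output index) and the `DuplicateKey` error are assumptions made for illustration; only the sort-by-key-bytes and reject-duplicates behaviour comes from the commit message.

```rust
/// Encoded record: outpoint key plus bare 32-byte txid value.
/// The 36-byte key width is an assumption for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EncodedSpend {
    pub key: [u8; 36],
    pub value: [u8; 32],
}

/// Hypothetical error carrying the offending key.
#[derive(Debug, PartialEq, Eq)]
pub struct DuplicateKey(pub [u8; 36]);

/// Assumed layout: prev txid bytes, then the output index in little-endian.
pub fn encode_key(prev_txid: &[u8; 32], index: u32) -> [u8; 36] {
    let mut key = [0u8; 36];
    key[..32].copy_from_slice(prev_txid);
    key[32..].copy_from_slice(&index.to_le_bytes());
    key
}

/// Encode each (prev_txid, output index, spending txid) record, sort by key
/// bytes, and reject any duplicate key.
pub fn collate(spends: &[([u8; 32], u32, [u8; 32])]) -> Result<Vec<EncodedSpend>, DuplicateKey> {
    let mut encoded: Vec<EncodedSpend> = spends
        .iter()
        .map(|(prev_txid, index, txid)| EncodedSpend {
            key: encode_key(prev_txid, *index),
            value: *txid,
        })
        .collect();

    // Byte-array comparison is lexicographic, which matches memcmp order.
    encoded.sort_by(|a, b| a.key.cmp(&b.key));

    // After sorting, equal keys are adjacent, so one pass finds duplicates.
    for pair in encoded.windows(2) {
        if pair[0].key == pair[1].key {
            return Err(DuplicateKey(pair[0].key));
        }
    }
    Ok(encoded)
}
```

Sorting before checking keeps the duplicate test to a single linear pass over adjacent entries.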

Pin that collate sorts by encoded key bytes (LMDB memcmp order), not numerically: for two spends sharing a prev_txid, the LE-encoded output index makes 256 (00 01 00 00) sort before 1 (01 00 00 00) — exactly what MDB_APPEND expects. Regression-guards the sort<->storage agreement. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
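
The byte-order point can be checked in isolation. This small sketch (the helper name is hypothetical) sorts output indices by their little-endian encoding rather than by numeric value:

```rust
/// Sort output indices the way a memcmp-ordered store sees them:
/// by their little-endian byte encoding, not numerically.
pub fn sort_by_key_bytes(mut indices: Vec<u32>) -> Vec<u32> {
    indices.sort_by_key(|index| index.to_le_bytes());
    indices
}
```

Here `256` encodes as `00 01 00 00` and `1` as `01 00 00 00`, so `sort_by_key_bytes(vec![1, 256, 2])` returns `[256, 1, 2]`.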

Open the index's own LMDB environment (no WRITE_MAP, mirroring the finalised state) holding a single outpoint -> spending_txid database. bulk_load writes collated entries via MDB_APPEND in one transaction (one-shot, globally-sorted input; a debug_assert documents the contract and LMDB self-checks it at runtime). spending_txid is a point read (None = unspent in finalised state). Round-trip test over a temp environment. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
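
The append contract can be modelled without LMDB. The sketch below is an in-memory stand-in, not the PR's `SpendIndexDb`: `ModelSpendIndex` and `LoadError` are hypothetical names, and a `BTreeMap` takes the place of the LMDB database. It mimics the documented MDB_APPEND behaviour of refusing a key that is not strictly greater than the previous one, and applies the batch all-or-nothing as a single transaction would.

```rust
use std::collections::BTreeMap;

/// Hypothetical error: the entry at `position` broke strictly ascending order.
#[derive(Debug, PartialEq, Eq)]
pub enum LoadError {
    OutOfOrder { position: usize },
}

/// In-memory model of an outpoint -> spending_txid store (not LMDB).
#[derive(Default)]
pub struct ModelSpendIndex {
    entries: BTreeMap<Vec<u8>, [u8; 32]>,
}

impl ModelSpendIndex {
    /// Append-style bulk load: every key must be strictly greater than the
    /// one before it. Validation happens first, so a rejected batch writes
    /// nothing (modelling one transaction).
    pub fn bulk_load(&mut self, sorted: &[(Vec<u8>, [u8; 32])]) -> Result<(), LoadError> {
        let existing_max: Option<Vec<u8>> = self.entries.keys().next_back().cloned();
        let mut last: Option<&[u8]> = existing_max.as_deref();
        for (position, (key, _)) in sorted.iter().enumerate() {
            if let Some(prev) = last {
                if key.as_slice() <= prev {
                    return Err(LoadError::OutOfOrder { position });
                }
            }
            last = Some(key.as_slice());
        }
        for (key, value) in sorted {
            self.entries.insert(key.clone(), *value);
        }
        Ok(())
    }

    /// Point read: `None` means no spend is recorded for this key.
    pub fn spending_txid(&self, key: &[u8]) -> Option<[u8; 32]> {
        self.entries.get(key).copied()
    }
}
```

In the real store the ordering check is LMDB's own; the model only makes the contract visible in a form that runs without an environment on disk.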

StateSource: a compile-time StateService-only BlockchainSource wrapping the zebra State connector; its forwarders are generated by a declarative macro that delegates to the tested ValidatorConnector::State dispatch (one construction site). SpendIndexSource marker admits StateSource + MockchainSource but not the zcashd/JSON-RPC Fetch path, so 'never FetchService' is a compile-time fact. SpendIndexSync<S: SpendIndexSource> is a move-only, single-run handle whose run(self) streams [start_height, finalised_tip] from the source, extracts spends, collates once, and MDB_APPEND-loads the index. start_height is configurable (genesis for a full index, or e.g. a network-upgrade activation). from_state is the production constructor and rejects the Fetch variant. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
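
The two compile-time guarantees (a marker trait that excludes the Fetch path, and a handle consumed by `run(self)`) can be sketched as below. The trait and struct names follow the commit message, but every body here is a simplified assumption: the real `BlockchainSource` is async and far larger, and `FetchSource` is a hypothetical stand-in for the JSON-RPC path.

```rust
use std::ops::RangeInclusive;

/// Simplified stand-in for the real trait (assumption: one sync method).
pub trait BlockchainSource {
    fn finalised_tip(&self) -> u32;
}

/// Marker trait: only sources that implement it may feed the spend index.
pub trait SpendIndexSource: BlockchainSource {}

pub struct StateSource { pub tip: u32 }
pub struct MockchainSource { pub tip: u32 }
/// Hypothetical stand-in for the zcashd/JSON-RPC Fetch path.
pub struct FetchSource { pub tip: u32 }

impl BlockchainSource for StateSource { fn finalised_tip(&self) -> u32 { self.tip } }
impl BlockchainSource for MockchainSource { fn finalised_tip(&self) -> u32 { self.tip } }
impl BlockchainSource for FetchSource { fn finalised_tip(&self) -> u32 { self.tip } }

impl SpendIndexSource for StateSource {}
impl SpendIndexSource for MockchainSource {}
// Deliberately no `impl SpendIndexSource for FetchSource`:
// `SpendIndexSync::new(FetchSource { tip: 0 }, 0)` does not compile.

/// Move-only, single-run handle.
pub struct SpendIndexSync<S: SpendIndexSource> {
    source: S,
    start_height: u32,
}

impl<S: SpendIndexSource> SpendIndexSync<S> {
    pub fn new(source: S, start_height: u32) -> Self {
        Self { source, start_height }
    }

    /// Takes `self` by value, so calling `run` twice is a compile error.
    /// This sketch only returns the height range the real build would stream.
    pub fn run(self) -> RangeInclusive<u32> {
        self.start_height..=self.source.finalised_tip()
    }
}
```

Because both rules live in the type system, neither needs a runtime check or a test for the rejected case.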

Drives SpendIndexSync::run over the 201-block regtest fixture via a MockchainSource and asserts run indexes exactly the finalised-range spends: every finalised spend mapped to its spender, every non-finalised spend absent, and an unspent outpoint reading back as None. The mockchain caps get_best_block_height at its max loaded height (200), so at the cfg(test) depth of 100 the finalised tip is 100; regtest coinbase maturity puts every fixture spend at height >=101, so this exercises the loop end to end in the exclusion direction (fetch -> finalised boundary -> empty collate -> LMDB -> read-back). Finalised-spend mapping with real records is covered by the spend_index_db round-trip; an end-to-end presence check needs a longer synthetic chain. multi_thread because run uses block_in_place for the LMDB write. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
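
The boundary arithmetic behind the exclusion-direction test is small enough to spell out. This sketch assumes the finalised tip is simply the best height minus the non-finalised depth; the function names are hypothetical and the saturating subtraction is an illustrative choice.

```rust
/// Assumed rule: finalised tip = best height minus the non-finalised depth.
pub fn finalised_tip(best_height: u32, non_finalised_depth: u32) -> u32 {
    best_height.saturating_sub(non_finalised_depth)
}

/// Heights of spends that fall inside [start_height, finalised tip].
pub fn indexed_heights(
    spend_heights: &[u32],
    start_height: u32,
    best_height: u32,
    non_finalised_depth: u32,
) -> Vec<u32> {
    let tip = finalised_tip(best_height, non_finalised_depth);
    spend_heights
        .iter()
        .copied()
        .filter(|height| (start_height..=tip).contains(height))
        .collect()
}
```

With a best height of 200 and a depth of 100 the tip is 100, so spends at heights 101 and above produce an empty set, which is the empty-collate case the fixture exercises.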

Adds a feature-gated BlockchainSource::finalised_spend_index_source bridge (default None; ValidatorConnector yields a StateSource for its State variant, None for Fetch) so the generic NodeBackedChainIndex can spawn the build without naming the concrete source. new_with_sync_timings spawns the one-shot build via outp_to_spend_index::spawn_build, which opens the index's own LMDB env (a sibling of the chain-index DB), builds from the Sapling activation height to the finalised tip, and logs its outcome. Replaces the redundant from_state constructor: the State/Fetch guard now lives in the trait override, and the never-Fetch guarantee still holds at the loop via SpendIndexSource. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
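
The bridge pattern (a trait method defaulting to `None`, overridden only where a state-backed source exists) might look roughly like this. The signatures are simplified assumptions, `OtherSource` and `would_spawn_build` are hypothetical, and the feature gate is shown only as a comment.

```rust
/// Simplified stand-in for the state-backed source.
#[derive(Debug, PartialEq, Eq)]
pub struct StateSource;

pub trait BlockchainSource {
    /// Default: this source cannot feed the spend index.
    /// In the PR this method is feature-gated; e.g. it would sit behind
    /// `#[cfg(feature = "outp_to_spend_index")]`.
    fn finalised_spend_index_source(&self) -> Option<StateSource> {
        None
    }
}

/// Simplified connector with the two variants named in the commit message.
pub enum ValidatorConnector {
    State,
    Fetch,
}

impl BlockchainSource for ValidatorConnector {
    fn finalised_spend_index_source(&self) -> Option<StateSource> {
        match self {
            ValidatorConnector::State => Some(StateSource),
            ValidatorConnector::Fetch => None,
        }
    }
}

/// Hypothetical source that keeps the default.
pub struct OtherSource;
impl BlockchainSource for OtherSource {}

/// Generic caller: decides whether to spawn the build without naming
/// the concrete source type.
pub fn would_spawn_build<S: BlockchainSource>(source: &S) -> bool {
    source.finalised_spend_index_source().is_some()
}
```

The generic caller stays unaware of the concrete type, which is the point of routing the State/Fetch decision through the trait override.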

Dev dropped async_trait (#1314, native AFIT) and reworked ChainWork into a NonZeroU128 primitive passed as Option<ChainWork> at call sites (#1313). Update the spend-index code to match: impl_state_source_forwarders! no longer emits #[async_trait] (its async fn forwarders satisfy BlockchainSource's native -> impl SendFut<_> methods), and SpendIndexSync::run passes None for build_indexed_block_from_source's now-Option<ChainWork> parent_chainwork (chainwork is irrelevant to spend extraction), dropping the unused ChainWork import. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Promotes the one-shot spend-index build from a detached task to a tracked one: NodeBackedChainIndex holds its JoinHandle (feature-gated) and aborts it in both shutdown() and Drop, so the build never outlives the index. Aborting is safe: run() accumulates spends in memory and writes the index in a single MDB_APPEND transaction at the very end, so an abort either leaves the in-memory work undone (nothing written) or lands after the atomic commit — never a partial index. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The 201-block fixture can't exercise a non-empty index from run (coinbase maturity puts every fixture spend above the depth-100 seam). This synthesises a chain with zebra's block generator (allow_all_transparent_coinbase_spends lets transparent spends land below the seam), deterministically searches via a seeded TestRunner for a chain whose finalised range holds a spend, runs the index, and asserts every finalised spend is mapped to its spender. Addresses the presence half (Part 1) of #1334. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Land the decision record #1328 promised before implementation (the 0002 slot the issue reserved was taken by the live-tests ADR). Beyond the issue's content, the record captures what has crystallised since:

- PR #1167's get_outpoint_spenders already performs the serve-time union with the non-finalised state (FullChain scope), so the remaining serving step is swapping its finalised leg from the monolith's spent table (outpoint -> TxLocation -> txid) to this index; get_outpoint_spenders(Finalised) doubles as an in-tree parity oracle alongside zebra's SpendingTransactionId.
- The start height is a first-class index floor: absence means "unspent within the built range". Serving deployments must run a genesis floor, enforced by config validation; the shipped config defaults to genesis with the Sapling-activation floor present only as a commented-out option.
- The one-shot pre-materialized MDB_APPEND pass is the deliberate speed-of-light baseline for the sync benchmark (~30 GB peak at genesis floor); a batched k-way-merge build is a measurement-gated contingency, not a foregone production requirement.
- Acceptance bar: one mainnet run producing the benchmark numbers, a parity diff against zebra, and golden vectors captured before zebra drops its indexer feature; a synthetic-chain parity test runs first to prove the oracle plumbing.

The chain-index glossary gains "index floor" (distinct from the finalized floor) and "spending transaction" (the term #1328 defined and promised to this file).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

cargo check --features outp_to_spend_index fails at the previous HEAD: the gated module still referenced NON_FINALIZED_DEPTH after this branch renamed the constant to OPERATIONAL_NFS_DEPTH. A default-off feature is compiled by no default build, so nothing caught the rename missing it. Fix the five references, and a clone-on-Copy in spawn_build. Also repoint the module header and feature comment from the phantom docs/adr/0002 path to the landed ADR 0006, and rewrite the SpendIndexDb::spending_txid doc to name the real serving plan: the serve-time union already exists as get_outpoint_spenders (FullChain scope, #1167); wiring this store in as that method's finalised leg is the deferred step. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The same rot pattern struck twice: this branch's constant rename missed the default-off spend-index module, and dev's async-trait -> native-AFIT migration broke transparent_address_history_experimental (E0446: pub(crate) AddrEventBytes leaking through the pub TransparentHistExt trait). Both were invisible because no default build compiles gated code, and the existing clippy --all-features lint is red with dev-inherited warnings, so it guards nothing in practice. Add a check-experimental task (cargo check --tests --features experimental_features) to the makers lint front door, and fix what its first run caught: TransparentHistExt narrows to pub(crate) — it has no users outside zaino-state and no re-export, so minimum visibility applies. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Three moves toward the pilot's acceptance bar (ADR 0006: a mainnet benchmark of an optimally fast index sync), informed by emersonian's PR #1241, which measured the monolith's serial two-await-per-block fetch collapsing to ~1 blk/s in the sandblast band with both zaino and zebra CPU-idle:

- Instrument the build. run() returns SpendIndexBuildStats: worker count, blocks, spends, and wall-clock per stage (stream/extract, collate, bulk-load). Collate and load are identical across build variants, so comparing runs isolates the streaming stage. spawn_build logs the stats and gains loud, ignorable-when-unset benchmark env knobs: ZAINO_SPEND_INDEX_START_HEIGHT / _END_HEIGHT bound the built range; ZAINO_SPEND_INDEX_WORKERS sets the fan-out.
- Parallelize the streaming stage. Workers pull fixed-size 1000-block chunks from a shared atomic queue (chunk-pulling self-balances the block-weight skew, since sandblast blocks dwarf 2016-era ones) and extract into worker-local buffers; one global sort and one MDB_APPEND pass follow, so workers never touch the store (the single-writer discipline whose violation PR #1275 diagnosed as LMDB SIGSEGV). There is one code path, not two: workers = 1 IS the serial baseline, so serial-vs-parallel comparisons vary only the fan-out. The move-only single-build guarantee is untouched.
- Make the fetch roots-free. Workers extract directly from the zebra block (extract_spends_from_zebra_block): one get_block per height, no get_commitment_tree_roots await, no compact conversion of the discarded shielded data. The compact-form extractor remains as the test oracle, so the existing sync-loop and presence tests now cross-check the two extraction paths over the same chains. SpendIndexSync drops its network field (it only fed activation heights the extractor no longer needs).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
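
The chunk-pulling fan-out can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's `run()`: `build_spends` is a hypothetical name, the `extract` closure stands in for a per-height block fetch plus spend extraction, and spend records are reduced to plain `u64` values. What it keeps from the description is the shape: a shared atomic chunk counter, worker-local buffers, and a single global sort after all workers finish.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Fixed chunk size, as in the commit message.
pub const CHUNK: u64 = 1000;

/// Stream heights `start..=end` through `workers` threads.
/// `extract` is a hypothetical per-height extractor returning spend records.
pub fn build_spends<F>(start: u32, end: u32, workers: usize, extract: F) -> Vec<u64>
where
    F: Fn(u32) -> Vec<u64> + Sync,
{
    // Shared "queue": the index of the next unclaimed chunk.
    let next_chunk = AtomicU64::new(0);

    let mut all: Vec<u64> = thread::scope(|scope| {
        let mut handles = Vec::new();
        for _ in 0..workers.max(1) {
            handles.push(scope.spawn(|| {
                let mut local = Vec::new(); // worker-local buffer
                loop {
                    // Claim the next chunk; a busy worker simply claims fewer.
                    let chunk = next_chunk.fetch_add(1, Ordering::Relaxed);
                    let first = u64::from(start) + chunk * CHUNK;
                    if first > u64::from(end) {
                        break;
                    }
                    let last = (first + CHUNK - 1).min(u64::from(end));
                    for height in first..=last {
                        local.extend(extract(height as u32));
                    }
                }
                local
            }));
        }
        // Merge worker buffers; no worker ever touches the final store.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    });

    // One global sort, standing in for collate before the single append pass.
    all.sort_unstable();
    all
}
```

With `workers = 1` the same function is the serial baseline, so comparing `build_spends(1, 2500, 1, f)` against `build_spends(1, 2500, 4, f)` varies only the fan-out and should give identical output.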

Wire zaino-state's dev-dependencies to zebra-state with the indexer and proptest-impl features: indexer exposes zebra's SpendingTransactionId read request — the temporary correctness oracle ADR 0006 designates for spend-index parity, to be captured as golden vectors before zebra removes the feature — and proptest-impl exposes populated_state for building oracle-backed test states. Dev-only; the production feature set is unchanged. Deliberately temporary, like the oracle itself: once parity is demonstrated and the golden vectors are captured, this dependency is reverted along with zebra's indexer feature. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The synthetic-chain leg of ADR 0006's acceptance bar, and the first consumer of the zebra-state indexer dev-dep: build the index through the real run() path (chunk queue, roots-free extraction, MDB_APPEND) from a deterministic synthetic chain, commit the same blocks into a real zebra ReadStateService via populated_state (checkpoint-verified, so the oracle's finalized spending_tx_loc index covers the whole chain), then compare every prevout spent anywhere on the chain:

- at or below the seam, the index and the oracle must agree byte-for-byte on the spending txid;
- above the seam, the one designed divergence is asserted rather than skipped: the finalised-only index is silent while the whole-chain oracle answers;
- a never-spent outpoint is None from both implementations.

The presence test's deterministic chain generation moves to a shared synthetic_chain test module (seeded TestRunner, so both tests see the same chain on every run); its assertions are unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
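
The three comparison rules can be expressed as one check. In this sketch the index and the oracle are both modelled as hash maps, outpoints are reduced to a hypothetical `u64` key, and `ParityError` is an invented error type; the real test queries the LMDB index and zebra's ReadStateService instead.

```rust
use std::collections::HashMap;

/// Hypothetical compact outpoint identifier for this sketch.
pub type OutpointKey = u64;
pub type Txid = [u8; 32];

#[derive(Debug, PartialEq, Eq)]
pub enum ParityError {
    /// At or below the seam the two answers differ (or the index is missing one).
    FinalisedDisagreement(OutpointKey),
    /// Above the seam the finalised-only index answered when it should be silent.
    IndexNotSilentAboveSeam(OutpointKey),
    /// The whole-chain oracle does not know a spend that is on the chain.
    OracleMissingSpend(OutpointKey),
}

/// `spends` lists every spent outpoint with the height of its spending block.
pub fn check_parity(
    spends: &[(OutpointKey, u32)],
    seam: u32,
    index: &HashMap<OutpointKey, Txid>,
    oracle: &HashMap<OutpointKey, Txid>,
) -> Result<(), ParityError> {
    for &(outpoint, spend_height) in spends {
        let from_index = index.get(&outpoint);
        let Some(from_oracle) = oracle.get(&outpoint) else {
            return Err(ParityError::OracleMissingSpend(outpoint));
        };
        if spend_height <= seam {
            // Finalised range: both must return the same spending txid.
            if from_index != Some(from_oracle) {
                return Err(ParityError::FinalisedDisagreement(outpoint));
            }
        } else if from_index.is_some() {
            // Designed divergence: only the oracle answers above the seam.
            return Err(ParityError::IndexNotSilentAboveSeam(outpoint));
        }
    }
    Ok(())
}
```

The never-spent case needs no loop: an outpoint absent from the chain's spend list should simply be missing from both maps.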

The roots-free refactor left build_indexed_block_from_source and zaino_common::Network used only by the test mods (the presence test's compact-form expected-value path), so the feature-on lib build warned on both. Surfaced by the parity test run's test-profile build. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

zancas force-pushed the single_outpoint_index branch 4 times, most recently from 01922a9 to 5f688dd June 30, 2026 04:39

zancas requested a review from AloeareV June 30, 2026 06:01

zancas force-pushed the single_outpoint_index branch 6 times, most recently from 0bbd2c5 to 6a77707 July 2, 2026 04:26

zancas and others added 19 commits July 3, 2026 00:14

fmt

01fba9c

zancas force-pushed the single_outpoint_index branch from ea9cfa6 to 1ce4e67 July 3, 2026 07:22

zancas changed the base branch from dev to drop_zingo_common_components July 3, 2026 07:22

Single outpoint index #1330
zancas wants to merge 19 commits into drop_zingo_common_components from single_outpoint_index
