Single outpoint index #1330
Draft
zancas wants to merge 19 commits into
The front-door test set is named for the `packages/` directory it runs, so the plural is the faithful name. Hard-cut: `makers test package` now errors with `unknown set` rather than aliasing. Updates every reference site: Makefile.toml (default, case matcher, dispatch, descriptions, comment block), help.sh, docs/testing.md, README.md, and the live-tests/CONTEXT.md glossary term (now records the retired singular under _Avoid_). ADR-0004 gets a one-line supersession footer; its historical text is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Introduce the default-off `outp_to_spend_index` experimental feature gating a self-contained module for the parallel-buildable finalised spend-index POC. Adds the pure, statically read-free extractor extract_spends(&[IndexedBlock]) -> Vec<(Outpoint, TransactionHash)>, mapping each consumed transparent outpoint to its spending txid (coinbase null-prevout inputs skipped), plus a unit test for the coinbase-skip rule. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
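The coinbase-skip rule described above can be sketched as follows. This is a minimal illustration, not the PR's code: `Outpoint`, `Tx`, and `Block` are hypothetical stand-ins for the real zaino types, and the null-prevout test (all-zero txid, index `u32::MAX`) follows the usual Bitcoin-style convention.

```rust
/// Hypothetical stand-in for the real outpoint type (field layout is an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Outpoint {
    pub prev_txid: [u8; 32],
    pub prev_index: u32,
}

impl Outpoint {
    /// A coinbase input carries a null prevout: all-zero txid and index u32::MAX.
    pub fn is_null(&self) -> bool {
        self.prev_txid == [0u8; 32] && self.prev_index == u32::MAX
    }
}

/// Hypothetical transaction: its own txid plus the outpoints it consumes.
pub struct Tx {
    pub txid: [u8; 32],
    pub inputs: Vec<Outpoint>,
}

/// Hypothetical block: just an ordered list of transactions.
pub struct Block {
    pub transactions: Vec<Tx>,
}

/// Map every consumed transparent outpoint to the txid that spends it,
/// skipping coinbase (null-prevout) inputs. Pure: reads only its argument.
pub fn extract_spends(blocks: &[Block]) -> Vec<(Outpoint, [u8; 32])> {
    blocks
        .iter()
        .flat_map(|block| block.transactions.iter())
        .flat_map(|tx| {
            tx.inputs
                .iter()
                .filter(|outpoint| !outpoint.is_null())
                .map(move |outpoint| (*outpoint, tx.txid))
        })
        .collect()
}
```

A block holding one coinbase transaction and one ordinary spend yields exactly one record, keyed by the spent outpoint.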
Add `collate`: encode each spend record as (outpoint key, bare 32-byte txid value), sort by key bytes (LMDB memcmp order, ready for MDB_APPEND), and reject duplicate keys (the disjoint-key invariant). Adds the EncodedSpend type and collate tests. The extractor carries a TODO for the buffer-reuse variant, to land with the per-worker loop. No integrity commitment (not MVP); the MDB_APPEND write folds into the loop slice, which opens the index's LMDB environment. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
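A rough sketch of the collate step, under stated assumptions: the 36-byte key layout (32-byte previous txid followed by a 4-byte little-endian output index) and the `DuplicateKey` error type are illustrative guesses, and the input tuple shape is chosen only to keep the example self-contained.

```rust
/// Hypothetical encoded record: 36-byte outpoint key, bare 32-byte txid value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EncodedSpend {
    pub key: [u8; 36],
    pub value: [u8; 32],
}

/// Hypothetical error for a violated disjoint-key invariant.
#[derive(Debug, PartialEq, Eq)]
pub struct DuplicateKey(pub [u8; 36]);

/// Assumed key layout: prev txid bytes, then the output index little-endian.
pub fn encode_key(prev_txid: &[u8; 32], prev_index: u32) -> [u8; 36] {
    let mut key = [0u8; 36];
    key[..32].copy_from_slice(prev_txid);
    key[32..].copy_from_slice(&prev_index.to_le_bytes());
    key
}

/// Encode each (prev_txid, prev_index, spending_txid) record, sort by key
/// bytes (lexicographic, i.e. memcmp order), and reject duplicate keys.
pub fn collate(
    spends: &[([u8; 32], u32, [u8; 32])],
) -> Result<Vec<EncodedSpend>, DuplicateKey> {
    let mut encoded: Vec<EncodedSpend> = spends
        .iter()
        .map(|(prev_txid, prev_index, spender)| EncodedSpend {
            key: encode_key(prev_txid, *prev_index),
            value: *spender,
        })
        .collect();
    // Byte arrays compare lexicographically, matching memcmp on equal-length keys.
    encoded.sort_unstable_by(|a, b| a.key.cmp(&b.key));
    // After sorting, any duplicate keys are adjacent.
    for pair in encoded.windows(2) {
        if pair[0].key == pair[1].key {
            return Err(DuplicateKey(pair[0].key));
        }
    }
    Ok(encoded)
}
```

Sorting first makes the duplicate check a single linear pass over adjacent pairs.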
Pin that collate sorts by encoded key bytes (LMDB memcmp order), not numerically: for two spends sharing a prev_txid, the LE-encoded output index makes 256 (00 01 00 00) sort before 1 (01 00 00 00) — exactly what MDB_APPEND expects. Regression-guards the sort<->storage agreement. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
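The ordering claim can be checked in isolation with the standard library alone. The helper name below is hypothetical; it only sorts output indices by their little-endian byte encoding, which is what a memcmp-ordered key would see.

```rust
/// Sort output indices by their little-endian byte encoding (memcmp order),
/// not by numeric value. Helper name is illustrative only.
pub fn sort_by_le_bytes(indices: &mut [u32]) {
    indices.sort_by(|a, b| a.to_le_bytes().cmp(&b.to_le_bytes()));
}
```

Since 256 encodes as `00 01 00 00` and 1 as `01 00 00 00`, sorting `[1, 256, 2]` this way gives `[256, 1, 2]`.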
Open the index's own LMDB environment (no WRITE_MAP, mirroring the finalised state) holding a single outpoint -> spending_txid database. bulk_load writes collated entries via MDB_APPEND in one transaction (one-shot, globally-sorted input; a debug_assert documents the contract and LMDB self-checks it at runtime). spending_txid is a point read (None = unspent in finalised state). Round-trip test over a temp environment. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
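The append-only contract can be illustrated without LMDB. The sketch below swaps the real environment for an in-memory `BTreeMap`, so it shows only the shape of the API (one-shot sorted load, point read returning `None` for unspent), not the storage behaviour; all type names are hypothetical.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for the index store (NOT LMDB; illustration only).
pub struct InMemorySpendIndex {
    entries: BTreeMap<Vec<u8>, [u8; 32]>,
}

/// Hypothetical error mirroring an append-order violation.
#[derive(Debug, PartialEq, Eq)]
pub enum LoadError {
    NotStrictlyAscending,
}

impl InMemorySpendIndex {
    pub fn new() -> Self {
        Self { entries: BTreeMap::new() }
    }

    /// One-shot load of globally sorted entries. Keys must be strictly
    /// ascending; the input is validated before anything is written, so a
    /// rejected load leaves the store untouched (like an aborted transaction).
    pub fn bulk_load(&mut self, sorted: &[(Vec<u8>, [u8; 32])]) -> Result<(), LoadError> {
        if !sorted.windows(2).all(|pair| pair[0].0 < pair[1].0) {
            return Err(LoadError::NotStrictlyAscending);
        }
        for (key, value) in sorted {
            self.entries.insert(key.clone(), *value);
        }
        Ok(())
    }

    /// Point read: None means the outpoint is unspent in the loaded range.
    pub fn spending_txid(&self, key: &[u8]) -> Option<[u8; 32]> {
        self.entries.get(key).copied()
    }
}
```

In the real store the ordering check is enforced by LMDB itself when `MDB_APPEND` is used; the explicit validation here merely stands in for that.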
StateSource: a compile-time StateService-only BlockchainSource wrapping the zebra State connector; its forwarders are generated by a declarative macro that delegates to the tested ValidatorConnector::State dispatch (one construction site). SpendIndexSource marker admits StateSource + MockchainSource but not the zcashd/JSON-RPC Fetch path, so 'never FetchService' is a compile-time fact. SpendIndexSync<S: SpendIndexSource> is a move-only, single-run handle whose run(self) streams [start_height, finalised_tip] from the source, extracts spends, collates once, and MDB_APPEND-loads the index. start_height is configurable (genesis for a full index, or e.g. a network-upgrade activation). from_state is the production constructor and rejects the Fetch variant. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
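The two compile-time guarantees (a marker trait that excludes the Fetch path, and a handle consumed by `run(self)`) follow a common Rust pattern, sketched below. The trait and struct bodies are simplified assumptions: `BlockSource` is a hypothetical one-method stand-in for `BlockchainSource`, and `run` here only counts heights instead of building an index.

```rust
/// Hypothetical minimal stand-in for the real BlockchainSource trait.
pub trait BlockSource {
    fn finalised_tip(&self) -> u32;
}

pub struct StateSource { pub tip: u32 }
pub struct MockchainSource { pub tip: u32 }
/// Stand-in for the JSON-RPC Fetch path.
pub struct FetchSource { pub tip: u32 }

impl BlockSource for StateSource { fn finalised_tip(&self) -> u32 { self.tip } }
impl BlockSource for MockchainSource { fn finalised_tip(&self) -> u32 { self.tip } }
impl BlockSource for FetchSource { fn finalised_tip(&self) -> u32 { self.tip } }

/// Marker trait: only sources that implement it may drive the sync.
pub trait SpendIndexSource: BlockSource {}
impl SpendIndexSource for StateSource {}
impl SpendIndexSource for MockchainSource {}
// Deliberately no impl for FetchSource, so SpendIndexSync<FetchSource>
// fails to type-check.

/// No Clone or Copy derive: the handle is move-only.
pub struct SpendIndexSync<S: SpendIndexSource> {
    source: S,
    start_height: u32,
}

impl<S: SpendIndexSource> SpendIndexSync<S> {
    pub fn new(source: S, start_height: u32) -> Self {
        Self { source, start_height }
    }

    /// Takes self by value, so a second run on the same handle is a compile
    /// error. Returns the number of heights in [start_height, finalised_tip].
    pub fn run(self) -> u32 {
        let tip = self.source.finalised_tip();
        if tip < self.start_height { 0 } else { tip - self.start_height + 1 }
    }
}
```

Writing `SpendIndexSync::new(FetchSource { tip: 100 }, 0)` is rejected by the compiler because `FetchSource` does not implement the marker, which is the sense in which "never FetchService" becomes a compile-time fact.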
Drives SpendIndexSync::run over the 201-block regtest fixture via a MockchainSource and asserts run indexes exactly the finalised-range spends: every finalised spend mapped to its spender, every non-finalised spend absent, and an unspent outpoint reading back as None. The mockchain caps get_best_block_height at its max loaded height (200), so at the cfg(test) depth of 100 the finalised tip is 100; regtest coinbase maturity puts every fixture spend at height >=101, so this exercises the loop end to end in the exclusion direction (fetch -> finalised boundary -> empty collate -> LMDB -> read-back). Finalised-spend mapping with real records is covered by the spend_index_db round-trip; an end-to-end presence check needs a longer synthetic chain. multi_thread because run uses block_in_place for the LMDB write. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
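The boundary arithmetic behind this test is small enough to write out. The function names are hypothetical and the saturating subtraction is an assumption about how the tip is derived; the numbers (best height 200, depth 100, spends at height 101 or above) come from the description above.

```rust
/// Finalised tip as best height minus the non-finalised depth.
/// Saturating subtraction is an assumption for chains shorter than the depth.
pub fn finalised_tip(best_height: u32, depth: u32) -> u32 {
    best_height.saturating_sub(depth)
}

/// Keep only the spend heights that fall at or below the finalised tip.
pub fn finalised_spend_heights(spend_heights: &[u32], best_height: u32, depth: u32) -> Vec<u32> {
    let tip = finalised_tip(best_height, depth);
    spend_heights.iter().copied().filter(|height| *height <= tip).collect()
}
```

With a best height of 200 and a depth of 100 the tip is 100, so spends at heights 101 and above are all filtered out and the collated set is empty.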
Adds a feature-gated BlockchainSource::finalised_spend_index_source bridge (default None; ValidatorConnector yields a StateSource for its State variant, None for Fetch) so the generic NodeBackedChainIndex can spawn the build without naming the concrete source. new_with_sync_timings spawns the one-shot build via outp_to_spend_index::spawn_build, which opens the index's own LMDB env (a sibling of the chain-index DB), builds from the Sapling activation height to the finalised tip, and logs its outcome. Replaces the redundant from_state constructor: the State/Fetch guard now lives in the trait override, and the never-Fetch guarantee still holds at the loop via SpendIndexSource. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Dev dropped async_trait (#1314, native AFIT) and reworked ChainWork into a NonZeroU128 primitive passed as Option<ChainWork> at call sites (#1313). Update the spend-index code to match: impl_state_source_forwarders! no longer emits #[async_trait] (its async fn forwarders satisfy BlockchainSource's native -> impl SendFut<_> methods), and SpendIndexSync::run passes None for build_indexed_block_from_source's now-Option<ChainWork> parent_chainwork (chainwork is irrelevant to spend extraction), dropping the unused ChainWork import. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Promotes the one-shot spend-index build from a detached task to a tracked one: NodeBackedChainIndex holds its JoinHandle (feature-gated) and aborts it in both shutdown() and Drop, so the build never outlives the index. Aborting is safe: run() accumulates spends in memory and writes the index in a single MDB_APPEND transaction at the very end, so an abort either leaves the in-memory work undone (nothing written) or lands after the atomic commit — never a partial index. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The 201-block fixture can't exercise a non-empty index from run (coinbase maturity puts every fixture spend above the depth-100 seam). This synthesises a chain with zebra's block generator (allow_all_transparent_coinbase_spends lets transparent spends land below the seam), deterministically searches via a seeded TestRunner for a chain whose finalised range holds a spend, runs the index, and asserts every finalised spend is mapped to its spender. Addresses the presence half (Part 1) of #1334. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Land the decision record #1328 promised before implementation (the 0002 slot the issue reserved was taken by the live-tests ADR). Beyond the issue's content, the record captures what has crystallised since:

- PR #1167's get_outpoint_spenders already performs the serve-time union with the non-finalised state (FullChain scope), so the remaining serving step is swapping its finalised leg from the monolith's spent table (outpoint -> TxLocation -> txid) to this index; get_outpoint_spenders(Finalised) doubles as an in-tree parity oracle alongside zebra's SpendingTransactionId.
- The start height is a first-class index floor: absence means "unspent within the built range". Serving deployments must run a genesis floor, enforced by config validation; the shipped config defaults to genesis with the Sapling-activation floor present only as a commented-out option.
- The one-shot pre-materialized MDB_APPEND pass is the deliberate speed-of-light baseline for the sync benchmark (~30 GB peak at genesis floor); a batched k-way-merge build is a measurement-gated contingency, not a foregone production requirement.
- Acceptance bar: one mainnet run producing the benchmark numbers, a parity diff against zebra, and golden vectors captured before zebra drops its indexer feature; a synthetic-chain parity test runs first to prove the oracle plumbing.

The chain-index glossary gains "index floor" (distinct from the finalized floor) and "spending transaction" (the term #1328 defined and promised to this file). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cargo check --features outp_to_spend_index fails at the previous HEAD: the gated module still referenced NON_FINALIZED_DEPTH after this branch renamed the constant to OPERATIONAL_NFS_DEPTH. A default-off feature is compiled by no default build, so nothing caught the rename missing it. Fix the five references, and a clone-on-Copy in spawn_build. Also repoint the module header and feature comment from the phantom docs/adr/0002 path to the landed ADR 0006, and rewrite the SpendIndexDb::spending_txid doc to name the real serving plan: the serve-time union already exists as get_outpoint_spenders (FullChain scope, #1167); wiring this store in as that method's finalised leg is the deferred step. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The same rot pattern struck twice: this branch's constant rename missed the default-off spend-index module, and dev's async-trait -> native-AFIT migration broke transparent_address_history_experimental (E0446: pub(crate) AddrEventBytes leaking through the pub TransparentHistExt trait). Both were invisible because no default build compiles gated code, and the existing clippy --all-features lint is red with dev-inherited warnings, so it guards nothing in practice. Add a check-experimental task (cargo check --tests --features experimental_features) to the makers lint front door, and fix what its first run caught: TransparentHistExt narrows to pub(crate) — it has no users outside zaino-state and no re-export, so minimum visibility applies. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three moves toward the pilot's acceptance bar (ADR 0006: a mainnet benchmark of an optimally fast index sync), informed by emersonian's PR #1241, which measured the monolith's serial two-await-per-block fetch collapsing to ~1 blk/s in the sandblast band with both zaino and zebra CPU-idle:

- Instrument the build. run() returns SpendIndexBuildStats: worker count, blocks, spends, and wall-clock per stage (stream/extract, collate, bulk-load). Collate and load are identical across build variants, so comparing runs isolates the streaming stage. spawn_build logs the stats and gains loud, ignorable-when-unset benchmark env knobs: ZAINO_SPEND_INDEX_START_HEIGHT / _END_HEIGHT bound the built range; ZAINO_SPEND_INDEX_WORKERS sets the fan-out.
- Parallelize the streaming stage. Workers pull fixed-size 1000-block chunks from a shared atomic queue (chunk-pulling self-balances the block-weight skew, since sandblast blocks dwarf 2016-era ones) and extract into worker-local buffers; one global sort and one MDB_APPEND pass follow, so workers never touch the store (the single-writer discipline whose violation PR #1275 diagnosed as LMDB SIGSEGV). There is one code path, not two: workers = 1 IS the serial baseline, so serial-vs-parallel comparisons vary only the fan-out. The move-only single-build guarantee is untouched.
- Make the fetch roots-free. Workers extract directly from the zebra block (extract_spends_from_zebra_block): one get_block per height, no get_commitment_tree_roots await, no compact conversion of the discarded shielded data. The compact-form extractor remains as the test oracle, so the existing sync-loop and presence tests now cross-check the two extraction paths over the same chains. SpendIndexSync drops its network field (it only fed activation heights the extractor no longer needs).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
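The chunk-pulling scheme can be sketched with scoped threads and a single atomic cursor. This is an illustration under assumptions, not the PR's implementation: the function name and the `extract` callback (standing in for per-height spend extraction) are hypothetical, and results are plain `u64` values rather than encoded spend records.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Fixed chunk size, matching the 1000-block chunks described above.
pub const CHUNK_SIZE: u64 = 1000;

/// Process heights [start, end] with `workers` threads. Each worker claims
/// the next chunk from a shared atomic cursor, extracts into a worker-local
/// buffer, and the buffers are merged and sorted once at the end.
/// `extract` is a hypothetical per-height extraction callback.
pub fn parallel_extract<F>(start: u64, end: u64, workers: usize, extract: F) -> Vec<u64>
where
    F: Fn(u64) -> Vec<u64> + Sync,
{
    let next = AtomicU64::new(start);
    let mut all = Vec::new();
    thread::scope(|scope| {
        let handles: Vec<_> = (0..workers.max(1))
            .map(|_| {
                scope.spawn(|| {
                    let mut local = Vec::new();
                    loop {
                        // Claim the next chunk; fetch_add hands out disjoint ranges.
                        let chunk_start = next.fetch_add(CHUNK_SIZE, Ordering::Relaxed);
                        if chunk_start > end {
                            break;
                        }
                        let chunk_end = (chunk_start + CHUNK_SIZE - 1).min(end);
                        for height in chunk_start..=chunk_end {
                            local.extend(extract(height));
                        }
                    }
                    local
                })
            })
            .collect();
        // Workers never touch shared output; merging happens on this thread.
        for handle in handles {
            all.extend(handle.join().expect("worker panicked"));
        }
    });
    // One global sort after all workers finish.
    all.sort_unstable();
    all
}
```

Calling it with `workers = 1` runs the same loop on a single thread, which mirrors the "one code path" point: the serial baseline is just the parallel build with a fan-out of one.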
Wire zaino-state's dev-dependencies to zebra-state with the indexer and proptest-impl features: indexer exposes zebra's SpendingTransactionId read request — the temporary correctness oracle ADR 0006 designates for spend-index parity, to be captured as golden vectors before zebra removes the feature — and proptest-impl exposes populated_state for building oracle-backed test states. Dev-only; the production feature set is unchanged. Deliberately temporary, like the oracle itself: once parity is demonstrated and the golden vectors are captured, this dependency is reverted along with zebra's indexer feature. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The synthetic-chain leg of ADR 0006's acceptance bar, and the first consumer of the zebra-state indexer dev-dep: build the index through the real run() path (chunk queue, roots-free extraction, MDB_APPEND) from a deterministic synthetic chain, commit the same blocks into a real zebra ReadStateService via populated_state (checkpoint-verified, so the oracle's finalized spending_tx_loc index covers the whole chain), then compare every prevout spent anywhere on the chain:

- at or below the seam, the index and the oracle must agree byte-for-byte on the spending txid;
- above the seam, the one designed divergence is asserted rather than skipped: the finalised-only index is silent while the whole-chain oracle answers;
- a never-spent outpoint is None from both implementations.

The presence test's deterministic chain generation moves to a shared synthetic_chain test module (seeded TestRunner, so both tests see the same chain on every run); its assertions are unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
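The three comparison rules can be expressed as a small check over two lookup tables. This is a sketch with in-memory maps standing in for the index and for zebra's oracle; the type aliases, the function name, and the choice to carry the spend height alongside each oracle entry are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical outpoint key: (previous txid, output index).
pub type OutpointKey = ([u8; 32], u32);
pub type Txid = [u8; 32];

/// Compare a finalised-only index against a whole-chain oracle.
/// Oracle values carry the spend height so the check knows which side of
/// the seam each spend falls on.
pub fn check_parity(
    index: &HashMap<OutpointKey, Txid>,
    oracle: &HashMap<OutpointKey, (u32, Txid)>,
    seam: u32,
) -> Result<(), String> {
    for (key, (height, txid)) in oracle {
        match (index.get(key), *height <= seam) {
            // At or below the seam: both must agree on the spending txid.
            (Some(found), true) if found == txid => {}
            // Above the seam: the index is silent while the oracle answers.
            (None, false) => {}
            (got, finalised) => {
                return Err(format!(
                    "output index {} (finalised: {}): index returned {:?}",
                    key.1, finalised, got
                ));
            }
        }
    }
    // Never-spent outpoints are absent from the oracle and must be absent here too.
    if let Some(extra) = index.keys().find(|key| !oracle.contains_key(*key)) {
        return Err(format!("index holds output index {} unknown to the oracle", extra.1));
    }
    Ok(())
}
```

Asserting the above-seam case explicitly, rather than skipping those outpoints, means an index that accidentally covers non-finalised spends fails the check.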
The roots-free refactor left build_indexed_block_from_source and zaino_common::Network used only by the test mods (the presence test's compact-form expected-value path), so the feature-on lib build warned on both. Surfaced by the parity test run's test-profile build. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Fixes: #1328