Single outpoint index #1330

Draft
zancas wants to merge 19 commits into drop_zingo_common_components from single_outpoint_index

@zancas zancas commented Jun 29, 2026

Member

Fixes: #1328

@zancas zancas force-pushed the single_outpoint_index branch 4 times, most recently from 01922a9 to 5f688dd Compare June 30, 2026 04:39
@zancas zancas requested a review from AloeareV June 30, 2026 06:01
@zancas zancas force-pushed the single_outpoint_index branch 6 times, most recently from 0bbd2c5 to 6a77707 Compare July 2, 2026 04:26
zancas and others added 19 commits July 3, 2026 00:14
The front-door test set is named for the `packages/` directory it runs, so
the plural is the faithful name. Hard-cut: `makers test package` now errors
with `unknown set` rather than aliasing.

Updates every reference site: Makefile.toml (default, case matcher, dispatch,
descriptions, comment block), help.sh, docs/testing.md, README.md, and the
live-tests/CONTEXT.md glossary term (now records the retired singular under
_Avoid_). ADR-0004 gets a one-line supersession footer; its historical text
is preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Introduce the default-off `outp_to_spend_index` experimental feature gating a self-contained module for the parallel-buildable finalised spend-index POC. Adds the pure, statically read-free extractor extract_spends(&[IndexedBlock]) -> Vec<(Outpoint, TransactionHash)>, mapping each consumed transparent outpoint to its spending txid (coinbase null-prevout inputs skipped), plus a unit test for the coinbase-skip rule.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add `collate`: encode each spend record as (outpoint key, bare 32-byte txid value), sort by key bytes (LMDB memcmp order, ready for MDB_APPEND), and reject duplicate keys (the disjoint-key invariant). Adds the EncodedSpend type and collate tests. The extractor carries a TODO for the buffer-reuse variant, to land with the per-worker loop. No integrity commitment (not MVP); the MDB_APPEND write folds into the loop slice, which opens the index's LMDB environment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
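
A minimal sketch of the collate step described in the commit above, not the PR's actual code. It assumes a 36-byte key laid out as the 32-byte previous txid followed by the little-endian output index. The `Outpoint` struct, `encode_key`, and `CollateError` are hypothetical stand-ins; only the names `collate` and `EncodedSpend` come from the commit message, and their real signatures may differ.

```rust
/// Hypothetical stand-in for the outpoint type used by the extractor.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Outpoint {
    pub prev_txid: [u8; 32],
    pub index: u32,
}

/// One encoded record: outpoint key plus a bare 32-byte txid value.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct EncodedSpend {
    pub key: [u8; 36],
    pub value: [u8; 32],
}

/// Hypothetical error type for a violated disjoint-key invariant.
#[derive(Debug, PartialEq, Eq)]
pub enum CollateError {
    DuplicateKey([u8; 36]),
}

/// Assumed key layout: prev_txid bytes, then the output index in little-endian.
pub fn encode_key(outpoint: &Outpoint) -> [u8; 36] {
    let mut key = [0u8; 36];
    key[..32].copy_from_slice(&outpoint.prev_txid);
    key[32..].copy_from_slice(&outpoint.index.to_le_bytes());
    key
}

/// Encode every spend, sort by raw key bytes, and reject duplicate keys.
pub fn collate(spends: &[(Outpoint, [u8; 32])]) -> Result<Vec<EncodedSpend>, CollateError> {
    let mut encoded: Vec<EncodedSpend> = spends
        .iter()
        .map(|(outpoint, txid)| EncodedSpend {
            key: encode_key(outpoint),
            value: *txid,
        })
        .collect();

    // Byte-wise (lexicographic) comparison of the key arrays.
    encoded.sort_unstable_by(|a, b| a.key.cmp(&b.key));

    // After sorting, any duplicate keys are adjacent.
    for pair in encoded.windows(2) {
        if pair[0].key == pair[1].key {
            return Err(CollateError::DuplicateKey(pair[0].key));
        }
    }
    Ok(encoded)
}
```

Sorting first makes the duplicate check a single linear pass over neighbouring entries, which is one plausible way to enforce the disjoint-key invariant without a separate set.
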
Pin that collate sorts by encoded key bytes (LMDB memcmp order), not numerically: for two spends sharing a prev_txid, the LE-encoded output index makes 256 (00 01 00 00) sort before 1 (01 00 00 00) — exactly what MDB_APPEND expects. Regression-guards the sort<->storage agreement.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
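
The byte-order point in the commit above can be checked in isolation. The helper below is a hypothetical illustration (`le_byte_order` is not a name from the PR): it compares two output indices by their little-endian encodings, which is how a memcmp-ordered key comparison would see them.

```rust
use std::cmp::Ordering;

/// Compare two output indices by the bytes of their little-endian encodings,
/// rather than by numeric value.
pub fn le_byte_order(a: u32, b: u32) -> Ordering {
    a.to_le_bytes().cmp(&b.to_le_bytes())
}
```

With this helper, `le_byte_order(256, 1)` is `Ordering::Less`, because 256 encodes as `00 01 00 00` and 1 as `01 00 00 00`, even though 256 is numerically larger.
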
Open the index's own LMDB environment (no WRITE_MAP, mirroring the finalised state) holding a single outpoint -> spending_txid database. bulk_load writes collated entries via MDB_APPEND in one transaction (one-shot, globally-sorted input; a debug_assert documents the contract and LMDB self-checks it at runtime). spending_txid is a point read (None = unspent in finalised state). Round-trip test over a temp environment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
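
To illustrate the append-mode contract described above, here is a sketch over an in-memory stand-in, not LMDB itself and not the PR's `SpendIndexDb`. `AppendOnlyStore` is a hypothetical type; the method names `bulk_load` and `spending_txid` are borrowed from the commit message, but their real signatures are assumptions. The stand-in mimics two properties: keys must arrive in strictly increasing byte order, and a failed load writes nothing.

```rust
/// In-memory stand-in for a store loaded in append mode (not LMDB).
#[derive(Default)]
pub struct AppendOnlyStore {
    entries: Vec<(Vec<u8>, Vec<u8>)>,
}

impl AppendOnlyStore {
    /// Append a batch whose keys must be strictly increasing in byte order,
    /// both within the batch and relative to the last stored key.
    /// The batch is staged first, so an error leaves the store unchanged.
    pub fn bulk_load(&mut self, batch: Vec<(Vec<u8>, Vec<u8>)>) -> Result<(), String> {
        let mut staged = Vec::with_capacity(batch.len());
        let mut last: Option<Vec<u8>> = self.entries.last().map(|(key, _)| key.clone());
        for (key, value) in batch {
            if let Some(prev) = &last {
                if key <= *prev {
                    return Err(format!("key {:?} is not greater than {:?}", key, prev));
                }
            }
            last = Some(key.clone());
            staged.push((key, value));
        }
        self.entries.extend(staged);
        Ok(())
    }

    /// Point read: `None` means the key was never loaded.
    pub fn spending_txid(&self, key: &[u8]) -> Option<&[u8]> {
        self.entries
            .binary_search_by(|(stored, _)| stored.as_slice().cmp(key))
            .ok()
            .map(|position| self.entries[position].1.as_slice())
    }
}
```

Because entries are only ever appended in sorted order, the point read can use a binary search; in the real index that role is played by the database's own key lookup.
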
StateSource: a compile-time StateService-only BlockchainSource wrapping the zebra State connector; its forwarders are generated by a declarative macro that delegates to the tested ValidatorConnector::State dispatch (one construction site).

SpendIndexSource marker admits StateSource + MockchainSource but not the zcashd/JSON-RPC Fetch path, so 'never FetchService' is a compile-time fact.

SpendIndexSync<S: SpendIndexSource> is a move-only, single-run handle whose run(self) streams [start_height, finalised_tip] from the source, extracts spends, collates once, and MDB_APPEND-loads the index. start_height is configurable (genesis for a full index, or e.g. a network-upgrade activation). from_state is the production constructor and rejects the Fetch variant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
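
The marker-trait arrangement described above can be sketched with synchronous stand-ins. Everything here is simplified for illustration: the real `BlockchainSource` is async and far larger, `FetchSource` is a hypothetical name for the excluded Fetch path, and `name` and `new` are invented so the sketch has something to call.

```rust
/// Simplified stand-in for the real (async) source trait.
pub trait BlockchainSource {
    fn name(&self) -> &'static str;
}

pub struct StateSource;
pub struct MockchainSource;
/// Hypothetical stand-in for the Fetch (JSON-RPC) path.
pub struct FetchSource;

impl BlockchainSource for StateSource {
    fn name(&self) -> &'static str {
        "state"
    }
}
impl BlockchainSource for MockchainSource {
    fn name(&self) -> &'static str {
        "mockchain"
    }
}
impl BlockchainSource for FetchSource {
    fn name(&self) -> &'static str {
        "fetch"
    }
}

/// Marker trait: implemented only for the sources allowed to feed the index.
pub trait SpendIndexSource: BlockchainSource {}
impl SpendIndexSource for StateSource {}
impl SpendIndexSource for MockchainSource {}
// Deliberately no impl for FetchSource.

/// Move-only handle: no Clone or Copy, and `run` takes `self` by value.
pub struct SpendIndexSync<S: SpendIndexSource> {
    source: S,
    start_height: u32,
}

impl<S: SpendIndexSource> SpendIndexSync<S> {
    pub fn new(source: S, start_height: u32) -> Self {
        Self { source, start_height }
    }

    /// Consumes the handle, so a second run on the same handle cannot compile.
    pub fn run(self) -> String {
        format!(
            "indexed from height {} via {}",
            self.start_height,
            self.source.name()
        )
    }
}
```

Under these assumptions, `SpendIndexSync::new(FetchSource, 0)` is rejected by the compiler because `FetchSource` does not implement `SpendIndexSource`, and calling `run` twice on one handle is rejected as a use after move.
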
Drives SpendIndexSync::run over the 201-block regtest fixture via a MockchainSource and asserts that run indexes exactly the finalised-range spends: every finalised spend mapped to its spender, every non-finalised spend absent, and an unspent outpoint reading back as None.

The mockchain caps get_best_block_height at its max loaded height (200), so at the cfg(test) depth of 100 the finalised tip is 100; regtest coinbase maturity puts every fixture spend at height >=101, so this exercises the loop end to end in the exclusion direction (fetch -> finalised boundary -> empty collate -> LMDB -> read-back). Finalised-spend mapping with real records is covered by the spend_index_db round-trip; an end-to-end presence check needs a longer synthetic chain.

The test uses the multi_thread runtime because run uses block_in_place for the LMDB write.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
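
The boundary arithmetic in the commit above can be written out as a small sketch. It assumes the finalised tip is the best height minus the depth, which matches the numbers quoted (200 and 100 giving 100) but is not confirmed as the exact formula in the code; both function names are hypothetical.

```rust
/// Assumed formula: the finalised tip trails the best block by `depth`.
pub fn finalised_tip(best_height: u32, depth: u32) -> u32 {
    best_height.saturating_sub(depth)
}

/// Whether a spend at `spend_height` falls inside the inclusive built range
/// [start_height, finalised_tip].
pub fn in_built_range(spend_height: u32, start_height: u32, best_height: u32, depth: u32) -> bool {
    (start_height..=finalised_tip(best_height, depth)).contains(&spend_height)
}
```

For the fixture's numbers, `finalised_tip(200, 100)` is 100, so a spend at height 101 is outside the built range while one at height 100 would be inside it.
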
Adds a feature-gated BlockchainSource::finalised_spend_index_source bridge (default None; ValidatorConnector yields a StateSource for its State variant, None for Fetch) so the generic NodeBackedChainIndex can spawn the build without naming the concrete source. new_with_sync_timings spawns the one-shot build via outp_to_spend_index::spawn_build, which opens the index's own LMDB env (a sibling of the chain-index DB), builds from the Sapling activation height to the finalised tip, and logs its outcome. Replaces the redundant from_state constructor: the State/Fetch guard now lives in the trait override, and the never-Fetch guarantee still holds at the loop via SpendIndexSource.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Dev dropped async_trait (#1314, native AFIT) and reworked ChainWork into a NonZeroU128 primitive passed as Option<ChainWork> at call sites (#1313). Update the spend-index code to match: impl_state_source_forwarders! no longer emits #[async_trait] (its async fn forwarders satisfy BlockchainSource's native -> impl SendFut<_> methods), and SpendIndexSync::run passes None for build_indexed_block_from_source's now-Option<ChainWork> parent_chainwork (chainwork is irrelevant to spend extraction), dropping the unused ChainWork import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Promotes the one-shot spend-index build from a detached task to a tracked one: NodeBackedChainIndex holds its JoinHandle (feature-gated) and aborts it in both shutdown() and Drop, so the build never outlives the index. Aborting is safe: run() accumulates spends in memory and writes the index in a single MDB_APPEND transaction at the very end, so an abort either leaves the in-memory work undone (nothing written) or lands after the atomic commit — never a partial index.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The 201-block fixture can't exercise a non-empty index from run (coinbase maturity puts every fixture spend above the depth-100 seam). This synthesises a chain with zebra's block generator (allow_all_transparent_coinbase_spends lets transparent spends land below the seam), deterministically searches via a seeded TestRunner for a chain whose finalised range holds a spend, runs the index, and asserts every finalised spend is mapped to its spender. Addresses the presence half (Part 1) of #1334.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Land the decision record #1328 promised before implementation (the
0002 slot the issue reserved was taken by the live-tests ADR). Beyond
the issue's content, the record captures what has crystallised since:

- PR #1167's get_outpoint_spenders already performs the serve-time
  union with the non-finalised state (FullChain scope), so the
  remaining serving step is swapping its finalised leg from the
  monolith's spent table (outpoint -> TxLocation -> txid) to this
  index; get_outpoint_spenders(Finalised) doubles as an in-tree
  parity oracle alongside zebra's SpendingTransactionId.
- The start height is a first-class index floor: absence means
  "unspent within the built range". Serving deployments must run a
  genesis floor, enforced by config validation; the shipped config
  defaults to genesis with the Sapling-activation floor present only
  as a commented-out option.
- The one-shot pre-materialized MDB_APPEND pass is the deliberate
  speed-of-light baseline for the sync benchmark (~30 GB peak at
  genesis floor); a batched k-way-merge build is a measurement-gated
  contingency, not a foregone production requirement.
- Acceptance bar: one mainnet run producing the benchmark numbers, a
  parity diff against zebra, and golden vectors captured before zebra
  drops its indexer feature; a synthetic-chain parity test runs first
  to prove the oracle plumbing.

The chain-index glossary gains "index floor" (distinct from the
finalized floor) and "spending transaction" (the term #1328 defined
and promised to this file).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
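
The genesis-floor rule for serving deployments could be enforced along the following lines. This is a sketch under stated assumptions: `SpendIndexConfig`, its fields, and `validate` are hypothetical names, and the real config validation in the PR may be shaped differently.

```rust
/// Height of the genesis block.
pub const GENESIS_HEIGHT: u32 = 0;

/// Hypothetical config shape: the index floor plus whether this deployment
/// answers queries from the index.
pub struct SpendIndexConfig {
    pub start_height: u32,
    pub serving: bool,
}

/// Reject a serving deployment whose index floor is above genesis.
/// With a higher floor, "absent" only means "unspent within the built range",
/// which is not a safe answer to serve.
pub fn validate(config: &SpendIndexConfig) -> Result<(), String> {
    if config.serving && config.start_height != GENESIS_HEIGHT {
        return Err(format!(
            "serving requires a genesis floor, got start_height {}",
            config.start_height
        ));
    }
    Ok(())
}
```

A non-serving run, such as a benchmark bounded to a later start height, passes validation in this sketch because its absence results are never served.
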
cargo check --features outp_to_spend_index fails at the previous HEAD:
the gated module still referenced NON_FINALIZED_DEPTH after this
branch renamed the constant to OPERATIONAL_NFS_DEPTH. A default-off
feature is compiled by no default build, so nothing caught the rename
missing it. Fix the five references, and a clone-on-Copy in
spawn_build.

Also repoint the module header and feature comment from the phantom
docs/adr/0002 path to the landed ADR 0006, and rewrite the
SpendIndexDb::spending_txid doc to name the real serving plan: the
serve-time union already exists as get_outpoint_spenders (FullChain
scope, #1167); wiring this store in as that method's finalised leg is
the deferred step.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The same rot pattern struck twice: this branch's constant rename
missed the default-off spend-index module, and dev's async-trait ->
native-AFIT migration broke transparent_address_history_experimental
(E0446: pub(crate) AddrEventBytes leaking through the pub
TransparentHistExt trait). Both were invisible because no default
build compiles gated code, and the existing clippy --all-features
lint is red with dev-inherited warnings, so it guards nothing in
practice.

Add a check-experimental task (cargo check --tests --features
experimental_features) to the makers lint front door, and fix what
its first run caught: TransparentHistExt narrows to pub(crate) — it
has no users outside zaino-state and no re-export, so minimum
visibility applies.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three moves toward the pilot's acceptance bar (ADR 0006: a mainnet
benchmark of an optimally fast index sync), informed by emersonian's
PR #1241, which measured the monolith's serial two-await-per-block
fetch collapsing to ~1 blk/s in the sandblast band with both zaino and
zebra CPU-idle:

- Instrument the build. run() returns SpendIndexBuildStats — worker
  count, blocks, spends, and wall-clock per stage (stream/extract,
  collate, bulk-load). Collate and load are identical across build
  variants, so comparing runs isolates the streaming stage. spawn_build
  logs the stats and gains loud, ignorable-when-unset benchmark env
  knobs: ZAINO_SPEND_INDEX_START_HEIGHT / _END_HEIGHT bound the built
  range; ZAINO_SPEND_INDEX_WORKERS sets the fan-out.

- Parallelize the streaming stage. Workers pull fixed-size 1000-block
  chunks from a shared atomic queue — chunk-pulling self-balances the
  block-weight skew (sandblast blocks dwarf 2016-era ones) — and
  extract into worker-local buffers; one global sort and one MDB_APPEND
  pass follow, so workers never touch the store (the single-writer
  discipline whose violation PR #1275 diagnosed as LMDB SIGSEGV).
  There is one code path, not two: workers = 1 IS the serial baseline,
  so serial-vs-parallel comparisons vary only the fan-out. The
  move-only single-build guarantee is untouched.

- Make the fetch roots-free. Workers extract directly from the zebra
  block (extract_spends_from_zebra_block): one get_block per height,
  no get_commitment_tree_roots await, no compact conversion of the
  discarded shielded data. The compact-form extractor remains as the
  test oracle, so the existing sync-loop and presence tests now
  cross-check the two extraction paths over the same chains.
  SpendIndexSync drops its network field (it only fed activation
  heights the extractor no longer needs).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
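
The chunk-pulling scheme in the commit above can be sketched with standard-library threads. This is an illustration, not the PR's implementation: the real build is async and fetches blocks from zebra, whereas here `build_spends` is a hypothetical function, spends are reduced to `(u64, u64)` pairs, and `extract` stands in for the per-block extractor. Only the 1000-block chunk size and the overall shape (shared atomic queue, worker-local buffers, one global sort) come from the commit message.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Fixed chunk size, as stated in the commit message.
pub const CHUNK_SIZE: u64 = 1000;

/// Extract over the inclusive height range [start, end] with `workers` threads.
/// Workers claim chunk numbers from a shared atomic counter and fill
/// worker-local buffers; the buffers are merged and sorted once at the end.
pub fn build_spends<F>(start: u64, end: u64, workers: usize, extract: F) -> Vec<(u64, u64)>
where
    F: Fn(u64) -> Vec<(u64, u64)> + Sync,
{
    if end < start {
        return Vec::new();
    }
    let chunk_count = (end - start) / CHUNK_SIZE + 1;
    let next_chunk = AtomicU64::new(0);

    let mut all: Vec<(u64, u64)> = thread::scope(|scope| {
        let mut handles = Vec::new();
        for _ in 0..workers.max(1) {
            handles.push(scope.spawn(|| {
                let mut local = Vec::new();
                loop {
                    // Claim the next unprocessed chunk; stop when none remain.
                    let chunk = next_chunk.fetch_add(1, Ordering::Relaxed);
                    if chunk >= chunk_count {
                        break;
                    }
                    let lo = start + chunk * CHUNK_SIZE;
                    let hi = (lo + CHUNK_SIZE - 1).min(end);
                    for height in lo..=hi {
                        local.extend(extract(height));
                    }
                }
                local
            }));
        }
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    });

    // One global sort; only the caller would then write to the store.
    all.sort_unstable();
    all
}
```

Calling `build_spends` with `workers` set to 1 runs the same code path on a single thread, so the output for any worker count should be identical after the final sort.
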
Wire zaino-state's dev-dependencies to zebra-state with the indexer
and proptest-impl features: indexer exposes zebra's
SpendingTransactionId read request — the temporary correctness oracle
ADR 0006 designates for spend-index parity, to be captured as golden
vectors before zebra removes the feature — and proptest-impl exposes
populated_state for building oracle-backed test states. Dev-only; the
production feature set is unchanged.

Deliberately temporary, like the oracle itself: once parity is
demonstrated and the golden vectors are captured, this dependency is
reverted along with zebra's indexer feature.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The synthetic-chain leg of ADR 0006's acceptance bar, and the first
consumer of the zebra-state indexer dev-dep: build the index through
the real run() path (chunk queue, roots-free extraction, MDB_APPEND)
from a deterministic synthetic chain, commit the same blocks into a
real zebra ReadStateService via populated_state (checkpoint-verified,
so the oracle's finalized spending_tx_loc index covers the whole
chain), then compare every prevout spent anywhere on the chain:

- at or below the seam, the index and the oracle must agree
  byte-for-byte on the spending txid;
- above the seam, the one designed divergence is asserted rather than
  skipped: the finalised-only index is silent while the whole-chain
  oracle answers;
- a never-spent outpoint is None from both implementations.

The presence test's deterministic chain generation moves to a shared
synthetic_chain test module (seeded TestRunner, so both tests see the
same chain on every run); its assertions are unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
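
The three parity rules listed in the commit above can be expressed as one checking function. This is a sketch over plain hash maps, not the actual test: the real comparison queries the LMDB index and a zebra ReadStateService, while `ChainSpend`, `check_parity`, and the map-based index and oracle are hypothetical stand-ins.

```rust
use std::collections::HashMap;

pub type OutpointKey = [u8; 36];
pub type Txid = [u8; 32];

/// One spend observed on the chain.
pub struct ChainSpend {
    pub outpoint: OutpointKey,
    pub height: u32,
    pub spender: Txid,
}

/// Check the index against the oracle for every spend on the chain.
/// `seam` is the highest finalised height covered by the index.
pub fn check_parity(
    spends: &[ChainSpend],
    never_spent: &OutpointKey,
    seam: u32,
    index: &HashMap<OutpointKey, Txid>,
    oracle: &HashMap<OutpointKey, Txid>,
) -> Result<(), String> {
    for spend in spends {
        let from_index = index.get(&spend.outpoint);
        let from_oracle = oracle.get(&spend.outpoint);

        // The whole-chain oracle must know every spend.
        if from_oracle != Some(&spend.spender) {
            return Err(format!("oracle missed the spend at height {}", spend.height));
        }
        if spend.height <= seam {
            // At or below the seam: both must agree on the spending txid.
            if from_index != from_oracle {
                return Err(format!("mismatch at finalised height {}", spend.height));
            }
        } else if from_index.is_some() {
            // Above the seam: the finalised-only index must stay silent.
            return Err(format!("index answered above the seam at height {}", spend.height));
        }
    }
    // A never-spent outpoint must be absent from both.
    if index.contains_key(never_spent) || oracle.contains_key(never_spent) {
        return Err("never-spent outpoint has a recorded spender".to_string());
    }
    Ok(())
}
```

Asserting the above-seam divergence explicitly, rather than skipping those spends, means the check also fails if the index ever starts returning non-finalised data.
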
The roots-free refactor left build_indexed_block_from_source and
zaino_common::Network used only by the test mods (the presence test's
compact-form expected-value path), so the feature-on lib build warned
on both. Surfaced by the parity test run's test-profile build.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@zancas zancas force-pushed the single_outpoint_index branch from ea9cfa6 to 1ce4e67 Compare July 3, 2026 07:22
@zancas zancas changed the base branch from dev to drop_zingo_common_components July 3, 2026 07:22

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Finalised spend-index pilot: parallel-buildable transparent_outpoint → spending_txid (refines #1326)

1 participant