kv: secondary rows reference primary by chainbase id#304
Conversation
Construct an object at a caller-supplied chainbase id rather than the next monotonic value, advancing _next_id so subsequent emplace() calls cannot reuse ids. Used by the snapshot loader to reinstate objects at their original ids so cross-references (e.g. kv_index_object::primary_id -> kv_object::id) remain valid across save+load. Must not be called inside an undo session; the undo stack's old_next_id bookkeeping assumes monotonic id assignment.
Replace shared_blob pri_key in kv_index_object with kv_object::id_type primary_id (8 bytes). Secondary rows no longer carry a copy of the primary key, which can be large (>=256 bytes) when contracts use long composite keys. The saved bytes scale per-row per-secondary-index. Composite index tiebreaker changes from pri_key lexicographic bytes to primary_id (chainbase insertion order). Within a duplicate sec_key the iteration order now matches the order rows were inserted rather than their primary-key byte ordering. Intrinsic ABI changes: - kv_set/kv_erase return the primary row's chainbase id (was: billable RAM delta, which CDT didn't use). - kv_idx_store/remove/update take int64_t primary_id instead of (pri_key, pri_key_size). Callers thread the id from the preceding kv_set/kv_erase. - kv_it_value accepts secondary iterators; resolves the value via the slot's cached primary_id with an O(1) by_id hop, faster than the old by_code_key path. kv_idx_primary_key materializes pri_key bytes lazily from the primary kv_object on demand, caching in the iterator slot until the next move. Snapshot: snapshot_kv_object and snapshot_kv_index_object now include id; the loader calls chainbase::emplace_with_id to reinstate each row at its original id, so primary_id references stay valid without a resolution pass. kv sections can load in parallel again. Also fix kv_idx_lower_bound's double-scan on a miss by peeking backward from the returned iterator instead of issuing a second lower_bound to detect table non-emptiness. SHiP and chain_plugin get_table_rows updated to fetch pri_key bytes via the primary kv_object (by_id) when emitting rows for external consumers.
- kv_tests/kv_index_object_crud: create primary kv_objects first and thread their ids into secondary rows. - kv_comprehensive_tests/kv_secondary_index_ordering and kv_secondary_index_lookup_by_table_id: same update; resolve primary rows via by_id. - kv_comprehensive_tests/kv_secondary_tiebreak_is_primary_id_insertion_order: new test pinning the composite-key tiebreaker behavior. Uses pri_keys inserted in non-lexicographic order (500, 200, 100, 300) so any regression that tries to tiebreak by pri_key bytes fails immediately. - kv_api_tests: updated expected RAM formulas and primary_id return semantics; test_kv_api contract inserts primary rows before each kv_idx_store call and threads the returned primary_id through.
- protocol_feature_tests: bump ROA RAM from 0.0986 SYS to 0.0968 SYS to match the smaller recompiled ram_restrictions_test WASM so the exhaustion assertions still fire. - tests/sysio_util_snapshot_info_test.py: update head_block_id to match the regenerated snapshot. - Regenerate unittests/deep-mind/deep-mind.log, unittests/snapshots/*, unittests/test-data/consensus_blockchain/*. - Recompile production contract WASMs (sysio.token, sysio.system, sysio.msig, sysio.roa, sysio.bios) and pre-built test contract WASMs (snapshot_test, test_kv_api, test_kv_map, test_kv_sec_query) with the new CDT that threads primary_id through kv_multi_index and kv_table.
…dle test - emplace_with_id asserts _undo_stack is empty (developer-error guard; snapshot loader is the only current caller). Advance _next_id only when strictly below INT64_MAX so the increment stays well-defined on gcc/clang/MSVC without -fwrapv. Reaching saturation would take ~585 years at 1B emplaces/sec. - kv_api_tests/idx_value_on_secondary_handle + test_kv_api testsecvalue: pin the host <-> CDT contract that kv_it_value accepts a secondary iterator handle and resolves the value via the slot's cached primary_id. Catches any regression that drops sec-handle support or loses primary_id on iterator advance without relying on indirect kv_table coverage. - docs/kv-ram-billing.md: update kv_index_object fixed-field layout and per-row cost tables for the primary_id scheme (no pri_key bytes). - docs/kv-security-audit.md: describe the new erase ordering (kv_erase first returns primary_id, then kv_idx_remove threads it into each secondary removal).
… surfacing of missing primary - chain_plugin::get_table_rows::fetch_primary: SYS_ASSERT with contract_table_query_exception when a sec row references a missing primary. Surfaces the inconsistency to the RPC caller rather than silently returning an empty row. - kv_idx_next / kv_idx_prev slow-path re-seek: SYS_ASSERT primary_id >= 0 when status == iterator_ok. The branch is unreachable through the normal allocate/advance lifecycle; aborting on the inconsistent state beats synthesizing a re-seek tuple from garbage. Documented the invariant on kv_iterator_slot::primary_id. - apply_context.cpp: drop the redundant static_cast<int64_t>(_id) wrappers (itr->primary_id._id is already int64_t). - kv_idx_remove / kv_idx_update: include table_id, primary_id, and sec_key_size in the "entry not found" error messages so a contract debugging a mismatched-id case sees which lookup failed. - kv_it_value: add a dispatch-level comment describing how the primary-vs-secondary branching and the by_id resolution work. - Drop "O(1)" claims from comments/docs that were describing the chainbase by_id walk; by_id is an ordered tree, not a hash index.
ISSUE-1 drew a parallel to legacy multi_index (remove_secondaries before db_remove_i64). The CDT now does the reverse for kv (kv_erase first so it can return the primary_id, then kv_idx_remove with that id for each secondary). Reword to describe the current order without the misleading legacy analogy.
…erts' into feature/kv-secondary-primary-id # Conflicts: # libraries/chain/apply_context.cpp # libraries/chain/include/sysio/chain/kv_table_objects.hpp
…anges
The kv primary/secondary intrinsic signatures changed on this branch
(kv_set/kv_erase now return int64_t primary_id; kv_idx_store/remove/update
now take int64_t primary_id). Every contract that uses the kv host API
must be recompiled with the matching CDT. Rebuilt all 43 affected contract
WASMs and the 4 ABIs whose tables reference updated types:
- Production: sysio.authex, sysio.bios, sysio.chalg, sysio.epoch,
sysio.msgch, sysio.msig, sysio.opreg, sysio.roa, sysio.system,
sysio.token, sysio.uwrit, sysio.wrap
- Test helpers under contracts/test_contracts/ and unittests/test-contracts/
(kv, benchmark, api, integration, snapshot, ram, savanna/ibc, etc.)
No source changes - only compiled artifacts.
…ames Wire CDT emits `first`/`second` for `std::pair` and `std::map` fields in generated ABIs (see tests/toolchain/abigen-pass/nested_container.abi fixture in wire-cdt). cc1dde4 previously swapped in a Leap-derived ABI using `key`/`value` as a workaround, but it regresses every time the WASM is regenerated and diverges from Wire CDT's own toolchain tests. This commit updates the Python test inputs and assertions to use `first`/`second` throughout, matching Wire CDT's output. The `pvo` case (`pair<uint16_t, vec_op_uint16>`) was already on `first`/`second` and is unchanged. `transaction_json[...]['value']` row accessors are untouched - that is the clio RPC envelope field, not a pair field name.
…audit Replace the pre-PR "THIS IS A REAL ISSUE" framing in section 1.11 with the finalized design: intrinsic-level cascade was rejected (host has no schema knowledge of which sec indexes a primary has), and three properties bound misuse — host-side primary_id validation on kv_idx_store, composite-key abort on remove/update, and loud read-side failure for orphans. Expand ISSUE-1 with a host-guards table covering the misuse-to-abort mapping under the new primary_id signatures, and a structured rundown of the residual orphan failure mode (RAM still billed, in-contract reads return iterator_erased, RPC and SHiP abort, snapshot consistency unaffected at save but orphans trip read-side checks on first use). Shorten ISSUE-4 and drop the stale code snippet (kv_erase no longer returns delta — it returns primary_id), reframing it as a corollary of ISSUE-1. Fix the ISSUE-5 typo in the priority recommendations table.
b0c83c4 to
e1a9be9
Compare
| // consumers continue to see pri_key bytes in the stream. The wire format | ||
| // for secondary rows is unchanged (struct_version 0 still ends with | ||
| // a length-prefixed pri_key byte string). | ||
| const auto* primary = obj.db.find<sysio::chain::kv_object>(obj.obj.primary_id); |
There was a problem hiding this comment.
This line can no longer serialize removed KV secondary rows correctly. The PR says SHiP keeps emitting pri_key bytes by resolving primary_id -> kv_object, but KV erase flow now deletes the primary first (apply_context.cpp:713-715) and then removes secondary rows (apply_context.cpp:1019-1021). create_deltas.cpp:95-97 serializes removed rows after both removals, so the primary lookup returns null and SHiP emits pri_key_size = 0. That breaks the unchanged SHiP wire contract for secondary-row deletes and makes indexers unable to identify the removed primary key.
There was a problem hiding this comment.
Good catch, confirmed. Fixed in 26478ea by falling back to a scan of the kv_index undo session's removed_values when the live find() misses. SHiP packs deltas at block scope and pack_deltas iterates kv_index removed_values too, so the primary's row (with its key bytes) is guaranteed to be there for the same block. Wire format unchanged.
Added test_deltas_contract_index_kv_secondary_delete_pri_key (state_history_tests.cpp) which deploys get_table_test, addnumobj + erase in separate blocks, then asserts every secondary delete delta carries non-empty pri_key matching the captured pre-delete bytes.
Linear scan is O(N) per fallback (N = primaries removed in this block), only fires on the cold path (live primary missing). For typical erase patterns (a few rows per block) the scan is in the noise; if bulk-delete workloads ever push on this, can pre-build a flat_hash_map<id, blob*> in create_deltas as a follow-up.
| FC_REFLECT(sysio::chain::snapshot_kv_object, (id)(code)(payer)(key)(value)(table_id)) | ||
| FC_REFLECT(sysio::chain::snapshot_kv_index_object, (id)(code)(payer)(sec_key)(primary_id)(table_id)) |
There was a problem hiding this comment.
This changes the binary snapshot DTO layout by prepending id fields, and controller.cpp:1583-1585 unconditionally reads those ids. There is no legacy reader/version gate for snapshots written with the old (code,payer,key,...) and (code,payer,sec_key,pri_key,...) layouts, so older snapshots will be misdecoded rather than migrated. If old snapshots are intentionally unsupported, this needs an explicit snapshot format/version bump and tests that assert the failure mode.
There was a problem hiding this comment.
No existing snapshots to support. Chain has not launched.
…y-primary-id # Conflicts: # contracts/sysio.authex/sysio.authex.wasm # contracts/sysio.epoch/sysio.epoch.wasm # contracts/sysio.msgch/sysio.msgch.wasm # contracts/sysio.uwrit/sysio.uwrit.wasm # tests/sysio_util_snapshot_info_test.py # unittests/deep-mind/deep-mind.log # unittests/snapshots/blocks.index # unittests/snapshots/blocks.log # unittests/snapshots/snap_v1.bin.gz # unittests/snapshots/snap_v1.bin.json.gz # unittests/snapshots/snap_v1.json.gz
…o session When both the primary kv_object and its kv_index_object entries are removed in the same block, pack_deltas runs after both removals have already landed in chainbase. The live db.find<kv_object>(primary_id) miss caused SHiP to emit pri_key_size=0 on secondary deletion deltas, breaking indexers' ability to identify the removed primary key. Fall back to scanning the kv_index undo session's removed_values when the live lookup misses. The same block-scope undo session feeds both the kv_index and kv_index_index removed streams, so the lookup is guaranteed to find the matching primary. Adds test_deltas_contract_index_kv_secondary_delete_pri_key covering the delete-primary-and-secondary-in-same-block path.
…r merge Recompiled system + test contracts against feature/kv-secondary-primary-id CDT, then regenerated snapshot and deep-mind reference fixtures via: unit_test --run_test=deep_mind_tests -- --sys-vm --save-dmlog unit_test --run_test='snapshot_part2_tests/*' -- --sys-vm --save-snapshot --generate-snapshot-log head_block_id in tests/sysio_util_snapshot_info_test.py updated to match.
…y-primary-id apply_context::kv_idx_update: combine feature's primary_id signature with master's db.modify + invalidate_secondary_cache optimization. unittests/kv_tests::kv_index_modify_rekeys_correctly: port master's new test from pri_key bytes to primary_id row references.
|
Decided to leave this as is and just provide documentation on how users can avoid worse case scenarios. See https://github.com/Wire-Network/wire-cdt/pull/52/changes |
Summary
Secondary index rows (
kv_index_object) now store an 8-byteprimary_id(chainbase id of the referencedkv_object) instead of a copy of the primary-key bytes. For contracts with long primary keys (up to 256 B, possibly larger with composite keys), this savespri_key.size() + ~16 Bper secondary row in both chainbase and RAM billing. For short (8-byte) primary keys it's neutral.Requires coordinated wire-cdt change: Wire-Network/wire-cdt#50
Rollout
Wire is pre-launch: no live
kv_multi_indexsec-index data exists on any chain. The composite-key tiebreaker change (pri_key byte-lex → primary_id insertion order) becomes the baseline behavior for launch and does not require a protocol feature gate. No migration or state replay required.Behavior changes
pri_keybyte-lex order toprimary_id(chainbase insertion order). Within duplicatesec_keys, iteration order follows the order rows were inserted rather than primary-key lexicographic order.kv_set/kv_erasereturn value is now the primary row's chainbase id (was: billable RAM delta, which CDT didn't use).kv_idx_store/kv_idx_remove/kv_idx_updatesignature takesint64_t primary_idin place of(pri_key, pri_key_size). Callers thread the id from the precedingkv_set/kv_erase.kv_it_valueaccepts secondary iterator handles; resolves the value via the slot's cachedprimary_id(a by_id lookup), faster than the old by_code_key path for sec-iter value reads.kv_idx_primary_keymaterializespri_keybytes lazily from the primarykv_objecton demand, caching in the iterator slot until the next position change.kv_idx_lower_boundno longer does a double-scan on miss — it peeks backward from the returned iterator to detect table-non-empty.Snapshot / chainbase
chainbase::undo_index::emplace_with_id(id, ctor)constructs an object at a caller-supplied id and advances_next_idpast it. Used exclusively by the snapshot loader so primary_id references stay valid across save+load. Asserts_undo_stack.empty()at call time (developer-error guard).snapshot_kv_object/snapshot_kv_index_objectnow carryid; the loader reinstates each row at its original id, so cross-row references don't need any post-load resolution pass._next_idincrement guards against signed overflow UB at theINT64_MAXboundary.Plugins / SHiP / tools
chain_plugin::get_table_rowssecondary scans resolve primary rows viadb.find<kv_object>(sec->primary_id). Asserts the reference exists (surfaces a state-inconsistency to the RPC caller rather than silently returning an empty row).pri_keybytes in itskv_index_objectstream (wire format unchanged for external consumers); the bytes are materialized from the primary at serialize time.Tests
kv_comprehensive_tests/kv_secondary_tiebreak_is_primary_id_insertion_orderpins the composite-key tiebreaker (inserts pri_keys in non-lex order and asserts insertion-order iteration).kv_api_tests/idx_value_on_secondary_handle+test_kv_api/testsecvaluepin the raw-capi contract thatkv_it_valueaccepts sec handles.kv_tests,kv_comprehensive_tests,kv_api_testsupdated for primary_id semantics throughout.protocol_feature_testsretuned (ROA bump 0.0986 → 0.0968 SYS) to match the smaller recompiledram_restrictions_testWASM.deep-mind.log,unittests/snapshots/*,unittests/test-data/consensus_blockchain/*.Docs
docs/kv-ram-billing.mdupdated for the newkv_index_objectlayout and per-row cost tables.docs/kv-security-audit.mddescribes the new erase ordering (kv_erasefirst →primary_id→kv_idx_removefor each sec).