Optimize kv_idx_update: use modify instead of remove+create#282
Conversation
undo_index::post_modify handles AVL tree rebalancing when composite index key fields change. This avoids node deallocation, fresh node allocation, and reinsertion into all 3 AVL trees per secondary index update. Added test proving modify correctly rekeys across indices.
huangminghuang
left a comment
There was a problem hiding this comment.
Review Finding
The main behavioral risk is that db.modify preserves the kv_index_object
id, while the old remove/create implementation replaced it with a new id.
Secondary KV iterator handles cache both:
- the current secondary and primary key bytes, used by the slow re-seek path
- the chainbase object id, used by the fast
iterator_topath
kv_idx_next and kv_idx_prev prefer the cached id when it is still valid.
After PR #282, a secondary iterator acquired before kv_idx_update can still
find the modified object by id, but that object may now be at a different
secondary-key position.
Example:
- A contract gets a secondary iterator handle for
"alice". - It calls
kv_idx_updateto change"alice"to"zebra". - It calls
kv_idx_next(handle).
With the old remove/create behavior, the cached id no longer existed, so
kv_idx_next fell back to the stored key bytes and resumed from the old
"alice" position. It would likely land on "bob".
With db.modify, the cached id still exists, so kv_idx_next advances from
the moved "zebra" position and can return end, skipping entries that were
after "alice".
This also affects kv_idx_prev, kv_idx_key, and kv_idx_primary_key.
For example, kv_idx_key can now return the new key for an old handle instead
of reporting iterator_erased, which differs from the old remove/create model.
Proposed Fix
Invalidate only affected secondary iterator caches before calling db.modify.
The important part is to clear only cached_id, not the stored key bytes or
iterator status. That forces existing handles to use the old key-based slow
path on their next operation.
Add a helper to kv_iterator_pool:
void invalidate_secondary_cache(account_name code, uint16_t table_id, int64_t object_id) {
for (auto& s : _slots) {
if (s.in_use &&
!s.is_primary &&
s.code == code &&
s.table_id == table_id &&
s.cached_id == object_id) {
s.cached_id = -1;
}
}
}Then call it in apply_context::kv_idx_update before modifying the object:
const auto modified_id = itr->id._id;
kv_iterators.invalidate_secondary_cache(receiver, table_id, modified_id);
db.modify(*itr, [&](auto& o) {
o.payer = payer;
o.sec_key.assign(new_sec_key, new_sec_key_size);
});This preserves the old observable iterator behavior while still allowing
kv_idx_update to use db.modify.
Performance Impact
The cache invalidation should not erase the performance win that PR #282 is
aiming for.
The removed work is the expensive part: chainbase object removal, object
creation, allocation, full field assignment, new id creation, and delete/create
undo bookkeeping.
The proposed invalidation only scans the in-action KV iterator slot pool, which
is bounded by config::max_kv_iterators and is currently 1024. It only clears
slots whose cached id matches the object being modified. It does not touch
chainbase indexes and does not allocate objects.
In typical contract execution, the number of live iterators should be small.
Even in the worst case, this is a bounded linear scan over 1024 slots per
kv_idx_update, which is likely much cheaper than the old remove/create path.
Suggested Test
Add a contract-level test that exercises a live secondary iterator across
kv_idx_update:
- Store entries ordered as
"alice","bob","charlie". - Get a secondary iterator handle for
"alice". - Update
"alice"to"zebra". - Call
kv_idx_next(handle). - Verify it resumes according to the old
"alice"position, not the moved
"zebra"position.
A similar test for kv_idx_key(handle) after the update would also be useful
to confirm whether the API should continue reporting iterator_erased for the
old handle.
db.modify preserves the kv_index_object's chainbase id, so a live secondary iterator that cached the id would advance from the post-modify position and silently skip entries when sec_key changed. The prior remove+create path worked because the new object had a fresh id, so the cached id no longer resolved and iteration fell back to the slow re-seek using stored key bytes. Add kv_iterator_pool::invalidate_secondary_cache that clears cached_id only on matching secondary slots, leaving stored key bytes and status intact so the next op resumes from the old position. kv_idx_update calls it before db.modify.
…ptimizations Resolve conflict in libraries/chain/apply_context.cpp at the kv_idx_update db.modify lambda. Master's remove+create added o.table_id = table_id; the merged db.modify path drops the assignment since idx.find already constrains table_id on the located row. Adapt HEAD's invalidate_secondary_cache (added in 1146c38) to master's uint16_t table_id kv API: - kv_iterator_slot/invalidate_secondary_cache use (code, table_id) instead of (code, table, index_id) - kv_idx_update call site updated accordingly - kv_tests kv_index_modify_rekeys_correctly and kv_iterator_pool_invalidate_secondary_cache rewritten against the new schema; the by_code_table_idx_prikey lookup is replaced with a 4-tuple find on by_code_table_id_seckey since master has only the seckey index.
The merge from master in cd02d6d brought commit 30b1697, which collapsed kv_index_object's (table, index_id) into a single uint16_t table_id and reduced kv_iterator_pool::{allocate_secondary, invalidate_secondary_cache} to 2/3 args. apply_context.cpp::kv_idx_update merged correctly; the two test cases added in 4fe6dfd and 1146c38 still referenced the old fields and signatures. - kv_index_modify_rekeys_correctly: rewritten against table_id and by_code_table_id_seckey. Replaces the by_code_table_idx_prikey lookup (no longer a separate index) with a composite find on (sec_key + pri_key) plus an explicit "old key no longer resolves" check. - kv_iterator_pool_invalidate_secondary_cache: 2-arg allocate_secondary, 3-arg invalidate_secondary_cache. Old (different table) vs (different index) slots collapse to "different table_id"; kept as two distinct table_ids so the loop preserves more than one non-matching slot.
Summary
kv_idx_updatenow usesdb.modify()instead ofdb.remove()+db.create()when a secondary key changesundo_index::post_modifyrebalances only the affected AVL tree index, reducing tree operations from 6 to 2 per secondary index updateshared_bloballocation (pri_key no longer reassigned)get_deep_mind_loggerresult inkv_setupdate path to avoid redundant callPerformance
Estimated ~1-3us savings per secondary index update, scaling with table size (deeper AVL trees = more comparisons per operation). Most impactful for contracts with frequent secondary key modifications (orderbooks, inventories, sorted collections).