rebuild fk reverse index after partition snapshot import #56
Merged
Summary
The importing node populated `db_data` and FK constraints via the partition snapshot, but its in-memory `FkReverseIndex` cache stayed empty for the imported records. `start_fk_reverse_lookup` and `handle_fk_reverse_lookup_request` therefore returned empty for any record sitting on a newly-imported partition — ON DELETE CASCADE missed children that the new primary owned, and ON DELETE RESTRICT silently allowed deletes that should have been blocked.

`StoreManager::import_partition` now ends with a private `rebuild_fk_indexes_after_import` step. It iterates every registered FK constraint and calls the existing `rebuild_fk_index_for_constraint` helper, which walks `db_data.list(source_entity)` (now populated with the just-imported records) and seeds the reverse index. This mirrors the existing pattern in `apply_db_constraint`, where a constraint Insert via Raft replication triggers the same rebuild.

`mqdb-cluster` 0.3.3 → 0.3.4. CHANGELOG entry under 2026-05-03.

Why import-side only
`StoreManager::clear_partition` is never called anywhere in the codebase, so there is no demotion path that creates stale entries. The downstream `is_primary_for_partition` filter (node_controller/fk.rs:357-365) drops any reverse-index entry the node no longer claims primaryship for, so even theoretical staleness is masked. Tracked as future work if `clear_partition` ever gets wired in.

Discovered while running the new E2E (separate follow-up)
While running `examples/cluster-rebalance-stores/run.sh` with a fresh enterprise license, I observed via tracing that constraints don't reach all nodes uniformly. Across runs the leader (node 1) consistently held both unique + FK constraints locally; nodes 2/3 sometimes held a subset (e.g. node 2 had 0, node 3 had only the FK); a freshly-joined node 4 had 0. Constraints are routed through `schema_partition(entity)`, so any node that doesn't own that partition has the constraint reachable only via forwarding — not in its local `db_constraints`.

This PR's `rebuild_fk_indexes_after_import` is correct in its scope (it rebuilds for whatever constraints the importing node has locally), but the broader cascade-through-rebalanced-node behavior is not solved by this PR alone — it's governed by constraint replication topology. The CHANGELOG documents this finding as a separate follow-up, alongside the schema replication topology issue first noted in the 0.3.2 CHANGELOG entry.

Test plan
- `cargo test -p mqdb-cluster --lib` — 466 → 478 cluster lib tests (12 additions):
  - `FkReverseIndex` unit tests in data_store.rs (insert/lookup/remove, idempotent inserts, removing unknown source ids, field-scoped keys).
  - `update_fk_reverse_index` and `rebuild_fk_index_for_constraint` unit tests in constraint_ops.rs (Insert/Update/Delete paths, no-op without constraint, malformed JSON skipped, no-op for non-FK constraint).
  - `import_partition_rebuilds_fk_reverse_index` in partition_io.rs — verified by toggling the rebuild call: fails with `left: [], right: ["c1"]` when removed, passes when restored.
- `cargo make clippy` — clean (pedantic, all targets + wasm).
- `cargo make format-check` — clean.
- `cargo make dev` — full workspace green.
- `examples/cluster-rebalance-stores/run.sh` actually run with a locally generated enterprise license. Phase 2 now creates 20 extra child comments (2 per parent). Phase 5 ends with a cascade-via-node-4 observation (not a hard assertion, due to the constraint-replication finding above) showing the fraction of eligible cascades that completed; in runs where the constraint partition lands on a cascade-initiating primary, all eligible children are removed.
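For reviewers unfamiliar with the reverse-index shape, the rebuild described in the summary can be sketched as below. All types and names here are illustrative stand-ins (not the actual mqdb-cluster API): a field-scoped map from `(source_entity, source_field, target_id)` to the set of referencing source ids, seeded by walking the just-imported source records.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative reverse index: (source_entity, source_field, target_id) -> source ids.
/// Field-scoped keys keep the same target id separate under two different FK fields.
#[derive(Default)]
struct FkReverseIndex {
    map: HashMap<(String, String, String), HashSet<String>>,
}

impl FkReverseIndex {
    /// Idempotent: re-inserting the same (key, source_id) pair is a no-op.
    fn insert(&mut self, entity: &str, field: &str, target_id: &str, source_id: &str) {
        self.map
            .entry((entity.into(), field.into(), target_id.into()))
            .or_default()
            .insert(source_id.into());
    }

    /// Sorted for deterministic output; empty vec when nothing references the target.
    fn lookup(&self, entity: &str, field: &str, target_id: &str) -> Vec<String> {
        let mut ids: Vec<String> = self
            .map
            .get(&(entity.into(), field.into(), target_id.into()))
            .map(|s| s.iter().cloned().collect())
            .unwrap_or_default();
        ids.sort();
        ids
    }
}

/// Rebuild pass after an import: walk the (now populated) source records for one
/// FK constraint and seed the reverse index. Records are (id, fk_value) pairs;
/// a None fk_value (absent or malformed FK field) is skipped.
fn rebuild_for_constraint(
    index: &mut FkReverseIndex,
    source_entity: &str,
    source_field: &str,
    records: &[(String, Option<String>)],
) {
    for (id, fk) in records {
        if let Some(target) = fk {
            index.insert(source_entity, source_field, target, id);
        }
    }
}

fn main() {
    let mut index = FkReverseIndex::default();
    // Imported child records: comments c1/c2 reference post p1, c3 references p2.
    let imported = vec![
        ("c1".to_string(), Some("p1".to_string())),
        ("c2".to_string(), Some("p1".to_string())),
        ("c3".to_string(), Some("p2".to_string())),
        ("c4".to_string(), None), // missing FK field: skipped
    ];
    rebuild_for_constraint(&mut index, "comment", "post_id", &imported);
    // Before the rebuild, a reverse lookup for p1 would have come back empty
    // (the bug this PR fixes); after seeding, it finds both children.
    println!("{:?}", index.lookup("comment", "post_id", "p1")); // ["c1", "c2"]
    assert!(index.lookup("comment", "post_id", "missing").is_empty());
}
```

The hypothetical `rebuild_for_constraint` corresponds to the role of `rebuild_fk_index_for_constraint` in the PR: it is pure with respect to the record list, so calling it once per registered FK constraint at the end of an import is enough to bring the cache in line with the imported `db_data`.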