
rebuild fk reverse index after partition snapshot import#56

Merged
fabracht merged 3 commits into main from cluster-fk-reverse-index-snapshot
May 7, 2026

Conversation


fabracht commented May 4, 2026

Summary

  • Closes the "Known gap" called out in the mqdb-cluster 0.3.2 CHANGELOG entry of PR #54 (add partition snapshot exports for schema/index/unique/fk/constraint stores). After a rebalance-driven replica promotion, the new primary received the imported db_data and FK constraints via the partition snapshot, but its in-memory FkReverseIndex cache stayed empty for the imported records. start_fk_reverse_lookup and handle_fk_reverse_lookup_request therefore returned empty results for any record sitting on a newly imported partition: ON DELETE CASCADE missed children that the new primary owned, and ON DELETE RESTRICT silently allowed deletes that should have been blocked.
  • StoreManager::import_partition now ends with a private rebuild_fk_indexes_after_import step. It iterates every registered FK constraint and calls the existing rebuild_fk_index_for_constraint helper, which walks db_data.list(source_entity) (now populated with the just-imported records) and seeds the reverse index. This mirrors the existing pattern in apply_db_constraint, where a constraint Insert arriving via Raft replication triggers the same rebuild; see the sketch after this list.
  • mqdb-cluster 0.3.3 → 0.3.4, with a CHANGELOG entry under 2026-05-03.
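
A minimal sketch of the import-side rebuild described above. The store shapes, record representation, and constraint type here are invented for illustration; only the method names rebuild_fk_indexes_after_import and rebuild_fk_index_for_constraint, and the db_data walk they model, come from this PR:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical minimal shapes for illustration; the real mqdb-cluster types differ.
#[derive(Clone)]
struct FkConstraint {
    source_entity: String, // child entity that holds the FK field
    field: String,         // FK field name on the child record
}

// Reverse index: (target record id, FK field) -> child record ids.
#[derive(Default)]
struct FkReverseIndex {
    map: HashMap<(String, String), HashSet<String>>,
}

impl FkReverseIndex {
    fn insert(&mut self, target_id: &str, field: &str, source_id: &str) {
        self.map
            .entry((target_id.to_string(), field.to_string()))
            .or_default()
            .insert(source_id.to_string());
    }
}

#[derive(Default)]
struct StoreManager {
    // entity name -> (record id -> record body)
    db_data: HashMap<String, HashMap<String, serde_json::Value>>,
    fk_constraints: Vec<FkConstraint>,
    fk_reverse_index: FkReverseIndex,
}

impl StoreManager {
    // New final step of import_partition: replay every locally known FK
    // constraint through the same helper the Raft apply path uses.
    fn rebuild_fk_indexes_after_import(&mut self) {
        let constraints = self.fk_constraints.clone();
        for c in &constraints {
            self.rebuild_fk_index_for_constraint(c);
        }
    }

    fn rebuild_fk_index_for_constraint(&mut self, c: &FkConstraint) {
        // Equivalent of walking db_data.list(source_entity), which is now
        // populated with the just-imported records.
        if let Some(records) = self.db_data.get(&c.source_entity) {
            for (child_id, record) in records {
                // Records without a readable FK value are skipped, matching
                // the "malformed JSON skipped" unit test in this PR.
                if let Some(target) = record.get(&c.field).and_then(|v| v.as_str()) {
                    self.fk_reverse_index.insert(target, &c.field, child_id);
                }
            }
        }
    }
}
```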

Why import-side only

StoreManager::clear_partition is never called anywhere in the codebase, so there is no demotion path that could leave stale entries behind. The downstream is_primary_for_partition filter (node_controller/fk.rs:357-365) also drops any reverse-index entry for a partition the node is no longer primary for, so even theoretical staleness is masked. Tracked as future work if clear_partition ever gets wired in.
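
For illustration, that masking filter behaves roughly as below; the types and signature are assumptions, not the actual fk.rs API:

```rust
// Illustrative only: drop reverse-index hits for partitions this node no
// longer owns as primary, so even a stale entry cannot trigger a cascade.
struct ReverseHit {
    partition_id: u64,
    child_id: String,
}

fn filter_by_primaryship(
    hits: Vec<ReverseHit>,
    is_primary_for_partition: impl Fn(u64) -> bool,
) -> Vec<ReverseHit> {
    hits.into_iter()
        .filter(|h| is_primary_for_partition(h.partition_id))
        .collect()
}
```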

Discovered while running the new E2E (separate follow-up)

While running examples/cluster-rebalance-stores/run.sh with a fresh enterprise license, I observed via tracing that constraints don't reach all nodes uniformly. Across runs the leader (node 1) consistently held both the unique and FK constraints locally; nodes 2/3 sometimes held a subset (e.g. node 2 had 0, node 3 had only the FK); a freshly joined node 4 had 0. Constraints are routed through schema_partition(entity), so any node that doesn't own that partition can reach the constraint only via forwarding, not in its local db_constraints (sketched below).
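
A rough model of that routing, assuming a hash-based schema_partition (the real partition assignment in mqdb-cluster may differ):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for schema_partition(entity): a constraint lives
// only on the owners of this partition, so every other node sees it via
// forwarding rather than in its local db_constraints.
fn schema_partition(entity: &str, partition_count: u64) -> u64 {
    let mut h = DefaultHasher::new();
    entity.hash(&mut h);
    h.finish() % partition_count
}

fn holds_constraint_locally(entity: &str, owned: &[u64], partition_count: u64) -> bool {
    owned.contains(&schema_partition(entity, partition_count))
}
```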

This PR's rebuild_fk_indexes_after_import is correct in its scope (it rebuilds for whatever constraints the importing node has locally), but the broader cascade-through-rebalanced-node behavior is not solved by this PR alone — it's governed by constraint replication topology. The CHANGELOG documents this finding as a separate follow-up alongside the schema replication topology issue first noted in the 0.3.2 CHANGELOG entry.

Test plan

  • cargo test -p mqdb-cluster --lib — 466 → 478 cluster lib tests (12 additions):
    • 4 direct FkReverseIndex unit tests in data_store.rs (insert/lookup/remove, idempotent inserts, removing unknown source ids, field-scoped keys).
    • 7 update_fk_reverse_index and rebuild_fk_index_for_constraint unit tests in constraint_ops.rs (Insert/Update/Delete paths, no-op without constraint, malformed JSON skipped, no-op for non-FK constraint).
    • 1 integration test import_partition_rebuilds_fk_reverse_index in partition_io.rs, verified by toggling the rebuild call: it fails with left: [], right: ["c1"] when the call is removed and passes when restored (see the test sketch after this list).
  • cargo make clippy — clean (pedantic, all targets + wasm).
  • cargo make format-check — clean.
  • cargo make dev — full workspace green.
  • Pre-commit hook (format-check + clippy) ran on each commit and passed.
  • examples/cluster-rebalance-stores/run.sh run end-to-end with a locally generated enterprise license. Phase 2 now creates 20 extra child comments (2 per parent). Phase 5 ends with a cascade-via-node-4 observation (not a hard assertion, due to the constraint-replication finding above) showing the fraction of eligible cascades that completed; in runs where the constraint partition lands on a cascade-initiating primary, all eligible children are removed.
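
For reference, the shape of the new integration test, sketched with invented helper names (register_fk_constraint, snapshot_with_record, and fk_reverse_lookup are hypothetical; the real test lives in partition_io.rs):

```rust
#[test]
fn import_partition_rebuilds_fk_reverse_index_sketch() {
    // All builder/helper names here are hypothetical.
    let mut mgr = StoreManager::default();
    mgr.register_fk_constraint("comments", "parent_id");
    let snapshot = snapshot_with_record("comments", "c1", r#"{"parent_id":"p1"}"#);

    mgr.import_partition(7, snapshot).unwrap();

    // Before this PR the lookup returned [], producing the
    // `left: [], right: ["c1"]` failure noted above; with the rebuild
    // wired in, the imported child is visible.
    assert_eq!(mgr.fk_reverse_lookup("p1", "parent_id"), vec!["c1"]);
}
```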

fabracht force-pushed the cluster-fk-reverse-index-snapshot branch from 6c76b28 to 99e740f on May 7, 2026 02:30
fabracht merged commit 6f232e4 into main on May 7, 2026
5 checks passed
fabracht deleted the cluster-fk-reverse-index-snapshot branch on May 7, 2026 14:36