Summary
Discovered while running examples/cluster-rebalance-stores/run.sh on PR #56. Constraints (both unique and foreign_key) end up in the local ConstraintStore of only a subset of cluster nodes. FK enforcement is therefore non-uniform: cascades, RESTRICT, and orphan rejection only work fully on whichever nodes happen to hold the constraint locally.
Evidence
3-node cluster + 4th node joining via rebalance, with a unique constraint on posts.title and an FK constraint comments.post_id → posts.id registered through node 1. Tracing in StoreManager::rebuild_fk_indexes_after_import (added during PR #56 investigation, since removed) showed db_constraints.list_all() returning, per node:
| Node | Total constraints | FK constraints |
| --- | --- | --- |
| 1 (leader, where constraint was registered) | 2 | 1 |
| 2 | 0 | 0 |
| 3 | 1 (FK only, missing unique) | 1 |
| 4 (rebalance target) | 0 | 0 |
E2E observation in the same run: cascade-via-node-4 leaves up to 12 of 12 eligible children alive in runs where schema_partition("comments") doesn't land on node 4 — children whose data partition is on node 4 are missed because no node knows to scatter the FK reverse-lookup to node 4.
Root cause
StoreManager::constraint_add_replicated writes to PartitionId::ZERO and entity DB_CONSTRAINT. The async replication pipeline only delivers that write to nodes that are replicas of the constraint's natural partition (schema_partition(entity) for entity-scoped constraints, or PartitionId::ZERO for the legacy path). Constraints are reachable from any node via forwarding but are absent from every non-replica's local ConstraintStore.
This is the same shape as the schema replication topology gap first noted in the 0.3.2 CHANGELOG entry.
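To make the topology gap concrete, here is a minimal, self-contained sketch of the routing rule described above. All names and details (replicas_of, the hashing inside schema_partition, the 2-replica layout) are hypothetical stand-ins rather than the actual mqdb-cluster code; the point is only that a write routed to the replicas of one partition never reaches the other nodes' local stores.

```rust
use std::collections::{HashMap, HashSet};

type NodeId = u32;
type PartitionId = u32;

// Hypothetical placement rule: each partition lives on two of the four nodes.
fn replicas_of(partition: PartitionId, nodes: &[NodeId]) -> HashSet<NodeId> {
    let i = partition as usize % nodes.len();
    let j = (i + 1) % nodes.len();
    [nodes[i], nodes[j]].into_iter().collect()
}

// Stand-in for schema_partition(entity): any deterministic hash will do here.
fn schema_partition(entity: &str, num_partitions: u32) -> PartitionId {
    entity.bytes().map(u32::from).sum::<u32>() % num_partitions
}

fn main() {
    let nodes: Vec<NodeId> = vec![1, 2, 3, 4];
    let num_partitions = 8;

    // Per-node view of the local constraint store.
    let mut local_store: HashMap<NodeId, Vec<&str>> = HashMap::new();

    // The async pipeline delivers each constraint write only to the replicas
    // of the constraint's natural partition -- no other node ever sees it.
    for (entity, constraint) in [
        ("posts", "unique(posts.title)"),
        ("comments", "fk(comments.post_id -> posts.id)"),
    ] {
        let p = schema_partition(entity, num_partitions);
        for node in replicas_of(p, &nodes) {
            local_store.entry(node).or_default().push(constraint);
        }
    }

    // The result is the per-node skew shown in the Evidence table: some nodes
    // hold both constraints, some hold one, some hold none.
    for node in &nodes {
        let n = local_store.get(node).map(Vec::len).unwrap_or(0);
        println!("node {node}: {n} constraint(s) in local ConstraintStore");
    }
}
```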
Why it matters
Cascade and RESTRICT enforcement is non-uniform across the cluster:
A delete processed on a node that does not hold the FK constraint locally cannot scatter the FK reverse-lookup correctly, so children on other nodes' primaries are missed.
An FK orphan check on a node that does not hold the constraint may need a forward, increasing latency and creating a window for inconsistent state.
Rebalance compounds the issue: a node freshly promoted to primary for some data partitions may not hold the FK constraints relevant to that data, so the constraint-aware paths (update_fk_reverse_index, ON DELETE handling) silently no-op for writes hitting that node (see the sketch below).
PR #56 closed the FkReverseIndex snapshot gap (the rebuild step is correct for whatever constraints the importing node holds locally), but the broader cascade-through-rebalanced-node behavior won't be fully correct until constraints are reliably present on every node.
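Below is a minimal sketch of the silent no-op called out in the rebalance bullet above. The names (FkConstraint, delete_parent, the scatter print) are illustrative assumptions, not the real update_fk_reverse_index / ON DELETE code paths: the delete consults only what the node holds locally, so an empty local store means no scatter and no error.

```rust
struct FkConstraint {
    referenced_entity: String, // parent table, e.g. "posts"
    child_entity: String,      // child table, e.g. "comments"
}

// A delete consults only the FK constraints the current node holds locally.
fn delete_parent(parent_entity: &str, local_constraints: &[FkConstraint]) {
    let matching: Vec<&FkConstraint> = local_constraints
        .iter()
        .filter(|c| c.referenced_entity == parent_entity)
        .collect();

    if matching.is_empty() {
        // No error and no forward: ON DELETE handling simply does nothing,
        // so children on other nodes' primaries are never reached.
        println!("delete {parent_entity}: no FK constraints known locally, cascade skipped");
        return;
    }
    for c in matching {
        println!("delete {parent_entity}: scatter cascade into {}", c.child_entity);
    }
}

fn main() {
    // Node 1 holds the constraint and cascades correctly.
    let node1 = vec![FkConstraint {
        referenced_entity: "posts".into(),
        child_entity: "comments".into(),
    }];
    // Node 4 (the rebalance target) has an empty local store and silently no-ops.
    let node4: Vec<FkConstraint> = Vec::new();

    delete_parent("posts", &node1);
    delete_parent("posts", &node4);
}
```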
Suggested directions
Two options worth weighing:
Cluster-wide broadcast state. Treat db_constraints like topic_index/wildcards/client_locations — apply locally first, then forward to all alive nodes. Constraints are small and rarely change; the overhead is negligible. This would make db_constraints.list_all() authoritative on every node (see the sketch below).
Replicate via Raft instead of async. Constraints are cluster-management state, not per-partition data; routing them through Raft (which already replicates to every node) would deliver them uniformly without new broadcast machinery. The downside is coupling constraint adds to leader-elected Raft availability.
Whichever option is chosen, schemas would benefit from the same fix, since they share the replication topology gap noted above.
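A hedged sketch of the first option, modeled on the apply-locally-then-forward pattern the issue attributes to topic_index/wildcards/client_locations. Every name here (ConstraintDef, ConstraintBroadcast, the forward_to callback) is a hypothetical assumption, not existing mqdb-cluster code.

```rust
use std::collections::HashSet;

type NodeId = u32;

#[derive(Clone, Debug)]
struct ConstraintDef {
    name: String,
    entity: String,
}

// Hypothetical broadcast-state holder for db_constraints.
struct ConstraintBroadcast {
    local: Vec<ConstraintDef>,
}

impl ConstraintBroadcast {
    // Apply on the originating node first, then fan out to every other alive
    // node. Constraints are tiny and change rarely, so re-sending the full
    // definition to all nodes is cheap.
    fn add(
        &mut self,
        c: ConstraintDef,
        self_id: NodeId,
        alive: &HashSet<NodeId>,
        forward_to: &mut dyn FnMut(NodeId, &ConstraintDef),
    ) {
        self.local.push(c.clone());
        for &node in alive {
            if node != self_id {
                forward_to(node, &c); // a network send in the real system
            }
        }
    }
}

fn main() {
    let alive: HashSet<NodeId> = [1, 2, 3, 4].into_iter().collect();
    let mut store = ConstraintBroadcast { local: Vec::new() };
    let mut forwarded: Vec<NodeId> = Vec::new();

    store.add(
        ConstraintDef {
            name: "fk_comments_post_id".into(),
            entity: "comments".into(),
        },
        1,
        &alive,
        &mut |node, _c| forwarded.push(node),
    );

    // With broadcast, list_all() would return the same set on every node.
    println!("local constraints: {}, forwarded to nodes: {:?}", store.local.len(), forwarded);
}
```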
Repro
Running examples/cluster-rebalance-stores/run.sh produces an OBSERVATION line FK cascade via node 4: N of M eligible children removed, with M usually less than 21 — the missing children are those whose data-partition primaries don't hold the FK constraint locally. Re-running yields different counts depending on hash distribution.