diff --git a/CHANGELOG.md b/CHANGELOG.md index 70ac8dd..320f4b6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -13,6 +13,19 @@ Each entry lists the date and the crate versions that were released. - Routed every CLI bench/dev_bench connect through the new helper to close the same bug class for `mqdb bench db` (sync + async + cascade + unique + changefeed), `mqdb bench pubsub`, `mqdb dev bench` (db/pubsub/sub-pub), and the broker-readiness probes in both `bench/common.rs::wait_for_broker_ready` and `dev_bench/helpers.rs::wait_for_broker_ready`. Removed two now-redundant local `connect_client` helpers in `db_cascade.rs` and `db_changefeed.rs`. The `pubsub.rs` paths use custom `ConnectOptions` (clean-start, custom keep-alive) so their connect calls are wrapped inline with the same timeout pattern rather than going through the helper. - Regression test `test_cli_connect_timeout_against_silent_listener` in `crates/mqdb-cli/tests/cli_test.rs` spawns a TCP listener that accepts the connection without speaking MQTT and asserts `mqdb list ... --timeout 2` exits within 5 seconds with a "timed out" error. Verified to fail on main (pre-fix exits at ~6s with "Connection reset by peer") and pass with the fix in place. +## 2026-05-06 — mqdb-cluster 0.3.4 + +### Fixed + +- **Partition snapshot import did not populate `FkReverseIndex`.** This was the "Known gap" called out in the 0.3.2 entry. After a rebalance-driven replica promotion, the new primary held the imported `db_data` records and FK constraints but its in-memory reverse-index cache (`(target_entity, target_id, source_entity, source_field) → {source_id, …}`) was empty for those records. `start_fk_reverse_lookup` and `handle_fk_reverse_lookup_request` would return empty for any record sitting on a newly-imported partition, causing ON DELETE CASCADE to miss children that the new primary owned and ON DELETE RESTRICT to silently allow deletes that should have been blocked. +- `StoreManager::import_partition` now calls a new private `rebuild_fk_indexes_after_import` step at the end of the import. It iterates every registered FK constraint and calls the existing `rebuild_fk_index_for_constraint` helper, which walks `db_data.list(source_entity)` (now populated with the just-imported records) and seeds the reverse index. Mirrors the existing pattern at `apply.rs:215` where constraint Insert via Raft replication triggers the same rebuild. +- Test coverage: 12 new tests (466 → 478 in the cluster lib). Direct `FkReverseIndex` unit tests in `data_store.rs` (insert/lookup/remove, idempotent inserts, removing unknown source ids, field-scoped keys), `update_fk_reverse_index` and `rebuild_fk_index_for_constraint` unit tests in `constraint_ops.rs` (Insert/Update/Delete paths, no-op when no constraints, malformed JSON, non-FK constraint), and a regression test `import_partition_rebuilds_fk_reverse_index` in `partition_io.rs`, verified by fail-on-disable / pass-on-restore to confirm that the rebuild call is what makes the assertion pass. +- E2E in `examples/cluster-rebalance-stores/run.sh` now creates 20 extra child comments (2 per parent) spread across all 10 parents and adds a cascade-via-node-4 observation: deletes every parent through node 4 after rebalance, then prints how many of the eligible children were cascade-removed.
Surfaced as an observation rather than a hard assertion because cascade outcomes through any specific node depend on whether that node has the FK constraint locally, which is governed by schema/constraint replication topology (separate concern; see below). + +### Discovered while running the new E2E (separate follow-up) + +- **Constraints don't reach all nodes uniformly.** Across runs of the new E2E, only the leader (node 1) consistently held both the unique and FK constraints locally; nodes 2/3 sometimes had a subset, and a freshly-joined node 4 had none. Because constraints route through `schema_partition(entity)`, any node that doesn't own that partition reaches the constraint only via forwarding — not in its local `db_constraints` store. The FkReverseIndex rebuild this PR adds is correct in its scope (it rebuilds for whatever constraints the importing node has locally), but a fully-correct cascade through every node requires constraints to be cluster-wide broadcast state. Tracked as future work alongside the schema replication topology issue first noted in the 0.3.2 CHANGELOG entry. + ## 2026-05-03 — mqdb-cluster 0.3.3 ### Fixed @@ -31,7 +44,7 @@ Each entry lists the date and the crate versions that were released. ### Known gaps not addressed here -- `FkReverseIndex` (cluster-wide cache of child-to-parent FK references) is **not** included in partition snapshots. It's not partition-scoped — should be either rebuilt during import via the FK constraint discovery path or treated as a separate broadcast entity. Tracked as future work. +- `FkReverseIndex` (cluster-wide cache of child-to-parent FK references) is **not** included in partition snapshots. It's not partition-scoped — should be either rebuilt during import via the FK constraint discovery path or treated as a separate broadcast entity. Tracked as future work. 
_Closed in 0.3.4 via the rebuild-during-import approach._ ## 2026-04-25 — mqdb-cli 0.7.5 diff --git a/Cargo.lock b/Cargo.lock index 80b495c..f5610ba 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -1411,7 +1411,7 @@ dependencies = [ [[package]] name = "mqdb-cluster" -version = "0.3.3" +version = "0.3.4" dependencies = [ "arc-swap", "bebytes", diff --git a/crates/mqdb-cluster/Cargo.toml b/crates/mqdb-cluster/Cargo.toml index 14b0635..3d9ac80 100644 --- a/crates/mqdb-cluster/Cargo.toml +++ b/crates/mqdb-cluster/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "mqdb-cluster" -version = "0.3.3" +version = "0.3.4" publish = false edition.workspace = true license = "AGPL-3.0-only" diff --git a/crates/mqdb-cluster/src/cluster/db/data_store.rs b/crates/mqdb-cluster/src/cluster/db/data_store.rs index b163536..3351fbf 100644 --- a/crates/mqdb-cluster/src/cluster/db/data_store.rs +++ b/crates/mqdb-cluster/src/cluster/db/data_store.rs @@ -802,4 +802,83 @@ mod tests { assert_eq!(results.len(), 1); assert_eq!(results[0].id_str(), "alice"); } + + #[test] + fn fk_reverse_index_insert_lookup_remove() { + let idx = FkReverseIndex::new(); + idx.insert("posts", "p1", "comments", "post_id", "c1"); + idx.insert("posts", "p1", "comments", "post_id", "c2"); + idx.insert("posts", "p2", "comments", "post_id", "c3"); + + let mut p1_refs = idx.lookup("posts", "p1", "comments", "post_id"); + p1_refs.sort(); + assert_eq!(p1_refs, vec!["c1".to_string(), "c2".to_string()]); + + assert_eq!( + idx.lookup("posts", "p2", "comments", "post_id"), + vec!["c3".to_string()], + ); + + assert!( + idx.lookup("posts", "missing", "comments", "post_id") + .is_empty(), + "lookup for unknown target must be empty", + ); + + idx.remove("posts", "p1", "comments", "post_id", "c1"); + assert_eq!( + idx.lookup("posts", "p1", "comments", "post_id"), + vec!["c2".to_string()], + ); + + idx.remove("posts", "p1", "comments", "post_id", "c2"); + assert!( + idx.lookup("posts", "p1", "comments", "post_id").is_empty(), + "removing the last source_id must leave the lookup empty", + ); + } + + #[test] + fn fk_reverse_index_insert_is_idempotent_per_source_id() { + let idx = FkReverseIndex::new(); + idx.insert("posts", "p1", "comments", "post_id", "c1"); + idx.insert("posts", "p1", "comments", "post_id", "c1"); + + assert_eq!( + idx.lookup("posts", "p1", "comments", "post_id"), + vec!["c1".to_string()], + "duplicate inserts of the same source_id must collapse", + ); + } + + #[test] + fn fk_reverse_index_remove_unknown_is_noop() { + let idx = FkReverseIndex::new(); + idx.remove("posts", "p1", "comments", "post_id", "c1"); + assert!(idx.lookup("posts", "p1", "comments", "post_id").is_empty()); + + idx.insert("posts", "p1", "comments", "post_id", "c1"); + idx.remove("posts", "p1", "comments", "post_id", "c-other"); + assert_eq!( + idx.lookup("posts", "p1", "comments", "post_id"), + vec!["c1".to_string()], + "removing a non-matching source_id must not affect existing entries", + ); + } + + #[test] + fn fk_reverse_index_keys_are_field_scoped() { + let idx = FkReverseIndex::new(); + idx.insert("users", "u1", "posts", "author_id", "p1"); + idx.insert("users", "u1", "posts", "editor_id", "p2"); + + assert_eq!( + idx.lookup("users", "u1", "posts", "author_id"), + vec!["p1".to_string()], + ); + assert_eq!( + idx.lookup("users", "u1", "posts", "editor_id"), + vec!["p2".to_string()], + ); + } } diff --git a/crates/mqdb-cluster/src/cluster/store_manager/constraint_ops.rs b/crates/mqdb-cluster/src/cluster/store_manager/constraint_ops.rs index 608e10f..2cd43b4 100644 --- 
a/crates/mqdb-cluster/src/cluster/store_manager/constraint_ops.rs +++ b/crates/mqdb-cluster/src/cluster/store_manager/constraint_ops.rs @@ -177,3 +177,190 @@ impl StoreManager { } } } + +#[cfg(test)] +mod tests { + use super::*; + use crate::cluster::NodeId; + use crate::cluster::db::{ClusterConstraint, OnDeleteAction}; + + fn node(id: u16) -> NodeId { + NodeId::validated(id).unwrap() + } + + fn fk_comments_to_posts() -> ClusterConstraint { + ClusterConstraint::foreign_key( + "comments", + "fk_comment_post", + "post_id", + "posts", + "id", + OnDeleteAction::Cascade, + ) + } + + #[test] + fn update_fk_reverse_index_inserts_on_create() { + let store = StoreManager::new(node(1)); + store.db_constraints.add(fk_comments_to_posts()).unwrap(); + + store.update_fk_reverse_index( + Operation::Insert, + "comments", + "c1", + Some(b"{\"post_id\":\"p1\"}"), + None, + ); + + assert_eq!( + store.fk_reverse_lookup("posts", "p1", "comments", "post_id"), + vec!["c1".to_string()], + ); + } + + #[test] + fn update_fk_reverse_index_is_noop_without_fk_constraint() { + let store = StoreManager::new(node(1)); + + store.update_fk_reverse_index( + Operation::Insert, + "comments", + "c1", + Some(b"{\"post_id\":\"p1\"}"), + None, + ); + + assert!( + store + .fk_reverse_lookup("posts", "p1", "comments", "post_id") + .is_empty(), + "no constraint registered → no reverse index entry", + ); + } + + #[test] + fn update_fk_reverse_index_handles_update_repoint() { + let store = StoreManager::new(node(1)); + store.db_constraints.add(fk_comments_to_posts()).unwrap(); + + store.update_fk_reverse_index( + Operation::Insert, + "comments", + "c1", + Some(b"{\"post_id\":\"p1\"}"), + None, + ); + store.update_fk_reverse_index( + Operation::Update, + "comments", + "c1", + Some(b"{\"post_id\":\"p2\"}"), + Some(b"{\"post_id\":\"p1\"}"), + ); + + assert!( + store + .fk_reverse_lookup("posts", "p1", "comments", "post_id") + .is_empty(), + "update must remove the old reference", + ); + assert_eq!( + store.fk_reverse_lookup("posts", "p2", "comments", "post_id"), + vec!["c1".to_string()], + "update must add the new reference", + ); + } + + #[test] + fn update_fk_reverse_index_removes_on_delete() { + let store = StoreManager::new(node(1)); + store.db_constraints.add(fk_comments_to_posts()).unwrap(); + + store.update_fk_reverse_index( + Operation::Insert, + "comments", + "c1", + Some(b"{\"post_id\":\"p1\"}"), + None, + ); + store.update_fk_reverse_index( + Operation::Delete, + "comments", + "c1", + None, + Some(b"{\"post_id\":\"p1\"}"), + ); + + assert!( + store + .fk_reverse_lookup("posts", "p1", "comments", "post_id") + .is_empty(), + "delete must drop the reverse index entry", + ); + } + + #[test] + fn update_fk_reverse_index_skips_malformed_payloads() { + let store = StoreManager::new(node(1)); + store.db_constraints.add(fk_comments_to_posts()).unwrap(); + + store.update_fk_reverse_index(Operation::Insert, "comments", "c1", Some(b"not json"), None); + + assert!( + store + .fk_reverse_lookup("posts", "p1", "comments", "post_id") + .is_empty(), + "malformed JSON must be silently skipped, not panic", + ); + } + + #[test] + fn rebuild_fk_index_for_constraint_walks_existing_records() { + let store = StoreManager::new(node(1)); + let fk = fk_comments_to_posts(); + + store + .db_data + .create("comments", "c1", b"{\"post_id\":\"p1\"}", 1_000) + .unwrap(); + store + .db_data + .create("comments", "c2", b"{\"post_id\":\"p1\"}", 1_000) + .unwrap(); + store + .db_data + .create("comments", "c3", b"{\"post_id\":\"p2\"}", 1_000) + .unwrap(); + + 
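+        // The helper walks db_data.list("comments") and re-derives reverse-index +        // entries from each record's post_id; import_partition now runs this same +        // path once db_data holds the imported records.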
store.rebuild_fk_index_for_constraint(&fk); + + let mut p1_refs = store.fk_reverse_lookup("posts", "p1", "comments", "post_id"); + p1_refs.sort(); + assert_eq!(p1_refs, vec!["c1".to_string(), "c2".to_string()]); + + assert_eq!( + store.fk_reverse_lookup("posts", "p2", "comments", "post_id"), + vec!["c3".to_string()], + ); + } + + #[test] + fn rebuild_fk_index_for_constraint_is_noop_for_unique_constraint() { + let store = StoreManager::new(node(1)); + let unique = ClusterConstraint::unique("comments", "uniq_text", "text"); + + store + .db_data + .create("comments", "c1", b"{\"post_id\":\"p1\"}", 1_000) + .unwrap(); + + store.rebuild_fk_index_for_constraint(&unique); + + assert!( + store + .fk_reverse_lookup("posts", "p1", "comments", "post_id") + .is_empty(), + "non-FK constraint must not populate the reverse index", + ); + } +} diff --git a/crates/mqdb-cluster/src/cluster/store_manager/partition_io.rs b/crates/mqdb-cluster/src/cluster/store_manager/partition_io.rs index d7f184b..4202734 100644 --- a/crates/mqdb-cluster/src/cluster/store_manager/partition_io.rs +++ b/crates/mqdb-cluster/src/cluster/store_manager/partition_io.rs @@ -190,9 +190,19 @@ impl StoreManager { total_imported += imported; } + self.rebuild_fk_indexes_after_import(); + Ok(total_imported) } + fn rebuild_fk_indexes_after_import(&self) { + for constraint in self.db_constraints.list_all() { + if constraint.is_foreign_key() { + self.rebuild_fk_index_for_constraint(&constraint); + } + } + } + pub fn clear_partition(&self, partition: PartitionId) -> usize { let mut total_cleared = 0; total_cleared += self.sessions.clear_partition(partition); @@ -396,4 +406,67 @@ mod tests { "db_fk for {entity}" ); } + + #[test] + fn import_partition_rebuilds_fk_reverse_index() { + use crate::cluster::db::{ClusterConstraint, OnDeleteAction}; + + let src = StoreManager::new(node(1)); + let dst = StoreManager::new(node(2)); + + let fk = ClusterConstraint::foreign_key( + "comments", + "fk_comment_post", + "post_id", + "posts", + "id", + OnDeleteAction::Cascade, + ); + src.db_constraints.add(fk.clone()).unwrap(); + dst.db_constraints.add(fk).unwrap(); + + src.db_data + .create("posts", "p1", b"{\"title\":\"hello\"}", 1_000) + .unwrap(); + src.db_data + .create( + "comments", + "c1", + b"{\"post_id\":\"p1\",\"text\":\"first\"}", + 1_000, + ) + .unwrap(); + src.update_fk_reverse_index( + crate::cluster::protocol::Operation::Insert, + "comments", + "c1", + Some(b"{\"post_id\":\"p1\",\"text\":\"first\"}"), + None, + ); + + assert_eq!( + src.fk_reverse_lookup("posts", "p1", "comments", "post_id"), + vec!["c1".to_string()], + "src reverse index must hold c1 → p1 before export", + ); + + let child_partition = src.db_data.get("comments", "c1").unwrap().partition(); + + let payload = src.export_partition(child_partition); + assert!(!payload.is_empty(), "snapshot payload must not be empty"); + let imported = dst.import_partition(&payload).expect("import"); + assert!(imported >= 1, "child record must survive import"); + + assert!( + dst.db_data.get("comments", "c1").is_some(), + "child must be present on destination after import", + ); + + let refs = dst.fk_reverse_lookup("posts", "p1", "comments", "post_id"); + assert_eq!( + refs, + vec!["c1".to_string()], + "destination reverse index must hold c1 after import — without the rebuild step this would be empty", + ); + } } diff --git a/docs/distributed-design.md b/docs/distributed-design.md index bd53160..f249391 100644 --- a/docs/distributed-design.md +++ b/docs/distributed-design.md @@ -2270,6 +2270,16 @@ 
constraint is added after data exists, `rebuild_fk_index_for_constraint` backfills the index by scanning existing records. This replaces the earlier O(n) full-table scan with O(1) lookups per reverse query. +The reverse index is also rebuilt after partition snapshot import (rebalance-driven +replica promotion). `StoreManager::import_partition` ends with +`rebuild_fk_indexes_after_import`, which iterates every locally-known FK constraint +and reuses `rebuild_fk_index_for_constraint` so the just-imported records are +immediately reachable from cascade and RESTRICT enforcement on the new primary. +The reverse index is intentionally not part of the snapshot wire format — it's +derivable from the imported data plus constraint definitions, so deriving it on +import avoids cross-cutting partition concerns (a single reverse-index entry maps +to no single partition). + ### Cascade and Set-Null on Delete When a record is deleted, the node performs a reverse FK lookup via the in-memory diff --git a/examples/cluster-rebalance-stores/README.md b/examples/cluster-rebalance-stores/README.md index 940d10f..17ecd09 100644 --- a/examples/cluster-rebalance-stores/README.md +++ b/examples/cluster-rebalance-stores/README.md @@ -1,16 +1,20 @@ # cluster-rebalance-stores E2E -Diagnostic end-to-end test that exercises the partition-snapshot path added -in `crates/mqdb-cluster/src/cluster/store_manager/partition_io.rs` for the -five DB stores beyond `db_data`. +Diagnostic end-to-end test that exercises the partition-snapshot path in +`crates/mqdb-cluster/src/cluster/store_manager/partition_io.rs` — both the +five DB stores beyond `db_data` (PR #54) and the FK reverse-index rebuild +that runs after import (PR #56). ## What it does 1. Starts a 3-node cluster (QUIC transport, mTLS). -2. Registers schemas, a unique constraint, and a foreign-key constraint. -3. Populates posts + a child comment. +2. Registers schemas, a unique constraint, and a foreign-key constraint + (`comments.post_id → posts.id` with `on_delete: cascade`). +3. Populates 10 posts and 21 child comments (the canonical first child plus + 2 extras per parent, so every parent owns multiple children). 4. Joins a 4th node, which forces partitions to rebalance onto it. -5. After rebalance, queries each store via node 4 and asserts behavior. +5. After rebalance, queries each store via node 4 and asserts behavior, + ending with a cascade-delete sweep of every parent through node 4. The 4th-node join is what triggers replica promotion via partition snapshots. Whatever node 4 ends up serving has to come from the snapshot stream. @@ -38,17 +42,24 @@ script invokes `target/release/mqdb` directly. **Observations** (printed but do not affect exit code): -- `SchemaStore`, `ConstraintStore`: schemas and constraints are replicated - through `PartitionId::ZERO` today (see `node_controller/db_ops.rs` and - `store_manager/constraint_ops.rs`). The current snapshot fix adds them - to the export wire stream, but whether a given schema/constraint arrives - on node 4 depends on whether the partition matching `schema_partition()` - of the entity actually rebalanced to node 4. This is a cluster design - issue (schemas should arguably be cluster-wide broadcast state, not - partition-scoped) that is broader than the snapshot fix. +- `SchemaStore`, `ConstraintStore`: even though both are listed as broadcast + entities at `node_controller/replication_ops.rs:81-85`, empirical runs of + this script show they don't reach every node uniformly.
A given + schema/constraint arrives on node 4 only if either the broadcast was + delivered while node 4 was alive, or the partition matching + `schema_partition()` of the entity rebalanced to node 4. Tracked as + separate follow-ups in issues #57 (constraints) and #58 (schemas). - `UniqueStore`: duplicates are mostly rejected via node 4, but transient acceptances can occur immediately after promotion while 2-phase forwards settle. The script retries each duplicate before counting it as accepted. +- **`FkReverseIndex` cascade via node 4**: deletes every parent through + node 4 and counts how many of the eligible child comments were + cascade-removed. The reverse-index rebuild added in PR #56 ensures node + 4's local index is populated for whatever FK constraints node 4 holds + locally; the cascade is then complete *if* the cascade-initiating + primary also holds the FK constraint locally. Otherwise the cascade is + partial — gated by the upstream constraint-replication issue #57, not + by the snapshot fix. **Stores not exercised here** (covered by unit roundtrip tests instead): diff --git a/examples/cluster-rebalance-stores/run.sh b/examples/cluster-rebalance-stores/run.sh index a129c11..789008a 100755 --- a/examples/cluster-rebalance-stores/run.sh +++ b/examples/cluster-rebalance-stores/run.sh @@ -272,11 +272,67 @@ else FAIL=$((FAIL + 1)) fi +# Track child→parent mapping via parallel arrays (bash 3.2 compatible — no +# associative arrays). COMMENT_IDS[i] and COMMENT_PARENTS[i] are paired by +# index. +declare -a COMMENT_IDS +declare -a COMMENT_PARENTS + PARENT_FOR_CHILD="${POST_IDS[0]}" -RESP=$("$MQDB_BIN" create comments "${CLI_ARGS[@]}" \ -d "{\"post_id\":\"$PARENT_FOR_CHILD\",\"text\":\"first comment\"}" 2>/dev/null) -assert_eq "create child comment" "$(json_field "$RESP" "status")" "ok" -COMMENT_ID=$(json_field "$RESP" "data.id") +COMMENT_ID="" +for attempt in 1 2 3; do + RESP=$("$MQDB_BIN" create comments "${CLI_ARGS[@]}" \ + -d "{\"post_id\":\"$PARENT_FOR_CHILD\",\"text\":\"first comment\"}" 2>/dev/null) + if [[ "$(json_field "$RESP" "status")" == "ok" ]]; then + COMMENT_ID=$(json_field "$RESP" "data.id") + break + fi + sleep 1 +done +TOTAL=$((TOTAL + 1)) +if [[ -n "$COMMENT_ID" ]]; then + echo " PASS: create child comment" + PASS=$((PASS + 1)) + COMMENT_IDS+=("$COMMENT_ID") + COMMENT_PARENTS+=("$PARENT_FOR_CHILD") +else + echo " FAIL: create child comment (after retries)" + FAIL=$((FAIL + 1)) +fi + +# Spread 20 extra comments across all 10 parents (2 per parent) so that every +# post has multiple children. Phase 5's cascade-delete observation needs every +# parent to own at least one child whose data partition might land on node 4. +# Each create is retried up to 3 times to absorb transient routing failures +# during cluster warmup.
+EXTRA_COMMENTS_OK=0 +for pid in "${POST_IDS[@]}"; do + for n in 1 2; do + cid="" + for attempt in 1 2 3; do + RESP=$("$MQDB_BIN" create comments "${CLI_ARGS[@]}" \ + -d "{\"post_id\":\"$pid\",\"text\":\"extra-$n on $pid\"}" 2>/dev/null) + if [[ "$(json_field "$RESP" "status")" == "ok" ]]; then + cid=$(json_field "$RESP" "data.id") + break + fi + sleep 1 + done + if [[ -n "$cid" ]]; then + COMMENT_IDS+=("$cid") + COMMENT_PARENTS+=("$pid") + EXTRA_COMMENTS_OK=$((EXTRA_COMMENTS_OK + 1)) + fi + done +done +TOTAL=$((TOTAL + 1)) +if [[ $EXTRA_COMMENTS_OK -eq 20 ]]; then + echo " PASS: created 20 extra child comments across $NUM_POSTS posts" + PASS=$((PASS + 1)) +else + echo " FAIL: only created $EXTRA_COMMENTS_OK of 20 extra child comments (after retries)" + FAIL=$((FAIL + 1)) +fi echo "" echo "=== Phase 3: sanity checks against node 1 (pre-rebalance) ===" @@ -430,6 +486,95 @@ else "expected when partition schema_partition(\"comments\") for the FK definition did not rebalance to node 4" fi +# --------------------------------------------------------------------------- +# Diagnostic observation: FK CASCADE through node 4. The reverse-index rebuild +# in this PR is a per-node operation — it can only populate node 4's index for +# the constraints node 4 has locally. Constraints in MQDB are routed through +# `schema_partition(entity)`, so any node that doesn't own that partition has +# the constraint reachable only via forwarding, not in its local store. That +# means cascade outcomes through any specific node depend on whether that node +# has the constraint locally, which is a separate cluster-wide concern (see +# "schema/constraint replication topology" in the 0.3.4 CHANGELOG entry). +# +# Surfaced as an observation rather than a hard assertion: prints success/fail +# counts so a human can correlate with whichever node ended up owning the +# constraint partition, but does not gate the script's exit code. +# --------------------------------------------------------------------------- + +echo "" +echo " [FkReverseIndex cascade observation] delete every parent via node 4..." +DELETED_PARENT_IDS="" +DEL_OK=0 +DEL_FAIL=0 +for pid in "${POST_IDS[@]}"; do + deleted=0 + for attempt in 1 2 3; do + RESP=$("$MQDB_BIN" delete posts "$pid" "${NODE4_ARGS[@]}" 2>/dev/null) + if [[ "$(json_field "$RESP" "status")" == "ok" ]]; then + deleted=1; break + fi + sleep 2 + done + if [[ $deleted -eq 1 ]]; then + DEL_OK=$((DEL_OK + 1)) + DELETED_PARENT_IDS="$DELETED_PARENT_IDS|$pid|" + else + DEL_FAIL=$((DEL_FAIL + 1)) + fi +done +echo " Deleted $DEL_OK of $NUM_POSTS posts ($DEL_FAIL transient failures)" + +# Allow cascade to settle. 
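+# (fixed wait; the read-back loop below also retries each child read up to +# 3 times before classifying it as cascaded or surviving)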
+sleep 5 + +EXPECTED_CASCADE=0 +SURVIVORS=() +SKIPPED_OF_UNDELETED=0 +for i in "${!COMMENT_IDS[@]}"; do + cid="${COMMENT_IDS[$i]}" + parent="${COMMENT_PARENTS[$i]}" + found=0 + for attempt in 1 2 3; do + RESP=$("$MQDB_BIN" read comments "$cid" "${NODE4_ARGS[@]}" 2>/dev/null) + status=$(json_field "$RESP" "status") + if [[ "$status" == "ok" ]]; then + found=1; break + fi + if [[ "$status" == "error" ]]; then + break + fi + sleep 1 + done + parent_was_deleted=0 + case "$DELETED_PARENT_IDS" in + *"|$parent|"*) parent_was_deleted=1 ;; + esac + if [[ $parent_was_deleted -eq 1 ]]; then + EXPECTED_CASCADE=$((EXPECTED_CASCADE + 1)) + if [[ $found -eq 1 ]]; then + SURVIVORS+=("$cid (parent=$parent)") + fi + elif [[ $found -eq 1 ]]; then + SKIPPED_OF_UNDELETED=$((SKIPPED_OF_UNDELETED + 1)) + fi +done + +CASCADED=$((EXPECTED_CASCADE - ${#SURVIVORS[@]})) +if [[ ${#SURVIVORS[@]} -eq 0 ]]; then + observe "FK cascade via node 4: all $EXPECTED_CASCADE eligible children removed" "ok" \ + "constraint reachable from cascade-initiating primaries" +else + observe "FK cascade via node 4: $CASCADED of $EXPECTED_CASCADE eligible children removed" "partial" \ + "${#SURVIVORS[@]} survivors — depends on whether the constraint partition lives on the cascade-initiating node; this is the schema/constraint replication gap, not the FkReverseIndex rebuild" + for s in "${SURVIVORS[@]:0:3}"; do + echo " survivor: $s" + done + [[ ${#SURVIVORS[@]} -gt 3 ]] && echo " ... and $((${#SURVIVORS[@]} - 3)) more" +fi +if [[ $SKIPPED_OF_UNDELETED -gt 0 ]]; then + echo " (skipped $SKIPPED_OF_UNDELETED children of undeleted parents)" +fi + echo "" echo "==========================================" echo " Hard assertions:"
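To make the rebuild-on-import mechanism concrete, here is a minimal self-contained sketch. The names (`FkReverseIndexSketch`, `rebuild_for_constraint`) and the use of plain `HashMap`/`HashSet` plus `serde_json` are illustrative stand-ins, not the crate's actual types, and the update/remove paths are omitted. It demonstrates the property the docs change states: every reverse-index entry is derivable from a source record's payload plus the FK constraint definition, which is why running the rebuild at the end of `import_partition` is sufficient.

```rust
use std::collections::{HashMap, HashSet};

// Illustrative stand-in for the cluster's FkReverseIndex; the key shape
// follows the CHANGELOG: (target_entity, target_id, source_entity, source_field).
type Key = (String, String, String, String);

#[derive(Default)]
struct FkReverseIndexSketch {
    map: HashMap<Key, HashSet<String>>,
}

impl FkReverseIndexSketch {
    fn insert(&mut self, te: &str, ti: &str, se: &str, sf: &str, source_id: &str) {
        self.map
            .entry((te.into(), ti.into(), se.into(), sf.into()))
            .or_default()
            .insert(source_id.to_string());
    }

    fn lookup(&self, te: &str, ti: &str, se: &str, sf: &str) -> Vec<String> {
        self.map
            .get(&(te.into(), ti.into(), se.into(), sf.into()))
            .map(|ids| ids.iter().cloned().collect())
            .unwrap_or_default()
    }

    // Rebuild entries for one FK constraint by scanning (source_id, payload)
    // pairs, mirroring how rebuild_fk_index_for_constraint is described as
    // walking db_data.list(source_entity).
    fn rebuild_for_constraint(
        &mut self,
        source_entity: &str,
        source_field: &str,
        target_entity: &str,
        records: &[(String, serde_json::Value)],
    ) {
        for (source_id, payload) in records {
            if let Some(target_id) = payload.get(source_field).and_then(|v| v.as_str()) {
                self.insert(target_entity, target_id, source_entity, source_field, source_id);
            }
        }
    }
}

fn main() {
    // Simulate a freshly imported partition: the records exist, but the
    // in-memory reverse index starts empty on the importing node.
    let imported = vec![
        ("c1".to_string(), serde_json::json!({ "post_id": "p1" })),
        ("c2".to_string(), serde_json::json!({ "post_id": "p1" })),
        ("c3".to_string(), serde_json::json!({ "post_id": "p2" })),
    ];

    let mut idx = FkReverseIndexSketch::default();
    assert!(idx.lookup("posts", "p1", "comments", "post_id").is_empty());

    // The post-import rebuild re-derives every entry from data + constraint.
    idx.rebuild_for_constraint("comments", "post_id", "posts", &imported);

    let mut refs = idx.lookup("posts", "p1", "comments", "post_id");
    refs.sort();
    assert_eq!(refs, vec!["c1".to_string(), "c2".to_string()]);
}
```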