
schemas not replicated uniformly across cluster nodes #58

@fabracht

Description

Summary

Sibling of #57. The same broadcast plumbing that fails to deliver constraints uniformly also covers schemas: DB_SCHEMA is in the broadcast entity list at crates/mqdb-cluster/src/cluster/node_controller/replication_ops.rs:81-85 and :345-349. The 0.3.2 CHANGELOG already flagged "schema/constraint replication topology" as a known gap; this issue tracks the schema half explicitly so it isn't lost behind the constraint fix.

Symptoms

In the cluster-rebalance-stores E2E run (PR #56), an existing observation block already prints whether schema get posts and schema get comments succeed via node 4 after a rebalance-driven join. Across runs the result varies with whether schema_partition(entity) happens to land on node 4; when the partition isn't there, node 4 reaches the schema only by forwarding to whichever node holds it locally. Tracing on the constraint side (issue #57) directly showed db_constraints.list_all() returning different counts per node (node 1: 2, node 2: 0, node 3: 1, node 4: 0). Schemas use the identical write/replication path, so the same shape applies.
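
To make the partition-dependence concrete, here is a minimal self-contained sketch of why schema get succeeds or fails depending on placement. Both schema_partition (a stand-in hash, not mqdb's real function) and the partition-to-node assignment are assumptions for illustration only:

```rust
// Hypothetical sketch: whether node 4 can answer `schema get <entity>` locally
// depends entirely on where schema_partition(entity) landed after rebalancing.
// The hash and the assignment below are illustrative, not mqdb's real code.

fn schema_partition(entity: &str, num_partitions: u64) -> u64 {
    // Stand-in for mqdb's actual partition hash.
    entity
        .bytes()
        .fold(0u64, |h, b| h.wrapping_mul(31).wrapping_add(b as u64))
        % num_partitions
}

/// True if `node` holds the entity's schema partition locally under `assignment`
/// (a partition-index -> node-id map produced by a rebalance).
fn served_locally(entity: &str, node: u64, assignment: &[u64]) -> bool {
    let p = schema_partition(entity, assignment.len() as u64) as usize;
    assignment[p] == node
}

fn main() {
    // Arbitrary example: 8 partitions spread over nodes 1..=4 after a rebalance.
    let assignment: [u64; 8] = [1, 2, 3, 4, 1, 2, 3, 4];
    for entity in ["posts", "comments"] {
        println!(
            "{entity}: served locally by node 4 = {}",
            served_locally(entity, 4, &assignment)
        );
    }
}
```

Without explicit forwarding, a false result here is exactly the "schema get via node 4 fails" shape the observation block prints.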

Why it matters specifically for schemas

  • Write validation. SchemaStore::is_valid_for_write (crates/mqdb-cluster/src/cluster/db/schema_store.rs) is consulted on every per-entity write. A node that doesn't hold the schema locally either has to forward the validation check (extra hop) or skip it (silent acceptance of malformed records).
  • Schema state machine. Transitions through SchemaState::{Active, PendingAdd, PendingDrop, Dropped} are point-in-time decisions. If node A still sees Active while node B has advanced to PendingDrop, writes routed to one node can be accepted while the other rejects them — observable as flapping behavior on the same record.
  • mqdb schema list / schema get. Both queries return only what the receiving node knows locally unless explicit forwarding is wired in, so the visible schema set depends on which node the client connects to.
  • Snapshot delivery on join (PR #54, "add partition snapshot exports for schema/index/unique/fk/constraint stores"). A new node's partition snapshot only includes schemas where c.partition() == partition. Schemas whose natural partition didn't rebalance to the new node never arrive in its SchemaStore, same as constraints.
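
The state-machine bullet above can be sketched in a few lines. The variant names mirror SchemaState::{Active, PendingAdd, PendingDrop, Dropped} from the issue, but the acceptance rule is an assumption, not mqdb's actual validation logic:

```rust
// Hedged sketch of the flapping behavior: two nodes with divergent schema state
// give opposite answers for the same write. The accepts_write rule is assumed.

#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Debug)]
enum SchemaState {
    Active,
    PendingAdd,
    PendingDrop,
    Dropped,
}

/// Assumed rule: writes are accepted only while the schema is Active.
fn accepts_write(state: SchemaState) -> bool {
    matches!(state, SchemaState::Active)
}

fn main() {
    // Node A missed the PendingDrop broadcast; node B applied it.
    let node_a = SchemaState::Active;
    let node_b = SchemaState::PendingDrop;
    // The same record flaps depending on which node the write routes to.
    println!("node A accepts: {}", accepts_write(node_a));
    println!("node B accepts: {}", accepts_write(node_b));
}
```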

Root cause (shared with #57)

Schemas are written via a ReplicationWrite with the entity's own schema_partition — so the broadcast loop at replication_ops.rs:350-369 should send to every alive node. Empirically (per the constraint trace in #57) those broadcasts don't reliably reach every node, and joining nodes only catch up via per-partition snapshots which are scoped to the partition being snapshotted. The two failure modes:

  1. Live broadcast misses. A node marked alive by heartbeat may still drop or fail to apply incoming WriteRequests during warmup, with no retry. handle_write_request logs and continues on apply_write failure (replication_ops.rs:95-97).
  2. Late-join replay. A new node receives partition snapshots only for partitions assigned to it. Schemas living in unassigned partitions are never delivered.
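
Failure mode 1 can be reduced to a small self-contained model. Node, apply_write, and the warmup flag are illustrative stand-ins; only the "log and continue on failure" shape mirrors handle_write_request at replication_ops.rs:95-97:

```rust
// Minimal model of a fire-and-forget broadcast: a node that errors during
// warmup simply misses the write, and nothing ever retries it.

use std::collections::HashMap;

struct Node {
    warming_up: bool,
    schemas: HashMap<String, String>,
}

impl Node {
    fn apply_write(&mut self, entity: &str, schema: &str) -> Result<(), &'static str> {
        if self.warming_up {
            return Err("not ready"); // write is lost with no retry
        }
        self.schemas.insert(entity.to_string(), schema.to_string());
        Ok(())
    }
}

fn broadcast(nodes: &mut [Node], entity: &str, schema: &str) {
    for (i, node) in nodes.iter_mut().enumerate() {
        if let Err(e) = node.apply_write(entity, schema) {
            // Equivalent of the replication_ops.rs:95-97 behavior: log, move on.
            eprintln!("node {}: apply_write failed: {e}", i + 1);
        }
    }
}

fn main() {
    let mut nodes = vec![
        Node { warming_up: false, schemas: HashMap::new() },
        Node { warming_up: true, schemas: HashMap::new() }, // joining node
    ];
    broadcast(&mut nodes, "posts", "{...}");
    // The warmed-up node has the schema; the joining node silently lacks it.
    for (i, n) in nodes.iter().enumerate() {
        println!("node {} has posts schema: {}", i + 1, n.schemas.contains_key("posts"));
    }
}
```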

Suggested directions (same options as #57)

  1. Promote schemas to true broadcast state — apply locally first, then dual-write to every alive node WITH retry/idempotency, identical to how topic_index / wildcards / client_locations are handled. Schemas are small and rarely change; the cost is negligible.
  2. Replicate via Raft. Schemas are cluster-management data, not per-partition workload. Routing through Raft (which already replicates the log to every node) gives uniform delivery for free, at the cost of coupling schema mutations to leader-elected availability.
  3. On node join, request a full schema/constraint catalog from the leader — independent of partition snapshots. A small explicit catch-up step alongside the snapshot stream, instead of relying on partition assignments to carry metadata.
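
Direction 1 could look roughly like the following, under stated assumptions: schema writes carry a monotonically increasing version so re-delivery is idempotent, and the sender retries each peer until the apply is acknowledged. All names here are hypothetical, not mqdb's API:

```rust
// Sketch of retry + idempotency for schema broadcast. Duplicate or stale
// versions are no-ops, so a bounded retry loop is safe to repeat.

use std::collections::HashMap;

#[derive(Default)]
struct SchemaStore {
    versions: HashMap<String, u64>, // entity -> highest applied version
}

impl SchemaStore {
    /// Idempotent apply: returns true only when the version advances.
    fn apply(&mut self, entity: &str, version: u64) -> bool {
        let v = self.versions.entry(entity.to_string()).or_insert(0);
        if version > *v {
            *v = version;
            true
        } else {
            false
        }
    }
}

/// Retry each peer up to `max_attempts`; `deliver` models a possibly flaky
/// send (peer index, attempt number) -> delivered?
fn broadcast_with_retry<F>(
    peers: &mut [SchemaStore],
    entity: &str,
    version: u64,
    max_attempts: u32,
    mut deliver: F,
) -> bool
where
    F: FnMut(usize, u32) -> bool,
{
    peers.iter_mut().enumerate().all(|(i, peer)| {
        (1..=max_attempts).any(|attempt| {
            if deliver(i, attempt) {
                peer.apply(entity, version); // safe even if re-delivered
                true
            } else {
                false
            }
        })
    })
}

fn main() {
    let mut peers = vec![SchemaStore::default(), SchemaStore::default()];
    // Peer 1 drops the first attempt (simulated warmup), succeeds on retry.
    let ok = broadcast_with_retry(&mut peers, "posts", 7, 3, |peer, attempt| {
        !(peer == 1 && attempt == 1)
    });
    println!("all peers converged: {ok}");
}
```

The same versioning would also make direction 3's catalog catch-up trivially safe to re-request.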

A combined design touching both schemas and constraints would close #57 and this together.

Repro

Same as #57: run examples/cluster-rebalance-stores/run.sh with an enterprise license. The "OBSERVATION (missing): schema get comments via node 4" output line in failing runs reflects this gap. To prove the local-store gap directly, add a tracing::info! in SchemaStore::apply_replicated printing the count after each apply and re-run.
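
A self-contained sketch of that probe is below. The real patch would put a tracing::info! inside SchemaStore::apply_replicated; here the store is a stub and eprintln! stands in so the sketch runs without the tracing crate:

```rust
// Illustrative stub of the suggested probe: log the local schema count after
// every replicated apply, so per-node divergence shows up directly in the logs.

use std::collections::HashMap;

struct SchemaStore {
    node_id: u64,
    schemas: HashMap<String, String>,
}

impl SchemaStore {
    fn apply_replicated(&mut self, entity: &str, schema: &str) -> usize {
        self.schemas.insert(entity.to_string(), schema.to_string());
        // In mqdb this would be something like:
        // tracing::info!(node = self.node_id, count = self.schemas.len(),
        //                "schema store after replicated apply");
        eprintln!(
            "node {} schema count after apply: {}",
            self.node_id,
            self.schemas.len()
        );
        self.schemas.len()
    }
}

fn main() {
    let mut store = SchemaStore { node_id: 4, schemas: HashMap::new() };
    store.apply_replicated("posts", "{...}");
    store.apply_replicated("comments", "{...}");
}
```

Diverging counts across nodes after the same sequence of applies would confirm the local-store gap without relying on the E2E observation block.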
