Summary
handle_tick schedules the next tick after on_tick returns, with a delay computed from the timestamp captured at entry (crates/blockchain/src/lib.rs:694-708 on main):
let now_ms = ...; // captured at tick entry
self.on_tick(now_ms, ctx).await; // may run long (block building proves inline)
let ms_to_next_interval = ms_until_next_interval(now_ms, genesis_time_ms);
send_after(Duration::from_millis(ms_to_next_interval), ...);
If on_tick takes longer than the remaining time to the next interval boundary, the next tick fires past that boundary and the skipped interval's duty never runs.
Concrete case: the tick at interval 0 builds a block. Block building proves inline on the actor thread (proposer signature aggregation / proof merging, ~1.3s observed). With entry at t=0, ms_until_next_interval = 800ms, but send_after only starts counting at t≈1300ms → next tick fires at t≈2100ms, which is interval 2. Interval 1 — attestation production — is silently skipped.
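The arithmetic in the concrete case can be reproduced with a small model. This is a sketch under stated assumptions, not the code from `lib.rs`: `INTERVAL_MS`, `interval_at`, and `next_tick_current` are hypothetical names, the 800ms interval length is taken from the example above, and `ms_until_next_interval` is reimplemented here only to match the behaviour described (800ms remaining when entering exactly on a boundary).

```rust
/// Assumed interval length, taken from the 800 ms figure in the example above.
const INTERVAL_MS: u64 = 800;

/// Hypothetical stand-in for the real `ms_until_next_interval`:
/// milliseconds from `now_ms` to the next interval boundary after genesis.
fn ms_until_next_interval(now_ms: u64, genesis_time_ms: u64) -> u64 {
    let elapsed = now_ms.saturating_sub(genesis_time_ms);
    INTERVAL_MS - (elapsed % INTERVAL_MS)
}

/// Index of the interval containing `now_ms`, counted from genesis.
fn interval_at(now_ms: u64, genesis_time_ms: u64) -> u64 {
    now_ms.saturating_sub(genesis_time_ms) / INTERVAL_MS
}

/// Models the scheduling described above: the delay is computed from the
/// timestamp captured at entry, but the timer is only armed once `on_tick`
/// has returned, so the `on_tick` duration is added on top of the delay.
fn next_tick_current(entry_ms: u64, on_tick_ms: u64, genesis_time_ms: u64) -> u64 {
    let delay = ms_until_next_interval(entry_ms, genesis_time_ms);
    entry_ms + on_tick_ms + delay
}
```

With entry at t=0 and a 1300ms `on_tick`, `next_tick_current(0, 1300, 0)` gives 2100, and `interval_at(2100, 0)` gives 2, so no tick is ever processed for interval 1.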
Impact
- A proposing node produces no attestations for any slot it proposes, whenever block building exceeds 800ms (which it does whenever it proves).
- N-node devnets lose ~1/N of attestations every slot.
- 1-node devnets can never justify (proposer every slot → zero attestations after slot 0); 2-node devnets can't reach the 2/3 threshold either. Finalization requires ≥4 nodes purely because of this skip.
- Any other duty overrun has the same effect on subsequent intervals (e.g. a slow interval-2 tick would skip the interval-3 safe-target update).
Evidence
Single-node devnet (4 validators, aggregator), devnet5 + leanVM 0520822, release build:
- Slot 0: 4 attestations published at interval 1, aggregation completes (interval 0 had no block build).
- Slots 1–26: Building block slot=N logged every slot (each ~1.3s incl. inline prove), zero Published attestation lines, justified_slot=0 finalized_slot=0 throughout.
- Same topology with 4 nodes (proposer rotates, 3/4 attest each slot): justification and finalization advance normally — confirming the mechanism rather than any crypto issue.
Possible directions
1. Schedule each tick for the next interval boundary regardless of how long on_tick took (compute the delay from a fresh timestamp, and if a boundary was missed, fire immediately so the skipped interval's duty still runs, possibly tagged with its intended interval).
2. Move block building off the actor thread (spawn_blocking, like the aggregation worker) so on_tick returns within the interval.
3. Derive slot/interval from the tick's scheduled time instead of the wall clock at processing time, so late ticks still execute their intended duty.
(1) and (3) change catch-up semantics for genuinely-late nodes and need care; (2) is the most contained but only fixes the block-building instance.
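A minimal sketch of what direction (1) could look like, assuming the same 800ms interval as the example in the Summary. `Plan`, `plan_after_tick`, `interval_at`, and `INTERVAL_MS` are hypothetical names introduced for illustration; the real change would have to fit the actor's `send_after` plumbing and decide how far behind a node may be before catch-up is abandoned.

```rust
/// Assumed interval length, taken from the 800 ms figure in the Summary.
const INTERVAL_MS: u64 = 800;

/// Index of the interval containing `now_ms`, counted from genesis.
fn interval_at(now_ms: u64, genesis_time_ms: u64) -> u64 {
    now_ms.saturating_sub(genesis_time_ms) / INTERVAL_MS
}

/// What to do once `on_tick` has returned (hypothetical type).
#[derive(Debug, PartialEq)]
struct Plan {
    /// Intervals whose boundary passed while `on_tick` was running.
    /// Their duties should run immediately, tagged with these indices.
    missed: Vec<u64>,
    /// Delay until the next boundary, measured from the fresh timestamp.
    delay_ms: u64,
}

/// Computes the plan from a timestamp read after `on_tick` returned,
/// rather than from the timestamp captured at tick entry.
fn plan_after_tick(processed_interval: u64, fresh_now_ms: u64, genesis_time_ms: u64) -> Plan {
    let current = interval_at(fresh_now_ms, genesis_time_ms);
    // Empty when `on_tick` finished inside the interval it started in.
    let missed: Vec<u64> = (processed_interval + 1..=current).collect();
    let elapsed = fresh_now_ms.saturating_sub(genesis_time_ms);
    let delay_ms = INTERVAL_MS - (elapsed % INTERVAL_MS);
    Plan { missed, delay_ms }
}
```

For the case from the Summary (interval 0 processed, `on_tick` returning at t=1300ms), this yields `missed = [1]` and `delay_ms = 300`: the attestation duty runs late but does run, and the following tick lands on the interval-2 boundary at t=1600ms. Whether a late duty should still execute at all is the catch-up question raised above.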
Found while soak-testing the zk-alloc allocator (#412); the bug is independent of that PR and reproduces with the default allocator whenever block building exceeds one interval.