Skip to content

Add lean_block_proposal_* metrics for block-proposal attestation selection#414

Merged
MegaRedHand merged 2 commits into
mainfrom
metrics-block-proposal-attestation-build
Jun 3, 2026
Merged

Add lean_block_proposal_* metrics for block-proposal attestation selection#414
MegaRedHand merged 2 commits into
mainfrom
metrics-block-proposal-attestation-build

Conversation

@pablodeymo
Copy link
Copy Markdown
Collaborator

🗒️ Description / Motivation

Ports the five lean_block_proposal_* observability metrics from leanSpec PR #753 into the ethlambda block builder.

These metrics give cross-client visibility into the block-proposal attestation-selection path (build_block): how long each phase takes, how many proposal builds run, how many child payloads are greedily consumed, and how many distinct AttestationData / aggregated proofs end up in the proposed block. They align with zeam #914's getProposalAttestations instrumentation and the leanSpec naming so the leanMetrics dashboards work across clients.

What Changed

crates/blockchain/src/metrics.rs — five new metrics registered with the existing LazyLock + register_*! pattern, a BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASES label constant, registration in init(), public API functions, and a unit test:

Metric Type Buckets / Labels
lean_block_proposal_attestation_build_phase_seconds HistogramVec phase = select_payloads, compact, stf_simulate; buckets 0.001…8
lean_block_proposal_attestation_builds_total Counter one per proposal attempt
lean_block_proposal_child_payloads_consumed_total Counter greedily-picked proofs before compaction
lean_block_proposal_attestation_data_selected Histogram buckets 0, 1, 2, 4, 8, 16, 32
lean_block_proposal_aggregates_selected Histogram buckets 0, 1, 2, 4, 8, 16, 32, 64, 128

crates/blockchain/src/block_builder.rs — instruments build_block: times the select_attestations, compact_attestations, and STF (process_slots + process_block) phases, and emits the counters/histograms after a successful build.

docs/metrics.md — documents all five in the Block Production Metrics table.

Correctness / Behavior Guarantees

  • No behavior change. Only metric observations were added around existing logic; block contents, selection order, and the state-transition path are untouched.
  • Architectural divergence from leanSpec, documented. leanSpec re-runs the STF inside a fixed-point loop and observes stf_simulate per round. ethlambda projects justification/finalization incrementally during selection and runs the STF exactly once at the end, so its stf_simulate is a single observation per build. This is noted on the BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASES doc comment, consistent with the upstream PR's own caveat that phase timings are not directly comparable across clients.
  • attestation_data_selected and aggregates_selected are observed from the post-compaction body (one merged proof per distinct AttestationData), matching the spec's intent.
  • Metrics are only emitted on a successful build; a build that errors out in the STF is already counted by the existing lean_block_building_failures_total.

Tests Added / Run

  • New unit test metrics::tests::block_proposal_attestation_build_metrics_are_usable — verifies the phase metric registers and accepts every label in BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASES, and that the companion counters/histograms are callable. Guards against drift between the label constant and the strings passed at the build_block call sites.
  • Existing block_builder tests (build_block_*, compact_attestations_*, extend_proofs_greedily_*) now exercise the new metric paths and pass unchanged.

Commands run:

  • make fmt — clean
  • cargo clippy -p ethlambda-blockchain --all-targets -- -D warnings — clean
  • cargo test -p ethlambda-blockchain --lib — 23 passing

Related Issues / PRs

✅ Verification Checklist

  • Ran make fmt — clean
  • Ran make lint (clippy with -D warnings) — clean
  • Ran cargo test --workspace --release — only ethlambda-blockchain lib suite run (23 passing); full workspace release run not yet executed

…ction

Port the five block-proposal observability metrics from leanSpec PR #753 into
the ethlambda block builder:

- lean_block_proposal_attestation_build_phase_seconds{phase} — phase-level
  timing for select_payloads, compact, and stf_simulate
- lean_block_proposal_attestation_builds_total — completed selection runs
- lean_block_proposal_child_payloads_consumed_total — greedily-picked proofs
  before compaction
- lean_block_proposal_attestation_data_selected — distinct AttestationData in
  the block body
- lean_block_proposal_aggregates_selected — proofs after compaction

Instrument build_block to time the three phases and emit the counters and
histograms after a successful build. ethlambda projects justification and
finalization incrementally during selection and runs the STF once at the end,
rather than re-running it inside a fixed-point loop as leanSpec does, so
stf_simulate is a single observation per build; this divergence is documented
on the phase-label constant.

Add a unit test guarding against drift between the phase-label constant and the
strings passed at the build_block call sites, and document all five metrics in
docs/metrics.md.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

🤖 Kimi Code Review

The PR adds comprehensive metrics instrumentation for block-proposal attestation selection. The implementation follows existing patterns in the codebase and is generally well-structured.

Issues and suggestions:

  1. Histogram bucket upper bound too low (metrics.rs:447-448)
    The lean_block_proposal_attestation_data_selected histogram maxes out at 32.0. With 64+ committees per slot on mainnet, distinct AttestationData entries may exceed this, causing loss of granularity in the +Inf bucket.

    // Consider: vec![0.0, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
  2. Missing metrics on error paths (block_builder.rs:93-96)
    If process_slots or process_block returns an error, the stf_simulate timing observation is skipped. While the failure is tracked by lean_block_building_failures_total, the partial timing data for the failed STF run is lost. Consider using a scope guard or recording the metric before the ? operator if you need visibility into failed attempt durations.

  3. Type safety for phase labels (metrics.rs:815-818)
    The observe_block_proposal_phase function accepts any &str, risking runtime panics on unregistered labels. While the test guards against drift, consider using an enum or verifying against BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASES at compile time or runtime in debug builds.

Positive aspects:

  • Correct use of Instant::now()/elapsed() for high-precision timing without non-deterministic side effects on consensus.
  • Proper separation of metrics: child_payloads_consumed captured pre-compaction vs attestation_data_selected post-compaction correctly reflects the algorithm phases.
  • The test module validates metric registration and prevents label drift between constants and call sites.
  • Documentation in metrics.md accurately reflects the implementation.

Consensus safety note:
The metrics additions are purely observational and do not affect state transition logic or fork-choice rules. The timing measurements are isolated from consensus-critical paths and cannot introduce non-determinism into the protocol state.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

🤖 Codex Code Review

Findings

  1. Medium: the new unit test is nondeterministic because it asserts exact values on process-global Prometheus statics. In the same test binary, other unit tests call build_block(), which now increments these same metrics, so get_sample_count() == 1 / get() == 1 / get() == 3 can fail depending on execution order and test parallelism. Use relative assertions (>= 1), record deltas before/after, or isolate the registry instead. See crates/blockchain/src/metrics.rs and the existing build_block() unit tests in crates/blockchain/src/block_builder.rs.

  2. Low: lean_block_proposal_attestation_builds_total is documented as “one per proposal attempt”, but it is only incremented after compact_attestations, process_slots, and process_block all succeed. Any failure in compaction/XMSS aggregation or STF skips the counter entirely, so dashboards derived from it will undercount actual attempts and bias phase metrics toward successes. Either move the increment to the start of build_block() or rename/document it as a success-only metric. See crates/blockchain/src/block_builder.rs and crates/blockchain/src/metrics.rs.

I did not see a consensus-path logic change here: fork choice, attestation validation, justification/finalization, STF, XMSS verification, and SSZ behavior are unaffected beyond telemetry. I could not run cargo test in this environment because the toolchain wants to access rustup/network state that is blocked here.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

🤖 Claude Code Review

Code Review: PR #414lean_block_proposal_* metrics

Overview

This PR adds five observability metrics to the build_block path, porting from leanSpec PR #753. The change is purely additive — no behavioral modification, no new logic, correct _total suffix on counters. The overall implementation is clean and follows existing patterns well.


Correctness

stf_simulate timer excludes hash_tree_root() (block_builder.rs:94–101)

let stf_start = Instant::now();
process_slots(&mut post_state, slot)?;
process_block(&mut post_state, &final_block)?;
metrics::observe_block_proposal_phase("stf_simulate", stf_start.elapsed());
final_block.state_root = post_state.hash_tree_root();  // ← not timed

hash_tree_root() is a full Merkle tree hash of the state — non-trivial cost. If other clients include this in their stf_simulate observation, the cross-client comparison that is the stated purpose of the metric becomes apples-to-oranges. The leanSpec PR description suggests stf_simulate covers the full "candidate block state transition", which semantically includes sealing the state root. Moving the observation one line down would fix this:

let stf_start = Instant::now();
process_slots(&mut post_state, slot)?;
process_block(&mut post_state, &final_block)?;
final_block.state_root = post_state.hash_tree_root();
metrics::observe_block_proposal_phase("stf_simulate", stf_start.elapsed());

Style

Comment violates project guidelines (block_builder.rs:68–70)

// Each entry in `selected` is one greedily-picked child payload, counted
// before compaction merges proofs sharing the same AttestationData.
let child_payloads_consumed = selected.len();

CLAUDE.md: "Default to writing no comments. Only add one when the WHY is non-obvious. Don't explain WHAT the code does." This comment explains WHAT, which the variable name already conveys. The distinction "before compaction" is derivable from the ordering of selected.len() before compact_attestations. It should be removed.

The second inline comment (// ethlambda runs the STF once after selection…) is more defensible — it documents an architectural cross-client difference that would genuinely surprise a reader. That one is fine to keep.


Design / Low-Risk Notes

observe_block_proposal_phase accepts a free-form &str (metrics.rs:815–819)

pub fn observe_block_proposal_phase(phase: &str, elapsed: Duration) {
    LEAN_BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASE_SECONDS
        .with_label_values(&[phase])
        .observe(elapsed.as_secs_f64());
}

A typo at any call site silently creates a new Prometheus label series. The unit test guards against drift in the three defined labels, but a future developer adding a fourth phase could pass "select_payload" and produce a ghost label series that never appears in dashboards. This is consistent with how ATTESTATION_AGGREGATE_COVERAGE_SECTIONS is handled elsewhere, so it's not a regression — just worth flagging as a known footgun. An enum would close the gap if stronger guarantees become desirable.

attestation_data_selected and aggregates_selected always observe the same count

After compacted.into_iter().unzip(), both aggregated_attestations and aggregated_signatures have identical lengths, so final_block.body.attestations.len() and aggregated_signatures.len() are always equal. The PR description notes this is intentional for leanMetrics dashboard parity, and the conceptual separation (distinct AttestationData vs. distinct proofs) has value for cross-client comparison, so this is fine as-is.

Test fragility (metrics.rs:857)

assert_eq!(
    LEAN_BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASE_SECONDS
        .with_label_values(&[phase])
        .get_sample_count(),
    1,
    ...
);

Global Prometheus statics persist across the test binary run. If another test in the suite calls observe_block_proposal_phase (unlikely today, since these metrics are new), the hardcoded == 1 assertion would fail. Low probability now, but the pattern is fragile. >= 1 or resetting counts before the assertion would be more robust.


Summary

The implementation is correct, consistent with existing patterns, and zero-risk behaviorally. Two items worth addressing before merge:

  • Item 1 (minor correctness): Move the stf_simulate observation to after hash_tree_root() for accurate cross-client comparison.
  • Item 2 (style): Remove the child_payloads_consumed comment per CLAUDE.md guidelines.

The test design, metric naming, bucket choices, init() registration, and docs table are all solid.


Automated review by Claude (Anthropic) · sonnet · custom prompt

Comment thread crates/blockchain/src/block_builder.rs Outdated
Comment thread crates/blockchain/src/metrics.rs Outdated
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 3, 2026

Greptile Summary

This PR ports five lean_block_proposal_* observability metrics from the leanSpec cross-client spec into the ethlambda block builder, instrumenting the build_block attestation-selection path with phase timers and aggregate counters.

  • block_builder.rs wraps select_attestations, compact_attestations, and the STF call with Instant-based timers, and emits counters/histograms for payload counts and post-compaction attestation sizes after each successful build.
  • metrics.rs registers five new statics (HistogramVec, two Histogram, two IntCounter) under the existing LazyLock pattern, forces them in init(), and adds a unit test — though the test's absolute-value assertions on shared global counters can fail non-deterministically when the parallel build_block_* tests increment the same metrics first.
  • docs/metrics.md adds corresponding rows to the Block Production Metrics table with correct bucket and label values.

Confidence Score: 4/5

Safe to merge — only metric observations were added around existing logic; block selection and the STF path are untouched.

The instrumentation is purely additive and the production code path is correct. The only concern is the new unit test, whose exact-value counter assertions can produce non-deterministic failures because the existing build_block_* tests now also trigger these same global metric counters and run in parallel by default.

crates/blockchain/src/metrics.rs — specifically the block_proposal_attestation_build_metrics_are_usable test assertions.

Important Files Changed

Filename Overview
crates/blockchain/src/block_builder.rs Instruments build_block with three phase timers and four metric observations; no behavior change to block selection or STF logic.
crates/blockchain/src/metrics.rs Registers five new metrics and exposes public API functions correctly; the new unit test has fragile exact-value assertions against global counters shared with parallel build_block_* tests.
docs/metrics.md Adds five rows to the Block Production Metrics table; buckets and labels match the registered metric definitions.

Sequence Diagram

sequenceDiagram
    participant BB as build_block
    participant SA as select_attestations
    participant CA as compact_attestations
    participant STF as process_slots/process_block
    participant M as metrics

    BB->>SA: select_attestations(head_state, ...)
    SA-->>BB: selected (Vec of Attestation+Proof pairs)
    BB->>M: observe_block_proposal_phase("select_payloads", elapsed)

    BB->>CA: compact_attestations(selected, head_state)
    CA-->>BB: compacted (one entry per AttestationData)
    BB->>M: observe_block_proposal_phase("compact", elapsed)

    BB->>STF: process_slots + process_block
    STF-->>BB: post_state
    BB->>M: observe_block_proposal_phase("stf_simulate", elapsed)

    BB->>M: inc_block_proposal_attestation_builds()
    BB->>M: inc_block_proposal_child_payloads_consumed(selected.len())
    BB->>M: observe_block_proposal_attestation_data_selected(attestations.len())
    BB->>M: observe_block_proposal_aggregates_selected(signatures.len())
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
crates/blockchain/src/metrics.rs:862-889
**Test assertions race against shared global counters**

The assertions on absolute counter values (`get() == 1`, `get() == 3`, `get_sample_count() == 1`) assume the global metric statics start at zero when this test runs. However, `build_block_caps_attestation_data_entries`, `build_block_absorbs_older_but_justified_source`, and `build_block_cascades_projected_justification_across_rounds` all call `build_block`, which now increments `LEAN_BLOCK_PROPOSAL_ATTESTATION_BUILDS_TOTAL`, `LEAN_BLOCK_PROPOSAL_CHILD_PAYLOADS_CONSUMED_TOTAL`, and adds phase-histogram observations. Because `cargo test --lib` runs tests in parallel by default, any of those three tests racing ahead will leave the counters above zero, causing the assertions here to fail non-deterministically. A safer approach is to read the current value before the observation and assert a delta of exactly one, or restrict the assertions to relative `>=` comparisons.

Reviews (1): Last reviewed commit: "Add lean_block_proposal_* metrics for bl..." | Re-trigger Greptile

Comment thread crates/blockchain/src/metrics.rs Outdated
…tation

Address review feedback on PR #414:

- Move the inc_block_proposal_attestation_builds() call into the select
  attestations fixed-point loop so it counts each attestation selected (one
  per round that picks an AttestationData), rather than once per build_block.
  Update the counter help text and docs/metrics.md accordingly.
- Remove the metrics unit test: its absolute-value assertions on process-global
  Prometheus statics race against the parallel build_block_* tests that now
  touch the same counters. The build_block_* tests already exercise the metric
  call paths.
- Drop the redundant child_payloads_consumed comment.
Copy link
Copy Markdown
Collaborator

@MegaRedHand MegaRedHand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MegaRedHand MegaRedHand merged commit 85115d7 into main Jun 3, 2026
4 of 5 checks passed
@MegaRedHand MegaRedHand deleted the metrics-block-proposal-attestation-build branch June 3, 2026 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants