refactor(stark): unify & clean up the commitment layer#735
Conversation
Unifies the two near-identical bit-reversed leaf hashers (per-row + per-row-pair) into one keccak_leaves_bit_reversed_grouped(columns, rows_per_leaf), and the two near-identical commit fns (commit_columns_bit_reversed + commit_composition_polynomial) into commit_bit_reversed(columns, rows_per_leaf), in a new crypto/stark/src/commitment.rs. Removes them from the IsStarkProver trait (they used no self). prover.rs re-exports the named leaf hashers for the math-cuda GPU parity test. Byte-identical (stark 128/128).
Every FRI query opens a value and its symmetric counterpart (LDE positions 2*iota, 2*iota+1). The composition-poly commitment already grouped that pair into one leaf; the trace (main/aux/precomputed) committed one row per leaf and opened TWO leaves (proof + proof_sym) per query. Commit the trace with rows_per_leaf=2 too, so each query opens ONE leaf/path; drop the now-redundant proof_sym from PolynomialOpenings (also shrinks the composition opening, which stored the path twice). verify_opening_pair now reconstructs the paired leaf and verifies once (mirrors verify_composition_poly_opening); the dead verify_opening is removed. Halves trace Merkle authentication-path data per query (smaller proofs + less verifier hashing). stark 128/128 (prove+verify). NOTE: proof FORMAT change (not byte-identical). cuda follow-up: the GPU trace leaf+tree builders (gpu_lde::try_expand_leaf_and_tree_batched_keep/_ext3_keep + math-cuda kernels) still build 1-row leaves and must switch to the row-pair pattern (the GPU composition builder already pairs) or cuda proofs will fail verification.
…ropagation, doc/log fixes
The trace commitment now uses the row-pair leaf layout (ROWS_PER_LEAF=2), same as composition; rows_per_leaf=1 is only kept for the GPU parity test. Update the module/const/wrapper docs that still described the pre-pairing per-row trace.
Extract two helpers on IsStarkProver, collapsing the near-duplicate main-trace and aux-trace commit code: - spill_tree<C>: the identical-except-label disk-spill block, shared by the main / preprocessed-split / aux commit sites (4 call sites). - commit_plain<C>: commit_bit_reversed + spill_tree + TableCommit::plain, shared by the main-trace (non-preprocessed None arm) and aux-trace plain-commit paths. Proof output is byte-identical (stark 128/128, +disk-spill 133/133). The only behavioral delta is in the instruments profiling feature: the aux commit's timing bucket now includes its (tiny) disk spill, matching what the main path already measured. clippy clean on default / disk-spill / instruments; builds on the combined feature set.
The trace Merkle commitment moved to a row-pair leaf layout (ROWS_PER_LEAF=2), but the 5 preprocessed-table commitment computers (bitwise/keccak_rc/page/decode/register) still built a 1-row-per-leaf tree manually, so the prover's row-pair precomputed root no longer matched the computed/hardcoded one -> PrecomputedCommitmentMismatch at prove time. - Route all 5 compute_*commitment fns through the shared stark::commitment::commit_bit_reversed(.., ROWS_PER_LEAF), dropping the manual bit-reverse + columns2rows + 1-row BatchedMerkleTree::build. - Regenerate the hardcoded bitwise/keccak_rc/zero_page static commitments for the row-pair layout (via compute_static_commitments). - cargo fmt (prover.rs + touched tables). Verified: static_commitments drift tests 5/5, stark 128/128, clippy + fmt clean, no PrecomputedCommitmentMismatch. Full ELF prove/verify runs in CI (guest artifacts absent locally).
…layout The compile-time decode-commitment const for sub.elf shifted with the row-pair preprocessed commitment; regenerated via commitment_from_elf (the print_decode_commitment_for_sub regen path).
…air value Regenerated from CI's sub.elf (local riscv toolchain unavailable); removed the temporary print instrumentation.
… perf/commitment-row-pair # Conflicts: # crypto/stark/src/prover.rs
… GPU column-LDE dead_code The R2 parts-LDE / comp-poly-tree GPU dispatches no longer fire since #699/#700 route degree-3 tables through the 2-part fused coset_lde_full path (no AIR has number_of_parts > 2). try_expand_columns_batched* are debug-checks-only after the #650 row-major LDE became production.
|
/bench |
Benchmark — ethrex 20 transfers (median of 3)Table parallelism: auto (cores / 3)
Commit: 1a166e0 · Baseline: cached · Runner: self-hosted bench |
|
/bench-gpu |
GPU Benchmark (ABBA) —
|
|
/bench-gpu 5 |
Follow-up to the row-pair commitment PR. Excludes the intentional proof-format break (proof_sym removal / leaf-count change), which is by design. Test coverage (in their own files under crypto/stark/src/tests/, per the crate's test structure): - tests/commitment_tests.rs: direct unit tests pinning the row-pair leaf layout (R=1 and R=2) against an independent reference, wrapper agreement, commit-root consistency, empty-input short-circuit, and a release-mode divisibility-assert test. Previously the leaf layout was only covered transitively via full prove->verify, and GPU parity tests compared against an inline reimpl rather than this module. - tests/row_pair_opening_tests.rs: two negative tests for the row-pair verify_opening_pair — a tampered symmetric trace evaluation and a corrupted Merkle authentication path must both be rejected. Removing proof_sym deleted the old "symmetric opening mismatch" rejection class; these restore it (an impl ignoring evaluations_sym / the auth path would otherwise pass every existing test). Reuses the now pub(crate) make_valid_simple_proof helper. - cuda_path_integration.rs: restore assert!(gpu_comp_poly_tree_calls() > 0). try_build_comp_poly_tree_gpu is dispatched unconditionally (round 2, after the parts-count branch), so it fires for the common number_of_parts == 2 (degree-3) case — it was NOT obsolete. Keep the genuinely-dead parts-LDE assertion dropped. Cleanups: - Delete dead fn commit_plain (zero callers; main/aux commit inline commit_rows_bit_reversed + spill_tree, which are row-major and incompatible with its column-major signature — the dedup never landed). - Delete orphaned pub fn columns2rows (all callers removed by #735). - Promote the commitment.rs precondition checks from debug_assert! to hard assert! — they run once per commit (negligible) and the release failure mode on this pub API is a silently wrong Merkle root. - Add ProvingError::Fft and map FFTError to it instead of WrongParameter (internal FFT failure is not a caller-supplied-parameter error). - Fix stale profiler label commit_composition_poly -> commit_bit_reversed. - Drop the rot-prone "// = 2" comment on the local ROWS_PER_LEAF alias.
Follow-up to the row-pair commitment PR. Excludes the intentional proof-format break (proof_sym removal / leaf-count change), which is by design. Test coverage (in their own files under crypto/stark/src/tests/, per the crate's test structure): - tests/commitment_tests.rs: direct unit tests pinning the row-pair leaf layout (R=1 and R=2) against an independent reference, wrapper agreement, commit-root consistency, and empty-input short-circuit. Previously the leaf layout was only covered transitively via full prove->verify, and GPU parity tests compared against an inline reimpl rather than this module. - tests/row_pair_opening_tests.rs: two negative tests for the row-pair verify_opening_pair — a tampered symmetric trace evaluation and a corrupted Merkle authentication path must both be rejected. Removing proof_sym deleted the old "symmetric opening mismatch" rejection class; these restore it (an impl ignoring evaluations_sym / the auth path would otherwise pass every existing test). Reuses the now pub(crate) make_valid_simple_proof helper. - cuda_path_integration.rs: restore assert!(gpu_comp_poly_tree_calls() > 0). try_build_comp_poly_tree_gpu is dispatched unconditionally (round 2, after the parts-count branch), so it fires for the common number_of_parts == 2 (degree-3) case — it was NOT obsolete. Keep the genuinely-dead parts-LDE assertion dropped. Cleanups: - Delete dead fn commit_plain (zero callers; main/aux commit inline commit_rows_bit_reversed + spill_tree, which are row-major and incompatible with its column-major signature — the dedup never landed). - Delete orphaned pub fn columns2rows (all callers removed by #735). - Add ProvingError::Fft and map FFTError to it instead of WrongParameter (internal FFT failure is not a caller-supplied-parameter error). - Fix stale profiler label commit_composition_poly -> commit_bit_reversed. - Drop the rot-prone "// = 2" comment on the local ROWS_PER_LEAF alias.
…#740) Follow-up to the row-pair commitment PR. Excludes the intentional proof-format break (proof_sym removal / leaf-count change), which is by design. Test coverage (in their own files under crypto/stark/src/tests/, per the crate's test structure): - tests/commitment_tests.rs: direct unit tests pinning the row-pair leaf layout (R=1 and R=2) against an independent reference, wrapper agreement, commit-root consistency, and empty-input short-circuit. Previously the leaf layout was only covered transitively via full prove->verify, and GPU parity tests compared against an inline reimpl rather than this module. - tests/row_pair_opening_tests.rs: two negative tests for the row-pair verify_opening_pair — a tampered symmetric trace evaluation and a corrupted Merkle authentication path must both be rejected. Removing proof_sym deleted the old "symmetric opening mismatch" rejection class; these restore it (an impl ignoring evaluations_sym / the auth path would otherwise pass every existing test). Reuses the now pub(crate) make_valid_simple_proof helper. - cuda_path_integration.rs: restore assert!(gpu_comp_poly_tree_calls() > 0). try_build_comp_poly_tree_gpu is dispatched unconditionally (round 2, after the parts-count branch), so it fires for the common number_of_parts == 2 (degree-3) case — it was NOT obsolete. Keep the genuinely-dead parts-LDE assertion dropped. Cleanups: - Delete dead fn commit_plain (zero callers; main/aux commit inline commit_rows_bit_reversed + spill_tree, which are row-major and incompatible with its column-major signature — the dedup never landed). - Delete orphaned pub fn columns2rows (all callers removed by #735). - Add ProvingError::Fft and map FFTError to it instead of WrongParameter (internal FFT failure is not a caller-supplied-parameter error). - Fix stale profiler label commit_composition_poly -> commit_bit_reversed. - Drop the rot-prone "// = 2" comment on the local ROWS_PER_LEAF alias.
#744) Two items that landed too late for #740: - Move the shared make_valid_simple_proof helper out of small_trace_tests.rs into tests/trace_test_helpers.rs, where the crate keeps shared test helpers (matching how prover_tests sources get_trace_evaluations). row_pair_opening_tests.rs and small_trace_tests.rs now both import it from there instead of one test file reaching sideways into another. - Delete AGENTS.md (added by #735).
Resolves the conflict between #715 ("perf/row-major trace LDE for GPU path") and this PR's row-pair commitment. The two reworked the same GPU trace-LDE/commit functions for orthogonal goals: #715 made the LDE consume a row-major input buffer (memory-layout win, kept a per-row commitment); this PR groups two bit-reversed rows per Merkle leaf (one path per query, proof_sym removed). Resolution keeps BOTH: #715's row-major LDE input, now emitting a row-pair tree. CPU side (verifier, proof format, commitment.rs) merged cleanly to the row-pair form — it is the fixed target the GPU must match. GPU side: - New CUDA kernel `keccak256_leaves_base_row_major_row_pair` (+ launcher `launch_keccak_base_row_major_row_pair`, device registration): the row-major analog of `keccak256_leaves_base_row_pair_batched`. Each leaf hashes two consecutive bit-reversed rows of `m` u64 lanes (base m cols; ext3 m*3, since components c0,c1,c2 are consecutive). Byte layout matches the CPU `commit_bit_reversed(.., 2)` and `verify_opening_pair`. - `coset_lde_row_major_inner` now builds `lde_size/2` row-pair leaves via the new launcher (was per-row). This is the only producer for the GPU base + ext3 trace commit (the two `_keep` wrappers). - gpu_lde.rs trace fast paths take #715's row-major dispatch unchanged (row-pair-ness is internal to the math-cuda function above). Deleted as now-unused (superseded): - The per-row row-major path: kernel `keccak256_leaves_base_row_major`, its launcher, and device field (replaced by the row-pair version). - `alloc_merkle_nodes` and the column-based `_keep` wrappers in math-cuda lde.rs (this PR's earlier GPU path; #715's row-major path replaces them). - Dead `columns_to_row_major` in prover.rs (round 1 reads the trace row-major directly via `main_data_row_major`). Fixed main-side tests for this PR's API changes: - prover_tests: `commit_rows_bit_reversed_matches_commit_bit_reversed` now compares the row-major commit against `commit_bit_reversed(.., 2)` (commit_columns_bit_reversed was removed). Validates row-major == column-major row-pair roots on CPU. - merkle_root_parity: pass `rows_per_leaf = 1` to keccak_leaves_base/ext3 (generic per-row primitive parity test). Verified locally: cargo build/check (default + `--features cuda`, incl. tests), `cargo test -p stark` (137 pass), `make lint` (all combos). The new CUDA kernel's byte-level correctness is NOT runtime-tested here (no GPU); the `cuda_path_integration` + math-cuda GPU tests must be run on a GPU host to confirm GPU↔CPU↔verifier root parity.
|
/bench |
|
/bench-gpu |
|
/ai-review |
|
/bench-gpu 6 |
Codex Code ReviewNo findings. I reviewed the PR diff statically, focusing on the row-pair commitment change, verifier opening semantics, CUDA leaf/tree sizing, and changed preprocessed commitment paths. I did not run builds or tests per the review constraints. |
|
/benchmark |
AI ReviewPR #735 · 30 changed files
Findings
Status column reflects the verdict from the verifier: deepseek-verifier (openrouter/deepseek/deepseek-v4-pro). AI-001: Breaking change: PolynomialOpenings.proof_sym field removed (wire-format breaking)
Claim The Evidence The struct changed from Suggested fix Document the wire-format break clearly in release notes / migration guide. Consider a serde-compat shim or versioned envelope if backward-compat is required; otherwise bump a major version / add AI-002: Breaking change: IsStarkProver::commit_columns_bit_reversed removed
Claim
Evidence The diff deletes the default trait method body (prover.rs diff lines ~4404-4420). Suggested fix Either keep a deprecated wrapper method that calls AI-003: Static preprocessed commitments updated — verifier must recompile against new bytes
Claim Hard-coded Evidence The diff in Suggested fix Add a CHANGELOG entry explaining that hardcoded preprocessed commitments changed because the leaf layout changed, with instructions to rerun AI-004: GPU row-pair path skipped for preprocessed tables — small proof failures possible
Claim The fused GPU Evidence prover.rs lines 731-732: Suggested fix Add a comment explaining the gating (GPU helper has no subset variant yet) so future readers don't try to enable it without adding the missing split path. Optionally extend the GPU helper to take a column subset. AI-005: Dead code: keccak_leaves_bit_reversed (rows_per_leaf=1) wrapper kept but only used by tests
Claim The Evidence commitment.rs lines 109-121 keep the Suggested fix Mark AI-006: commit_main_trace Vec::with_capacity + extend_from_slice relies on coset_lde_full_expand_row_major resize to zero-pad — works but is fragile
Claim prover.rs allocates Evidence prover.rs lines 776-777: Suggested fix Either explicitly call AI-007: Log level downgraded from info! to log::debug! in prove_rounds_2_to_4
Claim The "Started proof generation..." / "End proof generation" messages in Evidence prover.rs lines 2347 and 2461 now use Suggested fix Restore AI-008: Newly-added helper alloc_merkle_nodes / columns2rows removals leave no dead references, but verify no external use remains
Claim
Evidence Removed per the diff (trace.rs delete block, gpu_lde.rs delete block). All in-tree callers I could grep are gone. No external crate is in scope to check, but this is a public-API removal worth flagging. Suggested fix Same migration note as the other breaking changes. Optionally re-export a shim that emits a deprecation warning for one release. AI-009: Step_3 FRI in verifier uses BatchedMerkleTreeBackend (Vector backend) but FRI commit uses FriLayerMerkleTreeBackend (Pair backend)
Claim The FRI verifier uses Evidence verifier.rs line 438: Suggested fix Use Reviewer Lanes
Verification Lanes
Native Codex and Claude reviews run separately and post their own comments. They are not included in this structured provenance report. Raw lane outputs, candidates, final issues, and model metrics are uploaded as workflow artifacts. |
No description provided.