add mobench support to ProveKit v1 #430

Open
dcbuild3r wants to merge 1474 commits into v1 from dcbuild3r/mobench-v1-browserstack

@dcbuild3r (Collaborator) commented May 2, 2026

Summary

  • add v1 mobench BrowserStack wiring for passport age-check and OPRF prove benchmarks
  • run the same full Android/iOS triad matrix as PR #429 (add mobench support to ProveKit main)
  • fix Android prove timing so fixture clone/setup is outside the measured prove span
  • use jemalloc as the Android fallback allocator under the native C FFI allocator
  • harden Android incomplete-fixture reporting with timeout/build/kill-evidence fields

Android mobench fix note

The earlier failing run was https://github.com/worldfnd/provekit/actions/runs/26002840796. For the missing Vivo Y21 monolithic cell, neither a BrowserStack session payload nor summary.json was recovered; the available artifacts show only the BrowserStack fetch timeout after 7200s for build b1ebace919a1b8cad3861313180ac9a18a7e461d. I grepped the recovered Android artifacts and job log for lowmemorykiller, Process * was killed, oom_reaper, SIGKILL, and abnormal signal text; there were no hits, because BrowserStack did not return the killed session logs for that cell.
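For reference, the grep described above can be approximated in Rust. This is a minimal sketch and not the code in this PR: the function names `kill_evidence` and `is_process_killed` are hypothetical, matching is case-insensitive by assumption, and the "abnormal signal text" pattern is left out because its exact wording is not given here.

```rust
/// Returns every log line that looks like LMK/OOM/SIGKILL evidence.
/// The patterns mirror the ones named above; case-insensitive matching
/// is an assumption of this sketch.
fn kill_evidence(log: &str) -> Vec<&str> {
    log.lines()
        .filter(|line| {
            let lower = line.to_ascii_lowercase();
            lower.contains("lowmemorykiller")
                || lower.contains("oom_reaper")
                || lower.contains("sigkill")
                || is_process_killed(&lower)
        })
        .collect()
}

/// Matches the glob-style pattern "Process * was killed" on a lowercased line.
fn is_process_killed(lower: &str) -> bool {
    match lower.find("process ") {
        Some(i) => lower[i + "process ".len()..].contains(" was killed"),
        None => false,
    }
}
```

An empty result only means that no matching line was present in the logs that were actually recovered, which is the situation described for the Vivo Y21 monolithic cell.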

Memory and timing from that failing run (before the fix):

  • Vivo Y21 OPRF: 291 MB process peak, 9.614s
  • Vivo Y21 fragmented age check: 1206 MB process peak, 44.896s
  • Vivo Y21 monolithic age check: no row; BrowserStack timeout/no summary
  • S24 OPRF: 365 MB process peak, 2.216s

What changed:

  • Android and iOS now use the same semantic boundary for prove benches: per-iteration fixture setup happens before the measured closure, and profile_phase("prove") wraps only the prover entry point. Previously prepared.clone() ran inside profile_phase("prove"), so clone cost and peak memory were charged to proving.
  • Android native C FFI fallback allocation now routes through jemalloc instead of Bionic malloc when no host callback/mmap allocator is active.
  • If BrowserStack returns no summary, failure.json now records attempts, fetch timeout seconds, build id, and any LMK/OOM/SIGKILL lines recovered from attempt/device logs.
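The timing boundary in the first bullet can be illustrated as follows. This is a sketch under stated assumptions, not the bench-mobile code: `Fixture`, `prove`, `measure`, and `bench_prove` are hypothetical stand-ins, and `measure` only plays the role that profile_phase("prove") plays in the real harness.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a prepared prover fixture.
#[derive(Clone)]
struct Fixture {
    witness: Vec<u64>,
}

/// Hypothetical stand-in for the prover entry point; it consumes the fixture.
fn prove(fixture: Fixture) -> u64 {
    fixture.witness.iter().fold(0u64, |acc, &w| acc.wrapping_add(w))
}

/// Times only the closure it is given (illustrative analogue of a profiled phase).
fn measure<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

fn bench_prove(prepared: &Fixture, iterations: usize) -> Vec<Duration> {
    (0..iterations)
        .map(|_| {
            // Per-iteration setup runs BEFORE the measured closure, so the
            // clone's time and allocation are not charged to proving.
            let fixture = prepared.clone();
            let (proof, elapsed) = measure(move || prove(fixture));
            // Keep the result alive so the call is not optimized away.
            std::hint::black_box(proof);
            elapsed
        })
        .collect()
}
```

Moving `prepared.clone()` inside the closure passed to `measure` would reproduce the earlier behavior, where the clone counted toward the prove span.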

After numbers: pending the fresh BrowserStack rerun on this commit.

Validation

  • cargo fmt --all
  • cargo test -p bench-mobile --lib
  • cargo test -p bench-mobile --test examples_smoke
  • cargo test -p bench-mobile --test passport_smoke
  • cargo check -p provekit-ffi --target aarch64-linux-android with NDK 26.1 aarch64-linux-android34-clang
  • ruby -e 'require "yaml"; YAML.load_file(".github/workflows/mobile-bench-reusable.yml")'
  • git diff --check

Bisht13 and others added 30 commits February 20, 2026 18:28
Unify the duplicated PrefixCovector struct and six shared functions
(expand_powers, make_public_weight, build_prefix_covectors,
compute_alpha_evals, compute_public_eval) from prover and verifier
into a single implementation in provekit-common.

The prover's vestigial 'deferred: bool' field (always false) is dropped.
Also removes empty test modules from common/lib.rs.
…e helper

Replace ~250 lines of near-identical AND/XOR match arms with a single
process_binop_opcode method that handles all four operand combinations
(witness/witness, constant/witness, witness/constant, constant/constant)
and dispatches to the correct ops vector.
Verifier::verify() used self.whir_for_witness.take().unwrap() which
would panic if called twice. Replace with .take().context() for a
descriptive error message instead of a bare panic.
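A minimal sketch of the take-once pattern this commit describes. The commit uses `.context()` (from anyhow); to stay dependency-free the sketch swaps in the standard library's `ok_or_else`, and the `String` payload and error type are placeholders rather than the real ProveKit types.

```rust
/// Hypothetical, simplified verifier holding a value that can be used once.
struct Verifier {
    whir_for_witness: Option<String>,
}

impl Verifier {
    fn verify(&mut self) -> Result<String, String> {
        // take() leaves None behind, so a second call returns an error
        // with a descriptive message instead of panicking on unwrap().
        let whir = self
            .whir_for_witness
            .take()
            .ok_or_else(|| "verify() called twice: whir_for_witness already consumed".to_string())?;
        Ok(whir)
    }
}
```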
…notations

Use destructuring in from_noir_proof_scheme for clearer construction.
Add #[must_use] to public constructors and accessors (PublicInputs::new,
from_vec, len, is_empty, hash, ConstantOrR1CSWitness::to_tuple,
Prover/Verifier::from_noir_proof_scheme, size).
…itch to base64

- Parallelize right-multiply (A * witness) over rows via into_par_iter.
  Left-multiply intentionally stays sequential (38 MB per accumulator
  makes fold-reduce prohibitive; callers already parallelize via rayon::join).
- Avoid redundant allocations in calculate_witness_bounds: compute C
  element-wise and resize a/b/c in-place instead of pad_to_power_of_two.
- Switch human-readable serde encoding from hex (100% overhead) to base64
  (33% overhead), cutting proof file size ~25%. Deserializer auto-detects
  hex for backwards compatibility.
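The overhead figures in the last bullet follow from the encodings themselves: hex emits two characters per byte, while padded base64 emits four characters per three bytes. A small sketch with hypothetical helper names:

```rust
/// Encoded length of `n` raw bytes as hex: two characters per byte.
fn hex_len(n: usize) -> usize {
    2 * n
}

/// Encoded length of `n` raw bytes as padded base64: four characters
/// per (possibly partial) three-byte group.
fn base64_len(n: usize) -> usize {
    4 * ((n + 2) / 3)
}
```

For 3000 raw bytes this gives 6000 hex characters (100% overhead) against 4000 base64 characters (about 33% overhead). The ~25% whole-file saving is the commit's own measurement; the sketch covers only the encoded byte strings.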
…efault

- Remove empty #[cfg(test)] mod tests {} from prover, r1cs-compiler
- Remove commented-out 'pub mod file_io' from utils
- Fix verify.rs docstring ('Prove' -> 'Verify')
- Collapse split use-path in common/whir_r1cs.rs
- Align struct field formatting in r1cs-compiler/whir_r1cs.rs
- Disable jemalloc as default feature in CLI (opt-in instead)
- Switch whir dependency from local path (../../whir) to pinned git
  revision for reproducible builds.
- Enable profiling-allocator as default CLI feature; make jemalloc
  depend on profiling-allocator.
… matrix ops

Remove r1cs.clone() and alpha.clone() by taking references, add
SparseMatrix::transpose() for parallel right-multiply, parallelize
verifier key and proof file reads with rayon::join, truncate eq_alpha
allocation to actual entry count, and take owned NoirProof to avoid
proof clone. End-to-end verify drops from ~2s to ~510ms on
complete_age_check.
…te clones

Consume objects as soon as they finish their job to reduce peak memory:

- PrefixCovector: store only short alpha-weight prefix, zero-pad via
  logical_size (~192 MB savings vs full-domain Covector)
- CompressedR1CS: serialize R1CS during commits, decompress at sumcheck
  (~61 MB savings during commit phase)
- CompressedLayers: serialize w2_layers during commit_w1, decompress
  before solve_w2 (~271 MB savings during commit_w1)
- Remove padded_witness from WhirR1CSCommitment, pass full_witness as
  parameter to prove() instead of cloning+storing
- Shrink get_public_weights to tiny prefix (~64 MB savings)
- Take ownership of alphas in create_weights to avoid copy+coexistence
- Early drops: acir_map after solve_w2, witness consumed via into_iter
- Extract solve_witness_vec as free function so layers drop on return
- commit() takes (num_witnesses, num_constraints) instead of &R1CS
- Work with raw Vec<FieldElement> instead of CoefficientList/EvaluationsList
- Drop program and witness_generator after witness generation

Global peak reduced from ~1.8 GB to ~1.22 GB for complete_age_check.
Update whir dependency from PR 217 (ec295ced) to PR 225 (d67518d1)
which introduces an ownership-based prove() API with Cow parameters.

Adapt provekit to the new API: wrap vectors/witnesses/evaluations in
Cow::Owned, implement the new as_any() LinearForm requirement, and
replace the removed Domain type in gnark_config with
GeneralEvaluationDomain.

Fix a pre-existing bug in PrefixCovector::mle_evaluate that used the
wrong variable ordering for whir's big-endian MLE convention. The
leading point variables select upper/lower array halves, so the
(1-p) zero-padding factor must apply to point[..r] (head) with the
prefix MLE evaluated at point[r..] (tail), not the reverse.
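The variable ordering described in this fix can be checked on a toy example. The sketch below is illustrative only: it uses a small prime field over `u64` instead of the real field type, and `mle_evaluate` / `prefix_mle_evaluate` are hypothetical free functions, not the PrefixCovector API. It assumes the big-endian convention stated above, where the leading point variable selects the lower or upper half of the array.

```rust
/// Toy prime modulus (2^31 - 1); products of two reduced values fit in u64.
const P: u64 = 2_147_483_647;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// Big-endian MLE: each point variable, in order, folds the lower and
/// upper halves of the current layer together.
fn mle_evaluate(values: &[u64], point: &[u64]) -> u64 {
    assert_eq!(values.len(), 1usize << point.len());
    let mut layer = values.to_vec();
    for &p in point {
        let half = layer.len() / 2;
        let next: Vec<u64> = (0..half)
            .map(|i| add(mul(sub(1, p), layer[i]), mul(p, layer[i + half])))
            .collect();
        layer = next;
    }
    layer[0]
}

/// MLE of `prefix` zero-padded to 2^point.len() entries, without
/// materializing the padding: the (1 - p) factors come from the head
/// point[..r], and the prefix MLE is evaluated at the tail point[r..].
fn prefix_mle_evaluate(prefix: &[u64], point: &[u64]) -> u64 {
    assert!(prefix.len().is_power_of_two());
    let k = prefix.len().trailing_zeros() as usize;
    assert!(k <= point.len());
    let r = point.len() - k;
    let pad = point[..r].iter().fold(1u64, |acc, &p| mul(acc, sub(1, p)));
    mul(pad, mle_evaluate(prefix, &point[r..]))
}
```

Evaluating the explicitly zero-padded vector with `mle_evaluate` and the short prefix with `prefix_mle_evaluate` at the same point should agree, because every upper half folded by a head variable is all zeros.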
…cross-evals, PrefixCovector tests

- Remove R1CSSolver trait, convert test_witness_satisfaction to free fn (#3)
- Make w2_layers compression conditional on has_challenges (#4)
- Parallelize cross-evaluation dot products with rayon::join (~12% prove speedup) (#6)
- Add debug_assert in PrefixCovector::accumulate (#11)
- Add PrefixCovector unit tests: size, mle_evaluate, accumulate, prefix=logical (#2)
- Fix clippy: &mut Vec -> &mut [] in solve_witness_vec
feat: port provekit to zkWHIR 2.0
dcbuild3r added 28 commits May 17, 2026 12:04
Labels

bench Run mobile benchmarks on PRs

7 participants