Part of #151 .
Problem. Full grid blows budget: ~20 scenarios × ~24 servers × 5 runs × 2 arches ≈ 140h. At default BENCH_DURATION=120s + WARMUP=30s (~155s/cell) + BENCH_RUNS=5, a (scenario×server) pair ≈ 13 min. loadgen is amd64-only → both arches driven serially from the amd64 client (≈ ×2 wall).
Goal. A documented budget model + curated matrix that fits both arches in <24h weekly, with knobs to expand on release weeks.
Scope.
Define a headline matrix (~15 servers × ~12 scenarios) vs a full matrix (everything), selectable via BENCH_CELLS.
Tune per-cell BENCH_DURATION/WARMUP/BENCH_RUNS for statistical validity within budget (e.g. 60s/15s/3 weekly; longer for the saturation headline).
Decide which scenarios get the expensive rated/SLO sweep (P1: Drive rated-mode + LatencyAtSLO sweep; make the regression gate live #156 ) — curated subset only.
Budget calculator (cells × duration × runs × arches → wall-clock); CI assert chosen config <24h.
Log what's trimmed (no silent truncation).
Acceptance. A documented, CI-asserted config completing both arches <24h; a --full profile for occasional exhaustive runs.
Refs: mage_bench.go (BENCH_* knobs), mage_cluster.go (loadgen amd64-only).
Part of #151.
Problem. Full grid blows budget: ~20 scenarios × ~24 servers × 5 runs × 2 arches ≈ 140h. At default
BENCH_DURATION=120s + WARMUP=30s(~155s/cell) +BENCH_RUNS=5, a (scenario×server) pair ≈ 13 min. loadgen is amd64-only → both arches driven serially from the amd64 client (≈ ×2 wall).Goal. A documented budget model + curated matrix that fits both arches in <24h weekly, with knobs to expand on release weeks.
Scope.
BENCH_CELLS.BENCH_DURATION/WARMUP/BENCH_RUNSfor statistical validity within budget (e.g. 60s/15s/3 weekly; longer for the saturation headline).Acceptance. A documented, CI-asserted config completing both arches <24h; a
--fullprofile for occasional exhaustive runs.Refs:
mage_bench.go(BENCH_* knobs),mage_cluster.go(loadgen amd64-only).