Feat: Batched Execution, improved benchmarks, better gas optimization #71

Open
sudo-owen wants to merge 17 commits into main from claude/batched-from-main


@sudo-owen
Collaborator

No description provided.

sudo-owen and others added 17 commits May 28, 2026 16:17
All per-turn-mutable fields (winner/flags/activeMonIndex/lastExecTs/turnId) are packed into slot 1,
so per-turn BattleData mutations coalesce into one SSTORE (main wrote 2 because turnId lived in slot 0).
turnId narrowed uint64 -> uint16, lastExecuteTimestamp uint48 -> uint40. Engine field access is unchanged
(the compiler resolves the new slot layout). Verified: EngineTest 50/50, InlineEngineGasTest 3/3.
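A minimal Python model of why the packing helps: once turnId and lastExecuteTimestamp share a 256-bit word with the other per-turn fields, one turn's updates produce one new word, i.e. one SSTORE. This is a sketch, not the BattleData definition. Only the uint16 and uint40 widths come from this commit; the widths and order of winner, flags and activeMonIndex are invented placeholders.

```python
# Illustrative model of the slot-1 packing. Only turnId (uint16) and
# lastExecuteTimestamp (uint40) widths come from the commit; the widths and
# order of winner/flags/activeMonIndex are HYPOTHETICAL placeholders.
FIELDS = [
    ("winner", 8),                 # hypothetical width
    ("flags", 8),                  # hypothetical width
    ("activeMonIndex", 16),        # hypothetical width
    ("lastExecuteTimestamp", 40),  # uint48 -> uint40 per the commit
    ("turnId", 16),                # uint64 -> uint16 per the commit
]


def pack(values: dict) -> int:
    """Pack the fields low-bits-first into a single 256-bit word."""
    word, shift = 0, 0
    for name, bits in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << bits):
            raise OverflowError(f"{name}={v} does not fit in uint{bits}")
        word |= v << shift
        shift += bits
    assert shift <= 256, "layout must fit in one storage slot"
    return word


def unpack(word: int) -> dict:
    """Inverse of pack: slice each field back out of the word."""
    out, shift = {}, 0
    for name, bits in FIELDS:
        out[name] = (word >> shift) & ((1 << bits) - 1)
        shift += bits
    return out


def advance_turn(word: int, now: int) -> int:
    """Per-turn mutation: bump turnId and stamp the timestamp. Both fields
    live in the same word, so the result is a single storage write."""
    fields = unpack(word)
    fields["turnId"] += 1
    fields["lastExecuteTimestamp"] = now
    return pack(fields)
```

In this model, narrowing turnId to uint16 caps a battle at 65,535 turns, and advance_turn raises OverflowError past that point.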

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ness

- Engine.executeBatchedTurns: loops _executeInternal with DIRECT storage (no transient shadow);
  the EVM's warm-slot discount already amortizes cold SLOADs across the single tx at no cost. Also adds
  getStorageKey / getSubmitContext; resetCallContext clears the per-turn transients.
- SignedCommitManager: moveBuffer + bufferCounters + submitTurnMoves (SINGLE-SIG: msg.sender ==
  committer; the revealer sig pins the committer move hash) + executeBuffered + pack/unpack helpers.
- Structs.TurnSubmission (no committer sig), IEngine surface updates, test/abstract/BatchHelper (single-sig).
- RealMonReplayGasTest: faithful 26-turn real-game replay reusing SetupMons; asserts
  legacy == batched end state and reports production-faithful steady-state gas.
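A hypothetical Python model of the buffer bookkeeping only (signature checks omitted): submitTurnMoves appends a packed turn and executeBuffered drains the buffer in order within one call, the way executeBatchedTurns loops _executeInternal within one tx. The in-order rule, the 8-bit move packing and the counter semantics are assumptions; the real moveBuffer/bufferCounters layout is not shown in this PR.

```python
# Hypothetical model of the buffered flow. Names echo the commit
# (moveBuffer, bufferCounters, submitTurnMoves, executeBuffered), but the
# data layout and the in-order rule are assumptions.
from dataclasses import dataclass, field


def pack_moves(p0_move: int, p1_move: int) -> int:
    """Pack two 8-bit move indices into one integer (assumed layout)."""
    if not (0 <= p0_move < 256 and 0 <= p1_move < 256):
        raise ValueError("move index out of range")
    return (p1_move << 8) | p0_move


def unpack_moves(packed: int) -> tuple:
    return packed & 0xFF, (packed >> 8) & 0xFF


@dataclass
class CommitManagerModel:
    move_buffer: dict = field(default_factory=dict)      # battle -> {turn_id: packed}
    buffer_counters: dict = field(default_factory=dict)  # battle -> next turn_id

    def submit_turn_moves(self, battle: str, turn_id: int, p0_move: int, p1_move: int) -> None:
        expected = self.buffer_counters.get(battle, 0)
        if turn_id != expected:
            raise ValueError(f"expected turn {expected}, got {turn_id}")
        self.move_buffer.setdefault(battle, {})[turn_id] = pack_moves(p0_move, p1_move)
        self.buffer_counters[battle] = expected + 1

    def execute_buffered(self, battle: str, execute_turn) -> int:
        """Drain buffered turns in order within one call (one tx on-chain).
        The counter is left alone so later turn ids keep increasing."""
        turns = self.move_buffer.pop(battle, {})
        for turn_id in sorted(turns):
            execute_turn(turn_id, *unpack_moves(turns[turn_id]))
        return len(turns)
```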

Result (real 26-turn game, vm.cool steady-state): clean-batched 4,584,625 vs main 5,277,953
= -693,328 (-13.1%); clean-legacy ~= main (no regression). Equivalence verified.
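A rough model of the effect the batched path relies on, using EIP-2929 pricing: the first SLOAD of a slot in a transaction costs 2,100 gas (cold) and later reads in the same tx cost 100 (warm). Under EIP-1153, TLOAD/TSTORE also cost 100, the same as a warm SLOAD, so a transient shadow cannot beat direct warm reads and only adds copy overhead. The per-turn slot sets are invented, and the model ignores SSTOREs and the 21,000-gas base cost per transaction.

```python
# Rough model of EIP-2929 SLOAD pricing: the first read of a slot in a
# transaction is cold (2100 gas), later reads in the same tx are warm (100).
# The per-turn slot sets used with it are invented for illustration.
COLD_SLOAD = 2100
WARM_SLOAD = 100


def sload_gas(turns, batched: bool) -> int:
    """turns: list of per-turn slot-read sequences.
    batched=False -> one tx per turn (warm set resets every turn).
    batched=True  -> all turns in one tx (warm set persists)."""
    total, warm = 0, set()
    for reads in turns:
        if not batched:
            warm = set()  # new transaction, fresh accessed-slot set
        for slot in reads:
            total += WARM_SLOAD if slot in warm else COLD_SLOAD
            warm.add(slot)
    return total
```

With three slots read on each of 26 turns, per-turn txs pay 26 x 3 x 2,100 = 163,800 gas for SLOADs, while one batched tx pays 3 x 2,100 + 75 x 100 = 13,800.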

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
p0 submits both their move and the CPU's move (computed client-side) in one tx; the engine executes them
directly, skipping getCPUContext (a dozen or more cold SLOADs) and calculateMove on every CPU turn. Trust
model: lying can only weaken the CPU against p0 (a PvE self-handicap); the msg.sender == p0 binding is
unchanged. Build + CPU suites pass (BetterCPU 52, FairCPU/OkayCPU).
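A sketch of the trust boundary described above, with hypothetical names (submit_cpu_turn, NotP0, execute). The only on-chain check is the caller binding. The CPU move is taken as supplied rather than recomputed, which is where the skipped getCPUContext/calculateMove work goes.

```python
# Hypothetical model of the one-tx CPU turn: p0 supplies both moves, the
# engine checks only that the caller is p0, and no CPU move calculation
# runs on-chain. Function and error names are illustrative.
class NotP0(Exception):
    pass


def submit_cpu_turn(sender: str, p0: str, p0_move: int, cpu_move: int, execute):
    if sender != p0:  # the msg.sender == p0 binding is unchanged
        raise NotP0(sender)
    # cpu_move was computed client-side and is trusted as-is; a dishonest p0
    # can only weaken the CPU in p0's own PvE game.
    return execute(p0_move, cpu_move)
```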

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ort.py)

Parses a battle desync report (teams + per-turn moveIndex/salt/extraData) into the Solidity
team monIds, per-slot MonStats and a deduplicated Turn[] sequence for a RealMonReplayGasTest-style
faithful replay. Verified: the output reproduces the hand-written 26-turn test data byte-for-byte
from the raw fixture, so any real production game can become a gas + equivalence regression test.
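A hypothetical sketch of the report-to-fixture step, covering only teams and the turn deduplication (MonStats omitted). The real script's input schema and output format are not shown in this PR. The JSON keys, the "duplicates share (turn, player)" rule and the six-field Turn(...) rendering are all assumptions.

```python
# Hypothetical sketch of a desync-report parser. JSON keys, the dedup key
# and the Turn(...) rendering are assumptions, not the real script's format.
import json


def parse_report(text: str):
    report = json.loads(text)
    teams = [list(team) for team in report["teams"]]  # monIds per player
    seen = {}
    for entry in report["turns"]:
        # Assumed: duplicate entries share (turn, player); keep the first.
        key = (entry["turn"], entry["player"])
        seen.setdefault(key, (entry["moveIndex"], entry["salt"], entry["extraData"]))
    turns = []
    for turn in sorted({t for t, _ in seen}):
        # Assumed: both players have an entry for every turn.
        turns.append((turn, seen[(turn, 0)], seen[(turn, 1)]))
    return teams, turns


def to_solidity(turns) -> list:
    """Render each turn as a hypothetical Turn(...) literal for a replay test."""
    return [
        f"Turn({p0[0]}, {p0[1]}, {p0[2]}, {p1[0]}, {p1[1]}, {p1[2]})"
        for _, p0, p1 in turns
    ]
```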

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Batching reduces gas ~13% vs main on a real 26-turn game (equivalence-verified); the prior branch's
transient shadow was counterproductive (the EVM already amortizes warm-slot reads for free). Documents the
methodology and the remaining A2 follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…=committer)

Drop the committer signature; bind the committer via msg.sender plus the revealer sig (which pins
(battleKey, turnId, committerMoveHash)). Saves ~3.6k/turn on the legacy fallback (clean-legacy
5,296,078 -> 5,201,946, i.e. 94,132 over the 26 turns). Rewrote the ~15 dual-sig test sites + 4 security
tests to the single-sig model (unilateral revealer -> NotCommitter; third-party relay -> NotCommitter;
committer move changed -> InvalidSignature; replay still prevented by the turnId binding). All 506 tests pass.
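A Python model of the single-sig check that makes the security cases above concrete. It is a sketch with stand-ins: HMAC-SHA256 plays the role of the revealer's ECDSA signature (on-chain the contract would recover the signer rather than hold a key), SHA-256 replaces keccak256, and the 8-byte big-endian turnId encoding is assumed.

```python
# Model of the single-sig binding. HMAC-SHA256 stands in for the revealer's
# ECDSA signature and SHA-256 for keccak256; both are illustrative
# substitutions. Error names follow the tests described above.
import hashlib
import hmac


class NotCommitter(Exception):
    pass


class InvalidSignature(Exception):
    pass


def move_hash(move: bytes) -> bytes:
    return hashlib.sha256(move).digest()


def revealer_sign(key: bytes, battle_key: bytes, turn_id: int, committer_move_hash: bytes) -> bytes:
    msg = battle_key + turn_id.to_bytes(8, "big") + committer_move_hash  # assumed encoding
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify_submission(sender, committer, key, battle_key, turn_id, committer_move, sig) -> None:
    # 1. The committer is authenticated by msg.sender; no committer signature.
    if sender != committer:
        raise NotCommitter(sender)
    # 2. The revealer's signature pins (battleKey, turnId, committerMoveHash),
    #    so a changed committer move or a replay at another turnId fails.
    expected = revealer_sign(key, battle_key, turn_id, move_hash(committer_move))
    if not hmac.compare_digest(expected, sig):
        raise InvalidSignature()
```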

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…asure helper

- Extract FullyOptimizedInlineGasTest from InlineEngineGasTest.sol into its own file
  (one contract per file, matching the already-separate snapshot JSONs). Behavior-preserving
  (both suites pass, snapshots unchanged).
- test/abstract/GasMeasure.sol: shared production-faithful gas measurement combining per-tx cold
  accounting (vm.cool) with a deterministic storage-access tally (cold/warm SLOAD, SSTORE tiers);
  _snapScenario() records the tally and cold-per-tx gas. This is the basis for moving the gas tests
  off the all-warm gasleft span (which masks cold-access regressions).
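A hypothetical Python tally over a trace of (op, slot, value) events, in the spirit of GasMeasure's cold/warm SLOAD and SSTORE-tier counts. The trace format and bucket names are assumptions, and SSTOREs are classified only by the slot's current value versus the new value; EIP-2200's original-value ("dirty slot") pricing is deliberately ignored. Each call models one transaction window, so the warm set starts empty, much as vm.cool resets warmth between txs.

```python
# Hypothetical storage-access tally for one tx window. Simplification:
# SSTOREs are classified by current vs new value only (no EIP-2200
# original-value / dirty-slot handling).
from collections import Counter


def tally(trace, initial_storage=None) -> Counter:
    storage = dict(initial_storage or {})  # slot -> value at window start
    warm = set()                           # slots touched in this window
    counts = Counter()
    for op, slot, value in trace:
        if op == "sload":
            counts["sloadWarm" if slot in warm else "sloadCold"] += 1
        elif op == "sstore":
            current = storage.get(slot, 0)
            if value == current:
                counts["noop"] += 1
            elif current == 0:
                counts["zToNz"] += 1
            elif value == 0:
                counts["nzToZ"] += 1
            else:
                counts["nzToNz"] += 1
            storage[slot] = value
        else:
            raise ValueError(f"unknown op {op!r}")
        warm.add(slot)
    return counts
```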

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e subset InlineEngineGasTest

- FullyOptimizedInlineGasTest (the production stack: inline validation/RNG/stamina + SignedMatchmaker
  + single-sig dual-signed flow) now measures each battle turn as a cold-start tx (vm.cool) with a
  deterministic storage-access tally (cold/warm SLOAD, z->nz/nz->nz/no-op SSTORE) + cold-per-tx gas.
  Cooling and tallying are gated inside _fastTurn/_fastSwitchReveal by _measuring, so battle spans just
  wrap with _beginMeasure()/_endMeasure() (no per-turn churn). Setup spans are dropped (their gasleft is
  polluted by the tally's cumulative memory); storage reuse is now asserted via Battle3 cold gas < Battle1
  cold gas (Battle3 replays Battle1 but reuses storage: zToNz 30 -> 4).
- Removed InlineEngineGasTest (+ snapshot): its optimizations are a strict subset of FullyOptimized's
  (inline validation only, on the slower commit-reveal flow).
- _tally sizes its dedup scratch to the actual per-window access count (it was a fixed 8192-entry array,
  which ran out of gas when called once per turn across a battle).
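One plausible way to see why a fixed 8192-entry scratch array becomes expensive when _tally runs once per turn: EVM memory expansion costs C_mem(a) = 3a + floor(a^2 / 512) for a 32-byte words, and Solidity never frees memory within a call frame, so each fresh allocation raises the high-water mark and the quadratic term compounds. The sketch assumes one word per entry (ignoring the length word), and the call count of 26 is only an example.

```python
# EVM memory-expansion cost: C_mem(a) = 3*a + a*a // 512, with a in 32-byte
# words. Expanding from a_old to a_new words costs C_mem(a_new) - C_mem(a_old).
# Assumes each array entry is one word and memory is never reclaimed.
def mem_cost(words: int) -> int:
    return 3 * words + words * words // 512


def cumulative_alloc_cost(words_per_call: int, calls: int) -> int:
    """Expansion gas if one frame allocates `words_per_call` fresh words
    `calls` times (ignores the cost of writing the entries themselves)."""
    total, size = 0, 0
    for _ in range(calls):
        new_size = size + words_per_call
        total += mem_cost(new_size) - mem_cost(size)
        size = new_size
    return total
```

A single 8192-word array costs 155,648 gas to expand into, but 26 such allocations in the same frame cost 89,243,648, far more than 26 x 155,648. Sizing the scratch to the real access count keeps the high-water mark small.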

All 503 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uction stack)

EngineGasTest benchmarked the external-validator + commit-reveal config, which isn't what we ship
(production uses inline validation + dual-signed). FullyOptimizedInlineGasTest (new GasMeasure
format) is the production-faithful gas tracker. All 498 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…regen config

The replay was measuring DefaultRuleset(new StaminaRegen()), the SLOW external stamina-regen
path. Production uses INLINE_STAMINA_REGEN_RULESET (engine-internal regen), which avoids the
per-round-end/after-move reentrant calls (getPlayerSwitchForTurnFlag, getMoveDecisionForBattleState,
stamina getMonStateForBattle). Switching the replay to the prod config:

  clean-legacy : 5,201,946 -> 4,624,316  (inline saves ~577k, ~11%)
  clean-batched: 4,583,171 -> 4,106,467  (inline saves ~477k, ~10%)
  batching delta under prod config: 517,849 (~11.2%)
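The deltas above can be reproduced from the raw totals with a small helper. This is plain arithmetic over figures already in this PR, with no new measurements.

```python
# Recompute the quoted deltas from the raw gas totals (figures from the
# commit messages; the helper itself is just arithmetic).
def saving(before: int, after: int) -> tuple:
    """Return (gas saved, percent of `before` rounded to one decimal)."""
    delta = before - after
    return delta, round(100 * delta / before, 1)


inline_legacy = saving(5_201_946, 4_624_316)   # -> (577630, 11.1)
inline_batched = saving(4_583_171, 4_106_467)  # -> (476704, 10.4)
batching_prod = saving(4_624_316, 4_106_467)   # -> (517849, 11.2)
```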

Inline regen alone saves more than batching does (~577k from inline regen on legacy vs ~518k from
batching). The old external-regen main baseline (5,277,953) is no longer comparable, so the misleading
'batched < MAIN' line is dropped. The reentrant-read breakdown that was steering optimizations must be
redone against this config.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merges the stale OPT_PLAN.md (which described the abandoned transient-shadow / dual-sig / CPU-state-hint
design) and ANALYSIS_BATCHED_GAS.md (external-regen baseline) into one accurate record:
- Corrected production-faithful headline (inline config): legacy 4,624,316 / batched 4,106,467.
- Inline stamina regen was the dominant win (~11%, a config change rather than a code change); batching
  adds ~11.2% on top.
- Documents what was tried and rejected, with measured reasons: transient shadow (-94k), #4 no-op
  SSTORE guard (regressed every scenario), #6 transient-reset trim (breaks equivalence), delegatecall
  moves (no SLOAD saving + storage-corruption risk).
- Ranks the remaining opportunities; flags the CPU/single-player one-tx batch submit as the biggest
  remaining lever (gated on whether the no-batch constraint exists only for PvP fairness).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Convert the IMoveSet single-effect scans (DeepFreeze frostbite, IronWall/Somniphobia idempotency,
GildedRecovery status-removal, NightTerrors terror-count + sleep-check, Baselight import) from
getEffects() plus an in-memory scan to the targeted engine.getEffectData() finder, which locates the
effect internally and returns (exists, index, data) without materializing the full EffectInstance[] array.

Semantically identical (494 tests pass, real-game equivalence holds). The getEffectData rollout saves
~49k/game on the real replay (legacy), dominated by Baselight's move-facing level reads. Ability
activateOnSwitch scans are NOT converted: those abilities are inline-encoded and never make the
external call. MegaStarBlast is left as-is (it scans for Overclock by address AND data, not a first match).
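A Python contrast of the two access patterns. The (exists, index, data) return shape comes from the commit; the list-of-(address, data) storage and the (False, 0, 0) not-found value are assumptions. On-chain the saving comes from not copying the whole EffectInstance[] array into memory; this version only shows the interface difference.

```python
# Old pattern (copy every effect out, caller scans) vs a first-match finder
# returning (exists, index, data). Storage layout and the not-found value
# (False, 0, 0) are assumptions, not the engine's actual encoding.
from typing import List, Tuple

Effect = Tuple[str, int]  # (effect contract address, packed data)


def get_effects(storage: List[Effect]) -> List[Effect]:
    """Old pattern: materialize the whole array; the caller scans it."""
    return list(storage)


def get_effect_data(storage: List[Effect], target: str) -> Tuple[bool, int, int]:
    """New pattern: stop at the first effect whose address matches."""
    for i, (addr, data) in enumerate(storage):
        if addr == target:
            return True, i, data
    return False, 0, 0  # assumed not-found encoding
```

Because the finder stops at the first address match, a scan that must also filter on data (as MegaStarBlast's Overclock check does) cannot use it directly.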

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Labels: None yet

Projects: None yet


1 participant