Feat: Batched Execution, improved benchmarks, better gas optimization #71
Open
sudo-owen wants to merge 17 commits into
All per-turn-mutable fields (winner/flags/activeMonIndex/lastExecTs/turnId) packed into slot 1 so per-turn BattleData mutations coalesce to one SSTORE (main wrote 2: turnId in slot 0). turnId uint64->uint16, lastExecuteTimestamp uint48->uint40. Engine field access unchanged (Solidity handles the new slots).

Verified: EngineTest 50/50, InlineEngineGasTest 3/3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
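The packing idea can be modeled off-chain. The sketch below is a Python illustration, not the contract code: only the `turnId` (16-bit) and `lastExecuteTimestamp` (40-bit) widths come from the commit message, while the widths of `winner`, `flags`, and `activeMonIndex` and the field order are assumptions made for the example.

```python
# Hypothetical layout of the per-turn-mutable fields inside one 256-bit slot.
# Only the turnId (16) and lastExecuteTimestamp (40) widths are from the
# commit message; the other widths and the ordering are assumed.
FIELDS = [
    ("winner", 8),
    ("flags", 8),
    ("activeMonIndex", 16),
    ("lastExecuteTimestamp", 40),
    ("turnId", 16),
]


def layout(fields):
    """Assign each field a (bit offset, width) pair, lowest bits first."""
    offsets, shift = {}, 0
    for name, width in fields:
        offsets[name] = (shift, width)
        shift += width
    if shift > 256:
        raise ValueError("fields do not fit in one storage slot")
    return offsets


OFFSETS = layout(FIELDS)


def pack(values):
    """Pack a dict of field values into a single integer word."""
    word = 0
    for name, (shift, width) in OFFSETS.items():
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
    return word


def unpack(word):
    """Recover every field from the packed word."""
    return {
        name: (word >> shift) & ((1 << width) - 1)
        for name, (shift, width) in OFFSETS.items()
    }


def set_field(word, name, value):
    """Update one field; the result is still one word, i.e. one slot write."""
    shift, width = OFFSETS[name]
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name} does not fit in {width} bits")
    mask = ((1 << width) - 1) << shift
    return (word & ~mask) | (value << shift)
```

Because every field lives in the same word, changing `turnId` and `lastExecuteTimestamp` in one turn produces a single new word to store, which is the coalescing the commit describes.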
…ness
- Engine.executeBatchedTurns: loops _executeInternal with DIRECT storage (no transient shadow); EVM warm-slot discount amortizes cold SLOADs across the single tx for free. + getStorageKey / getSubmitContext, resetCallContext clears per-turn transients.
- SignedCommitManager: moveBuffer + bufferCounters + submitTurnMoves (SINGLE-SIG: msg.sender == committer, revealer sig pins the committer move hash) + executeBuffered + pack/unpack helpers.
- Structs.TurnSubmission (no committer sig). IEngine surface. test/abstract/BatchHelper (single-sig).
- RealMonReplayGasTest: faithful 26-turn real-game replay via SetupMons reuse; asserts legacy==batched end state and reports production-faithful steady-state gas.

Result (real 26-turn game, vm.cool steady-state): clean-batched 4,584,625 vs main 5,277,953 = -693,328 (-13.1%); clean-legacy ~= main (no regression). Equivalence verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
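A rough Python model of why batching amortizes storage reads: under EIP-2929 a cold SLOAD costs 2,100 gas and a warm one costs 100, and warmth lasts for the whole transaction. The sketch assumes every turn reads the same set of slots, which is a simplification; the slot count used in the usage note is hypothetical and is not taken from the PR's measurements.

```python
COLD_SLOAD = 2100  # EIP-2929: first read of a slot within a transaction
WARM_SLOAD = 100   # EIP-2929: later reads of the same slot in that transaction


def sload_gas_separate_txs(turns: int, slots_per_turn: int) -> int:
    """Each turn is its own transaction, so every slot is cold every turn."""
    return turns * slots_per_turn * COLD_SLOAD


def sload_gas_batched(turns: int, slots_per_turn: int) -> int:
    """All turns share one transaction: slots are cold once, warm afterwards."""
    if turns == 0:
        return 0
    first_turn = slots_per_turn * COLD_SLOAD
    later_turns = (turns - 1) * slots_per_turn * WARM_SLOAD
    return first_turn + later_turns
```

Calling both functions with `turns=26` and an assumed `slots_per_turn=10` shows the shape of the saving; the real figure depends on which slots each turn actually touches, which is what the replay test measures.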
p0 submits both their move and the CPU's move (computed client-side) in one tx; engine executes directly, skipping getCPUContext (dozen+ cold SLOADs) + calculateMove every CPU turn. Trust model: lying only weakens the CPU against p0 (PvE self-handicap); msg.sender == p0 binding unchanged.

Build + CPU suites pass (BetterCPU 52, FairCPU/OkayCPU).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ort.py)
Parses a battle desync report (teams + per-turn moveIndex/salt/extraData) into the Solidity team monIds + per-slot MonStats + deduped Turn[] sequence for a RealMonReplayGasTest-style faithful replay.

Verified: output reproduces the hand-written 26-turn test data byte-for-byte from the raw fixture, so any real prod game becomes a gas + equivalence regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Batching reduces gas ~13% vs main on a real 26-turn game (equivalence-verified); the prior branch's transient shadow was counterproductive (EVM already amortizes warm slots free); methodology + remaining A2 follow-up documented.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…=committer)
Drop the committer signature; bind the committer via msg.sender + the revealer sig (which pins (battleKey, turnId, committerMoveHash)). Saves ~3.6k/turn on the legacy fallback (clean-legacy 5,296,078 -> 5,201,946).

Rewrote the ~15 dual-sig test sites + 4 security tests to the single-sig model:
- unilateral-revealer -> NotCommitter
- third-party relay -> NotCommitter
- committer-move-changed -> InvalidSignature
- replay still prevented by turnId binding

All 506 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
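The single-sig binding can be sketched in Python under loud assumptions: HMAC-SHA256 stands in for the ECDSA signature the contracts verify (so the verifier here holds the revealer's secret, which is not how the on-chain check works), and the digest encoding is invented for the example. Only the bound tuple `(battleKey, turnId, committerMoveHash)` and the error names `NotCommitter` and `InvalidSignature` come from the commit message; `sign_as_revealer` and `submit_turn` are hypothetical names.

```python
import hashlib
import hmac


class NotCommitter(Exception):
    """Caller is not the committer for this turn."""


class InvalidSignature(Exception):
    """Revealer signature does not cover the submitted tuple."""


def turn_digest(battle_key: bytes, turn_id: int, committer_move_hash: bytes) -> bytes:
    # Assumed encoding: plain concatenation with a 2-byte big-endian turnId.
    return hashlib.sha256(
        battle_key + turn_id.to_bytes(2, "big") + committer_move_hash
    ).digest()


def sign_as_revealer(secret: bytes, battle_key: bytes, turn_id: int,
                     committer_move_hash: bytes) -> bytes:
    # HMAC is a stand-in for an ECDSA signature by the revealer.
    digest = turn_digest(battle_key, turn_id, committer_move_hash)
    return hmac.new(secret, digest, hashlib.sha256).digest()


def submit_turn(sender: str, committer: str, revealer_secret: bytes,
                battle_key: bytes, turn_id: int, committer_move_hash: bytes,
                revealer_sig: bytes) -> bool:
    # Binding 1: the caller must be the committer (no committer signature).
    if sender != committer:
        raise NotCommitter()
    # Binding 2: the revealer signature pins battle, turn, and committer move.
    expected = sign_as_revealer(revealer_secret, battle_key, turn_id,
                                committer_move_hash)
    if not hmac.compare_digest(expected, revealer_sig):
        raise InvalidSignature()
    return True
```

In this model a relayed submission fails on the sender check, while a changed committer move or a signature reused for a different `turn_id` fails on the signature check, mirroring the four security cases listed in the commit.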
…asure helper
- Extract FullyOptimizedInlineGasTest from InlineEngineGasTest.sol into its own file (one contract per file; matches the already-separate snapshot JSONs). Behavior-preserving (both suites pass, snapshots unchanged).
- test/abstract/GasMeasure.sol: shared production-faithful gas measurement: per-tx cold accounting (vm.cool) + a deterministic storage-access tally (cold/warm SLOAD, SSTORE tiers), with _snapScenario() recording tally + cold-per-tx gas. Basis for converting the gas tests off the all-warm gasleft span (which masks cold-access regressions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e subset InlineEngineGasTest
- FullyOptimizedInlineGasTest (the production stack: inline validation/RNG/stamina + SignedMatchmaker + single-sig dual-signed) now measures each battle per-turn as a cold-start tx (vm.cool) with a deterministic storage-access tally (cold/warm SLOAD, z->nz/nz->nz/no-op SSTORE) + cold-per-tx gas. Cool+tally is gated inside _fastTurn/_fastSwitchReveal by _measuring, so battle spans just wrap with _beginMeasure()/_endMeasure() (no per-turn churn). Setup spans dropped (gasleft polluted by the tally's cumulative memory); storage-reuse now asserted via Battle3 < Battle1 cold gas (Battle3 replays Battle1 but reuses storage: zToNz 30 -> 4).
- Removed InlineEngineGasTest (+ snapshot): a strict subset of FullyOptimized's optimizations (inline validation only, on the slower commit-reveal flow).
- _tally sizes its dedup scratch to the actual per-window access count (was a fixed 8192 array, which OOG'd when called once per turn across a battle).

All 503 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
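The storage-access tally can be illustrated with a small Python model. It is a sketch of the classification idea only: the trace format, the bucket names other than `zToNz`, and the extra `nzToZ` bucket are assumptions for the example and are not taken from GasMeasure.sol. Each window is treated as one cold-start transaction, so the first touch of a slot is cold and later touches are warm.

```python
from collections import Counter


def tally(window, storage):
    """Classify one measurement window of storage accesses.

    window:  list of ("SLOAD", slot) or ("SSTORE", slot, new_value) tuples
             (a hypothetical trace format).
    storage: dict slot -> value; mutated so later windows see the writes.
    """
    seen = set()        # dedup scratch; grows only with the slots touched
    counts = Counter()
    for op, slot, *rest in window:
        if op == "SLOAD":
            counts["coldSload" if slot not in seen else "warmSload"] += 1
        elif op == "SSTORE":
            old, new = storage.get(slot, 0), rest[0]
            if new == old:
                counts["noopSstore"] += 1
            elif old == 0:
                counts["zToNz"] += 1
            elif new == 0:
                counts["nzToZ"] += 1   # bucket added for completeness (assumed)
            else:
                counts["nzToNz"] += 1
            storage[slot] = new
        else:
            raise ValueError(f"unknown op {op!r}")
        seen.add(slot)  # both reads and writes warm the slot for this window
    return counts
```

Running the same window twice against one `storage` dict shows the storage-reuse effect the commit asserts: writes that were zero-to-nonzero on the first pass become nonzero-to-nonzero or no-ops on the replay, so the `zToNz` count drops.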
…uction stack)
EngineGasTest benchmarked the external-validator + commit-reveal config, which isn't what we ship (production uses inline validation + dual-signed). FullyOptimizedInlineGasTest (new GasMeasure format) is the production-faithful gas tracker.

All 498 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…regen config
The replay was measuring DefaultRuleset(new StaminaRegen()), the SLOW external stamina-regen path. Production uses INLINE_STAMINA_REGEN_RULESET (engine-internal regen), which avoids the per-round-end/after-move reentrant calls (getPlayerSwitchForTurnFlag, getMoveDecisionForBattleState, stamina getMonStateForBattle).

Switching the replay to the prod config:
  clean-legacy : 5,201,946 -> 4,624,316 (inline saves ~577k, ~11%)
  clean-batched: 4,583,171 -> 4,106,467 (inline saves ~477k, ~10%)
  batching delta under prod config: 517,849 (~11.2%)

Inline regen alone saves more than batching does. The old external-regen main baseline (5,277,953) is no longer comparable and the misleading 'batched < MAIN' line is dropped. The reentrant-read breakdown that was steering optimizations must be redone against this config.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
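The deltas quoted in this commit can be re-derived from the four gas totals it reports. The Python snippet below is only an arithmetic check; the helper name `saving` is made up for the example.

```python
def saving(before: int, after: int):
    """Return (absolute gas saved, percent saved relative to `before`)."""
    delta = before - after
    return delta, round(100 * delta / before, 1)


# Gas totals as reported in the commit message (26-turn real-game replay).
LEGACY_EXTERNAL, LEGACY_INLINE = 5_201_946, 4_624_316
BATCHED_EXTERNAL, BATCHED_INLINE = 4_583_171, 4_106_467

inline_on_legacy = saving(LEGACY_EXTERNAL, LEGACY_INLINE)      # (577630, 11.1)
inline_on_batched = saving(BATCHED_EXTERNAL, BATCHED_INLINE)   # (476704, 10.4)
batching_under_prod = saving(LEGACY_INLINE, BATCHED_INLINE)    # (517849, 11.2)
```

The computed values agree with the rounded figures in the message (~577k, ~477k, and 517,849 at ~11.2%), and they confirm that the inline-regen saving on the legacy path is larger in absolute terms than the batching saving.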
Merges the stale OPT_PLAN.md (described the abandoned transient-shadow / dual-sig / CPU-state-hint design) and ANALYSIS_BATCHED_GAS.md (external-regen baseline) into one accurate record:
- Corrected production-faithful headline (inline config): legacy 4,624,316 / batched 4,106,467.
- Inline stamina regen was the dominant win (~11%, config not code); batching ~11.2% on top.
- Documents what was tried and rejected with measured reasons: transient shadow (-94k), #4 no-op SSTORE guard (regressed every scenario), #6 transient-reset trim (breaks equivalence), delegatecall moves (no SLOAD saving + storage-corruption risk).
- Ranks remaining opportunities honestly; flags the CPU/single-player one-tx batch-submit as the biggest remaining lever (gated on whether the no-batch constraint is PvP-fairness-only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Convert the IMoveSet single-effect scans (DeepFreeze frostbite, IronWall/Somniphobia idempotency, GildedRecovery status-removal, NightTerrors terror-count + sleep-check, Baselight import) from getEffects()+in-memory-scan to the targeted engine.getEffectData() finder, which locates the effect internally and returns (exists, index, data) without materializing the full EffectInstance[] array. Semantically identical (494 tests pass, real-game equivalence holds).

getEffectData rollout saves ~49k/game on the real replay (legacy), dominated by Baselight's move-facing level reads.

Ability activateOnSwitch scans are NOT converted: those abilities are inline-encoded and never make the external call. MegaStarBlast left as-is (scans for Overclock by address AND data, not a first-match).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
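The difference between the two lookup styles can be shown with a Python analogy. This is not the engine's implementation: `EffectInstance` here is a simplified stand-in, `effect` is a string rather than a contract address, and the not-found return value `(False, 0, b"")` is an assumption. Only the `(exists, index, data)` return shape and the first-match behavior come from the commit message.

```python
from typing import List, NamedTuple, Tuple


class EffectInstance(NamedTuple):
    effect: str   # stands in for the effect contract address
    data: bytes


def get_effects(store: List[EffectInstance]) -> List[EffectInstance]:
    """Old style: materialize a full copy of the effect array for the caller."""
    return list(store)


def scan_for_effect(store: List[EffectInstance], target: str) -> Tuple[bool, int, bytes]:
    """Caller-side scan over the materialized copy."""
    effects = get_effects(store)
    for index, instance in enumerate(effects):
        if instance.effect == target:
            return True, index, instance.data
    return False, 0, b""


def get_effect_data(store: List[EffectInstance], target: str) -> Tuple[bool, int, bytes]:
    """New style: find the first match in place and return only that entry."""
    for index, instance in enumerate(store):
        if instance.effect == target:
            return True, index, instance.data
    return False, 0, b""
```

Both functions return the same tuple for a first-match query, which is why the conversion is semantically identical for the listed moves. A lookup that must match on both the effect and its data, as the commit notes for MegaStarBlast, does not fit this first-match finder.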
No description provided.