Skip to content

Simplification lemmas to improve Haskell booster coverage#2859

Merged
ehildenb merged 28 commits into
masterfrom
lemmas-tier1
Jun 11, 2026
Merged

Simplification lemmas to improve Haskell booster coverage#2859
ehildenb merged 28 commits into
masterfrom
lemmas-tier1

Conversation

@ehildenb

@ehildenb ehildenb commented Jun 4, 2026

Copy link
Copy Markdown
Member

This PR adds a curated first tier of simplification lemmas, plus the supporting attribute annotations, so that common arithmetic, Bool, bytes, and #asWord goals discharge directly in the Haskell booster instead of falling back to kore-rpc. Each new rule is paired with a focused coverage claim in the functional specs, and the change set includes the SMT support needed to keep the MCD/DSS proofs passing once the new lemmas simplify their keccak side-conditions away.

Changes:

  • Add simplification lemmas across int-simplification.k, evm-int-simplification.k, bytes-simplification.k, lemmas.k, and buf.md: ==Int commutativity normalization (concrete argument on the left), arithmetic and Bool tautologies, A <=Int A +Int B short-circuit when 0 <=Int B, concrete bytes-concat identities, #asWord comparison-to-false cases, minInt/maxInt base comparisons, eq-false-lt, #widthOpCode bounds, and a concrete #powByteLen comparison rule.
  • Mark #widthOpCode [total, smtlib] and #asAccount [total]; add [preserves-definedness] to the rules that preserve definedness.
  • tests/specs/mcd/verification.k: mark the keccak disequality rule [smt-lemma] so Z3 can discharge keccak(_) =/=Int <small constant> independently of the path condition. This is required once the new lemmas simplify that constraint away, and fixes the heal/fold/frob DSS proofs and the flopper-tick/end-pack/flopper-dent rules proofs.
  • Add coverage claims for the new rules in lemmas-no-smt-spec.k, int-simplifications-spec.k, and abi-spec.k.

Validation:

  • Full prove suite (functional, rules, optimizations, DSS) on K v7.1.330.
  • Per-spec timing comparison against master: the lemma set is performance-neutral on the large majority of specs; the heaviest MCD/DSS proofs see roughly +20–25% from the cumulative per-step cost of the larger rule set, and the keccak [smt-lemma] itself adds ~1%.
  • Confirmed that all slowdown disappers when coupled with this Haskell backend PR: Booster: tunable in-place equation evaluation at rewritten subterms (--equation-max-local-steps) haskell-backend#4153
  • Confirmed that this PR reduces fallbacks to Kore dramatically on the KEVM test-suite (and on Kontrol).

ehildenb and others added 18 commits June 4, 2026 20:15
…-asWord

Without this attribute Booster refused to apply the rule because the LHS
contains the non-total hook Lbl_[_]_BYTES-HOOKE.  After the fix Booster
correctly applies it (8× succeeded in the execute-node simplify request).

Analysis: scripts/bench-prove.py --analyse-fallbacks shows the "Uncertain
about definedness of rule due to: non-total symbol Lbl_[_]_BYTES-HOOKE"
error gone, replaced by successful Booster applications.  The rule-index
gap and indeterminate-match cases for other equations remain and will be
addressed in follow-up commits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 25e00d935ac1f9ce25394e40865dbb36c0cbd80b)
(cherry picked from commit 6ae0def0a1de63aa5ffb13d1cdb2abe7d5c3651d)
bytes-simplification.k: add preserves-definedness to lengthBytes-buf
and lengthBytes-range.

evm-int-simplification.k: add asWord-buf-inversion-rangeUInt256.

(cherry picked from commit d4dad3335e7e98d76057c9c0b07387a21a20c0b2)
(cherry picked from commit c60634bc589f05a98aa30b06409df4ed478d407f)
…n-left

Kore's execute-fallback applies INT-KORE.eq-int-true-rigth which can swap
==Int argument order in path conditions (e.g. lengthBytes(BA) ==Int 32 →
32 ==Int lengthBytes(BA)).  This makes Booster's implies check fail due to a
syntactic mismatch even when the logical content is identical.

The new rule [int-eq-comm-concrete] at priority 30 fires before the +Int
cancellation rules and normalises concrete-on-left to concrete-on-right,
restoring the form the target spec expects.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 1d384970bda6669b479959a31f152c0e8aedde35)
(cherry picked from commit 2cf8e8c620fee897758f02f5b2b512595428fbec)
…ation lemmas

Adds to INT-SIMPLIFICATION-COMMON:
- A <=Int A => true  (reflexivity; fixes chop-overflow-02 under --booster-only-simplify)
- A <=Int A +Int B => 0 <=Int B
- A +Int ((B -Int A) +Int C) => B +Int C  (4-term cancellation; fixes cancellation-02)
- A <Int B andBool B <Int A => false  (strict order contradiction; fixes b2w-03)

Adds to INT-SIMPLIFICATION-HASKELL:
- A <Int B andBool C <=Int A => false requires B <=Int C [concrete(B,C)]  (range contradiction)
- C <=Int A andBool A <Int B => false requires B <=Int C [concrete(B,C)]  (range contradiction)
  (requires is purely concrete, evaluated by LLVM hook — no circular self-application risk)

Adds to LEMMAS-HASKELL:
- B orBool notBool B => true  (and commuted variant; fixes b2w-05, chop-additional-knowledge)
- B andBool notBool B => false  (and commuted variant)

Validation: full functional test suite passes in normal mode (slot-updates-spec.k failure
is pre-existing, unrelated to these changes). Under --booster-only-simplify, lemmas-spec.k
drops from 22 to 18 failing claims. No loops detected (checked via kevm-analyse hot-rules).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 3bb01297db74cacf75fa138f0550440882df8de1)
(cherry picked from commit 4c13678fdb82d593bfc495cf0418804a3e505c90)
…en 0 <=Int B

Add a higher-priority (40) variant of `A <=Int A+B => 0 <=Int B` that
short-circuits directly to `true` when `0 <=Int B` is already in the path
condition, fixing `chop-overflow-01` in --booster-only-simplify mode where the
intermediate `0 <=Int Y` form cannot be discharged at the implies check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit bfb0225f96a28cbf42ddd520dd49773b234944e5)
In K, .Bytes (the constructor) and b"" (LLVM-evaluated concrete empty bytes,
a Kore domain value \dv{SortBytes}("")) are different Kore terms.  The existing
bytes-concat-empty-right/left rules use .Bytes and do not fire when LLVM hooks
(e.g. #encodeArgs base case) return b"".

New rules:
  [bytes-concat-empty-right-concrete]: B +Bytes b"" => B
  [bytes-concat-empty-left-concrete]:  b"" +Bytes B => B

Fixes booster-only failure for encodepacked-keccak01: the contract evaluates
keccak(1 : #encodeArgs(#uint256(A0))) where the #encodeArgs base case produces
b"", leaving keccak(X +Bytes b"") vs keccak(X) as the implies-check mismatch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 829d293906d6efb769649002b2cea2ebf90235fc)
(cherry picked from commit a4669b0942edb8c974750490affdad4e1ed1d54d)
…ication, add coverage claim

The rule `2 ^Int (8 *Int SIZE) => #powByteLen(SIZE)` was [mixed] in booster: it
sometimes matched successfully but sometimes failed because booster rejected the
rewrite without proof that #powByteLen(SIZE) is defined. Since #powByteLen is a
function symbol with no-evaluators (always yields a defined term), adding
[preserves-definedness] is safe and fixes the gap.

This was the root cause of the kore-vs-booster gap in ecrecoverloop02-sig1-invalid
and ecrecoverloop02-sigs-valid benchmarks specs.

The new abi-spec.k claim pow-byte-len-simplify exercises this rule directly in
booster-only mode and passes after the fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit efa0e111609946d7c3506e3a7ae2c6d52cef16e4)
(cherry picked from commit fcab3c84c70d9034bee5d6a3b3152c306b6e9d00)
…nedness] to 46 comparison normalization rules, add coverage claims

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit ccce6b5dc1442f51df8939c71b3e378cf74c5894)
(cherry picked from commit 2b7099cf814c7ed27e8bc5af007cd52591e7b869)
… BOOL-KORE @-variable patterns, add coverage claim

The BOOL-KORE module in K's domains.md defines 8 rules using @-prefixed
"here variables" (e.g., {true #Equals notBool @b}) which booster cannot
match.  Add equivalent rules with regular B:Bool variables to LEMMAS-HASKELL
so booster can apply the same normalizations.  Add a functional spec claim
exercising the notBool case, which passes in booster-only mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 80ed6938ac79faf1a98ef55ddc2a0226a3ae39b2)
(cherry picked from commit c498024497ff2a65fd6cc58916db745813f5acf4)
… add preserves-definedness to range-empty

The range-empty rule (#range(_,S,W) => .Bytes when S<0 or W<=0) previously
lacked [preserves-definedness], so booster refused to apply it during the
implies check.  Tag it accordingly and add two coverage claims to
lemmas-no-smt-spec.k to verify it fires in booster.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 666ba06c5956d8b9b540446089f73b6a45fba000)
(cherry picked from commit f30c2ff9ac9e7bdf0af6f10b47f716f5e0f1728e)
…parison rule

After commit 5d0d8083c added [preserves-definedness] to the
2^Int(8*SIZE) => #powByteLen(SIZE) rule, booster fires it consistently
even when SIZE is concrete (the rule fires before path-condition lookup
makes the argument concrete).  Because #powByteLen has no-evaluators,
the result #powByteLen(N) cannot self-evaluate, leaving CONST <Int

Add [powByteLen-lt-concrete] to bridge that gap: when both CONST and N
are concrete the requires-clause delegates back to 2^Int arithmetic,
which can be evaluated directly.

Add a coverage claim in lemmas-no-smt-spec.k to verify the rule fires.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 8b4972c4cc410995c3ec2360ca6dd50c189d374b)
(cherry picked from commit 11b7050dfc88d903d1e1c1cb62802b6db25b9187)
#asAccount is total over its Bytes domain — the two rules
(lengthBytes == 0 → .Account; [owise] → #asWord(BS)) are jointly
exhaustive. Marking it [total] lets booster's definedness checker
clear the RHS-definedness gate on rules that contain #asAccount in
their RHS — notably [call.delegatedAuthority] at evm.md:1633, where
the symbolic argument #range(CODE, 3, 20) prevents concrete evaluation
and previously caused booster to abort with "Uncertain about
definedness of rule due to: non-total symbol Lbl#asAccount", yielding
to kore for the rewrite step.

The recover-mode sweep across functional + examples + erc20 +
benchmarks identified 12 kore-execute handoffs (all benchmarks
ecrecover variants) that traced to exactly this definedness abort.
Validated locally on benchmarks/ecrecover00-siginvalid-spec.k:
2 handoffs → 0, proof still passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
(cherry picked from commit 33c04ff6558c339ffc06759dcaa49cd139222ef8)
(cherry picked from commit 4ce956c09bb03a7d06fdf533f95b2cfaa40b0dc7)
…comparison rules

Tier-1 subset of upstream 258a68c71: keep the two base rules (sound since
minInt(A,_) <= A and C <= maxInt(C,_)); drop the non-linear shift/factored
variants pending review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cations

Tier-1 subset of upstream 584462824: the asWord-{lt,le,eq}-false rules
(contrapositives of path-condition facts; RHS is false, so trivially
definedness-preserving). Drops that commit's slot-updates-spec.k edits
(claims regressed PASS->FAIL and need investigation). Coverage claims for
these rules are already in lemmas-no-smt-spec.k.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tier-1 subset of upstream d0f4bdec3: keep eq-false-lt (A ==Int B => false
when A <Int B, B concrete). Drops le-1073741824-maxUInt64, whose 2^30 bound
is a non-general magic constant pending review. Coverage claim is already in
lemmas-no-smt-spec.k.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…und + concat-lookup simplifications

Consolidates the Tier-1 #widthOpCode work from upstream Rounds 2-5 (which
iteratively rewrote the same lines): #widthOpCode is total (PUSH ops -> 2..33,
owise -> 1) and gets an smtlib symbol; widthOpCode-lb/ub encode its 0..33 range
as smt-lemmas; widthOpCode-concat-{left,right} push a concrete-prefix byte
lookup through +Bytes. Drops the Round-4 asWord-range-buf and the
#computeValidJumpDests preserves-definedness riders pending review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ication rules

For a [simplification] rule the booster's definedness obligation is the set of
non-total, non-constructor symbols on the LHS and RHS (Internalise.hs); when that
set is empty the rule is determined definedness-preserving automatically and the
attribute is a no-op. Removed it from the 66 rules added here whose every symbol
is total (`+Int -Int *Int` / comparisons / `minInt maxInt lengthBytes +Bytes
notBool` / the `total` functions `#asWord #buf #range #widthOpCode`), i.e. the
comparison-normalization block, eq-false-lt, the minInt<maxInt base rules,
asWord-{lt,le,eq}-false, asWord-buf-inversion, range-empty, lengthBytes-buf/range,
b"" concat identities, and the Bool #Equals distributions.

Kept on the 5 rules that root a partial symbol (where the attribute does real
work): the two #powByteLen rules (no-evaluators), lookup-as-asWord and
widthOpCode-concat-{left,right} (partial BYTES.get `_[_]`).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread kevm-pyk/src/kevm_pyk/kproj/evm-semantics/lemmas/lemmas.k Outdated
Comment thread kevm-pyk/src/kevm_pyk/kproj/evm-semantics/buf.md
ehildenb and others added 2 commits June 4, 2026 21:09
Per review: N need not be syntactically concrete -- once N is concrete the
`requires CONST <Int 2^Int(8*N)` reduces to a linear inequality, and if N is
symbolic the requires is undischargeable so the rule simply does not fire.
(It does not subsume `SIZE <Int #powByteLen(SIZE)`, which is the diagonal
same-variable case.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per review: bytes-simplification.k already provides bytes-concat-lookup-{left,
right}, so these were only a workaround for booster not descending into
#widthOpCode(...) to apply them. Remove rather than duplicate per-function; if
the non-descent persists it should be addressed in the backend.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ehildenb and others added 4 commits June 5, 2026 14:01
…used bool-eq #Equals rules

Whole-prove-suite rule-firing measurement (booster Simplify + kore SimplifyKore
logging over functional/benchmarks/erc20/examples/kontrol/mcd) shows the eight
bool-eq-* ML-#Equals distribution rules are never applied by the booster: five
fire in no engine at all, and three (bool-eq-not-true-l/r, bool-eq-and-true-l)
fire only in the kore-rpc implies/subsumption path. Since this PR targets booster
performance and these proofs already discharge those goals under kore without the
rules, all eight are removed. The bool-*-self rules are kept (they fire in the
booster). The bool-eq-not-false coverage claim, which was discharged by
bool-eq-not-true-l rather than by the not-false rules, is removed with them.

Verified: lemmas-no-smt-spec proves all 22 remaining claims under --use-booster.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o fix DSS/MCD implies failures

The new tier1 simplification lemmas simplify away the keccak path-condition
constraint during EVM execution, so the final subsumption check can no longer
prove `keccak(_) =/=Int smallInt` from the path condition alone. Encoding the
rule as an SMT lemma lets Z3 discharge it independently of the path condition.

This matches the treatment already present in mcd-structured/verification.k,
whose sibling proofs pass; the plain mcd suite lacked the annotation. Fixes the
heal/fold/frob-diff-zero-dart DSS proofs and the mcd flopper-tick/end-pack rule
proofs. Verified mcd/end-pack-pass-rough-spec.k passes with the change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ehildenb ehildenb changed the title Lemmas tier1 Simplification lemmas to improve Haskell booster coverage Jun 8, 2026
…under booster-dev

Verified by running the booster-dev binary over the full skip-list on this branch:
mcd-structured/dsvalue-peek-pass-rough now passes under tier1's lemmas (failed on
master), and 4 entries (functional/bitwise-{asword,simplifications},
mcd[-structured]/dsvalue-read-pass-summarize) were stale (already passing on master).
bitwise-mask-shift-spec.k is intentionally kept: it regresses under booster-dev on
this branch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ehildenb

ehildenb commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

(i) Performance changes

Caveat: no clean perf benchmark was run. Numbers below are recover-mode wall-clock under concurrent np (≈2× RPC, no caching, cross-run contention) — directionally fine, not authoritative. The dedicated isolated normal-mode methodology (CLAUDE.md "Comparing haskell-backend builds") is the right tool for real perf and is a recommended follow-up.

scope master tier1 delta
functional suite wall-clock 1583 s 1601 s +1%
rules suite wall-clock 3255 s 4019 s +23%*
rules, sum over 124 both-passed specs 11949 s 12319 s +3% (noise)
dss (vat-spec, capped) 10800 s (timeout) 10800 s (timeout)

*The +23% rules wall-clock is not a slowdown: tier1 completes 14 more proofs that master fails/crashes out of quickly. The fair per-spec view (both-passed specs) is +3%, i.e. flat within contention noise. Per-spec movers are mixed and small (largest: mcd/flopper-dent-guy-same +105 s, ecrecoverloop02-sigs-valid −56 s) — consistent with noise, not a systematic regression.

Takeaway: no evidence of a performance regression; a proper isolated perf run is needed to claim any speedup.

@ehildenb

ehildenb commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

(ii) Fallback changes (the headline — clear reduction)

suite metric master tier1 delta
functional total handoffs (488 common claims) 21 11 −48%
functional flavour implies 21 implies 11 implies-only
rules total handoffs (124 both-passed specs) 73 58 −21%
rules execute / implies 30 / 43 16 / 42 execute ~halved
rules all 264 common claims 105 81 −23%
dss (vat-spec, partial) total handoffs (36 common claims) 120 33 −72%
dss execute / implies 109 / 11 21 / 12 execute ~−80%

The dominant, consistent effect is a large reduction in execute-flavour fallback (the booster aborting a symbolic-execution rewrite and handing the step to Kore): ~½ on rules, ~⅕ on DSS. implies-flavour fallback is flat on rules/dss and is the entire (smaller) functional effect. Reductions concentrate where tier1 added lemmas: the ecrecover/ecrecoverloop benchmarks, MCD flapper/flopper/end/pot proofs, and the DSS vat proofs (helped further by tier1's keccak-disequality [smt-lemma] fix).

Even stronger than the counts — recover-mode provability (does the proof close booster-first at all):

suite master pass/fail tier1 pass/fail
rules 124 / 18 138 / 4

14 rules specs that master can only close with Kore now close booster-side on tier1; 0 regressions. (The 4 fail-on-both are recover-mode limitations — see anomalies.)

@ehildenb

ehildenb commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

(iii) Booster-dev changes (skip-list)

Verified by running the standalone booster-dev binary on both branches over the full 137-entry failing-symbolic.haskell-booster-dev list (neutralised, then restored). Methodological note: recover-mode "0 handoffs" is not a proxy for booster-dev provability — of 82 specs that looked booster-only in recover-mode, only 5 actually pass booster-dev. The verification was essential.

Removed from the skip-list on this branch (5; all pass booster-dev on tier1):

spec master booster-dev tier1 booster-dev attribution
mcd-structured/dsvalue-peek-pass-rough-spec.k fail pass tier1-specific
functional/bitwise-asword-spec.k pass pass stale
functional/bitwise-simplifications-spec.k pass pass stale
mcd/dsvalue-read-pass-summarize-spec.k pass pass stale
mcd-structured/dsvalue-read-pass-summarize-spec.k pass pass stale

1 removal is tier1's doing; 4 are stale entries the list had drifted on (already passing on master). Skip-list: 137 → 132 lines.

@ehildenb

ehildenb commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

(iv) Anomalies / non-passing proofs

# anomaly detail
1 booster-dev regression functional/bitwise-mask-shift-spec.k passes booster-dev on master, fails on tier1 (stable recheck, 1 failing node — not a timeout). Still passes in normal & recover mode; skip-listed so no CI break. Kept on the list. A tier1 lemma edit weakened the standalone-booster path for this claim — worth the author's look.
2 recover-mode fail-on-both 4 specs fail under recover-mode on both branches — examples/sum-to-n-foundry, kontrol/test-{allowchangestest-testfailallowchangestostorage, countertest-testincrement, owneruponlytest-testincrementasowner}-0. These are recover-mode limitations (stuck/indeterminate), independent of the lemmas; all pass in normal mode.
3 minor handoff regression examples/erc721-spec.md rises 1→2 implies-handoffs on tier1 (single spec, implies-flavour).
4 implies-count artifact (rules) Over all common claims, implies-handoffs appear to rise 45→53 on tier1 — an artifact of tier1 completing 14 extra proofs that master never reached. On the both-passed set, implies is flat (43→42).
5 DSS incomplete Both branches hit the 3 h cap on the 2-spec dss test; the −72% is over the identical first tranche (mcd/vat-spec.k, 36 claims), not an end-to-end completion. recover-mode is too slow for full DSS; a normal-mode count-aborts.py pass is the right end-to-end instrument.

ehildenb and others added 3 commits June 9, 2026 15:49
…s-summarize specs failing booster-dev in CI

PR #2859 CI 'Proofs: Rules (booster-dev)' failed on mcd/dsvalue-read-pass-summarize
and mcd-structured/dsvalue-read-pass-summarize (SystemExit: 1); local verification
diverged from CI. Re-add them; the other three drops (mcd-structured/dsvalue-peek-pass-rough
and functional/bitwise-{asword,simplifications}) pass in CI and stay removed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread kevm-pyk/src/kevm_pyk/kproj/evm-semantics/buf.md
@ehildenb

ehildenb commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Overall, these lemmas alone are a mild performance hit (~6%). But coupled with runtimeverification/haskell-backend#4153 with the optional local simplification recursion set to 20, become performance neutral overall (and performance positive in many simplification-heavy cases). These lemmas alone eliminate many fallbacks to Kore though, so this gets us closer ot our goal of eliminating the Kore backend altogether.

@ehildenb ehildenb marked this pull request as ready for review June 11, 2026 05:57
@ehildenb ehildenb merged commit b460c09 into master Jun 11, 2026
12 checks passed
@ehildenb ehildenb deleted the lemmas-tier1 branch June 11, 2026 12:38
ehildenb added a commit to runtimeverification/haskell-backend that referenced this pull request Jun 11, 2026
…--equation-max-local-steps) (#4153)

This PR generalizes booster's equation evaluation strategy. Today, when
an equation rewrites a subterm during the bottom-up simplification
traversal, re-normalization happens by restarting whole-term passes from
the top, so a rewrite chain advances one step per pass. This PR adds a
per-pass budget, `--equation-max-local-steps N`: after a rewrite, the
result is re-simplified in place (LLVM evaluation plus re-entering the
bottom-up traversal) for up to N steps per pass before deferring back to
the global restart loop. N=0 — the default — is exactly the existing
restart-only behavior as the degenerate case of the same code path, so
this changes nothing out of the box; the knob lets users tune how deep
in-place simplification chains may run.
  
**Changes:** 
- New server option `--equation-max-local-steps` (default 0, identical
to current behavior); the budget resets every global pass.
- `Booster.Pattern.ApplyEquations`: on a rewrite in the bottom-up pass,
`localFixpointEval` re-enters the cached traversal in place, counting
re-entries against the budget.
- Termination stays fully bounded: at most `--equation-max-iterations`
passes times (budget + 1) applications per chain, with two-layer loop
detection (per-step chain check inside the budget, whole-term snapshots
across passes, which catch cycles longer than the budget).
- Unit tests pin the default-0 boundary (identical to the old
iteration-limit behavior) and the budgeted behaviors (deeper chains,
per-step loop detection, combined bound) in an explicit options window.
  
**Validation:**
- At the default, the unit suite and rpc-integration tests are
unchanged, including the `test-log-simplify-json` log-trace golden — the
default is trace-identical to master.
- KEVM dose-response sweep (15 biggest-moving specs, N ∈
{0,1,2,3,4,5,20,100,1000}, all specs passing in every configuration):
budgets 1–3 give about −30% total wall-clock (1530 s at N=0 → 1042 s at
N=2), with per-spec wins of −30…−50% saturating at N=1–2.
- The sweep also shows why this is a knob and not yet a new default: at
N≥4 one spec family (`ecrecoverloop02`) regresses +40–50%, because an
in-place chain completes a second nested memory write before the
restart-pass concat rules flatten the first, and booster hands the
resulting shape to Kore (Kore-side equation applications jump 2 980 → 26
477). The regression is an indicator of missing simplification lemmas /
`preserves-definedness` annotations, not of the strategy.
- Re-running the sweep with an extended lemma set that fills those gaps
removes the cliff entirely: improvement becomes monotone in the budget,
with the best total at N=20 (−35% vs N=0). Once the lemma set keeps
simplification in-house, re-simplifying at the same subterm until done
is a net performance win. This is the extended lemmas set:
runtimeverification/evm-semantics#2859. On that
PR, without this change KEVM has some proofs that have major
regressions. Coupled with this PR, and setting N=20, we get only
performance wins there.

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants