Simplification lemmas to improve Haskell booster coverage#2859
Conversation
…-asWord Without this attribute Booster refused to apply the rule because the LHS contains the non-total hook Lbl_[_]_BYTES-HOOKE. After the fix Booster correctly applies it (8× succeeded in the execute-node simplify request). Analysis: scripts/bench-prove.py --analyse-fallbacks shows the "Uncertain about definedness of rule due to: non-total symbol Lbl_[_]_BYTES-HOOKE" error gone, replaced by successful Booster applications. The rule-index gap and indeterminate-match cases for other equations remain and will be addressed in follow-up commits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit 25e00d935ac1f9ce25394e40865dbb36c0cbd80b) (cherry picked from commit 6ae0def0a1de63aa5ffb13d1cdb2abe7d5c3651d)
bytes-simplification.k: add preserves-definedness to lengthBytes-buf and lengthBytes-range. evm-int-simplification.k: add asWord-buf-inversion-rangeUInt256. (cherry picked from commit d4dad3335e7e98d76057c9c0b07387a21a20c0b2) (cherry picked from commit c60634bc589f05a98aa30b06409df4ed478d407f)
…n-left Kore's execute-fallback applies INT-KORE.eq-int-true-rigth which can swap ==Int argument order in path conditions (e.g. lengthBytes(BA) ==Int 32 → 32 ==Int lengthBytes(BA)). This makes Booster's implies check fail due to a syntactic mismatch even when the logical content is identical. The new rule [int-eq-comm-concrete] at priority 30 fires before the +Int cancellation rules and normalises concrete-on-left to concrete-on-right, restoring the form the target spec expects. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit 1d384970bda6669b479959a31f152c0e8aedde35) (cherry picked from commit 2cf8e8c620fee897758f02f5b2b512595428fbec)
…ation lemmas Adds to INT-SIMPLIFICATION-COMMON: - A <=Int A => true (reflexivity; fixes chop-overflow-02 under --booster-only-simplify) - A <=Int A +Int B => 0 <=Int B - A +Int ((B -Int A) +Int C) => B +Int C (4-term cancellation; fixes cancellation-02) - A <Int B andBool B <Int A => false (strict order contradiction; fixes b2w-03) Adds to INT-SIMPLIFICATION-HASKELL: - A <Int B andBool C <=Int A => false requires B <=Int C [concrete(B,C)] (range contradiction) - C <=Int A andBool A <Int B => false requires B <=Int C [concrete(B,C)] (range contradiction) (requires is purely concrete, evaluated by LLVM hook — no circular self-application risk) Adds to LEMMAS-HASKELL: - B orBool notBool B => true (and commuted variant; fixes b2w-05, chop-additional-knowledge) - B andBool notBool B => false (and commuted variant) Validation: full functional test suite passes in normal mode (slot-updates-spec.k failure is pre-existing, unrelated to these changes). Under --booster-only-simplify, lemmas-spec.k drops from 22 to 18 failing claims. No loops detected (checked via kevm-analyse hot-rules). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit 3bb01297db74cacf75fa138f0550440882df8de1) (cherry picked from commit 4c13678fdb82d593bfc495cf0418804a3e505c90)
…en 0 <=Int B Add a higher-priority (40) variant of `A <=Int A+B => 0 <=Int B` that short-circuits directly to `true` when `0 <=Int B` is already in the path condition, fixing `chop-overflow-01` in --booster-only-simplify mode where the intermediate `0 <=Int Y` form cannot be discharged at the implies check. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit bfb0225f96a28cbf42ddd520dd49773b234944e5)
In K, .Bytes (the constructor) and b"" (LLVM-evaluated concrete empty bytes,
a Kore domain value \dv{SortBytes}("")) are different Kore terms. The existing
bytes-concat-empty-right/left rules use .Bytes and do not fire when LLVM hooks
(e.g. #encodeArgs base case) return b"".
New rules:
[bytes-concat-empty-right-concrete]: B +Bytes b"" => B
[bytes-concat-empty-left-concrete]: b"" +Bytes B => B
Fixes booster-only failure for encodepacked-keccak01: the contract evaluates
keccak(1 : #encodeArgs(#uint256(A0))) where the #encodeArgs base case produces
b"", leaving keccak(X +Bytes b"") vs keccak(X) as the implies-check mismatch.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 829d293906d6efb769649002b2cea2ebf90235fc)
(cherry picked from commit a4669b0942edb8c974750490affdad4e1ed1d54d)
…ication, add coverage claim The rule `2 ^Int (8 *Int SIZE) => #powByteLen(SIZE)` was [mixed] in booster: it sometimes matched successfully but sometimes failed because booster rejected the rewrite without proof that #powByteLen(SIZE) is defined. Since #powByteLen is a function symbol with no-evaluators (always yields a defined term), adding [preserves-definedness] is safe and fixes the gap. This was the root cause of the kore-vs-booster gap in ecrecoverloop02-sig1-invalid and ecrecoverloop02-sigs-valid benchmarks specs. The new abi-spec.k claim pow-byte-len-simplify exercises this rule directly in booster-only mode and passes after the fix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit efa0e111609946d7c3506e3a7ae2c6d52cef16e4) (cherry picked from commit fcab3c84c70d9034bee5d6a3b3152c306b6e9d00)
…nedness] to 46 comparison normalization rules, add coverage claims Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit ccce6b5dc1442f51df8939c71b3e378cf74c5894) (cherry picked from commit 2b7099cf814c7ed27e8bc5af007cd52591e7b869)
… BOOL-KORE @-variable patterns, add coverage claim
The BOOL-KORE module in K's domains.md defines 8 rules using @-prefixed
"here variables" (e.g., {true #Equals notBool @b}) which booster cannot
match. Add equivalent rules with regular B:Bool variables to LEMMAS-HASKELL
so booster can apply the same normalizations. Add a functional spec claim
exercising the notBool case, which passes in booster-only mode.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 80ed6938ac79faf1a98ef55ddc2a0226a3ae39b2)
(cherry picked from commit c498024497ff2a65fd6cc58916db745813f5acf4)
… add preserves-definedness to range-empty The range-empty rule (#range(_,S,W) => .Bytes when S<0 or W<=0) previously lacked [preserves-definedness], so booster refused to apply it during the implies check. Tag it accordingly and add two coverage claims to lemmas-no-smt-spec.k to verify it fires in booster. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit 666ba06c5956d8b9b540446089f73b6a45fba000) (cherry picked from commit f30c2ff9ac9e7bdf0af6f10b47f716f5e0f1728e)
…parison rule After commit 5d0d8083c added [preserves-definedness] to the 2^Int(8*SIZE) => #powByteLen(SIZE) rule, booster fires it consistently even when SIZE is concrete (the rule fires before path-condition lookup makes the argument concrete). Because #powByteLen has no-evaluators, the result #powByteLen(N) cannot self-evaluate, leaving CONST <Int Add [powByteLen-lt-concrete] to bridge that gap: when both CONST and N are concrete the requires-clause delegates back to 2^Int arithmetic, which can be evaluated directly. Add a coverage claim in lemmas-no-smt-spec.k to verify the rule fires. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> (cherry picked from commit 8b4972c4cc410995c3ec2360ca6dd50c189d374b) (cherry picked from commit 11b7050dfc88d903d1e1c1cb62802b6db25b9187)
#asAccount is total over its Bytes domain — the two rules (lengthBytes == 0 → .Account; [owise] → #asWord(BS)) are jointly exhaustive. Marking it [total] lets booster's definedness checker clear the RHS-definedness gate on rules that contain #asAccount in their RHS — notably [call.delegatedAuthority] at evm.md:1633, where the symbolic argument #range(CODE, 3, 20) prevents concrete evaluation and previously caused booster to abort with "Uncertain about definedness of rule due to: non-total symbol Lbl#asAccount", yielding to kore for the rewrite step. The recover-mode sweep across functional + examples + erc20 + benchmarks identified 12 kore-execute handoffs (all benchmarks ecrecover variants) that traced to exactly this definedness abort. Validated locally on benchmarks/ecrecover00-siginvalid-spec.k: 2 handoffs → 0, proof still passes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> (cherry picked from commit 33c04ff6558c339ffc06759dcaa49cd139222ef8) (cherry picked from commit 4ce956c09bb03a7d06fdf533f95b2cfaa40b0dc7)
…comparison rules Tier-1 subset of upstream 258a68c71: keep the two base rules (sound since minInt(A,_) <= A and C <= maxInt(C,_)); drop the non-linear shift/factored variants pending review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cations
Tier-1 subset of upstream 584462824: the asWord-{lt,le,eq}-false rules
(contrapositives of path-condition facts; RHS is false, so trivially
definedness-preserving). Drops that commit's slot-updates-spec.k edits
(claims regressed PASS->FAIL and need investigation). Coverage claims for
these rules are already in lemmas-no-smt-spec.k.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tier-1 subset of upstream d0f4bdec3: keep eq-false-lt (A ==Int B => false when A <Int B, B concrete). Drops le-1073741824-maxUInt64, whose 2^30 bound is a non-general magic constant pending review. Coverage claim is already in lemmas-no-smt-spec.k. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…und + concat-lookup simplifications
Consolidates the Tier-1 #widthOpCode work from upstream Rounds 2-5 (which
iteratively rewrote the same lines): #widthOpCode is total (PUSH ops -> 2..33,
owise -> 1) and gets an smtlib symbol; widthOpCode-lb/ub encode its 0..33 range
as smt-lemmas; widthOpCode-concat-{left,right} push a concrete-prefix byte
lookup through +Bytes. Drops the Round-4 asWord-range-buf and the
#computeValidJumpDests preserves-definedness riders pending review.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ication rules
For a [simplification] rule the booster's definedness obligation is the set of
non-total, non-constructor symbols on the LHS and RHS (Internalise.hs); when that
set is empty the rule is determined definedness-preserving automatically and the
attribute is a no-op. Removed it from the 66 rules added here whose every symbol
is total (`+Int -Int *Int` / comparisons / `minInt maxInt lengthBytes +Bytes
notBool` / the `total` functions `#asWord #buf #range #widthOpCode`), i.e. the
comparison-normalization block, eq-false-lt, the minInt<maxInt base rules,
asWord-{lt,le,eq}-false, asWord-buf-inversion, range-empty, lengthBytes-buf/range,
b"" concat identities, and the Bool #Equals distributions.
Kept on the 5 rules that root a partial symbol (where the attribute does real
work): the two #powByteLen rules (no-evaluators), lookup-as-asWord and
widthOpCode-concat-{left,right} (partial BYTES.get `_[_]`).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per review: N need not be syntactically concrete -- once N is concrete the `requires CONST <Int 2^Int(8*N)` reduces to a linear inequality, and if N is symbolic the requires is undischargeable so the rule simply does not fire. (It does not subsume `SIZE <Int #powByteLen(SIZE)`, which is the diagonal same-variable case.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per review: bytes-simplification.k already provides bytes-concat-lookup-{left,
right}, so these were only a workaround for booster not descending into
#widthOpCode(...) to apply them. Remove rather than duplicate per-function; if
the non-descent persists it should be addressed in the backend.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…used bool-eq #Equals rules Whole-prove-suite rule-firing measurement (booster Simplify + kore SimplifyKore logging over functional/benchmarks/erc20/examples/kontrol/mcd) shows the eight bool-eq-* ML-#Equals distribution rules are never applied by the booster: five fire in no engine at all, and three (bool-eq-not-true-l/r, bool-eq-and-true-l) fire only in the kore-rpc implies/subsumption path. Since this PR targets booster performance and these proofs already discharge those goals under kore without the rules, all eight are removed. The bool-*-self rules are kept (they fire in the booster). The bool-eq-not-false coverage claim, which was discharged by bool-eq-not-true-l rather than by the not-false rules, is removed with them. Verified: lemmas-no-smt-spec proves all 22 remaining claims under --use-booster. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o fix DSS/MCD implies failures The new tier1 simplification lemmas simplify away the keccak path-condition constraint during EVM execution, so the final subsumption check can no longer prove `keccak(_) =/=Int smallInt` from the path condition alone. Encoding the rule as an SMT lemma lets Z3 discharge it independently of the path condition. This matches the treatment already present in mcd-structured/verification.k, whose sibling proofs pass; the plain mcd suite lacked the annotation. Fixes the heal/fold/frob-diff-zero-dart DSS proofs and the mcd flopper-tick/end-pack rule proofs. Verified mcd/end-pack-pass-rough-spec.k passes with the change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…(review)" This reverts commit 6cd5f0d.
…under booster-dev
Verified by running the booster-dev binary over the full skip-list on this branch:
mcd-structured/dsvalue-peek-pass-rough now passes under tier1's lemmas (failed on
master), and 4 entries (functional/bitwise-{asword,simplifications},
mcd[-structured]/dsvalue-read-pass-summarize) were stale (already passing on master).
bitwise-mask-shift-spec.k is intentionally kept: it regresses under booster-dev on
this branch.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(i) Performance changesCaveat: no clean perf benchmark was run. Numbers below are recover-mode wall-clock under concurrent np (≈2× RPC, no caching, cross-run contention) — directionally fine, not authoritative. The dedicated isolated normal-mode methodology (CLAUDE.md "Comparing haskell-backend builds") is the right tool for real perf and is a recommended follow-up.
*The +23% rules wall-clock is not a slowdown: tier1 completes 14 more proofs that master fails/crashes out of quickly. The fair per-spec view (both-passed specs) is +3%, i.e. flat within contention noise. Per-spec movers are mixed and small (largest: Takeaway: no evidence of a performance regression; a proper isolated perf run is needed to claim any speedup. |
(ii) Fallback changes (the headline — clear reduction)
The dominant, consistent effect is a large reduction in Even stronger than the counts — recover-mode provability (does the proof close booster-first at all):
14 rules specs that master can only close with Kore now close booster-side on tier1; 0 regressions. (The 4 fail-on-both are recover-mode limitations — see anomalies.) |
(iii) Booster-dev changes (skip-list)Verified by running the standalone Removed from the skip-list on this branch (5; all pass
1 removal is tier1's doing; 4 are stale entries the list had drifted on (already passing on master). Skip-list: 137 → 132 lines. |
(iv) Anomalies / non-passing proofs
|
…s-summarize specs failing booster-dev in CI PR #2859 CI 'Proofs: Rules (booster-dev)' failed on mcd/dsvalue-read-pass-summarize and mcd-structured/dsvalue-read-pass-summarize (SystemExit: 1); local verification diverged from CI. Re-add them; the other three drops (mcd-structured/dsvalue-peek-pass-rough and functional/bitwise-{asword,simplifications}) pass in CI and stay removed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Overall, these lemmas alone are a mild performance hit (~6%). But coupled with runtimeverification/haskell-backend#4153 with the optional local simplification recursion set to 20, become performance neutral overall (and performance positive in many simplification-heavy cases). These lemmas alone eliminate many fallbacks to Kore though, so this gets us closer ot our goal of eliminating the Kore backend altogether. |
…--equation-max-local-steps) (#4153) This PR generalizes booster's equation evaluation strategy. Today, when an equation rewrites a subterm during the bottom-up simplification traversal, re-normalization happens by restarting whole-term passes from the top, so a rewrite chain advances one step per pass. This PR adds a per-pass budget, `--equation-max-local-steps N`: after a rewrite, the result is re-simplified in place (LLVM evaluation plus re-entering the bottom-up traversal) for up to N steps per pass before deferring back to the global restart loop. N=0 — the default — is exactly the existing restart-only behavior as the degenerate case of the same code path, so this changes nothing out of the box; the knob lets users tune how deep in-place simplification chains may run. **Changes:** - New server option `--equation-max-local-steps` (default 0, identical to current behavior); the budget resets every global pass. - `Booster.Pattern.ApplyEquations`: on a rewrite in the bottom-up pass, `localFixpointEval` re-enters the cached traversal in place, counting re-entries against the budget. - Termination stays fully bounded: at most `--equation-max-iterations` passes times (budget + 1) applications per chain, with two-layer loop detection (per-step chain check inside the budget, whole-term snapshots across passes, which catch cycles longer than the budget). - Unit tests pin the default-0 boundary (identical to the old iteration-limit behavior) and the budgeted behaviors (deeper chains, per-step loop detection, combined bound) in an explicit options window. **Validation:** - At the default, the unit suite and rpc-integration tests are unchanged, including the `test-log-simplify-json` log-trace golden — the default is trace-identical to master. - KEVM dose-response sweep (15 biggest-moving specs, N ∈ {0,1,2,3,4,5,20,100,1000}, all specs passing in every configuration): budgets 1–3 give about −30% total wall-clock (1530 s at N=0 → 1042 s at N=2), with per-spec wins of −30…−50% saturating at N=1–2. - The sweep also shows why this is a knob and not yet a new default: at N≥4 one spec family (`ecrecoverloop02`) regresses +40–50%, because an in-place chain completes a second nested memory write before the restart-pass concat rules flatten the first, and booster hands the resulting shape to Kore (Kore-side equation applications jump 2 980 → 26 477). The regression is an indicator of missing simplification lemmas / `preserves-definedness` annotations, not of the strategy. - Re-running the sweep with an extended lemma set that fills those gaps removes the cliff entirely: improvement becomes monotone in the budget, with the best total at N=20 (−35% vs N=0). Once the lemma set keeps simplification in-house, re-simplifying at the same subterm until done is a net performance win. This is the extended lemmas set: runtimeverification/evm-semantics#2859. On that PR, without this change KEVM has some proofs that have major regressions. Coupled with this PR, and setting N=20, we get only performance wins there. --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
This PR adds a curated first tier of simplification lemmas, plus the supporting attribute annotations, so that common arithmetic, Bool, bytes, and
#asWordgoals discharge directly in the Haskell booster instead of falling back to kore-rpc. Each new rule is paired with a focused coverage claim in the functional specs, and the change set includes the SMT support needed to keep the MCD/DSS proofs passing once the new lemmas simplify their keccak side-conditions away.Changes:
int-simplification.k,evm-int-simplification.k,bytes-simplification.k,lemmas.k, andbuf.md:==Intcommutativity normalization (concrete argument on the left), arithmetic and Bool tautologies,A <=Int A +Int Bshort-circuit when0 <=Int B, concrete bytes-concat identities,#asWordcomparison-to-false cases,minInt/maxIntbase comparisons,eq-false-lt,#widthOpCodebounds, and a concrete#powByteLencomparison rule.#widthOpCode[total, smtlib]and#asAccount[total]; add[preserves-definedness]to the rules that preserve definedness.tests/specs/mcd/verification.k: mark the keccak disequality rule[smt-lemma]so Z3 can dischargekeccak(_) =/=Int <small constant>independently of the path condition. This is required once the new lemmas simplify that constraint away, and fixes theheal/fold/frobDSS proofs and theflopper-tick/end-pack/flopper-dentrules proofs.lemmas-no-smt-spec.k,int-simplifications-spec.k, andabi-spec.k.Validation:
[smt-lemma]itself adds ~1%.