Formal Evaluation Semantics for Aiur
Goal
Define an Aiur.eval function in Lean — a pure, definitional interpreter for Aiur
Terms — so that we can state and prove theorems about the behavior of Aiur programs.
This is not meant to be efficient; it is a specification against which the bytecode
compiler and the Rust runtime can be verified.
Overview
The core idea: given a Toplevel (the set of all function/datatype declarations) and
a Function, evaluating that function on concrete Value arguments produces a
Result containing:
- An output
Value (the return value).
- A
Prop-valued side condition accumulating all assertEq obligations.
This lets us state theorems like:
-- If `foo_fn` does `assert_eq!(x, 0); 2 * x`:
theorem foo_spec (top : Toplevel) (x : G) :
Aiur.eval top `foo_fn [.field x] = .ok (.field (2 * x)) (x = 0)
Core Definitions
Values
Value is the semantic domain — what Aiur terms evaluate to. It mirrors Data but
is fully reduced:
inductive Value where
| unit : Value
| field : G → Value
| tuple : Array Value → Value
| array : Array Value → Value
deriving Repr, BEq, Inhabited
Memory
Store/load operations require a heap model. A Memory is a simple map from addresses
to values:
structure Memory where
cells : Array Value
deriving Inhabited
def Memory.store (mem : Memory) (v : Value) : G × Memory :=
let addr := G.ofNat mem.cells.size
(addr, { cells := mem.cells.push v })
def Memory.load (mem : Memory) (addr : G) : Option Value :=
mem.cells[addr.val.toNat]?
Result and EvalState
Evaluation is partial (recursion may diverge, pattern matches may be non-exhaustive)
and stateful (memory, IO buffers). We track:
structure EvalState where
memory : Memory
ioBuffer : IOBuffer
fuel : Nat -- recursion bound
inductive Result where
| ok : Value → Prop → Result -- value + accumulated assertions
| fail : String → Result -- stuck / out of fuel / match failure
The Prop component in ok is the conjunction of all assertEq conditions
encountered during evaluation. When there are no assertions, this is True.
Environment
An Env maps local variables to values. Function lookup goes through the Toplevel:
abbrev Env := Std.HashMap Local Value
def lookupFunction (top : Toplevel) (name : Global) : Option Function :=
top.functions.find? fun f => f.name == name
The Evaluator
partial def eval (top : Toplevel) (env : Env) (state : EvalState) :
Term → EvalState × Result
The key cases:
| Term |
Semantics |
.var x |
Look up x in env |
.data (.field g) |
.ok (.field g) True |
.data (.tuple ts) |
Evaluate each element, combine |
.data (.array ts) |
Evaluate each element, combine |
.ret t |
Evaluate t (marks escaping to caller) |
.let pat val body |
Evaluate val, match against pat to extend env, evaluate body |
.match t branches |
Evaluate t, try each (pat, body) in order |
.app name args |
Evaluate args, look up function, substitute into body, recurse (decrement fuel) |
.add a b |
Evaluate both to fields, return field addition (mod gSize) |
.sub a b |
Field subtraction |
.mul a b |
Field multiplication |
.eqZero a |
Evaluate a to .field g, return .field (if g == 0 then 1 else 0) |
.assertEq a b ret |
Evaluate a, b, evaluate ret, return ret's value with (a_val = b_val) ∧ ret_prop |
.store t |
Evaluate t, push to memory, return pointer field |
.load t |
Evaluate t to a field (address), look up in memory |
.proj t i |
Evaluate t to a tuple, project component i |
.get t i |
Evaluate t to an array, index at i |
.slice t i j |
Evaluate t to an array, extract [i, j) |
.set t i v |
Evaluate t to an array, evaluate v, replace at i |
For function calls, fuel is decremented. If fuel reaches 0, the result is
.fail "out of fuel". This makes eval total.
Pattern Matching
def matchPattern (pat : Pattern) (v : Value) : Option (List (Local × Value))
This attempts to destructure v according to pat, returning bindings on success.
For Pattern.field g, it checks that v = .field g. For Pattern.var x, it always
succeeds with [(x, v)]. For constructors (Pattern.ref), it checks the tag and
recursively matches sub-patterns.
assertEq Accumulation
The crucial design point: assertEq does not fail evaluation. Instead, the equality
is accumulated as a Prop in the result:
| .assertEq a b ret => do
let (state, a_val, a_prop) ← evalValue top env state a
let (state, b_val, b_prop) ← evalValue top env state b
let (state, ret_val, ret_prop) ← eval top env state ret
let assertion := (a_val = b_val)
(state, .ok ret_val (a_prop ∧ b_prop ∧ assertion ∧ ret_prop))
This mirrors how the circuit works: assertEq is a constraint, not a runtime check.
The evaluator produces the output regardless, and the proposition captures what must
hold for the execution to be valid.
Flattening to Array G
To connect eval to the existing test infrastructure (which works with Array G),
define:
def Value.flatten : Value → Array G
| .unit => #[]
| .field g => #[g]
| .tuple vs => vs.foldl (init := #[]) fun acc v => acc ++ v.flatten
| .array vs => vs.foldl (init := #[]) fun acc v => acc ++ v.flatten
And a top-level entry point:
def Aiur.run (top : Toplevel) (funcName : Global) (args : Array G)
(fuel : Nat := 1000) : Option (Array G × Prop) :=
let func ← lookupFunction top funcName
let env := Env.ofList (func.inputs.zip (unflattenArgs func.inputs args))
let state := { memory := default, ioBuffer := default, fuel }
match eval top env state func.body with
| (_, .ok val prop) => some (val.flatten, prop)
| _ => none
Stating Theorems
With this setup, we can write specifications like:
-- A function that asserts its input is zero and doubles it
-- fn foo(x: G) -> G { assert_eq!(x, 0); x + x }
theorem foo_correct (top : Toplevel) (x : G)
(h_func : lookupFunction top `foo = some foo_fn) :
Aiur.run top `foo #[x] = some (#[x + x], x = (0 : G)) := by
...
-- A pure function with no assertions
-- fn sum(x: G, y: G) -> G { x + y }
theorem sum_correct (top : Toplevel) (x y : G)
(h_func : lookupFunction top `sum = some sum_fn) :
Aiur.run top `sum #[x, y] = some (#[x + y], True) := by
...
-- Conditional: fn abs_or_zero(x: G) -> G { match eq_zero(x) { 1 => 0, _ => x } }
theorem abs_or_zero_spec (top : Toplevel) (x : G) ... :
Aiur.run top `abs_or_zero #[x] =
if x = 0 then some (#[(0 : G)], True) else some (#[x], True) := by
...
Key Design Decisions
1. Prop-valued assertions, not runtime failure
assertEq accumulates a Prop rather than causing eval to fail. This is essential
because Aiur is a circuit language: the prover always produces an output, and the
constraints determine whether the output is valid. Separating the output from the
validity condition is what makes formal verification useful — you can reason about what
the circuit computes independently of whether the constraints are satisfied.
2. Fuel-based termination
Using a fuel : Nat parameter makes eval structurally recursive (total). Theorems
are stated with "sufficient fuel" hypotheses, or we prove that certain inputs require
bounded recursion.
3. Operating on Term (pre-simplification)
eval works on Term, the surface-level IR, not on TypedTerm or Bytecode. This
means it can be used to specify the semantics of Aiur programs as written. A separate
correctness theorem would relate eval on Term to execution of compiled Bytecode.
4. Memory as an array
The memory model is a simple append-only array. store appends and returns the index
as a G. load indexes into the array. This is adequate because Aiur memory is
append-only (no mutation of existing cells).
Verification Roadmap
Phase 1: Definitional interpreter (Aiur.eval)
- Define
Value, Memory, EvalState, Result, Env.
- Implement
eval, matchPattern, Value.flatten.
- Implement
Aiur.run as the top-level entry point.
Phase 2: Basic specifications
- State and prove specs for simple functions:
id, sum, prod.
- State and prove specs involving
assertEq.
- State and prove specs involving
match on field values.
Phase 3: Data structure operations
- Specs for
proj, get, slice, set.
- Specs for tuple/array construction and destructuring.
- Specs involving
store/load.
Phase 4: Recursion and datatypes
- Specs for recursive functions (
factorial, fibonacci) with fuel analysis.
- Specs for enum construction and pattern matching (
Shape, Nat).
- Mutual recursion (
even/odd).
Phase 5: Compiler correctness
- Define a relation between
eval on Term and Bytecode.Toplevel.execute.
- Prove that compilation preserves semantics: if
eval produces (val, prop),
then execute on the compiled bytecode produces val.flatten (assuming prop
holds).
Phase 6: Byte and u32 operations
- Specs for
u8_add, u8_xor, u8_shift_left, u8_bit_decomposition, etc.
- Specs for
u32_less_than.
- These require modeling the Goldilocks field arithmetic precisely.
Relation to CSLib
The approach is analogous to how CSLib provides
foundational definitions for formalizing computer science in Lean. Where CSLib defines
evaluation semantics for lambda calculus and other foundational languages, we define
evaluation semantics for Aiur — a domain-specific circuit language. The key parallel
is: define a clean denotational/operational semantics first, then prove properties
against it, then (optionally) prove that concrete implementations refine the semantics.
Formal Evaluation Semantics for Aiur
Goal
Define an
Aiur.evalfunction in Lean — a pure, definitional interpreter for AiurTerms — so that we can state and prove theorems about the behavior of Aiur programs.This is not meant to be efficient; it is a specification against which the bytecode
compiler and the Rust runtime can be verified.
Overview
The core idea: given a
Toplevel(the set of all function/datatype declarations) anda
Function, evaluating that function on concreteValuearguments produces aResultcontaining:Value(the return value).Prop-valued side condition accumulating allassertEqobligations.This lets us state theorems like:
Core Definitions
Values
Valueis the semantic domain — what Aiur terms evaluate to. It mirrorsDatabutis fully reduced:
Memory
Store/load operations require a heap model. A
Memoryis a simple map from addressesto values:
Result and EvalState
Evaluation is partial (recursion may diverge, pattern matches may be non-exhaustive)
and stateful (memory, IO buffers). We track:
The
Propcomponent inokis the conjunction of allassertEqconditionsencountered during evaluation. When there are no assertions, this is
True.Environment
An
Envmaps local variables to values. Function lookup goes through theToplevel:The Evaluator
The key cases:
.var xxinenv.data (.field g).ok (.field g) True.data (.tuple ts).data (.array ts).ret tt(marks escaping to caller).let pat val bodyval, match againstpatto extendenv, evaluatebody.match t branchest, try each(pat, body)in order.app name args.add a bgSize).sub a b.mul a b.eqZero aato.field g, return.field (if g == 0 then 1 else 0).assertEq a b reta,b, evaluateret, returnret's value with(a_val = b_val) ∧ ret_prop.store tt, push to memory, return pointer field.load ttto a field (address), look up in memory.proj t itto a tuple, project componenti.get t itto an array, index ati.slice t i jtto an array, extract[i, j).set t i vtto an array, evaluatev, replace atiFor function calls,
fuelis decremented. If fuel reaches 0, the result is.fail "out of fuel". This makesevaltotal.Pattern Matching
This attempts to destructure
vaccording topat, returning bindings on success.For
Pattern.field g, it checks thatv = .field g. ForPattern.var x, it alwayssucceeds with
[(x, v)]. For constructors (Pattern.ref), it checks the tag andrecursively matches sub-patterns.
assertEq Accumulation
The crucial design point:
assertEqdoes not fail evaluation. Instead, the equalityis accumulated as a
Propin the result:This mirrors how the circuit works:
assertEqis a constraint, not a runtime check.The evaluator produces the output regardless, and the proposition captures what must
hold for the execution to be valid.
Flattening to
Array GTo connect
evalto the existing test infrastructure (which works withArray G),define:
And a top-level entry point:
Stating Theorems
With this setup, we can write specifications like:
Key Design Decisions
1. Prop-valued assertions, not runtime failure
assertEqaccumulates aProprather than causingevalto fail. This is essentialbecause Aiur is a circuit language: the prover always produces an output, and the
constraints determine whether the output is valid. Separating the output from the
validity condition is what makes formal verification useful — you can reason about what
the circuit computes independently of whether the constraints are satisfied.
2. Fuel-based termination
Using a
fuel : Natparameter makesevalstructurally recursive (total). Theoremsare stated with "sufficient fuel" hypotheses, or we prove that certain inputs require
bounded recursion.
3. Operating on
Term(pre-simplification)evalworks onTerm, the surface-level IR, not onTypedTermorBytecode. Thismeans it can be used to specify the semantics of Aiur programs as written. A separate
correctness theorem would relate
evalonTermto execution of compiledBytecode.4. Memory as an array
The memory model is a simple append-only array.
storeappends and returns the indexas a
G.loadindexes into the array. This is adequate because Aiur memory isappend-only (no mutation of existing cells).
Verification Roadmap
Phase 1: Definitional interpreter (
Aiur.eval)Value,Memory,EvalState,Result,Env.eval,matchPattern,Value.flatten.Aiur.runas the top-level entry point.Phase 2: Basic specifications
id,sum,prod.assertEq.matchon field values.Phase 3: Data structure operations
proj,get,slice,set.store/load.Phase 4: Recursion and datatypes
factorial,fibonacci) with fuel analysis.Shape,Nat).even/odd).Phase 5: Compiler correctness
evalonTermandBytecode.Toplevel.execute.evalproduces(val, prop),then
executeon the compiled bytecode producesval.flatten(assumingpropholds).
Phase 6: Byte and u32 operations
u8_add,u8_xor,u8_shift_left,u8_bit_decomposition, etc.u32_less_than.Relation to CSLib
The approach is analogous to how CSLib provides
foundational definitions for formalizing computer science in Lean. Where CSLib defines
evaluation semantics for lambda calculus and other foundational languages, we define
evaluation semantics for Aiur — a domain-specific circuit language. The key parallel
is: define a clean denotational/operational semantics first, then prove properties
against it, then (optionally) prove that concrete implementations refine the semantics.