From 68c115f853c9b5bfa531d5c6028b4537cc1a9dad Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sat, 23 May 2026 11:52:34 +0200 Subject: [PATCH 01/28] docs: add mir_autodiff INTERNALS Covers the forward-mode source-transformation AD pass: Unknown/Derivative types, KnownDerivatives/DerivativeIntern, LiveDerivatives three-phase dataflow analysis, subgraph sharing optimisation, DerivativeBuilder per-opcode rules, and a traced sin_second_order worked example. Co-Authored-By: Claude Sonnet 4.6 --- docs/mir_autodiff/INTERNALS.md | 289 +++++++++++++++++++++++++++++++++ 1 file changed, 289 insertions(+) create mode 100644 docs/mir_autodiff/INTERNALS.md diff --git a/docs/mir_autodiff/INTERNALS.md b/docs/mir_autodiff/INTERNALS.md new file mode 100644 index 00000000..232b4407 --- /dev/null +++ b/docs/mir_autodiff/INTERNALS.md @@ -0,0 +1,289 @@ +# `mir_autodiff` — Internals + +Automatic differentiation pass for OpenVAF's MIR. + +--- + +## 1. Purpose and Context + +Circuit simulators need the Jacobian of a device model's DAE system — partial derivatives of branch currents/voltages with respect to node potentials. OpenVAF computes these derivatives at compile time by transforming the MIR itself, not by differentiating LLVM IR. + +Working at the MIR level has two advantages: +- The resulting derivative instructions are in the same SSA form as the primal instructions, so every subsequent `mir_opt` pass (constant propagation, GVN, DCE, inst-combine) applies to derivatives automatically. +- The transformation is portable: it runs before any target-specific codegen. + +The technique is **forward-mode source transformation**. For each `Unknown` variable (a node voltage or current that the simulator may probe), the pass inserts instructions that compute `d(value)/d(unknown)` alongside each existing instruction. Higher-order and mixed partial derivatives are handled by composing multiple rounds of differentiation. + +--- + +## 2. 
Module Map + +| File | Role | +|---|---| +| `lib.rs` | Public entry point `auto_diff()`; `zero_derivative()` helper | +| `intern.rs` | `Derivative` / `DerivativeInfo` types; `DerivativeIntern` table | +| `live_derivatives.rs` | `LiveDerivatives` dataflow analysis; `ChainRule` type | +| `postorder.rs` | Reverse-postorder block iterator used by the builder | +| `builder.rs` | `DerivativeBuilder` — inserts derivative instructions into the function | +| `subgraph.rs` | `SubGraphExplorer` — subgraph sharing optimisation | + +--- + +## 3. Entry Point + +```rust +pub fn auto_diff( + mut func: impl AsMut<Function>, + dom_tree: &DominatorTree, + derivatives: &KnownDerivatives, + extra_derivatives: &[(Value, mir::Unknown)], +) -> AHashMap<(Value, mir::Unknown), Value> +``` + +`func` is the MIR `Function` to be transformed in place. `dom_tree` must be pre-computed. `derivatives` carries the set of `Unknown`s declared in the function and the `ddx()` call sites. `extra_derivatives` seeds additional `(value, unknown)` pairs that the caller needs but that are not implied by `ddx()` calls alone. + +The return value maps each `(primal_value, unknown)` pair to the SSA `Value` holding that derivative after the transformation. Callers in `sim_back` use this map to wire Jacobian entries. + +--- + +## 4. `Unknown` and `Derivative` Types + +### `Unknown` (defined in `mir`) + +```rust +pub struct Unknown(pub u32); +``` + +A newtype index identifying one differentiation variable — typically a node voltage or branch current. `Unknown`s are interned into the `unknowns` table inside `KnownDerivatives`; the associated `Value` is the SSA value that represents the variable in the primal computation. + +### `Derivative` (defined in `intern.rs`) + +```rust +struct Derivative(u32); + +struct DerivativeInfo { + previous_order: Option<Derivative>, + base: Unknown, +} +``` + +`Derivative` is an index into a `TiSet` table owned by `DerivativeIntern`. 
Each entry records: +- `base` — the `Unknown` differentiated at this level. +- `previous_order` — if `Some(d)`, this is a higher-order derivative built on top of `d`. + +This forms a linked list encoding mixed partial derivatives. For example: +- First-order ∂/∂x: `DerivativeInfo { base: x, previous_order: None }` +- Second-order ∂²/∂x²: `DerivativeInfo { base: x, previous_order: Some(d_x) }` +- Mixed ∂²/∂x∂y: `DerivativeInfo { base: y, previous_order: Some(d_x) }` + +New derivatives are created with `raise_order(base)` (adds one level over an existing `Derivative`) or `raise_order_with(unknown)` (starts a fresh first-order derivative). + +--- + +## 5. `KnownDerivatives` and `DerivativeIntern` + +### `KnownDerivatives` (defined in `mir`) + +```rust +pub struct KnownDerivatives { + pub unknowns: TiSet<Unknown, Value>, + pub ddx_calls: AHashMap<FuncRef, (HybridBitSet<Unknown>, HybridBitSet<Unknown>)>, +} +``` + +`unknowns` maps each `Unknown` index to the SSA `Value` that holds the probed quantity. `ddx_calls` maps each `ddx()` `FuncRef` to a pair of bitsets `(first_order, higher_order)` specifying which `Unknown`s that particular `ddx()` call differentiates. + +### `DerivativeIntern` (defined in `intern.rs`) + +```rust +struct DerivativeIntern<'a> { + unknowns: TiSet<Unknown, Value>, + ddx_calls: &'a AHashMap<FuncRef, (HybridBitSet<Unknown>, HybridBitSet<Unknown>)>, + derivatives: TiSet<Derivative, DerivativeInfo>, + buf: Vec, +} +``` + +`DerivativeIntern` is the working intern table used throughout the pass. It wraps the data from `KnownDerivatives` and adds the `derivatives` table (which grows as higher-order derivatives are requested) plus a scratch `buf`. It is constructed once at the start of `auto_diff` and passed through the liveness and builder phases. + +--- + +## 6. Phase 1 — `LiveDerivatives` + +Before inserting any instructions, the pass must determine which `(instruction, derivative)` pairs are actually needed. This avoids computing derivatives of dead values. 
+ +```rust +pub(crate) struct LiveDerivatives { + pub mat: SparseBitMatrix, + pub conversions: AHashMap>, +} +``` + +`mat` is a sparse bit matrix: `mat[inst]` is the set of `Derivative`s whose value at the output of `inst` will be consumed downstream. `conversions` records chain-rule steps that must be performed at specific instructions (used when subgraph optimisation introduces synthetic unknowns). + +`LiveDerivatives::build()` runs in three steps, then calls the subgraph optimiser: + +**Step 1 — `populate_reachable_unknowns`** +Seeds the matrix. For each `ddx()` call instruction, marks the requested derivative as live at that instruction. For `OptBarrier` instructions, marks all derivatives that pass through the barrier as live. + +**Step 2 — `live_derivative_fixpoint`** +Propagates liveness backwards through the data-flow graph. If derivative `d` is live at an instruction `i`, then for each input operand `v` of `i`, the derivative of `v` is marked live at the defining instruction of `v`. This iterates until no new bits are set. For `PhiNode` instructions, liveness flows into all predecessor definitions. + +**Step 3 — `strip_unneeded_derivatives`** +Removes derivatives that became live only as intermediate results but whose final consumers were already eliminated by a previous optimisation. This keeps the matrix tight. + +After the three steps, `run_subgraph_opt()` is called to find instructions differentiated with respect to multiple unknowns; it may add new synthetic `Unknown`s and entries in `conversions`. + +--- + +## 7. Subgraph Optimisation + +```rust +struct SubGraphExplorer { ... } +``` + +When two derivatives share a common sub-expression, computing them independently duplicates work. `SubGraphExplorer` finds "subgraphs" — contiguous sets of instructions differentiated with respect to multiple unknowns — and replaces them with a single computation for a synthetic unknown, followed by chain-rule multiplications. 
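
As a rough scalar illustration of the sharing idea (not the actual `SubGraphExplorer` code), consider a chain `h = sin(exp(g))` whose input `g = x * y` depends on two unknowns. The sketch below compares pushing each unknown through the chain separately with differentiating the chain once with respect to a synthetic unknown `s = g` and then applying one chain-rule multiplication per real unknown. The function names `dh_dg`, `naive`, and `shared` are hypothetical and exist only for this example.

```rust
/// Derivative of the shared chain h(g) = sin(exp(g)) with respect to g.
/// This stands in for the "subgraph" that both unknowns flow through.
fn dh_dg(g: f64) -> f64 {
    let e = g.exp();
    e.cos() * e // d/dg sin(exp(g)) = cos(exp(g)) * exp(g)
}

/// Naive scheme: the chain is differentiated once per unknown, so the
/// exp/cos work is repeated for x and for y.
fn naive(x: f64, y: f64) -> (f64, f64) {
    let g = x * y;
    let e_x = g.exp();
    let dh_dx = e_x.cos() * (e_x * y); // tangent of g w.r.t. x is y
    let e_y = g.exp();
    let dh_dy = e_y.cos() * (e_y * x); // tangent of g w.r.t. y is x
    (dh_dx, dh_dy)
}

/// Shared scheme: differentiate the chain once w.r.t. a synthetic unknown
/// s = g, then apply one chain-rule multiplication per real unknown.
fn shared(x: f64, y: f64) -> (f64, f64) {
    let g = x * y;
    let dh_ds = dh_dg(g);
    (dh_ds * y, dh_ds * x) // dg/dx = y, dg/dy = x
}
```

Both schemes give the same derivatives up to floating-point rounding; the shared one trades the duplicated chain for two extra multiplications, which is the overhead the cost criterion has to weigh.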
+ +The cost criterion checked before committing to a subgraph: + +```rust +saved_insts > extra_inst_approx + 4 + && (saved - extra) * 100 / num_insts > 15 +``` + +Both conditions must hold: the absolute saving must exceed the overhead by more than 4 instructions, and the relative gain must exceed 15 % of the subgraph size. This avoids regressing small subgraphs where the chain-rule overhead dominates. + +When a subgraph is accepted, `SubGraphExplorer` adds synthetic `Unknown`s to `DerivativeIntern` and records the chain-rule multiplications in `LiveDerivatives::conversions`. + +--- + +## 8. Phase 2 — `DerivativeBuilder` + +`build_derivatives()` is the public entry point within the builder module. It constructs a `DerivativeBuilder` and walks the function's blocks in reverse postorder (dominators before uses). + +```rust +struct DerivativeBuilder<'a, 'b> { ... } +``` + +**Seeding.** Before the walk begins, for each `Unknown` `u` whose associated `Value` is `x`, the seed `(x, u) → F_ONE` is inserted into the derivative map (the derivative of a variable with respect to itself is 1). + +**Per-instruction processing.** For each instruction, `inst_derivative()` is called for every `Derivative` marked live at that instruction in the `LiveDerivatives` matrix. It emits the appropriate derivative instruction(s) into the function and records the resulting `Value` in the map. + +**Cache mechanism.** `inst_cache()` is called before differentiation of an instruction. It pre-computes sub-expressions that multiple derivative rules reuse. For example, for `Fdiv(a, b)`, the cache stores `b * b` so that the quotient rule `(a' * b - a * b') / b²` only computes the denominator once across all differentiated operands. + +**Cyclic phi fix-up.** Loops introduce `PhiNode` instructions that are cyclic in the data-flow graph. 
The builder uses a two-pass approach: in the first pass it inserts placeholder `PhiNode`s for the derivative of each loop variable; in the second pass it fills in the correct operands once the derivative of the loop body is known. + +--- + +## 9. Differentiation Rules + +### Zero-derivative set + +`zero_derivative()` returns `true` for opcodes whose output is always constant with respect to any unknown. These instructions are skipped entirely: + +- Integer arithmetic and comparison opcodes (`Iadd`, `Isub`, `Imul`, etc.) +- Boolean logic (`BAnd`, `BOr`, `BXor`, `BNot`) +- Type conversions from integer to float that act as constants (`Iconst`, `Uicast`, `Iicast`) +- Control-flow affecting opcodes (`Jump`, `Branch`, `Exit`, `Phi` over integer types) +- `Iabs`, `IsNeg`, `IsZero`, `IsInf` + +### Per-opcode rules + +| Opcode | Primal | Derivative rule | +|---|---|---| +| `Fadd(a, b)` | a + b | a' + b' | +| `Fsub(a, b)` | a − b | a' − b' | +| `Fmul(a, b)` | a · b | a' · b + a · b' | +| `Fdiv(a, b)` | a / b | (a' · b − a · b') / b² | +| `Fneg(a)` | −a | −a' | +| `Fabs(a)` | \|a\| | a' · sign(a) | +| `Sqrt(a)` | √a | a' / (2 · √a) | +| `Exp(a)` | eᵃ | a' · eᵃ | +| `Ln(a)` | ln a | a' / a | +| `Log(a)` | log₁₀ a | a' / (a · ln 10) | +| `Sin(a)` | sin a | a' · cos a | +| `Cos(a)` | cos a | −a' · sin a | +| `Tan(a)` | tan a | a' / cos²(a) | +| `Hypot(a, b)` | √(a²+b²) | (a·a' + b·b') / √(a²+b²) | +| `Atan2(a, b)` | atan(a/b) | (a'·b − a·b') / (a²+b²) | +| `Pow(a, b)` | aᵇ | b·aᵇ⁻¹·a' + aᵇ·ln(a)·b' | +| `PhiNode` | φ(preds) | φ(pred derivatives) | + +### Special case: `Pow` with `base == 0` + +The term `aᵇ·ln(a)` is undefined when `a = 0` and `b` involves unknowns. The builder inserts a guard block: + +``` +if a == 0.0: + d/du (Pow) = 0.0 ; ln(0) term vanishes, aᵇ⁻¹ term handled by limit +else: + d/du (Pow) = b*a^(b-1)*a' + a^b * ln(a) * b' +``` + +The two paths are joined with a `PhiNode` in the merge block. 
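
The guard can be sketched on plain `f64` scalars, mirroring the pseudo-code above rather than the MIR blocks the builder actually emits. The function names are hypothetical; `da` and `db` stand for the operand derivatives `a'` and `b'`. The unguarded variant is included only to show what the guard protects against.

```rust
/// Derivative of a^b given the operand derivatives da = a' and db = b'.
/// Follows the guarded pseudo-code: a zero base short-circuits to 0.0.
fn pow_derivative(a: f64, b: f64, da: f64, db: f64) -> f64 {
    if a == 0.0 {
        // ln(0) is -inf, so the general formula would produce 0 * -inf = NaN.
        0.0
    } else {
        b * a.powf(b - 1.0) * da + a.powf(b) * a.ln() * db
    }
}

/// The same rule without the guard (illustration only).
fn pow_derivative_unguarded(a: f64, b: f64, da: f64, db: f64) -> f64 {
    b * a.powf(b - 1.0) * da + a.powf(b) * a.ln() * db
}
```

With `a = 0.0`, `b = 2.0`, and non-zero `b'`, the unguarded form evaluates `0.0 * ln(0.0)`, which is NaN in IEEE 754 arithmetic, while the guarded form returns `0.0`.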
+ +### Special case: cyclic `PhiNode` + +For a loop with a loop-carried variable `x_next = phi(x_entry, x_body)`, the derivative is: + +``` +x_next' = phi(x_entry', x_body') +``` + +The entry derivative is known (often 0 or 1 from the seed); the body derivative depends on the loop computation. The builder emits a placeholder `PhiNode` for `x_next'`, processes the loop body to derive `x_body'`, then patches the placeholder's operands. + +--- + +## 10. Worked Example — `sin_second_order` + +This example traces the `sin_second_order` test in `builder/tests.rs` end-to-end. + +### Primal MIR (simplified) + +``` +v12 = fmul v10, v11 ; v10 is Unknown(0), v11 is a parameter +v13 = sin v12 +v14 = call ddx(v13) ; first-order ∂(v13)/∂v10 +v15 = call ddx(v14) ; second-order ∂²(v13)/∂v10² +v100 = optbarrier v15 +``` + +`ddx_calls` tells the pass that both `call ddx` instructions differentiate with respect to `Unknown(0)` (associated with `v10`). + +### Liveness analysis + +Working backwards from `v100`: +- `optbarrier v15` → `v15` is live for derivative `d_x` (first order). +- `call ddx(v14)` consuming `v14` → `v14` is live for `d_x`. +- `call ddx(v13)` consuming `v13` → `v13` is live for `d_x`. +- `sin v12` → `v12` is live for `d_x` (needed for `cos v12`). +- `fmul v10, v11` → `v10` is the unknown itself; seed covers this. + +For the second-order request, `v14 = call ddx(v13)` is itself a `ddx()` instruction, so the liveness for the outer `ddx` seeds `d_x` liveness at `v14`'s definition, which then propagates through `v13` again — this time with derivative `d_x²` (second order). 
+ +### Instruction insertion + +The builder processes blocks in reverse postorder and emits (in the same block, after the primal instructions): + +``` +; First-order ∂(sin(v10·v11))/∂v10: +v101 = cos v12 ; cos(v10·v11) +v102 = fmul v11, v101 ; v11 · cos(v10·v11) [= ∂(v13)/∂v10] + +; Second-order ∂²(sin(v10·v11))/∂v10²: +v103 = sin v12 ; sin(v10·v11) +v104 = fneg v103 ; −sin(v10·v11) +v105 = fmul v11, v104 ; v11 · (−sin(v10·v11)) +v106 = fmul v105, v11 ; v11² · (−sin(v10·v11)) [= ∂²(v13)/∂v10²] +``` + +The `optbarrier` operand is updated: `v100 = optbarrier v106`. + +### Mathematical verification + +Let `u = v10·v11`. Then: +- `v13 = sin(u)` +- `∂v13/∂v10 = cos(u) · v11` ✓ (`v102`) +- `∂²v13/∂v10² = −sin(u) · v11²` ✓ (`v106`) + +Note that `cos v12` and `sin v12` are both emitted separately: the first-order rule for `sin` needs `cos`, and the second-order rule for `cos` needs `−sin`. `mir_opt` (GVN) will deduplicate these after the pass if the same value is already live. From a2bd0a5959b6a79cb71d5e3af8333b0a734d86dc Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sat, 23 May 2026 11:57:51 +0200 Subject: [PATCH 02/28] docs: add hir_lower INTERNALS MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Covers the HIR→MIR lowering pipeline: three-layer architecture (MirBuilder/LoweringCtx/BodyLoweringCtx), HirInterner reverse-mapping table, ParamKind/PlaceKind/CallBackKind enums, SSA construction, statement and expression lowering rules, and a resistor_va worked example tracing V(a,b)<+... to SSA instructions. 
Co-Authored-By: Claude Sonnet 4.6 --- docs/hir_lower/INTERNALS.md | 390 ++++++++++++++++++++++++++++++++++++ 1 file changed, 390 insertions(+) create mode 100644 docs/hir_lower/INTERNALS.md diff --git a/docs/hir_lower/INTERNALS.md b/docs/hir_lower/INTERNALS.md new file mode 100644 index 00000000..dac3fef2 --- /dev/null +++ b/docs/hir_lower/INTERNALS.md @@ -0,0 +1,390 @@ +# `hir_lower` — Internals + +HIR → MIR lowering: converts typed HIR constructs into SSA-form MIR. + +--- + +## 1. Purpose and Position + +`hir_lower` is the bridge between the semantic frontend (`hir`, `hir_def`, `hir_ty`) and the middle-end IR (`mir`). Its input is a `hir::Module` — a fully resolved, type-checked Verilog-A module — together with a `CompilationDB` Salsa database. Its output is a pair `(mir::Function, HirInterner)`. + +The `Function` is a complete SSA-form MIR function encoding the module's analog behavior: voltages and currents as function parameters, contributions as accumulated SSA values, and runtime service calls (noise, `$limit`, `ddt`) as opaque MIR `call` instructions. + +The `HirInterner` is the reverse-mapping table: given an SSA `Value` or `FuncRef` in the `Function`, it tells you which HIR concept it corresponds to. Downstream crates (`sim_back`, `mir_autodiff`) need this to interpret the MIR without importing the full HIR. + +--- + +## 2. 
Module Map + +| File | Role | +|---|---| +| `lib.rs` | `MirBuilder` (public API), `HirInterner`, and all key enums: `ParamKind`, `PlaceKind`, `CurrentKind`, `IdtKind`, `ImplicitEquationKind`, `LimitState`, `ImplicitEquation` | +| `ctx.rs` | `LoweringCtx` — wraps `FunctionBuilder` and `&mut HirInterner`; manages places, params, and callbacks | +| `body.rs` | `BodyLoweringCtx` — thin wrapper adding a `BodyRef` (HIR statement/expression body) to `LoweringCtx` | +| `stmt.rs` | `lower_stmt` dispatch: assignment, contribution, control flow, loops, case | +| `expr.rs` | `lower_expr` dispatch: literals, reads, binary/unary ops, builtins, nature access, `$limit` | +| `callbacks.rs` | `CallBackKind` enum and `FunctionSignature` generation for runtime service calls | +| `parameters.rs` | `HirInterner::insert_param_init` — builds the parameter-validation MIR (bounds checking, default values) | +| `state.rs` | `HirInterner::insert_var_init` — patches hidden-state `Param`s with variable initializer expressions | +| `dimensions.rs` | Array dimension helpers | +| `fmt.rs` | `DisplayKind` and `FmtArg` — support types for `$display`/`$debug` format string lowering | + +--- + +## 3. `MirBuilder` + +`MirBuilder` is the public entry point. It follows a builder pattern: callers set options on it, then call `build()`. + +```rust +pub struct MirBuilder<'a> { + db: &'a CompilationDB, + module: Module, + is_output: &'a dyn Fn(PlaceKind) -> bool, + required_vars: &'a mut dyn Iterator, + tagged_reads: AHashSet, + tag_writes: bool, + ctx: Option<&'a mut FunctionBuilderContext>, + lower_equations: bool, +} +``` + +The key options: +- `is_output` — a predicate the caller supplies to say which `PlaceKind`s should appear in `outputs` of the `HirInterner`. Typically `PlaceKind::Contribute { .. } | PlaceKind::ImplicitResidual { .. }`. +- `required_vars` — variables that must have their `Place`s declared even if they are never written in the analog block (needed for `HirInterner::insert_var_init`). 
+- `lower_equations` — when `false` the `no_equations` flag is set, suppressing contribution accumulation and noise/limit lowering (used for parameter-setup functions). +- `tag_writes` / `tagged_reads` — mark specific variable reads/writes with `OptBarrier`s so `sim_back` can identify them as observable outputs. + +`build()` orchestrates the full lowering: + +1. Creates an empty `Function` and `HirInterner`. +2. Wraps the `Function` in a `FunctionBuilder` (Cranelift-style SSA builder from `mir_build`). +3. Constructs `LoweringCtx` over the builder and interner. +4. Calls `body_ctx.lower_entry_stmts()` twice — first for the `analog initial` block, then for the `analog` block. The `analog initial` block runs once at simulation setup; the `analog` block runs each Newton iteration. +5. Declares places for any `required_vars` via `dec_place`. +6. Iterates all declared places; for those where `is_output` returns `true`, reads the current SSA value, wraps it in `ensure_optbarrier`, and stores the result in `interner.outputs`. +7. Emits `ret()` and finalizes the SSA construction. + +--- + +## 4. `HirInterner` + +```rust +pub struct HirInterner { + pub outputs: IndexMap, ahash::RandomState>, + pub params: TiMap, + pub callbacks: TiSet, + pub callback_uses: TiVec>, + pub tagged_reads: IndexMap, + pub implicit_equations: TiVec, + pub lim_state: TiMap>, +} +``` + +`outputs` — maps each output `PlaceKind` to the SSA `Value` holding its final accumulated value (wrapped in `OptBarrier`), or `None` if the place was not written. + +`params` — the central HIR↔MIR parameter table. Every MIR `Param` (a function input) has a corresponding `ParamKind` entry here. Downstream crates look up `ParamKind` to know what each `Param` means at the circuit level. + +`callbacks` and `callback_uses` — the set of runtime-service `FuncRef`s used in the `Function`. `callback_uses` records every call site for each callback; `sim_back` uses this to identify `ddt` and noise calls. 
+ +`tagged_reads` — maps `OptBarrier`-wrapped values back to the `Variable` they were read from. Used by `sim_back` to locate variable probe points. + +`implicit_equations` — each `ImplicitEquation` index maps to an `ImplicitEquationKind` (`Ddt`, `NoiseSrc`, or one of the `Idt` variants), recording why an implicit node was introduced. + +`lim_state` — records each `$limit` call site: maps the probe `Value` to a list of `(stored_value, negated)` pairs that `sim_back` uses to thread the limiting iteration. + +### `unknowns()` + +`HirInterner::unknowns()` converts the interner into the `KnownDerivatives` struct that `mir_autodiff` consumes. It walks all `params` entries and assigns an `Unknown` index to each `Param` that the AD pass must differentiate: + +- `ParamKind::Voltage { hi, lo }` — gets an `Unknown` if a `NodeDerivative(hi)` or `NodeDerivative(lo)` callback exists in `callbacks`, or if `sim_derivatives` is set (the caller's flag indicating the simulator will ask for Jacobian entries). +- `ParamKind::Current(_)` and `ParamKind::ImplicitUnknown(_)` — get an `Unknown` only if `sim_derivatives` is set. +- Other kinds (parameters, temperature, etc.) — get an `Unknown` if a `Derivative(param)` callback exists (i.e., the model called `ddx(expr, param)`). + +--- + +## 5. `ParamKind` and `PlaceKind` + +### `ParamKind` — what an MIR `Param` represents + +`ParamKind` is the enumeration of all quantities that enter the MIR function as SSA function parameters (i.e., values the simulator supplies at runtime). 
+ +| Variant | Meaning | +|---|---| +| `Param(Parameter)` | A model parameter (e.g., `R`, `tnom`) | +| `Voltage { hi, lo }` | Potential difference V(hi,lo); `lo=None` means potential to ground | +| `Current(CurrentKind)` | Branch or port current | +| `Temperature` | Simulation temperature (`$temperature`) | +| `Abstime` | Absolute simulation time (`$abstime`) | +| `EnableIntegration` | Flag: is time-domain integration active | +| `EnableLim` | Flag: is limiting active this Newton step | +| `PrevState(LimitState)` | Value of a `$limit` probe from the previous iteration | +| `NewState(LimitState)` | Value stored by a `$limit` call | +| `ParamGiven { param }` | Boolean: was `param` explicitly given by the netlist | +| `PortConnected { port }` | Boolean: is port node connected | +| `ParamSysFun(ParamSysFun)` | System function like `$mfactor`, `$xposition` | +| `HiddenState(Variable)` | Initial value of a real variable before the analog block runs | +| `ImplicitUnknown(ImplicitEquation)` | Voltage of an implicit internal node | + +`ParamKind::op_dependent()` returns `true` for kinds whose value changes each Newton iteration (voltages, currents, time, limiting state). This distinction matters in `sim_back` when splitting the function into init and eval kernels. + +### `PlaceKind` — mutable SSA slots + +`PlaceKind` names every mutable memory location that the Cranelift-style SSA builder tracks. The builder transparently inserts `PhiNode` instructions at join points. 
+ +| Variant | Initialized to | Notes | +|---|---|---| +| `Var(Variable)` | `HiddenState(var)` param | The HIR variable; reads go through `read_variable` | +| `Contribute { dst, reactive, voltage_src }` | `F_ZERO` | Accumulated contribution to a branch | +| `ImplicitResidual { equation, reactive }` | `F_ZERO` | Residual of an implicit equation | +| `IsVoltageSrc(BranchWrite)` | `FALSE` | Set to `true` when a potential contribution is made | +| `CollapseImplicitEquation(eq)` | `TRUE` | `false` means "don't collapse"; init-only | +| `FunctionReturn(fun)` | `F_ZERO` or `ZERO` | Return value of an inline user function | +| `FunctionArg(arg)` | caller expression | Argument slot for an inlined function call | +| `Param / ParamMin / ParamMax` | (no init) | Written during parameter initialization | +| `BoundStep` | `INFINITY` | Simulator time-step bound | + +--- + +## 6. `LoweringCtx` and SSA Construction + +```rust +pub struct LoweringCtx<'a, 'c> { + pub db: &'a CompilationDB, + pub func: FunctionBuilder<'c>, + pub no_equations: bool, + pub intern: &'a mut HirInterner, + pub places: TiSet, + tagged_vars: AHashSet, + pub inside_lim: bool, + pub num_noise_sources: u32, +} +``` + +`LoweringCtx` owns the `FunctionBuilder` (the Cranelift-style SSA construction API from `mir_build`) and a mutable borrow of the `HirInterner` being built. + +### Place protocol + +`dec_place(kind)` declares a new SSA variable slot for `kind`. If the slot is new, it initializes it in the function entry block with the appropriate default value (see the table above). Returns the `Place` handle. + +`def_place(kind, val)` writes `val` to the slot for `kind` at the current program point. `use_place(kind)` reads the current value, inserting a `PhiNode` at block joins automatically. 
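
The declare/define/use protocol can be illustrated with a toy, single-block model. This is only a sketch: `ToyPlace`, `ToyValue`, and `ToyPlaces` are hypothetical stand-ins for `PlaceKind`, `Value`, and the real builder state, there is no `PhiNode` insertion because there are no block joins, and declaring a slot on first use is an assumption made to keep the example small.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `PlaceKind` (two variants only).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum ToyPlace {
    Var(u32),
    Contribute(u32),
}

/// Hypothetical stand-in for an SSA `Value`.
#[derive(Clone, Copy, PartialEq, Debug)]
enum ToyValue {
    FZero,            // default of a contribution slot
    HiddenState(u32), // default of a variable slot
    Inst(u32),        // result of some instruction
}

#[derive(Default)]
struct ToyPlaces {
    current: HashMap<ToyPlace, ToyValue>,
}

impl ToyPlaces {
    /// Declare a slot: a new slot gets its default, an existing one is untouched.
    fn dec_place(&mut self, kind: ToyPlace) {
        self.current.entry(kind).or_insert(match kind {
            ToyPlace::Var(v) => ToyValue::HiddenState(v),
            ToyPlace::Contribute(_) => ToyValue::FZero,
        });
    }

    /// Write a new current definition for the slot.
    fn def_place(&mut self, kind: ToyPlace, val: ToyValue) {
        self.current.insert(kind, val);
    }

    /// Read the current definition (assumption: declares the slot if needed).
    fn use_place(&mut self, kind: ToyPlace) -> ToyValue {
        self.dec_place(kind);
        self.current[&kind]
    }
}
```

In this model a contribution slot reads as `FZero` until something is written to it, and a variable that was never assigned reads as its hidden-state default, matching the "Initialized to" column of the table above.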
+ +### `use_param` + +```rust +pub fn use_param(&mut self, kind: ParamKind) -> Value { + let len = self.intern.params.len(); + let entry = self.intern.params.raw.entry(kind); + *entry.or_insert_with(|| self.func.func.dfg.make_param(len.into())) +} +``` + +If the requested `ParamKind` has not been seen before, a new `Param` is allocated in the `Function`'s `DataFlowGraph` and registered in `interner.params`. The associated `Value` is returned. Subsequent calls with the same `kind` return the same `Value` without allocating. + +### Ground-node elision + +In Verilog-A, node `gnd` is implicit ground (potential = 0). `LoweringCtx::node(n)` returns `None` when `n.is_gnd(db)`. The `nodes(hi, lo, kind)` helper handles all four cases: + +- `(Some(hi), None)` → `use_param(kind(hi, None))` +- `(None, Some(lo))` → `use_param(kind(lo, None))` followed by `fneg` +- `(Some(hi), Some(lo))` → canonical `use_param(kind(hi, Some(lo)))`; checks if the inverted pair was already allocated and negates if so +- `(None, None)` → `F_ZERO` directly, no allocation + +### `no_equations` + +When `no_equations` is `true`, the lowering context skips contribution accumulation, noise calls, and `$limit` calls. This mode is used by `sim_back` when building the model-parameter-setup function, which evaluates parameter expressions but does not need circuit equations. + +--- + +## 7. Statement Lowering + +`BodyLoweringCtx::lower_stmt` dispatches on `Stmt`: + +**`Stmt::Assignment { lhs, rhs }`** — evaluates `rhs` to an SSA `Value`, then calls `def_place(lhs.into(), val)`. The `From` impl converts variable/function-return/function-arg lhs into the corresponding `PlaceKind`. + +**`Stmt::Contribute { kind, branch, rhs }`** — calls `contribute(voltage_src, branch, rhs)`: +1. Sets `IsVoltageSrc(branch)` to reflect whether this is a potential (`V <+`) or flow (`I <+`) contribution. +2. 
For `V(node) <+ 0.0` (ideal voltage source): emits a `CollapseHint(hi, lo)` callback (tells the simulator the two nodes may merge). +3. Initializes `Contribute { reactive: false, voltage_src }` to `F_ZERO` if not already present. +4. Evaluates `rhs`; if `rhs == F_ZERO`, returns early (no-op contribution). +5. Reads the current accumulated value `old`, computes `new = old + rhs` (or `old - rhs` for negated branches), and writes back via `def_place`. + +**`Stmt::If { cond, then_branch, else_branch }`** — evaluates `cond`, calls `make_cond` which creates three blocks (then, else, merge), seals them, and lowers each branch into its block. + +**`Stmt::ForLoop` / `Stmt::WhileLoop`** — `lower_loop` creates three blocks: loop-condition head, loop body, loop exit. Emits `jump` → cond block, `br_loop` (a conditional branch that also marks the back-edge for the SSA builder), and a back-edge `jump` at the end of the body to re-seal the condition block. + +**`Stmt::Case`** — `lower_case` compiles a chain of equality tests. Each case value generates a `br` into the body block or the next test. The default case, if present, is lowered inline at the fall-through point. All body blocks `jump` to the shared `end` block. + +--- + +## 8. Expression Lowering + +`lower_expr` evaluates a HIR `ExprId` and returns its SSA `Value`. + +**Literals** — `Literal::Float(v)` → `fconst(v)`, `Literal::Int(v)` → `iconst(v)`, `Literal::String(s)` → `sconst(s)`, `Literal::Inf` → `INFINITY` or `iconst(i32::MAX)` depending on type. + +**Variable reads** — `Expr::Read(Ref::Variable(var))` calls `read_variable`, which calls `use_place(Var(var))` and, if `var` is in `tagged_vars`, wraps the result in `optbarrier` and records it in `interner.tagged_reads`. + +**Parameter reads** — `Expr::Read(Ref::Parameter(p))` → `use_param(ParamKind::Param(p))`. + +**Binary operators** — `match_signature!` dispatches on the HIR type signature to select the correct opcode. 
Boolean `||` and `&&` are short-circuit: `a || b` becomes `if a { true } else { b }` (a `make_select` call), ensuring the right-hand side is not evaluated when unnecessary. + +**Nature access** (`BuiltIn::potential` and `BuiltIn::flow`) — the core circuit probes: +- `V(a, b)` → `nodes_from_args(args, |hi, lo| ParamKind::Voltage{hi, lo})` which calls `ctx.nodes(hi, lo, ...)`, handling ground elision. +- `V(branch)` → resolves the branch's hi and lo nodes, then calls `ctx.nodes(...)`. +- `I(branch)` → `use_param(ParamKind::Current(CurrentKind::Branch(br)))`. +- `I(a, b)` → `use_param(ParamKind::Current(CurrentKind::Unnamed{hi, lo}))`. +- `I(<port>)` → `use_param(ParamKind::Current(CurrentKind::Port(node)))`. + +**`BuiltIn::abs`** — lowered as a conditional: `if val < 0 { -val } else { val }` via `lower_select_with`. + +**`BuiltIn::limexp`** — lowered as a linearized exponential to prevent overflow: + +``` +if arg > ln(1e30): + (arg - ln(1e30)) * 1e30 + 1e30 +else: + exp(arg) +``` + +**`$limit` calls** — two-phase protocol described in section 9 below. + +**`$fatal`** — emits the display message, then `SetRetFlag(Abort)` callback, then `exit` instruction. Creates an unreachable block afterward (required because the SSA builder must always have an active block, but the code after `$fatal` is dead). + +--- + +## 9. `CallBackKind` and Runtime Callbacks + +Runtime services that cannot be expressed as pure arithmetic are lowered as MIR `call` instructions. Each `CallBackKind` variant has a unique `FunctionSignature` name; the MIR is agnostic about what the call does. 
+ +```rust +pub enum CallBackKind { + Print { kind: DisplayKind, arg_tys: Box<[FmtArg]> }, + SimParam, + SimParamOpt, + SimParamStr, + Derivative(Param), + NodeDerivative(Node), + ParamInfo(ParamInfoKind, Parameter), + CollapseHint(Node, Option), + LimDiscontinuity, + Analysis, + BuiltinLimit { name: Spur, num_args: u32 }, + StoreLimit(LimitState), + TimeDerivative, + WhiteNoise { name: Spur, idx: u32 }, + FlickerNoise { name: Spur, idx: u32 }, + NoiseTable(Box), + SetRetFlag(RetFlag), +} +``` + +**`Derivative(Param)` and `NodeDerivative(Node)`** — represent `ddx(expr, param)` and `ddx(expr, V(node))` calls in the Verilog-A source. They are recorded in `callbacks` so that `HirInterner::unknowns()` can later identify which `Param`s are differentiation targets. + +**`SimParam` / `SimParamOpt` / `SimParamStr`** — query simulator-defined string-keyed parameters (`$simparam`). + +**`WhiteNoise`, `FlickerNoise`, `NoiseTable`** — each noise source gets a unique `idx` to prevent the optimizer from treating `white_noise(x) - white_noise(x)` as zero (they are statistically independent sources). + +**`StoreLimit(LimitState)`** — called by `finish_limit`; stores the `$limit` result for the next iteration's `PrevState` parameter. + +**`CollapseHint(hi, lo)`** — emitted when `V(hi, lo) <+ 0.0`; tells the simulator the two nodes may be collapsed into one. Has `ignore_if_op_dependent = true`, so `sim_back` omits it from operating-point functions. + +**Tracking.** `CallBackKind::tracked()` returns `false` only for `Print` variants. All other callbacks have their call sites recorded in `interner.callback_uses`. `sim_back` uses this to find `TimeDerivative` (i.e., `ddt()`) calls and noise calls when constructing the DAE system. + +### `$limit` two-phase protocol + +Lowering a `$limit(probe, fn, ...)` call: + +1. `start_limit(probe)` — allocates a `LimitState` index; registers the probe in `intern.lim_state`; returns the state handle. +2. 
Reads `PrevState(state)` and `EnableLim` as `Param`s. +3. `lower_select_with(enable_lim, ...)` — in the `true` branch, evaluates the limit function body with `new_val` and `old_val` as arguments; in the `false` branch, returns `new_val` unchanged. +4. `finish_limit(state, result)` — emits `call StoreLimit(state)` to persist the result; patches `intern.lim_state` with the stored value. + +--- + +## 10. Worked Example — `resistor.va` + +```verilog +module resistor_va(A, B); + inout A, B; electrical A, B; + branch (A, B) br_a_b; + parameter real R = 0.0 from [0:inf]; + parameter real zeta = 0.0 from [-20:20]; + parameter real tnom = 300.0 from [0:inf]; + real res, vres; + analog begin + vres = V(br_a_b); + res = R * pow($temperature / tnom, zeta); + I(br_a_b) <+ vres / res; + end +endmodule +``` + +`MirBuilder::build()` is called with `is_output = |k| matches!(k, PlaceKind::Contribute{..})`. + +**Step 1 — `vres = V(br_a_b)`** + +`lower_stmt(Assignment { lhs: Var(vres), rhs: potential(br_a_b) })`: +- `lower_expr(potential(br_a_b))`: resolves `br_a_b` to `hi=A`, `lo=B`; calls `ctx.nodes(A, Some(B), |hi,lo| ParamKind::Voltage{hi,lo})`. +- Neither A nor B is ground, so allocates `p0: Param` for `ParamKind::Voltage{hi:A, lo:Some(B)}`. Result SSA value: `v0 = param p0`. +- `dec_place(Var(vres))` → initializes `place0` to `HiddenState(vres)` param `p1` in entry block. +- `def_place(Var(vres), v0)` → `v0` is the current definition of `place0`. + +**Step 2 — `res = R * pow($temperature / tnom, zeta)`** + +`lower_expr(R * pow(...))`: +- `R` → `use_param(ParamKind::Param(R))` → `v1 = param p2`. +- `$temperature` → `use_param(ParamKind::Temperature)` → `v2 = param p3`. +- `tnom` → `use_param(ParamKind::Param(tnom))` → `v3 = param p4`. +- `zeta` → `use_param(ParamKind::Param(zeta))` → `v4 = param p5`. +- `v5 = fdiv v2, v3` (temperature / tnom) +- `v6 = pow v5, v4` (pow(temp/tnom, zeta)) +- `v7 = fmul v1, v6` (R * pow(...)) +- `def_place(Var(res), v7)`. 
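
The place bookkeeping in Steps 1 and 2 can be pictured with a toy table that maps each place to its newest SSA definition. This is only a sketch under simplifying assumptions: `PlaceTable`, `ValueId`, and `replay_steps_1_and_2` are hypothetical names invented for illustration, the table covers a single block, and the real SSA builder additionally handles multiple blocks and phi nodes.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an SSA value id such as `v0` or `v7`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ValueId(pub u32);

/// Toy single-block place table: each place maps to its current definition.
#[derive(Default)]
pub struct PlaceTable {
    current: HashMap<String, ValueId>,
}

impl PlaceTable {
    /// Record `value` as the newest definition of `place`.
    pub fn def_place(&mut self, place: &str, value: ValueId) {
        self.current.insert(place.to_owned(), value);
    }

    /// Look up the newest definition of `place`, if one was recorded.
    pub fn use_place(&self, place: &str) -> Option<ValueId> {
        self.current.get(place).copied()
    }
}

/// Replays the definitions made in Steps 1 and 2 of the walkthrough.
pub fn replay_steps_1_and_2() -> PlaceTable {
    let mut places = PlaceTable::default();
    places.def_place("vres", ValueId(0)); // Step 1: vres = V(br_a_b) -> v0
    places.def_place("res", ValueId(7)); // Step 2: res = R * pow(...) -> v7
    places
}
```

With this table, a later read of `vres` or `res` resolves to `v0` or `v7` without emitting any load instruction.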
+ +**Step 3 — `I(br_a_b) <+ vres / res`** + +`lower_stmt(Contribute { kind: Flow, branch: br_a_b, rhs: vres/res })`: +- `def_place(IsVoltageSrc(br_a_b), FALSE)` — this is a flow contribution. +- `dec_place(Contribute{dst:br_a_b, reactive:false, voltage_src:false})` → initialized to `F_ZERO`. +- `lower_expr(vres / res)`: + - `vres` → `use_place(Var(vres))` = `v0`. + - `res` → `use_place(Var(res))` = `v7`. + - `v8 = fdiv v0, v7`. +- `old = use_place(Contribute{...})` = `F_ZERO`. +- Since `old == F_ZERO`: `new = v8` (no `fadd` needed). +- `def_place(Contribute{dst:br_a_b, ...}, v8)`. + +**Step 4 — finalization** + +`build()` iterates `places`; for `Contribute{dst:br_a_b, ...}`: +- `use_place(...)` = `v8`. +- `v9 = optbarrier v8`. +- `interner.outputs[Contribute{...}] = Some(v9)`. + +Emits `ret`. Final `Function` sketch (entry block only): + +``` +function resistor_va: + p0 = Voltage{A, Some(B)} ; V(A,B) + p1 = HiddenState(vres) + p2 = Param(R) + p3 = Temperature + p4 = Param(tnom) + p5 = Param(zeta) + + v0 = param p0 + v1 = param p2 + v2 = param p3 + v3 = param p4 + v4 = param p5 + v5 = fdiv v2, v3 + v6 = pow v5, v4 + v7 = fmul v1, v6 + v8 = fdiv v0, v7 + v9 = optbarrier v8 + ret +``` + +`interner.outputs[Contribute{dst:br_a_b, reactive:false, voltage_src:false}] = v9`. + +`sim_back` reads this output value and connects it to the DAE branch current equation for the `br_a_b` branch. From 91914859939acb47770cfbf9a36a916e19b9e474 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sat, 23 May 2026 11:59:41 +0200 Subject: [PATCH 03/28] docs: add sim_back INTERNALS Covers the simulation backend: CompiledModule 8-step construction pipeline, DaeSystem (SimUnknown/Residual/MatrixEntry), Initialization op-independent/op-dependent split with CacheSlot mechanism, NodeCollapse CollapsePair sources, and a full resistor_va trace from CompiledModule::new through to Jacobian entry wiring. 
Co-Authored-By: Claude Sonnet 4.6 --- docs/sim_back/INTERNALS.md | 437 +++++++++++++++++++++++++++++++++++++ 1 file changed, 437 insertions(+) create mode 100644 docs/sim_back/INTERNALS.md diff --git a/docs/sim_back/INTERNALS.md b/docs/sim_back/INTERNALS.md new file mode 100644 index 00000000..b3e594f8 --- /dev/null +++ b/docs/sim_back/INTERNALS.md @@ -0,0 +1,437 @@ +# `sim_back` — Internals + +Simulation backend: takes HIR module information and produces a `CompiledModule` ready for LLVM codegen. + +Related: [hir_lower INTERNALS](../hir_lower/INTERNALS.md) · [mir INTERNALS](../mir/INTERNALS.md) · [mir_autodiff INTERNALS](../mir_autodiff/INTERNALS.md) + +--- + +## 1. Purpose and Position + +`sim_back` sits between `hir_lower` and `osdi`. Its job is to take a fully lowered MIR function and transform it into the four artifacts that `osdi` needs to emit an OSDI 0.4 shared library: + +| Artifact | Field in `CompiledModule` | Purpose | +|---|---|---| +| Eval function | `eval` + `intern` | Per-Newton-iteration kernel: residuals, Jacobian, noise | +| Init function | `init` | Per-instance-setup kernel: op-independent calculations and cache slots | +| Model-param function | `model_param_setup` + `model_param_intern` | Model-level parameter validation (bounds, defaults) | +| Node collapse table | `node_collapse` | Which node pairs the simulator may merge | + +The DAE system (`dae_system`) embedded in `CompiledModule` describes the sparsity pattern and semantics of the Jacobian so that `osdi` can lay out the correct OSDI descriptor tables. + +--- + +## 2. 
Module Map + +| File | Role | +|---|---| +| `lib.rs` | `CompiledModule`, `SimUnknownKind`, `collect_modules`; debug helpers | +| `module_info.rs` | `ModuleInfo`, `ParamInfo`, `OpVar`, `collect_modules` | +| `context.rs` | `Context` — working state; optimization stage dispatch; taint analysis | +| `topology.rs` | `Topology`, `BranchInfo`, `Contribution` — intermediate branch representation; `ddt` linearization; small-signal network; noise | +| `topology/builder.rs` | `create_dimension` — linear contribution factoring | +| `topology/lineralize.rs` | Linearization of `ddt`/noise into implicit nodes or direct contribution dimensions | +| `topology/small_signal_network.rs` | Detection of nodes with statically zero large-signal voltage | +| `dae.rs` | `DaeSystem`, `Residual`, `MatrixEntry`, `SimUnknown` | +| `dae/builder.rs` | `Builder` — assembles `DaeSystem` from `Topology`; calls `auto_diff`; builds Jacobian | +| `init.rs` | `Initialization`, `CacheSlot` — init/eval function split | +| `node_collapse.rs` | `NodeCollapse`, `CollapsePair` | +| `noise.rs` | `NoiseSource`, `NoiseSourceKind` | +| `util.rs` | `add`, `is_op_dependent`, `strip_optbarrier_if_const`, `update_optbarrier` helpers | + +--- + +## 3. `collect_modules` and `ModuleInfo` + +```rust +pub fn collect_modules( + db: &CompilationDB, + all_vars_opvars: bool, + sink: &mut ConsoleSink, +) -> Option> +``` + +`collect_modules` is the entry point called by `openvaf::compile()`. It iterates over every `hir::Module` in the compilation unit and builds a `ModuleInfo` for each. + +```rust +pub struct ModuleInfo { + pub module: Module, + pub params: IndexMap, + pub sys_fun_alias: IndexMap, ahash::RandomState>, + pub op_vars: IndexMap, +} +``` + +`ModuleInfo::collect` walks every declaration in the module via `module.rec_declarations(db)` and applies two filters: + +**Parameters** — every `Parameter` is collected if it has a `desc` or `units` attribute. 
The attribute values populate `ParamInfo`: + +```rust +pub struct ParamInfo { + pub name: SmolStr, + pub alias: Vec, + pub unit: String, + pub description: String, + pub group: String, + pub is_instance: bool, +} +``` + +The `type` attribute distinguishes model parameters (`type = "model"`, the default) from instance parameters (`type = "instance"`). Instance parameters are included in the per-instance init function; model parameters go into the separate `model_param_setup` function. + +**Operating-point variables** — a `Variable` becomes an `OpVar` only when it is declared at module scope (not inside a named block or function) and has at least one of `desc` or `units`. These variables are marked as outputs in the MIR so their final values are readable by the simulator as operating-point data. + +--- + +## 4. `CompiledModule::new()` Pipeline + +`CompiledModule::new` runs eight sequential steps. Each step consumes and transforms the central `Context` object. + +``` +Step 1 Context::new + MirBuilder (with_equations + with_tagged_writes) + → (Function, HirInterner) + insert_var_init patches HiddenState params + +Step 2 compute_outputs(true) → compute_cfg() + optimize(Initial) + dead_code_elimination + sparse_conditional_constant_propagation + inst_combine + simplify_cfg_no_phi_merge + GVN + +Step 3 Topology::new + resolve branches from HirInterner outputs + linearize ddt / noise + detect small-signal network + +Step 4 DaeSystem::new + assign SimUnknowns (ports first, then internal nodes) + build residuals and noise sources per branch + call auto_diff → Jacobian derivatives + build_jacobian (dense → sparse) + build_lim_rhs (limiting correction terms) + ensure_optbarriers + mfactor scaling + +Step 5 compute_cfg() → optimize(PostDerivative) → dae_system.sparsify + +Step 6 refresh_op_dependent_insts + taint from op_dependent ParamKinds and op_dependent callbacks + propagate_taint → op_dependent_insts bitset + +Step 7 Initialization::new + split_block: copy 
non-op-dependent insts to init function + build_init_itern: copy params to init interner + build_init_cache: CacheSlot per GVN equivalence class + optimize init function (ADCE + simplify_cfg) + NodeCollapse::new + +Step 8 insert_param_init (instance params on init.func) + insert_param_init (model params → model_param_setup function) + optimize model_param_setup (SCCP + simplify_cfg) +``` + +--- + +## 5. `Context` and Optimization Stages + +```rust +pub(crate) struct Context<'a> { + pub(crate) func: Function, + pub(crate) cfg: ControlFlowGraph, + pub(crate) dom_tree: DominatorTree, + pub(crate) intern: HirInterner, + pub(crate) db: &'a CompilationDB, + pub(crate) module: &'a ModuleInfo, + pub(crate) output_values: BitSet, + pub(crate) op_dependent_insts: BitSet, + pub(crate) op_dependent_vals: Vec, +} +``` + +`Context::new` calls `MirBuilder` with: +- `is_output`: `Contribute`, `ImplicitResidual`, `CollapseImplicitEquation`, `IsVoltageSrc`, and `Var(v)` for each `op_var` in `ModuleInfo`. +- `with_equations()` — enables contribution and noise lowering. +- `with_tagged_writes()` — marks every variable write with an `OptBarrier` so the init split can identify and cache them. + +After build, `intern.insert_var_init` replaces `HiddenState(var)` `Param`s in the function with the actual variable initializer expressions. + +### Optimization stages + +`optimize(stage)` runs: + +| Pass | Initial | PostDerivative | Final | +|---|---|---|---| +| `dead_code_elimination` | ✓ | | | +| `sparse_conditional_constant_propagation` | ✓ | ✓ | ✓ | +| `inst_combine` | ✓ | ✓ | ✓ | +| `simplify_cfg_no_phi_merge` | ✓ | ✓ | | +| `simplify_cfg` (with phi merge) | | | ✓ | +| `GVN` | ✓ | ✓ | ✓ | +| Aggressive DCE | | | ✓ | + +The `PostDerivative` stage runs after `auto_diff` has added Jacobian instructions, allowing GVN and inst-combine to simplify them. 
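
The stage table above can be restated as a small lookup function. The sketch below is illustrative only: `Stage` and `passes_for` are hypothetical names, the passes are plain strings, and the ordering simply follows the table rows rather than any confirmed ordering in the `sim_back` source.

```rust
/// Hypothetical mirror of the three optimization stages in the table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Stage {
    Initial,
    PostDerivative,
    Final,
}

/// Returns the pass names the table lists for `stage`, in table-row order.
pub fn passes_for(stage: Stage) -> Vec<&'static str> {
    let mut passes = Vec::new();
    // Plain DCE only appears in the Initial column.
    if stage == Stage::Initial {
        passes.push("dead_code_elimination");
    }
    // SCCP and inst-combine run in every stage.
    passes.push("sparse_conditional_constant_propagation");
    passes.push("inst_combine");
    // Only the Final stage merges phis while simplifying the CFG.
    if stage == Stage::Final {
        passes.push("simplify_cfg");
    } else {
        passes.push("simplify_cfg_no_phi_merge");
    }
    passes.push("GVN");
    // Aggressive DCE only appears in the Final column.
    if stage == Stage::Final {
        passes.push("aggressive_dce");
    }
    passes
}
```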
+ +### Op-dependent taint analysis + +`refresh_op_dependent_insts` seeds a taint set with: +- All live `Param`s whose `ParamKind::op_dependent()` returns `true` (voltages, currents, implicit unknowns, abstime, enabling flags, limiting state). +- All `op_dependent` callback results (noise calls, `$limit`, `$simparam`, `analysis`). + +`propagate_taint` then marks every instruction reachable in the data-flow graph from those seeds as op-dependent. The resulting `op_dependent_insts` bitset is used by `Initialization::new` to decide which instructions stay in the eval function and which are moved to init. + +--- + +## 6. `Topology` + +`Topology` is an intermediate representation between the raw `HirInterner` outputs and the `DaeSystem`. It resolves the flat set of `PlaceKind::Contribute` values into structured `BranchInfo` entries. + +```rust +pub(crate) struct BranchInfo { + pub is_voltage_src: Value, // FALSE / TRUE / dynamic + pub voltage_src: Contribution, + pub current_src: Contribution, +} + +pub(crate) struct Contribution { + pub unknown: Option, + pub resist: Value, + pub react: Value, + pub resist_small_signal: Value, + pub react_small_signal: Value, + pub noise: Vec, +} +``` + +`is_voltage_src` is the `IsVoltageSrc` output value from the MIR. If it is the constant `FALSE`, the branch is a pure current source; if `TRUE`, a voltage source; otherwise a runtime-switched branch. + +**`ddt` handling.** Each `TimeDerivative` callback (`ddt(x)`) in the MIR is either linearized into a direct reactive contribution (when `x` is linear in the circuit unknowns) or replaced with an implicit internal node carrying the reactive equation. + +**Small-signal network.** `small_signal_network` identifies nodes whose large-signal voltage is statically zero. Contributions from these nodes are separated into the `resist_small_signal` / `react_small_signal` fields of `Contribution`. This allows `build_jacobian` to skip Jacobian entries for these terms during large-signal simulation. 
+ +**Noise extraction.** Each noise callback in `HirInterner::callback_uses` is extracted into the `noise` list of the appropriate `Contribution`. Each `NoiseSource` carries `kind` (`WhiteNoise`, `FlickerNoise`, `NoiseTable`), `hi`/`lo` `SimUnknown` indices, and a `factor` value. + +--- + +## 7. `DaeSystem` + +```rust +pub struct DaeSystem { + pub unknowns: TiSet, + pub residual: TiVec, + pub jacobian: TiVec, + pub small_signal_parameters: IndexSet, + pub noise_sources: Vec, + pub model_inputs: Vec<(u32, u32)>, + pub num_resistive: u32, + pub num_reactive: u32, +} +``` + +### `SimUnknown` and `SimUnknownKind` + +Each `SimUnknown` is a newtype index into `unknowns`. The three kinds are: + +- `KirchoffLaw(Node)` — a KCL node equation; residual = sum of currents flowing into the node. +- `Current(CurrentKind)` — a branch with a probe on its current (named branch, unnamed `I(a,b)`, or port flow); introduces a separate equation `I_branch − I_computed = 0`. +- `Implicit(ImplicitEquation)` — an internal implicit node introduced by `ddt` or `idt`. + +Port nodes are always assigned the first `SimUnknown` indices; internal nodes follow. + +### `Residual` + +```rust +pub struct Residual { + pub resist: Value, // large-signal I + pub react: Value, // large-signal Q (ddt term) + pub resist_small_signal: Value, + pub react_small_signal: Value, + pub resist_lim_rhs: Value, // J*(x_lim - x) correction, resistive + pub react_lim_rhs: Value, // J*(x_lim - x) correction, reactive +} +``` + +The DAE for unknown `i` is `I_i(x) + ddt(Q_i(x)) = 0`. The Newton step solves `J·Δx = I + ddt(Q)`. `resist_lim_rhs` and `react_lim_rhs` provide a corrective right-hand side term needed when `$limit` is active: since the simulator evaluates the model at `x_lim` rather than `x`, it needs `J(x_lim)·(x_lim − x)` subtracted from the residual so the Newton iteration converges to the correct solution. 
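
The limiting correction can be checked numerically on a one-unknown model. The following is a minimal sketch, not code from `sim_back`: `linear_model`, `lim_rhs`, and `corrected_residual` are hypothetical helpers, and the model `I(x) = 3x + 1` is chosen so that the first-order correction is exact.

```rust
/// Residual and Jacobian of a hypothetical one-unknown model at `x`.
/// Returns (I(x), dI/dx) for I(x) = 3x + 1.
pub fn linear_model(x: f64) -> (f64, f64) {
    (3.0 * x + 1.0, 3.0)
}

/// J(x_lim) * (x_lim - x): the role the text assigns to `resist_lim_rhs`.
pub fn lim_rhs(jacobian_at_lim: f64, x_lim: f64, x: f64) -> f64 {
    jacobian_at_lim * (x_lim - x)
}

/// Residual evaluated at the limited point, with the correction subtracted.
pub fn corrected_residual(x: f64, x_lim: f64) -> f64 {
    let (resid_at_lim, jac_at_lim) = linear_model(x_lim);
    resid_at_lim - lim_rhs(jac_at_lim, x_lim, x)
}
```

For a linear model the corrected value reproduces the residual at the unlimited point exactly; for a nonlinear model it is only a first-order approximation, which is what the Newton iteration needs.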
+ +### `MatrixEntry` + +```rust +pub struct MatrixEntry { + pub row: SimUnknown, + pub col: SimUnknown, + pub resist: Value, // ∂I_row/∂x_col + pub react: Value, // ∂Q_row/∂x_col +} +``` + +The Jacobian is stored sparsely. `build_jacobian` constructs a dense row per residual and then emits only non-zero entries. `num_resistive` and `num_reactive` count how many entries have non-zero `resist` and `react` respectively; `osdi` uses these counts to size the OSDI descriptor tables. + +### Jacobian construction + +`DaeSystem::new` (via `Builder::finish`): + +1. Calls `intern.unknowns(func, sim_derivatives=true)` to produce `KnownDerivatives`, marking all voltage/current/implicit `Param`s as differentiation unknowns. +2. Calls `jacobian_derivatives` to build the `extra_derivatives` list: `(residual_value, unknown)` pairs for every non-constant residual. +3. Calls `mir_autodiff::auto_diff` to insert derivative instructions into the function. +4. `build_jacobian` iterates residuals, applies each derivative to the dense row, and sparsifies. + +### `sparsify` + +After the `PostDerivative` optimization pass, `DaeSystem::sparsify` strips redundant `OptBarrier`s from residual and Jacobian values when the underlying value is already a constant or parameter (i.e., no computation is needed). Zero entries are removed from `jacobian` entirely. + +--- + +## 8. `Initialization` — The Init/Eval Split + +```rust +pub struct Initialization { + pub func: Function, + pub intern: HirInterner, + pub cached_vals: IndexMap, + pub cache_slots: TiMap, u32), hir::Type>, +} +``` + +`Initialization::new` splits the single eval `Function` into two functions: one that runs once at instance setup (init) and one that runs each Newton iteration (eval). + +### `split_block` + +For each block in the eval function, `split_block` iterates its instructions: + +- **Op-independent instruction**: copied to `init.func` (with values remapped via `val_map`), then zapped from `eval.func` if it is safe to remove. 
+- **Op-dependent terminator** (`Branch` or `Jump`): instead of copying, a `jump` to the else destination is emitted in `init.func`. This prevents op-dependent branches from fragmenting the init function's control flow. +- **Op-dependent `CollapseHint` callback**: `ignore_if_op_dependent` returns `true` for this kind, so it is zapped from eval without being copied to init. + +### Cache slots + +An op-independent instruction whose result is used in the eval function must be communicated via a cache slot — a shared memory location written by init and read by eval. + +After `split_block`, tagged values (writes to `op_var` variables) and output `OptBarrier`s over op-independent values are candidates for caching. `build_init_cache` runs ADCE on eval to determine which cached values are actually consumed, then creates a `CacheSlot` for each, grouped by GVN equivalence class. In eval, the corresponding `Value` is replaced with a `Param` referencing that slot; in init, the slot is written with an `OptBarrier` over the computed value. + +--- + +## 9. `NodeCollapse` + +```rust +pub struct NodeCollapse { + pairs: TiSet)>, + extra_pairs: TiVec>, +} +``` + +Node collapsing allows a simulator to merge two circuit nodes into one, removing a degree of freedom. `NodeCollapse` enumerates all pairs that can legally be collapsed. + +Two sources contribute collapse pairs: + +1. **`CollapseImplicitEquation` outputs** in `init.intern.outputs` — when an implicit equation is always collapsed (its `CollapseImplicitEquation` value is `TRUE` at instance-setup time), the associated implicit node disappears and its `SimUnknown` pair is registered. + +2. **`CollapseHint(hi, lo)` callbacks** in `init.intern.callbacks` — emitted by `hir_lower` when `V(hi, lo) <+ 0.0` is seen. Translated to `(KirchoffLaw(hi), KirchoffLaw(lo))` pairs. 
+ +`extra_pairs` handles a secondary effect: if a branch current `I(a, b)` is probed (creating a `Current(kind)` unknown) and the underlying branch `(a, b)` is collapsible, then the current unknown must also be collapsed when the node pair collapses. These dependent pairs are stored in `extra_pairs[primary_pair]` as a `HybridBitSet`. + +`NodeCollapse::hint(hi, lo, f)` is the API used by `osdi`: given a collapsing signal for a pair, it calls `f` for the primary pair and all its `extra_pairs`. + +--- + +## 10. Worked Example — `resistor_va` + +```verilog +module resistor_va(A, B); + inout A, B; electrical A, B; + branch (A, B) br_a_b; + (* desc="Ohmic resistance", units="Ohm" *) parameter real R = 0.0 from [0:inf]; + (* desc="Temperature coeff" *) parameter real zeta = 0.0 from [-20:20]; + (* desc="Reference Temp.", units="Kelvin" *) parameter real tnom = 300.0 from [0:inf]; + real res, vres; + analog begin + vres = V(br_a_b); + res = R * pow($temperature / tnom, zeta); + I(br_a_b) <+ vres / res; + end +endmodule +``` + +### `ModuleInfo` + +`collect` finds three parameters (all have `desc` or `units`): `R`, `zeta`, `tnom`. None has `type = "instance"`, so all are model parameters. `res` and `vres` have no attributes, so `op_vars` is empty. `sys_fun_alias` is also empty.
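
The classification rules applied here (documented parameters, the `type` attribute, and module-scope operating-point variables) can be sketched as plain predicates. This is an illustration under assumptions: `Attrs`, `is_documented`, `is_instance`, and `is_op_var` are hypothetical names, and the real `ModuleInfo::collect` works on HIR declarations rather than on a struct like this.

```rust
/// Hypothetical, simplified view of the attributes on one declaration.
#[derive(Default)]
pub struct Attrs {
    pub desc: Option<&'static str>,
    pub units: Option<&'static str>,
    /// Value of the `type` attribute, if present.
    pub ty: Option<&'static str>,
}

impl Attrs {
    /// True when the declaration carries `desc` or `units`.
    pub fn is_documented(&self) -> bool {
        self.desc.is_some() || self.units.is_some()
    }

    /// True only for `type = "instance"`; anything else counts as a model parameter.
    pub fn is_instance(&self) -> bool {
        self.ty == Some("instance")
    }
}

/// A variable is an operating-point variable only at module scope and when documented.
pub fn is_op_var(attrs: &Attrs, at_module_scope: bool) -> bool {
    at_module_scope && attrs.is_documented()
}
```

Applied to the resistor, `R` is documented and not an instance parameter, while `res` and `vres` carry no attributes and therefore never become op-vars.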
+ +### `Context::new` — eval function sketch + +After `MirBuilder::build` and `insert_var_init` (which replaces the `HiddenState(res)` and `HiddenState(vres)` params with `F_ZERO` since the variables have no initializer): + +``` +p0 = Voltage{A, Some(B)} ; V(A,B) — op_dependent +p1 = Param(R) ; not op_dependent +p2 = Temperature ; not op_dependent +p3 = Param(tnom) +p4 = Param(zeta) + +v0 = param p0 ; V(A,B) +v1 = param p1 ; R +v2 = param p2 ; $temperature +v3 = param p3 ; tnom +v4 = param p4 ; zeta +v5 = fdiv v2, v3 ; temp/tnom +v6 = pow v5, v4 ; (temp/tnom)^zeta +v7 = fmul v1, v6 ; R * (temp/tnom)^zeta +v8 = fdiv v0, v7 ; V(A,B) / (R * (temp/tnom)^zeta) +v9 = optbarrier v8 +ret +``` + +`intern.outputs[Contribute{br_a_b, resistive, current}] = v9` + +### `Topology` + +`br_a_b` has `IsVoltageSrc = FALSE` (constant), so it is a pure current branch. Its `current_src.resist = v8`, `current_src.react = F_ZERO`. No noise sources. + +### `DaeSystem` + +Unknowns (ports first): `sim_node0 = KirchoffLaw(A)`, `sim_node1 = KirchoffLaw(B)`. + +`build_branch(br_a_b, ...)` calls `add_kirchoff_law`: +- `residual[KirchoffLaw(A)].resist += v8` +- `residual[KirchoffLaw(B)].resist -= v8` + +`sim_unknown_reads` = `[(Voltage{A,B}, v0)]`. + +`auto_diff` is called with `extra_derivatives = [(v8, Unknown(V(A,B)))]`: +- `d(v8)/d(V(A,B))` = `d(V(A,B)/v7)/d(V(A,B))` = `1/v7` +- New instruction: `v10 = fdiv F_ONE, v7` + +`build_jacobian`: + +| Row | Col | resist | react | +|---|---|---|---| +| `sim_node0` (A) | `sim_node0` (A) | `v10` (= 1/v7 = 1/res) | `F_ZERO` | +| `sim_node0` (A) | `sim_node1` (B) | `fneg(v10)` (= −1/res) | `F_ZERO` | +| `sim_node1` (B) | `sim_node0` (A) | `fneg(v10)` | `F_ZERO` | +| `sim_node1` (B) | `sim_node1` (B) | `v10` | `F_ZERO` | + +After the `PostDerivative` GVN pass and `sparsify`, identical `fneg(v10)` subexpressions are deduplicated. + +### `Initialization` split + +`refresh_op_dependent_insts` seeds taint at `v0` (the `Voltage{A,B}` param). Taint propagates to `v8` and `v9`.
Instructions `v5`, `v6`, `v7` are op-independent. + +`split_block` copies `v5 = fdiv v2, v3`, `v6 = pow v5, v4`, `v7 = fmul v1, v6` to `init.func`. Since `v7` is used in the eval function (by `v8`), it becomes a `CacheSlot`. In eval, `v7` is replaced with a `Param` reading that slot. + +Final eval function (after init split): + +``` +p0 = Voltage{A, B} ; op_dependent +pN = CacheSlot(0) ; v7 cached from init: R*(temp/tnom)^zeta + +v0 = param p0 +v7c = param pN ; cached result +v8 = fdiv v0, v7c +v9 = optbarrier v8 +... +``` + +`init.func` computes `v7` once and stores it in `CacheSlot(0)`. + +### `NodeCollapse` + +No `CollapseHint` calls and no `CollapseImplicitEquation` outputs → `node_collapse.num_pairs() = 0`. The simulator will not collapse any nodes for this model. From fb45898e97ab25e1ded1325f09e44a02f04d9af8 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sat, 23 May 2026 11:59:56 +0200 Subject: [PATCH 04/28] docs: add osdi INTERNALS Covers the OSDI 0.4 codegen layer: compile() two-phase structure, OsdiTys stdlib bitcode embedding, OsdiCompilationUnit per-module codegen, OsdiInstanceData/OsdiModelData constant field indices, four generated functions (access/setup_model/setup_instance/eval), 46-field OsdiDescriptor, callback resolution, and all six exported globals. Co-Authored-By: Claude Sonnet 4.6 --- docs/osdi/INTERNALS.md | 311 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 311 insertions(+) create mode 100644 docs/osdi/INTERNALS.md diff --git a/docs/osdi/INTERNALS.md b/docs/osdi/INTERNALS.md new file mode 100644 index 00000000..9c30ac6b --- /dev/null +++ b/docs/osdi/INTERNALS.md @@ -0,0 +1,311 @@ +# `osdi` — Internals + +LLVM codegen and OSDI 0.4 descriptor emission: the final compilation stage. + +Related: [sim_back INTERNALS](../sim_back/INTERNALS.md) · [ARCHITECTURE](../ARCHITECTURE.md) + +--- + +## 1. Purpose and Position + +`osdi` receives the `CompiledModule` collection produced by `sim_back` and turns it into native object files. 
It has two responsibilities that are easy to conflate: + +**Codegen** — for each module, generate four LLVM functions (`access`, `setup_model`, `setup_instance`, `eval`) by translating the MIR `Function`s from `CompiledModule` into LLVM IR, wiring runtime callbacks to stdlib implementations, and laying out instance and model data structs. + +**Descriptor emission** — collect all per-module metadata (node count, parameter layout, Jacobian sparsity, function pointers, memory offsets) into a flat LLVM constant struct array `OSDI_DESCRIPTORS`, plus the well-known size sentinel `OSDI_DESCRIPTOR_SIZE`. A circuit simulator `dlopen`s the resulting shared library, reads these globals, and uses the function pointers and layout tables to call into the model. + +The `linker` crate then links all object files into a single `.so`/`.dll`. + +--- + +## 2. Module Map + +| File | Role | +|---|---| +| `lib.rs` | `compile()` entry point; `OsdiTys` instantiation; `OsdiLimId`; exported-globals emission; `lltype` helper; `intern_names` | +| `compilation_unit.rs` | `OsdiModule`, `OsdiCompilationUnit`; `new_codegen`; `general_callbacks`; `print_callback` | +| `metadata/osdi_0_4.rs` | Generated: `OsdiTys`, `OsdiDescriptor`, `OsdiNode`, `OsdiJacobianEntry`, `OsdiParamOpvar`, `OsdiNoiseSource`, `OsdiNodePair`, `OsdiLimFunction` (Rust structs); all OSDI flag constants; `stdlib_bitcode()` | +| `metadata.rs` | `OsdiLimFunction` (interned form); re-exports `osdi_0_4` | +| `inst_data.rs` | `OsdiInstanceData` — LLVM struct for per-instance memory; `EvalOutput` enum; constant field index offsets | +| `model_data.rs` | `OsdiModelData` — LLVM struct for per-model memory | +| `access.rs` | `access_function()` — read/write model and instance parameters by integer ID | +| `setup.rs` | `setup_model()`, `setup_instance()` — parameter initialization kernels | +| `eval.rs` | `eval()` — main Newton-iteration kernel; gated by `CALC_*` flags | +| `load.rs` | `load_noise()`, `load_residual_*`, `load_jacobian_*` 
helper functions; `JacobianLoadType` | +| `noise.rs` | Noise contribution helpers used by `load_noise` | +| `bitfield.rs` | `is_flag_set`, `is_flag_set_mem`, `is_flag_unset` — inline bitfield helpers | + +--- + +## 3. `compile()` Overview + +```rust +pub fn compile<'a>( + db: &'a CompilationDB, + modules: &'a [ModuleInfo], + dst: &'a Utf8Path, + target: &'a Target, + back: &'a LLVMBackend, + emit: bool, + opt_lvl: LLVMCodeGenOptLevel, + ... +) -> (Vec, Vec>, Rodeo) +``` + +`compile()` runs in two phases. + +**Phase 1 — sequential setup.** For each `ModuleInfo`, `CompiledModule::new` builds the complete sim_back output (MIR, DAE system, init/eval split). During this loop, any `CallBackKind::BuiltinLimit` callbacks are collected into a shared `lim_table` — a deduplicated set of `OsdiLimFunction` entries keyed by name and argument count. + +`OsdiModule::new` then wraps each `CompiledModule`. The module's `sym` field is set to a base-n encoding of the module's UUID: `base_n::encode(module.info.module.uuid(db) as u128, CASE_INSENSITIVE)`. This symbol suffix is appended to every generated function name to avoid collisions in the linked library. + +`intern_names` walks all parameter names, aliases, units, descriptions, and system-function aliases and pre-interns them into the shared `Rodeo` string table. This ensures string literals are deduplicated across modules before codegen begins. 
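
The Phase 1 deduplication of `$limit` functions by name and argument count can be sketched with a small interning table. This is an assumption-laden illustration: `LimTable` and its methods are hypothetical, it uses only the standard library instead of the `TiSet` the crate uses, and the function names in the usage note are examples only.

```rust
use std::collections::HashMap;

/// Hypothetical deduplicating table keyed by (function name, argument count).
#[derive(Default)]
pub struct LimTable {
    index: HashMap<(String, u32), usize>,
    entries: Vec<(String, u32)>,
}

impl LimTable {
    /// Returns the existing id for the key, or appends a new entry and returns its id.
    pub fn intern(&mut self, name: &str, num_args: u32) -> usize {
        let key = (name.to_owned(), num_args);
        if let Some(&id) = self.index.get(&key) {
            return id;
        }
        let id = self.entries.len();
        self.entries.push(key.clone());
        self.index.insert(key, id);
        id
    }

    /// Number of distinct (name, argument count) pairs seen so far.
    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// True when no limit function has been interned yet.
    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }
}
```

Interning `("pnjlim", 2)` twice yields the same id, while `("pnjlim", 3)` gets a fresh one, because the argument count is part of the key.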
+ +**Phase 2 — parallel codegen.** `rayon_core::scope` spawns four tasks per module; one additional shared descriptor task runs on the scope's own thread: + +| Task (per module) | LLVM module name | Object file index | Function generated | +|---|---|---|---| +| Access | `access_{sym}` | `i*4` | `access_{sym}` | +| Setup model | `setup_model_{sym}` | `i*4+1` | `setup_model_{sym}` | +| Setup instance | `setup_instance_{sym}` | `i*4+2` | `setup_instance_{sym}` | +| Eval | `eval_{sym}` | `i*4+3` | `eval_{sym}` | + +Each task calls `new_codegen` to create an `LLVMBackend`-owned LLVM module, loads the stdlib bitcode, constructs `OsdiTys` and `OsdiCompilationUnit`, generates its function, optimizes, and emits an object file. + +The **descriptor task** (no rayon spawn — runs on the scope's thread) creates one more LLVM module, builds the `OsdiDescriptor` constant for each module by calling `cguint.descriptor(...)`, converts it to an LLVM constant via `descriptor.to_ll_val(&cx, &tys)`, then calls `cx.export_array` and `cx.export_val` to emit the well-known globals. + +--- + +## 4. `OsdiTys` and the Stdlib Bitcode + +Every LLVM module created for OSDI codegen loads a pre-compiled bitcode file: + +```rust +pub fn stdlib_bitcode(target: &Target) -> &'static [u8] { ... } +``` + +`osdi/build.rs` compiles a C/LLVM-IR stdlib once per supported target triple and embeds the result as `include_bytes!` constants. The supported triples are: `x86_64-unknown-linux-gnu`, `x86_64-pc-windows-msvc`, `x86_64-pc-windows-gnu`, `x86_64-apple-macosx10.15.0`, `aarch64-unknown-linux-gnu`, `aarch64-pc-windows-msvc`, `arm64-apple-macosx11.0.0`. + +`new_codegen` loads the bitcode via `cx.include_bitcode(stdlib_bitcode(back.target()))`, then sets all non-declaration functions to `LLVMInternalLinkage` so they are not exported. It also looks up two constants that format-string helpers need — `EXP` (a table of floating-point exponents) and `FMT_CHARS` (a table of SI prefix characters) — and sets these to internal linkage as well.
+ +`OsdiTys` is a bundle of LLVM struct types that mirror the OSDI 0.4 C structs. It is built by `OsdiTyBuilder::new(ctx, target_data)`, which constructs each type in dependency order: + +``` +OsdiLimFunction → OsdiSimParas → OsdiSimInfo +OsdiInitErrorPayload → OsdiInitError → OsdiInitInfo +OsdiNodePair → OsdiJacobianEntry → OsdiNode +OsdiParamOpvar → OsdiNoiseSource → OsdiDescriptor +``` + +`target_data` (an `LLVMTargetDataRef` created from `target.data_layout`) is needed to compute `OsdiInitErrorPayload`'s size, which must match the ABI size of `i32` on the target. + +--- + +## 5. `OsdiModule` and `OsdiCompilationUnit` + +```rust +pub struct OsdiModule<'a> { + pub info: &'a ModuleInfo, + pub dae_system: &'a DaeSystem, + pub eval: &'a Function, + pub intern: &'a HirInterner, + pub init: &'a Initialization, + pub model_param_setup: &'a Function, + pub model_param_intern: &'a HirInterner, + pub lim_table: &'a TiSet, + pub node_collapse: &'a NodeCollapse, + pub sym: String, +} +``` + +`OsdiModule` is a pure reference wrapper around `CompiledModule`. It adds `sym` and `lim_table` (shared across all modules in the compilation). + +```rust +pub struct OsdiCompilationUnit<'a, 'b, 'll> { + pub db: &'a CompilationDB, + pub inst_data: OsdiInstanceData<'ll>, + pub model_data: OsdiModelData<'ll>, + pub tys: &'a OsdiTys<'ll>, + pub cx: &'a CodegenCx<'b, 'll>, + pub module: &'a OsdiModule<'b>, + pub lim_dispatch_table: Option<&'ll llvm_sys::LLVMValue>, +} +``` + +`OsdiCompilationUnit::new` constructs `OsdiInstanceData` and `OsdiModelData` (which build the LLVM struct types for the instance and model blobs), then — if this is the eval task and the module has `$limit` calls — creates an external `OSDI_LIM_TABLE` global that the eval function will index at runtime. + +--- + +## 6. 
`OsdiInstanceData` and `OsdiModelData` + +### Instance struct layout + +The per-instance LLVM struct has a fixed prefix of `NUM_CONST_FIELDS = 8` fields at known indices, followed by variable-length parameter, cache-slot, and Jacobian-pointer fields: + +| Index constant | Field | Purpose | +|---|---|---| +| `PARAM_GIVEN = 0` | bitfield | One bit per parameter: was it explicitly set? | +| `JACOBIAN_PTR_RESIST = 1` | `*f64` | Pointer into the simulator's resistive stamp array | +| `JACOBIAN_PTR_REACT = 2` | `*f64` | Pointer into the simulator's reactive stamp array | +| `NODE_MAPPING = 3` | `u32[num_nodes]` | Maps OSDI node index → simulator node index | +| `COLLAPSED = 4` | bitfield | One bit per collapsible pair: was it collapsed? | +| `TEMPERATURE = 5` | `f64` | Instance temperature | +| `CONNECTED = 6` | bitfield | One bit per port: is it connected? | +| `STATE_IDX = 7` | `u32` | Starting index in the simulator's `$limit` state array | + +After these: cached eval outputs (written by `setup_instance`, read by `eval`), user instance parameters, and operating-point variable slots. + +### `EvalOutput` + +When generating `eval`, each MIR output value maps to one of four `EvalOutput` variants: + +```rust +pub enum EvalOutput { + Calculated(EvalOutputSlot), // computed during eval, stored in instance struct + Const(Const, PackedOption), // LLVM constant; may also have a slot + Param(Param), // already in model struct (Param/Temperature/ParamSysFun kinds) + Cache(CacheSlot), // pre-computed by setup_instance, read from instance struct +} +``` + +### Model struct + +`OsdiModelData` holds model-level parameters (all parameters, not just instance ones) plus `$mfactor` and other system function values. Its layout is built in `model_data.rs` using the same `lltype` helper. + +--- + +## 7. The Four Generated Functions + +Each function is emitted into its own LLVM module and compiled to a separate object file. The `OsdiDescriptor` stores a pointer to each. 
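
Since each function lands in its own object file, the naming and indexing scheme from the Phase 2 table in section 3 can be restated compactly. The sketch below is hypothetical: `Task`, `object_file_index`, and `function_name` are illustrative names and are not taken from the `osdi` crate.

```rust
/// Hypothetical enumeration of the four per-module codegen tasks.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Task {
    Access = 0,
    SetupModel = 1,
    SetupInstance = 2,
    Eval = 3,
}

impl Task {
    /// Name prefix used for both the LLVM module and the generated function.
    pub fn prefix(self) -> &'static str {
        match self {
            Task::Access => "access",
            Task::SetupModel => "setup_model",
            Task::SetupInstance => "setup_instance",
            Task::Eval => "eval",
        }
    }
}

/// Object file slot for module `module_idx`: four consecutive slots per module.
pub fn object_file_index(module_idx: usize, task: Task) -> usize {
    module_idx * 4 + task as usize
}

/// Generated function name: the task prefix followed by the module's `sym` suffix.
pub fn function_name(task: Task, sym: &str) -> String {
    format!("{}_{}", task.prefix(), sym)
}
```

Under this scheme the third module (index 2) writes its `eval` object to slot 11.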
+ +### `access_{sym}(inst, model, param_id, flags) → *void` + +The parameter read/write gateway. Signature: + +``` +fn(inst: *opaque, model: *opaque, param_id: u32, flags: u32) -> *opaque +``` + +Dispatches on the upper bits of `param_id` using a switch over three cases (`PARA_KIND_MODEL`, `PARA_KIND_INST`, `PARA_KIND_OPVAR`), then within each case uses a second switch on the lower bits to find the correct struct field via `LLVMBuildStructGEP2`. The `ACCESS_FLAG_SET` bit in `flags` selects write mode; `ACCESS_FLAG_INSTANCE` routes to the instance struct for op-vars. Returns a pointer to the field; the simulator performs the actual load or store. + +### `setup_model_{sym}(model, sim_info, ret_flags, handle)` + +Translates `model_param_setup` (the `Function` from `CompiledModule`) to LLVM IR, then writes each evaluated output into the model struct. Parameter bound violations emit `OsdiInitError` records via a runtime callback. The `sim_info` argument carries the `OsdiSimInfo` struct with simulator-provided system parameters. + +### `setup_instance_{sym}(inst, model, sim_info, ret_flags, handle, node_mapping, connected)` + +Translates `init.func` (the op-independent init `Function`). Writes evaluated cache-slot values into the instance struct. Also applies any initial-condition (`IC`) overrides from `sim_info`. For each `CollapseImplicitEquation` output: if the value is `TRUE`, marks the corresponding bit in `COLLAPSED` and updates `NODE_MAPPING` to merge nodes. + +### `eval_{sym}(inst, model, sim_info, ret_flags)` + +The Newton-iteration kernel. 
The `sim_info.flags` field is a bitmask gating which computations to run: + +| Flag | Value | Effect | +|---|---|---| +| `CALC_RESIST_RESIDUAL` | 1 | Compute and store resistive residual | +| `CALC_REACT_RESIDUAL` | 2 | Compute and store reactive residual | +| `CALC_RESIST_JACOBIAN` | 4 | Compute and store resistive Jacobian entries | +| `CALC_REACT_JACOBIAN` | 8 | Compute and store reactive Jacobian entries | +| `CALC_NOISE` | 16 | Compute noise contributions | +| `CALC_OP` | 32 | Compute operating-point variables | +| `CALC_RESIST_LIM_RHS` | 64 | Compute resistive limiting correction RHS | +| `CALC_REACT_LIM_RHS` | 128 | Compute reactive limiting correction RHS | +| `ENABLE_LIM` | 256 | Limiting is active this Newton step | +| `INIT_LIM` | 512 | Initialize limit state | + +The function translates `eval` (the `Function` from `CompiledModule`) via `mir_llvm`, materializing the `CALC_*` flag checks as LLVM conditional branches. Results are stored via the `JACOBIAN_PTR_RESIST`/`JACOBIAN_PTR_REACT` pointers in the instance struct (written at each Newton step by the simulator before calling `eval`) and via separate load-helper functions. Return flags `EVAL_RET_FLAG_LIM`, `EVAL_RET_FLAG_FATAL`, `EVAL_RET_FLAG_FINISH`, `EVAL_RET_FLAG_STOP` are written into `ret_flags`. + +--- + +## 8. `OsdiDescriptor` + +`OsdiDescriptor<'ll>` is a Rust struct with 46 fields. 
Selected fields and their sources: + +| Field | Source | +|---|---| +| `name` | `module.info.module.name(db)` | +| `num_nodes` / `num_terminals` | `DaeSystem::unknowns` count of `KirchoffLaw` / port count | +| `nodes: Vec` | One entry per `KirchoffLaw` unknown: name, units, residual offsets in instance struct | +| `num_jacobian_entries` / `jacobian_entries` | `DaeSystem::jacobian` entries; each `OsdiJacobianEntry` = `OsdiNodePair` + `flags` | +| `num_collapsible` / `collapsible` | `NodeCollapse::pairs()` as `OsdiNodePair` values | +| `collapsed_offset` | Byte offset of `COLLAPSED` field in instance struct | +| `noise_sources` | `DaeSystem::noise_sources`; each `OsdiNoiseSource` = name + `OsdiNodePair` | +| `num_params` / `num_instance_params` / `num_opvars` | Counts from `ModuleInfo` | +| `param_opvar: Vec` | One entry per parameter and op-var: names, aliases, units, description, `flags` (type + kind bits), `len` for arrays | +| `node_mapping_offset` | Byte offset of `NODE_MAPPING` in instance struct | +| `jacobian_ptr_resist_offset` | Byte offset of `JACOBIAN_PTR_RESIST` in instance struct | +| `instance_size` / `model_size` | `LLVMABISizeOfType` of the instance/model LLVM structs | +| `access` / `setup_model` / `setup_instance` / `eval` | LLVM function value pointers (declared in the descriptor LLVM module as external) | +| `load_noise` / `load_residual_resist` / `load_jacobian_resist` / … | Pointers to the load-helper functions | +| `num_resistive_jacobian_entries` / `num_reactive_jacobian_entries` | From `DaeSystem::num_resistive` / `num_reactive` | +| `num_inputs` / `inputs` | `DaeSystem::model_inputs` as `OsdiNodePair` values | + +`OsdiJacobianEntry::flags` encodes which of the four values (resist/react × const/computed) are non-zero: + +``` +JACOBIAN_ENTRY_RESIST_CONST = 1 // resist value is a compile-time constant +JACOBIAN_ENTRY_REACT_CONST = 2 // react value is a compile-time constant +JACOBIAN_ENTRY_RESIST = 4 // resist value present 
+JACOBIAN_ENTRY_REACT = 8 // react value present +``` + +`OsdiParamOpvar::flags` encodes type and kind: + +``` +flags = PARA_TY_REAL(0) | PARA_TY_INT(1) | PARA_TY_STR(2) — bits 1:0 + | PARA_KIND_MODEL(0<<30) | PARA_KIND_INST(1<<30) | PARA_KIND_OPVAR(2<<30) — bits 31:30 +``` + +`to_ll_val(&cx, &tys)` converts the Rust struct to a `ctx.const_struct(tys.osdi_descriptor, &fields)` LLVM constant, with embedded `const_arr_ptr` sub-arrays for nodes, jacobian entries, collapsible pairs, noise sources, and param/opvar entries. + +--- + +## 9. Callback Resolution + +`general_callbacks` maps each `CallBackKind` in `HirInterner::callbacks` to a `TiVec>>`: + +```rust +pub fn general_callbacks<'ll>( + intern: &HirInterner, + builder: &mut mir_llvm::Builder<'_, '_, 'll>, + ret_flags: &'ll LLVMValue, + handle: &'ll LLVMValue, + simparam: &'ll LLVMValue, +) -> TiVec>> +``` + +Selected mappings: + +| `CallBackKind` | Resolution | +|---|---| +| `SimParam` | `simparam` stdlib function; state = `[simparam, handle, ret_flags]` | +| `SimParamOpt` | `simparam_opt` stdlib function; state = `[simparam]` | +| `SimParamStr` | `simparam_str` stdlib function | +| `Derivative(_)` / `NodeDerivative(_)` | `const_callback` returning `0.0` — if these survived to codegen, they were zero-valued | +| `Print { kind, arg_tys }` | `print_callback` (hand-built LLVM IR): `snprintf` into a heap buffer, then `osdi_log(handle, msg, flags)` | +| `SetRetFlag(flag)` | `set_ret_flag_fatal` / `set_ret_flag_finish` / `set_ret_flag_stop` stdlib functions; state = `[ret_flags]` | +| `ParamInfo`, `CollapseHint`, `BuiltinLimit`, `StoreLimit`, `LimDiscontinuity`, `Analysis`, noise, `TimeDerivative` | `None` — handled by their respective setup/eval generation paths, not as general callbacks | + +`print_callback` generates a complete LLVM function inline: it calls `snprintf` twice (first to measure, then to write), allocates a heap buffer, handles errors via a phi node, then calls the `osdi_log` function pointer (loaded 
from the `osdi_log` global). The log level constants (`LOG_LVL_DEBUG` through `LOG_LVL_FATAL`) and `LOG_FMT_ERR` are OR'd into the flags argument. + +### `$limit` dispatch table + +When `eval` is built with `eval=true` and the module has `$limit` calls, `OsdiCompilationUnit::new` creates an external `OSDI_LIM_TABLE` array global. At runtime, the simulator links this with the actual limit function table emitted in the descriptor object. The eval function indexes into this table to dispatch user-defined limit functions by name. + +--- + +## 10. Exported Globals + +The descriptor LLVM module (the last object file, at index `modules.len() * 4`) exports: + +| Symbol | Type | Value | +|---|---|---| +| `OSDI_DESCRIPTORS` | `OsdiDescriptor[N]` | Array of N descriptor structs, one per module | +| `OSDI_DESCRIPTOR_SIZE` | `u32` | `LLVMABISizeOfType(tys.osdi_descriptor)` — byte stride for simulators supporting multiple OSDI versions | +| `OSDI_NUM_DESCRIPTORS` | `u32` | N (number of modules) | +| `OSDI_VERSION_MAJOR` | `u32` | `0` | +| `OSDI_VERSION_MINOR` | `u32` | `4` | +| `OSDI_LIM_TABLE` | `OsdiLimFunction[M]` | Emitted only if `lim_table` is non-empty; M = number of unique `$limit` functions | +| `OSDI_LIM_TABLE_LEN` | `u32` | M | +| `osdi_log` | `*fn(handle, msg, flags)` | Initialized to null pointer; the simulator fills this in after `dlopen` so models can emit log messages | + +A simulator loads the shared library, reads `OSDI_NUM_DESCRIPTORS`, then strides through `OSDI_DESCRIPTORS` by `OSDI_DESCRIPTOR_SIZE` bytes per entry. This stride-based iteration lets a simulator handle libraries compiled against a newer minor version without recompiling against the new header. 
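The stride-based walk described above can be sketched as follows. This is a minimal illustration under stated assumptions, not simulator code: `DescriptorPrefix` is a hypothetical two-field stand-in for the leading fields a simulator knows about (the real `OsdiDescriptor` has 46 fields), `read_descriptors` and `demo` are invented names, and the loaded library is simulated by a byte buffer rather than symbols resolved after `dlopen`.

```rust
use std::mem::size_of;

/// Hypothetical prefix of a descriptor, as known to an older simulator.
/// The real `OsdiDescriptor` is much larger; only the idea of a stable
/// leading layout matters here.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct DescriptorPrefix {
    num_nodes: u32,
    num_terminals: u32,
}

/// Read `count` descriptors from `base`, advancing `stride` bytes per entry
/// (the role played by `OSDI_DESCRIPTOR_SIZE`), not `size_of` the known struct.
fn read_descriptors(base: &[u8], stride: usize, count: usize) -> Vec<DescriptorPrefix> {
    assert!(stride >= size_of::<DescriptorPrefix>());
    assert!(base.len() >= stride * count);
    (0..count)
        .map(|i| {
            let entry = &base[i * stride..];
            // SAFETY: the asserts above guarantee at least `stride` readable
            // bytes at `entry`; `read_unaligned` tolerates any alignment.
            unsafe { std::ptr::read_unaligned(entry.as_ptr() as *const DescriptorPrefix) }
        })
        .collect()
}

/// Simulate a library built against a newer layout: 8 known bytes per entry
/// followed by 8 trailing bytes the older simulator does not understand.
fn demo() -> Vec<DescriptorPrefix> {
    let stride = 16;
    let mut blob = vec![0u8; stride * 2];
    for (i, (nodes, terms)) in [(3u32, 2u32), (5, 4)].iter().enumerate() {
        let off = i * stride;
        blob[off..off + 4].copy_from_slice(&nodes.to_ne_bytes());
        blob[off + 4..off + 8].copy_from_slice(&terms.to_ne_bytes());
    }
    read_descriptors(&blob, stride, 2)
}
```

Because the walk advances by the exported size and never by the size of the struct the simulator was compiled against, trailing fields added by a newer minor version are skipped without being interpreted.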
From 683129cda397404001f896ae1080c7ccd00c0344 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sat, 23 May 2026 12:49:50 +0200 Subject: [PATCH 05/28] docs: add mir_llvm INTERNALS Covers the MIR-to-LLVM-IR translation layer: LLVMBackend/ModuleLlvm ownership model, lifetime hierarchy ('t/'a/'ll/'cx), CodegenCx per-module context, Types primitive table, BuilderVal/MemLoc lazy-load abstraction, callback protocol (Prebuilt/Inline), build_func() RPO walk with deferred phi fix-up, complete opcode-to-LLVM dispatch table, fast-math signaling via negative srcloc, intrinsics/libm registry, and a resistor eval worked example tracing fmul/fadd/exit to LLVM IR. Co-Authored-By: Claude Sonnet 4.6 --- docs/mir_llvm/INTERNALS.md | 429 +++++++++++++++++++++++++++++++++++++ 1 file changed, 429 insertions(+) create mode 100644 docs/mir_llvm/INTERNALS.md diff --git a/docs/mir_llvm/INTERNALS.md b/docs/mir_llvm/INTERNALS.md new file mode 100644 index 00000000..7bdbd077 --- /dev/null +++ b/docs/mir_llvm/INTERNALS.md @@ -0,0 +1,429 @@ +# `mir_llvm` — Internals + +The LLVM codegen layer for OpenVAF's MIR. + +--- + +## 1. Purpose and Position + +`mir_llvm` translates a fully-optimised MIR `Function` (see [`docs/mir/INTERNALS.md`](../mir/INTERNALS.md)) into LLVM IR and then to a native object file. It sits at the very bottom of the pipeline, called by [`osdi`](../osdi/INTERNALS.md) for each of the four generated functions per Verilog-A module. + +The crate has a narrow mandate: + +- It owns the LLVM context, module, and target machine. +- It maps each MIR opcode to the corresponding LLVM builder call. +- It provides helpers for declaring globals, emitting constant values, linking bitcode, running the pass manager, and writing the object file. + +It does **not** implement optimisation passes (those run inside `mir_opt` on the MIR), define OSDI struct layouts (that is `osdi`'s job), or assign semantic meaning to opcodes (those are fixed by the MIR spec). + +--- + +## 2. 
Module Map + +| File | Role | +|---|---| +| `lib.rs` | `LLVMBackend`, `ModuleLlvm`, `LLVMString`; target machine creation; utility fns | +| `context.rs` | `CodegenCx<'a,'ll>` — per-module codegen context | +| `builder.rs` | `Builder<'a,'cx,'ll>`, `BuilderVal`, `MemLoc` — per-function instruction emitter | +| `types.rs` | `Types<'ll>` primitive type table; `const_*` / `ty_*` helpers on `CodegenCx` | +| `declarations.rs` | `declare_ext_fn`, `declare_int_fn`, `define_global`, `export_val`, `export_array`, … | +| `callbacks.rs` | `CallbackFun`, `BuiltCallbackFun`, `InlineCallbackBuilder` | +| `intrinsics.rs` | `intrinsic()` on `CodegenCx`; maps opcode names to LLVM intrinsics / libm symbols | + +--- + +## 3. Lifetime Hierarchy + +Four lifetime parameters appear throughout the crate. Understanding them avoids confusion when reading `Builder` signatures. + +``` +'t — the Target spec (essentially 'static; valid for the entire process) +'a — the codegen session: the Rodeo literal table, the Target ref +'ll — the LLVM module: every &'ll LLVMValue and &'ll LLVMType is valid + exactly as long as the owning ModuleLlvm is alive +'cx — the CodegenCx borrow inside a Builder (subset of 'a + 'll) +``` + +`Builder<'a, 'cx, 'll>` borrows: +- `'cx: 'a + 'll` — a `&'a CodegenCx<'cx, 'll>`, +- `'a` — the MIR `Function` and the LLVM builder handle, +- `'ll` — all produced `&'ll LLVMValue` / `&'ll LLVMBasicBlock` refs. + +Because `'ll` is tied to `ModuleLlvm`, the borrow checker prevents any generated value from outliving the module that owns it. + +--- + +## 4. `LLVMBackend` and `ModuleLlvm` + +### `LLVMBackend<'t>` + +```rust +pub struct LLVMBackend<'t> { + target: &'t Target, + target_cpu: String, + features: String, +} +``` + +The entry point for the codegen session. Constructed once per compilation unit with `LLVMBackend::new(cg_opts, target, target_cpu, target_features)`. 
The constructor resolves `"native"` CPU/features by calling `LLVMGetHostCPUName` / `LLVMGetHostCPUFeatures`, then merges in `target.options.features` and any extra `target_features` strings. + +Two factory methods create the per-module objects: + +```rust +pub unsafe fn new_module(&self, name: &str, opt_lvl: LLVMCodeGenOptLevel) -> Result +pub unsafe fn new_ctx<'a, 'll>(&'a self, literals: &'a Rodeo, module: &'ll ModuleLlvm) -> CodegenCx<'a, 'll> +``` + +### `ModuleLlvm` + +```rust +pub struct ModuleLlvm { + llcx: LLVMContextRef, + llmod_raw: LLVMModuleRef, + tm: LLVMTargetMachineRef, + opt_lvl: LLVMCodeGenOptLevel, +} +``` + +Owns the LLVM context (`llcx`) and module (`llmod_raw`) so their lifetimes are tied together. The `Drop` impl calls `LLVMContextDispose` and `LLVMDisposeTargetMachine`. + +Key methods: + +**`include_bitcode(bitcode: &[u8])`** — parses a slice of LLVM bitcode bytes into a new in-memory module, then links it into `self` with `LLVMLinkModules2`. This is how the embedded stdlib (math helpers, `osdi_log` glue) is merged into each generated module. + +**`optimize()`** — runs LLVM's new pass manager via `LLVMRunPasses`. The pipeline string is derived from `opt_lvl`: + +| `LLVMCodeGenOptLevel` | pipeline string | +|---|---| +| `LLVMCodeGenLevelNone` | `"default<O0>"` | +| `LLVMCodeGenLevelLess` | `"default<O1>"` | +| `LLVMCodeGenLevelDefault` | `"default<O2>"` | +| `LLVMCodeGenLevelAggressive` | `"default<O3>"` | + +**`emit_object(dst: &Path)`** — calls `LLVMTargetMachineEmitToFile` with `LLVMObjectFile` to write a native `.o`. + +**`verify()` / `verify_and_print()`** — wrap `LLVMVerifyModule`; useful during development and testing. + +**`to_str()`** — calls `LLVMPrintModuleToString`; returns the textual LLVM IR as an `LLVMString`. Used in tests. + +The target machine is always created with `LLVMRelocPIC` (position-independent code) and `LLVMCodeModelDefault`, matching the requirements of a shared library output. + +--- + +## 5. 
`CodegenCx` + +```rust +pub struct CodegenCx<'a, 'll> { + pub llmod: &'ll LLVMModule, + pub llcx: &'ll LLVMContext, + pub target: &'a Target, + pub literals: &'a Rodeo, + str_lit_cache: RefCell>, + pub(crate) intrinsics: RefCell>, + pub(crate) local_gen_sym_counter: Cell, + pub(crate) tys: Types<'ll>, +} +``` + +`CodegenCx` is the per-module shared state used by every `Builder`. It holds: + +- The LLVM module and context refs (borrowed from `ModuleLlvm`). +- The `Rodeo` literal table from the HIR layer, needed to resolve `Const::Str` values. +- `str_lit_cache` — deduplicates string literal globals. The first time a `Spur` is requested, `const_str()` creates an internal-linkage global `[N x i8]` constant and caches it; subsequent references return the same global. +- `intrinsics` — lazily populated by `intrinsic()` (see §12). +- `local_gen_sym_counter` — a monotonic counter used by `generate_local_symbol_name(prefix)`, which produces names like `"str.0"`, `"cb.3"`, `"arr.7"` for internal symbols. +- `tys: Types<'ll>` — the primitive type table (see §6). + +`include_bitcode()` delegates to `ModuleLlvm`'s own method through the stored module pointer. + +--- + +## 6. `Types` and Constant Helpers + +`Types<'ll>` is a plain struct that pre-builds the primitive LLVM types once and caches them as `&'ll Type` references: + +| Field | LLVM type | +|---|---| +| `double` | `double` (f64) | +| `char` | `i8` | +| `int` | `i32` | +| `size` | `iN` where N = `target.pointer_width` | +| `ptr` | `i8*` (opaque pointer in address space 0) | +| `fat_ptr` | `{ i8*, i64 }` (pointer + metadata word) | +| `bool` | `i1` | +| `void` | `void` | +| `null_ptr_val` | `null` constant of type `i8*` | + +`CodegenCx` exposes these through `ty_double()`, `ty_int()`, `ty_ptr()`, etc., plus `ty_aint(bits)` for arbitrary-width integers, `ty_struct(name, elems)`, `ty_func(args, ret)`, and `ty_variadic_func(args, ret)`. 
+ +The `const_*` family on `CodegenCx` covers the common constant types: + +```rust +const_real(f64) // LLVMConstReal +const_int(i32) // LLVMConstInt(i32, signed) +const_unsigned_int(u32) // LLVMConstInt(i32, signed) [same underlying call] +const_isize(isize) // LLVMConstInt(iN) +const_bool(bool) // LLVMConstInt(i1) +const_c_bool(bool) // LLVMConstInt(i8) +const_u8(u8) // LLVMConstInt(i8) +const_arr(elem_ty, vals) // LLVMConstArray2 +const_struct(ty, vals) // LLVMConstNamedStruct +const_null_ptr() // cached null i8* +const_undef(ty) // LLVMGetUndef +``` + +`const_val(&Const)` dispatches over `mir::Const` variants (`Float`, `Int`, `Bool`, `Str`) to the appropriate helper. + +`declarations.rs` adds higher-level helpers: + +- `declare_ext_fn(name, fn_type)` — C calling convention, no unnamed-address. +- `declare_int_fn(name, fn_type)` — FastCall convention, internal linkage, global-unnamed-addr (addresses are never significant; enables merging). +- `declare_int_c_fn(name, fn_type)` — C calling convention, internal linkage. +- `define_global(name, ty)` — adds a named global; returns `None` if already defined. +- `define_private_global(ty)` — unnamed global with private linkage. +- `export_val(name, ty, val, is_const)` — external linkage + DLLExport storage class; used for `OSDI_DESCRIPTORS` et al. +- `export_array(name, elem_ty, vals, is_const, add_cnt)` — array global; if `add_cnt`, also exports `.cnt` as a `size_t`. +- `const_arr_ptr(elem_ty, vals)` — an internal-linkage immutable array global; returns a pointer to it. + +--- + +## 7. `BuilderVal` and `MemLoc` + +MIR SSA values do not map one-for-one to LLVM IR values: some values live in memory and must be loaded before use. 
`BuilderVal` represents this: + +```rust +pub enum BuilderVal<'ll> { + Undef, // not yet defined (initialiser for the values table) + Eager(&'ll LLVMValue), // already an LLVM IR value + Load(Box>), // must be loaded with LLVMBuildLoad2 +} +``` + +`BuilderVal::get(&builder)` materialises the value: for `Eager` it is a direct return; for `Load` it emits a `load` instruction at the current insertion point. + +`MemLoc<'ll>` is a GEP descriptor: + +```rust +pub struct MemLoc<'ll> { + pub ptr: &'ll LLVMValue, // base pointer + pub ptr_ty: &'ll LLVMType, // type of the pointed-to aggregate + pub ty: &'ll LLVMType, // type of the field being accessed + pub indices: Box<[&'ll LLVMValue]>, +} +``` + +`MemLoc::struct_gep(ptr, ptr_ty, ty, idx, cx)` is the convenience constructor for a single-field struct access. `to_ptr()` emits a `getelementptr` to compute the field address; `read()` follows with a `load`. + +The primary use case is Jacobian pointer slots: `osdi` populates `builder.params` with `BuilderVal::Load` entries that point into the instance data struct. When `build_inst` encounters a `Param` value, `build_consts()` has already copied `params[p]` into `values[v]`, so the load is emitted lazily on first use. + +--- + +## 8. Callback Protocol + +MIR `Call` instructions reference a `FuncRef` — an abstract handle, not a concrete symbol. The mapping from `FuncRef` to an actual LLVM function is resolved by the caller (i.e., `osdi`) before `build_func()` is invoked. The resolution is stored in `Builder::callbacks: TiVec>>`. + +```rust +pub enum CallbackFun<'ll> { + Prebuilt(BuiltCallbackFun<'ll>), + Inline { builder: Box>, state: Box<[&'ll LLVMValue]> }, +} + +pub struct BuiltCallbackFun<'ll> { + pub fun_ty: &'ll LLVMType, + pub fun: &'ll LLVMValue, + pub state: Box<[&'ll LLVMValue]>, + pub num_state: u32, +} +``` + +For `Prebuilt`: the builder prepends `state` values to the MIR-supplied arguments and emits `LLVMBuildCall2`. 
A non-zero `num_state` means that `state` holds `num_state` entries per call *instance*: the builder iterates `state.len() / num_state` times, calling the function once per slice — used for multi-instance Jacobian store callbacks. + +For `Inline`: `InlineCallbackBuilder::build_inline(builder, state)` is called to emit the required instructions directly, without producing a separate function. The `state` values are pre-bound closure arguments. This is used for the `$limit` dispatch table in the OSDI backend. + +If a `FuncRef` is absent from `callbacks` (i.e., `None`), the `Call` instruction is silently dropped. This makes it safe to lower a function that contains callbacks not needed in the current context (e.g., an `init` function that ignores Jacobian callbacks). + +`CodegenCx` also provides callback factories: +- `const_callback(args, val)` — synthesises an internal function that ignores all arguments and returns `val`. +- `trivial_callbacks(args)` — synthesises a no-op `void` function (used to zero out callbacks that are not relevant). +- `const_return(args, idx)` — synthesises a function that returns its `idx`-th argument unchanged. + +--- + +## 9. `build_func()` — the Translation Walk + +`Builder::build_func()` translates the entire MIR `Function` into LLVM IR in three steps (Steps 1 to 3), after a setup step (Step 0) that `Builder::new()` has already performed. + +**Step 0 — entry block and basic block allocation.** `Builder::new()` pre-allocates one `LLVMBasicBlock` per MIR block using `LLVMAppendBasicBlockInContext`. It also allocates a synthetic entry block, positions the builder there, and optionally allocates a stack slot (`LLVMBuildAlloca`) for the return value if the function is non-void. + +**Step 1 — constant seeding.** `build_consts()` iterates `func.dfg.values()`. For each `ValueDef::Const(c)`, it calls `cx.const_val(&c)` and stores the result as `BuilderVal::Eager`. For each `ValueDef::Param(p)`, it copies `params[p]` (which was filled in by the caller). 
`ValueDef::Result` values are left as `Undef` until the defining instruction is processed. + +**Step 2 — block iteration.** `build_func()` computes a `ControlFlowGraph`, collects a postorder traversal, reverses it to obtain RPO (dominators before uses), and calls `build_bb(bb)` for each block. Inside `build_bb`, every instruction in the block is passed to `build_inst(inst, fast_math_mode)`. The fast-math mode is determined by the instruction's `srcloc`: a negative source location flags `FastMathMode::Partial` (see §11). + +**Step 3 — phi operand fix-up.** `PhiNode` instructions are emitted during the forward pass as empty `LLVMBuildPhi` placeholders added to `unfinished_phis`. After all blocks are processed, the outer loop iterates `unfinished_phis` and calls `LLVMAddIncoming` to wire each phi's predecessor values, which are all known by this point. + +The entry block's only instruction is an unconditional branch to the MIR entry block (`LLVMBuildBr`). This matches the LLVM convention that `alloca`s must live in the function entry block. + +--- + +## 10. Opcode-to-LLVM Dispatch + +`build_inst` first matches on `InstructionData` variant to handle structural cases, then dispatches on `Opcode` for computation opcodes. 
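The two-stage dispatch just described can be pictured with a toy model. This is only a sketch under simplifying assumptions: the enums below are cut-down, field-less stand-ins for the real `InstructionData` and `Opcode` types, `lower` and `lower_opcode` are hypothetical names, and the functions return the name of the LLVM builder call or intrinsic as a string instead of emitting IR.

```rust
/// Cut-down stand-in for the MIR opcode enum (illustrative subset only).
#[derive(Clone, Copy, Debug)]
enum Opcode {
    Iadd,
    Fadd,
    Fmul,
    Fneg,
    Sqrt,
}

/// Cut-down stand-in for `InstructionData`; the real variants carry operands.
#[derive(Clone, Copy, Debug)]
enum InstructionData {
    Jump,
    Branch,
    Unary(Opcode),
    Binary(Opcode),
}

/// Stage 1: structural variants are handled directly;
/// computation variants fall through to the opcode table.
fn lower(inst: InstructionData) -> &'static str {
    match inst {
        InstructionData::Jump => "LLVMBuildBr",
        InstructionData::Branch => "LLVMBuildCondBr",
        InstructionData::Unary(op) | InstructionData::Binary(op) => lower_opcode(op),
    }
}

/// Stage 2: one arm per opcode, mirroring a few rows of the dispatch table.
fn lower_opcode(op: Opcode) -> &'static str {
    match op {
        Opcode::Iadd => "LLVMBuildAdd",
        Opcode::Fadd => "LLVMBuildFAdd",
        Opcode::Fmul => "LLVMBuildFMul",
        Opcode::Fneg => "LLVMBuildFNeg",
        Opcode::Sqrt => "llvm.sqrt.f64",
    }
}
```

Splitting the match this way keeps control-flow handling (branches, jumps, phis, calls) apart from the opcode table, so adding an arithmetic opcode touches only the second stage.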
+ +### Structural cases (handled first) + +| `InstructionData` variant | LLVM emission | +|---|---| +| `Branch { cond, then_dst, else_dst }` | `LLVMBuildCondBr` | +| `Jump { destination }` | `LLVMBuildBr` | +| `Exit` | `ret void` if `return_void`, else load `ret_allocated` + `ret` | +| `PhiNode(phi)` | `LLVMBuildPhi` placeholder; deferred operand fill (§9) | +| `Call { func_ref, args }` | dispatch through `builder.callbacks[func_ref]` (§8) | + +### Computation opcodes (`Unary` / `Binary`) + +| MIR opcode | LLVM instruction / intrinsic | +|---|---| +| `Iadd` | `LLVMBuildAdd` | +| `Isub` | `LLVMBuildSub` | +| `Imul` | `LLVMBuildMul` | +| `Idiv` | `LLVMBuildSDiv` | +| `Irem` | `LLVMBuildSRem` | +| `Ishl` | `LLVMBuildShl` | +| `Ishr` | `LLVMBuildLShr` (logical right shift) | +| `Ixor` | `LLVMBuildXor` | +| `Iand` | `LLVMBuildAnd` | +| `Ior` | `LLVMBuildOr` | +| `Ineg` | `LLVMBuildNeg` | +| `Inot` / `Bnot` | `LLVMBuildNot` | +| `Fadd` | `LLVMBuildFAdd` | +| `Fsub` | `LLVMBuildFSub` | +| `Fmul` | `LLVMBuildFMul` | +| `Fdiv` | `LLVMBuildFDiv` | +| `Frem` | `LLVMBuildFRem` | +| `Fneg` | `LLVMBuildFNeg` | +| `Ilt`/`Igt`/`Ile`/`Ige` | `LLVMBuildICmp` (SLT/SGT/SLE/SGE) | +| `Ieq`/`Beq`/`Ine`/`Bne` | `LLVMBuildICmp` (EQ/NE) | +| `Flt`/`Fgt`/`Fle`/`Fge` | `LLVMBuildFCmp` (OLT/OGT/OLE/OGE) | +| `Feq`/`Fne` | `LLVMBuildFCmp` (OEQ/ONE) | +| `IFcast` | `LLVMBuildSIToFP` → `double` | +| `BFcast` | `LLVMBuildUIToFP` → `double` | +| `BIcast` | `LLVMBuildIntCast2` → `i32` | +| `IBcast` | `LLVMBuildICmp(NE, val, 0)` | +| `FBcast` | `LLVMBuildFCmp(ONE, val, 0.0)` | +| `FIcast` | `llvm.lround.i32.f64` (round-to-nearest-integer) | +| `Sqrt` | `llvm.sqrt.f64` | +| `Exp` | `llvm.exp.f64` | +| `Ln` | `llvm.log.f64` | +| `Log` | `llvm.log10.f64` | +| `Sin` | `llvm.sin.f64` | +| `Cos` | `llvm.cos.f64` | +| `Pow` | `llvm.pow.f64` | +| `Floor` | `llvm.floor.f64` | +| `Ceil` | `llvm.ceil.f64` | +| `Clog2` | `llvm.ctlz(val, true)` then `LLVMBuildSub(32, ctlz)` | +| `Tan` | `tan` (libm) | +| `Hypot` 
| `hypot` (Linux) / `_hypot` (Windows) | +| `Asin`/`Acos`/`Atan`/`Atan2` | `asin`/`acos`/`atan`/`atan2` (libm) | +| `Sinh`/`Cosh`/`Tanh` | `sinh`/`cosh`/`tanh` (libm) | +| `Asinh`/`Acosh`/`Atanh` | `asinh`/`acosh`/`atanh` (libm) | +| `Seq` | `strcmp(a, b) == 0` | +| `Sne` | `strcmp(a, b) != 0` | +| `OptBarrier` | transparent passthrough (returns `values[args[0]]` unchanged) | + +The `Hypot` Windows special case (`_hypot`) is the only target-conditional path in the entire opcode table; it checks `target.options.is_like_windows`. + +`OptBarrier` lowers to a no-op: it exists only to prevent MIR optimisation passes from folding across the barrier. By the time we reach codegen, its operand is simply forwarded. + +--- + +## 11. Fast-Math Signaling + +LLVM's fast-math flags permit IEEE-754 relaxations that enable vectorisation and constant folding of floating-point expressions. OpenVAF applies them selectively rather than globally. + +The signal is encoded in the `srcloc` field of the MIR instruction. A **negative** source location value is the convention for "this instruction was synthesised for performance and may be optimised aggressively." When `build_bb` encounters such an instruction it passes `FastMathMode::Partial`; otherwise `FastMathMode::Disabled`. + +```rust +let fast_math = self.func.srclocs.get(inst).map_or(false, |loc| loc.0 < 0); +``` + +After emitting the LLVM instruction, `build_inst` sets the flags: + +| Mode | `LLVMSetFastMathFlags` value | Semantics | +|---|---|---| +| `Partial` | `0x01 \| 0x02 \| 0x10` | Reassoc \| Reciprocal \| Contract | +| `Full` | `0x1F` | All flags (defined but not currently used by `build_bb`) | +| `Disabled` | (not called) | strict IEEE-754 | + +Only float and transcendental opcodes (`Fadd`, `Fsub`, `Fmul`, `Fdiv`, `Frem`, `Fneg`, comparisons, and all math functions) receive the annotation; integer opcodes ignore it. + +--- + +## 12. 
Intrinsics and libm + +`CodegenCx::intrinsic(name)` is the single lookup point for both LLVM intrinsics and C library symbols. It checks `self.intrinsics` (the `RefCell`) first; on a miss it uses the `ifn!` macro to declare the function and populate the cache. + +LLVM intrinsics (e.g., `llvm.sqrt.f64`) are declared with `declare_ext_fn`; they are resolved by the LLVM backend without any external symbol. C library functions (e.g., `tan`, `atanh`) are also declared as `extern` with C calling convention — the linker resolves them from the system libm when the final shared library is linked. + +The `snprintf` entry is variadic: + +```rust +if name == "snprintf" { + return Some(self.insert_intrinsic("snprintf", &[t_str, t_isize, t_str], t_i32, true)); +} +``` + +This is used by `osdi`'s `print_callback` to format diagnostic messages before passing them to `osdi_log`. + +--- + +## 13. Worked Example — Resistor `eval` + +Starting from the resistor Verilog-A: + +```verilog +V(a,b) <+ R * I(a,b); +``` + +After `hir_lower`, `sim_back` AD, and `mir_opt`, the `eval` function body contains (simplified) MIR: + +``` +; params: p0 = R (model param), p1 = I(a,b) (branch current) +v10 = fmul p0, p1 ; R * I(a,b) +v11 = fadd v10, v12 ; add to existing residual accumulator +exit +``` + +`osdi` calls `Builder::new(cx, &eval_func, llfunc, None, true)` (returns void). Before calling `build_func()` it: + +1. Populates `builder.params`: `params[0] = BuilderVal::Load(MemLoc for R field)`, `params[1] = BuilderVal::Eager(branch_current_val)`. +2. Populates `builder.callbacks` with the resolved Jacobian store callbacks. +3. Sets `builder.ret_store_ptr` to the flags output pointer. + +`build_consts()` copies `params[0]` and `params[1]` into `values[p0]` and `values[p1]`. + +`build_func()` processes the single block. For `fmul p0, p1`: + +1. `values[p0].get(builder)` emits `load double, ptr %R_ptr` (because it is `BuilderVal::Load`). +2. 
`values[p1].get(builder)` returns the eager branch current value directly. +3. `LLVMBuildFMul` emits `%v10 = fmul double %R_loaded, %branch_current`. + +For `fadd v10, v12`: both operands are `Eager`; `LLVMBuildFAdd` emits `%v11 = fadd double %v10, %v12`. + +For `exit` with `return_void = true`: `ret_void()` stores the return flags and emits `ret void`. + +The resulting LLVM IR fragment (as `ModuleLlvm::to_str()` would show) is: + +```llvm +define internal fastcc void @eval.0(ptr %inst, ptr %model, ...) { +entry: + br label %bb0 +bb0: + %R_loaded = load double, ptr %R_ptr + %v10 = fmul double %R_loaded, %branch_current + %v11 = fadd double %v10, %v12 + store i32 %flags, ptr %flags_out + ret void +} +``` + +After `ModuleLlvm::optimize()`, the `load` and arithmetic may be folded further or vectorised depending on the opt level. `emit_object()` then writes the native object file. From b47b0e481ac025ce30c62b40eaf3882ab7111801 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sat, 23 May 2026 13:39:44 +0200 Subject: [PATCH 06/28] docs: add hir_def INTERNALS Covers the first compilation layer above parsing: Salsa query architecture (InternDB/HirDefDB), ItemTree as the incremental invalidation barrier, three-layer ID system (ItemTreeId/ItemLoc/interned IDs), DefMap scope tree and DefCollector, ScopeId cross-map path resolution, DefWithBodyId/Body/BodySourceMap, unresolved Expr/Stmt trees, *Data semantic query layer, and a full resistor_va trace from source through ItemTree -> DefMap -> Body. 
Co-Authored-By: Claude Sonnet 4.6 --- docs/hir_def/INTERNALS.md | 568 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 568 insertions(+) create mode 100644 docs/hir_def/INTERNALS.md diff --git a/docs/hir_def/INTERNALS.md b/docs/hir_def/INTERNALS.md new file mode 100644 index 00000000..9a92bf3d --- /dev/null +++ b/docs/hir_def/INTERNALS.md @@ -0,0 +1,568 @@ +# `hir_def` — Internals + +The first compilation layer above parsing: item collection, name resolution, and expression lowering. + +--- + +## 1. Purpose and Position + +`hir_def` is the crate that turns a parsed Verilog-A source file into the compiler's internal representation. It sits immediately above `basedb` (which owns the file system, the virtual file system, and the raw CST parser) and immediately below `hir_ty` (which resolves types and checks semantic correctness). + +The crate produces four independent artifacts, each cached as a Salsa query: + +1. **`ItemTree`** — a flat, body-free summary of all item declarations in a file. This is the *invalidation barrier*: editing inside a function body does not change the `ItemTree`, so name resolution and item data do not need to rerun. +2. **`DefMap`** — the result of name resolution: a tree of scopes, each mapping names to `ScopeDefItem` handles. +3. **`Body`** — the normalized expression/statement tree for each definition that has a body (modules, parameters, variables, functions, nature attributes). +4. **`*Data` queries** — semantic structs (`DisciplineData`, `NatureData`, `ModuleData`, …) that present item information without requiring the caller to hold a reference to the `ItemTree`. + +--- + +## 2. 
Module Map + +| File | Role | +|---|---| +| `lib.rs` | ID types and `impl_intern!` macro; `ItemLoc`, `DefWithBodyId`, `ScopeId`; `Intern`/`Lookup` traits | +| `db.rs` | `InternDB` and `HirDefDB` Salsa query group traits | +| `item_tree.rs` | `ItemTree`, `ItemTreeData`, all item structs, `ItemTreeNode` trait | +| `item_tree/lower.rs` | `Ctx` — lowering from CST to `ItemTree` | +| `nameres.rs` | `DefMap`, `Scope`, `ScopeDefItem`, `ScopeOrigin`, path resolution | +| `nameres/collect.rs` | `DefCollector` — walks `ItemTree` to build `DefMap` | +| `body.rs` | `Body`, `BodySourceMap`, `body_with_sourcemap_query` | +| `body/lower.rs` | `LowerCtx` — lowering from CST expressions/statements to `Body` | +| `expr.rs` | `Expr`, `ExprId`, `Stmt`, `StmtId`, `Literal` | +| `data.rs` | `DisciplineData`, `NatureData`, `VarData`, `ParamData`, `NodeData`, `BranchData`, `FunctionData`, `ModuleData`, `AliasParamData` | +| `builtin.rs` | `BuiltIn`, `ParamSysFun`; `insert_builtin_scope` | +| `path.rs` | `Path` | +| `types.rs` | `Type` enum | + +--- + +## 3. Salsa Query Architecture + +The crate is structured around two Salsa query group traits defined in `db.rs`. + +### `InternDB` + +Provides bidirectional interning for all compound location types. For each kind of definition there is a symmetric pair of queries: + +``` +intern_module(ModuleLoc) → ModuleId lookup_intern_module(ModuleId) → ModuleLoc +intern_param(ParamLoc) → ParamId lookup_intern_param(ParamId) → ParamLoc +intern_var(VarLoc) → VarId lookup_intern_var(VarId) → VarLoc +intern_nature(NatureLoc) → NatureId ... 
+intern_discipline(DisciplineLoc) → DisciplineId +intern_block(BlockLoc) → BlockId +intern_branch(BranchLoc) → BranchId +intern_function(FunctionLoc) → FunctionId +intern_nature_attr(NatureAttrLoc) → NatureAttrId +intern_discipline_attr(DisciplineAttrLoc) → DisciplineAttrId +intern_node(NodeLoc) → NodeId +intern_function_arg(FunctionArgLoc) → FunctionArgId +intern_alias_param(AliasParamLoc) → AliasParamId +``` + +### `HirDefDB` + +Provides the derived queries that downstream crates consume: + +| Query | Input | Output | Notes | +|---|---|---|---| +| `item_tree` | `FileId` | `Arc<ItemTree>` | Parses file; strips bodies | +| `def_map` | `FileId` | `Arc<DefMap>` | Root-file name resolution | +| `block_def_map` | `BlockId` | `Option<Arc<DefMap>>` | Named block scope; `None` if block has no declarations | +| `function_def_map` | `FunctionId` | `Arc<DefMap>` | Function-body scope | +| `body` | `DefWithBodyId` | `Arc<Body>` | Normalized expression tree | +| `body_with_sourcemap` | `DefWithBodyId` | `(Arc<Body>, Arc<BodySourceMap>)` | With AST provenance | +| `param_body_with_sourcemap` | `ParamId` | `(Arc<Body>, Arc<BodySourceMap>, ParamExprs)` | Parameter default/bounds | +| `discipline_data` | `DisciplineId` | `Arc<DisciplineData>` | | +| `nature_data` | `NatureId` | `Arc<NatureData>` | | +| `var_data` | `VarId` | `Arc<VarData>` | | +| `param_data` | `ParamId` | `Arc<ParamData>` | | +| `node_data` | `NodeId` | `Arc<NodeData>` | | +| `branch_data` | `BranchId` | `Arc<BranchData>` | | +| `function_data` | `FunctionId` | `Arc<FunctionData>` | | +| `module_data` | `ModuleId` | `Arc<ModuleData>` | | +| `alias_data` | `AliasParamId` | `Arc<AliasParamData>` | | + +The dependency graph that Salsa tracks automatically: `item_tree` is a leaf (depends only on the parsed CST). `def_map` depends on `item_tree`. `body` depends on `def_map` and `item_tree`. When a file changes, Salsa invalidates `item_tree` for that file; if the resulting `ItemTree` is structurally identical (only a body expression changed), `def_map` and `body` for unrelated definitions are not recomputed. + +--- + +## 4. 
`ItemTree` — the AST Invalidation Barrier + +```rust +pub struct ItemTree { + pub top_level: Box<[RootItem]>, + pub(crate) data: ItemTreeData, + pub(crate) blocks: AHashMap, Block>, +} +``` + +The `ItemTree` captures every *item declaration* in a source file while deliberately omitting expression bodies. This means it is unchanged when the user edits inside an `analog begin ... end` block or a parameter default expression — making it an ideal Salsa invalidation boundary. + +`top_level` lists the root items: only `Module`, `Nature`, and `Discipline` can appear at file scope. + +`data` is the flat `ItemTreeData` struct, which holds one `Arena` per item kind: + +``` +modules, disciplines, natures, nature_attrs, discipline_attrs, +variables, parameters, alias_parameters, nets, ports, branches, functions +``` + +`blocks` maps each named `begin : name` block (by its stable `AstId`) to a `Block { name, scope_items }`. + +### `ItemTreeNode` trait + +Every item type implements `ItemTreeNode`: + +```rust +pub trait ItemTreeNode: Clone { + type Source: AstNode; + fn name(&self) -> &Name; + fn ast_id(&self) -> AstId; + fn lookup(tree: &ItemTree, index: Idx) -> &Self; + fn id_from_mod_item(mod_item: ScopeItem) -> Option>; + fn id_to_mod_item(id: ItemTreeId) -> ScopeItem; +} +``` + +This makes the item types generic over the `ScopeItem` enum via the `item_tree_nodes!` macro. `Index> for ItemTree` is also derived by the macro, so `tree[idx]` works for all item types. 
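The typed-index pattern behind `tree[idx]` can be illustrated with a minimal, std-only sketch. Everything here is a simplified stand-in, not the crate's code: `Idx`, `Arena`, and the two-field `ItemTree` are hypothetical reductions, and the `Index` impls are written by hand where the real crate generates them with `item_tree_nodes!`.

```rust
use std::marker::PhantomData;
use std::ops::Index;

/// Hypothetical typed arena index: the type parameter only tags the index.
struct Idx<T> {
    raw: u32,
    _ty: PhantomData<fn() -> T>,
}

// Manual impls so that `Idx<T>` is `Copy` even when `T` is not.
impl<T> Clone for Idx<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Idx<T> {}

/// Hypothetical arena: a growable vector addressed by typed indices.
struct Arena<T> {
    data: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { data: Vec::new() }
    }

    fn alloc(&mut self, value: T) -> Idx<T> {
        let raw = self.data.len() as u32;
        self.data.push(value);
        Idx { raw, _ty: PhantomData }
    }
}

struct Module {
    name: String,
}

struct Param {
    name: String,
}

/// Reduced stand-in for `ItemTree`: one arena per item kind.
struct ItemTree {
    modules: Arena<Module>,
    parameters: Arena<Param>,
}

// One `Index` impl per item kind lets the index type select the arena.
impl Index<Idx<Module>> for ItemTree {
    type Output = Module;
    fn index(&self, idx: Idx<Module>) -> &Module {
        &self.modules.data[idx.raw as usize]
    }
}

impl Index<Idx<Param>> for ItemTree {
    type Output = Param;
    fn index(&self, idx: Idx<Param>) -> &Param {
        &self.parameters.data[idx.raw as usize]
    }
}

fn demo() -> (String, String) {
    let mut tree = ItemTree { modules: Arena::new(), parameters: Arena::new() };
    let m = tree.modules.alloc(Module { name: "resistor".to_string() });
    let p = tree.parameters.alloc(Param { name: "R".to_string() });
    // Both indices hold the raw value 0; the type tag picks the arena.
    (tree[m].name.clone(), tree[p].name.clone())
}

fn main() {
    let (module_name, param_name) = demo();
    println!("{module_name} {param_name}");
}
```

Because the tag is part of the index type, passing an `Idx<Param>` where an `Idx<Module>` is expected is a compile-time error rather than a silent lookup in the wrong arena.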
+ +### Item structs + +| Type | Key fields | +|---|---| +| `Module` | `name`, `nodes: TiVec`, `num_ports: u32`, `items: Vec` | +| `Node` | `name`, `is_port: bool`, `decls: Vec` | +| `Param` | `name`, `ty: Option`, `is_local: bool` | +| `Var` | `name`, `ty: Type` | +| `AliasParam` | `name`, `src: Option` | +| `Branch` | `name`, `kind: BranchKind` | +| `Function` | `name`, `ty: Type`, `args: TiVec`, `items: Vec` | +| `Nature` | `name`, `parent: Option`, `access`, `ddt_nature`, `idt_nature`, `abstol`, `units`, `attrs: IdxRange` | +| `Discipline` | `name`, `potential: Option<(NatureRef, LocalDisciplineAttrId)>`, `flow`, `domain: Option<(Domain, LocalDisciplineAttrId)>` | + +`BranchKind` encodes the three syntactic forms of a Verilog-A branch declaration: + +```rust +pub enum BranchKind { + PortFlow(Path), // branch(port_name) + NodeGnd(Path), // branch(node_name) — second node is ground + Nodes(Path, Path), // branch(node_a, node_b) + Missing, // parse error recovery +} +``` + +A `Node` groups together all declarations that refer to the same electrical node. A module port like `inout p;` and a subsequent `electrical p;` both produce `NodeTypeDecl` entries pointing at `p`'s `Node`. The `decls: Vec` field holds `NodeTypeDecl::Port(port_idx)` or `NodeTypeDecl::Net(net_idx)` for each declaration of that node. + +--- + +## 5. ID System: Three Layers + +Every definition in `hir_def` is identified at three levels of abstraction. + +### Layer 1 — `ItemTreeId` + +```rust +pub type ItemTreeId = Idx; +``` + +A typed arena index into `ItemTreeData`. It is local to a specific `ItemTree` (i.e., a specific file). Two items in different files can have the same `ItemTreeId` and mean different things. + +### Layer 2 — `ItemLoc` + +```rust +pub struct ItemLoc { + pub scope: ScopeId, + pub id: ItemTreeId, +} +``` + +A globally unique item location: the `ScopeId` identifies which file and scope the item lives in; the `ItemTreeId` identifies which item within that file's `ItemTree`. 
`ModuleLoc` is a type alias for `ItemLoc<Module>`, `ParamLoc` for `ItemLoc<Param>`, and so on. + +### Layer 3 — Interned IDs + +```rust +pub struct ModuleId(salsa::InternId); +pub struct ParamId(salsa::InternId); +// … one per item kind +``` + +These are opaque handles that wrap a Salsa-interned integer. They are `Copy`, `Hash`, and stable across incremental recomputation. The `impl_intern!` macro generates both the struct and the `Intern` / `Lookup` trait implementations: + +```rust +impl Intern for ModuleLoc { + type ID = ModuleId; + fn intern(self, db: &dyn HirDefDB) -> ModuleId { db.intern_module(self) } +} +impl Lookup for ModuleId { + type Data = ModuleLoc; + fn lookup(&self, db: &dyn HirDefDB) -> ModuleLoc { db.lookup_intern_module(*self) } +} +``` + +Callers obtain the full location with `id.lookup(db)`, and from there can reach the `ItemTree` entry with `loc.item_tree(db)[loc.id]`. + +--- + +## 6. `DefMap` and Name Resolution + +```rust +pub struct DefMap { + src: DefMapSource, + scopes: Arena, + root_scope: LocalScopeId, + pub diagnostics: Vec, +} + +pub struct Scope { + pub origin: ScopeOrigin, + parent: Option, + pub children: IndexMap, + pub declarations: IndexMap, +} +``` + +A `DefMap` is a flat arena of `Scope`s connected by parent links. `root_scope` is the entry scope (always index 0). `DefMap::entry()` returns `LocalScopeId::from(0u32)`. 
+ +**`DefMapSource`** records which Salsa query produced this map: + +```rust +pub enum DefMapSource { + Root, // def_map(file) + Block(BlockId), // block_def_map(block) + Function(FunctionId), // function_def_map(fun) +} +``` + +**`ScopeOrigin`** records what construct opened the scope: + +```rust +pub enum ScopeOrigin { + Root, + Module(ModuleId), + Block(BlockId), + Function(FunctionId), +} +``` + +**`ScopeDefItem`** is the union of all things a name can resolve to: + +```rust +pub enum ScopeDefItem { + ModuleId(ModuleId), BlockId(BlockId), + NatureId(NatureId), NatureAccess(NatureAccess), DisciplineId(DisciplineId), + NodeId(NodeId), VarId(VarId), ParamId(ParamId), AliasParamId(AliasParamId), + BranchId(BranchId), FunctionId(FunctionId), + ParamSysFun(ParamSysFun), // $mfactor, $vflip, etc. + BuiltIn(BuiltIn), // abs, sin, V, I, … + FunctionReturn(FunctionId), // the implicit return variable of an analog function + FunctionArgId(FunctionArgId), + NatureAttrId(NatureAttrId), +} +``` + +`NatureAccess(NatureAttrId)` deserves a note: in Verilog-A, each nature defines an *access function* (e.g., `Voltage` defines `V`, `Current` defines `I`). OpenVAF represents this as a `NatureAccess` in the scope, pointing at the nature attribute that records the access name. This is how `V(a,b)` and `I(a,b)` resolve. + +### `DefCollector` and scope building + +`nameres/collect.rs` provides three entry points: + +- `collect_root_def_map(db, file)` — walks `tree.top_level`, creating child scopes for each Module, Nature, and Discipline. Inside each module scope it registers nodes, parameters, variables, branches, and functions as declarations. +- `collect_function_map(db, fun)` — builds the function's own scope with its arguments and local variables. +- `collect_block_map(db, block)` — builds a named block's scope. Returns `None` if the block has no declarations (no scope needed). 
+ +### Builtin scope + +```rust +static BUILTIN_SCOPE: Lazy> = Lazy::new(|| { + let mut scope = IndexMap::default(); + insert_builtin_scope(&mut scope); + scope +}); +``` + +The builtin scope is initialized once (lazily) and contains all Verilog-A built-in functions (`BuiltIn` variants: `abs`, `sin`, `exp`, `ln`, `V`, `I`, …) and system parameters (`ParamSysFun` variants: `$mfactor`, `$vflip`, etc.). During name resolution, when a lookup exhausts all parent scopes, the collector falls back to `BUILTIN_SCOPE`. + +--- + +## 7. `ScopeId` and Path Resolution + +```rust +pub struct ScopeId { + pub root_file: FileId, + pub local_scope: LocalScopeId, + pub src: DefMapSource, +} +``` + +`ScopeId` is a portable scope reference that can be stored inside `Body` (in `stmt_scopes`) without holding a reference to the `DefMap` itself. Given a `db`, `ScopeId::def_map(db)` reconstructs the owning `DefMap` by dispatching on `src`. + +### Path resolution + +`ScopeId::resolve_path(db, path)` dispatches: + +- If `path.is_root_path` (written `::name` in Verilog-A): look up in the root `def_map` of the file, bypassing the local scope hierarchy. +- Otherwise: delegate to `def_map.resolve_normal_path_in_scope(local_scope, segments, db)`. + +`resolve_normal_path_in_scope` walks up the parent chain within the `DefMap`, checking `scope.declarations` at each level. If the root of the `DefMap` is reached without a match, the search *crosses* into the parent `DefMap`: + +- For a block scope (`DefMapSource::Block(block)`): continues into the enclosing function or module scope via `block.parent.def_map(db)`. +- For a function scope (`DefMapSource::Function`): escalates to the module's root `DefMap`. + +Multi-segment paths (`nature.attr`) resolve the first segment to a `ScopeDefItem` that has children (a `ModuleId` or `NatureId`), then step into the child scope for the remaining segments. 
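The parent-chain walk with a builtin fallback, as described above, can be sketched as follows. This is a simplified model under stated assumptions: all scopes live in one slice (the real code crosses from block and function `DefMap`s into their parents), names are plain strings, and `Def`, `Scope`, and `resolve` are hypothetical stand-ins rather than the crate's types.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced stand-in for `ScopeDefItem`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Def {
    Node(u32),
    Param(u32),
    BuiltIn(&'static str),
}

/// Hypothetical scope: a parent link plus the names declared here.
struct Scope {
    parent: Option<usize>,
    declarations: HashMap<String, Def>,
}

/// Walk from `scope` towards the root; consult the builtins only once
/// every enclosing scope has been exhausted.
fn resolve(
    scopes: &[Scope],
    builtins: &HashMap<String, Def>,
    mut scope: usize,
    name: &str,
) -> Option<Def> {
    loop {
        if let Some(def) = scopes[scope].declarations.get(name) {
            return Some(*def);
        }
        match scopes[scope].parent {
            Some(parent) => scope = parent,
            None => return builtins.get(name).copied(),
        }
    }
}

fn demo() -> (Option<Def>, Option<Def>, Option<Def>) {
    let mut module_decls = HashMap::new();
    module_decls.insert("p".to_string(), Def::Node(0));
    module_decls.insert("R".to_string(), Def::Param(0));

    let scopes = vec![
        Scope { parent: None, declarations: HashMap::new() },    // 0: root
        Scope { parent: Some(0), declarations: module_decls },   // 1: module
        Scope { parent: Some(1), declarations: HashMap::new() }, // 2: named block
    ];

    let mut builtins = HashMap::new();
    builtins.insert("sin".to_string(), Def::BuiltIn("sin"));

    (
        resolve(&scopes, &builtins, 2, "R"),       // found in the module scope
        resolve(&scopes, &builtins, 2, "sin"),     // found only in the builtins
        resolve(&scopes, &builtins, 2, "missing"), // not found anywhere
    )
}

fn main() {
    println!("{:?}", demo());
}
```

Checking the builtins last means a user declaration can shadow a builtin name in this model; whether the real resolver permits that shadowing is not stated here.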
+ +`resolve_item_path` is the typed variant: it calls `resolve_path` and then tries to downcast to `T`, producing a `PathResolveError::ExpectedItemKind` on mismatch. + +--- + +## 8. `DefWithBodyId` and `Body` + +### `DefWithBodyId` + +```rust +pub enum DefWithBodyId { + ParamId(ParamId), + ModuleId { initial: bool, module: ModuleId }, + FunctionId(FunctionId), + VarId(VarId), + NatureAttrId(NatureAttrId), + DisciplineAttrId(DisciplineAttrId), +} +``` + +This enum enumerates every definition that has an expression body. `ModuleId { initial: false, module }` selects the regular `analog begin ... end` block; `initial: true` selects the `analog initial begin ... end` block. The body query dispatches on `initial` to call `ast.analog_behaviour()` vs `ast.analog_initial_behaviour()` on the AST node. + +### `Body` + +```rust +pub struct Body { + pub exprs: Arena, + pub stmt_scopes: ArenaMap, + pub stmts: Arena, + pub entry_stmts: Box<[StmtId]>, +} +``` + +`exprs` and `stmts` are flat arenas. All `ExprId` and `StmtId` values are valid indices into these arenas. `entry_stmts` holds the top-level statement IDs — for a module analog block these are the statements directly inside `analog begin ... end`. + +`stmt_scopes` maps each statement to the `ScopeId` that was active when the statement was lowered. This is essential for `hir_ty`: when it resolves an expression inside a named block, it needs to know which `DefMap` to use. + +### `BodySourceMap` + +```rust +pub struct BodySourceMap { + pub expr_map: HashMap, ExprId>, + pub expr_map_back: ArenaMap>>, + pub stmt_map: HashMap, StmtId>, + pub stmt_map_back: ArenaMap>>, + lint_map: ArenaMap, + pub diagnostics: Vec, +} +``` + +The source map provides bidirectional provenance between `Body` arenas and the CST. `expr_map_back[expr_id]` gives the `AstPtr` of the original AST expression, used by diagnostics and IDE features to map an error back to a source location. + +--- + +## 9. 
`Expr` and `Stmt` + +`Body` uses an unresolved expression representation: `Path` values are sequences of `Name`s, not resolved `ScopeDefItem`s. Resolution happens in `hir_ty`. + +### `Expr` + +```rust +pub enum Expr { + Missing, + Path { path: Path, port: bool }, + BinaryOp { lhs: ExprId, rhs: ExprId, op: Option }, + UnaryOp { expr: ExprId, op: UnaryOp }, + Select { cond: ExprId, then_val: ExprId, else_val: ExprId }, + Call { fun: Option, args: Vec }, + Array(Vec), + Literal(Literal), +} +``` + +`Missing` is the error-recovery node; it arises when the parser could not produce a valid expression for a required position. + +`Path { port: bool }` — the `port` flag distinguishes a port reference (a port name written in angle brackets) from an ordinary name reference, since the two are syntactically different in Verilog-A. + +`Call { fun: Option, args }` — both user-defined analog functions and built-in access functions (`V(a,b)`, `I(a)`) are represented the same way here. Resolution in `hir_ty` distinguishes them. + +`Literal` variants: + +| Variant | Type | +|---|---| +| `String(Box)` | string literal | +| `Int(i32)` | integer literal | +| `Float(Ieee64)` | real literal (bit-preserving IEEE 754 wrapper) | +| `Inf` | the keyword `inf` | + +### `Stmt` + +```rust +pub enum Stmt { + Missing, + Empty, + Expr(ExprId), + EventControl { event: Event, body: StmtId }, + Assignment { dst: ExprId, val: ExprId, assignment_kind: ast::AssignOp }, + Block { body: Vec }, + If { cond: ExprId, then_branch: StmtId, else_branch: StmtId }, + ForLoop { init: StmtId, cond: ExprId, incr: StmtId, body: StmtId }, + WhileLoop { cond: ExprId, body: StmtId }, + Case { discr: ExprId, case_arms: Vec }, +} +``` + +`Assignment` covers both regular assignments (`=`) and Verilog-A contribution statements (`<+`); the distinction is captured in `assignment_kind: ast::AssignOp`. Downstream (`hir_ty`, `hir_lower`), contribution statements become branch current/voltage contributions. 
+ +`EventControl { event, body }` represents `@(initial_step) begin ... end`. The `Event` enum currently has one non-exhaustive variant: `Event::Global { kind: GlobalEvent, phases: Vec }` where `GlobalEvent` is `InitialStep` or `FinalStep`. + +Both `Expr` and `Stmt` provide `walk_child_exprs` and `walk_child_stmts` helper methods that call a closure on all direct child IDs, enabling traversal without pattern-matching the full enum. + +--- + +## 10. `*Data` Queries + +The `data.rs` module provides a second query layer above `ItemTree`. Each `*Data` struct contains the same information as the corresponding `ItemTree` item but with: + +- Index indirections resolved (e.g., `IdxRange` expanded into `Arena`) +- Self-contained values (no need to hold an `Arc` reference) + +`DisciplineData` is the most frequently used: + +```rust +pub struct DisciplineData { + pub name: Name, + pub potential: Option, + pub flow: Option, + pub domain: Option, + pub attrs: Arena, +} +``` + +`DisciplineData::compatible(self, other)` is a key predicate: two nodes can be connected only if their disciplines are compatible. Compatibility holds if either discipline has an unspecified domain, or if both have no natures declared (abstract discipline), or if both share identical `potential`, `flow`, and `domain`. + +`NatureRef { name: Name, kind: NatureRefKind }` — used inside `DisciplineData` to refer to a nature by name and kind (`Nature`, `DisciplinePotential`, `DisciplineFlow`). The actual `NatureId` is resolved later by `hir_ty`. 
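The compatibility rule stated above for `DisciplineData::compatible` can be written out as a small predicate. This is a literal reading of that prose and not the crate's implementation: natures are reduced to optional strings, and `Domain`, `Discipline`, and the free function `compatible` are hypothetical stand-ins.

```rust
/// Hypothetical two-valued domain, mirroring Verilog-A's continuous/discrete split.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Domain {
    Continuous,
    Discrete,
}

/// Reduced stand-in for `DisciplineData`: natures are plain optional names.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Discipline {
    potential: Option<String>,
    flow: Option<String>,
    domain: Option<Domain>,
}

/// True if any of the three conditions from the prose holds.
fn compatible(a: &Discipline, b: &Discipline) -> bool {
    // 1. Either discipline leaves its domain unspecified.
    let unspecified_domain = a.domain.is_none() || b.domain.is_none();
    // 2. Both are abstract: neither declares a potential or a flow nature.
    let both_abstract = a.potential.is_none()
        && a.flow.is_none()
        && b.potential.is_none()
        && b.flow.is_none();
    // 3. Potential, flow, and domain are all identical.
    let identical = a.potential == b.potential && a.flow == b.flow && a.domain == b.domain;
    unspecified_domain || both_abstract || identical
}

fn electrical() -> Discipline {
    Discipline {
        potential: Some("Voltage".to_string()),
        flow: Some("Current".to_string()),
        domain: Some(Domain::Continuous),
    }
}

fn thermal() -> Discipline {
    Discipline {
        potential: Some("Temperature".to_string()),
        flow: Some("Power".to_string()),
        domain: Some(Domain::Continuous),
    }
}

fn main() {
    let logic = Discipline { potential: None, flow: None, domain: Some(Domain::Discrete) };
    println!("electrical/electrical: {}", compatible(&electrical(), &electrical()));
    println!("electrical/thermal:    {}", compatible(&electrical(), &thermal()));
    println!("logic/logic:           {}", compatible(&logic, &logic));
}
```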
+ +Other `*Data` types follow the same pattern (query key → `Arc`): + +| Query | Struct | Notable fields | +|---|---|---| +| `var_data(VarId)` | `VarData` | `name`, `ty: Type` | +| `param_data(ParamId)` | `ParamData` | `name`, `ty: Option` | +| `node_data(NodeId)` | `NodeData` | `name`, `is_port`, `discipline: Option` | +| `branch_data(BranchId)` | `BranchData` | `name`, `kind: BranchKind` (paths resolved) | +| `function_data(FunctionId)` | `FunctionData` | `name`, `ty`, `args` | +| `module_data(ModuleId)` | `ModuleData` | `name`, `nodes`, `ports` | +| `alias_data(AliasParamId)` | `AliasParamData` | `name`, `src: Option` | + +--- + +## 11. Worked Example — `resistor.va` + +Starting from the resistor source: + +```verilog +`include "disciplines.vams" +module resistor (p, n); + inout p, n; + electrical p, n; + parameter real R = 1e3 from (0:inf); + analog begin + V(p,n) <+ R * I(p,n); + end +endmodule +``` + +### Step 1 — `ItemTree` + +`item_tree(resistor.va)` produces (simplified): + +``` +top_level: [RootItem::Module(idx=0)] + +data.modules[0] = Module { + name: "resistor", + nodes: [ + Node { name: "p", is_port: true, decls: [Port(port_p), Net(net_p)] }, + Node { name: "n", is_port: true, decls: [Port(port_n), Net(net_n)] }, + ], + num_ports: 2, + items: [Node(0), Node(1), Parameter(idx_R)], + ast_id: AstId(…), +} + +data.parameters[idx_R] = Param { + name: "R", + ty: Some(Type::Real), + is_local: false, + ast_id: AstId(…), +} +``` + +The analog block body (`V(p,n) <+ R * I(p,n)`) is *not* present in the `ItemTree`. 
+ +### Step 2 — `DefMap` + +`def_map(resistor.va)` produces: + +``` +scope[0] (root, origin=Root): + children: { "resistor" → scope[1] } + declarations: (Natures and Disciplines from included file, built-ins via BUILTIN_SCOPE) + +scope[1] (module, origin=Module(module_id)): + parent: scope[0] + children: {} + declarations: { + "p" → NodeId(node_p), + "n" → NodeId(node_n), + "R" → ParamId(param_R), + "V" → NatureAccess(voltage_access_attr_id), + "I" → NatureAccess(current_access_attr_id), + } +``` + +`V` and `I` appear as `NatureAccess` because the included `disciplines.vams` defines the `electrical` discipline with `Voltage` and `Current` natures, and those natures declare access functions `V` and `I` respectively. + +### Step 3 — `Body` + +`body(DefWithBodyId::ModuleId { initial: false, module: module_id })` produces: + +``` +entry_stmts: [stmt_0] + +stmts[stmt_0] = Stmt::Assignment { + dst: expr_0, + val: expr_1, + assignment_kind: AssignOp::Contribute, // <+ +} + +exprs[expr_0] = Expr::Call { + fun: Some(Path { segments: ["V"], is_root_path: false }), + args: [expr_2, expr_3], +} +exprs[expr_2] = Expr::Path { path: Path { segments: ["p"] }, port: false } +exprs[expr_3] = Expr::Path { path: Path { segments: ["n"] }, port: false } + +exprs[expr_1] = Expr::BinaryOp { + lhs: expr_4, + rhs: expr_5, + op: Some(BinaryOp::Mul), +} +exprs[expr_4] = Expr::Path { path: Path { segments: ["R"] }, port: false } + +exprs[expr_5] = Expr::Call { + fun: Some(Path { segments: ["I"], is_root_path: false }), + args: [expr_6, expr_7], +} +exprs[expr_6] = Expr::Path { path: Path { segments: ["p"] }, port: false } +exprs[expr_7] = Expr::Path { path: Path { segments: ["n"] }, port: false } +``` + +All paths are still unresolved strings at this stage. `hir_ty` will resolve `"V"` → `NatureAccess(voltage_access_attr_id)`, `"R"` → `ParamId(param_R)`, `"p"` → `NodeId(node_p)`, and so on. 
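To make the flat-arena layout concrete, here is a reduced sketch that stores the right-hand side `R * I(p,n)` in a vector and collects every still-unresolved name by following indices. The `Expr` enum is a hypothetical three-variant stand-in for the real one, and plain `usize` indices replace `ExprId`.

```rust
/// Hypothetical, reduced expression node; children are arena indices.
#[derive(Debug)]
enum Expr {
    Path(&'static str),
    Mul { lhs: usize, rhs: usize },
    Call { fun: &'static str, args: Vec<usize> },
}

/// Depth-first walk from `root` that records each name in visiting order.
fn collect_paths(exprs: &[Expr], root: usize, out: &mut Vec<&'static str>) {
    match &exprs[root] {
        Expr::Path(name) => out.push(*name),
        Expr::Mul { lhs, rhs } => {
            collect_paths(exprs, *lhs, out);
            collect_paths(exprs, *rhs, out);
        }
        Expr::Call { fun, args } => {
            out.push(*fun);
            for &arg in args {
                collect_paths(exprs, arg, out);
            }
        }
    }
}

/// Arena for `R * I(p,n)`; index 0 is the multiplication.
fn resistor_rhs() -> Vec<Expr> {
    vec![
        Expr::Mul { lhs: 1, rhs: 2 },
        Expr::Path("R"),
        Expr::Call { fun: "I", args: vec![3, 4] },
        Expr::Path("p"),
        Expr::Path("n"),
    ]
}

fn main() {
    let exprs = resistor_rhs();
    let mut names = Vec::new();
    collect_paths(&exprs, 0, &mut names);
    println!("{names:?}");
}
```

The collected names are exactly what a later resolution stage has to look up; nothing in the arena itself records what they refer to.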
+ +The `BodySourceMap` records the `AstPtr` for each `ExprId` and `StmtId`, so that if type checking later rejects the `R * I(p,n)` expression, the diagnostic can point at the exact source range. From 0843cd7032770f17b9baae8bb5cf56ac90f56c59 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sat, 23 May 2026 13:47:32 +0200 Subject: [PATCH 07/28] docs: add hir_ty INTERNALS --- docs/hir_ty/INTERNALS.md | 464 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 464 insertions(+) create mode 100644 docs/hir_ty/INTERNALS.md diff --git a/docs/hir_ty/INTERNALS.md b/docs/hir_ty/INTERNALS.md new file mode 100644 index 00000000..48907c70 --- /dev/null +++ b/docs/hir_ty/INTERNALS.md @@ -0,0 +1,464 @@ +# `hir_ty` — Internals + +Type inference and semantic resolution for OpenVAF's HIR. + +--- + +## 1. Purpose and Position + +`hir_ty` is the layer that turns `hir_def`'s unresolved `Body` (where every path is still a sequence of name strings) into a fully typed, resolved representation. It sits above `hir_def` and below `hir_lower`, which consumes its results to emit MIR. + +Concretely, `hir_ty` adds three things that `hir_def` leaves unresolved: + +1. **Type assignment** — every `ExprId` in a `Body` gets a `Ty`, the extended type that distinguishes between values, node references, branch references, nature references, and so on. +2. **Call resolution** — every `Expr::Call` gets a `ResolvedFun` (a built-in, a user function, a system parameter) and a `Signature` (the overload that matched). +3. **Assignment destination analysis** — every `Stmt::Assignment` gets an `AssignDst`, distinguishing variable writes from flow/potential contributions. + +The crate also resolves the nature and discipline hierarchy (`NatureTy`, `DisciplineTy`, `BranchTy`) and performs semantic validation beyond what the parser can catch. + +--- + +## 2. 
Module Map + +| File | Role | +|---|---| +| `lib.rs` | Re-exports; `BranchTy`, `DisciplineTy`, `NatureTy` | +| `db.rs` | `HirTyDB` Salsa query group; `Alias` enum; transparent queries | +| `lower.rs` | `NatureTy`, `DisciplineTy`, `BranchTy`, `DisciplineAccess`, `BranchKind` | +| `types.rs` | `Ty`, `TyRequirement`, `TyEquivalence`, `Signature`, `SignatureData`, `BuiltinInfo` | +| `inference.rs` | `InferenceResult`, `Ctx`, inference engine, `ResolvedFun`, `AssignDst`, `BranchWrite` | +| `inference/fmt_parser.rs` | Format-string argument parser for `$display` family | +| `builtin.rs` | `bultins!` macro; `BuiltinInfo` factories; nature-access and ddx signature constants | +| `builtin/generated.rs` | Code-generated `BUILTIN_INFO` table mapping `BuiltIn` discriminants to `BuiltinInfo` | +| `diagnostics.rs` | `TypeMismatch`, `SignatureMismatch`, `ArrayTypeMismatch`, diagnostic rendering | +| `validation/body.rs` | `BodyValidationDiagnostic`; context-sensitive checks (illegal contribute, port misuse, etc.) | +| `validation/types.rs` | Type-level validation helpers | + +--- + +## 3. 
`HirTyDB` Query Group + +```rust +#[salsa::query_group(HirTyDatabase)] +pub trait HirTyDB: HirDefDB + Upcast { … } +``` + +`HirTyDB` extends `HirDefDB` with the following queries: + +| Query | Input | Output | Notes | +|---|---|---|---| +| `nature_info` | `NatureId` | `Arc<NatureTy>` | Cycle-recovered (circular parent chains) | +| `discipline_info` | `DisciplineId` | `Arc<DisciplineTy>` | | +| `branch_info` | `BranchId` | `Option<Arc<BranchTy>>` | `None` if paths can't be resolved | +| `inference_result` | `DefWithBodyId` | `Arc<InferenceResult>` | Main type-inference result | +| `nature_attr_ty` | `NatureAttrId` | `Option` | Cycle-recovered | +| `resolve_alias` | `AliasParamId` | `Option<Alias>` | Cycle-recovered | +| `node_discipline` | `NodeId` | `Option<DisciplineId>` | Transparent | +| `param_ty` | `ParamId` | `Type` | Transparent; infers from default if no explicit type | +| `known_limit_functions` | — | `Option>` | **Input** query; set by the CLI | + +**Input queries** have no computed implementation — their value is injected by the caller (the `osdi` driver sets `known_limit_functions` from the simulator's list of `$limit`-aware functions). + +**Cycle recovery.** Verilog-A syntax does not rule out a circular nature parent chain, and a query that re-enters itself is a Salsa cycle, which panics unless a recovery function is registered. `nature_info_recover` breaks the cycle by calling `NatureTy::obtain(db, nature, false)` — the `false` skips parent resolution. Similarly, `resolve_alias_recover` returns `Some(Alias::Cycel)` (note: a typo in the source that is preserved here) to signal a circular alias. + +**`Alias`** is the result of resolving an `aliasparam`: + +```rust +pub enum Alias { + Cycel, // circular alias chain detected + Param(ParamId), + ParamSysFun(ParamSysFun), +} +``` + +--- + +## 4. 
`NatureTy` and `DisciplineTy` + +### `NatureTy` + +```rust +pub struct NatureTy { + pub ddt_nature: NatureId, // nature of d/dt of this nature (self if none declared) + pub idt_nature: NatureId, // nature of ∫dt of this nature (self if none declared) + pub parent: Option, + pub base_nature: NatureId, // root of the inheritance chain + pub units: Option, +} +``` + +`NatureTy::obtain(db, nature, resolve_parent)` builds the struct by: + +1. Looking up the nature's `ddt_nature` and `idt_nature` from `NatureData`, resolving the `NatureRef` against the root `DefMap`. +2. If a parent is declared and `resolve_parent` is true, recursively calling `db.nature_info(parent)` and inheriting unset fields from it. +3. Setting `base_nature` to the parent's `base_nature`, or `nature` itself if no parent. + +**`NatureTy::compatible(db, n1, n2)`** — two natures are compatible if they share the same `units` string. This is the predicate used to determine whether an access function (e.g., `V`) applies to a node's discipline. + +**`NatureTy::related(db, n1, n2)`** — two natures are related if they share the same `base_nature`. Used for checking that `ddt`/`idt` pairs are coherent. + +**`NatureTy::lookup_attr(db, nature, name)`** — walks up the parent chain looking for a nature attribute by name. Returns `NatureAttrId` or `PathResolveError::NotFoundIn`. + +### `DisciplineTy` + +```rust +pub struct DisciplineTy { + pub flow: Option, + pub potential: Option, +} +``` + +`discipline_info_query` resolves the `NatureRef` names in `DisciplineData` to actual `NatureId`s by calling `lookup_nature` against the root `DefMap`. + +**`DisciplineTy::access(nature, db) -> Option`** — the key method for resolving an access call. It checks: + +1. Is `nature` compatible with the discipline's `flow` nature? → `Some(DisciplineAccess::Flow)` +2. Is `nature` compatible with the discipline's `potential` nature? → `Some(DisciplineAccess::Potential)` +3. 
Neither → `None` (invalid access, diagnosed as `InvalidNatureAccess`) + +**`DisciplineTy::compatible(other, db)`** — both `flow` natures must be compatible with each other AND both `potential` natures must be compatible. Used during `BranchKind::Nodes` discipline resolution. + +--- + +## 5. `BranchTy` + +```rust +pub struct BranchTy { + pub discipline: DisciplineId, + pub kind: BranchKind, +} + +pub enum BranchKind { + PortFlow(NodeId), + NodeGnd(NodeId), + Nodes(NodeId, NodeId), +} +``` + +`branch_info_query` resolves the `Path`-based `hir_def::BranchKind` into the `NodeId`-based `hir_ty::BranchKind`: + +``` +hir_def::BranchKind::PortFlow(Path) → BranchKind::PortFlow(NodeId) +hir_def::BranchKind::NodeGnd(Path) → BranchKind::NodeGnd(NodeId) +hir_def::BranchKind::Nodes(P1, P2) → BranchKind::Nodes(NodeId, NodeId) +hir_def::BranchKind::Missing → None +``` + +After resolving node IDs, the discipline is extracted via `BranchKind::discipline(db)`: + +- `PortFlow(node)` / `NodeGnd(node)` — use `db.node_discipline(node)`. +- `Nodes(node1, node2)` — use `node1`'s discipline; if different from `node2`'s, check `DisciplineTy::compatible`. If incompatible, no discipline can be assigned and `branch_info` returns `None`. + +--- + +## 6. `Ty` — the Extended Type + +`Type` (from `hir_def`) represents only value types: `Real`, `Integer`, `Bool`, `String`, `Array`, `Void`, `Err`. 
`Ty` extends this to cover every kind of expression position that can appear in a Verilog-A body: + +```rust +pub enum Ty { + Val(Type), // a computed value + Node(NodeId), // a net/port reference + PortFlow(NodeId), // a reference + Nature(NatureId), // a nature reference + Discipline(DisciplineId), // a discipline reference + Var(Type, VarId), // a variable reference (lvalue) + NatureAttr(Type, NatureAttrId), // a nature attribute reference + FunctionVar { ty: Type, fun: FunctionId, // function return var or arg + arg: Option }, + Param(Type, ParamId), // a parameter reference + Literal(Type), // an uncoerced literal + InfLiteral, // the keyword `inf` + Branch(BranchId), // a branch reference + Scope, // a module/block name (not a value) + BuiltInFunction, // resolved but not yet called + UserFunction(FunctionId), // resolved but not yet called +} +``` + +`Ty::to_value() -> Option` extracts the scalar type from any `Ty` that carries a value: `Val`, `Var`, `NatureAttr`, `Param`, `Literal`, `FunctionVar`. `InfLiteral` promotes to `Real`. All other variants return `None`. + +### `TyRequirement` + +What an expression position *demands*: + +```rust +pub enum TyRequirement { + Val(Type), // a value of a specific type + Condition, // anything assignable to Bool + AnyVal, // any value type + ArrayAnyLength { ty: Type },// array of any length with element type ty + Node, PortFlow, Nature, // reference kinds + Var(Type), Param(Type), // exact-type references (no conversion) + AnyParam, // parameter of any type + Branch, // branch reference + Literal(Type), // uncoerced literal of given type + Function, // callable +} +``` + +`TyEquivalence` (internal) controls how `satisfies` compares types: `Exact`, `Semantic` (treats `Bool` ↔ `Integer` as equivalent), or `Conversion` (allows implicit widening, e.g., `Integer` → `Real`). The `expect` method in `Ctx` calls `satisfies_with_conversion` and records a cast when a type needs coercion. 
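The three equivalence levels can be illustrated with a small predicate over scalar types. This is a sketch under assumptions: only the two conversions named above are modelled (`Bool` and `Integer` as equivalent, `Integer` widening to `Real`), `Conversion` is assumed to include everything `Semantic` accepts, and this `satisfies` is a hypothetical free function rather than the method on `TyRequirement`.

```rust
/// Reduced stand-in for the scalar part of `hir_def::Type`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Type {
    Real,
    Integer,
    Bool,
    String,
}

/// How strictly a found type is compared against a required one.
#[derive(Clone, Copy, Debug)]
enum TyEquivalence {
    Exact,
    Semantic,
    Conversion,
}

fn satisfies(found: Type, required: Type, eq: TyEquivalence) -> bool {
    if found == required {
        return true;
    }
    // `Bool` and `Integer` are interchangeable under `Semantic`.
    let bool_int = matches!(
        (found, required),
        (Type::Bool, Type::Integer) | (Type::Integer, Type::Bool)
    );
    // Implicit widening; only the `Integer` to `Real` case is modelled here.
    let widening = matches!((found, required), (Type::Integer, Type::Real));
    match eq {
        TyEquivalence::Exact => false,
        TyEquivalence::Semantic => bool_int,
        // Assumption: `Conversion` is a superset of `Semantic`.
        TyEquivalence::Conversion => bool_int || widening,
    }
}

fn main() {
    for eq in [TyEquivalence::Exact, TyEquivalence::Semantic, TyEquivalence::Conversion] {
        println!(
            "{:?}: Integer->Real {}, Bool->Integer {}, String->Real {}",
            eq,
            satisfies(Type::Integer, Type::Real, eq),
            satisfies(Type::Bool, Type::Integer, eq),
            satisfies(Type::String, Type::Real, eq),
        );
    }
}
```

In this model a cast would be recorded exactly when the `Conversion` check succeeds and the `Exact` check does not.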
+ +### `Signature` and `SignatureData` + +```rust +pub struct Signature(pub u32); // index into a function's signatures array + +pub struct SignatureData { + pub args: Cow<'static, [TyRequirement]>, + pub return_ty: Type, +} +``` + +Each built-in function has one or more overloaded `SignatureData`s (e.g., `abs` has `ABS_INT` and `ABS_REAL`). Overload resolution picks the best-matching `Signature` and records it in `InferenceResult::resolved_signatures`. + +--- + +## 7. `InferenceResult` + +```rust +pub struct InferenceResult { + pub expr_types: ArenaMap, + pub resolved_calls: AHashMap, + pub resolved_signatures: AHashMap, + pub assignment_destination: AHashMap, + pub casts: AHashMap, + pub diagnostics: Vec, +} +``` + +**`expr_types`** — maps every `ExprId` to its `Ty`. Initialised to `Ty::Val(Type::Err)` for all expressions before inference runs; filled in by `infere_expr`. + +**`resolved_calls`** — maps call expressions (`Expr::Call`) to `ResolvedFun`: + +```rust +pub enum ResolvedFun { + User { func: FunctionId, limit: bool }, // user function; `limit` = called via $limit + BuiltIn(BuiltIn), // built-in; includes `potential` and `flow` + Param(ParamSysFun), // $mfactor etc. + InvalidNatureAccess(NatureId), // nature used on wrong discipline +} +``` + +**`resolved_signatures`** — maps call expressions to the `Signature` (overload index) that was selected. For nature access calls this is one of `NATURE_ACCESS_BRANCH`, `NATURE_ACCESS_NODES`, `NATURE_ACCESS_NODE_GND`, `NATURE_ACCESS_PORT_FLOW`. + +**`assignment_destination`** — maps `StmtId`s of `Stmt::Assignment` to `AssignDst`: + +```rust +pub enum AssignDst { + Var(VarId), + FunVar { fun: FunctionId, arg: Option }, + Flow(BranchWrite), + Potential(BranchWrite), +} + +pub enum BranchWrite { + Named(BranchId), + Unnamed { hi: NodeId, lo: Option }, +} +``` + +**`casts`** — maps expression IDs to the target type when an implicit conversion is needed (e.g., an `Integer` literal used where `Real` is expected). 
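The relationship between `resolved_signatures` and `casts` can be sketched with a toy overload resolver. The two-pass strategy below (exact match first, then `Integer` to `Real` widening) is one plausible approach chosen for illustration, not the crate's actual algorithm; `resolve_overload`, the reduced `Type`, and the signature table are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Type {
    Integer,
    Real,
}

/// Reduced stand-in for `SignatureData`: required argument types and a result type.
struct SignatureData {
    args: &'static [Type],
    return_ty: Type,
}

// Indices into the table below, in the spirit of `ABS_INT` / `ABS_REAL`.
const ABS_INT: usize = 0;
const ABS_REAL: usize = 1;
const ABS_SIGNATURES: [SignatureData; 2] = [
    SignatureData { args: &[Type::Integer], return_ty: Type::Integer },
    SignatureData { args: &[Type::Real], return_ty: Type::Real },
];

/// Returns the chosen signature index and the casts it needs, each cast
/// being an (argument position, target type) pair.
fn resolve_overload(
    signatures: &[SignatureData],
    found: &[Type],
) -> Option<(usize, Vec<(usize, Type)>)> {
    // Pass 1: prefer a signature that matches without any conversion.
    for (idx, sig) in signatures.iter().enumerate() {
        if sig.args == found {
            return Some((idx, Vec::new()));
        }
    }
    // Pass 2: accept Integer -> Real widening and remember where it happened.
    for (idx, sig) in signatures.iter().enumerate() {
        if sig.args.len() != found.len() {
            continue;
        }
        let mut casts = Vec::new();
        let ok = sig.args.iter().zip(found).enumerate().all(|(pos, (want, have))| {
            if want == have {
                true
            } else if *have == Type::Integer && *want == Type::Real {
                casts.push((pos, Type::Real));
                true
            } else {
                false
            }
        });
        if ok {
            return Some((idx, casts));
        }
    }
    None
}

fn main() {
    let (idx, casts) = resolve_overload(&ABS_SIGNATURES, &[Type::Integer]).unwrap();
    println!(
        "abs(Integer): signature {idx} (ABS_INT = {ABS_INT}), returns {:?}, casts {casts:?}",
        ABS_SIGNATURES[idx].return_ty
    );
    let (idx, _) = resolve_overload(&ABS_SIGNATURES, &[Type::Real]).unwrap();
    println!("abs(Real): signature {idx} (ABS_REAL = {ABS_REAL})");
}
```

Under this model the returned index plays the role of the entry in `resolved_signatures`, and the returned list plays the role of the entries in `casts`.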
+ +--- + +## 8. `Ctx` and the Inference Walk + +`infere_body_query` constructs a `Ctx` and drives the walk: + +```rust +struct Ctx<'a> { + result: InferenceResult, + body: &'a Body, + db: &'a dyn HirTyDB, + expr_stmt_ty: Option, // expected value type for Expr-statement bodies +} +``` + +`expr_stmt_ty` is set for parameters (`param_data(param).ty`) and variables (`var_data(var).ty`) — bodies that consist of a single value expression rather than a procedural block. For module analog blocks and functions it is `None`. + +**`infere_stmt(stmt)`** dispatches on `Stmt` variant: + +- `Stmt::Expr(expr)` → `infere_assignment(stmt, expr, expr_stmt_ty)` (treats the expression as an implicit assignment to the declared type) +- `Stmt::Assignment { dst, val, op }` → `infere_assignment_dst` to resolve `dst`, then `infere_assignment` to check `val` against the destination type +- `Stmt::If/ForLoop/WhileLoop { cond }` → `infere_cond` (expects `Condition`) +- `Stmt::Case { discr, case_arms }` → infers discriminant; checks each arm value satisfies the discriminant's `TyRequirement` +- All statement kinds call `walk_child_stmts` to recurse + +**`infere_expr(stmt, expr) -> Option`** is the core dispatch. 
It matches on `self.body.exprs[expr]`: + +| `Expr` variant | Result | +|---|---| +| `Missing` | `None` | +| `Path { port: true }` | `Ty::PortFlow(resolve_item_path)` | +| `Path { port: false }` | dispatch on resolved `ScopeDefItem` (see §10) | +| `BinaryOp { op: None }` | recurse into both sides, return `None` | +| `BinaryOp { op: Some(op) }` | `infere_bin_op` — selects signature based on op category | +| `UnaryOp { Identity }` | propagate child type | +| `UnaryOp { Neg }` | expect `Integer` or `Real`; return same type | +| `UnaryOp { BitNegate }` | expect `Integer`; return `Integer` | +| `UnaryOp { Not }` | expect `Condition`; return `Bool` | +| `Select { cond, then_val, else_val }` | `infere_cond(cond)`; resolve then/else against `SELECT` signatures | +| `Call { fun, args }` | `infere_fun_call` | +| `Array([])` | `Ty::Val(EmptyArray)` | +| `Array(args)` | `infere_array` — all elements must share a common type | +| `Literal(Float)` | `Ty::Literal(Real)` | +| `Literal(Int)` | `Ty::Literal(Integer)` | +| `Literal(Inf)` | sets `expr_types[expr]` to `expr_stmt_ty` and returns `None` | +| `Literal(String)` | `Ty::Literal(String)` | + +After computing the type, `infere_expr` stores it in `result.expr_types[expr]` and returns it. + +**`expect(expr, parent_expr, found_ty, requirements)`** checks that `found_ty` satisfies at least one requirement in the list. On success it returns the index of the matching requirement variant. If `CAST` is true (compile-time const generic) it also records a cast. On failure it pushes a `TypeMismatch` diagnostic. + +--- + +## 9. 
Assignment Destination Resolution + +`infere_assignment_dst(stmt, dst_expr, op)` infers `dst_expr`, then classifies the result: + +| `Ty` of `dst_expr` | `AssignDst` | +|---|---| +| `Ty::Var(ty, var)` | `AssignDst::Var(var)` | +| `Ty::FunctionVar { fun, ty, arg }` | `AssignDst::FunVar { fun, arg }` | +| `Ty::Val(Real)` with `resolved_calls[dst_expr] == BuiltIn::potential` | `AssignDst::Potential(BranchWrite)` | +| `Ty::Val(Real)` with `resolved_calls[dst_expr] == BuiltIn::flow` | `AssignDst::Flow(BranchWrite)` | +| Anything else | `InvalidAssignDst` diagnostic | + +The `BranchWrite` for a nature access is extracted from `resolved_signatures[dst_expr]`: + +- `NATURE_ACCESS_BRANCH` → `BranchWrite::Named(args[0].unwrap_branch())` +- `NATURE_ACCESS_NODES` → `BranchWrite::Unnamed { hi: args[0].unwrap_node(), lo: Some(args[1].unwrap_node()) }` +- `NATURE_ACCESS_NODE_GND` → `BranchWrite::Unnamed { hi: args[0].unwrap_node(), lo: None }` +- `NATURE_ACCESS_PORT_FLOW` → error (`PotentialOfPortFlow` is illegal as an assignment destination) + +**Operator cross-check.** After determining the `AssignDst`, the operator is validated: + +- `Var`/`FunVar` + `<+` → `InvalidAssignDst` (suggest `=`) +- `Flow`/`Potential` + `=` → `InvalidAssignDst` (suggest `<+`) +- All other combinations → `assignment_destination.insert(stmt, dst)` + +--- + +## 10. Nature Access Resolution + +When `infere_expr` encounters `Expr::Call { fun, args }` and the resolved `ScopeDefItem` is `NatureAccess(access)`, it calls `infere_nature_access(stmt, expr, access, args)`. + +The resolution proceeds in three steps: + +**Step 1 — Argument validation.** `infere_builtin(stmt, expr, BuiltIn::flow, args)` is called unconditionally. 
The `FLOW` built-in has four signatures: + +``` +NATURE_ACCESS_BRANCH(Branch) -> Real +NATURE_ACCESS_NODES(Node, Node) -> Real +NATURE_ACCESS_NODE_GND(Node) -> Real +NATURE_ACCESS_PORT_FLOW(PortFlow) -> Real +``` + +`resolve_function_args` picks the matching signature, records it in `resolved_signatures`, and validates the argument types. If no signature matches, the function returns early. + +**Step 2 — Discipline lookup.** `infere_access_kind(nature, expr, arg0)` is called with the access attribute's owning `NatureId`. It reads `resolved_signatures[expr]` to determine the argument kind, extracts the node or branch from `expr_types[arg0]`, and calls `db.node_discipline(node)` (or `branch_info(branch).discipline`) to get the `DisciplineId`. + +**Step 3 — Access kind determination.** `DisciplineTy::access(nature, db)` checks: + +1. `NatureTy::compatible(discipline.flow, nature)` → `DisciplineAccess::Flow` +2. `NatureTy::compatible(discipline.potential, nature)` → `DisciplineAccess::Potential` +3. Neither → `None` + +The result updates `resolved_calls[expr]`: + +| `DisciplineAccess` | `ResolvedFun` stored | +|---|---| +| `Flow` | `BuiltIn::flow` | +| `Potential` | `BuiltIn::potential` | +| `None` | `InvalidNatureAccess(nature)` | + +This is the mechanism by which `V(p,n)` (a `NatureAccess` for the `Voltage` nature) becomes `ResolvedFun::BuiltIn(BuiltIn::potential)` in the `InferenceResult` — because `electrical` discipline maps `Voltage` as a potential nature. + +--- + +## 11. 
Worked Example — Resistor `V(p,n) <+ R * I(p,n)` + +Starting from the `Body` produced by `hir_def` (see [`docs/hir_def/INTERNALS.md`](../hir_def/INTERNALS.md) §11): + +``` +stmt_0 = Stmt::Assignment { dst: expr_0, val: expr_1, assignment_kind: Contribute } +expr_0 = Expr::Call { fun: Some(Path["V"]), args: [expr_2, expr_3] } +expr_2 = Expr::Path { path: ["p"], port: false } +expr_3 = Expr::Path { path: ["n"], port: false } +expr_1 = Expr::BinaryOp { lhs: expr_4, rhs: expr_5, op: Some(Mul) } +expr_4 = Expr::Path { path: ["R"], port: false } +expr_5 = Expr::Call { fun: Some(Path["I"]), args: [expr_6, expr_7] } +expr_6 = Expr::Path { path: ["p"], port: false } +expr_7 = Expr::Path { path: ["n"], port: false } +``` + +**Step 1 — `infere_stmt(stmt_0)`.** +Dispatches to `infere_assignment_dst(stmt_0, expr_0, Contribute)`. + +**Step 2 — Resolve `expr_0 = V(p,n)`.** +`infere_expr(stmt_0, expr_0)` → `Expr::Call`. `resolve_path(…, ["V"])` → `ScopeDefItem::NatureAccess(voltage_access_id)`. +`infere_nature_access` called: + +- `infere_builtin(BuiltIn::flow, [expr_2, expr_3])`: + - `infere_expr(stmt_0, expr_2)` → `"p"` → `ScopeDefItem::NodeId(node_p)` → `Ty::Node(node_p)` + - `infere_expr(stmt_0, expr_3)` → `"n"` → `Ty::Node(node_n)` + - Signature `NATURE_ACCESS_NODES(Node, Node) -> Real` selected + - `resolved_signatures[expr_0] = NATURE_ACCESS_NODES` +- `infere_access_kind(voltage_nature, expr_0, expr_2)`: + - `node_discipline(node_p)` → `electrical_discipline_id` + - `discipline_info(electrical).access(voltage_nature, db)`: + - `NatureTy::compatible(current_nature, voltage_nature)` → false (units differ: A vs V) + - `NatureTy::compatible(voltage_nature, voltage_nature)` → true + - Returns `DisciplineAccess::Potential` +- `resolved_calls[expr_0] = ResolvedFun::BuiltIn(BuiltIn::potential)` +- `expr_types[expr_0] = Ty::Val(Type::Real)` + +**Step 3 — Assignment destination.** +Back in `infere_assignment_dst`: +`Ty::Val(Real)` + `resolved_calls == BuiltIn::potential` + 
`NATURE_ACCESS_NODES` → +`AssignDst::Potential(BranchWrite::Unnamed { hi: node_p, lo: Some(node_n) })` + +Operator check: `Potential` + `Contribute` → valid. +`assignment_destination[stmt_0] = AssignDst::Potential(Unnamed { hi: node_p, lo: Some(node_n) })` + +**Step 4 — Resolve `expr_1 = R * I(p,n)`.** +`infere_assignment(stmt_0, expr_1, Some(Real))`: + +- `infere_expr(stmt_0, expr_4)` → `"R"` → `ScopeDefItem::ParamId(param_R)` → `Ty::Param(Real, param_R)` +- `infere_expr(stmt_0, expr_5)` = `I(p,n)` → same process as `V(p,n)` but with `Current` nature → `resolved_calls[expr_5] = BuiltIn::flow` → `Ty::Val(Real)` +- `infere_bin_op(Mul, expr_4, expr_5)`: + - Both satisfy `REAL_BIN_OP`; signature `REAL_OP` selected + - `resolved_signatures[expr_1] = REAL_OP` + - `expr_types[expr_1] = Ty::Val(Real)` +- `Real.is_assignable_to(Real)` → no cast needed + +**Final `InferenceResult` (relevant entries):** + +``` +expr_types: + expr_0 → Ty::Val(Real) // V(p,n) + expr_1 → Ty::Val(Real) // R * I(p,n) + expr_2 → Ty::Node(node_p) + expr_3 → Ty::Node(node_n) + expr_4 → Ty::Param(Real, param_R) + expr_5 → Ty::Val(Real) // I(p,n) + expr_6 → Ty::Node(node_p) + expr_7 → Ty::Node(node_n) + +resolved_calls: + expr_0 → BuiltIn::potential + expr_5 → BuiltIn::flow + +resolved_signatures: + expr_0 → NATURE_ACCESS_NODES + expr_5 → NATURE_ACCESS_NODES + +assignment_destination: + stmt_0 → Potential(Unnamed { hi: node_p, lo: Some(node_n) }) + +casts: {} +diagnostics: [] +``` + +`hir_lower` consumes this `InferenceResult` to emit the MIR contribution statement, using `assignment_destination[stmt_0]` to know that this is a potential (voltage) write across the `p`–`n` branch. 
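The operator cross-check applied in Step 3 (and described in §9) can be modelled on its own. This is a minimal sketch under stated assumptions: IDs are plain `u32` values, `AssignOp::Assign` is an assumed name for the `=` operator (`Contribute` is the name used in the example above), and `InvalidAssignDst` is reduced to a struct carrying only the suggested operator.

```rust
/// `=` versus `<+`. `Assign` is an assumed variant name.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AssignOp {
    Assign,
    Contribute,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BranchWrite {
    Named(u32),
    Unnamed { hi: u32, lo: Option<u32> },
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AssignDst {
    Var(u32),
    FunVar { fun: u32, arg: Option<u32> },
    Flow(BranchWrite),
    Potential(BranchWrite),
}

/// Simplified diagnostic: only the operator the user should have written.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InvalidAssignDst {
    pub suggested: &'static str,
}

/// Variables take `=`, branch writes take `<+`; anything else is rejected.
pub fn check_operator(dst: AssignDst, op: AssignOp) -> Result<AssignDst, InvalidAssignDst> {
    match (dst, op) {
        (AssignDst::Var(_) | AssignDst::FunVar { .. }, AssignOp::Contribute) => {
            Err(InvalidAssignDst { suggested: "=" })
        }
        (AssignDst::Flow(_) | AssignDst::Potential(_), AssignOp::Assign) => {
            Err(InvalidAssignDst { suggested: "<+" })
        }
        _ => Ok(dst),
    }
}
```

For the resistor, `Potential(Unnamed { hi, lo: Some(..) })` combined with `Contribute` passes the check, so the destination is stored for `stmt_0`.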
From 307a785f1e90b93eef093db6d1f9e77cb1f9db12 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sat, 23 May 2026 13:57:17 +0200 Subject: [PATCH 08/28] docs: add hir INTERNALS --- docs/hir/INTERNALS.md | 791 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 791 insertions(+) create mode 100644 docs/hir/INTERNALS.md diff --git a/docs/hir/INTERNALS.md b/docs/hir/INTERNALS.md new file mode 100644 index 00000000..8f1345b0 --- /dev/null +++ b/docs/hir/INTERNALS.md @@ -0,0 +1,791 @@ +# `hir` Crate Internals + +> **Cross-links:** +> [`hir_def` INTERNALS](../hir_def/INTERNALS.md) | +> [`hir_ty` INTERNALS](../hir_ty/INTERNALS.md) | +> [`hir_lower` INTERNALS](../hir_lower/INTERNALS.md) | +> [Architecture overview](../ARCHITECTURE.md) + +--- + +## 1. Purpose and Position + +The `hir` crate is the **public API surface of the entire OpenVAF front end**. +Everything below it — `hir_def` (item tree, name resolution, bodies) and +`hir_ty` (type inference, nature/discipline resolution) — is considered an +implementation detail. + +The crate's own module-level comment captures the design intent precisely: + +> HIR is written in "OO" style. Each type is self-contained (as in, it knows +> its parents and full context). `hir_*` crates are written in "ECS" style, with +> relatively little abstraction. Many types are not self-contained, and +> explicitly use local indexes, arenas, etc. + +Concretely, this means: + +- `hir_def` exposes raw Salsa intern IDs (`ModuleId`, `VarId`, `NodeId`, …) + and arena indices. Callers must pass a database reference to every + operation and know which arena to index. +- `hir_ty` exposes `InferenceResult` maps keyed by `ExprId`/`StmtId`. + Callers must join them with a `hir_def::Body` themselves. +- `hir` wraps each intern ID in a thin newtype (`Module`, `Variable`, `Node`, + …), combines `Body` and `InferenceResult` into a single `BodyRef`, and + presents a uniform API where every method takes only `&CompilationDB`. 
+ +The crate is the layer that `hir_lower`, `sim_back`, `osdi`, and any external +tooling (e.g. language servers, linters) actually import. Nothing inside the +`hir_*` implementation crates is visible through the `hir` public API unless +explicitly re-exported. + +--- + +## 2. Module Map + +``` +hir/src/ + lib.rs — all public OO types: CompilationUnit, Module, Node, Branch, … + db.rs — CompilationDB: the single concrete Salsa database + body.rs — Body, BodyRef, Stmt, Expr, Ref, ResolvedFun, AssignmentLhs + rec_declarations.rs — RecDeclarations: depth-first scope iterator + declarations.rs — ScopePaths: scope iterator that builds dotted paths (older variant) + attributes.rs — AstCache: SourceFile + AstIdMap pairing for attribute lookup + diagnostics.rs — collect(): stitches all diagnostic sources into one sink pass +hir/tests/ + data_tests.rs — integration and UI tests via mini_harness +``` + +`declarations.rs` and `rec_declarations.rs` are nearly identical; the former +uses `slice::Iter` on a `Vec<(Name, ScopeDefItem)>` and the latter (the +current implementation) uses `indexmap::map::Iter`. `RecDeclarations` is what +`Module::rec_declarations()` actually returns. + +--- + +## 3. 
`CompilationDB` — The Single Concrete Database + +```rust +// db.rs +#[salsa::database(BaseDatabase, InternDatabase, HirDefDatabase, HirTyDatabase)] +pub struct CompilationDB { + storage: salsa::Storage, + vfs: Arc>, + root_file: FileId, +} +``` + +`CompilationDB` is the only type in the entire OpenVAF compiler that +implements all four Salsa query groups simultaneously: + +| Query group | Defined in | Provides | +|---|---|---| +| `BaseDatabase` | `basedb` | VFS, preprocessing, parsing, `AstIdMap` | +| `InternDatabase` | `hir_def` | `intern_*` / `lookup_intern_*` for all Salsa IDs | +| `HirDefDatabase` | `hir_def` | `item_tree`, `def_map`, `body`, `*_data` queries | +| `HirTyDatabase` | `hir_ty` | `inference_result`, `discipline_info`, `branch_info`, … | + +Because it implements all four groups, a `&CompilationDB` can be passed +wherever any of the underlying query group trait objects are required. The +`Upcast` and `Upcast` impls make this explicit. + +### Constructors + +**`new_fs(root_file, include_dirs, macro_flags, lints)`** — production entry +point. Reads the file from the real filesystem, registers include directories, +applies the standard macro flags (`STANDARD_FLAGS` from `basedb`), and converts +named lint overrides into the global lint overwrite table. + +**`new_virtual(contents)`** — test-and-tooling entry point. Registers a +virtual file at `/root.va` with no include directories. Used extensively in +the integration and UI tests. + +**`new(root_file, contents, include_dirs, macro_flags, lints)`** — the shared +implementation that both constructors delegate to. It: + +1. Creates a `Vfs`, inserts the standard library (from `basedb`), and assigns + a `FileId` to the root file. +2. Builds the include-dir list, prepending `/std` (the built-in standard + library virtual path). +3. Constructs the `STANDARD_FLAGS` prefix for macro flags. +4. 
Populates `global_lint_overwrites`: the special names `"all"`, + `"warnings"`, and `"errors"` expand to bulk overrides; any other string is + looked up in the lint registry. + +The `unsafe transmute` inside the lint-overwrite construction is a Salsa +ergonomic workaround: Salsa requires `Arc>` but constructing one +requires going through a plain `Arc<[V]>` first. + +### Snapshot support + +`CompilationDB` implements `ParallelDatabase` by cloning the Salsa storage +snapshot and sharing the same `Arc>`. This enables parallel +query execution in multi-threaded contexts. + +--- + +## 4. `CompilationUnit` — The Compilation Entry Point + +```rust +pub struct CompilationUnit { + root_file: FileId, +} +``` + +`CompilationUnit` is the user-facing handle to a single compiled Verilog-A +file. Obtained via `db.compilation_unit()`. + +| Method | What it does | +|---|---| +| `name(db)` | Returns the filename from the VFS path | +| `root_file()` | Returns the raw `FileId` | +| `modules(db)` | Walks `def_map(root_file)[entry].declarations` for `ModuleId` items → `Vec` | +| `diagnostics(db, sink)` | Delegates to `diagnostics::collect(db, root_file, sink)` | +| `test_diagnostics(db)` | Like `diagnostics` but captures to a colour-stripped `Buffer`; used in snapshot tests | +| `ast(db)` | Constructs an `AstCache` for attribute resolution | +| `preprocess(db)` | Returns the `Preprocess` result (macro-expanded token stream) | + +The `.modules()` implementation shows the ECS→OO translation pattern in its +simplest form: + +```rust +root_def_map[root_def_map.entry()] + .declarations + .iter() + .filter_map(|(_, def)| { + if let ScopeDefItem::ModuleId(id) = *def { + Some(Module { id }) + } else { + None + } + }) + .collect() +``` + +The raw `ScopeDefItem::ModuleId(id)` from `hir_def` is wrapped in the OO +newtype `Module { id }` before being returned. + +--- + +## 5. OO Type Hierarchy + +Every entity in a Verilog-A compilation has a corresponding OO newtype in +`hir`. 
Each type stores only an intern ID and delegates all queries to +`&CompilationDB`. + +### Entity types + +| Type | Wraps | Key methods | +|---|---|---| +| `Module` | `ModuleId` | `name`, `ports`, `internal_nodes`, `child_scopes`, `declarations`, `rec_declarations`, `analog_block`, `analog_initial_block`, `lookup_var` | +| `Block` | `BlockId` | `name` | +| `Function` | `FunctionId` | `name`, `return_ty`, `args`, `arg(idx)`, `body` | +| `FunctionArg` | `FunctionId + LocalFunctionArgId` | `name`, `ty`, `is_input`, `is_output`, `function` | +| `Node` | `NodeId` | `name`, `discipline`, `is_input`, `is_output`, `is_port`, `is_gnd` | +| `Variable` | `VarId` | `name`, `ty`, `init`, `get_attr` | +| `Parameter` | `ParamId` | `name`, `default`, `bounds`, `init`, `ty`, `get_attr` | +| `AliasParameter` | `AliasParamId` | `name`, `resolve` → `Option` | +| `Branch` | `BranchId` | `name`, `discipline`, `kind`, `get_attr` | +| `Discipline` | `DisciplineId` | `name`, `potential` → `Option`, `flow` → `Option` | +| `Nature` | `NatureId` | `name`, `units` | +| `NatureAttribute` | `NatureAttrId` | `name`, `value` → `Body` | + +All types derive `Copy`, `Clone`, `PartialEq`, `Eq`, `Hash`. `Debug` is +implemented via the `stdx::impl_debug!` macro, which formats using the +underlying Salsa intern ID's debug representation. + +### `FunctionArg` is a two-field struct + +```rust +pub struct FunctionArg { + fun_id: FunctionId, + arg_id: LocalFunctionArgId, +} +``` + +`LocalFunctionArgId` is a typed index into `FunctionData::args`. This +diverges from the single-ID pattern because function arguments don't have +their own Salsa intern ID in `hir_def` — they are always accessed through the +parent `FunctionData`. + +### `AliasParameter::resolve` + +Alias parameters are a Verilog-A construct that lets a parameter be an alias +for another parameter or a system parameter (`$temperature`, etc.). 
+`AliasParameter::resolve()` calls `db.resolve_alias(id)` (a `hir_ty` query) +and maps the result: + +```rust +hir_ty::db::Alias::Cycel → None // cycle in alias chain +hir_ty::db::Alias::Param(id) → Some(ResolvedAliasParameter::Parameter(Parameter { id })) +hir_ty::db::Alias::ParamSysFun(param) → Some(ResolvedAliasParameter::SystemParameter(param)) +``` + +The `Alias::Cycel` variant name is a typo in `hir_ty` source; it is preserved +faithfully here. + +### `Module::analog_block` vs `analog_initial_block` + +```rust +pub fn analog_initial_block(&self, db: &CompilationDB) -> Body { + Body::new(DefWithBodyId::ModuleId { initial: true, module: self.id }, db) +} +pub fn analog_block(&self, db: &CompilationDB) -> Body { + Body::new(DefWithBodyId::ModuleId { initial: false, module: self.id }, db) +} +``` + +In Verilog-A, an `analog initial` block runs once at simulation start; the +main `analog` block runs at each time step. Both are represented as +`DefWithBodyId::ModuleId` differing only in the `initial` flag. + +--- + +## 6. Scope Traversal + +### `Scope` enum + +```rust +pub enum Scope { + Module(Module), + Block(Block), + Function(Function), +} +``` + +`Scope` provides a unified handle over the three kinds of declaration +containers. The private `def_map_and_scope()` method resolves the appropriate +`(LocalScopeId, Arc<DefMap>)` pair for each variant: + +- `Module` → `id.lookup(db).scope.local_scope` and the root def map +- `Block` → `db.block_def_map(id)`, entry scope (may be `None` for unnamed blocks) +- `Function` → `db.function_def_map(id)`, entry scope + +`.children(db)` returns immediate child scopes by examining +`def_map[scope].children` and mapping `ScopeOrigin` → `Scope` variant. + +`.declarations(db)` returns `Vec<(Name, ScopeDef)>` for the user-visible +declarations in this scope. Implementation details — `BuiltIn`, `NatureId`, +`NatureAccess`, `DisciplineId`, `ParamSysFun`, `FunctionReturn`, +`FunctionArgId`, `NatureAttrId` — are filtered out with `return None`.
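The filtering done by `.declarations(db)` amounts to a `filter_map` from the raw item enum to the public one. The following is a hedged sketch with toy types: both enums are reduced subsets, IDs are plain `u32` values, names are `&str`, and no database is involved, so none of this should be read as the actual `hir` code.

```rust
/// Toy subset of the raw `ScopeDefItem`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ScopeDefItem {
    ModuleId(u32),
    BlockId(u32),
    NodeId(u32),
    VarId(u32),
    ParamId(u32),
    // Items the text lists as implementation details:
    BuiltIn,
    NatureId(u32),
    DisciplineId(u32),
    FunctionReturn(u32),
}

/// Toy subset of the public `ScopeDef`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ScopeDef {
    Block(u32),
    ModuleInstance(u32),
    Node(u32),
    Variable(u32),
    Parameter(u32),
}

/// Keeps user-visible declarations and drops everything else with `return None`.
pub fn declarations<'a>(raw: &[(&'a str, ScopeDefItem)]) -> Vec<(&'a str, ScopeDef)> {
    raw.iter()
        .filter_map(|&(name, item)| {
            let def = match item {
                ScopeDefItem::ModuleId(id) => ScopeDef::ModuleInstance(id),
                ScopeDefItem::BlockId(id) => ScopeDef::Block(id),
                ScopeDefItem::NodeId(id) => ScopeDef::Node(id),
                ScopeDefItem::VarId(id) => ScopeDef::Variable(id),
                ScopeDefItem::ParamId(id) => ScopeDef::Parameter(id),
                ScopeDefItem::BuiltIn
                | ScopeDefItem::NatureId(_)
                | ScopeDefItem::DisciplineId(_)
                | ScopeDefItem::FunctionReturn(_) => return None,
            };
            Some((name, def))
        })
        .collect()
}
```

Order is preserved, so a scope containing `p`, a built-in, `R`, and a discipline yields only `p` and `R`.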
+ +### `ScopeDef` enum + +```rust +#[non_exhaustive] +pub enum ScopeDef { + Block(Block), + ModuleInstance(Module), + Node(Node), + Variable(Variable), + Parameter(Parameter), + AliasParameter(AliasParameter), + Branch(Branch), + Function(Function), +} +``` + +This is the public projection of `hir_def::nameres::ScopeDefItem`. The +`#[non_exhaustive]` attribute signals that new variants may be added in future +versions without being a breaking change. + +### `RecDeclarations` + +`RecDeclarations<'a>` is an `Iterator` that walks +all declarations in a scope and recursively descends into named `Block` +sub-scopes. + +```rust +pub struct RecDeclarations<'a> { + path: Vec, + stack: Vec, + db: &'a CompilationDB, +} +``` + +The `stack` holds a `Vec` of pending scopes to visit; each `Scope` +element wraps an `Arc` to keep the map alive and an `indexmap::Iter` +over its declarations. When a `BlockId` entry is encountered and that block +has a named def map, a new frame is pushed onto the stack and the traversal +descends. + +`to_path(name)` builds the dotted-path string for the current position: +`["foo", "bar"]` + `name = "x"` → `"foo.bar.x"`. + +`Module::rec_declarations(db)` is the entry point: + +```rust +pub fn rec_declarations(self, db: &CompilationDB) -> RecDeclarations<'_> { + RecDeclarations::new(Scope::Module(self), db) +} +``` + +### `BranchWrite` and `BranchKind` + +```rust +pub enum BranchWrite { + Named(Branch), + Unnamed { hi: Node, lo: Option }, +} + +pub enum BranchKind { + PortFlow(Node), + NodeGnd(Node), + Nodes(Node, Node), +} +``` + +`BranchWrite` represents the target of a contribution statement (`V(a,b) <+` +or `I(b) <+`). A named branch refers to an explicit `branch` declaration; an +unnamed branch is written directly with node references. + +`BranchWrite::nodes(db)` resolves a named branch to its `(hi, Option)` +node pair by delegating to `Branch::kind(db)`. 
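How a branch write collapses to a `(hi, Option<lo>)` pair can be shown with a small model. This is a sketch under assumptions: `Named` carries an already-resolved `BranchKind` instead of a `Branch` plus database lookup, and a port-flow branch is mapped to `None` because the source does not say how that case is handled.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Node(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BranchKind {
    PortFlow(Node),
    NodeGnd(Node),
    Nodes(Node, Node),
}

/// Simplified: the named case stores the resolved kind directly.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BranchWrite {
    Named(BranchKind),
    Unnamed { hi: Node, lo: Option<Node> },
}

/// Returns the node pair of the written branch; `lo == None` means ground.
pub fn nodes(write: BranchWrite) -> Option<(Node, Option<Node>)> {
    match write {
        BranchWrite::Unnamed { hi, lo } => Some((hi, lo)),
        BranchWrite::Named(BranchKind::Nodes(hi, lo)) => Some((hi, Some(lo))),
        BranchWrite::Named(BranchKind::NodeGnd(hi)) => Some((hi, None)),
        // Assumption: a port-flow branch has no node pair to contribute to.
        BranchWrite::Named(BranchKind::PortFlow(_)) => None,
    }
}
```

With this shape, downstream code can treat `V(a,b) <+ ...` and a contribution to an equivalent named branch identically.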
+ +`BranchKind` is the resolved form of a `BranchId`: either a two-node branch +(`Nodes`), a single-node-to-ground branch (`NodeGnd`), or a port-flow branch +(`PortFlow`). + +The `From<inference::BranchWrite>` impl on the public `BranchWrite` wraps the +`hir_ty` internal type (`inference::BranchWrite`) in the public `hir` types. + +--- + +## 7. `Body` and `BodyRef` + +```rust +pub struct Body { + body: Arc<hir_def::body::Body>, + infere: Arc<inference::InferenceResult>, +} +``` + +`Body` bundles the two pieces of data a consumer needs to traverse a +definition's expression tree: + +- `hir_def::body::Body` — the unresolved `Expr`/`Stmt` arenas, entry + statement list, scope assignments +- `hir_ty::inference::InferenceResult` — the five inference maps + (`expr_types`, `resolved_calls`, `resolved_signatures`, + `assignment_destination`, `casts`) + +Both are `Arc`-wrapped so `Body` is cheap to clone and share across threads. + +`Body::borrow()` gives a `BodyRef<'_>`, a borrowed view: + +```rust +pub struct BodyRef<'a> { + body: &'a hir_def::body::Body, + infere: &'a inference::InferenceResult, +} +``` + +`BodyRef` is the type all traversal methods live on. The split between `Body` +(owned) and `BodyRef` (borrowed) follows the same pattern as `String`/`str`. + +### Entry point + +```rust +pub fn entry(&self) -> &'a [StmtId] { + &self.body.entry_stmts +} +``` + +Returns the top-level statement IDs for this body. Callers iterate over these +and call `get_stmt()` to obtain decoded public `Stmt` values.
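The owned/borrowed split and the lifetime on `entry()` can be tried out with stand-in types. In this sketch `RawBody` and `Inference` are hypothetical placeholders for `hir_def::body::Body` and `InferenceResult`, and `has_errors` is an invented helper that exists only to show both halves being read through one `BodyRef`.

```rust
use std::sync::Arc;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StmtId(pub u32);

/// Placeholder for `hir_def::body::Body`.
pub struct RawBody {
    pub entry_stmts: Vec<StmtId>,
}

/// Placeholder for `InferenceResult`.
pub struct Inference {
    pub diagnostics: Vec<String>,
}

/// Owned handle: two `Arc`s, so cloning never copies the underlying data.
#[derive(Clone)]
pub struct Body {
    body: Arc<RawBody>,
    infere: Arc<Inference>,
}

/// Borrowed view: everything handed out is tied to lifetime `'a`.
#[derive(Clone, Copy)]
pub struct BodyRef<'a> {
    body: &'a RawBody,
    infere: &'a Inference,
}

impl Body {
    pub fn new(body: RawBody, infere: Inference) -> Self {
        Body { body: Arc::new(body), infere: Arc::new(infere) }
    }

    pub fn borrow(&self) -> BodyRef<'_> {
        BodyRef { body: &self.body, infere: &self.infere }
    }
}

impl<'a> BodyRef<'a> {
    /// The slice borrows from the body itself, not from this `BodyRef` value.
    pub fn entry(&self) -> &'a [StmtId] {
        &self.body.entry_stmts
    }

    /// Hypothetical helper, not part of the documented API.
    pub fn has_errors(&self) -> bool {
        !self.infere.diagnostics.is_empty()
    }
}
```

Because `entry()` returns `&'a [StmtId]`, the slice stays usable after the temporary `BodyRef` is dropped, as long as the owning `Body` is alive.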
+ +### Type-projection helpers + +These helpers extract a specific Salsa ID from an `ExprId` by consulting +`infere.expr_types`: + +| Method | Returns | When to call | +|---|---|---| +| `into_node(expr)` | `Node` | When expr type is `Ty::Node(id)` | +| `into_port_flow(expr)` | `Node` | When expr type is `Ty::PortFlow(id)` | +| `into_parameter(expr)` | `Parameter` | When expr type is `Ty::Param(_, id)` | +| `into_branch(expr)` | `Branch` | When expr type is `Ty::Branch(id)` | + +These are used by `hir_lower` to extract the resolved entity references from +nature-access call arguments. + +### `expr_type` and `needs_cast` + +```rust +pub fn expr_type(&self, expr: ExprId) -> Type { + self.infere.expr_types[expr].to_value().unwrap() +} + +pub fn needs_cast(&self, expr: ExprId) -> Option<(Type, &'a Type)> { + let dst = self.infere.casts.get(&expr)?; + let src = self.expr_type(expr); + Some((src, dst)) +} +``` + +`expr_type` projects `Ty` (the extended type from `hir_ty`) down to the +simpler `Type` (the surface type from `hir_def`) via `Ty::to_value()`. + +`needs_cast` returns `Some((src_type, dst_type))` when the inference result +recorded a required implicit cast. The backend uses this to emit the +appropriate LLVM conversion instruction. + +### Literal helpers (`as_literalint`, `as_literalsignedint`) + +These two low-level helpers pattern-match on `hir_def::Expr` directly to +extract integer literals, including the signed case where a literal is wrapped +in a `UnaryOp::Neg`. They are used by downstream backends that need to +inspect constant integer values at compile time. + +--- + +## 8. `Stmt` Dispatch + +`BodyRef::get_stmt(stmnt: StmtId) → Option<Stmt<'a>>` translates a raw +`hir_def::Stmt` into the public `Stmt` enum. It returns `None` for +`hir_def::Stmt::Empty` and `hir_def::Stmt::Missing` (placeholder nodes +inserted by the parser on syntax errors).
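A consumer typically walks the statement tree and lets the `None` results fall away. The sketch below models that with a toy arena; `RawStmt`, the reduced two-variant `Stmt`, and the `count_exprs` walker are illustrative assumptions, not types from `hir`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StmtId(pub usize);

/// Toy raw statement as it might sit in the body arena.
pub enum RawStmt {
    Empty,
    Missing,
    Expr(u32),
    Block { body: Vec<StmtId> },
}

/// Reduced public statement; placeholders have no counterpart here.
#[derive(Debug, PartialEq, Eq)]
pub enum Stmt<'a> {
    Expr(u32),
    Block { body: &'a [StmtId] },
}

pub struct BodyRef<'a> {
    pub stmts: &'a [RawStmt],
}

impl<'a> BodyRef<'a> {
    /// Placeholder nodes decode to `None`, everything else to a public `Stmt`.
    pub fn get_stmt(&self, id: StmtId) -> Option<Stmt<'a>> {
        match &self.stmts[id.0] {
            RawStmt::Empty | RawStmt::Missing => None,
            RawStmt::Expr(e) => Some(Stmt::Expr(*e)),
            RawStmt::Block { body } => Some(Stmt::Block { body: body.as_slice() }),
        }
    }

    /// Recursively counts expression statements, skipping placeholders.
    pub fn count_exprs(&self, ids: &[StmtId]) -> usize {
        ids.iter()
            .map(|&id| match self.get_stmt(id) {
                None => 0,
                Some(Stmt::Expr(_)) => 1,
                Some(Stmt::Block { body }) => self.count_exprs(body),
            })
            .sum()
    }
}
```

The walker never has to mention `Empty` or `Missing`; the `Option` return type absorbs them.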
+ +### Full dispatch table + +| `hir_def::Stmt` | Mapped to public `Stmt` | +|---|---| +| `Empty` / `Missing` | `None` | +| `Expr(e)` | `Stmt::Expr(e)` | +| `EventControl { event, body }` | `Stmt::EventControl { event, body }` | +| `Assignment { val, .. }` with `AssignDst::Var(id)` | `Stmt::Assignment { lhs: AssignmentLhs::Variable(..), rhs: val }` | +| `Assignment { val, .. }` with `AssignDst::FunVar { fun, arg: None }` | `Stmt::Assignment { lhs: AssignmentLhs::FunctionReturn(..), rhs: val }` | +| `Assignment { val, .. }` with `AssignDst::FunVar { fun, arg: Some(arg) }` | `Stmt::Assignment { lhs: AssignmentLhs::FunctionArg(..), rhs: val }` | +| `Assignment { val, .. }` with `AssignDst::Flow(branch)` | `Stmt::Contribute { kind: ContributeKind::Flow, branch: branch.into(), rhs: val }` | +| `Assignment { val, .. }` with `AssignDst::Potential(branch)` | `Stmt::Contribute { kind: ContributeKind::Potential, branch: branch.into(), rhs: val }` | +| `Block { body }` | `Stmt::Block { body }` | +| `If { cond, then_branch, else_branch }` | `Stmt::If { .. }` | +| `ForLoop { init, cond, incr, body }` | `Stmt::ForLoop { .. }` | +| `WhileLoop { cond, body }` | `Stmt::WhileLoop { .. }` | +| `Case { discr, case_arms }` | `Stmt::Case { .. }` | + +The critical translation is for `hir_def::Stmt::Assignment`. In the raw AST, +both variable assignments (`foo = bar`) and contribution statements +(`V(a,b) <+`) are represented as the same `Assignment` node. The +`InferenceResult::assignment_destination` map (keyed by `StmtId`) holds the +resolved `AssignDst` that disambiguates them at the `hir` level: + +- `AssignDst::Var` / `AssignDst::FunVar` → `Stmt::Assignment` +- `AssignDst::Flow` / `AssignDst::Potential` → `Stmt::Contribute` + +This is one of the most important translations in the crate: callers never +need to know that both are represented identically in `hir_def`. 
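The disambiguation of a raw `Assignment` can be reduced to a lookup followed by a match. The sketch below uses toy types throughout: IDs and branch targets are plain integers, only three `AssignDst` variants are modelled, and returning `None` for a statement with no recorded destination is an assumption about error handling that the source does not confirm.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct StmtId(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExprId(pub u32);

/// Reduced destination kinds; payloads are toy ids.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AssignDst {
    Var(u32),
    Flow(u32),
    Potential(u32),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ContributeKind {
    Flow,
    Potential,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Stmt {
    Assignment { lhs: u32, rhs: ExprId },
    Contribute { kind: ContributeKind, branch: u32, rhs: ExprId },
}

/// One raw assignment node becomes either `Assignment` or `Contribute`,
/// depending only on the destination recorded during inference.
pub fn lower_assignment(
    assignment_destination: &HashMap<StmtId, AssignDst>,
    stmt: StmtId,
    val: ExprId,
) -> Option<Stmt> {
    Some(match *assignment_destination.get(&stmt)? {
        AssignDst::Var(var) => Stmt::Assignment { lhs: var, rhs: val },
        AssignDst::Flow(branch) => {
            Stmt::Contribute { kind: ContributeKind::Flow, branch, rhs: val }
        }
        AssignDst::Potential(branch) => {
            Stmt::Contribute { kind: ContributeKind::Potential, branch, rhs: val }
        }
    })
}
```

The raw statement itself contributes only `val`; everything that distinguishes the two public variants comes from the side table.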
+ +### Public `Stmt` enum + +```rust +pub enum Stmt<'a> { + Expr(ExprId), + EventControl { event: &'a Event, body: StmtId }, + Contribute { kind: ContributeKind, branch: BranchWrite, rhs: ExprId }, + Assignment { lhs: AssignmentLhs, rhs: ExprId }, + Block { body: &'a [StmtId] }, + If { cond: ExprId, then_branch: StmtId, else_branch: StmtId }, + ForLoop { init: StmtId, cond: ExprId, incr: StmtId, body: StmtId }, + WhileLoop { cond: ExprId, body: StmtId }, + Case { discr: ExprId, case_arms: &'a [Case] }, +} +``` + +`'a` is the lifetime of the `BodyRef` borrow. References like `&'a Event`, +`&'a [StmtId]`, and `&'a [Case]` borrow directly from the underlying +`hir_def::Body` arena. + +--- + +## 9. `Expr` Dispatch + +`BodyRef::get_expr(expr: ExprId) → Expr<'a>` translates a raw `hir_def::Expr` +into the public `Expr` enum. Panics on `hir_def::Expr` variants that should +never appear in a valid (post-inference) body. + +### Full dispatch table + +| `hir_def::Expr` | Mapped to public `Expr` | +|---|---| +| `Path { .. }` | `Expr::Read(resolve_path(expr))` — see below | +| `BinaryOp { lhs, rhs, op: Some(op) }` | `Expr::BinaryOp { lhs, rhs, op }` | +| `UnaryOp { expr, op }` | `Expr::UnaryOp { expr, op }` | +| `Select { cond, then_val, else_val }` | `Expr::Select { .. }` | +| `Call { args, .. }` with `ResolvedFun::User { func, limit }` | `Expr::Call { fun: ResolvedFun::User { func: Function { id: func }, limit }, args }` | +| `Call { args, .. }` with `ResolvedFun::BuiltIn(builtin)` | `Expr::Call { fun: ResolvedFun::BuiltIn(builtin), args }` | +| `Call { .. }` with `ResolvedFun::Param(param)` | `Expr::Read(Ref::ParamSysFun(param))` — special case | +| `Array(args)` | `Expr::Array(args)` | +| `Literal(lit)` | `Expr::Literal(lit)` | + +### `resolve_path` — path to `Ref` + +`hir_def::Expr::Path` represents any identifier reference. 
After inference, +the `InferenceResult::expr_types` map holds the resolved type, which encodes +the identity of the referenced entity: + +| `Ty` variant | Mapped to `Ref` | +|---|---| +| `Ty::Var(_, id)` | `Ref::Variable(Variable { id })` | +| `Ty::Param(_, id)` | `Ref::Parameter(Parameter { id })` | +| `Ty::FunctionVar { fun, arg: Some(arg), .. }` | `Ref::FunctionArg(FunctionArg { fun_id: fun, arg_id: arg })` | +| `Ty::FunctionVar { fun, arg: None, .. }` | `Ref::FunctionReturn(Function { id: fun })` | +| `Ty::NatureAttr(_, id)` | `Ref::NatureAttr(NatureAttribute { id })` | +| any other `Ty` + `resolved_calls` entry `ResolvedFun::Param(param)` | `Ref::ParamSysFun(param)` | + +### `Param`-as-call special case + +Verilog-A system parameters like `$temperature` and `$vt` can be written +either as bare identifiers or as zero-argument calls (`$temperature()`). +`hir_def` parses the call form as `Expr::Call`, but `hir_ty` resolves it to +`inference::ResolvedFun::Param`. `get_expr()` catches this case and returns +`Expr::Read(Ref::ParamSysFun(param))` instead of `Expr::Call`, hiding the +syntactic distinction from downstream consumers. + +### `get_call_signature` + +```rust +pub fn get_call_signature(&self, expr: ExprId) -> Signature { + self.infere.resolved_signatures.get(&expr).copied().unwrap_or(Signature(u32::MAX)) +} +``` + +Returns the resolved overload signature for a call expression. +`Signature(u32::MAX)` signals "no recorded signature" (e.g. user-defined +function calls, which are not overloaded). The `hir::signatures` module +re-exports all named `Signature` constants from `hir_ty::builtin` and +`hir_ty::types` so callers can match against them by name. 
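The sentinel behaviour of `get_call_signature` is easy to reproduce with a plain `HashMap`. In this sketch the constant name `NO_SIGNATURE` and the numeric value chosen for `NATURE_ACCESS_NODES` are assumptions for illustration; only the `u32::MAX` fallback comes from the code shown above.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ExprId(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Signature(pub u32);

/// Hypothetical name for the "nothing recorded" sentinel.
pub const NO_SIGNATURE: Signature = Signature(u32::MAX);
/// The index value here is assumed, not taken from `hir_ty`.
pub const NATURE_ACCESS_NODES: Signature = Signature(1);

/// Same shape as the documented method, with the map passed explicitly.
pub fn get_call_signature(
    resolved_signatures: &HashMap<ExprId, Signature>,
    expr: ExprId,
) -> Signature {
    resolved_signatures.get(&expr).copied().unwrap_or(NO_SIGNATURE)
}

/// Callers can match on named constants instead of raw indices.
pub fn describe(sig: Signature) -> &'static str {
    match sig {
        NATURE_ACCESS_NODES => "two-node nature access",
        NO_SIGNATURE => "no recorded signature",
        _ => "other built-in overload",
    }
}
```

Using a sentinel keeps the return type a bare `Signature`, at the cost of callers needing to remember that `u32::MAX` is not a valid index.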
+ +### Public `Expr` enum + +```rust +pub enum Expr<'a> { + Read(Ref), + BinaryOp { lhs: ExprId, rhs: ExprId, op: BinaryOp }, + UnaryOp { expr: ExprId, op: UnaryOp }, + Select { cond: ExprId, then_val: ExprId, else_val: ExprId }, + Call { fun: ResolvedFun, args: &'a [ExprId] }, + Array(&'a [ExprId]), + Literal(&'a Literal), +} +``` + +```rust +pub enum Ref { + Variable(Variable), + Parameter(Parameter), + FunctionArg(FunctionArg), + FunctionReturn(Function), + NatureAttr(NatureAttribute), + ParamSysFun(ParamSysFun), +} + +pub enum ResolvedFun { + User { func: Function, limit: bool }, + BuiltIn(BuiltIn), +} +``` + +`ResolvedFun::User::limit` is `true` when the call goes through a `$limit` +wrapper (used in certain compact model convergence-aid patterns). + +--- + +## 10. `AstCache` and Attribute Resolution + +Verilog-A supports `(*` attribute `*)` annotations on declarations. The +`hir` API surfaces this through `get_attr()` methods on `Variable`, +`Parameter`, and `Branch`. These methods require an `AstCache`: + +```rust +pub struct AstCache { + ast: syntax::SourceFile, + id_map: Arc, +} +``` + +`AstCache` is constructed via `CompilationUnit::ast(db)`, which calls +`db.parse(root_file).tree()` and `db.ast_id_map(root_file)`. It pairs the +concrete syntax tree with the stable `AstIdMap` (the `ErasedAstId` index +structure maintained by `basedb`). + +`resolve_attribute(name, erased_id)` locates an attribute by name on any AST +node identified by its `ErasedAstId`: + +1. Calls `id_map.get_attr(id, attribute)` to get the index of the named + attribute (returns `None` if absent). +2. Calls `id_map.get_syntax(id).to_node(ast.syntax())` to recover the CST node. +3. Retrieves the attributes iterator for that node. For `Var` and `Param` + nodes, attributes are attached to the parent statement, not the declaration + itself, so `ast.parent().unwrap()` is called. +4. Returns `attrs.nth(idx)` — the `ast::Attr` CST node. 
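The index-then-`nth` lookup in steps 1 and 4 can be modelled without a syntax tree. This is a rough sketch only: the two `HashMap`s stand in for `AstIdMap::get_attr` and for the CST attribute iterator, attribute values are plain strings, `add_attr` is an invented setup helper, and the parent-node hop from step 3 is left out entirely.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ErasedAstId(pub u32);

/// Toy cache: an attribute-name index plus the attribute values per node.
#[derive(Default)]
pub struct AstCache {
    /// (node, attribute name) -> position among the node's attributes.
    attr_index: HashMap<(ErasedAstId, String), usize>,
    /// node -> attribute values in source order.
    attrs: HashMap<ErasedAstId, Vec<String>>,
}

impl AstCache {
    /// Hypothetical setup helper; appends one attribute to a node.
    pub fn add_attr(&mut self, id: ErasedAstId, name: &str, value: &str) {
        let list = self.attrs.entry(id).or_default();
        self.attr_index.insert((id, name.to_owned()), list.len());
        list.push(value.to_owned());
    }

    /// Step 1: find the attribute's index. Step 4: take the nth attribute.
    pub fn resolve_attribute(&self, name: &str, id: ErasedAstId) -> Option<&str> {
        let idx = *self.attr_index.get(&(id, name.to_owned()))?;
        self.attrs.get(&id)?.iter().nth(idx).map(String::as_str)
    }
}
```

Both lookups can fail independently, which is why the result is an `Option` rather than a panic on a missing attribute.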
+ +The returned `ast::Attr` can be inspected to extract the attribute's value +expression. This round-trip through the CST is intentional: the attribute +system is not lowered into `hir_def`, so the only way to access attributes is +to go back to the original parse tree. + +--- + +## 11. Diagnostics Collection + +`hir::diagnostics::collect(db, root_file, sink)` is the single function that +aggregates all front-end diagnostics into one `DiagnosticSink` pass. It is +called by `CompilationUnit::diagnostics()`. + +The collection pipeline runs in this order: + +1. **Preprocessor diagnostics** — `db.preprocess(root_file).diagnostics` + (unknown macros, unterminated `ifdef`, etc.) + +2. **Parser diagnostics** — `db.parse(root_file).errors()` (syntax errors + from the CST parser) + +3. **Type-validation diagnostics** — `hir_ty::validation::TypeValidationDiagnostic::collect(db, root_file)` + wrapped in `TypeValidationDiagnosticWrapped` with access to the parse tree, + source map, and item tree for source location rendering + +4. **Name-resolution (def) diagnostics** — `collect_def_map()` iterates + `def_map.diagnostics`, wrapping each in `DefDiagnosticWrapped` + +5. **Body/inference diagnostics** — for each module (analog + initial blocks), + function, and named block: + - `InferenceDiagnosticWrapped` — type mismatches, unresolved names, invalid + nature accesses, etc. from `inference_result(def).diagnostics` + - `BodyValidationDiagnostic::collect(db, def)` — semantic rule violations + (e.g. `ddt`/`idt` used outside analog context) wrapped in + `BodyValidationDiagnosticWrapped` + +The traversal is recursive: `collect_scope()` recursively visits child +functions and named blocks inside each module scope. + +All diagnostic types implement the `Diagnostic` trait from `basedb`, which +provides a `render()` method that produces a source-annotated message using +the parse tree, source map, and AST ID map. + +--- + +## 12. 
Worked Example + +### Verilog-A source + +```verilog +`include "disciplines.vams" + +module resistor(p, n); + inout p, n; + electrical p, n; + parameter real R = 1e3; + analog begin + V(p,n) <+ R * I(p,n); + end +endmodule +``` + +### Step 1: Build the database + +```rust +use hir::CompilationDB; + +let db = CompilationDB::new_virtual(SOURCE)?; +let unit = db.compilation_unit(); +``` + +`new_virtual` registers the source at `/root.va`, inserts the standard +library, and returns a `CompilationDB`. No Salsa queries have run yet; +everything is demand-driven. + +### Step 2: Enumerate modules + +```rust +let modules = unit.modules(&db); +// → [Module { id: ModuleId(0) }] + +let m = modules[0]; +println!("{}", m.name(&db)); // "resistor" +``` + +`modules()` walks `def_map(root_file)[entry].declarations` for +`ScopeDefItem::ModuleId` entries and wraps each in `Module { id }`. + +### Step 3: Inspect ports and internal nodes + +```rust +let ports = m.ports(&db); +// → [Node { NodeId(0) = "p" }, Node { NodeId(1) = "n" }] + +for node in &ports { + println!("{}: discipline={}", node.name(&db), node.discipline(&db).name(&db)); +} +// p: discipline=electrical +// n: discipline=electrical +``` + +`Node::discipline()` calls `db.node_discipline(id)` (a `hir_ty` query that +resolves the discipline from the node's declaration scope). + +### Step 4: Enumerate declarations + +```rust +for (name, def) in m.rec_declarations(&db) { + println!("{name}: {def:?}"); +} +// p: Node(Node { NodeId(0) }) +// n: Node(Node { NodeId(1) }) +// R: Parameter(Parameter { ParamId(0) }) +``` + +`rec_declarations` uses `RecDeclarations`, recursing into named sub-scopes. +Builtins, natures, and disciplines are filtered out. 
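The recursion and filtering performed by `rec_declarations` can be pictured with a small self-contained model. The `Def` enum and the free function below are hypothetical stand-ins, not the `hir` types; the sketch assumes a depth-first walk that yields a named sub-scope before its children, which is one plausible ordering rather than a documented guarantee.

```rust
/// Toy declaration kinds; `Scope` is a named block or function with children.
pub enum Def {
    Node,
    Parameter,
    Variable,
    Nature,
    Discipline,
    Builtin,
    Scope(Vec<(String, Def)>),
}

/// Depth-first walk over a scope tree, skipping builtins, natures, disciplines.
pub fn rec_declarations(root: &[(String, Def)]) -> Vec<(&str, &Def)> {
    let mut out = Vec::new();
    // One iterator per scope currently being visited.
    let mut stack = vec![root.iter()];
    while let Some(top) = stack.last_mut() {
        let next = top.next();
        match next {
            // Current scope exhausted: return to the enclosing one.
            None => {
                stack.pop();
            }
            Some((name, def)) => match def {
                Def::Builtin | Def::Nature | Def::Discipline => {}
                Def::Scope(children) => {
                    out.push((name.as_str(), def));
                    stack.push(children.iter());
                }
                _ => out.push((name.as_str(), def)),
            },
        }
    }
    out
}
```

An explicit stack of iterators is used instead of function recursion so the same logic could be turned into a lazy `Iterator` without changing its structure.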
+ +### Step 5: Walk the analog body + +```rust +let body = m.analog_block(&db); +let br = body.borrow(); + +for &sid in br.entry() { + match br.get_stmt(sid) { + Some(hir::Stmt::Contribute { kind, branch, rhs }) => { + println!("Contribute ({kind:?}) to {branch:?}"); + // Contribute (Potential) to Unnamed { hi: Node(p), lo: Some(Node(n)) } + } + _ => {} + } +} +``` + +`get_stmt` consults `infere.assignment_destination[sid]`. The raw +`hir_def::Stmt::Assignment` for `V(p,n) <+` maps to `AssignDst::Potential`, +so the result is `Stmt::Contribute { kind: ContributeKind::Potential, … }`. + +### Step 6: Walk the RHS expression + +The `rhs: ExprId` from the contribute statement refers to `R * I(p,n)`. +Walking it: + +```rust +// rhs → BinaryOp { lhs: ExprId(R_read), rhs: ExprId(I_call), op: Mul } +match br.get_expr(rhs) { + Expr::BinaryOp { lhs, rhs: i_call, op } => { + // lhs: Expr::Read(Ref::Parameter(Parameter { ParamId(0) "R" })) + // i_call: Expr::Call { fun: ResolvedFun::BuiltIn(BuiltIn::flow), args: [p, n] } + let sig = br.get_call_signature(i_call); + // sig == hir::signatures::NATURE_ACCESS_NODES + } + _ => unreachable!() +} +``` + +The nature access `I(p,n)` has been resolved to `ResolvedFun::BuiltIn(flow)` +with signature `NATURE_ACCESS_NODES`. The two argument expressions have types +`Ty::Node(NodeId(0))` and `Ty::Node(NodeId(1))`, accessible via +`br.into_node(arg_expr)`. 
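Steps 5 and 6 follow an arena pattern: expressions are stored in a flat table and refer to each other by id, so a walker recurses through `get_expr`. The sketch below reproduces that pattern with hypothetical, simplified types (`Body`, `Expr`, and `ExprId` here are local toys holding string names, not the `hir` definitions) and renders `R * I(p,n)` back to text.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExprId(pub usize);

/// Simplified expression node; children are ids into the owning `Body`.
#[derive(Debug)]
pub enum Expr {
    Read(&'static str),
    Literal(f64),
    BinaryOp { lhs: ExprId, rhs: ExprId, op: char },
    Call { fun: &'static str, args: Vec<ExprId> },
}

/// Flat arena of expressions, indexed by `ExprId`.
pub struct Body {
    pub exprs: Vec<Expr>,
}

impl Body {
    pub fn get_expr(&self, id: ExprId) -> &Expr {
        &self.exprs[id.0]
    }

    /// Recursively renders an expression by following child ids.
    pub fn render(&self, id: ExprId) -> String {
        match self.get_expr(id) {
            Expr::Read(name) => name.to_string(),
            Expr::Literal(value) => value.to_string(),
            Expr::BinaryOp { lhs, rhs, op } => {
                format!("({} {} {})", self.render(*lhs), op, self.render(*rhs))
            }
            Expr::Call { fun, args } => {
                let rendered: Vec<String> = args.iter().map(|a| self.render(*a)).collect();
                format!("{}({})", fun, rendered.join(", "))
            }
        }
    }
}
```

Because nodes hold ids rather than references, the walker never borrows more than one arena, which is the same property that lets the real `BodyRef` hand out `ExprId`s freely.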
+ +### Summary of resolved state + +| What | Raw `hir_def` | Resolved in `hir` | +|---|---|---| +| `V(p,n) <+` | `Stmt::Assignment` | `Stmt::Contribute { kind: Potential, branch: Unnamed { p, n } }` | +| `R` identifier | `Expr::Path` | `Expr::Read(Ref::Parameter(R))` | +| `I(p,n)` call | `Expr::Call` | `Expr::Call { fun: BuiltIn(flow), sig: NATURE_ACCESS_NODES }` | +| `p` argument | `Expr::Path` | `Ty::Node(p)` → `into_node()` → `Node { p }` | From 121546ba3b5d4f97aa1c7a6a0bdc6259b854eded Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sat, 23 May 2026 14:03:25 +0200 Subject: [PATCH 09/28] docs: rewrite ARCHITECTURE.md with cross-links and stage-by-stage walkthrough --- docs/ARCHITECTURE.md | 529 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 529 insertions(+) create mode 100644 docs/ARCHITECTURE.md diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md new file mode 100644 index 00000000..1c445a72 --- /dev/null +++ b/docs/ARCHITECTURE.md @@ -0,0 +1,529 @@ +# OpenVAF — Architecture + +OpenVAF is a compiler for Verilog-A compact device models. It reads `.va` +source files and produces native shared libraries (`.so` on Linux/macOS, +`.dll` on Windows) that expose the [OSDI 0.4](https://opensimulatorinterface.org) +interface. Circuit simulators such as ngspice and Qucs-S can `dlopen` these +libraries and use the models directly. 
+ +--- + +## Repository layout + +``` +openvaf/ — all Rust source (read-only reference; do not edit) + openvaf/ — the CLI entry point and top-level orchestration + hir*/ — front end: parsing, HIR, type inference + mir*/ — middle end: SSA IR, optimisation, AD + sim_back/ — simulation model builder + osdi/ — OSDI descriptor emission and full codegen driver + mir_llvm/ — MIR → LLVM IR translation + linker/ — platform linker invocation + target/ — target triple and CPU definitions + basedb/ — root Salsa database + tokens/ lexer/ preprocessor/ parser/ syntax/ vfs/ — front-end pipeline + stdx/ arena/ bitset/ … — utility crates + +docs/ — this documentation tree + ARCHITECTURE.md — this file + mir/INTERNALS.md — MIR SSA representation + mir_autodiff/INTERNALS.md + hir_lower/INTERNALS.md + hir_def/INTERNALS.md + hir_ty/INTERNALS.md + hir/INTERNALS.md + mir_llvm/INTERNALS.md + sim_back/INTERNALS.md + osdi/INTERNALS.md + +tutorials/ — progressive tutorial series (in progress) +``` + +--- + +## Pipeline overview + +``` +┌──────────────────────────────────────────────────────────────────────────────┐ +│ Source (.va) │ +└────────────────────────────────┬─────────────────────────────────────────────┘ + │ + ┌────────────▼────────────┐ + │ Preprocessing │ preprocessor, vfs + │ macro expansion │ + │ `include / `ifdef │ + └────────────┬────────────┘ + │ + ┌────────────▼────────────┐ + │ Lexing & Parsing │ tokens, lexer, parser, syntax + │ lossless rowan CST │ + └────────────┬────────────┘ + │ + ┌────────────▼────────────┐ + │ HIR Construction │ basedb, hir_def + │ item tree │ + │ name resolution │ + │ expression bodies │ + └────────────┬────────────┘ + │ + ┌────────────▼────────────┐ + │ Type Inference │ hir_ty, hir + │ builtin signatures │ + │ nature/discipline │ + │ assignment targets │ + └────────────┬────────────┘ + │ + ┌────────────▼────────────┐ + │ HIR → MIR Lowering │ hir_lower, mir_build + │ SSA construction │ + │ HirInterner wiring │ + └────────────┬────────────┘ + │ + 
┌────────────▼────────────┐ + │ MIR Optimisation │ mir_opt + │ SCCP, GVN, inst-comb │ + │ DCE, simplify_cfg │ + └────────────┬────────────┘ + │ + ┌────────────▼────────────┐ + │ Automatic Diff (AD) │ mir_autodiff + │ Jacobian generation │ + │ source transformation │ + └────────────┬────────────┘ + │ + ┌────────────▼────────────┐ + │ Simulation Backend │ sim_back + │ DAE system assembly │ + │ node collapse │ + │ init / noise funcs │ + └────────────┬────────────┘ + │ + ┌────────────▼────────────┐ + │ LLVM IR Codegen │ mir_llvm, osdi + │ MIR → LLVM IR │ + │ OSDI descriptor IR │ + │ → .o object files │ + └────────────┬────────────┘ + │ + ┌────────────▼────────────┐ + │ Linking │ linker, target + │ .o files → .so / .dll │ + └────────────┬────────────┘ + │ + ┌────────────▼────────────┐ + │ OSDI 0.4 Shared Library│ + │ (.so / .dll) │ + └─────────────────────────┘ +``` + +--- + +## Crate map + +### Entry points + +| Crate | Kind | Role | +|---|---|---| +| `openvaf-driver` | binary | CLI (`openvaf`); parses arguments, calls `openvaf::compile` | +| `openvaf` | lib | Orchestrates the full pipeline: `compile()` and `expand()` | + +### Front end — syntax + +| Crate | Role | +|---|---| +| `vfs` | Virtual filesystem; maps `FileId` to source text | +| `tokens` | `SyntaxKind` enum and `T!` macro shared by lexer and parser | +| `lexer` | Produces a flat token stream from a `&str` | +| `preprocessor` | Macro expansion, `` `include ``, `` `ifdef ``/`` `ifndef ``; integrates with `vfs` | +| `parser` | Converts token stream to rowan parse events | +| `syntax` | Constructs a lossless rowan CST (`Parse`); combines parser + preprocessor | + +### Front end — semantic analysis / HIR + +| Crate | Role | INTERNALS | +|---|---|---| +| `basedb` | Root Salsa database; VFS queries, line indexing, lint storage | — | +| `hir_def` | `ItemTree`, name resolution (`DefMap`), expression/statement body arenas | [hir_def/INTERNALS.md](hir_def/INTERNALS.md) | +| `hir_ty` | Type inference, builtin 
signatures, nature/discipline/branch resolution | [hir_ty/INTERNALS.md](hir_ty/INTERNALS.md) | +| `hir` | Public OO API over `hir_def` + `hir_ty`; `CompilationDB`; `Body`/`BodyRef` | [hir/INTERNALS.md](hir/INTERNALS.md) | + +### Middle end — MIR + +| Crate | Role | INTERNALS | +|---|---|---| +| `mir` | SSA-form IR: `Function`, `DataFlowGraph`, `Layout`, `ControlFlowGraph`, dominators | [mir/INTERNALS.md](mir/INTERNALS.md) | +| `mir_build` | Cranelift-style `FunctionBuilder`; constructs MIR from a stream of instructions | — | +| `hir_lower` | Lowers HIR `Body` to MIR via `MirBuilder`; defines `HirInterner`, `ParamKind`, `PlaceKind` | [hir_lower/INTERNALS.md](hir_lower/INTERNALS.md) | +| `mir_opt` | Optimisation passes: SCCP, GVN, inst-combine, DCE, aggressive DCE, simplify-cfg, taint propagation | — | +| `mir_autodiff` | Automatic differentiation by source transformation; adds derivative instructions to an existing `Function` | [mir_autodiff/INTERNALS.md](mir_autodiff/INTERNALS.md) | +| `mir_interpret` | Interpreter for MIR (used in tests) | — | +| `mir_reader` | Text serialisation / deserialisation of MIR (for test fixtures) | — | + +### Back end + +| Crate | Role | INTERNALS | +|---|---|---| +| `sim_back` | Builds `CompiledModule` containing `DaeSystem`, eval/init/noise `Function`s, node collapse | [sim_back/INTERNALS.md](sim_back/INTERNALS.md) | +| `osdi` | Full codegen driver: calls `sim_back`, translates MIR via `mir_llvm`, emits OSDI 0.4 descriptor IR | [osdi/INTERNALS.md](osdi/INTERNALS.md) | +| `mir_llvm` | Translates MIR `Function`s to LLVM IR via `llvm-sys`; manages lifetime hierarchy `LLVMBackend` → `ModuleLlvm` → `CodegenCx` → `Builder` | [mir_llvm/INTERNALS.md](mir_llvm/INTERNALS.md) | +| `target` | Target triple, CPU, and platform definitions | — | +| `linker` | Invokes MSVC / GCC / Clang linker to produce `.dll`/`.so` | — | + +### Utility crates + +`stdx`, `arena`, `bitset`, `bforest`, `list_pool`, `typed_indexmap`, +`workqueue`, `paths`, `base_n`, 
`mini_harness`, `sourcegen`, `xtask`. + +--- + +## Stage-by-stage walkthrough + +### 1. Source → CST + +The front-end pipeline is: VFS read → preprocessor → lexer → parser → `syntax`. + +The `preprocessor` expands `` `define ``/`` `ifdef ``/`` `include `` and integrates +with the `Vfs` to follow `include` chains across file boundaries. It produces +a flat, macro-expanded token stream with source spans that map back to the +original files via a `SourceMap`. + +The `parser` consumes this stream and emits rowan parse events. The `syntax` +crate turns those events into a lossless concrete syntax tree +(`Parse`). "Lossless" means all whitespace and comments are +preserved; every byte of the original source has a node in the tree. This +matters because `basedb` uses the tree for span-accurate error reporting. + +The CST is the last representation that understands Verilog-A surface syntax. +Everything below operates on derived, computed views. + +### 2. CST → item tree and name resolution (`hir_def`) + +→ **Details:** [hir_def/INTERNALS.md](hir_def/INTERNALS.md) + +`hir_def` builds two key structures from the CST: + +**`ItemTree`** is a stripped, body-free summary of all top-level declarations. +It is the invalidation firewall: if only a function body changes, the +`ItemTree` is unchanged, and no query that depends on names or structure needs +to re-run. + +**`DefMap`** is a tree of `Scope` nodes, one per lexical scope +(file root, module, named block, function). Each scope holds an +`IndexMap` of its declarations. The `DefCollector` builds +this by walking the `ItemTree`. + +**`Body`** holds `Arena` and `Arena` for every definition with a +body (`analog` block, `analog initial` block, function, parameter default, +variable initialiser, nature attribute). Paths are recorded as unresolved +strings at this stage. + +All these structures are Salsa queries; re-running them is demand-driven and +incremental. + +### 3. 
Type inference and discipline resolution (`hir_ty`) + +→ **Details:** [hir_ty/INTERNALS.md](hir_ty/INTERNALS.md) + +`hir_ty` produces an `InferenceResult` for every `DefWithBodyId`. The five +maps it contains are the most important outputs of the front end: + +| Map | Keys | Values | +|---|---|---| +| `expr_types` | `ExprId` | `Ty` (extended type; includes `Node`, `Branch`, `Param`, …) | +| `resolved_calls` | `ExprId` | `ResolvedFun` (builtin vs user vs param-sys-fun) | +| `resolved_signatures` | `ExprId` | `Signature` (which overload of a builtin) | +| `assignment_destination` | `StmtId` | `AssignDst` (variable, function arg, flow/potential contribution) | +| `casts` | `ExprId` | `Type` (required implicit cast target) | + +Additionally, `hir_ty` resolves nature/discipline hierarchies (`NatureTy`, +`DisciplineTy`, `BranchTy`) and alias-parameter chains. + +### 4. Public HIR API (`hir`) + +→ **Details:** [hir/INTERNALS.md](hir/INTERNALS.md) + +`hir` is the **only layer that `hir_lower`, `sim_back`, `osdi`, and external +tooling import**. It presents: + +- A single concrete Salsa database (`CompilationDB`) that bundles all four + query groups. +- OO newtypes (`Module`, `Node`, `Branch`, `Variable`, `Parameter`, …) that + wrap Salsa intern IDs and expose methods taking only `&CompilationDB`. +- `Body` + `BodyRef`: a combined view that joins `hir_def::Body` with + `InferenceResult`. `get_stmt()` translates raw `hir_def::Stmt + + AssignDst` into the public `Stmt` enum, disambiguating variable assignments + from contribution statements. `get_expr()` resolves paths and call targets + into the public `Expr` enum. +- `RecDeclarations`: a depth-first iterator over all declarations in a scope + hierarchy. + +### 5. 
HIR → MIR lowering (`hir_lower`) + +→ **Details:** [hir_lower/INTERNALS.md](hir_lower/INTERNALS.md) + +The entry point is `MirBuilder::build()`, which uses a Cranelift-style +`FunctionBuilder` from `mir_build` to construct an SSA `mir::Function` by +walking the HIR `BodyRef`. + +The `HirInterner` is the critical bridge between the HIR and MIR worlds. +The MIR has no knowledge of Verilog-A concepts; it only knows `Value`, +`Param`, and `FuncRef`. `HirInterner` provides the mapping: + +```rust +pub struct HirInterner { + pub outputs: IndexMap>, + pub params: TiMap, + pub callbacks: TiSet, + pub implicit_equations: TiVec, + pub lim_state: TiMap>, + // … +} +``` + +`ParamKind` names every possible MIR input: model parameters +(`Param(Parameter)`), simulation state (`Voltage`, `Current`, `Temperature`, +`Abstime`), convergence aids (`EnableLim`, `PrevState`, `HiddenState`), and +OSDI protocol flags (`ParamGiven`, `PortConnected`). + +`PlaceKind` names every possible MIR output: variable writes (`Var`), +branch contributions (`Contribute { dst, reactive, voltage_src }`), implicit +residuals, and parameter bounds. + +`HirInterner` is carried alongside the `Function` through all subsequent +passes so that `sim_back` and `osdi` can interpret MIR `Param`/`Value` +indices in terms of HIR entities. + +### 6. MIR optimisation (`mir_opt`) + +MIR passes operate on `mir::Function` in isolation — no HIR, no database. +The following passes are exported from `mir_opt`: + +| Pass | What it does | +|---|---| +| `sparse_conditional_constant_propagation` | SCCP: folds constants and eliminates unreachable branches | +| `global_value_numbering` (GVN) | Eliminates redundant computations by assigning value classes | +| `inst_combine` | Peephole rewrites (e.g. 
`a * 1.0 → a`, double-negation) | +| `dead_code_elimination` | Removes instructions whose results are unused | +| `aggressive_dead_code_elimination` | Removes instructions whose results are only used by other dead instructions | +| `simplify_cfg` / `simplify_cfg_no_phi_merge` | Merges trivially empty blocks, removes unreachable blocks | +| `propagate_taint` / `propagate_direct_taint` | Marks values that depend on operating-point inputs (used by AD to decide what to differentiate) | + +Passes run both before AD (to simplify the function before differentiation) +and after AD (to clean up the generated derivative code). + +### 7. Automatic differentiation (`mir_autodiff`) + +→ **Details:** [mir_autodiff/INTERNALS.md](mir_autodiff/INTERNALS.md) + +AD is performed by **source transformation directly on MIR**, not on LLVM IR. +The entry point is `auto_diff(func, dom_tree, derivatives, extra_derivatives)`. + +This placement in the pipeline means: +- Derivatives are computed before final optimisation, so `mir_opt` can + simplify the generated Jacobian code. +- The derivative instructions share the same SSA representation as the + primal instructions, so all MIR passes apply uniformly. +- No LLVM-level differentiation infrastructure is needed. + +`HirInterner::unknowns()` computes the `KnownDerivatives` structure that tells +AD which MIR `Value`s correspond to voltage/current unknowns (and thus need +Jacobian columns) and which `FuncRef`s are `ddx` calls. + +### 8. Simulation model construction (`sim_back`) + +→ **Details:** [sim_back/INTERNALS.md](sim_back/INTERNALS.md) + +`sim_back::collect_modules(db, all_vars_opvars, sink)` runs the full front end +for all modules and returns `Vec`.
For each module, the `osdi` +driver then calls `sim_back` to build a `CompiledModule`: + +```rust +pub struct CompiledModule<'a> { + pub info: &'a ModuleInfo, + pub dae_system: DaeSystem, + pub eval: Function, // main evaluation kernel (with HirInterner) + pub intern: HirInterner, + pub init: Initialization, // instance setup kernel + pub model_param_setup: Function, // model-level parameter setup + pub model_param_intern: HirInterner, + pub node_collapse: NodeCollapse, +} +``` + +`DaeSystem` describes the differential-algebraic equation system: the +contribution residuals, Jacobian sparsity, collapsed nodes, noise sources, and +which branches are voltage sources vs current sources. + +`NodeCollapse` records which node pairs can be collapsed to zero voltage +difference (a simulator optimisation that reduces matrix size). + +`Initialization` holds the instance-level setup kernel and its cache-slot +metadata (values computed once at instance creation and cached for use in +subsequent evaluations). + +### 9. LLVM IR codegen (`mir_llvm`, `osdi`) + +→ **Details:** [mir_llvm/INTERNALS.md](mir_llvm/INTERNALS.md) | [osdi/INTERNALS.md](osdi/INTERNALS.md) + +`osdi::compile()` is the driver. For each `CompiledModule` it: + +1. Calls `mir_llvm` to translate each MIR `Function` to LLVM IR via + `CodegenCx::build_func()`. The four-layer ownership hierarchy is: + `LLVMBackend` (target spec, `'t`) → `ModuleLlvm` (LLVM module, `'ll`) → + `CodegenCx` (type cache, `'cx`) → `Builder` (instruction emitter, `'a`). + +2. Emits the OSDI 0.4 descriptor as LLVM globals directly: `OSDI_DESCRIPTORS` + (array of `osdi_descriptor` structs) and `OSDI_DESCRIPTOR_SIZE` (`u32`). + +3. Runs LLVM's optimisation pipeline (`LLVMRunPasses`) at the requested + optimisation level (O0–O3). + +4. Emits a native object file (`.o`) per module via `LLVMTargetMachineEmitToFile`. + +### 10. 
Linking + +`linker::link()` invokes the platform linker (MSVC `link.exe`, GCC, or Clang) +to combine the `.o` files into a single `.so`/`.dll`. The intermediate `.o` +files are deleted after a successful link. + +--- + +## Incremental computation with Salsa + +OpenVAF uses [Salsa](https://github.com/salsa-rs/salsa) for incremental, +demand-driven computation across the entire front end. `CompilationDB` +implements four Salsa query groups that layer on top of each other: + +``` +BaseDatabase — VFS, preprocessing, parsing, AstIdMap, lint registry + ↑ +InternDatabase — intern_*/lookup_intern_* for all Salsa IDs + ↑ +HirDefDatabase — item_tree, def_map, body, *_data queries + ↑ +HirTyDatabase — inference_result, discipline_info, branch_info, resolve_alias, … +``` + +Each query memoises its result. When a source file changes, Salsa invalidates +only the queries whose inputs changed. Because `ItemTree` strips expression +bodies, a change inside a function body does not invalidate the `def_map` +query (which depends on the `ItemTree`, not on bodies). Similarly, a change +to a parameter default value does not retrigger type inference for the module's +analog block. + +The MIR and everything downstream (`mir_opt`, `mir_autodiff`, `sim_back`, +`osdi`, `mir_llvm`, `linker`) are **not Salsa queries**. They run in a +single eagerly-executed pass, driven by `openvaf::compile()`. The Salsa +boundary is at `hir`: once a `CompiledModule` is built, all subsequent work +is pure computation. 
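The "invalidation firewall" idea can be illustrated without Salsa. The toy below is not the Salsa API and none of its names (`Source`, `Db`, `def_map_runs`) exist in OpenVAF; it hand-rolls a single memoised query whose only input is a body-free summary, so editing a body leaves the cached result untouched.

```rust
/// Toy source file: declaration names plus the text of a body.
pub struct Source {
    pub decl_names: Vec<String>,
    pub body_text: String,
}

/// Body-free summary, standing in for the item tree.
fn item_tree(src: &Source) -> Vec<String> {
    src.decl_names.clone()
}

/// Hand-rolled memoisation of one derived query.
pub struct Db {
    cached_tree: Option<Vec<String>>,
    cached_def_map: Vec<(String, usize)>,
    /// Counts how often the expensive query really executed.
    pub def_map_runs: u32,
}

impl Db {
    pub fn new() -> Self {
        Db { cached_tree: None, cached_def_map: Vec::new(), def_map_runs: 0 }
    }

    /// Recomputes only when the summary changed, never because a body changed.
    pub fn def_map(&mut self, src: &Source) -> &[(String, usize)] {
        let tree = item_tree(src);
        if self.cached_tree.as_ref() != Some(&tree) {
            self.def_map_runs += 1;
            self.cached_def_map = tree
                .iter()
                .cloned()
                .enumerate()
                .map(|(index, name)| (name, index))
                .collect();
            self.cached_tree = Some(tree);
        }
        &self.cached_def_map
    }
}
```

Salsa generalises this pattern: it compares each query's new output with the memoised one and stops propagating invalidation when they are equal.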
+ +--- + +## Key data structures at stage boundaries + +| Boundary | Producer | Type | Consumer | +|---|---|---|---| +| Source text | VFS | `FileId` + byte content | `preprocessor` | +| Token stream | `preprocessor` | `TokenStream` + `SourceMap` | `parser` | +| CST | `syntax` | `Parse` | `hir_def` | +| Item tree | `hir_def` | `ItemTree` | `hir_ty`, `hir` | +| Name resolution | `hir_def` | `DefMap` | `hir_ty`, `hir` | +| Expression bodies | `hir_def` | `Body` + `BodySourceMap` | `hir_ty`, `hir` | +| Type inference | `hir_ty` | `InferenceResult` | `hir`, `hir_lower` | +| HIR (public API) | `hir` | `CompilationDB`, `Body`, `BodyRef` | `hir_lower`, `sim_back` | +| MIR + interner | `hir_lower` | `mir::Function` + `HirInterner` | `mir_opt`, `mir_autodiff`, `sim_back` | +| Optimised MIR | `mir_opt` | `mir::Function` | `mir_autodiff`, `sim_back` | +| Differentiated MIR | `mir_autodiff` | `mir::Function` (with derivative instrs) | `sim_back` | +| Simulation model | `sim_back` | `CompiledModule` (`DaeSystem`, eval/init `Function`s) | `osdi` | +| LLVM IR | `mir_llvm` | LLVM `Module` (via `llvm-sys`) | `osdi` (descriptor emission) | +| Object files | `osdi` | `.o` file paths | `linker` | +| Shared library | `linker` | `.so` / `.dll` with `OSDI_DESCRIPTORS` | simulator | + +--- + +## The `compile()` entry point + +`openvaf::compile(opts: &Opts)` in `openvaf/openvaf/src/lib.rs` orchestrates +the entire pipeline. 
Its `Opts` struct: + +```rust +pub struct Opts { + pub dry_run: bool, + pub defines: Vec, // preprocessor defines + pub codegen_opts: Vec, // passed to LLVMBackend + pub lints: Vec<(String, LintLevel)>, + pub input: Utf8PathBuf, + pub output: CompilationDestination, + pub include: Vec, + pub opt_lvl: LLVMCodeGenOptLevel, + pub target: Target, + pub target_cpu: String, + pub dump_mir: bool, + pub dump_unopt_mir: bool, + pub dump_ir: bool, + pub dump_unopt_ir: bool, +} +``` + +`CompilationDestination` is either `Path { lib_file }` (explicit output path) +or `Cache { cache_dir }` (content-addressable cache: if the library already +exists under the cache key, compilation is skipped entirely). + +### Execution phases + +**Phase 1 — front end (Salsa).** +`CompilationDB::new_fs(input, include, defines, lints)` constructs the +database. `collect_modules(db, false, sink)` runs the entire Salsa front +end — preprocessing, parsing, HIR construction, type inference — and collects +all module declarations into `Vec`. If any fatal diagnostic is +emitted, compilation terminates with `FatalDiagnostic`. + +**Phase 2 — backend (eager).** +`LLVMBackend::new(codegen_opts, target, target_cpu, &[])` initialises the +LLVM target. `osdi::compile(db, modules, lib_file, target, back, true, +opt_lvl, dump_mir, dump_unopt_mir, dump_ir, dump_unopt_ir)` drives `sim_back` +→ `mir_opt` → `mir_autodiff` → `mir_llvm` for each module and returns +`(object_file_paths, compiled_modules, literals)`. + +**Phase 3 — linking.** +`linker::link(None, target, lib_file, |linker| { linker.add_object(path) })` +invokes the platform linker. Intermediate `.o` files are deleted on success. 
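The effect of `CompilationDestination` on the phases above can be sketched as a small decision function. Only the two variant names and their fields come from the text; `Plan`, `plan`, the `cache_key` string, and the `<key>.osdi` file naming are assumptions for illustration, and the real driver may derive the key and file name differently.

```rust
use std::path::{Path, PathBuf};

pub enum CompilationDestination {
    Path { lib_file: PathBuf },
    Cache { cache_dir: PathBuf },
}

/// Hypothetical outcome of inspecting the destination before compiling.
#[derive(Debug, PartialEq, Eq)]
pub enum Plan {
    /// Run all three phases and write `lib_file`.
    Compile { lib_file: PathBuf },
    /// A library for this key already exists; skip compilation.
    Reuse { lib_file: PathBuf },
}

/// `exists` is injected so the decision can be tested without touching disk.
pub fn plan(
    dest: &CompilationDestination,
    cache_key: &str,
    exists: impl Fn(&Path) -> bool,
) -> Plan {
    match dest {
        // Assumption: an explicit output path always triggers a compile.
        CompilationDestination::Path { lib_file } => Plan::Compile { lib_file: lib_file.clone() },
        CompilationDestination::Cache { cache_dir } => {
            // Assumed naming scheme: one file per cache key.
            let lib_file = cache_dir.join(format!("{}.osdi", cache_key));
            if exists(&lib_file) {
                Plan::Reuse { lib_file }
            } else {
                Plan::Compile { lib_file }
            }
        }
    }
}
```

In a real driver the `exists` argument would typically be `|p| p.exists()`.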
+ +### Debugging flags + +| Flag | Effect | +|---|---| +| `--dump-mir` | Prints optimised MIR for each module to stdout | +| `--dump-unopt-mir` | Prints unoptimised MIR (before `mir_opt` passes) | +| `--dump-ir` | Prints optimised LLVM IR to a file | +| `--dump-unopt-ir` | Prints unoptimised LLVM IR (before `LLVMRunPasses`) | +| `--dry-run` | Stops after `collect_modules`; does not invoke `osdi::compile` | + +--- + +## OSDI 0.4 output + +The final artifact is a native shared library. The `osdi` crate emits two +LLVM globals before handing off to the linker: + +**`OSDI_DESCRIPTORS`** — an array of OSDI 0.4 descriptor structs, one per +Verilog-A `module`. Each descriptor is typed as the LLVM struct +`OsdiTys.osdi_descriptor`, built from the per-target stdlib bitcode embedded +at build time in `osdi/build.rs`. Each descriptor encodes: + +- Model name and version string. +- Parameter and instance-variable memory layout (byte offsets, OSDI types, + "given" flag bit positions). +- Jacobian sparsity pattern and column/row offset tables. +- Node-collapse pairs. +- Function pointers for the evaluation kernels: model setup, instance setup, + DC evaluation, AC evaluation, noise evaluation, temperature update, and + limit-state handling. + +**`OSDI_DESCRIPTOR_SIZE`** — a `u32` constant holding the byte size of a +single descriptor. Simulators that support both OSDI 0.3 and 0.4 use this to +stride through the array. + +Simulators `dlopen` the library, locate `OSDI_DESCRIPTORS` and +`OSDI_DESCRIPTOR_SIZE`, and iterate over the descriptors to register each +model with the internal device catalogue. 
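The role of `OSDI_DESCRIPTOR_SIZE` as a stride can be shown with plain byte arithmetic. This sketch does not load a library or use the real descriptor layout: `split_descriptors` and `leading_ids` are hypothetical helpers, and the "little-endian `u32` at offset 0" field is invented so there is something to read. A real simulator would obtain raw pointers via `dlsym` and work with the C struct definitions instead.

```rust
/// Splits a descriptor array into per-descriptor byte slices using the
/// size published by the library, not the consumer's own struct size.
pub fn split_descriptors(blob: &[u8], descriptor_size: u32) -> Option<Vec<&[u8]>> {
    let stride = descriptor_size as usize;
    // Reject a zero stride or an array that is not a whole number of entries.
    if stride == 0 || blob.len() % stride != 0 {
        return None;
    }
    Some(blob.chunks_exact(stride).collect())
}

/// Reads a made-up leading `u32` field from every descriptor. A consumer
/// that only knows an older, shorter layout can still do this, because it
/// advances by the published stride and ignores the trailing bytes.
pub fn leading_ids(blob: &[u8], descriptor_size: u32) -> Option<Vec<u32>> {
    let mut ids = Vec::new();
    for desc in split_descriptors(blob, descriptor_size)? {
        let head = desc.get(..4)?;
        ids.push(u32::from_le_bytes([head[0], head[1], head[2], head[3]]));
    }
    Some(ids)
}
```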
From 5840422b7b614c4e075c6b9862cd41b2f53ff65c Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sat, 23 May 2026 14:19:52 +0200 Subject: [PATCH 10/28] tutorials: add Quarto tutorial series T1-T15 --- tutorials/_quarto.yml | 45 ++++ tutorials/index.qmd | 66 ++++++ tutorials/models/.gitkeep | 0 tutorials/netlists/.gitkeep | 0 tutorials/t01-build-and-compile.qmd | 268 ++++++++++++++++++++++ tutorials/t02-item-tree.qmd | 169 ++++++++++++++ tutorials/t03-name-resolution.qmd | 133 +++++++++++ tutorials/t04-lowered-body.qmd | 142 ++++++++++++ tutorials/t05-mir-contribution.qmd | 140 +++++++++++ tutorials/t06-type-inference.qmd | 112 +++++++++ tutorials/t07-autodiff.qmd | 107 +++++++++ tutorials/t08-diode-nonlinearity.qmd | 115 ++++++++++ tutorials/t09-osdi-descriptor.qmd | 124 ++++++++++ tutorials/t10-ngspice-simulation.qmd | 122 ++++++++++ tutorials/t11-multiport-node-collapse.qmd | 98 ++++++++ tutorials/t12-multi-domain.qmd | 103 +++++++++ tutorials/t13-noise-analysis.qmd | 96 ++++++++ tutorials/t14-analog-function.qmd | 128 +++++++++++ tutorials/t15-cross-platform.qmd | 131 +++++++++++ 19 files changed, 2099 insertions(+) create mode 100644 tutorials/_quarto.yml create mode 100644 tutorials/index.qmd create mode 100644 tutorials/models/.gitkeep create mode 100644 tutorials/netlists/.gitkeep create mode 100644 tutorials/t01-build-and-compile.qmd create mode 100644 tutorials/t02-item-tree.qmd create mode 100644 tutorials/t03-name-resolution.qmd create mode 100644 tutorials/t04-lowered-body.qmd create mode 100644 tutorials/t05-mir-contribution.qmd create mode 100644 tutorials/t06-type-inference.qmd create mode 100644 tutorials/t07-autodiff.qmd create mode 100644 tutorials/t08-diode-nonlinearity.qmd create mode 100644 tutorials/t09-osdi-descriptor.qmd create mode 100644 tutorials/t10-ngspice-simulation.qmd create mode 100644 tutorials/t11-multiport-node-collapse.qmd create mode 100644 tutorials/t12-multi-domain.qmd create mode 100644 tutorials/t13-noise-analysis.qmd 
create mode 100644 tutorials/t14-analog-function.qmd create mode 100644 tutorials/t15-cross-platform.qmd diff --git a/tutorials/_quarto.yml b/tutorials/_quarto.yml new file mode 100644 index 00000000..5c060288 --- /dev/null +++ b/tutorials/_quarto.yml @@ -0,0 +1,45 @@ +project: + type: book + output-dir: _site + +book: + title: "Understanding OpenVAF" + subtitle: "A Progressive Tutorial Series" + author: "Philippe Velha" + date: today + repo-url: https://github.com/philippevelha/OpenVAF-Reloaded + repo-actions: [issue] + chapters: + - index.qmd + - part: "Beginner" + chapters: + - t01-build-and-compile.qmd + - t02-item-tree.qmd + - t03-name-resolution.qmd + - t04-lowered-body.qmd + - t05-mir-contribution.qmd + - part: "Intermediate" + chapters: + - t06-type-inference.qmd + - t07-autodiff.qmd + - t08-diode-nonlinearity.qmd + - t09-osdi-descriptor.qmd + - t10-ngspice-simulation.qmd + - part: "Advanced" + chapters: + - t11-multiport-node-collapse.qmd + - t12-multi-domain.qmd + - t13-noise-analysis.qmd + - t14-analog-function.qmd + - t15-cross-platform.qmd + +format: + html: + theme: flatly + highlight-style: github + code-copy: true + toc: true + +execute: + eval: false + echo: true diff --git a/tutorials/index.qmd b/tutorials/index.qmd new file mode 100644 index 00000000..3ad43bb7 --- /dev/null +++ b/tutorials/index.qmd @@ -0,0 +1,66 @@ +--- +title: "Understanding OpenVAF" +subtitle: "A Progressive Tutorial Series" +--- + +This series takes a reader who knows Verilog-A as a user and wants to +understand how OpenVAF compiles it — from source text to OSDI `.osdi` shared +library — step by step. + +Each tutorial is self-contained and builds on the previous one. You need a +working Rust toolchain and a checkout of the repository. Nothing else is +required until [T10](t10-ngspice-simulation.qmd), which loads the compiled model into ngspice. + +> **Status note:** Tutorials are marked ✅ once verified by running them +> from a clean checkout. 
An unmarked tutorial is a complete draft that has +> not yet been run end-to-end. + +--- + +## Beginner + +| # | Title | Status | +|---|---|---| +| T1 | [Build OpenVAF and Compile a Resistor](t01-build-and-compile.qmd) | draft | +| T2 | [Inspect the Item Tree](t02-item-tree.qmd) | draft | +| T3 | [Read the Name-Resolution Map](t03-name-resolution.qmd) | draft | +| T4 | [Examine the Lowered Body](t04-lowered-body.qmd) | draft | +| T5 | [Trace a Contribution Through the MIR](t05-mir-contribution.qmd) | draft | + +## Intermediate + +| # | Title | Status | +|---|---|---| +| T6 | [Follow Type Inference for a Simple Model](t06-type-inference.qmd) | draft | +| T7 | [Understand the Automatic Differentiation Pass](t07-autodiff.qmd) | draft | +| T8 | [Add a Simple Nonlinearity (Diode I–V)](t08-diode-nonlinearity.qmd) | draft | +| T9 | [Inspect the OSDI Descriptor](t09-osdi-descriptor.qmd) | draft | +| T10 | [Use ngspice to Run a DC Sweep](t10-ngspice-simulation.qmd) | draft | + +## Advanced + +| # | Title | Status | +|---|---|---| +| T11 | [Multi-Port Model and Node Collapse](t11-multiport-node-collapse.qmd) | draft | +| T12 | [Multi-Domain Model (Electro-Thermal)](t12-multi-domain.qmd) | draft | +| T13 | [Noise Analysis Support](t13-noise-analysis.qmd) | draft | +| T14 | [Custom Analog Function and the Function DefMap](t14-analog-function.qmd) | draft | +| T15 | [Cross-Platform Compilation and the Target Spec](t15-cross-platform.qmd) | draft | + +--- + +## Background reading + +The tutorials refer frequently to the INTERNALS docs. 
These are not required +reading before starting, but are useful when you want to go deeper: + +- [Architecture overview](../docs/ARCHITECTURE.md) +- [`hir_def` — item tree and name resolution](../docs/hir_def/INTERNALS.md) +- [`hir_ty` — type inference](../docs/hir_ty/INTERNALS.md) +- [`hir` — public HIR API](../docs/hir/INTERNALS.md) +- [`mir` — SSA intermediate representation](../docs/mir/INTERNALS.md) +- [`mir_autodiff` — automatic differentiation](../docs/mir_autodiff/INTERNALS.md) +- [`hir_lower` — HIR → MIR lowering](../docs/hir_lower/INTERNALS.md) +- [`sim_back` — simulation model builder](../docs/sim_back/INTERNALS.md) +- [`osdi` — OSDI codegen](../docs/osdi/INTERNALS.md) +- [`mir_llvm` — LLVM backend](../docs/mir_llvm/INTERNALS.md) diff --git a/tutorials/models/.gitkeep b/tutorials/models/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/tutorials/netlists/.gitkeep b/tutorials/netlists/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/tutorials/t01-build-and-compile.qmd b/tutorials/t01-build-and-compile.qmd new file mode 100644 index 00000000..2510d615 --- /dev/null +++ b/tutorials/t01-build-and-compile.qmd @@ -0,0 +1,268 @@ +--- +title: "T1 — Build OpenVAF and Compile a Resistor" +--- + +## Goal + +Build the `openvaf-r` compiler from source and use it to compile a +temperature-dependent resistor model from Verilog-A to an OSDI `.osdi` +shared library. Verify the output exists and confirm the two OSDI symbols +(`OSDI_DESCRIPTORS`, `OSDI_DESCRIPTOR_SIZE`) are exported. + +--- + +## Prerequisites + +- A working internet connection for `cargo` to fetch dependencies. +- No prior knowledge of Rust or compilers is assumed. + +--- + +## Setup + +### 1. Install a Rust toolchain + +OpenVAF requires a recent stable Rust toolchain (1.70 or later). 
+ +**Linux / macOS:** +```bash +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +source "$HOME/.cargo/env" +``` + +**Windows (winget):** +```powershell +winget install Rustlang.Rustup +# restart the terminal so %USERPROFILE%\.cargo\bin is on PATH +``` + +Verify: +```bash +rustc --version +cargo --version +``` + +### 2. Clone the repository + +```bash +git clone https://github.com/philippevelha/OpenVAF-Reloaded.git +cd OpenVAF-Reloaded +``` + +All commands in this tutorial run from the repository root unless otherwise +stated. + +### 3. Check that LLVM is available + +`mir_llvm` links against LLVM 18. The build will fail if LLVM is not found. + +**Ubuntu / Debian:** +```bash +sudo apt install llvm-18 libclang-18-dev +``` + +**macOS (Homebrew):** +```bash +brew install llvm@18 +export LLVM_SYS_180_PREFIX=$(brew --prefix llvm@18) +``` + +**Windows:** Install the LLVM 18 pre-built binaries from + and add the `bin` directory to `PATH`. + +--- + +## Walkthrough + +### Step 1 — Build the compiler + +The CLI binary is in the `openvaf-driver` crate and is named `openvaf-r`. + +```bash +cargo build --release --package openvaf-driver +``` + +A release build takes several minutes on first run. 
On success the binary +lives at: + +| Platform | Path | +|---|---| +| Linux / macOS | `target/release/openvaf-r` | +| Windows | `target\release\openvaf-r.exe` | + +Quick sanity check: + +```bash +./target/release/openvaf-r --version +``` + +Expected output (version may differ): + +``` +openvaf 23.5.0 +``` + +### Step 2 — Look at the source model + +The tutorial uses the resistor model from the integration tests: + +```bash +cat integration_tests/RESISTOR/resistor.va +``` + +```verilog +`include "constants.vams" +`include "disciplines.vams" + +module resistor_va(A,B); + inout A, B; + electrical A,B; + + branch (A,B) br_a_b; + + (*desc= "Ohmic resistance", units = "Ohm" *) parameter real R = 0.0 from [0:inf]; + (*desc= "Temperature coeff" *) parameter real zeta = 0.0 from [-20:20]; + (*desc= "Reference Temp.", units = "Kelvin" *) parameter real tnom = 300.0 from [0:inf]; + + real res, vres; + + analog begin + vres = V(br_a_b); + res = R*pow($temperature/tnom, zeta); + I(br_a_b) <+ vres / res; + end +endmodule +``` + +Three things to notice: + +- The module is named `resistor_va` (module names become the OSDI model name). +- `V(br_a_b)` is a *potential* nature access — it reads the voltage across + branch `br_a_b`. +- `I(br_a_b) <+` is a *contribution* statement — it adds a current into the + branch. Together they implement Ohm's law. + +The `(*desc= ... units = ...*)` syntax is a Verilog-A attribute; OpenVAF +stores these as metadata in the OSDI descriptor. + +### Step 3 — Compile the model + +```bash +./target/release/openvaf-r \ + integration_tests/RESISTOR/resistor.va \ + -o resistor.osdi +``` + +When no `-o` flag is given, OpenVAF writes the output next to the input file. +Here we redirect it to the current directory for convenience. 
+ +If compilation succeeds you will see: + +``` +Finished building resistor.va in X.XXs +``` + +The output file `resistor.osdi` is a native shared library — a PE DLL on +Windows, an ELF `.so` on Linux, a Mach-O `.dylib` on macOS — renamed with the +`.osdi` extension. + +### Step 4 — Inspect the exported symbols + +A valid OSDI library must export exactly two symbols: `OSDI_DESCRIPTORS` and +`OSDI_DESCRIPTOR_SIZE`. + +**Linux:** +```bash +nm -D resistor.osdi | grep OSDI +``` + +**macOS:** +```bash +nm -gU resistor.osdi | grep OSDI +``` + +**Windows (MSVC `dumpbin`):** +```powershell +dumpbin /exports resistor.osdi | Select-String OSDI +``` + +**Windows or any platform (LLVM `llvm-nm`):** +```bash +llvm-nm --defined-only resistor.osdi | grep OSDI +``` + +Expected output (addresses will differ): + +``` +0000000000004020 D OSDI_DESCRIPTORS +0000000000004018 D OSDI_DESCRIPTOR_SIZE +``` + +`D` means the symbol is in the data section — both are global variables, not +functions. `OSDI_DESCRIPTORS` is an array of descriptor structs (one per +Verilog-A `module` in the source); `OSDI_DESCRIPTOR_SIZE` is a `u32` giving +the byte size of a single descriptor entry. + +### Step 5 — Try the optimisation levels + +OpenVAF passes the `-O` flag directly to LLVM. Compare compile times and +object sizes: + +```bash +./target/release/openvaf-r integration_tests/RESISTOR/resistor.va -o r_O0.osdi -O 0 +./target/release/openvaf-r integration_tests/RESISTOR/resistor.va -o r_O3.osdi -O 3 +``` + +On Linux: +```bash +ls -lh r_O0.osdi r_O3.osdi +``` + +The O3 file is usually smaller because DCE and inlining remove dead code. 
+ +### Step 6 — Inspect the preprocessed source + +The `--print-expansion` flag runs OpenVAF through preprocessing only and +prints the macro-expanded source: + +```bash +./target/release/openvaf-r \ + integration_tests/RESISTOR/resistor.va \ + --print-expansion +``` + +You will see the contents of `disciplines.vams` and `constants.vams` inlined, +followed by the resistor module body. All `` `define `` macros are replaced +with their values and `` `include `` directives are resolved. + +--- + +## Exercises + +1. **Change the default resistance.** Open + `integration_tests/RESISTOR/resistor.va`, change `R = 0.0` to `R = 1e3`, + recompile, and verify the output still builds. Then restore the file. + +2. **Dry run.** Add `--dry-run` to the compile command. OpenVAF runs the + entire front end (name resolution, type inference) but stops before LLVM + codegen. Note that no `.osdi` file is produced. + +3. **Dump the MIR.** Add `--dump-mir` to see the optimised MIR for the + resistor eval body. Identify the `fdiv` instruction that implements Ohm's + law (`V / R`) and the `pow` call for the temperature coefficient. + +4. **Dump the LLVM IR.** Add `--dump-ir` to see the LLVM IR for the + evaluation kernel. Compare the IR function name (it will include the + module name `resistor_va`) against what you saw in the MIR. + +5. **Try a syntax error.** Remove the `endmodule` keyword, recompile, and + read the error message. Then fix it and recompile. + +--- + +## What's next + +[T2 — Inspect the Item Tree](t02-item-tree.qmd) opens up the compiler and +shows the first internal representation built from the source: the +`ItemTree`, which summarises all top-level declarations without evaluating +any expression bodies. 
diff --git a/tutorials/t02-item-tree.qmd b/tutorials/t02-item-tree.qmd new file mode 100644 index 00000000..cca13305 --- /dev/null +++ b/tutorials/t02-item-tree.qmd @@ -0,0 +1,169 @@ +--- +title: "T2 — Inspect the Item Tree" +--- + +## Goal + +Write a small Rust test that calls `ItemTree::dump()` on the resistor model and +read the output. Identify the module declaration, port nodes, the named +branch, and the `R`, `zeta`, `tnom` parameters in the flat item tree before +any name resolution or body lowering has run. + +--- + +## Prerequisites + +- [T1](t01-build-and-compile.qmd) completed: the repository builds and + `openvaf-r` compiles `resistor.va`. +- Familiarity with `cargo test` and Rust's `#[test]` attribute. + +**Background reading:** [`hir_def` INTERNALS §3 — `ItemTree`](../docs/hir_def/INTERNALS.md) + +--- + +## Setup + +No additional tools are needed. The `hir` crate already depends on +`hir_def`, so we can write a test that calls `db.item_tree(file_id)` and +prints its dump. + +--- + +## Walkthrough + +### What is the ItemTree? + +The `ItemTree` is the first compiler-internal representation built from the +parsed concrete syntax tree. It is intentionally stripped: expression bodies +(the right-hand side of parameter defaults, variable initialisers, the +`analog begin … end` block) are thrown away. Only the structural +*declarations* — module ports, parameters, variables, branches, functions — +are kept. + +This stripping is the **incremental invalidation firewall**: if you change a +formula inside the `analog` block, the `ItemTree` is unchanged, and no query +that depends only on structure needs to re-run. + +### Writing the inspection test + +Create a new file `inspect/src/main.rs` alongside `Cargo.toml` inside the +repository root, or simply write a standalone test. The fastest approach is +to add a `[[test]]` entry to a scratch crate. 
Here is a self-contained +Rust program you can place at `scratch/src/main.rs`: + +```rust +use std::sync::Arc; +use basedb::{AbsPathBuf, BaseDB}; +use hir::CompilationDB; + +fn main() { + let path = AbsPathBuf::assert( + std::path::PathBuf::from("integration_tests/RESISTOR/resistor.va") + .canonicalize() + .unwrap(), + ); + let db = CompilationDB::new_fs(path, &[], &[], &[]).unwrap(); + let root_file = db.compilation_unit().root_file(); + let tree = db.item_tree(root_file); + println!("{}", tree.dump()); +} +``` + +To run this, add a `Cargo.toml` for the scratch crate: + +```toml +[package] +name = "scratch" +version = "0.1.0" +edition = "2021" + +[dependencies] +hir = { path = "../openvaf/hir" } +basedb = { path = "../openvaf/basedb" } +``` + +Then: + +```bash +cargo run --manifest-path scratch/Cargo.toml +``` + +### Reading the output + +The `dump()` output is structured text. For `resistor.va` it looks roughly like: + +``` +module resistor_va + ports: A, B + nodes: + node A electrical + node B electrical + branches: + branch br_a_b (A, B) + params: + param R: real default= + param zeta: real default= + param tnom: real default= + vars: + var res: real + var vres: real + analog_block: +``` + +The exact format is defined in `openvaf/hir_def/src/item_tree/pretty.rs`. + +Key observations: + +1. **No expressions.** Parameter defaults show `` placeholders — the + actual `0.0`, `0.0`, and `300.0` literals are stored in a separate `Body` + arena and are not part of the `ItemTree`. + +2. **Flat structure.** All declarations are at the same level inside the + module entry. There is no scope nesting yet; that comes in the `DefMap`. + +3. **`ItemTreeId` indices.** Each node in the dump corresponds to an + `ItemTreeId` arena index. These indices are stable as long as the + structural declarations do not change. + +### The invalidation experiment + +To see why stripping expression bodies matters: + +1. Change the default value of `R` in `resistor.va` from `0.0` to `1.0`. +2.
Run your dump again. The `ItemTree` dump is identical except for the + `` placeholder — the structural item IDs have not changed. + +This means that any Salsa query depending only on the `ItemTree` (name +resolution, the `DefMap`, all type-checking queries) would be a cache hit +after your change. Only the `body` query and its dependents need to re-run. + +--- + +## Exercises + +1. **Add a port.** Add a third port `G` to `resistor_va` (keep it + disconnected). Confirm the `ItemTree` dump now lists three ports. + +2. **Add a `function`.** Add a trivial Verilog-A `function real clamp; … + endfunction` before the `analog` block. Find where the function appears + in the `ItemTree` dump. + +3. **Compare `blocks.va`.** Run the same dump on + `openvaf/test_data/item_tree/blocks.va`. Notice how named `begin … end` + blocks are represented. + +4. **Read `item_tree.rs`.** Open + `openvaf/hir_def/src/item_tree.rs` and find the `Module` struct + (around line 150). Match its fields against the dump output. + +5. **Trigger no-reparse.** Add a comment to `resistor.va`. Which Salsa + queries would be invalidated? (Hint: comments are stripped by the lexer + before the CST is built, so the parse tree is identical.) + +--- + +## What's next + +[T3 — Read the Name-Resolution Map](t03-name-resolution.qmd) shows the +`DefMap` built from the `ItemTree`: the scope tree, the per-scope declaration +maps, and how `V` resolves to a `NatureAccess` built-in. diff --git a/tutorials/t03-name-resolution.qmd b/tutorials/t03-name-resolution.qmd new file mode 100644 index 00000000..e0da0433 --- /dev/null +++ b/tutorials/t03-name-resolution.qmd @@ -0,0 +1,133 @@ +--- +title: "T3 — Read the Name-Resolution Map" +--- + +## Goal + +Print the `DefMap` for `resistor.va` and identify the root scope, the module +child scope, and all declarations inside it. Trace how `V` resolves to a +`NatureAccess` built-in rather than a plain variable. + +--- + +## Prerequisites + +- [T2](t02-item-tree.qmd) completed. 
+- The scratch crate from T2 is available. + +**Background reading:** [`hir_def` INTERNALS §5 — `DefMap`](../docs/hir_def/INTERNALS.md) + +--- + +## Setup + +The `DefMap` has a `dump()` method in `openvaf/hir_def/src/nameres/pretty.rs`. +Call it from the same scratch crate used in T2. + +--- + +## Walkthrough + +### What is the DefMap? + +After the `ItemTree` is built, the `DefCollector` walks it and constructs a +`DefMap`: a tree of `Scope` nodes, one per lexical scope in the source. Each +`Scope` holds a map from `Name` → `ScopeDefItem`. + +`ScopeDefItem` is an enum covering everything that can be named: +`ModuleId`, `NodeId`, `VarId`, `ParamId`, `BranchId`, `FunctionId`, and +built-in categories including `BuiltIn`, `NatureAccess`, `DisciplineId`, and +`ParamSysFun`. + +For `resistor.va` the scope tree has two levels: + +``` +root scope + └─ module scope (resistor_va) + A → NodeId(0) + B → NodeId(1) + br_a_b → BranchId(0) + R → ParamId(0) + zeta → ParamId(1) + tnom → ParamId(2) + res → VarId(0) + vres → VarId(1) + V → NatureAccess(potential_id) + I → NatureAccess(flow_id) + $temperature → ParamSysFun(Temperature) + … (all other built-ins from BUILTIN_SCOPE) +``` + +### Printing the DefMap + +Extend the scratch program: + +```rust +use hir_def::db::HirDefDB; + +let def_map = db.def_map(root_file); +println!("{}", def_map.dump(&db)); +``` + +The `dump()` method is defined in `nameres/pretty.rs`. + +### Why does `V` resolve to `NatureAccess`? + +In the source `V(br_a_b)` looks like a function call on an identifier `V`. +But `V` is not a user-defined function — it is the *access function* for the +potential nature of the `electrical` discipline. + +The `DefMap` inserts `NatureAccess` entries for every discipline's access +functions when it initialises the built-in scope (`BUILTIN_SCOPE` in +`builtin.rs`). 
The `electrical` discipline (from `disciplines.vams`) has: + +- Potential nature `Voltage` with access function `V` +- Flow nature `Current` with access function `I` + +So in the module scope, `V` → `NatureAccess(potential_id)` and +`I` → `NatureAccess(flow_id)`. Name resolution stores this in +`resolved_calls` — not in `expr_types` — which is why the type-inference +stage still needs to do further work to determine whether a given access is +potential or flow. (That work happens in T6.) + +### The BUILTIN_SCOPE + +`BUILTIN_SCOPE` is a `Lazy>` in +`openvaf/hir_def/src/builtin.rs`. It is built once and shared across all +compilations. It contains: + +- All `BuiltIn` variants (`$display`, `$finish`, `abs`, `min`, `max`, …) +- All `ParamSysFun` variants (`$temperature`, `$vt`, `$abstime`, …) +- `NatureAccess` for every nature defined in the standard library + +The module scope's lookup chain is: module declarations → parent scope +(root/file) → `BUILTIN_SCOPE`. If a name is not found in the module +declarations, the search continues outward. + +--- + +## Exercises + +1. **Find a built-in.** Look up `abs` in the DefMap dump. What `ScopeDefItem` + variant does it resolve to? + +2. **Add a local variable that shadows a built-in.** Add `real V;` to the + resistor module. Does `V` still resolve to `NatureAccess`? What does the + compiler emit? + +3. **Trace a `$temperature` reference.** In the dump, find + `$temperature`. Which `ScopeDefItem` variant is it? Read the + `ParamSysFun` enum in `hir_def/src/builtin.rs` to see all system + parameters. + +4. **Block scope.** Wrap the `analog begin` body in a named block + (`begin : my_block … end`). Run the DefMap dump again and find the + new child scope. + +--- + +## What's next + +[T4 — Examine the Lowered Body](t04-lowered-body.qmd) descends into the +expression arenas: the `Body` that stores every `Expr` and `Stmt` for the +analog block, still with unresolved path strings, before type inference runs. 
diff --git a/tutorials/t04-lowered-body.qmd b/tutorials/t04-lowered-body.qmd new file mode 100644 index 00000000..aa518c17 --- /dev/null +++ b/tutorials/t04-lowered-body.qmd @@ -0,0 +1,142 @@ +--- +title: "T4 — Examine the Lowered Body" +--- + +## Goal + +Print the `Body` for the resistor analog block and read the `entry_stmts`, +`stmts`, and `exprs` arenas. Match each `ExprId` to the corresponding +fragment of `I(br_a_b) <+ vres / res` and confirm that paths are still +unresolved name strings at this stage. + +--- + +## Prerequisites + +- [T3](t03-name-resolution.qmd) completed. + +**Background reading:** [`hir_def` INTERNALS §7 — `DefWithBodyId` and `Body`](../docs/hir_def/INTERNALS.md) + +--- + +## Setup + +The `Body` type has a `pretty` module at +`openvaf/hir_def/src/body/pretty.rs` with a `dump()` method. Use the scratch +crate from T2. + +--- + +## Walkthrough + +### What is the Body? + +The `Body` holds the expression and statement arenas for a single *definition +with a body*: the analog block, the analog initial block, a function body, +a parameter default expression, or a variable initialiser. + +```rust +pub struct Body { + pub exprs: Arena, + pub stmts: Arena, + pub entry_stmts: Vec, + pub stmt_scopes: ArenaMap, +} +``` + +Crucially, paths in `Expr::Path { name, .. }` are raw `Name` strings. They +have **not** been resolved to `VarId`, `ParamId`, or `BranchId` yet — that +resolution happens in type inference (`hir_ty`), not here. + +### Printing the Body + +```rust +use hir_def::{DefWithBodyId, db::HirDefDB}; +use hir::Module; + +let modules = db.compilation_unit().modules(&db); +let module = modules[0]; // resistor_va +let body_id = DefWithBodyId::ModuleId { initial: false, module: module.id }; +let body = db.body(body_id); +println!("{}", body.dump()); +``` + +### Reading the output + +For the resistor analog block: + +``` +entry_stmts: [s0, s1, s2] + +stmts: + s0: Expr(e0) -- vres = V(br_a_b) → assignment + s1: Expr(e1) -- res = R * pow(...) 
→ assignment + s2: Expr(e2) -- I(br_a_b) <+ vres/res → assignment + +exprs: + e0: Assignment { lhs: Path("vres"), rhs: e3 } + e3: Call { callee: Path("V"), args: [e4] } + e4: Path("br_a_b") + + e1: Assignment { lhs: Path("res"), rhs: e5 } + e5: BinaryOp(Mul) { lhs: e6, rhs: e7 } + e6: Path("R") + e7: Call { callee: Path("pow"), args: [e8, e9] } + e8: BinaryOp(Div) { lhs: Path("$temperature"), rhs: Path("tnom") } + e9: Path("zeta") + + e2: Assignment { lhs: Path("I") with args [Path("br_a_b")], + rhs: e10 } + e10: BinaryOp(Div) { lhs: Path("vres"), rhs: Path("res") } +``` + +Every identifier — `vres`, `V`, `br_a_b`, `R`, `pow`, `$temperature`, +`tnom`, `zeta`, `I`, `res` — is a raw `Path` string. At this stage the +compiler has no idea whether `R` is a parameter, a variable, or a typo. + +### The contribution statement + +`I(br_a_b) <+` is lowered into the `Body` as a plain `Stmt::Assignment`. +There is no distinction between a variable assignment (`=`) and a +contribution (`<+`) in the `Body` — they both become `Stmt::Assignment`. +The distinction is made later by `hir_ty`'s type inference, which determines +the `assignment_destination` from the type of the left-hand side. This is +one of the most important things to remember about the `Body`: **it records +syntax, not semantics**. + +### The `entry_stmts` field + +`entry_stmts` is the list of `StmtId` values at the top level of the body. +For the resistor, the analog block has three sequential statements. For a +module with control flow (`if`, `for`, `while`, `case`), the tree structure +is encoded in the `Stmt` variants, which contain child `StmtId` references. + +--- + +## Exercises + +1. **Count the expressions.** How many `ExprId` entries does the resistor + analog block have in total? Which sub-expressions account for the most + entries? + +2. **Find the parameter default bodies.** The parameters `R`, `zeta`, and + `tnom` each have a default expression. 
Dump the bodies for + `DefWithBodyId::ParamId(param_id)` for each and confirm the literal + values (`0.0`, `0.0`, `300.0`) appear there. + +3. **Add an `if` statement.** Add `if (res < 1e-9) res = 1e-9;` to the + analog block. Dump the body again. Identify the `Stmt::If` node and its + condition, then-branch, and else-branch `StmtId`s. + +4. **Inspect `stmt_scopes`.** The `stmt_scopes` map records which scope each + statement lives in. Dump it. For the resistor with no named blocks, all + statements should map to the same scope ID. + +--- + +## What's next + +[T5 — Trace a Contribution Through the MIR](t05-mir-contribution.qmd) skips +ahead to the end of the pipeline and shows the SSA MIR produced after HIR +lowering and optimisation — where `vres / res` finally becomes a concrete +`fdiv` instruction and the contribution is represented as a `PlaceKind::Contribute` output. diff --git a/tutorials/t05-mir-contribution.qmd b/tutorials/t05-mir-contribution.qmd new file mode 100644 index 00000000..15a22632 --- /dev/null +++ b/tutorials/t05-mir-contribution.qmd @@ -0,0 +1,140 @@ +--- +title: "T5 — Trace a Contribution Through the MIR" +--- + +## Goal + +Use `--dump-mir` and `--dump-unopt-mir` to print the MIR `Function` produced +for the resistor evaluation body. Identify the `fdiv` instruction that +implements Ohm's law, the `pow` call for the temperature coefficient, and the +`optbarrier` marking the contribution output. + +--- + +## Prerequisites + +- [T4](t04-lowered-body.qmd) completed. + +**Background reading:** [`mir` INTERNALS](../docs/mir/INTERNALS.md) | +[`hir_lower` INTERNALS](../docs/hir_lower/INTERNALS.md) + +--- + +## Setup + +No additional tools needed. This tutorial uses flags built into `openvaf-r`. 
+ +--- + +## Walkthrough + +### Step 1 — Dump the unoptimised MIR + +```bash +./target/release/openvaf-r \ + integration_tests/RESISTOR/resistor.va \ + --dump-unopt-mir \ + --dry-run +``` + +The `--dry-run` flag stops before LLVM codegen so no `.osdi` file is written. +The `--dump-unopt-mir` flag prints the MIR immediately after HIR lowering, +before any `mir_opt` passes run. + +### Step 2 — Dump the optimised MIR + +```bash +./target/release/openvaf-r \ + integration_tests/RESISTOR/resistor.va \ + --dump-mir \ + --dry-run +``` + +### Step 3 — Reading the MIR + +The MIR uses an SSA text format similar to Cranelift or LLVM IR. Each +function starts with its parameter list, then lists blocks. + +For the resistor you will see approximately: + +``` +function %(v0, v1, v2, v3, v4) { + block0: + v5 = fdiv v0, v1 + v6 = fdiv v3, v4 + v7 = pow v6, v2 + v8 = fmul v1, v7 // res = R * pow($temperature/tnom, zeta) + v9 = fdiv v0, v8 // vres / res + v10 = optbarrier v9 +} +``` + +The function parameters `v0 … v4` correspond to `ParamKind` entries in the +`HirInterner`: + +| Value | `ParamKind` | HIR entity | +|---|---|---| +| `v0` | `Voltage { hi: A, lo: B }` | `V(br_a_b)` / `vres` | +| `v1` | `Param(R)` | parameter `R` | +| `v2` | `Param(zeta)` | parameter `zeta` | +| `v3` | `Temperature` | `$temperature` | +| `v4` | `Param(tnom)` | parameter `tnom` | + +The `optbarrier` instruction at the end marks the contribution output. It is +semantically a no-op — the value passes through unchanged — but it prevents +LLVM from moving the computation across the output boundary during +optimisation. + +### Step 4 — Compare with the snapshot + +The test snapshot for the resistor MIR is stored at: +``` +openvaf/test_data/dae/resistor_va_mir.snap +``` + +Compare it to your `--dump-mir` output. The snapshot was generated from the +same source, so the instructions should match (modulo `Value` numbering). 
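To make the approximate `block0` listing from Step 3 easier to follow, it can be transcribed instruction by instruction into ordinary Rust. This is only a reading aid under stated assumptions: `eval_block0` is a hypothetical name, `optbarrier` is modelled as the identity, and the operand order simply mirrors the approximate listing and the `ParamKind` table above rather than real compiler output.

```rust
// Illustrative transcription of the approximate MIR listing; not OpenVAF code.
// `eval_block0` is a hypothetical name introduced for this sketch.
//
// Parameters follow the ParamKind table:
//   v0 = V(br_a_b), v1 = R, v2 = zeta, v3 = $temperature, v4 = tnom
fn eval_block0(v0: f64, v1: f64, v2: f64, v3: f64, v4: f64) -> f64 {
    let _v5 = v0 / v1; // v5 = fdiv v0, v1 (not used by any later instruction in the listing)
    let v6 = v3 / v4; // v6 = fdiv v3, v4   -> $temperature / tnom
    let v7 = v6.powf(v2); // v7 = pow v6, v2    -> ($temperature / tnom)^zeta
    let v8 = v1 * v7; // v8 = fmul v1, v7   -> res
    let v9 = v0 / v8; // v9 = fdiv v0, v8   -> vres / res
    let v10 = v9; // v10 = optbarrier v9 -> value passes through unchanged
    v10
}

fn main() {
    // 1 V across a 1 kOhm resistor, zeta = 1, 330 K against tnom = 300 K.
    let current = eval_block0(1.0, 1000.0, 1.0, 330.0, 300.0);
    println!("I(br_a_b) = {current} A");
}
```

With `zeta = 0` the `pow` factor is 1 and the function reduces to plain `V / R`, which is a quick way to sanity-check the transcription.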
+ +### Step 5 — What the `PlaceKind` table says + +The `HirInterner::outputs` map records where the MIR function writes its +results. For the resistor evaluation kernel, the one output is: + +``` +PlaceKind::Contribute { dst: BranchWrite::Named(br_a_b), reactive: false, voltage_src: false } + → v10 (the optbarrier-wrapped fdiv result) +``` + +This entry tells `sim_back` that `v10` is the resistive current contribution +to branch `br_a_b`, not a reactive (capacitive) contribution and not a voltage +source. The `DaeSystem` uses this to set up the Jacobian entry +`∂I(A,B)/∂V(A,B)`. + +--- + +## Exercises + +1. **Before vs after opt.** Count the number of instructions in the + unoptimised vs optimised MIR. How many were eliminated? (SCCP typically + folds away the `vres` intermediate variable.) + +2. **Set `zeta = 0`.** Change the default `zeta = 0.0` to a literal `0` and + recompile with `--dump-mir`. Does SCCP fold the `pow` call away? + (Hint: `pow(x, 0) = 1`.) + +3. **Find the `fdiv` Jacobian.** Add `--dump-mir` to a full compile (without + `--dry-run`). The second MIR function printed is the Jacobian kernel. + Find the `fdiv` instruction that computes `∂(I)/∂(V) = 1/R_t`. + +4. **Read the `HirInterner` dump.** Add `--dump-mir` and look for the + "Evaluation HIR interner" section. Match each `Param` index to the + `ParamKind` in the table above. + +--- + +## What's next + +[T6 — Follow Type Inference for a Simple Model](t06-type-inference.qmd) +goes back inside the compiler and shows the `InferenceResult` produced by +`hir_ty`: specifically the `resolved_calls` entry for `V(br_a_b)` and the +`assignment_destination` entry for the `<+` statement.
diff --git a/tutorials/t06-type-inference.qmd b/tutorials/t06-type-inference.qmd new file mode 100644 index 00000000..80b994d5 --- /dev/null +++ b/tutorials/t06-type-inference.qmd @@ -0,0 +1,112 @@ +--- +title: "T6 — Follow Type Inference for a Simple Model" +--- + +## Goal + +Print the `InferenceResult` for the resistor analog block and confirm that +`resolved_calls` maps the `V(br_a_b)` call expression to +`ResolvedFun::BuiltIn(BuiltIn::potential)` and the `I(br_a_b) <+` statement's +`assignment_destination` to `AssignDst::Flow(BranchWrite::Named(br_a_b))`. + +--- + +## Prerequisites + +- [T4](t04-lowered-body.qmd) completed. + +**Background reading:** [`hir_ty` INTERNALS §7–§10](../docs/hir_ty/INTERNALS.md) + +--- + +## Setup + +`InferenceResult` does not have a `dump()` method. Use `{:#?}` Debug +formatting in the scratch crate: + +```rust +use hir_ty::db::HirTyDB; +use hir_def::DefWithBodyId; + +let result = db.inference_result(body_id); +println!("{:#?}", result); +``` + +--- + +## Walkthrough + +### The five maps + +`InferenceResult` holds five maps. For the resistor, focus on three: + +**`expr_types`** — maps each `ExprId` to a `Ty`. The path expression `R` +maps to `Ty::Param(Type::Real, ParamId(0))`. The path expression `br_a_b` +maps to `Ty::Branch(BranchId(0))`. + +**`resolved_calls`** — maps call expressions to `ResolvedFun`. The +expression `V(br_a_b)` maps to `ResolvedFun::BuiltIn(BuiltIn::potential)`. +The expression `I(br_a_b)` maps to `ResolvedFun::BuiltIn(BuiltIn::flow)`. +This is the resolution of the nature access: the compiler determines whether +`V` means potential or flow by inspecting the discipline of the argument +branch. + +**`assignment_destination`** — maps `StmtId` to `AssignDst`. The statement +`I(br_a_b) <+ vres/res` maps to +`AssignDst::Flow(BranchWrite::Named(BranchId(0)))`. The statement +`vres = V(br_a_b)` maps to `AssignDst::Var(VarId(0))`. 
+ +### Nature access resolution in detail + +When the inference engine encounters `V(br_a_b)`: + +1. `V` was resolved by the `DefMap` to `NatureAccess(potential_id)`. +2. The argument `br_a_b` has type `Ty::Branch(BranchId(0))`. +3. `hir_ty::lower::branch_info(BranchId(0))` returns a `BranchTy` whose + `discipline` is `DisciplineId` for `electrical`. +4. `DisciplineTy::access(NatureAccess::Potential)` returns `BuiltIn::potential`. +5. `resolved_calls[V_expr] = ResolvedFun::BuiltIn(BuiltIn::potential)`. + +The same process for `I(br_a_b)` with `NatureAccess::Flow` yields +`BuiltIn::flow`. + +### How `<+` becomes `AssignDst::Flow` + +The contribution statement `I(br_a_b) <+` is a `Stmt::Assignment` in the +`Body`. During inference, `infere_stmt` checks the left-hand side: + +- The call `I(br_a_b)` resolves (via `resolved_calls`) to `BuiltIn::flow`. +- A `<+` assignment to a `flow` access → `AssignDst::Flow(branch_write)`. +- The `branch_write` is `BranchWrite::Named(BranchId(0))`. + +This is stored in `assignment_destination[stmt_id]` and later read by +`hir::BodyRef::get_stmt()` to produce `Stmt::Contribute { kind: Flow, … }`. + +--- + +## Exercises + +1. **Confirm `resolved_signatures`.** Find the `resolved_signatures` entry + for the `V(br_a_b)` call. It should be `NATURE_ACCESS_BRANCH` (defined + in `hir_ty::builtin`). Look up `NATURE_ACCESS_BRANCH` in the source to + see what overload it represents. + +2. **Check `casts`.** The `casts` map is empty for the resistor. Add a + parameter `integer N = 1;` and use it in an expression that requires an + `integer → real` cast. Does the `casts` map now have an entry? + +3. **Inspect `$temperature`.** Find the `ExprId` for `$temperature` in the + body. What `Ty` does it have in `expr_types`? What does `resolved_calls` + say about it? + +4. **What happens with `V(A,B)`?** Change `V(br_a_b)` to `V(A,B)` (using + nodes directly instead of a named branch). Dump the `InferenceResult` again. 
+ How does the `BranchWrite` change in `assignment_destination`? + +--- + +## What's next + +[T7 — Understand the Automatic Differentiation Pass](t07-autodiff.qmd) +shows what `mir_autodiff` adds to the optimised evaluation MIR: derivative +instructions for the Jacobian columns needed by the simulator. diff --git a/tutorials/t07-autodiff.qmd b/tutorials/t07-autodiff.qmd new file mode 100644 index 00000000..79412289 --- /dev/null +++ b/tutorials/t07-autodiff.qmd @@ -0,0 +1,107 @@ +--- +title: "T7 — Understand the Automatic Differentiation Pass" +--- + +## Goal + +Compare the MIR before and after `mir_autodiff` for the resistor evaluation +kernel. Identify the Jacobian seed (`∂V(A,B)/∂V(A,B) = 1`), the quotient-rule +expansion of `vres/res`, and the four `MatrixEntry` values produced in the +`DaeSystem`. + +--- + +## Prerequisites + +- [T5](t05-mir-contribution.qmd) completed; you can read MIR text format. +- [T6](t06-type-inference.qmd) recommended; understanding `resolved_calls` + helps interpret which values correspond to which unknowns. + +**Background reading:** [`mir_autodiff` INTERNALS](../docs/mir_autodiff/INTERNALS.md) | +[`sim_back` INTERNALS §DAE system](../docs/sim_back/INTERNALS.md) + +--- + +## Walkthrough + +### Why derivatives? + +Circuit simulators (ngspice, Spectre, Qucs-S) solve systems of nonlinear +differential-algebraic equations (DAEs) using Newton–Raphson iteration. Each +Newton step requires the Jacobian: the matrix of partial derivatives of the +residual currents with respect to the circuit node voltages. + +For the resistor the DAE residual at node `A` is `f(V_A, V_B) = (V_A−V_B)/R_t`. +The Jacobian entries are: + +| Entry | Formula | Value | +|---|---|---| +| `∂f_A/∂V_A` | `+1/R_t` | resistive, positive | +| `∂f_A/∂V_B` | `−1/R_t` | resistive, negative | +| `∂f_B/∂V_A` | `−1/R_t` | (KCL symmetry) | +| `∂f_B/∂V_B` | `+1/R_t` | resistive, positive | + +`mir_autodiff` computes these by source transformation on the MIR. 
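The four entries in the table can be checked numerically with a small forward-mode sketch. Note the assumptions: this standalone program overloads operators on dual numbers, whereas `mir_autodiff` transforms MIR instructions, so it illustrates the idea and not the pass itself. `Dual`, `branch_current`, and `jacobian` are hypothetical names, and `R_t` is treated as a plain constant.

```rust
// Illustrative only: forward-mode AD with dual numbers, not OpenVAF code.
// `Dual`, `branch_current` and `jacobian` are hypothetical names.
use std::ops::{Div, Sub};

/// A value paired with its derivative with respect to one chosen unknown.
#[derive(Clone, Copy, Debug)]
struct Dual {
    val: f64,
    der: f64,
}

impl Dual {
    /// A variable; `seed` is 1.0 when it is the unknown being differentiated.
    fn var(val: f64, seed: f64) -> Self {
        Dual { val, der: seed }
    }

    /// A quantity that does not depend on any unknown.
    fn constant(val: f64) -> Self {
        Dual { val, der: 0.0 }
    }
}

impl Sub for Dual {
    type Output = Dual;
    fn sub(self, rhs: Dual) -> Dual {
        Dual { val: self.val - rhs.val, der: self.der - rhs.der }
    }
}

impl Div for Dual {
    type Output = Dual;
    fn div(self, rhs: Dual) -> Dual {
        // Quotient rule: (u / v)' = (u' v - u v') / v^2
        Dual {
            val: self.val / rhs.val,
            der: (self.der * rhs.val - self.val * rhs.der) / (rhs.val * rhs.val),
        }
    }
}

/// Current flowing from A to B: (V_A - V_B) / R_t.
fn branch_current(va: Dual, vb: Dual, r_t: Dual) -> Dual {
    (va - vb) / r_t
}

/// Rows are the residuals (f_A, f_B); columns are the unknowns (V_A, V_B).
fn jacobian(va: f64, vb: f64, r_t: f64) -> [[f64; 2]; 2] {
    let mut jac = [[0.0; 2]; 2];
    for col in 0..2 {
        // Seed exactly one unknown per sweep, as forward mode requires.
        let seed_a = if col == 0 { 1.0 } else { 0.0 };
        let seed_b = if col == 1 { 1.0 } else { 0.0 };
        let i = branch_current(
            Dual::var(va, seed_a),
            Dual::var(vb, seed_b),
            Dual::constant(r_t),
        );
        jac[0][col] = i.der; // the current leaves node A
        jac[1][col] = -i.der; // and enters node B (KCL)
    }
    jac
}

fn main() {
    let jac = jacobian(1.0, 0.25, 1000.0);
    println!("{:?}", jac);
}
```

Each sweep seeds one unknown with 1 and the other with 0, so two sweeps fill the two columns. The sign pattern of the result should match the `+1/R_t`, `−1/R_t` layout of the table.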
+ +### Using --dump-mir to see the AD output + +```bash +./target/release/openvaf-r \ + integration_tests/RESISTOR/resistor.va \ + --dump-mir +``` + +With `--dump-mir` (without `--dry-run`) the MIR is printed after AD has run. +You will see the evaluation kernel function followed by additional +instructions added by AD. + +### Reading the derivative instructions + +The AD pass works in *forward mode*: each derivative is computed next to its primal instruction. +For `v9 = fdiv v0, v8`: + +- `∂(v9)/∂(v0) = 1/v8` → `fdiv fconst(1.0), v8` +- `∂(v9)/∂(v8) = −v0/v8²` → `fneg (fdiv v0 (fmul v8 v8))` + +These appear in the MIR as new `Value` definitions after the original +`fdiv` instruction. + +### Reading the DaeSystem snapshot + +Compare your MIR output with the test snapshot: +``` +openvaf/test_data/dae/resistor_va_system.snap +``` + +The snapshot records the four `MatrixEntry` values (`j0` through `j3`) with +their `resist` MIR `Value` references. Match each one to the derivative +instruction you identified above. + +--- + +## Exercises + +1. **Count derivative instructions.** How many new instructions does AD add + for the resistor? Compare before (`--dump-unopt-mir`) and after (`--dump-mir`). + +2. **Constant Jacobian.** Because `R_t` does not depend on the circuit + voltages, the Jacobian entries are constants (once `R`, `zeta`, `tnom`, + and `$temperature` are known). Does the OSDI descriptor flag these + entries as `JACOBIAN_ENTRY_RESIST_CONST`? Check `resistor.snap`. + +3. **Read `mir_autodiff` entry point.** Open + `openvaf/mir_autodiff/src/lib.rs` and find the `auto_diff` function + signature. What does the `derivatives` parameter tell AD about which + outputs to differentiate? + +4. **Verify the `KnownDerivatives`.** Add a `println!("{:#?}", known_derivs)` + in your scratch crate (calling `HirInterner::unknowns(func, true)`) to + see which MIR `Value`s AD will treat as unknowns.
+ +--- + +## What's next + +[T8 — Add a Simple Nonlinearity (Diode I–V)](t08-diode-nonlinearity.qmd) +introduces a model with an exponential I–V curve, which produces a +voltage-dependent Jacobian and lets you see how AD handles `exp`. diff --git a/tutorials/t08-diode-nonlinearity.qmd b/tutorials/t08-diode-nonlinearity.qmd new file mode 100644 index 00000000..7d527f74 --- /dev/null +++ b/tutorials/t08-diode-nonlinearity.qmd @@ -0,0 +1,115 @@ +--- +title: "T8 — Add a Simple Nonlinearity (Diode I–V)" +--- + +## Goal + +Write a minimal exponential diode model, compile it, and compare the MIR +before and after `mir_opt`. Observe how constant propagation and DCE reduce +the instruction count, and how the Jacobian for `exp(V/Vt)` differs from the +linear resistor Jacobian. + +--- + +## Prerequisites + +- [T7](t07-autodiff.qmd) completed. + +--- + +## Setup + +Create a new file `tutorials/models/diode_simple.va`: + +```verilog +`include "disciplines.vams" +`include "constants.vams" + +module diode_simple(A, C); + inout A, C; + electrical A, C; + + (*desc="Saturation current", units="A"*) parameter real IS = 1e-14 from (0:inf); + (*desc="Emission coefficient"*) parameter real N = 1.0 from (0:inf); + + real vd, id; + + analog begin + vd = V(A, C); + id = IS * (exp(vd / (N * $vt)) - 1.0); + I(A, C) <+ id; + end +endmodule +``` + +`$vt` is the thermal voltage `kT/q` (a system parameter, approximately +25.85 mV at 300 K). + +--- + +## Walkthrough + +### Step 1 — Compile and dump MIR + +```bash +./target/release/openvaf-r tutorials/models/diode_simple.va \ + --dump-unopt-mir --dump-mir --dry-run +``` + +### Step 2 — Count instructions before and after opt + +Paste the two MIR listings side-by-side. `mir_opt` typically: + +- Folds `N * $vt` into a single `fmul` (if `N` is constant). +- Eliminates the intermediate variable `vd` (replaced by its defining + expression directly into the `exp` argument). +- Removes dead branches left by SCCP. 
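
Before looking for the `exp` derivative in the MIR, it can help to have reference numbers for the model defined in the Setup section. The sketch below evaluates the thermal voltage, the diode current, and its analytic derivative with respect to `vd`. The function names are made up for this example, and the physical constants are the SI values rather than anything read from `constants.vams`.

```rust
/// Boltzmann constant in J/K and elementary charge in C (exact SI values).
const BOLTZMANN: f64 = 1.380649e-23;
const ELEMENTARY_CHARGE: f64 = 1.602176634e-19;

/// Thermal voltage kT/q in volts, the quantity the tutorial calls `$vt`.
fn thermal_voltage(temp_kelvin: f64) -> f64 {
    BOLTZMANN * temp_kelvin / ELEMENTARY_CHARGE
}

/// id = IS * (exp(vd / (N * vt)) - 1), as in `diode_simple.va`.
fn diode_current(vd: f64, i_sat: f64, n: f64, vt: f64) -> f64 {
    i_sat * ((vd / (n * vt)).exp() - 1.0)
}

/// Analytic derivative d(id)/d(vd) = IS / (N * vt) * exp(vd / (N * vt)).
fn diode_conductance(vd: f64, i_sat: f64, n: f64, vt: f64) -> f64 {
    i_sat / (n * vt) * (vd / (n * vt)).exp()
}
```

Comparing `diode_conductance` against a central finite difference of `diode_current` is a quick way to convince yourself which expression the derivative instructions should compute.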
+ +### Step 3 — Find the Jacobian for `exp` + +The derivative of `IS * (exp(vd/(N*Vt)) − 1)` with respect to `vd` is +`IS/(N*Vt) * exp(vd/(N*Vt))`. In the MIR this appears as a `fmul` of the `exp` +result with `1/(N*Vt)` (or equivalently a `fdiv` by `N*Vt`), scaled by `IS`. + +Because this Jacobian depends on `vd` (and therefore on the circuit voltage), +the OSDI descriptor will **not** mark it as `JACOBIAN_ENTRY_RESIST_CONST`. + +### Step 4 — Compare with the full diode model + +The integration test model at +`openvaf/test_data/osdi/diode_lim.va` is a more complete diode with +self-heating. Compile it and compare its OSDI descriptor snap +(`openvaf/test_data/osdi/diode_lim.snap` if it exists) with the minimal +model you just built. + +--- + +## Exercises + +1. **Measure instruction counts.** Count instructions in the unoptimised vs + optimised MIR. Which pass removes the most? + +2. **Add a shunt resistance.** Add `parameter real RS = 0.0;` and a resistive + term `I(A,C) <+ V(A,C) / RS;` (guarded by `if (RS > 0)`), in parallel with the diode. + How does the MIR change? + +3. **`$vt` vs explicit formula.** Replace `$vt` with + `\`P_K * $temperature / \`P_Q` (using the constants from `constants.vams`). + Do you get the same compiled output? + +4. **Inspect `resolved_calls` for `exp`.** In your scratch crate, dump the + `InferenceResult` for this diode. What does `resolved_calls` say about + the `exp` call? + +5. **Observe the `optbarrier`.** In the optimised MIR, the output `id` is + wrapped in `optbarrier`. Why? Remove it mentally — what could LLVM do + that would break the simulation? + +--- + +## What's next + +[T9 — Inspect the OSDI Descriptor](t09-osdi-descriptor.qmd) reads the +compiled `.osdi` file directly using a small C or Rust harness to print the +node names, parameter layout, Jacobian flags, and evaluation function +pointers from `OSDI_DESCRIPTORS`.
diff --git a/tutorials/t09-osdi-descriptor.qmd b/tutorials/t09-osdi-descriptor.qmd new file mode 100644 index 00000000..f06ea902 --- /dev/null +++ b/tutorials/t09-osdi-descriptor.qmd @@ -0,0 +1,124 @@ +--- +title: "T9 — Inspect the OSDI Descriptor" +--- + +## Goal + +Load the compiled `resistor.osdi` from T1 into a small Rust harness that +`dlopen`s the library and reads `OSDI_DESCRIPTORS`. Print the node names, +parameter names, Jacobian entry flags, and `OSDI_VERSION`. Match each field +against the 46-field `OsdiDescriptor` struct documented in +[`osdi` INTERNALS](../docs/osdi/INTERNALS.md). + +--- + +## Prerequisites + +- [T1](t01-build-and-compile.qmd) completed: `resistor.osdi` is built. +- [T7](t07-autodiff.qmd) recommended: understanding Jacobian entries. + +**Background reading:** [`osdi` INTERNALS](../docs/osdi/INTERNALS.md) + +--- + +## Setup + +Add `libloading` to your scratch crate: + +```toml +[dependencies] +libloading = "0.8" +``` + +--- + +## Walkthrough + +### The OSDI 0.4 ABI + +When a simulator `dlopen`s an `.osdi` file it looks for two symbols: + +- **`OSDI_DESCRIPTORS`** — a `[OsdiDescriptor; N]` array, one entry per + Verilog-A `module` in the source. +- **`OSDI_DESCRIPTOR_SIZE`** — `u32` byte size of a single `OsdiDescriptor`. + +Because OpenVAF compiles `resistor.va` (one module `resistor_va`), there +is one entry. + +The layout of `OsdiDescriptor` is defined by the OSDI 0.4 specification and +mirrored in the LLVM struct `OsdiTys.osdi_descriptor` that OpenVAF builds from +the per-target stdlib bitcode. + +> **TODO(verify):** The exact field offsets below should be confirmed against +> `openvaf/osdi/src/lib.rs` once the descriptor emission code is read +> carefully. 
+ +### A minimal Rust reader + +```rust +use libloading::{Library, Symbol}; + +fn main() { + let lib = unsafe { Library::new("./resistor.osdi") }.unwrap(); + + // Read OSDI_DESCRIPTOR_SIZE (the symbol address points at a u32) + let size: Symbol<*const u32> = unsafe { lib.get(b"OSDI_DESCRIPTOR_SIZE\0") }.unwrap(); + let descriptor_size = unsafe { **size }; + println!("Descriptor size: {descriptor_size} bytes"); + + // View the first descriptor in OSDI_DESCRIPTORS as raw bytes + let descriptors: Symbol<*const u8> = unsafe { lib.get(b"OSDI_DESCRIPTORS\0") }.unwrap(); + let raw = unsafe { std::slice::from_raw_parts(*descriptors, descriptor_size as usize) }; + println!("First {} bytes of descriptor: {:02x?}", raw.len().min(32), &raw[..raw.len().min(32)]); +} +``` + +For a fully typed reader you would define a Rust struct that mirrors +`OsdiDescriptor` exactly. The OSDI 0.4 spec is the reference. + +### Comparing against the snapshot + +The snapshot at `openvaf/test_data/osdi/resistor.snap` shows what OpenVAF's +own test suite expects from the resistor descriptor: + +``` +param "R" units = "Ohm", desc = "Ohmic resistance", flags = ParameterFlags(0x0) +param "zeta" ... +param "tnom" ... +2 terminals +node "A" ... +node "B" ... +jacobian (A, A) JacobianFlags(JACOBIAN_ENTRY_RESIST | JACOBIAN_ENTRY_REACT_CONST) +... +``` + +Match each line against the OSDI descriptor struct fields printed by your +harness. + +--- + +## Exercises + +1. **Count parameters.** The resistor has 3 model parameters plus an implicit + `$mfactor` instance parameter. Confirm your reader finds 4 parameter + entries in the descriptor. + +2. **Interpret `JacobianFlags`.** `JACOBIAN_ENTRY_RESIST` means the entry is + in the resistive (DC/AC) Jacobian. `JACOBIAN_ENTRY_REACT_CONST` means the + reactive part is constant (zero for this model). Look up the flag values + in the OSDI 0.4 specification. + +3. **Read the diode descriptor.** Compile `tutorials/models/diode_simple.va` + from T8 and read its descriptor.
Confirm the Jacobian entry is **not** + `RESIST_CONST` (because it depends on the operating point). + +4. **Print the function pointer.** The descriptor contains a pointer to the + evaluation function. Print its numeric address. Confirm it is non-null. + +--- + +## What's next + +[T10 — Use ngspice to Run a DC Sweep](t10-ngspice-simulation.qmd) loads +the compiled resistor into ngspice using the OSDI interface and runs a DC +sweep to verify the resistance value. diff --git a/tutorials/t10-ngspice-simulation.qmd b/tutorials/t10-ngspice-simulation.qmd new file mode 100644 index 00000000..ca7ed69a --- /dev/null +++ b/tutorials/t10-ngspice-simulation.qmd @@ -0,0 +1,122 @@ +--- +title: "T10 — Use ngspice to Run a DC Sweep" +--- + +## Goal + +Load `resistor.osdi` in ngspice using the OSDI device interface. Write a +simple netlist, run a DC sweep from 0 V to 1 V, and verify that the simulated +current matches `I = V / R` with the default parameter value. + +--- + +## Prerequisites + +- [T1](t01-build-and-compile.qmd) completed: `resistor.osdi` built. +- ngspice ≥ 40 with OSDI support compiled in (`--enable-osdi`).
+ +--- + +## Setup + +### Install ngspice with OSDI support + +**Ubuntu / Debian (build from source):** +```bash +sudo apt install build-essential automake libtool libreadline-dev +git clone https://git.code.sf.net/p/ngspice/ngspice +cd ngspice +./autogen.sh +./configure --with-x --enable-osdi --enable-xspice +make -j$(nproc) +sudo make install +``` + +**Verify OSDI is available:** +```bash +ngspice --version | grep -i osdi +# Should print something like: OSDI interface enabled +``` + +--- + +## Walkthrough + +### Step 1 — Write the netlist + +Create `tutorials/netlists/resistor_dc.sp`: + +```spice +* DC sweep of OSDI resistor model +.model rmod resistor_va R=1k + +* Instance N1: node A=in, node B=0 (GND), model card rmod +N1 in 0 rmod + +* DC voltage source +Vtest in 0 DC 1 + +.dc Vtest 0 1 0.01 + +.control +pre_osdi resistor.osdi +run +plot i(Vtest) +.endc +.end +``` + +The `pre_osdi` command loads the shared library before the netlist is parsed. +The instance `N1` uses the `N` prefix that ngspice reserves for OSDI devices +and refers to the `.model` card `rmod`, which selects the module `resistor_va`. + +### Step 2 — Run the simulation + +```bash +ngspice -b tutorials/netlists/resistor_dc.sp -o resistor_dc.log +``` + +`-b` runs in batch mode (no interactive console). + +### Step 3 — Verify the result + +The log file will contain the DC operating point and sweep data. For +`R = 1 kΩ` and `V = 1 V`, the expected current is `I = 1 V / 1 kΩ = 1 mA`. + +Extract the current at `V = 1`: +```bash +grep "1.000000e+00" resistor_dc.log | head -5 +``` + +### Step 4 — Vary the `R` parameter + +Change `R=1k` to `R=10k` in the netlist and re-run. The current at 1 V +should be 0.1 mA. + +--- + +## Exercises + +1. **Plot V–I curve.** Sweep from −1 V to +1 V and plot current. It should + be a straight line through the origin with slope `1/R`. + +2. **Temperature dependence.** The model has a `zeta` temperature coefficient. + Set `zeta=1` and `tnom=300`. Run simulations at different temperatures + using ngspice's `.temp` directive.
Verify that the resistance follows + `R_t = R * (T/tnom)^zeta`. + +3. **Try the diode.** Compile `tutorials/models/diode_simple.va` from T8, + load it with `pre_osdi`, and run a DC sweep. The exponential I–V should be + visible in the log. + +4. **Check the OSDI handshake.** Enable ngspice's verbose OSDI logging + (`set ngdebug` in `.control`) to see the descriptor fields it reads + from the library at load time. + +--- + +## What's next + +[T11 — Multi-Port Model and Node Collapse](t11-multiport-node-collapse.qmd) +introduces a three-terminal model with an internal node and shows how +`sim_back`'s node-collapse analysis reduces the simulator's matrix size. diff --git a/tutorials/t11-multiport-node-collapse.qmd b/tutorials/t11-multiport-node-collapse.qmd new file mode 100644 index 00000000..bf444b0a --- /dev/null +++ b/tutorials/t11-multiport-node-collapse.qmd @@ -0,0 +1,98 @@ +--- +title: "T11 — Multi-Port Model and Node Collapse" +--- + +## Goal + +Write a three-terminal BJT stub model that declares an explicit internal base +node. Compile it and inspect the `NodeCollapse` output from `sim_back`: +confirm that the `CollapsePair` for the internal node appears, and understand +the conditions under which a simulator may collapse it. + +--- + +## Prerequisites + +- [T8](t08-diode-nonlinearity.qmd) completed. +- [T5](t05-mir-contribution.qmd) recommended: reading MIR dumps. + +**Background reading:** [`sim_back` INTERNALS §node collapse](../docs/sim_back/INTERNALS.md) + +--- + +## Walkthrough + +### The node-collapse optimisation + +In a Verilog-A model, an *internal node* (one not listed in the port +declaration) may have a purely resistive connection to an external node — +for example a base contact resistance `R_b` between the external base port +`B` and the internal base node `Bi`.
+ +If `R_b = 0` (or the simulator decides the voltage across it is negligibly +small), the simulator can *collapse* the two nodes: treat them as the same +node in the circuit matrix, eliminating one row and one column. This reduces +the matrix size and speeds up Newton iteration. + +`sim_back` identifies potential collapse candidates and records them in +`NodeCollapse`. The OSDI descriptor includes this information so the +simulator can decide at runtime whether to collapse. + +### A minimal BJT stub + +```verilog +`include "disciplines.vams" + +module bjt_stub(C, B, E); + inout C, B, E; + electrical C, B, E, Bi; // Bi is internal + + parameter real Rb = 100.0 from [0:inf]; + + analog begin + // base contact resistance + V(B, Bi) <+ Rb * I(B, Bi); + + // placeholder collector current (not physical) + I(C, E) <+ 1e-3 * V(B, Bi); + end +endmodule +``` + +### Inspecting the NodeCollapse + +Compile with `--dump-mir`: + +```bash +./target/release/openvaf-r tutorials/models/bjt_stub.va \ + --dump-mir --dry-run +``` + +In the `sim_back` output look for the `NodeCollapse` section. When `Rb` can +be zero, the pair `(B, Bi)` should appear as a collapse candidate. + +--- + +## Exercises + +1. **Force collapse.** Set `Rb = 0.0` as the *only* default (no `from` + range). Does `sim_back` mark the pair as collapsible? + +2. **Internal node with capacitance.** Add `I(B, Bi) <+ ddt(C_b * V(B,Bi))` + (a capacitance). Does the node still collapse? + +3. **Read `NodeCollapse` source.** Open + `openvaf/sim_back/src/node_collapse.rs` and find the condition that decides + when a node pair becomes a collapse candidate. + +4. **Check the OSDI descriptor.** Compile and inspect the descriptor + (as in T9). Find the collapse-pair field and verify it matches your + analysis. 
+ +--- + +## What's next + +[T12 — Multi-Domain Model (Electro-Thermal)](t12-multi-domain.qmd) shows +how OpenVAF handles models with multiple disciplines (electrical and thermal) +and how `hir_ty` resolves two different nature hierarchies. diff --git a/tutorials/t12-multi-domain.qmd b/tutorials/t12-multi-domain.qmd new file mode 100644 index 00000000..9e44b3fc --- /dev/null +++ b/tutorials/t12-multi-domain.qmd @@ -0,0 +1,103 @@ +--- +title: "T12 — Multi-Domain Model (Electro-Thermal)" +--- + +## Goal + +Write a compact model that couples an electrical port with a thermal port using +a custom `temperature` discipline. Trace how `hir_ty` resolves the two +different `NatureTy` hierarchies and how `sim_back` produces two separate +`SimUnknownKind` variants in the `DaeSystem`. + +--- + +## Prerequisites + +- [T6](t06-type-inference.qmd) completed. +- [T11](t11-multiport-node-collapse.qmd) recommended. + +**Background reading:** [`hir_ty` INTERNALS §3–§5](../docs/hir_ty/INTERNALS.md) | +[`sim_back` INTERNALS](../docs/sim_back/INTERNALS.md) + +--- + +## Walkthrough + +### A thermal discipline + +The `diode_lim.va` integration test model at +`openvaf/test_data/osdi/diode_lim.va` already demonstrates a multi-domain +model with `electrical` and `thermal` ports. Read it first: + +```bash +cat openvaf/test_data/osdi/diode_lim.va +``` + +The `thermal` discipline uses `Pwr` (power, watts) as the flow access +function and `Temp` (temperature, Kelvin) as the potential access function. + +### Two discipline hierarchies + +In `hir_ty`, each discipline is resolved to a `DisciplineTy`: + +``` +DisciplineTy { flow: NatureId(Pwr), potential: NatureId(Temp) } -- thermal +DisciplineTy { flow: NatureId(Current), potential: NatureId(Voltage) } -- electrical +``` + +The two `NatureId`s come from different nature hierarchies. `hir_ty` traces +the `parent` chain for each nature until it reaches the base nature (the one +with no parent), which determines the `units` and `access` function name. 
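
The parent-chain walk described above can be sketched in a few lines. The `Nature` struct and `base_nature` function below are simplified stand-ins invented for this example; the real `hir_ty` types carry more information and are indexed by `NatureId` rather than by a plain `usize`. The sketch also assumes the parent links contain no cycles.

```rust
/// Simplified stand-in for a nature declaration (hypothetical, not hir_ty's type).
struct Nature {
    name: &'static str,
    /// Index of the parent nature, or `None` for a base nature.
    parent: Option<usize>,
    /// Only base natures carry units in this sketch.
    units: Option<&'static str>,
}

/// Follow `parent` links until a nature without a parent is reached.
/// Assumes the links are acyclic; a cycle would loop forever.
fn base_nature(natures: &[Nature], start: usize) -> &Nature {
    let mut current = start;
    while let Some(parent) = natures[current].parent {
        current = parent;
    }
    &natures[current]
}
```

With a derived nature two levels below `Current`, `base_nature` returns the `Current` entry, and its `units` field is what the derived nature inherits.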
+ +### Two SimUnknownKind entries + +In `sim_back`, the `DaeSystem::unknowns` map contains one entry per simulator +unknown. For a two-port electro-thermal model: + +```rust +SimUnknownKind::KirchoffLaw(NodeId(A)) -- electrical node A +SimUnknownKind::KirchoffLaw(NodeId(C)) -- electrical node C +SimUnknownKind::KirchoffLaw(NodeId(dT)) -- thermal node dT +``` + +The thermal node contributes a temperature `Temp(br_sht)` access and a power +`Pwr(br_sht) <+` contribution — structurally identical to the electrical case, +just with different nature types. + +### Compile and inspect + +```bash +./target/release/openvaf-r openvaf/test_data/osdi/diode_lim.va \ + --dump-mir --dry-run +``` + +In the output, identify the `SimUnknown` entries in the dump and match each +to the `electrical` and `thermal` nodes in the source. + +--- + +## Exercises + +1. **Write a minimal electro-thermal model.** Write a two-port resistor with + self-heating: electrical ports `A`, `B` and a thermal port `dT`. The + dissipated power is `P = V(A,B)^2 / R`; add `Pwr(dT) <+ P`. + +2. **Trace `DisciplineTy::access`.** In `hir_ty`, the function + `DisciplineTy::access(NatureAccess)` decides whether an access is + potential or flow. Trace the call chain for `Pwr(br_sht)`. + +3. **Compare Jacobian sizes.** A two-port electrical model has a 2×2 + Jacobian. A two-port model with one extra thermal port has a 3×3 + Jacobian. Confirm this in the `DaeSystem` dump. + +4. **Read the thermal discipline.** Open `openvaf/basedb/src/stdlibs/` + (where the standard library `.vams` files are embedded). Find the + `thermal` discipline definition and confirm the access function names. + +--- + +## What's next + +[T13 — Noise Analysis Support](t13-noise-analysis.qmd) adds `white_noise` +and `flicker_noise` contributions to the diode model and traces the noise +terms through `sim_back/noise.rs` into the OSDI descriptor's noise source list. 
diff --git a/tutorials/t13-noise-analysis.qmd b/tutorials/t13-noise-analysis.qmd new file mode 100644 index 00000000..187e621d --- /dev/null +++ b/tutorials/t13-noise-analysis.qmd @@ -0,0 +1,96 @@ +--- +title: "T13 — Noise Analysis Support" +--- + +## Goal + +Extend the diode model from T8 with `white_noise` and `flicker_noise` +contributions. Trace the noise terms through `sim_back/noise.rs` into the +`NoiseSources` table in the `DaeSystem`, and verify they appear in the OSDI +descriptor's noise source list. + +--- + +## Prerequisites + +- [T8](t08-diode-nonlinearity.qmd) completed. +- [T9](t09-osdi-descriptor.qmd) recommended: reading OSDI descriptor fields. + +**Background reading:** [`sim_back` INTERNALS §noise](../docs/sim_back/INTERNALS.md) + +--- + +## Walkthrough + +### Noise in Verilog-A + +Noise contributions are added with `white_noise(psd, name)` and +`flicker_noise(psd, exp, name)` in the `analog` block. These are translated +by OpenVAF into `ImplicitEquationKind::NoiseSrc` entries in the `HirInterner` +and collected into `DaeSystem::noise_sources`. + +### Adding noise to the diode + +Extend `tutorials/models/diode_simple.va`: + +```verilog +analog begin + vd = V(A, C); + id = IS * (exp(vd / (N * $vt)) - 1.0); + I(A, C) <+ id; + + // Shot noise: PSD = 2 * q * |id| + I(A, C) <+ white_noise(2 * `P_Q * abs(id), "shot"); + + // Flicker noise + I(A, C) <+ flicker_noise(1e-10 * abs(id), 1.0, "flicker"); +end +``` + +### Compiling and inspecting + +```bash +./target/release/openvaf-r tutorials/models/diode_simple.va \ + --dump-mir --dry-run +``` + +In the dump, find the `NoiseSources` section. Each entry records the noise +type, the branch it is associated with, and the MIR `Value` that computes +the PSD. + +### Reading the OSDI descriptor + +The OSDI descriptor includes a `noise_sources` array. Use the reader harness +from T9 to confirm the two noise entries appear with the names `"shot"` and +`"flicker"`. 
+ +Also reference the existing noise model test: +``` +openvaf/test_data/osdi/noise.va +``` + +--- + +## Exercises + +1. **Read `noise.va`.** Open `openvaf/test_data/osdi/noise.va` and compare + its noise contributions against `diode_simple.va`. What additional noise + sources does it include? + +2. **Find `noise.rs`.** Open `openvaf/sim_back/src/noise.rs`. What data + structure does it build from the `HirInterner`'s implicit equations? + +3. **Disable flicker noise.** Remove the `flicker_noise` line and recompile. + Verify the `NoiseSources` table shrinks by one entry. + +4. **Correlated noise.** Some models add noise to two branches simultaneously. + Look for `white_noise` on two different branches in a model. How does + `sim_back` represent this in the `DaeSystem`? + +--- + +## What's next + +[T14 — Custom Analog Function and the Function DefMap](t14-analog-function.qmd) +shows how a Verilog-A `function` gets its own `DefMap`, how the function call +is lowered through `hir_lower`, and whether it is inlined into the MIR. diff --git a/tutorials/t14-analog-function.qmd b/tutorials/t14-analog-function.qmd new file mode 100644 index 00000000..1e853143 --- /dev/null +++ b/tutorials/t14-analog-function.qmd @@ -0,0 +1,128 @@ +--- +title: "T14 — Custom Analog Function and the Function DefMap" +--- + +## Goal + +Write a Verilog-A `function real` that is called from the analog block. +Trace name resolution through the function's own `DefMap`, inspect the lowered +MIR, and observe whether the function call is inlined or preserved as a +`Call` instruction. + +--- + +## Prerequisites + +- [T3](t03-name-resolution.qmd) completed. +- [T5](t05-mir-contribution.qmd) completed. 
+ +**Background reading:** [`hir_def` INTERNALS §7 — `DefWithBodyId`](../docs/hir_def/INTERNALS.md) | +[`hir_lower` INTERNALS](../docs/hir_lower/INTERNALS.md) + +--- + +## Walkthrough + +### A model with an analog function + +```verilog +`include "disciplines.vams" + +module diode_func(A, C); + inout A, C; + electrical A, C; + + parameter real IS = 1e-14 from (0:inf); + parameter real N = 1.0 from (0:inf); + + // Custom clamping function + function real clamp; + input real x, lo, hi; + begin + if (x < lo) clamp = lo; + else if (x > hi) clamp = hi; + else clamp = x; + end + endfunction + + real vd, id; + + analog begin + vd = clamp(V(A,C), -1.0, 2.0); + id = IS * (exp(vd / (N * $vt)) - 1.0); + I(A, C) <+ id; + end +endmodule +``` + +### The function DefMap + +In `hir_def`, a `function` gets its own `DefMap` built by +`db.function_def_map(fun_id)`. The function scope contains: + +- `x`, `lo`, `hi` → `FunctionArgId` entries (not `ParamId`) +- `clamp` → `FunctionReturn` (the implicit output variable) + +Dump the function DefMap from the scratch crate: + +```rust +use hir_def::db::HirDefDB; + +let fun_id = /* obtain FunctionId from the module DefMap */; +let fn_def_map = db.function_def_map(fun_id); +println!("{}", fn_def_map.dump(&db)); +``` + +### The function Body + +```rust +let fn_body_id = DefWithBodyId::FunctionId(fun_id); +let fn_body = db.body(fn_body_id); +println!("{}", fn_body.dump()); +``` + +The body contains the `if`/`else` tree and the assignments to `clamp` +(the function return variable). + +### Inlining in the MIR + +```bash +./target/release/openvaf-r tutorials/models/diode_func.va \ + --dump-mir --dry-run +``` + +OpenVAF inlines analog functions at the MIR level by default: the `clamp` +call is replaced by its body's instructions directly in the caller's `Function`. +You will see the `select` (ternary) instructions from the `if`/`else` tree +inlined into the eval kernel, not a separate `call` instruction. + +--- + +## Exercises + +1. 
**Find `FunctionReturn`.** In the function body dump, identify the + `Stmt::Assignment` that writes to `clamp` (the implicit return). In + `hir_ty`'s `assignment_destination`, what `AssignDst` variant does it map to? + +2. **Recursive call.** Try a recursive function. Does OpenVAF accept it? + What error (if any) is produced? + +3. **Multiple return paths.** Add a fourth case to `clamp` that handles + `x == lo` specially. Count the `select` instructions in the resulting MIR. + +4. **Integer function.** Write `function integer sign` that returns `1`, `0`, + or `-1`. Call it from the analog block. Inspect the `casts` map in + `InferenceResult` — does the integer return need a cast before use in a + real expression? + +5. **Read `ctx.rs`.** In `openvaf/hir_lower/src/ctx.rs`, find where function + calls are handled. Does `LoweringCtx` have a dedicated "inline function" + method, or is it handled inline in the call lowering? + +--- + +## What's next + +[T15 — Cross-Platform Compilation and the Target Spec](t15-cross-platform.qmd) +shows how the `Target` spec affects code generation: the pointer width, +`_hypot` vs `hypot` symbol selection, and the final object file ABI. diff --git a/tutorials/t15-cross-platform.qmd b/tutorials/t15-cross-platform.qmd new file mode 100644 index 00000000..f466d5ee --- /dev/null +++ b/tutorials/t15-cross-platform.qmd @@ -0,0 +1,131 @@ +--- +title: "T15 — Cross-Platform Compilation and the Target Spec" +--- + +## Goal + +Build OpenVAF targeting a different architecture (e.g. `x86_64-unknown-linux-gnu` +from a macOS or Windows host using a cross-linker). Trace how the `Target` +spec affects `LLVMBackend` feature strings, the `Types` primitive-pointer-width +table, the `_hypot` vs `hypot` symbol selection, and the final object file ABI. + +--- + +## Prerequisites + +- [T1](t01-build-and-compile.qmd) completed; `openvaf-r` is built. +- A cross-linker installed for the target platform (e.g. 
`x86_64-linux-gnu-gcc` + on Ubuntu for cross-compiling to a different Linux target). + +**Background reading:** [`mir_llvm` INTERNALS §fast-math and intrinsics](../docs/mir_llvm/INTERNALS.md) + +--- + +## Walkthrough + +### List available targets + +```bash +./target/release/openvaf-r --supported-targets +``` + +This calls `target::spec::get_target_names()` and lists every target OpenVAF +knows about. + +### Compile for a specific target + +```bash +./target/release/openvaf-r \ + integration_tests/RESISTOR/resistor.va \ + --target x86_64-unknown-linux-gnu \ + --target_cpu generic \ + -o resistor_linux.osdi +``` + +`--target_cpu generic` disables host-specific LLVM features (AVX, SSE4.2, +etc.) so the resulting binary runs on any x86-64 processor. + +### The `_hypot` vs `hypot` distinction + +On Windows, the C runtime does not export `hypot` — it exports `_hypot` +instead. `mir_llvm` checks `target.is_like_windows` and emits `_hypot` +accordingly: + +```rust +// mir_llvm/src/intrinsics.rs +let name = if cx.target_spec().is_like_windows { "_hypot" } else { "hypot" }; +``` + +When cross-compiling to a Windows target on Linux, confirm that the emitted +LLVM IR calls `_hypot`: + +```bash +./target/release/openvaf-r \ + integration_tests/RESISTOR/resistor.va \ + --target x86_64-pc-windows-msvc \ + --dump-unopt-ir \ + --dry-run 2>&1 | grep hypot +``` + +### Pointer width and the Types table + +`mir_llvm`'s `Types` struct adapts to the target pointer width. For a 64-bit +target, `ptr_sized_int()` returns `i64`; for a 32-bit target it returns `i32`. +This affects the size of OSDI descriptor fields that hold pointers (e.g. +function-pointer fields). + +Verify by checking the output of `--dump-unopt-ir` for a 32-bit target: + +```bash +./target/release/openvaf-r \ + integration_tests/RESISTOR/resistor.va \ + --target i686-unknown-linux-gnu \ + --dump-unopt-ir \ + --dry-run +``` + +Look for `i32` vs `i64` in struct type definitions. + +--- + +## Exercises + +1. 
**List all Windows targets.** Filter `--supported-targets` for entries + containing `windows`. How many are there? + +2. **Compare object file headers.** Use `file` (Linux/macOS) or `dumpbin /headers` + (Windows) to inspect the object file type of cross-compiled `.osdi` outputs. + +3. **Read `Target` spec.** Open `openvaf/target/src/spec.rs` (or equivalent) + and find the fields that `mir_llvm` consults when building `LLVMBackend`. + +4. **Host CPU features.** Compile with `--target_cpu native` and + `--dump-unopt-ir`. Look for LLVM function attributes like + `target-features="+avx2,+sse4.2"`. How do these affect the emitted + floating-point instructions? + +5. **Embed the stdlib bitcode.** The OSDI descriptor's struct layout is + determined by per-target stdlib bitcode embedded in the `osdi` crate at + build time (see `openvaf/osdi/build.rs`). Find the bitcode files and + confirm there is one per target family. + +--- + +## What's next + +You have reached the end of the tutorial series! By now you can: + +- Build and use `openvaf-r` to compile arbitrary Verilog-A models. +- Read the `ItemTree`, `DefMap`, and `Body` to understand what the frontend + sees. +- Interpret MIR text, including derivative instructions from `mir_autodiff`. +- Read and write the OSDI descriptor format. +- Load compiled models into ngspice. + +Suggested further reading: + +- [Architecture overview](../docs/ARCHITECTURE.md) — the full compilation + pipeline in one document. +- The OSDI 0.4 specification — the normative reference for the descriptor ABI. +- The OpenVAF test suite (`openvaf/{hir,hir_def,hir_ty,sim_back}/tests/`) — + the most honest documentation of edge cases. 
From 8a83d8c1e4f4345c0a6dc24d4f60a549cb75051e Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sat, 23 May 2026 18:20:47 +0200 Subject: [PATCH 11/28] docs: add target crate INTERNALS --- docs/target/INTERNALS.md | 472 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 472 insertions(+) create mode 100644 docs/target/INTERNALS.md diff --git a/docs/target/INTERNALS.md b/docs/target/INTERNALS.md new file mode 100644 index 00000000..5e6aa8b7 --- /dev/null +++ b/docs/target/INTERNALS.md @@ -0,0 +1,472 @@ +# `target` — Compilation Target Specifications + +**Location:** `openvaf/target/` +**Role:** Defines what platforms OpenVAF can compile for. The crate provides +the `Target` struct (LLVM triple, data layout, pointer width, linker flavor, +and per-platform link arguments), a fixed table of seven built-in targets, and +the compile-time `host_triple()` function. On Windows it also builds and embeds +a UCRT import library so the linker can reference the C runtime without +requiring Visual Studio to be installed. + +Cross-links: [linker\_target INTERNALS](../linker_target/INTERNALS.md) · +[mir\_llvm INTERNALS](../mir_llvm/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate relationships + +``` +target (no runtime dependencies) + └─► linker (selects linker flavor; passes pre/post link args) + └─► mir_llvm (reads llvm_target, data_layout, pointer_width, cpu, features) + └─► osdi (reads is_like_windows for symbol-naming decisions) + └─► openvaf (CLI; calls Target::host_target() and Target::search()) +``` + +`target` is intentionally a leaf crate with no runtime dependencies. All +platform knowledge is encoded at compile time. The `build.rs` script uses the +`cc` crate (a build-time dependency only) to compile the UCRT shim; the +resulting bytes are embedded via `include_bytes!` so the final binary needs +nothing installed at link time beyond the system linker. 
+ +--- + +## Module map + +| File | Contents | +|------|----------| +| `src/lib.rs` | Public re-exports; `host_triple()`; `supported_targets!` macro | +| `src/spec.rs` | `LinkerFlavor`, `Target`, `TargetOptions`, `LinkArgs`; `Target::search*` | +| `src/spec/linux_base.rs` | Shared options for all Linux targets | +| `src/spec/apple_base.rs` | Shared options for macOS and Apple Silicon | +| `src/spec/windows_base.rs` | Shared options for all Windows targets | +| `src/spec/windows_msvc_base.rs` | MSVC-specific options layered over windows_base | +| `src/spec/x86_64_unknown_linux.rs` | Concrete target: x86-64 Linux | +| `src/spec/aarch64_unknown_linux.rs` | Concrete target: AArch64 Linux | +| `src/spec/x86_64_pc_windows.rs` | Concrete target: x86-64 Windows (MSVC) | +| `src/spec/aarch64_pc_windows.rs` | Concrete target: AArch64 Windows (MSVC) | +| `src/spec/x86_64_apple_darwin.rs` | Concrete target: x86-64 macOS | +| `src/spec/aarch64_apple_darwin.rs` | Concrete target: Apple Silicon macOS | +| `src/spec/x86_64_unknown_linux_musl.rs` | Concrete target: musl-libc Linux | +| `src/ucrt.c` | UCRT `snprintf` shim (compiled by build.rs) | +| `build.rs` | Compiles ucrt.c; emits `CFG_COMPILER_HOST_TRIPLE` | + +--- + +## `LinkerFlavor` + +```rust +pub enum LinkerFlavor { + Ld, // GNU ld / lld on Linux and musl + Ld64, // Apple ld64 on macOS + Msvc, // link.exe on Windows +} +``` + +`LinkerFlavor` is the dispatch key for two independent lookups: + +1. **Linker selection** in the `linker` crate: `LinkerFlavor::Msvc` maps to + `MsvcLinker`; `Ld` and `Ld64` both map to `LdLinker` (with the macOS + `-dylib` flag instead of `-shared`). + +2. **Link-argument lookup** in `TargetOptions::pre_link_args` and + `post_link_args`: these are `BTreeMap>`. Every + target module inserts arguments under the flavor it uses, and the linker + crate iterates only over the entry for the active flavor. 
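
The per-flavor lookup described in point 2 can be sketched as follows. This is a minimal illustration, assuming `LinkArgs` is a `BTreeMap` keyed by `LinkerFlavor` with a `Vec<String>` per flavor; the helper names `add_args` and `args_for` are hypothetical and do not come from the `target` or `linker` crates.

```rust
use std::collections::BTreeMap;

/// The three flavors listed above. `Ord` is required for use as a `BTreeMap` key.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum LinkerFlavor {
    Ld,
    Ld64,
    Msvc,
}

/// Assumed shape of the link-argument table (one argument list per flavor).
pub type LinkArgs = BTreeMap<LinkerFlavor, Vec<String>>;

/// Hypothetical helper: append arguments under one flavor, keeping push order.
pub fn add_args(args: &mut LinkArgs, flavor: LinkerFlavor, new: &[&str]) {
    args.entry(flavor)
        .or_default()
        .extend(new.iter().map(|s| s.to_string()));
}

/// Hypothetical helper: the arguments for the active flavor only,
/// or an empty slice when the target registered none for it.
pub fn args_for(args: &LinkArgs, flavor: LinkerFlavor) -> &[String] {
    args.get(&flavor).map(Vec::as_slice).unwrap_or_default()
}
```

A base module and a concrete target can both call `add_args` with the same flavor; the second call extends the first list rather than replacing it, which is the layering the base modules rely on.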
+ +The `flavor_mappings!` macro in `spec.rs` builds a `&[(LinkerFlavor, &str)]` +mapping used by serialization; all three flavors have a canonical string name +(`"ld"`, `"ld64"`, `"msvc"`). + +--- + +## `Target` and `TargetOptions` + +```rust +pub struct Target { + pub llvm_target: String, // LLVM triple: "x86_64-unknown-linux-gnu" + pub pointer_width: u32, // 32 or 64 + pub arch: String, // "x86_64" | "aarch64" + pub data_layout: String, // LLVM datalayout string + pub options: TargetOptions, +} +``` + +`llvm_target` is passed directly to `llvm::TargetMachine::create` in `mir_llvm` +and appears verbatim in the object file's ELF/Mach-O/COFF machine type. +`data_layout` seeds `llvm::Module::setDataLayout`; it must match the target +triple or LLVM will emit incorrect code for struct padding and vector alignment. + +```rust +pub struct TargetOptions { + pub is_builtin: bool, + pub cpu: String, // e.g. "x86-64", "apple-m1" + pub features: String, // LLVM feature string e.g. "+avx2,-x87" + pub linker_flavor: LinkerFlavor, + pub pre_link_args: LinkArgs, // BTreeMap> + pub post_link_args: LinkArgs, + pub import_lib: &'static [u8], // embedded .lib bytes, empty on non-Windows + pub is_like_windows: bool, + pub is_like_osx: bool, +} +``` + +`cpu` and `features` are forwarded to LLVM's `TargetMachine` builder, which +uses them to select instruction-set extensions. The default `cpu = "generic"` +and empty `features` produce fully portable code; target modules override this +when the platform guarantees a specific baseline (e.g. `"x86-64"` enables +SSE2, which is mandatory on 64-bit x86). + +`import_lib` is `&'static [u8]` rather than `Option>` because the +bytes come from `include_bytes!` at compile time and live in the binary's +read-only data segment. On all non-Windows targets it is the empty slice `&[]`. + +--- + +## Base modules and inheritance + +OpenVAF avoids copy-pasting by factoring shared options into base modules. 
+Each concrete target calls its base's function and then overrides specific +fields: + +``` +linux_base::opts() + └─ pre_link_args[Ld] += ["--no-add-needed", "--hash-style=gnu"] + ├─ x86_64_unknown_linux cpu="x86-64", pre_link += ["-m", "elf_x86_64"] + ├─ aarch64_unknown_linux cpu="generic" (AArch64 has no legacy sub-ISAs) + └─ x86_64_unknown_linux_musl (same as x86_64 linux) + +apple_base::opts() + └─ linker_flavor = Ld64, is_like_osx = true + ├─ x86_64_apple_darwin cpu="core2" + └─ aarch64_apple_darwin cpu="apple-m1" + +windows_base::opts() + └─ is_like_windows = true + └─ windows_msvc_base::opts() + └─ pre_link_args[Msvc] += ["/NOLOGO"] + post_link_args[Msvc] += ["msvcrt.lib"] + ├─ x86_64_pc_windows import_lib = UCRT_IMPORTLIB (x64) + └─ aarch64_pc_windows import_lib = UCRT_IMPORTLIB (arm64) +``` + +`linux_base`'s `--hash-style=gnu` improves dynamic-linker startup time on +modern Linux distributions; `--no-add-needed` prevents the linker from adding +implicit `NEEDED` entries for shared libraries that the object file references +transitively. Together they give cleaner and faster shared-object loading for +the `.osdi` plugin. + +--- + +## The `supported_targets!` macro + +```rust +supported_targets! { + ("x86_64-unknown-linux-gnu", x86_64_unknown_linux), + ("aarch64-unknown-linux-gnu", aarch64_unknown_linux), + ("x86_64-unknown-linux-musl", x86_64_unknown_linux_musl), + ("x86_64-pc-windows-msvc", x86_64_pc_windows), + ("aarch64-pc-windows-msvc", aarch64_pc_windows), + ("x86_64-apple-darwin", x86_64_apple_darwin), + ("aarch64-apple-darwin", aarch64_apple_darwin), +} +``` + +The macro expands to: + +- An array `TARGETS: &[(&str, fn() -> Target)]` pairing each LLVM triple + string with a constructor function. +- A `load_specific(target: &str) -> Option` function that walks the + array and calls the constructor on a match. + +`Target::search(triple)` calls `load_specific` and returns `Err` if the triple +is not in the table. 
There is no JSON loading path; the target set is closed +and checked at compile time. Adding a new target requires adding a source file +and a macro entry, then rebuilding the compiler. + +--- + +## Target lookup API + +```rust +impl Target { + pub fn search(target_triple: &str) -> Result; + pub fn search_llvm_triple(llvm_target: &str) -> Result; + pub fn host_target() -> Result; +} +``` + +**`search`** takes a target triple exactly as it appears in the +`supported_targets!` table (e.g. `"x86_64-unknown-linux-gnu"`). + +**`search_llvm_triple`** iterates the same table but matches on the +`Target::llvm_target` field. For the current seven targets the two strings are +identical, but the separation exists because LLVM triples and Rust target names +occasionally differ (e.g. `"x86_64-apple-macosx10.7.0"` vs +`"x86_64-apple-darwin"`). + +**`host_target`** calls `host_triple()` and passes the result to `search`. + +--- + +## `host_triple()` — compile-time platform detection + +```rust +pub fn host_triple() -> &'static str { + // CFG_COMPILER_HOST_TRIPLE is set by build.rs + let triple = env!("CFG_COMPILER_HOST_TRIPLE"); + + // MSYS2 GNU environment appears as windows-gnu but the MSVC linker is used + if triple.contains("windows-gnu") { + return "x86_64-pc-windows-msvc"; + } + // Normalize older Apple triples to the canonical name + if triple.contains("apple") { + if triple.contains("aarch64") { + return "aarch64-apple-darwin"; + } else { + return "x86_64-apple-darwin"; + } + } + triple +} +``` + +`CFG_COMPILER_HOST_TRIPLE` is emitted by `build.rs` as: + +```rust +println!("cargo:rustc-env=CFG_COMPILER_HOST_TRIPLE={}", triple); +``` + +where `triple` comes from `std::env::var("TARGET")` — the Cargo-supplied +target triple for the compilation host. Because `env!` is evaluated at compile +time, `host_triple()` is a zero-cost `&'static str` with no syscalls at +runtime. 
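
To experiment with the normalization outside the build, the same `if`-chain can be written as an ordinary function over a runtime string. This is only a sketch: `normalize_triple` is a hypothetical name, and the real `host_triple()` reads its input from `env!` at compile time rather than taking an argument.

```rust
/// Hypothetical runtime analogue of the normalization inside `host_triple()`.
/// Returns either one of the canonical table entries or the input unchanged.
pub fn normalize_triple(triple: &str) -> &str {
    // Any `windows-gnu` host is mapped to the MSVC target.
    if triple.contains("windows-gnu") {
        return "x86_64-pc-windows-msvc";
    }
    // Apple triples collapse to one of the two canonical darwin entries.
    if triple.contains("apple") {
        return if triple.contains("aarch64") {
            "aarch64-apple-darwin"
        } else {
            "x86_64-apple-darwin"
        };
    }
    triple
}
```

Note that, as written, the first branch ignores the architecture: any triple containing `windows-gnu` maps to the x86-64 MSVC target.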
+ +The two special cases patch over real-world mismatches: + +- **`windows-gnu`**: The MSYS2 build environment reports itself as + `x86_64-pc-windows-gnu`, but OpenVAF uses the MSVC linker on all Windows + hosts. Normalizing to `windows-msvc` means `host_target()` always returns + the MSVC target on Windows regardless of the Rust toolchain flavour used to + compile OpenVAF. + +- **`apple`**: Older Xcode toolchains produced triples like + `x86_64-apple-macosx10.15.0` rather than the bare `x86_64-apple-darwin`. + The normalization collapses all Apple variants to the two canonical entries + in the target table. + +--- + +## The UCRT import library + +### Why it exists + +On Windows, the OSDI `.osdi` shared library must link against the Universal C +Runtime (UCRT) for functions like `printf`, `malloc`, and `snprintf`. The UCRT +import library (`ucrt.lib`) ships with the Windows SDK and the MSVC toolchain, +but OpenVAF invokes the linker whenever it compiles a model, on any machine, including CI runners +and developer machines that may not have the full Visual Studio installation. + +To avoid requiring the Windows SDK as a prerequisite, `target/build.rs` +produces a minimal import library at build time and embeds it in the binary. +The embedded bytes are then handed to the linker via a temporary file whenever +OpenVAF links a Windows target. + +### `ucrt.c` — the shim + +`src/ucrt.c` defines exactly one function: + +```c +// Polyfill: expose snprintf as a proper export in older UCRT versions. +// __stdio_common_vsprintf is always available; snprintf may not be in import libs. +int snprintf(char *buf, size_t count, const char *fmt, ...) { + va_list args; + va_start(args, fmt); + int r = __stdio_common_vsprintf( + _CRT_INTERNAL_PRINTF_STANDARD_SNPRINTF_BEHAVIOR, + buf, count, fmt, NULL, args); + va_end(args); + return r; +} +``` + +`__stdio_common_vsprintf` is the underlying UCRT entry point that backs +`snprintf`, `sprintf`, and similar functions.
Using it directly bypasses the +older SDK import-library gap where `snprintf` was not exported by name from +`ucrtbase.dll`. With this shim, the generated object file defines `snprintf` +locally so that any OSDI plugin referencing it will resolve correctly. + +### `build.rs` compilation pipeline + +`build.rs` runs a two-step pipeline for each Windows target architecture: + +``` +ucrt.c + │ + ├─ clang -c -target -o ucrt_.obj + │ (or cc crate fallback) + │ + └─ llvm-lib /OUT:ucrt_.lib ucrt_.obj + (MSVC-format import library) +``` + +For MSYS2 builds (`MSYSTEM` env var present), `build.rs` substitutes `ar` +for `llvm-lib` to produce a GNU-format archive instead. + +The `.lib` file is written to `$OUT_DIR/ucrt_{x64,arm64}.lib`. The concrete +target modules then embed it: + +```rust +// in x86_64_pc_windows.rs +const UCRT_IMPORTLIB: &[u8] = + include_bytes!(concat!(env!("OUT_DIR"), "/ucrt_x64.lib")); + +pub fn target() -> Target { + Target { + options: TargetOptions { + import_lib: UCRT_IMPORTLIB, + ..windows_msvc_base::opts() + }, + .. + } +} +``` + +`include_bytes!` resolves the path at compile time, so `UCRT_IMPORTLIB` is a +`&'static [u8]` pointing into the binary's `.rodata` section. The bytes are +never written to disk again unless the linker needs them. + +### How the linker uses `import_lib` + +In the `linker` crate, `link()` checks `target.options.import_lib`: + +```rust +if !target.options.import_lib.is_empty() { + let lib_path = out_dir.join("import_lib.lib"); + fs::write(&lib_path, target.options.import_lib)?; + cmd.arg(lib_path); +} +``` + +A temporary `.lib` is materialized from the embedded bytes, its path is appended +to the linker command, and the file is cleaned up after the link step completes. +This keeps the "no prerequisites" guarantee: a fresh Windows machine with only +the OpenVAF binary can link `.osdi` files without any SDK installation. 
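
The materialization step can be isolated into a small function for testing. The sketch below assumes only what the snippet above shows (a byte slice, an output directory, and the fixed file name `import_lib.lib`); the function name `materialize_import_lib` and its `Option` return value are illustrative choices, not the `linker` crate's API, and cleanup after the link step is left to the caller.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: write the embedded import-library bytes into `out_dir`.
/// Returns `Ok(None)` when the slice is empty (the non-Windows case), so the
/// caller knows there is no extra argument to pass to the linker.
pub fn materialize_import_lib(import_lib: &[u8], out_dir: &Path) -> io::Result<Option<PathBuf>> {
    if import_lib.is_empty() {
        return Ok(None);
    }
    let lib_path = out_dir.join("import_lib.lib");
    fs::write(&lib_path, import_lib)?;
    Ok(Some(lib_path))
}
```

Returning the path instead of pushing it onto a command directly keeps the file-system side effect separate from command construction, which makes the empty-slice case easy to check.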
+ +--- + +## Worked example: compiling for x86_64-unknown-linux-gnu + +The CLI receives `--target x86_64-unknown-linux-gnu` (or defaults to +`host_target()`). The pipeline from target lookup to linked `.osdi`: + +**Step 1 — Target lookup.** + +```rust +let target = Target::search("x86_64-unknown-linux-gnu").unwrap(); +``` + +`load_specific` matches the string in the `supported_targets!` table and calls +`x86_64_unknown_linux::target()`: + +```rust +pub fn target() -> Target { + let mut base = linux_base::opts(); + base.cpu = "x86-64".into(); + base.pre_link_args + .entry(LinkerFlavor::Ld) + .or_default() + .extend(["-m".into(), "elf_x86_64".into()]); + + Target { + llvm_target: "x86_64-unknown-linux-gnu".into(), + pointer_width: 64, + arch: "x86_64".into(), + data_layout: "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128".into(), + options: TargetOptions { + is_builtin: true, + ..base + }, + } +} +``` + +**Step 2 — LLVM target machine.** + +`mir_llvm` calls: + +```rust +llvm::TargetMachine::create( + &target.llvm_target, // "x86_64-unknown-linux-gnu" + &target.options.cpu, // "x86-64" + &target.options.features, // "" + &target.data_layout, +) +``` + +LLVM selects the x86-64 backend, enables SSE2 (implied by `x86-64` CPU +class), and sets up the data layout for 64-bit LP64 Linux. The generated +object file will be ELF64 with System V AMD64 ABI calling conventions. + +**Step 3 — Link-argument assembly.** + +The `linker` crate reads: + +```rust +target.options.pre_link_args[LinkerFlavor::Ld] +// → ["--no-add-needed", "--hash-style=gnu", "-m", "elf_x86_64"] +``` + +These are prepended to the `ld` command before the input object and output +`-o` flag. `post_link_args` is empty for Linux targets. + +**Step 4 — Linker invocation.** + +``` +ld --no-add-needed --hash-style=gnu -m elf_x86_64 + -shared + resistor.o + -o resistor.osdi +``` + +The result is an ELF shared object exporting the OSDI entry points +(`osdi_descriptors`, `osdi_init`, etc.) 
and containing all the compiled +analog equations. + +--- + +## Key design decisions + +**Fixed target table, not JSON.** Some compilers (notably rustc) support +user-defined target JSON files. OpenVAF restricts itself to the seven built-in +targets. The trade-off is inflexibility in exchange for correctness guarantees: +every `data_layout` string and every link argument combination has been tested +against the LLVM version shipped with OpenVAF. A user-supplied target could +produce silently malformed output if the data layout were wrong. + +**`import_lib` is `&'static [u8]`, not `Option`.** Embedding the +bytes as a `&'static [u8]` ensures the compiler binary is truly self-contained +on Windows. The alternative — pointing at a file on disk — would mean that +moving the binary breaks Windows cross-compilation, and that CI machines need +a Windows SDK installed regardless of the target platform. + +**`BTreeMap` for link args.** `LinkArgs = BTreeMap>` +preserves insertion order within each flavor's argument list (because +`BTreeMap` iterates keys in sorted order and `Vec` preserves push order). +This matters because linkers treat argument order as significant: `-m elf_x86_64` +must precede the input objects, and `msvcrt.lib` must appear after them. + +**`host_triple()` normalizes at compile time.** The special cases for +`windows-gnu` and `apple` variants are resolved in a single `if`-chain baked +into the binary. There is no runtime detection, no registry query, and no +environment variable consulted at startup. If the normalization ever needs to +change, recompiling the compiler is the correct response. + +**Separate `pre_link_args` and `post_link_args`.** Linker arguments that must +appear before the input objects (emulation mode `-m`, output format flags) are +kept strictly separate from arguments that must appear after (runtime libraries +`msvcrt.lib`). 
This prevents the base modules from accidentally placing a +post-link library before the object file, which would cause undefined symbol +errors on link. From f9f9f2ce6a21b702a9d55eff4d4ed04d922f3ab3 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 08:17:22 +0200 Subject: [PATCH 12/28] docs: add stdx crate INTERNALS --- docs/stdx/INTERNALS.md | 428 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 428 insertions(+) create mode 100644 docs/stdx/INTERNALS.md diff --git a/docs/stdx/INTERNALS.md b/docs/stdx/INTERNALS.md new file mode 100644 index 00000000..59822a1a --- /dev/null +++ b/docs/stdx/INTERNALS.md @@ -0,0 +1,428 @@ +# `stdx` — Standard Extensions + +**Location:** `lib/stdx/` +**Role:** A grab-bag of small utilities that are used across almost every crate +in OpenVAF but are too small to warrant their own library and too +OpenVAF-specific (or upstream-unavailable) to pull from a third-party crate. +It has no runtime dependencies. Every other crate in the workspace that needs +one of these utilities adds `stdx` to its `Cargo.toml`. 
+ +Cross-links: [mir INTERNALS](../mir/INTERNALS.md) · +[mir\_build INTERNALS](../mir_build/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Module map + +| File | Contents | +|------|----------| +| `src/lib.rs` | Top-level re-exports; CI flags; test helpers; `Upcast` trait | +| `src/ieee64.rs` | `Ieee64` — bit-exact f64 with C99 hex format | +| `src/packed_option.rs` | `ReservedValue` trait; `PackedOption` | +| `src/macros.rs` | Code-generation macros for index types, enums, and formatters | +| `src/iter.rs` | `zip` free function; `multiunzip` / `MultiUnzip` up to 12-tuples | +| `src/vec.rs` | `ensure_contains_elem`; `VecExtensions`; `SliceExntesions` | +| `src/pretty.rs` | `List` — configurable separator/prefix/postfix for error messages | + +--- + +## `Ieee64` — bit-exact floating-point + +```rust +#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)] +#[repr(transparent)] +pub struct Ieee64(u64); +``` + +`Ieee64` stores an IEEE 754 binary64 value as its raw bit pattern. The key +design choice is that it derives `Eq` and `Hash` directly on the `u64` bits, +not on the floating-point value. This makes `Ieee64` safe to use as a map key +and in `#[derive(Eq)]` structs — something `f64` cannot do because NaN ≠ NaN +under IEEE semantics. + +The MIR uses `Ieee64` as the payload for `F64` constant values in the `Value` +enum. Any time a `real` literal appears in Verilog-A source, it eventually +becomes an `Ieee64` constant in the MIR instruction stream. + +### Construction + +```rust +Ieee64::with_float(1.5) // from f64: calls f64::to_bits() +Ieee64::with_bits(0x3FF8000000000000) // from raw u64 +"0x1.8000000000000p0".parse::() // from hex string +``` + +### Display format + +`Ieee64` implements `Display` using `format_float(bits, w=11, t=52, …)`, which +produces the C99 `printf "%a"` hexadecimal floating-point format. 
The format +is: + +| Class | Example | +|-------|---------| +| Zero | `0.0` / `-0.0` | +| Normal | `0x1.8000000000000p0` (1.5) | +| Subnormal | `0x0.8000000000000p-1022` | +| Infinity | `+Inf` / `-Inf` | +| Quiet NaN | `+NaN` / `+NaN:0x1` (with payload) | +| Signaling NaN | `+sNaN:0x1` | + +The format is lossless: `parse_float` is the exact inverse of `format_float`. +Any `f64` bit pattern survives a round-trip through `Display` → `FromStr` +without changing a single bit. This is required for MIR serialization — the +compiler's test suite compares MIR text dumps byte-for-byte. + +### `parse_float` internals + +`parse_float` reads the hex significand and decimal exponent, normalizes the +significand so the implicit leading `1` bit is at position `t` (bit 52), then +reconstructs the biased exponent field. It rejects: + +- Decimal fractions (only `0.0` is allowed in decimal; everything else must be + `0x…`) +- Too many significand bits (would require rounding, which would be lossy) +- Exponents out of range (overflow → `"Magnitude too large"`, underflow → + `"Magnitude too small"`) +- Subnormal values where the shift would discard set bits (`"Subnormal + underflow"`) + +The tight rejection policy ensures the format remains a bijection. + +--- + +## `PackedOption` — zero-overhead optional index + +```rust +pub trait ReservedValue { + fn reserved_value() -> Self; + fn is_reserved_value(&self) -> bool; +} + +#[derive(Clone, Copy, …)] +#[repr(transparent)] +pub struct PackedOption(T); +``` + +`PackedOption` stores an optional `T` in the same space as `T` itself by +repurposing one value of `T` as the sentinel for `None`. For `u32`-backed +index types, the sentinel is `u32::MAX` — a value that can never be a valid +index into any realistic array. + +**Why not `Option`?** On a 64-bit machine, `Option` is 8 bytes because +the compiler needs a discriminant byte (padded to alignment). `PackedOption` +is 4 bytes. 
In data structures with millions of entries — like the MIR's +instruction and block tables — the difference is meaningful. The MIR's +`Phi` nodes use `PackedOption` to represent optional predecessor values, +and the SSA builder uses it for optional block predecessors. + +### The `impl_idx_from!` macro and `ReservedValue` + +The `impl_idx_from!` macro (in `macros.rs`) automatically implements +`ReservedValue` for any `u32`-backed index newtype: + +```rust +impl_idx_from!(Block(u32)); +// expands to, among other things: +impl ReservedValue for Block { + fn reserved_value() -> Self { Block(u32::MAX) } + fn is_reserved_value(&self) -> bool { self.0 == u32::MAX } +} +``` + +This means `PackedOption` works out of the box for any type declared +with `impl_idx_from!`. + +### API + +`PackedOption` mirrors the `Option` API closely: + +```rust +packed.is_some() // true if not the reserved value +packed.is_none() +packed.expand() // → Option +packed.map(|t| …) // → Option +packed.unwrap() // panics if None +packed.unwrap_unchecked() // noop in release, panics in debug +packed.take() // → Option, leaves None behind +``` + +Conversion in both directions is via `From`: + +```rust +let p: PackedOption = Some(b).into(); // From> +let p: PackedOption = b.into(); // From; debug-asserts not reserved +let o: Option = p.into(); // From> +``` + +--- + +## Macros + +### Index type macros + +These four macros are the most widely used items in `stdx`. Nearly every +entity type in the compiler — `Block`, `Value`, `Inst`, `Place`, `FileId`, +`LocalScopeId`, etc. — is declared as a `pub struct Foo(u32)` newtype and then +wired up with one of these macros. + +**`impl_idx_from!(Foo(u32))`** — generates: + +- `From for Foo` and `From for u32` +- `From for Foo` (with a debug bounds check) and `From for usize` +- `impl ReservedValue for Foo { reserved = Foo(u32::MAX) }` + +This is the most common macro call in the codebase. 
It gives the type full +numeric interoperability and plugs it into `PackedOption`. + +**`impl_idx_from_readonly!(Foo(u32))`** — generates only the `Foo → u32` and +`Foo → usize` conversions, not the reverse. Used for types where constructing +from a raw integer would be unsafe (e.g. handles into a validated table). + +**`impl_idx_math!(Foo(u32))`** — generates `Add`, `Sub`, `AddAssign`, +`SubAssign` for combinations of `Foo`, `u32`, and `usize`. Used for index +types that need arithmetic (e.g. advancing a cursor or computing an offset). + +**`impl_idx_math_from!(Foo(u32))`** — shorthand for +`impl_idx_from!` + `impl_idx_math!`. + +### Enum conversion macros + +**`impl_from!(A, B, C for MyEnum)`** — generates `From for MyEnum`, +`TryFrom for A`, and so on for each variant. The variants must be +tuple variants `MyEnum::A(A)`. Avoids writing the same boilerplate repeatedly +for sum types like HIR nodes. + +**`impl_from_typed!(Foo(FooType), Bar(BarType) for MyEnum)`** — same but for +variants where the inner type differs from the variant name. + +### Formatter macros + +**`impl_display!`**, **`impl_debug!`**, **`impl_debug_display!`** all delegate +to `impl_fmt!`, which generates a `fmt::Display` or `fmt::Debug` impl from a +match expression: + +```rust +impl_display! { + match MyError { + MyError::NotFound(name) => "symbol '{}' not found", name; + MyError::TypeMismatch => "type mismatch"; + } +} +``` + +This pattern is used throughout the diagnostics layer to keep error message +strings next to the variant they describe. + +### Utility macros + +**`format_to!($buf, "fmt {}", arg)`** — appends a formatted string to an +existing `String` using `fmt::Write`, avoiding a heap allocation compared to +`format!` followed by `push_str`. Used in the pretty-printer and in diagnostic +formatting where a message is built incrementally. + +**`eprintln!`** — wraps `std::eprintln!` but panics on CI (`IS_CI = true`) if +called. 
This ensures that debug `eprintln!` calls are never accidentally left +in the codebase and reach a CI run, where they would silently pass but +contaminate output. + +--- + +## `iter` — iterator utilities + +### `zip` + +```rust +pub fn zip(a: A, b: B) -> Zip<…> +``` + +A free function that calls `a.into_iter().zip(b)`. Exists because the method +form requires the left side to already be an iterator; the free-function form +accepts any `IntoIterator` on both sides, which is more ergonomic when zipping +slices or ranges. + +### `multiunzip` + +```rust +pub fn multiunzip(i: I) -> FromI +where I::IntoIter: MultiUnzip +``` + +`MultiUnzip` is implemented for iterators of 1- to 12-tuples. It consumes the +iterator and distributes each column into a separate `Default + Extend` +collection. This is the n-ary generalization of `Iterator::unzip`: + +```rust +let items = vec![(1u32, "a", true), (2, "b", false)]; +let (nums, strs, bools): (Vec, Vec<&str>, Vec) = multiunzip(items); +// nums = [1, 2], strs = ["a", "b"], bools = [true, false] +``` + +Used in the HIR and MIR where a single pass over a list needs to produce +multiple parallel output vectors simultaneously. + +--- + +## `vec` — slice and vector extensions + +### `ensure_contains_elem` + +```rust +pub fn ensure_contains_elem(vec: &mut Vec, elem: usize, fill_value: impl FnMut() -> T) +``` + +Grows `vec` until index `elem` is valid, filling new slots with `fill_value`. +Used when building a sparse mapping from indices to values: rather than +pre-allocating with a known bound, the builder calls `ensure_contains_elem` +on each insert and the vector grows on demand. 
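
The grow-on-demand behaviour of `ensure_contains_elem` can be approximated with `Vec::resize_with`. This is a sketch written from the description above, not a copy of the `stdx` source; the generic parameter is assumed to be a plain `T`.

```rust
/// Sketch: grow `vec` so that index `elem` is valid, filling new slots by
/// calling `fill_value`. Existing elements are never touched or truncated.
pub fn ensure_contains_elem<T>(vec: &mut Vec<T>, elem: usize, fill_value: impl FnMut() -> T) {
    let min_len = elem + 1;
    if vec.len() < min_len {
        vec.resize_with(min_len, fill_value);
    }
}
```

A typical call site then looks like `ensure_contains_elem(&mut map, idx, || None); map[idx] = Some(value);`, which is the sparse-mapping pattern the section describes.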
+ +### `SliceExntesions` + +The trait (note the typo in the source — `Exntesions` — preserved here +verbatim) extends `[T]` with methods for obtaining mutable references to +multiple distinct elements simultaneously: + +```rust +pub trait SliceExntesions { + fn pick2_mut(&mut self, a: usize, b: usize) -> (&mut T, &mut T); + fn pick3_mut(&mut self, a: usize, b: usize, c: usize) -> (&mut T, &mut T, &mut T); + fn pick_n_mut(&mut self, indices: [usize; N]) -> [&mut T; N]; +} +``` + +Rust's borrow checker rejects `(&mut v[a], &mut v[b])` because it cannot prove +`a ≠ b` at compile time. These methods assert uniqueness at runtime and then +use `unsafe` raw-pointer arithmetic to produce the independent mutable +references. The pattern is needed in the MIR builder and optimizer when two +blocks or two instruction slots must be mutated in the same operation. + +`pick_n_mut` is the general form: it takes a const-generic array of `N` +indices, validates all-distinct and in-bounds, then casts the raw pointer array +to a reference array via `ptr::read`. + +--- + +## `pretty` — list formatting for diagnostics + +```rust +pub struct List { + pub data: C, + pub separator: &'static str, // default: ", " + pub final_separator: &'static str, // default: " or " + pub prefix: &'static str, // default: "" + pub postfix: &'static str, // default: "" + pub break_after: u32, // default: 10 + pub first_break_after: u32, // default: 5 +} +``` + +`List` wraps any collection and formats it as a human-readable list in +`Display`. The `final_separator` is used between the last two items, so a list +of three elements `[x, y, z]` formats as `"x, y or z"` — the Oxford-comma-free +form used in most of OpenVAF's error messages. + +`break_after` and `first_break_after` insert newlines when the list is long. +The first line breaks after `first_break_after` items; subsequent lines break +every `break_after` items. 
This prevents a single run-on line when reporting +"expected one of: keyword1, keyword2, …, keywordN." + +The builder methods allow construction without writing struct literals: + +```rust +List::new(&["module", "discipline", "nature"]) + .with_final_separator(" or ") + .surround("`") +// → "`module`, `discipline` or `nature`" + +List::path(&["std", "constants"]) +// separator = ".", final_separator = "." +// → "std.constants" +``` + +`List::path` is a convenience constructor for dot-separated identifier paths +used in diagnostic messages that reference qualified names. + +--- + +## Top-level utilities (`lib.rs`) + +### CI and test-gating constants + +```rust +pub const IS_CI: bool = option_env!("CI").is_some(); +pub const SKIP_HOST_TESTS: bool = option_env!("CI").is_some() && cfg!(windows); +``` + +Both are compile-time constants baked from environment variables. `IS_CI` is +`true` when the `CI` environment variable is set (standard for GitHub Actions, +CircleCI, etc.). `SKIP_HOST_TESTS` is `true` on CI Windows builds, where +host-specific tests that require the local toolchain installed are suppressed. + +### Test helper functions + +```rust +pub fn skip_slow_tests() -> bool +pub fn ignore_dev_tests(_: &T) -> bool +pub fn ignore_slow_tests(_: &T) -> bool +pub fn ignore_never(_: &T) -> bool +``` + +These are passed as the `ignore` argument in `#[rstest]` or similar test +harnesses. `skip_slow_tests` also creates a sentinel file at +`target/.slow_tests_cookie` when slow tests do run, which CI scripts can check +to confirm the full test suite was exercised. + +`project_root()` walks up the directory tree from `CARGO_MANIFEST_DIR` until it +finds a directory containing `README.md`, which it treats as the workspace root. +This makes `openvaf_test_data("resistor.va")` and `integration_test_dir("osdi")` +work regardless of which crate's tests call them. 
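
The upward walk performed by `project_root()` can be sketched with `Path::ancestors`. The function below is a hypothetical stand-in: it takes the starting directory as a parameter (the real helper is described as starting from `CARGO_MANIFEST_DIR`) and returns `None` instead of panicking when no `README.md` is found.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: walk up from `start` and return the first directory
/// (including `start` itself) that contains a `README.md` file.
pub fn find_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| dir.join("README.md").is_file())
        .map(Path::to_path_buf)
}
```

One consequence of this heuristic is that a `README.md` placed inside a crate directory would be found before the workspace one, so the marker file has to be unique along the path.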
+ +### `Upcast` + +```rust +pub trait Upcast { + fn upcast(&self) -> &T; +} +``` + +A manual coercion trait for upcasting a concrete database type to one of its +supertrait objects. Salsa databases implement several query group traits; code +that holds a `&dyn DatabaseA` sometimes needs a `&dyn DatabaseB` without going +through the concrete type. `Upcast` on the database struct +provides that bridge. This pattern appears in `basedb` and `hir_def` where the +Salsa database is split into layered query groups. + +--- + +## Key design decisions + +**`Ieee64` uses `u64` equality, not `f64` equality.** Two `Ieee64` values are +equal iff their bit patterns are identical. This means `+NaN ≠ -NaN` (which +have different sign bits) and `+0.0 ≠ -0.0` (different sign bits). These +distinctions matter in the MIR constant table, where two bit-identical +constants should share the same `Value` slot but two bit-different constants +should not, regardless of IEEE numeric equivalence. + +**`PackedOption` uses `MAX` as the sentinel.** Using the all-ones bit pattern +(`u32::MAX = 0xFFFF_FFFF`) is safe because no realistic data structure has +4 billion entries. The sentinel is statically known, so `is_reserved_value` +compiles to a single comparison instruction with no branching. + +**`SliceExntesions` uses unsafe raw pointers.** There is no safe way to obtain +two `&mut T` from the same slice in stable Rust without using +`split_at_mut` (which only works for disjoint prefix/suffix). The +`pick2_mut`/`pick3_mut`/`pick_n_mut` implementations use `assert_ne!` to +establish the disjointness invariant at runtime, then perform the cast. The +bounds check (`assert!(idx < len)`) ensures the pointers are valid; the +distinctness check ensures they are non-aliasing. 
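
For the two-index case specifically, the same contract (distinct, in-bounds indices yield two independent mutable references) can also be expressed in safe code by splitting the slice at the larger index. The sketch below is an illustration of the invariant only; it is not the `stdx` implementation, which the text above describes as using raw pointers.

```rust
/// Safe sketch of a `pick2_mut`-style helper built on `split_at_mut`.
/// Panics if `a == b` or if either index is out of bounds.
pub fn pick2_mut<T>(slice: &mut [T], a: usize, b: usize) -> (&mut T, &mut T) {
    assert_ne!(a, b, "indices must be distinct");
    if a < b {
        // `lo` covers [0, b) and contains `a`; `hi` starts at `b`.
        let (lo, hi) = slice.split_at_mut(b);
        (&mut lo[a], &mut hi[0])
    } else {
        // `lo` covers [0, a) and contains `b`; `hi` starts at `a`.
        let (lo, hi) = slice.split_at_mut(a);
        (&mut hi[0], &mut lo[b])
    }
}
```

The split approach does not generalize cleanly to an arbitrary const-generic set of indices, which is one plausible reason a raw-pointer implementation would be preferred for `pick_n_mut`.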
+ +**`List` separates `separator` from `final_separator`.** Using `" or "` only +before the last element and `", "` before all others produces grammatically +correct English lists ("x, y or z") without always-Oxford-comma or +always-plain-comma variants. This is a deliberate UX decision: OpenVAF's error +messages should read as natural English prose, not as machine-formatted lists. + +**No runtime dependencies.** `stdx` has an empty `[dependencies]` section in +`Cargo.toml`. This keeps compile times for all downstream crates low and +ensures `stdx` can be used as a foundation without pulling in any transitive +dependency graph. From 6bad04f856e8831f56af03870dcaf6ab7b31eb4b Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 08:21:22 +0200 Subject: [PATCH 13/28] docs: add arena crate INTERNALS --- docs/arena/INTERNALS.md | 264 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 264 insertions(+) create mode 100644 docs/arena/INTERNALS.md diff --git a/docs/arena/INTERNALS.md b/docs/arena/INTERNALS.md new file mode 100644 index 00000000..0bc34299 --- /dev/null +++ b/docs/arena/INTERNALS.md @@ -0,0 +1,264 @@ +# `arena` — Typed arena allocation + +**Location:** `lib/arena/` +**Role:** Provides `Idx`, a typed `u32` handle into a contiguous allocation +arena, plus `IdxRange` for contiguous slices of handles, and type aliases +`Arena` and `ArenaMap` that wrap `typed_index_collections::TiVec`. +The entire crate is one file and has no dependencies beyond `typed-index-collections`. + +Cross-links: [stdx INTERNALS](../stdx/INTERNALS.md) · +[hir\_def INTERNALS](../hir_def/INTERNALS.md) · +[basedb INTERNALS](../basedb/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate relationships + +``` +arena (lib/arena/) + └─► hir_def (ExprId, StmtId, ItemTreeId, ErasedAstId, …) + └─► basedb (AstIdMap) + └─► mir_autodiff +``` + +`arena` is a leaf library. It is not Salsa-aware and has no runtime logic +beyond allocation. 
`typed-index-collections` provides the underlying +`TiVec` which enforces that `arena[idx]` only compiles when `idx` +is of the correct `Idx` type. + +--- + +## `Idx` — typed arena handle + +```rust +pub type RawIdx = u32; + +pub struct Idx<T> { + raw: RawIdx, + _ty: PhantomData<fn() -> T>, +} +``` + +`Idx` is a 4-byte handle that indexes into an `Arena`. The +`PhantomData<fn() -> T>` phantom makes `Idx<Expr>` and `Idx<Stmt>` +distinct types at compile time: you cannot accidentally use an expression +index to index a statement arena. + +The phantom uses the function-pointer form `fn() -> T` (rather than `*const T` +or `T` directly) so that `Idx` is: + +- **covariant** in `T`, as `fn() -> T` itself is; nothing here relies on + subtyping between index types, so the variance is benign +- **`Send + Sync`** regardless of whether `T` is, because no `T` is actually + stored inside `Idx` + +### Derived traits + +All standard traits are implemented manually to avoid requiring `T: Clone`, +`T: Eq`, etc. — since `Idx` does not contain a `T`, it can always be +`Copy`, `Clone`, `Eq`, `Hash`, `Ord`, and `PartialOrd` without any bound on +`T`: + +```rust +impl<T> Copy for Idx<T> {} +impl<T> PartialEq for Idx<T> { /* compares raw */ } +impl<T> Hash for Idx<T> { /* hashes raw */ } +impl<T> Ord for Idx<T> { /* orders by raw */ } +``` + +Ordering is by insertion index, which is also allocation order. This means +sorting a set of `Idx` values puts them in the same order they were +allocated, which corresponds to their syntactic order in the source file for +HIR arenas built during lowering. + +### Debug format + +```rust +impl<T> fmt::Debug for Idx<T> { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + // strips the module path, leaving only the type name + let type_name = /* last component of std::any::type_name::<T>() */; + write!(f, "Idx::<{}>({})", type_name, self.raw) + } +} +``` + +A debug-printed `Idx<Expr>` looks like `Idx::<Expr>(3)`. 
The type name is +stripped to its last `:`-separated component so that fully qualified names +like `hir_def::expr::Expr` become just `Expr`. + +### Conversions + +```rust +impl From for Idx // wraps a raw u32 +impl From> for RawIdx // unwraps +impl From for Idx // debug_assert: usize < u32::MAX +impl From> for usize +``` + +The `usize` conversion truncates on 64-bit platforms (debug-asserted). In +practice no arena grows past `u32::MAX` entries — the debug assertion catches +any overflow during development. + +--- + +## `IdxRange` — contiguous range of handles + +```rust +pub struct IdxRange { + range: Range, + _p: PhantomData, +} +``` + +`IdxRange` represents a half-open range `[start, end)` of `Idx` values +in insertion order. It is the standard way to record that a parent node owns +a contiguous sequence of children — for example, a `Module` in `hir_def` stores +its items as an `IdxRange` into the `ItemTree` arena. + +### Constructors + +```rust +IdxRange::new(a..b) // half-open: [a, b) +IdxRange::new_inclusive(a..=b) // closed: [a, b] → stored as [a, b+1) +``` + +The inclusive form simply adds 1 to the end index before storing it, keeping +the internal representation always half-open. + +### `cover` + +```rust +pub fn cover(&self, other: &Self) -> Self { + Self { range: self.range.start..other.range.end, _p: PhantomData } +} +``` + +Merges two ranges into one that spans from the start of `self` to the end of +`other`, including any gap between them. Used when a parent node's span is +determined by the union of its children's ranges. + +### Iteration + +`IdxRange` implements `Iterator>` and +`DoubleEndedIterator` by delegating to the underlying `Range`. This means +you can write: + +```rust +for expr_id in body.exprs_range { + process(&body.exprs[expr_id]); +} +``` + +without converting to `usize` by hand. 
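The handle and range types described above can be sketched in a few lines. This is a simplified illustration, not the crate's code: `from_raw`, `into_raw` and the marker type `Expr` are hypothetical names introduced for the example (the crate itself exposes `From` conversions).

```rust
use std::marker::PhantomData;
use std::ops::Range;

/// Marker type standing in for an HIR node type (hypothetical).
pub struct Expr;

/// Typed handle: a raw `u32` plus a zero-sized type tag.
pub struct Idx<T> {
    raw: u32,
    _ty: PhantomData<fn() -> T>,
}

impl<T> Idx<T> {
    pub fn from_raw(raw: u32) -> Self {
        Idx { raw, _ty: PhantomData }
    }
    pub fn into_raw(self) -> u32 {
        self.raw
    }
}

/// Half-open range `[start, end)` of handles, stored as raw indices.
pub struct IdxRange<T> {
    range: Range<u32>,
    _p: PhantomData<T>,
}

impl<T> IdxRange<T> {
    pub fn new(range: Range<Idx<T>>) -> Self {
        IdxRange { range: range.start.raw..range.end.raw, _p: PhantomData }
    }

    /// Spans from the start of `self` to the end of `other`, gap included.
    pub fn cover(&self, other: &Self) -> Self {
        IdxRange { range: self.range.start..other.range.end, _p: PhantomData }
    }
}

// Iteration delegates to the underlying `Range<u32>` and re-wraps each index.
impl<T> Iterator for IdxRange<T> {
    type Item = Idx<T>;
    fn next(&mut self) -> Option<Idx<T>> {
        self.range.next().map(Idx::from_raw)
    }
}

impl<T> DoubleEndedIterator for IdxRange<T> {
    fn next_back(&mut self) -> Option<Idx<T>> {
        self.range.next_back().map(Idx::from_raw)
    }
}
```

Because the type tag is only a `PhantomData`, an `IdxRange<Expr>` cannot be passed where an `IdxRange` over another node type is expected, even though both hold the same `Range<u32>`.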
+ +--- + +## `Arena` and `ArenaMap` + +```rust +pub type Arena = TiVec, T>; +pub type ArenaMap = TiVec, T>; +``` + +Both are thin type aliases over `TiVec` from the `typed-index-collections` +crate. `TiVec` is a `Vec` newtype that only accepts `I` as an index +type — `arena[Idx::(3)]` compiles but `arena[Idx::(3)]` does not +even if both are `u32` at runtime. + +**`Arena`** is used when the value type and the index type are the same +concept: allocating `Expr` values produces `Idx` handles. + +**`ArenaMap`** is used for secondary tables keyed by the same index type +but storing a different value. For example, `hir_def::Body` uses: + +```rust +pub exprs: Arena // primary: stores Expr nodes +pub expr_map_back: ArenaMap>> + // secondary: maps each ExprId back to its AST node +``` + +Both tables are indexed by `Idx` (`ExprId`). `Arena` holds the +HIR `Expr` values; `ArenaMap` holds per-expression metadata without +needing a separate hash map. + +### Allocation + +`TiVec::push` returns the new index: + +```rust +let expr_id: Idx = arena.push(expr_value); +``` + +Indices are stable: `TiVec` never re-orders its elements, so an `Idx` +obtained at allocation time remains valid for the lifetime of the arena. + +--- + +## Worked example: HIR body lowering + +`hir_def::Body` is the data structure that holds the HIR for a single analog +block. During lowering from AST to HIR, expressions and statements are +allocated into two arenas: + +```rust +pub struct Body { + pub exprs: Arena, // all Expr nodes + pub stmts: Arena, // all Stmt nodes + pub expr_map_back: ArenaMap>>, + pub stmt_map_back: ArenaMap>>, + // … +} +``` + +When the lowering pass encounters `V(p, n) / R` in an analog block, it +allocates three `Expr` nodes: + +1. `Expr::BranchAccess(V, Branch(p,n))` → `ExprId(0)` (`Idx::(0)`) +2. `Expr::Var(R)` → `ExprId(1)` +3. `Expr::BinaryOp { op: Div, lhs: ExprId(0), rhs: ExprId(1) }` → `ExprId(2)` + +Each allocation is a single `push` onto `body.exprs`. 
The children of node 3 +are stored as `ExprId` values — not pointers — so the entire tree is a flat +`Vec` with pointer-free cross-references. The `expr_map_back` arena +is grown in lockstep, so `body.expr_map_back[ExprId(2)]` points back to the +`/` AST node. + +The contribution statement `I(p,n) <+ expr` is similarly a `Stmt` containing +the `ExprId` of the right-hand side. Because both arenas grow contiguously and +indices are stable, the `Body` can be serialized, compared, and cached by +Salsa without copying or pointer-chasing. + +--- + +## Key design decisions + +**`PhantomData T>` over `PhantomData`.** Using `fn() -> T` instead +of `*const T` or `T` means `Idx` is `Send + Sync` unconditionally — no +marker impl needed, no `unsafe` — because the phantom carries no actual data +and imposes no threading restrictions. It also avoids implying that `Idx` +owns a `T` (which would make `Drop` checking relevant). + +**Flat arena, not pointer graph.** Storing `ExprId` values instead of `Box` +or `&Expr` inside each node gives: +- Cache-friendly linear access when iterating all expressions +- Trivial serialization (a `Vec` is just bytes + length) +- Salsa-compatible equality (`Vec` implements `Eq` without unsafe) +- No heap allocation per node (one `Vec` for the whole tree) + +The trade-off is that random access by `ExprId` is `O(1)` but the arena +cannot shrink or free individual elements — it is append-only. For HIR, which +is built once per query and then read-only, this is exactly the right trade-off. + +**Type aliases over a newtype.** `Arena` and `ArenaMap` are type +aliases rather than newtypes. This means all `TiVec` methods (`push`, `iter`, +`len`, `Index`, `IndexMut`, …) are available directly without boilerplate +delegation, while the type-level index check from `TiVec` still prevents +cross-arena indexing errors. 
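The flat-arena layout discussed above can be illustrated with a plain `Vec`. The sketch below is an assumption-laden toy: `ExprId`, `Expr`, `Body`, `alloc`, `eval` and `lower_v_over_r` are simplified stand-ins, not the `hir_def` types, and a plain `Vec` replaces `TiVec`.

```rust
/// Index into `Body::exprs` (toy version of an `Idx`-style handle).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ExprId(u32);

/// Toy expression node; children are indices, not pointers.
#[derive(Debug, PartialEq)]
enum Expr {
    BranchVoltage,                   // stands in for `V(p, n)`
    Param(f64),                      // stands in for `R` with a fixed value
    Div { lhs: ExprId, rhs: ExprId },
}

#[derive(Default)]
struct Body {
    exprs: Vec<Expr>,
}

impl Body {
    /// Append-only allocation: the new index is the old length.
    fn alloc(&mut self, e: Expr) -> ExprId {
        let id = ExprId(self.exprs.len() as u32);
        self.exprs.push(e);
        id
    }

    /// Evaluate the tree rooted at `id` for a given branch voltage `v`.
    fn eval(&self, id: ExprId, v: f64) -> f64 {
        match &self.exprs[id.0 as usize] {
            Expr::BranchVoltage => v,
            Expr::Param(r) => *r,
            Expr::Div { lhs, rhs } => self.eval(*lhs, v) / self.eval(*rhs, v),
        }
    }
}

/// Lowers `V(p, n) / R` into three consecutive arena slots.
fn lower_v_over_r(body: &mut Body, r: f64) -> ExprId {
    let v = body.alloc(Expr::BranchVoltage);
    let r = body.alloc(Expr::Param(r));
    body.alloc(Expr::Div { lhs: v, rhs: r })
}
```

The root lands in slot 2 and refers to slots 0 and 1 by index, so the whole tree lives in one contiguous allocation.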
+ +**`IdxRange` stores `Range<u32>`, not `(Idx<T>, Idx<T>)`.** Using the raw +`u32` range avoids storing two phantom-data fields and keeps `IdxRange` an +8-byte value that is cheap to clone. `Range<u32>` is `Clone` but not `Copy`, +so `IdxRange` cannot be `Copy` either. The `Iterator` impl delegates +directly to `Range::next`, which the compiler can typically reduce to a +compare and an increment. From 3e3f2fd4df494eb4696d82a531d96dad43a5fe2a Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 08:46:42 +0200 Subject: [PATCH 14/28] docs: add bitset crate INTERNALS --- docs/bitset/INTERNALS.md | 451 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 451 insertions(+) create mode 100644 docs/bitset/INTERNALS.md diff --git a/docs/bitset/INTERNALS.md b/docs/bitset/INTERNALS.md new file mode 100644 index 00000000..880868b2 --- /dev/null +++ b/docs/bitset/INTERNALS.md @@ -0,0 +1,451 @@ +# `bitset` — Typed bit-set collections + +**Location:** `lib/bitset/` +**Role:** A family of bit-set types parameterized over typed index newtypes. +Every type in this crate stores sets of values drawn from a domain of `T` +where `T: Into<usize>`. The crate provides dense, sparse, hybrid, and matrix +variants, all sharing a common word type (`u64`) and iteration strategy. + +Cross-links: [stdx INTERNALS](../stdx/INTERNALS.md) · +[arena INTERNALS](../arena/INTERNALS.md) · +[mir\_opt INTERNALS](../mir_opt/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate relationships + +``` +bitset (lib/bitset/) + deps: arrayvec, stdx + └─► mir_build (SSABuilder uses BitSet) + └─► mir (BitSet used in liveness data structures) + └─► mir_opt (BitSet, HybridBitSet, SparseBitMatrix in SCCP, DCE, GVN, ADCE) +``` + +`bitset` depends on `stdx` for `zip` and the `SliceExntesions`/`VecExtensions` +traits (used in matrix row-manipulation). It depends on `arrayvec` for the +stack-allocated backing of `SparseBitSet`. 
+ +--- + +## Module map + +| File | Types | +|------|-------| +| `src/lib.rs` | `BitSet`, `GrowableBitSet`, `BitIter`; `bitwise` helper; operation traits | +| `src/sparse.rs` | `SparseBitSet` (module-private); `SPARSE_MAX = 8` | +| `src/hybrid.rs` | `HybridBitSet`, `HybridIter` | +| `src/matrix.rs` | `BitMatrix`, `SparseBitMatrix`, `GrowableSparseBitMatrix` | + +--- + +## Shared foundations + +### Word type + +```rust +pub type Word = u64; +pub const WORD_BYTES: usize = 8; +pub const WORD_BITS: usize = 64; +``` + +All bitsets store bits packed into `u64` words. Using 64-bit words means the +compiler can emit AVX2 vectorized loops when processing multiple words at once, +and `u64::trailing_zeros()` gives the position of the lowest set bit in a +single instruction. + +### `word_index_and_mask` + +```rust +fn word_index_and_mask>(elem: T) -> (usize, Word) { + let elem = elem.into(); + (elem / WORD_BITS, 1 << (elem % WORD_BITS)) +} +``` + +Every insert, remove, and contains call bottoms out here. The division by 64 +and the bit mask are both powers-of-two operations, so the compiler emits a +right-shift and a single-bit-set with no division instruction. + +### `bitwise` — vectorizable word-level operation + +```rust +fn bitwise(out_vec: &mut [Word], in_vec: &[Word], op: Op) -> bool +where Op: Fn(Word, Word) -> Word +{ + let mut changed = 0; + for (out, in_) in out_vec.iter_mut().zip(in_vec) { + let old = *out; + let new = op(old, *in_); + *out = new; + changed |= old ^ new; // accumulate changed bits + } + changed != 0 +} +``` + +The return value (whether any bit changed) is computed via `|= old ^ new` +rather than `changed |= (old != new)`. The comment in the source explains why: +the `!=` form forces the compiler to materialize a boolean on each iteration, +preventing vectorization. 
Accumulating the XOR of all changed words into a +single `u64` allows the auto-vectorizer to process multiple words in parallel +with SIMD instructions and check the result once at the end. + +This function powers `BitSet::union`, `intersect`, and `subtract`. + +--- + +## `BitSet` — dense fixed-size bitset + +```rust +pub struct BitSet { + domain_size: usize, + words: Vec, + marker: PhantomData, +} +``` + +`BitSet` is the primary type. It represents a subset of `{0, …, domain_size-1}` +with one bit per element, packed into `ceil(domain_size / 64)` words. The +`domain_size` is fixed at construction; use `GrowableBitSet` if you need +runtime growth. + +### Construction + +```rust +BitSet::new_empty(domain_size) // all zeros +BitSet::new_filled(domain_size) // all ones, then clear_excess_bits() +``` + +`new_filled` allocates all-ones words and then calls `clear_excess_bits` to +zero the unused high bits of the last word. Without this step, operations like +`superset` and `count` would read garbage bits past the domain boundary. + +### Core operations + +| Method | Semantics | Returns | +|--------|-----------|---------| +| `insert(elem)` | sets the bit | whether it changed | +| `remove(elem)` | clears the bit | whether it changed | +| `contains(elem)` | tests the bit | `bool` | +| `union(other)` | `self \|= other` | whether it changed | +| `subtract(other)` | `self &= !other` | whether it changed | +| `intersect(other)` | `self &= other` | whether it changed | +| `insert_all()` | fill then `clear_excess_bits` | — | +| `inverse()` | `!self` then `clear_excess_bits` | — | +| `superset(other)` | `(self & other) == other` | `bool` | +| `is_empty()` | all words zero | `bool` | +| `count()` | popcount sum | `usize` | +| `ensure(min)` | grow if needed, new words = 0 | — | + +`union`, `subtract`, and `intersect` are defined against the operation traits +(`UnionIntoBitSet`, `SubtractFromBitSet`) rather than directly on +`&BitSet`. 
This means `bitset.union(&hybrid)` works equally well, because +`HybridBitSet` and `SparseBitSet` also implement `UnionIntoBitSet`. + +### `BitIter` — trailing-zeros iteration + +```rust +pub struct BitIter<'a, T> { + word: Word, // current word with visited bits cleared + offset: usize, // bit offset of current word + iter: slice::Iter<'a, Word>, + marker: PhantomData, +} +``` + +The iterator uses `u64::trailing_zeros()` to find the next set bit in the +current word in O(1), then clears it with `word ^= 1 << bit_pos`. When `word` +reaches zero, it advances to the next word. + +The initial state uses a degenerate offset trick: + +```rust +word: 0, +offset: usize::MAX - (WORD_BITS - 1), +``` + +On the first `next()` call, `word == 0` so the iterator immediately advances +to the first real word, setting `offset = offset.wrapping_add(WORD_BITS)`. +Because `usize::MAX - 63 + 64 == 0` (wrapping), `offset` becomes 0 correctly +without needing a separate "started" flag. + +Elements are yielded in ascending order, which is a natural consequence of +iterating words in order and using `trailing_zeros` within each word. + +--- + +## `GrowableBitSet` — auto-growing dense bitset + +```rust +pub struct GrowableBitSet { + bit_set: BitSet, +} +``` + +A thin wrapper around `BitSet` that calls `bit_set.ensure(elem.into() + 1)` +before every `insert` and `remove`. `contains` silently returns `false` for +indices beyond the current domain rather than panicking. Used when the final +domain size is not known at construction time. + +--- + +## `SparseBitSet` — small-element sparse set + +```rust +pub(super) const SPARSE_MAX: usize = 8; + +pub struct SparseBitSet { + pub(super) elems: ArrayVec, +} +``` + +`SparseBitSet` stores up to 8 elements as a sorted `ArrayVec` on the stack — +no heap allocation. Elements are kept in ascending order; `insert` performs +a linear scan to find the insertion point and shifts the remaining elements +right. 
Because `SPARSE_MAX` is 8, this scan touches at most 7 comparisons. + +`SparseBitSet` is deliberately module-private: callers only use it through +`HybridBitSet`. It exists as a named type only so the matrix types can +reference `SPARSE_MAX` and so that `SparseBitSet` can implement the operation +traits for cross-type operations. + +--- + +## `HybridBitSet` — adaptive sparse/dense bitset + +```rust +pub enum HybridBitSet { + Sparse(SparseBitSet), + Dense(BitSet), +} +``` + +`HybridBitSet` starts as `Sparse` (the default constructor returns +`Sparse(SparseBitSet::new_empty())`, which is a constant function and requires +no allocation). When the 9th distinct element would be inserted, it converts +to `Dense`: + +```rust +HybridBitSet::Sparse(sparse) => { + // full and element is not already present → promote + let mut dense = sparse.to_dense(domain_size); + let changed = dense.insert(elem); + *self = HybridBitSet::Dense(dense); + changed +} +``` + +`Dense` never demotes back to `Sparse`, even if elements are removed. This +one-way transition avoids the complexity of tracking when to shrink and keeps +the remove path trivial. + +The `domain_size` is passed at insert time rather than stored in the +`HybridBitSet` itself. This is deliberate: in contexts like `SparseBitMatrix` +where rows share a column count, storing `domain_size` per row would double +the overhead of uninstantiated rows. + +### `clone_from` optimization + +`HybridBitSet::clone_from` avoids re-allocating the dense `Vec` when +cloning a dense set into another dense set: + +```rust +if let HybridBitSet::Dense(dst) = self { + match source { + HybridBitSet::Sparse(src) => { dst.clear(); dst.reverse_union_sparse(src); } + HybridBitSet::Dense(src) => dst.clone_from(src), + } +} else { + *self = source.clone() +} +``` + +The `Dense(dst) ← Dense(src)` path calls `Vec::clone_from` which reuses the +heap allocation if capacities are compatible, avoiding a `malloc`/`free` pair +per dataflow iteration. 
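The one-way sparse-to-dense promotion described for `HybridBitSet` can be sketched as below. This is a simplified model under stated assumptions: `Hybrid` is a hypothetical name, elements are plain `usize` rather than typed indices, and a heap `Vec` stands in for the stack-allocated `ArrayVec` of the real sparse variant.

```rust
const SPARSE_MAX: usize = 8;
const WORD_BITS: usize = 64;

/// Simplified hybrid set (hypothetical name, untyped elements).
enum Hybrid {
    /// Sorted, at most `SPARSE_MAX` elements.
    Sparse(Vec<usize>),
    /// One bit per element of the domain.
    Dense(Vec<u64>),
}

impl Hybrid {
    fn new() -> Self {
        Hybrid::Sparse(Vec::new())
    }

    /// Inserts `elem`; returns whether the set changed.
    /// `domain_size` is supplied by the caller, not stored in the set.
    fn insert(&mut self, elem: usize, domain_size: usize) -> bool {
        assert!(elem < domain_size);
        match self {
            Hybrid::Sparse(elems) => match elems.binary_search(&elem) {
                Ok(_) => false, // already present
                Err(pos) if elems.len() < SPARSE_MAX => {
                    elems.insert(pos, elem); // keep ascending order
                    true
                }
                Err(_) => {
                    // Full and `elem` is new: promote to the dense form.
                    let mut words = vec![0u64; (domain_size + WORD_BITS - 1) / WORD_BITS];
                    for &e in elems.iter() {
                        words[e / WORD_BITS] |= 1u64 << (e % WORD_BITS);
                    }
                    words[elem / WORD_BITS] |= 1u64 << (elem % WORD_BITS);
                    *self = Hybrid::Dense(words);
                    true
                }
            },
            Hybrid::Dense(words) => {
                let (w, mask) = (elem / WORD_BITS, 1u64 << (elem % WORD_BITS));
                let old = words[w];
                words[w] |= mask;
                old != words[w]
            }
        }
    }

    fn contains(&self, elem: usize) -> bool {
        match self {
            Hybrid::Sparse(elems) => elems.binary_search(&elem).is_ok(),
            Hybrid::Dense(words) => words
                .get(elem / WORD_BITS)
                .map_or(false, |w| (w & (1u64 << (elem % WORD_BITS))) != 0),
        }
    }

    fn is_dense(&self) -> bool {
        matches!(self, Hybrid::Dense(_))
    }
}
```

In this sketch the first eight distinct inserts stay in the sorted vector, and the ninth triggers the promotion; re-inserting an existing element never does.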
+ +### `reverse_union_sparse` + +When unioning a `Dense` set with a `Sparse` set, the result must be `Dense`. +Rather than re-checking every bit of both sets, `reverse_union_sparse` walks +the sorted sparse elements, groups them by word, and ORs each group into the +corresponding dense word in a single pass. It simultaneously detects whether +any bits existed in the dense set that were not in the sparse set — this is +the "reverse" part — which is needed to report whether the union changed +anything without doing a second pass. + +--- + +## Operation traits + +The crate defines four operation traits so that operations can be dispatched +across the set types without monomorphizing every combination by hand: + +```rust +trait UnionIntoBitSet { fn union_into(&self, other: &mut BitSet) -> bool; } +trait SubtractFromBitSet { fn subtract_from(&self, other: &mut BitSet) -> bool; } +trait UnionIntoHybridBitSet { fn union_into(&self, other: &mut HybridBitSet, domain_size: usize) -> bool; } +trait SubtractFromHybridBitSet { fn subtract_from_h(&self, other: &mut HybridBitSet) -> bool; } +``` + +`FullBitSetOperations` is a blanket supertrait that collects all four; +types that implement all of them (currently `BitSet` and `HybridBitSet`) +satisfy it automatically. 
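The trait-based dispatch can be illustrated with one operation trait and two set representations. This is a reduced sketch, not the crate's API: `Dense`, `Sparse` and the unparameterized `UnionInto` trait are hypothetical simplifications, and the `bitwise` helper follows the XOR-accumulation form shown earlier in this document.

```rust
type Word = u64;
const WORD_BITS: usize = 64;

/// Word-level combine; reports whether any output bit changed.
fn bitwise(out: &mut [Word], input: &[Word], op: impl Fn(Word, Word) -> Word) -> bool {
    let mut changed: Word = 0;
    for (o, i) in out.iter_mut().zip(input) {
        let old = *o;
        *o = op(old, *i);
        changed |= old ^ *o; // accumulate changed bits, test once at the end
    }
    changed != 0
}

/// Simplified dense set (hypothetical, untyped).
struct Dense {
    words: Vec<Word>,
}

/// Simplified sparse set (hypothetical, untyped).
struct Sparse {
    elems: Vec<usize>,
}

/// One operation trait: "union me into a dense set".
trait UnionInto {
    fn union_into(&self, other: &mut Dense) -> bool;
}

impl UnionInto for Dense {
    fn union_into(&self, other: &mut Dense) -> bool {
        bitwise(&mut other.words, &self.words, |a, b| a | b)
    }
}

impl UnionInto for Sparse {
    fn union_into(&self, other: &mut Dense) -> bool {
        let mut changed = false;
        for &e in &self.elems {
            changed |= other.insert(e); // insert each sparse element
        }
        changed
    }
}

impl Dense {
    fn new_empty(domain_size: usize) -> Self {
        Dense { words: vec![0; (domain_size + WORD_BITS - 1) / WORD_BITS] }
    }

    fn insert(&mut self, elem: usize) -> bool {
        let (w, mask) = (elem / WORD_BITS, 1u64 << (elem % WORD_BITS));
        let old = self.words[w];
        self.words[w] |= mask;
        old != self.words[w]
    }

    /// Accepts any right-hand side that knows how to union itself in.
    fn union(&mut self, other: &impl UnionInto) -> bool {
        other.union_into(self)
    }
}
```

The call site `dense.union(&x)` stays the same whichever representation `x` uses; the per-type strategy lives in the trait impl.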
+ +The implementation matrix for `union_into`: + +| `self` type | `other` type | strategy | +|-------------|-------------|----------| +| `BitSet` | `BitSet` | `bitwise(|)` | +| `SparseBitSet` | `BitSet` | insert each sparse elem | +| `HybridBitSet::Sparse` | `BitSet` | insert each sparse elem | +| `HybridBitSet::Dense` | `BitSet` | `bitwise(|)` | +| `HybridBitSet` | `HybridBitSet::Sparse` | element-by-element, may promote | +| `HybridBitSet::Dense` | `HybridBitSet::Sparse` | `reverse_union_sparse` | +| `BitSet` | `HybridBitSet::Sparse` | clone dense + `reverse_union_sparse` | +| anything | `HybridBitSet::Dense` | `bitwise(|)` | + +--- + +## `BitMatrix` — dense 2D bit matrix + +```rust +pub struct BitMatrix { + num_rows: usize, + num_columns: usize, + words: Vec, + marker: PhantomData<(R, C)>, +} +``` + +`BitMatrix` lays all rows contiguously in a single flat `Vec`. Row `r` +occupies words `[r * words_per_row, (r+1) * words_per_row)` where +`words_per_row = ceil(num_columns / 64)`. + +Key operations: + +- **`insert(row, col)`** / **`contains(row, col)`** — single word access via + `range(row)` + `word_index_and_mask(col)`. +- **`union_rows(read, write)`** — `words[write] |= words[read]` wordwise, + using `pick2_mut` to obtain two mutable slices from the same backing `Vec`. +- **`union_row_with(with: &BitSet, write: R)`** — OR a standalone bitset + into one row, used when seeding a dataflow analysis with initial live-out sets. +- **`intersect_rows(r1, r2)`** — returns the `Vec` of columns set in both rows. +- **`iter(row)`** — yields columns set in a row via `BitIter`. + +`BitMatrix` is used in the `mir_opt` ADCE pass to represent the post-dominance +frontier relation: `matrix[block]` is the set of blocks on whose +post-dominance frontier `block` lies. 
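The flat row layout of `BitMatrix` can be sketched as follows. This is a simplified model with untyped `usize` rows and columns; it also swaps the `pick2_mut` helper for the safe `split_at_mut`, which is enough here because two different rows never overlap. `iter_row` is a hypothetical name that returns a `Vec` instead of a `BitIter`.

```rust
use std::ops::Range;

const WORD_BITS: usize = 64;

/// Simplified dense bit matrix (untyped indices).
struct BitMatrix {
    num_columns: usize,
    words: Vec<u64>,
}

impl BitMatrix {
    fn new(num_rows: usize, num_columns: usize) -> Self {
        let words_per_row = (num_columns + WORD_BITS - 1) / WORD_BITS;
        BitMatrix { num_columns, words: vec![0; num_rows * words_per_row] }
    }

    fn words_per_row(&self) -> usize {
        (self.num_columns + WORD_BITS - 1) / WORD_BITS
    }

    /// Word range occupied by `row` in the flat vector.
    fn range(&self, row: usize) -> Range<usize> {
        let w = self.words_per_row();
        row * w..(row + 1) * w
    }

    fn insert(&mut self, row: usize, col: usize) -> bool {
        assert!(col < self.num_columns);
        let idx = self.range(row).start + col / WORD_BITS;
        let old = self.words[idx];
        self.words[idx] |= 1u64 << (col % WORD_BITS);
        old != self.words[idx]
    }

    fn contains(&self, row: usize, col: usize) -> bool {
        assert!(col < self.num_columns);
        let idx = self.range(row).start + col / WORD_BITS;
        (self.words[idx] & (1u64 << (col % WORD_BITS))) != 0
    }

    /// `words[write] |= words[read]`; returns whether `write` changed.
    fn union_rows(&mut self, read: usize, write: usize) -> bool {
        if read == write {
            return false;
        }
        let w = self.words_per_row();
        // Split the flat vector so the two rows land in different halves.
        let (src, dst) = if read < write {
            let (lo, hi) = self.words.split_at_mut(write * w);
            (&lo[read * w..(read + 1) * w], &mut hi[..w])
        } else {
            let (lo, hi) = self.words.split_at_mut(read * w);
            (&hi[..w], &mut lo[write * w..(write + 1) * w])
        };
        let mut changed = 0u64;
        for (d, s) in dst.iter_mut().zip(src) {
            let old = *d;
            *d |= *s;
            changed |= old ^ *d;
        }
        changed != 0
    }

    /// Columns set in `row`, ascending, found via `trailing_zeros`.
    fn iter_row(&self, row: usize) -> Vec<usize> {
        let mut cols = Vec::new();
        for (i, &w) in self.words[self.range(row)].iter().enumerate() {
            let mut word = w;
            while word != 0 {
                let bit = word.trailing_zeros() as usize;
                cols.push(i * WORD_BITS + bit);
                word ^= 1u64 << bit; // clear the bit just visited
            }
        }
        cols
    }
}
```

With 100 columns each row takes two words, so row `r` occupies `words[2r..2r + 2]` in this sketch.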
+ +--- + +## `SparseBitMatrix` — per-row hybrid matrix + +```rust +pub struct SparseBitMatrix { + num_columns: usize, + num_rows: usize, + rows: Vec>, + _row_ty: PhantomData R>, +} +``` + +Unlike `BitMatrix`, `SparseBitMatrix` does not pre-allocate storage for all +rows. The `rows` vector is grown lazily: `ensure_row(r)` calls +`ensure_contains_elem(r.into(), HybridBitSet::new_empty)` from `stdx::vec`, +filling skipped rows with empty sets. Rows that are never written cost +nothing beyond the `HybridBitSet::new_empty()` constant (a `Sparse` variant +with an empty `ArrayVec` — no heap allocation). + +Each row is a `HybridBitSet`, so sparsely populated rows remain as +`ArrayVec` and only dense rows upgrade to a `Vec`. + +Additional operations over `BitMatrix`: + +- **`union_rows(read, write)`** — uses `pick2_mut` on `self.rows` to union two + row `HybridBitSet` values without cloning. +- **`inverse()`** — produces a transposed `SparseBitMatrix` by iterating + all set bits and inserting the (column, row) pair into the result. +- **`row(r)`** → `Option<&HybridBitSet>` — `None` for uninstantiated rows. + +### `GrowableSparseBitMatrix` + +A newtype over `SparseBitMatrix` that additionally grows `num_columns` when an +inserted column index exceeds the current bound, and calls `dense.ensure(…)` on +any dense rows before inserting so they don't panic on out-of-range bits. +Used in the `mir_opt` taint propagation pass where the domain can grow as new +values are discovered. + +--- + +## Worked example: SCCP feasible-edge tracking + +The SCCP pass in `mir_opt` maintains a `TiVec` of +feasible successor sets, where `Successors` is effectively a small `BitSet`. +For the live-block check it uses `BitSet`: + +``` +domain_size = number of basic blocks in the function (e.g. 
12) +words = [0u64; 1] // one word covers 64 blocks +``` + +When the SCCP solver decides block 5 is reachable, it calls: + +```rust +feasible.insert(Block(5)); +// word_index = 5/64 = 0, mask = 1 << 5 = 0b100000 +// words[0] |= 0b100000 → changed = true +``` + +The worklist then iterates over `feasible.iter()` using `BitIter`: + +``` +initial: word=0, offset=usize::MAX-63 +next(): word==0 → advance; word=0b100000, offset=0 +trailing_zeros(0b100000)=5 → yield Block(5); word ^= 0b100000 → word=0 +next(): word==0 → advance; no more words → None +``` + +Block 5 is the only element yielded, in O(words_count) time regardless of +domain size. + +--- + +## Key design decisions + +**`u64` word type throughout.** Using `u64` rather than `usize` or `u32` +fixes the word width to 64 bits on all platforms. This makes the data layout +and `WORD_BITS` constant predictable across 32-bit and 64-bit targets and lets +the compiler use 64-bit SIMD lanes without platform-specific code. + +**Separate `sparse.rs` from `hybrid.rs`.** `SparseBitSet` is module-private +(`pub(super)`); it exists to give `HybridBitSet` a named inner type and to +centralize the sorted-`ArrayVec` logic. Callers that want a small-set +representation use `HybridBitSet` directly. This keeps the public API simple +while allowing the implementation to be split across files. + +**`SPARSE_MAX = 8`.** Eight is enough to represent a basic block's typical +number of live definitions at any given program point without heap allocation. +For the GVN equivalence classes stored as `HybridBitSet`, most classes +stay sparse; only large equivalence groups (e.g. all definitions of a common +subexpression in a loop) become dense. + +**One-way Sparse → Dense transition.** Once a `HybridBitSet` promotes to +`Dense`, it never demotes. The savings from avoiding an unnecessary dense +representation are small compared to the complexity of tracking when +to shrink. 
The optimization instead focuses on the common case: sets that stay +sparse through their entire lifetime. + +**`changed |= old ^ new` over `changed |= old != new`.** This is the critical +micro-optimization that enables auto-vectorization of `union`, `subtract`, and +`intersect`. A boolean check (`!=`) materializes a 1-byte value per iteration; +the bitwise XOR accumulation produces a word-width value that the SIMD loop +can reduce at the end with a single comparison. + +**`SparseBitMatrix` stores `num_rows` separately from `rows.len()`.** The +`rows` vector is shorter than `num_rows` when trailing rows have never been +set. `num_rows` is the declared bound; `rows.len()` is the high-water mark of +rows that have been instantiated. This allows the matrix to be declared with +its full logical size without paying for uninstantiated rows. From 522a82c3c10a08a8c551537cab2d76ac384665e8 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 09:32:47 +0200 Subject: [PATCH 15/28] docs: add bforest crate INTERNALS --- docs/bforest/INTERNALS.md | 437 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 437 insertions(+) create mode 100644 docs/bforest/INTERNALS.md diff --git a/docs/bforest/INTERNALS.md b/docs/bforest/INTERNALS.md new file mode 100644 index 00000000..848cf1f1 --- /dev/null +++ b/docs/bforest/INTERNALS.md @@ -0,0 +1,437 @@ +# `bforest` — B+-tree forest + +**Location:** `lib/bforest/` +**Origin:** A vendored fork of `cranelift_bforest`, modified to add features +needed by OpenVAF (notably `Map::merge` and `Map::insert_sorted`). +**Role:** A family of ordered map and set types that share a single node pool. +The design optimizes for many small trees (one per basic block, one per live +variable, etc.) rather than one large tree, and for keys and values that are +small copyable types (typically `u32` newtypes). 
+ +Cross-links: [stdx INTERNALS](../stdx/INTERNALS.md) · +[mir\_build INTERNALS](../mir_build/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Why not `BTreeMap`? + +The crate's own README states this plainly: **these are not faster general-purpose +data structures**. The trade-offs are different: + +| Property | `std::collections::BTreeMap` | `bforest::Map` | +|----------|----------------|---------------| +| Empty tree size | 24 bytes on 64-bit targets (root pointer, height, length) | 4 bytes (`PackedOption<Node>`) | +| Clear N trees | O(N × tree_size) | O(1) for the whole forest | +| Key ordering | `Ord` on the key type | external `Comparator` object | +| Keys/values | any `T` | `Copy`, optimized for 32-bit | +| Allocation | per-tree | pooled across the whole forest | + +The last row is the main win. When the MIR builder maintains a use-def map for +each of several hundred blocks, clearing the entire set of maps between queries +is O(1): `MapForest::clear()` calls `NodePool::clear()` which is a single +`Vec::clear()`. + +--- + +## Module map + +| File | Contents | +|------|----------| +| `src/lib.rs` | Constants, `Comparator` trait, `Forest` trait, `Node`, `SetValue`, helpers | +| `src/node.rs` | `NodeData`, `SplitOff`, `Removed`; all node-level operations | +| `src/pool.rs` | `NodePool` — allocation, free list, tree freeing | +| `src/path.rs` | `Path` — root-to-leaf cursor; find/insert/remove | +| `src/map.rs` | `MapForest`, `Map`, `MapCursor`, `MapIter` | +| `src/set.rs` | `SetForest`, `Set`, `SetCursor`, `SetIter`, `RevSetIter` | + +--- + +## Constants and shared types + +```rust +const INNER_SIZE: usize = 8; // branching factor of inner nodes +const MAX_PATH: usize = 16; // maximum tree height (never reached) +``` + +`INNER_SIZE = 8` is chosen so that an inner node occupies exactly one 64-byte +cache line when keys and node references are 32 bits each: + +``` +Inner node: u8 size + [u32; 7] keys + [u32; 8] trees = 1 + 28 + 32 = 61 bytes +``` + +(Padding rounds to 64 bytes.) 
A map leaf node likewise fits in 64 bytes: + +``` +Leaf node (map): u8 size + [K; 7] + [V; 7] = 1 + 28 + 28 = 57 bytes +``` + +A set leaf has no value, so it can store more keys: + +``` +Leaf node (set): u8 size + [K; 15] + [(); 15] = 1 + 60 + 0 = 61 bytes +``` + +`MAX_PATH = 16` is a worst-case bound. With branching factor 4 (the minimum, +when all inner nodes are half-full), a tree holding 2³² entries would need +log₄(2³²) = 16 levels. In practice, OpenVAF trees have far fewer entries and +at most 3–4 levels. + +### `Node(u32)` — node reference + +```rust +struct Node(u32); +impl_idx_from!(Node(u32)); +``` + +A 32-bit index into `NodePool::nodes`. `impl_idx_from!` from `stdx` gives +bidirectional conversions with `u32` and `usize`, and implements `ReservedValue` +with `u32::MAX` as the sentinel — enabling `PackedOption`. + +### `Comparator` — context-bearing key comparison + +```rust +pub trait Comparator { + fn cmp(&self, a: K, b: K) -> Ordering; + fn search(&self, k: K, s: &[K]) -> Result { + s.binary_search_by(|x| self.cmp(*x, k)) + } +} + +impl Comparator for () { … } // trivial impl +``` + +Keys do not need to implement `Ord` themselves. The comparator is an external +object passed to every mutating operation. This allows keys to be small opaque +index types (e.g. `Value(u32)`) that derive their ordering from a separate +table rather than from their numeric value. + +### `Forest` trait — associated array types + +```rust +trait Forest { + type Key: Copy; + type Value: Copy; + type LeafKeys: Copy + BorrowMut<[Self::Key]>; + type LeafValues: Copy + BorrowMut<[Self::Value]>; + fn splat_key(key: Self::Key) -> Self::LeafKeys; + fn splat_value(value: Self::Value) -> Self::LeafValues; +} +``` + +`Forest` is an internal seam. The two concrete implementations are: + +- **`MapTypes`**: `LeafKeys = [K; 7]`, `LeafValues = [V; 7]` +- **`SetTypes`**: `LeafKeys = [K; 15]`, `LeafValues = [SetValue; 15]` + +`SetValue` is a zero-sized type `struct SetValue()`. 
Because `[SetValue; 15]` +is zero bytes, the set leaf holds 15 keys in the same space a map leaf uses for +7 key-value pairs. + +`splat_key` and `splat_value` initialize a freshly allocated array by +replicating a single value across all slots. This sidesteps the need for a +`Default` bound on `K` or `V` — the first entry is duplicated into every slot +so the array is fully initialized without a sentinel value. + +--- + +## `NodeData` — the B+-tree node + +```rust +pub(super) enum NodeData { + Inner { + size: u8, // number of keys (sub-trees = size + 1) + keys: [F::Key; INNER_SIZE - 1], // [7] discriminating keys + tree: [Node; INNER_SIZE], // [8] child node references + }, + Leaf { + size: u8, + keys: F::LeafKeys, + vals: F::LeafValues, + }, + Free { next: Option }, // free-list link +} +``` + +`NodeData` is `Copy` — it is stored directly in a `Vec>` +without boxing. The manual `Copy` impl (not derived) avoids requiring +`F: Copy`. + +### Inner node invariant + +In an inner node with `size = s`, there are `s` keys and `s + 1` sub-trees. +Key `keys[i]` separates `tree[i]` from `tree[i+1]`: every key in the sub-tree +rooted at `tree[i]` is strictly less than `keys[i]`, and every key in +`tree[i+1]` is greater than or equal to `keys[i]`. + +### Operations + +**`split(insert_index)`** — splits a full node in half. The `insert_index` +hint biases the split point so that after the insertion is retried, both +halves are as even as possible: + +```rust +fn split_pos(len: usize, ins: usize) -> usize { + if ins <= len / 2 { len / 2 } else { (len + 1) / 2 } +} +``` + +Returns a `SplitOff` containing the new right-hand node's data and the +critical key that separates the two halves. + +**`balance(crit_key, rhs)`** — after an underflow, attempts to merge with the +right sibling. If the combined entry count fits in one node, everything moves +to the right node and the left node is left empty (returns `None`). 
Otherwise +entries are redistributed evenly (returns the new critical key for the right +node). + +**`Removed`** — the status returned after a removal: + +| Variant | Meaning | +|---------|---------| +| `Healthy` | Node still has ≥ half capacity | +| `Rightmost` | Rightmost entry removed; path needs to advance | +| `Underflow` | Below half capacity; must rebalance with sibling | +| `Empty` | No entries left; node must be removed from parent | + +--- + +## `NodePool` — the shared allocator + +```rust +pub(super) struct NodePool<F: Forest> { + nodes: Vec<NodeData<F>>, + freelist: Option<Node>, +} +``` + +`NodePool` is a flat `Vec` with an intrusive free list. Freed nodes are +overwritten with `NodeData::Free { next: freelist }` and the freelist head is +updated. Allocation checks the freelist first; if empty, it pushes a new entry +onto the vector. + +``` +alloc_node: + if freelist → pop head, overwrite with new data + else → vec.push(data), return len-1 as Node + +free_node: + nodes[node] = Free { next: freelist } + freelist = Some(node) +``` + +**`free_tree(node)`** — recursively frees an entire sub-tree. The recursion +depth is bounded by `MAX_PATH = 16`, so the call stack stays shallow. +The children of an inner node are freed before the inner node itself, so no +`Node` reference is followed after the slot it names has been put on the +free list. + +**`clear()`** — calls `Vec::clear()`. Because `NodeData` is `Copy` and so has +no destructor, this only resets the vector's length and does not visit the +allocated nodes. Because all trees share the pool, +a single `clear()` destroys every tree in the entire forest simultaneously. +This is the key property for clearing all block-local maps between MIR passes. + +--- + +## `Path` — root-to-leaf traversal state + +```rust +pub(super) struct Path<F: Forest> { + size: usize, + node: [Node; MAX_PATH], + entry: [u8; MAX_PATH], + unused: PhantomData<F>, +} +``` + +`Path` is `Copy` and stack-allocated. 
It records the path from the root +down to the current leaf: `node[0]` is always the root, `node[size-1]` is the +current leaf node, and `entry[l]` is the child index taken at level `l`. +`size = 0` is the canonical off-the-end position. + +### `find(key, root, pool, comp)` + +Walks from the root to the leaf: + +1. At each inner node, binary-search the key array. If `key` is found at + position `i`, follow `tree[i+1]` (the `>=` branch). If not found, follow + `tree[i]` (the `<` branch). +2. At the leaf, binary-search again. If found, record the position and return + `Some(value)`. If not found, record the insertion position and return `None`. + +After `find`, the path points either at the found entry or at the position +where the key would be inserted to maintain sorted order. + +### `insert(key, value, pool)` + +Attempts `try_leaf_insert` (in-place shift). If the leaf is full, calls +`split_and_insert`: + +``` +split_and_insert (bottom-up loop): + for level from leaf to root: + split current node → lhs (current) + rhs (new) + determine which half the insert position falls in → update path + insert into the not-full half + if parent had room → insert rhs into parent; done + // reached level 0 without finding room → allocate a new root + new_root = Inner(orig_root, crit_key, rhs_node) + path.size += 1; prepend root to path arrays +``` + +The height of the tree grows by 1 only when the root itself is split. + +### `remove(pool)` + +Removes the entry at the current position: + +1. Call `leaf_remove` → `Removed` status. +2. `Healthy`: done (update critical key if we removed the front entry). +3. `Rightmost`: advance path to next node. +4. `Underflow`: call `balance` with the right sibling. +5. `Empty`: recursively remove the now-empty node from its parent. + +After all rebalancing, prune the root if it has shrunk to a single sub-tree +(the single child becomes the new root). 
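The inner-node step of `find` maps directly onto the two outcomes of a binary search. The following is a minimal sketch, assuming `u32` keys and the standard `Ord` comparison in place of a `Comparator`; `child_index` is a hypothetical helper for illustration, not a function from the crate.

```rust
/// Choose which child slot of an inner node to descend into.
///
/// `keys` are the node's sorted discriminating keys; the node has
/// `keys.len() + 1` children. Hypothetical helper, not part of bforest.
fn child_index(keys: &[u32], key: u32) -> usize {
    match keys.binary_search(&key) {
        // Exact hit on keys[i]: the key lives in the `>=` sub-tree, tree[i + 1].
        Ok(i) => i + 1,
        // Miss: `i` is the insertion position, i.e. the index of the first
        // discriminating key greater than `key`, so the key belongs to tree[i].
        Err(i) => i,
    }
}
```

With discriminating keys `[10, 20, 30]`, a lookup of 15 descends into child 1 and a lookup of 30 descends into child 3.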
+ +--- + +## `Map` and `MapForest` + +```rust +pub struct Map { + root: PackedOption, // 4 bytes; None → empty + unused: PhantomData<(K, V)>, +} + +pub struct MapForest { + nodes: NodePool>, +} +``` + +`Map` is 4 bytes. An empty map is `root = PackedOption::None` (the +`u32::MAX` sentinel), which costs nothing beyond those 4 bytes. No heap +allocation happens until the first `insert`. + +All map operations take a `&MapForest` (or `&mut MapForest`) and a +`&dyn Comparator`: + +```rust +map.get(key, &forest, &comp) // → Option +map.get_or_less(key, &forest, &comp) // → Option<(K, V)> (closest ≤ key) +map.insert(key, value, &mut forest, &comp) // → Option (old value) +map.remove(key, &mut forest, &comp) // → bool +map.iter(&forest, &comp) // → MapIter (ascending) +map.cursor(&mut forest, &comp) // → MapCursor (positioned) +``` + +### `merge` and `insert_sorted` + +These are the OpenVAF additions over the upstream cranelift version. + +**`map.merge(other, &mut forest, &comp, f)`** — absorbs `other` into `self` +in a single sorted pass. `f(existing, incoming) -> V` resolves conflicts. +Implemented via `insert_sorted` with `other`'s iterator as the source. + +**`map.insert_sorted(next_src, &mut forest, &comp, f)`** — takes a closure +that yields `(K, T)` pairs in ascending key order (no duplicates) and merges +them into the map. The cursor advances through the destination map in lockstep, +avoiding redundant `find` calls. This is O(N + M) rather than O(M log N) for +a bulk insert of M entries into a map of size N. + +--- + +## `Set` and `SetForest` + +```rust +pub struct Set { + root: PackedOption, + unused: PhantomData, +} +``` + +`Set` is identical in structure to `Map` but uses `SetTypes` +which gives leaves 15 keys instead of 7. 
The public API mirrors `Map` minus +the value parameter: + +```rust +set.contains(key, &forest, &comp) +set.insert(key, &mut forest, &comp) // → bool (was absent) +set.remove(key, &mut forest, &comp) // → bool (was present) +set.clear(&mut forest) // frees the tree +set.retain(&mut forest, |k| bool) // filter in-place +set.iter(&forest, &comp) // → SetIter (ascending) +set.cursor(&mut forest, &comp) // → SetCursor +``` + +`RevSetIter` provides reverse (descending) iteration by calling `prev` on +the path repeatedly. + +--- + +## Worked example: block-local use-def map in `mir_build` + +The SSA builder in `mir_build` tracks, for each basic block and each `Place` +variable, the `Value` that was last written. This is a map from `Place → Value`. +Because a block can define only a few variables before being sealed, the typical +tree has 1–5 entries and never overflows a single leaf node. + +Using `bforest`: + +```rust +let mut forest: MapForest<Place, Value> = MapForest::new(); +let mut block_map: Map<Place, Value> = Map::new(); +// costs 4 bytes; no allocation + +// Record a write to Place(3) in this block +block_map.insert(Place(3), Value(7), &mut forest, &()); +// first insert: allocates one leaf node in forest.nodes + +// Look up the last writer of Place(3) +let val = block_map.get(Place(3), &forest, &()); +// → Some(Value(7)), path created on stack, no allocation + +// At end of pass, clear all block maps in O(1) +forest.clear(); +// every block map in the forest (say, 200 of them) is cleared by a single Vec::clear() +``` + +If the maps were `BTreeMap<Place, Value>` instead, clearing 200 maps would +touch every allocator-level node individually. With `MapForest`, the entire +pool is discarded in a single operation. + +--- + +## Key design decisions + +**`Map` is 4 bytes.** Storing only a `PackedOption<Node>` (one `u32`) +means that an array of 1000 empty maps costs about 4 KB, whereas 1000 empty +`Vec<_>` headers would cost 24 KB on a 64-bit target. The forest owns the memory; the maps are just handles.
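The 4-byte claim can be checked with a stand-in type. This is a sketch under stated assumptions: `Node`, `PackedNode` and `MapHandle` are illustrative names that mimic the layout described above (a packed `u32` root with `u32::MAX` as the "none" value plus a zero-sized marker); they are not the crate's own types.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Index of a node inside the shared pool (illustrative stand-in).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Node(u32);

/// Optional node packed into one `u32`; `u32::MAX` encodes "none".
#[derive(Clone, Copy)]
struct PackedNode(u32);

impl PackedNode {
    const NONE: PackedNode = PackedNode(u32::MAX);

    fn expand(self) -> Option<Node> {
        if self.0 == u32::MAX { None } else { Some(Node(self.0)) }
    }
}

/// A map handle: the packed root plus a zero-sized type marker.
struct MapHandle<K, V> {
    root: PackedNode,
    unused: PhantomData<(K, V)>,
}

impl<K, V> MapHandle<K, V> {
    /// An empty handle; nothing is allocated.
    fn new() -> Self {
        MapHandle { root: PackedNode::NONE, unused: PhantomData }
    }

    fn is_empty(&self) -> bool {
        self.root.expand().is_none()
    }
}

/// Size of one handle in bytes, independent of the key and value types.
fn handle_bytes<K, V>() -> usize {
    size_of::<MapHandle<K, V>>()
}
```

The key and value types do not affect the handle size because they only appear inside `PhantomData`.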
+ +**Free list via `NodeData::Free`.** Rather than maintaining a separate free +list allocation, freed nodes are overwritten with the `Free` variant and linked +through the same `Vec`. This keeps the pool compact and avoids a second +allocation per pool. + +**`splat_key`/`splat_value` avoid `Default`.** Filling a newly allocated leaf +array by replicating the first key avoids requiring `K: Default`. This matters +for index types that intentionally have no "zero" value, or whose `Default` +might be semantically inappropriate (e.g. `u32::MAX` is the reserved value for +`PackedOption`). + +**`Path` is `Copy` and stack-allocated.** Every tree operation creates a +`Path::default()` on the stack, uses it for the operation, and then drops it. +No heap allocation is needed for traversal state. The fixed-size `[Node; 16]` +and `[u8; 16]` arrays are sized for the theoretical maximum depth; in practice +only the first `size` entries are read, so the unused tail costs a little stack +space and nothing more. + +**Set leaves hold 15 keys, not 7.** Since `SetValue` is zero bytes, a set +leaf would waste about half its space if it used the same 7-entry layout as a map leaf. +The `SetTypes` implementation roughly doubles the key count, roughly halving the number of +leaf nodes and inner nodes needed for any given set size. + +**`INNER_SIZE = 8` targets one cache line.** The branch factor and array sizes +are chosen so that each node — inner or leaf — fits in exactly 64 bytes for +32-bit keys and values. Searching within a node is a cache-friendly linear or +binary scan of at most 7 keys (inner nodes and map leaves) or 15 keys (set +leaves), all in a single cache line.
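The 64-byte target can be sanity-checked by adding up the payload of each node variant. The sketch below assumes 4-byte child indices and uses hypothetical helper names; it computes payload bytes only and does not assert the actual `size_of` of the crate's `NodeData` enum, which also includes a discriminant and padding.

```rust
use std::mem::size_of;

const INNER_SIZE: usize = 8;

/// Payload of an inner node: one size byte, 7 keys, 8 child indices (u32 each).
fn inner_payload<K>() -> usize {
    size_of::<u8>() + (INNER_SIZE - 1) * size_of::<K>() + INNER_SIZE * size_of::<u32>()
}

/// Payload of a map leaf: one size byte, 7 keys, 7 values.
fn map_leaf_payload<K, V>() -> usize {
    size_of::<u8>() + 7 * size_of::<K>() + 7 * size_of::<V>()
}

/// Payload of a set leaf: one size byte and 15 keys; the values are zero-sized.
fn set_leaf_payload<K>() -> usize {
    size_of::<u8>() + 15 * size_of::<K>()
}
```

For 32-bit keys and values each total stays below 64, which leaves room for the enum discriminant and alignment padding within one cache line.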
From 40f46b088ad6de74e0f12c8ac53e485250355914 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 09:57:25 +0200 Subject: [PATCH 16/28] docs: add list_pool crate INTERNALS --- docs/list_pool/INTERNALS.md | 275 ++++++++++++++++++++++++++++++++++++ 1 file changed, 275 insertions(+) create mode 100644 docs/list_pool/INTERNALS.md diff --git a/docs/list_pool/INTERNALS.md b/docs/list_pool/INTERNALS.md new file mode 100644 index 00000000..5ae6291d --- /dev/null +++ b/docs/list_pool/INTERNALS.md @@ -0,0 +1,275 @@ +# `list_pool` — Pooled variable-length lists + +**Location:** `lib/list_pool/` +**Origin:** Vendored from `cranelift_entity`, adapted for OpenVAF. +**Role:** Provides `ListHandle` — a 4-byte handle to a variable-length list +of `T` values — backed by a shared `ListPool`. The design targets the same +niche as `bforest`: many small lists that share a pool and can be cleared as a +group in O(1). + +Cross-links: [bforest INTERNALS](../bforest/INTERNALS.md) · +[stdx INTERNALS](../stdx/INTERNALS.md) · +[mir INTERNALS](../mir/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Where it is used in OpenVAF + +The MIR defines two type aliases in `mir/src/instructions.rs`: + +```rust +pub type ValueList = list_pool::ListHandle; +pub type ValueListPool = list_pool::ListPool; + +pub type UseList = list_pool::ListHandle; +pub type UseListPool = list_pool::ListPool; +``` + +`ValueList` is embedded directly inside `InstructionData` variants that take a +variable number of operands (e.g. calls, phi nodes). Each `InstructionData` is +stored in a flat `TiVec` inside the `DataFlowGraph`. +Because `ListHandle` is 4 bytes, storing a variable operand list adds +the same cost as storing a single fixed operand — no pointer indirection, no +`Vec` header. + +`UseList` tracks the use-def chain for each `Value`: every time a value is +referenced as an operand, a `Use` record is added to the value's use list. 
+Both pools live on the `DataFlowGraph`: + +```rust +pub struct DataFlowGraph { + pub value_lists: ValueListPool, + pub use_lists: UseListPool, + // … +} +``` + +--- + +## Memory layout + +The pool is a single `Vec`: + +``` +data: [ … | len | e0 | e1 | … | eN-1 | (pad) | … ] + ↑ ↑ + | └─ ListHandle.index points here (element 0) + └─ length field is one slot before the elements +``` + +Each allocated block occupies a power-of-two number of slots: +- 1 slot for the length field +- up to `block_size - 1` slots for elements + +The length field stores the current count; unused trailing slots hold +`T::reserved_value()`. + +### Size classes + +```rust +fn sclass_size(sclass: SizeClass) -> usize { 4 << sclass } +// sclass 0 → 4 slots (1 length + up to 3 elements) +// sclass 1 → 8 slots (1 length + up to 7 elements) +// sclass 2 → 16 slots (1 length + up to 15 elements) +// … +``` + +`sclass_for_length(len)` computes the smallest size class that holds `len` +elements plus the length slot. The implementation uses a leading-zeros trick: + +```rust +fn sclass_for_length(len: usize) -> SizeClass { + 30 - (len as u32 | 3).leading_zeros() as SizeClass +} +``` + +`| 3` ensures that lengths 0–3 all map to size class 0 (block of 4). For +lengths 4–7 the result is size class 1, and so on, doubling with each class. + +--- + +## `ListHandle` + +```rust +pub struct ListHandle { + index: u32, // offset into pool.data of the first element; 0 = empty + unused: PhantomData, +} +``` + +`index == 0` is the sentinel for the empty list. It is safe because the pool +never allocates at offset 0: the first slot of any block is always the length +field, and `index` always points one past the length field (to element 0 of +the list). Therefore no valid list can have `index == 0`. + +`ListHandle` is 4 bytes. `Default` returns the empty list (index = 0) +without touching the pool. 
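The size-class arithmetic from the memory-layout section can be exercised in isolation. The two functions below are copied from the snippets above; the `SizeClass = u8` alias and the `capacity_for` helper are assumptions added for illustration.

```rust
// Assumed alias; the text above does not show how `SizeClass` is defined.
type SizeClass = u8;

/// Number of slots (length field included) in a block of the given class.
fn sclass_size(sclass: SizeClass) -> usize {
    4 << sclass
}

/// Smallest size class whose block holds `len` elements plus the length slot.
fn sclass_for_length(len: usize) -> SizeClass {
    30 - (len as u32 | 3).leading_zeros() as SizeClass
}

/// Hypothetical helper: how many elements fit in the block chosen for `len`.
fn capacity_for(len: usize) -> usize {
    sclass_size(sclass_for_length(len)) - 1
}
```

A list of 3 elements sits in a 4-slot block that is exactly full, while a list of 4 elements needs the 8-slot block and has room for 7.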
+ +### Key operations + +| Method | What it does | +|--------|-------------| +| `is_empty()` | check `index == 0` — no pool access | +| `len(&pool)` | read `pool.data[index - 1]` (the length slot) | +| `as_slice(&pool)` | `&pool.data[index .. index + len]` — a direct slice into the pool | +| `as_mut_slice(&mut pool)` | same, mutable | +| `get(i, &pool)` | `as_slice(pool).get(i).cloned()` | +| `first(&pool)` | `pool.data[index]` — one array access | +| `push(elem, &mut pool)` | append; reallocate to next size class if `new_len` is a power of two ≥ 4 | +| `extend(iter, &mut pool)` | bulk append; uses `grow` for exact-size iterators | +| `insert(i, elem, &mut pool)` | `push` then shift tail right | +| `remove(i, &mut pool)` | shift tail left, then `remove_last` | +| `swap_remove(i, &mut pool)` | swap with last, then `remove_last` | +| `truncate(new_len, &mut pool)` | may reallocate to a smaller class | +| `clear(&mut pool)` | free block to pool's free list; set `index = 0` | +| `take()` | `mem::take(self)` — leaves an empty handle, returns the old one | +| `deep_clone(&mut pool)` | allocate a fresh block and copy contents — does not alias | +| `to_pool(src, dst)` | copy list from one pool into another | + +### Cloning without deep-cloning + +`Clone` is derived for `ListHandle`. The clone has the same `index` as the +original — it is an alias. The comment in the source is explicit: *"Cloning an +entity list does not allocate new memory for the clone. It creates an alias of +the same memory."* Mutating one clone through `as_mut_slice` silently mutates +the other. This is intentional: in the MIR, `InstructionData` is cloned when +duplicating instructions, but the operand list is treated as copy-on-write by +the surrounding pass logic. + +Use `deep_clone` when an independent copy is needed. 
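The aliasing behaviour is easy to reproduce with a toy pool. This sketch is not the crate's implementation: `Handle` and `Pool` are hypothetical stand-ins that keep the "length slot followed by elements" layout but drop size classes and the free list.

```rust
/// Toy handle: the index of the first element, with 0 meaning "empty".
#[derive(Clone, Copy, Default)]
struct Handle {
    index: u32,
}

/// Toy pool: one flat vector of length slots and elements.
struct Pool {
    data: Vec<u32>,
}

impl Handle {
    fn from_slice(items: &[u32], pool: &mut Pool) -> Handle {
        pool.data.push(items.len() as u32); // length slot
        let index = pool.data.len() as u32; // first element slot, always >= 1
        pool.data.extend_from_slice(items);
        Handle { index }
    }

    fn as_slice<'a>(&self, pool: &'a Pool) -> &'a [u32] {
        if self.index == 0 {
            return &[];
        }
        let start = self.index as usize;
        let len = pool.data[start - 1] as usize;
        &pool.data[start..start + len]
    }

    fn as_mut_slice<'a>(&self, pool: &'a mut Pool) -> &'a mut [u32] {
        if self.index == 0 {
            return &mut [];
        }
        let start = self.index as usize;
        let len = pool.data[start - 1] as usize;
        &mut pool.data[start..start + len]
    }

    /// Copy the contents into a fresh block so the result does not alias.
    fn deep_clone(&self, pool: &mut Pool) -> Handle {
        let items = self.as_slice(pool).to_vec();
        Handle::from_slice(&items, pool)
    }
}

/// Returns the first element seen through an alias and through a deep clone
/// after the original has been mutated.
fn alias_demo() -> (u32, u32) {
    let mut pool = Pool { data: Vec::new() };
    let original = Handle::from_slice(&[5, 2, 9], &mut pool);
    let alias = original; // copies the index only
    let independent = original.deep_clone(&mut pool);
    original.as_mut_slice(&mut pool)[0] = 42;
    (alias.as_slice(&pool)[0], independent.as_slice(&pool)[0])
}
```

The alias observes the write because it shares the block; the deep clone keeps the old value.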
+ +--- + +## `ListPool` + +```rust +pub struct ListPool<T> { + data: Vec<T>, + free: Vec<usize>, // free-list heads, one per size class +} +``` + +### Allocation + +`alloc(sclass)` checks `free[sclass]` first. The free list for each size class +is an intrusive singly-linked list embedded in `data`. A free block looks like: + +``` +data[block] = T::from(0) // length = 0 signals "free" +data[block + 1] = T::from(next) // next free block + 1, or 0 for end-of-list +free[sclass] = block + 1 // head points at the "next" field +``` + +The `+ 1` offset means the free-list head is always the index of the `next` +field, not the block start. `0` terminates the list (a value of 0 at the +`next` field means no further free blocks). On allocation, the head is +replaced by the value it points to: + +```rust +self.free[sclass] = self.data[head].into(); // pop head +``` + +If the free list is empty, `data` is extended by `sclass_size(sclass)` slots +filled with `T::reserved_value()`. + +### Reallocation + +`realloc(block, from_sclass, to_sclass, elems_to_copy)` allocates a new block, +copies the first `elems_to_copy` slots (including the length slot at +position 0), and frees the old block. `mut_slices(block0, block1)` splits +`data` to get two non-overlapping mutable slices, allowing a direct +`copy_from_slice` without an intermediate buffer. + +### Growth trigger + +A list grows to the next size class exactly when the new length would be a +power of two ≥ 4, i.e. when `is_sclass_min_length(new_len)` is true: + +```rust +fn is_sclass_min_length(len: usize) -> bool { + len > 3 && len.is_power_of_two() +} +``` + +At that point the current block, which holds one length slot plus +`new_len - 1` elements, has no room for the new element. Conversely, a removal +shrinks the block to the next smaller size class when the length before the +removal is such a boundary, for example when going from 4 elements to 3.
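The trigger points can be listed by running the predicate over a range of lengths. In the sketch below `is_sclass_min_length` is taken from the snippet above; `grow_points` and `count_moves` are hypothetical helpers, and the shrink rule (test the length before the removal) is an assumption based on the `remove_last` trace in the worked example of this document.

```rust
fn is_sclass_min_length(len: usize) -> bool {
    len > 3 && len.is_power_of_two()
}

/// New lengths at which a push has to move the list to a bigger block.
fn grow_points(max_len: usize) -> Vec<usize> {
    (1..=max_len).filter(|&new_len| is_sclass_min_length(new_len)).collect()
}

/// Simulate `pushes` pushes followed by removals down to one element and
/// count how many times the block is moved in each direction.
fn count_moves(pushes: usize) -> (usize, usize) {
    let mut len = 0;
    let mut grows = 0;
    for _ in 0..pushes {
        len += 1;
        if is_sclass_min_length(len) {
            grows += 1; // the new length no longer fits the current block
        }
    }
    let mut shrinks = 0;
    while len > 1 {
        if is_sclass_min_length(len) {
            shrinks += 1; // assumed: the length before removal is tested
        }
        len -= 1;
    }
    (grows, shrinks)
}
```

Twenty pushes move the block three times (at lengths 4, 8 and 16), and removing the elements again moves it back three times.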
+ +### Clearing + +```rust +pub fn clear(&mut self) { + self.data.clear(); + self.free.clear(); +} +``` + +A single `Vec::clear()` invalidates every list in the pool simultaneously. +The pool keeps its heap allocation for reuse (the capacity remains). This is +the same O(1) global clear pattern as `bforest::MapForest::clear()`. + +--- + +## Worked example: MIR phi node operands + +A phi node in the MIR collects one incoming value per predecessor block. +For a block with three predecessors the phi's value list has three entries. +The `DataFlowGraph` stores it as: + +``` +dfg.value_lists.data: + index: 0 1 2 3 4 5 6 7 + data: [ … | 3 | V5 | V2 | V9 | • | … | … ] + ↑ ↑ + | └─ ListHandle.index = 2 + └─ length field (len = 3) +``` + +`phi.args.as_slice(&dfg.value_lists)` returns `&data[2..5]` = `[V5, V2, V9]` +directly — no allocation, no indirection beyond the one array index computation. + +When the CFG is simplified and a predecessor is eliminated, the pass calls +`phi.args.remove(1, &mut dfg.value_lists)`, which shifts `V9` left and calls +`remove_last(3, pool)`. Since `len - 1 = 2` does not trigger a size-class +shrink (`is_sclass_min_length(3)` is false), the length field is simply +decremented: + +``` +after remove(1): + data: [ … | 2 | V5 | V9 | • | • | … ] +``` + +The block stays in size class 0 (4 slots). No reallocation. + +--- + +## Key design decisions + +**`index == 0` as the empty sentinel.** The pool never occupies slot 0 (the +first slot of any allocated block is the length field, and `index` points one +past it). This gives the empty list a natural representation with no pool +access required for `is_empty()`. + +**Power-of-two size classes starting at 4.** The minimum block size of 4 +(1 length + 3 elements) means that even a one-element list wastes only 2 +slots, and the doubling strategy keeps the average waste below 50% while +bounding the number of reallocations over a sequence of pushes to O(log N). 
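The waste figures quoted for the size classes can be reproduced with a few lines of arithmetic. This sketch uses `next_power_of_two` instead of the leading-zeros formula shown earlier; the two should agree for every length, but that equivalence and the helper names `block_slots` and `wasted_slots` are assumptions made for illustration.

```rust
/// Slots in the block that holds `len` elements plus the length slot.
fn block_slots(len: usize) -> usize {
    let needed = len + 1; // elements plus the length field
    needed.next_power_of_two().max(4)
}

/// Slots in that block that hold neither the length nor an element.
fn wasted_slots(len: usize) -> usize {
    block_slots(len) - (len + 1)
}
```

A one-element list leaves 2 of its 4 slots unused, a 3-element list leaves none, and a 4-element list leaves 3 of 8.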
+ +**Cloning aliases the pool storage.** Since the caller controls the pool +lifetime, sharing storage between clones is safe as long as the pool is not +cleared, and the list is not grown, shrunk, or cleared through one handle, while +both handles are live. The design explicitly accepts this +trade-off to keep `ListHandle` `Copy`-compatible and 4 bytes wide. + +**`T: ReservedValue + Copy + Into<usize> + From<usize>`.** The `Into<usize>` +and `From<usize>` bounds allow the length field and free-list pointers to be +stored as `T` values in the same `Vec`, eliminating the need for a separate +metadata array. The `ReservedValue` bound provides the fill value for +uninitialized slots (`T::reserved_value()`). For `Value(u32)` and `Use(u32)`, +all four bounds are satisfied by the `impl_idx_from!` macro from `stdx`. + +**`T::from(0)` is used for the free-list terminator.** Index 0 is both the +"empty list" sentinel in `ListHandle` and the "end of free list" value in the +pool's `free` vector. Because a free-list link always stores `block + 1`, +the index of a block's `next` field, a real link is always at least 1, so +this dual use of 0 is safe. From 534b7026429c290810fbd37f970ab2e15d0413bf Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 10:16:49 +0200 Subject: [PATCH 17/28] docs: add typed_indexmap INTERNALS Covers TiMap (#[repr(transparent)] over IndexMap with typed index I), TiSet (over IndexSet with typed position K), ahash hasher rationale, insert_full/ensure/replace/iter_enumerated API, and usage in hir_lower, mir, sim_back, osdi, and mir_autodiff. Worked example traces an OSDI parameter table build and lookup.
Co-Authored-By: Claude Sonnet 4.6 --- docs/typed_indexmap/INTERNALS.md | 259 +++++++++++++++++++++++++++++++ 1 file changed, 259 insertions(+) create mode 100644 docs/typed_indexmap/INTERNALS.md diff --git a/docs/typed_indexmap/INTERNALS.md b/docs/typed_indexmap/INTERNALS.md new file mode 100644 index 00000000..f88d2c6d --- /dev/null +++ b/docs/typed_indexmap/INTERNALS.md @@ -0,0 +1,259 @@ +# `typed_indexmap` — Type-safe ordered maps and sets + +**Location:** `lib/typed_indexmap/` +**Role:** Thin wrappers around `indexmap`'s `IndexMap` and `IndexSet` that +enforce a typed position index at compile time. `TiMap` gives you an +`IndexMap` where the integer position returned by insertion has type `I` +rather than `usize`. `TiSet` does the same for `IndexSet` with +position type `K`. Both use `ahash::RandomState` as the hasher. + +Cross-links: [arena INTERNALS](../arena/INTERNALS.md) · +[stdx INTERNALS](../stdx/INTERNALS.md) · +[mir INTERNALS](../mir/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate relationships + +``` +typed_indexmap (lib/typed_indexmap/) + ├─► hir_lower (string interning, node-to-position maps) + ├─► mir (value/block position maps) + ├─► mir_autodiff + ├─► sim_back (signal and parameter tables) + └─► osdi (ABI parameter index tables) +``` + +The crate has two external dependencies: `indexmap 2.x` (insertion-order +hash map/set with O(1) position lookup) and `ahash 0.8` (non-cryptographic +hasher). It does not depend on `stdx` or `arena`. + +--- + +## `TiMap` + +```rust +#[repr(transparent)] +pub struct TiMap { + pub raw: IndexMap, + _marker: PhantomData I>, +} +``` + +`TiMap` is a zero-cost newtype over `IndexMap`. 
+The `#[repr(transparent)]` guarantees identical memory layout, which allows a +safe `AsRef` conversion implemented via an unsafe pointer cast: + +```rust +impl AsRef> for IndexMap { + fn as_ref(&self) -> &TiMap { + // SAFETY: repr(transparent) — same layout + unsafe { &*(self as *const _ as *const TiMap) } + } +} +``` + +The phantom `fn(I) -> I` (rather than just `I`) makes the marker invariant in +`I` and keeps `TiMap` `Send + Sync` regardless of whether `I` is, because no +`I` value is actually stored. + +### Key methods + +| Method | Signature | What it does | +|--------|-----------|-------------| +| `insert_full` | `(&mut self, K, V) -> (I, Option)` | Insert and return the typed position + displaced value | +| `next_index` | `(&self) -> I` | `self.raw.len().into()` — the index the next insert would get | +| `get_index` | `(&self, I) -> Option<(&K, &V)>` | Position → entry (delegates to `IndexMap::get_index`) | +| `index` | `(&self, &Q) -> Option` | Key → position (delegates to `IndexMap::get_index_of`) | +| `iter_enumerated` | `(&self) -> Iter<'_, I, K, V>` | Yields `(I, (&K, &V))` tuples | +| `keys` | `(&self) -> impl Iterator` | Delegates to `IndexMap::keys` | +| `Index` | `index(&self, I) -> &V` | Panicking position lookup | +| `IndexMut` | `index_mut(&mut self, I) -> &mut V` | Panicking mutable position lookup | + +`insert_full` is the primary constructor: it returns the typed index `I` for +the inserted entry so the caller can store it without an extra `index()` call. +If the key was already present, the old value is returned as `Some(old)` and +the existing position is returned — the map is not re-ordered. + +### Default and construction + +`TiMap::new()` creates an empty map. `TiMap::default()` delegates to +`IndexMap::default()` which uses `ahash::RandomState`. The public `raw` field +gives direct access to the underlying `IndexMap` for operations not wrapped by +`TiMap`. 
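The `Send + Sync` property of the `fn(I) -> I` marker can be verified at compile time. This is a small sketch with a hypothetical `Typed` wrapper (a `Vec<u8>` stands in for the real map); it only demonstrates the marker, not the `TiMap` API.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::rc::Rc;

/// Hypothetical wrapper with the same kind of marker as described above.
struct Typed<I> {
    raw: Vec<u8>,
    _marker: PhantomData<fn(I) -> I>,
}

/// Compiles only for types that are both `Send` and `Sync`.
fn assert_send_sync<T: Send + Sync>() {}

/// Returns the size of the marker after checking the auto traits.
fn marker_checks() -> usize {
    // `Rc<u8>` is neither Send nor Sync, yet the wrapper is both, because a
    // function pointer type is always Send + Sync and no `I` value is stored.
    assert_send_sync::<Typed<Rc<u8>>>();
    size_of::<PhantomData<fn(Rc<u8>) -> Rc<u8>>>()
}
```

Replacing the marker with `PhantomData<I>` would make the `assert_send_sync` call fail to compile for `I = Rc<u8>`.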
+ +--- + +## `TiSet` + +```rust +pub struct TiSet { + pub raw: IndexSet, + _marker: PhantomData K>, +} +``` + +`TiSet` wraps `IndexSet` where `K` is the typed position index and `V` +is the element. Despite the name, `K` is not the key — `V` is both the stored +element and the lookup key. `K` is only a phantom type for the position integer. + +This differs from `TiMap` in one important way: there is no `#[repr(transparent)]` +on `TiSet`. The `AsRef` shortcut is not provided; callers use `tiset.raw` +directly when they need `IndexSet` methods. + +### Key methods + +| Method | Signature | What it does | +|--------|-----------|-------------| +| `ensure` | `(&mut self, V) -> (K, bool)` | Insert if absent; return `(position, was_new)` | +| `insert` | `(&mut self, V) -> bool` | Insert; return `true` if new | +| `replace` | `(&mut self, K, V) -> V` | Insert `new_val` at tail, then `swap_indices` to put it at `index` | +| `index` | `(&self, &Q) -> Option` | Value → position | +| `indices` | `(&self, &[V]) -> impl Iterator` | Batch value→position lookup | +| `unwrap_index` | `(&self, &V) -> K` | Panicking value → position | +| `contains` | `(&self, &Q) -> bool` | Delegates to `IndexSet::contains` | +| `iter_enumerated` | `(&self) -> Iter` | Yields `(K, &V)` tuples | +| `retain` | `(&mut self, FnMut(K, &V) -> bool)` | Retain with typed position exposed | +| `Index` | `index(&self, K) -> &V` | Panicking position lookup | + +`ensure` is the idiom for interning-style tables: insert the value if not +already present, then return its stable position regardless. The boolean lets +the caller know if this was the first occurrence. + +`replace(index, new_val)` is implemented by inserting `new_val` at the tail +(getting a fresh index), then calling `swap_indices` to move it to the desired +position. This in-place replacement keeps the position stable for all other +entries. 
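The `ensure` idiom can be sketched with the standard library alone. `ToySet` below is a hypothetical stand-in built from a `Vec` and a `HashMap`, not the `indexmap`-backed implementation, and `DerivId` is an invented index type; only the shape of the `ensure` return value follows the table above.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::marker::PhantomData;

/// Hypothetical typed position index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DerivId(u32);

impl From<usize> for DerivId {
    fn from(i: usize) -> Self {
        DerivId(i as u32)
    }
}

/// Toy insertion-ordered set with typed positions.
struct ToySet<K, V> {
    items: Vec<V>,
    positions: HashMap<V, usize>,
    _marker: PhantomData<fn(K) -> K>,
}

impl<K: From<usize>, V: Hash + Eq + Clone> ToySet<K, V> {
    fn new() -> Self {
        ToySet { items: Vec::new(), positions: HashMap::new(), _marker: PhantomData }
    }

    /// Insert `val` if absent; return its stable position and whether it was new.
    fn ensure(&mut self, val: V) -> (K, bool) {
        if let Some(&pos) = self.positions.get(&val) {
            return (K::from(pos), false);
        }
        let pos = self.items.len();
        self.positions.insert(val.clone(), pos);
        self.items.push(val);
        (K::from(pos), true)
    }

    /// Entries in insertion order, paired with their typed positions.
    fn enumerated(&self) -> Vec<(K, &V)> {
        self.items.iter().enumerate().map(|(i, v)| (K::from(i), v)).collect()
    }
}
```

Calling `ensure` twice with the same value returns the same position, with the flag telling the caller whether this was the first occurrence.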
+ +--- + +## The `ahash` hasher choice + +Both `TiMap` and `TiSet` hard-code `ahash::RandomState` as the hasher rather +than `std::collections::hash_map::RandomState` (SipHash 1-3). `ahash` is +non-cryptographic but significantly faster for short keys (integers, short +strings) on modern hardware. The random state is seeded at runtime, so +hash-flooding denial-of-service attacks are still prevented — the trade-off +versus SipHash is purely performance. + +For OpenVAF's use cases (compiler internal tables keyed by integers, interned +strings, or compact model parameter names), the non-cryptographic property is +acceptable and the speed advantage is real. + +--- + +## Where it is used in OpenVAF + +### `hir_lower` — name→node tables + +`hir_lower` uses `TiMap` to build tables that map identifiers to their +lowered HIR nodes. The typed index returned by `insert_full` is then stored +inside the IR to reference entries without repeated hash lookups. + +### `mir` — value and block position maps + +The MIR uses `TiSet` for interning small integer-keyed tables (e.g., mapping +`Value`s to positions in a result set). `iter_enumerated` makes it easy to +emit numbered entries from a set during code generation. + +### `sim_back` and `osdi` — parameter tables + +The simulator back-end and OSDI ABI layer use `TiMap` +style tables to assign stable numeric positions to compact model parameters. +The OSDI ABI requires parameters to appear at fixed integer offsets in the +emitted struct; `TiMap` provides both the lookup (`index(name)`) and the stable +integer position (`insert_full` → `ParamId`) in one structure. + +### `mir_autodiff` — AD variable tables + +`mir_autodiff` uses `TiSet` to collect the set of values that require +derivative computation. `ensure` maps each `Value` to a derivative index +without duplicates; `iter_enumerated` then drives the emission of derivative +instructions. 
+ +--- + +## Worked example: OSDI parameter table + +Consider a compact model with three parameters: + +```verilog-a +parameter real tnom = 27.0; +parameter real is = 1e-14; +parameter real n = 1.0; +``` + +`sim_back` builds: + +```rust +let mut params: TiMap = TiMap::new(); + +let (tnom_id, _) = params.insert_full("tnom".to_string(), ParamInfo { default: 27.0, .. }); +// tnom_id = ParamId(0) + +let (is_id, _) = params.insert_full("is".to_string(), ParamInfo { default: 1e-14, .. }); +// is_id = ParamId(1) + +let (n_id, _) = params.insert_full("n".to_string(), ParamInfo { default: 1.0, .. }); +// n_id = ParamId(2) +``` + +Later, when generating the OSDI struct, the code iterates: + +```rust +for (id, (name, info)) in params.iter_enumerated() { + // id: ParamId, name: &String, info: &ParamInfo + emit_osdi_param(id.into(), name, info.default); +} +``` + +This emits parameter 0 (`tnom`), 1 (`is`), 2 (`n`) in insertion order — +matching the order in which the model's `paramset` is defined, which is the +order OSDI expects. The `id: ParamId` comes directly from the typed position, +not from a separate counter. + +If a later pass needs to look up the offset for `is` by name: + +```rust +let offset: Option = params.index("is"); +// Some(ParamId(1)) +``` + +No re-scanning; `IndexMap::get_index_of` is O(1). + +--- + +## Key design decisions + +**`#[repr(transparent)]` on `TiMap` enables a free `AsRef` cast.** Because +`TiMap` and `IndexMap` have identical layout, +the conversion is a pointer cast with no runtime cost. This lets code that +receives a `&IndexMap` from an external source treat it as a `&TiMap` without +copying or wrapping. `TiSet` does not have this guarantee (no `repr(transparent)`) +so no equivalent cast is provided. + +**Typed phantom index, not a newtype integer.** `I` is never stored — it +only appears in `PhantomData`. The underlying storage remains `usize` (as +`IndexMap` uses internally). 
This means there is no runtime overhead from +the typed index; the compile-time error you get when mixing up `ParamId` and +`NodeId` is free. + +**`fn(I) -> I` phantom for invariance.** Using `PhantomData I>` +rather than `PhantomData` makes both `TiMap` and `TiSet` invariant in `I`. +This is the conservative choice: covariant phantoms can allow unsound lifetime +substitutions for types that don't actually store `I`. Since `I` is always a +plain integer type in practice (e.g., `u32` wrapped by `impl_idx_from!`), the +variance is invisible at the use site. + +**Hard-coded `ahash` rather than a generic hasher parameter.** The `indexmap` +crate supports `BuildHasher` as a generic parameter; `typed_indexmap` fixes it +to `ahash::RandomState`. This simplifies all the type signatures — no `S: BuildHasher` +bound propagating through every caller — and matches OpenVAF's preference for +concrete fast defaults over maximally generic abstractions. + +**`ensure` over `insert_or_ignore`.** The `ensure(val) -> (K, bool)` API +returns both the position and a flag, making it a single call for the common +interning pattern ("get the position of this value, inserting it if new"). +Splitting this into `contains` + conditional `insert_full` would require two +hash lookups; `ensure` uses `IndexSet::insert_full` which does one. From e237e0c65c21f92cbb3022283ea58418e46a4b62 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 10:56:28 +0200 Subject: [PATCH 18/28] docs: add workqueue INTERNALS Covers WorkQueue (FIFO) and WorkStack (LIFO) de-duplicating worklist types, the pop-vs-take distinction (re-insertion vs single-visit semantics), BitSet membership oracle rationale, and public fields design. Worked example traces dead_code_elimination's two-phase reverse-sweep + fixpoint-propagation pattern. 
Co-Authored-By: Claude Sonnet 4.6 --- docs/workqueue/INTERNALS.md | 231 ++++++++++++++++++++++++++++++++++++ 1 file changed, 231 insertions(+) create mode 100644 docs/workqueue/INTERNALS.md diff --git a/docs/workqueue/INTERNALS.md b/docs/workqueue/INTERNALS.md new file mode 100644 index 00000000..e7de50fc --- /dev/null +++ b/docs/workqueue/INTERNALS.md @@ -0,0 +1,231 @@ +# `workqueue` — De-duplicating worklist structures + +**Location:** `lib/workqueue/` +**Role:** Provides two de-duplicating worklist types — `WorkQueue` (FIFO) +and `WorkStack` (LIFO) — for iterative dataflow algorithms over dense +integer indices. Both pair a `VecDeque`/`Vec` for ordering with a `BitSet` +for O(1) membership, so inserting an element that is already in the queue is +a no-op. `WorkStack` is defined in the source but not used in the current +codebase; all production call sites use `WorkQueue`. + +Cross-links: [bitset INTERNALS](../bitset/INTERNALS.md) · +[mir INTERNALS](../mir/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate relationships + +``` +workqueue (lib/workqueue/) + ├─► mir_opt (dead_code_elimination) + └─► mir_autodiff (live_derivatives fixpoint) +``` + +The only dependency is `bitset`. There is no dependency on `stdx`, `arena`, +or the MIR — `workqueue` is a pure data-structure crate and knows nothing +about compiler-specific types. + +--- + +## Type bound + +Both types share the same trait bound on `T`: + +```rust +T: From<usize> + Into<usize> + Copy + PartialEq + Debug +``` + +- `From<usize> + Into<usize>` — needed to construct elements from a range + (`0..size`) and to index into the `BitSet`. +- `Copy` — elements are stored by value in both the deque/vec and (implicitly) + the bitset. +- `PartialEq + Debug` — for standard utilities; `PartialEq` is not used inside + the worklist logic itself. + +Any `T` produced by `impl_idx_from!` (from `stdx`) satisfies these bounds. +In practice `T` is always `Inst` (a `u32` newtype).
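A hand-written index type shows what the bound asks for. This sketch assumes the conversions are to and from `usize`; the `Inst` defined here is a local stand-in rather than the MIR type, and `all_elements` and `bit_positions` are hypothetical helpers that mirror the two uses named above.

```rust
use std::fmt::Debug;

/// Local stand-in for an instruction index (a `u32` newtype).
#[derive(Clone, Copy, PartialEq, Debug)]
struct Inst(u32);

impl From<usize> for Inst {
    fn from(i: usize) -> Self {
        Inst(i as u32)
    }
}

impl From<Inst> for usize {
    fn from(i: Inst) -> usize {
        i.0 as usize
    }
}

/// Build every element in `0..size`, as a `with_all`-style constructor would.
fn all_elements<T>(size: usize) -> Vec<T>
where
    T: From<usize> + Into<usize> + Copy + PartialEq + Debug,
{
    (0..size).map(T::from).collect()
}

/// Map elements to the bit positions they would occupy in a bitset.
fn bit_positions<T>(elems: &[T]) -> Vec<usize>
where
    T: From<usize> + Into<usize> + Copy + PartialEq + Debug,
{
    elems.iter().map(|&e| Into::<usize>::into(e)).collect()
}
```

Implementing `From<Inst> for usize` is enough to satisfy `Into<usize>` through the standard blanket impl.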
+ +--- + +## `WorkQueue` — FIFO de-duplicating queue + +```rust +pub struct WorkQueue { + pub deque: VecDeque, + pub set: BitSet, +} +``` + +### Construction + +| Constructor | What it creates | +|-------------|----------------| +| `with_all(size)` | All elements `0..size` pre-inserted (deque filled, bitset fully set) | +| `with_none(size)` | Empty queue; deque pre-allocated to `size`, bitset empty | + +### Key methods + +| Method | Behaviour | +|--------|-----------| +| `insert(element) -> bool` | Calls `set.insert`. If the bit was not already set, pushes to the **back** of the deque and returns `true`. Otherwise no-op and returns `false`. | +| `pop() -> Option` | Pops from the **front** of the deque, clears the bit, returns the element. FIFO order. The element can be re-inserted after a `pop`. | +| `take() -> Option` | Pops from the **front** without clearing the bit. The element **cannot** be re-inserted; any future `insert` call will find the bit still set and silently discard it. | +| `is_empty() -> bool` | Checks `deque.is_empty()` (not the bitset). | +| `clear()` | Clears both deque and bitset. | +| `extend(iter)` | Filters the iterator through `set.insert`, then extends the deque with the accepted elements. One pass; does not double-insert. | + +### `pop` vs `take` + +The distinction is subtle but intentional: + +- **`pop`** clears the bit after dequeuing. The element is marked "not in queue" + and can be re-inserted by a later `insert` call. This is the standard fixpoint + loop: process an element, potentially re-enqueue it (or its dependents) when + their state changes. + +- **`take`** leaves the bit set after dequeuing. The element will never pass the + `set.insert` guard again, so it is processed exactly once. This is the + "visit each element once" pattern — a topological traversal rather than a + fixpoint. 
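The difference between the two dequeue operations shows up as soon as an element is inserted a second time. The sketch below is a toy model, not the crate's code: `ToyQueue` uses `usize` elements and a `Vec<bool>` in place of the `BitSet`.

```rust
use std::collections::VecDeque;

/// Toy de-duplicating FIFO queue over indices `0..size`.
struct ToyQueue {
    deque: VecDeque<usize>,
    set: Vec<bool>, // stands in for the BitSet
}

impl ToyQueue {
    fn with_none(size: usize) -> Self {
        ToyQueue { deque: VecDeque::with_capacity(size), set: vec![false; size] }
    }

    /// Enqueue `e` unless its bit is already set.
    fn insert(&mut self, e: usize) -> bool {
        if self.set[e] {
            return false;
        }
        self.set[e] = true;
        self.deque.push_back(e);
        true
    }

    /// Dequeue and clear the bit, so the element may be enqueued again.
    fn pop(&mut self) -> Option<usize> {
        let e = self.deque.pop_front()?;
        self.set[e] = false;
        Some(e)
    }

    /// Dequeue but leave the bit set, so later inserts are rejected.
    fn take(&mut self) -> Option<usize> {
        self.deque.pop_front()
    }
}

/// Returns whether element 2 could be re-inserted after `pop` and after `take`.
fn pop_vs_take() -> (bool, bool) {
    let mut q = ToyQueue::with_none(4);
    q.insert(2);
    q.pop();
    let after_pop = q.insert(2); // pop cleared the bit
    q.take();
    let after_take = q.insert(2); // take left the bit set
    (after_pop, after_take)
}
```

After `pop` the element is accepted again; after `take` it is rejected unless the caller clears the bit itself.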
+ +`dead_code.rs` seeds its queue with an initial reverse-order sweep that visits +every instruction once without going through the deque, then drains the queue +with `take` for propagation. Each removed instruction `insert`s its +operand-defining instructions, and because `take` leaves the bit set, an +instruction is enqueued again only after the pass has explicitly cleared its +bit (see the worked example below). + +### `From>` + +```rust +impl From> for WorkQueue { + fn from(set: BitSet) -> Self { + Self { deque: set.iter().collect(), set } + } +} +``` + +Converts a pre-populated `BitSet` into a work queue. The deque is filled in +bit-iteration order (ascending index). Used in `mir_autodiff` to seed the +initial worklist from a post-order traversal result. + +--- + +## `WorkStack` — LIFO de-duplicating stack + +`WorkStack` is structurally identical to `WorkQueue` except that the +`VecDeque` is replaced by a plain `Vec`: + +- `insert` pushes to the **back** of the vec. +- `pop` pops from the **back** (LIFO). +- `take` pops from the **back** without clearing the bit. + +Everything else — the bitset membership check, `with_all`/`with_none`, +`extend`, `From` — is identical. The LIFO order means the most +recently inserted element is processed first, which matches depth-first +traversal patterns. + +`WorkStack` is not currently used anywhere in the OpenVAF codebase (no files +import it); it exists as an alternative for DFS-based worklist algorithms. + +--- + +## Memory layout + +For a function with `N` instructions: + +``` +WorkQueue: + deque: VecDeque — heap allocation, up to N elements + set: BitSet — ceil(N/64) × 8 bytes of heap +``` + +Both structures are pre-allocated at construction time via `with_none(N)`, so +there are no incremental reallocations during the fixpoint loop — the deque +starts with capacity `N` and the bitset is sized to `N` bits.
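The bitset figure above is plain arithmetic and can be checked with a short sketch. It assumes 64-bit words, as the `ceil(N/64) × 8` formula implies; `bitset_bytes` is a hypothetical helper, not a function of the crate.

```rust
/// Bytes of bitset storage for `n` indices, assuming 64-bit words.
fn bitset_bytes(n: usize) -> usize {
    // Round up to whole words, then multiply by 8 bytes per word.
    ((n + 63) / 64) * 8
}

fn main() {
    for n in [1, 64, 65, 1000] {
        println!("{n} instructions -> {} bytes", bitset_bytes(n));
    }
}
```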
+ +--- + +## Worked example: dead code elimination + +`mir_opt::dead_code_elimination` (`mir_opt/src/dead_code.rs`) removes MIR +instructions whose results are not used and are not in the `output_values` set. + +**Setup.** The queue is constructed with the bitset fully set (`new_filled`), +meaning every instruction starts as a candidate: + +```rust +let mut work_list = WorkQueue { + deque: VecDeque::new(), // empty — seeded by the initial sweep + set: BitSet::new_filled(N), // all N instructions marked +}; +``` + +**Initial sweep.** The pass walks basic blocks in reverse (from the last block +to the first, and within each block from the last instruction to the first). +For each instruction it calls `process(work_list, inst, …)`: + +```rust +fn process(workque: &mut WorkQueue, inst: Inst, func: &mut Function, …) { + if func.dfg.inst_dead(inst, true) + && !func.dfg.inst_results(inst).iter().any(|r| output_values.contains(*r)) + { + func.dfg.zap_inst(inst); + func.layout.remove_inst(inst); + // operand-defining instructions might now be dead + for arg in func.dfg.instr_args(inst) { + if let ValueDef::Result(def_inst, _) = func.dfg.value_def(*arg) { + workque.insert(def_inst); + } + } + } else { + // still live: clear the bit so a later `insert` can enqueue it again + workque.set.remove(inst); + } +} +``` + +When an instruction is found dead, it is removed and its operand-defining +instructions are offered to the queue. `insert` only enqueues an instruction +whose bit is clear, so instructions the sweep has not reached yet (bit still +set from `new_filled`) are skipped; the sweep will visit them anyway. When an +instruction is found live, its bit is manually cleared, which makes it eligible +for a later `insert` if one of its users is subsequently removed. + +**Fixpoint.** After the initial sweep, the main loop drains the queue with +`take` (not `pop`). Because `take` leaves the bit set, a dequeued instruction +that turns out dead stays excluded for good, while one that is still live has +its bit cleared again by `process` and can be re-enqueued by a later +elimination. Every enqueue is triggered by a removal, so the loop terminates +when the deque is empty.
+ +This two-phase structure (reverse sweep seeding the queue, then `take`-based +propagation) avoids revisiting instructions that are proven live while allowing +cascading elimination of newly dead operand producers. + +--- + +## Key design decisions + +**`BitSet` as the membership oracle, not `HashSet`.** For dense integer +indices like `Inst`, a `BitSet` is both faster and smaller than a hash set. +`set.insert` is a bit-test-and-set (two memory accesses into the bitset's +backing `Vec`). `HashSet::insert` requires hashing and a hash table +probe. For a function with 1000 instructions the bitset is 16 words (128 +bytes); a `HashSet` would be kilobytes. + +**Public fields `deque` and `set`.** Both fields are `pub`. This is deliberate: +`dead_code.rs` constructs the `WorkQueue` directly using struct literal syntax +(`WorkQueue { deque: VecDeque::new(), set: BitSet::new_filled(N) }`) rather +than a constructor, because it wants a fully-set bitset but an empty deque — +a combination `with_all` would not provide. The public fields also allow the +`set.remove(inst)` call in the live-instruction branch without going through +the queue API. + +**`pop` preserves re-insertion; `take` does not.** The two dequeue methods +support the two dominant worklist patterns in one type without needing two +separate abstractions. A fixpoint algorithm re-inserts work as state changes +(`pop` is correct); a single-pass traversal should visit each element exactly +once (`take` is correct). Providing both in the same type avoids the need for +a wrapper or a flag. + +**`WorkStack` mirrors `WorkQueue` exactly.** The only difference is `Vec` +vs `VecDeque` and push/pop direction. Both share the same invariants and API +surface, so switching between BFS and DFS traversal order is a one-word change +at the call site. 
From 63e9d7369233cb27ca3e56c54872b10674d017bc Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 11:30:05 +0200 Subject: [PATCH 19/28] docs: add base_n INTERNALS Covers the 64-character alphabet, three named base constants (CASE_INSENSITIVE=36, ALPHANUMERIC_ONLY=62, MAX_BASE=64), the stack-allocated push_str implementation (least-significant-first + reverse), and the four call sites: MD5-based cache file naming, LLVM local symbol generation, OSDI module UUID encoding, and temp object file extension generation. Co-Authored-By: Claude Sonnet 4.6 --- docs/base_n/INTERNALS.md | 197 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 197 insertions(+) create mode 100644 docs/base_n/INTERNALS.md diff --git a/docs/base_n/INTERNALS.md b/docs/base_n/INTERNALS.md new file mode 100644 index 00000000..48316c39 --- /dev/null +++ b/docs/base_n/INTERNALS.md @@ -0,0 +1,197 @@ +# `base_n` — Integer-to-string encoding in arbitrary bases + +**Location:** `lib/base_n/` +**Role:** Converts a `u128` integer into its string representation in any base +from 2 to 64. The only public API is two functions — `encode` and `push_str` — +and three base-constant exports. No dependencies; pure `std`. + +Cross-links: [stdx INTERNALS](../stdx/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate relationships + +``` +base_n (lib/base_n/) + ├─► openvaf (cache file naming) + ├─► mir_llvm (local LLVM symbol generation) + └─► osdi (module UUID → symbol name, temp file extensions) +``` + +`base_n` has no dependencies of its own. It is a leaf utility used in three +places where a compact, filesystem-safe string representation of an integer is +needed. 
+ +--- + +## The alphabet + +```rust +const BASE_64: &[u8; 64] = + b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@$"; +``` + +Digits are assigned in this order: + +| Position | Characters | Notes | +|----------|-----------|-------| +| 0–9 | `0`–`9` | decimal digits | +| 10–35 | `a`–`z` | lowercase | +| 36–61 | `A`–`Z` | uppercase | +| 62 | `@` | | +| 63 | `$` | | + +The three named base constants select subsets of this alphabet: + +| Constant | Value | Alphabet used | +|----------|-------|--------------| +| `CASE_INSENSITIVE` | 36 | digits + lowercase only (safe for case-insensitive filesystems and identifiers) | +| `ALPHANUMERIC_ONLY` | 62 | digits + lowercase + uppercase (no `@`/`$`) | +| `MAX_BASE` | 64 | full alphabet | + +Any base between 2 and 64 (inclusive) is accepted. The `debug_assert` in +`push_str` catches out-of-range bases in debug builds. + +--- + +## API + +### `push_str(n: u128, base: usize, output: &mut String)` + +Appends the base-`base` representation of `n` to `output` without allocating +a separate string. This is the core function; `encode` is a thin wrapper. + +The implementation uses a fixed stack buffer of 128 bytes: + +```rust +let mut s = [0u8; 128]; +let mut index = 0; + +loop { + s[index] = BASE_64[(n % base) as usize]; + index += 1; + n /= base; + if n == 0 { break; } +} +s[0..index].reverse(); +output.push_str(str::from_utf8(&s[0..index]).unwrap()); +``` + +Digits are produced least-significant-first (each iteration takes `n % base`), +then the slice is reversed in place to get most-significant-first order. The +128-byte buffer is large enough for any `u128` in any base ≥ 2: the longest +representation is `u128::MAX` in base 2, which is 128 bits and fits exactly. + +### `encode(n: u128, base: usize) -> String` + +Allocates a fresh `String`, calls `push_str`, and returns it. Use `push_str` +when appending to an existing string to avoid an extra allocation. 
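The two functions can be reproduced in a few lines of `std`-only Rust. This is a sketch of the technique described above, not a copy of the crate source: the cast of `base` to `u128` and the exact form of the `debug_assert!` are assumptions made so that the excerpt compiles on its own.

```rust
const BASE_64: &[u8; 64] =
    b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@$";

/// Appends the base-`base` digits of `n` to `output`.
fn push_str(mut n: u128, base: usize, output: &mut String) {
    debug_assert!((2..=64).contains(&base));
    let base = base as u128; // assumed: widen once so `%` and `/` type-check
    let mut s = [0u8; 128]; // 128 digits covers u128::MAX in base 2
    let mut index = 0;

    // Emit digits least-significant-first.
    loop {
        s[index] = BASE_64[(n % base) as usize];
        index += 1;
        n /= base;
        if n == 0 {
            break;
        }
    }
    // Flip to most-significant-first before appending.
    s[..index].reverse();
    output.push_str(std::str::from_utf8(&s[..index]).unwrap());
}

/// Convenience wrapper that allocates a fresh `String`.
fn encode(n: u128, base: usize) -> String {
    let mut s = String::new();
    push_str(n, base, &mut s);
    s
}

fn main() {
    println!("{}", encode(255, 16));
    // Appending to an existing string avoids the intermediate allocation.
    let mut name = String::from("tmp.");
    push_str(61, 62, &mut name);
    println!("{name}");
}
```

Because lowercase letters occupy positions 10 to 35, digit 13 renders as `d`, and uppercase letters only appear for digit values of 36 and above.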
+ +--- + +## Where it is used in OpenVAF + +### `openvaf/src/cache.rs` — cache file naming + +The compiler caches compiled `.osdi` files to avoid recompilation when neither +the source nor the compiler options have changed. The cache key is an MD5 hash +of the source file contents (token-by-token), compiler version, defines, and +lint settings. The 128-bit MD5 digest is converted to a compact filename: + +```rust +let hash = u128::from_ne_bytes(*hash(db, &opts.defines)); +let hash = base_n::encode(hash, base_n::CASE_INSENSITIVE); +format!("{}.osdi", hash) +``` + +`CASE_INSENSITIVE` (base 36) is used so the filename is valid on +case-insensitive filesystems (Windows, macOS HFS+). A `u128` in base 36 +produces at most 25 characters, which is compact and collision-resistant. + +### `mir_llvm/src/context.rs` — local LLVM symbol names + +The LLVM code-generation context needs to generate unique names for internal +(private-linkage) symbols. A monotonically incrementing counter is converted +to a short alphanumeric suffix: + +```rust +pub fn generate_local_symbol_name(&self, prefix: &str) -> String { + let idx = self.local_gen_sym_counter.get(); + self.local_gen_sym_counter.set(idx + 1); + let mut name = String::with_capacity(prefix.len() + 6); + name.push_str(prefix); + name.push('.'); + base_n::push_str(idx as u128, base_n::ALPHANUMERIC_ONLY, &mut name); + name +} +``` + +`ALPHANUMERIC_ONLY` (base 62) is used because LLVM symbol names allow +alphanumeric characters but `@` and `$` have special meaning in some LLVM IR +contexts. The `.` separator before the numeric suffix ensures no collision with +user-defined names (Verilog-A identifiers cannot contain `.`). + +### `osdi/src/compilation_unit.rs` — module UUID → symbol prefix + +Each compiled Verilog-A module has a UUID. 
The UUID is encoded in base 36 +and used as the OSDI symbol prefix that linkers and simulators use to find +the module's entry points: + +```rust +let sym = base_n::encode(module.info.module.uuid(db) as u128, base_n::CASE_INSENSITIVE); +``` + +### `osdi/src/lib.rs` — temporary object file extensions + +When compiling multiple modules in parallel, temporary object files are named +with unique extensions derived from a counter: + +```rust +let num = base_n::encode((i + 1) as u128, CASE_INSENSITIVE); +let extension = format!("o{num}"); +dst.with_extension(extension) +``` + +This avoids collision between the temporary `.o` files for each module/pass +combination without requiring a separate temp-directory. + +--- + +## Worked example + +```rust +base_n::encode(255u128, 16) // "ff" +base_n::encode(255u128, 36) // "73" +base_n::encode(u128::MAX, 36) // 25-character string +base_n::encode(12345u128, 62) // "3d7" +``` + +For the cache file name use case, an MD5 digest of a typical resistor model +produces a `u128` such as `0x9f3c8a1d…`. In base 36 a value of that size becomes +a 25-character string of digits and lowercase letters: short enough to be a +filename, long enough that collisions are negligible. + +--- + +## Key design decisions + +**Stack-allocated output buffer.** The 128-byte buffer on the stack avoids any +heap allocation inside `push_str`. Since a `u128` in base 2 has at most 128 +digits, the buffer can never overflow. The function appends to a caller-supplied +`String` rather than returning a new one, so the caller controls allocation. + +**`push_str` as the primary function.** Callers that are building a longer +string (like `generate_local_symbol_name`, which prepends a prefix) avoid +an intermediate allocation by using `push_str` directly. `encode` is a +one-line convenience wrapper for the common "I need a standalone string" case. + +**`u128` as the input type.** MD5 produces 128 bits; UUID-style integers also +fit in 128 bits.
Using `u128` as the universal input type means no truncation +is needed at any call site, even for the largest hash values OpenVAF produces. + +**Fixed `BASE_64` alphabet with named constants.** Rather than letting callers +supply their own alphabet, the crate defines one canonical 64-character +alphabet and three named base constants for the three use cases OpenVAF +actually needs. This prevents subtle bugs from custom alphabets (e.g., using +`+`/`/` from base64 in a filesystem context) while keeping the API minimal. From ba18d846cf34062a30bf2b9122d2256996be7ad2 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 11:38:18 +0200 Subject: [PATCH 20/28] docs: add mini_harness INTERNALS MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Covers the harness=false/custom-main pattern, Test/Failed/Arguments/TestSummary types, from_dir/from_dir_filtered/from_list constructors, the harness! macro and TestOrTestList trait, run_harness execution model (sort→filter→sequential catch_unwind), and all seven data-test users. Worked example traces the basedb data_tests.rs harness! call. Notes the tests.rs/README.md copy-paste artefacts from base_n. Co-Authored-By: Claude Sonnet 4.6 --- docs/mini_harness/INTERNALS.md | 311 +++++++++++++++++++++++++++++++++ 1 file changed, 311 insertions(+) create mode 100644 docs/mini_harness/INTERNALS.md diff --git a/docs/mini_harness/INTERNALS.md b/docs/mini_harness/INTERNALS.md new file mode 100644 index 00000000..8b6acdd8 --- /dev/null +++ b/docs/mini_harness/INTERNALS.md @@ -0,0 +1,311 @@ +# `mini_harness` — Custom libtest-compatible data-driven test harness + +**Location:** `lib/mini_harness/` +**Role:** A minimal custom test runner that replaces Rust's built-in `libtest` +harness for data-driven tests. It discovers test cases from directories or +lists at runtime, supports the same CLI flags as `cargo test`, catches panics, +and prints output in libtest's `pretty`/`terse` formats. 
All test execution is +sequential on the main thread. + +Cross-links: [ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Why a custom harness? + +Rust's `#[test]` attribute requires `libtest` as the test runner. `libtest` +discovers tests at compile time; it cannot scan a directory for test files at +runtime. Data-driven tests in OpenVAF need to read `.va` source files from +`integration_tests/` and `integration_tests/data/` at runtime and create one +test case per file. + +Setting `harness = false` in a `[[test]]` Cargo.toml section tells Cargo to +compile the test binary without injecting `libtest` and to call the binary's +own `main` function instead. `mini_harness` provides that `main` via the +`harness!` macro, while staying CLI-compatible with `cargo test` (which passes +libtest-style flags to the test binary regardless). + +All seven data-test binaries in OpenVAF set `harness = false` and use +`mini_harness`: + +``` +basedb, hir_def, hir, hir_lower, openvaf, osdi, openvaf-driver +``` + +--- + +## Crate layout + +``` +lib/mini_harness/src/ + lib.rs — public API: Test, Arguments, run_harness, harness!, TestSummary + flags.rs — xflags-generated CLI argument struct (re-exported as Arguments) + printer.rs — (empty) + tests.rs — (copied from base_n; contains no valid mini_harness tests) +``` + +The only external dependency is `xflags 0.3`, used to parse CLI arguments. + +> **Note:** `src/tests.rs` and `README.md` appear to have been copied from the +> `base_n` crate by mistake — both reference `encode`, which does not exist in +> `mini_harness`. They do not affect the library's behaviour. + +--- + +## Core types + +### `Test<'a>` + +```rust +pub struct Test<'a> { + pub name: String, + pub runner: Box Result + 'a>, + pub ignored: bool, +} +``` + +Each test case is a name, a one-shot closure that returns `Result`, and an +ignored flag. The closure is `FnOnce` — it is consumed when run and cannot be +re-run. 
+ +### `Result` and `Failed` + +```rust +pub type Result = std::result::Result; + +pub struct Failed { msg: String } +impl From for Failed { … } +``` + +Any `Display` value can be converted into `Failed` via `?`, so test functions +can use `?` to propagate errors from `std::io`, `anyhow`, or any other +error type that implements `Display`. The `msg` string is printed when the +test fails. + +### `Arguments` + +```rust +pub use flags::Test as Arguments; +``` + +Re-exported from `flags.rs`. Fields: + +| Field | Type | Meaning | +|-------|------|---------| +| `filter` | `Option` | Substring (or exact, with `--exact`) filter; only matching tests run | +| `skip` | `Vec` | Tests whose names contain any of these strings are skipped | +| `exact` | `bool` | Filters match exactly rather than by substring | +| `ignored` | `bool` | Run only ignored tests | +| `include_ignored` | `bool` | Run both ignored and non-ignored tests | +| `list` | `bool` | Print test names and exit without running | +| `nocapture` | `bool` | Accepted but no-op (harness always runs without capture) | +| `format` | `Option` | `pretty` (default) or `terse` | + +`Arguments::parse_cli()` reads `std::env::args_os()` via `xflags` and exits +with code 101 on parse error — matching `libtest`'s exit code for harness +failures. + +### `TestSummary` + +```rust +#[must_use = "Call `exit()` or `exit_if_failed()` to set the correct return code"] +pub struct TestSummary { + pub failed: Vec, + pub passed: u32, + pub ignored: u32, + pub filtered: u32, + pub elapsed: Duration, +} +``` + +`exit()` calls `process::exit(0)` on success or `101` on failure, matching +`libtest`'s convention. `exit_if_failed()` does the same but returns normally +on success — useful when you want to do cleanup after the test run. + +--- + +## Test constructors + +### `Test::new` — single closure + +```rust +Test::new("my_test", &|| { /* ... */ Ok(()) }) +``` + +Wraps a `Fn() -> Result` as a single named test. 
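The `?`-friendly conversion into `Failed` described earlier in this section can be sketched as a blanket `From` impl over `Display`. The generic parameter is an assumption about the stripped signature, and `parse_positive` is a hypothetical test body used only for illustration.

```rust
use std::fmt::Display;

/// Simplified stand-in for the harness's failure type.
struct Failed {
    msg: String,
}

/// Any `Display` value becomes a `Failed`. This is coherent only because
/// `Failed` itself does not implement `Display`.
impl<D: Display> From<D> for Failed {
    fn from(d: D) -> Self {
        Failed { msg: d.to_string() }
    }
}

/// Hypothetical test body: both `?` and `.into()` go through the impl above.
fn parse_positive(s: &str) -> Result<u32, Failed> {
    let n: u32 = s.parse()?; // ParseIntError -> Failed via Display
    if n == 0 {
        return Err("expected a positive number".into());
    }
    Ok(n)
}

fn main() {
    match parse_positive("abc") {
        Ok(n) => println!("parsed {n}"),
        Err(failed) => println!("failed: {}", failed.msg),
    }
}
```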
+ +### `Test::from_dir` — one test per file in a directory + +```rust +Test::from_dir("name", &runner, &ignore_fn, dir) +``` + +Calls `read_dir(dir)` and creates one `Test` per directory entry. The test +name is `"{name}::{filename}"`. The `ignore` predicate marks tests as ignored +without removing them from the list (they are still printed with `--list`). + +`from_dir` is a thin wrapper around `from_dir_filtered`, which adds a `filter` +predicate to skip entries entirely (rather than mark them ignored): + +```rust +Test::from_dir_filtered("name", &runner, &filter_fn, &ignore_fn, dir) +``` + +In practice `filter_fn` is used to restrict to files with a specific extension: + +```rust +Test::from_dir_filtered("ui", &ui_test, &is_va_file, &ignore_never, &openvaf_test_data("syn_ui")) +``` + +### `Test::from_list` — one test per item in a slice + +```rust +Test::from_list("name", &runner, &ignore_fn, &[item1, item2, …]) +``` + +Creates `"{name} {item:?}"` for each item. Used for tests parametrised over a +fixed set of values rather than a filesystem directory. + +--- + +## The `harness!` macro + +```rust +#[macro_export] +macro_rules! harness { + ($($tests: expr),*) => { + fn main() { + let args = $crate::Arguments::parse_cli(); + let mut tests = ::std::vec::Vec::new(); + $($crate::TestOrTestList::push_to_list($tests, &mut tests);)* + $crate::run_harness(&args, tests).exit() + } + }; +} +``` + +Each expression in the macro can be either a `Test` (pushed as one item) or +any `IntoIterator` (flattened into the list), via the +`TestOrTestList` trait: + +```rust +pub trait TestOrTestList<'a> { + fn push_to_list(self, dst: &mut Vec>); +} +impl<'a> TestOrTestList<'a> for Test<'a> { … } +impl<'a, I: IntoIterator>> TestOrTestList<'a> for I { … } +``` + +This lets `Test::from_dir(…)` (which returns an iterator) and `Test::new(…)` +(which returns a single `Test`) appear in the same `harness!` invocation +without explicit flattening. 
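The two-impl trick behind `TestOrTestList` can be tried in isolation. In this sketch `Test` is reduced to a name and the lifetime parameter is dropped. The blanket impl coexists with the direct impl because the local `Test` type does not implement `IntoIterator`. `collect_names` is a hypothetical helper standing in for the body the macro generates.

```rust
/// Simplified stand-in for the harness's `Test`.
struct Test {
    name: String,
}

trait TestOrTestList {
    fn push_to_list(self, dst: &mut Vec<Test>);
}

/// A single test is pushed as one item.
impl TestOrTestList for Test {
    fn push_to_list(self, dst: &mut Vec<Test>) {
        dst.push(self);
    }
}

/// Anything that iterates over tests is flattened into the list.
impl<I: IntoIterator<Item = Test>> TestOrTestList for I {
    fn push_to_list(self, dst: &mut Vec<Test>) {
        dst.extend(self);
    }
}

/// Mirrors what the macro body does for a mix of single tests and lists.
fn collect_names() -> Vec<String> {
    let mut tests = Vec::new();
    TestOrTestList::push_to_list(Test { name: "single".into() }, &mut tests);
    TestOrTestList::push_to_list(
        vec![Test { name: "a".into() }, Test { name: "b".into() }],
        &mut tests,
    );
    tests.into_iter().map(|t| t.name).collect()
}

fn main() {
    println!("{:?}", collect_names());
}
```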
+ +--- + +## `run_harness` execution model + +```rust +pub fn run_harness(args: &Arguments, mut tests: Vec) -> TestSummary +``` + +1. **Sort** tests alphabetically by name — ensures deterministic order + regardless of filesystem enumeration order. +2. **Filter** — drop every test for which `args.is_filtered_out` returns true; + count the dropped tests for the summary. +3. **List mode** — if `--list` was given, print names and return an empty + summary without running anything. +4. **Sequential execution** — iterate the remaining tests in order. For each: + - If ignored (and `--include-ignored` not set): count as ignored. + - Otherwise: call `Test::run(runner)`. +5. **`Test::run`** wraps the closure in `std::panic::catch_unwind`: + - `Ok(Ok(()))` → pass + - `Ok(Err(failed))` → fail with `failed.msg` + - `Err(panic_payload)` → fail with `"test panicked: {payload}"` (or + `"test panicked"` if the payload is not a `&str`/`String`) +6. Print failures, then the summary line. + +Tests are always single-threaded. There is no parallelism, no test isolation +beyond panic-catching, and no output capture (`--nocapture` is accepted but +is always the effective mode). + +--- + +## Worked example: `basedb` data tests + +`basedb/tests/data_tests.rs` registers three test suites with one `harness!` +call: + +```rust +harness!
{ + Test::from_dir_filtered( + "integration", &integration_test, + &Path::is_dir, // filter: only directories + &ignore_dev_tests, // ignore: directories named "dev_*" + &project_root().join("integration_tests") + ), + Test::from_dir_filtered( + "ui", &ui_test, + &is_va_file, // filter: only *.va files + &ignore_never, + &openvaf_test_data("syn_ui") + ), + Test::from_dir_filtered( + "ast", &ast_test, + &is_va_file, + &ignore_never, + &openvaf_test_data("ast") + ) +} +``` + +At runtime, `main` parses CLI args, then calls `read_dir` on each directory, +building a list of `Test` values like: + +``` +integration::resistor +integration::diode +ui::missing_semicolon.va +ui::unknown_nature.va +ast::resistor.va +… +``` + +These are sorted alphabetically, then `cargo test -- ui` would filter to only +the `ui::*` tests. Each runner constructs a `TestDataBase`, runs the compiler +frontend to a specific stage, and uses `expect_test::expect_file!` to compare +output against a `.log` or `.va_ast` snapshot file. A mismatch returns +`Err(Failed { msg: "…" })`, which `run_harness` prints as a test failure. + +--- + +## Key design decisions + +**`harness = false` + custom `main`.** The `harness!` macro generates a `main` +function, which is only valid when Cargo compiles the test binary with +`harness = false`. This is a deliberate opt-in: regular `#[test]` functions in +the same crate would require `harness = true`. Data-test binaries are separate +`[[test]]` sections in `Cargo.toml` so they can set `harness = false` +independently. + +**`TestOrTestList` for uniform macro syntax.** Without this trait, callers +would need to write `…collect::>()` after each `from_dir` call to +flatten iterators. The blanket `impl>` handles this +automatically, keeping the `harness!` invocation clean. + +**`FnOnce` runner, not `Fn`.** Test closures capture state (e.g., the `Path` +to a file) and run exactly once. Using `FnOnce` is honest about ownership and +avoids cloning the captured state. 
The `from_dir` constructor clones the `Path` +into each closure's capture at construction time, so the closure itself is +`FnOnce`. + +**Sequential execution.** Running tests sequentially avoids the need for +`Send + Sync` bounds on test state, allows tests to share a Salsa database +without locking, and produces deterministic, non-interleaved output. For compiler +integration tests that each spin up a full Salsa instance and parse tens of +kilobytes, spawning threads would add overhead without a meaningful improvement +in wall-clock time on the CI machines this project targets. + +**`catch_unwind` for panic isolation.** A panicking test should not abort the +entire test run. `AssertUnwindSafe` is required because the `FnOnce` closure +is not automatically `UnwindSafe` — the harness accepts this as a known +approximation, consistent with how `libtest` itself handles panics. From c9fd98f6898c5982c4251f32f33445ab36df65f1 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 12:20:35 +0200 Subject: [PATCH 21/28] docs: add sourcegen INTERNALS Covers the test-as-codegen pattern (ensure_file_contents fail-then-rewrite), four generators (AST/SyntaxKind from .ungram, MIR opcodes/InstBuilder from the opcodes! DSL, HIR builtins tables, OSDI C-header bindings), the utility layer (CommentBlock extraction, reformat via rustfmt, string-case converters, project_root), and the cross-compilation guard. Worked example traces adding a new MIR opcode through the full regeneration cycle.
Co-Authored-By: Claude Sonnet 4.6 --- docs/sourcegen/INTERNALS.md | 383 ++++++++++++++++++++++++++++++++++++ 1 file changed, 383 insertions(+) create mode 100644 docs/sourcegen/INTERNALS.md diff --git a/docs/sourcegen/INTERNALS.md b/docs/sourcegen/INTERNALS.md new file mode 100644 index 00000000..7eab261c --- /dev/null +++ b/docs/sourcegen/INTERNALS.md @@ -0,0 +1,383 @@ +# `sourcegen` — Compile-time source code generation via tests + +**Location:** `sourcegen/` (workspace root, not under `lib/` or `openvaf/`) +**Role:** A collection of `#[test]` functions that read grammar/header files, +generate Rust source code with `proc_macro2`/`quote`, and write it back into +the repository. The generated files are checked in; CI runs the tests and fails +if any generated file is out of date. This is sometimes called the +"test-as-codegen" or "update-on-fail" pattern. + +Cross-links: [ARCHITECTURE](../../ARCHITECTURE.md) · +[syntax INTERNALS](../syntax/INTERNALS.md) + +--- + +## Why a crate just for generation? + +OpenVAF has three large bodies of repetitive Rust code that would be painful to +write and maintain by hand: + +1. **`SyntaxKind`, `AstNode` impls, and `AstToken` impls** — every token, + keyword, punctuation mark, and CST node kind as an enum variant, plus + `can_cast`/`cast`/`syntax` boilerplate for each. +2. **MIR `Opcode` enum and `InstBuilder` trait** — one enum variant and one + builder method per instruction opcode, with arity and return-count metadata. +3. **OSDI C-struct Rust bindings** — LLVM type descriptors and Rust `repr(C)` + structs mirroring the OSDI C headers, per OSDI version, per target triple. + +Keeping these in sync with their source definitions (the `.ungram` grammar, the +`opcodes!` macro invocation, the `.h` headers) manually would introduce drift. 
+`sourcegen` closes the loop: the source-of-truth lives in human-readable files; +the tests regenerate the derived Rust; if the derived files differ from what is +on disk the test fails and rewrites them. + +--- + +## Crate layout + +``` +sourcegen/ + build.rs — detects cross-compilation; emits cfg=cross_compile + src/ + lib.rs — utility functions (file scanning, comment extraction, + string-case conversion, reformat, ensure_file_contents) + ast.rs — AST/token/syntax-kind generation from the .ungram grammar + ast/src.rs — hardcoded KINDS_SRC (punct, keywords, literals, tokens, nodes) + mir_instructions.rs — Opcode enum + InstBuilder trait generation + hir_builtins.rs — builtin function tables for HIR lowering + osdi.rs — OSDI C-header → Rust bindings generation + tests.rs — re-exports hir_builtins test module +``` + +All code under `#![cfg(all(test, not(cross_compile)))]` — the entire crate +compiles to nothing except during `cargo test` on the host architecture. + +### Cross-compilation guard + +`build.rs` emits `cfg(cross_compile)` when `HOST != TARGET`. This suppresses +sourcegen tests during cross-compilation runs where the generated files cannot +be regenerated (the generated code would be for the wrong architecture in the +OSDI case). + +--- + +## Utility layer (`lib.rs`) + +### File scanning + +```rust +pub fn list_files(dir: &Path) -> Vec // recursive, skips hidden +pub fn list_rust_files(dir: &Path) -> Vec // filters to *.rs +``` + +Used by `hir_builtins` and `osdi` generators to enumerate source directories. + +### Comment extraction + +```rust +pub struct CommentBlock { pub id: String, pub line: usize, pub contents: Vec } + +CommentBlock::extract(tag, text) // tagged blocks: // TAG: id\n// ... +CommentBlock::extract_untagged(text) // all // ... blocks +``` + +`extract` finds consecutive `// ` comment lines where the first line matches +`{tag}: {id}`. This allows embedding structured metadata inside Rust source +comments that `sourcegen` can parse. 
For example, `hir_builtins` uses comment +blocks in the HIR source files to enumerate which builtin functions exist. + +`do_extract_comment_blocks` is the shared implementation. It trims leading +whitespace from each line before checking the `// ` prefix, so indented +comments inside `impl` blocks are found correctly. A bare `//` line (no +content) is treated as an empty continuation line when +`allow_blocks_with_empty_lines` is true (used for tagged blocks but not +untagged ones). + +### Code formatting + +```rust +pub fn reformat(text: String) -> String +``` + +Pipes the generated token stream through `rustfmt` (using the project's +`rustfmt.toml`) so the written files are properly formatted. `rustfmt` is +invoked via `xshell::cmd!` with its input fed on stdin and output read back. + +### Preamble injection + +```rust +pub fn add_preamble(generator: &'static str, mut text: String) -> String +``` + +Prepends `//! Generated by \`{generator}\`, do not edit by hand.` to the +formatted source. Every generated file starts with this line so readers know +not to edit it directly. + +### The update-on-fail contract + +```rust +pub fn ensure_file_contents(file: &Path, contents: &str) +``` + +This is the central mechanism of the whole system: + +1. Read the existing file from disk (if it exists). +2. Normalize newlines (`\r\n` → `\n`). +3. If the file already matches `contents`, **return** — the test passes. +4. If the file differs (or doesn't exist): + - Print a red error message naming the file. + - If running on CI (env var `CI` is set), print a hint to run `cargo test` + locally and commit the result. + - Write the new `contents` to disk. + - **Panic** with `"some file was not up to date and has been updated, simply re-run the tests"`. + +The panic causes the test to fail, which causes `cargo test` to exit non-zero. +But because the file has already been rewritten, running `cargo test` a second +time will find the file up to date and pass. 
This "fail then update, pass on +re-run" pattern is deliberate: it ensures the generated files are always +committed and reviewable as a diff. + +### String-case utilities + +| Function | Example | +|----------|---------| +| `to_upper_snake_case("FooBar")` | `"FOO_BAR"` | +| `to_lower_snake_case("FooBar")` | `"foo_bar"` | +| `to_pascal_case("foo_bar")` | `"FooBar"` | +| `pluralize("token")` | `"tokens"` | + +These are used throughout the generators to derive Rust identifiers from +grammar names. + +### `project_root()` + +```rust +pub fn project_root() -> PathBuf +``` + +Walks up from `CARGO_MANIFEST_DIR` until it finds a directory containing +`README.md`. This is the workspace root — all generated file paths are +expressed relative to it. + +--- + +## AST generator (`ast.rs` + `ast/src.rs`) + +**Test function:** `#[test] pub fn ast()` + +**Source-of-truth:** `openvaf/syntax/veriloga.ungram` (parsed with the +`ungrammar` crate) plus `KINDS_SRC` (hardcoded in `ast/src.rs`). + +**Generated files:** + +| File | Content | +|------|---------| +| `openvaf/tokens/src/parser/generated.rs` | `SyntaxKind` enum, `T![]` macro, `is_keyword`/`is_punct`/`is_literal` methods, `from_keyword`/`from_char` dispatchers | +| `openvaf/syntax/src/ast/generated/tokens.rs` | One newtype struct per token kind with `AstToken` impl | +| `openvaf/syntax/src/ast/generated/nodes.rs` | One struct per CST node and one enum per alternative group, all with `AstNode` impls | + +### Pipeline + +1. Parse `veriloga.ungram` into a `Grammar` (ungrammar's AST). +2. `lower(&grammar)` converts the grammar rules into `AstSrc` — an intermediate + representation of nodes and enums: + - A grammar rule that is a pure `Alt` of nodes/tokens becomes an `AstEnumSrc`. + - Everything else becomes an `AstNodeSrc` with a list of `Field`s. + - `lower_comma_list` detects the pattern `T (',' T)*` and collapses it to + a single `Many`-cardinality field. + - Labeled sub-rules named `op`, `lhs`, `rhs`, `then_branch`, etc. 
are + flagged `manually_implemented` and excluded from codegen (they have custom + accessor impls in the `syntax` crate). +3. Post-process `AstSrc`: + - `deduplicate_fields` removes repeated fields (can arise from `Seq` of + repeated `Opt` sub-rules). + - `extract_struct_traits` and `extract_enum_traits` lift common field sets + into trait impls (`AttrsOwner`, `ArgListOwner`). +4. `generate_syntax_kinds(KINDS_SRC)` builds the `SyntaxKind` enum and support + methods purely from the hardcoded `KINDS_SRC` constant (which lists all + punctuation tokens, keywords, literals, and node names). +5. `generate_tokens(&ast)` and `generate_nodes(KINDS_SRC, &ast)` emit the + typed AST wrappers using `quote!`. +6. Doc comments are injected via a `#[pretty_doc_comment_placeholder_workaround]` + attribute placeholder: `quote!` emits the placeholder, and a post-processing + step splits the output on that string and inserts the actual `///` lines. + +### `KINDS_SRC` + +The hardcoded table in `ast/src.rs` lists all Verilog-A tokens and CST node +names in five categories: + +- **punct** — operator and delimiter tokens, each with a string literal and a + `SCREAMING_SNAKE_CASE` name +- **keywords** — reserved words (`analog`, `begin`, `branch`, …) +- **literals** — `INT_NUMBER`, `STD_REAL_NUMBER`, `SI_REAL_NUMBER`, `STR_LIT` +- **tokens** — non-keyword identifier-like tokens (`IDENT`, `NAME`, etc.) +- **nodes** — all CST node names + +--- + +## MIR instruction generator (`mir_instructions.rs`) + +**Test functions:** `#[test] fn gen_opcodes()`, `#[test] fn gen_instr_builder()` + +**Source-of-truth:** The `opcodes!` macro invocation at the top of +`mir_instructions.rs` itself. The macro call is both the spec and the input to +codegen — there is no separate file to keep in sync.
+ +**Generated files:** + +| File | Content | +|------|---------| +| `openvaf/mir/src/instructions/generated.rs` | `InstructionFormat` enum, `Opcode` enum (`#[repr(u8)]`), `OPCODE_CONSTRAINTS`, `OPCODE_NAMES`, `OPCODE_FORMAT` arrays, `FromStr for Opcode` | +| `openvaf/mir/src/builder/generated.rs` | `InstBuilder` trait with one method per `Unary`/`Binary` opcode | + +### The `opcodes!` DSL + +``` +opcodes! { + Unary(1) -> 1 { Inot Bnot Fneg … } + Binary(2) -> 1 { Iadd Isub … Pow } + Branch(1) -> 0 { Br } + Jump(0) -> 0 { Jmp } + Exit(0) -> 0 { Exit } + @varargs Call { Call(0) -> 0 } + @varargs PhiNode { Phi(0) -> 1 } +} +``` + +Each format declaration gives the default argument count and return count for +all opcodes in that format. Individual opcodes can override the return count +with `-> N`. The `@varargs` forms handle variable-argument formats (Call, Phi) +where arity varies per instruction rather than per opcode. + +The macro expands to a `const INSTRUCTION_FORMATS: [InstructionFormatData; N]` +array that the test functions consume to produce the generated Rust. + +`gen_opcodes` iterates `INSTRUCTION_FORMATS` and emits: +- The `Opcode` enum with `#[repr(u8)]` and discriminants starting at 1 (0 is + reserved as a sentinel). +- `OPCODE_CONSTRAINTS[opcode as u8]` — a `(args, returns)` pair, indexed + directly by the opcode discriminant. +- `OPCODE_NAMES[opcode as u8]` — lowercase name string for `Display`/`Debug`. +- `OPCODE_FORMAT[opcode as u8]` — the `InstructionFormat` for dispatch. +- `FromStr for Opcode` — matches lowercase name strings. + +`gen_instr_builder` emits the `InstBuilder` trait, filtering to only +`Unary` and `Binary` formats (the others have manually-written constructors in +the non-generated part of the file). + +--- + +## HIR builtins generator (`hir_builtins.rs`) + +**Source-of-truth:** Comment blocks in the HIR source (extracted with +`CommentBlock::extract`) listing the built-in Verilog-A system functions and +analog operators. 
+ +**Generated file:** `openvaf/hir_lower/src/builtins/generated.rs` (path not yet +confirmed; TODO(verify): check the exact path against the source). + +The generator reads the list of analog operators (`ANALOG_OPERATORS`) and +unsupported system functions (`UNSUPPORTED`) from the constants in +`hir_builtins.rs`, then emits dispatch tables that `hir_lower` uses to resolve +`$display`, `$absdelay`, `ddt`, etc. during name resolution. + +--- + +## OSDI struct generator (`osdi.rs`) + +**Source-of-truth:** The C header files in `openvaf/osdi/header/` (one per +OSDI version, e.g. `osdi_0_3.h`). + +**Generated files** (per OSDI version, per target triple): + +| File | Content | +|------|---------| +| `openvaf/osdi/src/metadata/osdi_{major}_{minor}.rs` | LLVM type descriptors (`CodegenCx` helpers), `#define`-derived constants, stdlib bitcode `include_bytes!` for each target | +| `openvaf/openvaf/tests/load/*.rs` | Rust `repr(C)` struct mirrors of the OSDI structs, used by integration tests to load and inspect `.osdi` files | +| `melange/core/src/veriloga/*.rs` | Bindings for the Melange simulator | + +The generator parses the C headers with a hand-written recursive-descent +parser (`HeaderParser`) that understands `typedef struct { … } Name;`, +`#define NAME value`, and `typedef T Name;`. It does not handle the full C +preprocessor or arbitrary C syntax — only the subset used in the OSDI headers. + +For each parsed struct, it emits: +- A `fn llvm_ty_Name(cx: &CodegenCx) -> &Type` function that builds the LLVM + struct type from the field types. +- A Rust `#[repr(C)] struct Name { … }` with field types translated from C + primitives (`uint32_t` → `u32`, `double` → `f64`, pointers → `*mut T`, etc.) + +The `target` crate's `get_targets()` iterator provides the list of supported +target triples; one `include_bytes!` per target embeds the pre-compiled OSDI +stdlib bitcode at build time.
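
The C-to-Rust field type translation described above can be sketched as a small recursive function. This is only an illustration under stated assumptions: `c_to_rust_ty` is a hypothetical name, it covers just the mappings named in the text (`uint32_t`, `double`, pointers), and it is not the generator's actual code.

```rust
/// Translate a C field type into the Rust type used in a `#[repr(C)]` mirror.
/// Hypothetical helper: only the mappings named in the text are covered.
pub fn c_to_rust_ty(c_ty: &str) -> String {
    let c_ty = c_ty.trim();
    // A trailing `*` marks a pointer: translate the pointee, then wrap it.
    if let Some(pointee) = c_ty.strip_suffix('*') {
        return format!("*mut {}", c_to_rust_ty(pointee));
    }
    match c_ty {
        "uint32_t" => "u32".to_owned(),
        "double" => "f64".to_owned(),
        // Anything else is assumed to be a struct or typedef name and is
        // passed through unchanged.
        other => other.to_owned(),
    }
}
```

Handling pointers by recursion keeps nested pointers (`T**`) correct without a separate case for each depth.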
+ +--- + +## Worked example: adding a new MIR opcode + +Suppose a new unary opcode `Abs` (absolute value of a float) is needed. + +**Step 1** — add it to the `opcodes!` invocation in `mir_instructions.rs`: + +```rust +Unary(1) -> 1 { + … + Atanh + Abs // ← new +} +``` + +**Step 2** — run `cargo test -p sourcegen`: + +``` +error: openvaf/mir/src/instructions/generated.rs was not up-to-date, updating +error: openvaf/mir/src/builder/generated.rs was not up-to-date, updating +some file was not up to date and has been updated, simply re-run the tests +``` + +Both generated files are rewritten. `generated.rs` gains `Abs = N` in the +`Opcode` enum and a corresponding entry in each of the three arrays. +`builder/generated.rs` gains: + +```rust +fn abs(self, arg0: Value) -> Value { + let (inst, dfg) = self.unary(Opcode::Abs, arg0); + dfg.first_result(inst) +} +``` + +**Step 3** — run `cargo test -p sourcegen` again. All tests pass. Commit both +the source change and the two generated files together. + +--- + +## Key design decisions + +**Tests as the build step, not `build.rs`.** Using `#[test]` rather than a +`build.rs` script keeps generated files in the repository (readable as diffs), +avoids running codegen on every incremental build, and makes it easy to +regenerate selectively (`cargo test -p sourcegen ast`). The trade-off is that +generated files can drift from their inputs between test runs — CI prevents +this from reaching the main branch. + +**`ensure_file_contents` fail-then-rewrite pattern.** Writing the file before +panicking means a developer only needs to run `cargo test` twice (once to +regenerate, once to verify) rather than running a separate generation script. +The panic message tells them exactly what happened. + +**`quote!` + `rustfmt`, not string templates.** Using `proc_macro2`/`quote` +means the code generation logic is type-checked by the Rust compiler and +handles operator precedence, identifier hygiene, and token quoting correctly. 
+`rustfmt` normalises whitespace so the diff between regenerations is minimal. + +**Cross-compilation guard.** The OSDI generator embeds target-specific stdlib +bitcode paths that are only meaningful on the host. Running it during a +cross-compilation would write wrong paths or fail to find the bitcode files. +The `build.rs` / `#![cfg(not(cross_compile))]` guard prevents this silently. + +**Single source of truth per generated artefact.** The `.ungram` file is the +only place that defines the Verilog-A grammar shape; `KINDS_SRC` is the only +place that lists token names; the `opcodes!` invocation is the only place that +lists opcodes. The generators are read-only consumers of these sources — +they never modify them. From 9da66a03213d5defc2edb5e57f28e67688a1fe52 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 12:25:51 +0200 Subject: [PATCH 22/28] docs: add mir_interpret INTERNALS Covers the Data union (#[repr(C)], 8-byte, untagged f64/i32/bool/Spur), UNDEF=0xFF sentinel, from_f64_slice transmute, InterpreterState (vals/prev_bb/ next_inst), Interpreter run loop, eval dispatch for all InstructionData variants (Branch/Jump/Exit/PhiNode/Call/Unary/Binary), and the Func<'a> raw- fn-pointer call handler type. Worked example traces the mir_autodiff numerical verification test pattern. Co-Authored-By: Claude Sonnet 4.6 --- docs/mir_interpret/INTERNALS.md | 340 ++++++++++++++++++++++++++++++++ 1 file changed, 340 insertions(+) create mode 100644 docs/mir_interpret/INTERNALS.md diff --git a/docs/mir_interpret/INTERNALS.md b/docs/mir_interpret/INTERNALS.md new file mode 100644 index 00000000..3ef51e6a --- /dev/null +++ b/docs/mir_interpret/INTERNALS.md @@ -0,0 +1,340 @@ +# `mir_interpret` — MIR tree-walking interpreter + +**Location:** `openvaf/mir_interpret/` +**Role:** A simple tree-walking interpreter for MIR `Function`s. 
Given a +function and a set of parameter values, it evaluates every instruction in +basic-block order and returns the final value of any result `Value`. Used +primarily in tests to numerically verify that auto-differentiation produces +correct derivatives. + +Cross-links: [mir INTERNALS](../mir/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate relationships + +``` +mir_interpret (openvaf/mir_interpret/) + └─► mir_autodiff (tests: numerical verification of AD output) + (also listed as a dependency in sim_back, osdi, mir_build — unused in those + crates' source at the time of writing) +``` + +The crate depends on `mir` (for `Function`, `Opcode`, `Value`, etc.), +`typed-index-collections` (for `TiVec`/`TiSlice`), and `lasso` (for `Spur`, +the interned string handle type). + +--- + +## `Data` — the value type + +```rust +#[repr(C)] +pub union Data { + raw: [u8; 8], + float: f64, + int: i32, + str: Spur, + bool: bool, +} +``` + +`Data` is an 8-byte untagged union that can hold any of the four MIR primitive +types plus a raw byte view. There is no discriminant — the caller must know +which variant is active, just as in MIR itself (each `Value` has a statically +known type). + +### The UNDEF sentinel + +```rust +pub const UNDEF: Data = Data { raw: [u8::MAX; 8] }; // 0xFFFFFFFFFFFFFFFF +``` + +Uninitialized values (results of instructions not yet evaluated) are filled +with `UNDEF`. The all-ones bit pattern is a quiet NaN for `f64`, a non-zero +value for `i32`/`bool`, and a non-zero value for `Spur` (which uses a +`NonZeroU32` internally). This makes use-before-define distinguishable in +debug scenarios via `is_undef()`, though the interpreter does not enforce it at +runtime. 
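
To make the sentinel concrete, here is a reduced sketch of such a union with only the `raw`, `float`, and `int` fields. It is a stand-in written for this document, not the crate's definition; the accessor names `from_f64`, `f64`, and `i32` are assumptions chosen for the example.

```rust
/// Reduced stand-in for the interpreter's value union (illustrative only).
#[repr(C)]
#[derive(Clone, Copy)]
pub union Data {
    raw: [u8; 8],
    float: f64,
    int: i32,
}

impl Data {
    /// All-ones bit pattern used to mark "not yet computed".
    pub const UNDEF: Data = Data { raw: [u8::MAX; 8] };

    /// Start from UNDEF, then overwrite the float field (a safe union write).
    pub fn from_f64(val: f64) -> Data {
        let mut res = Data::UNDEF;
        res.float = val;
        res
    }

    /// Every 64-bit pattern is a valid `f64`, so this read cannot go wrong.
    pub fn f64(self) -> f64 {
        unsafe { self.float }
    }

    /// Every 32-bit pattern is a valid `i32`.
    pub fn i32(self) -> i32 {
        unsafe { self.int }
    }

    /// True when all eight bytes still hold the sentinel pattern.
    pub fn is_undef(self) -> bool {
        unsafe { self.raw == [u8::MAX; 8] }
    }
}
```

Reading `UNDEF` as a float yields NaN and reading it as an integer yields `-1`, which is what makes an unevaluated slot stand out in debug output.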
+ +### Conversions + +All conversions go through the union's fields; reads are guarded by `unsafe`: + +| Direction | Safety argument | +|-----------|----------------| +| `f64 → Data` | `mut res = UNDEF; res.float = val` — `float` field is always written before read | +| `i32 → Data` | same pattern with `res.int` | +| `bool → Data` | same with `res.bool` | +| `Spur → Data` | same with `res.str` | +| `Data → f64` | all bit patterns are valid `f64` values | +| `Data → i32` | all bit patterns are valid `i32` values | +| `Data → bool` | treated the same way, although Rust only defines `0` and `1` as valid `bool` bit patterns, so this read is sound only for values that were written as `bool` | +| `Data → Spur` | `Spur` is a `NonZeroU32`; the conversion `assert`s the low 4 bytes are non-zero | + +`From<Const> for Data` bridges MIR compile-time constants into `Data` +values, handling all four `Const` variants (`Float`, `Int`, `Str`, `Bool`). + +### `from_f64_slice` + +```rust +pub fn from_f64_slice(data: &[f64]) -> &[Data] { + unsafe { transmute(data) } +} +``` + +Zero-copy reinterpretation of a `&[f64]` as `&[Data]`. This is safe because: +- `Data` is `#[repr(C)]` and 8 bytes — the same size and alignment as `f64`. +- The values will only be read through the `float` field. + +This is the primary way test code passes a batch of `f64` parameters to the +interpreter without allocating a separate `Vec<Data>`. + +--- + +## `InterpreterState` + +```rust +pub struct InterpreterState { + vals: TiVec<Value, Data>, // current value for every Value in the function + prev_bb: Block, // the basic block we came from (for phi resolution) + next_inst: Option<Inst>, // None means execution has finished +} +``` + +`vals` is indexed by `Value` and holds the current computed `Data` for each +SSA value. It is initialised in `Interpreter::new`: + +- `ValueDef::Param(param)` — copied from the `args` slice. +- `ValueDef::Const(c)` — converted from `mir::Const` via `Data::from`. +- `ValueDef::Result(…)` and `ValueDef::Invalid` — set to `Data::UNDEF`.
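
The three initialisation rules above can be sketched with plain `f64` slots in place of `Data`. The `ValueDef` enum and `init_vals` function here are simplified stand-ins invented for the example (NaN plays the role of `UNDEF`); they are not the `mir` crate's types.

```rust
/// Simplified stand-in for how a value is defined (illustrative only).
#[derive(Clone, Copy)]
pub enum ValueDef {
    Param(usize), // index into the caller-supplied argument slice
    Const(f64),   // compile-time constant
    Result,       // produced by an instruction that has not run yet
    Invalid,      // placeholder slot
}

/// Build the initial value table: one slot per value definition.
pub fn init_vals(defs: &[ValueDef], args: &[f64]) -> Vec<f64> {
    defs.iter()
        .map(|def| match *def {
            ValueDef::Param(i) => args[i],
            ValueDef::Const(c) => c,
            // Not computed yet: NaN stands in for the UNDEF sentinel here.
            ValueDef::Result | ValueDef::Invalid => f64::NAN,
        })
        .collect()
}
```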
+ +`prev_bb` is updated on every jump or branch so that phi nodes can read the +correct incoming value. + +`next_inst` starts as `func.layout.first_inst(entry_block)` and advances +instruction-by-instruction. Setting it to `None` terminates the main loop. + +### `write` and `read` + +```rust +pub fn write(&mut self, dst: Value, val: impl Into<Data>) +pub fn read<T: From<Data>>(&self, val: Value) -> T +``` + +These are the public accessors for external call handlers (see `Func<'a>`) to +read arguments and write results. + +--- + +## `Interpreter` + +```rust +pub struct Interpreter<'a> { + pub state: InterpreterState, + calls: &'a TiSlice<FuncRef, (Func<'a>, *mut c_void)>, + func: &'a Function, +} +``` + +`calls` maps each `FuncRef` in the function to a native Rust function pointer +plus a `*mut c_void` context pointer. This allows the interpreter to dispatch +`Call` instructions to host code without knowing the function bodies. + +### Construction + +```rust +Interpreter::new(func, calls, args) // full constructor +Interpreter::test(func) // shorthand: no calls, no params +``` + +`test` is used in unit tests that only exercise pure arithmetic MIR without +any external call dependencies. + +### The `run` loop + +```rust +pub fn run(&mut self) { + while let Some(inst) = self.state.next_inst { + self.eval(inst) + } +} +``` + +`eval` advances `next_inst` before performing the computation (so early +returns from control-flow instructions are clean). The loop terminates when: +- An `Exit` instruction sets `next_inst = None`. +- The last instruction of a block falls off the end (unreachable in well-formed + MIR — every block must end with a terminator).
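
The "advance first, then evaluate" shape of the loop can be shown with a toy instruction set. Everything in this sketch (`Inst`, `State`, the index-based program counter) is a simplification made up for illustration; the real interpreter walks the function layout rather than a flat slice.

```rust
/// Toy instruction set (illustrative only).
#[derive(Clone, Copy)]
pub enum Inst {
    Add(usize, usize, usize), // dst, lhs, rhs (indices into `vals`)
    Exit,
}

pub struct State {
    pub vals: Vec<f64>,
    pub next_inst: Option<usize>,
}

pub fn eval(state: &mut State, insts: &[Inst], pc: usize) {
    // Advance first, so control-flow instructions can simply overwrite it.
    state.next_inst = if pc + 1 < insts.len() { Some(pc + 1) } else { None };
    match insts[pc] {
        Inst::Add(dst, a, b) => state.vals[dst] = state.vals[a] + state.vals[b],
        // Exit terminates the loop regardless of what follows.
        Inst::Exit => state.next_inst = None,
    }
}

pub fn run(state: &mut State, insts: &[Inst]) {
    while let Some(pc) = state.next_inst {
        eval(state, insts, pc);
    }
}
```

Because `next_inst` is already set when the match runs, a terminator only has to store its target (or `None`) and return.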
+ +--- + +## `eval` — instruction dispatch + +`eval` matches on `InstructionData` variants: + +### Control flow + +| Variant | Action | +|---------|--------| +| `Branch { cond, then_dst, else_dst }` | Reads `vals[cond]` as `bool`; calls `jmp` to the appropriate block | +| `Jump { destination }` | Calls `jmp` unconditionally | +| `Exit` | Sets `next_inst = None` and returns | + +`jmp` records `prev_bb` (the block of the current instruction) before +installing `first_inst(dst)` as `next_inst`. + +### Phi nodes + +```rust +InstructionData::PhiNode(ref phi) => { + let val = func.dfg.phi_edge_val(phi, state.prev_bb).unwrap(); + let res = func.dfg.first_result(inst); + state.vals[res] = state.vals[val]; + (Opcode::Phi, [].as_slice()) +} +``` + +The phi node looks up the value corresponding to the predecessor block +(`prev_bb`) and copies it into the result. This requires that `jmp` always +records the source block before moving to the destination. + +### External calls + +```rust +InstructionData::Call { func_ref, ref args } => { + let (fun, data) = calls[func_ref]; + let args = args.as_slice(&func.dfg.insts.value_lists); + let rets = func.dfg.inst_results(inst); + fun(&mut state, args, rets, data); + state.next_inst = func.layout.next_inst(inst); + return; +} +``` + +The call handler receives mutable access to `InterpreterState` (so it can +call `state.read`/`state.write`), the argument `Value` slice, the result +`Value` slice, and the opaque context pointer. It is responsible for writing +all result values before returning. 
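
The interaction between `jmp` and phi resolution can be sketched in isolation. The `Phi` and `State` types below are hypothetical simplifications (blocks and values are plain indices, values are `f64`), intended only to show why the source block must be recorded before the phi is evaluated.

```rust
/// A phi node as a list of (predecessor block, incoming value) edges.
pub struct Phi {
    pub edges: Vec<(usize, usize)>,
    pub result: usize,
}

pub struct State {
    pub vals: Vec<f64>,
    pub prev_bb: usize,
}

/// Record the block we are leaving; the real `jmp` also installs the first
/// instruction of the destination block.
pub fn jmp(state: &mut State, current_bb: usize) {
    state.prev_bb = current_bb;
}

/// Copy the value that arrives along the edge from `prev_bb` into the result.
pub fn eval_phi(state: &mut State, phi: &Phi) {
    let src = phi
        .edges
        .iter()
        .find(|edge| edge.0 == state.prev_bb)
        .map(|edge| edge.1)
        .expect("phi has no edge for the predecessor block");
    state.vals[phi.result] = state.vals[src];
}
```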
+ +### Arithmetic and comparison opcodes + +Every `Unary` and `Binary` opcode maps directly to a Rust expression on the +appropriate typed field: + +```rust +Opcode::Fadd => (args(0).f64() + args(1).f64()).into(), +Opcode::Imul => (args(0).i32() * args(1).i32()).into(), +Opcode::Flt => (args(0).f64() < args(1).f64()).into(), +Opcode::Sqrt => f64::sqrt(args(0).f64()).into(), +Opcode::Clog2 => { + let val = args(0).i32(); + let val = 8 * size_of_val(&val) as i32 - val.leading_zeros() as i32; + val.into() +} +``` + +`args` is a local closure `let args = |i| state.vals[args[i]]` that reads from +the current `vals` table. The result is written to `state.vals[res]` after the +match. + +Cast opcodes are also handled inline: + +| Opcode | Semantics | +|--------|-----------| +| `FIcast` | `f64 as i32` (truncation) | +| `IFcast` | `i32 as f64` | +| `BIcast` | `bool as i32` (0 or 1) | +| `IBcast` | `i32 != 0` | +| `FBcast` | `f64.round() as i32` | +| `BFcast` | `bool as i32 as f64` | +| `OptBarrier` | identity (pass-through) | + +String comparisons (`Seq`, `Sne`) compare `Spur` handles directly — because +`lasso` interns strings, equality of `Spur` values implies equality of the +underlying strings. + +--- + +## `Func<'a>` — external call handler type + +```rust +pub type Func<'a> = fn(&mut InterpreterState, &[Value], &[Value], *mut c_void); +``` + +- First argument: mutable state (to call `read`/`write`) +- Second argument: argument `Value` indices (caller reads them with `state.read(args[i])`) +- Third argument: result `Value` indices (caller writes them with `state.write(rets[i], val)`) +- Fourth argument: opaque context (used for closures that capture state as a raw pointer) + +This is a raw function pointer, not a closure, so it is `Send`, has no +implicit lifetime, and can be stored in a `TiSlice` without boxing. 
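
A handler with this shape might look like the following sketch. `State` is a simplified stand-in for `InterpreterState` (values are plain `f64`, `Value`s are indices), and `add_and_count` and `demo` are names invented for the example; the point is only how the context pointer carries caller-owned state into a plain function pointer.

```rust
use std::ffi::c_void;

/// Simplified stand-in for the interpreter state (illustrative only).
pub struct State {
    pub vals: Vec<f64>,
}

/// Same shape as the handler type: state, argument values, result values, context.
pub type Func = fn(&mut State, &[usize], &[usize], *mut c_void);

/// Adds its two arguments and bumps a call counter behind the context pointer.
fn add_and_count(state: &mut State, args: &[usize], rets: &[usize], ctx: *mut c_void) {
    let sum = state.vals[args[0]] + state.vals[args[1]];
    state.vals[rets[0]] = sum;
    // SAFETY: the caller passes a pointer to a live `u32` (see `demo`).
    let counter = unsafe { &mut *(ctx as *mut u32) };
    *counter += 1;
}

/// Invoke the handler once and return the written result and the call count.
pub fn demo() -> (f64, u32) {
    let mut state = State { vals: vec![2.0, 3.0, f64::NAN] };
    let mut calls: u32 = 0;
    let handler: Func = add_and_count;
    handler(&mut state, &[0, 1], &[2], &mut calls as *mut u32 as *mut c_void);
    (state.vals[2], calls)
}
```

The handler itself captures nothing, which is why it can be stored as a bare `fn` pointer; all per-call-site state travels through the `*mut c_void`.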
+ +--- + +## Worked example: numerically verifying autodiff + +`mir_autodiff/src/builder/tests.rs` uses the interpreter to check that +auto-differentiation produces numerically correct results. The pattern is: + +```rust +fn check_num(src: &str, expected_ir: Expect, args: &[f64], expected_val: f64) { + // 1. Parse MIR text into a Function + let (mut func, _) = parse_function(src).unwrap(); + + // 2. Run auto-differentiation (modifies func in place) + auto_diff(&mut func, &dom_tree, &unknowns, &[]); + + // 3. Run the interpreter with the given float arguments + let mut interp = Interpreter::new( + &func, + TiSlice::from_ref(&[]), // no external calls + TiSlice::from_ref(Data::from_f64_slice(args)), // params as Data + ); + interp.run(); + + // 4. Read the derivative result value (always Value(100) in these tests) + let val: f64 = interp.state.read(100u32.into()); + + // 5. Compare numerically with tolerance + assert!(val.approx_eq(expected_val, margin)); +} +``` + +`Data::from_f64_slice(args)` reinterprets the `&[f64]` test arguments as +`&[Data]` without copying. The interpreter evaluates the AD-augmented function +and the derivative result is read back as an `f64`. If it deviates from the +symbolic expectation by more than 10×ε, the test prints the MIR and fails. + +--- + +## Key design decisions + +**Untagged union over enum for `Data`.** An `enum { Float(f64), Int(i32), … }` +would cost an extra byte for the discriminant and require match arms everywhere. +Since MIR is typed (each `Value` has a statically known type, determined during +`hir_lower`), the interpreter already knows which field to read. The union +keeps each `Data` at exactly 8 bytes — the same size as `f64` — so +`from_f64_slice` can safely transmute. + +**`UNDEF = [0xFF; 8]` rather than zero.** Zero is a valid, common value for +all four types (0.0, 0, false, and technically a null-like `Spur`). 
Using +all-ones makes use-before-define more visually obvious in debug output and is +safe for `f64` (a quiet NaN rather than zero or a meaningful number). + +**`Func<'a>` as a raw fn pointer + `*mut c_void`.** Storing closures would +require boxing or lifetime-erased trait objects. The raw pointer design matches +the C ABI convention (function + context), avoids heap allocation per call +entry, and makes the call table a flat `TiSlice` without indirection. + +**No type checking at runtime.** The interpreter trusts that the MIR is +well-typed (which it is, having been produced by `hir_lower` and verified by +`mir_opt`). Adding a runtime type tag to `Data` and checking it on every +opcode would double the overhead for a tool used only in tests. + +**`prev_bb` on `InterpreterState` rather than `Interpreter`.** The call +handler `Func<'a>` receives `&mut InterpreterState`, not `&mut Interpreter`. +Keeping `prev_bb` on the state rather than on the outer struct makes the full +CFG context available to call handlers without exposing the rest of the +interpreter's internals. From 9fef1acd902a89a42d3a17204b84d65feb9a092b Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 12:42:27 +0200 Subject: [PATCH 23/28] docs: add mir_reader INTERNALS Covers the MIR text format (Cranelift .clif dialect), Lexer token set and scanning rules (numbers/NaN/Inf, entity names via split_entity_name, %names, strings with escapes, -> arrow detection), Parser single-token lookahead recursive-descent design, forward-reference handling via make_invalid_value, Context add_sig/add_block helpers, parse_inst_operands dispatch by InstructionFormat, phi [v,block] pair syntax, and the lasso Rodeo returned alongside the Function. Worked example traces a mir_autodiff test fixture. 
Co-Authored-By: Claude Sonnet 4.6 --- docs/mir_reader/INTERNALS.md | 377 +++++++++++++++++++++++++++++++++++ 1 file changed, 377 insertions(+) create mode 100644 docs/mir_reader/INTERNALS.md diff --git a/docs/mir_reader/INTERNALS.md b/docs/mir_reader/INTERNALS.md new file mode 100644 index 00000000..38dbcabb --- /dev/null +++ b/docs/mir_reader/INTERNALS.md @@ -0,0 +1,377 @@ +# `mir_reader` — MIR text format parser + +**Location:** `openvaf/mir_reader/` +**Role:** Parses the human-readable MIR text format into a `mir::Function`. +The text format is Cranelift's `.clif` format, adapted for OpenVAF's MIR. +`mir_reader` is the companion to `Function::print` (in the `mir` crate); the +two together support round-trip testing of MIR transformations. + +Cross-links: [mir INTERNALS](../mir/INTERNALS.md) · +[mir_interpret INTERNALS](../mir_interpret/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate relationships + +``` +mir_reader (openvaf/mir_reader/) + └─► mir_autodiff (tests: parse MIR text, run AD, interpret result) +``` + +The crate depends on `mir`, `bforest` (for `Map` used in phi node parsing), +and `lasso` (for string interning). It is only used as a `[dev-dependency]` in +`mir_autodiff`. + +--- + +## The MIR text format + +A function in the text format looks like this (taken from the roundtrip test): + +``` +function %bar(v4, v8, v9, v10) { + v5 = iconst 42 + v6 = iconst 23 +block0: + v7 = iadd v5, v8 + v11 = iadd v6, v9 + v12 = ilt v8, v10 + br v12, block1, block2 + +block1: + v13 = isub v7, v10 + jmp block3 + +block2: + jmp block3 + +block3: + v14 = phi [v13, block1], [v11, block2] +} +``` + +The format has four sections: + +1. **Header**: `function %name(v0, v1, …)` — function name and parameter + value numbers. +2. **Preamble**: constant definitions (`fconst`, `iconst`, `sconst`) and + function signature declarations (`fn0 = const fn %name(1) -> 1`), before + the first block. +3. 
**Basic blocks**: each introduced by `blockN:` followed by zero or more + instructions. +4. **Closing `}`**. + +### Token reference + +| Text | Token | Examples | +|------|-------|---------| +| `vN` | `Value(Value)` | `v0`, `v42` | +| `blockN` | `Block(Block)` | `block0`, `block3` | +| `fnN` | `FuncRef(u32)` | `fn0`, `fn2` | +| `%name` | `Name(&str)` | `%bar`, `%ddx_v10` | +| `"…"` | `String(&str)` | `"hello"` | +| `0x…` or decimal | `Integer(&str)` | `42`, `0xff` | +| float literal | `Float(&str)` | `0x1.8p+1`, `NaN`, `Inf` | +| `#89AF` | `HexSequence(&str)` | | +| `@00c7` | `SourceLoc(&str)` | | +| `;` to end of line | `Comment(&str)` | | +| identifiers | `Identifier(&str)` | `iadd`, `fconst`, `loop` | + +Whitespace (spaces, tabs, `\n`, `\r`) is skipped by the lexer. Comments +(`; …`) are lexed as `Comment` tokens and then discarded by the parser. + +--- + +## Crate structure + +``` +mir_reader/src/ + lib.rs — public API: parse_function, parse_functions, ParseError, LexError + error.rs — Location, ParseError, ParseResult, err! macro + lexer.rs — Lexer, Token, split_entity_name + lexer/tests.rs + parser.rs — Parser, Context, VariableArgs + parser/tests.rs +``` + +--- + +## `error.rs` — error types + +```rust +pub struct Location { pub line_number: usize } + +pub struct ParseError { + pub location: Location, + pub message: String, + pub is_warning: bool, +} + +pub type ParseResult<T> = Result<T, ParseError>; +``` + +`Location` is just a line number — no column, no file path. `line_number == 0` +means the error originated from a command-line argument rather than source text.
+ +The `err!` macro constructs a `ParseError` at the current `loc`: + +```rust +err!(self.loc, "expected '{' before function body") +err!(self.loc, "expected {} result values, {} given", num_results, results.len()) +``` + +--- + +## `lexer.rs` — `Lexer` + +```rust +pub struct Lexer<'a> { + source: &'a str, + chars: CharIndices<'a>, + lookahead: Option<char>, + pos: usize, + line_number: usize, +} +``` + +The lexer keeps one character of lookahead and advances character-by-character +via `next_ch`. It tracks `line_number` by counting `'\n'` characters. All +`Token` variants that carry string data hold `&'a str` slices directly into +`source` — no copies. + +### Key lexing rules + +- **Numbers**: `scan_number` handles `+`/`-` signs, hex prefixes (`0x`), + floats (`.` or `p` exponent), `NaN[:payload]`, `Inf`, and `sNaN`. A `-` + followed by a non-numeric character is emitted as `Token::Minus` rather than + a number prefix. +- **Entity names**: `scan_word` reads an alphanumeric word then calls + `split_entity_name` to check if it matches `v{N}`, `block{N}`, or `fn{N}`. + If so, the corresponding typed token is returned; otherwise `Token::Identifier`. +- **`split_entity_name`**: splits a word at the boundary between a letter + prefix and a decimal suffix. Leading zeros in the suffix are rejected + (e.g. `block007` is not a valid entity name). +- **`%names`**: `scan_name` reads alphanumeric + `_` characters after `%` and + returns the interior as `Token::Name`. +- **Strings**: `scan_string` reads until the closing `"`, handling `\0`, + `\n`, `\r`, `\t`, `\\`, `\"` escape sequences. +- **`->` arrow**: detected with `looking_at("->")` before trying to scan `-` + as a number. +- **`@N`** (source location) and **`#N`** (hex sequence): each scanned until a + non-hex digit is found.
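
The entity-name rule from the list above can be sketched as a standalone function. This is a reimplementation written for this document from the prose description, not the crate's source, so the exact signature and edge-case behaviour are assumptions.

```rust
/// Split `word` into a prefix and a decimal entity number, e.g. "block3".
/// Returns `None` when there is no numeric suffix or it has leading zeros.
pub fn split_entity_name(word: &str) -> Option<(&str, u32)> {
    // Count the trailing ASCII digits to find the prefix/suffix boundary.
    let digits = word.bytes().rev().take_while(u8::is_ascii_digit).count();
    let (prefix, suffix) = word.split_at(word.len() - digits);
    // Reject leading zeros such as "block007"; a lone "0" is still fine.
    if suffix.len() > 1 && suffix.starts_with('0') {
        return None;
    }
    // An empty or overflowing suffix fails to parse and yields `None`.
    suffix.parse().ok().map(|n| (prefix, n))
}
```

A caller would then match on the prefix (`"v"`, `"block"`, `"fn"`) to pick the typed token and fall back to an identifier otherwise.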
+ +--- + +## `parser.rs` — `Parser` and `Context` + +### `Parser<'a>` + +```rust +pub struct Parser<'a> { + lex: Lexer<'a>, + lex_error: Option<LexError>, + lookahead: Option<Token<'a>>, + loc: Location, + interner: Rodeo, // lasso string interner for sconst values +} +``` + +The parser is a single-token lookahead recursive-descent parser over the +lexer's token stream. `token()` lazily fills `lookahead` by calling +`lex.next()`, skipping `Comment` tokens implicitly (they are consumed but not +stored). `consume()` takes the lookahead. `match_token` / `optional` are the +standard LL(1) primitives. + +`interner` is a `lasso::Rodeo` that interns string constant values (`sconst`). +All interned `Spur` handles refer to this rodeo; the caller receives it back +alongside the `Function` to allow string lookups after parsing. + +### `Context` + +```rust +struct Context { function: Function } +``` + +`Context` wraps the `Function` being built and provides two allocating helpers: + +- `add_sig(sig, data)` — grows `dfg.signatures` up to the given `FuncRef` + index (filling gaps with default signatures) and then writes `data` at that + slot. This allows preamble declarations to appear in any order. +- `add_block(block)` — grows `layout` up to the given block number and + appends it. Blocks are always processed in declaration order. + +### `match_value` + +```rust +fn match_value(&mut self, ctx: &mut Context, err_msg: &str) -> ParseResult<Value> +``` + +When a `Token::Value(v)` is consumed, `ctx.function.dfg` is grown +(`make_invalid_value()` is called repeatedly) until `num_values > v`. This +allows forward references: a value like `v14` can appear in a phi-node operand +before its defining instruction has been parsed. The defining instruction's +`make_inst_results_reusing` call later patches the `Invalid` slot to the real +`ValueDef`.
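
The placeholder scheme can be illustrated with a plain vector of definitions. The `Dfg` and `ValueDef` types and the `reference`/`define` methods below are hypothetical simplifications that only mirror the idea; they are not the `mir` crate's API.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ValueDef {
    Invalid,              // referenced but not yet defined
    Result { inst: u32 }, // defined by the given instruction
}

#[derive(Default)]
pub struct Dfg {
    pub values: Vec<ValueDef>,
}

impl Dfg {
    /// Called when `vN` appears as an operand: grow the table with
    /// placeholders until slot N exists.
    pub fn reference(&mut self, v: usize) -> usize {
        while self.values.len() <= v {
            self.values.push(ValueDef::Invalid);
        }
        v
    }

    /// Called when the defining instruction is parsed: patch the slot,
    /// reusing a placeholder if one was already allocated.
    pub fn define(&mut self, v: usize, inst: u32) {
        self.reference(v);
        self.values[v] = ValueDef::Result { inst };
    }
}
```

Any slot that is still `Invalid` once parsing finishes was referenced but never defined, which a caller could report as an error.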
+ +### Parsing pipeline + +``` +parse_functions / parse_function + └── parse_function + ├── match_identifier("function") + ├── parse_external_name → function name + ├── parse_func_params → v0, v1, … registered as Param values + ├── match_token(LBrace) + ├── parse_preamble → constant defs + signature decls + ├── parse_function_body → basic blocks + instructions + └── match_token(RBrace) +``` + +**Preamble** (`parse_preamble`): loops on: +- `FuncRef` token → `parse_signature_decl` → `ctx.add_sig` +- `Value = fconst|iconst|sconst …` → constant definition via `dfg.values.fconst_at` etc. +- Anything else → exits the preamble loop. + +**Function body** (`parse_function_body`): loops calling `parse_basic_block` +until a `RBrace` is seen. + +**Basic block** (`parse_basic_block`): consumes `blockN:`, calls `ctx.add_block`, +then loops calling `parse_instruction` while the lookahead is a `Value`, +`Identifier`, `LBracket`, or `SourceLoc` token. + +**Instruction** (`parse_instruction` + `parse_inst_operands`): reads the +opcode identifier via `text.parse::<Opcode>()` (using the `FromStr` impl +generated by `sourcegen`), then dispatches on `opcode.format()`: + +| Format | Text syntax | Example | +|--------|-------------|---------| +| `Unary` | `opcode v` | `fneg v3` | +| `Binary` | `opcode v, v` | `fadd v1, v2` | +| `Jump` | `jmp blockN` | `jmp block3` | +| `Branch` | `br v, blockN[loop]?, blockN` | `br v12, block1[loop], block2` | +| `Call` | `opcode fnN(v, …)` | `call fn0(v1, v2)` | +| `PhiNode` | `phi [v, blockN], …` | `phi [v13, block1], [v11, block2]` | +| `Exit` | `exit` | `exit` | + +After building `InstructionData`, the parser calls: +- `dfg.make_inst(inst_data)` — allocates the `Inst`. +- `dfg.make_inst_results_reusing(inst, results)` — creates result `Value`s, + reusing the pre-allocated `Invalid` slots from earlier `match_value` calls. +- `layout.append_inst_to_bb(inst, block)` — places the instruction in the CFG.
+ +The optional `@hexnum` source location prefix is parsed by `optional_srcloc` +and stored in `func.srclocs[inst]`. + +### Phi node parsing + +Phi operands are pairs `[value, block]`: + +``` +phi [v13, block1], [v11, block2] +``` + +For each pair, the value is pushed onto a `ValueList` (getting a position +index), and the block is mapped to that position in a `bforest::Map` +(the `blocks` field of `PhiNode`). This is the same representation used by the +MIR at runtime. + +### `VariableArgs` + +A thin `Vec<Value>` newtype with a helper: + +```rust +pub fn into_value_list(self, fixed: &[Value], pool: &mut ValueListPool) -> ValueList +``` + +Used by `Call` parsing to convert the argument list into a `ValueList` in the +pool, prepending any fixed arguments. + +### Signature syntax + +``` +fn0 = const fn %name(2) -> 1 +fn1 = fn %callback(0) -> 0 +``` + +- `const` prefix → `has_sideeffects = false` +- `fn %name` → function name +- `(N)` → parameter count +- `-> N` → return count (optional; 0 if absent) + +--- + +## Public API + +```rust +pub fn parse_function(text: &str) -> ParseResult<(Function, Rodeo)> +pub fn parse_functions(text: &str) -> ParseResult<(Vec<Function>, Rodeo)> +``` + +Both return the `Rodeo` string interner alongside the function(s) so that +callers can look up `sconst` string values by their `Spur` handle. The rodeo +is created fresh per `Parser` and not shared across multiple `parse_function` +calls in a single session.
+ +--- + +## Worked example: `mir_autodiff` test + +The `check_num` test helper in `mir_autodiff/src/builder/tests.rs` uses +`mir_reader` to set up a fixture function, then verifies both the textual and +numerical output of auto-differentiation: + +```rust +let src = r##" + function %bar(v10, v11) { + fn0 = const fn %ddx_v10(1) -> 1 + fn1 = const fn %ddx_v11(1) -> 1 + block0: + v0 = fmul v10, v11 + v1 = call fn0(v0) + exit + } +"##; + +let (mut func, _rodeo) = parse_function(src).unwrap(); +// … run auto_diff, then interpret and check v100 ≈ expected derivative +``` + +`parse_function` converts the text into a live `Function` complete with +`DataFlowGraph`, `Layout`, signatures, and constant values. The `_rodeo` is +discarded here because there are no `sconst` values in the fixture. After +`auto_diff` transforms the function, `Function::print` serialises it back to +text (using the same `Rodeo`) for the `expect_test` snapshot comparison. + +--- + +## Key design decisions + +**Cranelift `.clif` format as the baseline.** Reusing the established Cranelift +IR text format means the MIR is human-readable in a familiar style and the +parser design is well-understood. The deviations from `.clif` are minor +(phi node syntax, no type annotations) and documented by the grammar comments +in `parser.rs`. + +**`&'a str` slices, not `String` copies.** All `Token` variants that carry +text (`Name`, `Identifier`, `Float`, `Integer`, `String`, `HexSequence`, +`SourceLoc`, `Comment`) hold references into the original source string. No +heap allocation occurs during lexing. The parser interns only `sconst` string +values (into the `Rodeo`) since those need to outlive the source text. + +**Forward references via `make_invalid_value`.** Rather than requiring a +two-pass parse or topological ordering of values, `match_value` eagerly +allocates `Invalid` placeholder slots up to the referenced index. The defining +instruction then overwrites the slot via `make_inst_results_reusing`. 
This +keeps the parser single-pass at the cost of allocating a few extra `Invalid` +entries for forward references. + +**`lasso::Rodeo` returned to the caller.** The interner is not hidden inside +the parser; it is handed back alongside the `Function` so that callers can +resolve `Spur` handles to their underlying strings. This avoids a separate +global interner and keeps `mir_reader` stateless between calls. + +**`parse_functions` for multi-function files.** The `mir_autodiff` tests +occasionally embed multiple functions in a single source string. `parse_functions` +loops `parse_function` until EOF, sharing one `Parser` (and one `Rodeo`) across +all functions in the file. From 3b24dc574a36a281e35cde0d11341bf99679ddf9 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 12:58:35 +0200 Subject: [PATCH 24/28] docs: add openvaf and openvaf_driver INTERNALS Covers the compilation library (Opts, CompilationDestination, CompilationTermination, compile/expand pipeline, cache hash logic) and the CLI binary (mimalloc allocator, ARGS mutex, clap flag table, matches_to_opts, crash_report panic hook). Co-Authored-By: Claude Sonnet 4.6 --- docs/openvaf/INTERNALS.md | 283 +++++++++++++++++++++++++++++++ docs/openvaf_driver/INTERNALS.md | 226 ++++++++++++++++++++++++ 2 files changed, 509 insertions(+) create mode 100644 docs/openvaf/INTERNALS.md create mode 100644 docs/openvaf_driver/INTERNALS.md diff --git a/docs/openvaf/INTERNALS.md b/docs/openvaf/INTERNALS.md new file mode 100644 index 00000000..6558e9fc --- /dev/null +++ b/docs/openvaf/INTERNALS.md @@ -0,0 +1,283 @@ +# `openvaf` — Compilation pipeline library + +**Location:** `openvaf/openvaf/` +**Role:** The library crate that ties the entire compiler together. It exposes +two entry-point functions — `compile` and `expand` — plus the `Opts` struct +that carries every compilation option. 
The binary crate `openvaf-driver` +depends on this crate; it is also the natural integration point for any tool +that wants to embed the OpenVAF compiler. + +Cross-links: [basedb INTERNALS](../basedb/INTERNALS.md) · +[hir INTERNALS](../hir/INTERNALS.md) · +[sim_back INTERNALS](../sim_back/INTERNALS.md) · +[osdi INTERNALS](../osdi/INTERNALS.md) · +[mir_llvm INTERNALS](../mir_llvm/INTERNALS.md) · +[linker_target INTERNALS](../linker_target/INTERNALS.md) · +[base_n INTERNALS](../base_n/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate layout + +``` +openvaf/openvaf/src/ + lib.rs — Opts, CompilationDestination, CompilationTermination, compile(), expand() + cache.rs — cache file name derivation (MD5 hash → base-36 filename) +``` + +The crate has no `main.rs`; it is a `[lib]` that is called by `openvaf-driver` +and by the integration tests (`tests/integration.rs`, `harness = false`). + +--- + +## Public types + +### `Opts` + +```rust +pub struct Opts { + pub dry_run: bool, + pub defines: Vec, // -D MACRO[=VALUE] + pub codegen_opts: Vec, // -C OPT[=VALUE] (passed to LLVM) + pub lints: Vec<(String, LintLevel)>, + pub input: Utf8PathBuf, // root .va file + pub output: CompilationDestination, + pub include: Vec, // -I directories + pub opt_lvl: LLVMCodeGenOptLevel, // LLVM optimisation level (0–3) + pub target: Target, // target triple + pub target_cpu: String, // "native", "generic", or specific CPU + pub dump_mir: bool, // print optimised MIR to stdout + pub dump_unopt_mir: bool, // print unoptimised MIR to stdout + pub dump_ir: bool, // print optimised LLVM IR to stdout + pub dump_unopt_ir: bool, // print unoptimised LLVM IR to stdout +} +``` + +`Opts` is `Clone` so the driver can stash a copy in the crash-report mutex +before compilation starts. 
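The stash-a-copy pattern mentioned above can be sketched as follows. This is only an illustration: the `Opts` here is a trimmed, hypothetical stand-in with two fields, and `stash` and `describe_for_crash_log` are invented helper names, not functions from `openvaf-driver`.

```rust
use std::sync::Mutex;

/// Trimmed, hypothetical stand-in for `openvaf::Opts`.
#[derive(Clone, Debug, PartialEq)]
struct Opts {
    input: String,
    dry_run: bool,
}

/// Global slot that a panic hook could read later.
static ARGS: Mutex<Option<Opts>> = Mutex::new(None);

/// Store a clone of the options before compilation starts.
fn stash(opts: &Opts) {
    // Recover the guard even if an earlier panic poisoned the mutex.
    let mut guard = ARGS.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    *guard = Some(opts.clone());
}

/// What a crash reporter might append to its log.
fn describe_for_crash_log() -> String {
    // `try_lock` avoids blocking inside a panic hook if the lock is held.
    match ARGS.try_lock() {
        Ok(guard) => match &*guard {
            Some(opts) => format!("{opts:?}"),
            None => "<no options recorded>".to_owned(),
        },
        Err(_) => "<options unavailable>".to_owned(),
    }
}

fn main() {
    let opts = Opts { input: "model.va".to_owned(), dry_run: false };
    stash(&opts);
    println!("{}", describe_for_crash_log());
}
```

Because the global holds a clone, the caller keeps ownership of its own `Opts` and can pass `&opts` on to the compiler unchanged.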
+ +### `CompilationDestination` + +```rust +pub enum CompilationDestination { + Path { lib_file: Utf8PathBuf }, // explicit -o output path + Cache { cache_dir: Utf8PathBuf }, // batchmode: content-addressed cache +} +``` + +### `CompilationTermination` + +```rust +pub enum CompilationTermination { + Compiled { lib_file: Utf8PathBuf }, + FatalDiagnostic, // errors were emitted; caller should exit +} +``` + +`FatalDiagnostic` means the compiler already printed the errors to stderr via +`ConsoleSink`; the caller should exit with a non-zero code without printing +anything further. + +--- + +## `compile` — the full pipeline + +```rust +pub fn compile(opts: &Opts) -> Result<CompilationTermination> +``` + +This function orchestrates the entire compiler. The steps, in order: + +### 1. Resolve input and create the Salsa database + +```rust +let input = opts.input.canonicalize()?; +let input = AbsPathBuf::assert(input); +let db = CompilationDB::new_fs(input, &opts.include, &opts.defines, &opts.lints)?; +``` + +`CompilationDB` is the Salsa incremental compilation database. It owns the +VFS, all parsed source, the HIR, type information, and every other +query-computed value. Building it registers the root file with the VFS and +sets up the preprocessor include path. + +### 2. Cache check (batchmode only) + +```rust +if let CompilationDestination::Cache { cache_dir } = &opts.output { + let file_name = cache::file_name(&db, opts); + let lib_file = cache_dir.join(file_name); + if cfg!(not(debug_assertions)) && lib_file.exists() { + return Ok(CompilationTermination::Compiled { lib_file }); + } + create_dir_all(cache_dir)?; +} +``` + +In batchmode the output filename is a content-addressed hash of the input +(see [cache logic](#cache-logic-cachers) below). If the file already exists and this +is a release build, compilation is skipped entirely and the cached path is +returned. This is the only path that returns `Compiled` without running the +frontend; the other early returns are the fatal-diagnostic and dry-run cases +described below. + +### 3.
Module collection + +```rust +let modules = collect_modules(&db, false, &mut ConsoleSink::new(&db))?; +``` + +`sim_back::collect_modules` runs the full frontend (preprocessor → parser → +HIR lowering → type checking → `sim_back` model extraction). It returns a +`Vec` — one entry per `module … endmodule` block. If any fatal +diagnostic is emitted the function returns `None` and `compile` returns +`FatalDiagnostic`. + +### 4. LLVM backend initialisation + +```rust +let back = LLVMBackend::new(&opts.codegen_opts, &opts.target, opts.target_cpu.clone(), &[]); +``` + +`LLVMBackend` initialises the LLVM target machine for the requested triple and +CPU. Codegen options (`-C` flags) are forwarded directly to LLVM. + +If `opts.dry_run` is set, compilation returns here — the frontend ran +(catching any parse/type errors) but no object files are produced. + +### 5. OSDI compilation + +```rust +let (paths, compiled_modules, literals) = osdi::compile( + &db, &modules, &lib_file, &opts.target, &back, + /*emit_ir=*/true, opts.opt_lvl, + opts.dump_mir, opts.dump_unopt_mir, + opts.dump_ir, opts.dump_unopt_ir, +); +``` + +`osdi::compile` runs MIR construction, optimisation, AD differentiation, LLVM +codegen, and writes one temporary object file per module per compilation unit. +The `dump_*` flags cause intermediate representations to be printed to stdout +at the relevant stages. `paths` is the list of temporary `.oN` object files +to link. + +### 6. MIR dump (optional) + +If `dump_mir` or `dump_unopt_mir` is set, the function prints module names, +the HIR string interner contents, and MIR for each compiled module using +`sim_back::print_module` and `sim_back::print_intern`. + +### 7. Linking + +```rust +link(None, &opts.target, lib_file.as_ref(), |linker| { + for path in &paths { linker.add_object(path); } +})?; +``` + +The `linker::link` function writes the final `.osdi` shared library. The +closure adds each temporary object file to the linker command line. + +### 8. 
Cleanup and timing + +Temporary object files are deleted, then a green `Finished building … in Xs` +message is printed to stderr. + +--- + +## `expand` — preprocessor-only mode + +```rust +pub fn expand(opts: &Opts) -> Result<CompilationTermination> +``` + +Runs only the preprocessor (triggered by `--print-expansion`). It creates +`CompilationDB`, runs `cu.preprocess(&db)`, and prints each token's source +text to stdout — with a newline after line comments and a space after all +other tokens — approximating the expanded source. Diagnostics are collected +and summarised; if fatal errors are present `FatalDiagnostic` is returned. + +--- + +## Cache logic (`cache.rs`) + +```rust +pub fn file_name(db: &CompilationDB, opts: &Opts) -> String +``` + +Computes a deterministic cache filename for a given compilation. The filename +is `{hash}.osdi` where `hash` is the base-36 encoding of the MD5 digest (as a +`u128`) of: + +| Input | Notes | +|-------|-------| +| `root_file().0.to_ne_bytes()` | File ID of the root `.va` file | +| `defines.len()` + each `define` string | Preprocessor macro definitions | +| `env!("CARGO_PKG_VERSION")` | Compiler version (invalidates cache on upgrade) | +| lint overwrite bytes | `LintLevel` values, cast to bytes via `slice::from_raw_parts` | +| All non-trivia preprocessor tokens | The source content after macro expansion, one `" "` separator per token | + +The MD5 is computed over the **preprocessed** token stream (not the raw +source), so the contents of `` `include `` files are transitively covered by the hash. Two +compilations that produce identical preprocessed output and use the same +compiler version, defines, and lints will always share a cache entry. + +The MD5 digest is reinterpreted as a `u128` via `u128::from_ne_bytes` and +encoded with `base_n::encode(hash, base_n::CASE_INSENSITIVE)` (base 36, +digits + lowercase letters). A `u128` never needs more than 25 base-36 digits, +so the filename stem is at most 25 characters long and is safe on +case-insensitive filesystems.
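The digest-to-filename step can be illustrated with a stand-alone sketch. `encode_base36` and `cache_file_name` are hypothetical helpers written for this example; they assume an unpadded, most-significant-digit-first encoding with the alphabet `0-9a-z`, and they are not the `base_n` crate's implementation.

```rust
/// Encode `n` in base 36 using digits followed by lowercase letters.
/// No padding is applied, so small values give short strings.
fn encode_base36(mut n: u128) -> String {
    const DIGITS: &[u8; 36] = b"0123456789abcdefghijklmnopqrstuvwxyz";
    if n == 0 {
        return "0".to_owned();
    }
    let mut buf = Vec::new();
    while n > 0 {
        // Least-significant digit first; reversed below.
        buf.push(DIGITS[(n % 36) as usize]);
        n /= 36;
    }
    buf.reverse();
    String::from_utf8(buf).expect("alphabet is ASCII")
}

/// Turn a 16-byte digest into a cache file name of the form `{hash}.osdi`.
fn cache_file_name(digest: [u8; 16]) -> String {
    let hash = u128::from_ne_bytes(digest);
    format!("{}.osdi", encode_base36(hash))
}

fn main() {
    // 36^24 < 2^128 <= 36^25, so the largest u128 needs exactly 25 digits.
    println!("max digits: {}", encode_base36(u128::MAX).len());
    println!("{}", cache_file_name([0xff; 16]));
}
```

Using only digits and lowercase letters means two distinct hashes can never collide on a filesystem that folds case, which is the property the text above relies on.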
+ +> **TODO(verify):** The cache does not cover `opts.opt_lvl`, `opts.target`, or +> `opts.target_cpu`. Two compilations with different optimisation levels or +> target triples will produce the same cache filename if the source and defines +> match, and the second will find and return the first's output. This may be +> intentional (OSDI output is target-independent in practice) or a limitation. + +--- + +## Re-exported items + +`openvaf` re-exports several items from its dependencies to give `openvaf-driver` +a single import point: + +| Re-export | Source | +|-----------|--------| +| `builtin_lints` | `basedb::lints::builtin` | +| `LintLevel` | `basedb::lints` | +| `LLVMCodeGenOptLevel` | `llvm_sys::target_machine` | +| `AbsPathBuf` | `paths` | +| `host_triple` | `target` | +| `get_target_names`, `Target` | `target::spec` | + +--- + +## Integration tests (`tests/integration.rs`) + +The integration test binary (`harness = false`) uses `mini_harness` to run one +test per directory in `integration_tests/`. Each test calls `compile` with +`dry_run = true`, checks that no fatal diagnostic was produced, and compares +the diagnostic output against a snapshot. This verifies the complete pipeline +from source text to compiled output without requiring a simulator. + +--- + +## Key design decisions + +**Library, not binary.** Exposing `compile` and `expand` as a library API +rather than inline in `main` makes the compiler embeddable and testable without +spawning a process. The integration tests call `compile` directly and inspect +`CompilationTermination` without any subprocess overhead. + +**`CompilationTermination::FatalDiagnostic` instead of `Err`.** Fatal +diagnostics (type errors, undefined names, etc.) are not `anyhow::Error`s — +they have already been formatted and printed to stderr by `ConsoleSink`. The +`FatalDiagnostic` variant signals the caller to exit with a non-zero code +without double-printing the error. 
Only I/O errors (failed to read file, failed +to write output) propagate as `Err`. + +**Cache keyed on preprocessed tokens, not raw source bytes.** Hashing the +preprocessed token stream rather than the raw bytes means that changes to +whitespace, comments, or `\`include` file structure that don't affect the +semantics don't invalidate the cache. Two `.va` files with different formatting +but identical token sequences produce the same cache entry. diff --git a/docs/openvaf_driver/INTERNALS.md b/docs/openvaf_driver/INTERNALS.md new file mode 100644 index 00000000..acdc569d --- /dev/null +++ b/docs/openvaf_driver/INTERNALS.md @@ -0,0 +1,226 @@ +# `openvaf-driver` — CLI binary + +**Location:** `openvaf/openvaf-driver/` +**Role:** The `openvaf` executable. Parses command-line arguments with `clap`, +translates them into an `Opts` struct, installs a crash-report panic hook, and +calls `openvaf::compile` or `openvaf::expand`. Contains no compiler logic of +its own — all compilation is delegated to the `openvaf` library crate. + +Cross-links: [openvaf INTERNALS](../openvaf/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate layout + +``` +openvaf/openvaf-driver/src/ + main.rs — entry point, global allocator, error printing + cli_def.rs — clap Command definition and flag name constants + cli_process.rs — ArgMatches → Opts translation + crash_report.rs — panic hook that writes a crash log to /tmp +``` + +`openvaf-driver` is a binary-only crate (no `[lib]`). Its only public surface +is the `openvaf` executable. + +--- + +## `main.rs` + +### Global allocator + +```rust +#[global_allocator] +static GLOBAL: MiMalloc = MiMalloc; +``` + +`mimalloc` replaces the default system allocator for the process. This gives +measurable throughput improvements on multi-module compilations where many +small allocations are made by the Salsa query system. 
+ +### `ARGS` mutex + +```rust +static ARGS: Mutex<Option<Opts>> = Mutex::new(None); +``` + +After `matches_to_opts` succeeds, the resolved `Opts` is stored here. The +crash reporter reads it in the panic hook to include the compilation arguments +in the crash log. The mutex is necessary because the panic hook runs on an +arbitrary thread. + +### `main` flow + +``` +main() + ├─ clap::main_command().get_matches() — parse CLI + ├─ crash_report::install_panic_handler() + ├─ env_logger::init() — OPENVAF_LOG / OPENVAF_LOG_STYLE env vars + └─ wrapped_main(matches) + ├─ matches_to_opts(matches) — may exit(0) for --lints / --supported-targets + ├─ *ARGS.lock() = Some(opts) + ├─ if --print-expansion → expand(&opts) → exit(0 or 65) + ├─ if --dump-json → bail! (unimplemented) + └─ compile(&opts) → print lib_file if Cache mode → exit(0 or 65) +``` + +Errors from `wrapped_main` are printed as a `clap`-style chain of red `error:` +lines, one per `anyhow` cause. The process then exits without a code (falls off +`main`, which exits 0 — the non-zero code is only set via `exit()` inside +`wrapped_main` for `FatalDiagnostic`). + +### Exit codes + +| Code | Meaning | +|------|---------| +| `0` | Success | +| `65` (`DATA_ERROR`) | Fatal diagnostic — compiler errors in the input | +| non-zero from `anyhow` | I/O or configuration error (file not found, bad target, etc.) | + +`65` is the `EX_DATAERR` code from BSD `sysexits.h`, indicating that the input data was +malformed. It is distinct from generic failure so that scripts can distinguish +"source has errors" from "the compiler itself failed." + +--- + +## `cli_def.rs` — command definition + +All flag names are `pub const &str` constants so `cli_process.rs` can refer to +them by name without string literals: + +```rust +pub const INPUT: &str = "input"; +pub const OUTPUT: &str = "output"; +pub const DEFINE: &str = "define"; +pub const INCLUDE: &str = "include"; +pub const TARGET: &str = "target"; +pub const OPT_LVL: &str = "opt_lvl"; +// … etc.
+``` + +`main_command()` builds a `clap::Command` with the following flags: + +| Flag | Short | Type | Default | Notes | +|------|-------|------|---------|-------| +| `input` | — | FILE | required | Root `.va` file | +| `--output` / `-o` | `-o` | FILE | `{input}.osdi` | Conflicts with `--batch` | +| `--include` / `-I` | `-I` | DIR | — | Repeatable; directory must exist | +| `--define` / `-D` | `-D` | `MACRO[=VALUE]` | — | Repeatable | +| `--allow` / `-A` | `-A` | LINT | — | Repeatable | +| `--warn` / `-W` | `-W` | LINT | — | Repeatable | +| `--deny` / `-E` | `-E` | LINT | — | Repeatable | +| `--lints` | — | flag | false | Print lint list and exit | +| `--target` | — | TARGET | host triple | Must be one of `get_target_names()` | +| `--supported-targets` | — | flag | false | Print targets and exit | +| `--target_cpu` | — | CPU | `native`/`generic` | Passed to LLVM | +| `--opt_lvl` / `-O` | `-O` | 0–3 | `3` | LLVM optimisation level | +| `--codegen` / `-C` | `-C` | `OPT[=VALUE]` | — | Repeatable; forwarded to LLVM | +| `--batch` / `-b` | `-b` | flag | false | Batchmode (content-addressed cache) | +| `--cache-dir` | — | DIR | platform cache dir | Requires `--batch` | +| `--interface` / `-i` | `-i` | `OSDI` | `OSDI` | Output format; only OSDI is implemented | +| `--dry-run` | — | flag | false | Parse+typecheck only, no output | +| `--dump-mir` | — | flag | false | Print optimised MIR to stdout | +| `--dump-unopt-mir` | — | flag | false | Print unoptimised MIR to stdout | +| `--dump-ir` | — | flag | false | Print optimised LLVM IR to stdout | +| `--dump-unopt-ir` | — | flag | false | Print unoptimised LLVM IR to stdout | +| `--print-expansion` | — | flag | false | Run preprocessor only, print result | +| `--dump-json` | — | flag | false | Unimplemented; bails immediately | + +Path arguments use custom `clap` `ValueParser`s that validate existence and +type (file vs. directory) at parse time, so argument errors are reported +before any compilation starts. 
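As a rough illustration of parse-time path validation, the functions below check existence and kind using only the standard library. `existing_dir` and `existing_file` are hypothetical names, and the error wording is invented; the driver's actual validators are not shown in this document.

```rust
use std::path::PathBuf;

/// Accept the argument only if it names an existing directory.
fn existing_dir(arg: &str) -> Result<PathBuf, String> {
    let path = PathBuf::from(arg);
    if !path.exists() {
        return Err(format!("{arg} does not exist"));
    }
    if !path.is_dir() {
        return Err(format!("{arg} is not a directory"));
    }
    Ok(path)
}

/// Accept the argument only if it names an existing regular file.
fn existing_file(arg: &str) -> Result<PathBuf, String> {
    let path = PathBuf::from(arg);
    if !path.exists() {
        return Err(format!("{arg} does not exist"));
    }
    if !path.is_file() {
        return Err(format!("{arg} is not a file"));
    }
    Ok(path)
}

fn main() {
    // The system temp directory is used here only as a path that should exist.
    let tmp = std::env::temp_dir();
    let tmp = tmp.to_string_lossy();
    println!("{:?}", existing_dir(&tmp));
    println!("{:?}", existing_file(&tmp));
}
```

A function of this shape, taking `&str` and returning `Result<T, E>`, is the kind of callback an argument parser can run per value, so a bad path is rejected before any `Opts` is built.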
+ +--- + +## `cli_process.rs` — `matches_to_opts` + +`matches_to_opts(matches: ArgMatches) -> Result` is the only public +function. It handles two early exits before constructing `Opts`: + +```rust +if matches.get_flag(LINTS) { print_lints(); exit(0) } +if matches.get_flag(SUPPORTED_TARGETS) { print_targets(); exit(0) } +``` + +**Output destination resolution:** + +- `--batch` → `CompilationDestination::Cache`. Cache directory is taken from + `--cache-dir` if given, otherwise from + `directories_next::ProjectDirs::from("com", "semimod", "openvaf").cache_dir()` — + the platform-appropriate user cache directory (`~/.cache/openvaf` on Linux, + `%LOCALAPPDATA%\semimod\openvaf\cache` on Windows). +- No `--batch` → `CompilationDestination::Path`. Output path is `--output` if + given, otherwise `{input}.osdi` (same directory as the input file). + +**Target and CPU resolution:** + +```rust +let host = host_triple(); +let target = matches.get_one(TARGET).cloned().unwrap_or_else(|| host.to_owned()); +let default_cpu = if host != target { "generic" } else { "native" }; +``` + +Cross-compilation defaults to `"generic"` CPU to avoid emitting host-specific +instructions in the output. Native compilation defaults to `"native"` for best +performance. + +**`print_lints` / `print_targets`:** Print to stdout with colour: errors in +red, warnings in yellow, allowed lints in green; targets in yellow. Each exits +with code 0. + +--- + +## `crash_report.rs` — panic handler + +`install_panic_handler()` replaces the default Rust panic handler with a +custom hook that: + +1. Extracts the panic message from the `PanicHookInfo` payload (tries `&str` + then `String`). +2. Records the panic location (file + line number if available). +3. Appends a symbolicated backtrace using `backtrace` + `backtrace_ext`'s + `short_frames_strict` (deduplicates inlined frames). +4. Prepends the OpenVAF version and the current `ARGS` (the `Opts` that were + being compiled, from the global mutex). +5. 
Writes everything to a timestamped file in the system temp directory: + `openvaf-crash-{unix_timestamp}.log`. +6. Prints a red error message to stderr naming the log file and asking the + user to file an issue. + +The hook is **not installed** in debug builds (`cfg(debug_assertions)`), so +development panics surface normally with the default Rust handler. + +The `Report` struct accumulates the log text in a `String` using `write!` / +`writeln!`. If writing to disk fails, the report is printed to stderr instead +and `handle_dump` returns `None`. + +--- + +## Key design decisions + +**`mimalloc` as the global allocator.** The Salsa incremental database and the +MIR make many small, short-lived allocations. `mimalloc` is significantly +faster than the system allocator (glibc `malloc`, Windows `HeapAlloc`) for +this allocation pattern. It is opt-in here at the binary level so the library +crate remains allocator-agnostic. + +**`ARGS` mutex for crash reporting.** The panic hook must access the `Opts` +that were being compiled to include them in the crash log. Storing `Opts` in a +`static Mutex>` is the simplest way to make it available to the +hook without thread-local storage or `Arc` threading. The mutex is only +contended if a panic occurs while another thread holds it, which cannot happen +in the current single-threaded compilation model. + +**Early exit for `--lints` and `--supported-targets`.** These informational +flags exit immediately inside `matches_to_opts` rather than returning a special +`CompilationTermination` variant. This keeps the `Opts` type simple (no +`ListLints` or `ListTargets` variant) at the cost of making `matches_to_opts` +impure (it can `exit`). The trade-off is acceptable for a CLI binary where +these paths don't need to be testable in isolation. + +**Custom path validators in `clap`.** Validating that input files exist and +output directories are writable at argument-parse time gives the user a clean +error before any compiler work starts. 
Without this, a missing include +directory would surface as a VFS error deep inside Salsa, with a less +informative message. From 7dd42b3a3f2567a645b2fee6ebc7e64e1d7fa446 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 13:45:29 +0200 Subject: [PATCH 25/28] docs: add melange-core INTERNALS MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Covers Circuit / Node / DeviceId / ModelId / InstanceId; the DeviceImpl / ModelImpl / InstanceImpl trait hierarchy; the Expr / Value / Arena expression system; CircuitDescription elaboration; the Newton-Raphson DC/AC solver backed by KLU; and the Verilog-A device loading path (openvaf::compile → dlopen → OsdiDevice). Co-Authored-By: Claude Sonnet 4.6 --- docs/melange_core/INTERNALS.md | 532 +++++++++++++++++++++++++++++++++ 1 file changed, 532 insertions(+) create mode 100644 docs/melange_core/INTERNALS.md diff --git a/docs/melange_core/INTERNALS.md b/docs/melange_core/INTERNALS.md new file mode 100644 index 00000000..d3a371d2 --- /dev/null +++ b/docs/melange_core/INTERNALS.md @@ -0,0 +1,532 @@ +# `melange-core` — Embedded circuit simulator + +**Location:** `melange/core/` +**Role:** An analog circuit simulator library that is layered on top of the +OpenVAF compiler. It provides a typed Rust API for building netlists, loading +Verilog-A compact models (by invoking `openvaf::compile` at runtime), and +running DC and AC operating-point analyses. `melange-core` is **not** part of +the compiler pipeline — it is a downstream consumer of the compiled `.osdi` +output. 
+ +Cross-links: [openvaf INTERNALS](../openvaf/INTERNALS.md) · +[osdi INTERNALS](../osdi/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate layout + +``` +melange/core/src/ + lib.rs — public re-exports (Circuit, CircuitDescription, Arena, Expr, Value) + circuit.rs — Circuit, Node, DeviceId, ModelId, InstanceId, CircuitModel, CircuitInstance + elaboration.rs — CircuitDescription, CircuitInstanceDescription, CircuitModelDescription + devices.rs — DeviceImpl / ModelImpl / InstanceImpl traits; DeviceInfo; default_devices() + devices/ + params.rs — DeviceParams, ParamId, Type + resistor.rs — built-in Resistor device + vsource.rs — built-in VoltageSrc device + expr.rs — Expr, Value, Arena, ExprEvalCtx, CircuitParam, ExprPtr + simulation.rs — Simulation, SimConfig, SimBuilder, SimInfo; DC/AC solvers + simulation/ + flags.rs — EvalFlags, SimulationState (bitflags) + matrix.rs — MatrixBuilder, SimulationMatrix, MatrixEntryIter (KLU wrapper) + veriloga.rs — compile_va(), load_osdi_lib(), Opts; OSDI log hook + veriloga/ + osdi_0_4.rs — auto-generated OSDI v0.3/0.4 C-struct bindings + osdi_device.rs — OsdiDevice: DeviceImpl wrapping an OsdiDescriptor + utils.rs — PrettyPrint helper for table output + tests.rs — integration tests (currently disabled on Windows) +``` + +--- + +## Crate relationships + +``` +melange-core + ├─► openvaf (compile_va calls openvaf::compile) + ├─► klu-rs (KLU sparse direct solver for Newton iteration) + ├─► num-complex (Complex64 for AC analysis) + ├─► libloading (dlopen the compiled .osdi shared library) + ├─► lasso (string interning in Arena / Expr) + ├─► typed_indexmap (TiMap / TiSet in Circuit internals) + └─► typed-index-collections (TiVec / TiSlice throughout) +``` + +The library has **no binary**. Its consumers build a `Circuit`, populate it +with device instances and parameters, construct a `Simulation`, and call +`dc_op()` or `ac()`. 
+ +--- + +## `Circuit` — the central netlist data structure + +```rust +pub struct Circuit { + pub name: String, + nodes: TiSet, + devices: TiMap, + models: TiVec, + instances: TiVec, + namespace: AHashMap, + param_assignments: IndexMap, +} +``` + +`Circuit` is always in a valid, simulation-ready state. There is no +separate "validation" step — every mutation method (`new_model`, +`new_model_instance`, `set_instance_param`, …) checks correctness +immediately and returns `Result`. + +### Named entities and the namespace + +Every top-level named item (nodes, devices, models, instances) is registered +in `namespace: AHashMap` where `NameSpaceEntry` is +one of `Device(DeviceId)`, `Model(ModelId)`, or `Instance(InstanceId)`. +The namespace enforces uniqueness: inserting a name that already exists +panics with `unreachable!` (the assertion is deliberate — the API is designed +so duplicates cannot arise if used correctly). + +### Index types + +| Type | `u32` newtype | What it indexes | +|------|---------------|----------------| +| `Node` | `Node(u32)` | External circuit node (0 = ground) | +| `DeviceId` | `DeviceId(u32)` | Registered device type | +| `ModelId` | `ModelId(u32)` | Circuit model (explicit or implicit) | +| `InstanceId` | `InstanceId(u32)` | Circuit instance | + +All four implement `impl_idx_from!(T(u32))` from `stdx` so they can serve as +`TiVec`/`TiSlice` indices. + +`Node(0)` is the ground node, always created first by `Circuit::new`. Its +`matrix_idx()` is `-1` (excluded from the Jacobian matrix — ground has no +unknown potential). + +### Model vs. instance + +A *model* (`CircuitModel`) is a collection of model parameters bound to a +device type. An *instance* (`CircuitInstance`) is a concrete instantiation of +a model, with its own instance parameters and terminal connections. 
+ +When a user creates an instance directly with `new_device_instance`, an +*implicit* model is created automatically (its `src` field is +`CircuitModelSrc::Implicit(instance_id)`). When a model is created explicitly +with `new_model`, instances can share it. + +### Built-in devices + +`default_devices()` returns two built-in `DeviceImpl` implementations: + +| Device name | Terminals | Instance param | +|-------------|-----------|---------------| +| `"resistor"` | `A`, `C` | `r` (resistance in Ω) | +| `"vsource"` | ... | voltage, etc. | + +Both are registered in every new `Circuit` during `Circuit::new`. + +### Loading Verilog-A devices + +```rust +pub fn load_veriloga_file(&mut self, path: Utf8PathBuf, opts: &veriloga::Opts) -> Result> +``` + +This method calls `veriloga::compile_va` (which invokes `openvaf::compile`), +then uses `libloading` to `dlopen` the resulting `.osdi` shared library, and wraps +each `OsdiDescriptor` entry point as an `OsdiDevice` (implementing +`DeviceImpl`). The new devices are registered in the circuit and returned. + +If a device with the same name is already registered, a warning is emitted +and the new device is ignored. + +--- + +## Device trait hierarchy + +### `DeviceImpl` — type-level device factory + +```rust +pub trait DeviceImpl { + fn get_name(&self) -> &'static str; + fn get_terminals(&self) -> Box<[&'static str]>; + fn get_params(&self) -> DeviceParams; + fn new_model(&self) -> Rc; +} +``` + +One `DeviceImpl` object exists per device *type* (e.g., one `Resistor` +singleton). It creates `Rc` instances when models are +instantiated. + +### `ModelImpl` — per-model state + +```rust +pub trait ModelImpl { + fn process_params(&self) -> Result<()>; + fn set_real_param(&self, param: ParamId, val: f64); + fn set_int_param(&self, param: ParamId, val: i32) { unreachable!(...) } + fn set_str_param(&self, param: ParamId, val: &str) { unreachable!(...)
} + fn new_instance(self: Rc) -> Box; +} +``` + +`ModelImpl` uses `Rc` receiver for `new_instance` so that multiple +instances can share the same model data via `Rc::clone`. Parameters are +stored in `Cell<…>` fields to allow interior mutability through the shared +`Rc`. + +### `InstanceImpl` — per-instance simulation state + +```rust +pub trait InstanceImpl { + fn process_params(&mut self, temp: f64, sim_builder: &mut SimBuilder, terminals: &[Node]) -> Result<()>; + fn populate_matrix_ptrs(&mut self, matrix_entries: MatrixEntryIter); + fn eval(&mut self, sim_info: SimInfo<'_>) -> Result<()>; + unsafe fn load_matrix_resist(&self); + unsafe fn load_matrix_react(&self, alpha: f64); + fn load_residual_resist(&self, prev_solve: &TiSlice, rhs: &mut TiSlice); + fn load_residual_react(&self, prev_solve: &TiSlice, rhs: &mut TiSlice); + // … AC and lead-current variants with default no-op implementations +} +``` + +`load_matrix_resist` and `load_matrix_react` are `unsafe` because they write +directly through raw `NonNull>` pointers that were set up during +`populate_matrix_ptrs`. This avoids hash-map or array lookups on every +Newton iteration. + +### `MatrixEntry` and pointer-based matrix writes + +During `prepare_solver`, `MatrixEntryIter` yields one `MatrixEntry` per +`ensure_matrix_entry` call registered by the device. Each entry contains: + +```rust +pub struct MatrixEntry<'a> { + pub(crate) resist: &'a Cell, // real Jacobian slot + pub(crate) react: &'a Cell, // imaginary part of AC Jacobian +} +``` + +`populate_matrix_ptrs` stores raw `NonNull>` pointers into the +KLU matrix's backing storage. On every subsequent Newton iteration, +`load_matrix_resist` uses those pointers directly — the KLU matrix's memory +is pinned once it is built. 
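The pointer-caching idea can be sketched in isolation. Everything here is an assumption made for illustration: `ResistorStamp` is a hypothetical type, the matrix is a dense 2x2 stored in a flat `Vec<Cell<f64>>` rather than a sparse KLU matrix, and the stamp is the textbook conductance pattern for a two-terminal resistor.

```rust
use std::cell::Cell;
use std::ptr::NonNull;

/// Hypothetical device that caches pointers to its four Jacobian slots,
/// in the order (a,a), (a,c), (c,a), (c,c).
struct ResistorStamp {
    g: f64,
    slots: [NonNull<Cell<f64>>; 4],
}

impl ResistorStamp {
    /// Setup phase: resolve indices to pointers once.
    fn new(g: f64, storage: &[Cell<f64>], idx: [usize; 4]) -> Self {
        let slots = idx.map(|i| NonNull::from(&storage[i]));
        Self { g, slots }
    }

    /// Hot path: add the conductance stamp through the cached pointers.
    ///
    /// # Safety
    /// The storage passed to `new` must still be alive and must not have
    /// been moved or reallocated since the pointers were taken.
    unsafe fn load_matrix_resist(&self) {
        let signs = [1.0, -1.0, -1.0, 1.0];
        for (slot, sign) in self.slots.iter().zip(signs) {
            // SAFETY: guaranteed by the caller, see the contract above.
            let cell = unsafe { slot.as_ref() };
            cell.set(cell.get() + sign * self.g);
        }
    }
}

fn main() {
    // Row-major 2x2 matrix; it is never resized after the stamp is built.
    let storage: Vec<Cell<f64>> = (0..4).map(|_| Cell::new(0.0)).collect();
    let stamp = ResistorStamp::new(0.5, &storage, [0, 1, 2, 3]);
    // SAFETY: `storage` outlives `stamp` and is not reallocated.
    unsafe { stamp.load_matrix_resist() };
    let values: Vec<f64> = storage.iter().map(Cell::get).collect();
    println!("{values:?}");
}
```

The `unsafe` contract is the whole trade-off: index resolution happens once during setup, and every later load is a plain pointer write, at the price of the caller promising that the backing storage stays put.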
+ +--- + +## Expression system (`expr.rs`) + +Circuit parameter values and instance parameter expressions are represented +as `Expr`, a sum type: + +```rust +pub enum Expr { + Eval(ExprPtr), // pointer into the Arena + Value(Value), // constant: already evaluated +} + +pub enum Value { + Num(f64), + Str(Spur), // interned string handle + UNDEF, +} +``` + +`Expr::Value` is the common case (most parameters are numeric literals). +`Expr::Eval` stores a `u32` index into the `Arena`'s expression heap, used +for parameter-dependent expressions (e.g., `r = 2 * R_global`). + +### `Arena` — expression allocator and parameter registry + +```rust +pub struct Arena { + exprs: Vec, // expression nodes + params: TiVec>, + intern: Rodeo, // string interner +} +``` + +`Arena` is shared across circuits and simulations. Each `Circuit` owns a +`CircuitParamCtx` (an index into `arena.params`), which scopes its +parameters so that different circuits in the same session do not share +parameter namespaces. + +`CircuitParam::TEMPERATURE` is a built-in parameter automatically created at +`CircuitParamCtx::ROOT` (index 0) with name `"temp"`. + +### `ExprEvalCtx` — runtime evaluation context + +`ExprEvalCtx` holds the current parameter values as a flat `Box<[Value]>`, +with per-context offsets stored in `ctx_offsets: Box>`. +A `CircuitParam{ctx, param}` is looked up as: + +```rust +let off = ctx_offsets[param.ctx] + u32::from(param.param); +params[off as usize] +``` + +`ExprEvalCtxRef<'_>` is a borrow of `ExprEvalCtx` that is passed through +recursive `eval` calls. It has a `borrow()` method that returns a +shorter-lived `ExprEvalCtxRef` to work around the borrow checker in +recursive contexts. + +### Constant folding in expression builders + +The `Expr::add`, `Expr::mul`, `Expr::inv`, and `Expr::neg` methods do +partial constant folding at construction time: + +- `Expr::Value + Expr::Value` → immediately computes the result. 
+- `Expr::Eval(lhs) + Expr::Value(rhs)` where `lhs` is already a `Commutative { Add, … }` → folds the constant into the existing node in-place, mutating `arena.exprs[lhs.0]` without allocating a new node. + +This keeps the expression tree compact for the common case of scaled +parameters. + +--- + +## `CircuitDescription` — netlist-format-independent elaboration + +`CircuitDescription` is an intermediate representation intended for netlist +parsers. It stores instances and models by *name* (strings), deferring all +name resolution to `elaborate()`: + +```rust +pub struct CircuitDescription { + pub name: String, + pub instances: TiVec, + pub models: TiVec, + pub va_files: Vec, + pub earena: Arena, +} +``` + +`CircuitDescription::elaborate(earena, opts)` performs: +1. Compile and register each Verilog-A file in `va_files`. +2. For each model description, resolve the device name → `DeviceId` and call + `Circuit::new_model`. +3. For each instance description, look up its `master` field in the namespace + (model, device, or subcircuit) and dispatch to `new_model_instance` or + `new_device_instance`, wiring terminal names to `Node`s via `circuit.node()`. 
+ +--- + +## `Simulation` — Newton-Raphson solver + +```rust +pub struct Simulation<'a> { + circ: &'a Circuit, + model_data: Box>>, + instance_data: Box>>, + matrix_builder: MatrixBuilder, + matrix: Option, + nodes: TiVec, + solution: TiVec, + ac_solution: TiVec, + residual_resist: TiVec, + residual_react: TiVec, + pub config: SimConfig, + state: SimulationState, // bitflags: which OPs are current + omega: f64, +} +``` + +### Construction sequence + +``` +Circuit::prepare_simulation(eval_ctx, arena, config) + └── Circuit::setup_simulation(config) → allocates Simulation + └── Simulation::prepare_solver(eval_ctx, arena) + ├─ evaluate and set all model parameters (ModelImpl::set_*_param, process_params) + ├─ evaluate and set all instance params (InstanceImpl::set_*_param) + ├─ MatrixBuilder::reset + ├─ for each instance: + │ InstanceImpl::process_params(temp, builder, terminals) + │ └─ SimBuilder::ensure_matrix_entry registers (col, row) pairs + ├─ SimulationMatrix::new_or_reset (builds KLU symbolic factorization) + └─ for each instance: + InstanceImpl::populate_matrix_ptrs (stores raw pointers into KLU storage) +``` + +### DC operating point: `dc_op()` + +`solve_op(OperatingPointAnalysis::DC)` runs a Newton-Raphson loop: + +``` +loop: + for each instance: + InstanceImpl::eval(sim_info) + unsafe { load_matrix_resist() } + load_residual_resist(solution, residual_resist) + KLU: lu_factorize → solve_linear_system(residual_resist[1..]) + update solution -= delta + check convergence: |delta| ≤ max(atol, |val| × rtol) +until converged or maxiters +``` + +The ground row/column (index 0) is excluded from the matrix solve — `[1..]` +slices skip it. + +### AC analysis: `ac(omega)` + +`ac()` first ensures a DC operating point via `ac_op()`. It then: +1. Calls `eval(EvalFlags::AC)` on each instance (the AC evaluation flag + activates reactive-branch contributions in Verilog-A devices). +2. 
Calls `load_matrix_resist()` and `load_matrix_react(omega)` to fill both + the real and imaginary parts of the complex Jacobian. +3. Solves the complex KLU system for `ac_solution`. + +The `SimulationState` bitflags track which operating points are stale: +setting `omega` clears `AT_AC` so the next call re-solves. + +### `SimConfig` + +```rust +pub struct SimConfig { + pub debug: bool, // print matrix and solution at each Newton step + pub maxiters: u32, // default: 100 + pub voltage_atol: f64, // default: 1e-6 V + pub current_atol: f64, // default: 1e-12 A + pub rtol: f64, // default: 1e-3 (relative tolerance) +} +``` + +### Internal nodes + +Devices can add internal nodes (e.g., a current-branch unknown) via +`SimBuilder::new_internal_branch` or `new_internal_node`. These grow the +`nodes` and `solution` vectors beyond the external node count. The matrix is +rebuilt each time `prepare_solver` is called with a changed topology. + +--- + +## Verilog-A device loading (`veriloga.rs`) + +`compile_va(path, opts)` is the bridge from Melange to OpenVAF: + +```rust +pub fn compile_va(path: &Utf8Path, opts: &Opts) -> Result>> +``` + +1. Resolves (or defaults) the cache directory, mirroring `openvaf-driver`'s + batch-mode logic (`~/.cache/melange` on Linux). +2. Constructs `openvaf::Opts` with `target_cpu = "native"`, + `opt_lvl = Aggressive`, and `CompilationDestination::Cache`. +3. Calls `openvaf::compile`. On `FatalDiagnostic`, returns an error (the + compiler has already printed diagnostics to stderr). +4. `dlopen`s the resulting `.osdi` file with `libloading::Library`. +5. Reads `OSDI_VERSION_MAJOR` / `OSDI_VERSION_MINOR` and rejects anything + other than `0.3`. +6. Reads `OSDI_NUM_DESCRIPTORS` and `OSDI_DESCRIPTORS` and slices them into + `&'static [OsdiDescriptor]` (the library is `Box::leak`-ed to give it + `'static` lifetime). +7. Installs `osdi_log` as the library's log callback. +8. 
Wraps each `OsdiDescriptor` in an `OsdiDevice` (which implements + `DeviceImpl` via the OSDI function-pointer table). + +> **TODO(verify):** The version check gates on `0.3` but the file name is +> `osdi_0_4.rs`. The auto-generated bindings may target v0.4 while the +> runtime check enforces v0.3. Check whether these need to be reconciled. + +--- + +## `MatrixBuilder` and KLU integration + +`MatrixBuilder` wraps `klu_rs::KluMatrixBuilder` and tracks, per instance, +which `(column, row)` matrix entries that instance writes: + +```rust +pub(crate) struct MatrixBuilder { + inner: KluMatrixBuilder, + pub instance_entries: Box>>, + dump: NonNull>, // ground entries redirect here +} +``` + +The `dump` pointer is a `Box::leak`-ed `Cell` that absorbs writes to +ground-connected matrix entries (`column == GROUND || row == GROUND`). This +avoids branches in `load_matrix_resist`/`load_matrix_react` — ground entries +just write to a dummy location and the result is discarded. + +`SimulationMatrix` holds two KLU matrices sharing the same sparsity pattern +(`MatrixSpec`): +- `nonlinear_matrix: RealMatrix` — used for DC Newton iterations. +- `ac_matrix: ComplexMatrix` — used for AC small-signal analysis. + +`new_or_reset` reuses the existing memory allocations (via `into_alloc` / +`new_with_alloc`) when the circuit topology has not changed, avoiding +reallocation between successive simulations with the same netlist. 
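As a rough illustration of the `dump` idea, the sketch below resolves every `(column, row)` pair to a cell once and redirects ground entries to a dummy cell, so the load loop has no branch. It uses safe references and a local `Cell` instead of the leaked `NonNull` pointer described above, and `resolve`, `GROUND`, and the dense 1x1 "matrix" are assumptions made for the example only.

```rust
use std::cell::Cell;

// Assumption for this sketch: ground is node index 0.
const GROUND: usize = 0;

/// Resolve a (column, row) pair to a writable cell.
/// Ground-connected entries are redirected to the dummy `dump` cell.
fn resolve<'a>(
    matrix: &'a [Vec<Cell<f64>>],
    dump: &'a Cell<f64>,
    col: usize,
    row: usize,
) -> &'a Cell<f64> {
    if col == GROUND || row == GROUND {
        dump
    } else {
        // Ground is not stored, so indices shift down by one.
        &matrix[row - 1][col - 1]
    }
}

/// Stamps a conductance between node 1 and ground; returns (matrix[0][0], dump).
fn demo_dump() -> (f64, f64) {
    let matrix = vec![vec![Cell::new(0.0)]];
    let dump = Cell::new(0.0);
    let g = 2.0;
    // The four stamps of a two-terminal conductance: (col, row, value).
    let stamps = [(1, 1, g), (1, 0, -g), (0, 1, -g), (0, 0, g)];

    // Setup: the only place where ground is checked.
    let slots: Vec<(&Cell<f64>, f64)> = stamps
        .iter()
        .map(|&(c, r, v)| (resolve(&matrix, &dump, c, r), v))
        .collect();

    // Hot loop: unconditional writes; ground contributions land in `dump`.
    for &(slot, v) in &slots {
        slot.set(slot.get() + v);
    }
    (matrix[0][0].get(), dump.get())
}
```

Whatever accumulates in the dummy cell is never read by the solver, which is why the redirect is harmless.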
+ +--- + +## Worked example: DC operating point of a resistor divider + +```rust +let mut earena = Arena::new(); +let mut circ = Circuit::new("divider".to_owned(), &mut earena); + +let vdd = circ.node("vdd".to_owned()); +let mid = circ.node("mid".to_owned()); +let gnd = Node::GROUND; + +// V1: vdd → gnd, 5V +let (v1_inst, v1_model) = circ.new_device_instance_by_name( + "V1".to_owned(), "vsource", vec![vdd, gnd])?; +circ.set_model_param(v1_model, "dc", Expr::from(5.0))?; + +// R1: vdd → mid, 1kΩ; R2: mid → gnd, 1kΩ +let (r1_inst, r1_model) = circ.new_device_instance_by_name( + "R1".to_owned(), "resistor", vec![vdd, mid])?; +circ.set_model_param(r1_model, "r", Expr::from(1000.0))?; + +let (_, r2_model) = circ.new_device_instance_by_name( + "R2".to_owned(), "resistor", vec![mid, gnd])?; +circ.set_model_param(r2_model, "r", Expr::from(1000.0))?; + +// Run DC +let mut eval_ctx = ExprEvalCtx::new(&earena); +eval_ctx.set_param(CircuitParam::TEMPERATURE, Value::Num(300.0)); + +let mut sim = circ.prepare_simulation(eval_ctx.borrow(), &earena, SimConfig::default())?; +let solution = sim.dc_op()?; + +// solution[mid] ≈ 2.5 V +``` + +--- + +## Key design decisions + +**Trait objects for device polymorphism.** `DeviceImpl`, `ModelImpl`, and +`InstanceImpl` are `dyn` traits. This lets built-in Rust devices and OSDI +Verilog-A devices coexist in the same `TiSlice` without enums or code +generation. The cost is one vtable dispatch per `eval` call per instance — +acceptable for circuit simulation where the work per call dominates. + +**`Rc` for shared model state.** Multiple instances can share +the same model `Rc`. Interior mutability (`Cell`) is used because model +parameters are only mutated during `prepare_solver`, which is single-threaded. +This avoids `Arc` + `Mutex` overhead for a fundamentally single-threaded use +case. 
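The sharing pattern in the paragraph above can be sketched in a few lines. `ResistorModel` and `ResistorInstance` are hypothetical stand-ins, not the built-in resistor; the sketch only shows that a parameter written through `&self` on the shared model is seen by every instance holding a clone of the `Rc`.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical model: one parameter stored in a `Cell` for interior mutability.
struct ResistorModel {
    r: Cell<f64>,
}

/// Hypothetical instance that keeps its model alive through an `Rc`.
struct ResistorInstance {
    model: Rc<ResistorModel>,
}

impl ResistorModel {
    /// Takes `&self`, not `&mut self`: the model may already be shared.
    fn set_real_param(&self, val: f64) {
        self.r.set(val);
    }

    /// `Rc<Self>` receiver, mirroring the shape of `ModelImpl::new_instance`.
    fn new_instance(self: Rc<Self>) -> ResistorInstance {
        ResistorInstance { model: self }
    }
}

impl ResistorInstance {
    fn conductance(&self) -> f64 {
        1.0 / self.model.r.get()
    }
}

/// Returns both instance conductances and the model's strong count.
fn demo_shared_model() -> (f64, f64, usize) {
    let model = Rc::new(ResistorModel { r: Cell::new(1000.0) });
    let a = Rc::clone(&model).new_instance();
    let b = Rc::clone(&model).new_instance();
    // A later parameter update is visible to both instances.
    model.set_real_param(4.0);
    (a.conductance(), b.conductance(), Rc::strong_count(&model))
}
```

Neither `Rc` nor `Cell` is thread-safe, which matches the single-threaded assumption stated above.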
+ +**Pointer-based matrix writes.** Storing `NonNull>` pointers into +the KLU matrix backing store during `populate_matrix_ptrs` and writing through +them in `load_matrix_resist` eliminates all index arithmetic from the Newton +inner loop. The `unsafe` annotation on `load_matrix_resist` and +`load_matrix_react` documents this invariant explicitly. + +**`dump` pointer for ground entries.** Rather than conditionally skipping +ground-connected matrix entries in the hot loop, all such entries are silently +redirected to a single dummy `Cell`. The code in `load_matrix_resist` is +then unconditional. + +**`compile_va` always uses batch (cache) mode.** Melange passes +`CompilationDestination::Cache` to OpenVAF so that repeated loads of the same +`.va` file (across simulation runs or different circuits) hit the content- +addressed cache without recompiling. The cache key covers the preprocessed +token stream, so whitespace-only changes to the `.va` file do not invalidate +it. + +**`Box::leak` for the OSDI library.** The `libloading::Library` is leaked to +give the `OsdiDescriptor` slice a `'static` lifetime. This is intentional — +Melange is designed for long-running simulation sessions where loaded libraries +are never unloaded. From 030e8d9efe2ff9d8523f91b37c7427d102afa37e Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 13:58:50 +0200 Subject: [PATCH 26/28] docs: add verilogae family INTERNALS (verilogae, verilogae_ffi, verilogae_py) verilogae: covers the (*retrieve*) variable extraction model, FatPtr vectorised ABI, ModelInfo/FuncSpec/ParamInfo, the full/info-only compilation paths, per-function MIR slicing, dependency breaking, VFS export, and rayon-parallel codegen. verilogae_ffi: covers the static/dynamic feature split, Opts/VfsExport RAII wrappers, and the links = "verilogae" cargo convention. 
verilogae_py: covers PyInit_verilogae, the three Python types (VaeModel/VaeFunction/VaeParam), the NumPy FatPtr zero-copy integration, global type-ref caching, and the raw pyo3-ffi approach. Co-Authored-By: Claude Sonnet 4.6 --- docs/verilogae/INTERNALS.md | 314 ++++++++++++++++++++++++++++++++ docs/verilogae_ffi/INTERNALS.md | 134 ++++++++++++++ docs/verilogae_py/INTERNALS.md | 227 +++++++++++++++++++++++ 3 files changed, 675 insertions(+) create mode 100644 docs/verilogae/INTERNALS.md create mode 100644 docs/verilogae_ffi/INTERNALS.md create mode 100644 docs/verilogae_py/INTERNALS.md diff --git a/docs/verilogae/INTERNALS.md b/docs/verilogae/INTERNALS.md new file mode 100644 index 00000000..308904eb --- /dev/null +++ b/docs/verilogae/INTERNALS.md @@ -0,0 +1,314 @@ +# `verilogae` — Verilog-A model evaluation library + +**Location:** `verilogae/verilogae/` +**Role:** A Verilog-A compiler and runtime library with a different goal from +`openvaf`: rather than producing a full OSDI compact-model shared library for +circuit simulation, VerilogAE extracts individual model variables as +standalone callable C functions. The intended use case is parameter extraction +(curve fitting) and scripted model evaluation — scenarios where the user wants +to call `Id(Vgs, Vds, T, ...)` directly from Python, C, or MATLAB without +running a full circuit simulator. + +The crate is built both as a `[lib]` (for static linking from `verilogae_ffi`) +and as a `[cdylib]` (for dynamic loading), sharing the same source tree. 
+ +Cross-links: [verilogae_ffi INTERNALS](../verilogae_ffi/INTERNALS.md) · +[verilogae_py INTERNALS](../verilogae_py/INTERNALS.md) · +[openvaf INTERNALS](../openvaf/INTERNALS.md) · +[mir_autodiff INTERNALS](../mir_autodiff/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate layout + +``` +verilogae/verilogae/src/ + lib.rs — export_vfs(), load(), build_local_model(), build_model() + api.rs — C ABI: Opts, Slice, FatPtr, VfsEntry, Vfs, + verilogae_load, verilogae_export_vfs, verilogae_call_fun_parallel, + verilogae_functions / verilogae_real_params / … accessors, + ParamFlags, ModelcardInit, VaeFun type aliases + compiler_db.rs — CompilationDB construction, ModelInfo, FuncSpec, ParamInfo + middle.rs — build_module_mir(), build_param_init_mir(), FuncSpec::slice_mir() + back.rs — CodegenCtx, LLVM codegen for model functions and modelcard init + cache.rs — content-addressed cache lookup (MD5 of preprocessed tokens) + opts.rs — Opts accessor impls; abs_path helper + main.rs — (unused standalone binary entry point) +``` + +--- + +## What VerilogAE extracts + +Given a Verilog-A module, VerilogAE looks for variables marked with the +`(*retrieve*)` attribute: + +```verilog-a +real Id; (* retrieve *) +real Ig; (* retrieve = Idiode *) // dependency breaking +``` + +Each such variable becomes a separate C function in the output library. The +function evaluates that variable given: +- All branch voltages and currents the variable depends on. +- All model and instance parameters. +- Temperature. +- Optional *dependency-breaking* values (described below). + +The generated library also exports: +- `init_modelcard` — initialises parameter arrays to default values and sets + `ParamFlags` (which params are given / have min/max). +- `functions`, `functions.cnt` — list of function names. +- `params.real`, `params.real.cnt` — real parameter names (and `.unit.*`, + `.desc.*`, `.group.*` variants). 
+- `params.integer.*`, `params.string.*` — integer and string parameters. +- `opvars`, `opvars.cnt` — operating-point output variables. +- `nodes`, `nodes.cnt` — module port names. +- `module_name` — the Verilog-A module name. + +--- + +## Core types + +### `ModelInfo` + +```rust +pub struct ModelInfo { + pub params: IndexMap, + pub functions: Vec, + pub var_names: AHashMap, + pub op_vars: Vec, + pub module: Module, + pub ports: Vec, + pub optional_currents: AHashMap, + pub optional_voltages: AHashMap<(Node, Option), f64>, +} +``` + +`ModelInfo::collect` runs the full frontend (HIR, type check) and then walks +`module.rec_declarations(db)` to find: +- Variables with `(*retrieve*)` → become `FuncSpec` entries. +- Parameters → become `ParamInfo` entries (picking up `units`, `desc`, + `group` attributes). +- Branches with `(*opt_voltage*)` / `(*opt_current*)` → default branch values + that the caller can omit (treated as 0 by default, or a specified value if + the attribute has a literal argument). + +### `FuncSpec` + +```rust +pub struct FuncSpec { + pub var: Variable, + pub dependency_breaking: Box<[Variable]>, + pub prefix: String, // e.g. "fun.0" (base-36 index) +} +``` + +`dependency_breaking` is the list of variables whose read-back values are +passed in as explicit inputs rather than computed internally. This breaks +dependency cycles in iterative solvers. A variable `v` with +`(*retrieve = w*)` means "when computing `v`, treat `w` as an input rather +than computing it." + +`prefix` is `"fun."` followed by the base-36 index of the function. This +prefix namespaces all per-function globals in the output library +(`{prefix}.params.real`, `{prefix}.voltages`, etc.). + +### `FatPtr` — the vectorised-call ABI + +```rust +#[repr(C)] +pub struct FatPtr { + pub ptr: *mut T, + pub meta: Meta, +} + +#[repr(C)] +pub union Meta { + pub stride: u64, + pub scalar: T, +} +``` + +When `ptr` is null, `meta.scalar` holds a single scalar value shared by all +evaluation points. 
When `ptr` is non-null, `meta.stride` is the byte stride +between successive elements, allowing the caller to pass a NumPy array slice +directly (any stride, including 0 for broadcast). + +The generated model function signature is: + +```rust +pub type VaeFun = Option, + currents: *mut FatPtr, + real_params: *mut FatPtr, + int_params: *mut FatPtr, + str_params: *mut *const c_char, + real_dep_break: *mut FatPtr, + int_dep_break: *mut FatPtr, + temp: *mut FatPtr, + out: *mut c_void, +)>; +``` + +`cnt` is the number of evaluation points. Each `FatPtr` array has one entry +per voltage/current/parameter input. Index `i` in the array corresponds to the +`i`-th input signal for that category. + +### `ParamFlags` + +```rust +pub type ParamFlags = u8; +pub const PARAM_FLAGS_MIN_INCLUSIVE: ParamFlags = 1; +pub const PARAM_FLAGS_MAX_INCLUSIVE: ParamFlags = 2; +pub const PARAM_FLAGS_INVALID: ParamFlags = 4; +pub const PARAM_FLAGS_GIVEN: ParamFlags = 8; +``` + +`GIVEN` is set after the caller assigns a value. `INVALID` is set when the +value is out of the declared range. `MIN_INCLUSIVE`/`MAX_INCLUSIVE` describe +whether the parameter's declared range bounds are inclusive or exclusive. + +--- + +## Compilation pipeline + +### `load(path, full_compile, opts) -> Result` + +``` +load + └── build_local_model + ├── compiler_db::new(path, opts) — create Salsa CompilationDB + ├── cache::lookup(&db, full_compile) — content-addressed cache check + │ (returns early if hit) + └── build_model(db, path, full_compile, local=true, opts, dst) + ├── ModelInfo::collect(&db) — run HIR, collect metadata + ├── LLVMBackend::new(...) + ├── if full_compile: + │ build_module_mir(&db, &info) — build unified MIR + │ info.intern_model(...) + │ build_param_init_mir(...) + │ CodegenCtx::compile_model_info(...) + │ rayon_core::scope { for each FuncSpec: + │ FuncSpec::slice_mir(...) — slice + optimise per-function MIR + │ CodegenCtx::gen_func_obj(...) 
— LLVM codegen → .o file + │ } + └── else (info-only): + build_param_init_mir only + compile_model_info only (no function objects) + └── linker::link(...) — link .o files → .so/.dylib/.dll +``` + +`full_compile = false` produces a library that has the modelcard (`init_modelcard`, +parameter metadata, node names) but no callable functions. The Python binding +exposes this as `load_info()`, which is much faster than a full compilation. + +### `build_module_mir` — unified MIR for all retrieve variables + +`build_module_mir` calls `hir_lower::MirBuilder` once for all retrieve +variables together, producing a single MIR `Function`. This unified function +contains the code for every output. Individual per-function MIRs are then +carved out by `FuncSpec::slice_mir`. + +After `MirBuilder`: +1. All callback side-effects are disabled (`has_sideeffects = false`), because + VerilogAE functions are expected to be pure. +2. `intern.insert_var_init` inserts the Verilog-A `initial` block assignments. +3. `dead_code_elimination` removes outputs not needed by any retrieve variable. +4. `auto_diff` adds derivative computations for branch voltages and currents + (so that the caller can get Jacobian information alongside the function + values). +5. `sparse_conditional_constant_propagation`, `inst_combine`, `simplify_cfg` + clean up. + +### `FuncSpec::slice_mir` — per-function MIR slicing + +For a single `FuncSpec` (one retrieve variable): +1. Replace `tagged_reads` of dependency-breaking variables with the + corresponding `ParamKind::HiddenState` inputs (treating those variables as + external inputs rather than computing them). +2. Run `aggressive_dead_code_elimination` with the retrieve variable's output + value as the only live root. +3. `simplify_cfg` to remove dead blocks. + +The result is a minimal MIR that computes exactly one variable. + +### Parallelism + +`rayon_core::scope` is used to compile the per-function object files in +parallel. 
The shared `HirInterner` (read-only after `ensure_names`) is +accessed concurrently via Salsa's `db.snapshot()`. + +--- + +## VFS support + +`export_vfs(path, opts) -> Result>` runs only the +preprocessor and exports the virtual filesystem: a mapping of virtual path → +file contents after `` `include `` resolution. The result can be passed back to +`load()` as `opts.vfs`, allowing in-memory compilation without touching the +real filesystem. The Python binding uses this to bundle model files with the +caller's Python package. + +--- + +## C ABI (`api.rs`) + +All `verilogae_*` functions are `#[no_mangle] pub unsafe extern "C"` and wrap +their Rust counterparts in `catch_unwind` to convert panics to null/error +returns rather than unwinding across the FFI boundary. + +The `expose_ptrs!`, `expose_consts!`, `expose_named_ptrs!`, and +`expose_named_consts!` macros generate the global-accessor functions that +look up symbols in a loaded library. Each accessor: +1. Reconstitutes a `Library` from the raw handle via `Library::from_raw`. +2. Looks up the symbol by name. +3. Calls `std::mem::forget(lib)` before returning so the library is not + closed. + +`verilogae_call_fun_parallel` dispatches `cnt` calls to a `VaeFun` in parallel +using `rayon_core::scope`, passing each call index `i` as the first argument. + +--- + +## Cache (`cache.rs`) + +The cache filename uses the same MD5-over-preprocessed-tokens strategy as +`openvaf::cache`. The hash additionally covers the `full_compile` flag +(so full and info-only compilations do not share a cache entry). +The default cache directory is `~/.cache/verilogae` (Linux) / +`%LOCALAPPDATA%\semimod\verilogae\cache` (Windows). + +--- + +## Key design decisions + +**Variable-at-a-time extraction instead of full OSDI.** OSDI is designed for +the circuit simulator use case where every variable is evaluated together in a +tight Newton loop.
VerilogAE targets the parameter extraction use case where +the user calls individual model equations for specific bias points. Slicing the +MIR per retrieve variable avoids computing unnecessary intermediate quantities. + +**`FatPtr` for stride-aware vectorization.** The `ptr == null → scalar` convention and the +`stride` field in the `Meta` union let the caller broadcast scalar parameters +across all evaluation points without copying them into an array. This is +analogous to NumPy broadcasting, which repeats a value through a zero stride rather than a copy. + +**Dependency breaking.** Compact models can have algebraic loops (e.g., a current +that depends on a voltage that depends on that same current). The `retrieve` +attribute's optional argument (`retrieve = otherVar`) allows the caller to +supply the loop-breaking value from outside, enabling the extracted function +to be evaluated without the iterative solver that a full circuit simulator +would use. + +**`full_compile = false` for fast metadata loading.** Building only the +modelcard (parameter lists, node names, init values) takes a fraction of the +time of a full LLVM compilation. This lets the Python binding introspect a +model's parameters without waiting for codegen. + +**`rayon_core` (not `rayon`).** Using the low-level `rayon_core` API gives +VerilogAE direct control over the thread pool without depending on the +top-level `rayon` crate. The parallel scope is used once per `build_model` +call, with each per-function object file compiled on a separate rayon task. diff --git a/docs/verilogae_ffi/INTERNALS.md b/docs/verilogae_ffi/INTERNALS.md new file mode 100644 index 00000000..d265ebe2 --- /dev/null +++ b/docs/verilogae_ffi/INTERNALS.md @@ -0,0 +1,134 @@ +# `verilogae_ffi` — Rust FFI wrapper for VerilogAE + +**Location:** `verilogae/verilogae_ffi/` +**Role:** A thin, safe-ish Rust wrapper over the C ABI that `verilogae` +exports.
Provides RAII types (`Opts`, `VfsExport`) and re-exports all +`verilogae_*` C functions so that Rust consumers (primarily `verilogae_py`) +do not need to write `unsafe` symbol lookups themselves. + +The crate supports two linking modes selected by the `static` feature flag +(enabled by default): + +| Feature | Linking | Source of symbols | +|---------|---------|-------------------| +| `static` (default) | Statically links `verilogae` lib | Uses `verilogae::api` directly | +| (no feature) | Dynamically loads `libverilogae` | Uses `generated.rs` bindings loaded via `links = "verilogae"` | + +Cross-links: [verilogae INTERNALS](../verilogae/INTERNALS.md) · +[verilogae_py INTERNALS](../verilogae_py/INTERNALS.md) + +--- + +## Crate layout + +``` +verilogae/verilogae_ffi/src/ + lib.rs — Opts RAII wrapper; VfsExport RAII wrapper; feature dispatch + ffi.rs — Slice helpers (non-static mode only) + ffi/generated.rs — auto-generated extern "C" declarations (non-static mode) + tests.rs — (minimal tests) +``` + +The `Cargo.toml` declares `links = "verilogae"`, which tells Cargo that this +crate provides the native library named `verilogae`. In the `static` feature +mode the `verilogae` lib crate is a direct Rust dependency; in the dynamic +mode a build script would supply the linker flags to find the pre-built +`libverilogae.so`/`.dll`. + +--- + +## Feature dispatch + +```rust +// lib.rs (simplified) +#[cfg(not(feature = "static"))] +mod ffi; +pub use ffi::*; // dynamic: symbols from generated.rs extern "C" block + +#[cfg(feature = "static")] +use verilogae::api as ffi; // static: direct Rust path to the same types/functions +``` + +In `static` mode, `ffi` is literally `verilogae::api`. Every type alias and +constant in `lib.rs` therefore refers directly to the Rust type in the parent +crate — no `extern "C"` involved. In dynamic mode, `ffi::generated` contains +the auto-generated `extern "C"` declarations that match the C ABI. 
+ +--- + +## `Opts` — RAII options wrapper + +```rust +#[derive(Default)] +pub struct Opts(Option<&'static mut ffi::Opts>); +``` + +`Opts` is a wrapper around a heap-allocated `ffi::Opts` struct. The inner +value is lazily allocated: calling `write()` for the first time calls +`ffi::verilogae_new_opts()` to `Box`-allocate an `ffi::Opts` and stores a +`'static` mutable reference (the allocation is live until `Drop`). + +```rust +pub unsafe fn write(&mut self) -> &mut ffi::Opts { … } +``` + +The `'static` lifetime is a lie — the allocation is owned by `Opts` and freed +on `Drop`. The `unsafe` on `write()` documents that the caller must not store +the returned reference past the lifetime of `Opts`. + +On `Drop`, each slice field is individually freed via `into_box_opt()` before +calling `verilogae_free_opts`. This matches the allocation convention +documented in `verilogae::api`: the slices are owned by the Rust side, while +the string *contents* pointed to by those slices are owned by the caller. + +--- + +## `VfsExport` — RAII VFS export wrapper + +```rust +pub struct VfsExport(ffi::Vfs); +``` + +`VfsExport::new(path, opts)` calls `verilogae_export_vfs`. If the returned +`Vfs` has a null `ptr`, it returns `None` (indicating a compilation error). + +`entries()` returns an iterator over `(&str, &str)` pairs (virtual path, +file contents), reinterpreting the raw byte slices as UTF-8. The +`unsafe { std::str::from_utf8_unchecked(…) }` is justified by the invariant +that VerilogAE only produces UTF-8 paths and Verilog-A source text. + +On `Drop`, `verilogae_free_vfs` is called to release the C-allocated memory. 
+ +--- + +## Re-exported constants + +In `static` mode, the four `ParamFlags` constants are defined directly: + +```rust +pub const PARAM_FLAGS_MIN_INCLUSIVE: ParamFlags = 1; +pub const PARAM_FLAGS_MAX_INCLUSIVE: ParamFlags = 2; +pub const PARAM_FLAGS_INVALID: ParamFlags = 4; +pub const PARAM_FLAGS_GIVEN: ParamFlags = 8; +``` + +In dynamic mode these constants come from `generated.rs`. + +--- + +## Key design decisions + +**`links = "verilogae"` without a build script.** The `links` key in +`Cargo.toml` normally requires a `build.rs` that emits `cargo:rustc-link-lib` +lines. Here, in `static` mode, the `verilogae` lib crate is a direct +`[dependencies]` entry, so Cargo links it automatically. The `links` key is +used primarily to signal to Cargo that this crate "provides" the `verilogae` +native library, preventing two crates in the same build from both trying to +provide it. + +**`Option<&'static mut ffi::Opts>` instead of `Box`.** Using +`&'static mut` rather than `Box` avoids the type system tracking the lifetime +of the inner allocation — necessary because in dynamic mode the allocation is +done via an FFI call (`verilogae_new_opts`) whose return type is a raw +pointer. The `Option` supports the lazy-allocation pattern: `None` until +`write()` is first called. diff --git a/docs/verilogae_py/INTERNALS.md b/docs/verilogae_py/INTERNALS.md new file mode 100644 index 00000000..5f67de5a --- /dev/null +++ b/docs/verilogae_py/INTERNALS.md @@ -0,0 +1,227 @@ +# `verilogae_py` — Python extension module + +**Location:** `verilogae/verilogae_py/` +**Role:** A CPython extension module (`verilogae.so` / `verilogae.pyd`) that +exposes VerilogAE to Python. It is written directly against the CPython C API +via `pyo3-ffi` (raw bindings, not the high-level PyO3 interface), with NumPy +integration for zero-copy vectorized model evaluation. 
+ +Cross-links: [verilogae INTERNALS](../verilogae/INTERNALS.md) · +[verilogae_ffi INTERNALS](../verilogae_ffi/INTERNALS.md) + +--- + +## Crate layout + +``` +verilogae/verilogae_py/src/ + lib.rs — PyInit_verilogae entry point; FUNCTIONS table; module-level setup + load.rs — load_py(), load_info_py(), load_vfs() Python function implementations + model.rs — VaeModel, VaeFunction, VaeParam Python types; call implementation + numpy.rs — NumpyArray, ItemType; NumPy C-API integration + typeref.rs — Global PyObject* refs: NUMPY_API, NUMPY_ARR_TYPE, etc. + ffi.rs — new_type() helper; zero!() macro; PyTypeObject boilerplate + offsets.rs — offset_to!() macro for computing struct field offsets + unicode.rs — UTF-8 ↔ Python str helpers + util.rs — likely() / unlikely() branch prediction hints + build.rs — pyo3-build-config: detects Python installation and sets link flags +``` + +--- + +## Module entry point (`lib.rs`) + +```rust +#[no_mangle] +pub unsafe extern "C" fn PyInit_verilogae() -> *mut PyObject +``` + +`PyInit_verilogae` is the standard CPython extension initialisation function. +It is called by the Python interpreter when `import verilogae` is executed. + +The function: +1. Calls `PyType_Ready` for all three Python types (`VAE_MODEL_TY`, + `VAE_FUNCTION_TY`, `VAE_PARAM_TY`). +2. Creates the module with `PyModule_Create`. +3. Calls `init_typerefs()` to populate global `PyObject*` caches + (see [Global type refs](#global-type-refs)). +4. Adds `__version__` (from `env!("CARGO_PKG_VERSION")`). +5. Adds `__all__`. 
+ +--- + +## Python API + +Three module-level functions: + +| Python name | Rust handler | Description | +|-------------|-------------|-------------| +| `verilogae.load(path, ...)` | `load_py` | Compile the `.va` file (if not cached) and return a `VaeModel` with callable functions | +| `verilogae.load_info(path, ...)` | `load_info_py` | Compile only the modelcard (no callable functions); much faster | +| `verilogae.export_vfs(path, ...)` | `load_vfs` | Run the preprocessor and return a dict of `{virtual_path: file_contents}` | + +All three accept keyword arguments for compilation options (include dirs, +defines, lint levels, cache dir, target, opt level, VFS dict). + +### `load_py` / `load_info_py` + +Both handlers call `verilogae_ffi::Opts::write()` to populate a C `Opts` +struct from the Python keyword arguments, then call `verilogae_load`. +`load_info_py` passes `full_compile = false`, so only the modelcard is +compiled. On success, +the raw library handle (a `*const c_void`) is wrapped in a `VaeModel` +Python object. + +--- + +## Python types + +### `VaeModel` + +``` +VaeModel attributes (read-only): + functions — tuple of VaeFunction objects (one per (*retrieve*) variable) + modelcard — tuple of VaeParam objects (one per parameter) + op_vars — tuple of str (operating-point variable names) + module_name — str (the Verilog-A module name) + nodes — tuple of str (port names) +``` + +`VaeModel` stores the raw library handle (`*const c_void`) obtained from +`verilogae_load`. All attribute values are built once at construction time by +reading the exported globals (`verilogae_functions`, `verilogae_real_params`, +etc.) via `verilogae_ffi` accessors. + +On deallocation (`tp_dealloc`), the library handle is passed to +`Library::from_raw` and dropped, which calls `dlclose`.
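
The ownership rule behind the `tp_dealloc` step (one owner per raw handle, closed exactly once when that owner goes away) can be illustrated in plain Rust. This is only a sketch under stated assumptions: `HandleOwner` and `closes_after_dropping` are hypothetical names, and a callback stands in for the `Library::from_raw` plus drop step, which is not reproduced here.

```rust
use std::cell::Cell;
use std::ffi::c_void;

/// Hypothetical owner of a raw library handle. The `close` callback stands in
/// for the `Library::from_raw(..)` + drop step, which is not reproduced here.
pub struct HandleOwner<'a> {
    raw: *const c_void,
    close: &'a dyn Fn(*const c_void),
}

impl<'a> HandleOwner<'a> {
    pub fn new(raw: *const c_void, close: &'a dyn Fn(*const c_void)) -> Self {
        Self { raw, close }
    }
}

impl Drop for HandleOwner<'_> {
    /// Runs once when the owner is destroyed, mirroring `tp_dealloc`.
    fn drop(&mut self) {
        (self.close)(self.raw);
    }
}

/// Creates and drops `n` owners and reports how many times `close` ran.
pub fn closes_after_dropping(n: usize) -> usize {
    let closed = Cell::new(0usize);
    let close = |_raw: *const c_void| closed.set(closed.get() + 1);
    for i in 0..n {
        // Fake, never-dereferenced handle value used only for illustration.
        let _owner = HandleOwner::new((i + 1) as *const c_void, &close);
    } // each `_owner` is dropped at the end of its loop iteration
    closed.get()
}
```

Tying the close step to `Drop` means that no code path can forget it or run it twice, which is the property the deallocation hook relies on.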
+ +### `VaeFunction` + +``` +VaeFunction attributes (read-only): + voltages — tuple of str: voltage input names + currents — tuple of str: current input names + real_params — tuple of str: real parameter names relevant to this function + int_params — tuple of str: integer parameter names + str_params — tuple of str: string parameter names + dep_break — dict: dependency-breaking variable names and their types + +VaeFunction.__call__(voltages=..., currents=..., params=..., temp=...) -> ndarray +``` + +Calling a `VaeFunction` dispatches `verilogae_call_fun_parallel` (which +uses `rayon_core` internally). The call handler: +1. Extracts each named input from the kwargs as a NumPy array or scalar. +2. Builds a `FatPtr` for each input: + - If scalar: `set_scalar(val)`. + - If NumPy array: `set_ptr(data_ptr, stride_bytes)` using the NumPy + array's strides. +3. Allocates an output array with `PyArray_SimpleNew`. +4. Calls `verilogae_call_fun_parallel(fun_ptr, cnt, voltages, currents, ...)`. +5. Returns the output ndarray. + +### `VaeParam` + +``` +VaeParam attributes (read-only): + name — str + units — str + description — str + group — str + type — str ("real", "integer", or "string") + default — float, int, or str (the default value) + flags — int (ParamFlags bitmask) +``` + +`VaeParam` is constructed from the modelcard data (the output of +`init_modelcard` plus the name/unit/description arrays). + +--- + +## NumPy integration (`numpy.rs`) + +`NumpyArray` wraps a raw `*mut PyObject` that is guaranteed to be a NumPy +array. It is constructed via `PyArray_SimpleNew` (for output arrays) or by +checking `PyArray_Check` on caller-supplied arguments. + +```rust +pub enum ItemType { Float64, Int32, Complex128 } +``` + +`NumpyArray::data_ptr()` returns the raw data pointer and `strides()` returns +the byte strides. 
These are passed directly into `FatPtr::set_ptr`, enabling +zero-copy access to NumPy array data regardless of memory layout (C-order, +Fortran-order, or strided slices). + +Complex-valued outputs use `NUMPY_CDOUBLE_DESCR` to create arrays of +`np.complex128`. + +--- + +## Global type refs (`typeref.rs`) + +Several global `*mut PyObject` pointers are cached after module init: + +| Symbol | Value | +|--------|-------| +| `NUMPY_API` | NumPy C API function table pointer | +| `NUMPY_ARR_TYPE` | `np.ndarray` type object | +| `NUMPY_CDOUBLE_DESCR` | `np.dtype(complex)` descriptor | +| `TEMPERATURE_STR` | interned `"temp"` Python string | +| `VOLTAGES_STR` | interned `"voltages"` Python string | +| `CURRENTS_STR` | interned `"currents"` Python string | + +`init_typerefs()` populates these globals once during `PyInit_verilogae`. +Using cached interned strings for hot-path keyword argument lookups avoids +repeated `PyUnicode_FromString` calls per function invocation. + +--- + +## `offset_to!` macro and struct field offsets + +Because `VaeModel`, `VaeFunction`, and `VaeParam` are `#[repr(C)]` structs +with `PyObject_HEAD` at the start, member offsets for `PyMemberDef` must be +computed at compile time. The `offset_to!` macro expands to +`std::mem::offset_of!(StructName, field)` (or an equivalent `addr_of!`-based +implementation for older compilers). + +--- + +## Build configuration (`build.rs`) + +`pyo3-build-config` detects the Python installation: +- The Python include path (`-I/usr/include/pythonX.Y`). +- The link library and flags (`-lpython3.X` or `python3X.lib`). +- The `Py_3_8` cfg flag (enabling `METH_FASTCALL` for the function table). + +On the `static` feature path, the extension links `libverilogae.a` directly. +On the dynamic path, `libverilogae.so` must be available at runtime. 
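
The `ptr + stride` access pattern described in the NumPy integration section can be sketched in isolation. This is a minimal illustration, not the real type: `FatPtrSketch`, its field layout, and the convention that a stride of 0 encodes a scalar are all assumptions made for the example, and the actual `FatPtr` in `verilogae_ffi` may differ.

```rust
/// Hypothetical stand-in for `FatPtr`: a base pointer plus a byte stride.
/// The real layout in `verilogae_ffi` may differ.
#[derive(Clone, Copy)]
pub struct FatPtrSketch {
    ptr: *const f64,
    /// Bytes between consecutive elements; 0 is assumed to mean "scalar".
    stride: usize,
}

impl FatPtrSketch {
    /// Broadcast a single value: every index reads the same location.
    pub fn from_scalar(val: &f64) -> Self {
        Self { ptr: val as *const f64, stride: 0 }
    }

    /// View strided data, e.g. one column of a row-major 2-D array.
    pub fn from_strided(base: *const f64, stride_bytes: usize) -> Self {
        Self { ptr: base, stride: stride_bytes }
    }

    /// Read element `i` by stepping `i * stride` bytes from the base pointer.
    ///
    /// # Safety
    /// The caller must guarantee that the addressed element is in bounds,
    /// aligned, and alive for the duration of the read.
    pub unsafe fn read(&self, i: usize) -> f64 {
        unsafe {
            let addr = (self.ptr as *const u8).add(i * self.stride) as *const f64;
            addr.read()
        }
    }
}

/// Sum `cnt` elements through the fat pointer without copying the input.
pub fn sum_through(p: FatPtrSketch, cnt: usize) -> f64 {
    (0..cnt).map(|i| unsafe { p.read(i) }).sum()
}

/// Column 1 of a 3x2 row-major matrix: start at element 1, step one row.
pub fn column_demo() -> f64 {
    let data: [f64; 6] = [1.0, 10.0, 2.0, 20.0, 3.0, 30.0];
    let row_bytes = 2 * std::mem::size_of::<f64>();
    let col = FatPtrSketch::from_strided(data[1..].as_ptr(), row_bytes);
    sum_through(col, 3)
}
```

Because the stride is expressed in bytes, the same read path serves contiguous arrays, non-contiguous columns, and broadcast scalars, so no input has to be copied into a packed buffer first.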
+ +--- + +## Key design decisions + +**Raw `pyo3-ffi` instead of high-level PyO3.** The high-level PyO3 API adds +significant overhead through GIL management, error conversion, and trait +dispatch. For a performance-oriented library that dispatches tens of thousands +of model evaluations per second, the raw CPython C API gives full control. +It also avoids PyO3's version compatibility shims. + +**`FatPtr` for zero-copy NumPy access.** The `ptr + stride` design means the +Python caller can pass a column from a 2-D NumPy array (non-contiguous) or a +scalar without any copying. VerilogAE reads through the stride directly in the +generated machine code. + +**`rayon_core::scope` inside `verilogae_call_fun_parallel`.** The +parallelism lives in the Rust layer, invisible to Python. The GIL is not +released during the call (with raw `pyo3-ffi` that would mean calling +`PyEval_SaveThread` / `PyEval_RestoreThread` by hand); instead, the generated +model function is assumed to be GIL-independent +(it is: it does only arithmetic on the provided data pointers). + +**Lazy NumPy import.** `import_array()` (NumPy's C-API initializer macro) is +deferred to the first call that needs NumPy, rather than at module init. This +avoids a hard dependency on NumPy when the user only calls +`load_info()`. + +> **TODO(verify):** The `import_array()` call location, and whether NumPy is +> actually a lazy dependency or is required at module init, are unverified: the code was not +> fully read due to macro complexity in `typeref.rs`. From 17fae5a55b71c3ca5cd327d4dbc77a1a218b2148 Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 14:00:51 +0200 Subject: [PATCH 27/28] docs: add xtask INTERNALS MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Covers the two active subcommands: `verilogae build/test/publish` (Python wheel pipeline: cargo build → pip wheel → auditwheel → twine) and `gen-msvcrt` (UCRT .def generation via mingw-w64 + clang -E).
Notes the commented-out cache and vendor modules. Co-Authored-By: Claude Sonnet 4.6 --- docs/xtask/INTERNALS.md | 161 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 161 insertions(+) create mode 100644 docs/xtask/INTERNALS.md diff --git a/docs/xtask/INTERNALS.md b/docs/xtask/INTERNALS.md new file mode 100644 index 00000000..3fae2c29 --- /dev/null +++ b/docs/xtask/INTERNALS.md @@ -0,0 +1,161 @@ +# `xtask` — Auxiliary build commands + +**Location:** `xtask/` +**Role:** A [cargo-xtask](https://github.com/matklad/cargo-xtask) binary that +implements build automation tasks too complex for plain `cargo` commands. +Invoked as `cargo xtask ` via an alias in `.cargo/config`. + +This crate contains no compiler logic — it is a shell-scripting harness that +orchestrates `cargo build`, `pip wheel`, `auditwheel`, and `twine`. + +Cross-links: [verilogae INTERNALS](../verilogae/INTERNALS.md) · +[verilogae_py INTERNALS](../verilogae_py/INTERNALS.md) · +[ARCHITECTURE](../../ARCHITECTURE.md) + +--- + +## Crate layout + +``` +xtask/src/ + main.rs — entry point; project_root(); subcommand dispatch + flags.rs — xflags CLI definition + generated parser + build_py.rs — verilogae build / test / publish subcommands + msvcrt.rs — gen-msvcrt subcommand + cache.rs — (module present but commented out / unused) + vendor.rs — (module present but commented out / unused) +``` + +Dependencies are intentionally minimal: `xshell` (shell commands), `xflags` +(CLI parsing), `anyhow` (error handling), `md5` and `base_n` (unused in +active code — leftovers from the commented-out `cache` module). + +--- + +## Subcommands + +### `cargo xtask verilogae build [--force] [--manylinux] [--windows] [--install]` + +Builds the `verilogae_py` Python extension wheels for Python 3.8–3.11. + +**Steps:** + +1. Sets `RUSTFLAGS="-C strip=symbols"` to strip debug symbols from the + release build. +2. Selects the Rust target: + - `--windows` → `x86_64-pc-windows-msvc` + - otherwise → `x86_64-unknown-linux-gnu` +3. 
Runs `cargo build --release -p verilogae --target {target}` to produce + `libverilogae.so` / `verilogae.dll`. +4. If `--force`, clears the `wheels/` directory. Otherwise, aborts if + `wheels/` is non-empty. +5. For each Python version in 3.8–3.11 (filtered to those actually + installed on the host): + - Linux: `pip wheel . -w ./wheels --no-deps` with `PYO3_PYTHON={py}` + - Windows: same with `PYO3_CROSS_PYTHON_VERSION={version}` and + `CARGO_BUILD_TARGET=x86_64-pc-windows-msvc` +6. Audits wheels with `auditwheel repair`: + - `--manylinux` → repairs to a `manylinux` tag (portable Linux binary). + - otherwise (Linux) → repairs with `--plat linux_x86_64`. + - `--windows` → skips `auditwheel` (not applicable on Windows). +7. If `--install`, installs each built wheel into the matching Python + installation via `pip install --force-reinstall`. + +`find_py` discovers Python versions by probing `python3.8` … `python3.11` +executables. On Windows it uses the versions directly; on Linux it requires +the interpreter to be in `PATH`. + +### `cargo xtask verilogae test` + +Installs NumPy into each available Python 3.8–3.11 interpreter and runs +`verilogae/tests/test_hicum.py` for each. + +### `cargo xtask verilogae publish [--windows]` + +Convenience wrapper: runs `build --force --manylinux --install` (or +`--windows`), then (Linux only) `test`, then `twine upload wheels/*` to +upload all wheels to PyPI. + +### `cargo xtask gen-msvcrt` + +Generates the Universal C Runtime (UCRT) `.def` export-definition files +that the linker needs to link against `api-ms-win-crt-*.dll` on Windows +cross-compilation targets. + +**Steps:** + +1. Clones `mingw-w64` at tag `v10.0.0` (shallow, single-branch) into + `./mingw/`. +2. 
For each of the 15 UCRT DLL names in `UCRT_FILES` and each architecture + (`X64`, `ARM64`): + - If a `.def` file exists in `mingw-w64-crt/lib-common/`, sanitize it + (strip comments, `DATA` entries, and preprocessor directives) and copy + it to `openvaf/target/ucrt/defs/{arch}/`. + - If only a `.def.in` template exists, run it through `clang -E` + (C preprocessor) with `-D DEF_{ARCH}` to expand the architecture + guard macros, then sanitize and write the result. +3. Removes the `./mingw/` checkout. + +`sanitize_def` strips: +- Lines containing `==` (version-compare guards). +- Lines ending with `DATA` or `\tDATA` (data-export entries that cause + linker errors). +- Empty lines, `;` comments, and `#` preprocessor lines. + +It also replaces the special `"; strnlen replaced by emu"` comment with a real +`strnlen` export entry. + +--- + +## `flags.rs` — CLI definition + +The CLI is declared with `xflags::xflags!` and the generated parser code +is committed alongside the declaration (the `// generated start … end` +block). Regeneration is triggered by `UPDATE_XFLAGS=1 cargo build`. + +``` +xtask +├── verilogae +│ ├── build [--force] [--manylinux] [--windows] [--install] +│ ├── test +│ └── publish [--windows] +└── gen-msvcrt +``` + +--- + +## `project_root()` + +```rust +fn project_root() -> PathBuf { + // Prefer the runtime value; fall back to the compile-time one as an owned String. + Path::new(&env::var("CARGO_MANIFEST_DIR").unwrap_or_else(|_| env!("CARGO_MANIFEST_DIR").to_owned())) + .ancestors() + .nth(1) + .unwrap() + .to_path_buf() +} +``` + +Walks one level up from the `xtask/` manifest to reach the workspace root. +`main` immediately calls `sh.change_dir(project_root())` so that all +subsequent `xshell` commands run from the workspace root regardless of where +`cargo xtask` was invoked. + +--- + +## Key design decisions + +**`xshell` for shell commands.** `xshell` provides a Rust-native, +cross-platform shell API with interpolation (`cmd!(sh, "cargo build {target}")`) +and `push_env`/`push_dir` RAII guards.
This avoids shell-escaping bugs and +makes the build scripts work on both Linux and Windows without `bash`. + +**Commented-out modules.** `cache.rs` and `vendor.rs` are present in the +source tree but their `mod` declarations in `main.rs` are commented out. +They are dead code preserved for potential future reactivation. The +`cache` module was likely intended to implement the same content-addressed +caching as `openvaf::cache` and `verilogae::cache`, but it is not wired up. + +**`gen-msvcrt` uses `clang -E`.** The MinGW-w64 `.def.in` files use C +preprocessor guards (`DEF_X64`, `DEF_ARM64`) to select architecture-specific +exports. Invoking `clang -E` is simpler than implementing a C preprocessor +in Rust and is justified for a one-time code-generation task. From 838b84002a21affe2d92a06d9ebf038475b55b2d Mon Sep 17 00:00:00 2001 From: philippevelha Date: Sun, 24 May 2026 20:52:59 +0200 Subject: [PATCH 28/28] docs: add full workspace crate interaction graph (Graphviz) crate_graph.gv covers all 35 workspace crates in 7 clusters: entry points, compilation library, frontend, MIR middle-end, backend, utility crates, and consumer crates. Edges are labelled with the primary function/method calls crossing each crate boundary. Dashed edges denote dev-dependencies, generated code, and widely-used macro utilities. Rendered to SVG and PDF.
Regenerate with: dot -Tsvg docs/crate_graph.gv -o docs/crate_graph.svg Co-Authored-By: Claude Sonnet 4.6 --- docs/crate_graph.gv | 289 ++++++++++++ docs/crate_graph.pdf | Bin 0 -> 57781 bytes docs/crate_graph.svg | 1002 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 1291 insertions(+) create mode 100644 docs/crate_graph.gv create mode 100644 docs/crate_graph.pdf create mode 100644 docs/crate_graph.svg diff --git a/docs/crate_graph.gv b/docs/crate_graph.gv new file mode 100644 index 00000000..658cb322 --- /dev/null +++ b/docs/crate_graph.gv @@ -0,0 +1,289 @@ +// OpenVAF / VerilogAE workspace — crate interaction diagram +// Render with: dot -Tsvg docs/crate_graph.gv -o docs/crate_graph.svg +// dot -Tpdf docs/crate_graph.gv -o docs/crate_graph.pdf + +digraph openvaf { + graph [ + rankdir = LR + fontname = "Helvetica" + fontsize = 12 + compound = true + splines = true + nodesep = 0.4 + ranksep = 1.2 + bgcolor = "#fafafa" + ] + node [ + shape = box + style = "filled,rounded" + fontname = "Helvetica" + fontsize = 11 + margin = "0.15,0.10" + ] + edge [ + fontname = "Helvetica" + fontsize = 9 + labeldistance = 2.0 + labelangle = 30 + ] + + // ─────────────────────────────────────────────────────────────────── + // CLUSTER 0 — Binary entry points + // ─────────────────────────────────────────────────────────────────── + subgraph cluster_bins { + label = "Entry points" + style = "filled,rounded" + color = "#444444" + fillcolor = "#eeeeee" + fontsize = 13 + + openvaf_driver [label="openvaf-driver\n(CLI binary)" fillcolor="#c8e6c9"] + } + + // ─────────────────────────────────────────────────────────────────── + // CLUSTER 1 — Compilation library (openvaf lib crate) + // ─────────────────────────────────────────────────────────────────── + subgraph cluster_lib { + label = "Compilation library" + style = "filled,rounded" + color = "#444444" + fillcolor = "#e3f2fd" + fontsize = 13 + + openvaf [label="openvaf\n(lib)" fillcolor="#bbdefb"] + } + + //
─────────────────────────────────────────────────────────────────── + // CLUSTER 2 — Frontend + // ─────────────────────────────────────────────────────────────────── + subgraph cluster_frontend { + label = "Frontend (parsing → HIR)" + style = "filled,rounded" + color = "#444444" + fillcolor = "#fff8e1" + fontsize = 13 + + tokens [label="tokens" fillcolor="#fff9c4"] + lexer [label="lexer" fillcolor="#fff9c4"] + syntax [label="syntax\n(CST / AST)" fillcolor="#fff9c4"] + parser [label="parser" fillcolor="#fff9c4"] + vfs [label="vfs" fillcolor="#ffe082"] + paths [label="paths\n(AbsPathBuf)" fillcolor="#ffe082"] + preprocessor [label="preprocessor" fillcolor="#ffcc80"] + basedb [label="basedb\n(Salsa root DB)" fillcolor="#ffb74d"] + hir_def [label="hir_def\n(item tree, bodies)" fillcolor="#ffa726"] + hir_ty [label="hir_ty\n(type inference)" fillcolor="#fb8c00"] + hir [label="hir\n(query facade)" fillcolor="#f57c00"] + } + + // ─────────────────────────────────────────────────────────────────── + // CLUSTER 3 — MIR / middle-end + // ─────────────────────────────────────────────────────────────────── + subgraph cluster_mir { + label = "MIR middle-end" + style = "filled,rounded" + color = "#444444" + fillcolor = "#f3e5f5" + fontsize = 13 + + mir [label="mir\n(IR types)" fillcolor="#e1bee7"] + mir_build [label="mir_build\n(SSA builder)" fillcolor="#ce93d8"] + hir_lower [label="hir_lower\n(HIR → MIR)" fillcolor="#ba68c8"] + mir_opt [label="mir_opt\n(optimisations)" fillcolor="#ab47bc"] + mir_autodiff [label="mir_autodiff\n(AD pass)" fillcolor="#9c27b0"] + mir_reader [label="mir_reader\n(text parser)" fillcolor="#e1bee7"] + mir_interpret [label="mir_interpret\n(test interpreter)" fillcolor="#e1bee7"] + } + + // ─────────────────────────────────────────────────────────────────── + // CLUSTER 4 — Backend + // ─────────────────────────────────────────────────────────────────── + subgraph cluster_backend { + label = "Backend (codegen → OSDI)" + style = "filled,rounded" + color = 
"#444444" + fillcolor = "#fce4ec" + fontsize = 13 + + sim_back [label="sim_back\n(model extraction)" fillcolor="#f48fb1"] + osdi [label="osdi\n(OSDI ABI codegen)" fillcolor="#f06292"] + mir_llvm [label="mir_llvm\n(LLVM codegen)" fillcolor="#ec407a"] + linker [label="linker" fillcolor="#e91e63"] + target [label="target\n(target triple)" fillcolor="#e91e63"] + linker_target [label="linker_target" fillcolor="#e91e63"] + } + + // ─────────────────────────────────────────────────────────────────── + // CLUSTER 5 — Data-structure library crates + // ─────────────────────────────────────────────────────────────────── + subgraph cluster_util { + label = "Utility / data-structure crates" + style = "filled,rounded" + color = "#444444" + fillcolor = "#e8f5e9" + fontsize = 13 + + stdx [label="stdx\n(macros, iters)" fillcolor="#c8e6c9"] + arena [label="arena\n(typed arena)" fillcolor="#c8e6c9"] + bitset [label="bitset\n(BitSet / Sparse)" fillcolor="#a5d6a7"] + workqueue [label="workqueue\n(WorkQueue/Stack)" fillcolor="#a5d6a7"] + list_pool [label="list_pool\n(ValueListPool)" fillcolor="#c8e6c9"] + bforest [label="bforest\n(B+-tree)" fillcolor="#c8e6c9"] + typed_indexmap [label="typed_indexmap\n(TiMap/TiSet)" fillcolor="#c8e6c9"] + base_n [label="base_n\n(base-36 encode)" fillcolor="#c8e6c9"] + mini_harness [label="mini_harness\n(test runner)" fillcolor="#c8e6c9"] + sourcegen [label="sourcegen\n(codegen tests)" fillcolor="#fff9c4"] + } + + // ─────────────────────────────────────────────────────────────────── + // CLUSTER 6 — Consumer / downstream crates + // ─────────────────────────────────────────────────────────────────── + subgraph cluster_consumers { + label = "Consumer crates" + style = "filled,rounded" + color = "#444444" + fillcolor = "#e0f7fa" + fontsize = 13 + + melange_core [label="melange-core\n(circuit simulator)" fillcolor="#b2ebf2"] + verilogae [label="verilogae\n(model eval lib)" fillcolor="#80deea"] + verilogae_ffi [label="verilogae_ffi\n(C FFI wrapper)" 
fillcolor="#4dd0e1"] + verilogae_py [label="verilogae_py\n(Python extension)" fillcolor="#26c6da"] + xtask [label="xtask\n(build scripts)" fillcolor="#b2ebf2"] + } + + // ═══════════════════════════════════════════════════════════════════ + // EDGES — Entry points + // ═══════════════════════════════════════════════════════════════════ + openvaf_driver -> openvaf [label="compile()\nexpand()" color="#2e7d32" fontcolor="#2e7d32"] + + // ═══════════════════════════════════════════════════════════════════ + // EDGES — openvaf lib → subsystems + // ═══════════════════════════════════════════════════════════════════ + openvaf -> basedb [label="CompilationDB::new_fs()" color="#1565c0" fontcolor="#1565c0"] + openvaf -> sim_back [label="collect_modules()" color="#1565c0" fontcolor="#1565c0"] + openvaf -> mir_llvm [label="LLVMBackend::new()" color="#1565c0" fontcolor="#1565c0"] + openvaf -> osdi [label="osdi::compile()" color="#1565c0" fontcolor="#1565c0"] + openvaf -> linker [label="link()" color="#1565c0" fontcolor="#1565c0"] + openvaf -> base_n [label="encode() [cache hash]" color="#1565c0" fontcolor="#1565c0"] + openvaf -> target [label="host_triple()\nget_target_names()" color="#1565c0" fontcolor="#1565c0"] + + // ═══════════════════════════════════════════════════════════════════ + // EDGES — Frontend chain + // ═══════════════════════════════════════════════════════════════════ + tokens -> lexer [label="Token types"] + lexer -> parser [label="token stream"] + parser -> syntax [label="Parse"] + vfs -> paths [label="AbsPathBuf\nreexport" style=dashed] + preprocessor -> basedb [label="VfsStorage\nSourceMap" style=dashed] + syntax -> basedb [label="parse()\nast_id_map()"] + basedb -> vfs [label="file_text()\nset_file_text()"] + basedb -> preprocessor [label="preprocess()"] + basedb -> syntax [label="parse()"] + hir_def -> basedb [label="db.parse()\ndb.file_text()\nSalsa queries"] + hir_def -> syntax [label="AstIdMap\nAstPtr"] + hir_def -> arena [label="Idx storage"] + 
hir_def -> stdx [label="impl_idx_from!\nimpl_intern_key!"] + hir_ty -> hir_def [label="item_tree()\ndef_map()\nbodies()"] + hir_ty -> basedb [label="Salsa queries"] + hir -> hir_ty [label="infer()\nalias_resolve()"] + hir -> hir_def [label="item_tree()\ndef_map()"] + hir -> basedb [label="CompilationDB\nreexport" style=dashed] + + // ═══════════════════════════════════════════════════════════════════ + // EDGES — MIR construction + // ═══════════════════════════════════════════════════════════════════ + hir_lower -> hir [label="MirBuilder::new(db, module)\nCallBackKind / PlaceKind"] + hir_lower -> mir [label="Function\ndfg.make_inst()\nlayout.append_inst()"] + hir_lower -> mir_build [label="SSABuilder::def_var()\nSSABuilder::use_var()"] + mir_build -> mir [label="Function\nDataFlowGraph\nLayout"] + mir_build -> bitset [label="BitSet\n[block reachability]"] + mir -> bforest [label="PhiNode.blocks\nMap"] + mir -> list_pool [label="ValueList\nValueListPool"] + mir -> arena [label="PrimaryMap\nPrimaryMap"] + mir_opt -> mir [label="dead_code_elimination()\nsimplify_cfg()\nSCCP / inst_combine()"] + mir_opt -> bitset [label="BitSet / HybridBitSet\nSparseBitMatrix\n[liveness sets]"] + mir_opt -> workqueue [label="WorkQueue\n[DCE worklist]"] + mir_autodiff -> mir [label="auto_diff()\nDominatorTree::compute()"] + mir_autodiff -> mir_opt [label="dead_code_elimination()\nsimplify_cfg()"] + mir_autodiff -> workqueue [label="WorkQueue\n[live-derivatives fixpoint]"] + workqueue -> bitset [label="BitSet\n[membership oracle]"] + + // ═══════════════════════════════════════════════════════════════════ + // EDGES — sim_back (model extraction) + // ═══════════════════════════════════════════════════════════════════ + sim_back -> hir_lower [label="MirBuilder::build()\nHirInterner"] + sim_back -> hir [label="Module\nParameter\nVariable queries"] + sim_back -> mir_opt [label="dead_code_elimination()\naggressive_dce()\nSCCP / simplify_cfg()"] + sim_back -> mir_autodiff 
[label="auto_diff()\nunknowns()"] + sim_back -> typed_indexmap [label="TiMap<…>\n[signal/param tables]"] + + // ═══════════════════════════════════════════════════════════════════ + // EDGES — Backend + // ═══════════════════════════════════════════════════════════════════ + osdi -> sim_back [label="collect_modules()\nModule list"] + osdi -> mir_llvm [label="LLVMBackend::new()\nCodegenCx::compile_module()"] + osdi -> linker [label="link()"] + osdi -> base_n [label="encode() [UUID / symbol names]"] + osdi -> target [label="is_like_windows\nsymbol naming"] + mir_llvm -> mir [label="Function traversal\nOpcode dispatch"] + mir_llvm -> target [label="target_machine\ndata_layout\nfeatures"] + linker -> target [label="linker flavor\npre/post link args"] + linker -> linker_target [label="LinkerFlavor\nLldFlavor"] + target -> linker_target [label="LinkerFlavor\nabi"] + + // ═══════════════════════════════════════════════════════════════════ + // EDGES — test / dev tools + // ═══════════════════════════════════════════════════════════════════ + mir_reader -> mir [label="parse_function()\nparse_functions()" style=dashed color="#666666" fontcolor="#666666"] + mir_interpret -> mir [label="Interpreter::run()\neval() dispatch" style=dashed color="#666666" fontcolor="#666666"] + mir_autodiff -> mir_reader [label="[dev-dep]\nparse test fixtures" style=dashed color="#666666" fontcolor="#666666"] + mir_autodiff -> mir_interpret [label="[dev-dep]\nnumerical verify" style=dashed color="#666666" fontcolor="#666666"] + sourcegen -> syntax [label="ensure_file_contents()\nSyntaxKind gen" style=dashed color="#888888" fontcolor="#888888"] + sourcegen -> mir [label="ensure_file_contents()\nopcode / InstBuilder gen" style=dashed color="#888888" fontcolor="#888888"] + sourcegen -> hir [label="ensure_file_contents()\nbuiltin functions gen" style=dashed color="#888888" fontcolor="#888888"] + sourcegen -> osdi [label="ensure_file_contents()\nC header bindings gen" style=dashed color="#888888" 
fontcolor="#888888"] + + // ═══════════════════════════════════════════════════════════════════ + // EDGES — Consumer crates + // ═══════════════════════════════════════════════════════════════════ + melange_core -> openvaf [label="compile()\n[batch mode]" color="#006064" fontcolor="#006064"] + melange_core -> typed_indexmap [label="TiMap / TiSet\n[Circuit internals]" color="#006064" fontcolor="#006064"] + + verilogae -> hir [label="CompilationDB::new()\nModule / Parameter queries" color="#00695c" fontcolor="#00695c"] + verilogae -> hir_lower [label="MirBuilder::build()\nHirInterner" color="#00695c" fontcolor="#00695c"] + verilogae -> mir_opt [label="dead_code_elimination()\naggressive_dce()\nsimplify_cfg()" color="#00695c" fontcolor="#00695c"] + verilogae -> mir_autodiff [label="auto_diff()" color="#00695c" fontcolor="#00695c"] + verilogae -> mir_llvm [label="LLVMBackend::new()\ngen_func_obj()" color="#00695c" fontcolor="#00695c"] + verilogae -> linker [label="link()" color="#00695c" fontcolor="#00695c"] + verilogae -> base_n [label="encode() [function prefix]" color="#00695c" fontcolor="#00695c"] + verilogae -> target [label="Target::host_target()\nTarget::search()" color="#00695c" fontcolor="#00695c"] + verilogae -> paths [label="AbsPathBuf\n[include dirs]" color="#00695c" fontcolor="#00695c"] + verilogae -> bitset [label="BitSet [output values]" color="#00695c" fontcolor="#00695c"] + verilogae -> typed_indexmap [label="TiMap [model info]" color="#00695c" fontcolor="#00695c"] + + verilogae_ffi -> verilogae [label="verilogae_load()\nverilogae_export_vfs()\nverilogae_call_fun_parallel()\n[static feature: direct Rust]\n[dynamic: extern C ABI]" color="#00838f" fontcolor="#00838f"] + verilogae_py -> verilogae_ffi [label="verilogae_load()\nverilogae_fun_ptr()\nverilogae_init_modelcard()\nverilogae_real_params() …" color="#00acc1" fontcolor="#00acc1"] + + xtask -> base_n [label="encode() [cache hash]" style=dashed color="#666666" fontcolor="#666666"] + + // 
═══════════════════════════════════════════════════════════════════ + // Widely-used utility edges (shown as dashed to reduce clutter) + // ═══════════════════════════════════════════════════════════════════ + stdx -> hir_def [label="impl_idx_from!\nimpl_intern_key!" style=dashed color="#999999" fontcolor="#999999"] + stdx -> mir [label="impl_idx_from!" style=dashed color="#999999" fontcolor="#999999"] + stdx -> sim_back [label="impl_idx_from!\nzip()" style=dashed color="#999999" fontcolor="#999999"] + + // Legend + subgraph cluster_legend { + label = "Legend" + style = "filled,rounded" + color = "#444444" + fillcolor = "#f5f5f5" + fontsize = 11 + rank = sink + + leg_solid [label=" solid arrow = runtime dependency " shape=plaintext] + leg_dashed [label=" dashed arrow = dev-dep / generated / macro use " shape=plaintext] + } +} diff --git a/docs/crate_graph.pdf b/docs/crate_graph.pdf new file mode 100644 index 0000000000000000000000000000000000000000..f71aaa92952bca776a54d9f4e3e85d857511d9de GIT binary patch literal 57781 zcmZ^}V~nOvxHa0gZQHgvZQHhO+n#>fwr$(C?dk4m>&(0NIp@che1GmrDpyiT)rEDh zwa694BAXLZsB{8x+J9hzWlcgMV~MH zlmXo`_s)J;-z=YxCyYaAjIVp`u`l+s4@X=EyFI<1S0^`2yNq7h7y%oE-5=Kv-&OYy z8%G;s;TEGin&v=uB)I*6SxwZ@V>dN&z0sAgOXveMYc>8SwWn*(X-cZY`BY<#4{>H@U zoY<5H;uV*b;d^(&9v4Md)I!1O-B_b{6m5R%@4|}nARwuslTi=$d<|$}9vNM}!{{f*%Szar*FektA-DS# zVE$o|eS^}BnciWlGetyI&^F0m1c3qJ( zYxab;;Rx=q52em2n79;Py@I|z)Qf{QUD)?;H)!XfeV&hE28FV%@Y>Cnc7iDwf@TFB zuHzqFt|k|3*a+N^ZS!>cjR^#vnUw07UNwV>+JzqTFM7twf>P?0bhhm0f5$~0UAA9g zY*u13XiA<8KJ;Bxs@=NzdG?qhlf&$I;UREU;a{C}N^lQa{xe`fH$jSKF>e~X%Z9|3L>25-+sZ%+Eg zC9ns8`e8@?NqpOxyG#3x}1*}xV zFDR|W)fSaki*^N;SKJ(0>aNXrPj!XUk#J9dRfT_34gHlA^nlm*{TI6JyodFk>#5!q zC_iNVU{5#6ee$s%HfOGTn;r3dS@qB@`Tdn1KuFZueVZO%JGKMBwpF{skTbE%0_`x` z5D{z605^h68M1rdX*g%_x9xi^ALC9pvGpMBO3NGd<{PQ!vF|CZUq%&;YL!8-)xE;c zC+A%Mav3R&1f@~gW_X%*@0WA$hI_tKgGMg;1N7W1$YA(+Cm@J>-@mbm5l`n6am|p3 zkdHoFNpze3e0MuOinq+kKZufHPAQyH=A<=926>yVS_+|Z6mTN$X<%EZ-xd?WKM`ge 
z#YZZ{4p=H(x{hUv(+R#6SmU37V%nczC%GGHMVw*(A|PQgl#+KIX;Me#y1~ z!wNWx1`}=IjMnQWO~u_#!Wqb2f)W+P)}zd+cX!HK+{xNtp0FndUwCjO?ogtz<<^oi z*jeb5fhda^?|HqQ4S9pUS?UUmd`Pu9Oz%Szmn}Bj%;~M&K>P|-LI@Ru0QB_QOA#J_Z`m|8mZV+;v2J35xteoDhZqO~= z)#eyhpQg{}JA?QQ`|T&XXww3hZiRn0R=wBK4O?amyfa>kQ!?~9XBizgH z&?Rzncs0B5OXi|zZ4_5McqbpZJjdShhFbxeg=$h`Cka9kqa-zkb8uOrMk-G6e;>SN zNs~wJ3bVnFmD4jAanVAx6i8zgTj6!x9b^2fQ;&OROyCqHIW1%Cw|E2hF(hH+GNG#p ziA|!5r=9VSciAq3)%_f2YfT0JF?l!wEe4Co!$W<5!h6`nrrHvhz}fK~YlllW;n?(y zOep%>dUkyZSM>h5*i&xkzGT3iWRj$rh&Wd$H*_8O#9dJc1a%V$++*@-(kj(L#qnVd z`qT{RHG-C>W9rWGjM6OMT!vY;VUb2vo$q#`@F4-!jn|~L;lG$kW4>TUKl1&fsk zQaYn@ZPu}!Vl3)J$gI6q_k-4;I`Ip>;lc05Yi}=L^6ig6*h&1V1LMLRUb+i^UBmm3 z()N8@zCdHFlUfspcg^6?x}$yNK%L#dX! z-lSw+B}~)#D#{Ffgs%@{_9bVJh4N#@*{R%VX-PeF**cX^(OZE2;iF>#TU@Ee@ko%O3!OzgP~NL|d}J7J_Ma@+ zS9?i#s`QzyO&mQ`jfdjcR(c|f3{#c#j6DciD%M**Ck?nFg2NUodVec3tN=1T7h7v0 z*xDHnwq;lS4ouHD78CLmttu#Ynj`7zl*&XA{${*k&?%-r`?so|y<<_m)H*<%p*F;RPMlo! zutn1O3VXo)kNdDmf`hI`(y@b!AjdTeNt|YHY)0t%Vzkp%71!LfRrtunsp2Q&!ml(~ z*Hm3F6f_pd4Td%@>tvyIW+zQdRxv_=!avNNxZ()oBoX8XY*D(a6BSHE@oW%-PE?&4 z*B$G%p~Acy5hvjK=GT|lrM*hGT42{!@h`vE`*KhxsZVYjy3K!4zmre7v$~K=PS4{i zig}KAUy~TqaUBfwTQrDx31y}`7QjyO+WxXQyv?FHGdD3_bGW5J^-hpBCV>7Z_iN-{>url9kIoN=8_Q=)`Js60OG}3gNK4XHe}zF`9HOsXj(3IF zM%rpfUjm_#*j3cf4s@heS_W^+-JryeKD8iuvM8s;=q-1RU+i_^x%@>9`l+`yb&(;V z)ou-0G$+q_g(YB1!!e}fgzb{yz=5_om0M6r1)4j9Ne;QP)N{H!+GLh}3~zXmt?!jR zDpjTmI$XEP!p%8iCHvJDfyZTRVH#K%}$8x9I-Oiu%zJWD{Fl^oeom5`ho~@36x~pFFvG%2GU6l zDqk{4TmJJD@oV{Gx(}y0V>%MIJq7xpz#DcgmW-*UJ|h zVHLJw6fxzGM!Jp$P2OD`AWMikt%vd^GA`yYe;u^lDZBq0hXau&~txb zy-$?q_@d_?zKh%LP0Jlv1qlhez=Pz@t;3%Xz$lQcl0hjHP4YVk6_MW@P;0Zsb@z42 zOvMN$Z+sG*-E_>9f5rx{SlIaNSQB>7*yW1>>V#9x-;+*1xEbK`N>G`Nag%msq*Wtf zy0kz{Gr&lGK^MSC*j~ng& z;4Pxp0zl-Wa#)zAX4OEu2PkqC4+Y7x=K=`0im2SV;=#>yPjv)%!>Z zKX&vY8NXjYGAb4H`&4gNals}KCa_+lkJTF&p;D2ncVfeR0)B@qh4tPiPJnfl`pQtk zclViH`-ixNA^3@*X~R+=rgiBwv0L6;;P{QjS|)f0ncXt0{*CCC&8l8WMb?FI|7Zvns^HQxih_M$$1 zHq{4$`FsZ}5^B#9u+3Wlv(7{(&t_2y z_x)ky9Gpq%wbooX;2xOLP#2;Yw;_K~Ckv;COmTQB(! 
zZ`HppROr?6XW4Y`=A$f6^_05Jc}esW9bDr>&p{snQ9T`|DLOdHScg7jOZ-oIPKK_; zta<~$uetL2e*MH#( zAUKG;&rNOgPz1`r+6abv@P|{tV7tCgOQBC6J6Q94;wTXdrI^7dXCmcCtWgVAMtn2F zluVjxpb;LcuWUON2!xpkI3wL!w_|o`V`aqByl~EekP9Lp1F!*CzuM?uRc>6rd z=q|2u_{*FrwWO_0uXnQY+(agyQ)!O7KdO1PEF`>j_zSH(@AGe$FGk*!a< zhzrTeTqkkXv5Q1;f;iLd`OqTIF~_}i?DW;k{e03~d%NH$_zSH_3uJgi7@}l7{_5?f z2)GC>S+H)L|9&vn&s5Mj_(+>E;krY@p)VGS<&|UkOx?lpaO@aRDA2E~ zQ0m3>?xmYzU9gF43UaoDl&CM>^;~*4O2BC78vtU{1u%Z`S_ADv81;QRleGKv%Auec zB+$y1u&pcM-zddcXt9uVYISgy;kMOvQ?TwE=(;?yo3*c~nYy-1v71>HDwNqf=e-or zW&oP=>v^qSRQ;b4&NspPSHdQ-Cm?!547=<)A#L02;k}f3YK(?UNbl601`MA%P=QSuTbyAFIc)vv;(WG!9a5lwU;9^S7~33LA)0=7O6(;u zw&?@amFVm4Y3=>jESM4pHsbCj7k;k)FGsZm(V2mEJUM4AfB5Zy1MGLHO)hqY@k*|d zkt1$MvJi}DN!ahXNGmum5UV0D?CJAkGSJUZO*Zk0bcGU`9Xj_0>Qj;JBdQ5IV?ra< zq_a;h#!%f~3iygxiSMV@asb{PCIYdaZwFWLUHo1-J{EvAJnW8DPX3|bc${$o9K8=a zUAk7AaPc|qjLh1Ne~gr~Cg*!RtS$Nzh$8ak2&P>BCEzB#4c?UwQ2Ncq`0vI<8Fx8# zK-;!C!+R}cpFlXoRRqP)g3Q{#4znh{U2_8TO7JWF6523)!h16Ih?Hg>nF-``dIkpYnh>ll^F zI%keTX04f$eX?!kYTRsP>$r+n-jbn-0-_b+3MSYJ2m= zkxY4opK99T&UX4T)CqK3s7*j2i_*!V8%3ML(dtWBYB~wou_JHt8RTp4Ui!ZDra>=b z^wV($)1EAcBI9dBM%Ts2$0bXe^$fv`Jy-=>g;nDwu&a2R6Z?><3q_CRYkwy;Zk| z?%;)gDf;&xyYY7=1iCVF%S1R;9RY~T?Iu({-A9m^XbJQi$=n{q9v6>F%CCX|$cqt1 z!PiPyBySJ3O9yGN6Q0;_4gQzvjw)f7S`9?N$+sE!0rXVCoWG0XNC79nRteaipMZ%C zGy*gs4a!HCyM5E>*&U%9J(tsh-+@DWKBR!)K!dyN8z8|8iE!QRUJlvaiGQ9DvO_sL zz>9cdHT#BWd>*);T)NbgQ)^$$&oAJXT;{|ql%;4)>0S)&%~e4<{{H~&Wk=vrS7r?2v#VF&H=nJAE?DyaU4r(QX% z<^GR(MZMFVi92P6L#$Z28S+}%bW<4>@(0S-Ujuo;5JY}GKi-38CqbZ9sP1BfPfetuI@EM>n&&VZZ9E}lqzDXWTq&P0StP2$zNMovCIg!0!Z>$Z(d;Wl;raAz zXa?0F3)ZLoY`|B;t#m+n!Rdyr1Y=*zy3-7+bj*-^>&B8SmYx|dHK_l~%w^XZWuF55 zq?E=S9o>b|j#z(p1$h?4>nc{)o&>By_QYv(SdW5FtGNx`W9Nb!+!0qZY$$VUOvD0L z%bkCd&My179xUWR0>&H_re&P-eeDKL_em}DU;eZb#vNa`6ps`y(Z9jbo;r*^Wl*9I z1I?!zE61Omo8^QyuSfbeI({nS?m-|emYV6^w_sr8Ne~vHEF#x|c?=+#hFV0`Z^c15 zi73o6O5vwT#=Y-zzG$@^-asG_zW@T0Yt+WfDy5Y@ub-%?-?XBJ=cRe<*{B(?aoOv6Y+ZaMM} zg62~UK6}TLd67{joFXq^{&aDn7a@mU@v#!HG7CiYQm{gQAc&1~OH7r!&`8V@3%O>r 
z9YmArqB)UKM)}VsIwATXg9wE@(#QKTO3aRb$wt3I-Gjq`R=3Nq1Qp|Ou?I}PnJ|6% z2>yXK=C^;51$0xd)}1m*z;*&ThSy|+#bTP$GwL!D5LbF72|P|i?Y8Ddm5{N5P=q|i z64i4b#soH>cZ8qB0HdHB@{nM@Ob0WY-t@q67{Tg3l zhs#ZB4^KMvL60Wu>kzrDW5Vm>y*_ay zrE7lOS0i8e4U!LBiaj6^gbEI#c-&pSHTl%A`ubd6PAa%0_vwCoLqA;?atICGU-cus zYHQ3gIRP751|2jXq($f(^rx{?Ryy}jYbc9y!C*%ZCq*-%*KFu1NN~_rPtUiVMqBs; zh6znV!q)Nu7F@yV{a(ur`=n{;A%Ow=7u1*A#2FZttP#Dir`1)clw=u~9kdL73o*M0 z2jW9|BTmcttv&-osBNfoh(w^Ch0Kx#pEZ*Dlv^M47~L99^NM|`gLM&~%pMi~loutLe>9PJ{Qc+fux&P3muhcWjo3lw7_oRf&TX96p*f4E90l?BBB}=L)9qzHt1Pqw)KP(?IC#cpU2lC@bEPra( z{Oaqe(m&YiYgBknZ?+%$&UZgPsyNEcTZWCV$V#qtD1T>zBR5|#vN%vx&$41_h2jpf zau~$)#6XOfQbPu=U0KCs6BJ-P*j}ZK1|1!!uEWKx_FK23PNUPR@zckCVq>P6s`lF| zuxR2RvDSg%M8bAWSJUe3tD@*^xv$upKOKU%^8n=4=WT$Oif%_B>boVWCe#5Q_Zf3S z%h&4XtKtN2CJq#$)`6jT><<)la2)TkY69dJ*7FzaI*GfY&;;dqrC|d$=fiW4WnP#5 zJq}mee33IO+|Vjhk#m*ABJa-prf8lr-KW*I`&5JCXZqr;s|>F9qUK;;YwAjWL%3jE z)fy{acSF|hRg?flxF9joQ|T74?RU!Th9kk_aB61q&lRtD${989fcG8Mw@3ipK6jZu zxNYk^$@wB`JoBjx+AYlPHSl^!bt)O&Om{?3H3RP`IMEJ+{F0-Vh~aVgw85$(RVIBY zQWqXuz^EAy!Cdzh*o`5Gdtq?DBdo$yvM3Q(veLepBtx4FpuYZhp%>fx_5@cD_t3#I%kM&-$Lvaj(5rOSu#7yIuFqpbp-ES46$v zcPiXT6z;s&a%re`mJbIw@ki7nTno(?>HCc&C5JX_(qmDp%>|SrlTjT1Za|e=sUvKe zYoU?kKM=nony2&^S!L6GN~PeHp4>g#Qtsbn>2fcGH{xiT*Y6Dei#(0ni7$#G5wavY zMVsYbsA?geB3hg-##`c!M8|?hUhb_`y39ix2dqfJGyGYhxNdYJL1es=l}}xfB_d?e z%R|O+wIC*qWy8Y+yCi+ERO&Y8^or|Hro!xB6rvXwGnJ@}v%t)hkszC}{AN?C=5g6{ zA_G&;v{Hyl3!0jA!a_*+0OJMO#j?;e$F0n1Ec!avxrTB=815dSAS--c(77C5zYBOH z#_78}#OG1btaaMM@AjIw)u4Bs!bt`|ijHBgl0nKx;eVw%;BKGh#{5Oj=Gl8K0 z@%1%J%UmP>sEyreJBUPp2IXTQO=P!PPFscq2lGZCFjj&l$OzOR z`~bW|l5A1wt&pAY?@*n0i zAZ4P%v>-eV`B%S&z{*%FAt4;S8<>^jqDpJlT2QNrm_NjoJ7y5$2qfdu%kuM;%Hs&R z#V157MCFiQnE*lM6&qkC9w2QJEQGlTZ6&t(P9YFS7+*!*je>0yxEeT8AahS+gg6*S zU!csn=r@j`YryP^^l2`n#B5}sWDAQIP8xE3aC;PIr4@-KoPTn*4AO-Cc?F6fH&nM6 z#d_UTpPtN>`jT^9DXv5#Xg7{F6a4u2!DzNM>3M8~E6!{X!)8FK!vg59n(TeK7v4i= zIuLu-V^_vVbsFj=Wk(E%O2n;;9z;U+$*#4XZdas8NQ5K-)$gTrLRUO}m(uEh?G9GK 
z3lMjNlyDl)r`Ywb`kg3CL2UN?QTIaniGkNpc9%s7Pi|jJs6JY~z=t!po!YImJA0}U zd~22&z&t!OIjv+J&5iD6gM$IA*?H%BQx3{+x(?g-W5ZC82Q})O)7E0?a)ows-H2Rs zR5gZO3ZF*~`_XgI2C>KLp*XgD|Cfd@id^b}fDTfifQA4LC<9yWyxlMaqN^30&jWWV z(2(JY&EpP<9lJy-08(T$x6mYydBm#6UV#*dQac5nV#;_LYEuB7UR#D2nwRHoA|xeD<)9TsXCURW1P~ChV)OW|lU-gH zylJiZXrOOU<^cI0O%lkYHcL%n2v2#=`Un+-3z=Vs#dX-vMv+CVER9CNh!9i#=uA0Q zk24;4$pDFi<&kP5LgY+{7f8YvmJ0!rt7sQkjYQ}LNYT0HB&Uk_Ht17GrGmwQ@CVj? zC}(tH{-^Qk*Ho^A_>Gf%e+7{Z<^rJG2cQ@9-M%$ln9`yC8NLs5KWY$!KoN;Lp~Xj( zhBsL>-N^CC8-gsR?adU)=on2 z)}DgkxQQaq7p<1uAMVz+ovHykQAqfAbJNm;YD83!vHJ=TIEY3ItPW$n$>=tq3kx1M zlSW+GMqjeo9$@HZ;+SR<7N9ZOK%KXomsWJ(OTN|b7wMxdBOxl$ z)`7DQUD)|vI@&Mu7HEFM?Y3Pd zid6P87q_mWxZcS#!@2(&r$zm)Nap(&ZkL^C#!Wvi$SWT|wB+QqnhW2WT9CL#^k{`> zn$eGe2KB+To^(Wb`C)bNWWR)Oo67{SJ1VG#F4_$;Gy_7d>h3?UhnSl45>>V7yj(zO z&}I~fa{?MMw-VZuTX>Q@XqP5)6cA_1dBy%!*)$nVCG(6=>;jTI3cIi-+mR2NFqxn* zyn!yNkq*0zhv0i^$#g-nGFRWQ%FHohX(V=n-;3x|JInQF8$6HHjCz1%K+d&xU*`?- zXVPP@@6t`n8siVbbZ^d+mimLDV+F6ydN@TDv8DSD^~gRGVVV( zP8Vm3geFVH>0fP-8?SUaE&N5P#A3?@e#Lf_nLdq>KjRaHOd7%sDU4g1F%&`|HFJY( zcizJcWW9mJ42A)X14Z7a83MbUWk&5({)Kvfoi9A%7*a(&&pTRTI1eg+bKzSAg97da zbrFHEVVo!(S78U4cec$CS4PbJ5L)(i&oHNscPPishi-M@k#lw1aQJ~2BR9v545HY3 z`Fk_V!i9yM95|@YsgXWT>sXhi3@rsvYR3t26?9mNU*cDTyE&gyYK*m~tPXlMD1J#l zotCnqlY>jSwb&UUQ1p8_wRLAvmCFDB%PHDyTNy3*YF14Lqv~wik5PM<#ttWzAJTr> zJPgZF^K{O16I>^x1_`l5(eIM^-k*%>Y4IQp8mg76OK`4JMJaOp$kOQVa7#ta_?4G1 zCSbAPp-0Jfvgdd$lXNXN|IABHW~Aijmlg8O+P3nVl!4A`p3`sdh@q za59=<&-LeB!hFNH@ZrLjCG&j5wD1?EobRc7LY=gq6me%P%V+qsuK_$M;&pSoA)>z9 z)SBYYUUYX3b%;N|r&DK&1@7z4_T+;nnRXg6L?-fQU4mh{nGa}yoOHRYOmX z(u&u-eYa%J^(f|j|>GBlwF$qS%XWejUYE44ViXw z%>~nsQZZJ7h+$v(Qw85odcSp=X~jz!;aIplPEg0g;_Y})fzK3!fKb2&gbZ(S84+jf#l%)y}-7dB2M9oUE-}WkL?|+4J{4GYA=`E^;Xx;yCh9&1H z=vu*0>ML?X^pjZNS!TBD61G-~-cx?>|75p2m!g*l5dD{O3-3=DlmF4UH|<~1OHw%} zrHW$Df8m!JV83XVN#gd zu)*F(QLq3R#jx!Xnj$aN?PFF<36?)DqVJ`Mq`20*h3E(SL%&L)MaMK_O9kM{QaeiDWE|Aw}w~1*@dC?yQZi~k`Jw$D&WS8@J!%&cX 
ztEV){x|{2~XlNQ&VdLnfZDo?vSRSVM$m`8T0h~EV5}+c`-qxX&^mbBVOuEL<^_6(6!!?tHgZv30_j79{q0AI{hU^1ofP)-877q>H$-mY)2IK1INFn1 zgRMSezCNI%amJ3v)GsljgNWlExS99IVv2y;U^$y{q2puSsIF9MD5`-*v&BTU1J9WA zAt7kkp|MvDECNp;;j9AwaoH6_Y?l9AvpC#J%UH0gnNX)MDoQ8u`K4%X6~;JNb~C|w zGc}>=i%)VBF(PY`a>l0G>!n%pA9XpEKBo&px9jdcLjW{6+rjNa7=JIC*F3i#`%!A> zyYuDuPqu60#5-2Ay)|qz1!{>Y$NP+dXRw;|(A_Txxtg-c!B`dt$MdEZjxKf*`iyQx zS2PCJoYDSq)usX!9x7D~&Hv^3cmHqJq1-CT6h>v&=0xE4zFbV!?~OYo%%N?(a8LKx z5<<^O_0kH>{QV?8RAYhH{WVkSdkUEb5v8oanB?_`9onzJy{9NI$yd0!j&z2|F#)%4 z-XYq6ws@AvlH@MNdTD5JyjFHxy2hFWmh}Z`#XV{x%limDN?3H;?|X@JIt|*zc_on+ z-5Thxa!q;-_nUto9M|;-jODD_oXe10Wo@d%d5@;Zfm;X~a@W%;_!`A6MpCv|aay7Lxl&Hp>!`KFu|c?^3a;0JTTc zazBES>$H9{3-?Pd+HRPw#sIULnhJW~m06ps<>#6Yj4}~BMUxV$UXL5(Q4s4cTsLdq zLWv5hUAkO1R=aj9mXo5jx%bXC|A1vFLUOn$x>Mk6lZz?LZm#uB^`u`UR0$E~IFM1^ z(j+z?1NQMjW_dLVZ@GdS>jH{S=XV=lh^h9`*(nI_lPiyczT1N5Tl(KTtvF;O=ZX%0 zOyTSn5D5{SwSk>mrR%|uda)3;$LmuD#6$DBRyCYAIpZ^-1RMEz=n<9WAe&TNK<#Rj zhE74gK7XAFZCGyt{Z3w~wB{8Zm(;VfZIW57w6?#GsXaF%A<~8o#7l&c+*1E@c_pcD zX>OI>Yi{+AZ~V?2Jq;;YF5r#BAJdWW9!$2E zF&#K0tD(*Cj(FBZP`l^MPSEk}1C=!V9aX|$?Nbcw%!q2u)|mQsm#|<+dpQy0r2lX$ z8^Qs(V?cY*2;SfzdLR#(_;QckTlx!|Gi0Kt<4L$a&x?%!E2 z7v{w&z*Od<)V$d0z6f0m?ZH`wQajvB43AfKji4_eRBKN3z9}p}hp%19_g*Sd@tIed zR5Kajutt(T{4EH{^_LpD7T*VHi_K&~qZzSvACdzFu-1pV1QhXd37HT&9=t>*-T9oY|~ zOj>-Q3Ig70qCg)3Cwmk;&N_X_>b#4PPr@~XPgC7<8+GBsceVoy1xyWTzWw|GfzWVHiKeKlG-c`F3T)S#Cyz%DFW3^snVT9j`oeO62SfSrVMj)8m?|9P&4~~ZpT5j88dt< z#iBsv$r0Fso&Eo$Iqo>7zbOkmYBd&{-H|g{9r%G9X|EIH{0>vr1r>EL(#8fNkPFG2 ziA%J(t1sjfgtcS1W?rkvmM_yDXFr@;W9w_Iu&~-*YPRN7$)f(I1Y&2zLn+*zRTVm| zti$$>74ngABzYo{U}A2=bP1ki`X1n(Y)Y>tz@c(Ar@ObYnXZvq>M z7me<(%*QgkgV`}Pv=$?q(%f$t4K!f^;!Qm0zH8m~P@+D$9LA#U)Y*!f zEAkB0k}DQl|6^(cON(j4-t7*`%C@X#cr)`QptNr18dZz+tL{Ip3NdOf?B(=aG?`1$ z?s#X1*uT9N4Y(=0ubi-HBwB_o8$dWVn2t7W3DctTQk}>GZjMtaz`*(}V*0~C^^XxC z>|C41AsqbG-KH5^?8OV4oSE!HP$W|P``1YfsC3Dw3D)!DA(|mjHjKm{DCJhw!nuTS zfgH67YsX(=T!}-uV>R(EgnRBRFGDc5r<-GHiE5c)PdPAlH0n9y-Sx%^L24{i&zC-q zCL*?Qk5p(ZW7H1~SJnZ%?={R;=IIiWu(UbEtZM)}4< 
zyAU*yK14kw@-LO?)TVKU%u$Wtu3*HTt9r?PbA`28pdfD4wE$3s0Eo#%ABxD>=$ie& z(E+{`>zQcRUCPT_&0wJvM|}dOGu1ak}w_k&ho!a?8>ZzVG~dA(DecMmp;g3+}D z4Bt~+&%2acnAYjOtqT0rw|i&K^Kad5i@kuI3UXYbnKbt6)H(2DoQ{0F=@MmDG^-79 zca$4JCTQQJFlkmBJ?W$09dsZw&>=LtCM>9Aajp6>Cro)zM4KW~s)u<)7m3IC(BY9H z?J6|2TTChdS2zK$BJ6?v9uZtCPG-nk`@eSfn4)UznEsj*_*#(h$24k}0(KkXmmPf3 zT$MO`j)1|xthxtIdbj(l(wdgkW2;Ho>IvZb*s(n23oTTEh`qCB0D9ONjKRWfh7nT` z*~Oh$0bF|lvCe70TDf_anEG?JT4Ij6K|Zw9!D1#*N2*8FIghJ)<1QCQ9;yAwSwvBQ zFZ%C$;H}~d<1wK$AFM64g;4qVTM?AKmKVxHH^^J(X;Iftk1q;MSpS#S@-hFFKk2o3 zgtpO2sIP&rI2LhOU}QesvQS;al5n=f(??)@mcG+wl49wHYA$)yKO&Xbnc-2PHCp%S zkw9i!1>|xjWkCv%r;?hm6&glf^+4wTbew6|2O(=Jf|;~ zX!KnvSqbRx4EXp%lC~ElJ4YsQ8SvSn5ma$*6S?P$U2uF%yF@{j3XjM)8(I-mSDQ+)vTa+i?L=$H_N^ z<_LV(l&2;2Fz}hr#xab92@qc(B!N|sVwc2vE|7*(dj_xTU&~FA&X?UPYR`6Qn3J4w z3fi@+@XKUQws1FBGVM4@^rB5fn;fk~*U}6qMGjw^ZgTb?C#<}_h7=swj~zVzQQ6Ge z`O?DPn7iDA!FzsfMLhN@vi+h>q=%6+*fUb%l^y|UO zAN@nFB6gX7l)UWQe}mhq)i=aH+Tv7k9;FL@ZF}Ini5)U%-B+oeGq0hdZvvB@V1?jz zI04h3_XcCzfLG2q#PVHLml*X$IiaDaAPTIG?=suCUBqb#Ub|2kXWqTUwOz#KA8~}U z`utCuS)Xs6qGwaQ_O8Nreg0=0_=`Q_7O1FHcet3Z$`7O!5cWGxJX06I7MhmGNjXlDh?nb7rQ zaWvBu0x9KPB2Lto%#cP%#d6&CYT+#2Cv~%7zSuE0?%ilhY>kTu{~gfx5)rS1V`IDb zS$&kmpT6+X<*Yz4>Z0-)j_}Uv)gALnhuS_J&!mh>H&w0BYT2UldWKCJzO1ihpVEW` z$!T*UNA@yv#t5+!xDfGp4&q*}3wnV}JsvfHX0L9kP8B$Qx`c1b`{4TH{KO7KJ0~{= zj$g|!U@nNi#_;+fseH#{GKx)g18-M-DU#LSP?RCeW{4_JsCfek!vK6o2}4p&bQ}f= zL<(syd@MCUfxryruoEaOQ%$(8#^0;l93&q*Xa!X9RHn#(J{q0GD#Ly?mBN}o zsI5pzt*<{1BCp@elB(8{(T4B+{;WwFOqg1a^)@>8_G>~-F?w$&?Da6AuWlpOFR*F$ z^(J)6`dWlwOsP;0QEPi@F3+t_QK6c@j}cqB12zJCb89CJe$`%Qv@E;*0<=Uy|u@j{#F(f#3Ui;x9HVclokzA`{uwM zK$lsj!1ekZO4UxCn#W($h7_Y}+5Nae4HPM&6?Bn_7Lg-w03TP(>8G^!nth<}R=>a> zOB@jkFc!e~W=a@PubPwFXMh7)@7K^bYVko1GQ0|9L*~&D0V{HTknlS_$!iFfWCUm= z?$@kUg$x-=O3pzNDT@**O!j2Zm>ejd+u7@##~C;3hljc$`aN(+`&8|43CV}_)ufibtX0TU&%O! 
z%rWCEsgz99(2A-O*J*A32m*naxg~au?f}gHlZQVDA`blv_QJE${(!AyrX-6gCAe%~pd8 zR!o9eskS1Oir{?U6-VGeCcQ$C1$@G;0jr&d35bo9HIH1frGcDeW7N1t-l5~f{gSQ> zt1mW6kbrno#mrrkMf&4Lq9}S%Y<&DG9r7Nc***?TXWgyzXWB}H2USjj){{ba*sAo; zV15E8j8_qd%&<~wng`Q+`b)=XF5?;PofH+gT4)LQxNAHL`2DN&+f&wH9b!w?kMlnX zA_Uyu3K?(ik-}?pabI83nn;w6C|(g{g-QSo4yhJuAsu@}C>^$;p_a>|CHTr<5j41hL9@5an6_61JNz!7D}3c)t^f!Z)qZ0j#(y37m+oHZ?pd(6c_Qer6AUMI+u#w>cmjzHCWWfD-mq zU_5hnxfN(?f`1=0&15B2K<4{R*Fv?G=*ABT6xKG)f%_4VtA9TZ-srIu6}5j7ixw$$T%bnk#?`S zx~`DQt1vQDU2#7i0-zV8tLl$OnlybPXk@`b%LV8mBeaRm*+s8_iNswZIbC{u)XifF zL4J7sxk;ehAnv>3YGHZYI@{Pc=JFoqds(k;;1Py>oNt_4$Ag`Odecv0#{@LuM^o)o zHY8D>)-R`5vF27tK#E{@D`7$#fYiI6ZJYb@|HX^hUBi~Ffd3-%Z{>cGbmZ+}e~w{L z_^bTg7ZoAGY;T=xuOyr-%R;5dfFBg9H4ZbcAW2$uV*T&}MaFK(FAr#eppyny5xVW4 z&L`=uP#$GP7Sv!U`_2KgxyEa*r8M#+B_~I|V)>^uOJM6p726;VVpTr5th}2tksiUu z7p-~*%u5foBm5bt7QjAMEq8i|QYlkikJy>d*qX{kCrD_($aytr`}6uHohey}cqy%P zo+Qx(P{$vssHs=V*JhIc#CAyjW7``WUy0tzDp4XFlzHaX7hTbf?e}3f*v3s6224(h z^`mz=zL#qiipmnS0-D8O!{1QN6#^DS1^|+AVxqyF_bA{rilnG1$+~v}#Q=JL5c|7o z3BiSkx+ndW4R>OuwM-R4vKC4l93uaYC@sA$d?jERGCI+!i6&^u)|Q8+F*>Fs*T)W| z+l({gD*8zcOFL0^%8o6ozWUpZ)qr|ge@2U{pxi6_;euq}!m)e@0t}1dPC1GZn4kI* z>77l3hCiAT1|Wm2q7HSYU~QP7m6cRK473}0!#7_Z`2JM{ z2J+^>!m6;nzwcOP|&AHkxae= z$zwjcKGTQwT^IT9rM1pua{u|(-t@=q<<-6P%zMj5NC$u&lQlGZ#<{IdR9`Is5>dBj;)(Vpw&$bK+uHLq~!n30Z!|qqD4V`I`S* zL3;c(&NAMN@!2Nb7FKa&36%V2Hl4UC7xcCYR^-q zh+(fcYqs&H_Rdg|^iJXK5*Y5W72)puDzJCVha=TmQ+7AQ)+vm99=>%bw~8vh4nomm z9XL$3@Og)um325wu*@PzdFwt|R=m?P-N=6bj74^U#Q29S3AiEs(QxI|4kK~JmEe*1XmQJ5qEx0y1~B{`bs2? 
z#y2`mYc=UvEU3lK42A8NOY*u2IjYU2S1K@pvn^n=2UiY94-<@yIU0YAlYD|L(*cCN zCGr%F38Ga*aC81XgY0IE&~~gbq|{OM=a(1{7}BRts8f^ZcOUufTUb>n1Z})k*j$Jg z0;Q#gaUl=sE%b(ISOBJgQsC({Dh^0|5InL2nFbbM)yGQw$Y#{LI`58}J6(&vhN7xXjj5+SaEa?fd2kprNN6}fL5Vo!N4_}feqH0sc6T)of}?nGT< zc$~M*!#SDeilRCNTm!AF=<)m*V4^!kYO4egBjYV}|@X7uHR@#N9 zh72NUbEnbNb-*<5q)3csne>PFI8DB`Z(J=UudeJ@R8^*j=Z`Vu*X;0_z6*~)PEGVh zGb&$zcVYU0i&ohd13Bb3=83-uDPSRYkK{jWOG}O!d=g*JGVjWJt**PzymS@#8uu3$ z*z+^?3lXl14fA>;_UPgI!L#`T|0$4ickhU@S!9F*+P*uv=t~7L6&i^q;$ye44L-$F z(-F<^7N_q;j!kejM-dCD@swm^I&b;;eJ_2QTv_2fb zl~r-)KYqp+u4Q9Uzxh2jZ+r)o$I%%2N48}SjBT;XkFJ1S6e49i^0I$f!@ot_w& zd}s2?a{6B;oq!E(^a$%tD8@R+48di=jV#awby=K|i zZ|}$~aPf;nHqwh8q{yS;dJIy#BGe~bn$@0Sy#-qY1X`l>a9C!d@%J@dh{!B@X&_!& z<|58Uiv@w9;>J*;Kyi@Gi@VH-8K6(K>NEIA9o$WQKEdDkQjv!3DI`%sMJRjtOo+KD z2s^?1bmz?p(NXorP?V71grFF9NW10&0wD6vWoziKQyA6AI@pG?I_i#x&36Yyw)mI713 z^vOf*U#Dc=aD)dh4658mbSdJ`hZ3)kWE1G_$N7Db{@caB7StTM zk#xO184be?9xT|~obI~}+u3P);0Yef2p9nehA=|kU%&$> z>64TgOoF+9l&}rPfdZk&dw;anyfUIcRb>j&(J8;MyVBv#`<)MbFuS#JtIt!7-f=QE z*qxH!W58tjv>Mha_u*Q(89Rv5=a5OdWWdB2Gi-3pCuCYLvThmnO5y3ar5^u$fa92K4mZ>x$${>#_Cqj)8 zrG5%P?xs|vN-wpwR)~pck|Gw*bT?i91y~M21vsKLf;Ue`zmT$VRKRPvHiiL|*rf(+ zXe5P)k11F%r4sL@&I~g_tHzwX{>%NyZMx0^7E>uNt}okiZ9RQ&T9J9x3T5|Mcpj|n zcq;xYO6P0O_A0z!E6ZprN&KF_`JU8tRwH7b`e0*^+}7LwZn`k=GoNi8eAq5a%6{xB z)2No|1B94LxeWs?9M(%rOd3EO?qzY|EuX?h7)Sv-$zMDL0SUHEgs=X$F3r!OJip7% z-dHPfOr%T*gSL-72@0sqQ3;^pP|4^EVgi0GFN`QS6-3`K%mot(Nhq<5EFM2Ks5hV! 
z9+7j%EZSuq#(SOPvf&Yh@5I-gzN#=?*qX%;({79}-W#8f)6rKSho#V2Axeoow2hpy z7b>MioY-}pULJx76NuLOUPzF1(Nf>-!Az}Io=U6ly!KQ@XCW1R+~kA`U0s6>_*Ip; ziLoTLtZ|YatiK>Io>nVzRrnoZ5!EtYNFc+f?Xn7|Gf}D*Fmxu zBviV|`H1aHWQALP%QhtHe8DfLM_cw`AK=Z*0NhJa-gw<#_nLnV2gucTRhd@n4mq%J z+XKTsNIi$ngexd29&iwn>puie5sNYiNFtjBd@NK2gLI3*q*rwHr+=2j9oF83$;k#f zKrL7qpCa+2^zlfS*pH{kJm%uVu(vA>bRR}~r~p=m7~=|$32r2#E0Q^vucckS?SBA_ zy)efJ@RUk`0;uSvy4OYhT0>SY5>?RVQ+u)Ahg^5)<&obm+-g7F8RRz+gf`Y`<^E~}lV2DM}HvoiP z{u!dKIV}nip~N}3Ph8ey++)YruRPI$GXUiFbnnP0-quC0%fUObP_?3eRVM17nm!qq<@YAlXH4gFD;3X zOXSckXEZ@2A_^mlB#4ScwUvLdi_i561MMqN=_;i03`wA9VBZb;C;}hVFJTywk^)xo z%{K!B3$+wXDIo>V7ev-s5A#hJNjrlSc?NE22S<72O4*ZQW>v;8#hmarOB957ScCB{ zI7}LW>;>L!XR;j*b&XM>gdFKSyjS7;ei+A`ub6KT&XZU3U2)8?RoO_|U@R`xd8vnR zo}cr=msM*}%hmc57C*;fT<1`C%ZI5v$`@?(qyun(m3(rVBnNO^1}NqRw0k}an*Tbm?GozXR@6`B6# zsWOD|l)cw&lmys~8-^h)7cFYfL^$TxP1qS|pZCa^7v~rn^cB^7L2$v42QC9_r*>#n zv|r?pN6lzVM8OD*_v`H4ncFR;(FF-WcBR~XDC89(?@)hylchN+K&t7{L4sTyheYaT zWBV-Qx4u(%tn)Jt{JQ3ke?|v~P{mZ&z|KyY1vt+Lo}X0@zC|aG8&!|7k$hG^F~&_%pSU4mN$0s-M31mfsM+t`apx(BozeFid@Bo{x$S+su-SO$jHJ+ z71OYs3u@YgMmcrJZe<;l3Z|+Lfa*P0Hx0+8%$qg3kMZWF!c_G#NRXd~UpUubjr&Y& zM4^#23bDtbgs5>lJFNUw-7T?l(TK}W0BA|n?8@e&S z?wEsEgEnORLX1B)qR^;8w~gTSo2sQO;4VW1{bkf7!>a$ZZ!Fx)!4JLW$3wNj9u$AL>B5|qac9*%c*^Q?2xm;kb1Z-F(<^P8@YgZXasg^4k zs2Y(HsxJuv9#8f(9SyI)n?g@>Hp=R1WLa%XE6KHu4Y<2VL$OO_&Z<_g>tuI;9lSBq z5HtLqTw~)9`kPI<1gubX-sotE7?nOJFrRLwX8Qr>)mt!wf?8+*b7-)-yLLi(@}I6V zShFgYqmIUtyK%=uT>_dcls?ywfybx!;6}W1-Zs?NSEno+LAYu2c#iLxGE8HF5*K#a zV;)vE(xSl@_~=r##Hv6h>oxPj1`^j4S}3`9Tf7{)5J~-lq07D^mfu3#U2Bp z&CS4H0xkN?TD2%>^e>PSE!k>j~Np7VJ#@5?u0Z7zc z^;#Usu#+Z2`VtSpi=2ljJCb`1S%oIn;?-fS{O#SC#Xrygg84? 
zoCqu$xc#*gr?L1=;J>p=&l&+8&6;L@oCzZ((}Qp3U27H%{44GEm|Ej(yROyOQu1Xi zKLjCBZxue)WoMU*6lAfu`oq>0HGmN7*qn|X<>l^hrtC$g-ftuj^@_JfkotR23q+p6iy zo0>6XC1Xh8z0FpOH87h~>a>OswDoSj0M61bURQ~!e`;Ih%&0)L+IaP6>zD!AsQ_Ry}#7o=>RPjj9vvp+2pLityNp@8GXz1F(N(2%j*33pS+kX4QFW5XF3k z%VAN#5d}-#o+GEAR_id8@ZjXtRUZj|J8WL9)#!RDvQyK~-cS&8 zwdUV{^MJs0(s+qI(puw9h~R@WsZhhqYf+ep+t?~?=85kLgGLl^Vk%vDsxfMP{nM?J zG8riB1IP5E|0cza{z|vm{vJE8cubzcn)!CqjJp(B9ZRs+&!Fi$t`mEtb%l<0s`;d5 zGZq8h+53(Jy?iwE_}voPt;56pB_gU@c{hB7{LxU-?xh@P+^gnoQ%ea?`ct=|dlM1z zucTp?78kv_rNYbRloT1pdc-ULbGKL5JB zE&y%)apDm!0gDhG@@eWHs00W0$r)SzDfoU#8kK?LR z<5ZV5*vrJSSwc}amF2DnBaBD0_K!C7l#UYUV+ZR*pF)kr<#TZD_*y+NxHqb1a7)&= z`>QXY=0piHq_K_Be^X6=_x{5}{SO57KigjnDkB3MJ==eJ{HM(He|V|?MMxEPa}rf_ z`lX}db8-Er;`c|VPybuMrxWDIXTYa3)c;-oGE;vW{`Z(_{a@DnH^7yF{(laolfh^B zuNDz=2S+D-=KtRR|EIb#Dbnvo97h}h0f4asYX857=D$w=-($nT`agy-F#cbb{f{C4 z8Lj_slO%NIR1Q1jD&1fUq z@FNL{_(kd|#|seU&S@7xTPd4os+&<#DQ%3!{rCQ-(X z{bV`bWJ@jjYZ&*VkWkJ>Xd2oqJAOa-o*MjAdMF1Nr%47lrwO!+pA2-S=$(m{WVEuO@hun-Dz-G! zmnaS_HlTS69vnCqBtU2G1|Twh-P!Rb7uk>_g*-AyUmXQVJasWLKBTwcm&|O_Mle&l z;(~*Qpo5knNl}IO>=}(%M?srkl?A%pV?a`f`RBFyf{-Ks&TPKU;4Of3Vc(vP@)LkK zW$qq?U0*y4jG*vST4UkgPk)DNSp;ADu0gyzIs1`*@4xRDV-wSCu4#vfq^ynxcndAg z82|Y3^~+o_;0YrAX(sT2=BBkwr^ne&G)A18)mic}Z=T3CeINfxU7uo2N^i!TNMz5X zG@XoYRt-zu0(xTq8H79C1Yr+>%Y!k=ePcTw>;pWO^D`Z@vqY)>qD11+yAHpOh813G zLSGEj=NyBGgR^WXseDIvJcY^wgyF~oc+8)<^Pz&P zu0vdO?z;MYp{XBTq@Ye`D^A&-V{?MvsknWmT{s-OON|jYmWhW6#Km*4VyuiiBLxPg#z5{iiT4K(N=+%AkjSw)*u(B)f}YodTaCPPFae>naIN`V;8%Uu z3k{czY4h_18zo81b2B~hOAdAM(VBoz{j0y7U?h)f6vEbsT1%EkumSPMfj-qz12}RE zU-eRkJ%KdTc$6t?>tB&FlkeB$Y8wPXhq|C<64IXk5RSvV}vUz z593CXXJ$%H*S$Ea=g$aI>Wo>pjU7qH(Y$2I=EFxn8bE*Qd;W|((hGiApCm+&lsK?+ z*j(ITg2)Iznm+?n@C{;4kUG*Bu!j zu=__?_)V5aS6xfXCf_bvP}Cl+q^wMPJSEMp7=?OCh8j5L7(>+k^TtvOhAq^I?hPi^ z)zwo$d3PPggkIX#teo2wSUV=tLy>Bxpo4oYDn=u+7q80=b8K6Sjn0<%D`_5w`VVbT zc)}+;5W&X9H`{+-zc309{A#wQ)KI&k%(nu7ik}QKfM<>LaWQfuLjKUMh 
z%>gCn0f+=h#tebL8y4GSf()!V0D#Mu?@xB_Rfp~&Mhid|Z_;#yeHdGD8NBz@P=oGVd-haplcr)kXon@v<=K~NWr9F5d7u*{myoeMPoHL-Jk#d!}0xOH~~?# zdXB$NB;#GPQ#ctw39L>0TSbvQ?KN(~kNp)fsUM5*RDxgT!8Zy%z@w*Ea_bmz!+IQ) zB>0T#^iy#RZ1Lvf%j%3b>jiM5{8P;Z7Ad>`79*2`)Bi~_XGoks$M#s$$e3;mjT#lrcwiReALV2y z*(uH%K)I$*W5GraV-d*8pPd)Ig4dKJ(#c@w&^B=m=@g8*5eXbAsCrk~R9RISs2Xas zY}0z(bOXJK>czL6*QIye$KRW(X-E_dKK=ESnU zZw81YA95!uz%yY;mx87o5~WwGr&=GvSZm(5uhw8ypk$;*4&l@3i7J|l>Xb~HI0^_l zeg~czFkFIVjcoBfsEIr|1O3;5!`E(30D-{r54$~S?X|1i@Ua=d$OCO5>VDzAD-MQ^ z`_9L-U9rbb&;Xy}%jYJdp>3A$q~6ny(sK(ZwgO~U4kk0Xtw@ebUul`wp^2o<&PVh+ z?E3v(C;WD&Th>Sq4p&26F@NWl<+dwejv%<55bgjwBLAvgViJft061o#2Q~wX>Yy?~ ztCuul3@@yStV!enKEy#FjnP>_1rZ^Ug0171fJ|E1{J~}FI5-STK1!M=q3cepte*a= zZf&xX@SbQ6ex4VWO^<#GFBU=BV1M03v8>S!PQiVf8ae1ZhSv$~Tc0;*0eis2D%!LPqx!9*%!ndsIv_sY zFd(iC=mC8}Jm?`dxgtn@c}A&57QS>7pHIPgFW}xhSyFD2;1VVl_YuN_4+J16f3vE5 zZbT>>287>v9|ZJ)6L?Gg=74S{XuX49qebarpr!>l3%AG7hHDc5j;{&Bxko+9Hwl(a zIq;jtjtee*Pi>X}$e+vwi+B}8FX2IDoFqpA7tE@kM6cpKKSW+7M-EY|HODYWyova$ z+}>*x8tyY<;d3XOqvJ!Mf~VqhEkHX>i78fmcsA|`So*^wE{u3niO|lqmkk)}8ZwxV zPuNi2Zx8^IKGKS0((!yZ_M+u=kwc=|Gl)%$@ooLMgTu?7~!h8fmR5Wuz}Ab&pw)N{%*%B3rohu*CR zW1x%J zga2-gp5P|KnfW)7Q;|ngdqitgo19MZNGt)Rs;B@&6koOu$UVULkO8Xy9Yi=8L>Q}z zz)R|pA6h;LLF(obFoFg-4-y2h%zz(2w)NEY@m2;3ycu<$d2klOO#>2Z(f( zTcV<-8+5QER@ay!zccJus$YXu&eyi&1Qxzs?pn2=2hv^o$^8;0W! 
z@zvnZ>kX9Bd(Q@>o>rH40a<9IF$htZk;_R$sKfAH;$O6GFb;6~hjZEho5?0=X#wjux^G{2K2PH}SAz+G^E(9;kLC1FBiQse=Y`m{hN-|cJxDYmm1c=|s4{QU1TVN>iO{&xet zkK(c>!U(t<9z4AqgTl@`5DL92U?b0^6tx)AAD+pCW$}w*X@%a$949_RKP4KPR24}$ z&(dcJ!2~TD{c0^e1fsji>3p9XoD9gxC-__Jnenc;J%A7T&YZ2G=SENUo?Pxe!oHF< z?$rUC~cn z)A?VMCX=BWtH{GCVKYO8Jn(~kN#RZ!SX6F;f}mN|eoQEK-k@5lfGa9}EGTvzR(!ge zdX|CBiN%V=W@D5krOXDQAT{QFai_XatQY<8z?4urs3ptb=U#c+;HTV1V;0_d)8Oacc@br4E|k2wR-ytjMp#F5qVGhiI9${(yU{gqIyplOcn77ZgT&v> zqwca;mr-IMsxd-GxTv*!{CV+5 z!ker>XAmcmd2`O#Qk&+Qo z@*G_KRbAHT>`Z0BLJ;)i2Md{&lv}!G+_w$|IQj>oE*qNhYAnQ#K-*R>-)I)jplIw1 zj9?O<1QbmXxk1gM^1w$NbJ6^5c2&QBv z9HfdIDjS!H7z2%8rO>ceQBS?(J3?9S#-$`oqBy$bo3RlV2=fr-<%qS1k2F}LWiiSc zC48_^PkPm%3bN9O7!`D93)!wfz0p40c9mgSi=F%SJpnhJ!B~Fg=~(lz=4F}K;c(Y1 zIk0y;Bu%%itM+;CT-yU0D^06yzqe?3QINi@xID>qRxxtEIkuvO%x&0@fpn3e4A|ad zJ>Cp;%xY(l0{7BFm;R&PEZ!%EZk`tSvgdniGbJ1S@%z1)H4x`ch(C=X9zGPvi zxWsV_e2sBTw*DYh`)p|qx2pOSODWk}Sye$<5^md+k#KgN*d%aUB}d@G(Asp4`2qU@ zJ{kB9mw>f5ikf(yV8&lYHsC7f2(%6v>QSB&*g0*Igar$~u3-hhP3G7GZ7JgWZg zJnTl1=3z1HL@^mE-by2=;2hTiG@i2CfZF*RXhu>ai{CD+LlhhqRt|KKP%JRDHf)Nf z%C8T_4TB-p6lOtb4aB*jE&;QzFU6Shzyb{&_zN@$A*Z^s`k$ofN^5h&cE{dR!&pZ{ zHyw`V`_Tt*G4Gch5!YVvPe5U;d;L}M4ewJcXsVapgyB#0Y}-uCCy>tjzn8tuDNkLD z#_t?a$y{$+)3d@2T{~}DOTgVmwU|YQRC(VuZhX0XB^Amv7f>tZK4p}9PIW*s%3U{5 zsp}U|(FPzHrM^4Jwt3%DsI zFQzq9)tEPo2s2cWwMkfG5`0Ra5=YM;87CS(X+Dc@D&8&H=I*; zN?@;!7uLanF|uxEg0nnxmWX&nMsk{XdsiAGoRgIIeGnv-6Eu^eRq7%@G9v}=itcU# z#A|`YXYKqgc&|H*0<@3}tMsE<{*Cwi${*ZI zBlcyjv~Uj{sODV)r?aLg*i^GQS-YB0b5OlcoIeUFqKlp7z8|c}+gD;_Dk=)TJZF*y zOq#DyQ05+Er40ktz`rD+*U57Dw{E46Fkd%S+qV$tWV8f3HJnox;)#J~BZVSh=$7LA zkoM!rAv>GM%X9C~n$mj*L=fQqlTXo8I7#JoZO^wWuTMv*O~3ubPoe8JXlZr&q{&UT z`*rK_)oPn{9pIPQT6^M+x7V#t)6qDU>Cdibpe;g24}vWzq9;(zTL%k+@W)HbpsH2S zef5qWf|u|^t=mfN+1mybw(`l}f_JH`MVXhYgDvj~c30iZd`~i@h$bv8mKXXL+}~85 zOkixqDVh@%M?vh!zok*=E76kM;T?{G)(0DSV0C!deVBc0Z%%NYuPPY$`4-&p+{}*7 z9?;Nl4D_8Tac!vqM9NC5jml8ujMkEp6ud6Ibgrg+B1#`kxdui>FXi zU;`)yARdIzP6sWk(<9pTs#rhh9;NbGm?0Dbe^K86SYdeW7h#xsY|Uz!BKel$&&_O{ 
z6g_jMVtInO-JSR^$7vAJPE^V*mv$<2Ep(5o!ml`trjFmA#Vn=iJ+8|pMDpdwOu7|& zHGS3a=G#M`qTfw93U9DEF88NhtS*ph3OZX5LFwNTFqX?lN)%IyO}%IMk_^_D18T_9 z|18&f78Gi;$k7RjPTb!)7QTQbiUTI`Lb2EL+ekBxmpx2A^!<8dDb3~_?6`lJ2eUy!8-&AZbBEQm!5)9Uq0@u8}X?x>wY|LR{$FVe? zSxLGWrc4-q7TRZ6?aXG<%DF^7A~zh2I*StV#C!STR?iH21)X*sP7%?@x)cczBUM{J ztD837te8DJ8C1?{{Sf|8##X$^;NJ1?;J`mtTiAp&ah>r5n%7iCllZgtIonUHp5EGaH&h`Psa(810FWdNYD` zK?`5`)zZ+nXH}dRzNay4r0zA_#HFi1GD$Eaii4HEXJD)7a*F$d$jg7XtXt(4&B@CnSW>k0BUpJZB6prGmq5e_Iv5%^oRQV=sI=3^twNc-{z#vF&%#igvGpsECt3d&L|r7vW~h;fz1v zsAN+vJU?GI(!75?ygzqd`aI+lZ6os0cx`f%r?z;AGI-5;MYK51ZGg>I zyM&55HxE}phc9XG`g?19-n5v_7v#p%{u-xKckSW#%RdWjdWCx?QMQO~ z(FU3{Q%LQaen(0O_>geL@*4c)HJ?9|F!sp06c0lOs!W21hNzz#lkkZ5PKg2$L=6Hh z3srHoMS-ijQ($X-U5=05P3Cs2V<73Do4bTkmJ1Bk#dVX=`WjFRvUufSWAGUh$n6Ep zs_k*=F&rz>QhiA_Ls~BSs zmy-J@B(;aN&^Q921*Crj1tWz_LeWA}L(}c%)l6&H;>{bvi>Uo@YE1wA8}|_<6BEtC zc>znPLhN^P%aihqhZLmPUoe#=*f&!fnr&YeD+-B{B@#%KY7p60vM{sQDdMNmk$%jQ zYRSHAMPg8ZsT}0=25hZ{w_);D-g#VEL-y`qx&SThDeu-eMK*zm+Fi)reV@(N;ul*0 z07ym=NC*Kb1J@B#DCKp=gs4~g5Syns)}q16y_i0NmCrqfj@1KWSlC(l%eUfF*BR)+ zYkmIV)O=!N^`invkA?MC+0W6}?8e8tyzUmYM;d;~zRrl0MneioUor_s?q45~m6srZ z>P7Un=@>Efx3ao@^$K%*2+lS=uNp;-KR<+nM^8E>0_r#mr;ykCyKvr={#DnO4+nRP z&bIySlM{SLq_i>-vQ_&7nOX?a7v8PG*eEVhO0Mcn%3#Q)JW28{A;%9{*Ml zfrzRU@PnuDZ+`8onuAQdDE%j$S25Q*SpW3iDd(u#$e%SmrBSd#YOHf8krCz>wb>?Y z%lzi`fqDN9j`FBFD~oIe?kJ5iu4b9?$v#tFSH(PP2}sob(|FV~8DA$gE-fBxx5v4; zNIn~?)u({7!m_i1Q-)x)VAbH4AOioVdUj?128xJ=BuNORVdQ;edxr7p8a{PI-;zHX zCby49q2hspf$cHgw2e3TwfZiyK!V{@(E3j?QVRM`I9PDBy{aICM=#mdE;%EC79*V$ z^jEAb7xXWoC=3K9^TC}qZU+%0wL5LMJ;yoykBzET?_(jBHC~4)csuTg^`QigA7_ej zK5m!|zCWG1U7yc4uI=CHmXf7~!JC$9|I)-6WNdx|ZPtL9G~RX9;9OOaZZ~QJgYSR+ z7}A*s>1Q-}I}*{Uhx-MAkMjW&ikA3K;jlUWl+^tGa3pn!{X()Z)=d zm8H{HI6g8PYfOa~PFMYPTr}AG%yW>I|lfzrz+(?7|6UV+c z#cT!$MX`Y|1)K?f%JJ_oL0pihp0#{@PfoK$8W#a4nF^c7xS^k&&a_~T8^KH@Yjj}r z{0}E)Bsa4w^YGiB;Wsgi6C&gjB7KkOA$f}F#7;Md%YCn)G)NFQqBJfz)lvV!ZKfrG zLkD0JKI`8wUZuB3*|QgEynyppGBM z!@mA~_CXSCl$y=s5z7_P7V+>kMoo*YiUKeR`1lfB{7{{DP~r`dX9VY#AMy1VI>{h3 
zfIqQlGalE01&y=Nf%=J{zsPtlFp9t1{@9QfCXrSUdq%m(^0VvG40Olq#?Np6>F`=K*+b+x0rkjJEeti2L0FQ#IOTZrd5` z>fJOj{PtVld7)&F+tavdS_$uC@9T$d&!=Rn4=%W0B}sd;9cKUHiGN72B|E%@X?hZI zZR*RY*fG3t;=EL6tgHsLJ@Cz(Z1OPIvAfk2w`YhLWplwLq-S+%V)*l z#y?$wP+br6xPKfMNM=08+T4Lmx25g~vY(jeo*7r~IiE8#uK!$JP{)uNkEd95C|EkJX=2v5Y)9LrDS&CQ)33^dn0_DP+!j@;7R$`r16(VaP^ zq*GS1ju&mw*n>p`0Iq>gdPsVi{((=)FMWW0Yvx`KIC`g7PG|g!nvJrbEFX)yje;>3 zYf=j=o=M(~maDN!T))0-no7W+VOm6LT2oRtonnL<;yG;)+h<*`@Qws{H;{v8N4K29 zJk%||H@oQS)7hN3EZuIguA+=&vsLLL&HhKA{IA8Hg-tWEjLLVvePEZJS>Kn{JC`or zS;dW6oA`r@5SN7wtZiJ~A4gf`#P9(C?AXa#Yb#o{`-VO$TNvA$^(-( zl>?tEio%)IT3@rYj*1c*A~W?6$&HlqaxJBVfp|A+l(#oI-(bkCn|W1Y zAPbA~VWeox5ZV%vS(}Og7sbJ*%<*%$J0GFM7xPF&>!<`e*)O7Dr?30>r6(zGYh>@K zaGjRCf*(z&h#!uA4^*Ws2CSa3fO|9M7P@nbMP}|9rqVi~K)CxU);kr^d&{T#zt1seHu9{zNA zuiVsrZ&@;7p5(AP-4?f}=CG8QyzA}}vF)%ueE*Z6*^E)|Z9mUvt7Lp)swHLrY(AzZI*Efh_fV1xvuNw^;ONe_J2gT^(PRfq&0auWDiu<{^r zT^RXW$|IN{&d{nR`cmImNF5FR&X@*)8CyvQa_%FvU$hd$vgb@Uvx6aBTsOq}wQk@O zqc%k*)N@Z)*X_G!N3#|A9et znI&pQBspDu*~X`0X*^$rG_7OExEKJTyn++HgqVt(Mwy-mPuAcDxo~FR6oK7BhBq|# zxdFNT&Q%H4r2YHH6u0Ml_ycXKV>_If8%FICkg+b;7NgU|JdJ28VZ6UHh=3B5shJLQ zfxbuhdzSIi1m4F$Y^L{~5)%0YKMT#q71)*?Wy;?UAfd@C$^|s-Zi(%1$UZPqO7@JC zq+M?h2kw_Ruav+kaEdV}7pFG7619HGEj^Ecc0hbV=z^fp8Un5JqznR*_+U2!NLZJ2rI+=Mh0t@No0YuZYF)K&lvw@S|1K#^!$Kw92@%a}t*pLp` zi1ph`1L5e%OH5HrV$1&C!=QX22$52Q7ByRh_XL6%q}2S@Fg@ zMAW4?Phg;woVWj$^8;>#z1>2SeXCSeJi6i9wQM~Bi=vN6nLp(8Ju@^%tB>oV1h71) z3m}Fj1BU~LV@~n)>6p>b@rJ2jUr-<-T;)^_enzJJ0TWQVo;$QH|A~yjZU`-~Q8<>^ zN@Vl+ZqN8d2-mVdlJW`K%49H?L&@wE?OR3ESft`aRuVrr23=aiKYIN(M(4yKlx2<@(+nuJIbh;O30&L@Y&n^?V%W<{h{5P|A;sS&vPFE$O*KWL~V)Vsy1`i(NSE6s2Mw=%H?Sv))ik&6g1EqhoAD;}Ad= zq7=}dM+%723N>~~paaV$L^3Mj&udBLlqSSkq;rvI<#fotF)A{+R5mz<=^lsLP z24g7ehBMy{;3%(KZOk--$+Q-XySH4e=usJbM(Gc2<){c`sYWUv9b|;l&2+TiOtJ%@x3#=lupJecBPK^7%5( zI5xPZY)M#YO9^sJC>l6Fgz!fqbOdrFgaU>l;uYdG^2y`2;lWiErXi){k4z`oL^*e` zZhj8xP+ljsRbHY$9`O~7(($aZPqzndaW8Mqwa#B>tA7zk?ANIZmh7WPrsop;hun<8ID+=1cWyWEJMY z*8BY~Gi4LhHpa>qMTP%OJ#5*^i@4rtjZW9|q4-8_v->w1K^G=HJzJ&@C&;J_jO$t1 
z)WKUTrce99G$CWP4bDDen?QYA0~JSi`#>@>``wSTUvKyFy2;nFrzeJ;ts&}| zZaeXd8}ZL@iAU#k0jBukoQ-^86_s6_MVglW1w}SsPsk?87_>w4isX>_X0{9XWWHh1 zarF;!@K6~Iaq)_uFPfNZTcPpFo+i{7*6!mb=^X!z_Ke*e@FIQ>`%;AbDrlBTBi(f3 zh!OC_3epf>^YCw>CrsEesE`||_kaz+8nEfY(l~+BuL47e?5Sa{S0WC{lS;h2M)#kve9e)$VbGWgq1;!!XO;-v`$7R!-g$6^47YZ1+T-8?EcaI*co9QY}LNGKnx)BDlqST> zzPB2Fyabf|6T4CAr@(K@vNq;PvDf0sPq+EKUf0NBmX7i7Zj}GS1 zkR*}hBJdm>wI@cJe}Jh@XA2*-`?sCc{LB#bJ9Hy^VZ(8t8*}9F^^nQHi-u<^1tL;m z3C*AqvYTIq$}%h{!cahv2U_Y@D=<<;`oc?jUrE*-l^d8&1#Ode#z{n zBown!ka+`&-AJe>lsqpV;)6dsSc#Hnd-z~Rfd(KIirsI#;?>U9Km5I`4?GWwJv#Uy z{b4P6@+b|9CCc=E3@J@9XrV%;Do2C2S|4_Sgu0BQ@~*{CN0c(98w3k&^p$TJ9MdZ= zro!cViZ<@fzY2~S9&Vi(h{a;fLurQDh!p+GS&@m1bw(EL{MaOgLb9p&y~z zxib1tlt;cd(%h&K2_loDBM{e8@*U+NDg%B&Xp->OiPu@P$8YdML z&A?!OL@T+C#vd`gj+JyN&L4wzQvNFWeq|AzA{mki+1;UNdSS|y!h<9?-s{2dBhzlg z=-gR)&!SEy2Oc!X7#@!4sreOqrWnz1AC@4b^MIZQ9?Z}TAU)%eRBCX|jz9}`g&+61 zUpj_^G#M5oCRY-75`ipPBxY3ZmW(|)?GHGWRM5=tpYC(FE(x@7mt4Rt$UU|o9P3cO zaKD=I^H|EUG!F_ceYx#pJPh2|QS5nGWM+a{$q=*xix49*1-{O~?&ewE31nf?c@z zmj1w!LT$>t#Qd6K@wg*HY)&lY7|NSuCOcYA80UiVX$^Fv+0oZ=%UE_My?ZxpzF`4P z*Xiqb=DM2NYR4IMlC%CO=7Z>ExqV(~`}e+3H}(4t|1;73%yovt=V%d$t|BrkHS={f zIQ45kV7OPeaL0ZnqtUdvnZ}ySG;#W>Fk*D4gz^8P>m9%&ZMKE&*v`bZZQGM% zV%xUuWMbR4?TKyMp4j>`?>_tN+5dIE>*?-PRj*oAwF=$or&ir}3A+nQeZ4*_@dw6g zN$H;!d#7hIqw{R&DqVt{zw+|j>XB0d@oxyp>Ilb%@~+8uQ83p08T&$h_1fZ(UqO<} z16$$P~acMq*oL_fUM?d~~7|Aw_yW_IB#2(xbjtwB#At@cY#3W|w=|79Bv{AE{N-=|9 zVP2zZr&*1;LgEiZBE%l}StXHC4H>WmKlr5R)lHpA0&h`s+OIaDl zH%7t#1V-$C93A)eeK4qco>#U+ebB@q31R_f5X-CL&r?Dj)4magkIA{5fkjk0D|IS# zWB484ClJRLd2!lli7LF~{P=7a7x8t2EZ-6TyC4>LMhcGQ6sT?|Iz%}=g0d^VhJ+}f z)efQ=>JAh)go$tFKNNT9G(AYu$JcM*|c!I~6u)lW9Gc?;Zt>D}h zkua~hAx#Kj4IdB@)801~360bzKX!Fo9lWh?ct7j=4g_Ck~!xoWAK;;P0^x<*Yd2 zSS!Gc^6XZWS_B8XCWp{>jFds z6G+L?3)hPkkk)W3qQgCjDkt#W zunq|A_F4RT^97D>uLTQLB^>}{m=J7&=V?{qKQJZDPj)!-azVZAK5Pl;8nk=1*=@o5 zrX?yRe*&;~D<(@bo_m5n&&^~X{U!9a*;)pUJb;Cm1V9TKg_`9bgLx*V&m*=coD^`t-v8)(| zmB?#D4hZw=ZmH9a#s`HpnU-iHU9S#D;-G7M+xp#R30oj&P|>7R{B8Pd^2__F3T?(LcGMlA 
z9nx*fUEA-7u6*x!p4F~O-k55%mZ?8c-z2@UItYA#K1lHZ!4GDh_J-vUlCT7nv&g4_ zmZ$fzxz;E3^erF1NrmmR+!Ua|5~v5%De-qA8qyAPtOEFrL-B;O%P5K8oo8ZaVZE(p zu4MvGFz1PKn<5)o9-0tykV`P?A~X%wn)NaBN7Zl*YyJovy}g+M8O$~RhAMz z4G61*Yfclw4#uX4#8xf$d=vBk0erozYgdCF4R=n}CgJv6cW-qd=Kd09d`M!eLT)%v zWlv2~8|kpLdhzx1u5}9^_?wy8%G9G#S$?Z*7M<3pF}O|JaUGLNt_II4lrD)G%i%Sm zD_miv;tzk(3XVl22mLVvK1ICQtl9pS#S*KZ_e0C*da?`Grw^2J28u)!eF&qVz|;Xp zFFW`!aG;tuYuXU(=e8qRQ`dx9s0H5$F;J)Wd8TP_Fpd(V_Tv^V&Hy4f8Uhz*LgR_V zN19@1Eoq!QA|8ex9Oz20?)=;EUX>*Arm@Wgm)9tG_)sG|0bIUZs3Au2)5etcSK}8! zR5^Ft9V4?tWsLDP=zR#xPh>|k^Jt2yjvxqvDY~Lc zJPDbi+jN5QO%zmui#o2(nY8N(<+F-r(oUYMdUx63d1OE3d;sTXpB=phslbfb{(5Fd zjuBK}$A%}Pve?%8xonv|;0&oX%Kh&BvA|nxS8wC(wk-K%Ig?JOqOXWe^t%(RJOLNFluX8AlaN@ChgK7GWaK8vf+CX zk4Ut6xb{?5hh7DWOXb3Z)^F5Hkk!HBzT5cw)0wgk!9$JG(elPv?f{*AS1IYuf>T3s zcngc0xcECJKyhF^{;S2at$15SPOm_WodFsX2sv=nKr6&w5Tru5%wvGs zF$i0|j5skSM)Nb@UwjdsUqYXTfnC$!P}{-9%h4Tr##l*H8WRmf_{5i5*Decvb*3&i zJJU~AJ716E+g%j6k7_%caQNm9dEEfV+88;{jIdQo0p_{kOHH(&8*MaO&p;`qz8ec8 zYN{2bbzHU|jukq|i=^fJ?&}0Kc!F!+g|mBto4uXD+k|NS4Hx}57PwqjkW2~n;kacc z_TrX!Swp4iwtI#1>k6k%XI;4S=D=A`*=hjimFn%m2s}RVfl9>unryC9W2@4XTt4{S z5hA@nTb?XWqkq5yY~rl5Ocaawe5gR0F=npJBez?{WzF@DYXYPW+nrhi285DM*Um^c zVL=gdUy-I_Zlr?5gQUqinWvm1d98EJ)1mp9b0Xr3yIv7j9uHLHvj#Cx`kWji(>&8^ z!-!$OPcGiPdJEIAS|{+!XQs?zb;2+a>$`1}%2IQE2PNoJ;J11X)S3=LiZ;`9{2q{? zf|MU+ckVHWY5f_SaD&kC0|s}c;}RW2;|Rr6WawmS#j7S*Cd*Pm(EZ`(8xYaR($e zzWyo0V5z8Po&iV&3jowsuBgbUpER6`K~o*8u&7+4S&37t=!`kAMKg9S+rTxSXbqUC z>Kk4c^VSN%`jBE52;QrS#J7#34p+q;D>Bj>vvh|CKX=@krkEsLnV#!z0AZj}ZOYsW z#2

)IN$R0%ws%qv;zS(MR~?+8%8~;r~&iHdCXhmETv-ute&m8dhh71vc^yhs($! zjzq0u0W)|=hQq-l6`VZ6HZ1;I$N|SJ36!qhx+c^@w<ib1*0O0rxgVk3R_(m z_F_3AVBiZiyd3!n5kM!(hu*t9QhlLJjO=VYbFqq%NVz(UGbs7%9I}{<4k&)L)gZJ)*l?z^U4=wB%B-HWr24(^!t1qS~GU7b#`qDe}m*o8de;|3Ksv?7}yV+4x zG3QECah8wkTkRTalDi8p%c-oY-}?^lnalBPFSzzfBr1*@mR9@yak-)x=8J_HgNMh- zrK)!1&3V5}90!p8Ii!N=YG^i#{{$;xcxcghL^Y&Ry=)??Pe{Me%rt%ypVso?H{U4h z;#gnGx5>VP3VRVYkkVc5ukFTt_jq}Ddq-;BNrHDIphwB4yKAmGZMA(7S^Ku{>tnSv z+1S6iP-usA-fHsbraQ(-sA0YMXW#XnOh7iEc?37;wb)dv!CN>qLzgAA`ce(7C+zE2 z^tFk#scD#Pj22V753~BEEzv@KZIUGXlp6UYdQ0EUn7i^z8|GX zl%S)!DfQfGJ?I292&*{I_xYqY3m5P_W@d{yRaji3xI{?v7Sn5{`d60GwC ze`e2oVh(U7jdSfc>4kF$@If@H83n?}AQIoX#s(G7v$%7o)xK!|8&^Sy1G4=4YDI78 zFF4DzFQ<2Mxn2|^tLEwtV%Bg^Jf=yx z3Ab%*k?9{oe`nrjKAaM{vc97i49%#WUAKg5w1sYU(&TgT-D}}>w1}>=4M@jc?yO$U zB&sesahKk!^(#5!xTlZ9F9>(()$3RTf~mND>!yi=9g%^Fi%^QL+-t|W-at0>hr)-X z)M7qYuW%%Sc}3;U_9uTq0nd`{mA(`^g=evzow-x%@Q?59dkmp&$Z=Sv=6B$EiQ)(WRpZ%*ud4=k#sXa;7NM{f~ zE23<-%TQ8^lW!(Qx3R%tXTU7#P=i|p+2oYLi^tAu&l zzJY7R+*wituGGdRyQ?`Jr@#hBx)&*Zff&~A6LBDs@P^O1w~AjLzaXEZ+#C+iASJo z^=IJ^W_n3U+BC`lct+Ch7?@qzy4ZGx@%EOzz77GK5{a(p zG@jI2g8ki(*G_*2h;{gK?Er_Uc}1=qCSrz9lLo?knnuJ5jIBgw5o}&y=%z@XXsD1M zw@zLuqma)Knka#pB>}Cr`VO;|WkpIc*PcE?S947c*{JKxaaH$Rm&xXGpgt0g;0;|I z)QH@u!Tcb6Ms1XEv$?n3dl@K4 z4`s0Zq^$&7JA-pmWEwA+;^ojM8|j$N)>&CTYehFNF^J$q-a*UzU^ zrdXABVLEe zEspiaTTd@w1O+Z3_8o8_a3*j-gI>Qchw3`py7;XxSfgYfG~?7Ia>Y!24Yo4333Z1FxeD-@O(6+!nwpf{z5j zNHG;>_>(3FoBrJuwjCbocYG8WyQ9)8=>!yc_*rhrRDNJ-h-o^tl(6rOO( zH+tjcp~yCaJpwTBZknT6@N*M8i^#xgrM4Fej7;gGTDr@pL|lbp6+GJ@^eX8hdpzUv z+!VH}M0#C6w4nX914)&?c8h-W_`q6lHp|j=A`}VX@1Ye|iC|y1C|)Zwht#W{hJ!JW zge}qv!j|#A_H!$_pBq$Nm-ecIzh>-VwO74?!x9~=4~3*h%snq(R!u#H@lnHc6dLUw z&f8;3+K1-ib;jy7+rz|mx1!y+%Jpr=qRKlhGoQ{Zb;f?Z_|xg**KGHP>ub-wEoTQR z?|nEd9gbJ=lcUO3G~RLxV)ZOe;Q+bhMR0h@MXW(8wlL|?k*_Wj1mv5n8eBNYrqTUA z7jNj?x!ScDn#TbxP5=%#*ZQU1Rlw7-%+9e17^m5sFCWvvh_+yY67rYFuAocLW!Js3 zS*`6$r%sm>-F+@g0&1T&XHt;03s}+^WXDJjwlC|H-urPebv!|)C$C*eA1F7~yDhI= 
Figure (SVG): `openvaf` dependency graph.

Clusters: Entry points; Compilation library; Frontend (parsing → HIR); MIR middle-end; Backend (codegen → OSDI); Utility / data-structure crates; Consumer crates; Legend.

Legend: solid arrow = runtime dependency; dashed arrow = dev-dep / generated / macro use.

Nodes: openvaf-driver (CLI binary); openvaf (lib); basedb (Salsa root DB); sim_back (model extraction); osdi (OSDI ABI codegen); mir_llvm (LLVM codegen); linker; target (target triple); base_n (base-36 encode); tokens; lexer; parser; syntax (CST / AST); vfs; paths (AbsPathBuf); preprocessor; hir_def (item tree, bodies); stdx (macros, iters); arena (typed arena); hir_ty (type inference); hir (query facade); mir (IR types); list_pool (ValueListPool); bforest (B+-tree); mir_build (SSA builder); bitset (BitSet / Sparse); hir_lower (HIR → MIR); mir_opt (optimisations); workqueue (WorkQueue/Stack); mir_autodiff (AD pass); mir_reader (text parser); mir_interpret (test interpreter); typed_indexmap (TiMap/TiSet); linker_target; mini_harness (test runner); sourcegen (codegen tests); melange-core (circuit simulator); verilogae (model eval lib); verilogae_ffi (C FFI wrapper); verilogae_py (Python extension); xtask (build scripts).

Edges (source -> target: edge label):

- openvaf_driver -> openvaf: compile(), expand()
- openvaf -> basedb: CompilationDB::new_fs()
- openvaf -> sim_back: collect_modules()
- openvaf -> osdi: osdi::compile()
- openvaf -> mir_llvm: LLVMBackend::new()
- openvaf -> linker: link()
- openvaf -> target: host_triple(), get_target_names()
- openvaf -> base_n: encode() [cache hash]
- tokens -> lexer: Token types
- lexer -> parser: token stream
- syntax -> basedb: parse(), ast_id_map()
- parser -> syntax: Parse<SourceFile>
- vfs -> paths: AbsPathBuf reexport
- preprocessor -> basedb: VfsStorage, SourceMap
- basedb -> syntax: parse()
- basedb -> vfs: file_text(), set_file_text()
- basedb -> preprocessor: preprocess()
- hir_def -> syntax: AstIdMap, AstPtr
- hir_def -> basedb: db.parse(), db.file_text(), Salsa queries
- hir_def -> stdx: impl_idx_from!, impl_intern_key!
- hir_def -> arena: Idx<T> storage
- hir_ty -> basedb: Salsa queries
- hir_ty -> hir_def: item_tree(), def_map(), bodies()
- hir -> basedb: CompilationDB reexport
- hir -> hir_def: item_tree(), def_map()
- hir -> hir_ty: infer(), alias_resolve()
- mir -> arena: PrimaryMap<Inst,…>, PrimaryMap<Block,…>
- mir -> list_pool: ValueList, ValueListPool
- mir -> bforest: PhiNode.blocks, Map<Block,u32>
- mir_build -> mir: Function, DataFlowGraph, Layout
- mir_build -> bitset: BitSet<Block> [block reachability]
- hir_lower -> hir: MirBuilder::new(db, module), CallBackKind / PlaceKind
- hir_lower -> mir: Function, dfg.make_inst(), layout.append_inst()
- hir_lower -> mir_build: SSABuilder::def_var(), SSABuilder::use_var()
- mir_opt -> mir: dead_code_elimination(), simplify_cfg(), SCCP / inst_combine()
- mir_opt -> bitset: BitSet / HybridBitSet, SparseBitMatrix [liveness sets]
- mir_opt -> workqueue: WorkQueue<Inst> [DCE worklist]
- mir_autodiff -> mir: auto_diff(), DominatorTree::compute()
- mir_autodiff -> mir_opt: dead_code_elimination(), simplify_cfg()
- mir_autodiff -> mir_reader: [dev-dep] parse test fixtures
- mir_autodiff -> mir_interpret: [dev-dep] numerical verify
- mir_autodiff -> workqueue: WorkQueue<Inst> [live-derivatives fixpoint]
- mir_reader -> mir: parse_function(), parse_functions()
- mir_interpret -> mir: Interpreter::run(), eval() dispatch
- sim_back -> hir: Module, Parameter, Variable queries
- sim_back -> hir_lower: MirBuilder::build(), HirInterner
- sim_back -> mir_opt: dead_code_elimination(), aggressive_dce(), SCCP / simplify_cfg()
- sim_back -> mir_autodiff: auto_diff(), unknowns()
- sim_back -> typed_indexmap: TiMap<…> [signal/param tables]
- osdi -> sim_back: collect_modules(), Module list
- osdi -> mir_llvm: LLVMBackend::new(), CodegenCx::compile_module()
- osdi -> linker: link()
- osdi -> target: is_like_windows, symbol naming
- osdi -> base_n: encode() [UUID / symbol names]
- mir_llvm -> mir: Function traversal, Opcode dispatch
- mir_llvm -> target: target_machine, data_layout, features
- linker -> target: linker flavor, pre/post link args
- linker -> linker_target: LinkerFlavor, LldFlavor
- target -> linker_target: LinkerFlavor, abi
- stdx -> hir_def: impl_idx_from!, impl_intern_key!
- stdx -> mir: impl_idx_from!
- stdx -> sim_back: impl_idx_from!, zip()
- workqueue -> bitset: BitSet<T> [membership oracle]
- sourcegen -> syntax: ensure_file_contents(), SyntaxKind gen
- sourcegen -> hir: ensure_file_contents(), builtin functions gen
- sourcegen -> mir: ensure_file_contents(), opcode / InstBuilder gen
- sourcegen -> osdi: ensure_file_contents(), C header bindings gen
- melange_core -> openvaf: compile() [batch mode]
- melange_core -> typed_indexmap: TiMap / TiSet [Circuit internals]
- verilogae -> paths: AbsPathBuf [include dirs]
- verilogae -> hir: CompilationDB::new(), Module / Parameter queries
- verilogae -> hir_lower: MirBuilder::build(), HirInterner
- verilogae -> mir_opt: dead_code_elimination(), aggressive_dce(), simplify_cfg()
- verilogae -> mir_autodiff: auto_diff()
- verilogae -> mir_llvm: LLVMBackend::new(), gen_func_obj()
- verilogae -> linker: link()
- verilogae -> target: Target::host_target(), Target::search()
- verilogae -> bitset: BitSet [output values]
- verilogae -> typed_indexmap: TiMap [model info]
- verilogae -> base_n: encode() [function prefix]
- verilogae_ffi -> verilogae: verilogae_load(), verilogae_export_vfs(), verilogae_call_fun_parallel() [static feature: direct Rust] [dynamic: extern C ABI]
- verilogae_py -> verilogae_ffi: verilogae_load(), verilogae_fun_ptr(), verilogae_init_modelcard(), verilogae_real_params() …
- xtask -> base_n: encode() [cache hash]