Skip to content

ts: two divergent canonicalize implementations (@simlin/core incomplete vs @simlin/engine Rust-faithful) #624

@bpowers

Description

@bpowers

Problem

The TypeScript codebase now has two separate name-canonicalization implementations with different, divergent rules:

File Export Status
src/core/canonicalize.ts canonicalize Incomplete — the long-standing shared helper
src/engine/src/internal/canonicalize.ts canonicalizeIdent Complete, Rust-faithful — added in commit 6e5cdaec during the @simlin/engine wasm-sim work

Both are meant to reproduce the canonicalization performed by Rust simlin-engine/src/common.rs canonicalize, but only the engine-local copy actually does so. The @simlin/core version diverges in correctness-relevant ways:

  • No unquoted-dot -> module separator. Rust rewrites an unquoted . (e.g. model.variable) to U+00B7 MIDDLE DOT (·). src/core/canonicalize.ts leaves the . untouched.
  • No quoted-inner-dot -> sentinel. Rust maps a . inside a quoted identifier to U+2024 ONE DOT LEADER (LITERAL_PERIOD_SENTINEL) to stay idempotent (see simlin-engine engine: C-LEARN full simulation blocked by non-macro issues (circular dep, dimension mismatches, unknown dep) -- the macro epic's deferred 'tracked separately' follow-up #559). src/core/canonicalize.ts has no such handling.
  • Whole-string quote check, not quote-aware per-part. src/core/canonicalize.ts only checks whether the entire trimmed string starts and ends with ", and lowercases the whole string before stripping the outer quotes. The Rust rules (and the engine copy) split into quote-aware parts via an IdentifierPartIterator and substitute per part.

The engine copy (canonicalizeIdent) faithfully ports all of the above (quote-aware part splitting, both separator substitutions, the \\ -> \ unescape, the whitespace-run + literal \n/\r escape collapse to a single _, and lowercasing) and was differential-checked byte-for-byte against the Rust oracle across a broad input corpus (src/engine/tests/canonicalize.test.ts).

Why it matters

  • Silent divergence risk (maintainability/correctness). Two copies of the same conceptual function will drift over time. A future fix to one (e.g. a new Rust canonicalization rule) can easily miss the other, and because the inputs that exercise the differences are dotted/quoted identifiers — relatively rare in casual testing — a divergence can go unnoticed.
  • Single source of truth. Canonicalization is the contract by which a raw caller name resolves to the same key the engine/WASM WasmLayout (and the VM's get_var_names) use. Having one authoritative TS implementation that mirrors common.rs is the correct end state.

Components affected

  • src/core/canonicalize.ts (@simlin/core, the incomplete shared helper)
  • src/engine/src/internal/canonicalize.ts (@simlin/engine, the complete copy)
  • Every existing consumer of @simlin/core's canonicalize (must be audited for behavior change before unification — see below)

Why two copies exist today (intentional, not an accident)

The split was a deliberate decision during the wasm-sim feature, documented in the engine module's rustdoc and in commit 6e5cdaec. src/core/canonicalize.ts is shared by other consumers whose behavior must not shift mid-feature, so it was left unchanged rather than upgraded in place. The engine needed fully-correct, Rust-faithful canonicalization for name resolution, so it got its own correct copy. This issue tracks paying down that intentional debt later.

Possible approaches for resolution

  1. Upgrade src/core/canonicalize.ts to the complete Rust-faithful rules (effectively move/promote the canonicalizeIdent logic into @simlin/core), then have @simlin/engine consume it and delete the engine-local copy. The differential test corpus in src/engine/tests/canonicalize.test.ts should move/extend to cover the unified implementation.
  2. Audit every existing @simlin/core canonicalize caller before flipping, since the unified rules change output for dotted and quoted identifiers (unquoted . -> ·, quoted inner . -> U+2024, per-part quote handling). Any consumer relying on the current incomplete behavior must be updated or confirmed unaffected.
  3. Keep one authoritative copy mirroring simlin-engine/src/common.rs, ideally with a guard/differential test against the Rust oracle so future Rust-side changes force a TS update.

Context

Identified during Phase 2 of the @simlin/engine wasm simulation backend work (branch engine-wasm-sim). The engine-local copy was added in commit 6e5cdaec. Both src/engine/src/internal/canonicalize.ts (its canonicalizeIdent rustdoc) and the commit message explicitly note that the two implementations "should later be unified into one Rust-faithful implementation" — this issue is that follow-up.

Metadata

Metadata

Assignees

No one assigned

    Labels

    engineIssues with the rust-based simulation enginehygieneToil, but its useful to get get too behind on itjavascriptPull requests that update Javascript code

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions