Skip to content

Derive the unit lexer from registered symbols#29

Merged
nertzy merged 1 commit into
minad:masterfrom
nertzy:dynamic-lexer
Jun 12, 2026
Merged

Derive the unit lexer from registered symbols#29
nertzy merged 1 commit into
minad:masterfrom
nertzy:dynamic-lexer

Conversation

@nertzy

@nertzy nertzy commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

The tokenizer used a frozen character class (/[a-zA-Z_°'"][\w°'"]*/) to recognize unit symbols. Any registered symbol built from characters outside that class was silently dropped to a dimensionless value rather than rejected -- so Unit(1, "Ω") and Unit(1, "%") returned a unitless 1, and compatible_with? on them raised. The same latent bug affected , π, the arcminute/arcsecond quotes, a.u., Å, and .

Build the tokenizer and symbol matcher from the symbols actually registered in the system, plus an ASCII baseline, and memoize them. #load clears the memo, so a unit defined at runtime -- including one whose symbol uses a non-ASCII glyph -- becomes lexable without touching the lexer. A symbol the system has never registered still parses to a bare unit token, preserving prior behavior for unknown ASCII symbols.

Add a spec that round-trips every symbol in the default system, which is what surfaced the dropped glyphs above.

The tokenizer used a frozen character class (/[a-zA-Z_°'"][\w°'"]*/)
to recognize unit symbols. Any registered symbol built from characters
outside that class was silently dropped to a dimensionless value rather
than rejected -- so Unit(1, "Ω") and Unit(1, "%") returned a unitless
1, and compatible_with? on them raised. The same latent bug affected
℃, π, the arcminute/arcsecond quotes, a.u., Å, and ℉.

Build the tokenizer and symbol matcher from the symbols actually
registered in the system, plus an ASCII baseline, and memoize them.
#load clears the memo, so a unit defined at runtime -- including one
whose symbol uses a non-ASCII glyph -- becomes lexable without touching
the lexer. A symbol the system has never registered still parses to a
bare unit token, preserving prior behavior for unknown ASCII symbols.

Add a spec that round-trips every symbol in the default system, which
is what surfaced the dropped glyphs above.
@nertzy nertzy merged commit da76701 into minad:master Jun 12, 2026
4 checks passed
@nertzy nertzy deleted the dynamic-lexer branch June 15, 2026 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant