Skip to content

Re-key reverse symbol map by (nucleus, type)#201

Open
kostub wants to merge 12 commits into
masterfrom
worktree-new-symbols
Open

Re-key reverse symbol map by (nucleus, type)#201
kostub wants to merge 12 commits into
masterfrom
worktree-new-symbols

Conversation

@kostub
Copy link
Copy Markdown
Owner

@kostub kostub commented May 14, 2026

Summary

  • Re-keys +textToLatexSymbolNames from nucleus → command to nucleus → (type → command)
  • Adds a Bin/Un retry in +latexSymbolNameForAtom: so MTMathList.finalized's leading-Bin → Un reclassification doesn't break round-trip
  • Updates +addLatexSymbol:value: to write to the nested dict
  • Adds testReverseMapTypeKeyedRoundTrip covering \\pm, \\cdot, \\leq, \\to, \\alpha and confirms behavior with and without finalize-induced Bin → Un transitions

No new symbols and no public API changes. De-risks PRs that will add type-distinct entries at shared nuclei (triangle family, future \\bigtriangleup).

Test plan

  • Existing iosMath tests still pass
  • SPM swift test still passes
  • New testReverseMapTypeKeyedRoundTrip covers Bin atoms at start-of-list and mid-list

🤖 Generated with Claude Code

kostub and others added 6 commits May 14, 2026 09:02
textToLatexSymbolNames is now a two-level dict keyed by nucleus and then
boxed MTMathAtomType. latexSymbolNameForAtom: retries the (nucleus, Bin)
cell when (nucleus, Un) misses, since MTMathList.finalized reclassifies
leading Bin atoms to Un. This unblocks future PRs that register
type-distinct symbols at the same nucleus.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
f' parses as f^{\prime}, y'' as y^{\prime\prime}, etc. Mirrors the
existing ^ branch in -buildInternal:stopChar:: allocates an empty Ord
when there is no host atom or the host already has a superscript,
greedily collects consecutive primes, and merges with a trailing ^
(TeX \futurelet behavior). +atomForCharacter:'s rejection of '
is preserved so +mathListForCharacters: (used by
+fractionWithNumeratorStr:) keeps its no-LaTeX semantics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Registers \nleq, \ngeq, \nless, \ngtr, \nsubseteq, \nsupseteq,
\nmid, \nparallel and 23 more negated-arrow / negated-relation
symbols as kMTMathAtomRelation. Symbol-table only; no parser or
layout changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
\boxplus, \boxminus, \boxtimes, \boxdot, \circledast,
\circledcirc, \circleddash, \barwedge, \veebar, \triangleleft,
\triangleright. All registered as kMTMathAtomBinaryOperator.
Round-trip tests verify the Bin → Un retry in
+latexSymbolNameForAtom: handles the start-of-list case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
\rightleftharpoons, \leftrightharpoons, \upharpoonleft,
\upharpoonright, \downharpoonleft, \downharpoonright,
\rightharpoonup, \leftharpoonup, \rightharpoondown,
\leftharpoondown, \hookleftarrow, \hookrightarrow,
\twoheadleftarrow, \twoheadrightarrow, \rightarrowtail,
\leftarrowtail registered as kMTMathAtomRelation.

\restriction registered as an alias for \upharpoonright.
Bundled in one commit so the alias is never broken on a transient
build.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…p \square

Adds 17 missing relations (\vdash, \dashv, \Subset, \Supset,
\backsim, \backsimeq, \eqsim, \Bumpeq, \bumpeq, \therefore,
\because, \multimap, \vartriangleleft, \vartriangleright,
\trianglelefteq, \trianglerighteq, \triangleq) and 17 missing
ordinaries (\complement, \Box, \Diamond, \lozenge,
\blacklozenge, the four card suits, the Hebrew letters, and the
triangle family).

Removes the \square → +placeholder mis-wiring and registers
\Box as the canonical Ord/U+25A1 entry. \square now resolves
via an alias to \Box. +placeholder is unchanged.

Adds the remaining Feature-6 aliases: \implies, \impliedby,
\dotsc, \dotsb, \dotsm, \dotsi, \square, \vartriangle.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@kostub kostub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #201 (Re-key reverse symbol map by (nucleus, type))

All 169 tests pass (swift test). The core idea is sound and the implementation is clean. A few observations below, ranging from a correctness concern to nits.


Correctness: nsucceq / npreceq codepoints may be wrong

In iosMath/lib/MTMathAtomFactory.m:

@"nsucceq" : [MTMathAtom atomWithType:kMTMathAtomRelation value:@""],
@"npreceq" : [MTMathAtom atomWithType:kMTMathAtomRelation value:@""],

The standard amssymb assignments for \npreceq / \nsucceq are typically U+22E0 (DOES NOT PRECEDE OR EQUAL, ) and U+22E1 (DOES NOT SUCCEED OR EQUAL, ). U+2AB0 and U+2AB1 are / (PRECEDES/SUCCEEDS ABOVE SINGLE-LINE NOT EQUAL TO) — which is what textsf{stmaryrd} uses. Most LaTeX distributions ship \npreceq = ⋠ (U+22E0) and \nsucceq = ⋡ (U+22E1). Worth double-checking against the specific font's glyph mapping before merging.


Design: Un→Bin fallback is one-directional only

In latexSymbolNameForAtom:, the fallback goes Un→Bin. But finalized also converts a trailing Bin to Un. If any downstream caller ever passes a post-finalized atom that was trailing (not leading), the same problem applies. The comment in the code mentions "leading/orphan", which covers the known cases today — but the asymmetry between what finalized can reclassify and what the fallback handles is worth noting as a known limitation if future atoms are added.


addLatexSymbol: overwrites without respect to length preference

The new textToLatexSymbolNames builder respects the "shorter/alphabetical" tie-break. addLatexSymbol: (the public API path) does not — it blindly overwrites inner[typeKey] with the new name if a prior entry already exists. This is the same behaviour as before the re-key, so not a regression, but it's now more visible since the inner dict is directly written. Could be worth adding the same guard here for consistency:

NSString* existing = inner[@(atom.type)];
if (!existing || name.length < existing.length ||
    (name.length == existing.length && [name compare:existing] == NSOrderedAscending)) {
    inner[@(atom.type)] = name;
}

Nit: \square removed from the symbol table, not just re-routed

The diff removes @"square" from supportedLatexSymbols and adds it only to aliases. This means \square no longer appears in +supportedLatexSymbolNames, which changes the observable API contract slightly (a caller enumerating that list to populate a keyboard palette would lose square). That may be intentional — testSquareBoxParity confirms round-trips still work — but it is a behaviour change worth calling out explicitly in the PR description.


Nit: dotsb / dotsm both alias \cdots; dotsc / dotsi both alias \ldots

This is correct per the amsmath spec (\dotsb = binary-operator dots = \cdots, \dotsc = comma-separated dots = \ldots). No issue here — just confirming it's intentional rather than a copy-paste error.


Minor: testReverseMapTypeKeyedRoundTrip\pm trailing space in expected value

@[ @"\\pm",     @"\\pm " ],

The trailing space in @"\\pm " is the correct output of mathListToString: (the serializer appends a space after every command name). This is expected and correct — just noting it for reviewers who might wonder if it's a typo.


Overall this is well-structured work. The main item to verify before merging is the nsucceq/npreceq codepoint question.

@kostub
Copy link
Copy Markdown
Owner Author

kostub commented May 14, 2026

/review code

@kostub
Copy link
Copy Markdown
Owner Author

kostub commented May 14, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request expands iosMath's LaTeX compatibility by adding 91 new symbols, implementing prime shorthand in the parser, and refactoring the reverse symbol map to preserve atom types during round-trip serialization. Review feedback identifies several issues, including inconsistent tie-breaking logic in the runtime symbol registration API, incorrect Unicode codepoints for \nsucceq and \npreceq, and an incorrect alias for \dotsi. Additionally, \square is redundantly defined as both a canonical symbol and an alias, leading to incorrect expectations in the test suite.

Comment on lines +220 to +223
if (!existing || name.length < existing.length ||
(name.length == existing.length && [name compare:existing] == NSOrderedAscending)) {
inner[typeKey] = name;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The tie-breaking logic here is inconsistent with the logic in textToLatexSymbolNames. If an existing symbol with the same name is re-added, this implementation will not update it, whereas textToLatexSymbolNames would. This can lead to unexpected behavior for clients using this runtime API.

The comparison should be != NSOrderedDescending to include NSOrderedSame and be consistent with the logic for building the initial reverse map.

        if (!existing || name.length < existing.length ||
            (name.length == existing.length && [name compare:existing] != NSOrderedDescending)) {
            inner[typeKey] = name;
        }

Comment thread iosMath/lib/MTMathAtomFactory.m Outdated
Comment on lines +669 to +670
@"nsucceq" : [MTMathAtom atomWithType:kMTMathAtomRelation value:@"\u22E1"],
@"npreceq" : [MTMathAtom atomWithType:kMTMathAtomRelation value:@"\u22E0"],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The Unicode characters for \nsucceq and \npreceq are incorrect. According to the LLD and standard LaTeX, they should be U+2AB1 and U+2AB0 respectively. The current implementation uses U+22E1 and U+22E0, which correspond to \nsqsupseteq and \nsqsubseteq.

                     @"nsucceq" : [MTMathAtom atomWithType:kMTMathAtomRelation value:@"\u2AB1"],
                     @"npreceq" : [MTMathAtom atomWithType:kMTMathAtomRelation value:@"\u2AB0"],

@"dotsc" : @"ldots",
@"dotsb" : @"cdots",
@"dotsm" : @"cdots",
@"dotsi" : @"cdots",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The \dotsi command is incorrectly aliased to \cdots. According to the LLD and standard LaTeX, it should be an alias for \ldots.

                    @"dotsi" : @"ldots",

// Missing ordinaries (logic / suits / Hebrew letters / amssymb)
@"complement" : [MTMathAtom atomWithType:kMTMathAtomOrdinary value:@"\u2201"],
@"Box" : [MTMathAtom atomWithType:kMTMathAtomOrdinary value:@"\u25A1"],
@"square" : [MTMathAtom atomWithType:kMTMathAtomOrdinary value:@"\u25A1"],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The \square symbol is defined as an alias for \Box in +aliases. Defining it here as a canonical symbol is redundant and goes against the design outlined in the LLD. The alias mechanism will handle resolving \square to \Box, so this entry should be removed.

Comment thread iosMathTests/MTMathListBuilderTest.m Outdated
Comment on lines +1792 to +1793
@[ @"nsucceq", @0x22E1 ],
@[ @"npreceq", @0x22E0 ],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test case uses incorrect Unicode values for \nsucceq and \npreceq, which match the buggy implementation in MTMathAtomFactory. Once the factory is fixed, this test should be updated to use the correct codepoints (0x2AB1 and 0x2AB0).

        @[ @"nsucceq",         @0x2AB1 ],
        @[ @"npreceq",         @0x2AB0 ],

@[ @"dotsc", @"ldots", @"…", @"\\ldots " ],
@[ @"dotsb", @"cdots", @"⋯", @"\\cdots " ],
@[ @"dotsm", @"cdots", @"⋯", @"\\cdots " ],
@[ @"dotsi", @"cdots", @"⋯", @"\\cdots " ],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test case for the \dotsi alias is incorrect, as it expects it to resolve to \cdots. It should be updated to expect \ldots once the alias definition in MTMathAtomFactory is corrected.

        @[ @"dotsi",        @"ldots",          @"", @"\\ldots " ],


XCTAssertEqualObjects([MTMathListBuilder mathListToString:a], @"\\Box ");
XCTAssertEqualObjects([MTMathListBuilder mathListToString:b], @"\\Box ");
XCTAssertTrue([[MTMathAtomFactory supportedLatexSymbolNames] containsObject:@"square"]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This assertion incorrectly enforces that \square is a canonical symbol in supportedLatexSymbolNames. According to the design documents, \square should be an alias for \Box and not a canonical symbol itself. This line should be removed after removing \square from supportedLatexSymbols.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant