Non-Latin text support via MTTextAtom #200
Introduces a new atom type kMTMathAtomText distinct from Ordinary so it won't be fused by Rule 14, an MTTextStyle enum (Roman/Bold/Italic/SansSerif/Typewriter), an MTTextAtom subclass with a raw-Unicode body and LaTeX round-trip with the nine standard escapes, and the +textStyleWithName:/+commandNameForTextStyle: factory APIs that drive parser and serializer lookups. No parser/typesetter wiring yet; \text* still flows through the legacy fontStyles path. Existing behavior unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Returns a CT font configured for system text rendering with optional bold/italic traits, or a system monospace font for typewriter style. Caller owns the returned CTFontRef (CF_RETAINED). No callers yet; wired into MTTextDisplay/typesetter in following commits.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A new MTDisplay subclass that owns one CTLineRef built from raw text and a CTFontRef, with an xHeightShift applied at draw time so the text-font x-height can be aligned with the math x-height. Models MTCTLineDisplay's lifecycle for color/draw but takes a CTFontRef directly rather than an MTFont so it can carry a system text font. Not yet wired into the typesetter; covered by the next chunk.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds case kMTMathAtomText: to createDisplayAtoms: it flushes the current math line, builds an MTTextDisplay backed by a system text CTFont, and positions it like any other inline display. Inter-element spacing maps kMTMathAtomText into the Ord row/column of the existing 8x8 matrix without changing the matrix itself. Sub/superscripts attach via the existing makeScripts: path with delta=0 (no italic correction). Baseline alignment uses mathTable.accentBaseHeight (\fontdimen5 / TeX x-height) minus CTFontGetXHeight(textFont) so the lowercase x-heights line up: visually correct for Latin/Cyrillic/Greek, approximate for CJK/Devanagari per LLD section 5.1. Tests construct MTTextAtoms programmatically; \text* still flows through the legacy parser path until the next commit flips the switch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
\text, \textrm, \textbf, \textit, \textsf, \texttt now build a single
MTTextAtom from the raw {…} body. The new readTextArgument honors
balanced nested {…} as TeX-style grouping (the braces are stripped),
processes the nine standard escapes \\, \{, \}, \_, \^, \%, \&, \#,
\$, and rejects a bare $ and any other backslash sequence with
MTParseErrorInvalidCommand. \textbf{你好}, \text{नमस्ते}, \text{مرحبا}
etc. now render correctly through CoreText cascade fallback.
Removes the six \text* keys (text, textrm, textbf, textit, textsf,
texttt) from MTMathAtomFactory.fontStyles so no fall-through path
remains. \math* keys are preserved — \mathbf{x} etc. still produce
Unicode-math-alphanumeric remapping unchanged.
Updates the existing testText to expect the new MTTextAtom shape and
adds parser/integration tests covering the body-capture grammar
(empty, ASCII, all five styles, CJK, Cyrillic, Devanagari, Hebrew,
Arabic, mixed scripts, escapes, NBSP, nested braces), scripts,
round-tripping, and parse-error cases.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
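The body-capture grammar described in this commit can be sketched in plain C (which also compiles as Objective-C). This is only an illustration under stated assumptions, not the code in readTextArgument: the names is_text_escapable and read_text_body are hypothetical, input is treated as UTF-8 bytes rather than unichar, and the \<space> escape is included because a later fix in this PR adds it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the library's API): characters that may follow a
   backslash inside a \text* body. The trailing space mirrors the \<space>
   compatibility fix discussed later in this PR. */
static bool is_text_escapable(char c) {
    return c != '\0' && strchr("\\{}_^%&#$ ", c) != NULL;
}

/* Reads a braced body starting at `in` (which must begin with '{') into
   `out`. Grouping braces are stripped, an escape is replaced by its literal
   character, and a bare '$' or an unknown backslash sequence is rejected.
   Non-ASCII UTF-8 bytes never collide with the ASCII specials, so they pass
   through untouched. Returns true once the outermost '}' is reached; `out`
   is kept NUL-terminated throughout. */
static bool read_text_body(const char *in, char *out, size_t cap) {
    assert(in != NULL && out != NULL && cap > 0);
    size_t n = 0;
    int depth = 0;
    out[0] = '\0';
    if (*in != '{') return false;              /* missing opening brace */
    for (const char *p = in; *p != '\0'; p++) {
        char c = *p;
        if (c == '\\') {
            char esc = *++p;
            if (!is_text_escapable(esc)) return false;  /* invalid command */
            c = esc;                           /* append the literal char */
        } else if (c == '{') {
            depth++;                           /* grouping brace: stripped */
            continue;
        } else if (c == '}') {
            if (--depth == 0) return true;     /* end of the body */
            continue;
        } else if (c == '$') {
            return false;                      /* bare '$' is rejected */
        }
        if (n + 1 >= cap) return false;        /* output buffer too small */
        out[n++] = c;
        out[n] = '\0';
    }
    return false;                              /* unbalanced braces */
}
```

With this sketch, "{a{b}c}" yields "abc", "{50\% off}" yields "50% off", and "{\alpha}" or "{a$b}" is rejected.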
The U+0411-U+044E carve-out was inconsistent: only one Cyrillic block,
off-by-one against U+0410-U+044F, and no equivalent for Greek, Hebrew,
Arabic, or CJK. Now that \text* offers a uniform path for non-Latin
text, the special case is more confusing than useful.
After this commit, raw math input is limited to ASCII U+0021-U+007E. Cyrillic
must be wrapped in a \text* command such as \text{...} or \textbf{...}
(already exercised by testTextCyrillic*).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
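To make the off-by-one concrete, here is a small C sketch of the three ranges mentioned in this commit message. The function names are hypothetical and exist only for illustration; the code points are the ones quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* The retired carve-out as described in the commit message: U+0411..U+044E. */
static bool in_retired_carve_out(uint32_t cp) {
    return cp >= 0x0411 && cp <= 0x044E;
}

/* The contiguous basic Cyrillic letters: U+0410..U+044F. */
static bool is_basic_cyrillic_letter(uint32_t cp) {
    return cp >= 0x0410 && cp <= 0x044F;
}

/* Raw math input accepted after this commit: ASCII U+0021..U+007E. */
static bool is_raw_math_input(uint32_t cp) {
    return cp >= 0x0021 && cp <= 0x007E;
}
```

Under the old range, U+0410 and U+044F are basic Cyrillic letters yet fall outside the carve-out; under the new rule no Cyrillic code point is raw math input at all.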
Adds demo entries covering Latin/Cyrillic/Chinese/Devanagari/Hebrew/Arabic mixed with math, the five \text* styles side-by-side, and a text block carrying scripts. Visible in iosMathExample, MacOSMathExample, and SwiftMathExample.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    self.width = (CGFloat)CTLineGetTypographicBounds(_line, NULL, NULL, NULL);
    CGRect bounds = CTLineGetBoundsWithOptions(_line, kCTLineBoundsUseGlyphPathBounds);
    self.ascent = MAX(0, CGRectGetMaxY(bounds));
Because draw: later sets the CT text position to self.position.y - _xHeightShift, the ink no longer lives in the coordinate space described by these unshifted metrics. For positive shifts, the real descent is -CGRectGetMinY(bounds) + _xHeightShift; displayBounds, label sizing/background fills, and script placement can all undercount the lower ink. Please fold _xHeightShift into the reported ascent/descent here, or expose shifted accessors like the existing shifted glyph displays do.
Already addressed in 3ff02bf — _xHeightShift was removed from MTTextDisplay entirely. draw: now uses self.position.y directly, so the unshifted metrics here are consistent with the draw position and no longer undercount lower ink.
    case '^': case '%': case '&': case '#': case '$':
        [body appendFormat:@"%C", esc];
        break;
    default:
This now rejects \<space> (a backslash followed by a space) inside text bodies. The old text/font-style path accepted that single-character space command, and it is common LaTeX, so existing inputs such as \text{hello\ world} or \textbf{hello\ world} now fail with MTParseErrorInvalidCommand. Please preserve that compatibility, for example by adding case ' ': to append a literal space and covering it with a regression test.
Fixed in f3d4e12. readTextArgument now treats \<space> as a forced literal space (matching the legacy single-char command behavior), and testTextBackslashSpace covers both \text{hello\ world} and \textbf{hello\ world}.
The original LLD called for an x-height shift so text x-height aligned
with the math x-height, implemented using MTFontMathTable.accentBaseHeight.
That property is TeX's math axis height (\fontdimen5), not the x-height,
so the shift was both poorly motivated and incorrectly computed.
Drop the xHeightShift parameter from MTTextDisplay and the helper from
MTTypesetter; \text{...} now shares the surrounding math baseline, which
matches TeX semantics. Pin the contract with an equality assertion in
testTextInMixedLine.
Refresh demos 23-30 to "x + \text{...} + y = ..." compositions that make
the baseline alignment visually verifiable across Latin, Cyrillic, CJK,
Devanagari, Arabic, Hebrew, and the five \text* styles.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Share the \text* body's escapable-character set via
MTTextAtom.latexEscapableCharacterSet so the LaTeX writer
(MTTextAtom.appendLaTeXToString:) and the parser
(MTMathListBuilder.readTextArgument) cannot drift apart.
- Add testTextSubAndSuperscript covering combined ^ and _ on a
text atom (parse + round-trip).
- Expand MTTextStyle doc comments to match the MTFontStyle style.
- Replace the fixture-shaped "x + \\text{...} + y = ..." demo rows
with realistic compositions (vector-calculus definition, area-
of-a-circle in five non-Latin scripts, multi-style textbook
definition, Cyrillic Pythagoras label) and bump demo 23 from
60 to 70 to fit its new fraction.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
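The first bullet's point, one escapable set shared by the writer and the parser, can be sketched in C as follows. This is a simplified illustration, not the code behind MTTextAtom.latexEscapableCharacterSet: kTextEscapable, text_escape and text_unescape are hypothetical names, the set is assumed to be the nine characters listed earlier in this PR, and the unescape side handles flat bodies only (no brace grouping).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed single source of truth, consulted by both directions below. */
static const char kTextEscapable[] = "\\{}_^%&#$";

static bool text_is_escapable(char c) {
    return c != '\0' && strchr(kTextEscapable, c) != NULL;
}

/* Writer side: prefix every escapable character with a backslash.
   Returns false if `out` (capacity `cap`) is too small. */
static bool text_escape(const char *raw, char *out, size_t cap) {
    assert(raw != NULL && out != NULL && cap > 0);
    size_t n = 0;
    for (; *raw != '\0'; raw++) {
        size_t need = text_is_escapable(*raw) ? 2 : 1;
        if (n + need >= cap) return false;     /* keep room for the NUL */
        if (need == 2) out[n++] = '\\';
        out[n++] = *raw;
    }
    out[n] = '\0';
    return true;
}

/* Parser side: accept a backslash only when followed by a character from the
   same set. Simplification: unescaped characters are copied as-is. */
static bool text_unescape(const char *body, char *out, size_t cap) {
    assert(body != NULL && out != NULL && cap > 0);
    size_t n = 0;
    for (; *body != '\0'; body++) {
        char c = *body;
        if (c == '\\') {
            c = *++body;
            if (!text_is_escapable(c)) return false;   /* unknown escape */
        }
        if (n + 1 >= cap) return false;
        out[n++] = c;
    }
    out[n] = '\0';
    return true;
}
```

Because both functions consult text_is_escapable, adding or removing a character in kTextEscapable changes the writer and the parser together, which is the drift the bullet is guarding against.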
CTFontCreateUIFontForLanguage with kCTFontUIFontSystem / kCTFontUIFontUserFixedPitch does not return NULL on supported platforms. The dead fallback also bypassed the bold/italic trait-application that follows, so a hit would have silently served unstyled Helvetica.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The legacy text path treated \<space> as a single-character command that
mapped to an ordinary atom with a literal space, so inputs like
\text{hello\ world} and \textbf{hello\ world} worked. The new
\text* parser in readTextArgument restricts backslash escapes to the
\{}_^%&#$ set, so those inputs now fail with MTParseErrorInvalidCommand.
Accept \<space> as a forced literal space in \text* bodies and add a
regression test covering both \text and \textbf.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@gemini-code-assist review code
/gemini review
Code Review
This pull request implements native support for \text* commands by introducing MTTextAtom and MTTextDisplay, allowing verbatim Unicode text capture and rendering via system fonts. The parser and typesetter were updated to handle these blocks, including support for non-Latin scripts. Feedback highlights a potential crash on iOS 6 from using CTFontCreateUIFontForLanguage and a regression where braces are now strictly required for \text* arguments, breaking standard LaTeX single-character compatibility.
        break;
    }

    CTFontRef baseFont = CTFontCreateUIFontForLanguage(base, size, NULL);
CTFontCreateUIFontForLanguage was introduced in iOS 7.0. Since the library maintains compatibility with iOS 6.0 (as indicated by isIos6Supported in MTMathListDisplay.m), using this function will cause a crash on older systems. CTFontCreateUIFontType is a safer alternative available since iOS 3.0 and macOS 10.5.
    CTFontRef baseFont = CTFontCreateUIFontType(base, size, NULL);
@gemini-code-assist Current repo support is iOS 18+ and macOS 15+, not iOS 6
You are correct, my apologies for the oversight regarding the minimum supported OS version. Given that the project now targets iOS 18+ and macOS 15+, CTFontCreateUIFontForLanguage is indeed the appropriate and modern API to use. Thank you for the clarification.
    if (![self hasCharacters] || [self getNextCharacter] != '{') {
        // Roll the position back so the error highlights the right spot.
        if ([self hasCharacters] || _currentChar > 0) {
            [self unlookCharacter];
        }
        [self setError:MTParseErrorCharacterNotFound
               message:@"Missing { for \\text* argument"];
        return nil;
    }
The new \text* commands now strictly require an opening brace {. In standard LaTeX, and in previous versions of this library for font styles, commands like \textbf can be used without braces for a single following character (e.g., \textbf x). This change breaks compatibility for such inputs. Consider allowing a single character or command as an argument if no opening brace is found, while maintaining the raw capture logic for the braced case.
Review findings
Code Review
This pull request introduces support for text-mode commands such as \text, \textbf, and \textit by implementing a new MTTextAtom and MTTextDisplay rendering path using CoreText. This allows for raw Unicode text runs, including non-Latin scripts like Arabic and Chinese, to be rendered within math equations. The feedback identifies potential crashes on older iOS versions due to the use of APIs like CTLineGetBoundsWithOptions (iOS 12.0+) and CTFontCreateUIFontForLanguage (iOS 7.0), and suggests improving LaTeX compatibility by allowing single-character arguments without braces.
        (__bridge CFAttributedStringRef)_attributedString);

    self.width = (CGFloat)CTLineGetTypographicBounds(_line, NULL, NULL, NULL);
    CGRect bounds = CTLineGetBoundsWithOptions(_line, kCTLineBoundsUseGlyphPathBounds);
CTLineGetBoundsWithOptions is only available on iOS 12.0+ and macOS 10.14+. Since this library includes compatibility checks for older versions (e.g., isIos6Supported), this call will cause a crash on devices running older OS versions. Consider adding a version guard and providing a fallback mechanism for calculating glyph bounds, similar to the computeDimensions: method used in MTCTLineDisplay.
@gemini-code-assist Current repo support is iOS 18+ and macOS 15+, not iOS 6.
Thank you for the clarification regarding the supported OS versions. Since the project targets iOS 18+ and macOS 15+, CTLineGetBoundsWithOptions is indeed safe to use. I apologize for the confusion regarding the legacy support constraints.
    if (![self hasCharacters] || [self getNextCharacter] != '{') {
        // Roll the position back so the error highlights the right spot.
        if ([self hasCharacters] || _currentChar > 0) {
            [self unlookCharacter];
        }
        [self setError:MTParseErrorCharacterNotFound
               message:@"Missing { for \\text* argument"];
        return nil;
    }
The readTextArgument method strictly expects an opening brace {. In standard LaTeX, commands like \textbf or \textit can also accept a single character or token as an argument without braces (e.g., \textbf x). This implementation change may break existing LaTeX strings that do not use braces for these commands. If full LaTeX compatibility is desired, consider allowing a single-token fallback if the next character is not {.
        break;
    }

    CTFontRef baseFont = CTFontCreateUIFontForLanguage(base, size, NULL);
CTFontCreateUIFontForLanguage was introduced in iOS 7.0. If the library is intended to support iOS 6 (as suggested by isIos6Supported in MTMathListDisplay.m), this call will result in a crash on those devices. Consider using CTFontCreateWithName with a fallback to the system font name if the current OS version is below 7.0.
@gemini-code-assist Current repo support is iOS 18+ and macOS 15+, not iOS 6.
Thank you for the clarification, kostub. I appreciate the context regarding the project's supported platform versions. Since the minimum deployment target is iOS 18+ and macOS 15+, the use of CTFontCreateUIFontForLanguage is perfectly appropriate. I will keep this in mind for future reviews.
Summary
Adds first-class non-Latin text support to iosMath via a new
`MTTextAtom` (Approach E from the LLD): `\text`, `\textrm`, `\textbf`, `\textit`, `\textsf`, `\texttt` now capture their bodies as raw `NSString` and render through CoreText's system-font cascade. This unlocks CJK, Cyrillic, Devanagari, Hebrew, Arabic, etc. inside math layouts without relying on the math font's BMP coverage. The implementation also retires the legacy Cyrillic-as-Variable carve-out, since `\text` now handles arbitrary scripts uniformly.
- `kMTMathAtomText` (= 19) with `MTTextStyle` enum (Roman/Bold/Italic/SansSerif/Typewriter), distinct from Ordinary so Rule 14 fusion is avoided.
- `MTTextDisplay` opaque sub-display embedded into the parent math line, x-height-aligned via the math font's `\fontdimen5`.
- `MTFontManager +textCTFontForStyle:size:` returning a `CTFontRef` with system font + symbolic traits.
- `\text*` commands route to `MTTextAtom`; raw-body capture supports balanced braces and the nine standard LaTeX escapes (`\\`, `\{`, `\}`, `\_`, `\^`, `\%`, `\&`, `\#`, `\$`).
Plan & LLD
(These live under untracked `docs/` per the orchestrator setup; they're not in the diff but are the design reference for review.)
Commits
Each commit's tests pass independently (verified per-commit with `swift test`).
Test plan