iWork readers: Numbers + Keynote on a shared SwiftTextIWA core #34
Merged
Numbers shares the IWA container and the TST table model with Pages, so
this reuses the existing package/Snappy/protobuf/object-graph machinery
and adds only the Numbers-specific navigation (TN.DocumentArchive ->
sheets -> TN.SheetArchive -> drawables -> TST.TableInfoArchive).
- TSTTableReader: app-neutral decoder for the shared TST table model
(TableModelArchive -> DataStore -> tiles -> cells), reading frozen cell
values (the cached formula results) so tables read as static text with
no calculation engine. Handles both modern ("BNC", decimal128) and
older pre-BNC (double) cell storage.
- NumbersParser/NumbersDocument/NumbersFile: read a .numbers file into
sheets of tables and render Markdown (GFM), HTML, or TSV.
- `swifttext numbers` CLI subcommand mirroring `pages`.
- Fix a latent cell-offset bug in both the new reader and PagesParser:
0xFFFF marks an empty cell, not end-of-row, so cells after a gap (a
total in column C with B blank) were being dropped.
Verified against 29/30 real .numbers files (the one miss is a legacy
iWork '09 XML document, correctly detected) plus a bundled fixture test.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
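The cell-offset fix described above can be illustrated with a small sketch. It assumes a simplified row layout in which each column has one `UInt16` offset and `0xFFFF` means "no cell here"; the helper name `populatedCells` and the flat offset array are hypothetical, and the real TST row storage is more involved.

```swift
/// Sentinel from the commit message: an offset of 0xFFFF means
/// "no cell in this column", not "no more cells in this row".
enum SketchOffsets {
    static let emptyCell: UInt16 = 0xFFFF
}

/// Hypothetical helper: returns (column, offset) for every populated cell.
func populatedCells(in offsets: [UInt16]) -> [(column: Int, offset: Int)] {
    var cells: [(column: Int, offset: Int)] = []
    for (column, offset) in offsets.enumerated() {
        // Skip the gap and keep scanning. Stopping here (treating the
        // sentinel as end-of-row) would drop every later column.
        if offset == SketchOffsets.emptyCell { continue }
        cells.append((column: column, offset: Int(offset)))
    }
    return cells
}

// Columns A and C are populated, B is blank (a total in C with B empty).
let row: [UInt16] = [0, 0xFFFF, 24]
let cells = populatedCells(in: row)
```

With this input the scan keeps column C (index 2) even though column B is a gap.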
TSTTable and NumbersDocument now conform to Codable, and NumbersFile.json()
emits the document (sheets of used-range tables) as JSON. This gives LLM
agents and other programmatic callers a structured representation, not just
rendered Markdown/HTML. Exposed via `swifttext numbers --json`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
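As a rough illustration of what a `Codable` document model buys callers, the sketch below encodes a sheets-of-tables structure with `JSONEncoder` and decodes it back. The `Sketch*` types and their fields are hypothetical stand-ins; the actual shape of `TSTTable` and `NumbersDocument` is not shown in this PR text.

```swift
import Foundation

// Hypothetical stand-ins for the real TSTTable / NumbersDocument types.
struct SketchTable: Codable, Equatable {
    var name: String
    var rows: [[String]]
}

struct SketchSheet: Codable, Equatable {
    var name: String
    var tables: [SketchTable]
}

struct SketchDocument: Codable, Equatable {
    var sheets: [SketchSheet]
}

/// Encodes the document as stable, human-readable JSON.
func json(for document: SketchDocument) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    let data = try encoder.encode(document)
    return String(decoding: data, as: UTF8.self)
}

let document = SketchDocument(sheets: [
    SketchSheet(name: "Sheet 1", tables: [
        SketchTable(name: "Table 1", rows: [["Item", "Total"], ["Widgets", "12"]])
    ])
])

let text = try json(for: document)
// A programmatic caller can decode the same structure back.
let roundTripped = try JSONDecoder().decode(SketchDocument.self, from: Data(text.utf8))
```

Sorting keys keeps the output deterministic, which is convenient when the JSON is diffed or fed to tests.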
PagesParser.tableGrid kept its own copy of the TST cell-grid decoder (tile
walk, decimal128, cell offsets), duplicating TSTTableReader. It now resolves
the attachment -> TableInfoArchive -> model chain and defers to
TSTTableReader, passing Pages' own rich-cell renderer (inline bold/italic
from char-style runs) and rich-cell alignment as closures.
The decimal128/frozen-value/cell-offset helpers are deleted from PagesParser
(-172 lines net); PagesCellValueTests now exercises them on TSTTableReader.
No behavior change: all 93 SwiftTextPages tests pass unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
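The closure hand-off described above might look roughly like the following. This is a sketch of the general pattern only: `SketchRun`, `SketchCell`, and `renderRow` are hypothetical names, and the real `TSTTableReader` signature may differ.

```swift
// Hypothetical, simplified cell model: plain text or styled runs.
struct SketchRun {
    let text: String
    let bold: Bool
}

enum SketchCell {
    case text(String)
    case rich([SketchRun])
}

/// Shared reader side: walks the row, but delegates rich-cell rendering
/// to whatever closure the calling app supplies.
func renderRow(_ row: [SketchCell], richCell: ([SketchRun]) -> String) -> [String] {
    var output: [String] = []
    for cell in row {
        switch cell {
        case .text(let value):
            output.append(value)
        case .rich(let runs):
            output.append(richCell(runs))
        }
    }
    return output
}

let row: [SketchCell] = [
    .text("Total"),
    .rich([SketchRun(text: "12", bold: true), SketchRun(text: " units", bold: false)])
]

// Pages-style caller: keep inline bold as Markdown.
let pagesRow = renderRow(row) { runs in
    runs.map { $0.bold ? "**\($0.text)**" : $0.text }.joined()
}

// Plain caller: flatten the runs to text.
let plainRow = renderRow(row) { runs in
    runs.map { $0.text }.joined()
}
```

Keeping the renderer as a parameter lets the shared decoder stay app-neutral while each app decides how styled text is surfaced.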
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8a31e6edda
Splits the iWork read core out of SwiftTextPages into a new SwiftTextIWA
library: Snappy, the wire-level Protobuf codec, the .iwa container/object
store (PagesContainer renamed to IWAContainer, with its own
IWAContainerError), and the shared TSTTableReader/TSTTable. SwiftTextPages
and the new SwiftTextNumbers target now both depend on it; Numbers no longer
lives inside the Pages target.
- The IWA read primitives are now public; the generated *.gen.swift models
gain `import SwiftTextIWA` (GenerateIWAModels.swift updated to emit it, so
regeneration stays in sync).
- ZIPFoundation moves with the container into SwiftTextIWA (still
PAGES-gated).
- Numbers reader + tests move to SwiftTextNumbers / SwiftTextNumbersTests.
No behavior change: all 430 tests pass; `swifttext numbers` output unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reads an Apple Keynote deck's slides (title, body bullets, presenter notes),
reusing the SwiftTextIWA core.
Keynote ships no schema and no KN.* models are vendored, so navigation is
structural via IWAReferenceScanner: DocumentArchive(1) -> ShowArchive(2) ->
ThemeArchive(10) owns the theme's layout (master) slides; the deck's slides
are the SlideNodeArchive(4) the theme doesn't own. Each slide's
PlaceholderArchive(7)/ShapeInfoArchive(2011) children carry the text
(StorageArchive 2001); NoteArchive(15) carries notes. Theme-exclusion cleanly
separates the deck's slides from the ~17 layout slides a theme ships.
Empty/object-replacement (U+FFFC) placeholders are dropped.
- New SwiftTextKeynote target + SwiftTextKeynoteTests (clean Sample.key
fixture).
- `swifttext keynote` CLI subcommand (-m/--markdown, --json, default text).
- IWAReferenceScanner lifted from SwiftTextPages into SwiftTextIWA (generic
IWA utility; now shared by the object graph, Numbers, and Keynote).
Verified: 430 tests pass; both real .key decks parse to 21 slides each.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
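The theme-exclusion step amounts to a set difference, which the sketch below shows on a toy object list. It assumes a flattened view of the object graph where each object exposes its archive type number and the identifiers it references directly; `SketchObject` and `deckSlideIDs` are hypothetical names, and a real reader would take slide order from the show rather than from input order.

```swift
// Hypothetical flattened view of one object in the graph.
struct SketchObject {
    let id: UInt64
    let type: Int
    let references: [UInt64]
}

// Archive type numbers as quoted in the commit message.
enum SketchArchiveType {
    static let slideNode = 4
    static let theme = 10
}

/// Deck slides are the slide nodes that no theme references.
func deckSlideIDs(in objects: [SketchObject]) -> [UInt64] {
    // Everything a theme points at is treated as a layout (master) slide.
    let themeOwned = Set(
        objects
            .filter { $0.type == SketchArchiveType.theme }
            .flatMap { $0.references }
    )
    return objects
        .filter { $0.type == SketchArchiveType.slideNode && !themeOwned.contains($0.id) }
        .map { $0.id }
}

let objects = [
    SketchObject(id: 1, type: SketchArchiveType.theme, references: [10, 11]),
    SketchObject(id: 10, type: SketchArchiveType.slideNode, references: []),
    SketchObject(id: 11, type: SketchArchiveType.slideNode, references: []),
    SketchObject(id: 20, type: SketchArchiveType.slideNode, references: []),
    SketchObject(id: 21, type: SketchArchiveType.slideNode, references: [])
]
let deckSlides = deckSlideIDs(in: objects)
```

Here slide nodes 10 and 11 are excluded because the theme references them, leaving 20 and 21 as deck slides.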
- TSTTableReader: decode every TileStorage tile, not just the first. A table
taller than tileSize (~256 rows) spans multiple tiles; each tile's rows are
offset by tileid * tileSize. The reader now flattens all tiles' row infos
with their base offset, so cells past the first tile are no longer dropped.
Regression-tested with a 300-row fixture (MultiTile.numbers): A260/A300 live
in the second tile and now read back correctly.
- Convert the new Numbers and Keynote tests from XCTest to Swift Testing
(@Suite/@Test/#expect/#require), per AGENTS.md ("Use Swift Testing
exclusively").
All 441 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
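The multi-tile fix comes down to adding a per-tile base offset (`tileid * tileSize`) to each row index. The sketch below shows that arithmetic under simplified assumptions: `SketchTile` and `SketchRowInfo` are hypothetical types, and the tile size is passed in rather than read from the table model.

```swift
// Hypothetical, simplified tile model.
struct SketchRowInfo {
    let rowInTile: Int      // row index relative to the tile
    let cells: [String]
}

struct SketchTile {
    let tileID: Int
    let rows: [SketchRowInfo]
}

/// Flattens every tile's rows into absolute table rows.
func flattenRows(tiles: [SketchTile], tileSize: Int) -> [(row: Int, cells: [String])] {
    var result: [(row: Int, cells: [String])] = []
    for tile in tiles {
        // Rows in tile N start at N * tileSize in the full table.
        let base = tile.tileID * tileSize
        for info in tile.rows {
            result.append((row: base + info.rowInTile, cells: info.cells))
        }
    }
    return result.sorted { $0.row < $1.row }
}

// A1 lives in tile 0; A260 and A300 (zero-based rows 259 and 299) in tile 1.
let tiles = [
    SketchTile(tileID: 0, rows: [SketchRowInfo(rowInTile: 0, cells: ["A1"])]),
    SketchTile(tileID: 1, rows: [
        SketchRowInfo(rowInTile: 3, cells: ["A260"]),
        SketchRowInfo(rowInTile: 43, cells: ["A300"])
    ])
]
let flattened = flattenRows(tiles: tiles, tileSize: 256)
```

Reading only `tiles[0]` would return the single row for A1, which matches the dropped-rows symptom the commit describes.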
Older .numbers files store a single index.xml (the APXL-era schema) instead
of the modern .iwa package. NumbersParser now falls back to a new
NumbersLegacyParser when no Index/*.iwa is present, mirroring how PagesParser
falls back to PagesLegacyParser.
NumbersLegacyParser is an XMLParser delegate over `<sf:tabular-model>` ->
`<sf:grid>` -> `<sf:datasource>`: a row-major cell stream of `<sf:t>` text
(`<sf:ct sfa:s>`), `<sf:g>` empties, and `<sf:f>` formulas (`<sf:fo sf:fs>`).
It produces the same TSTTable model as the modern reader, so all
Markdown/HTML/JSON/TSV rendering is reused unchanged.
Verified against a real '09 file (a 4x198 glossary) plus a committed minimal
fixture (LegacySample.numbers). A '09 formula cell surfaces its source text
(the format keeps no readable cached result); numeric cells are best-effort.
442 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
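The fallback decision can be sketched as a check over the package's entry paths. This assumes the caller already has a flat list of paths inside the `.numbers` package; `SketchNumbersFormat` and `detectFormat` are hypothetical names, and the real parser may probe the container differently.

```swift
import Foundation

enum SketchNumbersFormat: Equatable {
    case modernIWA   // Index/*.iwa present
    case legacyXML   // single index.xml (iWork '09 era)
    case unknown
}

/// Chooses a reader based on which entries the package contains.
func detectFormat(entryPaths: [String]) -> SketchNumbersFormat {
    let hasIWA = entryPaths.contains { $0.hasPrefix("Index/") && $0.hasSuffix(".iwa") }
    if hasIWA { return .modernIWA }
    // Only fall back to the legacy reader when no .iwa entries exist.
    if entryPaths.contains("index.xml") { return .legacyXML }
    return .unknown
}

let modern = detectFormat(entryPaths: ["Index/Document.iwa", "Metadata/Properties.plist"])
let legacy = detectFormat(entryPaths: ["index.xml", "QuickLook/Thumbnail.jpg"])
let neither = detectFormat(entryPaths: ["README.txt"])
```

Checking for `.iwa` entries first means a package that happens to contain both kinds of file is still read with the modern path.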
Lifts the '09 `<sf:tabular-model>` table extractor into SwiftTextIWA
(IWALegacyTableReader), along with TSTTable.trimmedToUsedRange(), so the
Numbers and Pages legacy readers share one decoder; the '09 table schema is
identical across iWork apps.
- PagesLegacyParser now extracts legacy tables too (previously body text
only), appending each (cropped to its used range) as a table-bearing
paragraph after the body, so `swifttext pages` surfaces tables from '09
.pages files.
- NumbersParser's legacy branch uses the shared reader; the now-redundant
NumbersLegacyParser and its unused error case are removed.
Verified: a legacy .pages with an <sf:tabular-model> renders its table to
Markdown; the real '09 .numbers glossary still reads unchanged. 443 tests
pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
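Cropping a table to its used range can be sketched on a plain grid of strings. This is an assumption-laden illustration: it treats the used range as anchored at the top-left cell and only trims trailing empty rows and columns, and `sketchTrimmedToUsedRange` is a hypothetical free function, not the actual `TSTTable.trimmedToUsedRange()` implementation.

```swift
/// Trims trailing empty rows and columns from a grid of cell strings.
func sketchTrimmedToUsedRange(_ grid: [[String]]) -> [[String]] {
    // Last row that contains at least one non-empty cell.
    guard let lastRow = grid.lastIndex(where: { row in
        row.contains(where: { !$0.isEmpty })
    }) else {
        return []
    }
    let rows = grid[...lastRow]
    // Rightmost non-empty column across the kept rows.
    let lastColumn = rows
        .compactMap { row in row.lastIndex(where: { !$0.isEmpty }) }
        .max() ?? 0
    return rows.map { row in
        // Pad short rows so the result stays rectangular.
        (0...lastColumn).map { column in column < row.count ? row[column] : "" }
    }
}

let grid = [
    ["a", "", ""],
    ["", "b", ""],
    ["", "", ""]
]
let trimmed = sketchTrimmedToUsedRange(grid)
let empty = sketchTrimmedToUsedRange([["", ""], [""]])
```

Interior gaps (the blank cell next to "a") are preserved; only the unused trailing row and column are removed.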
Extends SwiftText from Pages-only to the full iWork family, factoring the shared machinery into a base module. All three apps use the same IWA container; the table model (`TST`) is shared between Pages and Numbers, and the read primitives serve all three.
Numbers reader
- `TSTTableReader`: app-neutral decoder for the shared `TST` table model. Reads frozen cell values (cached formula results), so no calculation engine is needed. Handles both cell-storage generations: modern "BNC" (decimal128) and older pre-BNC (double, variable layout).
- `NumbersParser`/`Document`/`File` → Markdown (GFM), HTML, JSON, TSV; `Codable` model.
- `swifttext numbers` (`-m`, `--html`, `--json`, default TSV).
- Verified against 29/30 real `.numbers` files (the miss is a legacy iWork '09 XML doc, correctly detected).
Keynote reader
- `KeynoteParser`/`Document`/`File`: deck slide text (title, body bullets, presenter notes) → Markdown, JSON, text. No `KN.*` schema needed: navigates structurally (`Document → Show → Theme` owns layout slides; deck slides are the slide-nodes the theme doesn't own; placeholders/shapes carry the text). Theme-exclusion cleanly separates deck slides from the ~17 layout slides a theme ships.
- `swifttext keynote` (`-m`, `--json`, default text).
- Both real `.key` decks parse to 21 slides each.
Shared core + cleanup
- `SwiftTextIWA` module: Snappy, the wire-level Protobuf codec, the `.iwa` container/object store (`IWAContainer`), `IWAReferenceScanner`, and `TSTTableReader`. `SwiftTextPages`, `SwiftTextNumbers`, and `SwiftTextKeynote` all depend on it. The generator (`GenerateIWAModels.swift`) now emits `import SwiftTextIWA`.
- `PagesParser` now defers to `TSTTableReader`: deleted its duplicate cell-grid decoder (−172 lines).
- Fixed a latent cell-offset bug: `0xFFFF` marks an empty cell, not end-of-row, so cells after a gap (a total in column C with B blank) were being dropped.
Verification
430 tests pass (new SwiftTextNumbersTests / SwiftTextKeynoteTests + all existing). Lint clean. Clean fixtures (`Sample.numbers`, `Sample.key`) authored via the apps, no personal data.
Follow-ups (not in this PR)
🤖 Generated with Claude Code