iWork readers: Numbers + Keynote on a shared SwiftTextIWA core #34

Merged
odrobnik merged 8 commits into main from claude/numbers-reader
Jun 18, 2026

odrobnik commented Jun 18, 2026

Extends SwiftText from Pages-only to the full iWork family, factoring the shared machinery into a base module. All three apps use the same IWA container; the table model (TST) is shared between Pages and Numbers, and the read primitives serve all three.

Numbers reader

  • TSTTableReader — app-neutral decoder for the shared TST table model. Reads frozen cell values (cached formula results) — no calculation engine. Handles both cell-storage generations: modern "BNC" (decimal128) and older pre-BNC (double, variable layout).
  • NumbersParser/Document/File → Markdown (GFM), HTML, JSON, TSV; Codable model.
  • swifttext numbers (-m, --html, --json, default TSV).
  • Verified on 29/30 real .numbers files (the miss is a legacy iWork '09 XML doc, correctly detected).
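
The Markdown (GFM) and TSV outputs listed above could be produced along these lines. This is a minimal sketch, assuming a table is already available as a grid of strings whose first row is the header; `renderGFM` and `renderTSV` are hypothetical names for illustration, not the NumbersFile API.

```swift
/// Renders a grid of cell strings as a GFM table.
/// Assumption of this sketch: the first row is the header.
func renderGFM(_ rows: [[String]]) -> String {
    guard let header = rows.first, !header.isEmpty else { return "" }

    // Escape pipes and flatten newlines so a cell cannot break the table syntax.
    func escape(_ cell: String) -> String {
        var out = ""
        for ch in cell {
            if ch == "|" {
                out += "\\|"
            } else if ch.isNewline {
                out += " "
            } else {
                out.append(ch)
            }
        }
        return out
    }

    // Pad or truncate each row to the header's column count.
    func line(_ cells: [String]) -> String {
        let padding = Array(repeating: "", count: max(0, header.count - cells.count))
        let fitted = (cells + padding).prefix(header.count)
        return "| " + fitted.map(escape).joined(separator: " | ") + " |"
    }

    let separator = "| " + header.map { _ in "---" }.joined(separator: " | ") + " |"
    let body = rows.dropFirst().map(line)
    return ([line(header), separator] + body).joined(separator: "\n")
}

/// Renders the same grid as tab-separated values, one row per line.
func renderTSV(_ rows: [[String]]) -> String {
    rows.map { $0.joined(separator: "\t") }.joined(separator: "\n")
}

print(renderGFM([["Item", "Total"], ["a|b", "3"]]))
```

Keeping the renderers as pure functions over a string grid means every output format can share one decoded table model.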

Keynote reader

  • KeynoteParser/Document/File — deck slide text (title, body bullets, presenter notes) → Markdown, JSON, text. No KN.* schema needed: navigates structurally (Document → Show → Theme owns layout slides; deck slides are the slide-nodes the theme doesn't own; placeholders/shapes carry the text). Theme-exclusion cleanly separates deck slides from the ~17 layout slides a theme ships.
  • swifttext keynote (-m, --json, default text).
  • Verified: both real .key decks parse to 21 slides each.

Shared core + cleanup

  • New SwiftTextIWA module — Snappy, the wire-level Protobuf codec, the .iwa container/object store (IWAContainer), IWAReferenceScanner, and TSTTableReader. SwiftTextPages, SwiftTextNumbers, and SwiftTextKeynote all depend on it. The generator (GenerateIWAModels.swift) now emits import SwiftTextIWA.
  • Pages converged onto TSTTableReader — deleted its duplicate cell-grid decoder (−172 lines).
  • Bug fix (Pages + the shared reader): 0xFFFF marks an empty cell, not end-of-row — cells after a gap (a total in column C with B blank) were being dropped.
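
The 0xFFFF fix can be illustrated with a small sketch. It assumes a row carries one 16-bit offset per column, which is a simplification for illustration; `populatedColumns` is a hypothetical helper, not the TSTTableReader code.

```swift
/// Sentinel offset meaning "this column has no cell in this row".
let emptyCellOffset: UInt16 = 0xFFFF

/// Returns the column indices in a row that actually hold a cell.
/// Assumption: `offsets` has one entry per column, either a byte offset
/// into the row's cell buffer or 0xFFFF for an empty cell.
func populatedColumns(offsets: [UInt16]) -> [Int] {
    var columns: [Int] = []
    for (column, offset) in offsets.enumerated() {
        // Skip the gap and keep scanning. Treating the sentinel as
        // end-of-row (a `break` here) would drop every later cell.
        if offset == emptyCellOffset { continue }
        columns.append(column)
    }
    return columns
}

// Columns A and C populated, B blank.
print(populatedColumns(offsets: [0, 0xFFFF, 12]))  // prints "[0, 2]"
```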

Verification

430 tests pass (new SwiftTextNumbersTests / SwiftTextKeynoteTests + all existing). Lint clean. Clean fixtures (Sample.numbers, Sample.key) authored via the apps — no personal data.

Follow-ups (not in this PR)

  • Full pre-BNC fidelity (a flag-driven cell-record walker for exotic older layouts; common shapes already covered).
  • Keynote slide ordering is currently document-discovery order (fine for typical decks; a slide-tree traversal would pin nested/grouped order exactly).

🤖 Generated with Claude Code

odrobnik and others added 3 commits June 18, 2026 12:16
Numbers shares the IWA container and the TST table model with Pages, so
this reuses the existing package/Snappy/protobuf/object-graph machinery
and adds only the Numbers-specific navigation (TN.DocumentArchive ->
sheets -> TN.SheetArchive -> drawables -> TST.TableInfoArchive).

- TSTTableReader: app-neutral decoder for the shared TST table model
  (TableModelArchive -> DataStore -> tiles -> cells), reading frozen cell
  values (the cached formula results) so tables read as static text with
  no calculation engine. Handles both modern ("BNC", decimal128) and
  older pre-BNC (double) cell storage.
- NumbersParser/NumbersDocument/NumbersFile: read a .numbers file into
  sheets of tables and render Markdown (GFM), HTML, or TSV.
- `swifttext numbers` CLI subcommand mirroring `pages`.
- Fix a latent cell-offset bug in both the new reader and PagesParser:
  0xFFFF marks an empty cell, not end-of-row, so cells after a gap (a
  total in column C with B blank) were being dropped.

Verified against 29/30 real .numbers files (the one miss is a legacy
iWork '09 XML document, correctly detected) plus a bundled fixture test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

TSTTable and NumbersDocument now conform to Codable, and NumbersFile.json()
emits the document (sheets of used-range tables) as JSON. This gives LLM
agents and other programmatic callers a structured representation, not just
rendered Markdown/HTML. Exposed via `swifttext numbers --json`.
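
As a rough illustration of what Codable conformance buys here, the sketch below encodes a document of sheets and tables with `JSONEncoder`. The `Sketch*` types and their fields are hypothetical stand-ins; the real TSTTable and NumbersDocument layouts are not shown in this PR.

```swift
import Foundation

// Hypothetical stand-ins for the real model types.
struct SketchTable: Codable, Equatable {
    var name: String
    var rows: [[String]]
}

struct SketchSheet: Codable, Equatable {
    var name: String
    var tables: [SketchTable]
}

struct SketchDocument: Codable, Equatable {
    var sheets: [SketchSheet]
}

/// Encodes the document as pretty-printed JSON with stable key order.
func encodeJSON(_ document: SketchDocument) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    let data = try encoder.encode(document)
    return String(decoding: data, as: UTF8.self)
}

let sample = SketchDocument(sheets: [
    SketchSheet(name: "Sheet 1", tables: [
        SketchTable(name: "Table 1", rows: [["Item", "Total"], ["Apples", "3"]])
    ])
])

do {
    print(try encodeJSON(sample))
} catch {
    print("encoding failed: \(error)")
}
```

Sorted keys keep the output stable across runs, which helps when the JSON is diffed or fed to a programmatic caller.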

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

PagesParser.tableGrid kept its own copy of the TST cell-grid decoder
(tile walk, decimal128, cell offsets), duplicating TSTTableReader. It now
resolves the attachment -> TableInfoArchive -> model chain and defers to
TSTTableReader, passing Pages' own rich-cell renderer (inline bold/italic
from char-style runs) and rich-cell alignment as closures. The
decimal128/frozen-value/cell-offset helpers are deleted from PagesParser
(-172 lines net); PagesCellValueTests now exercises them on TSTTableReader.

No behavior change: all 93 SwiftTextPages tests pass unchanged.
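
The closure hand-off described in this commit might look roughly like the following. It is a sketch under assumed types: `SketchCell`, `richID`, and `renderRow` are hypothetical and only show how an app-neutral reader can defer rich-cell rendering to its caller.

```swift
/// Hypothetical cell: plain text plus an optional reference to rich content.
struct SketchCell {
    var plain: String
    var richID: Int?
}

/// App-neutral row rendering. The caller supplies `renderRich`, so the shared
/// reader needs no knowledge of app-specific styling such as bold or italic runs.
func renderRow(_ cells: [SketchCell], renderRich: (Int) -> String?) -> [String] {
    cells.map { cell in
        if let id = cell.richID, let rich = renderRich(id) {
            return rich
        }
        return cell.plain  // fall back to the plain value
    }
}

// A Pages-like caller that renders rich cell 7 with inline bold.
let row = renderRow(
    [SketchCell(plain: "a", richID: nil), SketchCell(plain: "b", richID: 7)],
    renderRich: { id in id == 7 ? "**b**" : nil }
)
print(row)
```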

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a31e6edda


Comment thread Tests/SwiftTextNumbersTests/NumbersFileTests.swift Outdated
Comment thread Sources/SwiftTextIWA/TSTTableReader.swift Outdated
odrobnik and others added 2 commits June 18, 2026 12:52
Splits the iWork read core out of SwiftTextPages into a new SwiftTextIWA
library — Snappy, the wire-level Protobuf codec, the .iwa container/object
store (PagesContainer renamed to IWAContainer, with its own IWAContainerError),
and the shared TSTTableReader/TSTTable. SwiftTextPages and the new
SwiftTextNumbers target now both depend on it; Numbers no longer lives inside
the Pages target.

- The IWA read primitives are now public; the generated *.gen.swift models
  gain `import SwiftTextIWA` (GenerateIWAModels.swift updated to emit it, so
  regeneration stays in sync).
- ZIPFoundation moves with the container into SwiftTextIWA (still PAGES-gated).
- Numbers reader + tests move to SwiftTextNumbers / SwiftTextNumbersTests.

No behavior change: all 430 tests pass; `swifttext numbers` output unchanged.
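
For readers unfamiliar with what a wire-level Protobuf codec does, here is a minimal sketch of the two core primitives in the standard Protobuf encoding: base-128 varints and field tags. `readVarint` and `readTag` are hypothetical names; this is not the SwiftTextIWA implementation.

```swift
/// Decodes one base-128 varint starting at `index`, advancing `index`.
/// Returns nil on truncated input or a varint longer than 10 bytes.
func readVarint(_ bytes: [UInt8], at index: inout Int) -> UInt64? {
    var result: UInt64 = 0
    var shift: UInt64 = 0
    while index < bytes.count, shift < 70 {
        let byte = bytes[index]
        index += 1
        result |= UInt64(byte & 0x7F) << shift   // low 7 bits carry payload
        if byte & 0x80 == 0 { return result }    // high bit clear: last byte
        shift += 7
    }
    return nil
}

/// Decodes a field tag: the key varint packs (fieldNumber << 3) | wireType.
func readTag(_ bytes: [UInt8], at index: inout Int) -> (field: UInt64, wireType: UInt8)? {
    guard let key = readVarint(bytes, at: &index) else { return nil }
    return (field: key >> 3, wireType: UInt8(key & 0x7))
}

// Field 1, wire type 0 (varint), value 150.
let message: [UInt8] = [0x08, 0x96, 0x01]
var cursor = 0
if let tag = readTag(message, at: &cursor), let value = readVarint(message, at: &cursor) {
    print(tag.field, tag.wireType, value)  // prints "1 0 150"
}
```

Decoding at this level needs no generated schema, which is why the same primitives can serve formats whose message definitions are only partly known.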

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Reads an Apple Keynote deck's slides — title, body bullets, presenter notes —
reusing the SwiftTextIWA core. Keynote ships no schema and no KN.* models are
vendored, so navigation is structural via IWAReferenceScanner:

  DocumentArchive(1) -> ShowArchive(2) -> ThemeArchive(10) owns the theme's
  layout (master) slides; the deck's slides are the SlideNodeArchive(4) the
  theme doesn't own. Each slide's PlaceholderArchive(7)/ShapeInfoArchive(2011)
  children carry the text (StorageArchive 2001); NoteArchive(15) carries notes.

Theme-exclusion cleanly separates the deck's slides from the ~17 layout slides
a theme ships. Empty/object-replacement (U+FFFC) placeholders are dropped.
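
Theme-exclusion amounts to a set difference over object identifiers. The sketch below assumes the slide-node IDs and the theme-owned IDs have already been collected; `deckSlides` is a hypothetical helper, not the KeynoteParser code.

```swift
/// Deck slides are every slide node in the document minus the ones the theme owns.
/// Identifiers stand in for IWA object IDs; the result keeps discovery order.
func deckSlides(allSlideNodes: [UInt64], themeOwned: Set<UInt64>) -> [UInt64] {
    allSlideNodes.filter { !themeOwned.contains($0) }
}

// Five slide nodes discovered; the theme owns three layout slides.
let discovered: [UInt64] = [100, 101, 200, 102, 201]
let layouts: Set<UInt64> = [100, 101, 102]
print(deckSlides(allSlideNodes: discovered, themeOwned: layouts))  // prints "[200, 201]"
```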

- New SwiftTextKeynote target + SwiftTextKeynoteTests (clean Sample.key fixture).
- `swifttext keynote` CLI subcommand (-m/--markdown, --json, default text).
- IWAReferenceScanner lifted from SwiftTextPages into SwiftTextIWA (generic IWA
  utility; now shared by the object graph, Numbers, and Keynote).

Verified: 430 tests pass; both real .key decks parse to 21 slides each.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
odrobnik changed the title from "Numbers (.numbers) reader: tables as Markdown/HTML/JSON/TSV" to "iWork readers: Numbers + Keynote on a shared SwiftTextIWA core" Jun 18, 2026
odrobnik and others added 3 commits June 18, 2026 13:24
- TSTTableReader: decode every TileStorage tile, not just the first. A table
  taller than tileSize (~256 rows) spans multiple tiles; each tile's rows are
  offset by tileid * tileSize. The reader now flattens all tiles' row infos
  with their base offset, so cells past the first tile are no longer dropped.
  Regression-tested with a 300-row fixture (MultiTile.numbers): A260/A300 live
  in the second tile and now read back correctly.
- Convert the new Numbers and Keynote tests from XCTest to Swift Testing
  (@Suite/@Test/#expect/#require), per AGENTS.md ("Use Swift Testing
  exclusively").

All 441 tests pass.
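
The multi-tile fix comes down to adding a per-tile base offset of `tileid * tileSize` to each row index. A minimal sketch, assuming a simplified tile shape; `SketchTile` and `flattenRows` are hypothetical names for illustration.

```swift
/// Hypothetical tile: an ID plus rows addressed relative to the tile.
struct SketchTile {
    var tileID: Int
    var rows: [(rowInTile: Int, cells: [String])]
}

/// Flattens all tiles into absolute row indices.
/// Each tile's rows are offset by tileID * tileSize.
func flattenRows(tiles: [SketchTile], tileSize: Int) -> [Int: [String]] {
    var rowsByIndex: [Int: [String]] = [:]
    for tile in tiles {
        let base = tile.tileID * tileSize
        for row in tile.rows {
            rowsByIndex[base + row.rowInTile] = row.cells
        }
    }
    return rowsByIndex
}

// With tileSize 256, spreadsheet row 260 (zero-based index 259) is row 3 of tile 1.
let tiles = [
    SketchTile(tileID: 0, rows: [(rowInTile: 0, cells: ["A1"])]),
    SketchTile(tileID: 1, rows: [(rowInTile: 3, cells: ["A260"])]),
]
let flattened = flattenRows(tiles: tiles, tileSize: 256)
print(flattened[259] ?? [])  // prints "["A260"]"
```

Reading only the first tile is equivalent to ignoring every entry whose base offset is non-zero, which matches the dropped-cells symptom described above.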

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Older .numbers files store a single index.xml (the APXL-era schema) instead
of the modern .iwa package. NumbersParser now falls back to a new
NumbersLegacyParser when no Index/*.iwa is present — mirroring how
PagesParser falls back to PagesLegacyParser.
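
The fallback decision can be sketched as a check over the package's entry paths. This assumes the entry list has already been read from the archive or bundle; `SketchFormat` and `detectFormat` are hypothetical, and the exact legacy file name check is an assumption based on the description above.

```swift
enum SketchFormat {
    case modernIWA   // at least one Index/*.iwa entry
    case legacyXML   // a single index.xml
    case unknown
}

/// Picks a reader based on which entries the package contains.
func detectFormat(entryPaths: [String]) -> SketchFormat {
    let hasIWA = entryPaths.contains { path in
        path.hasPrefix("Index/") && path.hasSuffix(".iwa")
    }
    if hasIWA { return .modernIWA }
    if entryPaths.contains("index.xml") { return .legacyXML }
    return .unknown
}

print(detectFormat(entryPaths: ["Index/Document.iwa", "Metadata/Properties.plist"]))
print(detectFormat(entryPaths: ["index.xml", "QuickLook/Thumbnail.jpg"]))
```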

NumbersLegacyParser is an XMLParser delegate over `<sf:tabular-model>` ->
`<sf:grid>` -> `<sf:datasource>`: a row-major cell stream of `<sf:t>` text
(`<sf:ct sfa:s>`), `<sf:g>` empties, and `<sf:f>` formulas (`<sf:fo sf:fs>`).
It produces the same TSTTable model as the modern reader, so all
Markdown/HTML/JSON/TSV rendering is reused unchanged.
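
Turning a row-major cell stream into a grid could look like the sketch below. It assumes the column count is known from the grid's metadata and that the XML has already been reduced to a flat list of cells; `LegacyCell` and `grid(from:columnCount:)` are hypothetical, not the parser in this commit.

```swift
/// Hypothetical reduction of the legacy cell elements.
enum LegacyCell: Equatable {
    case text(String)      // from <sf:t>
    case empty             // from <sf:g>
    case formula(String)   // from <sf:f>, carrying the formula's source text
}

/// Splits a row-major cell stream into rows of `columnCount` cells.
/// A short final row is padded with empty strings.
func grid(from stream: [LegacyCell], columnCount: Int) -> [[String]] {
    guard columnCount > 0 else { return [] }
    var rows: [[String]] = []
    for start in stride(from: 0, to: stream.count, by: columnCount) {
        let end = min(start + columnCount, stream.count)
        var row = stream[start..<end].map { cell -> String in
            switch cell {
            case .text(let value): return value
            case .empty: return ""
            case .formula(let source): return source
            }
        }
        row += Array(repeating: "", count: columnCount - row.count)
        rows.append(row)
    }
    return rows
}

let stream: [LegacyCell] = [.text("Term"), .text("Meaning"), .text("IWA"), .empty, .formula("=A1")]
print(grid(from: stream, columnCount: 2))
```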

Verified against a real '09 file (a 4x198 glossary) plus a committed minimal
fixture (LegacySample.numbers). A '09 formula cell surfaces its source text
(the format keeps no readable cached result); numeric cells are best-effort.

442 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Lifts the '09 `<sf:tabular-model>` table extractor into SwiftTextIWA
(IWALegacyTableReader), along with TSTTable.trimmedToUsedRange(), so the
Numbers and Pages legacy readers share one decoder — the '09 table schema
is identical across iWork apps.
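
Cropping a table to its used range might be sketched as follows. The assumption here is that "used range" means dropping trailing empty rows and columns while keeping leading ones; `sketchTrimmedToUsedRange` is a hypothetical free function, not the actual `TSTTable.trimmedToUsedRange()`.

```swift
/// A cell counts as used when it holds any text.
func isUsed(_ cell: String) -> Bool { !cell.isEmpty }

/// Drops trailing rows and columns that contain no used cell.
/// Assumption: leading empty rows and columns are preserved.
func sketchTrimmedToUsedRange(_ rows: [[String]]) -> [[String]] {
    guard let lastRow = rows.lastIndex(where: { row in row.contains(where: isUsed) }) else {
        return []  // nothing used at all
    }
    let kept = Array(rows[...lastRow])
    // Right-most used column across the kept rows.
    let lastColumn = kept.compactMap { row in row.lastIndex(where: isUsed) }.max() ?? 0
    return kept.map { row in
        (0...lastColumn).map { column in column < row.count ? row[column] : "" }
    }
}

let padded = [["a", "", ""], ["", "b", ""], ["", "", ""]]
print(sketchTrimmedToUsedRange(padded))
```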

- PagesLegacyParser now extracts legacy tables too (previously body text
  only), appending each (cropped to its used range) as a table-bearing
  paragraph after the body, so `swifttext pages` surfaces tables from '09
  .pages files.
- NumbersParser's legacy branch uses the shared reader; the now-redundant
  NumbersLegacyParser and its unused error case are removed.

Verified: a legacy .pages with an <sf:tabular-model> renders its table to
Markdown; the real '09 .numbers glossary still reads unchanged. 443 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
odrobnik merged commit ed39572 into main Jun 18, 2026
6 checks passed
odrobnik deleted the claude/numbers-reader branch June 18, 2026 19:12