Skip to content

Improve DOCX render fidelity#993

Open
aldrinjenson wants to merge 15 commits into
eigenpal:mainfrom
aldrinjenson:codex/docx-render-fidelity
Open

Improve DOCX render fidelity#993
aldrinjenson wants to merge 15 commits into
eigenpal:mainfrom
aldrinjenson:codex/docx-render-fidelity

Conversation

@aldrinjenson

@aldrinjenson aldrinjenson commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Improve cached diagram label placement by using the authored text transform and preserving text insets/color metadata.
  • Render section-specific headers and footers per page so active navigation labels follow the page section.
  • Keep paragraph-relative footer floats anchored to the footer band instead of shifting them into body content.
  • Add focused regressions for diagram label geometry, footer float anchoring, and section-aware header rendering.

Validation

  • bun test packages/core/src/layout-painter/tests/section-header-rendering.test.ts packages/core/src/docx/tests/diagram-drawing.test.ts packages/core/src/layout-painter/tests/header-footer-spacing.test.ts packages/core/src/layout-painter/tests/hf-float-textbox-no-advance.test.ts packages/core/src/docx/tests/tiff-preview.test.ts packages/core/src/layout-bridge/tests/toFlowBlocks-image-sizing.test.ts packages/core/src/layout-engine/tests/anchored-textbox-pagebreak.test.ts packages/core/src/layout-painter/tests/floating-image-layer-zorder.test.ts
  • bun run --filter '@eigenpal/docx-editor-core' typecheck
  • bun run --filter '@eigenpal/docx-editor-react' typecheck
  • bun run --filter '@eigenpal/docx-editor-vue' typecheck
  • Pre-commit hook: workspace typecheck, parity gates, API snapshot check, Prettier, ESLint
  • Local visual comparison against a Word-rendered baseline

@vercel

vercel Bot commented Jun 22, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docx-editor Ready Ready Preview, Comment Jun 23, 2026 2:49pm

Request Review

@eigenpal-release-pal

eigenpal-release-pal Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

All contributors have signed the CLA ✍️ ✅

Posted by the CLA bot.

@greptile-apps

greptile-apps Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR improves DOCX render fidelity across three areas: multi-column layout, image parsing, and browser-safe image painting. It also adds regression test coverage for the fixed scenarios.

  • Multi-column layout: Paragraph and table fragments now use paginator.columnWidth instead of the pre-captured contentWidth, and table justification offsets are computed from result.x (returned post-addFragment) rather than the column x-coordinate captured before the paginator may have advanced columns, fixing incorrect fragment placement in multi-column sections.
  • Image parsing: parseInline/parseAnchor return null when no blip rId is found, skipping non-picture DrawingML shapes cleanly. The previously unused _media parameter in parseTextBoxContent is now forwarded to the paragraph parser, fixing media resolution for images nested inside text-box paragraphs.
  • Browser-safe painting: A new applyImageSourceForBrowser function centralises the browser-format check across all six image render sites (inline, block, floating, header/footer, watermark, and floating layer). TIFF, EMF, and WMF data-URLs are now hidden (visibility: hidden) instead of producing broken-image icons, while the model data is preserved for round-trip/export. Note: a changeset file (bun changeset) covering @eigenpal/docx-editor-core is required per CLAUDE.md for any code PR that touches production sources — none was added in this branch.

Confidence Score: 4/5

Safe to merge after adding a changeset file; the layout and painter changes are well-tested, and the refactoring is mechanically consistent across all six image render sites.

The three independent fix areas each have matching synthetic test coverage and the changes are narrowly scoped. The missing changeset file is the only blocker — without it the Release workflow will produce an incorrect CHANGELOG entry or fail changeset status in CI. The two inline comments are quality observations that do not affect correctness.

packages/core/src/layout-painter/renderImage.ts (new exported API surface) and packages/core/src/layout-engine/index.ts (column-layout geometry change) are the most impactful files and warrant a careful read. All other changed files are straightforward call-site updates or tests.

Important Files Changed

Filename Overview
packages/core/src/layout-painter/renderImage.ts New applyImageSourceForBrowser centralises the browser-render guard; TIFF/EMF/WMF data-URLs are hidden instead of producing broken-image icons. Minor semantic issue: undefined src incorrectly sets data-unsupported-image.
packages/core/src/layout-engine/index.ts Multi-column fixes: paragraph/table fragments now use paginator.columnWidth instead of the captured content width, and table justification is computed from result.x (post-addFragment) for correctness. Minor: fragment.width is initialised then immediately overwritten.
packages/core/src/docx/imageParser.ts parseInline/parseAnchor now return null when no blip rId is found, cleanly skipping non-picture DrawingML shapes; parseDrawing already returned `null
packages/core/src/docx/textBoxParser.ts Previously unused _media parameter is now named media and forwarded to the paragraph parser, fixing image resolution for pictures nested inside text-box paragraphs.
packages/core/src/layout-painter/renderWatermark.ts Replaces direct img.src = wm.dataUrl with applyImageSourceForBrowser, adding the browser-format guard to the watermark paint path that previously had no validation.
packages/core/src/layout-painter/renderPage/headerFooter.ts Floating header/footer images now go through applyImageSourceForBrowser instead of direct src assignment; straightforward defensive fix.
packages/core/src/layout-painter/floatingImageLayer.ts Same browser-format guard applied to the floating image layer renderer; one-line change.
packages/core/src/layout-painter/renderParagraph/runs.ts applyImageSourceForBrowser replaces direct img.src assignment in both renderInlineImageRun and renderBlockImage; logic is otherwise unchanged.
packages/core/src/layout-engine/integration/section-breaks.test.ts New integration tests assert that paragraph fragments in two-column sections use columnWidth and that table fragments receive the correct post-addFragment column x-coordinate.
docs/render-fidelity-qa.md New QA checklist for render-fidelity investigations; documentation-only, no code risk.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant P as DOCX Parser
    participant IP as imageParser
    participant TB as textBoxParser
    participant LE as layoutEngine
    participant RI as renderImage
    participant DOM as DOM

    P->>IP: parseImage(drawingEl, rels, media)
    IP->>IP: parseAnchor / parseInline
    alt no blip rId (non-picture shape)
        IP-->>P: null (skip)
    else picture found
        IP->>IP: resolveImageData(rId, rels, media)
        IP-->>P: "Image { rId, src, ... }"
    end

    P->>TB: parseTextBoxContent(..., rels, media)
    Note over TB: media now forwarded to parseParagraph
    TB->>IP: parseImage (nested picture)
    IP-->>TB: "Image { src: resolved dataUrl }"

    P->>LE: layoutDocument(blocks, measures, options)
    LE->>LE: layoutParagraph / layoutTable
    Note over LE: width = paginator.columnWidth
    Note over LE: table x = result.x + justification
    LE-->>P: "Layout { pages, fragments }"

    LE->>RI: renderImageFragment / renderFloatingImagesLayer
    RI->>RI: applyImageSourceForBrowser(img, src)
    alt src is data:image/tiff or EMF/WMF
        RI->>DOM: removeAttribute(src), visibility:hidden
    else src is data:image/png, blob:, https:
        RI->>DOM: "img.src = src"
    end
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant P as DOCX Parser
    participant IP as imageParser
    participant TB as textBoxParser
    participant LE as layoutEngine
    participant RI as renderImage
    participant DOM as DOM

    P->>IP: parseImage(drawingEl, rels, media)
    IP->>IP: parseAnchor / parseInline
    alt no blip rId (non-picture shape)
        IP-->>P: null (skip)
    else picture found
        IP->>IP: resolveImageData(rId, rels, media)
        IP-->>P: "Image { rId, src, ... }"
    end

    P->>TB: parseTextBoxContent(..., rels, media)
    Note over TB: media now forwarded to parseParagraph
    TB->>IP: parseImage (nested picture)
    IP-->>TB: "Image { src: resolved dataUrl }"

    P->>LE: layoutDocument(blocks, measures, options)
    LE->>LE: layoutParagraph / layoutTable
    Note over LE: width = paginator.columnWidth
    Note over LE: table x = result.x + justification
    LE-->>P: "Layout { pages, fragments }"

    LE->>RI: renderImageFragment / renderFloatingImagesLayer
    RI->>RI: applyImageSourceForBrowser(img, src)
    alt src is data:image/tiff or EMF/WMF
        RI->>DOM: removeAttribute(src), visibility:hidden
    else src is data:image/png, blob:, https:
        RI->>DOM: "img.src = src"
    end
Loading

Comments Outside Diff (1)

  1. packages/core/src/layout-engine/index.ts, line 381-397 (link)

    P2 fragment.width initialised then unconditionally overwritten

    In the empty-paragraph path the fragment is created with width: paginator.columnWidth and then fragment.width = paginator.columnWidth is written again immediately after addFragment. The post-addFragment write is the authoritative one (in case ensureFits advanced to a page with different geometry), so the constructor value is always dead. Setting width: 0 (or any placeholder) at construction and relying solely on the post-addFragment assignment would remove the ambiguity and match how x and y are handled. The same pattern recurs in the non-empty paragraph loop at line 458/475.

Reviews (1): Last reviewed commit: "Improve DOCX render fidelity" | Re-trigger Greptile

Comment thread packages/core/src/layout-painter/renderImage.ts
@jedrazb

jedrazb commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

LGTM overall — render-fidelity logic is sound, typecheck clean, tests green, React/Vue parity holds.

One security finding to fix before merge:

HIGH — zero-click DoS in the new TIFF decoder (packages/core/src/docx/tiffPreview.ts:137). concatenateStrips allocates new Uint8Array(total) where total is summed straight from the file's StripByteCounts with no upper clamp, and the per-strip bounds check happens after the allocation. A ~138-byte crafted TIFF with StripByteCounts = 0x40000000 forces a 1 GiB allocation (near 2^32 → RangeError that propagates uncaught through buildMediaMap and aborts document open). Reproduced locally.

Fix: clamp total against bytes.length (strip data can't exceed the file) before allocating, add a width*height sanity cap before the BMP allocation (line 164), and wrap the decode in try/catch → return undefined. Per CLAUDE.md: never feed a file-supplied number straight into allocation.

stripByteCounts: number[]
): Uint8Array {
const total = stripByteCounts.reduce((sum, count) => sum + Math.max(0, count), 0);
const out = new Uint8Array(total);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security (HIGH): unbounded allocation from attacker-controlled bytes.

total is the raw sum of the file's StripByteCounts (each a u32, no upper clamp), and the per-strip bounds check at line 142 runs after this allocation. A ~138-byte crafted TIFF with StripByteCounts = 0x40000000 forces new Uint8Array(1 GiB); values near 2^32 throw RangeError that propagates uncaught through buildMediaMap (parser.ts:403/452, no try/catch) and aborts document open. Zero-click DoS on untrusted input. Reproduced locally.

Suggest clamping before allocating, e.g.:

const total = stripByteCounts.reduce((sum, count) => sum + Math.max(0, count), 0);
if (total > bytes.length) return new Uint8Array(0); // strip data can't exceed the source file
const out = new Uint8Array(total);

Also add a width*height sanity cap before the BMP allocation (line 164) and wrap the decode in try/catch → return undefined. Per CLAUDE.md: never feed a file-supplied number straight into allocation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed in 6e324d5a.

The TIFF preview path now validates/clamps strip byte counts before allocation, validates strip offsets/counts before copying, caps preview dimensions before BMP allocation, rejects invalid samples-per-pixel values, and wraps preview decoding so malformed TIFF input returns undefined instead of aborting document open. I also added regression tests for oversized strip byte counts, oversized dimensions, and malformed buffers.

Latest PR head is green for lint, test, build, CLA, and Vercel.

…elity

# Conflicts:
#	packages/core/src/docx/parser.ts
#	packages/core/src/prosemirror/conversion/toProseDoc/textbox.ts
@aldrinjenson

Copy link
Copy Markdown
Contributor Author

Addressed the remaining review feedback on this PR:

  • Hardened TIFF preview decoding in 6e324d5a with pre-allocation strip validation, preview dimension caps, invalid sample-count rejection, and safe malformed-input handling.
  • Added TIFF regression coverage for oversized strip byte counts, oversized dimensions, and malformed input.
  • Confirmed the earlier bot notes are now covered: changeset is present, missing image sources are no longer marked as unsupported images, and paragraph fragments use placeholder width before post-layout geometry assignment.
  • Regenerated the generated API reports in 094be4c6 after merging main, which fixed the build check drift.

Latest PR head is green for lint, test, build, CLA, and Vercel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants