Skip to content

fix: support legacy form checkboxes#998

Open
xcrong wants to merge 1 commit into
eigenpal:mainfrom
xcrong:fix/legacy-form-checkbox
Open

fix: support legacy form checkboxes#998
xcrong wants to merge 1 commit into
eigenpal:mainfrom
xcrong:fix/legacy-form-checkbox

Conversation

@xcrong

@xcrong xcrong commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Render legacy Word FORMCHECKBOX fields when their cached field result is empty
  • Preserve legacy checkbox w:ffData through DOCX parse, ProseMirror conversion, and serialization
  • Treat bare <w:checked/> as checked for legacy checkbox fields

Fixes #972.

Validation

  • Manually verified the issue attachment form.docx renders the checkbox ( see the image)
  • bun test packages/core/src/docx/paragraphParser.test.ts packages/core/src/prosemirror/conversion/__tests__/legacy-form-checkbox.test.ts
  • bun run typecheck
  • bun run api:check
image

View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.

@vercel

vercel Bot commented Jun 23, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docx-editor Ready Ready Preview, Comment Jun 23, 2026 10:46am

Request Review

@eigenpal-release-pal

eigenpal-release-pal Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

All contributors have signed the CLA ✍️ ✅

Posted by the CLA bot.

@greptile-apps

greptile-apps Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds support for legacy Word FORMCHECKBOX fields by parsing w:ffData/w:checkBox from the begin fldChar, synthesizing a checked () or unchecked () glyph when the cached field result is empty, and round-tripping the raw ffDataXml through ProseMirror node attributes and back into the DOCX serializer.

  • Parse: runParser.ts extracts formCheckboxChecked and ffDataXml from w:ffData/w:checkBox; paragraphParser/content.ts promotes these onto the ComplexField at finalization, guarded by an instruction check.
  • Render: toProseDoc/runs.ts synthesizes a Unicode glyph only when the cached field result is genuinely empty, avoiding conflicts with documents that already have a rendered result.
  • Serialize: serializeComplexField embeds ffDataXml directly inside the w:fldChar begin element to preserve the original form metadata on save.

Confidence Score: 3/5

The core parse/render/serialize logic for legacy FORMCHECKBOX fields is correct, but the raw w:ffData XML string travels through an HTML DOM attribute and is later written directly into DOCX XML output without re-validation, opening a clipboard-paste XML injection path into the saved file.

The ffDataXml round-trip through FieldExtension.toDOM → HTML attribute → parseDOM → serializeComplexField template interpolation is the key concern. In the normal open/edit/save cycle the value originates from elementToXml and is trustworthy, but the parseDOM code path — used when ProseMirror reconstructs its document from HTML (clipboard paste, drag-and-drop) — makes formCheckboxFfDataXml attacker-controlled. A crafted paste could inject arbitrary XML into the DOCX on save.

packages/core/src/docx/serializer/paragraphSerializer/content.ts (interpolation site) and packages/core/src/prosemirror/extensions/nodes/FieldExtension.ts (parseDOM read-back) need attention before merging.

Security Review

  • XML injection on DOCX save (paragraphSerializer/content.ts line 180, FieldExtension.ts parseDOM): ffDataXml is stored verbatim in a data-form-checkbox-ff-data-xml DOM attribute via toDOM and read back without validation in parseDOM. When ProseMirror parses HTML from clipboard paste, this attribute is attacker-controlled. A crafted paste with a malicious data-form-checkbox-ff-data-xml value would be interpolated directly into the DOCX XML output, enabling XML injection into the saved file.

Important Files Changed

Filename Overview
packages/core/src/docx/runParser.ts Parses w:ffData/w:checkBox from the begin fldChar to extract checked state and raw XML; logic for bare w:checked and w:default is correct.
packages/core/src/docx/paragraphParser/content.ts Propagates the parsed formCheckbox state from the begin fldChar into the finalized ComplexField correctly; local variable pattern avoids the pre-existing fldLock/dirty reset issue.
packages/core/src/docx/serializer/paragraphSerializer/content.ts Directly interpolates ffDataXml into DOCX XML output; safe for the normal parse path but susceptible to XML injection if the value arrived via parseDOM from untrusted clipboard HTML.
packages/core/src/prosemirror/extensions/nodes/FieldExtension.ts Stores raw ffDataXml in a DOM data-attribute via toDOM and reads it back verbatim in parseDOM; the DOM round-trip is safe for HTML encoding but introduces an untrusted-input path that feeds into XML serialization.
packages/core/src/prosemirror/conversion/toProseDoc/runs.ts Correctly synthesizes checked/unchecked glyphs only when the cached field result is empty; formCheckbox metadata is forwarded to PM node attrs.
packages/core/src/prosemirror/conversion/fromProseDoc/runs.ts Restores formCheckbox from PM node attrs without validation; the formCheckboxFfDataXml value is trusted from the parse path but untrusted from the parseDOM (clipboard) path.
packages/core/src/types/content/run.ts Adds ffDataXml and formCheckboxChecked fields to FieldCharContent; clean type additions with accurate JSDoc.
packages/core/src/docx/fieldParser.ts Adds formCheckbox to ComplexFieldContext and correctly resets/propagates it through resetComplexFieldContext and finalizeComplexField.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[DOCX XML\nw:fldChar begin\n+ w:ffData] --> B[runParser.ts\nparseFieldChar]
    B --> C[FieldCharContent\nffDataXml\nformCheckboxChecked]
    C --> D[paragraphParser/content.ts\nparseParagraphContents]
    D --> E[ComplexField\nformCheckbox.checked\nformCheckbox.ffDataXml]
    E --> F[toProseDoc/runs.ts\nconvertField]
    F --> G{displayText empty\n& result empty?}
    G -- yes --> H[Synthesize glyph\n☐ or ☒]
    G -- no --> I[Use cached result text]
    H --> J[PM field node\nformCheckboxChecked\nformCheckboxFfDataXml]
    I --> J
    J --> K[FieldExtension toDOM\ndata-form-checkbox-*\nattributes]
    K --> L[HTML DOM]
    L --> M[FieldExtension parseDOM\nel.dataset.*]
    M --> N[fromProseDoc/runs.ts\ncreateFieldFromNode]
    N --> O[ComplexField\nformCheckbox restored]
    O --> P[serializeComplexField\nffDataXml interpolated into XML]
    P --> Q[DOCX output]

    style P fill:#f99,stroke:#c33
    style M fill:#fda,stroke:#c80
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[DOCX XML\nw:fldChar begin\n+ w:ffData] --> B[runParser.ts\nparseFieldChar]
    B --> C[FieldCharContent\nffDataXml\nformCheckboxChecked]
    C --> D[paragraphParser/content.ts\nparseParagraphContents]
    D --> E[ComplexField\nformCheckbox.checked\nformCheckbox.ffDataXml]
    E --> F[toProseDoc/runs.ts\nconvertField]
    F --> G{displayText empty\n& result empty?}
    G -- yes --> H[Synthesize glyph\n☐ or ☒]
    G -- no --> I[Use cached result text]
    H --> J[PM field node\nformCheckboxChecked\nformCheckboxFfDataXml]
    I --> J
    J --> K[FieldExtension toDOM\ndata-form-checkbox-*\nattributes]
    K --> L[HTML DOM]
    L --> M[FieldExtension parseDOM\nel.dataset.*]
    M --> N[fromProseDoc/runs.ts\ncreateFieldFromNode]
    N --> O[ComplexField\nformCheckbox restored]
    O --> P[serializeComplexField\nffDataXml interpolated into XML]
    P --> Q[DOCX output]

    style P fill:#f99,stroke:#c33
    style M fill:#fda,stroke:#c80
Loading

Reviews (1): Last reviewed commit: "fix: support legacy form checkboxes" | Re-trigger Greptile

Comment on lines +178 to +183
const ffDataXml = field.formCheckbox?.ffDataXml ?? '';
if (ffDataXml) {
parts.push(`<w:r>${rPrXml}<w:fldChar ${beginAttrs.join(' ')}>${ffDataXml}</w:fldChar></w:r>`);
} else {
parts.push(`<w:r>${rPrXml}<w:fldChar ${beginAttrs.join(' ')}/></w:r>`);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 security XML injection via untrusted ffDataXml from clipboard paste

ffDataXml is stored verbatim in a DOM data-form-checkbox-ff-data-xml attribute (see FieldExtension.ts toDOM) and read back via el.dataset.formCheckboxFfDataXml in parseDOM. When a ProseMirror document is re-built from HTML (e.g., clipboard paste), this attribute is attacker-controlled: a pasted field node carrying data-form-checkbox-ff-data-xml="</w:fldChar></w:r><w:r><w:t>injected</w:t></w:r><w:r><w:fldChar w:fldCharType='begin'><w:ffData>" would inject arbitrary XML into the saved DOCX. CLAUDE.md explicitly calls this out: "XML injection on save — escape every attacker-derived string written back into XML on serialize/round-trip (escapeXml); never template a raw value into output markup." The fix is to re-parse ffDataXml through parseXmlDocument / findChild on the way out, verifying it is a well-formed w:ffData element, rather than trusting the raw string from node attributes.

Comment on lines +531 to +541
if (ffData && checkBox) {
const checked = findChild(checkBox, 'w', 'checked');
const defaultChecked = findChild(checkBox, 'w', 'default');
const val = checked
? (getAttribute(checked, 'w', 'val') ?? '1')
: defaultChecked
? getAttribute(defaultChecked, 'w', 'val')
: null;
fieldChar.ffDataXml = elementToXml(ffData);
fieldChar.formCheckboxChecked = val == null ? false : !/^(0|false|off)$/i.test(val);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 ffDataXml set but formCheckboxChecked not set when checkBox is absent

If a FORMCHECKBOX field carries w:ffData that has no w:checkBox child (e.g., a FORMTEXT or FORMDROPDOWN field that also carries ffData), ffDataXml is never set and formCheckboxChecked is never set, so the guard in paragraphParser/content.ts (formCheckboxChecked != null) correctly prevents a non-checkbox field from getting formCheckbox metadata. The field-type check at finalization (instruction.trim().toUpperCase() === 'FORMCHECKBOX') provides a second safety net — both guards are working together correctly. This is a note to confirm the intent is clear: the outer if (ffData && checkBox) guard intentionally skips non-checkbox ffData forms.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] Checkbox is not rendered/supported

1 participant