fix: support legacy form checkboxes#998
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
All contributors have signed the CLA ✍️ ✅ Posted by the CLA bot. |
Greptile SummaryThis PR adds support for legacy Word
Confidence Score: 3/5The core parse/render/serialize logic for legacy FORMCHECKBOX fields is correct, but the raw w:ffData XML string travels through an HTML DOM attribute and is later written directly into DOCX XML output without re-validation, opening a clipboard-paste XML injection path into the saved file. The ffDataXml round-trip through FieldExtension.toDOM → HTML attribute → parseDOM → serializeComplexField template interpolation is the key concern. In the normal open/edit/save cycle the value originates from elementToXml and is trustworthy, but the parseDOM code path — used when ProseMirror reconstructs its document from HTML (clipboard paste, drag-and-drop) — makes formCheckboxFfDataXml attacker-controlled. A crafted paste could inject arbitrary XML into the DOCX on save. packages/core/src/docx/serializer/paragraphSerializer/content.ts (interpolation site) and packages/core/src/prosemirror/extensions/nodes/FieldExtension.ts (parseDOM read-back) need attention before merging.
|
| Filename | Overview |
|---|---|
| packages/core/src/docx/runParser.ts | Parses w:ffData/w:checkBox from the begin fldChar to extract checked state and raw XML; logic for bare w:checked and w:default is correct. |
| packages/core/src/docx/paragraphParser/content.ts | Propagates the parsed formCheckbox state from the begin fldChar into the finalized ComplexField correctly; local variable pattern avoids the pre-existing fldLock/dirty reset issue. |
| packages/core/src/docx/serializer/paragraphSerializer/content.ts | Directly interpolates ffDataXml into DOCX XML output; safe for the normal parse path but susceptible to XML injection if the value arrived via parseDOM from untrusted clipboard HTML. |
| packages/core/src/prosemirror/extensions/nodes/FieldExtension.ts | Stores raw ffDataXml in a DOM data-attribute via toDOM and reads it back verbatim in parseDOM; the DOM round-trip is safe for HTML encoding but introduces an untrusted-input path that feeds into XML serialization. |
| packages/core/src/prosemirror/conversion/toProseDoc/runs.ts | Correctly synthesizes checked/unchecked glyphs only when the cached field result is empty; formCheckbox metadata is forwarded to PM node attrs. |
| packages/core/src/prosemirror/conversion/fromProseDoc/runs.ts | Restores formCheckbox from PM node attrs without validation; the formCheckboxFfDataXml value is trusted from the parse path but untrusted from the parseDOM (clipboard) path. |
| packages/core/src/types/content/run.ts | Adds ffDataXml and formCheckboxChecked fields to FieldCharContent; clean type additions with accurate JSDoc. |
| packages/core/src/docx/fieldParser.ts | Adds formCheckbox to ComplexFieldContext and correctly resets/propagates it through resetComplexFieldContext and finalizeComplexField. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[DOCX XML\nw:fldChar begin\n+ w:ffData] --> B[runParser.ts\nparseFieldChar]
B --> C[FieldCharContent\nffDataXml\nformCheckboxChecked]
C --> D[paragraphParser/content.ts\nparseParagraphContents]
D --> E[ComplexField\nformCheckbox.checked\nformCheckbox.ffDataXml]
E --> F[toProseDoc/runs.ts\nconvertField]
F --> G{displayText empty\n& result empty?}
G -- yes --> H[Synthesize glyph\n☐ or ☒]
G -- no --> I[Use cached result text]
H --> J[PM field node\nformCheckboxChecked\nformCheckboxFfDataXml]
I --> J
J --> K[FieldExtension toDOM\ndata-form-checkbox-*\nattributes]
K --> L[HTML DOM]
L --> M[FieldExtension parseDOM\nel.dataset.*]
M --> N[fromProseDoc/runs.ts\ncreateFieldFromNode]
N --> O[ComplexField\nformCheckbox restored]
O --> P[serializeComplexField\nffDataXml interpolated into XML]
P --> Q[DOCX output]
style P fill:#f99,stroke:#c33
style M fill:#fda,stroke:#c80
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[DOCX XML\nw:fldChar begin\n+ w:ffData] --> B[runParser.ts\nparseFieldChar]
B --> C[FieldCharContent\nffDataXml\nformCheckboxChecked]
C --> D[paragraphParser/content.ts\nparseParagraphContents]
D --> E[ComplexField\nformCheckbox.checked\nformCheckbox.ffDataXml]
E --> F[toProseDoc/runs.ts\nconvertField]
F --> G{displayText empty\n& result empty?}
G -- yes --> H[Synthesize glyph\n☐ or ☒]
G -- no --> I[Use cached result text]
H --> J[PM field node\nformCheckboxChecked\nformCheckboxFfDataXml]
I --> J
J --> K[FieldExtension toDOM\ndata-form-checkbox-*\nattributes]
K --> L[HTML DOM]
L --> M[FieldExtension parseDOM\nel.dataset.*]
M --> N[fromProseDoc/runs.ts\ncreateFieldFromNode]
N --> O[ComplexField\nformCheckbox restored]
O --> P[serializeComplexField\nffDataXml interpolated into XML]
P --> Q[DOCX output]
style P fill:#f99,stroke:#c33
style M fill:#fda,stroke:#c80
Reviews (1): Last reviewed commit: "fix: support legacy form checkboxes" | Re-trigger Greptile
| const ffDataXml = field.formCheckbox?.ffDataXml ?? ''; | ||
| if (ffDataXml) { | ||
| parts.push(`<w:r>${rPrXml}<w:fldChar ${beginAttrs.join(' ')}>${ffDataXml}</w:fldChar></w:r>`); | ||
| } else { | ||
| parts.push(`<w:r>${rPrXml}<w:fldChar ${beginAttrs.join(' ')}/></w:r>`); | ||
| } |
There was a problem hiding this comment.
XML injection via untrusted
ffDataXml from clipboard paste
ffDataXml is stored verbatim in a DOM data-form-checkbox-ff-data-xml attribute (see FieldExtension.ts toDOM) and read back via el.dataset.formCheckboxFfDataXml in parseDOM. When a ProseMirror document is re-built from HTML (e.g., clipboard paste), this attribute is attacker-controlled: a pasted field node carrying data-form-checkbox-ff-data-xml="</w:fldChar></w:r><w:r><w:t>injected</w:t></w:r><w:r><w:fldChar w:fldCharType='begin'><w:ffData>" would inject arbitrary XML into the saved DOCX. CLAUDE.md explicitly calls this out: "XML injection on save — escape every attacker-derived string written back into XML on serialize/round-trip (escapeXml); never template a raw value into output markup." The fix is to re-parse ffDataXml through parseXmlDocument / findChild on the way out, verifying it is a well-formed w:ffData element, rather than trusting the raw string from node attributes.
| if (ffData && checkBox) { | ||
| const checked = findChild(checkBox, 'w', 'checked'); | ||
| const defaultChecked = findChild(checkBox, 'w', 'default'); | ||
| const val = checked | ||
| ? (getAttribute(checked, 'w', 'val') ?? '1') | ||
| : defaultChecked | ||
| ? getAttribute(defaultChecked, 'w', 'val') | ||
| : null; | ||
| fieldChar.ffDataXml = elementToXml(ffData); | ||
| fieldChar.formCheckboxChecked = val == null ? false : !/^(0|false|off)$/i.test(val); | ||
| } |
There was a problem hiding this comment.
ffDataXml set but formCheckboxChecked not set when checkBox is absent
If a FORMCHECKBOX field carries w:ffData that has no w:checkBox child (e.g., a FORMTEXT or FORMDROPDOWN field that also carries ffData), ffDataXml is never set and formCheckboxChecked is never set, so the guard in paragraphParser/content.ts (formCheckboxChecked != null) correctly prevents a non-checkbox field from getting formCheckbox metadata. The field-type check at finalization (instruction.trim().toUpperCase() === 'FORMCHECKBOX') provides a second safety net — both guards are working together correctly. This is a note to confirm the intent is clear: the outer if (ffData && checkBox) guard intentionally skips non-checkbox ffData forms.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
fb925e1 to
d91fa30
Compare
Summary
FORMCHECKBOXfields when their cached field result is emptyw:ffDatathrough DOCX parse, ProseMirror conversion, and serialization<w:checked/>as checked for legacy checkbox fieldsFixes #972.
Validation
form.docxrenders the checkbox ( see the image)bun test packages/core/src/docx/paragraphParser.test.ts packages/core/src/prosemirror/conversion/__tests__/legacy-form-checkbox.test.tsbun run typecheckbun run api:checkNeed help on this PR? Tag
/codesmithwith what you need. Autofix is disabled.