feat: add configurable furniture_mode options to markdown serializer#666

Open
riyo264 wants to merge 6 commits into docling-project:main from riyo264:feature/furniture-serialization-options

riyo264 commented Jun 29, 2026

Description

Context

Fixes #665

Following up on the architectural discussion during the MS Word backend update in the main docling repository, this PR moves the configuration and handling of document furniture (headers and footers) to the serialization layer in docling-core.

Currently, items on the FURNITURE layer are omitted by default during Markdown serialization so that content repeated on every page does not clutter the output. This PR introduces a configurable parameter that lets users opt in to headers and footers, which is useful for documents whose margin metadata (such as document versions, classification markers, or template notices) carries important context.
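To illustrate the current default, here is a stdlib-only sketch of layer-based filtering. `ContentLayer`, `Item`, and `serialize_markdown` are simplified, hypothetical stand-ins written for this example, not docling-core's actual classes; the only point is that items outside the active layer set never reach the output.

```python
from dataclasses import dataclass
from enum import Enum


class ContentLayer(str, Enum):
    # Simplified stand-in for the two layers discussed here; not imported from docling-core.
    BODY = "body"
    FURNITURE = "furniture"


@dataclass
class Item:
    # Hypothetical flat document item: its text plus the layer it lives on.
    text: str
    layer: ContentLayer


def serialize_markdown(items, layers=frozenset({ContentLayer.BODY})):
    """Join the text of items on an active layer; items on other layers are skipped."""
    return "\n\n".join(item.text for item in items if item.layer in layers)


doc = [
    Item("CONFIDENTIAL - Draft v2", ContentLayer.FURNITURE),  # page header
    Item("# Quarterly report", ContentLayer.BODY),
    Item("Revenue grew in Q2.", ContentLayer.BODY),
    Item("Page 1 of 2", ContentLayer.FURNITURE),  # page footer
]

# Default layer set: the header and footer never reach the output.
print(serialize_markdown(doc))
# Opting in to FURNITURE brings the margin metadata back.
print(serialize_markdown(doc, layers={ContentLayer.BODY, ContentLayer.FURNITURE}))
```

With only BODY active, the classification marker in the header is lost along with the page counter, which is the trade-off the new parameter is meant to address.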

Changes Introduced

  • Added FurnitureMode enum: Defines three serialization modes:
    • none (default): Keeps the existing behavior by dropping all furniture elements.
    • all: Renders every header and footer each time it appears, including the repeats on every page.
    • distinct: Deduplicates by text, serializing each unique header or footer once, at its first occurrence.
  • Updated MarkdownParams: Added the furniture_mode field and a Pydantic @model_validator(mode="after") that adds ContentLayer.FURNITURE to the active processing layers whenever furniture_mode is all or distinct.
  • Intercepted MarkdownDocSerializer.serialize: Added per-instance state (self._seen_furniture) on the document serializer that records furniture text already emitted, so distinct mode can skip repeats during the tree-walk traversal.
  • Test coverage: Added unit tests in test/test_serialization.py (test_md_furniture_modes) that check the output of all three modes.
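A minimal, stdlib-only sketch of how the three modes could interact, assuming a flat list of items rather than docling-core's document tree. The FurnitureMode values and the `_seen_furniture` name follow the description above; `MarkdownParamsSketch`, `MarkdownSerializerSketch`, and `Item` are hypothetical stand-ins, and a dataclass `__post_init__` substitutes for the Pydantic after-validator so the example runs without dependencies.

```python
from dataclasses import dataclass, field
from enum import Enum


class ContentLayer(str, Enum):
    # Simplified stand-in for docling-core's ContentLayer.
    BODY = "body"
    FURNITURE = "furniture"


class FurnitureMode(str, Enum):
    NONE = "none"          # drop all furniture (existing default)
    ALL = "all"            # emit every header/footer occurrence
    DISTINCT = "distinct"  # emit each unique header/footer text once


@dataclass
class Item:
    # Hypothetical flat item; the real serializer walks a document tree.
    text: str
    layer: ContentLayer


@dataclass
class MarkdownParamsSketch:
    # Hypothetical stand-in for MarkdownParams. __post_init__ plays the role
    # of the Pydantic @model_validator(mode="after") described in the PR.
    furniture_mode: FurnitureMode = FurnitureMode.NONE
    layers: set[ContentLayer] = field(default_factory=lambda: {ContentLayer.BODY})

    def __post_init__(self):
        if self.furniture_mode is not FurnitureMode.NONE:
            # Build a new set so a caller-supplied set is not mutated.
            self.layers = self.layers | {ContentLayer.FURNITURE}


class MarkdownSerializerSketch:
    def __init__(self, params: MarkdownParamsSketch):
        self.params = params
        # Furniture text already emitted; only consulted in DISTINCT mode.
        self._seen_furniture: set[str] = set()

    def _keep(self, item: Item) -> bool:
        if item.layer not in self.params.layers:
            return False
        if (item.layer is ContentLayer.FURNITURE
                and self.params.furniture_mode is FurnitureMode.DISTINCT):
            key = item.text.strip()
            if key in self._seen_furniture:
                return False
            self._seen_furniture.add(key)
        return True

    def serialize(self, items: list[Item]) -> str:
        # Reset per call so repeated serializations give the same result.
        self._seen_furniture.clear()
        return "\n\n".join(item.text for item in items if self._keep(item))


pages = [
    Item("Draft v2", ContentLayer.FURNITURE),  # header on page 1
    Item("Intro text.", ContentLayer.BODY),
    Item("Draft v2", ContentLayer.FURNITURE),  # same header on page 2
    Item("More text.", ContentLayer.BODY),
]

for mode in FurnitureMode:
    out = MarkdownSerializerSketch(MarkdownParamsSketch(furniture_mode=mode)).serialize(pages)
    print(f"{mode.value}: {out!r}")
```

Because deduplication in this sketch keys on stripped text, footers that differ only in a page number ("Page 1", "Page 2") count as distinct and are all emitted; the sketch makes no attempt to normalize such variants.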

Verification & Testing

Ran the serialization test suite locally using Python 3.14:

pytest test/test_serialization.py

github-actions Bot commented Jun 29, 2026

DCO Check Passed

Thanks @riyo264, all your commits are properly signed off. 🎉

mergify Bot commented Jun 29, 2026

Merge Protections

🔴 1 of 2 protections blocking · waiting on 👀 reviews

Protection | Waiting on
--- | ---
🔴 Require two reviewer for test updates | 👀 reviews
🟢 Enforce conventional commit |

🔴 Require two reviewer for test updates

Waiting for

  • #approved-reviews-by >= 2
This rule is failing.

When test data is updated, we require two reviewers

🟢 Enforce conventional commit

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
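Because the conventional-commit condition is an ordinary regular expression, a title can be checked locally before pushing. The Python sketch that follows applies the same pattern to the PR's original and current titles, plus one hypothetical scoped title. Using `re.search` assumes Mergify's `~=` operator performs a regex search; since the pattern starts with `^`, the result is the same as an anchored match.

```python
import re

# Title condition from the "Enforce conventional commit" protection above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)

titles = [
    "Add configurable furniture_mode options to markdown serializer",        # original title
    "feat: add configurable furniture_mode options to markdown serializer",  # current title
    "feat(serializer)!: add furniture_mode",  # hypothetical: scope plus breaking-change marker
]

for title in titles:
    # The pattern is anchored with ^, so search() only matches at the start.
    print(bool(CONVENTIONAL_TITLE.search(title)), title)
```

The original title fails only because it lacks a lowercase type prefix, which is why the title change to "feat: ..." satisfies the rule.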

riyo264 added 3 commits June 29, 2026 09:52
I, riyo264 <supriyodhani50@gmail.com>, hereby add my Signed-off-by to this commit: 31aaaeb
I, Supriyo <138874454+riyo264@users.noreply.github.com>, hereby add my Signed-off-by to this commit: 70f578a

Signed-off-by: riyo264 <supriyodhani50@gmail.com>
riyo264 changed the title from "Add configurable furniture_mode options to markdown serializer" to "feat: add configurable furniture_mode options to markdown serializer" on Jun 29, 2026
Signed-off-by: riyo264 <supriyodhani50@gmail.com>

Development

Successfully merging this pull request may close these issues.

[Feature Request] Add configurable serialization options for headers and footers in Markdown output