
feat(document): add change_type field to TextItem #667

Open
a-huk wants to merge 1 commit into docling-project:main from a-huk:feat/text-item-change-type

a-huk commented Jun 29, 2026

Adds an optional change_type field to TextItem with values 'inserted' or 'deleted' (None means normal/final text). Also threads the parameter through DoclingDocument.add_text() so backends can set it at creation time.

Needed by the docling DOCX backend to represent tracked changes in 'raw' mode without hijacking underline/strikethrough formatting.
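
A minimal sketch of the shape this description implies, using plain dataclasses as stand-ins. `TextItemSketch` and `DocumentSketch` are hypothetical names and do not reproduce the real docling-core models or the actual `add_text()` signature; only the `change_type` values and the `None` default come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional

# Allowed values per the PR description; None means normal/final text.
ChangeType = Literal["inserted", "deleted"]


@dataclass
class TextItemSketch:
    """Hypothetical stand-in for TextItem (not the real docling-core model)."""

    text: str
    change_type: Optional[ChangeType] = None


@dataclass
class DocumentSketch:
    """Hypothetical stand-in for DoclingDocument."""

    texts: List[TextItemSketch] = field(default_factory=list)

    def add_text(
        self, text: str, change_type: Optional[ChangeType] = None
    ) -> TextItemSketch:
        # Thread the parameter through so a backend can set it at creation time.
        item = TextItemSketch(text=text, change_type=change_type)
        self.texts.append(item)
        return item


doc = DocumentSketch()
kept = doc.add_text("unchanged paragraph")
added = doc.add_text("new sentence", change_type="inserted")
removed = doc.add_text("old sentence", change_type="deleted")
```

Because the field defaults to `None`, existing callers that never pass `change_type` keep their current behavior.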

@github-actions

Contributor

DCO Check Failed

Hi @a-huk, your pull request has failed the Developer Certificate of Origin (DCO) check.

This repository supports remediation commits, so you can fix this without rewriting history — but you must follow the required message format.


🛠 Quick Fix: Add a remediation commit

Run this command:

git commit --allow-empty -s -m "DCO Remediation Commit for a-huk <huk.adam.g@gmail.com>

I, a-huk <huk.adam.g@gmail.com>, hereby add my Signed-off-by to this commit: 224777f61b0b6851d5552251fd0c8e9e055d0a66"
git push

🔧 Advanced: Sign off each commit directly

For the latest commit:

git commit --amend --signoff
git push --force-with-lease

For multiple commits:

git rebase --signoff origin/main
git push --force-with-lease

More info: DCO check report

mergify Bot commented Jun 29, 2026

Contributor

Merge Protections

🟢 Merge protection satisfied — ready to merge.

🟢 Enforce conventional commit

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
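
To see why this PR's title satisfies the rule, the pattern can be tried locally. This is a rough sketch that assumes the `~=` condition behaves like a regular-expression search; `is_conventional` is a hypothetical helper, not part of Mergify.

```python
import re

# Pattern copied verbatim from the merge-protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


ok = is_conventional("feat(document): add change_type field to TextItem")
bad = is_conventional("Add change_type field to TextItem")
```

The scope in parentheses and the `!` breaking-change marker are both optional, so a title such as `fix!: ...` also passes.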

ceberam left a comment

Member

Thanks @a-huk, this is a good starting draft, but several things need attention:

  • The golden YAML file test/data/docling_document/unit/TextItem.yaml does not include change_type, so the test fails with:
Left contains 1 more item: {'change_type': None}
  • change_type is silently dropped for labels that are dispatched to helper methods.
    Inside add_text(), labels such as TITLE, LIST_ITEM, or CODE are forwarded to specialized helpers (add_title, add_list_item, add_heading, etc.), none of which accepts or forwards change_type, so the parameter is silently ignored. We may want to track changes on those items too.
  • Following up on the previous bullet: what if a contributor deletes an entire table from the .docx document? It would be better to define change_type as a field of DocItem.
  • The PR adds no test verifying that a TextItem constructed with change_type="inserted" round-trips correctly (serialize → deserialize), nor that add_text() propagates the value onto the returned item. A single parametrized test would suffice.
  • It would be good to add support for change_type in at least one serializer, to show the intent of the new field and how inserted or deleted text should be rendered compared to normal text. The Markdown or HTML serializers would be excellent candidates.
  • I have doubts about the name of the new field. change_type is generic and ambiguous, and change and type are both overloaded words in any codebase. It also encodes only the kind of change, not the fact that this is a tracked-change (revision) concept. What about tracked_change, with insertion and deletion as its values?
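
The points above can be illustrated with a rough sketch. Everything here is a hypothetical stand-in built on plain dataclasses: the `*Sketch` classes, the simplified `add_text`/`add_title` signatures, and the `to_html` rendering with `<ins>`/`<del>` are assumptions for discussion, not docling-core code.

```python
from dataclasses import asdict, dataclass
from html import escape
from typing import Literal, Optional

ChangeType = Literal["inserted", "deleted"]


@dataclass
class DocItemSketch:
    """Hypothetical base item: the field lives here so every subclass inherits it."""

    change_type: Optional[ChangeType] = None


@dataclass
class TextItemSketch(DocItemSketch):
    text: str = ""


@dataclass
class TableItemSketch(DocItemSketch):
    num_rows: int = 0


def add_title(text: str, change_type: Optional[ChangeType] = None) -> TextItemSketch:
    # Specialized helper: it must accept the parameter, or the value is lost.
    return TextItemSketch(text=text, change_type=change_type)


def add_text(
    label: str, text: str, change_type: Optional[ChangeType] = None
) -> TextItemSketch:
    if label == "title":
        # Forward explicitly instead of silently dropping the value.
        return add_title(text, change_type=change_type)
    return TextItemSketch(text=text, change_type=change_type)


def to_html(item: TextItemSketch) -> str:
    # One possible rendering (an assumption): HTML's <ins> and <del> elements.
    tag = {"inserted": "ins", "deleted": "del"}.get(item.change_type)
    body = escape(item.text)
    return f"<{tag}>{body}</{tag}>" if tag else body


title = add_text("title", "New heading", change_type="inserted")
restored = TextItemSketch(**asdict(title))  # serialize -> deserialize
dropped_table = TableItemSketch(num_rows=3, change_type="deleted")
```

Placing the field on the base class is what lets a deleted table carry the same marker as a deleted paragraph, and the `restored == title` comparison is the kind of check a parametrized round-trip test could make for each value.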
