As it is now, anchors extracted from documents end up in a flat namespace, while they usually exist in a tree-shaped namespace. This tree is described by the header level of the anchor, or, for anchors other than the headers themselves, the level of the header the anchor sits under, together with all the super-headers of that header.
This is at least the case with Markdown and HTML, but probably also most other document formats.
example markdown document (doc.md):
# Top
## First Sub
bla bla bla
### A Sub Sub
bli bli bli
## Second Sub
blu blu blu
### B Sub Sub
tri tra tralala
<a name="in-text"/>
flat extraction:
doc.md#top
doc.md#first-sub
doc.md#a-sub-sub
doc.md#second-sub
doc.md#b-sub-sub
doc.md#in-text
structured extraction:
doc.md#
\ top
  \ first-sub
    \ a-sub-sub
  \ second-sub
    \ b-sub-sub
      \ in-text
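
A minimal sketch of how such a structured extraction could work for Markdown, under a few assumptions: only ATX headers (`#` to `######`) and inline `<a name="...">`/`<a id="...">` tags are recognized, fenced code blocks are not skipped, and the slug rule is a simplified GitHub-like one. The names `slugify` and `extract_anchor_paths` are hypothetical and not part of any existing tool.

```python
import re

# ATX headers ("# Title") and inline HTML anchors (<a name="..."> or <a id="...">).
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
ANCHOR_RE = re.compile(r'<a\s+(?:name|id)="([^"]+)"')


def slugify(title):
    """Simplified slug rule (an assumption): lowercase, drop punctuation, spaces to hyphens."""
    slug = re.sub(r"[^\w\- ]", "", title.strip().lower())
    return slug.replace(" ", "-")


def extract_anchor_paths(markdown):
    """Return one path per anchor: a tuple of slugs from the top-level header down to the anchor."""
    stack = []  # (level, slug) of the headers enclosing the current line
    paths = []
    for line in markdown.splitlines():
        header = HEADER_RE.match(line)
        if header:
            level = len(header.group(1))
            # Headers at the same or a deeper level are no longer ancestors.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, slugify(header.group(2))))
            paths.append(tuple(slug for _, slug in stack))
            continue
        # A non-header anchor becomes a child of the innermost enclosing header.
        for name in ANCHOR_RE.findall(line):
            paths.append(tuple(slug for _, slug in stack) + (name,))
    return paths


DOC = """\
# Top
## First Sub
bla bla bla
### A Sub Sub
bli bli bli
## Second Sub
blu blu blu
### B Sub Sub
tri tra tralala
<a name="in-text"/>
"""

for path in extract_anchor_paths(DOC):
    # Indent each anchor by its depth in the tree.
    print("  " * (len(path) - 1) + "\\ " + path[-1])
```

The flat list is still recoverable as the last element of each path, so a structured extraction of this kind would be a superset of the current one.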
Why
This is useful when analyzing changes in documents. For example, if a title has been renamed but the overall structure has stayed the same, one might be able to generate an auto-fix for a broken link that includes a fragment (which is meant to map to an anchor).
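
As a rough illustration of that auto-fix idea, the sketch below assumes anchors are available as paths (tuples of slugs from the top-level header down to the anchor, in document order) for both the old and the new version of a document. The function name `suggest_fix` and the matching heuristic (same position in a tree of identical shape) are assumptions made for this example, not an existing implementation.

```python
def suggest_fix(broken, old_paths, new_paths):
    """Suggest a replacement fragment for `broken`, or None if no safe guess exists.

    old_paths / new_paths: anchor paths of the old and new document version,
    in document order, each a tuple of slugs ending in the anchor itself.
    """
    # Locate the broken fragment in the old version of the document.
    old_index = next(
        (i for i, path in enumerate(old_paths) if path[-1] == broken), None
    )
    if old_index is None:
        return None
    # Only trust positional matching when the tree shape is unchanged,
    # i.e. both versions have the same sequence of nesting depths.
    if [len(p) for p in old_paths] != [len(p) for p in new_paths]:
        return None
    return new_paths[old_index][-1]


old = [("top",), ("top", "first-sub"), ("top", "first-sub", "a-sub-sub")]
new = [("top",), ("top", "getting-started"), ("top", "getting-started", "a-sub-sub")]

# A link to doc.md#first-sub no longer resolves; the anchor at the same
# position in the unchanged structure is the likely rename target.
print(suggest_fix("first-sub", old, new))
```

With a flat extraction, the only signal left would be the order of anchors, which cannot tell a rename apart from a section that was removed while another was added at a different level.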