Fix empty-title wikilinks being incorrectly parsed as Wikilink nodes #362
gaoflow wants to merge 3 commits into
MediaWiki does not treat [[]] or [[|...]] as wikilinks because their title is empty. Calls to filter_wikilinks() on text containing these sequences returned false positives. The fix is in builder.py (pure Python, runs for both the C and Python tokenizers): when a WikilinkClose token is encountered and the resulting title is an empty string, the node is demoted to a plain Text node containing the original bracket sequence instead of being wrapped in a Wikilink. Fixes earwig#292.
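The demotion step described above can be modeled with a small, self-contained sketch. This is a toy, not mwparserfromhell's builder. The token classes (WikilinkOpen, WikilinkSeparator, WikilinkClose, TextToken), the node classes (Text, Wikilink), and the helpers build and assemble_wikilink are hypothetical simplifications. They are only loosely modeled on the token names mentioned in this PR. When the closing token arrives and the collected title is empty, the sketch returns a Text node that holds the original bracket sequence instead of a Wikilink.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Union


# Hypothetical token types, loosely modeled on the token names in this PR.
@dataclass
class WikilinkOpen:
    pass


@dataclass
class WikilinkSeparator:
    pass


@dataclass
class WikilinkClose:
    pass


@dataclass
class TextToken:
    text: str


# Hypothetical node types produced by the toy builder.
@dataclass
class Text:
    value: str


@dataclass
class Wikilink:
    title: str
    text: Optional[str] = None


Token = Union[WikilinkOpen, WikilinkSeparator, WikilinkClose, TextToken]
Node = Union[Text, Wikilink]


def assemble_wikilink(tokens: Iterator[Token]) -> Node:
    """Consume tokens up to WikilinkClose; return a Wikilink or a demoted Text."""
    title, text = "", None
    for tok in tokens:
        if isinstance(tok, WikilinkSeparator):
            # The first separator starts the link text; later ones are literal pipes.
            text = "" if text is None else text + "|"
        elif isinstance(tok, TextToken):
            if text is None:
                title += tok.text
            else:
                text += tok.text
        elif isinstance(tok, WikilinkClose):
            if title == "":
                # Empty title: rebuild the original bracket sequence as plain text.
                return Text("[[" + ("" if text is None else "|" + text) + "]]")
            return Wikilink(title, text)
    raise ValueError("unterminated wikilink")


def build(tokens: List[Token]) -> List[Node]:
    """Turn a flat token list into nodes (toy input: only text and wikilinks)."""
    nodes: List[Node] = []
    stream = iter(tokens)
    for tok in stream:
        if isinstance(tok, TextToken):
            nodes.append(Text(tok.text))
        elif isinstance(tok, WikilinkOpen):
            # assemble_wikilink consumes the shared iterator up to the close token.
            nodes.append(assemble_wikilink(stream))
    return nodes
```

For example, `build([WikilinkOpen(), WikilinkSeparator(), TextToken("foo"), WikilinkClose()])` returns `[Text(value='[[|foo]]')]`, while the token stream for `[[Foo|bar]]` still produces a Wikilink. The sketch leaves the token stream itself untouched, so in this model the tokenizer tests would not need to change.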
I could imagine a tool based on mwparserfromhell that goes through the parsed nodes and emits a warning (or, in some cases, a quickfix) for empty-title wikilinks. Treating them as plain text directly in the parser would make that impossible.
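A tool like the one the reviewer describes relies on empty-title links surviving parsing as Wikilink nodes. The following minimal sketch uses hypothetical Text and Wikilink dataclasses, not mwparserfromhell's classes. With the real library, a comparable check would iterate over `Wikicode.filter_wikilinks()` and inspect each link's `title`. The function name lint_empty_titles and the warning format are also assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Union


# Hypothetical node types: empty-title links stay as Wikilink nodes here.
@dataclass
class Text:
    value: str


@dataclass
class Wikilink:
    title: str
    text: Optional[str] = None

    def source(self) -> str:
        """Reconstruct the raw wikitext for this link."""
        return "[[" + self.title + ("" if self.text is None else "|" + self.text) + "]]"


def lint_empty_titles(nodes: Iterable[Union[Text, Wikilink]]) -> Iterator[str]:
    """Yield a warning for every Wikilink node whose title is the empty string."""
    offset = 0  # character offset of the current node in the original text
    for node in nodes:
        raw = node.source() if isinstance(node, Wikilink) else node.value
        if isinstance(node, Wikilink) and node.title == "":
            yield f"{offset}: empty-title wikilink {raw!r} is rendered as plain text by MediaWiki"
        offset += len(raw)
```

Because the parser keeps these nodes, the linter can report their positions and could offer a quickfix. That would not be possible if they had already been flattened into Text.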
Thanks, that makes sense. I pushed
The parser now keeps the
Local verification:
It does not make sense to have
You're right. I pushed
The empty-title check now lives in the shared
I also extended the regression test to assert that these all agree for empty-title cases:
Local verification:
Problem
[[]] and [[|...]] are treated as Wikilink nodes by the parser, causing filter_wikilinks() to return false positives for sequences that MediaWiki itself does not render as links.
In MediaWiki, a wikilink with an empty title ([[]], [[|foo]], etc.) is not treated as an internal link; it is left as literal text. See issue #292.
Fix
The fix lives entirely in builder.py (pure Python), so it applies to both the C and Python tokenizer paths. When _handle_wikilink assembles a Wikilink node and the title resolves to an empty string, it demotes the node to a Text node containing the original bracket sequence.
This keeps the tokenizer tests unchanged (the tokenizers still emit WikilinkOpen/WikilinkClose tokens for these sequences) while correcting the parsed tree.
Tests
A new parametrized test, test_empty_title_wikilink_is_text in tests/test_parser.py, covers [[]], [[|]], [[|foo]], and [[|||]]. All 2010 existing tests continue to pass.
Fixes #292.
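The four regression inputs listed above can also be exercised without mwparserfromhell. The sketch below is not the PR's test_empty_title_wikilink_is_text. It uses stdlib unittest with subTest in place of pytest parametrization. It checks a hypothetical wikilink_title helper, which treats everything between the brackets before the first pipe as the title.

```python
import unittest


def wikilink_title(source: str) -> str:
    """Return the title part of a raw "[[...]]" sequence (hypothetical helper)."""
    if not (source.startswith("[[") and source.endswith("]]")):
        raise ValueError(f"not a wikilink sequence: {source!r}")
    inner = source[2:-2]
    # The title is everything before the first "|"; the rest is the link text.
    return inner.split("|", 1)[0]


class EmptyTitleWikilinkTest(unittest.TestCase):
    # The same inputs the PR's regression test covers.
    EMPTY_TITLE_CASES = ["[[]]", "[[|]]", "[[|foo]]", "[[|||]]"]

    def test_empty_title_cases(self):
        for source in self.EMPTY_TITLE_CASES:
            # subTest reports each failing input separately, like parametrization.
            with self.subTest(source=source):
                self.assertEqual(wikilink_title(source), "")

    def test_non_empty_title_is_kept(self):
        self.assertEqual(wikilink_title("[[Foo|bar]]"), "Foo")
```

Run it with `python -m unittest` on the module that contains it. Every input in the list has an empty title, so each one falls under the rule this PR is concerned with.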
This pull request was prepared with the assistance of AI, under my direction and review.