Fix empty-title wikilinks being incorrectly parsed as Wikilink nodes #362

Open
gaoflow wants to merge 3 commits into earwig:main from gaoflow:fix/empty-title-wikilink-292

gaoflow commented Jun 25, 2026

Problem

[[]] and [[|...]] are treated as Wikilink nodes by the parser, causing filter_wikilinks() to return false positives for sequences that MediaWiki itself does not render as hyperlinks.

import mwparserfromhell

mwparserfromhell.parse("[[]]").filter_wikilinks()   # returns ['[[]]'] — wrong
mwparserfromhell.parse("[[|]]").filter_wikilinks()  # returns ['[[|]]'] — wrong

In MediaWiki, a wikilink with an empty title ([[]], [[|foo]], etc.) is not treated as an internal link — it is left as literal text. See issue #292.

Fix

The fix lives entirely in builder.py (pure Python), so it applies to both the C and Python tokenizer paths. When _handle_wikilink assembles a Wikilink node and the title resolves to an empty string, it demotes the node to a Text node containing the original bracket sequence.

This keeps the tokenizer tests unchanged (the tokenizers still emit WikilinkOpen/WikilinkClose tokens for these sequences) while correcting the parsed tree.
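To make the demotion rule concrete, here is a minimal sketch of the idea. It is not the code in builder.py: the `Text` and `Wikilink` classes and the `build_wikilink` helper are hypothetical stand-ins, and treating only an exactly empty title (no whitespace stripping) as "empty" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical stand-ins for illustration only; these are NOT the
# mwparserfromhell node classes.
@dataclass
class Text:
    value: str


@dataclass
class Wikilink:
    title: str
    text: Optional[str] = None


def build_wikilink(title: str, text: Optional[str] = None) -> Union[Text, Wikilink]:
    """Return a Wikilink, or a Text node when the title is empty."""
    if title == "":  # assumption: only an exactly empty title is demoted
        # Rebuild the original bracket sequence so no source text is lost.
        suffix = "" if text is None else "|" + text
        return Text("[[" + title + suffix + "]]")
    return Wikilink(title, text)


print(build_wikilink("", "foo"))
print(build_wikilink("Main Page", "home"))
```

The point of the sketch is that the decision happens after tokenizing, so the token stream stays the same and only the resulting node type changes.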

Tests

A new parametrized test, test_empty_title_wikilink_is_text in tests/test_parser.py, covers [[]], [[|]], [[|foo]], and [[|||]]. All 2010 existing tests continue to pass.

Fixes #292.

This pull request was prepared with the assistance of AI, under my direction and review.

MediaWiki does not treat [[]] or [[|...]] as wikilinks because their
title is empty. Calls to filter_wikilinks() on text containing these
sequences returned false positives.

The fix is in builder.py (pure Python, runs for both the C and Python
tokenizers): when a WikilinkClose token is encountered and the resulting
title is an empty string, the node is demoted to a plain Text node
containing the original bracket sequence instead of being wrapped in a
Wikilink.

Fixes earwig#292.

@lahwaacz

Contributor

I could imagine a tool based on mwparserfromhell that goes through the parsed nodes and emits a warning (or a quickfix in some cases) for "empty-title wikilinks". Treating them as plain text directly in the parser would make that impossible.
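A rough sketch of the kind of tool described above, assuming the parser keeps these sequences as link nodes. `LinkNode` and `empty_title_warnings` are hypothetical names used only for illustration; a real tool would walk the nodes produced by mwparserfromhell instead of this stand-in class.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class LinkNode:
    """Hypothetical stand-in for a parsed wikilink node."""

    title: str
    raw: str  # the original source text of the link


def empty_title_warnings(nodes: Iterable[LinkNode]) -> Iterator[str]:
    """Yield one warning per link node whose title is an empty string."""
    for index, node in enumerate(nodes):
        if str(node.title) == "":
            yield f"node {index}: empty-title wikilink {node.raw!r}"


nodes = [LinkNode("Main Page", "[[Main Page]]"), LinkNode("", "[[|foo]]")]
for warning in empty_title_warnings(nodes):
    print(warning)
```

This only works if the empty-title sequences survive parsing as distinct nodes. Once they are folded into plain text, the linter has nothing to match on.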

gaoflow commented Jun 25, 2026

Author

Thanks, that makes sense. I pushed d8f9d4a to avoid demoting empty-title wikilinks to plain text.

The parser now keeps the Wikilink node, so tools can still find and warn/fix these cases via the generic node APIs. The false-positive fix is limited to filter_wikilinks() / ifilter_wikilinks(), which now skip wikilinks whose title string is empty.

Local verification:

uv run --with-editable . --with pytest python -m pytest tests/test_parser.py::test_empty_title_wikilink_filter_wikilinks tests/test_wikicode.py::test_filter_family -q
uv run --with-editable . --group dev python -m pytest tests/test_parser.py tests/test_wikicode.py -q
uv run --with-editable . --group dev python -m pytest tests -q
uv run --with ruff ruff check src/mwparserfromhell/wikicode.py src/mwparserfromhell/parser/builder.py tests/test_parser.py
uv run --with ruff ruff format --check src/mwparserfromhell/wikicode.py src/mwparserfromhell/parser/builder.py tests/test_parser.py
git diff --check

@lahwaacz

Contributor

It does not make sense for ifilter_wikilinks and ifilter with forcetype=Wikilink to behave differently. ifilter_wikilinks is supposed to be a shortcut; it should not be semantically different.

gaoflow commented Jun 26, 2026

Author

You're right. I pushed 1be7727 to remove that semantic split.

The empty-title check now lives in the shared ifilter(forcetype=Wikilink) path, and ifilter_wikilinks() / filter_wikilinks() are back to being direct shortcuts. The parsed Wikilink node is still preserved, so tools can still inspect the raw parsed nodes via the generic traversal; it just no longer appears in the wikilink-specific filtered results.

I also extended the regression test to assert that these all agree for empty-title cases:

  • filter_wikilinks()
  • filter(forcetype=Wikilink)
  • ifilter(forcetype=Wikilink)
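For readers following along, the agreement between these three calls can be sketched with a toy model. This is not the code in wikicode.py: `MiniWikicode`, `Text`, and `Wikilink` below are hypothetical stand-ins, and the real ifilter() takes more parameters than the single `forcetype` shown here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


# Hypothetical stand-ins; NOT the mwparserfromhell classes.
@dataclass
class Text:
    value: str


@dataclass
class Wikilink:
    title: str


class MiniWikicode:
    """Toy node container with a simplified filter family."""

    def __init__(self, nodes: Iterable[object]) -> None:
        self.nodes = list(nodes)

    def ifilter(self, forcetype: Optional[type] = None) -> Iterator[object]:
        for node in self.nodes:
            if forcetype is not None and not isinstance(node, forcetype):
                continue
            # The empty-title check sits in the shared path, so every
            # caller that asks for Wikilink nodes gets the same answer.
            if forcetype is Wikilink and node.title == "":
                continue
            yield node

    def filter(self, forcetype: Optional[type] = None) -> List[object]:
        return list(self.ifilter(forcetype))

    def ifilter_wikilinks(self) -> Iterator[object]:
        # Pure shortcut: no logic of its own.
        return self.ifilter(forcetype=Wikilink)

    def filter_wikilinks(self) -> List[object]:
        return list(self.ifilter_wikilinks())


code = MiniWikicode([Text("a"), Wikilink(""), Wikilink("Main Page")])
print(code.filter_wikilinks())  # only the link with a non-empty title
print(code.filter())  # generic traversal still sees all three nodes
```

Because the shortcuts only delegate, they cannot drift from the forcetype path, while an unfiltered traversal still exposes the empty-title node.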

Local verification:

uv run --with-editable . --with pytest python -m pytest tests/test_parser.py::test_empty_title_wikilink_filter_wikilinks tests/test_wikicode.py::test_filter_family -q
# 5 passed

uv run --with-editable . --group dev python -m pytest tests/test_parser.py tests/test_wikicode.py -q
# 31 passed

uv run --with-editable . --group dev python -m pytest tests -q
# 2010 passed, 1 skipped

uv run --with ruff ruff check src/mwparserfromhell/wikicode.py tests/test_parser.py
# All checks passed

uv run --with ruff ruff format --check src/mwparserfromhell/wikicode.py tests/test_parser.py
# 2 files already formatted

git diff --check
# passed

Development

Successfully merging this pull request may close these issues.

[[]] and [[|]] should not be treated as wikilinks

2 participants