mdparser

Native C CommonMark + GitHub Flavored Markdown parser for PHP. 15-30× faster than pure-PHP alternatives (Parsedown, cebe, michelf) with full CommonMark 0.31 compliance: 652/652 spec examples pass. GFM extensions: tables, strikethrough, task lists, autolinks, tagfilter. Installable via PIE (the PHP Foundation's PECL successor); ships as a single .so. PHP 8.3 minimum, OO API with final classes and readonly options.

📦 Install

# PIE (PHP Foundation's extension installer; uses the composer.json
# at the repo root with type: "php-ext")
pie install iliaal/mdparser

On a minimal PHP image (e.g. php:8.x-cli from Docker Hub), PIE needs a few build tools installed first:

# Debian/Ubuntu
sudo apt install -y git bison libtool-bin

# macOS
brew install bison libtool

From source

git clone https://github.com/iliaal/mdparser.git
cd mdparser
phpize && ./configure --enable-mdparser
make -j
sudo make install
# The conf.d path varies by distribution and SAPI; run `php --ini` to find yours.
echo 'extension=mdparser.so' | sudo tee /etc/php/conf.d/mdparser.ini

Windows binaries

Pre-built DLLs for PHP 8.3, 8.4, and 8.5 (TS/NTS, x86/x64) are attached to each GitHub release.
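
Whichever install route you take, a quick sanity check can confirm that PHP actually picked the extension up. The sketch below assumes the extension registers under the name "mdparser" (inferred from the mdparser.so file name above); describeMdparser() is a hypothetical helper, not part of the extension.

```php
<?php
declare(strict_types=1);

// Assumption: the extension's registered name is "mdparser",
// matching the mdparser.so file name used in the install steps.
function describeMdparser(): string
{
    if (!extension_loaded('mdparser')) {
        // `php --ini` lists the ini files and scan directories PHP really uses.
        return 'mdparser is not loaded; check `php --ini` for the scanned ini directory';
    }

    // phpversion() returns false when an extension reports no version string.
    $version = phpversion('mdparser') ?: 'unknown version';

    return 'mdparser ' . $version . ' is loaded';
}

echo describeMdparser(), PHP_EOL;
```

Run it with the same PHP binary (CLI or FPM) that will serve your application, since each SAPI can read a different ini directory.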

🛠️ Usage

use MdParser\Parser;
use MdParser\Options;

// Default parser: safe mode on, GFM extensions on.
$parser = new Parser();
echo $parser->toHtml('# Hello');
// <h1>Hello</h1>

// Custom options via named arguments. All fields readonly.
$parser = new Parser(new Options(
    smart: true,          // --- -> em dash, -- -> en dash, straight quotes -> curly quotes
    sourcepos: true,      // add data-sourcepos to every HTML element
    footnotes: true,      // enable [^ref] / [^ref]: syntax
    unsafe: false,        // raw HTML is still stripped (default)
));
echo $parser->toHtml($markdown);

// Three output formats from one parser.
$html = $parser->toHtml($markdown);
$xml  = $parser->toXml($markdown);   // CommonMark XML, DOCTYPE-wrapped
$ast  = $parser->toAst($markdown);   // nested arrays, see below

// AST shape is documented in tests/006_ast.phpt. Brief example:
// [
//   'type' => 'document',
//   'children' => [
//     ['type' => 'heading', 'level' => 1, 'children' => [
//        ['type' => 'text', 'literal' => 'Hello'],
//     ]],
//   ],
// ]
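
Because the AST is plain nested arrays, ordinary recursion is enough to work with it. Here is a minimal sketch that collects headings, assuming only the keys visible in the brief example above ('type', 'level', 'children', 'literal'). collectHeadings() and plainText() are hypothetical helpers, and the input is a hand-written stand-in for toAst() output so the snippet runs without the extension.

```php
<?php
declare(strict_types=1);

/**
 * Collect every heading in an AST array shaped like the example above.
 *
 * @return list<array{level: int, text: string}>
 */
function collectHeadings(array $node): array
{
    $found = [];
    if (($node['type'] ?? null) === 'heading') {
        $found[] = [
            'level' => (int) ($node['level'] ?? 0),
            'text'  => plainText($node),
        ];
    }
    foreach ($node['children'] ?? [] as $child) {
        array_push($found, ...collectHeadings($child));
    }

    return $found;
}

/** Concatenate every 'literal' beneath a node, depth-first. */
function plainText(array $node): string
{
    $text = (string) ($node['literal'] ?? '');
    foreach ($node['children'] ?? [] as $child) {
        $text .= plainText($child);
    }

    return $text;
}

// Hand-written stand-in for $parser->toAst('# Hello').
$ast = [
    'type' => 'document',
    'children' => [
        ['type' => 'heading', 'level' => 1, 'children' => [
            ['type' => 'text', 'literal' => 'Hello'],
        ]],
    ],
];

$headings = collectHeadings($ast);
print_r($headings);
```

tests/006_ast.phpt remains the reference for the full set of node types and fields; this sketch ignores everything it does not recognize.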

📊 Performance

Against the major pure-PHP Markdown libraries, on PHP 8.4 with each parser in its default configuration:

Parser	Small (200 B)	Medium (1.8 KB)	Large (200 KB)
mdparser	30447 ops/s	5697 ops/s	105 ops/s
Parsedown	1651 ops/s (18x slower)	325 ops/s (17x)	6 ops/s (17x)
cebe/markdown (GFM)	1350 ops/s (22x)	374 ops/s (15x)	6 ops/s (16x)
michelf (Markdown Extra)	1006 ops/s (30x)	209 ops/s (27x)	5 ops/s (19x)

15-30× faster across the board, from small messages to full 200 KB spec documents. See bench/README.md for methodology, corpora, caveats, league/commonmark notes, and how to reproduce these numbers yourself.
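
For a rough local comparison, an ops/s figure can be measured with a small timing loop. This is only a sketch and not the harness in bench/: opsPerSecond() is a hypothetical helper, the sample input is arbitrary, and htmlspecialchars() stands in as the workload when the extension is not loaded (the extension name "mdparser" is an assumption based on the .so file name).

```php
<?php
declare(strict_types=1);

/**
 * Call $fn repeatedly for roughly $seconds and return operations per second.
 */
function opsPerSecond(callable $fn, float $seconds = 1.0): float
{
    $fn(); // warm-up call, excluded from the measurement

    $start    = hrtime(true);                    // monotonic clock, nanoseconds
    $deadline = $start + (int) ($seconds * 1e9);
    $ops      = 0;
    do {
        $fn();
        $ops++;
        $now = hrtime(true);
    } while ($now < $deadline);

    return $ops / (($now - $start) / 1e9);
}

// Arbitrary small sample document, a couple of hundred bytes long.
$input = str_repeat("# Title\n\nSome *text* here.\n\n", 8);

if (extension_loaded('mdparser')) {
    $parser = new \MdParser\Parser();
    $render = fn (string $md): string => $parser->toHtml($md);
} else {
    // Stand-in workload so the sketch still runs without the extension.
    $render = fn (string $md): string => htmlspecialchars($md);
}

printf("%.0f ops/s\n", opsPerSecond(fn () => $render($input), 0.2));
```

Numbers from a loop like this depend heavily on the machine and PHP build, so treat bench/README.md as the authority on methodology.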

✨ Feature matrix

Comparison with the major pure-PHP Markdown libraries. "via ext" means the feature exists but requires opting in to a non-default extension; "Extra" means the feature ships in the library's Markdown Extra dialect, not its base mode; "✗" means the feature is not supported at all.

Feature	mdparser	Parsedown	league/cm core	cebe GFM	michelf Extra	Ciconia
CommonMark core	✓	partial	✓	partial	partial	partial
Fenced code blocks	✓	✓	✓	✓	✓	✓
GFM tables	✓	✓	via ext	✓	via Extra	✓
Strikethrough	✓	✓	via ext	✓	✗	✓
Task lists	✓	✗	via ext	✗	✗	✓
Autolinks (bare URL)	✓	✓	via ext	✓	✗	✓
`<script>` tag filter	✓ (tagfilter)	✓ (escaped)	via ext	partial	✗	✗
Smart punctuation	✓ (`Options::smart`)	✗	via ext	✗	✗	✗
Footnotes	✓ (`Options::footnotes`)	Extra	via ext	✗	✓ Extra	plugin
Hardbreaks/nobreaks	✓	✗	✗	✗	✗	✗
Sourcepos	✓	✗	✓	✗	✗	✗
Heading anchors	✓ (`Options::headingAnchors`)	✗	via ext	✗	✗	✗
`rel="nofollow"`	✓ (`Options::nofollowLinks`)	✗	via ext	✗	✗	✗
HTML output	✓	✓	✓	✓	✓	✓
XML output	✓	✗	✗	✗	✗	✗
AST output	✓ (arrays)	✗	✓ (objects)	✗	✗	✗

What we don't cover

mdparser is deliberately scoped to what cmark-gfm supports: CommonMark core plus the five GFM extensions. It does not cover the "Markdown Extra" family of features that Parsedown Extra, michelf Markdown Extra, and league/commonmark's optional extensions offer. If you need any of the following, reach for league/commonmark, the most actively maintained pure-PHP option for extended Markdown:

Definition lists (Term :: definition)
Abbreviations (*[HTML]: ...)
Attribute syntax ({.class #id key="val"})
Permalink anchor markup (we emit heading id slugs; we don't inject the inner <a class="anchor"> element GitHub uses for permalinks)
Table of contents
YAML front matter
Mentions (@user)
LaTeX math ($$...$$)
Emoji (:smile:)
Custom admonition containers (::: warning)

These are real features. They're just not in scope for a CommonMark+GFM core parser, and cmark-gfm doesn't implement them.

A note on `unsafe: true`

Options::unsafe = true tells cmark to pass raw HTML through verbatim instead of escaping or stripping it. The contract for this mode is that you own the input: it is yours, or it comes from a pipeline you trust. Two postprocess interactions are worth knowing if you also turn on headingAnchors or nofollowLinks:

Heading slug positioning under raw <hN>. mdparser locates each AST heading in the rendered HTML by rendering it standalone and matching its exact byte sequence. Raw <h1>x</h1> blocks written directly in the markdown source are therefore left untouched and do not consume slugs. The fingerprint search skips over HTML comments, CDATA sections, and raw-text / escapable-raw-text element bodies (script, style, title, textarea, iframe, noscript, xmp, noembed, noframes, plaintext), so a heading-shaped byte sequence inside those regions cannot hijack a slug. The narrow remaining exception is a raw <hN>...</hN> block in the document body that is byte-identical to the rendering of a later Markdown heading (same level, same inner text): the id attribute then lands on the first match, which is the raw block.
nofollowLinks is tag-aware. It rewrites every <a href="..."> it finds at a real tag-start position. The scan walks tag-by-tag with quote-aware attribute parsing, so anchor-shaped substrings inside another tag's quoted attribute value (e.g. <div title='<a href="x">y</a>'> written directly in the source) are passed through verbatim rather than rewritten. Raw-text element bodies and comment / CDATA bodies are likewise emitted verbatim. In-document fragment anchors (href="#...") are intentionally skipped, so footnote references and backrefs stay clean.
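
The first-match caveat for heading slugs is easier to see in a toy model. The sketch below is a deliberately simplified PHP illustration of "find the rendered heading bytes, attach the id to the first occurrence"; it is not the extension's C implementation, it skips none of the comment / CDATA / raw-text regions described above, and injectHeadingId() and the slug "intro" are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Attach id="$slug" to the first occurrence of $renderedHeading in $html.
 * $renderedHeading is expected to look like "<hN>...</hN>".
 */
function injectHeadingId(string $html, string $renderedHeading, string $slug): string
{
    $pos = strpos($html, $renderedHeading);
    if ($pos === false) {
        return $html; // fingerprint not found, leave the HTML alone
    }

    // "<hN" is always three bytes, so the attribute goes in at offset 3.
    $attribute = ' id="' . htmlspecialchars($slug, ENT_QUOTES, 'UTF-8') . '"';
    $withId    = substr_replace($renderedHeading, $attribute, 3, 0);

    return substr_replace($html, $withId, $pos, strlen($renderedHeading));
}

// The first <h1> stands for a raw HTML block passed through under unsafe mode,
// the second for the rendering of a Markdown "# Intro" heading.
$html = "<h1>Intro</h1>\n<p>text</p>\n<h1>Intro</h1>\n";

$result = injectHeadingId($html, '<h1>Intro</h1>', 'intro');
echo $result;
```

In this toy run the id ends up on the raw block because it comes first and is byte-identical to the Markdown heading's rendering. Giving the raw block different inner text, or any attribute of its own, is enough to avoid the collision.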

`toAst()` is unsanitized

Parser::toAst() returns a structural representation of the parsed document. Link / image url fields and html_block / html_inline literal fields are preserved byte-for-byte; the unsafe, tagfilter, and URL-scheme defenses do not apply to the AST. If you build HTML out of the AST yourself, you own the sanitization: apply a URL scheme allowlist before emitting href, and run HTML through a sanitizer before emitting raw html_block / html_inline literal text. See docs/ast.md for examples.
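
As one way to apply a URL scheme allowlist, the sketch below walks an AST array and blanks any url whose scheme is not explicitly allowed. It assumes only that URL-bearing nodes expose a 'url' key, as described above; the 'paragraph' and 'link' type names in the sample data, the allowlist itself, and the isSafeUrl() / sanitizeUrls() helpers are illustrative assumptions. docs/ast.md has the project's own examples.

```php
<?php
declare(strict_types=1);

/** Schemes allowed in emitted href/src attributes; adjust to your policy. */
const ALLOWED_SCHEMES = ['http', 'https', 'mailto'];

function isSafeUrl(string $url): bool
{
    // Drop ASCII whitespace and control bytes, which browsers tolerate
    // inside a scheme ("java\nscript:" still runs as javascript:).
    $clean = preg_replace('/[\x00-\x20]+/', '', $url) ?? '';

    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $clean, $m) === 1) {
        return in_array(strtolower($m[1]), ALLOWED_SCHEMES, true);
    }

    // No scheme at all: relative path, query, or #fragment.
    return true;
}

/** Return a copy of the AST with every disallowed 'url' replaced by ''. */
function sanitizeUrls(array $node): array
{
    if (isset($node['url']) && is_string($node['url']) && !isSafeUrl($node['url'])) {
        $node['url'] = '';
    }
    foreach ($node['children'] ?? [] as $i => $child) {
        $node['children'][$i] = sanitizeUrls($child);
    }

    return $node;
}

// Hand-written stand-in for toAst() output; node type names are assumptions.
$ast = [
    'type' => 'document',
    'children' => [
        ['type' => 'paragraph', 'children' => [
            ['type' => 'link', 'url' => 'javascript:alert(1)', 'children' => [
                ['type' => 'text', 'literal' => 'click me'],
            ]],
        ]],
    ],
];

$safe = sanitizeUrls($ast);
var_dump($safe['children'][0]['children'][0]['url']);
```

The allowlist only decides which URLs survive. When you write the value into an attribute you still need htmlspecialchars($url, ENT_QUOTES, 'UTF-8'), and html_block / html_inline literals need a real HTML sanitizer rather than a scheme check.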

🔗 PHP Performance Toolkit

Companion native PHP extensions for high-throughput PHP workloads:

php_excel: native Excel I/O. 7-10× faster than PhpSpreadsheet, full XLS/XLSX with formulas, formatting, and styling. Powered by LibXL.
php_clickhouse: native ClickHouse client speaking the wire protocol directly. Picks up where SeasClick left off.
fastchart: native chart-rendering extension. 19 chart types behind one fluent OO API; composes with caller-owned \GdImage canvases.

📚 Read more

Full background, design rationale, and benchmark methodology in the launch post: mdparser: A Native CommonMark + GFM Parser for PHP.

License

Wrapper code (mdparser*.c, php_mdparser.h) under BSD-3-Clause.
Embedded cmark-gfm sources under BSD-2-Clause, MIT, and related permissive licenses. See LICENSE for aggregated notices.

Follow @iliaa on X • Blog • If this sped up your stack, ⭐ star it!
