saxonche (Saxon/C XSLT processor Python bindings) is a hard runtime dependency of doclang, but it creates several problems:
- No source distribution on PyPI: all versions are distributed exclusively as pre-compiled wheels, with no sdist available for any release (12.0.0 through 13.0.0).
- Missing platform support: no wheels for ppc64le or s390x. doclang is uninstallable on these architectures.
- Proprietary binary: while licensed as MPL-2.0, the Python wheel contains a pre-compiled native library with no reproducible build path. The source repository (saxonica.plan.io) contains Java/C++ source but no Python packaging infrastructure.
- Blocks the entire import chain: because `schematron_validation.py` imports saxonche at module level, and `__init__.py` transitively imports it, even `from doclang import pack` fails without saxonche installed, despite packaging having nothing to do with Schematron validation.
This affects downstream consumers like docling-core, which depends on doclang but does not use Schematron validation in its default code paths.
Note that #153 was submitted as a quick fix: it makes saxonche an optional dependency, which unblocks downstream consumers of doclang that do not need Schematron validation.
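To illustrate the import-chain problem, here is a minimal sketch of deferring a backend import until validation is actually requested. The helper name `load_optional_backend` is hypothetical and is not taken from doclang or from #153; it only shows the general pattern of moving a module-level import into the call path.

```python
import importlib
from types import ModuleType


def load_optional_backend(module_name: str = "saxonche") -> ModuleType:
    """Import a validation backend at call time instead of at module import.

    Hypothetical helper: with this pattern, importing the package itself
    never touches the backend, so unrelated features keep working when the
    backend is not installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail only when validation is requested, with an actionable message.
        raise RuntimeError(
            f"Schematron validation requires the optional package "
            f"'{module_name}', which is not installed."
        ) from exc
```

With this shape, a statement such as `from doclang import pack` would only fail if `pack` itself needed the backend.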
Proposed solution
Replace saxonche with pyschematron (v1.2.0+) as the Schematron validation backend.
pyschematron is a pure-Python Schematron validator that evaluates schemas directly via an AST and XPath, bypassing XSLT entirely. It:
- Ships as py3-none-any with an sdist and works on all architectures
- Supports XPath 1.0/2.0/3.0/3.1 query bindings
- Is MIT licensed
- Has no native binary dependencies
Required changes to doclang.sch
The bundled Schematron schema uses current(), which is an XSLT function (not XPath). pyschematron's XPath engine does not support it, but it can be replaced with equivalent <sch:let> variables.
Required changes to schematron_validation.py
Replace the Saxon XSLT pipeline (transpile .sch to XSLT 3.0, compile stylesheet, transform XML) with a direct pyschematron call.
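A minimal sketch of what the direct call might look like, assuming pyschematron exposes `validate_document(xml, schema)` returning a result object with `is_valid()`, as its README describes; verify this against the installed version. The wrapper name `is_schematron_valid` and its `validator` parameter are hypothetical, added here so the backend can be swapped or stubbed.

```python
from pathlib import Path
from typing import Callable, Optional, Union

PathLike = Union[str, Path]


def _pyschematron_validate(xml_path: Path, schema_path: Path) -> bool:
    # Imported lazily so importing this module never requires the backend.
    # Assumption: validate_document / is_valid are pyschematron's public API.
    from pyschematron import validate_document

    result = validate_document(xml_path, schema_path)
    return result.is_valid()


def is_schematron_valid(
    xml_path: PathLike,
    schema_path: PathLike,
    validator: Optional[Callable[[Path, Path], bool]] = None,
) -> bool:
    """Validate an XML file against a Schematron schema (hypothetical wrapper).

    No XSLT transpile/compile/transform steps are involved: the schema and
    the document are handed to the validator directly.
    """
    validate = validator or _pyschematron_validate
    return validate(Path(xml_path), Path(schema_path))
```

Keeping the backend behind a callable is a design choice for this sketch only; it makes the wrapper testable without pyschematron installed.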