
Replace saxonche with pyschematron for Schematron validation #154

Description

@mprpic

saxonche (Saxon/C XSLT processor Python bindings) is a hard runtime dependency of doclang, but it creates several problems:

  • No source distribution on PyPI: every release from 12.0.0 through 13.0.0 is published only as pre-compiled wheels; none of them has an sdist.
  • Missing platform support: no wheels for ppc64le or s390x. doclang is uninstallable on these architectures.
  • Opaque binary: although licensed under MPL-2.0, the Python wheel contains a pre-compiled native library with no reproducible build path. The source repository (saxonica.plan.io) contains Java/C++ sources but no Python packaging infrastructure.
  • Blocks the entire import chain: schematron_validation.py imports saxonche at module level and __init__.py imports it transitively, so even `from doclang import pack` fails when saxonche is not installed, although packaging has nothing to do with Schematron validation.

This affects downstream consumers like docling-core, which depends on doclang but does not use Schematron validation in its default code paths.

Note that #153 was submitted to make saxonche an optional dependency, as a quick way to unblock downstream consumers of doclang that do not need Schematron validation.
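Whichever fix lands, the import-chain problem can be pinned down with a regression check. The sketch below uses only the standard library; `imports_without` is a hypothetical helper, not part of doclang. It relies on documented import-system behavior: when `sys.modules[name]` is `None`, importing `name` raises `ModuleNotFoundError`, so the child interpreter behaves as if the package were absent even on a machine where it is installed.

```python
import subprocess
import sys


def imports_without(statement: str, blocked: list[str]) -> bool:
    """Run `statement` in a fresh interpreter where every module in `blocked` is unimportable.

    Returns True if the statement completes without error.
    """
    # sys.modules[name] = None makes any later `import name` raise ModuleNotFoundError.
    prelude = "import sys\n" + "".join(f"sys.modules[{name!r}] = None\n" for name in blocked)
    proc = subprocess.run(
        [sys.executable, "-c", prelude + statement],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0
```

With doclang installed, `imports_without("from doclang import pack", ["saxonche"])` would be expected to return False while saxonche is imported at module level, and True once that import is removed or deferred.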

Proposed solution

Replace saxonche with pyschematron (v1.2.0+) as the Schematron validation backend.

pyschematron is a pure-Python Schematron validator that evaluates schemas directly via an AST and XPath, bypassing XSLT entirely. It:

  • Ships as a py3-none-any wheel plus an sdist, so it installs on all architectures
  • Supports XPath 1.0/2.0/3.0/3.1 query bindings
  • Is MIT licensed
  • Has no native binary dependencies
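The packaging contrast above (platform-specific wheels with no sdist versus a py3-none-any wheel) can be checked mechanically from a release's file names, such as those PyPI lists for a version. A minimal sketch, assuming the standard wheel filename layout `name-version[-build]-python-abi-platform.whl`; the helper names and the `example_native` / `example_pure` filenames are illustrative only, not real PyPI file lists.

```python
def platform_tags(filename: str) -> set[str]:
    """Return the platform tags of a wheel filename; empty set for non-wheel files."""
    if not filename.endswith(".whl"):
        return set()
    # The platform tag is the last dash-separated field and may be a dotted set.
    return set(filename[: -len(".whl")].split("-")[-1].split("."))


def installable_on(files: list[str], arch: str) -> bool:
    """True if a release offers an sdist or a wheel whose platform tag fits `arch`."""
    for name in files:
        if name.endswith((".tar.gz", ".zip")):
            # An sdist can at least be attempted anywhere (native builds may still fail).
            return True
        for tag in platform_tags(name):
            if tag == "any" or tag.endswith("_" + arch):
                return True
    return False


# Illustrative file lists for a native-only release and a pure-Python release.
NATIVE = [
    "example_native-1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "example_native-1.0-cp312-cp312-macosx_11_0_arm64.whl",
]
PURE = ["example_pure-1.0-py3-none-any.whl", "example_pure-1.0.tar.gz"]
```

Here `installable_on(NATIVE, "ppc64le")` is False while `installable_on(PURE, "s390x")` is True, which is the gap the issue describes.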

Required changes to doclang.sch

The bundled Schematron schema uses current(), which is an XSLT function rather than an XPath function. pyschematron's XPath engine does not support it, but each use can be replaced with an equivalent <sch:let> variable.
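A minimal sketch of the rewrite, using a hypothetical rule (the `ref`/`target` element names and the `target_id` variable are illustrative, not taken from doclang.sch). Inside a predicate, current() refers to the rule's context node, so binding `@target` in a rule-level `<sch:let>`, which is evaluated against that same context node, yields the same value. The stdlib scanner lists the Schematron XPath attributes that still call current(), which helps locate every spot that needs this change.

```python
import re
import xml.etree.ElementTree as ET

SCH = "{http://purl.oclc.org/dsdl/schematron}"
# ISO Schematron attributes whose values are XPath expressions.
XPATH_ATTRS = ("context", "test", "value", "select", "path")
CURRENT_CALL = re.compile(r"\bcurrent\s*\(\s*\)")

# Hypothetical rule relying on the XSLT-only current() function.
BEFORE = """<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <sch:rule context="ref">
      <sch:assert test="//target[@id = current()/@target]">ref must point to an existing target</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>"""

# Same check: @target is bound once in the rule context, then used in the predicate.
AFTER = """<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <sch:rule context="ref">
      <sch:let name="target_id" value="@target"/>
      <sch:assert test="//target[@id = $target_id]">ref must point to an existing target</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>"""


def find_current_calls(sch_text: str) -> list[tuple[str, str, str]]:
    """List (element, attribute, expression) for each Schematron XPath that calls current()."""
    hits = []
    for elem in ET.fromstring(sch_text).iter():
        if not elem.tag.startswith(SCH):
            continue  # skip elements outside the Schematron namespace
        for attr in XPATH_ATTRS:
            expr = elem.get(attr)
            if expr and CURRENT_CALL.search(expr):
                hits.append((elem.tag[len(SCH):], attr, expr))
    return hits
```

Running the scanner over the real doclang.sch before and after the edit would confirm that no current() calls remain.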

Required changes to schematron_validation.py

Replace the Saxon XSLT pipeline (transpile the .sch to XSLT 3.0, compile the stylesheet, transform the XML) with a direct pyschematron call.
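A minimal sketch of the simplified module, assuming pyschematron's `validate_document()` entry point and the `get_svrl()` method on its result as shown in the project's README; verify both against the installed v1.2.0+ before relying on them. `validate_xml` and `SchematronFailure` are hypothetical names, not doclang's existing API. The pyschematron import is deferred into the function so importing the module never requires it, and SVRL parsing is split out so it can be tested on its own.

```python
from dataclasses import dataclass
from pathlib import Path

SVRL = "{http://purl.oclc.org/dsdl/svrl}"


@dataclass
class SchematronFailure:
    location: str
    test: str
    message: str


def failures_from_svrl(svrl) -> list[SchematronFailure]:
    """Collect svrl:failed-assert entries from an ElementTree-compatible SVRL tree."""
    failures = []
    for fa in svrl.iter(f"{SVRL}failed-assert"):
        text = fa.find(f"{SVRL}text")
        message = "".join(text.itertext()).strip() if text is not None else ""
        failures.append(SchematronFailure(fa.get("location", ""), fa.get("test", ""), message))
    return failures


def validate_xml(xml_path: Path, sch_path: Path) -> list[SchematronFailure]:
    """Validate an XML file against a Schematron schema; an empty list means valid."""
    # Deferred import: only callers that actually validate need pyschematron.
    # Assumption: validate_document() and result.get_svrl() as in pyschematron's README,
    # with get_svrl() returning an lxml tree that supports .iter() and .find().
    from pyschematron import validate_document

    result = validate_document(xml_path, sch_path)
    return failures_from_svrl(result.get_svrl())
```

Compared with the Saxon pipeline, there is no intermediate XSLT to generate, compile, or cache; the schema is evaluated directly.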
