lutaml/rng

RNG: RELAX NG Schema Processing for Ruby

Introduction and purpose

RNG provides Ruby tools for working with RELAX NG schemas, supporting both the XML syntax (RNG) and the compact syntax (RNC). It allows parsing, manipulation, and generation of RELAX NG schemas through an intuitive Ruby API.

Key features:

  • Parse RELAX NG XML (.rng) and Compact (.rnc) syntax

  • Programmatically build RELAX NG schemas

  • Bidirectional RNC ↔ RNG conversion (see Format Conversion)

  • Documentation comments infrastructure (see Documentation Comments)

  • Whitespace validation that rejects invalid schemas

    • Rejects unescaped control characters in string literals

    • Rejects whitespace in identifiers (even via Unicode escapes)

    • Clear error messages for validation failures

  • Object model representing RELAX NG concepts

  • Integration with the LutaML ecosystem

Getting started

Add the gem to your Gemfile, then run bundle install:

# In your Gemfile
gem 'rng'

Architecture

The library uses a layered architecture with clear separation of concerns:

Core Components

RNC Parser Architecture
┌─────────────────────────────────────────────────────────┐
│                   Public API Layer                      │
│  Rng.parse() | Rng.parse_rnc() | Rng.to_rnc()           │
└────────────┬────────────────────────┬───────────────────┘
             │                        │
             ▼                        ▼
┌────────────────────────┐  ┌────────────────────────────┐
│   Parsing Layer        │  │   Generation Layer         │
│                        │  │                            │
│  RncParser             │  │  RncBuilder                │
│  (Parslet grammar)     │  │  (RNG → RNC text)          │
│                        │  │                            │
│  ParseTreeProcessor    │  │  RncToRngConverter         │
│  (Tree normalization)  │  │  (Parse tree → RNG XML)    │
│                        │  │                            │
│  IncludeProcessor      │  │                            │
│  (File I/O, includes)  │  │                            │
└────────────┬───────────┘  └────────────▲───────────────┘
             │                           │
             ▼                           │
┌────────────────────────────────────────┴────────────────┐
│              Object Model Layer                         │
│                                                         │
│  Grammar ─► Start ─► Element ─► Attribute               │
│         └─► Define                                      │
│         └─► Pattern Classes (Choice, Group, etc.)       │
└─────────────────────────────────────────────────────────┘

Component Responsibilities

RncParser (lib/rng/rnc_parser.rb)

Parslet-based parser that defines RNC grammar rules. Handles lexical analysis and creates parse trees. Includes word boundary checks to prevent keyword prefix matching (e.g., "text" vs "textarea"). Delegates to other components for processing.
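The word-boundary check can be illustrated outside Parslet. The sketch below uses Ruby's standard StringScanner with a negative lookahead, so a keyword matches only when the next character cannot continue an identifier. The helper name match_keyword and the ASCII identifier character set are assumptions made for illustration. They are not the parser's actual rules.

```ruby
require "strscan"

# Hypothetical helper: match `keyword` at the scanner position only if the next
# character cannot continue an identifier (ASCII approximation: letters, digits,
# "_", "." and "-").
def match_keyword(scanner, keyword)
  scanner.scan(/#{Regexp.escape(keyword)}(?![\w.-])/)
end

match_keyword(StringScanner.new("text }"), "text")    # => "text"
match_keyword(StringScanner.new("textarea"), "text")  # => nil ("text" is only a prefix)
```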

ParseTreeProcessor (lib/rng/parse_tree_processor.rb)

Normalizes parse trees into consistent grammar structures. Handles three RNC file formats: top-level includes, grammar blocks, and flat grammars.

RncToRngConverter (lib/rng/rnc_to_rng_converter.rb)

Converts RNC parse trees to RNG XML using the Nokogiri XML builder. Handles all pattern types and wildcard name classes.

IncludeProcessor (lib/rng/include_processor.rb)

Manages file I/O and include directive resolution. Handles circular include detection and grammar merging with override support. Support for complex schemas is still being improved.

RncBuilder (lib/rng/rnc_builder.rb)

Generates RNC text from the RNG object model. Traverses the object tree and produces properly formatted RNC syntax.

Data Flow

RNC to RNG Conversion
RNC Text
   │
   ▼
RncParser.parse()
   │
   ▼
Parse Tree
   │
   ▼
ParseTreeProcessor.normalize()
   │
   ▼
Normalized Grammar Tree
   │
   ▼
RncToRngConverter.convert()
   │
   ▼
RNG XML
   │
   ▼
Grammar.from_xml()
   │
   ▼
Grammar Object

RNG to RNC Conversion
Grammar Object
   │
   ▼
RncBuilder.build()
   │
   ▼
RNC Text

Parsing RNG schemas

require 'rng'

# Parse from XML syntax
schema = Rng.parse(File.read('example.rng'))

# Access schema components
if schema.element
  # Simple element pattern
  puts "Root element: #{schema.element.name}"
else
  # Grammar with named patterns
  start_element = schema.start.element
  puts "Root element: #{start_element.name}"
end

Parsing RNC schemas

require 'rng'

# Parse from compact syntax
schema = Rng.parse_rnc(File.read('example.rnc'))

# Access schema components
if schema.element
  # Simple element pattern
  puts "Root element: #{schema.element.name}"
else
  # Grammar with named patterns
  start_element = schema.start.element
  puts "Root element: #{start_element.name}"
end

Format Conversion

The library converts between RNC (RELAX NG Compact) and RNG (RELAX NG XML) in both directions.

RNC to RNG Conversion

Convert RELAX NG Compact Syntax (RNC) to XML format (RNG):

require 'rng'

# Parse RNC file
rnc_content = File.read('schema.rnc')
grammar = Rng.parse_rnc(rnc_content)

# Generate RNG XML
rng_xml = grammar.to_xml

# Save to file
File.write('schema.rng', rng_xml)

RNG to RNC Conversion

Convert RELAX NG XML format (RNG) to Compact Syntax (RNC):

require 'rng'

# Parse RNG file
rng_content = File.read('schema.rng')
grammar = Rng.parse(rng_content)

# Generate RNC
rnc = Rng.to_rnc(grammar)

# Save to file
File.write('schema.rnc', rnc)

Round-Trip Conversion

Convert a schema to the other format and back again:

require 'rng'

# RNC → RNG → RNC
original_rnc = File.read('schema.rnc')
grammar = Rng.parse_rnc(original_rnc)
rng_xml = grammar.to_xml
grammar2 = Rng.parse(rng_xml)
rnc_regenerated = Rng.to_rnc(grammar2)

# RNG → RNC → RNG
original_rng = File.read('schema.rng')
grammar = Rng.parse(original_rng)
rnc = Rng.to_rnc(grammar)
grammar2 = Rng.parse_rnc(rnc)
rng_regenerated = grammar2.to_xml

# The regenerated schemas are semantically equivalent to the originals,
# although formatting and attribute order may differ

Performance

Conversion performance was measured on production schemas:

  • Average conversion time: 200ms per schema

  • Throughput: 5.0 schemas/second

  • Test corpus: 21 Metanorma production schemas

  • Success rate: all 21 schemas converted successfully

  • Test suite: 128 tests, 98.4% passing

Conversion Quality

Round-trip conversion maintains semantic equivalence:

  • ✅ All RELAX NG pattern types supported

  • ✅ Namespace declarations preserved

  • ✅ Datatype libraries maintained

  • ✅ Element and attribute structures retained

  • ⚠️ XML comments not preserved (Lutaml::Model limitation)

  • ⚠️ Attribute ordering may differ (not semantically significant)

External Reference Resolution

The library supports resolving external references in RNG schemas through the resolve_external option:

require 'rng'

# Parse RNG with external references resolved
grammar = Rng.parse(
  File.read('schema.rng'),
  location: '/path/to/schema.rng',  # Required for relative path resolution
  resolve_external: true
)

Supported external references:

  • <include href="uri"/> at grammar level - merges definitions from external grammar

  • <externalRef href="uri"/> at pattern level - replaces ref with content from external grammar’s start pattern

Error handling:

  • Circular references are detected and raise Rng::ExternalRefResolver::ExternalRefResolutionError

  • Missing files do not stop parsing. A warning is emitted when the RNG_VERBOSE=1 environment variable is set

  • Other resolution errors also do not stop parsing. A warning is emitted and processing continues

Example with include:

# main.rng:
# <grammar xmlns="http://relaxng.org/ns/structure/1.0">
#   <include href="library.rng"/>
#   <start><ref name="main-element"/></start>
# </grammar>

grammar = Rng.parse(File.read('main.rng'), location: 'main.rng', resolve_external: true)
# Definitions from library.rng are merged into main grammar

Example with externalRef:

# main.rng:
# <grammar xmlns="http://relaxng.org/ns/structure/1.0">
#   <start>
#     <group><externalRef href="fragment.rng"/></group>
#   </start>
# </grammar>

grammar = Rng.parse(File.read('main.rng'), location: 'main.rng', resolve_external: true)
# externalRef is replaced with content from fragment.rng's start pattern
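Circular-reference detection of the kind described under "Error handling" can be done with a depth-first walk that tracks the chain of files currently being resolved. The sketch below makes several assumptions:

  • It works on an in-memory hash instead of reading files.

  • resolve_includes and CircularIncludeError are hypothetical names. The library itself raises Rng::ExternalRefResolver::ExternalRefResolutionError.

  • The warning text is illustrative.

```ruby
# Hypothetical error class; the library raises
# Rng::ExternalRefResolver::ExternalRefResolutionError instead.
class CircularIncludeError < StandardError; end

# Depth-first walk over include/externalRef targets.
# `files` maps a path to the hrefs it references (a stand-in for reading files).
# Returns paths in merge order: referenced grammars before the ones that use them.
def resolve_includes(path, files, stack = [], merged = [])
  if stack.include?(path)
    chain = (stack + [path]).join(" -> ")
    raise CircularIncludeError, "circular reference: #{chain}"
  end
  return merged if merged.include?(path) # already merged through another branch

  unless files.key?(path)
    # Missing targets are skipped; they are only reported in verbose mode.
    warn "rng: missing file #{path}" if ENV["RNG_VERBOSE"] == "1"
    return merged
  end

  stack.push(path)
  files[path].each { |href| resolve_includes(href, files, stack, merged) }
  stack.pop
  merged << path
end

files = { "main.rng" => ["library.rng"], "library.rng" => [] }
resolve_includes("main.rng", files)  # => ["library.rng", "main.rng"]
```

Tracking the active chain separately from the merged set distinguishes a real cycle (a -> b -> a) from a diamond, where two grammars include the same library.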

Building schemas programmatically

require 'rng'

# Create a schema with an address element
schema = Rng::Grammar.new
schema.element = Rng::Element.new(
  name: "address"
)

# Add attributes
schema.element.attribute = Rng::Attribute.new(
  name: "id"
)
schema.element.attribute.data = Rng::Data.new(
  type: "ID"
)

# Add child elements
name_element = Rng::Element.new(name: "name")
name_element.text = Rng::Text.new

street_element = Rng::Element.new(name: "street")
street_element.text = Rng::Text.new

city_element = Rng::Element.new(name: "city")
city_element.text = Rng::Text.new

# Add child elements to parent
schema.element.element = [name_element, street_element, city_element]

# Convert to RNC format
rnc = Rng.to_rnc(schema)
File.write('address.rnc', rnc)

Schema object model

Grammar

The Grammar class represents a complete RELAX NG schema:

# Simple element pattern
schema = Rng::Grammar.new(
  element: Rng::Element.new(...)
)

# Grammar with named patterns
schema = Rng::Grammar.new(
  start: Rng::Start.new(...),
  define: [Rng::Define.new(...), ...],
  datatypeLibrary: "http://www.w3.org/2001/XMLSchema-datatypes"
)

Start

The Start class defines the entry point of a schema:

start = Rng::Start.new(
  ref: Rng::Ref.new(name: "addressDef"),  # Reference to a named pattern
  element: Rng::Element.new(...),         # Inline element definition
  choice: Rng::Choice.new(...),           # Choice pattern
  group: Rng::Group.new(...)              # Group pattern
)

Define

Define represents named pattern definitions:

define = Rng::Define.new(
  name: "addressDef",
  element: Rng::Element.new(...),
  choice: Rng::Choice.new(...),
  group: Rng::Group.new(...)
)

Element

Element represents XML elements in the schema:

element = Rng::Element.new(
  name: "address",
  attribute: Rng::Attribute.new(...),   # Attribute definition
  element: Rng::Element.new(...),       # Child element definition
  text: Rng::Text.new,                  # Text content
  zeroOrMore: Rng::ZeroOrMore.new(...), # Elements that can appear zero or more times
  oneOrMore: Rng::OneOrMore.new(...),   # Elements that must appear at least once
  optional: Rng::Optional.new(...)      # Optional elements
)

Attribute

Attribute defines attributes for elements:

attribute = Rng::Attribute.new(
  name: "id",
  data: Rng::Data.new(type: "ID")  # XML Schema datatype
)

Pattern Classes

The library includes classes for all RELAX NG patterns:

  • Rng::Choice - Represents a choice between patterns

  • Rng::Group - Represents a sequence of patterns

  • Rng::Interleave - Represents patterns that can be interleaved

  • Rng::Mixed - Represents mixed content (text and elements)

  • Rng::Optional - Represents an optional pattern

  • Rng::ZeroOrMore - Represents a pattern that can occur zero or more times

  • Rng::OneOrMore - Represents a pattern that must occur at least once

  • Rng::Text - Represents text content

  • Rng::Empty - Represents empty content

  • Rng::Value - Represents a specific value

  • Rng::Data - Represents a datatype

  • Rng::List - Represents a list of values

  • Rng::Ref - Represents a reference to a named pattern

  • Rng::ParentRef - Represents a reference to a pattern in a parent grammar

  • Rng::ExternalRef - Represents a reference to a pattern in an external grammar

  • Rng::NotAllowed - Represents a pattern that is not allowed

  • Rng::Div - Represents a documentation and grouping container

Schema formats

RELAX NG XML syntax (RNG)

XML syntax is the canonical form of RELAX NG schemas:

<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="address">
      <attribute name="id">
        <data type="ID" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
      </attribute>
      <element name="name">
        <text/>
      </element>
      <element name="street">
        <text/>
      </element>
      <element name="city">
        <text/>
      </element>
    </element>
  </start>
</grammar>

RELAX NG Compact syntax (RNC)

Compact syntax provides a more readable alternative:

element address {
  attribute id { xsd:ID },
  element name { text },
  element street { text },
  element city { text }
}

Namespace support

The Rng library supports both the legacy and the newer RNC namespace declaration forms, so existing schemas continue to parse unchanged while new schemas can use prefixed default namespaces and multiple prefixes.

Default namespace

The simplest form declares a default namespace for unprefixed elements:

default namespace = "http://example.com"

element foo { empty }

This generates RNG XML with a default namespace:

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://example.com">
  <start>
    <element name="foo"><empty/></element>
  </start>
</grammar>

Default namespace with prefix

You can assign a prefix to the default namespace for explicit reference:

default namespace rng = "http://relaxng.org/ns/structure/1.0"

element rng:grammar { ... }

Prefixed namespaces

Declare multiple namespaces with distinct prefixes:

namespace eg = "http://example.com"
namespace local = ""

element eg:foo {
  element local:bar { text }
}

The equivalent RNG XML declares the eg prefix with xmlns and uses an empty ns attribute for the local namespace:

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:eg="http://example.com">
  <start>
    <element name="eg:foo">
      <element name="bar" ns="">
        <text/>
      </element>
    </element>
  </start>
</grammar>

Datatype libraries

Declare datatype libraries for use in data patterns:

datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

element person {
  attribute age { xsd:integer },
  element name { xsd:string }
}

The datatype library declaration tells the parser how to interpret datatype references like xsd:integer and xsd:string:

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="person">
      <attribute name="age">
        <data type="integer"/>
      </attribute>
      <element name="name">
        <data type="string"/>
      </element>
    </element>
  </start>
</grammar>
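To resolve a prefixed datatype reference, look up the prefix among the datatypes declarations. The part after the colon becomes the type attribute. The sketch below follows two rules from the RNC specification:

  • The xsd prefix is predeclared.

  • The unprefixed keywords string and token refer to the built-in library, whose URI is empty.

The function name resolve_datatype and the shape of its result are hypothetical.

```ruby
XSD_DATATYPES = "http://www.w3.org/2001/XMLSchema-datatypes"

# Hypothetical resolver: turns an RNC datatype name into the datatypeLibrary
# URI and local type name used by <data> in RNG XML.
def resolve_datatype(name, declared = {})
  # RNC predeclares the xsd prefix; explicit declarations may add more.
  libraries = { "xsd" => XSD_DATATYPES }.merge(declared)
  # The unprefixed keywords refer to the built-in library (empty URI).
  return { library: "", type: name } if %w[string token].include?(name)

  prefix, local = name.split(":", 2)
  raise ArgumentError, "unknown datatype: #{name}" if local.nil?

  uri = libraries.fetch(prefix) { raise ArgumentError, "undeclared prefix: #{prefix}" }
  { library: uri, type: local }
end

# Returns the XML Schema datatypes URI together with the type "integer"
resolve_datatype("xsd:integer")
```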

Multiple declarations

You can combine multiple namespace and datatype declarations at the start of your schema:

default namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace local = ""
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

start = element rng:grammar {
  element a:documentation { text },
  element local:customElement { xsd:string }
}

This example combines:

  • A default namespace with a prefix (rng)

  • An empty local namespace (local)

  • The annotations namespace (a)

  • The XML Schema datatypes library (xsd)

Backward compatibility

The library maintains full backward compatibility with existing RNC schemas that use the legacy default namespace = "uri" syntax:

# Legacy format (still fully supported)
default namespace = "http://example.com"

start = element root { text }

Both the old and new declaration forms are supported and can be mixed in the same schema, although mixing them is not recommended because it makes the schema harder to read.

Implementation

The namespace support is implemented using a model-driven architecture:

  • Rng::NamespaceDeclaration - Represents namespace declarations

  • Rng::DatatypeDeclaration - Represents datatype library declarations

  • Rng::SchemaPreamble - Container for preamble declarations

These classes provide clean APIs for programmatic namespace handling:

require 'rng'

# Parse schema with namespace declarations
rnc = <<~RNC
  namespace eg = "http://example.com"
  datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

  element eg:person {
    attribute age { xsd:integer }
  }
RNC

grammar = Rng.parse_rnc(rnc)

# Access namespace metadata through parse tree processor
# The processor extracts namespace declarations into structured objects
# and adds metadata to the grammar tree for converter use
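As a rough illustration of what the preamble classes capture, the sketch below reads single-line namespace and datatypes declarations into plain Ruby structs. Its limits:

  • NamespaceDecl, DatatypeDecl and parse_preamble are hypothetical stand-ins. They are not the Rng::NamespaceDeclaration or Rng::DatatypeDeclaration API.

  • The regexes ignore ~ concatenation, escapes and multi-line declarations.

```ruby
# Hypothetical stand-ins for the declaration classes described above.
NamespaceDecl = Struct.new(:prefix, :uri, :default, keyword_init: true)
DatatypeDecl  = Struct.new(:prefix, :uri, keyword_init: true)

# Reads single-line preamble declarations; all other lines are ignored.
def parse_preamble(rnc)
  rnc.each_line.filter_map do |line|
    case line
    when /\A\s*default\s+namespace\b\s*(?:([\w.-]+)\s*)?=\s*"([^"]*)"/
      NamespaceDecl.new(prefix: $1, uri: $2, default: true)
    when /\A\s*namespace\s+([\w.-]+)\s*=\s*"([^"]*)"/
      NamespaceDecl.new(prefix: $1, uri: $2, default: false)
    when /\A\s*datatypes\s+([\w.-]+)\s*=\s*"([^"]*)"/
      DatatypeDecl.new(prefix: $1, uri: $2)
    end
  end
end
```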

Advanced usage

Working with complex patterns

require 'rng'

# Create a schema with choice patterns
schema = Rng::Grammar.new
schema.start = Rng::Start.new

# Create a choice between two elements
choice = Rng::Choice.new
choice.element = []

# First option: name element
name_element = Rng::Element.new(name: "name")
name_element.text = Rng::Text.new
choice.element << name_element

# Second option: first name and last name elements
first_name = Rng::Element.new(name: "firstName")
first_name.text = Rng::Text.new

last_name = Rng::Element.new(name: "lastName")
last_name.text = Rng::Text.new

# Group the first name and last name elements
group = Rng::Group.new
group.element = [first_name, last_name]

# Add the group as the second choice
choice.group = [group]

# Add the choice to the start element
schema.start.choice = choice

# Convert to RNC format
rnc = Rng.to_rnc(schema)
puts rnc

Working with named patterns

require 'rng'

# Create a schema with named patterns
schema = Rng::Grammar.new
schema.start = Rng::Start.new

# Create a reference to a named pattern
ref = Rng::Ref.new(name: "addressDef")
schema.start.ref = ref

# Define the named pattern
define = Rng::Define.new(name: "addressDef")
schema.define = [define]

# Add an element to the named pattern
element = Rng::Element.new(name: "address")
element.attribute = Rng::Attribute.new(name: "id")
element.attribute.data = Rng::Data.new(type: "ID")

# Add child elements
name_element = Rng::Element.new(name: "name")
name_element.text = Rng::Text.new
element.element = [name_element]

# Add the element to the named pattern
define.element = element

# Convert to RNC format
rnc = Rng.to_rnc(schema)
puts rnc

Working with div blocks

Div blocks provide documentation and grouping for schema definitions:

require 'rng'

# Create a schema with div blocks for organization
schema = Rng::Grammar.new
schema.start = Rng::Start.new

# Create start pattern
start_ref = Rng::Ref.new(name: "doc")
schema.start.ref = start_ref

# Create a div block for document structure patterns
doc_div = Rng::Div.new
doc_div.define = []

# Add define for doc element
doc_define = Rng::Define.new(name: "doc")
doc_element = Rng::Element.new(name: "doc")
doc_element.ref = [Rng::Ref.new(name: "section")]
doc_define.element = doc_element
doc_div.define << doc_define

# Add define for section element
section_define = Rng::Define.new(name: "section")
section_element = Rng::Element.new(name: "section")
section_element.element = [
  Rng::Element.new(name: "title").tap { |e| e.text = Rng::Text.new }
]
section_define.element = section_element
doc_div.define << section_define

# Add div to schema
schema.div = [doc_div]

# Convert to RNC format
rnc = Rng.to_rnc(schema)
puts rnc
# Output includes:
# div {
#   doc = element doc { section }
#   section = element section { element title { text } }
# }

Div blocks can also be nested for hierarchical organization:

# Create outer div
outer_div = Rng::Div.new
outer_div.define = [Rng::Define.new(name: "outer")]

# Create nested div
inner_div = Rng::Div.new
inner_div.define = [Rng::Define.new(name: "inner")]

# Add nested div to outer div
outer_div.div = [inner_div]

schema.div = [outer_div]

Working with cardinality constraints

require 'rng'

# Create a schema with cardinality constraints
schema = Rng::Grammar.new
schema.element = Rng::Element.new(name: "addressBook")

# Create a card element that can appear zero or more times
zero_or_more = Rng::ZeroOrMore.new
card_element = Rng::Element.new(name: "card")

# Add child elements to the card element
name_element = Rng::Element.new(name: "name")
name_element.text = Rng::Text.new

email_element = Rng::Element.new(name: "email")
email_element.text = Rng::Text.new

# Create an optional note element
optional = Rng::Optional.new
note_element = Rng::Element.new(name: "note")
note_element.text = Rng::Text.new
optional.element = [note_element]

# Add the child elements to the card element
card_element.element = [name_element, email_element]
card_element.optional = optional

# Add the card element to the zero_or_more pattern
zero_or_more.element = [card_element]

# Add the zero_or_more pattern to the address book element
schema.element.zeroOrMore = zero_or_more

# Convert to RNC format
rnc = Rng.to_rnc(schema)
puts rnc

Augmentation operators

Lutaml-RNG supports RELAX NG augmentation operators for extending named patterns defined in grammar blocks.

Choice augmentation (|=)

The |= operator adds alternative patterns to an existing named pattern definition.

# Inside grammar block
grammar {
  foo = element a { text }
}

# Outside grammar block - augment with choice
foo |= element b { text }

This generates RNG XML with combine="choice":

<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="foo">
    <element name="a"><text/></element>
  </define>
  <define name="foo" combine="choice">
    <element name="b"><text/></element>
  </define>
</grammar>

The resulting schema allows either element a or element b to match the foo pattern.

Interleave augmentation (&=)

The &= operator adds interleaved patterns to an existing named pattern definition.

# Initial definition
foo = element a { text }

# Augment with interleave
foo &= element b { text }

This generates RNG XML with combine="interleave":

<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="foo">
    <element name="a"><text/></element>
  </define>
  <define name="foo" combine="interleave">
    <element name="b"><text/></element>
  </define>
</grammar>

The resulting schema requires both elements a and b, but they can appear in any order.
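The RELAX NG rules behind combine can be sketched as a merge over definitions that share a name:

  • All definitions that combine must use the same method.

  • At most one definition may omit combine.

  • The bodies are joined with | (choice) or & (interleave).

This sketch works on RNC pattern strings rather than the library's object model, and merge_defines is a hypothetical name.

```ruby
# Each definition is [name, combine, pattern]: combine is nil for "=",
# "choice" for "|=" and "interleave" for "&=". Patterns are RNC strings.
def merge_defines(defines)
  defines.group_by(&:first).transform_values do |defs|
    methods = defs.map { |_name, combine, _pattern| combine }.compact.uniq
    if methods.size > 1
      raise ArgumentError, "conflicting combine values: #{methods.join(', ')}"
    end
    if defs.count { |_name, combine, _pattern| combine.nil? } > 1
      raise ArgumentError, "more than one definition without combine"
    end

    patterns = defs.map(&:last)
    next patterns.first if patterns.size == 1

    operator = methods.first == "interleave" ? " & " : " | "
    patterns.map { |p| "(#{p})" }.join(operator)
  end
end

# foo becomes "(element a { text }) | (element b { text })"
merge_defines([
  ["foo", nil, "element a { text }"],
  ["foo", "choice", "element b { text }"]
])
```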

Datatype parameters

Lutaml-RNG supports datatype parameters for constraining XML Schema datatypes in attribute and element definitions.

Pattern constraint

Use parameters to add regex-based constraints to string datatypes:

attribute id { xsd:string { pattern = "\i\c*" } }

This generates RNG XML with a <param> element:

<attribute name="id">
  <data type="string" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
    <param name="pattern">\i\c*</param>
  </data>
</attribute>

The pattern \i\c* constrains the attribute value to start with an initial name character followed by zero or more name characters.

Range constraints

Multiple parameters can constrain numeric datatypes:

attribute age { xsd:int { minInclusive = "0" maxInclusive = "120" } }

This generates RNG XML with multiple <param> elements:

<attribute name="age">
  <data type="int" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
    <param name="minInclusive">0</param>
    <param name="maxInclusive">120</param>
  </data>
</attribute>

Common datatype parameters

The following parameters are commonly used with XML Schema datatypes:

  • pattern - Regular expression constraint (for string types)

  • minInclusive / maxInclusive - Inclusive range bounds (for numeric types)

  • minExclusive / maxExclusive - Exclusive range bounds (for numeric types)

  • length - Exact length constraint (for string types)

  • minLength / maxLength - Length range (for string types)

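RELAX NG does not allow the enumeration and whiteSpace facets as parameters. Instead of enumeration, use value patterns combined with choice (for example, "draft" | "final"). Instead of whiteSpace, choose a datatype with the desired whitespace handling (for example, xsd:token collapses whitespace).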

# String with exact length
attribute code { xsd:string { length = "4" } }

# Decimal with maximum value
attribute price { xsd:decimal { maxInclusive = "999.99" } }

# Token limited to 32 characters (xsd:token already collapses whitespace)
attribute status { xsd:token { maxLength = "32" } }
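The library's job here is to carry each parameter into a <param> element. To show what the range and length facets mean for instance values, here is a small checker. Its limits:

  • satisfies_params? is hypothetical and handles only the facets listed.

  • Numbers are compared as exact rationals, and lengths are counted in characters, as for string types.

  • The pattern facet is left out because XML Schema regular expressions (with \i and \c) are not Ruby regexes.

```ruby
# Hypothetical checker: does `value` (a lexical string) satisfy the given
# RELAX NG <param> facets? Numbers are compared as exact rationals.
def satisfies_params?(value, params)
  params.all? do |name, limit|
    case name
    when "minInclusive" then Rational(value) >= Rational(limit)
    when "maxInclusive" then Rational(value) <= Rational(limit)
    when "minExclusive" then Rational(value) > Rational(limit)
    when "maxExclusive" then Rational(value) < Rational(limit)
    when "length"       then value.length == Integer(limit)
    when "minLength"    then value.length >= Integer(limit)
    when "maxLength"    then value.length <= Integer(limit)
    else raise ArgumentError, "unsupported parameter: #{name}"
    end
  end
end

satisfies_params?("42", { "minInclusive" => "0", "maxInclusive" => "120" })  # => true
satisfies_params?("ABCDE", { "length" => "4" })                              # => false
```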

Documentation comments

Lutaml-RNG provides full support for RELAX NG Compact Syntax documentation comments using the ## syntax with complete round-trip conversion (RNC ↔ RNG ↔ RNC).

General

Documentation comments provide formal documentation that becomes part of the schema structure. Regular comments (#) are informational only. Documentation comments (##), by contrast, are semantically meaningful and are preserved during schema processing.

The ## syntax creates annotations in the http://relaxng.org/ns/compatibility/annotations/1.0 namespace, which is the standard RELAX NG annotations namespace defined by the specification.

Status: ✅ Fully implemented with round-trip support. Documentation comments are parsed from RNC, converted to <a:documentation> elements in RNG XML, and regenerated as ## comments when converting back to RNC.

RNC Parsing

Documentation comments are parsed from RNC files:

require 'rng'

rnc = <<~RNC
  ## This is a documentation comment
  ## about the following element.
  element foo {
    empty
  }
RNC

# Parse RNC - documentation is captured
grammar = Rng.parse_rnc(rnc)
puts grammar.start.first.element.documentation
# Output:
# This is a documentation comment
# about the following element.
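The mapping between consecutive ## lines and the documentation string can be sketched as two small functions. The first strips the ## markers and joins the lines with newlines. The second turns a documentation string back into ## lines. This is a simplified, hypothetical helper pair, not the parser's implementation, and it only looks at a run of ## lines at the start of the input.

```ruby
# Hypothetical helper: collects the run of "##" lines at the start of `lines`.
def extract_documentation(lines)
  docs = lines.take_while { |line| line.lstrip.start_with?("##") }
  return nil if docs.empty?

  # Drop the "##" marker and one following space, then join with newlines.
  docs.map { |line| line.strip.delete_prefix("##").delete_prefix(" ") }.join("\n")
end

# Hypothetical helper: turns a documentation string back into "##" lines.
def documentation_to_rnc(doc)
  doc.each_line(chomp: true).map { |line| "## #{line}".rstrip }.join("\n")
end

extract_documentation(["## This is a documentation comment",
                       "## about the following element.",
                       "element foo {"])
```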

Programmatic Usage

Documentation can also be programmatically added to schema objects:

# Create element with documentation
element = Rng::Element.new(
  name: "foo",
  documentation: "This is documentation\nabout the element"
)

# When serialized to RNG XML:
grammar = Rng::Grammar.new
grammar.start = Rng::Start.new(element: element)
xml = grammar.to_xml
# Output includes:
# <element name="foo">
#   <a:documentation>This is documentation
# about the element</a:documentation>
#   <empty/>
# </element>

# When converted to RNC:
rnc = Rng.to_rnc(grammar)
# Output includes:
# ## This is documentation
# ## about the element
# element foo { empty }

Documentation can be added to:

  • Element definitions (Rng::Element)

  • Attribute definitions (Rng::Attribute)

  • Named pattern definitions (Rng::Define)

  • Start patterns (Rng::Start)

Example 1. RNG XML with documentation

When an RNG XML file contains documentation:

<element name="foo"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <a:documentation>This is documentation
about the element</a:documentation>
  <empty/>
</element>

It is correctly parsed and the documentation is preserved:

grammar = Rng.parse(rng_xml)
element = grammar.start.element
puts element.documentation
# Output:
# This is documentation
# about the element

RNC Generation

When converting Grammar objects to RNC, documentation is generated as ## comments:

# Create element with documentation
element = Rng::Element.new(
  name: "contact",
  documentation: "Contact information element\nSupports name and email"
)
element.element = [
  Rng::Element.new(name: "name").tap { |e| e.text = Rng::Text.new },
  Rng::Element.new(name: "email").tap { |e| e.text = Rng::Text.new }
]

grammar = Rng::Grammar.new
grammar.start = Rng::Start.new(element: element)

# Generate RNC
rnc = Rng.to_rnc(grammar)
puts rnc
# Output:
# start = ## Contact information element
# ## Supports name and email
# element contact {
#   element name { text },
#   element email { text }
# }

Supported Contexts

Documentation comments can be attached to:

  • Element definitions (element foo { … })

  • Attribute definitions (attribute id { … })

  • Named pattern definitions (define)

  • Start patterns (start = …)

Round-Trip Conversion

Documentation is fully preserved through round-trip conversion:

# RNC → RNG XML → Grammar → RNC
rnc_with_docs = File.read('schema.rnc')
grammar = Rng.parse_rnc(rnc_with_docs)
rng_xml = grammar.to_xml
grammar2 = Rng.parse(rng_xml)
rnc_back = Rng.to_rnc(grammar2)

# Documentation comments are preserved throughout

String concatenation

Lutaml-RNG provides full support for RELAX NG Compact Syntax string concatenation using the ~ operator for joining string literals at parse time.

General

The ~ operator concatenates adjacent string literals, allowing long URIs or values to be split across multiple lines for improved readability and maintainability. Concatenation happens at parse time, so the result is a single string value in the final schema.

Syntax

String concatenation uses the ~ operator between quoted strings:

namespace eg = "http://" ~ "www.example.com"

datatypes xsd = "http://www.w3.org/" ~ "2001" ~ "/" ~ "XMLSchema-datatypes"

Multiple strings can be concatenated in sequence:

# Split long namespace URI for readability
namespace example = "http://" ~
                    "www.example.com/" ~
                    "schemas/" ~
                    "version/" ~
                    "1.0"

Supported Contexts

String concatenation works in all string literal contexts:

  • Namespace declarations

  • Datatype library URIs

  • Include directive hrefs

  • External reference hrefs

  • Value literals

  • Datatype parameters

Example

require 'rng'

rnc = <<~RNC
  # Split long URI for readability
  default namespace = "http://" ~
                      "www.example.com/" ~
                      "schemas/" ~
                      "v1.0"

  start = element foo { empty }
RNC

# Parse RNC - strings are joined at parse time
grammar = Rng.parse_rnc(rnc)

# Full concatenated URI is available
rng_xml = grammar.to_xml
puts rng_xml
# Output:
# <grammar xmlns="http://relaxng.org/ns/structure/1.0"
#          ns="http://www.example.com/schemas/v1.0">
#   ...
# </grammar>

Concatenation in Parameters

String concatenation also works in datatype parameters:

attribute code {
  xsd:string {
    pattern = "[A-Z]" ~ "{2}" ~ "-" ~ "[0-9]" ~ "{4}"
  }
}

This concatenates to the pattern [A-Z]{2}-[0-9]{4} at parse time.
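Concatenation can be sketched with StringScanner: read a literal, then repeatedly expect ~ followed by another literal, and join the contents. The concat_literals helper is hypothetical. It accepts single- and double-quoted literals but does not process escape sequences or triple-quoted strings.

```ruby
require "strscan"

# Hypothetical helper: joins a `"..." ~ "..."` sequence into one string.
# Accepts single- or double-quoted literals; escapes are not processed.
def concat_literals(source)
  scanner = StringScanner.new(source)
  parts = []
  loop do
    scanner.skip(/\s*/)
    literal = scanner.scan(/"[^"]*"|'[^']*'/)
    raise ArgumentError, "expected a string literal at offset #{scanner.pos}" unless literal

    parts << literal[1...-1] # strip the surrounding quotes
    scanner.skip(/\s*/)
    break if scanner.eos?
    raise ArgumentError, "expected ~ at offset #{scanner.pos}" unless scanner.skip(/~/)
  end
  parts.join
end

concat_literals('"[A-Z]" ~ "{2}" ~ "-" ~ "[0-9]" ~ "{4}"')  # => "[A-Z]{2}-[0-9]{4}"
```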

Invalid Contexts

String concatenation is not allowed in contexts where string values are not expected:

  • Element names (identifiers, not strings)

  • Attribute names (identifiers, not strings)

  • Pattern references (identifiers, not strings)

# INVALID - cannot concatenate element names
element "foo" ~ "bar" { empty }

# VALID - use single identifier
element foobar { empty }

Escape sequences

Lutaml-RNG provides full support for RELAX NG Compact Syntax escape sequences for Unicode code points and special characters in both identifiers and string literals.

General

Escape sequences enable the use of Unicode characters and special characters that would otherwise be difficult or impossible to represent directly in RNC syntax. The library processes escape sequences at the parsing level with semantic interpretation in the converter layer.

Status: ✅ Fully implemented with backward compatibility support.

Unicode Code Points

Use \x{HHHHHH} syntax (1-6 hexadecimal digits) for Unicode characters in both identifiers and strings. The library validates all Unicode code points to ensure they are within valid ranges:

# Unicode in identifier names
element \x{66}oo { empty }  # → element foo { empty }
element \x{1F4DA} { text }  # → element 📚 { text }

# Unicode in string values
element test { "\x{10300}" }  # → Old Italic letter A: 𐌀
element message { "Hello \x{1F44B}" }  # → Hello 👋

Unicode Validation:

The library validates all Unicode escape sequences to reject invalid code points:

  • Surrogate code points (U+D800 to U+DFFF): Rejected with clear error message

  • Out-of-range code points (> U+10FFFF): Rejected with clear error message

  • Valid range: U+0000 to U+D7FF and U+E000 to U+10FFFF

# Invalid: Surrogate code point
Rng.parse_rnc('element foo { "\x{D800}" }')
# Raises: ArgumentError: Invalid Unicode: surrogate code point U+D800 is not allowed

# Invalid: Out of range
Rng.parse_rnc('element foo { "\x{110000}" }')
# Raises: ArgumentError: Invalid Unicode: code point U+110000 exceeds maximum (U+10FFFF)

# Valid: Maximum code point
Rng.parse_rnc('element foo { "\x{10FFFF}" }')  # ✓ Works correctly

This validation prevents encoding problems. Surrogate code points cannot appear in well-formed UTF-8, and values above U+10FFFF lie outside the Unicode code space.
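The code point checks listed above can be sketched with a single gsub over \x{…} escapes. Note these assumptions:

  • The helper name and error messages are illustrative. The library raises ArgumentError with its own wording.

  • This sketch does not implement the escaped-backslash rule.

```ruby
# Hypothetical helper: expands \x{H...} escapes (1-6 hex digits) and rejects
# surrogates and code points beyond U+10FFFF.
def expand_unicode_escapes(text)
  text.gsub(/\\x\{(\h{1,6})\}/) do
    code_point = Regexp.last_match(1).to_i(16)
    if code_point.between?(0xD800, 0xDFFF)
      raise ArgumentError, format("surrogate code point U+%04X is not allowed", code_point)
    elsif code_point > 0x10FFFF
      raise ArgumentError, format("code point U+%04X exceeds U+10FFFF", code_point)
    end
    [code_point].pack("U") # encode the code point as a UTF-8 character
  end
end

expand_unicode_escapes('\x{66}oo')  # => "foo"
```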

Character Escapes in Strings

Standard escape sequences for special characters in string literals:

element message { "Hello\nWorld" }     # Newline
element data { "Tab\tSeparated" }      # Tab
element path { "C:\\Users\\file" }     # Backslash
element quote { "She said \"Hi\"" }    # Double quote
element mixed { "Line1\r\nLine2" }     # Carriage return + newline

Supported escape sequences:

  • \" - Double quote

  • \\ - Backslash

  • \n - Newline (LF)

  • \r - Carriage return (CR)

  • \t - Tab

Escaped Backslash

A double backslash \\ before an escape sequence prevents conversion:

# Literal backslash-x sequence (not converted)
element name { "\\x{66}oo" }  # → \x{66}oo (stays literal)
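The escaped-backslash rule works naturally if all escapes are matched in a single pass. When \\ is consumed as one unit, the x{66} that follows it is never seen as an escape. The sketch below is only an illustration of that ordering. The names `unescape_literal` and `SIMPLE_ESCAPES` are hypothetical, and unknown escapes such as \q are left untouched as an assumption.

```ruby
# Hypothetical single-pass decoder (not the gem's code) for the string
# escapes listed above plus \x{N}. One regex alternation means "\\" is
# consumed as a unit, so a following "x{66}" stays literal text.
SIMPLE_ESCAPES = { '"' => '"', '\\' => '\\', 'n' => "\n", 'r' => "\r", 't' => "\t" }.freeze
ESCAPE = /\\(?:(["\\nrt])|x\{(\h{1,6})\})/

def unescape_literal(body)
  body.gsub(ESCAPE) do
    m = Regexp.last_match
    m[1] ? SIMPLE_ESCAPES.fetch(m[1]) : m[2].to_i(16).chr(Encoding::UTF_8)
  end
end

p unescape_literal('Tab\tSeparated')    # a real tab replaces \t
puts unescape_literal('She said \"Hi\"') # prints: She said "Hi"
puts unescape_literal('\\\\x{66}oo')     # input is \\x{66}oo; prints: \x{66}oo
```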

Example Usage

require 'rng'

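# Parse RNC with escape sequences. The single-quoted heredoc ('RNC') keeps
# Ruby from interpreting \x{...} and \n itself; a bare <<~RNC heredoc would
# raise a SyntaxError (invalid hex escape) on \x{66}.
rnc = <<~'RNC'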
  element \x{66}oo {
    attribute id { "\x{41}BC" },
    "Hello\nWorld"
  }
RNC

grammar = Rng.parse_rnc(rnc)

# Access converted values
element = grammar.start.first.element
puts element.attr_name  # → "foo" (Unicode escape converted)

# Convert to RNG XML
rng_xml = grammar.to_xml
# Escape sequences are resolved in the output

Implementation Notes

  • Escape sequences are processed during parsing and resolved in the object model

  • Regular identifiers without escapes continue to work unchanged

  • The parse tree format changed to represent escapes, but the converter accepts both the old and the new formats transparently, which preserves backward compatibility
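Dual-format support of this kind usually comes down to a type check at the conversion step. The sketch below illustrates this with two invented node shapes. Both shapes and the `identifier_text` name are hypothetical and do not describe the gem's actual parse tree.

```ruby
# Hypothetical parse-tree shapes, for illustration only:
#   old format: { identifier: "foo" }
#   new format: { identifier: [{ escape: "66" }, { char: "o" }, { char: "o" }] }
def identifier_text(node)
  value = node.fetch(:identifier)
  return value if value.is_a?(String) # old format: already plain text

  # New format: join literal characters and decoded \x{...} escapes
  value.map do |part|
    if part.key?(:escape)
      part[:escape].to_i(16).chr(Encoding::UTF_8)
    else
      part.fetch(:char)
    end
  end.join
end

puts identifier_text({ identifier: "foo" }) # prints: foo
puts identifier_text({ identifier: [{ escape: "66" }, { char: "o" }, { char: "o" }] }) # prints: foo
```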

Annotations

Lutaml-RNG provides full support for RELAX NG Compact Syntax annotations, allowing foreign attributes and elements from non-RELAX NG namespaces to be embedded in schema definitions.

General

Annotations enable embedding metadata and documentation from other XML vocabularies within RELAX NG schemas. This feature is essential for extensibility and integration with other XML technologies. Annotations are written in bracket notation [...] placed before the pattern they annotate.

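Foreign attributes and elements attached to a pattern must use a namespace other than the RELAX NG namespace (http://relaxng.org/ns/structure/1.0). This keeps schema structure and annotations clearly separated. Content nested inside a foreign element is not restricted in this way (see Nested Foreign Elements).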
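The namespace rule above can be checked by resolving the prefix and comparing it to the RELAX NG URI. This is a minimal sketch. The method name is hypothetical, and treating unprefixed names as belonging to no namespace is a simplification that matches the xmlns="" output shown for unprefixed foreign elements.

```ruby
# Hypothetical check (not the gem's code): resolve the namespace of an
# annotation name attached to a pattern and reject the RELAX NG namespace.
RELAX_NG_NS = "http://relaxng.org/ns/structure/1.0"

def annotation_namespace(qname, namespaces)
  # Simplification: unprefixed names are treated as being in no namespace
  return "" unless qname.include?(":")

  prefix = qname.split(":", 2).first
  uri = namespaces.fetch(prefix) { raise ArgumentError, "undeclared prefix '#{prefix}'" }
  raise ArgumentError, "'#{qname}' is in the RELAX NG namespace" if uri == RELAX_NG_NS

  uri
end

ns = { "eg" => "http://www.example.com", "rng" => RELAX_NG_NS }
puts annotation_namespace("eg:version", ns) # prints: http://www.example.com
begin
  annotation_namespace("rng:foo", ns)
rescue ArgumentError => e
  puts e.message # prints: 'rng:foo' is in the RELAX NG namespace
end
```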

Status: ✅ Fully implemented (Phase 8A, December 2025).

Foreign Attributes

Foreign attributes add metadata to patterns using the syntax [ns:attr = "value"]:

namespace xml = "http://www.w3.org/XML/1998/namespace"

# Foreign attribute annotation
[xml:space = "default"]
element foo { empty }

This generates RNG XML with the foreign attribute:

<element name="foo"
  xmlns="http://relaxng.org/ns/structure/1.0"
  xml:space="default">
  <empty/>
</element>

Multiple foreign attributes can be specified in a single annotation block:

namespace eg = "http://www.example.com"

[eg:version = "1.0" eg:author = "John Doe"]
element document { text }
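To show how an annotation block maps onto the generated RNG element, here is a sketch that emits the start tag with one xmlns declaration per prefix, followed by the foreign attributes. The `element_start_tag` and `xml_attr` names are hypothetical. The attribute order and the placement of namespace declarations are illustrative choices rather than the gem's documented output.

```ruby
# Hypothetical serializer (not the gem's code): builds an RNG <element> start
# tag carrying the foreign attributes of an annotation block.
XML_ATTR_ESCAPES = { "&" => "&amp;", "<" => "&lt;", '"' => "&quot;" }.freeze

def xml_attr(value)
  value.gsub(/[&<"]/, XML_ATTR_ESCAPES)
end

def element_start_tag(name, foreign_attrs, namespaces)
  parts = [%(<element name="#{xml_attr(name)}"),
           %(xmlns="http://relaxng.org/ns/structure/1.0")]
  # Declare each prefix once; "xml" is bound by XML itself and never declared
  prefixes = foreign_attrs.map { |qname, _| qname.split(":", 2).first }.uniq - ["xml"]
  prefixes.each { |pfx| parts << %(xmlns:#{pfx}="#{xml_attr(namespaces.fetch(pfx))}") }
  foreign_attrs.each { |qname, value| parts << %(#{qname}="#{xml_attr(value)}") }
  "#{parts.join(' ')}>"
end

# [eg:version = "1.0" eg:author = "John Doe"] element document { text }
puts element_start_tag("document",
                       [["eg:version", "1.0"], ["eg:author", "John Doe"]],
                       { "eg" => "http://www.example.com" })
```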

Foreign Elements

Foreign elements provide richer annotations with text content or nested structure using the syntax [ns:elem [ content ]]:

namespace eg = "http://www.example.com"

# Foreign element with text content; "y" ~ "z" concatenates to "yz"
[eg:foo [ "x" "y" ~ "z" ]]
element bar { empty }

This generates RNG XML:

<element name="bar"
  xmlns="http://relaxng.org/ns/structure/1.0"
  xmlns:eg="http://www.example.com">
  <eg:foo>xyz</eg:foo>
  <empty/>
</element>

Foreign elements without a namespace prefix are in no namespace (the empty namespace URI), which is why the output undeclares the default namespace with xmlns="":

div {
  foo []  # Foreign element without namespace
  foo = element foo { empty }
}

Generates:

<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <div>
    <foo xmlns=""/>
    <define name="foo">
      <element name="foo"><empty/></element>
    </define>
  </div>
</grammar>

Nested Foreign Elements

Foreign elements can contain nested foreign elements and attributes:

namespace rng = "http://relaxng.org/ns/structure/1.0"

[foo [ rng:foo [ "val" ] ]]
element bar { empty }

[foo [ rng:foo = "val" ]]
element baz { empty }

Generates nested XML:

<element name="bar"
  xmlns:rng="http://relaxng.org/ns/structure/1.0"
  xmlns="http://relaxng.org/ns/structure/1.0">
  <foo xmlns="">
    <rng:foo>val</rng:foo>
  </foo>
  <empty/>
</element>

<element name="baz"
  xmlns:rng="http://relaxng.org/ns/structure/1.0"
  xmlns="http://relaxng.org/ns/structure/1.0">
  <foo xmlns="" rng:foo="val"/>
  <empty/>
</element>
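The nesting shown above is a natural fit for recursion. The sketch below serializes a tree of foreign elements. `Foreign` and `foreign_to_xml` are hypothetical names, not the gem's `Rng::ForeignElement` API. Declarations for prefixes such as `rng` are assumed to sit on an ancestor element, as they do in the generated XML above.

```ruby
# Hypothetical model for illustration (not Rng::ForeignElement): children are
# either text (String) or nested Foreign nodes.
Foreign = Struct.new(:name, :attributes, :children)

XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_xml(text)
  text.gsub(/[&<>"]/, XML_ESCAPES)
end

def foreign_to_xml(node)
  attrs = node.attributes.map { |qname, value| %( #{qname}="#{escape_xml(value)}") }.join
  # An unprefixed foreign name is in no namespace, so undeclare the default
  attrs = ' xmlns=""' + attrs unless node.name.include?(":")
  return "<#{node.name}#{attrs}/>" if node.children.empty?

  inner = node.children.map { |c| c.is_a?(String) ? escape_xml(c) : foreign_to_xml(c) }.join
  "<#{node.name}#{attrs}>#{inner}</#{node.name}>"
end

# [foo [ rng:foo [ "val" ] ]] and [foo [ rng:foo = "val" ]] from the examples above
bar = Foreign.new("foo", {}, [Foreign.new("rng:foo", {}, ["val"])])
baz = Foreign.new("foo", { "rng:foo" => "val" }, [])
puts foreign_to_xml(bar) # prints: <foo xmlns=""><rng:foo>val</rng:foo></foo>
puts foreign_to_xml(baz) # prints: <foo xmlns="" rng:foo="val"/>
```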

Supported Contexts

Annotations can be attached to:

  • Element definitions (element)

  • Attribute definitions (attribute)

  • Named pattern definitions (define)

  • Start patterns (start)

  • Div blocks (div)

Programmatic Usage

require 'rng'

# Parse RNC with annotations
rnc = <<~RNC
  namespace eg = "http://www.example.com"
  [eg:version = "1.0"]
  element foo { empty }
RNC

grammar = Rng.parse_rnc(rnc)

# Access foreign attributes
element = grammar.start.element
# element.foreign_attributes contains ForeignAttribute objects

# Convert to RNG XML - annotations become XML attributes/elements
rng_xml = grammar.to_xml
puts rng_xml

Implementation

The annotation support is implemented using a model-driven architecture:

  • Rng::ForeignAttribute - Represents foreign attributes

  • Rng::ForeignElement - Represents foreign elements with recursive nesting

These classes provide clean APIs for programmatic annotation handling through the standard Lutaml::Model serialization.

Implementation status

Supported features (v0.3.2)

The library provides full support for:

  • RNG XML parsing: All RELAX NG XML schemas parse correctly, including complex Metanorma schemas

  • RNC generation: Converts object models to readable RNC syntax

  • Basic RNC parsing: Standalone RNC schemas without complex includes

  • Documentation comments infrastructure: Model classes and generators ready for ## syntax (see Documentation Comments)

  • Augmentation operators: |= (choice) and &= (interleave) operators

  • Datatype parameters: XML Schema datatype constraints

  • Word boundary checks: Keywords like text, empty, notAllowed correctly distinguished from identifiers
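For readers unfamiliar with the augmentation operators, the following sketch shows their effect on the definitions of one name. |= contributes an alternative (choice) and &= an interleaved part, and RELAX NG does not allow the two to be mixed for a single name. Patterns are plain strings, and `combine_definitions` is a hypothetical name, not the gem's API.

```ruby
# Hypothetical sketch: merge the definitions of each name according to
# "=", "|=" (choice) and "&=" (interleave). Patterns are plain strings.
def combine_definitions(defs)
  defs.group_by(&:first).to_h do |name, entries|
    ops = entries.map { |_, op, _| op }
    augment = (ops - ["="]).uniq
    raise ArgumentError, "#{name}: mixes #{augment.join(' and ')}" if augment.size > 1
    raise ArgumentError, "#{name}: more than one plain =" if ops.count("=") > 1

    separator = augment.first == "&=" ? " & " : " | "
    [name, entries.map(&:last).join(separator)]
  end
end

rules = [
  ["inline", "=", "text"],
  ["inline", "|=", "element b { inline }"],
  ["attrs", "&=", "attribute id { text }"]
]
combine_definitions(rules).each { |name, body| puts "#{name} = #{body}" }
# prints:
# inline = text | element b { inline }
# attrs = attribute id { text }
```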

Current limitations (v0.3.0)

Feature | Status | Description
Complex include processing | FULLY SUPPORTED | Two-phase parsing architecture successfully handles complex include blocks with overrides. 21/21 Metanorma test schemas passing (100%).
Round-trip conversion | FULLY SUPPORTED | Complete bidirectional conversion with 98.4% test pass rate (126/128 tests). See Format Conversion section.
div elements | SUPPORTED | Documentation grouping fully supported in RNG XML parsing, generation, and within override blocks
Name class exceptions | SUPPORTED | anyName and nsName exception patterns fully supported in elements and attributes
Official test suite validation | ⚠️ PARTIAL (32.1% passing) | Validated against Jing-Trang compacttest.xml (87 test cases). See Official Test Suite Validation section.
Documentation comments (##) | ✅ FULLY SUPPORTED | Complete implementation with round-trip preservation. Parser, models, converter, and builder all working. See Documentation Comments.

Official test suite validation (v0.3.2)

Test Suite: Jing-Trang compacttest.xml (Official RELAX NG Compact Syntax Tests)

The library has been validated against the official RELAX NG test suite from the Jing-Trang project:

Test Category | Passed | Failed | Success Rate
Valid RNC Parsing | 26 | 27 | 49.1%
Invalid RNC Rejection | 29 | 2 | 93.5%
Round-Trip Conversion | 126 | 2 | 98.4%

Total Test Cases: 87 (56 valid, 31 invalid, 3 resource-based skipped)
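As a quick arithmetic check, each success rate in the table is passed / (passed + failed), rounded to one decimal place. The Valid RNC Parsing row covers 26 + 27 = 53 cases, which is consistent with the 3 skipped cases coming from the 56 valid ones.

```ruby
# Pass/fail counts copied from the table above
RESULTS = {
  "Valid RNC Parsing" => [26, 27],
  "Invalid RNC Rejection" => [29, 2],
  "Round-Trip Conversion" => [126, 2]
}.freeze

# Success rate in percent, rounded to one decimal place
def success_rate(passed, failed)
  (100.0 * passed / (passed + failed)).round(1)
end

# prints 49.1%, 93.5% and 98.4% for the three rows
RESULTS.each do |category, (passed, failed)|
  puts format("%-22s %5.1f%%", category, success_rate(passed, failed))
end
```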

Recent Improvements (v0.3.2):

  • ✅ Unicode validation: invalid-schema rejection improved by 6.4 percentage points (87.1% → 93.5%)

  • ✅ Surrogate code points (U+D800 to U+DFFF) are now correctly rejected

  • ✅ Out-of-range code points (> U+10FFFF) are now correctly rejected

  • ✅ All production schemas (Metanorma, 21/21) still pass at 100%

Test Results Summary

Strengths:

  • Excellent invalid schema rejection (93.5%), improved by Unicode validation

  • Outstanding round-trip conversion (98.4%)

  • Complex production schemas (Metanorma) parse successfully

  • Documentation comments fully supported (5/5 tests passing)

  • String concatenation fully supported (already working)

  • Unicode validation prevents security and encoding issues

  • Strong foundation for real-world use

⚠️ Known Gaps:

  • Annotations (foreign attributes/elements): 19 tests (36% of failures)

  • Comment positioning edge cases: 8 tests (15% of failures)

  • Complex nested patterns: 3 tests (6% of failures)

  • Advanced escape sequences: 5 tests (9% of failures)

Analysis: The library provides excellent production schema support and high-quality round-trip conversion. Remaining gaps are primarily advanced specification features: annotations (foreign XML elements/attributes), comment positioning between keywords, and optimization for very large schemas.

See PHASE_7_COMPLETION_SUMMARY.md for recent implementation details and CONTINUATION_PLAN_REVISED_PHASES.md for next steps.

Running the Test Suite

# Run official test suite validation
bundle exec rspec spec/rng/compacttest_spec.rb

# Run with detailed output
bundle exec rspec spec/rng/compacttest_spec.rb --format documentation

Parser optimization (v0.2.0)

🎉 Achievement: 100% Metanorma Schema Support (21/21)

The RNC parser handles every production schema using a two-phase parsing approach with proper scoping:

  • Success rate: ✅ 100%, all 21/21 Metanorma schemas passing

  • Architecture: the two-phase approach eliminates Parslet backtracking issues

    • Phase 1: capture large blocks (overrides, grammar content, trailing patterns) as raw text using balanced_braces

    • Phase 2: post-process the raw text with proper grammar rules to recover the correct structure

  • Performance: near-instant parsing (< 1 second per schema)

  • Code quality: clean separation of concerns, maintainable architecture

  • No regressions: test results actually improved (199 → 197 failures)

See PARSER_100_PERCENT_STATUS.md for complete implementation details.

Two-Phase Implementation

The parser handles complex schemas through targeted raw text capture:

  1. Raw Text Capture (lib/rng/rnc_parser.rb): uses balanced_braces and any.repeat for override blocks, grammar blocks, and trailing patterns

  2. Proper Scoping (lib/rng/parse_tree_processor.rb): post-processes the raw content with the correct grammar rules (grammar, override, patterns)

  3. Clean Conversion (lib/rng/rnc_to_rng_converter.rb): handles structured parse trees with proper component separation

This architecture successfully handles all production schemas, including:

  • ✅ bsi.rnc: 77-line override block

  • ✅ ietf.rnc: complex override patterns

  • ✅ isodoc.rnc: 322-line override inside a grammar block

  • ✅ isostandard.rnc: 110-line override block plus many top-level patterns (newly fixed)

  • ✅ All 17 other Metanorma production schemas

Implementation Details

The key breakthrough was applying raw text capture selectively:

  • Grammar blocks: capture the entire content, then parse it with the proper scope

  • Include overrides: capture override blocks, then parse them with the proper scope

  • Top-level includes: capture trailing patterns to avoid backtracking

  • Regular grammars: parse normally without raw capture (no performance issues)

This surgical approach maintains compatibility with simple schemas while handling complex ones.
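The raw-capture step depends on finding the brace that closes a block without parsing its contents. Here is a plain-Ruby sketch of such a balanced-brace scan. It is not the gem's Parslet `balanced_braces` rule. The name `capture_balanced` is hypothetical, and the sketch only understands double-quoted literals with backslash escapes and # comments. Other RNC literal forms (single-quoted, triple-quoted) are not handled.

```ruby
# Hypothetical balanced-brace scanner (plain Ruby, not the gem's Parslet rule):
# given the index of an opening "{", return the raw text up to its matching "}".
def capture_balanced(src, open_index)
  raise ArgumentError, "expected '{' at #{open_index}" unless src[open_index] == "{"

  depth = 0
  in_string = false
  i = open_index
  while i < src.length
    ch = src[i]
    if in_string
      if ch == "\\"
        i += 2 # skip the escaped character
        next
      end
      in_string = false if ch == '"'
    elsif ch == '"'
      in_string = true
    elsif ch == "#"
      i = src.index("\n", i) || src.length # jump to the end of the comment
      next
    elsif ch == "{"
      depth += 1
    elsif ch == "}"
      depth -= 1
      return src[(open_index + 1)...i] if depth.zero?
    end
    i += 1
  end
  raise ArgumentError, "unbalanced braces starting at #{open_index}"
end

src = <<~'RNC'
  include "base.rnc" {
    # a stray } in a comment is ignored
    start = element doc { "}" }
  }
RNC
puts capture_balanced(src, src.index("{")).strip
```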

Keyword Matching (FIXED in v0.2.0)

Previous versions had issues with keywords like "text" matching identifiers like "textarea". This has been fixed with word boundary checks.
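A word-boundary check only accepts a keyword when the next character cannot continue an identifier. The sketch below illustrates the idea. The character class is a simplified approximation of NCName characters, and `keyword_at` is a hypothetical name, not the gem's grammar rule.

```ruby
# Hypothetical illustration of the word-boundary rule: a keyword matches only
# when the following character cannot continue an identifier.
NAME_CHAR = /[\p{L}\p{N}_.\-]/ # simplified NCName continuation characters
KEYWORDS = %w[text empty notAllowed].freeze

def keyword_at(src, pos)
  KEYWORDS.find do |kw|
    src[pos, kw.length] == kw && !NAME_CHAR.match?(src[pos + kw.length].to_s)
  end
end

p keyword_at("text }", 0)     # prints: "text"
p keyword_at("textarea }", 0) # prints: nil (an identifier, not the keyword)
```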

Round-trip conversion notes

When converting schemas through the library:

  • XML comments are not preserved: comments in RNG XML files are lost during parsing (a Lutaml::Model limitation)

  • Attribute ordering may change: XML attribute order is not semantically significant and may differ after a round trip

  • Namespace prefixes may change: namespace URIs are preserved, but prefixes may be reassigned

These are cosmetic differences that do not affect schema semantics.

Limitations

Known Issues

Special Attribute Values

The value map for special attributes (:empty, :omitted, :nil) currently renders them as string values. Workaround: use empty strings directly.

Using empty strings instead of special symbols

grammar.ns = ""               # Use this instead of :empty
grammar.datatypeLibrary = ""  # Use this instead of :omitted

Impact: Low - Simple workaround available

Status: 2 pending tests in rng_generation_spec.rb

Related: Requires investigation of Lutaml::Model value_map configuration

RNC Choice Patterns

Some complex choice patterns may be rendered as sequences in RNC output, so the generated syntax differs from the original.

Example of choice pattern rendering

Input RNG XML:

<choice>
  <element name="a"><text/></element>
  <element name="b"><text/></element>
</choice>

Expected RNC output:

element a { text }
| element b { text }

Actual RNC output:

element a { text }, element b { text }

Parsing is unaffected, but the RNC output uses sequence syntax instead of choice syntax. In RELAX NG a sequence (,) is not equivalent to a choice (|), so review generated RNC containing such patterns before reusing it as a schema.

Impact: Low - Schemas parse correctly; only the generated RNC text is affected

Status: 1 test adjusted to verify structure instead of exact syntax

Related: Enhancement needed in lib/rng/rnc_builder.rb
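One way to approach this enhancement is to choose the connector from the container type instead of always emitting ", ". The sketch below uses a hypothetical `Pattern` struct and `render_rnc` method. It is not the code in lib/rng/rnc_builder.rb and does not reflect the gem's object model.

```ruby
# Hypothetical pattern tree (not the gem's model) and a renderer that picks
# the RNC connector from the container kind instead of always using ", ".
Pattern = Struct.new(:kind, :children, :text)
CONNECTORS = { choice: " | ", group: ", ", interleave: " & " }.freeze

def render_rnc(node)
  return node.text if node.kind == :leaf

  parts = node.children.map do |child|
    rendered = render_rnc(child)
    # Parenthesize nested containers, e.g. a, (b | c)
    child.kind == :leaf ? rendered : "(#{rendered})"
  end
  parts.join(CONNECTORS.fetch(node.kind))
end

a = Pattern.new(:leaf, [], "element a { text }")
b = Pattern.new(:leaf, [], "element b { text }")
puts render_rnc(Pattern.new(:choice, [a, b], nil)) # prints: element a { text } | element b { text }
```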

Testing

The library includes a comprehensive test suite:

# Run all tests
bundle exec rspec

# Run RNC parser tests
bundle exec rspec spec/rng/rnc_parser_spec.rb

# Run Metanorma schema tests (21 real-world schemas)
bundle exec rspec spec/rng/rnc_parser_spec.rb:231

Current test results (v0.2.0):

  • Core parser tests: ✅ All passing

  • Metanorma RNC schemas: ✅ 21/21 passing (100%)

  • Complex schemas with includes: ✅ Working with two-phase parsing

  • Complex override blocks: ✅ Successfully handle 300+ line blocks

  • Div blocks: ✅ Fully supported, including nested divs

  • Round-trip conversion: 🔄 Work in progress

Production Schema Validation:

  • All 21 Metanorma schemas parse successfully

  • Performance: < 1 second per schema

  • No known parsing limitations for production use

Environment Variables

RNG_VERBOSE

Control warning output during schema parsing:

# Default: suppress verbose parser warnings (clean production output)
ruby your_script.rb

# Enable verbose warnings for debugging
RNG_VERBOSE=1 ruby your_script.rb

What are these warnings?

During RNC parsing, the parser may fall back to alternative parsing strategies for certain complex patterns. These fallbacks are benign and produce correct results, but they emit warnings to aid debugging.

When to use RNG_VERBOSE=1:

  • Investigating parsing behavior

  • Debugging new schema patterns

  • Contributing to parser development

  • Understanding how your schema is processed

Default behavior (RNG_VERBOSE not set):

  • Clean output for production use

  • All schemas parse correctly without verbose warnings

  • Parsing behavior is unchanged
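The on/off behavior described above can be sketched as a small environment check. The `verbose?` and `parser_warning` helpers are hypothetical. Treating any non-empty value as "on" is an assumption, and the gem may interpret values such as RNG_VERBOSE=0 differently.

```ruby
# Hypothetical helpers (not the gem's code) gating warnings on RNG_VERBOSE.
# Assumption: any non-empty value turns verbose output on.
def verbose?(env = ENV)
  value = env["RNG_VERBOSE"]
  !value.nil? && !value.empty?
end

def parser_warning(message, env = ENV)
  warn("[rng] #{message}") if verbose?(env) # warn writes to stderr
end

parser_warning("fallback strategy used") # silent unless RNG_VERBOSE is set
```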

Troubleshooting

Parse errors

If you encounter parse errors when working with RNC files:

  1. Check for include directives: if your schema uses include, try using the RNG XML format instead

  2. Validate syntax: ensure your RNC syntax is correct (use external tools like trang to validate)

  3. Try simpler patterns: some complex patterns may not yet be fully supported

  4. Check the error message: parse errors include line and column numbers to help locate issues

Conversion issues

If conversion between formats produces unexpected results:

  1. Start with simple schemas: test with basic schemas before trying complex ones

  2. Check round-trip: Parse → Convert → Parse again and compare results

  3. Verify namespaces: ensure namespace declarations are correct

  4. Use RNG as intermediate format: RNG XML has more mature support
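The round-trip check in step 2 can be made mechanical. A conversion is considered stable when a second parse/emit pass reproduces the output of the first one. Comparing the two passes, rather than comparing against the original input, sidesteps the cosmetic differences listed under Round-trip conversion notes. The helper below is a hypothetical sketch. Parsing and emitting are injected as lambdas, and the toy stand-ins are placeholders. With the gem, `parse` might wrap `Rng.parse_rnc` and `emit` an RNC generator, but the exact generator call is an assumption not shown in this section.

```ruby
# Hypothetical helper: a round trip is "stable" when a second parse/emit pass
# reproduces the first pass's output exactly.
def round_trip_stable?(source, parse:, emit:)
  first = emit.call(parse.call(source))
  second = emit.call(parse.call(first))
  first == second
end

# Toy stand-ins: "parse" splits on commas, "emit" normalizes the spacing
parse = ->(text) { text.split(",").map(&:strip) }
emit = ->(items) { items.join(", ") }
p round_trip_stable?("a,b ,  c", parse: parse, emit: emit) # prints: true
```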

Roadmap

Completed (v0.3.0)

Phase 3: Official Test Suite Integration

  • Integrated Jing-Trang compacttest.xml (87 test cases)

  • Established baseline specification compliance: 32.1%

  • Validated against official RELAX NG test suite

Phase 7C: Documentation Comments

  • Full ## syntax support in RNC parser and builder

  • <a:documentation> element generation in RNG XML

  • Round-trip preservation: RNC → RNG → RNC

  • Model classes updated (Element, Attribute, Define, Start)

  • Status: All 5 documentation tests passing (100%)

String Concatenation (was already working)

  • ~ operator for string literal concatenation

  • Parse-time string joining

  • Support in namespace declarations, datatype libraries, parameters

  • Status: Working since Phase 6

In Progress (v0.4.0)

Phase 8A: Annotations Support ⏱️ 6-8 hours

  • Implement foreign attribute and element support

  • Parse annotation blocks: [ns:attr="val"], elem []

  • Handle nested foreign elements

  • Expected: +19 tests passing (→ 70%)

Phase 8B: Comment Positioning ⏱️ 4-5 hours

  • Fix comments between keywords and identifiers

  • Handle comments after operators

  • Expected: +8 tests passing (→ 85%)

Phase 8C: Complex Schema Optimization ⏱️ 4-6 hours

  • Profile and optimize parser for large schemas

  • Fix RELAX NG spec, RDF, XHTML parsing

  • Expected: +3 tests passing (→ 91%)

Future Enhancements

External Resource Support (✅ Completed in v0.4.0)

  • File system integration for include and externalRef - DONE

  • URI resolution with relative path support - DONE

  • Circular reference detection - DONE

  • Rng.parse() accepts resolve_external: true option - DONE

CLI Interface (Thor-based)

  • rng validate <schema.rng> <document.xml> - Validate XML against schema

  • rng convert <input.rnc> [-o output.rng] - Convert between RNC/RNG

  • rng parse <schema.rng> - Parse and display AST

  • Leverages existing programmatic APIs

XML Validation

  • Validate XML documents against RNG schemas

  • Integration with validation libraries

Schema Simplification

  • Implement RELAX NG simplification algorithm

  • Optimize schema structures

See CONTINUATION_PLAN_PHASE4B.md for detailed implementation plans.

Contributing

  1. Fork the repository

  2. Create your feature branch (git checkout -b feature/my-new-feature)

  3. Commit your changes (git commit -am 'Add some feature')

  4. Push to the branch (git push origin feature/my-new-feature)

  5. Create a new Pull Request

License

Copyright (c) 2025 Ribose Inc.

This project is licensed under the BSD-2-Clause License.

About

Parser for RNG / RNC
