GitHub - AustralianCancerDataNetwork/orm-loader: SQLAlchemy helper tools for managing dataload and model generation

orm-loader

A lightweight foundation for building and validating SQLAlchemy-based data models.

orm-loader sits below any particular schema or CDM. It gives you a small set of reusable pieces for defining tables, loading files through staging tables, and checking models against external specifications. It stays out of domain logic on purpose.

The library focuses on:

ORM table mixins and introspection
staged file loading
loader and validation infrastructure
operational helpers that work across supported backends

At the moment, the built-in backends are SQLite and PostgreSQL.

What this library provides

The package is deliberately small. Most downstream projects only need a couple of these pieces.

A minimal ORM table base

ORMTableBase provides structural utilities for mapped tables without pulling domain rules into the base layer.

It supports:

mapper access and inspection
primary key discovery
required (non-nullable) column detection
consistent primary key handling across models
simple ID allocation helpers for sequence-less databases

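from sqlalchemy import Column, Integer, String
from orm_loader.metadata import Base
from orm_loader.tables import ORMTableBase

class MyTable(ORMTableBase, Base):
    __tablename__ = "my_table"
    id = Column(Integer, primary_key=True)  # a mapped table needs a primary key
    name = Column(String, nullable=False)   # a required (non-nullable) column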

You can inherit from it directly or pick it up through one of the higher-level mixins.
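To make the structural utilities above concrete, here is a minimal sketch of primary key discovery, required-column detection and naive ID allocation. It is not the ORMTableBase implementation: it swaps SQLAlchemy mapper inspection for SQLite's PRAGMA table_info, uses only the standard library, and the names primary_key_columns, required_columns and next_id are hypothetical.

```python
import sqlite3


def _table_info(conn: sqlite3.Connection, table: str) -> list:
    # Rows are (cid, name, type, notnull, dflt_value, pk); table names are trusted here
    return conn.execute(f"PRAGMA table_info({table})").fetchall()


def primary_key_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    # pk is 0 for non-key columns, otherwise the 1-based position in the key
    key_rows = [row for row in _table_info(conn, table) if row[5] > 0]
    return [row[1] for row in sorted(key_rows, key=lambda row: row[5])]


def required_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    # Columns declared NOT NULL; an INTEGER PRIMARY KEY is not reported
    # as NOT NULL unless it is declared that way explicitly
    return [row[1] for row in _table_info(conn, table) if row[3] == 1]


def next_id(conn: sqlite3.Connection, table: str, pk: str) -> int:
    # Naive max + 1 allocation for databases without sequences
    (value,) = conn.execute(f"SELECT COALESCE(MAX({pk}), 0) + 1 FROM {table}").fetchone()
    return value


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL, note TEXT)")
    print(primary_key_columns(conn, "my_table"))  # ['id']
    print(required_columns(conn, "my_table"))     # ['name']
    print(next_id(conn, "my_table", "id"))        # 1
```

The max + 1 approach is only safe when a single writer allocates IDs; concurrent loaders would need a lock or a real sequence.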

CSV-based ingestion mixins

CSVLoadableTableInterface adds staged file loading to ORM tables. It can use pandas or PyArrow loaders, and on PostgreSQL it can use a fast COPY path when the input is clean enough.

Features include:

staging table creation and cleanup
chunked loading for large files
optional casting and deduplication before insert
backend-specific merge behaviour
PostgreSQL fast-path loading with ORM fallback
backend-aware index handling during merge

class MyTable(CSVLoadableTableInterface, ORMTableBase, Base):
    __tablename__ = "my_table"

The main extension points here are loader choice, column mapping, and the normal SQLAlchemy model definitions themselves. Most downstream projects do not need to override much beyond csv_columns() and the model schema.
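The staged pattern behind these features can be sketched with the standard library alone. This is an illustration under assumptions, not CSVLoadableTableInterface's code path: it targets SQLite, stages every value as text in a temporary table, loads in fixed-size chunks, and merges with INSERT OR IGNORE so rows whose key already exists are skipped. The staged_load name and its parameters are hypothetical.

```python
import csv
import io
import sqlite3
from itertools import islice


def staged_load(conn: sqlite3.Connection, target: str, columns: list[str],
                csv_file, chunk_size: int = 1000) -> int:
    """Load a CSV into `target` via a temporary staging table; return rows inserted."""
    staging = f"_staging_{target}"
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    # 1. Untyped staging table with the target's column names and no constraints
    conn.execute(f"DROP TABLE IF EXISTS temp.{staging}")
    conn.execute(f"CREATE TEMP TABLE {staging} ({cols})")
    try:
        reader = csv.DictReader(csv_file)
        # 2. Chunked inserts keep memory bounded for large files
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            conn.executemany(
                f"INSERT INTO {staging} ({cols}) VALUES ({placeholders})",
                [tuple(row[c] for c in columns) for row in chunk],
            )
        # 3. Deduplicate, then merge; rows clashing with existing keys are ignored
        cur = conn.execute(
            f"INSERT OR IGNORE INTO {target} ({cols}) SELECT DISTINCT {cols} FROM {staging}"
        )
        inserted = cur.rowcount
        conn.commit()
        return inserted
    finally:
        # 4. Always clean up the staging table
        conn.execute(f"DROP TABLE IF EXISTS temp.{staging}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    data = io.StringIO("id,name\n1,a\n2,b\n2,b\n")
    print(staged_load(conn, "my_table", ["id", "name"], data, chunk_size=2))  # 2
```

On PostgreSQL the equivalent merge would usually be INSERT ... ON CONFLICT DO NOTHING; the library's own backend-specific merge and fast COPY path are not reproduced here.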

Structured serialisation and hashing

SerialisableTableInterface adds lightweight serialisation helpers for ORM rows.

It supports:

conversion to dictionaries
JSON serialisation
stable row-level fingerprints
iterator-style access to field/value pairs

row = session.get(MyTable, 1)
row.to_dict()
row.to_json()
row.fingerprint()

This is useful for:

debugging
auditing
reproducibility checks
downstream APIs or exports
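A stable fingerprint mostly comes down to serialising the same content to the same bytes. The sketch below assumes a row's column values have already been collected into a dict; it is not SerialisableTableInterface's implementation, and the helper names and the choice of SHA-256 are assumptions. Sorted keys and fixed separators make the JSON text, and therefore the hash, independent of field order.

```python
import hashlib
import json
from datetime import date
from decimal import Decimal
from typing import Any


def _json_default(value: Any) -> str:
    # JSON has no date or Decimal type, so use stable string forms
    # (datetime is a subclass of date, so it is covered too)
    if isinstance(value, date):
        return value.isoformat()
    if isinstance(value, Decimal):
        return str(value)
    raise TypeError(f"not JSON serialisable: {type(value).__name__}")


def row_to_json(fields: dict[str, Any]) -> str:
    # sort_keys and compact separators give one canonical text per content
    return json.dumps(fields, default=_json_default, sort_keys=True, separators=(",", ":"))


def row_fingerprint(fields: dict[str, Any]) -> str:
    # Hex digest of the canonical JSON text
    return hashlib.sha256(row_to_json(fields).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = {"id": 1, "visit_date": date(2024, 1, 2), "dose": Decimal("1.50")}
    b = {"dose": Decimal("1.50"), "id": 1, "visit_date": date(2024, 1, 2)}
    print(row_to_json(a))  # {"dose":"1.50","id":1,"visit_date":"2024-01-02"}
    print(row_fingerprint(a) == row_fingerprint(b))  # True
```

Values that compare equal but print differently, such as Decimal("1.5") and Decimal("1.50"), produce different fingerprints, so normalise them first if that matters for your checks.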

Model registry and validation scaffolding

The library includes validation infrastructure for comparing ORM models against external specifications.

This includes:

a model registry
table and field descriptors
validator contracts
a validation runner
structured validation reports

Specifications can be loaded from CSV today, with support for other formats (e.g. LinkML) planned.

registry = ModelRegistry(model_version="1.0")
registry.load_table_specs(table_csv, field_csv)
registry.register_models([MyTable])

runner = ValidationRunner(validators=always_on_validators())
report = runner.run(registry)

Validation output is available as:

human-readable text
structured dictionaries
JSON (for CI/CD integration)
exit codes suitable for pipelines
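The registry, validator, runner and report pieces fit together roughly as follows. This is a self-contained sketch with hypothetical names (load_field_specs, Issue, Report, run and two example validators); it does not reproduce the ModelRegistry or ValidationRunner API. It assumes a field-spec CSV with table, field and required columns, and describes each model as a mapping of column name to nullability.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Issue:
    table: str
    column: str
    message: str


@dataclass
class Report:
    issues: list[Issue] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {"passed": not self.issues, "issues": [asdict(i) for i in self.issues]}

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), indent=2)

    def to_text(self) -> str:
        if not self.issues:
            return "OK: no issues"
        return "\n".join(f"{i.table}.{i.column}: {i.message}" for i in self.issues)

    def exit_code(self) -> int:
        # Non-zero fails the pipeline step
        return 1 if self.issues else 0


def load_field_specs(csv_file) -> dict[str, dict[str, bool]]:
    # Assumed CSV columns: table, field, required ("yes"/"no")
    specs: dict[str, dict[str, bool]] = {}
    for row in csv.DictReader(csv_file):
        required = row["required"].strip().lower() == "yes"
        specs.setdefault(row["table"], {})[row["field"]] = required
    return specs


def check_missing_columns(table, spec, model):
    # Every column in the spec must exist on the model
    return [Issue(table, col, "missing from model") for col in spec if col not in model]


def check_nullability(table, spec, model):
    # Columns the spec marks as required must not be nullable on the model
    return [
        Issue(table, col, "required by spec but nullable in model")
        for col, required in spec.items()
        if required and model.get(col) is True
    ]


def run(validators, specs, models) -> Report:
    report = Report()
    for table, spec in specs.items():
        model = models.get(table)
        if model is None:
            report.issues.append(Issue(table, "*", "no model registered"))
            continue
        for validator in validators:
            report.issues.extend(validator(table, spec, model))
    return report


if __name__ == "__main__":
    spec_csv = io.StringIO(
        "table,field,required\n"
        "person,person_id,yes\n"
        "person,gender,yes\n"
        "visit,visit_id,yes\n"
    )
    models = {"person": {"person_id": False, "gender": True}}  # column -> nullable
    report = run([check_missing_columns, check_nullability], load_field_specs(spec_csv), models)
    print(report.to_text())
    print("exit code:", report.exit_code())
```

Keeping each validator as a plain function that returns issues makes it easy to add checks without touching the runner; the exit code is what a CI step would act on.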

Database bootstrap helpers

The library provides lightweight helpers for schema creation and bootstrapping. It does not try to replace migrations.

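from sqlalchemy import create_engine
from orm_loader.metadata import Base
from orm_loader.bootstrap import bootstrap

engine = create_engine("sqlite:///example.db")  # example URL; PostgreSQL is also supported
bootstrap(engine, create=True)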
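The difference between bootstrapping and migrating is easiest to see in code. Here is a minimal sketch, assuming SQLite and raw DDL strings rather than ORM metadata, of a create-if-missing step; bootstrap_schema is a hypothetical name, not the library's bootstrap function. Tables that already exist are left exactly as they are (SQLAlchemy's MetaData.create_all behaves similarly by default), which is why schema changes still need a migration tool.

```python
import sqlite3


def bootstrap_schema(conn: sqlite3.Connection, ddl_by_table: dict[str, str]) -> list[str]:
    """Create any missing tables; return the names of the tables created."""
    existing = {
        name for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    created = []
    for table, ddl in ddl_by_table.items():
        if table in existing:
            # Never altered here: changing an existing table is a migration's job
            continue
        conn.execute(ddl)
        created.append(table)
    conn.commit()
    return created


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    schema = {
        "person": "CREATE TABLE person (person_id INTEGER PRIMARY KEY)",
        "visit": "CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, person_id INTEGER)",
    }
    print(bootstrap_schema(conn, schema))  # ['person', 'visit']
    print(bootstrap_schema(conn, schema))  # []
```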

Bulk-loading helpers

There are a few lower-level helpers for trusted bulk workflows, including backend-aware foreign key management and SQLite connection setup for heavy local loads.
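For the SQLite side, the kind of connection setup and foreign key handling described above can be sketched with standard PRAGMAs. The function names and the specific values (for example the cache size) are assumptions for illustration, not the library's helpers, and the PostgreSQL side is not shown. These settings trade durability for speed, so they only suit trusted data that can be reloaded if a load is interrupted.

```python
import sqlite3
from contextlib import contextmanager


def configure_for_bulk_load(conn: sqlite3.Connection) -> None:
    # Faster, less durable writes: only for data that can be reloaded from source
    conn.execute("PRAGMA journal_mode = MEMORY")
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute("PRAGMA temp_store = MEMORY")
    conn.execute("PRAGMA cache_size = -200000")  # negative means KiB, so about 200 MB


@contextmanager
def foreign_keys_disabled(conn: sqlite3.Connection):
    # PRAGMA foreign_keys is a no-op inside a transaction, so commit first
    conn.commit()
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        yield conn
    except BaseException:
        # Discard a partial load, so the PRAGMA below runs outside a transaction
        conn.rollback()
        raise
    else:
        conn.commit()
        # Rows are already committed here; the check reports problems, it does not undo them
        violations = conn.execute("PRAGMA foreign_key_check").fetchall()
        if violations:
            raise ValueError(f"{len(violations)} foreign key violation(s) after load")
    finally:
        # Re-enable enforcement (assumes it was on before the load)
        conn.execute("PRAGMA foreign_keys = ON")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES parent(id))")
    configure_for_bulk_load(conn)
    with foreign_keys_disabled(conn):
        conn.execute("INSERT INTO child VALUES (1, 1)")  # child can arrive before its parent
        conn.execute("INSERT INTO parent VALUES (1)")
```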

Summary

This library is meant to be the boring layer underneath downstream models:

reusable ORM mixins
staged ingestion patterns
validation scaffolding
operational helpers

Domain rules, business logic, and schema semantics stay in the downstream project.

This makes it suitable as a shared foundation for:

clinical data models
research data marts
registry schemas
synthetic data pipelines
