TurboVault Engine is a CLI-first, Django-based automation engine that accelerates Data Vault 2.0 implementations. It:
- Ingests source metadata from Excel files, database catalogs, or previously exported JSON files
- Maps metadata into a consistent Data Vault domain model (Hubs, Links, Satellites)
- Generates complete, production-ready dbt projects with datavault4dbt macros
- Validates your model before generation with comprehensive error checking
Perfect for: Data Engineers looking to rapidly prototype, standardize, or automate their Data Vault implementations.
- Automatic model generation - SQL models with datavault4dbt macros
- YAML schemas - Complete dbt documentation for all models
- Organized structure - Clean folder hierarchy (staging, raw_vault, business_vault)
- Template customization - Customize any template via Django Admin
- Validation - Pre-generation checks to catch errors early
- Hubs - Standard and reference hubs with business keys
- Links - Standard and non-historized links connecting multiple hubs
- Satellites - Standard, multi-active, non-historized, effectivity, reference, and record-tracking satellites
- PITs - Point-in-Time table generation
- Reference Tables - Reference data modeling
- Snapshot Controls - Configurable snapshot logic for temporal tracking
- Source Systems - Define database schemas and connections
- Source Tables - Map physical tables with record source and load date
- Prejoins - Cross-table joins for complex link mappings
- Stage Models - Automatic staging layer with hashkeys and hashdiffs
- Modern CLI - Built with Typer and Rich for beautiful terminal output
- Web Initializer - Interactive, multi-step project creation wizard
- Django Admin - Full web interface for model and template management
- Config-Driven - YAML configuration for automation and CI/CD
- Comprehensive Testing - pytest test suite with 20+ tests
- Python 3.12+
- pip (Python package manager)
- (Optional) Database drivers if using external databases:
  - PostgreSQL: psycopg2-binary
  - MySQL: mysqlclient
  - SQL Server: mssql-django
  - Oracle: cx_Oracle
  - Snowflake: django-snowflake
Install from PyPI:
pip install turbovault-engine
Install directly from GitHub (latest development version):
pip install git+https://github.com/ScalefreeCOM/turbovault-engine.git
We recommend installing into a dedicated virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install turbovault-engine
TurboVault uses a two-step setup. First, create and enter a dedicated folder for your workspace:
mkdir my-turbovault-workspace
cd my-turbovault-workspace
Step 1: Initialise the workspace (once per directory):
# Interactive (recommended for first time)
turbovault workspace init
# Or fully non-interactive:
turbovault workspace init \
  --db-engine sqlite3 --db-name db.sqlite3 \
  --stage-schema stage --rdv-schema rdv \
  --admin-username admin --admin-password changeme --admin-email admin@example.com
This creates turbovault.yml, initialises the database, runs all migrations, and populates default templates.
Step 2: Create a project (once per project):
# Interactive wizard
turbovault project init --interactive
# Non-interactive with flags (great for CI/scripts)
turbovault project init --name my_project --source ./metadata.xlsx \
--stage-schema stage --rdv-schema rdv
# Import from a previously exported JSON file (round-trip)
turbovault project init --name my_project --source ./exports/model.json
# Or from a per-project config file
turbovault project init --config config.example.yml
This creates projects/my_project/config.yml and the projects/my_project/exports/ folder.
You can check, define, and change your Data Vault model via the Django Admin interface. To launch the web interface:
# Launch the web interface
turbovault serve
Sign in with the credentials you set during workspace initialization.
# Generate dbt project from your Data Vault model
turbovault generate --project my_project
# Generate with custom output path
turbovault generate --project my_project --output ./my_dbt
# Generate with ZIP archive
turbovault generate --project my_project --zip
# Skip satellite v1 views
turbovault generate --project my_project --no-v1-satellites
| Command | Description |
|---|---|
| turbovault workspace init | Initialise directory as a workspace (creates turbovault.yml + DB) |
| turbovault workspace status | Show workspace health (DB, projects, migrations) |
| turbovault project init | Create a new project in the workspace |
| turbovault project list | List all projects in the workspace |
| turbovault generate | Generate dbt project or export model to JSON / DBML |
| turbovault serve | Start Django admin server for model management |
| turbovault reset | Reset the database |
| turbovault --help | Show all available commands |
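For CI pipelines, it can help to assemble these invocations programmatically. The sketch below builds an argument list for turbovault generate using only flags that appear in this README. The helper name build_generate_args and its parameter names are hypothetical and not part of TurboVault.

```python
from typing import List, Optional


def build_generate_args(
    project: str,
    *,
    output: Optional[str] = None,
    zip_archive: bool = False,
    v1_satellites: bool = True,
    mode: Optional[str] = None,
) -> List[str]:
    """Assemble a `turbovault generate` argument list (hypothetical helper)."""
    args = ["turbovault", "generate", "--project", project]
    if output is not None:
        args += ["--output", output]
    if zip_archive:
        args.append("--zip")
    if not v1_satellites:
        args.append("--no-v1-satellites")
    if mode is not None:
        # Only the two modes named in this README are accepted here.
        if mode not in ("strict", "lenient"):
            raise ValueError(f"unknown mode: {mode}")
        args += ["--mode", mode]
    return args


if __name__ == "__main__":
    # Print the command line; pass the list to subprocess.run(..., check=True) to execute it.
    print(" ".join(build_generate_args("sales_datavault", zip_archive=True, v1_satellites=False)))
```

Building a list rather than a shell string avoids quoting problems when project names or paths contain spaces.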
# --- Workspace ---
# Initialise workspace (non-interactive)
turbovault workspace init --db-engine sqlite3 --db-name db.sqlite3 \
--stage-schema stage --rdv-schema rdv
# Check workspace health
turbovault workspace status
# --- Projects ---
# Interactive project creation
turbovault project init --interactive
# Create from YAML config
turbovault project init --config config.yml
# List all projects
turbovault project list
# --- Generation ---
# Generate dbt project with validation
turbovault generate --project sales_datavault
# Generate in lenient mode (skip invalid entities)
turbovault generate --project sales_datavault --mode lenient
# Generate with ZIP and no v1 satellites
turbovault generate -p sales_datavault --zip --no-v1-satellites
# Export Data Vault model to JSON
turbovault generate --type json --project sales_datavault
# Start admin on custom port
turbovault serve --port 9000
TurboVault Engine uses a comprehensive Data Vault domain model:
| Entity | Description |
|---|---|
| Project | Top-level container for all metadata |
| Group | Logical grouping for organizing entities into subfolders |
| Source System | Database/schema source definitions |
| Source Table | Physical source tables with metadata |
| Hub | Data Vault hubs (standard or reference) |
| Link | Relationships between hubs (standard or non-historized) |
| Satellite | Descriptive attributes for hubs/links (6 types) |
| PIT | Point-in-Time tables for temporal joins |
| Reference Table | Reference data structures |
| Snapshot Control | Temporal snapshot configuration |
- Prejoins - Define cross-table joins for link mappings
- Multi-source support - Multiple sources feeding the same entity
- Satellite variants - Standard, multi-active, effectivity, non-historized, reference, record-tracking
- Template customization - All SQL and YAML templates customizable via Admin
TurboVault uses two config files with clearly separated responsibilities:
{workspace}/
├── turbovault.yml ← workspace-level: database, global defaults
└── projects/
└── my_project/
└── config.yml ← project-level: schemas, naming patterns, output
turbovault.yml is created once by turbovault workspace init. It contains the database connection and optional global defaults:
# Database connection (required)
database:
  engine: sqlite3 # sqlite3 | postgresql | mysql | mssql | snowflake
  name: db.sqlite3
# Optional: global defaults applied to every new project
defaults:
  stage_schema: stage
  rdv_schema: rdv
  bdv_schema: bdv
PostgreSQL example:
database:
  engine: postgresql
  name: turbovault_db
  user: turbovault_user
  password: your_password
  host: localhost
  port: 5432
Supported Databases:
- SQLite (default): no extra packages needed
- PostgreSQL: pip install psycopg2-binary
- MySQL/MariaDB: pip install mysqlclient
- SQL Server: pip install mssql-django
- Oracle: pip install cx_Oracle
- Snowflake: pip install django-snowflake
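Because TurboVault is Django-based, the database block in turbovault.yml ultimately has to become a Django DATABASES entry. This README does not document how TurboVault performs that translation, so the following is only a sketch under assumptions: the function name to_django_database is hypothetical, and the engine-to-backend mapping is inferred from the drivers listed above. Oracle is left out because its engine key is not shown in the turbovault.yml example.

```python
# Assumed mapping from turbovault.yml engine names to Django ENGINE strings.
ENGINE_MAP = {
    "sqlite3": "django.db.backends.sqlite3",
    "postgresql": "django.db.backends.postgresql",
    "mysql": "django.db.backends.mysql",
    "mssql": "mssql",  # backend provided by mssql-django
    "snowflake": "django_snowflake",  # backend provided by django-snowflake
}


def to_django_database(db: dict) -> dict:
    """Translate a `database:` block into a Django DATABASES entry (illustrative only)."""
    engine = db["engine"]
    if engine not in ENGINE_MAP:
        raise ValueError(f"unsupported engine: {engine}")
    entry = {"ENGINE": ENGINE_MAP[engine], "NAME": db["name"]}
    # Copy optional connection fields only when they are present.
    for key in ("user", "password", "host", "port"):
        if key in db:
            entry[key.upper()] = db[key]
    return entry


if __name__ == "__main__":
    print(to_django_database({"engine": "sqlite3", "name": "db.sqlite3"}))
```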
config.yml is created once by turbovault project init. It contains everything specific to one project:
project:
  name: "my_datavault"
  description: "My Data Vault Implementation"
# Optional: import source metadata on project init
source:
  type: excel # excel | sqlite | json
  path: "./metadata/sources.xlsx"
configuration:
  stage_schema: "stage"
  rdv_schema: "rdv"
  bdv_schema: "bdv"
output:
  create_zip: false
See config.example.yml for the full set of options.
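To illustrate how the two files relate, the sketch below resolves schema names from a workspace defaults block and a project configuration block. It assumes that project values take precedence and that workspace defaults fill any gaps. That precedence, and the function name resolve_schemas, are assumptions for illustration and are not confirmed by this README.

```python
SCHEMA_KEYS = ("stage_schema", "rdv_schema", "bdv_schema")


def resolve_schemas(workspace_defaults: dict, project_configuration: dict) -> dict:
    """Project values win; workspace defaults fill the gaps (assumed precedence)."""
    resolved = {}
    for key in SCHEMA_KEYS:
        if key in project_configuration:
            resolved[key] = project_configuration[key]
        elif key in workspace_defaults:
            resolved[key] = workspace_defaults[key]
    return resolved


if __name__ == "__main__":
    defaults = {"stage_schema": "stage", "rdv_schema": "rdv", "bdv_schema": "bdv"}
    project = {"rdv_schema": "raw_vault"}  # overrides only one schema
    print(resolve_schemas(defaults, project))
```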
Documentation:
- Configuration Overview - Two-config system explained with folder structure
- Project Config Schema Reference - Complete config.yml field reference
- Database Configuration Guide - Detailed turbovault.yml database setup
TurboVault Engine collects lightweight, anonymous usage statistics (command invoked, TurboVault version, Python version, OS family, and install type) to help us understand real-world usage and improve the tool. No personal data, project names, or Data Vault model content is ever sent.
Telemetry is enabled by default. To opt out, you can either:
- Set the environment variable: TURBOVAULT_DISABLE_TELEMETRY=1
- Add the following to your turbovault.yml: disable_anonymous_usage_stats: true
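The two opt-out switches can be read as a simple check in which either one disables telemetry. The sketch below is not TurboVault's implementation: the function name telemetry_enabled is hypothetical, and it assumes that only the exact value 1 for the environment variable counts as an opt-out.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(config: dict, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False if either opt-out switch is set (illustrative sketch)."""
    env = os.environ if env is None else env
    # Assumption: only the literal value "1" disables telemetry.
    if env.get("TURBOVAULT_DISABLE_TELEMETRY") == "1":
        return False
    # `config` stands for the parsed contents of turbovault.yml.
    if config.get("disable_anonymous_usage_stats") is True:
        return False
    return True


if __name__ == "__main__":
    print(telemetry_enabled({"disable_anonymous_usage_stats": True}, env={}))
```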
All SQL and YAML templates can be customized:
- Start admin: turbovault serve
- Navigate to: Model Templates in Django Admin
- Edit any template to customize generation
- Higher priority templates are selected first
Templates are automatically populated from files during turbovault workspace init.
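The priority rule can be pictured as choosing, among the templates that apply to an entity type, the one with the largest priority value. The sketch below assumes exactly that reading. The Template fields and the select_template function are hypothetical and do not reflect the actual Django model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Template:
    name: str
    entity_type: str  # hypothetical field, e.g. "hub" or "link"
    priority: int


def select_template(templates: List[Template], entity_type: str) -> Template:
    """Pick the matching template with the highest priority (assumed rule)."""
    candidates = [t for t in templates if t.entity_type == entity_type]
    if not candidates:
        raise LookupError(f"no template for entity type: {entity_type}")
    # On equal priority, max() keeps the first candidate in list order.
    return max(candidates, key=lambda t: t.priority)


if __name__ == "__main__":
    templates = [
        Template("hub_default", "hub", 0),
        Template("hub_custom", "hub", 10),
        Template("link_default", "link", 0),
    ]
    print(select_template(templates, "hub").name)
```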
Advanced / contributor use: The following commands require access to the backend/ Django project.
# Populate templates from files
cd backend && python manage.py populate_templates
# Overwrite existing templates
python manage.py populate_templates --overwrite
Pre-generation validation catches common errors:
| Entity | Rule | Code |
|---|---|---|
| Hub (standard) | Must have hashkey | HUB_001 |
| Hub | Must have ≥1 business key | HUB_002 |
| Link | Must have hashkey | LNK_001 |
| Link | Must reference ≥2 hubs | LNK_002 |
| Satellite | Must have parent entity | SAT_001 |
| Model | SQL generated but YAML missing | YML_001 |
Validation modes:
- --mode strict (default): Stop on first error
- --mode lenient: Skip invalid, continue with valid
- --skip-validation: Skip all validation
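To make the hub and link rules and the difference between strict and lenient modes concrete, here is a minimal sketch. The rule codes come from the table above, while the Hub and Link classes, the check function, and select_valid are hypothetical stand-ins rather than TurboVault's own classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Hub:
    name: str
    hashkey: Optional[str] = None
    business_keys: List[str] = field(default_factory=list)
    is_reference: bool = False


@dataclass
class Link:
    name: str
    hashkey: Optional[str] = None
    hub_names: List[str] = field(default_factory=list)


class ValidationError(Exception):
    """Raised in strict mode on the first invalid entity."""


def check(entity: Union[Hub, Link]) -> List[str]:
    """Return the rule codes that the entity violates."""
    codes: List[str] = []
    if isinstance(entity, Hub):
        if not entity.is_reference and not entity.hashkey:
            codes.append("HUB_001")  # standard hub must have a hashkey
        if not entity.business_keys:
            codes.append("HUB_002")  # hub needs at least one business key
    else:
        if not entity.hashkey:
            codes.append("LNK_001")  # link must have a hashkey
        if len(entity.hub_names) < 2:
            codes.append("LNK_002")  # link must reference at least two hubs
    return codes


def select_valid(entities: List[Union[Hub, Link]], mode: str = "strict") -> List[Union[Hub, Link]]:
    """Strict: raise on the first error. Lenient: skip invalid entities."""
    if mode not in ("strict", "lenient"):
        raise ValueError(f"unknown mode: {mode}")
    valid = []
    for entity in entities:
        codes = check(entity)
        if not codes:
            valid.append(entity)
        elif mode == "strict":
            raise ValidationError(f"{entity.name}: {', '.join(codes)}")
    return valid
```

With a valid hub and a link that references only one hub, lenient mode returns just the hub, while strict mode raises ValidationError naming LNK_002.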
# Export full Data Vault model as JSON
turbovault generate --type json --project my_project
# Custom output path
turbovault generate --type json --project my_project --json-output ./exports/model.json
# Compact format
turbovault generate --type json --project my_project --json-format compact
Exports the complete model to JSON, including:
- Project metadata
- All hubs, links, satellites
- Stage definitions with hashkeys/hashdiffs
- PITs and reference tables
- Snapshot controls
A JSON export can be re-imported as the source for a new project, enabling project migration, backup/restore, and sharing model definitions across workspaces:
# 1. Export the model from the source workspace
turbovault generate --type json --project my_project --json-output ./model.json
# 2. Import it into a new workspace (or project name)
turbovault project init --name my_project_copy --source ./model.json
# Or use a config.yml:
# source:
#   type: json
#   path: "./model.json"
Everything (hubs, links, satellites, stages, snapshot controls, PITs, reference tables) is restored exactly as it was in the original project.
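One way to sanity-check a round trip is to export the original and the re-imported project and compare the two JSON documents by content rather than by text, so that pretty and compact formatting do not matter. The sketch below uses only the standard library. The function name same_model and the keys in the sample model are hypothetical, since the export schema is not shown here.

```python
import json


def same_model(export_a: str, export_b: str) -> bool:
    """Compare two JSON documents by content, ignoring whitespace and key order."""
    return json.loads(export_a) == json.loads(export_b)


if __name__ == "__main__":
    # Hypothetical model content, used only to demonstrate the comparison.
    model = {"hubs": [{"name": "customer"}], "links": []}
    pretty = json.dumps(model, indent=2)
    compact = json.dumps(model, separators=(",", ":"))
    print(same_model(pretty, compact))
```

List order is still significant in this comparison, and fields that legitimately differ between the two projects, such as the project name, would need to be removed before comparing.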
# Export Data Vault model as a DBML diagram
turbovault generate --type dbml --project my_project
# Custom output path
turbovault generate --type dbml --project my_project --dbml-output ./exports/model.dbml
Exports the model as DBML (Database Markup Language), which can be rendered in dbdiagram.io to visualize entity relationships.
turbovault generate --project my_project
Generates a ready-to-use dbt project with:
- SQL models using datavault4dbt macros
- YAML schemas for all models
- Complete folder structure
- packages.yml with datavault4dbt dependency
We welcome and appreciate community contributions! To keep the project sustainable while ensuring the software remains open and accessible, we follow a Dual-Licensing model.
This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0).
The AGPL is a "strong copyleft" license. If you modify this software and provide it as a service over a network (SaaS), you must make your modified source code available to your users under the same license.
To contribute code, all contributors are required to sign our Contributor License Agreement (CLA).
- Why? This ensures that you have the right to contribute the code and grants us the necessary rights to include your work in future versions of the project, including potential commercial or non-AGPL distributions.
- How? FIXME
We understand that the AGPL-3.0 may not be suitable for every organization's internal policies or proprietary products.
If you wish to use this project in a commercial or proprietary setting without the "copyleft" requirements of the AGPL, we offer alternative commercial licenses. This allows you to:
- Use the software without disclosing your own source code.
- Receive dedicated support and enterprise-grade warranties.
- Support the development team.
Please contact us at contact@scalefree.com to discuss a commercial license tailored to your needs.
Getting Started
Configuration
- Configuration Overview
- Database Configuration Guide
- Project Config Schema Reference
- Environment Variables Reference
Concepts
- Architecture Overview
- Architecture Details
- Domain Model Specification
- Excel Metadata Format
- JSON Import (Round-Trip)
- Validation Rules Reference
This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0) - see the LICENSE file for details.
Built with:
- Django - Web framework
- Typer - CLI framework
- Rich - Terminal formatting
- Pydantic - Data validation
- Jinja2 - Template engine
- datavault4dbt - dbt macros
Built by Scalefree
