TurboVault Engine

Transform source metadata into production-ready Data Vault dbt projects

🎯 What is TurboVault Engine?

TurboVault Engine is a CLI-first, Django-based automation engine that accelerates Data Vault 2.0 implementations. It:

Ingests source metadata from Excel files, database catalogs, or previously exported JSON files
Maps metadata into a consistent Data Vault domain model (Hubs, Links, Satellites)
Generates complete, production-ready dbt projects with datavault4dbt macros
Validates your model before generation with comprehensive error checking

Perfect for: Data Engineers looking to rapidly prototype, standardize, or automate their Data Vault implementations.

✨ Key Features

🏗️ Complete dbt Project Generation

Automatic model generation - SQL models with datavault4dbt macros
YAML schemas - Complete dbt documentation for all models
Organized structure - Clean folder hierarchy (staging, raw_vault, business_vault)
Template customization - Customize any template via Django Admin
Validation - Pre-generation checks to catch errors early

📦 Data Vault Modeling

Hubs - Standard and reference hubs with business keys
Links - Standard and non-historized links connecting multiple hubs
Satellites - Standard, multi-active, non-historized, effectivity, reference, and record-tracking satellites
PITs - Point-in-Time table generation
Reference Tables - Reference data modeling
Snapshot Controls - Configurable snapshot logic for temporal tracking

🔧 Source Management

Source Systems - Define database schemas and connections
Source Tables - Map physical tables with record source and load date
Prejoins - Cross-table joins for complex link mappings
Stage Models - Automatic staging layer with hashkeys and hashdiffs
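
To make the hashkey and hashdiff terms concrete, here is a minimal Python sketch of the idea. It is an illustration only: in a generated project the hashing is done in SQL by the datavault4dbt macros, and the function names, the `||` delimiter, the normalization rules, and the choice of MD5 below are all assumptions made for this example.

```python
import hashlib


def _text(value):
    # Treat None as an empty string and trim surrounding whitespace.
    return "" if value is None else str(value).strip()


def hashkey(business_keys, delimiter="||"):
    # Upper-case the business keys so "c-1001" and "C-1001" map to the same hub row.
    joined = delimiter.join(_text(v).upper() for v in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def hashdiff(attributes, delimiter="||"):
    # Sort by column name so the result does not depend on dict ordering.
    joined = delimiter.join(_text(attributes[name]) for name in sorted(attributes))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


customer_hk = hashkey(["c-1001 "])
same_hk = hashkey(["C-1001"])
```

A hashkey identifies a hub or link row from its business keys, while a hashdiff changes whenever any descriptive attribute changes, which is how a satellite load can detect new versions of a record.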

🖥️ Developer Experience

Modern CLI - Built with Typer and Rich for beautiful terminal output
Web Initializer - Interactive, multi-step project creation wizard
Django Admin - Full web interface for model and template management
Config-Driven - YAML configuration for automation and CI/CD
Comprehensive Testing - pytest test suite with 20+ tests

🚀 Quick Start

Prerequisites

Python 3.12+
pip (Python package manager)
(Optional) Database drivers if using external databases:
- PostgreSQL: psycopg2-binary
- MySQL: mysqlclient
- SQL Server: mssql-django
- Oracle: cx_Oracle
- Snowflake: django-snowflake

Installation

Install from PyPI:

pip install turbovault-engine

Install directly from GitHub (latest development version):

pip install git+https://github.com/ScalefreeCOM/turbovault-engine.git

We recommend installing into a dedicated virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install turbovault-engine

Initialize Your Workspace & First Project

TurboVault uses a two-step setup. First, create and enter a dedicated folder for your workspace:

mkdir my-turbovault-workspace
cd my-turbovault-workspace

Step 1 — Initialise the workspace (once per directory):

# Interactive (recommended for first time)
turbovault workspace init

# Or fully non-interactive:
turbovault workspace init \
  --db-engine sqlite3 --db-name db.sqlite3 \
  --stage-schema stage --rdv-schema rdv \
  --admin-username admin --admin-password changeme --admin-email admin@example.com

This creates turbovault.yml, initialises the database, runs all migrations, and populates default templates.

Step 2 — Create a project (once per project):

# Interactive wizard
turbovault project init --interactive

# Non-interactive with flags (great for CI/scripts)
turbovault project init --name my_project --source ./metadata.xlsx \
  --stage-schema stage --rdv-schema rdv

# Import from a previously exported JSON file (round-trip)
turbovault project init --name my_project --source ./exports/model.json

# Or from a per-project config file
turbovault project init --config config.example.yml

This creates projects/my_project/config.yml and the projects/my_project/exports/ folder.

Populate and Maintain Your Data Vault Model

You can check, define, and change your Data Vault model via the Django Admin interface. To launch the web interface:

# Launch the web interface
turbovault serve

Sign in with the admin credentials you set during workspace initialization.

Generate Your dbt Project

# Generate dbt project from your Data Vault model
turbovault generate --project my_project

# Generate with custom output path
turbovault generate --project my_project --output ./my_dbt

# Generate with ZIP archive
turbovault generate --project my_project --zip

# Skip satellite v1 views
turbovault generate --project my_project --no-v1-satellites
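
For scripted or CI use, the flags above can be assembled programmatically. The following is a small sketch, not part of TurboVault itself: the helper name `build_generate_command` and its parameters are made up for this example, and it only uses the flags shown in this section.

```python
import shlex


def build_generate_command(project, output=None, make_zip=False, v1_satellites=True):
    # Start from the minimal invocation documented above.
    cmd = ["turbovault", "generate", "--project", project]
    if output:
        cmd += ["--output", output]
    if make_zip:
        cmd.append("--zip")
    if not v1_satellites:
        cmd.append("--no-v1-satellites")
    return cmd


cmd = build_generate_command("my_project", output="./my_dbt", make_zip=True)
# Print the command in a copy-pasteable, shell-quoted form.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from a directory that has already been initialized as a workspace.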

📋 CLI Commands

| Command | Description |
| --- | --- |
| `turbovault workspace init` | Initialise directory as a workspace (creates `turbovault.yml` + DB) |
| `turbovault workspace status` | Show workspace health (DB, projects, migrations) |
| `turbovault project init` | Create a new project in the workspace |
| `turbovault project list` | List all projects in the workspace |
| `turbovault generate` | Generate dbt project or export model to JSON / DBML |
| `turbovault serve` | Start Django admin server for model management |
| `turbovault reset` | Reset the database |
| `turbovault --help` | Show all available commands |

Command Examples

# --- Workspace ---
# Initialise workspace (non-interactive)
turbovault workspace init --db-engine sqlite3 --db-name db.sqlite3 \
  --stage-schema stage --rdv-schema rdv

# Check workspace health
turbovault workspace status

# --- Projects ---
# Interactive project creation
turbovault project init --interactive

# Create from YAML config
turbovault project init --config config.yml

# List all projects
turbovault project list

# --- Generation ---
# Generate dbt project with validation
turbovault generate --project sales_datavault

# Generate in lenient mode (skip invalid entities)
turbovault generate --project sales_datavault --mode lenient

# Generate with ZIP and no v1 satellites
turbovault generate -p sales_datavault --zip --no-v1-satellites

# Export Data Vault model to JSON
turbovault generate --type json --project sales_datavault

# Start admin on custom port
turbovault serve --port 9000

🗄️ Domain Model

TurboVault Engine uses a comprehensive Data Vault domain model:

Core Entities

| Entity | Description |
| --- | --- |
| Project | Top-level container for all metadata |
| Group | Logical grouping for organizing entities into subfolders |
| Source System | Database/schema source definitions |
| Source Table | Physical source tables with metadata |
| Hub | Data Vault hubs (standard or reference) |
| Link | Relationships between hubs (standard or non-historized) |
| Satellite | Descriptive attributes for hubs/links (6 types) |
| PIT | Point-in-Time tables for temporal joins |
| Reference Table | Reference data structures |
| Snapshot Control | Temporal snapshot configuration |

Advanced Features

Prejoins - Define cross-table joins for link mappings
Multi-source support - Multiple sources feeding the same entity
Satellite variants - Standard, multi-active, effectivity, non-historized, reference, record-tracking
Template customization - All SQL and YAML templates customizable via Admin

⚙️ Configuration

TurboVault uses two config files with clearly separated responsibilities:

{workspace}/
├── turbovault.yml              ← workspace-level: database, global defaults
└── projects/
    └── my_project/
        └── config.yml          ← project-level: schemas, naming patterns, output
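
As a quick way to check that a directory follows this layout, the sketch below looks for the workspace config and collects the per-project configs. The function name `find_config_files` is hypothetical, and the demo builds a throwaway workspace in a temporary directory rather than touching a real one.

```python
import tempfile
from pathlib import Path


def find_config_files(workspace):
    # Return (workspace config path or None, {project name: project config path}).
    workspace = Path(workspace)
    workspace_config = workspace / "turbovault.yml"
    project_configs = {
        path.parent.name: path
        for path in sorted(workspace.glob("projects/*/config.yml"))
    }
    return (workspace_config if workspace_config.is_file() else None), project_configs


# Demo against a temporary directory that mimics the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "turbovault.yml").write_text("database:\n  engine: sqlite3\n")
    (root / "projects" / "my_project").mkdir(parents=True)
    (root / "projects" / "my_project" / "config.yml").write_text("project:\n  name: my_project\n")
    ws_cfg, proj_cfgs = find_config_files(root)
    has_workspace_config = ws_cfg is not None
    found_projects = sorted(proj_cfgs)
```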

`turbovault.yml` — Workspace Config

Created once by turbovault workspace init. Contains the database connection and optional global defaults:

# Database connection (required)
database:
  engine: sqlite3       # sqlite3 | postgresql | mysql | mssql | snowflake
  name: db.sqlite3

# Optional: global defaults applied to every new project
defaults:
  stage_schema: stage
  rdv_schema: rdv
  bdv_schema: bdv

PostgreSQL example:

database:
  engine: postgresql
  name: turbovault_db
  user: turbovault_user
  password: your_password
  host: localhost
  port: 5432

Supported Databases:

SQLite (default) — no extra packages needed
PostgreSQL — pip install psycopg2-binary
MySQL/MariaDB — pip install mysqlclient
SQL Server — pip install mssql-django
Oracle — pip install cx_Oracle
Snowflake — pip install django-snowflake
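
Before pointing `turbovault.yml` at an external database, it can help to confirm that the matching driver is installed. The sketch below is one possible check, under several assumptions: the helper `missing_driver` is hypothetical, the import names are the ones these pip packages are generally known to provide, and the `oracle` engine key is a guess because the engine list in the config comment does not include it.

```python
import importlib.util

# engine -> (import name, pip package); None means no extra driver is needed.
DRIVERS = {
    "sqlite3": None,
    "postgresql": ("psycopg2", "psycopg2-binary"),
    "mysql": ("MySQLdb", "mysqlclient"),
    "mssql": ("mssql", "mssql-django"),
    "oracle": ("cx_Oracle", "cx_Oracle"),  # engine key assumed for this example
    "snowflake": ("django_snowflake", "django-snowflake"),
}


def missing_driver(engine):
    # Return the pip package to install, or None if nothing is missing.
    if engine not in DRIVERS:
        raise ValueError(f"unknown engine: {engine}")
    entry = DRIVERS[engine]
    if entry is None:
        return None
    module_name, package = entry
    return None if importlib.util.find_spec(module_name) else package


package = missing_driver("postgresql")
if package:
    print(f"Driver not found, try: pip install {package}")
```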

`projects/<name>/config.yml` — Project Config

Created once by turbovault project init. Contains everything specific to one project:

project:
  name: "my_datavault"
  description: "My Data Vault Implementation"

# Optional: import source metadata on project init
source:
  type: excel          # excel | sqlite | json
  path: "./metadata/sources.xlsx"

configuration:
  stage_schema: "stage"
  rdv_schema: "rdv"
  bdv_schema: "bdv"

output:
  create_zip: false

See config.example.yml for the full set of options.

Documentation:

Configuration Overview - Two-config system explained with folder structure
Project Config Schema Reference - Complete config.yml field reference
Database Configuration Guide - Detailed turbovault.yml database setup

📊 Anonymous Usage Statistics

TurboVault Engine collects lightweight, anonymous usage statistics (command invoked, TurboVault version, Python version, OS family, and install type) to help us understand real-world usage and improve the tool. No personal data, project names, or Data Vault model content is ever sent.

Telemetry is enabled by default. To opt out, you can either:

Set the environment variable: TURBOVAULT_DISABLE_TELEMETRY=1
Add the following to your turbovault.yml:
```
disable_anonymous_usage_stats: true
```
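
When TurboVault is launched from a Python script or CI job, the environment variable can be set for the child process only. This is a minimal sketch; the helper name `env_without_telemetry` is made up for this example, and only the variable name comes from this section.

```python
import os


def env_without_telemetry(base_env=None):
    # Copy the environment so the current process is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["TURBOVAULT_DISABLE_TELEMETRY"] = "1"
    return env


env = env_without_telemetry({"PATH": "/usr/bin"})
```

The returned mapping can be passed as `env=` to `subprocess.run`, for example `subprocess.run(["turbovault", "workspace", "status"], env=env_without_telemetry(), check=True)`.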

🎨 Template Customization

All SQL and YAML templates can be customized:

Start admin: turbovault serve
Navigate to: Model Templates in Django Admin
Edit any template to customize generation
Higher priority templates are selected first

Templates are automatically populated from files during turbovault workspace init.
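
The priority rule can be pictured with the following sketch. It is not TurboVault's actual implementation: the `Template` class, its fields, and `select_template` are hypothetical names chosen to show what "higher priority templates are selected first" means when several templates match the same entity type.

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str
    entity_type: str
    priority: int


def select_template(templates, entity_type):
    # Keep only templates for this entity type, then take the highest priority.
    candidates = [t for t in templates if t.entity_type == entity_type]
    if not candidates:
        raise LookupError(f"no template for entity type: {entity_type}")
    return max(candidates, key=lambda t: t.priority)


templates = [
    Template("default_hub", "hub", priority=0),
    Template("custom_hub", "hub", priority=10),
    Template("default_link", "link", priority=0),
]
chosen = select_template(templates, "hub")
```

Under this reading, a customized template only needs a higher priority than the default one to take effect.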

Manual Template Management

Advanced / contributor use: The following commands require access to the backend/ Django project.

# Populate templates from files
cd backend && python manage.py populate_templates

# Overwrite existing templates
python manage.py populate_templates --overwrite

✅ Validation

Pre-generation validation catches common errors:

| Entity | Rule | Code |
| --- | --- | --- |
| Hub (standard) | Must have hashkey | HUB_001 |
| Hub | Must have ≥1 business key | HUB_002 |
| Link | Must have hashkey | LNK_001 |
| Link | Must reference ≥2 hubs | LNK_002 |
| Satellite | Must have parent entity | SAT_001 |
| Model | SQL generated but YAML missing | YML_001 |

Validation modes:

--mode strict (default): Stop on first error
--mode lenient: Skip invalid, continue with valid
--skip-validation: Skip all validation
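
The difference between strict and lenient mode can be sketched in a few lines of Python. This is a simplified illustration, not the engine's code: the `Hub` class and both functions are hypothetical, and only the two hub rules (HUB_001, HUB_002) from the table above are modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hub:
    name: str
    hashkey: Optional[str] = None
    business_keys: List[str] = field(default_factory=list)


def validate_hub(hub):
    # Collect (code, message) pairs for every rule the hub violates.
    errors = []
    if not hub.hashkey:
        errors.append(("HUB_001", f"{hub.name}: missing hashkey"))
    if not hub.business_keys:
        errors.append(("HUB_002", f"{hub.name}: needs at least one business key"))
    return errors


def select_valid(hubs, mode="strict"):
    valid = []
    for hub in hubs:
        errors = validate_hub(hub)
        if errors and mode == "strict":
            # Strict mode stops at the first invalid entity.
            raise ValueError(f"{errors[0][0]}: {errors[0][1]}")
        if not errors:
            valid.append(hub)
    return valid


hubs = [Hub("customer_h", "hk_customer_h", ["customer_id"]), Hub("order_h")]
lenient_names = [h.name for h in select_valid(hubs, mode="lenient")]
```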

📤 Export Formats

JSON Export

# Export full Data Vault model as JSON
turbovault generate --type json --project my_project

# Custom output path
turbovault generate --type json --project my_project --json-output ./exports/model.json

# Compact format
turbovault generate --type json --project my_project --json-format compact

Exports the complete model to JSON, including:

Project metadata
All hubs, links, satellites
Stage definitions with hashkeys/hashdiffs
PITs and reference tables
Snapshot controls

JSON Import (Round-Trip)

A JSON export can be re-imported as the source for a new project, enabling project migration, backup/restore, and sharing model definitions across workspaces:

# 1. Export the model from the source workspace
turbovault generate --type json --project my_project --json-output ./model.json

# 2. Import it into a new workspace (or project name)
turbovault project init --name my_project_copy --source ./model.json

# Or use a config.yml:
# source:
#   type: json
#   path: "./model.json"

Everything — hubs, links, satellites, stages, snapshot controls, PITs, reference tables — is restored exactly as it was in the original project.
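
One way to sanity-check a round trip is to export the copied project as well and compare the two JSON documents. The sketch below assumes nothing about the export schema beyond it being valid JSON. The `hubs` key in the sample data is invented for the demo, and a real export may contain fields such as project names that legitimately differ between the original and the copy.

```python
import json


def same_model(export_a, export_b):
    # Parsed dicts compare equal regardless of key order; list order still matters.
    return json.loads(export_a) == json.loads(export_b)


original = '{"hubs": [{"name": "customer_h"}], "links": []}'
reordered = '{"links": [], "hubs": [{"name": "customer_h"}]}'
changed = '{"hubs": [{"name": "order_h"}], "links": []}'

round_trip_ok = same_model(original, reordered)
detects_change = not same_model(original, changed)
```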

DBML Export

# Export Data Vault model as a DBML diagram
turbovault generate --type dbml --project my_project

# Custom output path
turbovault generate --type dbml --project my_project --dbml-output ./exports/model.dbml

Exports the model as DBML (Database Markup Language), which can be rendered in dbdiagram.io to visualize entity relationships.

dbt Project

turbovault generate --project my_project

Generates a ready-to-use dbt project with:

SQL models using datavault4dbt macros
YAML schemas for all models
Complete folder structure
packages.yml with datavault4dbt dependency

🤝 Contributing

We welcome and appreciate community contributions! To keep the project sustainable while ensuring the software remains open and accessible, we follow a dual-licensing model.

📜 Licensing & Open Source

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0).

The AGPL is a "strong copyleft" license. If you modify this software and provide it as a service over a network (SaaS), you must make your modified source code available to your users under the same license.

✍️ Contributor License Agreement (CLA)

To contribute code, all contributors are required to sign our Contributor License Agreement (CLA).

Why? This ensures that you have the right to contribute the code and grants us the necessary rights to include your work in future versions of the project, including potential commercial or non-AGPL distributions.
How? FIXME

💼 Commercial Usage & Licensing

We understand that the AGPL-3.0 may not be suitable for every organization's internal policies or proprietary products.

If you wish to use this project in a commercial or proprietary setting without the "copyleft" requirements of the AGPL, we offer alternative commercial licenses. This allows you to:

Use the software without disclosing your own source code.
Receive dedicated support and enterprise-grade warranties.
Support the development team.

Please contact us at contact@scalefree.com to discuss a commercial license tailored to your needs.

📚 Documentation

Getting Started

Configuration

Concepts

📄 License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0) - see the LICENSE file for details.

🙏 Acknowledgements

Built with:

Django - Web framework
Typer - CLI framework
Rich - Terminal formatting
Pydantic - Data validation
Jinja2 - Template engine
datavault4dbt - dbt macros

Built by Scalefree

Documentation · Changelog · Report Bug · Request Feature
