A reproducible PostgreSQL 18 platform delivered as code. core_data builds a hardened database image with spatial, vector, and graph extensions and ships a management CLI that automates backups, restores, QA cloning, and upgrades. Everything lives in version control so environments can be rebuilt consistently across laptops, CI, and production.
- Run the same Postgres 18 stack everywhere: laptop, CI runner, or production.
- Ship with the heavy hitters pre-installed—PostGIS, pgvector, AGE, pg_cron, pgBackRest—without custom build scripts.
- Automate the boring-but-critical tasks: backups, restores, QA clones, log analytics, and even major version upgrades via pgautoupgrade.
- Treat your database like code with reproducible
.envconfigs, templated init scripts, and a pytest smoke test that catches regressions early.
Pre-built images are available from GitHub Container Registry with full SLSA attestations and SBOM for supply chain security.
# Pull the latest stable release
docker pull ghcr.io/paudley/core-data-postgres:latest
# Or specify a version
docker pull ghcr.io/paudley/core-data-postgres:18.4-v1.0.0Images follow a hybrid versioning strategy combining PostgreSQL version with semantic versioning:
| Tag Pattern | Example | Description |
|---|---|---|
latest |
latest |
Latest stable release |
{PG_VERSION}-v{MAJOR}.{MINOR}.{PATCH} |
18.4-v1.0.0 |
Exact version (recommended for production) |
{PG_VERSION}-v{MAJOR}.{MINOR} |
18.4-v1.0 |
Latest patch for minor version |
{PG_VERSION}-v{MAJOR} |
18.4-v1 |
Latest minor for major version |
{PG_VERSION} |
18.4 |
Latest semantic version for PostgreSQL version |
{PG_MAJOR} |
18 |
Latest for PostgreSQL major version |
Version Format: {PostgreSQL_Version}-v{Semantic_Version}
- Example:
18.4-v1.0.0means PostgreSQL 18.4 with semantic version 1.0.0 - See docs/RELEASING.md for complete versioning details
All published images include cryptographic attestations that prove build provenance and supply chain integrity:
# Verify image attestation (requires GitHub CLI)
gh attestation verify oci://ghcr.io/paudley/core-data-postgres:18.4-v1.0.0 \
--owner paudley
# Pull and verify in one step
docker pull ghcr.io/paudley/core-data-postgres:18.4-v1.0.0
gh attestation verify oci://ghcr.io/paudley/core-data-postgres:18.4-v1.0.0 \
--owner paudley
# Verify every referenced image from your .env (pretty console output)
./scripts/manage.sh attestation-verify --env-file ci.env.exampleGitHub invariably injects a repository-scoped GITHUB_TOKEN when workflows run. If that token cannot read paudley/core_data, the helper automatically retries the same verification anonymously (clearing GH_TOKEN/GITHUB_TOKEN) so public attestations still succeed without extra flags.
Every service in the stack is published with the same attested tag:
| Service | Image reference |
|---|---|
| PostgreSQL | ghcr.io/paudley/core_data/postgres:<stack-tag> |
| ValKey | ghcr.io/paudley/core_data/valkey:<stack-tag> |
| PgBouncer | ghcr.io/paudley/core_data/pgbouncer:<stack-tag> |
| RabbitMQ | ghcr.io/paudley/core_data/rabbitmq:<stack-tag> |
| Memcached | ghcr.io/paudley/core_data/memcached:<stack-tag> |
| Network Probe | ghcr.io/paudley/core_data/network-probe:<stack-tag> |
| Network Guard | ghcr.io/paudley/core_data/network-guard:<stack-tag> |
Use the optional CORE_DATA_STACK_TAG / CORE_DATA_STACK_REGISTRY environment variables (or .env entries) to pin a specific release across every helper; otherwise the defaults use the latest tag from ghcr.io/paudley/core_data.
Every container publishes /README.md and /SECURITY.md (copies of this repository's documentation) so you can inspect provenance and policy from inside the image with a simple ls.
Expected verification output:
✓ Verification succeeded!
sha256:abc123... was attested by:
REPO PREDICATE_TYPE WORKFLOW
paudley/core_data https://slsa.dev/provenance/v1 .github/workflows/publish-docker.yml@refs/tags/18.4-v1.0.0
What's Verified:
- SLSA Provenance: Confirms the image was built by GitHub Actions from this repository
- SBOM: Software Bill of Materials listing all components and dependencies
- Build Reproducibility: Links image digest to specific source code commit
Learn More:
Update your docker-compose.yml to use the published image instead of building locally:
services:
postgres:
image: ghcr.io/paudley/core-data-postgres:18.4-v1.0.0 # Use published image
# Remove or comment out the 'build:' section
# build:
# context: .
# dockerfile: postgres/DockerfileOr override via .env:
POSTGRES_IMAGE_NAME=ghcr.io/paudley/core-data-postgres
POSTGRES_IMAGE_TAG=18.4-v1.0.0Pre-built images from ghcr.io have the postgres user baked in with UID/GID 999. Your .env must match:
POSTGRES_UID=999
POSTGRES_GID=999Why this matters: The volume_prep service uses these values to chown data directories. If they don't match the image's baked-in UID, you'll see permission errors like:
chmod: changing permissions of '/var/lib/postgresql/data': Operation not permitted
Building locally? If you set CORE_DATA_BUILD_IMAGE=1, you can use your host user's UID/GID instead—the image build process will create the postgres user with your specified IDs.
- Custom Docker image with PostGIS, pgvector, Apache AGE, pgsodium, gzip, zstd, pg_cron, pg_squeeze, pgAudit, pgBadger, pgBackRest, and pgtune baked in.
- Init scripts render configuration from templates, create application databases, and enable extensions automatically.
./scripts/manage.shwraps lifecycle tasks: image builds,psql, logical dumps, pgBackRest backups/restores, QA cloning, log analysis, daily maintenance, and major upgrades via pgautoupgrade.- PGDATA, WAL, pgBackRest, and monitoring state live under
./data/bind mounts by default, whileBACKUPS_HOST_PATH(defaults to./backups) remains a bind mount for easy artifact exports. - Secrets stay in Docker secrets (
POSTGRES_PASSWORD_FILE) and the container runs as the non-rootpostgresUID/GID at all times, keeping the least-privilege posture consistent across init and steady state. - TLS is enforced by default with auto-generated self-signed certificates (override with your own CA material), and a multi-stage health probe (
scripts/healthcheck.sh) guards dependent services before they start. - Logging uses Docker's
localdriver with rotation and non-blocking delivery, preventing runaway JSON logs from filling the host while preserving enough history for incident response. - ValKey, PgBouncer, RabbitMQ, Memcached, Prometheus, Grafana, and required exporters are part of the standard stack with sensible defaults, secrets, metrics, dashboards, and CLI helpers.
- CI smoke test (
python -m pytest -k full_workflow) provisions a stack, exercises critical commands, and verifies upgrade safety.
core_data provisions a batteries-included extension stack in every non-template database at init time:
- Performance & Observability —
pg_stat_statements,auto_explain,pg_buffercache,pg_prewarm,bloom. - Security & Compliance —
pgaudit,pgcrypto,pgsodium(Transparent Column Encryption),"uuid-ossp". - Compression —
gzip(gzip/gunzip in SQL),zstd(Zstandard compress/decompress with dictionary support in SQL). - Developer Ergonomics —
hstore,citext,pg_trgm,btree_gin,btree_gist,hypopg,intarray,ltree,tablefunc,unaccent. - Connectivity —
postgres_fdw,dblink. - Spatial, Vector, Graph —
postgis,postgis_raster,postgis_topology,vector,age,cube,earthdistance. - Maintenance & Testing —
pg_cron(kept in thepostgresdatabase),pg_partman,pg_repack,pg_squeeze,pgstattuple,pgtap. - Geospatial Extras —
postgis_tiger_geocoder,address_standardizer,address_standardizer_data_us,pgrouting,fuzzystrmatch.
The same bundle is installed into template1 so freshly created databases inherit the tooling automatically.
pgsodium requires a 32-byte root encryption key for Transparent Column Encryption. The key is stored at secrets/pgsodium.key and bind-mounted read-only into the container at /opt/core_data/secrets/pgsodium.key. The create-env wizard and bootstrap-ci generate this key automatically. To generate one manually:
head -c 32 /dev/urandom | od -A n -t x1 | tr -d ' \n' > secrets/pgsodium.key
chmod 600 secrets/pgsodium.keyThis key is irreplaceable. If lost, any data encrypted with pgsodium's Transparent Column Encryption becomes unrecoverable. Back it up alongside your other secret material.
pg_partman_bgw is preloaded with a one-hour interval targeting the postgres database under the postgres superuser. Adjust pg_partman_bgw.dbname/role in postgresql.conf.tpl (or via postgresql.pgtune.conf) if you manage partitions from a different control schema.
Use ./scripts/manage.sh partman-show-config to inspect tracked parents, partman-maintenance to run run_maintenance_proc() on demand, and partman-create-parent schema.table control_column '1 day' to bootstrap new partition sets without hand-writing SQL.
Run ./scripts/manage.sh async-queue bootstrap when you want a lightweight background-job queue. It provisions an asyncq.jobs table plus helpers (enqueue, dequeue, complete, fail, extend_lease) that rely on FOR UPDATE SKIP LOCKED, pg_notify, and UUID leasing. Point a worker at the queue with SELECT * FROM asyncq.dequeue('default'); in a loop and call asyncq.complete(...) or asyncq.fail(...) as you process jobs.
-
Bootstrap environment config:
./scripts/manage.sh create-envto walk through password creation, host UID/GID selection, and resource sizing (writes.env+ secrets). -
Build and start the stack:
./scripts/manage.sh build-image ./scripts/manage.sh up
-
Verify health and monitoring:
docker compose exec postgres /opt/core_data/scripts/healthcheck.sh ./scripts/manage.sh psql -c 'SELECT 1;' curl -fsS http://127.0.0.1:9090/-/healthy
-
Explore the CLI:
./scripts/manage.sh helpfor grouped commands (lifecycle, CI, backups, audits, extensions, cache/messaging, security). -
CI runners: see
CI_USAGE.mdfor the pipelines-focused flow (attestations,ci-up/ci-down, pytest markers).
We optimize for data infrastructure as code ➜ automated admin ➜ human-friendly ➜ best-practices by default (including security and performance). In practice that means:
- Data infrastructure as code. Declarative, reproducible Postgres once, versioned forever.
- Automated admin. Every recurring task should be scriptable and CI-friendly before we worry about shell ergonomics.
- Human-friendly. CLI helpers, sensible prompts, and clear docs matter—but only after the first two goals are met.
- Best-practices by default. Security and performance guardrails (cap drops, TLS, tuned memory, backups) are enabled out of the box, with deliberate escape hatches when absolutely required.
See docs/security_philosophy.md for how capability hardening and related controls fit into that priority order.
- Resource guardrails. Container memory, CPU, and shared memory limits come from
.env, keeping pgtune advice and runtime constraints aligned. AdjustPOSTGRES_MEMORY_LIMIT,POSTGRES_CPU_LIMIT, andPOSTGRES_SHM_SIZEto match the host. - TLS everywhere. PostgreSQL refuses non-SSL connections from the bridge network. Provide your own certificate/key via Docker secrets or rely on the init hook to mint a self-signed pair under
${PGDATA}/tls. - Bind-mounted persistent state. PostgreSQL data, WAL, pgBackRest, ValKey, RabbitMQ, Prometheus, and Grafana state live under
./data/by default so ownership and backups are explicit. - Non-root from the start. A one-shot
volume_prephelper chowns the volumes before Postgres launches so the main service and sidecars run as your host user by default (UID/GID${POSTGRES_UID}), keeping file ownership consistent across deployments. Supply alternative IDs only when required. - Automated logical backups. The
logical_backupsidecar runspg_dump/pg_dumpallon the cadence defined byLOGICAL_BACKUP_INTERVAL_SECONDS, writes into${BACKUPS_HOST_PATH}/logical, validates custom dumps withpg_restore --list, records manifests and_SUCCESSmarkers, exports Prometheus metrics on port9188, and skips any databases listed inLOGICAL_BACKUP_EXCLUDE(defaults topostgres).daily-maintenancecaptures the latest run inlogical_backup_status.txtfor auditing. - Required monitoring. Prometheus scrapes PostgreSQL, PgBouncer, logical backups, RabbitMQ, ValKey, Memcached, host metrics, and container metrics. Grafana is provisioned with the Prometheus datasource and a Core Data overview dashboard. Configure ports, retention, and credentials with the
PROMETHEUS_*,GRAFANA_*, and*_EXPORTER_*variables. - Composable health check.
scripts/healthcheck.shverifies readiness, executesSELECT 1, and optionally enforces replication lag ceilings before dependents start. - Rotated container logs. Docker's
localdriver with non-blocking delivery prevents runaway JSON files while retaining compressed history for incident response. - Required service set. Cache, pooling, messaging, and monitoring services are started by the normal compose flow.
COMPOSE_PROFILESis intentionally empty by default. - Seccomp baseline. Shipping profiles in
seccomp/cover each service (postgres.json,logical_backup.json,pgbouncer.json,valkey.json,memcached.json, plusdocker-default.jsonreused for RabbitMQ)../scripts/manage.sh seccomp-statusreports the active spec,seccomp-verifygates compose configs, anddocs/security_philosophy.mdoutlines how to regenerate traces when you need to tighten them further. - AppArmor (opt-in). Minimal profiles live in
apparmor/core_data_minimal.profile. Load them with./scripts/manage.sh apparmor-load(sudo), then setCORE_DATA_APPARMOR_<SERVICE>=apparmor:core_data_minimalin.envfor each service you want to confine. The profile denies access to high-value host paths (/root,/etc/shadow, Docker socket) while leaving normal container paths alone.
- ValKey — Requires authentication by default (
valkey_passwordsecret), persists to thevalkey_datavolume (appendonly yes), exposesvalkey-cli/valkey-bgsave, and is tuned via.envknobs such asVALKEY_MAXMEMORYandVALKEY_MAXMEMORY_POLICY. - PgBouncer — Uses SCRAM auth backed by a dedicated superuser, renders config/userlist from templates, and offers
pgbouncer-stats/pgbouncer-poolshelpers. Pool sizing and admin/stat users are driven by thePGBOUNCER_*variables. - Memcached — Lightweight hot cache with configurable memory, connection, and thread limits (
MEMCACHED_*). Thememcached-statshelper pipesstatsoutput for quick validation. All services stay on the internal bridge network by default. - RabbitMQ — Delivers AMQP 0.9.1 alongside the management API. Credentials and cluster identity come from
secrets/rabbitmq_default_passandsecrets/rabbitmq_erlang_cookie; daily maintenance captures definitions/status snapshots, and on-demand helpers (rabbitmq-ctl,rabbitmq-diagnostics,rabbitmq-export,rabbitmq-overview) streamline broker admin.
core\_data/
├── .env.example # Template for environment-specific settings (never commit real secrets)
├── docker-compose.yml # Orchestrates PostgreSQL and supporting services
├── scripts/ # Operator tooling (manage.sh + lib modules + maintenance workflow)
├── postgres/ # Custom image build assets, configs, and initdb scripts
├── backups/ # Host output directory for logical dumps and reports
├── secrets/ # Docker secret material (passwords, pgsodium.key — never committed)
├── README.md # This guide
├── THIRD\_PARTY\_LICENSES.md # Upstream license attributions for vendored tooling
└── AGENTS.md # Contributor quick-reference & runbooks
Keep persistent bind mounts under data/ out of version control; they contain live cluster state, monitoring state, and pgBackRest archives.
Use the dedicated CI helpers when you need to spin up the published stack inside automation:
./scripts/manage.sh ci-verify --env-file ci.env.example --require-attestationconfirms Docker is available, checks disk/port prerequisites, and verifies that every referenced image carries a recent attestation../scripts/manage.sh ci-up --env-file ci.env.example --output ./backups/ci-output.jsonbootstraps secrets/network allow lists, pulls/warms the prebuilt images, and emits a JSON manifest with connection details for downstream jobs../scripts/manage.sh ci-down --volumes --prune-data --prune-secretsstops containers and removes ephemeral state when the pipeline finishes.
ci.env.example shows the minimal environment required for this workflow, and docs/examples/ci-workflow.md includes a ready-to-copy GitHub Actions job that wires everything together.
./scripts/manage.sh is the operator entry point. Frequently used commands:
| Command | Description |
|---|---|
build-image |
Build the custom PostgreSQL image defined in postgres/Dockerfile. |
create-env |
Interactive wizard that copies .env.example, sizes resources, seeds secrets, and writes .env. |
up / down |
Start or stop the Compose stack (volumes preserved). |
psql |
Open psql inside the container (respects PGHOST, PGUSER, etc.). |
dump / dump-sql |
Produce logical backups (custom or plain format) under /backups. |
restore-dump |
Drop and recreate a database before restoring a .dump.gz. |
backup [--verify] / stanza-create / restore-snapshot |
Manage pgBackRest backups & optionally restore the latest backup into a throwaway data dir for checksum verification. |
daily-maintenance |
Run dumps, log capture, pgBadger analysis, and retention pruning. |
provision-qa |
Differential backup + targeted restore for QA databases. |
config-render |
Re-render postgresql.conf / pg_hba.conf from the templates and restart PostgreSQL (terminates active connections; required for some settings like shared_buffers and max_connections). |
config-check |
Compare live postgresql.conf / pg_hba.conf against rendered templates to catch drift. |
audit-roles / audit-security |
Generate CSV/text reports covering role hygiene, passwords, and HBA/RLS posture. |
audit-extensions |
Confirm bundled extensions are present and on expected versions. |
audit-autovacuum |
Flag tables with high dead tuple counts or ratios. |
audit-replication |
Summarise follower lag and sync state. |
audit-cron / audit-squeeze |
Inspect pg_cron schedules and pg_squeeze activity tables. |
audit-index-bloat |
Estimate index bloat using pgstattuple (supports --min-size-mb). |
audit-buffercache |
Snapshot shared buffer usage per relation (supports --limit). |
audit-schema |
Snapshot schema metadata for drift detection. |
snapshot-pgstat |
Capture a pg_stat_statements baseline (CSV output) for performance trending. |
diff-pgstat --base --compare |
Diff two snapshots (CSV-in/CSV-out) to highlight hot queries. |
compact --level N |
Layered bloat management: 1=autovacuum audit, 2=pg_squeeze refresh, 3=pg_repack (needs --tables), 4=VACUUM FULL (needs --yes). |
exercise-extensions |
Smoke-test the core extension bundle (vector, PostGIS, AGE, citext, hstore, pgcrypto, hypopg, pg_partman, etc.). |
pgtap-smoke |
Run a micro pgTap plan to confirm key extensions (including hypopg/pg_partman) are registered. |
test-dataset bootstrap |
Stand up a synthetic schema with spatial, vector, routing, and graph fixtures to exercise PgBouncer/GraphQL clients. |
async-queue bootstrap |
Install the lightweight async queue schema (asyncq) with enqueue/dequeue helpers. |
partman-maintenance |
Invoke run_maintenance_proc() for the selected database (defaults to POSTGRES_DB). |
partman-show-config |
Print rows from part_config (optionally filter by --parent schema.table). |
partman-create-parent |
Wrap create_parent to bootstrap managed partitions without manual SQL. |
valkey-cli |
Run valkey-cli inside the ValKey container with secrets wired in. |
valkey-bgsave |
Trigger BGSAVE so the ValKey RDB is flushed to the valkey_data volume. |
rabbitmq-ctl |
Execute rabbitmqctl inside the RabbitMQ container (requires the rabbitmq profile). |
rabbitmq-diagnostics |
Run rabbitmq-diagnostics commands such as status or check_running. |
rabbitmq-export |
Export broker definitions to JSON (defaults to ${BACKUPS_HOST_PATH}/rabbitmq-definitions.json). |
rabbitmq-overview |
Print the summary from rabbitmq-diagnostics status. |
pgbouncer-stats / pgbouncer-pools |
Emit PgBouncer SHOW STATS / SHOW POOLS via the admin console. |
memcached-stats |
Fetch stats output from the Memcached service. |
version-status |
Compare installed Postgres/extension versions with upstream releases (CSV via --output). |
upgrade --new-version |
Orchestrate pgautoupgrade (takes backups, validates base image, restarts). |
The CLI sources modular helpers from scripts/lib/ so each function can be imported by tests or future automation.
daily-maintenance now emits a richer bundle under backups/daily/<YYYYMMDD>/, including pg_stat_statements snapshots, pg_buffercache heatmaps, role/extension/autovacuum/replication CSVs, pg_cron schedules, pg_squeeze activity, and a security checklist alongside logs, dumps, pgBadger HTML, and pgaudit summaries. The workflow also records per-step results in maintenance_status.json, records the most recent sidecar dump run in logical_backup_status.txt, runs partman.run_maintenance_proc() across each database so freshly created partitions land even if the background worker interval has not elapsed, and captures version drift in version_status.csv (focusing on out-of-date components). Pair those reports with config-check to keep the rendered configs aligned with the templates. Tune the thresholds via DAILY_PG_STAT_LIMIT, DAILY_BUFFERCACHE_LIMIT, DAILY_DEAD_TUPLE_THRESHOLD, DAILY_DEAD_TUPLE_RATIO, and DAILY_REPLICATION_LAG_THRESHOLD as needed.
Nightly cron jobs also refresh pg_squeeze targets, reset pg_stat_statements, and run a safe VACUUM (ANALYZE, SKIP_LOCKED, PARALLEL 4) so statistics stay current without blocking hot tables.
Set DAILY_EMAIL_REPORT=true and DAILY_REPORT_RECIPIENT=ops@example.com in .env to have the maintenance job email the HTML summary via sendmail (if available inside the container).
To compare performance snapshots between runs, capture CSVs with snapshot-pgstat --output /backups/pg_stat_before.csv and snapshot-pgstat --output /backups/pg_stat_after.csv, then run ./scripts/manage.sh diff-pgstat --base /backups/pg_stat_before.csv --compare /backups/pg_stat_after.csv --limit 25 for a ranked delta report.
./scripts/manage.sh compact provides escalating space-recovery options:
- Level 1 — run the autovacuum audit (no changes, just reporting).
- Level 2 — rerun
core_data_admin.refresh_pg_squeeze_targets()and emit updatedpg_squeezefindings. - Level 3 — execute
pg_repackfor specific tables (--tables schema.table[,schema.table...]) without heavy locks. - Level 4 — run
VACUUM (FULL, ANALYZE, VERBOSE)across all tables or a comma-delimited--scope(requires--yes). Expect exclusive locks; schedule during maintenance windows.
All runs write logs under backups/ for auditing (pg_repack-*.log, vacuum-full-*.log).
- CI Workflow:
.github/workflows/ci.ymlbuilds the image, runspython -m pytest -k full_workflow, and uploads generated backups for inspection. - Smoke Test:
tests/test_manage.pyspins up a disposable environment, exercises key CLI commands (includingdaily-maintenance, pgBackRest, andupgrade), and tears everything down. Run locally withpython -m pytest -k full_workflow(Docker required). - Fast Tests:
tests/test_lightweight.pyvalidates offline flows like help output and the vendored tooling without needing Docker. - Extension Smoke:
./scripts/manage.sh exercise-extensions --db <name>pluspgtap-smokeprovide quick feedback that the entire core extension bundle (vector/PostGIS/AGE/hstore/citext/pgcrypto/pg_partman/etc.) is ready for use. - Synthetic Fixture:
./scripts/manage.sh test-dataset bootstrap --db core_data_testkit --owner demo_user --print-password --forcecreates repeatable spatial/vector/graph/routing sample data so PgBouncer, REST, and GraphQL clients can run end-to-end queries without touching production schemas. - Documentation:
AGENTS.mdoffers contributor runbooks and on-call notes.
Thank you to the maintainers and communities behind the components that make core_data possible:
- PostgreSQL – the database at the heart of the platform.
- Docker & Docker Compose – containerization and orchestration.
- PgBackRest – resilient backup and restore tooling.
- pgBadger – PostgreSQL log analytics.
- pg_cron – database-native scheduling.
- pg_squeeze – automatic bloat mitigation.
- pgvector – high-dimensional vector search.
- PostGIS – spatial superpowers for PostgreSQL (including Tiger Geocoder & Address Standardizer).
- pgRouting – network routing & graph analysis atop PostGIS.
- Apache AGE – graph database extension.
- pgaudit – enhanced auditing.
- pg_partman – automated time/ID partition management.
- HypoPG – hypothetical index exploration for query tuning.
- pg_repack & pgtap – maintenance & testing extensions.
Their work powers the database-as-code experience delivered by core_data.