
Postgres: run_migrations enables RLS on tenant_secrets before creating it → startup fails on fresh DB #148

@nerdsane


Summary

temper-store-postgres's run_migrations() creates every schema table except tenant_secrets, then executes the ENABLE_TENANT_RLS statements, which reference tenant_secrets. On a fresh Postgres database this aborts startup with:

Error: Failed to run migrations

Caused by:
    storage error: failed to enable tenant RLS: error returned from database: relation "tenant_secrets" does not exist

Environment

  • Temper main (HEAD as of 2026-04-18)
  • Postgres 16 (Cloud SQL, Enterprise edition, connection via Cloud SQL Auth Proxy)
  • Startup: temper serve --port 3000 --storage postgres
  • DATABASE_URL points at the fresh temper database and connects as a non-superuser temper role.

Reproduce

  1. Create an empty Postgres database.
  2. Start Temper with --storage postgres against it.
  3. Temper's migration runner creates the base tables (events, snapshots, specs, trajectories, design_time_events, tenant_constraints, wasm_modules, wasm_invocation_logs) and their indexes, then fails at ENABLE_TENANT_RLS.

Root cause

crates/temper-store-postgres/src/migration.rs::run_migrations:

  • Lines 19–124: executes CREATE_EVENTS_TABLE, CREATE_SNAPSHOTS_TABLE, CREATE_SPECS_TABLE, CREATE_TRAJECTORIES_TABLE (+ indexes), CREATE_DESIGN_TIME_EVENTS_TABLE, CREATE_TENANT_CONSTRAINTS_TABLE (+ indexes), CREATE_WASM_MODULES_TABLE, CREATE_WASM_INVOCATION_LOGS_TABLE (+ indexes).
  • Line 127: jumps straight to for stmt in schema::ENABLE_TENANT_RLS { … }.

schema::ENABLE_TENANT_RLS includes ALTER TABLE tenant_secrets ENABLE ROW LEVEL SECURITY (and the matching policy), but schema::CREATE_TENANT_SECRETS_TABLE — defined at schema.rs:180 — is never executed by run_migrations().
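This class of ordering bug can be caught without a database by walking the migration statements in execution order and flagging the first one that references a table nothing earlier has created. The sketch below is hypothetical: the helper names are invented for illustration, the SQL strings are abbreviated stand-ins rather than the real schema.rs constants, and the parser only understands the CREATE TABLE, ALTER TABLE, and CREATE POLICY ... ON shapes mentioned in this issue.

```rust
use std::collections::HashSet;

/// Table created by `CREATE TABLE [IF NOT EXISTS] <name> ...`, if `sql` is one.
fn created_table(sql: &str) -> Option<String> {
    let words: Vec<&str> = sql.split_whitespace().collect();
    let upper: Vec<String> = words.iter().map(|w| w.to_ascii_uppercase()).collect();
    if upper.len() < 3 || upper[0] != "CREATE" || upper[1] != "TABLE" {
        return None;
    }
    // Skip the optional IF NOT EXISTS clause to reach the table name.
    let idx = if upper.len() > 5 && upper[2] == "IF" && upper[3] == "NOT" && upper[4] == "EXISTS" {
        5
    } else {
        2
    };
    words.get(idx).map(|w| w.trim_end_matches('(').to_ascii_lowercase())
}

/// Table referenced by `ALTER TABLE <name> ...` or `CREATE POLICY <p> ON <name> ...`.
fn referenced_table(sql: &str) -> Option<String> {
    let words: Vec<&str> = sql.split_whitespace().collect();
    let upper: Vec<String> = words.iter().map(|w| w.to_ascii_uppercase()).collect();
    if upper.len() >= 3 && upper[0] == "ALTER" && upper[1] == "TABLE" {
        return Some(words[2].to_ascii_lowercase());
    }
    if upper.len() >= 2 && upper[0] == "CREATE" && upper[1] == "POLICY" {
        // The table name follows the ON keyword.
        let on = upper.iter().position(|w| w.as_str() == "ON")?;
        return words.get(on + 1).map(|w| w.to_ascii_lowercase());
    }
    None
}

/// Walks statements in execution order and returns the index and table name of
/// the first statement that references a table no earlier statement created.
fn first_dangling_reference(statements: &[&str]) -> Option<(usize, String)> {
    let mut created = HashSet::new();
    for (i, stmt) in statements.iter().enumerate() {
        if let Some(table) = created_table(stmt) {
            created.insert(table);
        } else if let Some(table) = referenced_table(stmt) {
            if !created.contains(&table) {
                return Some((i, table));
            }
        }
    }
    None
}

fn main() {
    // Abbreviated, hypothetical stand-ins for the real schema.rs constants.
    let create_events = "CREATE TABLE IF NOT EXISTS events (id BIGSERIAL PRIMARY KEY)";
    let create_secrets = "CREATE TABLE IF NOT EXISTS tenant_secrets (tenant TEXT NOT NULL)";
    let enable_rls = [
        "ALTER TABLE events ENABLE ROW LEVEL SECURITY",
        "ALTER TABLE tenant_secrets ENABLE ROW LEVEL SECURITY",
    ];

    // Order described in this issue: tenant_secrets is never created.
    let mut current = vec![create_events];
    current.extend_from_slice(&enable_rls);
    println!("current order: {:?}", first_dangling_reference(&current)); // Some((2, "tenant_secrets"))

    // Order after the suggested fix: tenant_secrets is created before the RLS statements.
    let mut fixed = vec![create_events, create_secrets];
    fixed.extend_from_slice(&enable_rls);
    println!("fixed order:   {:?}", first_dangling_reference(&fixed)); // None
}
```

Run as a unit test over the real statement list, a check like this would have failed on the current ordering instead of surfacing as a CrashLoopBackOff at deploy time.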

Effect: fresh Postgres installs cannot complete migrations, and the pod goes into CrashLoopBackOff.

Suggested fix

Add one sqlx::query call for CREATE_TENANT_SECRETS_TABLE before the RLS loop. Minimal patch:

// migration.rs, just before the RLS loop (line 126):
sqlx::query(schema::CREATE_TENANT_SECRETS_TABLE)
    .execute(pool)
    .await
    .map_err(|e| PersistenceError::Storage(format!("failed to create tenant_secrets table: {e}")))?;

// Enable row-level security on all tenant-scoped tables.
for stmt in schema::ENABLE_TENANT_RLS {
    // … existing loop body, unchanged …
}

Everything else in run_migrations already follows the idempotent CREATE TABLE IF NOT EXISTS pattern, so re-running is safe.
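Re-running stays safe after the fix only if CREATE_TENANT_SECRETS_TABLE itself also uses IF NOT EXISTS, which this issue does not show. A minimal guard, assuming the schema constants are plain SQL strings, could assert that pattern for every table and index statement. The function name and sample statements below are hypothetical.

```rust
/// Returns true if a CREATE TABLE / CREATE [UNIQUE] INDEX statement uses
/// IF NOT EXISTS and can therefore be re-run against an existing schema.
/// Whitespace and keyword case are normalized before matching.
fn is_idempotent_create(sql: &str) -> bool {
    let normalized = sql
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_ascii_uppercase();
    normalized.starts_with("CREATE TABLE IF NOT EXISTS ")
        || normalized.starts_with("CREATE INDEX IF NOT EXISTS ")
        || normalized.starts_with("CREATE UNIQUE INDEX IF NOT EXISTS ")
}

fn main() {
    // Hypothetical abbreviated statements; the real constants live in schema.rs.
    let statements = [
        "CREATE TABLE IF NOT EXISTS tenant_secrets (tenant TEXT NOT NULL)",
        "CREATE INDEX IF NOT EXISTS idx_example ON example_table (tenant)",
        "CREATE TABLE example_table (id TEXT PRIMARY KEY)",
    ];
    for stmt in statements.iter() {
        println!("{} -> {}", stmt, is_idempotent_create(stmt));
    }
}
```

The check is deliberately narrow: it says nothing about the RLS statements, which would need their own re-run analysis.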

Workaround (for callers stuck on this today)

Manually create the table before starting Temper:

CREATE TABLE IF NOT EXISTS tenant_secrets (
    tenant      TEXT         NOT NULL,
    key_name    TEXT         NOT NULL,
    ciphertext  BYTEA        NOT NULL,
    nonce       BYTEA        NOT NULL,
    created_at  TIMESTAMPTZ  NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ  NOT NULL DEFAULT now(),
    PRIMARY KEY (tenant, key_name)
);

Then restart Temper; migrations complete and ENABLE_TENANT_RLS succeeds because the table now exists.

Context

Hit this while bringing up Temper on GKE as the control plane for the dark-helix factory (self-hosted Stage 2 build of the Directed Software Evolution arc). Pods entered CrashLoopBackOff; dropping the database and retrying didn't help, because the bug is in the migration code itself, not in database state.

Happy to send a PR if helpful.
