Summary
When Temper runs with --storage postgres, previously-installed OS apps are lost on process restart. The SpecRegistry repopulates platform + Agent specs at boot, but there is no code path that replays tenant os-app installs from Postgres — the Phase 8b recovery exists only for the Turso store.
Observable effect: after POST /observe/os-apps/<app>/install, the entity sets (e.g. /tdata/GitTokens) work until the next pod restart. After restart, {"error":{"code":"EntitySetNotFound","message":"Entity set 'GitTokens' not found"}} is returned for anything the user app defined.
Environment
- Temper HEAD 078777d2aea5.
- Storage: --storage postgres (Cloud SQL Postgres 16).
- Deployment: k8s Deployment, 2 replicas.
- Repro was against the dark-helix OS app with 21 entity types.
Root cause
Two gaps working together:
- temper-store-postgres has no installed-apps tracking. Three trait methods on PlatformStore:
  - is_app_installed(tenant, app)
  - record_installed_app(tenant, app)
  - list_all_installed_apps()
  are implemented in temper-store-turso::store::specs (see crates/temper-store-turso/src/store/specs.rs:246) but have no equivalent in crates/temper-store-postgres/. Grepping the crate returns zero hits for list_all_installed_apps / record_installed_app / is_app_installed.
- Phase 8b only wires the Turso path. In crates/temper-cli/src/serve/bootstrap.rs::bootstrap_installed_apps (~line 564):
    if let Some(ref store) = state.server.event_store
        && let Some(turso) = store.platform_turso_store()
    {
        match turso.list_all_installed_apps().await { ... }
    }
With a Postgres-only deployment, platform_turso_store() returns None, so the whole replay block is skipped. There is no fallback to PostgresStore::list_all_installed_apps() (which doesn't exist) or to a file-catalog scan.
The earlier comment at line 517 of the same file — "OS app specs are already restored from the specs table by restore_registry_from_turso (Phase 2) and Cedar policies by recover_cedar_policies (Phase 6), so no reinstall loop is needed" — explicitly depends on the Turso restore path, which doesn't exist for Postgres.
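To make the gate concrete, here is a minimal, self-contained model of its shape. It is not the Temper code: EventStore, TursoStore, and apps_to_replay are hypothetical stand-ins, and the real methods are async and return richer types. It only illustrates why a Postgres-backed process ends up with nothing to replay.

```rust
/// Hypothetical stand-in for the Turso-backed platform store.
pub struct TursoStore {
    /// (tenant, app) pairs recorded at install time.
    pub installed: Vec<(String, String)>,
}

impl TursoStore {
    /// Simplified, synchronous version of the tracking query.
    pub fn list_all_installed_apps(&self) -> Vec<(String, String)> {
        self.installed.clone()
    }
}

/// Hypothetical stand-in for the configured event store backend.
pub enum EventStore {
    Turso(TursoStore),
    Postgres,
}

impl EventStore {
    /// Only the Turso variant exposes installed-app tracking.
    pub fn platform_turso_store(&self) -> Option<&TursoStore> {
        match self {
            EventStore::Turso(turso) => Some(turso),
            EventStore::Postgres => None,
        }
    }
}

/// Same shape as the Phase 8b gate: replay happens only when a Turso
/// store is present; every other case falls through with no apps.
pub fn apps_to_replay(event_store: Option<&EventStore>) -> Vec<(String, String)> {
    match event_store.and_then(|store| store.platform_turso_store()) {
        Some(turso) => turso.list_all_installed_apps(),
        None => Vec::new(),
    }
}
```

In this model, passing EventStore::Postgres yields an empty list regardless of what was installed before the restart, which matches the observed EntitySetNotFound after a pod restart.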
Reproduce
- Run Temper with --storage postgres against a fresh Postgres database.
- Ship an OS app bundle into TEMPER_OS_APPS_DIR (e.g. an initContainer extracting a tarball to /apps/dark-helix).
- POST /observe/os-apps/dark-helix/install with {"tenant": "dark-helix"}. Returns 200 with added: [...entities].
- GET /tdata/<any-entity-from-the-bundle> with X-Tenant-Id: dark-helix → 200 OK.
- Delete the pod (kubectl delete pod -l app=temper). Wait for the new pod.
- Re-issue the same GET → 404 EntitySetNotFound.
Suggested fix
Two orthogonal layers are worth fixing — a short-term unblocker and a long-term proper solution:
Short-term: add a filesystem-catalog reinstall on startup
Gate behind an env var or a CLI flag so existing Turso-based users aren't affected. Pseudocode:
    // Phase 8b: re-install apps found on the local catalog into TEMPER_TENANT.
    if std::env::var("TEMPER_AUTO_INSTALL_APPS").ok().as_deref() == Some("true") {
        let tenant = std::env::var("TEMPER_TENANT").unwrap_or_else(|_| "default".into());
        for entry in os_apps::list_os_apps() {
            if let Err(e) = os_apps::install_os_app(state, &tenant, &entry.name).await {
                tracing::warn!("auto-install failed for app='{}': {e}", entry.name);
            }
        }
    }
This sidesteps the Postgres-tracking gap entirely. It's also the right behavior for deployments where the app catalog is shipped via image/initContainer/PVC (i.e. the canonical k8s pattern) — the filesystem IS the source of truth, so no DB tracking is needed.
Long-term: Postgres implementations of the three trait methods
Add a tenant_installed_apps table in temper-store-postgres::schema, implement the three PlatformStore trait methods, and remove the Turso-only gate in bootstrap_installed_apps. This is the symmetric solution but requires a migration.
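As a rough illustration of the semantics the Postgres side would need, the sketch below models the three methods in memory. Everything in it is an assumption: the DDL columns (tenant, app_name, installed_at), the InstalledApps trait, and the synchronous signatures are hypothetical and not the actual PlatformStore definitions. The point is only that record_installed_app should be idempotent and that list_all_installed_apps returns (tenant, app) pairs for the boot-time replay.

```rust
use std::collections::BTreeSet;

/// Hypothetical DDL for the proposed table; column names and types are
/// assumptions, not taken from the Temper schema.
pub const CREATE_TENANT_INSTALLED_APPS: &str = "\
CREATE TABLE IF NOT EXISTS tenant_installed_apps (
    tenant TEXT NOT NULL,
    app_name TEXT NOT NULL,
    installed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (tenant, app_name)
)";

/// Simplified, synchronous stand-in for the three tracking methods.
pub trait InstalledApps {
    fn is_app_installed(&self, tenant: &str, app: &str) -> bool;
    fn record_installed_app(&mut self, tenant: &str, app: &str);
    fn list_all_installed_apps(&self) -> Vec<(String, String)>;
}

/// In-memory model of the table: a set keyed on (tenant, app).
#[derive(Default)]
pub struct InMemoryInstalledApps {
    rows: BTreeSet<(String, String)>,
}

impl InstalledApps for InMemoryInstalledApps {
    fn is_app_installed(&self, tenant: &str, app: &str) -> bool {
        self.rows.contains(&(tenant.to_string(), app.to_string()))
    }

    fn record_installed_app(&mut self, tenant: &str, app: &str) {
        // A set insert ignores duplicates, mirroring what
        // INSERT ... ON CONFLICT DO NOTHING would do on the primary key.
        self.rows.insert((tenant.to_string(), app.to_string()));
    }

    fn list_all_installed_apps(&self) -> Vec<(String, String)> {
        // BTreeSet iteration is sorted, so the output order is deterministic.
        self.rows.iter().cloned().collect()
    }
}
```

Keying on (tenant, app) keeps repeated installs from producing duplicate rows, which matters with 2 replicas where the same install could be recorded more than once.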
Context
Hit this bringing up Temper on GKE as the control plane for the dark-helix factory. Currently working around by (a) scaling the Temper Deployment to 1 replica and (b) re-running the install after every pod restart. Neither is acceptable for a production control plane. Planning to implement the short-term fix on a local fix branch and build from that while the upstream work lands.
Related: #148 (tenant_secrets migration — another Postgres-specific init-path bug from the same pipeline).