OS app distribution: replace filesystem-bundled catalog with registry-pulled, hash-pinned artifacts #145

@rita-aga

Context

Today temper_platform::os_apps::AppCatalog::discover() scans a filesystem directory (TEMPER_OS_APPS_DIR env, legacy TEMPER_SKILLS_DIR, or compile-time relative path) for app.toml manifests. list_startup_os_apps() filters that scan by startup_install = "core" and the openpaw startup loop auto-installs each of those for the tenant.
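
The discovery path described above can be sketched roughly as follows. This is a minimal illustration, not the actual temper_platform code: `Manifest`, `resolve_apps_dir`, and `startup_apps` are hypothetical names, the environment lookup is passed in as a closure, and the precedence (TEMPER_OS_APPS_DIR, then legacy TEMPER_SKILLS_DIR, then the compile-time path) is assumed from the order in which the sources are listed.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for a parsed app.toml; only the fields needed here.
#[derive(Debug, Clone, PartialEq)]
struct Manifest {
    name: String,
    /// Value of `startup_install` in app.toml, if present.
    startup_install: Option<String>,
}

/// Pick the directory to scan. `env` abstracts `std::env::var` so the
/// precedence can be exercised without touching the process environment.
/// Assumption: TEMPER_OS_APPS_DIR wins over the legacy TEMPER_SKILLS_DIR,
/// and the compile-time default is used only when neither is set.
fn resolve_apps_dir(env: impl Fn(&str) -> Option<String>, compiled_default: &str) -> PathBuf {
    env("TEMPER_OS_APPS_DIR")
        .or_else(|| env("TEMPER_SKILLS_DIR"))
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(compiled_default))
}

/// Keep only apps marked `startup_install = "core"`.
fn startup_apps(scanned: &[Manifest]) -> Vec<&Manifest> {
    scanned
        .iter()
        .filter(|m| m.startup_install.as_deref() == Some("core"))
        .collect()
}
```

Note that nothing in this shape can represent an app whose files are not on disk, which is the limitation described next.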

Consequence: an OS app can only exist in a deployment if its files are physically present inside the server's filesystem at boot. There is no way to install an app at runtime from a registry, a URL, or a signed artifact.

Symptoms this produces in practice

Openpaw hit all of these in the 2026-04-18 Katagami incident (openpaw issue linked below):

  1. Image-baked apps. For an app to run in Railway, its source + compiled WASM must be inside the Docker image. Updating an app = full image rebuild + platform redeploy cycle (20–30 min).
  2. Dev symlinks silently disappear in production. os-apps/katagami-curation and os-apps/katagami-commons are absolute-path symlinks into a developer's local checkout. On the dev's Mac they resolve; inside Docker they dangle; the catalog scan drops them silently and startup_os_apps() returns a partial list. After every restart, the tenant has no katagami entity sets until a human manually runs temper.install_app().
  3. temper.install_app() is effectively a no-op for anything not in the image. The tool exists and lets an agent name an app, but the installer loads it from the catalog, which can only see filesystem-resident manifests. An agent can't install a third-party or remote app.
  4. wasm_modules: [] silent failure. The installer only uploads modules listed in [[wasm_modules]] in the app's manifest. Forgetting that block is indistinguishable from a successful install: the op logs Installed os-app 'X': ... wasm=[] and every integration trigger then fails with WASM module 'NAME' not found. This hit paw-research (2026-04-18 early AM) and katagami-curation (same day).
  5. Tenant state vs. catalog drift. Tenant data (entities, files, events) survives restarts in the event store. Catalog state (which apps + WASMs exist) is re-derived from the image on every boot. If the image changes, the catalog changes; tenants that depended on an app newly missing from the image get half-broken state until an operator re-installs.
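
Symptom 4 comes down to a missing cross-check between what integrations reference and what the manifest declares. A minimal sketch of such a check, assuming a hypothetical `AppManifest` view with two name lists (the real manifest types and field names may differ, and the module names in any usage are made up). Whether an empty list with no integrations at all should also be rejected is left open here.

```rust
use std::collections::BTreeSet;

/// Hypothetical view of an app manifest: modules declared under
/// [[wasm_modules]] and module names that integration triggers refer to.
struct AppManifest {
    name: String,
    wasm_modules: Vec<String>,
    integration_modules: Vec<String>,
}

/// Returns the module names that integrations reference but the manifest
/// never declares, de-duplicated and sorted.
fn missing_modules(app: &AppManifest) -> Vec<String> {
    let declared: BTreeSet<&str> = app.wasm_modules.iter().map(String::as_str).collect();
    let missing: BTreeSet<&str> = app
        .integration_modules
        .iter()
        .map(String::as_str)
        .filter(|m| !declared.contains(m))
        .collect();
    missing.into_iter().map(str::to_owned).collect()
}

/// Fail the install instead of logging `wasm=[]` and carrying on.
fn check_install(app: &AppManifest) -> Result<(), String> {
    let missing = missing_modules(app);
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "refusing to install os-app '{}': integrations reference undeclared WASM modules {:?}",
            app.name, missing
        ))
    }
}
```

With a check of this kind, a manifest that omits the [[wasm_modules]] block fails at install time rather than at the first integration trigger.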

Proposal

Replace filesystem-bundled catalog with a registry-pulled, hash-pinned artifact model:

  1. Packaged app format. An OS app is published as a self-contained artifact containing:

    • app.toml manifest (including [[wasm_modules]] contract declarations)
    • Compiled .wasm module files
    • Entity specs (*.ioa.toml)
    • Cedar policies
    • Seed data, agents, skills, ADRs
      The artifact is either a tarball/zip under a content-addressed name (name@sha256:...) or an OCI artifact pushed to any OCI-compliant registry (GHCR, Docker Hub, ECR, etc.). Leaning OCI because the tooling (crane, oras, skopeo) already exists and supports signing and pulls without a Docker daemon.
  2. Source of truth: Turso, not filesystem. The catalog becomes a database-backed registry: (tenant, app_name) -> { artifact_ref, pinned_hash, installed_at, status }. install_os_app() resolves artifact_ref to an OCI manifest, pulls layers to a local cache, verifies the hash, then runs the existing spec-registration + WASM-upload + seed-instance flow against the pulled content.

  3. Hash pinning + verification. Every install records the resolved digest. Subsequent installs of the same (tenant, app) must match the recorded hash unless the caller passes force_upgrade with a new target ref. Prevents silent drift when upstream :edge/:latest tags move.

  4. temper.install_app() works for real. Agents can pass "github.com/arni-labs/katagami/katagami-curation@sha256:..." or a signed ref, and the platform pulls, verifies, and installs it. No image rebuild.

  5. Runtime updates. A new artifact push + an admin API call (or scheduled reconcile) installs the update. Decoupled from server-binary deploys.

  6. Fail-loud missing-manifest check. The installer should refuse to complete when an app's [[wasm_modules]] declarations or its integrations reference modules that did not end up in the registry. wasm_modules: [] should be an error, not a log line.
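
The catalog row and pinning rule from items 2 and 3 could look roughly like this. It is a sketch under stated assumptions: an in-memory `HashMap` stands in for Turso, `Catalog`, `Record`, `InstallError`, and `record_install` are hypothetical names, `installed_at` and `status` are omitted for brevity, and the digest is assumed to have been resolved and verified by the caller already.

```rust
use std::collections::HashMap;

/// One catalog row, keyed by (tenant, app_name).
#[derive(Debug, Clone, PartialEq)]
struct Record {
    artifact_ref: String,
    pinned_hash: String,
}

#[derive(Debug, PartialEq)]
enum InstallError {
    /// The resolved digest differs from the recorded pin and no upgrade was requested.
    DigestMismatch { pinned: String, resolved: String },
}

/// In-memory stand-in for the database-backed registry.
#[derive(Default)]
struct Catalog {
    rows: HashMap<(String, String), Record>,
}

impl Catalog {
    /// Record an install. A first install pins the digest; a later install
    /// must match the pin unless `force_upgrade` is set.
    fn record_install(
        &mut self,
        tenant: &str,
        app: &str,
        artifact_ref: &str,
        resolved_digest: &str,
        force_upgrade: bool,
    ) -> Result<(), InstallError> {
        let key = (tenant.to_owned(), app.to_owned());
        if let Some(existing) = self.rows.get(&key) {
            if existing.pinned_hash != resolved_digest && !force_upgrade {
                return Err(InstallError::DigestMismatch {
                    pinned: existing.pinned_hash.clone(),
                    resolved: resolved_digest.to_owned(),
                });
            }
        }
        self.rows.insert(
            key,
            Record {
                artifact_ref: artifact_ref.to_owned(),
                pinned_hash: resolved_digest.to_owned(),
            },
        );
        Ok(())
    }
}
```

The point of the mismatch error is that a moved :edge or :latest tag surfaces as a rejected install rather than as a silent change of content.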

Compatibility / rollout

  • Keep TEMPER_OS_APPS_DIR filesystem scan as a development fallback (local cargo run, test fixtures). Production servers flip a flag to registry-only.
  • Add a one-shot migration command: for each current catalog entry, publish to the registry under a first-class artifact ref and pin the resulting digest in Turso.
  • Gate the new install_app(artifact_ref) behind Cedar policy so tenants can only pull from allowlisted registries/publishers.
  • Deprecate the hardcoded openpaw startup list in favour of a tenant config that enumerates artifact refs.
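
To make the allowlist bullet concrete, here is a plain-Rust stand-in for the decision that the Cedar policy would express. It is not Cedar and not the platform's API: `PinnedRef`, `parse_pinned_ref`, and `is_allowed` are hypothetical, and the accepted ref shape (`<repository>@sha256:<64 lowercase hex chars>`) is an assumption based on the example ref in the proposal.

```rust
/// Parsed form of `<repository>@sha256:<64 hex chars>`.
#[derive(Debug, PartialEq)]
struct PinnedRef<'a> {
    repository: &'a str,
    digest_hex: &'a str,
}

/// Accept only digest-pinned refs; tag-only refs such as `repo:latest` are rejected.
fn parse_pinned_ref(s: &str) -> Option<PinnedRef<'_>> {
    let (repository, digest) = s.split_once('@')?;
    let hex = digest.strip_prefix("sha256:")?;
    let valid = !repository.is_empty()
        && hex.len() == 64
        && hex.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    valid.then_some(PinnedRef { repository, digest_hex: hex })
}

/// True when the repository equals an allowlisted prefix or sits below it.
/// Matching on a path boundary keeps `github.com/arni-labs` from also
/// admitting `github.com/arni-labs-evil/...`.
fn is_allowed(r: &PinnedRef<'_>, allowed_prefixes: &[&str]) -> bool {
    allowed_prefixes.iter().any(|&p| {
        r.repository == p
            || r.repository
                .strip_prefix(p)
                .map_or(false, |rest| rest.starts_with('/'))
    })
}
```

A tenant policy would then amount to a list of prefixes such as `github.com/arni-labs`, with anything unparsable or outside the list refused before any pull is attempted.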

Related

  • openpaw issue (tracking the consumer side): [to be filled in after openpaw issue is created]
  • Current triggering incident: the Katagami deploy hole on 2026-04-18. katagami-curation WASMs were missing; the fix required a Dockerfile vendoring step and a [[wasm_modules]] retrofit in upstream `arni-labs/katagami`.

Severity: high (every openpaw deploy currently risks an app going silently missing). Priority: should be the next platform milestone once ADR-0048/0049/0050/0051 land cleanly.

Labels: enhancement (New feature or request)
