Per-tenant BYOK, data residency & WORM audit log #53

@dcoln25-writer

Problem

Aperio encrypts credentials at rest via AES-256-GCM with a single APERIO_ENCRYPTION_KEY. That works for self-hosted single-tenant deployments but is a non-starter for:

  • Regulated buyers (FedRAMP Moderate/High, FINRA, HIPAA-covered entities) who require customer-controlled encryption rooted in their own KMS.
  • EU customers with GDPR data residency requirements who need per-tenant region pinning.
  • Multi-tenant managed deployments where blast radius of a single key compromise must be bounded.
  • Compliance audit trails that require append-only / write-once-read-many audit logs (FedRAMP AU-9, SOC 2 CC7.1).

Today's gaps:

  • One global encryption key; no per-tenant separation, no BYOK, no key rotation tooling.
  • No data residency selector — every tenant lands in the same Postgres.
  • TenantAuditLog is mutable (no WORM enforcement at the DB or application layer).

Goals

  1. Per-tenant BYOK — each tenant can root the encryption key for their IntegrationConnection/SiemDestination/WorkflowDestination tokens in their own KMS (AWS KMS, GCP KMS, Azure Key Vault, HashiCorp Vault Transit).
  2. Envelope encryption with a tenant DEK wrapped by the customer KEK; DEK cached in memory only.
  3. Key rotation — rotate the org's DEK without downtime; old ciphertext stays readable until lazy re-encryption finishes.
  4. Data residency — per-org region pin (US / EU / APAC); tenant data physically lives in the picked region.
  5. WORM audit log: TenantAuditLog becomes append-only with cryptographic chaining, optionally exported to S3 Object Lock / GCS Bucket Lock.

Non-goals

  • Not building our own KMS — we always defer to the customer's existing key management.
  • Not implementing per-row encryption (overkill given that PII largely lives in audit-log payloads, not core tables).
  • Not building cross-region replication in v1 — region pinning is single-region per org.

Proposed design

Envelope encryption model

Customer KEK (in their KMS) ──wraps──▶ Tenant DEK (per-org, AES-256)
                                          │
                                          ▼
                                  Encrypts integration/SIEM tokens,
                                  Google service-account private keys,
                                  API token hashes (already hashed), etc.

The DEK is unwrapped on demand via the customer's KMS, cached in memory for APERIO_DEK_CACHE_TTL_SECONDS (default 600s), then evicted. A cache miss triggers a single KMS call for that org.
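To make the caching behavior concrete, here is a minimal sketch of an in-memory DEK cache with a TTL. The names `DEKCache`, `UnwrapFunc`, and `fakeUnwrap` are hypothetical and not part of the proposal; the unwrap callback stands in for a KMS Decrypt call.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// UnwrapFunc stands in for a KMS Decrypt call that returns an org's plaintext
// DEK. The signature is hypothetical, not part of the proposal.
type UnwrapFunc func(ctx context.Context, orgID string) ([]byte, error)

type cachedDEK struct {
	key       []byte
	expiresAt time.Time
}

// DEKCache holds unwrapped DEKs in process memory for at most ttl.
type DEKCache struct {
	mu     sync.Mutex
	ttl    time.Duration
	unwrap UnwrapFunc
	items  map[string]cachedDEK
}

func NewDEKCache(ttl time.Duration, unwrap UnwrapFunc) *DEKCache {
	return &DEKCache{ttl: ttl, unwrap: unwrap, items: make(map[string]cachedDEK)}
}

// Get returns the cached DEK, or unwraps it via the KMS on a miss or expiry.
// The lock is held across the unwrap call, so concurrent misses are serialized
// and trigger one KMS call; a production version would likely lock per org.
func (c *DEKCache) Get(ctx context.Context, orgID string) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.items[orgID]; ok && time.Now().Before(e.expiresAt) {
		return e.key, nil
	}
	delete(c.items, orgID) // evict the expired entry, if any
	key, err := c.unwrap(ctx, orgID)
	if err != nil {
		return nil, err
	}
	c.items[orgID] = cachedDEK{key: key, expiresAt: time.Now().Add(c.ttl)}
	return key, nil
}

// fakeUnwrap simulates a KMS and counts how often it is called.
func fakeUnwrap(calls *int) UnwrapFunc {
	return func(ctx context.Context, orgID string) ([]byte, error) {
		*calls++
		return make([]byte, 32), nil
	}
}

func main() {
	calls := 0
	cache := NewDEKCache(600*time.Second, fakeUnwrap(&calls))
	for i := 0; i < 3; i++ {
		if _, err := cache.Get(context.Background(), "org_1"); err != nil {
			panic(err)
		}
	}
	fmt.Println("KMS calls:", calls)
}
```

This sketch evicts lazily, on the next access after expiry. If the plaintext DEK should leave memory promptly once the TTL elapses, a background sweeper that also zeroes the key bytes would be needed.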

KMS provider abstraction

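// internal/kms/provider.go
package kms

import "context"

// Provider abstracts the customer's KMS; keyRef identifies the KEK (ARN, resource name, or Vault path).
type Provider interface {
    Encrypt(ctx context.Context, keyRef string, plaintext []byte) ([]byte, error)  // wraps a DEK; returns the wrapped DEK
    Decrypt(ctx context.Context, keyRef string, ciphertext []byte) ([]byte, error) // unwraps a wrapped DEK
    Sign(ctx context.Context, keyRef string, payload []byte) ([]byte, error)       // signs the WORM chain head for attestation
}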

Implementations: aws-kms, gcp-kms, azure-keyvault, vault-transit, local (current behavior, dev only).
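As an illustration of what satisfying the interface involves, the following is a rough sketch of a dev-only `local` provider. It assumes the KEK is a single in-process 32-byte key and that `keyRef` is ignored; `LocalProvider` and `NewLocalProvider` are hypothetical names. HMAC-SHA256 is used for `Sign` purely as a stand-in, since a hosted KMS would typically sign with an asymmetric key.

```go
package main

import (
	"context"
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

// Provider mirrors the interface proposed in internal/kms/provider.go.
type Provider interface {
	Encrypt(ctx context.Context, keyRef string, plaintext []byte) ([]byte, error)
	Decrypt(ctx context.Context, keyRef string, ciphertext []byte) ([]byte, error)
	Sign(ctx context.Context, keyRef string, payload []byte) ([]byte, error)
}

// LocalProvider is a hypothetical dev-only provider: keyRef is ignored and a
// single in-process 32-byte key plays the role of the KEK.
type LocalProvider struct{ kek []byte }

var _ Provider = (*LocalProvider)(nil) // compile-time interface check

func NewLocalProvider(kek []byte) (*LocalProvider, error) {
	if len(kek) != 32 {
		return nil, errors.New("local KEK must be 32 bytes for AES-256")
	}
	return &LocalProvider{kek: kek}, nil
}

func (p *LocalProvider) gcm() (cipher.AEAD, error) {
	block, err := aes.NewCipher(p.kek)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt wraps plaintext (a DEK) and returns nonce || ciphertext || tag.
func (p *LocalProvider) Encrypt(_ context.Context, _ string, plaintext []byte) ([]byte, error) {
	aead, err := p.gcm()
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt unwraps a blob produced by Encrypt and fails if it was tampered with.
func (p *LocalProvider) Decrypt(_ context.Context, _ string, ciphertext []byte) ([]byte, error) {
	aead, err := p.gcm()
	if err != nil {
		return nil, err
	}
	n := aead.NonceSize()
	if len(ciphertext) < n {
		return nil, errors.New("wrapped DEK too short")
	}
	return aead.Open(nil, ciphertext[:n], ciphertext[n:], nil)
}

// Sign uses HMAC-SHA256 as a symmetric stand-in for a KMS signing operation.
func (p *LocalProvider) Sign(_ context.Context, _ string, payload []byte) ([]byte, error) {
	mac := hmac.New(sha256.New, p.kek)
	mac.Write(payload)
	return mac.Sum(nil), nil
}

func main() {
	p, err := NewLocalProvider(make([]byte, 32)) // all-zero key: demo only
	if err != nil {
		panic(err)
	}
	dek := make([]byte, 32)
	if _, err := rand.Read(dek); err != nil {
		panic(err)
	}
	wrapped, _ := p.Encrypt(context.Background(), "local", dek)
	unwrapped, _ := p.Decrypt(context.Background(), "local", wrapped)
	fmt.Println("round trip ok:", string(unwrapped) == string(dek))
}
```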

New schema

enum KekProviderKind {
  AWS_KMS
  GCP_KMS
  AZURE_KEY_VAULT
  VAULT_TRANSIT
  LOCAL
}

enum DataResidency {
  US
  EU
  APAC
}

model OrganizationKey {
  organizationId    String          @id @map("organization_id")
  kekProvider       KekProviderKind @map("kek_provider")
  kekRef            String          @map("kek_ref") @db.VarChar(500) // ARN / resource name / vault path
  wrappedDek        Bytes           @map("wrapped_dek")
  dekVersion        Int             @default(1) @map("dek_version")
  algorithm         String          @default("AES256-GCM") @db.VarChar(32)
  createdAt         DateTime        @default(now()) @map("created_at")
  rotatedAt         DateTime?       @map("rotated_at")
  organization      Organization    @relation(fields: [organizationId], references: [id], onDelete: Cascade)
  @@map("organization_keys")
}

Organization gains:

model Organization {
  // ... existing ...
  dataResidency     DataResidency @default(US) @map("data_residency")
  wormAuditEnabled  Boolean       @default(false) @map("worm_audit_enabled")
}

All existing encryptedAccessToken, encryptedRefreshToken, etc. fields now store ciphertext encrypted with the per-org DEK (wrapped by the customer KEK), tagged with tokenKeyVersion so we can lazy re-encrypt during rotation.

Key rotation

RotateOrgDek(orgId, newKekRef?) -> RotationJob:

  1. Generate a new random 256-bit DEK.
  2. Wrap it with the customer KEK (new ref if provided).
  3. Persist as dekVersion = old+1.
  4. Spawn a background job that re-encrypts every column tagged with the old version under the new DEK; updates tokenKeyVersion.
  5. Once all rows are migrated, archive the old DEK metadata (kept for break-glass decryption only).

Zero downtime: both DEK versions remain unwrappable during the rolling re-encrypt, so rows tagged with either tokenKeyVersion stay readable until the job finishes.
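The rotation steps above can be sketched in memory as follows. This is an illustration under simplifying assumptions, not the proposed implementation: `Row` and `Keyring` are hypothetical types, rows live in a slice rather than Postgres, and the keyring holds already-unwrapped DEKs instead of calling a KMS.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// Row models one encrypted column plus its tokenKeyVersion tag.
type Row struct {
	Ciphertext      []byte
	TokenKeyVersion int
}

// Keyring maps dekVersion to an unwrapped DEK (hypothetical helper type).
type Keyring map[int][]byte

func (k Keyring) aead(version int) (cipher.AEAD, error) {
	dek, ok := k[version]
	if !ok {
		return nil, fmt.Errorf("no DEK for version %d", version)
	}
	block, err := aes.NewCipher(dek)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt seals plaintext under the given DEK version; the nonce is prepended.
func (k Keyring) Encrypt(version int, plaintext []byte) (Row, error) {
	aead, err := k.aead(version)
	if err != nil {
		return Row{}, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return Row{}, err
	}
	return Row{Ciphertext: aead.Seal(nonce, nonce, plaintext, nil), TokenKeyVersion: version}, nil
}

// Decrypt selects the DEK by the row's version tag, so old and new rows both read.
func (k Keyring) Decrypt(r Row) ([]byte, error) {
	aead, err := k.aead(r.TokenKeyVersion)
	if err != nil {
		return nil, err
	}
	n := aead.NonceSize()
	if len(r.Ciphertext) < n {
		return nil, fmt.Errorf("ciphertext too short")
	}
	return aead.Open(nil, r.Ciphertext[:n], r.Ciphertext[n:], nil)
}

// Rotate generates a random 256-bit DEK and registers it as current+1.
func (k Keyring) Rotate(current int) (int, error) {
	dek := make([]byte, 32)
	if _, err := rand.Read(dek); err != nil {
		return 0, err
	}
	k[current+1] = dek
	return current + 1, nil
}

// Reencrypt rewrites rows not yet at target and returns how many it migrated.
// Rows already at target are skipped, so an interrupted job can be re-run.
func (k Keyring) Reencrypt(rows []Row, target int) (int, error) {
	migrated := 0
	for i, r := range rows {
		if r.TokenKeyVersion == target {
			continue
		}
		plaintext, err := k.Decrypt(r)
		if err != nil {
			return migrated, err
		}
		updated, err := k.Encrypt(target, plaintext)
		if err != nil {
			return migrated, err
		}
		rows[i] = updated
		migrated++
	}
	return migrated, nil
}

func main() {
	keys := Keyring{1: make([]byte, 32)} // all-zero DEK: demo only
	row, _ := keys.Encrypt(1, []byte("refresh-token"))
	rows := []Row{row}
	v2, _ := keys.Rotate(1)
	n, _ := keys.Reencrypt(rows, v2)
	plaintext, _ := keys.Decrypt(rows[0])
	fmt.Println(n, rows[0].TokenKeyVersion, string(plaintext))
}
```

Because decryption is keyed by the version tag on each row, readers never need to know whether a rotation is in flight. The old DEK only has to stay in the keyring until `Reencrypt` reports zero remaining rows.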

Data residency

Per-org region pin. Implementation options:

  • Single deployment, multi-schema — one Postgres per region; routing layer (in internal/bootstrap) picks the connection by Organization.dataResidency resolved from the session.
  • Per-region deployment — separate full Aperio install per region with a global control plane for auth + org metadata only.

v1 ships the multi-schema model (simpler ops); v2 considers per-region deployments for FedRAMP High.
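A minimal sketch of the routing decision in the v1 model, assuming the `APERIO_REGION_DSN_*` variables from the Configuration section. `Router`, `NewRouter`, and `DSNFor` are hypothetical names; the real routing layer in internal/bootstrap may look quite different.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// DataResidency mirrors the proposed enum.
type DataResidency string

const (
	US   DataResidency = "US"
	EU   DataResidency = "EU"
	APAC DataResidency = "APAC"
)

// Router maps a residency value to the DSN of that region's Postgres.
type Router struct {
	dsns map[DataResidency]string
}

// NewRouter reads APERIO_REGION_DSN_<REGION> through lookup, which has the
// same shape as os.LookupEnv so tests can inject a fake environment.
func NewRouter(lookup func(string) (string, bool)) (*Router, error) {
	dsns := make(map[DataResidency]string)
	for _, region := range []DataResidency{US, EU, APAC} {
		if dsn, ok := lookup("APERIO_REGION_DSN_" + string(region)); ok && dsn != "" {
			dsns[region] = dsn
		}
	}
	if len(dsns) == 0 {
		return nil, errors.New("no APERIO_REGION_DSN_* variables configured")
	}
	return &Router{dsns: dsns}, nil
}

// DSNFor fails closed: an org pinned to an unconfigured region gets an error
// rather than being routed to some other region's database.
func (r *Router) DSNFor(region DataResidency) (string, error) {
	dsn, ok := r.dsns[region]
	if !ok {
		return "", fmt.Errorf("no database configured for region %q", region)
	}
	return dsn, nil
}

func main() {
	router, err := NewRouter(os.LookupEnv)
	if err != nil {
		fmt.Println("routing disabled:", err)
		return
	}
	_, err = router.DSNFor(EU)
	fmt.Println("EU configured:", err == nil)
}
```

Failing closed is a deliberate choice in this sketch: silently falling back to a default region would defeat the purpose of the residency pin.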

WORM audit log

TenantAuditLog becomes append-only:

  • DB-level: revoke UPDATE/DELETE from the application role; only INSERT permitted.
  • Application-level: each row carries prevRowHash (SHA-256 of the previous row's canonical JSON), forming a hash chain. The latest row's hash is signed daily by the org's KEK (via the KMS provider's Sign capability) and persisted as an AuditChainAttestation record.
  • Optional export: stream every row into an S3 Object Lock bucket (compliance mode) for off-system immutability.
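The hash chain described above can be sketched as follows. `AuditRow` is a hypothetical subset of TenantAuditLog, and the "canonical JSON" here is simply Go's `encoding/json` output for a struct with a fixed field order, which is an assumption; the first row uses an empty `PrevRowHash` where the schema would store NULL.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// AuditRow is a hypothetical subset of TenantAuditLog. RowHash is excluded
// from the hashed JSON; PrevRowHash is included, which links the rows.
type AuditRow struct {
	ID          string `json:"id"`
	Action      string `json:"action"`
	PrevRowHash string `json:"prevRowHash"`
	RowHash     string `json:"-"`
}

// hashRow returns the hex SHA-256 of the row's JSON. encoding/json emits
// struct fields in declaration order, which serves as the canonical form here.
func hashRow(r AuditRow) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// Append links a new row to the current chain head and returns the new chain.
func Append(chain []AuditRow, id, action string) ([]AuditRow, error) {
	row := AuditRow{ID: id, Action: action}
	if n := len(chain); n > 0 {
		row.PrevRowHash = chain[n-1].RowHash
	}
	h, err := hashRow(row)
	if err != nil {
		return chain, err
	}
	row.RowHash = h
	return append(chain, row), nil
}

// Verify returns the index of the first row that breaks the chain, or -1.
func Verify(chain []AuditRow) int {
	prev := ""
	for i, row := range chain {
		h, err := hashRow(row)
		if err != nil || row.PrevRowHash != prev || row.RowHash != h {
			return i
		}
		prev = row.RowHash
	}
	return -1
}

func main() {
	var chain []AuditRow
	chain, _ = Append(chain, "1", "login")
	chain, _ = Append(chain, "2", "rotate-dek")
	fmt.Println("intact:", Verify(chain) == -1)
	chain[0].Action = "tampered"
	fmt.Println("first bad row:", Verify(chain))
}
```

The chain alone only detects edits to interior rows. Someone with write access could still rewrite the tail and recompute every hash, which is the gap the daily signed attestation of the head hash is meant to close.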

New schema:

model TenantAuditLog {
  // ... existing ...
  prevRowHash  String? @map("prev_row_hash") @db.VarChar(64)
  rowHash      String  @map("row_hash") @db.VarChar(64)
}

model AuditChainAttestation {
  id              String   @id @default(cuid())
  organizationId  String   @map("organization_id")
  windowStart     DateTime @map("window_start")
  windowEnd       DateTime @map("window_end")
  headHash        String   @map("head_hash") @db.VarChar(64)
  signature       Bytes
  signedByKekRef  String   @map("signed_by_kek_ref") @db.VarChar(500)
  createdAt       DateTime @default(now()) @map("created_at")
  organization    Organization @relation(...)
  @@index([organizationId, windowEnd])
  @@map("audit_chain_attestations")
}

Configuration

APERIO_KMS_PROVIDER=aws-kms|gcp-kms|azure-keyvault|vault-transit|local
APERIO_KMS_DEFAULT_KEK_REF=arn:aws:kms:us-east-1:...    # org-level override allowed
APERIO_DEK_CACHE_TTL_SECONDS=600
APERIO_DATA_RESIDENCY_ROUTING=enabled
APERIO_REGION_DSN_US=postgres://...
APERIO_REGION_DSN_EU=postgres://...
APERIO_REGION_DSN_APAC=postgres://...
APERIO_WORM_AUDIT_S3_BUCKET=acme-aperio-audit-worm     # optional

UI surface

  • /admin/security/encryption — view current KEK provider, KEK ref, DEK version, last rotation; trigger rotation; switch KEK provider.
  • /admin/security/audit — toggle WORM mode; view chain attestations; download audit chain proof for a window.
  • /admin/security/residency — view region; (region change requires support-led data migration).
  • Org settings: residency selector at org creation time.

Phasing

| Phase | Scope |
| --- | --- |
| P1 | KMS provider abstraction; OrganizationKey schema; envelope encryption for IntegrationConnection.encryptedAccessToken; local + aws-kms providers; /admin/security/encryption view |
| P2 | gcp-kms + azure-keyvault + vault-transit providers; DEK rotation w/ lazy re-encrypt; rotation UI |
| P3 | Per-org data residency routing; multi-region Postgres setup docs |
| P4 | WORM audit log + chain attestation; optional S3 Object Lock export |

Open questions

  • KEK access on customer KMS — IAM role assumption vs. customer-provided service account. Cross-account role chain seems cleanest.
  • What's the boundary between "self-hosted" and "managed" tenants? Self-hosted ops can do everything inline; managed tenants need a delegation contract.
  • Rotation cadence — operator-triggered only, or built-in cron (e.g. annual auto-rotate)?
  • For WORM, do we ship a verifier CLI (aperio audit verify --window 2026-Q1) auditors can run themselves?

Labels: enhancement, security-hardening, tier-3-operator-dx
