[High] Add Kubernetes manifests and production deployment runbook #167

@robertocarlous

Labels: priority/high, area/infra, documentation, feature

Body:

## Summary
A production-grade `Dockerfile` exists, but there is no Kubernetes (or equivalent orchestration) configuration or end-to-end deployment guide. This blocks repeatable, zero-downtime production releases.

## Problem
Current state:
- `Dockerfile` runs `prisma migrate deploy && node dist/index.js` in one container
- `docker-compose.yml` only provisions Postgres
- No manifests for:
  - Deployment / StatefulSet
  - Service / Ingress
  - Secrets management
  - Liveness vs readiness probes
  - Resource limits
  - Horizontal Pod Autoscaler
  - Init container for migrations (recommended for atomic rollouts)

Without this, each deploy is manual and error-prone.

## User story
**As a** DevOps engineer,  
**I want** Kubernetes manifests and a runbook,  
**so that** I can deploy NeuroWealth consistently with health checks, secrets, and safe migrations.

## Proposed solution
Add `deploy/k8s/` with:

### Workloads
- `deployment.yaml` — app container, non-root user (matches Dockerfile)
- `service.yaml` — ClusterIP on port 3001
- `ingress.yaml` — TLS termination, optional WAF annotations
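
A minimal sketch of what `deployment.yaml` and `service.yaml` could look like. The resource names, image reference, ConfigMap/Secret names, replica count, and resource figures are placeholders, not values taken from the repo. Overriding `command` is one way to keep the app container from running migrations, assuming `dist/index.js` resolves from the image's working directory.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: neurowealth-api            # hypothetical name
  labels:
    app: neurowealth-api
spec:
  replicas: 2                      # placeholder; see the event listener constraint before scaling
  selector:
    matchLabels:
      app: neurowealth-api
  template:
    metadata:
      labels:
        app: neurowealth-api
    spec:
      securityContext:
        # Rejects the pod if the image would run as root. If the Dockerfile's
        # USER is a name rather than a numeric UID, also set runAsUser.
        runAsNonRoot: true
      containers:
        - name: api
          image: registry.example.com/neurowealth:1.2.3   # placeholder image
          # Replaces the image entrypoint/CMD so no migration runs at startup.
          command: ["node", "dist/index.js"]
          ports:
            - containerPort: 3001
          envFrom:
            - configMapRef:
                name: neurowealth-config     # hypothetical ConfigMap name
            - secretRef:
                name: neurowealth-secrets    # hypothetical Secret name
          resources:
            requests:
              cpu: 100m            # placeholder values; size from real usage
              memory: 256Mi
            limits:
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: neurowealth-api
spec:
  type: ClusterIP
  selector:
    app: neurowealth-api
  ports:
    - port: 3001
      targetPort: 3001
```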

### Probes (must match existing endpoints)
- Liveness: `GET /health/live`
- Readiness: `GET /health/ready`
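
As a sketch, the probes could be wired into the app container like this. The paths come from the endpoints above and the port from the Service; the timing values are untuned assumptions.

```yaml
# Fragment of spec.template.spec.containers[0] in deployment.yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 3001
  initialDelaySeconds: 10   # assumed startup allowance
  periodSeconds: 10
  failureThreshold: 3       # container is restarted after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3001
  periodSeconds: 5
  failureThreshold: 3       # pod is removed from Service endpoints, not restarted
```

Keeping the two endpoints separate means a failing readiness check takes the pod out of rotation without restarting it, while only a failing liveness check triggers a restart.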

### Jobs
- `migration-job.yaml` — runs `npx prisma migrate deploy` as a pre-deploy Job
- Document: app container should NOT run migrations in K8s (split from Dockerfile CMD for prod)
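
A possible shape for `migration-job.yaml`, as a sketch only. The Job name, image reference, and Secret name are placeholders, and it assumes the image ships the Prisma CLI and schema so that `npx prisma migrate deploy` works inside it.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: neurowealth-migrate        # hypothetical name
spec:
  backoffLimit: 0                  # no automatic retry; inspect a failed migration by hand
  ttlSecondsAfterFinished: 3600    # clean up the finished Job after an hour
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry.example.com/neurowealth:1.2.3   # placeholder; same tag as the app
          command: ["npx", "prisma", "migrate", "deploy"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: neurowealth-secrets   # hypothetical Secret name
                  key: DATABASE_URL
```

A rollout script could block on `kubectl wait --for=condition=complete job/neurowealth-migrate --timeout=300s` before applying the Deployment. Since a Job's pod template is immutable, a previous Job with the same name has to be deleted (or expire via the TTL) before the next release applies it again.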

### Config
- `configmap.yaml` — non-secret env (rate limits, CORS origins, log level)
- `secret.yaml.example` — template for:
  - `DATABASE_URL`
  - `JWT_SEED`
  - `WALLET_ENCRYPTION_KEY`
  - `STELLAR_AGENT_SECRET_KEY`
  - `ANTHROPIC_API_KEY`
  - `ADMIN_API_TOKEN`
  - `TWILIO_AUTH_TOKEN`
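
A sketch of `secret.yaml.example` with the keys listed above. The Secret name and the placeholder values are assumptions; real values would come from the secrets store and must never be committed.

```yaml
# Template only: copy, fill in, and apply out of band. Do not commit real values.
apiVersion: v1
kind: Secret
metadata:
  name: neurowealth-secrets        # hypothetical name
type: Opaque
stringData:                        # plain strings; the API server stores them base64-encoded
  DATABASE_URL: "postgresql://USER:PASSWORD@HOST:5432/DATABASE"
  JWT_SEED: "REPLACE_ME"
  WALLET_ENCRYPTION_KEY: "REPLACE_ME"
  STELLAR_AGENT_SECRET_KEY: "REPLACE_ME"
  ANTHROPIC_API_KEY: "REPLACE_ME"
  ADMIN_API_TOKEN: "REPLACE_ME"
  TWILIO_AUTH_TOKEN: "REPLACE_ME"
```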

### Documentation (`docs/DEPLOYMENT.md`)
- Prerequisites (Postgres, RPC URLs, secrets store)
- Staging vs production env matrix
- Rollout procedure (migrate → deploy → verify readiness)
- Rollback procedure
- Scaling guidance (note: event listener may need single-leader pattern — document constraint)

## Acceptance criteria
- [ ] K8s manifests committed under `deploy/k8s/`
- [ ] Liveness/readiness probes configured correctly
- [ ] Secrets never committed in plaintext
- [ ] Migration strategy documented (init job vs startup)
- [ ] `docs/DEPLOYMENT.md` covers staging and production
- [ ] Optional: GitHub Actions workflow to validate manifests (`kubeconform` or `kubectl apply --dry-run=client`)
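
For the optional validation item, one possible workflow is sketched below. The file path, workflow name, and trigger are assumptions; it installs `kubeconform` with `go install` and validates everything under `deploy/k8s/` offline, so no cluster access is needed.

```yaml
# .github/workflows/validate-manifests.yml (hypothetical path)
name: Validate Kubernetes manifests
on:
  pull_request:
    paths:
      - "deploy/k8s/**"
jobs:
  kubeconform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: "stable"
      - name: Install kubeconform
        run: go install github.com/yannh/kubeconform/cmd/kubeconform@latest
      - name: Validate manifests
        # -strict rejects unknown or duplicated fields; -summary prints totals
        run: '"$(go env GOPATH)/bin/kubeconform" -strict -summary deploy/k8s/'
```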

## Important architectural note
The Stellar event listener uses a DB cursor (`event_cursors`). Document whether multiple replicas are supported or if a **single active consumer** pattern is required (leader election / dedicated worker deployment).
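
If a single active consumer turns out to be required, one option is a dedicated worker Deployment pinned to one replica, sketched below. The name, image, and the `EVENT_LISTENER_ENABLED` flag are hypothetical: the app would need some way to enable the listener in this pod and disable it in the API pods, and nothing in the current codebase is assumed to provide that yet.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: neurowealth-event-listener   # hypothetical name
spec:
  replicas: 1                        # exactly one consumer of the event_cursors cursor
  strategy:
    type: Recreate                   # old pod is terminated before the new one starts
  selector:
    matchLabels:
      app: neurowealth-event-listener
  template:
    metadata:
      labels:
        app: neurowealth-event-listener
    spec:
      containers:
        - name: listener
          image: registry.example.com/neurowealth:1.2.3   # placeholder image
          command: ["node", "dist/index.js"]
          env:
            - name: EVENT_LISTENER_ENABLED   # hypothetical flag, not an existing setting
              value: "true"
          envFrom:
            - secretRef:
                name: neurowealth-secrets    # hypothetical Secret name
```

`Recreate` trades a short gap in event processing during rollouts for avoiding two listeners running side by side, which a `RollingUpdate` would briefly allow. It is not a hard guarantee under node failures, so leader election may still be worth documenting as the stricter alternative.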

## Out of scope
- Terraform for cloud infra (separate issue)
- Multi-region active-active
