feat(dev): add k3s dev target (CLUSTER=k3s) with full kind parity#516

Merged
ericfitz merged 8 commits into main from feature/k3s-dev-target on Jul 4, 2026
@ericfitz ericfitz commented Jul 4, 2026

Summary

Adds a selectable k3s dev deployment target (make dev-up CLUSTER=k3s) that deploys the full TMI stack to a remote k3s cluster with complete parity to the kind path. kind remains the default (CLUSTER=kind); the two are orthogonal to DB=postgres|oracle.

Both paths were verified end-to-end this session: make dev-up CLUSTER=k3s and make dev-up (kind) each come up clean and serve localhost:8080 → HTTP 200.

What's included

  • CLUSTER=kind|k3s selector threaded through the Makefile and scripts/devenv.py (default kind); orthogonal to DB.
  • Cluster-aware lifecycle in scripts/lib/cluster.py: k3s selects the remote k3s-rp context (no create/destroy); registry + image refs resolve to the in-cluster registry rp2:30500.
  • In-cluster registry (deployments/k8s/dev/k3s/registry.yml) — the Mac builds arm64 images and pushes to it; nodes pull from it.
  • k3s overlay (deployments/k8s/dev/k3s/) — full workload set (server, controller, redis, extractor + chunk-embed workers), all TMI images remapped to rp2:30500; in-cluster Postgres on longhorn; DB-URL host rewritten to the postgres Service.
  • deploy.start/devenv wiring — registry bootstrap, in-cluster Postgres prerequisite, cluster-aware context guard, and a server port-forward to preserve the localhost:8080 contract (k3s has no extraPortMappings; CATS uses the NodePort at rp2:30080). dev-nuke does a namespace-scoped hard reset.
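The cluster-aware lookups listed above can be pictured roughly as follows. This is a minimal sketch, not the code in scripts/lib/cluster.py: the function names and the k3s-rp, rp2:30500, and kind-tmi-dev values come from this PR, while the function bodies, the signatures, the kind registry address, the image name, and the tag are assumptions made for illustration.

```python
# Values quoted in this PR.
K3S_CONTEXT = "k3s-rp"
K3S_REGISTRY = "rp2:30500"
KIND_CONTEXT = "kind-tmi-dev"
# Hypothetical placeholder: the PR does not state the kind-side registry address.
KIND_REGISTRY = "localhost:5001"

# One row per cluster target: (kubectl context, image registry).
_TARGETS = {
    "kind": (KIND_CONTEXT, KIND_REGISTRY),
    "k3s": (K3S_CONTEXT, K3S_REGISTRY),
}


def _lookup(cluster):
    """Return the (context, registry) pair, rejecting unknown targets."""
    try:
        return _TARGETS[cluster]
    except KeyError:
        raise ValueError(f"unknown cluster target: {cluster!r}") from None


def expected_context(cluster="kind"):
    """kubectl context the deploy guard should see for this target."""
    return _lookup(cluster)[0]


def registry_for(cluster="kind"):
    """Registry that images are pushed to and pulled from."""
    return _lookup(cluster)[1]


def local_image_ref(name, tag="dev", cluster="kind"):
    """Fully qualified image reference for the selected target."""
    return f"{registry_for(cluster)}/{name}:{tag}"


if __name__ == "__main__":
    # "tmi-server" and "dev" are illustrative names, not taken from the PR.
    print(local_image_ref("tmi-server", cluster="k3s"))
```

Keeping context and registry in one table means a new target is a single added row, and an unknown value fails loudly instead of silently falling back to kind.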

Notable fixes discovered during bring-up

  • redis:7-alpine on k3s — the Pi 5 (BCM2712) nodes run a 16KB-page kernel; the chainguard redis image's bundled jemalloc is built for 4KB pages and aborts with "Unsupported system page size". redis:7-alpine (libc malloc) runs the same redis-server/args cleanly. Kind keeps chainguard redis; only the k3s overlay remaps it.
  • kind context on skip-create: cluster.up now always use-contexts the kind context, not just on kind create, so an active context from another cluster (e.g. k3s-rp) no longer fails the deploy context guard.
  • /etc/hosts pin for rp2 — documented, so kubectl resolves the API server deterministically after a node reboot (mDNS short-name resolution is unreliable then).
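The redis failure above comes down to a page-size mismatch, which is easy to check before choosing an image. The sketch below is a diagnostic only and is not part of this PR. It assumes the allocator was built for 4KB pages (as the PR reports for the chainguard image) and that jemalloc refuses to start when the kernel page size exceeds its build-time page size.

```python
import os

# Assumption taken from the PR text: the bundled jemalloc targets 4KB pages.
JEMALLOC_BUILD_PAGE_SIZE = 4096


def system_page_size():
    """Kernel page size in bytes (POSIX only; os.sysconf is not on Windows)."""
    return os.sysconf("SC_PAGE_SIZE")


def jemalloc_build_is_compatible(page_size, built_for=JEMALLOC_BUILD_PAGE_SIZE):
    """True when the runtime page size does not exceed the build-time one."""
    return page_size <= built_for


if __name__ == "__main__":
    size = system_page_size()
    verdict = "ok" if jemalloc_build_is_compatible(size) else "expect an abort"
    print(f"page size {size} bytes: {verdict}")
```

Run on a node, this would report 16384 bytes for a 16KB-page kernel, which is the case where the PR swaps in redis:7-alpine.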

One-time k3s host/node setup

Documented in deployments/k8s/dev/k3s/README-node-setup.md: Mac /etc/hosts pin + insecure-registry, and per-node containerd registry mirror (endpoint by IP).

Testing

  • make test-dev-scripts — 88 unit tests pass (cluster-aware helpers, --cluster parser, DB-host rewrite, port-forward gating).
  • Live: make dev-up CLUSTER=k3s (full stack, HTTP 200) and make dev-up (kind non-regression, HTTP 200).

🤖 Generated with Claude Code

https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

ericfitz and others added 8 commits July 2, 2026 22:24
Threads a --cluster global option through the Makefile dev-* targets and
scripts/devenv.py (mirrors the existing --db selector). Parsed in both
orderings and orthogonal to --db; defaults to kind so the current path is
unchanged. Command functions consume it in later tasks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
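One way to accept a global option both before and after the verb, as the commit above describes for --cluster and --db, is to register it on the top-level parser and on every subparser. The following is a sketch with argparse under assumed names; how scripts/devenv.py actually parses its arguments is not shown in this PR, and the verb list is illustrative.

```python
import argparse


def _add_globals(parser, *, suppress_defaults):
    """Register --cluster/--db; subparsers suppress defaults so they never
    overwrite a value already parsed at the top level."""
    cluster_default = argparse.SUPPRESS if suppress_defaults else "kind"
    db_default = argparse.SUPPRESS if suppress_defaults else "postgres"
    parser.add_argument("--cluster", choices=("kind", "k3s"),
                        default=cluster_default)
    parser.add_argument("--db", choices=("postgres", "oracle"),
                        default=db_default)


def build_parser():
    parser = argparse.ArgumentParser(prog="devenv.py")
    _add_globals(parser, suppress_defaults=False)
    verbs = parser.add_subparsers(dest="verb", required=True)
    for verb in ("up", "down", "nuke"):  # illustrative subset of verbs
        _add_globals(verbs.add_parser(verb), suppress_defaults=True)
    return parser


if __name__ == "__main__":
    p = build_parser()
    print(p.parse_args(["--cluster", "k3s", "up"]))
    print(p.parse_args(["up", "--cluster", "k3s"]))
```

The argparse.SUPPRESS default on the subparsers is the important detail: without it, a subparser's own default would clobber a value given before the verb.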
Adds K3S_CONTEXT/K3S_REGISTRY plus registry_for(), expected_context(), and a
cluster kwarg on local_image_ref()/up()/down(): CLUSTER=k3s selects the k3s-rp
context (never create/delete) and targets the rp2:30500 registry, while kind is
unchanged. Threads cluster through devenv's up/down/nuke/cluster commands.
Unit tests cover the new helpers and the --cluster parser (84 tests pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

Adds deployments/k8s/dev/k3s/registry.yml (registry:2 Deployment + longhorn PVC
+ NodePort 30500, in a tmi-platform namespace) and a README documenting the
one-time Mac insecure-registries + per-node registries.yaml config. build_and_push
and remove_local_images take a cluster_target so k3s targets rp2:30500. Mac and
nodes are both arm64, so a plain docker build already yields arm64 images — no
buildx/--platform needed; only the target registry differs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

Nodes don't resolve each other's bare hostnames (containerd: 'lookup rp2: no
such host'), so the registries.yaml mirror endpoint must be an IP
(http://192.168.1.2:30500). The mirror key stays rp2:30500 so image refs and the
Mac's insecure-registries config are unchanged.

Adds the k3s kustomize overlay (full stack: server/controller/redis/extractor/
chunk-embed, all TMI images remapped to rp2:30500 via images-transformer + CRD
image patches) and a single-node in-cluster Postgres StatefulSet on longhorn
(vanilla chainguard postgres — same as kind; no custom image or pgvector needed).
deploy.overlay_dir_for and deliver_config are cluster-aware; in_cluster_db_host()
rewrites the server's DB URL host to the in-cluster 'postgres' Service for k3s.
Postgres verified Running on longhorn; overlay renders clean (88 tests pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
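The DB-host rewrite described in the commit above amounts to swapping only the host part of a connection URL while leaving credentials, port, path, and query untouched. Here is a minimal sketch using the standard library; rewrite_db_host is a hypothetical helper name (the PR's own function is in_cluster_db_host(), whose signature is not shown), and the sample URL is invented.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_db_host(db_url, new_host="postgres"):
    """Return db_url with its host replaced by new_host.

    Userinfo, port, path and query are preserved. "postgres" is the
    in-cluster Service name mentioned in the PR.
    """
    parts = urlsplit(db_url)
    # Split "user:pass@host:port" on the last "@" so passwords containing
    # "@" stay with the userinfo.
    userinfo, at, _hostport = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{new_host}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # Invented example URL, not taken from the project's configuration.
    print(rewrite_db_host("postgres://tmi:secret@localhost:5432/tmi?sslmode=disable"))
```

Parsing the URL rather than doing a string replace avoids accidentally rewriting a database or user name that happens to match the old host.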
deploy.start/restart branch on cluster_target: ensure the in-cluster registry is
up before push, bring up in-cluster Postgres before the server, rewrite the DB
host to the postgres Service, apply the k3s overlay, and preserve localhost:8080
via a server port-forward (k3s has no extraPortMappings; CATS uses NodePort
rp2:30080). devenv cmd_up/down/restart/reset/nuke/deploy thread cluster_target and
skip the Mac Postgres container for k3s (DB is in-cluster); nuke does a
namespace-scoped hard reset. Context guard accepts the k3s-rp context. Updated the
#463 regression test: the server port-forward is now k3s-only and always gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
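The k3s-only port-forward gating from the commit above can be sketched as a pure function that either returns a kubectl command line or nothing. The namespace and Service names below are hypothetical, as is the assumption that the Service listens on 8080; only the localhost:8080 contract and the k3s-only gating come from the PR.

```python
def server_port_forward_argv(cluster_target, context,
                             namespace="tmi", service="tmi-server",
                             port=8080):
    """Build the kubectl port-forward command for the server, or None.

    Only k3s needs the forward here: per the PR it has no extraPortMappings,
    so localhost:<port> must be provided by kubectl instead.
    """
    if cluster_target != "k3s":
        return None
    return [
        "kubectl", "--context", context, "-n", namespace,
        "port-forward", f"svc/{service}", f"{port}:{port}",
    ]


if __name__ == "__main__":
    print(server_port_forward_argv("k3s", "k3s-rp"))
    print(server_port_forward_argv("kind", "kind-tmi-dev"))
```

Returning the argument list instead of spawning the process keeps the gating decision unit-testable without a cluster; the caller would hand a non-None result to subprocess.Popen.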
…B pages)

The Pi 5 (BCM2712) k3s nodes run a 16KB-page kernel; cgr.dev/chainguard/redis's
bundled jemalloc is built for 4KB pages and aborts at startup with 'Unsupported
system page size', CrashLooping redis and blocking the server (Redis ping times
out). redis:7-alpine is built with libc malloc (page-size-agnostic) and runs the
same redis-server with identical args — verified starting cleanly on the nodes.
The kind path keeps chainguard redis; only the k3s overlay remaps the image.
Also document pinning rp2 in the Mac's /etc/hosts (mDNS short-name resolution
fails right after a node reboot, aborting dev-up pre-flight).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

…reate)

kind create cluster sets the kubectl context, but the skip-create path (cluster
already exists) did not, so an active context from another cluster (e.g. k3s-rp
after working the k3s dev target) lingered and failed deploy's context guard with
'Context k3s-rp is not the expected kind-tmi-dev'. Always use-context the kind
context after ensuring the cluster exists, mirroring the k3s path. Idempotent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
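The fix in the commit above is to select the kind context unconditionally, so the later guard cannot trip on a context left over from another cluster. A rough sketch of that ordering follows; the function names are hypothetical, the command runner is injectable so the logic can be exercised without kubectl, and the error text mirrors the message quoted in the commit.

```python
import subprocess

KIND_CONTEXT = "kind-tmi-dev"  # context name quoted in the guard error above


def ensure_kind_context(run=subprocess.run, context=KIND_CONTEXT):
    """Select the kind context whether or not the cluster was just created.

    Safe to repeat: use-context on the already-active context changes nothing.
    """
    run(["kubectl", "config", "use-context", context], check=True)


def current_context(run=subprocess.run):
    """Return the active kubectl context name."""
    result = run(["kubectl", "config", "current-context"],
                 check=True, capture_output=True, text=True)
    return result.stdout.strip()


def guard_context(current, expected=KIND_CONTEXT):
    """Refuse to deploy when kubectl points at a different cluster."""
    if current != expected:
        raise RuntimeError(f"Context {current} is not the expected {expected}")
```

With this ordering, calling ensure_kind_context() on both the create and skip-create paths makes a following guard_context(current_context()) pass even if k3s-rp was active beforehand.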
@ericfitz ericfitz merged commit 98062f8 into main Jul 4, 2026
12 checks passed
@ericfitz ericfitz deleted the feature/k3s-dev-target branch July 4, 2026 02:16
ericfitz added a commit that referenced this pull request Jul 4, 2026
…netes (CLUSTER=docker-desktop default) (#520)

* docs(dev): design spec for the Docker Desktop dev target

Replace the standalone-kind dev path with Docker Desktop's kind-provisioned
Kubernetes (context docker-desktop): registry-free image delivery via ctr import,
port-forward endpoint, in-cluster Postgres, chainguard redis (4KB pages). e2e-platform
stays on standalone kind (needs Calico/NetworkPolicy + ephemeral clusters). Sequenced
after the k3s target (#516).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* docs(dev): implementation plan for the Docker Desktop dev target

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* feat(dev): add docker-desktop cluster identity + selector plumbing

* feat(dev): registry-free image import for docker-desktop (docker save | ctr import)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* feat(dev): docker-desktop overlay + in-cluster Postgres (default storageclass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* feat(dev): wire deploy.start/restart/nuke for the docker-desktop target

- Rename apply_k3s_postgres -> apply_incluster_postgres(cluster_target)
  so it resolves the right overlay's postgres.yml (k3s or docker-desktop).
- Rename teardown_k3s_namespace -> teardown_namespace and generalize its
  docstring to cover both no-own-cluster targets.
- Extend server port-forward gating in wait_and_forward and restart from
  cluster_target == 'k3s' to cluster_target in ('k3s', 'docker-desktop').
- Add docker-desktop branch to start(): skip registry setup, import images
  via build_and_push, bring up in-cluster Postgres via apply_incluster_postgres.
- Add docker-desktop branch to restart(): same no-registry guard shape.
- Generalize cmd_nuke guard in devenv.py from k3s-only to both no-own-cluster
  targets; calls teardown_namespace and redeploys via deploy.start.
- Update test_server_port_forward_is_k3s_only to assert tuple-form gating;
  add test_server_port_forward_gated_for_docker_desktop_too.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* feat(dev): default CLUSTER to docker-desktop; document setup

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* refactor(dev): retire the standalone-kind dev path (docker-desktop is the default)

- Delete deployments/k8s/dev/kind-cluster.yml (dev kind config, not e2e)
- scripts/lib/cluster.py: remove all kind lifecycle helpers (ensure_registry,
  is_registry_running, cluster_exists, _kind_node_containers, start_stopped_nodes,
  connect_registry_to_kind) and kind-only constants (CLUSTER_NAME, REGISTRY_CONTAINER,
  REGISTRY_IMAGE, REGISTRY_PORT, LOCAL_REGISTRY, KIND_CONFIG); up()/down() now handle
  only k3s and docker-desktop (raises ValueError on unknown target); registry_for() and
  expected_context() likewise simplified and made explicit.
- scripts/lib/deploy.py: remove kind registry block from start()/restart(), remove
  REGISTRY_CONTAINER stop from teardown(), update HOST_PORT/NODE_PORT comment block,
  defaults updated from "kind" to "docker-desktop".
- scripts/devenv.py: drop _uses_host_db(), _db_profile(), database import, cmd_db and
  "db" verb (host Postgres container no longer in use); drop "kind" from --cluster
  choices; simplify cmd_nuke (always namespace-scoped for k3s/docker-desktop).
- scripts/lib/devstatus.py: replace kind/registry/host-db status rows with a single
  kube-context row; remove cluster and database imports.
- All tests updated: kind-specific tests removed, kind defaults updated to
  docker-desktop, new tests assert kind raises ValueError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* fix(dev): remove dangling dev-db-up/down targets and stale kind wording after kind retirement

- Makefile: delete dev-db-up and dev-db-down targets (called now-removed
  devenv.py db verb) and remove them from .PHONY
- scripts/help.py: remove dev-db-up/dev-db-down help entries; update
  stale "kind" wording to match current docker-desktop/k3s reality
- scripts/lib/deploy.py: change seven cluster_target default args from
  "kind" to "docker-desktop" for consistency with cluster.py

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* feat(dev): restore DB=oracle parity on docker-desktop; drop orphaned kind overlays; fix stale comments

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>