feat(dev): add k3s dev target (CLUSTER=k3s) with full kind parity #516
Merged
Threads a --cluster global option through the Makefile dev-* targets and scripts/devenv.py (mirrors the existing --db selector). Parsed in both orderings and orthogonal to --db; defaults to kind so the current path is unchanged. Command functions consume it in later tasks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
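The "parsed in both orderings" behaviour can be illustrated with a minimal sketch. It is not the code in scripts/devenv.py; `extract_cluster`, `CLUSTER_CHOICES`, and `DEFAULT_CLUSTER` are hypothetical names, and the only facts taken from the commit are the `--cluster` flag, its independence from `--db`, and the `kind` default.

```python
from typing import List, Tuple

# Choices and default as described in this commit; the names are illustrative.
CLUSTER_CHOICES = ("kind", "k3s")
DEFAULT_CLUSTER = "kind"


def extract_cluster(argv: List[str]) -> Tuple[str, List[str]]:
    """Pull --cluster out of argv wherever it appears.

    Returns (cluster, remaining_args) so the flag works both before and
    after the verb, and leaves every other option (such as --db) untouched.
    """
    cluster = DEFAULT_CLUSTER
    rest: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--cluster":
            if i + 1 >= len(argv):
                raise ValueError("--cluster requires a value")
            cluster = argv[i + 1]
            i += 2
            continue
        if arg.startswith("--cluster="):
            cluster = arg.split("=", 1)[1]
            i += 1
            continue
        rest.append(arg)
        i += 1
    if cluster not in CLUSTER_CHOICES:
        raise ValueError(f"unknown cluster {cluster!r}")
    return cluster, rest
```

Under these assumptions, `extract_cluster(["up", "--cluster", "k3s"])` and `extract_cluster(["--cluster", "k3s", "up"])` both yield `("k3s", ["up"])`, and omitting the flag falls back to `kind`.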
Adds K3S_CONTEXT/K3S_REGISTRY plus registry_for(), expected_context(), and a cluster kwarg on local_image_ref()/up()/down(): CLUSTER=k3s selects the k3s-rp context (never create/delete) and targets the rp2:30500 registry, while kind is unchanged. Threads cluster through devenv's up/down/nuke/cluster commands. Unit tests cover the new helpers and the --cluster parser (84 tests pass). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
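A rough sketch of how the helpers named in this commit might fit together. The function names and the `k3s-rp` / `rp2:30500` values come from the commit message and `kind-tmi-dev` from a later commit in this PR; the function bodies, the kind registry address `localhost:5000`, the `dev` tag, and the image name used below are assumptions made for illustration only.

```python
K3S_CONTEXT = "k3s-rp"
K3S_REGISTRY = "rp2:30500"
KIND_CONTEXT = "kind-tmi-dev"
KIND_REGISTRY = "localhost:5000"  # ASSUMPTION: the real kind registry address is not stated in this PR


def registry_for(cluster: str = "kind") -> str:
    """Return the registry that image refs should target for a cluster."""
    if cluster == "k3s":
        return K3S_REGISTRY
    if cluster == "kind":
        return KIND_REGISTRY
    raise ValueError(f"unknown cluster {cluster!r}")


def expected_context(cluster: str = "kind") -> str:
    """Return the kubectl context the deploy guard should accept."""
    if cluster == "k3s":
        return K3S_CONTEXT
    if cluster == "kind":
        return KIND_CONTEXT
    raise ValueError(f"unknown cluster {cluster!r}")


def local_image_ref(name: str, tag: str = "dev", cluster: str = "kind") -> str:
    """Build a full image reference against the cluster's registry."""
    return f"{registry_for(cluster)}/{name}:{tag}"
```

With these definitions, `local_image_ref("tmi-server", cluster="k3s")` would produce `rp2:30500/tmi-server:dev` (the image name is hypothetical), while the default call keeps targeting the kind registry.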
Adds deployments/k8s/dev/k3s/registry.yml (registry:2 Deployment + longhorn PVC + NodePort 30500, in a tmi-platform namespace) and a README documenting the one-time Mac insecure-registries + per-node registries.yaml config. build_and_push and remove_local_images take a cluster_target so k3s targets rp2:30500. Mac and nodes are both arm64, so a plain docker build already yields arm64 images — no buildx/--platform needed; only the target registry differs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
Nodes don't resolve each other's bare hostnames (containerd: 'lookup rp2: no such host'), so the registries.yaml mirror endpoint must be an IP (http://192.168.1.2:30500). The mirror key stays rp2:30500 so image refs and the Mac's insecure-registries config are unchanged.
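To make the key-versus-endpoint distinction concrete, here is a small sketch that renders the mirror stanza of a k3s `registries.yaml` and rejects endpoints that use a hostname. `render_registries_yaml` is a hypothetical helper, not something in this PR; the mirror key and IP endpoint are the ones quoted above.

```python
import ipaddress
from urllib.parse import urlsplit


def render_registries_yaml(mirror_key: str, endpoint_url: str) -> str:
    """Render a one-mirror registries.yaml body for k3s.

    The mirror key is what image refs use (e.g. rp2:30500); the endpoint
    is where containerd actually connects, so it must be an IP literal
    because the nodes cannot resolve each other's bare hostnames.
    """
    host = urlsplit(endpoint_url).hostname
    if host is None:
        raise ValueError(f"endpoint has no host: {endpoint_url!r}")
    # ip_address() raises ValueError for anything that is not an IP literal.
    ipaddress.ip_address(host)
    return (
        "mirrors:\n"
        f'  "{mirror_key}":\n'
        "    endpoint:\n"
        f'      - "{endpoint_url}"\n'
    )


if __name__ == "__main__":
    print(render_registries_yaml("rp2:30500", "http://192.168.1.2:30500"), end="")
```

Passing `http://rp2:30500` as the endpoint raises `ValueError`, which mirrors the failure mode described in the commit, where containerd reports 'lookup rp2: no such host'.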
Adds the k3s kustomize overlay (full stack: server/controller/redis/extractor/ chunk-embed, all TMI images remapped to rp2:30500 via images-transformer + CRD image patches) and a single-node in-cluster Postgres StatefulSet on longhorn (vanilla chainguard postgres — same as kind; no custom image or pgvector needed). deploy.overlay_dir_for and deliver_config are cluster-aware; in_cluster_db_host() rewrites the server's DB URL host to the in-cluster 'postgres' Service for k3s. Postgres verified Running on longhorn; overlay renders clean (88 tests pass). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
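The DB-URL host rewrite can be sketched as below. This is an illustration under assumptions rather than the PR's `in_cluster_db_host()`: the helper name `rewrite_db_host`, its signature, and the example URL are hypothetical, and the sketch does not handle bracketed IPv6 hosts.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_db_host(url: str, new_host: str = "postgres") -> str:
    """Replace only the host in a database URL.

    Credentials, port, path, and query string are preserved; the default
    new host is the in-cluster 'postgres' Service named in this commit.
    """
    parts = urlsplit(url)
    # netloc looks like "user:pass@host:port"; split off the credentials first.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    _old_host, colon, port = hostport.partition(":")
    netloc = f"{userinfo}{at}{new_host}{colon}{port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

For example, `rewrite_db_host("postgres://tmi:secret@localhost:5432/tmi?sslmode=disable")` returns the same URL with `localhost` replaced by `postgres`.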
deploy.start/restart branch on cluster_target: ensure the in-cluster registry is up before push, bring up in-cluster Postgres before the server, rewrite the DB host to the postgres Service, apply the k3s overlay, and preserve localhost:8080 via a server port-forward (k3s has no extraPortMappings; CATS uses NodePort rp2:30080). devenv cmd_up/down/restart/reset/nuke/deploy thread cluster_target and skip the Mac Postgres container for k3s (DB is in-cluster); nuke does a namespace-scoped hard reset. Context guard accepts the k3s-rp context. Updated the #463 regression test: the server port-forward is now k3s-only and always gated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
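The port-forward gating described here might look roughly like the following. The gating condition (k3s only) is from the commit; the function names, the Service name `tmi-server`, the namespace, and the remote port are placeholders chosen for the sketch, since the PR text does not state them.

```python
from typing import List


def needs_server_port_forward(cluster_target: str) -> bool:
    """k3s has no extraPortMappings, so localhost:8080 needs a port-forward there."""
    return cluster_target == "k3s"


def port_forward_cmd(
    context: str,
    namespace: str,
    service: str,
    local_port: int = 8080,
    remote_port: int = 8080,  # ASSUMPTION: the Service port is not given in this PR
) -> List[str]:
    """Build the kubectl command that forwards localhost:<local_port> to a Service."""
    return [
        "kubectl", "--context", context, "-n", namespace,
        "port-forward", f"svc/{service}", f"{local_port}:{remote_port}",
    ]


if __name__ == "__main__":
    target = "k3s"
    if needs_server_port_forward(target):
        # "tmi-server" and "tmi-platform" are illustrative values only.
        print(" ".join(port_forward_cmd("k3s-rp", "tmi-platform", "tmi-server")))
```

Keeping the predicate separate from the command builder makes the gating easy to assert in a unit test, which is the shape the updated #463 regression test appears to check.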
…B pages) The Pi 5 (BCM2712) k3s nodes run a 16KB-page kernel; cgr.dev/chainguard/redis's bundled jemalloc is built for 4KB pages and aborts at startup with 'Unsupported system page size', CrashLooping redis and blocking the server (Redis ping times out). redis:7-alpine is built with libc malloc (page-size-agnostic) and runs the same redis-server with identical args — verified starting cleanly on the nodes. The kind path keeps chainguard redis; only the k3s overlay remaps the image. Also document pinning rp2 in the Mac's /etc/hosts (mDNS short-name resolution fails right after a node reboot, aborting dev-up pre-flight). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
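The page-size constraint can be stated as a small check. In this PR the image is remapped statically by the k3s overlay, so the selection function below is purely illustrative; `redis_image_for_page_size` and `host_page_size` are hypothetical names, and only the two image references and the 4KB/16KB figures come from the commit.

```python
import os

CHAINGUARD_REDIS = "cgr.dev/chainguard/redis"
ALPINE_REDIS = "redis:7-alpine"
JEMALLOC_BUILD_PAGE_SIZE = 4096  # the bundled jemalloc is built for 4KB pages, per the commit


def host_page_size() -> int:
    """Return the memory page size of the machine running this script, in bytes."""
    return os.sysconf("SC_PAGE_SIZE")


def redis_image_for_page_size(page_size: int) -> str:
    """Pick a redis image that can start on a kernel with the given page size.

    A jemalloc built for 4KB pages aborts on larger pages, so anything above
    that falls back to the libc-malloc build.
    """
    if page_size <= JEMALLOC_BUILD_PAGE_SIZE:
        return CHAINGUARD_REDIS
    return ALPINE_REDIS
```

On a Pi 5 node with a 16KB-page kernel, `redis_image_for_page_size(16384)` selects `redis:7-alpine`, matching the remap applied by the k3s overlay.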
…reate) kind create cluster sets the kubectl context, but the skip-create path (cluster already exists) did not, so an active context from another cluster (e.g. k3s-rp after working the k3s dev target) lingered and failed deploy's context guard with 'Context k3s-rp is not the expected kind-tmi-dev'. Always use-context the kind context after ensuring the cluster exists, mirroring the k3s path. Idempotent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
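The fix amounts to moving the context switch out of the create branch. A minimal sketch, assuming the cluster is named `tmi-dev` (inferred from the `kind-tmi-dev` context in the error message) and using a hypothetical `ensure_kind_cluster` helper with an injected command runner so it can be exercised without kind or kubectl installed:

```python
from typing import Callable, List

KIND_CLUSTER_NAME = "tmi-dev"  # ASSUMPTION: inferred from the kind-tmi-dev context name
KIND_CONTEXT = f"kind-{KIND_CLUSTER_NAME}"


def ensure_kind_cluster(cluster_exists: bool, run: Callable[[List[str]], None]) -> None:
    """Create the kind cluster if needed, then always select its context.

    Selecting the context unconditionally is idempotent and covers the
    skip-create path, where a context from another cluster may still be active.
    """
    if not cluster_exists:
        run(["kind", "create", "cluster", "--name", KIND_CLUSTER_NAME])
    run(["kubectl", "config", "use-context", KIND_CONTEXT])
```

In real use the runner could be `lambda cmd: subprocess.run(cmd, check=True)`; in a test it can simply append each command to a list.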
ericfitz added a commit that referenced this pull request on Jul 4, 2026
…netes (CLUSTER=docker-desktop default) (#520)

* docs(dev): design spec for the Docker Desktop dev target

  Replace the standalone-kind dev path with Docker Desktop's kind-provisioned Kubernetes (context docker-desktop): registry-free image delivery via ctr import, port-forward endpoint, in-cluster Postgres, chainguard redis (4KB pages). e2e-platform stays on standalone kind (needs Calico/NetworkPolicy + ephemeral clusters). Sequenced after the k3s target (#516).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* docs(dev): implementation plan for the Docker Desktop dev target

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* feat(dev): add docker-desktop cluster identity + selector plumbing

* feat(dev): registry-free image import for docker-desktop (docker save | ctr import)

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* feat(dev): docker-desktop overlay + in-cluster Postgres (default storageclass)

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* feat(dev): wire deploy.start/restart/nuke for the docker-desktop target

  - Rename apply_k3s_postgres -> apply_incluster_postgres(cluster_target) so it resolves the right overlay's postgres.yml (k3s or docker-desktop).
  - Rename teardown_k3s_namespace -> teardown_namespace and generalize its docstring to cover both no-own-cluster targets.
  - Extend server port-forward gating in wait_and_forward and restart from cluster_target == 'k3s' to cluster_target in ('k3s', 'docker-desktop').
  - Add docker-desktop branch to start(): skip registry setup, import images via build_and_push, bring up in-cluster Postgres via apply_incluster_postgres.
  - Add docker-desktop branch to restart(): same no-registry guard shape.
  - Generalize cmd_nuke guard in devenv.py from k3s-only to both no-own-cluster targets; calls teardown_namespace and redeploys via deploy.start.
  - Update test_server_port_forward_is_k3s_only to assert tuple-form gating; add test_server_port_forward_gated_for_docker_desktop_too.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* feat(dev): default CLUSTER to docker-desktop; document setup

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* refactor(dev): retire the standalone-kind dev path (docker-desktop is the default)

  - Delete deployments/k8s/dev/kind-cluster.yml (dev kind config, not e2e)
  - scripts/lib/cluster.py: remove all kind lifecycle helpers (ensure_registry, is_registry_running, cluster_exists, _kind_node_containers, start_stopped_nodes, connect_registry_to_kind) and kind-only constants (CLUSTER_NAME, REGISTRY_CONTAINER, REGISTRY_IMAGE, REGISTRY_PORT, LOCAL_REGISTRY, KIND_CONFIG); up()/down() now handle only k3s and docker-desktop (raises ValueError on unknown target); registry_for() and expected_context() likewise simplified and made explicit.
  - scripts/lib/deploy.py: remove kind registry block from start()/restart(), remove REGISTRY_CONTAINER stop from teardown(), update HOST_PORT/NODE_PORT comment block, defaults updated from "kind" to "docker-desktop".
  - scripts/devenv.py: drop _uses_host_db(), _db_profile(), database import, cmd_db and "db" verb (host Postgres container no longer in use); drop "kind" from --cluster choices; simplify cmd_nuke (always namespace-scoped for k3s/docker-desktop).
  - scripts/lib/devstatus.py: replace kind/registry/host-db status rows with a single kube-context row; remove cluster and database imports.
  - All tests updated: kind-specific tests removed, kind defaults updated to docker-desktop, new tests assert kind raises ValueError.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* fix(dev): remove dangling dev-db-up/down targets and stale kind wording after kind retirement

  - Makefile: delete dev-db-up and dev-db-down targets (called now-removed devenv.py db verb) and remove them from .PHONY
  - scripts/help.py: remove dev-db-up/dev-db-down help entries; update stale "kind" wording to match current docker-desktop/k3s reality
  - scripts/lib/deploy.py: change seven cluster_target default args from "kind" to "docker-desktop" for consistency with cluster.py

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

* feat(dev): restore DB=oracle parity on docker-desktop; drop orphaned kind overlays; fix stale comments

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary

Adds a selectable k3s dev deployment target (make dev-up CLUSTER=k3s) that deploys the full TMI stack to a remote k3s cluster with complete parity to the kind path. kind remains the default (CLUSTER=kind); the CLUSTER selector is orthogonal to DB=postgres|oracle.

Both paths were verified end-to-end this session: make dev-up CLUSTER=k3s and make dev-up (kind) each come up clean and serve localhost:8080 → HTTP 200.

What's included

- CLUSTER=kind|k3s selector threaded through the Makefile and scripts/devenv.py (default kind); orthogonal to DB.
- scripts/lib/cluster.py: k3s selects the remote k3s-rp context (no create/destroy); registry + image refs resolve to the in-cluster registry rp2:30500.
- In-cluster registry (deployments/k8s/dev/k3s/registry.yml): the Mac builds arm64 images and pushes to it; nodes pull from it.
- k3s overlay (deployments/k8s/dev/k3s/): full workload set (server, controller, redis, extractor + chunk-embed workers), all TMI images remapped to rp2:30500; in-cluster Postgres on longhorn; DB-URL host rewritten to the postgres Service.
- deploy.start/devenv wiring: registry bootstrap, in-cluster Postgres prerequisite, cluster-aware context guard, and a server port-forward to preserve the localhost:8080 contract (k3s has no extraPortMappings; CATS uses the NodePort at rp2:30080). dev-nuke does a namespace-scoped hard reset.

Notable fixes discovered during bring-up

- Redis on 16KB-page kernels: chainguard redis's bundled jemalloc is built for 4KB pages and aborts at startup on the Pi 5 k3s nodes; redis:7-alpine (libc malloc) runs the same redis-server/args cleanly. Kind keeps chainguard redis; only the k3s overlay remaps it.
- cluster.up now always use-contexts the kind context, not just on kind create, so an active context from another cluster (e.g. k3s-rp) no longer fails the deploy context guard.
- /etc/hosts pin for rp2: documented, so kubectl resolves the API server deterministically after a node reboot (mDNS short-name resolution is unreliable then).

One-time k3s host/node setup

Documented in deployments/k8s/dev/k3s/README-node-setup.md: Mac /etc/hosts pin + insecure-registry, and per-node containerd registry mirror (endpoint by IP).

Testing

- make test-dev-scripts: 88 unit tests pass (cluster-aware helpers, --cluster parser, DB-host rewrite, port-forward gating).
- make dev-up CLUSTER=k3s (full stack, HTTP 200) and make dev-up (kind non-regression, HTTP 200).

🤖 Generated with Claude Code
https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX