feat(dev): replace standalone-kind dev path with Docker Desktop Kubernetes (CLUSTER=docker-desktop default) #520

Merged
ericfitz merged 10 commits into main from feature/docker-desktop-dev-target
Jul 4, 2026

@ericfitz ericfitz commented Jul 4, 2026

Summary

Replaces the standalone-kind local dev deployment path with Docker Desktop's kind-provisioned Kubernetes (CLUSTER=docker-desktop, the new default). CLUSTER=k3s is retained; the separate e2e-platform kind usage is untouched. Motivation: fewer moving parts, lower resource use, reuse the DD cluster the developer already runs.

Design spec: docs/superpowers/specs/2026-07-03-docker-desktop-dev-target-design.md · Plan: docs/superpowers/plans/2026-07-03-docker-desktop-dev-target.md

What changed

  • Cluster lifecycle — docker-desktop is treated like k3s: select the docker-desktop context, never create/destroy it. dev-nuke is namespace-scoped.
  • Registry-free image delivery — build locally, then docker save <img> | docker exec -i desktop-control-plane ctr -n k8s.io images import -. No registry container, no containerd mirror. (The standalone kind CLI can't address the DD-managed cluster, and DD LoadBalancer isn't localhost here — verified.)
  • Endpoint — port-forward localhost:8080/:6379 (reuses the k3s port-forward, gated to no-own-cluster targets).
  • Database — in-cluster Postgres on DD's default hostpath storageclass; DB-URL host rewritten to the postgres Service. DB=oracle deploys the server-oracle overlay against the external Oracle ADB (no in-cluster Postgres), preserving kind's behavior; CLUSTER=k3s DB=oracle now fails fast (out of scope) instead of hanging.
  • Redis — chainguard redis unchanged (DD's linuxkit kernel is 4KB pages; the k3s redis:7-alpine remap does not apply).
  • Retired the standalone-kind dev machinery: deployments/k8s/dev/kind-cluster.yml, the tmi-dev-registry container + mirror, extraPortMappings, the kind lifecycle in cluster.py, the host-Postgres dev path, and orphaned kind-era overlays. Makefile default is now CLUSTER ?= docker-desktop.
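The registry-free image delivery described above can be sketched as a small command builder. This is only an illustration of the `docker save | docker exec ... ctr images import -` pipeline quoted in the list, not the PR's actual helper: the function name `build_import_commands` and the example image tag are hypothetical, and the node container name is taken from the description as-is.

```python
from typing import List, Tuple

# Node container name as quoted in the PR description; whether it is stable
# across Docker Desktop versions is an assumption.
DD_NODE = "desktop-control-plane"


def build_import_commands(image: str, node: str = DD_NODE) -> Tuple[List[str], List[str]]:
    """Return (saver_argv, importer_argv) for `docker save | ctr images import -`.

    Hypothetical helper: the saver streams the image tarball to stdout and the
    importer reads it on stdin inside the node, loading it into containerd's
    k8s.io namespace (the namespace the kubelet pulls from).
    """
    saver = ["docker", "save", image]
    importer = [
        "docker", "exec", "-i", node,
        "ctr", "-n", "k8s.io", "images", "import", "-",
    ]
    return saver, importer


# Example with a hypothetical image tag:
saver_argv, importer_argv = build_import_commands("tmi-server:dev")
```

Returning argv lists instead of a shell string keeps the builder easy to unit-test and avoids quoting problems when the two processes are later connected with a pipe.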

Kept for e2e only: the kind CLI, deployments/k8s/platform/kind-cluster.yml, e2e-platform-*, test/e2e/platform/.

Testing

  • make test-dev-scripts → 94 unit tests pass (cluster-aware helpers, --cluster docker-desktop parser + default, ctr import command builder, overlay routing incl. oracle, port-forward gating).
  • Live gates: make dev-up CLUSTER=docker-desktop → HTTP 200 (7 pods Running); make dev-up CLUSTER=k3s → HTTP 200 (non-regression). Oracle overlay verified by render + code routing; live oracle bring-up requires ADB creds (deferred).

New docs

deployments/k8s/dev/docker-desktop/README.md — one-time prereq (enable DD Kubernetes with the kind provisioner), the image-import mechanism, and why e2e stays on kind.

Follow-ups

#517 (pre-import postgres/redis for first-run reliability), #518 (--no-workers path), #519 (import_image_to_node hardening).

🤖 Generated with Claude Code

https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

ericfitz and others added 10 commits July 3, 2026 22:16
Replace the standalone-kind dev path with Docker Desktop's kind-provisioned
Kubernetes (context docker-desktop): registry-free image delivery via ctr import,
port-forward endpoint, in-cluster Postgres, chainguard redis (4KB pages). e2e-platform
stays on standalone kind (needs Calico/NetworkPolicy + ephemeral clusters). Sequenced
after the k3s target (#516).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
… | ctr import)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
…ageclass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
- Rename apply_k3s_postgres -> apply_incluster_postgres(cluster_target)
  so it resolves the right overlay's postgres.yml (k3s or docker-desktop).
- Rename teardown_k3s_namespace -> teardown_namespace and generalize its
  docstring to cover both no-own-cluster targets.
- Extend server port-forward gating in wait_and_forward and restart from
  cluster_target == 'k3s' to cluster_target in ('k3s', 'docker-desktop').
- Add docker-desktop branch to start(): skip registry setup, import images
  via build_and_push, bring up in-cluster Postgres via apply_incluster_postgres.
- Add docker-desktop branch to restart(): same no-registry guard shape.
- Generalize cmd_nuke guard in devenv.py from k3s-only to both no-own-cluster
  targets; calls teardown_namespace and redeploys via deploy.start.
- Update test_server_port_forward_is_k3s_only to assert tuple-form gating;
  add test_server_port_forward_gated_for_docker_desktop_too.
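The tuple-form gating mentioned in this commit could look roughly like the sketch below. The constant and function names are hypothetical, as are the service name and port numbers passed in the example; only the `cluster_target in ('k3s', 'docker-desktop')` check itself comes from the commit message.

```python
from typing import List

# Targets whose cluster the tooling does not create or destroy.
NO_OWN_CLUSTER_TARGETS = ("k3s", "docker-desktop")


def needs_server_port_forward(cluster_target: str) -> bool:
    """True when the server must be reached through a kubectl port-forward."""
    return cluster_target in NO_OWN_CLUSTER_TARGETS


def port_forward_argv(namespace: str, service: str, local_port: int, remote_port: int) -> List[str]:
    """Build a `kubectl port-forward` argv for a Service (names are caller-supplied)."""
    return [
        "kubectl", "-n", namespace, "port-forward",
        f"svc/{service}", f"{local_port}:{remote_port}",
    ]


# Example: the service name "tmi-server" is an assumption for illustration.
if needs_server_port_forward("docker-desktop"):
    argv = port_forward_argv("tmi-platform", "tmi-server", 8080, 8080)
```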

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
… the default)

- Delete deployments/k8s/dev/kind-cluster.yml (dev kind config, not e2e)
- scripts/lib/cluster.py: remove all kind lifecycle helpers (ensure_registry,
  is_registry_running, cluster_exists, _kind_node_containers, start_stopped_nodes,
  connect_registry_to_kind) and kind-only constants (CLUSTER_NAME, REGISTRY_CONTAINER,
  REGISTRY_IMAGE, REGISTRY_PORT, LOCAL_REGISTRY, KIND_CONFIG); up()/down() now handle
  only k3s and docker-desktop (raises ValueError on unknown target); registry_for() and
  expected_context() likewise simplified and made explicit.
- scripts/lib/deploy.py: remove kind registry block from start()/restart(), remove
  REGISTRY_CONTAINER stop from teardown(), update HOST_PORT/NODE_PORT comment block,
  defaults updated from "kind" to "docker-desktop".
- scripts/devenv.py: drop _uses_host_db(), _db_profile(), database import, cmd_db and
  "db" verb (host Postgres container no longer in use); drop "kind" from --cluster
  choices; simplify cmd_nuke (always namespace-scoped for k3s/docker-desktop).
- scripts/lib/devstatus.py: replace kind/registry/host-db status rows with a single
  kube-context row; remove cluster and database imports.
- All tests updated: kind-specific tests removed, kind defaults updated to
  docker-desktop, new tests assert kind raises ValueError.
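As a rough illustration of the "raises ValueError on unknown target" behaviour, the validation might be shaped like this. The names `SUPPORTED_TARGETS` and `validate_target` are hypothetical and do not claim to match cluster.py; the default value mirrors the new `docker-desktop` default.

```python
SUPPORTED_TARGETS = ("docker-desktop", "k3s")


def validate_target(cluster_target: str = "docker-desktop") -> str:
    """Return the target unchanged, or raise ValueError if it is not supported.

    After the kind retirement, "kind" falls into the unsupported case.
    """
    if cluster_target not in SUPPORTED_TARGETS:
        raise ValueError(
            f"unsupported cluster target {cluster_target!r}; "
            f"expected one of {SUPPORTED_TARGETS}"
        )
    return cluster_target
```

A test in the style the commit describes would call `validate_target("kind")` and assert that `ValueError` is raised.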

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
…ng after kind retirement

- Makefile: delete dev-db-up and dev-db-down targets (called now-removed
  devenv.py db verb) and remove them from .PHONY
- scripts/help.py: remove dev-db-up/dev-db-down help entries; update
  stale "kind" wording to match current docker-desktop/k3s reality
- scripts/lib/deploy.py: change seven cluster_target default args from
  "kind" to "docker-desktop" for consistency with cluster.py

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
…kind overlays; fix stale comments

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
@ericfitz ericfitz merged commit 02dbf9b into main Jul 4, 2026
12 checks passed
@ericfitz ericfitz deleted the feature/docker-desktop-dev-target branch July 4, 2026 16:54
ericfitz added a commit that referenced this pull request Jul 4, 2026
…orkers (#517, #518, #519)

Three follow-ups from the Docker Desktop dev-target work (#520), all in the
dev-tooling layer (scripts/lib/deploy.py + dev k8s overlays).

#517 — pre-import postgres/redis base images to avoid the first-run cgr.dev
pull flake. Docker Desktop's containerd pulls cgr.dev/chainguard/{postgres,redis}
independently of the host Docker daemon, and that first pull occasionally fails
with a transient EOF, leaving the pods in ErrImagePull. build_and_push now
`docker pull`s the base images on the host and imports them into the node's
containerd alongside the tmi-* images, and the postgres/redis manifests are
pinned to imagePullPolicy: IfNotPresent so the imported copy is used (a :latest
tag otherwise defaults to Always and re-pulls, defeating the import). The
redis pin is a per-overlay kustomize patch (redis.yml is shared with k3s, which
remaps redis to redis:7-alpine); postgres is pinned directly in the
docker-desktop postgres.yml (applied raw by deploy.py). The base-image set is
db-aware: oracle uses an external ADB and deploys no Postgres pod, so only redis
is imported there.
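The db-aware base-image selection can be sketched as follows. The function name, the `"postgres"` profile string, and the explicit `:latest` tags are assumptions made for illustration; the commit message only states the cgr.dev/chainguard repositories and that oracle deployments import redis alone.

```python
from typing import List

# Repositories come from the commit message; the explicit :latest tags are assumed.
POSTGRES_IMAGE = "cgr.dev/chainguard/postgres:latest"
REDIS_IMAGE = "cgr.dev/chainguard/redis:latest"


def base_images_for(db: str) -> List[str]:
    """Return the base images to pre-import for a given database profile.

    With db == "oracle" the database is an external ADB, so no Postgres pod
    is deployed and only redis needs to be present on the node.
    """
    if db == "oracle":
        return [REDIS_IMAGE]
    return [POSTGRES_IMAGE, REDIS_IMAGE]
```

Each returned image would then be pulled on the host and imported into the node's containerd, with `imagePullPolicy: IfNotPresent` ensuring the imported copy is the one the pod uses.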

#518 — remove the --no-workers bring-up path. It applied the raw leaf manifests
(image: localhost:5000/tmi-*:dev), which only worked against the retired kind
local registry and yields ErrImagePull on docker-desktop/k3s. No make target
passes it, so it was developer-manual-only dead/broken code. Dropped the flag
from devenv.py, the no_workers params from start/restart/apply_overlay, and the
_no_workers_files helper.

#519 — harden import_image_to_node against a Popen-raises-before-close hang. If
the importer Popen raised before saver.stdout.close() ran, the parent kept the
pipe's read end open and saver.wait() in the finally could deadlock once the
pipe buffer filled. The importer Popen is now wrapped so saver's stdout is
released (and saver killed) on any exception before the wait.
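The hang and its fix can be illustrated with a generic two-process pipe. This is a sketch of the pattern the paragraph describes, not the actual import_image_to_node code; the function name `pipe_commands` and its return convention are hypothetical.

```python
import subprocess
from typing import Sequence


def pipe_commands(saver_argv: Sequence[str], importer_argv: Sequence[str]) -> int:
    """Run `saver | importer` and return a nonzero code if either side failed."""
    saver = subprocess.Popen(saver_argv, stdout=subprocess.PIPE)
    try:
        try:
            importer = subprocess.Popen(importer_argv, stdin=saver.stdout)
        except BaseException:
            # The importer never started, so nothing will drain the pipe.
            # Kill the saver so the wait() below cannot block on a full buffer.
            saver.kill()
            raise
        finally:
            # Release the parent's copy of the read end in every case; on the
            # success path this lets the saver see SIGPIPE if the importer exits.
            saver.stdout.close()
        importer_rc = importer.wait()
    finally:
        saver_rc = saver.wait()
    return importer_rc or saver_rc
```

Without the inner `except`/`finally`, a failure to launch the importer would skip the `close()` and leave the saver blocked writing into a pipe that nobody reads, which is the deadlock described above.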

Unit tests added for the db-aware base-image selection and the import teardown
path; the --no-workers tests were removed. make test-dev-scripts (94) and
make lint pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
ericfitz added a commit that referenced this pull request Jul 4, 2026
…orkers (#517, #518, #519) (#521)

Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ericfitz added a commit that referenced this pull request Jul 4, 2026
The dev environment was migrated to Docker Desktop Kubernetes (#520/#521),
but CLAUDE.md still described a kind cluster with the database and Redis
running as containers "external to the cluster." That stale description
caused a misdiagnosis when the in-cluster PostgreSQL PVC came up empty:
the real data was assumed lost when it was actually stranded in the old
host Docker volume the new topology no longer mounts.

Update both occurrences to reflect reality:
- Default CLUSTER=docker-desktop (k3s also supported)
- server, PostgreSQL, Redis, and NATS all run in-cluster in the
  tmi-platform namespace (Deployments + StatefulSets)
- PostgreSQL data persists in a Kubernetes PVC (data-postgres-0), NOT a
  host Docker volume; re-provisioning the PVC starts from an empty DB
- With DB=oracle the database is an external managed Oracle ADB
- orchestration is via scripts/devenv.py; manifests under
  deployments/k8s/dev/<cluster>/


Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>