docs: retarget deploy docs from Cloud Run to k3s + Argo CD by robbiebyrd · Pull Request #19 · Keeping-History/time-machine-web-proxy

robbiebyrd · 2026-06-15T17:25:37Z

Summary

Production moved off Google Cloud Run to a self-hosted 2-node k3s cluster (nodes joined over Tailscale/WireGuard). The build (GitHub Actions → GHCR) and rollout (Argo CD GitOps) layers are unchanged — only the runtime substrate changed. The deploy docs still described the old Cloud Run + Memorystore + VPC Connector + Cloud Build pipeline, so they're rewritten to match.

All details were verified against the live manifests in Keeping-History/infra (apps/time-machine/), not assumed.

Changes

- README.md: replace the "Deployment (Google Cloud Run)" section with GHCR + Argo CD (single-replica Deployment, gcsfuse native sidecar, in-cluster Redis, Ingress-terminated TLS/WSS).
- docs/deployment.md: full rewrite for k3s / Argo CD / GHCR:
  - gcsfuse native sidecar (initContainer with restartPolicy: Always) mounting tm-cache-723408812472
  - GCS key in its own gcs-sa-key Secret (not time-machine-secrets)
  - Redis on a local-path PVC with --appendonly
  - pod MTU = 1280 gotcha (WireGuard overhead black-holes large TLS egress packets → UND_ERR_CONNECT_TIMEOUT)
  - ProxyMesh section flagged as currently disabled in prod (direct egress)
- docs/post-deploy.md: translate gcloud checks to kubectl; replace the Cloud Run flags check with rollout / image / sidecar-health verification; roll back via kubectl rollout undo / argocd app rollback.
- CLAUDE.md: note that the cluster is self-hosted k3s (not GKE), plus the MTU gotcha.
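The "native sidecar" in the deployment.md bullet is a Kubernetes init container with `restartPolicy: Always`: the kubelet starts it before the app containers and keeps it running for the pod's lifetime (the SidecarContainers feature, on by default since Kubernetes 1.29). The sketch below models only that shape with a hand-written subset of the pod schema. The bucket name comes from this PR; the container names, image references, and key-file path are hypothetical, and the real manifest in Keeping-History/infra will also carry volumes, mount propagation, and a securityContext, which a FUSE sidecar typically needs and which are omitted here.

```typescript
// Minimal, hand-written subset of the Kubernetes pod schema (not the official API types).
interface Container {
  name: string;
  image: string;
  args?: string[];
  // Only meaningful on init containers: "Always" turns one into a native sidecar.
  restartPolicy?: "Always";
}

interface PodSpec {
  initContainers?: Container[];
  containers: Container[];
}

// Returns the names of native sidecars: init containers with restartPolicy "Always".
export function nativeSidecars(spec: PodSpec): string[] {
  return (spec.initContainers ?? [])
    .filter((c) => c.restartPolicy === "Always")
    .map((c) => c.name);
}

// Illustrative only: names, images and paths are hypothetical, except the bucket.
export const examplePod: PodSpec = {
  initContainers: [
    {
      name: "gcsfuse",
      image: "example.invalid/gcsfuse:tag", // hypothetical image
      args: [
        "--foreground", // keep gcsfuse in the foreground so the sidecar stays up
        "--key-file=/var/secrets/gcs-sa-key/key.json", // hypothetical mount of the gcs-sa-key Secret
        "tm-cache-723408812472",
        "/mnt/tm-cache", // hypothetical mount point
      ],
      restartPolicy: "Always",
    },
  ],
  containers: [{ name: "time-machine", image: "example.invalid/time-machine:tag" }],
};
```

Expressing the sidecar as an init container (rather than a second entry in `containers`) is what guarantees the bucket is mounted before the app starts reading the cache.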

Legacy Cloud Build / Cloud Run artifacts (cloudbuild.yaml, deploy.sh, .gcloudignore) are left in-tree, labeled reference-only.

Notes

Docs only — no code changed; nothing to build or test.
.env.prod was intentionally not included (project rules forbid committing it).

🤖 Generated with Claude Code

…can't kill the process

The recursive crawl exposed a process-fatal hazard. A 200 response body is returned as a Node Readable carrying the request's armed AbortSignal.timeout. If a consumer abandons it before reading (cache.writeStream throws at mkdir/rename on a path collision), the timeout fires ~10s later, aborts the dangling stream, and the unhandled 'error' becomes an uncaughtException. Since the HTTP server and BullMQ workers share one process, the whole server dies (CrashLoopBackOff).

Attach a bound no-op 'error' listener to the returned body so an orphaned stream can never surface as an uncaughtException; real consumers use stream.pipeline and still observe/propagate the error. Also cancel the body on 4xx/5xx paths so a non-200 response can't leave a stream dangling either.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
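The guard described in this commit can be sketched as a small helper. `settleBody` and `ignoreOrphanedError` are hypothetical names, not the repo's actual functions; the sketch only assumes the body arrives as a Node `Readable`. The mechanism it relies on is standard Node behavior: an `'error'` event with no listener is rethrown, while any registered listener absorbs it.

```typescript
import { Readable } from "node:stream";

// Named no-op so the guard is easy to identify in listener dumps.
function ignoreOrphanedError(_err: Error): void {}

/**
 * Hypothetical helper: guard a response body against becoming an orphaned,
 * crash-inducing stream, and release it immediately on 4xx/5xx.
 * Returns the body for the caller to consume, or null if it was discarded.
 */
export function settleBody(statusCode: number, body: Readable): Readable | null {
  // Once any 'error' listener exists, an error emitted after the consumer has
  // walked away (e.g. an armed AbortSignal.timeout aborting the stream later)
  // is no longer rethrown as an uncaughtException. stream.pipeline adds its
  // own listeners, so a real consumer still sees and propagates the error.
  body.on("error", ignoreOrphanedError);

  if (statusCode >= 400) {
    // Nobody will read an error body, so destroy it now instead of
    // leaving it open until the timeout fires.
    body.destroy();
    return null;
  }
  return body;
}
```

A consumer that does `await pipeline(body, destination)` (from `node:stream/promises`) still gets a rejection when the stream errors; the no-op listener only changes what happens when nobody is listening anymore.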

…re-child case

A URL namespace lets a node be both a leaf (the `/a` page) and an internal node (parent of `/a/b`), but a filesystem name can't be both a file and a directory. The recursive crawl hit this constantly: when a child URL is cached before its parent page, the slot at `<root>/a` becomes a directory, and writing the `/a` page then fails with EISDIR on rename. Lookup would even return the directory as a file body (EISDIR on read).

- writeStream: on EISDIR, store the page at `<dest>/index.html`, the directory-index form that lookup already probes.
- lookup: when the primary path is a directory, skip it and fall through to the `<path>/index.html` probe instead of returning a directory as a file.

No cache migration: existing file-form entries are still found by the primary probe; both forms coexist. The inverse direction (a page cached as a file, then a child needs that name as a directory) still fails the child's write gracefully (no crash, served live on demand); a deeper canonicalization fix is tracked separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
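The two changes in this commit can be illustrated with a synchronous sketch. `writePage` and `lookup` are hypothetical stand-ins for the repo's `cache.writeStream` and lookup (which are stream-based); the temp-file naming is also an assumption. What the sketch does rely on is POSIX `rename(2)` behavior: renaming a regular file onto an existing directory fails with EISDIR.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, renameSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical stand-in for cache.writeStream: stage to a temp file, then rename
// into place. If `dest` is already a directory (a child such as /a/b was cached
// first), store the page in directory-index form instead.
export function writePage(dest: string, body: string): string {
  mkdirSync(dirname(dest), { recursive: true });
  const tmp = `${dest}.tmp-${process.pid}`; // hypothetical temp-file naming
  writeFileSync(tmp, body);
  try {
    renameSync(tmp, dest);
    return dest;
  } catch (err) {
    if ((err as { code?: string }).code !== "EISDIR") throw err;
    const indexPath = join(dest, "index.html");
    renameSync(tmp, indexPath);
    return indexPath;
  }
}

// True only for regular files; missing paths and directories are both false.
function isFile(p: string): boolean {
  try {
    return statSync(p).isFile();
  } catch {
    return false;
  }
}

// Hypothetical stand-in for lookup: never return a directory as a page body.
// A directory at the primary path falls through to the <path>/index.html probe.
export function lookup(path: string): string | null {
  for (const candidate of [path, join(path, "index.html")]) {
    if (isFile(candidate)) return candidate;
  }
  return null;
}

// Reproduces the collision: cache the child /a/b first, then the parent /a.
export function demoParentAfterChild(): { stored: string; found: string | null; body: string } {
  const root = mkdtempSync(join(tmpdir(), "tm-cache-"));
  writePage(join(root, "a", "b"), "child page");
  const stored = writePage(join(root, "a"), "parent page");
  return { stored, found: lookup(join(root, "a")), body: readFileSync(stored, "utf8") };
}
```

Because `lookup` tries the file form first, entries written before the fix are still found without a migration, matching the commit's "both forms coexist" note.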

Production moved off Cloud Run to a self-hosted 2-node k3s cluster (joined over Tailscale/WireGuard). The build (GHCR) and rollout (Argo CD GitOps) layers are unchanged; only the runtime substrate changed. Rewrite the deploy docs to match, verified against the live manifests in Keeping-History/infra (apps/time-machine/):

- README: replace the Cloud Run deploy section with GHCR + Argo CD; single-replica Deployment, gcsfuse native sidecar, in-cluster Redis.
- docs/deployment.md: full rewrite (k3s/Argo CD/GHCR). Native gcsfuse sidecar mounting tm-cache-723408812472; GCS key in its own gcs-sa-key Secret (not time-machine-secrets); Redis on a local-path PVC; pod MTU=1280 gotcha for WireGuard TLS egress; ProxyMesh currently disabled.
- docs/post-deploy.md: translate gcloud checks to kubectl; replace the Cloud Run flags check with rollout/image/sidecar-health verification.
- CLAUDE.md: note the cluster is self-hosted k3s (not GKE) + the MTU gotcha.

Legacy Cloud Build/Cloud Run artifacts left in-tree, labeled reference-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
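The MTU gotcha comes down to header arithmetic. WireGuard wraps each inner packet in an outer IP header, an 8-byte UDP header, a 16-byte data-message header, and a 16-byte authentication tag, so the tunnel can carry less than the physical link. Tailscale's default interface MTU is 1280, which is presumably why the pod MTU is pinned there. The sketch assumes a hypothetical pod interface left at a 1500-byte default; TCP normally sets DF, so an oversized packet is dropped rather than fragmented, and if the ICMP "fragmentation needed" reply never makes it back, the connection just stalls until a connect timeout fires.

```typescript
type OuterIp = "ipv4" | "ipv6";

// Per-packet WireGuard overhead in bytes: outer IP header + UDP (8)
// + WireGuard data-message header (16) + Poly1305 auth tag (16).
// Ignores WireGuard's padding of the inner packet to a 16-byte multiple.
const WG_OVERHEAD: Record<OuterIp, number> = {
  ipv4: 20 + 8 + 16 + 16,
  ipv6: 40 + 8 + 16 + 16,
};

// Largest inner packet that fits through a WireGuard tunnel on a link of `linkMtu`.
export function maxInnerPacket(linkMtu: number, outer: OuterIp): number {
  return linkMtu - WG_OVERHEAD[outer];
}

// With DF set, a packet larger than the path MTU is dropped, not fragmented.
export function deliverable(packetBytes: number, pathMtu: number): boolean {
  return packetBytes <= pathMtu;
}

// Hypothetical scenario: tunnel at Tailscale's default, pod left at 1500.
export const TUNNEL_MTU = 1280;
export const DEFAULT_POD_MTU = 1500;
```

Worst case (IPv6 outer header) a 1500-byte link leaves 1420 bytes for inner packets, which is why plain WireGuard setups commonly use 1420. Small packets such as TCP handshakes fit either way, so connections look healthy until the first full-size segment is sent, which matches the "large TLS egress packets" symptom in this PR.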

Robbie Byrd and others added 3 commits June 15, 2026 17:16

robbiebyrd merged commit 3439f2b into main Jun 15, 2026
1 check passed

robbiebyrd mentioned this pull request Jun 15, 2026

fix(cache): handle file/directory path collisions reactively #20

Merged

robbiebyrd merged 3 commits into main from docs/k3s-argocd-deploy
