docs: retarget deploy docs from Cloud Run to k3s + Argo CD #19
Merged
…can't kill the process

The recursive crawl exposed a process-fatal hazard. A 200 response body is returned as a Node Readable carrying the request's armed AbortSignal.timeout. If a consumer abandons it before reading (cache.writeStream throws at mkdir/rename on a path collision), the timeout fires ~10s later, aborts the dangling stream, and the unhandled 'error' becomes an uncaughtException. Since the HTTP server and BullMQ workers share one process, the whole server dies (CrashLoopBackOff).

Attach a bound no-op 'error' listener to the returned body so an orphaned stream can never surface as an uncaughtException; real consumers use stream.pipeline and still observe/propagate the error.

Also cancel the body on 4xx/5xx paths so a non-200 response can't leave a stream dangling either.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
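The guard described in this commit message might look roughly like the sketch below. It is not the code from the PR: `guardBody`, `discardBody`, and `ignoreOrphanError` are hypothetical names, and treating "cancel the body" as a plain `destroy()` is an assumption.

```typescript
import { Readable } from "node:stream";

// Hypothetical names throughout; the PR describes the behaviour, not this code.

// One shared no-op handler, so every guarded body reuses the same function.
const ignoreOrphanError = (_err: Error): void => {};

// Attach the no-op 'error' listener before handing the body to a consumer.
// A stream with at least one 'error' listener does not throw when it emits
// 'error', so an abandoned body cannot become an uncaughtException.
export function guardBody<T extends Readable>(body: T): T {
  body.on("error", ignoreOrphanError);
  return body;
}

// Assumed shape of the non-200 path: guard first, then destroy the unread
// body so nothing is left dangling for the timeout to abort later.
export function discardBody(body: Readable): void {
  guardBody(body).destroy();
}
```

Consumers that read the body through `stream.pipeline` attach their own handlers, so they still receive the error; the no-op listener only matters when nobody else is listening.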
…re-child case

A URL namespace lets a node be both a leaf (the `/a` page) and an internal node (parent of `/a/b`), but a filesystem name can't be both a file and a directory. The recursive crawl hit this constantly: when a child URL is cached before its parent page, the slot at `<root>/a` becomes a directory, and writing the `/a` page then fails with EISDIR on rename. Lookup would even return the directory as a file body (EISDIR on read).

- writeStream: on EISDIR, store the page at `<dest>/index.html`, the directory-index form that lookup already probes.
- lookup: when the primary path is a directory, skip it and fall through to the `<path>/index.html` probe instead of returning a directory as a file.

No cache migration: existing file-form entries are still found by the primary probe; both forms coexist.

The inverse direction (a page cached as a file, then a child needs that name as a directory) still fails the child's write gracefully (no crash, served live on demand); a deeper canonicalization fix is tracked separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
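The two changes in this commit message can be illustrated with a small sketch. It assumes POSIX rename semantics (renaming a file onto an existing directory fails with EISDIR) and uses synchronous `fs` calls for brevity; `commitPage`, `lookup`, and `isFile` are hypothetical names, not the project's actual functions.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helpers; names and sync I/O are simplifications for illustration.

function isFile(candidate: string): boolean {
  try {
    return fs.statSync(candidate).isFile();
  } catch {
    return false; // a missing or unreadable path counts as "not a file"
  }
}

// Move a fully written temp file into place. If the slot is already a
// directory (a child was cached first), rename fails with EISDIR and the
// page is stored in directory-index form instead.
export function commitPage(tmpFile: string, dest: string): string {
  try {
    fs.renameSync(tmpFile, dest);
    return dest;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "EISDIR") throw err;
    const indexPath = path.join(dest, "index.html");
    fs.renameSync(tmpFile, indexPath);
    return indexPath;
  }
}

// Probe the file form first, then the directory-index form. A directory at
// the primary path is skipped rather than returned as a file body.
export function lookup(primary: string): string | null {
  if (isFile(primary)) return primary;
  const indexPath = path.join(primary, "index.html");
  return isFile(indexPath) ? indexPath : null;
}
```

Because `lookup` tries the primary path before the index form, entries written before the change are still found, which is why no cache migration is needed.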
Production moved off Cloud Run to a self-hosted 2-node k3s cluster (joined over Tailscale/WireGuard). The build (GHCR) and rollout (Argo CD GitOps) layers are unchanged; only the runtime substrate changed. Rewrite the deploy docs to match, verified against the live manifests in Keeping-History/infra (apps/time-machine/):

- README: replace the Cloud Run deploy section with GHCR + Argo CD; single-replica Deployment, gcsfuse native sidecar, in-cluster Redis.
- docs/deployment.md: full rewrite (k3s/Argo CD/GHCR). Native gcsfuse sidecar mounting tm-cache-723408812472; GCS key in its own gcs-sa-key Secret (not time-machine-secrets); Redis on a local-path PVC; pod MTU=1280 gotcha for WireGuard TLS egress; ProxyMesh currently disabled.
- docs/post-deploy.md: translate gcloud checks to kubectl; replace the Cloud Run flags check with rollout/image/sidecar-health verification.
- CLAUDE.md: note the cluster is self-hosted k3s (not GKE) + the MTU gotcha.

Legacy Cloud Build/Cloud Run artifacts left in-tree, labeled reference-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Production moved off Google Cloud Run to a self-hosted 2-node k3s cluster (nodes joined over Tailscale/WireGuard). The build (GitHub Actions → GHCR) and rollout (Argo CD GitOps) layers are unchanged — only the runtime substrate changed. The deploy docs still described the old Cloud Run + Memorystore + VPC Connector + Cloud Build pipeline, so they're rewritten to match.
All details were verified against the live manifests in `Keeping-History/infra` (`apps/time-machine/`), not assumed.

Changes
- README: replace the Cloud Run deploy section with GHCR + Argo CD; single-replica Deployment, `gcsfuse` native sidecar, in-cluster Redis, Ingress-terminated TLS/WSS.
- docs/deployment.md: full rewrite (k3s/Argo CD/GHCR):
  - Native `gcsfuse` sidecar (`initContainer` w/ `restartPolicy: Always`) mounting `tm-cache-723408812472`
  - GCS key in its own `gcs-sa-key` Secret (not `time-machine-secrets`)
  - Redis on a `local-path` PVC with `--appendonly`
  - Pod MTU=1280 gotcha for WireGuard TLS egress (`UND_ERR_CONNECT_TIMEOUT`)
- docs/post-deploy.md: translate `gcloud` checks to `kubectl`; replace the Cloud Run flags check with rollout / image / sidecar-health verification; rollback via `kubectl rollout undo` / `argocd app rollback`.

Legacy Cloud Build / Cloud Run artifacts (`cloudbuild.yaml`, `deploy.sh`, `.gcloudignore`) are left in-tree, labeled reference-only.

Notes
`.env.prod` was intentionally not included (project rules forbid committing it).

🤖 Generated with Claude Code