fix(longhorn): node-local CI work volumes + raise manager memory (JDWLABS-22) by jdwillmsen · Pull Request #8 · jdwlabs/platform

jdwillmsen · 2026-06-06T20:14:26Z

Summary

Stability fixes for the Longhorn instance-manager cascade (JDWLABS-22) of 2026-06-06, which faulted ~11 volumes, including Vault (left sealed) and the CI runner _work volume (EIO/SIGBUS, which crashed apps PR #7).

Closes JDWLABS-23 and JDWLABS-24. Part of epic JDWLABS-22.

Changes

New StorageClass longhorn-ephemeral-local — numberOfReplicas: 1, dataLocality: strict-local, reclaimPolicy: Delete. Node-local, single-replica scratch for throwaway CI work volumes.
Repoint both ARC runner sets (jdwlabs, dotablaze-tech) work-volume claims from longhorn-ephemeral → longhorn-ephemeral-local.
longhorn-manager memory request 512Mi → 1Gi, limit 1Gi → 2Gi.
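
For orientation, a minimal sketch of what a StorageClass with these settings could look like. Only the name, numberOfReplicas, dataLocality, and reclaimPolicy come from this PR; the provisioner is Longhorn's standard CSI driver name, and the actual manifest in the repo may carry additional fields not shown here.

```yaml
# Sketch only: not copied from the repo; fields other than the
# name and the three settings listed in this PR are assumptions.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-ephemeral-local
provisioner: driver.longhorn.io   # Longhorn CSI driver
reclaimPolicy: Delete             # throwaway CI scratch, deleted with the claim
parameters:
  numberOfReplicas: "1"           # StorageClass parameters must be strings
  dataLocality: "strict-local"    # keep the single replica on the workload's node
```

Longhorn only accepts strict-local together with a single replica, which is why the two settings change as a pair.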

Why

CI does heavy node_modules small-file I/O. The old 3-replica longhorn-ephemeral SC replicated every write across the network, which both slowed builds (12→30 min) and made CI a casualty of any instance-manager disruption. A strict-local single replica removes the replication load and confines a build's blast radius to its own node. The kubernetes containerMode requires a PVC, so emptyDir is not viable; this is the node-local equivalent. Separately, longhorn-manager was OOM-killed (exitCode 137) at its 1Gi limit under volume churn.
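
As an illustration of the PVC requirement, the work-volume claim in a runner-set values file might look roughly like the sketch below. It assumes the runner sets use the gha-runner-scale-set chart's containerMode block; the access mode and storage size are placeholders, not values taken from this repo.

```yaml
# Sketch of runner-set Helm values, assuming the gha-runner-scale-set chart.
# accessModes and storage size are hypothetical placeholders.
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    storageClassName: "longhorn-ephemeral-local"   # previously longhorn-ephemeral
    resources:
      requests:
        storage: 10Gi   # placeholder size
```

Only storageClassName changes in this PR; the rest of the claim is shown to make clear where the setting lives.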

Verification

✅ kubectl apply --dry-run=server on the new StorageClass: accepted by Longhorn.
✅ All edited values files parse as valid YAML.
⬜ Post-merge: ArgoCD platform-longhorn + jdwlabs-arc-runner-set-jdwlabs Synced/Healthy; a CI build runs green on the new SC with improved wall-time.

Scope / follow-ups (separate issues under JDWLABS-22)

Vault HA (JDWLABS-25), cluster memory pressure (JDWLABS-26), CI timeout-minutes (JDWLABS-27), runner node isolation (JDWLABS-28), alerting (JDWLABS-29), PDBs (JDWLABS-30).

🤖 Generated with Claude Code

…LABS-22)

Two related stability fixes for the Longhorn instance-manager cascade that faulted ~11 volumes (Vault, CI runner _work) on 2026-06-06.

JDWLABS-23: add longhorn-ephemeral-local StorageClass (numberOfReplicas 1, dataLocality strict-local, reclaim Delete) and point both ARC runner-set work-volume claims at it. CI does heavy node_modules small-file I/O; the previous 3-replica longhorn-ephemeral SC replicated every write across the network, which both slowed builds (12->30 min) and made CI a casualty of any instance-manager disruption. A strict-local single replica removes the replication load and confines a CI build's blast radius to its own node. kubernetes containerMode requires a PVC, so emptyDir is not an option; this is the node-local equivalent.

JDWLABS-24: raise longhorn-manager memory request 512Mi->1Gi and limit 1Gi->2Gi. It was OOM-killed (exitCode 137) under volume churn, disrupting instance-managers.

Verified: new StorageClass passes kubectl apply --dry-run=server; all edited values files parse as valid YAML.

Refs JDWLABS-22, JDWLABS-23, JDWLABS-24

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

jdwillmsen and others added 2 commits June 6, 2026 15:13

chore(longhorn): remove Jira refs from code comments

3123e84

Ticket traceability belongs in commit messages and PR descriptions, not in code/config comments where it rots and clutters the codebase.

jdwillmsen wants to merge 2 commits into main from fix/JDWLABS-22-ci-stability