feat(charts): valkey-search Helm chart for self-host and cloud tenants#278
feat(charts): valkey-search Helm chart for self-host and cloud tenants#278KIvanow wants to merge 4 commits into
Conversation
Single chart for both the self-host (marketing) profile and cloud per-tenant provisioning. Ships Valkey with the search module (FT.*), an ACL-hardened named user loaded from an aclfile at boot (survives restarts), optional AOF persistence + maxmemory, and an optional Traefik IngressRouteTCP HostSNI route for public TLS exposure. The container runs the bundle image via args (not command) so its entrypoint auto-loads the search module.
| */}} | ||
| {{- $rules := "~* &* +@all" }} | ||
| {{- if .Values.hardening.disableDangerousCommands }} | ||
| {{- $rules = "~* &* +@all -@dangerous" }} |
There was a problem hiding this comment.
-@dangerous blocks INFO. Verified on valkey-bundle 9.1: INFO is in @dangerous (alongside DEBUG/FLUSHALL), so +@all -@dangerous denies it. The BetterDB Monitor's metric collection — and common library health/version checks — then get NOPERM on every instance provisioned with the default hardening. CONFIG/SLOWLOG/CLIENT are not in @dangerous, so only INFO regresses; easy to grant back:
| {{- $rules = "~* &* +@all -@dangerous" }} | |
| {{- $rules = "~* &* +@all -@dangerous +info" }} |
Re-grant any other @dangerous command a client needs (check with ACL CAT dangerous).
There was a problem hiding this comment.
Good catch, fixed in b8be535 — ACL is now +@all -@dangerous +info.
| {{- if .Values.hardening.disableDangerousCommands }} | ||
| {{- $rules = "~* &* +@all -@dangerous" }} | ||
| {{- end }} | ||
| {{- $acl := printf "user default off\nuser %s on >%s %s\n" $username $password $rules }} |
There was a problem hiding this comment.
A bring-your-own auth.password with whitespace/special chars corrupts the aclfile → CrashLoop. ACL rules are space-delimited, so auth.password: "p@ss w0rd" renders user betterdb on >p@ss w0rd ~* &* +@all — w0rd… becomes stray ACL tokens, the aclfile fails to load, and Valkey CrashLoopBackOffs. Generated randAlphaNum passwords are safe, but the documented BYO value isn't validated. A guard would catch it early, e.g. {{- if regexMatch "\\s" $password }}{{- fail "auth.password must not contain whitespace" }}{{- end }}.
There was a problem hiding this comment.
Fixed in b8be535: added a fail guard rejecting whitespace in auth.password at template time.
| {{- $existing := (lookup "v1" "Secret" .Release.Namespace $secretName) }} | ||
| {{- $password := .Values.auth.password }} | ||
| {{- if not $password }} | ||
| {{- if and $existing $existing.data (index $existing.data "password") }} | ||
| {{- $password = index $existing.data "password" | b64dec }} | ||
| {{- else }} | ||
| {{- $password = randAlphaNum 32 }} |
There was a problem hiding this comment.
The lookup-based password reuse silently rotates the password under GitOps. The "never rotates out from under connected clients" guarantee depends on lookup reading the live Secret. Argo CD / Flux render with helm template (no cluster access), so lookup returns empty → randAlphaNum 32 mints a new password on every sync, rotating the Secret + ACL and breaking connected clients. Worth documenting that GitOps installs must set auth.existingSecret (or a fixed auth.password), since the reuse path can't work without a live cluster read.
There was a problem hiding this comment.
Documented in b8be535 (values.yaml): GitOps installs must set existingSecret or a fixed auth.password.
| size: 2Gi # ~2.5x maxmemory: holds RDB + AOF + rewrite headroom | ||
|
|
||
| valkey: | ||
| maxmemory: 768mb # ~75% of the 1Gi limit |
There was a problem hiding this comment.
maxmemory at ~75% of the pod memory limit risks OOMKill during AOF/RDB rewrite. With maxmemory: 768mb, limits.memory: 1Gi, and appendonly: true, a BGREWRITEAOF/RDB-save fork's copy-on-write pages plus AOF rewrite buffers under write load can push RSS past 1Gi → OOMKilled mid-rewrite. ~256Mi of headroom is tight for a forking persistence model; 50–60% of the limit is the usual guidance once persistence is on.
There was a problem hiding this comment.
Fixed in b8be535: dropped to 512mb (~50% of the limit).
| readinessProbe: | ||
| exec: | ||
| command: | ||
| - sh | ||
| - -c | ||
| - 'valkey-cli --user {{ .Values.auth.username }} --pass "$VALKEY_PASSWORD" ping' |
There was a problem hiding this comment.
Readiness only pings — it never verifies the search module actually loaded. If the module fails to load but valkey-server stays up (FT.* unknown, not a crash), the pod reports Ready while the chart's entire purpose is broken. A readiness probe that runs FT._LIST (or COMMAND DOCS FT.SEARCH) would catch a silent module-load failure. The :latest double-load case is already covered since it crashes, but a non-crashing miss isn't.
There was a problem hiding this comment.
Fixed in b8be535: readiness now runs COMMAND INFO FT.SEARCH | grep -q to confirm the module loaded.
- statefulset: hash the live external Secret's data for checksum/secret so pods roll when an entitlement-owned existingSecret rotates - secret: re-grant +info (INFO is in @dangerous) so Monitor metric collection keeps working under hardening - secret: fail fast when a bring-your-own auth.password contains whitespace (it corrupts the space-delimited ACL file and CrashLoops the pod) - statefulset: readiness now verifies the search module loaded (COMMAND INFO FT.SEARCH) instead of only pinging - values-cloud-tenant: drop maxmemory to ~50% of the limit to avoid OOMKills during AOF/RDB rewrite forks - values: document that GitOps installs must set existingSecret or a fixed password, since lookup-based reuse can't work without a live cluster read Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| {{- if and $existing $existing.data (index $existing.data "password") }} | ||
| {{- $password = index $existing.data "password" | b64dec }} | ||
| {{- else }} | ||
| {{- $password = randAlphaNum 32 }} |
There was a problem hiding this comment.
GitOps sync rotates password
High Severity
When auth.password is empty, the chart falls back to lookup and then randAlphaNum if no Secret exists. GitOps render paths (helm template, many Argo CD flows) cannot lookup, so every sync generates a new password, rewrites the Secret and ACL, and disconnects clients despite the upgrade-reuse intent.
Reviewed by Cursor Bugbot for commit b8be535. Configure here.
There was a problem hiding this comment.
Documented in b8be535 (values.yaml): GitOps installs must set existingSecret or a fixed auth.password since the lookup-reuse path needs a live cluster read.
When auth.existingSecret is set the chart renders no users.acl, so the hardening flag does not control runtime permissions. Show an external-ACL note instead of misstating the tenant's actual permissions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 36b8592. Configure here.
… explicit The whitespace guard only checked .Values.auth.password, so a password loaded from an existing Secret (the upgrade lookup path) bypassed it and could still corrupt the space-delimited ACL file. Guard the resolved $password instead, covering explicit, looked-up, and generated values.


Single chart for both the self-host (marketing) profile and cloud per-tenant provisioning. Ships Valkey with the search module (FT.*), an ACL-hardened named user loaded from an aclfile at boot (survives restarts), optional AOF persistence + maxmemory, and an optional Traefik IngressRouteTCP HostSNI route for public TLS exposure.
The container runs the bundle image via args (not command) so its entrypoint auto-loads the search module.
Summary
Changes
Checklist
roborev review --branchor/roborev-review-branchin Claude Code (internal)Note
Medium Risk
New deploy surface for credentials, ACLs, and optional public TLS routing; misconfiguration (GitOps password rotation, external Secret ACLs) could break clients or expose Valkey incorrectly.
Overview
Adds a new
charts/valkey-searchHelm chart that deploys Valkey with the search module (valkey/valkey-bundle) as a single-replica StatefulSet, ClusterIP Service, and install notes.Self-host defaults use chart-managed auth: a Secret with
passwordand boot-loadedusers.acl(default user off, optional-@dangeroushardening with+infofor monitoring), stable passwords on upgrade via Helmlookup, and whitespace validation on passwords. Cloud tenants can point atauth.existingSecretand layervalues-cloud-tenant.yamlfor persistence,maxmemory, resources, and public exposure.The workload starts
valkey-serverviaargsso the bundle entrypoint still loads the search module; readiness checksFT.SEARCHregistration, not just PING. Optional TraefikIngressRouteTCPregisters a per-tenant HostSNI route whenexposure.publicandexposure.sniRouteare enabled. Pod rollout annotations hash chart-rendered or external Secrets so credential/ACL changes trigger restarts.Reviewed by Cursor Bugbot for commit 3877eda. Bugbot is set up for automated code reviews on this repo. Configure here.