fix(core): make migration Job hooks stable across Helm + ArgoCD#358
fix(core): make migration Job hooks stable across Helm + ArgoCD#358Dav-14 wants to merge 1 commit into
Conversation
`core.job.annotations` had two real-world failure modes operators have
hit across Formance projects (membership, portal, console-v3, and
inherited by every chart that uses the helper):
1. **Identity flip on first upgrade.** The conditional
`{{- if and (not .Release.IsInstall) .Values.feature.migrationHooks }}`
meant the Job was rendered as a plain release resource on
`helm install`, but as a `pre-upgrade` hook on the next
`helm upgrade`. Helm tracks hooks outside the release manifest, so
the same `Job/<release>-migrate` flips identity:
- Install: tracked in release, owned by Helm release.
- Upgrade: emitted as a hook, NOT in release manifest, orphaned.
First post-feature-enable upgrade fails with `Job already exists`
(install-time Job is still there, hook can't claim the same name);
`helm uninstall` later leaves an orphan because Helm only deletes
resources it tracks in the latest release.
2. **`hook-succeeded` swallows logs.** The previous delete policy
`before-hook-creation,hook-succeeded,hook-failed` deleted the
Job (and its pod) immediately on success, leaving `kubectl logs`
empty when an operator goes to investigate a slow migration.
3. **ArgoCD's `helm template` ignored hook annotations entirely**, so
under Argo the Job re-applied on every sync as a plain resource and
tripped Job-spec immutability on the second sync.
This commit replaces the helper with always-emit hook annotations
covering all three orchestration paths:
- `helm.sh/hook: pre-install,pre-upgrade` — stable identity on both
install and upgrade; no flip.
- `helm.sh/hook-delete-policy: before-hook-creation` — handles spec
changes (image bumps) by deleting the previous Job before
recreating. `hook-succeeded` removed; cleanup is bounded by the
Job's `ttlSecondsAfterFinished` instead.
- `argocd.argoproj.io/hook: PreSync` +
`argocd.argoproj.io/hook-delete-policy: BeforeHookCreation` +
`argocd.argoproj.io/sync-wave: "10"` — same intent under Argo.
Also fixes `core.postgres.job.sa.annotations` (previously
deprecated and aliased to `core.job.annotations`, so the SA inherited
the same `hook-weight: "10"` as the Job that referenced it — meaning
the SA had no guarantee of existing when the Job's pod tried to
schedule). Now emits the same hook block but at weight `0` /
sync-wave `0` so the SA is always created first.
`feature.migrationHooks` is no longer consulted; consumers can
drop it from their values without behaviour change. Kept in
deprecated state with the old alias `core.postgres.job.annotations`
for backward-compat.
Bump core to 1.6.0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (2)
WalkthroughThe Helm Job template annotation strategy is consolidated to unconditionally apply combined Helm and ArgoCD hook annotations (pre-install, pre-upgrade, PreSync) with synchronized policies. The Postgres Job template is updated to delegate to the refactored base template and add explicit service-account-specific hook metadata. ChangesHook Annotation Consolidation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary
core.job.annotationshas hit three real-world failures across Formance projects (membership, portal, console-v3 — and any chart that consumes it):{{- if and (not .Release.IsInstall) .Values.feature.migrationHooks }}meant the Job was a plain release resource onhelm installand apre-upgradehook on the nexthelm upgrade. Helm tracks hooks outside the release manifest, so the resource flips identity. First upgrade after enabling the feature fails withJob already exists;helm uninstalllater leaves orphans.hook-succeededswallowed logs, leavingkubectl logsempty post-migration when operators try to investigate failures.helm templatemode ignored Helm hook annotations entirely, so under Argo the Job re-applied on every sync as a plain resource and tripped Job-spec immutability on the second sync.Changes
core.job.annotationsto always emit hook annotations covering Helm install + Helm upgrade + ArgoCD sync:core.postgres.job.sa.annotationsfrom a deprecated alias to a real helper. SA now useshook-weight: "0"/sync-wave: "0"so it's created BEFORE the Job that references it (previously both were weight 10 → SA could be missing when the Job's pod tried to schedule).feature.migrationHooksis no longer consulted. Hooks are always emitted because every other mode is broken in some way. Consumers can drop the value with no behaviour change. The old aliascore.postgres.job.annotationsis kept (still deprecated) for backward compat.What this lets consumer charts do
feature.migrationHooks.pre-install,pre-upgradelifecycle on plain Helm AND a cleanPreSynclifecycle on ArgoCD.core.postgres.job.serviceAccountName+core.postgres.job.sa.annotationswithout the SA-not-yet-existing race.Test plan
helm install+helm upgradecycle: Job is a hook on both, identity stable, no orphans.PreSync, recreated each sync viaBeforeHookCreation.kubectl logs.ttlSecondsAfterFinished, not by Helm hook policy.🤖 Generated with Claude Code