fix(image): pin dev/warm pods + prepuller to immutable hash tag (stop stale :latest serving + warm thrash) #202
Merged
Root cause of 'new image never reaches pods' (codex stayed broken after apply) and the warm-rotation thrash: pods used :latest + imagePullPolicy=IfNotPresent. After a rebuild, a node that already has an old :latest cached does NOT re-pull; the kubelet serves the stale image to every new pod until the prepuller finishes re-pulling 27GB (~5-6min/node, 24 nodes). A cold reserve in that window gets the old (broken-codex) image, and the #199 warm rotation recycles old-image pods that instantly come back on the cached old :latest -> recycled again -> thrash.

Fix (the pattern #191 already uses for build jobs): pin pods to the immutable hash tag latest-<context-hash> (local.full_image_uri). Each rebuild = a tag the node has never seen, so IfNotPresent pulls the NEW image -> guaranteed-correct, no stale window, and the warm rotation converges (the recycled pod can't come up on the old cache; it pulls the new tag). Prepuller pinned to the same tag so it pre-warms the exact ref. Tag is immutable/stable, so OOM-restart still works.

ami-baker/eks user-data keep :latest (boot-time LAYER prewarm; same digest, fast manifest-only pod pull). No docker files changed -> no image rebuild on apply; this is a lambda-env + prepuller-DS change only.
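The stale window and the fix described above can be modeled with a toy simulation. This is a minimal sketch, not the kubelet's actual code: `Registry`, `Node`, the `gpu-dev` image name, the `abc123` hash, and the digests are all hypothetical names chosen for illustration. The only behavior it assumes is that `IfNotPresent` pulls an image ref only when that ref is absent from the node's local cache.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy registry: maps an image ref (name:tag) to a content digest."""
    tags: dict = field(default_factory=dict)

    def push(self, ref: str, digest: str) -> None:
        # A mutable tag such as :latest is simply re-pointed on each push.
        self.tags[ref] = digest


@dataclass
class Node:
    """Toy model of one node's image cache under IfNotPresent."""
    cache: dict = field(default_factory=dict)
    pulls: int = 0

    def start_pod(self, registry: Registry, ref: str) -> str:
        # IfNotPresent: pull only when the ref is missing from the local cache.
        if ref not in self.cache:
            self.cache[ref] = registry.tags[ref]
            self.pulls += 1
        # Return the digest the pod actually runs.
        return self.cache[ref]


registry = Registry()
node = Node()

# First build: the node pulls and caches :latest.
registry.push("gpu-dev:latest", "sha256:old")
node.start_pod(registry, "gpu-dev:latest")

# Rebuild re-points :latest, but the node already has that ref cached.
registry.push("gpu-dev:latest", "sha256:new")
stale = node.start_pod(registry, "gpu-dev:latest")

# Same rebuild published under a tag the node has never seen.
registry.push("gpu-dev:latest-abc123", "sha256:new")
fresh = node.start_pod(registry, "gpu-dev:latest-abc123")

print(stale, fresh)
```

In this model the pod started on `:latest` after the rebuild still runs the old digest, while the pod started on the hash tag forces a pull and runs the new one. A recycled warm pod behaves like the second `start_pod` call, which is why rotation on `:latest` cannot converge until the cache is refreshed.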
Root cause (the '3 hours and codex still hangs' bug)
Pods used `:latest` + `imagePullPolicy: IfNotPresent`. Images are cached per node. After a rebuild, a node that already has an old `:latest` cached does not re-pull: the kubelet serves the stale image to every new pod until the prepuller finishes re-pulling the 27 GB image (~5–6 min per node across 24 nodes).

Consequences observed live this session:
- A cold reserve in that window gets the old (broken-codex) image.
- The #199 warm rotation recycles old-image pods, which instantly come back on the cached old `:latest` → recycled again → thrash (pods recreated every 1–2 s, none ever on the new image).

Verified: the `:latest` digest was the new image, but 0 pods were running it; all 24 prepuller pods were still mid-pull (`Init:0/1`).

Fix
Pin dev + warm pods (`GPU_DEV_CONTAINER_IMAGE`) and the prepuller to the immutable hash tag `latest-<context-hash>` (`local.full_image_uri`), the exact pattern #191 already uses for the build/ondemand jobs. Each rebuild produces a tag the node has never seen, so `IfNotPresent` pulls the new image → guaranteed-correct, no stale window, and the warm rotation converges (a recycled pod can't come up on the old cache; it must pull the new tag). The prepuller is pinned to the same tag so it pre-warms the exact ref.

- The tag is immutable and stable, so OOM-restart still works (as it did when `:latest` was used).
- `ami-baker/eks` user-data keep `:latest` (boot-time layer prewarm; same digest → pod pull is manifest-only/fast). Correctness now comes from the hash tag, not the prewarm.
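To illustrate why a content-derived tag changes only when the build inputs change, here is a hedged sketch of one way a `latest-<context-hash>` tag could be computed. The PR does not show how `local.full_image_uri` is actually derived, so the hashing scheme, the 12-character truncation, the function names, and the `registry.example/gpu-dev` repository are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def context_hash(files: dict[str, bytes], length: int = 12) -> str:
    """Hash a build context given as {relative_path: content}.

    Paths are visited in sorted order so the result does not depend on
    filesystem or dict ordering. (Assumed scheme, not the PR's.)
    """
    h = hashlib.sha256()
    for rel_path in sorted(files):
        # Length-prefix each part so path/content boundaries are unambiguous.
        for part in (rel_path.encode("utf-8"), files[rel_path]):
            h.update(len(part).to_bytes(8, "big"))
            h.update(part)
    return h.hexdigest()[:length]


def read_context(root: str) -> dict[str, bytes]:
    """Load every regular file under root, keyed by POSIX-style relative path."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): p.read_bytes()
        for p in base.rglob("*")
        if p.is_file()
    }


def full_image_uri(repo: str, files: dict[str, bytes]) -> str:
    """Build a pinned image ref of the form repo:latest-<context-hash>."""
    return f"{repo}:latest-{context_hash(files)}"


# Hypothetical repository and build contexts, for illustration only.
repo = "registry.example/gpu-dev"
before = {"Dockerfile": b"FROM base:1\n", "entrypoint.sh": b"#!/bin/sh\n"}
after = {"Dockerfile": b"FROM base:2\n", "entrypoint.sh": b"#!/bin/sh\n"}

unchanged_uri = full_image_uri(repo, dict(before))
before_uri = full_image_uri(repo, before)
after_uri = full_image_uri(repo, after)
print(before_uri == unchanged_uri, before_uri == after_uri)
```

Under this scheme an apply that touches no build files yields the same ref, so nothing is re-pulled, while any change to the context yields a ref that no node has cached. `read_context` is included to show how a real directory could feed the same function.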