Fix: leaf smokes seed world-writable /work dirs for the non-root harness#40
Merged
…rite result_ref

The leaf-orchestrator pod runs as root and created /work/<run> and /work/<run>/results as 0755 root:root. The harness ksvc (and the leaf-worker job) run as uid 65532 with readOnlyRootFilesystem, so runLeaf hit EACCES writing result_ref into those dirs. Only fast paths that write no result (bad_inputs, workspace-isolation, scale-to-zero) passed, masking the bug.

Add a shared seed_work_dirs helper in lib.sh that creates the run dirs via the orchestrator and chmod 777s them, so the non-root harness UID can create result subdirs and write result_ref. On Kind the ksvc's fsGroup:65532 is NOT applied to the orchestrator-created dirs (local-path/hostPath PVC), so an explicit chmod -- not fsGroup -- is the reliable fix here.

- lib.sh: add seed_work_dirs
- leaf-smoke.sh, leaf-async-smoke.sh: use seed_work_dirs
- leaf-cron-smoke.sh: seed /work/nightly world-writable (the leaf-worker creates the fire-stamped results subdir under it)
- leaf-gate-smoke.sh: fold the three inline `mkdir -p + chmod -R 0777` blocks into seed_work_dirs (DRY; behavior unchanged)

Also fix an unrelated hang in leaf-smoke.sh: the parallel fan-out used a bare `wait`, which also blocks on the background port-forward started by ensure_port_forward (and the sampler); on a clean session with no pre-existing port-forward the smoke hung forever. Wait only on the dispatch PIDs.

Verified live on Kind (sh-knative): leaf-smoke.sh -> 9 passed, 0 failed. result_ref files are written by uid 65532 with correct verdicts (i1=FLAGGED, i2=CLEAR, i3=CLEAR).

Closes #39

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
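
The helper's source is not shown on this page, so the following is only a minimal sketch of what a `seed_work_dirs`-style function might look like. The argument order and the `SEED_EXEC` variable are assumptions made for illustration: `SEED_EXEC` stands for a command prefix that runs its arguments wherever `/work` is mounted (for example a `kubectl exec ... --` prefix targeting the orchestrator pod). When it is unset, the commands run locally.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; the real seed_work_dirs in lib.sh may differ.
# SEED_EXEC (assumed name): optional command prefix that executes its
# arguments where the work volume is mounted. Empty means "run locally".

seed_work_dirs() {
  local root="$1"; shift   # e.g. /work
  local run
  for run in "$@"; do
    # Create the run dir and its results subdir (as root, via the prefix).
    # SEED_EXEC is intentionally unquoted so a multi-word prefix splits
    # into separate words.
    ${SEED_EXEC:-} mkdir -p "$root/$run/results"
    # Explicit chmod rather than relying on fsGroup, so a non-root UID
    # can create subdirs and write files under both directories.
    ${SEED_EXEC:-} chmod 0777 "$root/$run" "$root/$run/results"
  done
}
```

A call such as `seed_work_dirs /work run-a run-b` would then leave `/work/run-a`, `/work/run-a/results`, and the matching `run-b` directories world-writable. The run names here are placeholders.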
Summary
Fixes the `leaf-work` PVC permission mismatch (#39): the `leaf-orchestrator` pod runs as root and created `/work/<run>` and `/work/<run>/results` as `0755 root:root`, but the harness ksvc (and the `leaf-worker` job) run as uid 65532 with `readOnlyRootFilesystem`. So `runLeaf`'s write of `result_ref` hit `EACCES`. Only fast paths that write no result (bad_inputs, workspace-isolation, scale-to-zero) passed, masking the bug.

Fix (issue option 1, factored to a shared helper)

Add `seed_work_dirs` to `deploy/knative/lib.sh`. It creates the run dirs via the orchestrator and `chmod 777`s them, so the non-root harness UID can create result subdirs and write `result_ref`.

- `lib.sh`: add `seed_work_dirs`
- `leaf-smoke.sh`, `leaf-async-smoke.sh`: use `seed_work_dirs`
- `leaf-cron-smoke.sh`: seed `/work/nightly` world-writable (the leaf-worker creates the fire-stamped results subdir under it)
- `leaf-gate-smoke.sh`: fold the three inline `mkdir -p + chmod -R 0777` blocks into `seed_work_dirs` (DRY; behavior unchanged)

Also: unrelated fan-out hang in `leaf-smoke.sh`

Surfaced during verification. The parallel fan-out used a bare `wait`, which also blocks on the background port-forward started by `ensure_port_forward` (and the sampler), so on a clean session with no pre-existing port-forward the smoke hung forever at Claim 1. Now it waits only on the dispatch PIDs.

Verification (live, Kind `sh-knative`)

- `EACCES` on a `0755 root` dir, and writes OK after `chmod 777`.
- `LEAF_LIVE_SMOKE=1 leaf-smoke.sh` → 9 passed, 0 failed, LEAF SMOKE PASS.
- `result_ref` files written by uid 65532 with correct verdicts (i1=FLAGGED, i2=CLEAR, i3=CLEAR).

Closes #39
Assisted-By: Claude Code
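
To illustrate the fan-out hang, here is a small self-contained sketch of the pattern, not the code from `leaf-smoke.sh`. The names `fan_out_demo` and `dispatch` are hypothetical, and a long `sleep` stands in for the background port-forward. Collecting each dispatch PID and passing only those to `wait` lets the function return as soon as the dispatches finish. A bare `wait` at the same point would block until the helper exited too.

```shell
#!/usr/bin/env bash
# Hypothetical demo of waiting only on dispatch PIDs.

dispatch() {
  # Stand-in for one parallel dispatch call.
  sleep 1
  echo "dispatched $1"
}

fan_out_demo() {
  # Stand-in for a long-lived background helper such as a port-forward.
  sleep 30 >/dev/null 2>&1 &
  local helper_pid=$!

  local pids=() item
  for item in "$@"; do
    dispatch "$item" &
    pids+=("$!")          # remember only the dispatch jobs
  done

  # A bare `wait` here would also block on the 30 s helper.
  wait "${pids[@]}"
  echo "fan-out done"

  # Clean up the helper so nothing is left running after the demo.
  kill "$helper_pid" 2>/dev/null
  wait "$helper_pid" 2>/dev/null || true
}
```

Running `fan_out_demo i1 i2 i3` should finish after roughly the duration of one dispatch rather than the helper's lifetime. The order of the `dispatched` lines is not guaranteed, since the jobs run in parallel.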