Fix: leaf smokes seed world-writable /work dirs for the non-root harness#40

Merged: pdettori merged 1 commit into main from fix/leaf-smoke-result-dir-perms-39 on Jul 1, 2026

pdettori (Member) commented Jul 1, 2026

Summary

Fixes the leaf-work PVC permission mismatch (#39). The leaf-orchestrator pod runs as root and created /work/<run> and /work/<run>/results as 0755 root:root, but the harness ksvc (and the leaf-worker job) run as uid 65532 with readOnlyRootFilesystem. As a result, runLeaf hit EACCES when writing result_ref into those dirs. Only the fast paths that write no result (bad_inputs, workspace-isolation, scale-to-zero) passed, which masked the bug.

Fix (issue option 1, factored to a shared helper)

Add seed_work_dirs to deploy/knative/lib.sh — it creates the run dirs via the orchestrator and chmod 777s them, so the non-root harness UID can create result subdirs and write result_ref.

On Kind, the ksvc's fsGroup: 65532 is not applied to the orchestrator-created dirs (local-path/hostPath PVC), which is exactly why the bug exists, so an explicit chmod, not fsGroup, is the reliable fix. Options 2 and 3 from #39 (running the orchestrator as uid 65532, or relying on an fsGroup contract) don't hold on Kind, and option 2 would also break the orchestrator's apk add.

  • lib.sh: add seed_work_dirs
  • leaf-smoke.sh, leaf-async-smoke.sh: use seed_work_dirs
  • leaf-cron-smoke.sh: seed /work/nightly world-writable (the leaf-worker creates the fire-stamped results subdir under it)
  • leaf-gate-smoke.sh: fold the three inline mkdir -p + chmod -R 0777 blocks into seed_work_dirs (DRY; behavior unchanged)
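
To make the shape of the helper concrete, here is a minimal sketch of what a seed_work_dirs-style function could look like. It is not the code merged in this PR: the ORCH_EXEC hook, its default value, and the deployment name are assumptions made for illustration, and the real helper in lib.sh may resolve the orchestrator pod differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a seed_work_dirs-style helper (not the merged lib.sh code).
# ORCH_EXEC is an assumed hook: the command prefix that runs a command inside the
# root orchestrator pod. The deployment name in the default is illustrative only.
ORCH_EXEC=${ORCH_EXEC:-"kubectl exec deploy/leaf-orchestrator --"}

seed_work_dirs() {
  # Usage: seed_work_dirs DIR [DIR...]
  local dir
  for dir in "$@"; do
    # Create the directory as root, then make it world-writable so that
    # uid 65532 can create result subdirs and write result_ref beneath it.
    # ORCH_EXEC is intentionally unquoted so it splits into command + args.
    $ORCH_EXEC sh -c 'mkdir -p "$1" && chmod 0777 "$1"' sh "$dir" || return 1
  done
}
```

A smoke script would then call something like `seed_work_dirs "/work/$RUN" "/work/$RUN/results"` (with `RUN` being whatever run identifier the script already uses) before dispatching to the harness. Passing each directory explicitly matters because `mkdir -p` creates intermediate directories with default permissions; only the paths named on the command line are chmod-ed.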

Also: unrelated fan-out hang in leaf-smoke.sh

Surfaced during verification. The parallel fan-out used a bare wait, which also blocks on the background port-forward started by ensure_port_forward (and the sampler) — so on a clean session with no pre-existing port-forward the smoke hung forever at Claim 1. Now it waits only on the dispatch PIDs.
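
The difference between a bare wait and waiting on specific PIDs can be sketched as follows. The function names (dispatch_one, fan_out) and the sleep-based stand-in for a dispatch are hypothetical; the actual leaf-smoke.sh dispatch logic is not shown in this PR description.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: dispatch_one is a hypothetical stand-in for the
# real per-item dispatch in leaf-smoke.sh.
dispatch_one() {
  sleep 0.1
  echo "dispatched $1"
}

fan_out() {
  local pids=() item pid rc=0
  for item in "$@"; do
    dispatch_one "$item" &
    pids+=("$!")          # remember only the PIDs we started here
  done
  # Wait on the recorded dispatch PIDs. A bare `wait` would also block on
  # any other background job of this shell, such as a long-lived port-forward.
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

With a long-lived background job already running in the same shell (for example a port-forward), `fan_out i1 i2 i3` returns as soon as the three dispatches finish, whereas a bare `wait` would not return until that background job exits.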

Verification (live, Kind sh-knative)

  • Isolated repro of the mechanism: a uid-65532 probe pod gets EACCES on a 0755 root dir, and writes OK after chmod 777.
  • Full LEAF_LIVE_SMOKE=1 leaf-smoke.sh: 9 passed, 0 failed (LEAF SMOKE PASS).
  • result_ref files written by uid 65532 with correct verdicts (i1=FLAGGED, i2=CLEAR, i3=CLEAR).
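
The mechanism behind the isolated repro can be approximated locally by inspecting mode bits. The sketch below is a hypothetical helper, not the probe pod used in the verification: it only reads the permission string from `ls -ld` and ignores ownership, group membership, and ACLs, so it answers "could an unrelated uid such as 65532 create entries here?" only in the simple case described in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if "other" users hold both write and search
# permission on a directory, judged purely from its mode string.
other_can_write() {
  local perms
  perms=$(ls -ld "$1") || return 2
  # In a string like drwxr-xr-x, characters 9 and 10 are the write and
  # execute bits for "other".
  [ "$(printf '%s' "$perms" | cut -c9)" = "w" ] || return 1
  case "$(printf '%s' "$perms" | cut -c10)" in
    x|t) return 0 ;;   # t = execute plus sticky bit
    *)   return 1 ;;
  esac
}
```

On a directory with mode 0755 the helper fails, mirroring the EACCES the probe pod saw; after `chmod 777` it succeeds.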

Closes #39

Assisted-By: Claude Code

…rite result_ref

The leaf-orchestrator pod runs as root and created /work/<run> and
/work/<run>/results as 0755 root:root. The harness ksvc (and the leaf-worker
job) run as uid 65532 with readOnlyRootFilesystem, so runLeaf hit EACCES
writing result_ref into those dirs. Only fast paths that write no result
(bad_inputs, workspace-isolation, scale-to-zero) passed, masking the bug.

Add a shared seed_work_dirs helper in lib.sh that creates the run dirs via the
orchestrator and chmod 777s them, so the non-root harness UID can create result
subdirs and write result_ref. On Kind the ksvc's fsGroup:65532 is NOT applied
to the orchestrator-created dirs (local-path/hostPath PVC), so an explicit
chmod -- not fsGroup -- is the reliable fix here.

- lib.sh: add seed_work_dirs
- leaf-smoke.sh, leaf-async-smoke.sh: use seed_work_dirs
- leaf-cron-smoke.sh: seed /work/nightly world-writable (the leaf-worker
  creates the fire-stamped results subdir under it)
- leaf-gate-smoke.sh: fold the three inline `mkdir -p + chmod -R 0777` blocks
  into seed_work_dirs (DRY; behavior unchanged)

Also fix an unrelated hang in leaf-smoke.sh: the parallel fan-out used a bare
`wait`, which also blocks on the background port-forward started by
ensure_port_forward (and the sampler); on a clean session with no pre-existing
port-forward the smoke hung forever. Wait only on the dispatch PIDs.

Verified live on Kind (sh-knative): leaf-smoke.sh -> 9 passed, 0 failed.
result_ref files are written by uid 65532 with correct verdicts
(i1=FLAGGED, i2=CLEAR, i3=CLEAR).

Closes #39

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <dettori@us.ibm.com>

pdettori merged commit d953544 into main on Jul 1, 2026
9 checks passed
pdettori deleted the fix/leaf-smoke-result-dir-perms-39 branch on July 1, 2026 at 14:50
Development

Successfully merging this pull request may close these issues.

leaf smoke: non-root harness (UID 65532) can't write result_ref into orchestrator-created /work dirs (EACCES)
