feat(deploy): add OpenShift setup script (setup-ocp.sh) #43
Update: OpenShift smoke drivers + opt-in KEDA (commit 3ea4696)
Two capabilities added on top of the base bring-up:
Smoke on OpenShift.
KSVC_URL=$(oc get ksvc serverless-harness -n default -o jsonpath='{.status.url}') ./deploy/knative/smoke.sh
Live result on OCP 4.20.8: 4/6 claims PASS: health, scale-to-zero, scale-up-from-zero, 404, and Redis session recall across a cold start (matching
KEDA opt-in.
Assisted-By: Claude Code
Adds deploy/knative/setup-ocp.sh, the OpenShift-native sibling of setup-kind.sh (issue #41), plus a shared OCP kustomize overlay, a pre-baked sandbox image, docs, and CI to publish the sandbox image.

Base bring-up on OpenShift 4.20+:
- OpenShift Serverless Operator (OLM Subscription) + KnativeServing CR with the PVC/securityContext feature flags and autoscaler tuning set in the CR spec (the operator reverts direct ConfigMap patches).
- Redis, sandbox, leaf-work PVC, LLM secret and the harness Knative Service applied via deploy/knative/overlays/ocp; OCP tweaks are kustomize patches, base YAMLs stay shared with Kind.
- Sandbox image pre-baked (deploy/knative/sandbox.Dockerfile, sets USER 65532) and built in-cluster against the internal registry; also published to GHCR by build.yaml alongside the harness image.
- Harness SA granted the nonroot-v2 SCC so its explicit non-root UID is admitted (the published image declares no USER) - issue #41 item #4b.
- Ingress via the auto-created OpenShift Route (no Kourier port-forward).
- Idempotent; --dry-run/--help and --image/--namespace/--skip-sandbox-build flags.

KEDA (async leaf) and the optional Redis Enterprise Operator are deferred follow-ups (--skip-keda is the default).

Verified end-to-end on OpenShift 4.20.8: operator install, KnativeServing Ready, SCC grant, in-cluster sandbox build, PVC bind (gp3-csi RWO), ksvc Ready, /health 200 over the Route, and an idempotent re-run. A full /turn inference needs LLM-gateway egress, which is environment-specific.

Depends on #42 (harness image fix) for a runnable default image.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <paolo.dettori@example.com>
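For readers unfamiliar with why the tuning lives in the CR rather than in ConfigMap patches, here is a minimal sketch of what such a KnativeServing CR could look like. It is not taken from setup-ocp.sh: the function name render_knativeserving, the exact feature-flag keys chosen, and the 30s grace period are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only: prints a KnativeServing CR to stdout; nothing is applied.
# render_knativeserving, the flag selection and the default 30s value are
# illustrative assumptions, not the contents of setup-ocp.sh.
render_knativeserving() {
  local grace="${1:-30s}"
  cat <<EOF
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    # Each key under spec.config is reconciled by the operator into the
    # matching config-* ConfigMap, so direct ConfigMap edits are reverted.
    features:
      kubernetes.podspec-persistent-volume-claim: "enabled"
      kubernetes.podspec-persistent-volume-write: "enabled"
      kubernetes.podspec-securitycontext: "enabled"
    autoscaler:
      scale-to-zero-grace-period: "${grace}"
EOF
}
```

On a cluster where the Serverless operator is already installed, the output could be piped to `oc apply -f -`; printing first keeps the sketch compatible with a dry-run style of review.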
- lib.sh/smoke.sh: add Route mode (export KSVC_URL) so the Kind smoke and experiment drivers run against the OpenShift Route - no Kourier port-forward, no Host header, -k for the router cert. Kind behavior unchanged.
- setup-ocp.sh: add --with-keda (+ --keda-channel) to install the Red Hat Custom Metrics Autoscaler Operator (openshift-keda Subscription + KedaController CR) for the async-leaf ScaledJob path; default stays --skip-keda.
- SMOKE.md: document OpenShift smoke usage and --with-keda.

Verified on OpenShift 4.20.8:
- KSVC_URL=<route> ./smoke.sh -> 4/6 claims PASS (health, scale-to-zero, scale-up, 404; Redis session recall confirmed via a matching sessionId across a cold start). The 2 failing claims assert on the LLM /turn response, which needs cluster egress to the Anthropic gateway (environment-specific).
- setup-ocp.sh --with-keda -> CMA operator installed, KedaController "Installation Succeeded", all openshift-keda deployments Available, scaledjobs CRD present.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <paolo.dettori@example.com>
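The Route-mode switch described above can be sketched roughly as follows. This is not the code from lib.sh: build_curl_args, KOURIER_PORT, KSVC_HOST and their default values are hypothetical names introduced here. The point is the `${arr[@]+"${arr[@]}"}` idiom, which expands an empty array to nothing without tripping `set -u` on older bash versions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints one curl argument per line for a request path.
# Route mode (KSVC_URL set): call the Route directly, -k for the router cert.
# Kind mode (KSVC_URL unset): local port-forward plus an explicit Host header.
build_curl_args() {
  local path="$1"
  local base
  local -a tls_args=() host_args=()
  if [[ -n "${KSVC_URL:-}" ]]; then
    base="${KSVC_URL%/}"          # drop a trailing slash, if any
    tls_args=(-k)
  else
    base="http://127.0.0.1:${KOURIER_PORT:-8080}"
    host_args=(-H "Host: ${KSVC_HOST:-serverless-harness.default.example.com}")
  fi
  # One of the two arrays is always empty; the +"..." form keeps that
  # expansion safe under `set -u`.
  printf '%s\n' -sS \
    ${tls_args[@]+"${tls_args[@]}"} \
    ${host_args[@]+"${host_args[@]}"} \
    "${base}${path}"
}
```

A caller could collect the lines with `mapfile -t args < <(build_curl_args /health)` and then run `curl "${args[@]}"`.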
3ea4696 to dcff251
Add deploy/knative/README-ocp.md: prerequisites, setup-ocp.sh quick start + options, what it installs, Route access, smoke-on-OCP, --with-keda, storage/SCC notes, image delivery (GHCR / in-cluster build), troubleshooting and cleanup.

Slim the SMOKE.md OpenShift section to the smoke-on-OCP usage plus a pointer to the guide, and link the guide from README.md (Quick Start note, table of contents, Documentation).

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <paolo.dettori@example.com>
Update: OpenShift install guide (commit ce3a9a6)
Added
Also: slimmed the
Assisted-By: Claude Code
pdettori left a comment
(Posted as COMMENT, not APPROVE — GitHub disallows approving your own PR. Findings are approve-level: no blocking issues.)
Adds setup-ocp.sh, the OpenShift-native sibling of setup-kind.sh (OCP 4.20+): OpenShift Serverless operator + KnativeServing CR, pre-baked sandbox image, nonroot-v2 SCC grant, shared manifests via an overlays/ocp kustomize overlay, plus Route-mode support in the smoke drivers (KSVC_URL) and docs. Clean shell (set -euo pipefail, dry-run, idempotent, secrets never echoed), correct set -u-safe empty-array curl expansion, and the OpenShift arbitrary-UID pattern (chgrp 0 / g=u / USER 65532) is right.
Verified the non-default --namespace path holds: every base YAML embeds namespace: default (redis, sandbox, PVC, service, SA, Role, RoleBinding subject) and redis://redis.default.svc:6379, so both render_overlay sed rewrites catch them all. #42 is merged, so this diff is clean against main.
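To make the two rewrites concrete, here is a minimal sketch of the kind of sed pass the comment describes. It is an illustration, not the body of render_overlay: the function name rewrite_namespace is hypothetical, and it assumes the target namespace is a plain DNS label (lowercase letters, digits, hyphens), so it needs no escaping in the sed replacement.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads rendered YAML on stdin and retargets it from
# the embedded "default" namespace to the namespace given as $1.
rewrite_namespace() {
  local ns="$1"
  # 1st expression: metadata/subject namespace fields.
  # 2nd expression: the in-cluster Redis URL, which embeds the namespace.
  sed -e "s/namespace: default\$/namespace: ${ns}/" \
      -e "s/redis\.default\.svc/redis.${ns}.svc/g"
}
```

For example, `oc kustomize deploy/knative/overlays/ocp | rewrite_namespace demo` would print the overlay retargeted at a namespace called demo (a placeholder name).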
Only two optional nits below — nothing blocking.
- Author: pdettori (MEMBER, maintainer)
- Areas: CI, Shell, Dockerfile, Kustomize, Docs
- Commits: 3, all DCO-signed
- CI: 9/9 passing
Assisted-By: Claude Code
# Built the same way as the harness so OCP can pull it from GHCR instead
# of building it in-cluster (issue #41).
- image: ghcr.io/kagenti/serverless-harness-sandbox
  context: .
nit: the sandbox image builds with context: . (whole repo), but sandbox.Dockerfile has no COPY — a scoped context (e.g. deploy/knative) would ship a smaller build context. Purely cosmetic; the built image is identical either way.
Done in 81bf425 — the sandbox build now uses context: deploy/knative (file stays deploy/knative/sandbox.Dockerfile, which build-push-action resolves from the workspace root). Verified locally that the image builds identically with the scoped context. Thanks!
# (oc new-build --binary --strategy=docker). The harness routes agent tool
# execution into this pod via `kubectl exec`, so it needs bash + GNU coreutils,
# findutils and grep on PATH.
FROM alpine:3.20
nit: FROM alpine:3.20 is tag-pinned, not digest-pinned. hadolint is green and this matches the repo's existing convention, so optional — a @sha256:… digest would harden reproducibility if you later want a stricter supply-chain policy.
Leaving this as tag-pinned for now, for two reasons: (1) consistency — the harness Dockerfile pins node:22-alpine by tag too, and hadolint is green, so digest-pinning just the sandbox would be a lopsided/false-hardening signal; (2) a correct multi-arch pin needs the manifest-list (index) digest, and build.yaml runs on main-push only (not PRs), so I can't CI-verify a digest change on this PR before merge. Happy to do digest-pinning as a dedicated, repo-wide supply-chain PR (both images, index digests) if you want to adopt that policy — just say the word.
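If the repo-wide digest-pinning PR does happen, the mechanical part could look something like the sketch below. The helper name pin_base_image is hypothetical, and the digest in the usage note is a placeholder, not the real alpine:3.20 index digest; the actual value would have to be looked up first, for example with `docker buildx imagetools inspect alpine:3.20`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a Dockerfile on stdin and appends @<digest> to
# the FROM line that references exactly <image>. Other lines pass through.
pin_base_image() {
  local image="$1" digest="$2"
  local pat="${image//./\\.}"   # escape dots so 3.20 matches literally
  # Keeps any trailing " AS <stage>" suffix via the optional capture group.
  sed -E "s|^FROM ${pat}( .*)?\$|FROM ${image}@${digest}\1|"
}
```

Usage, with an obviously fake digest: `pin_base_image alpine:3.20 sha256:<index-digest> < deploy/knative/sandbox.Dockerfile`. Keeping the tag next to the digest keeps the FROM line readable while the digest is what actually gets pulled.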
Address review nit: sandbox.Dockerfile has no COPY, so build the sandbox image with context deploy/knative instead of the whole repo — smaller build context, identical image. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Paolo Dettori <paolo.dettori@example.com>
Closes #41.
What
deploy/knative/setup-ocp.sh: the OpenShift-native sibling of setup-kind.sh, targeting OpenShift 4.20+. Base bring-up: OpenShift Serverless (Knative + Kourier), Redis, sandbox, leaf-work PVC, LLM secret, and the harness Knative Service, reachable over its auto-created Route.
Manifests stay shared with Kind via a small deploy/knative/overlays/ocp kustomize overlay (OCP tweaks are patches, not forked YAMLs).
How it differs from Kind (issue #41)
- KnativeServing CR (Kourier bundled)
- config-* ConfigMap patches)
- ksvc status.url); no Kourier port-forward
- runAsUser 65532; SA granted nonroot-v2 SCC so the explicit non-root UID is admitted (the GHCR image declares no USER); issue #41 item #4b
- sandbox.Dockerfile (USER 65532), built in-cluster against the internal registry; also published to GHCR by build.yaml
- --image (no dev.local / kind load)
Flags:
--help, --dry-run, --image, --namespace, --sandbox-image, --skip-sandbox-build, --serverless-channel, --skip-keda. Idempotent (safe to re-run).
Scope
Base bring-up only. KEDA (async leaf, via the Custom Metrics Autoscaler Operator) and the optional Redis Enterprise Operator are deferred follow-ups; --skip-keda is the default. Redis stays the lightweight in-repo Deployment.
CI
build.yaml is extended (matrix) to build & push the sandbox image (ghcr.io/kagenti/serverless-harness-sandbox) the same way as the harness image.
Verification (live, OpenShift 4.20.8)
- KnativeServing Ready with the PVC/securityContext feature flags & autoscaler tuning in the CR
- nonroot-v2 SCC grant; harness pod runs non-root (was CreateContainerConfigError: runAsNonRoot before the fix)
- leaf-work PVC binds (gp3-csi, RWO)
- ksvc/serverless-harness Ready; GET /health → HTTP 200 over the Route; POST /turn routes and creates a session
A full /turn inference requires egress to the LLM gateway, which is environment-specific (this cluster couldn't reach the private gateway).
bash -n + shellcheck clean; oc kustomize overlay renders.
Storage caveat
leaf-work is ReadWriteOnce. On block storage (AWS EBS gp3-csi) that binds to a single node, which is fine for the single harness consumer in base bring-up. Concurrent multi-node scale-out or co-mounting with the leaf-orchestrator would need RWX (documented in SMOKE.md).
Assisted-By: Claude Code