
feat(deploy): add OpenShift setup script (setup-ocp.sh)#43

Merged
pdettori merged 4 commits into main from feat/setup-ocp-41 on Jul 1, 2026

@pdettori pdettori commented Jul 1, 2026

Closes #41.

Stacked on #42 (harness image fix). Review/merge #42 first; this PR's base will retarget to main once #42 lands. The diff shown here is the setup-script work only.

What

deploy/knative/setup-ocp.sh — the OpenShift-native sibling of setup-kind.sh, targeting OpenShift 4.20+. Base bring-up: OpenShift Serverless (Knative + Kourier), Redis, sandbox, leaf-work PVC, LLM secret, and the harness Knative Service, reachable over its auto-created Route.

Manifests stay shared with Kind via a small deploy/knative/overlays/ocp kustomize overlay (OCP tweaks are patches, not forked YAMLs).

How it differs from Kind (issue #41)

| Concern | OpenShift approach |
| --- | --- |
| Knative | Red Hat OpenShift Serverless Operator (OLM Subscription) + KnativeServing CR (Kourier bundled) |
| Config | feature flags + autoscaler tuning in the CR spec (the operator reverts config-* ConfigMap patches) |
| Ingress | auto-created OpenShift Route (ksvc status.url); no Kourier port-forward |
| Harness UID | keeps runAsUser 65532; SA granted nonroot-v2 SCC so the explicit non-root UID is admitted (the GHCR image declares no USER); issue #41 item #4b |
| Sandbox | pre-baked sandbox.Dockerfile (USER 65532), built in-cluster against the internal registry; also published to GHCR by build.yaml |
| Image | published GHCR image via --image (no dev.local/kind load) |
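To illustrate the "Config" row, here is a minimal sketch of carrying the feature flags and autoscaler tuning in the KnativeServing CR instead of patching ConfigMaps. It only prints a manifest. The function name `render_knative_serving` and the grace-period value are assumptions for illustration, not taken from setup-ocp.sh; the feature-flag keys are the upstream Knative ones for PVC and securityContext support.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from setup-ocp.sh): prints a KnativeServing CR
# whose spec.config carries the settings the operator would otherwise revert.
render_knative_serving() {
  # Assumed default; the real autoscaler tuning values are not shown in the PR.
  local grace="${1:-30s}"
  cat <<EOF
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    features:
      kubernetes.podspec-persistent-volume-claim: "enabled"
      kubernetes.podspec-persistent-volume-write: "enabled"
      kubernetes.podspec-securitycontext: "enabled"
    autoscaler:
      scale-to-zero-grace-period: "${grace}"
EOF
}

# Usage (needs cluster access):
#   render_knative_serving 30s | oc apply -f -
```

Keeping the settings in the CR means the operator reconciles toward them rather than away from them, which is the behaviour the table describes.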

Flags: --help, --dry-run, --image, --namespace, --sandbox-image, --skip-sandbox-build, --serverless-channel, --skip-keda. Idempotent (safe to re-run).
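A flag set like this is commonly parsed with a `while`/`case` loop. The sketch below is an assumption about the general shape, not the script's actual parser: the variable names, the `default` namespace default, and the usage text are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical flag parser; variable names and defaults are illustrative only.
DRY_RUN=0
IMAGE=""
NAMESPACE="default"      # assumed default namespace
SANDBOX_IMAGE=""
SKIP_SANDBOX_BUILD=0
SERVERLESS_CHANNEL=""
WITH_KEDA=0              # --skip-keda is the documented default

parse_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --dry-run)            DRY_RUN=1 ;;
      --image)              IMAGE="${2:?--image needs a value}"; shift ;;
      --namespace)          NAMESPACE="${2:?--namespace needs a value}"; shift ;;
      --sandbox-image)      SANDBOX_IMAGE="${2:?--sandbox-image needs a value}"; shift ;;
      --skip-sandbox-build) SKIP_SANDBOX_BUILD=1 ;;
      --serverless-channel) SERVERLESS_CHANNEL="${2:?--serverless-channel needs a value}"; shift ;;
      --skip-keda)          WITH_KEDA=0 ;;
      --help)               echo "usage: setup-ocp.sh [flags]"; return 0 ;;
      *)                    echo "unknown flag: $1" >&2; return 2 ;;
    esac
    shift
  done
}
```

Rejecting unknown flags with a non-zero status keeps typos from silently falling back to defaults.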

Scope

Base bring-up only. KEDA (async leaf, via the Custom Metrics Autoscaler Operator) and the optional Redis Enterprise Operator are deferred follow-ups — --skip-keda is the default. Redis stays the lightweight in-repo Deployment.

CI

build.yaml is extended (matrix) to build & push the sandbox image (ghcr.io/kagenti/serverless-harness-sandbox) the same way as the harness image.

Verification (live, OpenShift 4.20.8)

  • ✅ OpenShift Serverless Operator install (idempotent) + KnativeServing Ready with the PVC/securityContext feature flags & autoscaler tuning in the CR
  • ✅ nonroot-v2 SCC grant; harness pod runs non-root (was CreateContainerConfigError: runAsNonRoot before the fix)
  • ✅ in-cluster sandbox build → internal registry; sandbox pod Running under restricted-v2
  • ✅ leaf-work PVC binds (gp3-csi, RWO)
  • ✅ ksvc/serverless-harness Ready; GET /health → HTTP 200 over the Route; POST /turn routes and creates a session
  • ✅ idempotent re-run (operator install skipped)

A full /turn inference requires egress to the LLM gateway, which is environment-specific (this cluster couldn't reach the private gateway). bash -n + shellcheck clean; oc kustomize overlay renders.
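The idempotent install, dry-run, and SCC grant steps verified above could be structured roughly as follows. This is a sketch under assumptions: the helper names (`run`, `ensure_subscription`, `grant_nonroot_scc`) and the ServiceAccount and manifest arguments are hypothetical. The `oc adm policy add-scc-to-user ... -z` form is the standard way to grant an SCC to a ServiceAccount.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and structure are illustrative, not from setup-ocp.sh.
DRY_RUN="${DRY_RUN:-0}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

# Apply the operator Subscription only when it is not already present,
# so a re-run skips the install.
ensure_subscription() {
  local name="$1" ns="$2" manifest="$3"
  if oc get subscription.operators.coreos.com "$name" -n "$ns" >/dev/null 2>&1; then
    echo "subscription $name already present; skipping install"
    return 0
  fi
  run oc apply -f "$manifest"
}

# Grant the nonroot-v2 SCC to a ServiceAccount so an explicit non-root UID is admitted.
grant_nonroot_scc() {
  local sa="$1" ns="$2"
  run oc adm policy add-scc-to-user nonroot-v2 -z "$sa" -n "$ns"
}
```

Note that the existence check still queries the cluster in dry-run mode; only mutating commands are routed through `run`.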

Storage caveat

leaf-work is ReadWriteOnce. On block storage (AWS EBS gp3-csi) that binds to a single node — fine for the single harness consumer in base bring-up. Concurrent multi-node scale-out or co-mounting with the leaf-orchestrator would need RWX (documented in SMOKE.md).
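To make the RWO/RWX distinction concrete, the sketch below prints a PVC manifest with the access mode as a parameter. The function name and the `1Gi` size are assumptions; `leaf-work`, `ReadWriteOnce` and `gp3-csi` come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a leaf-work PVC. The size default is a placeholder.
render_leaf_work_pvc() {
  local mode="${1:-ReadWriteOnce}" class="${2:-gp3-csi}" size="${3:-1Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: leaf-work
spec:
  accessModes:
    - ${mode}
  storageClassName: ${class}
  resources:
    requests:
      storage: ${size}
EOF
}

# Usage (needs cluster access):
#   render_leaf_work_pvc | oc apply -n default -f -
```

Switching to `ReadWriteMany` also requires a storage class backed by shared storage; block storage such as EBS generally will not bind an RWX claim, so the class name would be environment-specific.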

Assisted-By: Claude Code

pdettori commented Jul 1, 2026

Update: OpenShift smoke drivers + opt-in KEDA (commit 3ea4696)

Two capabilities added on top of the base bring-up:

Smoke on OpenShift. lib.sh/smoke.sh now support a Route mode — export KSVC_URL and the Kind smoke/experiment drivers run against the OCP Route (no Kourier port-forward, no Host header, -k for the router cert). Kind behavior is unchanged.

KSVC_URL=$(oc get ksvc serverless-harness -n default -o jsonpath='{.status.url}') ./deploy/knative/smoke.sh

Live result on OCP 4.20.8: 4/6 claims PASS (health, scale-to-zero, scale-up-from-zero, 404). Redis session recall across a cold start was also confirmed via a matching sessionId. The 2 failing claims assert on the LLM /turn response text, which needs cluster egress to the Anthropic gateway (environment-specific, not a harness defect).
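The Route-mode switch can be pictured as choosing curl arguments based on whether KSVC_URL is set. This is a hedged sketch, not the code in lib.sh: `build_curl_target`, `health_check`, `KSVC_HOST`, `KOURIER_ADDR` and their default values are hypothetical names introduced here. It needs bash, since it relies on arrays.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-mode curl arguments; only KSVC_URL comes from the PR.
build_curl_target() {
  TLS_OPTS=()    # stays empty in Kind mode
  HOST_OPTS=()   # stays empty in Route mode
  if [ -n "${KSVC_URL:-}" ]; then
    # Route mode: hit the Route directly; -k tolerates the router certificate.
    TLS_OPTS=(-k)
    BASE_URL="$KSVC_URL"
  else
    # Kind mode: go through the Kourier port-forward and set the Host header.
    HOST_OPTS=(-H "Host: ${KSVC_HOST:-serverless-harness.default.example.com}")
    BASE_URL="http://${KOURIER_ADDR:-127.0.0.1:8080}"
  fi
}

health_check() {
  build_curl_target
  # ${arr[@]+"${arr[@]}"} expands to nothing for an empty array, which avoids
  # an "unbound variable" error under `set -u` on older bash versions.
  curl -sS -o /dev/null -w '%{http_code}' \
    ${TLS_OPTS[@]+"${TLS_OPTS[@]}"} \
    ${HOST_OPTS[@]+"${HOST_OPTS[@]}"} \
    "${BASE_URL}/health"
}
```

Keeping the optional arguments in arrays avoids word-splitting problems with the Host header value.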

KEDA opt-in. --with-keda (+ --keda-channel) installs the Red Hat Custom Metrics Autoscaler Operator (openshift-keda Subscription + KedaController CR) for the async-leaf ScaledJob path; default stays --skip-keda. Verified live: KedaController reports "Installation Succeeded", all openshift-keda deployments Available, scaledjobs.keda.sh CRD present. (Wiring/verifying the async-leaf ScaledJob itself on OCP is a further step.)
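Operator installs are asynchronous, so checks like "CRD present" usually need polling. The helper below is a generic sketch; `wait_for` and the attempt/delay numbers are assumptions, not code from setup-ocp.sh.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper: retry a command until it succeeds or attempts run out.
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage (needs cluster access):
#   wait_for 30 10 oc get crd scaledjobs.keda.sh
#   wait_for 30 10 oc wait deployment --all -n openshift-keda \
#     --for=condition=Available --timeout=10s
```

Returning non-zero on timeout lets a caller running under `set -e` stop instead of continuing against a half-installed operator.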

Assisted-By: Claude Code

Base automatically changed from fix/harness-dockerfile-work-queue-deps to main July 1, 2026 17:07
Paolo Dettori added 2 commits July 1, 2026 13:08
Adds deploy/knative/setup-ocp.sh, the OpenShift-native sibling of
setup-kind.sh (issue #41), plus a shared OCP kustomize overlay, a
pre-baked sandbox image, docs, and CI to publish the sandbox image.

Base bring-up on OpenShift 4.20+:
- OpenShift Serverless Operator (OLM Subscription) + KnativeServing CR with
  the PVC/securityContext feature flags and autoscaler tuning set in the CR
  spec (the operator reverts direct ConfigMap patches).
- Redis, sandbox, leaf-work PVC, LLM secret and the harness Knative Service
  applied via deploy/knative/overlays/ocp; OCP tweaks are kustomize patches,
  base YAMLs stay shared with Kind.
- Sandbox image pre-baked (deploy/knative/sandbox.Dockerfile, sets USER
  65532) and built in-cluster against the internal registry; also published
  to GHCR by build.yaml alongside the harness image.
- Harness SA granted the nonroot-v2 SCC so its explicit non-root UID is
  admitted (the published image declares no USER) - issue #41 item #4b.
- Ingress via the auto-created OpenShift Route (no Kourier port-forward).
- Idempotent; --dry-run/--help and --image/--namespace/--skip-sandbox-build
  flags.

KEDA (async leaf) and the optional Redis Enterprise Operator are deferred
follow-ups (--skip-keda is the default).

Verified end-to-end on OpenShift 4.20.8: operator install, KnativeServing
Ready, SCC grant, in-cluster sandbox build, PVC bind (gp3-csi RWO), ksvc
Ready, /health 200 over the Route, and an idempotent re-run. A full /turn
inference needs LLM-gateway egress, which is environment-specific.

Depends on #42 (harness image fix) for a runnable default image.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <paolo.dettori@example.com>
- lib.sh/smoke.sh: add Route mode (export KSVC_URL) so the Kind smoke and
  experiment drivers run against the OpenShift Route - no Kourier port-forward,
  no Host header, -k for the router cert. Kind behavior unchanged.
- setup-ocp.sh: add --with-keda (+ --keda-channel) to install the Red Hat
  Custom Metrics Autoscaler Operator (openshift-keda Subscription +
  KedaController CR) for the async-leaf ScaledJob path; default stays --skip-keda.
- SMOKE.md: document OpenShift smoke usage and --with-keda.

Verified on OpenShift 4.20.8:
- KSVC_URL=<route> ./smoke.sh -> 4/6 claims PASS (health, scale-to-zero,
  scale-up, 404; Redis session recall confirmed via a matching sessionId across
  a cold start). The 2 failing claims assert on the LLM /turn response, which
  needs cluster egress to the Anthropic gateway (environment-specific).
- setup-ocp.sh --with-keda -> CMA operator installed, KedaController
  "Installation Succeeded", all openshift-keda deployments Available, scaledjobs
  CRD present.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <paolo.dettori@example.com>
@pdettori pdettori force-pushed the feat/setup-ocp-41 branch from 3ea4696 to dcff251 on July 1, 2026 17:08
Add deploy/knative/README-ocp.md: prerequisites, setup-ocp.sh quick start +
options, what it installs, Route access, smoke-on-OCP, --with-keda, storage/SCC
notes, image delivery (GHCR / in-cluster build), troubleshooting and cleanup.

Slim the SMOKE.md OpenShift section to the smoke-on-OCP usage plus a pointer to
the guide, and link the guide from README.md (Quick Start note, table of
contents, Documentation).

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <paolo.dettori@example.com>
pdettori commented Jul 1, 2026

Update: OpenShift install guide (commit ce3a9a6)

Added deploy/knative/README-ocp.md — a full OpenShift install guide: prerequisites, setup-ocp.sh quick start + flag reference, what it installs, Route access, smoke-on-OCP, --with-keda, storage/SCC notes, image delivery (GHCR or in-cluster build), troubleshooting, and cleanup.

Also: slimmed the SMOKE.md OpenShift section to the smoke usage + a pointer, and linked the guide from README.md (Quick Start note, ToC, Documentation).

Assisted-By: Claude Code

@pdettori pdettori left a comment

(Posted as COMMENT, not APPROVE — GitHub disallows approving your own PR. Findings are approve-level: no blocking issues.)

Adds setup-ocp.sh, the OpenShift-native sibling of setup-kind.sh (OCP 4.20+): OpenShift Serverless operator + KnativeServing CR, pre-baked sandbox image, nonroot-v2 SCC grant, shared manifests via an overlays/ocp kustomize overlay, plus Route-mode support in the smoke drivers (KSVC_URL) and docs. Clean shell (set -euo pipefail, dry-run, idempotent, secrets never echoed), correct set -u-safe empty-array curl expansion, and the OpenShift arbitrary-UID pattern (chgrp 0 / g=u / USER 65532) is right.

Verified the non-default --namespace path holds: every base YAML embeds namespace: default (redis, sandbox, PVC, service, SA, Role, RoleBinding subject) and redis://redis.default.svc:6379, so both render_overlay sed rewrites catch them all. #42 is merged, so this diff is clean against main.
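The two rewrites mentioned here might look something like the following. This is a guess at the shape only: the real render_overlay in setup-ocp.sh may differ, and `rewrite_namespace` is a hypothetical name. It assumes the namespace is a plain DNS label, so it contains no characters that are special to sed.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the two sed rewrites; reads YAML on stdin.
rewrite_namespace() {
  local ns="$1"
  # 1) "namespace: default" lines (any indentation) -> target namespace
  # 2) the Redis service DNS name embedded in URLs  -> target namespace
  sed -e "s/^\([[:space:]]*namespace:\) default\$/\1 ${ns}/" \
      -e "s/redis\.default\.svc/redis.${ns}.svc/g"
}

# Usage (needs cluster access):
#   oc kustomize deploy/knative/overlays/ocp | rewrite_namespace demo | oc apply -f -
```

Anchoring the first pattern to whole `namespace: default` lines avoids touching unrelated occurrences of the word "default".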

Only two optional nits below — nothing blocking.


Author: pdettori (MEMBER — maintainer) - Areas: CI, Shell, Dockerfile, Kustomize, Docs - Commits: 3, all DCO-signed - CI: 9/9 passing

Assisted-By: Claude Code

Comment thread .github/workflows/build.yaml Outdated
# Built the same way as the harness so OCP can pull it from GHCR instead
# of building it in-cluster (issue #41).
- image: ghcr.io/kagenti/serverless-harness-sandbox
  context: .

nit: the sandbox image builds with context: . (whole repo), but sandbox.Dockerfile has no COPY — a scoped context (e.g. deploy/knative) would ship a smaller build context. Purely cosmetic; the built image is identical either way.

Done in 81bf425 — the sandbox build now uses context: deploy/knative (file stays deploy/knative/sandbox.Dockerfile, which build-push-action resolves from the workspace root). Verified locally that the image builds identically with the scoped context. Thanks!

# (oc new-build --binary --strategy=docker). The harness routes agent tool
# execution into this pod via `kubectl exec`, so it needs bash + GNU coreutils,
# findutils and grep on PATH.
FROM alpine:3.20

nit: FROM alpine:3.20 is tag-pinned, not digest-pinned. hadolint is green and this matches the repo's existing convention, so optional — a @sha256:… digest would harden reproducibility if you later want a stricter supply-chain policy.

Leaving this as tag-pinned for now, for two reasons: (1) consistency — the harness Dockerfile pins node:22-alpine by tag too, and hadolint is green, so digest-pinning just the sandbox would be a lopsided/false-hardening signal; (2) a correct multi-arch pin needs the manifest-list (index) digest, and build.yaml runs on main-push only (not PRs), so I can't CI-verify a digest change on this PR before merge. Happy to do digest-pinning as a dedicated, repo-wide supply-chain PR (both images, index digests) if you want to adopt that policy — just say the word.

Address review nit: sandbox.Dockerfile has no COPY, so build the sandbox image
with context deploy/knative instead of the whole repo — smaller build context,
identical image.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Paolo Dettori <paolo.dettori@example.com>
@pdettori pdettori merged commit 2a670a2 into main Jul 1, 2026
9 checks passed
@pdettori pdettori deleted the feat/setup-ocp-41 branch July 1, 2026 19:32
Development

Successfully merging this pull request may close these issues.

deploy: Add OpenShift setup script (setup-ocp.sh) — Knative stack only supports Kind today
