Conversation
Cubic review finding (sdk/src/beta9/inference.py:307): Batch embed() results were being assigned to the singular `embedding` field while `embeddings` (the documented plural list-of-lists) stayed empty. tests/test_cubic_fixes.py::test_batch_embeddings exercises this path and asserts `len(result.embeddings) == 3`. Fix: track whether caller passed a string (single) or a list (batch), and populate: - `embedding` = flat vector for single-input calls (back-compat) - `embeddings` = full list-of-lists for batch calls Co-Authored-By: Claudistrator <savetheplanet@agentosaurus.com>
Cubic review finding (pkg/gateway/inference_router.go:463): UnloadModel previously cleared the registry's LoadState to Idle no matter what the worker returned. Any 4xx/5xx response would leave the registry inconsistent with the worker's actual in-memory model set — a cache-coherency bug. Fix: - On non-200 response: log error with status + up to 4 KiB of body, return an error, and leave the model's LoadState unchanged. - Only on 200: drain body and transition to LoadStateIdle. Added table-driven TestUnloadModelNonOK + TestUnloadModelNodeNotFound in pkg/gateway/inference_router_test.go covering 200/500/404 paths and the missing-node early return. Co-Authored-By: Claudistrator <savetheplanet@agentosaurus.com>
…broke Cubic review finding (pkg/agent/control.go:227): handleInferencePull forwarded to Ollama's /api/pull, looped over the NDJSON body with a decoder, and returned StatusOK unconditionally. A non-200 from Ollama or a malformed stream both produced a false success response. Fix: - Check resp.StatusCode != 200 before streaming; non-OK now returns 502 with a 512-byte body preview from Ollama for diagnosis. - Track decoder errors; if decode fails (or the stream is empty) the handler returns 502 instead of pretending success. - Success case now returns 201 Created to distinguish from the pre-fix false-success 200. Inline P0-A hardening (required by spec, not yet on origin/main): - MaxBytesReader caps request body at 4 KiB. - isAllowedModelName() allowlist [A-Za-z0-9._/:-] + length <= 128 and blocks ".." path-traversal sequences before they reach Ollama. Added TestInferencePullStatus (200/404/500 upstream), TestInferencePullDecodeFailureReturnsBadGateway, TestInferencePullRejectsInvalidModelName, TestInferencePullMethodNotAllowed in pkg/agent/control_test.go. ControlServer gained an optional ollamaBaseURL field so httptest servers can stand in for the daemon. Co-Authored-By: Claudistrator <savetheplanet@agentosaurus.com>
Cubic review finding (pkg/agent/keepalive.go, Start ~L113 / Stop): Calling Stop() twice — or Start() after Stop() — closed an already closed channel and produced a "close of closed channel" panic. The bug is reachable during shutdown races and during retries after a transient failure. Fix: - Added `startOnce` + `stopOnce` sync.Once guards plus an atomic `started` flag. Start is a no-op after the first call; subsequent Stops are no-ops too. - Stop-before-Start is also safe: when started == 0 we return without waiting on doneCh (which would otherwise never close). Tests in pkg/agent/keepalive_test.go: - TestKeepaliveLoopIdempotent — Start, Stop, Stop, Start, Stop - TestKeepaliveLoopConcurrentStop — 32 goroutines stopping in parallel - TestKeepaliveLoopStopBeforeStart — Stop never blocks pre-Start Co-Authored-By: Claudistrator <savetheplanet@agentosaurus.com>
Cubic review finding (manifests/k3d/beta9.yaml around L361): The manifest's comment declared "using NodePort instead" but the only gateway Service was ClusterIP on 1994/1993/9090. Tailscale peers and external workers hitting :31994/:31993 on the OCI node saw connection refused because no NodePort Service ever materialised — matches empty output of `ss -tlnp | grep 31994` on the live host. Fix: add `Service/gateway-nodeport` (type: NodePort) alongside the existing ClusterIP Service, with: - nodePort: 31994 -> targetPort 1994 (HTTP) - nodePort: 31993 -> targetPort 1993 (gRPC) - selector: app=gateway Kept the ClusterIP Service unchanged for in-cluster traffic. Validated with `python3 -c "import yaml; yaml.safe_load_all(...)"`; kubectl dry-run not available on the build host but the Service matches the Kubernetes v1 Service schema used elsewhere in the repo. Co-Authored-By: Claudistrator <savetheplanet@agentosaurus.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Addresses 5 bugs Cubic AI flagged on the (now-closed) upstream PR beam-cloud#1562. All fixes land with tests; full Go + SDK suites pass (62/62 SDK, was 61/62).
Branch off
origin/main; one commit per fix for clean review.Fixes
e463ced4embeddingnow receives a single vector,embeddingsreceives the batch list-of-listssdk/src/beta9/inference.py55adb935UnloadModelstatus-checks worker response; non-200 logs error with ≤4 KiB body, returns error, leavesLoadStateunchangedpkg/gateway/inference_router.gocd7acd37/inference/pullstatus-checks Ollama before decoding; decode errors return 502; includesisAllowedModelName()allowlist (blocks..) +MaxBytesReader(4096)pkg/agent/control.go241e932aKeepaliveLoopStart/Stop guarded withsync.Once+ atomic started flag; Stop-before-Start no longer hangs; Stop is idempotent; concurrent Stop is safepkg/agent/keepalive.go14d47d1fService/gateway-nodeport(type NodePort, ports 31994 + 31993, selectorapp=gateway) alongside the existing ClusterIP Servicemanifests/k3d/beta9.yamlTests added / updated
sdk/tests/test_cubic_fixes.py::test_batch_embeddings— pre-existing, now passespkg/gateway/inference_router_test.go— new —TestUnloadModelNonOK(200/500/404 table) +TestUnloadModelNodeNotFoundpkg/agent/control_test.go— new —TestInferencePullStatus(200/404/500) +TestInferencePullDecodeFailureReturnsBadGateway+TestInferencePullRejectsInvalidModelName+TestInferencePullMethodNotAllowedpkg/agent/keepalive_test.go— new —TestKeepaliveLoopIdempotent+TestKeepaliveLoopConcurrentStop(32 goroutines) +TestKeepaliveLoopStopBeforeStartVerification
go build ./...— cleango test ./... -count=1 -timeout=300s— all packages PASSgo vet ./...— clean on touched files (pre-existing warnings inpkg/worker/worker.goandpkg/gateway/services/stub.gounchanged)cd sdk && python -m pytest tests/ -x --tb=short— 62 passed (was 61 passed / 1 failed before this PR)Notes
kubectl apply --dry-run=client -f manifests/k3d/beta9.yamlbefore apply.origin/main; fixes are by function identity, not line — the underlying bugs were unchanged./inference/pullallowlist and body cap are inlined here (not pulled fromsecurity-hardening-b9agentbranch) since this PR is based onorigin/mainfor a clean diff.Co-Authored-By: Claudistrator savetheplanet@agentosaurus.com