fix: adapt code interpreter warm pool to agent-sandbox v0.4.6 #387

Open
ranxi2001 wants to merge 2 commits into volcano-sh:main from ranxi2001:feat/agent-sandbox-latest

@ranxi2001 ranxi2001 commented Jun 17, 2026

Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds compatibility with the current stable sigs.k8s.io/agent-sandbox Go module release, v0.4.6, for AgentCube's CodeInterpreter warm pool integration.

AgentCube's existing CodeInterpreter warm pool path was built against agent-sandbox v0.1.1. Since then, agent-sandbox has changed both its public API surface and its warm-pool adoption behavior. Supporting the current stable release requires AgentCube to handle these compatibility changes:

  • SandboxPodNameAnnotation moved out of the internal controllers package and is now exposed from the public sandbox API package;
  • warm pool adoption changed from the old direct/bare Pod shape to SandboxWarmPool -> Sandbox -> Pod, and SandboxClaim reports the serving Sandbox through status;
  • after the compile-level API update, runtime creation can still time out unless AgentCube waits/probes by the adopted Sandbox name rather than the claim/template Sandbox name;
  • agent-sandbox v0.4.6 defaults SandboxTemplate.networkPolicyManagement to managed mode, and the resulting managed NetworkPolicy blocks AgentCube's current Router / WorkloadManager data path unless AgentCube opts out or provides matching allow rules.

This PR:

  • bumps sigs.k8s.io/agent-sandbox to v0.4.6 and aligns the Kubernetes / controller-runtime dependency stack;
  • uses the public sandboxv1alpha1.SandboxPodNameAnnotation constant instead of importing an internal agent-sandbox controller package;
  • waits for SandboxClaim.Status.SandboxStatus.Name, fetches the adopted Sandbox, waits until it is Ready, and uses that Sandbox for Pod IP / entrypoint probing;
  • keeps the SandboxClaim name in the AgentCube session store when the request kind is SandboxClaim, so delete / GC still operate on the claim resource;
  • sets CodeInterpreter-created SandboxTemplate objects to networkPolicyManagement: Unmanaged to preserve the existing AgentCube Router / WorkloadManager traffic path;
  • updates warm pool e2e discovery to support both the old direct Pod ownership shape and the newer SandboxWarmPool -> Sandbox -> Pod ownership shape;
  • regenerates the CRD and client-go code with the matching Kubernetes v0.35.4 generator stack;
  • updates hack/update-codegen.sh so code generation uses k8s.io/code-generator v0.35.4 without mutating project dependencies through go get -d;
  • adds review-feedback fixes for watcher edge cases, context cancellation, lint findings, and warm-pool pod ownership matching.
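
To illustrate the split between the name kept in the session store and the name used for probing, here is a small sketch under stated assumptions. `sessionTarget` and `resolveSessionTarget` are hypothetical names that do not appear in AgentCube; only the kinds `Sandbox` and `SandboxClaim` come from the description above.

```go
package main

import "fmt"

const (
	kindSandbox      = "Sandbox"
	kindSandboxClaim = "SandboxClaim"
)

// sessionTarget is a hypothetical record separating the resource that
// delete / GC act on from the Sandbox that serves traffic.
type sessionTarget struct {
	StoredKind   string // kind recorded in the session store
	StoredName   string // name recorded in the session store
	ProbeSandbox string // Sandbox used for Pod IP / entrypoint probing
}

// resolveSessionTarget keeps the claim identity for claims and falls back
// to the Sandbox's own name for directly created Sandboxes.
func resolveSessionTarget(requestKind, requestName, adoptedSandbox string) (sessionTarget, error) {
	switch requestKind {
	case kindSandboxClaim:
		if adoptedSandbox == "" {
			return sessionTarget{}, fmt.Errorf("claim %q has no adopted Sandbox yet", requestName)
		}
		return sessionTarget{StoredKind: kindSandboxClaim, StoredName: requestName, ProbeSandbox: adoptedSandbox}, nil
	case kindSandbox:
		return sessionTarget{StoredKind: kindSandbox, StoredName: requestName, ProbeSandbox: requestName}, nil
	default:
		return sessionTarget{}, fmt.Errorf("unsupported kind %q", requestKind)
	}
}

func main() {
	target, err := resolveSessionTarget(kindSandboxClaim, "my-claim", "warm-pool-sandbox-abc")
	fmt.Println(target.StoredName, target.ProbeSandbox, err)
}
```

Keeping the two names separate means deleting the session removes the claim, while readiness checks still follow the Sandbox that the warm pool actually handed over.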

Which issue(s) this PR fixes:

Refs #386

Special notes for your reviewer:

  • Target version: agent-sandbox v0.4.6, which is the current Go module @latest. I did not include v0.5.0rc1 in this PR because it is not listed as a canonical Go module release and, more importantly, it has already moved the APIs AgentCube uses from v1alpha1 to v1beta1 (SandboxClaimSpec.TemplateRef -> required WarmPoolRef, SandboxSpec.Replicas -> OperatingMode). I will track the v0.5.x / v1beta1 migration as a separate follow-up because it changes warm-pool claim semantics rather than being a small patch on top of the current v0.4.6 adaptation.
  • Dependency impact: this PR updates the agent-sandbox / Kubernetes / controller-runtime module stack required for the compatibility work.
  • NetworkPolicy compatibility: networkPolicyManagement: Unmanaged keeps the behavior AgentCube had with agent-sandbox v0.1.1. If reviewers prefer to keep agent-sandbox managed NetworkPolicies, the alternative is to add explicit allow rules for agentcube-router and workloadmanager.
  • Generated code: CRD manifest and client-go changes are generated from the dependency / Kubernetes OpenAPI update; this PR does not intentionally change the AgentRuntime API surface. hack/update-codegen.sh was also aligned to the new Kubernetes minor version because the old code-generator v0.34.1 path downgraded agent-sandbox during generation.
  • AI assistance: I used Codex to inspect the agent-sandbox API changes, implement focused tests, run local / k3s / fork CI validation, and prepare this PR description. I reviewed and validated the changes.
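
As a rough sketch of the NetworkPolicy note above, the opt-out amounts to forcing one field on every template the controller creates or finds. The types below are hypothetical stand-ins rather than the agent-sandbox API; only the field name `networkPolicyManagement` and the value `Unmanaged` are taken from this description.

```go
package main

import "fmt"

// policyUnmanaged mirrors the value named in this PR description.
const policyUnmanaged = "Unmanaged"

// templateSpec is a hypothetical, trimmed-down SandboxTemplate spec.
// An empty NetworkPolicyManagement means "use the upstream default".
type templateSpec struct {
	NetworkPolicyManagement string
}

// ensureUnmanaged sets the field to Unmanaged and reports whether the
// spec changed, so a caller knows if an existing object needs an update.
func ensureUnmanaged(spec *templateSpec) bool {
	if spec.NetworkPolicyManagement == policyUnmanaged {
		return false
	}
	spec.NetworkPolicyManagement = policyUnmanaged
	return true
}

func main() {
	existing := templateSpec{} // created before the upgrade, field unset
	fmt.Println(ensureUnmanaged(&existing), existing.NetworkPolicyManagement)
	fmt.Println(ensureUnmanaged(&existing)) // second call is a no-op
}
```

The alternative mentioned above, keeping managed policies and adding allow rules for agentcube-router and workloadmanager, would leave this field alone and change the NetworkPolicy objects instead.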

Tests run:

  • go test ./pkg/workloadmanager -count=1
  • make lint
  • go test -race ./pkg/workloadmanager -count=1
  • go test -race -v -coverprofile=coverage.out -coverpkg=./pkg/... ./pkg/...
  • go list ./... | grep -v '^github.com/volcano-sh/agentcube/test/e2e$' | xargs go test -count=1
  • make gen-check
  • go test ./test/e2e -run '^$' -count=1
  • make build-all
  • fork CI validation PR ranxi2001/agentcube#4 ("test: validate PR 387 rebase on Go 1.26.4"):
    • Codespell: success
    • Python SDK Tests: success
    • Python Lint: success
    • Copyright Check: success
    • Codegen Check: success
    • Agentcube CI Workflow: success
    • Lint / golangci-lint: success
    • Test Coverage: success
    • Agentcube E2E Tests: success
  • k3s: go test ./test/e2e -run 'TestCodeInterpreter(BasicInvocation|FileOperations)$' -count=1 passed
  • k3s: go test ./test/e2e -run 'TestCodeInterpreterWarmPool$' -count=1 passed
  • k3s: go test ./test/e2e -run 'TestCodeInterpreterWarmPoolLoad$' -count=1 passed with 100 / 100 successful requests
  • Python e2e with a Python 3.11 venv:
    • test/e2e/test_codeinterpreter.py: 3 tests OK
    • test/e2e/test_langchain_agentcube_sandbox.py: 4 tests OK
    • test/e2e/test_mcp_code_interpreter.py: 5 tests OK
    • test/e2e/test_mcp_code_interpreter_stdio.py: 1 test OK
  • math-agent live LLM e2e with an OpenAI-compatible /v1 endpoint returned the expected final answer 42

Does this PR introduce a user-facing change?:

NONE

Copilot AI review requested due to automatic review settings June 17, 2026 12:32
@volcano-sh-bot volcano-sh-bot added the kind/bug Something isn't working label Jun 17, 2026
@volcano-sh-bot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates Kubernetes/agent-sandbox dependencies and adjusts workload manager + e2e logic to support newer warm-pool controller ownership patterns and SandboxClaim adoption, while also refreshing build/codegen tooling.

Changes:

  • Refactor warm-pool pod counting/ready checks to support both direct Pod ownership and SandboxWarmPool → Sandbox → Pod ownership.
  • Update sandbox creation flow to handle SandboxClaim adoption by polling claim/sandbox readiness and storing claim identity correctly.
  • Bump Kubernetes/tooling versions, regenerate CRDs, and update Docker build images.
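
The first change above, matching both ownership shapes, can be sketched as a two-pass lookup. The types here are simplified, hypothetical stand-ins for Kubernetes objects and their owner references. The sketch matches on kind and name only; real code may also need to compare namespaces or UIDs.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	kindSandboxWarmPool = "SandboxWarmPool"
	kindSandbox         = "Sandbox"
)

// ownerRef and object are hypothetical, minimal stand-ins for
// metav1.OwnerReference and a namespaced Kubernetes object.
type ownerRef struct{ Kind, Name string }

type object struct {
	Name   string
	Owners []ownerRef
}

// warmPoolPodNames returns Pods owned by the pool directly (old shape) or
// through a Sandbox that the pool owns (SandboxWarmPool -> Sandbox -> Pod).
func warmPoolPodNames(pool string, sandboxes, pods []object) []string {
	// Pass 1: collect Sandboxes owned by the warm pool.
	poolSandboxes := make(map[string]struct{}, len(sandboxes))
	for _, sb := range sandboxes {
		for _, owner := range sb.Owners {
			if owner.Kind == kindSandboxWarmPool && owner.Name == pool {
				poolSandboxes[sb.Name] = struct{}{}
				break
			}
		}
	}
	// Pass 2: accept Pods owned by the pool or by one of those Sandboxes.
	var names []string
	for _, pod := range pods {
		for _, owner := range pod.Owners {
			_, viaSandbox := poolSandboxes[owner.Name]
			if (owner.Kind == kindSandboxWarmPool && owner.Name == pool) ||
				(owner.Kind == kindSandbox && viaSandbox) {
				names = append(names, pod.Name)
				break
			}
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	sandboxes := []object{
		{Name: "sb-1", Owners: []ownerRef{{kindSandboxWarmPool, "ci"}}},
		{Name: "sb-2", Owners: []ownerRef{{kindSandboxWarmPool, "other"}}},
	}
	pods := []object{
		{Name: "pod-direct", Owners: []ownerRef{{kindSandboxWarmPool, "ci"}}},
		{Name: "pod-via-sandbox", Owners: []ownerRef{{kindSandbox, "sb-1"}}},
		{Name: "pod-foreign", Owners: []ownerRef{{kindSandbox, "sb-2"}}},
	}
	fmt.Println(warmPoolPodNames("ci", sandboxes, pods))
}
```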

Reviewed changes

Copilot reviewed 13 out of 18 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| test/e2e/e2e_test.go | Refactors warm-pool pod discovery to handle new/old ownership chains and reuses shared listing logic. |
| pkg/workloadmanager/sandbox_helper.go | Adds CreatedAt to placeholder sandbox store entries. |
| pkg/workloadmanager/k8s_client.go | Adds dynamic-client getters for Sandbox and SandboxClaim. |
| pkg/workloadmanager/handlers_test.go | Updates annotation constant usage and adds coverage for SandboxClaim-adoption behavior. |
| pkg/workloadmanager/handlers.go | Splits “wait for ready” paths for direct sandboxes vs claim-adopted sandboxes; fixes stored identity for claims. |
| pkg/workloadmanager/codeinterpreter_controller_test.go | Adds tests ensuring SandboxTemplate network policy management is set to unmanaged. |
| pkg/workloadmanager/codeinterpreter_controller.go | Forces SandboxTemplate NetworkPolicyManagement to unmanaged and updates existing templates accordingly. |
| manifests/charts/base/crds/runtime.agentcube.volcano.sh_agentruntimes.yaml | Regenerated CRD schema changes (likely from dependency/codegen bump). |
| hack/update-codegen.sh | Updates code-generator version and changes module discovery approach. |
| go.mod | Updates Go version and bumps k8s/controller-runtime/agent-sandbox deps. |
| go.sum | Updates module sums for the dependency/tooling bumps. |
| docker/Dockerfile.router | Updates Go builder image version. |
| docker/Dockerfile.picod | Updates Go builder image and hardens apt install layer. |
| docker/Dockerfile | Updates Go builder image version. |
Files not reviewed (4)
  • client-go/clientset/versioned/fake/clientset_generated.go: Generated file
  • client-go/informers/externalversions/factory.go: Generated file
  • client-go/informers/externalversions/runtime/v1alpha1/agentruntime.go: Generated file
  • client-go/informers/externalversions/runtime/v1alpha1/codeinterpreter.go: Generated file
Comments suppressed due to low confidence (1)

pkg/workloadmanager/sandbox_helper.go:60

  • When CreationTimestamp is zero, createdAt is set from time.Now(), but expiresAt uses a separate time.Now() call. To keep timestamps consistent (and avoid tiny negative/odd deltas in tests/metrics), compute the default expiry from createdAt (e.g., createdAt.Add(DefaultSandboxTTL)).
	createdAt := sandboxCR.GetCreationTimestamp().Time
	if createdAt.IsZero() {
		createdAt = time.Now()
	}
	var expiresAt time.Time
	if sandboxCR.Spec.Lifecycle.ShutdownTime != nil {
		expiresAt = sandboxCR.Spec.Lifecycle.ShutdownTime.Time
	} else {
		expiresAt = time.Now().Add(DefaultSandboxTTL)
	}

		return nil
	}

	func (s *Server) waitForDirectSandboxReady(ctx context.Context, sandbox *sandboxv1alpha1.Sandbox, resultChan <-chan SandboxStatusUpdate) (*sandboxv1alpha1.Sandbox, error) {

Comment thread pkg/workloadmanager/handlers.go Outdated
Comment on lines +206 to +207
	select {
	case result := <-resultChan:

Comment on lines +296 to +299
	// if warmpool is used, the pod name is stored in sandbox's annotation `agents.x-k8s.io/pod-name`
	sandboxNameForPod := createdSandbox.Name
	sandboxPodName := createdSandbox.Name
	if podName, exists := createdSandbox.Annotations[sandboxv1alpha1.SandboxPodNameAnnotation]; exists {

Comment thread test/e2e/e2e_test.go Outdated
Comment on lines +1220 to +1221
	warmPoolSandboxes := make(map[string]struct{}, len(sandboxList.Items))
	for _, sandbox := range sandboxList.Items {

Comment thread test/e2e/e2e_test.go Outdated
Comment on lines +1225 to +1227
	for _, owner := range sandbox.OwnerReferences {
		if owner.Kind == ownerKindSandboxWarmPool && owner.Name == codeInterpreterName {
			warmPoolSandboxes[sandbox.Name] = struct{}{}

Comment thread test/e2e/e2e_test.go Outdated
Comment on lines +1245 to +1247
	_, ownedByWarmPoolSandbox := warmPoolSandboxes[owner.Name]
	if (owner.Kind == ownerKindSandboxWarmPool && owner.Name == codeInterpreterName) ||
		(owner.Kind == "Sandbox" && ownedByWarmPoolSandbox) {

Comment on lines +111 to +119
	func (f *recordingStore) UpdateSandbox(ctx context.Context, sandbox *types.SandboxInfo) error {
		f.fakeStore.UpdateSandbox(ctx, sandbox)
		if f.updateErr != nil {
			return f.updateErr
		}
		copied := *sandbox
		f.lastUpdated = &copied
		return nil
	}

Signed-off-by: ranxi2001 <ranxi169@163.com>
Signed-off-by: ranxi2001 <ranxi169@163.com>
@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 72.22222% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.83%. Comparing base (524e55e) to head (5867183).
⚠️ Report is 127 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/workloadmanager/handlers.go | 72.44% | 24 Missing and 3 partials ⚠️ |
| pkg/workloadmanager/k8s_client.go | 50.00% | 4 Missing and 4 partials ⚠️ |
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #387       +/-   ##
===========================================
+ Coverage   47.57%   58.83%   +11.26%     
===========================================
  Files          30       34        +4     
  Lines        2819     3267      +448     
===========================================
+ Hits         1341     1922      +581     
+ Misses       1338     1147      -191     
- Partials      140      198       +58     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 58.83% <72.22%> (+11.26%) ⬆️ |

@zhzhuang-zju

Hi @ranxi2001, thanks for your work.

First, this PR looks more like a feature than a bug fix. Please confirm and update the PR label if needed.

> I did not target v0.5.0rc1 because Go resolves that tag to a pseudo-version rather than the stable latest release.

Could you elaborate on this a bit more? Specifically, at which step would this become a problem? agent-sandbox also introduced substantial changes in v0.5.0rc1. So once agent-sandbox v0.5.0 is officially released, we may need to do another substantial round of adaptation work.

@ranxi2001

Contributor Author

/remove-kind bug
/kind feature
