oc debug: Propagate exit code#2253
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdd helpers to determine container exit codes and convert non-zero exits into Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant KubeAPI
participant Pod
User->>CLI: run "oc debug ... -- /cmd"
CLI->>KubeAPI: create debug pod / attach to pod
CLI->>Pod: attach and start command
Pod->>KubeAPI: update pod status (container terminated)
CLI->>KubeAPI: re-fetch pod status
KubeAPI-->>CLI: return fresh pod status
CLI->>CLI: containerExitCode(podStatus, containerName)
alt authoritative non-zero exit code
CLI->>User: return utilexec.CodeExitError with exit status
else authoritative zero exit code
CLI->>User: return nil
else pod deleted or no authoritative status
CLI->>User: return error ("debug pod was deleted before completion") or nil per logic
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 8 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (8 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: tchap The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/cli/debug/debug.go`:
- Around line 652-664: The current branch handling conditions.ErrNonZeroExitCode
blindly returns utilexec.CodeExitError{Code: 1} even when a different container
(not the debug container) failed; change it to determine the exit code of the
selected debug container (using exitCodeError and the pod from
containerRunningEvent.Object.(*corev1.Pod)) and only return a non-zero
CodeExitError when that specific container's exit code is non-zero. Keep calling
o.getLogs(pod) as before, then fetch the pod object, call
exitCodeError(resultPod, o.ContainerName) and if it returns a non-nil exitErr
return that (or construct a CodeExitError with the actual code); if the debug
container exited 0, do not return Code:1—return nil instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 2b5cf4ba-5fae-4998-a583-2dbe2fa1def3
📒 Files selected for processing (3)
pkg/cli/debug/debug.gopkg/cli/debug/debug_test.gotest/e2e/cli.go
d44add8 to
dbf04d7
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/cli/debug/debug.go (1)
667-676:⚠️ Potential issue | 🟠 MajorDebug container exit code 0 incorrectly returns error when sibling container fails.
When
conditions.ErrNonZeroExitCodeis triggered by a sibling container (not the debug container), but the debug container itself exited with code 0, the code still falls through to returnCodeExitError{Code: 1}. This violates the "propagate the debug container's exit code" contract.The issue occurs because
exitCodeErrorreturnsnilfor bothcode == 0(success) andcode == -1(not found/not terminated), and both cases fall through to theCode: 1return.🐛 Proposed fix to distinguish exit code 0 from unavailable
if freshPod != nil { + code := containerExitCode(freshPod, o.ContainerName) + if code == 0 { + return nil + } if exitErr := exitCodeError(freshPod, o.ContainerName); exitErr != nil { return exitErr } } // We know the exit code was non-zero but couldn't determine the actual value. return utilexec.CodeExitError{🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/cli/debug/debug.go` around lines 667 - 676, The current logic treats both a debug-container exit code of 0 and an unknown exit code the same because exitCodeError returns only nil in both cases; change the handling so you retrieve the actual exit code (refactor exitCodeError or add getExitCode) for freshPod/o.ContainerName: if the debug container's code == 0 return nil, if code > 0 return utilexec.CodeExitError with Code set to that code and Err describing the non-zero exit, and only if the code is unavailable (-1) return the fallback utilexec.CodeExitError{Code: 1}; ensure you call the updated function from where freshPod != nil and use o.ContainerName to identify the container.
🧹 Nitpick comments (1)
pkg/cli/debug/debug.go (1)
1297-1309: Consider returning pointer type for consistency with codebase patterns.The function returns
utilexec.CodeExitErroras a value type, but other places in the codebase (e.g.,pkg/cli/admin/nodeimage/monitor.go) return*utilexec.CodeExitError. While both satisfy theerrorinterface, direct type assertions likeerr.(*exec.CodeExitError)(used inmustgather.go) would fail on a value type.Using
errors.Aswould work for both, but returning a pointer is more consistent with existing patterns.♻️ Suggested change for consistency
func exitCodeError(pod *corev1.Pod, containerName string) error { code := containerExitCode(pod, containerName) if code <= 0 { return nil } - return utilexec.CodeExitError{ + return &utilexec.CodeExitError{ Err: fmt.Errorf("the debug container terminated with exit code %d", code), Code: int(code), } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/cli/debug/debug.go` around lines 1297 - 1309, The exitCodeError function currently returns a value-typed utilexec.CodeExitError which breaks direct type assertions elsewhere; change exitCodeError (and its return site) to return a pointer by constructing and returning &utilexec.CodeExitError{Err: fmt.Errorf(...), Code: int(code)} when code > 0 (leave nil returns unchanged) so callers using a direct type assertion like err.(*utilexec.CodeExitError) succeed; locate the function exitCodeError and the helper containerExitCode to apply this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/cli/debug/debug.go`:
- Around line 667-676: The current logic treats both a debug-container exit code
of 0 and an unknown exit code the same because exitCodeError returns only nil in
both cases; change the handling so you retrieve the actual exit code (refactor
exitCodeError or add getExitCode) for freshPod/o.ContainerName: if the debug
container's code == 0 return nil, if code > 0 return utilexec.CodeExitError with
Code set to that code and Err describing the non-zero exit, and only if the code
is unavailable (-1) return the fallback utilexec.CodeExitError{Code: 1}; ensure
you call the updated function from where freshPod != nil and use o.ContainerName
to identify the container.
---
Nitpick comments:
In `@pkg/cli/debug/debug.go`:
- Around line 1297-1309: The exitCodeError function currently returns a
value-typed utilexec.CodeExitError which breaks direct type assertions
elsewhere; change exitCodeError (and its return site) to return a pointer by
constructing and returning &utilexec.CodeExitError{Err: fmt.Errorf(...), Code:
int(code)} when code > 0 (leave nil returns unchanged) so callers using a direct
type assertion like err.(*utilexec.CodeExitError) succeed; locate the function
exitCodeError and the helper containerExitCode to apply this change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 5b853a14-4202-4085-adb7-42968bf13da6
📒 Files selected for processing (3)
pkg/cli/debug/debug.gopkg/cli/debug/debug_test.gotest/e2e/cli.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/cli/debug/debug_test.go
- test/e2e/cli.go
dbf04d7 to
1700148
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/cli.go`:
- Around line 1091-1103: The test pulls images from quay.io (the deployment
create call and debugImage) but only calls skipIfMicroShift(oc); add a
disconnected-cluster guard before those pulls to avoid failures in air-gapped
environments—e.g., invoke the existing helper that skips tests when external
pulls are disallowed (call skipIfDisconnected(oc) or the project's equivalent)
immediately before the oc.SetupProject()/oc.Run("create") block so the
oc.Run("create").Args(...) and debugImage usage are skipped on disconnected
clusters.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: b8e4bc07-d40c-45c9-ac6c-8c3fa63ba661
📒 Files selected for processing (3)
pkg/cli/debug/debug.gopkg/cli/debug/debug_test.gotest/e2e/cli.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/cli/debug/debug.go
- pkg/cli/debug/debug_test.go
1700148 to
b3aaa8e
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/cli.go`:
- Around line 1139-1145: The current assertion only checks that runErr occurred
and can hide unrelated failures; change the test after running cmd to first
assert the error is an exec.ExitError (e.g.
BeAssignableToTypeOf(&exec.ExitError{})) and then extract the exit code from the
error's ProcessState (or type-assert to *exec.ExitError) and assert the exit
code is non-zero using o.Expect(...).NotTo(o.Equal(0)); update the assertions
around the existing cmd/runErr variables and keep using o.Expect so the test
explicitly verifies an ExitError and a non-zero exit code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 04944cfd-0341-4336-a231-28339c864b44
📒 Files selected for processing (3)
pkg/cli/debug/debug.gopkg/cli/debug/debug_test.gotest/e2e/cli.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/cli/debug/debug.go
- pkg/cli/debug/debug_test.go
b3aaa8e to
090ffa0
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/cli/debug/debug.go (1)
654-676:⚠️ Potential issue | 🟠 MajorAvoid forcing exit code
1when the selected debug container exited0.In this branch,
exitCodeError()returning nil is ambiguous (0or unknown). The unconditional fallback toCode: 1can still misreport failure when only another container failed.💡 Proposed fix
if freshPod != nil { - if exitErr := exitCodeError(freshPod, o.ContainerName); exitErr != nil { - return exitErr - } + switch code := containerExitCode(freshPod, o.ContainerName); { + case code > 0: + return exitCodeError(freshPod, o.ContainerName) + case code == 0: + return nil + } } // We know the exit code was non-zero but couldn't determine the actual value. return utilexec.CodeExitError{ Err: fmt.Errorf("the debug container terminated with a non-zero exit code"), Code: 1, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/cli/debug/debug.go` around lines 654 - 676, The code always returns utilexec.CodeExitError with Code: 1 when we can't determine a non-zero exit code, which can misreport success if the selected debug container actually exited 0; update the logic after obtaining freshPod (and when exitCodeError(freshPod, o.ContainerName) returns nil) to explicitly inspect the container status for o.ContainerName on freshPod (or use containerRunningEvent.Object when freshPod is nil) and only return a non-zero Code when that container status shows a non-zero exit code or is terminated with a non-zero exit; if the debug container exited 0, return nil (no error) instead of forcing Code: 1, and keep using exitCodeError(), freshPod, containerRunningEvent and utilexec.CodeExitError as the referenced symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/cli/debug/debug.go`:
- Around line 654-676: The code always returns utilexec.CodeExitError with Code:
1 when we can't determine a non-zero exit code, which can misreport success if
the selected debug container actually exited 0; update the logic after obtaining
freshPod (and when exitCodeError(freshPod, o.ContainerName) returns nil) to
explicitly inspect the container status for o.ContainerName on freshPod (or use
containerRunningEvent.Object when freshPod is nil) and only return a non-zero
Code when that container status shows a non-zero exit code or is terminated with
a non-zero exit; if the debug container exited 0, return nil (no error) instead
of forcing Code: 1, and keep using exitCodeError(), freshPod,
containerRunningEvent and utilexec.CodeExitError as the referenced symbols.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 898d7dd3-47c0-4469-85f5-43e724164d79
📒 Files selected for processing (3)
pkg/cli/debug/debug.gopkg/cli/debug/debug_test.gotest/e2e/cli.go
✅ Files skipped from review due to trivial changes (1)
- pkg/cli/debug/debug_test.go
090ffa0 to
7e02b20
Compare
This is often created by Claude as a temp dir.
7e02b20 to
a00a8f4
Compare
702eb82 to
428ff7f
Compare
Make oc debug propagate the debug container exit code. This is not happening now and `oc debug` always exits with 0.
428ff7f to
ef9cf52
Compare
|
/retitle oc debug: Propagate exit code |
| o.Expect(err).NotTo(o.HaveOccurred()) | ||
| }) | ||
|
|
||
| g.It("Author:okupka-ROSA-OSD_CCS-ARO-ConnectedOnly-High-debug-exit-code-oc debug should propagate container exit code", oteginkgo.Informing(), func() { |
There was a problem hiding this comment.
No idea how this naming works, this was spat out by Claude.
|
/retest |
|
@tchap: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Make
oc debugpropagate the debug container exit code.This is not happening now and
oc debugalways exits with 0.Summary by CodeRabbit
oc debugnow reliably detects and propagates container exit codes (including fast-failing and non-zero exits), re-checks pod status to avoid stale results, surfaces attach failures directly, and returns an error if a debug pod is deleted before completion.