Skip to content

oc debug: Propagate exit code#2253

Open
tchap wants to merge 2 commits intoopenshift:mainfrom
tchap:oc-debug-exit-code
Open

oc debug: Propagate exit code#2253
tchap wants to merge 2 commits intoopenshift:mainfrom
tchap:oc-debug-exit-code

Conversation

@tchap
Copy link
Copy Markdown
Contributor

@tchap tchap commented Apr 14, 2026

Make oc debug propagate the debug container exit code.
This is not happening now and oc debug always exits with 0.

Summary by CodeRabbit

  • Bug Fixes
    • oc debug now reliably detects and propagates container exit codes (including fast-failing and non-zero exits), re-checks pod status to avoid stale results, surfaces attach failures directly, and returns an error if a debug pod is deleted before completion.
  • Tests
    • Added unit and end-to-end tests validating exit-code detection and propagation for successful, failing, explicitly non-zero, and non-existent commands.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 14, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Add helpers to determine container exit codes and convert non-zero exits into k8s.io/utils/exec exit errors; re-fetch pod status after attach/completion to avoid stale termination state, return log errors promptly, and surface authoritative exit codes (or an error if the debug pod was deleted).

Changes

Cohort / File(s) Summary
Exit Code Handling Implementation
pkg/cli/debug/debug.go
Add containerExitCode and exitCodeError helpers (using k8s.io/utils/exec). Change pod completion/attach flows to return log errors first, re-fetch pod status to obtain authoritative container termination info, and return utilexec.CodeExitError for non-zero exits or an explicit error if the debug pod was deleted.
Unit Tests
pkg/cli/debug/debug_test.go
Add TestContainerExitCode and TestExitCodeError table-driven tests covering terminated (0 and non-zero), running, missing, init, and empty-status cases; add k8s.io/utils/exec alias import (utilexec) for runtime type assertions.
End-to-End Tests
test/e2e/cli.go
Add e2e test that creates a deployment and runs oc debug with overridden image to assert propagation of container exit codes for success, failure, explicit non-zero exit (42), and non-existent commands; add errors import.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant KubeAPI
    participant Pod

    User->>CLI: run "oc debug ... -- /cmd"
    CLI->>KubeAPI: create debug pod / attach to pod
    CLI->>Pod: attach and start command
    Pod->>KubeAPI: update pod status (container terminated)
    CLI->>KubeAPI: re-fetch pod status
    KubeAPI-->>CLI: return fresh pod status
    CLI->>CLI: containerExitCode(podStatus, containerName)
    alt authoritative non-zero exit code
        CLI->>User: return utilexec.CodeExitError with exit status
    else authoritative zero exit code
        CLI->>User: return nil
    else pod deleted or no authoritative status
        CLI->>User: return error ("debug pod was deleted before completion") or nil per logic
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 8 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ❓ Inconclusive The tests added are standard Go testing.T tests, not Ginkgo tests, so Ginkgo-specific quality requirements do not apply. Clarify if the check should apply to standard Go tests or only to Ginkgo tests, and provide equivalent quality criteria if applicable.
✅ Passed checks (8 passed)
Check name Status Explanation
Stable And Deterministic Test Names ✅ Passed Test names in debug_test.go use static, descriptive strings like 'terminated container with exit code' and 'container is running' with no dynamic values that would change between runs.
Microshift Test Compatibility ✅ Passed The new e2e test is properly protected from running on MicroShift using skipIfMicroShift(oc) at the test body beginning.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The new e2e test does not make any multi-node or HA cluster assumptions. It creates only a single deployment with one replica and tests single-pod exit code propagation functionality compatible with SNO.
Topology-Aware Scheduling Compatibility ✅ Passed PR focuses on exit code propagation in CLI debug command without introducing topology-aware scheduling issues.
Ote Binary Stdout Contract ✅ Passed No process-level stdout writes detected. Changes focus on exit code handling, error propagation, and tests without logging setup modifications.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed The new e2e test properly includes both the ConnectedOnly label and skipIfDisconnected call, contains no IPv4 hardcoding, and uses standard POSIX utilities compatible with both IPv4 and IPv6.
Title check ✅ Passed The title 'oc debug: Propagate exit code' directly and clearly describes the main objective of the pull request—making oc debug return the debug container's actual exit code instead of always exiting with code 0.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from ardaguclu and ingvagabund April 14, 2026 12:15
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 14, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tchap
Once this PR has been reviewed and has the lgtm label, please assign atiratree for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/cli/debug/debug.go`:
- Around line 652-664: The current branch handling conditions.ErrNonZeroExitCode
blindly returns utilexec.CodeExitError{Code: 1} even when a different container
(not the debug container) failed; change it to determine the exit code of the
selected debug container (using exitCodeError and the pod from
containerRunningEvent.Object.(*corev1.Pod)) and only return a non-zero
CodeExitError when that specific container's exit code is non-zero. Keep calling
o.getLogs(pod) as before, then fetch the pod object, call
exitCodeError(resultPod, o.ContainerName) and if it returns a non-nil exitErr
return that (or construct a CodeExitError with the actual code); if the debug
container exited 0, do not return Code:1—return nil instead.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 2b5cf4ba-5fae-4998-a583-2dbe2fa1def3

📥 Commits

Reviewing files that changed from the base of the PR and between 66dee73 and d44add8.

📒 Files selected for processing (3)
  • pkg/cli/debug/debug.go
  • pkg/cli/debug/debug_test.go
  • test/e2e/cli.go

Comment thread pkg/cli/debug/debug.go Outdated
@tchap tchap force-pushed the oc-debug-exit-code branch from d44add8 to dbf04d7 Compare April 14, 2026 12:25
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
pkg/cli/debug/debug.go (1)

667-676: ⚠️ Potential issue | 🟠 Major

Debug container exit code 0 incorrectly returns error when sibling container fails.

When conditions.ErrNonZeroExitCode is triggered by a sibling container (not the debug container), but the debug container itself exited with code 0, the code still falls through to return CodeExitError{Code: 1}. This violates the "propagate the debug container's exit code" contract.

The issue occurs because exitCodeError returns nil for both code == 0 (success) and code == -1 (not found/not terminated), and both cases fall through to the Code: 1 return.

🐛 Proposed fix to distinguish exit code 0 from unavailable
 		if freshPod != nil {
+			code := containerExitCode(freshPod, o.ContainerName)
+			if code == 0 {
+				return nil
+			}
 			if exitErr := exitCodeError(freshPod, o.ContainerName); exitErr != nil {
 				return exitErr
 			}
 		}
 		// We know the exit code was non-zero but couldn't determine the actual value.
 		return utilexec.CodeExitError{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cli/debug/debug.go` around lines 667 - 676, The current logic treats both
a debug-container exit code of 0 and an unknown exit code the same because
exitCodeError returns only nil in both cases; change the handling so you
retrieve the actual exit code (refactor exitCodeError or add getExitCode) for
freshPod/o.ContainerName: if the debug container's code == 0 return nil, if code
> 0 return utilexec.CodeExitError with Code set to that code and Err describing
the non-zero exit, and only if the code is unavailable (-1) return the fallback
utilexec.CodeExitError{Code: 1}; ensure you call the updated function from where
freshPod != nil and use o.ContainerName to identify the container.
🧹 Nitpick comments (1)
pkg/cli/debug/debug.go (1)

1297-1309: Consider returning pointer type for consistency with codebase patterns.

The function returns utilexec.CodeExitError as a value type, but other places in the codebase (e.g., pkg/cli/admin/nodeimage/monitor.go) return *utilexec.CodeExitError. While both satisfy the error interface, direct type assertions like err.(*exec.CodeExitError) (used in mustgather.go) would fail on a value type.

Using errors.As would work for both, but returning a pointer is more consistent with existing patterns.

♻️ Suggested change for consistency
 func exitCodeError(pod *corev1.Pod, containerName string) error {
 	code := containerExitCode(pod, containerName)
 	if code <= 0 {
 		return nil
 	}
-	return utilexec.CodeExitError{
+	return &utilexec.CodeExitError{
 		Err:  fmt.Errorf("the debug container terminated with exit code %d", code),
 		Code: int(code),
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cli/debug/debug.go` around lines 1297 - 1309, The exitCodeError function
currently returns a value-typed utilexec.CodeExitError which breaks direct type
assertions elsewhere; change exitCodeError (and its return site) to return a
pointer by constructing and returning &utilexec.CodeExitError{Err:
fmt.Errorf(...), Code: int(code)} when code > 0 (leave nil returns unchanged) so
callers using a direct type assertion like err.(*utilexec.CodeExitError)
succeed; locate the function exitCodeError and the helper containerExitCode to
apply this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/cli/debug/debug.go`:
- Around line 667-676: The current logic treats both a debug-container exit code
of 0 and an unknown exit code the same because exitCodeError returns only nil in
both cases; change the handling so you retrieve the actual exit code (refactor
exitCodeError or add getExitCode) for freshPod/o.ContainerName: if the debug
container's code == 0 return nil, if code > 0 return utilexec.CodeExitError with
Code set to that code and Err describing the non-zero exit, and only if the code
is unavailable (-1) return the fallback utilexec.CodeExitError{Code: 1}; ensure
you call the updated function from where freshPod != nil and use o.ContainerName
to identify the container.

---

Nitpick comments:
In `@pkg/cli/debug/debug.go`:
- Around line 1297-1309: The exitCodeError function currently returns a
value-typed utilexec.CodeExitError which breaks direct type assertions
elsewhere; change exitCodeError (and its return site) to return a pointer by
constructing and returning &utilexec.CodeExitError{Err: fmt.Errorf(...), Code:
int(code)} when code > 0 (leave nil returns unchanged) so callers using a direct
type assertion like err.(*utilexec.CodeExitError) succeed; locate the function
exitCodeError and the helper containerExitCode to apply this change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 5b853a14-4202-4085-adb7-42968bf13da6

📥 Commits

Reviewing files that changed from the base of the PR and between d44add8 and dbf04d7.

📒 Files selected for processing (3)
  • pkg/cli/debug/debug.go
  • pkg/cli/debug/debug_test.go
  • test/e2e/cli.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/cli/debug/debug_test.go
  • test/e2e/cli.go

@tchap tchap force-pushed the oc-debug-exit-code branch from dbf04d7 to 1700148 Compare April 14, 2026 15:08
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/cli.go`:
- Around line 1091-1103: The test pulls images from quay.io (the deployment
create call and debugImage) but only calls skipIfMicroShift(oc); add a
disconnected-cluster guard before those pulls to avoid failures in air-gapped
environments—e.g., invoke the existing helper that skips tests when external
pulls are disallowed (call skipIfDisconnected(oc) or the project's equivalent)
immediately before the oc.SetupProject()/oc.Run("create") block so the
oc.Run("create").Args(...) and debugImage usage are skipped on disconnected
clusters.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: b8e4bc07-d40c-45c9-ac6c-8c3fa63ba661

📥 Commits

Reviewing files that changed from the base of the PR and between dbf04d7 and 1700148.

📒 Files selected for processing (3)
  • pkg/cli/debug/debug.go
  • pkg/cli/debug/debug_test.go
  • test/e2e/cli.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/cli/debug/debug.go
  • pkg/cli/debug/debug_test.go

Comment thread test/e2e/cli.go
@tchap tchap force-pushed the oc-debug-exit-code branch from 1700148 to b3aaa8e Compare April 14, 2026 15:39
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/cli.go`:
- Around line 1139-1145: The current assertion only checks that runErr occurred
and can hide unrelated failures; change the test after running cmd to first
assert the error is an exec.ExitError (e.g.
BeAssignableToTypeOf(&exec.ExitError{})) and then extract the exit code from the
error's ProcessState (or type-assert to *exec.ExitError) and assert the exit
code is non-zero using o.Expect(...).NotTo(o.Equal(0)); update the assertions
around the existing cmd/runErr variables and keep using o.Expect so the test
explicitly verifies an ExitError and a non-zero exit code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 04944cfd-0341-4336-a231-28339c864b44

📥 Commits

Reviewing files that changed from the base of the PR and between 1700148 and b3aaa8e.

📒 Files selected for processing (3)
  • pkg/cli/debug/debug.go
  • pkg/cli/debug/debug_test.go
  • test/e2e/cli.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/cli/debug/debug.go
  • pkg/cli/debug/debug_test.go

Comment thread test/e2e/cli.go
@tchap tchap force-pushed the oc-debug-exit-code branch from b3aaa8e to 090ffa0 Compare April 14, 2026 15:50
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
pkg/cli/debug/debug.go (1)

654-676: ⚠️ Potential issue | 🟠 Major

Avoid forcing exit code 1 when the selected debug container exited 0.

In this branch, exitCodeError() returning nil is ambiguous (0 or unknown). The unconditional fallback to Code: 1 can still misreport failure when only another container failed.

💡 Proposed fix
 		if freshPod != nil {
-			if exitErr := exitCodeError(freshPod, o.ContainerName); exitErr != nil {
-				return exitErr
-			}
+			switch code := containerExitCode(freshPod, o.ContainerName); {
+			case code > 0:
+				return exitCodeError(freshPod, o.ContainerName)
+			case code == 0:
+				return nil
+			}
 		}
 		// We know the exit code was non-zero but couldn't determine the actual value.
 		return utilexec.CodeExitError{
 			Err:  fmt.Errorf("the debug container terminated with a non-zero exit code"),
 			Code: 1,
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cli/debug/debug.go` around lines 654 - 676, The code always returns
utilexec.CodeExitError with Code: 1 when we can't determine a non-zero exit
code, which can misreport success if the selected debug container actually
exited 0; update the logic after obtaining freshPod (and when
exitCodeError(freshPod, o.ContainerName) returns nil) to explicitly inspect the
container status for o.ContainerName on freshPod (or use
containerRunningEvent.Object when freshPod is nil) and only return a non-zero
Code when that container status shows a non-zero exit code or is terminated with
a non-zero exit; if the debug container exited 0, return nil (no error) instead
of forcing Code: 1, and keep using exitCodeError(), freshPod,
containerRunningEvent and utilexec.CodeExitError as the referenced symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/cli/debug/debug.go`:
- Around line 654-676: The code always returns utilexec.CodeExitError with Code:
1 when we can't determine a non-zero exit code, which can misreport success if
the selected debug container actually exited 0; update the logic after obtaining
freshPod (and when exitCodeError(freshPod, o.ContainerName) returns nil) to
explicitly inspect the container status for o.ContainerName on freshPod (or use
containerRunningEvent.Object when freshPod is nil) and only return a non-zero
Code when that container status shows a non-zero exit code or is terminated with
a non-zero exit; if the debug container exited 0, return nil (no error) instead
of forcing Code: 1, and keep using exitCodeError(), freshPod,
containerRunningEvent and utilexec.CodeExitError as the referenced symbols.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 898d7dd3-47c0-4469-85f5-43e724164d79

📥 Commits

Reviewing files that changed from the base of the PR and between b3aaa8e and 090ffa0.

📒 Files selected for processing (3)
  • pkg/cli/debug/debug.go
  • pkg/cli/debug/debug_test.go
  • test/e2e/cli.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/cli/debug/debug_test.go

@tchap tchap force-pushed the oc-debug-exit-code branch from 090ffa0 to 7e02b20 Compare April 14, 2026 18:47
This is often created by Claude as a temp dir.
@tchap tchap force-pushed the oc-debug-exit-code branch from 7e02b20 to a00a8f4 Compare April 14, 2026 18:48
@tchap tchap marked this pull request as draft April 15, 2026 08:17
@tchap tchap force-pushed the oc-debug-exit-code branch 5 times, most recently from 702eb82 to 428ff7f Compare April 15, 2026 08:45
Make oc debug propagate the debug container exit code.
This is not happening now and `oc debug` always exits with 0.
@tchap tchap force-pushed the oc-debug-exit-code branch from 428ff7f to ef9cf52 Compare April 15, 2026 08:46
@tchap tchap marked this pull request as ready for review April 15, 2026 08:47
@openshift-ci openshift-ci Bot requested a review from atiratree April 15, 2026 08:48
@tchap
Copy link
Copy Markdown
Contributor Author

tchap commented Apr 15, 2026

/retitle oc debug: Propagate exit code

@openshift-ci openshift-ci Bot changed the title WIP: oc debug: Propagate exit code oc debug: Propagate exit code Apr 15, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 15, 2026
Comment thread test/e2e/cli.go
o.Expect(err).NotTo(o.HaveOccurred())
})

g.It("Author:okupka-ROSA-OSD_CCS-ARO-ConnectedOnly-High-debug-exit-code-oc debug should propagate container exit code", oteginkgo.Informing(), func() {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea how this naming works, this was spat out by Claude.

@tchap
Copy link
Copy Markdown
Contributor Author

tchap commented Apr 16, 2026

/retest

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 16, 2026

@tchap: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant