
OCPBUGS-62629: reflect workload controller changes in the deployment …#2196

Open
tjungblu wants to merge 1 commit into openshift:master from tjungblu:OCPBUGS-62629_deployment

Conversation

@tjungblu
Contributor

@tjungblu tjungblu commented May 4, 2026

…controller

Contains the degraded/progressing condition changes made to the workload controller from #2128.

Summary by CodeRabbit

  • Refactor

    • Improved deployment status reporting and container-level messages, including explicit messaging when no pods match selectors and clearer diagnostics for failing or unavailable pods.
    • Reworked controller sync flow to compute Available, Progressing, and Degraded more accurately during rollouts, timeouts, and deletions.
  • Documentation

    • Clarified when Degraded is reported (progress timeouts, reduced pod availability outside rollouts, or sync errors).
  • Tests

    • Updated and added tests validating degraded-condition reasons and status messages.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 4, 2026
@openshift-ci-robot

@tjungblu: This pull request references Jira Issue OCPBUGS-62629, which is valid.

3 validations were run on this bug:
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @gangwgr

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

…controller

Contains the degraded/progressing condition changes made to the workload controller from #2128.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@tjungblu
Contributor Author

tjungblu commented May 4, 2026

/hold

@openshift-ci openshift-ci Bot requested a review from gangwgr May 4, 2026 07:18
@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 4, 2026
@coderabbitai

coderabbitai Bot commented May 4, 2026

Note

Currently processing new changes in this PR. This may take a few minutes, please wait...

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c402234d-f742-451e-ab6d-12b7d0280738

📥 Commits

Reviewing files that changed from the base of the PR and between 7d80de2 and c44884e.

📒 Files selected for processing (5)
  • pkg/apps/deployment/status.go
  • pkg/operator/csi/csidrivercontrollerservicecontroller/csi_driver_controller_service_controller.go
  • pkg/operator/deploymentcontroller/deployment_controller.go
  • pkg/operator/deploymentcontroller/deployment_controller_test.go
  • pkg/operator/deploymentcontroller/helpers_test.go

Walkthrough

Reworks deployment degraded-condition computation and diagnostics: exports container-message builder, refactors container-status helpers, rewrites syncManaged to compute Available/Progressing/Degraded from the applied deployment and matched pods, changes sync wiring for sync-error degraded behavior, updates deletion routing, and expands tests to assert degraded reasons/messages.

Changes

Degraded Condition Handling for Deployments

  • Helper Export / Data Shape (pkg/apps/deployment/status.go): Adds exported ContainerMessagesForPods(deployment *appsv1.Deployment, pods []*corev1.Pod) []string that appends a "no pods found with labels ..." message when pods is empty. Refactors containerStatusMessagesForPods to accept only deploymentPods []*corev1.Pod and return []string of container-state messages.
  • Core Condition Logic (pkg/operator/deploymentcontroller/deployment_controller.go): Rewrites syncManaged to use the applied deploymentObj for identity/generation, set Available from deploymentObj.Status.AvailableReplicas, determine Progressing via hasDeploymentProgressed/hasDeploymentTimedOutProgressing, detect failing/non-ready pods via listOperandPodsForDiagnostics/nonDeletingPods/hasFailingPods/findPodReadyCondition, and set Degraded (when enabled) for timeouts (ProgressDeadlineExceeded) or reduced availability, with optional per-container messages from ContainerMessagesForPods. Removes the prior isProgressing/hasFinishedProgressing.
  • Sync Error / Wiring (pkg/operator/deploymentcontroller/deployment_controller.go): sync now defers applySyncErrorDegraded only when the controller's c.conditions includes OperatorStatusTypeDegraded, and applies it if sync returns an error. ToController wires WithSyncDegradedOnError only when Degraded is not present in c.conditions.
  • Reconciliation Router (pkg/operator/deploymentcontroller/deployment_controller.go): Deletion path now calls syncDeleting(ctx, opSpec, syncContext) (no opStatus); managed path calls syncManaged(ctx, opSpec, opStatus, syncContext). Adds the applySyncErrorDegraded helper.
  • Pod Listing / Diagnostics Helpers (pkg/operator/deploymentcontroller/deployment_controller.go): Adds listOperandPodsForDiagnostics (selector-based pod listing) and the helpers nonDeletingPods, hasFailingPods, findPodReadyCondition, hasDeploymentProgressed, hasDeploymentTimedOutProgressing.
  • Tests / Test Helpers (pkg/operator/deploymentcontroller/deployment_controller_test.go, pkg/operator/deploymentcontroller/helpers_test.go): Introduces the conditionDegraded constant, extends the test harness with an assertOperatorStatus callback to inspect condition Reason/Message pre-sanitization, updates many TestSync cases to assert degraded behavior (including the ProgressDeadlineExceeded case), and adds the withDeploymentConditionWithMessage test helper to inject deployment conditions with a Message.
  • Comments / Documentation (pkg/operator/csi/.../csi_driver_controller_service_controller.go, pkg/operator/deploymentcontroller/deployment_controller.go): Updates the <name>Degraded comments to include progressing timeouts and failing-pod availability reductions (outside mid-rollout) as triggers, in addition to sync errors.
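The degraded branching described above can be sketched in a few lines of Go. This is a simplified illustration, not the controller's actual code: the struct and field names here are hypothetical stand-ins for state the real syncManaged reads from appsv1.Deployment status and the pod diagnostics helpers.

```go
package main

import "fmt"

// deploymentState is an illustrative stand-in for the fields the controller
// inspects on the applied Deployment; the real code reads appsv1.Deployment.
type deploymentState struct {
	AvailableReplicas int32
	DesiredReplicas   int32
	TimedOut          bool // a ProgressDeadlineExceeded condition was observed
	Updating          bool // a rollout is still in progress
}

// degradedReason mirrors the decision order in the walkthrough: Degraded for
// progress timeouts, or for reduced availability outside a rollout when pods
// are failing (or availability dropped to zero); otherwise not degraded.
func degradedReason(d deploymentState, hasFailingPods bool) string {
	switch {
	case d.TimedOut:
		return "ProgressDeadlineExceeded"
	case !d.Updating && d.AvailableReplicas < d.DesiredReplicas &&
		(hasFailingPods || d.AvailableReplicas == 0):
		return "UnavailablePod"
	default:
		return ""
	}
}

func main() {
	fmt.Println(degradedReason(deploymentState{AvailableReplicas: 1, DesiredReplicas: 3, TimedOut: true}, false))
	fmt.Println(degradedReason(deploymentState{AvailableReplicas: 1, DesiredReplicas: 3}, true))
	fmt.Println(degradedReason(deploymentState{AvailableReplicas: 3, DesiredReplicas: 3}, false))
}
```

Note the ordering: a timeout wins over the unavailable-pod check, matching the alt branches in the sequence diagram further down.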

Sequence Diagram(s)

sequenceDiagram
    participant Controller
    participant KubeAPI as "Kube API (Deployment)"
    participant PodAPI as "Kube API (Pods)"
    participant OperatorStatus

    Controller->>KubeAPI: Apply deployment (returns deploymentObj)
    Controller->>PodAPI: List pods matching deploymentObj.Spec.Selector
    PodAPI-->>Controller: Pod list
    Controller->>Controller: compute Available / Progressing (hasDeploymentProgressed / timed out)
    Controller->>Controller: filter non-deleting pods, detect failing pods, build container messages (ContainerMessagesForPods)
    alt progressing timed out
        Controller->>OperatorStatus: Set Degraded=True (Reason=ProgressDeadlineExceeded)
    else failing/unavailable pods (not mid-rollout)
        Controller->>OperatorStatus: Set Degraded=True (Reason=UnavailablePod) with container messages
    else no errors
        Controller->>OperatorStatus: Set Available/Progressing accordingly (Degraded=False)
    end
    Note right of Controller: If sync returns error and controller owns Degraded, applySyncErrorDegraded runs to mark Degraded=SyncError
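The "filter non-deleting pods, detect failing pods" step in the diagram can be sketched as below. The pod type is a hypothetical simplification; the real nonDeletingPods/hasFailingPods helpers operate on []*corev1.Pod and check metadata.deletionTimestamp and the PodReady condition.

```go
package main

import "fmt"

// pod is an illustrative stand-in for corev1.Pod.
type pod struct {
	Name     string
	Deleting bool // true when metadata.deletionTimestamp is set
	Ready    bool // true when the PodReady condition is True
}

// nonDeleting drops pods that are already being deleted, so terminating pods
// don't count against operator health.
func nonDeleting(pods []pod) []pod {
	var out []pod
	for _, p := range pods {
		if !p.Deleting {
			out = append(out, p)
		}
	}
	return out
}

// hasFailing reports whether any live pod is not Ready.
func hasFailing(pods []pod) bool {
	for _, p := range pods {
		if !p.Ready {
			return true
		}
	}
	return false
}

func main() {
	pods := []pod{{"a", false, true}, {"b", true, false}, {"c", false, false}}
	live := nonDeleting(pods)
	fmt.Println(len(live), hasFailing(live))
}
```

Filtering before the failure check is what keeps a pod that is merely terminating during a rollout from tripping the Degraded condition.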

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 40.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title check (❓ Inconclusive): The PR title is incomplete and truncated; it ends with "deployment …" and doesn't convey the full change. Resolution: complete the truncated title to clearly describe what deployment controller changes are being reflected (e.g., "OCPBUGS-62629: reflect workload controller degraded/progressing changes in deployment controller").
  • Test Structure And Quality (❓ Inconclusive): The custom check is designed for Ginkgo BDD-style tests, but the PR uses standard Go table-driven testing patterns with zero Ginkgo constructs. Resolution: clarify whether the check applies to standard Go testing patterns or only to Ginkgo-based tests, and provide appropriate guidelines.
✅ Passed checks (9 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Stable And Deterministic Test Names (✅ Passed): All test case names are stable, static strings with no dynamic generation, format specifiers, or variable concatenation.
  • Microshift Test Compatibility (✅ Passed): The pull request does not add any new Ginkgo e2e tests, only updates to standard Go unit tests and helper functions.
  • Single Node Openshift (Sno) Test Compatibility (✅ Passed): This pull request does not add any Ginkgo e2e tests; test files use Go's standard testing package with unit tests, not Ginkgo patterns.
  • Topology-Aware Scheduling Compatibility (✅ Passed): This pull request does not introduce any topology-aware scheduling constraints. The changes implement a generic deployment controller that manages status conditions without modifying pod specifications or scheduling fields.
  • Ote Binary Stdout Contract (✅ Passed): Library packages should not configure klog output settings; this responsibility belongs to binaries.
  • Ipv6 And Disconnected Network Test Compatibility (✅ Passed): The modified test files use standard Go testing patterns, not Ginkgo e2e tests, so IPv6/disconnected network checks do not apply.



Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from bertinatto and jsafrane May 4, 2026 07:18
@openshift-ci
Contributor

openshift-ci Bot commented May 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tjungblu
Once this PR has been reviewed and has the lgtm label, please assign bertinatto for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/operator/deploymentcontroller/deployment_controller_test.go (1)

456-671: ⚡ Quick win

Add one regression case that asserts the new reason/message fields.

These table updates prove the boolean transitions, but the behavior added here is really the new UnavailablePod / ProgressDeadlineExceeded / SyncError condition semantics. One focused case that keeps and asserts Reason/Message would catch the exact regressions this PR is trying to prevent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/deploymentcontroller/deployment_controller_test.go` around lines
456 - 671, Add a single regression test case to the TestSync testCases table
that specifically asserts Reason and Message on operator conditions (e.g.
UnavailablePod, ProgressDeadlineExceeded, SyncError). Create a case (name like
"assert condition reason/message") that sets up initialObjects using
makeDeployment with appropriate withDeploymentStatus and
withDeploymentConditions to trigger the target condition, and an operator via
makeFakeOperatorInstance; then in expectedObjects.operator include an assertion
that the condition entries include the exact Reason and Message (extend or use a
helper similar to withTrueConditions/withFalseConditions but that checks
Reason/Message, or add a new helper e.g. withConditionsIncludingReasonMessage)
so the test fails if Reason/Message are not set as intended; insert this new
case into the existing testCases slice inside TestSync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/apps/deployment/status.go`:
- Line 41: The loop currently appends pods[i].Status.InitContainerStatuses onto
pods[i].Status.ContainerStatuses which can mutate cached pod objects; instead
create a fresh combined slice (e.g., combined := make([]v1.ContainerStatus, 0,
len(pods[i].Status.ContainerStatuses)+len(pods[i].Status.InitContainerStatuses))
and append both slices into it) or iterate pods[i].Status.ContainerStatuses and
pods[i].Status.InitContainerStatuses in two separate loops; update the loop that
references the combined view to use this new local variable (referencing pods,
pods[i].Status.ContainerStatuses, pods[i].Status.InitContainerStatuses) so you
never modify the original pod object.

In `@pkg/operator/deploymentcontroller/deployment_controller.go`:
- Around line 381-388: The degraded message is built from the full operandPods
slice while hasFailingPods() intentionally ignores terminating/deleting pods,
causing mismatched details; modify the code around the decision branch (when
!workloadIsBeingUpdated && deploymentObj.Status.AvailableReplicas <
desiredReplicas) to first filter operandPods to remove deleting/terminating pods
into a new slice (e.g., livePods), pass that filtered slice into
hasFailingPods(deploymentObj, livePods, now) and into
deployment.ContainerMessagesForPods(deploymentObj, livePods) so both the failure
check and the generated container messages use the identical pod set.

---

Nitpick comments:
In `@pkg/operator/deploymentcontroller/deployment_controller_test.go`:
- Around line 456-671: Add a single regression test case to the TestSync
testCases table that specifically asserts Reason and Message on operator
conditions (e.g. UnavailablePod, ProgressDeadlineExceeded, SyncError). Create a
case (name like "assert condition reason/message") that sets up initialObjects
using makeDeployment with appropriate withDeploymentStatus and
withDeploymentConditions to trigger the target condition, and an operator via
makeFakeOperatorInstance; then in expectedObjects.operator include an assertion
that the condition entries include the exact Reason and Message (extend or use a
helper similar to withTrueConditions/withFalseConditions but that checks
Reason/Message, or add a new helper e.g. withConditionsIncludingReasonMessage)
so the test fails if Reason/Message are not set as intended; insert this new
case into the existing testCases slice inside TestSync.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 77692bf7-5ee0-4146-af3f-307cedb5c776

📥 Commits

Reviewing files that changed from the base of the PR and between d6efc4e and 426221c.

📒 Files selected for processing (4)
  • pkg/apps/deployment/status.go
  • pkg/operator/csi/csidrivercontrollerservicecontroller/csi_driver_controller_service_controller.go
  • pkg/operator/deploymentcontroller/deployment_controller.go
  • pkg/operator/deploymentcontroller/deployment_controller_test.go

Comment thread: pkg/apps/deployment/status.go (outdated)
Comment thread: pkg/operator/deploymentcontroller/deployment_controller.go (outdated)
@tjungblu tjungblu force-pushed the OCPBUGS-62629_deployment branch 4 times, most recently from 07ea5a5 to 012e48a Compare May 4, 2026 09:14

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/operator/deploymentcontroller/deployment_controller_test.go (1)

673-709: ⚡ Quick win

Add one regression case for the new SyncError degraded path.

The suite now covers timeout and unavailable-pod degradation, but the wiring change in this PR is the deferred applySyncErrorDegraded() path in sync(). Please add a case that forces getDeployment() or the new pod lookup to fail and asserts conditionDegraded=True with Reason="SyncError", so the WithSyncDegradedOnError inversion stays protected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/deploymentcontroller/deployment_controller_test.go` around lines
673 - 709, Add a regression test case in deployment_controller_test.go that
exercises the deferred applySyncErrorDegraded() path by forcing getDeployment()
or the pod lookup to return an error so sync() triggers the SyncError degraded
branch; create a new test entry (parallel to the existing
"ProgressDeadlineExceeded" case) that sets up initialObjects to produce a
failure (e.g., mock getDeployment/pod list to error or supply a nil/erroneous
object) and in assertOperatorStatus verify that conditionDegraded is present
with Status True and Reason "SyncError" (and an appropriate message), ensuring
the WithSyncDegradedOnError inversion remains protected and referencing sync(),
applySyncErrorDegraded(), and WithSyncDegradedOnError for locating the logic to
target.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/deploymentcontroller/deployment_controller.go`:
- Around line 296-313: The current reconcile eagerly lists pods (using
c.kubeClient.CoreV1().Pods(...).List) whenever the controller has the Degraded
condition (slices.Contains(c.conditions, opv1.OperatorStatusTypeDegraded)),
causing transient list failures to surface as hard sync errors; move the pod
fetch logic (creation of operandPods from podList.Items and
metav1.LabelSelectorAsSelector on deploymentObj.Spec.Selector) into the specific
branch that handles UnavailablePod/container-failure diagnostics (where
container details are inspected), and change error handling so that pod list
failures are treated as best-effort diagnostics (log/record the lookup failure
but do not return error from syncManaged or the reconcile path). Ensure you
still validate deploymentObj.Spec.Selector before using it and only attempt the
pods.List when you actually need pod/container details.

---

Nitpick comments:
In `@pkg/operator/deploymentcontroller/deployment_controller_test.go`:
- Around line 673-709: Add a regression test case in
deployment_controller_test.go that exercises the deferred
applySyncErrorDegraded() path by forcing getDeployment() or the pod lookup to
return an error so sync() triggers the SyncError degraded branch; create a new
test entry (parallel to the existing "ProgressDeadlineExceeded" case) that sets
up initialObjects to produce a failure (e.g., mock getDeployment/pod list to
error or supply a nil/erroneous object) and in assertOperatorStatus verify that
conditionDegraded is present with Status True and Reason "SyncError" (and an
appropriate message), ensuring the WithSyncDegradedOnError inversion remains
protected and referencing sync(), applySyncErrorDegraded(), and
WithSyncDegradedOnError for locating the logic to target.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 884898b7-e452-48ae-8bbd-18cbb0f259c8

📥 Commits

Reviewing files that changed from the base of the PR and between 426221c and 012e48a.

📒 Files selected for processing (5)
  • pkg/apps/deployment/status.go
  • pkg/operator/csi/csidrivercontrollerservicecontroller/csi_driver_controller_service_controller.go
  • pkg/operator/deploymentcontroller/deployment_controller.go
  • pkg/operator/deploymentcontroller/deployment_controller_test.go
  • pkg/operator/deploymentcontroller/helpers_test.go
✅ Files skipped from review due to trivial changes (2)
  • pkg/operator/csi/csidrivercontrollerservicecontroller/csi_driver_controller_service_controller.go
  • pkg/operator/deploymentcontroller/helpers_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/apps/deployment/status.go

Comment thread: pkg/operator/deploymentcontroller/deployment_controller.go (outdated)
@tjungblu tjungblu force-pushed the OCPBUGS-62629_deployment branch from 012e48a to 7d80de2 Compare May 4, 2026 10:40
…controller

Contains the degraded/progressing condition changes made to the
workload controller from openshift#2128.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
@tjungblu tjungblu force-pushed the OCPBUGS-62629_deployment branch from 7d80de2 to c44884e Compare May 4, 2026 11:17
@coderabbitai

coderabbitai Bot commented May 4, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/operator/deploymentcontroller/deployment_controller_test.go (1)

676-739: ⚡ Quick win

Add one regression case for pod-diagnostics failure in the unavailable-pod path.

The new assertions cover ProgressDeadlineExceeded and deferred SyncError, but not the branch where Pods().List() fails while the deployment is partially unavailable. That path now affects whether Degraded stays true or gets cleared, so it should be locked down with assertOperatorStatus.

Also applies to: 796-798

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/deploymentcontroller/deployment_controller_test.go` around lines
676 - 739, Add a regression test case that simulates a failure from
Pods().List() in the unavailable-pod path and asserts operator Degraded remains
set (or not cleared) accordingly; specifically, add a new test entry in the
table (similar style to the existing "ProgressDeadlineExceeded" and "sync error"
cases) that injects a fake client error for Pods().List() when
getDeployment()/the unavailable-pod handling runs, then provide an
assertOperatorStatus closure that checks
v1helpers.FindOperatorCondition(status.Conditions, conditionDegraded) is non-nil
and has the expected Status/Reason/Message (matching the desired behavior for
the unavailable-pod error path). Ensure the test uses the same test harness
helpers (makeDeployment, makeFakeOperatorInstance, optionalManifestHooks if
needed) and names the case to indicate "pod-diagnostics failure in
unavailable-pod" so it covers the branch referenced in
getDeployment()/applySyncErrorDegraded logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/deploymentcontroller/deployment_controller.go`:
- Around line 362-367: When AvailableReplicas < desiredReplicas and rollout is
not updating, don't let a nil/failed pod lookup from
listOperandPodsForDiagnostics clear Degraded; treat missing operandPods as a
failure. In the branch around listOperandPodsForDiagnostics -> operandPods, set
hasFailing to true if operandPods is nil or the diagnostics list indicates a
selector/list error (in addition to the existing hasFailingPods(deployment,
livePods, now) check), and still build containerMessages with whatever
diagnostics are available via dpm.ContainerMessagesForPods(deployment, livePods)
so the Degraded condition reflects the deployment shortage rather than
best‑effort pod lookup success. Ensure you reference
listOperandPodsForDiagnostics, nonDeletingPods, hasFailingPods,
dpm.ContainerMessagesForPods, deployment.Status.AvailableReplicas and
desiredReplicas when making the change.

---

Nitpick comments:
In `@pkg/operator/deploymentcontroller/deployment_controller_test.go`:
- Around line 676-739: Add a regression test case that simulates a failure from
Pods().List() in the unavailable-pod path and asserts operator Degraded remains
set (or not cleared) accordingly; specifically, add a new test entry in the
table (similar style to the existing "ProgressDeadlineExceeded" and "sync error"
cases) that injects a fake client error for Pods().List() when
getDeployment()/the unavailable-pod handling runs, then provide an
assertOperatorStatus closure that checks
v1helpers.FindOperatorCondition(status.Conditions, conditionDegraded) is non-nil
and has the expected Status/Reason/Message (matching the desired behavior for
the unavailable-pod error path). Ensure the test uses the same test harness
helpers (makeDeployment, makeFakeOperatorInstance, optionalManifestHooks if
needed) and names the case to indicate "pod-diagnostics failure in
unavailable-pod" so it covers the branch referenced in
getDeployment()/applySyncErrorDegraded logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c402234d-f742-451e-ab6d-12b7d0280738

📥 Commits

Reviewing files that changed from the base of the PR and between 7d80de2 and c44884e.

📒 Files selected for processing (5)
  • pkg/apps/deployment/status.go
  • pkg/operator/csi/csidrivercontrollerservicecontroller/csi_driver_controller_service_controller.go
  • pkg/operator/deploymentcontroller/deployment_controller.go
  • pkg/operator/deploymentcontroller/deployment_controller_test.go
  • pkg/operator/deploymentcontroller/helpers_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/operator/deploymentcontroller/helpers_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/operator/csi/csidrivercontrollerservicecontroller/csi_driver_controller_service_controller.go
  • pkg/apps/deployment/status.go

Comment on lines +362 to +367
case !workloadIsBeingUpdated && deployment.Status.AvailableReplicas < desiredReplicas:
	operandPods := c.listOperandPodsForDiagnostics(ctx, deployment, syncContext)
	livePods := nonDeletingPods(operandPods)
	hasFailing := hasFailingPods(deployment, livePods, now)
	if hasFailing || deployment.Status.AvailableReplicas == 0 {
		containerMessages := dpm.ContainerMessagesForPods(deployment, livePods)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t let a failed pod lookup clear Degraded for partial outages.

When AvailableReplicas < desiredReplicas and rollout is already done, listOperandPodsForDiagnostics() collapses selector/list failures into nil, so hasFailingPods() becomes false and this branch reports Degraded=False unless availability dropped all the way to zero. That makes operator health depend on best-effort diagnostics instead of the deployment shortage itself.

Suggested direction
- operandPods := c.listOperandPodsForDiagnostics(ctx, deployment, syncContext)
+ operandPods, diagnosticsOK := c.listOperandPodsForDiagnostics(ctx, deployment, syncContext)
  livePods := nonDeletingPods(operandPods)
- hasFailing := hasFailingPods(deployment, livePods, now)
- if hasFailing || deployment.Status.AvailableReplicas == 0 {
-   containerMessages := dpm.ContainerMessagesForPods(deployment, livePods)
+ hasFailing := diagnosticsOK && hasFailingPods(deployment, livePods, now)
+ if deployment.Status.AvailableReplicas < desiredReplicas && (!diagnosticsOK || hasFailing || deployment.Status.AvailableReplicas == 0) {
+   var containerMessages []string
+   if diagnosticsOK {
+     containerMessages = dpm.ContainerMessagesForPods(deployment, livePods)
+   }

Also applies to: 401-417

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/deploymentcontroller/deployment_controller.go` around lines 362
- 367, When AvailableReplicas < desiredReplicas and rollout is not updating,
don't let a nil/failed pod lookup from listOperandPodsForDiagnostics clear
Degraded; treat missing operandPods as a failure. In the branch around
listOperandPodsForDiagnostics -> operandPods, set hasFailing to true if
operandPods is nil or the diagnostics list indicates a selector/list error (in
addition to the existing hasFailingPods(deployment, livePods, now) check), and
still build containerMessages with whatever diagnostics are available via
dpm.ContainerMessagesForPods(deployment, livePods) so the Degraded condition
reflects the deployment shortage rather than best‑effort pod lookup success.
Ensure you reference listOperandPodsForDiagnostics, nonDeletingPods,
hasFailingPods, dpm.ContainerMessagesForPods,
deployment.Status.AvailableReplicas and desiredReplicas when making the change.

@openshift-ci
Contributor

openshift-ci Bot commented May 4, 2026

@tjungblu: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

