Skip to content

OCPBUGS-55050: Remove exception for Gateway API Proxy resources limits#31145

Draft
rikatz wants to merge 1 commit intoopenshift:mainfrom
rikatz:remove-sail-exception
Draft

OCPBUGS-55050: Remove exception for Gateway API Proxy resources limits#31145
rikatz wants to merge 1 commit intoopenshift:mainfrom
rikatz:remove-sail-exception

Conversation

@rikatz
Copy link
Copy Markdown
Member

@rikatz rikatz commented May 7, 2026

Previously, there was a need for exception on Gateway API Proxy pods to allow resources limits, as this wasn't properly configurable.

The situation is now fixed on cluster ingress operator, and the exception can be removed. The fix is implemented on openshift/cluster-ingress-operator#1440

Summary by CodeRabbit

  • Tests
    • Updated test expectations for resource limit validation related to Istio gateway containers.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 7, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@rikatz: This pull request references Jira Issue OCPBUGS-55050, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Previously, there was a need for exception on Gateway API Proxy pods to allow resources limits, as this wasn't properly configurable.

The situation is now fixed on cluster ingress operator, and the exception can be removed. The fix is implemented on openshift/cluster-ingress-operator#1440

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 7, 2026

Warning

Rate limit exceeded

@rikatz has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 18 seconds before requesting another review.

To continue reviewing without waiting, purchase usage credits in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: edc9402f-70a3-4911-9929-fde5a69f5031

📥 Commits

Reviewing files that changed from the base of the PR and between d1b0eef and 275387f.

📒 Files selected for processing (1)
  • test/extended/operators/resources.go

Walkthrough

This PR updates a test file that validates operator resource handling. It removes unused import statements, simplifies ReplicaSet-to-Deployment owner reference resolution by removing error-path handling, and removes two Istio gateway resource-limit entries from the known-broken test data.

Changes

Test Logic and Data Updates

Layer / File(s) Summary
Controller Resolution Logic
test/extended/operators/resources.go (lines 126–131)
ReplicaSet→Deployment controller name resolution now only handles the success case; error-path behavior that checked for IsNotFound orphaned ReplicaSets is removed.
Unused Import Removal
test/extended/operators/resources.go (lines 8–10)
apierrors import removed because it is no longer used after the IsNotFound check was deleted.
Test Data Updates
test/extended/operators/resources.go (lines 56–58)
Two Istio gateway container resource-limit entries (limit[cpu] and limit[memory]) are removed from the knownBrokenPods list, changing which violations are treated as known broken.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Test Structure And Quality ⚠️ Warning Error handling degraded: explicit IsNotFound check removed. Now silently ignores all Deployment lookup errors using err == nil pattern instead of failing the test or logging. Restore explicit error handling with err != nil and IsNotFound check for orphaned replicasets. Avoid silently swallowing API errors.
Microshift Test Compatibility ⚠️ Warning New test references namespaces unavailable on MicroShift (openshift-kube-apiserver, openshift-kube-controller-manager, openshift-kube-scheduler) without exemption protection. Add [Skipped:MicroShift] label to test name or guard with exutil.IsMicroShiftCluster() runtime check using g.Skip().
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: removing an exception for Gateway API Proxy resource limits, which aligns with the PR's primary objective.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed Test names are stable and deterministic. Both test declarations use static strings with no dynamic values, timestamps, generated identifiers, or node/pod names.
Single Node Openshift (Sno) Test Compatibility ✅ Passed New test validates pod resource constraints. Does not assume multiple nodes, HA failover, or topology constraints. Works identically on SNO and multi-node clusters.
Topology-Aware Scheduling Compatibility ✅ Passed Only a test file is modified. No deployment manifests or operator code introducing scheduling constraints. Check not applicable.
Ote Binary Stdout Contract ✅ Passed No OTE Binary Stdout Contract violations detected. The file contains only Ginkgo test code with all fmt calls being fmt.Sprintf (safe), no klog, and no process-level stdout writes.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Test contains no IPv4 assumptions, hardcoded addresses, external connectivity requirements, or IP family checks. All operations are cluster-internal using standard Kubernetes APIs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 7, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Previously, there was a need for exception on Gateway API Proxy pods to allow resources limits, as this wasn't properly configurable.

The situation is now fixed on cluster ingress operator, and the exception can be removed.
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 7, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rikatz
Once this PR has been reviewed and has the lgtm label, please assign sdodson for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rikatz rikatz force-pushed the remove-sail-exception branch from d1b0eef to 275387f Compare May 7, 2026 16:30
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/extended/operators/resources.go (1)

123-131: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent swallowing of non-NotFound Deployment lookup errors may cause spurious test failures.

The previous code explicitly handled apierrors.IsNotFound (orphaned RS → keep RS ref) versus other errors (some alternate path). The new code collapses all error cases into a no-op: any error—transient 5xx, RBAC denial, rate-limiting—leaves the ref as kind=ReplicaSet, so the constructed controller key will be apps/v1/ReplicaSet/<ns>/<rs-name> instead of apps/v1/Deployment/<ns>/<deploy-name>.

Consequence: if such an error occurs at test runtime, any matching entry in exceptionGranted or knownBrokenPods (which are Deployment-keyed) will not match, and the violation is promoted from waitingForFix/excepted → notAllowed, causing a hard test failure unrelated to the actual resource-limit compliance.

Consider at minimum logging the unexpected error so it's visible in test output:

🛡️ Proposed fix
 if deploy, err := oc.KubeFramework().ClientSet.AppsV1().Deployments(pod.Namespace).Get(context.Background(), name, metav1.GetOptions{}); err == nil {
     ref.Name = deploy.Name
     ref.Kind = "Deployment"
     ref.APIVersion = "apps/v1"
+} else if !apierrors.IsNotFound(err) {
+    e2e.Logf("WARNING: unexpected error resolving Deployment for ReplicaSet %s/%s: %v", pod.Namespace, ref.Name, err)
 }

And restore the apierrors import:

+   "k8s.io/apimachinery/pkg/api/errors"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/operators/resources.go` around lines 123 - 131, When resolving
ReplicaSet refs in the switch case (the block that calls
oc.KubeFramework().ClientSet.AppsV1().Deployments(...).Get and then sets
ref.Name/ref.Kind/ref.APIVersion), do not silently ignore all errors: import
k8s.io/apimachinery/pkg/api/errors (apierrors), check apierrors.IsNotFound(err)
and keep the ReplicaSet ref in that case, but for any other non-nil error log
the unexpected error (with context: namespace, rs name and the error) and avoid
treating it as a benign NotFound — this preserves the original behavior for
orphaned RS while surfacing transient/permission errors so they don't produce
spurious test failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@test/extended/operators/resources.go`:
- Around line 123-131: When resolving ReplicaSet refs in the switch case (the
block that calls oc.KubeFramework().ClientSet.AppsV1().Deployments(...).Get and
then sets ref.Name/ref.Kind/ref.APIVersion), do not silently ignore all errors:
import k8s.io/apimachinery/pkg/api/errors (apierrors), check
apierrors.IsNotFound(err) and keep the ReplicaSet ref in that case, but for any
other non-nil error log the unexpected error (with context: namespace, rs name
and the error) and avoid treating it as a benign NotFound — this preserves the
original behavior for orphaned RS while surfacing transient/permission errors so
they don't produce spurious test failures.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: f7326b7f-5fca-45c9-8e2c-f06703cc9d14

📥 Commits

Reviewing files that changed from the base of the PR and between 65f4e3a and d1b0eef.

📒 Files selected for processing (1)
  • test/extended/operators/resources.go

@openshift-merge-bot openshift-merge-bot Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label May 7, 2026
@rikatz
Copy link
Copy Markdown
Member Author

rikatz commented May 7, 2026

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 7, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@rikatz: This pull request references Jira Issue OCPBUGS-55050, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (iamin@redhat.com), skipping review request.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@rikatz
Copy link
Copy Markdown
Member Author

rikatz commented May 7, 2026

I am leaving this as a draft just while openshift/cluster-ingress-operator#1440 is not merged yet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants