WIP: OCPBUGS-80958: Kube apiserver node replace#2178
WIP: OCPBUGS-80958: Kube apiserver node replace#2178bhperry wants to merge 1 commit intoopenshift:masterfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
@bhperry: This pull request references Jira Issue OCPBUGS-80958, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: ⛔ Files ignored due to path filters (13)
📒 Files selected for processing (4)
WalkthroughAdds Node UID tracking to the static pod operator: reconciliation now compares NodeName and NodeUID when pruning/updating NodeStatuses, records NodeUID in apply configurations and events, and test fixtures/helpers are updated. Also adds two Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 warning, 3 inconclusive)
✅ Passed checks (8 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 52 minutes and 25 seconds.Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bhperry The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
pkg/operator/staticpod/controller/node/node_controller_test.go (3)
740-740: Remove unusedexpectedNodesfield.
expectedNodes [][2]stringis declared in the table struct but never populated or read by any of the four scenarios. Drop it to avoid confusion about what the test actually asserts.♻️ Proposed change
tests := []struct { name string startNodes []runtime.Object startNodeStatus []operatorv1.NodeStatus - expectedNodes [][2]string evaluateNodeStatus func([]operatorv1.NodeStatus) error }{🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/staticpod/controller/node/node_controller_test.go` at line 740, The test table struct contains an unused field expectedNodes [][2]string; remove this field from the table type declaration in node_controller_test.go and from any table literal entries so the scenarios only include fields actually used by the tests (ensure you update the table struct definition that declares expectedNodes and all test case literals that reference it, e.g., in the testCases slice for the Node controller tests).
786-807: Consider asserting the replacement event and JSONPatch, not just the finalNodeStatuses.The "node replaced mismatched uid in status" scenario is the most important new behavior this PR introduces, but the test only checks the final
NodeStatuses. Nothing verifies that:
- The controller emits
MasterNodeReplaced(with old+new UIDs) via the recorder — seenode_controller.goline 106.- A
jsonPatchis created to remove the stale entry at the expected index before the apply — seenode_controller.goline 117.Both are observable via
kubeClient.Fake.Actions()(event create) andfakeStaticPodOperatorClient.GetPatchedOperatorStatus(), as already done inTestNodeControllerDegradedConditionTypescenarios 5 and 12. Adding these assertions would lock in the behavior the PR description calls out ("ensure the node status is reset by comparing the node UID"). Non-blocking.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/staticpod/controller/node/node_controller_test.go` around lines 786 - 807, The test currently only verifies final NodeStatuses; update the "node replaced mismatched uid in status" case to also assert that the controller emitted a MasterNodeReplaced event and that a JSONPatch removing the stale status entry was created: after running the controller, inspect kubeClient.Fake.Actions() for an event create matching MasterNodeReplaced (including old and new UIDs) and call fakeStaticPodOperatorClient.GetPatchedOperatorStatus() to retrieve the patch and assert it contains the expected JSONPatch operation that removes the stale index (as produced by the jsonPatch logic in node_controller.go). Ensure you reference the test case by name and check the exact event reason/message and the remove op in the patch, failing the test if either is missing.
808-829: Scenario name is misleading.The "node removed" scenario actually exercises the case where the previously tracked node name is gone and a different node has appeared (
test-node-1→test-node-2), i.e. a combined remove-and-add, not a pure removal. Consider renaming to something like"node removed and replaced by different name"to match the existing"single-node-removed"case inTestNewNodeControllerwhich tests the pure-removal behavior. Optional.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/staticpod/controller/node/node_controller_test.go` around lines 808 - 829, The test case in the table uses a misleading name "node removed" but actually exercises a remove-and-add scenario (previous NodeStatus test-node-1 replaced by startNodes test-node-2); update the test case's name field in node_controller_test.go to something clearer (for example "node removed and replaced by different name") so it accurately reflects the scenario and aligns with the existing "single-node-removed" naming in TestNewNodeController; locate the table entry by its name field in the test case slice and change only that string (no behavior changes).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Around line 5-7: The go.mod contains forked-replace directives (replace
github.com/openshift/api => github.com/bhperry/openshift-api
v0.0.0-20260422220337-85d77b44a5a5 and replace github.com/openshift/client-go =>
github.com/bhperry/openshift-client-go v0.0.0-20260424144941-de7d2d9bafad) that
must be removed from the shared module manifest; delete these two replace lines
and either move the local overrides into a developer-only go.work (or similar
local config) or update the require lines to point to the approved upstream
module revisions before committing. Ensure no other project-wide go.mod contains
the same forked replace entries and run go mod tidy to verify dependency graph
is clean.
---
Nitpick comments:
In `@pkg/operator/staticpod/controller/node/node_controller_test.go`:
- Line 740: The test table struct contains an unused field expectedNodes
[][2]string; remove this field from the table type declaration in
node_controller_test.go and from any table literal entries so the scenarios only
include fields actually used by the tests (ensure you update the table struct
definition that declares expectedNodes and all test case literals that reference
it, e.g., in the testCases slice for the Node controller tests).
- Around line 786-807: The test currently only verifies final NodeStatuses;
update the "node replaced mismatched uid in status" case to also assert that the
controller emitted a MasterNodeReplaced event and that a JSONPatch removing the
stale status entry was created: after running the controller, inspect
kubeClient.Fake.Actions() for an event create matching MasterNodeReplaced
(including old and new UIDs) and call
fakeStaticPodOperatorClient.GetPatchedOperatorStatus() to retrieve the patch and
assert it contains the expected JSONPatch operation that removes the stale index
(as produced by the jsonPatch logic in node_controller.go). Ensure you reference
the test case by name and check the exact event reason/message and the remove op
in the patch, failing the test if either is missing.
- Around line 808-829: The test case in the table uses a misleading name "node
removed" but actually exercises a remove-and-add scenario (previous NodeStatus
test-node-1 replaced by startNodes test-node-2); update the test case's name
field in node_controller_test.go to something clearer (for example "node removed
and replaced by different name") so it accurately reflects the scenario and
aligns with the existing "single-node-removed" naming in TestNewNodeController;
locate the table entry by its name field in the test case slice and change only
that string (no behavior changes).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0131c169-b1c2-4ef4-ad19-3d6cd3ae6310
⛔ Files ignored due to path filters (6)
go.sumis excluded by!**/*.sumvendor/github.com/openshift/api/operator/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/client-go/operator/applyconfigurations/internal/internal.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/client-go/operator/applyconfigurations/operator/v1/nodestatus.gois excluded by!**/vendor/**,!vendor/**vendor/modules.txtis excluded by!**/vendor/**,!vendor/**
📒 Files selected for processing (4)
go.modpkg/operator/staticpod/controller/node/node_controller.gopkg/operator/staticpod/controller/node/node_controller_test.gopkg/operator/v1helpers/test_helpers.go
| replace github.com/openshift/api => github.com/bhperry/openshift-api v0.0.0-20260422220337-85d77b44a5a5 | ||
|
|
||
| replace github.com/openshift/client-go => github.com/bhperry/openshift-client-go v0.0.0-20260424144941-de7d2d9bafad |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify fork-based replace directives and estimate impact scope (read-only).
set -euo pipefail
echo "== replace directives in go.mod =="
rg -n '^replace\s+' go.mod
echo
echo "== non-openshift target replacements (potential fork/private targets) =="
awk '
$1=="replace" && $3=="=>" {
src=$2; dst=$4;
if (dst !~ /^github\.com\/openshift\//) print NR ":" $0
}
' go.mod
echo
echo "== import usage of affected modules across repo =="
rg -n --type=go 'github.com/openshift/(api|client-go)/' -g '!**/vendor/**'Repository: openshift/library-go
Length of output: 49882
Remove personal-fork replace directives from the shared module manifest.
Lines 5 and 7 redirect core dependencies to github.com/bhperry/* forks. These appear in 350+ imports across the codebase and affect critical operator, encryption, configuration, and routing functionality. Committing fork-based replacements breaks reproducibility for CI and downstream consumers if fork availability or history changes. Keep these overrides local (e.g., go.work) or switch to upstream module revisions before merge.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go.mod` around lines 5 - 7, The go.mod contains forked-replace directives
(replace github.com/openshift/api => github.com/bhperry/openshift-api
v0.0.0-20260422220337-85d77b44a5a5 and replace github.com/openshift/client-go =>
github.com/bhperry/openshift-client-go v0.0.0-20260424144941-de7d2d9bafad) that
must be removed from the shared module manifest; delete these two replace lines
and either move the local overrides into a developer-only go.work (or similar
local config) or update the require lines to point to the approved upstream
module revisions before committing. Ensure no other project-wide go.mod contains
the same forked replace entries and run go mod tidy to verify dependency graph
is clean.
|
/jira refresh |
|
@bhperry: This pull request references Jira Issue OCPBUGS-80958, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest |
|
@bhperry: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
69d77b8 to
9f5d47f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/operator/staticpod/controller/node/node_controller.go (1)
110-118:⚠️ Potential issue | 🟠 MajorTighten the remove patch precondition to the UID-aware match.
The JSONPatch delete still only guards on
nodeName. If the status list has already been rewritten for the same name, this can delete the wrong entry. The removal path should validate the UID as well, or re-fetch before patching.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/staticpod/controller/node/node_controller.go` around lines 110 - 118, The remove operation currently only tests /status/nodeStatuses/%d/nodeName via jsonPatch.WithRemove and jsonpatch.NewTestCondition using nodeState.NodeName, which can remove the wrong entry if names were reused; update the precondition to also validate the node UID (e.g., add a test for /status/nodeStatuses/%d/uid using nodeState.UID or otherwise combine name+uid checks) before calling jsonPatch.WithRemove (referencing jsonPatch.WithRemove, jsonpatch.NewTestCondition, nodeState.NodeName, nodeState.UID, and removedNodeStatusesCounter), or alternatively re-fetch the status list and re-compute removeAtIndex immediately prior to patching to ensure the exact entry is removed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/operator/staticpod/controller/node/node_controller.go`:
- Around line 98-103: The current condition treats an empty nodeState.NodeUID as
a match which preserves legacy statuses and can keep stale state after a node is
deleted/recreated; update the matching logic so you only consider it a match
when nodeState.NodeUID exactly equals the currentUID (i.e., remove the "||
nodeState.NodeUID == \"\"" branch) in the block building newTargetNodeState (the
code that calls
applyoperatorv1.NodeStatus().WithNodeName(...).WithNodeUID(currentUID) and
appends to newTargetNodeStates), and apply the same strict-equality change to
the analogous logic around lines 126-136 so legacy empty UIDs are not treated as
matches and will be reset instead of preserved.
---
Outside diff comments:
In `@pkg/operator/staticpod/controller/node/node_controller.go`:
- Around line 110-118: The remove operation currently only tests
/status/nodeStatuses/%d/nodeName via jsonPatch.WithRemove and
jsonpatch.NewTestCondition using nodeState.NodeName, which can remove the wrong
entry if names were reused; update the precondition to also validate the node
UID (e.g., add a test for /status/nodeStatuses/%d/uid using nodeState.UID or
otherwise combine name+uid checks) before calling jsonPatch.WithRemove
(referencing jsonPatch.WithRemove, jsonpatch.NewTestCondition,
nodeState.NodeName, nodeState.UID, and removedNodeStatusesCounter), or
alternatively re-fetch the status list and re-compute removeAtIndex immediately
prior to patching to ensure the exact entry is removed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e0840c87-5aa4-4d00-bb60-525c3ca309aa
⛔ Files ignored due to path filters (6)
go.sumis excluded by!**/*.sumvendor/github.com/openshift/api/operator/v1/types.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/api/operator/v1/zz_generated.swagger_doc_generated.gois excluded by!**/vendor/**,!vendor/**,!**/zz_generated*vendor/github.com/openshift/client-go/operator/applyconfigurations/internal/internal.gois excluded by!**/vendor/**,!vendor/**vendor/github.com/openshift/client-go/operator/applyconfigurations/operator/v1/nodestatus.gois excluded by!**/vendor/**,!vendor/**vendor/modules.txtis excluded by!**/vendor/**,!vendor/**
📒 Files selected for processing (4)
go.modpkg/operator/staticpod/controller/node/node_controller.gopkg/operator/staticpod/controller/node/node_controller_test.gopkg/operator/v1helpers/test_helpers.go
✅ Files skipped from review due to trivial changes (1)
- go.mod
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/operator/v1helpers/test_helpers.go
- pkg/operator/staticpod/controller/node/node_controller_test.go
9f5d47f to
849d9a8
Compare
When master node is deleted and recreated with the same name, ensure the node status is reset by comparing UID.
Summary by CodeRabbit
New Features
Tests