Skip to content

[POC] test: e2e test for UIPlugin cleanup on operator uninstall (COO-1404)#1083

Draft
DavidRajnoha wants to merge 1 commit intorhobs:mainfrom
DavidRajnoha:COO-1404-uninstall-cleanup-e2e
Draft

[POC] test: e2e test for UIPlugin cleanup on operator uninstall (COO-1404)#1083
DavidRajnoha wants to merge 1 commit intorhobs:mainfrom
DavidRajnoha:COO-1404-uninstall-cleanup-e2e

Conversation

@DavidRajnoha
Copy link
Copy Markdown
Collaborator

Summary

  • Adds TestUIPluginUninstallCleanup e2e test that reproduces the bug where UIPlugin operands are orphaned when COO is uninstalled via OLM without manually deleting UIPlugin CRs first
  • Creates a monitoring UIPlugin with health-analyzer, simulates OLM uninstall (CSV + Subscription deletion), then asserts cleanup of UIPlugin CR, Deployments, Services, ServiceAccount, ClusterRole, ClusterRoleBinding, and pods
  • All 9 assertions currently fail, confirming the bug
  • Uses a custom waitForResourceAbsent instead of Framework.AssertResourceAbsent due to a silent-pass bug in the framework (e2e framework: AssertResourceAbsent silently passes when resource still exists #1082)
  • Includes self-healing t.Cleanup that reinstalls the Subscription after the test

Run modes

# Normal: auto-restores operator afterward
go test -v -timeout 20m ./test/e2e/... \
  -run TestUIPluginUninstallCleanup -count 1 \
  -args -operatorInstallNS="openshift-cluster-observability-operator"

# Inspect: pauses before restoring (bump -timeout accordingly)
go test -v -timeout 30m ./test/e2e/... \
  -run TestUIPluginUninstallCleanup -count 1 \
  -args -operatorInstallNS="openshift-cluster-observability-operator" \
  -postpone-restoration=10m

# Retain: never restores, full manual control
go test -v -timeout 20m ./test/e2e/... \
  -run TestUIPluginUninstallCleanup -count 1 \
  -args -operatorInstallNS="openshift-cluster-observability-operator" -retain

Example output (all subtests fail as expected)

--- FAIL: TestUIPluginUninstallCleanup/UIPlugin_CR_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/monitoring_plugin_deployment_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/health-analyzer_deployment_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/health-analyzer_service_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/monitoring_plugin_service_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/monitoring_plugin_service_account_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/components-health-view_ClusterRole_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/components-health-view_ClusterRoleBinding_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/no_UIPlugin-managed_pods_remain_in_operator_namespace (300.00s)

Related

Test plan

  • Verified on OCP 4.21.0 nightly — all 9 assertions fail, confirming orphaned resources
  • Cleanup reinstalls Subscription after run
  • -retain flag skips cleanup for manual inspection
  • -postpone-restoration=10m delays cleanup
  • go vet passes

Made with Cursor

Adds TestUIPluginUninstallCleanup to reproduce the bug where UIPlugin
operands (Deployments, Services, ServiceAccounts, ClusterRoles,
ClusterRoleBindings, pods) are orphaned when COO is uninstalled via OLM
without manually deleting UIPlugin CRs first.

The test creates a monitoring UIPlugin with health-analyzer enabled,
simulates OLM uninstall by deleting the CSV and Subscription, then
asserts all child resources are cleaned up. All 9 assertions currently
fail, confirming the bug.

Does not use Framework.AssertResourceAbsent due to a silent-pass bug
(rhobs#1082); uses a custom waitForResourceAbsent
with correct polling semantics.

Includes self-healing cleanup that reinstalls the Subscription after the
test, and a -postpone-restoration flag for manual cluster inspection.

Co-authored-by: Cursor <cursoragent@cursor.com>
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: DavidRajnoha
Once this PR has been reviewed and has the lgtm label, please assign jgbernalp for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 6, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a43c4e01-fb43-42dc-98f9-d036fceb255d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 6, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e/uiplugin_uninstall_test.go (1)

189-190: 💤 Low value

"--- parallel resource checks done ---" fires before any parallel subtest runs

In Go's testing model, subtests that call t.Parallel() pause immediately and are released only when the parent function returns. All nine t.Run calls at lines 144–187 return instantly (the subtests are queued, not executing). Line 190 therefore logs done before any parallel work begins. Consider removing or relabelling it (e.g., "--- parallel resource checks released ---").

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/uiplugin_uninstall_test.go` around lines 189 - 190, The log message
"t.Log(\"--- parallel resource checks done ---\")" fires before any t.Parallel()
subtests actually start because parent returns immediately after queuing
subtests; update the message or remove it: locate the t.Log call in the test
containing the sequence of t.Run(...) subtests that invoke t.Parallel() and
either delete that log or change it to a clearer phrase such as "--- parallel
resource checks released ---" (or move the log to after waiting for parallel
work to complete if you add explicit synchronization).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/uiplugin_uninstall_test.go`:
- Around line 144-187: The tests rely on hardcoded resource names and call
waitForResourceAbsent which treats NotFound as success, so a wrong name silently
passes; modify the test flow to verify the resource actually exists before
waiting for its deletion: add a pre-existence Get (using the same client/kube
context used by waitForResourceAbsent) for each named resource (e.g.,
"monitoring-sa", "components-health-view", "monitoring-components-health-view")
and fail the subtest immediately if Get returns NotFound or another unexpected
error, then call waitForResourceAbsent to poll for eventual deletion; update
waitForResourceAbsent (or its callers in uiplugin_uninstall_test.go) to perform
this two-step check so wrong names no longer produce instant, false-positive
success.

---

Nitpick comments:
In `@test/e2e/uiplugin_uninstall_test.go`:
- Around line 189-190: The log message "t.Log(\"--- parallel resource checks
done ---\")" fires before any t.Parallel() subtests actually start because
parent returns immediately after queuing subtests; update the message or remove
it: locate the t.Log call in the test containing the sequence of t.Run(...)
subtests that invoke t.Parallel() and either delete that log or change it to a
clearer phrase such as "--- parallel resource checks released ---" (or move the
log to after waiting for parallel work to complete if you add explicit
synchronization).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ddaac954-c82b-4a9b-bcff-d26c643d11f9

📥 Commits

Reviewing files that changed from the base of the PR and between 9bd5e3a and cf0b561.

📒 Files selected for processing (2)
  • test/e2e/main_test.go
  • test/e2e/uiplugin_uninstall_test.go

Comment on lines +144 to +187
t.Run("UIPlugin CR is deleted", func(t *testing.T) {
t.Parallel()
waitForResourceAbsent(t, "monitoring", "", &uiv1.UIPlugin{}, cleanupTimeout)
})

t.Run("monitoring plugin deployment is deleted", func(t *testing.T) {
t.Parallel()
waitForResourceAbsent(t, "monitoring", ns, &appsv1.Deployment{}, cleanupTimeout)
})

t.Run("health-analyzer deployment is deleted", func(t *testing.T) {
t.Parallel()
waitForResourceAbsent(t, "health-analyzer", ns, &appsv1.Deployment{}, cleanupTimeout)
})

t.Run("health-analyzer service is deleted", func(t *testing.T) {
t.Parallel()
waitForResourceAbsent(t, "health-analyzer", ns, &corev1.Service{}, cleanupTimeout)
})

t.Run("monitoring plugin service is deleted", func(t *testing.T) {
t.Parallel()
waitForResourceAbsent(t, "monitoring", ns, &corev1.Service{}, cleanupTimeout)
})

t.Run("monitoring plugin service account is deleted", func(t *testing.T) {
t.Parallel()
waitForResourceAbsent(t, "monitoring-sa", ns, &corev1.ServiceAccount{}, cleanupTimeout)
})

t.Run("components-health-view ClusterRole is deleted", func(t *testing.T) {
t.Parallel()
waitForResourceAbsent(t, "components-health-view", "", &rbacv1.ClusterRole{}, cleanupTimeout)
})

t.Run("components-health-view ClusterRoleBinding is deleted", func(t *testing.T) {
t.Parallel()
waitForResourceAbsent(t, "monitoring-components-health-view", "", &rbacv1.ClusterRoleBinding{}, cleanupTimeout)
})

t.Run("no UIPlugin-managed pods remain in operator namespace", func(t *testing.T) {
t.Parallel()
assertNoManagedPodsRemain(t, ctx, ns)
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded resource names create silent false-pass risk

waitForResourceAbsent polls Get and returns success immediately when apierrors.IsNotFound — which is indistinguishable from a wrong name and an actually-absent resource. If any of the hardcoded names ("monitoring-sa", "components-health-view", "monitoring-components-health-view") don't match what the operator actually creates, the subtests pass instantly regardless of whether orphaned resources remain, defeating the test's purpose.

A lightweight fix is to verify each resource exists before entering the absence poll:

🛡️ Proposed pre-existence check in waitForResourceAbsent
 func waitForResourceAbsent(t *testing.T, name, namespace string, obj client.Object, timeout time.Duration) {
 	t.Helper()
 	key := client.ObjectKey{Name: name, Namespace: namespace}
+	// Confirm the resource actually exists before waiting for it to disappear,
+	// so a wrong name doesn't silently pass the assertion.
+	if err := f.K8sClient.Get(context.Background(), key, obj); apierrors.IsNotFound(err) {
+		t.Logf("warning: %T %s/%s not found before waiting for absence — verify the resource name is correct", obj, namespace, name)
+	}
 	err := wait.PollUntilContextTimeout(context.Background(), 5*time.Second, timeout, true, func(ctx context.Context) (bool, error) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/uiplugin_uninstall_test.go` around lines 144 - 187, The tests rely
on hardcoded resource names and call waitForResourceAbsent which treats NotFound
as success, so a wrong name silently passes; modify the test flow to verify the
resource actually exists before waiting for its deletion: add a pre-existence
Get (using the same client/kube context used by waitForResourceAbsent) for each
named resource (e.g., "monitoring-sa", "components-health-view",
"monitoring-components-health-view") and fail the subtest immediately if Get
returns NotFound or another unexpected error, then call waitForResourceAbsent to
poll for eventual deletion; update waitForResourceAbsent (or its callers in
uiplugin_uninstall_test.go) to perform this two-step check so wrong names no
longer produce instant, false-positive success.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant