[POC] test: e2e test for UIPlugin cleanup on operator uninstall (COO-1404) by DavidRajnoha · Pull Request #1083 · rhobs/observability-operator

DavidRajnoha · 2026-05-06T11:59:36Z

Summary

Adds TestUIPluginUninstallCleanup e2e test that reproduces the bug where UIPlugin operands are orphaned when COO is uninstalled via OLM without manually deleting UIPlugin CRs first
Creates a monitoring UIPlugin with health-analyzer, simulates OLM uninstall (CSV + Subscription deletion), then asserts cleanup of UIPlugin CR, Deployments, Services, ServiceAccount, ClusterRole, ClusterRoleBinding, and pods
All 9 assertions currently fail, confirming the bug
Uses a custom waitForResourceAbsent instead of Framework.AssertResourceAbsent due to a silent-pass bug in the framework (e2e framework: AssertResourceAbsent silently passes when resource still exists #1082)
Includes self-healing t.Cleanup that reinstalls the Subscription after the test

Run modes

# Normal: auto-restores operator afterward
go test -v -timeout 20m ./test/e2e/... \
  -run TestUIPluginUninstallCleanup -count 1 \
  -args -operatorInstallNS="openshift-cluster-observability-operator"

# Inspect: pauses before restoring (bump -timeout accordingly)
go test -v -timeout 30m ./test/e2e/... \
  -run TestUIPluginUninstallCleanup -count 1 \
  -args -operatorInstallNS="openshift-cluster-observability-operator" \
  -postpone-restoration=10m

# Retain: never restores, full manual control
go test -v -timeout 20m ./test/e2e/... \
  -run TestUIPluginUninstallCleanup -count 1 \
  -args -operatorInstallNS="openshift-cluster-observability-operator" -retain

Example output (all subtests fail as expected)

--- FAIL: TestUIPluginUninstallCleanup/UIPlugin_CR_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/monitoring_plugin_deployment_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/health-analyzer_deployment_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/health-analyzer_service_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/monitoring_plugin_service_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/monitoring_plugin_service_account_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/components-health-view_ClusterRole_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/components-health-view_ClusterRoleBinding_is_deleted (180.00s)
--- FAIL: TestUIPluginUninstallCleanup/no_UIPlugin-managed_pods_remain_in_operator_namespace (300.00s)

Test plan

Verified on OCP 4.21.0 nightly — all 9 assertions fail, confirming orphaned resources
Cleanup reinstalls Subscription after run
-retain flag skips cleanup for manual inspection
-postpone-restoration=10m delays cleanup
go vet passes

Made with Cursor

Adds TestUIPluginUninstallCleanup to reproduce the bug where UIPlugin operands (Deployments, Services, ServiceAccounts, ClusterRoles, ClusterRoleBindings, pods) are orphaned when COO is uninstalled via OLM without manually deleting UIPlugin CRs first. The test creates a monitoring UIPlugin with health-analyzer enabled, simulates OLM uninstall by deleting the CSV and Subscription, then asserts all child resources are cleaned up. All 9 assertions currently fail, confirming the bug. Does not use Framework.AssertResourceAbsent due to a silent-pass bug (rhobs#1082); uses a custom waitForResourceAbsent with correct polling semantics. Includes self-healing cleanup that reinstalls the Subscription after the test, and a -postpone-restoration flag for manual cluster inspection. Co-authored-by: Cursor <cursoragent@cursor.com>

openshift-ci · 2026-05-06T11:59:41Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: DavidRajnoha
Once this PR has been reviewed and has the lgtm label, please assign jgbernalp for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-05-06T11:59:44Z

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)

work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a43c4e01-fb43-42dc-98f9-d036fceb255d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

✅ Review completed - (🔄 Check again to review again)

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-05-06T11:59:46Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

test/e2e/uiplugin_uninstall_test.go (1)
189-190: 💤 Low value

"--- parallel resource checks done ---" fires before any parallel subtest runs

In Go's testing model, subtests that call t.Parallel() pause immediately and are released only when the parent function returns. All nine t.Run calls at lines 144–187 return instantly (the subtests are queued, not executing). Line 190 therefore logs done before any parallel work begins. Consider removing or relabelling it (e.g., "--- parallel resource checks released ---").
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/uiplugin_uninstall_test.go` around lines 189 - 190, The log message
"t.Log(\"--- parallel resource checks done ---\")" fires before any t.Parallel()
subtests actually start because parent returns immediately after queuing
subtests; update the message or remove it: locate the t.Log call in the test
containing the sequence of t.Run(...) subtests that invoke t.Parallel() and
either delete that log or change it to a clearer phrase such as "--- parallel
resource checks released ---" (or move the log to after waiting for parallel
work to complete if you add explicit synchronization).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/uiplugin_uninstall_test.go`:
- Around line 144-187: The tests rely on hardcoded resource names and call
waitForResourceAbsent which treats NotFound as success, so a wrong name silently
passes; modify the test flow to verify the resource actually exists before
waiting for its deletion: add a pre-existence Get (using the same client/kube
context used by waitForResourceAbsent) for each named resource (e.g.,
"monitoring-sa", "components-health-view", "monitoring-components-health-view")
and fail the subtest immediately if Get returns NotFound or another unexpected
error, then call waitForResourceAbsent to poll for eventual deletion; update
waitForResourceAbsent (or its callers in uiplugin_uninstall_test.go) to perform
this two-step check so wrong names no longer produce instant, false-positive
success.

---

Nitpick comments:
In `@test/e2e/uiplugin_uninstall_test.go`:
- Around line 189-190: The log message "t.Log(\"--- parallel resource checks
done ---\")" fires before any t.Parallel() subtests actually start because
parent returns immediately after queuing subtests; update the message or remove
it: locate the t.Log call in the test containing the sequence of t.Run(...)
subtests that invoke t.Parallel() and either delete that log or change it to a
clearer phrase such as "--- parallel resource checks released ---" (or move the
log to after waiting for parallel work to complete if you add explicit
synchronization).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ddaac954-c82b-4a9b-bcff-d26c643d11f9

📥 Commits

Reviewing files that changed from the base of the PR and between 9bd5e3a and cf0b561.

📒 Files selected for processing (2)

test/e2e/main_test.go
test/e2e/uiplugin_uninstall_test.go

coderabbitai · 2026-05-06T12:07:01Z

+	t.Run("UIPlugin CR is deleted", func(t *testing.T) {
+		t.Parallel()
+		waitForResourceAbsent(t, "monitoring", "", &uiv1.UIPlugin{}, cleanupTimeout)
+	})
+
+	t.Run("monitoring plugin deployment is deleted", func(t *testing.T) {
+		t.Parallel()
+		waitForResourceAbsent(t, "monitoring", ns, &appsv1.Deployment{}, cleanupTimeout)
+	})
+
+	t.Run("health-analyzer deployment is deleted", func(t *testing.T) {
+		t.Parallel()
+		waitForResourceAbsent(t, "health-analyzer", ns, &appsv1.Deployment{}, cleanupTimeout)
+	})
+
+	t.Run("health-analyzer service is deleted", func(t *testing.T) {
+		t.Parallel()
+		waitForResourceAbsent(t, "health-analyzer", ns, &corev1.Service{}, cleanupTimeout)
+	})
+
+	t.Run("monitoring plugin service is deleted", func(t *testing.T) {
+		t.Parallel()
+		waitForResourceAbsent(t, "monitoring", ns, &corev1.Service{}, cleanupTimeout)
+	})
+
+	t.Run("monitoring plugin service account is deleted", func(t *testing.T) {
+		t.Parallel()
+		waitForResourceAbsent(t, "monitoring-sa", ns, &corev1.ServiceAccount{}, cleanupTimeout)
+	})
+
+	t.Run("components-health-view ClusterRole is deleted", func(t *testing.T) {
+		t.Parallel()
+		waitForResourceAbsent(t, "components-health-view", "", &rbacv1.ClusterRole{}, cleanupTimeout)
+	})
+
+	t.Run("components-health-view ClusterRoleBinding is deleted", func(t *testing.T) {
+		t.Parallel()
+		waitForResourceAbsent(t, "monitoring-components-health-view", "", &rbacv1.ClusterRoleBinding{}, cleanupTimeout)
+	})
+
+	t.Run("no UIPlugin-managed pods remain in operator namespace", func(t *testing.T) {
+		t.Parallel()
+		assertNoManagedPodsRemain(t, ctx, ns)
+	})


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded resource names create silent false-pass risk

waitForResourceAbsent polls Get and returns success immediately when apierrors.IsNotFound — which is indistinguishable from a wrong name and an actually-absent resource. If any of the hardcoded names ("monitoring-sa", "components-health-view", "monitoring-components-health-view") don't match what the operator actually creates, the subtests pass instantly regardless of whether orphaned resources remain, defeating the test's purpose.

A lightweight fix is to verify each resource exists before entering the absence poll:

🛡️ Proposed pre-existence check in waitForResourceAbsent

func waitForResourceAbsent(t *testing.T, name, namespace string, obj client.Object, timeout time.Duration) { t.Helper() key := client.ObjectKey{Name: name, Namespace: namespace} + // Confirm the resource actually exists before waiting for it to disappear, + // so a wrong name doesn't silently pass the assertion. + if err := f.K8sClient.Get(context.Background(), key, obj); apierrors.IsNotFound(err) { + t.Logf("warning: %T %s/%s not found before waiting for absence — verify the resource name is correct", obj, namespace, name) + } err := wait.PollUntilContextTimeout(context.Background(), 5*time.Second, timeout, true, func(ctx context.Context) (bool, error) {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/uiplugin_uninstall_test.go` around lines 144 - 187, The tests rely on hardcoded resource names and call waitForResourceAbsent which treats NotFound as success, so a wrong name silently passes; modify the test flow to verify the resource actually exists before waiting for its deletion: add a pre-existence Get (using the same client/kube context used by waitForResourceAbsent) for each named resource (e.g., "monitoring-sa", "components-health-view", "monitoring-components-health-view") and fail the subtest immediately if Get returns NotFound or another unexpected error, then call waitForResourceAbsent to poll for eventual deletion; update waitForResourceAbsent (or its callers in uiplugin_uninstall_test.go) to perform this two-step check so wrong names no longer produce instant, false-positive success.

openshift-ci Bot added the do-not-merge/work-in-progress label May 6, 2026

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

DavidRajnoha mentioned this pull request May 7, 2026

COO-1404: fix: remove ui plugin finalizers in favor of k8s garbage collection #1084

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[POC] test: e2e test for UIPlugin cleanup on operator uninstall (COO-1404)#1083

[POC] test: e2e test for UIPlugin cleanup on operator uninstall (COO-1404)#1083
DavidRajnoha wants to merge 1 commit intorhobs:mainfrom
DavidRajnoha:COO-1404-uninstall-cleanup-e2e

DavidRajnoha commented May 6, 2026

Uh oh!

openshift-ci Bot commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026 •

edited

Loading

Review skipped

Uh oh!

openshift-ci Bot commented May 6, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DavidRajnoha commented May 6, 2026

Summary

Run modes

Example output (all subtests fail as expected)

Related

Test plan

Uh oh!

openshift-ci Bot commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

openshift-ci Bot commented May 6, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 6, 2026 •

edited

Loading