SREP-4796: Allow CreateMustGather alert in conformance tests#31128
SREP-4796: Allow CreateMustGather alert in conformance tests#31128dustman9000 wants to merge 3 commits intoopenshift:mainfrom
Conversation
CreateMustGather is a managed-cluster-config meta-alert that fires when any other alert has been active for 15 minutes, triggering CAD to collect a must-gather. It fires during cluster install settling and fails the alerts-in-firing-state conformance test. Jira: https://redhat.atlassian.net/browse/SREP-4796
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@dustman9000: This pull request references SREP-4796 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughAdds a regex-style allowed alert pattern ChangesAllowed-alert matching and skip decision
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dustman9000 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Scheduling required tests: |
|
/retest |
The CAD team plans to add a suffix to the alert name. Use a regex pattern so the allowlist matches both current and future variants. Jira: https://redhat.atlassian.net/browse/SREP-4796
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/monitortests/testframework/legacytestframeworkmonitortests/alerts.go`:
- Line 243: The loop in isSkippedAlert currently ignores the error from
regexp.MatchString which hides malformed regexes in AllowedAlertNames; change
isSkippedAlert to check the error returned by regexp.MatchString("^"+a+"$",
alertName) and fail fast by returning the error (i.e., change signature to
isSkippedAlert(...) (bool, error) or panic/log.Fatalf in test code), update all
callers accordingly to propagate or handle the error, and ensure any malformed
pattern causes an immediate test failure rather than being silently skipped.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 17d9042d-bd24-4629-aa13-473c77eed93a
📒 Files selected for processing (2)
pkg/monitortestlibrary/allowedalerts/types.gopkg/monitortests/testframework/legacytestframeworkmonitortests/alerts.go
✅ Files skipped from review due to trivial changes (1)
- pkg/monitortestlibrary/allowedalerts/types.go
| // Some alerts we always skip over in CI: | ||
| for _, a := range allowedalerts.AllowedAlertNames { | ||
| if a == alertName { | ||
| if matched, _ := regexp.MatchString("^"+a+"$", alertName); matched { |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Verify ignored regexp errors in isSkippedAlert:"
rg -n --type=go 'regexp\.MatchString\(' pkg/monitortests/testframework/legacytestframeworkmonitortests/alerts.go
rg -n --type=go 'if matched,\s*_\s*:=' pkg/monitortests/testframework/legacytestframeworkmonitortests/alerts.go
echo
echo "Inspect AllowedAlertNames patterns to audit regex validity/intent:"
awk '/AllowedAlertNames/,/\}/ {print NR ":" $0}' pkg/monitortestlibrary/allowedalerts/types.goRepository: openshift/origin
Length of output: 1156
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check imports in the alerts.go file
echo "Checking imports in alerts.go:"
head -20 pkg/monitortests/testframework/legacytestframeworkmonitortests/alerts.go
echo
echo "Checking full isSkippedAlert function:"
sed -n '238,260p' pkg/monitortests/testframework/legacytestframeworkmonitortests/alerts.goRepository: openshift/origin
Length of output: 1577
Handle regex errors explicitly in isSkippedAlert.
Line 243 ignores regexp.MatchString errors, so an invalid allowlist pattern fails silently and can cause hard-to-debug invariant test failures. Since AllowedAlertNames contains regex patterns like "CreateMustGather.*", malformed patterns must be caught early. Please fail fast instead of dropping the error.
Proposed fix
func isSkippedAlert(alertName string) bool {
// Some alerts we always skip over in CI:
for _, a := range allowedalerts.AllowedAlertNames {
- if matched, _ := regexp.MatchString("^"+a+"$", alertName); matched {
+ pattern := "^" + a + "$"
+ matched, err := regexp.MatchString(pattern, alertName)
+ if err != nil {
+ panic(fmt.Sprintf("invalid allowed alert regex %q: %v", pattern, err))
+ }
+ if matched {
return true
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if matched, _ := regexp.MatchString("^"+a+"$", alertName); matched { | |
| pattern := "^" + a + "$" | |
| matched, err := regexp.MatchString(pattern, alertName) | |
| if err != nil { | |
| panic(fmt.Sprintf("invalid allowed alert regex %q: %v", pattern, err)) | |
| } | |
| if matched { | |
| return true | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/monitortests/testframework/legacytestframeworkmonitortests/alerts.go` at
line 243, The loop in isSkippedAlert currently ignores the error from
regexp.MatchString which hides malformed regexes in AllowedAlertNames; change
isSkippedAlert to check the error returned by regexp.MatchString("^"+a+"$",
alertName) and fail fast by returning the error (i.e., change signature to
isSkippedAlert(...) (bool, error) or panic/log.Fatalf in test code), update all
callers accordingly to propagate or handle the error, and ensure any malformed
pattern causes an immediate test failure rather than being silently skipped.
|
@dustman9000: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Add
CreateMustGatherto theAllowedAlertNameslist so it does not fail the "shouldn't report any alerts in firing state" conformance test.Why
CreateMustGatheris a managed-cluster-config meta-alert that fires when any other alert has been active for 15 minutes. It triggers CAD (configuration-anomaly-detection) to collect a must-gather for debugging. On CI clusters, transient alerts during cluster install settling trigger CreateMustGather, which persists after the underlying alert resolves and fails conformance testing.This has caused the alert to be reverted from MCC twice.
Context
Test plan
Summary by CodeRabbit
New Features
Bug Fixes / Behavior Change