adding investigate-ci-failure skills #1659
📝 Walkthrough
This PR introduces a comprehensive Cursor skill guide for investigating CI failures in the lightspeed-operator repository. The guide covers end-to-end procedures for diagnosing Prow and Konflux failures from PR URLs, including artifact triage, task analysis, job reference data, and validation workflows.
Changes
CI Failure Investigation Skill Guide
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 5 passed
[APPROVALNOTIFIER] This PR is NOT APPROVED
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.cursor/skills/investigate-ci-failure/SKILL.md:
- Around line 218-223: The table row for "Enterprise Contract / trusted links"
is missing the third cell and triggers MD056; update that row in the Markdown
table to include an `output.summary` column (e.g., add a third pipe cell with a
short summary like "Use EC docs/UI — no task table" or "N/A") so the row has
three cells matching the headers (`Check type`, `Task table location`,
`output.summary`) and ensure the row uses the same pipe-separated format as the
other rows.
- Around line 54-56: Several fenced code blocks in SKILL.md are missing language
tags (e.g., the block containing the Prow URL ```` ``` https://prow... ````);
update every fenced block to include an appropriate language identifier so
markdownlint MD040 passes: use "text" for plain URLs/snippets, "bash" for
shell/CLI examples, and "json" for JSON payloads. Scan the file for all
triple-backtick blocks (including the ones noted in the review) and add the
correct tag right after the opening backticks (for example change ``` to ```text
or ```bash/```json) keeping the block content unchanged.
- Around line 563-565: The example uses a literal '*' in the HTTP URL (curl -sf
"$BASE/pull-ci-openshift-lightspeed-operator-main-unit/*/finished.json"), which
doesn't work because shells don't expand wildcards in remote URLs; replace the
one-step wildcard example with a two-step approach: first retrieve the builds
index (e.g., list the directory at
"$BASE/pull-ci-openshift-lightspeed-operator-main-unit/"), determine the desired
build_id (latest or explicit), then request that specific finished.json URL
(i.e., construct and curl "$BASE/.../{build_id}/finished.json"); update the
SKILL.md example to show these two steps and emphasize preferring an explicit
build_id rather than a wildcard.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f51eeaca-b8e0-417f-bdae-c8b7e6008285
📒 Files selected for processing (1)
.cursor/skills/investigate-ci-failure/SKILL.md
```
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/{org}_{repo}/{pr}/{job_name}/{build_id}
```
Add language identifiers to fenced code blocks to satisfy markdownlint (MD040).
Several fenced blocks are missing a language tag. Add text/bash/json as appropriate to keep docs lint-clean.
Also applies to: 70-72, 78-80, 94-96, 99-112, 166-168, 422-424, 435-439, 443-447
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 54-54: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
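The MD040 finding can be reproduced locally without markdownlint. The following is a minimal sketch, not the linter's own logic: the function name `find_untagged_fences` is hypothetical, and it assumes three-backtick fences only (tilde fences and nested fences are not handled).

```shell
# Sketch: list opening code fences that carry no language tag.
# Assumption: fences use three backticks; tilde fences are ignored.
find_untagged_fences() {
  awk '
    BEGIN { fence = sprintf("%c%c%c", 96, 96, 96) }  # three backtick characters
    {
      line = $0
      sub(/^[ \t]+/, "", line)                # ignore leading indentation
      if (substr(line, 1, 3) != fence) next   # not a fence line
      if (in_block) { in_block = 0; next }    # closing fence, nothing to check
      in_block = 1                            # opening fence
      tag = substr(line, 4)
      gsub(/[ \t]/, "", tag)
      if (tag == "") print FNR ": fenced block has no language tag"
    }
  ' "$1"
}
```

Running `find_untagged_fences .cursor/skills/investigate-ci-failure/SKILL.md` should print one line per untagged opening fence, which can be compared against the line ranges listed in the review.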
| Check type | Task table location | `output.summary` |
|---|---|---|
| Build pipelines (`lightspeed-operator-on-pull-request`, `ols-bundle-on-pull-request`) | **`output.text`** — HTML `<h4>Task Statuses:</h4>` table with 🟢/🔴 and per-task Konflux UI log links | Often short (e.g. "Build pipeline … has passed") |
| Integration tests (`operator-e2e-tests-*`, `upgrade-e2e-tests`, `service-e2e-tests-*`, console tests) | **`output.text`** — markdown table: Task \| Duration \| Status \| Details; pipelinerun link at top | One line: "Integration test for component … has passed/failed" |
| Enterprise Contract / trusted links | `https://red.ht/trusted` in `gh pr checks` — use EC docs or UI; not a task table in the API |
Fix the table row with missing third column (MD056).
The Enterprise Contract / trusted links row has only 2 cells in a 3-column table, which breaks rendering/linting. Add the missing output.summary cell.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~221-~221: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ..., console tests) | **output.text`** — markdown table: Task | Duration | Status | De...
(MARKDOWN_NNP)
🪛 markdownlint-cli2 (0.22.1)
[warning] 222-222: Table column count
Expected: 3; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
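A column-count mismatch like this one can be spotted with a short script. This is a rough sketch under stated assumptions, not how markdownlint implements MD056: the name `check_table_columns` is hypothetical, the input is assumed to hold a single pipe table whose rows start and end with `|`, and `\|` is assumed to be the only way a literal pipe appears inside a cell (pipes inside code spans are not handled).

```shell
# Sketch: report table rows whose cell count differs from the header row.
# Assumptions: one pipe table per file; rows start and end with "|";
# a literal pipe inside a cell is always written as "\|".
check_table_columns() {
  awk '
    /^[ \t]*\|/ {
      row = $0
      gsub(/\\\|/, "", row)              # drop escaped pipes inside cells
      cells = gsub(/\|/, "|", row) - 1   # N+1 delimiters enclose N cells
      if (!expected) { expected = cells; next }   # first row sets the width
      if (cells != expected)
        print FNR ": expected " expected " cells, found " cells
    }
  ' "$1"
}
```

Applied to a file containing only the table above, the sketch should flag the Enterprise Contract row, since it has two cells against a three-column header.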
curl -sf "$BASE/pull-ci-openshift-lightspeed-operator-main-unit/*/finished.json" 2>/dev/null | head -1
# Prefer explicit build_id from gh pr checks URL tail
Fix the broken wildcard curl example in smoke tests.
curl does not expand * in HTTP URLs, so this command won’t reliably fetch finished.json and can mislead validation.
Proposed doc fix
-# Or take build_id from prow target_url in statuses API
-curl -sf "$BASE/pull-ci-openshift-lightspeed-operator-main-unit/*/finished.json" 2>/dev/null | head -1
-# Prefer explicit build_id from gh pr checks URL tail
+# Prefer explicit build_id from prow target_url (statuses API) or gh pr checks URL tail
+BUILD_ID=$(gh api "repos/$REPO/statuses/$SHA" \
+ --jq '.[] | select(.context=="ci/prow/unit") | .target_url' \
+ | awk -F/ '{print $NF}' | head -1)
+curl -sf "$BASE/pull-ci-openshift-lightspeed-operator-main-unit/$BUILD_ID/finished.json" | head -1
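The explicit-build-ID approach can be sketched as a small helper that takes a Prow job URL and returns the matching `finished.json` URL. This is an illustration only: the function name `prow_finished_url` is hypothetical, the artifact base URL is supplied by the caller because its value is not shown here, and the check that build IDs are purely numeric is an assumption.

```shell
# Sketch: derive job name and build ID from a Prow job URL, then build the
# finished.json URL. The base URL is caller-supplied (not shown in this PR page).
prow_finished_url() {
  base=${1%/}            # artifact base URL, trailing slash removed
  url=${2%/}             # Prow job URL ending in {job_name}/{build_id}
  build_id=${url##*/}    # last path segment
  rest=${url%/*}
  job_name=${rest##*/}   # second-to-last path segment
  case $build_id in
    # Assumption: build IDs are numeric; anything else is rejected.
    ''|*[!0-9]*) echo "not a numeric build ID: $build_id" >&2; return 1 ;;
  esac
  printf '%s/%s/%s/finished.json\n' "$base" "$job_name" "$build_id"
}
```

A possible use is `curl -sf "$(prow_finished_url "$BASE" "$TARGET_URL")"`, where `TARGET_URL` is a placeholder for the URL taken from the statuses API or the `gh pr checks` output.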
@blublinsky: all tests passed!
Description
Summary
Adds an operator-specific investigate-ci-failure Cursor skill so agents can triage failed PR checks on openshift/lightspeed-operator using Prow and Konflux without Konflux SSO for most cases.
Adapted from the lightspeed-service skill and validated against PR #1641 (smoke-tested gh, GCS, and Quay paths).
What’s in the skill
- Dual CI entry point: `gh pr checks`, Prow commit statuses vs Konflux check runs
- Prow (operator): GCS artifact layout for `ci/prow/*`, especially `bundle-e2e-4-XX/e2e-test/build-log.txt` (Ginkgo in `test/e2e/`), distinct from service `e2e-ols-cluster/`
- Konflux: task tables in check run `output.text` (not `output.summary`); build vs integration check patterns; Quay `ols-operator-artifacts:{sha}` and `on-pr-{sha}` retention
- Repo reference: Prow job list, Konflux pipeline/check names, `.tekton` triggers (including `related_images.json` → `ols-bundle-on-pull-request`), local command mapping (`make test`, etc.)
- Triage workflow: Prow → GCS → Konflux API → oras → Konflux UI fallback; report template (retry / fix / escalate)
- Skill validation: copy-paste smoke commands against a known green PR
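As an illustration of the Konflux item above, here is a minimal sketch that pulls non-passing rows out of an integration-test task table read from `output.text`. The function name `failed_tasks` is hypothetical. The sketch assumes the table rows start with `|` and follow the Task | Duration | Status | Details column order. The passing label is passed in by the caller because the exact status strings are not given here.

```shell
# Sketch: print task rows whose Status cell differs from the passing label.
# Assumptions: rows start with "|", columns are Task | Duration | Status | Details,
# and the passing label ($1) is whatever string the check run actually uses.
failed_tasks() {
  awk -F'|' -v ok="$1" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF >= 5 {
      task = trim($2); status = trim($4)
      if (task == "Task" || task ~ /^:?-+:?$/) next   # header and separator rows
      if (status != ok) print task ": " status
    }
  '
}
```

One way to feed it is `gh api "repos/$REPO/check-runs/$CHECK_RUN_ID" --jq '.output.text' | failed_tasks "Succeeded"`, where `CHECK_RUN_ID` is a placeholder and `"Succeeded"` is an assumed label that should be checked against a real check run first.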
How to use
Invoke when a PR has red checks, e.g. `/investigate-ci-failure` or `@` the skill with a PR URL. The skill sets `disable-model-invocation: true`, so it does not auto-attach; invoke it explicitly.
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing