[TRUNK-17757] Document Test History tab on Flaky Tests test detail page#481

Open
samgutentag wants to merge 1 commit into main from sam-gutentag/test-run-history-tab

@samgutentag
Member

See PR body below

@trunk-staging-io

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@claude
Contributor

claude Bot commented Mar 16, 2026

Claude finished @samgutentag's task in 1m 21s


PR Review

  • Gather context
  • Read changed files
  • Review for typos, grammar, and formatting issues
  • Post inline comments
  • Final summary

The documentation for the Test History tab is well-structured and clearly written. The reorganization of the "Test case details" section into a tabbed layout, and the renaming of the old "Test history" section to "Status history" to avoid naming conflict, are both good improvements.

Two minor issues found (inline comments posted):

  1. flaky-tests/dashboard.md line 121 — Color inconsistency between the chart legend (Green/Red/Blue/Gray) and the row border color description (Green/Red/Blue/Orange). The gray/skipped color is missing from the row border list.

  2. flaky-tests/quarantining.md line 204 — "These views show every override, setting change, and comment..." conflates what the Events tab and Test History tab each show. The Events tab covers overrides/changes/comments; the Test History tab shows runs.

Everything else looks good — headings hierarchy is correct, tables are well-formatted, filter descriptions are accurate, and cross-references between docs are updated appropriately.

Comment thread flaky-tests/dashboard.md Outdated

<table><thead><tr><th width="188">Column</th><th>Description</th></tr></thead><tbody><tr><td>Timestamp</td><td>When the test ran, displayed in UTC.</td></tr><tr><td>Duration</td><td>How long the test took to execute.</td></tr><tr><td>Branch</td><td>The branch context. Shows "Mergequeue testing #NNN" for merge queue runs, "#NNN" for pull request runs, or the branch name for other runs.</td></tr><tr><td>Commit</td><td>The first 7 characters of the commit SHA.</td></tr></tbody></table>

Each row has a colored left border indicating the result: green for pass, red for fail, blue for quarantined, and orange for error.

The chart color legend above (lines 98–101) lists Gray = Skipped, but this row-border description mentions orange for error without including gray/skipped. The two color references are inconsistent — readers may wonder why the row borders and chart use different color sets.

Consider aligning them, e.g.:

Suggested change
Each row has a colored left border indicating the result: green for pass, red for fail, blue for quarantined, and orange for error.
Each row has a colored left border indicating the result: green for pass, red for fail, blue for quarantined, gray for skipped, and orange for error.

Comment thread flaky-tests/quarantining.md Outdated
<figure><img src="../.gitbook/assets/qurantine-individual-tests-revert.png" alt=""><figcaption></figcaption></figure>

To review a history of all quarantine changes on a test, use the **Quarantine Events** filter within the **Test History** section. This will show every override, setting change, and comment, along with the author and timestamp for each entry.
To review a history of all quarantine changes on a test, check the **Events** tab on the test details page. You can also use the **Quarantined** result filter in the **Test History** tab to see all quarantined runs. These views show every override, setting change, and comment, along with the author and timestamp for each entry.

The last sentence "These views show every override, setting change, and comment..." conflates two tabs that show different things: the Events tab shows overrides/setting changes/comments, while the Test History tab (filtered to Quarantined) shows runs. Consider splitting for clarity:

Suggested change
To review a history of all quarantine changes on a test, check the **Events** tab on the test details page. You can also use the **Quarantined** result filter in the **Test History** tab to see all quarantined runs. These views show every override, setting change, and comment, along with the author and timestamp for each entry.
To review a history of all quarantine changes on a test, check the **Events** tab on the test details page. You can also use the **Quarantined** result filter in the **Test History** tab to see all quarantined runs. The Events tab shows every override, setting change, and comment, along with the author and timestamp for each entry.

@claude
Contributor

claude Bot commented Mar 17, 2026

Claude finished @samgutentag's task in 1m 9s


PR Review

  • Gather context
  • Read changed files
  • Review for typos, grammar, and formatting issues
  • Post inline comments
  • Final summary

The new Test History tab documentation is well-structured and the reorganization into a tabbed layout is an improvement. The same two issues from the previous review remain unaddressed (inline comments re-posted):

  1. flaky-tests/dashboard.md line 121 — The daily runs chart legend lists Gray = Skipped, but the row border color list omits gray/skipped. Needs gray for skipped added.

  2. flaky-tests/quarantining.md line 204 — "These views show every override, setting change, and comment..." still conflates the Events tab (overrides/changes/comments) and Test History tab (runs). The final sentence should apply only to the Events tab.

Comment thread flaky-tests/dashboard.md Outdated

<table><thead><tr><th width="188">Column</th><th>Description</th></tr></thead><tbody><tr><td>Timestamp</td><td>When the test ran, displayed in UTC.</td></tr><tr><td>Duration</td><td>How long the test took to execute.</td></tr><tr><td>Branch</td><td>The branch context. Shows "Mergequeue testing #NNN" for merge queue runs, "#NNN" for pull request runs, or the branch name for other runs.</td></tr><tr><td>Commit</td><td>The first 7 characters of the commit SHA.</td></tr></tbody></table>

Each row has a colored left border indicating the result: green for pass, red for fail, blue for quarantined, and orange for error.

The daily runs chart legend above (lines ~93–101) includes Gray = Skipped, but gray/skipped is missing from this row border color list. Consider adding it for consistency:

Suggested change
Each row has a colored left border indicating the result: green for pass, red for fail, blue for quarantined, and orange for error.
Each row has a colored left border indicating the result: green for pass, red for fail, blue for quarantined, gray for skipped, and orange for error.

Comment thread flaky-tests/quarantining.md Outdated
<figure><img src="../.gitbook/assets/qurantine-individual-tests-revert.png" alt=""><figcaption></figcaption></figure>

To review a history of all quarantine changes on a test, use the **Quarantine Events** filter within the **Test History** section. This will show every override, setting change, and comment, along with the author and timestamp for each entry.
To review a history of all quarantine changes on a test, check the **Events** tab on the test details page. You can also use the **Quarantined** result filter in the **Test History** tab to see all quarantined runs. These views show every override, setting change, and comment, along with the author and timestamp for each entry.

The closing sentence "These views show every override, setting change, and comment..." still conflates the two tabs: the Events tab shows overrides/setting changes/comments, while the Test History tab only shows runs. Applying the cleaner split from the previous suggestion:

Suggested change
To review a history of all quarantine changes on a test, check the **Events** tab on the test details page. You can also use the **Quarantined** result filter in the **Test History** tab to see all quarantined runs. These views show every override, setting change, and comment, along with the author and timestamp for each entry.
To review a history of all quarantine changes on a test, check the **Events** tab on the test details page. You can also use the **Quarantined** result filter in the **Test History** tab to see all quarantined runs. The Events tab shows every override, setting change, and comment, along with the author and timestamp for each entry.

@samgutentag
Member Author

Triage status (2026-05-05): merge conflicts with main in flaky-tests/dashboard.md and flaky-tests/quarantining.md. The test details page has shifted significantly since this PR opened. May need to re-apply changes rather than a simple resolve.

@samgutentag
Member Author

samgutentag commented May 6, 2026

Verification status (2026-05-07): live

Verified: customers can use this. Ready to publish.

  • Eng PRs (sourced from Linear ticket TRUNK-17757 since PR body lacks refs): trunk2#3180, #3160, #3151 (all merged March 2026, merge commits still on main)
  • Linear ticket: TRUNK-17757 (referenced in title only; PR body says "See PR body below")
  • Flag: enableFlakeDetectionVnext
  • Rollout signals:
    • trunk2#3387 ("Remove enableFlakeDetectionVnext frontend feature flag and related code", merged 2026-03-27) deletes the gating flag and the legacy code path.
    • Code search for enableFlakeDetectionVnext in trunk2 returns 0 hits. Flag is gone from the codebase. Vnext is the unconditional default.

Merge readiness:

  • Branch state on GitHub: still CONFLICTING against main. Locally the rebase has been applied (worktree HEAD d41713b, 82 commits ahead of origin) but nothing has been pushed.
  • The two prior Claude review comments have already been addressed in the rebased branch:
    • The flagged row-border color list in flaky-tests/dashboard.md no longer exists. The remaining Green/Red/Blue/Gray list is the chart legend and is consistent.
    • The conflation in flaky-tests/quarantining.md is rewritten (now line 224): override/setting-change/comment is correctly attributed to the Events tab, with a separate sentence pointing to the Quarantined result filter on Test History.

Suggested next action: push the rebased branch, then merge.

Update the dashboard page to describe the new tabbed layout (Summary,
Test History, Monitors, Events) and add comprehensive documentation for
the Test History tab including the daily runs chart, filter bar, runs
table, and run detail panel.

Re-applied against current main and integrated with the branch glob
filter content from #576. Addresses prior PR review comments: row border
colors now include gray for skipped, and the quarantining wording is
scoped to the Events tab.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@samgutentag force-pushed the sam-gutentag/test-run-history-tab branch from 427dd03 to d41713b on May 7, 2026 07:05
@claude
Contributor

claude Bot commented May 7, 2026

Claude finished @samgutentag's task in 2m 47s


PR Review

  • Gather context
  • Read changed files
  • Review for typos, grammar, and formatting issues
  • Post inline comments
  • Final summary

The two issues from prior reviews are resolved in the rebased branch — the row-border color list now includes gray for skipped, and quarantining.md correctly attributes overrides/changes/comments to the Events tab and quarantined runs to the Test History tab. Nice cleanup.

Found four new items on this pass (inline comments posted):

  1. dashboard.md:91: #### **Failure types** is bold-styled, but every sibling H4 in the new Test History tab section (Daily runs chart, Filters, Runs table, Run detail panel) and the adjacent #### Failure details are plain. Drop the bold for consistency.

  2. dashboard.md:163-167: ### Status history is at the same heading level as the tabs but isn't in the tabs bullet list at lines 70–73. The heading hierarchy implies "another tab"; the content implies something else. Worth either listing it as a tab or prefacing the section with where it lives in the UI.

  3. dashboard.md:151 — "Mergequeue testing #NNN" uses one-word "Mergequeue," which appears nowhere else in this docs repo (the rest of the repo uses "merge queue"). Verify whether the UI literally renders it that way.

  4. dashboard.md:132-153 — The chart legend, the Result filter list, and the row-border color list each enumerate a different set of result categories (chart: Pass/Fail/Quarantined/Skipped; filter: Pass/Fail/Quarantined; borders: adds Error). If that's accurate to product behavior, fine — but a reader will trip on it. One sentence reconciling them would help.

Branch is currently flagged CONFLICTING against main on GitHub per the most recent triage comment; the local rebase has been applied but not pushed. None of the items above are merge-blockers — the four findings are minor polish, not correctness issues.

Comment thread flaky-tests/dashboard.md

The Summary tab shows an overview of the test’s recent run results and groups past failures by unique failure type.

#### **Failure types**

Formatting inconsistency. This H4 heading uses bold (#### **Failure types**), but its siblings under "Test History tab" — #### Daily runs chart, #### Filters, #### Runs table, #### Run detail panel — and the adjacent #### Failure details are plain. Either drop the bold here for consistency, or apply it to the others.

Suggested change
#### **Failure types**
#### Failure types

Comment thread flaky-tests/dashboard.md
Comment on lines +163 to +167
### Status history

<figure><picture><source srcset="../.gitbook/assets/test-history-dark.png" media="(prefers-color-scheme: dark)"><img src="../.gitbook/assets/test-history-light.png" alt=""></picture><figcaption></figcaption></figure>

Tests may transition between Healthy, Flaky, and Broken states multiple times over their lifetime. You can see previous status changes in the Status History view, including an explanation for why each transition occurred and which monitor triggered it.

Section placement is unclear. "Status history" is rendered at H3 — the same level as ### Summary tab and ### Test History tab — which makes it look like another tab. But it isn't listed in the tabs bullet list at lines 70–73 (Summary / Test History / Monitors / Events). Readers will reasonably ask: where does Status history actually live in the UI? Is it part of the Events tab? A separate panel on the page?

Consider either (a) adding it to the tabs bullet list at the top, or (b) prefacing this section with one sentence that locates it (e.g., "The Events tab also surfaces a Status History view, which..."). Right now the heading hierarchy implies one thing and the content implies another.

Comment thread flaky-tests/dashboard.md

The runs table displays a paginated list of individual test runs (25 per page) with the following columns:

<table><thead><tr><th width="188">Column</th><th>Description</th></tr></thead><tbody><tr><td>Timestamp</td><td>When the test ran, displayed in UTC.</td></tr><tr><td>Duration</td><td>How long the test took to execute.</td></tr><tr><td>Branch</td><td>The branch context. Shows "Mergequeue testing #NNN" for merge queue runs, "#NNN" for pull request runs, or the branch name for other runs.</td></tr><tr><td>Commit</td><td>The first 7 characters of the commit SHA.</td></tr></tbody></table>

Verify product naming. "Mergequeue testing" (one word) doesn't appear anywhere else in this docs repo — every other reference uses "merge queue" (two words). If the UI literally renders "Mergequeue testing #NNN" as a single token (e.g., it's an internal label not yet rebranded), then this is correct documentation. Otherwise it should match the product's user-facing form, e.g., "Merge queue testing #NNN".

Comment thread flaky-tests/dashboard.md
Comment on lines +132 to +153
<table><thead><tr><th width="188">Filter</th><th>Description</th></tr></thead><tbody><tr><td>Result</td><td>Toggle between <strong>All</strong>, <strong>Pass</strong>, <strong>Fail</strong>, and <strong>Quarantined</strong> to show only runs with a specific outcome.</td></tr><tr><td>SHA</td><td>Filter by commit hash. Matches runs whose SHA starts with the entered text.</td></tr><tr><td>Branch</td><td>Filter by branch name. Accepts exact names or glob patterns. Use <code>*</code> to match any sequence of characters and <code>?</code> to match a single character.</td></tr></tbody></table>

Branch filter examples:

| Pattern | Matches |
|---|---|
| `main` | The branch named `main` exactly |
| `release/*` | All release branches, e.g. `release/1.0`, `release/2.3` |
| `feature-??` | Feature branches with a two-character suffix, e.g. `feature-v2` |
| `trunk-merge/*` | All merge queue branches |

All filters combine using AND logic, so you can use them together to find specific runs. For example, select **Fail** and enter a branch name to see all failures on that branch. Click the **Reset** button to clear all active filters.

Filter state is saved in the URL, so you can share or bookmark a filtered view.

#### Runs table

The runs table displays a paginated list of individual test runs (25 per page) with the following columns:

<table><thead><tr><th width="188">Column</th><th>Description</th></tr></thead><tbody><tr><td>Timestamp</td><td>When the test ran, displayed in UTC.</td></tr><tr><td>Duration</td><td>How long the test took to execute.</td></tr><tr><td>Branch</td><td>The branch context. Shows "Mergequeue testing #NNN" for merge queue runs, "#NNN" for pull request runs, or the branch name for other runs.</td></tr><tr><td>Commit</td><td>The first 7 characters of the commit SHA.</td></tr></tbody></table>

Each row has a colored left border indicating the result: green for pass, red for fail, blue for quarantined, gray for skipped, and orange for error.
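
The filter behavior described in the hunk above (SHA prefix match, branch glob, AND combination) can be sketched roughly as follows. This is only an illustration, not the product's implementation: the `Run` type, the `matches` helper, and the sample SHAs are all hypothetical, and Python's `fnmatchcase` is assumed to be a close enough stand-in for the documented `*` and `?` semantics (it is case-sensitive and also treats `[...]` as a character class, which the docs do not mention).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Run:
    """Hypothetical shape of one row in the runs table."""
    result: str  # "pass", "fail", or "quarantined"
    sha: str     # commit SHA (made-up sample values below)
    branch: str


def matches(run: Run, result: Optional[str] = None,
            sha_prefix: str = "", branch_pattern: str = "") -> bool:
    """Return True only if the run satisfies every active filter (AND logic)."""
    if result is not None and run.result != result:
        return False
    # SHA filter: the run's SHA must start with the entered text.
    if sha_prefix and not run.sha.startswith(sha_prefix):
        return False
    # Branch filter: exact name or glob; '*' matches any sequence, '?' one character.
    if branch_pattern and not fnmatchcase(run.branch, branch_pattern):
        return False
    return True


# Made-up sample data, for illustration only.
runs = [
    Run("fail", "aaaaaaa1111", "release/1.0"),
    Run("pass", "bbbbbbb2222", "release/2.3"),
    Run("fail", "ccccccc3333", "feature-v2"),
    Run("fail", "ddddddd4444", "main"),
]

# "Select Fail and enter a branch pattern" from the docs example.
failed_on_release = [r for r in runs if matches(r, result="fail", branch_pattern="release/*")]
print([r.branch for r in failed_on_release])  # ['release/1.0']
```

An inactive filter (empty string or `None`) is skipped here, which mirrors the Reset behavior of clearing all filters and showing every run.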

Mismatched result categories across three places in this section.

  • Daily runs chart legend (lines 121–124): Pass / Fail / Quarantined / Skipped
  • Result filter (line 132): All / Pass / Fail / Quarantined (no Skipped)
  • Row border colors (line 153): pass / fail / quarantined / skipped / error (adds Error)

If the product genuinely doesn't let users filter by Skipped and the chart genuinely doesn't bucket Error, this is fine — but a reader will hit it and wonder why. Worth either reconciling the lists or adding one sentence acknowledging the asymmetry (e.g., "Errored runs are included in the Fail category in the chart" or "Skipped runs aren't filterable, since they don't reflect a test outcome").
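
For reference, the asymmetry can be made explicit with a small set comparison. This is a sketch using only the three lists quoted in this comment; the dictionary keys are labels chosen here for illustration, and "All" is left out of the filter set on the assumption that it is a toggle rather than a result category.

```python
# Result categories as enumerated in each place (lowercased for comparison).
sources = {
    "chart legend": {"pass", "fail", "quarantined", "skipped"},
    "result filter": {"pass", "fail", "quarantined"},  # "All" omitted: not a category
    "row borders": {"pass", "fail", "quarantined", "skipped", "error"},
}

# Every category mentioned anywhere in the section.
all_categories = set().union(*sources.values())

# For each place, which categories does it leave out?
missing = {name: sorted(all_categories - cats) for name, cats in sources.items()}

for name, gaps in missing.items():
    print(f"{name}: missing {gaps or 'nothing'}")
```

Under these assumptions the chart legend lacks only Error, the Result filter lacks both Error and Skipped, and the row-border list is the only complete one, so a single reconciling sentence in the docs would need to cover those two categories.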


@github-actions github-actions Bot left a comment

Auto-approved: Claude code review passed.
