Skip to content

feat(mwpw-184989): multi-page agent QA review with PR-aware routing#535

Open
sanrai wants to merge 4 commits into
mainfrom
sanrai/agent-multipage
Open

feat(mwpw-184989): multi-page agent QA review with PR-aware routing#535
sanrai wants to merge 4 commits into
mainfrom
sanrai/agent-multipage

Conversation

@sanrai

@sanrai sanrai commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Upgrades the agent QA review from a single hardcoded URL to a validated 5-page covering set, each exercising distinct CaaS code paths, with one combined comment.

  • A customer-success-stories (Left filter, OR semantics, search/sort/deep-link/bookmarks/chips/no-results)
  • B news.adobe.com (Top filter panel)
  • C adobe.com/events (event ordering + register/gating banner)
  • D adobe.com/max community (carousel)
  • E milo card-styles gallery (visual diff of all card styles)

PR-aware routing: a shared change (Card.jsx / Container / Helpers / any LESS) reviews all pages; a localized change reviews only the pages whose code paths it touches; a .github-only change runs a single smoke page. OR-filter semantics are encoded in the page A/B instructions so the agent does not flag the union-widening as a bug. a11y interaction checks deferred to a later iteration. Per-page agent time-capped; job timeout raised to 30 min.

@github-actions

github-actions Bot commented Jun 23, 2026

Copy link
Copy Markdown

🤖 Agent QA review — multi-page, interactive + visual diff (advisory, non-blocking)

Reviewed 1 page(s) on the live site with the PR build injected. Overall: PASS.

PASS — A-left-hub · 0.00% pixels changed · https://business.adobe.com/customer-success-stories.html

QA Review Report — PR #535 (feat(mwpw-184989): multi-page agent QA review with PR-aware routing)

Target URL: https://business.adobe.com/customer-success-stories.html


Pre-captured Pixel Diff

  • 0.00% pixels changed between PR build and stable. No magenta regions were present in the diff image — the PR introduces zero visual rendering changes to this page.

PR Code Changes Summary

This PR refactors the agent-review.mjs CI script from single-page to multi-page QA review:

  • Removes single hardcoded BASE_URL / CAASVER env vars
  • Introduces a PAGES array to drive per-page diff + review
  • Adds OR-filter documentation string (OR constant)
  • Minor code style cleanup (variable renames, comment rewrites)
  • No changes to any CaaS front-end code (no card, filter, search, sort, or pagination logic changed)

Live Page Observations

Initial Load

  • Page loaded successfully at https://business.adobe.com/customer-success-stories.html
  • Hero section with featured customer stories rendered correctly
  • Breadcrumb navigation (Home / Resource Center / Customer Success Stories) present and correct
  • Global nav (Adobe for Business, Products, AI, Industries, Roles, Resources, Support, Get Started, Sign In) all visible and properly rendered

Filter Panel

  • "Refine Your Results" left panel rendered correctly with "Clear" link
  • Result count displayed: 344 results — visible and correctly positioned
  • Sort dropdown "Date: (Newest To Oldest)" present and functional-looking
  • Three filter groups present: Products, Content Type, Industry — all collapsed by default (correct)
  • Search input box present with placeholder "Search Here" and clear (×) button

Filter Expansion Test

  • Clicked "Products" filter header → expanded correctly, revealing a full alphabetical list of product checkboxes (Acrobat, Acrobat Sign, Advertising, Analytics, Campaign, Commerce, Connect, Creative Cloud, Customer Journey Analytics, Experience Manager Assets/Forms/Guides/Sites, GenStudio for…)
  • Checkboxes rendered with correct styling, no visual overflow or truncation
  • No console errors observed during interaction

Card Grid

  • Cards rendered in a 2-column grid layout (correct for left-filter pages)
  • Card images, titles, descriptions, and CTA buttons ("Read article", "Read now") all present and correctly aligned
  • Cards visible: Walt Disney Imagineering, Genpact, Best Buy, Lumen — all properly structured

Issues Found

None. The PR makes no changes to any front-end CaaS component. The 0.00% pixel diff confirms no visual regression. The page loads, filters expand, cards render, and the result count displays correctly.


Verdict

The PR is a CI/tooling-only change (agent-review.mjs refactor for multi-page QA support). It has no impact on the rendered customer-success-stories page. All visible UI elements — filter panel, search, sort, card grid, result count — appear correct and unbroken.

Per-page screenshots + diffs in the workflow run.

@github-actions

Copy link
Copy Markdown

AI Code Review

Critical Issues

1. Sequential page loop with shared CDP browser — race condition / connection leak

In captureDiff, for each page the code calls chromium.connectOverCDP(CDP) and then browser.close(). The comment from the original code explains why this matters:

"Disconnect THIS CDP client (does not kill the persistent Chrome) so the qa-runner subprocess is the sole Playwright client — two concurrent clients on one browser cause navigation timeouts and hangs."

However, in the new loop:

for (const page of selected) {
  const pct = await captureDiff(page.url, page.id);   // closes CDP client
  // ... then spawnSync qa-runner ...                  // qa-runner connects
  // ... qa-runner finishes ...
  // next iteration: captureDiff connects again
}

This is sequential so the logic is preserved — but browser.close() on a CDP-connected browser in Playwright disconnects the client AND may terminate the underlying browser process (behavior varies by Playwright version). If the persistent Chrome is killed after the first page, subsequent iterations fail silently (caught by try/catch), producing pct = 'n/a' for all remaining pages with no screenshots. This was always a latent risk in the original but is now multiplied across N pages.

Fix: Use browser.disconnect() instead of browser.close() to safely detach without killing the persistent Chrome, consistent with the original intent expressed in the comment (which was deleted in this PR).

2. AGENT_TIMEOUT_MS reduced from 600 s to 180 s per page, but up to 5 pages can run

With selected = PAGES (shared file change), 5 pages × 180 s = 900 s = 15 minutes of agent time, plus capture overhead. The workflow timeout-minutes was only raised to 30 (1800 s). This is borderline but likely fine for the common case — however, if any spawnSync times out and emits SIGKILL, the subprocess doesn't clean up its CDP connection, potentially leaving the browser in a bad state for subsequent iterations.

3. REPORT_OUT shadowed inside loop — unlinkSync on previous iteration's file

const REPORT_OUT = `/tmp/pr-review-${page.id}.json`;
try { unlinkSync(REPORT_OUT); } catch {}

Each iteration uses a unique filename, so no actual collision. This is fine. (Not an issue — noting for completeness.)

4. spawnSync with stdio: 'inherit' inside a for loop — zombie Chrome connections

spawnSync is synchronous and blocks. If qa-runner-v2.mjs connects to CDP and then crashes (non-timeout), the Chrome session may be left with an open page. The next captureDiff call then gets browser.contexts()[0] which could be a stale/broken context from the crashed runner rather than a clean one.

Fix: After each spawnSync, consider navigating or closing any lingering pages before the next captureDiff connects, or always create a fresh context (browser.newContext()) rather than reusing contexts()[0].

Summary

Severity Issue
High browser.close() may kill the persistent Chrome; use browser.disconnect()
Medium Crashed/killed qa-runner leaves stale CDP context; next captureDiff reuses it

@github-actions

Copy link
Copy Markdown

AI Code Review

Critical Issues

1. Sequential CDP connections cause browser resource exhaustion / hangs

In captureDiff(), you call chromium.connectOverCDP(CDP) and then browser.close() for each page in the loop. The comment in the old code explicitly warned that two concurrent CDP clients cause timeouts, but now you're serially connecting/disconnecting across multiple pages. The browser.close() call on a CDP-connected browser (as opposed to a browser launched by Playwright) typically kills the underlying browser process, not just disconnects the client. Subsequent iterations will fail to connect since the persistent Chrome is gone.

Fix: Connect once before the loop, disconnect (not close) after, or use browser.disconnect() instead of browser.close().

2. spawnSync timeout is per-page at 180s but overall job timeout is 30min

With up to 5 pages × (180s agent + ~37s capture) ≈ ~18 minutes just for agents, plus capture overhead and sequential execution, you're cutting it very close to the 30-minute job timeout. If any single page's capture hangs (e.g., waitForTimeout(22000) for page E plus 45s goto timeout × 2 shots), the job will be killed mid-run with no comment posted—silent failure, no PR comment.

Fix: Add a top-level try/finally that always posts whatever sections have been collected, or reduce per-page timeouts.

3. AGENT_TIMEOUT_MS env var read inside the loop but set once externally

timeout: Number(env('AGENT_TIMEOUT_MS', '180000')),

The default changed from 600000ms (10 min) to 180000ms (3 min) per page. If the env var is set to the old value (600s) expecting a single agent run, all pages will now each get 600s = potentially 50+ minutes total, guaranteeing a job timeout. This is a behavioral change with no guard—callers using the old env var will silently get the wrong behavior.

4. Data loss: changed variable computed from meta.files paths only—no content

const changed = (meta.files || []).map((f) => f.path).join('\n');

The triggers regex is tested against changed (file paths), but Card\.jsx would only match if the literal string Card.jsx appears in the path. Since most paths are like src/js/Cards/Card.jsx, this works—but Helpers\/ with an escaped slash will never match a file path string because the path uses / not \/. The regex /Helpers\//i tested against a plain string with forward slashes works fine in JS (no escaping needed), but Helpers\/ in the regex with an escaped forward slash still matches a literal /—this is actually okay in JS. However, SHARED regex /Card\.jsx|Container\.jsx|Helpers\/|app\.jsx|\.less/i and page triggers will silently produce wrong selected if meta.files is empty or the API call fails (caught with empty catch {}), falling through to selected = [PAGES[0]] smoke only—potentially missing all relevant pages with no warning logged.

5. Race condition: REPORT_OUT path collision if pages have similar IDs

Not a race here since it's sequential, but page IDs must be filesystem-safe. The current IDs (A-left-hub, etc.) contain hyphens which are fine, but this is fragile if IDs are ever parameterized from untrusted input (e.g., PR branch names injected elsewhere).

Most Critical Action Items:

  1. Replace browser.close() with browser.disconnect() in captureDiff() to avoid killing the persistent Chrome between iterations.
  2. Add a top-level try/finally to post partial results if the loop is interrupted before completion.

@github-actions

Copy link
Copy Markdown

AI Code Review

Issues Found

1. Sequential per-page agent runs may silently starve on timeout

Each page now runs spawnSync with AGENT_TIMEOUT_MS defaulting to 180000 ms (3 min). With up to 5 pages selected, total wall time can reach ~15+ minutes (5 × 3 min agents + capture time), but the job timeout-minutes is only 30. If isShared is true, all 5 pages run. Late pages will be killed mid-run by the workflow timeout with no comment posted at all — not even a partial result — because sections won't accumulate enough entries to post the combined comment.

Fix: Either cap selected pages, or ensure the comment is posted even on early exit (e.g., move the comment-posting into a finally block or post after each page).

2. CDP browser left connected across sequential captureDiff calls

captureDiff calls browser.close() at the end, but only inside the try block. If an exception is thrown after connectOverCDP but before browser.close() (e.g., during screenshot), the browser connection leaks. More critically, qa-runner-v2.mjs also connects via CDP immediately after. With 5 pages, each loop iteration calls captureDiff then spawnSync(qa-runner-v2.mjs) — if captureDiff throws partway, the orphaned CDP client may block the runner's connection.

Fix: Use a try/finally to guarantee browser.close().

3. PNG.bitblt source bounds are not validated — potential silent data corruption

In the non-segmented path, PNG.bitblt(a, ca, 0, 0, w, Ht, 0, 0) copies w×Ht pixels from a, but a's actual dimensions may differ from w (which is Math.min(a.width, b.width)). While w is min of widths, if a.width > w the blit reads from a larger source into a smaller destination buffer — this is the same bug as the original code, but it's still present. The segmented path has the same issue with bh.

This is low-risk in practice since pixelmatch uses the destination buffer size, but PNG.bitblt can throw or corrupt if source region extends past source bounds.

4. changed variable used before the diff try/catch completes

let diff = ''; try { diff = gh(['pr', 'diff', PR, '-R', REPO]); } catch {}
const changed = (meta.files || []).map((f) => f.path).join('\n');

changed depends on meta.files, not diff, so this is fine — but SHARED.test(changed) and p.triggers.test(changed) test against the newline-joined file paths. If changed is empty (e.g., API failure returned empty files), selected falls back to [PAGES[0]], which is correct. Not a bug, but worth noting the fallback is silent.

5. No deduplication of diff images between pages when all 5 pages run

Each captureDiff writes ${OUT}/${tag}-pr.png and ${OUT}/${tag}-stable.png. Tags are unique per page (page.id), so no collision — not a bug. However, for the gallery page with wait: 22000, the sequential loop adds significant time that is not accounted for in the timeout analysis from issue #1.


Highest priority fix: Wrap the for loop in a try/finally to post sections (even if partial) on any early exit, and ensure browser.close() is in a finally inside captureDiff.

@github-actions

Copy link
Copy Markdown

AI Code Review

Critical Issues

1. CDP Resource Leak / Broken Isolation Between Pages

In captureDiff, a new browser context is created (or the existing one reused) via:

const c = browser.contexts()[0] || (await browser.newContext());

At the end of the function, browser.close() is called. However, this function is called in a loop (for (const page of selected)). On the second iteration, chromium.connectOverCDP(CDP) reconnects to the same persistent Chrome instance — but the comment in the old code notes that browser.close() "does not kill the persistent Chrome." The concern here is correctness: if browser.close() does close the context/pages created within it, the CDP connection state between iterations may be inconsistent. More critically, the page routes registered via p.route() are scoped to the page, but the context (c) may be the shared contexts()[0]. If a context is reused across calls and not properly cleaned up, route handlers from one page iteration could leak into the next, causing the wrong build to be injected on subsequent pages.

2. Per-Page Timeout Drastically Reduced — Likely to Cause Silent Failures

The per-page agent timeout was changed from 600000ms (10 min) to 120000ms (2 min), while the number of pages can be up to 5 (all pages if shared files changed). The agent instructions for page A alone specify: apply filter, search, paginate, check URL, bookmark, click card — plus load screenshot, run axe, get console errors. 2 minutes is almost certainly insufficient for this scope.

The fallback when timed out is a partial report with verdict = 'UNKNOWN', which rolls into overall = 'MIXED' (not FAIL). This means real regressions discovered during a partial run will be silently downgraded from FAIL to MIXED.

3. spawnSync with stdio: 'inherit' in a Loop — Blocks but No Error Surfacing

spawnSync is used correctly for sequential execution, but run.error is only checked for ETIMEDOUT. If the subprocess fails for other reasons (missing module, JS error, non-zero exit), the error is silently swallowed — report stays as '(no report produced)' and verdict stays 'UNKNOWN'. This masks broken runner invocations.

Fix: Check run.status !== 0 && !timedOut and log/report accordingly.

4. Workflow Timeout May Still Be Insufficient

With 5 pages × (15s wait × 2 shots + 2s + diff computation + 120s agent) ≈ ~12 min just for the happy path, plus GitHub API calls, the 30-minute job timeout is tight and could cause the workflow to be killed before posting the comment, resulting in no PR feedback at all (silent failure).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant