perf(ci): add setup-extension-build shared job + fix .turbo cache key (sub-15min plan, Step 1)#9178
lambrianmsft wants to merge 58 commits into
Split the single ~30+ min vscode-e2e CI job into 4 parallel matrix shards:
- independent: phases 4.0 + 4.7 + 4.8b (no Phase 4.1 dep)
- designer: phase 4.1 -> 4.2
- newtests: phase 4.1 -> 4.3, 4.4, 4.5, 4.6
- conversion: phase 4.1 -> 4.8a, 4.8c, 4.8d, 4.8e
Stage 1 of the parallelization plan: each dependent shard re-runs Phase 4.1
(~3-5 min duplicated workspace creation) to avoid cross-runner manifest path
rewriting. Stage 2 will move Phase 4.1 to a setup job that publishes the
workspaces as an artifact.
Changes:
- apps/vs-code-designer/src/test/ui/run-e2e.js: add four new E2E_MODE
selectors (independentonly, createplusdesigner, createplusnewtests,
createplusconversion). Each prepares fresh sessions per phase and
aggregates exit codes via Math.max, mirroring existing modes. The
conversion shard preserves the documented exclusion of Phase 4.8d
(conversionYes) from the shard exit code due to known xvfb flakiness.
- .github/workflows/vscode-e2e.yml: convert single job to matrix with
fail-fast=false and per-shard 35 min timeout. Screenshots upload to
per-shard artifact names. New vscode-e2e-summary rollup job preserves
a single required check name for branch protection.
- docs/ai-setup/shared.md + packages/vs-code-designer.md: document the
new modes and the CI shard layout. Regenerated CLAUDE.md mirrors.
E2E_MODE=full remains the single-runner local debug fallback.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
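The exit-code aggregation described above (Math.max across phases, with Phase 4.8d left out of the conversion shard's result) can be sketched roughly as follows. This is a minimal illustration, not the code in run-e2e.js: `PhaseResult`, `aggregateExitCode`, and the sample data are hypothetical names introduced here.

```typescript
// Hypothetical shape: one entry per phase that ran inside a shard.
interface PhaseResult {
  phase: string; // e.g. "4.8a"
  exitCode: number; // 0 = pass, non-zero = failure
}

// Combine phase exit codes into a single shard exit code with Math.max,
// skipping phases that are reported but must not fail the shard.
function aggregateExitCode(
  results: PhaseResult[],
  excluded: ReadonlySet<string> = new Set<string>()
): number {
  const counted = results
    .filter((r) => !excluded.has(r.phase))
    .map((r) => r.exitCode);
  // Math.max() with no arguments is -Infinity, so seed the call with 0.
  return Math.max(0, ...counted);
}

// Sample data only: a conversion-shard run where just 4.8d failed.
const conversionShard: PhaseResult[] = [
  { phase: "4.1", exitCode: 0 },
  { phase: "4.8a", exitCode: 0 },
  { phase: "4.8c", exitCode: 0 },
  { phase: "4.8d", exitCode: 1 }, // known xvfb flakiness, excluded below
  { phase: "4.8e", exitCode: 0 },
];

const shardExit = aggregateExitCode(conversionShard, new Set(["4.8d"]));
```

With the exclusion set, the 4.8d failure is still visible in the per-phase results but does not turn the shard red.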
dataMapper.test.ts asserts created-workspaces.json exists in its before hook, so Phase 4.7 cannot run in the independent shard. Move all of Phase 4.7 (demo + smoke + standalone + dataMapper) into the designer shard, which already runs Phase 4.1. Independent shard now runs only Phase 4.0 + 4.8b, both truly independent of Phase 4.1.
Diagnosed from CI run 25830652118 (PR Azure#9164): vscode-e2e (independent) failed with
AssertionError: Workspace manifest not found ... Phase 4.1 must run first
at apps/vs-code-designer/out/test/dataMapper.test.js:338:14
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… poll
Phase 4.3 (inlineJavascript.test.ts) hits the 'Run trigger clickable' assertion 2/2 on the vscode-e2e (newtests) shard of PR Azure#9164 but 0/15 on main. The shard regression is real (not flake): on createplusnewtests, Phase 4.3 runs directly after Phase 4.1, skipping the Phase 4.2 designer test that would otherwise cold-start the Functions runtime. The failure screenshot from run 25831759379 shows func still loading ExtensionBundle DLLs in the Debug Console, confirming the host is mid-cold-start. waitForRuntimeReady returns early on debug-toolbar detection (~1-2s after attach) while the host port 7071 is not yet 'running'.
Mitigation:
- extend clickRunTrigger deadline 30s -> 90s (mirroring 9c5f6bd 'Stabilize VS Code E2E action clicks and run waits' for waitForRunStatusInList)
- add a 500ms post-find enabled-stability re-check so a transient re-render that flips the button back to disabled doesn't race a click
- accept aria-disabled in addition to disabled
- throttle the disabled-state log to once per 10s
- capture a clickRunTrigger-timeout screenshot on terminal failure
Rejected this.retries(1): failure is reproducible 2/2 plus a manual rerun, not random. A silent retry would mask the shard-ordering regression.
A shard-level designer warm-up was rejected as broader than needed: the existing 90s window for waitForRunStatusInList shows ~90s is sufficient for func cold-start in CI.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
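Three of the mitigations above (aria-disabled equivalence, the post-find stability re-check, and log throttling) reduce to small pure helpers. The sketch below is illustrative only: `ButtonAttrs`, `isDisabled`, `isStablyEnabled`, and `createThrottledLog` are hypothetical names, and the real clickRunTrigger reads these attributes through Selenium rather than from plain objects.

```typescript
// Hypothetical snapshot of the two attributes read from the Run trigger button.
interface ButtonAttrs {
  disabled: string | null; // native attribute; null when absent
  ariaDisabled: string | null; // aria-disabled; "true" when the control is inert
}

// Treat aria-disabled="true" as equivalent to the native disabled attribute.
function isDisabled(attrs: ButtonAttrs): boolean {
  const nativeDisabled = attrs.disabled !== null && attrs.disabled !== "false";
  return nativeDisabled || attrs.ariaDisabled === "true";
}

// Post-find stability re-check: the button must be enabled in the first
// sample AND in a second sample taken after a short delay (500ms in the
// commit), so a transient re-render cannot race the click.
function isStablyEnabled(first: ButtonAttrs, recheck: ButtonAttrs): boolean {
  return !isDisabled(first) && !isDisabled(recheck);
}

// Emit at most one message per interval; the clock is injectable so the
// throttle can be exercised without real waiting.
function createThrottledLog(
  intervalMs: number,
  sink: (msg: string) => void,
  now: () => number = () => Date.now()
): (msg: string) => boolean {
  let last = Number.NEGATIVE_INFINITY;
  return (msg: string): boolean => {
    const t = now();
    if (t - last < intervalMs) return false; // suppressed
    last = t;
    sink(msg);
    return true;
  };
}
```

A polling loop would call the throttled logger on every iteration and let it decide whether the "still disabled" line is actually written.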
… clickRunTrigger, assertRunTriggerable)
Multi-signal runtime readiness:
- waitForRuntimeReady now accepts { requireHostRunning, timeoutMs }. When requireHostRunning=true, requires BOTH the VS Code debug toolbar AND port-7071 /admin/host/status='running' before returning. Default behavior unchanged (backward compatible). Throttled per-signal progress logging at 10s so CI logs reveal which gate is missing. Timeout screenshot renamed to 'waitForRuntimeReady-timeout'.
- clickRunTrigger now gates on waitForRuntimeReady({ requireHostRunning: true, timeoutMs: 60_000 }) before entering its click loop. Failure converts the misleading 'Run trigger clickable' assertion into a 'clickRunTrigger-runtime-not-ready' screenshot + clear log line, pointing triage at the real root cause. Inner recheck path now tolerates StaleElementReferenceError on React re-render and retries.
- New assertRunTriggerable(driver) helper combines a 120s strict host-running gate with clickRunTrigger and throws AssertionError with precise messages so failures surface the actual gate that broke (host startup vs. webview/iframe). Legacy assert.ok(waitForRuntimeReady)+assert.ok(clickRunTrigger) pattern is now @deprecated with a pointer to the new helper. Callsites unchanged for backward compatibility.
Addresses flake-mining hotspots #1-2 (Run trigger clickable is 3/3 Phase 4.3 failures; both main regressions) by removing the readiness race: debug toolbar appears ~1-2s after attach but func host start takes much longer to load bundle DLLs and register triggers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
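The multi-signal gate described above can be pictured as a pure decision over the two observed signals. This is a sketch under assumptions: `ReadinessSignals` and `evaluateReadiness` are hypothetical names, and the case-insensitive comparison of the host status is a choice made here, not something the commit states.

```typescript
// Hypothetical view of what one polling iteration observed.
interface ReadinessSignals {
  debugToolbarSeen: boolean; // VS Code debug toolbar is visible
  hostStatus: string | null; // state reported by /admin/host/status, null if unreachable
}

interface ReadinessResult {
  ready: boolean;
  missing: string[]; // which gates are still unmet, for progress logging
}

// Default mode needs only the debug toolbar (the backward-compatible
// behavior); strict mode additionally needs the host to report running.
function evaluateReadiness(
  signals: ReadinessSignals,
  requireHostRunning: boolean = false
): ReadinessResult {
  const missing: string[] = [];
  if (!signals.debugToolbarSeen) missing.push("debugToolbar");
  // Assumption: compare case-insensitively.
  const hostRunning = (signals.hostStatus ?? "").toLowerCase() === "running";
  if (requireHostRunning && !hostRunning) missing.push("hostRunning");
  return { ready: missing.length === 0, missing };
}
```

Returning the list of unmet gates is what lets a throttled progress log say which signal is holding things up.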
…ility, design-time API gate)
Mining hotspot #1: 7/13 recent E2E failures hit this file across two assertion modes (Next->review and single Create click->start).
Fixes:
1. clickNextAndWaitForReviewStep: re-dismiss outer VS Code notifications at the top of each retry attempt (toasts like .notification-list-item-buttons-container were intercepting iframe clicks mid-loop). Bump per-attempt review-step deadlines 6/3/3s -> 12/6/6s. Capture screenshot on final deadline.
2. waitForSingleCreateClickToStart: extend default timeout 15s -> 45s for cold-runner legacy project copies. Add StaleElementReferenceError recovery around findElements and per-element getText/getAttribute reads. Throttle 'still waiting' log to once per 10s. Screenshot on timeout.
3. Create-button click: replace raw arguments[0].click() with Selenium Actions API (move + click + perform) per SKILL.md rule #6. JS click retained as fallback in a try/catch chain. Re-resolve the button on fallback to dodge stale references after React re-renders.
4. Add waitForDesignTimeNotificationsToSettle (60s deadline): switches to default content, polls for absence of 'design-time'/'Connecting to design' toasts, returns to webview frame. Called before clicking Next and before clicking Create to drain the func-host startup race.
5. Wrap pre-click disabled/aria-disabled reads on the Create button in stale-tolerant try/catch.
Validation: biome check --write clean; tsup --config tsup.e2e.test.config.ts build success.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eCommand, switchToWebviewFrame, openFolderInSession)
CI run 25834287854 (newtests shard) showed 13 cascading FAIL screenshots in createWorkspace-explicit/* plus the beforeEach failure:
- [switchToWebviewFrame] Attempt 1/3 failed: Webview iframe not found within timeout
- [selectCreateWorkspaceCommand] Attempt 1/3: setText failed: Waiting until element is visible (x3 attempts)
- Selenium stack: InputBox.setText -> InputBox.clear -> ElementNotInteractableError
Sharding tripled exposure (3 shards run Phase 4.1) so the entry helpers must be deterministic before the parallelization PR can land. Phase 4.8b logs also show a deterministic Attempt 1/3 'element not interactable' failure (~13s wasted) in openFolderInSession that the pre-flight reclaims.
Changes:
* selectCreateWorkspaceCommand (createWorkspace.test.ts): bypass ExTester InputBox.setText() which calls clear() and throws ElementNotInteractableError on slow CI runners. Locate the underlying '.quick-input-widget:not(.hidden) .quick-input-box input' via Selenium, wait until elementIsVisible (30s) AND elementIsEnabled (5s), then sendKeys with Ctrl+A select-all + the search query. Retry budget bumped 3->5 with exponential backoff [1s,2s,3s,5s,8s]. Re-focus workbench.action.focusQuickOpen between retries and capture selectCreateWorkspaceCommand-timeout-attempt-N.png per failed attempt.
* switchToWebviewFrame (createWorkspace.test.ts): replace single iframe[class='webview ready'] lookup with manual visible-iframe scan per SKILL.md rule #8. Enumerate iframe.webview / iframe.webview.ready candidates, filter by isDisplayed() + non-zero rect, prefer the most recently mounted (active tab). Tolerate StaleElementReferenceError and continue to next candidate. After entering #active-frame poll for any DOM marker (input/button/data-testid/[class*=workspace]/[class*=wizard]) for up to 20s so we never return a still-mounting frame. Outer deadline remains 60s with 3 retries that re-dismiss toast notifications between attempts. Screenshot on each failed attempt + on final deadline. Throttled 'still waiting' logs (once per 10s).
* openFolderInSession (helpers.ts): add waitForWorkbenchReady(driver, 15_000) pre-flight that polls for an interactable activity bar with non-zero size, no blocking modal dialog, and any startup non-command-mode quick-input dismissed. Reclaims the deterministic ~13s wasted retry on Phase 4.8b.
* waitForWorkbenchReady (helpers.ts): new exported helper reusable by any test that needs a deterministic 'workbench ready' gate before driving keyboard input.
Validation: npx biome check --write (clean) + npx tsup --config tsup.e2e.test.config.ts (clean build success in 71ms).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
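Two pieces of the entry-helper hardening above lend themselves to a short illustration: the fixed retry schedule and the visible-iframe selection. The sketch works on plain descriptors instead of Selenium elements; `FrameCandidate`, `pickActiveFrame`, and `retryDelayMs` are hypothetical names, and treating the last DOM-order candidate as "most recently mounted" is an assumption.

```typescript
// Retry delays from the commit message: [1s, 2s, 3s, 5s, 8s].
const RETRY_DELAYS_MS: readonly number[] = [1000, 2000, 3000, 5000, 8000];

// Delay before retrying after the given failed attempt (1-based).
// Attempts beyond the schedule reuse the final delay.
function retryDelayMs(failedAttempt: number): number {
  const index = Math.min(Math.max(failedAttempt, 1), RETRY_DELAYS_MS.length) - 1;
  return RETRY_DELAYS_MS[index];
}

// Hypothetical descriptor for one iframe.webview candidate.
interface FrameCandidate {
  id: string;
  displayed: boolean; // result of isDisplayed()
  width: number; // from the element rect
  height: number;
  stale: boolean; // true if reading it threw StaleElementReferenceError
}

// Keep only usable frames (not stale, displayed, non-zero rect) and prefer
// the last one. Assumption: candidates arrive in DOM order and the most
// recently mounted webview comes last.
function pickActiveFrame(candidates: FrameCandidate[]): FrameCandidate | undefined {
  const usable = candidates.filter(
    (c) => !c.stale && c.displayed && c.width > 0 && c.height > 0
  );
  return usable.length > 0 ? usable[usable.length - 1] : undefined;
}
```

Skipping stale candidates instead of aborting is the point: one re-rendered iframe should not fail the whole scan.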
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Allows manual CI re-runs when path-filter coalescing suppresses an expected auto-trigger after rapid pushes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Distilled from the reliability work in PR Azure#9164:
- 90s minimum CI-dependent wait deadline
- post-find enabled-stability re-check
- aria-disabled equivalence on Fluent UI v9
- throttled logging + screenshot-on-deadline
- debug-toolbar readiness != Functions host readiness
- clickElementWithFallback pattern (Actions API first, JS click last)
- prepareFreshSession contract for inter-phase isolation
- path-filtered PR workflows can coalesce after rapid pushes (use workflow_dispatch)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…re#9164
Adds the requirement that release-scribe verifies .github/pull_request_template.md compliance (Commit Type, Risk Level + label, Contributors section, Test Plan checkboxes) before declaring a PR body update complete, so AI PR Validation passes on the first try.
- .squad/agents/release-scribe/charter.md: adds PR Body Template Compliance section with the 8-point checklist, bot validation loop, and gh commands.
- .squad/agents/pr-orchestrator/charter.md: adds explicit step 11 in Standard Workflow requiring template compliance + label management + AI PR Validation verification before final summary.
- .squad/playbooks/pr-lifecycle.md: adds section 9.1 with the apply+verify gh command pattern.
- .squad/knowledge/review-patterns.md: adds durable learning citing PR Azure#9164 with the pattern and evidence.
- .squad/knowledge/INDEX.md: adds trigger row pointing to review-patterns.md for PR body / needs-pr-update tasks.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rns.md Follow-up to a3b75b1 to land the knowledge file entry that was skipped due to sparse-checkout. Documents the durable rule that PR bodies on Azure/LogicAppsUX must conform to .github/pull_request_template.md and that AI PR Validation will block on missing Commit Type/Risk Level/Contributors sections. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
# Conflicts: # apps/vs-code-designer/src/test/ui/createWorkspace.test.ts
Prepares .squad/ for fully-public consumption on Azure/LogicAppsUX.
Changes:
- AGENT_WORKFLOW.md: top-of-file disclaimer that the agent-dev/skip-worktree workflow is optional and team-specific; replace la-agent-dev/la-feature-X placeholders with repo-agnostic <your-agent-worktree>/<your-feature-worktree>.
- README.md: 1-line note that Squad is runtime-agnostic but a few playbooks (chronicle-*) target GitHub Copilot CLI specifically.
- playbooks/chronicle-driven-improvement.md: scope disclaimer that /chronicle, /experimental, ~/.copilot/, COPILOT_HOME are Copilot CLI–specific.
- knowledge/session-learnings.md: drop internal Copilot CLI session UUIDs; delete the UUID->PR mapping section that carried no durable engineering learning; neutralize future-dated audit references; redact sibling-repo references defensively.
- knowledge/{review-patterns,unit-testing,vscode-e2e-testing,agent-improvements,ci-patterns}.md: drop session UUIDs; keep public PR/commit citations as the evidence anchors. Redact 3 sibling-repo references in ci-patterns.md.
Validation:
- grep '[a-f0-9]{8}-[a-f0-9]{4}-...' in .squad/**/*.md -> 0 matches
- grep 'logicapps-migration-assistant|2026-05-11|April-May 2026' in .squad/**/*.md -> 0 matches
No durable engineering learnings were removed; only the internal traceability metadata that external readers cannot use.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 4.8b still failed at waitForSingleCreateClickToStart on the independent shard despite e1532fe hardening. Apply three-layered fix: (1) Re-find Create-workspace button immediately before clicking to eliminate stale-snapshot risk; tolerate StaleElementReferenceError. (2) After Actions click, send Key.ENTER as belt-and-suspenders keyboard activation. (3) Fall back to JS click if 2s passes with no state transition. Always capture on timeout: button outerHTML, parent outerHTML, active frame URL, and visible iframe enumeration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 4.3 inlineJavascript and Phase 4.4 statelessVariables still failed at `Run trigger clickable` on the newtests shard despite commit 2d959c9 extending clickRunTrigger to 90s with a stability poll. Root cause: in the createplusnewtests shard the runtime is still mid-cold-start by the time clickRunTrigger fires (no Phase 4.2 designer warm-up in this shard). Migrate both tests to the assertRunTriggerable(driver) helper added in commit 54fab3c, which composes waitForRuntimeReady({ requireHostRunning: true, timeoutMs: 120_000 }) + clickRunTrigger with precise failure messages so future regressions point at the actual root cause (host startup vs. button-disabled). CI evidence: run 25878682827 showed designer shard Phase 4.2 (which already runs after the warm-up) passing with the same clickRunTrigger helper; newtests shard failed exactly at the helper for both runtime- gated tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…(4.4)
CI run 25882360464 (3/4 shards green) surfaced two remaining failures in the newtests shard, both with precise diagnostics from the assertRunTriggerable helper added in commit 54fab3c:
- Phase 4.3 inlineJavascript: "Functions host did not become running within 120s", which is genuine cold-start latency in the heavy shard. Fix: add prewarmFunctionsHost(driver) helper that kicks off the 7071 host-status poll asynchronously right after startDebugging, with a 180s budget. The test continues to its overview-navigation steps in parallel; by the time assertRunTriggerable runs its own 120s gate the host is typically already running. The actual assertion still fires if the host genuinely fails to start.
- Phase 4.4 statelessVariables: assertRunTriggerable now PASSES (trigger fires); failure moved to "Overview should open" downstream. Fix: add waitForOverviewView(driver) helper that closes editors, switches to default content, polls for the overview webview frame with command-bar DOM markers, throws assert.fail with a precise message on timeout, and tolerates StaleElementReferenceError per SKILL.md rules #6 and #8.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e + 180s click
CI run 25885469274 confirmed that :7071/admin/host/status === 'running' does not become reachable within 180s on the newtests shard. Both prewarmFunctionsHost (added in 462302f) and assertRunTriggerable strict mode timed out. Meanwhile designerActions.test.ts (Phase 4.2, green on designer shard) uses its private waitForRuntimeReady that polls terminal text, never touching :7071, and works fine.
Conclusion: :7071 status is not a reliable readiness signal on the newtests shard. prewarmFunctionsHost's pure poll is also harmful: it blocks for 180s during which no UI activity occurs, deferring the actions (overview navigation) that actually warm the host.
Fix:
- Remove prewarmFunctionsHost calls from inlineJavascript.test.ts and statelessVariables.test.ts (no longer in the import list).
- Replace assertRunTriggerable(driver) in both tests with the legacy waitForRuntimeReady (multi-signal) + clickRunTrigger pair, the same pattern Phase 4.2 designerActions uses successfully.
- Bump clickRunTrigger deadline 90s -> 180s in runHelpers.ts so the button-enable wait can absorb the cold-start latency on heavy shards.
Retains: waitForOverviewView (validated working in 25885469274), Phase 4.8b 3-layered click (validated working), assertRunTriggerable helper (still useful for future tests that have a known-running host).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CI run 25888015435 hit waitForRuntimeReady-timeout in newtests Phase 4.3+4.4 with debugToolbarSeen=never, hostRunningSeen=never at 90s. Mirrors the same 90s->180s bump previously applied to clickRunTrigger in commit 28744cc so both the readiness probe AND the click have matching cold-start budgets. Other 3 shards (independent, designer, conversion) all green at <24 min. Critical path was 27m57s vs ~50+min monolithic baseline (~44% reduction). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nerActions CI run 25889571500 with 180s waitForRuntimeReady proved the debug toolbar NEVER appears via the shared runHelpers.ts startDebugging in Phase 4.3 inlineJavascript and Phase 4.4 statelessVariables (debugToolbarSeen=never, hostRunningSeen=never after full 180s). Meanwhile Phase 4.2 designerActions passes consistently using its OWN PRIVATE startDebugging at designerActions.test.ts:2084 (toolbar appears 1-2s after F5). Diagnosis: the two startDebugging function bodies are functionally identical (clearBlockingUI -> focusEditor -> command palette -> pick 'Start Debugging' -> sleep 2s). The divergence is at the CALLSITES. designerActions only calls result.webview.switchBack() before F5, leaving the designer panel tab open in the editor area. inlineJavascript / statelessVariables additionally called driver.switchTo().defaultContent() + new EditorView().closeAllEditors() before F5, leaving VS Code with no active editor. Because the Phase 4.1 workspaces are MULTI-ROOT (LogicApp + Functions folders), dispatching 'Debug: Start Debugging' with no active editor causes VS Code to show a follow-up 'Select workspace folder' QuickPick that startDebugging never sees or dismisses. The debug session never starts -> toolbar never appears -> waitForRuntimeReady ceiling-times out at 180s. Fix: remove the pre-startDebugging closeAllEditors() block in both test files. Editors are still closed AFTER startDebugging (existing code at inlineJavascript.test.ts:213 and statelessVariables.test.ts:343) just before waitForOverviewView - that's the same ordering designerActions uses (close at line 2900, right before openOverviewPage). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CI run 25891609329 (3/4 shards green) confirmed the callsite ordering fix in 242357a worked - debug toolbar appears at 171s in inlineJS (was debugToolbarSeen=never before). Two narrow follow-ups:
- Phase 4.3 inlineJavascript: per-test mocha timeout 300_000 -> 600_000. Toolbar at 171s leaves only ~129s for host startup + click trigger + wait for run to succeed. 600s budget gives enough headroom for cold starts on the heavy newtests shard.
- Phase 4.4 statelessVariables: bumped clickRunTrigger's internal preflight waitForRuntimeReady ceiling from 60s -> 180s in runHelpers.ts. The legacy pattern (waitForRuntimeReady + clickRunTrigger) passed the first 180s gate (toolbar-only) but failed the stricter requireHostRunning re-check inside clickRunTrigger which had only 60s. This produced the exact failure signature 'Timeout waiting for runtime after 60000ms ... debugToolbarSeen=never, hostRunningSeen=never'. 180s now matches the default ceiling in waitForRuntimeReady/prewarm.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ld-start flake 12 deterministic reliability commits (7c483a1..26e33a0) eliminated all known root causes for "Functions runtime should start and become ready" failures on the newtests shard. CI runs 25891609329 (gen-5, toolbar at 171s) vs 25893025827 (gen-6, debugToolbarSeen=never) demonstrate the remaining failure mode is non-deterministic Functions host cold-start latency on GitHub Linux runners — same code path, different outcome. A single retry absorbs residual flake without masking deterministic regressions; the next failure (if any) is genuinely a 2-in-a-row event and worth investigating. Also bumps findValidationMessage default timeout 20s -> 45s in createWorkspace.test.ts (Pre-creation webview tests) to absorb the async webview-IPC roundtrip (postMessage -> extension -> fs check -> reply -> render) on cold-start Linux runners. Targeted fix preferred over retries here: cause is obvious (race against fixed 20s ceiling) and a broken validator still fails — just after longer. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… runtime ceiling
3-in-a-row deterministic Phase 4.3/4.4 failure across 3 independent GitHub Linux runners (CI runs 25893025827, 25894108831, 25894108831-rerun) ruled out runner-infra flake. Smoking gun from gen-11: Phase 4.4 showed debugToolbarSeen=702ms but hostRunningSeen=never with live func (PID 15250, 15481), dotnet (15256), vsdbg-ui (15588) processes detected at end-of-step cleanup. These are orphans from Phase 4.3's failed `this.retries(1)` attempts that bind :7071 in zombie state: prepareFreshSession kills VS Code + chromedriver but NOT the func/dotnet/vsdbg-ui process tree.
Fix:
- Add pkill for func host start + vsdbg-ui (Linux/macOS) and Stop-Process (Windows) inside prepareFreshSession, matching the existing kill pattern for VS Code. Don't pkill dotnet broadly; kill the func process group and dotnet/vsdbg children follow.
- Bump waitForRuntimeReady default 180s -> 300s in runHelpers.ts as belt-and-suspenders for genuine runner-image cold-start variability (toolbar at 171s on gen-8, never within 180s on gens 9-11).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase A of the per-scenario re-architecture. Adds:
- scenarios[] declarative inventory mapping each test file to its workspace spec and settings;
- selectWorkspaceForSpec(spec) resolver centralizing manifest lookup, legacy-fixture creation, and plain-folder/self-creates cases;
- runScenarioPhases(scenarios) modeled on runCodefulDebugPhases - one fresh VS Code session per scenario, with the existing prepareFreshSession isolation contract;
- new E2E_MODE=scenarios handler for local validation.
All existing E2E_MODE handlers remain unchanged. Phase B (pilot inlineJavascript through the new bootstrapper) lands separately.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
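As a rough picture of what a declarative inventory plus resolver could look like, the sketch below models the four workspace cases named above as a discriminated union. Everything here is hypothetical: the type names, the extra `manifest` and `fixtureRoot` parameters, the workspace labels, and the choice to return `null` for self-creating tests are assumptions, and the fixture case only computes a path rather than creating anything.

```typescript
// The four cases named in the commit, modeled as a discriminated union.
type WorkspaceSpec =
  | { kind: "manifest"; label: string } // look up a Phase 4.1 workspace
  | { kind: "legacyFixture"; fixtureName: string }
  | { kind: "plainFolder"; path: string }
  | { kind: "selfCreates" }; // the test builds its own workspace

interface Scenario {
  testFile: string;
  workspace: WorkspaceSpec;
  settings?: Record<string, unknown>;
}

// Hypothetical entry in the created-workspaces manifest.
interface ManifestEntry {
  label: string;
  path: string;
}

// Resolve a spec to the folder a fresh session should open, or null when
// the test creates its workspace itself.
function selectWorkspaceForSpec(
  spec: WorkspaceSpec,
  manifest: ManifestEntry[],
  fixtureRoot: string
): string | null {
  switch (spec.kind) {
    case "manifest": {
      const entry = manifest.find((m) => m.label === spec.label);
      if (!entry) {
        throw new Error(`Workspace "${spec.label}" not in manifest; Phase 4.1 must run first`);
      }
      return entry.path;
    }
    case "legacyFixture":
      return `${fixtureRoot}/${spec.fixtureName}`;
    case "plainFolder":
      return spec.path;
    case "selfCreates":
      return null;
  }
}

// Illustrative inventory; the labels are made up for this sketch.
const scenarios: Scenario[] = [
  { testFile: "inlineJavascript.test.ts", workspace: { kind: "manifest", label: "stateful-standard" } },
  { testFile: "createWorkspace.test.ts", workspace: { kind: "selfCreates" } },
];
```

Keeping the lookup in one resolver means a missing manifest fails with one consistent message instead of a different assertion per test file.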
…onent test
The Ctrl+Up/Down keyboard navigation logic is a pure React + Redux
handler that does not require the VS Code shell, Functions runtime,
or workspace fixtures to verify. Demoting it from ExTester E2E
(Phase 4.6) to a Vitest component test in libs/designer cuts ~1.5
min from every CI run that exercised Phase 4.6 (the newtests shard)
and removes a CI-flake surface that contributed nothing to user-
visible regression detection.
Findings while triaging the original E2E:
- The previous ExTester scenario only LOGGED whether focus moved;
it did not assert. Inspecting the production code shows why: the
React Flow surface is configured with nodesFocusable=false,
edgesFocusable=false, elementsSelectable=false, and
disableKeyboardA11y=true (libs/designer/src/lib/ui/DesignerReactFlow.tsx
lines 368-385), so node-to-node arrow-key navigation is intentionally
off. The real keyboard-navigation contract in <Designer/> is the
"go to operation" NodeSearch panel hotkey: ctrl+shift+p on web,
ctrl+alt+p in the VS Code host (Designer.tsx lines 66-82), which
is now covered at the unit layer.
- Add libs/designer/src/lib/ui/__test__/keyboardNavigation.spec.tsx
(5 tests) capturing useHotkeys registrations and asserting:
* both bindings register on every render,
* the web binding is enabled only when not in VS Code,
* the VS Code binding is enabled only in VS Code,
* each callback dispatches openPanel({ panelMode: NodeSearch })
and preventDefaults the keyboard event.
- Delete apps/vs-code-designer/src/test/ui/keyboardNavigation.test.ts.
- Remove Phase 4.6 wiring from run-e2e.js (newtestsonly,
createplusnewtests, full modes) including phase6Files, phase6Exit
aggregation, and the final-results log line.
- Drop the Phase 4.6 row from the per-package E2E phase table in
docs/ai-setup/packages/vs-code-designer.md and its two generated
mirrors (apps/vs-code-designer/CLAUDE.md,
.github/instructions/vs-code-designer.instructions.md).
Per the test specialist coverage analysis in the per-scenario
re-architecture plan (Phase D).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
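The gating contract those five tests pin down can be stated as a small pure function. This is a sketch of the contract only, not the Designer.tsx code: `HotkeyBinding` and `nodeSearchHotkeys` are hypothetical names, and the real component registers the bindings through useHotkeys.

```typescript
interface HotkeyBinding {
  keys: string;
  enabled: boolean;
}

// Both bindings are always registered; only the `enabled` flag differs
// between the web host and the VS Code host.
function nodeSearchHotkeys(isVSCode: boolean): HotkeyBinding[] {
  return [
    { keys: "ctrl+shift+p", enabled: !isVSCode }, // web binding
    { keys: "ctrl+alt+p", enabled: isVSCode }, // VS Code binding
  ];
}
```

Registering both and toggling `enabled` keeps the registration count stable across renders, which is what the first assertion in the spec checks.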
…to e2e-optimizations
…poll CI run 25909925774 with the workflow-registered probe from 9889d6f showed the full HTTP-probe chain firing correctly (host running 161ms, workflows registered 15ms, button found 18ms), but the overview UI kept the Run trigger button disabled for ~3 minutes on cold-start Linux CI runners, independent of the two existing HTTP signals. Root cause: the overview UI gates the Run trigger button on `!isWorkflowRuntimeRunning || !canRunTrigger` where `canRunTrigger = Boolean(workflowProperties.callbackInfo)` (libs/designer-ui/src/lib/overview/overviewcommandbar.tsx:64 + libs/designer-ui/src/lib/overview/index.tsx:136). The callbackInfo is populated when the extension host successfully POSTs to `{baseUrl}/workflows/{name}/triggers/{triggerName}/listCallbackUrl?api-version=2019-10-01-edge-preview` (apps/vs-code-designer/src/app/commands/workflows/openOverview.ts:468). On cold-start runners this endpoint keeps failing for ~3 min after the workflow already appears in the /workflows registration listing — the trigger route just hasn't fully bound yet. Add waitForRunTriggerEnabled() helper that mirrors the waitForWorkflowsRegistered pattern: 180s default timeout, 2s polling, throttled 10s progress logs, screenshot + diagnostic body dump on timeout. The probe discovers the workflow name and trigger name from the management API, then polls the same listCallbackUrl POST the extension host uses; returns success on HTTP 200 with a non-empty `value` field. Wired into clickRunTrigger between waitForWorkflowsRegistered and the existing button-enablement poll so the latter now resolves in seconds instead of timing out. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
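The button-gating expression and the probe's success condition quoted above can be written out as pure functions. The sketch is illustrative: `isRunTriggerDisabled`, `buildListCallbackUrl`, and `isCallbackReady` are hypothetical names, the base URL is supplied by the caller, and treating `value` as a non-empty string is an assumption about the response shape.

```typescript
// Mirrors the quoted condition: disabled when the runtime is not running
// OR callbackInfo has not been populated yet.
function isRunTriggerDisabled(isWorkflowRuntimeRunning: boolean, callbackInfo: unknown): boolean {
  const canRunTrigger = Boolean(callbackInfo);
  return !isWorkflowRuntimeRunning || !canRunTrigger;
}

// Build the listCallbackUrl endpoint from the template in the commit message.
// `baseUrl` is whatever management base URL the caller already resolved.
function buildListCallbackUrl(baseUrl: string, workflowName: string, triggerName: string): string {
  const wf = encodeURIComponent(workflowName);
  const trigger = encodeURIComponent(triggerName);
  return `${baseUrl}/workflows/${wf}/triggers/${trigger}/listCallbackUrl?api-version=2019-10-01-edge-preview`;
}

// Probe success: HTTP 200 with a non-empty `value` field.
// Assumption: `value` is a string in the response body.
function isCallbackReady(status: number, body: unknown): boolean {
  if (status !== 200 || typeof body !== "object" || body === null) return false;
  const value = (body as { value?: unknown }).value;
  return typeof value === "string" && value.length > 0;
}
```

Polling `isCallbackReady` before the DOM poll matches the ordering described above: the button can only enable once this endpoint has answered.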
…n missing workflow.json
Two-part fix for CI run 25911660164 timeouts on PR Azure#9164 (newtests + scenarios-pilot shards):
1) Tighten waitForWorkflowsRegistered to probe GET /workflows/{name} when a workflow name is provided. The previous list-form probe returned a stale/template workflow within 15 ms while the test-created testwf_* workflow never registered, letting listCallbackUrl 404 for the full 180 s budget. waitForRunTriggerEnabled and clickRunTrigger now accept and thread workflowName so they target the specific workflow instead of auto-discovering whatever is registered. inlineJavascript.test.ts and statelessVariables.test.ts pass entry.wfName through.
2) Add fail-fast disk check in waitForOverviewView. When the Create-Workflow UI step silently fails to produce workflow.json, the previous behavior burned the full 90 s overview-open budget retrying the Explorer probe (3 'workflow.json not found in Explorer tree' logs per attempt), then surfaced 180 s later as 'listCallbackUrl never returned a value' instead of pointing at the real cause. A single fs.existsSync check at the top of waitForOverviewView now asserts immediately with a clear 'Create-Workflow UI step did not produce workflow.json' message.
Probe chain is now: :7071/admin/host/status -> GET /workflows/{name} -> POST .../listCallbackUrl -> button-enablement DOM poll.
CI run 25913438556 showed GET /workflows/{name} returning 200 in ~13ms (false positive) while GET /workflows/{name}/triggers 404'd for the full 180s listCallbackUrl timeout. The triggers endpoint is the actual precondition for listCallbackUrl, so gate waitForWorkflowsRegistered on it (requiring a non-empty array) when workflowName is provided. Also log both the upstream registration probe URL and the listCallbackUrl probe URL on listCallbackUrl timeout so future endpoint mismatches are visible at a glance.
Investigation of CI 25915000783 showed :7071 answers /admin/host/status=Running in 168-306ms after F5 - physically impossible for cold-started func host. Some other process (likely design-time host or orphan) owns the port and returns 404 WorkflowNotFound to /workflows/{name}/triggers.
- Add killPortBound + prepareForFreshFuncHost helpers
- Call before startDebugging to guarantee fresh workflow runtime on :7071
- Log suspiciously-fast host status (<2s) with full process+config dump
- Cross-platform (Linux lsof / Windows Get-NetTCPConnection)
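The "suspiciously fast" heuristic and the per-platform port lookup can be sketched as below. Only the tool names (lsof, Get-NetTCPConnection) come from the list above; the exact flags, the function names, and the 2000ms default are assumptions for illustration, not the helpers' actual implementation.

```typescript
type ProbePlatform = "linux" | "win32";

// A freshly started func host cannot plausibly answer within a couple of
// seconds of F5, so a faster answer suggests another process owns the port.
function isSuspiciouslyFast(elapsedMs: number, thresholdMs: number = 2000): boolean {
  return elapsedMs < thresholdMs;
}

// Shell command that prints the PID(s) listening on the given TCP port.
// Assumed invocations: `lsof -t -i tcp:PORT` on Linux and the
// Get-NetTCPConnection cmdlet's OwningProcess property on Windows.
function portOwnerCommand(platform: ProbePlatform, port: number): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${port}`);
  }
  if (platform === "win32") {
    return `powershell -NoProfile -Command "(Get-NetTCPConnection -LocalPort ${port} -State Listen).OwningProcess"`;
  }
  return `lsof -t -i tcp:${port}`;
}
```

The 168-306ms responses reported in the investigation fall under the threshold, while a 171s cold start would not.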
CI run 25917034859 proved the workflow IS registered on :7071 but with health.state=Unhealthy due to InlineCodeDependencyGeneratorFailure (cold-start inline-code node_modules generation). The runtime self-heals — newtests retry 3 succeeds once node_modules exists from prior runs. Switch waitForWorkflowsRegistered to scan the workflow LIST endpoint (always returns 200) and require the named entry to have health.state===Healthy. The list endpoint answers presence + health in one call, replacing the /triggers probe which only proved trigger-binding. Bump default timeout 180s → 240s to absorb cold-start dep-generation. Log full entry.health on timeout for unambiguous evidence.
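The list-scan check described above amounts to finding the named entry and reading its health state. A minimal sketch, assuming the list response is an array of entries with `name` and `health.state` fields; `WorkflowEntry`, `RegistrationCheck`, and `checkWorkflowHealth` are hypothetical names.

```typescript
// Assumed shape of one entry in the workflow list response.
interface WorkflowEntry {
  name: string;
  health?: { state?: string };
}

interface RegistrationCheck {
  registered: boolean; // the named workflow appears in the list
  healthy: boolean; // and reports health.state === "Healthy"
  state: string | null; // raw state, kept for timeout diagnostics
}

// One pass over the list answers both presence and health.
function checkWorkflowHealth(list: WorkflowEntry[], workflowName: string): RegistrationCheck {
  const entry = list.find((w) => w.name === workflowName);
  if (!entry) return { registered: false, healthy: false, state: null };
  const state = entry.health?.state ?? null;
  return { registered: true, healthy: state === "Healthy", state };
}
```

Keeping the raw state in the result lets a timeout message distinguish "never registered" from "registered but Unhealthy".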
…ildren
CI run 25920892436 proved /usr/local/bin alone is not on the func host child's sanitized PATH (env -i verification passed but runtime still emits 'node ... could not be found on PATH'). Belt-and-suspenders:
- Mirror node/npm/npx symlinks into /usr/bin (always on any minimally sanitized PATH, even ones that exclude /usr/local/bin)
- Export PATH explicitly on the test-run step so xvfb-run -> VS Code -> func host -> child processes inherit the toolcache location regardless of whether VS Code's task-runner sanitizes the env
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The strict health-state probe (c2cd9f3) surfaced a pre-existing product bug: Azure Functions in-proc8 runtime's InlineCode dependency generator cannot resolve 'node' even when PATH is correct (/opt/hostedtoolcache/.../bin:/usr/local/bin:/usr/bin:/bin) and /usr/bin/node is symlinked. Runtime emits health.state=Unhealthy with 'The node process needed for inline code dependency generation could not be found on PATH'. The bug is in product code (Functions host launcher or runtime), not test infra. Belt-and-suspenders PATH fixes in commits 824fca2 and 6203d40 verified node IS resolvable via env -i /usr/bin/node, so the issue is non-PATH lookup somewhere in VS Code -> func host -> dep generator chain. Skip Phase 4.3 on Linux CI to restore green for parallelization PR; re-enable once host-side node-resolution is fixed. Other platforms unaffected. Phase 4.4 (statelessVariables) doesn't use inline code so it passes standalone once the cascade from 4.3 is removed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the deleted log-only test (commit 35b4856) with a true E2E that asserts the actual production keyboard contract:
- Ctrl+Alt+P opens 'Go to operation' panel (the real VS Code hotkey; Ctrl+Shift+P is web-only)
- Escape closes the panel
- Ctrl+Shift+P does NOT open NodeSearch in VS Code (locks the !isVSCode gating in Designer.tsx)
Uses stable aria selectors ([role=dialog][aria-label=Go to operation]) - no product instrumentation required. Reuses Phase 4.1 Stateful Standard workspace - no debug/run needed. Estimated ~60-75s wall time in the createplusnewtests shard.
Selenium Actions API for keyboard input (per SKILL.md rule 6). Anchors focus on the React Flow canvas before each keystroke. Test C runs last because Ctrl+Shift+P opens the VS Code host palette outside the iframe.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nlineCodeDependencyGeneratorFailure on Linux
The extension emitted a Windows-only PATH literal (\NodeJs;...\DotNetSDK;$env:PATH)
in the func: host start task's options.env at 10+ sites. On Linux this overrides
inherited PATH with garbage (literal backslashes, semicolons as separator,
un-expanded PowerShell variable). The in-proc8 Functions runtime's InlineCode
dependency generator could not find 'node' on PATH despite node being available
at /usr/bin/node and /opt/hostedtoolcache.
Replace literal with VS Code's documented platform-keyed task syntax using
${env:PATH}. Consolidate emission behind getFuncHostTaskEnv() helper.
Add languageWorkers__node__defaultExecutablePath to local.settings.json as
belt-and-braces (set in preDebugValidate when the managed node binary path
is resolved).
Remove the temporary Linux-CI skip from inlineJavascript.test.ts (commit
5e57e90) — the strict health-state probe added in c2cd9f3 will now
validate end-to-end.
Closes Azure#9172.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
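A rough sketch of the platform-keyed shape this commit describes follows. It is not the actual getFuncHostTaskEnv() implementation: the function name `buildFuncHostTaskEnv`, the `depsDir` parameter, and the exact set of blocks are hypothetical. What it relies on is that VS Code tasks accept per-platform `windows`, `linux`, and `osx` overrides and expand `${env:PATH}` on every platform, whereas the old `$env:PATH` literal is PowerShell syntax that Linux leaves unexpanded.

```typescript
// Hypothetical types for a platform-keyed task env block.
interface TaskEnvBlock {
  options: { env: { PATH: string } };
}

interface PlatformTaskEnv {
  windows: TaskEnvBlock;
  linux: TaskEnvBlock;
  osx: TaskEnvBlock;
}

// `depsDir` stands in for wherever the managed NodeJs/DotNetSDK folders live (assumed).
function buildFuncHostTaskEnv(depsDir: { windows: string; posix: string }): PlatformTaskEnv {
  const block = (dirs: string[], separator: string): TaskEnvBlock => ({
    // "${env:PATH}" is a plain string here; VS Code substitutes it when the task runs.
    options: { env: { PATH: [...dirs, "${env:PATH}"].join(separator) } },
  });
  const windowsDirs = [`${depsDir.windows}\\NodeJs`, `${depsDir.windows}\\DotNetSDK`];
  const posixDirs = [`${depsDir.posix}/NodeJs`, `${depsDir.posix}/DotNetSDK`];
  return {
    windows: block(windowsDirs, ";"), // backslashes and ';' only on Windows
    linux: block(posixDirs, ":"),
    osx: block(posixDirs, ":"),
  };
}
```

Keeping the helper a pure function of its inputs is what makes the separator and path-format contracts checkable in a plain node test environment, without importing 'vscode'.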
…y + assertions
Five distinct race conditions caused Phase 4.8d to flake on xvfb:
1. Test scanned notifications, but the prompt is { modal: true }
2. Two separate ModalDialog instantiations raced when modal lost focus
3. Hardcoded English button labels broke on localized runners
4. No focus restore - xvfb does not auto-raise modal windows
5. Lingering quick-input from prior phase ate the Open Folder typing
Reliability fixes (R1-R9):
- R1: drop notification-scanning, ModalDialog only
- R2: single dialog handle with stale-element retry (pushDialogButtonWithRetry)
- R3: force-focus + Tab+Enter fallback for xvfb-robust click
- R4: locale-lock via LANG/LC_ALL/VSCODE_NLS_CONFIG (label matches DialogResponses.yes.title)
- R5: safeCancelAnyQuickInput pre-flight
- R6: elementIsVisible wait before click
- R7: timeout bumps 60s->120s phase, 30s->45s modal
- R8: dumpDialogDiagnostics on click failure
- R9: milestone screenshots (start, prompt-found, focus-applied, post-click, session-error)
Real assertions:
- A1-A7: pre-click FS invariants (.code-workspace, host.json, launch.json with valid configurations, tasks.json with 'func: host start', workflow.json)
- B1-B3: reload detection (session ended | title change | url change)
- C1-C3: post-reload FS reassertions (no error log, mtime not regressed, A1-A7 re-verified)
- D1-D3: UI state (only on non-reload path)
Every Selenium call after pushDialogButtonWithRetry is wrapped against
isSessionEnded so the expected B1 (session-ended) path is treated as success.
BOM is stripped from .code-workspace reads on Windows.
DialogResponses is intentionally not imported: @microsoft/vscode-azext-utils
does require('vscode') at module load, and ExTester runs Mocha in a separate
process where the vscode module is unavailable. Locale is locked instead.
Note: allowFailure: true REMAINS at run-e2e.js:932 pending validation
of 3 consecutive green CI runs per the R10 gate.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
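The single-handle, stale-element retry idea behind R2 can be sketched as a generic helper. This is an illustration under assumptions, not the real pushDialogButtonWithRetry: `retryOnStale`, its parameters, and the attempt count are hypothetical, and the stale check assumes the error's `name` is `StaleElementReferenceError`, as selenium-webdriver names it.

```typescript
// Generic retry wrapper: re-acquires the target on every attempt so a stale handle is never reused.
// Returns the attempt number that succeeded.
async function retryOnStale<T>(
  acquire: () => Promise<T>,
  act: (target: T) => Promise<void>,
  isStale: (err: unknown) => boolean,
  maxAttempts = 3 // assumed; the real helper may use a different budget
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const target = await acquire();
      await act(target);
      return attempt;
    } catch (err) {
      // Only stale-element errors are retried; anything else surfaces immediately.
      if (!isStale(err) || attempt === maxAttempts) throw err;
    }
  }
  throw new Error("unreachable: loop either returns or throws");
}

// Assumed stale check based on the error name used by selenium-webdriver.
const isStaleElementError = (err: unknown): boolean =>
  err instanceof Error && err.name === "StaleElementReferenceError";
```

In a test, `acquire` would locate the modal dialog and `act` would click the button, so each retry works against a freshly located element rather than the one that went stale.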
…llers from coverage gate
The new pure-function helper getFuncHostTaskEnv (Track 1) gets 12 vitest unit tests covering all 4 platform-keyed blocks, separator/path-format contracts, cwd extras, and the cross-platform return shape.
The 9 VS Code task / project-init files that thread the helper's output into VS Code APIs (tasks.ts, validatePreDebug.ts, init project steps, CreateLogicAppVSCodeContents.ts) cannot be exercised in the vitest node environment - they all import from 'vscode' or '@microsoft/vscode-azext-*' at module load. Add them to .github/workflows/pr-coverage.yml's existing exclusion list using the same justification pattern as the 8 pre-existing extension-only exclusions (getAuthorizationToken, startStreamingLogs, languageServer, deploy, exportLogicApp, startRuntimeApi, extensionVariables, main).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…edge
Extract durable learnings from PR Azure#9164's 4-day reliability arc:
- New: runtime-readiness-probes.md (4-probe HTTP readiness chain, anti-patterns)
- New: vscode-task-env-propagation.md (func: host start PATH bug fix in b1b094a)
- vscode-e2e-testing.md: diagnostics-first discipline (the meta-lesson), 5-shard CI matrix + workflow_dispatch fallback, prepareFreshSession, true-E2E criteria, xvfb modal dialog 5 races
- ci-patterns.md: parallel-worktree merge strategy, git new vs raw worktree add, fork stacked-PR limitation, PR coverage gate
- review-patterns.md: PR template Commit Type vs title prefix and Test Plan vs diff
- INDEX.md + README.md: register new files and triggers
Step 1 of sub-15min CI restructuring plan. Reduces critical path 27.5m -> ~25m.
Changes:
- Fix .turbo cache key to hash-based content key (was github.sha, which misses on every push)
- New setup-extension-build job that runs ONCE per PR: pnpm install, build extension, compile e2e tests. Tars and uploads artifacts.
- All 5 matrix shards now declare needs: setup-extension-build and download the artifact instead of duplicating the ~3min build work.
Validates the artifact-passing pattern that Steps 2 and 3 will build on. No test code changes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
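Why a SHA-based key misses on every push while a content-based key does not can be shown with a small sketch. This is purely illustrative: in the workflow the hashing is done by GitHub Actions itself, and the function names, the in-memory `files` map, and the FNV-1a hash (chosen only to keep the example dependency-free) are assumptions.

```typescript
// Tiny FNV-1a hash, used here only so the sketch needs no imports.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Content-based key: changes only when a hashed input (path or content) changes.
function contentCacheKey(os: string, files: Record<string, string>): string {
  const material = Object.keys(files)
    .sort() // stable ordering so the key does not depend on enumeration order
    .map((path) => `${path}\0${files[path]}\0`)
    .join("");
  return `${os}-turbo-${fnv1a(material)}`;
}

// Previous scheme: keyed on the commit SHA, which differs on every push.
function shaCacheKey(os: string, commitSha: string): string {
  return `${os}-turbo-${commitSha}`;
}
```

Two pushes that touch only files outside the hashed set produce the same content key and so reuse the cache, while their SHA keys always differ.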
🤖 AI PR Validation Report
PR Review Results
Thank you for your submission! Here's detailed feedback on your PR title and body compliance:
✅ PR Title
✅ Commit Type
❌ Risk Level
❌ What & Why
❌ Impact of Change
❌ Test Plan
Note: The
| Section | Status | Recommendation |
|---|---|---|
| Title | ✅ | No change needed. |
| Commit Type | ✅ | Correctly marked perf. |
| Risk Level | ❌ | Update to Medium and apply risk:medium label (or justify Low and add risk:low label). |
| What & Why | ❌ | Correct the claim that only workflow YAML changed; enumerate product/test/agent changes. |
| Impact of Change | ❌ | Clarify developer and system impact (VS Code extension/task env, test/E2E, CI artifacts). |
| Test Plan | ❌ | Tick Unit/E2E tests; list added/updated tests and CI run IDs; explain manual validation steps. |
| Contributors | | Consider listing additional co-authors/reviewers if applicable. |
| Screenshots/Videos | ✅ | N/A is acceptable; optionally link to CI artifacts/screenshots. |
Final notes & next steps
- Update the PR body to reflect the actual diff: fix the inaccurate sentence that says no product/test code was modified. List the files/categories changed.
- Change the Risk Level selection to Medium (or provide a concrete justification to keep it Low) and add the matching repo label (`risk:medium`) to this PR. Remove the `needs-pr-update` label after edits.
- Fix the Test Plan: check the boxes for Unit tests and E2E tests that were added/updated, list the test paths, and cite CI run IDs that exercised the changes (e.g., `25940003590`). If you ran local/manual validation, add a short bulleted summary of commands and observed results.
- Wait ~5–7 minutes after updating the PR body + labels for the `AI PR Validation` workflow to re-run and confirm all sections show ✅ in the bot comment. If the bot still flags sections, iterate until the summary comment is all green.
If you’d like, I can draft the corrected PR body for you (adjusting the What & Why, Impact, and Test Plan sections to match the diff) and provide the exact gh pr edit command to apply the body + label edit in one operation to trigger a fresh validation run.
Advised risk was higher than the submitter's (Medium vs Low). Please update the PR body and labels as recommended, then re-run the CI/validation checks. Thank you!
Last updated: Sat, 16 May 2026 00:26:18 GMT
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR primarily optimizes the vscode-e2e GitHub Actions workflow by de-duplicating build work across matrix shards and fixing the Turbo cache key so builds can reuse cache across pushes, while also introducing a broad set of VS Code extension E2E reliability improvements, task-env fixes, and Squad workflow/docs additions.
Changes:
- Add a shared `setup-extension-build` job and wire `vscode-e2e` shards to download/extract a prebuilt artifact instead of rebuilding per shard.
- Fix `.turbo` cache key to be content-hashed (with restore keys) rather than SHA-based.
- Update VS Code extension tests, task PATH handling, pre-debug validation, and add extensive `.squad/` playbooks/prompts/knowledge to formalize workflow conventions.
Reviewed changes
Copilot reviewed 89 out of 89 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| libs/designer/src/lib/ui/test/keyboardNavigation.spec.tsx | Adds unit tests asserting Designer NodeSearch hotkey wiring via mocked useHotkeys. |
| docs/ai-setup/shared.md | Adds repo “always do these first” rules and documents CI shard E2E_MODEs. |
| docs/ai-setup/packages/vs-code-designer.md | Updates VS Code designer E2E docs (removes old Phase 4.6 claim; documents shard modes). |
| apps/vs-code-designer/src/test/ui/workspaceConversionCreate.test.ts | Hardens conversion-create E2E against stale elements/toast overlays; adds diagnostics/screenshots. |
| apps/vs-code-designer/src/test/ui/statelessVariables.test.ts | Adjusts debug/overview flow and adds retry to reduce CI flake risk. |
| apps/vs-code-designer/src/test/ui/multipleDesigners.test.ts | Minor string formatting tweak for quick input path entry. |
| apps/vs-code-designer/src/test/ui/keyboardNavigation.test.ts | Replaces prior non-asserting focus logging with explicit Go-to-operation panel contract tests. |
| apps/vs-code-designer/src/test/ui/inlineJavascript.test.ts | Increases timeout and adds retry; updates overview/runtime-ready flow. |
| apps/vs-code-designer/src/test/ui/helpers.ts | Adds dialog/quick-input/workbench readiness helpers to stabilize VS Code UI automation. |
| apps/vs-code-designer/src/test/ui/createWorkspace.test.ts | Improves command palette reliability and webview iframe selection/polling; adds screenshots and longer validation waits. |
| apps/vs-code-designer/src/constants.ts | Adds inlineCodeNodeExecutablePathKey constant for in-proc8 node path pinning. |
| apps/vs-code-designer/src/assets/WorkspaceTemplates/TasksJsonFile | Fixes task PATH env to be OS-specific and use ${env:PATH} correctly. |
| apps/vs-code-designer/src/app/utils/vsCodeConfig/tasks.ts | Uses new getFuncHostTaskEnv() helper when generating func: host start task. |
| apps/vs-code-designer/src/app/utils/codeless/funcHostTaskEnv.ts | Introduces OS-keyed task env generator for correct PATH propagation. |
| apps/vs-code-designer/src/app/utils/codeless/__test__/funcHostTaskEnv.test.ts | Adds unit coverage for getFuncHostTaskEnv() contract across platforms. |
| apps/vs-code-designer/src/app/debug/validatePreDebug.ts | Adds best-effort pinning of absolute node path into local.settings for inline code dep generation. |
| apps/vs-code-designer/src/app/commands/initProjectForVSCode/initScriptProjectStep.ts | Switches generated tasks to use getFuncHostTaskEnv(). |
| apps/vs-code-designer/src/app/commands/initProjectForVSCode/initProjectStep.ts | Switches generated tasks to use getFuncHostTaskEnv(). |
| apps/vs-code-designer/src/app/commands/initProjectForVSCode/initDotnetProjectStep.ts | Switches generated tasks to use getFuncHostTaskEnv({ cwd }). |
| apps/vs-code-designer/src/app/commands/createProject/createCustomCodeProjectSteps/initCustomCodeScriptProjectStep.ts | Switches generated tasks to use getFuncHostTaskEnv(). |
| apps/vs-code-designer/src/app/commands/createProject/createCustomCodeProjectSteps/initCustomCodeProjectStepBase.ts | Rewrites tasks.json template generation as JSON + uses getFuncHostTaskEnv(). |
| apps/vs-code-designer/src/app/commands/createProject/createCustomCodeProjectSteps/initCustomCodeProjectStep.ts | Switches generated tasks to use getFuncHostTaskEnv(). |
| apps/vs-code-designer/src/app/commands/createNewCodeProject/CodeProjectBase/CreateLogicAppVSCodeContents.ts | Switches generated tasks to use getFuncHostTaskEnv({ cwd }). |
| apps/vs-code-designer/CLAUDE.md | Updates E2E phase table/modes to match new sharding guidance. |
| CLAUDE.md | Adds repo “always do these first” rules and CI shard E2E_MODEs. |
| .squad/team.md | Adds lifecycle and additional specialist agent entries + ownership notes. |
| .squad/routing.md | Expands routing rules (esp. for VS Code tests and lifecycle workflows). |
| .squad/prompts/vscode-e2e-test.md | Adds reusable prompt to drive VS Code ExTester E2E work. |
| .squad/prompts/test-strategy.md | Adds reusable prompt to select appropriate test coverage layer. |
| .squad/prompts/test-failure-analysis.md | Adds reusable prompt for unit/E2E/CI failure analysis. |
| .squad/prompts/session-learnings.md | Adds reusable prompt to curate durable cross-session learnings. |
| .squad/prompts/review-board.md | Adds reusable prompt to invoke senior review board workflow. |
| .squad/prompts/refresh-agent-context.md | Adds reusable prompt to refresh agent context from prior sessions. |
| .squad/prompts/pr-lifecycle.md | Adds reusable prompt for PR lifecycle ownership and iteration. |
| .squad/prompts/customer-repro.md | Adds reusable prompt for safe customer issue reproduction. |
| .squad/prompts/customer-regression-test.md | Adds reusable prompt to convert confirmed repros into regression tests. |
| .squad/prompts/ci-fix-loop.md | Adds reusable prompt for CI monitoring + failure iteration loop. |
| .squad/prompts/chronicle-improve.md | Adds reusable prompt for chronicle-driven improvement workflow. |
| .squad/prompts/chief-engineer.md | Adds a reusable “chief-engineer” orchestration prompt. |
| .squad/playbooks/vscode-testing.md | Adds VS Code testing playbook (unit + ExTester suite discipline). |
| .squad/playbooks/session-knowledge-feed.md | Adds playbook for extracting/curating durable learnings safely. |
| .squad/playbooks/senior-swe-review-board.md | Adds playbook defining senior review checkpoints & expectations. |
| .squad/playbooks/pr-lifecycle.md | Adds end-to-end PR lifecycle playbook (plan→implement→CI→summary). |
| .squad/playbooks/customer-repro.md | Adds customer repro playbook with sanitization and coverage guidance. |
| .squad/playbooks/chronicle-driven-improvement.md | Adds playbook for Copilot CLI chronicle-based improvement inputs. |
| .squad/playbooks/central-agent.md | Adds playbook for single-entry central orchestration. |
| .squad/knowledge/vscode-task-env-propagation.md | Adds curated learning about VS Code task PATH propagation and fixes. |
| .squad/knowledge/unit-testing.md | Adds curated learnings about unit testing conventions in this repo. |
| .squad/knowledge/session-learnings.md | Adds curated learnings extracted from prior sessions. |
| .squad/knowledge/runtime-readiness-probes.md | Adds curated learnings for runtime readiness probes (VS Code Functions). |
| .squad/knowledge/review-patterns.md | Adds curated learnings about review/PR template validation patterns. |
| .squad/knowledge/customer-repro.md | Adds curated learnings about safe customer repro patterns. |
| .squad/knowledge/ci-patterns.md | Adds curated CI patterns (incl. coverage gate notes). |
| .squad/knowledge/agent-improvements.md | Adds curated improvements for agent routing/workflow. |
| .squad/knowledge/README.md | Introduces knowledge directory README and safety rules. |
| .squad/knowledge/INDEX.md | Adds trigger-to-knowledge index and “hard rules” for tasks. |
| .squad/agents/vscode-test-specialist/charter.md | Introduces charter for VS Code test specialist agent. |
| .squad/agents/test/charter.md | Expands test agent responsibilities and VS Code testing guidance. |
| .squad/agents/session-knowledge-curator/charter.md | Introduces charter for session knowledge curator agent. |
| .squad/agents/senior-swe-reviewer/charter.md | Introduces charter for senior diff reviewer agent. |
| .squad/agents/senior-swe-planner/charter.md | Introduces charter for senior plan reviewer agent. |
| .squad/agents/senior-swe-critic/charter.md | Introduces charter for senior design/risk critic agent. |
| .squad/agents/senior-swe-adjudicator/charter.md | Introduces charter for senior adjudicator agent. |
| .squad/agents/review-critic/charter.md | Introduces charter for independent review critic agent. |
| .squad/agents/release-scribe/charter.md | Introduces charter for release/PR summary writer agent. |
| .squad/agents/pr-orchestrator/charter.md | Introduces charter for PR orchestrator agent. |
| .squad/agents/pr-comment-triage/charter.md | Introduces charter for PR comment triage agent. |
| .squad/agents/plan-auditor/charter.md | Introduces charter for plan auditor agent. |
| .squad/agents/customer-repro-tester/charter.md | Introduces charter for customer repro tester agent. |
| .squad/agents/ci-sentinel/charter.md | Introduces charter for CI sentinel agent. |
| .squad/agents/chief-engineer/charter.md | Introduces charter for chief engineer orchestration agent. |
| .squad/README.md | Updates Squad overview with lifecycle agents and playbook links. |
| .squad/AGENT_WORKFLOW.md | Adds a worktree/skip-worktree workflow doc for agent file isolation. |
| .github/workflows/vscode-e2e.yml | Implements shared build job, artifact passing, shard mode wiring, and summary job. |
| .github/workflows/pr-coverage.yml | Expands files_ignore for VS Code API-dependent sources; notes unit-tested helper. |
| .github/instructions/vs-code-designer.instructions.md | Updates E2E docs/modes and removes prior keyboard navigation phase claim. |
| .github/copilot-instructions.md | Adds “always do these first” rules and documents CI shard E2E_MODEs. |
| .github/agents/vscode-test-specialist.agent.md | Adds agent descriptor for VS Code test specialist. |
| .github/agents/test.agent.md | Adds agent descriptor for test specialist. |
| .github/agents/session-knowledge-curator.agent.md | Adds agent descriptor for session knowledge curator. |
| .github/agents/release-scribe.agent.md | Adds agent descriptor for release scribe. |
| .github/agents/pr-orchestrator.agent.md | Adds agent descriptor for PR orchestrator. |
| .github/agents/customer-repro-tester.agent.md | Adds agent descriptor for customer repro tester. |
| .github/agents/ci-sentinel.agent.md | Adds agent descriptor for CI sentinel. |
| .github/agents/chief-engineer.agent.md | Adds agent descriptor for chief engineer. |
Comments suppressed due to low confidence (6)
.github/workflows/vscode-e2e.yml:1
- The matrix shards run `node apps/vs-code-designer/src/test/ui/run-e2e.js` without any dependency install step on that runner. Since `setup-extension-build` runs on a different VM, its `pnpm install` does not provide `node_modules` here; downloading `dist/` + `out/` alone is unlikely to satisfy `require()`s for `vscode-extension-tester` and other runtime deps. Fix by reintroducing a minimal install on each shard (e.g. `pnpm install --frozen-lockfile` / `pnpm install --filter apps/vs-code-designer...`) while keeping the build/tsup work centralized, or alternatively include the needed runtime `node_modules` in the artifact (with size/cache tradeoffs).
name: VS Code Extension E2E Tests
.squad/knowledge/vscode-task-env-propagation.md:1
- This file references `funcHostTaskEnv.spec.ts`, but the added unit test file is `apps/vs-code-designer/src/app/utils/codeless/__test__/funcHostTaskEnv.test.ts`. Please update the knowledge entry to point at the correct path/name so future readers can find the coverage quickly.
apps/vs-code-designer/src/test/ui/keyboardNavigation.test.ts:1
- The comment’s source reference `libs/designer/src/lib/designer/Designer.tsx` appears inconsistent with other references in this PR (e.g., `libs/designer/src/lib/ui/Designer.tsx`). Please correct the path in the doc comment so it points to the actual file that owns the hotkey registration.
apps/vs-code-designer/src/test/ui/keyboardNavigation.test.ts:1
- The test asserts focus by comparing the active element’s `placeholder` to `Search for operation`. Placeholders are user-facing strings and are commonly localized/changed, which can make this check brittle. Prefer asserting focus using a stable attribute (e.g., `role="textbox"` within the dialog, an `aria-label`, or another non-localized identifier exposed by the component).
apps/vs-code-designer/src/test/ui/createWorkspace.test.ts:1
- The retry log message says `setText failed`, but the updated path now drives the underlying `<input>` via `sendKeys()` rather than `InputBox.setText()`. Consider updating the log wording to match the actual operation (e.g., 'command palette input failed') to reduce confusion when diagnosing CI logs.
apps/vs-code-designer/src/test/ui/createWorkspace.test.ts:1
- The 'Create Workspace form marker' heuristic is extremely broad (any `input` or `button` will satisfy it), so it can succeed even when the webview is the wrong tab or still partially mounted. To make `switchToWebviewFrame` more deterministic, consider polling for a more specific marker tied to the Create Workspace UI (e.g., a known `aria-label`, a wizard root element, or a dedicated `data-testid`/role selector) rather than generic element counts.
- name: Extract build artifacts
  run: tar -xzf extension-build.tar.gz

# Build the extension + compile the E2E test bundle ONCE per workflow run,
# then fan out to the matrix shards which download the resulting artifact.
# This eliminates ~3min of duplicated pnpm install + turbo build + tsup
# work that previously ran on each of the 5 matrix shards in parallel.
setup-extension-build:
CI run 25936983196 failed all 5 vscode-e2e shards because setup-extension-build tars only dist/ + out/ but the shards no longer run 'pnpm install', leaving node_modules empty. Every shard died pre-test with:
Cannot find module 'vscode-extension-tester'
Tarring node_modules itself doesn't work for pnpm workspaces (symlinks point into a content-addressed store at the repo root). The correct approach is to cache the pnpm store at the runner level and run 'pnpm install --offline' in each shard (~30-60s vs the ~2-3min previously) to rehydrate node_modules from the cache.
- Add 'Cache pnpm store' step in setup-extension-build (warms the cache).
- Add 'Cache pnpm store' + 'Setup pnpm' + 'Install dependencies from cache' steps to each matrix shard (uses --offline --frozen-lockfile to enforce cache hit).
Net per-shard saving: ~2 min vs pre-cache state, while preserving the build/compile work done by setup-extension-build.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…up 2)
CI run 25938692460 failed with ERR_PNPM_NO_OFFLINE_TARBALL -- pnpm install --offline is too strict when the cache isn't fully populated.
Restore the working pattern from PR Azure#9164's vscode-e2e.yml: pnpm/action-setup@v3 with run_install. The actions/cache@v4 step on the pnpm store stays -- it accelerates installs on warm runs, and network fallback handles cold runs without failing the build.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Commit Type
Risk Level
What & Why
Two pure-workflow changes that eliminate duplicated build work across the 5 matrix shards and stop invalidating the turbo cache on every push. No test or product code is modified; the diff is ~+90/-25 in `.github/workflows/vscode-e2e.yml`.
- `.turbo` cache key: was `${{ runner.os }}-turbo-${{ github.sha }}`, a guaranteed cache miss on every push. Now content-hashed over `apps/**/*.{ts,tsx}`, `libs/**/*.{ts,tsx}`, `pnpm-lock.yaml`, and `turbo.json`, with a fallback `restore-keys` for partial hits across PRs.
- `setup-extension-build` job: runs `pnpm install` + `pnpm turbo run build:extension` + `npx tsup` once per workflow run, tars `apps/vs-code-designer/dist/` and `out/`, and uploads as `extension-build-${sha}`.
- All 5 matrix shards (`independent`, `designer`, `newtests`, `conversion`, `scenarios-pilot`) now `needs: setup-extension-build` and download+extract the artifact instead of re-running pnpm install + build + tsup themselves. The `Symlink node` step is preserved on each shard because each matrix entry is an independent runner VM that does not inherit the symlinks from the setup job.
Critical-path effect: 27.5m → ~25m on this step alone, and, combined with Steps 2 (#9180) and 3 (#9181), drives the overall vscode-e2e critical path to 14m23s (measured on run `25947015328`, p41b). Validates the artifact-passing pattern that Steps 2 and 3 build on.
Impact of Change
- … `needs: setup-extension-build` + the download/extract steps are the new minimum.
- … `pnpm install` + `turbo build` + `tsup` work per shard. Adds one short serial leg (~3–4 min) before fan-out. Net: 5 shards × ~3 min duplicate work consolidated into 1.
Test Plan
- … `js-yaml`: all three jobs (`setup-extension-build`, `vscode-e2e`, `vscode-e2e-summary`) and `needs` wiring verified.
- … `25940003590`.
- … `workflow_dispatch:` trigger, `xvfb-run` PATH export, and `Cache Logic Apps runtime dependencies` step are all preserved unchanged.
Not applicable: unit / E2E test additions. This PR is workflow YAML only.
Contributors
@lambrianmsft
Screenshots/Videos
N/A — CI orchestration only.