Skip to content

fix(ci): reconcile stale PR preview resources#146

Merged
isuttell merged 5 commits into
mainfrom
codex/ap-97-pr-preview-cleanup-reconciliation
May 31, 2026
Merged

fix(ci): reconcile stale PR preview resources#146
isuttell merged 5 commits into
mainfrom
codex/ap-97-pr-preview-cleanup-reconciliation

Conversation

@isuttell

@isuttell isuttell commented May 30, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes AP-97 by making PR-preview cleanup event-independent. Closed or missing PR preview resources are now reconciled before new preview creation and on a scheduled cleanup workflow.

Changes

  • Add cleanup-stale-pr-previews.mjs to discover PR-scoped Hyperdrive configs, Queues, Workers, and stale GitHub PR state before cleaning Cloudflare and Neon resources.
  • Run stale reconciliation before PR preview capacity checks and every six hours from the cleanup workflow.
  • Keep single-PR manual cleanup intact and document the reconciliation path.

Risk: HIGH

  • Areas touched: GitHub Actions PR-preview deploy/cleanup workflows, Cloudflare and Neon cleanup scripts, ops docs.
  • Security: uses existing GitHub, Cloudflare, and Neon tokens from the Preview environment only; no secrets are logged.
  • Performance: scheduled reconciliation does bounded resource listing and only deletes closed or missing PR resources.
  • Breaking: no production runtime behavior changes.

Test plan

  • pnpm verify
  • pnpm test:coverage
  • pnpm test:scripts
  • pnpm exec vitest run scripts/cleanup-stale-pr-previews.test.mjs scripts/cleanup-pr-preview.test.mjs scripts/delete-neon-pr-branch.test.mjs scripts/check-pr-preview-capacity.test.mjs
  • pnpm exec biome check scripts/cleanup-stale-pr-previews.mjs scripts/cleanup-stale-pr-previews.test.mjs .github/workflows/pr-preview.yml .github/workflows/pr-preview-cleanup.yml
  • pnpm exec prettier --check .github/workflows/pr-preview.yml .github/workflows/pr-preview-cleanup.yml scripts/README.md docs/ops/first-deploy.md
  • git diff --check
  • gitleaks detect --no-git --redact --source . --log-opts --since=HEAD --verbose=false
  • gitleaks protect --staged --redact --verbose=false
  • Live cleanup: PR 139 Cloudflare resources deleted locally and Neon branch deleted by GitHub cleanup workflow run 26694370548
  • Live cleanup: PR 37 stale Workers deleted through Wrangler-backed cleanup script

Issue: AP-97

Summary by CodeRabbit

  • New Features

    • Added scheduled stale resource cleanup (every 6 hours) to automatically remove orphaned PR preview resources and prevent leaks from missed close events.
    • Enabled manual cleanup dispatch for handling edge cases without a specific PR number.
  • Documentation

    • Updated PR preview cleanup runbook to clarify the new scheduled reconciliation behavior.

Review Change Stack

@linear-code

linear-code Bot commented May 30, 2026

Copy link
Copy Markdown
AP-97 Fix PR preview resource cleanup reconciliation

Problem

PR preview creation can leave PR-scoped Cloudflare Workers, Queues, Hyperdrive configs, or Neon branches behind when the pull_request.closed cleanup event is missed or when a branch/PR flow creates a second PR after an earlier PR is already closed.

Observed stale resources:

  • PR 139 left Cloudflare preview resources and a Neon branch after the PR was closed.
  • A Wrangler scan later found old PR 37 Workers still deployed.

Scope

  • Add a repo-local stale PR preview cleanup reconciler.
  • Discover PR-scoped preview resources from Cloudflare resource names.
  • Check GitHub PR state and clean resources for closed or missing PRs.
  • Run reconciliation before creating a new PR preview and on a scheduled cleanup workflow.
  • Keep manual cleanup for a specific PR number working.
  • Document the reconciliation behavior.

Acceptance criteria

  • Closed or missing PR preview resources are cleaned without relying only on pull_request.closed.
  • Current PR preview creation excludes the active PR and does not delete its resources.
  • Cleanup remains idempotent when some resource classes are already gone.
  • Existing PR-specific cleanup still handles Workers, Queues, Hyperdrive, and Neon branch deletion.
  • Focused script tests cover discovery, dry run, missing GitHub context, and cleanup behavior.
  • pnpm verify passes.

Out of scope

  • Production resource cleanup.
  • Changing deployed preview app behavior beyond cleanup workflows.
  • Replacing Wrangler or Neon APIs used by existing scripts.

Review in Linear

@coderabbitai

coderabbitai Bot commented May 30, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 57f48708-250b-4882-b5c6-a314ca61e55b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/ap-97-pr-preview-cleanup-reconciliation

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/cleanup-stale-pr-previews.mjs`:
- Around line 32-35: The PR classification step is silently skipping items with
state "unknown", hiding API/auth failures; update the logic after
classifyPrNumbers (the classified variable returned by classifyPrNumbers) to
detect any item where item.state === "unknown" and fail fast (throw an Error or
exit non-zero) with a clear message including the offending prNumber(s) and any
diagnostic info from the classifier, instead of proceeding to build stale via
.filter/.map; apply the same check for the later identical block referenced
(lines around 179-181) so reconciliation stops on unknown classifications.

In `@scripts/cleanup-stale-pr-previews.test.mjs`:
- Around line 168-201: The test for cleanupStalePrPreviews currently treats a
500 response from GitHub as silent success; change it to assert that
cleanupStalePrPreviews rejects (fails) when the fetch for "/pulls/137" returns
status 500. Update the assertion to use expect(...).rejects.toThrow() (or
toReject with a matching error message) instead of resolves.toEqual(...),
keeping the same mocked run, fetch, cleanupPreview, and deleteNeonBranch fns,
and still assert that cleanupPreview and deleteNeonBranch were not called; this
ensures cleanupStalePrPreviews reports classification failures rather than
silently succeeding.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e0741338-4b99-4f52-b87b-007dc4823a0e

📥 Commits

Reviewing files that changed from the base of the PR and between 4f112de and 673985e.

📒 Files selected for processing (6)
  • .github/workflows/pr-preview-cleanup.yml
  • .github/workflows/pr-preview.yml
  • docs/ops/first-deploy.md
  • scripts/README.md
  • scripts/cleanup-stale-pr-previews.mjs
  • scripts/cleanup-stale-pr-previews.test.mjs

Comment on lines +32 to +35
const classified = await classifyPrNumbers(candidates, options.github, fetchFn);
const stale = classified
.filter((item) => item.state === "closed" || item.state === "missing")
.map((item) => item.prNumber);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail reconciliation when PR state classification is unknown.

Right now a GitHub API 5xx/auth failure is converted to "unknown" and silently skipped, which can make scheduled reconciliation report success while cleanup is effectively blind.

Suggested fix
   const classified = await classifyPrNumbers(candidates, options.github, fetchFn);
+  const unknown = classified.filter((item) => item.state === "unknown").map((item) => item.prNumber);
+  if (unknown.length > 0) {
+    throw new Error(
+      `Unable to classify GitHub PR state for ${unknown.length} candidate(s): ${unknown
+        .map((prNumber) => `#${prNumber}`)
+        .join(", ")}.`,
+    );
+  }
   const stale = classified
     .filter((item) => item.state === "closed" || item.state === "missing")
     .map((item) => item.prNumber);

Also applies to: 179-181

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/cleanup-stale-pr-previews.mjs` around lines 32 - 35, The PR
classification step is silently skipping items with state "unknown", hiding
API/auth failures; update the logic after classifyPrNumbers (the classified
variable returned by classifyPrNumbers) to detect any item where item.state ===
"unknown" and fail fast (throw an Error or exit non-zero) with a clear message
including the offending prNumber(s) and any diagnostic info from the classifier,
instead of proceeding to build stale via .filter/.map; apply the same check for
the later identical block referenced (lines around 179-181) so reconciliation
stops on unknown classifications.

Comment on lines +168 to +201
it("does not clean previews when GitHub PR state cannot be classified", async () => {
const run = vi.fn(async (_command, args) => {
if (args.join(" ") === "exec wrangler hyperdrive list") {
return { code: 0, stdout: hyperdriveList(["agent-paste-db-pr-137"]), stderr: "" };
}
if (args.join(" ") === "exec wrangler queues list") {
return { code: 0, stdout: "", stderr: "" };
}
throw new Error(`Unexpected command ${args.join(" ")}`);
});
const fetch = vi.fn(async (url) => {
if (String(url).endsWith("/pulls/137")) {
return textResponse("server error", { status: 500 });
}
throw new Error(`Unexpected fetch ${url}`);
});
const cleanupPreview = vi.fn(async () => {});
const deleteNeonBranch = vi.fn(async () => {});

await expect(
cleanupStalePrPreviews(
{ github, cloudflare: {}, neon },
{ run, fetch, cleanupPreview, deleteNeonBranch, log: () => {} },
),
).resolves.toEqual({
discovered: ["137"],
stale: [],
cleaned: [],
dryRun: false,
});

expect(cleanupPreview).not.toHaveBeenCalled();
expect(deleteNeonBranch).not.toHaveBeenCalled();
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update the GitHub-500 classification test to assert failure.

This test currently codifies silent success on PR-state classification failure, which can hide reconciliation outages.

Suggested test update
-  it("does not clean previews when GitHub PR state cannot be classified", async () => {
+  it("fails when GitHub PR state cannot be classified", async () => {
@@
-    await expect(
-      cleanupStalePrPreviews(
-        { github, cloudflare: {}, neon },
-        { run, fetch, cleanupPreview, deleteNeonBranch, log: () => {} },
-      ),
-    ).resolves.toEqual({
-      discovered: ["137"],
-      stale: [],
-      cleaned: [],
-      dryRun: false,
-    });
+    await expect(
+      cleanupStalePrPreviews(
+        { github, cloudflare: {}, neon },
+        { run, fetch, cleanupPreview, deleteNeonBranch, log: () => {} },
+      ),
+    ).rejects.toThrow("Unable to classify GitHub PR state");
 
     expect(cleanupPreview).not.toHaveBeenCalled();
     expect(deleteNeonBranch).not.toHaveBeenCalled();
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/cleanup-stale-pr-previews.test.mjs` around lines 168 - 201, The test
for cleanupStalePrPreviews currently treats a 500 response from GitHub as silent
success; change it to assert that cleanupStalePrPreviews rejects (fails) when
the fetch for "/pulls/137" returns status 500. Update the assertion to use
expect(...).rejects.toThrow() (or toReject with a matching error message)
instead of resolves.toEqual(...), keeping the same mocked run, fetch,
cleanupPreview, and deleteNeonBranch fns, and still assert that cleanupPreview
and deleteNeonBranch were not called; this ensures cleanupStalePrPreviews
reports classification failures rather than silently succeeding.

@isuttell isuttell merged commit 99bdb96 into main May 31, 2026
5 checks passed
@isuttell isuttell deleted the codex/ap-97-pr-preview-cleanup-reconciliation branch May 31, 2026 00:05
@github-actions

Copy link
Copy Markdown

agent-paste PR preview resources were cleaned up. The shared Preview GitHub Environment is retained for future preview deploys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant