ci: harden PR preview readiness gate #71

Merged
isuttell merged 2 commits into main from feat/harden-pr-preview-readiness
May 25, 2026

Conversation

@isuttell isuttell commented May 25, 2026

Summary

  • Phase 3 close-out item 3: harden the PR-preview readiness gate.
  • Adds a real per-worker health check and skips deploys for docs-only PRs.

Changes

  • Add an unauthenticated GET /healthz returning 200 ok (no cookies, no DB) to every deployed preview worker: api, upload, content (Hono, registered before auth/contract routes) and apex (routeApex). web already had one.
  • Register /healthz in each worker's nonContractRoutePaths guardrail and add a healthz test per worker.
  • Rewrite the preview "Wait for Workers" step to poll /healthz on every deployed worker and require 3 consecutive 200s before proceeding. Only an exact 200 counts as healthy; 404 (route not yet propagated) and 530 (Cloudflare error 1042) are treated as transient and retried within the budget (45 attempts x 2s).
  • Add paths-ignore (docs/**, **/*.md, .claude/**) so documentation-only PRs skip the per-PR preview deploy.
  • Check off backlog item 3 in docs/ops/status/phase-backlog.md.
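As a rough illustration of the endpoint described in the list above, here is a minimal sketch built only on the standard Fetch API types. The function name `handleHealthz` and the fall-through convention (returning `null`) are assumptions made for this example; the PR itself registers the route through Hono (api, upload, content) and `routeApex` (apex), and that code is not shown on this page.

```typescript
// Hypothetical standalone handler, not the PR's actual code.
// Returns a Response for GET /healthz and null for everything else,
// so a caller can fall through to its auth and contract routes.
function handleHealthz(request: Request): Response | null {
  const url = new URL(request.url);
  if (request.method !== "GET" || url.pathname !== "/healthz") {
    return null;
  }
  // No cookies are read and no database is touched: the body is constant.
  return new Response("ok", {
    status: 200,
    headers: { "content-type": "text/plain; charset=utf-8" },
  });
}
```

Checking this before any auth middleware runs mirrors the "registered before auth/contract routes" ordering mentioned above, so an unauthenticated probe never reaches the login logic.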

Risk

  • Low. The preview deploy is not a required status check on main, so the docs-only skip cannot block merges.
  • Deviation from the literal backlog wording ("curl --fail"): the gate uses an explicit [ status = 200 ] check instead. This is stricter than --fail (it rejects every non-200 status, not just >=400) and lets us capture and log the real status codes cleanly. Behavior is fail-closed.
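The difference between the two checks can be made concrete with a small comparison. This is only a sketch: `passesCurlFail` models the documented behavior of `curl --fail` (failure on HTTP status 400 or above, without redirect following), and both function names are invented for the example.

```typescript
// Models curl --fail: the request "succeeds" unless the status is >= 400.
function passesCurlFail(status: number): boolean {
  return status < 400;
}

// Models the gate in this PR: only an exact 200 counts as healthy.
function passesExact200(status: number): boolean {
  return status === 200;
}

// Statuses that --fail would let through but the exact check rejects.
const samples = [200, 204, 302, 404, 530];
const acceptedOnlyByCurlFail = samples.filter(
  (s) => passesCurlFail(s) && !passesExact200(s),
);

console.log(acceptedOnlyByCurlFail); // [ 204, 302 ]
```

A 204 or an unfollowed 302 would therefore have passed a `--fail` gate while still indicating that the worker is not serving the expected health response.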

Test plan

  • pnpm verify (72 Turbo tasks) green locally.
  • One local CodeRabbit pass; sole finding (status capture) addressed.
  • CI Validate passes on the PR.
  • PR preview deploy reaches the new /healthz gate and goes green.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added /healthz health-check endpoints across services returning HTTP 200 with plain-text "ok".
  • Chores

    • PR preview workflow now skips docs-only changes and uses improved readiness polling: requires three consecutive HTTP 200 responses per service and treats certain transient responses as retryable.
  • Documentation

    • Marked PR-preview workflow hardening complete and documented the new readiness behavior.

Add an unauthenticated GET /healthz returning 200 to every deployed
preview worker (api, upload, content, apex; web already had one) and
rewrite the preview readiness step to poll /healthz on each, requiring 3
consecutive 200s before proceeding. Only an exact 200 counts as healthy;
404 (route not yet propagated) and 530 (Cloudflare error 1042) are
treated as transient and retried within the budget. Skip per-PR deploys
for docs-only changes via paths-ignore.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@isuttell isuttell temporarily deployed to pr-preview-71 May 25, 2026 22:17 — with GitHub Actions Inactive

coderabbitai Bot commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 643af596-e772-49c6-9bbc-83cbfc6741c2

📥 Commits

Reviewing files that changed from the base of the PR and between 3e01dfc and b9b1e6e.

📒 Files selected for processing (1)
  • .github/workflows/pr-preview.yml

Walkthrough

This PR implements health check endpoints across the deployed workers and refactors the PR preview workflow readiness gate. A /healthz GET endpoint is added to apex, api, content, and upload workers, each returning plain-text "ok" with HTTP 200. Each endpoint is marked as a non-contract route and covered by tests. The workflow trigger is updated to skip documentation-only changes, and the readiness wait logic is replaced with a polling function that queries each worker's /healthz endpoint, requires three consecutive successful responses, and treats HTTP 404 and 530 as transient failures that reset the consecutive counter.
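The consecutive-success rule in the walkthrough can be sketched in isolation. The function below replays a recorded list of status codes instead of making network calls, so it shows only the streak counting, not the workflow's actual shell loop. The names `evaluateGate` and `GateResult` are hypothetical, and treating every non-200 status (not only 404 and 530) as a retryable reset is an assumption, since the page does not say how other codes are handled.

```typescript
type GateResult = { healthy: boolean; attemptsUsed: number };

// Replays observed status codes against the gate's rule:
// `required` consecutive 200s within at most `maxAttempts` polls.
function evaluateGate(
  statuses: number[],
  maxAttempts = 45, // attempt budget quoted in the PR description
  required = 3, // consecutive 200s quoted in the PR description
): GateResult {
  let streak = 0;
  const limit = Math.min(statuses.length, maxAttempts);
  for (let i = 0; i < limit; i++) {
    // Any non-200 response resets the streak (assumption for codes
    // other than 404 and 530).
    streak = statuses[i] === 200 ? streak + 1 : 0;
    if (streak >= required) {
      return { healthy: true, attemptsUsed: i + 1 };
    }
  }
  return { healthy: false, attemptsUsed: limit };
}

// A worker that flaps once before settling: the 404 at attempt 5
// discards the two earlier 200s, so the gate opens at attempt 8.
const flapping = evaluateGate([404, 530, 200, 200, 404, 200, 200, 200]);
console.log(flapping); // { healthy: true, attemptsUsed: 8 }
```

Requiring a streak rather than a single 200 guards against a worker that answers once and then briefly drops back to a propagation error.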

Sequence Diagram

sequenceDiagram
  participant Workflow as PR Preview Workflow
  participant ApexWorker as Apex Worker
  participant APIWorker as API Worker
  participant ContentWorker as Content Worker
  participant UploadWorker as Upload Worker
  
  Workflow->>ApexWorker: GET /healthz
  ApexWorker-->>Workflow: 200 ok
  
  Workflow->>APIWorker: GET /healthz (1)
  APIWorker-->>Workflow: 200 ok
  Workflow->>APIWorker: GET /healthz (2)
  APIWorker-->>Workflow: 200 ok
  Workflow->>APIWorker: GET /healthz (3)
  APIWorker-->>Workflow: 200 ok
  
  Workflow->>ContentWorker: GET /healthz
  ContentWorker-->>Workflow: 200 ok
  
  Workflow->>UploadWorker: GET /healthz (1)
  UploadWorker-->>Workflow: 404 Not Ready
  Workflow->>UploadWorker: GET /healthz (2)
  UploadWorker-->>Workflow: 200 ok
  Workflow->>UploadWorker: GET /healthz (3)
  UploadWorker-->>Workflow: 200 ok
  
  Workflow->>Workflow: All workers healthy — proceed

Possibly related PRs

  • zaks-io/agent-paste#2: Earlier PR that added a polling-based DNS propagation wait in the workflow; related to readiness gating changes.
  • zaks-io/agent-paste#62: Modifies the PR preview workflow to wait for /healthz when a per-PR web smoke URL is provided; closely related to this PR's readiness updates.
  • zaks-io/agent-paste#61: Documents the /healthz-based readiness gate hardening (consecutive polls + transient error handling) that this PR implements.

Poem

A rabbit checks each tunnel's beat—
/healthz pings three times for treat,
When workers bounce, we wait and see,
Three steady pings unlock the tree! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: hardening the PR preview readiness gate through health checks and documentation path filtering. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/pr-preview.yml:
- Around line 134-157: The error message in wait_for_healthz() uses MAX_ATTEMPTS
* SLEEP_SECONDS but omits the curl per-request timeout; update the timeout
calculation used in the final echo to include the curl --max-time value (5s) so
it reflects MAX_ATTEMPTS * (SLEEP_SECONDS + CURL_TIMEOUT). Add a local variable
(e.g., CURL_TIMEOUT=5) near the top of wait_for_healthz or reuse the literal 5,
compute total_wait=$((MAX_ATTEMPTS * (SLEEP_SECONDS + CURL_TIMEOUT))) and use
that variable in the "::error::" echo and return path so the reported seconds
match the actual maximum wait.
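The arithmetic behind this comment is easy to check by hand. The sketch below uses the figures quoted on this page (45 attempts, 2 s sleep, 5 s curl --max-time); the constant names follow the review comment, while `sleepOnly` and `worstCase` are names chosen for the example.

```typescript
const MAX_ATTEMPTS = 45; // attempt budget from the PR description
const SLEEP_SECONDS = 2; // pause between polls
const CURL_TIMEOUT = 5; // curl --max-time per request

// What the original error message reported: sleeps only.
const sleepOnly = MAX_ATTEMPTS * SLEEP_SECONDS;

// Worst case when every request also runs into its timeout.
const worstCase = MAX_ATTEMPTS * (SLEEP_SECONDS + CURL_TIMEOUT);

console.log(sleepOnly, worstCase); // 90 315
```

So a message based on the sleep alone could understate the real maximum wait by 225 seconds per worker when requests hang until the timeout.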

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 548ff1dc-5319-4001-8eda-6aae889a858d

📥 Commits

Reviewing files that changed from the base of the PR and between 72304c5 and 3e01dfc.

📒 Files selected for processing (10)
  • .github/workflows/pr-preview.yml
  • apps/apex/src/index.test.ts
  • apps/apex/src/routes.ts
  • apps/api/src/index.test.ts
  • apps/api/src/index.ts
  • apps/content/src/index.test.ts
  • apps/content/src/index.ts
  • apps/upload/src/index.test.ts
  • apps/upload/src/index.ts
  • docs/ops/status/phase-backlog.md

Comment thread .github/workflows/pr-preview.yml
Include curl --max-time in the worst-case wait shown in the gate's
timeout error so the reported seconds match actual behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@isuttell isuttell temporarily deployed to pr-preview-71 May 25, 2026 22:21 — with GitHub Actions Inactive
@isuttell isuttell merged commit 6027b8b into main May 25, 2026
4 checks passed
@isuttell isuttell deleted the feat/harden-pr-preview-readiness branch May 25, 2026 22:42
@github-actions

agent-paste PR preview resources were cleaned up. The pr-preview-${context.issue.number} environment is left in place; remove it from the GitHub UI if desired.
