Skip to content

fix(api,upload): replay idempotent mutations before rate-limit accounting#9

Merged
isuttell merged 6 commits into
mainfrom
worktree-agent-a58e503adc3cfbe5c
May 22, 2026
Merged

fix(api,upload): replay idempotent mutations before rate-limit accounting#9
isuttell merged 6 commits into
mainfrom
worktree-agent-a58e503adc3cfbe5c

Conversation

@isuttell

@isuttell isuttell commented May 22, 2026

Copy link
Copy Markdown
Contributor

Summary

Idempotent retries on the upload mutation routes now short-circuit before either rate-limit binding is touched, so a client replaying a known-successful command no longer burns through its per-actor or per-workspace budget. Genuinely new mutations still rate-limit normally.

Changes

  • Adds peekIdempotentReplay (SELECT-only) to @agent-paste/commands plus unit coverage for the hit/miss/in-flight cases.
  • Wires the peek into apps/upload/src/index.ts for POST /v1/upload-sessions and POST /v1/upload-sessions/:id/finalize. On cache hit the route shapes the cached UploadSessionRecord or finalize result through the existing signing helpers and returns 200, skipping ACTOR_RATE_LIMIT.limit() / WORKSPACE_BURST_CAP.limit().
  • Adds an upload-worker vitest that asserts a replayed key returns the cached response while both rate-limit bindings are stubbed to deny (would otherwise 429).
  • Hosted smoke (scripts/smoke-hosted.mjs) hammers the upload mutation with unique idempotency keys against preview/PR targets and asserts a 429 with error.code = rate_limited_actor and Retry-After: 60. Skipped on production.

Done

  • Idempotency replay lookup happens before rateLimitAuthenticatedRequest on both upload mutation routes.
  • Unit test covers same-key + over-limit second call returns the cached body, not 429.
  • Hosted smoke exercises 429 on the per-actor limit and asserts the envelope shape; skipped on production.
  • pnpm verify (49 turbo tasks) green.
  • pnpm smoke:local green.

Files touched

  • packages/commands/src/index.ts
  • packages/commands/src/index.test.ts
  • apps/upload/src/index.ts
  • apps/upload/src/index.test.ts
  • scripts/smoke-hosted.mjs

Summary by CodeRabbit

  • New Features

    • Upload API now supports idempotent replay: repeated idempotency keys return prior session responses (create/finalize) and skip rate-limit enforcement on replayed requests.
    • Added a public idempotency "peek" capability to read completed replay results.
  • Tests

    • Unit tests added for idempotent-replay behavior and its rate-limit interaction.
    • Non-production smoke check added to verify actor rate-limit responses.
  • Documentation

    • Project status updated to reflect idempotency and native rate-limit coverage.

Review Change Stack

…ting

Adds a SELECT-only peekIdempotentReplay helper to @agent-paste/commands
and uses it in both upload mutation routes to short-circuit known-good
retries before either rate-limit binding is touched. Hosted smoke now
hammers POST /v1/upload-sessions with unique idempotency keys to confirm
429 with the rate_limited_actor envelope on preview targets.
@coderabbitai

coderabbitai Bot commented May 22, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR adds a peek-only idempotency query (peekIdempotentReplay) to the commands package, extends UploadDatabase with an optional peekIdempotentReplay, and wires a peekUploadReplay helper that delegates to DB or a Hyperdrive fallback. createUploadSession and finalizeUploadSession now check for cached completed results (by operation and Idempotency-Key) and return replayed payloads before applying rate limits. Tests assert replay returns without incrementing rate-limit counters, smoke scripts exercise actor-rate-limit 429 behavior, and docs are updated.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant UploadHandler
  participant peekUploadReplay
  participant peekIdempotentReplay
  participant idempotency_records_or_hyperdrive
  Client->>UploadHandler: POST /upload-sessions (Idempotency-Key)
  UploadHandler->>peekUploadReplay: peek(operation="upload.session.create", key, actor)
  peekUploadReplay->>peekIdempotentReplay: delegate to DB peek or hyperdrive lookup
  peekIdempotentReplay->>idempotency_records_or_hyperdrive: SELECT or hyperdrive read for idempotency record
  alt completed replay exists
    idempotency_records_or_hyperdrive-->>peekIdempotentReplay: { result_json }
    peekUploadReplay-->>UploadHandler: replayed session payload
    UploadHandler-->>Client: 200 { upload_session_id, put_url, ... }
  else no replay
    idempotency_records_or_hyperdrive-->>peekIdempotentReplay: null
    UploadHandler->>UploadHandler: perform rate-limit checks and normal flow
    UploadHandler-->>Client: 201 created (or other final response)
  end
Loading

Possibly related PRs

  • zaks-io/agent-paste#5: Touches upload-session idempotency and rate-limit wiring at the same create/finalize call sites; changes are directly related.
  • zaks-io/agent-paste#4: Introduced durable idempotent mutation handling and idempotency_records usage that peekIdempotentReplay queries.

Poem

🐰 I peeked the past and made sessions replay,
No extra counts when the same keys play,
A cached hop returns the signed URL neat,
Tests tap their paws — rate limits stay discrete,
🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately summarizes the main change: implementing idempotent replay before rate-limit accounting on upload mutations.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-agent-a58e503adc3cfbe5c

Comment @coderabbitai help to get the list of available commands and usage tips.

PR #9 finished the rate-limit work, so mark ADR 0039/0064 done, drop
the backlog entry, and shift items #2-#11 down one slot. Updates the
phase partition note to match the new numbering.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/upload/src/index.test.ts`:
- Around line 107-168: Add a parallel test to index.test.ts that mirrors the
"replays cached idempotent result without consuming rate budget" case but
targets the finalize endpoint: call handleRequest with method POST to
"/v1/upload-sessions/:sessionId/finalize" (use session_id "upl_replay") and the
same env stubs, update DB.peekIdempotentReplay to return { result: session }
when operation === "upload.session.finalize" and idempotencyKey === "replay",
keep ACTOR_RATE_LIMIT and WORKSPACE_BURST_CAP.limit stubs returning { success:
false } and incrementing counters, then assert the response is 200, the returned
finalize payload matches the cached session/finalize result, and that
rateLimitCalls remains { actor: 0, workspace: 0 } so cached replay doesn't
consume rate budget.

In `@apps/upload/src/index.ts`:
- Around line 161-164: The replay branch returns a replayed UploadSessionRecord
from peekUploadReplay and then calls buildCreateSessionResponse which re-signs
upload URLs; change this so replayed responses cannot extend the original
session: read the original session expiry (e.g., expiresAt or ttl) from the
returned UploadSessionRecord in the peekUploadReplay flow and pass that expiry
into buildCreateSessionResponse (or into the URL signing routine such as the
sign/upload URL helper) so the signed URL TTL is computed as max(0,
originalExpiry - now) instead of using a fresh now+ttl; update
buildCreateSessionResponse or the signing helper to accept an optional expiry
cutoff and use it when re-signing in the upload.session.create replay path.

In `@scripts/smoke-hosted.mjs`:
- Around line 63-67: The rate-limit probe assertActorRateLimitFires(key.secret)
is run before subsequent publish checks and can cause 429s to affect later
assertions; reorder the non-production branch so assertActorRateLimitFires runs
last (after assertBytesPurgedAfterDelete(published) and
assertBytesPurgedAfterExpiry(userEnv, published)), ensuring the same key.secret
isn't rate-limited during the publish-based checks; apply the same reorder to
the analogous block around lines 271-297.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 36ab7a37-1904-4640-a270-7eaa117521e0

📥 Commits

Reviewing files that changed from the base of the PR and between 80b5813 and 405eb4a.

📒 Files selected for processing (5)
  • apps/upload/src/index.test.ts
  • apps/upload/src/index.ts
  • packages/commands/src/index.test.ts
  • packages/commands/src/index.ts
  • scripts/smoke-hosted.mjs

Comment on lines +107 to +168
it("replays cached idempotent result without consuming rate budget", async () => {
const session: UploadSessionRecord = {
session_id: "upl_replay",
artifact_id: "art_replay",
revision_id: "rev_replay",
expires_at: "2030-01-01T00:00:00.000Z",
files: [{ path: "index.html", size_bytes: 12 }],
};
const rateLimitCalls = { actor: 0, workspace: 0 };
const env: Env = {
UPLOAD_SIGNING_SECRET: "secret",
AUTH: {
async verifyApiKey() {
return { type: "api_key", id: "key_1", workspace_id: "w_1" };
},
},
DB: {
async createUploadSession() {
throw new Error("replayed requests must not create new sessions");
},
async getUploadSession() {
return null;
},
async finalizeUploadSession() {
return {};
},
async peekIdempotentReplay({ idempotencyKey, operation }) {
if (operation === "upload.session.create" && idempotencyKey === "replay") {
return { result: session };
}
return null;
},
},
ACTOR_RATE_LIMIT: {
async limit() {
rateLimitCalls.actor += 1;
return { success: false };
},
},
WORKSPACE_BURST_CAP: {
async limit() {
rateLimitCalls.workspace += 1;
return { success: false };
},
},
};

const response = await handleRequest(
new Request("https://upload.test/v1/upload-sessions", {
method: "POST",
headers: { authorization: "Bearer ok", "idempotency-key": "replay", "content-type": "application/json" },
body: JSON.stringify({ files: [{ path: "index.html", size_bytes: 12 }] }),
}),
env,
);

expect(response.status).toBe(200);
const body = (await response.json()) as { upload_session_id: string; files: Array<{ put_url: string }> };
expect(body.upload_session_id).toBe("upl_replay");
expect(body.files[0]?.put_url).toContain("upl_replay");
expect(rateLimitCalls).toEqual({ actor: 0, workspace: 0 });
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add a matching replay test for finalize route.

This new test protects POST /v1/upload-sessions, but the PR also changed replay-before-rate-limit behavior for POST /v1/upload-sessions/:sessionId/finalize. Please add the same assertion pattern there (cached 200 response while actor/workspace limit stubs deny).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/upload/src/index.test.ts` around lines 107 - 168, Add a parallel test to
index.test.ts that mirrors the "replays cached idempotent result without
consuming rate budget" case but targets the finalize endpoint: call
handleRequest with method POST to "/v1/upload-sessions/:sessionId/finalize" (use
session_id "upl_replay") and the same env stubs, update DB.peekIdempotentReplay
to return { result: session } when operation === "upload.session.finalize" and
idempotencyKey === "replay", keep ACTOR_RATE_LIMIT and WORKSPACE_BURST_CAP.limit
stubs returning { success: false } and incrementing counters, then assert the
response is 200, the returned finalize payload matches the cached
session/finalize result, and that rateLimitCalls remains { actor: 0, workspace:
0 } so cached replay doesn't consume rate budget.

Comment thread apps/upload/src/index.ts
Comment on lines +161 to +164
const replay = await peekUploadReplay<UploadSessionRecord>(db, env, actor, "upload.session.create", idempotencyKey);
if (replay) {
return jsonResponse(await buildCreateSessionResponse(request, env, replay));
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cap replay-signed upload URLs to session expiry.

Line 163 now serves replayed sessions repeatedly, and Line 219 re-signs URLs each time. Because signing currently uses now + ttl only, repeated replay calls can keep issuing fresh upload tokens even after the original session lifetime intent.

🔧 Proposed fix
 async function signUploadUrl(
   request: Request,
   env: Env,
   session: UploadSessionRecord,
   file: UploadFileInput & { object_key?: string },
 ): Promise<string> {
   if (!env.UPLOAD_SIGNING_SECRET) {
     throw new Error("UPLOAD_SIGNING_SECRET is required");
   }

-  const exp = Math.floor(Date.now() / 1000) + ttlSeconds(env);
+  const nowSec = Math.floor(Date.now() / 1000);
+  const sessionExpSec = Math.floor(new Date(session.expires_at).getTime() / 1000);
+  const exp = Math.min(nowSec + ttlSeconds(env), sessionExpSec);

Also applies to: 211-223

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/upload/src/index.ts` around lines 161 - 164, The replay branch returns a
replayed UploadSessionRecord from peekUploadReplay and then calls
buildCreateSessionResponse which re-signs upload URLs; change this so replayed
responses cannot extend the original session: read the original session expiry
(e.g., expiresAt or ttl) from the returned UploadSessionRecord in the
peekUploadReplay flow and pass that expiry into buildCreateSessionResponse (or
into the URL signing routine such as the sign/upload URL helper) so the signed
URL TTL is computed as max(0, originalExpiry - now) instead of using a fresh
now+ttl; update buildCreateSessionResponse or the signing helper to accept an
optional expiry cutoff and use it when re-signing in the upload.session.create
replay path.

Comment thread scripts/smoke-hosted.mjs
@isuttell

Copy link
Copy Markdown
Contributor Author

/preview

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The new assertActorRateLimitFires helper exhausts the actor's 60/min
budget by hammering an upload mutation. The next assertion,
assertBytesPurgedAfterExpiry, republishes through the same user API key
and was hitting rate_limited_actor at the publish step.

Reorder so the rate-limit probe runs last; bytes-after-delete and
-after-expiry now run on a fresh budget.

Also reconcile docs/ops/project-status.md: PR #6 (CSP) and PR #8
(bytes-after-delete/expiry) closed two backlog items without updating
this doc. Moved both into Recently Completed and renumbered.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/ops/project-status.md (1)

3-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the "Last updated" date to match this PR's date.

The date should be 2026-05-22 to reflect when these changes were made, not 2026-05-21.

📅 Proposed fix
-Last updated: 2026-05-21 (idempotent mutations and security baseline landed).
+Last updated: 2026-05-22 (idempotent mutations and security baseline landed).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ops/project-status.md` at line 3, Update the "Last updated" metadata
line in the document by changing the date string "2026-05-21" to "2026-05-22" on
the line that currently reads 'Last updated: 2026-05-21 (idempotent mutations
and security baseline landed).' so the file's header reflects the PR date.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/ops/project-status.md`:
- Line 254: The status line "Status: Done on 2026-05-21 via PR `#9`." has a date
mismatch with PR `#9` (created 2026-05-22); update that line to either match the
PR creation date ("Status: Done on 2026-05-22 via PR `#9`.") or clarify both
events (e.g., "Work completed 2026-05-21; PR `#9` opened 2026-05-22") by editing
the exact string that contains "Status: Done on 2026-05-21 via PR `#9`." in the
document.

---

Outside diff comments:
In `@docs/ops/project-status.md`:
- Line 3: Update the "Last updated" metadata line in the document by changing
the date string "2026-05-21" to "2026-05-22" on the line that currently reads
'Last updated: 2026-05-21 (idempotent mutations and security baseline landed).'
so the file's header reflects the PR date.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: dec6ca35-8790-4eca-9785-4785f6b4e686

📥 Commits

Reviewing files that changed from the base of the PR and between 283d4a5 and f88f42d.

📒 Files selected for processing (2)
  • docs/ops/project-status.md
  • scripts/smoke-hosted.mjs


### Enforce native rate-limit bindings

- Status: Done on 2026-05-21 via PR #9.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Date mismatch: PR #9 was created on 2026-05-22, not 2026-05-21.

Per the PR metadata, this PR was created at 2026-05-22T01:57:26Z. If the work was completed on 2026-05-21, consider clarifying (e.g., "Work completed 2026-05-21; PR #9 opened 2026-05-22"), otherwise update the date to match the PR creation date.

📅 Proposed fix (if aligning to PR creation date)
-- Status: Done on 2026-05-21 via PR `#9`.
+- Status: Done on 2026-05-22 via PR `#9`.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- Status: Done on 2026-05-21 via PR #9.
- Status: Done on 2026-05-22 via PR `#9`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/ops/project-status.md` at line 254, The status line "Status: Done on
2026-05-21 via PR `#9`." has a date mismatch with PR `#9` (created 2026-05-22);
update that line to either match the PR creation date ("Status: Done on
2026-05-22 via PR `#9`.") or clarify both events (e.g., "Work completed
2026-05-21; PR `#9` opened 2026-05-22") by editing the exact string that contains
"Status: Done on 2026-05-21 via PR `#9`." in the document.

Sequential 120 attempts at ~2/sec spread across multiple Cloudflare
worker isolates means no single isolate ever sees the 60-in-60s
threshold. Replace the serial loop with four parallel waves of 80
requests each. The burst forces a single isolate's counter past the
limit before the trailing window slides.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@isuttell isuttell merged commit 82579a7 into main May 22, 2026
4 checks passed
isuttell added a commit that referenced this pull request May 22, 2026
PR #2 added `permissions: administration: write` to the cleanup
workflow, but `administration` is not a valid GITHUB_TOKEN scope. GitHub
rejected the workflow at evaluation time, silently dropping every
`pull_request.closed` event for PRs #2--#9 and accumulating eight
stale `preview/pr-N` Neon branches that tripped the 10-branch free-tier
cap (blocks PR #10/#11/#12 deploys with HTTP 422).

Drop the invalid permission and the `deleteAnEnvironment` step that
required it; rename the file so GitHub registers a fresh workflow id
instead of reusing the wedged record; validate the resolved PR number
is a positive integer before deleting anything.

Stale Neon branches still need a one-time operator purge -- agent is
forbidden from calling `neondatabase/delete-branch-action` autonomously.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
isuttell added a commit that referenced this pull request May 22, 2026
PR #2 added `permissions: administration: write` to the cleanup
workflow, but `administration` is not a valid GITHUB_TOKEN scope. GitHub
rejected the workflow at evaluation time, silently dropping every
`pull_request.closed` event for PRs #2--#9 and accumulating eight
stale `preview/pr-N` Neon branches that tripped the 10-branch free-tier
cap (blocks PR #10/#11/#12 deploys with HTTP 422).

Drop the invalid permission and the `deleteAnEnvironment` step that
required it; rename the file so GitHub registers a fresh workflow id
instead of reusing the wedged record; validate the resolved PR number
is a positive integer before deleting anything.

Stale Neon branches still need a one-time operator purge -- agent is
forbidden from calling `neondatabase/delete-branch-action` autonomously.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant