Skip to content

fix(cleanup): UTC cutoff, Composite-cursor Batching, Safe Entity deletion, FK-safe order#10

Open
Siffersari wants to merge 12 commits into
vula-africa:mainfrom
Siffersari:bug-fix/cleanup-unsubmitted-forms
Open

fix(cleanup): UTC cutoff, Composite-cursor Batching, Safe Entity deletion, FK-safe order#10
Siffersari wants to merge 12 commits into
vula-africa:mainfrom
Siffersari:bug-fix/cleanup-unsubmitted-forms

Conversation

@Siffersari
Copy link
Copy Markdown

@Siffersari Siffersari commented Sep 9, 2025

Summary

This PR corrects and hardens the scheduled cleanup job for unsubmitted public forms. It fixes time math and date filtering, adds deterministic pagination and batching, protects entities from accidental deletion, and enforces an FK-safe deletion order. It also improves observability via job status updates and structured logs.

Context / Problem

The original job had several correctness and reliability issues:

  • Subtracted seconds instead of milliseconds (virtually no records matched).
  • Narrow 24-hour window (7 to 6 days ago) instead of “older than 7 days.”
  • Per-token transactions (N+1 round-trips, poor performance).
  • Risky deletes (entities removed even if still referenced; unsafe FK order).
  • Relationship dependency could block valid cleanups.
  • Limited observability for debugging and operations.

What changed

  • Correct date handling with a UTC midnight cutoff 7 days ago (prevents delayed-run drift).
  • Deterministic, scalable pagination using a composite cursor (createdAt, token).
  • Batching: single transaction per batch instead of per-token.
  • Safe entity deletion: only delete entities not referenced by any non-expired tokens.
  • FK-safe order: delete dependent data → tokens → entities.
  • Optional relationship cleanup (non-blocking) for status "new".
  • Job status transitions and structured logs with counters.

Key implementation details

  • UTC cutoff:
    • Compute midnight in UTC, subtract 7 days; filter uses createdAt < cutoff.
  • Composite-cursor pagination (createdAt, token):
    • orderBy createdAt asc, token asc; next page where createdAt > lastCreatedAt OR (createdAt == lastCreatedAt AND token > lastToken).
  • Safe entity deletion:
    • Before deleting entities in a batch, check for any tokens with createdAt ≥ cutoff referencing those entities; if found, keep the entity.
  • Transaction content and order per batch:
    • new_corpus.deleteMany (children) → publicFormsTokens.deleteManyentity.deleteMany (parents).
    • Optional relationship.deleteMany for product_id IN batch and status = "new" (can be scoped further by entity_id if schema requires).

Files changed

  • tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts
  • tests/unsubmitted_forms/README.md (design/ops notes)

Why this approach

  • Correctness: captures all records strictly older than the retention boundary regardless of job start time.
  • Performance: avoids N+1 transactions; minimizes round-trips; keeps memory bounded.
  • Data safety: prevents deleting entities still needed by other in-progress or newer tokens.
  • Integrity: FK constraints respected by delete order.
  • Operations: better logs, counters, and job status transitions.

Assumptions

  • update_job_status(jobId, status) exists and updates job state in the scheduler.
  • Schema has:
    • publicFormsTokens: token, entityId, productId, createdAt.
    • new_corpus referencing entities via entity_id.
    • entity with primary key id.
    • Optional relationship with product_id (and optionally entity_id) and status.
  • If your schema tracks submission flags (e.g., submittedAt or isSubmitted), you can add those filters to the where clause as an extra guard.

Operational guidance

  • Batch size: default 500; tune based on DB capacity.
  • Indexes recommended:
    • publicFormsTokens(createdAt)
    • publicFormsTokens(entityId)
  • Idempotency: re-runs are safe; deletes skip missing rows without failing.

How to test locally

  1. Seed tokens:
    • Older than cutoff (should be deleted).
    • Newer than or equal to cutoff (must be kept).
    • Multiple tokens referencing the same entity (ensure entity only deleted if no non-expired tokens exist).
  2. Run the job and observe logs:
    • “Deleted tokens … Deleted entities … Failed chunks …”
    • Ensure job transitions in_progress → completed.
  3. Validate DB:
    • All tokens with createdAt < cutoff are gone.
    • Entities without any non-expired token references are gone; entities referenced by non-expired tokens remain.
    • Any optional relationships with status "new" tied to products in the batch are removed if enabled.

Trade-offs considered

  • Batching vs one mega-transaction:
    • Batching limits blast radius and avoids long locks/timeouts; slightly more code, much better reliability.
  • Extra safe-entity check:
    • Adds a small query cost, but vastly reduces risk of data loss.
  • UTC cutoff:
    • Opinionated, but eliminates “delayed-run” gaps and keeps retention calendar-accurate.

Future work (optional)

  • Configurable retention days and batch size via env.
  • Metrics and alerting (e.g., Prometheus counters, error budgets).
  • Hardening optional relationship scoping by entity_id if schema requires.

Checklist

  • Correct date math and cutoff boundary
  • Composite-cursor pagination and batching
  • Safe entity deletion and FK-safe order
  • Optional relationship cleanup non-blocking
  • Logs and job status transitions
  • Docs updated (README, EXPLAINER, scripts)

Reference

Summary by CodeRabbit

  • Chores

    • Improved scheduled cleanup for unsubmitted public forms: safer batched deletions of stale tokens/entities, retry and failure resilience, progress logging, and reduced database clutter for tokens older than seven days.
  • Documentation

    • Added README describing job behavior, timing, safety checks, observability, and local testing guidance.
    • Added diagnostic log analysis documenting observed latency/timeout patterns.
  • Tests

    • Added test scaffolding and guidance to validate cleanup behavior and edge cases.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Sep 9, 2025

Walkthrough

Adds a batched, cursor-paginated cleanup job and README to remove public form tokens and related data older than a UTC-midnight 7-day cutoff; introduces per-batch transactions, safe-entity checks, retries, lifecycle logging, and an added log-evidence section in incident analysis docs.

Changes

Cohort / File(s) Summary
Cleanup job implementation
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts
New batched cleanup: cursor pagination by (createdAt, token) with UTC-midnight 7-day cutoff, BATCH_SIZE=500, MAX_RETRIES=3; job lifecycle state, per-batch logging, per-page retry, transactional delete sequence (new_corpus → tokens → entities → optional relationships), safe-to-delete checks, and stronger Prisma typing.
Cleanup documentation
tests/unsubmitted_forms/README.md
New README describing cutoff calculation, pagination strategy, batching semantics, safe-deletion rules and ordering, observability/logging, testing guidance, assumptions, and operational notes.
Incident analysis docs
tests/end_of_week_issues/analysis.md
Appends "Supporting Evidence from the Logs" with timestamped log comparisons showing latency/timeouts correlated with model v1.3 vs v1.2 across endpoints.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Scheduler
  participant JobRunner
  participant DB
  Note right of Scheduler #EEF6FF: scheduled trigger (daily)
  Scheduler->>JobRunner: start cleanup job
  Note right of JobRunner #F6FFF0: compute UTC-midnight cutoff (7 days)
  loop per-batch (cursor by createdAt, token)
    JobRunner->>DB: SELECT batch of tokens < cutoff ORDER BY createdAt, token LIMIT BATCH_SIZE
    DB-->>JobRunner: batch(rows)
    alt batch empty
      JobRunner->>JobRunner: finish loop
    else
      JobRunner->>DB: query other tokens for batch entityIds
      DB-->>JobRunner: referencing tokens
      JobRunner->>DB: BEGIN TRANSACTION
      Note right of DB #FFF7EE: transactional deletions (ordered)
      JobRunner->>DB: DELETE FROM new_corpus WHERE entity_id IN (safe_entity_ids)
      JobRunner->>DB: DELETE FROM publicFormsTokens WHERE token IN (batch_tokens)
      JobRunner->>DB: DELETE FROM entities WHERE id IN (safe_entity_ids)
      JobRunner->>DB: DELETE FROM relationships WHERE product_id IN (batch_products) AND status='new'
      JobRunner->>DB: COMMIT
      JobRunner->>JobRunner: advance cursor, log progress
    end
  end
  JobRunner->>Scheduler: mark job completed (or failed on error)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title clearly references the substantive changes—UTC cutoff calculation, composite-cursor batching, safe-entity deletion, and FK-safe delete order—each of which is present in the changeset, so it is fully related to the main cleanup improvements.
Description Check ✅ Passed The description is directly focused on the cleanup job improvements and closely aligns with the files changed and PR objectives by detailing the problem, implementation, and testing guidance.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

Poem

"I nibble logs and timestamps neat,
I hop on batches, brisk and fleet,
Seven nights' dust swept clean and gone,
Tokens tumble at my dawn.
🐇✨"


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 24e10b6 and a6e2666.

📒 Files selected for processing (1)
  • tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (7)
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (4)

50-76: Composite-cursor pagination: correct comparator; add supporting indexes

The OR comparator correctly implements (createdAt, token) > (lastCreatedAt, lastToken). To keep scans efficient at scale, ensure compound indexes exist:

  • publicFormsTokens(createdAt asc, token asc)
  • publicFormsTokens(entityId asc, createdAt asc) for the entity lookups below.

Would you like me to generate a migration with these indexes?


121-136: Relationship cleanup scope may over-delete; restrict by entity when possible

Deleting by product_id + status "new" could remove rows unrelated to the entities actually being purged (shared product). Prefer scoping by entity where schema allows.

-            prisma.relationship.deleteMany({
-              where: {
-                product_id: { in: productIdsInBatch },
-                status: "new",
-                // If relationships are entity-scoped, uncomment the next line
-                // entity_id: { in: entityIdsSafeToDelete }
-              },
-            })
+            prisma.relationship.deleteMany({
+              where: {
+                product_id: { in: productIdsInBatch },
+                status: "new",
+                entity_id: { in: entityIdsSafeToDelete } // ensure schema supports this
+              },
+            })

If relationships are not entity-scoped, consider deferring this optional cleanup or adding tighter predicates.


53-64: Unsubmitted-only filtering (schema-dependent)

If the schema tracks submission state (e.g., submittedAt or isSubmitted), add it to both the page query and the “other tokens” check to ensure we only clean up unsubmitted flows.

I can wire this in once you confirm the exact column(s).


153-156: Consider “completed_with_errors” status when totalFailedChunks > 0

You already count failed chunks; surfacing that in the terminal status can help operators distinguish clean vs. partial runs.

tests/unsubmitted_forms/README.md (3)

6-15: Document entity-safety nuance and failure cursor policy

Add that entities are deleted only when no tokens remain for them (not just no unexpired tokens) to avoid FK issues/orphans, and that the job advances the cursor only after successful batch commits (retry on failures).


23-27: Operational status for partial success

Since batches can fail but the job continues, note how operators should interpret totals and whether a “completed_with_errors” status is used when failed chunks > 0.


28-37: Indexing requirements

Call out the recommended indexes explicitly to ensure predictable performance:

  • publicFormsTokens(createdAt, token)
  • publicFormsTokens(entityId, createdAt)
  • relationship(product_id, status) if the optional cleanup is enabled.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 1dd21f7 and 43bf6ec.

📒 Files selected for processing (2)
  • tests/unsubmitted_forms/README.md (1 hunks)
  • tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (1 hunks)
🔇 Additional comments (2)
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (2)

39-42: UTC cutoff calculation looks correct

Zeroing to UTC midnight then subtracting 7 days avoids drift from late job starts.


90-105: Verify FK referential action and broaden tokens check

  • Confirm whether publicFormsTokens.entityId → entity.id is defined with ON DELETE CASCADE.
  • If it isn’t, remove the createdAt: { gte: cutoff } filter so you keep any other token (expired or not) before deleting its entity.

Comment thread tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts Outdated
Comment thread tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts
… scope relationship cleanup by entity, cursor advance only on success with retries; docs: indexes, cursor policy
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (7)
tests/end_of_week_issues/analysis.md (3)

3-3: Soften causality claim; call it correlation

“Direct correlation” reads as causal. Recommend: “strong correlation observed; needs controlled rollback/flip to establish causation.”


5-15: Clarify the ‘Status’ column semantics

Mixing HTTP states (“OK”) with outcome labels (“Timeout”) is ambiguous. Either use HTTP codes (e.g., 200/504) or add a separate “Outcome” column.

-| Response Time (ms) | Status    |
+| Response Time (ms) | HTTP Code | Outcome |
- | 120 | `OK` |
+ | 120 | 200 | `OK` |
- | 750 | `Timeout` |
+ | 750 | 504 | `Timeout` |

16-16: Remove stray line artifact

There’s a lone “16” line at the end. Delete it to avoid rendering issues.

-16
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (4)

52-76: Composite-cursor pagination looks correct; consider native cursor if a unique composite exists

The OR predicate matches (createdAt asc, token asc) traversal. If you can add a unique index on (createdAt, token), you could switch to Prisma’s cursor to offload the OR to tuple comparison.

// If unique index exists:
await prisma.publicFormsTokens.findMany({
  where: { createdAt: { lt: cutoff } },
  orderBy: [{ createdAt: "asc" }, { token: "asc" }],
  cursor: last ? { createdAt_token: { createdAt: last.createdAt, token: last.token } } : undefined,
  skip: last ? 1 : 0,
  take: BATCH_SIZE,
});

Also ensure indexes:

  • publicFormsTokens(createdAt)
  • publicFormsTokens(entityId)

158-159: Add small exponential backoff between retries

Avoid hammering the DB on transient errors.

-            continue; // retry same page; cursor unchanged
+            await new Promise((r) => setTimeout(r, Math.min(1000, 100 * retriesForPage)));
+            continue; // retry same page; cursor unchanged

174-176: Include job id in error log for correlation

Makes failure logs grep-able by job id.

-    console.error("Error cleaning up unsubmitted forms:", error);
+    console.error(`[Cleanup Job ${job.id}] Error cleaning up unsubmitted forms:`, error);

83-90: Minor: rename variables for clarity

tokenIdsInBatch → tokensInPage; entityIdsInBatch → entitiesInPage. Not required, but improves readability when scanning logs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 43bf6ec and 24e10b6.

⛔ Files ignored due to path filters (1)
  • tests/unsubmitted_forms/.DS_Store is excluded by !**/.DS_Store
📒 Files selected for processing (3)
  • tests/end_of_week_issues/analysis.md (1 hunks)
  • tests/unsubmitted_forms/README.md (1 hunks)
  • tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/unsubmitted_forms/README.md
🔇 Additional comments (4)
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (4)

40-42: UTC midnight cutoff is computed correctly

setUTCHours(0,0,0,0) then subtracting 7 days yields the intended “midnight UTC 7 days ago”.


109-109: Typed ops as Prisma promises — good fix

Using Prisma.PrismaPromise[] avoids the $transaction overload union trap.


71-76: Verify ‘take’ batch size and DB limits

Some databases have parameter limits that make IN (...) with 500 values borderline. If you see “too many SQL variables,” drop to 200–300.


93-108: Ensure database index on publicFormsTokens.entityId
Add or verify an index on entityId in your Prisma schema or generated migrations to keep the “other tokens” lookup performant at scale.

Comment thread tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts Outdated
Comment thread tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts
Comment on lines 169 to 173
await update_job_status(job.id, "completed");
console.log(
`[Cleanup Job ${job.id}] Completed in ${new Date().getTime() - jobStartTime.getTime()}ms. Deleted tokens: ${totalDeletedTokens}. Failed chunks: ${totalFailedChunks}.`
);
} catch (error) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Guard ‘completed’ status on success only

If you adopt the throw above, this block won’t run on failure. Otherwise, add a success flag and set status accordingly.

-    await update_job_status(job.id, "completed");
+    await update_job_status(job.id, "completed");

(No change needed if you throw on failure.)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts around lines 169 to 173,
the code unconditionally calls update_job_status(job.id, "completed") even when
an error may occur; ensure the job status is set to "completed" only on success
by either moving the update_job_status call into the try block after all work
succeeds or introduce a success flag (set true at end of try) and in the
finally/outer scope call update_job_status(job.id, success ? "completed" :
"failed"); also ensure any thrown errors are allowed to propagate or are logged
before setting the failed status so failure paths are accurately recorded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants