Skip to content

Fix critical bugs and improve performance in form cleanup job#25

Open
Abiorh001 wants to merge 1 commit into
vula-africa:mainfrom
Abiorh001:fix/unsubmitted_forms
Open

Fix critical bugs and improve performance in form cleanup job#25
Abiorh001 wants to merge 1 commit into
vula-africa:mainfrom
Abiorh001:fix/unsubmitted_forms

Conversation

@Abiorh001
Copy link
Copy Markdown

@Abiorh001 Abiorh001 commented Feb 3, 2026

Fixed multiple critical issues in the daily cleanup job that prevented expired tokens from being deleted. The job had date calculation bugs, inefficient queries, and poor error handling.

Issues Fixed

  1. Date calculation bug - Missing milliseconds multiplier caused tokens to never be cleaned up
  2. Wrong query logic - Only cleaned 24-hour window instead of all expired tokens
  3. N+1 query problem - Individual queries in loop instead of bulk fetch
  4. No pagination - Memory issues with large datasets
  5. Poor error handling - Single failure crashed entire job
  6. Null safety - Invalid entityId could cause database errors
  7. No visibility - Added stats tracking and logging

Performance Improvements

  • Eliminated N+1 queries (O(n) to O(1) per batch)
  • Added pagination to handle any dataset size
  • Reduced database round-trips by approximately 100x for large datasets

Key Changes

  • Fixed date: Date.now() - 7*24*60*60*1000 (was missing *1000)
  • Fixed query: createdAt: { lte: sevenDaysAgo } (was gte/lt window)
  • Bulk fetch relationships with Map lookup
  • Process in batches of 100
  • Individual error handling per token
  • Stats tracking: deleted, skipped, errors

Trade-offs

  • Used skip-based pagination (simple) over cursor-based (faster)
  • Individual transactions (partial success) over batch (faster)
  • Continue on error (resilient) over fail fast

loom video link: https://www.loom.com/share/ba4b4b51d94b47cb85c4ac83dd27336c

Summary by CodeRabbit

Chores

  • Optimized unsubmitted forms cleanup with batch processing for improved efficiency and resource utilization
  • Enhanced error handling with detailed logging and consolidated failure reporting
  • Implemented comprehensive cleanup statistics including counts for tokens processed, deleted, and skipped
  • Improved operational monitoring with better progress tracking capabilities

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 3, 2026

Walkthrough

This change refactors the unsubmitted forms cleanup task to introduce batching for scalable processing, add comprehensive metrics tracking (tokens found, deleted, errors, skipped), optimize relationship resolution through batch queries, and implement per-token transactional processing with robust error handling and detailed failure reporting.

Changes

Cohort / File(s) Summary
Unsubmitted Forms Cleanup Task
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts
Introduces batching logic with configurable batch size, initializes and updates metrics throughout execution, optimizes relationship loading by querying all expired tokens at once, implements per-token transactional processing with error handling, improves loop control flow with proper break conditions and batch skip increments, and enhances error reporting with consolidated stats and job status updates.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

Hop by hop, we scrub the store,
Batching cleanup like never before!
Metrics track each page we sweep,
Errors caught, transactions deep!
What once was slow now bounds so free! 🐰

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: fixing critical bugs (date calculation, query logic, error handling, null safety) and improving performance (eliminating N+1 queries, adding pagination, reducing database round-trips).
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Important

Action Needed: IP Allowlist Update

If your organization protects your Git platform with IP whitelisting, please add the new CodeRabbit IP address to your allowlist:

  • 136.113.208.247/32 (new)
  • 34.170.211.100/32
  • 35.222.179.152/32

Failure to add the new IP will result in interrupted reviews.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts`:
- Around line 87-89: The catch block currently logs the raw credential via
token.token; change the log to avoid exposing the full token by logging a
non-sensitive identifier (e.g., token.id) or a masked prefix of token.token
(e.g., first 4 chars + ellipsis) instead, and keep the existing error object in
the message; update the catch handling where stats.errors++ and console.error
are used so it references the safe identifier (token.id or maskedToken) rather
than token.token.
- Around line 71-83: The current loop increments stats.skipped and continues
when !relationship || !token.entityId, which leaves expired public form tokens
undeleted; change the logic so the token deletion always runs (call
tx.publicFormsTokens.delete for token.id inside the transaction) while
conditionally deleting relationship (if relationship exists, call
tx.relationship.delete with relationship.id) and conditionally deleting
corpus/entity only when token.entityId is present (tx.new_corpus.deleteMany and
tx.entity.delete with token.entityId); keep the whole set of deletes inside
prisma.$transaction to ensure atomicity and remove the stats.skipped increment
only for truly skipped entity/relationship cleanup cases, not for leaving tokens
behind.
- Around line 41-53: The loop uses skip-based pagination with
prisma.publicFormsTokens.findMany which can miss records when deletions occur;
change to cursor-based pagination by ordering by a stable field (e.g.,
createdAt, id) via orderBy and use the last returned record as the cursor for
the next findMany call instead of incrementing skip/BATCH_SIZE; update both
occurrences (the loop starting near prisma.publicFormsTokens.findMany and the
similar block around line 93) to pass a cursor object and remove skip logic so
each expired token is processed exactly once.

Comment on lines +41 to +53
const sevenDaysAgo = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
let skip = 0;

while (true) {
const expiredTokens = await prisma.publicFormsTokens.findMany({
where: {
createdAt: { lte: sevenDaysAgo },
},
},
});
skip: skip,
take: BATCH_SIZE,
});

if (expiredTokens.length === 0) break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts

Repository: vula-africa/coding_tests

Length of output: 4334


🏁 Script executed:

find . -type f \( -name "*.prisma" -o -name "*schema*" \) | head -20

Repository: vula-africa/coding_tests

Length of output: 50


🏁 Script executed:

rg "publicFormsTokens" -t ts -t prisma -B 2 -A 2 | head -40

Repository: vula-africa/coding_tests

Length of output: 98


🏁 Script executed:

rg "publicFormsTokens" -t ts -B 2 -A 2 | head -40

Repository: vula-africa/coding_tests

Length of output: 1114


🏁 Script executed:

rg "model publicFormsTokens" -A 15

Repository: vula-africa/coding_tests

Length of output: 50


🏁 Script executed:

find . -name "*.prisma" -o -name "schema.prisma"

Repository: vula-africa/coding_tests

Length of output: 50


🏁 Script executed:

find . -type f -name "*.ts" -o -name "*.js" | xargs rg "interface.*publicFormsTokens|type.*publicFormsTokens" | head -20

Repository: vula-africa/coding_tests

Length of output: 50


🏁 Script executed:

rg "where.*id.*token.id" tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts

Repository: vula-africa/coding_tests

Length of output: 143


Switch to cursor-based pagination with orderBy to prevent skipping tokens during deletion.

Without orderBy, Prisma doesn't guarantee consistent ordering across queries. Combined with skip-based pagination and mid-loop deletions, this causes remaining records to be skipped unpredictably. Use cursor pagination with a stable sort order to ensure each expired token is processed exactly once.

♻️ Suggested cursor-based pagination
-    let skip = 0;
+    let cursor: { id: string } | undefined;
...
       const expiredTokens = await prisma.publicFormsTokens.findMany({
         where: {
           createdAt: { lte: sevenDaysAgo },
         },
-        skip: skip,
-        take: BATCH_SIZE,
+        orderBy: { id: "asc" },
+        take: BATCH_SIZE,
+        ...(cursor ? { cursor, skip: 1 } : {}),
       });
 
       if (expiredTokens.length === 0) break;
+      cursor = { id: expiredTokens[expiredTokens.length - 1].id };
...
-      skip += BATCH_SIZE;

Also applies to: line 93

🤖 Prompt for AI Agents
In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts` around lines 41 - 53,
The loop uses skip-based pagination with prisma.publicFormsTokens.findMany which
can miss records when deletions occur; change to cursor-based pagination by
ordering by a stable field (e.g., createdAt, id) via orderBy and use the last
returned record as the cursor for the next findMany call instead of incrementing
skip/BATCH_SIZE; update both occurrences (the loop starting near
prisma.publicFormsTokens.findMany and the similar block around line 93) to pass
a cursor object and remove skip logic so each expired token is processed exactly
once.

Comment on lines +71 to +83
const relationship = relationshipMap.get(token.productId);

if (!relationship || !token.entityId) {
stats.skipped++;
continue;
}

await prisma.$transaction(async (tx) => {
await tx.relationship.delete({ where: { id: relationship.id } });
await tx.publicFormsTokens.delete({ where: { id: token.id } });
await tx.new_corpus.deleteMany({ where: { entity_id: token.entityId } });
await tx.entity.delete({ where: { id: token.entityId } });
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and inspect schema.prisma for PublicFormsTokens model
fd -a 'schema.prisma' -t f -exec cat {} \;

Repository: vula-africa/coding_tests

Length of output: 255


🏁 Script executed:

#!/bin/bash
# Find schema.prisma file
fd 'schema.prisma' -t f

Repository: vula-africa/coding_tests

Length of output: 50


🏁 Script executed:

#!/bin/bash
# Search for any prisma files and directories
echo "=== Searching for .prisma files ==="
fd '\\.prisma$' -t f

echo ""
echo "=== Checking for prisma directory ==="
fd '^prisma$' -t d

echo ""
echo "=== Listing repository structure ==="
git ls-files | head -30

Repository: vula-africa/coding_tests

Length of output: 408


🏁 Script executed:

#!/bin/bash
# Read the cleanup_unsubmitted_forms.ts file to understand the test
cat -n tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts

Repository: vula-africa/coding_tests

Length of output: 4334


Skipping on missing relationship/entityId leaves expired tokens behind.

This code increments skipped and continues without deleting the token when either !relationship or !token.entityId. This causes expired tokens to accumulate in the database indefinitely, directly contradicting the stated goal to "prevent the database from being cluttered with unused tokens."

The token should always be deleted (it's the primary artifact to clean up). Only the entity and related corpus entries should be conditionally skipped when entityId is missing. The relationship, if present, should also always be deleted.

🛠️ Suggested conditional cleanup
-          if (!relationship || !token.entityId) {
-            stats.skipped++;
-            continue;
-          }
-
-          await prisma.$transaction(async (tx) => {
-            await tx.relationship.delete({ where: { id: relationship.id } });
-            await tx.publicFormsTokens.delete({ where: { id: token.id } });
-            await tx.new_corpus.deleteMany({ where: { entity_id: token.entityId } });
-            await tx.entity.delete({ where: { id: token.entityId } });
-          });
+          if (!token.entityId) {
+            stats.skipped++;
+            await prisma.$transaction(async (tx) => {
+              if (relationship) {
+                await tx.relationship.delete({ where: { id: relationship.id } });
+              }
+              await tx.publicFormsTokens.delete({ where: { id: token.id } });
+            });
+            continue;
+          }
+
+          await prisma.$transaction(async (tx) => {
+            if (relationship) {
+              await tx.relationship.delete({ where: { id: relationship.id } });
+            }
+            await tx.publicFormsTokens.delete({ where: { id: token.id } });
+            await tx.new_corpus.deleteMany({ where: { entity_id: token.entityId } });
+            await tx.entity.delete({ where: { id: token.entityId } });
+          });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const relationship = relationshipMap.get(token.productId);
if (!relationship || !token.entityId) {
stats.skipped++;
continue;
}
await prisma.$transaction(async (tx) => {
await tx.relationship.delete({ where: { id: relationship.id } });
await tx.publicFormsTokens.delete({ where: { id: token.id } });
await tx.new_corpus.deleteMany({ where: { entity_id: token.entityId } });
await tx.entity.delete({ where: { id: token.entityId } });
});
const relationship = relationshipMap.get(token.productId);
if (!token.entityId) {
stats.skipped++;
await prisma.$transaction(async (tx) => {
if (relationship) {
await tx.relationship.delete({ where: { id: relationship.id } });
}
await tx.publicFormsTokens.delete({ where: { id: token.id } });
});
continue;
}
await prisma.$transaction(async (tx) => {
if (relationship) {
await tx.relationship.delete({ where: { id: relationship.id } });
}
await tx.publicFormsTokens.delete({ where: { id: token.id } });
await tx.new_corpus.deleteMany({ where: { entity_id: token.entityId } });
await tx.entity.delete({ where: { id: token.entityId } });
});
🤖 Prompt for AI Agents
In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts` around lines 71 - 83,
The current loop increments stats.skipped and continues when !relationship ||
!token.entityId, which leaves expired public form tokens undeleted; change the
logic so the token deletion always runs (call tx.publicFormsTokens.delete for
token.id inside the transaction) while conditionally deleting relationship (if
relationship exists, call tx.relationship.delete with relationship.id) and
conditionally deleting corpus/entity only when token.entityId is present
(tx.new_corpus.deleteMany and tx.entity.delete with token.entityId); keep the
whole set of deletes inside prisma.$transaction to ensure atomicity and remove
the stats.skipped increment only for truly skipped entity/relationship cleanup
cases, not for leaving tokens behind.

Comment on lines +87 to +89
} catch (error) {
stats.errors++;
console.error(`Error processing token ${token.token}:`, error);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t log raw token strings.

Token values are credentials; logging them risks leakage. Log the token id or a masked prefix instead.

🔒 Mask token in logs
-          console.error(`Error processing token ${token.token}:`, error);
+          console.error(`Error processing token ${token.id}:`, error);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} catch (error) {
stats.errors++;
console.error(`Error processing token ${token.token}:`, error);
} catch (error) {
stats.errors++;
console.error(`Error processing token ${token.id}:`, error);
🤖 Prompt for AI Agents
In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts` around lines 87 - 89,
The catch block currently logs the raw credential via token.token; change the
log to avoid exposing the full token by logging a non-sensitive identifier
(e.g., token.id) or a masked prefix of token.token (e.g., first 4 chars +
ellipsis) instead, and keep the existing error object in the message; update the
catch handling where stats.errors++ and console.error are used so it references
the safe identifier (token.id or maskedToken) rather than token.token.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant