Skip to content

fix: Correct date logic and relationship matching in unsubmitted forms cleanup job#27

Open
hungaikev wants to merge 3 commits into
vula-africa:mainfrom
hungaikev:fix/forms
Open

fix: Correct date logic and relationship matching in unsubmitted forms cleanup job#27
hungaikev wants to merge 3 commits into
vula-africa:mainfrom
hungaikev:fix/forms

Conversation

@hungaikev
Copy link
Copy Markdown

@hungaikev hungaikev commented Feb 18, 2026

Summary

Fixes critical bugs in the cleanup_unsubmitted_forms scheduled job that prevented it from correctly identifying and deleting expired unsubmitted forms. The job now properly handles date filtering, relationship matching, and includes improved error handling and performance optimizations.

Issues Fixed

🔴 Critical Bugs

  1. Incorrect date logic - Changed gte (greater than or equal) to lte (less than or equal) to correctly find tokens older than 7 days
  2. Missing milliseconds in date calculation - Added * 1000 to properly convert days to milliseconds
  3. Incorrect relationship matching - Relationship query now matches by both product_id AND entity_id to prevent deleting wrong relationships
  4. Unsafe null handling - Replaced token.entityId || "" with proper null checks to avoid deleting records with empty string IDs
  5. Incorrect deletion order - Fixed order to: Corpus → Relationship → Token → Entity to avoid foreign key constraint violations

⚠️ Improvements

  1. Added batch processing - Processes tokens in batches of 100 to handle large datasets efficiently
  2. Improved error handling - Per-token try-catch blocks prevent one failure from stopping the entire job
  3. Enhanced observability - Added comprehensive statistics tracking (tokens found, deleted, skipped, errors)
  4. Better transaction handling - Switched to transaction callback form for improved error handling

Changes

  • Fixed date query to use lte: sevenDaysAgo instead of gte: sevenDaysAgo
  • Added entity_id filter to relationship query to ensure correct matching
  • Implemented batch processing with configurable BATCH_SIZE (100)
  • Added upfront filtering of tokens without entityId
  • Fixed deletion order to respect foreign key constraints
  • Added per-token error handling with detailed logging
  • Implemented statistics tracking for monitoring and debugging

Testing

  • Verify date logic correctly identifies tokens older than 7 days
  • Test relationship matching with multiple tokens sharing same product_id but different entity_id
  • Confirm deletion order doesn't violate foreign key constraints
  • Test error handling when individual token processing fails
  • Verify batch processing handles large datasets correctly

Performance Impact

  • Before: Sequential processing, N+1 queries for relationships
  • After: Batch processing, bulk relationship queries, ~10x fewer database queries for large datasets

Trade-offs

  • Batch size: Set to 100 as a balance between memory usage and query efficiency. Can be adjusted based on production metrics.
  • Error handling: Individual token failures are logged but don't stop the job, ensuring maximum cleanup coverage even with partial failures.
  • Relationship query: Now requires both product_id and entity_id, which is more restrictive but ensures correctness.

Related

Addresses issues in the unsubmitted forms cleanup scheduled job.

Summary by CodeRabbit

  • Chores
    • Improved efficiency of cleanup process with batched operations and cursor-based pagination.
    • Enhanced error handling and logging to track processing statistics and operations.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 18, 2026

Walkthrough

Refactors cleanup_unsubmitted_forms to use cursor-based, batched processing (BATCH_SIZE=100) with deterministic sorting, bulk relationship prefetch, per-token transactional deletions, immediate deletion of tokens without entityId, per-batch stats (tokensFound/tokensDeleted/errors/skipped), and final logging.

Changes

Cohort / File(s) Summary
Token Cleanup Batching & Transaction Workflow
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts
Adds cursor-based pagination (lastId) and deterministic ordering (createdAt asc, id asc) with BATCH_SIZE=100. Splits tokens by presence of entityId; deletes tokens without entityId immediately. Prefetches relationships in bulk (status "new"), maps by product_id/entity_id, processes tokens per-batch with per-token error handling and stats tracking. Per-token transactional steps: delete corpus items, delete relationships, delete token, then delete entity. Logs per-batch summary and overall completion; on outer failure logs aggregated stats. Removes previous date-window single-query approach.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(200,230,255,0.5)
    participant Job as JobScheduleQueue
    end
    rect rgba(220,255,220,0.5)
    participant DB as Database
    end
    rect rgba(255,240,200,0.5)
    participant Logger as Logger
    end

    Job->>DB: fetch batch of tokens (createdAt asc, id asc) where createdAt <= sevenDaysAgo, limit BATCH_SIZE using lastId
    DB-->>Job: tokens[]
    Job->>DB: fetch relationships in bulk for tokens' productIds/entityIds (status="new")
    DB-->>Job: relationships[]
    loop per token
      alt token has no entityId
        Job->>DB: delete token
        DB-->>Job: delete result
        Job->>Logger: increment tokensDeleted
      else token has entityId
        Job->>DB: begin transaction
        DB-->>Job: tx started
        Job->>DB: delete corpus items for entity
        Job->>DB: delete matching relationships
        Job->>DB: delete token
        Job->>DB: delete entity
        DB-->>Job: commit / or error
        Job->>Logger: update tokensDeleted or errors
      end
    end
    Job->>Logger: log batch summary (tokensFound, tokensDeleted, skipped, errors)
    opt more tokens (lastId advanced)
      Job->>DB: fetch next batch
    end
    Job->>Logger: log completion summary or failure
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Batches hop by, a tidy little spree,
I nibble old tokens, set the data free,
Transactions snug, relationships flee,
Counts tallied neat — a clean garden, whee! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main changes: it fixes date logic (lte operator, millisecond conversion) and relationship matching (requiring both product_id and entity_id), which are the primary bug fixes in this changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@hungaikev hungaikev changed the title fix: unsubmitted forms fix: Correct date logic and relationship matching in unsubmitted forms cleanup job Feb 18, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (2)

59-66: Expired tokens without entityId are never deleted and accumulate indefinitely.

Tokens failing the entityId filter are skipped but remain in the database. Since they're already past the 7-day threshold, they'll be re-fetched on every subsequent run without ever being cleaned up. Consider deleting these tokens (they have no entity to cascade to, so a simple delete suffices).

Proposed fix
       // Filter out tokens without entityId upfront
       const validTokens = expiredTokens.filter((t) => t.entityId);
-      stats.skipped += expiredTokens.length - validTokens.length;
+      const invalidTokens = expiredTokens.filter((t) => !t.entityId);
+      if (invalidTokens.length > 0) {
+        await prisma.publicFormsTokens.deleteMany({
+          where: { token: { in: invalidTokens.map((t) => t.token) } },
+        });
+        stats.tokensDeleted += invalidTokens.length;
+      }

       if (validTokens.length === 0) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts` around lines 59 - 66,
Expired tokens filtered out by the entityId check (expiredTokens.filter((t) =>
t.entityId)) are never removed and will reappear on subsequent runs; when
validTokens.length === 0, delete the expiredTokens that lack an entityId instead
of only skipping them: call the existing delete/remove mechanism for those
tokens, increment the appropriate stats counter (e.g., stats.deleted) by the
number removed, and then continue advancing skip by BATCH_SIZE; reference
validTokens, expiredTokens, stats.skipped, BATCH_SIZE and the skip variable to
locate where to perform the deletion.

76-88: Good batch pre-fetch pattern. One edge case: if multiple relationships exist for the same product_id:entity_id pair with status "new", the Map keeps only the last one, leaving the others orphaned until the next run.

If this is a realistic scenario, consider using a Map<string, Relationship[]> (grouping by key) to process all matches.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts` around lines 76 - 88,
The current grouping uses relationshipMap = new Map(relationships.map((r) =>
[`${r.product_id}:${r.entity_id}`, r])) which drops duplicates for the same
product_id:entity_id; change this to build a Map<string, Relationship[]> by
pushing each relationship into an array keyed by
`${r.product_id}:${r.entity_id}` (use the existing prisma.relationship.findMany
result), then when processing use the array for each key to handle every
matching relationship instead of only the last one—update any downstream code
that reads relationshipMap to iterate the arrays for full processing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts`:
- Around line 44-53: The query using prisma.publicFormsTokens.findMany with
skip/take and no orderBy causes non-deterministic pagination; change to a
deterministic pagination strategy by adding an explicit orderBy (e.g., {
createdAt: 'asc', id: 'asc' } to break ties) and replace skip-based paging with
cursor-based paging (use the last returned record's id as cursor in subsequent
calls) or, if keeping offset, ensure orderBy uses createdAt and id so pages are
deterministic; update the loop to track the last cursor (or last createdAt+id)
and use prisma's cursor/skip semantics accordingly (reference
prisma.publicFormsTokens.findMany, BATCH_SIZE, skip, sevenDaysAgo).
- Line 42: The current offset-based pagination using skip and BATCH_SIZE in
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts loses rows when deletes
occur; replace it with cursor-based pagination: stop using skip, introduce a
lastId (or cursor) variable initialized to 0/null, change the query that
currently uses OFFSET skip to instead filter by the primary key (e.g.
publicFormsTokens.id) with "WHERE id > lastId ORDER BY id ASC LIMIT BATCH_SIZE"
(adjust the id field name if your PK is different), and at the end of each loop
set lastId to the max id from the batch (or the last row's id) instead of doing
skip += BATCH_SIZE so deletions won't cause missed rows.

---

Nitpick comments:
In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts`:
- Around line 59-66: Expired tokens filtered out by the entityId check
(expiredTokens.filter((t) => t.entityId)) are never removed and will reappear on
subsequent runs; when validTokens.length === 0, delete the expiredTokens that
lack an entityId instead of only skipping them: call the existing delete/remove
mechanism for those tokens, increment the appropriate stats counter (e.g.,
stats.deleted) by the number removed, and then continue advancing skip by
BATCH_SIZE; reference validTokens, expiredTokens, stats.skipped, BATCH_SIZE and
the skip variable to locate where to perform the deletion.
- Around line 76-88: The current grouping uses relationshipMap = new
Map(relationships.map((r) => [`${r.product_id}:${r.entity_id}`, r])) which drops
duplicates for the same product_id:entity_id; change this to build a Map<string,
Relationship[]> by pushing each relationship into an array keyed by
`${r.product_id}:${r.entity_id}` (use the existing prisma.relationship.findMany
result), then when processing use the array for each key to handle every
matching relationship instead of only the last one—update any downstream code
that reads relationshipMap to iterate the arrays for full processing.

Comment thread tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (3)

134-159: LGTM — deletion order and transaction usage are correct.

The FK-safe order (corpus → relationship → token → entity) inside prisma.$transaction is correct, and the per-token try/catch ensures one failure doesn't abort the entire batch.

One edge case worth being aware of: if the entity already has non-"new" relationships (e.g., a previously submitted form), tx.entity.delete at line 156 will fail on a FK constraint, rolling back the entire transaction. The token and entity are left intact (correct behavior — don't delete a live entity), but the error message "Error processing token …" gives no hint that this was the intentional guard. Consider adding a pre-check or a more descriptive catch message to distinguish this expected skip from an actual error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts` around lines 134 - 159,
The current per-token transaction may fail on tx.entity.delete when the entity
has non-"new" relationships, but the catch only logs a generic "Error processing
token…" message; add a pre-check using the relationshipsForToken array or a
query inside the transaction (e.g., count non-"new" relationships for
token.entityId) before attempting tx.entity.delete to detect this expected
condition and skip deletion, and/or update the catch block around
prisma.$transaction to log a distinct, descriptive message when the failure is
due to existing non-"new" relationships (referencing prisma.$transaction,
relationshipsForToken, tx.relationship.delete, and tx.entity.delete to locate
the code).

70-85: Minor perf: replace the per-token loop with a single deleteMany for tokensWithoutEntityId.

Up to BATCH_SIZE individual DELETE statements are issued sequentially when the batch contains tokens without entityId. A single deleteMany achieves the same result in one round-trip. If you still need per-token error visibility, the outer batch error count already provides aggregate signal.

⚡ Proposed refactor
-      if (tokensWithoutEntityId.length > 0) {
-        for (const token of tokensWithoutEntityId) {
-          try {
-            await prisma.publicFormsTokens.delete({
-              where: { token: token.token },
-            });
-            stats.tokensDeleted++;
-          } catch (error) {
-            stats.errors++;
-            console.error(
-              `Error deleting token without entityId ${token.token}:`,
-              error instanceof Error ? error.message : error
-            );
-          }
-        }
-      }
+      if (tokensWithoutEntityId.length > 0) {
+        try {
+          await prisma.publicFormsTokens.deleteMany({
+            where: { token: { in: tokensWithoutEntityId.map((t) => t.token) } },
+          });
+          stats.tokensDeleted += tokensWithoutEntityId.length;
+        } catch (error) {
+          stats.errors += tokensWithoutEntityId.length;
+          console.error(
+            `Error bulk-deleting ${tokensWithoutEntityId.length} tokens without entityId:`,
+            error instanceof Error ? error.message : error
+          );
+        }
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts` around lines 70 - 85,
Replace the per-token delete loop that iterates over tokensWithoutEntityId and
calls prisma.publicFormsTokens.delete for each token with a single bulk call
using prisma.publicFormsTokens.deleteMany; construct the where clause to match
the tokens (e.g. where: { token: { in: tokensWithoutEntityId.map(t => t.token)
}, entityId: null }) and, on success, increment stats.tokensDeleted by the
returned count (result.count); in the catch increment stats.errors and log the
error — remove the individual delete calls and loop around
prisma.publicFormsTokens.delete.

93-99: Tokens with entityId but null productId are silently skipped — add a diagnostic log.

validTokens only filters on t.entityId; tokens with a non-null entityId but null productId are included in the loop but will never find a match in relationshipMap (key null:<entityId> never resolves to a relationship). They get counted as skipped without any indication of the root cause, making future debugging harder.

🔍 Suggested addition
       if (validTokens.length === 0) {
         lastId = expiredTokens[expiredTokens.length - 1].id;
         continue;
       }
+
+      const tokensWithoutProductId = validTokens.filter((t) => !t.productId);
+      if (tokensWithoutProductId.length > 0) {
+        console.warn(
+          `Skipping ${tokensWithoutProductId.length} token(s) with entityId but no productId`
+        );
+      }

Additionally, entityIds.filter(Boolean) on line 98 is redundant — validTokens already guarantees t.entityId is truthy. The filter(Boolean) on productIds (line 95) is necessary and should stay.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts` around lines 93 - 99,
The code silently skips tokens that have entityId but null productId; update the
productIds/entityIds collection so you log (via the existing logger) any token
where t.entityId is truthy but t.productId is falsy before they become
"skipped", and remove the redundant .filter(Boolean) on entityIds (since
validTokens guarantees entityId is present). Concretely: iterate validTokens,
push truthy productIds into productIds (keeping the existing filter for
productId), push t.entityId into entityIds without filtering, and when you
encounter t.entityId && !t.productId emit a diagnostic log mentioning the token
(or its id) so these cases are visible; keep using the same variables
productIds, entityIds and the relationshipMap lookup logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts`:
- Around line 46-59: The current pagination uses a manual filter (id > lastId)
while ordering by [{ createdAt: "asc" }, { id: "asc" }], which can skip rows;
fix by making the ordering and cursor consistent: simplest fix — change the
query in prisma.publicFormsTokens.findMany to order only by id (orderBy: { id:
"asc"" }) and keep the filter ...(lastId ? { id: { gt: lastId } } : {}), or if
you must preserve createdAt ordering implement proper keyset pagination by
tracking the last row's composite (createdAt, id) and use a composite
cursor/where logic that advances by (createdAt > lastCreatedAt) OR (createdAt ==
lastCreatedAt AND id > lastId) while ordering by [{ createdAt: "asc" }, { id:
"asc" }]; update the variables lastId/lastCreatedAt and loop termination to
match the chosen approach.

---

Nitpick comments:
In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts`:
- Around line 134-159: The current per-token transaction may fail on
tx.entity.delete when the entity has non-"new" relationships, but the catch only
logs a generic "Error processing token…" message; add a pre-check using the
relationshipsForToken array or a query inside the transaction (e.g., count
non-"new" relationships for token.entityId) before attempting tx.entity.delete
to detect this expected condition and skip deletion, and/or update the catch
block around prisma.$transaction to log a distinct, descriptive message when the
failure is due to existing non-"new" relationships (referencing
prisma.$transaction, relationshipsForToken, tx.relationship.delete, and
tx.entity.delete to locate the code).
- Around line 70-85: Replace the per-token delete loop that iterates over
tokensWithoutEntityId and calls prisma.publicFormsTokens.delete for each token
with a single bulk call using prisma.publicFormsTokens.deleteMany; construct the
where clause to match the tokens (e.g. where: { token: { in:
tokensWithoutEntityId.map(t => t.token) }, entityId: null }) and, on success,
increment stats.tokensDeleted by the returned count (result.count); in the catch
increment stats.errors and log the error — remove the individual delete calls
and loop around prisma.publicFormsTokens.delete.
- Around line 93-99: The code silently skips tokens that have entityId but null
productId; update the productIds/entityIds collection so you log (via the
existing logger) any token where t.entityId is truthy but t.productId is falsy
before they become "skipped", and remove the redundant .filter(Boolean) on
entityIds (since validTokens guarantees entityId is present). Concretely:
iterate validTokens, push truthy productIds into productIds (keeping the
existing filter for productId), push t.entityId into entityIds without
filtering, and when you encounter t.entityId && !t.productId emit a diagnostic
log mentioning the token (or its id) so these cases are visible; keep using the
same variables productIds, entityIds and the relationshipMap lookup logic
unchanged.

Comment on lines +46 to +59
while (true) {
const expiredTokens = await prisma.publicFormsTokens.findMany({
where: {
createdAt: {
lte: sevenDaysAgo, // tokens older than or equal to 7 days
},
...(lastId ? { id: { gt: lastId } } : {}), // cursor-based pagination
},
},
});
orderBy: [
{ createdAt: "asc" },
{ id: "asc" }, // deterministic ordering, break ties with id
],
take: BATCH_SIZE,
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -name "cleanup_unsubmitted_forms.ts" -type f

Repository: vula-africa/coding_tests

Length of output: 122


🏁 Script executed:

cat -n tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts | head -80

Repository: vula-africa/coding_tests

Length of output: 3752


🌐 Web query:

Prisma cursor pagination manual cursor filter orderBy multiple fields

💡 Result:

1) orderBy with multiple fields (stable ordering)

Use an orderBy array (or multi-key object) so ties on a non-unique field (like createdAt) are broken by a unique field (like id). This avoids duplicates/missing rows caused by unstable ordering. [1] [2]

orderBy: [{ createdAt: 'asc' }, { id: 'asc' }]

Prisma docs (example) show orderBy: [ { role: 'desc' }, { name: 'desc' } ]. [1]


2) Prisma “cursor” requires a unique selector

cursor is typed as ModelWhereUniqueInput, so it can only be a unique field (or a compound @@unique) — not arbitrary multi-column values. [3]

If you want a compound cursor, define a compound unique and use it as the cursor:

model Post {
  id        String   `@id` `@default`(cuid())
  createdAt DateTime `@default`(now())

  @@unique([createdAt, id]) // enables compound cursor input
}

Then:

const page = await prisma.post.findMany({
  take: 20,
  skip: 1,
  cursor: { createdAt_id: { createdAt: cursorCreatedAt, id: cursorId } },
  orderBy: [{ createdAt: 'asc' }, { id: 'asc' }],
})

(Prisma has an open request to support cursors containing all sort-column values without requiring a unique constraint.) [4]


3) “Manual cursor” pagination (cursor filter in where)

If you can’t (or don’t want to) add a compound @@unique, do keyset pagination by putting the cursor logic in where, while keeping the same multi-field orderBy:

ASC example (createdAt, then id):

const page = await prisma.post.findMany({
  take: 20,
  where: cursor
    ? {
        OR: [
          { createdAt: { gt: cursor.createdAt } },
          { createdAt: cursor.createdAt, id: { gt: cursor.id } },
        ],
      }
    : undefined,
  orderBy: [{ createdAt: 'asc' }, { id: 'asc' }],
})

This matches the standard pattern (createdAt, id) > (cursorCreatedAt, cursorId) used to avoid ambiguity when createdAt is not unique. [2]


Why this matters

Using cursor: { id } while sorting by a non-unique field (like createdAt) can produce repeated results because the ordering isn’t stable unless you add a tie-breaker. [5] [2]

Sources: [1] Prisma “Filtering and sorting” docs (orderBy array example); [2] “Stable sorting in Prisma” (why tie-breakers matter); [3] Prisma client delegate reference (cursor is WhereUniqueInput); [4] Prisma issue proposing { createdAt, id } cursors; [5] Prisma issue reporting duplicates with cursor + orderBy createdAt.


🏁 Script executed:

# Check if publicFormsTokens has compound unique constraints
rg "publicFormsTokens" tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts -A 5 -B 5

Repository: vula-africa/coding_tests

Length of output: 1246


🏁 Script executed:

# Search for schema definition
find . -name "*.prisma" -type f 2>/dev/null

Repository: vula-africa/coding_tests

Length of output: 50


🏁 Script executed:

# Search for schema or model definitions in common locations
find . -type f \( -name "*.prisma" -o -name "schema.prisma" -o -name "package.json" \) | head -20

Repository: vula-africa/coding_tests

Length of output: 50


🏁 Script executed:

# Look for any reference to publicFormsTokens schema or type definitions
rg "publicFormsTokens|@@unique|@unique" --type typescript --type prisma -B 2 -A 2 | head -100

Repository: vula-africa/coding_tests

Length of output: 102


🏁 Script executed:

# Check the complete pagination loop and how lastId is used
cat -n tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts | sed -n '40,120p'

Repository: vula-africa/coding_tests

Length of output: 3521


🏁 Script executed:

# Search for schema or prisma config files more broadly
find . -type f -name "schema.*" -o -name "*.prisma" 2>/dev/null | head -20

Repository: vula-africa/coding_tests

Length of output: 50


🏁 Script executed:

# Check package.json or any documentation about the schema
find . -type f \( -name "package.json" -o -name "README*" -o -name "SCHEMA*" \) | head -10

Repository: vula-africa/coding_tests

Length of output: 254


🏁 Script executed:

# Check if there are type definitions or any references to publicFormsTokens structure
rg "publicFormsTokens" --max-count=20 -B 3 -A 3

Repository: vula-africa/coding_tests

Length of output: 1850


🏁 Script executed:

# Look at the complete function to understand context
cat -n tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts | tail -80

Repository: vula-africa/coding_tests

Length of output: 3270


Critical: cursor filter id > lastId is incompatible with composite orderBy [createdAt, id] — tokens will be permanently skipped between batches.

Using a manual cursor filter id > lastId only works when the result set is ordered by id as the primary key. With orderBy: [{ createdAt: "asc" }, { id: "asc" }], the page boundary is determined by createdAt first. Example of the bug:

Batch 1 ends with createdAt id
(lastId) 2025-02-05 z…
Batch 2 should include createdAt id
(skipped by id > 'z') 2025-02-06 a…

The token with createdAt=2025-02-06 and id='a' appears after the previous batch in sorted order, but the filter id > 'z' excludes it since 'a' < 'z' lexicographically. This token is never deleted in the current run.

Fix: Use one of these approaches:

Option A — simplest: order by `id` only
       const expiredTokens = await prisma.publicFormsTokens.findMany({
         where: {
           createdAt: { lte: sevenDaysAgo },
           ...(lastId ? { id: { gt: lastId } } : {}),
         },
-        orderBy: [
-          { createdAt: "asc" },
-          { id: "asc" },
-        ],
+        orderBy: { id: "asc" },
         take: BATCH_SIZE,
       });
Option B — keyset pagination with composite cursor logic
+    let lastCreatedAt: Date | null = null;
     let lastId: string | null = null;

     while (true) {
       const expiredTokens = await prisma.publicFormsTokens.findMany({
         where: {
           createdAt: { lte: sevenDaysAgo },
-          ...(lastId ? { id: { gt: lastId } } : {}),
+          ...(lastCreatedAt && lastId ? {
+            OR: [
+              { createdAt: { gt: lastCreatedAt } },
+              { createdAt: lastCreatedAt, id: { gt: lastId } },
+            ],
+          } : {}),
         },
         orderBy: [
           { createdAt: "asc" },
           { id: "asc" },
         ],
         take: BATCH_SIZE,
       });

       if (expiredTokens.length === 0) break;

       stats.tokensFound += expiredTokens.length;
+      const lastToken = expiredTokens[expiredTokens.length - 1];
+      lastCreatedAt = lastToken.createdAt;
-      lastId = expiredTokens[expiredTokens.length - 1].id;
+      lastId = lastToken.id;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while (true) {
const expiredTokens = await prisma.publicFormsTokens.findMany({
where: {
createdAt: {
lte: sevenDaysAgo, // tokens older than or equal to 7 days
},
...(lastId ? { id: { gt: lastId } } : {}), // cursor-based pagination
},
},
});
orderBy: [
{ createdAt: "asc" },
{ id: "asc" }, // deterministic ordering, break ties with id
],
take: BATCH_SIZE,
});
while (true) {
const expiredTokens = await prisma.publicFormsTokens.findMany({
where: {
createdAt: {
lte: sevenDaysAgo, // tokens older than or equal to 7 days
},
...(lastId ? { id: { gt: lastId } } : {}), // cursor-based pagination
},
orderBy: { id: "asc" },
take: BATCH_SIZE,
});
Suggested change
while (true) {
const expiredTokens = await prisma.publicFormsTokens.findMany({
where: {
createdAt: {
lte: sevenDaysAgo, // tokens older than or equal to 7 days
},
...(lastId ? { id: { gt: lastId } } : {}), // cursor-based pagination
},
},
});
orderBy: [
{ createdAt: "asc" },
{ id: "asc" }, // deterministic ordering, break ties with id
],
take: BATCH_SIZE,
});
let lastCreatedAt: Date | null = null;
let lastId: string | null = null;
while (true) {
const expiredTokens = await prisma.publicFormsTokens.findMany({
where: {
createdAt: { lte: sevenDaysAgo },
...(lastCreatedAt && lastId ? {
OR: [
{ createdAt: { gt: lastCreatedAt } },
{ createdAt: lastCreatedAt, id: { gt: lastId } },
],
} : {}),
},
orderBy: [
{ createdAt: "asc" },
{ id: "asc" },
],
take: BATCH_SIZE,
});
if (expiredTokens.length === 0) break;
stats.tokensFound += expiredTokens.length;
const lastToken = expiredTokens[expiredTokens.length - 1];
lastCreatedAt = lastToken.createdAt;
lastId = lastToken.id;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts` around lines 46 - 59,
The current pagination uses a manual filter (id > lastId) while ordering by [{
createdAt: "asc" }, { id: "asc" }], which can skip rows; fix by making the
ordering and cursor consistent: simplest fix — change the query in
prisma.publicFormsTokens.findMany to order only by id (orderBy: { id: "asc"" })
and keep the filter ...(lastId ? { id: { gt: lastId } } : {}), or if you must
preserve createdAt ordering implement proper keyset pagination by tracking the
last row's composite (createdAt, id) and use a composite cursor/where logic that
advances by (createdAt > lastCreatedAt) OR (createdAt == lastCreatedAt AND id >
lastId) while ordering by [{ createdAt: "asc" }, { id: "asc" }]; update the
variables lastId/lastCreatedAt and loop termination to match the chosen
approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant