Consume @technical-1/email-archive-parser v3 (retire duplicated parser/detector code)#1
Technical-1 wants to merge 9 commits into
Conversation
…mail.date nullable
…eaming no-abort limitation
…move dead inline parsers
…hase/Subscription/Newsletter)
…ity with old factory)
📝 Walkthrough
This PR migrates the email analytics app from in-house email parsing and detection logic to an external …
Changes
Migrate to external library with nullable date support
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
web/src/db/database.ts (1)
218-221: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift
Fix Dexie `orderBy('date')` silently dropping undated emails (`date: null`).
`web/src/db/database.ts` (`getEmails` and, similarly, the `getEmailHeaders` and folder-paginated queries) relies on `orderBy('date')` over a secondary index. Dexie treats secondary indices as sparse, so records whose indexed key value is `null`/`undefined` (non-indexable) are omitted from `orderBy()` results. That means imported emails with `date: null` won't appear in `getEmails()`/`getEmailHeaders()` and will also be missing from `getEmailsByFolderPaginated()`, where `[folderId+date]` is used, preventing the "Unknown date" UI goal.
Options to unblock:
- Ensure the indexed `date` column is always indexable at write time (e.g., store a sentinel like `0`/`-Infinity` for "unknown" and map it back in the app), or
- Keep a separate non-indexed nullable domain field for "unknown" and base sorting/filtering on an indexable value, or
- Avoid using the `date` index for list loading: fetch and sort in-memory.
Add/confirm an import test case for "undated" emails to ensure they remain visible after persistence and sorting.
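The first option above (an always-indexable sentinel) could look roughly like the sketch below. It is a minimal illustration, not the project's code: `UNKNOWN_DATE_SENTINEL`, `toIndexedDate`, and `fromIndexedDate` are hypothetical names, and `-Infinity` is chosen here only because it is a valid numeric IndexedDB key that sorts before every real timestamp, whereas `0` would be indistinguishable from a genuine epoch date.

```typescript
// Hypothetical sentinel for "unknown date" (an assumption, not taken from the PR).
// -Infinity is a valid IndexedDB number key; NaN is not, so it must never be stored.
const UNKNOWN_DATE_SENTINEL = Number.NEGATIVE_INFINITY;

/** Convert a nullable domain date into an always-indexable number for storage. */
function toIndexedDate(date: Date | null): number {
  if (date === null) return UNKNOWN_DATE_SENTINEL;
  const ms = date.getTime();
  // An invalid Date yields NaN, which would also fall out of the index.
  return Number.isNaN(ms) ? UNKNOWN_DATE_SENTINEL : ms;
}

/** Map the stored number back to the nullable domain date on read. */
function fromIndexedDate(stored: number): Date | null {
  return stored === UNKNOWN_DATE_SENTINEL ? null : new Date(stored);
}
```

One caveat with this particular sentinel: `JSON.stringify(-Infinity)` produces `null`, so a JSON-based backup path would have to convert the sentinel back to `null` before export and re-apply it on import.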
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@web/src/db/database.ts` around lines 218 - 221, getEmails (and related loaders getEmailHeaders / getEmailsByFolderPaginated) currently use Dexie secondary index ordering (db.emails.orderBy('date') / orderBy('[folderId+date]')) which silently omits records with date null; to fix, persist an indexable sentinel value for missing dates (e.g., add/maintain a dateIndex numeric field set to epoch 0 or -Infinity when date is null and index/sort on dateIndex) or alter those loaders to read all relevant records and perform an in-memory sort by a mapped date (treating null as oldest) instead of relying on the sparse index; update the read/write code paths that create Email records (where date is set) to populate the new indexable field and add/adjust tests to assert imported “undated” emails remain visible and correctly ordered after persistence and retrieval.
web/src/services/backupService.ts (1)
327-391: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Critical: Import path corrupts nullable dates to epoch.
The export path (lines 100, 116, 136, 167, 179) correctly preserves `null` dates, but the import path calls `toTimestamp` without null guards. Since `toTimestamp` (line 445) executes `new Date(null).getTime()`, it converts `null` to `0` (epoch), breaking round-trip integrity.
Impact: A backup exported with `email.date = null` will be imported as `date = 0` (Jan 1, 1970), corrupting the nullable-date contract and causing "Unknown date" emails to display as 1970.
🛡️ Proposed fix: guard toTimestamp calls with null checks
```diff
 const dbEmails: DBEmail[] | null = emails && emails.length > 0
-  ? (emails.map((e) => ({ ...e, date: this.toTimestamp(e.date, 'email.date') })) as DBEmail[])
+  ? (emails.map((e) => ({ ...e, date: e.date == null ? null : this.toTimestamp(e.date, 'email.date') })) as DBEmail[])
   : null;

 const dbAccounts: DBAccount[] | null = accounts && accounts.length > 0
   ? (accounts.map((a) => ({
       ...a,
-      signupDate: this.toTimestamp(a.signupDate, 'account.signupDate'),
+      signupDate: a.signupDate == null ? null : this.toTimestamp(a.signupDate, 'account.signupDate'),
       lastActivityDate: a.lastActivityDate ? this.toTimestamp(a.lastActivityDate, 'account.lastActivityDate') : undefined,
     })) as unknown as DBAccount[])
   : null;

 const dbContacts: DBContact[] | null = contacts && contacts.length > 0
   ? (contacts.map((c) => ({
       ...c,
-      lastEmailDate: this.toTimestamp(c.lastEmailDate, 'contact.lastEmailDate'),
+      lastEmailDate: c.lastEmailDate == null ? null : this.toTimestamp(c.lastEmailDate, 'contact.lastEmailDate'),
     })) as DBContact[])
   : null;

 const dbSubscriptions: DBSubscription[] | null = subscriptions && subscriptions.length > 0
   ? (subscriptions.map((s) => ({
       ...s,
-      lastRenewalDate: this.toTimestamp(s.lastRenewalDate, 'subscription.lastRenewalDate'),
+      lastRenewalDate: s.lastRenewalDate == null ? null : this.toTimestamp(s.lastRenewalDate, 'subscription.lastRenewalDate'),
       nextRenewalDate: s.nextRenewalDate ? this.toTimestamp(s.nextRenewalDate, 'subscription.nextRenewalDate') : undefined,
       emailIds: JSON.stringify(s.emailIds || []),
     })) as unknown as DBSubscription[])
   : null;

 const dbNewsletters: DBNewsletter[] | null = newsletters && newsletters.length > 0
   ? (newsletters.map((n) => ({
       ...n,
-      lastEmailDate: this.toTimestamp(n.lastEmailDate, 'newsletter.lastEmailDate'),
+      lastEmailDate: n.lastEmailDate == null ? null : this.toTimestamp(n.lastEmailDate, 'newsletter.lastEmailDate'),
     })) as DBNewsletter[])
   : null;
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@web/src/services/backupService.ts` around lines 327 - 391, The import mapping calls this.toTimestamp unguarded and new Date(null) turns null into 0; update each mapping in the import path (the dbEmails, dbAccounts, dbPurchases, dbContacts, dbEvents, dbFolders, dbSubscriptions, dbNewsletters mappings shown) to first check for null/undefined before calling this.toTimestamp and preserve null/undefined values (e.g., if e.date == null leave as null, if a.lastActivityDate == null leave undefined/null) so toTimestamp is only invoked for actual date values; ensure usages reference the toTimestamp method name exactly and keep JSON.stringify for emailIds unchanged.
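Because the same null check recurs in every mapping, one way to keep the import path consistent is a small wrapper around the conversion. The sketch below is only an illustration: `toTimestampOrNull` is a hypothetical helper, and the `toTimestamp` shown is a stand-in written for this example, since the real method's body does not appear in this review.

```typescript
type DateInput = Date | string | number;

// Stand-in for the service's toTimestamp (assumed behaviour: parse, reject invalid input).
function toTimestamp(value: DateInput, label: string): number {
  const ms = new Date(value).getTime();
  if (Number.isNaN(ms)) {
    throw new Error(`Invalid date for ${label}`);
  }
  return ms;
}

// Hypothetical wrapper: preserves "unknown" as null instead of letting
// new Date(null) coerce it to 0 (1970-01-01T00:00:00Z).
function toTimestampOrNull(
  value: DateInput | null | undefined,
  label: string,
): number | null {
  return value == null ? null : toTimestamp(value, label);
}
```

With a helper like this, each mapping would call `toTimestampOrNull(e.date, 'email.date')` rather than repeating the ternary, which makes it harder to miss one of the date fields.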
🧹 Nitpick comments (1)
web/src/pages/AnalyticsPage.tsx (1)
18-19: 💤 Low value
Consider simplifying redundant Date wrapping.
Once `email.date != null` has been checked, TypeScript narrows `email.date` to `Date`, so wrapping it in `new Date(...)` creates an unnecessary copy:
- Line 19: `new Date(email.date).getFullYear()` → `email.date.getFullYear()`
- Line 27: `new Date(e.date).getFullYear()` → `e.date.getFullYear()`
- Line 40: `new Date(e.date) >= thirtyDaysAgo` → `e.date >= thirtyDaysAgo`
- Line 48: `new Date(e.date as Date).getTime()` → `e.date.getTime()` (also removes the redundant type assertion; note that narrowing from a `.filter()` callback generally does not carry into the following `.map()` unless the callback is a type guard, so this one may need a type-guard filter to compile)
- Line 70: `new Date(email.date)` → `email.date`
The null guards themselves are correct and essential.
♻️ Proposed simplification
```diff
 emails.forEach((email) => {
   if (!email.date) return; // undated emails contribute no year
-  years.add(new Date(email.date).getFullYear());
+  years.add(email.date.getFullYear());
 });

 if (selectedYear === 'all') return emails;
-return emails.filter(e => e.date != null && new Date(e.date).getFullYear() === selectedYear);
+return emails.filter(e => e.date != null && e.date.getFullYear() === selectedYear);

-const recentEmails = filteredEmails.filter(e => e.date != null && new Date(e.date) >= thirtyDaysAgo);
+const recentEmails = filteredEmails.filter(e => e.date != null && e.date >= thirtyDaysAgo);

 const sortedDates = filteredEmails
   .filter(e => e.date != null)
-  .map(e => new Date(e.date as Date).getTime())
+  .map(e => e.date.getTime())
   .sort((a, b) => a - b);

 filteredEmails.forEach((email) => {
   if (!email.date) return; // undated emails excluded from volume aggregation
-  const date = new Date(email.date);
+  const date = email.date;
   const key = `${date.getFullYear()}-${String(date.getMonth() + 1).padStart(2, '0')}`;

 filteredEmails.forEach((email) => {
   if (!email.date) return; // undated emails excluded from activity heatmap
-  const date = new Date(email.date);
+  const date = email.date;
   const day = date.getDay();
```
Also applies to: 27-27, 40-40, 46-49, 69-71, 135-136
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@web/src/pages/AnalyticsPage.tsx` around lines 18 - 19, Replace redundant new Date(...) wrappings on email date values with direct Date property access: in the block that adds to years use years.add(email.date.getFullYear()) instead of new Date(email.date).getFullYear(); in any map/filter using e.date use e.date.getFullYear() and comparisons like e.date >= thirtyDaysAgo instead of new Date(e.date) >= thirtyDaysAgo; replace new Date(e.date as Date).getTime() with e.date.getTime(); and use email.date directly where a Date is needed (e.g., remove new Date(email.date)). Keep the existing null/undefined guards on email.date but remove the extra Date construction and the redundant type assertion.
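For the `.filter(...).map(...)` chain specifically, dropping the `as Date` assertion relies on the filter narrowing the element type. A user-defined type guard is one way to get that; the sketch below assumes a simplified `Email` shape, and `DatedEmail`, `hasDate`, and `sortedTimestamps` are hypothetical names introduced only for illustration.

```typescript
// Simplified stand-in for the app's Email type (only the fields needed here).
interface Email {
  subject: string;
  date: Date | null;
}

type DatedEmail = Email & { date: Date };

// Type guard: tells the compiler that a passing element has a non-null date.
function hasDate(email: Email): email is DatedEmail {
  return email.date !== null;
}

// With the guard, filter() returns DatedEmail[], so map() needs no assertion.
function sortedTimestamps(emails: readonly Email[]): number[] {
  return emails
    .filter(hasDate)
    .map((e) => e.date.getTime())
    .sort((a, b) => a - b);
}
```

An inline `e => e.date != null` callback keeps the element type as `Email`, which is why the original code needed the `as Date` assertion in the first place.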
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@web/src/db/database.ts`:
- Around line 18-23: DBEmail.date being nullable breaks existing indexes (used
by orderBy('date') in getEmails/getEmailHeaders) because indexed fields cannot
be null; fix by keeping the indexed field numeric and mapping nulls to a
sentinel before storage: change DBEmail.date to number (timestamp) for
storage/indexing or add a separate indexed field (e.g., indexedDate: number) and
ensure the persistence layer (where emails are created/updated) converts null ->
0 (or another sentinel) and reads 0 back to null when returning Email objects;
update any index definitions and references (orderBy('date') or
orderBy('indexedDate')) and the getEmails/getEmailHeaders plumbing to use the
indexed numeric field.
In `@web/src/workers/parserWorker.ts`:
- Around line 229-244: The dedup key currently uses e.threadId which collapses
Gmail conversation-level threads and drops distinct messages; update the key
generation in the parser.parseStreaming callback to prefer the message-id from
the libEmail (e.g., use libEmail.messageId or e.messageId) instead of threadId,
falling back to the existing `${e.subject}|${e.sender}|...` composite when
message-id is missing; modify the line that assigns key (and any related
seenEmailKeys logic) to use messageId-first deduplication so each unique message
is preserved.
In `@web/src/workers/toAppEmail.ts`:
- Line 19: The mapper in toAppEmail.ts currently hard-codes attachments: [],
dropping parsed attachments; update the mapper that converts LibEmail -> Email
(the toAppEmail function) to map LibEmail.attachments into the app Attachment
shape instead of an empty array, preserving fields like filename, contentType,
size and including the attachment base64 payload (e.g., data) when present so
the persistence layer (insertEmail / bulkInsertEmails which split
email.attachments and emailBodies.attachmentData) can store metadata and payload
correctly; ensure the mapped property names match the app Attachment/interface
expected by the database layer.
---
Outside diff comments:
In `@web/src/db/database.ts`:
- Around line 218-221: getEmails (and related loaders getEmailHeaders /
getEmailsByFolderPaginated) currently use Dexie secondary index ordering
(db.emails.orderBy('date') / orderBy('[folderId+date]')) which silently omits
records with date null; to fix, persist an indexable sentinel value for missing
dates (e.g., add/maintain a dateIndex numeric field set to epoch 0 or -Infinity
when date is null and index/sort on dateIndex) or alter those loaders to read
all relevant records and perform an in-memory sort by a mapped date (treating
null as oldest) instead of relying on the sparse index; update the read/write
code paths that create Email records (where date is set) to populate the new
indexable field and add/adjust tests to assert imported “undated” emails remain
visible and correctly ordered after persistence and retrieval.
In `@web/src/services/backupService.ts`:
- Around line 327-391: The import mapping calls this.toTimestamp unguarded and
new Date(null) turns null into 0; update each mapping in the import path (the
dbEmails, dbAccounts, dbPurchases, dbContacts, dbEvents, dbFolders,
dbSubscriptions, dbNewsletters mappings shown) to first check for null/undefined
before calling this.toTimestamp and preserve null/undefined values (e.g., if
e.date == null leave as null, if a.lastActivityDate == null leave
undefined/null) so toTimestamp is only invoked for actual date values; ensure
usages reference the toTimestamp method name exactly and keep JSON.stringify for
emailIds unchanged.
---
Nitpick comments:
In `@web/src/pages/AnalyticsPage.tsx`:
- Around line 18-19: Replace redundant new Date(...) wrappings on email date
values with direct Date property access: in the block that adds to years use
years.add(email.date.getFullYear()) instead of new
Date(email.date).getFullYear(); in any map/filter using e.date use
e.date.getFullYear() and comparisons like e.date >= thirtyDaysAgo instead of new
Date(e.date) >= thirtyDaysAgo; replace new Date(e.date as Date).getTime() with
e.date.getTime(); and use email.date directly where a Date is needed (e.g.,
remove new Date(email.date)). Keep the existing null/undefined guards on
email.date but remove the extra Date construction and the redundant type
assertion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bb48665c-6ad2-4cf0-8325-5b0157007238
⛔ Files ignored due to path filters (1)
web/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (50)
- web/package.json
- web/src/__tests__/phase-7/mboxParser.test.ts
- web/src/__tests__/phase-9/accountDetector.detect.test.ts
- web/src/__tests__/phase-9/accountDetector.domain.test.ts
- web/src/__tests__/phase-9/bucket-d-regression.test.tsx
- web/src/__tests__/phase-9/domainMatch.test.ts
- web/src/__tests__/phase-9/newsletterDetector.classify.test.ts
- web/src/__tests__/phase-9/newsletterDetector.domain.test.ts
- web/src/__tests__/phase-9/purchaseDetector.currency.test.ts
- web/src/__tests__/phase-9/purchaseDetector.detect.test.ts
- web/src/__tests__/phase-9/purchaseDetector.domain.test.ts
- web/src/__tests__/phase-9/purchaseDetector.locale.test.ts
- web/src/__tests__/phase-9/snippet-render.test.tsx
- web/src/__tests__/phase-9/subscriptionDetector.billing.test.ts
- web/src/__tests__/phase-9/subscriptionDetector.detect.test.ts
- web/src/__tests__/phase-9/subscriptionDetector.domain.test.ts
- web/src/components/AttachmentGallery.tsx
- web/src/components/ContactModal.tsx
- web/src/components/EmailCard.tsx
- web/src/components/ThreadView.tsx
- web/src/db/database.ts
- web/src/pages/AccountsPage.tsx
- web/src/pages/AnalyticsPage.tsx
- web/src/pages/AttachmentsPage.tsx
- web/src/pages/ContactsPage.tsx
- web/src/pages/EmailDetailPage.tsx
- web/src/pages/EmailsPage.tsx
- web/src/pages/HomePage.tsx
- web/src/pages/NewslettersPage.tsx
- web/src/pages/SenderEmailsPage.tsx
- web/src/pages/SendersPage.tsx
- web/src/pages/SubscriptionsPage.tsx
- web/src/services/__tests__/library-smoke.test.ts
- web/src/services/accountDetector.ts
- web/src/services/backupService.ts
- web/src/services/domainMatch.ts
- web/src/services/gmailTakeoutParser.ts
- web/src/services/importPipeline.ts
- web/src/services/mboxParser.ts
- web/src/services/newsletterDetector.ts
- web/src/services/olmParser.ts
- web/src/services/purchaseDetector.ts
- web/src/services/searchParser.ts
- web/src/services/subscriptionDetector.ts
- web/src/services/threadingService.ts
- web/src/types/index.ts
- web/src/utils/emailUtils.ts
- web/src/workers/__tests__/toAppEmail.test.ts
- web/src/workers/parserWorker.ts
- web/src/workers/toAppEmail.ts
💤 Files with no reviewable changes (22)
- web/src/__tests__/phase-9/domainMatch.test.ts
- web/src/services/subscriptionDetector.ts
- web/src/__tests__/phase-9/subscriptionDetector.detect.test.ts
- web/src/services/accountDetector.ts
- web/src/__tests__/phase-9/accountDetector.detect.test.ts
- web/src/services/domainMatch.ts
- web/src/__tests__/phase-9/accountDetector.domain.test.ts
- web/src/__tests__/phase-9/purchaseDetector.locale.test.ts
- web/src/__tests__/phase-9/purchaseDetector.detect.test.ts
- web/src/__tests__/phase-9/purchaseDetector.domain.test.ts
- web/src/services/newsletterDetector.ts
- web/src/services/mboxParser.ts
- web/src/services/purchaseDetector.ts
- web/src/services/olmParser.ts
- web/src/__tests__/phase-9/purchaseDetector.currency.test.ts
- web/src/services/gmailTakeoutParser.ts
- web/src/__tests__/phase-9/newsletterDetector.domain.test.ts
- web/src/__tests__/phase-9/subscriptionDetector.billing.test.ts
- web/src/__tests__/phase-7/mboxParser.test.ts
- web/src/__tests__/phase-9/subscriptionDetector.domain.test.ts
- web/src/__tests__/phase-9/newsletterDetector.classify.test.ts
- web/src/utils/emailUtils.ts
```diff
 export interface DBEmail extends Omit<Email, 'date' | 'body' | 'htmlBody'> {
   id: number;
-  date: number;
+  date: number | null;
   body?: string;
   htmlBody?: string;
 }
```
`date` remains indexed while becoming nullable; see the `orderBy('date')` exclusion flagged above.
The `DBEmail.date: number | null` widening is correct for storage, but `date` is declared as an index across schema versions 1–6 (lines 78, 87, 97, 107, 120, 173) and in three compound indexes. Records with `date: null` won't be enumerable through these indexes. This is the root of the query-exclusion issue noted on `getEmails`/`getEmailHeaders`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@web/src/db/database.ts` around lines 18 - 23, DBEmail.date being nullable
breaks existing indexes (used by orderBy('date') in getEmails/getEmailHeaders)
because indexed fields cannot be null; fix by keeping the indexed field numeric
and mapping nulls to a sentinel before storage: change DBEmail.date to number
(timestamp) for storage/indexing or add a separate indexed field (e.g.,
indexedDate: number) and ensure the persistence layer (where emails are
created/updated) converts null -> 0 (or another sentinel) and reads 0 back to
null when returning Email objects; update any index definitions and references
(orderBy('date') or orderBy('indexedDate')) and the getEmails/getEmailHeaders
plumbing to use the indexed numeric field.
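The alternative raised in the outside-diff comment, loading rows without the `date` index and sorting in memory with `null` treated as oldest, can be sketched as a plain comparator. This is an illustrative example only: `sortByDateNullOldest` is a hypothetical helper, and how the rows are fetched (for instance via a full-table read) is left out.

```typescript
type SortDirection = "asc" | "desc";

// Hypothetical helper: sorts a copy of the rows by timestamp, treating a null
// date as older than any real date. Explicit comparisons avoid the NaN that
// subtracting two -Infinity keys would produce.
function sortByDateNullOldest<T extends { date: number | null }>(
  rows: readonly T[],
  direction: SortDirection = "desc",
): T[] {
  const sign = direction === "asc" ? 1 : -1;
  return [...rows].sort((a, b) => {
    const ka = a.date ?? Number.NEGATIVE_INFINITY;
    const kb = b.date ?? Number.NEGATIVE_INFINITY;
    if (ka === kb) return 0;
    return ka < kb ? -sign : sign;
  });
}
```

In the default newest-first order, undated rows end up at the bottom of the list rather than disappearing. The trade-off is that the whole result set is held in memory, which matters for paginated folder views over large archives.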
```diff
+await parser.parseStreaming(mboxFile, undefined, async (batch) => {
+  if (ctx.isCancelled) return;
+  for (const libEmail of batch) {
+    const e = toAppEmail(libEmail);
+    const key = e.threadId || `${e.subject}|${e.sender}|${e.date ? e.date.getTime() : 'nodate'}`;
     if (seenEmailKeys.has(key)) continue;
     seenEmailKeys.add(key);
-    currentBatch.push(email);
+    currentBatch.push({ ...e, folderId });
     if (currentBatch.length >= BATCH_SIZE) {
-      sendEmailBatch(currentBatch, batchNumber, false);
+      sendEmailBatch(currentBatch, batchNumber++, false);
       ctx.totalEmailsParsed += currentBatch.length;
-      batchNumber++;
       currentBatch = [];
-      await new Promise(resolve => setTimeout(resolve, 0));
+      await new Promise((r) => setTimeout(r, 0));
     }
   }
-}
+});
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect how the library assigns threadId for MBOX parsing (thread-scoped vs message-scoped),
# and whether a message-id field is exposed on the parsed Email.
pkg=$(curl -s https://registry.npmjs.org/@technical-1%2Femail-archive-parser | jq -r '."dist-tags".latest')
url=$(curl -s https://registry.npmjs.org/@technical-1%2Femail-archive-parser | jq -r --arg v "$pkg" '.versions[$v].dist.tarball')
tmp=$(mktemp -d); curl -sL "$url" -o "$tmp/p.tgz"; tar -xzf "$tmp/p.tgz" -C "$tmp"
rg -nP -C3 'threadId|messageId|message-id|X-GM-THRID' "$tmp/package" || echo "no matches"
Repository: Technical-1/EmailAnalyzer
Length of output: 8835
🏁 Script executed:
#!/bin/bash
# Inspect parserWorker.ts around the reported dedup logic to identify which parsing path it belongs to.
file="web/src/workers/parserWorker.ts"
if [ ! -f "$file" ]; then
echo "Missing $file"
exit 1
fi
sed -n '200,280p' "$file" | nl -ba | sed -n '1,120p'
Repository: Technical-1/EmailAnalyzer
Length of output: 110
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="web/src/workers/parserWorker.ts"
sed -n '1,120p' "$file" | nl -ba | sed -n '1,120p'
echo "----"
sed -n '120,220p' "$file" | nl -ba | sed -n '1,120p'
echo "----"
sed -n '220,270p' "$file" | nl -ba | sed -n '1,80p'
echo "----"
sed -n '270,360p' "$file" | nl -ba | sed -n '1,120p'
Repository: Technical-1/EmailAnalyzer
Length of output: 112
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="web/src/workers/parserWorker.ts"
wc -l "$file"
Repository: Technical-1/EmailAnalyzer
Length of output: 105
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="web/src/workers/parserWorker.ts"
sed -n '210,260p' "$file" | nl -ba | sed -n '1,120p'
Repository: Technical-1/EmailAnalyzer
Length of output: 112
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="web/src/workers/parserWorker.ts"
rg -n "seenEmailKeys|threadId|messageId|parseStreaming" "$file"
Repository: Technical-1/EmailAnalyzer
Length of output: 589
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="web/src/workers/parserWorker.ts"
sed -n '180,260p' "$file" | cat -n
Repository: Technical-1/EmailAnalyzer
Length of output: 4224
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="web/src/workers/parserWorker.ts"
sed -n '210,260p' "$file" | nl -ba | sed -n '1,120p'
Repository: Technical-1/EmailAnalyzer
Length of output: 112
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "function toAppEmail|toAppEmail\\s*\\(" web/src/workers/parserWorker.ts
rg -n "type Email\\b|interface Email\\b|export type Email\\b|from '.*Email'" web/src/workers/parserWorker.ts web/src -S
Repository: Technical-1/EmailAnalyzer
Length of output: 726
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="web/src/workers/parserWorker.ts"
# Show the toAppEmail definition (and nearby mappings) without dumping the whole file
start=$(rg -n "function toAppEmail\\b" "$file" | head -n1 | cut -d: -f1 || true)
if [ -n "${start}" ]; then
from=$((start-30)); to=$((start+80))
sed -n "${from},${to}p" "$file" | nl -ba | head -n 200
else
# fallback: show a reasonable region around where toAppEmail is referenced
ref=$(rg -n "toAppEmail\\(" "$file" | head -n1 | cut -d: -f1)
from=$((ref-60)); to=$((ref+60))
sed -n "${from},${to}p" "$file" | nl -ba | head -n 200
fi
Repository: Technical-1/EmailAnalyzer
Length of output: 113
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the app Email model/type and show messageId/originalId fields (if present)
fd -t f "Email.ts|email.*\\.ts|types.*Email.*\\.ts" web/src -a 2>/dev/null | head -n 50
Repository: Technical-1/EmailAnalyzer
Length of output: 459
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for messageId/originalId in the app codebase
rg -n "messageId\\b|originalId\\b" web/src
Repository: Technical-1/EmailAnalyzer
Length of output: 51
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file1="web/src/workers/toAppEmail.ts"
file2="web/src/types/index.ts"
echo "---- $file1 ----"
awk 'NR>=1 && NR<=220 {print NR ":" $0}' "$file1" | sed -n '1,220p'
echo "---- $file2 ----"
awk 'NR>=1 && NR<=220 {print NR ":" $0}' "$file2" | sed -n '1,220p'
Repository: Technical-1/EmailAnalyzer
Length of output: 7029
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search specifically within app for messageId/originalId property usage in email mapping
rg -n "messageId|originalId" web/src/workers/toAppEmail.ts web/src/types web/src -S
Repository: Technical-1/EmailAnalyzer
Length of output: 51
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Re-show just the parseGmailTakeoutFile dedup key area with line numbers via awk (no nl)
file="web/src/workers/parserWorker.ts"
awk 'NR>=1 && NR<=1 {print}' "$file" >/dev/null
awk 'NR>=200 && NR<=260 {print NR ":" $0}' "$file" | sed -n '200,260p'
Repository: Technical-1/EmailAnalyzer
Length of output: 51
Fix Gmail Takeout dedup to avoid conversation-scoped `threadId` collisions.
In `parseGmailTakeoutFile`, the dedup key is `e.threadId || ...`, so when `threadId` comes from Gmail's `x-gm-thrid` (conversation-level), only the first email per thread is kept and subsequent distinct messages are dropped. Build the dedup key from `libEmail.messageId` (the Message-ID header) instead of `threadId`, falling back to the composite key when the Message-ID is missing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@web/src/workers/parserWorker.ts` around lines 229 - 244, The dedup key
currently uses e.threadId which collapses Gmail conversation-level threads and
drops distinct messages; update the key generation in the parser.parseStreaming
callback to prefer the message-id from the libEmail (e.g., use
libEmail.messageId or e.messageId) instead of threadId, falling back to the
existing `${e.subject}|${e.sender}|...` composite when message-id is missing;
modify the line that assigns key (and any related seenEmailKeys logic) to use
messageId-first deduplication so each unique message is preserved.
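A Message-ID-first key with the existing composite as fallback might look like the sketch below. It assumes the parsed email exposes an optional `messageId` string, which this review suggests but does not show; `DedupFields`, `dedupKey`, and `dedupe` are hypothetical names used only for illustration.

```typescript
// Assumed minimal shape; the real LibEmail/Email types have more fields.
interface DedupFields {
  messageId?: string;
  subject: string;
  sender: string;
  date: Date | null;
}

// Prefer the per-message Message-ID; fall back to the composite key when absent.
// Prefixes keep the two key spaces from colliding with each other.
function dedupKey(e: DedupFields): string {
  const id = e.messageId?.trim();
  if (id) return `mid:${id}`;
  return `fallback:${e.subject}|${e.sender}|${e.date ? e.date.getTime() : "nodate"}`;
}

// Keeps the first occurrence of each key, preserving input order.
function dedupe<T extends DedupFields>(emails: readonly T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const email of emails) {
    const key = dedupKey(email);
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(email);
  }
  return unique;
}
```

Two replies in the same Gmail conversation share a thread ID but carry different Message-IDs, so both survive here, while a message that appears twice in the archive is still collapsed to one.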
```typescript
  date: e.date, // Date | null
  body: e.body,
  htmlBody: e.htmlBody,
  attachments: [],
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm whether attachments are handled anywhere else in the worker/import path.
rg -nP --type=ts -C3 '\battachments\b' -g 'web/src/workers/**' -g 'web/src/services/**'
# Look for any consumer that maps library attachments into app Attachment.
ast-grep --pattern 'attachments: $_'
Repository: Technical-1/EmailAnalyzer
Length of output: 3992
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the parser worker implementation and where attachments might be produced.
fd -a "parserWorker.ts" web/src/workers -t f -x cat -n | head -n 260
# Inspect the mapper in toAppEmail.ts around the attachments field.
sed -n '1,120p' web/src/workers/toAppEmail.ts | cat -n
# Find any usage of `e.attachments` or `attachments` produced by the parsing library types.
rg -n --type=ts --glob 'web/src/workers/**' --glob 'web/src/services/**' '\be\.attachments\b|\battachments\s*:\s*e\.attachments\b|\battachment(s)?\b'
# Check app/domain type definition for Attachment and where Email.attachments is expected.
fd -a "*Attachment*" web/src -t f -x rg -n --type=ts -S 'export (type|interface|class) Attachment|attachments:\s*Attachment\[\]' || true
fd -a "*Email*" web/src -t f -x rg -n --type=ts -S 'attachments:\s*Attachment\[\]|attachments\??:' || true
# Look for any code that populates attachments after mapping (e.g., via attachmentService or later enrichment).
rg -n --type=ts 'attachmentService|downloadMultiple\(|createAttachment|extractAttachment|enrich.*attachment|populate.*attachments' web/src/services web/src/workers
Repository: Technical-1/EmailAnalyzer
Length of output: 16138
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect app Email/Attachment types.
fd -t f "types.ts" web/src -x sed -n '1,260p'
fd -a "*types*" web/src -t f -x rg -n --type=ts 'export (type|interface|class) Email|export (type|interface|class) Attachment|attachments:\s*Attachment\[\]'
# Find any later enrichment/persistence steps that touch `email.attachments`.
rg -n --type=ts '\.attachments\b|attachments:' web/src --glob '!**/node_modules/**' | head -n 200
# Find how attachments are persisted/created from downloaded data.
rg -n --type=ts 'downloadMultiple\(|downloadAttachment\(|createBlob\(|mimeType|filename' web/src/services web/src/workers | head -n 200
Repository: Technical-1/EmailAnalyzer
Length of output: 2790
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# List likely types directory/files.
ls -la web/src || true
ls -la web/src/types || true
# Find and print the app Email/Attachment type definitions.
fd -t f -g 'types*' web/src | head -n 50
fd -t f -g '*attachment*' web/src/types web/src | head -n 50
fd -t f -g '*email*' web/src/types web/src | head -n 50
# Search within types for Attachment/Email definitions.
fd -t f -g '*ts' web/src/types 2>/dev/null | xargs -r rg -n --type=ts --fixed-string 'interface Attachment' --fixed-string 'type Attachment' --fixed-string 'class Attachment' --fixed-string 'attachments:'
fd -t f -g '*ts' web/src/types 2>/dev/null | xargs -r rg -n --type=ts --fixed-string 'interface Email' --fixed-string 'type Email' --fixed-string 'class Email' --fixed-string 'attachments:'
# Also search outside types in case Attachment/Email types live elsewhere.
fd -t f -g '*ts' web/src | xargs -r rg -n --type=ts --fixed-string 'interface Attachment' --fixed-string 'type Attachment' --fixed-string 'attachments: Attachment' --fixed-string 'attachments:'
fd -t f -g '*ts' web/src | xargs -r rg -n --type=ts --fixed-string 'interface Email' --fixed-string 'type Email' --fixed-string 'attachments:'
Repository: Technical-1/EmailAnalyzer
Length of output: 2518
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show what's in web/src/types and list candidate type files.
ls -la web/src/types
fd -t f -g '*.ts' web/src/types
# Search within app types for Attachment and Email attachments shape.
fd -t f -g '*.ts' web/src/types | xargs rg -n --type=ts --fixed-strings 'Attachment' --fixed-strings 'attachments' --fixed-strings 'interface Email' --fixed-strings 'type Email' --fixed-strings 'interface Attachment' --fixed-strings 'type Attachment'
# Also search within web/src for Email/Attachment types if they are re-exported elsewhere.
fd -t f -g '*.ts' web/src | xargs rg -n --type=ts --fixed-strings 'attachments:' --fixed-strings 'attachments.' --fixed-strings 'Attachment' | head -n 200
# Open types/index.ts for re-exports.
sed -n '1,240p' web/src/types/index.ts | cat -n
Repository: Technical-1/EmailAnalyzer
Length of output: 725
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Open the relevant sections of the app type definitions.
sed -n '1,220p' web/src/types/index.ts | cat -n
# Narrow views around Attachment and Email definitions.
rg -n --fixed-strings 'export interface Attachment' web/src/types/index.ts
rg -n --fixed-strings 'attachments:' web/src/types/index.ts
rg -n --fixed-strings 'export interface Email' web/src/types/index.ts || true
rg -n --fixed-strings 'interface Email' web/src/types/index.ts || true
Repository: Technical-1/EmailAnalyzer
Length of output: 7088
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find where attachmentData is read/written.
rg -n --type=ts --fixed-strings 'attachmentData' web/src | head -n 200
# Find where Attachment objects (id/filename/mimeType/size) are created or persisted.
rg -n --type=ts '\bAttachment\b|attachment(s)?\.' web/src | head -n 200
# Find DB layer / store usage for attachments.
fd -t f -g '*.ts' web/src/db web/src/store web/src/services web/src/pages | xargs -r rg -n --type=ts 'attachments|attachmentData' || true
# Look for any import pipeline code that might populate attachments after worker mapping.
rg -n --type=ts 'EMAIL_BATCH|parse.*(worker|import)|toAppEmail\(' web/src | head -n 200
Repository: Technical-1/EmailAnalyzer
Length of output: 14916
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show where EMAIL_BATCH from worker is persisted to DB.
rg -n --type=ts "'EMAIL_BATCH'|EMAIL_BATCH" web/src/pages/HomePage.tsx web/src -S | head -n 200
# Inspect the HomePage handling that writes emails to DB.
sed -n '1,260p' web/src/pages/HomePage.tsx | cat -n
# Inspect database persistence logic for attachments + attachmentData.
sed -n '1,260p' web/src/db/database.ts | cat -n
sed -n '260,520p' web/src/db/database.ts | cat -n
# Inspect the DBEmail -> slimAttachments / attachmentData extraction.
rg -n --type=ts 'slimAttachments|attachmentData|attachments:' web/src/db/database.ts
# Re-open toAppEmail.ts snippet for context.
sed -n '1,120p' web/src/workers/toAppEmail.ts | cat -n
Repository: Technical-1/EmailAnalyzer
Length of output: 37832
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect import pipeline entrypoints used by HomePage when receiving EMAIL_BATCH.
sed -n '1,260p' web/src/services/importPipeline.ts | cat -n
# Search within importPipeline for attachments/attachmentData.
rg -n --type=ts 'attachments|attachmentData|Attachment' web/src/services/importPipeline.ts web/src/services | head -n 200
# Inspect any helper used by importPipeline that might rebuild attachments.
rg -n --type=ts 'processEmailBatch|createImportCounts|insertEmail' web/src/services/importPipeline.ts web/src/services | head -n 200
Repository: Technical-1/EmailAnalyzer
Length of output: 13578
Fix imported email attachments being dropped by the mapper
web/src/workers/toAppEmail.ts hard-codes attachments: [], so any attachments produced by the archive parser are never carried into the app Email row.
The persistence layer (web/src/db/database.ts insertEmail / bulkInsertEmails) splits email.attachments into:
- attachment metadata stored in emails.attachments, and
- base64 payload stored in emailBodies.attachmentData.
With attachments: [], both the attachment count and attachment previews/downloads will be empty for newly imported emails. Map LibEmail attachments into the app Attachment shape (and include data when available), or document where attachments are intentionally reconstructed later.
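One possible shape for that mapping is sketched below. The `LibAttachment` and `AppAttachment` interfaces, the `mapAttachments` helper, and the id scheme are all assumptions made for illustration; the real field names should be taken from the library's exported types and from web/src/types/index.ts.

```typescript
// Hypothetical shapes: neither interface is copied from the library or the app.
interface LibAttachment {
  filename?: string;
  contentType?: string;
  size?: number;
  data?: string; // base64 payload, when the parser extracted one
}

interface AppAttachment {
  id: string;
  filename: string;
  mimeType: string;
  size: number;
  data?: string;
}

// Maps parser attachments to the app shape instead of returning [].
// The id format and the fallback values are illustrative assumptions.
function mapAttachments(
  emailId: string,
  attachments: LibAttachment[] | undefined,
): AppAttachment[] {
  return (attachments ?? []).map((att, index) => ({
    id: `${emailId}-att-${index}`,
    filename: att.filename ?? `attachment-${index + 1}`,
    mimeType: att.contentType ?? 'application/octet-stream',
    size: att.size ?? 0,
    // Only carry the payload when the parser provided one.
    ...(att.data !== undefined ? { data: att.data } : {}),
  }));
}
```

Leaving `data` off when the parser supplied no payload means only real payloads would reach emailBodies.attachmentData, assuming the persistence layer splits on that field.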
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@web/src/workers/toAppEmail.ts` at line 19, The mapper in toAppEmail.ts
currently hard-codes attachments: [], dropping parsed attachments; update the
mapper that converts LibEmail -> Email (the toAppEmail function) to map
LibEmail.attachments into the app Attachment shape instead of an empty array,
preserving fields like filename, contentType, size and including the attachment
base64 payload (e.g., data) when present so the persistence layer (insertEmail /
bulkInsertEmails which split email.attachments and emailBodies.attachmentData)
can store metadata and payload correctly; ensure the mapped property names match
the app Attachment/interface expected by the database layer.
Summary
Replaces EmailAnalyzer's duplicated, drift-prone parser/detector logic with the published library @technical-1/email-archive-parser@^3.0.0, ending the fork drift. The Web Worker keeps its streaming/messaging shell and IndexedDB persistence; parsing + detection now come from the library.
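The split between the worker shell and the library can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in parserWorker.ts: the `AsyncIterable` input stands in for whatever the library's streaming parser actually yields, and `LibEmail`, `AppEmail`, `WorkerMessage`, the `DONE` message, and `streamInBatches` are hypothetical names. Only the `EMAIL_BATCH` message type and the `toAppEmail` adapter name come from this PR.

```typescript
// Hypothetical shapes for illustration; not the library's or the app's real types.
interface LibEmail { subject: string; date: Date | null }
interface AppEmail { subject: string; date: Date | null }

type WorkerMessage =
  | { type: 'EMAIL_BATCH'; emails: AppEmail[] }
  | { type: 'DONE'; total: number }; // 'DONE' is an assumed message name

// Stand-in for the adapter; the real one maps many more fields.
const toAppEmail = (e: LibEmail): AppEmail => ({ subject: e.subject, date: e.date });

// The shell owns batching and messaging; parsing happens upstream in `parsed`.
async function streamInBatches(
  parsed: AsyncIterable<LibEmail>,
  post: (msg: WorkerMessage) => void,
  batchSize = 100,
): Promise<number> {
  let batch: AppEmail[] = [];
  let total = 0;
  for await (const email of parsed) {
    batch.push(toAppEmail(email));
    total++;
    if (batch.length >= batchSize) {
      post({ type: 'EMAIL_BATCH', emails: batch });
      batch = [];
    }
  }
  // Flush the final partial batch, then signal completion.
  if (batch.length > 0) post({ type: 'EMAIL_BATCH', emails: batch });
  post({ type: 'DONE', total });
  return total;
}
```

In a real worker, `post` would wrap `self.postMessage`; passing it in as a parameter lets the batching logic be exercised outside a worker.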
What changed:
- `parserWorker.ts`: MBOX, OLM, and Gmail-Takeout paths now delegate to the library's `MBOXParser`/`OLMParser` via a new `toAppEmail` adapter. All inline parsing/MIME helpers removed (~−950 lines net across the migration).
- `importPipeline.ts`: uses the library's four detectors; small `createXFromEmail` factories inlined (with `lastActivityDate` parity preserved).
- `.zip`→`.mbox` walk + dedup + folder mapping stays app-side (packaging logic) and delegates per-`.mbox` to the library.
- `Date | null` semantics app-wide (no more fabricated `now()` for missing headers); undated emails are guarded everywhere and excluded from time-based aggregations.
Inherited for free from library v3: correct non-US locale money parsing (`€1.234,56`), honest subscription `monthlyAmount`/`frequency`, byte-accurate `Email.size`, tightened newsletter heuristics, and the nullable-date correctness.
Test Plan
- `npx tsc -b` clean
- `npm run build` (tsc + vite) succeeds; library bundles into the `parserWorker` chunk
- `npm run test:run`: 257 pass, 0 fail (count down from 329: 72 duplicated detector/parser unit tests removed)
- `npm run dev`, upload (a) a small `.mbox`, (b) a `.olm`, (c) a Gmail Takeout `.zip`; confirm progress advances, emails/contacts/subscriptions/newsletters/purchases populate IndexedDB, an email with a missing `Date:` header shows "Unknown date" (not 1970/today), and a `€1.234,56` subscription stores 1234.56.
Notes / follow-ups
- `parseStreaming` has no cancellation hook, so "cancel" stops importing but the library finishes parsing the current file in the background (documented in-code; candidate for a future `AbortSignal` lib enhancement).
- Date); rare edge, left minimal.
Summary by CodeRabbit
Release Notes
Bug Fixes
Chores