Account reset, Monica import overhaul, duplicate detection, worker clustering#23
Account reset, Monica import overhaul, duplicate detection, worker clustering#23bashar-qassis wants to merge 58 commits into
Conversation
…ger after imports The duplicate detection worker had several bugs preventing it from catching obvious duplicates: - Scoring formula (name*0.4 + email*0.35 + phone*0.25 with threshold 0.4) meant contacts sharing the same email but with different names scored 0.35, below the threshold — silently missed. - Email comparison was case-sensitive. - Only one side of email/phone field pairs had its type verified. - Address data was completely ignored. - No import worker triggered duplicate detection after completion. Fixes: - Replace additive scoring with max-signal + bonus approach where each signal independently qualifies (email=0.85, phone=0.75, address=0.60, name=similarity) - Add case-insensitive email matching via LOWER() fragments - Filter both cf1 and cf2 contact_field_types in email/phone queries - Use LIKE 'mailto%' pattern to handle protocol colon inconsistency - Add address matching on normalized line1 + postal_code - Enqueue DuplicateDetectionWorker after successful completion in all three import workers (MonicaApiCrawlWorker, ImportSourceWorker, ImportWorker) - Add comprehensive test suite (20 tests) for the detection worker
list_candidates now takes limit/offset opts (default 20 per page). The LiveView loads one page at a time with a "Load more" button. Dismiss removes the candidate from the current list without reloading.
The /contacts/duplicates route uses ContactLive.Index, not the standalone Duplicates LiveView. Added limit/offset pagination with Load more button and optimistic dismiss (no full re-query) to match the standalone page.
Photos with the same content_hash on both contacts caused a unique constraint violation during merge. Now deletes duplicate photos from the non-survivor before transferring the rest, matching the pattern used for contact_tags and activity_contacts. Also collapsed the merge flow from 4 steps to 3 by combining the preview and confirm steps into a single "Review & merge" step. From the duplicates page (contact preselected), merge is now 2 clicks instead of 3.
Replace the custom hand-rolled Oban dashboard (412 LOC LiveView) with the official `oban_web` package, now open-source and free as of Oban 2.20. Mount at /admin/oban behind a new :require_admin on_mount hook gated by Kith.Policy.can?(user, :manage, :oban). Hide the "Jobs" nav link from non-admin users. Drops the dead photo_sync queue from Oban config — its worker (PhotoBatchSyncWorker) was removed in commit e474853 when Monica imports moved to API crawling.
Make every published host port configurable through env-var substitution so contributors can resolve local port conflicts without editing compose files. Defaults preserve current behavior with no .env present. Dev (docker-compose.dev.yml): MAILPIT_SMTP_PORT, MAILPIT_WEB_PORT, APP_PORT join the existing DB_PORT and MINIO_*_PORT vars. Prod (docker-compose.prod.yml): HTTP_PORT, HTTPS_PORT for Caddy. Internal container ports remain hardcoded so services can address each other on standard ports over the Docker network.
Photo import previously ran inline as Phase 4 of MonicaApi.crawl/5, which buried it inside the contact-crawl job and left the import_history "Photo Sync" UI panel stuck on "in progress" forever — sync_summary was never written because the refactor that deleted PhotoBatchSyncWorker (commit e474853) removed the only writers. Move the photo crawl into its own MonicaPhotoSyncWorker on the photo_sync queue, enqueued by MonicaApiCrawlWorker when api_options["photos"] is true. The worker passes credentials through job args (matching the MonicaDocumentImportWorker pattern) so the main worker can wipe the API key from the DB immediately after contact import completes. Drop the unauthenticated link fallback from the decoder — Monica's /api/photos endpoint always returns dataUrl, so the previous Req.get(link) path was likely 401'ing on protected storage URLs. If a photo lacks dataUrl, it's now surfaced as a failed entry in sync_summary with a no_data_url reason, instead of being silently dropped. The worker writes sync_summary after each page so the UI shows live progress — total/synced/failed/not_found counts plus a per-photo table — and logs each page boundary at :info under the [MonicaPhotoSync] prefix. Per-photo decisions log at :debug. Tests cover: dataUrl import + avatar set, not_found on missing contact, failed on missing dataUrl, dedup by content_hash, and mid-flight sync_summary writes between pages.
The :photo_sync queue was removed in commit e474853 along with PhotoBatchSyncWorker, but the Oban queues config in config.exs was updated to drop it. Jobs queued to :photo_sync would sit forever with no consumers. Switch to :imports to match MonicaApiCrawlWorker and MonicaDocumentImportWorker.
- CLAUDE.md: remove photo_sync + api_supplement from the Oban queues list (those queues were removed in commit e474853); update queue count from 9 to 7 to match config/config.exs. - MonicaApiCrawlWorker moduledoc: clarify that photo import runs as a separate MonicaPhotoSyncWorker job, not inline. - Delete docs/superpowers/specs/2026-03-21-extensible-import-system-design.md and docs/superpowers/plans/2026-03-22-extensible-import-system.md. Both describe the pre-refactor design (PhotoSyncWorker, ApiSupplementWorker, file-based Monica import, per-photo job model) that no longer exists. CLAUDE.md plus the live moduledocs are now the source of truth.
Spec captures the photo-sync-after-reset bug, its root cause (orphaned import_records pointing at deleted contacts), and the decomposition of AccountResetWorker into a thin orchestrator plus one Cleanup module per data domain. Covers module layout, order-of-operations, account-scoped Oban job cancellation, error handling, and the test plan including a mandatory cross-account isolation check on every Cleanup module.
…efinement Plan decomposes the work into 13 TDD tasks, each producing one cleanup module (or the worker refactor) per commit. Spec refinement: split the proposed TagsAndActivitiesCleanup into Contacts.Cleanup (handles tags as part of the contacts axis) and Activities.Cleanup (its own context), aligning with the SOLID-elixir SRP guidance flagged in the brainstorming pass.
reminder_rules is account-scoped (not reminder-scoped) and has no FK relationship to reminders, so it's not CASCADE-cleared. Rules are 3 seeded-per-account pre-notification defaults treated as reference data.
…n explosion, E.164 normalization
Three independent bugs combined to make 1000 Monica imports surface as
~6000 entries in the duplicates tab even when "Auto-merge definite
duplicates" was checked:
Bug C (primary cause of "auto-merge did nothing"):
MonicaApiCrawlWorker.build_opts/1 only forwarded "extra_notes" — every
other wizard option, including "auto_merge_duplicates", was silently
dropped before reaching MonicaApi.crawl/5. Auto-merge was structurally
unreachable from the UI; only unit tests calling crawl/5 directly with
their own opts had ever exercised it. Now forwards api_options
unchanged, preserving the legacy extra_notes default-on semantic.
Bug A (primary cause of "6000 entries"):
DuplicateDetectionWorker.find_phone_matches/1 joined contact_fields on
the digit-normalized value but pre-filtered the *raw* value, so any
phone whose digits-stripped form was empty ("+", "()", "-", "N/A")
passed the filter, normalized to "", and matched every other zero-digit
phone — C(N,2) false candidates. The email side had a smaller analog:
no TRIM, no cf2 filter. Now strict equality on canonical values for
phone, TRIM(LOWER(...)) plus per-side non-empty filters for email.
Bug B (auto-merge predicate too narrow):
Within a name group the predicate compared values raw — Monica's
CardDAV-sync duplicates (monicahq/monica#6175) escape it on trivial
whitespace/casing artifacts. Name key now trims + collapses whitespace;
predicate accepts shared email OR phone OR address with normalized
comparators; addresses preloaded.
Bug D (heuristic phone storage):
PhoneFormatter.normalize/1 was heuristic — "10 digits stays as-is",
"11 digits starting with 1 becomes +1...", "00" IDD prefix unhandled —
so the same number written two ways was stored as two different
values. Replaced with ex_phone_number (libphonenumber port) producing
E.164. normalize/2 takes a default_region for bare numbers; bare input
without a region round-trips unchanged. PhoneRenormalizeWorker
backfills existing rows once.
UX:
auto_merge_duplicates wizard default flipped to true. New region
picker pre-populated from account.locale, listing every
libphonenumber-supported region with localized country names via
ex_cldr_territories ∩ ExPhoneNumber.Metadata.get_supported_regions/0.
Detection worker phone match simplified to plain equality now that
values are canonical at write-time, removing the per-row regex and the
filter mismatch that caused the cartesian explosion.
Tests:
- Boundary test in monica_api_crawl_worker_test that round-trips the
wizard flag through build_opts (would have caught Bug C directly).
- Cartesian-explosion regression in duplicate_detection_worker_test.
- Email TRIM and phone-normalization-on-import in respective files.
- phone_formatter_test rewritten for explicit-region semantics.
- New phone_renormalize_worker_test (5 tests).
- Production libphonenumber metadata in :test (was test-only metadata
that diverged from real validation on "555" NANP prefixes).
Dialyzer:
Added .dialyzer_ignore.exs suppressing two :contract_supertype
warnings against Kith.Cldr.Territory — these are emitted from
generated code in ex_cldr_territories, not actionable from our side.
1122 tests pass, mix quality clean.
…iner clustering DNSCluster connects via `Node.connect(:"basename@<ip>")` — it uses the raw IP as the host part of the node name. That requires each peer's actual node name to be `name@<ip>`, which conflicts with Phoenix release's default of `name@<hostname>`. The user's Portainer deployment exposes containers under stable service names (`app`, `worker`) that resolve via Docker DNS — but the BEAM nodes are named after the container ID (`kith@64c98536e88c`), so `Node.connect(:"kith@app")` fails the handshake. Switch to libcluster's Epmd strategy which connects by explicit node name (no IP rewriting). Each container is configured via env to: - `RELEASE_NODE=kith@app` (or `kith@worker`) - `KITH_CLUSTER_HOSTS=kith@app,kith@worker` - `RELEASE_COOKIE=<shared>` - `RELEASE_DISTRIBUTION=name` libcluster runs Cluster.Strategy.Epmd which polls `Node.connect/1` for each host periodically; once one direction connects, the bidirectional distribution is established and PubSub spans both. Dev and test are unaffected: `KITH_CLUSTER_HOSTS` is unset, so the libcluster topology list is empty and Cluster.Supervisor no-ops.
Spec for the follow-up to PR #23's Monica normalization work — replaces the hand-rolled NANP-only renderer in PhoneFormatter.format/2 with ExPhoneNumber library calls so account.phone_format honors non-US phones.
Bite-sized TDD plan to replace the NANP-only renderer in PhoneFormatter.format/2 with ExPhoneNumber library calls. Includes IEx probe step to capture exact library output strings, replacement of bug-pinning tests, non-NANP coverage matrix, and a Playwright extension to guard the e2e display path.
PhoneFormatter.format/2 honored account.phone_format only for +1 numbers because format_national/format_international were hand-rolled binary patterns matching the NANP shape. Every non-NANP phone fell through to the unchanged- pass-through clause, silently ignoring the user's display preference. Replace with ExPhoneNumber.format/2 (libphonenumber port already declared as :ex_phone_number in mix.exs). The phone's own country code drives the national rendering; unparseable input passes through unchanged, matching the existing normalize/2 contract. Tests at lines 153-159 of phone_formatter_test.exs encoded the bug as expected behavior; replaced with correct GB rendering plus FR/DE/JP/SA coverage and legacy/garbage/empty-string tests.
Moduledoc now reflects the ExPhoneNumber-driven implementation and warns contributors that any new phone-display UI must call format/2 with the account's phone_format setting. Also rename render/2's parameter from library_format to phone_number_format with a clarifying inline comment, addressing a code-review nit about the overload with the account's phone_format string field.
The existing Playwright spec only exercises +1 numbers, which is precisely the country where the NANP-only renderer happened to work. Adding GB national + international cases guards the e2e display path against a regression where someone reintroduces locale-specific hand-rolled rendering.
|
Follow-up: phone display format fix (5 new commits) While reviewing the Monica import normalization work, I noticed the account-level The fix replaces the hand-rolled code with Commits added:
Diff stats (code-only): +147 / −25 across Verification: Bug-pinning tests replaced: the existing Spec: |
Spec to address silent contact drop in Monica v4 /api/contacts listing. v4 LIMIT/OFFSET pagination over ORDER BY created_at loses a deterministic ~1.7% of contacts at tie-group boundaries with no visible error. Design adds a Phase 1.4 coverage check between the listing crawl and auto-merge: re-fetch meta.total, compare against import_records, backfill the gap via direct GET /api/contacts/:id for IDs in [min, max] not already seen, applying Monica's same is_active and is_partial filters client-side to avoid importing rows the listing deliberately hides. Partials ARE imported to anchor relationship targets.
Bite-sized TDD plan in 7 tasks: thread ref_data through crawl_all_contacts, add fetch_single_contact + accept_backfill_response helpers, implement coverage_check_and_backfill/3 core algorithm, wire into crawl/5 between Phase 1 and 1.5, extend summary, exhaustive test matrix (10 scenarios covering happy path, mixed 200+404, inactive skip, partial import, no-op-no-gap, unresolved-gap warning, early termination, auto-merge interaction, safety margin, hard iteration cap).
Phase 1.4 (coverage backfill, next commits) needs ref_data so it can call safe_import_api_contact/5 on directly-fetched contacts. crawl_all_contacts/1 was already building ref_data per page but discarding it on return; this commit threads it through to the orchestrator. No behavior change.
…fill_response)
fetch_single_contact/2 wraps api_get/3 to distinguish 404 (Monica
soft-delete, expected) from other errors. Returns
{:ok, contact} | :not_found | {:error, reason}.
accept_backfill_response/1 mirrors Monica's listing filter
(->real()->active() = is_active=1 AND is_partial=0) on direct-GET
responses so the backfill won't import contacts Monica hides from the
listing. Partials are still accepted because they anchor relationship
targets.
Both helpers are unused in this commit; @compile {:nowarn_unused_function, ...}
directive suppresses warnings until coverage_check_and_backfill/3
wires them in the next commit.
Implements Phase 1.4 logic: re-fetch meta.total, compare against import_records, iterate [min_id..max_id+50] for unseen IDs, dispatch each via fetch_single_contact + accept_backfill_response, early-terminate when gap closes, cap iterations at (max_id-min_id)+100 to guarantee termination. Stats accumulator covers gap_detected, range_scanned, imported_full, imported_partial, skipped_deleted, skipped_inactive, errors, unresolved_gap. Removes the @compile {:nowarn_unused_function, ...} directive added in the previous commit (it didn't work in Elixir 1.18.4 anyway, and the helpers are now used by coverage_check_and_backfill/3). Wiring into crawl/5 lands in the next commit.
Phase 1.4 now runs between the listing crawl (Phase 1) and auto-merge
(Phase 1.5). Backfilled contacts participate in auto-merge and Phase 2
cross-reference resolution as first-class import-record holders.
Import summary now carries coverage_backfill.{gap_detected, range_scanned,
imported_full, imported_partial, skipped_deleted, skipped_inactive,
errors, unresolved_gap}. The unresolved_gap field is the self-reporting
safety net: if it ends up > 0, the operator knows the listing dropped
contacts the backfill couldn't recover.
The happy-path test exercises the full pipeline end-to-end: listing
returns 4 of 5 contacts, meta.total reports 5, backfill issues one
direct GET for the missing ID 4, returns 200, and the contact ends up
in import_records.
Also fix three existing tests to account for the new meta-total call made
during coverage backfill phase.
Adds: mixed 200+404 closure, inactive skip, partial import, no-op when no gap, unresolved-gap log+summary, early termination, auto-merge interaction, safety margin (past max_seen), and hard iteration cap. Together with the happy path from the previous commit, this covers every branch listed in the spec's test matrix.
Credo --strict flagged two 'nested too deep' findings in scan_gap_range/8 and fetch_and_dispatch_backfill/4 (depth 3, max 2). Surgical fix: extract exactly two helpers, step_backfill/4 (the reduce_while body) and dispatch_accepted_contact/4 (the verdict dispatch). Each function now has nesting depth <= 2. No behavior change.
|
Follow-up: Monica import coverage backfill (8 new commits) Investigation triggered by a missing-contact report: Monica id Verified via Monica's open-source code: the listing endpoint applies FixNew Phase 1.4 in
Commits
Diff stats: +2380 / −13 across Test plan
Spec & Plan
|
Scope
76 files · +10,962 / −5,070 · 45 commits. Five themes:
1. Duplicate detection
line1 + postal_code(case-insensitive, trimmed).contact_field_typefilter.LIKE 'mailto%'protocol matching to handle seeded vs custom-created colon inconsistency.MonicaApiCrawlWorker,ImportSourceWorker,ImportWorker) enqueueDuplicateDetectionWorkeron success.2. Account reset completeness
Refactored
AccountResetWorkerinto a thin orchestrator over per-domainCleanupmodules. Each module owns account-scoped deletion for its context:Kith.Imports.Cleanup+Kith.Imports.JobCancellation(cancels in-flight Oban jobs)Kith.Storage.AccountCleanup(file wipe)Kith.Contacts.Cleanup(contacts + tags)Kith.Conversations.Cleanup·Kith.Journal.Cleanup·Kith.Tasks.Cleanup·Kith.Reminders.Cleanup·Kith.Activities.Cleanup·Kith.AuditLogs.CleanupAccountResetWorker.3. Monica import overhaul
Correctness:
MonicaPhotoSyncWorkerwith livesync_summary; queued on:imports(not:photo_sync).MonicaMiscDataWorker(per-contact extra data, plan-driven).Performance:
Contacts.create_contact_field/3acceptsnormalize: falseto skip redundant E.164 normalization on Monica writes.:persistent_termphone-CFT cache with a ref_data map passed through the call chain.4. Worker clustering
KITH_MODEin:prod(web= insert-only,worker= consumer).PubSub+ cluster discovery inbase_childrenso both modes have them.DNSClusterwithlibclusterEpmd strategy + sharedRELEASE_COOKIE+ DNS alias for cross-container clustering.RELEASE_COOKIEdocumented in.env.example.5. Infra / UX
.env.Test plan
mix compile --warnings-as-errors— cleanmix test— full suite passes (DuplicateDetectionWorker now has 20-case coverage; AccountResetWorker has regression + cross-account isolation tests; per-Cleanupmodule tests added across all 9 contexts)mix quality— format, credo, sobelow, dialyzer all passworkercontainer picks up:imports/:exports/:reminders/:mailers/:purgequeues andwebcontainer only insertsweb↔worker(checkNode.list/0in remote shell)RELEASE_COOKIEis set in both containersCleanupmodules execute and in-flight import jobs are cancelled