Fix FFM-1520: cover direct first-meme nudge sends#306
Conversation
Fixes FFM-1520.
|
STAFF ENGINEER REVIEW: CHANGES REQUESTED — one blocking concurrency issue. [P2] Please adjust the timeout/cancellation behavior so an ambiguous in-flight send cannot release the lease and duplicate the nudge. Verification I ran: focused local test slice passed, 12 passed (
|
|
STAFF ENGINEER REVIEW: CHANGES REQUESTED — Direct first-meme nudge cohort assignment currently happens before Telegram delivery is confirmed. If the first direct send fails before a meme is delivered, the user can still be inserted into first_meme_nudge and counted in v_experiment_results without receiving the measured first meme. Move get_first_meme_nudge_variant_to_send() to after confirmed send_new_message_with_meme() success but before _record_delivered_meme_reaction(), and add regression coverage for a failed direct send not assigning the cohort. |
|
STAFF ENGINEER REVIEW: CHANGES REQUESTED — two blocking post-delivery edge cases remain.\n\n[P2] src/tgbot/senders/meme.py:74-78: after Telegram accepts a direct send, cancellation can still land while awaiting get_first_meme_nudge_variant_to_send(), before _record_delivered_meme_reaction() is reached. The shield only protects the later insert, so a delivered meme can still miss user_meme_reaction and later be duplicated or fail reaction updates. Record the delivery before cancellable nudge assignment work, or otherwise make the whole post-delivery section cancellation-safe.\n\n[P2] src/tgbot/senders/meme.py:73-78: send_new_message_with_meme() returns None on Forbidden after marking the user blocked. The new nudge path treats that as a delivered meme and can assign the first-meme experiment / popup lease for an undelivered blocked-user send. Stop before nudge/reaction work when Telegram rejected the meme.\n\nVerification: auto-merge disabled before this comment; git diff --check origin/production...HEAD passed; secret scan on changed files clean; focused pytest with explicit env passed (14 passed): tests/tgbot/test_meme_sender.py, tests/flows/test_broadcasts_meme.py, selected first_meme_nudge tests, selected pipeline diagnostics test. /codex review flagged both items as [P2]. |
|
STAFF ENGINEER REVIEW: CHANGES REQUESTED Two production issues in the direct-send timeout path need fixing before this can land:
Verification performed:
Please keep the invariant simple: once Telegram has accepted the meme, delivery recording must be impossible to skip due to first-nudge experiment work, and any deferred nudge spawned after a timeout must still have an owner/drain/logging path. |
|
STAFF ENGINEER REVIEW: CHANGES REQUESTED — Issues found: [P1] src/tgbot/senders/meme.py:103-107 records [P2] src/flows/broadcasts/meme.py:31-42 only attaches a callback for nudge tasks appended after the drain closes and does not retain or await them. If the broadcast flow exits soon after, those late tasks can be cancelled with the loop, dropping treatment nudges in the timeout path. Please keep the post-Telegram-success guard, but restore the invariant that first-meme cohort assignment happens before recording the delivery row, and make late broadcast nudge tasks owned long enough to finish or fail observably. |
|
|
|
STAFF ENGINEER REVIEW: APPROVED — manual review clean; /codex review returned no discrete correctness issue; targeted local tests passed (18 passed); GitHub lint/test are green; no new secrets, SQL interpolation, cso-trigger surfaces, or blender weight changes found. |
|
✅ Approved + merged. Staff Engineer fallback approval posted because GitHub blocks self-review; /codex review returned no discrete correctness issue; CI was green; squash auto-merge merged immediately. |
Fixes FFM-1520.
Root cause
Production current-week data showed 17 delivered new users, with 3 delivered users missing
first_meme_nudgeassignment. The three misses were all direct-send paths that bypass the originalnext_message()first-meme assignment path:cold_start_explorevia broadcast/direct sendshare_linkThose paths call
send_meme_to_user(), which recordeduser_meme_reactionwithout assigning the first-meme nudge experiment for confirmed first deliveries.Change
send_meme_to_user()only aftersend_new_message_with_meme()succeeds, and before_record_delivered_meme_reaction()records the measured delivery.This is the production-targeted version of the reviewed PR #305 fix, applied on top of current
origin/productionwithout unrelated local comms/experiment workspace changes.Verification
ruff check --fix src/ tests/ && ruff format src/ tests/git diff --check origin/production...HEADDATABASE_URL=postgresql+asyncpg://app:app@app_db:5432/app TELEGRAM_BOT_TOKEN=123456:ABCdefghijklmnopqrstuvwxyz pytest tests/tgbot/test_meme_sender.py tests/flows/test_broadcasts_meme.py tests/tgbot/test_first_meme_nudge.py::test_send_cancellation_keeps_lease tests/tgbot/test_first_meme_nudge.py::test_delivery_helper_retries_only_existing_treatment_assignment tests/recommendations/test_pipeline.py::test_compact_diagnostics_exclude_candidate_idsResult: 14 passed.