Run score_paths scoring off the event loop to keep heartbeat alive #93

Merged
maximusunc merged 1 commit into main from claude/eloquent-planck-KInga
Jun 1, 2026

@maximusunc
Collaborator

The score_paths worker's scoring job was running almost entirely on the asyncio event loop: the synchronous feature-build loop (LMDB reads + numpy) plus message decompression/compression in get_message/save_message. On large queries this blocked the loop for longer than the 15s heartbeat TTL, so the heartbeat coroutine couldn't refresh its Redis key and the monitor fired false "worker lost" alerts even though the worker was healthy and busy.
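To illustrate the failure mode, here is a minimal, self-contained sketch (not code from this PR). It assumes a heartbeat coroutine that records a timestamp on every tick, and it uses `time.sleep` as a stand-in for the synchronous feature-build work. The names `heartbeat`, `blocking_job`, and `max_gap_when_blocking` are hypothetical, and the intervals are scaled down from the real 15s TTL.

```python
import asyncio
import contextlib
import time


async def heartbeat(beats, interval=0.01):
    """Hypothetical heartbeat: record a timestamp, then yield to the loop."""
    while True:
        beats.append(time.monotonic())
        await asyncio.sleep(interval)


def blocking_job(duration):
    """Stand-in for synchronous work (e.g. LMDB reads + numpy)."""
    time.sleep(duration)


async def max_gap_when_blocking(duration=0.2):
    """Run a blocking job directly on the event loop and return the
    largest gap (in seconds) observed between two heartbeat ticks."""
    beats = []
    task = asyncio.create_task(heartbeat(beats))
    await asyncio.sleep(0.05)   # let the heartbeat tick a few times
    blocking_job(duration)      # blocks the loop: no ticks can run here
    await asyncio.sleep(0.05)   # let the heartbeat resume
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return max(later - earlier for earlier, later in zip(beats, beats[1:]))


if __name__ == "__main__":
    gap = asyncio.run(max_gap_when_blocking(0.2))
    print(f"largest heartbeat gap: {gap:.3f}s")
```

The largest gap is at least as long as the blocking call, so once that call outlasts the TTL the key expires even though the worker is healthy.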

Convert score_paths to a sync function and run the whole job in a single run_in_executor call on the existing ThreadPoolExecutor, using the sync DB helpers (get_message_sync/save_message_sync). This frees the event loop for the full duration of a scoring job, collapses the previous three executor hops (feature build, MLP, classifier) into one, and also offloads the message compression. save_message_sync is wrapped in a retry loop to preserve the 4-retry durability semantics of the async save_message.
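A rough sketch of the shape of this change, under stated assumptions: the whole job (load, score, save) is one sync function submitted with a single `run_in_executor` call, and the save is wrapped in a retry loop. This is not the code in `workers/score_paths/worker.py`. `score_job`, `run_scoring`, `save_with_retry`, and the injected `load_fn`/`score_fn`/`save_fn` callables are hypothetical stand-ins for `score_paths`, `get_message_sync`, and `save_message_sync`, whose real signatures are not shown here. Treating "4-retry" as one initial attempt plus up to four retries, with no backoff, is also an assumption.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def save_with_retry(save_fn, payload, retries=4, delay=0.0):
    """Call a sync save function, retrying up to `retries` times on failure.

    Assumption: one initial attempt plus `retries` retries; the real
    retry policy (count, backoff, exception types) may differ.
    """
    for attempt in range(retries + 1):
        try:
            return save_fn(payload)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            time.sleep(delay)


def score_job(load_fn, score_fn, save_fn, key):
    """The whole scoring job as one sync function: load, score, save."""
    message = load_fn(key)            # stand-in for get_message_sync
    scored = score_fn(message)        # stand-in for feature build + models
    save_with_retry(save_fn, scored)  # stand-in for save_message_sync
    return scored


async def run_scoring(executor, load_fn, score_fn, save_fn, key):
    """Submit the job in a single executor hop so the loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        executor, score_job, load_fn, score_fn, save_fn, key
    )
```

While `score_job` runs in the worker thread, the event loop is only awaiting a future, so other coroutines such as the heartbeat can keep running.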

@maximusunc maximusunc merged commit 415827f into main Jun 1, 2026
2 checks passed
@codecov
codecov Bot commented Jun 1, 2026

Codecov Report

❌ Patch coverage is 0% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.37%. Comparing base (46d68af) to head (960d3b0).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| workers/score_paths/worker.py | 0.00% | 16 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| workers/score_paths/worker.py | 0.00% <0.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 22056d3...960d3b0.

@maximusunc maximusunc deleted the claude/eloquent-planck-KInga branch June 1, 2026 18:13