Summary
daily_memory_generation is intermittently failing on the live install with a 300-second OpenAI request timeout. Surfaced by the new wake_briefing digest (#2557) on its first live run — it crossed the repeated-failure threshold in the 24h window.
Evidence (live, 24h window)
| status | n | latest |
| --- | --- | --- |
| completed | 32 | 2026-06-07 00:05:48 |
| failed - packet_failure | 1 | 2026-06-07 00:00:01 |
| failed: AI request failed | 1 | 2026-06-07 00:00:02 |
The AI-request failure (job 3561):
AI request failed: Network error occurred while sending request to
https://api.openai.com/v1/responses: cURL error 28: Operation timed out
after 300001 milliseconds with 0 bytes received
engine_data: agent_id=1, task_type=daily_memory_generation, scheduled 2026-06-06 20:00:02.
Analysis
- Most runs (32) complete fine — this is intermittent, not a hard logic bug.
- The failure is a 300s timeout to OpenAI (cURL error 28, 0 bytes received): the non-streaming /v1/responses call hung for the full timeout window. The daily-memory prompt can be large (it ships the whole MEMORY.md), so a slow or large completion against a loaded provider endpoint can exceed 300s.
- A second failure mode (packet_failure) also appears once in the window, likely the engine's downstream handling of the same empty/failed response.
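The timeout signature quoted above can be recognized mechanically. A minimal sketch in Python, assuming the failure reaches the caller as a plain message string like the one in the log; `is_transient_timeout` is a hypothetical helper, not an existing function in the codebase. cURL error 28 is libcurl's CURLE_OPERATION_TIMEDOUT.

```python
import re

# Matches the numeric libcurl error code embedded in the failure message.
_CURL_ERROR = re.compile(r"cURL error (\d+)")


def is_transient_timeout(message: str) -> bool:
    """Hypothetical helper: True when the message carries cURL error 28
    (CURLE_OPERATION_TIMEDOUT), i.e. the request timed out."""
    match = _CURL_ERROR.search(message)
    return match is not None and int(match.group(1)) == 28


# The message recorded for job 3561, joined onto one line.
sample = (
    "AI request failed: Network error occurred while sending request to "
    "https://api.openai.com/v1/responses: cURL error 28: Operation timed out "
    "after 300001 milliseconds with 0 bytes received"
)
```

A check like this would let the engine separate provider timeouts from logic errors before deciding how to record the job.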
Worth investigating
- Is 300s the right ceiling for this call, and is it configurable? A large memory file + slow provider can legitimately exceed it. Consider streaming or a longer timeout for the system-mode memory call specifically.
- Should daily_memory_generation retry on a transient network/timeout failure rather than failing the job outright? It's idempotent (recompacts the same file), so a retry is safe and would absorb provider blips.
- The packet_failure path: confirm an AI timeout produces a clean, single failed job rather than cascading into a second distinct failure status.
- Provider/model: the call hit api.openai.com/v1/responses for the system task. Confirm the system-mode model choice is appropriate (a faster/smaller model may be fine for memory compaction and would dodge the timeout).
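The retry and configurable-timeout questions above could be prototyped roughly as follows. This is a sketch under stated assumptions, not the engine's actual code: `TransientProviderError`, `run_with_retry`, and the attempt and backoff defaults are all hypothetical, and only the 300s ceiling comes from the log.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientProviderError(Exception):
    """Hypothetical marker for timeouts and other retryable provider errors."""


def run_with_retry(
    task: Callable[[float], T],
    timeout_s: float = 300.0,   # ceiling seen in the log; assumed configurable here
    max_attempts: int = 3,      # assumed value
    backoff_s: float = 30.0,    # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call task(timeout_s), retrying only on TransientProviderError."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task(timeout_s)
        except TransientProviderError:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface one failure to the caller
            # Exponential backoff between attempts: 30s, 60s, ...
            sleep(backoff_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Retrying is only safe here because the task is described as idempotent; a non-idempotent task would need deduplication before adopting the same wrapper.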
Acceptance criteria
- A transient provider timeout on daily_memory_generation does not leave a hard-failed job when a safe retry would succeed (idempotent task), OR the timeout ceiling is raised/made configurable for the memory call.
- The double-failure (AI request failed + packet_failure for the same tick) is reduced to a single clean outcome.
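One way to read the single-outcome criterion is as a precedence rule over the failure statuses recorded for one tick. A minimal sketch, assuming packet_failure is a downstream symptom of the AI request failure (the issue only says this is likely); `ROOT_CAUSE_ORDER` and `final_status` are hypothetical names.

```python
from typing import Iterable, Optional

# Assumed precedence: the upstream cause outranks downstream symptoms.
ROOT_CAUSE_ORDER = ["AI request failed", "packet_failure"]


def final_status(statuses: Iterable[str]) -> Optional[str]:
    """Return the single failure status to record for one tick,
    or None when no failure was reported."""
    seen = set(statuses)
    for status in ROOT_CAUSE_ORDER:
        if status in seen:
            return status
    # Unknown statuses: pick deterministically rather than dropping them.
    return min(seen) if seen else None
```

With this rule, a tick that reports both statuses would be recorded once, as the AI request failure.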
Notes
Constraints
- Conventional commits.
- No CHANGELOG edits / version bumps (homeboy-managed).