Skip to content

daily_memory_generation intermittently fails on 300s OpenAI timeout (cURL 28) #2576

@chubes4

Description

@chubes4

Summary

daily_memory_generation is intermittently failing on the live install with a 300-second OpenAI request timeout. Surfaced by the new wake_briefing digest (#2557) on its first live run — it crossed the repeated-failure threshold in the 24h window.

Evidence (live, 24h window)

status                    n   latest
completed                 32  2026-06-07 00:05:48
failed - packet_failure    1  2026-06-07 00:00:01
failed: AI request failed  1  2026-06-07 00:00:02

The AI-request failure (job 3561):

AI request failed: Network error occurred while sending request to
https://api.openai.com/v1/responses: cURL error 28: Operation timed out
after 300001 milliseconds with 0 bytes received

engine_data: agent_id=1, task_type=daily_memory_generation, scheduled 2026-06-06 20:00:02.

Analysis

  • Most runs (32) complete fine — this is intermittent, not a hard logic bug.
  • The failure is a 300s timeout to OpenAI (cURL error 28, 0 bytes received) — the non-streaming /v1/responses call hung for the full timeout window. The daily-memory prompt can be large (it ships the whole MEMORY.md), so a slow/large completion against a loaded provider endpoint can exceed 300s.
  • A second failure mode (packet_failure) also appears once in the window — likely the engine's downstream handling of the same empty/failed response.

Worth investigating

  1. Is 300s the right ceiling for this call, and is it configurable? A large memory file + slow provider can legitimately exceed it. Consider streaming or a longer timeout for the system-mode memory call specifically.
  2. Should daily_memory_generation retry on a transient network/timeout failure rather than failing the job outright? It's idempotent (recompacts the same file) so a retry is safe and would absorb provider blips.
  3. The packet_failure path: confirm an AI timeout produces a clean, single failed job rather than cascading into a second distinct failure status.
  4. Provider/model: the call hit api.openai.com/v1/responses for the system task — confirm the system-mode model choice is appropriate (a faster/smaller model may be fine for memory compaction and would dodge the timeout).

Acceptance criteria

  • A transient provider timeout on daily_memory_generation does not leave a hard-failed job when a safe retry would succeed (idempotent task), OR the timeout ceiling is raised/made configurable for the memory call.
  • The double-failure (AI request failed + packet_failure for the same tick) is reduced to a single clean outcome.

Notes

Constraints

  • Conventional commits.
  • No CHANGELOG edits / version bumps (homeboy-managed).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions