daily_memory_generation intermittently fails on 300s OpenAI timeout (cURL 28)

## Summary

`daily_memory_generation` is intermittently failing on the live install with a 300-second OpenAI request timeout. Surfaced by the new wake_briefing digest (#2557) on its first live run — it crossed the repeated-failure threshold in the 24h window.

## Evidence (live, 24h window)

```
status                    n   latest
completed                 32  2026-06-07 00:05:48
failed - packet_failure    1  2026-06-07 00:00:01
failed: AI request failed  1  2026-06-07 00:00:02
```

The AI-request failure (job 3561):
```
AI request failed: Network error occurred while sending request to
https://api.openai.com/v1/responses: cURL error 28: Operation timed out
after 300001 milliseconds with 0 bytes received
```
engine_data: `agent_id=1`, `task_type=daily_memory_generation`, scheduled 2026-06-06 20:00:02.

## Analysis

- Most runs (32) complete fine — this is **intermittent**, not a hard logic bug.
- The failure is a **300s timeout to OpenAI** (`cURL error 28`, 0 bytes received) — the non-streaming `/v1/responses` call hung for the full timeout window. The daily-memory prompt can be large (it ships the whole MEMORY.md), so a slow/large completion against a loaded provider endpoint can exceed 300s.
- A second failure mode (`packet_failure`) also appears once in the window — likely the engine's downstream handling of the same empty/failed response.

## Worth investigating

1. Is 300s the right ceiling for this call, and is it configurable? A large memory file + slow provider can legitimately exceed it. Consider streaming or a longer timeout for the system-mode memory call specifically.
2. Should `daily_memory_generation` **retry** on a transient network/timeout failure rather than failing the job outright? It's idempotent (recompacts the same file) so a retry is safe and would absorb provider blips.
3. The `packet_failure` path: confirm an AI timeout produces a clean, single failed job rather than cascading into a second distinct failure status.
4. Provider/model: the call hit `api.openai.com/v1/responses` for the system task — confirm the system-mode model choice is appropriate (a faster/smaller model may be fine for memory compaction and would dodge the timeout).

## Acceptance criteria

- A transient provider timeout on `daily_memory_generation` does not leave a hard-failed job when a safe retry would succeed (idempotent task), OR the timeout ceiling is raised/made configurable for the memory call.
- The double-failure (`AI request failed` + `packet_failure` for the same tick) is reduced to a single clean outcome.

## Notes

- Found by the wake_briefing feature (#2557) on its first live data run — exactly the kind of recurring-but-quiet failure it's designed to surface.
- Not urgent (most runs succeed; memory still compacts), but recurring, so tracking.

## Constraints

- Conventional commits.
- No CHANGELOG edits / version bumps (homeboy-managed).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

daily_memory_generation intermittently fails on 300s OpenAI timeout (cURL 28) #2576

Summary

Evidence (live, 24h window)

Analysis

Worth investigating

Acceptance criteria

Notes

Constraints

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

daily_memory_generation intermittently fails on 300s OpenAI timeout (cURL 28) #2576

Description

Summary

Evidence (live, 24h window)

Analysis

Worth investigating

Acceptance criteria

Notes

Constraints

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions