TemperFS actor ask timeout (5s) causes session failures under normal load #140

@rita-aga

Summary

TemperFS file read/write operations fail with "actor dispatch failed: ask timeout after 5s", even when only a single agent session is running. The affected sessions fail and need manual intervention (Fail + Retry on parent jobs).

Impact

Every agent session uses TemperFS (conversation writes, file reads, embodiment writes), so any session can fail at random mid-execution. Observed on Railway production (openpaw-production.up.railway.app):

  • 11 concurrent sessions: 8 of 11 failed with actor timeout
  • 3 concurrent sessions: 2 of 3 failed
  • 1 session (no concurrency): still fails intermittently (observed at turn 5)

This makes the agent pipeline unreliable: jobs need manual babysitting through Fail/Retry cycles.

Error

TemperFS write failed (HTTP 409): {"error":{"code":"ActionRejected","message":"actor dispatch failed: ask timeout after 5s"}}

Also seen as:

failed to write REPL state file (HTTP 409): {"error":{"code":"ActionRejected","message":"actor dispatch failed: ask timeout after 5s"}}
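
Both errors arrive as HTTP 409 with code ActionRejected, which is presumably also what a genuine rejection looks like, so a client may want to tell the two apart before deciding what to do. A minimal sketch, assuming the payload always has the shape shown above; the function name is hypothetical and matching on the message text is a guess at the only available signal, not a documented TemperFS contract:

```python
import json


def is_actor_ask_timeout(status: int, body: str) -> bool:
    """Return True if a response looks like the actor ask timeout shown above.

    Assumption: the timeout can only be recognised by the "ask timeout"
    substring in error.message, since the status and code are generic.
    """
    if status != 409:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    return (
        error.get("code") == "ActionRejected"
        and "ask timeout" in str(error.get("message", ""))
    )
```

A dedicated error code for timeouts would make this check unnecessary; string matching is only a stopgap.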

Questions

  1. Is the 5s actor ask timeout configurable? Should it be higher for production?
  2. Is the TemperFS actor single-threaded? Can it handle backpressure (queue requests instead of failing them)?
  3. Could Turso latency be causing the actor to block on I/O, making it unable to respond within 5s?
  4. Should there be automatic retry logic in the session's TemperFS client before surfacing the error?
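
To make question 4 concrete, here is a rough sketch of what client-side retry could look like: exponential backoff with jitter around a single TemperFS call. Everything in it is an assumption for illustration (the `with_retries` helper, the `(status, body)` response shape, the attempt count and delays); it is not the existing client API. It is also only safe if the retried write is idempotent, which I have not verified.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # hypothetical shape: (HTTP status, response body)


def with_retries(
    op: Callable[[], Response],
    should_retry: Callable[[Response], bool],
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call op up to `attempts` times, backing off while should_retry is true.

    Returns the first response that should not be retried, or the last
    response once the attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    response = op()
    for attempt in range(attempts - 1):
        if not should_retry(response):
            break
        # Exponential backoff capped at max_delay, with full jitter so that
        # concurrent sessions do not all retry at the same instant.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, delay))
        response = op()
    return response
```

A caller would pass the write as `op` and a predicate such as `lambda r: r[0] == 409 and "ask timeout" in r[1]` as `should_retry`. Retrying only hides the problem if the actor is genuinely blocked, so this complements rather than replaces questions 1 to 3.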

Environment

  • Railway deployment, single service
  • Docker image: ghcr.io/nerdsane/openpaw:edge
  • Storage: Turso (embedded SQLite)
  • Observed with both concurrent and single-session workloads
