Reclaim idle sessions to fix session-capacity overshoot #277
Users hit "Server is at capacity (379/200)" and couldn't create sessions: idle/forgotten runtimes never left the global pool, the frontend reactivated every historical session on load, and concurrent creates could overshoot the cap.

Backend:
- Idle-session reaper (2h): releases the sandbox + RAM of inactive sessions and evicts them from the live pool, leaving them resumable from Mongo (persisted resumable, never "ended"). Teardown uses asyncio.wait so a shutdown cancel can't be swallowed; submit() enqueues under the lock and refuses sessions being reaped. Reaping is skipped when persistence is disabled and aborts if the resumable snapshot (incl. the message bulk_write, via save_snapshot strict mode) can't be durably written.
- Global create-slot reservation closes the check-then-create race that let concurrent creates exceed MAX_SESSIONS.
- last_active_at drives the reaper, stamped on genuine activity only.
- Restore injects a sandbox-reset note so the agent knows its sandbox was wiped, skipped when an approval is pending (would orphan the tool results).

Frontend:
- Only the active or backend-processing session reactivates its runtime on load and tab refocus; idle sessions render from cache, so opening the app no longer refills the pool. isProcessing is cleared on all terminal and approval paths.

Tests: reaper happy path/spared cases, submit-reap race, turn-finish restamp, global reservation race, persistence-disabled and write-failure aborts, shutdown cancellation propagation, restore-note inject/skip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude finished @cmpatino's task in 5m 46s

Claude's Review

1 P0, 3 P1. Solid PR: the design is coherent (snapshot-then-pop, raise_on_error to gate eviction on durable persistence, reservation counter to close the create race, frontend gating on P0 - Old task's
Claude finished @cmpatino's task in 2m 32s

No blocking issues. Re-review of 5b40506 ("Address review comments"). All four findings from the prior pass are addressed:

Verification

Traced the new reaper window (post-

No new P0 surfaced.
Adopt PR #277's reviewed reclaim-idle-sessions implementation (the reaper persists the resumable snapshot outside the manager lock then re-checks under it; the useAgentChat setProcessingState refactor; test_session_reaper) while preserving this branch's premium-billing changes (premium_user_billed plumbing in session_manager, and the ClaudeCapDialog / CLAUDE_QUOTA_EXHAUSTED removal in useAgentChat). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>