feat(org): configurable agent-run timeouts with env-clamped ceiling #168
Open
MesoX wants to merge 1 commit into
Adds a per-organization override of the agent-run wall-clock + per-step timeouts for both chat-driven and trigger-driven runs. Defaults remain the same; overrides only kick in when an org admin sets them in the organization settings UI.

Schema
- Migration 0041 adds a nullable jsonb 'agent_run_settings' column on 'organization'. Idempotent: existing rows keep the default null value and continue using env / hardcoded defaults.
- 'AgentRunSettings' shape is exported from '@platypus/schemas' as a strict zod object with four optional positive-int fields: chatPerRunTimeoutMs, chatPerStepTimeoutMs, triggerPerRunTimeoutMs, triggerPerStepTimeoutMs.

Backend
- New 'services/agent-run-settings.ts' exposes 'resolveRunTimeouts(orgId, kind)' (chat | trigger). It reads the org override, falls back to env, and finally to documented hardcoded defaults. The result is clamped to the env-supplied ceiling so a misconfigured override can never exceed the deployer-allowed maximum. Chat defaults are sourced from run-registry's exported constants so the two cannot drift.
- 'PUT /organizations/:orgId' (admin-only) validates incoming overrides against the env ceilings and returns 400 with a single 'error' message (per API conventions) listing the offending env vars; admins can lower but never raise.
- New 'GET /organizations/:orgId/agent-run-settings/ceilings' returns the current chat + trigger ceilings so the UI can display them next to each input.
- 'routes/chat.ts' and 'services/trigger-execution.ts' now invoke 'resolveRunTimeouts' instead of reading env directly, so the org override is honored by every active run.

Frontend
- 'OrganizationForm' gains an 'Agent run timeouts' section (only shown on edit, not create). Four minute-valued inputs map to the four override fields; placeholders show the current ceilings fetched from the new endpoint. Values are converted to milliseconds before saving, validated client-side against the fetched ceilings for inline feedback, and the server error message is surfaced on rejection.

Config
- '.env.example' documents the four ceiling env vars (RUN_PER_RUN_TIMEOUT_MS, RUN_PER_STEP_TIMEOUT_MS, TRIGGER_PER_RUN_TIMEOUT_MS, TRIGGER_PER_STEP_TIMEOUT_MS) and their defaults.

Tests
- 'services/agent-run-settings.test.ts' covers env-default fallback, env-override parsing, garbage-env rejection, and the DB-backed 'resolveRunTimeouts' lookup including ceiling clamping, the no-row fallback, and chat-vs-trigger isolation.
- 'routes/organization.test.ts' covers the new PUT path: 403 for non-admins, 400 (with 'error' key) when an override exceeds the ceiling, and a successful within-ceiling persist.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
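The frontend step in the commit message above (minute-valued inputs converted to milliseconds and checked against the fetched ceiling) could look roughly like the sketch below. `minutesInputToMs` and `FieldResult` are hypothetical names, the error strings are invented for illustration, and treating an empty input as "no override" (`null`) is an assumption; the actual `OrganizationForm` code is not shown in this PR text.

```typescript
// Hypothetical result type for one form field; not taken from the PR's code.
type FieldResult =
  | { ok: true; valueMs: number | null }
  | { ok: false; message: string };

// Convert a minute-valued text input to milliseconds and validate it
// against a ceiling expressed in milliseconds.
function minutesInputToMs(input: string, ceilingMs: number): FieldResult {
  const trimmed = input.trim();
  // Assumption: an empty input means "no override" and is sent as null.
  if (trimmed === "") return { ok: true, valueMs: null };

  const minutes = Number(trimmed);
  const valueMs = Math.round(minutes * 60_000);
  // The schema expects a positive integer number of milliseconds.
  if (!Number.isFinite(minutes) || valueMs < 1) {
    return { ok: false, message: "Enter a positive number of minutes." };
  }
  if (valueMs > ceilingMs) {
    return { ok: false, message: `Maximum is ${ceilingMs / 60_000} minutes.` };
  }
  return { ok: true, valueMs };
}
```

With a 10-minute ceiling (600000 ms), an input of `"5"` yields 300000 ms, while `"15"` is rejected inline before any request is sent. The server still enforces the ceiling on its own, so this check only exists for faster feedback.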
MesoX pushed a commit to MesoX/platypus that referenced this pull request on Jun 1, 2026
Ports the reviewed version of the configurable agent-run timeout feature (upstream PR willdady#168) onto the deploy branch. Behavior unchanged; quality and convention fixes only.

- Error responses use the singular `error` key per API conventions (was a custom `{ errors }` map).
- Remove dead `clampRunTimeouts` / `__TEST_HOOKS__`; source chat defaults from run-registry's exported constants so they cannot drift.
- Table-driven ceiling validation; server message reports minutes.
- Frontend validates against the fetched ceilings for inline feedback and surfaces the server error message on rejection.
- Document the four ceiling env vars in .env.example; migration trailing newline.
- Add PUT /organizations/:orgId tests (403 / 400-over-ceiling / within-ceiling persist / null clears the override).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
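"Table-driven ceiling validation" with a single message that reports minutes might be sketched as follows. The field names and env var names come from the PR text; the pairing of chat fields with the `RUN_*` variables, the table layout, the function name `validateAgainstCeilings`, and the message wording are all assumptions made for illustration.

```typescript
type TimeoutField =
  | "chatPerRunTimeoutMs"
  | "chatPerStepTimeoutMs"
  | "triggerPerRunTimeoutMs"
  | "triggerPerStepTimeoutMs";

type Overrides = Partial<Record<TimeoutField, number>>;

// One row per override field, naming the env var assumed to cap it.
const CEILING_TABLE: ReadonlyArray<{ field: TimeoutField; envVar: string }> = [
  { field: "chatPerRunTimeoutMs", envVar: "RUN_PER_RUN_TIMEOUT_MS" },
  { field: "chatPerStepTimeoutMs", envVar: "RUN_PER_STEP_TIMEOUT_MS" },
  { field: "triggerPerRunTimeoutMs", envVar: "TRIGGER_PER_RUN_TIMEOUT_MS" },
  { field: "triggerPerStepTimeoutMs", envVar: "TRIGGER_PER_STEP_TIMEOUT_MS" },
];

// Returns one combined error message, or null when every supplied
// override is at or below its ceiling.
function validateAgainstCeilings(
  overrides: Overrides,
  ceilingsMs: Record<TimeoutField, number>,
): string | null {
  const problems: string[] = [];
  for (const { field, envVar } of CEILING_TABLE) {
    const value = overrides[field];
    if (value !== undefined && value > ceilingsMs[field]) {
      const minutes = ceilingsMs[field] / 60_000;
      problems.push(`${field} exceeds ${envVar} (${minutes} min)`);
    }
  }
  return problems.length > 0 ? problems.join("; ") : null;
}
```

A route handler could return the string as the body's `error` value with status 400. Keeping the field-to-env mapping in one table means a fifth timeout would need one new row rather than another branch.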
Contributor
Author
Hey @willdady, this would definitely help with #261. Chat compaction can take some time, even more so with locally run models on slower hardware (it is not unusual to see 5 minutes of compaction with a long context and a slow dense model). Even with a faster model we have run into timeouts. I can update this to the latest main; just let me know whether this is something you want to add to the tool itself, or whether there is another way to make these things a bit more configurable.
Summary
Adds a per-organization override of agent-run wall-clock + per-step timeouts for both chat-driven and trigger-driven runs. Defaults are unchanged; overrides only take effect when an org admin sets them in the organization settings UI. Every override is clamped server-side to a deployer-supplied environment ceiling — an admin can lower a timeout but never raise it past what the host operator allows.
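The resolution order summarized above (org override, then env, then hardcoded default, clamped to the ceiling) can be sketched for a single timeout value. This assumes the env value doubles as both the fallback and the ceiling, which the PR text implies but does not show; `resolveOne` and its parameters are hypothetical and are not the actual `resolveRunTimeouts` implementation.

```typescript
// Resolve one timeout in milliseconds.
// Assumption: the ceiling is the env value when set, else the hardcoded default.
function resolveOne(
  orgOverrideMs: number | undefined,
  envValueMs: number | undefined,
  hardcodedDefaultMs: number,
): number {
  const ceilingMs = envValueMs ?? hardcodedDefaultMs;
  // With no org override, the run simply uses the ceiling value.
  const candidateMs = orgOverrideMs ?? ceilingMs;
  // An override may lower the timeout but can never exceed the ceiling.
  return Math.min(candidateMs, ceilingMs);
}

const TEN_MINUTES_MS = 10 * 60_000; // chat per-run default stated in this PR

resolveOne(undefined, undefined, TEN_MINUTES_MS);   // 600000: no override, no env
resolveOne(5 * 60_000, undefined, TEN_MINUTES_MS);  // 300000: override lowers it
resolveOne(30 * 60_000, undefined, TEN_MINUTES_MS); // 600000: clamped to ceiling
resolveOne(undefined, 15 * 60_000, TEN_MINUTES_MS); // 900000: env raises the ceiling
```

Clamping at read time, in addition to rejecting oversized values on `PUT`, means a stored override stays safe even if the deployer later lowers the env ceiling.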
Motivation
Long-running agent runs (multi-step research, slow MCP/tool calls) sometimes need more headroom than the hardcoded defaults, but the limit should stay under the deployer's control. This exposes a safe, bounded knob per organization.
Changes
Schema
- Migration `0041` adds a nullable `jsonb` `agent_run_settings` column on `organization` (idempotent `ADD COLUMN IF NOT EXISTS`; existing rows keep `null` and behave exactly as before).
- `AgentRunSettings` exported from `@platypus/schemas` as a strict zod object with four optional positive-int fields.

Backend
- `services/agent-run-settings.ts`: `resolveRunTimeouts(orgId, kind)` reads the org override, falls back to env, then to hardcoded defaults, and clamps to the env ceiling. Chat defaults are sourced from `run-registry` constants so they cannot drift.
- `PUT /organizations/:orgId` (admin-only) rejects overrides above the env ceiling with `400 { error }` naming the offending env vars.
- `GET /organizations/:orgId/agent-run-settings/ceilings` exposes the current ceilings for the UI.
- `routes/chat.ts` and `services/trigger-execution.ts` resolve timeouts via the new service instead of reading env directly.

Frontend
- `OrganizationForm` gains an "Agent run timeouts" section (edit only) with four minute-valued inputs, ceiling placeholders, client-side ceiling validation for inline feedback, and surfacing of the server error message.

Config
- `.env.example` documents the four ceiling env vars and their defaults (`RUN_PER_RUN_TIMEOUT_MS`, `RUN_PER_STEP_TIMEOUT_MS`, `TRIGGER_PER_RUN_TIMEOUT_MS`, `TRIGGER_PER_STEP_TIMEOUT_MS`).

Behavior compatibility
Chat defaults (10 min run / 2 min step) are imported from `run-registry`'s existing defaults, so a run with no env override and no org override behaves identically to before this PR.

Testing
- `services/agent-run-settings.test.ts`: env defaults, env overrides, garbage-env rejection, DB-backed resolution including ceiling clamping, no-row fallback, chat-vs-trigger isolation.
- `routes/organization.test.ts`: PUT path with 403 for non-admins, 400 (with `error` key) when over the ceiling, successful within-ceiling persist, and null clears the override.

🤖 Generated with Claude Code
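The "garbage-env rejection" case listed under Testing suggests that malformed env values are not trusted. One way this could work is sketched below, assuming that "rejection" means falling back to the default rather than throwing; the PR text does not say which, and `parseTimeoutEnv` is a hypothetical helper name.

```typescript
// Parse a millisecond timeout from an env-style string.
// Assumption: anything that is not a positive integer falls back to the default.
function parseTimeoutEnv(raw: string | undefined, fallbackMs: number): number {
  if (raw === undefined) return fallbackMs;
  const trimmed = raw.trim();
  // Digits only: rejects "abc", "-5", "1e3", "12.5" and the empty string.
  if (!/^\d+$/.test(trimmed)) return fallbackMs;
  const parsed = Number(trimmed);
  return Number.isSafeInteger(parsed) && parsed > 0 ? parsed : fallbackMs;
}

// Example: reading the chat per-run ceiling with the 10-minute default.
// `env` stands in for process.env so the sketch runs anywhere.
const env: Record<string, string | undefined> = { RUN_PER_RUN_TIMEOUT_MS: "not-a-number" };
const chatPerRunCeilingMs = parseTimeoutEnv(env["RUN_PER_RUN_TIMEOUT_MS"], 10 * 60_000);
```

Here the malformed value is ignored and the ceiling stays at the 10-minute default, so a typo in the deployment config cannot silently disable or inflate the limit.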