Skip to content

fix: make consolidation timeouts configurable#970

Open
ggoldani wants to merge 1 commit into
rohitg00:mainfrom
ggoldani:fix/configurable-timeouts
Open

fix: make consolidation timeouts configurable#970
ggoldani wants to merge 1 commit into
rohitg00:mainfrom
ggoldani:fix/configurable-timeouts

Conversation

@ggoldani

@ggoldani ggoldani commented Jun 22, 2026

Copy link
Copy Markdown

Summary

  • make consolidate per-concept timeout configurable instead of hardcoding 30s
  • make worker invocation timeout configurable instead of hardcoding 180s
  • align iii HTTP timeout defaults across main/docker/deploy entrypoints
  • add regression tests for timeout parsing, precedence, clamp, and failed-attempt budgeting

Root cause

Large semantic consolidations were failing even when the service/provider were healthy because timeouts were hardcoded in multiple layers:

  • mem::consolidate used a fixed 30s compress timeout per concept
  • the worker registered with a fixed invocationTimeoutMs: 180000
  • iii HTTP configs/templates also used default_timeout: 180000

In real runs this caused failures like:

  • Invocation timeout after 180000ms: mem::consolidate

Fix

  • add resolveTimeoutMs() / bounded parsePositiveInt() in config
  • mem::consolidate now resolves timeout precedence as:
    1. AGENTMEMORY_CONSOLIDATION_COMPRESS_TIMEOUT_MS
    2. OPENAI_TIMEOUT_MS
    3. AGENTMEMORY_LLM_TIMEOUT_MS
    4. fallback 120000
  • clamp consolidate timeout to 1000..300000ms
  • count timed-out/failed attempts toward the existing LLM budget so failure loops cannot run unbounded across concept groups
  • make worker invocation timeout configurable via AGENTMEMORY_INVOCATION_TIMEOUT_MS with clamp 1000..600000ms
  • raise default iii HTTP timeout to 600000ms consistently in:
    • iii-config.yaml
    • iii-config.docker.yaml
    • deploy entrypoints (railway/render/fly/coolify)
    • eval sandbox script
  • document the new env vars in .env.example and README.md

Validation

  • npm run build
  • npx vitest run test/config-timeouts.test.ts test/consolidate-project-scope.test.ts test/consolidation-default.test.ts
  • verified generated dist/ contains the new timeout logic and 600000 iii defaults ✅
  • equivalent local runtime repro that previously failed with Invocation timeout after 180000ms completed successfully in ~200.4s ({"consolidated":7,"totalObservations":30197}) ✅

Notes

I intentionally kept this patch surgical: no provider API changes, no endpoint surface changes, just timeout configurability + consistency + regression coverage.

Summary by CodeRabbit

  • New Features

    • Added configurable timeout environment variables for memory consolidation compression and worker invocation operations.
    • Updated default worker invocation timeout from 180 seconds to 600 seconds across all deployments.
  • Documentation

    • Extended configuration documentation with new timeout variable specifications including valid ranges.
  • Tests

    • Added test suite for timeout configuration parsing and consolidation timeout behavior.

@vercel

vercel Bot commented Jun 22, 2026

Copy link
Copy Markdown

@ggoldani is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Two new timeout configuration utilities (parsePositiveInt, resolveTimeoutMs) are added to src/config.ts. These are wired into src/functions/consolidate.ts and src/index.ts to replace hard-coded timeout values with env-var-driven ones. All static deploy configs raise default_timeout from 180000 to 600000. A new test suite and documentation updates cover the changes.

Changes

Configurable Timeout Support

Layer / File(s) Summary
parsePositiveInt and resolveTimeoutMs utilities
src/config.ts
Adds PositiveIntOptions type with optional min/max bounds, parsePositiveInt to validate and clamp digit-only env string values, and resolveTimeoutMs to return the first valid candidate from an ordered list or fall back to a provided default.
Consolidation and worker invocation timeouts
src/functions/consolidate.ts, src/index.ts
consolidate.ts replaces the hardcoded 30000 ms compress timeout with a value resolved from AGENTMEMORY_CONSOLIDATION_COMPRESS_TIMEOUT_MS / OPENAI_TIMEOUT_MS / AGENTMEMORY_LLM_TIMEOUT_MS (default 120000, bounds 1000–300000). index.ts replaces the fixed 180000 ms invocationTimeoutMs with a value from AGENTMEMORY_INVOCATION_TIMEOUT_MS (default 600000, bounds 1000–600000).
Static default_timeout raised to 600000
iii-config.yaml, iii-config.docker.yaml, deploy/*/entrypoint.sh, eval/scripts/sandbox.sh
All static iii-http worker config files and deploy entrypoint heredocs update default_timeout from 180000 to 600000 to align with the new runtime default.
Timeout configurability regression tests
test/config-timeouts.test.ts
New Vitest suite with in-memory KV/SDK mocks validates parsePositiveInt and resolveTimeoutMs behavior, and two fake-timer integration tests verify that mem::consolidate honors the configured compression timeout and that timed-out compress calls count toward the LLM budget.
Environment variable documentation
.env.example, README.md
Documents the two new timeout env vars (AGENTMEMORY_CONSOLIDATION_COMPRESS_TIMEOUT_MS, AGENTMEMORY_INVOCATION_TIMEOUT_MS) with their valid ranges and commented-out default entries.

Possibly related issues

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 A rabbit once timed things in stone,
But hardcoded waits made him groan.
Now env vars decide,
How long timers glide,
And the tests keep the schedule well-known! 🕐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: make consolidation timeouts configurable' directly describes the main change: adding configurability to consolidation timeouts, which is the primary focus of the PR across config functions, consolidate handler, and worker registration.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/config-timeouts.test.ts`:
- Around line 93-98: The afterEach block currently only cleans up environment
variables but does not restore real timers, which means if a test fails before
individual timer restoration code executes, fake timers remain active and
contaminate subsequent tests. Add a call to jest.useRealTimers() at the end of
the afterEach block to ensure that real timers are always restored as part of
the shared cleanup, preventing timer state from leaking between tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3d01c5f1-4bb0-4c76-b629-21a405c53577

📥 Commits

Reviewing files that changed from the base of the PR and between f6f9e3c and 7e60f8f.

📒 Files selected for processing (13)
  • .env.example
  • README.md
  • deploy/coolify/entrypoint.sh
  • deploy/fly/entrypoint.sh
  • deploy/railway/entrypoint.sh
  • deploy/render/entrypoint.sh
  • eval/scripts/sandbox.sh
  • iii-config.docker.yaml
  • iii-config.yaml
  • src/config.ts
  • src/functions/consolidate.ts
  • src/index.ts
  • test/config-timeouts.test.ts

Comment on lines +93 to +98
afterEach(() => {
delete process.env["AGENTMEMORY_CONSOLIDATION_COMPRESS_TIMEOUT_MS"];
delete process.env["AGENTMEMORY_LLM_TIMEOUT_MS"];
delete process.env["OPENAI_TIMEOUT_MS"];
delete process.env["AGENTMEMORY_INVOCATION_TIMEOUT_MS"];
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Restore real timers in shared cleanup to prevent cross-test leakage.

If a test fails before Line 152 or Line 190, fake timers remain active and can cascade flaky failures into later tests. Move timer restoration into afterEach so cleanup always runs.

Suggested fix
 afterEach(() => {
+  vi.useRealTimers();
   delete process.env["AGENTMEMORY_CONSOLIDATION_COMPRESS_TIMEOUT_MS"];
   delete process.env["AGENTMEMORY_LLM_TIMEOUT_MS"];
   delete process.env["OPENAI_TIMEOUT_MS"];
   delete process.env["AGENTMEMORY_INVOCATION_TIMEOUT_MS"];
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/config-timeouts.test.ts` around lines 93 - 98, The afterEach block
currently only cleans up environment variables but does not restore real timers,
which means if a test fails before individual timer restoration code executes,
fake timers remain active and contaminate subsequent tests. Add a call to
jest.useRealTimers() at the end of the afterEach block to ensure that real
timers are always restored as part of the shared cleanup, preventing timer state
from leaking between tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant