
Prioritize chat over memory synthesis #1061

Merged
joelteply merged 1 commit into canary from fix/persona-chat-inference-priority on May 8, 2026

Conversation

@joelteply
Contributor

Summary

  • default Hippocampus consolidation to raw memory so background memory work does not consume the visible chat inference lane
  • keep semantic compression available only behind CONTINUUM_ENABLE_LLM_MEMORY_SYNTHESIS
  • move Hippocampus to the lowest background cadence, pause it during startup autonomous-work gating, and require strict background backpressure for opt-in semantic synthesis
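
A minimal sketch of the opt-in policy described above, for readers who want the default made concrete. The function name and the environment variable come from this PR; the signature, the `Env` type, and the strict comparison against `"1"` are assumptions made for illustration, not the merged code.

```typescript
// Illustrative sketch only. The function name and env var come from this PR;
// the signature, the Env type, and the exact "1" comparison are assumptions.
type ConsolidationMode = 'raw' | 'semantic';
type Env = Record<string, string | undefined>;

const SYNTHESIS_FLAG = 'CONTINUUM_ENABLE_LLM_MEMORY_SYNTHESIS';

// Returns 'semantic' only when the flag is set to exactly "1"; anything else
// (unset, empty, "0", "true") falls back to 'raw'.
function getDefaultConsolidationMode(env: Env): ConsolidationMode {
  return env[SYNTHESIS_FLAG] === '1' ? 'semantic' : 'raw';
}

// Usage: an empty environment yields the raw default; the flag opts in.
const defaultMode = getDefaultConsolidationMode({});
const optInMode = getDefaultConsolidationMode({ [SYNTHESIS_FLAG]: '1' });
```

Passing the environment in as a parameter, rather than reading it globally, is a choice made here to keep the sketch easy to test; the real function may read the process environment directly.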

Validation

  • npx vitest run tests/unit/memory/HippocampusConsolidationPolicy.test.ts tests/unit/memory/RawMemoryAdapter.test.ts tests/unit/memory/SemanticCompressionAdapter.test.ts --no-color
  • npx tsc --noEmit --project tsconfig.json --pretty false
  • precommit: TypeScript build, modified-file ESLint baseline check, browser ping
  • prepush: TypeScript clean, ESLint baseline-tolerant clean

@joelteply
Contributor Author

LGTM — small surgical fix that targets exactly the 'Compressed Insight' canned-text leak we observed when memory consolidation was sharing the visible-chat inference engine.

Verified

  1. Default consolidation = RAW (no LLM): selectDefaultConsolidationAdapter reads getDefaultConsolidationMode() which defaults to 'raw' unless CONTINUUM_ENABLE_LLM_MEMORY_SYNTHESIS=1. RawMemoryAdapter is a real pass-through (WorkingMemoryEntry → MemoryEntity, no synthesis call) — memory still gets stored, just without the LLM step that was bleeding into chat replies.

  2. Priority 'low' → 'lowest': doc says "Background memory must not compete with visible chat turns." Belt-and-suspenders.

  3. Hippocampus.tick short-circuits on StartupAutonomousWorkGate.isPaused(): same gate as in "Stabilize startup persona backpressure" (#1058). Memory consolidation is deferred during seed/cold-start, consistent with the autonomous-loop pattern.

  4. SemanticCompressionAdapter BackpressureService check upgraded 'low' → 'background': when LLM synthesis IS opted in, it now uses the strictest back-off lane.

  5. Test: covers env-var policy function (default raw, opt-in semantic). 2 cases.
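
The ordering of the checks in items 3 and 4 can be sketched as a single function. This is a simplification under stated assumptions: in the PR the startup gate lives in Hippocampus.tick and the backpressure check lives in SemanticCompressionAdapter, whereas here both are flattened into one hypothetical `tickOnce`. The interface shapes and the `shouldDefer` method are stand-ins invented for the example; only `isPaused()` and the 'low' / 'background' level names appear in the review.

```typescript
// Illustrative sketch only. isPaused() and the 'background' level are named in
// the review; every interface shape and shouldDefer() are hypothetical.
interface StartupGate {
  isPaused(): boolean;
}

type BackpressureLevel = 'low' | 'background';

interface Backpressure {
  // Hypothetical: true when work at this level should back off right now.
  shouldDefer(level: BackpressureLevel): boolean;
}

type TickOutcome = 'paused' | 'deferred' | 'consolidated';

function tickOnce(
  gate: StartupGate,
  backpressure: Backpressure,
  usesLlmSynthesis: boolean,
  consolidate: () => void,
): TickOutcome {
  // 1. The startup gate wins: no memory work during seed/cold-start.
  if (gate.isPaused()) return 'paused';
  // 2. Only the opt-in LLM path consults backpressure, at the strictest level.
  if (usesLlmSynthesis && backpressure.shouldDefer('background')) return 'deferred';
  // 3. Otherwise consolidate; the raw pass-through needs no inference.
  consolidate();
  return 'consolidated';
}

// Usage: raw consolidation still runs while the inference lane is busy.
let runs = 0;
const busy: Backpressure = { shouldDefer: () => true };
const rawOutcome = tickOnce({ isPaused: () => false }, busy, false, () => { runs += 1; });
```

The point of the ordering is that raw consolidation never waits on inference load, while opt-in synthesis yields whenever the background lane reports pressure.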

Non-blocking observations

  • Test coverage gap: the policy function is tested but the larger behavior changes — RawMemoryAdapter selection at construction, 'lowest' priority, StartupAutonomousWorkGate.isPaused() short-circuit, 'background' backpressure level — aren't directly asserted in tests. The first one in particular is the user-visible behavior change. A simple Hippocampus constructor test (env unset → adapter is RawMemoryAdapter; env=1 → SemanticCompressionAdapter) would lock the wiring in. Worth a follow-up.

  • 'lowest' priority value: this assumes the PersonaContinuousSubprocess priority enum supports it (TS would catch it otherwise, and the prepush passed). Worth checking that the enum ranks 'lowest' distinctly below 'low' so this isn't a no-op rename.

  • Documentation: when CONTINUUM_ENABLE_LLM_MEMORY_SYNTHESIS=1, what's the user-facing behavior difference? Briefly: (a) memory entries become semantic summaries instead of raw thoughts, (b) consumes inference budget that could otherwise serve chat. Worth a one-line operator note in the env var docs.
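
A rough shape for the follow-up checks suggested in the first two observations. Everything below is a stand-in: the classes only mirror the names used in this review, the Hippocampus constructor taking an environment object is an assumption, and `PRIORITY_RANK` is a hypothetical numeric ranking, since the real priority type is not shown here. In the repository these would presumably be written as vitest cases against the real classes.

```typescript
// Illustrative sketch only. Names mirror this review, but these are minimal
// stand-ins, not the project's implementations.
type Env = Record<string, string | undefined>;

class RawMemoryAdapter {
  readonly kind = 'raw' as const;
}

class SemanticCompressionAdapter {
  readonly kind = 'semantic' as const;
}

type ConsolidationAdapter = RawMemoryAdapter | SemanticCompressionAdapter;

function selectDefaultConsolidationAdapter(env: Env): ConsolidationAdapter {
  return env['CONTINUUM_ENABLE_LLM_MEMORY_SYNTHESIS'] === '1'
    ? new SemanticCompressionAdapter()
    : new RawMemoryAdapter();
}

// Stand-in: assumes the constructor accepts the environment (hypothetical).
class Hippocampus {
  readonly adapter: ConsolidationAdapter;
  constructor(env: Env) {
    this.adapter = selectDefaultConsolidationAdapter(env);
  }
}

// Hypothetical numeric ranks; the real priority type may be modeled differently.
const PRIORITY_RANK = { lowest: 0, low: 1 } as const;

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

function runWiringChecks(): void {
  // env unset: the user-visible default must be the raw pass-through.
  check(new Hippocampus({}).adapter instanceof RawMemoryAdapter,
    'env unset should select RawMemoryAdapter');
  // env=1: opt-in selects semantic compression.
  check(new Hippocampus({ CONTINUUM_ENABLE_LLM_MEMORY_SYNTHESIS: '1' }).adapter
    instanceof SemanticCompressionAdapter,
    'env=1 should select SemanticCompressionAdapter');
  // 'lowest' must rank strictly below 'low', or the change is a rename only.
  check(PRIORITY_RANK.lowest < PRIORITY_RANK.low, "'lowest' should rank below 'low'");
}

runWiringChecks();
```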

LGTM ship.

@joelteply merged commit bddcb00 into canary on May 8, 2026
3 checks passed
@joelteply deleted the fix/persona-chat-inference-priority branch on May 8, 2026 03:33