
feat(memory): agent memory v2 — second brain with hybrid search, LLM extraction, and observability dashboard #606

Draft
kovtcharov wants to merge 38 commits into main from feature/agent-memory

Conversation

@kovtcharov
Collaborator

@kovtcharov kovtcharov commented Mar 20, 2026

Summary

Comprehensive agent memory system that serves as a second brain — storing, recalling, and learning from every interaction. Built on proven patterns from Mem0, Zep, and Hindsight.

Architecture (v2)

  • Hybrid search: Vector (FAISS) + BM25 (FTS5) + RRF fusion + cross-encoder reranking (ms-marco-MiniLM-L-6-v2)
  • Mem0-style extraction: LLM decides ADD/UPDATE/DELETE/NOOP against existing memory after each conversation turn
  • Zep-inspired fact lineage: superseded_by column preserves history when facts are corrected
  • Hindsight-inspired reconciliation: Background pairwise similarity check detects contradictions across sessions
  • Complexity-aware recall: Adaptive top_k (3/5/10) based on query complexity heuristics
  • Temporal search: time_from/time_to on all search methods for time-based recall
  • Conversation consolidation: Auto-distill old sessions into durable knowledge before 90-day prune
  • No silent fallback: Embeddings are a hard requirement — system fails loudly on misconfiguration
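The RRF fusion step in the hybrid search pipeline can be sketched as follows. This is a minimal illustration, not the PR's implementation: the constant k=60 comes from the original Reciprocal Rank Fusion paper and is an assumption, as are the function and variable names.

```python
def rrf_fuse(rankings, k=60):
    # Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank).
    # k=60 is the constant from the original RRF paper (assumed here; the
    # PR does not state its value).
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m3", "m1", "m7"]  # ids ranked by FAISS embedding similarity
bm25_hits = ["m3", "m9", "m1"]    # ids ranked by FTS5/BM25 score
fused = rrf_fuse([vector_hits, bm25_hits])
```

Items ranked highly by both retrievers (here `m3`) dominate the fused list; the cross-encoder reranker would then rescore only this fused shortlist.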

Schema v2

Three tables (knowledge, conversations, tool_history) with new columns:

  • knowledge.embedding BLOB — 768-dim vector (nomic-embed-text-v2)
  • knowledge.superseded_by TEXT — fact lineage chain
  • conversations.consolidated_at TEXT — consolidation tracking
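A v1→v2 migration over these columns might look like the sketch below. The column names come from the PR; the guard-by-PRAGMA idempotence pattern and the function name are assumptions, since the actual migration code is not shown here.

```python
import sqlite3

def migrate_v1_to_v2(conn: sqlite3.Connection) -> None:
    # Hypothetical migration sketch: add each v2 column only if it is
    # missing, so re-running against an already-migrated DB is a no-op.
    def columns(table):
        return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

    with conn:  # single transaction; rolls back on error
        if "embedding" not in columns("knowledge"):
            conn.execute("ALTER TABLE knowledge ADD COLUMN embedding BLOB")
        if "superseded_by" not in columns("knowledge"):
            conn.execute("ALTER TABLE knowledge ADD COLUMN superseded_by TEXT")
        if "consolidated_at" not in columns("conversations"):
            conn.execute(
                "ALTER TABLE conversations ADD COLUMN consolidated_at TEXT"
            )
```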

Memory Tools (5 LLM-facing tools)

Tool                        Purpose
remember                    Store facts, notes, reminders with category/domain/entity/due_at
recall                      Hybrid semantic+keyword search with temporal filtering
update_memory               Modify existing items, set reminded_at
forget                      Delete a memory item
search_past_conversations   Search conversation history with temporal filtering

Use Cases

  • Note-taking, journaling, meeting notes capture
  • Reminders with due dates and wake-up scheduling
  • Personal knowledge management (research, articles)
  • Contact profiles via entity linking (person:sarah_chen)
  • Error learning and skill capture from tool usage
  • Recurring commitments (LLM advances due_at)

Observability Dashboard

Full-page Memory Dashboard in Agent UI with:

  • Header stats cards (memories, sessions, tool calls, success rate)
  • Activity timeline (30-day heatmap)
  • Knowledge browser (filterable, sortable, paginated table)
  • Tool performance stats
  • Conversation history browser with consolidation status
  • Upcoming/overdue reminders panel
  • Maintenance actions (consolidate, rebuild embeddings, reconcile)
  • Embedding coverage indicator

Startup Sequence

  1. Validate Lemonade → 2. Backfill embeddings → 3. Rebuild FAISS → 4. Confidence decay → 5. Reconcile memory → 6. Consolidate sessions → 7. Prune → 8. Generate session
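The fail-loud ordering above can be expressed as a small driver. This is a sketch under assumed names (StartupError, run_startup, the step callables); the point it illustrates is from the PR: the first failing step aborts boot rather than degrading silently.

```python
class StartupError(RuntimeError):
    pass

def run_startup(steps):
    # Run (name, callable) startup steps strictly in order; the first
    # failure aborts boot loudly — no silent fallback.
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            raise StartupError(f"startup step '{name}' failed") from exc
        completed.append(name)
    return completed

BOOT_STEPS = [
    ("validate_lemonade", lambda: None),
    ("backfill_embeddings", lambda: None),
    ("rebuild_faiss", lambda: None),
    ("confidence_decay", lambda: None),
    ("reconcile_memory", lambda: None),
    ("consolidate_sessions", lambda: None),
    ("prune", lambda: None),
    ("generate_session", lambda: None),
]
order = run_startup(BOOT_STEPS)
```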

Files

Component Files
Component           Files
Data layer          src/gaia/agents/base/memory_store.py
Agent mixin         src/gaia/agents/base/memory.py
System discovery    src/gaia/agents/base/discovery.py
REST API            src/gaia/ui/routers/memory.py
Agent UI            src/gaia/apps/webui/src/pages/MemoryDashboard.tsx
Architecture spec   docs/spec/agent-memory-architecture.md
Unit tests          tests/unit/test_memory_*.py
Integration tests   tests/integration/test_memory_*.py

Design References

System          Pattern adopted
Mem0            LLM-in-the-loop extraction (ADD/UPDATE/DELETE/NOOP)
Zep/Graphiti    Fact lineage via superseded_by, temporal search
Hindsight       Cross-encoder reranking, background reconciliation
ENGRAM          Memory typing (category-based) over knowledge graphs
CoALA           Four-tier cognitive architecture (working/episodic/semantic/procedural)

Test plan

  • Unit tests pass: pytest tests/unit/test_memory_store.py tests/unit/test_memory_mixin.py tests/unit/test_memory_router.py
  • Integration tests pass: pytest tests/integration/test_memory_integration.py tests/integration/test_memory_api_integration.py
  • Schema v2 migration works on existing v1 databases
  • Hybrid search returns semantically relevant results (not just keyword matches)
  • Mem0 extraction correctly handles ADD/UPDATE/DELETE operations
  • Superseded items excluded from search and system prompt
  • Temporal filtering works with time_from/time_to on recall
  • Consolidation distills old sessions into knowledge items
  • Reconciliation detects contradictory facts across sessions
  • Memory Dashboard renders all 6 sections with real data
  • Dashboard knowledge browser supports filter/sort/paginate/edit/delete
  • Lemonade unavailable at startup raises RuntimeError (no silent fallback)
  • Cross-encoder reranking improves precision on ambiguous queries
  • Complexity-aware recall uses adaptive top_k (3/5/10)
  • Frontend build succeeds: cd src/gaia/apps/webui && npm run build

@github-actions github-actions bot added the documentation (Documentation changes), dependencies (Dependency updates), agents (Agent system changes), cli (CLI changes), tests (Test changes), and electron (Electron app changes) labels Mar 20, 2026
Comment thread on src/gaia/agents/base/discovery.py — Fixed
@kovtcharov kovtcharov force-pushed the feature/agent-memory branch from e0eff31 to 068eead on March 21, 2026 at 23:13
itomek and others added 6 commits March 21, 2026 18:10
## Summary

- **`gaia init` now installs RAG dependencies** for `chat`, `rag`, and
`all` profiles — adds `pip_extras` field to profile definitions and a
new `_install_pip_extras()` step that detects editable vs package
install, tries `uv pip` first with `pip` fallback
- **Added `self.rag` None guards** to 8 RAG tools in `rag_tools.py` that
were crashing with `'NoneType' object has no attribute 'index_document'`
when RAG deps not installed
- **Widened ChatAgent RAG init exception catch** from `ImportError` to
`Exception` with warning-level logging and debug traceback
- **Updated Agent UI docs** to include `[rag]` in install instructions
(`[ui,rag]`)

## Test plan

- [x] Lint passing (black, isort, pylint, flake8)
- [x] All 1104 unit tests passing
- [ ] `gaia init --profile chat` installs RAG deps automatically
- [ ] Agent UI document indexing works after `pip install -e ".[rag]"`
- [ ] RAG tools return actionable error when deps not installed (instead
of crashing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
C-1: Guard winreg import and all registry-scanning methods in discovery.py
     so the module loads cleanly on Linux/macOS where winreg is absent.
     Also guard _scan_credential_manager() behind sys.platform check to
     avoid subprocess.CREATE_NO_WINDOW AttributeError on non-Windows.

C-3: Replace direct _lock/_conn access in CLI with two new MemoryStore
     public methods: get_source_counts() and delete_by_source(source).
     delete_by_source() wraps FTS cleanup + DELETE in a single atomic
     transaction with rollback, removing the per-ID loop that could
     leave knowledge/FTS diverged on partial failure.
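The atomic-transaction discipline described for C-3 can be sketched like this. Table and column names (knowledge, knowledge_fts, source) are assumptions based on the commit message; the key property is from the PR: FTS cleanup and the row delete share one transaction, so a partial failure rolls both back.

```python
import sqlite3

def delete_by_source(conn: sqlite3.Connection, source: str) -> int:
    # Sketch of the C-3 fix: remove FTS rows and knowledge rows for a
    # source in a single transaction so a failure cannot leave the FTS
    # index diverged from the knowledge table.
    with conn:  # commits on success, rolls back on any exception
        conn.execute(
            "DELETE FROM knowledge_fts WHERE rowid IN "
            "(SELECT rowid FROM knowledge WHERE source = ?)",
            (source,),
        )
        cur = conn.execute("DELETE FROM knowledge WHERE source = ?", (source,))
        return cur.rowcount
```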

C-4: Add close_store() to memory router module; call it from FastAPI
     lifespan shutdown so the WAL is checkpointed and the SQLite
     connection is released cleanly on server exit.

M-2: list_knowledge endpoint now excludes sensitive items by default.
     New include_sensitive=false query param (default false) controls
     visibility; sensitive=true still filters to sensitive-only.

M-6: Add append-only comment to conversations FTS trigger block noting
     that an AFTER UPDATE trigger would be required if store_turn()
     ever changes to update existing rows.

Tests: +9 tests (394 total) covering get_source_counts, delete_by_source
       rollback discipline, and all three sensitive filter modes in the router.
- Fix _original_user_input=None fallback bug in _after_process_query
  (getattr default ignored None; switch to `or` to handle init state)
- Extract VALID_CATEGORIES/MAX_CONTENT_LENGTH/MAX_TURN_LENGTH and other
  magic numbers to named module-level constants in memory_store.py
- Import constants in memory.py to eliminate duplicate category sets
  and ensure truncation limits stay in sync across all call sites
- DRY: memory router imports VALID_CATEGORIES from data layer instead
  of redefining its own copy
- Clean up unused imports in test files (F401/F811 flake8 violations)
- 394 unit tests passing, flake8 clean
Replace substring `"github.com" in url_lower` with urlparse().hostname
comparison to fix CodeQL CWE-20 "Incomplete URL substring sanitization".
A crafted URL like http://evil.com/github.com could otherwise bypass the
check. Hostname equality/suffix match is unambiguous.
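The hostname-based check might look like the following minimal sketch (function name assumed; the comparison logic mirrors the commit message):

```python
from urllib.parse import urlparse

def is_github_url(url: str) -> bool:
    # Compare the parsed hostname, not a substring of the whole URL, so a
    # path segment like /github.com can no longer satisfy the check.
    host = (urlparse(url).hostname or "").lower()
    return host == "github.com" or host.endswith(".github.com")

assert is_github_url("https://github.com/amd/gaia")
assert not is_github_url("http://evil.com/github.com")
```

The suffix match requires a leading dot, so look-alike hosts such as `evilgithub.com` are also rejected.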
Security:
- recall tool now filters out sensitive items before returning results
  to the LLM — sensitive entries (API keys, credentials) are for
  internal use only and must not appear in tool output.

Performance:
- Add get_by_category_contexts() to MemoryStore: single SQL query with
  WHERE context IN (active, 'global') replaces two separate
  get_by_category() calls in _get_context_items(), halving DB round-trips
  per system-prompt build (was 6 queries, now 3).
- Replace N+1 correlated subquery in get_sessions() with a LEFT JOIN on
  MIN(id) per session — scales linearly regardless of session count.

Reliability:
- Add PRAGMA busy_timeout=5000 so concurrent WAL readers/writers in the
  same process (dashboard REST singleton + ChatAgent) retry for 5 s
  instead of failing immediately with SQLITE_BUSY.
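The busy-timeout setting is a one-line PRAGMA at connection time, sketched here against an in-memory database (the real store opens a WAL-mode file):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Retry for up to 5 s on SQLITE_BUSY instead of failing immediately when
# concurrent readers/writers (dashboard REST singleton + ChatAgent)
# contend for the same database.
conn.execute("PRAGMA busy_timeout=5000")
timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
```

Note that `sqlite3.connect()` also accepts a `timeout=` argument; issuing the PRAGMA explicitly keeps the value visible next to the other connection setup.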

Correctness:
- update_memory tool truncation check now uses MAX_CONTENT_LENGTH constant
  instead of hardcoded 2000, keeping it in sync with memory_store.py.

Testability:
- Replace sys.exit(1) in _bootstrap_chat/_bootstrap_discover/_bootstrap_reset
  helpers with raise RuntimeError; _handle_memory_bootstrap catches and
  exits, making helpers unit-testable in isolation.

Tests (+34):
- TestGetByCategoryContexts (5): single-query context+global fetch
- TestGetAllKnowledgeSortByValidation (4): sort_by whitelist protection
- TestGetSessionsFirstMessageV2 (3): join-based first_message
- test_memory_discovery.py (22): _classify_remote, _classify_path,
  _classify_domain, scan_all structure, Windows guard

428 tests passing, 1 skipped (Windows-only guard on non-Windows).
# Conflicts:
#	src/gaia/agents/chat/agent.py
#	src/gaia/apps/webui/src/App.tsx
#	src/gaia/apps/webui/src/components/ChatView.tsx
#	src/gaia/ui/server.py
@kovtcharov kovtcharov force-pushed the feature/agent-memory branch from d4fdb90 to a06f9cc on April 1, 2026 at 16:31
Comprehensive rewrite of agent-memory-architecture.md as a single
unified design document. Key changes:

- Hybrid search: vector (FAISS) + BM25 (FTS5) + RRF fusion + cross-encoder
  reranking (ms-marco-MiniLM-L-6-v2). No fallback — embeddings are a hard
  requirement.
- Mem0-style LLM extraction: ADD/UPDATE/DELETE/NOOP operations against
  existing memory, replacing naive extract-and-store.
- Zep-inspired fact lineage: superseded_by column preserves history when
  facts are corrected rather than silently overwriting.
- Hindsight-inspired background reconciliation: pairwise similarity check
  on startup detects contradictions missed at extraction time.
- Complexity-aware recall depth: adaptive top_k (3/5/10) based on query
  complexity heuristics.
- Temporal range search: time_from/time_to on all search methods for
  natural time-based recall.
- Conversation consolidation: auto-distill old sessions to durable
  knowledge before 90-day prune.
- Second brain use cases: journaling, meeting notes, PKM, reminders,
  wake-up scheduling, recurring commitments.
- Removed all graceful degradation / silent fallback patterns.
- Removed openjarvis-memory-analysis.md (temp analysis doc).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kovtcharov kovtcharov changed the title from "feat(memory): persistent agent memory system with dashboard UI" to "feat(memory): agent memory v2 — second brain with hybrid search, LLM extraction, and observability dashboard" Apr 1, 2026
Claude Code and others added 7 commits April 1, 2026 15:48
…coverage, temporal+superseded filters

- POST /api/memory/consolidate, /reconcile, /rebuild-embeddings
- GET /api/memory/embedding-coverage
- Updated GET /api/memory/knowledge with include_superseded, time_from, time_to
- Updated GET /api/memory/stats with embedding coverage and reconciliation stats
- 95 tests passing, lint clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d_by, temporal search, consolidation

- Schema v1→v2 migration: embedding BLOB, superseded_by TEXT, consolidated_at TEXT
- New methods: store_embedding, get_items_with/without_embeddings, get_unconsolidated_sessions, mark_turns_consolidated, get_items_for_reconciliation
- Updated search() with time_from/time_to, superseded_by IS NULL, use_count increment
- Updated all query methods with superseded_by IS NULL filter
- 275 tests passing, lint clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…LLM extraction, temporal recall

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… FAISS, API integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ledge browser, activity timeline, tool stats

6-section dashboard: header stat cards, 30-day activity bar chart,
paginated knowledge browser with entity/category/context/search filters,
tool performance table, conversation history with FTS search,
upcoming & overdue temporal panel.

Features:
- Embedding coverage indicator with progress bar
- Maintenance dropdown: consolidate, rebuild embeddings, reconcile, rebuild FTS
- Click-to-expand knowledge row detail (metadata, timestamps, superseded_by chain)
- Inline actions: edit, delete, toggle sensitive, copy ID
- Superseded entries toggle with server-side filtering
- Toast notification system for all CRUD and maintenance operations
- Brain icon in sidebar for navigation
- Keyboard support: Escape key (layered close), Enter/Space on rows
- ARIA labels, roles, and aria-live for accessibility
- Responsive layout (3 breakpoints)
- Relative date formatting ("in 2 days", "3 days ago")
- API calls aligned with backend router field names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…em0 extraction, consolidation, reconciliation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The backend returns metadata as parsed JSON (dict), not a string.
Rendering it directly showed [object Object]. Now uses
JSON.stringify for object metadata and plain text for strings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude Code and others added 4 commits April 1, 2026 15:52
…e cases

- Strengthen conversation context filtering test with explicit zero-result
  assertions instead of vacuous loop
- Add due_at validation, empty-list consolidation, and history limit tests
- Remove dead _past_iso import from API test file
- 117 tests, all passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…m0 extraction, consolidation, reconciliation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…up scope includes entity, dynamic context always returns time

- MemoryStore.search(): corrected from "hybrid" to "FTS5 keyword search" (hybrid is MemoryMixin._hybrid_search)
- get_memory_dynamic_context(): fixed "returns empty" claim — always returns current time
- store() dedup scope: category+context+entity, not category+context
- get_items_with_embeddings(): added missing top_k, time_from, time_to params
- _classify_query_complexity: added missing medium/complex signal words
- get_entities(): added missing last_updated field in return
- Added undocumented update_confidence() and delete_by_source() methods
- update(): noted embedding cleared on content change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… fixes

- memory_store.py: set embedding=NULL when content changes in update()
  to force re-embedding (stale embedding would return wrong results)
- server.py: alphabetize router imports
- test fixes: formatting cleanup, mixin test updates from parallel tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude Code and others added 4 commits April 1, 2026 16:55
…aces, expand test coverage

- Replace hardcoded mixin prompt dispatch in Agent._get_mixin_prompts() with
  auto-discovery of all get_*_system_prompt() methods — no manual registration needed
- ChatAgent._get_mixin_prompts() simplified to call super() then filter SD when uninitialized
- Fix KeyError in _EXTRACTION_PROMPT.format() by escaping literal curly braces
- Update test_memory_mixin: reflect always-present memory instructions, fix embed/extract tests
- Add test_memory_store coverage: update clears embedding, dedup clears embedding,
  superseded exclusion in get_by_entity/get_upcoming/get_by_category_contexts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
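The auto-discovery dispatch described in this commit can be sketched as follows. Class and method names mirror the commit message, but the implementation details (using `dir()` with a name-pattern filter) are assumptions, since the actual code is not shown on this page.

```python
class Agent:
    def _get_mixin_prompts(self):
        # Auto-discover every get_*_system_prompt() method on the instance
        # so new mixins need no manual registration in a dispatch table.
        prompts = []
        for name in sorted(dir(self)):
            if name.startswith("get_") and name.endswith("_system_prompt"):
                prompts.append(getattr(self, name)())
        return prompts

class MemoryMixin:
    def get_memory_system_prompt(self):
        return "=== MEMORY (Persistent Second Brain) ==="

class ChatAgent(MemoryMixin, Agent):
    pass

prompts = ChatAgent()._get_mixin_prompts()
```

Any mixin added later that defines a `get_*_system_prompt()` method is picked up automatically by the base class.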
…nce fix

- formatRelativeDate: add seconds granularity (6s ago, 45s ago), fall
  back to full date+time (not date-only) for items older than 30 days
- Swap formatDate → formatRelativeDate in Updated column, conversation
  turn timestamps, and session last-activity
- memory_store.get_stats(): add total_retrievals (SUM of use_count)
- Add "Retrieved" stat card (gold accent) with avg recalls per memory
- Rename "Memories" card label to "Stored" for clarity
- remember() tool now stores with confidence=0.7 (was defaulting to 0.50)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add GET/PUT /api/memory/settings endpoint (mcp_memory_enabled key)
  stored in ChatDatabase settings table
- agent_ui_mcp.py: conditionally register memory_stats, memory_list,
  memory_recall tools when mcp_memory_enabled=true at server start
- memoryApi.ts: add getMemorySettings / updateMemorySettings helpers
- Dashboard: Settings section with toggle switch, loads on open,
  persists immediately on click; hint explains restart requirement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…resent instructions block, new header

_build_stable_memory_prompt() now always emits instructions even with zero
memories. Update docs to reflect new header (=== MEMORY (Persistent Second
Brain) ===), the instructions block, the zero-memories fallback text, and
the 4000-char hard cap note.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the mcp MCP integration changes label Apr 2, 2026
Claude Code and others added 13 commits April 1, 2026 17:17
Remove the disabled attribute from the chat textarea so users can
compose their next message while the agent is still generating a
response. Sending is still blocked until streaming completes.
7 test classes covering the full memory second-brain pipeline:
- TestMemoryRememberRecall: store/search round-trip, confidence bumps,
  entity tagging, no false positives
- TestMemoryNoteTaking: categories, reminders with due dates, context
  isolation, sensitive items, error/skill notes
- TestMemoryJournaling: multi-turn conversation storage, FTS over
  history, session isolation, cross-session knowledge persistence
- TestMemoryConfidenceAndDedup: explicit vs LLM-extracted confidence,
  dedup on overlap, supersession exclusion, embedding cleared on update
- TestMemoryStatsAPI: total_retrievals in stats, REST endpoint contract
- TestMemorySettingsAPI: GET/PUT mcp_memory_enabled default + persistence
- TestMemoryMixinSystemPrompt: preferences/facts in stable prompt,
  sensitive exclusion, dynamic context timestamps, conversation storage

All run without Lemonade — deterministic fake embeddings, mocked LLM.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ions

Reconcile and consolidate were silently no-ops because self.chat
(AgentSDK) isn't available when init_memory() runs (before Agent.__init__).
Fix: defer both LLM-dependent steps to first process_query() invocation
via _memory_post_init_pending flag and _run_memory_post_init() helper.
Remove stale hasattr(self, 'chat') guards from both methods.

Additional v2 completions staged alongside:
- system_context.py: day-0 OS/hardware/software context collection
- memory_store.py: v2 store enhancements (WAL checkpoint, get_* methods)
- cli.py: memory CLI commands (status, clear, export, context)
- agent_ui_mcp.py: MCP memory access toggle (default disabled)
- ui/routers/memory.py: memory router v2 endpoints
- docs/spec/agent-memory-architecture.pdf: architecture spec
- tests: 4 new deferred-init tests + store/router/eval coverage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- discovery.py: add _make_profile_fact() for user profile facts
- cli.py: first-boot onboarding flow — detect missing profile, offer
  quick intro (~1 min) on first gaia chat launch
- _chat_helpers.py: _register_agent_memory_ops() wires live ChatAgent
  LLM + FAISS into memory router for consolidation / reconciliation
- memory.py, memory_store.py: MemoryMixin v2 improvements
- memory.py router: updated endpoints
- tests: test_memory_router updates

plans/autonomous-agent-mode.md: detailed spec for the upcoming
feature/agent-always-running branch (loop state machine, request_user_input
tool design, AgentLoop architecture, UI components, open questions)
…temporal examples, ASCII arrows

- Replace all DB-log agent responses with conversational voice (12 occurrences)
- Add missing error_auto source (0.5) to confidence table
- Convert confidence bullet list to scannable table with all 6 sources
- Show real agent output in temporal recall examples instead of [internal process]
- Add time-range example to search_past_conversations
- Fix update_memory double Agent: line — separate output from internal annotation
- Replace Unicode arrows (→) with ASCII (->) in fact lineage code block

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d + spec v2

Memory store:
- Add 'goal' and 'task' to VALID_CATEGORIES for autonomous agent use;
  goals/tasks require approved_for_auto=True metadata before the loop acts

Sessions:
- Add private: bool field to sessions (CreateSessionRequest, UpdateSessionRequest,
  SessionResponse) — private sessions are never touched by the autonomous loop

Memory dashboard & UI:
- MemoryDashboard, Sidebar, ChatView, memoryApi.ts updates
- types/index.ts, api.ts, utils.py additions

Backend:
- _chat_helpers.py, database.py, routers/memory.py, routers/sessions.py updates
- models.py: private session field

Tests:
- test_memory_router.py updates

Spec:
- plans/autonomous-agent-mode.md v2 — addresses all 15 security/design issues
  from critique: event-driven triggers, background_mode immediate-deny,
  memory injection prevention, audit log as P1, PathValidator deadlock fix,
  SCHEDULE bounds, tunnel safety, private session exclusion, user input queue,
  __NO_RESPONSE__ sentinel, permission overlay context, step budgets
…2, fix tool-message sentinel bug

- Add GoalStore (goal_store.py) with state-machine goals/tasks and goals router
- Expand SystemDiscovery with UserAssist, recent file types, gaming/media, macOS app usage scans
- Replace goal/task categories in MemoryStore with permission category; export GoalStore from base
- Extend memory router with goals CRUD, upcoming, conversation search, stats v2, settings
- Memory dashboard v2: goals panel, search, upcoming, knowledge CRUD, conversation history
- Fix sdk.py: remove "continue" sentinel from _prepare_messages_for_llm; convert tool messages
  to user role (was assistant) so LLM sees tool results as a proper user turn, not a bare
  "continue" command that caused nonsensical responses like "What do you want me to continue?"
- Add test_sdk_tool_messages.py regression tests for the sentinel fix
- Update bootstrap inference with app-usage, file-type, and gaming/media sections

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…istory storage

Qwen3.5 reasoning models emit <think>...</think> blocks before their JSON
response.  These were passed raw into _parse_llm_response and stored verbatim
in conversation_history.  On the next turn the LLM saw its own prior thinking
as historical assistant context and used it to infer what the user "must have"
said, producing replies unrelated to the actual current message (e.g. answering
a games question when the user asked to set a reminder).

Strip <think>...</think> from both the main response and the plan_response paths
immediately after the LLM returns, before parsing and before storing in messages.
The SSE streaming handler already filtered these for display; this fixes the
agent-side persistence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
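The `<think>`-stripping fix might look like this minimal sketch; the regex and function name are assumptions, but the behavior (remove the block before parsing and before storing in message history) follows the commit message above.

```python
import re

# Non-greedy and DOTALL so multi-line reasoning blocks are removed whole,
# along with trailing whitespace before the real response.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think(text: str) -> str:
    return _THINK_RE.sub("", text)

clean = strip_think('<think>\nuser wants a reminder\n</think>\n{"tool": "remember"}')
```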
…journal system prompt

- recall() now accepts domain= parameter for journal/sub-type filtering
  (e.g. recall(category='note', domain='journal') returns all journal entries)
- get_by_category() gains domain= filter parameter with SQL pushdown
- Default limit raised to 20 for list-style queries (no query text)
- Added perf_counter timing in recall tool (DEBUG level per-step, INFO total)
- System prompt now teaches agent:
    - 'Show my journal' → recall(category='note', domain='journal')
    - 'What reminders?' → recall(category='reminder')
- Tightened storage IMPERATIVE for set-a-reminder / journal-entry requests
- MIN_EXTRACTION_WORDS lowered 20→5, EXTRACTION_TIMEOUT_S raised 3→8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e eval scenarios

recall() tool:
- Expanded docstring to document both search and browse/list modes
- Added offset= parameter for pagination (browse page 2, etc.)
- Max limit raised to 100 for list-style queries
- has_more flag in response signals more results available

Eval personas (simulator.md + runner.py):
- Added home_user, small_business_owner, student, creative_professional
- Each persona has distinct communication style and context

New memory eval scenarios (4):
- memory_todo_tracking: persistent todo list (add, complete, list)
- memory_notes_capture: second-brain quick note capture & retrieval
- memory_small_business_context: business context + customer prefs
- memory_student_study_assistant: courses, deadlines, learning prefs
- memory_home_user_basics: simple personal facts (non-technical)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New memory_health_tracking scenario tests the agent as a personal health
journal: logging sleep, exercise, and habits, then analyzing patterns
and giving personalized recommendations based on accumulated data.

Uses home_user persona for non-technical communication style.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ChatAgent personality rules:
- GREETING RULE now memory-aware: references stored name/project instead
  of generic "What are you working on?" template
- Added FACT-SHARING RULE: when user shares personal info ("I'm Sam",
  "I use Python"), agent must acknowledge the specific content, not give
  a generic greeting response
- Removed hardcoded "RIGHT: Hey! What are you working on?" example that
  the LLM was copying verbatim for every response

Memory instructions:
- Greeting personalization examples added (use stored name + project)
- Shortened some instruction lines to reduce prompt bloat

Root cause: personality greeting examples were overriding memory context,
causing identical "Got it, Sam! What are you working on?" responses
regardless of what the user actually said.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… CLI

- Add AgentLoop background task for autonomous goal execution with
  event-driven state machine (IDLE/RUNNING/SCHEDULED/PAUSED)
- Expand MCP router with connection health, tool list, and control endpoints
- Add agent-mode and memory toggle controls to SettingsModal (frontend + API)
- Extend CLI with memory/goal/agent-mode subcommands
- Update GoalStore with scheduling, priority sorting, and rate-limit guards
- Wire SSE handler to stream AgentLoop state-change events
- Update memory router and database schema for new fields
- Fix discovery scan edge cases and tool-message sentinel handling

All 1891 unit tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the chat (Chat SDK changes), eval (Evaluation framework changes), and performance (Performance-critical changes) labels Apr 3, 2026
yield _sse({"type": "done", "total": 0})

return StreamingResponse(
_generate(),

Check warning

Code scanning / CodeQL

Information exposure through an exception (Medium)

Stack trace information flows to this location and may be exposed to an external user.
antmikinka added a commit to antmikinka/gaia that referenced this pull request Apr 8, 2026
…tration-v1

Full recursive pipeline analysis (planning-analysis-strategist →
software-program-manager → quality-reviewer → technical-writer-expert)
of amd#606 (feat(memory): agent memory v2 — kovtcharov).

Key findings:
- 4 HIGH severity collisions: _chat_helpers.py, database.py,
  sse_handler.py, routers/mcp.py — all follow same pattern:
  our branch created comprehensive modules where PR amd#606 made
  targeted additions. Resolution: absorb PR's additions into ours
  during post-merge rebase.
- 1 ZERO conflict: sdk.py ChatSDK→AgentSDK rename is identical
  in both branches — auto-resolves on merge.
- 6 build-upon opportunities: MemoryMixin for pipeline agents,
  GoalStore↔PipelineExecutor wiring, AgentLoop convergence,
  SystemDiscovery→DomainAnalyzer calibration, GapDetector caching,
  declarative memory tool-calls in component-framework templates.
- Recommended: PR amd#606 lands in main first, we rebase and absorb.
- Open Items 9–15 added to branch-change-matrix.md tracking
  all conflicts and Phase 6 build-upon work.

Files: docs/reference/pr606-integration-analysis.md (531 lines),
       docs/reference/branch-change-matrix.md (+16 lines, OI 9–15)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@antmikinka
Collaborator

PR #606 Integration Requirements

What agent-memory-v2 Must Change to Fit the Pipeline Architecture

Document version: 1.0
Date: 2026-04-13
From: Pipeline orchestration team (feature/pipeline-orchestration-v1)
To: kovtcharov, author of PR #606 (agent-memory-v2)
Status: Pre-merge review — action required before merge approval


Who This Is For and What It Covers

This document is addressed to you, kovtcharov. You know your PR deeply. You do not know our branch. This document gives you all the context you need inline — you will not have to look at feature/pipeline-orchestration-v1 or ask us clarifying questions before you start working.

Your PR introduces an agent memory subsystem: MemoryMixin, AgentLoop, GoalStore, MemoryStore, SystemDiscovery, and a set of agent tools registered via _register_agent_memory_ops(). This is valuable work and we want it merged. However, PR #606 currently conflicts with six established integration points on feature/pipeline-orchestration-v1 and one architectural boundary that will block Phase 6 convergence if left unfixed.

This document defines:

  1. Six merge-blockers (REQ-1 through REQ-5, REQ-8) — changes your PR must make before it can merge into the pipeline branch.
  2. Two rebase-blockers (REQ-6, REQ-7) — changes that do not block today's merge but that will cause painful conflicts within 1-2 sprints if deferred past the next rebase.
  3. Five recommendations (REC-1 through REC-5) — improvements that make the integration cleaner but are not gating conditions.
  4. Our half of the contract (BU-1 through BU-4) — what we commit to building on top of your memory system once the above requirements are met.

Read Section 7 (Quick-Start Checklist) if you want to start immediately. Items are ordered by priority and time cost.


Our Pipeline Architecture — Context for Kovtcharov

The feature/pipeline-orchestration-v1 branch is AMD's internal pipeline orchestration system for GAIA. It extends the core Agent UI (src/gaia/ui/) with a recursive, multi-phase pipeline runtime. Here is what we have built, stated precisely so you can orient your changes:

Pipeline orchestrator (src/gaia/pipeline/orchestrator.py, 681 lines): Drives multi-stage agentic pipelines. The central function for recursive execution is _execute_recursive_pipeline() at line 556. This function manages its own async event loop explicitly (lines 656-661) because FastAPI's lifespan loop and Uvicorn's thread pool interact in ways that make loop injection non-trivial — but we have solved this for our use case. Your AgentLoop must not replicate this pattern; see REQ-7 for why.

Pipeline engine (src/gaia/pipeline/engine.py): PipelineEngine class. Drives PipelinePhase state transitions. We intend to converge PipelineEngine and AgentLoop in Phase 6, which is architecturally blocked unless AgentLoop exposes a drivable async interface. See REQ-7.

SSE streaming layer (src/gaia/ui/sse_handler.py, 950 lines): SSEOutputHandler class at line 89 queues typed JSON events for the browser. The pipeline runtime emits six event type strings that are hardwired into the frontend. These strings are reserved. See REQ-3.

Frontend type contract (src/gaia/apps/webui/src/types/index.ts, src/gaia/apps/webui/src/services/api.ts): The browser's TypeScript layer has a StreamEventType union (lines 260-265 of index.ts) that explicitly lists our six pipeline event names. The routing table PIPELINE_EVENT_MAP (lines 626-643 of api.ts) maps those exact string literals to frontend callbacks. Any SSE event type string you emit that collides with one of those six names will be silently misrouted to a pipeline UI callback. See REQ-3.

Agent cache (src/gaia/ui/_chat_helpers.py, 1,144 lines): Live agent instances are stored in _agent_cache: dict[str, dict] (line 54). Each entry is keyed by session_id and contains {"agent": <instance>, "model_id": str, "document_ids": list}. Access is serialized by _agent_cache_lock = threading.Lock() (line 57). Your _register_agent_memory_ops() must use this interface. See REQ-1.

Database schema and migration (src/gaia/ui/database.py, 787 lines): ChatDatabase class. The schema (SCHEMA_SQL, lines 23-76) is executed first; then _migrate() (line 118) runs incremental column additions. _migrate() follows a strict pattern — each column addition is its own try/except block using PRAGMA table_info. This pattern exists so a failed migration on one column does not block all subsequent migrations. Your new columns must follow this pattern. See REQ-2.

Shutdown sequence (src/gaia/ui/server.py, lines 209-213): The application lifespan closes resources in strict order: monitor.stop() at line 210, then db.close() at line 212. If your MemoryStore or GoalStore has a close_store() call, it must go between those two lines. After db.close(), the SQLite connection is gone — any write after that point raises OperationalError. See REQ-4.

Tool registry (src/gaia/agents/base/tools.py): ToolRegistry.register() at line 396 writes directly to self._tools[name] = {...} with no collision guard. When we add MemoryMixin to multiple pipeline agents (BU-1), each agent instantiation will call _register_agent_memory_ops(), registering remember, recall, update_memory, forget, and search_past_conversations five times. Silent overwrites will mask configuration errors. See REQ-6.


Section 1: Required Changes

The following eight items must be addressed before your PR can merge into feature/pipeline-orchestration-v1. Items marked [MERGE-BLOCKER] block the merge vote. Items marked [REBASE-BLOCKER] will cause architectural conflicts on the next rebase if not resolved now.


REQ-1 — Fix Agent Cache Access in _register_agent_memory_ops()

Classification: MERGE-BLOCKER
Estimated effort: 30 minutes
File to change: The file in your PR that contains _register_agent_memory_ops()

What your PR currently does:

Your _register_agent_memory_ops() function accesses the running agent instance through a path that does not match the live agent cache structure in _chat_helpers.py. The specifics depend on how your PR currently retrieves the agent, but the integration point you must target is described below.

What it must do instead:

The live agent for a session is accessed as:

# File: src/gaia/ui/_chat_helpers.py
# Line 54:
_agent_cache: dict[str, dict]
# Structure: session_id -> {"agent": ChatAgent, "model_id": str, "document_ids": list}

# Line 57:
_agent_cache_lock = threading.Lock()

To retrieve a live agent safely, your code must:

  1. Acquire _agent_cache_lock before reading _agent_cache.
  2. Call _agent_cache.get(session_id) — not direct key access.
  3. Check the return value for None before indexing into it.
  4. Access the agent as entry["agent"].
  5. Return early (or raise a descriptive error) if session_id is missing from the cache.

The _store_agent() function (lines 95-111 of _chat_helpers.py) shows the reference write pattern. Mirror that lock discipline when you read.

Why this matters:

The _agent_cache_lock is a threading.Lock, not a reentrant lock. If _register_agent_memory_ops() reads _agent_cache without holding the lock, and a concurrent chat request evicts the entry (line 87 of _chat_helpers.py), you will get a KeyError or read a stale agent reference. The eviction path at line 87 (del _agent_cache[session_id]) runs while holding the lock, so your read must also hold it.

Correctness checklist for REQ-1:

  • Lock is acquired before reading _agent_cache
  • _agent_cache.get(session_id) used (not _agent_cache[session_id])
  • Return early if session_id not in cache (do not raise unhandled KeyError)
  • Agent accessed as entry["agent"]
  • Lock released in finally block or via with statement

REQ-2 — Add New Schema Columns via the _migrate() Pattern

Classification: MERGE-BLOCKER
Estimated effort: 1-2 hours
File to change: src/gaia/ui/database.py

What your PR currently does:

Your PR adds columns (knowledge.embedding BLOB, knowledge.superseded_by TEXT, conversations.consolidated_at TEXT) either directly in SCHEMA_SQL on existing tables or via a migration block that does not follow the established pattern.

What it must do instead:

The _migrate() method (line 118 of database.py) uses a strict pattern: one try/except block per column, each block using PRAGMA table_info to check existence before adding. The last existing block ends at line 176. New column additions must come after line 176.

The required pattern for each column is:

# After line 176 in _migrate()

# <Descriptive comment for the migration>
try:
    cols = [
        row[1]
        for row in self._conn.execute("PRAGMA table_info(<table_name>)").fetchall()
    ]
    if "<column_name>" not in cols:
        self._conn.execute(
            "ALTER TABLE <table_name> ADD COLUMN <column_name> <TYPE>"
        )
        self._conn.commit()
        logger.info("Migrated <table_name> table: added <column_name> column")
except Exception as e:
    logger.debug("Migration check for <column_name>: %s", e)

Apply this pattern three times, one per column:

  • knowledge.embedding BLOB — requires knowledge table to exist first
  • knowledge.superseded_by TEXT — requires knowledge table to exist first
  • conversations.consolidated_at TEXT — requires conversations table to exist first
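Instantiated for the first column, the block would look like the following. This is a sketch run against a throwaway in-memory database; in database.py, `conn` would be `self._conn` and the block would sit after line 176 of `_migrate()`.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knowledge (id TEXT PRIMARY KEY, content TEXT)")

# knowledge.embedding — 768-dim vector blob (nomic-embed-text-v2)
try:
    cols = [
        row[1]
        for row in conn.execute("PRAGMA table_info(knowledge)").fetchall()
    ]
    if "embedding" not in cols:
        conn.execute("ALTER TABLE knowledge ADD COLUMN embedding BLOB")
        conn.commit()
        logger.info("Migrated knowledge table: added embedding column")
except Exception as e:
    logger.debug("Migration check for embedding: %s", e)
```

Running the same block a second time is a no-op: PRAGMA table_info now lists "embedding", so the ALTER TABLE is skipped and no exception is raised.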

If knowledge and conversations are new tables:

If your PR introduces these tables as entirely new (not columns on existing tables), add their CREATE TABLE IF NOT EXISTS statements to SCHEMA_SQL (lines 23-76 of database.py). SCHEMA_SQL is executed via executescript() at line 107, which runs before _migrate() at line 108. New columns on new tables may be declared inline in SCHEMA_SQL. Only columns being added to existing, already-deployed tables require the _migrate() try/except pattern.

Why this matters:

The _init_schema() method (line 105) calls executescript(SCHEMA_SQL) then _migrate(). The try/except-per-column pattern in _migrate() ensures that a failure adding embedding (e.g., because the column already exists on a pre-existing database from an older build) does not prevent superseded_by from being added. If you bundle all three columns into one try/except block, a failure on the first column silently skips the others with no error surfaced to the user.

Correctness checklist for REQ-2:

  • Each new column is a separate try/except block
  • All three blocks are placed after line 176 of _migrate()
  • Each block checks PRAGMA table_info before ALTER TABLE
  • New tables (if any) are in SCHEMA_SQL, not in _migrate()
  • _conn.commit() is called after each successful ALTER TABLE

REQ-3 — Rename AgentLoop SSE Event Types to Avoid Frontend Collision

Classification: MERGE-BLOCKER
Estimated effort: 2-3 hours (implementation) + cross-team coordination (1-2 days)
Files to change: All files in your PR that emit SSE event type strings from AgentLoop

What your PR currently does:

Your AgentLoop emits SSE events with type strings that include one or more of the six names reserved by the pipeline orchestration layer.

The six reserved names (do not use any of these):

  Reserved string    Where it is hardwired                       Effect of collision
  loop_back          types/index.ts line 260, api.ts line 637    Routed to onLoopBack pipeline callback
  quality_score      types/index.ts line 261, api.ts line 638    Routed to onQualityScore pipeline callback
  phase_jump         types/index.ts line 262, api.ts line 639    Routed to onPhaseJump pipeline callback
  iteration_start    types/index.ts line 263, api.ts line 640    Routed to onIterationStart pipeline callback
  iteration_end      types/index.ts line 264, api.ts line 641    Routed to onIterationEnd pipeline callback
  defect_found       types/index.ts line 265, api.ts line 642    Routed to onDefectFound pipeline callback

The StreamEventType union lives at lines 260-265 of src/gaia/apps/webui/src/types/index.ts. The routing table PIPELINE_EVENT_MAP is a Record<string, keyof PipelineStreamCallbacks> at lines 626-643 of src/gaia/apps/webui/src/services/api.ts. These strings are literal TypeScript values, not constants — they are compared directly against the type field of incoming SSE JSON objects.

What it must do instead:

You must propose replacement names for every AgentLoop event type that collides with the six names above. Send us your proposed names before writing any code — we will confirm no collision exists on our side, then you implement.

Suggested naming convention for AgentLoop events: prefix with memory_ or agent_loop_ to create a distinct namespace. Examples (not binding — send us your proposed list):

  • agent_loop_cycle_start instead of iteration_start
  • agent_loop_cycle_end instead of iteration_end
  • memory_quality_check instead of quality_score
  • agent_loop_back instead of loop_back
  • agent_phase_transition instead of phase_jump
  • memory_defect_detected instead of defect_found

Why this matters:

There is no runtime error when a collision occurs. The frontend PIPELINE_EVENT_MAP routes the event to a pipeline callback. The pipeline callback expects specific fields (iteration, score, defect_type, etc.) that your AgentLoop event does not carry. The pipeline UI component renders with undefined fields or renders stale pipeline state data instead of memory state data. This failure mode is invisible at the Python layer and produces silently wrong UI behavior.

Cross-team coordination required:

Do not implement this change unilaterally. Contact us (pipeline team, feature/pipeline-orchestration-v1) with your proposed replacement name list. We confirm. Then you implement and update REQ-5's MemoryDashboard.tsx accordingly. See Section 6 for the full coordination gate protocol.

Correctness checklist for REQ-3:

  • No AgentLoop SSE event uses any of the 6 reserved strings listed above
  • Replacement names agreed bilaterally before implementation
  • All Python-side emitters updated to the new names
  • MemoryDashboard.tsx event type strings updated to match (see REC-5)

REQ-4 — Insert close_store() in the Correct Position in the Shutdown Sequence

Classification: MERGE-BLOCKER
Estimated effort: 15 minutes
File to change: src/gaia/ui/server.py

What your PR currently does:

Your PR calls close_store() (or equivalent) either after db.close() at line 212, or registers a shutdown hook that runs after the lifespan context exits.

What it must do instead:

The lifespan shutdown block in server.py runs from lines 209-213:

# Line 209 (context yield exits here, shutdown begins)
# Line 210:
        await monitor.stop()
        logger.info("Document file monitor stopped")
# Line 212:
        db.close()
        logger.info("Database connection closed")

Your close_store() call must be inserted between line 210 and line 212 — after monitor.stop() completes, before db.close() is called. The position must look like:

        await monitor.stop()
        logger.info("Document file monitor stopped")
        # INSERT: close_store() here — after monitor, before db
        close_store()
        logger.info("Memory store closed")
        db.close()
        logger.info("Database connection closed")

Why this matters:

db.close() at line 212 calls self._conn.close() on the SQLite connection and then sets self._conn = None (lines 179-182 of database.py). The _transaction() context manager (lines 184-195 of database.py) checks if self._conn is None and raises RuntimeError("Database connection is closed"). If your MemoryStore shares the ChatDatabase connection (or opens its own WAL-mode connection to the same file), a close_store() that runs after db.close() will either operate on a closed connection or fail to flush pending writes before shutdown.

If MemoryStore uses a completely separate file and connection, the ordering still matters for clean log sequencing and to match our lifespan expectations for future shutdown hooks.

Correctness checklist for REQ-4:

  • close_store() is called after monitor.stop() (line 210) and before db.close() (line 212)
  • close_store() is idempotent (safe to call even if store was never opened)

REQ-5 — Do Not Add dependencies to APIRouter in routers/mcp.py

Classification: MERGE-BLOCKER
Estimated effort: 15 minutes
File to change: src/gaia/ui/routers/mcp.py

What your PR currently does:

Your PR adds authentication or dependency injection to the MCP router by modifying the APIRouter constructor at line 16 of routers/mcp.py.

What it must do instead:

Line 16 of routers/mcp.py currently reads:

router = APIRouter(tags=["mcp"])

This line must remain exactly as it is — no dependencies=[...] argument. If your new control endpoints (the ones that trigger memory operations via MCP) require authentication, add the dependencies argument at the individual endpoint level using the @router.post(...) decorator, not at the APIRouter level.

Example of the correct pattern:

# WRONG — do not do this:
router = APIRouter(tags=["mcp"], dependencies=[Depends(verify_auth)])

# CORRECT — per-endpoint auth:
router = APIRouter(tags=["mcp"])

@router.post("/api/mcp/memory/control", dependencies=[Depends(verify_auth)])
async def memory_control_endpoint(...):
    ...

Why this matters:

The existing catalog endpoints — GET /api/mcp/catalog, GET /api/mcp/catalog/{name}, and GET /api/mcp/install-config — are intentionally unauthenticated. They serve a read-only curated catalog of MCP server definitions. These endpoints are accessed by the GAIA installer and by external tooling during first-run setup, before any auth tokens exist. Adding dependencies=[...] to the APIRouter constructor applies that dependency to all routes registered on the router, including these unauthenticated catalog routes. This breaks the installer flow with a 401 or 403 on the catalog fetch.

Correctness checklist for REQ-5:

  • Line 16 of routers/mcp.py reads router = APIRouter(tags=["mcp"]) — no dependencies argument
  • Any auth on new endpoints is applied via @router.post(..., dependencies=[...]) at the individual route level
  • GET /api/mcp/catalog, GET /api/mcp/catalog/{name}, and GET /api/mcp/install-config remain unauthenticated

REQ-6 — Add Collision Guard to ToolRegistry.register() Before MemoryMixin Rollout

Classification: REBASE-BLOCKER
Estimated effort: 1 hour
File to change: src/gaia/agents/base/tools.py

What the current code does:

ToolRegistry.register() at line 396 of tools.py writes:

self._tools[name] = {
    "name": name,
    "function": func,
    "description": description or (func.__doc__ or ""),
    "parameters": params,
    "atomic": atomic,
    "display_name": display_name or name,
}

There is no check for whether name already exists in self._tools. A second call with the same name silently overwrites the previous registration with no log, no warning, and no error.

What it must do instead:

Add a logger.warning() call immediately before the assignment at line 396:

if name in self._tools:
    logger.warning(
        "ToolRegistry: tool name %r is already registered and will be overwritten. "
        "This may indicate duplicate mixin registration.",
        name,
    )
self._tools[name] = {
    ...
}

Why this matters now (not later):

Our BU-1 (described in Section 3) adds MemoryMixin to all five pipeline stage agents: DomainAnalyzer, PlannerAgent, ExecutorAgent, ReviewerAgent, and SynthesizerAgent. Each agent instantiation calls _register_agent_memory_ops(), which registers your five memory tool names: remember, recall, update_memory, forget, search_past_conversations. All five agents share a single process and, in some pipeline configurations, a shared ToolRegistry instance. That means your five tool names will be registered five times.

Silent overwrites in this scenario will mask real bugs: if DomainAnalyzer's remember function is accidentally overwritten by SynthesizerAgent's remember function (due to, say, a different bound self), the pipeline will call the wrong agent's memory with no diagnostic output. With a logger.warning() in place, we will see exactly which agent caused the collision and can fix the registry-sharing design before it causes data corruption.

The five tool names are currently safe (each agent registers the same function with the same signature). The guard is needed as insurance before we scale to five registrations.

Correctness checklist for REQ-6:

  • logger.warning() is emitted when name in self._tools before the assignment at line 396
  • Warning message includes the tool name and indicates potential duplicate mixin registration
  • The overwrite still proceeds after the warning (do not raise — existing behavior is preserved for non-memory tools)
  • logger is already imported in tools.py — do not add a new import

REQ-7 — Make AgentLoop Externally Drivable (Injectable Event Loop + Async Generator Interface)

Classification: REBASE-BLOCKER
Estimated effort: 3-5 days
Files to change: Your AgentLoop class and any caller that instantiates it

What your PR currently does:

Your AgentLoop creates its own event loop internally — either via asyncio.new_event_loop() or asyncio.run() — and runs to completion. It is not drivable from an external async caller.

What it must do instead:

AgentLoop must expose three things:

1. Injectable event loop:

AgentLoop must accept an optional event loop parameter in its constructor. It must not call asyncio.new_event_loop() or asyncio.set_event_loop() internally. If no loop is provided, it may use asyncio.get_event_loop(), but it must never create or set one.

class AgentLoop:
    def __init__(self, ..., loop: asyncio.AbstractEventLoop | None = None):
        self._loop = loop  # Injected by caller, not created here

2. Async generator interface:

AgentLoop must expose a coroutine or async generator that PipelineEngine can drive:

async def run(self, goal: str) -> AsyncIterator[LoopEvent]:
    """
    Drive the agent loop externally.
    Yields LoopEvent objects as work progresses.
    Caller controls when to advance, cancel, or inspect state.
    """
    ...
    yield LoopEvent(type="cycle_start", ...)
    ...
    yield LoopEvent(type="cycle_end", ...)

The exact LoopEvent schema is flexible — work with us on the spec (see BU-3). The requirement is that the method is an async generator (or returns an AsyncIterator) so PipelineEngine can async for event in loop.run(goal): without blocking its own coroutine.

3. GoalStore as constructor injection:

AgentLoop must accept GoalStore as a constructor parameter rather than instantiating it internally. This allows PipelineOrchestrator to pass its own GoalStore instance:

class AgentLoop:
    def __init__(self, ..., goal_store: GoalStore | None = None):
        self._goal_store = goal_store or GoalStore()

Why this matters:

_execute_recursive_pipeline() (lines 656-661 of src/gaia/pipeline/orchestrator.py) creates its own event loop:

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    return loop.run_until_complete(_run_async())
finally:
    loop.close()

This pattern exists because _execute_recursive_pipeline() is called from a sync context (the FastAPI route handler thread). If AgentLoop then tries to run its own loop, via asyncio.run() or run_until_complete() on a freshly created loop, from code that is already executing inside our loop.run_until_complete(), Python raises RuntimeError: This event loop is already running. There is no safe way to nest two running event loops in the same thread.

Our Phase 6 plan (BU-3) is to converge PipelineEngine and AgentLoop into a single runtime where the engine drives the loop iteration-by-iteration. That convergence is architecturally impossible if AgentLoop owns its own event loop. The injectable loop + async generator interface is the minimal contract that makes convergence tractable without a full rewrite.

Correctness checklist for REQ-7:

  • AgentLoop.__init__ accepts an optional loop parameter
  • AgentLoop does not call asyncio.new_event_loop() or asyncio.set_event_loop() anywhere
  • AgentLoop.run(goal) is an async generator that yields LoopEvent objects
  • GoalStore is injected via the constructor, not instantiated inside AgentLoop
  • AgentLoop can be used both standalone (with default GoalStore) and driven externally

REQ-8 — Scope the Curly-Brace Escaping Fix to _get_mixin_prompts() Only

Classification: MERGE-BLOCKER
Estimated effort: 1 hour
Files to change: src/gaia/agents/base/agent.py (scope the fix); verify no change to src/gaia/utils/component_loader.py

What your PR currently does:

Your PR fixes a curly-brace escaping issue where mixin prompt strings containing {...} were being interpreted as Python .format() placeholders. The fix escapes or converts the strings before the format call. However, your fix either touches component_loader.py or applies the escaping too broadly (affecting code paths beyond _get_mixin_prompts()).

What it must do instead:

The escaping fix must be applied only inside _get_mixin_prompts() at line 299 of src/gaia/agents/base/agent.py. It must not touch src/gaia/utils/component_loader.py at all.

Here is why component_loader.py must not be changed: ComponentLoader.render_component() (lines 240-246 of component_loader.py) uses str.replace() with {{KEY}} format:

# Lines 240-246 of src/gaia/utils/component_loader.py
# Replace {{VARIABLE}} placeholders
for key, value in variables.items():
    # Handle both "KEY" and "{{KEY}}" formats
    if not key.startswith("{{"):
        key = f"{{{{{key}}}}}"

    content = content.replace(key, str(value))

This is a literal str.replace() call — not Python string .format(). It operates on {{KEY}} tokens by doing a character-by-character string search and replace. Python .format() escaping rules (doubling curly braces as {{ and }} to produce literal { and }) are completely orthogonal to this mechanism. There is no interaction between Python .format() and str.replace(). Changing component_loader.py in response to a .format() escaping bug is a category error — the two systems do not share state, parsing, or execution path.
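The orthogonality is easy to demonstrate. The sketch below shows the scoped fix (doubling braces in mixin prompt text before it reaches `.format()`) and shows that ComponentLoader-style `str.replace()` on `{{KEY}}` tokens is unaffected by the same braces. All names here are illustrative, not your PR's actual code.

```python
mixin_prompt = 'Call remember with {"category": "fact"}'

# Unescaped, .format() treats {"category": ...} as a placeholder and raises.
try:
    ("{role}\n" + mixin_prompt).format(role="assistant")
except (KeyError, ValueError):
    pass  # this is the bug the PR fixes

# The fix, scoped to the mixin prompt only: double the braces first.
escaped = mixin_prompt.replace("{", "{{").replace("}", "}}")
rendered = ("{role}\n" + escaped).format(role="assistant")
print(rendered)  # → assistant\nCall remember with {"category": "fact"}

# ComponentLoader's mechanism is plain str.replace() on {{KEY}} tokens;
# stray single braces in the content are passed through untouched.
template = "Component for {{NAME}} with literal {braces}"
print(template.replace("{{NAME}}", "memory"))
# → Component for memory with literal {braces}
```

Nothing in the second half cares about `.format()` escaping rules, which is exactly why component_loader.py must not change.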

Verification step you must perform:

After your changes, run:

git diff HEAD -- src/gaia/utils/component_loader.py

The diff must be empty. If it is not empty, your fix has overreached. Revert the component_loader.py changes and re-scope the fix to agent.py line 299 only.

Correctness checklist for REQ-8:

  • Curly-brace escaping fix is inside _get_mixin_prompts() at agent.py line 299
  • src/gaia/utils/component_loader.py is unchanged (verified by git diff)
  • ComponentLoader.render_component() behavior (lines 240-246) is unaffected
  • Fix does not apply to any other string formatting path in agent.py

Section 2: Recommended Changes

These five items are not gating conditions for the merge but will make the integration significantly smoother. We would strongly prefer to see items marked [PRE-MERGE] in the PR before merge; items marked [POST-MERGE] can follow in a subsequent PR.


REC-1 — Stabilize and Document GoalStore Public API Signatures [PRE-MERGE]

What we need:

We are building BU-2 (wiring GoalStore into PipelineOrchestrator) against your GoalStore API. We need the following three method signatures to be stable and documented in your PR description before we start:

def create_goal(title: str, priority: int) -> str:
    """Create a new goal. Returns the goal_id (string UUID)."""

def create_task(goal_id: str, title: str) -> str:
    """Create a task under a goal. Returns the task_id (string UUID)."""

def update_task_status(
    task_id: str,
    status: Literal["PENDING", "ACTIVE", "COMPLETED", "FAILED"]
) -> None:
    """Update the status of a task. Raises KeyError if task_id not found."""

If your current implementation differs in signature (different parameter names, different return type, different status literals), tell us before merge. We will update BU-2's dependency spec to match. We just need the signatures to be frozen so BU-2 does not break on every rebase.

Action: Add the exact method signatures with docstrings to your PR description. If the implementation deviates from what is above, describe how it deviates.


REC-2 — SystemDiscovery.get_cached_context() Must Return a Typed Object [PRE-MERGE]

What we need:

We are wiring SystemDiscovery.get_cached_context() into our DomainAnalyzer agent (BU-4) to enable NPU-aware tier recommendations. We need the return value to reliably expose:

context.hardware.npu_available: bool
context.hardware.npu_driver_version: str  # empty string when no NPU, not None

The import path must be:

from gaia.agents.base.discovery import SystemDiscovery

The method must be idempotent: the first call performs hardware discovery and caches the result; subsequent calls return the cached result without re-probing hardware. This is required because DomainAnalyzer may be instantiated multiple times per pipeline run.

The npu_driver_version must be str, not Optional[str]. When no NPU is present, return an empty string "". Our code does if context.hardware.npu_driver_version: and an empty string is falsy — None requires a separate is None check that we do not want to add.

Action: Verify your SystemDiscovery.get_cached_context() implementation matches the above contract. If the attribute path is different (e.g., context.npu.available instead of context.hardware.npu_available), tell us the actual path so we can update BU-4's implementation.


REC-3 — Acknowledge the AgentRegistry Cache Staleness Race in Code Comments [POST-MERGE]

Context:

Code in your PR that reads AgentDefinition objects from AgentRegistry may encounter stale cached definitions. We are actively fixing issue OI-21: adding a file lock and an invalidate_capability_cache() call to the PUT /api/v1/pipeline/agents/{agent_id}/raw write path. Until OI-21 lands, there is a race window: your code reads a cached definition while a concurrent pipeline API write updates the underlying file.

What we ask:

Add a code comment in the AgentLoop or MemoryMixin code that reads from AgentRegistry acknowledging this race:

# NOTE: AgentDefinition read from AgentRegistry may be stale if a concurrent
# PUT /api/v1/pipeline/agents/{agent_id}/raw write is in progress.
# OI-21 (pipeline team) adds invalidate_capability_cache() on that write path.
# Until OI-21 lands, treat AgentDefinition as a read-only snapshot.

This prevents future contributors from assuming the registry is always consistent and prevents the issue from being silently papered over.


REC-4 — Use a Separate File Path for MemoryStore SQLite Database [PRE-MERGE]

What we need:

MemoryStore should default to ~/.gaia/memory/gaia_memory.db, not ~/.gaia/chat/gaia_chat.db.

Why:

ChatDatabase opens its connection with PRAGMA journal_mode = WAL (line 101 of database.py). WAL mode allows one writer and multiple readers on the same database file. However, two Python objects opening separate sqlite3.connect() connections to the same WAL-mode file — one for chat and one for memory — compete as simultaneous writers. Under concurrent chat message writes and memory consolidation writes, one of the two connections will receive OperationalError: database is locked.

SQLite's WAL mode serializes writers at the OS level, not the Python thread level. The ChatDatabase._lock = threading.RLock() (line 93 of database.py) serializes within the ChatDatabase instance, but it does not protect against a second sqlite3.connect() opened by MemoryStore from a different Python object.

Using a separate file (~/.gaia/memory/gaia_memory.db) eliminates the contention entirely. The memory database can run its own WAL-mode connection independently.


REC-5 — Add MemoryEventType as a Separate Union in types/index.ts [POST-MERGE]

What we need:

After REQ-3 renames are agreed and implemented, your AgentLoop will emit new event type strings (the non-colliding names). These strings must be added to the TypeScript type system. However, they must not be added to the existing StreamEventType union at lines 260-265 of src/gaia/apps/webui/src/types/index.ts.

Instead, create a separate union:

// In src/gaia/apps/webui/src/types/index.ts
// Add AFTER the StreamEventType union (after line 265):

/** Event types emitted specifically by AgentLoop (memory subsystem). */
export type MemoryEventType =
    | 'agent_loop_cycle_start'   // example — use agreed names from REQ-3
    | 'agent_loop_cycle_end'
    | 'memory_quality_check'
    | 'memory_recalled'
    | 'memory_consolidated';
    // ... add all agreed AgentLoop event names here

Why separate unions:

StreamEventType is used by SSEOutputHandler (line 89 of sse_handler.py) and the pipeline SSE router. PIPELINE_EVENT_MAP in api.ts routes StreamEventType strings. Extending StreamEventType with memory events means the pipeline SSE router would need to handle memory events — coupling two subsystems that should be independent. A separate MemoryEventType union keeps the two namespaces clean and allows MemoryDashboard.tsx to narrow its type to MemoryEventType without accepting the full StreamEventType union.


Section 3: What We Will Build On Top

This section is our contractual commitment to you. When your PR meets the requirements above, we will build the following on top of your memory subsystem. The dependency annotations show which REQ or REC items must be complete first.


BU-1 — Add MemoryMixin to All Five Pipeline Stage Agents

Dependencies: REQ-1 (cache access), REQ-2 (schema migration), REQ-3 (SSE name collision), REQ-6 (collision guard)
Our estimated effort: 1 sprint
What we will do:

We will apply MemoryMixin to these five agents:

  1. DomainAnalyzer — stores domain context as long-term knowledge
  2. PlannerAgent — stores plan decisions for retrospective analysis
  3. ExecutorAgent — stores tool call results and execution trace
  4. ReviewerAgent — stores quality assessments and defect patterns
  5. SynthesizerAgent — stores synthesized outputs for cross-run deduplication

Each agent's __init__ will call _register_agent_memory_ops(), so each of your five tool names is registered once per agent (five registrations of each name across the pipeline). REQ-6's collision guard must be in place to make these repeated registrations safe to debug.

REQ-1 must be in place so _register_agent_memory_ops() correctly retrieves the live agent instance from _agent_cache. REQ-2 must be in place so the memory schema columns exist before any memory write occurs. REQ-3 must be in place so the memory SSE events from these five agents do not collide with pipeline SSE events in the browser.
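For reference, the REQ-6 guard we depend on here is small. A sketch of the shape we expect (class simplified; the real write is ToolRegistry.register() at tools.py line 396):

```python
import logging

logger = logging.getLogger("gaia.tools")

class ToolRegistry:
    """Minimal sketch of the REQ-6 collision guard; names assumed from
    this document, not copied from src/gaia/agents/base/tools.py."""

    def __init__(self):
        self._tools = {}

    def register(self, name, fn):
        if name in self._tools:
            # REQ-6: warn instead of silently overwriting, so the five
            # MemoryMixin registrations in BU-1 are debuggable.
            logger.warning("Tool %r re-registered; overwriting previous handler", name)
        self._tools[name] = fn
```

The point is only the warning on overwrite; everything else about the registry stays as it is today.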


BU-2 — Wire GoalStore into PipelineOrchestrator

Dependencies: REC-1 (stable GoalStore API), REQ-4 (shutdown order)
Our estimated effort: 3-5 days
What we will do:

PipelineOrchestrator will accept a GoalStore instance and call:

  • create_goal(title=task_description, priority=1) at pipeline start
  • create_task(goal_id, title=stage_name) for each pipeline stage
  • update_task_status(task_id, "ACTIVE") when a stage starts
  • update_task_status(task_id, "COMPLETED" or "FAILED") when a stage ends

This makes pipeline runs visible in the Memory Dashboard without any change to the dashboard itself. The dashboard will show pipeline goals alongside agent memory goals, unified by GoalStore.
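Under the REC-1 signatures, the wiring is roughly this shape (our class and method names are simplified stand-ins; error handling on the GoalStore side is omitted):

```python
class PipelineOrchestrator:
    """Sketch of the BU-2 wiring, assuming the three REC-1 signatures
    (create_goal / create_task / update_task_status) hold as specified."""

    STAGES = ["domain", "plan", "execute", "review", "synthesize"]

    def __init__(self, goal_store):
        self.goals = goal_store

    def run(self, task_description, run_stage):
        # One goal per pipeline run, one task per stage.
        goal_id = self.goals.create_goal(title=task_description, priority=1)
        for stage in self.STAGES:
            task_id = self.goals.create_task(goal_id, title=stage)
            self.goals.update_task_status(task_id, "ACTIVE")
            try:
                run_stage(stage)
            except Exception:
                self.goals.update_task_status(task_id, "FAILED")
                raise
            self.goals.update_task_status(task_id, "COMPLETED")
```

Nothing here touches the dashboard; it sees the goals and tasks through GoalStore exactly as it sees agent memory goals.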

REQ-4 must be in place to ensure GoalStore (and its underlying MemoryStore) is closed before db.close() runs during application shutdown.

We are building against exactly the three method signatures specified in REC-1. If those signatures change after we start BU-2, we will need a version bump and a migration.


BU-3 — Phase 6 Convergence Spec: AgentLoop / PipelineEngine Unified Runtime

Dependencies: REQ-7 (injectable event loop + async generator interface)
Our estimated effort: 1-2 sprints (design), 2-3 sprints (implementation)
What we will do:

We will write a convergence specification that defines how AgentLoop and PipelineEngine merge into a single runtime. The unified runtime will:

  • Accept a goal or task description
  • Drive pipeline phases (domain → plan → execute → review → synthesize) as AgentLoop cycles
  • Emit a unified event stream that covers both LoopEvent (from your system) and PipelineEvent (from ours)
  • Support both memory-first (AgentLoop drives, engine records state) and pipeline-first (engine drives, loop provides execution substrate) operating modes

This convergence is only tractable if AgentLoop exposes the injectable event loop and async generator interface from REQ-7. Without REQ-7, the two systems cannot share a coroutine runtime without nesting event loops, which Python does not support.
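The REQ-7 shape we are assuming looks roughly like this: AgentLoop yields events from an async generator, so PipelineEngine can drive it on its own event loop with no nesting (all names here are illustrative, not your actual API):

```python
import asyncio

async def agent_loop_cycles(goal):
    """Hypothetical REQ-7 interface: one async generator, one event per
    yield, no event loop owned by AgentLoop itself."""
    for phase in ("domain", "plan", "execute", "review", "synthesize"):
        yield {"type": "agent_loop_cycle_start", "phase": phase, "goal": goal}
        # ... the real cycle work would run here ...
        yield {"type": "agent_loop_cycle_end", "phase": phase, "goal": goal}

async def drive(goal):
    # The caller (PipelineEngine) owns the loop and consumes events,
    # interleaving its own work between cycles if it wants to.
    return [event async for event in agent_loop_cycles(goal)]
```

With this interface, either side can be the driver: memory-first mode consumes the generator directly, pipeline-first mode advances it one cycle at a time.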

We will invite you to co-author the convergence spec once REQ-7 is implemented.


BU-4 — Wire SystemDiscovery into DomainAnalyzer for NPU-Aware Tier Recommendations

Dependencies: REC-2 (SystemDiscovery.get_cached_context() returns typed object with hardware.npu_available)
Our estimated effort: 2-3 days
What we will do:

DomainAnalyzer currently selects a model tier based on task complexity alone. We will extend it to check context.hardware.npu_available from SystemDiscovery.get_cached_context(). When an NPU is available, DomainAnalyzer will recommend smaller quantized models (Tier 1-2) that run efficiently on the Ryzen AI NPU. When no NPU is present, it falls back to the current tier selection logic.
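The extension is a small branch on the cached context. A sketch under the REC-2 assumptions (field names per REC-2; the tier numbers for the no-NPU path are stand-ins for the current logic, not its actual values):

```python
from types import SimpleNamespace

def select_tier(task_complexity, context):
    """BU-4 sketch: field names assumed from REC-2 (hardware.npu_available)."""
    if getattr(context.hardware, "npu_available", False):
        # NPU present: recommend small quantized models (Tier 1-2).
        return 1 if task_complexity == "low" else 2
    # No NPU: stand-in for the current complexity-only tier selection.
    return {"low": 2, "medium": 3, "high": 4}.get(task_complexity, 3)

# Hypothetical REC-2-shaped contexts for illustration:
npu_ctx = SimpleNamespace(hardware=SimpleNamespace(npu_available=True))
cpu_ctx = SimpleNamespace(hardware=SimpleNamespace(npu_available=False))
```

Because get_cached_context() is cached per REC-2, this check adds no probe latency even when DomainAnalyzer is instantiated several times per run.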

This requires REC-2 because DomainAnalyzer is instantiated multiple times per pipeline run and cannot afford the latency of repeated hardware probes. The caching guarantee in REC-2 is essential.


Section 4: Integration Contract Table

This table shows the two-way dependency map. "Kovtcharov provides" means your PR must implement it. "Pipeline team provides" means we build it after your requirement is met.

| Dependency | Kovtcharov provides | Pipeline team consumes | Unblocked by |
|---|---|---|---|
| Agent cache access | REQ-1: correct `_agent_cache` read with lock | MemoryMixin in pipeline agents | REQ-1 |
| Schema migration | REQ-2: three columns via `_migrate()` pattern | Memory reads/writes in pipeline runs | REQ-2 |
| SSE event names | REQ-3: non-colliding AgentLoop event names | Browser memory UI (no misrouting) | REQ-3 |
| Shutdown order | REQ-4: `close_store()` before `db.close()` | Clean process exit after pipeline runs | REQ-4 |
| Router auth scope | REQ-5: per-endpoint auth only | MCP catalog used by installer | REQ-5 |
| Tool collision guard | REQ-6: `logger.warning()` on overwrite | MemoryMixin on 5 pipeline agents (BU-1) | REQ-6 |
| Drivable AgentLoop | REQ-7: async generator + injectable loop | PipelineEngine convergence (BU-3) | REQ-7 |
| Scoped brace fix | REQ-8: only `_get_mixin_prompts()` touched | ComponentLoader template rendering | REQ-8 |
| GoalStore API | REC-1: stable method signatures | GoalStore in PipelineOrchestrator (BU-2) | REC-1 |
| SystemDiscovery | REC-2: typed return with `hardware.npu_available` | NPU-aware DomainAnalyzer (BU-4) | REC-2 |
| Registry staleness | REC-3: code comment acknowledging race | Future OI-21 fix | REC-3 |
| Separate DB file | REC-4: MemoryStore on `~/.gaia/memory/` | No WAL contention with ChatDatabase | REC-4 |
| MemoryEventType | REC-5: separate TS union after REQ-3 names agreed | MemoryDashboard type safety | REQ-3 + REC-5 |
| MemoryMixin rollout | BU-1: pipeline team adds MemoryMixin | AgentLoop events from pipeline agents | REQ-1, 2, 3, 6 |
| GoalStore wiring | BU-2: pipeline team wires GoalStore | Pipeline runs in Memory Dashboard | REC-1, REQ-4 |
| Convergence spec | BU-3: pipeline team authors Phase 6 spec | Unified runtime for Phase 6 | REQ-7 |
| NPU tier selection | BU-4: pipeline team extends DomainAnalyzer | Efficient model selection on Ryzen AI | REC-2 |

Section 5: Merge Sequencing

The requirements above have dependencies. This section describes the recommended order of work to minimize rework.

Phase A — Unblocked work (start immediately, no cross-team coordination needed)

These items have no external dependencies and no coordination requirements. Estimated total: 4-5 hours.

| Item | Effort | File |
|---|---|---|
| REQ-1 — Fix `_agent_cache` access | 30 min | Your `_register_agent_memory_ops()` file |
| REQ-2 — Fix `_migrate()` additions | 1-2 hr | src/gaia/ui/database.py |
| REQ-4 — Fix shutdown order | 15 min | src/gaia/ui/server.py |
| REQ-5 — Remove router-level auth | 15 min | src/gaia/ui/routers/mcp.py |
| REQ-8 — Scope brace-escaping fix | 1 hr | src/gaia/agents/base/agent.py |
| REC-1 — Lock GoalStore signatures | 30 min | PR description |
| REC-4 — Separate MemoryStore path | 30 min | Your MemoryStore default path |

Phase B — Coordination required (send us your proposal, wait for confirmation)

| Item | Action | Waiting on |
|---|---|---|
| REQ-3 — SSE name collision | Send us your proposed AgentLoop event names | Our confirmation (target: 1 business day) |

Do not write any code for REQ-3 until we confirm the names. Writing first means you may have to rename everything again if we find a collision with a future event name we have pending in a feature branch.

Phase C — After REQ-3 names confirmed (implement)

| Item | Effort | File |
|---|---|---|
| REQ-3 implementation | 2-3 hr | All AgentLoop SSE emitters |
| REC-5 — MemoryEventType union | 1 hr | src/gaia/apps/webui/src/types/index.ts |
| MemoryDashboard.tsx updates | 1 hr | Your MemoryDashboard.tsx |

Phase D — Long-lead work (can run in parallel with Phase A-C, longer timeline)

| Item | Effort | Notes |
|---|---|---|
| REQ-6 — Collision guard | 1 hr | Can land any time before BU-1 starts |
| REQ-7 — Drivable AgentLoop | 3-5 days | Does not block Phase A-C merge |
| REC-2 — SystemDiscovery contract | 1 hr | Needed before BU-4 starts |
| REC-3 — Registry staleness comment | 30 min | Post-merge PR acceptable |

Target merge gate: All of Phase A complete + REQ-3 implementation complete (Phase C) = merge-ready. REQ-6 and REQ-7 can follow in a subsequent PR on a 1-sprint timeline.


Section 6: Cross-Team Coordination Gate (REQ-3 Name Agreement)

REQ-3 requires bilateral agreement on the AgentLoop SSE event type names before any implementation. This section defines the protocol.

The problem in concrete terms

Your AgentLoop emits SSE events. The browser receives them as JSON objects: {"type": "loop_back", "data": {...}}. The browser's PIPELINE_EVENT_MAP (a plain JavaScript object at lines 626-643 of src/gaia/apps/webui/src/services/api.ts) looks up event.type as a key and routes the event to a callback.

The six pipeline-owned strings are keys in PIPELINE_EVENT_MAP. If your event type matches one of those keys, the event is routed to a pipeline callback. There is no error — the pipeline callback silently receives wrong data. The memory UI receives nothing.

This is not a bug we can fix after the fact with a minor patch — it requires a coordinated rename across Python emitters, TypeScript types, and the dashboard component.
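The failure mode takes only a few lines to reproduce. A Python model of the lookup (the real map is a TypeScript object in api.ts; names and event shapes are simplified here):

```python
# Sketch of the routing hazard, modeled on PIPELINE_EVENT_MAP in api.ts:
# a plain key -> callback lookup keyed on event["type"].
pipeline_hits = []
PIPELINE_EVENT_MAP = {
    "loop_back": pipeline_hits.append,   # one of the six reserved names
}

def route(event, on_memory_event):
    handler = PIPELINE_EVENT_MAP.get(event["type"])
    if handler is not None:
        handler(event["data"])    # collision: memory event silently consumed
    else:
        on_memory_event(event)    # non-colliding names reach the memory UI
```

An AgentLoop event named "loop_back" vanishes into the pipeline callback with no error anywhere; a non-colliding name like "memory_recalled" routes correctly. That is why the rename has to happen before merge, not after.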

The protocol

Step 1 — You send us a list.

Reply to this document (or message the pipeline team directly) with a list of all SSE event type strings your AgentLoop emits. For example:

AgentLoop SSE event types (proposed):
- agent_loop_cycle_start
- agent_loop_cycle_end
- memory_quality_check
- agent_loop_back
- agent_phase_transition
- memory_defect_detected
- memory_recalled
- memory_consolidated
- goal_created
- task_status_changed

Step 2 — We check for collisions.

We check your proposed names against:

  1. The six pipeline-reserved names (listed in REQ-3)
  2. Any pending event names in our feature branch that have not landed yet
  3. Names in the broader StreamEventType union (lines 255-265 of types/index.ts)

Step 3 — We confirm (target: 1 business day).

We reply with either "approved" or "here are the names that conflict — please propose alternatives for those."

Step 4 — You implement.

Only after step 3 confirmation do you implement the renames in your Python emitters, TypeScript types, and dashboard component.

Why this gate exists

The six pipeline event names were not chosen arbitrarily. They correspond to conceptual phases in the recursive pipeline: looping back (quality retry), scoring (quality metric), phase jumping (state machine transition), and defect detection. If we let AgentLoop use similar concepts with similar names, we will have a permanent naming conflict as the two systems co-evolve. The gate forces us to align vocabulary upfront.


Section 7: Quick-Start Checklist

Seven items kovtcharov can start today, in priority order. Total estimated time for items 1-6: approximately 4.5 hours.

Item 1 — Fix _agent_cache access (REQ-1)

Time: 30 minutes
Do: In _register_agent_memory_ops(), replace your current agent retrieval with the _agent_cache_lock + _agent_cache.get(session_id) + entry["agent"] pattern. Add an early return if session_id is not in the cache.
Verify: Run your existing memory tool tests. Confirm no KeyError on missing session.

Item 2 — Send us your AgentLoop SSE event name list (REQ-3 coordination)

Time: 15 minutes (to draft the list)
Do: List every SSE event type string your AgentLoop emits. Send to the pipeline team (feature/pipeline-orchestration-v1). No code changes yet.
Verify: Your list contains none of the six reserved names from REQ-3. If any name collides, propose alternatives in the same message.

Item 3 — Fix _migrate() column additions (REQ-2)

Time: 1-2 hours
Do: For each of the three new columns, add a separate try/except block after line 176 of database.py using the PRAGMA table_info pattern. If knowledge and conversations are new tables, add CREATE TABLE IF NOT EXISTS to SCHEMA_SQL.
Verify: Run python -c "from gaia.ui.database import ChatDatabase; db = ChatDatabase(':memory:'); db.close()" and confirm no exception.
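A sketch of the PRAGMA table_info pattern (the helper name is ours for illustration; the real migration blocks in database.py are written inline, one try/except per column):

```python
import sqlite3

def _add_column(conn, table, column, decl):
    """Idempotent ALTER TABLE: skip the ADD COLUMN if the column exists.
    Sketch of the _migrate() pattern described above, not the real code."""
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS knowledge (id TEXT PRIMARY KEY)")
for col, decl in [("embedding", "BLOB"), ("superseded_by", "TEXT")]:
    _add_column(conn, "knowledge", col, decl)
    _add_column(conn, "knowledge", col, decl)  # second call is a no-op
```

Running the migration twice against the same database must be harmless; that is the whole point of the pattern.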

Item 4 — Remove router-level auth (REQ-5)

Time: 15 minutes
Do: Confirm line 16 of src/gaia/ui/routers/mcp.py reads router = APIRouter(tags=["mcp"]) with no dependencies argument. Move any auth to individual @router.post(...) decorators.
Verify: grep -n "dependencies" src/gaia/ui/routers/mcp.py — the APIRouter line must not appear in results.

Item 5 — Scope the curly-brace fix to _get_mixin_prompts() (REQ-8)

Time: 1 hour
Do: Ensure your escaping fix touches only _get_mixin_prompts() at line 299 of agent.py. Run git diff HEAD -- src/gaia/utils/component_loader.py and verify the output is empty.
Verify: Write a unit test: create a mixin that returns a prompt string with {variable} in it; confirm it does not raise KeyError or ValueError. Confirm render_component() still works with {{KEY}} substitution.
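The escaping itself is two string replaces. A sketch (helper name is ours; in your PR this logic lives inline in _get_mixin_prompts()):

```python
def _escape_mixin_prompt(text):
    # REQ-8 sketch: double every brace in mixin-supplied prompt text so a
    # later str.format() pass treats {variable} as literal text, not a
    # replacement field. Scoped to _get_mixin_prompts() only;
    # component_loader.py's {{KEY}} str.replace substitution is untouched.
    return text.replace("{", "{{").replace("}", "}}")

raw = "Store facts shaped like {category}: {value}"
safe = _escape_mixin_prompt(raw)
rendered = safe.format()   # the unescaped string would raise KeyError here
```

One format() pass restores the original literal braces, which is exactly what the unit test above should assert.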

Item 6 — Fix shutdown order (REQ-4)

Time: 15 minutes
Do: In server.py, move your close_store() call to run after monitor.stop() (line 210) and before db.close() (line 212). The exact insertion point is between those two existing lines.
Verify: Read the lifespan block in server.py and confirm the sequence is: monitor.stop() → close_store() → db.close().

Item 7 — Lock in GoalStore API signatures in the PR description (REC-1)

Time: 30 minutes
Do: Add the exact method signatures for create_goal(), create_task(), and update_task_status() to your PR #606 description. Include parameter types, return types, and error behavior (what happens if task_id not found).
Verify: We (pipeline team) will confirm receipt and that BU-2's dependency on those signatures is satisfied.


Appendix A: File Reference Summary

All files mentioned in this document, with locations verified against HEAD d187907.

| File | Lines | Relevant to |
|---|---|---|
| src/gaia/ui/_chat_helpers.py | 1,144 | REQ-1 |
| src/gaia/ui/database.py | 787 | REQ-2, REC-4 |
| src/gaia/ui/sse_handler.py | 950 | REQ-3, context |
| src/gaia/ui/routers/mcp.py | 425 | REQ-5 |
| src/gaia/ui/server.py | | REQ-4 |
| src/gaia/agents/base/tools.py | | REQ-6 |
| src/gaia/pipeline/orchestrator.py | 681 | REQ-7, context |
| src/gaia/utils/component_loader.py | | REQ-8 |
| src/gaia/agents/base/agent.py | | REQ-8 |
| src/gaia/apps/webui/src/types/index.ts | | REQ-3, REC-5 |
| src/gaia/apps/webui/src/services/api.ts | | REQ-3, REC-5 |

Appendix B: Specific Line Numbers for Every Claim

Every line number in this document was verified against HEAD d187907 of feature/pipeline-orchestration-v1. If you are working on a different commit, verify the line numbers have not shifted before using them as edit targets.

| Claim | File | Line |
|---|---|---|
| `_agent_cache` declaration | _chat_helpers.py | 54 |
| `_agent_cache_lock` declaration | _chat_helpers.py | 57 |
| `_store_agent()` cache write | _chat_helpers.py | 102-106 |
| SCHEMA_SQL start | database.py | 23 |
| `_init_schema()` definition | database.py | 105 |
| `_migrate()` definition | database.py | 118 |
| `_ensure_settings_table()` call | database.py | 121 |
| Last existing migration block end | database.py | 176 |
| PRAGMA journal_mode = WAL | database.py | 101 |
| SSEOutputHandler class | sse_handler.py | 89 |
| `router = APIRouter(tags=["mcp"])` | routers/mcp.py | 16 |
| `monitor.stop()` in shutdown | server.py | 210 |
| `db.close()` in shutdown | server.py | 212 |
| `ToolRegistry.register()` write | tools.py | 396 |
| `_execute_recursive_pipeline()` | orchestrator.py | 556 |
| Event loop creation in orchestrator | orchestrator.py | 656-661 |
| asyncio local import | orchestrator.py | 584 |
| `render_component()` str.replace block | component_loader.py | 240-246 |
| `_get_mixin_prompts()` | agent.py | 299 |
| StreamEventType pipeline entries | types/index.ts | 260-265 |
| PIPELINE_EVENT_MAP | api.ts | 626-643 |

This document was written by the pipeline orchestration team on feature/pipeline-orchestration-v1. For questions about requirements, contact us before implementing. For questions about your own PR, you know it better than we do. For cross-team coordination on REQ-3 names, initiate contact with a proposed name list — we will respond within one business day.

kovtcharov added a commit that referenced this pull request Apr 17, 2026
Two-phase local-first email triage agent — MVT (~1.5d CC-assisted) for
v0.20.0, full EmailTriageAgent for v0.23.0. Covers auto-discovery, per-cohort
autonomy, speech-act classification, undo ledger, Slack as first-class output
channel, and an honest §27 catalog of research bets and unvalidated claims.

§22.4 maps outstanding PRs to prerequisite role: #606 / #517 / #495 / #622 /
#779 / #741 / #737. Landing the "minimum set" of #495 + #741 + one of #606 /
#517 M1 collapses most of the missing-infrastructure workarounds before
implementation starts.
github-merge-queue bot pushed a commit that referenced this pull request Apr 18, 2026
## Summary

Adds a two-phase spec for a local-first email triage agent that runs
inference on-device via Lemonade (Ryzen AI NPU/iGPU) — no email content
transits a cloud API. Phase **MVT** ships in ~1.5 days (CC-assisted) by
thin-wrapping existing primitives; **Phase C1** polishes UX for v0.20.0;
**Phase C2** adds scheduled triage, Agent Inbox HITL, and in-tree Gmail
MCP for v0.23.0. Slack is a first-class output channel from day one
(webhook → MCP → interactive buttons across phases).

## Key threads

- **MVT ships fast because ~95% of plumbing exists.** §2.5 maps every
required capability to an existing GAIA primitive (`MCPClientMixin`,
`DatabaseMixin`, `RAGSDK`, `TalkSDK`, `SummarizeAgent`, `ApiAgent`,
SSE). Why it matters: scoping the MVT as thin wrappers rather than new
plumbing is what makes the ~1.5d estimate credible.
- **§22.4 catalogs in-flight PRs as prerequisites.** Maps
[#606](#606) (memory v2),
[#517](#517) (autonomy M1/M3/M5),
[#495](#495) (security.py),
[#622](#622) (orchestrator),
[#779](#779) (eval),
[#741](#741) (vault),
[#737](#737) (Slack connector) to
which spec risks each one collapses. Why it matters: the "minimum set to
start MVT safely" is named explicitly — #495 + #741 + one of #606 / #517
M1 — so sequencing is actionable.
- **Memory-PR conflict flagged (§22.4.4).** #606 and #517 M1 overlap on
memory subsystem; §22.4.4 calls out the reconciliation as a prerequisite
decision, not a runtime surprise.
- **§27 "Known Weaknesses, Unvalidated Claims, Decision Debt"** names
the research bets (Custom AI Labels on local 4B, per-relationship voice,
auto-follow-up quality) and unvalidated claims cited in the spec (97.5%
tool-call reliability, GongRzhe archive date, etc.) so C2 isn't treated
as an engineering certainty.
- **Slack integration scoped as an output channel (§12.18).** Webhook at
MVT → Slack MCP at C1 → interactive approve/edit/reject buttons at C2.
Aligned with
[messaging-integrations-plan.mdx](https://github.com/amd/gaia/blob/main/docs/plans/messaging-integrations-plan.mdx)
(#635).

## Test plan

- [ ] Render preview of `docs/plans/email-triage-agent.mdx` via Mintlify
dev or amd-gaia.ai preview — confirm frontmatter, tables, code blocks,
and section numbering (1–28) render cleanly.
- [ ] Verify `docs/docs.json` navigation entry places the page under
*Agent UI* group next to `email-calendar-integration`.
- [ ] Cross-reference check: every `[Link](file.mdx)` target exists
(`email-calendar-integration`, `autonomy-engine`, `security-model`,
`agent-ui`, `setup-wizard`, `messaging-integrations-plan`).
- [ ] Scan §22.4 PR numbers against the current PR queue (`gh pr list
--repo amd/gaia --state open`) to confirm they're still open and the
recommended sequence is feasible.
@kovtcharov
Collaborator Author

Heads-up: semantic overlap with PR #495's scratchpad

@kovtcharov — flagging a tool-selection collision to think through before both PRs land.

PR #495 adds ScratchpadToolsMixin with query_data(sql: str) — a read-only SQL interface to a per-session SQLite workspace at ~/.gaia/scratchpad.db. It's the terminal step of a find_files → create_table → insert_data → query_data workflow for multi-document structured analysis (spending analysis, research reviews, tax prep).

PR #606 adds recall(query: str, ...) — hybrid vector + FTS5 + RRF + cross-encoder rerank over ~/.gaia/memory.db. The terminal step of a "store a fact → recall it later" workflow for personal knowledge / note-taking.
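For readers unfamiliar with RRF: a minimal sketch of reciprocal rank fusion over two candidate lists, as recall() is described as doing (k=60 is the common default from the RRF literature, not a value confirmed from this PR):

```python
def rrf_fuse(rankings, k=60):
    """Merge ranked result lists: each document scores sum(1 / (k + rank))
    over the rankings it appears in, then sort by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m3", "m1", "m7"]   # FAISS order (illustrative IDs)
bm25_hits = ["m1", "m9", "m3"]     # FTS5 order
fused = rrf_fuse([vector_hits, bm25_hits])
```

Documents ranked well by both retrievers float to the top; the cross-encoder rerank then reorders the fused head.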

The collision

After both PRs merge, the ChatAgent will have both of these tools registered simultaneously. They answer two genuinely different questions:

Query intent Right tool Why
"What did I spend on groceries in March?" query_data Structured data from a document pipeline, aggregate math
"What did I learn about FTS5 last week?" recall Semantic concept match across conversation history
"How many research papers mention transformer attention?" Either, depending on how they were stored Ambiguous

The LLM will not naturally know which to pick. The third row is the failure mode — an LLM could pick recall when the data lives in a scratchpad table, or vice versa, and get zero results.

What I'd suggest

  1. Mutually-exclusive system prompt section that explicitly disambiguates. Something like:
    **WHICH "QUERY" TOOL TO USE:**
    - `query_data(sql)` — for data you put there via `insert_data` in this session.
      Always SQL. Always structured. Always recent.
    - `recall(query)` — for facts/notes you stored via `remember`, or conversation
      content from prior sessions. Always semantic. Always persistent.
    - If the data didn't come from a `create_table` / `insert_data` pair, it's not
      in the scratchpad — use `recall`.
    
  2. Distinct tool names to match distinct contexts. query_data is already good; consider renaming recall → recall_memory so the namespace visually separates ephemeral SQL work from persistent personal memory.
  3. A guard-rail test — an eval prompt like "What's my current PTO balance from the employee handbook?" (RAG-retrieval, not scratchpad) and "How much did I spend on groceries in March?" (scratchpad, not recall) back-to-back, with ground truth. Catches cross-tool confusion the moment it regresses.
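A sketch of what that guard-rail could look like (pick_tool is a stand-in for whatever surfaces the LLM's tool choice; the cases are the prompts from the table above):

```python
# Hypothetical guard-rail eval: each question has exactly one right tool.
GUARD_RAIL_CASES = [
    ("How much did I spend on groceries in March?", "query_data"),
    ("What did I learn about FTS5 last week?", "recall"),
]

def run_guard_rail(pick_tool):
    """Return the cases where the wrong tool was chosen."""
    failures = []
    for question, expected in GUARD_RAIL_CASES:
        got = pick_tool(question)
        if got != expected:
            failures.append((question, expected, got))
    return failures
```

An empty failure list is the pass condition; any regression in cross-tool routing shows up as a named (question, expected, got) triple.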

Scope suggestion

This is coordination work, not a blocker for either PR. Reasonable landings:

Either way, worth a 10-minute conversation before the second of the two merges.

Cross-referencing from the #495 final-state comment: #495 (comment)


🤖 Generated with Claude Code

# Conflicts:
#	src/gaia/apps/webui/package-lock.json
#	src/gaia/apps/webui/src/components/ChatView.tsx
#	src/gaia/apps/webui/src/services/api.ts
#	src/gaia/apps/webui/src/types/index.ts
#	src/gaia/ui/_chat_helpers.py
#	src/gaia/ui/database.py
#	src/gaia/ui/models.py
#	src/gaia/ui/routers/sessions.py
#	src/gaia/ui/server.py
#	src/gaia/ui/sse_handler.py
#	src/gaia/ui/utils.py

Labels

agents (Agent system changes), chat (Chat SDK changes), cli (CLI changes), dependencies (Dependency updates), documentation (Documentation changes), electron (Electron app changes), eval (Evaluation framework changes), mcp (MCP integration changes), performance (Performance-critical changes), tests (Test changes)
