feat(memory): agent memory v2 — second brain with hybrid search, LLM extraction, and observability dashboard#606
kovtcharov wants to merge 38 commits into main
Conversation
## Summary

- **`gaia init` now installs RAG dependencies** for the `chat`, `rag`, and `all` profiles — adds a `pip_extras` field to profile definitions and a new `_install_pip_extras()` step that detects editable vs. package installs and tries `uv pip` first with a `pip` fallback
- **Added `self.rag` None guards** to 8 RAG tools in `rag_tools.py` that crashed with `'NoneType' object has no attribute 'index_document'` when RAG deps were not installed
- **Widened the ChatAgent RAG init exception catch** from `ImportError` to `Exception`, with warning-level logging and a debug traceback
- **Updated Agent UI docs** to include `[rag]` in the install instructions (`[ui,rag]`)

## Test plan

- [x] Lint passing (black, isort, pylint, flake8)
- [x] All 1104 unit tests passing
- [ ] `gaia init --profile chat` installs RAG deps automatically
- [ ] Agent UI document indexing works after `pip install -e ".[rag]"`
- [ ] RAG tools return an actionable error when deps are not installed (instead of crashing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
C-1: Guard winreg import and all registry-scanning methods in discovery.py
so the module loads cleanly on Linux/macOS where winreg is absent.
Also guard _scan_credential_manager() behind sys.platform check to
avoid subprocess.CREATE_NO_WINDOW AttributeError on non-Windows.
C-3: Replace direct _lock/_conn access in CLI with two new MemoryStore
public methods: get_source_counts() and delete_by_source(source).
delete_by_source() wraps FTS cleanup + DELETE in a single atomic
transaction with rollback, removing the per-ID loop that could
leave knowledge/FTS diverged on partial failure.
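The atomic-delete pattern described above can be sketched as follows. This is a minimal illustration, assuming illustrative table names (`knowledge`, `knowledge_fts`) rather than the project's actual schema:

```python
import sqlite3

def delete_by_source(conn: sqlite3.Connection, source: str) -> int:
    """Delete all knowledge rows for a source plus their FTS shadow rows
    in one transaction, so a partial failure cannot leave the two tables
    diverged. Table and column names are illustrative, not GAIA's schema.
    """
    cur = conn.cursor()
    try:
        # FTS rows first, while the knowledge rows still exist to join on.
        cur.execute(
            "DELETE FROM knowledge_fts WHERE rowid IN "
            "(SELECT id FROM knowledge WHERE source = ?)",
            (source,),
        )
        cur.execute("DELETE FROM knowledge WHERE source = ?", (source,))
        deleted = cur.rowcount
        conn.commit()  # both deletes land atomically
        return deleted
    except Exception:
        conn.rollback()  # neither delete lands on failure
        raise
```

With Python's default `isolation_level`, the first DELETE implicitly opens the transaction, so commit/rollback covers both statements.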
C-4: Add close_store() to memory router module; call it from FastAPI
lifespan shutdown so the WAL is checkpointed and the SQLite
connection is released cleanly on server exit.
M-2: list_knowledge endpoint now excludes sensitive items by default.
New include_sensitive=false query param (default false) controls
visibility; sensitive=true still filters to sensitive-only.
M-6: Add append-only comment to conversations FTS trigger block noting
that an AFTER UPDATE trigger would be required if store_turn()
ever changes to update existing rows.
Tests: +9 tests (394 total) covering get_source_counts, delete_by_source
rollback discipline, and all three sensitive filter modes in the router.
- Fix `_original_user_input=None` fallback bug in `_after_process_query` (the getattr default ignored None; switch to `or` to handle init state)
- Extract VALID_CATEGORIES/MAX_CONTENT_LENGTH/MAX_TURN_LENGTH and other magic numbers to named module-level constants in memory_store.py
- Import the constants in memory.py to eliminate duplicate category sets and keep truncation limits in sync across all call sites
- DRY: the memory router imports VALID_CATEGORIES from the data layer instead of redefining its own copy
- Clean up unused imports in test files (F401/F811 flake8 violations)
- 394 unit tests passing, flake8 clean
Replace substring `"github.com" in url_lower` with urlparse().hostname comparison to fix CodeQL CWE-20 "Incomplete URL substring sanitization". A crafted URL like http://evil.com/github.com could otherwise bypass the check. Hostname equality/suffix match is unambiguous.
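A minimal sketch of the parsed-hostname check, assuming github.com is the intended host (the function name is illustrative):

```python
from urllib.parse import urlparse

def is_github_url(url: str) -> bool:
    """Check the parsed hostname instead of a substring of the whole URL.

    A substring test ('github.com' in url.lower()) is fooled by
    http://evil.com/github.com or http://github.com.evil.com/.
    Hostname equality plus a dot-suffix match is unambiguous.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    host = host.lower()
    return host == "github.com" or host.endswith(".github.com")
```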
Security:
- recall tool now filters out sensitive items before returning results to the LLM — sensitive entries (API keys, credentials) are for internal use only and must not appear in tool output.

Performance:
- Add get_by_category_contexts() to MemoryStore: a single SQL query with WHERE context IN (active, 'global') replaces two separate get_by_category() calls in _get_context_items(), halving DB round-trips per system-prompt build (was 6 queries, now 3).
- Replace the N+1 correlated subquery in get_sessions() with a LEFT JOIN on MIN(id) per session — scales linearly regardless of session count.

Reliability:
- Add PRAGMA busy_timeout=5000 so concurrent WAL readers/writers in the same process (dashboard REST singleton + ChatAgent) retry for 5 s instead of failing immediately with SQLITE_BUSY.

Correctness:
- update_memory tool truncation check now uses the MAX_CONTENT_LENGTH constant instead of a hardcoded 2000, keeping it in sync with memory_store.py.

Testability:
- Replace sys.exit(1) in the _bootstrap_chat/_bootstrap_discover/_bootstrap_reset helpers with raise RuntimeError; _handle_memory_bootstrap catches and exits, making the helpers unit-testable in isolation.

Tests (+34):
- TestGetByCategoryContexts (5): single-query context+global fetch
- TestGetAllKnowledgeSortByValidation (4): sort_by whitelist protection
- TestGetSessionsFirstMessageV2 (3): join-based first_message
- test_memory_discovery.py (22): _classify_remote, _classify_path, _classify_domain, scan_all structure, Windows guard

428 tests passing, 1 skipped (Windows-only guard on non-Windows).
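The single-query context fetch described under Performance can be sketched like this; table and column names are illustrative, not GAIA's actual schema:

```python
import sqlite3

def get_by_category_contexts(conn, category, active_context, limit=50):
    """Fetch items for the active context and 'global' in one query,
    replacing two per-context get_by_category() round-trips.
    Schema names (knowledge, category, context, confidence) are illustrative.
    """
    return conn.execute(
        "SELECT id, content, context FROM knowledge "
        "WHERE category = ? AND context IN (?, 'global') "
        "ORDER BY confidence DESC LIMIT ?",
        (category, active_context, limit),
    ).fetchall()
```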
# Conflicts:
#   src/gaia/agents/chat/agent.py
#   src/gaia/apps/webui/src/App.tsx
#   src/gaia/apps/webui/src/components/ChatView.tsx
#   src/gaia/ui/server.py
Comprehensive rewrite of agent-memory-architecture.md as a single unified design document. Key changes:
- Hybrid search: vector (FAISS) + BM25 (FTS5) + RRF fusion + cross-encoder reranking (ms-marco-MiniLM-L-6-v2). No fallback — embeddings are a hard requirement.
- Mem0-style LLM extraction: ADD/UPDATE/DELETE/NOOP operations against existing memory, replacing naive extract-and-store.
- Zep-inspired fact lineage: a superseded_by column preserves history when facts are corrected rather than silently overwriting.
- Hindsight-inspired background reconciliation: a pairwise similarity check on startup detects contradictions missed at extraction time.
- Complexity-aware recall depth: adaptive top_k (3/5/10) based on query-complexity heuristics.
- Temporal range search: time_from/time_to on all search methods for natural time-based recall.
- Conversation consolidation: auto-distill old sessions to durable knowledge before the 90-day prune.
- Second-brain use cases: journaling, meeting notes, PKM, reminders, wake-up scheduling, recurring commitments.
- Removed all graceful degradation / silent fallback patterns.
- Removed openjarvis-memory-analysis.md (temp analysis doc).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…coverage, temporal+superseded filters
- POST /api/memory/consolidate, /reconcile, /rebuild-embeddings
- GET /api/memory/embedding-coverage
- Updated GET /api/memory/knowledge with include_superseded, time_from, time_to
- Updated GET /api/memory/stats with embedding coverage and reconciliation stats
- 95 tests passing, lint clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d_by, temporal search, consolidation
- Schema v1→v2 migration: embedding BLOB, superseded_by TEXT, consolidated_at TEXT
- New methods: store_embedding, get_items_with/without_embeddings, get_unconsolidated_sessions, mark_turns_consolidated, get_items_for_reconciliation
- Updated search() with time_from/time_to, superseded_by IS NULL, use_count increment
- Updated all query methods with the superseded_by IS NULL filter
- 275 tests passing, lint clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…LLM extraction, temporal recall Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… FAISS, API integration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ledge browser, activity timeline, tool stats
6-section dashboard: header stat cards, 30-day activity bar chart,
paginated knowledge browser with entity/category/context/search filters,
tool performance table, conversation history with FTS search,
upcoming & overdue temporal panel.
Features:
- Embedding coverage indicator with progress bar
- Maintenance dropdown: consolidate, rebuild embeddings, reconcile, rebuild FTS
- Click-to-expand knowledge row detail (metadata, timestamps, superseded_by chain)
- Inline actions: edit, delete, toggle sensitive, copy ID
- Superseded entries toggle with server-side filtering
- Toast notification system for all CRUD and maintenance operations
- Brain icon in sidebar for navigation
- Keyboard support: Escape key (layered close), Enter/Space on rows
- ARIA labels, roles, and aria-live for accessibility
- Responsive layout (3 breakpoints)
- Relative date formatting ("in 2 days", "3 days ago")
- API calls aligned with backend router field names
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…em0 extraction, consolidation, reconciliation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The backend returns metadata as parsed JSON (dict), not a string. Rendering it directly showed [object Object]. Now uses JSON.stringify for object metadata and plain text for strings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e cases
- Strengthen the conversation context filtering test with explicit zero-result assertions instead of a vacuous loop
- Add due_at validation, empty-list consolidation, and history limit tests
- Remove dead _past_iso import from the API test file
- 117 tests, all passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…m0 extraction, consolidation, reconciliation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…up scope includes entity, dynamic context always returns time
- MemoryStore.search(): corrected from "hybrid" to "FTS5 keyword search" (hybrid is MemoryMixin._hybrid_search)
- get_memory_dynamic_context(): fixed "returns empty" claim — it always returns the current time
- store() dedup scope: category+context+entity, not category+context
- get_items_with_embeddings(): added missing top_k, time_from, time_to params
- _classify_query_complexity: added missing medium/complex signal words
- get_entities(): added missing last_updated field in return
- Added undocumented update_confidence() and delete_by_source() methods
- update(): noted embedding cleared on content change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… fixes
- memory_store.py: set embedding=NULL when content changes in update() to force re-embedding (a stale embedding would return wrong results)
- server.py: alphabetize router imports
- test fixes: formatting cleanup, mixin test updates from parallel tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…aces, expand test coverage
- Replace hardcoded mixin prompt dispatch in Agent._get_mixin_prompts() with auto-discovery of all get_*_system_prompt() methods — no manual registration needed
- ChatAgent._get_mixin_prompts() simplified to call super() then filter SD when uninitialized
- Fix KeyError in _EXTRACTION_PROMPT.format() by escaping literal curly braces
- Update test_memory_mixin: reflect always-present memory instructions, fix embed/extract tests
- Add test_memory_store coverage: update clears embedding, dedup clears embedding, superseded exclusion in get_by_entity/get_upcoming/get_by_category_contexts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nce fix
- formatRelativeDate: add seconds granularity (6s ago, 45s ago); fall back to full date+time (not date-only) for items older than 30 days
- Swap formatDate → formatRelativeDate in the Updated column, conversation turn timestamps, and session last-activity
- memory_store.get_stats(): add total_retrievals (SUM of use_count)
- Add "Retrieved" stat card (gold accent) with average recalls per memory
- Rename the "Memories" card label to "Stored" for clarity
- remember() tool now stores with confidence=0.7 (was defaulting to 0.50)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add GET/PUT /api/memory/settings endpoint (mcp_memory_enabled key) stored in the ChatDatabase settings table
- agent_ui_mcp.py: conditionally register memory_stats, memory_list, memory_recall tools when mcp_memory_enabled=true at server start
- memoryApi.ts: add getMemorySettings / updateMemorySettings helpers
- Dashboard: Settings section with a toggle switch that loads on open and persists immediately on click; a hint explains the restart requirement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…resent instructions block, new header _build_stable_memory_prompt() now always emits instructions even with zero memories. Update docs to reflect new header (=== MEMORY (Persistent Second Brain) ===), the instructions block, the zero-memories fallback text, and the 4000-char hard cap note. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove the disabled attribute from the chat textarea so users can compose their next message while the agent is still generating a response. Sending is still blocked until streaming completes.
7 test classes covering the full memory second-brain pipeline:
- TestMemoryRememberRecall: store/search round-trip, confidence bumps, entity tagging, no false positives
- TestMemoryNoteTaking: categories, reminders with due dates, context isolation, sensitive items, error/skill notes
- TestMemoryJournaling: multi-turn conversation storage, FTS over history, session isolation, cross-session knowledge persistence
- TestMemoryConfidenceAndDedup: explicit vs LLM-extracted confidence, dedup on overlap, supersession exclusion, embedding cleared on update
- TestMemoryStatsAPI: total_retrievals in stats, REST endpoint contract
- TestMemorySettingsAPI: GET/PUT mcp_memory_enabled default + persistence
- TestMemoryMixinSystemPrompt: preferences/facts in stable prompt, sensitive exclusion, dynamic context timestamps, conversation storage

All run without Lemonade — deterministic fake embeddings, mocked LLM.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ions

Reconcile and consolidate were silently no-ops because self.chat (AgentSDK) isn't available when init_memory() runs (before Agent.__init__). Fix: defer both LLM-dependent steps to the first process_query() invocation via a _memory_post_init_pending flag and a _run_memory_post_init() helper. Remove stale hasattr(self, 'chat') guards from both methods.

Additional v2 completions staged alongside:
- system_context.py: day-0 OS/hardware/software context collection
- memory_store.py: v2 store enhancements (WAL checkpoint, get_* methods)
- cli.py: memory CLI commands (status, clear, export, context)
- agent_ui_mcp.py: MCP memory access toggle (default disabled)
- ui/routers/memory.py: memory router v2 endpoints
- docs/spec/agent-memory-architecture.pdf: architecture spec
- tests: 4 new deferred-init tests + store/router/eval coverage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- discovery.py: add _make_profile_fact() for user profile facts
- cli.py: first-boot onboarding flow — detect a missing profile, offer a quick intro (~1 min) on the first gaia chat launch
- _chat_helpers.py: _register_agent_memory_ops() wires the live ChatAgent LLM + FAISS into the memory router for consolidation / reconciliation
- memory.py, memory_store.py: MemoryMixin v2 improvements
- memory.py router: updated endpoints
- tests: test_memory_router updates
- plans/autonomous-agent-mode.md: detailed spec for the upcoming feature/agent-always-running branch (loop state machine, request_user_input tool design, AgentLoop architecture, UI components, open questions)
…temporal examples, ASCII arrows
- Replace all DB-log agent responses with a conversational voice (12 occurrences)
- Add the missing error_auto source (0.5) to the confidence table
- Convert the confidence bullet list to a scannable table with all 6 sources
- Show real agent output in the temporal recall examples instead of [internal process]
- Add a time-range example to search_past_conversations
- Fix the update_memory double "Agent:" line — separate output from internal annotation
- Replace Unicode arrows (→) with ASCII (->) in the fact lineage code block

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d + spec v2

Memory store:
- Add 'goal' and 'task' to VALID_CATEGORIES for autonomous agent use; goals/tasks require approved_for_auto=True metadata before the loop acts

Sessions:
- Add a private: bool field to sessions (CreateSessionRequest, UpdateSessionRequest, SessionResponse) — private sessions are never touched by the autonomous loop

Memory dashboard & UI:
- MemoryDashboard, Sidebar, ChatView, memoryApi.ts updates
- types/index.ts, api.ts, utils.py additions

Backend:
- _chat_helpers.py, database.py, routers/memory.py, routers/sessions.py updates
- models.py: private session field

Tests:
- test_memory_router.py updates

Spec:
- plans/autonomous-agent-mode.md v2 — addresses all 15 security/design issues from the critique: event-driven triggers, background_mode immediate-deny, memory injection prevention, audit log as P1, PathValidator deadlock fix, SCHEDULE bounds, tunnel safety, private session exclusion, user input queue, __NO_RESPONSE__ sentinel, permission overlay context, step budgets
…2, fix tool-message sentinel bug
- Add GoalStore (goal_store.py) with state-machine goals/tasks and a goals router
- Expand SystemDiscovery with UserAssist, recent file types, gaming/media, and macOS app-usage scans
- Replace goal/task categories in MemoryStore with a permission category; export GoalStore from base
- Extend the memory router with goals CRUD, upcoming, conversation search, stats v2, settings
- Memory dashboard v2: goals panel, search, upcoming, knowledge CRUD, conversation history
- Fix sdk.py: remove the "continue" sentinel from _prepare_messages_for_llm; convert tool messages to the user role (was assistant) so the LLM sees tool results as a proper user turn, not a bare "continue" command that caused nonsensical responses like "What do you want me to continue?"
- Add test_sdk_tool_messages.py regression tests for the sentinel fix
- Update bootstrap inference with app-usage, file-type, and gaming/media sections

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…istory storage

Qwen3.5 reasoning models emit <think>...</think> blocks before their JSON response. These were passed raw into _parse_llm_response and stored verbatim in conversation_history. On the next turn the LLM saw its own prior thinking as historical assistant context and used it to infer what the user "must have" said, producing replies unrelated to the actual current message (e.g. answering a games question when the user asked to set a reminder).

Strip <think>...</think> from both the main response and the plan_response paths immediately after the LLM returns, before parsing and before storing in messages. The SSE streaming handler already filtered these for display; this fixes the agent-side persistence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
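The fix described above amounts to a small regex pass; a minimal sketch, assuming the tags are literal <think>...</think> markers and the function name is illustrative:

```python
import re

# DOTALL so multi-line reasoning blocks are matched; non-greedy so a
# stray second block is not swallowed along with the text between them.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think_blocks(text: str) -> str:
    """Remove <think>...</think> reasoning blocks before parsing and
    before storing in conversation history, so the model never sees its
    own prior chain-of-thought as assistant context on the next turn."""
    return _THINK_RE.sub("", text).strip()
```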
…journal system prompt
- recall() now accepts domain= parameter for journal/sub-type filtering
(e.g. recall(category='note', domain='journal') returns all journal entries)
- get_by_category() gains domain= filter parameter with SQL pushdown
- Default limit raised to 20 for list-style queries (no query text)
- Added perf_counter timing in recall tool (DEBUG level per-step, INFO total)
- System prompt now teaches agent:
- 'Show my journal' → recall(category='note', domain='journal')
- 'What reminders?' → recall(category='reminder')
- Tightened storage IMPERATIVE for set-a-reminder / journal-entry requests
- MIN_EXTRACTION_WORDS lowered 20→5, EXTRACTION_TIMEOUT_S raised 3→8
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
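The per-step timing mentioned above can be sketched as a small context manager; the logger name and message format are illustrative, not the recall tool's actual code:

```python
import logging
from contextlib import contextmanager
from time import perf_counter

logger = logging.getLogger("gaia.memory.recall")

@contextmanager
def timed(step: str):
    """Log per-step wall time at DEBUG level; the caller can sum the
    steps and log an INFO total."""
    start = perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (perf_counter() - start) * 1000
        logger.debug("%s took %.1f ms", step, elapsed_ms)
```

Usage would look like `with timed("fts_search"): rows = store.search(q)`.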
…e eval scenarios

recall() tool:
- Expanded docstring to document both search and browse/list modes
- Added offset= parameter for pagination (browse page 2, etc.)
- Max limit raised to 100 for list-style queries
- has_more flag in the response signals more results are available

Eval personas (simulator.md + runner.py):
- Added home_user, small_business_owner, student, creative_professional
- Each persona has a distinct communication style and context

New memory eval scenarios (4):
- memory_todo_tracking: persistent todo list (add, complete, list)
- memory_notes_capture: second-brain quick note capture & retrieval
- memory_small_business_context: business context + customer prefs
- memory_student_study_assistant: courses, deadlines, learning prefs
- memory_home_user_basics: simple personal facts (non-technical)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New memory_health_tracking scenario tests the agent as a personal health journal: logging sleep, exercise, and habits, then analyzing patterns and giving personalized recommendations based on accumulated data. Uses home_user persona for non-technical communication style. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ChatAgent personality rules:
- GREETING RULE now memory-aware: references stored name/project instead
of generic "What are you working on?" template
- Added FACT-SHARING RULE: when user shares personal info ("I'm Sam",
"I use Python"), agent must acknowledge the specific content, not give
a generic greeting response
- Removed hardcoded "RIGHT: Hey! What are you working on?" example that
the LLM was copying verbatim for every response
Memory instructions:
- Greeting personalization examples added (use stored name + project)
- Shortened some instruction lines to reduce prompt bloat
Root cause: personality greeting examples were overriding memory context,
causing identical "Got it, Sam! What are you working on?" responses
regardless of what the user actually said.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… CLI
- Add AgentLoop background task for autonomous goal execution with an event-driven state machine (IDLE/RUNNING/SCHEDULED/PAUSED)
- Expand the MCP router with connection health, tool list, and control endpoints
- Add agent-mode and memory toggle controls to SettingsModal (frontend + API)
- Extend the CLI with memory/goal/agent-mode subcommands
- Update GoalStore with scheduling, priority sorting, and rate-limit guards
- Wire the SSE handler to stream AgentLoop state-change events
- Update the memory router and database schema for new fields
- Fix discovery scan edge cases and tool-message sentinel handling

All 1891 unit tests passing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    yield _sse({"type": "done", "total": 0})
    ...
    return StreamingResponse(
        _generate(),

Check warning — Code scanning / CodeQL: Information exposure through an exception (Medium)
…tration-v1

Full recursive pipeline analysis (planning-analysis-strategist → software-program-manager → quality-reviewer → technical-writer-expert) of amd#606 (feat(memory): agent memory v2 — kovtcharov). Key findings:
- 4 HIGH-severity collisions: _chat_helpers.py, database.py, sse_handler.py, routers/mcp.py — all follow the same pattern: our branch created comprehensive modules where PR amd#606 made targeted additions. Resolution: absorb the PR's additions into ours during the post-merge rebase.
- 1 ZERO conflict: the sdk.py ChatSDK→AgentSDK rename is identical in both branches — auto-resolves on merge.
- 6 build-upon opportunities: MemoryMixin for pipeline agents, GoalStore↔PipelineExecutor wiring, AgentLoop convergence, SystemDiscovery→DomainAnalyzer calibration, GapDetector caching, declarative memory tool-calls in component-framework templates.
- Recommended: PR amd#606 lands in main first; we rebase and absorb.
- Open Items 9–15 added to branch-change-matrix.md tracking all conflicts and Phase 6 build-upon work.

Files: docs/reference/pr606-integration-analysis.md (531 lines), docs/reference/branch-change-matrix.md (+16 lines, OI 9–15)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR #606 Integration Requirements

What
| Reserved string | Where it is hardwired | Effect of collision |
|---|---|---|
| loop_back | types/index.ts line 260, api.ts line 637 | Routed to onLoopBack pipeline callback |
| quality_score | types/index.ts line 261, api.ts line 638 | Routed to onQualityScore pipeline callback |
| phase_jump | types/index.ts line 262, api.ts line 639 | Routed to onPhaseJump pipeline callback |
| iteration_start | types/index.ts line 263, api.ts line 640 | Routed to onIterationStart pipeline callback |
| iteration_end | types/index.ts line 264, api.ts line 641 | Routed to onIterationEnd pipeline callback |
| defect_found | types/index.ts line 265, api.ts line 642 | Routed to onDefectFound pipeline callback |
The StreamEventType union lives at lines 260-265 of src/gaia/apps/webui/src/types/index.ts. The routing table PIPELINE_EVENT_MAP is a Record<string, keyof PipelineStreamCallbacks> at lines 626-643 of src/gaia/apps/webui/src/services/api.ts. These strings are literal TypeScript values, not constants — they are compared directly against the type field of incoming SSE JSON objects.
What it must do instead:
You must propose replacement names for every AgentLoop event type that collides with the six names above. Send us your proposed names before writing any code — we will confirm no collision exists on our side, then you implement.
Suggested naming convention for AgentLoop events: prefix with memory_ or agent_loop_ to create a distinct namespace. Examples (not binding — send us your proposed list):
- agent_loop_cycle_start instead of iteration_start
- agent_loop_cycle_end instead of iteration_end
- memory_quality_check instead of quality_score
- agent_loop_back instead of loop_back
- agent_phase_transition instead of phase_jump
- memory_defect_detected instead of defect_found
Why this matters:
There is no runtime error when a collision occurs. The frontend PIPELINE_EVENT_MAP routes the event to a pipeline callback. The pipeline callback expects specific fields (iteration, score, defect_type, etc.) that your AgentLoop event does not carry. The pipeline UI component renders with undefined fields or renders stale pipeline state data instead of memory state data. This failure mode is invisible at the Python layer and produces silently wrong UI behavior.
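Because the failure is invisible at runtime, one cheap guard is a unit test that intersects the AgentLoop event names against the reserved strings. A sketch under the suggested (non-binding) naming convention above; the helper function is hypothetical:

```python
# The six reserved pipeline SSE event strings (types/index.ts lines 260-265).
RESERVED_PIPELINE_EVENTS = {
    "loop_back", "quality_score", "phase_jump",
    "iteration_start", "iteration_end", "defect_found",
}

# Hypothetical namespaced AgentLoop event names from the suggested convention.
AGENT_LOOP_EVENTS = {
    "agent_loop_cycle_start", "agent_loop_cycle_end",
    "agent_loop_back", "agent_phase_transition",
    "memory_quality_check", "memory_defect_detected",
}

def find_collisions(loop_events, reserved=frozenset(RESERVED_PIPELINE_EVENTS)):
    """Return the event names that PIPELINE_EVENT_MAP would hijack;
    an empty set means the namespace is safe to emit."""
    return set(loop_events) & set(reserved)
```

Running this check in CI on both sides would turn the silent UI failure into a test failure at the moment either team adds a colliding name.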
Cross-team coordination required:
Do not implement this change unilaterally. Contact us (pipeline team, feature/pipeline-orchestration-v1) with your proposed replacement name list. We confirm. Then you implement and update REQ-5's MemoryDashboard.tsx accordingly. See Section 6 for the full coordination gate protocol.
Correctness checklist for REQ-3:
- No AgentLoop SSE event uses any of the 6 reserved strings listed above
- Replacement names agreed bilaterally before implementation
- All Python-side emitters updated to the new names
- MemoryDashboard.tsx event type strings updated to match (see REC-5)
REQ-4 — Insert close_store() in the Correct Position in the Shutdown Sequence
Classification: MERGE-BLOCKER
Estimated effort: 15 minutes
File to change: src/gaia/ui/server.py
What your PR currently does:
Your PR calls close_store() (or equivalent) either after db.close() at line 212, or registers a shutdown hook that runs after the lifespan context exits.
What it must do instead:
The lifespan shutdown block in server.py runs from lines 209-213:
```python
# Line 209 (context yield exits here, shutdown begins)
# Line 210:
await monitor.stop()
logger.info("Document file monitor stopped")
# Line 212:
db.close()
logger.info("Database connection closed")
```

Your close_store() call must be inserted between line 210 and line 212 — after monitor.stop() completes and before db.close() is called. The position must look like:
```python
await monitor.stop()
logger.info("Document file monitor stopped")

# INSERT: close_store() here — after monitor, before db
close_store()
logger.info("Memory store closed")

db.close()
logger.info("Database connection closed")
```

Why this matters:
db.close() at line 212 calls self._conn.close() on the SQLite connection and then sets self._conn = None (lines 179-182 of database.py). The _transaction() context manager (lines 184-195 of database.py) checks if self._conn is None and raises RuntimeError("Database connection is closed"). If your MemoryStore shares the ChatDatabase connection (or opens its own WAL-mode connection to the same file), a close_store() that runs after db.close() will either operate on a closed connection or fail to flush pending writes before shutdown.
If MemoryStore uses a completely separate file and connection, the ordering still matters for clean log sequencing and to match our lifespan expectations for future shutdown hooks.
Correctness checklist for REQ-4:
- close_store() is called after monitor.stop() (line 210) and before db.close() (line 212)
- close_store() is idempotent (safe to call even if the store was never opened)
REQ-5 — Do Not Add dependencies to APIRouter in routers/mcp.py
Classification: MERGE-BLOCKER
Estimated effort: 15 minutes
File to change: src/gaia/ui/routers/mcp.py
What your PR currently does:
Your PR adds authentication or dependency injection to the MCP router by modifying the APIRouter constructor at line 16 of routers/mcp.py.
What it must do instead:
Line 16 of routers/mcp.py currently reads:
```python
router = APIRouter(tags=["mcp"])
```

This line must remain exactly as it is — no dependencies=[...] argument. If your new control endpoints (the ones that trigger memory operations via MCP) require authentication, add the dependencies argument at the individual endpoint level using the @router.post(...) decorator, not at the APIRouter level.
Example of the correct pattern:
```python
# WRONG — do not do this:
router = APIRouter(tags=["mcp"], dependencies=[Depends(verify_auth)])

# CORRECT — per-endpoint auth:
router = APIRouter(tags=["mcp"])

@router.post("/api/mcp/memory/control", dependencies=[Depends(verify_auth)])
async def memory_control_endpoint(...):
    ...
```
...Why this matters:
The existing catalog endpoints — GET /api/mcp/catalog, GET /api/mcp/catalog/{name}, and GET /api/mcp/install-config — are intentionally unauthenticated. They serve a read-only curated catalog of MCP server definitions. These endpoints are accessed by the GAIA installer and by external tooling during first-run setup, before any auth tokens exist. Adding dependencies=[...] to the APIRouter constructor applies that dependency to all routes registered on the router, including these unauthenticated catalog routes. This breaks the installer flow with a 401 or 403 on the catalog fetch.
Correctness checklist for REQ-5:
- Line 16 of routers/mcp.py reads router = APIRouter(tags=["mcp"]) — no dependencies argument
- Any auth on new endpoints is applied via @router.post(..., dependencies=[...]) at the individual route level
- GET /api/mcp/catalog, GET /api/mcp/catalog/{name}, and GET /api/mcp/install-config remain unauthenticated
REQ-6 — Add Collision Guard to ToolRegistry.register() Before MemoryMixin Rollout
Classification: REBASE-BLOCKER
Estimated effort: 1 hour
File to change: src/gaia/agents/base/tools.py
What the current code does:
ToolRegistry.register() at line 396 of tools.py writes:

```python
self._tools[name] = {
    "name": name,
    "function": func,
    "description": description or (func.__doc__ or ""),
    "parameters": params,
    "atomic": atomic,
    "display_name": display_name or name,
}
```

There is no check for whether name already exists in self._tools. A second call with the same name silently overwrites the previous registration with no log, no warning, and no error.
What it must do instead:
Add a logger.warning() call immediately before the assignment at line 396:
```python
if name in self._tools:
    logger.warning(
        "ToolRegistry: tool name %r is already registered and will be overwritten. "
        "This may indicate duplicate mixin registration.",
        name,
    )
self._tools[name] = {
    ...
}
```
}Why this matters now (not later):
Our BU-1 (described in Section 3) adds MemoryMixin to all five pipeline stage agents: DomainAnalyzer, PlannerAgent, ExecutorAgent, ReviewerAgent, and SynthesizerAgent. Each agent instantiation calls _register_agent_memory_ops(), which registers your five memory tool names: remember, recall, update_memory, forget, search_past_conversations. All five agents share a single process and, in some pipeline configurations, a shared ToolRegistry instance. That means your five tool names will be registered five times.
Silent overwrites in this scenario will mask real bugs: if DomainAnalyzer's remember function is accidentally overwritten by SynthesizerAgent's remember function (due to, say, a different bound self), the pipeline will call the wrong agent's memory with no diagnostic output. With a logger.warning() in place, we will see exactly which agent caused the collision and can fix the registry-sharing design before it causes data corruption.
The five tool names are currently safe (each agent registers the same function with the same signature). The guard is needed as insurance before we scale to five registrations.
Correctness checklist for REQ-6:
- `logger.warning()` is emitted when `name in self._tools`, before the assignment at line 396
- Warning message includes the tool name and indicates potential duplicate mixin registration
- The overwrite still proceeds after the warning (do not raise — existing behavior is preserved for non-memory tools)
- `logger` is already imported in `tools.py` — do not add a new import
REQ-7 — Make AgentLoop Externally Drivable (Injectable Event Loop + Async Generator Interface)
Classification: REBASE-BLOCKER
Estimated effort: 3-5 days
Files to change: Your AgentLoop class and any caller that instantiates it
What your PR currently does:
Your AgentLoop creates its own event loop internally — either via asyncio.new_event_loop() or asyncio.run() — and runs to completion. It is not drivable from an external async caller.
What it must do instead:
AgentLoop must expose three things:
1. Injectable event loop:
AgentLoop must accept an optional event loop parameter in its constructor. It must not call asyncio.new_event_loop() or asyncio.set_event_loop() internally. If no loop is provided, it may use asyncio.get_event_loop(), but it must never create or set one.
class AgentLoop:
def __init__(self, ..., loop: asyncio.AbstractEventLoop | None = None):
self._loop = loop  # Injected by caller, not created here

2. Async generator interface:
AgentLoop must expose a coroutine or async generator that PipelineEngine can drive:
async def run(self, goal: str) -> AsyncIterator[LoopEvent]:
"""
Drive the agent loop externally.
Yields LoopEvent objects as work progresses.
Caller controls when to advance, cancel, or inspect state.
"""
...
yield LoopEvent(type="cycle_start", ...)
...
yield LoopEvent(type="cycle_end", ...)

The exact LoopEvent schema is flexible — work with us on the spec (see BU-3). The requirement is that the method is an async generator (or returns an AsyncIterator) so PipelineEngine can `async for event in loop.run(goal):` without blocking its own coroutine.
3. GoalStore as constructor injection:
AgentLoop must accept GoalStore as a constructor parameter rather than instantiating it internally. This allows PipelineOrchestrator to pass its own GoalStore instance:
class AgentLoop:
def __init__(self, ..., goal_store: GoalStore | None = None):
self._goal_store = goal_store or GoalStore()

Why this matters:
_execute_recursive_pipeline() (lines 656-661 of src/gaia/pipeline/orchestrator.py) creates its own event loop:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
return loop.run_until_complete(_run_async())
finally:
loop.close()

This pattern exists because _execute_recursive_pipeline() is called from a sync context (the FastAPI route handler thread). If AgentLoop also calls asyncio.new_event_loop() inside a coroutine that is already running inside loop.run_until_complete(), Python will raise RuntimeError: This event loop is already running. There is no safe way to nest two asyncio.new_event_loop() calls in the same thread.
Our Phase 6 plan (BU-3) is to converge PipelineEngine and AgentLoop into a single runtime where the engine drives the loop iteration-by-iteration. That convergence is architecturally impossible if AgentLoop owns its own event loop. The injectable loop + async generator interface is the minimal contract that makes convergence tractable without a full rewrite.
Correctness checklist for REQ-7:
- `AgentLoop.__init__` accepts an optional `loop` parameter
- `AgentLoop` does not call `asyncio.new_event_loop()` or `asyncio.set_event_loop()` anywhere
- `AgentLoop.run(goal)` is an async generator that yields `LoopEvent` objects
- `GoalStore` is injected via the constructor, not instantiated inside `AgentLoop`
- `AgentLoop` can be used both standalone (with default `GoalStore`) and driven externally
REQ-8 — Scope the Curly-Brace Escaping Fix to _get_mixin_prompts() Only
Classification: MERGE-BLOCKER
Estimated effort: 1 hour
Files to change: src/gaia/agents/base/agent.py (scope the fix); verify no change to src/gaia/utils/component_loader.py
What your PR currently does:
Your PR fixes a curly-brace escaping issue where mixin prompt strings containing {...} were being interpreted as Python .format() placeholders. The fix escapes or converts the strings before the format call. However, your fix either touches component_loader.py or applies the escaping too broadly (affecting code paths beyond _get_mixin_prompts()).
What it must do instead:
The escaping fix must be applied only inside _get_mixin_prompts() at line 299 of src/gaia/agents/base/agent.py. It must not touch src/gaia/utils/component_loader.py at all.
Here is why component_loader.py must not be changed: ComponentLoader.render_component() (lines 240-246 of component_loader.py) uses str.replace() with {{KEY}} format:
# Lines 240-246 of src/gaia/utils/component_loader.py
# Replace {{VARIABLE}} placeholders
for key, value in variables.items():
# Handle both "KEY" and "{{KEY}}" formats
if not key.startswith("{{"):
key = f"{{{{{key}}}}}"
    content = content.replace(key, str(value))

This is a literal str.replace() call — not Python string .format(). It operates on {{KEY}} tokens by doing a character-by-character string search and replace. Python .format() escaping rules (doubling curly braces as {{ and }} to produce literal { and }) are completely orthogonal to this mechanism. There is no interaction between Python .format() and str.replace(). Changing component_loader.py in response to a .format() escaping bug is a category error — the two systems do not share state, parsing, or execution path.
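A quick demonstration of why the two mechanisms never interact (the template strings here are illustrative, not from the codebase):

```python
# {{KEY}} substitution in the style of ComponentLoader.render_component():
# plain str.replace(), no .format() involved. Single braces pass through
# untouched because str.replace() only matches the literal "{{MODEL}}" token.
template = "Model: {{MODEL}}; raw braces like {not_a_placeholder} survive."
rendered = template.replace("{{MODEL}}", "llama-3")

# Python .format() substitution: single braces are placeholders, and doubled
# braces escape down to single literal braces. A completely separate parser.
prompt = "Use tool {tool}. Literal braces: {{json}}".format(tool="recall")
```

`rendered` keeps `{not_a_placeholder}` verbatim, while `.format()` turns `{{json}}` into `{json}` — which is exactly why a `.format()` escaping bug in `agent.py` implies no change to the `str.replace()` path.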
Verification step you must perform:
After your changes, run:
git diff HEAD -- src/gaia/utils/component_loader.py

The diff must be empty. If it is not empty, your fix has overreached. Revert the component_loader.py changes and re-scope the fix to agent.py line 299 only.
Correctness checklist for REQ-8:
- Curly-brace escaping fix is inside `_get_mixin_prompts()` at `agent.py` line 299
- `src/gaia/utils/component_loader.py` is unchanged (verified by `git diff`)
- `ComponentLoader.render_component()` behavior (lines 240-246) is unaffected
- Fix does not apply to any other string formatting path in `agent.py`
Section 2: Recommended Changes
These five items are not gating conditions for the merge but will make the integration significantly smoother. Items marked [PRE-MERGE] we would strongly prefer to see in the PR before merge. Items marked [POST-MERGE] can follow in a subsequent PR.
REC-1 — Stabilize and Document GoalStore Public API Signatures [PRE-MERGE]
What we need:
We are building BU-2 (wiring GoalStore into PipelineOrchestrator) against your GoalStore API. We need the following three method signatures to be stable and documented in your PR description before we start:
def create_goal(title: str, priority: int) -> str:
"""Create a new goal. Returns the goal_id (string UUID)."""
def create_task(goal_id: str, title: str) -> str:
"""Create a task under a goal. Returns the task_id (string UUID)."""
def update_task_status(
task_id: str,
status: Literal["PENDING", "ACTIVE", "COMPLETED", "FAILED"]
) -> None:
    """Update the status of a task. Raises KeyError if task_id not found."""

If your current implementation differs in signature (different parameter names, different return type, different status literals), tell us before merge. We will update BU-2's dependency spec to match. We just need the signatures to be frozen so BU-2 does not break on every rebase.
Action: Add the exact method signatures with docstrings to your PR description. If the implementation deviates from what is above, describe how it deviates.
REC-2 — SystemDiscovery.get_cached_context() Must Return a Typed Object [PRE-MERGE]
What we need:
We are wiring SystemDiscovery.get_cached_context() into our DomainAnalyzer agent (BU-4) to enable NPU-aware tier recommendations. We need the return value to reliably expose:
context.hardware.npu_available: bool
context.hardware.npu_driver_version: str  # empty string when no NPU, not None

The import path must be:

from gaia.agents.base.discovery import SystemDiscovery

The method must be idempotent: the first call performs hardware discovery and caches the result; subsequent calls return the cached result without re-probing hardware. This is required because DomainAnalyzer may be instantiated multiple times per pipeline run.
The npu_driver_version must be str, not Optional[str]. When no NPU is present, return an empty string "". Our code does if context.hardware.npu_driver_version: and an empty string is falsy — None requires a separate is None check that we do not want to add.
Action: Verify your SystemDiscovery.get_cached_context() implementation matches the above contract. If the attribute path is different (e.g., context.npu.available instead of context.hardware.npu_available), tell us the actual path so we can update BU-4's implementation.
REC-3 — Acknowledge the AgentRegistry Cache Staleness Race in Code Comments [POST-MERGE]
Context:
Code in your PR that reads AgentDefinition objects from AgentRegistry may encounter stale cached definitions. We are actively fixing issue OI-21: adding a file lock and an invalidate_capability_cache() call to the PUT /api/v1/pipeline/agents/{agent_id}/raw write path. Until OI-21 lands, there is a race window: your code reads a cached definition while a concurrent pipeline API write updates the underlying file.
What we ask:
Add a code comment in the AgentLoop or MemoryMixin code that reads from AgentRegistry acknowledging this race:
# NOTE: AgentDefinition read from AgentRegistry may be stale if a concurrent
# PUT /api/v1/pipeline/agents/{agent_id}/raw write is in progress.
# OI-21 (pipeline team) adds invalidate_capability_cache() on that write path.
# Until OI-21 lands, treat AgentDefinition as a read-only snapshot.

This prevents future contributors from assuming the registry is always consistent and prevents the issue from being silently papered over.
REC-4 — Use a Separate File Path for MemoryStore SQLite Database [PRE-MERGE]
What we need:
MemoryStore should default to ~/.gaia/memory/gaia_memory.db, not ~/.gaia/chat/gaia_chat.db.
Why:
ChatDatabase opens its connection with PRAGMA journal_mode = WAL (line 101 of database.py). WAL mode allows one writer and multiple readers on the same database file. However, two Python objects opening separate sqlite3.connect() connections to the same WAL-mode file — one for chat and one for memory — compete as simultaneous writers. Under concurrent chat message writes and memory consolidation writes, one of the two connections will receive OperationalError: database is locked.
SQLite's WAL mode serializes writers at the OS level, not the Python thread level. The ChatDatabase._lock = threading.RLock() (line 93 of database.py) serializes within the ChatDatabase instance, but it does not protect against a second sqlite3.connect() opened by MemoryStore from a different Python object.
Using a separate file (~/.gaia/memory/gaia_memory.db) eliminates the contention entirely. The memory database can run its own WAL-mode connection independently.
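A sketch of the intended layout, assuming the default paths named above (table names are illustrative): each database gets its own file and its own independent WAL-mode connection, so writers never contend for the same lock.

```python
import sqlite3
import tempfile
from pathlib import Path

# Stand-in for ~/.gaia; a temp dir keeps the sketch self-contained.
base = Path(tempfile.mkdtemp())
chat_path = base / "chat" / "gaia_chat.db"        # ChatDatabase
memory_path = base / "memory" / "gaia_memory.db"  # MemoryStore (REC-4)

for path in (chat_path, memory_path):
    path.parent.mkdir(parents=True, exist_ok=True)

# Two separate files, two separate WAL journals: no shared writer lock.
chat = sqlite3.connect(chat_path)
memory = sqlite3.connect(memory_path)
chat.execute("PRAGMA journal_mode = WAL")
memory.execute("PRAGMA journal_mode = WAL")

chat.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
memory.execute("CREATE TABLE knowledge (id INTEGER PRIMARY KEY, fact TEXT)")
chat.execute("INSERT INTO messages (body) VALUES ('hi')")
memory.execute("INSERT INTO knowledge (fact) VALUES ('user likes tea')")
chat.commit()
memory.commit()
```

Had both objects opened connections to the same file, concurrent writes could surface `OperationalError: database is locked`; with separate files the failure mode disappears by construction.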
REC-5 — Add MemoryEventType as a Separate Union in types/index.ts [POST-MERGE]
What we need:
After REQ-3 renames are agreed and implemented, your AgentLoop will emit new event type strings (the non-colliding names). These strings must be added to the TypeScript type system. However, they must not be added to the existing StreamEventType union at lines 260-265 of src/gaia/apps/webui/src/types/index.ts.
Instead, create a separate union:
// In src/gaia/apps/webui/src/types/index.ts
// Add AFTER the StreamEventType union (after line 265):
/** Event types emitted specifically by AgentLoop (memory subsystem). */
export type MemoryEventType =
| 'agent_loop_cycle_start' // example — use agreed names from REQ-3
| 'agent_loop_cycle_end'
| 'memory_quality_check'
| 'memory_recalled'
| 'memory_consolidated';
// ... add all agreed AgentLoop event names here

Why separate unions:
StreamEventType is used by SSEOutputHandler (line 89 of sse_handler.py) and the pipeline SSE router. PIPELINE_EVENT_MAP in api.ts routes StreamEventType strings. Extending StreamEventType with memory events means the pipeline SSE router would need to handle memory events — coupling two subsystems that should be independent. A separate MemoryEventType union keeps the two namespaces clean and allows MemoryDashboard.tsx to narrow its type to MemoryEventType without accepting the full StreamEventType union.
Section 3: What We Will Build On Top
This section is our contractual commitment to you. When your PR meets the requirements above, we will build the following on top of your memory subsystem. The dependency annotations show which REQ or REC items must be complete first.
BU-1 — Add MemoryMixin to All Five Pipeline Stage Agents
Dependencies: REQ-1 (cache access), REQ-2 (schema migration), REQ-3 (SSE name collision), REQ-6 (collision guard)
Our estimated effort: 1 sprint
What we will do:
We will apply MemoryMixin to these five agents:
- `DomainAnalyzer` — stores domain context as long-term knowledge
- `PlannerAgent` — stores plan decisions for retrospective analysis
- `ExecutorAgent` — stores tool call results and execution trace
- `ReviewerAgent` — stores quality assessments and defect patterns
- `SynthesizerAgent` — stores synthesized outputs for cross-run deduplication
Each agent's __init__ will call _register_agent_memory_ops(), registering your five tool names five times total. REQ-6's collision guard must be in place to make this safe to debug.
REQ-1 must be in place so _register_agent_memory_ops() correctly retrieves the live agent instance from _agent_cache. REQ-2 must be in place so the memory schema columns exist before any memory write occurs. REQ-3 must be in place so the memory SSE events from these five agents do not collide with pipeline SSE events in the browser.
BU-2 — Wire GoalStore into PipelineOrchestrator
Dependencies: REC-1 (stable GoalStore API), REQ-4 (shutdown order)
Our estimated effort: 3-5 days
What we will do:
PipelineOrchestrator will accept a GoalStore instance and call:
- `create_goal(title=task_description, priority=1)` at pipeline start
- `create_task(goal_id, title=stage_name)` for each pipeline stage
- `update_task_status(task_id, "ACTIVE")` when a stage starts
- `update_task_status(task_id, "COMPLETED")` or `update_task_status(task_id, "FAILED")` when a stage ends
This makes pipeline runs visible in the Memory Dashboard without any change to the dashboard itself. The dashboard will show pipeline goals alongside agent memory goals, unified by GoalStore.
REQ-4 must be in place to ensure GoalStore (and its underlying MemoryStore) is closed before db.close() runs during application shutdown.
We are building against exactly the three method signatures specified in REC-1. If those signatures change after we start BU-2, we will need a version bump and a migration.
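The wiring can be sketched against the REC-1 signatures with an in-memory stand-in GoalStore (the real store is SQLite-backed; only the method signatures below are from REC-1, the dict-based bodies are illustrative):

```python
import uuid
from typing import Literal


class GoalStore:
    # In-memory stand-in honoring the REC-1 signatures.
    def __init__(self) -> None:
        self.goals: dict[str, dict] = {}
        self.tasks: dict[str, dict] = {}

    def create_goal(self, title: str, priority: int) -> str:
        goal_id = str(uuid.uuid4())
        self.goals[goal_id] = {"title": title, "priority": priority}
        return goal_id

    def create_task(self, goal_id: str, title: str) -> str:
        task_id = str(uuid.uuid4())
        self.tasks[task_id] = {"goal_id": goal_id, "title": title,
                               "status": "PENDING"}
        return task_id

    def update_task_status(
        self, task_id: str,
        status: Literal["PENDING", "ACTIVE", "COMPLETED", "FAILED"],
    ) -> None:
        self.tasks[task_id]["status"] = status  # KeyError if task_id unknown


# Orchestrator-side flow: one goal per run, one task per pipeline stage.
store = GoalStore()
goal_id = store.create_goal(title="triage inbox", priority=1)
statuses = []
for stage in ("domain", "plan", "execute", "review", "synthesize"):
    task_id = store.create_task(goal_id, title=stage)
    store.update_task_status(task_id, "ACTIVE")
    store.update_task_status(task_id, "COMPLETED")
    statuses.append(store.tasks[task_id]["status"])
```

This is the entire integration surface BU-2 needs — which is why freezing these three signatures before merge matters more than their implementation details.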
BU-3 — Phase 6 Convergence Spec: AgentLoop / PipelineEngine Unified Runtime
Dependencies: REQ-7 (injectable event loop + async generator interface)
Our estimated effort: 1-2 sprints (design), 2-3 sprints (implementation)
What we will do:
We will write a convergence specification that defines how AgentLoop and PipelineEngine merge into a single runtime. The unified runtime will:
- Accept a goal or task description
- Drive pipeline phases (domain → plan → execute → review → synthesize) as `AgentLoop` cycles
- Emit a unified event stream that covers both `LoopEvent` (from your system) and `PipelineEvent` (from ours)
- Support both memory-first (AgentLoop drives, engine records state) and pipeline-first (engine drives, loop provides execution substrate) operating modes
This convergence is only tractable if AgentLoop exposes the injectable event loop and async generator interface from REQ-7. Without REQ-7, the two systems cannot share a coroutine runtime without nesting event loops, which Python does not support.
We will invite you to co-author the convergence spec once REQ-7 is implemented.
BU-4 — Wire SystemDiscovery into DomainAnalyzer for NPU-Aware Tier Recommendations
Dependencies: REC-2 (SystemDiscovery.get_cached_context() returns typed object with hardware.npu_available)
Our estimated effort: 2-3 days
What we will do:
DomainAnalyzer currently selects a model tier based on task complexity alone. We will extend it to check context.hardware.npu_available from SystemDiscovery.get_cached_context(). When an NPU is available, DomainAnalyzer will recommend smaller quantized models (Tier 1-2) that run efficiently on the Ryzen AI NPU. When no NPU is present, it falls back to the current tier selection logic.
This requires REC-2 because DomainAnalyzer is instantiated multiple times per pipeline run and cannot afford the latency of repeated hardware probes. The caching guarantee in REC-2 is essential.
Section 4: Integration Contract Table
This table shows the two-way dependency map. "Kovtcharov provides" means your PR must implement it. "Pipeline team provides" means we build it after your requirement is met.
| Dependency | Kovtcharov provides | Pipeline team consumes | Unblocked by |
|---|---|---|---|
| Agent cache access | REQ-1: correct `_agent_cache` read with lock | `MemoryMixin` in pipeline agents | REQ-1 |
| Schema migration | REQ-2: three columns via `_migrate()` pattern | Memory reads/writes in pipeline runs | REQ-2 |
| SSE event names | REQ-3: non-colliding AgentLoop event names | Browser memory UI (no misrouting) | REQ-3 |
| Shutdown order | REQ-4: `close_store()` before `db.close()` | Clean process exit after pipeline runs | REQ-4 |
| Router auth scope | REQ-5: per-endpoint auth only | MCP catalog used by installer | REQ-5 |
| Tool collision guard | REQ-6: `logger.warning()` on overwrite | `MemoryMixin` on 5 pipeline agents (BU-1) | REQ-6 |
| Drivable AgentLoop | REQ-7: async generator + injectable loop | PipelineEngine convergence (BU-3) | REQ-7 |
| Scoped brace fix | REQ-8: only `_get_mixin_prompts()` touched | ComponentLoader template rendering | REQ-8 |
| GoalStore API | REC-1: stable method signatures | GoalStore in PipelineOrchestrator (BU-2) | REC-1 |
| SystemDiscovery | REC-2: typed return with `hardware.npu_available` | NPU-aware DomainAnalyzer (BU-4) | REC-2 |
| Registry staleness | REC-3: code comment acknowledging race | Future OI-21 fix | REC-3 |
| Separate DB file | REC-4: MemoryStore on `~/.gaia/memory/` | No WAL contention with ChatDatabase | REC-4 |
| MemoryEventType | REC-5: separate TS union after REQ-3 names agreed | MemoryDashboard type safety | REQ-3 + REC-5 |
| MemoryMixin rollout | BU-1: pipeline team adds MemoryMixin | AgentLoop events from pipeline agents | REQ-1, 2, 3, 6 |
| GoalStore wiring | BU-2: pipeline team wires GoalStore | Pipeline runs in Memory Dashboard | REC-1, REQ-4 |
| Convergence spec | BU-3: pipeline team authors Phase 6 spec | Unified runtime for Phase 6 | REQ-7 |
| NPU tier selection | BU-4: pipeline team extends DomainAnalyzer | Efficient model selection on Ryzen AI | REC-2 |
Section 5: Merge Sequencing
The requirements above have dependencies. This section describes the recommended order of work to minimize rework.
Phase A — Unblocked work (start immediately, no cross-team coordination needed)
These items have no external dependencies and no coordination requirements. Estimated total: 4-5 hours.
| Item | Effort | File |
|---|---|---|
| REQ-1 — Fix `_agent_cache` access | 30 min | Your `_register_agent_memory_ops()` file |
| REQ-2 — Fix `_migrate()` additions | 1-2 hr | `src/gaia/ui/database.py` |
| REQ-4 — Fix shutdown order | 15 min | `src/gaia/ui/server.py` |
| REQ-5 — Remove router-level auth | 15 min | `src/gaia/ui/routers/mcp.py` |
| REQ-8 — Scope brace-escaping fix | 1 hr | `src/gaia/agents/base/agent.py` |
| REC-1 — Lock GoalStore signatures | 30 min | PR description |
| REC-4 — Separate MemoryStore path | 30 min | Your MemoryStore default path |
Phase B — Coordination required (send us your proposal, wait for confirmation)
| Item | Action | Waiting on |
|---|---|---|
| REQ-3 — SSE name collision | Send us your proposed AgentLoop event names | Our confirmation (target: 1 business day) |
Do not write any code for REQ-3 until we confirm the names. Writing first means you may have to rename everything again if we find a collision with a future event name we have pending in a feature branch.
Phase C — After REQ-3 names confirmed (implement)
| Item | Effort | File |
|---|---|---|
| REQ-3 implementation | 2-3 hr | All AgentLoop SSE emitters |
| REC-5 — `MemoryEventType` union | 1 hr | `src/gaia/apps/webui/src/types/index.ts` |
| `MemoryDashboard.tsx` updates | 1 hr | Your `MemoryDashboard.tsx` |
Phase D — Long-lead work (can run in parallel with Phase A-C, longer timeline)
| Item | Effort | Notes |
|---|---|---|
| REQ-6 — Collision guard | 1 hr | Can land any time before BU-1 starts |
| REQ-7 — Drivable AgentLoop | 3-5 days | Does not block Phase A-C merge |
| REC-2 — SystemDiscovery contract | 1 hr | Needed before BU-4 starts |
| REC-3 — Registry staleness comment | 30 min | Post-merge PR acceptable |
Target merge gate: All of Phase A complete + REQ-3 implementation complete (Phase C) = merge-ready. REQ-6 and REQ-7 can follow in a subsequent PR on a 1-sprint timeline.
Section 6: Cross-Team Coordination Gate (REQ-3 Name Agreement)
REQ-3 requires bilateral agreement on the AgentLoop SSE event type names before any implementation. This section defines the protocol.
The problem in concrete terms
Your AgentLoop emits SSE events. The browser receives them as JSON objects: {"type": "loop_back", "data": {...}}. The browser's PIPELINE_EVENT_MAP (a plain JavaScript object at lines 626-643 of src/gaia/apps/webui/src/services/api.ts) looks up event.type as a key and routes the event to a callback.
The six pipeline-owned strings are keys in PIPELINE_EVENT_MAP. If your event type matches one of those keys, the event is routed to a pipeline callback. There is no error — the pipeline callback silently receives wrong data. The memory UI receives nothing.
This is not a bug we can fix after the fact with a minor patch — it requires a coordinated rename across Python emitters, TypeScript types, and the dashboard component.
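The misrouting mechanics can be shown with a Python stand-in for the TypeScript map (event names here are illustrative; the real map lives at `api.ts` lines 626-643):

```python
# Stand-in for the browser's PIPELINE_EVENT_MAP routing. A colliding event
# type is delivered to the pipeline callback with no error raised, and the
# memory UI never sees the event.
pipeline_events: list[dict] = []
PIPELINE_EVENT_MAP = {"loop_back": pipeline_events.append}

memory_events: list[dict] = []


def route(event: dict) -> None:
    # Dict lookup on event["type"]: unknown types fall through to memory.
    handler = PIPELINE_EVENT_MAP.get(event["type"], memory_events.append)
    handler(event)


route({"type": "loop_back", "data": {"memory_id": 7}})       # collides: misrouted
route({"type": "agent_loop_back", "data": {"memory_id": 7}})  # renamed: routed to memory
```

The first event silently lands in the pipeline callback with memory-shaped data; only the prefixed rename reaches the memory handler — which is why the fix must happen before the names ship, not after.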
The protocol
Step 1 — You send us a list.
Reply to this document (or message the pipeline team directly) with a list of all SSE event type strings your AgentLoop emits. For example:
AgentLoop SSE event types (proposed):
- agent_loop_cycle_start
- agent_loop_cycle_end
- memory_quality_check
- agent_loop_back
- agent_phase_transition
- memory_defect_detected
- memory_recalled
- memory_consolidated
- goal_created
- task_status_changed
Step 2 — We check for collisions.
We check your proposed names against:
- The six pipeline-reserved names (listed in REQ-3)
- Any pending event names in our feature branch that have not landed yet
- Names in the broader `StreamEventType` union (lines 255-265 of `types/index.ts`)
Step 3 — We confirm (target: 1 business day).
We reply with either "approved" or "here are the names that conflict — please propose alternatives for those."
Step 4 — You implement.
Only after step 3 confirmation do you implement the renames in your Python emitters, TypeScript types, and dashboard component.
Why this gate exists
The six pipeline event names were not chosen arbitrarily. They correspond to conceptual phases in the recursive pipeline: looping back (quality retry), scoring (quality metric), phase jumping (state machine transition), and defect detection. If we let AgentLoop use similar concepts with similar names, we will have a permanent naming conflict as the two systems co-evolve. The gate forces us to align vocabulary upfront.
Section 7: Quick-Start Checklist
Seven items kovtcharov can start today, in priority order. Total estimated time for items 1-6: approximately 4.5 hours.
Item 1 — Fix _agent_cache access (REQ-1)
Time: 30 minutes
Do: In _register_agent_memory_ops(), replace your current agent retrieval with the _agent_cache_lock + _agent_cache.get(session_id) + entry["agent"] pattern. Add an early return if session_id is not in the cache.
Verify: Run your existing memory tool tests. Confirm no KeyError on missing session.
Item 2 — Send us your AgentLoop SSE event name list (REQ-3 coordination)
Time: 15 minutes (to draft the list)
Do: List every SSE event type string your AgentLoop emits. Send to the pipeline team (feature/pipeline-orchestration-v1). No code changes yet.
Verify: Your list contains no names from the six reserved names in REQ-3. If it does, propose alternatives in the same message.
Item 3 — Fix _migrate() column additions (REQ-2)
Time: 1-2 hours
Do: For each of the three new columns, add a separate try/except block after line 176 of database.py using the PRAGMA table_info pattern. If knowledge and conversations are new tables, add CREATE TABLE IF NOT EXISTS to SCHEMA_SQL.
Verify: Run python -c "from gaia.ui.database import ChatDatabase; db = ChatDatabase(':memory:'); db.close()" and confirm no exception.
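The PRAGMA table_info pattern referenced above can be sketched like this (helper name and column list are illustrative; the real migration lives inside `_migrate()`):

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str,
                          column: str, decl: str) -> None:
    # One guarded block per column: PRAGMA table_info lists existing
    # columns, so re-running the migration is a safe no-op.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS knowledge (id INTEGER PRIMARY KEY)")
for name, decl in [("embedding", "BLOB"), ("superseded_by", "TEXT")]:
    add_column_if_missing(conn, "knowledge", name, decl)
    add_column_if_missing(conn, "knowledge", name, decl)  # idempotent re-run

cols = {row[1] for row in conn.execute("PRAGMA table_info(knowledge)")}
```

Checking `PRAGMA table_info` first (rather than catching `duplicate column` errors blindly) keeps each column's guard independent, which is the point of one block per column.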
Item 4 — Remove router-level auth (REQ-5)
Time: 15 minutes
Do: Confirm line 16 of src/gaia/ui/routers/mcp.py reads router = APIRouter(tags=["mcp"]) with no dependencies argument. Move any auth to individual @router.post(...) decorators.
Verify: grep -n "dependencies" src/gaia/ui/routers/mcp.py — the APIRouter line must not appear in results.
Item 5 — Scope the curly-brace fix to _get_mixin_prompts() (REQ-8)
Time: 1 hour
Do: Ensure your escaping fix touches only _get_mixin_prompts() at line 299 of agent.py. Run git diff HEAD -- src/gaia/utils/component_loader.py and verify the output is empty.
Verify: Write a unit test: create a mixin that returns a prompt string with {variable} in it; confirm it does not raise KeyError or ValueError. Confirm render_component() still works with {{KEY}} substitution.
Item 6 — Fix shutdown order (REQ-4)
Time: 15 minutes
Do: In server.py, move your close_store() call to run after monitor.stop() (line 210) and before db.close() (line 212). The exact insertion point is between those two existing lines.
Verify: Read the lifespan block in server.py and confirm the sequence is: monitor.stop() → close_store() → db.close().
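The ordering constraint can be pinned down with a trivial sketch (the three callables are stand-ins for the real `monitor.stop()`, `close_store()`, and `db.close()` in `server.py`):

```python
calls: list[str] = []


def monitor_stop() -> None:
    calls.append("monitor.stop")


def close_store() -> None:
    calls.append("close_store")  # closes MemoryStore/GoalStore connections


def db_close() -> None:
    calls.append("db.close")


def shutdown() -> None:
    # REQ-4 sequence: memory store closes after the monitor but before
    # the chat database connection.
    monitor_stop()  # existing line ~210
    close_store()   # inserted between the two existing lines
    db_close()      # existing line ~212


shutdown()
```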
Item 7 — Lock in GoalStore API signatures in the PR description (REC-1)
Time: 30 minutes
Do: Add the exact method signatures for create_goal(), create_task(), and update_task_status() to your PR #606 description. Include parameter types, return types, and error behavior (what happens if task_id not found).
Verify: We (pipeline team) will confirm receipt and that BU-2's dependency on those signatures is satisfied.
Appendix A: File Reference Summary
All files mentioned in this document, with locations verified against HEAD d187907.
| File | Lines | Relevant to |
|---|---|---|
| `src/gaia/ui/_chat_helpers.py` | 1,144 | REQ-1 |
| `src/gaia/ui/database.py` | 787 | REQ-2, REC-4 |
| `src/gaia/ui/sse_handler.py` | 950 | REQ-3, context |
| `src/gaia/ui/routers/mcp.py` | 425 | REQ-5 |
| `src/gaia/ui/server.py` | — | REQ-4 |
| `src/gaia/agents/base/tools.py` | — | REQ-6 |
| `src/gaia/pipeline/orchestrator.py` | 681 | REQ-7, context |
| `src/gaia/utils/component_loader.py` | — | REQ-8 |
| `src/gaia/agents/base/agent.py` | — | REQ-8 |
| `src/gaia/apps/webui/src/types/index.ts` | — | REQ-3, REC-5 |
| `src/gaia/apps/webui/src/services/api.ts` | — | REQ-3, REC-5 |
Appendix B: Specific Line Numbers for Every Claim
Every line number in this document was verified against HEAD d187907 of feature/pipeline-orchestration-v1. If you are working on a different commit, verify the line numbers have not shifted before using them as edit targets.
| Claim | File | Line |
|---|---|---|
| `_agent_cache` declaration | `_chat_helpers.py` | 54 |
| `_agent_cache_lock` declaration | `_chat_helpers.py` | 57 |
| `_store_agent()` cache write | `_chat_helpers.py` | 102-106 |
| `SCHEMA_SQL` start | `database.py` | 23 |
| `_init_schema()` definition | `database.py` | 105 |
| `_migrate()` definition | `database.py` | 118 |
| `_ensure_settings_table()` call | `database.py` | 121 |
| Last existing migration block end | `database.py` | 176 |
| `PRAGMA journal_mode = WAL` | `database.py` | 101 |
| `SSEOutputHandler` class | `sse_handler.py` | 89 |
| `router = APIRouter(tags=["mcp"])` | `routers/mcp.py` | 16 |
| `monitor.stop()` in shutdown | `server.py` | 210 |
| `db.close()` in shutdown | `server.py` | 212 |
| `ToolRegistry.register()` write | `tools.py` | 396 |
| `_execute_recursive_pipeline()` | `orchestrator.py` | 556 |
| Event loop creation in orchestrator | `orchestrator.py` | 656-661 |
| `asyncio` local import | `orchestrator.py` | 584 |
| `render_component()` str.replace block | `component_loader.py` | 240-246 |
| `_get_mixin_prompts()` | `agent.py` | 299 |
| `StreamEventType` pipeline entries | `types/index.ts` | 260-265 |
| `PIPELINE_EVENT_MAP` | `api.ts` | 626-643 |
This document was written by the pipeline orchestration team on feature/pipeline-orchestration-v1. For questions about requirements, contact us before implementing. For questions about your own PR, you know it better than we do. For cross-team coordination on REQ-3 names, initiate contact with a proposed name list — we will respond within one business day.
Two-phase local-first email triage agent — MVT (~1.5d CC-assisted) for v0.20.0, full EmailTriageAgent for v0.23.0. Covers auto-discovery, per-cohort autonomy, speech-act classification, undo ledger, Slack as first-class output channel, and an honest §27 catalog of research bets and unvalidated claims. §22.4 maps outstanding PRs to prerequisite role: #606 / #517 / #495 / #622 / #779 / #741 / #737. Landing the "minimum set" of #495 + #741 + one of #606 / #517 M1 collapses most of the missing-infrastructure workarounds before implementation starts.
## Summary

Adds a two-phase spec for a local-first email triage agent that runs inference on-device via Lemonade (Ryzen AI NPU/iGPU) — no email content transits a cloud API. Phase **MVT** ships in ~1.5 days (CC-assisted) by thin-wrapping existing primitives; **Phase C1** polishes UX for v0.20.0; **Phase C2** adds scheduled triage, Agent Inbox HITL, and in-tree Gmail MCP for v0.23.0. Slack is a first-class output channel from day one (webhook → MCP → interactive buttons across phases).

## Key threads

- **MVT ships fast because ~95% of plumbing exists.** §2.5 maps every required capability to an existing GAIA primitive (`MCPClientMixin`, `DatabaseMixin`, `RAGSDK`, `TalkSDK`, `SummarizeAgent`, `ApiAgent`, SSE). Why it matters: scoping the MVT as thin wrappers rather than new plumbing is what makes the ~1.5d estimate credible.
- **§22.4 catalogs in-flight PRs as prerequisites.** Maps [#606](#606) (memory v2), [#517](#517) (autonomy M1/M3/M5), [#495](#495) (security.py), [#622](#622) (orchestrator), [#779](#779) (eval), [#741](#741) (vault), [#737](#737) (Slack connector) to which spec risks each one collapses. Why it matters: the "minimum set to start MVT safely" is named explicitly — #495 + #741 + one of #606 / #517 M1 — so sequencing is actionable.
- **Memory-PR conflict flagged (§22.4.4).** #606 and #517 M1 overlap on the memory subsystem; §22.4.4 calls out the reconciliation as a prerequisite decision, not a runtime surprise.
- **§27 "Known Weaknesses, Unvalidated Claims, Decision Debt"** names the research bets (Custom AI Labels on local 4B, per-relationship voice, auto-follow-up quality) and unvalidated claims cited in the spec (97.5% tool-call reliability, GongRzhe archive date, etc.) so C2 isn't treated as an engineering certainty.
- **Slack integration scoped as an output channel (§12.18).** Webhook at MVT → Slack MCP at C1 → interactive approve/edit/reject buttons at C2.
Aligned with [messaging-integrations-plan.mdx](https://github.com/amd/gaia/blob/main/docs/plans/messaging-integrations-plan.mdx) (#635). ## Test plan - [ ] Render preview of `docs/plans/email-triage-agent.mdx` via Mintlify dev or amd-gaia.ai preview — confirm frontmatter, tables, code blocks, and section numbering (1–28) render cleanly. - [ ] Verify `docs/docs.json` navigation entry places the page under *Agent UI* group next to `email-calendar-integration`. - [ ] Cross-reference check: every `[Link](file.mdx)` target exists (`email-calendar-integration`, `autonomy-engine`, `security-model`, `agent-ui`, `setup-wizard`, `messaging-integrations-plan`). - [ ] Scan §22.4 PR numbers against the current PR queue (`gh pr list --repo amd/gaia --state open`) to confirm they're still open and the recommended sequence is feasible.
**Heads-up: semantic overlap with PR #495's scratchpad.** @kovtcharov — flagging a tool-selection collision to think through before both PRs land.

PR #495 adds a scratchpad tool; PR #606 adds the memory tools described in this PR.

**The collision.** After both PRs merge, the ChatAgent will have both of these tools registered simultaneously. They answer two genuinely different questions:
The LLM will not naturally know which to pick. The third row is the failure mode — an LLM could pick the wrong tool for that query.

**What I'd suggest:**
**Scope suggestion.** This is coordination work, not a blocker for either PR. Reasonable landings:
Either way, worth a 10-minute conversation before the second of the two merges. Cross-referencing from the #495 final-state comment: #495 (comment)

🤖 Generated with Claude Code
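One cheap mitigation, whichever PR lands second, is to put the scoping language in the tool descriptions themselves; a hypothetical sketch where the tool names, registry shape, and wording are illustrative and not the actual #495/#606 registrations:

```python
# Hypothetical tool specs. The point: each description names its scope AND
# excludes the other tool's scope, so the model has a textual basis to choose.
TOOLS = {
    "search_scratchpad": {
        "description": (
            "Search the CURRENT session's working notes only. "
            "Never use this for anything from past sessions."
        ),
    },
    "recall": {
        "description": (
            "Search long-term memory across ALL past sessions. "
            "Never use this for notes taken in the current session."
        ),
    },
}

def scope_is_explicit(spec: dict) -> bool:
    """Cheap lint: a description must claim a scope and exclude the other one."""
    text = spec["description"].lower()
    return "session" in text and "never" in text
```

A lint like `scope_is_explicit` could run at registration time so a future tool with an ambiguous description fails fast instead of confusing the model at runtime.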
```
# Conflicts:
#	src/gaia/apps/webui/package-lock.json
#	src/gaia/apps/webui/src/components/ChatView.tsx
#	src/gaia/apps/webui/src/services/api.ts
#	src/gaia/apps/webui/src/types/index.ts
#	src/gaia/ui/_chat_helpers.py
#	src/gaia/ui/database.py
#	src/gaia/ui/models.py
#	src/gaia/ui/routers/sessions.py
#	src/gaia/ui/server.py
#	src/gaia/ui/sse_handler.py
#	src/gaia/ui/utils.py
```
## Summary
Comprehensive agent memory system that serves as a second brain — storing, recalling, and learning from every interaction. Built on proven patterns from Mem0, Zep, and Hindsight.
## Architecture (v2)
- Cross-encoder reranking (`ms-marco-MiniLM-L-6-v2`)
- `superseded_by` column preserves history when facts are corrected
- `time_from`/`time_to` on all search methods for time-based recall

## Schema v2
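The three Schema v2 tables can be sketched as SQLite DDL; only the columns this PR names (`embedding`, `superseded_by`, `consolidated_at`) are grounded, and the rest (`id`, `content`, `created_at`, `tool_name`) are assumptions made so the sketch runs — the real schema lives in `memory_store.py`:

```python
import sqlite3

SCHEMA_V2 = """
CREATE TABLE IF NOT EXISTS knowledge (
    id            TEXT PRIMARY KEY,
    content       TEXT NOT NULL,
    created_at    TEXT,
    embedding     BLOB,  -- 768-dim vector (nomic-embed-text-v2)
    superseded_by TEXT   -- fact lineage chain
);
CREATE TABLE IF NOT EXISTS conversations (
    id              TEXT PRIMARY KEY,
    created_at      TEXT,
    consolidated_at TEXT -- consolidation tracking
);
CREATE TABLE IF NOT EXISTS tool_history (
    id         TEXT PRIMARY KEY,
    tool_name  TEXT,
    created_at TEXT
);
"""

def init_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables if they do not already exist (idempotent)."""
    conn.executescript(SCHEMA_V2)
```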
Three tables (`knowledge`, `conversations`, `tool_history`) with new columns:

- `knowledge.embedding BLOB` — 768-dim vector (nomic-embed-text-v2)
- `knowledge.superseded_by TEXT` — fact lineage chain
- `conversations.consolidated_at TEXT` — consolidation tracking

## Memory Tools (5 LLM-facing tools)
- `remember`
- `recall`
- `update_memory`
- `forget`
- `search_past_conversations`

## Use Cases
- Entity-tagged facts (e.g. `person:sarah_chen`)

## Observability Dashboard
Full-page Memory Dashboard in Agent UI with:
## Startup Sequence
## Files
- `src/gaia/agents/base/memory_store.py`
- `src/gaia/agents/base/memory.py`
- `src/gaia/agents/base/discovery.py`
- `src/gaia/ui/routers/memory.py`
- `src/gaia/apps/webui/src/pages/MemoryDashboard.tsx`
- `docs/spec/agent-memory-architecture.md`
- `tests/unit/test_memory_*.py`
- `tests/integration/test_memory_*.py`

## Design References
- `superseded_by`, temporal search

## Test plan
- [ ] `pytest tests/unit/test_memory_store.py tests/unit/test_memory_mixin.py tests/unit/test_memory_router.py`
- [ ] `pytest tests/integration/test_memory_integration.py tests/integration/test_memory_api_integration.py`
- [ ] `cd src/gaia/apps/webui && npm run build`