
feat(memory): agent memory v2 — second brain with hybrid search, LLM extraction, and observability dashboard #606

Draft
kovtcharov wants to merge 38 commits into main from feature/agent-memory

Conversation

@kovtcharov
Collaborator

@kovtcharov kovtcharov commented Mar 20, 2026

Summary

Comprehensive agent memory system that serves as a second brain — storing, recalling, and learning from every interaction. Built on proven patterns from Mem0, Zep, and Hindsight.

Architecture (v2)

  • Hybrid search: Vector (FAISS) + BM25 (FTS5) + RRF fusion + cross-encoder reranking (ms-marco-MiniLM-L-6-v2)
  • Mem0-style extraction: LLM decides ADD/UPDATE/DELETE/NOOP against existing memory after each conversation turn
  • Zep-inspired fact lineage: superseded_by column preserves history when facts are corrected
  • Hindsight-inspired reconciliation: Background pairwise similarity check detects contradictions across sessions
  • Complexity-aware recall: Adaptive top_k (3/5/10) based on query complexity heuristics
  • Temporal search: time_from/time_to on all search methods for time-based recall
  • Conversation consolidation: Auto-distill old sessions into durable knowledge before 90-day prune
  • No silent fallback: Embeddings are a hard requirement — system fails loudly on misconfiguration
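The RRF fusion step in the hybrid search pipeline can be sketched as follows. This is a minimal illustration, not the PR's implementation: the constant k=60 comes from the original Reciprocal Rank Fusion paper and is an assumption, as are the function and variable names.

```python
def rrf_fuse(rankings, k=60):
    # Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank).
    # k=60 is the constant from the original RRF paper (assumed here; the
    # PR does not state its value).
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m3", "m1", "m7"]  # ids ranked by FAISS embedding similarity
bm25_hits = ["m3", "m9", "m1"]    # ids ranked by FTS5/BM25 score
fused = rrf_fuse([vector_hits, bm25_hits])
```

Items ranked highly by both retrievers (here `m3`) dominate the fused list; the cross-encoder reranker would then rescore only this fused shortlist.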

Schema v2

Three tables (knowledge, conversations, tool_history) with new columns:

  • knowledge.embedding BLOB — 768-dim vector (nomic-embed-text-v2)
  • knowledge.superseded_by TEXT — fact lineage chain
  • conversations.consolidated_at TEXT — consolidation tracking
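A v1→v2 migration over these columns might look like the sketch below. The column names come from the PR; the guard-by-PRAGMA idempotence pattern and the function name are assumptions, since the actual migration code is not shown here.

```python
import sqlite3

def migrate_v1_to_v2(conn: sqlite3.Connection) -> None:
    # Hypothetical migration sketch: add each v2 column only if it is
    # missing, so re-running against an already-migrated DB is a no-op.
    def columns(table):
        return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}

    with conn:  # single transaction; rolls back on error
        if "embedding" not in columns("knowledge"):
            conn.execute("ALTER TABLE knowledge ADD COLUMN embedding BLOB")
        if "superseded_by" not in columns("knowledge"):
            conn.execute("ALTER TABLE knowledge ADD COLUMN superseded_by TEXT")
        if "consolidated_at" not in columns("conversations"):
            conn.execute(
                "ALTER TABLE conversations ADD COLUMN consolidated_at TEXT"
            )
```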

Memory Tools (5 LLM-facing tools)

Tool                        Purpose
remember                    Store facts, notes, reminders with category/domain/entity/due_at
recall                      Hybrid semantic+keyword search with temporal filtering
update_memory               Modify existing items, set reminded_at
forget                      Delete a memory item
search_past_conversations   Search conversation history with temporal filtering

Use Cases

  • Note-taking, journaling, meeting notes capture
  • Reminders with due dates and wake-up scheduling
  • Personal knowledge management (research, articles)
  • Contact profiles via entity linking (person:sarah_chen)
  • Error learning and skill capture from tool usage
  • Recurring commitments (LLM advances due_at)

Observability Dashboard

Full-page Memory Dashboard in Agent UI with:

  • Header stats cards (memories, sessions, tool calls, success rate)
  • Activity timeline (30-day heatmap)
  • Knowledge browser (filterable, sortable, paginated table)
  • Tool performance stats
  • Conversation history browser with consolidation status
  • Upcoming/overdue reminders panel
  • Maintenance actions (consolidate, rebuild embeddings, reconcile)
  • Embedding coverage indicator

Startup Sequence

  1. Validate Lemonade → 2. Backfill embeddings → 3. Rebuild FAISS → 4. Confidence decay → 5. Reconcile memory → 6. Consolidate sessions → 7. Prune → 8. Generate session
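The fail-loud ordering above can be expressed as a small driver. This is a sketch under assumed names (StartupError, run_startup, the step callables); the point it illustrates is from the PR: the first failing step aborts boot rather than degrading silently.

```python
class StartupError(RuntimeError):
    pass

def run_startup(steps):
    # Run (name, callable) startup steps strictly in order; the first
    # failure aborts boot loudly — no silent fallback.
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            raise StartupError(f"startup step '{name}' failed") from exc
        completed.append(name)
    return completed

BOOT_STEPS = [
    ("validate_lemonade", lambda: None),
    ("backfill_embeddings", lambda: None),
    ("rebuild_faiss", lambda: None),
    ("confidence_decay", lambda: None),
    ("reconcile_memory", lambda: None),
    ("consolidate_sessions", lambda: None),
    ("prune", lambda: None),
    ("generate_session", lambda: None),
]
order = run_startup(BOOT_STEPS)
```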

Files

Component Files
Component           Files
Data layer          src/gaia/agents/base/memory_store.py
Agent mixin         src/gaia/agents/base/memory.py
System discovery    src/gaia/agents/base/discovery.py
REST API            src/gaia/ui/routers/memory.py
Agent UI            src/gaia/apps/webui/src/pages/MemoryDashboard.tsx
Architecture spec   docs/spec/agent-memory-architecture.md
Unit tests          tests/unit/test_memory_*.py
Integration tests   tests/integration/test_memory_*.py

Design References

System          Pattern adopted
Mem0            LLM-in-the-loop extraction (ADD/UPDATE/DELETE/NOOP)
Zep/Graphiti    Fact lineage via superseded_by, temporal search
Hindsight       Cross-encoder reranking, background reconciliation
ENGRAM          Memory typing (category-based) over knowledge graphs
CoALA           Four-tier cognitive architecture (working/episodic/semantic/procedural)

Test plan

  • Unit tests pass: pytest tests/unit/test_memory_store.py tests/unit/test_memory_mixin.py tests/unit/test_memory_router.py
  • Integration tests pass: pytest tests/integration/test_memory_integration.py tests/integration/test_memory_api_integration.py
  • Schema v2 migration works on existing v1 databases
  • Hybrid search returns semantically relevant results (not just keyword matches)
  • Mem0 extraction correctly handles ADD/UPDATE/DELETE operations
  • Superseded items excluded from search and system prompt
  • Temporal filtering works with time_from/time_to on recall
  • Consolidation distills old sessions into knowledge items
  • Reconciliation detects contradictory facts across sessions
  • Memory Dashboard renders all 6 sections with real data
  • Dashboard knowledge browser supports filter/sort/paginate/edit/delete
  • Lemonade unavailable at startup raises RuntimeError (no silent fallback)
  • Cross-encoder reranking improves precision on ambiguous queries
  • Complexity-aware recall uses adaptive top_k (3/5/10)
  • Frontend build succeeds: cd src/gaia/apps/webui && npm run build

@github-actions github-actions bot added the documentation (Documentation changes), dependencies (Dependency updates), agents (Agent system changes), cli (CLI changes), tests (Test changes), and electron (Electron app changes) labels Mar 20, 2026
Comment thread on src/gaia/agents/base/discovery.py — Fixed
@kovtcharov kovtcharov force-pushed the feature/agent-memory branch from e0eff31 to 068eead on March 21, 2026 at 23:13
itomek and others added 6 commits March 21, 2026 18:10
## Summary

- **`gaia init` now installs RAG dependencies** for `chat`, `rag`, and
`all` profiles — adds `pip_extras` field to profile definitions and a
new `_install_pip_extras()` step that detects editable vs package
install, tries `uv pip` first with `pip` fallback
- **Added `self.rag` None guards** to 8 RAG tools in `rag_tools.py` that
were crashing with `'NoneType' object has no attribute 'index_document'`
when RAG deps not installed
- **Widened ChatAgent RAG init exception catch** from `ImportError` to
`Exception` with warning-level logging and debug traceback
- **Updated Agent UI docs** to include `[rag]` in install instructions
(`[ui,rag]`)

## Test plan

- [x] Lint passing (black, isort, pylint, flake8)
- [x] All 1104 unit tests passing
- [ ] `gaia init --profile chat` installs RAG deps automatically
- [ ] Agent UI document indexing works after `pip install -e ".[rag]"`
- [ ] RAG tools return actionable error when deps not installed (instead
of crashing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
C-1: Guard winreg import and all registry-scanning methods in discovery.py
     so the module loads cleanly on Linux/macOS where winreg is absent.
     Also guard _scan_credential_manager() behind sys.platform check to
     avoid subprocess.CREATE_NO_WINDOW AttributeError on non-Windows.

C-3: Replace direct _lock/_conn access in CLI with two new MemoryStore
     public methods: get_source_counts() and delete_by_source(source).
     delete_by_source() wraps FTS cleanup + DELETE in a single atomic
     transaction with rollback, removing the per-ID loop that could
     leave knowledge/FTS diverged on partial failure.
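The atomic-transaction discipline described for C-3 can be sketched like this. Table and column names (knowledge, knowledge_fts, source) are assumptions based on the commit message; the key property is from the PR: FTS cleanup and the row delete share one transaction, so a partial failure rolls both back.

```python
import sqlite3

def delete_by_source(conn: sqlite3.Connection, source: str) -> int:
    # Sketch of the C-3 fix: remove FTS rows and knowledge rows for a
    # source in a single transaction so a failure cannot leave the FTS
    # index diverged from the knowledge table.
    with conn:  # commits on success, rolls back on any exception
        conn.execute(
            "DELETE FROM knowledge_fts WHERE rowid IN "
            "(SELECT rowid FROM knowledge WHERE source = ?)",
            (source,),
        )
        cur = conn.execute("DELETE FROM knowledge WHERE source = ?", (source,))
        return cur.rowcount
```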

C-4: Add close_store() to memory router module; call it from FastAPI
     lifespan shutdown so the WAL is checkpointed and the SQLite
     connection is released cleanly on server exit.

M-2: list_knowledge endpoint now excludes sensitive items by default.
     New include_sensitive=false query param (default false) controls
     visibility; sensitive=true still filters to sensitive-only.

M-6: Add append-only comment to conversations FTS trigger block noting
     that an AFTER UPDATE trigger would be required if store_turn()
     ever changes to update existing rows.

Tests: +9 tests (394 total) covering get_source_counts, delete_by_source
       rollback discipline, and all three sensitive filter modes in the router.
- Fix _original_user_input=None fallback bug in _after_process_query
  (getattr default ignored None; switch to `or` to handle init state)
- Extract VALID_CATEGORIES/MAX_CONTENT_LENGTH/MAX_TURN_LENGTH and other
  magic numbers to named module-level constants in memory_store.py
- Import constants in memory.py to eliminate duplicate category sets
  and ensure truncation limits stay in sync across all call sites
- DRY: memory router imports VALID_CATEGORIES from data layer instead
  of redefining its own copy
- Clean up unused imports in test files (F401/F811 flake8 violations)
- 394 unit tests passing, flake8 clean
Replace substring `"github.com" in url_lower` with urlparse().hostname
comparison to fix CodeQL CWE-20 "Incomplete URL substring sanitization".
A crafted URL like http://evil.com/github.com could otherwise bypass the
check. Hostname equality/suffix match is unambiguous.
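The hostname-based check might look like the following minimal sketch (function name assumed; the comparison logic mirrors the commit message):

```python
from urllib.parse import urlparse

def is_github_url(url: str) -> bool:
    # Compare the parsed hostname, not a substring of the whole URL, so a
    # path segment like /github.com can no longer satisfy the check.
    host = (urlparse(url).hostname or "").lower()
    return host == "github.com" or host.endswith(".github.com")

assert is_github_url("https://github.com/amd/gaia")
assert not is_github_url("http://evil.com/github.com")
```

The suffix match requires a leading dot, so look-alike hosts such as `evilgithub.com` are also rejected.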
Security:
- recall tool now filters out sensitive items before returning results
  to the LLM — sensitive entries (API keys, credentials) are for
  internal use only and must not appear in tool output.

Performance:
- Add get_by_category_contexts() to MemoryStore: single SQL query with
  WHERE context IN (active, 'global') replaces two separate
  get_by_category() calls in _get_context_items(), halving DB round-trips
  per system-prompt build (was 6 queries, now 3).
- Replace N+1 correlated subquery in get_sessions() with a LEFT JOIN on
  MIN(id) per session — scales linearly regardless of session count.

Reliability:
- Add PRAGMA busy_timeout=5000 so concurrent WAL readers/writers in the
  same process (dashboard REST singleton + ChatAgent) retry for 5 s
  instead of failing immediately with SQLITE_BUSY.
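The busy-timeout setting is a one-line PRAGMA at connection time, sketched here against an in-memory database (the real store opens a WAL-mode file):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Retry for up to 5 s on SQLITE_BUSY instead of failing immediately when
# concurrent readers/writers (dashboard REST singleton + ChatAgent)
# contend for the same database.
conn.execute("PRAGMA busy_timeout=5000")
timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
```

Note that `sqlite3.connect()` also accepts a `timeout=` argument; issuing the PRAGMA explicitly keeps the value visible next to the other connection setup.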

Correctness:
- update_memory tool truncation check now uses MAX_CONTENT_LENGTH constant
  instead of hardcoded 2000, keeping it in sync with memory_store.py.

Testability:
- Replace sys.exit(1) in _bootstrap_chat/_bootstrap_discover/_bootstrap_reset
  helpers with raise RuntimeError; _handle_memory_bootstrap catches and
  exits, making helpers unit-testable in isolation.

Tests (+34):
- TestGetByCategoryContexts (5): single-query context+global fetch
- TestGetAllKnowledgeSortByValidation (4): sort_by whitelist protection
- TestGetSessionsFirstMessageV2 (3): join-based first_message
- test_memory_discovery.py (22): _classify_remote, _classify_path,
  _classify_domain, scan_all structure, Windows guard

428 tests passing, 1 skipped (Windows-only guard on non-Windows).
# Conflicts:
#	src/gaia/agents/chat/agent.py
#	src/gaia/apps/webui/src/App.tsx
#	src/gaia/apps/webui/src/components/ChatView.tsx
#	src/gaia/ui/server.py
@kovtcharov kovtcharov force-pushed the feature/agent-memory branch from d4fdb90 to a06f9cc on April 1, 2026 at 16:31
Comprehensive rewrite of agent-memory-architecture.md as a single
unified design document. Key changes:

- Hybrid search: vector (FAISS) + BM25 (FTS5) + RRF fusion + cross-encoder
  reranking (ms-marco-MiniLM-L-6-v2). No fallback — embeddings are a hard
  requirement.
- Mem0-style LLM extraction: ADD/UPDATE/DELETE/NOOP operations against
  existing memory, replacing naive extract-and-store.
- Zep-inspired fact lineage: superseded_by column preserves history when
  facts are corrected rather than silently overwriting.
- Hindsight-inspired background reconciliation: pairwise similarity check
  on startup detects contradictions missed at extraction time.
- Complexity-aware recall depth: adaptive top_k (3/5/10) based on query
  complexity heuristics.
- Temporal range search: time_from/time_to on all search methods for
  natural time-based recall.
- Conversation consolidation: auto-distill old sessions to durable
  knowledge before 90-day prune.
- Second brain use cases: journaling, meeting notes, PKM, reminders,
  wake-up scheduling, recurring commitments.
- Removed all graceful degradation / silent fallback patterns.
- Removed openjarvis-memory-analysis.md (temp analysis doc).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kovtcharov kovtcharov changed the title from "feat(memory): persistent agent memory system with dashboard UI" to "feat(memory): agent memory v2 — second brain with hybrid search, LLM extraction, and observability dashboard" Apr 1, 2026
Claude Code and others added 7 commits April 1, 2026 15:48
…coverage, temporal+superseded filters

- POST /api/memory/consolidate, /reconcile, /rebuild-embeddings
- GET /api/memory/embedding-coverage
- Updated GET /api/memory/knowledge with include_superseded, time_from, time_to
- Updated GET /api/memory/stats with embedding coverage and reconciliation stats
- 95 tests passing, lint clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d_by, temporal search, consolidation

- Schema v1→v2 migration: embedding BLOB, superseded_by TEXT, consolidated_at TEXT
- New methods: store_embedding, get_items_with/without_embeddings, get_unconsolidated_sessions, mark_turns_consolidated, get_items_for_reconciliation
- Updated search() with time_from/time_to, superseded_by IS NULL, use_count increment
- Updated all query methods with superseded_by IS NULL filter
- 275 tests passing, lint clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…LLM extraction, temporal recall

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… FAISS, API integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ledge browser, activity timeline, tool stats

6-section dashboard: header stat cards, 30-day activity bar chart,
paginated knowledge browser with entity/category/context/search filters,
tool performance table, conversation history with FTS search,
upcoming & overdue temporal panel.

Features:
- Embedding coverage indicator with progress bar
- Maintenance dropdown: consolidate, rebuild embeddings, reconcile, rebuild FTS
- Click-to-expand knowledge row detail (metadata, timestamps, superseded_by chain)
- Inline actions: edit, delete, toggle sensitive, copy ID
- Superseded entries toggle with server-side filtering
- Toast notification system for all CRUD and maintenance operations
- Brain icon in sidebar for navigation
- Keyboard support: Escape key (layered close), Enter/Space on rows
- ARIA labels, roles, and aria-live for accessibility
- Responsive layout (3 breakpoints)
- Relative date formatting ("in 2 days", "3 days ago")
- API calls aligned with backend router field names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…em0 extraction, consolidation, reconciliation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The backend returns metadata as parsed JSON (dict), not a string.
Rendering it directly showed [object Object]. Now uses
JSON.stringify for object metadata and plain text for strings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude Code and others added 4 commits April 1, 2026 15:52
…e cases

- Strengthen conversation context filtering test with explicit zero-result
  assertions instead of vacuous loop
- Add due_at validation, empty-list consolidation, and history limit tests
- Remove dead _past_iso import from API test file
- 117 tests, all passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…m0 extraction, consolidation, reconciliation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…up scope includes entity, dynamic context always returns time

- MemoryStore.search(): corrected from "hybrid" to "FTS5 keyword search" (hybrid is MemoryMixin._hybrid_search)
- get_memory_dynamic_context(): fixed "returns empty" claim — always returns current time
- store() dedup scope: category+context+entity, not category+context
- get_items_with_embeddings(): added missing top_k, time_from, time_to params
- _classify_query_complexity: added missing medium/complex signal words
- get_entities(): added missing last_updated field in return
- Added undocumented update_confidence() and delete_by_source() methods
- update(): noted embedding cleared on content change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… fixes

- memory_store.py: set embedding=NULL when content changes in update()
  to force re-embedding (stale embedding would return wrong results)
- server.py: alphabetize router imports
- test fixes: formatting cleanup, mixin test updates from parallel tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude Code and others added 4 commits April 1, 2026 16:55
…aces, expand test coverage

- Replace hardcoded mixin prompt dispatch in Agent._get_mixin_prompts() with
  auto-discovery of all get_*_system_prompt() methods — no manual registration needed
- ChatAgent._get_mixin_prompts() simplified to call super() then filter SD when uninitialized
- Fix KeyError in _EXTRACTION_PROMPT.format() by escaping literal curly braces
- Update test_memory_mixin: reflect always-present memory instructions, fix embed/extract tests
- Add test_memory_store coverage: update clears embedding, dedup clears embedding,
  superseded exclusion in get_by_entity/get_upcoming/get_by_category_contexts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
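The auto-discovery dispatch described in this commit can be sketched as follows. Class and method names mirror the commit message, but the implementation details (using `dir()` with a name-pattern filter) are assumptions, since the actual code is not shown on this page.

```python
class Agent:
    def _get_mixin_prompts(self):
        # Auto-discover every get_*_system_prompt() method on the instance
        # so new mixins need no manual registration in a dispatch table.
        prompts = []
        for name in sorted(dir(self)):
            if name.startswith("get_") and name.endswith("_system_prompt"):
                prompts.append(getattr(self, name)())
        return prompts

class MemoryMixin:
    def get_memory_system_prompt(self):
        return "=== MEMORY (Persistent Second Brain) ==="

class ChatAgent(MemoryMixin, Agent):
    pass

prompts = ChatAgent()._get_mixin_prompts()
```

Any mixin added later that defines a `get_*_system_prompt()` method is picked up automatically by the base class.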
…nce fix

- formatRelativeDate: add seconds granularity (6s ago, 45s ago), fall
  back to full date+time (not date-only) for items older than 30 days
- Swap formatDate → formatRelativeDate in Updated column, conversation
  turn timestamps, and session last-activity
- memory_store.get_stats(): add total_retrievals (SUM of use_count)
- Add "Retrieved" stat card (gold accent) with avg recalls per memory
- Rename "Memories" card label to "Stored" for clarity
- remember() tool now stores with confidence=0.7 (was defaulting to 0.50)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add GET/PUT /api/memory/settings endpoint (mcp_memory_enabled key)
  stored in ChatDatabase settings table
- agent_ui_mcp.py: conditionally register memory_stats, memory_list,
  memory_recall tools when mcp_memory_enabled=true at server start
- memoryApi.ts: add getMemorySettings / updateMemorySettings helpers
- Dashboard: Settings section with toggle switch, loads on open,
  persists immediately on click; hint explains restart requirement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…resent instructions block, new header

_build_stable_memory_prompt() now always emits instructions even with zero
memories. Update docs to reflect new header (=== MEMORY (Persistent Second
Brain) ===), the instructions block, the zero-memories fallback text, and
the 4000-char hard cap note.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the mcp MCP integration changes label Apr 2, 2026
Claude Code and others added 13 commits April 1, 2026 17:17
Remove the disabled attribute from the chat textarea so users can
compose their next message while the agent is still generating a
response. Sending is still blocked until streaming completes.
7 test classes covering the full memory second-brain pipeline:
- TestMemoryRememberRecall: store/search round-trip, confidence bumps,
  entity tagging, no false positives
- TestMemoryNoteTaking: categories, reminders with due dates, context
  isolation, sensitive items, error/skill notes
- TestMemoryJournaling: multi-turn conversation storage, FTS over
  history, session isolation, cross-session knowledge persistence
- TestMemoryConfidenceAndDedup: explicit vs LLM-extracted confidence,
  dedup on overlap, supersession exclusion, embedding cleared on update
- TestMemoryStatsAPI: total_retrievals in stats, REST endpoint contract
- TestMemorySettingsAPI: GET/PUT mcp_memory_enabled default + persistence
- TestMemoryMixinSystemPrompt: preferences/facts in stable prompt,
  sensitive exclusion, dynamic context timestamps, conversation storage

All run without Lemonade — deterministic fake embeddings, mocked LLM.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ions

Reconcile and consolidate were silently no-ops because self.chat
(AgentSDK) isn't available when init_memory() runs (before Agent.__init__).
Fix: defer both LLM-dependent steps to first process_query() invocation
via _memory_post_init_pending flag and _run_memory_post_init() helper.
Remove stale hasattr(self, 'chat') guards from both methods.

Additional v2 completions staged alongside:
- system_context.py: day-0 OS/hardware/software context collection
- memory_store.py: v2 store enhancements (WAL checkpoint, get_* methods)
- cli.py: memory CLI commands (status, clear, export, context)
- agent_ui_mcp.py: MCP memory access toggle (default disabled)
- ui/routers/memory.py: memory router v2 endpoints
- docs/spec/agent-memory-architecture.pdf: architecture spec
- tests: 4 new deferred-init tests + store/router/eval coverage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- discovery.py: add _make_profile_fact() for user profile facts
- cli.py: first-boot onboarding flow — detect missing profile, offer
  quick intro (~1 min) on first gaia chat launch
- _chat_helpers.py: _register_agent_memory_ops() wires live ChatAgent
  LLM + FAISS into memory router for consolidation / reconciliation
- memory.py, memory_store.py: MemoryMixin v2 improvements
- memory.py router: updated endpoints
- tests: test_memory_router updates

plans/autonomous-agent-mode.md: detailed spec for the upcoming
feature/agent-always-running branch (loop state machine, request_user_input
tool design, AgentLoop architecture, UI components, open questions)
…temporal examples, ASCII arrows

- Replace all DB-log agent responses with conversational voice (12 occurrences)
- Add missing error_auto source (0.5) to confidence table
- Convert confidence bullet list to scannable table with all 6 sources
- Show real agent output in temporal recall examples instead of [internal process]
- Add time-range example to search_past_conversations
- Fix update_memory double Agent: line — separate output from internal annotation
- Replace Unicode arrows (→) with ASCII (->) in fact lineage code block

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d + spec v2

Memory store:
- Add 'goal' and 'task' to VALID_CATEGORIES for autonomous agent use;
  goals/tasks require approved_for_auto=True metadata before the loop acts

Sessions:
- Add private: bool field to sessions (CreateSessionRequest, UpdateSessionRequest,
  SessionResponse) — private sessions are never touched by the autonomous loop

Memory dashboard & UI:
- MemoryDashboard, Sidebar, ChatView, memoryApi.ts updates
- types/index.ts, api.ts, utils.py additions

Backend:
- _chat_helpers.py, database.py, routers/memory.py, routers/sessions.py updates
- models.py: private session field

Tests:
- test_memory_router.py updates

Spec:
- plans/autonomous-agent-mode.md v2 — addresses all 15 security/design issues
  from critique: event-driven triggers, background_mode immediate-deny,
  memory injection prevention, audit log as P1, PathValidator deadlock fix,
  SCHEDULE bounds, tunnel safety, private session exclusion, user input queue,
  __NO_RESPONSE__ sentinel, permission overlay context, step budgets
…2, fix tool-message sentinel bug

- Add GoalStore (goal_store.py) with state-machine goals/tasks and goals router
- Expand SystemDiscovery with UserAssist, recent file types, gaming/media, macOS app usage scans
- Replace goal/task categories in MemoryStore with permission category; export GoalStore from base
- Extend memory router with goals CRUD, upcoming, conversation search, stats v2, settings
- Memory dashboard v2: goals panel, search, upcoming, knowledge CRUD, conversation history
- Fix sdk.py: remove "continue" sentinel from _prepare_messages_for_llm; convert tool messages
  to user role (was assistant) so LLM sees tool results as a proper user turn, not a bare
  "continue" command that caused nonsensical responses like "What do you want me to continue?"
- Add test_sdk_tool_messages.py regression tests for the sentinel fix
- Update bootstrap inference with app-usage, file-type, and gaming/media sections

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…istory storage

Qwen3.5 reasoning models emit <think>...</think> blocks before their JSON
response.  These were passed raw into _parse_llm_response and stored verbatim
in conversation_history.  On the next turn the LLM saw its own prior thinking
as historical assistant context and used it to infer what the user "must have"
said, producing replies unrelated to the actual current message (e.g. answering
a games question when the user asked to set a reminder).

Strip <think>...</think> from both the main response and the plan_response paths
immediately after the LLM returns, before parsing and before storing in messages.
The SSE streaming handler already filtered these for display; this fixes the
agent-side persistence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
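The `<think>`-stripping fix might look like this minimal sketch; the regex and function name are assumptions, but the behavior (remove the block before parsing and before storing in message history) follows the commit message above.

```python
import re

# Non-greedy and DOTALL so multi-line reasoning blocks are removed whole,
# along with trailing whitespace before the real response.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think(text: str) -> str:
    return _THINK_RE.sub("", text)

clean = strip_think('<think>\nuser wants a reminder\n</think>\n{"tool": "remember"}')
```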
…journal system prompt

- recall() now accepts domain= parameter for journal/sub-type filtering
  (e.g. recall(category='note', domain='journal') returns all journal entries)
- get_by_category() gains domain= filter parameter with SQL pushdown
- Default limit raised to 20 for list-style queries (no query text)
- Added perf_counter timing in recall tool (DEBUG level per-step, INFO total)
- System prompt now teaches agent:
    - 'Show my journal' → recall(category='note', domain='journal')
    - 'What reminders?' → recall(category='reminder')
- Tightened storage IMPERATIVE for set-a-reminder / journal-entry requests
- MIN_EXTRACTION_WORDS lowered 20→5, EXTRACTION_TIMEOUT_S raised 3→8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e eval scenarios

recall() tool:
- Expanded docstring to document both search and browse/list modes
- Added offset= parameter for pagination (browse page 2, etc.)
- Max limit raised to 100 for list-style queries
- has_more flag in response signals more results available

Eval personas (simulator.md + runner.py):
- Added home_user, small_business_owner, student, creative_professional
- Each persona has distinct communication style and context

New memory eval scenarios (4):
- memory_todo_tracking: persistent todo list (add, complete, list)
- memory_notes_capture: second-brain quick note capture & retrieval
- memory_small_business_context: business context + customer prefs
- memory_student_study_assistant: courses, deadlines, learning prefs
- memory_home_user_basics: simple personal facts (non-technical)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New memory_health_tracking scenario tests the agent as a personal health
journal: logging sleep, exercise, and habits, then analyzing patterns
and giving personalized recommendations based on accumulated data.

Uses home_user persona for non-technical communication style.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ChatAgent personality rules:
- GREETING RULE now memory-aware: references stored name/project instead
  of generic "What are you working on?" template
- Added FACT-SHARING RULE: when user shares personal info ("I'm Sam",
  "I use Python"), agent must acknowledge the specific content, not give
  a generic greeting response
- Removed hardcoded "RIGHT: Hey! What are you working on?" example that
  the LLM was copying verbatim for every response

Memory instructions:
- Greeting personalization examples added (use stored name + project)
- Shortened some instruction lines to reduce prompt bloat

Root cause: personality greeting examples were overriding memory context,
causing identical "Got it, Sam! What are you working on?" responses
regardless of what the user actually said.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… CLI

- Add AgentLoop background task for autonomous goal execution with
  event-driven state machine (IDLE/RUNNING/SCHEDULED/PAUSED)
- Expand MCP router with connection health, tool list, and control endpoints
- Add agent-mode and memory toggle controls to SettingsModal (frontend + API)
- Extend CLI with memory/goal/agent-mode subcommands
- Update GoalStore with scheduling, priority sorting, and rate-limit guards
- Wire SSE handler to stream AgentLoop state-change events
- Update memory router and database schema for new fields
- Fix discovery scan edge cases and tool-message sentinel handling

All 1891 unit tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the chat (Chat SDK changes), eval (Evaluation framework changes), and performance (Performance-critical changes) labels Apr 3, 2026
yield _sse({"type": "done", "total": 0})

return StreamingResponse(
_generate(),

Check warning

Code scanning / CodeQL

Information exposure through an exception (Medium)

Stack trace information flows to this location and may be exposed to an external user.
antmikinka added a commit to antmikinka/gaia that referenced this pull request Apr 8, 2026
…tration-v1

Full recursive pipeline analysis (planning-analysis-strategist →
software-program-manager → quality-reviewer → technical-writer-expert)
of amd#606 (feat(memory): agent memory v2 — kovtcharov).

Key findings:
- 4 HIGH severity collisions: _chat_helpers.py, database.py,
  sse_handler.py, routers/mcp.py — all follow same pattern:
  our branch created comprehensive modules where PR amd#606 made
  targeted additions. Resolution: absorb PR's additions into ours
  during post-merge rebase.
- 1 ZERO conflict: sdk.py ChatSDK→AgentSDK rename is identical
  in both branches — auto-resolves on merge.
- 6 build-upon opportunities: MemoryMixin for pipeline agents,
  GoalStore↔PipelineExecutor wiring, AgentLoop convergence,
  SystemDiscovery→DomainAnalyzer calibration, GapDetector caching,
  declarative memory tool-calls in component-framework templates.
- Recommended: PR amd#606 lands in main first, we rebase and absorb.
- Open Items 9–15 added to branch-change-matrix.md tracking
  all conflicts and Phase 6 build-upon work.

Files: docs/reference/pr606-integration-analysis.md (531 lines),
       docs/reference/branch-change-matrix.md (+16 lines, OI 9–15)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@antmikinka
Collaborator

PR #606 Integration Requirements

What agent-memory-v2 Must Change to Fit the Pipeline Architecture

Document version: 1.0
Date: 2026-04-13
From: Pipeline orchestration team (feature/pipeline-orchestration-v1)
To: kovtcharov, author of PR #606 (agent-memory-v2)
Status: Pre-merge review — action required before merge approval


Who This Is For and What It Covers

This document is addressed to you, kovtcharov. You know your PR deeply. You do not know our branch. This document gives you all the context you need inline — you will not have to look at feature/pipeline-orchestration-v1 or ask us clarifying questions before you start working.

Your PR introduces an agent memory subsystem: MemoryMixin, AgentLoop, GoalStore, MemoryStore, SystemDiscovery, and a set of agent tools registered via _register_agent_memory_ops(). This is valuable work and we want it merged. However, PR #606 currently conflicts with six established integration points on feature/pipeline-orchestration-v1 and one architectural boundary that will block Phase 6 convergence if left unfixed.

This document defines:

  1. Six merge-blockers (REQ-1 through REQ-5, REQ-8) — changes your PR must make before it can merge into the pipeline branch.
  2. Two rebase-blockers (REQ-6, REQ-7) — changes that do not block today's merge but that will cause painful conflicts within 1-2 sprints if deferred past the next rebase.
  3. Five recommendations (REC-1 through REC-5) — improvements that make the integration cleaner but are not gating conditions.
  4. Our half of the contract (BU-1 through BU-4) — what we commit to building on top of your memory system once the above requirements are met.

Read Section 7 (Quick-Start Checklist) if you want to start immediately. Items are ordered by priority and time cost.


Our Pipeline Architecture — Context for Kovtcharov

The feature/pipeline-orchestration-v1 branch is AMD's internal pipeline orchestration system for GAIA. It extends the core Agent UI (src/gaia/ui/) with a recursive, multi-phase pipeline runtime. Here is what we have built, stated precisely so you can orient your changes:

Pipeline orchestrator (src/gaia/pipeline/orchestrator.py, 681 lines): Drives multi-stage agentic pipelines. The central function for recursive execution is _execute_recursive_pipeline() at line 556. This function manages its own async event loop explicitly (lines 656-661) because FastAPI's lifespan loop and Uvicorn's thread pool interact in ways that make loop injection non-trivial — but we have solved this for our use case. Your AgentLoop must not replicate this pattern; see REQ-7 for why.

Pipeline engine (src/gaia/pipeline/engine.py): PipelineEngine class. Drives PipelinePhase state transitions. We intend to converge PipelineEngine and AgentLoop in Phase 6, which is architecturally blocked unless AgentLoop exposes a drivable async interface. See REQ-7.

SSE streaming layer (src/gaia/ui/sse_handler.py, 950 lines): SSEOutputHandler class at line 89 queues typed JSON events for the browser. The pipeline runtime emits six event type strings that are hardwired into the frontend. These strings are reserved. See REQ-3.

Frontend type contract (src/gaia/apps/webui/src/types/index.ts, src/gaia/apps/webui/src/services/api.ts): The browser's TypeScript layer has a StreamEventType union (lines 260-265 of index.ts) that explicitly lists our six pipeline event names. The routing table PIPELINE_EVENT_MAP (lines 626-643 of api.ts) maps those exact string literals to frontend callbacks. Any SSE event type string you emit that collides with one of those six names will be silently misrouted to a pipeline UI callback. See REQ-3.

Agent cache (src/gaia/ui/_chat_helpers.py, 1,144 lines): Live agent instances are stored in _agent_cache: dict[str, dict] (line 54). Each entry is keyed by session_id and contains {"agent": <instance>, "model_id": str, "document_ids": list}. Access is serialized by _agent_cache_lock = threading.Lock() (line 57). Your _register_agent_memory_ops() must use this interface. See REQ-1.

Database schema and migration (src/gaia/ui/database.py, 787 lines): ChatDatabase class. The schema (SCHEMA_SQL, lines 23-76) is executed first; then _migrate() (line 118) runs incremental column additions. _migrate() follows a strict pattern — each column addition is its own try/except block using PRAGMA table_info. This pattern exists so a failed migration on one column does not block all subsequent migrations. Your new columns must follow this pattern. See REQ-2.

Shutdown sequence (src/gaia/ui/server.py, lines 209-213): The application lifespan closes resources in strict order: monitor.stop() at line 210, then db.close() at line 212. If your MemoryStore or GoalStore has a close_store() call, it must go between those two lines. After db.close(), the SQLite connection is gone — any write after that point raises OperationalError. See REQ-4.

Tool registry (src/gaia/agents/base/tools.py): ToolRegistry.register() at line 396 writes directly to self._tools[name] = {...} with no collision guard. When we add MemoryMixin to multiple pipeline agents (BU-1), each agent instantiation will call _register_agent_memory_ops(), registering remember, recall, update_memory, forget, and search_past_conversations five times. Silent overwrites will mask configuration errors. See REQ-6.


Section 1: Required Changes

The following eight items must be addressed before your PR can merge into feature/pipeline-orchestration-v1. Items marked [MERGE-BLOCKER] block the merge vote. Items marked [REBASE-BLOCKER] will cause architectural conflicts on the next rebase if not resolved now.


REQ-1 — Fix Agent Cache Access in _register_agent_memory_ops()

Classification: MERGE-BLOCKER
Estimated effort: 30 minutes
File to change: The file in your PR that contains _register_agent_memory_ops()

What your PR currently does:

Your _register_agent_memory_ops() function accesses the running agent instance through a path that does not match the live agent cache structure in _chat_helpers.py. The specifics depend on how your PR currently retrieves the agent, but the integration point you must target is described below.

What it must do instead:

The live agent for a session is accessed as:

# File: src/gaia/ui/_chat_helpers.py
# Line 54:
_agent_cache: dict[str, dict]
# Structure: session_id -> {"agent": ChatAgent, "model_id": str, "document_ids": list}

# Line 57:
_agent_cache_lock = threading.Lock()

To retrieve a live agent safely, your code must:

  1. Acquire _agent_cache_lock before reading _agent_cache.
  2. Call _agent_cache.get(session_id) — not direct key access.
  3. Check the return value for None before indexing into it.
  4. Access the agent as entry["agent"].
  5. Return early (or raise a descriptive error) if session_id is missing from the cache.

The _store_agent() function (lines 95-111 of _chat_helpers.py) shows the reference write pattern. Mirror that lock discipline when you read.

Why this matters:

The _agent_cache_lock is a threading.Lock, not a reentrant lock. If _register_agent_memory_ops() reads _agent_cache without holding the lock, and a concurrent chat request evicts the entry (line 87 of _chat_helpers.py), you will get a KeyError or read a stale agent reference. The eviction path at line 87 (del _agent_cache[session_id]) runs while holding the lock, so your read must also hold it.

Correctness checklist for REQ-1:

  • Lock is acquired before reading _agent_cache
  • _agent_cache.get(session_id) used (not _agent_cache[session_id])
  • Return early if session_id not in cache (do not raise unhandled KeyError)
  • Agent accessed as entry["agent"]
  • Lock released in finally block or via with statement

REQ-2 — Add New Schema Columns via the _migrate() Pattern

Classification: MERGE-BLOCKER
Estimated effort: 1-2 hours
File to change: src/gaia/ui/database.py

What your PR currently does:

Your PR adds columns (knowledge.embedding BLOB, knowledge.superseded_by TEXT, conversations.consolidated_at TEXT) either directly in SCHEMA_SQL on existing tables or via a migration block that does not follow the established pattern.

What it must do instead:

The _migrate() method (line 118 of database.py) uses a strict pattern: one try/except block per column, each block using PRAGMA table_info to check existence before adding. The last existing block ends at line 176. New column additions must come after line 176.

The required pattern for each column is:

# After line 176 in _migrate()

# <Descriptive comment for the migration>
try:
    cols = [
        row[1]
        for row in self._conn.execute("PRAGMA table_info(<table_name>)").fetchall()
    ]
    if "<column_name>" not in cols:
        self._conn.execute(
            "ALTER TABLE <table_name> ADD COLUMN <column_name> <TYPE>"
        )
        self._conn.commit()
        logger.info("Migrated <table_name> table: added <column_name> column")
except Exception as e:
    logger.debug("Migration check for <column_name>: %s", e)

Apply this pattern three times, one per column:

  • knowledge.embedding BLOB — requires knowledge table to exist first
  • knowledge.superseded_by TEXT — requires knowledge table to exist first
  • conversations.consolidated_at TEXT — requires conversations table to exist first
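Instantiated for the first column, the block would look like the following. This is a sketch run against a throwaway in-memory database; in database.py, `conn` would be `self._conn` and the block would sit after line 176 of `_migrate()`.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knowledge (id TEXT PRIMARY KEY, content TEXT)")

# knowledge.embedding — 768-dim vector blob (nomic-embed-text-v2)
try:
    cols = [
        row[1]
        for row in conn.execute("PRAGMA table_info(knowledge)").fetchall()
    ]
    if "embedding" not in cols:
        conn.execute("ALTER TABLE knowledge ADD COLUMN embedding BLOB")
        conn.commit()
        logger.info("Migrated knowledge table: added embedding column")
except Exception as e:
    logger.debug("Migration check for embedding: %s", e)
```

Running the same block a second time is a no-op: PRAGMA table_info now lists "embedding", so the ALTER TABLE is skipped and no exception is raised.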

If knowledge and conversations are new tables:

If your PR introduces these tables as entirely new (not columns on existing tables), add their CREATE TABLE IF NOT EXISTS statements to SCHEMA_SQL (lines 23-76 of database.py). SCHEMA_SQL is executed via executescript() at line 107, which runs before _migrate() at line 108. New columns on new tables may be declared inline in SCHEMA_SQL. Only columns being added to existing, already-deployed tables require the _migrate() try/except pattern.

Why this matters:

The _init_schema() method (line 105) calls executescript(SCHEMA_SQL) then _migrate(). The try/except-per-column pattern in _migrate() ensures that a failure adding embedding (e.g., because the column already exists on a pre-existing database from an older build) does not prevent superseded_by from being added. If you bundle all three columns into one try/except block, a failure on the first column silently skips the others with no error surfaced to the user.

Correctness checklist for REQ-2:

  • Each new column is a separate try/except block
  • All three blocks are placed after line 176 of _migrate()
  • Each block checks PRAGMA table_info before ALTER TABLE
  • New tables (if any) are in SCHEMA_SQL, not in _migrate()
  • _conn.commit() is called after each successful ALTER TABLE

REQ-3 — Rename AgentLoop SSE Event Types to Avoid Frontend Collision

Classification: MERGE-BLOCKER
Estimated effort: 2-3 hours (implementation) + cross-team coordination (1-2 days)
Files to change: All files in your PR that emit SSE event type strings from AgentLoop

What your PR currently does:

Your AgentLoop emits SSE events with type strings that include one or more of the six names reserved by the pipeline orchestration layer.

The six reserved names (do not use any of these):

  Reserved string    Where it is hardwired                       Effect of collision
  loop_back          types/index.ts line 260, api.ts line 637    Routed to onLoopBack pipeline callback
  quality_score      types/index.ts line 261, api.ts line 638    Routed to onQualityScore pipeline callback
  phase_jump         types/index.ts line 262, api.ts line 639    Routed to onPhaseJump pipeline callback
  iteration_start    types/index.ts line 263, api.ts line 640    Routed to onIterationStart pipeline callback
  iteration_end      types/index.ts line 264, api.ts line 641    Routed to onIterationEnd pipeline callback
  defect_found       types/index.ts line 265, api.ts line 642    Routed to onDefectFound pipeline callback

The StreamEventType union lives at lines 260-265 of src/gaia/apps/webui/src/types/index.ts. The routing table PIPELINE_EVENT_MAP is a Record<string, keyof PipelineStreamCallbacks> at lines 626-643 of src/gaia/apps/webui/src/services/api.ts. These strings are literal TypeScript values, not constants — they are compared directly against the type field of incoming SSE JSON objects.

What it must do instead:

You must propose replacement names for every AgentLoop event type that collides with the six names above. Send us your proposed names before writing any code — we will confirm no collision exists on our side, then you implement.

Suggested naming convention for AgentLoop events: prefix with memory_ or agent_loop_ to create a distinct namespace. Examples (not binding — send us your proposed list):

  • agent_loop_cycle_start instead of iteration_start
  • agent_loop_cycle_end instead of iteration_end
  • memory_quality_check instead of quality_score
  • agent_loop_back instead of loop_back
  • agent_phase_transition instead of phase_jump
  • memory_defect_detected instead of defect_found

Why this matters:

There is no runtime error when a collision occurs. The frontend PIPELINE_EVENT_MAP routes the event to a pipeline callback. The pipeline callback expects specific fields (iteration, score, defect_type, etc.) that your AgentLoop event does not carry. The pipeline UI component renders with undefined fields or renders stale pipeline state data instead of memory state data. This failure mode is invisible at the Python layer and produces silently wrong UI behavior.

Cross-team coordination required:

Do not implement this change unilaterally. Contact us (pipeline team, feature/pipeline-orchestration-v1) with your proposed replacement name list. We confirm. Then you implement and update REQ-5's MemoryDashboard.tsx accordingly. See Section 6 for the full coordination gate protocol.

Correctness checklist for REQ-3:

  • No AgentLoop SSE event uses any of the 6 reserved strings listed above
  • Replacement names agreed bilaterally before implementation
  • All Python-side emitters updated to the new names
  • MemoryDashboard.tsx event type strings updated to match (see REC-5)

REQ-4 — Insert close_store() in the Correct Position in the Shutdown Sequence

Classification: MERGE-BLOCKER
Estimated effort: 15 minutes
File to change: src/gaia/ui/server.py

What your PR currently does:

Your PR calls close_store() (or equivalent) either after db.close() at line 212, or registers a shutdown hook that runs after the lifespan context exits.

What it must do instead:

The lifespan shutdown block in server.py runs from lines 209-213:

# Line 209 (context yield exits here, shutdown begins)
# Line 210:
        await monitor.stop()
        logger.info("Document file monitor stopped")
# Line 212:
        db.close()
        logger.info("Database connection closed")

Your close_store() call must be inserted between line 210 and line 212 — after monitor.stop() completes, before db.close() is called. The position must look like:

        await monitor.stop()
        logger.info("Document file monitor stopped")
        # INSERT: close_store() here — after monitor, before db
        close_store()
        logger.info("Memory store closed")
        db.close()
        logger.info("Database connection closed")

Why this matters:

db.close() at line 212 calls self._conn.close() on the SQLite connection and then sets self._conn = None (lines 179-182 of database.py). The _transaction() context manager (lines 184-195 of database.py) checks if self._conn is None and raises RuntimeError("Database connection is closed"). If your MemoryStore shares the ChatDatabase connection (or opens its own WAL-mode connection to the same file), a close_store() that runs after db.close() will either operate on a closed connection or fail to flush pending writes before shutdown.

If MemoryStore uses a completely separate file and connection, the ordering still matters for clean log sequencing and to match our lifespan expectations for future shutdown hooks.

Correctness checklist for REQ-4:

  • close_store() is called after monitor.stop() (line 210) and before db.close() (line 212)
  • close_store() is idempotent (safe to call even if store was never opened)

REQ-5 — Do Not Add dependencies to APIRouter in routers/mcp.py

Classification: MERGE-BLOCKER
Estimated effort: 15 minutes
File to change: src/gaia/ui/routers/mcp.py

What your PR currently does:

Your PR adds authentication or dependency injection to the MCP router by modifying the APIRouter constructor at line 16 of routers/mcp.py.

What it must do instead:

Line 16 of routers/mcp.py currently reads:

router = APIRouter(tags=["mcp"])

This line must remain exactly as it is — no dependencies=[...] argument. If your new control endpoints (the ones that trigger memory operations via MCP) require authentication, add the dependencies argument at the individual endpoint level using the @router.post(...) decorator, not at the APIRouter level.

Example of the correct pattern:

# WRONG — do not do this:
router = APIRouter(tags=["mcp"], dependencies=[Depends(verify_auth)])

# CORRECT — per-endpoint auth:
router = APIRouter(tags=["mcp"])

@router.post("/api/mcp/memory/control", dependencies=[Depends(verify_auth)])
async def memory_control_endpoint(...):
    ...

Why this matters:

The existing catalog endpoints — GET /api/mcp/catalog, GET /api/mcp/catalog/{name}, and GET /api/mcp/install-config — are intentionally unauthenticated. They serve a read-only curated catalog of MCP server definitions. These endpoints are accessed by the GAIA installer and by external tooling during first-run setup, before any auth tokens exist. Adding dependencies=[...] to the APIRouter constructor applies that dependency to all routes registered on the router, including these unauthenticated catalog routes. This breaks the installer flow with a 401 or 403 on the catalog fetch.

Correctness checklist for REQ-5:

  • Line 16 of routers/mcp.py reads router = APIRouter(tags=["mcp"]) — no dependencies argument
  • Any auth on new endpoints is applied via @router.post(..., dependencies=[...]) at the individual route level
  • GET /api/mcp/catalog, GET /api/mcp/catalog/{name}, and GET /api/mcp/install-config remain unauthenticated

REQ-6 — Add Collision Guard to ToolRegistry.register() Before MemoryMixin Rollout

Classification: REBASE-BLOCKER
Estimated effort: 1 hour
File to change: src/gaia/agents/base/tools.py

What the current code does:

ToolRegistry.register() at line 396 of tools.py writes:

self._tools[name] = {
    "name": name,
    "function": func,
    "description": description or (func.__doc__ or ""),
    "parameters": params,
    "atomic": atomic,
    "display_name": display_name or name,
}

There is no check for whether name already exists in self._tools. A second call with the same name silently overwrites the previous registration with no log, no warning, and no error.

What it must do instead:

Add a logger.warning() call immediately before the assignment at line 396:

if name in self._tools:
    logger.warning(
        "ToolRegistry: tool name %r is already registered and will be overwritten. "
        "This may indicate duplicate mixin registration.",
        name,
    )
self._tools[name] = {
    ...
}

Why this matters now (not later):

Our BU-1 (described in Section 3) adds MemoryMixin to all five pipeline stage agents: DomainAnalyzer, PlannerAgent, ExecutorAgent, ReviewerAgent, and SynthesizerAgent. Each agent instantiation calls _register_agent_memory_ops(), which registers your five memory tool names: remember, recall, update_memory, forget, search_past_conversations. All five agents share a single process and, in some pipeline configurations, a shared ToolRegistry instance. That means your five tool names will be registered five times.

Silent overwrites in this scenario will mask real bugs: if DomainAnalyzer's remember function is accidentally overwritten by SynthesizerAgent's remember function (due to, say, a different bound self), the pipeline will call the wrong agent's memory with no diagnostic output. With a logger.warning() in place, we will see exactly which agent caused the collision and can fix the registry-sharing design before it causes data corruption.

The five tool names are currently safe (each agent registers the same function with the same signature). The guard is needed as insurance before we scale to five registrations.

Correctness checklist for REQ-6:

  • logger.warning() is emitted when name in self._tools before the assignment at line 396
  • Warning message includes the tool name and indicates potential duplicate mixin registration
  • The overwrite still proceeds after the warning (do not raise — existing behavior is preserved for non-memory tools)
  • logger is already imported in tools.py — do not add a new import

REQ-7 — Make AgentLoop Externally Drivable (Injectable Event Loop + Async Generator Interface)

Classification: REBASE-BLOCKER
Estimated effort: 3-5 days
Files to change: Your AgentLoop class and any caller that instantiates it

What your PR currently does:

Your AgentLoop creates its own event loop internally — either via asyncio.new_event_loop() or asyncio.run() — and runs to completion. It is not drivable from an external async caller.

What it must do instead:

AgentLoop must expose three things:

1. Injectable event loop:

AgentLoop must accept an optional event loop parameter in its constructor. It must not call asyncio.new_event_loop() or asyncio.set_event_loop() internally. If no loop is provided, it may use asyncio.get_event_loop(), but it must never create or set one.

class AgentLoop:
    def __init__(self, ..., loop: asyncio.AbstractEventLoop | None = None):
        self._loop = loop  # Injected by caller, not created here

2. Async generator interface:

AgentLoop must expose a coroutine or async generator that PipelineEngine can drive:

async def run(self, goal: str) -> AsyncIterator[LoopEvent]:
    """
    Drive the agent loop externally.
    Yields LoopEvent objects as work progresses.
    Caller controls when to advance, cancel, or inspect state.
    """
    ...
    yield LoopEvent(type="cycle_start", ...)
    ...
    yield LoopEvent(type="cycle_end", ...)

The exact LoopEvent schema is flexible — work with us on the spec (see BU-3). The requirement is that the method is an async generator (or returns an AsyncIterator) so PipelineEngine can async for event in loop.run(goal): without blocking its own coroutine.

3. GoalStore as constructor injection:

AgentLoop must accept GoalStore as a constructor parameter rather than instantiating it internally. This allows PipelineOrchestrator to pass its own GoalStore instance:

class AgentLoop:
    def __init__(self, ..., goal_store: GoalStore | None = None):
        self._goal_store = goal_store or GoalStore()

Why this matters:

_execute_recursive_pipeline() (lines 656-661 of src/gaia/pipeline/orchestrator.py) creates its own event loop:

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    return loop.run_until_complete(_run_async())
finally:
    loop.close()

This pattern exists because _execute_recursive_pipeline() is called from a sync context (the FastAPI route handler thread). If AgentLoop then tries to run its own loop, via asyncio.run() or run_until_complete() on a freshly created loop, from code that is already executing inside our loop.run_until_complete(), Python raises RuntimeError: This event loop is already running. There is no safe way to nest two running event loops in the same thread.

Our Phase 6 plan (BU-3) is to converge PipelineEngine and AgentLoop into a single runtime where the engine drives the loop iteration-by-iteration. That convergence is architecturally impossible if AgentLoop owns its own event loop. The injectable loop + async generator interface is the minimal contract that makes convergence tractable without a full rewrite.

Correctness checklist for REQ-7:

  • AgentLoop.__init__ accepts an optional loop parameter
  • AgentLoop does not call asyncio.new_event_loop() or asyncio.set_event_loop() anywhere
  • AgentLoop.run(goal) is an async generator that yields LoopEvent objects
  • GoalStore is injected via the constructor, not instantiated inside AgentLoop
  • AgentLoop can be used both standalone (with default GoalStore) and driven externally

REQ-8 — Scope the Curly-Brace Escaping Fix to _get_mixin_prompts() Only

Classification: MERGE-BLOCKER
Estimated effort: 1 hour
Files to change: src/gaia/agents/base/agent.py (scope the fix); verify no change to src/gaia/utils/component_loader.py

What your PR currently does:

Your PR fixes a curly-brace escaping issue where mixin prompt strings containing {...} were being interpreted as Python .format() placeholders. The fix escapes or converts the strings before the format call. However, your fix either touches component_loader.py or applies the escaping too broadly (affecting code paths beyond _get_mixin_prompts()).

What it must do instead:

The escaping fix must be applied only inside _get_mixin_prompts() at line 299 of src/gaia/agents/base/agent.py. It must not touch src/gaia/utils/component_loader.py at all.

Here is why component_loader.py must not be changed: ComponentLoader.render_component() (lines 240-246 of component_loader.py) uses str.replace() with {{KEY}} format:

# Lines 240-246 of src/gaia/utils/component_loader.py
# Replace {{VARIABLE}} placeholders
for key, value in variables.items():
    # Handle both "KEY" and "{{KEY}}" formats
    if not key.startswith("{{"):
        key = f"{{{{{key}}}}}"

    content = content.replace(key, str(value))

This is a literal str.replace() call — not Python string .format(). It operates on {{KEY}} tokens by doing a character-by-character string search and replace. Python .format() escaping rules (doubling curly braces as {{ and }} to produce literal { and }) are completely orthogonal to this mechanism. There is no interaction between Python .format() and str.replace(). Changing component_loader.py in response to a .format() escaping bug is a category error — the two systems do not share state, parsing, or execution path.
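The orthogonality is easy to demonstrate. The sketch below shows the scoped fix (doubling braces in mixin prompt text before it reaches `.format()`) and shows that ComponentLoader-style `str.replace()` on `{{KEY}}` tokens is unaffected by the same braces. All names here are illustrative, not your PR's actual code.

```python
mixin_prompt = 'Call remember with {"category": "fact"}'

# Unescaped, .format() treats {"category": ...} as a placeholder and raises.
try:
    ("{role}\n" + mixin_prompt).format(role="assistant")
except (KeyError, ValueError):
    pass  # this is the bug the PR fixes

# The fix, scoped to the mixin prompt only: double the braces first.
escaped = mixin_prompt.replace("{", "{{").replace("}", "}}")
rendered = ("{role}\n" + escaped).format(role="assistant")
print(rendered)  # → assistant\nCall remember with {"category": "fact"}

# ComponentLoader's mechanism is plain str.replace() on {{KEY}} tokens;
# stray single braces in the content are passed through untouched.
template = "Component for {{NAME}} with literal {braces}"
print(template.replace("{{NAME}}", "memory"))
# → Component for memory with literal {braces}
```

Nothing in the second half cares about `.format()` escaping rules, which is exactly why component_loader.py must not change.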

Verification step you must perform:

After your changes, run:

git diff HEAD -- src/gaia/utils/component_loader.py

The diff must be empty. If it is not empty, your fix has overreached. Revert the component_loader.py changes and re-scope the fix to agent.py line 299 only.

Correctness checklist for REQ-8:

  • Curly-brace escaping fix is inside _get_mixin_prompts() at agent.py line 299
  • src/gaia/utils/component_loader.py is unchanged (verified by git diff)
  • ComponentLoader.render_component() behavior (lines 240-246) is unaffected
  • Fix does not apply to any other string formatting path in agent.py

Section 2: Recommended Changes

These five items are not gating conditions for the merge but will make the integration significantly smoother. We would strongly prefer to see items marked [PRE-MERGE] in the PR before merge; items marked [POST-MERGE] can follow in a subsequent PR.


REC-1 — Stabilize and Document GoalStore Public API Signatures [PRE-MERGE]

What we need:

We are building BU-2 (wiring GoalStore into PipelineOrchestrator) against your GoalStore API. We need the following three method signatures to be stable and documented in your PR description before we start:

def create_goal(title: str, priority: int) -> str:
    """Create a new goal. Returns the goal_id (string UUID)."""

def create_task(goal_id: str, title: str) -> str:
    """Create a task under a goal. Returns the task_id (string UUID)."""

def update_task_status(
    task_id: str,
    status: Literal["PENDING", "ACTIVE", "COMPLETED", "FAILED"]
) -> None:
    """Update the status of a task. Raises KeyError if task_id not found."""

If your current implementation differs in signature (different parameter names, different return type, different status literals), tell us before merge. We will update BU-2's dependency spec to match. We just need the signatures to be frozen so BU-2 does not break on every rebase.

Action: Add the exact method signatures with docstrings to your PR description. If the implementation deviates from what is above, describe how it deviates.


REC-2 — SystemDiscovery.get_cached_context() Must Return a Typed Object [PRE-MERGE]

What we need:

We are wiring SystemDiscovery.get_cached_context() into our DomainAnalyzer agent (BU-4) to enable NPU-aware tier recommendations. We need the return value to reliably expose:

context.hardware.npu_available: bool
context.hardware.npu_driver_version: str  # empty string when no NPU, not None

The import path must be:

from gaia.agents.base.discovery import SystemDiscovery

The method must be idempotent: the first call performs hardware discovery and caches the result; subsequent calls return the cached result without re-probing hardware. This is required because DomainAnalyzer may be instantiated multiple times per pipeline run.

The npu_driver_version must be str, not Optional[str]. When no NPU is present, return an empty string "". Our code does if context.hardware.npu_driver_version: and an empty string is falsy — None requires a separate is None check that we do not want to add.

Action: Verify your SystemDiscovery.get_cached_context() implementation matches the above contract. If the attribute path is different (e.g., context.npu.available instead of context.hardware.npu_available), tell us the actual path so we can update BU-4's implementation.


REC-3 — Acknowledge the AgentRegistry Cache Staleness Race in Code Comments [POST-MERGE]

Context:

Code in your PR that reads AgentDefinition objects from AgentRegistry may encounter stale cached definitions. We are actively fixing issue OI-21: adding a file lock and an invalidate_capability_cache() call to the PUT /api/v1/pipeline/agents/{agent_id}/raw write path. Until OI-21 lands, there is a race window: your code reads a cached definition while a concurrent pipeline API write updates the underlying file.

What we ask:

Add a code comment in the AgentLoop or MemoryMixin code that reads from AgentRegistry acknowledging this race:

# NOTE: AgentDefinition read from AgentRegistry may be stale if a concurrent
# PUT /api/v1/pipeline/agents/{agent_id}/raw write is in progress.
# OI-21 (pipeline team) adds invalidate_capability_cache() on that write path.
# Until OI-21 lands, treat AgentDefinition as a read-only snapshot.

This prevents future contributors from assuming the registry is always consistent and prevents the issue from being silently papered over.


REC-4 — Use a Separate File Path for MemoryStore SQLite Database [PRE-MERGE]

What we need:

MemoryStore should default to ~/.gaia/memory/gaia_memory.db, not ~/.gaia/chat/gaia_chat.db.

Why:

ChatDatabase opens its connection with PRAGMA journal_mode = WAL (line 101 of database.py). WAL mode allows one writer and multiple readers on the same database file. However, two Python objects opening separate sqlite3.connect() connections to the same WAL-mode file — one for chat and one for memory — compete as simultaneous writers. Under concurrent chat message writes and memory consolidation writes, one of the two connections will receive OperationalError: database is locked.

SQLite's WAL mode serializes writers at the OS level, not the Python thread level. The ChatDatabase._lock = threading.RLock() (line 93 of database.py) serializes within the ChatDatabase instance, but it does not protect against a second sqlite3.connect() opened by MemoryStore from a different Python object.

Using a separate file (~/.gaia/memory/gaia_memory.db) eliminates the contention entirely. The memory database can run its own WAL-mode connection independently.


REC-5 — Add MemoryEventType as a Separate Union in types/index.ts [POST-MERGE]

What we need:

After REQ-3 renames are agreed and implemented, your AgentLoop will emit new event type strings (the non-colliding names). These strings must be added to the TypeScript type system. However, they must not be added to the existing StreamEventType union at lines 260-265 of src/gaia/apps/webui/src/types/index.ts.

Instead, create a separate union:

// In src/gaia/apps/webui/src/types/index.ts
// Add AFTER the StreamEventType union (after line 265):

/** Event types emitted specifically by AgentLoop (memory subsystem). */
export type MemoryEventType =
    | 'agent_loop_cycle_start'   // example — use agreed names from REQ-3
    | 'agent_loop_cycle_end'
    | 'memory_quality_check'
    | 'memory_recalled'
    | 'memory_consolidated';
    // ... add all agreed AgentLoop event names here

Why separate unions:

StreamEventType is used by SSEOutputHandler (line 89 of sse_handler.py) and the pipeline SSE router. PIPELINE_EVENT_MAP in api.ts routes StreamEventType strings. Extending StreamEventType with memory events means the pipeline SSE router would need to handle memory events — coupling two subsystems that should be independent. A separate MemoryEventType union keeps the two namespaces clean and allows MemoryDashboard.tsx to narrow its type to MemoryEventType without accepting the full StreamEventType union.


Section 3: What We Will Build On Top

This section is our contractual commitment to you. When your PR meets the requirements above, we will build the following on top of your memory subsystem. The dependency annotations show which REQ or REC items must be complete first.


BU-1 — Add MemoryMixin to All Five Pipeline Stage Agents

Dependencies: REQ-1 (cache access), REQ-2 (schema migration), REQ-3 (SSE name collision), REQ-6 (collision guard)
Our estimated effort: 1 sprint
What we will do:

We will apply MemoryMixin to these five agents:

  1. DomainAnalyzer — stores domain context as long-term knowledge
  2. PlannerAgent — stores plan decisions for retrospective analysis
  3. ExecutorAgent — stores tool call results and execution trace
  4. ReviewerAgent — stores quality assessments and defect patterns
  5. SynthesizerAgent — stores synthesized outputs for cross-run deduplication

Each agent's __init__ will call _register_agent_memory_ops(), so each of your five tool names is registered once per agent (five registrations of each name across the pipeline). REQ-6's collision guard must be in place to make these repeated registrations safe to debug.

REQ-1 must be in place so _register_agent_memory_ops() correctly retrieves the live agent instance from _agent_cache. REQ-2 must be in place so the memory schema columns exist before any memory write occurs. REQ-3 must be in place so the memory SSE events from these five agents do not collide with pipeline SSE events in the browser.
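For reference, the REQ-6 guard we depend on here is small. A sketch of the shape we expect (class simplified; the real write is ToolRegistry.register() at tools.py line 396):

```python
import logging

logger = logging.getLogger("gaia.tools")

class ToolRegistry:
    """Minimal sketch of the REQ-6 collision guard; names assumed from
    this document, not copied from src/gaia/agents/base/tools.py."""

    def __init__(self):
        self._tools = {}

    def register(self, name, fn):
        if name in self._tools:
            # REQ-6: warn instead of silently overwriting, so the five
            # MemoryMixin registrations in BU-1 are debuggable.
            logger.warning("Tool %r re-registered; overwriting previous handler", name)
        self._tools[name] = fn
```

The point is only the warning on overwrite; everything else about the registry stays as it is today.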


BU-2 — Wire GoalStore into PipelineOrchestrator

Dependencies: REC-1 (stable GoalStore API), REQ-4 (shutdown order)
Our estimated effort: 3-5 days
What we will do:

PipelineOrchestrator will accept a GoalStore instance and call:

  • create_goal(title=task_description, priority=1) at pipeline start
  • create_task(goal_id, title=stage_name) for each pipeline stage
  • update_task_status(task_id, "ACTIVE") when a stage starts
  • update_task_status(task_id, "COMPLETED" or "FAILED") when a stage ends

This makes pipeline runs visible in the Memory Dashboard without any change to the dashboard itself. The dashboard will show pipeline goals alongside agent memory goals, unified by GoalStore.
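Under the REC-1 signatures, the wiring is roughly this shape (our class and method names are simplified stand-ins; error handling on the GoalStore side is omitted):

```python
class PipelineOrchestrator:
    """Sketch of the BU-2 wiring, assuming the three REC-1 signatures
    (create_goal / create_task / update_task_status) hold as specified."""

    STAGES = ["domain", "plan", "execute", "review", "synthesize"]

    def __init__(self, goal_store):
        self.goals = goal_store

    def run(self, task_description, run_stage):
        # One goal per pipeline run, one task per stage.
        goal_id = self.goals.create_goal(title=task_description, priority=1)
        for stage in self.STAGES:
            task_id = self.goals.create_task(goal_id, title=stage)
            self.goals.update_task_status(task_id, "ACTIVE")
            try:
                run_stage(stage)
            except Exception:
                self.goals.update_task_status(task_id, "FAILED")
                raise
            self.goals.update_task_status(task_id, "COMPLETED")
```

Nothing here touches the dashboard; it sees the goals and tasks through GoalStore exactly as it sees agent memory goals.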

REQ-4 must be in place to ensure GoalStore (and its underlying MemoryStore) is closed before db.close() runs during application shutdown.

We are building against exactly the three method signatures specified in REC-1. If those signatures change after we start BU-2, we will need a version bump and a migration.


BU-3 — Phase 6 Convergence Spec: AgentLoop / PipelineEngine Unified Runtime

Dependencies: REQ-7 (injectable event loop + async generator interface)
Our estimated effort: 1-2 sprints (design), 2-3 sprints (implementation)
What we will do:

We will write a convergence specification that defines how AgentLoop and PipelineEngine merge into a single runtime. The unified runtime will:

  • Accept a goal or task description
  • Drive pipeline phases (domain → plan → execute → review → synthesize) as AgentLoop cycles
  • Emit a unified event stream that covers both LoopEvent (from your system) and PipelineEvent (from ours)
  • Support both memory-first (AgentLoop drives, engine records state) and pipeline-first (engine drives, loop provides execution substrate) operating modes

This convergence is only tractable if AgentLoop exposes the injectable event loop and async generator interface from REQ-7. Without REQ-7, the two systems cannot share a coroutine runtime without nesting event loops, which Python does not support.
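The REQ-7 shape we are assuming looks roughly like this: AgentLoop yields events from an async generator, so PipelineEngine can drive it on its own event loop with no nesting (all names here are illustrative, not your actual API):

```python
import asyncio

async def agent_loop_cycles(goal):
    """Hypothetical REQ-7 interface: one async generator, one event per
    yield, no event loop owned by AgentLoop itself."""
    for phase in ("domain", "plan", "execute", "review", "synthesize"):
        yield {"type": "agent_loop_cycle_start", "phase": phase, "goal": goal}
        # ... the real cycle work would run here ...
        yield {"type": "agent_loop_cycle_end", "phase": phase, "goal": goal}

async def drive(goal):
    # The caller (PipelineEngine) owns the loop and consumes events,
    # interleaving its own work between cycles if it wants to.
    return [event async for event in agent_loop_cycles(goal)]
```

With this interface, either side can be the driver: memory-first mode consumes the generator directly, pipeline-first mode advances it one cycle at a time.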

We will invite you to co-author the convergence spec once REQ-7 is implemented.


BU-4 — Wire SystemDiscovery into DomainAnalyzer for NPU-Aware Tier Recommendations

Dependencies: REC-2 (SystemDiscovery.get_cached_context() returns typed object with hardware.npu_available)
Our estimated effort: 2-3 days
What we will do:

DomainAnalyzer currently selects a model tier based on task complexity alone. We will extend it to check context.hardware.npu_available from SystemDiscovery.get_cached_context(). When an NPU is available, DomainAnalyzer will recommend smaller quantized models (Tier 1-2) that run efficiently on the Ryzen AI NPU. When no NPU is present, it falls back to the current tier selection logic.
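The extension is a small branch on the cached context. A sketch under the REC-2 assumptions (field names per REC-2; the tier numbers for the no-NPU path are stand-ins for the current logic, not its actual values):

```python
from types import SimpleNamespace

def select_tier(task_complexity, context):
    """BU-4 sketch: field names assumed from REC-2 (hardware.npu_available)."""
    if getattr(context.hardware, "npu_available", False):
        # NPU present: recommend small quantized models (Tier 1-2).
        return 1 if task_complexity == "low" else 2
    # No NPU: stand-in for the current complexity-only tier selection.
    return {"low": 2, "medium": 3, "high": 4}.get(task_complexity, 3)

# Hypothetical REC-2-shaped contexts for illustration:
npu_ctx = SimpleNamespace(hardware=SimpleNamespace(npu_available=True))
cpu_ctx = SimpleNamespace(hardware=SimpleNamespace(npu_available=False))
```

Because get_cached_context() is cached per REC-2, this check adds no probe latency even when DomainAnalyzer is instantiated several times per run.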

This requires REC-2 because DomainAnalyzer is instantiated multiple times per pipeline run and cannot afford the latency of repeated hardware probes. The caching guarantee in REC-2 is essential.


Section 4: Integration Contract Table

This table shows the two-way dependency map. "Kovtcharov provides" means your PR must implement it. "Pipeline team provides" means we build it after your requirement is met.

| Dependency | Kovtcharov provides | Pipeline team consumes | Unblocked by |
|---|---|---|---|
| Agent cache access | REQ-1: correct `_agent_cache` read with lock | MemoryMixin in pipeline agents | REQ-1 |
| Schema migration | REQ-2: three columns via `_migrate()` pattern | Memory reads/writes in pipeline runs | REQ-2 |
| SSE event names | REQ-3: non-colliding AgentLoop event names | Browser memory UI (no misrouting) | REQ-3 |
| Shutdown order | REQ-4: `close_store()` before `db.close()` | Clean process exit after pipeline runs | REQ-4 |
| Router auth scope | REQ-5: per-endpoint auth only | MCP catalog used by installer | REQ-5 |
| Tool collision guard | REQ-6: `logger.warning()` on overwrite | MemoryMixin on 5 pipeline agents (BU-1) | REQ-6 |
| Drivable AgentLoop | REQ-7: async generator + injectable loop | PipelineEngine convergence (BU-3) | REQ-7 |
| Scoped brace fix | REQ-8: only `_get_mixin_prompts()` touched | ComponentLoader template rendering | REQ-8 |
| GoalStore API | REC-1: stable method signatures | GoalStore in PipelineOrchestrator (BU-2) | REC-1 |
| SystemDiscovery | REC-2: typed return with `hardware.npu_available` | NPU-aware DomainAnalyzer (BU-4) | REC-2 |
| Registry staleness | REC-3: code comment acknowledging race | Future OI-21 fix | REC-3 |
| Separate DB file | REC-4: MemoryStore on `~/.gaia/memory/` | No WAL contention with ChatDatabase | REC-4 |
| MemoryEventType | REC-5: separate TS union after REQ-3 names agreed | MemoryDashboard type safety | REQ-3 + REC-5 |
| MemoryMixin rollout | BU-1: pipeline team adds MemoryMixin | AgentLoop events from pipeline agents | REQ-1, 2, 3, 6 |
| GoalStore wiring | BU-2: pipeline team wires GoalStore | Pipeline runs in Memory Dashboard | REC-1, REQ-4 |
| Convergence spec | BU-3: pipeline team authors Phase 6 spec | Unified runtime for Phase 6 | REQ-7 |
| NPU tier selection | BU-4: pipeline team extends DomainAnalyzer | Efficient model selection on Ryzen AI | REC-2 |

Section 5: Merge Sequencing

The requirements above have dependencies. This section describes the recommended order of work to minimize rework.

Phase A — Unblocked work (start immediately, no cross-team coordination needed)

These items have no external dependencies and no coordination requirements. Estimated total: 4-5 hours.

| Item | Effort | File |
|---|---|---|
| REQ-1 — Fix `_agent_cache` access | 30 min | Your `_register_agent_memory_ops()` file |
| REQ-2 — Fix `_migrate()` additions | 1-2 hr | src/gaia/ui/database.py |
| REQ-4 — Fix shutdown order | 15 min | src/gaia/ui/server.py |
| REQ-5 — Remove router-level auth | 15 min | src/gaia/ui/routers/mcp.py |
| REQ-8 — Scope brace-escaping fix | 1 hr | src/gaia/agents/base/agent.py |
| REC-1 — Lock GoalStore signatures | 30 min | PR description |
| REC-4 — Separate MemoryStore path | 30 min | Your MemoryStore default path |

Phase B — Coordination required (send us your proposal, wait for confirmation)

| Item | Action | Waiting on |
|---|---|---|
| REQ-3 — SSE name collision | Send us your proposed AgentLoop event names | Our confirmation (target: 1 business day) |

Do not write any code for REQ-3 until we confirm the names. Writing first means you may have to rename everything again if we find a collision with a future event name we have pending in a feature branch.

Phase C — After REQ-3 names confirmed (implement)

| Item | Effort | File |
|---|---|---|
| REQ-3 implementation | 2-3 hr | All AgentLoop SSE emitters |
| REC-5 — MemoryEventType union | 1 hr | src/gaia/apps/webui/src/types/index.ts |
| MemoryDashboard.tsx updates | 1 hr | Your MemoryDashboard.tsx |

Phase D — Long-lead work (can run in parallel with Phase A-C, longer timeline)

| Item | Effort | Notes |
|---|---|---|
| REQ-6 — Collision guard | 1 hr | Can land any time before BU-1 starts |
| REQ-7 — Drivable AgentLoop | 3-5 days | Does not block Phase A-C merge |
| REC-2 — SystemDiscovery contract | 1 hr | Needed before BU-4 starts |
| REC-3 — Registry staleness comment | 30 min | Post-merge PR acceptable |

Target merge gate: All of Phase A complete + REQ-3 implementation complete (Phase C) = merge-ready. REQ-6 and REQ-7 can follow in a subsequent PR on a 1-sprint timeline.


Section 6: Cross-Team Coordination Gate (REQ-3 Name Agreement)

REQ-3 requires bilateral agreement on the AgentLoop SSE event type names before any implementation. This section defines the protocol.

The problem in concrete terms

Your AgentLoop emits SSE events. The browser receives them as JSON objects: {"type": "loop_back", "data": {...}}. The browser's PIPELINE_EVENT_MAP (a plain JavaScript object at lines 626-643 of src/gaia/apps/webui/src/services/api.ts) looks up event.type as a key and routes the event to a callback.

The six pipeline-owned strings are keys in PIPELINE_EVENT_MAP. If your event type matches one of those keys, the event is routed to a pipeline callback. There is no error — the pipeline callback silently receives wrong data. The memory UI receives nothing.

This is not a bug we can fix after the fact with a minor patch — it requires a coordinated rename across Python emitters, TypeScript types, and the dashboard component.
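The failure mode takes only a few lines to reproduce. A Python model of the lookup (the real map is a TypeScript object in api.ts; names and event shapes are simplified here):

```python
# Sketch of the routing hazard, modeled on PIPELINE_EVENT_MAP in api.ts:
# a plain key -> callback lookup keyed on event["type"].
pipeline_hits = []
PIPELINE_EVENT_MAP = {
    "loop_back": pipeline_hits.append,   # one of the six reserved names
}

def route(event, on_memory_event):
    handler = PIPELINE_EVENT_MAP.get(event["type"])
    if handler is not None:
        handler(event["data"])    # collision: memory event silently consumed
    else:
        on_memory_event(event)    # non-colliding names reach the memory UI
```

An AgentLoop event named "loop_back" vanishes into the pipeline callback with no error anywhere; a non-colliding name like "memory_recalled" routes correctly. That is why the rename has to happen before merge, not after.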

The protocol

Step 1 — You send us a list.

Reply to this document (or message the pipeline team directly) with a list of all SSE event type strings your AgentLoop emits. For example:

AgentLoop SSE event types (proposed):
- agent_loop_cycle_start
- agent_loop_cycle_end
- memory_quality_check
- agent_loop_back
- agent_phase_transition
- memory_defect_detected
- memory_recalled
- memory_consolidated
- goal_created
- task_status_changed

Step 2 — We check for collisions.

We check your proposed names against:

  1. The six pipeline-reserved names (listed in REQ-3)
  2. Any pending event names in our feature branch that have not landed yet
  3. Names in the broader StreamEventType union (lines 255-265 of types/index.ts)

Step 3 — We confirm (target: 1 business day).

We reply with either "approved" or "here are the names that conflict — please propose alternatives for those."

Step 4 — You implement.

Only after step 3 confirmation do you implement the renames in your Python emitters, TypeScript types, and dashboard component.

Why this gate exists

The six pipeline event names were not chosen arbitrarily. They correspond to conceptual phases in the recursive pipeline: looping back (quality retry), scoring (quality metric), phase jumping (state machine transition), and defect detection. If we let AgentLoop use similar concepts with similar names, we will have a permanent naming conflict as the two systems co-evolve. The gate forces us to align vocabulary upfront.


Section 7: Quick-Start Checklist

Seven items kovtcharov can start today, in priority order. Total estimated time for items 1-6: approximately 4.5 hours.

Item 1 — Fix _agent_cache access (REQ-1)

Time: 30 minutes
Do: In _register_agent_memory_ops(), replace your current agent retrieval with the _agent_cache_lock + _agent_cache.get(session_id) + entry["agent"] pattern. Add an early return if session_id is not in the cache.
Verify: Run your existing memory tool tests. Confirm no KeyError on missing session.

Item 2 — Send us your AgentLoop SSE event name list (REQ-3 coordination)

Time: 15 minutes (to draft the list)
Do: List every SSE event type string your AgentLoop emits. Send to the pipeline team (feature/pipeline-orchestration-v1). No code changes yet.
Verify: Your list contains none of the six reserved names from REQ-3. If any name collides, propose alternatives in the same message.

Item 3 — Fix _migrate() column additions (REQ-2)

Time: 1-2 hours
Do: For each of the three new columns, add a separate try/except block after line 176 of database.py using the PRAGMA table_info pattern. If knowledge and conversations are new tables, add CREATE TABLE IF NOT EXISTS to SCHEMA_SQL.
Verify: Run python -c "from gaia.ui.database import ChatDatabase; db = ChatDatabase(':memory:'); db.close()" and confirm no exception.
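A sketch of the PRAGMA table_info pattern (the helper name is ours for illustration; the real migration blocks in database.py are written inline, one try/except per column):

```python
import sqlite3

def _add_column(conn, table, column, decl):
    """Idempotent ALTER TABLE: skip the ADD COLUMN if the column exists.
    Sketch of the _migrate() pattern described above, not the real code."""
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS knowledge (id TEXT PRIMARY KEY)")
for col, decl in [("embedding", "BLOB"), ("superseded_by", "TEXT")]:
    _add_column(conn, "knowledge", col, decl)
    _add_column(conn, "knowledge", col, decl)  # second call is a no-op
```

Running the migration twice against the same database must be harmless; that is the whole point of the pattern.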

Item 4 — Remove router-level auth (REQ-5)

Time: 15 minutes
Do: Confirm line 16 of src/gaia/ui/routers/mcp.py reads router = APIRouter(tags=["mcp"]) with no dependencies argument. Move any auth to individual @router.post(...) decorators.
Verify: grep -n "dependencies" src/gaia/ui/routers/mcp.py — the APIRouter line must not appear in results.

Item 5 — Scope the curly-brace fix to _get_mixin_prompts() (REQ-8)

Time: 1 hour
Do: Ensure your escaping fix touches only _get_mixin_prompts() at line 299 of agent.py. Run git diff HEAD -- src/gaia/utils/component_loader.py and verify the output is empty.
Verify: Write a unit test: create a mixin that returns a prompt string with {variable} in it; confirm it does not raise KeyError or ValueError. Confirm render_component() still works with {{KEY}} substitution.
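The escaping itself is two string replaces. A sketch (helper name is ours; in your PR this logic lives inline in _get_mixin_prompts()):

```python
def _escape_mixin_prompt(text):
    # REQ-8 sketch: double every brace in mixin-supplied prompt text so a
    # later str.format() pass treats {variable} as literal text, not a
    # replacement field. Scoped to _get_mixin_prompts() only;
    # component_loader.py's {{KEY}} str.replace substitution is untouched.
    return text.replace("{", "{{").replace("}", "}}")

raw = "Store facts shaped like {category}: {value}"
safe = _escape_mixin_prompt(raw)
rendered = safe.format()   # the unescaped string would raise KeyError here
```

One format() pass restores the original literal braces, which is exactly what the unit test above should assert.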

Item 6 — Fix shutdown order (REQ-4)

Time: 15 minutes
Do: In server.py, move your close_store() call to run after monitor.stop() (line 210) and before db.close() (line 212). The exact insertion point is between those two existing lines.
Verify: Read the lifespan block in server.py and confirm the sequence is: monitor.stop() → close_store() → db.close().

Item 7 — Lock in GoalStore API signatures in the PR description (REC-1)

Time: 30 minutes
Do: Add the exact method signatures for create_goal(), create_task(), and update_task_status() to your PR #606 description. Include parameter types, return types, and error behavior (what happens if task_id not found).
Verify: We (pipeline team) will confirm receipt and that BU-2's dependency on those signatures is satisfied.


Appendix A: File Reference Summary

All files mentioned in this document, with locations verified against HEAD d187907.

| File | Lines | Relevant to |
|---|---|---|
| src/gaia/ui/_chat_helpers.py | 1,144 | REQ-1 |
| src/gaia/ui/database.py | 787 | REQ-2, REC-4 |
| src/gaia/ui/sse_handler.py | 950 | REQ-3, context |
| src/gaia/ui/routers/mcp.py | 425 | REQ-5 |
| src/gaia/ui/server.py | | REQ-4 |
| src/gaia/agents/base/tools.py | | REQ-6 |
| src/gaia/pipeline/orchestrator.py | 681 | REQ-7, context |
| src/gaia/utils/component_loader.py | | REQ-8 |
| src/gaia/agents/base/agent.py | | REQ-8 |
| src/gaia/apps/webui/src/types/index.ts | | REQ-3, REC-5 |
| src/gaia/apps/webui/src/services/api.ts | | REQ-3, REC-5 |

Appendix B: Specific Line Numbers for Every Claim

Every line number in this document was verified against HEAD d187907 of feature/pipeline-orchestration-v1. If you are working on a different commit, verify the line numbers have not shifted before using them as edit targets.

| Claim | File | Line |
|---|---|---|
| `_agent_cache` declaration | _chat_helpers.py | 54 |
| `_agent_cache_lock` declaration | _chat_helpers.py | 57 |
| `_store_agent()` cache write | _chat_helpers.py | 102-106 |
| SCHEMA_SQL start | database.py | 23 |
| `_init_schema()` definition | database.py | 105 |
| `_migrate()` definition | database.py | 118 |
| `_ensure_settings_table()` call | database.py | 121 |
| Last existing migration block end | database.py | 176 |
| PRAGMA journal_mode = WAL | database.py | 101 |
| SSEOutputHandler class | sse_handler.py | 89 |
| `router = APIRouter(tags=["mcp"])` | routers/mcp.py | 16 |
| `monitor.stop()` in shutdown | server.py | 210 |
| `db.close()` in shutdown | server.py | 212 |
| `ToolRegistry.register()` write | tools.py | 396 |
| `_execute_recursive_pipeline()` | orchestrator.py | 556 |
| Event loop creation in orchestrator | orchestrator.py | 656-661 |
| asyncio local import | orchestrator.py | 584 |
| `render_component()` str.replace block | component_loader.py | 240-246 |
| `_get_mixin_prompts()` | agent.py | 299 |
| StreamEventType pipeline entries | types/index.ts | 260-265 |
| PIPELINE_EVENT_MAP | api.ts | 626-643 |

This document was written by the pipeline orchestration team on feature/pipeline-orchestration-v1. For questions about requirements, contact us before implementing. For questions about your own PR, you know it better than we do. For cross-team coordination on REQ-3 names, initiate contact with a proposed name list — we will respond within one business day.

kovtcharov added a commit that referenced this pull request Apr 17, 2026
Two-phase local-first email triage agent — MVT (~1.5d CC-assisted) for
v0.20.0, full EmailTriageAgent for v0.23.0. Covers auto-discovery, per-cohort
autonomy, speech-act classification, undo ledger, Slack as first-class output
channel, and an honest §27 catalog of research bets and unvalidated claims.

§22.4 maps outstanding PRs to prerequisite role: #606 / #517 / #495 / #622 /
#779 / #741 / #737. Landing the "minimum set" of #495 + #741 + one of #606 /
#517 M1 collapses most of the missing-infrastructure workarounds before
implementation starts.
github-merge-queue bot pushed a commit that referenced this pull request Apr 18, 2026
## Summary

Adds a two-phase spec for a local-first email triage agent that runs
inference on-device via Lemonade (Ryzen AI NPU/iGPU) — no email content
transits a cloud API. Phase **MVT** ships in ~1.5 days (CC-assisted) by
thin-wrapping existing primitives; **Phase C1** polishes UX for v0.20.0;
**Phase C2** adds scheduled triage, Agent Inbox HITL, and in-tree Gmail
MCP for v0.23.0. Slack is a first-class output channel from day one
(webhook → MCP → interactive buttons across phases).

## Key threads

- **MVT ships fast because ~95% of plumbing exists.** §2.5 maps every
required capability to an existing GAIA primitive (`MCPClientMixin`,
`DatabaseMixin`, `RAGSDK`, `TalkSDK`, `SummarizeAgent`, `ApiAgent`,
SSE). Why it matters: scoping the MVT as thin wrappers rather than new
plumbing is what makes the ~1.5d estimate credible.
- **§22.4 catalogs in-flight PRs as prerequisites.** Maps
[#606](#606) (memory v2),
[#517](#517) (autonomy M1/M3/M5),
[#495](#495) (security.py),
[#622](#622) (orchestrator),
[#779](#779) (eval),
[#741](#741) (vault),
[#737](#737) (Slack connector) to
which spec risks each one collapses. Why it matters: the "minimum set to
start MVT safely" is named explicitly — #495 + #741 + one of #606 / #517
M1 — so sequencing is actionable.
- **Memory-PR conflict flagged (§22.4.4).** #606 and #517 M1 overlap on
memory subsystem; §22.4.4 calls out the reconciliation as a prerequisite
decision, not a runtime surprise.
- **§27 "Known Weaknesses, Unvalidated Claims, Decision Debt"** names
the research bets (Custom AI Labels on local 4B, per-relationship voice,
auto-follow-up quality) and unvalidated claims cited in the spec (97.5%
tool-call reliability, GongRzhe archive date, etc.) so C2 isn't treated
as an engineering certainty.
- **Slack integration scoped as an output channel (§12.18).** Webhook at
MVT → Slack MCP at C1 → interactive approve/edit/reject buttons at C2.
Aligned with
[messaging-integrations-plan.mdx](https://github.com/amd/gaia/blob/main/docs/plans/messaging-integrations-plan.mdx)
(#635).

## Test plan

- [ ] Render preview of `docs/plans/email-triage-agent.mdx` via Mintlify
dev or amd-gaia.ai preview — confirm frontmatter, tables, code blocks,
and section numbering (1–28) render cleanly.
- [ ] Verify `docs/docs.json` navigation entry places the page under
*Agent UI* group next to `email-calendar-integration`.
- [ ] Cross-reference check: every `[Link](file.mdx)` target exists
(`email-calendar-integration`, `autonomy-engine`, `security-model`,
`agent-ui`, `setup-wizard`, `messaging-integrations-plan`).
- [ ] Scan §22.4 PR numbers against the current PR queue (`gh pr list
--repo amd/gaia --state open`) to confirm they're still open and the
recommended sequence is feasible.
@kovtcharov
Collaborator Author

Heads-up: semantic overlap with PR #495's scratchpad

@kovtcharov — flagging a tool-selection collision to think through before both PRs land.

PR #495 adds ScratchpadToolsMixin with query_data(sql: str) — a read-only SQL interface to a per-session SQLite workspace at ~/.gaia/scratchpad.db. It's the terminal step of a find_files → create_table → insert_data → query_data workflow for multi-document structured analysis (spending analysis, research reviews, tax prep).

PR #606 adds recall(query: str, ...) — hybrid vector + FTS5 + RRF + cross-encoder rerank over ~/.gaia/memory.db. The terminal step of a "store a fact → recall it later" workflow for personal knowledge / note-taking.
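For readers unfamiliar with RRF: a minimal sketch of reciprocal rank fusion over two candidate lists, as recall() is described as doing (k=60 is the common default from the RRF literature, not a value confirmed from this PR):

```python
def rrf_fuse(rankings, k=60):
    """Merge ranked result lists: each document scores sum(1 / (k + rank))
    over the rankings it appears in, then sort by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m3", "m1", "m7"]   # FAISS order (illustrative IDs)
bm25_hits = ["m1", "m9", "m3"]     # FTS5 order
fused = rrf_fuse([vector_hits, bm25_hits])
```

Documents ranked well by both retrievers float to the top; the cross-encoder rerank then reorders the fused head.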

The collision

After both PRs merge, the ChatAgent will have both of these tools registered simultaneously. They answer two genuinely different questions:

Query intent Right tool Why
"What did I spend on groceries in March?" query_data Structured data from a document pipeline, aggregate math
"What did I learn about FTS5 last week?" recall Semantic concept match across conversation history
"How many research papers mention transformer attention?" Either, depending on how they were stored Ambiguous

The LLM will not naturally know which to pick. The third row is the failure mode — an LLM could pick recall when the data lives in a scratchpad table, or vice versa, and get zero results.

What I'd suggest

  1. Mutually-exclusive system prompt section that explicitly disambiguates. Something like:
    **WHICH "QUERY" TOOL TO USE:**
    - `query_data(sql)` — for data you put there via `insert_data` in this session.
      Always SQL. Always structured. Always recent.
    - `recall(query)` — for facts/notes you stored via `remember`, or conversation
      content from prior sessions. Always semantic. Always persistent.
    - If the data didn't come from a `create_table` / `insert_data` pair, it's not
      in the scratchpad — use `recall`.
    
  2. Distinct tool names to match distinct contexts. query_data is already good; consider renaming recall → recall_memory so the namespace visually separates ephemeral SQL work from persistent personal memory.
  3. A guard-rail test — an eval prompt like "What's my current PTO balance from the employee handbook?" (RAG-retrieval, not scratchpad) and "How much did I spend on groceries in March?" (scratchpad, not recall) back-to-back, with ground truth. Catches cross-tool confusion the moment it regresses.
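A sketch of what that guard-rail could look like (pick_tool is a stand-in for whatever surfaces the LLM's tool choice; the cases are the prompts from the table above):

```python
# Hypothetical guard-rail eval: each question has exactly one right tool.
GUARD_RAIL_CASES = [
    ("How much did I spend on groceries in March?", "query_data"),
    ("What did I learn about FTS5 last week?", "recall"),
]

def run_guard_rail(pick_tool):
    """Return the cases where the wrong tool was chosen."""
    failures = []
    for question, expected in GUARD_RAIL_CASES:
        got = pick_tool(question)
        if got != expected:
            failures.append((question, expected, got))
    return failures
```

An empty failure list is the pass condition; any regression in cross-tool routing shows up as a named (question, expected, got) triple.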

Scope suggestion

This is coordination work, not a blocker for either PR. Reasonable landings:

Either way, worth a 10-minute conversation before the second of the two merges.

Cross-referencing from the #495 final-state comment: #495 (comment)


🤖 Generated with Claude Code

# Conflicts:
#	src/gaia/apps/webui/package-lock.json
#	src/gaia/apps/webui/src/components/ChatView.tsx
#	src/gaia/apps/webui/src/services/api.ts
#	src/gaia/apps/webui/src/types/index.ts
#	src/gaia/ui/_chat_helpers.py
#	src/gaia/ui/database.py
#	src/gaia/ui/models.py
#	src/gaia/ui/routers/sessions.py
#	src/gaia/ui/server.py
#	src/gaia/ui/sse_handler.py
#	src/gaia/ui/utils.py

Labels

agents (Agent system changes), chat (Chat SDK changes), cli (CLI changes), dependencies (Dependency updates), documentation (Documentation changes), electron (Electron app changes), eval (Evaluation framework changes), mcp (MCP integration changes), performance (Performance-critical changes), tests (Test changes)
