The only memory layer that learns how you work, not just what you said. Persistent, local memory for AI coding agents: Claude Code, Codex CLI, Cursor, any MCP client. Temporal knowledge graph · procedural memory · AST codebase ingest · cross-project analogy · 3D WebGL visualization.
Why this, not mem0 / Letta / Zep / Supermemory / Cognee? → docs/vs-competitors.md
Bugfix release. Four v11 W3 MCP tools (memory_recall_iterative,
memory_temporal_query, memory_entity_resolve, memory_consolidate_status)
were silently broken on main: the dispatcher forwarded an out-of-scope
args symbol, the resulting NameError was swallowed by call_tool's
exception handler, and clients saw "Error: name 'args' is not defined".
Fix passes the per-call args, with regression coverage via
tests/test_v11_dispatch_args.py.
Also aligns the Codex installer env with the .tam memory layout
(TAM_MEMORY_DIR canonical, CLAUDE_MEMORY_DIR kept as compatibility alias,
MEMORY_MODE=fast default) and isolates install tests from real
launchctl / systemctl / XDG directories. Full notes in
CHANGELOG.md.
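The env precedence described above can be sketched as a small resolver. This assumes TAM_MEMORY_DIR wins when both variables are set and that the default directory is ~/.tam; `resolve_memory_dir` and `resolve_memory_mode` are hypothetical helpers, not the installer's code.

```python
import os
import warnings
from pathlib import Path
from typing import Mapping, Optional


def resolve_memory_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """TAM_MEMORY_DIR wins; CLAUDE_MEMORY_DIR is honoured with a warning."""
    env = os.environ if env is None else env
    if env.get("TAM_MEMORY_DIR"):
        return Path(env["TAM_MEMORY_DIR"]).expanduser()
    if env.get("CLAUDE_MEMORY_DIR"):
        # Compatibility alias (assumed behaviour): still works, but warns.
        warnings.warn(
            "CLAUDE_MEMORY_DIR is deprecated; set TAM_MEMORY_DIR instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return Path(env["CLAUDE_MEMORY_DIR"]).expanduser()
    return Path("~/.tam").expanduser()


def resolve_memory_mode(env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    # Unset or empty MEMORY_MODE falls back to the fast default.
    return env.get("MEMORY_MODE") or "fast"
```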
Claude Code v2.1.139+ emits subagent IDs on every API request
(x-claude-code-agent-id / x-claude-code-parent-agent-id HTTP headers,
plus the same fields as agent_id / parent_agent_id attributes on the
claude_code.tool and claude_code.llm_request OTEL spans). v12.1 wires
these through end-to-end:
- Schema (migration 028_agent_lineage.sql): nullable agent_id and parent_agent_id columns on knowledge, with partial indexes (WHERE … IS NOT NULL) so lineage filters are index-backed while rows without lineage add no index entries.
- MCP tools: memory_save and memory_save_fast accept two new optional inputs, agent_id and parent_agent_id. Old callers see no behaviour change.
- extract_transcript.py: reads agent_id / agentId / parent_agent_id / parentAgentId from .jsonl when Claude Code writes them, and falls back to isSidechain=true as a proxy: sessions with any sidechain activity get agent_id = "session-<id>" plus a has-subagent-work tag on their auto-extracted rows.
- KG fact spawned_by: when memory_save carries both ids, the store auto-records TemporalKG.add_fact(agent, "spawned_by", parent, source="agent-lineage", invalidate_previous=False). The call is idempotent. kg_at(timestamp) and kg_timeline() can now reconstruct the subagent lineage tree at any past moment.
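The transcript fallback can be sketched as a pure function over .jsonl lines. Assumptions, not confirmed by the notes above: the first id seen in a transcript wins, the has-subagent-work tag is only added on the isSidechain fallback path, and lineage is computed per session rather than per row. `extract_lineage` is a hypothetical name.

```python
import json
from typing import Iterable, List, Optional, Tuple


def _first(record: dict, *keys: str) -> Optional[str]:
    # Accept both snake_case and camelCase spellings.
    for key in keys:
        value = record.get(key)
        if value:
            return str(value)
    return None


def extract_lineage(
    session_id: str, jsonl_lines: Iterable[str]
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Return (agent_id, parent_agent_id, extra_tags) for one transcript."""
    agent_id: Optional[str] = None
    parent_id: Optional[str] = None
    saw_sidechain = False
    for line in jsonl_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, partial, or corrupt lines
        if not isinstance(record, dict):
            continue
        agent_id = agent_id or _first(record, "agent_id", "agentId")
        parent_id = parent_id or _first(record, "parent_agent_id", "parentAgentId")
        saw_sidechain = saw_sidechain or record.get("isSidechain") is True
    tags: List[str] = []
    if agent_id is None and saw_sidechain:
        # Proxy: no explicit id was written, but the session had sidechain activity.
        agent_id = f"session-{session_id}"
        tags.append("has-subagent-work")
    return agent_id, parent_id, tags
```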
A reconnect of the MCP memory server is required for clients to see the
updated inputSchema. Full notes in CHANGELOG.md.
The project was renamed from claude-total-memory to total-agent-memory to
reflect that it works with every MCP client, not just Claude Code (Cursor,
Codex CLI, Cline, Continue, Aider, Windsurf, Gemini CLI, OpenCode: all
covered).
Nothing breaks. The old PyPI package (claude-total-memory==11.3.0) is now a
deprecation shim that auto-resolves to total-agent-memory>=12.0.0. Legacy
imports, CLI binaries, env vars, and the ~/.claude-memory/ directory keep
working through automatic migration:
| Old | New | Backward-compat |
|---|---|---|
| pip install claude-total-memory | pip install total-agent-memory | old name still works (shim + warning) |
| from claude_total_memory import … | from total_agent_memory import … | old import still works (sys.modules alias + warning) |
| claude-total-memory CLI | total-agent-memory (alias tam) | old CLI still ships in v12 wheel |
| CLAUDE_MEMORY_DIR env | TAM_MEMORY_DIR env | old env still respected (deprecation warning) |
| ~/.claude-memory/ dir | ~/.tam/ dir | auto-migrated on first run; ~/.claude-memory becomes a symlink to ~/.tam/ so pinned scripts keep working |
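The "sys.modules alias + warning" row can be sketched like this. The module names below are throwaway placeholders so the snippet runs anywhere; the real shim's layout is an assumption. Only the top-level name is aliased here, so a legacy `import old_pkg.submodule` would need extra aliases.

```python
import importlib
import sys
import types
import warnings


def install_legacy_alias(new_name: str, old_name: str) -> types.ModuleType:
    """Make `import old_name` resolve to the already-imported `new_name`."""
    new_module = sys.modules[new_name]
    warnings.warn(
        f"{old_name} is deprecated; import {new_name} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # The import system checks sys.modules first, so this alias is enough
    # for `import old_name` and `from old_name import X` to work.
    sys.modules[old_name] = new_module
    return new_module


# Demo with placeholder module names (hypothetical, not the real packages).
new_pkg = types.ModuleType("tam_demo_new")
new_pkg.VERSION = "12.x"
sys.modules["tam_demo_new"] = new_pkg
install_legacy_alias("tam_demo_new", "ctm_demo_old")
legacy = importlib.import_module("ctm_demo_old")
print(legacy is new_pkg, legacy.VERSION)
```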
Six install paths; pick one:
npx -y total-agent-memory connect claude-code # Node, zero-install
uvx total-agent-memory # Python via uv (fast)
pipx install total-agent-memory # Python via pipx (isolated)
brew install vbcherepanov/tap/total-memory # Homebrew (macOS / Linuxbrew)
docker run -p 37737:37737 -v ~/.tam:/data \
ghcr.io/vbcherepanov/total-agent-memory:12.2.0 # Docker (multi-arch amd64+arm64)
git clone https://github.com/vbcherepanov/total-agent-memory \
~/total-agent-memory && cd ~/total-agent-memory && ./install.sh   # manual
The npx path also wires the MCP entry into the IDE you pass to connect <ide>:
claude-code, codex, cursor, cline, continue, aider, windsurf,
gemini-cli, opencode.
Project URLs: totalmemory.dev · PyPI · npm · Docker GHCR · GitHub Release
Full migration notes (Docker volume names kept for backward-compat, brew formula
changes, etc.) live in CHANGELOG.md. The historical sections
below (v11.1, v11.0, …) are preserved for reference.
Two client-reported bugs fixed (2026-05-14):
Bug #1: orphan + duplicate graph_nodes. The graph accumulated
case-variant duplicates (Vue / vue / VUE) and type-collision
duplicates (vue/concept vs vue/technology created by different
extractors), plus orphan nodes when an edge insert failed after both
nodes were already committed. Fixed by migration 026_graph_nodes_dedup
(name_norm column, triggers, indexes), a case-insensitive UPSERT
rewrite of add_node with type-collision detection, a new atomic
GraphStore.link_pair() helper, and a one-shot cleanup tool
src/tools/merge_duplicate_nodes.py (dry-run by default).
# After upgrade, migration 026 applies automatically. Then optionally:
.venv/bin/python src/tools/merge_duplicate_nodes.py --dry-run
.venv/bin/python src/tools/merge_duplicate_nodes.py --apply --add-unique
Verified on a real production DB (8304 nodes): 102 duplicates merged, 1472 stale edges cleaned, UNIQUE constraint installed.
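A minimal sqlite3 sketch of the two ideas behind the fix: a case-insensitive upsert keyed on a normalized name, and an atomic pair insert so a failed edge cannot leave orphan nodes. The schema, the `casefold()` normalization, and the "reuse the existing node and report a type collision" policy are assumptions; migration 026's actual triggers and indexes are not reproduced here.

```python
import sqlite3
from typing import Tuple

SCHEMA = """
CREATE TABLE graph_nodes (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    name_norm TEXT NOT NULL UNIQUE,   -- case-folded key: Vue / vue / VUE collide
    type      TEXT NOT NULL
);
CREATE TABLE graph_edges (
    src INTEGER NOT NULL REFERENCES graph_nodes(id),
    dst INTEGER NOT NULL REFERENCES graph_nodes(id),
    rel TEXT NOT NULL,
    UNIQUE (src, dst, rel)
);
"""


def add_node(conn: sqlite3.Connection, name: str, node_type: str) -> Tuple[int, bool]:
    """Case-insensitive upsert. Returns (node_id, type_collision)."""
    norm = name.strip().casefold()
    conn.execute(
        "INSERT INTO graph_nodes (name, name_norm, type) VALUES (?, ?, ?) "
        "ON CONFLICT(name_norm) DO NOTHING",
        (name, norm, node_type),
    )
    node_id, existing_type = conn.execute(
        "SELECT id, type FROM graph_nodes WHERE name_norm = ?", (norm,)
    ).fetchone()
    # Same name, different type (vue/concept vs vue/technology): reuse the
    # existing node and report the collision instead of creating a duplicate.
    return node_id, existing_type != node_type


def link_pair(conn: sqlite3.Connection, a: str, a_type: str,
              b: str, b_type: str, rel: str) -> Tuple[int, int]:
    """Insert both endpoints and the edge in a single transaction."""
    with conn:  # commits on success, rolls back all three inserts on any error
        a_id, _ = add_node(conn, a, a_type)
        b_id, _ = add_node(conn, b, b_type)
        conn.execute(
            "INSERT INTO graph_edges (src, dst, rel) VALUES (?, ?, ?) "
            "ON CONFLICT(src, dst, rel) DO NOTHING",
            (a_id, b_id, rel),
        )
    return a_id, b_id
```

`ON CONFLICT ... DO NOTHING` only absorbs uniqueness conflicts, so a NOT NULL violation on the edge still raises and the context manager rolls back both node inserts.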
Bug #2: model never calls memory_save on its own. Sonnet/Haiku
skip the priority-10 save rule when SessionStart context fades. v11.1
adds in-session nudges: a counter in ~/.claude-memory/state/
tracks writes-vs-saves per session, and hooks/post-tool-use.{sh,ps1}
emits a stdout line that Claude reads as system context on the next
turn. Soft nudge at 3 edits with 0 saves, hard at 7, and a
MEMORY_FINAL_WARNING on session stop. A new priority-10 rule
instructs the model to treat MEMORY_NUDGE as an immediate command.
Tunables: MEMORY_NUDGE_DISABLE=1 to silence; MEMORY_NUDGE_SOFT /
_HARD / _STEP to retune (defaults 3 / 7 / 3).
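The counter logic can be modelled in Python for clarity, although the real hooks are shell / PowerShell scripts with state on disk. Assumptions: which tool names count as edits or saves, that each nudge fires exactly once when the edit count reaches its threshold, and that any save silences further nudges. The `_STEP` tunable is not modelled because its exact semantics are not described here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed tool names; the real hook may classify tools differently.
WRITE_TOOLS = {"Edit", "Write", "MultiEdit", "NotebookEdit"}
SAVE_TOOLS = {"mcp__memory__memory_save", "mcp__memory__memory_save_fast"}


@dataclass
class SessionCounter:
    edits: int = 0
    saves: int = 0


def nudge_for(counter: SessionCounter, tool: str,
              env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Update the counter for one tool call; return a nudge line or None."""
    env = os.environ if env is None else env
    if env.get("MEMORY_NUDGE_DISABLE") == "1":
        return None
    soft = int(env.get("MEMORY_NUDGE_SOFT", "3"))
    hard = int(env.get("MEMORY_NUDGE_HARD", "7"))
    if tool in SAVE_TOOLS:
        counter.saves += 1
        return None
    if tool not in WRITE_TOOLS:
        return None
    counter.edits += 1
    if counter.saves:
        return None  # the agent is already saving; stay quiet
    if counter.edits == hard:
        return f"MEMORY_NUDGE (hard): {counter.edits} edits, 0 saves. Call memory_save now."
    if counter.edits == soft:
        return f"MEMORY_NUDGE (soft): {counter.edits} edits, 0 saves. Consider memory_save."
    return None
```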
Test coverage: +24 graph tests, +12 nudge tests. Full details in
CHANGELOG.md.
v11.0 = production memory engine: fast deterministic memory core + async AI enrichment layer. Default mode is fast: zero LLM, zero Ollama, zero network in the save/search/recall hot path.
The codebase is now split into two layers:
- src/memory_core/*: deterministic facade modules (storage, embeddings, vector_store, classifier, chunker, dedup, cache, graph_links, telemetry, health, embedding_spaces). No LLM imports allowed; enforced by tests/test_no_llm_hot_path.py.
- src/ai_layer/*: every LLM-touching path (enrichment_worker, summarizer, keyword_extractor, question_generator, relation_extractor, contradiction_detector, reflection, self_improve, plus thin shims for quality_gate / coref_resolver / reranker / query_rewriter). Off-limits to memory_core.
Architecture details and full hot-path audit: docs/v11/audit.md.
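One way a "no LLM imports" guard can work is a static AST scan. This is a sketch, not the contents of tests/test_no_llm_hot_path.py; the deny-list below is an assumption.

```python
import ast
from typing import Iterable, List

# Assumed deny-list; the real test may check a different set of modules.
FORBIDDEN = {"openai", "anthropic", "ollama", "ai_layer"}


def forbidden_imports(source: str, forbidden: Iterable[str] = FORBIDDEN) -> List[str]:
    """Return imported module names that touch a forbidden package."""
    banned = set(forbidden)
    hits: List[str] = []
    # ast.walk visits nested nodes too, so imports inside functions are caught.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # `from . import ai_layer` has module=None; check the imported names.
            names = [node.module] if node.module else [a.name for a in node.names]
        else:
            continue
        for name in names:
            # Match any dotted component, so `src.ai_layer.summarizer` is caught.
            if banned.intersection(name.split(".")):
                hits.append(name)
    return hits
```

In a pytest module you could loop over `pathlib.Path("src/memory_core").rglob("*.py")` and assert that each file's source yields an empty list.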
MEMORY_MODE selects the runtime profile. Default is fast.
| Mode | Hot-path LLM | Async enrichment | Reranker | Embed fallback | Use when |
|---|---|---|---|---|---|
| ultrafast | off | off | off | FastEmbed only (vector index off, FTS-only) | Throughput stress / CI |
| fast (default) | off | off | off | FastEmbed only, Ollama fallback gated | Production coding-agent loop |
| balanced | off (sync) | on | off | FastEmbed only | You want LLM-derived facets, but never on the critical path |
| deep | on (sync) | on | on (when rerank=true) | FastEmbed → Ollama ladder | v10.5 behaviour: quality gate / contradiction / coref / HyDE inline |
deep mode reproduces v10.5.0 defaults exactly. Set MEMORY_MODE=deep if you depended on synchronous quality_gate, contradiction_detector, or coref. balanced keeps the same ergonomics but moves enrichment off-thread.
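The mode table above maps naturally onto a frozen profile object. This sketch assumes the "Ollama fallback gated" cell means the gate is MEMORY_ALLOW_OLLAMA_IN_HOT_PATH=true and that an unknown MEMORY_MODE is an error; `ModeProfile` and `resolve_profile` are hypothetical names.

```python
import os
from dataclasses import dataclass, replace
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModeProfile:
    hot_path_llm: bool
    async_enrichment: bool
    reranker: bool               # deep: still only when a call passes rerank=true
    vector_index: bool           # ultrafast runs FTS-only
    ollama_embed_fallback: bool


PROFILES = {
    "ultrafast": ModeProfile(False, False, False, False, False),
    "fast":      ModeProfile(False, False, False, True, False),
    "balanced":  ModeProfile(False, True, False, True, False),
    "deep":      ModeProfile(True, True, True, True, True),
}


def resolve_profile(env: Optional[Mapping[str, str]] = None) -> ModeProfile:
    env = os.environ if env is None else env
    mode = env.get("MEMORY_MODE", "fast").strip().lower()
    if mode not in PROFILES:
        raise ValueError(f"unknown MEMORY_MODE: {mode!r}")
    profile = PROFILES[mode]
    # Assumed gate for the fast-mode Ollama embedding fallback.
    if mode == "fast" and env.get("MEMORY_ALLOW_OLLAMA_IN_HOT_PATH", "false").lower() == "true":
        profile = replace(profile, ollama_embed_fallback=True)
    return profile
```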
Migration from v10.5: docs/v11/MIGRATION-FROM-V10.md.
Warm, in-memory SQLite, MacBook M-series, MEMORY_MODE=fast, MEMORY_ALLOW_OLLAMA_IN_HOT_PATH=false:
| metric | p50 (ms) | p95 (ms) | p99 (ms) |
|---|---|---|---|
| save_fast | 6.5 | 9.0 | 27.8 |
| save_fast cached | 0.3 | 0.4 | 1.1 |
| search_fast | 3.7 | 4.0 | 6.2 |
| cached_search | 0.0 | 0.0 | 0.0 |
llm_calls = 0, network_calls = 0 across the entire hot path. Reproduce: bin/memory-bench. CI gate: bin/memory-perf-gate. Raw artifact: docs/v11/benchmark.md.
The v10.5 native bench (benchmarks/v10_5_latency.py) re-run on v11 fast against the recorded v10.5 baseline (benchmarks/results/v10_5_latency.json):
| metric | v10.5 sync (with LLM) | v11.0 fast | speedup |
|---|---|---|---|
| save p95 | 2150.51 ms | 8.51 ms | 252× |
| save p99 | 2178.98 ms | 11.09 ms | 196× |
| recall p95 | 1424.26 ms | 5.81 ms | 245× |
| recall p99 | 1771.70 ms | 6.75 ms | 262× |
| LLM calls / save | 2-4 | 0 | gate |
| Network calls / save | 1-3 | 0 | gate |
Even versus v10.5 without LLM (23.3 ms p95), v11 fast is 2.7× faster: the deterministic-only stages (quality_gate probe, contradiction candidate fetch, episodic event creation, project_wiki refresh) are now fully bypassed in fast mode and queued only when MEMORY_ENRICHMENT_ENABLED=true.
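The speedup column is plain division of the two latency columns. This worked check copies the numbers from the table above; it assumes the 2.7× figure compares against v11 fast save p95 (8.51 ms).

```python
# Latencies copied from the table above, in milliseconds.
V10_5_SYNC = {"save p95": 2150.51, "save p99": 2178.98,
              "recall p95": 1424.26, "recall p99": 1771.70}
V11_FAST = {"save p95": 8.51, "save p99": 11.09,
            "recall p95": 5.81, "recall p99": 6.75}


def speedup(before_ms: float, after_ms: float) -> float:
    return before_ms / after_ms


for metric, before in V10_5_SYNC.items():
    print(f"{metric}: {speedup(before, V11_FAST[metric]):.1f}x")

# Assumption: 23.3 ms (v10.5 without LLM) is compared with v11 fast save p95.
print(f"vs v10.5 without LLM: {speedup(23.3, 8.51):.1f}x")
```

The unrounded ratios come out at roughly 252.7, 196.5, 245.1 and 262.5, so the table reports the integer part.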
Recall quality is preserved: LongMemEval R@5 = 100% on a 30-question sample; hybrid retrieval (FTS5 + dense + RRF + base graph) is identical to v10.5 except for HyDE / analyze_query LLM expansion which is opt-in via MEMORY_MODE=deep. See docs/v11/benchmark.md for the full table including LoCoMo and per-space embedding load characteristics.
memory_save_fast · memory_search_fast · memory_explain_search · memory_warmup · memory_perf_report · memory_rebuild_fts · memory_rebuild_embeddings · memory_eval_locomo · memory_eval_recall · memory_eval_temporal · memory_eval_entity_consistency · memory_eval_contradictions · memory_eval_long_context
All previous tool names (memory_save, memory_recall, ...) continue to work unchanged.
Every vector row now records embedding_provider / embedding_model / embedding_dimension / embedding_space / content_type / language. Spaces: text / code / log / config. Single Chroma backend; per-space model swap is one env flip:
MEMORY_TEXT_EMBED_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
MEMORY_CODE_EMBED_MODEL=jinaai/jina-embeddings-v2-base-code # optional
MEMORY_LOG_EMBED_MODEL= # falls back to TEXT
MEMORY_CONFIG_EMBED_MODEL=                    # falls back to TEXT
Old chunks stay searchable in their space; new chunks pick up the swapped model. Backfill one space at a time via memory_rebuild_embeddings.
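The per-space fallback rule can be sketched as a resolver. Assumptions: the default TEXT model is the multilingual MiniLM shown above, and an unset CODE model also falls back to TEXT (the notes only say it is optional). `embed_model_for` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

SPACES = ("text", "code", "log", "config")
# Assumed default; the installer pre-downloads a multilingual MiniLM model.
DEFAULT_TEXT_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"


def embed_model_for(space: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the embedding model for one space; unset or empty falls back to TEXT."""
    if space not in SPACES:
        raise ValueError(f"unknown embedding space: {space!r}")
    env = os.environ if env is None else env
    text_model = env.get("MEMORY_TEXT_EMBED_MODEL") or DEFAULT_TEXT_MODEL
    if space == "text":
        return text_model
    return env.get(f"MEMORY_{space.upper()}_EMBED_MODEL") or text_model
```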
v10.x sections below are preserved as legacy v10.5 behaviour, still available via
MEMORY_MODE=deep. The numbers, screenshots, and benchmark blocks dated 2026-04-19 / 2026-04-25 / 2026-04-27 (v10) describe the deep-mode pipeline. v11 replaces defaults, not capabilities.
- v11.1: graph dedup + proactive save nudges
- v11.0: production memory engine
- The problem it solves
- 60-second demo
- Benchmarks: how it compares
- Competitor comparison
- What you get
- Architecture
- Install
- Quick start
- CLI: lookup-memory for sub-agents
- MCP tools reference
- TypeScript SDK
- Dashboard
- Update
- Upgrading from v8.x to v9.0
- Upgrading from v7.x to v8.0
- Ollama setup
- Configuration
- Performance tuning
- Roadmap
- Support the project
- Philosophy & license
AI coding agents have amnesia. Every new Claude Code / Codex / Cursor session starts from zero. Yesterday's architectural decisions, bug fixes, stack choices, and hard-won lessons vanish the moment you close the terminal. You re-explain the same things, re-discover the same solutions, paste the same context into every new chat.
total-agent-memory gives the agent a persistent brain that lives on your machine, not in someone else's cloud.
Every decision, solution, error, fact, file change, and session summary is:
- Captured: explicitly via memory_save, or implicitly via hooks on file edits / bash errors / session end
- Linked: automatically extracted into a knowledge graph (entities, relations, temporal facts)
- Searchable: 6-stage hybrid retrieval (BM25 + dense + graph + CrossEncoder + MMR + RRF fusion), 96.2% R@5 on public LongMemEval
- Private: 100% local. SQLite + FastEmbed + optional Ollama. No data leaves your machine.
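Reciprocal Rank Fusion, the final step of that hybrid retrieval, combines ranked lists without comparing their raw scores. A minimal sketch, assuming the commonly used constant k = 60 (the project's actual k and tie-breaking are not stated here):

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


bm25_hits = [1, 2, 3]
dense_hits = [2, 3, 4]
graph_hits = [2, 5]
print(rrf_fuse([bm25_hits, dense_hits, graph_hits]))
```

Document 2 wins because it appears near the top of all three lists, even though BM25 alone ranked document 1 first.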
You: "remember we picked pgvector over ChromaDB because of multi-tenant RLS"
Claude: → memory_save(type=decision, content="Chose pgvector over ChromaDB",
context="WHY: single Postgres, per-tenant RLS")
[3 days later, different session, possibly different project directory:]
You: "why did we pick pgvector again?"
Claude: → memory_recall(query="vector database choice")
→ "Chose pgvector over ChromaDB for multi-tenant RLS. Single DB
instance, row-level security per tenant."
It's not just retrieval. It's procedural too:
You: "migrate auth middleware to JWT-only session tokens"
Claude: → workflow_predict(task_description="migrate auth middleware...")
→ confidence 0.82, predicted steps:
1. read src/auth/middleware.go + tests
2. update session fixtures in tests/
3. run migration 0042
4. regenerate OpenAPI spec
similar past: wf#118 (success), wf#93 (success)
Public LongMemEval benchmark (xiaowu0162/longmemeval-cleaned, 470 questions, the dataset everyone publishes against):
R@5 (recall_any) on public LongMemEval:
| System | R@5 | Notes |
|---|---|---|
| total-agent-memory v7.0 | 96.2% | local, 38.8 ms, MIT |
| Mastra "Observational" | 95.0% | cloud |
| Supermemory | 85.4% | cloud, $0.01/1k tok |
Reproducible: evals/longmemeval-2026-04-17.json · Runner: benchmarks/longmemeval_bench.py
| Question type | Count | Our R@5 |
|---|---|---|
| knowledge-update | 72 | 100.0% |
| single-session-user | 64 | 100.0% |
| multi-session | 121 | 96.7% |
| single-session-assistant | 56 | 96.4% |
| temporal-reasoning | 127 | 95.3% (bi-temporal KG pays off) |
| single-session-preference | 30 | 80.0% (weakest spot) |
| TOTAL | 470 | 96.2% |
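R@5 (recall_any) counts a question as a hit when any gold evidence item appears in the top 5 results, and the overall figure is the question-weighted mean of the per-type rows. A worked check using the table above; `recall_any_at_k` follows the usual definition and is not taken from the benchmark runner.

```python
from typing import Dict, Sequence, Set, Tuple


def recall_any_at_k(retrieved: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """1.0 if any gold evidence id appears in the top-k results, else 0.0."""
    return 1.0 if any(doc_id in gold for doc_id in retrieved[:k]) else 0.0


# (question count, R@5) per type, copied from the table above.
BREAKDOWN: Dict[str, Tuple[int, float]] = {
    "knowledge-update": (72, 1.000),
    "single-session-user": (64, 1.000),
    "multi-session": (121, 0.967),
    "single-session-assistant": (56, 0.964),
    "temporal-reasoning": (127, 0.953),
    "single-session-preference": (30, 0.800),
}

total_n = sum(n for n, _ in BREAKDOWN.values())
overall = sum(n * r for n, r in BREAKDOWN.values()) / total_n
print(total_n, f"{overall:.1%}")
```

The weighted mean reproduces the TOTAL row: 470 questions at 96.2%.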
Public LoCoMo benchmark (snap-research/locomo, 1986 QA across 10 long-running conversations, the dataset Mem0 / Memobase / Zep / MemMachine publish against):
Chart: LoCoMo Acc (overall, no adversarial) for MemMachine, Memobase, Zep / Graphiti, Mem0, total-agent-memory v9.0 (local, MIT, gpt-4o-mini) and LangMem; values are listed in the ranking table.
| Rank | System | Overall (no adv) | License |
|---|---|---|---|
| 1 | MemMachine | 0.849 | Commercial |
| 2 | Memobase | 0.758 | Apache-2.0 |
| 3 | Zep / Graphiti | 0.751 | Apache-2.0 |
| 4 | Mem0 | 0.669 | Apache-2.0 |
| 5 | total-agent-memory v9.0 | 0.596 | MIT |
| 6 | LangMem | 0.581 | MIT |
Per-category breakdown (v9.0, gpt-4o-mini gen + judge):
| Category | N | Acc | R@5 |
|---|---|---|---|
| 1: single-hop | 282 | 0.443 | 0.514 |
| 2: temporal | 321 | 0.564 | 0.717 |
| 3: multi-hop | 96 | 0.490 | 0.385 |
| 4: open-domain | 841 | 0.661 | 0.601 |
| 5: adversarial | 446 | 0.998 (we lead) | 0.421 |
| Overall (no adv) | 1540 | 0.596 | 0.622 |
We lead on adversarial (0.998 vs Memobase 0.90) thanks to judge-weighted ensemble + abstain logic. The top-3 leaders win on categories 1 and 2 via subject-aware profile retrieval; that's our v10 target.
Reproducible: benchmarks/results/v9_diag_v1_*.json · Runner: benchmarks/locomo_bench_llm.py (15 ablation flags). Cost on gpt-4o-mini: ~$5 for full 1986 QA run with ensemble=3.
| Latency | Value | Note |
|---|---|---|
| p50 (warm) | 0.065 ms | |
| p95 (warm) | 2.97 ms | |
| LongMemEval | 38.8 ms/query | includes embedding + CrossEncoder rerank |
| p50 (cold) | 1333 ms | first query after process start |
Warm / cold reproducible from evals/results-2026-04-17.json.
We're not replacing chatbot memory β we're occupying the coding-agent + MCP + local niche.
| | mem0 | Letta | Zep | Supermemory | Cognee | LangMem | total-agent-memory |
|---|---|---|---|---|---|---|---|
| Funding / status | $24M YC | $10M seed | $12M seed | $2.6M seed | $7.5M seed | in LangChain | self-funded OSS |
| Runs 100% local | π‘ | β | π‘ | β | π‘ | π‘ | β |
| MCP-native | via SDK | β | π‘ Graphiti | π‘ | β | β | β 60+ tools |
| Knowledge graph | π $249/mo | β | β | β | β | β | β |
| Temporal facts (kg_at) | β | β | β | β | π‘ | β | β |
| Procedural memory | β | β | β | β | β | π‘ | β workflow_predict |
| Cross-project analogy | β | β | β | β | β | β | β analogize |
| Self-improving rules | β | β | β | β | π‘ | β | β learn_error |
| AST codebase ingest | β | β | β | β | π‘ | β | β tree-sitter 9 lang |
| Pre-edit risk warnings | β | β | β | β | β | β | β file_context |
| 3D WebGL graph viewer | β | β | π‘ | β | β | β | β |
| Price for graph features | $249/mo | free | cloud | usage | free | free | free |
Full side-by-side with pricing, latency, accuracy, and "when to pick each": docs/vs-competitors.md.
| Capability | Tool | One-liner |
|---|---|---|
| Procedural memory | workflow_predict / workflow_track | "How did I solve this last time?": predicts steps with confidence |
| Cross-project analogy | analogize | "Was there something like this in another repo?": Jaccard + Dempster-Shafer |
| Pre-edit risk warnings | file_context | Surfaces past errors / hot spots on the file you're about to edit |
| Self-improving rules | learn_error + self_rules_context | Bash failures → patterns → auto-consolidated behavioral rules at N≥3 |
| Temporal facts | kg_add_fact / kg_at | Append-only KG with valid_from/valid_to; query what was true at any point |
| Task workflow phases | classify_task / phase_transition | Automatic L1-L4 complexity classification, state machine across van/plan/creative/build/reflect/archive |
| Structured decisions | save_decision | Options + criteria matrix + rationale + discarded alternatives; searchable decision records with per-criterion embeddings |
| Token-efficient retrieval | memory_recall(mode="index") + memory_get | 3-layer workflow: compact IDs → timeline → batched full fetch. ~83% token saving on typical queries |
- 6-stage hybrid retrieval (BM25 + dense + fuzzy + graph + CrossEncoder + MMR, RRF fusion): 96.2% R@5 public
- Multi-representation embeddings: each record embedded as raw + summary + keywords + questions + compressed
- AST codebase ingest: tree-sitter across 9 languages (Python, TS/JS, Go, Rust, Java, C/C++, Ruby, C#)
- Auto-reflection pipeline: memory_save → LaunchAgent file-watch → graph edges appear ~30 s later
- rtk-style content filters: strip noise from pytest / cargo / git / docker logs while preserving URLs, paths, code
- 3D WebGL knowledge graph viewer: 3,500+ nodes, 120,000+ edges, click-to-focus, filters
- Hive plot & adjacency matrix: alternate graph views sorted by node type
- A2A protocol: memory shared between multiple agents (backend + frontend + mobile in a team)
- design-explore skill: drop-in Claude Code skill that walks L3-L4 tasks through options → criteria matrix → save_decision before code (see examples/skills/design-explore/SKILL.md)
- <private>...</private> inline redaction in any saved content
- Cloud LLM/embed providers with per-phase routing (OpenAI / Anthropic / OpenRouter / Together / Groq / Cohere / any OpenAI-compat)
- activeContext.md Obsidian projection for human-readable session state
- Phase-scoped rules (self_rules_context(phase="build")): ~70% token reduction
Architecture, top to bottom:
- Your AI coding agent (Claude Code · Codex CLI · Cursor · any MCP client), talking over MCP (stdio or HTTP) to 60+ tools
- total-agent-memory server
  - Write / tool side: memory_save, memory_upd, kg_add_fact, learn_error, file_context, workflow_*, analogize, ingest_code
  - memory_recall, a 6-stage pipeline: BM25 (FTS5) + dense (FastEmbed) + fuzzy + graph expansion + CrossEncoder (†) + MMR diversity (†), combined by RRF fusion
- Storage: SQLite (+ FTS5, + KG tables) · FastEmbed (HNSW, binary-q) · Ollama (optional, qwen2.5-7b)
- Auto-reflection pipeline (LaunchAgent), triggered by file-watch + debounce: triple_extraction → deep_enrichment → reprs (async, 10 s debounce, drains in background)
- Dashboard (localhost:37737):
  - / : stats, savings, queue depths
  - /graph/live : 3D WebGL force-graph
  - /graph/hive : D3 hive plot
  - /graph/matrix : adjacency matrix

(†) CrossEncoder + MMR are on-demand via `rerank=true` / `diverse=true`.
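The MMR diversity stage can be sketched as greedy Maximal Marginal Relevance over embedding vectors. Assumptions: cosine similarity on raw vectors and a trade-off weight of 0.7; the server may instead apply MMR to CrossEncoder-scored candidates with different parameters.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def _mmr_score(doc_id: str, query: Vector, docs: Dict[str, Vector],
               selected: List[str], lam: float) -> float:
    relevance = cosine(query, docs[doc_id])
    # Penalise similarity to the closest already-selected document.
    redundancy = max((cosine(docs[doc_id], docs[s]) for s in selected), default=0.0)
    return lam * relevance - (1 - lam) * redundancy


def mmr(query: Vector, docs: Dict[str, Vector], k: int = 5, lam: float = 0.7) -> List[str]:
    """Greedy Maximal Marginal Relevance: trade relevance against redundancy."""
    candidates = list(docs)
    selected: List[str] = []
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda d: _mmr_score(d, query, docs, selected, lam))
        selected.append(best)
        candidates.remove(best)
    return selected


docs = {"d1": [1.0, 0.0], "d2": [1.0, 0.01], "d3": [0.0, 1.0]}
print(mmr([1.0, 1.0], docs, k=2, lam=0.5))
```

Here d1 and d2 are near-duplicates, so after picking d2 the algorithm prefers d3 over the equally relevant but redundant d1.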
| Channel | Command | What it does |
|---|---|---|
| npx (Node) | npx -y total-agent-memory connect claude-code | Zero-install. Bootstraps a Python venv in ~/.tam/.venv via uv (or python3 fallback), pulls the PyPI server, wires the MCP entry into your IDE. Replace claude-code with codex / cursor / cline / continue / aider / windsurf / gemini-cli / opencode. |
| uvx (Python via uv) | uvx total-agent-memory | One-off run with no install. Best for trying without commitment. |
| pipx (Python isolated) | pipx install total-agent-memory | Installs the total-agent-memory, tam, tam-lookup, lookup-memory binaries on PATH in an isolated venv. |
| brew (macOS / Linuxbrew) | brew install vbcherepanov/tap/total-memory | Bottle-style install with tam and legacy claude-total-memory symlinks. |
| Docker (multi-arch) | docker run -p 37737:37737 -v ~/.tam:/data ghcr.io/vbcherepanov/total-agent-memory:12.2.0 | Containerized (linux/amd64 + linux/arm64). Dashboard on :37737. |
| Manual clone | git clone https://github.com/vbcherepanov/total-agent-memory ~/total-agent-memory && cd ~/total-agent-memory && ./install.sh --ide claude-code | Full control. Lets you hack on the server, run benchmarks, and pick which background services to enable. Detailed walkthrough below. |
All six channels land at the same MCP server. The npx and ./install.sh paths
additionally configure IDE-specific MCP entries and hooks. Other channels start
the server bare β you wire the IDE afterwards (see docs/installation.md).
Upgrade from v11.x? Whatever channel you pick will auto-migrate
~/.claude-memory/ → ~/.tam/ on first run and keep a symlink for backward
compat. No manual data move required.
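A minimal sketch of the move-then-symlink migration, using pathlib. Assumptions: migration only runs when ~/.tam does not exist yet, and a conflicting pair of directories is left untouched; `migrate_legacy_dir` is a hypothetical name, and the real installer's conflict handling may differ. Symlinks on native Windows may need extra privileges.

```python
import shutil
import tempfile
from pathlib import Path


def migrate_legacy_dir(home: Path) -> Path:
    """Move ~/.claude-memory to ~/.tam once and leave a symlink behind."""
    legacy = home / ".claude-memory"
    target = home / ".tam"
    if legacy.is_symlink():
        return target  # already migrated on an earlier run
    if legacy.is_dir() and not target.exists():
        shutil.move(str(legacy), str(target))
        # Old paths such as ~/.claude-memory/memory.db keep resolving.
        legacy.symlink_to(target, target_is_directory=True)
    # If both directories already exist, this sketch leaves them untouched.
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        (home / ".claude-memory").mkdir()
        (home / ".claude-memory" / "memory.db").write_text("demo")
        print(migrate_legacy_dir(home), (home / ".claude-memory").is_symlink())
```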
Two manual paths. Same 60+ tools, same dashboard, different deployment shapes.
The same MCP server, same tools, same protocol β different installation
locations and hook wiring per IDE. The installer (install.sh --ide <name>)
automates all of it.
| IDE | Skill API | Hook API | Sub-agents | Install command |
|---|---|---|---|---|
| Claude Code | β | β full | β | ./install.sh --ide claude-code |
| Codex CLI | β | β | β | ./install.sh --ide codex |
| Cursor | rules-pane | β | composer | ./install.sh --ide cursor |
| Cline (VS Code) | .clinerules/ | β | β | ./install.sh --ide cline |
| Continue | rules file | β | β | ./install.sh --ide continue |
| Aider | .aider.conf.yml read | β ¹ | β | ./install.sh --ide aider |
| Windsurf | .windsurfrules | β | cascade | ./install.sh --ide windsurf |
| Gemini CLI | .gemini/rules/ | β | | ./install.sh --ide gemini-cli |
| OpenCode | .opencode/skills/ | β | custom | ./install.sh --ide opencode |
¹ Aider has no MCP yet; the bridge is via lookup_memory.sh /
save_memory.sh shell scripts.
Full per-IDE setup, manual fallbacks, and template snippets:
skills/memory-protocol/references/ide-setup.md.
| OS | Command | Background services |
|---|---|---|
| macOS 10.15+ | ./install.sh --ide claude-code | LaunchAgents (launchctl) |
| Linux (Ubuntu 22.04+, Debian 12+, Fedora 38+) | ./install.sh --ide claude-code | systemd --user |
| WSL2 (Windows 11 + Ubuntu/Debian) | ./install.sh --ide claude-code | systemd --user; requires /etc/wsl.conf with [boot] systemd=true, otherwise falls back to shell-loop autostart |
| Windows 10/11 native | .\install.ps1 -Ide claude-code | Task Scheduler |
Full per-platform walkthrough, WSL2 Windows-host-vs-WSL IDE nuances, the
wsl -e MCP-command pattern, IDE coverage matrix, and uninstall/diagnostic
flows: docs/installation.md.
git clone https://github.com/vbcherepanov/total-agent-memory.git ~/total-agent-memory
cd ~/total-agent-memory
bash install.sh --ide claude-code   # or: cursor | gemini-cli | opencode | codex
The installer:
- Clones + creates ~/total-agent-memory/.venv/
- Installs deps from requirements.txt and requirements-dev.txt
- Pre-downloads the FastEmbed multilingual MiniLM model
- Registers the MCP server via claude mcp add-json memory ... (stored in ~/.claude.json, the canonical store Claude Code actually reads)
- Copies all hooks (session-*, user-prompt-submit.sh, post-tool-use.sh, pre-edit.sh, on-bash-error.sh, etc.) into ~/.claude/hooks/ and registers them in ~/.claude/settings.json
- Grants permissions.allow for 20+ mcp__memory__* tools so hook-driven calls don't prompt for confirmation
- Installs background services for the current OS:
  - macOS: 4 LaunchAgents (reflection, orphan-backfill, check-updates, dashboard) under ~/Library/LaunchAgents/
  - Linux / WSL2: 7 systemd --user units (*.service, *.timer, *.path) under ~/.config/systemd/user/; gracefully degrades if systemd --user is unavailable (WSL without /etc/wsl.conf)
- Applies all migrations to a fresh memory.db
- Starts the dashboard at http://127.0.0.1:37737
Restart Claude Code → /mcp → memory should show Connected with 60+ tools.
git clone https://github.com/vbcherepanov/total-agent-memory.git $HOME\total-agent-memory
cd $HOME\total-agent-memory
powershell -ExecutionPolicy Bypass -File install.ps1 -Ide claude-code
Same 9 steps as Unix, but:
- MCP config path is %USERPROFILE%\.claude\settings.json (or .cursor\mcp.json, etc.)
- Hooks copied to %USERPROFILE%\.claude\hooks\ as .ps1 versions (auto-capture, memory-trigger, user-prompt-submit, post-tool-use, pre-edit, on-bash-error, session-start/end, on-stop, codex-notify)
- Background services via Task Scheduler:
  - total-agent-memory-reflection: every 5 min (no native FileSystemWatcher equivalent)
  - total-agent-memory-orphan-backfill: daily 00:00 + 6h repetition
  - total-agent-memory-check-updates: weekly Mon 09:00
  - TotalAgentMemoryDashboard: AtLogon
All installers preserve ~/.tam/memory.db (legacy installs: ~/.claude-memory/memory.db) and your config files; only services + hook registrations are removed.
./install.sh --uninstall     # macOS/Linux/WSL2: removes LaunchAgents OR systemd units
.\install.ps1 -Uninstall     # Windows: unregisters Scheduled Tasks + cleans settings.json
One-shot health check: prints a pass/fail mark for each subsystem (OS detect, venv, MCP import, services, dashboard HTTP, Ollama, DB migrations):
bash scripts/diagnose.sh     # macOS / Linux / WSL2
.\scripts\diagnose.ps1       # Windows
Exit code 0 = all green, 1 = something broken.
git clone https://github.com/vbcherepanov/total-agent-memory.git
cd total-agent-memory
bash install-docker.sh --with-compose
Brings up 5 services:
| Service | Role | Exposed |
|---|---|---|
| mcp | MCP server (HTTP transport) | 127.0.0.1:3737/mcp |
| dashboard | Web UI | 127.0.0.1:37737 |
| ollama | Local LLM runtime | 127.0.0.1:11434 |
| reflection | File-watch queue drainer | internal |
| scheduler | Ofelia cron (backfill + update check) | internal |
First run pulls qwen2.5-coder:7b (~4.7 GB) + nomic-embed-text (~275 MB); expect a 5-10 min cold start.
GPU note: Docker Desktop on macOS doesn't forward Metal. Native install is faster on Mac. On Linux with NVIDIA Container Toolkit, uncomment the deploy.resources.reservations.devices block in docker-compose.yml.
memory_save(content="install works", type="fact")
memory_stats()
Open http://127.0.0.1:37737/ for the dashboard, knowledge graph, and token savings.
v11 default is MEMORY_MODE=fast: no LLM, no Ollama, no network in the save/search/recall hot path. To restore v10.5 synchronous-LLM behaviour, run export MEMORY_MODE=deep. Mode switching: LAUNCH.md § Tuning.
Once installed, in any Claude Code / Codex CLI / Cursor session:
1. Resume where you left off (auto on session start, but you can also invoke)
session_init(project="my-api")
→ {summary: "yesterday: migrated auth middleware to JWT",
next_steps: ["update OpenAPI spec", "notify frontend team"],
pitfalls: ["don't revert migration 0042: dev DB already migrated"]}
2. Save a decision (agent does this automatically after hooks are registered)
memory_save(
type="decision",
content="Chose pgvector over ChromaDB for multi-tenant RLS",
context="WHY: single Postgres instance, per-tenant row-level security",
project="my-api",
tags=["database", "multi-tenant"],
)
3. Recall across sessions / projects
memory_recall(query="vector database choice", project="my-api", limit=5)
→ RRF-fused results from 6 retrieval tiers
4. Predict approach before starting a task
workflow_predict(task_description="migrate auth middleware to JWT-only")
→ {confidence: 0.82, predicted_steps: [...], similar_past: [...]}
5. Check a file's risk before editing (auto via hook, also manual)
file_context(path="/Users/me/my-api/src/auth/middleware.go")
→ {risk_score: 0.71, warnings: ["last 3 edits caused test failures in ..."], hot_spots: [...]}
6. Get full stats
memory_stats()
→ {sessions: 515, knowledge: {active: 1859, ...}, storage_mb: 119.5, ...}
New in v9. Bash-friendly memory search for sub-agent workflows where launching the full MCP server would be overkill (e.g. Bash(lookup-memory "fix slow Wave query") from inside a Claude Code agent prompt).
Two equivalent commands ship with the package (registered as [project.scripts] entries β installed automatically by ./install.sh or ./update.sh):
lookup-memory "Caroline researched" # human-readable bullets
tam-lookup "Caroline researched" # short canonical alias
ctm-lookup "Caroline researched" # legacy alias (v11.x and earlier)
lookup-memory --project myproj --limit 5 "auth flow"
lookup-memory --type solution --tag reusable "fix bug"
lookup-memory --json "claude code hooks"              # structured stdout for piping
How it works: the CLI opens the same $TAM_MEMORY_DIR/memory.db (legacy: $CLAUDE_MEMORY_DIR/memory.db) the running MCP server uses, ranks matches with BM25 via FTS5, and falls back to LIKE on older DBs. Zero deps beyond the package. No Ollama, no rag_chat.py, no ChromaDB required for the CLI path. Works on macOS, Linux, Windows.
$ lookup-memory --project locomo_0 --limit 2 "adoption"
1. [synthesized_fact|locomo_0] Caroline is researching adoption agencies.
2. [synthesized_fact|locomo_0] Melanie congratulates Caroline on her adoption.
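The FTS5-then-LIKE lookup can be sketched with the standard sqlite3 module. The `knowledge` table shape and the `knowledge_fts` virtual table are assumptions for illustration, not the real schema; the fallback also covers SQLite builds without FTS5 and queries FTS5 cannot parse.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, str, str]


def build_demo_db(rows: Sequence[Tuple[str, str]], with_fts: bool = True) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE knowledge (id INTEGER PRIMARY KEY, project TEXT, content TEXT)")
    conn.executemany("INSERT INTO knowledge (project, content) VALUES (?, ?)", rows)
    if with_fts:
        try:
            conn.execute("CREATE VIRTUAL TABLE knowledge_fts USING fts5(content)")
            conn.execute("INSERT INTO knowledge_fts (rowid, content) SELECT id, content FROM knowledge")
        except sqlite3.OperationalError:
            pass  # SQLite built without FTS5: lookup() falls back to LIKE
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, query: str,
           project: Optional[str] = None, limit: int = 5) -> List[Row]:
    project_sql = " AND k.project = ?" if project else ""
    extra = (project,) if project else ()
    try:
        # bm25() returns lower values for better matches, so sort ascending.
        return conn.execute(
            "SELECT k.id, k.project, k.content FROM knowledge_fts "
            "JOIN knowledge AS k ON k.id = knowledge_fts.rowid "
            "WHERE knowledge_fts MATCH ?" + project_sql +
            " ORDER BY bm25(knowledge_fts) LIMIT ?",
            (query, *extra, limit),
        ).fetchall()
    except sqlite3.OperationalError:
        # No FTS table (older DB) or an unparseable query: plain substring match.
        return conn.execute(
            "SELECT k.id, k.project, k.content FROM knowledge AS k "
            "WHERE k.content LIKE ?" + project_sql + " ORDER BY k.id LIMIT ?",
            (f"%{query}%", *extra, limit),
        ).fetchall()


ROWS = [
    ("locomo_0", "Caroline is researching adoption agencies."),
    ("locomo_0", "Melanie congratulates Caroline on her adoption."),
    ("other", "Auth flow uses JWT refresh tokens."),
]
for row in lookup(build_demo_db(ROWS), "adoption", project="locomo_0", limit=2):
    print(row)
```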
Why three names? lookup-memory matches the legacy bash script that older docs and sub-agent prompts reference (~/claude-memory-server/ollama/lookup_memory.sh, legacy install path). tam-lookup is the new project-prefixed canonical form (v12+). ctm-lookup is the v11.x prefixed name, kept as a legacy alias. All three call into total_agent_memory.lookup:main (v11.x and earlier: claude_total_memory.lookup:main, still importable via deprecation shim).
Migration note: v7/v8 docs that pointed at ~/claude-memory-server/ollama/lookup_memory.sh should be updated. The bash version still works for users with a manual install, but ./install.sh / ./update.sh clients on v9+ now get lookup-memory (and tam-lookup) on PATH directly via the package's [project.scripts] entry.
Core retrieval (9): memory_save, memory_recall, memory_get, memory_update, memory_delete, memory_history, memory_extract_session, memory_relate, memory_search_by_tag
Knowledge graph (8): kg_add_fact, kg_invalidate_fact, kg_at, kg_timeline, memory_graph, memory_graph_index, memory_graph_stats, memory_concepts
Episodic / session (6): memory_episode_save, memory_episode_recall, session_init, session_end, memory_timeline, memory_history
Procedural / workflows (4): workflow_learn, workflow_predict, workflow_track, classify_task
Task phases (4, v8.0): task_create, phase_transition, task_phases_list, complete_task
Decisions (1, v8.0): save_decision
Intents (3, v8.0): save_intent, list_intents, search_intents
Self-improvement (6): self_rules, self_rules_context, self_insight, self_patterns, self_error_log, rule_set_phase (v8.0)
Pre-edit guard / error learning (3): file_context, learn_error, self_error_log
Analogy / cross-project (2): analogize, ingest_codebase
Reflection / consolidation (4): memory_reflect_now, memory_consolidate, memory_forget, memory_observe
Stats / export (5): memory_stats, memory_export, memory_self_assess, memory_context_build, benchmark
Skills (3): memory_skill_get, memory_skill_update, file_context
Total: 60+ tools. Each is documented below with input schema and example.
When you only know the topic but not which records matter, use progressive disclosure:
- Index: memory_recall(query="auth refactor", mode="index", limit=20) → ~2 KB of {id, title, score, type, project, created_at} per hit. No content, no cognitive expansion.
- Timeline: memory_recall(query="auth refactor", mode="timeline", limit=5, neighbors=2) → top-K hits padded with ±neighbours from the same session, sorted chronologically.
- Fetch: memory_get(ids=[3622, 3606]) → full content for ONLY the IDs you chose (max 50 per call; detail="summary" truncates to 150 chars).
Typical saving: 80-90% fewer tokens vs memory_recall(detail="full", limit=20) when you end up using 2-3 of the 20 hits.
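The saving is simple arithmetic. You pay for one index response plus the full text of the few records you actually fetch, instead of full text for every hit. Below is a back-of-envelope sketch. The record and index sizes in the example are assumptions, so plug in your own measurements. The function name is hypothetical.

```python
def disclosure_saving(n_hits: int, n_fetched: int, full_bytes: int, index_bytes: int) -> float:
    """Fraction of payload saved by index + targeted fetch versus a full dump of every hit.

    full_bytes (average full record size) and index_bytes (size of the whole
    index response) are values you measure; neither is a library default.
    """
    full_dump = n_hits * full_bytes                      # memory_recall(detail="full", limit=n_hits)
    progressive = index_bytes + n_fetched * full_bytes   # mode="index", then memory_get(ids=...)
    return 1 - progressive / full_dump


# Assumed sizes: 2,000-byte records, a 2,000-byte index response, 3 of 20 hits fetched.
print(round(disclosure_saving(20, 3, 2000, 2000), 2))  # 0.8
```

The saving grows with record size and shrinks as you fetch more of the hits. Fetching every hit after an index call costs more than a single full recall.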
Core memory (15)
memory_recall · memory_get · memory_save · memory_update · memory_delete · memory_search_by_tag · memory_history · memory_timeline · memory_stats · memory_consolidate · memory_export · memory_forget · memory_relate · memory_extract_session · memory_observe
Knowledge graph (6)
memory_graph · memory_graph_index · memory_graph_stats · memory_concepts · memory_associate · memory_context_build
Episodic memory & skills (4)
memory_episode_save · memory_episode_recall · memory_skill_get · memory_skill_update
Reflection & self-improvement (8)
memory_reflect_now · memory_self_assess · self_error_log · self_insight · self_patterns · self_reflect · self_rules · self_rules_context
Temporal knowledge graph (4)
kg_add_fact · kg_invalidate_fact · kg_at · kg_timeline
Procedural memory (3)
workflow_learn · workflow_predict · workflow_track
Pre-flight guards & automation (7)
file_context (pre-edit risk scoring) · learn_error (auto-consolidating error capture) · session_init / session_end · ingest_codebase (AST, 9 languages) · analogize (cross-project analogy) · benchmark (regression gate)
Full JSON schemas: python -m total_agent_memory.cli tools --json or open the dashboard at localhost:37737/tools.
For Node.js / browser / any TS project that isn't an MCP-native agent:
npm i @vbch/total-agent-memory-client
import { connectStdio } from "@vbch/total-agent-memory-client";
const memory = await connectStdio();
await memory.save({
type: "decision",
content: "Picked pgvector over ChromaDB for multi-tenant RLS",
project: "my-api",
});
const hits = await memory.recallFlat({
query: "vector database choice",
project: "my-api",
limit: 5,
});
Also ships a LangChain adapter example, a procedural-memory integration, and an HTTP transport (for team / serverless setups).
Package repo: github.com/vbcherepanov/total-agent-memory-client
- / (root): live stats, queue depths, token savings from filters, representation coverage
- /graph/live: 3D WebGL force-graph (Three.js), 3,500+ nodes / 120,000+ edges, click-to-focus, type filters, search
- /graph/hive: D3 hive plot, nodes on radial axes by type
- /graph/matrix: canvas adjacency matrix sorted by type
- /knowledge: paginated knowledge browser, tag filters
- /sessions: last 50 sessions with summaries + next steps
- /errors: consolidated error patterns
- /rules: active behavioral rules + fire counts
- SSE pill in header: live reconnect indicator
Screenshots: docs/screenshots/ (coming)
cd ~/total-agent-memory # legacy clones: ~/claude-memory-server
./update.sh
update.sh runs 7 stages:
- Pre-flight: disk check + DB snapshot (keeps last 7)
- Source pull (git) or SHA-256-verified tarball
- Deps: pip install -r requirements.txt -r requirements-dev.txt (only if hash changed)
- Full pytest suite: aborts with snapshot if red
- Schema migrations: python src/tools/version_status.py
- LaunchAgent reload: reflection + backfill + update-check
- MCP reconnect notification: in-app /mcp → memory → Reconnect
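The Deps stage reinstalls only when the requirements hash changes. Below is one way to implement such a check in Python. It hashes both requirements files and compares the result against a digest saved after the last successful install. The stamp file and all helper names are hypothetical, and update.sh is a shell script that may do this differently.

```python
import hashlib
from pathlib import Path

def requirements_digest(*paths: Path) -> str:
    """SHA-256 over the given requirements files, in order (names included)."""
    h = hashlib.sha256()
    for p in paths:
        h.update(p.name.encode())  # so moving a line between files changes the digest
        h.update(b"\0")
        h.update(p.read_bytes())
        h.update(b"\0")
    return h.hexdigest()

def deps_need_install(stamp: Path, *paths: Path) -> bool:
    """True when no stamp exists yet or the stored digest no longer matches."""
    return not stamp.exists() or stamp.read_text().strip() != requirements_digest(*paths)

def record_install(stamp: Path, *paths: Path) -> None:
    """Call only after pip install succeeds, so a failed install is retried next run."""
    stamp.write_text(requirements_digest(*paths) + "\n")
```

Writing the stamp only after a successful install means an interrupted update never records a digest for dependencies that were not actually installed.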
Manual equivalent:
cd ~/total-agent-memory # legacy clones: ~/claude-memory-server
git pull
.venv/bin/pip install -r requirements.txt -r requirements-dev.txt
.venv/bin/python src/tools/version_status.py
.venv/bin/python -m pytest tests/
# in Claude Code: /mcp → memory → Reconnect
v9 is backward compatible. Existing v8 calls and DB schema work unchanged: v9 is an infra release that adds pluggable backends, a public CLI for sub-agents, and LoCoMo benchmark wiring. Nothing is forcibly enabled.
cd ~/total-agent-memory && ./update.sh # legacy clones: ~/claude-memory-server
# pulls v9 src, installs new entry-points (tam, tam-lookup, lookup-memory; legacy: ctm-lookup),
# keeps existing memory.db untouched.
After upgrade, verify the new CLI is on PATH:
lookup-memory --limit 1 "any-query-from-your-history"
- lookup-memory / tam-lookup / ctm-lookup (legacy) CLI now installed alongside the total-agent-memory MCP server (registered as [project.scripts], so ./install.sh and ./update.sh put them on PATH automatically). Sub-agent prompts that reference the legacy ~/claude-memory-server/ollama/lookup_memory.sh script keep working; new prompts should prefer the package-installed name.
- Embedding backends stay on fastembed by default. Switch via V9_EMBED_BACKEND=openai-3-large (set MEMORY_EMBED_API_KEY): costs ~$0.10/5k rows for re-embed, expected R@5 lift on conversational data.
- Reranker backend stays on ce-marco by default. V9_RERANKER_BACKEND=bge-v2-m3 (or off) switches at runtime.
- Subject-aware retrieval is opt-in via --subject-aware in benchmarks/locomo_bench_llm.py. Future: surface as an MCP tool flag.
- No migrations. Schema unchanged from v8.
- Re-embed (only if switching embedding model, otherwise skip):
python -m scripts.reembed --backend openai-3-large --confirm
- Old bash sub-agent prompts that hardcode ~/claude-memory-server/ollama/lookup_memory.sh "query" will keep working. To ride the new package install, replace with lookup-memory "query".
None. All v8 MCP tools, env vars, hooks, and DB tables behave identically.
v8.0 is backward compatible: your existing v7 installation keeps working unchanged. All new features are opt-in via MCP tool calls or env vars.
cd ~/total-agent-memory && ./update.sh # legacy clones: ~/claude-memory-server
# Applies migrations 011-013 idempotently, restarts LaunchAgents, updates dependencies
Then restart Claude Code: /mcp restart memory.
- Migrations 011-013 apply on MCP startup (privacy_counters, task_phases, intents). Zero-downtime, idempotent.
- Existing memory_save calls keep working: they now additionally strip <private>...</private> sections if present.
- Existing memory_recall calls keep working: default mode is still "search". New mode="index" is opt-in.
- Existing session_end calls keep working: auto_compress=False by default. Pass auto_compress=True to opt in.
- Existing self_rules_context calls keep working: default returns all rules (no phase filter).
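A single non-greedy regex is enough to strip <private>...</private> sections before a save. Below is a minimal sketch of the idea, not the server's code. The name strip_private is hypothetical, and this version does not handle nested tags.

```python
import re

# DOTALL lets a private section span several lines; the non-greedy .*? stops at
# the first closing tag, so two separate sections don't swallow the text between them.
_PRIVATE = re.compile(r"<private>.*?</private>", re.DOTALL)

def strip_private(text: str) -> str:
    """Remove every <private>...</private> section, keeping the surrounding text."""
    return _PRIVATE.sub("", text)


print(strip_private("deploy key <private>sk-123</private> rotated"))
```

A greedy .* would delete everything from the first opening tag to the last closing tag, including any public text in between.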
1. Cloud providers (only if you want to replace/augment Ollama):
export MEMORY_LLM_PROVIDER=openai # or "anthropic"
export MEMORY_LLM_API_KEY=sk-...
export MEMORY_LLM_MODEL=gpt-4o-mini # or "claude-haiku-4-5"
See Cloud providers for OpenRouter / per-phase routing / Cohere examples.
2. Install additional hooks (for UserPromptSubmit capture + citation):
./install.sh --ide claude-code # re-run installer; it now registers user-prompt-submit.sh hook
The hook is additive: existing hooks keep working.
3. activeContext.md Obsidian integration (if you want markdown projection):
export MEMORY_ACTIVECONTEXT_VAULT=~/Documents/project/Projects # default
# Disable: export MEMORY_ACTIVECONTEXT_DISABLE=1
Each session_end writes <vault>/<project>/activeContext.md.
None. All v7 MCP tool signatures are preserved. New parameters are optional with safe defaults.
If you switch to a cloud embedding provider (MEMORY_EMBED_PROVIDER=openai/cohere), the server will refuse to start if existing DB embeddings have a different dimension than the new provider returns. This is deliberate: it prevents silent data corruption.
Either:
- Keep MEMORY_EMBED_PROVIDER=fastembed (default 384d) and only change the LLM provider, OR
- Re-embed the DB:
python src/tools/reembed.py --provider openai --model text-embedding-3-small
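The startup refusal comes down to comparing the dimension already stored in the DB with the dimension the new provider returns. Below is a sketch of such a guard. The function and exception names are hypothetical. The 384d and 1536d figures come from this README (fastembed default, text-embedding-3-small).

```python
from typing import Optional

class EmbeddingDimensionMismatch(RuntimeError):
    """Raised instead of silently mixing vectors of different sizes (hypothetical name)."""

def check_embedding_dimension(stored_dim: Optional[int], provider_dim: int) -> None:
    """Refuse to start when the DB already holds vectors of another size.

    stored_dim is None for an empty DB, in which case any provider is accepted.
    """
    if stored_dim is not None and stored_dim != provider_dim:
        raise EmbeddingDimensionMismatch(
            f"DB embeddings are {stored_dim}d but the provider returns {provider_dim}d; "
            "keep the previous provider or re-embed the DB first"
        )
```

Failing fast at startup is safer than failing at query time. A similarity search between 384d and 1536d vectors either errors out or, with padding, returns meaningless scores.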
Quick reference (see full docs in MCP tools reference):
| Tool | Purpose |
|---|---|
| classify_task(description) | Returns {level 1-4, suggested_phases, estimated_tokens} |
| task_create(task_id, description) | Starts state machine in "van" phase |
| phase_transition(task_id, new_phase, artifacts?) | Moves task through van/plan/creative/build/reflect/archive |
| task_phases_list(task_id) | Chronological phase history |
| save_decision(title, options, criteria_matrix, selected, rationale, ...) | Structured decision with per-criterion indexing |
| memory_get(ids, detail) | Batched full-content fetch for IDs from memory_recall(mode="index") |
| save_intent / list_intents / search_intents | UserPromptSubmit-captured prompts |
| rule_set_phase(rule_id, phase) | Tag a rule for phase-scoped loading |
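phase_transition moves a task through van → plan → creative → build → reflect → archive. Below is a minimal sketch of such a state machine. It assumes strictly forward moves with skipping allowed. The actual tool's transition rules are not specified here (for example, whether low-level tasks skip creative or whether moving backwards is permitted), so treat that rule and the Task class as hypothetical.

```python
from dataclasses import dataclass, field

PHASES = ("van", "plan", "creative", "build", "reflect", "archive")

@dataclass
class Task:
    task_id: str
    description: str
    phase: str = "van"  # task_create starts in "van"
    history: list = field(default_factory=lambda: ["van"])

    def transition(self, new_phase: str) -> None:
        """Assumed rule: only forward moves; skipping phases is allowed, going back is not."""
        if new_phase not in PHASES:
            raise ValueError(f"unknown phase {new_phase!r}")
        if PHASES.index(new_phase) <= PHASES.index(self.phase):
            raise ValueError(f"cannot move from {self.phase!r} to {new_phase!r}")
        self.phase = new_phase
        self.history.append(new_phase)  # chronological, like task_phases_list


task = Task("T-1", "add OAuth login")
task.transition("plan")
task.transition("build")  # a simple task may skip "creative"
print(task.history)
```

Encoding phases as an ordered tuple keeps the forward-only check to a single index comparison.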
Extended tools:
- memory_recall(mode="index"|"timeline", decisions_only=False, ...): 3-layer token-efficient workflow
- session_end(auto_compress=True, transcript=None, ...): LLM-generated summary
- self_rules_context(phase="build"|"plan"|...): phase filter
- save_knowledge(...): now strips <private>...</private> sections automatically
v8.0 doesn't remove any v7 functionality. If you hit an issue, you can:
- Set env vars to revert behaviour:
export MEMORY_LLM_PROVIDER=ollama # revert to local LLM
export MEMORY_EMBED_PROVIDER=fastembed # revert to local embeddings
export MEMORY_ACTIVECONTEXT_DISABLE=1 # disable markdown projection
export MEMORY_POST_TOOL_CAPTURE=0 # disable opt-in capture (default anyway)
- Migrations 011/012/013 are additive (no DROP/ALTER on existing tables), so a DB downgrade is not destructive: old code continues reading older tables.
- Worst case: git checkout v7.0.0 && ./update.sh --skip-migrations.
Without Ollama: works fully β raw content is saved, retrieval via BM25 + FastEmbed dense embeddings.
With Ollama: you also get LLM-generated summaries, keywords, question-forms, compressed representations, and deep enrichment (entities, intent, topics).
brew install ollama # or: curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull qwen2.5-coder:7b # default β best quality/speed on M-series
ollama pull nomic-embed-text # optional, alternative embedder
Use OpenAI, Anthropic, or any OpenAI-compat endpoint (OpenRouter, Together, Groq, DeepSeek, LM Studio, llama.cpp) instead of local Ollama.
OpenAI:
export MEMORY_LLM_PROVIDER=openai
export MEMORY_LLM_API_KEY=sk-...
export MEMORY_LLM_MODEL=gpt-4o-mini
Anthropic:
export MEMORY_LLM_PROVIDER=anthropic
export MEMORY_LLM_API_KEY=sk-ant-...
export MEMORY_LLM_MODEL=claude-haiku-4-5
OpenRouter (100+ models via one endpoint):
export MEMORY_LLM_PROVIDER=openai
export MEMORY_LLM_API_BASE=https://openrouter.ai/api/v1
export MEMORY_LLM_API_KEY=sk-or-...
export MEMORY_LLM_MODEL=anthropic/claude-haiku-4.5
Per-phase routing (cheap model for bulk, quality for compression):
export MEMORY_TRIPLE_PROVIDER=openai
export MEMORY_TRIPLE_MODEL=gpt-4o-mini
export MEMORY_ENRICH_PROVIDER=anthropic
export MEMORY_ENRICH_MODEL=claude-haiku-4-5
Embeddings (dimension must match existing DB or re-embed required):
export MEMORY_EMBED_PROVIDER=openai
export MEMORY_EMBED_MODEL=text-embedding-3-small # 1536d
# or Cohere:
export MEMORY_EMBED_PROVIDER=cohere
export MEMORY_EMBED_API_KEY=...

| Model | Size | Use case |
|---|---|---|
| qwen2.5-coder:7b | 4.7 GB | default: best quality/speed ratio |
| qwen2.5-coder:32b | 19 GB | highest quality, needs 32 GB+ RAM |
| llama3.1:8b | 4.9 GB | general-purpose alternative |
| phi3:mini | 2.3 GB | low-RAM machines |
Environment variables (all optional):
| Variable | Default | Purpose |
|---|---|---|
| MEMORY_MODE | fast | ultrafast / fast / balanced / deep. Selects hot-path profile. See Modes. |
| MEMORY_USE_LLM_IN_HOT_PATH | false | Master switch for sync LLM stages in save_knowledge / Recall.search. MEMORY_MODE=deep flips this to true. |
| MEMORY_ALLOW_OLLAMA_IN_HOT_PATH | false | Re-enables the silent FastEmbed → Ollama fallback ladder when FastEmbed is unavailable. |
| MEMORY_RERANK_ENABLED | false | Honour caller's rerank=true. When false, CrossEncoder rerank is hard-disabled even if a tool call requests it. |
| MEMORY_ENRICHMENT_ENABLED | false | Run the async enrichment worker. Default-ON in balanced / deep. |
| MEMORY_TEXT_EMBED_MODEL | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | Model for embedding_space=text. |
| MEMORY_CODE_EMBED_MODEL | empty (falls back to TEXT model) | Model for embedding_space=code. The row still records space=code, so a future swap is config-only. |
| MEMORY_LOG_EMBED_MODEL | empty (→ TEXT) | Model for embedding_space=log. |
| MEMORY_CONFIG_EMBED_MODEL | empty (→ TEXT) | Model for embedding_space=config. |
| MEMORY_DEFAULT_EMBEDDING_SPACE | text | Space for unclassified content. |
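The table spells out only a few mode interactions. MEMORY_MODE=deep turns on MEMORY_USE_LLM_IN_HOT_PATH, and the enrichment worker defaults on in balanced and deep. Below is a sketch of how that env resolution could look. It assumes that explicitly set variables override the mode default and omits every other profile detail. resolve_profile and the returned keys are hypothetical.

```python
import os
from typing import Mapping

MODES = ("ultrafast", "fast", "balanced", "deep")

def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean env var; unset or empty means 'use the default'."""
    raw = env.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in ("1", "true", "yes", "on")

def resolve_profile(env: Mapping[str, str] = os.environ) -> dict:
    mode = env.get("MEMORY_MODE", "fast").strip().lower()
    if mode not in MODES:
        raise ValueError(f"MEMORY_MODE must be one of {MODES}, got {mode!r}")
    return {
        "mode": mode,
        # deep flips the hot-path LLM switch on; an explicit value wins (assumption)
        "llm_in_hot_path": _flag(env, "MEMORY_USE_LLM_IN_HOT_PATH", mode == "deep"),
        # enrichment worker is default-ON in balanced / deep
        "enrichment": _flag(env, "MEMORY_ENRICHMENT_ENABLED", mode in ("balanced", "deep")),
        "rerank": _flag(env, "MEMORY_RERANK_ENABLED", False),
        "default_space": env.get("MEMORY_DEFAULT_EMBEDDING_SPACE", "text"),
    }
```

Passing the environment as a Mapping lets you test profiles with plain dicts instead of mutating os.environ.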
| Variable | Default | Purpose |
|---|---|---|
| MEMORY_DB | ~/.tam/memory.db (legacy installs: ~/.claude-memory/memory.db) | SQLite location |
| MEMORY_LLM_ENABLED | auto | auto / true / false / force: LLM enrichment toggle |
| MEMORY_LLM_MODEL | qwen2.5-coder:7b | Ollama model for enrichment |
| MEMORY_LLM_PROBE_TTL_SEC | 60 | Cache TTL for Ollama availability probe |
| MEMORY_LLM_TIMEOUT_SEC | 60 | Global fallback timeout for Ollama requests (s) |
| MEMORY_TRIPLE_TIMEOUT_SEC | 30 | Timeout for deep triple extraction (s) |
| MEMORY_ENRICH_TIMEOUT_SEC | 45 | Timeout for deep enrichment (s) |
| MEMORY_REPR_TIMEOUT_SEC | 60 | Timeout for representation generation (s) |
| MEMORY_TRIPLE_MAX_PREDICT | 2048 | num_predict cap for triple extraction |
| OLLAMA_URL | http://localhost:11434 | Ollama endpoint |
| MEMORY_EMBED_MODE | fastembed | fastembed / sentence-transformers / ollama |
| DASHBOARD_PORT | 37737 | HTTP dashboard port |
| MEMORY_MCP_PORT | 3737 | HTTP MCP transport port (Docker path) |
| MEMORY_ASYNC_ENRICHMENT | false | v10.1: move quality gate / contradiction / entity dedup / episodic / wiki to a background worker. See Performance tuning |
| MEMORY_ENRICH_TICK_SEC | 0.1 | Worker tick interval (clamp 0.01..5) |
| MEMORY_ENRICH_BATCH | 5 | Rows claimed per tick (clamp 1..50) |
| MEMORY_ENRICH_MAX_ATTEMPTS | 3 | Retries before flipping a row to failed |
| MEMORY_ENRICH_STALE_AFTER_SEC | 60 | Seconds before a processing row is reclaimed (worker crash recovery) |
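The worker knobs are clamped rather than rejected: MEMORY_ENRICH_TICK_SEC to 0.01..5 and MEMORY_ENRICH_BATCH to 1..50. Below is a sketch of that parsing. Falling back to the documented default on unparsable input is an assumption, and both helper names are hypothetical.

```python
import os
from typing import Callable, Mapping, TypeVar

Num = TypeVar("Num", int, float)

def clamped(env: Mapping[str, str], name: str, default: Num, lo: Num, hi: Num,
            cast: Callable[[str], Num]) -> Num:
    """Read `name` from env, cast it, and clamp it into [lo, hi]."""
    raw = env.get(name)
    try:
        value = default if raw is None else cast(raw)
    except ValueError:
        value = default  # assumption: bad input falls back to the documented default
    return max(lo, min(hi, value))

def worker_settings(env: Mapping[str, str] = os.environ) -> dict:
    # Ranges come from the table above; the helper itself is hypothetical.
    return {
        "tick_sec": clamped(env, "MEMORY_ENRICH_TICK_SEC", 0.1, 0.01, 5.0, float),
        "batch": clamped(env, "MEMORY_ENRICH_BATCH", 5, 1, 50, int),
    }
```

Clamping keeps a typo such as MEMORY_ENRICH_BATCH=5000 from starving the database, while still honouring the user's intent to go large.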
CPU-only / WSL hosts: if Ollama keeps timing out, lower MEMORY_TRIPLE_MAX_PREDICT before raising timeouts. install-codex.sh writes conservative defaults automatically. If saves take 30-40 s on WSL2, set MEMORY_ASYNC_ENRICHMENT=true (see below).
Full config: see total_agent_memory/config.py.
When MEMORY_MODE=fast (default):
| metric | p50 (ms) | p95 (ms) | p99 (ms) |
|---|---|---|---|
| save_fast | 6.2 | 8.9 | 11.4 |
| save_fast cached | 0.3 | 0.4 | 1.4 |
| search_fast | 3.4 | 4.7 | 6.0 |
| cached_search | 3.1 | 3.4 | 3.6 |
llm_calls=0, network_calls=0. Reproduce: ./bin/memory-bench. Regression gate: ./bin/memory-perf-gate. Architecture rationale and per-stage audit: docs/v11/audit.md. Raw bench artifact: docs/v11/benchmark.md.
If your numbers do not match the table, run ./bin/memory-bench --warmup first: cold FastEmbed import dominates the first call.
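p50 / p95 / p99 are order statistics over the recorded latencies. If you want to summarise your own timings the same way, here is a nearest-rank sketch with warm-up calls discarded. The README does not say whether memory-bench uses nearest-rank or interpolated percentiles, so expect small differences from its output. Both functions are hypothetical helpers.

```python
import math
import time
from typing import Callable, Sequence

def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def bench(fn: Callable[[], object], rounds: int = 200, warmup: int = 5) -> dict:
    """Time fn() in milliseconds after discarding cold-start calls."""
    for _ in range(warmup):  # cold imports / caches would otherwise skew p99
        fn()
    timings_ms = []
    for _ in range(rounds):
        t0 = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - t0) * 1000)
    return {f"p{p}": round(percentile(timings_ms, p), 1) for p in (50, 95, 99)}


print(bench(lambda: sum(range(10_000))))
```

time.perf_counter is the monotonic high-resolution clock intended for interval timing, which is why it is used instead of time.time.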
The synchronous v10 hot path runs five LLM-bound stages inline so a drop verdict can block the INSERT and a contradiction supersede commits in the same transaction. On macOS with a warm Ollama that's ~340 ms median; on a WSL2 box without GPU/CoreML each LLM round-trip can stretch the same call into 30-40 seconds.
v10.1 ships an opt-in inbox/outbox worker that moves the heavy stages out of band:
sync   : privacy → canonical_tags → INSERT → embed → enqueue → return
worker : quality_gate → entity_dedup_audit → contradiction → episodic → wiki
Enable it in your env:
export MEMORY_ASYNC_ENRICHMENT=true
# Optional knobs (defaults shown):
export MEMORY_ENRICH_TICK_SEC=0.1
export MEMORY_ENRICH_BATCH=5
export MEMORY_ENRICH_MAX_ATTEMPTS=3
export MEMORY_ENRICH_STALE_AFTER_SEC=60
Restart the MCP server. A background daemon thread now consumes enrichment_queue; you can watch it on the dashboard panel "v10.1 enrichment worker".
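The worker's claim loop over enrichment_queue can be pictured as follows. Each tick claims up to MEMORY_ENRICH_BATCH pending rows and runs the stages. After MEMORY_ENRICH_MAX_ATTEMPTS failures, the loop marks a row failed. This is a simplified single-worker sketch using the stdlib sqlite3 module. The column layout, the 'done' status, and the run_stages callback are assumptions, not the project's schema.

```python
import sqlite3
import time
from typing import Callable

# Hypothetical minimal layout for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS enrichment_queue (
    id INTEGER PRIMARY KEY,
    knowledge_id INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending | processing | done | failed
    attempts INTEGER NOT NULL DEFAULT 0,
    claimed_at REAL
)
"""

def tick(conn: sqlite3.Connection, run_stages: Callable[[int], object],
         batch: int = 5, max_attempts: int = 3) -> int:
    """Claim up to `batch` pending rows, run the stages, record the outcome; returns rows handled."""
    # Single-worker sketch: select pending rows, mark them processing, and commit
    # before running any stage. Several workers would need an atomic claim (e.g. BEGIN IMMEDIATE).
    with conn:
        rows = conn.execute(
            "SELECT id, knowledge_id, attempts FROM enrichment_queue "
            "WHERE status = 'pending' ORDER BY id LIMIT ?", (batch,)).fetchall()
        conn.executemany(
            "UPDATE enrichment_queue SET status = 'processing', claimed_at = ? WHERE id = ?",
            [(time.time(), row[0]) for row in rows])
    for row_id, knowledge_id, attempts in rows:
        try:
            run_stages(knowledge_id)  # quality_gate → entity_dedup_audit → contradiction → episodic → wiki
            status = "done"
        except Exception:
            attempts += 1
            status = "failed" if attempts >= max_attempts else "pending"  # retry later
        with conn:
            conn.execute(
                "UPDATE enrichment_queue SET status = ?, attempts = ? WHERE id = ?",
                (status, attempts, row_id))
    return len(rows)
```

Because the INSERT already committed in the sync path, a stage failure here only delays enrichment. It never loses the saved row.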
memory_save latency:
| mode | min | p50 | p95 | p99 | max | mean |
|---|---|---|---|---|---|---|
| sync (default) | 17.5 ms | 25.3 ms | 2150.5 ms | 2179.0 ms | 2186.1 ms | 348.0 ms |
| async (MEMORY_ASYNC_ENRICHMENT=true) | 18.1 ms | 22.3 ms | 26.7 ms | 27.4 ms | 27.5 ms | 22.7 ms |
memory_recall latency: p50 ≈ 3-5 ms in both modes (steady state), with cold-cache p95 outliers on the first warmup hit.
p95 collapses 80× with async (2150 ms → 27 ms). On WSL2 with a slow Ollama, the same shape holds: sync p95 of 30-40 s becomes async p95 of ~300-1000 ms (LLM moves out of the hot path entirely).
Reproduce: ./.venv/bin/python benchmarks/v10_5_latency.py --rounds 2 --with-llm.
Full report: benchmarks/v10_5_results.md.
When async is on, a quality_gate drop no longer prevents the INSERT (we already committed in the sync path). Instead the row is marked status='quality_dropped' after the worker scores it. memory_recall ignores that status (idx_knowledge_status_quality is added in migration 020). Audit history stays in quality_gate_log so nothing is lost.
If you need strict pre-INSERT gating (e.g. compliance), keep the default sync path.
Rows stuck in processing longer than MEMORY_ENRICH_STALE_AFTER_SEC (default 60 s) are flipped back to pending automatically; this covers worker processes killed mid-stage. The pre-existing write_intents outbox still covers a crash before INSERT.
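Crash recovery of this kind is a single UPDATE: any row that has been processing for longer than MEMORY_ENRICH_STALE_AFTER_SEC goes back to pending. Below is a sketch against an assumed minimal enrichment_queue layout (id, status, claimed_at as a Unix timestamp). The real table and the helper name may differ.

```python
import sqlite3
import time
from typing import Optional

def reclaim_stale(conn: sqlite3.Connection, stale_after_sec: float = 60.0,
                  now: Optional[float] = None) -> int:
    """Flip rows stuck in 'processing' back to 'pending'; returns how many were reclaimed."""
    cutoff = (time.time() if now is None else now) - stale_after_sec
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE enrichment_queue SET status = 'pending', claimed_at = NULL "
            "WHERE status = 'processing' AND claimed_at < ?", (cutoff,))
    return cur.rowcount
```

Running this at the start of every tick is cheap and makes recovery automatic. The `now` parameter exists only so the cutoff can be tested deterministically.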
- Default MEMORY_MODE=fast: zero LLM, zero Ollama, zero network in the save/search/recall hot path. Set MEMORY_MODE=deep to restore v10.5 behaviour.
- Memory Core / AI Layer split: src/memory_core/* is deterministic; src/ai_layer/* owns every LLM-bound code path. Enforced by tests/test_no_llm_hot_path.py.
- 4 modes: ultrafast / fast / balanced / deep. Single env flag.
- Multi-embedding-space contract: every vector row records provider / model / dimension / space / content_type / language. Spaces: text / code / log / config. Single Chroma backend; per-space model swap is config-only.
- Embed fallback ladder gated: silent Ollama fallback in Store.embed requires MEMORY_ALLOW_OLLAMA_IN_HOT_PATH=true.
- New MCP tools: memory_save_fast, memory_search_fast, memory_explain_search, memory_warmup, memory_perf_report, memory_rebuild_fts, memory_rebuild_embeddings, memory_eval_locomo, memory_eval_recall, memory_eval_temporal, memory_eval_entity_consistency, memory_eval_contradictions, memory_eval_long_context.
- Migrations 021 (embedding_spaces) + 022 (embedding_cache_v11): idempotent on next start.
- Benchmark suite: bin/memory-bench (artifact docs/v11/benchmark.md) + bin/memory-perf-gate for CI.
- Universal memory-protocol skill: single canonical SKILL.md + 4 references (tool cheatsheet for all 60+ MCP tools, workflow recipes for 15 common situations, hooks reference, per-IDE setup) + 4 templates (Claude Code settings.json, Codex config.toml, Cursor .mdc, Cline .md). Same content for every IDE; only the wiring differs.
- install.sh --ide extended to 9 IDEs: claude-code, codex, cursor, cline, continue, aider, windsurf, gemini-cli, opencode. New helpers: register_mcp_cline / continue / aider / windsurf + _json_merge_mcp_nested for the dotted-key case (cline.mcpServers).
- Cross-platform hardening: all bash scripts pass bash -n under macOS bash 3.2 (default). Replaced the ${var,,} lowercase bashism in update.sh with tr '[:upper:]' '[:lower:]'. Verified with shellcheck.
- Sub-agent memory protocol: universal header for any sub-agent (php-pro, golang-pro, vue-expert, etc.) with mandatory memory_recall before / memory_save after. Full template in skills/memory-protocol/references/subagent-protocol.md.
- v10.5 latency benchmark: benchmarks/v10_5_latency.py with an apples-to-apples sync vs async comparison. Demonstrates an 80× p95 reduction (2150 ms → 27 ms) when async is enabled with LLM stages on.
- Async enrichment worker: opt-in MEMORY_ASYNC_ENRICHMENT=true moves quality gate / entity dedup / contradiction detector / episodic linking / wiki refresh to a background thread. Drops max save latency 5.4× on macOS, 60-100× on WSL2. See Performance tuning.
- enrichment_queue table with stale-processing recovery (rows stuck >60 s in processing flip back to pending).
- Dashboard panel for worker health: depth, throughput/min, p50/p95 ms per task, oldest pending age, recent failures.
- _binary_search ValueError fix: np.argpartition requires kth STRICTLY < N; tiny test projects (pool ≤ 50) used to silently break contradiction_log.
- coref_resolver RU→EN translation fix: prompt explicitly pins output language (Do NOT translate).
- 10 Beever-Atlas-inspired features in one push: quality gate (Beever 6-Month Test), canonical tag vocabulary, importance boost in recall, opt-in coref resolution, contradiction auto-detection with supersede, write-intent outbox + reconciler, embedding-based entity dedup, episodic save events in the graph, smart query router (relational vs lexical), per-project Markdown wiki digest.
- 5 SQLite migrations (015-019) applied automatically on restart.
- 11 new env knobs, all with safe fail-open defaults.
- Tests: 971 → 1124 (+153).
- lookup-memory / tam-lookup / ctm-lookup (legacy) CLI: bash entry-point for sub-agents, registered as [project.scripts] and installed by ./install.sh / ./update.sh (replaces manual ~/claude-memory-server/ollama/lookup_memory.sh)
- Pluggable embedding backends: openai-3-small, openai-3-large (3072d), bge-m3, e5-large, locomo-tuned-minilm (fine-tuned on user data)
- Pluggable reranker backends: ce-marco, bge-v2-m3, bge-large, off (env V9_RERANKER_BACKEND, hot-swap)
- Subject-aware retrieval: LLM extracts (subject, action) from question → SQL graph lookup → DIRECT FACTS prepended to context (LoCoMo cat 1/2 lift)
- Judge-weighted ensemble: category-aware scoring rubric + abstain logic for LoCoMo-style adversarial gold
- Fine-tune embedding pipeline (scripts/finetune_embedding.py): mine triplets from your data, train on top of MiniLM via sentence-transformers
- Few-shot pair mining (scripts/mine_locomo_fewshot.py): augment per-category prompts with held-in (Q,A) pairs
- Schema-specific graph extractor (closed canonical predicate vocabulary, optional)
- SSL fix for macOS Python.org installs: urllib requests now use certifi by default
- HTTP retry with exponential backoff for embedding providers (5xx/timeout)
- LoCoMo benchmark integration (benchmarks/locomo_bench_llm.py with 14 ablation flags)
- Task workflow phases (L1-L4 classifier + 6-phase state machine)
- Structured save_decision with criteria matrix + multi-representation criterion indexing
- Cloud LLM/embed providers (OpenAI, Anthropic, Cohere, any OpenAI-compat)
- session_end(auto_compress=True) via LLM provider
- Progressive disclosure: memory_recall(mode="index") + memory_get(ids)
- activeContext.md Obsidian live-doc projection
- Phase-scoped rules via tag filter
- <private>...</private> inline redaction
- HTTP citation endpoints /api/knowledge/{id} + /api/session/{id}
- UserPromptSubmit + PostToolUse (opt-in) capture hooks
- Unified install.sh --ide {claude-code|cursor|gemini-cli|opencode|codex}
- Plugin marketplace publish (when Claude Code API opens)
- has_llm() per-phase provider caching
- GitHub Actions: install smoke tests + LongMemEval nightly
- "Endless mode": continuous session without hard boundaries (virtual sessions by idle >N hours)
- MLX local LLM integration (A1 plan from memory #3583)
- Speculative decoding for local path (+1.5-1.8× LLM speed)
total-agent-memory is, and will always be, free and MIT-licensed. No paid tier, no gated features, no "enterprise edition". The benchmarks on this page are the entire product.
If it's saving you hours of context-pasting every week and you want to help keep development going β or just say thanks β a donation means a lot.
| Donation | What it funds |
|---|---|
| $5 (a coffee) | One evening of focused OSS work |
| $25 (a pizza) | A new MCP tool end-to-end (design, code, tests, docs) |
| $100 (a weekend) | A major feature: e.g. the preference-tracking module that closes the 80% gap on LongMemEval |
| $500+ (a sprint) | A release cycle: new subsystem + migrations + docs + benchmark artifact |
- Star the repo: GitHub discovery runs on this
- Share benchmarks on X / HN / Reddit: reach matters more than donations
- Open issues with repro cases: bug reports are pure gold
- Write a blog post about how you use it
- Submit a PR: fixes, new tools, new integrations
- Translate the README: first docs in RU / DE / JA / ZH very welcome
- Tell your team: peer recommendations convert 10× better than marketing
- Building something that would benefit from a custom integration, on-prem deployment, or team-shared memory? Email vbcherepanov@gmail.com: open to contract work and partnerships.
- AI / dev-tools company whose roadmap overlaps? Same email, happy to talk.
MIT forever. No commercial-license switch, no VC money, no dark patterns. The memory layer belongs to the developers using it, not to a SaaS vendor.
Local-first is the product. If you want a cloud memory service, mem0 and Supermemory are great. If you want your data on your disk, untouched by anyone else, choose this.
Honest benchmarks. Every number on this page is reproducible from the artifacts in evals/ and the scripts in benchmarks/. If you can't reproduce a claim, open an issue: it's a bug.
- Open an issue before a large PR: it saves everyone time.
- pytest tests/ must stay green. Add tests for new tools.
- Update evals/scenarios/*.json if you change retrieval behavior.
- Docs-only / typo PRs welcome without discussion.
MIT β see LICENSE.
Built for coding agents. Runs on your machine. Free forever.
Compare to mem0 / Letta / Zep / Supermemory ·
Benchmark artifact Β·
TypeScript SDK Β·
Donate