rememory

Local Supermemory MCP Server for AI Agents

A self-hosted Model Context Protocol server that gives AI agents persistent, structured memory. Rememory combines a knowledge graph with SuperRAG (hybrid vector + BM25 + graph search) so agents can store, connect, and recall information across conversations — all running locally in Docker with your own embedding and LLM endpoints.

Key Features

Knowledge graph memory — Memories are linked with typed relationships (updates, extends, derives) so agents can trace how knowledge evolves over time
Automatic memory extraction — Documents are chunked, embedded, and analyzed by an LLM to extract atomic facts, preferences, and episodes
Hybrid search (SuperRAG) — Combines vector similarity, BM25 full-text, and graph traversal via Reciprocal Rank Fusion for high-quality recall
Content-type-aware chunking — Smart splitting for markdown, code, HTML, and plain text
Automatic forgetting — Exponential decay for episodes, confidence thresholds, and noise filtering keep memory clean
Configurable endpoints — Bring your own embedding and LLM endpoints (Ollama, vLLM, OpenAI, or any compatible API)
Persistent local storage — SQLite (graph + FTS5) and LanceDB (vectors) with Docker volume persistence
Privacy-first — All data stays on your machine, no external services required

Architecture

┌─────────────────────────────────────────────────────────┐
│                      AI Agent                           │
└──────────────────────┬──────────────────────────────────┘
                       │ MCP (stdio / HTTP)
┌──────────────────────▼──────────────────────────────────┐
│                   rememory server                       │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │  Pipeline    │  │    Search    │  │   Forgetting  │  │
│  │  ─────────── │  │  ──────────  │  │   ──────────  │  │
│  │  chunker     │  │  vector      │  │   decay       │  │
│  │  extractor   │  │  BM25        │  │   expiration  │  │
│  │  embedder    │  │  graph       │  │   cleanup     │  │
│  │  relationship│  │  RRF fusion  │  │               │  │
│  └──────┬───────┘  └──────┬───────┘  └───────┬───────┘  │
│         │                 │                  │          │
│  ┌──────▼─────────────────▼──────────────────▼───────┐  │
│  │                  Storage Layer                    │  │
│  │  ┌──────────────────┐  ┌────────────────────────┐ │  │
│  │  │  SQLite + FTS5   │  │  LanceDB (vectors)     │ │  │
│  │  │  (graph, BM25)   │  │                        │ │  │
│  │  └──────────────────┘  └────────────────────────┘ │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
         ▲                               ▲
         │ HTTP                          │ HTTP
   ┌─────┴───────┐                ┌──────┴──────┐
   │  Embedding  │                │     LLM     │
   │  Endpoint   │                │   Endpoint  │
   │  (Ollama)   │                │  (Ollama)   │
   └─────────────┘                └─────────────┘

Quick Start

Prerequisites

Docker (with Compose)
An embedding endpoint and an OpenAI-compatible LLM endpoint (Ollama is the easiest way to get started and can serve both)

1. Pull an embedding model and an LLM via Ollama

ollama pull nomic-embed-text
ollama pull llama3.2

2. Clone and configure

git clone https://github.com/your-org/rememory.git
cd rememory
cp .env.example .env
# Edit .env if your endpoints differ from the defaults

3. Build and run

docker compose build
docker compose run --rm -i rememory

The server starts in stdio mode by default — it reads JSON-RPC from stdin and writes to stdout, ready for MCP clients.

MCP Client Configuration

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "rememory": {
      "command": "docker",
      "args": [
        "compose", "-f", "/path/to/rememory/docker-compose.yml",
        "run", "--rm", "-i", "rememory"
      ]
    }
  }
}

VS Code Copilot

Add to your VS Code settings.json:

{
  "mcp": {
    "servers": {
      "rememory": {
        "command": "docker",
        "args": [
          "compose", "-f", "/path/to/rememory/docker-compose.yml",
          "run", "--rm", "-i", "rememory"
        ]
      }
    }
  }
}

Cursor

Add to .cursor/mcp.json in your project or ~/.cursor/mcp.json globally:

{
  "mcpServers": {
    "rememory": {
      "command": "docker",
      "args": [
        "compose", "-f", "/path/to/rememory/docker-compose.yml",
        "run", "--rm", "-i", "rememory"
      ]
    }
  }
}

Tools

Rememory exposes 9 MCP tools:

| Tool | Description | Key Parameters |
| --- | --- | --- |
| `add_document` | Ingest a document: chunk, embed, extract memories, build relationships | `content`, `contentType`, `title`, `containerTags`, `extractMemories` |
| `add_memory` | Directly add a memory unit (fact, preference, or episode) | `content`, `memoryType`, `containerTags`, `expiresAt` |
| `search` | Hybrid search across memories (vector + BM25 + graph) | `query`, `searchMode`, `topK`, `containerTags`, `includeRelated` |
| `get_memory` | Retrieve a memory by ID with its relationships and version history | `id` |
| `list_memories` | List memories with filtering and pagination | `containerTags`, `memoryTypes`, `onlyLatest`, `limit`, `offset` |
| `delete_memory` | Delete a memory and its associated vectors (cascades to relationships) | `id` |
| `get_related` | Traverse the memory graph from a seed memory | `memoryId`, `depth`, `direction`, `relationshipTypes` |
| `forget` | Trigger memory cleanup: decay, expiration, and noise removal | `containerTags`, `dryRun` |
| `get_status` | Health check and storage statistics | (none) |
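
For clients that talk to the server without an MCP SDK, each tool invocation is a JSON-RPC 2.0 request using the MCP `tools/call` method. The sketch below only builds the request envelopes. `buildToolCall` is a hypothetical helper, and the argument values (the tag name, the `"fact"` spelling of `memoryType`) are assumptions for illustration rather than values confirmed by rememory's schema.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a request envelope.
function buildToolCall(
  id: number,
  tool: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: tool, arguments: args } };
}

// Store a memory, then search for it. Argument values are illustrative.
const addRequest = buildToolCall(1, "add_memory", {
  content: "The API uses JWT authentication",
  memoryType: "fact", // assumed spelling of the enum value
  containerTags: ["project-x"], // hypothetical tag
});
const searchRequest = buildToolCall(2, "search", {
  query: "How does the API authenticate?",
  topK: 5,
});

// The MCP stdio transport sends one JSON message per line.
const wire = [addRequest, searchRequest].map((r) => JSON.stringify(r)).join("\n");
console.log(wire);
```

A real session also needs the MCP `initialize` handshake before any `tools/call` request; an MCP client library normally handles that step.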

Configuration

All configuration is via environment variables. Copy .env.example to .env and customize as needed.

| Variable | Default | Description |
| --- | --- | --- |
| `EMBEDDING_ENDPOINT` | `http://host.docker.internal:11434/api/embed` | Embedding API endpoint |
| `EMBEDDING_MODEL` | `nomic-embed-text` | Embedding model name |
| `EMBEDDING_FORMAT` | `ollama` | Response format: `ollama` or `openai` |
| `EMBEDDING_DIMENSIONS` | `auto` | Vector dimensions (`auto` to detect, or a number) |
| `EMBEDDING_BATCH_SIZE` | `32` | Max texts per embedding request |
| `LLM_ENDPOINT` | `http://host.docker.internal:11434/v1/chat/completions` | OpenAI-compatible chat completions endpoint |
| `LLM_MODEL` | `llama3.2` | LLM model name |
| `DATA_DIR` | `/data` | Storage directory (mapped to Docker volume) |
| `TRANSPORT` | `stdio` | MCP transport: `stdio` or `http` |
| `PORT` | `3111` | HTTP port (when `TRANSPORT=http`) |
| `LOG_LEVEL` | `info` | Log level: `trace`, `debug`, `info`, `warn`, `error`, `fatal` |
| `SEARCH_TOP_K` | `10` | Default number of search results |
| `CHUNK_SIZE` | `1000` | Chunk size in characters |
| `CHUNK_OVERLAP` | `200` | Overlap between chunks in characters |
| `WEIGHT_VECTOR` | `0.5` | Hybrid search weight for vector similarity |
| `WEIGHT_BM25` | `0.3` | Hybrid search weight for BM25 full-text |
| `WEIGHT_GRAPH` | `0.2` | Hybrid search weight for graph traversal |
| `DECAY_LAMBDA` | `0.05` | Exponential decay rate for episode confidence |
| `CONFIDENCE_THRESHOLD` | `0.1` | Auto-remove memories below this confidence |
| `CLEANUP_INTERVAL_MINUTES` | `60` | Minutes between automatic forgetting cycles |
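
To make the interaction between `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a deliberately naive fixed-size character chunker. It is not rememory's chunker, which is content-type-aware; `chunkText` is a hypothetical function that only illustrates how a window of `chunkSize` characters advances by `chunkSize - overlap` each step.

```typescript
// Naive character chunker: each chunk starts (chunkSize - overlap) characters
// after the previous one, so neighbouring chunks share `overlap` characters.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be in [0, chunkSize)");
  }
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// With the defaults, a 2500-character text yields chunks starting at 0, 800, and 1600.
const sampleChunks = chunkText("x".repeat(2500));
console.log(sampleChunks.map((c) => c.length));
```

A larger overlap reduces the chance that a sentence is split across a chunk boundary with no chunk containing it whole, at the cost of more chunks to embed.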

Concepts

Documents vs Memories

A document is raw input — a text, a conversation, a code file. When you call add_document, rememory chunks the document, embeds the chunks for vector search, and optionally uses an LLM to extract memories: atomic, self-contained knowledge units typed as:

Fact — A persistent truth ("The API uses JWT authentication")
Preference — A repeated pattern or user preference ("Prefers TypeScript over JavaScript")
Episode — A time-bound event ("Deployed v2.3 on March 15th") that can expire

Memory Relationships

Every new memory is compared against existing memories using embedding similarity and LLM classification. Three relationship types form the knowledge graph:

updates — The new memory contradicts or supersedes an existing one. The old memory is marked isLatest = false, preserving full version history.
extends — The new memory adds detail to an existing one. Both remain current.
derives — The new memory is inferred from an existing one during extraction.
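
The effect of the three relationship types on `isLatest` can be sketched with a small in-memory graph. This is an illustrative model, not rememory's storage code: `MemoryGraph`, `MemoryNode`, and the example memory contents are hypothetical, and the similarity and LLM classification steps are replaced by passing the relationship in explicitly.

```typescript
type RelationshipType = "updates" | "extends" | "derives";

interface MemoryNode { id: string; content: string; isLatest: boolean }
interface Relationship { from: string; to: string; type: RelationshipType }

// Hypothetical in-memory model of the knowledge graph.
class MemoryGraph {
  readonly nodes = new Map<string, MemoryNode>();
  readonly edges: Relationship[] = [];

  // Adds a new memory, optionally linked to an existing one.
  add(id: string, content: string, link?: { to: string; type: RelationshipType }): void {
    if (this.nodes.has(id)) throw new Error(`duplicate memory id: ${id}`);
    const target = link ? this.nodes.get(link.to) : undefined;
    if (link && !target) throw new Error(`unknown memory: ${link.to}`);
    this.nodes.set(id, { id, content, isLatest: true });
    if (link && target) {
      this.edges.push({ from: id, to: link.to, type: link.type });
      // Only "updates" supersedes the target; "extends" and "derives" leave it current.
      if (link.type === "updates") target.isLatest = false;
    }
  }

  // Follows "updates" edges from a memory back through the versions it superseded.
  history(id: string): string[] {
    const chain = [id];
    let current = id;
    for (;;) {
      const edge = this.edges.find((e) => e.from === current && e.type === "updates");
      if (!edge) return chain;
      chain.push(edge.to);
      current = edge.to;
    }
  }
}

const graph = new MemoryGraph();
graph.add("m1", "The API uses API keys");
graph.add("m2", "The API uses JWT authentication", { to: "m1", type: "updates" });
graph.add("m3", "JWTs are sent in the Authorization header", { to: "m2", type: "extends" });
console.log(graph.history("m2"), graph.nodes.get("m1")?.isLatest);
```

Because superseded memories are kept and only flagged, the version chain stays traversable while searches can filter on `isLatest`.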

Hybrid Search

Search combines three signals via Reciprocal Rank Fusion (RRF, k=60):

Vector similarity (default weight: 0.5) — Cosine similarity via LanceDB
BM25 full-text (default weight: 0.3) — SQLite FTS5 with Porter stemming
Graph traversal (default weight: 0.2) — 1–2 hop traversal from top vector matches

Results are filtered to isLatest = true memories and optionally expanded with related memories via graph traversal.
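
A minimal sketch of weighted Reciprocal Rank Fusion follows, assuming each signal's weight multiplies its `1 / (k + rank)` contribution with 1-based ranks. How rememory applies the weights internally is not stated above, so treat that as an assumption; `fuseRankings` and the memory IDs are hypothetical.

```typescript
type SearchSignal = "vector" | "bm25" | "graph";

const RRF_K = 60;
const DEFAULT_WEIGHTS: Record<SearchSignal, number> = { vector: 0.5, bm25: 0.3, graph: 0.2 };

// Fuses per-signal rankings (best match first) into one scored list.
// Assumed scoring: score(id) = sum over signals of weight / (k + rank), rank starting at 1.
function fuseRankings(
  rankings: Partial<Record<SearchSignal, string[]>>,
  weights: Record<SearchSignal, number> = DEFAULT_WEIGHTS,
  k: number = RRF_K,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const signal of Object.keys(rankings) as SearchSignal[]) {
    const ranked = rankings[signal] ?? [];
    ranked.forEach((id, index) => {
      const contribution = weights[signal] / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const fused = fuseRankings({
  vector: ["a", "b", "c"],
  bm25: ["b", "a"],
  graph: ["c"],
});
console.log(fused.map((r) => r.id));
```

RRF uses only rank positions, so cosine similarities and BM25 scores never need to be normalized onto a common scale before they are combined.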

Automatic Forgetting

The forgetting engine keeps memory clean and relevant:

Expiration — Episodes past their expiresAt are removed
Decay — Episode confidence decays exponentially: confidence × e^(-λ × days) × (1 + 0.1 × accessCount)
Threshold — Memories below CONFIDENCE_THRESHOLD are auto-removed
Strengthening — Frequently accessed preferences (accessCount > 5) have their strength increased

Run forget manually or let the automatic cleanup cycle handle it.
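
The decay and threshold rules can be worked through numerically. The sketch below implements the formula exactly as written above, using the default `DECAY_LAMBDA` and `CONFIDENCE_THRESHOLD`; the function names are hypothetical, and what `days` is measured from (creation or last access) is not specified here, so it is left as a plain input.

```typescript
// confidence × e^(-λ × days) × (1 + 0.1 × accessCount)
function decayedConfidence(
  confidence: number,
  days: number,
  accessCount: number,
  lambda = 0.05, // DECAY_LAMBDA default
): number {
  return confidence * Math.exp(-lambda * days) * (1 + 0.1 * accessCount);
}

// True when the decayed confidence falls below the removal threshold.
function shouldForget(
  confidence: number,
  days: number,
  accessCount: number,
  threshold = 0.1, // CONFIDENCE_THRESHOLD default
  lambda = 0.05,
): boolean {
  return decayedConfidence(confidence, days, accessCount, lambda) < threshold;
}

// An episode that starts at confidence 1.0:
console.log(shouldForget(1.0, 30, 0)); // 30 days, never accessed
console.log(shouldForget(1.0, 60, 0)); // 60 days, never accessed
console.log(shouldForget(1.0, 60, 12)); // 60 days, accessed 12 times
```

With these defaults, a never-accessed episode starting at confidence 1.0 crosses the 0.1 threshold after ln(10) / 0.05 ≈ 46 days, and each access adds 10% to the multiplier, which delays removal.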

Development

Setup

npm install

Run locally (without Docker)

# Set environment variables (or create a .env file)
export EMBEDDING_ENDPOINT=http://localhost:11434/api/embed
export LLM_ENDPOINT=http://localhost:11434/v1/chat/completions
export DATA_DIR=./data

npm run dev

Build

npm run build

Test

npm test
npm run test:watch  # Watch mode

Lint

npm run lint

Benchmark

Rememory includes a LongMemEval integration for benchmarking memory accuracy across 500 questions covering temporal reasoning, knowledge updates, multi-session aggregation, and more.

# Quick 30-question pilot (~4-5 hours)
npx tsx scripts/longmemeval.ts --limit 30 --stratify --output data/results.jsonl

# Evaluate results
npx tsx scripts/longmemeval-eval.ts --results data/results.jsonl

See docs/BENCHMARK.md for the full guide, configuration options, and our 83.3% accuracy results on the 30-question stratified pilot.

License

MIT
