hyorman/rememory

rememory

Local Supermemory MCP Server for AI Agents

A self-hosted Model Context Protocol server that gives AI agents persistent, structured memory. Rememory combines a knowledge graph with SuperRAG (hybrid vector + BM25 + graph search) so agents can store, connect, and recall information across conversations — all running locally in Docker with your own embedding and LLM endpoints.

Key Features

  • Knowledge graph memory — Memories are linked with typed relationships (updates, extends, derives) so agents can trace how knowledge evolves over time
  • Automatic memory extraction — Documents are chunked, embedded, and analyzed by an LLM to extract atomic facts, preferences, and episodes
  • Hybrid search (SuperRAG) — Combines vector similarity, BM25 full-text, and graph traversal via Reciprocal Rank Fusion for high-quality recall
  • Content-type-aware chunking — Smart splitting for markdown, code, HTML, and plain text
  • Automatic forgetting — Exponential decay for episodes, confidence thresholds, and noise filtering keep memory clean
  • Configurable endpoints — Bring your own embedding and LLM endpoints (Ollama, vLLM, OpenAI, or any compatible API)
  • Persistent local storage — SQLite (graph + FTS5) and LanceDB (vectors) with Docker volume persistence
  • Privacy-first — All data is stored on your machine, and no external services are required when your embedding and LLM endpoints run locally

Architecture

┌─────────────────────────────────────────────────────────┐
│                      AI Agent                           │
└──────────────────────┬──────────────────────────────────┘
                       │ MCP (stdio / HTTP)
┌──────────────────────▼──────────────────────────────────┐
│                   rememory server                       │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │  Pipeline    │  │    Search    │  │   Forgetting  │  │
│  │  ─────────── │  │  ──────────  │  │   ──────────  │  │
│  │  chunker     │  │  vector      │  │   decay       │  │
│  │  extractor   │  │  BM25        │  │   expiration  │  │
│  │  embedder    │  │  graph       │  │   cleanup     │  │
│  │  relationship│  │  RRF fusion  │  │               │  │
│  └──────┬───────┘  └──────┬───────┘  └───────┬───────┘  │
│         │                 │                  │          │
│  ┌──────▼─────────────────▼──────────────────▼───────┐  │
│  │                  Storage Layer                    │  │
│  │  ┌──────────────────┐  ┌────────────────────────┐ │  │
│  │  │  SQLite + FTS5   │  │  LanceDB (vectors)     │ │  │
│  │  │  (graph, BM25)   │  │                        │ │  │
│  │  └──────────────────┘  └────────────────────────┘ │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
         ▲                               ▲
         │ HTTP                          │ HTTP
   ┌─────┴───────┐                ┌──────┴──────┐
   │  Embedding  │                │     LLM     │
   │  Endpoint   │                │   Endpoint  │
   │  (Ollama)   │                │  (Ollama)   │
   └─────────────┘                └─────────────┘

Quick Start

Prerequisites

  • Docker (with Compose)
  • An embedding endpoint — Ollama is the easiest way to get started

1. Pull an embedding model and an LLM via Ollama

ollama pull nomic-embed-text
ollama pull llama3.2

2. Clone and configure

git clone https://github.com/hyorman/rememory.git
cd rememory
cp .env.example .env
# Edit .env if your endpoints differ from the defaults

3. Build and run

docker compose build
docker compose run --rm -i rememory

The server starts in stdio mode by default — it reads JSON-RPC from stdin and writes to stdout, ready for MCP clients.

MCP Client Configuration

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "rememory": {
      "command": "docker",
      "args": [
        "compose", "-f", "/path/to/rememory/docker-compose.yml",
        "run", "--rm", "-i", "rememory"
      ]
    }
  }
}

VS Code Copilot

Add to your VS Code settings.json:

{
  "mcp": {
    "servers": {
      "rememory": {
        "command": "docker",
        "args": [
          "compose", "-f", "/path/to/rememory/docker-compose.yml",
          "run", "--rm", "-i", "rememory"
        ]
      }
    }
  }
}

Cursor

Add to .cursor/mcp.json in your project or ~/.cursor/mcp.json globally:

{
  "mcpServers": {
    "rememory": {
      "command": "docker",
      "args": [
        "compose", "-f", "/path/to/rememory/docker-compose.yml",
        "run", "--rm", "-i", "rememory"
      ]
    }
  }
}

Tools

Rememory exposes 9 MCP tools:

| Tool | Description | Key Parameters |
|---|---|---|
| add_document | Ingest a document: chunk, embed, extract memories, build relationships | content, contentType, title, containerTags, extractMemories |
| add_memory | Directly add a memory unit (fact, preference, or episode) | content, memoryType, containerTags, expiresAt |
| search | Hybrid search across memories (vector + BM25 + graph) | query, searchMode, topK, containerTags, includeRelated |
| get_memory | Retrieve a memory by ID with its relationships and version history | id |
| list_memories | List memories with filtering and pagination | containerTags, memoryTypes, onlyLatest, limit, offset |
| delete_memory | Delete a memory and its associated vectors (cascades to relationships) | id |
| get_related | Traverse the memory graph from a seed memory | memoryId, depth, direction, relationshipTypes |
| forget | Trigger memory cleanup: decay, expiration, and noise removal | containerTags, dryRun |
| get_status | Health check and storage statistics | (none) |
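MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds two such requests by hand to show how the parameter names from the table fit into the envelope. The helper `buildToolCall`, the tag `project-x`, and the lowercase `"fact"` value for `memoryType` are assumptions made for illustration; a real session also needs the MCP `initialize` handshake first, which an MCP client library normally handles for you.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Hypothetical helper: wraps a tool name and its arguments in the envelope.
function buildToolCall(tool: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

// Store a fact, then search for it. Tag and content values are illustrative.
const addRequest = buildToolCall("add_memory", {
  content: "The API uses JWT authentication",
  memoryType: "fact", // assumed spelling of the fact type
  containerTags: ["project-x"],
});

const searchRequest = buildToolCall("search", {
  query: "How does the API authenticate?",
  topK: 5,
  containerTags: ["project-x"],
});

// Over the stdio transport, each message is written as one line of JSON.
const wireLines = [addRequest, searchRequest].map((request) => JSON.stringify(request));
```

Writing each entry of `wireLines` followed by a newline to the server's stdin is how a stdio client would deliver them.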

Configuration

All configuration is via environment variables. Copy .env.example to .env and customize as needed.

| Variable | Default | Description |
|---|---|---|
| EMBEDDING_ENDPOINT | http://host.docker.internal:11434/api/embed | Embedding API endpoint |
| EMBEDDING_MODEL | nomic-embed-text | Embedding model name |
| EMBEDDING_FORMAT | ollama | Response format: ollama or openai |
| EMBEDDING_DIMENSIONS | auto | Vector dimensions (auto to detect, or a number) |
| EMBEDDING_BATCH_SIZE | 32 | Max texts per embedding request |
| LLM_ENDPOINT | http://host.docker.internal:11434/v1/chat/completions | OpenAI-compatible chat completions endpoint |
| LLM_MODEL | llama3.2 | LLM model name |
| DATA_DIR | /data | Storage directory (mapped to Docker volume) |
| TRANSPORT | stdio | MCP transport: stdio or http |
| PORT | 3111 | HTTP port (when TRANSPORT=http) |
| LOG_LEVEL | info | Log level: trace, debug, info, warn, error, fatal |
| SEARCH_TOP_K | 10 | Default number of search results |
| CHUNK_SIZE | 1000 | Chunk size in characters |
| CHUNK_OVERLAP | 200 | Overlap between chunks in characters |
| WEIGHT_VECTOR | 0.5 | Hybrid search weight for vector similarity |
| WEIGHT_BM25 | 0.3 | Hybrid search weight for BM25 full-text |
| WEIGHT_GRAPH | 0.2 | Hybrid search weight for graph traversal |
| DECAY_LAMBDA | 0.05 | Exponential decay rate for episode confidence |
| CONFIDENCE_THRESHOLD | 0.1 | Auto-remove memories below this confidence |
| CLEANUP_INTERVAL_MINUTES | 60 | Minutes between automatic forgetting cycles |
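To illustrate how the numeric variables and their defaults fit together, here is a minimal sketch of an environment loader. It is not rememory's actual configuration code: `loadTuningConfig`, `readNumber`, and the `TuningConfig` shape are hypothetical, and the overlap check is an assumed sanity rule rather than documented behavior. Only the variable names and default values come from the table above.

```typescript
// Hypothetical loader: names are illustrative; variable names and defaults
// are taken from the configuration table.
type Env = Record<string, string | undefined>;

interface TuningConfig {
  searchTopK: number;
  chunkSize: number;
  chunkOverlap: number;
  weights: { vector: number; bm25: number; graph: number };
}

// Read a numeric variable, falling back to the default when it is unset or blank.
function readNumber(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${key} must be numeric, got "${raw}"`);
  }
  return value;
}

function loadTuningConfig(env: Env): TuningConfig {
  const config: TuningConfig = {
    searchTopK: readNumber(env, "SEARCH_TOP_K", 10),
    chunkSize: readNumber(env, "CHUNK_SIZE", 1000),
    chunkOverlap: readNumber(env, "CHUNK_OVERLAP", 200),
    weights: {
      vector: readNumber(env, "WEIGHT_VECTOR", 0.5),
      bm25: readNumber(env, "WEIGHT_BM25", 0.3),
      graph: readNumber(env, "WEIGHT_GRAPH", 0.2),
    },
  };
  // Assumed sanity check: an overlap as large as the chunk would never advance.
  if (config.chunkOverlap >= config.chunkSize) {
    throw new Error("CHUNK_OVERLAP must be smaller than CHUNK_SIZE");
  }
  return config;
}

// An empty environment yields the documented defaults.
const defaultTuning = loadTuningConfig({});
// Overrides arrive as strings, exactly as they would from a .env file.
const vectorHeavyTuning = loadTuningConfig({
  WEIGHT_VECTOR: "0.7",
  WEIGHT_BM25: "0.2",
  WEIGHT_GRAPH: "0.1",
});
```

In a Node.js process the same function could be called as `loadTuningConfig(process.env)`.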

Concepts

Documents vs Memories

A document is raw input — a text, a conversation, a code file. When you call add_document, rememory chunks the document, embeds the chunks for vector search, and optionally uses an LLM to extract memories: atomic, self-contained knowledge units typed as:

  • Fact — A persistent truth ("The API uses JWT authentication")
  • Preference — A repeated pattern or user preference ("Prefers TypeScript over JavaScript")
  • Episode — A time-bound event ("Deployed v2.3 on March 15th") that can expire
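The chunking step can be pictured with a simple sliding window. The sketch below splits text into fixed-size character windows that share an overlap, using the documented defaults of 1000 and 200 characters. It is a simplified stand-in: rememory's chunker is content-type-aware, and the function name `chunkText` is hypothetical.

```typescript
// Simplified sliding-window chunker (hypothetical, not rememory's implementation).
// Consecutive chunks share `overlap` characters so context is not cut at a boundary.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap; // how far each window advances
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// A 2500-character sample: windows start at offsets 0, 800, and 1600.
const sampleText = Array.from({ length: 2500 }, (_, i) =>
  String.fromCharCode(97 + (i % 26)),
).join("");
const sampleChunks = chunkText(sampleText);
```

With the defaults, each window advances by 800 characters, so a 2500-character document produces three chunks and the last 200 characters of one chunk reappear at the start of the next.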

Memory Relationships

Every new memory is compared against existing memories using embedding similarity and LLM classification. Three relationship types form the knowledge graph:

  • updates — The new memory contradicts or supersedes an existing one. The old memory is marked isLatest = false, preserving full version history.
  • extends — The new memory adds detail to an existing one. Both remain current.
  • derives — The new memory is inferred from an existing one during extraction.
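The effect of the three relationship types on the graph can be sketched with a small in-memory model. This is an illustration only: the `MemoryGraph` class, its method names, and the edge direction (from the newer memory to the older one) are assumptions, and the real server stores the graph in SQLite and decides the relationship type with embeddings and an LLM.

```typescript
type RelationshipType = "updates" | "extends" | "derives";

interface MemoryNode {
  id: string;
  content: string;
  isLatest: boolean;
}

interface Relationship {
  from: string; // the newer memory
  to: string;   // the existing memory it relates to
  type: RelationshipType;
}

// Hypothetical in-memory stand-in for the knowledge graph.
class MemoryGraph {
  readonly memories = new Map<string, MemoryNode>();
  readonly relationships: Relationship[] = [];

  add(id: string, content: string): MemoryNode {
    const node: MemoryNode = { id, content, isLatest: true };
    this.memories.set(id, node);
    return node;
  }

  link(fromId: string, toId: string, type: RelationshipType): void {
    const from = this.memories.get(fromId);
    const to = this.memories.get(toId);
    if (!from || !to) throw new Error("both memories must exist before linking");
    this.relationships.push({ from: fromId, to: toId, type });
    // Only "updates" supersedes the older memory; it is kept for history.
    if (type === "updates") to.isLatest = false;
  }

  // Follow "updates" edges from a memory back through the versions it replaced.
  versionHistory(id: string): MemoryNode[] {
    const chain: MemoryNode[] = [];
    const seen = new Set<string>();
    let current = this.memories.get(id);
    while (current && !seen.has(current.id)) {
      chain.push(current);
      seen.add(current.id);
      const currentId = current.id;
      const edge = this.relationships.find(
        (r) => r.type === "updates" && r.from === currentId,
      );
      current = edge ? this.memories.get(edge.to) : undefined;
    }
    return chain;
  }
}

const graph = new MemoryGraph();
graph.add("m1", "The API uses API keys");
graph.add("m2", "The API uses JWT authentication");
graph.link("m2", "m1", "updates"); // m1 is superseded but retained
graph.add("m3", "JWT tokens expire after one hour");
graph.link("m3", "m2", "extends"); // both m2 and m3 stay current
```

After these calls, `m1` has `isLatest = false` while `m2` and `m3` remain current, and `graph.versionHistory("m2")` returns `m2` followed by `m1`.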

Hybrid Search

Search combines three signals via Reciprocal Rank Fusion (RRF, k=60):

  1. Vector similarity (default weight: 0.5) — Cosine similarity via LanceDB
  2. BM25 full-text (default weight: 0.3) — SQLite FTS5 with Porter stemming
  3. Graph traversal (default weight: 0.2) — 1–2 hop traversal from top vector matches

Results are filtered to isLatest = true memories and optionally expanded with related memories via graph traversal.
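As a rough illustration of the fusion step, the sketch below implements a weighted form of Reciprocal Rank Fusion: each signal contributes `weight / (k + rank)` for every memory it returns, with 1-based ranks and k = 60. How rememory applies the weights internally is an assumption here, and the function name `weightedRrf` and the memory IDs are made up for the example.

```typescript
interface RankedSignal {
  ranking: string[]; // memory IDs, best match first
  weight: number;    // e.g. WEIGHT_VECTOR, WEIGHT_BM25, WEIGHT_GRAPH
}

interface FusedResult {
  id: string;
  score: number;
}

// Weighted Reciprocal Rank Fusion (assumed formulation):
// score(id) = sum over signals of weight / (k + rank), rank starting at 1.
function weightedRrf(signals: RankedSignal[], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const { ranking, weight } of signals) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

const fused = weightedRrf([
  { ranking: ["m1", "m2", "m3"], weight: 0.5 }, // vector similarity
  { ranking: ["m2", "m1"], weight: 0.3 },       // BM25 full-text
  { ranking: ["m3", "m2"], weight: 0.2 },       // graph traversal
]);
// fused order: m2, m1, m3
```

Here `m2` wins even though the vector signal ranks `m1` first, because `m2` appears in all three rankings. Rank-based fusion avoids having to normalize cosine similarities against BM25 scores, which live on different scales.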

Automatic Forgetting

The forgetting engine keeps memory clean and relevant:

  • Expiration — Episodes past their expiresAt are removed
  • Decay — Episode confidence decays exponentially: confidence × e^(-λ × days) × (1 + 0.1 × accessCount)
  • Threshold — Memories below CONFIDENCE_THRESHOLD are auto-removed
  • Strengthening — Frequently accessed preferences (accessCount > 5) have their strength increased

Run forget manually or let the automatic cleanup cycle handle it.
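The decay rule above can be worked through numerically. The sketch below evaluates the formula with the default `DECAY_LAMBDA` of 0.05 and `CONFIDENCE_THRESHOLD` of 0.1. The function names are hypothetical, and two details are assumptions: `daysElapsed` is taken as the age of the episode in days, and the result is not clamped to 1.

```typescript
// confidence × e^(-λ × days) × (1 + 0.1 × accessCount), as given in the list above.
function decayedConfidence(
  confidence: number,
  daysElapsed: number, // assumed: age of the episode in days
  accessCount: number,
  lambda = 0.05,       // DECAY_LAMBDA default
): number {
  return confidence * Math.exp(-lambda * daysElapsed) * (1 + 0.1 * accessCount);
}

// A memory is removed once its confidence falls strictly below the threshold.
function shouldForget(
  confidence: number,
  daysElapsed: number,
  accessCount: number,
  lambda = 0.05,
  threshold = 0.1,     // CONFIDENCE_THRESHOLD default
): boolean {
  return decayedConfidence(confidence, daysElapsed, accessCount, lambda) < threshold;
}

// An untouched episode with confidence 1.0 crosses the threshold after
// ln(10) / 0.05 ≈ 46.05 days.
const keptAtDay46 = !shouldForget(1.0, 46, 0);
const forgottenAtDay47 = shouldForget(1.0, 47, 0);
// Five accesses multiply the score by 1.5, which keeps the same episode alive at day 47.
const keptWhenAccessed = !shouldForget(1.0, 47, 5);
```

With the defaults, an episode that is never accessed survives about 46 days, and each recorded access extends that lifetime by raising the multiplier.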

Development

Setup

npm install

Run locally (without Docker)

# Set environment variables (or create a .env file)
export EMBEDDING_ENDPOINT=http://localhost:11434/api/embed
export LLM_ENDPOINT=http://localhost:11434/v1/chat/completions
export DATA_DIR=./data

npm run dev

Build

npm run build

Test

npm test
npm run test:watch  # Watch mode

Lint

npm run lint

Benchmark

Rememory includes a LongMemEval integration for benchmarking memory accuracy across 500 questions covering temporal reasoning, knowledge updates, multi-session aggregation, and more.

# Quick 30-question pilot (~4-5 hours)
npx tsx scripts/longmemeval.ts --limit 30 --stratify --output data/results.jsonl

# Evaluate results
npx tsx scripts/longmemeval-eval.ts --results data/results.jsonl

See docs/BENCHMARK.md for the full guide, configuration options, and our 83.3% accuracy results on the 30-question stratified pilot.

License

MIT
