
jotlabsorg/mem

mem

A local, searchable memory of your browser tabs, indexed by meaning instead of URLs.

Not bookmarks. Not notes. A time-indexed knowledge exhaust of what you read.

What it does

You press a shortcut. The page you're reading gets captured -- content extracted, stripped of navigation and ads, chunked, embedded as vectors, and stored in a local SQLite database. Later, you search by idea rather than by URL or title, and the system finds what you were reading even if you only vaguely remember the concept.

Zero-friction capture, high-quality recall. No cloud, no sync, no auth. Everything runs on your machine.


Capture (Cmd+Shift+U)

Press the shortcut on any page. A small overlay appears asking "Why are you saving this?" -- type a one-line note and press Enter, or press Escape to skip. The extension extracts the page content using Readability, sends it to the local API, and the backend chunks, embeds, and stores it. The whole flow takes under a second.

[Screenshot: capture overlay]


Search (Cmd+Shift+M)

Press the shortcut anywhere in the browser. An overlay appears with a single search bar. Type what you remember -- an idea, a problem, a concept -- and results appear ranked by semantic similarity. Arrow keys to navigate, Enter to open the original URL, Escape to close.

[Screenshots: search overlay, search result]

Architecture

Browser Extension (ClojureScript, Manifest V3)
   |
   | chrome.runtime.sendMessage
   v
Background Service Worker
   |
   | HTTP POST/GET to localhost
   v
Local API Server (Rust, Axum)  -->  127.0.0.1:7745
   |
   v
SQLite + FTS5 + sqlite-vec
   |
   +-- tab_artifacts      (one row per captured page)
   +-- artifact_chunks    (text split into ~500-word segments)
   +-- artifacts_fts      (FTS5 virtual table, auto-synced via triggers)
   +-- chunk_embeddings   (384-dim float vectors via sqlite-vec)

Everything stays local. The database lives at ~/.mem/mem.db.


How the search algorithm works

Search uses a hybrid approach that combines three signals: semantic similarity, keyword matching, and recency. The final score is a weighted blend that favors meaning over exact words.

Step 1: Embed the query

The query string is passed through the same embedding model used at capture time (all-MiniLM-L6-v2, 384 dimensions). This produces a single float vector representing the meaning of the query.

Step 2: Semantic path (weight: 70%)

The query vector is compared against all stored chunk embeddings using sqlite-vec's vector similarity search. This returns the top-K closest chunks by distance.

Results are grouped by artifact (a single page may have multiple chunks). For each artifact, the best distance is kept and up to 2 chunk snippets are collected. Distances are normalized to a [0, 1] similarity score:

semantic_score = 1.0 - (distance / max_distance)
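
The grouping and normalization in this step can be sketched as follows. This is a minimal illustration, not the code in mem-core: the `ChunkHit` tuple, the `SemanticCandidate` struct, and the function name are hypothetical, and `max_distance` is assumed to be the largest distance among the returned hits, which the text above does not specify.

```rust
use std::collections::HashMap;

/// One nearest-neighbour hit: (artifact_id, distance, chunk snippet).
/// Hypothetical shape, chosen for illustration only.
pub type ChunkHit = (i64, f32, String);

/// Per-artifact result after grouping (hypothetical struct).
#[derive(Debug, Clone, PartialEq)]
pub struct SemanticCandidate {
    pub artifact_id: i64,
    pub best_distance: f32,
    pub snippets: Vec<String>,
    pub semantic_score: f32,
}

pub fn group_and_normalize(hits: &[ChunkHit]) -> Vec<SemanticCandidate> {
    let mut by_artifact: HashMap<i64, SemanticCandidate> = HashMap::new();
    for (artifact_id, distance, snippet) in hits {
        let entry = by_artifact
            .entry(*artifact_id)
            .or_insert_with(|| SemanticCandidate {
                artifact_id: *artifact_id,
                best_distance: *distance,
                snippets: Vec::new(),
                semantic_score: 0.0,
            });
        // Keep the smallest distance seen for this artifact.
        if *distance < entry.best_distance {
            entry.best_distance = *distance;
        }
        // Collect at most two snippets per artifact.
        if entry.snippets.len() < 2 {
            entry.snippets.push(snippet.clone());
        }
    }

    // Assumption: max_distance is the largest distance among all hits.
    let max_distance = hits.iter().map(|h| h.1).fold(0.0_f32, f32::max);

    let mut out: Vec<SemanticCandidate> = by_artifact.into_values().collect();
    for c in &mut out {
        c.semantic_score = if max_distance > 0.0 {
            1.0 - c.best_distance / max_distance
        } else {
            1.0 // every hit is at distance 0, so treat all as perfect matches
        };
    }
    // Best match first.
    out.sort_by(|a, b| b.semantic_score.total_cmp(&a.semantic_score));
    out
}
```

Under this assumption the farthest hit always scores 0, so the semantic score is relative to the current result set rather than an absolute measure of similarity.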

Step 3: FTS path (weight: 20%)

The raw query string is also run through SQLite's FTS5 full-text search with BM25 ranking. This catches exact keyword matches that the embedding model might not surface -- abbreviations, proper nouns, code identifiers.

FTS scores are normalized against the maximum score in the result set:

fts_score = bm25_rank / max_bm25_rank
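
A small sketch of this normalization, under one stated assumption: SQLite's FTS5 `bm25()` function returns lower (more negative) values for better matches, so the sketch divides magnitudes rather than raw values. Whether mem handles the sign this way is not stated above, and the function name is hypothetical.

```rust
/// Normalize raw BM25 values to [0, 1] by dividing each magnitude by the
/// largest magnitude in the result set. Taking the absolute value is an
/// assumption about how the sign of FTS5's bm25() output is handled.
pub fn normalize_fts(raw: &[f64]) -> Vec<f64> {
    let max_mag = raw.iter().map(|r| r.abs()).fold(0.0_f64, f64::max);
    raw.iter()
        .map(|r| if max_mag > 0.0 { r.abs() / max_mag } else { 0.0 })
        .collect()
}
```

With this convention the strongest keyword match in a result set always gets an `fts_score` of 1.0.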

Step 4: Recency path (weight: 10%)

Each candidate gets a recency boost based on when it was captured. The decay function is:

recency = 1.0 / (1.0 + days_ago * 0.01)

This gives a gentle preference to recent pages without burying older ones. A page captured yesterday scores ~0.99, a page from a month ago scores ~0.77, and a page from a year ago scores ~0.22 (1 / 4.65).
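
The decay formula is easy to check by hand. The function below is a direct transcription of it; only the function name is an assumption.

```rust
/// Recency boost: 1.0 for a page captured today, decaying hyperbolically
/// with the number of days since capture.
pub fn recency(days_ago: f64) -> f64 {
    1.0 / (1.0 + days_ago * 0.01)
}
```

Evaluating it at 1, 30, and 365 days gives roughly 0.990, 0.769, and 0.215. The boost halves at 100 days, and because the decay is hyperbolic rather than exponential it never drops to zero.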

Step 5: Merge and rank

Candidates from the semantic and FTS paths are merged into a single map keyed by artifact ID. The final score for each artifact is:

score = 0.7 * semantic_score + 0.2 * fts_score + 0.1 * recency

If an artifact appears in both the semantic and FTS results, both contributions are added. Results are sorted by descending score and truncated to the requested limit.
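
The merge can be sketched like this. It is an illustration under assumptions, not the mem-core implementation: the function name and the `(artifact_id, score)` input shape are hypothetical, and an artifact missing from one path is assumed to contribute 0 for that signal.

```rust
use std::collections::HashMap;

/// Blend normalized semantic, FTS, and recency scores per artifact and
/// return the top `limit` artifacts as (artifact_id, score), best first.
pub fn merge_and_rank(
    semantic: &[(i64, f64)],
    fts: &[(i64, f64)],
    recency: &HashMap<i64, f64>,
    limit: usize,
) -> Vec<(i64, f64)> {
    // artifact_id -> (semantic_score, fts_score); a missing signal stays 0.0.
    let mut merged: HashMap<i64, (f64, f64)> = HashMap::new();
    for &(id, s) in semantic {
        merged.entry(id).or_insert((0.0, 0.0)).0 = s;
    }
    for &(id, f) in fts {
        merged.entry(id).or_insert((0.0, 0.0)).1 = f;
    }

    let mut ranked: Vec<(i64, f64)> = merged
        .into_iter()
        .map(|(id, (s, f))| {
            let r = recency.get(&id).copied().unwrap_or(0.0);
            (id, 0.7 * s + 0.2 * f + 0.1 * r)
        })
        .collect();

    // Descending by score, then cut to the requested limit.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked.truncate(limit);
    ranked
}
```

For example, an artifact with a semantic score of 0.5, an FTS score of 1.0, and a recency of 0.5 ends up at 0.35 + 0.20 + 0.05 = 0.60, while one found only by the semantic path with a score of 0.9 and a recency of 1.0 reaches 0.73 and ranks higher.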

Why this matters

Your brain does not remember URLs or exact titles. It remembers ideas, problems, and contexts. Semantic embeddings align with human recall. The FTS fallback catches the cases where you do remember a specific term. Recency handles the "I just read something about this" scenario.


Embedding model

The primary embedder is fastembed-rs running the all-MiniLM-L6-v2 ONNX model locally. It produces 384-dimensional vectors and requires no API key or network access.

If fastembed fails to initialize (missing ONNX runtime, unsupported platform), the system falls back to OpenAI's text-embedding-3-small API (1536 dimensions) if the OPENAI_API_KEY environment variable is set.
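
The selection order described above can be written as a small decision function. This is a sketch only: the enum, its variants, and the function name are hypothetical, and the `Unavailable` case is an assumption, since the text does not say what happens when neither embedder can be used.

```rust
/// Which embedder to use (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum EmbedderChoice {
    /// fastembed-rs with all-MiniLM-L6-v2.
    Local { dims: usize },
    /// OpenAI text-embedding-3-small.
    OpenAi { dims: usize },
    /// Assumption: neither option is usable.
    Unavailable,
}

pub fn choose_embedder(local_init_ok: bool, openai_api_key: Option<&str>) -> EmbedderChoice {
    if local_init_ok {
        EmbedderChoice::Local { dims: 384 }
    } else if openai_api_key.map_or(false, |k| !k.is_empty()) {
        EmbedderChoice::OpenAi { dims: 1536 }
    } else {
        EmbedderChoice::Unavailable
    }
}
```

A caller could pass `std::env::var("OPENAI_API_KEY").ok().as_deref()` as the second argument. Note that the two embedders produce vectors of different sizes (384 versus 1536).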


Text chunking

Captured page content is split into chunks of approximately 500 words with a 50-word overlap between consecutive chunks. This overlap ensures that ideas spanning a chunk boundary are still captured in at least one chunk. Each chunk is embedded independently and stored alongside its parent artifact.
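
A minimal word-based chunker with overlap might look like the following. It assumes words are separated by whitespace and that chunks have a fixed size, whereas the text above only says "approximately 500 words". The function name and signature are hypothetical.

```rust
/// Split `text` into chunks of `chunk_size` words, where each chunk repeats
/// the last `overlap` words of the previous one.
pub fn chunk_words(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > overlap, "chunk_size must exceed overlap");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the final chunk reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With `chunk_size = 500` and `overlap = 50`, a 1,200-word page yields three chunks covering words 0..500, 450..950, and 900..1200.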


Data model

tab_artifacts -- one row per captured page:

  • url (unique, upserted on re-capture)
  • title
  • content_text (full extracted text)
  • note (optional one-liner from the capture prompt)
  • created_at

artifact_chunks -- text segments for embedding:

  • artifact_id (foreign key)
  • chunk_index (ordering within the page)
  • chunk_text

chunk_embeddings -- vector storage via sqlite-vec:

  • chunk_id (matches artifact_chunks)
  • embedding (float[384])

artifacts_fts -- FTS5 virtual table over title + content_text, kept in sync via triggers on insert/update/delete.
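
As a rough picture of how these tables could be declared, here is a schema sketch held in a Rust string constant. It is inferred from the field lists above and is not the project's actual DDL: the `id` columns, column types, constraints, and the `created_at` default are all assumptions, and the sync triggers for artifacts_fts are omitted.

```rust
/// Hypothetical schema sketch; the real DDL in mem-core may differ.
pub const SCHEMA_SKETCH: &str = r#"
CREATE TABLE IF NOT EXISTS tab_artifacts (
    id           INTEGER PRIMARY KEY,          -- assumed surrogate key
    url          TEXT NOT NULL UNIQUE,         -- upserted on re-capture
    title        TEXT,
    content_text TEXT,
    note         TEXT,
    created_at   TEXT NOT NULL DEFAULT (datetime('now'))  -- format assumed
);

CREATE TABLE IF NOT EXISTS artifact_chunks (
    id          INTEGER PRIMARY KEY,           -- assumed; referenced as chunk_id
    artifact_id INTEGER NOT NULL REFERENCES tab_artifacts(id),
    chunk_index INTEGER NOT NULL,
    chunk_text  TEXT NOT NULL
);

CREATE VIRTUAL TABLE IF NOT EXISTS chunk_embeddings USING vec0(
    chunk_id  INTEGER PRIMARY KEY,
    embedding float[384]
);

CREATE VIRTUAL TABLE IF NOT EXISTS artifacts_fts USING fts5(
    title, content_text,
    content='tab_artifacts', content_rowid='id'
);
"#;
```

The `vec0` and `fts5` declarations follow the documented syntax of sqlite-vec and SQLite FTS5. Using an external-content FTS table (`content='tab_artifacts'`) is one common way to pair FTS5 with sync triggers, assumed here for illustration.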


Tech stack

Layer               Technology
Backend             Rust, Axum, rusqlite, sqlite-vec, fastembed-rs
Extension           ClojureScript, shadow-cljs, Manifest V3
Content extraction  @mozilla/readability
Storage             SQLite (WAL mode, FTS5, vec0)
Build               Cargo (Rust), shadow-cljs (ClojureScript), npm

Project structure

mem/
  crates/
    mem-core/          -- database, embedder, chunker, search algorithm
    mem-server/        -- Axum HTTP server (capture + search API, web UI)
  ui/
    src/dev/jotlabs/mem/
      extension/
        background.cljs      -- service worker (command dispatch, API proxy)
        capture.cljs         -- content script for capture overlay
        search_overlay.cljs  -- content script for search overlay
      web/
        app.cljs             -- standalone web search UI
    resources/
      extension/manifest.json
      web/index.html

Running

Start the backend:

cd mem
cargo run -p mem-server

The server binds to 127.0.0.1:7745. On first run it downloads the embedding model (~23MB).

Build the extension:

cd mem/ui
npm install
# build for the browser you use (run one of the two)
npm run build:chrome:ext
npm run build:firefox:ext

Load the extension in Chrome: go to chrome://extensions, enable Developer mode, click "Load unpacked", and select mem/ui/dist/extension.


Environment variables

Variable        Purpose                                   Default
MEM_DB_PATH     Override database file location           ~/.mem/mem.db
MEM_WEB_DIR     Override web UI assets directory          auto-detected
OPENAI_API_KEY  Enable OpenAI embedding fallback          not set
RUST_LOG        Control log verbosity (e.g. debug, info)  info
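
To illustrate how an override such as MEM_DB_PATH typically interacts with its default, here is a small sketch. The function name and the choice to treat an empty value as unset are assumptions, not documented behavior of mem-server.

```rust
use std::path::PathBuf;

/// Resolve the database path: use the override when it is set and non-empty,
/// otherwise fall back to `<home_dir>/.mem/mem.db`.
pub fn resolve_db_path(mem_db_path: Option<&str>, home_dir: &str) -> PathBuf {
    match mem_db_path {
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => PathBuf::from(home_dir).join(".mem").join("mem.db"),
    }
}
```

A caller would pass `std::env::var("MEM_DB_PATH").ok().as_deref()` together with the user's home directory.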
