HDF5-backed memory store for on-device AI agents.
EdgeHDF5 is a standalone Rust library that gives any AI agent fast, portable, single-file memory. It stores conversations, embeddings, knowledge graphs, and session history in a single HDF5 file — then searches them at microsecond latency using hardware-adaptive backends (Apple AMX, BLAS, SIMD, GPU).
EdgeHDF5 is framework-agnostic. It works with any agent system that produces embeddings and needs persistent memory — LangChain, custom agent loops, or standalone tools.
`rustystack/edgehdf5` · MIT License · Rust workspace · 2 crates · ~6.7K LOC
| Concern | SQLite + pgvector / Qdrant | EdgeHDF5 (HDF5) |
|---|---|---|
| Deployment | Requires a running database process or client library | Single .h5 file, no daemon, no network |
| Vector search | Bolted-on extension; query planner overhead | Native flat arrays with SIMD/BLAS/GPU dispatch |
| Portability | Tied to OS-level SQLite or container images | One file — copy, snapshot, ship to another device |
| Memory mapping | Page-level I/O with SQLite WAL | Direct mmap of contiguous float arrays |
| Typed storage | Blobs or JSON columns for embeddings | First-class N×D float32/float16 datasets |
| Compression | Row-level, limited | Deflate on datasets; PQ for 32× vector compression |
| On-device AI | Heavy dependency tree | Zero-network, single-file, deterministic I/O |
EdgeHDF5 targets use cases where the agent runs on the user's machine — laptops, edge devices, CI runners — and needs memory that is fast, self-contained, and inspectable.
```
┌──────────────────────────────────────────────────────────────────┐
│                      HDF5Memory (lib.rs)                         │
│         AgentMemory trait: save · search · sessions · kg         │
├──────────┬──────────┬─────────────┬─────────────┬────────────────┤
│  cache   │ session  │  knowledge  │   schema    │    storage     │
│  (LRU)   │   mgmt   │    graph    │   (v1.0)    │   (mmap I/O)   │
├──────────┴──────────┴─────────────┴─────────────┴────────────────┤
│                          Search Layer                            │
│  ┌──────────────────────────────────────────────────────────┐    │
│  │          strategy.rs — Adaptive Dispatch                 │    │
│  │  Scalar → SIMD → BLAS → Accelerate → Rayon → GPU → IVF-PQ│    │
│  └────┬────────┬────────┬──────────┬────────┬────────┬──────┘    │
│       │        │        │          │        │        │           │
│   vector     blas    accel       gpu     ivf+pq   hybrid         │
│   search    search  search     search    index   vec+bm25        │
│   (SIMD)   (sgemm)  (cblas)    (wgpu)    (ANN)    (RRF)          │
├──────────────────────────────────────────────────────────────────┤
│  rustyhdf5 stack: rustyhdf5 · rustyhdf5-io · rustyhdf5-accel     │
│                   rustyhdf5-format · rustyhdf5-gpu               │
└──────────────────────────────────────────────────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │  agent_memory.h5  │
                    │  /meta            │
                    │  /memory          │
                    │  /sessions        │
                    │  /knowledge_graph │
                    └───────────────────┘
```
| Crate | Path | Description |
|---|---|---|
| `edgehdf5-memory` | `crates/edgehdf5-memory` | Core library — memory store, search backends, knowledge graph, sessions |
| `edgehdf5-migrate` | `crates/edgehdf5-migrate` | CLI tool to migrate existing SQLite agent databases to HDF5 |
```toml
[dependencies]
edgehdf5-memory = { git = "ssh://git@github.com/rustystack/edgehdf5.git", features = ["float16"] }
```

For Apple Silicon machines, enable hardware-accelerated search:

```toml
edgehdf5-memory = { git = "ssh://git@github.com/rustystack/edgehdf5.git", features = ["float16", "accelerate"] }
```

```rust
use edgehdf5_memory::{HDF5Memory, MemoryConfig, MemoryEntry, AgentMemory};
use std::path::PathBuf;

// Configure the memory store
let config = MemoryConfig {
    path: PathBuf::from("agent_memory.h5"),
    agent_id: "my-agent".into(),
    embedder: "openai:text-embedding-3-small".into(),
    embedding_dim: 384,
    chunk_size: 512,
    overlap: 50,
    float16: true,
    compression: true,
    compression_level: 4,
    compact_threshold: 0.3,
    created_at: "2025-01-01T00:00:00Z".into(),
};
let mut memory = HDF5Memory::create(config)?;

// Add a memory entry
let entry = MemoryEntry {
    chunk: "The user prefers dark mode and uses vim keybindings.".into(),
    embedding: embed("The user prefers dark mode..."), // your embedding fn
    source_channel: "chat".into(),
    timestamp: 1700000000.0,
    session_id: "session-001".into(),
    tags: "preference,ui".into(),
};
let id = memory.save(entry)?;

// Add a session summary
memory.add_session("session-001", 0, 5, "chat", "Discussed UI preferences")?;
```

```rust
use edgehdf5_memory::vector_search::{cosine_similarity_batch_prenorm, top_k, compute_norm};

let query = embed("What are the user's UI preferences?");
let query_norm = compute_norm(&query);

// Get the in-memory cache for search
let cache = memory.cache();
let scores = cosine_similarity_batch_prenorm(
    &query,
    &cache.embeddings,
    &cache.norms,
    &cache.tombstones,
);
let results = top_k(&scores, 5);
for (idx, score) in &results {
    println!("[{:.3}] {}", score, cache.chunks[*idx]);
}
```

```rust
use edgehdf5_memory::bm25::BM25Index;
use edgehdf5_memory::hybrid::hybrid_search;

let bm25 = BM25Index::build(&cache.chunks, &cache.tombstones);
let results = hybrid_search(
    &query_embedding,
    "user UI preferences dark mode",
    &cache.embeddings,
    &cache.chunks,
    &cache.tombstones,
    &bm25,
    0.7, // vector weight
    0.3, // keyword weight
    5,   // top-k
);
```

```rust
use edgehdf5_memory::strategy::{auto_select_strategy, search_with_metrics, HardwareCapabilities};

let hw = HardwareCapabilities {
    rayon_available: cfg!(feature = "parallel"),
    gpu_available: false,
    blas_available: cfg!(feature = "fast-math"),
    accelerate_available: cfg!(feature = "accelerate"),
};
let strategy = auto_select_strategy(cache.embeddings.len(), &hw);
// Returns: Scalar | SimdBruteForce | Blas | Accelerate | RayonParallel | Gpu | IvfPq

let (results, metrics) = search_with_metrics(
    &query, &cache.embeddings, &cache.norms, &cache.tombstones,
    10, strategy, &mut None,
);
println!("Strategy: {}, Time: {}µs", metrics.strategy, metrics.search_time_us);
```

```rust
let entity_a = memory.add_entity("Alice", "person", -1)?;
let entity_b = memory.add_entity("ProjectX", "project", -1)?;
memory.add_relation(entity_a, entity_b, "works_on", 1.0)?;
let relations = memory.knowledge().get_relations_from(entity_a);
```

```rust
use edgehdf5_memory::async_memory::{AsyncHDF5Memory, AsyncConfig};
use std::time::Duration;

let config = AsyncConfig {
    flush_interval: Duration::from_secs(10),
    flush_threshold: 100,
};
let mem = AsyncHDF5Memory::open_with("agent_memory.h5", config).await?;

// Saves are buffered and batched by a background writer task
mem.save(entry).await?;
mem.save_batch(entries).await?;

// Searches are offloaded to spawn_blocking
let results = mem.hybrid_search(embedding, "query".into(), 0.7, 0.3, 5).await;

// Graceful shutdown flushes remaining writes
mem.shutdown().await?;
```

The async wrapper keeps the synchronous `HDF5Memory` core on a background thread, batches saves through an mpsc channel, and auto-flushes on a configurable interval or when pending WAL entries exceed a threshold.
Benchmarked on MacBook Pro M3 Max, 384-dimensional embeddings:
| Backend | 1K vectors | 10K vectors | 100K vectors | Notes |
|---|---|---|---|---|
| Scalar (baseline) | 42µs | 410µs | 4.1ms | No SIMD, no dependencies |
| SIMD brute-force | 18µs | 175µs | 1.7ms | rustyhdf5-accel auto-dispatch |
| Apple Accelerate (cblas) | 15µs | 157µs | 1.5ms | AMX coprocessor via Accelerate.framework |
| BLAS (matrixmultiply) | 17µs | 168µs | 1.6ms | Cross-platform, no system libs |
| Rayon parallel | 35µs | 120µs | 980µs | Scales with core count |
| GPU (wgpu) | 200µs | 190µs | 650µs | Amortized; wins at scale |
| IVF-PQ | N/A (overhead > brute-force) | 850µs | 380µs | 6.2× faster than numpy; index build amortized |
Adaptive strategy thresholds:
| Collection Size | Auto-Selected Strategy |
|---|---|
| < 1,000 | Scalar brute-force |
| 1K – 10K | SIMD brute-force (or Accelerate/BLAS if available) |
| 10K – 50K | Rayon parallel (or Accelerate/BLAS) |
| 50K – 500K | GPU (if available) or Accelerate/BLAS |
| > 500K | IVF-PQ approximate search with exact reranking |
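As a rough illustration, the threshold table above can be expressed as a pure function. This is a hypothetical simplification — the real `auto_select_strategy` also weighs `HardwareCapabilities` such as BLAS and Accelerate availability:

```rust
// Strategy names mirror the table above; this sketch only models
// collection size and GPU availability.
#[derive(Debug, PartialEq)]
enum Strategy {
    Scalar,
    SimdBruteForce,
    RayonParallel,
    Gpu,
    IvfPq,
}

fn select_strategy(n_vectors: usize, gpu_available: bool) -> Strategy {
    match n_vectors {
        0..=999 => Strategy::Scalar,
        1_000..=9_999 => Strategy::SimdBruteForce,
        10_000..=49_999 => Strategy::RayonParallel,
        50_000..=499_999 if gpu_available => Strategy::Gpu,
        50_000..=499_999 => Strategy::SimdBruteForce,
        _ => Strategy::IvfPq, // > 500K: approximate search with exact reranking
    }
}

fn main() {
    assert_eq!(select_strategy(500, false), Strategy::Scalar);
    assert_eq!(select_strategy(5_000, false), Strategy::SimdBruteForce);
    assert_eq!(select_strategy(100_000, true), Strategy::Gpu);
    assert_eq!(select_strategy(1_000_000, false), Strategy::IvfPq);
    println!("ok");
}
```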
Storage efficiency with Product Quantization:
- 384-dim float32 (1,536 bytes) → 48 bytes per vector (PQ codes) = 32× compression
- 100K vectors: 146 MB (raw) → 4.6 MB (PQ codes + codebook)
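The arithmetic behind those figures, assuming 48-byte PQ codes per vector and ignoring the small codebook:

```rust
fn main() {
    let (n, dim) = (100_000u64, 384u64);
    let raw_bytes = n * dim * 4; // float32 = 4 bytes per component
    let pq_bytes = n * 48;       // one 48-byte PQ code per vector

    println!("raw: {:.1} MiB", raw_bytes as f64 / (1024.0 * 1024.0)); // ~146.5 MiB
    println!("pq:  {:.1} MiB", pq_bytes as f64 / (1024.0 * 1024.0));  // ~4.6 MiB
    println!("compression: {}x", raw_bytes / pq_bytes);               // 32x
}
```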
| Feature | Default | Description |
|---|---|---|
| `float16` | yes | Half-precision embedding storage via the `half` crate |
| `parallel` | no | Rayon-based parallel search |
| `fast-math` | no | BLAS matrix-vector multiply via `matrixmultiply` (cross-platform) |
| `accelerate` | no | Apple Accelerate framework — `cblas_sgemv` on AMX (macOS only) |
| `openblas` | no | OpenBLAS `cblas_sgemv` (Linux) |
| `gpu` | no | GPU-accelerated search via wgpu (Metal/Vulkan/DX12) |
| `async` | no | Tokio-based async wrapper with background flush (batched writes, auto-flush) |
```toml
# macOS / Apple Silicon (best performance)
features = ["float16", "accelerate", "parallel"]

# Linux server
features = ["float16", "openblas", "parallel"]

# Cross-platform (no system library dependencies)
features = ["float16", "fast-math", "parallel"]

# Maximum acceleration (macOS with GPU)
features = ["float16", "accelerate", "parallel", "gpu"]

# Async agent runtime (tokio)
features = ["float16", "accelerate", "parallel", "async"]
```

| Platform | Status | Optimized Backend |
|---|---|---|
| macOS / Apple Silicon | Primary target | Accelerate (AMX), Metal (wgpu) |
| macOS / Intel | Supported | Accelerate (SSE/AVX), Metal (wgpu) |
| Linux x86_64 | Supported | OpenBLAS, Vulkan (wgpu) |
| Linux aarch64 | Supported | OpenBLAS, Vulkan (wgpu) |
| Windows x86_64 | Supported | matrixmultiply, DX12 (wgpu) |
All platforms fall back to scalar or SIMD brute-force when no accelerated backend is compiled in.
The edgehdf5-migrate CLI converts existing SQLite agent memory databases to HDF5.
```sh
cargo install --path crates/edgehdf5-migrate
```

```sh
edgehdf5-migrate \
  --sqlite old_memory.db \
  --hdf5 agent_memory.h5 \
  --agent-id my-agent \
  --embedder openai:text-embedding-3-small \
  --embedding-dim 384 \
  --skip-deleted \
  --compression \
  --compression-level 6 \
  --verbose
```

| Flag | Default | Description |
|---|---|---|
| `--sqlite` | (required) | Path to source SQLite database |
| `--hdf5` | (required) | Path to output HDF5 file |
| `--agent-id` | `migrated` | Agent identifier to write into `/meta` |
| `--embedder` | `unknown` | Embedder model name |
| `--embedding-dim` | (auto-detect) | Embedding dimensionality; auto-detected from the first row if omitted |
| `--skip-deleted` | false | Skip rows with `deleted=1` |
| `--compression` | false | Enable deflate compression |
| `--compression-level` | 4 | Compression level (1–9) |
| `--float16` | false | Store embeddings as float16 (halves storage) |
| `--dry-run` | false | Validate migration without writing the output file |
| `--verbose` | false | Print progress and statistics |
The migration tool reads from these tables:
- `memory_chunks` — `chunk TEXT, embedding BLOB, source_channel TEXT, timestamp REAL, session_id TEXT, tags TEXT, deleted INTEGER`
- `sessions` — `id TEXT, start_idx INTEGER, end_idx INTEGER, channel TEXT, timestamp REAL, summary TEXT`
- `entities` — `id INTEGER, name TEXT, entity_type TEXT, embedding_idx INTEGER`
- `relations` — `src INTEGER, tgt INTEGER, relation TEXT, weight REAL, ts REAL`
```
agent_memory.h5
├── /meta (attributes)
│   ├── schema_version: "1.0"
│   ├── edgehdf5_version: "1.93.0"
│   ├── agent_id, embedder, embedding_dim
│   ├── chunk_size, overlap
│   └── created_at
├── /memory
│   ├── chunks: string[N] (NullPad encoded)
│   ├── embeddings: f32[N × D]
│   ├── source_channel: string[N]
│   ├── timestamps: f64[N]
│   ├── session_ids: string[N]
│   ├── tags: string[N]
│   ├── tombstones: u8[N] (0=active, 1=deleted)
│   └── norms: f32[N] (pre-computed L2 norms)
├── /sessions
│   ├── ids: string[S]
│   ├── start_idxs: i64[S]
│   ├── end_idxs: i64[S]
│   ├── channels: string[S]
│   ├── timestamps: f64[S]
│   └── summaries: string[S]
└── /knowledge_graph
    ├── entity_ids: i64[E]
    ├── entity_names: string[E]
    ├── entity_types: string[E]
    ├── entity_emb_idxs: i64[E]
    ├── relation_srcs: i64[R]
    ├── relation_tgts: i64[R]
    ├── relation_types: string[R]
    ├── relation_weights: f32[R]
    └── relation_ts: f64[R]
```
1. Add the dependency with the feature flags appropriate for your target platform (see Feature Flags).
2. Create a `MemoryConfig` with your agent's embedding model and dimensionality.
3. Use the `AgentMemory` trait for all memory operations:
   - `save()` / `save_batch()` — add memories
   - `delete()` — tombstone a memory
   - `compact()` — reclaim space when the tombstone fraction exceeds `compact_threshold`
   - `snapshot()` — create a timestamped backup copy
   - `add_session()` / `get_session_summary()` — session management
4. Choose your search path:
   - Simple: use `cosine_similarity_batch_prenorm` + `top_k` for brute-force vector search.
   - Hybrid: build a `BM25Index` and call `hybrid_search` for combined semantic + keyword retrieval.
   - Adaptive: use `auto_select_strategy` + `search_with_metrics` to automatically pick the fastest backend for your collection size.
   - Large scale: build an `IVFPQIndex` for collections above 100K vectors.
5. Persist to disk — `HDF5Memory` uses atomic writes (temp file + rename), so crashes never corrupt the file.
6. Open existing files with `HDF5Memory::open(path)` — schema validation is automatic.
`HDF5Memory` is `Send` but not `Sync`. For concurrent access, wrap it in `Arc<Mutex<HDF5Memory>>`, use a single-writer pattern with the snapshot mechanism for readers, or use `AsyncHDF5Memory` (feature `async`), which handles concurrency internally via a background writer task.
All fallible operations return `Result<T, MemoryError>`. The error variants are:

- `MemoryError::Io` — file system errors
- `MemoryError::Hdf5` — HDF5 format or parsing errors
- `MemoryError::Schema` — schema version mismatch or missing fields
- `MemoryError::NotFound` — requested entity/session does not exist
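Dispatching on the variants looks like this — sketched with a local mirror enum since the payload types are not shown above; the real `MemoryError` lives in `edgehdf5-memory`:

```rust
// Local mirror of the documented MemoryError variants, for illustration only.
#[derive(Debug)]
enum MemoryError {
    Io(String),
    Hdf5(String),
    Schema(String),
    NotFound(String),
}

fn describe(err: &MemoryError) -> &'static str {
    match err {
        MemoryError::Io(_) => "file system error",
        MemoryError::Hdf5(_) => "HDF5 format or parsing error",
        MemoryError::Schema(_) => "schema version mismatch or missing field",
        MemoryError::NotFound(_) => "requested entity/session does not exist",
    }
}

fn main() {
    let err = MemoryError::Schema("expected schema 1.0".into());
    assert_eq!(describe(&err), "schema version mismatch or missing field");
    println!("ok");
}
```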
```sh
# Default (float16 only)
cargo build -p edgehdf5-memory

# With Apple Accelerate
cargo build -p edgehdf5-memory --features accelerate

# All features (macOS)
cargo build -p edgehdf5-memory --features "float16,accelerate,parallel,gpu"

# Migration CLI
cargo build -p edgehdf5-migrate --release
```

```sh
cargo test --workspace
```

```sh
cargo bench -p edgehdf5-memory
```

EdgeHDF5 is a fully independent library — no external agent framework required.
- Import `edgehdf5-memory` directly into any Rust project
- Implement your own embedding pipeline; EdgeHDF5 stores and searches pre-computed vectors
- The `AgentMemory` trait is a simple interface: `save`, `search`, `delete`, `compact`, `snapshot`
- Works with any embedding model (OpenAI, Cohere, local models, etc.)
- The `edgehdf5-migrate` CLI converts existing SQLite agent memory databases to HDF5
EdgeHDF5 depends on the rustyhdf5 stack for HDF5 I/O:
| Crate | Role |
|---|---|
| `rustyhdf5-format` | HDF5 format definitions |
| `rustyhdf5` | Core HDF5 read/write |
| `rustyhdf5-io` | Memory-mapped file I/O |
| `rustyhdf5-accel` | SIMD acceleration primitives |
| `rustyhdf5-gpu` | wgpu GPU compute backend (optional) |
No C HDF5 library is required — rustyhdf5 is a pure Rust HDF5 implementation.
MIT