EdgeHDF5

HDF5-backed memory store for on-device AI agents.

EdgeHDF5 is a standalone Rust library that gives any AI agent fast, portable, single-file memory. It stores conversations, embeddings, knowledge graphs, and session history in a single HDF5 file — then searches them at microsecond latency using hardware-adaptive backends (Apple AMX, BLAS, SIMD, GPU).

EdgeHDF5 is framework-agnostic. It works with any agent system that produces embeddings and needs persistent memory — LangChain, custom agent loops, or standalone tools.

rustystack/edgehdf5          MIT License
Rust workspace · 2 crates · ~6.7K LOC

Why HDF5 for Agent Memory?

| Concern | SQLite + pgvector / Qdrant | EdgeHDF5 (HDF5) |
|---|---|---|
| Deployment | Requires a running database process or client library | Single .h5 file, no daemon, no network |
| Vector search | Bolted-on extension; query planner overhead | Native flat arrays with SIMD/BLAS/GPU dispatch |
| Portability | Tied to OS-level SQLite or container images | One file — copy, snapshot, ship to another device |
| Memory mapping | Page-level I/O with SQLite WAL | Direct mmap of contiguous float arrays |
| Typed storage | Blobs or JSON columns for embeddings | First-class N×D float32/float16 datasets |
| Compression | Row-level, limited | Deflate on datasets; PQ for 32× vector compression |
| On-device AI | Heavy dependency tree | Zero-network, single-file, deterministic I/O |

EdgeHDF5 targets use cases where the agent runs on the user's machine — laptops, edge devices, CI runners — and needs memory that is fast, self-contained, and inspectable.


Architecture

┌──────────────────────────────────────────────────────────────────┐
│                        HDF5Memory (lib.rs)                       │
│         AgentMemory trait: save · search · sessions · kg         │
├──────────┬──────────┬─────────────┬─────────────┬────────────────┤
│  cache   │ session  │  knowledge  │   schema    │   storage      │
│  (LRU)   │  mgmt    │  graph      │  (v1.0)     │  (mmap I/O)   │
├──────────┴──────────┴─────────────┴─────────────┴────────────────┤
│                      Search Layer                                │
│  ┌──────────────────────────────────────────────────────────┐    │
│  │              strategy.rs — Adaptive Dispatch              │    │
│  │  Scalar → SIMD → BLAS → Accelerate → Rayon → GPU → IVF-PQ│   │
│  └────┬────────┬────────┬──────────┬────────┬────────┬──────┘    │
│       │        │        │          │        │        │            │
│    vector   blas     accel      gpu     ivf+pq    hybrid         │
│    search   search   search    search   index    vec+bm25        │
│    (SIMD)   (sgemm)  (cblas)   (wgpu)   (ANN)    (RRF)          │
├──────────────────────────────────────────────────────────────────┤
│  rustyhdf5 stack: rustyhdf5 · rustyhdf5-io · rustyhdf5-accel     │
│                     rustyhdf5-format · rustyhdf5-gpu               │
└──────────────────────────────────────────────────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │   agent_memory.h5  │
                    │  /meta             │
                    │  /memory           │
                    │  /sessions         │
                    │  /knowledge_graph  │
                    └───────────────────┘

Crate Structure

| Crate | Path | Description |
|---|---|---|
| edgehdf5-memory | crates/edgehdf5-memory | Core library — memory store, search backends, knowledge graph, sessions |
| edgehdf5-migrate | crates/edgehdf5-migrate | CLI tool to migrate existing SQLite agent databases to HDF5 |

Quick Start

Add the dependency

[dependencies]
edgehdf5-memory = { git = "ssh://git@github.com/rustystack/edgehdf5.git", features = ["float16"] }

For Apple Silicon machines, enable hardware-accelerated search:

edgehdf5-memory = { git = "ssh://git@github.com/rustystack/edgehdf5.git", features = ["float16", "accelerate"] }

Create a memory store and add entries

use edgehdf5_memory::{HDF5Memory, MemoryConfig, MemoryEntry, AgentMemory};
use std::path::PathBuf;

// Configure the memory store
let config = MemoryConfig {
    path: PathBuf::from("agent_memory.h5"),
    agent_id: "my-agent".into(),
    embedder: "openai:text-embedding-3-small".into(),
    embedding_dim: 384,
    chunk_size: 512,
    overlap: 50,
    float16: true,
    compression: true,
    compression_level: 4,
    compact_threshold: 0.3,
    created_at: "2025-01-01T00:00:00Z".into(),
};

let mut memory = HDF5Memory::create(config)?;

// Add a memory entry
let entry = MemoryEntry {
    chunk: "The user prefers dark mode and uses vim keybindings.".into(),
    embedding: embed("The user prefers dark mode..."),  // your embedding fn
    source_channel: "chat".into(),
    timestamp: 1700000000.0,
    session_id: "session-001".into(),
    tags: "preference,ui".into(),
};

let id = memory.save(entry)?;

// Add a session summary
memory.add_session("session-001", 0, 5, "chat", "Discussed UI preferences")?;

Search memories

use edgehdf5_memory::vector_search::{cosine_similarity_batch_prenorm, top_k, compute_norm};

let query = embed("What are the user's UI preferences?");
let query_norm = compute_norm(&query);

// Get the in-memory cache for search
let cache = memory.cache();
let scores = cosine_similarity_batch_prenorm(
    &query,
    &cache.embeddings,
    &cache.norms,
    &cache.tombstones,
);
let results = top_k(&scores, 5);

for (idx, score) in &results {
    println!("[{:.3}] {}", score, cache.chunks[*idx]);
}
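Under the hood, the prenormalized kernel avoids recomputing candidate norms on every query: norms are stored at insert time (the /memory/norms dataset), so each query costs one dot product and one division per candidate. A minimal stand-in, not the crate's actual implementation, shows the idea:

```rust
/// L2 norm of a vector.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Illustrative stand-in for a prenormalized cosine-similarity kernel.
/// `norms` holds precomputed L2 norms, one per candidate; tombstoned
/// (deleted) entries get f32::MIN so they never rank.
fn cosine_scores_prenorm(
    query: &[f32],
    embeddings: &[Vec<f32>],
    norms: &[f32],
    tombstones: &[u8],
) -> Vec<f32> {
    let qn = l2_norm(query);
    embeddings
        .iter()
        .zip(norms.iter().zip(tombstones.iter()))
        .map(|(e, (&n, &dead))| {
            if dead == 1 || n == 0.0 || qn == 0.0 {
                f32::MIN
            } else {
                let dot: f32 = query.iter().zip(e.iter()).map(|(a, b)| a * b).sum();
                dot / (qn * n)
            }
        })
        .collect()
}

fn main() {
    let q = vec![3.0f32, 4.0];
    let emb = vec![vec![3.0f32, 4.0], vec![-4.0, 3.0]];
    let norms: Vec<f32> = emb.iter().map(|e| l2_norm(e)).collect();
    let scores = cosine_scores_prenorm(&q, &emb, &norms, &[0, 0]);
    assert!((scores[0] - 1.0).abs() < 1e-6); // identical direction
    assert!(scores[1].abs() < 1e-6);         // orthogonal
}
```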

Hybrid search (vector + BM25)

use edgehdf5_memory::bm25::BM25Index;
use edgehdf5_memory::hybrid::hybrid_search;

let bm25 = BM25Index::build(&cache.chunks, &cache.tombstones);

let results = hybrid_search(
    &query,
    "user UI preferences dark mode",
    &cache.embeddings,
    &cache.chunks,
    &cache.tombstones,
    &bm25,
    0.7,   // vector weight
    0.3,   // keyword weight
    5,     // top-k
);
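The architecture diagram labels the hybrid fusion step RRF (reciprocal rank fusion). As a self-contained sketch of how a weighted RRF combines the two rankings (names and the weighted variant are illustrative, not necessarily the crate's exact fusion):

```rust
use std::collections::HashMap;

/// Weighted reciprocal-rank fusion sketch: each list contributes
/// weight / (K + rank) to a document's fused score.
fn rrf_fuse(
    vec_ranked: &[usize], // doc ids, best-first, from vector search
    kw_ranked: &[usize],  // doc ids, best-first, from BM25
    w_vec: f32,
    w_kw: f32,
    top_k: usize,
) -> Vec<usize> {
    const K: f32 = 60.0; // conventional RRF damping constant
    let mut scores: HashMap<usize, f32> = HashMap::new();
    for (rank, &doc) in vec_ranked.iter().enumerate() {
        *scores.entry(doc).or_insert(0.0) += w_vec / (K + rank as f32 + 1.0);
    }
    for (rank, &doc) in kw_ranked.iter().enumerate() {
        *scores.entry(doc).or_insert(0.0) += w_kw / (K + rank as f32 + 1.0);
    }
    let mut fused: Vec<(usize, f32)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused.truncate(top_k);
    fused.into_iter().map(|(doc, _)| doc).collect()
}

fn main() {
    // Doc 2 tops both lists, so it must win the fused ranking.
    let fused = rrf_fuse(&[2, 0, 1], &[2, 1, 0], 0.7, 0.3, 3);
    assert_eq!(fused[0], 2);
}
```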

Adaptive strategy (auto-selects the best backend)

use edgehdf5_memory::strategy::{auto_select_strategy, search_with_metrics, HardwareCapabilities};

let hw = HardwareCapabilities {
    rayon_available: cfg!(feature = "parallel"),
    gpu_available: false,
    blas_available: cfg!(feature = "fast-math"),
    accelerate_available: cfg!(feature = "accelerate"),
};

let strategy = auto_select_strategy(cache.embeddings.len(), &hw);
// Returns: Scalar | SimdBruteForce | Blas | Accelerate | RayonParallel | Gpu | IvfPq

let (results, metrics) = search_with_metrics(
    &query, &cache.embeddings, &cache.norms, &cache.tombstones,
    10, strategy, &mut None,
);
println!("Strategy: {}, Time: {}µs", metrics.strategy, metrics.search_time_us);

Knowledge graph

let entity_a = memory.add_entity("Alice", "person", -1)?;
let entity_b = memory.add_entity("ProjectX", "project", -1)?;
memory.add_relation(entity_a, entity_b, "works_on", 1.0)?;

let relations = memory.knowledge().get_relations_from(entity_a);

Async usage (tokio)

use edgehdf5_memory::async_memory::{AsyncHDF5Memory, AsyncConfig};
use std::time::Duration;

let config = AsyncConfig {
    flush_interval: Duration::from_secs(10),
    flush_threshold: 100,
};

let mem = AsyncHDF5Memory::open_with("agent_memory.h5", config).await?;

// Saves are buffered and batched by a background writer task
mem.save(entry).await?;
mem.save_batch(entries).await?;

// Searches are offloaded to spawn_blocking
let results = mem.hybrid_search(embedding, "query".into(), 0.7, 0.3, 5).await;

// Graceful shutdown flushes remaining writes
mem.shutdown().await?;

The async wrapper keeps the synchronous HDF5Memory core on a background thread, batches saves through an mpsc channel, and auto-flushes on a configurable interval or when pending WAL entries exceed a threshold.
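That pattern is easy to see in miniature. The following self-contained sketch (standard library only; names are illustrative, not the crate's API) models the buffered channel, the threshold flush, and the drain on shutdown:

```rust
use std::sync::mpsc;
use std::thread;

/// Feed `n` entries to a background writer thread and return how many
/// were persisted: batched flushes at `flush_threshold`, plus a final
/// drain when the channel closes (graceful shutdown).
fn run_buffered_saves(n: usize, flush_threshold: usize) -> usize {
    let (tx, rx) = mpsc::channel::<String>();
    let writer = thread::spawn(move || {
        let mut pending: Vec<String> = Vec::new();
        let mut persisted = 0usize;
        for entry in rx {
            pending.push(entry);
            if pending.len() >= flush_threshold {
                // In the real store this would be one batched HDF5 write.
                persisted += pending.len();
                pending.clear();
            }
        }
        // Channel closed: flush whatever remains.
        persisted + pending.len()
    });
    for i in 0..n {
        tx.send(format!("entry-{i}")).unwrap();
    }
    drop(tx); // shutdown: closing the channel lets the writer drain and exit
    writer.join().unwrap()
}

fn main() {
    // 250 saves with threshold 100: two batched flushes plus a 50-entry drain.
    assert_eq!(run_buffered_saves(250, 100), 250);
}
```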


Performance

Benchmarked on MacBook Pro M3 Max, 384-dimensional embeddings:

| Backend | 1K vectors | 10K vectors | 100K vectors | Notes |
|---|---|---|---|---|
| Scalar (baseline) | 42µs | 410µs | 4.1ms | No SIMD, no dependencies |
| SIMD brute-force | 18µs | 175µs | 1.7ms | rustyhdf5-accel auto-dispatch |
| Apple Accelerate (cblas) | 15µs | 157µs | 1.5ms | AMX coprocessor via Accelerate.framework |
| BLAS (matrixmultiply) | 17µs | 168µs | 1.6ms | Cross-platform, no system libs |
| Rayon parallel | 35µs | 120µs | 980µs | Scales with core count |
| GPU (wgpu) | 200µs | 190µs | 650µs | Amortized; wins at scale |
| IVF-PQ | N/A (overhead > brute-force) | 850µs | 380µs | 6.2× faster than numpy; index build amortized |

Adaptive strategy thresholds:

| Collection Size | Auto-Selected Strategy |
|---|---|
| < 1,000 | Scalar brute-force |
| 1K – 10K | SIMD brute-force (or Accelerate/BLAS if available) |
| 10K – 50K | Rayon parallel (or Accelerate/BLAS) |
| 50K – 500K | GPU (if available) or Accelerate/BLAS |
| > 500K | IVF-PQ approximate search with exact reranking |
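The thresholds in the table reduce to a simple size match. This is a simplified stand-in for auto_select_strategy (the real function also weighs HardwareCapabilities; here the hardware inputs collapse to two booleans):

```rust
/// Illustrative size-based dispatch mirroring the threshold table.
/// `accel` stands in for any BLAS/Accelerate backend being compiled in.
fn pick_strategy(n_vectors: usize, accel: bool, gpu: bool) -> &'static str {
    match n_vectors {
        0..=999 => "Scalar",
        1_000..=9_999 => if accel { "Accelerate/BLAS" } else { "SimdBruteForce" },
        10_000..=49_999 => if accel { "Accelerate/BLAS" } else { "RayonParallel" },
        50_000..=499_999 => if gpu { "Gpu" } else { "Accelerate/BLAS" },
        _ => "IvfPq", // > 500K: approximate search with exact reranking
    }
}

fn main() {
    assert_eq!(pick_strategy(500, true, true), "Scalar");
    assert_eq!(pick_strategy(5_000, false, false), "SimdBruteForce");
    assert_eq!(pick_strategy(100_000, false, true), "Gpu");
    assert_eq!(pick_strategy(1_000_000, true, true), "IvfPq");
}
```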

Storage efficiency with Product Quantization:

  • 384-dim float32 → 48 bytes per vector (PQ codes) = 32× compression
  • 100K vectors: 146 MB (raw) → 4.6 MB (PQ codes + codebook)
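These numbers follow directly from the layout: 384 float32 values occupy 1536 bytes per vector, 48 one-byte PQ codes occupy 48, and 1536/48 gives the 32× ratio (codebook overhead ignored). A quick check of the arithmetic:

```rust
/// Raw bytes per vector at float32 precision.
fn bytes_per_vector_f32(dims: usize) -> usize {
    dims * 4
}

/// Compression ratio of raw storage vs. PQ codes.
fn compression_ratio(raw_bytes: usize, pq_bytes: usize) -> usize {
    raw_bytes / pq_bytes
}

/// Total size in MiB for `n` vectors of `bytes` each.
fn total_mb(n: usize, bytes: usize) -> f64 {
    (n * bytes) as f64 / (1024.0 * 1024.0)
}

fn main() {
    let raw = bytes_per_vector_f32(384);
    assert_eq!(raw, 1536);                      // 384 × 4 bytes
    assert_eq!(compression_ratio(raw, 48), 32); // 1536 / 48 = 32×

    // 100K vectors: ≈146 MB raw vs ≈4.6 MB of PQ codes.
    assert!((total_mb(100_000, raw) - 146.48).abs() < 0.1);
    assert!((total_mb(100_000, 48) - 4.58).abs() < 0.01);
}
```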

Feature Flags

| Feature | Default | Description |
|---|---|---|
| float16 | yes | Half-precision embedding storage via the half crate |
| parallel | no | Rayon-based parallel search |
| fast-math | no | BLAS matrix-vector multiply via matrixmultiply (cross-platform) |
| accelerate | no | Apple Accelerate framework — cblas_sgemv on AMX (macOS only) |
| openblas | no | OpenBLAS cblas_sgemv (Linux) |
| gpu | no | GPU-accelerated search via wgpu (Metal/Vulkan/DX12) |
| async | no | Tokio-based async wrapper with background flush (batched writes, auto-flush) |

Recommended feature sets

# macOS / Apple Silicon (best performance)
features = ["float16", "accelerate", "parallel"]

# Linux server
features = ["float16", "openblas", "parallel"]

# Cross-platform (no system library dependencies)
features = ["float16", "fast-math", "parallel"]

# Maximum acceleration (macOS with GPU)
features = ["float16", "accelerate", "parallel", "gpu"]

# Async agent runtime (tokio)
features = ["float16", "accelerate", "parallel", "async"]

Platform Support

| Platform | Status | Optimized Backend |
|---|---|---|
| macOS / Apple Silicon | Primary target | Accelerate (AMX), Metal (wgpu) |
| macOS / Intel | Supported | Accelerate (SSE/AVX), Metal (wgpu) |
| Linux x86_64 | Supported | OpenBLAS, Vulkan (wgpu) |
| Linux aarch64 | Supported | OpenBLAS, Vulkan (wgpu) |
| Windows x86_64 | Supported | matrixmultiply, DX12 (wgpu) |

All platforms fall back to scalar or SIMD brute-force when no accelerated backend is compiled in.


Migration from SQLite

The edgehdf5-migrate CLI converts existing SQLite agent memory databases to HDF5.

Install

cargo install --path crates/edgehdf5-migrate

Usage

edgehdf5-migrate \
  --sqlite old_memory.db \
  --hdf5 agent_memory.h5 \
  --agent-id my-agent \
  --embedder openai:text-embedding-3-small \
  --embedding-dim 384 \
  --skip-deleted \
  --compression \
  --compression-level 6 \
  --verbose

Options

| Flag | Default | Description |
|---|---|---|
| --sqlite | (required) | Path to source SQLite database |
| --hdf5 | (required) | Path to output HDF5 file |
| --agent-id | migrated | Agent identifier to write into /meta |
| --embedder | unknown | Embedder model name |
| --embedding-dim | (auto-detect) | Embedding dimensionality; auto-detected from first row if omitted |
| --skip-deleted | false | Skip rows with deleted=1 |
| --compression | false | Enable deflate compression |
| --compression-level | 4 | Compression level (1–9) |
| --float16 | false | Store embeddings as float16 (halves storage) |
| --dry-run | false | Validate migration without writing the output file |
| --verbose | false | Print progress and statistics |

Expected SQLite schema

The migration tool reads from these tables:

  • memory_chunks: chunk TEXT, embedding BLOB, source_channel TEXT, timestamp REAL, session_id TEXT, tags TEXT, deleted INTEGER
  • sessions: id TEXT, start_idx INTEGER, end_idx INTEGER, channel TEXT, timestamp REAL, summary TEXT
  • entities: id INTEGER, name TEXT, entity_type TEXT, embedding_idx INTEGER
  • relations: src INTEGER, tgt INTEGER, relation TEXT, weight REAL, ts REAL

HDF5 File Schema (v1.0)

agent_memory.h5
├── /meta                          (attributes)
│   ├── schema_version: "1.0"
│   ├── edgehdf5_version: "1.93.0"
│   ├── agent_id, embedder, embedding_dim
│   ├── chunk_size, overlap
│   └── created_at
├── /memory
│   ├── chunks:          string[N]       (NullPad encoded)
│   ├── embeddings:      f32[N × D]
│   ├── source_channel:  string[N]
│   ├── timestamps:      f64[N]
│   ├── session_ids:     string[N]
│   ├── tags:            string[N]
│   ├── tombstones:      u8[N]           (0=active, 1=deleted)
│   └── norms:           f32[N]          (pre-computed L2 norms)
├── /sessions
│   ├── ids:             string[S]
│   ├── start_idxs:      i64[S]
│   ├── end_idxs:        i64[S]
│   ├── channels:        string[S]
│   ├── timestamps:      f64[S]
│   └── summaries:       string[S]
└── /knowledge_graph
    ├── entity_ids:       i64[E]
    ├── entity_names:     string[E]
    ├── entity_types:     string[E]
    ├── entity_emb_idxs:  i64[E]
    ├── relation_srcs:    i64[R]
    ├── relation_tgts:    i64[R]
    ├── relation_types:   string[R]
    ├── relation_weights: f32[R]
    └── relation_ts:      f64[R]

Integration Guide

For teams adding EdgeHDF5 to an existing agent

  1. Add the dependency with the feature flags appropriate for your target platform (see Feature Flags).

  2. Create a MemoryConfig with your agent's embedding model and dimensionality.

  3. Use the AgentMemory trait for all memory operations:

    • save() / save_batch() — add memories
    • delete() — tombstone a memory
    • compact() — reclaim space when tombstone fraction exceeds compact_threshold
    • snapshot() — create a timestamped backup copy
    • add_session() / get_session_summary() — session management
  4. Choose your search path:

    • Simple: Use cosine_similarity_batch_prenorm + top_k for brute-force vector search.
    • Hybrid: Build a BM25Index and call hybrid_search for combined semantic + keyword retrieval.
    • Adaptive: Use auto_select_strategy + search_with_metrics to automatically pick the fastest backend for your collection size.
    • Large scale: Build an IVFPQIndex for collections above 100K vectors.
  5. Persist to disk — HDF5Memory uses atomic writes (temp file + rename) so crashes never corrupt the file.

  6. Open existing files with HDF5Memory::open(path) — schema validation is automatic.

Thread safety

HDF5Memory is Send but not Sync. For concurrent access, wrap in Arc<Mutex<HDF5Memory>>, use a single-writer pattern with the snapshot mechanism for readers, or use AsyncHDF5Memory (feature async) which handles concurrency internally via a background writer task.
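A minimal sketch of the Arc<Mutex<...>> pattern, using a stand-in struct in place of HDF5Memory (the real type is the crate's; everything else here is illustrative):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for HDF5Memory, which is Send but not Sync: shared access
/// must be serialized, e.g. through a Mutex as below.
struct Store {
    entries: Vec<String>,
}

/// Spawn `threads` writers, each saving `per_thread` entries through the
/// shared lock, and return the final entry count.
fn concurrent_saves(threads: usize, per_thread: usize) -> usize {
    let memory = Arc::new(Mutex::new(Store { entries: Vec::new() }));
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let mem = Arc::clone(&memory);
            thread::spawn(move || {
                for i in 0..per_thread {
                    // lock() serializes writers: only one thread touches
                    // the store at a time, satisfying the !Sync constraint.
                    mem.lock().unwrap().entries.push(format!("t{t}-{i}"));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let n = memory.lock().unwrap().entries.len();
    n
}

fn main() {
    assert_eq!(concurrent_saves(4, 100), 400); // no saves lost under contention
}
```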

Error handling

All fallible operations return Result<T, MemoryError>. The error variants are:

  • MemoryError::Io — file system errors
  • MemoryError::Hdf5 — HDF5 format or parsing errors
  • MemoryError::Schema — schema version mismatch or missing fields
  • MemoryError::NotFound — requested entity/session does not exist
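To illustrate the matching pattern, here is a stand-in enum mirroring the documented variants; the payload types are assumptions for the sketch, not the crate's actual definitions:

```rust
/// Stand-in mirroring the documented MemoryError variants.
/// Payload types here are illustrative assumptions.
#[derive(Debug)]
enum MemoryError {
    Io(String),                                  // file system errors
    Hdf5(String),                                // HDF5 format or parsing errors
    Schema { expected: String, found: String },  // schema version mismatch
    NotFound(String),                            // missing entity/session
}

/// Exhaustive match over the error variants, as calling code would do.
fn describe(err: &MemoryError) -> String {
    match err {
        MemoryError::Io(e) => format!("filesystem error: {e}"),
        MemoryError::Hdf5(e) => format!("HDF5 format error: {e}"),
        MemoryError::Schema { expected, found } => {
            format!("schema mismatch: expected {expected}, found {found}")
        }
        MemoryError::NotFound(id) => format!("not found: {id}"),
    }
}

fn main() {
    let err = MemoryError::NotFound("session-001".to_string());
    assert_eq!(describe(&err), "not found: session-001");
}
```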

Building

# Default (float16 only)
cargo build -p edgehdf5-memory

# With Apple Accelerate
cargo build -p edgehdf5-memory --features accelerate

# All features (macOS)
cargo build -p edgehdf5-memory --features "float16,accelerate,parallel,gpu"

# Migration CLI
cargo build -p edgehdf5-migrate --release

Running tests

cargo test --workspace

Running benchmarks

cargo bench -p edgehdf5-memory

EdgeHDF5 Integration

EdgeHDF5 is a fully independent library — no external agent framework required.

  • Import edgehdf5-memory directly into any Rust project
  • Implement your own embedding pipeline; EdgeHDF5 stores and searches pre-computed vectors
  • The AgentMemory trait is a simple interface: save, search, delete, compact, snapshot
  • Works with any embedding model (OpenAI, Cohere, local models, etc.)
  • The edgehdf5-migrate CLI converts existing SQLite agent memory databases to HDF5

Dependencies

EdgeHDF5 depends on the rustyhdf5 stack for HDF5 I/O:

| Crate | Role |
|---|---|
| rustyhdf5-format | HDF5 format definitions |
| rustyhdf5 | Core HDF5 read/write |
| rustyhdf5-io | Memory-mapped file I/O |
| rustyhdf5-accel | SIMD acceleration primitives |
| rustyhdf5-gpu | wgpu GPU compute backend (optional) |

No C HDF5 library is required — rustyhdf5 is a pure Rust HDF5 implementation.


License

MIT
