v0.1.1
Fast HNSW search in Clojure with no third-party dependencies, delivering sub-millisecond results via a SIMD- and BLAS-optimized distance core that achieves a 4–16× speedup. The strength of this solution is that it targets the main runtime bottleneck, leverages data-level parallelism, minimizes overhead, and aligns computation with the hardware's strengths, significantly reducing query latency and increasing throughput.
Background: About 3–4 years ago I began exploring how to implement HNSW in Clojure without third-party dependencies and with full customizability, because AI keeps changing, and so do the needs.
Why accelerating distance computation matters
- In HNSW, 70–95% of query time is spent computing distances/inner products.
- SIMD + Java Vector API: 4–16× fewer instructions, fewer cache misses, lower latency.
- JBLAS (BLAS kernels): Near-peak CPU throughput with SIMD + multithreading for large batches.
- Result: Higher QPS, lower P95/P99 latency.
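To make the bottleneck concrete, here is a minimal scalar cosine-distance kernel over double-arrays, the hot loop that the SIMD (Java Vector API) and BLAS kernels replace. This is an illustrative sketch, not the library's actual distance function:

```clojure
;; Scalar baseline: one multiply-accumulate per element, per pass.
;; A Vector API / BLAS kernel does the same work on 4-16 lanes at once.
(defn cosine-distance ^double [^doubles a ^doubles b]
  (let [n (alength a)]
    (loop [i 0, dot 0.0, na 0.0, nb 0.0]
      (if (< i n)
        (let [x (aget a i) y (aget b i)]
          (recur (unchecked-inc i)
                 (+ dot (* x y))   ; accumulate a . b
                 (+ na (* x x))    ; accumulate |a|^2
                 (+ nb (* y y))))  ; accumulate |b|^2
        (- 1.0 (/ dot (Math/sqrt (* na nb))))))))
```

Since a query visits hundreds of nodes and each visit runs this loop over 768 elements, vectorizing it is where the 4–16× end-to-end speedup comes from.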
I use the Old Hungarian Károli Bible for semantic similarity research, where pairing Hebrew and Greek meanings is a challenging task (and it was). In this environment it becomes immediately clear when something is not working. I tested on 31,173 verses, the full Károli Bible (mpnet-v2, 768-dim embeddings, 492.9 MB index):
- Peak Performance: 0.186 ms/query (5,376 QPS) with 20 threads
- Average (5 runs): 0.195 ms/query (5,128 QPS)
- Consistency: <5% variance across runs
| Configuration | QPS | Latency (ms) | vs Target | Status |
|---|---|---|---|---|
| Best Run | 5,376 | 0.186 | +14% | ✅ Exceeded |
| Run 2 | 5,236 | 0.191 | +10% | ✅ Exceeded |
| Run 3 | 4,773 | 0.209 | +1% | ✅ Met |
| Average | 5,128 | 0.195 | +9% | ✅ Exceeded |
| Original Target | 4,719 | 0.212 | - | Baseline |
- Single query: 1.093 ms/query (915 QPS)
- Speedup: 5.8x with 20-thread parallelization
- vs Linear search: 86x faster (single), 500x faster (parallel)
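For reference, the linear-search baseline in that comparison amounts to scoring the query against every stored vector and keeping the top-k. A sketch, where the `[id vector]` pair shape and the `euclidean` helper are illustrative assumptions rather than the library's API:

```clojure
;; Helper: squared-error Euclidean distance over double-arrays.
(defn euclidean ^double [^doubles a ^doubles b]
  (Math/sqrt (areduce a i acc 0.0
                      (let [d (- (aget a i) (aget b i))]
                        (+ acc (* d d))))))

;; Brute-force k-NN: O(n * d) work per query, no index needed.
;; HNSW replaces the full scan with a greedy graph walk that
;; touches only a few hundred candidates.
(defn linear-search [vectors query k dist-fn]
  (->> vectors
       (map (fn [[id v]] [id (dist-fn query v)]))
       (sort-by second)
       (take k)))
```

Usage: `(linear-search [["a" (double-array [0.0 0.0])] ["b" (double-array [1.0 1.0])]] (double-array [0.1 0.1]) 1 euclidean)` returns `"a"` as the closest id.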
| Approach | Best for | Key benefits |
|---|---|---|
| 🧮 Java Vector API (SIMD) | Per-query, low-latency search | 4–16× fewer instructions, branchless tight loops, cache-friendly arrays, no JNI overhead |
| 📦 JBLAS (native BLAS) | Large batches, high throughput | Optimized BLAS kernels (OpenBLAS/MKL), GEMM batching, multithreaded + SIMD |
Impact:
- 4–16× faster distance kernel (CPU/ISA dependent)
- Lower P95/P99 latency, higher QPS
Build navigable graph structures, search by hopping from node to node.
| Implementation | Namespace | Description | Build | Search | Recall |
|---|---|---|---|---|---|
| Pure HNSW | hnsw.hnsw-search | Original HNSW algorithm | Extremely slow (300s+) ** | 0.2ms | 99%+ |
| Ultra Fast HNSW | hnsw.ultra-fast | Optimized HNSW | Medium | 0.3ms | 98%+ |
| Ultra Optimized | hnsw.ultra-optimized | Max optimized | Medium | 0.2ms | 98%+ |
** I'm working on this [WIP]
Divide data into partitions, search in parallel.
| Implementation | Namespace | Description | Build | Search | Recall |
|---|---|---|---|---|---|
| Partitioned HNSW | hnsw.partitioned-hnsw | Multiple HNSW graphs in parallel | 6-7s | 0.4-0.6ms | 91% |
| Lightning | hnsw.lightning | Simple partitions | 61ms | 1.5ms | 29% ** |
| IVF-FLAT | hnsw.ivf-flat | K-means + flat search | 2-5s | 5-10ms | 95%+ |
** A parallel Lightning variant (similar to kNN++) builds its index in 5-7s and reaches very good recall at low-millisecond search times.
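The partitioned idea fits in a few lines: shard the data, search every shard in parallel with `pmap`, then merge the per-shard top-k lists. The shard-local search is stubbed as a linear scan here; the real implementations build an HNSW graph (or k-means cells) per shard:

```clojure
;; Distance helper (squared Euclidean is enough for ranking).
(defn sq-dist ^double [^doubles a ^doubles b]
  (areduce a i acc 0.0
           (let [d (- (aget a i) (aget b i))]
             (+ acc (* d d)))))

;; Shard-local top-k, stubbed as a flat scan for illustration.
(defn search-shard [shard query k dist-fn]
  (->> shard
       (map (fn [[id v]] [id (dist-fn query v)]))
       (sort-by second)
       (take k)))

;; Search all shards in parallel, then merge the partial results.
(defn partitioned-search [shards query k dist-fn]
  (->> shards
       (pmap #(search-shard % query k dist-fn))
       (apply concat)
       (sort-by second)
       (take k)))
```

Because each shard returns its own top-k, the merge step never misses a true neighbor within a shard; recall is lost only when a query's neighbors are split across under-searched shards, which is why the recall numbers above vary with the partitioning scheme.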
Group similar vectors using hash functions.
| Implementation | Namespace | Description | Build | Search | Recall |
|---|---|---|---|---|---|
| Hybrid LSH | hnsw.hybrid-lsh | Multi-probe LSH | 0.5-1s | 2-5ms | 45% |
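The grouping step can be illustrated with classic random-hyperplane LSH: each hash bit is the sign of the dot product with a random hyperplane, so vectors at a small angle tend to land in the same bucket. A sketch with hypothetical helper names (the actual multi-probe scheme in hnsw.hybrid-lsh is more involved):

```clojure
;; n-bits random Gaussian hyperplanes of the given dimensionality.
;; Fixed seed so the same key function is reproducible across runs.
(defn rand-hyperplanes [n-bits dim]
  (let [^java.util.Random rng (java.util.Random. 42)]
    (vec (repeatedly n-bits
                     #(double-array (repeatedly dim (fn [] (.nextGaussian rng))))))))

;; Hash v to an n-bit integer: bit i is the sign of (v . plane_i).
(defn lsh-key [planes ^doubles v]
  (reduce (fn [acc ^doubles p]
            (let [dot (areduce p i s 0.0 (+ s (* (aget p i) (aget v i))))]
              (+ (* 2 acc) (if (pos? dot) 1 0))))
          0
          planes))
```

Multi-probe LSH additionally checks neighboring buckets (keys at small Hamming distance) to recover candidates that fell just across a hyperplane, which is how recall is pushed above what a single lookup gives.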
Combine multiple techniques.
| Implementation | Namespace | Description | Build | Search | Recall |
|---|---|---|---|---|---|
| IVF-HNSW | hnsw.ivf-hnsw | IVF + HNSW graph | 30-90s | 2-5ms | 85-90% |
First reduce dimensions, then search in reduced space. (my favorite)
| Implementation | Namespace | Description | Build | Search | Recall |
|---|---|---|---|---|---|
| P-HNSW (PCAF) | hnsw.pcaf | Random Projection + HNSW | ~5s | 1-2ms | 85-90% |
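The reduction step can be sketched as a dense random Gaussian projection: by the Johnson-Lindenstrauss lemma, projecting 768-dim vectors down to a few dozen dimensions roughly preserves pairwise distances, so search in the reduced space stays accurate while each distance computation gets proportionally cheaper. The helper names below are illustrative, not the hnsw.pcaf API:

```clojure
;; out-dim x in-dim random Gaussian matrix, scaled by 1/sqrt(out-dim)
;; so projected distances approximate the originals in expectation.
(defn projection-matrix [out-dim in-dim]
  (let [^java.util.Random rng (java.util.Random. 7)
        scale (/ 1.0 (Math/sqrt out-dim))]
    (vec (repeatedly out-dim
                     #(double-array (repeatedly in-dim
                                                (fn [] (* scale (.nextGaussian rng)))))))))

;; Map v (in-dim) to the reduced out-dim space: one dot product per row.
(defn project ^doubles [proj ^doubles v]
  (double-array
   (map (fn [^doubles row]
          (areduce row i acc 0.0 (+ acc (* (aget row i) (aget v i)))))
        proj)))
```

The same fixed matrix must be used at both index-build time and query time; a per-query matrix would destroy the distance-preservation guarantee.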
Automatically pick one verse → compare it with all other verses → return the top 5 most similar items.
🔍 Search> love 5
🎯 SEARCHING FOR: "love" (top 5 results)
============================================================
✅ Found 179 verses containing 'love'
📖 First match: [Gen 21:23]
"Now therefore swear unto me here by God that thou wilt not deal falsely..."
🔗 Top 5 semantically similar verses:
------------------------------------------------------------
⏱️ Search completed in 10.12 ms
1. [Josh 2:12] (84.0%)
"Now therefore, I pray you, swear unto me by the LORD..."
2. [Ps 71:18] (83.4%)
"Now also when I am old and greyheaded, O God, forsake me not..."
3. [2Sam 16:19] (81.2%)
"And again, whom should I serve? should I not serve..."
4. [1Kings 9:6] (81.0%)
"But if ye shall at all turn from following me..."
5. [Gen 27:29] (81.0%)
"Let people serve thee, and nations bow down to thee..."
🔍 Search> beginning 10
🎯 SEARCHING FOR: "beginning" (top 10 results)
============================================================
✅ Found 8 verses containing 'beginning'
📖 First match: [Gen 1:1]
"In the beginning God created the heaven and the earth."
⏱️ Search completed in 4.45 ms
1. [Gen 2:4] (92.2%)
"These are the generations of the heavens and of the earth..."
2. [Ps 124:8] (81.2%)
"Our help is in the name of the LORD, who made heaven and earth."
3. [Heb 1:10] (80.8%)
"And, Thou, Lord, in the beginning hast laid the foundation..."
- Ultra-optimized distance functions (SIMD via Java Vector API + JBLAS)
- Sub-millisecond search on 31K+ vectors
- Interactive search shell with colored output
- Save/load index to disk
- Multiple distance metrics (Cosine, Euclidean)
- Parallel search with linear scaling
- Performance benchmarking tools
- Delete vectors from index
- Filter functions (by metadata, categories)
- Update existing vectors
- Incremental index building
- REST API server
- Distributed index support
- GPU acceleration
- Memory-mapped indices
src/hnsw/
├── ultra-optimized.clj # Main HNSW implementation
├── ultra-fast.clj # SIMD-optimized distance functions
└── index-io.clj # Save/load functionality
src/hnsw/
├── core.clj # Examples and demos
├── graph.clj # Original HNSW algorithm
├── simd.clj # SIMD support detection
├── simd-optimized.clj # SIMD implementations
├── filtered.clj # Filter support (WIP)
├── benchmark.clj # Benchmarking utilities
└── api.clj # Public API wrapper
scripts/
├── build_index_complete.clj # Build 31K index
├── interactive_search.clj # Interactive shell
└── test_*.clj # Performance tests
final-test-bench/
└── hnsw-clj-our-solution/ # Production-ready demo
├── interactive_search_31k.clj
├── PERFORMANCE_ANALYSIS*.md
└── run_interactive_31k.sh
;; 1. Build index
(require '[hnsw.ultra-optimized :as opt])
(require '[hnsw.index-io :as io])
(def vectors [["id1" (double-array [0.1 0.2 ...])]
["id2" (double-array [0.4 0.5 ...])]])
(def index (opt/build-index vectors))
(io/save-index index "my-index.hnsw")
;; 2. Load and search
(def index (io/load-index "my-index.hnsw" opt/fast-cosine-distance))
(def results (opt/search index query-vector 10))

| Dataset | Vectors | Dims | Build Time | Search (k=10) | QPS | Single Thread |
|---|---|---|---|---|---|---|
| Small | 1,000 | 128 | ~7s* | 0.05 ms | 20,000 | (0.1 ms / 10,000 QPS) |
| Medium | 10,000 | 768 | ~69s* | 0.15 ms | 6,667 | (0.5 ms / 2,000 QPS) |
| Bible | 31,173 | 768 | 215s | 0.212 ms | 4,719 | (1.199 ms / 834 QPS) |
| Large | 100,000 | 768 | ~690s* | 0.50 ms | 2,000 | (2.5 ms / 400 QPS) |
Performance with 20 parallel threads (single thread results in parentheses)
*Build times estimated based on Bible dataset (only Bible dataset values are measured)
Identical parameters: M=16, ef_construction=200, ef_search=50, k=10, vectors=31,173, dimensions=768
Single difference: Distance metric - Python (Cosine) vs Clojure (Euclidean)
| Metric | Python hnswlib | Clojure HNSW-CLJ | Winner | Difference |
|---|---|---|---|---|
| Build Time | 25-30s | 215s | 🐍 Python | 8x faster |
| Single Thread QPS | 1,125 | 834 | 🐍 Python | 1.3x faster |
| Multi Thread QPS (20) | 1,330 | 4,719 | 🔧 Clojure | 3.5x faster |
| Average Latency | 0.85ms | 0.212ms | 🔧 Clojure | 4x better |
| Parallel Scaling | 1.3x (GIL) | 5.7x | 🔧 Clojure | True parallelism |
Use Python hnswlib when:
- Fast prototyping needed
- Building indices quickly (8x faster)
- Integration with Python ML ecosystem
- Single-threaded applications
Use Clojure HNSW-CLJ when:
- Production serving with high QPS requirements
- Real-time applications needing low latency
- High concurrency environments
- Scalable microservices
# 1. Build index with Python (fast: 25-30 sec)
import hnswlib
index = hnswlib.Index(space='cosine', dim=768)
index.init_index(max_elements=31173, ef_construction=200, M=16)
index.add_items(vectors)
index.save_index('index.bin')

;; 2. Serve with Clojure (high performance: 4,700+ QPS)
(def index (load-index "index.bin"))
(serve-api index {:port 8080})

This combines Python's fast build time with Clojure's excellent serving performance!
For detailed comparison, see PARAMETER_COBENCHMARK_RESULTS_ACTUAL.md and BENCHMARK_RESULTS_PYTHON.md
- Clojure 1.11+
- Java 21+ (for Vector API)
- 3GB heap for 31K index
MIT