A clean, modular C++ implementation of the Vamana graph-based approximate nearest neighbor (ANN) index for students to learn, experiment with, and extend.
Implements the core algorithm from DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node (NeurIPS 2019).
For each point (in a random order, parallelized with OpenMP):
- Greedy Search: Search the current graph for the point itself, producing a candidate list of size
L - Robust Prune (α-RNG): Prune candidates to at most
Rdiverse neighbors using the alpha-RNG rule — a candidatecis kept only ifdist(node, c) ≤ α · dist(c, n)for all already-selected neighborsn - Add Edges: Set forward edges; add backward edges to each neighbor
- Degree Check: If any neighbor's degree exceeds
γR, prune its neighborhood
Per-node mutexes ensure correctness during parallel construction.
Greedy beam search starting from a fixed start node, maintaining a candidate set bounded at size L. Returns the top-K closest points found.
| Parameter | Typical Range | Description |
|---|---|---|
R |
32–64 | Max out-degree (graph connectivity) |
L (build) |
75–200 | Search list size during construction (≥ R) |
α (alpha) |
1.0–1.5 | RNG pruning threshold (> 1 keeps long-range edges) |
γ (gamma) |
1.2–1.5 | Degree multiplier triggering neighbor pruning |
L (search) |
10–200 | Search list size at query time (≥ K) |
K |
1–100 | Number of nearest neighbors to return |
graphann/
├── CMakeLists.txt # Build config (C++17, OpenMP, -O3 -march=native)
├── README.md
├── include/
│ ├── distance.h # Squared L2 distance function
│ ├── io_utils.h # fbin/ibin file loaders
│ ├── timer.h # Simple chrono-based timer
│ └── vamana_index.h # VamanaIndex class declaration
├── scripts/
│ ├── convert_vecs.py # Convert fvecs/ivecs → fbin/ibin
│ └── run_sift1m.sh # One-command SIFT1M download, build & search
└── src/
├── distance.cpp # Distance implementation
├── io_utils.cpp # File I/O implementation
├── vamana_index.cpp # Core: greedy_search, robust_prune, build, search
├── build_index.cpp # CLI: build index from data
└── search_index.cpp # CLI: search + recall/latency evaluation
src/vamana_index.cpp— the core algorithm:greedy_search(),robust_prune(),build()include/vamana_index.h— data structures (adjacency list graph, per-node locks)
Requirements: C++17 compiler with OpenMP support (GCC ≥ 7, Clang ≥ 10).
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -jThis produces two executables: build_index and search_index.
A single script downloads the SIFT1M dataset, converts it to binary format, builds a Vamana index, and runs search with recall evaluation:
./scripts/run_sift1m.shThis will:
- Build the project (cmake + make)
- Download SIFT1M (1M base vectors, 10K queries, ground truth) into
tmp/sift/ - Convert
.fvecs/.ivecsfiles to.fbin/.ibinformat intmp/ - Build a Vamana index with default parameters (R=32, L=75, α=1.2, γ=1.5)
- Run search at multiple
Lvalues and report recall@10, latency, and distance computations
Requires: curl, python3 with numpy, and a C++17 compiler with OpenMP.
fbin (float binary): Used for dataset and query vectors.
[4 bytes: uint32 npts] [4 bytes: uint32 dims] [npts * dims * 4 bytes: float32 row-major vectors]
ibin (int binary): Used for ground truth neighbor IDs.
[4 bytes: uint32 npts] [4 bytes: uint32 dims] [npts * dims * 4 bytes: uint32 row-major IDs]
Standard ANN benchmark datasets (SIFT, GIST, GloVe, etc.) are available in this format from ANN Benchmarks and big-ann-benchmarks.
./build_index \
--data /path/to/base.fbin \
--output /path/to/index.bin \
--R 32 --L 75 --alpha 1.2 --gamma 1.5./search_index \
--index /path/to/index.bin \
--data /path/to/base.fbin \
--queries /path/to/query.fbin \
--gt /path/to/gt.ibin \
--K 10 \
--L 10,20,30,50,75,100,150,200Output:
=== Search Results (K=10) ===
L Recall@10 Avg Dist Cmps Avg Latency (us) P99 Latency (us)
--------------------------------------------------------------------------
10 0.5432 320.5 125.3 412.7
20 0.7891 580.2 198.4 623.1
50 0.9234 1205.8 385.6 1102.3
100 0.9812 2280.4 702.1 2015.8
200 0.9965 4350.2 1305.7 3812.4
- Parallelism: OpenMP
parallel for schedule(dynamic)for both build (point insertion) and search (queries) - Memory layout: Contiguous row-major float arrays, 64-byte aligned for SIMD
- Vectorization:
-O3 -march=nativeauto-vectorizes the L2 distance loop — no manual intrinsics needed - Lock granularity: Per-node
std::mutex— threads only contend when updating the same node's adjacency list - No external dependencies beyond OpenMP
-
Code understanding: Use AI tools to understand the logic of algorithm and how it is a hueristic approximation of what we discussed in class
-
Beam width experiments: Try different
Lvalues during build and measure recall vs build time. What's the sweet spot? -
Medoid start node: Replace the random start node with the medoid — the point closest to the centroid of the dataset. How does this affect search recall?
-
Change the edges in index build: Run the build twice — second pass starts from the graph produced by the first. How does recall change?
-
Change the search algorithm: Plot the histogram of node degrees. Is it uniform? What happens with different
αvalues? -
Concurrent search optimization: Replace
std::vector<bool> visitedingreedy_search()with a pre-allocated scratch buffer to avoid per-query allocation.
- Subramanya et al., DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node, NeurIPS 2019