
Add diskann-benchmark-multi-vector crate #1027

Open
suri-kumkaran wants to merge 8 commits into main from users/suryangupta/multi-vector-benchmark

Conversation

Contributor

@suri-kumkaran suri-kumkaran commented May 6, 2026

New benchmark crate for the Chamfer and MaxSim multi-vector distance
operations from diskann-quantization. Plugs into the existing
diskann-benchmark-runner regression workflow; mirrors the structure of
diskann-benchmark-simd.

What's new

  • Workspace member diskann-benchmark-multi-vector + benchmark-multi-vector binary.
  • Four registered regression kernels — (f32, f16) × (optimized, reference):
    • optimized drives QueryComputer (architecture-dispatched SIMD).
    • reference drives the Chamfer / MaxSim fallback used by the
      multi_vector unit tests.
  • Example configs: a benchmark matrix (multi-vector.json), a smoke
    config consumed by integration tests (test.json), and a default 5%
    tolerance (tolerance.json).
  • README documenting the kernels, the ns/IP @ Dim metric, and the
    run / check verify / check run workflow.

Design notes

  • Two-axis dispatch (element type × implementation). Arch selection
    is delegated to QueryComputer; not re-exposed here.
  • QueryComputer::new is hoisted out of the timed loop to preserve
    the production amortization model. Reference path is hoisted
    symmetrically.
  • ns/IP @ Dim = min_latency_µs * 1000 / (Q * D * loops).
    Invariant under Q, D, and loop budget; approximately linear in
    Dim. Per-shape regression detection is robust because the Dim
    factor cancels.
  • Repetitive boilerplate collapsed into macros so adding an element type
    is a one-liner.

Tests

8 unit tests (tolerance boundary, mismatched-run guard, zero-baseline
rejection, rendered table content) + 2 integration tests driving the
CLI.

@suri-kumkaran suri-kumkaran self-assigned this May 6, 2026
@suri-kumkaran suri-kumkaran requested review from a team and Copilot May 6, 2026 21:17
@suri-kumkaran suri-kumkaran linked an issue May 6, 2026 that may be closed by this pull request
4 tasks
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new diskann-benchmark-multi-vector crate to benchmark and regression-check multi-vector distance ops (Chamfer, MaxSim) across f32/f16 and optimized/reference implementations, integrated with diskann-benchmark-runner.

Changes:

  • Introduces benchmark library with dispatcher registration, benchmark execution, and regression checking logic.
  • Adds benchmark-multi-vector CLI + integration tests and example job/tolerance configs used by the runner workflow.
  • Updates workspace to include the new crate.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `diskann-benchmark-multi-vector/src/lib.rs` | Core benchmark/regression logic, dispatch rules, fixtures, and unit tests. |
| `diskann-benchmark-multi-vector/src/bin.rs` | CLI entrypoint wiring registry + basic integration tests. |
| `diskann-benchmark-multi-vector/examples/tolerance.json` | Default tolerance config consumed by check verify/run. |
| `diskann-benchmark-multi-vector/examples/test.json` | Small smoke input for integration tests. |
| `diskann-benchmark-multi-vector/examples/multi-vector.json` | Full benchmark matrix input example. |
| `diskann-benchmark-multi-vector/README.md` | Documents kernels, metric normalization, and runner workflows. |
| `diskann-benchmark-multi-vector/Cargo.toml` | New crate manifest + deps/bin target. |
| `Cargo.toml` | Adds new crate to workspace members. |


Comment on lines +336 to +338

```rust
#[derive(Debug, Error)]
#[error("this kernel handles a different implementation than {0}")]
pub(crate) struct ImplementationMismatch(Implementation);
```

Comment on lines +273 to +284

```rust
match change {
    Ok(change) => {
        row.insert(format!("{:.3} %", change * 100.0), 6);
        if change > c.tolerance.min_time_regression.get() {
            row.insert("FAIL", 7);
        }
    }
    Err(err) => {
        row.insert("invalid", 6);
        row.insert(err, 7);
    }
}
```

Comment on lines +501 to +508

```rust
match relative_change(before_min, after_min) {
    Ok(change) => {
        if change > tolerance.min_time_regression.get() {
            passed = false;
        }
    }
    Err(_) => passed = false,
};
```

Comment on lines +580 to +585

```rust
let min_latency = r
    .latencies
    .iter()
    .min()
    .copied()
    .unwrap_or(MicroSeconds::new(u64::MAX));
```

Comment on lines +609 to +629

```rust
fn run_loops<F>(run: &Run, mut body: F) -> RunResult
where
    F: FnMut(),
{
    let mut latencies = Vec::with_capacity(run.num_measurements.get());

    for _ in 0..run.num_measurements.get() {
        let start = std::time::Instant::now();
        for _ in 0..run.loops_per_measurement.get() {
            body();
        }
        latencies.push(start.elapsed().into());
    }

    let percentiles = percentiles::compute_percentiles(&mut latencies).unwrap();
    RunResult {
        run: run.clone(),
        latencies,
        percentiles,
    }
}
```

```rust
    .join("tolerance.json");

let stdout = run_check_test(&input_path, &tolerance_path);
println!("stdout = {}", stdout);
```

codecov-commenter commented May 7, 2026

Codecov Report

❌ Patch coverage is 85.55759% with 79 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.44%. Comparing base (c50fb2b) to head (6b33719).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-benchmark-multi-vector/src/lib.rs | 84.78% | 75 Missing ⚠️ |
| diskann-benchmark-multi-vector/src/bin.rs | 92.59% | 4 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (85.55%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files


```diff
@@            Coverage Diff             @@
##             main    #1027      +/-   ##
==========================================
- Coverage   89.51%   89.44%   -0.08%     
==========================================
  Files         460      462       +2     
  Lines       85466    85916     +450     
==========================================
+ Hits        76508    76849     +341     
- Misses       8958     9067     +109     
```

| Flag | Coverage Δ |
| --- | --- |
| miri | 89.44% <85.55%> (-0.08%) ⬇️ |
| unittests | 89.29% <85.55%> (-0.08%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann-benchmark-multi-vector/src/bin.rs | 92.59% <92.59%> (ø) |
| diskann-benchmark-multi-vector/src/lib.rs | 84.78% <84.78%> (ø) |

... and 11 files with indirect coverage changes


Contributor

@hildebrandmw hildebrandmw left a comment


Thanks Suryansh, this will be useful for performance optimization. One big question I had is whether or not this needs to be its own crate, or whether it can be folded into diskann-benchmark (which will need some work to put the normal indexing code behind a feature gate to keep compile times low) or as a binary in diskann-quantization directly.

The pattern established by diskann-benchmark-simd is a little unfortunate so now is the time to think about our approach before we have to deal with a crate proliferation. We also might be able to bury all benchmark related binary crates in a subdirectory to keep the top level organized.

In either case, I don't think we should publish this to crates.io - and we should probably stop publishing diskann-benchmark-simd as well.

```rust
    ($f:ident, $field:tt, $($expr:tt)*) => {
        writeln!($f, "{:>18}: {}", $field, $($expr)*)
    }
}
```
Contributor


Since #996, manual alignment shouldn't be needed any more.

```rust
macro_rules! register {
    ($impl:ident, $t:ty, $tag:literal) => {
        dispatcher.register_regression($tag, Kernel::<$impl, $t>::new());
    };
```
Contributor


Nit: The macro is offering only a moderate benefit in terseness. Might be better off without it?

```rust
}

// Optimized (architecture-dispatched QueryComputer).
register!(Optimized, f32, "multi-vector-op-f32-optimized");
```
Contributor


We should probably provide a way of overriding the runtime architecture to test both AVX512 and AVX2 kernels on the same machine. Otherwise, we have no way to compare such implementations against each other ...

```rust
        }
    }
};
}
```
Contributor


It might be cleaner to get rid of DispatchRule and use an entirely bespoke, smaller trait, like:

```rust
impl TryFrom<Implementation> for Optimized {
    type Error = FailureScore;
    fn try_from(i: Implementation) -> Result<Self, Self::Error> {
        match i {
            Implementation::Optimized => Ok(Self),
            _ => Err(FailureScore(1)),
        }
    }
}
```

and write matching via TryFrom.

```rust
const RNG_SEED: u64 = 0x12345;

struct Data<T> {
    query_data: Box<[T]>,
```
Contributor


We spend so much time writing matrix abstractions. Why not use them? If the constructors are insufficient, that's a sign we have work to do!

```rust
        results.push(result);
    }
    Ok(results)
}
```
Contributor


These two run loops are nearly identical. Perhaps a tweak to factor out the common parts? Maybe consider keeping the loop nest the same, but taking a trait object (to reduce compile times) for the distance implementation?

```rust
}

impl_kernel_for!(f32);
impl_kernel_for!(f16);
```
Contributor


Since the abstractions these macros are stamping were also introduced in this PR, consider whether or not there's a simpler way to express things via generics that doesn't require macros.

| `multi-vector-op-f16-reference` | `f16` | `Chamfer` / `MaxSim` |

The **optimized** path constructs a `QueryComputer` once per shape (which
internally selects the best available SIMD kernel for the host) and calls
Contributor


If we're using this as a development tool ... it's probably a good idea to allow an override for the architecture.

```shell
bench run --input-file examples/multi-vector.json --output-file before.json
```

### Regression check workflow
Contributor


The section on regression checks seems to be just a copy+paste of how diskann-benchmark-runner works. Does this need to be spelled out in the README like this?

The crate registers four kernels — one per `(element_type, implementation)`
pair:

| Tag | Element | Implementation |
Contributor


While it's fine to have a list like this, the chances of this getting updated as kernels are added are nearly zero. Maybe instead rely on the inputs behavior of the runner app and ensure that the returned descriptions are sufficient?

@harsha-simhadri harsha-simhadri requested a review from suhasjs May 8, 2026 19:33


Development

Successfully merging this pull request may close these issues.

Add benchmarking suite for multi-vector operations
