Skip to content

(diversity) Integrate determinant diversity in tokp full precision search and disk search#1011

Open
narendatha wants to merge 28 commits intomainfrom
u/narendatha/det_div_plugins
Open

(diversity) Integrate determinant diversity in tokp full precision search and disk search#1011
narendatha wants to merge 28 commits intomainfrom
u/narendatha/det_div_plugins

Conversation

@narendatha
Copy link
Copy Markdown
Contributor

@narendatha narendatha commented May 4, 2026

Integrate determinant-diversity into topk full-precision search and disk search

Summary

Wires the determinant-diversity post-processing algorithm as an optional SearchPlugin for two search paths:

  • Async graph-index (full-precision topk via inmem::FullAccessor)
  • Disk-index (via DiskIndexSearcher post-processor)

Determinant-diversity reranks search results to increase geometric diversity among the top-k, at a configurable quality/speed tradeoff.

Motivation

Standard nearest neighbor search often returns highly similar/redundant results. For some scenarios like RAG search, it's more valuable to retrieve diverse, complementary documents that cover different aspects of the query. This PR implements a principled approach based on determinant maximization of the Gram matrix of query-scaled vectors.

Algorithm

The algorithm scales each candidate vector by (similarity_to_query)^power and then greedily selects vectors to maximize the determinant of the Gram matrix (equivalent to maximizing the volume of the parallelepiped spanned by the vectors).

Two variants are implemented:

  • eta = 0: Pure greedy orthogonalization (pivoted QR decomposition)
  • eta > 0: Ridge-regularized version that balances diversity with numerical stability

Both variants run in O(n k d) time where n = candidates, k = results to return, d = dimension.

Performance Characteristics

Tested on OpenAI embeddings dataset (3072 dimensions, 10K sample):

Metric Normal Search Det_Div Search Notes
QPS ~150 ~130 ~13% overhead for diversity optimization
Recall@10 98-99% 87-88% Expected trade-off for diversity

The recall reduction is expected - Det_Div search intentionally selects diverse vectors rather than the absolute nearest neighbors.

Diversity Improvement

We measure diversity using log-determinant of the Gram matrix of scaled vectors (higher = vectors span more volume = more diverse).

  • RAG search consistently improves diversity (all improvements >= 0)
  • Average improvement of +2.9 to +3.3 in log_det corresponds to 10-27x more volume spanned by selected vectors

Parameter Tuning Guide

power (default: 2.0)

Controls the trade-off between relevance and diversity:

  • Higher values (e.g., 3.0+): Strongly prefer relevance - vectors closer to the query get much higher weight, resulting in less diversity but higher recall
  • Lower values (e.g., 1.0): Prefer diversity over relevance - similarity weighting is more uniform, allowing more diverse vectors to be selected
  • Value of 0: Pure diversity maximization with no relevance weighting

eta (default: 0.01)

Ridge regularization parameter that controls numerical robustness and relevance preference:

  • Higher values (e.g., 0.1+): More robust numerically, but biases toward selecting high-relevance vectors over diverse ones
  • Lower values (e.g., 0.001): Closer to pure determinant maximization, maximizes diversity but may be less stable
  • Value of 0: Falls back to pure greedy orthogonalization (no ridge regularization)

Recommended Settings

Use Case power eta
Balanced (default) 2.0 0.01
High diversity 1.0 0.001
High relevance 3.0 0.1

Changes

Core algorithm (diskann-providers)

  • Added determinant_diversity_post_process() in diskann-providers/src/model/graph/provider/async_/determinant_diversity_post_process.rs
  • Two selection strategies controlled by eta:
    • eta == 0: greedy orthogonalization
    • eta > 0: eta-weighted residual selection

Async graph-index path (diskann-benchmark)

  • Added FullPrecisionVectorAccessor shim trait + impl for inmem::FullAccessor<f32, ...> so the post-processor can fetch real candidate vectors
  • Implemented DeterminantDiversity plugin that satisfies glue::SearchPostProcess using for<'a, 'b> SearchStrategy HRTB bound (credit: @hildebrandmw)
  • Wired run_topk_timed helper to expose QPS/latency/recall metrics in the benchmark output

Disk-index path (diskann-disk, diskann-benchmark)

  • Added DeterminantDiversityAndFilter processor to DiskSearchPostProcessor enum
  • Extended searcher.search() to accept an optional SearchPostProcessorKind::DeterminantDiversity { power, eta }
  • Extended DiskSearchPhase JSON input to accept post_processor
  • Result-counting uses result_count from search stats to correctly bound post-processed results

Input/config

  • New TopkPostProcessor serde enum in diskann-benchmark/src/inputs/post_processor.rs with validation (power > 0, eta >= 0)
  • Two new example configs: async-determinant-diversity.json, disk-index-determinant-diversity.json

Benchmark JSON examples

// async graph-index
"post_processor": { "type": "determinant-diversity", "power": 1.0, "eta": 0.0 }

// disk-index
"post_processor": { "type": "determinant-diversity", "power": 1.0, "eta": 0.5 }

Notes

  • Determinant-diversity is only wired for full-precision topk on the async path; quantized (PQ/scalar) paths are intentionally excluded
  • Spherical, range, and filtered-search plugin variants are unchanged; this post-processor is topk-only

@narendatha narendatha changed the base branch from main to mhildebr/benchmark-plugins May 4, 2026 12:41
Base automatically changed from mhildebr/benchmark-plugins to main May 5, 2026 00:45
narendatha and others added 10 commits May 5, 2026 12:41
Replace kind()-based string equality checks with explicit is_match()
and get() phase-shape helpers on plugin structs. This avoids fragile
ordering assumptions and makes each plugin responsible for recognising
its own phase shape.
Co-authored-by: Copilot <copilot@github.com>
…plugins

# Conflicts:
#	diskann-benchmark/src/backend/index/benchmarks.rs
#	diskann-benchmark/src/backend/index/product.rs
#	diskann-benchmark/src/backend/index/scalar.rs
#	diskann-benchmark/src/backend/index/search/plugins.rs
#	diskann-benchmark/src/backend/index/spherical.rs
#	diskann-benchmark/src/inputs/graph_index.rs
#	diskann-benchmark/src/inputs/mod.rs
Co-authored-by: Copilot <copilot@github.com>
…ojection issue

This commit explores approaches to wire real candidate vectors into async determinant-diversity post-processing.

Current state: IN COMPILATION ERROR (intentional for analysis)

Attempted approaches:
1. Initial shim-trait FullPrecisionVectorAccessor with async get_full_precision_vector()
   - Resulted in 'implementation not general enough' at search_with() call

2. Removed explicit for<'a> post_processor::DeterminantDiversity bound
   - Still fails - the constraint is inherent in search_with() signature itself

Root cause analysis:
- search_with() requires: PP: for<'a> SearchPostProcess<S::SearchAccessor<'a>, T, O>
- This means post-processor must work for ANY accessor lifetime 'a
- But query = queries.row(query_idx) is borrowed for specific loop iteration lifetime
- These are fundamentally incompatible - a borrowed value can't satisfy for<'a> generically

Compiler errors (3 total):
- 'not general enough': implementation needed for or<'a> but found specific '0
- 'does not live long enough': queries lifetime too short for 'static requirement

Files modified:
- diskann-benchmark/src/backend/index/benchmarks.rs:
  * Removed explicit for<'a> post_processor::DeterminantDiversity constraint
  * Narrowed plugin impl to FullPrecisionProvider<f32>

- diskann-benchmark/src/backend/index/post_processor/determinant_diversity.rs:
  * Added shim trait FullPrecisionVectorAccessor
  * Async method get_full_precision_vector(&mut self, id) -> impl Future<...>

Next steps to investigate:
- Move determinant-diversity outside search_with() as post-processing reranking
- This avoids HRTB entirely by applying after candidates are returned
- Benchmark impact: measure recall/QPS with external reranking vs baseline

Related context:
- Disk index determinant-diversity works correctly (uses real vectors, shows 51-53% QPS cost)
- Shared algorithm fixed (distance-to-similarity scoring direction)
- Branch already merged with origin/main
Co-authored-by: Copilot <copilot@github.com>
…educe duplication

- Use for<'a, 'b> SearchStrategy bound (user-provided fix) to break HRTB lifetime
  projection issue in the search_with post-processor constraint
- Wire FullPrecisionVectorAccessor shim trait so async det-div post-processor fetches
  real candidate vectors instead of placeholder distances
- Populate QPS/latency metrics in async det-div benchmark path (previously all 'missing')
- Extract run_topk_timed helper to eliminate ~100 lines of duplicated loop/timing/recall
  machinery from DeterminantDiversity::run
- Update async-determinant-diversity.json example tag (async-index-build -> graph-index-build)
- Fix clippy::manual_async_fn in FullPrecisionVectorAccessor shim trait
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 6, 2026

Codecov Report

❌ Patch coverage is 51.04670% with 304 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.25%. Comparing base (be804aa) to head (468b5d2).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
diskann-benchmark/src/backend/index/benchmarks.rs 5.51% 120 Missing ⚠️
diskann-disk/src/search/provider/disk_provider.rs 18.18% 90 Missing ⚠️
...kend/index/post_processor/determinant_diversity.rs 0.00% 36 Missing ⚠️
diskann-benchmark/src/inputs/graph_index.rs 55.26% 17 Missing ⚠️
diskann-benchmark/src/inputs/disk.rs 0.00% 11 Missing ⚠️
diskann-benchmark/src/inputs/post_processor.rs 0.00% 11 Missing ⚠️
...kann-benchmark/src/backend/index/search/plugins.rs 62.96% 10 Missing ⚠️
...vider/async_/determinant_diversity_post_process.rs 96.91% 8 Missing ⚠️
diskann-tools/src/utils/search_disk_index.rs 0.00% 1 Missing ⚠️

❌ Your patch status has failed because the patch coverage (51.04%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1011      +/-   ##
==========================================
- Coverage   90.63%   89.25%   -1.39%     
==========================================
  Files         460      463       +3     
  Lines       85424    86021     +597     
==========================================
- Hits        77427    76781     -646     
- Misses       7997     9240    +1243     
Flag Coverage Δ
miri 89.25% <51.04%> (-1.39%) ⬇️
unittests 89.10% <51.04%> (-1.50%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-benchmark/src/backend/index/mod.rs 100.00% <ø> (ø)
diskann-benchmark/src/backend/index/spherical.rs 100.00% <ø> (ø)
diskann-benchmark/src/inputs/mod.rs 78.26% <ø> (ø)
diskann-disk/src/build/builder/core.rs 95.24% <100.00%> (+<0.01%) ⬆️
diskann-tools/src/utils/search_disk_index.rs 0.00% <0.00%> (ø)
...vider/async_/determinant_diversity_post_process.rs 96.91% <96.91%> (ø)
...kann-benchmark/src/backend/index/search/plugins.rs 62.31% <62.96%> (-3.69%) ⬇️
diskann-benchmark/src/inputs/disk.rs 4.24% <0.00%> (-0.24%) ⬇️
diskann-benchmark/src/inputs/post_processor.rs 0.00% <0.00%> (ø)
diskann-benchmark/src/inputs/graph_index.rs 39.93% <55.26%> (+2.91%) ⬆️
... and 3 more

... and 45 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

narendatha and others added 4 commits May 6, 2026 16:57
- Use for<'a, 'b> SearchStrategy bound to resolve HRTB lifetime mismatch
- Wire FullPrecisionVectorAccessor shim trait for async det-div to fetch real vectors
- Implement DeterminantDiversity post-processor for async graph-index path
- Extract run_topk_timed helper to eliminate ~100 lines of code duplication
- Wire post_processor parameter to disk-index search pipeline
- Update search parameter handling and result counting for post-processed results
- Add TopkPostProcessor input type and necessary imports
- Populate QPS/latency metrics in async det-div benchmark path
Co-authored-by: Copilot <copilot@github.com>
@narendatha narendatha changed the title (draft) integrate determinant diversity in tokp full precision search and disk search (diversity) Integrate determinant diversity in tokp full precision search and disk search May 6, 2026
@narendatha narendatha marked this pull request as ready for review May 6, 2026 15:03
@narendatha narendatha requested review from a team and Copilot May 6, 2026 15:03
Copy link
Copy Markdown
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Naren! My biggest concern is diskann-benchmark. I would really prefer if we generalized diskann-benchmark-core to handle post-processing, which would reduce bugs and be more broadly applicable.

On the implementation front, there's a bit of work to do regarding being more opinionated on the nature of inputs, establishing invariants of the diversity search with a bespoke struct (which could help with the previous point), reducing code duplication, testing the implementation, and removing allocations with a Matrix.

I'll leave reviewing the diskann-disk changes to others.

.is_some_and(|pp| matches!(pp, TopkPostProcessor::DeterminantDiversity { .. }))
}

pub(crate) fn get(phase: &SearchPhase) -> anyhow::Result<(&TopkSearchPhase, f32, f32)> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return a richer struct rather than to f32? Ideally - the parsed code would validate that a determinant diversity post-processor can be constructed from the parameters (using the actual constructor).

/// Encapsulates the outer thread/run/L loops, the per-rep timing harness, per-query latency
/// collection, and [`SearchResults`] construction. The caller supplies only the actual
/// per-query search as a closure `search_fn(knn_params, query, output) -> Result<()>`.
fn run_topk_timed(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in diskann-benchmark-core abstracting the loop nest exists for several reasons:

  1. It keeps the behavior more or less the same for all users.
  2. More importantly, it very carefully minimizes monomorphizations to help keep compile times under control.

Introducing bespoke implementations like this can very quickly get us back to a place where benchmarks take forever to compile and the set of metrics gathered by different search kinds diverges.

I understand that the KNN runner in diskann-benchmark-core predates parametrizable post-process routines, but we should update it with an additional defaulted generic. I understand that's more work, but it improves the core abstraction and will pay dividends when it comes to exploring additional post-processing routines.

))?);
let groundtruth = datafiles::load_groundtruth(datafiles::BinFile(&topk.groundtruth))?;

let results = run_topk_timed(topk, &queries, &groundtruth, |params, query, output| {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case and point as to why we should use diskann-benchmark-core: This runs all the searches single-threaded, but the loop nest loops over the number of threads. Not only will this make the benchmark take forever to run, it's also misleading and means that we're measuring overhead related to tokio runtime construction.


fn kind(&self) -> &'static str {
Self::kind().as_str()
SearchPhaseKind::Topk.as_str()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have the plugin types implement an as_str() method instead to make sure they stay up-to-date properly and completely avoid SearchPhaseKind here.

impl CheckDeserialization for TopkPostProcessor {
fn check_deserialization(&mut self, _checker: &mut Checker) -> Result<(), anyhow::Error> {
match self {
TopkPostProcessor::DeterminantDiversity { power, eta } => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we have one struct that validates the inputs in the lower level crate and always use that for input validation? That way, there's no risk of these ad-hoc validations ever being wrong.

* Licensed under the MIT license.
*/

use diskann_vector::{MathematicalValue, PureDistanceFunction, distance::InnerProduct};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A module level doc explaining what this is and how it works would be really helpful.

}

#[cfg(test)]
mod tests {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have tests that check the algorithmic behavior?

.collect()
}

fn post_process_greedy_orthogonalization_f32<Id: Copy>(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two routines are very similar. Is it possible to merge them?

for (_, distance_to_query, v) in &candidates {
let scale =
distance_to_similarity(*distance_to_query, distance_range).powf(power) * inv_sqrt_eta;
let residual: Vec<f32> = v.iter().map(|&x| x * scale).collect();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use a Matrix to store the residuals? This will provide better locality in the processing loop and cut down on the number of allocations.


mod determinant_diversity_post_process;
pub(crate) mod postprocess;
pub use determinant_diversity_post_process::determinant_diversity_post_process;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is shared with the disk index, maybe move it up instead of down in the async_ abyss? I know diskann-providers is kind of a nightmare at the moment 😢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants