Skip to content

make opening index size inspection O(1)#26

Closed
zehiko wants to merge 1 commit into
release-7.0.0from
zehiko/deepsize-fix-release-7.0.0
Closed

make opening index size inspection O(1)#26
zehiko wants to merge 1 commit into
release-7.0.0from
zehiko/deepsize-fix-release-7.0.0

Conversation

@zehiko

@zehiko zehiko commented Jun 18, 2026

Copy link
Copy Markdown
Member

Problem

LanceIndexStore::deep_size_of_children counted object_store + index_dir + metadata_cache. object_store and metadata_cache are Arcs to process-wide shared infrastructure, not data the index owns.

A cached BTreeIndex reaches its LanceIndexStore (and thus metadata_cache) via Arc, so moka's byte-weigher — which calls deep_size_of() on every index-cache insert — walked the entire shared metadata cache to weigh a single index. Two consequences:

  1. Cost: each scalar-index open became O(metadata-cache size) (LanceCache::deep_size_of_childrenapprox_size_bytes() iterates every entry). On a large shared cache this dominates CPU.
  2. Mis-weighting: every index entry was charged ≈ the whole metadata cache, inflating and equalizing index-cache entry weights, so size-based eviction evicted index objects almost immediately and re-opened them on nearly every query.

In production this showed up as open_scalar_index on essentially every query, each paying the O(metadata-cache) walk.

Fix

Count only data the store owns: index_dir + file_sizes. This mirrors BTreeIndex::deep_size_of_children, which already skips its index_cache field to avoid circular references.

It deliberately does not touch approx_size_bytes / LanceCache::deep_size_of_children (no weighted_size), so it carries no telemetry-accuracy tradeoff — the per-open path simply never reaches the cache.

Testing done

Microbenchmark timing the real LanceIndexStore::deep_size_of() vs metadata-cache fill N (release-7.0.0 vs this branch, same harness):

N (metadata entries) release-7.0.0 this fix
1 5.6 µs ~1 ns
1,000 91 µs ~1 ns
10,000 767 µs ~1 ns
100,000 19.5 ms ~1 ns

O(N) → O(1), independent of cache fill.

  • stack testing - WIP

@github-actions github-actions Bot added the bug Something isn't working label Jun 18, 2026
@github-actions

Copy link
Copy Markdown

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@zehiko zehiko marked this pull request as draft June 18, 2026 15:40
LanceIndexStore::deep_size_of_children counted object_store + metadata_cache,
both Arc-shared process-wide infrastructure, not data the index owns. A cached
BTreeIndex reaches this store via Arc, so every index-cache entry was charged
the entire shared metadata cache:
 - mis-weighting: inflated/equalized index-cache entry weights so size-based
   eviction evicted index objects immediately and re-opened them every query
   (the production open_scalar_index churn);
 - cost: each index-cache insert walked the metadata cache (approx_size_bytes
   iterates all entries) -> O(metadata-cache size) per open.

Count only owned data (index_dir + file_sizes). Mirrors BTreeIndex, which
already skips its index_cache field.
@zehiko zehiko force-pushed the zehiko/deepsize-fix-release-7.0.0 branch from 531c10d to 416dd77 Compare June 18, 2026 15:43
@zehiko zehiko changed the title fix(index): exclude shared infra from LanceIndexStore DeepSizeOf (O(metadata-cache)→O(1) per scalar-index open) make opening index size inspection O(1) Jun 18, 2026
@zehiko

zehiko commented Jun 19, 2026

Copy link
Copy Markdown
Member Author

This path is no longer relevant after Lance lance-format#6222, see details in https://linear.app/rerun/issue/RR-4918/maven-slow-query-investigation

@zehiko zehiko closed this Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant