Expand blob prefetch noop cache to N entries#2004

Open: tyrielv wants to merge 2 commits into microsoft:master from tyrielv:tyrielv/expand-prefetch-cache

@tyrielv tyrielv commented Jun 4, 2026

Summary

Replace the single-entry LastBlobPrefetch.dat cache with a multi-entry BlobPrefetchCache.dat that stores up to N entries (default 100), keyed by SHA256 hash of (files, folders, hydrate) and storing the commit ID.
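The cache described above can be modeled roughly as follows. This is a minimal sketch in Python rather than the PR's C# code: the class name `PrefetchCache`, its methods, and the choice to evict the oldest entry are all assumptions made for illustration, and on-disk persistence to BlobPrefetchCache.dat is left out.

```python
from collections import OrderedDict


class PrefetchCache:
    """Hypothetical in-memory model: cache key -> commit ID, at most `capacity` entries."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.entries = OrderedDict()

    def is_noop(self, key, commit_id):
        # A prefetch can be skipped only when the same arguments were
        # already prefetched at the same commit.
        return key in self.entries and self.entries[key] == commit_id

    def record(self, key, commit_id):
        if self.capacity <= 0:
            return  # a capacity of 0 disables caching
        if key in self.entries:
            del self.entries[key]  # re-insert so this entry becomes the newest
        elif len(self.entries) >= self.capacity:
            # Assumption: evict the oldest entry; the PR only says
            # "single-entry eviction when at capacity".
            self.entries.popitem(last=False)
        self.entries[key] = commit_id
```

With a capacity of at least 3, the three alternating prefetch patterns each keep their own entry, so a repeated call at an unchanged commit is recognized as a no-op.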

Problem

The most common blob prefetch use case cycles through ~3 different prefetch calls with different file/folder patterns. Since only 1 entry was cached, 2 of the 3 always missed and re-ran the full pipeline (diff + existence checks + downloads) even when nothing changed.

Changes

  • BlobPrefetcher.cs: Replace flat 4-key dictionary with hash-keyed multi-entry cache. Add ComputeCacheKey (canonical, order-independent SHA256 hashing) and single-entry eviction when at capacity.
  • PrefetchVerb.cs: Read gvfs.prefetch-cache-size config (default 100, 0 disables, max 1000). Use BlobPrefetchCache.dat instead of LastBlobPrefetch.dat.
  • BlobPrefetcherTests.cs: 12 unit tests covering key determinism, order independence, cache hit/miss, multi-entry support, and null/empty edge cases.
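One way to get the canonical, order-independent key that the ComputeCacheKey bullet describes is to sort the inputs before hashing. The sketch below is in Python and is not the PR's implementation: the function name `compute_cache_key`, the deduplication step, the length-prefixed encoding, and the treatment of null as empty are assumptions.

```python
import hashlib


def compute_cache_key(files, folders, hydrate):
    """Return a hex SHA-256 digest that ignores the order of files and folders."""

    def canonical(items):
        # Assumption: None and empty are equivalent, duplicates are dropped,
        # and entries are compared case-sensitively.
        return sorted(set(items or []))

    h = hashlib.sha256()
    for label, items in (("files", canonical(files)), ("folders", canonical(folders))):
        # Label and count each section so a path listed under files never
        # hashes the same as that path listed under folders.
        h.update(f"{label}:{len(items)}\n".encode("utf-8"))
        for item in items:
            data = item.encode("utf-8")
            # Length-prefix each entry so concatenation is unambiguous.
            h.update(f"{len(data)}:".encode("utf-8"))
            h.update(data)
    h.update(b"hydrate:1" if hydrate else b"hydrate:0")
    return h.hexdigest()
```

Sorting first means `["a.cs", "b.cs"]` and `["b.cs", "a.cs"]` produce the same key, which is the property the order-independence tests would check.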

Configuration

git config gvfs.prefetch-cache-size 100   # default
git config gvfs.prefetch-cache-size 0     # disable caching
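The documented limits (default 100, 0 disables, max 1000) could be applied to the raw config value along these lines. This is a hedged sketch in Python: `resolve_cache_size` is a hypothetical helper, and clamping oversized values and falling back to the default for negative or unparsable values are assumptions, since the PR does not say how such input is handled.

```python
DEFAULT_CACHE_SIZE = 100
MAX_CACHE_SIZE = 1000


def resolve_cache_size(raw_value):
    """Map the raw config string (None when unset) to an effective entry count."""
    if raw_value is None:
        return DEFAULT_CACHE_SIZE
    try:
        size = int(raw_value.strip())
    except ValueError:
        # Assumption: unparsable values fall back to the default.
        return DEFAULT_CACHE_SIZE
    if size < 0:
        # Assumption: negative values are treated like an unset value.
        return DEFAULT_CACHE_SIZE
    # Assumption: values above the maximum are clamped rather than rejected.
    # A result of 0 means caching is disabled.
    return min(size, MAX_CACHE_SIZE)
```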

Backward Compatibility

LastBlobPrefetch.dat is no longer read, so the first prefetch after upgrade is a cache miss, which is acceptable.

tyrielv added 2 commits June 4, 2026 10:53
Replace the single-entry LastBlobPrefetch.dat cache with a multi-entry
BlobPrefetchCache.dat that stores up to N entries (default 100), keyed
by SHA256 hash of (files, folders, hydrate) and storing the commit ID.

This avoids redundant diff+download work when users cycle through a
small set of prefetch patterns (e.g. 3 different file/folder combos),
which previously caused 2/3 of calls to miss the single-entry cache.

Changes:
- BlobPrefetcher: replace flat 4-key dictionary with hash-keyed cache
- BlobPrefetcher.ComputeCacheKey: canonical, order-independent hashing
- BlobPrefetcher.SavePrefetchArgs: single-entry eviction when at capacity
- PrefetchVerb: read gvfs.prefetchCacheSize config (0=disabled, max 1000)
- PrefetchVerb: use BlobPrefetchCache.dat instead of LastBlobPrefetch.dat
- 12 unit tests covering key determinism, order independence, cache
  hit/miss, multi-entry support, and null/empty edge cases

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>

The multi-entry prefetch cache persists across ordered tests, causing
cache hits where the tests expect fresh prefetch work. Delete
BlobPrefetchCache.dat in [SetUp] so each test starts with a clean cache.

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>
@tyrielv tyrielv marked this pull request as ready for review June 4, 2026 20:26