Add pack-index object existence checker strategy for prefetch #2003

Open: tyrielv wants to merge 1 commit into microsoft:master from tyrielv:tyrielv/prefetch-idx-strategy

@tyrielv tyrielv commented Jun 4, 2026

Summary

Introduces a strategy pattern (IObjectExistenceChecker) to decouple blob prefetch from libgit2's git_revparse_single, which is extremely slow for missing objects (~2.8ms/op with 14 packs in a large GVFS cache).

New PackIndexObjectExistenceChecker reads MIDX and supplemental .idx files directly in managed code via memory-mapped IO (~5μs/op), with loose-object File.Exists fallback. 460x faster for cache-miss lookups.
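
To illustrate the lookup described above, here is a minimal sketch in Python (the PR itself is C#, and this is not its code). It assumes the standard pack index v2 layout with 20-byte SHA-1 object IDs: an 8-byte header, a 256-entry fanout table, then the sorted object names. The names `idx_contains` and `idx_file_contains` are hypothetical.

```python
import mmap
import struct

IDX_MAGIC = b"\xfftOc"   # pack index v2 signature
HEADER = 8               # 4-byte magic + 4-byte version
FANOUT = 256 * 4         # 256 big-endian uint32 cumulative counts
OID_LEN = 20             # assumption: SHA-1 object IDs


def idx_contains(buf, oid: bytes) -> bool:
    """Return True if the 20-byte oid is listed in a pack index v2 buffer."""
    if buf[:4] != IDX_MAGIC or struct.unpack_from(">I", buf, 4)[0] != 2:
        raise ValueError("not a pack index v2 file")
    first = oid[0]
    # fanout[b] is the number of objects whose first byte is <= b, so the
    # candidates for this oid live in the half-open range [lo, hi).
    lo = 0 if first == 0 else struct.unpack_from(">I", buf, HEADER + 4 * (first - 1))[0]
    hi = struct.unpack_from(">I", buf, HEADER + 4 * first)[0]
    names = HEADER + FANOUT
    while lo < hi:  # binary search within the fanout bucket
        mid = (lo + hi) // 2
        start = names + mid * OID_LEN
        candidate = bytes(buf[start:start + OID_LEN])
        if candidate == oid:
            return True
        if candidate < oid:
            lo = mid + 1
        else:
            hi = mid
    return False


def idx_file_contains(path: str, oid: bytes) -> bool:
    """Memory-map an .idx file read-only and search it."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return idx_contains(m, oid)
```

A miss costs only the fanout read plus a handful of 20-byte comparisons, which is consistent with hit and miss timings being close to each other for this approach.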

Gated on config

git config gvfs.prefetch-use-idx true

Default: false (existing revparse behavior unchanged).
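
A rough Python sketch of how such a gate can be wired, assuming the config value has already been read as a string (or `None` when unset). The helper names `is_true` and `choose_checker` are hypothetical, and the boolean parsing is a simplified reading of git's rules, not the PR's implementation.

```python
def is_true(value):
    """Interpret a git-config style boolean; an unset value means False."""
    return value is not None and value.strip().lower() in {"true", "yes", "on", "1"}


def choose_checker(config_value, make_idx_checker, revparse_checker):
    """Pick the index-based checker only when enabled and constructible."""
    if not is_true(config_value):
        return revparse_checker          # default path: behavior unchanged
    try:
        return make_idx_checker()        # may raise on unreadable index files
    except Exception:
        # Any initialization error falls back to the existing checker.
        return revparse_checker
```

With the setting unset or `false`, the factory is never called, so the opt-in path cannot affect existing users.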

Components

| File | Purpose |
| --- | --- |
| IObjectExistenceChecker | Strategy interface |
| LibGit2ObjectExistenceChecker | Wraps existing LibGit2Repo.ObjectExists |
| PackIndexObjectExistenceChecker | MIDX + pack idx + loose fallback |
| MidxReader | Memory-mapped MIDX v1 parser with binary search |
| PackIndexReader | Memory-mapped pack index v2 parser with binary search |
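
The shape of the strategy pattern can be sketched in a few lines of Python. This is illustrative only: `ObjectExistenceChecker`, `SetBackedChecker`, and `find_missing` are hypothetical stand-ins, not the PR's C# types, and the set-backed class merely simulates an index-backed implementation.

```python
from typing import Iterable, List, Protocol


class ObjectExistenceChecker(Protocol):
    """Strategy interface: answers whether an object ID is already present."""

    def object_exists(self, oid: str) -> bool: ...


class SetBackedChecker:
    """Stand-in for an index-backed checker: membership in a preloaded set."""

    def __init__(self, known: Iterable[str]) -> None:
        self._known = frozenset(known)

    def object_exists(self, oid: str) -> bool:
        return oid in self._known


def find_missing(oids: Iterable[str], checker: ObjectExistenceChecker) -> List[str]:
    """The caller depends only on the interface, not on how lookups happen."""
    return [oid for oid in oids if not checker.object_exists(oid)]
```

Because the caller sees only `object_exists`, swapping a revparse-backed checker for an index-backed one requires no change to the code that walks the blob list.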

Design decisions

  • Searches both LocalObjectsRoot and GitObjectsRoot (shared cache), de-duplicated
  • Detects supplemental packs not yet in MIDX via PNAM chunk diffing
  • Thread-safe: shared mmap instance across all FindBlobsStage workers, wrapped in NonDisposingCheckerWrapper
  • Falls back to revparse on any initialization error
  • 19 new unit tests covering hit/miss, fanout buckets, supplemental packs, loose objects, empty dirs, corrupt files
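
The PNAM diffing step can be sketched as follows, in Python rather than the PR's C#. It assumes the documented MIDX v1 layout (12-byte header, then a chunk table of 4-byte ID plus 8-byte offset entries with one terminating entry) and compares names by stem so it does not matter whether PNAM stores `.idx` or `.pack` names. Both function names are hypothetical.

```python
import struct


def midx_pack_names(buf: bytes) -> list:
    """Return the pack names stored in the PNAM chunk of a MIDX v1 buffer."""
    if buf[:4] != b"MIDX" or buf[4] != 1:
        raise ValueError("not a MIDX v1 file")
    num_chunks = buf[6]
    # Chunk table starts at byte 12. The extra terminating entry supplies
    # the end offset of the last real chunk.
    table = [struct.unpack_from(">4sQ", buf, 12 + 12 * i) for i in range(num_chunks + 1)]
    for (chunk_id, start), (_, end) in zip(table, table[1:]):
        if chunk_id == b"PNAM":
            # NUL-terminated names; trailing NUL padding yields empty pieces.
            return [n.decode() for n in buf[start:end].split(b"\0") if n]
    raise ValueError("MIDX has no PNAM chunk")


def supplemental_idx_files(idx_on_disk, midx_names):
    """Index files present on disk that the MIDX does not cover yet."""
    def stem(name):
        return name.rsplit(".", 1)[0]
    covered = {stem(n) for n in midx_names}
    return sorted(n for n in idx_on_disk if stem(n) not in covered)
```

Only the files returned by the second function need their own per-pack index search; everything else is already answered by the MIDX.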

Benchmark (59.7M objects, 14 packs)

| Method | Exists (ns/op) | Missing (ns/op) | Batch 1000 mixed (ms) |
| --- | --- | --- | --- |
| git_revparse_single | 762 | 2,801,984 | 1,796 |
| MIDX binary search | 5,371 | 6,386 | 2 |
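
Read as ratios, the table's own figures work out as below. By these numbers the MIDX path is roughly 7x slower when the object exists, roughly 439x faster when it is missing, and 898x faster on the mixed batch. The dictionary keys are just labels chosen for this calculation.

```python
# Figures copied from the benchmark table above.
revparse = {"exists_ns": 762, "missing_ns": 2_801_984, "batch_ms": 1_796}
midx = {"exists_ns": 5_371, "missing_ns": 6_386, "batch_ms": 2}

# Ratio > 1 means the MIDX search is faster; < 1 means it is slower.
ratios = {key: revparse[key] / midx[key] for key in revparse}
```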

@tyrielv tyrielv force-pushed the tyrielv/prefetch-idx-strategy branch 2 times, most recently from c090d78 to 743b0fc Compare June 4, 2026 20:09
@tyrielv tyrielv marked this pull request as ready for review June 4, 2026 20:14
Introduce IObjectExistenceChecker strategy pattern to decouple blob
prefetch from libgit2's git_revparse_single, which is extremely slow
for missing objects (~2.8ms/op with 14 packs in a large GVFS cache).

New PackIndexObjectExistenceChecker reads MIDX and supplemental .idx
files directly in managed code via memory-mapped IO (~5us/op), with
loose-object File.Exists fallback. Gated on gvfs.prefetch-use-idx
git config (default: false).

Components:
- IObjectExistenceChecker: strategy interface
- RevParseObjectExistenceChecker: wraps existing LibGit2Repo.ObjectExists
- PackIndexObjectExistenceChecker: MIDX + pack idx + loose fallback
- MidxReader: memory-mapped MIDX v1 parser with binary search
- PackIndexReader: memory-mapped pack index v2 parser with binary search
- FindBlobsStage: accepts optional checker factory (backward compatible)
- BlobPrefetcher: reads config, creates appropriate checker factory

Searches both LocalObjectsRoot and GitObjectsRoot (shared cache),
detects supplemental packs not yet in MIDX via PNAM chunk diffing,
and safely falls back to revparse on initialization errors.

Unit tests cover: MIDX/idx hit and miss, all 256 fanout buckets,
supplemental pack detection, loose objects, empty/missing pack dirs,
multiple object roots, corrupt file handling, and deduplication.

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>
@tyrielv tyrielv force-pushed the tyrielv/prefetch-idx-strategy branch from 743b0fc to 13d4935 Compare June 5, 2026 17:12