Problem
ChangeTrace currently relies on LibGit2Sharp to reconstruct repository history and produce TraceEvent streams. The architecture and domain models are solid and maintainable, but profiling shows that some operations in the history reader become significant performance bottlenecks on large repositories (for example, 100k–200k commits).
The main performance issues are:
- Repeated commit graph traversal for branch mapping (BuildCommitToBranchMap)
- Object allocation for each Commit
- Per-commit TreeDiff operations
- Use of Task.Run in synchronous logic, which does not provide true asynchronous benefits
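To illustrate the last point, here is a minimal sketch of the pattern in question. The names HistoryReaderSketch, ReadHistory, and ReadHistoryAsync are hypothetical stand-ins, not ChangeTrace's actual code. Wrapping CPU-bound synchronous work in Task.Run moves it to a thread-pool thread, but the amount of work stays the same and the caller still waits for the full result. The main thing it can buy is keeping a UI or request thread free, which may not matter inside the history reader.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class HistoryReaderSketch
{
    // Hypothetical stand-in for synchronous, CPU-bound history traversal.
    private static List<int> ReadHistory(int commitCount)
    {
        var result = new List<int>(commitCount);
        for (int i = 0; i < commitCount; i++)
            result.Add(i);
        return result;
    }

    // "Async" wrapper: the same loop runs on a thread-pool thread.
    // No I/O is awaited, so the total work is unchanged and scheduling overhead is added.
    public static Task<List<int>> ReadHistoryAsync(int commitCount)
        => Task.Run(() => ReadHistory(commitCount));

    // Plain synchronous call: same result without the extra task.
    public static List<int> ReadHistorySync(int commitCount)
        => ReadHistory(commitCount);
}
```

Timing both variants on a large repository would show whether the wrapper has any measurable effect in practice.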
Meanwhile, Git CLI (git log) is highly optimized for streaming commit history directly from packfiles and can perform many of these operations far more efficiently.
Goal
The goal of this investigation is to determine which parts of the history reconstruction pipeline would benefit from using the Git CLI instead of LibGit2Sharp. The intent is not to remove LibGit2Sharp entirely, since it remains valuable for repository manipulation, branch operations, and detailed diff inspection, but to find the optimal split between:
- Git CLI – for fast history extraction and commit metadata streaming
- LibGit2Sharp – for operations requiring in-depth tree inspection or diff calculation
A potential hybrid pipeline could look like this:
git log
│
▼
Commit metadata stream
│
▼
TraceEvent reconstruction
│
▼
(optional) LibGit2Sharp diff operations
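
The first two stages of this pipeline could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not ChangeTrace's actual reader: CommitMeta, GitLogReader, the field selection, and the use of --all are hypothetical choices made for illustration, and git is assumed to be on the PATH. Fields are separated by the ASCII unit separator (%x1f in git's pretty format) so that subjects containing spaces or punctuation parse unambiguously.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.Text;

// Hypothetical record for illustration; not an existing ChangeTrace type.
public sealed record CommitMeta(
    string Sha,
    string[] ParentShas,
    string AuthorName,
    DateTimeOffset AuthorDate,
    string Subject);

public static class GitLogReader
{
    // One line per commit. %H = hash, %P = parent hashes, %an = author name,
    // %aI = author date (strict ISO 8601), %s = subject, %x1f = unit separator.
    private const string Format = "%H%x1f%P%x1f%an%x1f%aI%x1f%s";

    // Pure parsing step, kept separate so it can be tested without a repository.
    public static CommitMeta ParseLine(string line)
    {
        var fields = line.Split('\u001f');
        if (fields.Length != 5)
            throw new FormatException($"Expected 5 fields, got {fields.Length}.");

        return new CommitMeta(
            fields[0],
            // Root commits have no parents, which yields an empty array here.
            fields[1].Split(' ', StringSplitOptions.RemoveEmptyEntries),
            fields[2],
            DateTimeOffset.Parse(fields[3], CultureInfo.InvariantCulture),
            fields[4]);
    }

    // Lazily streams commits from `git log`, one parsed line at a time.
    public static IEnumerable<CommitMeta> Stream(string repositoryPath)
    {
        var startInfo = new ProcessStartInfo("git")
        {
            WorkingDirectory = repositoryPath,
            RedirectStandardOutput = true,
            StandardOutputEncoding = Encoding.UTF8,
            UseShellExecute = false,
        };
        startInfo.ArgumentList.Add("log");
        startInfo.ArgumentList.Add("--all"); // assumption: history of every ref is wanted
        startInfo.ArgumentList.Add("--pretty=format:" + Format);

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("Could not start git.");

        string? line;
        while ((line = process.StandardOutput.ReadLine()) != null)
        {
            if (line.Length > 0)
                yield return ParseLine(line);
        }

        process.WaitForExit();
    }
}
```

A caller would iterate with foreach over GitLogReader.Stream(path) and map each CommitMeta to TraceEvent instances. Keeping ParseLine free of process handling means the parsing cost can be benchmarked separately from the cost of git itself.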
Areas to Investigate
- History Extraction – Measure the performance of git log --pretty=... and git log --name-status against the current LibGit2Sharp commit traversal. Identify which approach gives the fastest timeline reconstruction.
- Branch Attribution – Explore alternatives to BuildCommitToBranchMap, such as git log --decorate or git branch --contains. Determine whether the current branch mapping is strictly necessary or can be simplified.
- Diff Strategy – Assess whether all commits truly require TreeDiff, or whether some information can be derived directly from CLI output, reducing allocations and computation.
- Async Model – Check whether wrapping synchronous operations in Task.Run actually provides any benefit, or whether purely synchronous execution inside the history reader would be simpler and faster.
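
As a concrete example for the Diff Strategy item, the file-level information that git log --name-status prints (status letter plus path, and both paths for renames and copies) could be parsed without building a TreeDiff. The following is a minimal sketch: FileChange and NameStatusParser are hypothetical names, and it assumes tab-separated entries with unquoted paths. Git may quote paths that contain unusual characters, which this sketch does not handle.

```csharp
using System;

// Hypothetical type for illustration; not an existing ChangeTrace type.
public sealed record FileChange(char Status, string Path, string? PreviousPath);

public static class NameStatusParser
{
    // Parses one entry line of `git log --name-status` output, for example
    // "M\tsrc/App.cs" or "R100\told/Name.cs\tnew/Name.cs".
    // Returns null for lines that do not look like name-status entries.
    public static FileChange? TryParse(string line)
    {
        var parts = line.Split('\t');
        if (parts.Length < 2 || !IsStatusToken(parts[0]))
            return null;

        char status = parts[0][0]; // A, M, D, R, C, T, ...; any similarity score is ignored

        // Renames and copies carry two paths: source first, destination second.
        if ((status == 'R' || status == 'C') && parts.Length >= 3)
            return new FileChange(status, parts[2], parts[1]);

        return new FileChange(status, parts[1], null);
    }

    // A status token is one uppercase letter, optionally followed by digits (e.g. "R100").
    private static bool IsStatusToken(string token)
    {
        if (token.Length == 0 || token[0] < 'A' || token[0] > 'Z')
            return false;
        for (int i = 1; i < token.Length; i++)
        {
            if (!char.IsDigit(token[i]))
                return false;
        }
        return true;
    }
}
```

Whether this level of detail is enough depends on what TraceEvent reconstruction needs. Line-level changes would still require a real diff, which is where LibGit2Sharp could remain in the pipeline.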
Expected Outcome
By the end of this evaluation, we should clearly understand:
- Which parts of the pipeline benefit most from Git CLI streaming
- Which operations should remain handled by LibGit2Sharp
- How a hybrid approach could improve performance without sacrificing correctness or maintainability
Benefits
Implementing a hybrid approach could significantly speed up timeline reconstruction for large repositories, reduce memory allocations, minimize redundant commit graph traversals, and improve ChangeTrace scalability, all while keeping the architecture and domain models clean and testable.