feat: add /cache stats — prefix hash/drift exposure and cache-hit summary#2336
feat: add /cache stats — prefix hash/drift exposure and cache-hit summary#2336encyc wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new /cache stats command to the TUI, displaying prefix-cache stability, fingerprints, and aggregated cache-hit summaries. The implementation updates the engine's turn loop to propagate the pinned prefix combined hash via events and stores it in the app state. The review feedback highlights three key issues: first, the token aggregation logic in format_cache_stats should correctly handle non-cache-aware turns and infer missing miss tokens to prevent skewed statistics; second, the pinned_hash is extracted before the cache is updated, making the reported hash one turn behind; and third, an empty pinned_combined_hash should clear the stored hash in the app state rather than leaving a stale value.
| let (hit, miss, input) = app.session.turn_cache_history.iter().fold( | ||
| (0u64, 0u64, 0u64), | ||
| |(hit, miss, input), rec| { | ||
| ( | ||
| hit + u64::from(rec.cache_hit_tokens.unwrap_or(0)), | ||
| miss + u64::from(rec.cache_miss_tokens.unwrap_or(0)), | ||
| input + u64::from(rec.input_tokens), | ||
| ) | ||
| }, | ||
| ); |
There was a problem hiding this comment.
In format_cache_stats, the fold logic aggregates cache hit and miss tokens across all turns. However, if cache_hit_tokens is None (indicating the turn is not cache-aware), defaulting cache_hit_tokens and cache_miss_tokens to 0 will pollute the statistics. Furthermore, if cache_hit_tokens is Some but cache_miss_tokens is None, the miss tokens should be inferred as input_tokens.saturating_sub(hit) (just like in format_cache_history), otherwise they are under-reported as 0, leading to an incorrect 100% hit rate. We should only aggregate cache statistics for cache-aware turns and correctly infer missing cache_miss_tokens.
let (hit, miss, input) = app.session.turn_cache_history.iter().fold(
(0u64, 0u64, 0u64),
|(hit, miss, input), rec| {
if let Some(h) = rec.cache_hit_tokens {
let m = rec.cache_miss_tokens.unwrap_or_else(|| rec.input_tokens.saturating_sub(h));
(
hit + u64::from(h),
miss + u64::from(m),
input + u64::from(rec.input_tokens),
)
} else {
(hit, miss, input + u64::from(rec.input_tokens))
}
},
);| let pinned_hash = pm | ||
| .pinned_fingerprint() | ||
| .map(|fp| fp.combined_sha256.clone()) | ||
| .unwrap_or_default(); | ||
| match pm.check_and_update(&system_text, tools_ref) { |
There was a problem hiding this comment.
The pinned_hash is extracted from pm before pm.check_and_update is called. Since check_and_update is what actually updates the pinned fingerprint, extracting it beforehand means the sent pinned_combined_hash will always be one turn behind (and empty on the very first turn). Extracting the fingerprint after the update ensures the TUI receives the correct, up-to-date hash.
let change_res = pm.check_and_update(&system_text, tools_ref);
let pinned_hash = pm
.pinned_fingerprint()
.map(|fp| fp.combined_sha256.clone())
.unwrap_or_default();
match change_res {| if !pinned_combined_hash.is_empty() { | ||
| app.last_pinned_prefix_hash = Some(pinned_combined_hash); | ||
| } |
There was a problem hiding this comment.
If pinned_combined_hash is empty (e.g., when switching to a model or provider that does not support prefix caching), we should clear app.last_pinned_prefix_hash by setting it to None instead of leaving the stale hash from the previous provider/model intact.
app.last_pinned_prefix_hash = (!pinned_combined_hash.is_empty()).then_some(pinned_combined_hash);There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9b571e8f6f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let pinned_hash = pm | ||
| .pinned_fingerprint() | ||
| .map(|fp| fp.combined_sha256.clone()) | ||
| .unwrap_or_default(); | ||
| match pm.check_and_update(&system_text, tools_ref) { |
There was a problem hiding this comment.
Use the post-update pinned hash
In sessions where the prefix manager is unpinned or detects drift, this captures pinned_hash before check_and_update, but that call is what pins the first fingerprint and re-pins after a change. As a result /cache stats still shows Pinned hash: unavailable after the first turn, and after a drift warning it reports the old baseline hash until another stable turn happens, even though the event field is documented as the current pinned prefix hash.
Useful? React with 👍 / 👎.
| hit + u64::from(rec.cache_hit_tokens.unwrap_or(0)), | ||
| miss + u64::from(rec.cache_miss_tokens.unwrap_or(0)), |
There was a problem hiding this comment.
Don't treat missing cache telemetry as zero misses
When a provider reports cache_hit_tokens but omits cache_miss_tokens, the existing /cache table infers misses as input_tokens - hit, but /cache stats adds zero here. In that common partial-telemetry case, a turn like 10k input / 8k hit / missing miss is summarized as 100% hit rate instead of 80%, hiding low cache effectiveness; records with no hit telemetry are also folded into a misleading 0.0% of 0 advisory instead of being excluded like the table does.
Useful? React with 👍 / 👎.
| /// Current pinned prefix combined hash (SHA-256, 64 hex chars). | ||
| /// Updated per-turn via PrefixCacheChange events; surfaced by | ||
| /// `/cache stats` for cache-hit debugging. | ||
| pub last_pinned_prefix_hash: Option<String>, |
There was a problem hiding this comment.
Clear the cached fingerprint with model-scoped telemetry
This new fingerprint is model/provider scoped, but clear_model_scoped_telemetry() (used by model and provider switches) only clears the token fields and turn_cache_history, leaving last_pinned_prefix_hash behind. After switching models or providers, /cache stats can therefore show the previous cache scope's pinned hash alongside an empty/new hit-rate history until another prefix event overwrites it, which makes the diagnostics point at the wrong request prefix.
Useful? React with 👍 / 👎.
| if avg_pct < 80.0 { | ||
| out.push_str(" NOTE: cache hit rate is low (< 80%). Check prefix stability above or consider /compact.\n"); | ||
| } |
There was a problem hiding this comment.
Spurious low-hit-rate advisory when no cache-aware turns exist
When history is non-empty but every record has cache_hit_tokens == None (e.g. a model that doesn't report cache telemetry), the fold leaves turns == 0, total_cache == 0, and avg_pct == 0.0. Because 0.0 < 80.0, the advisory fires anyway, producing output like Turns recorded: 0 … NOTE: cache hit rate is low. The user sees a "fix your caching" prompt when the real situation is simply that the model doesn't expose cache stats. Guard the advisory on turns > 0.
| if avg_pct < 80.0 { | |
| out.push_str(" NOTE: cache hit rate is low (< 80%). Check prefix stability above or consider /compact.\n"); | |
| } | |
| if turns > 0 && avg_pct < 80.0 { | |
| out.push_str(" NOTE: cache hit rate is low (< 80%). Check prefix stability above or consider /compact.\n"); | |
| } |
| let mut turns = 0u64; | ||
| let (hit, miss, input) = app.session.turn_cache_history.iter().fold( | ||
| (0u64, 0u64, 0u64), | ||
| |(hit, miss, input), rec| { | ||
| let Some(hit_tokens) = rec.cache_hit_tokens else { | ||
| return (hit, miss, input); | ||
| }; | ||
| let h = u64::from(hit_tokens); | ||
| let m = u64::from( | ||
| rec.cache_miss_tokens | ||
| .unwrap_or(rec.input_tokens.saturating_sub(hit_tokens)), | ||
| ); | ||
| turns += 1; | ||
| (hit + h, miss + m, input + u64::from(rec.input_tokens)) | ||
| }, | ||
| ); |
There was a problem hiding this comment.
turns mutated as a side-effect inside fold
Mutating an outer variable from inside the fold closure works in Rust but is surprising and hides the counter from fold's own accumulator. Folding into a 4-tuple keeps all state visible and eliminates the mutable capture.
| let mut turns = 0u64; | |
| let (hit, miss, input) = app.session.turn_cache_history.iter().fold( | |
| (0u64, 0u64, 0u64), | |
| |(hit, miss, input), rec| { | |
| let Some(hit_tokens) = rec.cache_hit_tokens else { | |
| return (hit, miss, input); | |
| }; | |
| let h = u64::from(hit_tokens); | |
| let m = u64::from( | |
| rec.cache_miss_tokens | |
| .unwrap_or(rec.input_tokens.saturating_sub(hit_tokens)), | |
| ); | |
| turns += 1; | |
| (hit + h, miss + m, input + u64::from(rec.input_tokens)) | |
| }, | |
| ); | |
| let (turns, hit, miss, input) = app.session.turn_cache_history.iter().fold( | |
| (0u64, 0u64, 0u64, 0u64), | |
| |(turns, hit, miss, input), rec| { | |
| let Some(hit_tokens) = rec.cache_hit_tokens else { | |
| return (turns, hit, miss, input); | |
| }; | |
| let h = u64::from(hit_tokens); | |
| let m = u64::from( | |
| rec.cache_miss_tokens | |
| .unwrap_or(rec.input_tokens.saturating_sub(hit_tokens)), | |
| ); | |
| (turns + 1, hit + h, miss + m, input + u64::from(rec.input_tokens)) | |
| }, | |
| ); |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
…mary (Hmbown#2264) Surfaces the current pinned prefix fingerprint hash and stability diagnostics via a new `/cache stats` subcommand, per the low-risk 0.8.x slice agreed on in Hmbown#2264. Changes: - EngineEvent::PrefixCacheChange now carries pinned_combined_hash from PrefixStabilityManager so the TUI layer can surface it - App.last_pinned_prefix_hash stores the hash per-turn for /cache stats - New /cache stats command renders three sections: - Prefix Stability: pct, checks, changes, last-change description - Prefix Fingerprint: full SHA-256 hash, short id, drift warning - Cache Hit Rate: aggregated token totals + low-hit advisory - format_tokens() helper for compact K/M token display - 6 unit tests covering: no-data, stable, drift warning, hit-rate summary, low-hit advisory, token formatting Refs: Hmbown#2264
- Move pinned_hash extraction after check_and_update() so the reported hash reflects the current prefix state, not the previous turn - Skip non-cache-aware turns (cache_hit_tokens=None) in /cache stats aggregation; infer miss tokens when cache_miss_tokens is None - Clear last_pinned_prefix_hash to None when pinned_combined_hash is empty (e.g. switching to a non-caching model/provider)
clear_model_scoped_telemetry() now resets last_pinned_prefix_hash to None so /cache stats does not show the previous model/provider cache scope's fingerprint after a switch.
b8380ec to
6175285
Compare
Closes #2264 (low-risk 0.8.x slice).
What
New
/cache statssubcommand that surfaces prefix-cache stability diagnostics in three sections:Prefix Stability — percentage, check/change counts, WARNING when drifted
Prefix Fingerprint — full SHA-256 combined hash, short id, drift warning
Cache Hit Rate — aggregated hit/miss/input tokens from per-turn telemetry, low-hit advisory below 80%
Plumbing
PrefixCacheChangeeventpinned_combined_hash: String— carries pinned prefix fingerprint to TUIturn_loop.rspinned_fingerprint().combined_sha256beforecheck_and_updateApplast_pinned_prefix_hash: Option<String>— updated per-turnui.rsFiles
6 files, +232 −1
Greptile Summary
This PR adds
/cache stats, a new subcommand that surfaces prefix-cache stability, a pinned SHA-256 fingerprint, and a per-session cache-hit rate summary. It also includes a largely undocumented second feature: a "whale routes" single-column model picker for DeepSeek providers, with a newwhale_routes.rstaxonomy and matching refactor ofmodel_picker.rs./cache statsplumbing (debug.rs,events.rs,turn_loop.rs,app.rs,ui.rs): the newformat_cache_statsfunction aggregates per-turn telemetry, displays prefix stability and fingerprint data, and fires a low-hit advisory below 80%. The six accompanying tests cover the main paths. Thepinned_combined_hashfield is populated aftercheck_and_updatein both branches (the PR description says "before", which is a minor doc discrepancy but not a functional issue).whale_routes.rs,model_picker.rs): introduces six named model+effort combinations (Blue Whale → Porpoise) and replaces the two-column picker with a single-column list for DeepSeek providers.model_picker.rscontains a compile error inrender_whale_routes—popup_heightcalls.min()on ausize(fromwhale_route_row_count()) against au16(fromarea.height), which will not compile. Thesort_orderfield onWhaleRoutealso silently doubles as an array index without an explicit invariant test.Confidence Score: 2/5
Not safe to merge:
render_whale_routesinmodel_picker.rswill not compile due to a type mismatch in thepopup_heightexpression.The
/cache statswiring is clean and the logic is correct, but the PR includes a second undocumented feature (whale-routes model picker) that contains a definite compile error:(row_count + 4).min(area.height.saturating_sub(4))mixesusizewithu16, which Rust will reject. This would prevent the entire TUI crate from building.crates/tui/src/tui/model_picker.rs— therender_whale_routesmethod has the compile error.crates/tui/src/tui/whale_routes.rsshould also add an assertion thatsort_order == array index.Important Files Changed
usizevsu16type mismatch inpopup_heightcomputation that prevents this code from building.sort_orderdoubles silently as the array index, a load-bearing invariant with no direct assertion.format_cache_statsandformat_tokensfor the new/cache statssubcommand, with six unit tests; main logic is correct and well-guarded.pinned_fingerprint().combined_sha256in both changed and stable branches and populates the new event field; straightforward and correct.last_pinned_prefix_hash: Option<String>field, initialises it toNone, and clears it on session reset; clean.pinned_combined_hashfrom the event intoApp, converting empty strings toNone; correct.Sequence Diagram
sequenceDiagram participant Engine as turn_loop.rs participant PM as PrefixStabilityManager participant Event as PrefixCacheChange event participant UI as ui.rs (event loop) participant App as App state participant Cmd as /cache stats Engine->>PM: check_and_update(system_text, tools) PM-->>Engine: Ok(_) or Err(change) Engine->>PM: pinned_fingerprint() PM-->>Engine: combined_sha256 Engine->>Event: "emit PrefixCacheChange { pinned_combined_hash, stability_pct, changed }" Event->>UI: received in run_event_loop UI->>App: "last_pinned_prefix_hash = Some(hash)" UI->>App: "prefix_stability_pct = Some(pct)" UI->>App: "prefix_change_count += 1 (if changed)" Note over Cmd,App: User runs /cache stats Cmd->>App: read prefix_stability_pct, last_pinned_prefix_hash Cmd->>App: read session.turn_cache_history Cmd-->>Cmd: format_cache_stats(app) → String messageComments Outside Diff (1)
crates/tui/src/tui/model_picker.rs, line 371 (link)popup_heightmixesusize(fromwhale_route_row_count()) withu16(fromarea.height.saturating_sub(4)) in the same.min()call. Rust requires both sides of.min()to be the same type, so this will not compile. Theas u16cast only applies to the final result and does not resolve the intermediate type mismatch.Reviews (3): Last reviewed commit: "fix: clear last_pinned_prefix_hash on mo..." | Re-trigger Greptile