Skip to content

feat: add /cache stats — prefix hash/drift exposure and cache-hit summary#2336

Open
encyc wants to merge 4 commits into
Hmbown:mainfrom
encyc:feat/cache-stats-2264
Open

feat: add /cache stats — prefix hash/drift exposure and cache-hit summary#2336
encyc wants to merge 4 commits into
Hmbown:mainfrom
encyc:feat/cache-stats-2264

Conversation

@encyc
Copy link
Copy Markdown

@encyc encyc commented May 29, 2026

Closes #2264 (low-risk 0.8.x slice).

What

New /cache stats subcommand that surfaces prefix-cache stability diagnostics in three sections:

Prefix Stability — percentage, check/change counts, WARNING when drifted
Prefix Fingerprint — full SHA-256 combined hash, short id, drift warning
Cache Hit Rate — aggregated hit/miss/input tokens from per-turn telemetry, low-hit advisory below 80%

Plumbing

Layer Change
PrefixCacheChange event +pinned_combined_hash: String — carries pinned prefix fingerprint to TUI
turn_loop.rs Extracts pinned_fingerprint().combined_sha256 before check_and_update
App +last_pinned_prefix_hash: Option<String> — updated per-turn
ui.rs Stores hash from event into App

Files

crates/tui/src/commands/debug.rs      | +212  (format_cache_stats, format_tokens, "stats" branch, 6 tests)
crates/tui/src/commands/mod.rs        |   +1  (usage string)
crates/tui/src/core/events.rs         |   +4  (pinned_combined_hash field)
crates/tui/src/core/engine/turn_loop.rs|   +6  (populate hash)
crates/tui/src/tui/app.rs             |   +5  (field + init)
crates/tui/src/tui/ui.rs              |   +4  (store hash from event)

6 files, +232 −1

Greptile Summary

This PR adds /cache stats, a new subcommand that surfaces prefix-cache stability, a pinned SHA-256 fingerprint, and a per-session cache-hit rate summary. It also includes a largely undocumented second feature: a "whale routes" single-column model picker for DeepSeek providers, with a new whale_routes.rs taxonomy and matching refactor of model_picker.rs.

  • /cache stats plumbing (debug.rs, events.rs, turn_loop.rs, app.rs, ui.rs): the new format_cache_stats function aggregates per-turn telemetry, displays prefix stability and fingerprint data, and fires a low-hit advisory below 80%. The six accompanying tests cover the main paths. The pinned_combined_hash field is populated after check_and_update in both branches (the PR description says "before", which is a minor doc discrepancy but not a functional issue).
  • Whale-routes model picker (whale_routes.rs, model_picker.rs): introduces six named model+effort combinations (Blue Whale → Porpoise) and replaces the two-column picker with a single-column list for DeepSeek providers. model_picker.rs contains a compile error in render_whale_routespopup_height calls .min() on a usize (from whale_route_row_count()) against a u16 (from area.height), which will not compile. The sort_order field on WhaleRoute also silently doubles as an array index without an explicit invariant test.

Confidence Score: 2/5

Not safe to merge: render_whale_routes in model_picker.rs will not compile due to a type mismatch in the popup_height expression.

The /cache stats wiring is clean and the logic is correct, but the PR includes a second undocumented feature (whale-routes model picker) that contains a definite compile error: (row_count + 4).min(area.height.saturating_sub(4)) mixes usize with u16, which Rust will reject. This would prevent the entire TUI crate from building.

crates/tui/src/tui/model_picker.rs — the render_whale_routes method has the compile error. crates/tui/src/tui/whale_routes.rs should also add an assertion that sort_order == array index.

Important Files Changed

Filename Overview
crates/tui/src/tui/model_picker.rs Substantial refactor adding whale-route single-column layout for DeepSeek providers; contains a compile error — usize vs u16 type mismatch in popup_height computation that prevents this code from building.
crates/tui/src/tui/whale_routes.rs New file defining six canonical whale-route (model + effort) combinations with lookup helpers; well-tested, but sort_order doubles silently as the array index, a load-bearing invariant with no direct assertion.
crates/tui/src/commands/debug.rs Adds format_cache_stats and format_tokens for the new /cache stats subcommand, with six unit tests; main logic is correct and well-guarded.
crates/tui/src/core/engine/turn_loop.rs Extracts pinned_fingerprint().combined_sha256 in both changed and stable branches and populates the new event field; straightforward and correct.
crates/tui/src/tui/app.rs Adds last_pinned_prefix_hash: Option<String> field, initialises it to None, and clears it on session reset; clean.
crates/tui/src/tui/ui.rs Stores pinned_combined_hash from the event into App, converting empty strings to None; correct.

Sequence Diagram

sequenceDiagram
    participant Engine as turn_loop.rs
    participant PM as PrefixStabilityManager
    participant Event as PrefixCacheChange event
    participant UI as ui.rs (event loop)
    participant App as App state
    participant Cmd as /cache stats

    Engine->>PM: check_and_update(system_text, tools)
    PM-->>Engine: Ok(_) or Err(change)
    Engine->>PM: pinned_fingerprint()
    PM-->>Engine: combined_sha256
    Engine->>Event: "emit PrefixCacheChange { pinned_combined_hash, stability_pct, changed }"
    Event->>UI: received in run_event_loop
    UI->>App: "last_pinned_prefix_hash = Some(hash)"
    UI->>App: "prefix_stability_pct = Some(pct)"
    UI->>App: "prefix_change_count += 1 (if changed)"

    Note over Cmd,App: User runs /cache stats
    Cmd->>App: read prefix_stability_pct, last_pinned_prefix_hash
    Cmd->>App: read session.turn_cache_history
    Cmd-->>Cmd: format_cache_stats(app) → String message
Loading

Comments Outside Diff (1)

  1. crates/tui/src/tui/model_picker.rs, line 371 (link)

    P0 popup_height mixes usize (from whale_route_row_count()) with u16 (from area.height.saturating_sub(4)) in the same .min() call. Rust requires both sides of .min() to be the same type, so this will not compile. The as u16 cast only applies to the final result and does not resolve the intermediate type mismatch.

    Fix in Codex Fix in Claude Code Fix in Cursor

Fix All in Codex Fix All in Claude Code Fix All in Cursor

Reviews (3): Last reviewed commit: "fix: clear last_pinned_prefix_hash on mo..." | Re-trigger Greptile

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new /cache stats command to the TUI, displaying prefix-cache stability, fingerprints, and aggregated cache-hit summaries. The implementation updates the engine's turn loop to propagate the pinned prefix combined hash via events and stores it in the app state. The review feedback highlights three key issues: first, the token aggregation logic in format_cache_stats should correctly handle non-cache-aware turns and infer missing miss tokens to prevent skewed statistics; second, the pinned_hash is extracted before the cache is updated, making the reported hash one turn behind; and third, an empty pinned_combined_hash should clear the stored hash in the app state rather than leaving a stale value.

Comment on lines +313 to +322
let (hit, miss, input) = app.session.turn_cache_history.iter().fold(
(0u64, 0u64, 0u64),
|(hit, miss, input), rec| {
(
hit + u64::from(rec.cache_hit_tokens.unwrap_or(0)),
miss + u64::from(rec.cache_miss_tokens.unwrap_or(0)),
input + u64::from(rec.input_tokens),
)
},
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In format_cache_stats, the fold logic aggregates cache hit and miss tokens across all turns. However, if cache_hit_tokens is None (indicating the turn is not cache-aware), defaulting cache_hit_tokens and cache_miss_tokens to 0 will pollute the statistics. Furthermore, if cache_hit_tokens is Some but cache_miss_tokens is None, the miss tokens should be inferred as input_tokens.saturating_sub(hit) (just like in format_cache_history), otherwise they are under-reported as 0, leading to an incorrect 100% hit rate. We should only aggregate cache statistics for cache-aware turns and correctly infer missing cache_miss_tokens.

        let (hit, miss, input) = app.session.turn_cache_history.iter().fold(
            (0u64, 0u64, 0u64),
            |(hit, miss, input), rec| {
                if let Some(h) = rec.cache_hit_tokens {
                    let m = rec.cache_miss_tokens.unwrap_or_else(|| rec.input_tokens.saturating_sub(h));
                    (
                        hit + u64::from(h),
                        miss + u64::from(m),
                        input + u64::from(rec.input_tokens),
                    )
                } else {
                    (hit, miss, input + u64::from(rec.input_tokens))
                }
            },
        );

Comment thread crates/tui/src/core/engine/turn_loop.rs Outdated
Comment on lines 250 to 254
let pinned_hash = pm
.pinned_fingerprint()
.map(|fp| fp.combined_sha256.clone())
.unwrap_or_default();
match pm.check_and_update(&system_text, tools_ref) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pinned_hash is extracted from pm before pm.check_and_update is called. Since check_and_update is what actually updates the pinned fingerprint, extracting it beforehand means the sent pinned_combined_hash will always be one turn behind (and empty on the very first turn). Extracting the fingerprint after the update ensures the TUI receives the correct, up-to-date hash.

                let change_res = pm.check_and_update(&system_text, tools_ref);
                let pinned_hash = pm
                    .pinned_fingerprint()
                    .map(|fp| fp.combined_sha256.clone())
                    .unwrap_or_default();
                match change_res {

Comment thread crates/tui/src/tui/ui.rs Outdated
Comment on lines +1744 to +1746
if !pinned_combined_hash.is_empty() {
app.last_pinned_prefix_hash = Some(pinned_combined_hash);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If pinned_combined_hash is empty (e.g., when switching to a model or provider that does not support prefix caching), we should clear app.last_pinned_prefix_hash by setting it to None instead of leaving the stale hash from the previous provider/model intact.

                        app.last_pinned_prefix_hash = (!pinned_combined_hash.is_empty()).then_some(pinned_combined_hash);

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b571e8f6f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/tui/src/core/engine/turn_loop.rs Outdated
Comment on lines 250 to 254
let pinned_hash = pm
.pinned_fingerprint()
.map(|fp| fp.combined_sha256.clone())
.unwrap_or_default();
match pm.check_and_update(&system_text, tools_ref) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use the post-update pinned hash

In sessions where the prefix manager is unpinned or detects drift, this captures pinned_hash before check_and_update, but that call is what pins the first fingerprint and re-pins after a change. As a result /cache stats still shows Pinned hash: unavailable after the first turn, and after a drift warning it reports the old baseline hash until another stable turn happens, even though the event field is documented as the current pinned prefix hash.

Useful? React with 👍 / 👎.

Comment thread crates/tui/src/commands/debug.rs Outdated
Comment on lines +317 to +318
hit + u64::from(rec.cache_hit_tokens.unwrap_or(0)),
miss + u64::from(rec.cache_miss_tokens.unwrap_or(0)),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Don't treat missing cache telemetry as zero misses

When a provider reports cache_hit_tokens but omits cache_miss_tokens, the existing /cache table infers misses as input_tokens - hit, but /cache stats adds zero here. In that common partial-telemetry case, a turn like 10k input / 8k hit / missing miss is summarized as 100% hit rate instead of 80%, hiding low cache effectiveness; records with no hit telemetry are also folded into a misleading 0.0% of 0 advisory instead of being excluded like the table does.

Useful? React with 👍 / 👎.

Comment thread crates/tui/src/tui/app.rs
/// Current pinned prefix combined hash (SHA-256, 64 hex chars).
/// Updated per-turn via PrefixCacheChange events; surfaced by
/// `/cache stats` for cache-hit debugging.
pub last_pinned_prefix_hash: Option<String>,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear the cached fingerprint with model-scoped telemetry

This new fingerprint is model/provider scoped, but clear_model_scoped_telemetry() (used by model and provider switches) only clears the token fields and turn_cache_history, leaving last_pinned_prefix_hash behind. After switching models or providers, /cache stats can therefore show the previous cache scope's pinned hash alongside an empty/new hit-rate history until another prefix event overwrites it, which makes the diagnostics point at the wrong request prefix.

Useful? React with 👍 / 👎.

Comment on lines +353 to +355
if avg_pct < 80.0 {
out.push_str(" NOTE: cache hit rate is low (< 80%). Check prefix stability above or consider /compact.\n");
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Spurious low-hit-rate advisory when no cache-aware turns exist

When history is non-empty but every record has cache_hit_tokens == None (e.g. a model that doesn't report cache telemetry), the fold leaves turns == 0, total_cache == 0, and avg_pct == 0.0. Because 0.0 < 80.0, the advisory fires anyway, producing output like Turns recorded: 0 … NOTE: cache hit rate is low. The user sees a "fix your caching" prompt when the real situation is simply that the model doesn't expose cache stats. Guard the advisory on turns > 0.

Suggested change
if avg_pct < 80.0 {
out.push_str(" NOTE: cache hit rate is low (< 80%). Check prefix stability above or consider /compact.\n");
}
if turns > 0 && avg_pct < 80.0 {
out.push_str(" NOTE: cache hit rate is low (< 80%). Check prefix stability above or consider /compact.\n");
}

Fix in Codex Fix in Claude Code Fix in Cursor

Comment on lines +317 to +332
let mut turns = 0u64;
let (hit, miss, input) = app.session.turn_cache_history.iter().fold(
(0u64, 0u64, 0u64),
|(hit, miss, input), rec| {
let Some(hit_tokens) = rec.cache_hit_tokens else {
return (hit, miss, input);
};
let h = u64::from(hit_tokens);
let m = u64::from(
rec.cache_miss_tokens
.unwrap_or(rec.input_tokens.saturating_sub(hit_tokens)),
);
turns += 1;
(hit + h, miss + m, input + u64::from(rec.input_tokens))
},
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 turns mutated as a side-effect inside fold

Mutating an outer variable from inside the fold closure works in Rust but is surprising and hides the counter from fold's own accumulator. Folding into a 4-tuple keeps all state visible and eliminates the mutable capture.

Suggested change
let mut turns = 0u64;
let (hit, miss, input) = app.session.turn_cache_history.iter().fold(
(0u64, 0u64, 0u64),
|(hit, miss, input), rec| {
let Some(hit_tokens) = rec.cache_hit_tokens else {
return (hit, miss, input);
};
let h = u64::from(hit_tokens);
let m = u64::from(
rec.cache_miss_tokens
.unwrap_or(rec.input_tokens.saturating_sub(hit_tokens)),
);
turns += 1;
(hit + h, miss + m, input + u64::from(rec.input_tokens))
},
);
let (turns, hit, miss, input) = app.session.turn_cache_history.iter().fold(
(0u64, 0u64, 0u64, 0u64),
|(turns, hit, miss, input), rec| {
let Some(hit_tokens) = rec.cache_hit_tokens else {
return (turns, hit, miss, input);
};
let h = u64::from(hit_tokens);
let m = u64::from(
rec.cache_miss_tokens
.unwrap_or(rec.input_tokens.saturating_sub(hit_tokens)),
);
(turns + 1, hit + h, miss + m, input + u64::from(rec.input_tokens))
},
);

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Codex Fix in Claude Code Fix in Cursor

encyc added 4 commits May 29, 2026 17:20
…mary (Hmbown#2264)

Surfaces the current pinned prefix fingerprint hash and stability
diagnostics via a new `/cache stats` subcommand, per the low-risk
0.8.x slice agreed on in Hmbown#2264.

Changes:
- EngineEvent::PrefixCacheChange now carries pinned_combined_hash
  from PrefixStabilityManager so the TUI layer can surface it
- App.last_pinned_prefix_hash stores the hash per-turn for /cache stats
- New /cache stats command renders three sections:
  - Prefix Stability: pct, checks, changes, last-change description
  - Prefix Fingerprint: full SHA-256 hash, short id, drift warning
  - Cache Hit Rate: aggregated token totals + low-hit advisory
- format_tokens() helper for compact K/M token display
- 6 unit tests covering: no-data, stable, drift warning, hit-rate
  summary, low-hit advisory, token formatting

Refs: Hmbown#2264
- Move pinned_hash extraction after check_and_update() so the reported
  hash reflects the current prefix state, not the previous turn
- Skip non-cache-aware turns (cache_hit_tokens=None) in /cache stats
  aggregation; infer miss tokens when cache_miss_tokens is None
- Clear last_pinned_prefix_hash to None when pinned_combined_hash is
  empty (e.g. switching to a non-caching model/provider)
clear_model_scoped_telemetry() now resets last_pinned_prefix_hash to
None so /cache stats does not show the previous model/provider cache
scope's fingerprint after a switch.
@encyc encyc force-pushed the feat/cache-stats-2264 branch from b8380ec to 6175285 Compare May 29, 2026 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Systematic prefix-cache stability — learn from deepseek-reasonix's 99%+ cache hit architecture

1 participant