Breaking changes between the MVP v0.1 and MVP v0.2 agentic trace datasets significantly change the theoretical KV cache hit rate under MAX_CONTEXT=131072 #1632

@Duyi-Wang @billishyahao @ZhaiFeiyue @TianDi101
### Summary
We observed a large difference in theoretical KV cache hit rate between the older AgentX trace replay dataset and the newer v0.2 no-subagents dataset. The gap is already visible in the full-dataset theoretical hit rate, and it grows much larger once the default `MAX_CONTEXT=131072` constraint used by the replay scripts is applied.

This means the dataset switch itself changes the expected KV cache behavior, independent of serving engine, scheduler, offload, or eviction policy changes.

### Datasets Compared
| Dataset | Full theoretical token hit rate | Theoretical token hit rate with `MAX_CONTEXT=131072` (closer to measured replay) |
|---|---:|---:|
| `semianalysisai/cc-traces-weka-042026` | ~96.8% | ~96.0-96.2% |
| `semianalysisai/cc-traces-weka-no-subagents-051226` | ~85.8% | ~65.6-69.9% |

### Methodology
The numbers above were computed offline from the `hash_ids` field in each trace.

For each trace, I modeled an infinite local prefix cache:

- Cache scope is per trace because `hash_id_scope=local`.
- For each request, cache-hit tokens are computed from the longest contiguous prefix of `hash_ids` that matches any prior request in the same trace.
- Block size is 64 tokens.
- Cross-trace cache sharing is not counted.
- The `MAX_CONTEXT=131072` estimate filters out or truncates requests that exceed the replay-time context limit, to approximate what the scripts can actually send to the server.
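The per-trace model above can be sketched in Python roughly as follows. This is a minimal sketch, not the script that produced the table: it assumes each trace is a list of request dicts with a `hash_ids` list (one id per 64-token block), approximates prompt length as `len(hash_ids) * 64`, and models two hypothetical over-limit policies, `"filter"` (drop the request) and `"truncate"` (keep its first `MAX_CONTEXT` tokens), because the exact replay behavior is not spelled out here. `match_and_insert` and `theoretical_hit_rate` are illustrative names.

```python
from typing import Dict, List, Optional, Sequence

BLOCK_SIZE = 64        # tokens per hash_id block
MAX_CONTEXT = 131072   # default replay-time context limit, in tokens


def match_and_insert(root: Dict[int, dict], ids: Sequence[int]) -> int:
    """Count leading blocks of ids already cached, then cache all of ids.

    The cache is a trie of hash_ids, so every prefix of every prior request
    in the trace is retained (infinite cache, no eviction).
    """
    node, matched = root, 0
    for h in ids:
        if h not in node:
            break
        node = node[h]
        matched += 1
    for h in ids[matched:]:
        node = node.setdefault(h, {})
    return matched


def theoretical_hit_rate(
    traces: List[List[dict]],
    max_context: Optional[int] = None,
    policy: str = "filter",
) -> float:
    """Token-weighted prefix-cache hit rate over all traces.

    max_context=None gives the full-dataset ceiling. Otherwise requests longer
    than max_context are dropped ("filter") or cut to their first max_context
    tokens ("truncate"); both policies are assumptions about the replay scripts.
    """
    if policy not in ("filter", "truncate"):
        raise ValueError(f"unknown policy: {policy}")
    max_blocks = None if max_context is None else max_context // BLOCK_SIZE
    hit = total = 0
    for requests in traces:
        root: Dict[int, dict] = {}  # hash_id_scope=local: fresh cache per trace
        for req in requests:
            ids = req["hash_ids"]
            if max_blocks is not None and len(ids) > max_blocks:
                if policy == "filter":
                    continue  # never sent, so it neither hits nor populates the cache
                ids = ids[:max_blocks]
            hit += match_and_insert(root, ids) * BLOCK_SIZE
            total += len(ids) * BLOCK_SIZE
    return hit / total if total else 0.0
```

On a hypothetical single trace whose requests carry `hash_ids` `[1, 2]`, `[1, 2, 3]`, `[9]`, `[9, 4]`, the full rate is 192/512 = 0.375; with `max_context=128` the filter policy gives 64/320 = 0.2 and the truncate policy gives 192/448 (about 0.43). Because the trie keeps every prefix ever seen, a request hits on whichever earlier request it shares the longest prefix with, not only the immediately preceding turn.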

### Observations
The older `cc-traces-weka-042026` dataset is highly cache-friendly. Its prompts mostly grow monotonically across turns, so most requests reuse nearly the full previous context.

The newer `cc-traces-weka-no-subagents-051226` dataset is more realistic but much harder for prefix caching. It includes longer traces and more context compaction/reset behavior.
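A hypothetical toy example makes the difference concrete. The sketch below uses made-up `hash_ids` (not drawn from either dataset) and computes, for each request, the longest prefix it shares with any earlier request in the same trace. Both toy traces send 12 blocks in total, but a compaction that replaces the history with a new summary block (id `99`) turns the next request into a full miss.

```python
from typing import List

Trace = List[List[int]]  # one list of hash_ids per request


def prefix_len(a: List[int], b: List[int]) -> int:
    """Number of leading hash_ids two requests share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def per_request_hits(trace: Trace) -> List[int]:
    """Cached blocks per request: longest shared prefix with any earlier request."""
    return [max((prefix_len(p, ids) for p in trace[:i]), default=0)
            for i, ids in enumerate(trace)]


def token_hit_rate(trace: Trace) -> float:
    """Hit rate in blocks; equal to the token rate when every block is 64 tokens."""
    total = sum(len(ids) for ids in trace)
    return sum(per_request_hits(trace)) / total if total else 0.0


# Hypothetical toy traces.
# Monotonic growth: every turn appends to the previous prompt.
monotonic = [[1, 2], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6]]
# Compaction: the third turn replaces the history with a summary block (99),
# so it shares no prefix with anything cached earlier in the trace.
compacted = [[1, 2], [1, 2, 3, 4], [99, 5], [99, 5, 6, 7]]
```

Here `per_request_hits(monotonic)` is `[0, 2, 4]` (6 of 12 blocks, 50%) while `per_request_hits(compacted)` is `[0, 2, 0, 2]` (4 of 12 blocks, about 33%). When each turn appends only a little to a long history, the monotonic ratio approaches 100%, since only the newly appended blocks miss.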

Key point: **even before applying any context-length filter, the full theoretical token hit rate differs substantially: ~96.8% vs ~85.8%.** That is already an ~11 percentage point drop caused by dataset distribution alone.

After applying `MAX_CONTEXT=131072`, the gap becomes much larger:

- Older dataset remains around ~96.0-96.2%.
- Newer dataset drops to ~65.6-69.9%.

This suggests that the newer dataset contains a much larger fraction of long-context traces, whose cache-hit distribution changes significantly once the `MAX_CONTEXT` constraint is applied.

### Why This Is a Problem
When comparing benchmark runs across dataset versions, a lower KV cache hit rate may be incorrectly attributed to the serving engine, scheduler, offload implementation, or cache eviction policy.

In reality, part of the difference comes from the dataset distribution itself, visible even at the full-dataset theoretical level, and another large part comes from the interaction with `MAX_CONTEXT`.

This makes it hard to answer questions like:

- Did the cache implementation regress?
- Did the scheduler/offload policy reduce hit rate?
- Or did the dataset switch change the expected theoretical hit-rate ceiling?

### Suggested Improvements
It would be helpful if the benchmark tooling reported dataset-level theoretical cache-hit ceilings before replay starts, for example:

- Full theoretical prefix-cache token hit rate.
- Effective theoretical prefix-cache token hit rate after `MAX_CONTEXT`.
- Number of requests/traces excluded or truncated by `MAX_CONTEXT`.
- Per-request hit-rate distribution, not only aggregate token-weighted hit rate.
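A pre-replay report covering the items above could look roughly like the sketch below. The names (`ceiling_report`, `CeilingReport`) are hypothetical and not part of the existing tooling. It keeps the offline methodology's assumptions (per-trace infinite prefix cache, 64-token blocks, token counts approximated as `len(hash_ids) * 64`) and additionally assumes over-limit requests are dropped rather than truncated.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

BLOCK_SIZE = 64        # tokens per hash_id block
MAX_CONTEXT = 131072   # default replay-time context limit, in tokens


def _match_and_insert(root: Dict[int, dict], ids: Sequence[int]) -> int:
    """Longest cached prefix of ids (in blocks), then add ids to the trie cache."""
    node, matched = root, 0
    for h in ids:
        if h not in node:
            break
        node = node[h]
        matched += 1
    for h in ids[matched:]:
        node = node.setdefault(h, {})
    return matched


def _replay(traces: List[List[dict]], max_blocks: Optional[int]) -> Tuple[int, int, List[float]]:
    """Hit tokens, total tokens, and per-request hit fractions.

    max_blocks=None models the full dataset; otherwise over-limit requests are
    skipped (assumed filter policy) and never populate the cache.
    """
    hit = total = 0
    fractions: List[float] = []
    for requests in traces:
        root: Dict[int, dict] = {}  # hash_id_scope=local: fresh cache per trace
        for req in requests:
            ids = req["hash_ids"]
            if not ids or (max_blocks is not None and len(ids) > max_blocks):
                continue
            m = _match_and_insert(root, ids)
            hit += m * BLOCK_SIZE
            total += len(ids) * BLOCK_SIZE
            fractions.append(m / len(ids))
    return hit, total, fractions


@dataclass
class CeilingReport:
    full_hit_rate: float             # token-weighted, no context limit
    capped_hit_rate: float           # token-weighted, after MAX_CONTEXT
    requests_over_limit: int         # requests excluded by MAX_CONTEXT
    traces_over_limit: int           # traces with at least one excluded request
    per_request_fractions: List[float]

    def percentile(self, q: float) -> float:
        """Simple index-based percentile of per-request hit fractions, q in [0, 1]."""
        xs = sorted(self.per_request_fractions)
        if not xs:
            return 0.0
        return xs[min(len(xs) - 1, int(q * len(xs)))]


def ceiling_report(traces: List[List[dict]], max_context: int = MAX_CONTEXT) -> CeilingReport:
    max_blocks = max_context // BLOCK_SIZE
    full_hit, full_total, _ = _replay(traces, None)
    cap_hit, cap_total, fractions = _replay(traces, max_blocks)
    over = [sum(len(req["hash_ids"]) > max_blocks for req in reqs) for reqs in traces]
    return CeilingReport(
        full_hit_rate=full_hit / full_total if full_total else 0.0,
        capped_hit_rate=cap_hit / cap_total if cap_total else 0.0,
        requests_over_limit=sum(over),
        traces_over_limit=sum(1 for n in over if n > 0),
        per_request_fractions=fractions,
    )
```

Printing `full_hit_rate`, `capped_hit_rate`, the two exclusion counts, and a few percentiles such as `percentile(0.5)` and `percentile(0.9)` before replay would show both ceilings side by side, and whether a lower aggregate comes from many low-hit requests or from a few large misses.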

This would make benchmark comparisons across dataset versions much easier to interpret.
