Summary
OpenAI returns usage.prompt_tokens_details.cached_tokens on every text-generation response for cache-eligible models (gpt-4o family, gpt-5 family, etc.), but AbstractOpenAiCompatibleTextGenerationModel::parseGenerationResponse only reads the three top-level fields and discards the rest of the usage object before any downstream code can see it.
This makes OpenAI prompt caching unobservable to every caller, since OpenAI exposes cache hits only through this field. Caching is a major cost lever (cached input tokens are billed at a discount of 50% or more, depending on the model). Without this number, downstream applications cannot measure whether their prompt structure is actually cache-friendly, or whether changes to that structure improved or regressed the cache hit rate.
The same gap will affect Anthropic once an Anthropic provider implementation lands. Anthropic returns usage.cache_creation_input_tokens and usage.cache_read_input_tokens alongside the regular token counts in its usage object, and the current base class would discard those too.
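A minimal PHP sketch of what reading Anthropic's cache counters could look like. The function name extractAnthropicCacheUsage and the sample values are hypothetical; the field names are the ones listed above. One assumption worth flagging: Anthropic's input_tokens counts only tokens that were neither read from nor written to the cache, so the full prompt size is the sum of all three input counters.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pull Anthropic's prompt-cache counters out of a decoded
 * Messages API `usage` object. A counter the provider omitted comes back as null.
 *
 * Assumption: `input_tokens` excludes cached tokens, so the full prompt size is
 * input_tokens + cache_creation_input_tokens + cache_read_input_tokens.
 *
 * @param array<string, mixed> $usage
 * @return array{cachedTokens: ?int, cacheCreationTokens: ?int, totalInputTokens: int}
 */
function extractAnthropicCacheUsage(array $usage): array
{
    $read = isset($usage['cache_read_input_tokens'])
        ? (int) $usage['cache_read_input_tokens']
        : null;
    $created = isset($usage['cache_creation_input_tokens'])
        ? (int) $usage['cache_creation_input_tokens']
        : null;

    return [
        'cachedTokens' => $read,
        'cacheCreationTokens' => $created,
        'totalInputTokens' => (int) ($usage['input_tokens'] ?? 0) + ($read ?? 0) + ($created ?? 0),
    ];
}

// Illustrative usage object (values are made up).
$usage = [
    'input_tokens' => 50,
    'cache_creation_input_tokens' => 0,
    'cache_read_input_tokens' => 1800,
    'output_tokens' => 120,
];
print_r(extractAnthropicCacheUsage($usage));
```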
Where the data is dropped
src/Providers/OpenAiCompatibleImplementation/AbstractOpenAiCompatibleTextGenerationModel.php (around lines 604–618):
```php
if (isset($responseData['usage']) && is_array($responseData['usage'])) {
    $usage = $responseData['usage'];
    $tokenUsage = new TokenUsage(
        $usage['prompt_tokens'] ?? 0,
        $usage['completion_tokens'] ?? 0,
        $usage['total_tokens'] ?? 0
    );
} else {
    $tokenUsage = new TokenUsage(0, 0, 0);
}

// Use any other data from the response as provider-specific response metadata.
$additionalData = $responseData;
unset($additionalData['id'], $additionalData['choices'], $additionalData['usage']);
```
The prompt_tokens_details.cached_tokens value is present on the response, never read into TokenUsage, and then explicitly stripped from $additionalData so callers cannot recover it through the metadata passthrough either.
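To make the loss concrete, here is a self-contained PHP sketch that replays the excerpt's unset() against an illustrative Chat Completions payload. The id, model, and token counts are made up; only the field names follow the API response.

```php
<?php
declare(strict_types=1);

// A trimmed Chat Completions response, as json_decode(..., true) would produce it.
$responseData = [
    'id' => 'chatcmpl-example',
    'choices' => [],
    'model' => 'gpt-4o',
    'usage' => [
        'prompt_tokens' => 2006,
        'completion_tokens' => 300,
        'total_tokens' => 2306,
        'prompt_tokens_details' => ['cached_tokens' => 1920],
    ],
];

// Same metadata passthrough as the excerpt: `usage` is removed wholesale,
// so prompt_tokens_details goes with it.
$additionalData = $responseData;
unset($additionalData['id'], $additionalData['choices'], $additionalData['usage']);

var_dump($responseData['usage']['prompt_tokens_details']['cached_tokens']); // int(1920)
var_dump(isset($additionalData['usage']));                                    // bool(false)
```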
Symptom
Callers measuring per-request usage see promptTokens: N with no indication of how much of N was served from cache. Multi-turn and repeated-prefix workloads are hit hardest, because that is exactly where caching helps most and where the metric is most useful.
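The metric callers are after is essentially cachedTokens / promptTokens. A sketch of how it might be computed once a cached count is available; the helper names are hypothetical, and the per-request version assumes promptTokens already includes the cached portion (as OpenAI's prompt_tokens does). null is kept distinct from 0 so that "provider did not report" is not confused with "no cache hit".

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: share of one request's prompt served from cache.
 * Returns null when the provider did not report a cached count, which is
 * not the same as a reported 0 (a cache miss).
 */
function cacheHitRate(int $promptTokens, ?int $cachedTokens): ?float
{
    if ($cachedTokens === null || $promptTokens <= 0) {
        return null;
    }
    return $cachedTokens / (float) $promptTokens;
}

/**
 * Hypothetical helper: token-weighted hit rate across many requests, e.g. the
 * turns of one conversation. Requests with an unknown cached count are skipped.
 *
 * @param list<array{0: int, 1: ?int}> $requests [promptTokens, cachedTokens] pairs
 */
function aggregateCacheHitRate(array $requests): ?float
{
    $prompt = 0;
    $cached = 0;
    foreach ($requests as [$promptTokens, $cachedTokens]) {
        if ($cachedTokens === null) {
            continue;
        }
        $prompt += $promptTokens;
        $cached += $cachedTokens;
    }
    return $prompt > 0 ? $cached / (float) $prompt : null;
}

printf("%.3f\n", cacheHitRate(2006, 1920)); // 0.957
var_dump(cacheHitRate(2006, null));         // NULL
printf("%.3f\n", aggregateCacheHitRate([[2006, 0], [2100, 1920], [500, null]])); // 0.468
```

A token-weighted aggregate is used rather than an average of per-request rates, so that short requests do not skew the result.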
Suggested shape
Add nullable cache fields to TokenUsage, parsed when the provider supplies them and left null otherwise:
- ?int $cachedTokens: for OpenAI's prompt_tokens_details.cached_tokens and Anthropic's cache_read_input_tokens.
- ?int $cacheCreationTokens: for Anthropic's cache_creation_input_tokens. (No OpenAI analog exists today, since OpenAI's prompt caching is automatic and not write-tracked, but the field is useful for surfacing Anthropic data cleanly and for any future provider that distinguishes cache creation from cache reads.)
This follows the precedent set by commit 305b80d ("Add support for thought tokens in token usage data"): both are provider-optional usage fields that benefit from being first-class on the DTO rather than buried in additionalData.
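A rough sketch of the proposed shape, assuming the existing TokenUsage constructor takes prompt, completion, and total counts positionally (as in the excerpt above) and that the new fields are appended as optional trailing arguments. TokenUsageSketch and parseOpenAiUsage are hypothetical names used only here; the real class also carries the thought-token field from 305b80d, which is left out. The syntax sticks to PHP 7.4 features.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for TokenUsage with the two proposed cache fields.
 * Both default to null, meaning "not reported by this provider".
 */
final class TokenUsageSketch
{
    private int $promptTokens;
    private int $completionTokens;
    private int $totalTokens;
    private ?int $cachedTokens;
    private ?int $cacheCreationTokens;

    public function __construct(
        int $promptTokens,
        int $completionTokens,
        int $totalTokens,
        ?int $cachedTokens = null,
        ?int $cacheCreationTokens = null
    ) {
        $this->promptTokens = $promptTokens;
        $this->completionTokens = $completionTokens;
        $this->totalTokens = $totalTokens;
        $this->cachedTokens = $cachedTokens;
        $this->cacheCreationTokens = $cacheCreationTokens;
    }

    public function getPromptTokens(): int
    {
        return $this->promptTokens;
    }

    public function getCompletionTokens(): int
    {
        return $this->completionTokens;
    }

    public function getTotalTokens(): int
    {
        return $this->totalTokens;
    }

    public function getCachedTokens(): ?int
    {
        return $this->cachedTokens;
    }

    public function getCacheCreationTokens(): ?int
    {
        return $this->cacheCreationTokens;
    }
}

/**
 * Sketch of OpenAI-compatible usage parsing with the cache field added.
 * cacheCreationTokens is left null because OpenAI does not report cache writes.
 *
 * @param array<string, mixed> $usage Decoded `usage` object from the response.
 */
function parseOpenAiUsage(array $usage): TokenUsageSketch
{
    $details = $usage['prompt_tokens_details'] ?? null;
    $cached = is_array($details) && isset($details['cached_tokens'])
        ? (int) $details['cached_tokens']
        : null;

    return new TokenUsageSketch(
        (int) ($usage['prompt_tokens'] ?? 0),
        (int) ($usage['completion_tokens'] ?? 0),
        (int) ($usage['total_tokens'] ?? 0),
        $cached
    );
}

// Illustrative usage object (values are made up).
$u = parseOpenAiUsage([
    'prompt_tokens' => 2006,
    'completion_tokens' => 300,
    'total_tokens' => 2306,
    'prompt_tokens_details' => ['cached_tokens' => 1920],
]);
var_dump($u->getCachedTokens());        // int(1920)
var_dump($u->getCacheCreationTokens()); // NULL
```

Appending the fields as optional trailing arguments keeps existing `new TokenUsage(p, c, t)` call sites valid, so providers that never report caching need no changes.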
Relationship to #158
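#158 (Improve handling of unprovided usage statistics) proposes making the existing TokenUsage properties nullable and the TokenUsage instance itself nullable on GenerativeAiResult. That refactor and this addition are aligned, since both push toward a more flexible TokenUsage shape, but they are independent enough to land in either order:
- If #158 lands first, the new cache fields can be ?int from the start with no migration noise.
- If this lands first, the new cache fields are ?int while the existing ones remain int with 0 sentinels, and #158 then makes the existing fields catch up.
I'm happy to send a PR either way once a maintainer indicates a preference on which lands first and on the field naming.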
Ecosystem signal
This is a widely recognized gap in OpenAI-compatible clients. Recent issues filed against other libraries for the same kind of bug include NousResearch/hermes-agent#25400, braintrustdata/braintrust-sdk-dotnet#53, PostHog/posthog-python#574, and BerriAI/litellm#22984. Adding the field puts php-ai-client on the right side of this pattern as more callers start treating cache hit rate as a first-class metric.
AI assistance
- AI assistance: Yes
- Tool(s): opencode (Claude Opus 4.7)
- Used for: Drafted the issue body after Chris asked me to diagnose why a downstream Data Machine workload could not observe prompt-cache hit rate. I located the parsing gap, verified the discarded field, checked for existing issues/PRs in this repo and the wider ecosystem, and drafted this issue for Chris to review. Chris is the submitter and is responsible for the report; AI assistance is descriptive, not a disclaimer.