Surface `prompt_tokens_details.cached_tokens` (and parallel cache fields) in `TokenUsage` #236

## Summary

OpenAI returns `usage.prompt_tokens_details.cached_tokens` on every text-generation response for cache-eligible models (gpt-4o family, gpt-5 family, etc.), but `AbstractOpenAiCompatibleTextGenerationModel::parseGenerationResponse` reads only the three top-level counts (`prompt_tokens`, `completion_tokens`, `total_tokens`) and discards the rest of the `usage` object before any downstream code can see it.

This makes OpenAI prompt caching unobservable to every caller, and OpenAI reports cache hits **only** through this field. Caching is a major cost lever (cached input tokens are billed at a discount of 50% or more, depending on the model), and without this number downstream applications cannot measure whether their prompt structure is actually cache-friendly, or whether a change to that structure improved or regressed the cache hit rate.

The same gap will exist for Anthropic once an Anthropic provider implementation lands: Anthropic returns `usage.cache_creation_input_tokens` and `usage.cache_read_input_tokens` in its own `usage` object, and with the current `TokenUsage` shape there is nowhere to carry them, so they would be discarded too.

## Where the data is dropped

[`src/Providers/OpenAiCompatibleImplementation/AbstractOpenAiCompatibleTextGenerationModel.php` (around lines 604–618)](https://github.com/WordPress/php-ai-client/blob/main/src/Providers/OpenAiCompatibleImplementation/AbstractOpenAiCompatibleTextGenerationModel.php#L604-L618):

```php
if (isset($responseData['usage']) && is_array($responseData['usage'])) {
    $usage = $responseData['usage'];

    $tokenUsage = new TokenUsage(
        $usage['prompt_tokens'] ?? 0,
        $usage['completion_tokens'] ?? 0,
        $usage['total_tokens'] ?? 0
    );
} else {
    $tokenUsage = new TokenUsage(0, 0, 0);
}

// Use any other data from the response as provider-specific response metadata.
$additionalData = $responseData;
unset($additionalData['id'], $additionalData['choices'], $additionalData['usage']);
```

The `prompt_tokens_details.cached_tokens` value is present on the response but is never read into `TokenUsage`. The whole `usage` array is then explicitly stripped from `$additionalData`, so callers cannot recover the value through the metadata passthrough either.
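
A minimal sketch of reading the cache counters from the decoded `usage` array before it is unset, following the field mapping proposed under Suggested shape. `extractCacheTokenUsage()` is a hypothetical helper, not part of php-ai-client, and it assumes the Anthropic keys sit at the top level of `usage` as described in the Summary. Missing fields come back as `null`, so "not reported" stays distinguishable from "reported as zero".

```php
<?php

/**
 * Hypothetical helper (not part of php-ai-client): pulls the cache-related
 * counters out of a decoded `usage` array before it is unset.
 *
 * Any field the provider did not send is returned as null.
 *
 * @param array<string, mixed> $usage The decoded `usage` object.
 * @return array{cachedTokens: ?int, cacheCreationTokens: ?int}
 */
function extractCacheTokenUsage(array $usage): array
{
    $cachedTokens = null;

    if (
        isset($usage['prompt_tokens_details'])
        && is_array($usage['prompt_tokens_details'])
        && isset($usage['prompt_tokens_details']['cached_tokens'])
    ) {
        // OpenAI: usage.prompt_tokens_details.cached_tokens
        $cachedTokens = (int) $usage['prompt_tokens_details']['cached_tokens'];
    } elseif (isset($usage['cache_read_input_tokens'])) {
        // Anthropic: usage.cache_read_input_tokens
        $cachedTokens = (int) $usage['cache_read_input_tokens'];
    }

    // Anthropic only: usage.cache_creation_input_tokens (no OpenAI analog).
    $cacheCreationTokens = isset($usage['cache_creation_input_tokens'])
        ? (int) $usage['cache_creation_input_tokens']
        : null;

    return [
        'cachedTokens' => $cachedTokens,
        'cacheCreationTokens' => $cacheCreationTokens,
    ];
}
```

In the parsing code quoted above, a call like this would sit inside the `isset($responseData['usage'])` branch, before the `unset()` removes `usage` from `$additionalData`.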

## Symptom

Callers measuring per-request usage see `promptTokens: N` with no indication of how much of `N` was served from cache. Multi-turn and repeated-prefix workloads are hit hardest, because that is exactly where caching helps most and where the metric is most useful.
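
Once the value is surfaced, the per-request hit rate is a single division. A minimal sketch follows; `promptCacheHitRate()` is a hypothetical helper, and it assumes OpenAI's accounting, in which `cached_tokens` is a portion of `prompt_tokens`. A provider that reports cached tokens separately from its input count would need a different denominator.

```php
<?php

/**
 * Hypothetical helper: fraction of prompt tokens served from cache.
 *
 * Assumes cachedTokens is a subset of promptTokens (OpenAI accounting).
 * Returns null when the provider reported no cache data, or when there
 * were no prompt tokens, instead of guessing a rate.
 */
function promptCacheHitRate(int $promptTokens, ?int $cachedTokens): ?float
{
    if ($cachedTokens === null || $promptTokens <= 0) {
        return null;
    }

    return $cachedTokens / (float) $promptTokens;
}
```

Returning `null` rather than `0.0` for "not reported" keeps a provider without cache data from showing up as a 0% hit rate in dashboards.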

## Suggested shape

Add nullable cache fields to `TokenUsage`, parsed when the provider supplies them, left `null` otherwise:

- `?int $cachedTokens` — for OpenAI's `prompt_tokens_details.cached_tokens` and Anthropic's `cache_read_input_tokens`.
- `?int $cacheCreationTokens` — for Anthropic's `cache_creation_input_tokens`. (No OpenAI analog today, since OpenAI's prompt caching is automatic and not write-tracked, but the field is useful for clean Anthropic surfacing and for any future provider that distinguishes cache-create vs cache-read.)
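
A minimal sketch of the proposed shape. `CacheAwareTokenUsage` is a hypothetical stand-in, not the library's `TokenUsage`, whose real constructor, getters, and serialization may differ; the point is only that the cache fields are optional, trailing, and default to `null`.

```php
<?php

/**
 * Hypothetical stand-in for TokenUsage with the proposed cache fields.
 * Not the library's class: the real DTO has its own constructor,
 * getters, and serialization.
 */
final class CacheAwareTokenUsage
{
    private int $promptTokens;
    private int $completionTokens;
    private int $totalTokens;

    /** Tokens read from cache (OpenAI cached_tokens, Anthropic cache_read_input_tokens); null if not reported. */
    private ?int $cachedTokens;

    /** Tokens written to cache (Anthropic cache_creation_input_tokens); null if not reported. */
    private ?int $cacheCreationTokens;

    public function __construct(
        int $promptTokens,
        int $completionTokens,
        int $totalTokens,
        ?int $cachedTokens = null,
        ?int $cacheCreationTokens = null
    ) {
        $this->promptTokens = $promptTokens;
        $this->completionTokens = $completionTokens;
        $this->totalTokens = $totalTokens;
        $this->cachedTokens = $cachedTokens;
        $this->cacheCreationTokens = $cacheCreationTokens;
    }

    public function getPromptTokens(): int
    {
        return $this->promptTokens;
    }

    public function getCompletionTokens(): int
    {
        return $this->completionTokens;
    }

    public function getTotalTokens(): int
    {
        return $this->totalTokens;
    }

    public function getCachedTokens(): ?int
    {
        return $this->cachedTokens;
    }

    public function getCacheCreationTokens(): ?int
    {
        return $this->cacheCreationTokens;
    }
}
```

In this sketch, because the new parameters are optional and trailing, three-argument calls like the one in the excerpt above keep working unchanged and simply report `null` for the cache fields.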

This follows the precedent set by [commit `305b80d` ("Add support for thought tokens in token usage data")](https://github.com/WordPress/php-ai-client/commit/305b80d): both are provider-optional usage fields that benefit from being first-class on the DTO rather than buried in `additionalData`.

## Relationship to #158

[#158](https://github.com/WordPress/php-ai-client/issues/158) (`Improve handling of unprovided usage statistics`) proposes making the existing `TokenUsage` properties nullable and the `TokenUsage` instance itself nullable on `GenerativeAiResult`. That refactor and this addition are aligned, since both push toward a more flexible `TokenUsage` shape, but they are independent enough to land in either order:

- If #158 lands first, the new cache fields can be added as `?int` from the start with no migration noise.
- If this lands first, the new fields are `?int` (the existing ones are `int` with `0` sentinels), and #158 then makes the existing fields catch up.

I'm happy to send a PR either way once a maintainer indicates a preference on which lands first and on the field naming.

## Ecosystem signal

This is a widely recognized gap in OpenAI-compatible clients right now; recent issues describing the same bug in other libraries include `NousResearch/hermes-agent#25400`, `braintrustdata/braintrust-sdk-dotnet#53`, `PostHog/posthog-python#574`, and `BerriAI/litellm#22984`. Surfacing the field puts php-ai-client on the right side of this pattern as more callers start treating cache hit rate as a first-class metric.

## AI assistance

- **AI assistance:** Yes
- **Tool(s):** opencode (Claude Opus 4.7)
- **Used for:** Drafted the issue body after Chris asked me to diagnose why a downstream Data Machine workload could not observe prompt-cache hit rate. I located the parsing gap, verified the discarded field, checked for existing issues/PRs in this repo and the wider ecosystem, and drafted this issue for Chris to review. Chris is the submitter and is responsible for the report; AI assistance is descriptive, not a disclaimer.

