context-management

Prompt-cache + context-window management for AI SDK apps. Extracted from a set of gateway-caching experiments, where the combined techniques cut a 20-turn Opus conversation's cost by 82% vs. gateway caching: "auto" alone.

What it does:

Breakpoint placement — pinTailBreakpoint (a prepareStep hook) caches each tool-loop step's tail so multi-step turns don't re-pay for the same tool output every step (-15% alone), plus trailing-history chains and breakpoint budgeting under Anthropic's 4-marker cap.
Server-side context edits — contextEdits({ contextWindow }) configures Anthropic's clear_tool_uses / clear_thinking with cache-friendly sizing, and mirrorTrim keeps the local history matched to what the server cleared so the cached prefix stays aligned.
Usage / cost accounting — makeMessageMetadata attaches model id, per-turn token breakdown (cache read/write/uncached), context size, and the AI Gateway's actual billed USD to every assistant message; sessionUsage folds a conversation into totals. Metadata rides the UIMessage, so existing persistence stores it for free.
Truncation + recovery — replace old, large tool-result bodies with id-stamped stubs (deterministic, cache-stable) and expose a fetch_full_result tool so the model can recover any full output on demand — backed by existing chat persistence (historyOutputStore), no extra storage.

Install

bun add github:EvolvingPrograms/context-management

Or locally from disk during development:

// package.json
"dependencies": {
  "@evolvingprograms/context-management": "file:../context-management"
}

Use

Building blocks are standalone:

import {
  pinTailBreakpoint,
  contextEdits,
  mirrorTrim,
  makeMessageMetadata,
  sessionUsage,
} from "@evolvingprograms/context-management"

const result = streamText({
  model,
  prepareStep: pinTailBreakpoint,
  providerOptions: {
    gateway: { caching: "auto" },
    anthropic: { contextManagement: contextEdits({ contextWindow: 200_000 }) },
  },
  ...
})

return result.toUIMessageStreamResponse({
  messageMetadata: makeMessageMetadata({ model: modelId }),
  onFinish: ({ messages }) => save(messages), // usage/cost/model persist free
})

Or compose everything with one call — modes off | auto | pinned | managed mirror the gateway-caching strategy ladder:

import { createContextManagement } from "@evolvingprograms/context-management"

const cm = createContextManagement({
  mode: "managed",
  model: modelId,
  contextWindow: 200_000,
  modelMessages, // backs the fetch_full_result recovery tool automatically
})

const result = streamText({
  model,
  system: SYSTEM + cm.systemSuffix,
  messages: modelMessages,
  tools: { ...appTools, ...cm.tools },
  prepareStep: cm.prepareStep,
  providerOptions: cm.providerOptions(base),
})
return result.toUIMessageStreamResponse({
  messageMetadata: cm.messageMetadata,
  onFinish: ({ messages }) => save(messages),
})

Layout

src/            modules (each with its own README + sibling tests)
├── breakpoints/   cache_control placement
├── edits/         Anthropic context edits + local mirror-trim
├── truncation/    tool-result truncation + recovery
└── usage/         token / cache / cost accounting

Develop

bun install
bun test          # unit tests, no network
bun run typecheck

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
src		src
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
bun.lock		bun.lock
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

context-management

Install

Use

Layout

Develop

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

context-management

Install

Use

Layout

Develop

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages