EvolvingPrograms/context-management

context-management

Prompt-cache and context-window management for AI SDK apps. Extracted from a set of gateway-caching experiments in which the combined techniques cut the cost of a 20-turn Opus conversation by 82% compared with using the gateway's caching: "auto" setting alone.

What it does:

  • Breakpoint placement: pinTailBreakpoint (a prepareStep hook) caches each tool-loop step's tail so multi-step turns don't re-pay for the same tool output on every step (-15% on its own), plus trailing-history chains and breakpoint budgeting under Anthropic's 4-marker cap.
  • Server-side context edits: contextEdits({ contextWindow }) configures Anthropic's clear_tool_uses / clear_thinking with cache-friendly sizing, and mirrorTrim keeps the local history matched to what the server cleared so the cached prefix stays aligned.
  • Usage / cost accounting: makeMessageMetadata attaches model id, per-turn token breakdown (cache read/write/uncached), context size, and the AI Gateway's actual billed USD to every assistant message; sessionUsage folds a conversation into totals. Metadata rides the UIMessage, so existing persistence stores it for free.
  • Truncation + recovery: replace old, large tool-result bodies with id-stamped stubs (deterministic, cache-stable) and expose a fetch_full_result tool so the model can recover any full output on demand. Recovery is backed by existing chat persistence (historyOutputStore), so no extra storage is needed.
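
To make the truncation-and-recovery idea concrete, here is a minimal sketch of how id-stamped stubs and a lookup-based recovery could work. It is not the library's implementation: the ToolResult shape, the STUB_THRESHOLD and KEEP_RECENT constants, and the function names truncateOldResults and fetchFullResult are all hypothetical, chosen only for illustration.

```typescript
// Hypothetical shape for illustration; the library's real types may differ.
type ToolResult = { toolCallId: string; toolName: string; output: string };

const STUB_THRESHOLD = 2_000; // characters; an assumed cutoff for "large"
const KEEP_RECENT = 2; // assumed: the most recent results stay intact

// The stub depends only on the call id and the original length, so the same
// result always produces the same stub text (deterministic, cache-stable).
function stubFor(r: ToolResult): string {
  return `[truncated tool result id=${r.toolCallId} chars=${r.output.length}; call fetch_full_result to recover]`;
}

// Replace old, large tool-result bodies with stubs; leave the rest untouched.
function truncateOldResults(results: ToolResult[]): ToolResult[] {
  const cutoff = results.length - KEEP_RECENT;
  return results.map((r, i) =>
    i < cutoff && r.output.length > STUB_THRESHOLD
      ? { ...r, output: stubFor(r) }
      : r,
  );
}

// Recovery: look the id up in the untruncated, already-persisted history.
function fetchFullResult(
  store: ToolResult[],
  toolCallId: string,
): string | undefined {
  return store.find((r) => r.toolCallId === toolCallId)?.output;
}
```

Because the stub carries the tool-call id, the full body can be found again in whatever history the app already persists, which is why no separate storage is required in this sketch.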

Install

bun add github:EvolvingPrograms/context-management

Or locally from disk during development:

// package.json
"dependencies": {
  "@evolvingprograms/context-management": "file:../context-management"
}

Use

Building blocks are standalone:

import { streamText } from "ai"
import {
  pinTailBreakpoint,
  contextEdits,
  mirrorTrim,
  makeMessageMetadata,
  sessionUsage,
} from "@evolvingprograms/context-management"

const result = streamText({
  model,
  prepareStep: pinTailBreakpoint,
  providerOptions: {
    gateway: { caching: "auto" },
    anthropic: { contextManagement: contextEdits({ contextWindow: 200_000 }) },
  },
  ...
})

return result.toUIMessageStreamResponse({
  messageMetadata: makeMessageMetadata({ model: modelId }),
  onFinish: ({ messages }) => save(messages), // usage, cost, and model id are persisted along with the messages
})
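
For readers wondering what a tail-breakpoint hook does in principle, the sketch below marks the last message of each step with an Anthropic cache-control marker. It is an illustration under stated assumptions, not the source of pinTailBreakpoint: the Msg type is a simplified stand-in for the AI SDK's message type, and pinTail and prepareStepSketch are hypothetical names. The real hook also handles trailing-history chains and the 4-marker budget, which this sketch omits.

```typescript
// Simplified local message shape; the AI SDK's own message type is richer.
type Msg = {
  role: "system" | "user" | "assistant" | "tool";
  content: unknown;
  providerOptions?: Record<string, Record<string, unknown>>;
};

// Cache-control marker in the form the AI SDK's Anthropic provider reads
// from providerOptions.anthropic.
const EPHEMERAL = { cacheControl: { type: "ephemeral" } };

// Return a copy of the messages with a cache breakpoint on the last one,
// so the next step can read everything up to the tail from cache.
// Existing provider options on that message are preserved; the input
// array and its objects are not mutated.
function pinTail(messages: Msg[]): Msg[] {
  const lastIndex = messages.length - 1;
  return messages.map((m, i) =>
    i === lastIndex
      ? {
          ...m,
          providerOptions: {
            ...m.providerOptions,
            anthropic: { ...m.providerOptions?.anthropic, ...EPHEMERAL },
          },
        }
      : m,
  );
}

// Shaped like a prepareStep hook: it receives the step's messages and
// returns the messages to send for that step.
const prepareStepSketch = ({ messages }: { messages: Msg[] }) => ({
  messages: pinTail(messages),
});
```

In a multi-step tool loop the tail is usually the latest tool output, so pinning it means the following step pays the cache-read rate for that output instead of the full input rate.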

Or compose everything with one call. The modes (off | auto | pinned | managed) mirror the gateway-caching strategy ladder:

import { streamText } from "ai"
import { createContextManagement } from "@evolvingprograms/context-management"

const cm = createContextManagement({
  mode: "managed",
  model: modelId,
  contextWindow: 200_000,
  modelMessages, // backs the fetch_full_result recovery tool automatically
})

const result = streamText({
  model,
  system: SYSTEM + cm.systemSuffix,
  messages: modelMessages,
  tools: { ...appTools, ...cm.tools },
  prepareStep: cm.prepareStep,
  providerOptions: cm.providerOptions(base),
})
return result.toUIMessageStreamResponse({
  messageMetadata: cm.messageMetadata,
  onFinish: ({ messages }) => save(messages),
})
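
Since the usage metadata is saved with each assistant message, session totals can be derived later by folding over the persisted history. The sketch below shows that idea only. The TurnUsage field names, the StoredMessage shape, and foldUsage are hypothetical and are not the library's sessionUsage API or its metadata schema.

```typescript
// Hypothetical per-turn usage record; the library's metadata fields may differ.
type TurnUsage = {
  cacheReadTokens: number;
  cacheWriteTokens: number;
  uncachedInputTokens: number;
  outputTokens: number;
  costUsd: number;
};

// Hypothetical persisted message: only the parts this sketch needs.
type StoredMessage = {
  role: "user" | "assistant";
  metadata?: { usage?: TurnUsage };
};

const ZERO: TurnUsage = {
  cacheReadTokens: 0,
  cacheWriteTokens: 0,
  uncachedInputTokens: 0,
  outputTokens: 0,
  costUsd: 0,
};

// Sum the usage attached to assistant messages; messages without usage
// metadata (user turns, older records) are skipped.
function foldUsage(messages: StoredMessage[]): TurnUsage {
  return messages.reduce<TurnUsage>((total, m) => {
    const u = m.role === "assistant" ? m.metadata?.usage : undefined;
    if (!u) return total;
    return {
      cacheReadTokens: total.cacheReadTokens + u.cacheReadTokens,
      cacheWriteTokens: total.cacheWriteTokens + u.cacheWriteTokens,
      uncachedInputTokens: total.uncachedInputTokens + u.uncachedInputTokens,
      outputTokens: total.outputTokens + u.outputTokens,
      costUsd: total.costUsd + u.costUsd,
    };
  }, ZERO);
}
```

Keeping cache reads, cache writes, and uncached input as separate counters matters because they are billed at different rates, so a single input-token total would hide how well the cache is working.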

Layout

src/            modules (each with its own README + sibling tests)
├── breakpoints/   cache_control placement
├── edits/         Anthropic context edits + local mirror-trim
├── truncation/    tool-result truncation + recovery
└── usage/         token / cache / cost accounting

Develop

bun install
bun test          # unit tests, no network
bun run typecheck

About

Prompt-cache and context-window management for AI SDK apps: cache breakpoint placement, Anthropic context edits, tool-result truncation with recovery, and usage/cost accounting.
