Tokenometer — LLM cost calculator, token counter, latency benchmark, and CI cost guardrail for Claude, GPT-4o, Gemini, Mistral, and Cohere. CLI + GitHub Action + VS Code extension + Claude Code skill. Live: https://tokenometer.vercel.app
Tokenometer answers a simple, expensive question: does it actually cost less to send your prompt as YAML, JSON, XML, or Markdown — across Claude, GPT-4o, Gemini, Mistral, and Cohere — and how fast does each provider actually respond? It started as a $23 question. Today it's the only LLM cost CLI that also tells you latency, ships a PR-blocking GitHub Action, lights up your editor's status bar, and teaches Claude Code agents to think in dollars.
| | Tokenometer | tokencost (AgentOps) | tiktoken (OpenAI) | gpt-tokenizer | promptfoo | gpt-token-counter-live (VS Code) |
|---|---|---|---|---|---|---|
| Multi-provider (Anthropic / OpenAI / Google) | ✓ | ✓ | – | – | ✓ | – |
| Mistral support | ✓ | – | – | – | partial | – |
| Cohere support | ✓ | – | – | – | partial | – |
| Multi-format compare (JSON / YAML / XML / MD / text) | ✓ | – | – | – | – | – |
| Empirical mode (real provider countTokens) | ✓ | – | – | – | partial | – |
| Latency (TTFT + tokens/sec, p50/p95) | ✓ | – | – | – | partial | – |
| Vision-token cost (image inputs) | ✓ | – | – | – | – | – |
| Cost (USD), not just tokens | ✓ | ✓ | – | – | partial | – |
| Honest "approximate" flag when offline is a proxy | ✓ | – | – | – | – | – |
| CLI | ✓ | ✓ | – | – | ✓ | – |
| GitHub Action (PR cost-diff guardrail) | ✓ | – | – | – | partial | – |
| Per-file attribution in CI | ✓ | – | – | – | – | – |
| SARIF output (GitHub code scanning) | ✓ | – | – | – | – | – |
| VS Code / Cursor extension | ✓ | – | – | – | – | ✓ |
| Claude Code skill | ✓ | – | – | – | – | – |
Tokenometer is the only tool in this list that combines multi-provider (5 providers, 63 models) + multi-format + empirical mode + latency benchmarking + USD cost + a PR-blocking GitHub Action + an editor extension + a Claude Code skill + an honest approximate-vs-exact flag. tokencost is the closest match for cost-in-USD across providers, but it doesn't compare formats, measure latency, or run as a CI guardrail. tiktoken and gpt-tokenizer are great single-provider primitives — Tokenometer uses gpt-tokenizer under the hood for the offline path. promptfoo is the broadest evaluator overall, but cost is one input among many; it isn't a dedicated cost-guardrail. The VS Code extension is real-time-in-editor only.
- `claude-opus-4-7` real `messages.countTokens` is +62% denser (median) than the popular `cl100k_base` proxy. If you budget Claude cost from `tiktoken`, you under-budget: real spend runs about 1.6× the estimate.
- `claude-sonnet-4-6` and `claude-haiku-4-5` are within ~17% of `cl100k_base` (and identical to each other, since they share a tokenizer family).
- Format choice (JSON / YAML / XML / Markdown / text) is a wash: within ~1pp on the median delta. Picking a cheaper model saves 7-12×; reformatting saves ~10%.
- `gpt-4o` empirical (the OpenAI counterpart to Anthropic's `countTokens`: tiktoken `o200k_base`) matches the offline tokenometer counts on 100/100 cells, exactly. Sanity anchor.
Reproduce: npm install && npm run benchmarks:empirical with ANTHROPIC_API_KEY set. Full sweep is free (countTokens is free).
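To make the budgeting impact of the density finding concrete, here is a minimal arithmetic sketch. It assumes the +62% median figure applies uniformly to your prompt, which it may not; the function names are hypothetical and are not part of Tokenometer's API.

```typescript
// Median density ratio from the finding above: real count ≈ 1.62 × the cl100k_base proxy.
// Assumption: the median applies to your prompt; individual prompts will vary.
const OPUS_DENSITY_RATIO = 1.62;

// Scale a proxy token count up to an estimated real count.
function estimateRealTokens(proxyTokens: number, densityRatio: number): number {
  return Math.round(proxyTokens * densityRatio);
}

// Fraction of the real cost that a proxy-based budget fails to cover.
function budgetShortfall(densityRatio: number): number {
  return 1 - 1 / densityRatio;
}
```

Under that assumption, a prompt the proxy counts as 1,000 tokens is closer to 1,620 real tokens, and a budget built on the proxy covers only about 62% of the real spend.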
$ tokenometer ./prompt.md --model claude-opus-4-7 --format json,yaml,markdown
Model Format Tokens USD Approx
────────────────── ────────── ──────── ───────── ──────
claude-opus-4-7 json 1,243 $0.0186 ✓
claude-opus-4-7 yaml 1,189 $0.0178 ✓
claude-opus-4-7 markdown 1,156 $0.0173 ✓
Cheapest: claude-opus-4-7 as markdown ($0.0173)
Priciest: claude-opus-4-7 as json ($0.0186, 1.08x more)
The Approx column shows ✓ when the count is a proxy (Anthropic / Google / Mistral-Tekken / Cohere offline) and is empty when it's an exact match (OpenAI offline, Mistral SentencePiece-family offline, or any provider with --empirical).
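The rule for the Approx column can be sketched as a small predicate. This is an illustration of the rule as stated above, not Tokenometer's actual code: the `Provider`, `MistralFamily`, and `CountContext` types and the `isApproximate` function are hypothetical names.

```typescript
type Provider = "openai" | "anthropic" | "google" | "mistral" | "cohere";
type MistralFamily = "sentencepiece" | "tekken";

interface CountContext {
  provider: Provider;
  // True only when the count actually came from the provider's token-count API.
  empirical: boolean;
  // Only meaningful for Mistral models.
  mistralFamily?: MistralFamily;
}

// Returns true when the count is a proxy and should be flagged in the Approx column.
function isApproximate(ctx: CountContext): boolean {
  if (ctx.empirical) return false; // exact provider count
  switch (ctx.provider) {
    case "openai":
      return false; // offline tokenizer matches production
    case "mistral":
      // Exact for SentencePiece-family models; unknown family is treated as a proxy.
      return ctx.mistralFamily !== "sentencepiece";
    default:
      return true; // Anthropic, Google, Cohere offline counts are proxies
  }
}
```

Treating an unknown Mistral family as approximate is a conservative choice made for this sketch.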
Real demo (with empirical mode + GIF) at https://tokenometer.vercel.app.
Cost AND latency in one CLI — the only tool that does both. tiktoken and @anthropic-ai/tokenizer give you a token count for one provider. They don't tell you:
- What the same prompt costs across multiple providers and models (Claude, GPT-4o, Gemini, Mistral, Cohere)
- How fast each provider actually responds (TTFT + tokens/sec, p50/p95/mean) — a real generation, not a synthetic benchmark
- Whether format conversion (YAML ↔ JSON ↔ XML ↔ MD) actually moves the needle
- The empirical cost — what your provider actually charged on a real call, after prompt caching
- Whether a PR introduced a prompt-cost regression
- The vision-token cost when your prompt includes images
Tokenometer is dev-time, multi-provider, multi-format, optionally empirical, latency-aware, vision-aware, and CI-native. And the same core powers the CLI, the GitHub Action, the VS Code / Cursor status bar, and the Claude Code skill — counts, pricing, and tokenizer choices stay identical across surfaces.
One-shot:
npx tokenometer ./prompt.md --model claude-opus-4-7
Global:
npm i -g tokenometer
tokenometer ./prompt.md --format yaml,json,xml,markdown,text --model claude-opus-4-7,gpt-4o,mistral-large-latest,command-r-plus
Stdin works too:
echo "prompt body" | tokenometer - --model claude-sonnet-4-6
Run tokenometer --help for the full flag list and the current set of known model ids (63 across 5 providers).
Use docs/ADOPTION.md for copy-paste integration paths:
GitHub Actions cost gates, VS Code/Cursor rollout, MCP clients, LangChain,
Vercel AI SDK, OpenAI SDK, Anthropic SDK, and a PR cost-regression case study.
tokenometer ./prompt.md --model claude-opus-4-7
Prints estimated tokens + USD across each format × the chosen model(s). Default model is claude-opus-4-7 (or auto-detected from *_API_KEY env vars); default formats are all of json,markdown,text,xml,yaml.
ANTHROPIC_API_KEY=… tokenometer ./prompt.md --empirical --max-spend 0.05
For each (model × format) cell, calls the provider's exact token-count API:
- Anthropic → `messages.countTokens` (free)
- Google → `model.countTokens` (free)
- OpenAI → tiktoken `o200k_base` (matches OpenAI's production count exactly, no API call)
- Cohere → `POST /v1/tokenize` (free, requires `COHERE_API_KEY`)
- Mistral → unsupported (no public token-count endpoint); offline `mistral-tokenizer-js` is exact for SentencePiece-family models, approximate (chars/4) for Tekken-family models.
Set GOOGLE_API_KEY (or GEMINI_API_KEY) for Gemini, MISTRAL_API_KEY for Mistral, COHERE_API_KEY for Cohere. --offline forces the offline path even if --empirical is also passed.
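The key lookup and the `--offline` override described above could look roughly like the following. This is a sketch under stated assumptions: the `KEY_VARS` table, `resolveApiKey`, and `useEmpirical` are hypothetical names, and the precedence of `GOOGLE_API_KEY` over `GEMINI_API_KEY` when both are set is a guess, not documented behavior.

```typescript
type Env = Record<string, string | undefined>;

// Environment variables per provider, taken from the text above.
// Assumption: the first variable listed wins when several are set.
const KEY_VARS: Record<string, string[]> = {
  anthropic: ["ANTHROPIC_API_KEY"],
  openai: ["OPENAI_API_KEY"],
  google: ["GOOGLE_API_KEY", "GEMINI_API_KEY"],
  mistral: ["MISTRAL_API_KEY"],
  cohere: ["COHERE_API_KEY"],
};

// Returns the first non-empty key for the provider, or undefined if none is set.
function resolveApiKey(provider: string, env: Env): string | undefined {
  for (const name of KEY_VARS[provider] ?? []) {
    const value = env[name];
    if (value) return value;
  }
  return undefined;
}

// --offline forces the offline path even if --empirical is also passed.
function useEmpirical(flags: { empirical?: boolean; offline?: boolean }): boolean {
  return flags.empirical === true && flags.offline !== true;
}
```

Passing the environment in as a parameter (rather than reading a global) keeps the lookup easy to test.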
- uses: faraa2m/tokenometer/packages/action@v1
  with:
    paths: prompts/**/*.md,prompts/**/*.json
    models: claude-opus-4-7,claude-sonnet-4-6,gpt-4o
    formats: json,yaml,markdown
    budget: '0.50' # USD; omit to disable the gate
    top-n-files: 5 # rows shown in the per-file Δ table; the rest fold into <details>
Posts a sticky PR comment with the cost diff vs the base branch, including a per-file Δ table and a collapsible "all files" block. Fails the check when the total Δ exceeds budget. See packages/action/README.md for all inputs and outputs.
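The gate logic (per-file Δ, a top-N split, and a failure when the total Δ exceeds the budget) can be sketched as follows. The types and function names are hypothetical, and sorting by absolute Δ is an assumption made for this illustration; the Action's real implementation may differ.

```typescript
interface FileCost { path: string; usd: number; }
interface FileDelta { path: string; deltaUsd: number; }

// Compute head-minus-base cost per file, including added and deleted files.
function diffCosts(base: FileCost[], head: FileCost[]): FileDelta[] {
  const baseUsd: Record<string, number> = {};
  for (const f of base) baseUsd[f.path] = f.usd;

  const inHead: Record<string, boolean> = {};
  const deltas: FileDelta[] = [];
  for (const f of head) {
    inHead[f.path] = true;
    deltas.push({ path: f.path, deltaUsd: f.usd - (baseUsd[f.path] ?? 0) });
  }
  // Files present on the base branch but removed in the PR count as savings.
  for (const f of base) {
    if (!inHead[f.path]) deltas.push({ path: f.path, deltaUsd: -f.usd });
  }
  // Assumption: largest absolute change first.
  deltas.sort((a, b) => Math.abs(b.deltaUsd) - Math.abs(a.deltaUsd));
  return deltas;
}

// Split into the visible top-N rows and the folded remainder, then apply the gate.
// An undefined budget disables the gate, mirroring "omit to disable".
function checkBudget(deltas: FileDelta[], budgetUsd: number | undefined, topN: number) {
  const totalDelta = deltas.reduce((sum, d) => sum + d.deltaUsd, 0);
  return {
    totalDelta,
    top: deltas.slice(0, topN),
    rest: deltas.slice(topN),
    passed: budgetUsd === undefined || totalDelta <= budgetUsd,
  };
}
```

Because the gate compares the total Δ rather than the absolute cost, an expensive prompt that does not change will not fail the check in this sketch.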
ext install faraa2m.tokenometer-vscode
Or install directly from the VS Code Marketplace or Open VSX (Cursor / VSCodium).
Status bar shows model · tokens · USD for the active prompt file, updates on every keystroke (debounced), and turns warning-colored when you exceed tokenometer.warnOnCostAbove. Same @tokenometer/core as the CLI — what you see in the editor matches what CI computes. See packages/vscode/README.md.
cp -R packages/claude-code-skill ~/.claude/skills/tokenometer
Installs the tokenometer-cost-check skill so Claude Code agents can answer "what does this prompt cost?" with a real number: they shell out to npx tokenometer instead of guessing from tiktoken. See packages/claude-code-skill/README.md.
Tokenometer picks a tokenizer per provider and flags the count as approximate (approximate: true in the API result) when the offline path is a proxy:
| Provider | Offline tokenizer | Exactness | Empirical (--empirical) |
|---|---|---|---|
| OpenAI | gpt-tokenizer o200k_base | exact | same o200k_base (matches OpenAI production count) |
| Anthropic | gpt-tokenizer cl100k_base | approximate | messages.countTokens (exact, free) |
| Google | chars / 4 heuristic | approximate | model.countTokens (exact, free) |
| Mistral | mistral-tokenizer-js (V1/V2/V3) · chars/4 for Tekken family | exact for SP-family · approximate for Tekken | unsupported (no public token-count endpoint) |
| Cohere | chars / 4 heuristic | approximate | POST /v1/tokenize (exact, free, requires COHERE_API_KEY) |
Cost = tokens / 1000 × per-1k input rate. Pricing and context windows are sourced from the tokenlens registry, with a small set of local overrides for bleeding-edge models the registry hasn't picked up yet (and the full Cohere catalog, which @tokenlens/models doesn't ship at v1.3.0) — see packages/core/src/rates.ts (RATES_VERSION).
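The cost formula is simple enough to check by hand. The sketch below uses a hypothetical function name and a rate of $0.015 per 1k input tokens; that rate is an assumption back-derived from the sample CLI output earlier in this README, not a quoted price, so check `rates.ts` for real values.

```typescript
// Cost = tokens / 1000 × per-1k input rate.
function inputCostUsd(tokens: number, ratePer1kUsd: number): number {
  return (tokens / 1000) * ratePer1kUsd;
}

// Assumed rate, inferred from the sample output (1,243 tokens shown as $0.0186).
const ASSUMED_RATE_PER_1K_USD = 0.015;

// Format for display with four decimal places, as in the sample table.
function formatUsd(usd: number): string {
  return "$" + usd.toFixed(4);
}
```

With the assumed rate, `formatUsd(inputCostUsd(1243, ASSUMED_RATE_PER_1K_USD))` reproduces the `$0.0186` shown for the JSON row in the sample output.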
The CLI is multi-surface by design:
- `--output table` (default): human-readable per-cell table.
- `--output json`: emits a `TokenometerResult` shape (`{ files: [{ path, results: [...] }] }`); pipe to `jq`.
- `--output sarif`: emits SARIF 2.1.0; drop into GitHub Code Scanning or any SARIF viewer.
- `--by-file`: appends a per-file token + USD summary table for multi-file inputs.
- `--image <path>` (repeatable): adds vision-token cost for Claude / GPT-4o / Gemini.
- `--latency`: measures real generation latency (TTFT + total ms + tokens/sec, p50/p95/mean over `n` trials, default 3). Implies `--empirical`. Supported on Anthropic, OpenAI, Google, Cohere, and Mistral.
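To illustrate how p50/p95/mean summaries over a handful of latency trials can be computed, here is a minimal sketch. It assumes the nearest-rank percentile method and tokens/sec measured over total time; Tokenometer's actual method is not stated here, and all names below are hypothetical.

```typescript
interface Trial { ttftMs: number; totalMs: number; outputTokens: number; }
interface Stats { p50: number; p95: number; mean: number; }

// Nearest-rank percentile (an assumption; interpolating methods give different values).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

function summarize(values: number[]): Stats {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  return { p50: percentile(values, 50), p95: percentile(values, 95), mean };
}

// Assumption: tokens/sec = output tokens divided by total wall-clock seconds.
function summarizeTrials(trials: Trial[]): { ttftMs: Stats; tokensPerSec: Stats } {
  return {
    ttftMs: summarize(trials.map((t) => t.ttftMs)),
    tokensPerSec: summarize(trials.map((t) => t.outputTokens / (t.totalMs / 1000))),
  };
}
```

With the default of 3 trials, nearest-rank p95 is simply the slowest sample, so a larger trial count is needed before p95 says much about tail latency.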
npx tokenometer ./prompt.md --output sarif > tokenometer.sarif
npx tokenometer ./prompts/*.md --by-file --output json | jq '.files[].results | map(.inputCost) | add'
ANTHROPIC_API_KEY=… OPENAI_API_KEY=… npx tokenometer ./prompt.md --latency --model claude-opus-4-7,gpt-4o
Full flag reference: packages/cli/README.md.
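If you would rather post-process the `--output json` result in code than in `jq`, a per-file sum might look like this. Only the `files`, `path`, `results`, and `inputCost` fields are taken from this README; the interfaces below are a partial, assumed view of the shape, and `costPerFile` is a hypothetical helper.

```typescript
// Partial view of the JSON output; real results likely carry more fields.
interface CellResult { inputCost: number; }
interface FileResult { path: string; results: CellResult[]; }
interface TokenometerResult { files: FileResult[]; }

// Sum inputCost across all (model × format) cells for each file.
function costPerFile(result: TokenometerResult): { path: string; totalUsd: number }[] {
  return result.files.map((file) => ({
    path: file.path,
    totalUsd: file.results.reduce((sum, cell) => sum + cell.inputCost, 0),
  }));
}

// Parse the CLI's stdout (captured as a string) into the assumed shape.
function parseResult(json: string): TokenometerResult {
  return JSON.parse(json) as TokenometerResult;
}
```

The cast in `parseResult` does no runtime validation, so a schema check would be worth adding before relying on this in CI.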
- VS Code / Cursor: `@tokenometer/vscode`. Status bar with live token count + USD cost; settings for model, format, and a warn-above-USD threshold; `Tokenometer: Switch model` and `Tokenometer: Show details` commands.
- Claude Code skill: `@tokenometer/claude-code-skill`. Drop in `~/.claude/skills/tokenometer/SKILL.md` and Claude Code agents will reach for `npx tokenometer …` when you ask them anything cost-shaped.
- Code of Conduct
- Contributing guide
- Security policy — uses GitHub Private Vulnerability Reporting
- Changelog
- Discussions
v1.0.x — production-ready. Shipped across npm (tokenometer, @tokenometer/core), VS Code Marketplace + Open VSX (faraa2m.tokenometer-vscode), GitHub Marketplace (the Tokenometer Action), and the live playground at tokenometer.vercel.app. See CHANGELOG.md for release notes and the milestones page for what's next.
MIT