Tokenometer

Tokenometer — LLM cost calculator, token counter, latency benchmark, and CI cost guardrail for Claude, GPT-4o, Gemini, Mistral, and Cohere. CLI + GitHub Action + VS Code extension + Claude Code skill. Live: https://tokenometer.vercel.app

Tokenometer answers a simple, expensive question: does it actually cost less to send your prompt as YAML, JSON, XML, or Markdown — across Claude, GPT-4o, Gemini, Mistral, and Cohere — and how fast does each provider actually respond? It started as a $23 question. Today it's the only LLM cost CLI that also tells you latency, ships a PR-blocking GitHub Action, lights up your editor's status bar, and teaches Claude Code agents to think in dollars.

Why Tokenometer vs alternatives

	Tokenometer	tokencost (AgentOps)	tiktoken (OpenAI)	gpt-tokenizer	promptfoo	gpt-token-counter-live (VS Code)
Multi-provider (Anthropic / OpenAI / Google)	✓	✓	–	–	✓	–
Mistral support	✓	–	–	–	partial	–
Cohere support	✓	–	–	–	partial	–
Multi-format compare (JSON / YAML / XML / MD / text)	✓	–	–	–	–	–
Empirical mode (real provider `countTokens`)	✓	–	–	–	partial	–
Latency (TTFT + tokens/sec, p50/p95)	✓	–	–	–	partial	–
Vision-token cost (image inputs)	✓	–	–	–	–	–
Cost (USD), not just tokens	✓	✓	–	–	partial	–
Honest "approximate" flag when offline is a proxy	✓	–	–	–	–	–
CLI	✓	✓	–	–	✓	–
GitHub Action (PR cost-diff guardrail)	✓	–	–	–	partial	–
Per-file attribution in CI	✓	–	–	–	–	–
SARIF output (GitHub code scanning)	✓	–	–	–	–	–
VS Code / Cursor extension	✓	–	–	–	–	✓
Claude Code skill	✓	–	–	–	–	–

Tokenometer is the only tool in this list that combines multi-provider (5 providers, 63 models) + multi-format + empirical mode + latency benchmarking + USD cost + a PR-blocking GitHub Action + an editor extension + a Claude Code skill + an honest approximate-vs-exact flag. tokencost is the closest match for cost-in-USD across providers, but it doesn't compare formats, measure latency, or run as a CI guardrail. tiktoken and gpt-tokenizer are great single-provider primitives — Tokenometer uses gpt-tokenizer under the hood for the offline path. promptfoo is the broadest evaluator overall, but cost is one input among many; it isn't a dedicated cost-guardrail. The VS Code extension is real-time-in-editor only.

Findings (Anthropic, n=150 cells across 10 prompt shapes)

claude-opus-4-7 real messages.countTokens is +62% denser (median) than the popular cl100k_base proxy. If you budget Claude cost from tiktoken, you under-budget by ~half.
claude-sonnet-4-6 and claude-haiku-4-5 are within ~17% of cl100k_base (and identical to each other — same tokenizer family).
Format choice (JSON / YAML / XML / Markdown / text) is a wash — within ~1pp on the median delta. Picking a cheaper model saves 7-12×; reformatting saves ~10%.
gpt-4o empirical (Anthropic's countTokens equivalent for OpenAI: tiktoken o200k_base) matches the offline tokenometer counts on 100/100 cells, exactly. Sanity anchor.

Reproduce: npm install && npm run benchmarks:empirical with ANTHROPIC_API_KEY set. Full sweep is free (countTokens is free).

Demo

$ tokenometer ./prompt.md --model claude-opus-4-7 --format json,yaml,markdown

  Model              Format     Tokens   USD       Approx
  ────────────────── ────────── ──────── ───────── ──────
  claude-opus-4-7    json       1,243    $0.0186   ✓
  claude-opus-4-7    yaml       1,189    $0.0178   ✓
  claude-opus-4-7    markdown   1,156    $0.0173   ✓

  Cheapest: claude-opus-4-7 as markdown ($0.0173)
  Priciest: claude-opus-4-7 as json     ($0.0186, 1.08x more)

The Approx column shows ✓ when the count is a proxy (Anthropic / Google / Mistral-Tekken / Cohere offline) and is empty when it's an exact match (OpenAI offline, Mistral SentencePiece-family offline, or any provider with --empirical).

Real demo (with empirical mode + GIF) at https://tokenometer.vercel.app.

Why this exists

Cost AND latency in one CLI — the only tool that does both. tiktoken and @anthropic-ai/tokenizer give you a token count for one provider. They don't tell you:

What the same prompt costs across multiple providers and models (Claude, GPT-4o, Gemini, Mistral, Cohere)
How fast each provider actually responds (TTFT + tokens/sec, p50/p95/mean) — a real generation, not a synthetic benchmark
Whether format conversion (YAML ↔ JSON ↔ XML ↔ MD) actually moves the needle
The empirical cost — what your provider actually charged on a real call, after prompt caching
Whether a PR introduced a prompt-cost regression
The vision-token cost when your prompt includes images

Tokenometer is dev-time, multi-provider, multi-format, optionally empirical, latency-aware, vision-aware, and CI-native. And the same core powers the CLI, the GitHub Action, the VS Code / Cursor status bar, and the Claude Code skill — counts, pricing, and tokenizer choices stay identical across surfaces.

Install

One-shot:

npx tokenometer ./prompt.md --model claude-opus-4-7

Global:

npm i -g tokenometer
tokenometer ./prompt.md --format yaml,json,xml,markdown,text --model claude-opus-4-7,gpt-4o,mistral-large-latest,command-r-plus

Stdin works too:

echo "prompt body" | tokenometer - --model claude-sonnet-4-6

Run tokenometer --help for the full flag list and the current set of known model ids (63 across 5 providers).

Adoption playbooks

Use docs/ADOPTION.md for copy-paste integration paths: GitHub Actions cost gates, VS Code/Cursor rollout, MCP clients, LangChain, Vercel AI SDK, OpenAI SDK, Anthropic SDK, and a PR cost-regression case study.

Five-line use

1. Compare formats for a single prompt (offline, no API key)

tokenometer ./prompt.md --model claude-opus-4-7

Prints estimated tokens + USD across each format × the chosen model(s). Default model is claude-opus-4-7 (or auto-detected from *_API_KEY env vars); default formats are all of json,markdown,text,xml,yaml.

2. Empirical mode (real provider `countTokens`, with a hard ceiling)

ANTHROPIC_API_KEY=… tokenometer ./prompt.md --empirical --max-spend 0.05

For each (model × format) cell, calls the provider's exact token-count API:

Anthropic → messages.countTokens (free)
Google → model.countTokens (free)
OpenAI → tiktoken o200k_base (matches OpenAI's production count exactly, no API call)
Cohere → POST /v1/tokenize (free, requires COHERE_API_KEY)
Mistral → unsupported (no public token-count endpoint); offline mistral-tokenizer-js is exact for SentencePiece-family models, approximate (chars/4) for Tekken-family models.

Set GOOGLE_API_KEY (or GEMINI_API_KEY) for Gemini, MISTRAL_API_KEY for Mistral, COHERE_API_KEY for Cohere. --offline forces the offline path even if --empirical is also passed.

3. CI guardrail (GitHub Action)

- uses: faraa2m/tokenometer/packages/action@v1
  with:
    paths: prompts/**/*.md,prompts/**/*.json
    models: claude-opus-4-7,claude-sonnet-4-6,gpt-4o
    formats: json,yaml,markdown
    budget: '0.50'      # USD; omit to disable the gate
    top-n-files: 5      # rows shown in the per-file Δ table; the rest fold into <details>

Posts a sticky PR comment with the cost diff vs the base branch, including a per-file Δ table and a collapsible "all files" block. Fails the check when the total Δ exceeds budget. See packages/action/README.md for all inputs and outputs.

4. Live cost in your editor (VS Code / Cursor)

ext install faraa2m.tokenometer-vscode

Or install directly from the VS Code Marketplace or Open VSX (Cursor / VSCodium).

Status bar shows model · tokens · USD for the active prompt file, updates on every keystroke (debounced), and turns warning-colored when you exceed tokenometer.warnOnCostAbove. Same @tokenometer/core as the CLI — what you see in the editor matches what CI computes. See packages/vscode/README.md.

5. Claude Code skill (agentic prompt-cost awareness)

cp -R packages/claude-code-skill ~/.claude/skills/tokenometer

Installs the tokenometer-cost-check skill so Claude Code agents can answer "what does this prompt cost?" with a real number — they shell out to npx tokenometer instead of guessing from tiktoken. See packages/claude-code-skill/README.md.

Methodology

Tokenometer picks a tokenizer per provider and flags the count as approximate (approximate: true in the API result) when the offline path is a proxy:

Provider	Offline tokenizer	Exactness	Empirical (`--empirical`)
OpenAI	`gpt-tokenizer` `o200k_base`	exact	same `o200k_base` (matches OpenAI production count)
Anthropic	`gpt-tokenizer` `cl100k_base`	approximate	`messages.countTokens` (exact, free)
Google	`chars / 4` heuristic	approximate	`model.countTokens` (exact, free)
Mistral	`mistral-tokenizer-js` (V1/V2/V3) · `chars/4` for Tekken family	exact for SP-family · approximate for Tekken	unsupported (no public token-count endpoint)
Cohere	`chars / 4` heuristic	approximate	`POST /v1/tokenize` (exact, free, requires `COHERE_API_KEY`)

Cost = tokens / 1000 × per-1k input rate. Pricing and context windows are sourced from the tokenlens registry, with a small set of local overrides for bleeding-edge models the registry hasn't picked up yet (and the full Cohere catalog, which @tokenlens/models doesn't ship at v1.3.0) — see packages/core/src/rates.ts (RATES_VERSION).

Output formats

The CLI is multi-surface by design:

--output table (default) — human-readable per-cell table.
--output json — emits a TokenometerResult shape ({ files: [{ path, results: [...] }] }); pipe to jq.
--output sarif — emits SARIF 2.1.0; drop into GitHub Code Scanning or any SARIF viewer.
--by-file — appends a per-file token + USD summary table for multi-file inputs.
--image <path> (repeatable) — adds vision-token cost for Claude / GPT-4o / Gemini.
--latency — measures real generation latency (TTFT + total ms + tokens/sec, p50/p95/mean over n trials, default 3). Implies --empirical. Supported on Anthropic, OpenAI, Google, Cohere, and Mistral.

npx tokenometer ./prompt.md --output sarif > tokenometer.sarif
npx tokenometer ./prompts/*.md --by-file --output json | jq '.files[].results | map(.inputCost) | add'
ANTHROPIC_API_KEY=… OPENAI_API_KEY=… npx tokenometer ./prompt.md --latency --model claude-opus-4-7,gpt-4o

Full flag reference: packages/cli/README.md.

Editor + Claude Code

VS Code / Cursor — @tokenometer/vscode. Status bar with live token count + USD cost; settings for model, format, and a warn-above-USD threshold; Tokenometer: Switch model and Tokenometer: Show details commands.
Claude Code skill — @tokenometer/claude-code-skill. Drop in ~/.claude/skills/tokenometer/SKILL.md and Claude Code agents will reach for npx tokenometer … when you ask them anything cost-shaped.

Project health

Code of Conduct
Contributing guide
Security policy — uses GitHub Private Vulnerability Reporting
Changelog
Discussions

Status

v1.0.x — production-ready. Shipped across npm (tokenometer, @tokenometer/core), VS Code Marketplace + Open VSX (faraa2m.tokenometer-vscode), GitHub Marketplace (the Tokenometer Action), and the live playground at tokenometer.vercel.app. See CHANGELOG.md for release notes and the milestones page for what's next.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 59 Commits
.changeset		.changeset
.github		.github
.planning/research		.planning/research
benchmarks		benchmarks
docs		docs
packages		packages
scripts		scripts
.gitignore		.gitignore
.npmrc		.npmrc
.nvmrc		.nvmrc
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
biome.json		biome.json
package-lock.json		package-lock.json
package.json		package.json
tsconfig.base.json		tsconfig.base.json
tsconfig.json		tsconfig.json
vercel.json		vercel.json
vitest.config.ts		vitest.config.ts
vitest.workspace.ts		vitest.workspace.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tokenometer

Why Tokenometer vs alternatives

Findings (Anthropic, n=150 cells across 10 prompt shapes)

Demo

Why this exists

Install

Adoption playbooks

Five-line use

1. Compare formats for a single prompt (offline, no API key)

2. Empirical mode (real provider `countTokens`, with a hard ceiling)

3. CI guardrail (GitHub Action)

4. Live cost in your editor (VS Code / Cursor)

5. Claude Code skill (agentic prompt-cost awareness)

Methodology

Output formats

Editor + Claude Code

Project health

Status

License

About

Uh oh!

Releases 9

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Tokenometer

Why Tokenometer vs alternatives

Findings (Anthropic, n=150 cells across 10 prompt shapes)

Demo

Why this exists

Install

Adoption playbooks

Five-line use

1. Compare formats for a single prompt (offline, no API key)

2. Empirical mode (real provider countTokens, with a hard ceiling)

3. CI guardrail (GitHub Action)

4. Live cost in your editor (VS Code / Cursor)

5. Claude Code skill (agentic prompt-cost awareness)

Methodology

Output formats

Editor + Claude Code

Project health

Status

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 9

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

2. Empirical mode (real provider `countTokens`, with a hard ceiling)

Packages