A small, extensible CLI coding agent built around one constraint: keep token usage low. As coding-agent costs climb, tiny-code automates the savings so you don't have to. Interactive terminal REPL, interchangeable Anthropic, Gemini, DeepSeek, Qwen Coder, and local (Ollama) models, and just the core features you actually use: read/write/edit files, run shell commands, search code, and a custom commands/skills system. No business logic baked in.
Run cheap, open-weight models locally and escalate heavy work to a frontier model only when needed — see Local models & cost-aware routing.
Status: early (v0.x). Published as
@therr/tiny-code; the binary istiny-code. Names may change before the first npm publish.
npm install -g @therr/tiny-codeOr run from source:
npm install
npm run build
node dist/cli.jsProvide at least one API key. If several are set, the default is the first available in this order: Anthropic, Gemini, DeepSeek, Qwen.
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=...
export DEEPSEEK_API_KEY=sk-...
export QWEN_API_KEY=sk-... # Alibaba DashScope key (DASHSCOPE_API_KEY also works)DeepSeek and Qwen are hosted, OpenAI-compatible coding models. Override their
endpoints with TINY_CODE_DEEPSEEK_URL / TINY_CODE_QWEN_URL (or deepseekBaseUrl
/ qwenBaseUrl in config) — e.g. to point Qwen at the international DashScope host.
tiny-code # start the REPL (uses an available key)
tiny-code --provider gemini # force a provider
tiny-code --model claude-opus-4-8
tiny-code --provider deepseek --model deepseek-v4-pro # DeepSeek's coding model
tiny-code --provider qwen --model qwen3-coder-plus # Qwen Coder
tiny-code --provider ollama --model gemma3:12b # run a local model (no API cost)In the REPL: type a request, watch it work. Mutating actions (writes, edits, shell commands) prompt for approval unless pre-approved in config.
/help— list commands/costs— session token usage, estimated $ cost, and cost-saving tips/clear— clear the conversation history and start fresh/models— show known models, pricing, and the active one (see below)/priority [performance|balanced|cost]— show or switch the cost/performance priority mid-session; re-picks the auto-selected model unless one is pinned (see below)/improve— reflect on the session and propose an improvement PR (see below)/<name> [args]— run a custom command (see below)/exit— quit
tiny-code talks to a local Ollama server over its
OpenAI-compatible API, so any model you've pulled is available — including
Google Gemma 3 (gemma3:4b, gemma3:12b, gemma3:27b) and
qwen2.5-coder (the default, which tool-calls reliably).
ollama serve
ollama pull qwen2.5-coder:7b
tiny-code --provider ollama --model qwen2.5-coder:7bMind the compute cost. Local models are free of API charges but use your machine's RAM/VRAM. On startup with an Ollama model, tiny-code prints how much memory the model needs versus what's free, and warns if it likely won't fit or if the model is too small (≤3B) to tool-call reliably. Rough guide (≈Q4):
| Model | ~RAM needed | Good for |
|---|---|---|
gemma3:1b |
~1 GB | trivial text (poor at tool calls) |
gemma3:4b |
~3 GB | lightweight edits, search |
gemma3:12b |
~7 GB | most coding tasks |
gemma3:27b |
~16 GB | stronger reasoning |
Local-first routing. Set a routing of local-first with an escalateTo
target: every turn starts on the cheap/local model, and tiny-code escalates to
the frontier model when a turn looks heavy (refactors, debugging, multi-file
work) or when the local model gets stuck and calls the built-in escalate tool.
You get local speed and zero cost for the bulk of the work, and frontier power
only for the hard parts. Run /costs any time for usage, spend, and tips.
{
"provider": "ollama",
"model": "qwen2.5-coder:7b",
"routing": "local-first",
"escalateTo": { "provider": "anthropic", "model": "claude-opus-4-8" }
}On start, the agent walks up from the working directory looking for AGENTS.md
(or CLAUDE.md) and includes it in the system prompt — put your project
conventions there.
Drop markdown files with YAML frontmatter into ./.agent/commands/ (per project)
or ~/.config/tiny-code/commands/ (global):
---
name: review
description: Review the staged diff for bugs
argument-hint: "[--strict]"
---
Review the staged git diff. Run `git diff --cached`, then report any bugs,
risky changes, or missing tests. $ARGUMENTSInvoke with /review --strict. $ARGUMENTS is replaced with whatever you type
after the command (appended if the placeholder is absent).
Optional tiny-code.config.json in the project root (or
~/.config/tiny-code/config.json). Precedence: defaults < config file < env <
CLI flags.
{
"provider": "anthropic",
"model": "claude-opus-4-8",
"ollamaBaseUrl": "http://localhost:11434/v1",
"priority": "balanced",
"maxTokens": 16000,
"thinking": true,
"effort": "high",
"maxIterations": 50,
"routing": "off",
"escalateTo": { "provider": "anthropic", "model": "claude-opus-4-8" },
"allow": {
"tools": [],
"bash": ["npm test", "git status", "git diff"],
"write": ["src/**"]
}
}allow pre-approves mutating actions so they skip the confirmation prompt:
bash matches command prefixes, write matches path globs for write/edit.
routing: "local-first" plus escalateTo enables cost-aware routing (see
above); it defaults to local-first
automatically whenever escalateTo is present. ollamaBaseUrl points at your
Ollama server's OpenAI-compatible endpoint; deepseekBaseUrl / qwenBaseUrl
override the DeepSeek and Qwen (DashScope) endpoints.
Approximate cloud pricing used for the /costs estimate lives in the model
catalog (src/models/catalog.ts) — edit it to match current vendor rates.
Minimizing token usage is a first-class goal — coding-agent bills grow fast, and you shouldn't need a complex setup to control them. tiny-code automates the savings where it can:
-
Usage visible by default. Every assistant turn prints
↑ in ↓ out tokenswith no configuration. On exit you get a session total. -
Bounded tool output. grep caps at 200 matches, glob at 500 files, and
read_filesupportsoffset/limitto pull only the lines you need — preventing runaway context growth automatically. -
Minimal system prompt. The built-in persona is kept short. Tool schemas are generated from Zod (no duplicate prose). Project context is opt-in.
-
Concise agent instructions. The agent is explicitly told to avoid restating the task or narrating completed steps.
-
Effort control. For Anthropic models,
efforttunes the adaptive thinking budget. Drop it from the default"high"to"medium"or"low"for simpler, cheaper tasks:{ "effort": "medium" }Or set it per-session with
TINY_CODE_EFFORT=medium.
Coming: automatic conversation compaction once histories grow long
(see TODO.md), which will keep input-token counts from compounding across
many turns without any user action.
tiny-code ships a small, curated catalog of coding models
(src/models/catalog.ts) with each model's pricing, context window, and a
relative coding-aptitude score. It uses this to turn raw token counts into real
money and to pick a model that fits your cost/performance preference.
-
Dollar cost, not just tokens. Per-turn usage and the session total show an estimated USD cost next to the token counts, priced from the active model's rate — so the bill is visible as you work, not a surprise later.
-
/modelslists the catalog (cheapest first) with pricing and scores, marks the active model, and shows the session's running cost. -
Priority-driven selection. When you don't pin a
model, tiny-code picks one for you based onpriority:priorityPicks balancedThe best capability-per-dollar among capable models (default). performanceThe most capable model, ignoring price. costThe cheapest still-capable model. balancedis the default: it ranks capable models bycodingScore / blendedCostPerMTok(a model's coding aptitude per blended dollar, weighting input 80% / output 20%) behind a quality floor, so you get strong-but-sensibly-priced models without opting in.{ "priority": "performance" }Or per-session with
TINY_CODE_PRIORITY=cost, or on the fly with the/prioritycommand (e.g./priority performanceto jump to the most capable model when a task gets hard, then/priority balancedto drop back). Pinningmodel(config, env, or--model) always overrides the recommendation.
The catalog is curated and offline (tiny-code has no live model-discovery yet —
see TODO.md), so its prices carry an "as of" date; keep it current as vendors
ship new models and change pricing.
tiny-code can learn from how it's used. When a session ends (or when you run
/improve), it reflects on the conversation transcript looking for recurring
friction — tool errors, repeated retries, denied permissions, missing
capabilities. If it finds a concrete improvement, it asks for your permission to
open a pull request.
That PR contains only a single markdown file under improvements/
describing the proposed change, targeting main for a maintainer to review and
implement separately. It never contains code changes — this is enforced
structurally (the PR creator only ever stages one regex-validated markdown path),
so a prompt-injected session cannot smuggle code into a PR.
PRs are opened via the gh CLI, which must be
installed and authenticated (gh auth login); the working tree must be clean.
{
"improve": {
"enabled": true,
"baseBranch": "main",
"onSessionEnd": true
}
}The feature is on by default. Set improve.enabled to false (or export
TINY_CODE_IMPROVE=0) to disable it entirely; set onSessionEnd to false to
keep /improve but skip the automatic reflection at exit.
npm install
npm run typecheck
npm run lint
npm test
npm run buildSee TODO.md for the deferred-features backlog (sub-agents, web search, MCP,
rich TUI, one-shot mode, conversation persistence).
Contributions are accepted from authorized contributors only, and are governed by the License — by submitting a contribution you agree to those terms (including the assignment of rights in clause 4). This is not an open-source project: please do not fork, mirror, or redistribute the codebase (see the License).
Workflow for an authorized change:
-
Open an issue first to discuss the change and get sign-off from a maintainer.
-
Branch from
main, keep changes focused, and follow the existing code style. -
Before opening a PR, make sure the full check suite passes:
npm run typecheck npm run lint npm test npm run build -
Add or update tests for any behavior change, and add a changeset (
npx changeset) when the change is user-facing. -
Open a PR targeting
mainfor maintainer review.
tiny-code can also propose its own improvements — a low-friction way to feed
ideas back to maintainers without writing code. When a session ends (or when you
run /improve), the agent reflects on the transcript for recurring friction
(tool errors, repeated retries, denied permissions, missing capabilities) and,
if it finds something concrete, asks permission to open a PR.
That PR contains only a single markdown file under improvements/
describing the proposed change — never code. This is enforced structurally (the
PR creator only ever stages one regex-validated markdown path), so a
prompt-injected session cannot smuggle code into a PR. A maintainer then reviews
the proposal and implements it separately. See
Self-improvement above for configuration.
Proprietary — All Rights Reserved. See LICENSE.
You are granted a limited, revocable license to install and use tiny-code
(e.g. via npm install) for your own internal or personal use. You may not
fork, copy, modify, redistribute, resell, or offer it as a hosted service. It is
free during the beta period; future versions may be offered under subscription
or other paid terms.