Merged
26 changes: 0 additions & 26 deletions .changeset/local-models-cost-routing.md

This file was deleted.

13 changes: 9 additions & 4 deletions .env.example
@@ -1,13 +1,18 @@
# Provide at least one for cloud providers. If both are present, Anthropic is
# the default. Ollama runs locally and needs no key.
# Provide at least one for cloud providers. If several are present, the default
# is the first available in this order: Anthropic, Gemini, DeepSeek, Qwen.
# Ollama runs locally and needs no key.
ANTHROPIC_API_KEY=
GEMINI_API_KEY=
DEEPSEEK_API_KEY=
QWEN_API_KEY= # Alibaba DashScope key (DASHSCOPE_API_KEY also accepted)

# Optional overrides (also settable via config file / CLI flags)
# TINY_CODE_PROVIDER=anthropic # anthropic | gemini | ollama
# TINY_CODE_PROVIDER=anthropic # anthropic | gemini | ollama | deepseek | qwen
# TINY_CODE_MODEL=claude-opus-4-8
# TINY_CODE_OLLAMA_URL=http://localhost:11434/v1 # Ollama OpenAI-compatible endpoint
# TINY_CODE_PRIORITY=performance # performance | cost | balanced — auto-picks a model when none is pinned
# TINY_CODE_DEEPSEEK_URL=https://api.deepseek.com/v1
# TINY_CODE_QWEN_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
# TINY_CODE_PRIORITY=balanced # performance | cost | balanced (default) — auto-picks a model when none is pinned
# TINY_CODE_EFFORT=high # low | medium | high | xhigh | max — Anthropic thinking budget

# Self-improvement: reflect on sessions and propose markdown-only improvement PRs.
8 changes: 5 additions & 3 deletions AGENTS.md
@@ -29,9 +29,11 @@ runaway costs.
- Keep it current: when adding/repricing a model, update its entry **and**
`CATALOG_AS_OF`. Anthropic pricing comes from the bundled claude-api reference;
verify Gemini pricing against Google's published rates. Don't guess prices.
- `priority` defaults to `performance`, which preserves the historical default
models (Opus for Anthropic, Gemini 2.5 Pro for Gemini). Don't change the
default without updating the config tests that assert those ids.
- `priority` defaults to `balanced` (best capability-per-dollar behind a quality
floor), so the auto-picked model is cost-aware by default — e.g. Sonnet rather
than Opus for Anthropic. `performance` restores the historical most-capable
picks. Don't change the default without updating the config/catalog tests that
assert those ids.

## Boundaries
- No business logic. This is a general-purpose tool.
63 changes: 63 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,63 @@
# @therr/tiny-code

## 0.3.0

### Minor Changes

- 118faa0: Default model selection to `balanced` priority.

When no `model` is pinned, tiny-code now defaults to `priority: "balanced"`
instead of `performance`, picking the best capability-per-dollar model
(`codingScore / blendedCostPerMTok`, behind a quality floor) rather than the
most capable regardless of price. In line with the project's token-minimalism
goal, this makes the out-of-the-box pick cost-aware — e.g. Claude Sonnet rather
than Opus for Anthropic. Set `priority: "performance"` (or
`TINY_CODE_PRIORITY=performance`) to restore the previous most-capable defaults;
pinning a `model` still overrides everything.

- 785b832: Add local models and cost-aware, local-first routing.
- **Local (Ollama) provider.** Talk to a local Ollama server over its
OpenAI-compatible API (`--provider ollama`), with an idle timeout so a hung
model can't freeze the REPL, best-effort token-usage reporting, and configurable
`maxTokens`.
- **Local-first routing.** Set `routing: "local-first"` with an `escalateTo`
target to run a cheap/local model by default and escalate heavy turns (or a
stuck local model, via the new `escalate` tool) to a frontier model — with full
conversation context preserved. Escalation is sticky across follow-up turns.
- **Model-selection policy** is now owned by a pluggable `ModelDecisionEngine`
(`LocalFirstModelEngine`), keeping the agent loop pure mechanism.
- **Compute awareness.** On startup with a local model, tiny-code estimates RAM
need vs. machine capacity and warns when a model likely won't fit or is too
small (≤3B) to tool-call reliably; an over-RAM local model is routed to the
frontier up front.
- **Priority-driven model selection.** `priority` (`performance` / `cost` /
`balanced`, or `TINY_CODE_PRIORITY`) auto-picks a catalog model when none is
pinned.
- The `/costs` view reports session usage, estimated spend, and routing, and the
usage line distinguishes an unpriced _cloud_ turn ("cost unknown") from a
_local_ turn ("no API cost").

- f5c3832: Add a `/priority` command to switch cost/performance bias mid-session.

`/priority` (no args) shows the current priority and the active model;
`/priority performance | balanced | cost` switches it and re-picks the
auto-selected model on the fly — e.g. jump to the most capable model when a task
gets hard, then drop back to `balanced`. Pinned models and local-first routing
keep governing the model themselves, so there the command just records the new
priority. Backed by a new `AgentLoop.setProvider` for swapping the active
provider mid-session, and a `modelPinned` flag on the resolved config.

- 52b179d: Add DeepSeek and Qwen Coder model support.
- **DeepSeek and Qwen providers.** Two new hosted, OpenAI-compatible providers
(`--provider deepseek` / `--provider qwen`), keyed by `DEEPSEEK_API_KEY` and
`QWEN_API_KEY` (or `DASHSCOPE_API_KEY`). Endpoints are overridable via
`TINY_CODE_DEEPSEEK_URL` / `TINY_CODE_QWEN_URL` or `deepseekBaseUrl` /
`qwenBaseUrl` in config — e.g. to target the international DashScope host.
- **Shared OpenAI-compatible core.** The streaming/tool-call adapter that backed
the Ollama provider is now a reusable `OpenAiCompatibleProvider` base; Ollama,
DeepSeek, and Qwen all extend it, differing only in endpoint, auth, and error
wording.
- **Catalog entries** for `deepseek-v4-pro`, `deepseek-v4-flash`,
`qwen3-coder-plus`, and `qwen3-coder-flash`, so `/costs` estimates and
priority-based model selection work for the new providers. `/costs` treats both
as paid cloud providers.
46 changes: 32 additions & 14 deletions README.md
@@ -3,9 +3,9 @@
A small, extensible CLI coding agent built around one constraint: **keep token
usage low**. As coding-agent costs climb, tiny-code automates the savings so
you don't have to. Interactive terminal REPL, interchangeable **Anthropic**,
**Gemini**, and **local (Ollama)** models, and just the core features you
actually use: read/write/edit files, run shell commands, search code, and a
custom commands/skills system. No business logic baked in.
**Gemini**, **DeepSeek**, **Qwen Coder**, and **local (Ollama)** models, and just
the core features you actually use: read/write/edit files, run shell commands,
search code, and a custom commands/skills system. No business logic baked in.

Run cheap, open-weight models locally and **escalate heavy work to a frontier
model only when needed** — see [Local models & cost-aware routing](#local-models--cost-aware-routing).
@@ -29,19 +29,28 @@ node dist/cli.js

## Setup

Provide at least one API key. If both are set, Anthropic is used by default.
Provide at least one API key. If several are set, the default is the first
available in this order: Anthropic, Gemini, DeepSeek, Qwen.

```bash
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=...
export DEEPSEEK_API_KEY=sk-...
export QWEN_API_KEY=sk-... # Alibaba DashScope key (DASHSCOPE_API_KEY also works)
```

DeepSeek and Qwen are hosted, OpenAI-compatible coding models. Override their
endpoints with `TINY_CODE_DEEPSEEK_URL` / `TINY_CODE_QWEN_URL` (or `deepseekBaseUrl`
/ `qwenBaseUrl` in config) — e.g. to point Qwen at the international DashScope host.
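
The default-provider rule above can be sketched as follows. This is a minimal illustration, not tiny-code's actual resolver: `inferDefaultProvider`, `KEY_VARS`, and the `Env` type are hypothetical names, and treating an empty value as "not set" is an assumption (it mirrors the blank entries in `.env.example`).

```typescript
// Hypothetical sketch: these names are illustrative, not tiny-code's real API.
type CloudProvider = "anthropic" | "gemini" | "deepseek" | "qwen";
type Env = Record<string, string | undefined>;

// Documented precedence: Anthropic, Gemini, DeepSeek, Qwen.
// Qwen accepts either QWEN_API_KEY or DASHSCOPE_API_KEY.
const KEY_VARS: ReadonlyArray<readonly [CloudProvider, readonly string[]]> = [
  ["anthropic", ["ANTHROPIC_API_KEY"]],
  ["gemini", ["GEMINI_API_KEY"]],
  ["deepseek", ["DEEPSEEK_API_KEY"]],
  ["qwen", ["QWEN_API_KEY", "DASHSCOPE_API_KEY"]],
];

// Return the first provider in precedence order that has a non-empty key,
// or undefined when no cloud key is configured.
// Assumption: a blank value counts as unset.
function inferDefaultProvider(env: Env): CloudProvider | undefined {
  for (const [provider, vars] of KEY_VARS) {
    if (vars.some((name) => (env[name] ?? "").trim() !== "")) {
      return provider;
    }
  }
  return undefined;
}
```

Passing the environment in as a plain object (for example `inferDefaultProvider(process.env)` under Node) keeps the lookup easy to test without touching real credentials.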

## Usage

```bash
tiny-code # start the REPL (uses an available key)
tiny-code --provider gemini # force a provider
tiny-code --model claude-opus-4-8
tiny-code --provider deepseek --model deepseek-v4-pro # DeepSeek's coding model
tiny-code --provider qwen --model qwen3-coder-plus # Qwen Coder
tiny-code --provider ollama --model gemma3:12b # run a local model (no API cost)
```

@@ -52,6 +61,7 @@ shell commands) prompt for approval unless pre-approved in config.
- `/costs` — session token usage, estimated $ cost, and cost-saving tips
- `/clear` — clear the conversation history and start fresh
- `/models` — show known models, pricing, and the active one (see below)
- `/priority [performance|balanced|cost]` — show or switch the cost/performance priority mid-session; re-picks the auto-selected model unless one is pinned (see below)
- `/improve` — reflect on the session and propose an improvement PR (see below)
- `/<name> [args]` — run a custom command (see below)
- `/exit` — quit
@@ -133,7 +143,7 @@ CLI flags.
"provider": "anthropic",
"model": "claude-opus-4-8",
"ollamaBaseUrl": "http://localhost:11434/v1",
"priority": "performance",
"priority": "balanced",
"maxTokens": 16000,
"thinking": true,
"effort": "high",
@@ -154,7 +164,8 @@ CLI flags.
`routing: "local-first"` plus `escalateTo` enables cost-aware routing (see
[above](#local-models--cost-aware-routing)); it defaults to `local-first`
automatically whenever `escalateTo` is present. `ollamaBaseUrl` points at your
Ollama server's OpenAI-compatible endpoint.
Ollama server's OpenAI-compatible endpoint; `deepseekBaseUrl` / `qwenBaseUrl`
override the DeepSeek and Qwen (DashScope) endpoints.

Approximate cloud pricing used for the `/costs` estimate lives in the model
catalog (`src/models/catalog.ts`) — edit it to match current vendor rates.
@@ -203,18 +214,25 @@ money and to pick a model that fits your cost/performance preference.
- **Priority-driven selection.** When you don't pin a `model`, tiny-code picks
one for you based on `priority`:

| `priority` | Picks |
| --------------- | ----------------------------------------------------------- |
| `performance` | The most capable model (the default — current behavior). |
| `cost` | The cheapest still-capable model. |
| `balanced` | The best capability-per-dollar among capable models. |
| `priority` | Picks |
| --------------- | --------------------------------------------------------------- |
| `balanced` | The best capability-per-dollar among capable models (default). |
| `performance` | The most capable model, ignoring price. |
| `cost` | The cheapest still-capable model. |

`balanced` is the default: it ranks capable models by
`codingScore / blendedCostPerMTok` (a model's coding aptitude per blended
dollar, weighting input 80% / output 20%) behind a quality floor, so you get
strong-but-sensibly-priced models without opting in.

```json
{ "priority": "balanced" }
{ "priority": "performance" }
```

Or per-session with `TINY_CODE_PRIORITY=cost`. Pinning `model` (config, env,
or `--model`) always overrides the recommendation.
Or per-session with `TINY_CODE_PRIORITY=cost`, or on the fly with the
`/priority` command (e.g. `/priority performance` to jump to the most capable
model when a task gets hard, then `/priority balanced` to drop back). Pinning
`model` (config, env, or `--model`) always overrides the recommendation.
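
As a rough illustration of the three priorities, the sketch below ranks a catalog the way the text describes: filter by a quality floor, then maximize capability, minimize blended cost, or maximize `codingScore / blendedCostPerMTok`. The `CatalogModel` shape, the `pickModel` helper, the floor value, and every number in `demoCatalog` are placeholders invented for this example; they are not taken from `src/models/catalog.ts`.

```typescript
// Hypothetical sketch: field names, floor, and data are illustrative only.
type Priority = "performance" | "cost" | "balanced";

interface CatalogModel {
  id: string;
  codingScore: number;   // higher means more capable
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

// Blended cost as described above: input weighted 80%, output 20%.
const blendedCostPerMTok = (m: CatalogModel): number =>
  0.8 * m.inputPerMTok + 0.2 * m.outputPerMTok;

// Pick a model among those at or above the quality floor.
// Assumes priced (non-zero-cost) cloud models.
function pickModel(
  models: readonly CatalogModel[],
  priority: Priority,
  qualityFloor: number,
): CatalogModel | undefined {
  const capable = models.filter((m) => m.codingScore >= qualityFloor);
  if (capable.length === 0) return undefined;

  const rank = (m: CatalogModel): number => {
    if (priority === "performance") return m.codingScore;
    if (priority === "cost") return -blendedCostPerMTok(m);
    return m.codingScore / blendedCostPerMTok(m); // balanced
  };
  return capable.reduce((best, m) => (rank(m) > rank(best) ? m : best));
}

// Placeholder catalog with made-up ids, scores, and prices.
const demoCatalog: CatalogModel[] = [
  { id: "demo-large", codingScore: 90, inputPerMTok: 10, outputPerMTok: 50 },
  { id: "demo-mid", codingScore: 80, inputPerMTok: 2, outputPerMTok: 10 },
  { id: "demo-lean", codingScore: 70, inputPerMTok: 2, outputPerMTok: 8 },
  { id: "demo-tiny", codingScore: 50, inputPerMTok: 0.5, outputPerMTok: 2 },
];
```

With a floor of 70, `demo-tiny` is excluded despite its price, `performance` selects `demo-large`, `cost` selects `demo-lean`, and `balanced` selects `demo-mid`, which shows how the floor keeps the cheapest option from winning by default.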

The catalog is curated and offline (tiny-code has no live model-discovery yet —
see `TODO.md`), so its prices carry an "as of" date; keep it current as vendors
14 changes: 14 additions & 0 deletions TODO.md
@@ -17,6 +17,20 @@ a single condensed block. For Anthropic use the compaction beta; for Gemini
summarize via a lightweight call to a cheap model. Pair with conversation
persistence so compacted sessions can be resumed.

## On-the-fly provider switching
The `/priority` command already swaps the active *model* within the current
provider mid-session (`AgentLoop.setProvider`). Extend this to switch the
*provider* too, so a session can move between Anthropic, Gemini, DeepSeek, Qwen,
and Ollama without restarting. **Approach:** a `/provider <name> [model]`
command that validates the target's API key (reuse `createProvider`'s checks),
re-resolves the model (honoring `priority` and any pin), rebuilds the provider,
and calls `agent.setProvider`. Decide how it interacts with local-first routing
(switching the primary vs. the `escalateTo` target) and keep `/costs` accurate
across providers — usage is already priced per-turn from the active model, so the
running total stays correct; just refresh the session-end summary's model.
Consider a single `/model <id>` shortcut that infers the provider from the
catalog entry.
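
One possible shape for the proposed `/provider` flow is sketched below. Only `setProvider(provider: ModelProvider): void` comes from this change; `ProviderHost`, `SwitchDeps`, `switchProvider`, and the reduced `ModelProvider` shape are hypothetical stand-ins, and the key check is injected rather than calling `createProvider`, whose signature is not shown here.

```typescript
// Hypothetical sketch of the proposed /provider command; not existing code.
interface ModelProvider {
  readonly name: string;
  readonly model: string;
}

// The only part of AgentLoop this sketch relies on.
interface ProviderHost {
  setProvider(provider: ModelProvider): void;
}

// Injected collaborators (all assumed) so the flow can be tested in isolation.
interface SwitchDeps {
  hasApiKey(provider: string): boolean; // Ollama would always return true
  resolveModel(provider: string, pinnedModel?: string): string; // honors priority and any pin
  buildProvider(provider: string, model: string): ModelProvider;
}

// Validate the key, re-resolve the model, rebuild the provider, then swap it in.
// Returns a status line for the REPL; the agent is untouched on failure.
function switchProvider(
  agent: ProviderHost,
  deps: SwitchDeps,
  name: string,
  pinnedModel?: string,
): string {
  if (!deps.hasApiKey(name)) {
    return `No API key configured for ${name}; provider unchanged.`;
  }
  const model = deps.resolveModel(name, pinnedModel);
  agent.setProvider(deps.buildProvider(name, model));
  return `Switched to ${name} (${model}).`;
}
```

Validating before rebuilding means a typo or a missing key leaves the running session on its current provider, which matters because conversation state persists across turns.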

## Sub-agents
Spawn isolated agent runs for parallel exploration/research (like a lightweight
Explore/Plan agent). **Approach:** a `spawn_agent` tool whose `execute` constructs
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "@therr/tiny-code",
"version": "0.2.3",
"version": "0.3.0",
"description": "A small, extensible CLI coding agent with interchangeable Anthropic and Gemini models.",
"type": "module",
"bin": {
12 changes: 11 additions & 1 deletion src/agent/loop.ts
@@ -51,7 +51,7 @@ export interface AgentLoopOptions {
* iteration guard trips). Conversation state persists across `run` calls.
*/
export class AgentLoop {
private readonly provider: ModelProvider;
private provider: ModelProvider;
private readonly registry: ToolRegistry;
private readonly gate: PermissionGate;
private readonly system: string;
@@ -86,6 +86,16 @@ export class AgentLoop {
return this.messages;
}

/**
* Swap the base provider mid-session — e.g. when the user changes the active
* model via `/priority`. Only affects un-escalated turns; if the session has
* stuck to an escalated frontier provider, that takes precedence until
* `clearHistory()` resets routing.
*/
setProvider(provider: ModelProvider): void {
this.provider = provider;
}

/** Drop the conversation history so the next turn starts fresh. Cumulative
* token usage is preserved, since it reflects the whole session's cost.
* Also clears sticky escalation: a fresh conversation re-routes from scratch. */
9 changes: 6 additions & 3 deletions src/cli.ts
@@ -12,18 +12,21 @@ Usage:
tiny-code [options]

Options:
--provider <name> anthropic | gemini | ollama (default: inferred from API keys)
--model <id> Model id override (e.g. claude-opus-4-8, gemma3:12b)
--provider <name> anthropic | gemini | ollama | deepseek | qwen
(default: inferred from API keys)
--model <id> Model id override (e.g. claude-opus-4-8, qwen3-coder-plus)
--config <path> Path to a config JSON file
-v, --version Print version
-h, --help Show this help

Environment:
ANTHROPIC_API_KEY Required for the Anthropic provider
GEMINI_API_KEY Required for the Gemini provider
DEEPSEEK_API_KEY Required for the DeepSeek provider
QWEN_API_KEY Required for the Qwen provider (or DASHSCOPE_API_KEY)
TINY_CODE_OLLAMA_URL Ollama OpenAI-compatible base URL (default http://localhost:11434/v1)
TINY_CODE_PRIORITY performance | cost | balanced — auto-picks a model when
none is pinned (default: performance)
none is pinned (default: balanced)

Cost-saving: set "routing": "local-first" with an "escalateTo" target in your
config to run cheap/local models by default and escalate heavy tasks. Run /costs