drove — local LLMs on demand.
A llama.cpp server manager that wakes models when you need them and shuts them down when you don't. It proxies an OpenAI-compatible API, lazily starts llama-server, and keeps configuration transparent.
Install drove with the repository install script. It installs uv if needed, then installs the drove CLI as a uv tool.
curl -fsSL https://raw.githubusercontent.com/cleanunicorn/drove/main/install.sh | bashOr run the same script from a local checkout:
git clone https://github.com/cleanunicorn/drove.git
cd drove
./install.shAfter installation, make sure the uv tool bin directory is on your PATH if the installer prints a PATH warning. drove also requires llama-server from llama.cpp before you start the proxy.
drove init
drove models download unsloth/Qwen3-8B-GGUF
drove serve &
drove chat- Lazy by design — model processes start on first request and stop after idle timeout.
- OpenAI-compatible — drop drove behind existing OpenAI SDK clients.
- Observable — request logging plus TUI/web inspection for request/response debugging.
| drove | Ollama | llama.cpp directly | |
|---|---|---|---|
| Backend | llama.cpp | llama.cpp (forked) | llama.cpp |
| Lazy model loading | yes | yes | no |
| Multiple concurrent models | yes | yes | manual |
| OpenAI-compatible API | yes | yes | yes (server) |
| Direct llama-server flags | yes (per model) | partial | yes |
| HuggingFace download + GGUF convert | yes | partial | manual |
| Request/response observability | built-in | no | no |
| TUI chat with sessions | yes | no | no |
| Configuration surface | TOML + env | env + Modelfile | flags |
- In-repo docs:
docs/ - Hosted docs target:
https://drove.dev/docs
When opening a new issue, please use the repository issue templates:
uv sync
uv run pytest
uv run ruff check .
uv run mypy src/