Cacheon is a Bittensor subnet (SN14) that runs an open competition for production-grade LLM inference optimization. Miners submit containerized inference servers. Validators evaluate them against a vLLM baseline on the same hardware. The fastest correct server earns the majority of emission; a runner-up receives a share, and the remaining emission scales with the winner's improvement.
V1 arena: Qwen2.5-72B-Instruct on 8-GPU TP=8 pods (8x H200, 8x B200, or 8x B300). Beat the pinned vLLM baseline on TTFT and throughput while passing a greedy-decoding correctness gate.
- Miners build an inference server, package it as a Docker image, pay the submission fee, and commit the image reference, digest, and payment proof on-chain.
- Validators scan the chain for new commitments, pull the image, and run it with model weights mounted at
/models. - Scoring measures TTFT and throughput improvement over the vLLM baseline. Correctness is checked first; fail it and the score is zero.
- The fastest correct server becomes the winner and earns 80% of the competition pool. The runner-up earns 20%. Total pool scales with the winner's score relative to a target improvement.
- Challengers must exceed the winner's fresh score by a fixed 1% margin to prevent noise-driven churn.
Score formula:
if not correctness_pass:
score = 0.0
else:
ttft_imp = max(0, (baseline_ttft - miner_ttft) / baseline_ttft)
tps_imp = max(0, (miner_tps - baseline_tps) / baseline_tps)
score = 0.5 * ttft_imp + 0.5 * tps_impBuild an inference server that serves Qwen2.5-72B-Instruct via /v1/chat/completions with streaming and logprobs. Package it as a Docker image (maximum 20 GB; model weights are mounted at runtime, not baked into the image). Push it to a public registry, then run miner/commit.py to pay the submission fee and commit on-chain in one flow.
Requirements: public container registry, Bittensor wallet registered on SN14, coldkey balance for the submission fee. GPU hardware is only needed for local testing.
# Push your image
docker tag my-server:latest docker.io/myuser/cacheon-miner:v1
docker push docker.io/myuser/cacheon-miner:v1
# Pay fee + commit on-chain (test locally first)
python miner/commit.py \
--wallet-name <wallet> \
--wallet-hotkey <hotkey> \
--image "docker.io/myuser/cacheon-miner:v1" \
--digest "sha256:..." \
--fee 0.1 \
--network finney \
--netuid 14Full guide: cacheon.ai/docs/miners/overview
The validator has two components: an always-on CPU host (chain scanning, weight setting) and an ephemeral GPU pod (eval). The GPU pod is rented on-demand only when challengers are queued.
GPU requirements: NVLink/SXM 8-GPU pod (8x H200, 8x B200, or 8x B300 are Tier A; 8x H100 is Tier B fallback), 400 GB storage, model weights at /workspace/models/Qwen2.5-72B-Instruct. See GPU requirements.
# CPU host (always-on)
git clone https://github.com/latent-to/cacheon
cd cacheon
cp .env.example .env # add wallet and S3 config
docker compose up --build
# GPU pod (on-demand, run when challengers appear)
bash scripts/gpu_setup/setup.sh
docker compose -f validator/gpu-compose.yml up --build -dFull guide: cacheon.ai/docs/validators/overview
| Miners | Validators | Evaluation | |
|---|---|---|---|
| Start here | Overview | Overview | Scoring |
| Reference | API contract | Architecture | Harness |
| Setup | Quickstart | Validator setup | Prompts |
| Rules | Rules | Manual GPU setup | Roadmap |
MIT