Skip to content

latent-to/cacheon

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

124 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Cacheon (SN14)

Inference optimization. Fastest server wins.

Discord Docs TAO.app X License: MIT

Website | Docs | Discord | TAO.app


Cacheon is a Bittensor subnet (SN14) that runs an open competition for production-grade LLM inference optimization. Miners submit containerized inference servers. Validators evaluate them against a vLLM baseline on the same hardware. The fastest correct server earns the majority of emission; a runner-up receives a share, and the remaining emission scales with the winner's improvement.

V1 arena: Qwen2.5-72B-Instruct on 8-GPU TP=8 pods (8x H200, 8x B200, or 8x B300). Beat the pinned vLLM baseline on TTFT and throughput while passing a greedy-decoding correctness gate.

How It Works

  1. Miners build an inference server, package it as a Docker image, pay the submission fee, and commit the image reference, digest, and payment proof on-chain.
  2. Validators scan the chain for new commitments, pull the image, and run it with model weights mounted at /models.
  3. Scoring measures TTFT and throughput improvement over the vLLM baseline. Correctness is checked first; fail it and the score is zero.
  4. The fastest correct server becomes the winner and earns 80% of the competition pool. The runner-up earns 20%. Total pool scales with the winner's score relative to a target improvement.
  5. Challengers must exceed the winner's fresh score by a fixed 1% margin to prevent noise-driven churn.

Score formula:

if not correctness_pass:
    score = 0.0
else:
    ttft_imp = max(0, (baseline_ttft - miner_ttft) / baseline_ttft)
    tps_imp  = max(0, (miner_tps  - baseline_tps)  / baseline_tps)
    score = 0.5 * ttft_imp + 0.5 * tps_imp

For Miners

Build an inference server that serves Qwen2.5-72B-Instruct via /v1/chat/completions with streaming and logprobs. Package it as a Docker image (maximum 20 GB; model weights are mounted at runtime, not baked into the image). Push it to a public registry, then run miner/commit.py to pay the submission fee and commit on-chain in one flow.

Requirements: public container registry, Bittensor wallet registered on SN14, coldkey balance for the submission fee. GPU hardware is only needed for local testing.

# Push your image
docker tag my-server:latest docker.io/myuser/cacheon-miner:v1
docker push docker.io/myuser/cacheon-miner:v1

# Pay fee + commit on-chain (test locally first)
python miner/commit.py \
  --wallet-name <wallet> \
  --wallet-hotkey <hotkey> \
  --image "docker.io/myuser/cacheon-miner:v1" \
  --digest "sha256:..." \
  --fee 0.1 \
  --network finney \
  --netuid 14

Full guide: cacheon.ai/docs/miners/overview

For Validators

The validator has two components: an always-on CPU host (chain scanning, weight setting) and an ephemeral GPU pod (eval). The GPU pod is rented on-demand only when challengers are queued.

GPU requirements: NVLink/SXM 8-GPU pod (8x H200, 8x B200, or 8x B300 are Tier A; 8x H100 is Tier B fallback), 400 GB storage, model weights at /workspace/models/Qwen2.5-72B-Instruct. See GPU requirements.

# CPU host (always-on)
git clone https://github.com/latent-to/cacheon
cd cacheon
cp .env.example .env   # add wallet and S3 config
docker compose up --build

# GPU pod (on-demand, run when challengers appear)
bash scripts/gpu_setup/setup.sh
docker compose -f validator/gpu-compose.yml up --build -d

Full guide: cacheon.ai/docs/validators/overview

Documentation

Miners Validators Evaluation
Start here Overview Overview Scoring
Reference API contract Architecture Harness
Setup Quickstart Validator setup Prompts
Rules Rules Manual GPU setup Roadmap

License

MIT

Releases

No releases published

Packages

 
 
 

Contributors