AI Inference Managed by AI — A single Go binary that detects hardware, resolves optimal configs from a YAML knowledge base, deploys inference engines via K3S, and exposes 61 MCP tools for AI Agents to operate everything.
- 2026-04 — v0.4.0 ships: Explorer Agent Planner (PDCA), Central Advisor + Analyzer, MCP consolidation 101→61, aima-service device identity Phase 1, onboarding wizard with 5-dimension 0–100 scoring, multi-modal benchmark (chat/TTS/ASR/T2I/T2V).
- 2026-03 — v0.3.x: OpenClaw full-stack integration, smart agent routing, Engine Profile system with SGLang-KT, AMD RDNA3 (W7900D) 8-GPU validated.
- 2026-02 — v0.2.0: Support service, Web UI redesign, OpenClaw integration.
- 2026-01 — v0.0.1: Initial foundation release (hardware detection, multi-runtime).
- Zero-config hardware detection — automatically discovers GPUs (NVIDIA, AMD, Huawei Ascend, Hygon DCU, Apple Silicon), CPU, and RAM.
- Knowledge-driven deployment — YAML catalog of hardware profiles, engines, models, and partition strategies; no engine-specific code branches.
- Multi-runtime — K3S (Pod) for clusters, Docker for single-node containers, Native (exec) for bare-metal inference.
- 61 MCP tools — full programmatic control for AI Agents over hardware, models, engines, deployments, fleet, and more.
- Fleet management — mDNS-based auto-discovery of LAN peers; remote tool execution across heterogeneous devices.
- Offline-first — all core functions work with zero network; the network is an enhancement, not a requirement.
- Single binary, zero CGO — cross-compiles to Windows, macOS, Linux (amd64/arm64) with no C dependencies.
Grab a pre-built binary from the Releases page, or build from source:
```
git clone https://github.com/Approaching-AI/AIMA.git
cd AIMA
make build
```

For published product releases, the binary installer can be one line:

```
curl -fsSL https://raw.githubusercontent.com/Approaching-AI/AIMA/master/install.sh | sh
```

On Windows PowerShell:

```
irm https://raw.githubusercontent.com/Approaching-AI/AIMA/master/install.ps1 | iex
```

Notes:
- The installer resolves the latest installable `vX.Y.Z` product release instead of GitHub's `latest` release, because bundle tags such as `bundle/stack/2026-02-26` are not product binaries.
- If tags are ahead of published binaries, the installer warns and stays on the latest installable release until the new assets are uploaded.
- Override the source repo for forks with `AIMA_REPO=<owner>/<repo>`.
- Pin a release with `AIMA_VERSION=v0.2.0`.
- The Windows installer currently targets `windows/amd64` and installs to `%LOCALAPPDATA%\Programs\AIMA`.
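For example, to pin the installer to a specific release rather than the latest installable one (a usage sketch; it assumes the install script reads `AIMA_VERSION` from the environment, per the notes above):

```
curl -fsSL https://raw.githubusercontent.com/Approaching-AI/AIMA/master/install.sh \
  | AIMA_VERSION=v0.2.0 sh
```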
```
# 1. Detect your hardware
aima hal detect

# 2. Initialize infrastructure (installs K3S + HAMi + aima-serve daemon)
#    Downloads airgap images for offline container startup.
#    --tier k3s pulls in K3S + HAMi on top of the docker baseline.
#    Requires root for systemd service installation.
sudo aima onboarding init --tier k3s --yes

# 3. Deploy a model (auto-resolves engine + config for your hardware)
aima deploy qwen3.5-35b-a3b
```

After `aima onboarding init --tier k3s`, three components are running as systemd services:
| Component | What it does |
|---|---|
| K3S | Container orchestration (containerd, airgap images pre-loaded) |
| HAMi | GPU virtualization for multi-model sharing (skipped on unsupported hardware) |
| aima-serve | API server on 0.0.0.0:6188 with mDNS broadcast |
The server is now discoverable on the LAN and ready to serve inference requests.
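A quick way to confirm the daemons came up (the unit names here are assumptions based on the component names above; check `systemctl list-units` if yours differ):

```
sudo systemctl status k3s aima-serve
```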
On another device with the AIMA binary — no onboarding init or serve needed:
```
# List AIMA devices auto-discovered on the LAN via mDNS (no IP needed)
aima fleet devices

# Query a remote device
aima fleet exec <device-id> hardware.detect
aima fleet exec <device-id> deploy.list

# Call the OpenAI-compatible API directly
curl http://<server-ip>:6188/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5-35b-a3b","messages":[{"role":"user","content":"hello"}]}'
```

Every AIMA server hosts a built-in Web UI at `http://<server-ip>:6188/ui/`.
To discover the server IP first: `aima fleet devices`.
To get a Fleet dashboard that auto-discovers all LAN peers, run `aima serve --discover` on your own device and open `http://localhost:6188/ui/`.
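Put together, the dashboard flow from your own machine looks like this:

```
# Start a local AIMA server with LAN discovery, then open the Fleet dashboard
aima serve --discover
# → http://localhost:6188/ui/
```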
`aima onboarding init` starts the server without authentication (LAN trust model). To enable API key authentication:

```
# Set API key (hot-reloads, no restart needed)
aima config set api_key <your-key>

# All API/MCP/Fleet requests now require: Authorization: Bearer <your-key>
# Web UI will prompt for the key automatically.

# Remote fleet commands with authentication
aima fleet devices --api-key <your-key>
```
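With a key set, the same chat request from earlier needs the bearer header (a sketch reusing the endpoint and header format documented above):

```
curl http://<server-ip>:6188/v1/chat/completions \
  -H "Authorization: Bearer <your-key>" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5-35b-a3b","messages":[{"role":"user","content":"hello"}]}'
```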
AIMA has been validated end-to-end across eight GPU/NPU ecosystems — NVIDIA (CUDA), AMD (ROCm / Vulkan), Huawei Ascend (CANN), Hygon (DCU), MetaX (MACA), Moore Threads (MUSA), Apple (Metal), and Intel (CPU-only).

AIMA orchestrates three inference runtimes — vLLM (Safetensors), llama.cpp (GGUF), and SGLang (Safetensors) — across every supported backend.
AIMA resolves every deployment decision by climbing four layers of intelligence. Each layer can override the layer below it, and every layer degrades gracefully to the one underneath.
The system is built around four invariants: no code branches for engine/model types (YAML-driven), no container lifecycle management (K3S handles it), MCP tools as the single source of truth, and offline-first operation.
See design/ARCHITECTURE.md for the full architecture document.
Every AIMA release passes through The Forge — an end-to-end UAT matrix run on real hardware.
- Eight vendors × multiple devices (NVIDIA GB10, RTX 4090, AMD W7900D × 8, Huawei Ascend 910B × 8, Hygon BW150 DCU × 8, MetaX N260 × 2, Moore Threads M1000, Apple M4, Intel CPU)
- Three runtimes validated per release: K3S Pod, Docker container, Native exec
- 16 UAT items per release covering install / hardware detect / model deploy / API / MCP / fleet / onboarding wizard / failover
- 1,200+ evidence files and ~1,000 hours of logged runtime across the fleet
Per-release UAT report lives in docs/uat/v0.4-release-uat.md, with raw per-device evidence (logs, configs, reproducer commands) under artifacts/uat/v0.4/ — u1…u15 covering install, hardware detect, model deploy, API/MCP, fleet, onboarding, and failover on GB10, RTX 4090, AMD W7900D, Apple M4, and others.
AIMA is built to be driven by AI agents first, humans second.
- 61 MCP tools expose hardware / model / engine / deploy / fleet / knowledge / agent / device identity as JSON-RPC 2.0 functions (request shape sketched after this list)
- The Dispatcher routes any request through L0→L3 with automatic fallback — agents get the same API whether the local LLM is loaded or not
- Explorer Agent Planner runs document-driven PDCA cycles with a SQLite-backed workspace and seven bash-like tools; every decision leaves a structured trace
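As a concrete sketch of what an agent sends, a standard MCP `tools/call` request for `hardware.detect` might look like the following (the `/mcp` path is illustrative, not a documented endpoint; consult your client's transport configuration):

```
curl http://<server-ip>:6188/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"hardware.detect","arguments":{}}}'
```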
Browse the MCP tool surface by domain
| Domain | Representative tools | Purpose |
|---|---|---|
| hardware | `hardware.detect` · `hardware.metrics` | GPU/CPU/RAM inventory + live telemetry |
| model | `model.list` · `model.scan` · `model.pull` · `model.import` · `model.info` · `model.remove` | Local model catalog + download + import |
| engine | `engine.list` · `engine.pull` · `engine.scan` · `engine.import` · `engine.info` | Inference engine lifecycle |
| deploy | `deploy.apply` · `deploy.dry_run` · `deploy.list` · `deploy.logs` · `deploy.delete` · `deploy.approve` · `deploy.status` | Start / monitor / stop model serving |
| fleet | `fleet.info` · `fleet.exec` | LAN auto-discovery + remote tool execution |
| knowledge | `knowledge.resolve` · `knowledge.search` · `knowledge.promote` · `knowledge.evaluate` · `knowledge.save` · `knowledge.analytics` | YAML catalog + golden-config lifecycle |
| agent | `agent.ask` · `agent.status` · `agent.rollback` | L3a Agent invocation + decision trace |
| device | `device.register` · `device.status` · `device.renew` · `device.reset` | aima-service identity lifecycle |
| benchmark | `benchmark.run` · `benchmark.matrix` · `benchmark.record` · `benchmark.list` | Reproducible performance testing |
| central | `central.advise` · `central.scenario` · `central.sync` | Central knowledge server + advisory feedback |
| system | `system.config` · `system.status` | Hot-reload config + overall health |
See internal/mcp/ for the complete registry.
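The same registry is reachable remotely: any tool name from the table above can be passed to `aima fleet exec`, following the pattern shown in the quickstart (per-tool arguments, where a tool needs them, are not shown here):

```
# e.g. pull live telemetry from a remote device
aima fleet exec <device-id> hardware.metrics
```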
Connect any MCP-compatible client (Claude Desktop, Cursor, custom agent) and AIMA becomes your AI-inference control plane.
```
cmd/aima/        Entry point + dependency wiring split by domain
internal/
  hal/           Hardware detection
  knowledge/     YAML knowledge base + SQLite resolver
  runtime/       K3S (Pod) + Docker (container) + Native (exec) runtimes
  mcp/           MCP server + 61 MCP tool registrations/implementations
  agent/         Go Agent loop (L3a)
  cli/           Cobra CLI (thin wrappers over MCP tools)
  ui/            Embedded Web UI (Alpine.js SPA)
  proxy/         OpenAI-compatible HTTP proxy
  fleet/         mDNS fleet discovery + remote execution
  sqlite.go      SQLite state store (package state, modernc.org/sqlite, zero CGO)
  model/         Model scan/download/import + metadata detection
  engine/        Engine image management
  stack/         K3S + HAMi infrastructure installer
catalog/
  hardware/      Hardware profile YAML
  engines/       Engine asset YAML
  models/        Model asset YAML
  partitions/    Partition strategy YAML
  stack/         Stack component YAML
```
```
make build
# Output: build/aima (or build/aima.exe on Windows)
```

```
make all
# Output:
#   build/aima.exe           (windows/amd64)
#   build/aima-darwin-arm64  (macOS/arm64)
#   build/aima-linux-arm64   (linux/arm64)
#   build/aima-linux-amd64   (linux/amd64)
```

```
make release-assets
# Output:
#   build/release/<version>/aima-darwin-arm64
#   build/release/<version>/aima-linux-amd64
#   build/release/<version>/aima-linux-arm64
#   build/release/<version>/aima-windows-amd64.exe
#   build/release/<version>/checksums.txt
```

To upload those assets to the matching GitHub release with `gh`:

```
make publish-release-assets
```

Annotated SemVer tag pushes such as `v0.2.1` also trigger `.github/workflows/release.yml`, which builds the same assets and uploads them automatically.

```
go test ./...
```

Apache License 2.0. See LICENSE for details.


