Kong — Local AI Platform

A comprehensive local AI management platform for running LLMs, image generation, speech, music, 3D modeling, and more on consumer GPU hardware. All inference runs locally — no cloud APIs, no data leaves your machine.

Built for an NVIDIA RTX 4080 (16GB VRAM) / 32GB RAM system, but adaptable to other hardware configurations.


Architecture

 ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
 │  Web UI  │ │  Tauri   │ │ C++ App  │ │ JUCE App │ │   CLI    │
 │ (React)  │ │ (React)  │ │(SDL/ImGui)│ │ (Audio)  │ │ (Node)   │
 └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
      └──────────┬──┴─────────┬──┴─────────┬──┘             │
                 ▼             ▼             ▼                ▼
              ┌──────────────────────────────────────────────────┐
              │         Fastify API Gateway (:3000)              │
              │   Routes / WebSocket Hub / VRAM Budget           │
              └──────┬──────────────┬─────────────┬─────────────┘
                     ▼              ▼             ▼
              ┌────────────┐ ┌───────────┐ ┌─────────────┐
              │  Ollama    │ │  Express  │ │   FastAPI   │
              │  (:11434)  │ │ Orchestr. │ │  ML Server  │
              │ LLM/Vision │ │  (:3001)  │ │  (:8000)    │
              └────────────┘ │ Jobs/Queue│ │ SD/Whisper/ │
                             └─────┬─────┘ │ Music/3D    │
                              ┌────▼────┐  └─────────────┘
                              │  Redis  │
                              └─────────┘

All frontends communicate exclusively through the Fastify gateway on port 3000. No frontend ever contacts Ollama, Express, or FastAPI directly. This provides a single point for authentication, rate limiting, VRAM budget enforcement, and request routing.

| Service | Technology | Port | Role |
| --- | --- | --- | --- |
| Gateway | Fastify | 3000 | API gateway, WebSocket hub, VRAM budget enforcement, static file serving |
| Orchestrator | Express | 3001 | Model lifecycle management, BullMQ job queue, multi-step workflows |
| ML Server | FastAPI (Python) | 8000 | Non-Ollama ML inference: image generation, speech, music, 3D |
| Ollama | Ollama | 11434 | LLM and vision model inference |
| Redis | Redis | 6379 | VRAM state coordination, job queue backend |

Repository Structure

kong-local-llm-setup/
├── package.json                  # Root workspace config
├── pnpm-workspace.yaml           # pnpm workspace definition
├── turbo.json                    # Turborepo build orchestration
├── .gitignore
│
├── packages/
│   └── shared/                   # @kong/shared — TypeScript types and constants
│       ├── src/
│       │   ├── types/
│       │   │   ├── chat.ts       # ChatMessage, ChatRequest, ChatResponse, ChatStreamChunk
│       │   │   ├── models.ts     # ModelInfo, ModelCategory, VramStatus, ModelLoadRequest
│       │   │   └── system.ts     # SystemStatus, GpuInfo, BackendStatus
│       │   └── index.ts          # Re-exports + service URL constants
│       ├── package.json
│       └── tsconfig.json
│
├── apps/
│   ├── gateway/                  # @kong/gateway — Fastify API gateway
│   │   ├── src/
│   │   │   ├── server.ts         # Entry point — registers plugins and routes
│   │   │   ├── routes/
│   │   │   │   ├── chat.ts       # POST /api/chat, GET /api/chat/ws
│   │   │   │   ├── models.ts     # GET /api/models, POST /api/models/pull, DELETE /api/models/:name
│   │   │   │   └── system.ts     # GET /api/system
│   │   │   └── services/
│   │   │       └── ollama-client.ts  # Ollama HTTP client (chat, list, pull, delete)
│   │   ├── package.json
│   │   └── tsconfig.json
│   │
│   ├── web/                      # @kong/web — React web UI
│   │   ├── index.html
│   │   ├── vite.config.ts        # Vite config with API proxy to gateway
│   │   ├── src/
│   │   │   ├── main.tsx          # React entry point
│   │   │   ├── index.css         # Tailwind CSS imports + dark theme base
│   │   │   ├── App.tsx           # Root component — sidebar nav, model selector, page router
│   │   │   ├── pages/
│   │   │   │   ├── Chat.tsx      # Chat interface with streaming, stop, clear
│   │   │   │   ├── ModelManager.tsx  # Model list with status, size, category
│   │   │   │   └── SystemMonitor.tsx # GPU stats, VRAM bar, temperature, backend health
│   │   │   └── hooks/
│   │   │       ├── useChat.ts    # SSE streaming chat hook
│   │   │       ├── useModels.ts  # Model list fetching hook
│   │   │       └── useSystem.ts  # System status polling hook
│   │   ├── package.json
│   │   └── tsconfig.json
│   │
│   ├── cli/                      # @kong/cli — command-line interface
│   │   ├── bin/
│   │   │   └── kong.js           # Entry point
│   │   ├── src/commands/
│   │   │   ├── index.ts          # Commander.js program definition
│   │   │   ├── chat.ts           # kong chat [prompt] — interactive or single-shot
│   │   │   ├── models.ts         # kong models list | kong models pull <name>
│   │   │   ├── system.ts         # kong system status
│   │   │   └── serve.ts          # kong serve — start gateway + web UI
│   │   ├── package.json
│   │   └── tsconfig.json
│   │
│   ├── orchestrator/             # (Phase 2) Express workflow engine
│   ├── ml-server/                # (Phase 3) Python FastAPI ML backends
│   ├── desktop/                  # (Phase 5) Tauri desktop app
│   ├── native/                   # (Phase 5) C++ SDL+BGFX+Dear ImGui app
│   └── juce-audio/               # (Phase 4) JUCE 8 audio app
│
├── config/
│   ├── models.yaml               # Model registry — all models across all backends
│   ├── backends.yaml             # Backend service definitions and health checks
│   └── vram-profiles.yaml        # VRAM budget profiles for different workloads
│
├── scripts/
│   ├── setup.sh                  # First-time setup: check prerequisites, install deps, pull models
│   └── dev.sh                    # Start Ollama + all dev servers
│
└── docs/
    ├── 2026-04-10-STATUS-REPORT.md
    └── POSSIBILITIES-AND-AVENUES-OF-AI-RESEARCH.md

Prerequisites

| Requirement | Minimum | Recommended |
| --- | --- | --- |
| OS | Linux (Ubuntu 22.04+) | Ubuntu 24.04 |
| GPU | NVIDIA with 8GB VRAM | RTX 4080 16GB |
| RAM | 16GB | 32GB |
| NVIDIA Driver | 535+ | Latest stable |
| CUDA | 12.0+ | 13.0 |
| Node.js | 20.x | 24.x |
| pnpm | 9.x | 10.x |
| Ollama | 0.20+ | Latest |

Install Prerequisites

# Node.js (via nvm)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.0/install.sh | bash
nvm install 24

# pnpm
npm install -g pnpm

# Ollama
curl -fsSL https://ollama.com/install.sh | sh

Quick Start

# Clone
git clone https://github.com/datajango/kong-local-llm-setup.git
cd kong-local-llm-setup

# Setup (installs dependencies + pulls default model)
./scripts/setup.sh

# Start everything
./scripts/dev.sh

# Open in browser
# http://localhost:5173

Or manually:

# Install dependencies
pnpm install

# Ensure Ollama is running
ollama serve &

# Pull the coding model (9GB download)
ollama pull qwen2.5-coder:14b-instruct-q4_K_M

# Start gateway + web UI
pnpm run dev

# Open http://localhost:5173

Apps

Gateway (Fastify)

Package: @kong/gateway | Port: 3000 | Path: apps/gateway/

The central API gateway. Every frontend and CLI command communicates through this service. It proxies requests to Ollama and (in later phases) to the Express orchestrator and Python ML server.

Key files:

  • src/server.ts — Fastify app setup, plugin registration, route mounting
  • src/routes/chat.ts — Chat endpoint with SSE streaming and WebSocket support
  • src/routes/models.ts — Model CRUD operations proxied to Ollama
  • src/routes/system.ts — GPU monitoring via nvidia-smi, backend health checks
  • src/services/ollama-client.ts — Typed HTTP client for the Ollama REST API

Features:

  • CORS enabled for all origins (development)
  • WebSocket support via @fastify/websocket
  • Server-Sent Events (SSE) for streaming chat responses
  • Automatic model category inference from model names
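The last feature, category inference, keys off the model name. A rough sketch of the idea, assuming the ModelCategory values exported by @kong/shared (the real rules live in the gateway and may differ):

import type { ModelCategory } from "@kong/shared";

// Sketch only: guess a model's category from its Ollama tag.
// The gateway's actual heuristics may use different rules or the models.yaml registry.
export function inferCategory(name: string): ModelCategory {
  const n = name.toLowerCase();
  if (n.includes("coder")) return "coding";
  if (n.includes("minicpm-v") || n.includes("llava") || n.includes("vision")) return "vision";
  return "chat";
}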

Run independently:

pnpm --filter @kong/gateway dev

Web UI (React)

Package: @kong/web | Port: 5173 | Path: apps/web/

Dark-themed single-page application with three pages accessible from the sidebar.

Pages:

| Page | Description |
| --- | --- |
| Chat | Send messages, see streaming responses with a typing cursor, stop generation, clear history |
| Model Manager | View all available models with category, parameter count, quantization, VRAM size, and load status |
| System Monitor | Live GPU stats refreshed every 3 seconds: color-coded VRAM usage bar, temperature, utilization percentage, backend health indicators |

Custom Hooks:

| Hook | Purpose |
| --- | --- |
| useChat(model) | Manages message state, SSE streaming, abort control |
| useModels() | Fetches the model list from the gateway, provides refresh |
| useSystem(pollMs) | Polls /api/system at a configurable interval |
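As a rough illustration, a page might consume useChat like the sketch below. The return shape { messages, streaming, send, stop } is assumed for the example; check apps/web/src/hooks/useChat.ts for the actual API:

import { useState } from "react";
import { useChat } from "../hooks/useChat";

// Sketch only: assumes useChat returns { messages, streaming, send, stop }.
export function MiniChat() {
  const [input, setInput] = useState("");
  const { messages, streaming, send, stop } = useChat("qwen2.5-coder:14b-instruct-q4_K_M");

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}><b>{m.role}:</b> {m.content}</p>
      ))}
      <input value={input} onChange={(e) => setInput(e.target.value)} />
      <button onClick={() => { send(input); setInput(""); }}>Send</button>
      {streaming && <button onClick={stop}>Stop</button>}
    </div>
  );
}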

Tech:

  • React 19 with TypeScript
  • Tailwind CSS v4 (via @tailwindcss/vite plugin)
  • Lucide React for icons
  • Vite 6 with dev proxy to gateway (/api -> localhost:3000)

Run independently:

pnpm --filter @kong/web dev

CLI

Package: @kong/cli | Path: apps/cli/

Command-line interface powered by Commander.js. All commands communicate with the gateway API.

Commands:

kong chat [prompt]              Chat with a model (interactive REPL or single prompt)
  -m, --model <model>           Model to use (default: qwen2.5-coder:14b-instruct-q4_K_M)

kong models list                List all available models with status and size
kong models pull <name>         Pull a model with streaming progress display

kong system status              Show GPU info, VRAM usage, backend health

kong serve                      Start gateway + web UI dev servers
  -p, --port <port>             Gateway port (default: 3000)

Examples:

# Single prompt
pnpm --filter @kong/cli dev -- chat "Write a C++ hello world"

# Interactive mode
pnpm --filter @kong/cli dev -- chat

# Check system
pnpm --filter @kong/cli dev -- system status

# List models
pnpm --filter @kong/cli dev -- models list

Environment variables:

  • KONG_GATEWAY — Gateway URL (default: http://localhost:3000)

Orchestrator (Express)

Package: @kong/orchestrator | Port: 3001 | Path: apps/orchestrator/

Status: Phase 2 — not yet implemented.

Planned responsibilities:

  • Model lifecycle management (load, unload, swap)
  • BullMQ job queue for async tasks (image generation, audio processing)
  • Multi-step workflow engine (chain multiple models together)
  • Ollama process supervision (start, stop, restart, health monitoring)

ML Server (FastAPI)

Package: @kong/ml-server | Port: 8000 | Path: apps/ml-server/

Status: Phase 3 — not yet implemented.

Planned responsibilities:

  • Image generation via diffusers (SDXL Turbo, Stable Diffusion XL)
  • Speech-to-text via faster-whisper (Whisper Large v3 Turbo)
  • Text-to-speech via piper-tts
  • Music generation via audiocraft (MusicGen, AudioGen)
  • 3D mesh generation via TripoSR
  • VRAM coordination with gateway via Redis

Desktop (Tauri)

Package: @kong/desktop | Path: apps/desktop/

Status: Phase 5 — not yet implemented.

Tauri 2.0 desktop application wrapping the shared React frontend. Adds native capabilities:

  • System tray with quick actions
  • Native file dialogs for saving generated content
  • Local file system access for model management
  • Native menus and keyboard shortcuts
  • Auto-start and background operation

Native (C++ SDL+BGFX+ImGui)

Package: N/A (CMake) | Path: apps/native/

Status: Phase 5 — not yet implemented.

High-performance native application for real-time workloads:

  • SDL2 for window management and input
  • BGFX for cross-platform GPU rendering
  • Dear ImGui for immediate-mode UI panels
  • libcurl for HTTP communication with the gateway
  • Custom WebSocket client for streaming
  • 3D viewport for rendering generated meshes (BGFX shaders)
  • Real-time GPU monitoring panels
  • Audio capture via SDL for voice input

JUCE Audio

Package: N/A (CMake) | Path: apps/juce-audio/

Status: Phase 4 — not yet implemented.

Standalone audio application built with JUCE 8:

  • Audio engine with real-time processing pipeline
  • Speech panel — microphone capture, STT via Whisper, TTS playback
  • Music panel — MusicGen control, melody input, generation parameters
  • Sound effects panel — AudioGen text-to-SFX generation
  • HTTP client connecting to the gateway for all model requests
  • Audio file management (save/load/export generated audio)

Packages

Shared Types

Package: @kong/shared | Path: packages/shared/

TypeScript type definitions and constants shared across all Node.js apps.

Types:

// Chat
ChatMessage          // { role, content, images? }
ChatRequest          // { model, messages, stream?, temperature?, maxTokens? }
ChatResponse         // { model, message, done, totalDuration?, evalCount? }
ChatStreamChunk      // { model, message, done }

// Models
ModelInfo            // { id, name, category, backend, vramMb, loaded, ... }
ModelCategory        // "coding" | "chat" | "vision" | "image-gen" | "stt" | "tts" | ...
ModelLoadRequest     // { modelId, priority? }
ModelListResponse    // { models, vram }
VramStatus           // { totalMb, usedMb, freeMb, loadedModels }

// System
SystemStatus         // { gpu, vram, backends, uptime }
GpuInfo              // { name, totalMemoryMb, usedMemoryMb, temperature?, utilization? }
BackendStatus        // { name, url, healthy, lastCheck }

Constants:

OLLAMA_URL       = "http://localhost:11434"
GATEWAY_URL      = "http://localhost:3000"
ORCHESTRATOR_URL = "http://localhost:3001"
ML_SERVER_URL    = "http://localhost:8000"

Build:

pnpm --filter @kong/shared build

Models

Ollama Models

Models served by Ollama for text, code, and vision inference.

| Model | Category | Parameters | Quantization | VRAM | Description |
| --- | --- | --- | --- | --- | --- |
| qwen2.5-coder:14b-instruct-q4_K_M | coding | 14.8B | Q4_K_M | ~9 GB | Expert coding: C++, Python, TypeScript, Rust, Go |
| qwen2.5:14b-instruct-q4_K_M | chat | 14.8B | Q4_K_M | ~9 GB | General-purpose chat and reasoning |
| minicpm-v:8b-2.6-q4_K_M | vision | 8B | Q4_K_M | ~5 GB | Image understanding, technical diagrams, OCR |

Pull additional models:

ollama pull qwen2.5:14b-instruct-q4_K_M
ollama pull minicpm-v:8b-2.6-q4_K_M

Model Profiles

Profiles combine a base Ollama model with a specialized system prompt. They reuse the same model weights — no additional VRAM required.

| Profile | Base Model | Domain |
| --- | --- | --- |
| Electronics Expert | qwen2.5-coder:14b | ESP32, Raspberry Pi, Arduino, I2C/SPI/UART, circuit design |
| CAD Expert | qwen2.5-coder:14b | OpenSCAD, CadQuery, FreeCAD, mechanical engineering |

Profiles are defined in config/models.yaml and include the full system prompt.

Python ML Models (Planned)

Models that will be served by the FastAPI ML server in Phase 3+.

| Model | Category | VRAM | Backend Library |
| --- | --- | --- | --- |
| SDXL Turbo | Image generation | ~5 GB | diffusers |
| Whisper Large v3 Turbo | Speech-to-text | ~1.5 GB | faster-whisper |
| Piper TTS | Text-to-speech | ~100 MB (CPU) | piper-tts |
| MusicGen Small | Music generation | ~2 GB | audiocraft |
| AudioGen Medium | Sound effects | ~4 GB | audiocraft |
| TripoSR | Image-to-3D mesh | ~4 GB | triposr |

VRAM Management

The RTX 4080 reports 16,376 MB of VRAM. After system overhead (~400 MB) and a small safety margin, the effective budget is roughly 15,500 MB. That means only one large model (~9 GB) can be loaded at a time, or two to three smaller models concurrently.

VRAM Profiles

Predefined model combinations optimized for specific workflows:

| Profile | Models | Total VRAM | Use Case |
| --- | --- | --- | --- |
| coding | Qwen 2.5 Coder 14B + Whisper | 10.5 GB | Code generation with voice input |
| creative | SDXL Turbo + MusicGen + Piper | 7.1 GB | Image, music, and speech generation |
| vision-chat | MiniCPM-V 8B | 5.0 GB | Image analysis and visual Q&A |
| 3d-pipeline | TripoSR + MiniCPM-V 8B | 9.0 GB | Image-to-3D with vision analysis |

VRAM Budget

The VRAM manager (planned for Phase 2) will do the following (see the sketch after this list):

  1. Track what models are loaded and their VRAM consumption via Redis
  2. Enforce a VRAM budget — reject requests if insufficient VRAM
  3. Auto-evict least-recently-used models when new models are requested
  4. Never evict a model that is actively streaming a response
  5. Support preemptive loading hints from frontends
  6. Coordinate between Ollama (Node.js) and ML Server (Python) via shared Redis state
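A minimal in-memory sketch of points 1 to 4, with a Map standing in for the planned Redis state that Node.js and Python backends would share:

// Sketch only: VRAM admission with LRU eviction. The Phase 2 implementation
// is planned to persist this state in Redis rather than process memory.
const BUDGET_MB = 15_500;

interface LoadedModel { id: string; vramMb: number; lastUsed: number; streaming: boolean; }

const loaded = new Map<string, LoadedModel>();

function usedMb(): number {
  let sum = 0;
  for (const m of loaded.values()) sum += m.vramMb;
  return sum;
}

export function requestLoad(id: string, vramMb: number): boolean {
  // Evict least-recently-used idle models until the new model fits.
  while (usedMb() + vramMb > BUDGET_MB) {
    const victims = [...loaded.values()]
      .filter((m) => !m.streaming)                // never evict an active stream
      .sort((a, b) => a.lastUsed - b.lastUsed);   // oldest first
    if (victims.length === 0) return false;       // nothing evictable: reject the request
    loaded.delete(victims[0].id);
  }
  loaded.set(id, { id, vramMb, lastUsed: Date.now(), streaming: false });
  return true;
}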

API Reference

Base URL: http://localhost:3000

Health

GET /api/health

Response:

{ "status": "ok" }

Chat

REST (SSE Streaming)

POST /api/chat
Content-Type: application/json

Request body:

{
  "model": "qwen2.5-coder:14b-instruct-q4_K_M",
  "messages": [
    { "role": "user", "content": "Write a C++ hello world" }
  ],
  "stream": true,
  "temperature": 0.7,
  "maxTokens": 2048
}

Streaming response (stream: true):

data: {"model":"qwen2.5-coder:14b-instruct-q4_K_M","message":{"role":"assistant","content":"```cpp"},"done":false}
data: {"model":"qwen2.5-coder:14b-instruct-q4_K_M","message":{"role":"assistant","content":"\n#include"},"done":false}
...
data: [DONE]

Non-streaming response (stream: false):

{
  "model": "qwen2.5-coder:14b-instruct-q4_K_M",
  "message": { "role": "assistant", "content": "```cpp\n#include <iostream>..." },
  "done": true
}
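For reference, a minimal TypeScript consumer of the streaming variant might look like this sketch. It simply follows the data:/[DONE] format shown above; error handling is omitted:

// Sketch: stream a chat completion from the gateway and print tokens as they arrive.
async function streamChat(prompt: string): Promise<void> {
  const res = await fetch("http://localhost:3000/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5-coder:14b-instruct-q4_K_M",
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events arrive as newline-separated "data: ..." lines.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";            // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const payload = line.slice(6);
      if (payload === "[DONE]") return;
      const chunk = JSON.parse(payload);
      process.stdout.write(chunk.message.content);
    }
  }
}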

WebSocket

GET /api/chat/ws

Send a JSON ChatRequest message. Receive streamed ChatStreamChunk messages, ending with { "done": true }.
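A browser-side sketch of that exchange, using the message shapes documented above (the model name and prompt are just examples):

// Sketch: send one ChatRequest over the WebSocket and log streamed chunks.
const ws = new WebSocket("ws://localhost:3000/api/chat/ws");

ws.onopen = () => {
  ws.send(JSON.stringify({
    model: "qwen2.5-coder:14b-instruct-q4_K_M",
    messages: [{ role: "user", content: "Write a C++ hello world" }],
  }));
};

ws.onmessage = (event) => {
  const chunk = JSON.parse(event.data);
  if (chunk.done) {
    ws.close();
    return;
  }
  console.log(chunk.message.content);
};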


Models API

List Models

GET /api/models

Response:

{
  "models": [
    {
      "id": "qwen2.5-coder:14b-instruct-q4_K_M",
      "name": "qwen2.5-coder:14b-instruct-q4_K_M",
      "category": "coding",
      "backend": "ollama",
      "vramMb": 8572,
      "parameterSize": "14.8B",
      "quantization": "Q4_K_M",
      "loaded": false
    }
  ]
}

List Running Models

GET /api/models/running

Response:

{ "running": ["qwen2.5-coder:14b-instruct-q4_K_M"] }

Pull a Model

POST /api/models/pull
Content-Type: application/json

Request body:

{ "name": "minicpm-v:8b-2.6-q4_K_M" }

Response (SSE stream):

data: {"status":"pulling manifest"}
data: {"status":"downloading","completed":1048576,"total":5368709120}
...
data: {"status":"success"}

Delete a Model

DELETE /api/models/:name

Response:

{ "success": true }

System

GET /api/system

Response:

{
  "gpu": {
    "name": "NVIDIA GeForce RTX 4080",
    "totalMemoryMb": 16376,
    "usedMemoryMb": 343,
    "temperature": 34,
    "utilization": 7
  },
  "vram": {
    "totalMb": 16376,
    "usedMb": 343,
    "freeMb": 16033,
    "loadedModels": []
  },
  "backends": [
    {
      "name": "ollama",
      "url": "http://localhost:11434",
      "healthy": true,
      "lastCheck": "2026-04-11T04:32:11.893Z"
    }
  ],
  "uptime": 42.5
}
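These GPU fields are read via nvidia-smi. One plausible way for the system route to collect them, sketched with Node's child_process (the gateway's actual parsing may differ):

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Sketch: query the first GPU and map the CSV output to GpuInfo-like fields.
async function readGpu() {
  const { stdout } = await run("nvidia-smi", [
    "--query-gpu=name,memory.total,memory.used,temperature.gpu,utilization.gpu",
    "--format=csv,noheader,nounits",
  ]);
  const [name, total, used, temp, util] = stdout.trim().split(", ");
  return {
    name,
    totalMemoryMb: Number(total),
    usedMemoryMb: Number(used),
    temperature: Number(temp),
    utilization: Number(util),
  };
}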

Configuration

models.yaml

config/models.yaml — Central registry of all models across all backends.

Each model entry contains:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier (Ollama model tag or custom ID) |
| name | string | Human-readable display name |
| category | string | One of: coding, chat, vision, image-gen, stt, tts, music, sound-fx, 3d, electronics, cad |
| backend | string | One of: ollama, diffusers, whisper, piper, audiocraft, triposr |
| vramMb | number | VRAM consumption in megabytes |
| description | string | Short description of capabilities |
| baseModel | string | (Profiles only) Underlying Ollama model ID |
| systemPrompt | string | (Profiles only) System prompt for specialization |

backends.yaml

config/backends.yaml — Backend service definitions.

Each backend entry contains:

| Field | Type | Description |
| --- | --- | --- |
| url | string | Base URL of the service |
| healthCheck | string | Path to the health check endpoint |
| description | string | Short description |
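A minimal health checker over these entries could look like the sketch below. It assumes backends.yaml maps backend names to objects with these fields and uses the yaml npm package for parsing; neither assumption is guaranteed by this repo:

import { readFileSync } from "node:fs";
import { parse } from "yaml";

// Sketch only: assumed shape of config/backends.yaml entries.
interface BackendEntry { url: string; healthCheck: string; description?: string; }

async function checkBackends(): Promise<void> {
  const backends = parse(readFileSync("config/backends.yaml", "utf8")) as Record<string, BackendEntry>;
  for (const [name, b] of Object.entries(backends)) {
    try {
      const res = await fetch(new URL(b.healthCheck, b.url));
      console.log(`${name}: ${res.ok ? "healthy" : `unhealthy (${res.status})`}`);
    } catch {
      console.log(`${name}: unreachable`);
    }
  }
}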

vram-profiles.yaml

config/vram-profiles.yaml — Predefined model combinations for specific workflows.

Each profile contains:

| Field | Type | Description |
| --- | --- | --- |
| description | string | What the profile is for |
| models | array | List of { id, vramMb } entries |
| totalVramMb | number | Sum of all model VRAM requirements |
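Because totalVramMb duplicates information in models, a small consistency check is easy to write. A sketch using the fields above and the ~15,500 MB budget from the VRAM Management section:

// Sketch: verify a profile's declared total matches its parts and fits the budget.
interface VramProfile {
  description: string;
  models: { id: string; vramMb: number }[];
  totalVramMb: number;
}

function validateProfile(profile: VramProfile, budgetMb = 15_500): string[] {
  const issues: string[] = [];
  const sum = profile.models.reduce((acc, m) => acc + m.vramMb, 0);
  if (sum !== profile.totalVramMb) {
    issues.push(`totalVramMb is ${profile.totalVramMb} but models sum to ${sum}`);
  }
  if (sum > budgetMb) {
    issues.push(`profile needs ${sum} MB, over the ${budgetMb} MB budget`);
  }
  return issues;
}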

Scripts

| Script | Description |
| --- | --- |
| scripts/setup.sh | First-time setup: checks that Node.js, pnpm, and Ollama are installed, runs pnpm install, and pulls the default coding model |
| scripts/dev.sh | Starts Ollama (if not already running), then runs pnpm run dev to launch all services via Turborepo |

Development

Monorepo Commands

# Install all dependencies
pnpm install

# Start all services in dev mode (gateway + web UI)
pnpm run dev

# Build all packages
pnpm run build

# Build a specific package
pnpm --filter @kong/shared build
pnpm --filter @kong/gateway build

# Run a specific app in dev mode
pnpm --filter @kong/gateway dev
pnpm --filter @kong/web dev

# Clean all build artifacts
pnpm run clean

Adding a New Model

  1. Pull the model via Ollama:
    ollama pull <model-name>
  2. Add an entry to config/models.yaml with the model's ID, category, VRAM size, and description.
  3. The model will automatically appear in the Web UI model selector and CLI models list output.

Adding a New API Route

  1. Create a new file in apps/gateway/src/routes/ (e.g., images.ts).
  2. Export an async function that takes a FastifyInstance and registers routes:
    import type { FastifyInstance } from "fastify";
    
    export async function imageRoutes(app: FastifyInstance) {
      app.post("/api/images/generate", async (request) => {
        // ...
      });
    }
  3. Register the route in apps/gateway/src/server.ts:
    import { imageRoutes } from "./routes/images.js";
    await app.register(imageRoutes);

Adding a New Web UI Page

  1. Create a new page component in apps/web/src/pages/ (e.g., ImageGen.tsx); a sketch follows this list.
  2. Add a nav entry to the NAV_ITEMS array in apps/web/src/App.tsx.
  3. Add the page route in the <main> section of App.tsx.
  4. Create a custom hook in apps/web/src/hooks/ if the page needs API data.
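To illustrate steps 1 and 4, here is a sketch of a hypothetical ImageGen page that calls the /api/images/generate route used as an example in the previous section. That route does not exist yet, and the existing pages may follow different conventions:

// apps/web/src/pages/ImageGen.tsx (hypothetical example)
import { useState } from "react";

export default function ImageGen() {
  const [prompt, setPrompt] = useState("");
  const [imageUrl, setImageUrl] = useState<string | null>(null);

  async function generate() {
    // Placeholder endpoint: /api/images/generate will not exist until the ML server lands.
    const res = await fetch("/api/images/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt }),
    });
    const data = await res.json();
    setImageUrl(data.url);
  }

  return (
    <div>
      <input value={prompt} onChange={(e) => setPrompt(e.target.value)} />
      <button onClick={generate}>Generate</button>
      {imageUrl && <img src={imageUrl} alt={prompt} />}
    </div>
  );
}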

Tech Stack

Runtime & Build

| Tool | Version | Purpose |
| --- | --- | --- |
| Node.js | 24.x | JavaScript/TypeScript runtime |
| pnpm | 10.x | Package manager with workspace support |
| Turborepo | 2.x | Monorepo build orchestration with caching |
| TypeScript | 5.x | Type safety across all Node.js packages |
| tsx | 4.x | TypeScript execution for dev mode |

Backend

| Tool | Purpose |
| --- | --- |
| Fastify 5 | API gateway: high-performance HTTP + WebSocket |
| @fastify/cors | Cross-origin resource sharing |
| @fastify/websocket | WebSocket support for streaming |
| @fastify/static | Static file serving (production) |
| Ollama | LLM inference runtime (wraps llama.cpp) |

Frontend

| Tool | Purpose |
| --- | --- |
| React 19 | UI component framework |
| Vite 6 | Frontend build tool and dev server |
| Tailwind CSS 4 | Utility-first CSS framework |
| Lucide React | Icon library |

CLI

| Tool | Purpose |
| --- | --- |
| Commander.js 13 | CLI argument parsing and command structure |

Planned

| Tool | Phase | Purpose |
| --- | --- | --- |
| Express | 2 | Workflow orchestrator, job queue |
| Redis | 2 | VRAM state, job queue backend (BullMQ) |
| FastAPI (Python) | 3 | ML inference server |
| diffusers | 3 | Image generation (Stable Diffusion) |
| faster-whisper | 3 | Speech-to-text |
| piper-tts | 3 | Text-to-speech |
| audiocraft | 4 | Music and sound effect generation |
| JUCE 8 | 4 | Audio application framework |
| Tauri 2 | 5 | Desktop application shell |
| SDL2 | 5 | Window management, input, audio capture |
| BGFX | 5 | Cross-platform GPU rendering |
| Dear ImGui | 5 | Immediate-mode GUI |
| TripoSR | 6 | Image-to-3D mesh generation |

Roadmap

| Phase | Focus | Status |
| --- | --- | --- |
| 1. Foundation | Ollama + Fastify gateway + React web UI + CLI | Complete |
| 2. Model Management + VRAM | Redis, VRAM manager, Express orchestrator, BullMQ | Planned |
| 3. Python ML Server | Image generation, speech (Whisper + Piper), VRAM coordination | Planned |
| 4. Audio & Music | MusicGen, AudioGen, JUCE 8 audio application | Planned |
| 5. Desktop Apps | Tauri desktop app, C++ SDL+BGFX+Dear ImGui native app | Planned |
| 6. 3D & Advanced | TripoSR, workflow engine, CAD/electronics profiles | Planned |

See docs/2026-04-10-STATUS-REPORT.md for the detailed Phase 1 completion report.


Documentation

| Document | Description |
| --- | --- |
| Status Report (2026-04-10) | Phase 1 completion report: what was built, tested, and verified |
| AI Research Possibilities | Comprehensive survey of 18 AI research domains with models, capabilities, and open frontiers |

License

MIT
