From 146e0952b15cab487f48c112219abec0e1747b57 Mon Sep 17 00:00:00 2001
From: "Luma (Enclave AI)"
Date: Tue, 19 May 2026 20:35:55 +0000
Subject: [PATCH 1/2] feat(arc): native Ollama + Vulkan path for older Intel iGPUs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a deployment path for Intel iGPUs that predate the Arc Alchemist
generation (Iris Pro 580 / Iris / UHD / Gen 9), where the existing
docker-compose.arc.yml (ipex-llm + SYCL) does not work. Uses native
systemd Ollama with the Vulkan backend via Mesa ANV — the
actually-supported community path for older Intel iGPUs.

Files added:
- docs/hardware/intel-igpu-vulkan.md — full procedure + hardware-class
  decision tree
- scripts/install-vulkan-ollama.sh — automation (executable): detection,
  stop-docker-ollama, group memberships, curl install.sh, systemd
  drop-in (OLLAMA_VULKAN=1 + RUSTICL_ENABLE=iris +
  GPU_MAX_ALLOC_PERCENT=100 + OLLAMA_GPU_OVERHEAD=0), chown, journal
  verification, API check
- README.md — Quick Start callout pointing at the new path

Architecture: native systemd Ollama on localhost:11434; the rest of
ai-stack (Olla, LiteLLM, Router, Shepherd) stays Docker — Smart Router
just routes to the native Ollama as it would to any peer.

Validated 2026-05-19:
- nuk1 (Intel NUC6i7KYB, Iris Pro 580): operator-validated working,
  100% GPU on qwen2.5:1.5b, 23.4 GiB unified memory, default ctx 32768
- lab1, lab2, lab3 (similar Intel NUC hardware): same procedure applied
  via pod ops; Vulkan/Mesa ANV engaged per journalctl verification
  (library=Vulkan name=Vulkan0 description=Intel(R) Iris(R) Pro
  Graphics 580)
- lab4: offline at deploy time; same procedure applies when next online

Why this matters: ai-stack mission is 'reclaim usability from old
hardware'. Pre-Arc Intel iGPUs are a substantial install base that
ipex-llm has moved past; Vulkan/Mesa ANV is the upstream-living path
for them.
Solution path researched + validated by operator (NetYeti) via
fresh-context Claude.ai prompt in <1 hour; pod ops applied procedure
across peers; codebase update encodes the path for external
reproducibility.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 README.md                          |   8 ++
 docs/hardware/intel-igpu-vulkan.md | 213 +++++++++++++++++++++++++++++
 scripts/install-vulkan-ollama.sh   | 164 ++++++++++++++++++++++
 3 files changed, 385 insertions(+)
 create mode 100644 docs/hardware/intel-igpu-vulkan.md
 create mode 100755 scripts/install-vulkan-ollama.sh

diff --git a/README.md b/README.md
index 606f8ed..676a296 100644
--- a/README.md
+++ b/README.md
@@ -145,6 +145,14 @@ cd ai-stack
 
 The installer auto-detects your GPU (NVIDIA, Intel Arc, or CPU-only), creates `.env`, generates your API key, and starts the stack. Open `http://localhost:40117/` (Shepherd dashboard) when it completes.
 
+**Older Intel iGPU?** (Iris Pro / Iris / UHD / Gen 9 etc., pre-Arc.) `install.sh` falls back to CPU-only for these — ipex-llm doesn't support pre-11th-Gen iGPUs. Use the Vulkan path instead:
+
+```bash
+./scripts/install-vulkan-ollama.sh  # native Ollama + Vulkan/Mesa ANV
+```
+
+See [`docs/hardware/intel-igpu-vulkan.md`](docs/hardware/intel-igpu-vulkan.md) for the procedure and supported hardware list. The rest of ai-stack (Olla, LiteLLM, Router, Shepherd) still runs via `docker compose` — it just connects to the native Ollama on `localhost:11434`.
+
 To add cloud models (Claude, Gemini, OpenCode Zen) after install:
 ```bash
 echo 'ANTHROPIC_API_KEY=sk-ant-...' >> .env
diff --git a/docs/hardware/intel-igpu-vulkan.md b/docs/hardware/intel-igpu-vulkan.md
new file mode 100644
index 0000000..98023f4
--- /dev/null
+++ b/docs/hardware/intel-igpu-vulkan.md
@@ -0,0 +1,213 @@
+# Older Intel iGPU — Vulkan via Mesa ANV (native Ollama)
+
+For Intel iGPUs that **predate the Arc Alchemist generation** (Iris Pro 580 / Iris / UHD / Skylake-era Gen 9 etc.), the
+`docker-compose.arc.yml` overlay (ipex-llm + SYCL) does **not** work — the ipex-llm runtime requires 11th-Gen Core+
+class hardware. Use the **Vulkan path via Mesa ANV** with a **native (non-Docker) Ollama install**.
+
+This was validated 2026-05-19 on:
+
+- **nuk1** — Intel NUC6i7KYB ("Skull Canyon"), Iris Pro 580 (Gen 9 GT4e, 72 EUs, 128 MB eDRAM), 32 GB RAM
+- **lab1, lab2, lab3** — similar Intel NUC hardware class
+
+After this procedure: `ollama ps` shows `100% GPU`, default context jumps from 4096 → 32768, and the GPU is visible
+as `library=Vulkan name=Vulkan0 description="Intel(R) Iris(R) Pro Graphics 580" total="23.4 GiB"` (or similar).
+
+## Why this path (not arc.yml)
+
+| Hardware class | Path | Why |
+|---|---|---|
+| Intel Arc (Alchemist / Battlemage, 12th-Gen+) | `docker-compose.arc.yml` | ipex-llm/SYCL stack tested and supported on this hardware |
+| Intel Iris Xe (Gen 12, 11th-Gen Core+) | `docker-compose.arc.yml` *or* this path | Either stack works; Vulkan is broader-supported, ipex-llm is more optimized when it works |
+| Intel Iris Pro / Iris / UHD / Gen 9 or older | **this path** (Vulkan) | ipex-llm/SYCL doesn't support pre-11th-Gen iGPUs; Vulkan via Mesa ANV does |
+| NVIDIA | `docker-compose.nvidia.yml` | n/a |
+| CPU-only | base `docker-compose.yml` | fallback |
+
+The rest of ai-stack (Olla, LiteLLM, Smart Router, Shepherd) runs in Docker as usual; only the Ollama service runs natively,
+exposing port 11434 on localhost, where the other services connect.
+
+## Prerequisites
+
+- Ubuntu 24.04 LTS (Noble) or compatible
+- `i915` kernel driver active (`lspci -k -s 00:02.0 | grep "Kernel driver in use: i915"`)
+- `/dev/dri/renderD128` present (`ls /dev/dri/`)
+- Your user in the `render` and `video` groups (added in step 3; the Ollama installer creates the `ollama` service user and adds it to these groups itself)
+
+## Procedure
+
+The procedure is automated by `scripts/install-vulkan-ollama.sh` — see that script for the full sequence. Manual steps below
+are for operators who want to understand or adapt each step.
+
+### 1. Verify GPU + driver
+
+```bash
+lspci -nn | grep -i vga
+# Expected: Intel ... Iris ... [8086:...]
+
+sudo lspci -v -s 00:02.0 | grep "Kernel driver"
+# Expected: Kernel driver in use: i915
+
+ls /dev/dri/
+# Expected: card0 or card1, renderD128
+```
+
+### 2. Stop any conflicting Ollama (Docker or older native)
+
+If you previously ran Ollama via `docker-compose.arc.yml` or any other Docker container, it holds port 11434 and
+must be stopped:
+
+```bash
+# If running via ai-stack:
+sudo systemctl stop ai-stack.service 2>/dev/null || true
+
+# Or stop the Docker Ollama container directly:
+docker ps -q --filter "publish=11434" | xargs -r docker stop
+```
+
+### 3. Add yourself to render/video groups
+
+```bash
+sudo usermod -aG render,video "$USER"
+# Log out and back in, OR apply to current shell:
+newgrp render
+```
+
+### 4. Install Ollama (native, via official installer)
+
+```bash
+# Optional: clean previous broken install
+sudo rm -f /usr/local/bin/ollama
+sudo systemctl disable ollama 2>/dev/null || true
+sudo rm -f /etc/systemd/system/ollama.service
+
+# Fresh install
+curl -fsSL https://ollama.com/install.sh | sh
+```
+
+The installer creates the `ollama` user, adds it to `render` + `video` groups, and starts the systemd service.
+
+### 5. Add the Vulkan systemd drop-in
+
+By default the installer warns "No NVIDIA/AMD GPU detected" and falls back to CPU-only. Override with a drop-in:
+
+```bash
+sudo mkdir -p /etc/systemd/system/ollama.service.d
+sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
+[Service]
+Environment="OLLAMA_VULKAN=1"
+Environment="RUSTICL_ENABLE=iris"
+Environment="GPU_MAX_ALLOC_PERCENT=100"
+Environment="OLLAMA_GPU_OVERHEAD=0"
+EOF
+
+sudo systemctl daemon-reload
+sudo systemctl restart ollama
+```
+
+### 6. Fix model directory permissions
+
+If models were previously downloaded by a different user (e.g. a Docker-based setup with a `root`-owned bind-mount),
+fix ownership so the `ollama` user can read them:
+
+```bash
+sudo chown -R ollama:ollama /usr/share/ollama/.ollama/
+```
+
+### 7. Verify GPU is engaged
+
+```bash
+sudo journalctl -u ollama --no-pager -n 50 | grep -i "inference compute"
+```
+
+You want to see a line like:
+
+```
+inference compute id=... library=Vulkan name=Vulkan0
+description="Intel(R) Iris(R) Pro Graphics 580 (SKL GT4)"
+type=iGPU total="23.4 GiB" available="13.8 GiB"
+```
+
+And the default context should have jumped to 32768 (vs 4096 in CPU mode):
+
+```
+vram-based default context total_vram="23.4 GiB" default_num_ctx=32768
+```
+
+### 8. Test inference
+
+```bash
+ollama pull qwen2.5:1.5b
+ollama run qwen2.5:1.5b "Hello"
+ollama ps
+```
+
+Expected output of `ollama ps`:
+
+```
+NAME            ID     SIZE      PROCESSOR    CONTEXT
+qwen2.5:1.5b    ...    2.8 GB    100% GPU     32768
+```
+
+`100% GPU` is the success indicator. Anything less means the model partially fell back to CPU — usually because
+the model is larger than available VRAM.
+
+## ai-stack integration
+
+The Olla / LiteLLM / Smart Router / Shepherd services in `docker-compose.yml` connect to Ollama at `localhost:11434` —
+whether Ollama runs in Docker or natively, they don't care. After this procedure:
+
+```bash
+# Start the rest of ai-stack (Ollama already running natively):
+docker compose up -d olla litellm router shepherd
+```
+
+Or use the same `start.sh` ai-stack provides — set `GPU_TYPE=vulkan` (or `cpu`, since the compose file doesn't need
+to know about the native Vulkan Ollama) in `.env`.
+
+## Troubleshooting
+
+### `journalctl -u ollama` shows CPU-only / "No GPU detected"
+
+The drop-in didn't take effect. Verify:
+
+```bash
+sudo systemctl show ollama -p Environment | tr ' ' '\n' | grep -E 'OLLAMA|RUSTICL|GPU_MAX'
+# Expected to include: OLLAMA_VULKAN=1, RUSTICL_ENABLE=iris, GPU_MAX_ALLOC_PERCENT=100, OLLAMA_GPU_OVERHEAD=0
+```
+
+If missing, the drop-in file isn't being read. Confirm `/etc/systemd/system/ollama.service.d/override.conf` exists and
+re-run `sudo systemctl daemon-reload && sudo systemctl restart ollama`.
+
+### `ollama ps` shows `100% CPU` even though Vulkan is engaged
+
+Model size exceeds VRAM. Either:
+- Use a smaller quantization
+- Use a smaller model
+- Check `available` VRAM in the journal line — 14 GiB free means models <14 GiB fit fully
+
+### `vulkaninfo` hangs
+
+Common on headless systems (no display). Either skip the check or run from a second SSH session.
+The `journalctl` line is the authoritative signal — if it says `library=Vulkan`, Vulkan is working.
+
+### Permission denied on `/dev/dri/renderD128`
+
+User isn't in the `render` group. Re-check step 3 and log out / back in (or `newgrp render` in the current shell).
+
+## Hardware-class boundary
+
+This path is for Intel iGPUs **without** ipex-llm/SYCL support — primarily Gen 9 (Skylake era) and older.
+For 11th-Gen Core+ / Iris Xe / Arc you can also use this path, but `docker-compose.arc.yml` may give better
+performance on supported hardware. Run benchmarks on both paths if you're not sure.
+
+The ipex-llm tested-and-supported hardware floor is documented by Intel as
+**iGPUs of 11th Gen Core and newer have been tested; older iGPU works but with poor performance**
+([source](https://github.com/intel/ipex-llm) — note: project archived 2026-01-28). For older iGPUs this Vulkan path
+is the supported community direction.
+
+## Background
+
+Empirically validated 2026-05-19 by NetYeti on Intel NUC6i7KYB (Iris Pro 580) using free-tier Claude.ai in under
+one hour, after the ai-stack pod spent ~10 hours on a parallel SYCL/ipex-llm investigation that concluded the
+hardware was "GPU-incapable" (an over-generalization — the hardware is fine; ipex-llm just doesn't support this
+generation). Lesson banked: single-stack failure ≠ hardware-class failure. See `docs/hardware/arc.md` for the
+Arc path; this file documents the Iris/older-iGPU path.
diff --git a/scripts/install-vulkan-ollama.sh b/scripts/install-vulkan-ollama.sh
new file mode 100755
index 0000000..b1b4f35
--- /dev/null
+++ b/scripts/install-vulkan-ollama.sh
@@ -0,0 +1,164 @@
+#!/usr/bin/env bash
+# install-vulkan-ollama.sh — install native Ollama with Vulkan/Mesa ANV GPU support
+#
+# For Intel iGPUs that predate Arc Alchemist (Iris Pro / Iris / UHD / Gen 9 etc.)
+# where docker-compose.arc.yml (ipex-llm/SYCL) does NOT work.
+#
+# See docs/hardware/intel-igpu-vulkan.md for background and manual procedure.
+# Validated 2026-05-19 on Intel NUC6i7KYB (Iris Pro 580) + lab1/2/3 NUC hardware.
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+info() { echo -e "\033[1;34m[INFO]\033[0m $*"; }
+success() { echo -e "\033[1;32m[OK]\033[0m $*"; }
+warn() { echo -e "\033[1;33m[WARN]\033[0m $*"; }
+error() { echo -e "\033[1;31m[ERROR]\033[0m $*" >&2; }
+
+# ─── Prerequisites check ─────────────────────────────────────────────────────
+
+info "Checking prerequisites..."
+
+if ! command -v lspci &>/dev/null; then
+  error "lspci not found. Install pciutils first: sudo apt install pciutils"
+  exit 1
+fi
+
+if ! ls /dev/dri/renderD* &>/dev/null; then
+  error "No /dev/dri/renderD* device found. Is the i915 kernel driver loaded?"
+  error "Run: sudo lspci -v -s 00:02.0 | grep 'Kernel driver in use'"
+  exit 1
+fi
+
+# Detect Intel iGPU
+INTEL_GPU=$(lspci 2>/dev/null | grep -i "vga\|3d\|display" | grep -i "intel" | head -1 || true)
+if [[ -z "$INTEL_GPU" ]]; then
+  error "No Intel GPU detected via lspci. This script targets Intel iGPUs."
+  exit 1
+fi
+
+info "Detected: $INTEL_GPU"
+
+# Warn if this looks like an Arc GPU — arc.yml is likely the better path
+if echo "$INTEL_GPU" | grep -iq "arc"; then
+  warn "This looks like an Intel Arc GPU. docker-compose.arc.yml (ipex-llm/SYCL)"
+  warn "may give better performance on Arc hardware. Continue anyway? [y/N]"
+  read -r reply
+  if [[ ! "$reply" =~ ^[Yy]$ ]]; then
+    info "Aborted. See docs/hardware/arc.md for the Arc path."
+    exit 0
+  fi
+fi
+
+# ─── Stop conflicting Ollama (Docker or older native) ────────────────────────
+
+info "Stopping any conflicting Ollama instance on port 11434..."
+
+if command -v docker &>/dev/null; then
+  CONTAINERS=$(docker ps -q --filter "publish=11434" 2>/dev/null || true)
+  if [[ -n "$CONTAINERS" ]]; then
+    info "Stopping Docker container(s) holding port 11434..."
+    echo "$CONTAINERS" | xargs -r docker stop
+  fi
+fi
+
+# Stop ai-stack systemd service if present (so it doesn't restart the container)
+if systemctl list-unit-files 2>/dev/null | grep -q "^ai-stack.service"; then
+  info "Stopping ai-stack.service so it doesn't restart the Docker Ollama..."
+  sudo systemctl stop ai-stack.service || true
+fi
+
+# ─── Group memberships ───────────────────────────────────────────────────────
+
+info "Adding $USER to render and video groups (required for /dev/dri access)..."
+sudo usermod -aG render,video "$USER"
+
+# ─── Install Ollama ──────────────────────────────────────────────────────────
+
+if [[ -x /usr/local/bin/ollama ]]; then
+  info "Existing Ollama install detected at /usr/local/bin/ollama."
+  info "Will reuse the existing binary; only the systemd drop-in will change."
+else
+  info "Installing Ollama via official installer..."
+  curl -fsSL https://ollama.com/install.sh | sh
+fi
+
+# ─── Vulkan systemd drop-in ──────────────────────────────────────────────────
+
+info "Writing Vulkan systemd drop-in at /etc/systemd/system/ollama.service.d/override.conf..."
+
+sudo mkdir -p /etc/systemd/system/ollama.service.d
+
+sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
+[Service]
+Environment="OLLAMA_VULKAN=1"
+Environment="RUSTICL_ENABLE=iris"
+Environment="GPU_MAX_ALLOC_PERCENT=100"
+Environment="OLLAMA_GPU_OVERHEAD=0"
+EOF
+
+# ─── Fix model directory ownership ───────────────────────────────────────────
+
+if [[ -d /usr/share/ollama/.ollama ]]; then
+  info "Ensuring /usr/share/ollama/.ollama is owned by ollama user..."
+  sudo chown -R ollama:ollama /usr/share/ollama/.ollama/
+fi
+
+# ─── Reload + restart ────────────────────────────────────────────────────────
+
+info "Reloading systemd and restarting Ollama..."
+sudo systemctl daemon-reload
+sudo systemctl enable ollama
+sudo systemctl restart ollama
+
+# ─── Verify ──────────────────────────────────────────────────────────────────
+
+info "Waiting 5s for Ollama to come up..."
+sleep 5
+
+info "Checking journal for Vulkan inference-compute line..."
+if sudo journalctl -u ollama --no-pager -n 100 2>/dev/null | grep -q "library=Vulkan"; then
+  success "Vulkan GPU engaged. Journal confirms library=Vulkan."
+  sudo journalctl -u ollama --no-pager -n 100 | grep "library=Vulkan" | head -2
+else
+  warn "No 'library=Vulkan' line found in recent journal."
+  warn "This may mean the drop-in didn't take effect or the GPU isn't being used."
+  warn "Check: sudo journalctl -u ollama --no-pager -n 100"
+fi
+
+# Check API
+if curl -fsS http://127.0.0.1:11434/api/version &>/dev/null; then
+  success "Ollama API responding at http://127.0.0.1:11434"
+else
+  error "Ollama API not responding. Check: sudo systemctl status ollama"
+  exit 1
+fi
+
+# ─── Final guidance ──────────────────────────────────────────────────────────
+
+cat <<'EOF'
+
+────────────────────────────────────────────────────────────────────────────
+Vulkan Ollama install complete.
+
+Next steps:
+  1. Pull a model:
+       ollama pull qwen2.5:1.5b
+
+  2. Verify GPU is used (run inference, then check):
+       ollama run qwen2.5:1.5b "Hello"
+       ollama ps
+       # Expect PROCESSOR column to show "100% GPU"
+
+  3. Start the rest of ai-stack (Olla, LiteLLM, Smart Router, Shepherd):
+       docker compose up -d olla litellm router shepherd
+       # (Native Ollama on port 11434 will be discovered as the local backend.)
+
+If you switched from a Docker-based Ollama:
+  - Models that were inside the container may need re-pulling, unless you
+    pre-staged them under /usr/share/ollama/.ollama/models/.
+
+Troubleshooting + background: docs/hardware/intel-igpu-vulkan.md
+────────────────────────────────────────────────────────────────────────────
+EOF
From e8b1baf83c9de249339fbd17c4e59c6d583b77e0 Mon Sep 17 00:00:00 2001
From: "Luma (Enclave AI)"
Date: Tue, 19 May 2026 20:43:47 +0000
Subject: [PATCH 2/2] fix(arc): remove unused SCRIPT_DIR in install-vulkan-ollama.sh

shellcheck SC2034 flagged SCRIPT_DIR as unused (declared but never
referenced). Removing eliminates the CI warning and reduces the script
footprint by 1 line.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 scripts/install-vulkan-ollama.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/scripts/install-vulkan-ollama.sh b/scripts/install-vulkan-ollama.sh
index b1b4f35..2f41e3c 100755
--- a/scripts/install-vulkan-ollama.sh
+++ b/scripts/install-vulkan-ollama.sh
@@ -9,7 +9,6 @@
 
 set -euo pipefail
 
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 
 info() { echo -e "\033[1;34m[INFO]\033[0m $*"; }
 success() { echo -e "\033[1;32m[OK]\033[0m $*"; }
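
A possible follow-up, not part of either patch: the verify step in `scripts/install-vulkan-ollama.sh` only greps the journal for `library=Vulkan`. The sketch below shows one way to pull individual fields out of the `inference compute` line so the script could also report the detected device and memory. The function name `journal_field` is hypothetical, and the key=value layout is assumed from the sample line in `docs/hardware/intel-igpu-vulkan.md`; the real log format may differ between Ollama versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the patch): extract one key=value field from an
# Ollama "inference compute" journal line.
# Assumption: fields look like key=bare or key="quoted value", as in the sample
# line in docs/hardware/intel-igpu-vulkan.md, and the key is preceded by
# whitespace (a key at the very start of the line is not matched).
journal_field() {
  local key="$1" line="$2"
  # First expression handles key="quoted value"; second handles key=bare.
  # Once the first has substituted, the second no longer matches, so at most
  # one value is printed. Prints nothing if the key is absent.
  printf '%s\n' "$line" | sed -n \
    -e "s/.*[[:space:]]${key}=\"\([^\"]*\)\".*/\1/p" \
    -e "s/.*[[:space:]]${key}=\([^\"[:space:]][^[:space:]]*\).*/\1/p" | head -n 1
}
```

Called as `journal_field total "$line"` on the sample line from the docs, this prints `23.4 GiB`; `journal_field library "$line"` prints `Vulkan`. Keeping it a pure string function means it can be exercised without a GPU or a running Ollama.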