growlf · May 19, 2026
@@ -145,6 +145,14 @@ cd ai-stack
 
 The installer auto-detects your GPU (NVIDIA, Intel Arc, or CPU-only), creates `.env`, generates your API key, and starts the stack. Open `http://localhost:40117/` (Shepherd dashboard) when it completes.
 
+**Older Intel iGPU?** (Iris Pro / Iris / UHD / Gen 9 etc., pre-Arc.) `install.sh` falls back to CPU-only for these — ipex-llm doesn't support pre-11th-Gen iGPUs. Use the Vulkan path instead:
+
+```bash
+./scripts/install-vulkan-ollama.sh   # native Ollama + Vulkan/Mesa ANV
+```
+
+See [`docs/hardware/intel-igpu-vulkan.md`](docs/hardware/intel-igpu-vulkan.md) for the procedure and supported hardware list. The rest of ai-stack (Olla, LiteLLM, Router, Shepherd) still runs via `docker compose` — it just connects to the native Ollama on `localhost:11434`.
+
 To add cloud models (Claude, Gemini, OpenCode Zen) after install:
 ```bash
 echo 'ANTHROPIC_API_KEY=sk-ant-...' >> .env

@@ -0,0 +1,213 @@
+# Older Intel iGPU — Vulkan via Mesa ANV (native Ollama)
+
+For Intel iGPUs that **predate the Arc Alchemist generation** (Iris Pro 580 / Iris / UHD / Skylake-era Gen 9 etc.), the
+`docker-compose.arc.yml` overlay (ipex-llm + SYCL) does **not** work — the ipex-llm runtime requires 11th-Gen Core+
+class hardware. Use the **Vulkan path via Mesa ANV** with a **native (non-Docker) Ollama install**.
+
+This was validated 2026-05-19 on:
+
+- **nuk1** — Intel NUC6i7KYB ("Skull Canyon"), Iris Pro 580 (Gen 9 GT4e, 72 EUs, 128 MB eDRAM), 32 GB RAM
+- **lab1, lab2, lab3** — similar Intel NUC hardware class
+
+After this procedure: `ollama ps` shows `100% GPU`, default context jumps from 4096 → 32768, and the GPU is visible
+as `library=Vulkan name=Vulkan0 description="Intel(R) Iris(R) Pro Graphics 580" total="23.4 GiB"` (or similar).
+
+## Why this path (not arc.yml)
+
+| Hardware class | Path | Why |
+|---|---|---|
+| Intel Arc (Alchemist / Battlemage, 12th-Gen+) | `docker-compose.arc.yml` | ipex-llm/SYCL stack tested and supported on this hardware |
+| Intel Iris Xe (Gen 12, 11th-Gen Core+) | `docker-compose.arc.yml` *or* this path | Either stack works; Vulkan is more broadly supported, while ipex-llm is better optimized when it works |
+| Intel Iris Pro / Iris / UHD / Gen 9 or older | **this path** (Vulkan) | ipex-llm/SYCL doesn't support pre-11th-Gen iGPUs; Vulkan via Mesa ANV does |
+| NVIDIA | `docker-compose.nvidia.yml` | n/a |
+| CPU-only | base `docker-compose.yml` | fallback |
+
+The rest of ai-stack (Olla, LiteLLM, Smart Router, Shepherd) runs in Docker as usual; only the Ollama service runs natively,
+listening on `localhost:11434`, where the other services connect.
+
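As a rough illustration of the table above, the sketch below maps an `lspci` description string to a suggested path. It is not part of ai-stack: the function name `recommend_path` and the substring patterns are assumptions chosen for illustration, and real `lspci` names vary, so treat the result as a hint rather than a verdict.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ai-stack): suggest a path from a GPU
# description string, following the hardware-class table above.
# The substring patterns are illustrative assumptions, not an official list.
recommend_path() {
    local desc
    # Lower-case the description so matching is case-insensitive.
    desc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$desc" in
        *nvidia*)      echo "docker-compose.nvidia.yml" ;;
        *arc*)         echo "docker-compose.arc.yml" ;;
        *"iris xe"*)   echo "docker-compose.arc.yml or vulkan" ;;
        *intel*iris*|*intel*uhd*|*intel*"hd graphics"*)
                       echo "vulkan" ;;
        *)             echo "docker-compose.yml" ;;   # CPU-only fallback
    esac
}
```

Order matters in the `case`: the Iris Xe pattern has to come before the generic Iris pattern, otherwise Iris Xe parts would be sent down the Vulkan-only branch.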
+## Prerequisites
+
+- Ubuntu 24.04 LTS (Noble) or compatible
+- `i915` kernel driver active (`lspci -k -s 00:02.0 | grep "Kernel driver in use: i915"`)
+- `/dev/dri/renderD128` present (`ls /dev/dri/`)
+- Your login user in the `render` and `video` groups (see step 3). The Ollama installer creates the `ollama` service user and adds it to both groups automatically.
+
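The prerequisite checks above can be bundled into a small preflight helper. This is a minimal sketch, not something ai-stack ships: the function names `has_groups` and `preflight` are hypothetical, and it assumes the iGPU sits at PCI address `00:02.0` and the render node is `renderD128`, as in the list above.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper; names are illustrative, not part of ai-stack.

# has_groups "<space-separated group list>" <required>...
# Succeeds only if every required group appears as a whole word in the list.
has_groups() {
    local have=" $1 " g
    shift
    for g in "$@"; do
        [[ "$have" == *" $g "* ]] || return 1
    done
}

# preflight: run the driver, device-node, and group checks on the live system.
preflight() {
    lspci -k -s 00:02.0 | grep -q "Kernel driver in use: i915" \
        || { echo "i915 driver not in use" >&2; return 1; }
    [[ -e /dev/dri/renderD128 ]] \
        || { echo "/dev/dri/renderD128 missing" >&2; return 1; }
    has_groups "$(id -nG)" render video \
        || { echo "user not in render and video groups" >&2; return 1; }
}
```

Running `preflight && echo ready` before the procedure gives a single pass/fail signal; the group check reflects the current login session, so it fails until you have logged out and back in after step 3.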
+## Procedure
+
+The procedure is automated by `scripts/install-vulkan-ollama.sh` — see that script for the full sequence. Manual steps below
+are for operators who want to understand or adapt each step.
+
+### 1. Verify GPU + driver
+
+```bash
+lspci -nn | grep -i vga
+# Expected: Intel ... Iris ... [8086:...]
+
+sudo lspci -v -s 00:02.0 | grep "Kernel driver"
+# Expected: Kernel driver in use: i915
+
+ls /dev/dri/
+# Expected: card0 or card1, renderD128
+```
+
+### 2. Stop any conflicting Ollama (Docker or older native)
+
+If you previously ran Ollama via `docker-compose.arc.yml` or any other Docker container, it holds port 11434 and
+must be stopped:
+
+```bash
+# If running via ai-stack:
+sudo systemctl stop ai-stack.service 2>/dev/null || true
+
+# Or stop the Docker Ollama container directly:
+docker ps -q --filter "publish=11434" | xargs -r docker stop
+```
+
+### 3. Add yourself to render/video groups
+
+```bash
+sudo usermod -aG render,video "$USER"
+# Log out and back in so both groups apply, OR open a subshell with `render` active now:
+newgrp render
+```
+
+### 4. Install Ollama (native, via official installer)
+
+```bash
+# Optional: clean previous broken install
+sudo rm -f /usr/local/bin/ollama
+sudo systemctl disable ollama 2>/dev/null || true
+sudo rm -f /etc/systemd/system/ollama.service
+
+# Fresh install
+curl -fsSL https://ollama.com/install.sh | sh
+```
+
+The installer creates the `ollama` user, adds it to `render` + `video` groups, and starts the systemd service.
+
+### 5. Add the Vulkan systemd drop-in
+
+By default the installer warns "No NVIDIA/AMD GPU detected" and falls back to CPU-only. Override with a drop-in:
+
+```bash
+sudo mkdir -p /etc/systemd/system/ollama.service.d
+sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
+[Service]
+Environment="OLLAMA_VULKAN=1"
+Environment="RUSTICL_ENABLE=iris"
+Environment="GPU_MAX_ALLOC_PERCENT=100"
+Environment="OLLAMA_GPU_OVERHEAD=0"
+EOF
+
+sudo systemctl daemon-reload
+sudo systemctl restart ollama
+```
+
+### 6. Fix model directory permissions
+
+If models were previously downloaded by a different user (e.g. a Docker-based setup with a `root`-owned bind-mount),
+fix ownership so the `ollama` user can read them:
+
+```bash
+sudo chown -R ollama:ollama /usr/share/ollama/.ollama/
+```
+
+### 7. Verify GPU is engaged
+
+```bash
+sudo journalctl -u ollama --no-pager -n 50 | grep -i "inference compute"
+```
+
+You want to see a line like:
+
+```
+inference compute id=... library=Vulkan name=Vulkan0
+description="Intel(R) Iris(R) Pro Graphics 580 (SKL GT4)"
+type=iGPU total="23.4 GiB" available="13.8 GiB"
+```
+
+And the default context should have jumped to 32768 (vs 4096 in CPU mode):
+
+```
+vram-based default context  total_vram="23.4 GiB"  default_num_ctx=32768
+```
+
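To script this check, the two fields that matter can be pulled out of the journal text. The sketch below is illustrative only: `inference_summary` is a hypothetical name, and it assumes the journal prints the `inference compute` fields on a single line (the sample above looks wrapped for readability).

```bash
#!/usr/bin/env bash
# Hypothetical helper: read journal text on stdin and print "<library> <total>"
# taken from the first "inference compute" line. Fails if no such line exists.
inference_summary() {
    local line lib total
    line=$(grep -m1 "inference compute") || return 1
    # Extract the value after library= (up to the next space).
    lib=$(sed -n 's/.*library=\([^ ]*\).*/\1/p' <<<"$line")
    # Extract the quoted value after total=.
    total=$(sed -n 's/.*total="\([^"]*\)".*/\1/p' <<<"$line")
    echo "$lib $total"
}
```

A possible use is `sudo journalctl -u ollama --no-pager -n 50 | inference_summary`; a first word other than `Vulkan`, or a non-zero exit status, suggests the drop-in from step 5 has not taken effect.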
+### 8. Test inference
+
+```bash
+ollama pull qwen2.5:1.5b
+ollama run qwen2.5:1.5b "Hello"
+ollama ps
+```
+
+Expected output of `ollama ps`:
+
+```
+NAME            ID    SIZE    PROCESSOR    CONTEXT
+qwen2.5:1.5b    ...   2.8 GB  100% GPU     32768
+```
+
+`100% GPU` is the success indicator. Anything less means the model partially fell back to CPU — usually because
+the model is larger than available VRAM.
+
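The same success criterion can be checked mechanically. This is a small sketch under the assumption that `ollama ps` prints one header row followed by one row per loaded model, as in the expected output above; `all_on_gpu` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ollama ps` output on stdin. Succeeds only if at
# least one model is listed and every listed model shows "100% GPU".
all_on_gpu() {
    awk 'NR > 1 && NF { n++; if ($0 !~ /100% GPU/) bad++ }
         END { exit !(n > 0 && bad == 0) }'
}
```

For example, `ollama ps | all_on_gpu && echo "fully on GPU"` stays silent when no model is loaded or when any model has partially fallen back to CPU.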
+## ai-stack integration
+
+The Olla / LiteLLM / Smart Router / Shepherd services in `docker-compose.yml` connect to Ollama at `localhost:11434` —
+whether Ollama runs in Docker or natively, they don't care. After this procedure:
+
+```bash
+# Start the rest of ai-stack (Ollama already running natively):
+docker compose up -d olla litellm router shepherd
+```
+
+Or use the same `start.sh` ai-stack provides — set `GPU_TYPE=vulkan` (or `cpu`, since the compose file doesn't need
+to know about the native Vulkan Ollama) in `.env`.
+
+## Troubleshooting
+
+### `journalctl -u ollama` shows CPU-only / "No GPU detected"
+
+The drop-in didn't take effect. Verify:
+
+```bash
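+sudo systemctl show ollama -p Environment | tr ' ' '\n' | grep -E 'OLLAMA_|RUSTICL_|GPU_MAX_'
+# Expected lines: OLLAMA_VULKAN=1, RUSTICL_ENABLE=iris, GPU_MAX_ALLOC_PERCENT=100, OLLAMA_GPU_OVERHEAD=0
+# (the first variable on the line carries an extra "Environment=" prefix)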
+```
+
+If missing, the drop-in file isn't being read. Confirm `/etc/systemd/system/ollama.service.d/override.conf` exists and
+re-run `sudo systemctl daemon-reload && sudo systemctl restart ollama`.
+
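If you prefer a direct answer to "which variables are missing", the comparison can be scripted. The sketch below is an illustration, not part of ai-stack: `missing_env` is a hypothetical name, and the expected values are copied from the drop-in in step 5.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the text printed by
#   systemctl show ollama -p Environment
# print each expected variable that is absent. Succeeds only if none is missing.
missing_env() {
    # Strip the leading "Environment=" and pad with spaces for whole-word matching.
    local shown=" ${1#Environment=} " var missing=0
    for var in OLLAMA_VULKAN=1 RUSTICL_ENABLE=iris \
               GPU_MAX_ALLOC_PERCENT=100 OLLAMA_GPU_OVERHEAD=0; do
        if [[ "$shown" != *" $var "* ]]; then
            echo "$var"
            missing=1
        fi
    done
    return "$missing"
}
```

Call it as `missing_env "$(systemctl show ollama -p Environment)"`; no output and a zero exit status mean systemd has loaded every variable from the drop-in.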
+### `ollama ps` shows `100% CPU` even though Vulkan is engaged
+
+The model is probably larger than the available VRAM. Options:
+- Use a smaller quantization
+- Use a smaller model
+- Check `available` VRAM in the journal line: with about 14 GiB free, a model has to stay under roughly 14 GiB (weights plus context cache) to fit fully
+
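The size comparison can be done on the command line. This is a simplified sketch with hypothetical helper names (`available_gib`, `fits_in_vram`); it compares the model size alone against the reported `available` value and ignores the extra memory the context cache needs, so a "fits" result is optimistic.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a rough "does it fit" check.

# available_gib: read a journal line on stdin and print the number from
# available="N GiB".
available_gib() {
    sed -n 's/.*available="\([0-9.]*\) GiB".*/\1/p'
}

# fits_in_vram <model_gib> <available_gib>
# Succeeds if the model size is strictly below the available VRAM.
fits_in_vram() {
    awk -v m="$1" -v a="$2" 'BEGIN { exit !(m + 0 < a + 0) }'
}
```

With the values shown earlier (a 2.8 GB model and 13.8 GiB available), `fits_in_vram 2.8 13.8` succeeds, while `fits_in_vram 20 13.8` fails.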
+### `vulkaninfo` hangs
+
+Common on headless systems (no display). Either skip the check or run from a second SSH session.
+The `journalctl` line is the authoritative signal — if it says `library=Vulkan`, Vulkan is working.
+
+### Permission denied on `/dev/dri/renderD128`
+
+User isn't in the `render` group. Re-check step 3 and log out / back in (or `newgrp render` in the current shell).
+
+## Hardware-class boundary
+
+This path is for Intel iGPUs **without** ipex-llm/SYCL support — primarily Gen 9 (Skylake era) and older.
+For 11th-Gen Core+ / Iris Xe / Arc you can also use this path, but `docker-compose.arc.yml` may give better
+performance on supported hardware. Run benchmarks on both paths if you're not sure.
+
+The ipex-llm tested-and-supported hardware floor is documented by Intel as
+**iGPUs of 11th Gen Core and newer have been tested; older iGPU works but with poor performance**
+([source](https://github.com/intel/ipex-llm) — note: project archived 2026-01-28). For older iGPUs this Vulkan path
+is the supported community direction.
+
+## Background
+
+Empirically validated 2026-05-19 by NetYeti on Intel NUC6i7KYB (Iris Pro 580) using free-tier Claude.ai in under
+one hour, after the ai-stack pod spent ~10 hours on a parallel SYCL/ipex-llm investigation that concluded the
+hardware was "GPU-incapable" (an over-generalization — the hardware is fine; ipex-llm just doesn't support this
+generation). Lesson banked: single-stack failure ≠ hardware-class failure. See `docs/hardware/arc.md` for the
+Arc path; this file documents the Iris/older-iGPU path.
@@ -0,0 +1,163 @@
+#!/usr/bin/env bash
+# install-vulkan-ollama.sh — install native Ollama with Vulkan/Mesa ANV GPU support
+#
+# For Intel iGPUs that predate Arc Alchemist (Iris Pro / Iris / UHD / Gen 9 etc.)
+# where docker-compose.arc.yml (ipex-llm/SYCL) does NOT work.
+#
+# See docs/hardware/intel-igpu-vulkan.md for background and manual procedure.
+# Validated 2026-05-19 on Intel NUC6i7KYB (Iris Pro 580) + lab1/2/3 NUC hardware.
+
+set -euo pipefail
+
+
+info()    { echo -e "\033[1;34m[INFO]\033[0m  $*"; }
+success() { echo -e "\033[1;32m[OK]\033[0m    $*"; }
+warn()    { echo -e "\033[1;33m[WARN]\033[0m  $*"; }
+error()   { echo -e "\033[1;31m[ERROR]\033[0m $*" >&2; }
+
+# ─── Prerequisites check ─────────────────────────────────────────────────────
+
+info "Checking prerequisites..."
+
+if ! command -v lspci &>/dev/null; then
+    error "lspci not found. Install pciutils first: sudo apt install pciutils"
+    exit 1
+fi
+
+if ! ls /dev/dri/renderD* &>/dev/null; then
+    error "No /dev/dri/renderD* device found. Is the i915 kernel driver loaded?"
+    error "Run: sudo lspci -v -s 00:02.0 | grep 'Kernel driver in use'"
+    exit 1
+fi
+
+# Detect Intel iGPU
+INTEL_GPU=$(lspci 2>/dev/null | grep -i "vga\|3d\|display" | grep -i "intel" | head -1 || true)
+if [[ -z "$INTEL_GPU" ]]; then
+    error "No Intel GPU detected via lspci. This script targets Intel iGPUs."
+    exit 1
+fi
+
+info "Detected: $INTEL_GPU"
+
+# Warn if this looks like an Arc GPU — arc.yml is likely the better path
+if echo "$INTEL_GPU" | grep -iq "arc"; then
+    warn "This looks like an Intel Arc GPU. docker-compose.arc.yml (ipex-llm/SYCL)"
+    warn "may give better performance on Arc hardware. Continue anyway? [y/N]"
+    # `|| reply=""` keeps `set -e` from aborting when stdin is not a terminal
+    read -r reply || reply=""
+    if [[ ! "$reply" =~ ^[Yy]$ ]]; then
+        info "Aborted. See docs/hardware/arc.md for the Arc path."
+        exit 0
+    fi
+fi
+
+# ─── Stop conflicting Ollama (Docker or older native) ────────────────────────
+
+info "Stopping any conflicting Ollama instance on port 11434..."
+
+if command -v docker &>/dev/null; then
+    CONTAINERS=$(docker ps -q --filter "publish=11434" 2>/dev/null || true)
+    if [[ -n "$CONTAINERS" ]]; then
+        info "Stopping Docker container(s) holding port 11434..."
+        echo "$CONTAINERS" | xargs -r docker stop
+    fi
+fi
+
+# Stop ai-stack systemd service if present (so it doesn't restart the container)
+if systemctl cat ai-stack.service &>/dev/null; then
+    info "Stopping ai-stack.service so it doesn't restart the Docker Ollama..."
+    sudo systemctl stop ai-stack.service || true
+fi
+
+# ─── Group memberships ───────────────────────────────────────────────────────
+
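+# Under sudo, $USER is root; SUDO_USER holds the invoking user.
+TARGET_USER="${SUDO_USER:-$USER}"
+info "Adding $TARGET_USER to render and video groups (required for /dev/dri access)..."
+sudo usermod -aG render,video "$TARGET_USER"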
+
+# ─── Install Ollama ──────────────────────────────────────────────────────────
+
+if [[ -x /usr/local/bin/ollama ]]; then
+    info "Existing Ollama install detected at /usr/local/bin/ollama."
+    info "Will reuse the existing binary; only the systemd drop-in will change."
+else
+    info "Installing Ollama via official installer..."
+    curl -fsSL https://ollama.com/install.sh | sh
+fi
+
+# ─── Vulkan systemd drop-in ──────────────────────────────────────────────────
+
+info "Writing Vulkan systemd drop-in at /etc/systemd/system/ollama.service.d/override.conf..."
+
+sudo mkdir -p /etc/systemd/system/ollama.service.d
+
+sudo tee /etc/systemd/system/ollama.service.d/override.conf >/dev/null <<'EOF'
+[Service]
+Environment="OLLAMA_VULKAN=1"
+Environment="RUSTICL_ENABLE=iris"
+Environment="GPU_MAX_ALLOC_PERCENT=100"
+Environment="OLLAMA_GPU_OVERHEAD=0"
+EOF
+
+# ─── Fix model directory ownership ───────────────────────────────────────────
+
+if [[ -d /usr/share/ollama/.ollama ]]; then
+    info "Ensuring /usr/share/ollama/.ollama is owned by ollama user..."
+    sudo chown -R ollama:ollama /usr/share/ollama/.ollama/
+fi
+
+# ─── Reload + restart ────────────────────────────────────────────────────────
+
+info "Reloading systemd and restarting Ollama..."
+sudo systemctl daemon-reload
+sudo systemctl enable ollama
+sudo systemctl restart ollama
+
+# ─── Verify ──────────────────────────────────────────────────────────────────
+
+info "Waiting 5s for Ollama to come up..."
+sleep 5
+
+info "Checking journal for Vulkan inference-compute line..."
+if sudo journalctl -u ollama --no-pager -n 100 2>/dev/null | grep -q "library=Vulkan"; then
+    success "Vulkan GPU engaged. Journal confirms library=Vulkan."
+    sudo journalctl -u ollama --no-pager -n 100 | grep "library=Vulkan" | head -2
+else
+    warn "No 'library=Vulkan' line found in recent journal."
+    warn "This may mean the drop-in didn't take effect or the GPU isn't being used."
+    warn "Check: sudo journalctl -u ollama --no-pager -n 100"
+fi
+
+# Check API
+if curl -fsS http://127.0.0.1:11434/api/version &>/dev/null; then
+    success "Ollama API responding at http://127.0.0.1:11434"
+else
+    error "Ollama API not responding. Check: sudo systemctl status ollama"
+    exit 1
+fi
+
+# ─── Final guidance ──────────────────────────────────────────────────────────
+
+cat <<'EOF'
+
+────────────────────────────────────────────────────────────────────────────
+Vulkan Ollama install complete.
+
+Next steps:
+  1. Pull a model:
+     ollama pull qwen2.5:1.5b
+
+  2. Verify GPU is used (run inference, then check):
+     ollama run qwen2.5:1.5b "Hello"
+     ollama ps
+     # Expect PROCESSOR column to show "100% GPU"
+
+  3. Start the rest of ai-stack (Olla, LiteLLM, Smart Router, Shepherd):
+     docker compose up -d olla litellm router shepherd
+     # (Native Ollama on port 11434 will be discovered as the local backend.)
+
+If you switched from a Docker-based Ollama:
+  - Models that were inside the container may need re-pulling, unless you
+    pre-staged them under /usr/share/ollama/.ollama/models/.
+
+Troubleshooting + background: docs/hardware/intel-igpu-vulkan.md
+────────────────────────────────────────────────────────────────────────────
+EOF