OCNGill/Gillsystems-AMD-Radeon-llama-cpp-Update-AI-Stack


Gillsystems AI Stack Updater

"One command. Both OSes. Always current."

Python 3.11+ Platform AMD GPU License: MIT Version

Gillsystems AI Stack Updater is a portable, invocation-only Python agent that keeps your AMD consumer GPU AI stack — ROCm/HIP and llama.cpp — current on both Windows and Linux with a single command. No manual headaches. Reboot-resilient. Fully automated.

Status note: Round 6 is complete — all 4 production launcher bugs fixed. The critical Main Rig crash (non-existent Jinja file via --chat-template-file) is resolved. HTPC's broken bash continuation, missing --repeat-penalty/--repeat-last-n on all nodes, and missing -b/-ub on HTPC are all corrected. 12/12 tests pass. See documentation/Gemma4_tuning_31_and_E4B.md for the centralized engineering reference.


Overview

Keeping ROCm/HIP and llama.cpp up to date on AMD consumer GPUs involves a deep dependency chain:

kernel drivers → amdgpu → ROCm runtime → HIP → rocBLAS → hipBLAS → llama.cpp (GGML_HIP)

Gillsystems AI Stack Updater automates the entire chain:

  1. Detects your currently installed versions against the upstream stable releases (GitHub Releases API + AMD repo probing).
  2. Downloads, compiles (if needed), and installs new versions automatically using amdgpu-install on Linux and the silent HIP SDK installer on Windows.
  3. Handles any required OS reboots — saves its checkpoint state to SQLite, registers a startup resume task, and picks up exactly where it left off.
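The comparison in step 1 can be illustrated with a minimal sketch. It assumes version strings and release tags carry their ordering in their numeric components (for example `6.2.4` or a build tag such as `b4567`); the helper names `parse_version` and `needs_update` are hypothetical and are not taken from `src/version_intel.py`.

```python
import re
from typing import Optional, Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Extract the numeric components of a version string or release tag."""
    return tuple(int(part) for part in re.findall(r"\d+", tag))


def needs_update(installed: Optional[str], latest: str) -> bool:
    """Return True when nothing is installed or the upstream release is newer."""
    if installed is None:
        return True
    # Tuples compare element by element, so 6.10.0 sorts after 6.9.0.
    return parse_version(installed) < parse_version(latest)


print(needs_update("6.9.0", "6.10.0"))
print(needs_update("b4567", "b4567"))
```

Comparing integer tuples rather than raw strings avoids the usual trap where `"6.10.0"` sorts before `"6.9.0"` lexicographically.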

Key Features

| Feature | Detail |
| --- | --- |
| Smart Version Detection | Checks rocm-smi, hipcc, GitHub Releases API, and AMD repo HEAD |
| Dual-OS Sub-Agents | Linux: amdgpu-install automation. Windows: silent HIP SDK install |
| GPU Architecture Auto-Detect | rocminfo, /sys/class/drm, lspci, wmi, hipInfo; no manual config needed |
| Reboot Resilience | SQLite checkpoint + systemd (Linux) / Scheduled Task (Windows) resume |
| llama.cpp Build + Install Layout | Clone, configure the correct backend automatically, install binaries into the canonical platform root, and mirror them into <llama_cpp_source>/bin for direct launcher use |
| Invocation-Only | Does nothing unless explicitly run: no daemons, no watchers |
| Dry-Run Mode | Full simulation with --dry-run, no system changes made |
| Rich Terminal UI | Panels, progress bars, version tables, reboot countdown via rich |
| Safe Elevation | sudo on Linux, UAC runas on Windows; auto-requested if not already root/admin |

Quick Start

Windows

update-ai-stack.bat

Runs as Administrator (UAC prompt appears if not already elevated). bootstrap.ps1 handles the entire first-run experience automatically: locates Python, installs all dependencies, then launches the agent. No manual setup needed.

To force a full clean rebuild (clears CMake cache and re-runs all steps):

update-ai-stack.bat --force

Linux

./update-ai-stack.sh

Live Linux runs request sudo once and keep the credential cached for the rest of the run. The Python venv and log handling stay in user space, so Kubuntu and Steam Deck do not accumulate root-owned runtime files. Dry-run mode does not prompt for sudo. Both launchers write timestamped run logs into logs/.

If a Konsole or SteamOS terminal profile reports "Could not find '.../update-ai-stack.sh', starting '/bin/bash' instead", the launcher path is stale or the execute bit was stripped. The safe profile command is:

/bin/bash "/absolute/path/to/update-ai-stack.sh"

Running the launcher once via bash ./update-ai-stack.sh --check-env also repairs the execute bit on the Linux launcher files when the repo checkout is writable.
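As a rough illustration of what repairing the execute bit involves on a POSIX system, the sketch below adds the execute permission to a file only when it is missing. The function name `ensure_executable` is hypothetical; the launcher's actual repair logic is not shown in this README and may differ.

```python
import os
import stat
import tempfile


def ensure_executable(path: str) -> bool:
    """Add the execute bit for user, group, and other; return True if changed."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    wanted = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted == mode:
        return False  # already executable, nothing to repair
    os.chmod(path, wanted)
    return True


# Demonstration on a throwaway file (POSIX only).
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "update-ai-stack.sh")
    with open(script, "w", encoding="utf-8") as handle:
        handle.write("#!/bin/bash\n")
    os.chmod(script, 0o644)               # simulate a stripped execute bit
    first = ensure_executable(script)     # repairs the bit
    second = ensure_executable(script)    # no-op on the second call
```

The repair can only succeed when the checkout is writable by the current user, which matches the condition stated above.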

Server Launchers

See the Server Launchers section below for the full details.

Editable per-node templates in executables/ (copy, edit the paths, run):

executables/Gillsystems_server_edit_per_node.bat
executables/Gillsystems_server_edit_per_node.sh

Production-ready node-specific launchers in executables/:

executables/Gillsystems_Main_AI_Server.bat             # Windows 11, RX 7900 XTX, Gemma 4 31B
executables/Gillsystems-HTPC-AI-server.sh              # Kubuntu, RX 7600, Gemma 4 E4B
executables/Gillsystems_Laptop_4500U_Vega6_server.bat  # Windows 10, Vega 6, Gemma 4 E4B
executables/Gillsystems_SteamDeck_AI_Server.sh         # SteamOS, RDNA 2 APU, Gemma 4 E4B

Dry Run (safe preview — no changes made)

# Linux
./update-ai-stack.sh --dry-run

# Windows
update-ai-stack.bat --dry-run

Check for Updates Only

python -m src.main --check-only

Server Launchers

Round 6 completes a full audit and fix of all 4 production launchers. See documentation/Gemma4_tuning_31_and_E4B.md for the authoritative per-node flag matrix and engineering history.

| Launcher | Node | OS | Model | Backend | Context | Default max output |
| --- | --- | --- | --- | --- | --- | --- |
| executables/Gillsystems_Main_AI_Server.bat | Gillsystems-Main | Windows 11 | Gemma 4 31B Q4_K_M | HIP/ROCm | 49,152 | 2,048 |
| executables/Gillsystems-HTPC-AI-server.sh | Gillsystems-HTPC | Kubuntu | Gemma 4 E4B Q6_K | ROCm/HIP | 32,768 | 1,536 |
| executables/Gillsystems_Laptop_4500U_Vega6_server.bat | Gillsystems-Laptop | Windows 10 | Gemma 4 E4B Q6_K | Vulkan or HIP UMA | 32,768 | 1,024 |
| executables/Gillsystems_SteamDeck_AI_Server.sh | Gillsystems-Steam-Deck | SteamOS | Gemma 4 E4B Q6_K | Vulkan | 32,768 | 1,024 |
| executables/Gillsystems_server_edit_per_node.bat / .sh | Any | Both | edit me | edit me | edit me | edit me |

All production launchers now:

  • Use --jinja with --chat-template gemma (Main Rig included as of Round 6 — no external Jinja file)
  • Use the Google-native decode profile: --temperature 1.0 --top-k 64 --top-p 0.95 --min-p 0.05
  • Pass --repeat-penalty 1.15 --repeat-last-n 128 (anti-loop mechanism, active as of Round 6)
  • Keep -b 2048, -ub 512, --context-shift, --metrics, and --no-mmap
  • Write launch logs into the repo-root logs/ directory
  • Cap default generation length with -n so missing client-side max_tokens can no longer run unbounded
  • Do not rely on -r/--reverse-prompt for API stop behavior
  • Support node-specific model path overrides instead of a single hardcoded model root

API stop behavior: For OpenAI-compatible chat clients, send an explicit stop array such as [ "<|im_end|>", "<|im_start|>" ] when you need hard stop-word behavior. llama-server documents stop arrays for API completions; reverse prompts are for interactive mode.
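A client-side request with an explicit stop array might look like the sketch below. It assumes llama-server is listening on `http://127.0.0.1:8080` and exposes the OpenAI-compatible `/v1/chat/completions` route; the host, port, prompt, and the helper name `build_chat_request` are illustrative assumptions, so adjust them to match your launcher.

```python
import json
import urllib.request


def build_chat_request(prompt, stop, max_tokens, base_url="http://127.0.0.1:8080"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "stop": list(stop),        # hard stop words enforced by the server
        "max_tokens": max_tokens,  # client-side cap on generated tokens
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "Summarize ROCm in one sentence.",
    stop=["<|im_end|>", "<|im_start|>"],
    max_tokens=256,
)
# With a server running, send it with: urllib.request.urlopen(request)
```

Setting `max_tokens` explicitly is still worthwhile even though the launchers cap output with -n, because the client then controls the limit per request.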

Main notes: The main launcher keeps the dense 31B model, which makes it the highest-quality node. It prefers the canonical C:\Models\Working_Models\ root, accepts GILLSYSTEMS_MAIN_MODEL_PATH as an override, and resolves portable rocBLAS support files when present. As of Round 6, it uses --chat-template gemma (the GGUF-embedded template); the --chat-template-file Jinja file that caused the crash on startup has been removed.

HTPC notes: The HTPC launcher resolves its executable, shared-library directory, and optional ROCBLAS_TENSILE_LIBPATH as a coherent runtime set.

Steam Deck notes: The Steam Deck launcher still prefers the Vulkan build-tree library pairing under /home/deck/src/llama.cpp/build-vulkan/bin, but now captures logs and enforces an output cap.


Requirements

| Requirement | Minimum Version |
| --- | --- |
| Python | 3.11+ |
| pip | 23+ |
| AMD GPU | Radeon RX 5000 / 6000 / 7000 series (GCN4+ / RDNA2+ / RDNA3) |
| OS | Ubuntu 22.04+, Fedora 39+, Windows 10 22H2+, Windows 11 |
| Disk Space | ~8 GB free (ROCm ~4 GB + llama.cpp build ~2 GB) |
| Internet | Required for version checks and downloads |

Linux extras: live Linux runs now auto-install missing cmake, git, compiler/build packages, and Tier 2 Vulkan development packages during the llama.cpp step on Debian/Ubuntu, Fedora, and SteamOS/Arch. If you pre-provision machines yourself, the expected packages are still cmake, ninja-build, git, gcc, g++, plus the Vulkan development packages required by your distro.

Windows extras: Visual Studio Build Tools 2022 with C++ workload, CMake. Tier 2 Windows Vulkan fallback also requires the LunarG Vulkan SDK so SPIRV-Headers can be discovered during llama.cpp configure.


CLI Reference

usage: gillsystems-ai-stack-updater [-h] [--dry-run] [--yes] [--force] [--skip-rocm]
            [--skip-llama] [--config CONFIG] [--verbose] [--bleeding-edge]
            [--resume] [--version]

| Flag | Description |
| --- | --- |
| --dry-run | Simulate the entire run: no installs, no builds, no reboots |
| --yes / -y | Auto-confirm all prompts (non-interactive/CI mode) |
| --force | Re-run all steps even if already up to date; also clears the CMake cache directory before building to prevent stale HIP SDK links |
| --skip-rocm | Skip ROCm/HIP update step |
| --skip-llama | Skip llama.cpp build step |
| --bleeding-edge | Compile llama.cpp from master branch instead of latest stable tag (zero-day model support, e.g. Gemma 4) |
| --config PATH | Path to a custom config YAML (default: config/default_config.yaml) |
| --resume | Resume after a reboot; called automatically by the startup task |
| --verbose | Enable verbose/debug logging |
| --version | Print Gillsystems AI Stack Updater version and exit |
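For readers who want to see how a flag set like this maps onto a parser, here is a minimal `argparse` sketch that mirrors the usage line above. It is an illustration only: the real entry point in `src/main.py` may be structured differently, and the version string is a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags listed in the CLI reference."""
    parser = argparse.ArgumentParser(prog="gillsystems-ai-stack-updater")
    parser.add_argument("--dry-run", action="store_true", help="simulate the run")
    parser.add_argument("--yes", "-y", action="store_true", help="auto-confirm prompts")
    parser.add_argument("--force", action="store_true", help="re-run all steps")
    parser.add_argument("--skip-rocm", action="store_true", help="skip ROCm/HIP update")
    parser.add_argument("--skip-llama", action="store_true", help="skip llama.cpp build")
    parser.add_argument("--config", default="config/default_config.yaml",
                        help="path to a custom config YAML")
    parser.add_argument("--verbose", action="store_true", help="debug logging")
    parser.add_argument("--bleeding-edge", action="store_true",
                        help="build llama.cpp from master")
    parser.add_argument("--resume", action="store_true", help="resume after reboot")
    # Placeholder version string; the real value lives in the package metadata.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser


args = build_parser().parse_args(["--dry-run", "--skip-rocm", "-y"])
print(args.dry_run, args.skip_rocm, args.yes, args.config)
```

`argparse` converts dashes in long option names to underscores, so `--skip-rocm` is read back as `args.skip_rocm`.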

Configuration

Gillsystems AI Stack Updater reads config/default_config.yaml on startup. Every setting can also be overridden via environment variable.

Key Config Sections (config/default_config.yaml)

gpu:
  # auto_detect=true queries rocminfo/WMI/lspci at runtime and overrides this list
  targets: [gfx1100, gfx1102, gfx1033, gfx1030, gfx906]
  auto_detect: true

paths:
  llama_cpp_source:          "~/src/llama.cpp"        # git/cmake checkout; binaries are mirrored into <source>/bin
  llama_cpp_install_linux:   "/opt/gillsystems/llama.cpp"   # canonical Linux install root
  llama_cpp_install_windows: "C:\\Gillsystems\\llama.cpp" # canonical Windows install root
  state_dir: "state"
  log_dir:   "logs"

repo:
  # Linux and Windows both use the upstream ggml-org repository
  llama_cpp_repo:         "https://github.com/ggml-org/llama.cpp.git"
  # Windows uses the upstream ggml-org repository (AMD has no native Windows ROCm build docs)
  llama_cpp_repo_windows: "https://github.com/ggml-org/llama.cpp.git"
  bleeding_edge: false    # set true or use --bleeding-edge flag

behavior:
  auto_reboot: true
  reboot_countdown_seconds: 30
  rocm_usecases: [rocm, hiplibsdk]

Environment Variable Overrides

| Variable | Effect |
| --- | --- |
| GILLSYSTEMS_AI_STACK_UPDATER_DRY_RUN=1 | Enables dry-run mode |
| GILLSYSTEMS_AI_STACK_UPDATER_VERBOSE=1 | Enables verbose logging |
| GILLSYSTEMS_AI_STACK_UPDATER_LOG_LEVEL=DEBUG | Sets log level (DEBUG/INFO/WARNING/ERROR) |
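One way to picture how these variables could layer on top of the YAML settings is the sketch below. The setting keys (`dry_run`, `verbose`, `log_level`) and the function name `apply_env_overrides` are hypothetical, and accepting values other than `1` as truthy is an assumption of this example rather than documented behavior.

```python
import os

PREFIX = "GILLSYSTEMS_AI_STACK_UPDATER_"
TRUTHY = {"1", "true", "yes", "on"}          # assumption: only "1" is documented
LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def apply_env_overrides(settings, environ=None):
    """Return a copy of settings with environment overrides applied."""
    env = os.environ if environ is None else environ
    merged = dict(settings)
    if env.get(PREFIX + "DRY_RUN", "").lower() in TRUTHY:
        merged["dry_run"] = True
    if env.get(PREFIX + "VERBOSE", "").lower() in TRUTHY:
        merged["verbose"] = True
    level = env.get(PREFIX + "LOG_LEVEL", "").upper()
    if level in LOG_LEVELS:                   # ignore unknown levels
        merged["log_level"] = level
    return merged


defaults = {"dry_run": False, "verbose": False, "log_level": "INFO"}
fake_env = {PREFIX + "DRY_RUN": "1", PREFIX + "LOG_LEVEL": "debug"}
effective = apply_env_overrides(defaults, fake_env)
print(effective)
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.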

Architecture

State Machine

START
  │
  ▼
CHECK_VERSIONS ──(no updates needed)──► DONE
  │
  ▼
DETECT_GPU_TARGETS
  │
  ▼
UPDATE_ROCM/HIP ──(reboot required)──► REBOOT ──► [resume after boot]
  │                                                      │
  ▼                                                      │
BUILD_LLAMA_CPP ◄────────────────────────────────────────┘
  │
  ▼
VALIDATE
  │
  ▼
DONE
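The transitions in the diagram above can be written down as a small transition function. This is a sketch of the diagram only, not the orchestrator in `src/main.py`; the `State` enum, `next_state`, and `walk` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    CHECK_VERSIONS = "CHECK_VERSIONS"
    DETECT_GPU_TARGETS = "DETECT_GPU_TARGETS"
    UPDATE_ROCM_HIP = "UPDATE_ROCM/HIP"
    REBOOT = "REBOOT"
    BUILD_LLAMA_CPP = "BUILD_LLAMA_CPP"
    VALIDATE = "VALIDATE"
    DONE = "DONE"


def next_state(state, *, updates_needed=True, reboot_required=False):
    """Return the state that follows `state` according to the diagram."""
    if state is State.CHECK_VERSIONS:
        return State.DETECT_GPU_TARGETS if updates_needed else State.DONE
    if state is State.DETECT_GPU_TARGETS:
        return State.UPDATE_ROCM_HIP
    if state is State.UPDATE_ROCM_HIP:
        return State.REBOOT if reboot_required else State.BUILD_LLAMA_CPP
    if state is State.REBOOT:
        return State.BUILD_LLAMA_CPP      # resume after boot
    if state is State.BUILD_LLAMA_CPP:
        return State.VALIDATE
    if state is State.VALIDATE:
        return State.DONE
    raise ValueError("DONE is a terminal state")


def walk(updates_needed, reboot_required):
    """List the state names visited from CHECK_VERSIONS to DONE."""
    path = [State.CHECK_VERSIONS]
    while path[-1] is not State.DONE:
        path.append(next_state(path[-1], updates_needed=updates_needed,
                               reboot_required=reboot_required))
    return [state.name for state in path]


print(walk(updates_needed=True, reboot_required=True))
```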

Module Overview

src/
├── __init__.py          # Package, version
├── main.py              # Orchestrator / entry point / state machine
├── config.py            # Pydantic config models + YAML loader
├── state_manager.py     # SQLite checkpoint ledger (StateManager)
├── version_intel.py     # Version detection (VersionIntel, UpdateManifest)
├── gpu_detect.py        # GPU arch auto-detection (GPUDetector)
├── privilege.py         # UAC / sudo elevation
├── cli.py               # Rich terminal UI
├── linux/
│   ├── rocm_updater.py  # amdgpu-install automation
│   ├── llama_builder.py # CMake + HIP build (Linux)
│   └── reboot_handler.py# systemd one-shot resume service
└── windows/
    ├── hip_updater.py   # HIP SDK silent installer
    ├── llama_builder.py # CMake + VS Build Tools + Ninja (Windows)
    └── reboot_handler.py# Scheduled Task resume

Checkpoint Database Schema

CREATE TABLE runs (
    run_id    TEXT PRIMARY KEY,
    started   TEXT,
    finished  TEXT,
    status    TEXT   -- running | done | failed
);

CREATE TABLE steps (
    run_id     TEXT,
    step_name  TEXT,
    status     TEXT,  -- pending | running | done | failed | skipped
    started    TEXT,
    finished   TEXT,
    detail     TEXT,
    PRIMARY KEY (run_id, step_name)
);
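To show how this schema supports resuming, the sketch below creates both tables in an in-memory database, records a run, marks some steps as done, and asks for the first unfinished step. The step names are borrowed from the state machine diagram and the helper functions are hypothetical; the real `StateManager` API may look different.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE runs (
    run_id TEXT PRIMARY KEY, started TEXT, finished TEXT, status TEXT
);
CREATE TABLE steps (
    run_id TEXT, step_name TEXT, status TEXT,
    started TEXT, finished TEXT, detail TEXT,
    PRIMARY KEY (run_id, step_name)
);
"""


def now() -> str:
    return datetime.now(timezone.utc).isoformat()


def mark_done(conn, run_id, step_name):
    """Flag one step as finished."""
    conn.execute(
        "UPDATE steps SET status = 'done', finished = ? "
        "WHERE run_id = ? AND step_name = ?",
        (now(), run_id, step_name),
    )


def first_unfinished(conn, run_id):
    """Return the earliest step that is neither done nor skipped, or None."""
    row = conn.execute(
        "SELECT step_name FROM steps "
        "WHERE run_id = ? AND status NOT IN ('done', 'skipped') "
        "ORDER BY rowid LIMIT 1",          # rowid preserves insertion order
        (run_id,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
run_id = "demo-run"
conn.execute("INSERT INTO runs (run_id, started, status) VALUES (?, ?, 'running')",
             (run_id, now()))
step_names = ["CHECK_VERSIONS", "DETECT_GPU_TARGETS", "UPDATE_ROCM_HIP",
              "BUILD_LLAMA_CPP", "VALIDATE"]
conn.executemany(
    "INSERT INTO steps (run_id, step_name, status) VALUES (?, ?, 'pending')",
    [(run_id, name) for name in step_names],
)
for name in step_names[:3]:
    mark_done(conn, run_id, name)
resume_from = first_unfinished(conn, run_id)
print(resume_from)
```

Because the composite primary key is `(run_id, step_name)`, each step can appear only once per run, which keeps a resumed run from recording the same step twice.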

GPU Architecture Reference

| GPU Series | Architecture | AMDGPU_TARGETS |
| --- | --- | --- |
| RX 5500 / 5600 / 5700 | RDNA1 | gfx1010, gfx1011, gfx1012 |
| RX 6600 / 6700 / 6800 / 6900 | RDNA2 | gfx1030, gfx1031, gfx1032 |
| RX 7600 | RDNA3 | gfx1102 |
| RX 7700 XT / 7800 XT | RDNA3 | gfx1101 |
| RX 7900 GRE / 7900 XT / 7900 XTX | RDNA3 | gfx1100 |
| RX 9070 / 9070 XT | RDNA4 | gfx1200, gfx1201 |
| Steam Deck (Van Gogh APU) | RDNA2 APU | gfx1033 |
| Radeon VII / Vega 20 | Vega20 | gfx906 |
| Vega 6 / Vega 7 (Renoir / Cezanne iGPU) | GCN5 iGPU | gfx90c |
| RX 580 / 590 | Polaris | gfx803 |
Gillsystems AI Stack Updater auto-detects the correct targets using rocminfo, /sys/class/drm, lspci -nn, wmi, and hipInfo. Manual override is available via --gpu-targets or the config file.
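Once targets are known, they typically reach the build as CMake cache definitions. The sketch below turns a list of detected targets into such arguments, assuming the HIP backend is enabled with `GGML_HIP` and the targets are passed as a semicolon-separated `AMDGPU_TARGETS` list. The function name `hip_cmake_args` is hypothetical, and the exact definitions the updater passes are not shown in this README.

```python
def hip_cmake_args(targets):
    """Build CMake -D arguments for a HIP build from a list of gfx targets."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(target.strip() for target in targets if target.strip()))
    if not unique:
        raise ValueError("at least one gfx target is required")
    return ["-DGGML_HIP=ON", "-DAMDGPU_TARGETS=" + ";".join(unique)]


# Example: an RX 7900 XTX (gfx1100) and an RX 7600 (gfx1102) in one build.
cmake_args = hip_cmake_args(["gfx1100", "gfx1102", "gfx1100"])
print(cmake_args)
```

Building for several targets at once increases compile time, so limiting the list to the GPUs actually present is usually the better trade-off.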


Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run full test suite
pytest tests/ -v

# Run with coverage report
pytest tests/ --cov=src --cov-report=term-missing

# Run a specific test module
pytest tests/test_state_manager.py -v
pytest tests/test_version_intel.py -v
pytest tests/test_linux_rocm.py -v
pytest tests/test_windows_hip.py -v

Tests use pytest-mock and responses for isolation — no real network calls or system modifications are made during testing.


How Reboot Resume Works

When a ROCm driver update requires a reboot:

  1. Gillsystems AI Stack Updater writes a reboot handoff JSON file (~/.gillsystems-ai-stack-updater/reboot_handoff.json) recording the current run_id and the next step to execute.
  2. Gillsystems AI Stack Updater registers a resume task:
    • Linux: writes /etc/systemd/system/gillsystems-ai-stack-updater-resume.service (one-shot, self-disabling), runs systemctl enable.
    • Windows: creates GillsystemsAIStackUpdaterResumeTask via schtasks /create /sc ONLOGON /rl HIGHEST.
  3. Gillsystems AI Stack Updater initiates the OS reboot (shutdown /r or systemctl reboot).
  4. After the system boots, the resume task runs Gillsystems AI Stack Updater automatically.
  5. Gillsystems AI Stack Updater reads the handoff file, restores the run_id, and continues from the saved step.
  6. The resume task is unregistered immediately after successful pickup.
  7. The handoff file is deleted after the run completes.
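Steps 1, 5, and 7 revolve around a small JSON file, which can be sketched as follows. The field names `run_id` and `next_step` follow the description above, but the exact file layout is an assumption, and the helper functions are hypothetical. The demo writes to a temporary directory rather than the real handoff path.

```python
import json
import tempfile
from pathlib import Path


def write_handoff(path: Path, run_id: str, next_step: str) -> None:
    """Persist the resume point, replacing any previous file atomically."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"run_id": run_id, "next_step": next_step}),
                   encoding="utf-8")
    tmp.replace(path)  # rename so a half-written file is never picked up


def read_handoff(path: Path):
    """Return the saved resume point, or None when no reboot is pending."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def clear_handoff(path: Path) -> None:
    """Delete the handoff file once the run has completed."""
    path.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp_dir:
    handoff = Path(tmp_dir) / "reboot_handoff.json"
    write_handoff(handoff, "demo-run", "BUILD_LLAMA_CPP")   # before reboot
    restored = read_handoff(handoff)                        # after boot
    clear_handoff(handoff)                                  # run finished
    after_cleanup = read_handoff(handoff)
```

Writing to a temporary file and renaming it matters here because a reboot can interrupt the write; the resume task should see either the complete file or none at all.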

Supported Operating Systems

| OS | Version | ROCm Support | Notes |
| --- | --- | --- | --- |
| Ubuntu | 22.04 LTS | ✅ Full | Recommended for Linux |
| Ubuntu | 24.04 LTS | ✅ Full | |
| Fedora | 39+ | ✅ Full | Uses dnf backend |
| Debian | 12+ | ⚠️ Partial | May need manual repo setup |
| Windows 10 | 22H2+ | ✅ Full | HIP SDK 7.x |
| Windows 11 | Any | ✅ Full | HIP SDK 7.x |
| macOS | Any | ❌ No | AMD ROCm not supported on macOS |

Project Layout

Gillsystems-update-ai-engine-software/
├── src/                     # Python source modules
│   ├── linux/               # Linux-specific sub-agents
│   └── windows/             # Windows-specific sub-agents
├── tests/                   # Pytest test suite
│   └── mocks/               # Mock helpers for integration tests
├── executables/             # Node-specific and editable server launchers
├── config/
│   └── default_config.yaml  # Default configuration
├── conductor/               # 7D Conductor project tracking
│   └── tracks/T-001-agent-core/
├── documentation/           # Investigation notes and run analyses
├── Gillsystems_logo_stuff/  # Branding and donation assets
├── logs/                    # Runtime logs written by launchers/bootstrap
├── state/                   # Last-run and resume state artifacts
├── bootstrap-linux.sh       # Linux bootstrap / venv / sudo warm-up
├── bootstrap.ps1            # Windows bootstrap / dependency install
├── CHANGELOG.md             # Release history
├── Gillsystems-update-ai-engine-software.code-workspace
├── UserGuide.md             # Extended user-facing documentation
├── update-ai-stack.bat      # Windows launcher (UAC elevation)
├── update-ai-stack.sh       # Linux launcher (sudo elevation)
├── requirements.txt         # Runtime dependencies
├── pyproject.toml           # Project metadata + packaging
└── README.md                # This file

User Guide

For detailed information on the agent architecture, team composition, configuration, and internal workings, see UserGuide.md.


💖 Support / Donate

If you find this project helpful, you can support ongoing work — thank you!

Donate: PayPal, Venmo

About

Automate your local AMD AI stack with zero friction. An invocation-only agent providing reboot-resilient updates, hardware auto-detection, and optimized compilation of llama.cpp across Windows and Linux nodes—from the RX 7900 XTX to the Steam Deck.
