Skip to content

dmmdea/Quick-Agent-Deployer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Quick Agent Deployer v2.0.0

Deploy a functional local AI agent with maximum capabilities and minimum friction. Auto-profiles hardware, selects optimal models, installs infrastructure, generates agent scripts, and verifies everything works.

Time to working agent: ~15 minutes (Level 1 / SmolAgents) / ~25 minutes (Level 2 / CrewAI)

What's New in v2.0

v2 is a ground-up redesign based on a retrospective of 28 errors across 6 categories from v1 deployments. The most critical new category was multi-agent cross-contamination — agents clobbering each other's configs, ports, and shared infrastructure.

  • Phase 0 DISCOVER — scans the environment before touching anything. Detects existing agents, running services, port conflicts, and WSL2 issues. Never existed in v1.
  • Guest mode — deploying alongside an existing agent automatically activates strict isolation: own directory, own ports, additive-only shared configs, no mutation of existing infrastructure.
  • Crash-resumable state — every phase writes an append-only events array. Resume mid-deployment from where it left off.
  • Fixed model string normalization — v1 silently used wrong model string formats. v2 enforces: ollama_chat/ for SmolAgents, ollama/ for CrewAI, ChatOllama(model=...) for LangGraph.
  • Fixed CrewAI tool format — v1 generated plain functions that CrewAI silently ignored. v2 uses @tool decorated functions (from crewai.tools import tool) for CrewAI >= 1.14.
  • Cross-contamination audit in VERIFY — dedicated Phase 4 check: file boundary scan, existing agent health recheck, shared config integrity, port conflict recheck.
  • KV cache tiers — q8_0 default (~2x context), q4_0 standard with sanity check, tq3_0 experimental with mandatory 5-test quality gate and auto-revert.
  • Background polling — long installs run as background processes polled by Claude. No more script handoff to the user.
  • Persistent registries~/qad-agents/.registry.toml and ~/qad-agents/.ports.json track all deployed agents across sessions.

Architecture

Phase 0 DISCOVER  ->  Phase 1 DETECT  ->  Phase 2 INSTALL
                                                |
Phase 5 CONNECT  <-  Phase 4 VERIFY  <-  Phase 3 SCAFFOLD

Standalone ops (any time, non-destructive):
  Model Swap  |  Mode Swap (Primary <-> Guest)  |  KV Cache Swap
Phase What happens
DISCOVER Registry fast path, environment fingerprinting, existing agent detection, WSL2 pre-flight, mode decision
DETECT GPU/RAM probe, user profiling, model selection (Executor + Reasoner), KV cache recommendation
INSTALL Runtime setup, model pulls with background polling, llama-swap config, quality gates
SCAFFOLD agent-config.json, agent.py or crew.py + YAML, tool implementations, memory hub, launch/stop scripts
VERIFY Health checks, warm-up + benchmark, cross-contamination audit, verify report
CONNECT Extensions menu, orchestrator hookup (OpenAI-compatible + Zora SSE), OpenWebUI pipe, DEPLOYMENT.md

Deployment Modes

Mode When Isolation
Primary First agent on this machine Full ownership of shared services
Guest Existing agent(s) already present Own dir, own ports, read-only shared services, additive-only configs
Connected Joining an existing orchestrator Registered as a service, full port map

Stack

Layer Component Purpose
Inference (standard) Ollama Models that fit fully in memory
Inference (stretch) llama-server Precise tensor splitting; required for MoE models (--cpu-moe)
Model routing llama-swap On-demand Reasoner loading without keeping it hot
Interface OpenWebUI Browser chat UI, auto-configured pipe file
Agent L1 SmolAgents Single-agent, code-generation, fast iteration
Agent L2 CrewAI >= 1.14 Multi-agent with role-based delegation
Cloud fallback Gemini 2.5 Flash Optional overflow for reasoning-heavy tasks
Web search SearXNG Local, privacy-preserving search (Docker)

Model Catalog (v2)

VRAM Tier Executor Reasoner Level
<=4GB SmolLM2-1.7B, Qwen3-0.6B Gemma3-1B L1
6-8GB Qwen3-8B, Gemma3-4B Qwen3-8B L1
10-12GB Qwen3-8B (q4_K_M) Gemma3-12B, Qwen3-14B L1-L2
16-20GB Qwen3-14B Gemma4-27B, DeepSeek-R1-32B L2
24GB+ Qwen3-14B, Phi-4 Mini Gemma4-27B, Qwen3-32B L2
48GB+ / unified Gemma4-27B DeepSeek-R1-70B L2

MoE models (Qwen3-30B-A3B, DeepSeek-R1-671B) require --cpu-moe via llama-server.

Project Structure

Quick-Agent-Deployer/
+-- README.md
+-- CHANGELOG.md
+-- LICENSE
+-- docs/
|   +-- audits/
|       +-- v1.0-initial-audit.md
|       +-- v1.1-post-remediation.md
|       +-- v1.2-audit-prompt.md
+-- .claude/skills/
    +-- quick-agent-deployer/
        +-- SKILL.md                      Coordinator (~250 lines)
        +-- INSTALL.md                    Setup guide
        +-- references/
            +-- phase-discover.md         Phase 0: Environment scan
            +-- phase-detect.md           Phase 1: Hardware + model selection
            +-- phase-install.md          Phase 2: Runtime + quality gates
            +-- phase-scaffold.md         Phase 3: Agent code generation
            +-- phase-verify.md           Phase 4: Health + contamination audit
            +-- phase-connect.md          Phase 5: Extensions + orchestration
            +-- standalone-ops.md         Model swap, mode swap, KV cache swap
            +-- hardware-profiles.md      VRAM tiers -> model decision matrix
            +-- optimization-playbook.md  llama-server flags, KV cache, edge cases
            +-- agent-scaffolds.md        Complete code templates (all frameworks)
            +-- environment-signatures.md Agent fingerprints for DISCOVER

Usage

In Cowork / Claude Desktop

  1. Open a Cowork session with this project folder selected
  2. Say "Deploy a local agent" — DISCOVER runs first, then routes based on what's found
  3. Follow the phase-by-phase walkthrough
  4. Resume any time — state is on disk in JSON event logs

Trigger phrases

deploy agent - local AI - run models locally - set up Ollama - agent deployment - quick agent - deploy a local agent - local LLM setup - SmolAgents - CrewAI setup

Standalone operations (trigger any time)

  • "Swap my model" or "switch to a faster model" -> model swap procedure
  • "Switch to guest mode" -> mode swap
  • "Enable q4_0 cache" or "maximize context" -> KV cache swap

Key Design Decisions

Discover before deploying. v1 jumped straight to hardware detection. v2 checks for existing agents first — cross-contamination was the most common real-world failure.

Guest by default when agents exist. If any existing agent is detected, Guest mode activates automatically. The user can promote to Primary, but the safe default protects existing work.

Additive only in Guest mode. Shared configs (llama-swap, systemd, port maps) are never overwritten — only appended.

Model strings are per-framework. SmolAgents requires ollama_chat/, CrewAI requires ollama/. These look similar but produce silent failures when wrong.

Background polling. start_process + polling loop replaces "here's a script to run." Claude handles the wait.

License

MIT

About

[v1.2] Cowork skill that deploys local AI agents. 18+ models, multi-provider cloud overflow, potato-to-Grace Hopper hardware support. Ollama + llama-server + llamafile.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors