Quick Agent Deployer v2.0.0

Deploy a functional local AI agent with maximum capabilities and minimum friction. Auto-profiles hardware, selects optimal models, installs infrastructure, generates agent scripts, and verifies everything works.

Time to working agent: ~15 minutes (Level 1, SmolAgents) or ~25 minutes (Level 2, CrewAI)

What's New in v2.0

v2 is a ground-up redesign based on a retrospective of 28 errors across 6 categories from v1 deployments. The most critical new category was multi-agent cross-contamination — agents clobbering each other's configs, ports, and shared infrastructure.

Phase 0 DISCOVER — scans the environment before touching anything. Detects existing agents, running services, port conflicts, and WSL2 issues. Never existed in v1.
Guest mode — deploying alongside an existing agent automatically activates strict isolation: own directory, own ports, additive-only shared configs, no mutation of existing infrastructure.
Crash-resumable state — every phase writes an append-only events array. Resume mid-deployment from where it left off.
Fixed model string normalization — v1 silently used wrong model string formats. v2 enforces: ollama_chat/ for SmolAgents, ollama/ for CrewAI, ChatOllama(model=...) for LangGraph.
Fixed CrewAI tool format — v1 generated plain functions that CrewAI silently ignored. v2 uses @tool decorated functions (from crewai.tools import tool) for CrewAI >= 1.14.
Cross-contamination audit in VERIFY — dedicated Phase 4 check: file boundary scan, existing agent health recheck, shared config integrity, port conflict recheck.
KV cache tiers — q8_0 default (~2x context), q4_0 standard with sanity check, tq3_0 experimental with mandatory 5-test quality gate and auto-revert.
Background polling — long installs run as background processes polled by Claude. No more script handoff to the user.
Persistent registries — ~/qad-agents/.registry.toml and ~/qad-agents/.ports.json track all deployed agents across sessions.
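
The "~2x context" figure for q8_0 follows from how many bytes each cache type spends per cached element. The sketch below works through that arithmetic using the llama.cpp block layouts for q8_0 and q4_0 (32 values plus a 2-byte scale per block). The function names are illustrative, and tq3_0 is left out because its storage layout is not described here.

```python
# Bytes per cached element for each KV cache type.
# q8_0 and q4_0 pack 32 values into one block with a 2-byte fp16 scale.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 one-byte values + 2-byte scale
    "q4_0": 18 / 32,  # 32 four-bit values (16 bytes) + 2-byte scale
}


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, cache_type="f16"):
    """Approximate KV cache size: one key and one value vector
    per layer, per KV head, per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * BYTES_PER_ELEMENT[cache_type]
    return int(n_ctx * per_token)


def context_gain(cache_type):
    """How many times more context fits in the same memory versus f16."""
    return BYTES_PER_ELEMENT["f16"] / BYTES_PER_ELEMENT[cache_type]
```

For a hypothetical model with 32 layers, 8 KV heads, and a head dimension of 128, `kv_cache_bytes(8192, 32, 8, 128)` comes to 1 GiB at f16. `context_gain("q8_0")` is about 1.88, which rounds to the "~2x" quoted above, and `context_gain("q4_0")` is about 3.56.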

Architecture

Phase 0 DISCOVER  ->  Phase 1 DETECT  ->  Phase 2 INSTALL
                                                |
Phase 5 CONNECT  <-  Phase 4 VERIFY  <-  Phase 3 SCAFFOLD

Standalone ops (any time, non-destructive):
  Model Swap  |  Mode Swap (Primary <-> Guest)  |  KV Cache Swap

Phase	What happens
DISCOVER	Registry fast path, environment fingerprinting, existing agent detection, WSL2 pre-flight, mode decision
DETECT	GPU/RAM probe, user profiling, model selection (Executor + Reasoner), KV cache recommendation
INSTALL	Runtime setup, model pulls with background polling, llama-swap config, quality gates
SCAFFOLD	agent-config.json, agent.py or crew.py + YAML, tool implementations, memory hub, launch/stop scripts
VERIFY	Health checks, warm-up + benchmark, cross-contamination audit, verify report
CONNECT	Extensions menu, orchestrator hookup (OpenAI-compatible + Zora SSE), OpenWebUI pipe, DEPLOYMENT.md

Deployment Modes

Mode	When	Isolation
Primary	First agent on this machine	Full ownership of shared services
Guest	Existing agent(s) already present	Own dir, own ports, read-only shared services, additive-only configs
Connected	Joining an existing orchestrator	Registered as a service, full port map

Stack

Layer	Component	Purpose
Inference (standard)	Ollama	Models that fit fully in memory
Inference (stretch)	llama-server	Precise tensor splitting; required for MoE models (`--cpu-moe`)
Model routing	llama-swap	On-demand Reasoner loading without keeping it hot
Interface	OpenWebUI	Browser chat UI, auto-configured pipe file
Agent L1	SmolAgents	Single-agent, code-generation, fast iteration
Agent L2	CrewAI >= 1.14	Multi-agent with role-based delegation
Cloud fallback	Gemini 2.5 Flash	Optional overflow for reasoning-heavy tasks
Web search	SearXNG	Local, privacy-preserving search (Docker)

Model Catalog (v2)

VRAM Tier	Executor	Reasoner	Level
<=4GB	SmolLM2-1.7B, Qwen3-0.6B	Gemma3-1B	L1
6-8GB	Qwen3-8B, Gemma3-4B	Qwen3-8B	L1
10-12GB	Qwen3-8B (q4_K_M)	Gemma3-12B, Qwen3-14B	L1-L2
16-20GB	Qwen3-14B	Gemma4-27B, DeepSeek-R1-32B	L2
24GB+	Qwen3-14B, Phi-4 Mini	Gemma4-27B, Qwen3-32B	L2
48GB+ / unified	Gemma4-27B	DeepSeek-R1-70B	L2

MoE models (Qwen3-30B-A3B, DeepSeek-R1-671B) require --cpu-moe via llama-server.
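
The catalog can be read as a lookup table keyed on available VRAM. The sketch below copies the rows above and picks the highest tier whose lower bound fits. The tiers leave gaps (5 GB or 9 GB, for example), so the sketch assumes rounding down to the nearest tier; that rule and the `min_vram_gb` values are assumptions, not documented behavior.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    min_vram_gb: float  # assumed lower bound for the tier
    executors: tuple
    reasoners: tuple
    level: str


# Rows copied from the catalog table above.
CATALOG = [
    Tier(0, ("SmolLM2-1.7B", "Qwen3-0.6B"), ("Gemma3-1B",), "L1"),
    Tier(6, ("Qwen3-8B", "Gemma3-4B"), ("Qwen3-8B",), "L1"),
    Tier(10, ("Qwen3-8B (q4_K_M)",), ("Gemma3-12B", "Qwen3-14B"), "L1-L2"),
    Tier(16, ("Qwen3-14B",), ("Gemma4-27B", "DeepSeek-R1-32B"), "L2"),
    Tier(24, ("Qwen3-14B", "Phi-4 Mini"), ("Gemma4-27B", "Qwen3-32B"), "L2"),
    Tier(48, ("Gemma4-27B",), ("DeepSeek-R1-70B",), "L2"),
]


def select_tier(vram_gb: float) -> Tier:
    """Return the highest tier whose lower bound is at most `vram_gb`."""
    chosen = CATALOG[0]
    for tier in CATALOG:
        if vram_gb >= tier.min_vram_gb:
            chosen = tier
    return chosen
```

With this rule, `select_tier(8)` lands in the 6-8GB row and `select_tier(5)` falls back to the <=4GB row rather than risking a model that does not fit.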

Project Structure

Quick-Agent-Deployer/
+-- README.md
+-- CHANGELOG.md
+-- LICENSE
+-- docs/
|   +-- audits/
|       +-- v1.0-initial-audit.md
|       +-- v1.1-post-remediation.md
|       +-- v1.2-audit-prompt.md
+-- .claude/skills/
    +-- quick-agent-deployer/
        +-- SKILL.md                      Coordinator (~250 lines)
        +-- INSTALL.md                    Setup guide
        +-- references/
            +-- phase-discover.md         Phase 0: Environment scan
            +-- phase-detect.md           Phase 1: Hardware + model selection
            +-- phase-install.md          Phase 2: Runtime + quality gates
            +-- phase-scaffold.md         Phase 3: Agent code generation
            +-- phase-verify.md           Phase 4: Health + contamination audit
            +-- phase-connect.md          Phase 5: Extensions + orchestration
            +-- standalone-ops.md         Model swap, mode swap, KV cache swap
            +-- hardware-profiles.md      VRAM tiers -> model decision matrix
            +-- optimization-playbook.md  llama-server flags, KV cache, edge cases
            +-- agent-scaffolds.md        Complete code templates (all frameworks)
            +-- environment-signatures.md Agent fingerprints for DISCOVER

Usage

In Cowork / Claude Desktop

1. Open a Cowork session with this project folder selected
2. Say "Deploy a local agent". DISCOVER runs first, then routes based on what's found
3. Follow the phase-by-phase walkthrough
4. Resume any time: state is on disk in JSON event logs
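
To make the resume behavior concrete, here is a minimal sketch of an append-only event log and the lookup that decides where to pick up. The event shape (`{"phase": ..., "status": ...}`), the status values, and both function names are hypothetical; only the phase names come from this README.

```python
import json
from pathlib import Path

PHASES = ["DISCOVER", "DETECT", "INSTALL", "SCAFFOLD", "VERIFY", "CONNECT"]


def append_event(state_path: Path, phase: str, status: str) -> None:
    """Add one event to the end of the log; earlier entries stay unchanged."""
    events = json.loads(state_path.read_text()) if state_path.exists() else []
    events.append({"phase": phase, "status": status})
    state_path.write_text(json.dumps(events, indent=2))


def next_phase(state_path: Path):
    """Return the first phase with no "done" event, or None if all are done."""
    events = json.loads(state_path.read_text()) if state_path.exists() else []
    done = {e["phase"] for e in events if e["status"] == "done"}
    for phase in PHASES:
        if phase not in done:
            return phase
    return None
```

If a session crashes after INSTALL logged "started" but never logged "done", `next_phase` returns "INSTALL", so the deployment re-enters that phase instead of starting over.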

Trigger phrases

deploy agent - local AI - run models locally - set up Ollama - agent deployment - quick agent - deploy a local agent - local LLM setup - SmolAgents - CrewAI setup

Standalone operations (trigger any time)

"Swap my model" or "switch to a faster model" -> model swap procedure
"Switch to guest mode" -> mode swap
"Enable q4_0 cache" or "maximize context" -> KV cache swap

Key Design Decisions

Discover before deploying. v1 jumped straight to hardware detection. v2 checks for existing agents first — cross-contamination was the most common real-world failure.

Guest by default when agents exist. If any existing agent is detected, Guest mode activates automatically. The user can promote to Primary, but the safe default protects existing work.
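
The mode decision described here can be sketched as a small function. The parameter names are hypothetical, and giving Connected precedence over the other two modes is an assumption; the README states only when each mode applies.

```python
def decide_mode(existing_agents, joining_orchestrator=False, promote_to_primary=False):
    """Pick a deployment mode from what DISCOVER found.

    existing_agents: names of agents already detected on this machine.
    """
    if joining_orchestrator:
        return "Connected"  # assumed to take precedence
    if existing_agents and not promote_to_primary:
        return "Guest"  # safe default when anything is already deployed
    return "Primary"
```

For example, `decide_mode(["agent-a"])` returns "Guest", while the same call with `promote_to_primary=True` returns "Primary".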

Additive only in Guest mode. Shared configs (llama-swap, systemd, port maps) are never overwritten — only appended.
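
One way to enforce the additive-only rule is a merge that refuses to touch existing keys. The sketch below treats a shared config as a plain dictionary keyed by entry name, which is an assumption; real llama-swap or systemd configs have their own formats, and `merge_additive` is a hypothetical helper.

```python
import copy


def merge_additive(shared: dict, additions: dict) -> dict:
    """Return a copy of `shared` with `additions` appended.

    Raises KeyError if any addition would overwrite an existing entry.
    """
    conflicts = sorted(set(shared) & set(additions))
    if conflicts:
        raise KeyError(f"refusing to overwrite existing entries: {conflicts}")
    merged = copy.deepcopy(shared)
    merged.update(copy.deepcopy(additions))
    return merged
```

Failing loudly on a conflict, rather than skipping the entry silently, gives the deployer a chance to pick a different name or port before anything is written.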

Model strings are per-framework. SmolAgents requires ollama_chat/, CrewAI requires ollama/. These look similar but produce silent failures when wrong.
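
A small normalizer illustrates the per-framework rule. The prefixes come from this README; the function name, the `qwen3:8b` tag in the usage note, and the choice to pass LangGraph a bare model name (for use as `ChatOllama(model=...)`) are assumptions for the example.

```python
# Prefix each framework expects in front of the Ollama model tag.
MODEL_PREFIX = {
    "smolagents": "ollama_chat/",
    "crewai": "ollama/",
    "langgraph": "",  # bare name, passed as ChatOllama(model=...)
}
KNOWN_PREFIXES = ("ollama_chat/", "ollama/")


def normalize_model_string(framework: str, model: str) -> str:
    """Strip any known prefix, then apply the one `framework` requires."""
    key = framework.lower()
    if key not in MODEL_PREFIX:
        raise ValueError(f"unknown framework: {framework!r}")
    for prefix in KNOWN_PREFIXES:
        if model.startswith(prefix):
            model = model[len(prefix):]
            break
    return MODEL_PREFIX[key] + model
```

For instance, `normalize_model_string("crewai", "ollama_chat/qwen3:8b")` returns `"ollama/qwen3:8b"`, correcting a string that was written for the wrong framework.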

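Background polling. A start_process call plus a polling loop replaces "here's a script to run." Claude starts the long install in the background and handles the wait, so the user never has to run a script by hand.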
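
The start-then-poll pattern can be approximated with the standard library. In this sketch `subprocess.Popen` stands in for start_process, whose real interface is not shown in this README; the function name, polling interval, and timeout are illustrative.

```python
import subprocess
import time


def run_with_polling(cmd, interval=0.5, timeout=600.0):
    """Start `cmd` in the background and poll until it exits.

    Returns the exit code; raises TimeoutError if the deadline passes.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    deadline = time.monotonic() + timeout
    while True:
        code = proc.poll()  # None while the process is still running
        if code is not None:
            return code
        if time.monotonic() >= deadline:
            proc.kill()
            proc.wait()
            raise TimeoutError(f"{cmd!r} did not finish within {timeout}s")
        time.sleep(interval)
```

A long model pull would be launched the same way, with a larger timeout and a progress message emitted on each pass through the loop.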

License

MIT
