Production-grade multi-agent orchestration framework for GitHub Copilot CLI.
A specialized multi-agent system that enforces Principal Software Engineer standards through the GitHub Copilot CLI. Tasks are decomposed and routed to dedicated agents — each with constrained tools, a focused model, and a single responsibility — coordinated by a master orchestrator through defined pipelines.
Key principles:
- No agent modifies code AND approves it
- Every production change passes through ≥2 agents
- Security findings at CRITICAL severity halt all pipelines
- The operator is always the final authority
graph TD
O["🎯 ORCHESTRATOR<br/><sub>claude-opus-4.6 · coordinator</sub>"]
O --> A["🏗️ ARCHITECT<br/><sub>opus · R/O</sub>"]
O --> D["💻 DEVELOPER<br/><sub>codex · R/W/X</sub>"]
O --> S["🔒 SECURITY<br/><sub>opus · R/X</sub>"]
O --> Q["🧪 QA<br/><sub>gpt-5.4 · R/W</sub>"]
O --> R["📊 REVIEWER<br/><sub>opus · R/O</sub>"]
O --> RE["🔬 RESEARCH<br/><sub>opus · R/Web</sub>"]
style O fill:#1a1a2e,stroke:#e94560,color:#fff,stroke-width:2px
style A fill:#16213e,stroke:#0f3460,color:#fff
style D fill:#16213e,stroke:#0f3460,color:#fff
style S fill:#16213e,stroke:#0f3460,color:#fff
style Q fill:#16213e,stroke:#0f3460,color:#fff
style R fill:#16213e,stroke:#0f3460,color:#fff
style RE fill:#16213e,stroke:#0f3460,color:#fff
Pipelines:
| Pipeline | Flow | Use Case |
|---|---|---|
| A (Full) | architect → developer → security → qa → reviewer | New features |
| B (Quick) | developer → reviewer | Minor fixes |
| C (Security) | security → developer → security → reviewer | Security-critical |
| D (Research) | research → architect → developer → qa → reviewer | Investigation-first |
| E (Hotfix) | developer → security → reviewer | Emergency fixes |
| F (Refactor) | architect → developer → qa → reviewer | Structural changes |
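The key principles can be checked mechanically against the pipeline table. The following is a minimal sketch, not code from the repository: `pipeline_flow` and `validate_flow` are hypothetical names, and "ends with the reviewer" is used here as a simple stand-in for the rule that no agent both modifies and approves code.

```bash
# Hypothetical sketch: look up a pipeline flow from the table above.
pipeline_flow() {
  case "$1" in
    A) echo "architect developer security qa reviewer" ;;
    B) echo "developer reviewer" ;;
    C) echo "security developer security reviewer" ;;
    D) echo "research architect developer qa reviewer" ;;
    E) echo "developer security reviewer" ;;
    F) echo "architect developer qa reviewer" ;;
    *) return 1 ;;
  esac
}

# validate_flow FLOW
# Succeeds when FLOW names at least two agents and ends with the
# read-only reviewer, so the final word never belongs to an agent
# that writes code.
validate_flow() {
  local count=0 last="" agent
  for agent in $1; do          # intentional word splitting on spaces
    count=$((count + 1))
    last=$agent
  done
  [ "$count" -ge 2 ] && [ "$last" = "reviewer" ]
}

for name in A B C D E F; do
  if validate_flow "$(pipeline_flow "$name")"; then
    echo "Pipeline $name: ok"
  else
    echo "Pipeline $name: violates key principles"
  fi
done
```

All six flows in the table pass this check. A flow such as `architect developer` would be rejected because nothing reviews the developer's output.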
| Agent | Model | Role | Write Access |
|---|---|---|---|
| orchestrator | claude-opus-4.6 | Pipeline coordination & delegation | No |
| architect | claude-opus-4.6 | System design & blast radius analysis | No |
| developer | claude-opus-4.6 / gpt-5.3-codex | Implementation & V1-V7 verification (dual-model) | Yes |
| security | claude-opus-4.6 | Vulnerability scanning & STRIDE | No |
| qa | gpt-5.4 | Test creation & edge case coverage | Yes (tests) |
| reviewer | claude-opus-4.6 | Final quality gate & ship decision | No |
| research | claude-opus-4.6 | Deep technical investigation | No |
| general-pro | claude-opus-4.6 | Complex multi-step tasks (replaces built-in) | Yes |
| task-pro | gpt-5.3-codex | Build/test execution with brief output | Yes |
| explore-pro | gpt-5.4 | Codebase exploration & parallel research | No |
| Skill | Type | Purpose |
|---|---|---|
| code-review | Elite Original | 5-Eye review procedure with automated checks |
| security-audit | Elite Original | Scanning + manual checklist + STRIDE threat modeling |
| testing | Elite Original | Testing pyramid, edge case matrix, coverage targets |
| deploy | Elite Original | Pre-flight, deploy, post-deploy, rollback procedures |
| memory | Elite Original | Session/project/decision memory + ADR across sessions |
| cross-validation | Elite Original | Second-pass validation for security-sensitive flows |
| api-design | PHOENIX TIER 1 | REST API design patterns (resource naming, pagination, errors) |
| e2e-testing | PHOENIX TIER 1 | Playwright E2E testing (Page Objects, CI/CD, artifacts) |
| search-first | PHOENIX TIER 1 | Anti-reinvention: search for existing tools before coding |
| codebase-onboarding | PHOENIX TIER 1 | Repo onboarding & architecture exploration guide |
| database-migrations | PHOENIX TIER 1 | Zero-downtime DB migrations (Prisma, Drizzle, Django, etc.) |
| blueprint | PHOENIX TIER 1 | Multi-session plan generator with adversarial review gate |
| security-scan | PHOENIX TIER 1 | Agent config security scan (AgentShield) |
| context-budget | PHOENIX TIER 1 | Token overhead audit across skills, agents, and rules |
| Skill | Purpose |
|---|---|
| github-ops | gh CLI workflows |
| iterative-retrieval | Subagent context refinement |
| benchmark | Performance baselining |
| strategic-compact | Compaction timing strategy |
Total: 18 skills available (14 user-owned + 4 TIER 2 from ECC). Governance: config/skill-governance.md
# Clone
git clone git@github.com:Evil-Null/copilot-elite-system.git
cd copilot-elite-system
# Install to ~/.copilot/
bash install.sh
# Verify
bash ~/.copilot/scripts/memory.sh status
bash ~/.copilot/scripts/guardrails.sh report
# Create directories
mkdir -p ~/.copilot/{agents,scripts,templates}
mkdir -p ~/.copilot/skills/{code-review,security-audit,testing,deploy,memory,cross-validation,api-design,e2e-testing,search-first,codebase-onboarding,database-migrations,blueprint,security-scan,context-budget}
mkdir -p ~/.copilot/memory/{projects,decisions,sessions}
# Copy files
cp agents/* ~/.copilot/agents/
cp -r skills/* ~/.copilot/skills/
cp scripts/* ~/.copilot/scripts/
cp templates/* ~/.copilot/templates/
cp memory/README.md ~/.copilot/memory/
cp copilot-instructions.md ~/.copilot/
# Set permissions
chmod +x ~/.copilot/scripts/*.sh
# Initialize memory for your project
bash ~/.copilot/scripts/memory.sh init my-project
# Run through the orchestrator (recommended)
copilot --agent orchestrator -p "Add rate limiting middleware. Use Pipeline A."
# Safe autopilot
bash ~/.copilot/scripts/autopilot.sh --agent developer "Implement feature X"
⚠️ Direct invocation (copilot --agent <name>) bypasses pipeline governance. Use orchestrator for production work.
| Protection | Default | Override |
|---|---|---|
| Max steps | 15 | --max-steps N |
| Max files changed | 30 | --max-files N |
| Git push | Blocked | --allow-push |
| File deletion | Blocked | --allow-delete |
| git reset --hard | Always blocked | No override |
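The defaults in the table above can be read as a small decision function. The sketch below is illustrative only: `command_allowed`, `ALLOW_PUSH`, and `ALLOW_DELETE` are hypothetical names that stand in for the `--allow-push` and `--allow-delete` overrides, and the substring matching is deliberately naive. It does not describe how autopilot.sh is actually implemented.

```bash
# Hypothetical stand-ins for --allow-push and --allow-delete (0 = blocked).
ALLOW_PUSH=0
ALLOW_DELETE=0

# command_allowed CMD
# Returns 0 when CMD may run, 1 when a guardrail blocks it.
command_allowed() {
  case "$1" in
    *"git reset --hard"*) return 1 ;;                   # always blocked, no override
    *"git push"*)         [ "$ALLOW_PUSH" -eq 1 ] ;;    # blocked unless overridden
    "rm "*|*" rm "*)      [ "$ALLOW_DELETE" -eq 1 ] ;;  # blocked unless overridden
    *)                    return 0 ;;
  esac
}

command_allowed "git push origin main" || echo "blocked: git push"
```

Checking `git reset --hard` first means no override variable can unblock it, which matches the "No override" row.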
Copy to .github/workflows/ in any repository:
| Template | Trigger | Purpose |
|---|---|---|
| copilot-review.yml | PR | Automated code review |
| copilot-security.yml | PR + weekly | Security audit |
| copilot-quality.yml | PR + push | Quality gate |
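Copying the templates into a repository could be scripted roughly as follows. This is a sketch under assumptions: `install_ci_templates` is a hypothetical helper, and it assumes the three workflow files sit in `~/.copilot/templates/` after installation.

```bash
# install_ci_templates REPO_DIR [TEMPLATE_DIR]
# Copies the three workflow templates into REPO_DIR/.github/workflows/.
# TEMPLATE_DIR defaults to ~/.copilot/templates (assumed location).
install_ci_templates() {
  local repo="$1" src="${2:-$HOME/.copilot/templates}" name
  mkdir -p "$repo/.github/workflows" || return 1
  for name in copilot-review.yml copilot-security.yml copilot-quality.yml; do
    cp "$src/$name" "$repo/.github/workflows/" || return 1
  done
}
```

For example, `install_ci_templates .` run from a repository root would place all three workflows under `.github/workflows/`, and the function stops at the first template it cannot copy.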
| Model | Multiplier | Used By |
|---|---|---|
| claude-opus-4.6 | 3× | orchestrator, architect, security, reviewer, research, developer (plan/verify/debug) |
| gpt-5.3-codex | 1× | developer (code generation) |
| gpt-5.4 | 1× | qa |
Top-tier models only — Sonnet, Haiku, and previous-gen models are not permitted.
- Full 5-Eye pipeline (A): ~14-20 premium requests
- Quick review (B): ~7-9 premium requests
- Pipelines C, D, E, and F: see agents/REGISTRY.md for estimates
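A rough lower bound on premium requests can be computed from the model multipliers. The sketch below assumes exactly one request per agent in the flow plus one orchestrator request, and assumes the developer runs only on gpt-5.3-codex (1×). `agent_multiplier` and `min_cost` are hypothetical helpers, and the README does not state that the published estimates were derived this way.

```bash
# Multipliers taken from the model table; developer is assumed to use
# gpt-5.3-codex only (its opus plan/verify/debug calls would add cost).
agent_multiplier() {
  case "$1" in
    orchestrator|architect|security|reviewer|research) echo 3 ;;
    developer|qa) echo 1 ;;
    *) return 1 ;;
  esac
}

# min_cost FLOW
# Sums one request per agent in FLOW plus one orchestrator request.
min_cost() {
  local total agent m
  total=$(agent_multiplier orchestrator)
  for agent in $1; do          # intentional word splitting on spaces
    m=$(agent_multiplier "$agent") || return 1
    total=$((total + m))
  done
  echo "$total"
}

min_cost "architect developer security qa reviewer"   # Pipeline A
min_cost "developer reviewer"                          # Pipeline B
```

Under these assumptions the sketch prints 14 and 7, which coincide with the lower ends of the ranges quoted for Pipelines A and B. Retries and the developer's opus calls would push real runs higher.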
Requirements:
- GitHub Copilot Pro+ subscription
- Copilot CLI v1.0.23+
- Bash 4+
Everything Claude Code (ECC) v1.10.0 is integrated as an additive capability layer.
graph TD
subgraph L1["🔴 LAYER 1 — ELITE CORE (AUTHORITATIVE)"]
L1a["10 agents · 14 skills · 6 pipelines · 6 scripts<br/>copilot-instructions.md = SUPREME"]
end
subgraph L2["🟡 LAYER 2 — GOVERNANCE"]
L2a["hooks/hooks.json — 12 curated hooks<br/>config/ecc-governance.md — 3-tier policy<br/>rules/*.md — 10 lang + 1 common-security"]
end
subgraph L3["🟢 LAYER 3 — ECC PLUGIN (ADDITIVE)"]
L3a["47 agents (ecc: prefix) · 181 skills<br/>READ-ONLY — never modified"]
end
L1 --> L2 --> L3
style L1 fill:#2d0000,stroke:#ff4444,color:#fff,stroke-width:2px
style L2 fill:#2d2d00,stroke:#ffcc00,color:#fff,stroke-width:2px
style L3 fill:#002d00,stroke:#44ff44,color:#fff,stroke-width:2px
style L1a fill:#1a1a2e,stroke:#e94560,color:#fff
style L2a fill:#1a1a2e,stroke:#e2b714,color:#fff
style L3a fill:#1a1a2e,stroke:#44ff44,color:#fff
| Capability | Count | Detail |
|---|---|---|
| Curated hooks | 12 | 9 ECC-adapted + 3 custom (model guard, pipeline audit, guardrails) |
| Approved agents | 28 | Language reviewers, build resolvers, specialists |
| Restricted agents | 11 | Owner-explicit only |
| Banned agents | 8 | Irrelevant or redundant |
| Language rules | 10 + common | TypeScript, Python, Go, Rust, Java, C#, C/C++, Kotlin, PHP, Shell + common-security baseline |
Model enforcement: Only claude-opus-4.6, gpt-5.3-codex, and gpt-5.4 are permitted. model-policy-guard.js enforces this and exits with status 1 on a banned model.
Governance: config/ecc-governance.md
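The allowlist behaviour described for model enforcement can be pictured with a few lines of shell. This is only an illustration of the idea: the real guard is model-policy-guard.js, whose internals are not shown here, and `model_allowed` and `check_model` are hypothetical names.

```bash
# model_allowed MODEL
# Succeeds only for the three permitted models.
model_allowed() {
  case "$1" in
    claude-opus-4.6|gpt-5.3-codex|gpt-5.4) return 0 ;;
    *) return 1 ;;
  esac
}

# check_model MODEL
# Reports a banned model on stderr and returns 1, mirroring the
# "exit 1 on banned model" behaviour.
check_model() {
  if model_allowed "$1"; then
    echo "model ok: $1"
  else
    echo "banned model: $1" >&2
    return 1
  fi
}
```

An allowlist fails closed: any model name not listed, including future or misspelled ones, is rejected without needing a separate ban list.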
PHOENIX cherry-picked 12 high-value skills from ECC v1.10.0 through a challenge-hardened process (3 independent review agents, 37 findings resolved).
| Tier | Count | Description |
|---|---|---|
| Elite Original | 6 | Core review, security, testing, deploy, memory, cross-validation |
| PHOENIX TIER 1 | 8 | Adapted from ECC — paths/models/governance remapped |
| TIER 2 (ECC) | 4 | Approved as-is from ECC plugin |
| Total | 18 | ~22,845 tokens (11.4% of 200K context) |
Governance: config/skill-governance.md
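The context share in the Total row is a simple ratio of skill tokens to context size. A small helper (hypothetical name `budget_percent`) reproduces it:

```bash
# budget_percent TOKENS CONTEXT_SIZE
# Prints TOKENS as a percentage of CONTEXT_SIZE, to one decimal place.
budget_percent() {
  awk -v t="$1" -v c="$2" 'BEGIN { printf "%.1f\n", t / c * 100 }'
}

budget_percent 22845 200000   # prints 11.4
```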
TITAN expanded language-specific security and style rules from 4 to 10 languages with a shared zero-trust security baseline. Challenge-hardened by 3 independent agents (34 findings resolved).
rules/
├── common-security.md ← Shared zero-trust baseline (20 rules, loaded for ALL languages)
├── typescript.md ← TS/JS: XSS, prototype pollution, child_process
├── python.md ← Python: pickle, eval/exec, subprocess
├── go.md ← Go: text/template XSS, InsecureSkipVerify
├── rust.md ← Rust: unsafe audit, cargo-audit/cargo-deny
├── java.md ← Java: ObjectInputStream, SpEL, Log4Shell
├── csharp.md ← C#: TypeNameHandling, over-posting, Razor XSS
├── cpp.md ← C/C++: buffer overflow, use-after-free, ASAN
├── kotlin.md ← Kotlin: Jackson polymorphic, coroutine safety
├── php.md ← PHP: unserialize, type juggling, file inclusion
└── shell.md ← Shell: eval injection, quoting, ShellCheck
| Metric | Value |
|---|---|
| Total files | 11 (1 common + 10 language) |
| Total lines | 921 |
| Max per file | 96 lines (all ≤150 cap) |
| Security model | Common baseline + language-specific delta (no duplication) |
| Format | Blockquote (> Applies to:) — no YAML frontmatter |
Each language file has 4 mandatory sections: Style, Patterns, Security (delta only), Testing.
The Security section contains only language-specific vulnerabilities — universal rules live in common-security.md.
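The file constraints above (a 150-line cap and four mandatory sections) are easy to lint. The following is a sketch under assumptions: `check_rule_file` is a hypothetical helper, and because the heading syntax inside the real rule files is not shown here, it only looks for the bare section names.

```bash
# check_rule_file FILE
# Fails when FILE exceeds 150 lines or does not mention one of the
# four mandatory section names.
check_rule_file() {
  local file="$1" lines section
  [ -r "$file" ] || return 1
  lines=$(wc -l < "$file")
  lines=$((lines))             # normalise whitespace that some wc builds emit
  if [ "$lines" -gt 150 ]; then
    echo "$file: $lines lines exceeds the 150-line cap" >&2
    return 1
  fi
  for section in Style Patterns Security Testing; do
    if ! grep -q "$section" "$file"; then
      echo "$file: missing section $section" >&2
      return 1
    fi
  done
  return 0
}
```

It is meant for the ten language files, for example `check_rule_file rules/python.md`. common-security.md is not described as having the four sections, so it would need a separate check.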
- agents/AGENTS.md: Operating contract (root authority)
- memory/README.md: Memory system documentation
- templates/README.md: CI template usage guide
| Problem | Solution |
|---|---|
| copilot command not found | Install Copilot CLI: npm install -g @github/copilot |
| Agent not recognized | Run bash install.sh from the copilot-elite-system directory |
| Lock timeout | Stale lock — run ~/.copilot/scripts/memory.sh status to detect |
| Permission denied on scripts | Run chmod +x ~/.copilot/scripts/*.sh |
| Autopilot timeout | Increase with export AUTOPILOT_TIMEOUT=600 (seconds) |
- No pipeline enforcement: Agents can be invoked directly (copilot --agent <name>), bypassing orchestrator governance. The system relies on agents self-checking, which is advisory.
- Cross-validation is instruction-based: No runtime code detects or enforces cross-validation triggers. Compliance depends on LLM behavior.
- Deny-tool patterns are convention: Copilot CLI's deny-tool depends on the CLI honoring patterns. No sandbox enforcement.
- Memory is unencrypted plaintext: Do not store secrets in memory files.
- No automated pipeline testing: Correctness relies on manual verification and agent self-checks.
- Cost estimates are approximate: See agents/REGISTRY.md for details. Actual costs vary by task complexity.
- A/B model comparison is manual: No automated framework for comparing model performance.
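Because memory files are plaintext, a periodic scan for credential-like lines is one way to catch accidental secrets. The sketch below is not part of the repository: `scan_memory_for_secrets` is a hypothetical helper, and its patterns are illustrative rather than exhaustive.

```bash
# scan_memory_for_secrets [DIR]
# Lists files under DIR (default ~/.copilot/memory) that contain lines
# resembling credentials, and returns 1 if any are found.
scan_memory_for_secrets() {
  local dir="${1:-$HOME/.copilot/memory}"
  [ -d "$dir" ] || return 0
  # -r recurse, -I skip binary files, -l list matching files,
  # -i ignore case, -E extended regular expressions
  if grep -rIliE '(api[_-]?key|secret|password|token)[[:space:]]*[:=]' "$dir"; then
    echo "possible secrets found in the files listed above" >&2
    return 1
  fi
  return 0
}
```

A clean result does not prove the directory is free of secrets. The check only catches common `key = value` shapes.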