microsoft · brrusino · Jun 5, 2026 · Jun 5, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -76,6 +76,11 @@ make validate-examples    # validate all examples
   - `loader.py` - YAML parsing with environment variable resolution (${VAR:-default}) and `!file` tag support
   - `validator.py` - Cross-reference validation (agent names, routes, parallel groups)
 
+- **skills/**: Skill registry and loader (opt-in, bundled skill content)
+  - `registry.py` - Resolves built-in skill names to on-disk directories (probes editable-install + wheel-install layouts)
+  - `loader.py` - Reads `SKILL.md` + `references/*.md` for providers that require eager preamble injection; wraps each skill in `<skill name="...">` tags inside a `<skills>` envelope
+  - Built-in skills live under `plugins/conductor/skills/<name>/` (bundled into the wheel via hatchling `force-include`)
+
 - **engine/**: Workflow execution orchestration
   - `workflow.py` - Main `WorkflowEngine` class that orchestrates agent execution, parallel groups, for-each groups, and routing
   - `context.py` - `WorkflowContext` manages accumulated agent outputs with three modes: accumulate, last_only, explicit
@@ -135,6 +140,7 @@ make validate-examples    # validate all examples
 - **Tool resolution**: `null` = all workflow tools, `[]` = none, `[list]` = subset
 - **Set step typing**: `output_type` defaults to `auto` (safe YAML parse with `_to_json_safe` normalisation — `datetime`/`date`/`time` → ISO 8601, non-string dict keys and other non-JSON-safe values raise `ExecutionError`). Explicit `string`/`number`/`integer`/`boolean`/`list`/`dict` only valid on single `value:`. `WorkflowContext.store` accepts any JSON-safe value (scalars/lists from `set` steps in addition to the dicts produced by LLM / script / gate / parallel-group outputs); `_add_agent_input` returns the scalar verbatim for `step.output` and raises a clear `KeyError` for `step.output.field` shorthand on non-dict outputs.
 - **Reasoning effort**: `runtime.default_reasoning_effort` sets a workflow-wide default; per-agent `reasoning.effort` overrides it. Allowed values: `low`, `medium`, `high`, `xhigh`. Each provider translates the unified value to its native API (Copilot: `reasoning_effort` on the session, validated against the model's `supported_reasoning_efforts`; Claude: extended thinking with budget mapping low=2048, medium=8192, high=16384, xhigh=32768 tokens, with `temperature` coerced to 1.0 and `max_tokens` bumped to fit the budget). See `examples/reasoning-effort.yaml`.
+- **Skills**: `runtime.skills: [name, ...]` sets a workflow-wide default list of skills enabled for every provider-backed agent; per-agent `skills: [name, ...]` overrides it (tri-state via list presence: omitted = inherit, `skills: []` = explicit opt-out, `skills: [name, ...]` = explicit set). Skill names must resolve to a registered built-in (currently just `conductor`). The observable contract is the same across providers — *"the agent has access to the named skill"* — but the mechanism differs by provider via `AgentProvider.supports_native_skills`: **Copilot** (`True`) registers the skill directory on the SDK session via `skill_directories`, so the agent discovers and loads skill content natively (progressive disclosure via `SKILL.md` frontmatter); **Claude** and **Claude Agent SDK** (`False`) eagerly inject every enabled skill's `SKILL.md` plus `references/*.md` into the agent's rendered prompt inside `<skills><skill name="...">...</skill></skills>` tags. Providers also declare `skills: bool` on their `ProviderCapabilities` descriptor so `conductor validate` can catch skills-against-unsupported-provider mismatches. Built-in skills live under `plugins/conductor/skills/<name>/` and are bundled into the wheel via the hatchling `force-include` entry in `pyproject.toml`. Skills are rejected on non-provider-backed step types (script, wait, set, terminate, workflow, human_gate). See `examples/skills-self-improving-workflow.yaml`.
 - **Terminate steps** (`type: terminate`): explicit terminal step with `status` (`success` | `failed`), Jinja2 `reason`, and optional `output_template` (a `dict[str, str]` that replaces `workflow.output:` when set; each value is rendered then passed through `_maybe_parse_json` so `"true"` becomes `True`, `"42"` becomes `42`, JSON literals are parsed). Reaching a terminate step ends the workflow immediately (no routes evaluated after). `success` → CLI exit 0, dashboard ✅, `workflow_completed { termination_reason, terminated_by, is_explicit: true, status }`; runs `on_complete` hook. `failed` → CLI exit 1 (with rendered output JSON still printed to stdout for downstream tooling), dashboard ❌, raises `WorkflowTerminated` (subclass of `ExecutionError`), emits `workflow_failed { error_type: "WorkflowTerminated", is_explicit: true, status, output }`, runs `on_error` hook, and **does not** save an on-failure checkpoint (explicit terminations are intentionally non-resumable). Terminate steps cannot have `routes`, `tools`, `output`, `prompt`, `model`, etc.; cannot be used as parallel-group members or as a for_each inline agent (route to one from those groups' `routes:` instead). Inside a sub-workflow, a `status: failed` terminate is downgraded at the parent boundary to `SubworkflowTerminatedError` (also a subclass of `ExecutionError`) preserving the child's rendered `terminated_output` / `terminated_reason` / `terminated_by` as structured attributes — the parent treats it as a normal sub-workflow failure (its own `workflow_failed` does NOT inherit `is_explicit: true`). For more detail see `examples/terminate.yaml`, `docs/workflow-syntax.md` (Terminate Steps section), and `plugins/conductor/skills/conductor/references/authoring.md`.
 - **Structured `runtime.provider` (Copilot custom routing)**: `runtime.provider` accepts either the bare string shorthand (`provider: copilot`) or a structured `ProviderSettings` object that routes the Copilot SDK at OpenAI-compatible / Azure / Anthropic endpoints (Ollama, vLLM, LM Studio, Azure OpenAI, etc.). Object fields: `name` (defaults to `copilot`), `type` (`openai`|`azure`|`anthropic`), `wire_api` (`completions`|`responses`), `base_url`, `api_key`, `bearer_token`, `headers`, `azure.api_version`. `api_key` and `bearer_token` are `SecretStr` (redacted in `model_dump` / dashboard / event logs). The model is frozen after construction. Custom routing activates only when at least one non-`name` field is set in YAML — ambient `OPENAI_*` env vars never divert default routing on their own. Once activated, missing fields fall back from env vars in this order: `base_url` ← `COPILOT_PROVIDER_BASE_URL` → `OPENAI_BASE_URL`; `api_key` ← `COPILOT_PROVIDER_API_KEY` (only — ambient `OPENAI_API_KEY` is intentionally NOT a fallback to avoid credential leaks); `bearer_token` ← `COPILOT_PROVIDER_BEARER_TOKEN`. The schema rejects every non-`name` field when `name != "copilot"` (structured config for other providers is a follow-up). It also rejects anchorless / broken combinations that would silently no-op at the SDK boundary: `wire_api` / `type` / `headers` / `azure` cannot stand alone without `base_url` / `api_key` / `bearer_token`; empty `headers`, empty `SecretStr`, and `azure: {api_version: null}` are rejected. The resolver raises `ProviderError` when custom routing is activated but every resolved field is falsy (e.g. expected env vars all unset). Custom routing applies to both agent execution and dialog turns so all sessions hit the same endpoint. `--provider <name>` CLI override replaces the whole `ProviderSettings` (logs a notice when YAML had structured fields). See `examples/copilot-local-llm.yaml`.
 
@@ -170,6 +176,7 @@ Tests mirror source structure in `tests/`:
 - `test_providers/` - Provider implementation tests
 - `test_integration/` - Full workflow execution tests
 - `test_gates/` - Human gate tests
+- `test_skills/` - Skill registry, loader, schema field, and executor-integration tests
 
 Use `pytest.mark.performance` for performance tests (exclude with `-m "not performance"`).
 

diff --git a/examples/skills-self-improving-workflow.yaml b/examples/skills-self-improving-workflow.yaml
@@ -0,0 +1,161 @@
+# Self-improving workflow using the `conductor` skill
+#
+# This example shows how the generalized `skills:` capability gives an
+# agent reusable, version-accurate knowledge of Conductor's YAML
+# schema, execution model, and authoring patterns. It's the building
+# block that a future `conductor watch` (issue #181) would chain into a
+# closed-loop generate → review → fix cycle. The example demonstrates
+# all three sides of that loop in one pass.
+#
+# Provider mechanism — same observable contract, different mechanism:
+#   * Copilot: the skill directory is registered on the SDK session via
+#     `skill_directories`. The agent discovers and loads SKILL.md and
+#     references on demand (progressive disclosure).
+#   * Claude: SKILL.md + references/*.md are eagerly prepended to the
+#     agent's rendered prompt inside <skills><skill name="conductor">
+#     ... </skill></skills> tags.
+#
+# Tri-state opt-in via list presence:
+#   * Omit `skills:`     → inherit `runtime.skills`
+#   * `skills: []`       → explicit opt-out (e.g. the `formatter` agent
+#                          below, which doesn't need workflow knowledge)
+#   * `skills: [name]`   → explicit set, replaces the workflow default
+#
+# Usage:
+#   conductor run examples/skills-self-improving-workflow.yaml \
+#     --input task="Write a workflow that summarises a GitHub issue."
+
+workflow:
+  name: skills-self-improving-workflow
+  description: |
+    Generate a Conductor workflow, review it for correctness against
+    the bundled `conductor` skill, then revise it based on the review.
+    Demonstrates the building-block pattern for future iterative
+    fix-validate loops (see issue #181).
+  version: "1.0.0"
+  entry_point: author
+
+  runtime:
+    provider: copilot
+    skills: [conductor]   # every provider-backed agent inherits this
+
+  input:
+    task:
+      type: string
+      required: true
+      description: A short description of the workflow the author should generate.
+
+agents:
+  - name: author
+    description: Drafts an initial Conductor workflow for the given task.
+    prompt: |
+      You are authoring a brand-new Conductor workflow.
+
+      Task: {{ workflow.input.task }}
+
+      Produce a complete, runnable workflow YAML that solves the task.
+      Use the `conductor` skill for schema details, allowed step types,
+      routing patterns, and naming conventions. Prefer a minimal,
+      idiomatic design over an exhaustive one.
+    output:
+      workflow_yaml:
+        type: string
+        description: The drafted workflow YAML.
+      notes:
+        type: string
+        description: Author's notes on design choices.
+    routes:
+      - to: reviewer
+
+  - name: reviewer
+    description: Reviews the drafted workflow for schema and design issues.
+    prompt: |
+      Review the following Conductor workflow YAML.
+
+      ```yaml
+      {{ author.output.workflow_yaml }}
+      ```
+
+      Use the `conductor` skill to verify:
+        * Schema correctness (field names, types, required fields)
+        * Routing soundness (no orphan agents, no unreachable routes,
+          terminating routes present)
+        * Sensible context mode and failure-mode choices
+        * Appropriate use of step types (agent / script / set / wait /
+          terminate / parallel / for_each / human_gate / workflow)
+
+      List blocking issues separately from suggestions. If the workflow
+      is already correct and idiomatic, say so plainly and return an
+      empty `issues` list.
+    output:
+      issues:
+        type: string
+        description: Blocking schema or correctness issues. Empty if none.
+      suggestions:
+        type: string
+        description: Non-blocking improvements.
+      verdict:
+        type: string
+        description: One of "approved" or "needs_revision".
+    routes:
+      - to: fixer
+        when: "{{ reviewer.output.verdict == 'needs_revision' }}"
+      - to: formatter
+
+  - name: fixer
+    description: Applies the reviewer's findings and produces a revised workflow.
+    prompt: |
+      The reviewer flagged issues with the drafted workflow.
+
+      Original:
+      ```yaml
+      {{ author.output.workflow_yaml }}
+      ```
+
+      Review findings:
+      - Issues: {{ reviewer.output.issues }}
+      - Suggestions: {{ reviewer.output.suggestions }}
+
+      Use the `conductor` skill to produce a corrected, complete
+      workflow YAML that addresses every blocking issue. Adopt
+      suggestions where they materially improve the design. Output the
+      final YAML and a brief changelog.
+    output:
+      workflow_yaml:
+        type: string
+        description: The revised workflow YAML.
+      changelog:
+        type: string
+        description: Short summary of what changed and why.
+    routes:
+      - to: formatter
+
+  - name: formatter
+    description: Renders the final workflow + report. Doesn't need the skill.
+    # Explicit opt-out: this agent only formats existing strings. No
+    # need to bloat its prompt with the full skill content.
+    skills: []
+    prompt: |
+      Produce a Markdown report containing two sections:
+
+      ## Workflow
+
+      ```yaml
+      {{ fixer.output.workflow_yaml | default(author.output.workflow_yaml) }}
+      ```
+
+      ## Notes
+
+      Original notes: {{ author.output.notes }}
+      Review verdict: {{ reviewer.output.verdict }}
+      {% if fixer.output.changelog is defined %}
+      Changelog: {{ fixer.output.changelog }}
+      {% endif %}
+    output:
+      report:
+        type: string
+        description: The final Markdown report.
+
+output:
+  report: "{{ formatter.output.report }}"
+  verdict: "{{ reviewer.output.verdict }}"
diff --git a/plugins/conductor/skills/conductor/SKILL.md b/plugins/conductor/skills/conductor/SKILL.md
@@ -5,7 +5,7 @@ description: Validate, run, and execute workflows; creating new workflows when e
 
 # Conductor
 
-CLI tool for defining and running multi-agent workflows with the GitHub Copilot SDK, Anthropic Claude, or Claude Agent SDK.
+CLI tool for defining and running multi-agent workflows with the GitHub Copilot SDK or Anthropic Claude.
 
 > **DO NOT create new workflow files unless the user explicitly asks you to create one.** Default to running, validating, or debugging existing workflows. If the user's request is ambiguous, assume they want to run or modify an existing workflow rather than create a new one.
 

diff --git a/plugins/conductor/skills/conductor/references/authoring.md b/plugins/conductor/skills/conductor/references/authoring.md
@@ -20,6 +20,7 @@ workflow:
     max_agent_iterations: 50     # Max tool-use roundtrips per agent (1-500, optional)
     max_session_seconds: 120     # Wall-clock timeout per agent session (optional)
     default_reasoning_effort: medium  # Workflow-wide reasoning effort: low, medium, high, xhigh (optional)
+    skills: [conductor]              # Skills available to every provider-backed agent (optional)
 
   input:                         # Define workflow inputs
     param_name:
@@ -116,6 +117,11 @@ agents:
     reasoning:                   # Override runtime.default_reasoning_effort (optional)
       effort: high               # low, medium, high, or xhigh
 
+    skills: [conductor]          # Skills this agent has access to (optional, tri-state)
+                                 # Omit = inherit runtime.skills; [] = explicit opt-out;
+                                 # [name, ...] = explicit set. Not allowed on
+                                 # script/human_gate/workflow/wait/set/terminate agents.
+
     routes:                      # Where to go next
       - to: next_agent
 ```
@@ -148,6 +154,40 @@ agents:
 
 See `examples/reasoning-effort.yaml` for a complete example.
 
+### Skills
+
+`skills` enables reusable knowledge or capability bundles for provider-backed agents. The Conductor distribution ships one built-in skill — `conductor` — which packages the YAML schema, execution model, and authoring patterns (the same content this reference doc covers) so an agent can evaluate, improve, debug, or generate Conductor workflows.
+
+**Tri-state per-agent field (resolved via list presence):**
+- Omit `skills:` — inherit from `runtime.skills`
+- `skills: []` — explicit opt-out (no skills for this agent, regardless of workflow default)
+- `skills: [name, ...]` — explicit set, replaces the workflow default
+
+**Workflow-wide default:** `runtime.skills: [conductor]` enables it for every provider-backed agent. Individual agents can override.
+
+**Provider mechanism (same observable contract — "the agent has access to the named skill"):**
+- **Copilot** — the resolved skill directory is registered on the SDK session via `skill_directories`, so the agent discovers and loads skill content natively (progressive disclosure via `SKILL.md` frontmatter). This is more token-efficient than eager injection.
+- **Claude** — the loader reads `SKILL.md` plus every `references/*.md` file in the skill directory and prepends them to the agent's rendered prompt inside `<skills><skill name="...">...</skill></skills>` tags. Inserted between workspace instructions and the user prompt.
+
+Not allowed on `script`, `human_gate`, `workflow`, `wait`, `set`, or `terminate` agent types. Unknown skill names fail at workflow validation time.
+
+```yaml
+workflow:
+  runtime:
+    skills: [conductor]             # all provider-backed agents get the conductor skill
+
+agents:
+  - name: workflow_reviewer
+    skills: [conductor]             # per-agent opt-in (redundant here, kept for clarity)
+    prompt: "Review this workflow for correctness..."
+
+  - name: simple_agent
+    skills: []                      # opt out even when runtime default is set
+    prompt: "Do something simple."
+```
+
+See `examples/skills-self-improving-workflow.yaml` for a complete example.
+
 ## Routing Patterns
 
 ### Linear
@@ -696,18 +736,8 @@ agents:
 ### Gate Output
 
 Human gates automatically capture:
-- `output.selected` — the `value` of the chosen option.
-- `output.additional_input` — dict of values collected from `prompt_for` fields.
-  Always present; `{}` when no `prompt_for` was specified or the selected option
-  has no `prompt_for`. Access individual fields via templates as
-  `{{ <gate>.output.additional_input.<field> }}` (for example
-  `{{ approval_gate.output.additional_input.feedback }}` when an option declares
-  `prompt_for: feedback`).
-
-> **`context: explicit` mode note.** `input:` declarations support
-> `<gate>.output.additional_input` (the whole dict) but not the dotted shorthand
-> `<gate>.output.additional_input.<field>`. Declare the parent key and read
-> individual fields via Jinja2 in the agent's prompt or output template.
+- `output.selected` - the `value` of the chosen option
+- `output.feedback` - text input from `prompt_for` (if specified)
 
 ## Context Modes