7 changes: 7 additions & 0 deletions AGENTS.md
@@ -76,6 +76,11 @@ make validate-examples # validate all examples
  - `loader.py` - YAML parsing with environment variable resolution (${VAR:-default}) and `!file` tag support
  - `validator.py` - Cross-reference validation (agent names, routes, parallel groups)

- **skills/**: Skill registry and loader (opt-in, bundled skill content)
  - `registry.py` - Resolves built-in skill names to on-disk directories (probes editable-install + wheel-install layouts)
  - `loader.py` - Reads `SKILL.md` + `references/*.md` for providers that require eager preamble injection; wraps each skill in `<skill name="...">` tags inside a `<skills>` envelope
  - Built-in skills live under `plugins/conductor/skills/<name>/` (bundled into the wheel via hatchling `force-include`)

- **engine/**: Workflow execution orchestration
  - `workflow.py` - Main `WorkflowEngine` class that orchestrates agent execution, parallel groups, for-each groups, and routing
  - `context.py` - `WorkflowContext` manages accumulated agent outputs with three modes: accumulate, last_only, explicit
@@ -135,6 +140,7 @@ make validate-examples # validate all examples
- **Tool resolution**: `null` = all workflow tools, `[]` = none, `[list]` = subset
- **Set step typing**: `output_type` defaults to `auto` (safe YAML parse with `_to_json_safe` normalisation — `datetime`/`date`/`time` → ISO 8601, non-string dict keys and other non-JSON-safe values raise `ExecutionError`). Explicit `string`/`number`/`integer`/`boolean`/`list`/`dict` only valid on single `value:`. `WorkflowContext.store` accepts any JSON-safe value (scalars/lists from `set` steps in addition to the dicts produced by LLM / script / gate / parallel-group outputs); `_add_agent_input` returns the scalar verbatim for `step.output` and raises a clear `KeyError` for `step.output.field` shorthand on non-dict outputs.
- **Reasoning effort**: `runtime.default_reasoning_effort` sets a workflow-wide default; per-agent `reasoning.effort` overrides it. Allowed values: `low`, `medium`, `high`, `xhigh`. Each provider translates the unified value to its native API (Copilot: `reasoning_effort` on the session, validated against the model's `supported_reasoning_efforts`; Claude: extended thinking with budget mapping low=2048, medium=8192, high=16384, xhigh=32768 tokens, with `temperature` coerced to 1.0 and `max_tokens` bumped to fit the budget). See `examples/reasoning-effort.yaml`.
- **Skills**: `runtime.skills: [name, ...]` sets a workflow-wide default list of skills enabled for every provider-backed agent; per-agent `skills: [name, ...]` overrides it (tri-state via list presence: omitted = inherit, `skills: []` = explicit opt-out, `skills: [name, ...]` = explicit set). Skill names must resolve to a registered built-in (currently just `conductor`). The observable contract is the same across providers — *"the agent has access to the named skill"* — but the mechanism differs by provider via `AgentProvider.supports_native_skills`: **Copilot** (`True`) registers the skill directory on the SDK session via `skill_directories`, so the agent discovers and loads skill content natively (progressive disclosure via `SKILL.md` frontmatter); **Claude** and **Claude Agent SDK** (`False`) eagerly inject every enabled skill's `SKILL.md` plus `references/*.md` into the agent's rendered prompt inside `<skills><skill name="...">...</skill></skills>` tags. Providers also declare `skills: bool` on their `ProviderCapabilities` descriptor so `conductor validate` can catch skills-against-unsupported-provider mismatches. Built-in skills live under `plugins/conductor/skills/<name>/` and are bundled into the wheel via the hatchling `force-include` entry in `pyproject.toml`. Skills are rejected on non-provider-backed step types (script, wait, set, terminate, workflow, human_gate). See `examples/skills-self-improving-workflow.yaml`.
- **Terminate steps** (`type: terminate`): explicit terminal step with `status` (`success` | `failed`), Jinja2 `reason`, and optional `output_template` (a `dict[str, str]` that replaces `workflow.output:` when set; each value is rendered then passed through `_maybe_parse_json` so `"true"` becomes `True`, `"42"` becomes `42`, JSON literals are parsed). Reaching a terminate step ends the workflow immediately (no routes evaluated after). `success` → CLI exit 0, dashboard ✅, `workflow_completed { termination_reason, terminated_by, is_explicit: true, status }`; runs `on_complete` hook. `failed` → CLI exit 1 (with rendered output JSON still printed to stdout for downstream tooling), dashboard ❌, raises `WorkflowTerminated` (subclass of `ExecutionError`), emits `workflow_failed { error_type: "WorkflowTerminated", is_explicit: true, status, output }`, runs `on_error` hook, and **does not** save an on-failure checkpoint (explicit terminations are intentionally non-resumable). Terminate steps cannot have `routes`, `tools`, `output`, `prompt`, `model`, etc.; cannot be used as parallel-group members or as a for_each inline agent (route to one from those groups' `routes:` instead). Inside a sub-workflow, a `status: failed` terminate is downgraded at the parent boundary to `SubworkflowTerminatedError` (also a subclass of `ExecutionError`) preserving the child's rendered `terminated_output` / `terminated_reason` / `terminated_by` as structured attributes — the parent treats it as a normal sub-workflow failure (its own `workflow_failed` does NOT inherit `is_explicit: true`). For more detail see `examples/terminate.yaml`, `docs/workflow-syntax.md` (Terminate Steps section), and `plugins/conductor/skills/conductor/references/authoring.md`.
- **Structured `runtime.provider` (Copilot custom routing)**: `runtime.provider` accepts either the bare string shorthand (`provider: copilot`) or a structured `ProviderSettings` object that routes the Copilot SDK at OpenAI-compatible / Azure / Anthropic endpoints (Ollama, vLLM, LM Studio, Azure OpenAI, etc.). Object fields: `name` (defaults to `copilot`), `type` (`openai`|`azure`|`anthropic`), `wire_api` (`completions`|`responses`), `base_url`, `api_key`, `bearer_token`, `headers`, `azure.api_version`. `api_key` and `bearer_token` are `SecretStr` (redacted in `model_dump` / dashboard / event logs). The model is frozen after construction. Custom routing activates only when at least one non-`name` field is set in YAML — ambient `OPENAI_*` env vars never divert default routing on their own. Once activated, missing fields fall back from env vars in this order: `base_url` ← `COPILOT_PROVIDER_BASE_URL` → `OPENAI_BASE_URL`; `api_key` ← `COPILOT_PROVIDER_API_KEY` (only — ambient `OPENAI_API_KEY` is intentionally NOT a fallback to avoid credential leaks); `bearer_token` ← `COPILOT_PROVIDER_BEARER_TOKEN`. The schema rejects every non-`name` field when `name != "copilot"` (structured config for other providers is a follow-up). It also rejects anchorless / broken combinations that would silently no-op at the SDK boundary: `wire_api` / `type` / `headers` / `azure` cannot stand alone without `base_url` / `api_key` / `bearer_token`; empty `headers`, empty `SecretStr`, and `azure: {api_version: null}` are rejected. The resolver raises `ProviderError` when custom routing is activated but every resolved field is falsy (e.g. expected env vars all unset). Custom routing applies to both agent execution and dialog turns so all sessions hit the same endpoint. `--provider <name>` CLI override replaces the whole `ProviderSettings` (logs a notice when YAML had structured fields). See `examples/copilot-local-llm.yaml`.
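
The activation rule and env-var fallback order for structured `runtime.provider` can be sketched roughly as follows. This is a minimal illustration, not Conductor's resolver: `resolve_provider_fields` and `_ENV_FALLBACKS` are hypothetical names, plain dicts stand in for `ProviderSettings` / `SecretStr`, and `ValueError` stands in for `ProviderError`.

```python
from __future__ import annotations

import os
from typing import Mapping

# Fallback chains as documented above. OPENAI_API_KEY is deliberately absent
# from the api_key chain so an ambient key is never picked up.
_ENV_FALLBACKS: dict[str, tuple[str, ...]] = {
    "base_url": ("COPILOT_PROVIDER_BASE_URL", "OPENAI_BASE_URL"),
    "api_key": ("COPILOT_PROVIDER_API_KEY",),
    "bearer_token": ("COPILOT_PROVIDER_BEARER_TOKEN",),
}


def resolve_provider_fields(
    yaml_fields: Mapping[str, object],
    env: Mapping[str, str] | None = None,
) -> dict[str, object] | None:
    """Hypothetical helper: resolved routing fields, or None for default routing."""
    env = os.environ if env is None else env

    # Custom routing activates only when YAML sets at least one non-`name` field;
    # env vars alone never activate it.
    explicit = {k: v for k, v in yaml_fields.items() if k != "name" and v is not None}
    if not explicit:
        return None

    resolved = dict(explicit)
    for field, variables in _ENV_FALLBACKS.items():
        if resolved.get(field):
            continue  # a value from YAML wins over any env var
        for variable in variables:
            value = env.get(variable)
            if value:
                resolved[field] = value
                break

    # Stand-in for the documented ProviderError case (every resolved field falsy).
    if not any(resolved.values()):
        raise ValueError("custom routing activated but no usable field resolved")
    return resolved
```

With `{"name": "copilot", "type": "openai"}` and both `COPILOT_PROVIDER_BASE_URL` and `OPENAI_BASE_URL` set, this sketch picks the `COPILOT_PROVIDER_*` value and leaves `api_key` unset even if `OPENAI_API_KEY` exists in the environment.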

@@ -170,6 +176,7 @@ Tests mirror source structure in `tests/`:
- `test_providers/` - Provider implementation tests
- `test_integration/` - Full workflow execution tests
- `test_gates/` - Human gate tests
- `test_skills/` - Skill registry, loader, schema field, and executor-integration tests

Use `pytest.mark.performance` for performance tests (exclude with `-m "not performance"`).

161 changes: 161 additions & 0 deletions examples/skills-self-improving-workflow.yaml
@@ -0,0 +1,161 @@
# Self-improving workflow using the `conductor` skill
#
# This example shows how the generalized `skills:` capability gives an
# agent reusable, version-accurate knowledge of Conductor's YAML
# schema, execution model, and authoring patterns. It's the building
# block that a future `conductor watch` (issue #181) would chain into a
# closed-loop generate → review → fix cycle. The example demonstrates
# all three sides of that loop in one pass.
#
# Provider mechanism — same observable contract, different mechanism:
# * Copilot: the skill directory is registered on the SDK session via
# `skill_directories`. The agent discovers and loads SKILL.md and
# references on demand (progressive disclosure).
# * Claude: SKILL.md + references/*.md are eagerly prepended to the
# agent's rendered prompt inside <skills><skill name="conductor">
# ... </skill></skills> tags.
#
# Tri-state opt-in via list presence:
# * Omit `skills:` → inherit `runtime.skills`
# * `skills: []` → explicit opt-out (e.g. the `formatter` agent
# below, which doesn't need workflow knowledge)
# * `skills: [name]` → explicit set, replaces the workflow default
#
# Usage:
# conductor run examples/skills-self-improving-workflow.yaml \
# --input task="Write a workflow that summarises a GitHub issue."

workflow:
  name: skills-self-improving-workflow
  description: |
    Generate a Conductor workflow, review it for correctness against
    the bundled `conductor` skill, then revise it based on the review.
    Demonstrates the building-block pattern for future iterative
    fix-validate loops (see issue #181).
  version: "1.0.0"
  entry_point: author

  runtime:
    provider: copilot
    skills: [conductor]   # every provider-backed agent inherits this

  input:
    task:
      type: string
      required: true
      description: A short description of the workflow the author should generate.

agents:
  - name: author
    description: Drafts an initial Conductor workflow for the given task.
    prompt: |
      You are authoring a brand-new Conductor workflow.

      Task: {{ workflow.input.task }}

      Produce a complete, runnable workflow YAML that solves the task.
      Use the `conductor` skill for schema details, allowed step types,
      routing patterns, and naming conventions. Prefer a minimal,
      idiomatic design over an exhaustive one.
    output:
      workflow_yaml:
        type: string
        description: The drafted workflow YAML.
      notes:
        type: string
        description: Author's notes on design choices.
    routes:
      - to: reviewer

  - name: reviewer
    description: Reviews the drafted workflow for schema and design issues.
    prompt: |
      Review the following Conductor workflow YAML.

      ```yaml
      {{ author.output.workflow_yaml }}
      ```

      Use the `conductor` skill to verify:
        * Schema correctness (field names, types, required fields)
        * Routing soundness (no orphan agents, no unreachable routes,
          terminating routes present)
        * Sensible context mode and failure-mode choices
        * Appropriate use of step types (agent / script / set / wait /
          terminate / parallel / for_each / human_gate / workflow)

      List blocking issues separately from suggestions. If the workflow
      is already correct and idiomatic, say so plainly and return an
      empty `issues` list.
    output:
      issues:
        type: string
        description: Blocking schema or correctness issues. Empty if none.
      suggestions:
        type: string
        description: Non-blocking improvements.
      verdict:
        type: string
        description: One of "approved" or "needs_revision".
    routes:
      - to: fixer
        when: "{{ reviewer.output.verdict == 'needs_revision' }}"
      - to: formatter

  - name: fixer
    description: Applies the reviewer's findings and produces a revised workflow.
    prompt: |
      The reviewer flagged issues with the drafted workflow.

      Original:
      ```yaml
      {{ author.output.workflow_yaml }}
      ```

      Review findings:
      - Issues: {{ reviewer.output.issues }}
      - Suggestions: {{ reviewer.output.suggestions }}

      Use the `conductor` skill to produce a corrected, complete
      workflow YAML that addresses every blocking issue. Adopt
      suggestions where they materially improve the design. Output the
      final YAML and a brief changelog.
    output:
      workflow_yaml:
        type: string
        description: The revised workflow YAML.
      changelog:
        type: string
        description: Short summary of what changed and why.
    routes:
      - to: formatter

  - name: formatter
    description: Renders the final workflow + report. Doesn't need the skill.
    # Explicit opt-out: this agent only formats existing strings. No
    # need to bloat its prompt with the full skill content.
    skills: []
    prompt: |
      Produce a Markdown report containing two sections:

      ## Workflow

      ```yaml
      {{ fixer.output.workflow_yaml | default(author.output.workflow_yaml) }}
      ```

      ## Notes

      Original notes: {{ author.output.notes }}
      Review verdict: {{ reviewer.output.verdict }}
      {% if fixer.output.changelog is defined %}
      Changelog: {{ fixer.output.changelog }}
      {% endif %}
    output:
      report:
        type: string
        description: The final Markdown report.

output:
  report: "{{ formatter.output.report }}"
  verdict: "{{ reviewer.output.verdict }}"
2 changes: 1 addition & 1 deletion plugins/conductor/skills/conductor/SKILL.md
@@ -5,7 +5,7 @@ description: Validate, run, and execute workflows; creating new workflows when e

# Conductor

-CLI tool for defining and running multi-agent workflows with the GitHub Copilot SDK, Anthropic Claude, or Claude Agent SDK.
+CLI tool for defining and running multi-agent workflows with the GitHub Copilot SDK or Anthropic Claude.

> **DO NOT create new workflow files unless the user explicitly asks you to create one.** Default to running, validating, or debugging existing workflows. If the user's request is ambiguous, assume they want to run or modify an existing workflow rather than create a new one.

54 changes: 42 additions & 12 deletions plugins/conductor/skills/conductor/references/authoring.md
@@ -20,6 +20,7 @@ workflow:
    max_agent_iterations: 50          # Max tool-use roundtrips per agent (1-500, optional)
    max_session_seconds: 120          # Wall-clock timeout per agent session (optional)
    default_reasoning_effort: medium  # Workflow-wide reasoning effort: low, medium, high, xhigh (optional)
    skills: [conductor]               # Skills available to every provider-backed agent (optional)

  input:                              # Define workflow inputs
    param_name:
@@ -116,6 +117,11 @@ agents:
    reasoning:              # Override runtime.default_reasoning_effort (optional)
      effort: high          # low, medium, high, or xhigh

    skills: [conductor]     # Skills this agent has access to (optional, tri-state)
                            # Omit = inherit runtime.skills; [] = explicit opt-out;
                            # [name, ...] = explicit set. Not allowed on
                            # script/human_gate/workflow/wait/set/terminate agents.

    routes:                 # Where to go next
      - to: next_agent
```
@@ -148,6 +154,40 @@ agents:

See `examples/reasoning-effort.yaml` for a complete example.

### Skills

`skills` enables reusable knowledge or capability bundles for provider-backed agents. The Conductor distribution ships one built-in skill — `conductor` — which packages the YAML schema, execution model, and authoring patterns (the same content this reference doc covers) so an agent can evaluate, improve, debug, or generate Conductor workflows.

**Tri-state per-agent field (resolved via list presence):**
- Omit `skills:` — inherit from `runtime.skills`
- `skills: []` — explicit opt-out (no skills for this agent, regardless of workflow default)
- `skills: [name, ...]` — explicit set, replaces the workflow default

**Workflow-wide default:** `runtime.skills: [conductor]` enables it for every provider-backed agent. Individual agents can override.
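
The tri-state rule and the workflow-wide default combine into a simple resolution step. A minimal sketch, assuming an omitted `skills:` field is represented as `None`; `resolve_skills` is a hypothetical name used only for illustration, not a function from the Conductor codebase:

```python
from __future__ import annotations


def resolve_skills(
    agent_skills: list[str] | None,
    runtime_skills: list[str] | None,
) -> list[str]:
    """Hypothetical helper illustrating the tri-state rule."""
    if agent_skills is None:
        # `skills:` omitted on the agent: inherit the workflow default.
        return list(runtime_skills or [])
    # `skills: []` (opt-out) and `skills: [name, ...]` (explicit set) both
    # replace the default outright; nothing is merged.
    return list(agent_skills)
```

Under this sketch, `resolve_skills(None, ["conductor"])` yields `["conductor"]`, while `resolve_skills([], ["conductor"])` yields `[]`. The distinction between "omitted" and "empty list" is what makes the explicit opt-out possible.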

**Provider mechanism (same observable contract — "the agent has access to the named skill"):**
- **Copilot** — the resolved skill directory is registered on the SDK session via `skill_directories`, so the agent discovers and loads skill content natively (progressive disclosure via `SKILL.md` frontmatter). This is more token-efficient than eager injection.
- **Claude** — the loader reads `SKILL.md` plus every `references/*.md` file in the skill directory and prepends them to the agent's rendered prompt inside `<skills><skill name="...">...</skill></skills>` tags. The block is inserted between the workspace instructions and the user prompt.
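
For providers without native skill support, the eager-injection path can be pictured with the sketch below. It is an illustration under stated assumptions, not the actual loader: `render_skills_preamble` and `build_prompt` are hypothetical names, and the sorted reference order, whitespace trimming, and blank-line separators are guesses; only the tag structure and the placement between workspace instructions and user prompt come from the description above.

```python
from __future__ import annotations

from pathlib import Path


def render_skills_preamble(skill_dirs: dict[str, Path]) -> str:
    """Hypothetical sketch of eager skill injection for non-native providers."""
    blocks = []
    for name, directory in skill_dirs.items():
        parts = [(directory / "SKILL.md").read_text(encoding="utf-8")]
        # Assumption: references are appended in sorted filename order.
        for ref in sorted((directory / "references").glob("*.md")):
            parts.append(ref.read_text(encoding="utf-8"))
        body = "\n\n".join(part.strip() for part in parts)
        blocks.append(f'<skill name="{name}">\n{body}\n</skill>')
    return "<skills>\n" + "\n".join(blocks) + "\n</skills>"


def build_prompt(workspace_instructions: str, preamble: str, user_prompt: str) -> str:
    # The skills envelope sits between workspace instructions and the user prompt.
    return "\n\n".join(p for p in (workspace_instructions, preamble, user_prompt) if p)
```

Because every enabled skill's full content lands in the prompt, an agent that does not need it is better off with `skills: []`.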

Not allowed on `script`, `human_gate`, `workflow`, `wait`, `set`, or `terminate` agent types. Unknown skill names fail at workflow validation time.

```yaml
workflow:
  runtime:
    skills: [conductor]     # all provider-backed agents get the conductor skill

agents:
  - name: workflow_reviewer
    skills: [conductor]     # per-agent opt-in (redundant here, kept for clarity)
    prompt: "Review this workflow for correctness..."

  - name: simple_agent
    skills: []              # opt out even when runtime default is set
    prompt: "Do something simple."
```

See `examples/skills-self-improving-workflow.yaml` for a complete example.
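
The two validation rules in this section (no `skills` on non-provider-backed step types, and no unknown skill names) might be checked along these lines. This is a sketch under assumptions: `check_agent_skills` and both constants are hypothetical, `"agent"` is assumed as the type name for a provider-backed step, and treating an explicit `skills: []` on a `script` step as an error is a guess rather than documented behaviour.

```python
from __future__ import annotations

# Assumption: the set of registered built-ins; the text names only `conductor`.
REGISTERED_SKILLS = frozenset({"conductor"})
# Step types that are not provider-backed, per the list above.
NON_PROVIDER_TYPES = frozenset(
    {"script", "human_gate", "workflow", "wait", "set", "terminate"}
)


def check_agent_skills(agent_type: str, skills: list[str] | None) -> list[str]:
    """Hypothetical validator: return a list of error messages (empty if valid)."""
    if skills is None:
        return []  # field omitted: nothing to check on this agent
    if agent_type in NON_PROVIDER_TYPES:
        return [f"`skills` is not allowed on `{agent_type}` steps"]
    return [
        f"unknown skill `{name}`" for name in skills if name not in REGISTERED_SKILLS
    ]
```

Returning messages instead of raising lets a validator collect every problem in a workflow before reporting, which suits a `conductor validate`-style command.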

## Routing Patterns

### Linear
@@ -696,18 +736,8 @@ agents:
### Gate Output

Human gates automatically capture:
-- `output.selected` — the `value` of the chosen option.
-- `output.additional_input` — dict of values collected from `prompt_for` fields.
-  Always present; `{}` when no `prompt_for` was specified or the selected option
-  has no `prompt_for`. Access individual fields via templates as
-  `{{ <gate>.output.additional_input.<field> }}` (for example
-  `{{ approval_gate.output.additional_input.feedback }}` when an option declares
-  `prompt_for: feedback`).
-
-> **`context: explicit` mode note.** `input:` declarations support
-> `<gate>.output.additional_input` (the whole dict) but not the dotted shorthand
-> `<gate>.output.additional_input.<field>`. Declare the parent key and read
-> individual fields via Jinja2 in the agent's prompt or output template.
+- `output.selected` - the `value` of the chosen option
+- `output.feedback` - text input from `prompt_for` (if specified)

## Context Modes
