diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS new file mode 100644 index 0000000..4434d63 --- /dev/null +++ b/.github/CODEOWNERS @@ -0,0 +1,6 @@ +# CODEOWNERS +# This file defines the code owners for this repository +# Code owners are automatically requested for review when someone opens a pull request + +# Default owners for everything in the repo +* @miroslavpojer @lsulak @oto-macenauer-absa @tmikula-dev diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 0000000..0298983 --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,17 @@ +# Copilot Instructions + +## PR Review + +When reviewing a pull request, load and apply: +https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/pr-review/SKILL.md + +### Skill-specific checks — apply when any `SKILL.md` is modified + +For every modified `SKILL.md`, also verify: +- `name` is kebab-case, matches the directory name, and is ≤ 64 chars +- `description` covers both "what it does" AND "when to trigger" with explicit trigger keywords +- `description` is ≤ 1024 chars and not padded with filler +- SKILL.md body is < 500 lines, or uses progressive disclosure via `references/` +- No hardcoded credentials, secrets, or absolute internal paths in skill body or scripts +- Any bundled script in `scripts/` is referenced from SKILL.md with clear usage guidance +- The new or modified skill's description does not conflict with or shadow existing skills diff --git a/.github/workflows/check_pr_release_notes.yml b/.github/workflows/check_pr_release_notes.yml new file mode 100644 index 0000000..af3761e --- /dev/null +++ b/.github/workflows/check_pr_release_notes.yml @@ -0,0 +1,23 @@ +name: Check PR Release Notes in Description + +on: + pull_request: + types: [opened, synchronize, reopened, edited] + branches: [ master ] + +jobs: + check-release-notes: + runs-on: ubuntu-latest + steps: + - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 + with: + 
python-version: '3.14' + + - name: Check presence of release notes in PR description + uses: AbsaOSS/release-notes-presence-check@8e586b26a5e27f899ee8590a5d988fd4780a3dbf + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + with: + github-repository: ${{ github.repository }} + pr-number: ${{ github.event.number }} + skip-labels: "no RN" diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..7a213fc --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,231 @@ +# Contributing to Agentic Toolkit + +This guide covers everything you need to author a high-quality skill for this repository — from file layout and +frontmatter rules through description writing, body guidelines, and testing. + +> New here? Read **[docs/getting-started.md](./docs/getting-started.md)** first to understand what skills are and how +> to install them before authoring your own. + +## Table of Contents + +1. [Skill structure](#1-skill-structure) +2. [Frontmatter schema](#2-frontmatter-schema) +3. [Writing the description](#3-writing-the-description) +4. [Writing the skill body](#4-writing-the-skill-body) +5. [Testing your skill](#5-testing-your-skill) +6. [Proposing a skill via a pull request](#6-proposing-a-skill-via-a-pull-request) + +## 1. Skill structure + +Each skill lives in its own folder under `skills/`: + +``` +skills/ +└── skill-name/ + ├── SKILL.md # Required — frontmatter + instructions + ├── scripts/ # Optional — executable scripts the agent can run + ├── references/ # Optional — supporting docs loaded on demand + ├── assets/ # Optional — templates, icons, example files + └── evals/ # Optional — test prompts and assertions +``` + +> **Rule:** the folder name must exactly match the `name` field in the `SKILL.md` frontmatter. 
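The folder-name rule above can be checked mechanically. The following is a minimal sketch, assuming the frontmatter is delimited by two `---` lines and that `name` sits on a single line; the helper names `frontmatter_name` and `folder_matches_name` are hypothetical and are not part of this repository or of any validator it references.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration only; not part of this repo's tooling.
# Matches a top-level `name:` line and stops at whitespace or a trailing comment.
NAME_LINE = re.compile(r"^name:\s*([^\s#]+)", re.MULTILINE)


def frontmatter_name(skill_md_text: str) -> Optional[str]:
    """Return the `name` value from the leading frontmatter block, if present."""
    if not skill_md_text.startswith("---"):
        return None
    # Assumption: the frontmatter is the text between the first two '---' markers.
    parts = skill_md_text.split("---", 2)
    if len(parts) < 3:
        return None
    match = NAME_LINE.search(parts[1])
    return match.group(1) if match else None


def folder_matches_name(folder_name: str, skill_md_text: str) -> bool:
    """True when the skill folder name equals the frontmatter `name` exactly."""
    return frontmatter_name(skill_md_text) == folder_name


example = "---\nname: pr-review\ndescription: >\n  Review pull requests.\n---\n# Body\n"
print(folder_matches_name("pr-review", example))  # matching folder
print(folder_matches_name("pr_review", example))  # mismatched folder
```

A check like this only covers the exact-match rule; the other `name` constraints would need their own tests.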
+ +### When to add each optional directory + +| Directory | Use for | +|---------------|------------------------------------------------------------------------------------------------------------------------------------------| +| `scripts/` | Deterministic or repetitive logic better run as code than described in prose (e.g. a validation script, a formatter, a data transformer) | +| `references/` | Domain docs, API specs, decision tables, or anything too large to keep in `SKILL.md` without exceeding 500 lines | +| `assets/` | Template files, example inputs/outputs, icons — anything the skill produces or consumes | +| `evals/` | Test prompts and assertions to verify skill behavior and trigger accuracy. See [skill-testing.md](./docs/skill-testing.md) | + +--- + +## 2. Frontmatter schema + +Every `SKILL.md` must open with a YAML frontmatter block: + +```yaml +--- +name: skill-name +description: > + What this skill does and the specific situations in which it should be + activated. Include trigger phrases, domains, and keywords. +license: Proprietary # optional +compatibility: GitHub Copilot # optional — only when env requirements exist +--- +``` + +### Field reference + +| Field | Required | Constraints | +|---|---|---| +| `name` | ✅ | Lowercase letters, numbers, and hyphens only. Max 64 chars. Must not start or end with a hyphen. No consecutive hyphens (`--`). Must match the parent directory name. | +| `description` | ✅ | Max 1024 chars. Non-empty. Must describe both **what** the skill does and **when** to activate it. See [Writing the description](#3-writing-the-description). | +| `license` | ➖ | Short SPDX name or reference to a bundled `LICENSE.txt`. | +| `compatibility` | ➖ | Max 500 chars. Only include if the skill has specific environment requirements (tools, Python version, network access, etc.). Most skills do not need this field. 
| + +#### Valid `name` examples + +```yaml +name: pr-review # ✅ +name: create-issue # ✅ +name: data-pipeline # ✅ +``` + +#### Invalid `name` examples + +```yaml +name: PR-Review # ❌ uppercase +name: -pr-review # ❌ starts with hyphen +name: pr--review # ❌ consecutive hyphens +name: pr_review # ❌ underscores not allowed +``` + +--- + +## 3. Writing the description + +The `description` field is the **primary triggering mechanism**. The agent never reads your skill body until it decides the description matches the current task. A weak description means the skill never fires, no matter how good the body is. + +### What to include + +1. **What it does** — a concise statement of the skill's output or capability +2. **When to use it** — explicit trigger phrases, domains, user intents +3. **Keywords** — include both formal terms and casual phrasings a real user might type + +### Be slightly "pushy" + +Claude tends to under-trigger skills. Lean toward explicit activation language: + +```yaml +# Too vague — will often not trigger +description: Helps with pull request reviews. + +# Better — explicit about when to activate +description: > + Pull request code review. Activate when asked to review a PR, check a diff, + or give feedback on code changes. Covers standard risk, elevated risk, API + contracts, dependency bumps, CI/CD changes, and infrastructure changes. + Applies the relevant sections based on what files the PR touches. + Produces concise comments grouped by severity: Blocker / Important / Nit. +``` + +### Length guidelines + +- **Aim for 150–400 characters** for most skills +- Do not pad to the 1024-char limit — filler dilutes signal +- Do not put "when to use" information only in the body; it belongs in `description` + +### Good vs. poor examples + +| | Example | +|---|---| +| ✅ Good | `"Extracts text and tables from PDF files, fills PDF forms, and merges multiple PDFs. 
Use when working with PDF documents or when the user mentions PDFs, forms, or document extraction."` | +| ❌ Poor | `"Helps with PDFs."` | +| ✅ Good | `"Creates a GitHub issue from a natural language prompt. Triggers on requests like 'create an issue for X', 'open a bug report about Y', 'file a feature request for Z', 'add a ticket for W'."` | +| ❌ Poor | `"Opens GitHub issues."` | + +--- + +## 4. Writing the skill body + +### Size and progressive disclosure + +- **Target under 500 lines** for `SKILL.md`. If you are approaching this limit, move supporting detail into `references/` files and add clear pointers in `SKILL.md` telling the agent when and how to load them. +- For large reference files (> 300 lines), include a table of contents at the top. +- When a skill supports multiple distinct domains or frameworks, create a `references/` file per domain and let the skill body select which one to load based on context. + +``` +cloud-deploy/ +├── SKILL.md # workflow + selection logic +└── references/ + ├── aws.md + ├── gcp.md + └── azure.md +``` + +### Add only what the agent lacks + +Focus on what the agent *would not* know without the skill: project-specific conventions, non-obvious edge cases, the particular APIs or tools to use, and team standards. Do not explain what a PDF is, how HTTP works, or what a migration does — the agent already knows. + +```markdown + +PDF (Portable Document Format) files are common documents that contain text +and images. To extract text you need a library. pdfplumber is recommended. + + +Use pdfplumber for text extraction. For scanned documents, fall back to +pdf2image + pytesseract. +``` + +### Explain the why, not just the what + +Prefer explaining *why* over issuing directives. Today's models respond better to reasoning than to rigid commands. + +```markdown + +ALWAYS use the imperative form. NEVER use passive voice. + + +Use imperative form in instructions (e.g. 
"Run the linter" not "The linter +should be run") — it is clearer and easier for the agent to follow. +``` + +### Bundle reusable scripts + +If every test run of your skill independently writes the same helper script (a formatter, a validator, a transformer), bundle it in `scripts/` and reference it from `SKILL.md`. This saves every future invocation from reinventing the wheel. + +### Format conventions + +- Use `##` and `###` headings to structure the body +- Use numbered lists for sequential steps, bullet lists for non-ordered items +- Include short worked examples where they add clarity +- Keep code blocks minimal — a representative snippet beats an exhaustive reference + +### Effective body patterns + +| Pattern | When to use | +|---|---| +| **Gotchas** | Environment-specific facts the agent will get wrong without being told. Keep in `SKILL.md` itself — the agent reads it before encountering the situation. | +| **Output template** | When you need a specific output format. A concrete template is more reliable than describing the format in prose. | +| **Checklist** | Multi-step workflows where skipping a step causes downstream failures: `- [ ] Step 1: Run scripts/validate.py`. | +| **Validation loop** | Any task where the agent should self-check before finishing: do → run validator → fix errors → repeat until clean. | +| **Plan-validate-execute** | Batch or destructive operations: generate a plan file → validate it against a source of truth → execute. | + +--- + +## 5. Testing your skill + +Before proposing a PR, verify that your skill activates correctly and produces good output. The full testing +methodology — eval creation, fixture management, with/without comparisons, trigger testing, and description +optimization using the Anthropic [`skill-creator`](https://github.com/anthropics/skills/tree/main/skills/skill-creator) +skill — is covered in **[docs/skill-testing.md](./docs/skill-testing.md)**. + +--- + +## 6. Proposing a skill via a pull request + +1. 
**Open an issue first** using + the [Skill Proposal template](https://github.com/AbsaOSS/agentic-toolkit/issues/new/choose) to discuss scope + before writing code +2. Create your skill folder under `skills/` following the structure in [Skill structure](#1-skill-structure) +3. Run the tests described in [Testing your skill](#5-testing-your-skill) and include benchmark results in the PR description +4. Open a pull request; CODEOWNERS will be automatically requested for review +5. Optionally, add `Copilot` as a reviewer to get automated skill quality feedback + +### PR checklist + +Before opening a pull request, verify: + +- [ ] Folder name matches the `name` frontmatter field exactly +- [ ] `name` is kebab-case, ≤ 64 chars, no consecutive hyphens +- [ ] `description` covers both *what it does* and *when to trigger*, with explicit keywords +- [ ] `description` is ≤ 1024 chars and not padded with filler +- [ ] `SKILL.md` body is < 500 lines, or uses progressive disclosure via `references/` +- [ ] No hardcoded credentials, secrets, or internal paths in skill body or scripts +- [ ] Any script in `scripts/` is referenced from `SKILL.md` with usage guidance +- [ ] New skill's description does not conflict with or shadow existing skills +- [ ] Evals exist (or a note explains why they are not applicable) +- [ ] `skills-ref validate ./skills/my-skill` passes (install: `pip install skills-ref`) diff --git a/README.md b/README.md index 60e4693..f90df00 100644 --- a/README.md +++ b/README.md @@ -1 +1,118 @@ -# agentic-toolkit \ No newline at end of file +# Agentic Toolkit + +Curated, reusable AI skills — instructional context, domain conventions, and callable agent tools for AI-assisted +engineering. + +Every AI assistant starts from zero — no accumulated engineering experience, no reusable workflows. This repo fixes +that. 
It's a curated library of **skills**: bundles of domain knowledge, engineering patterns, and proven practices +packaged for any [Agent Skills](https://agentskills.io)-compatible client (Copilot, Claude, Cursor, and others). The +knowledge here is not internal or proprietary — it reflects universal software engineering practices used at ABSA. +In future, the scope may expand to agents, MCP servers, and plugins as agentic tooling matures (see [Scope](#scope)). + +**Who is this for?** Anyone who uses AI in their engineering work. The primary audience is technical: engineers, tech +leads, architects, or anyone on a technical team. + +## Table of Contents + +- [Scope](#scope) +- [How Skills Relate To Your Own](#how-skills-relate-to-your-own) +- [Quickstart](#quickstart) +- [Skill Catalog](#skill-catalog) +- [Finding More Skills](#finding-more-skills) +- [Contributing](#contributing) +- [FAQ](#faq) +- [Troubleshooting](#troubleshooting) + +## Scope + +This repo currently focuses on **Skills** — folders of instructions, scripts, and resources that an AI +agent can load on demand to improve its performance on specialised tasks. + +As our use of agentic tooling matures, the scope may expand to include: + +- **Agents** — custom agent definitions for recurring engineering tasks (code review, architecture, research) +- **MCP Servers** — callable tool servers giving agents access to internal systems and data +- **Plugins** — bundled distributions packaging skills, agents, and MCP servers into a single install + +If and when that happens, the repo structure, name, and install instructions will be updated accordingly. For now, +skills are the right place to start — they are the lowest-friction, highest-value layer, and the foundation everything +else builds on. + +## How Skills Relate To Your Own + +The skills in this repo are the **base layer** — shared conventions and capabilities applicable across teams and +projects.
You load them into your personal or project space and extend with your own layers, so every session inherits +the team's collective knowledge or workflows. + +If something might help the wider team, it belongs here. + +``` +Agentic Skills ← shared base (this repo) + └── Personal Skills ← your individual skills (~/.copilot/skills/ or ~/.agents/skills/) + └── Project Skills ← repository-specific (most commonly in .github/skills/) +``` + +Each layer extends the one above — personal and project layers are **extensions**, not replacements. + +## Quickstart + +Skills are managed through the [Vercel Skills CLI](https://github.com/vercel-labs/skills) or via +[GitHub Copilot CLI](https://github.com/github/copilot-cli) slash commands. + +```bash +# Install all skills globally +npx skills add https://github.com/AbsaOSS/agentic-toolkit -g + +# Or install a single skill +npx skills add https://github.com/AbsaOSS/agentic-toolkit -g --skill pr-review +``` + +For the full guide — what skills are, how they activate, project-scoped installs, updates, Copilot CLI commands — see +**[docs/getting-started.md](./docs/getting-started.md)**. + +## Skill Catalog + +Browse all available skills in the **[skills/](./skills/)** directory — each skill folder contains a `SKILL.md` with +its purpose, trigger phrases, and full instructions. + +> The catalog table will be populated as skills are added. See `skills/` for the current set. 
+ +## Finding More Skills + +Before building a new skill, check whether one already exists: + +| Source | What's Available | +|--------------------------------------------------------------------------------------|-------------------------------------------------------------------------| +| [github/awesome-copilot](https://github.com/github/awesome-copilot/tree/main/skills) | 200+ community skills: cloud, languages, security, DevOps, productivity | +| [skills.sh](https://skills.sh) | Open registry — install with `npx skills add ` | +| [anthropics/skills](https://github.com/anthropics/skills) | Anthropic reference skills including `skill-creator` | +| [absa-group/agent-skills](https://github.com/absa-group/agent-skills) | Broader ABSA-owned skill collection | +| [absa-group/cps-agentic-toolkit](https://github.com/absa-group/cps-agentic-toolkit) | CPS team's skill set built on top of this repo | + +## Contributing + +See **[CONTRIBUTING.md](./CONTRIBUTING.md)** for the skill authoring guide — folder layout, frontmatter schema, writing +effective descriptions and bodies, [testing](./docs/skill-testing.md), and the PR checklist. + +To propose a new skill — or to propose expanding the repo into agents, MCP servers, or +plugins — [open an issue](https://github.com/AbsaOSS/agentic-toolkit/issues/new). + +Contributions are welcome from anyone. + +## FAQ + +### What's the difference between a skill, an agent, and an MCP server? + +A **skill** gives the agent instructions — prose it reads into context when a task matches. +An **agent** is a full persona: a system prompt, tools, and behavioural constraints bundled together — it may *use* +multiple skills. An **MCP server** gives the agent callable tools for live integrations and API calls. +Think of skills as books, agents as the people who read them, and MCP servers as the phone lines they dial. + +### Can I use these skills outside of GitHub Copilot? + +Yes. 
Skills follow the open [Agent Skills](https://agentskills.io) standard and work with any compatible tool — +Claude, Cursor, Windsurf, and custom pipelines. + +## Troubleshooting + +Setup issues and common fixes are covered in **[docs/troubleshooting.md](./docs/troubleshooting.md)**. diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..a13b50d --- /dev/null +++ b/docs/README.md @@ -0,0 +1,14 @@ +# Documentation + +Navigation hub for all guides in this repository. Browse by category below. + +| Guide | Audience | Description | +|-----------------------------------------|----------|------------------------------------------------------------------------------------| +| [Getting Started](./getting-started.md) | Users | What skills are, how to install them, Copilot CLI usage | +| [Troubleshooting](./troubleshooting.md) | Users | Setup guides and fixes for install, activation, and proxy issues | +| [Contributing](../CONTRIBUTING.md) | Authors | Skill folder layout, frontmatter, description writing, body guidelines, PR process | +| [Skill Testing](./skill-testing.md) | Authors | Eval creation, fixtures, regression loops, trigger and description optimization | + +> **Keep this index up to date.** When you add a new guide under `docs/`, add a row to the table above. + +See also the [main README](../README.md) for the skill catalog, scope, and FAQ. diff --git a/docs/getting-started.md b/docs/getting-started.md new file mode 100644 index 0000000..d5402a3 --- /dev/null +++ b/docs/getting-started.md @@ -0,0 +1,147 @@ +# Getting Started + +This guide covers what skills are, how to install them, and how to use them in your AI-assisted engineering workflow. + +## What Is a Skill? + +A **skill** is a folder containing instructions, scripts, and reference material that an AI agent loads on demand. +Skills follow the open [Agent Skills](https://agentskills.io) standard and work with GitHub Copilot, Claude, Cursor, and +other compatible tools. 
+ +Skills manage LLM context efficiently — the agent does **not** receive all skill content at startup. Instead, it loads +content progressively: + +| Phase | What Happens | +|----------------|-----------------------------------------------------------------------------------------------------------| +| **Discovery** | At startup, the agent loads only the `name` and `description` of each installed skill | +| **Activation** | When your task matches a skill's description, the agent reads the full `SKILL.md` into context | +| **Execution** | The agent follows the skill's instructions, optionally loading reference files or running bundled scripts | + +> ⚠️ **The `description` field is the sole activation signal.** If a skill isn't firing, your prompt likely doesn't +> match its description keywords. Rephrase your message to include relevant trigger terms from the skill's description. +> Inside a Copilot CLI session, run `/skills list` to inspect loaded descriptions. +> Outside the CLI, you can run `npx skills list -g` to see all the installed skills. + +## Prerequisites: Install Copilot CLI + +We use the [GitHub Copilot CLI](https://github.com/github/copilot-cli) as the primary AI-assisted engineering tool. 
+ +```bash +# macOS / Linux +curl -fsSL https://gh.io/copilot-install | bash + +# macOS (Homebrew) +brew install copilot-cli + +# Any platform (npm) +npm install -g @github/copilot +``` + +Then launch a session from your project directory: + +```bash +copilot +``` + +## Install Skills + +### Option A: Vercel Skills CLI (recommended) + +The [Vercel Skills CLI](https://github.com/vercel-labs/skills) automates installation via `npx`: + +```bash +# List available skills from this repo without installing them +npx skills add https://github.com/AbsaOSS/agentic-toolkit --list + +# Install all skills globally (available in every project) +npx skills add https://github.com/AbsaOSS/agentic-toolkit -g + +# Install a single skill globally (--skill filters by folder name within the repo) +npx skills add https://github.com/AbsaOSS/agentic-toolkit -g --skill pr-review + +# Install project-scoped (into .github/skills/ of the current repo) +npx skills add https://github.com/AbsaOSS/agentic-toolkit +``` + +> To share project-scoped skills with your team, commit the `.github/skills/` directory. + +On first install, `npx skills add` asks two questions interactively: + +| Prompt | Options | Recommendation | +|---------------------------------|-----------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **Installation method** | `Symlink` / `Copy` | Use **Symlink** — a single copy lives on disk, and `npx skills update` refreshes it in place. Use **Copy** only if symlinks aren't supported in your environment. | +| **Which agents to install to?** | A specific agent directory / **Copy to all agents** | Choose **Copy to all agents** if you want the skills available in Copilot, Claude, Cursor, and any other compatible tool simultaneously. Pick a specific directory to limit the scope. 
| + +These choices are recorded and reused on subsequent updates — you won't be asked again unless you reinstall from +scratch. The global skills directory used is typically `~/.agents/skills/` (recommended for cross-tool compatibility) +or `~/.copilot/skills/`. See [Skills directory location varies](./troubleshooting.md#skills-directory-location-varies) +if you need +to change it later. + +#### Update installed skills + +```bash +npx skills update # interactive — prompts for scope +npx skills update -g # update global skills +npx skills update -y # auto-detects scope: project-scope if running within a project, else global +``` + +### Option B: Manual install + +Clone this repo and copy skills into your personal space: + +```bash +git clone https://github.com/AbsaOSS/agentic-toolkit.git +cp -r agentic-toolkit/skills/*/ ~/.agents/skills/ +``` + +Skills copied into `~/.agents/skills/` (or `~/.copilot/skills/`) are loaded automatically at every session start — no +further action needed. + +## Verify Skills Are Loaded + +Inside a Copilot CLI session: + +``` +/skills list # See all loaded skills and their descriptions +/env # Full environment: skills, agents, MCP servers, instructions +``` + +Confirm the skills appear in the list. + +## Using Skills in Copilot CLI + +GitHub Copilot CLI resolves skills from two locations: + +| Scope | Location | When To Use | +|-------------|---------------------------------------------|-------------------------------------------------------| +| **Global** | `~/.agents/skills/` or `~/.copilot/skills/` | Available across all projects | +| **Project** | `.github/skills/` | Specific to one repo — commit to share with your team | + +Project skills take precedence over global skills when both exist. + +### Managing skills mid-session + +``` +/skills add # Add a skill to the current session (temporary) +/skills remove # Remove a skill from the current session (temporary) +``` + +> Session-level changes are temporary. 
Use `npx skills add` or the manual copy for permanent changes. + +### Add project-specific skills + +For skills that only apply to a specific repository, place them in `.github/skills/` within that repo. These are loaded +automatically when Copilot CLI is launched from that directory, layered on top of your personal and shared base skills. + +``` +your-project-repo/ +└── .github/ + └── skills/ + └── your-project-skill/ + └── SKILL.md +``` + +## Troubleshooting + +Running into issues? See the **[docs/troubleshooting.md](./troubleshooting.md)** guide. diff --git a/docs/skill-testing.md b/docs/skill-testing.md new file mode 100644 index 0000000..49aee05 --- /dev/null +++ b/docs/skill-testing.md @@ -0,0 +1,115 @@ +# Skill Testing Guide + +This document provides a comprehensive methodology for testing, evaluating, and optimizing skills in this repository. It covers eval creation, fixture management, iterative improvement, trigger testing, and description optimization. + +--- + +## 1. Recommended workflow + +1. Create eval cases in `skills//evals/evals.json` +2. Add fixture files under `skills//evals/fixtures/` when prompts depend on local files +3. Start a Copilot CLI session from the repository root +4. Ask Copilot to use the `skill-creator` skill to test the target skill +5. Review outputs and diffs +6. Improve the skill description or body +7. Re-run targeted evals, then the full suite +8. Repeat until regressions are stable and output quality is consistently better + +## 2. Eval Cases + +Store evals in `skills//evals/evals.json`: + +```json +{ + "skill_name": "my-skill", + "evals": [ + { + "id": "happy-path-1", + "prompt": "A realistic prompt a real user would type, with concrete detail", + "expected_output": "Short description of what a successful result should do", + "files": [] + } + ] +} +``` + +Write prompts that look like real user requests, not abstract test descriptions. Include a mix of happy-path, regression, output-format, edge, negative, and paraphrased cases.
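Before handing an eval file to a test run, it can help to sanity-check its shape. The sketch below mirrors the keys shown in the JSON example above; treating all four per-case keys as required and rejecting duplicate `id` values are assumptions made for illustration, and `check_evals` is a hypothetical helper, not something `skill-creator` provides.

```python
import json

# Keys taken from the example eval entry; treating them all as required
# is an assumption of this sketch, not a documented rule.
REQUIRED_KEYS = ("id", "prompt", "expected_output", "files")


def check_evals(data):
    """Return a list of human-readable problems found in parsed evals.json data."""
    problems = []
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    skill_name = data.get("skill_name")
    if not isinstance(skill_name, str) or not skill_name:
        problems.append("skill_name is missing or empty")
    evals = data.get("evals")
    if not isinstance(evals, list) or not evals:
        problems.append("evals must be a non-empty list")
        return problems
    seen_ids = set()
    for index, case in enumerate(evals):
        if not isinstance(case, dict):
            problems.append(f"evals[{index}] must be an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in case:
                problems.append(f"evals[{index}] is missing '{key}'")
        case_id = case.get("id")
        if case_id is not None:
            if case_id in seen_ids:
                problems.append(f"duplicate id: {case_id}")
            seen_ids.add(case_id)
    return problems


sample = json.loads(
    '{"skill_name": "my-skill", "evals": [{"id": "happy-path-1", '
    '"prompt": "p", "expected_output": "e", "files": []}]}'
)
print(check_evals(sample))
```

An empty list means the file has the expected shape; it says nothing about whether the prompts themselves are realistic.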
+ +## 3. Fixtures + +If the skill operates on files, place sample inputs in `skills//evals/files/` and reference those files from the eval entries. Use small, representative fixtures. + +## 4. With/Without Skill Comparisons + +Ask for a side-by-side comparison in the Copilot CLI session: + +``` +Use the skill-creator skill to compare outputs for skills/my-skill with and without the skill enabled. +``` + +Compare correctness, completeness, structure, latency, verbosity, and formatting stability. + +## 5. Reviewing Outputs and Diffs + +Review changed outputs and fixture diffs. Classify each change as improvement, acceptable variation, regression, or unclear. Avoid making fixtures so strict that harmless wording improvements fail the test. + +## 6. Iterative Improvement + +When an eval fails, update the smallest possible part of the skill and re-run the affected cases first. Avoid large rewrites unless the skill is fundamentally mis-scoped. + +## 7. Regression-First Loop + +1. Run the full eval suite and save the baseline +2. Pick the largest failure cluster +3. Make one small edit to the skill +4. Re-run only affected evals and regressions +5. Review output diffs +6. Run the full suite again +7. Keep or revert the change +8. Repeat until stable + +## 8. Body vs. Trigger Testing + +- **Body testing:** Use `evals/evals.json` to verify that once the skill is active, it behaves correctly. +- **Trigger testing:** Create `skills//evals/trigger-eval.json` to test whether the skill activates for the right queries. + +## 9. Description Optimization + +Ask Copilot to optimize the description against your trigger eval set: + +``` +Use the skill-creator skill to optimize the description for skills/my-skill using skills/my-skill/evals/trigger-eval.json. +``` + +## 10. 
What “good enough” looks like + +- All smoke tests and known regressions passing +- No critical format failures +- Positive or neutral delta versus baseline +- Stable behavior across paraphrases +- Strong trigger accuracy on both train and validation queries + +## 11. Practical Tips + +- Use realistic prompts with concrete nouns, filenames, and intent +- Prefer several focused evals over one large vague eval +- Keep fixtures small and scenario-specific +- Update fixtures only after confirming the new behavior is actually better +- Do not optimize solely for exact string matches +- If a change improves one eval but harms unrelated ones, revert it + +## 12. Minimal CLI Loop + +``` +copilot +→ "Use the skill-creator skill to test my skill at skills/my-skill" +→ inspect results and diffs +→ edit SKILL.md or fixtures +→ "Use the skill-creator skill to rerun the evals for skills/my-skill" +→ optimize description if needed +→ repeat until stable +``` + +--- + +For more on evals, see [Demystifying Evals for AI Agents](https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents) and the `skill-creator` skill documentation. diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md new file mode 100644 index 0000000..d1121c3 --- /dev/null +++ b/docs/troubleshooting.md @@ -0,0 +1,48 @@ +# Troubleshooting + +Fixes for known setup issues, plus setup guides for specific skills. + +--- + +## Skill not activating + +The `description` field is the **sole activation signal** — the agent never reads a skill's body until it decides +the description matches the current task. + +Steps to diagnose: + +1. Inside a Copilot CLI session, run `/skills list` to confirm the skill is loaded and inspect its description. +2. Outside the CLI, run `npx skills list` or open the skill's `SKILL.md` directly. +3. Rephrase your prompt to include keywords that appear in the description — both formal terms and casual phrasings. +4.
If the skill still won't trigger, consider improving the description. See the + [Contributing Guide](../CONTRIBUTING.md#3-writing-the-description) for guidance. + +## SSL / certificate errors (corporate proxy) + +If `npx skills` fails with SSL or certificate errors, you are likely behind a corporate proxy or TLS inspection +tool. Configure npm to trust your organisation's certificate bundle: + +```bash +npm config set cafile /path/to/your/ca-bundle.pem +``` + +This writes the setting to `~/.npmrc`. Obtain the certificate bundle from your organisation's IT or security team. + +## Skill removal doesn't update the lock file + +`npx skills remove` removes the skill directory but does not always clean up internal state. If you see stale +references after removing a skill, delete the corresponding entry from the skills lock file (typically +`.skills-lock.json`) manually. See +[Managing AI Agent Skills with npx skills](https://dev.to/toyama0919/managing-ai-agent-skills-with-npx-skills-a-practical-guide-2an8) +for a detailed walkthrough. + +## Skills directory location varies + +The global skills directory depends on which location you chose when you first ran `npx skills add`. The default +is `~/.agents/skills/`, but `~/.copilot/skills/` and other paths are valid. We recommend `~/.agents/skills/` for +cross-tool compatibility. + +## Skills not loading in a project session + +Make sure project-scoped skills live in `.github/skills/` at the root of the repo *and* that you launched the +Copilot CLI from inside that repo's directory. Skills in subdirectories or in differently named folders are ignored.
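The layout rule in the last section can be expressed as a simple path filter. This is a rough sketch, assuming that a project skill counts only when its `SKILL.md` sits exactly at `.github/skills/<folder>/SKILL.md` relative to the repo root; `project_skill_names` is a hypothetical helper and does not reproduce how the Copilot CLI actually resolves skills.

```python
from pathlib import PurePosixPath


def project_skill_names(relative_paths):
    """Return skill folder names whose SKILL.md sits directly under .github/skills/.

    Assumption for illustration: anything nested deeper, or under a
    differently named directory, is treated as not loadable.
    """
    names = []
    for raw in relative_paths:
        parts = PurePosixPath(raw).parts
        if (
            len(parts) == 4
            and parts[:2] == (".github", "skills")
            and parts[3] == "SKILL.md"
        ):
            names.append(parts[2])
    return sorted(names)


candidates = [
    ".github/skills/pr-review/SKILL.md",       # expected layout
    ".github/skills/nested/deep/SKILL.md",     # one level too deep
    "skills/pr-review/SKILL.md",               # wrong parent directory
]
print(project_skill_names(candidates))
```

Feeding it the output of a file listing from the repo root gives a quick view of which folders match the documented layout.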