From 047c88a6ed639e742b9b9923f97ab380ca7d19c3 Mon Sep 17 00:00:00 2001
From: rohan-tessl
Date: Thu, 23 Apr 2026 11:32:29 +0530
Subject: [PATCH] feat: improve skill-dev quality scores
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hullo 👋

@hao-cyber I ran your skills through `tessl skill review` at work and found some targeted improvements. Here's the full before/after:

| Skill | Before | After | Change |
|-------|--------|-------|--------|
| skill-dev | 86% | 90% | +4% |

Changes made

- **Added explicit lifecycle flow** with numbered stages and validation gates (create → use → reflect → evaluate → maturity check → publish), plus a loop-back to an earlier stage whenever a gate fails
- **Added concrete examples to Degrees of Freedom**: replaced abstract descriptions with a specific example at each freedom level (e.g., `sys.exit(1)` for low freedom)
- **Added Quick Example section** showing a real reflect → evaluate → publish workflow with executable commands (`grep`, `run_eval.py`, `publish.py`)
- **Removed redundant "Concise is Key" subsection**: its guidance overlapped with Writing Guidelines
- **Tightened Writing Guidelines**: consolidated into four crisp bullets, including a litmus test

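To make the gates in the first bullet concrete, here is a minimal Python sketch of the flow as a state machine. It is illustrative only and not part of skill-dev: `Run`, `maturity_check`, and `next_stage` are hypothetical names, and two rules are assumptions made for this sketch rather than anything the patch specifies: "no recent reflects" is read as "the last three runs all succeeded and none triggered a reflect", and a failed gate returns to Reflect (a failed Create gate simply retries Create).

```python
from dataclasses import dataclass

# Stage order taken from the "Lifecycle Flow" list in the patch.
STAGES = ["create", "use", "reflect", "evaluate", "maturity_check", "publish"]


@dataclass
class Run:
    """Hypothetical record of one real execution of a skill (assumption)."""
    succeeded: bool
    reflected: bool  # True if this run led to a reflect/patch cycle


def maturity_check(runs: list[Run], min_successes: int = 3) -> bool:
    """Assumed gate: the last `min_successes` runs all succeeded
    and none of them triggered a reflect."""
    recent = runs[-min_successes:]
    return len(recent) == min_successes and all(
        r.succeeded and not r.reflected for r in recent
    )


def next_stage(stage: str, gate_passed: bool) -> str:
    """Advance one stage if the gate passed; otherwise loop back.

    Assumed loop-back rule: a failed Create gate retries Create, and any
    later failure returns to Reflect, so every fix is re-evaluated
    before the maturity check.
    """
    if not gate_passed:
        return "create" if stage == "create" else "reflect"
    i = STAGES.index(stage)
    return STAGES[min(i + 1, len(STAGES) - 1)]  # "publish" is terminal


runs = [Run(True, False), Run(False, True), Run(True, False),
        Run(True, False), Run(True, False)]
print(maturity_check(runs))           # True: last three runs are clean
print(maturity_check(runs[:4]))       # False: a reflect is still in the window
print(next_stage("evaluate", False))  # reflect
```

Sending every post-Create failure through Reflect is what enforces the patch's rule that a step-3 fix is re-evaluated at step 4 before the maturity check: Reflect's only successor is Evaluate.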
Honest disclosure: I work at @tesslio, where we build tooling around skills like these. Not a pitch; I just saw room for improvement and wanted to contribute.

Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at [this Tessl guide](https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices) and ask it to optimize your skill.

Ping me - [@popey](https://github.com/popey) - if you hit any snags.

Thanks in advance 🙏

---
 SKILL.md | 47 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/SKILL.md b/SKILL.md
index 78defb8..9779149 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -7,19 +7,28 @@ description: "Skill full lifecycle management: create → reflect & optimize → evaluate
 
 Full lifecycle management for AI agent skills: create, reflect, evaluate, publish, search, install, merge, review, and uninstall.
 
-## Core Principles
+## Lifecycle Flow
+
+Each stage has a validation gate that must pass before advancing to the next:
 
-### Concise is Key
+1. **Create** → verify triggers fire correctly and scripts run (`--help` smoke test)
+2. **Use** → observe real executions for failure signals
+3. **Reflect** → after failures or user corrections, identify the root cause and patch it
+4. **Evaluate** → run regression tests to confirm the fix doesn't break other cases
+5. **Maturity check** → confirm publish readiness (3+ successful runs, no recent reflects)
+6. **Publish** → push to registry: `python3 scripts/publish.py .claude/skills//`
 
-The context window is a shared resource. Only add context Claude doesn't already have. Challenge each piece: "Does Claude really need this?" Prefer concise examples over verbose explanations.
+If any gate fails, loop back. Example: a fix made at step 3 (Reflect) must be re-evaluated at step 4 before proceeding to step 5.
+
+## Core Principles
 
 ### Degrees of Freedom
 
 Match specificity to the task's fragility:
 
-- **High freedom** (text instructions): Multiple approaches valid, context-dependent decisions
-- **Medium freedom** (pseudocode/scripts with params): Preferred pattern exists, some variation OK
-- **Low freedom** (specific scripts): Operations fragile, consistency critical, exact sequence required
+- **High freedom** (text instructions): "Choose an appropriate error message", where multiple answers are valid
+- **Medium freedom** (pseudocode with params): "Format errors as `ERROR: {context} — {detail}`", where the pattern is fixed and only the content varies
+- **Low freedom** (exact scripts): `sys.exit(1)` on failure, with no variation allowed
 
 ## Routing
 
@@ -36,6 +45,23 @@ Match specificity to the task's fragility:
 - **Merging skill variants** ("merge", "合并版本", "融合") → read `references/merge.md`
 - **Uninstalling a skill** ("uninstall", "删除 skill", "卸载") → use `scripts/uninstall.py --name --yes`
 
+### Quick Example: Reflect → Evaluate → Publish
+
+```bash
+# 1. Reflect: skill failed on "create a meeting" (routed to doc-writer instead of calendar)
+# Fix: narrow the trigger in the SKILL.md description and add a negative example
+grep -rl "create" .claude/skills/ | head -5  # impact scan: find colliding triggers
+
+# 2. Evaluate: verify the fix didn't break other cases
+python3 .claude/skills/prompt-eval/scripts/run_eval.py \
+  --prompts /tmp/skill-system-prompt.md \
+  --tests .claude/skills/doc-writer/evals.yaml \
+  --task-id reflect-fix-001 --output-dir /tmp/eval-out
+
+# 3. Publish: all tests pass + 3 successful runs + no recent reflects
+python3 scripts/publish.py .claude/skills/doc-writer/
+```
+
 ## When NOT to Create a Skill
 
 Don't build for hypothetical future needs. Skip if ANY apply:
@@ -59,8 +85,7 @@ Tool design matters more than prompt design. When a skill has `scripts/`, invest
When a skill has `scripts/`, invest ## Writing Guidelines -- **Do** include: non-obvious procedures, domain specifics, gotchas from real failures -- **Don't** include: things Claude already knows, verbose explanations, auxiliary docs -- **Keep** SKILL.md โ‰ค150 lines (routing layer); move scenario details to references/ -- **Challenge each line**: "Would removing this cause Claude to make mistakes?" If not, cut it. -- **Prefer examples over explanations**: One concrete pair teaches more than a paragraph +- **Include**: non-obvious procedures, domain specifics, gotchas from real failures +- **Exclude**: things Claude already knows, verbose explanations, auxiliary docs +- **Size**: SKILL.md โ‰ค150 lines; move scenario details to `references/` +- **Litmus test**: "Would removing this line cause Claude to make mistakes?" If not, cut it