From 047c88a6ed639e742b9b9923f97ab380ca7d19c3 Mon Sep 17 00:00:00 2001
From: rohan-tessl <rohan-tessl@users.noreply.github.com>
Date: Thu, 23 Apr 2026 11:32:29 +0530
Subject: [PATCH] feat: improve skill-dev quality scores
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hullo 👋 @hao-cyber

I ran your skills through `tessl skill review` at work and found some targeted improvements. Here's the full before/after:

| Skill | Before | After | Change |
|-------|--------|-------|--------|
| skill-dev | 86% | 90% | +4 points |

<details>
<summary>Changes made</summary>

- **Added explicit lifecycle flow** with numbered stages and validation gates (create → use → reflect → evaluate → maturity check → publish); failing any gate loops back to an earlier stage
- **Added concrete examples to Degrees of Freedom** — replaced abstract descriptions with specific examples at each freedom level (e.g., `sys.exit(1)` for low freedom)
- **Added Quick Example section** showing a real reflect → evaluate → publish workflow with executable commands (`grep`, `run_eval.py`, `publish.py`)
- **Removed redundant "Concise is Key" subsection** — its guidance overlapped with Writing Guidelines
- **Tightened Writing Guidelines** — consolidated into four crisp bullets with a litmus test

</details>
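
For anyone skimming, here is a rough sketch of how the gated lifecycle is meant to behave. It is illustrative only and not part of the patch: `run_lifecycle`, `is_mature`, and the `gates` mapping are hypothetical names, and the rule that a failed gate at or after reflect returns to reflect is my assumption (SKILL.md only says "loop back"). The maturity thresholds are the ones stated in the new Lifecycle Flow section.

```python
# Stage names as listed in the new "Lifecycle Flow" section.
STAGES = ["create", "use", "reflect", "evaluate", "maturity check", "publish"]


def is_mature(successful_runs, recent_reflects):
    """Publish readiness as stated in SKILL.md: 3+ successful runs, no recent reflects."""
    return successful_runs >= 3 and recent_reflects == 0


def run_lifecycle(gates, loop_back_to="reflect", max_steps=20):
    """Walk the stages in order and return the list of stages visited.

    `gates` (hypothetical) maps each stage name to a zero-argument callable
    that returns True when that stage's validation gate passes.
    Assumption: a failed gate sends the skill back to `loop_back_to`, or
    retries the current stage if it comes before `loop_back_to`.
    """
    visited = []
    i = 0
    while i < len(STAGES) and len(visited) < max_steps:
        stage = STAGES[i]
        visited.append(stage)
        if gates[stage]():
            i += 1  # gate passed: advance to the next stage
        else:
            i = min(i, STAGES.index(loop_back_to))  # gate failed: loop back
    return visited


# Example: the evaluate gate fails once, then passes after another reflect.
eval_results = iter([False, True])
gates = {stage: (lambda: True) for stage in STAGES}
gates["evaluate"] = lambda: next(eval_results)
gates["maturity check"] = lambda: is_mature(successful_runs=3, recent_reflects=0)

print(" -> ".join(run_lifecycle(gates)))
```

The `max_steps` cap is only there so that a gate that never passes cannot loop forever in the sketch.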

Honest disclosure: I work at @tesslio, where we build tooling around skills like these. This isn't a pitch; I just saw room for improvement and wanted to contribute.

Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at [this Tessl guide](https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices) and ask it to optimize your skill. Ping me ([@popey](https://github.com/popey)) if you hit any snags.

Thanks in advance 🙏
---
 SKILL.md | 47 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)
diff --git a/SKILL.md b/SKILL.md
index 78defb8..9779149 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -7,19 +7,28 @@ description: "Skill 全生命周期管理：创建 → 反思优化 → 评测 
 
 Full lifecycle management for AI agent skills: create, reflect, evaluate, publish, search, install, merge, review, and uninstall.
 
-## Core Principles
+## Lifecycle Flow
+
+Each stage has a validation gate before advancing to the next:
 
-### Concise is Key
+1. **Create** → verify triggers fire correctly and scripts run (`--help` smoke test)
+2. **Use** → observe real executions for failure signals
+3. **Reflect** → after failures or user corrections, identify root cause and patch
+4. **Evaluate** → run regression tests to confirm fixes don't break other cases
+5. **Maturity check** → confirm publish readiness (3+ successful runs, no recent reflects)
+6. **Publish** → push to registry: `python3 scripts/publish.py .claude/skills/<name>/`
 
-The context window is a shared resource. Only add context Claude doesn't already have. Challenge each piece: "Does Claude really need this?" Prefer concise examples over verbose explanations.
+Fail any gate → loop back. Example: a reflect fix at step 3 requires re-evaluation at step 4 before proceeding to step 5.
+
+## Core Principles
 
 ### Degrees of Freedom
 
 Match specificity to the task's fragility:
 
-- **High freedom** (text instructions): Multiple approaches valid, context-dependent decisions
-- **Medium freedom** (pseudocode/scripts with params): Preferred pattern exists, some variation OK
-- **Low freedom** (specific scripts): Operations fragile, consistency critical, exact sequence required
+- **High freedom** (text instructions): "Choose an appropriate error message" — multiple valid answers
+- **Medium freedom** (pseudocode with params): "Format errors as `ERROR: {context} — {detail}`" — pattern fixed, content varies
+- **Low freedom** (exact scripts): `sys.exit(1)` on failure — no variation allowed
 
 ## Routing
 
@@ -36,6 +45,23 @@ Match specificity to the task's fragility:
 - **Merging skill variants** ("merge", "合并版本", "融合") → 读取 `references/merge.md`
 - **Uninstalling a skill** ("uninstall", "删除 skill", "卸载") → use `scripts/uninstall.py --name <skill> --yes`
 
+### Quick Example: Reflect → Evaluate → Publish
+
+```bash
+# 1. Reflect — skill failed on "create a meeting" (routed to doc-writer instead of calendar)
+#    Fix: narrow trigger in SKILL.md description, add negative example
+grep -rl "create" .claude/skills/ | head -5  # impact scan: find colliding triggers
+
+# 2. Evaluate — verify the fix didn't break other cases
+python3 .claude/skills/prompt-eval/scripts/run_eval.py \
+  --prompts /tmp/skill-system-prompt.md \
+  --tests .claude/skills/doc-writer/evals.yaml \
+  --task-id reflect-fix-001 --output-dir /tmp/eval-out
+
+# 3. Publish — all tests pass + 3 successful runs + no recent reflects
+python3 scripts/publish.py .claude/skills/doc-writer/
+```
+
 ## When NOT to Create a Skill
 
 Don't build for hypothetical future needs. Skip if ANY apply:
@@ -59,8 +85,7 @@ Tool design matters more than prompt design. When a skill has `scripts/`, invest
 
 ## Writing Guidelines
 
-- **Do** include: non-obvious procedures, domain specifics, gotchas from real failures
-- **Don't** include: things Claude already knows, verbose explanations, auxiliary docs
-- **Keep** SKILL.md ≤150 lines (routing layer); move scenario details to references/
-- **Challenge each line**: "Would removing this cause Claude to make mistakes?" If not, cut it.
-- **Prefer examples over explanations**: One concrete pair teaches more than a paragraph
+- **Include**: non-obvious procedures, domain specifics, gotchas from real failures
+- **Exclude**: things Claude already knows, verbose explanations, auxiliary docs
+- **Size**: SKILL.md ≤150 lines; move scenario details to `references/`
+- **Litmus test**: "Would removing this line cause Claude to make mistakes?" If not, cut it