CraftKit

CraftKit is a cross-agent toolkit for creating, improving, and operationalizing prompts and skills for coding agents such as Claude Code and Codex.

Why CraftKit

Prompt assets and agent skills often become fragmented, provider-specific, and hard to reuse. CraftKit exists to keep them file-first, portable, reviewable, and easy to improve over time.

CraftKit is an artifact-quality toolkit: it helps author, critique, tune, and carry forward prompts and skills. It is not a general coding-agent workflow suite, project-management layer, deployment system, or runtime framework. When a workflow needs those things, CraftKit should produce clear files, specs, or handoffs that another tool can use rather than becoming the tool itself.

Install

All eight skills install as Agent Skills and are invocable in Claude Code through slash-style skill discovery. They are also plain SKILL.md files that can be used by Codex and other compatible agents.

Via npx skills

npx skills add sungjunlee/craftkit

Add -g -y for global install without prompts:

npx skills add sungjunlee/craftkit -g -y

Via Claude Code Plugin Marketplace

/plugin marketplace add https://github.com/sungjunlee/craftkit.git
/plugin install craftkit@craftkit

Install from a local clone

git clone https://github.com/sungjunlee/craftkit.git
cd craftkit
npx skills add . -g -y

For Codex or any other agent, see Use in other agents below.

The eight skills

Skill	Use when	Side effect
`craft-prompt`	a new prompt is needed from scratch for any LLM (Claude, GPT, Gemini, Perplexity, etc.)	returns copy-pasteable text
`craft-skill-spec`	a new skill needs a concrete spec based on current CraftKit skill-radar judgments before writing `SKILL.md`	returns a spec; reads radar references
`craft-harness`	a project-specific agent harness needs to be built, repaired, synced, pruned, or evolved across Codex and Claude Code	may inspect and edit repo-local harness files; gates risky surfaces
`craft-critique`	an existing prompt or skill needs a read-only review before editing or shipping	surfaces strengths, prioritized findings, recommendations, and a rewrite plan without editing
`craft-tune`	an existing prompt or skill needs sharpening applied	runs an autonomous review-and-fix loop and edits the artifact
`craft-survey`	a new skill should be grounded in prior art before drafting	returns read-only recommendations
`craft-autoresearch`	a prompt or skill works "sometimes" and needs eval-driven iteration	runs evals and may edit mutable files
`craft-handoff`	a session is ending and the next session needs a copy-paste-ready continuation prompt	writes handoff files and may copy to clipboard

When two skills could trigger, choose the least invasive one that answers the request: review-only wording goes to craft-critique; apply/fix/improve wording goes to craft-tune; repeated measurable failures go to craft-autoresearch; prior-art questions go to craft-survey; repo harness placement and Codex/Claude setup work goes to craft-harness.

Each skill lives at skills/<skill-name>/SKILL.md — plain markdown with YAML frontmatter, loadable as a Claude Code skill or copy-pasteable into any other agent.

Status

Six of the eight skills (craft-prompt, craft-critique, craft-tune, craft-survey, craft-autoresearch, craft-handoff) have been optimized through craft-autoresearch passes against eval suites — including craft-autoresearch itself (reflexive meta-pass). craft-tune was reshaped to run an autonomous self-converging review-and-fix loop; the read-only diagnose role stays with the separate craft-critique skill. The next autoresearch pass will run against this shape. craft-skill-spec and craft-harness are new and have not yet been through an autoresearch pass. Per-session baseline → kept-state scores and mutation rationale live in the commit bodies. Run artifacts are preserved at ~/.craftkit/autoresearch/<skill>/<date-slug>/ outside the repo.

Skill	Eval status	Score source	Known gap
`craft-prompt`	autoresearch pass completed	commit body + `~/.craftkit/autoresearch/craft-prompt/<date-slug>/`	keep volatile provider guidance in guides
`craft-critique`	autoresearch pass completed	commit body + `~/.craftkit/autoresearch/craft-critique/<date-slug>/`	re-run on fresh failure examples after major wording changes
`craft-tune`	autoresearch pass completed, then reshaped into self-converging loop	commit body + `~/.craftkit/autoresearch/craft-tune/<date-slug>/`	next pass should test the newer loop contract
`craft-survey`	autoresearch pass completed	commit body + `~/.craftkit/autoresearch/craft-survey/<date-slug>/`	example must keep proving provenance and edit-target rules
`craft-autoresearch`	reflexive autoresearch pass completed	commit body + `~/.craftkit/autoresearch/craft-autoresearch/<date-slug>/`	examples must stay synchronized with the contract fields
`craft-skill-spec`	not yet autoresearched	none yet	first pass should test radar-dependent standalone behavior
`craft-harness`	not yet autoresearched	none yet	first pass should test lifecycle modes, Codex/Claude target separation, and risk gates
`craft-handoff`	autoresearch pass completed	commit body + `~/.craftkit/autoresearch/craft-handoff/2026-05-26-goal-pressure/`	replay against real agent outputs to catch wording failures beyond deterministic checks

What belongs in CraftKit

generating new prompts from scratch (task, research, session handoff, templates)
prompt design and restructuring
reusable skill design
project-specific agent harness design and maintenance
diagnostic review and minimal-diff editing
iterative improvement loops
survey-backed best practices
time-aware curation of evolving skill-authoring patterns
copy-pasteable outputs for agent workflows

Design principles

File-first and diff-friendly
Small composable units
Explicit inputs and outputs
Cross-agent portability (core skill spines stay provider-neutral; platform-specific detail stays in guides or reference files)
Eval-driven improvement when possible
Copy-pasteable results over fancy abstractions

For evolving skill-authoring guidance, the craft-skill-spec skill carries its own radar layer at skills/craft-skill-spec/references/radar/ — start with current.md there and consult the dated snapshots only when a watch item needs deeper context.

craft-harness also ships reviewable hook asset recipes under skills/craft-harness/assets/hooks/. They provide shared .agents/hooks/scripts/ scripts plus Codex and Claude adapter snippets; they are intentionally not auto-installers.

Skill spine budget

AGENTS.md keeps a hard 500-line ceiling for each SKILL.md, but day-to-day edits should aim much lower:

Normal skills: about 100-160 lines.
Complex loop or orchestration skills: about 160-220 lines.
Anything growing past that should move examples, platform notes, maintenance commands, or edge-case catalogs into references/.

The spine should still be understandable alone: purpose, inputs, steps, output contract, one compact example, limitations, and links to on-demand references. References carry depth; the spine carries the operating path.

Use in other agents

CraftKit skills are plain markdown with YAML frontmatter, so they port easily:

Open the relevant SKILL.md.
Paste the body (everything after the frontmatter) into the target agent's system prompt or instructions.
Keep the frontmatter description line as context so the agent knows when to apply the skill.

See docs/examples/tune-a-prompt.md for a walk-through of diagnosing and tuning an existing prompt, then optionally running a short improvement loop.

Prior art

sungjunlee/prompt-builder — predecessor project. Its mature prompt-authoring asset (5-step process, 6 building blocks, platform guides, templates) was absorbed wholesale into craft-prompt. Kept on GitHub for reference; new work happens here.
karpathy/autoresearch — Andrej Karpathy's ML training-loop project that introduced the autoresearch methodology (give an agent a baseline, let it experiment overnight, keep what improves, discard what doesn't). craft-autoresearch adapts that loop discipline to prompt and skill artifacts instead of model training code.
byungjunjang/jangpm-meta-skills — four-skill meta toolkit for Claude Code and Codex (blueprint, deep-dive, reflect, autoresearch). Its autoresearch skill contributed implementation patterns — experiment contract shape, the three-eval-type taxonomy (binary / comparative / fidelity), deletion discipline — that craft-autoresearch builds on.

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 91 Commits
.claude-plugin		.claude-plugin
docs/examples		docs/examples
skills		skills
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CraftKit

Why CraftKit

Install

Via npx skills

Via Claude Code Plugin Marketplace

The eight skills

Status

What belongs in CraftKit

Design principles

Skill spine budget

Use in other agents

Prior art

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

CraftKit

Why CraftKit

Install

Via npx skills

Via Claude Code Plugin Marketplace

The eight skills

Status

What belongs in CraftKit

Design principles

Skill spine budget

Use in other agents

Prior art

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages