feat(skills): harden MCP/skill trust boundaries and add per-skill security models#2326
Conversation
…urity models - pin MCP/CLI fetches, scope *_ALLOW_INSECURE to loopback, add Jira write-confirm gate - harden untrusted-content boundary, video-to-gif, jira, gitlab, accessibility runtimes - add SECURITY.md models for 9 skills plus template, instructions overlay, and registry 🔒 - Generated by Copilot
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2326 +/- ##
==========================================
+ Coverage 81.64% 86.82% +5.17%
==========================================
Files 130 80 -50
Lines 19471 9144 -10327
Branches 12 12
==========================================
- Hits 15898 7939 -7959
+ Misses 3570 1202 -2368
Partials 3 3
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Eval Execution✅ Status: Passed — no merge-blocking failures (1 advisory assertion failure(s) present)
|
…, and eval - format test_jira_main.py and drop the unnecessary lambda (ruff + CodeQL) - normalize SECURITY.md and template tables; fix template broken doc links - rename mermaid node for cspell; reword trap wording for the eval text lint 🔒 - Generated by Copilot
katriendg
left a comment
There was a problem hiding this comment.
One low-severity observation (non-blocking): the two skills narrowed the escape hatch differently. gitlab.py allows loopback HTTP with no flag; jira.py now requires JIRA_ALLOW_INSECURE=1 even for loopback (the logic changed from "allow if flag OR loopback" to "allow only if loopback AND flag"). Both are secure - just inconsistent developer ergonomics. It's an intentional, documented behavior change, so it's fine to merge as-is; worth a follow-up note only if you want parity.
…parity) - gate loopback plaintext http behind GITLAB_ALLOW_INSECURE, matching jira - add tests for loopback http accept-with-flag and reject-without-flag - update gitlab SECURITY.md wording for the opt-in gate 🔒 - Generated by Copilot
|
Thanks @katriendg — good catch. We went with parity rather than leaving it as a follow-up.
Resolved in
Both skills now follow the same rule: HTTPS everywhere; plaintext |
This PR remediates the cross-surface security audit findings tracked in issue #2289 and, on top of that remediation, establishes a consistent per-skill STRIDE threat-model layer for every skill that ships an executable runtime. The work spans runtime hardening in several skill scripts, tightened trust-boundary guidance across consuming agents and prompts, and a new documentation layer (models, a shared template, a conformance instructions overlay, and a registry) so future skills follow the same structure.
Description
Trust-boundary and runtime hardening
The audit-driven remediation narrows escape hatches and pins floating inputs across the MCP configuration and skill scripts.
*_ALLOW_INSECUREescape hatches in the jira and gitlab skills to loopback hosts only, refusing non-loopback plaintext HTTP.Video-to-GIF script hardening
-vffiltergraph in convert.sh, matching the typed[ValidateRange]parameters already present in convert.ps1 and closing a filtergraph-injection vector.trap/finally) instead of a predictable path, closing a symlink/pre-creation race.timeout/gtimeout; a PowerShellInvoke-FFmpegProcessseam using the .NET process API withWaitForExit).Per-skill security models
A new documentation layer gives every runtime-bearing skill an authoritative, uniformly structured threat model.
SECURITY.mdSTRIDE models — assets, adversaries, all-six-STRIDE trust buckets, data-flow and trust-boundary diagrams, risk-rating tables, and aG-prefixed gap register — for nine skills: jira, gitlab, mural, tts-voiceover, accessibility, powerpoint, video-to-gif, gh-code-scanning, and customer-card-render.SECURITY.mdstructure._add_narration_timingXML parse as a tracked defence-in-depth gap (G-TAM-1), aligned with the XML-parser hardening audit in security(skills): audit codebase for unhardened XML parsers beyond PowerPoint skill #1056 / PR fix(skills): harden tts-voiceover XML parser against XXE (#1056) #1695, without duplicating that in-flight fix.Related Issue(s)
Fixes #2289
Type of Change
Select all that apply:
Code & Documentation:
Infrastructure & Configuration:
AI Artifacts:
prompt-builderagent and addressed all feedback.github/instructions/*.instructions.md).github/prompts/*.prompt.md).github/agents/*.agent.md).github/skills/*/SKILL.md).github/hooks/*/*.json)evals/)Other:
.ps1,.sh,.py)Sample Prompts (for AI Artifact Contributions)
User Request:
"Author a STRIDE security model for the gh-code-scanning skill following the shared template, then register it."
Execution Flow:
The agent reads .github/instructions/skill-security-model.instructions.md and docs/templates/skill-security-model-template.md, inspects the skill's runtime scripts to enumerate assets, adversaries, and trust buckets, writes a
SECURITY.mdwith all-six-STRIDE mitigations per bucket plus aG-prefixed gap register, and adds a row to both docs/security/security-model.md and docs/security/README.md.Output Artifacts:
A per-skill
SECURITY.mdnext to the skill'sSKILL.md, plus registry rows in the two security docs. Example header:Success Indicators:
npm run validate:skills,npm run lint:frontmatter, andnpm run lint:ai-artifactspass, and the model appears in both security-doc indexes.Testing
Automated validation run locally by the agent:
npm run validate:skills— 48 skills, 0 errors, 0 warnings.npm run lint:frontmatter— 796 files, 0 errors, 0 warnings.npm run lint:ai-artifacts— 0 issues.npm run test:ps(video-to-gif) — 20/20 Pester tests passed.Invoke-ScriptAnalyzeron convert.ps1 — clean;bash -n convert.sh— valid.Remaining template checks (
lint:md,spell-check,lint:md-links,docs:test,plugin:generate,eval:lint:schema) are deferred to CI. Manual testing was not performed.Checklist
Required Checks
AI Artifact Contributions
/prompt-analyzeto review contributionprompt-builderreviewRequired Automated Checks
The following validation commands must pass before merging:
npm run lint:mdnpm run spell-checknpm run lint:frontmatternpm run validate:skillsnpm run lint:md-linksnpm run lint:psnpm run eval:lint:schemanpm run plugin:generatenpm run docs:testSecurity Considerations
Additional Notes
This is AI-assisted security remediation. It requires qualified human review before merge; the human-review checkbox stays unchecked until a reviewer validates the change set.
Several controls intentionally change runtime behavior and are called out for reviewers:
*_ALLOW_INSECUREin jira and gitlab now permits loopback hosts only; non-loopback plaintext HTTP is refused.fps/width/loop/start/durationarguments that were previously passed through to FFmpeg.Certificate pinning remains out of scope and is tracked as a follow-on work item. The tts-voiceover
embed_audio.pyXML-parser hardening is intentionally left to the in-flight PR #1695 and is documented here as a tracked defence-in-depth gap.