feat(skills): harden MCP/skill trust boundaries and add per-skill security models by WilliamBerryiii · Pull Request #2326 · microsoft/hve-core

WilliamBerryiii · 2026-07-01T03:43:39Z

This PR remediates the cross-surface security audit findings tracked in issue #2289 and, on top of that remediation, establishes a consistent per-skill STRIDE threat-model layer for every skill that ships an executable runtime. The work spans runtime hardening in several skill scripts, tightened trust-boundary guidance across consuming agents and prompts, and a new documentation layer (models, a shared template, a conformance instructions overlay, and a registry) so future skills follow the same structure.

Description

Trust-boundary and runtime hardening

The audit-driven remediation narrows escape hatches and pins floating inputs across the MCP configuration and skill scripts.

Pinned the previously unpinned runtime fetches in .vscode/mcp.json.sample (WorkIQ MCP) and the accessibility skill toolchain so copied live configs and skill runtimes no longer float package versions.
Scoped the *_ALLOW_INSECURE escape hatches in the jira and gitlab skills to loopback hosts only, refusing non-loopback plaintext HTTP.
Added a write-confirm gate to Jira write operations (create, update, transition, comment) in jira.py.
Expanded the shared untrusted-content boundary to enumerate Figma read content and GitLab job-trace output, and applied it to the confirmed consuming agents and the Jira triage prompt.
Added data-classification and region-pin guidance to the tts-voiceover skill.

Video-to-GIF script hardening

The video-to-gif twins gained the deepest code changes because they interpolate caller input into an FFmpeg filtergraph and shell out to an external decoder.

Validated numeric CLI arguments (fps, width, loop, start, duration) before they reach the FFmpeg -vf filtergraph in convert.sh, matching the typed [ValidateRange] parameters already present in convert.ps1 and closing a filtergraph-injection vector.
Isolated the intermediate palette in a private, unpredictable temp directory (mkdtemp / random directory, cleaned via trap/finally) instead of a predictable path, closing a symlink/pre-creation race.
Bounded every FFmpeg and ffprobe invocation with a wall-clock timeout (bash timeout/gtimeout; a PowerShell Invoke-FFmpegProcess seam using the .NET process API with WaitForExit).
Updated convert.Tests.ps1 to mock the new execution seam and assert private-temp-dir cleanup and timeout forwarding.

Per-skill security models

A new documentation layer gives every runtime-bearing skill an authoritative, uniformly structured threat model.

Added full-parity SECURITY.md STRIDE models — assets, adversaries, all-six-STRIDE trust buckets, data-flow and trust-boundary diagrams, risk-rating tables, and a G-prefixed gap register — for nine skills: jira, gitlab, mural, tts-voiceover, accessibility, powerpoint, video-to-gif, gh-code-scanning, and customer-card-render.
Added a shared model template at docs/templates/skill-security-model-template.md and a conformance instructions overlay at .github/instructions/skill-security-model.instructions.md governing SECURITY.md structure.
Registered every skill security model in docs/security/security-model.md and the docs/security/README.md index.
Recorded the tts-voiceover _add_narration_timing XML parse as a tracked defence-in-depth gap (G-TAM-1), aligned with the XML-parser hardening audit in security(skills): audit codebase for unhardened XML parsers beyond PowerPoint skill #1056 / PR fix(skills): harden tts-voiceover XML parser against XXE (#1056) #1695, without duplicating that in-flight fix.

Related Issue(s)

Fixes #2289

Type of Change

Select all that apply:

Code & Documentation:

Bug fix (non-breaking change fixing an issue)
New feature (non-breaking change adding functionality)
Breaking change (fix or feature causing existing functionality to change)
Documentation update

Infrastructure & Configuration:

AI Artifacts:

Reviewed contribution with prompt-builder agent and addressed all feedback
Copilot instructions (.github/instructions/*.instructions.md)
Copilot prompt (.github/prompts/*.prompt.md)
Copilot agent (.github/agents/*.agent.md)
Copilot skill (.github/skills/*/SKILL.md)
Copilot hook (.github/hooks/*/*.json)
Eval spec added/updated for changed AI artifacts (evals/)

Note for AI Artifact Contributors:

Agents: Research, indexing/referencing other project (using standard VS Code GitHub Copilot/MCP tools), planning, and general implementation agents likely already exist. Review .github/agents/ before creating new ones.

Skills: Must include both bash and PowerShell scripts. See Skills.

Model Versions: Only contributions targeting the latest Anthropic and OpenAI models will be accepted. Older model versions (e.g., GPT-3.5, Claude 3) will be rejected.

See Agents Not Accepted and Model Version Requirements.

Other:

Script/automation (.ps1, .sh, .py)
Other (please describe):

Sample Prompts (for AI Artifact Contributions)

User Request:

"Author a STRIDE security model for the gh-code-scanning skill following the shared template, then register it."

Execution Flow:

The agent reads .github/instructions/skill-security-model.instructions.md and docs/templates/skill-security-model-template.md, inspects the skill's runtime scripts to enumerate assets, adversaries, and trust buckets, writes a SECURITY.md with all-six-STRIDE mitigations per bucket plus a G-prefixed gap register, and adds a row to both docs/security/security-model.md and docs/security/README.md.

Output Artifacts:

A per-skill SECURITY.md next to the skill's SKILL.md, plus registry rows in the two security docs. Example header:

# GH Code Scanning Skill Security Model
...
## Enterprise Readiness Gaps
| Id | Gap | Severity | Status |

Success Indicators:

npm run validate:skills, npm run lint:frontmatter, and npm run lint:ai-artifacts pass, and the model appears in both security-doc indexes.

Testing

Automated validation run locally by the agent:

npm run validate:skills — 48 skills, 0 errors, 0 warnings.
npm run lint:frontmatter — 796 files, 0 errors, 0 warnings.
npm run lint:ai-artifacts — 0 issues.
npm run test:ps (video-to-gif) — 20/20 Pester tests passed.
Invoke-ScriptAnalyzer on convert.ps1 — clean; bash -n convert.sh — valid.

Remaining template checks (lint:md, spell-check, lint:md-links, docs:test, plugin:generate, eval:lint:schema) are deferred to CI. Manual testing was not performed.

Checklist

Required Checks

Documentation is updated (if applicable)
Files follow existing naming conventions
Changes are backwards compatible (if applicable) — see Additional Notes; several controls intentionally change behavior
Tests added for new functionality (if applicable)

AI Artifact Contributions

Used /prompt-analyze to review contribution
Addressed all feedback from prompt-builder review
Verified contribution follows common standards and type-specific requirements

Required Automated Checks

The following validation commands must pass before merging:

Markdown linting: npm run lint:md
Spell checking: npm run spell-check
Frontmatter validation: npm run lint:frontmatter
Skill structure validation: npm run validate:skills
Link validation: npm run lint:md-links
PowerShell analysis: npm run lint:ps
Eval spec schema and coverage (if AI artifacts changed): npm run eval:lint:schema
Plugin freshness: npm run plugin:generate
Docusaurus tests: npm run docs:test

Security Considerations

This PR does not contain any sensitive or NDA information
Any new dependencies have been reviewed for security issues (N/A — no new dependencies)
Security-related scripts follow the principle of least privilege

Additional Notes

This is AI-assisted security remediation. It requires qualified human review before merge; the human-review checkbox stays unchecked until a reviewer validates the change set.

Several controls intentionally change runtime behavior and are called out for reviewers:

*_ALLOW_INSECURE in jira and gitlab now permits loopback hosts only; non-loopback plaintext HTTP is refused.
Jira write operations now require a write-confirm gate.
video-to-gif now rejects out-of-range or non-numeric fps/width/loop/start/duration arguments that were previously passed through to FFmpeg.

Certificate pinning remains out of scope and is tracked as a follow-on work item. The tts-voiceover embed_audio.py XML-parser hardening is intentionally left to the in-flight PR #1695 and is documented here as a tracked defence-in-depth gap.

…urity models - pin MCP/CLI fetches, scope *_ALLOW_INSECURE to loopback, add Jira write-confirm gate - harden untrusted-content boundary, video-to-gif, jira, gitlab, accessibility runtimes - add SECURITY.md models for 9 skills plus template, instructions overlay, and registry 🔒 - Generated by Copilot

codecov-commenter · 2026-07-01T03:45:01Z

Codecov Report

❌ Patch coverage is 69.69697% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.82%. Comparing base (ddcd99e) to head (46c73f1).

Files with missing lines	Patch %	Lines
...ills/experimental/video-to-gif/scripts/convert.ps1	47.05%	9 Missing ⚠️
.github/skills/jira/jira/scripts/jira.py	92.30%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2326      +/-   ##
==========================================
+ Coverage   81.64%   86.82%   +5.17%     
==========================================
  Files         130       80      -50     
  Lines       19471     9144   -10327     
  Branches       12       12              
==========================================
- Hits        15898     7939    -7959     
+ Misses       3570     1202    -2368     
  Partials        3        3

Flag	Coverage Δ
docusaurus	`61.84% <ø> (ø)`
pester	`85.93% <47.05%> (-0.11%)`	⬇️
pytest	`97.27% <93.75%> (+18.69%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
...skills/accessibility/accessibility/scripts/scan.py	`68.91% <100.00%> (ø)`
.github/skills/gitlab/gitlab/scripts/gitlab.py	`100.00% <100.00%> (ø)`
.github/skills/jira/jira/scripts/jira.py	`99.76% <92.30%> (-0.24%)`	⬇️
...ills/experimental/video-to-gif/scripts/convert.ps1	`80.25% <47.05%> (-4.99%)`	⬇️

... and 51 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

github-actions · 2026-07-01T03:45:45Z

Eval Execution

✅ Status: Passed — no merge-blocking failures (1 advisory assertion failure(s) present)

Artifacts evaluated: 12
Specs run: 12
Assertions passed: 61
Assertions failed (blocking): 0
Assertions failed (advisory): 1
Failed specs (merge-blocking): 0

Artifact	Kind	Status	Specs	Passed	Failed (advisory)
`accessibility-planner`	agent	✅ pass	1	5	0
`dt-coach`	agent	✅ pass	1	5	0
`jira-backlog-manager`	agent	✅ pass	1	5	0
`jira-prd-to-wit`	agent	✅ pass	1	5	0
`meeting-analyst`	agent	✅ pass	1	5	0
`ux-ui-designer`	agent	✅ pass	1	5	0
`rai-planner`	agent	✅ pass	1	10	0
`rai-reviewer`	agent	❌ fail	1	4	0
`untrusted-content-boundary`	instruction	⚠️ advisory-fail	1	2	1
`jira-triage-issues`	prompt	✅ pass	1	3	0
`accessibility`	skill	✅ pass	1	3	0
`tts-voiceover`	skill	✅ pass	1	9	0

Legend — ✅ clean · ⚠️ advisory failures only (non-blocking) · ⏭️ skipped · ❌ merge-blocking failure

Only Failed specs (merge-blocking) gates this PR. Advisory assertion failures are signal-quality checks captured during iteration; review them, but they do not block merge and may be acceptable.

…, and eval - format test_jira_main.py and drop the unnecessary lambda (ruff + CodeQL) - normalize SECURITY.md and template tables; fix template broken doc links - rename mermaid node for cspell; reword trap wording for the eval text lint 🔒 - Generated by Copilot

katriendg

One low-severity observation (non-blocking): the two skills narrowed the escape hatch differently. gitlab.py allows loopback HTTP with no flag; jira.py now requires JIRA_ALLOW_INSECURE=1 even for loopback (the logic changed from "allow if flag OR loopback" to "allow only if loopback AND flag"). Both are secure - just inconsistent developer ergonomics. It's an intentional, documented behavior change, so it's fine to merge as-is; worth a follow-up note only if you want parity.

…parity) - gate loopback plaintext http behind GITLAB_ALLOW_INSECURE, matching jira - add tests for loopback http accept-with-flag and reject-without-flag - update gitlab SECURITY.md wording for the opt-in gate 🔒 - Generated by Copilot

WilliamBerryiii · 2026-07-01T16:13:30Z

Thanks @katriendg — good catch. We went with parity rather than leaving it as a follow-up.

One low-severity observation (non-blocking): the two skills narrowed the escape hatch differently. gitlab.py allows loopback HTTP with no flag; jira.py now requires JIRA_ALLOW_INSECURE=1 even for loopback (the logic changed from "allow if flag OR loopback" to "allow only if loopback AND flag"). Both are secure - just inconsistent developer ergonomics. ... worth a follow-up note only if you want parity.

Resolved in 0b203d53 by tightening gitlab to match jira's explicit opt-in model (rather than loosening jira):

require_environment() now permits loopback plaintext http:// only when GITLAB_ALLOW_INSECURE=1 is set. Previously loopback http was allowed with no flag, and GITLAB_ALLOW_INSECURE was effectively inert — referenced only in a test/doc to prove it granted nothing. It now has a real purpose.
Non-loopback plaintext stays refused regardless of the flag, so the existing "reject even when the allow-env is set" guarantee is unchanged.
Added two tests: loopback http rejected without the flag and accepted with it. Full gitlab suite is green (156 passed), ruff clean.
Updated the gitlab SECURITY.md transport section to describe the opt-in gate.

Both skills now follow the same rule: HTTPS everywhere; plaintext http:// only on loopback and only with an explicit *_ALLOW_INSECURE=1 opt-in.

WilliamBerryiii requested a review from a team as a code owner July 1, 2026 03:43

github-advanced-security AI found potential problems Jul 1, 2026

View reviewed changes

Comment thread .github/skills/jira/jira/tests/test_jira_main.py Fixed

WilliamBerryiii and others added 3 commits June 30, 2026 22:18

Merge branch 'main' into security/2289-mcp-skill-trust-hardening

bad7eda

Merge branch 'main' into security/2289-mcp-skill-trust-hardening

4a1597f

jkim323 approved these changes Jul 1, 2026

View reviewed changes

katriendg approved these changes Jul 1, 2026

View reviewed changes

Merge branch 'main' into security/2289-mcp-skill-trust-hardening

46c73f1

agreaves-ms approved these changes Jul 1, 2026

View reviewed changes

rezatnoMsirhC approved these changes Jul 1, 2026

View reviewed changes

rezatnoMsirhC merged commit 8fc30bc into main Jul 1, 2026
86 checks passed

rezatnoMsirhC deleted the security/2289-mcp-skill-trust-hardening branch July 1, 2026 17:09

This was referenced Jul 1, 2026

chore(main): pre-release 4.1.289 #1476

Open

chore(main): release hve-core 4.0.0 #1184

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(skills): harden MCP/skill trust boundaries and add per-skill security models#2326

feat(skills): harden MCP/skill trust boundaries and add per-skill security models#2326
rezatnoMsirhC merged 6 commits into
mainfrom
security/2289-mcp-skill-trust-hardening

WilliamBerryiii commented Jul 1, 2026

Uh oh!

codecov-commenter commented Jul 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

github-actions Bot commented Jul 1, 2026 •

edited

Loading

Uh oh!

katriendg left a comment

Uh oh!

WilliamBerryiii commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Uh oh!

Conversation

WilliamBerryiii commented Jul 1, 2026

Description

Trust-boundary and runtime hardening

Video-to-GIF script hardening

Per-skill security models

Related Issue(s)

Type of Change

Sample Prompts (for AI Artifact Contributions)

Testing

Checklist

Required Checks

AI Artifact Contributions

Required Automated Checks

Security Considerations

Additional Notes

Uh oh!

codecov-commenter commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

github-actions Bot commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Eval Execution

Uh oh!

katriendg left a comment

Choose a reason for hiding this comment

Uh oh!

WilliamBerryiii commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

codecov-commenter commented Jul 1, 2026 •

edited

Loading

github-actions Bot commented Jul 1, 2026 •

edited

Loading