Metadata automation by jim-nvidia · Pull Request #246 · NVIDIA/skills

jim-nvidia · 2026-06-06T00:05:33Z

Onboarding type

New product onboarding (new components.d/<slug>.yml file)
Other (catalog change, README fix, infrastructure, etc.)

For new product onboarding — author affirmations

By submitting this PR, I confirm on behalf of my team:

Skills cleared for open source release per NVIDIA's internal IP review process (six-question check, all answers affirmative)
License selected: Apache 2.0 / CC-BY 4.0 / Dual (Apache 2.0 + CC-BY 4.0). Specify: _____
No new license or new third-party component introduced beyond what the source repo already carries
Source repo is public and under an NVIDIA-owned GitHub org
.agents/skills/ or skills/ path used for new entries (or existing path retained for legacy entries per components.d/<slug>.yml)

NVIDIA contributors: see the internal onboarding guide for the IP review process details and license selection.

Reviewer checklist (OSS Skills PIC)

Author confirmations above are checked
components.d/<slug>.yml entry valid (required fields, unique catalog_dir, path exists in source repo, filename slug matches name)
SKILL.md frontmatter spec-compliant (at least one sampled)
No new license or third-party dependency requiring OSRB filing

All PRs

All commits signed off with DCO (git commit -s).
If you forgot, run git rebase --signoff origin/main && git push --force-with-lease to retroactively sign all commits in your branch.

Other context (for non-onboarding PRs)

cc: @sayalinvidia @jasonnvidia @
Introduces an end-to-end generator for the catalog-wide metadata.json and skills.sh.json files, plus the schemas, configs, workflow, and docs that support it.

What it does

Generator (.github/scripts/marketplace/generate-skill-metadata.py):

Walks skills/ and parses SKILL.md frontmatter
Carries forward existing values from the prior metadata.json baseline for byte-stable regen runs (unchanged skills produce zero AI calls)
For materially changed skills (name or description changed), runs an "amend" mode: passes existing classification values as context and asks the model per-field whether to keep or update — biased toward preservation
Falls back to an OpenAI-compatible LLM enrichment call (NVIDIA Inference API by default) for required fields the deterministic path cannot resolve, with strict JSON contract and no enum invention
Per-skill enrichment failures go to a skill_warnings list; valid skills write regardless (partial-success exit code 1)
Validates every output against the schemas before writing

Schemas and configs (.github/scripts/marketplace/):

metadata.schema.json — single source of truth for controlled vocabulary
skills-sh.schema.json — output structure for skills.sh.json
skills-subdomains.json — subdomain titles, descriptions, ordering (keys are in canonical curated sequence; generator preserves insertion order)
metadata-exclusions.yaml — temporary withholding list (currently empty)

Workflow (.github/workflows/generate-skill-metadata.yml):

pull_request: runs --check --no-ai; PRs must produce byte-stable output
workflow_dispatch and post-sync: regenerates with AI; opens an auto-PR on changes; opens or updates a tracking issue on validation failure

Docs (.github/scripts/marketplace/README.md): local usage, CI behavior, AI contract.

Setup required (repo admin)

Before the workflow can run against NVIDIA/skills, a repo admin must configure the following in Settings → Secrets and variables → Actions:

Name	Type	Value
`INFERENCE_API_KEY`	Repository secret	NVIDIA Inference API key
`INFERENCE_MODEL`	Repository variable	Model name (e.g. `openai/gpt-5.5`)
`INFERENCE_API_URL`	Repository variable	Optional — defaults to NVIDIA Inference API endpoint

Test plan

Generator runs locally end-to-end against the full skills tree (gpt-5.5 via Inference API)
--check --no-ai mode produces byte-stable output on an unmodified tree
Amend mode tested: fill-mode and amend-mode calls verified against live API
Partial-success path: a skill failing AI enrichment does not block output for others
Group ordering: subdomain sequence in skills.sh.json matches skills-subdomains.json insertion order
CI workflow run in NVIDIA/skills (requires secrets configured by admin)

Introduce an end-to-end generator for the catalog-wide metadata.json and skills.sh.json files, plus the schemas, configs, workflow, and docs that support it. Generator (.github/scripts/marketplace/generate-skill-metadata.py): - walks skills/ and parses SKILL.md frontmatter - carries forward existing values from the prior metadata.json baseline for byte-stable regen runs - falls back to an OpenAI-compatible LLM enrichment call (NVIDIA Inference API by default) for required fields the deterministic path cannot resolve, with strict JSON contract and no enum invention - validates every output against the schemas before writing Schemas and configs co-located under .github/scripts/marketplace/: - metadata.schema.json (single source of truth for controlled vocabulary) - skills-sh.schema.json (output structure for skills.sh.json) - skills-subdomains.json (subdomain titles, descriptions, ordering) - metadata-exclusions.yaml (temporary withholding list; currently empty) Workflow (.github/workflows/generate-skill-metadata.yml): - pull_request: runs --check --no-ai; PRs must produce byte-stable output - workflow_dispatch and post-sync: regenerate with AI; opens an auto-PR on changes; opens or updates a tracking issue on validation failure Docs: - docs/metadata-generation.md (local usage, CI behavior, AI contract) - docs/metadata-generation-prd.md (design) - docs/components-d-product-primary-audit.md (one-time audit of the components.d/ name vs product.primary enum mismatch surfaced while building this; provides decision points for the team) Signed-off-by: Jim Eagan <jeagan@nvidia.com>

The docs/ directory is the Fern-published external documentation site (docs.nvidia.com/skills). Internal pipeline documentation does not belong there. - docs/metadata-generation.md → .github/scripts/marketplace/README.md (auto-renders next to the generator and its schemas). - Removed docs/metadata-generation-prd.md (kept locally; design doc not needed in tree once the pipeline is shipped). - Removed docs/components-d-product-primary-audit.md (one-time team audit, archived elsewhere). Updated the moved README's relative paths so links to sibling marketplace files use ./ and the workflow link uses ../../workflows/. Dropped the PRD cross-link from the README opening paragraph. Signed-off-by: Jim Eagan <jeagan@nvidia.com>

When a skill's SKILL.md `name` or `description` changes, the generator previously updated those fields in metadata.json but silently kept the five MVP classification fields from the baseline, never re-evaluating them against the new content. This commit teaches the AI client an "amend" mode: the existing values are passed in as context and the model is asked, per field, whether to keep the value verbatim or change it because the new content clearly warrants a different controlled value. The prompt biases toward preservation — only clear mismatches are amended — so byte-stability is the common case for routine wording edits, while genuine content shifts get the metadata they deserve. Skills classified as `unchanged` still trigger zero AI calls, and --no-ai still preserves the existing metadata for materially-changed skills as-is. Also drops the explicit `temperature: 0` from the API body. The task is strict controlled-vocabulary classification constrained by response_format=json_object, so the model's default temperature is fine, and omitting the field keeps us compatible with deployments (e.g. gpt-5.x) that reject any explicit value. Verified end-to-end against the live inference API on gpt-5.5 with both a fill-mode call and an amend-mode call. Signed-off-by: Jim Eagan <jeagan@nvidia.com>

…nt failures Three fixes to the metadata generation pipeline: 1. Group sequence in skills.sh.json was being alphabetized on every run, destroying curated subdomain ordering. Generator now iterates skills-subdomains.json keys in insertion order rather than sorting by title. 2. Subdomain group descriptions were being overwritten with diverged values. Updated skills-subdomains.json to use canonical descriptions and corrected the $comment to reflect key-order-based emission. 3. A single skill failing AI enrichment blocked output for all other skills (all-or-nothing write). Per-skill enrichment failures now go to a separate skill_warnings list; valid skills are written and the run exits 1 with a PARTIAL SUCCESS report. validate_inventory_round_trip updated to exclude intentionally skipped skills. Also adds a $comment to skills.sh.json marking it as generated and directing editors to skills-subdomains.json for ordering and description changes. Signed-off-by: Jim Eagan <jeagan@nvidia.com>

jasonnvidia and others added 4 commits June 5, 2026 15:41

jim-nvidia requested review from mosheabr and sayalinvidia as code owners June 6, 2026 00:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Metadata automation#246

Metadata automation#246
jim-nvidia wants to merge 4 commits into
NVIDIA:mainfrom
jim-nvidia:metadata-automation

jim-nvidia commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jim-nvidia commented Jun 6, 2026

Onboarding type

For new product onboarding — author affirmations

Reviewer checklist (OSS Skills PIC)

All PRs

Other context (for non-onboarding PRs)

What it does

Setup required (repo admin)

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants