Metadata automation#246
Open
jim-nvidia wants to merge 4 commits into
Open
Conversation
Introduce an end-to-end generator for the catalog-wide metadata.json and skills.sh.json files, plus the schemas, configs, workflow, and docs that support it. Generator (.github/scripts/marketplace/generate-skill-metadata.py): - walks skills/ and parses SKILL.md frontmatter - carries forward existing values from the prior metadata.json baseline for byte-stable regen runs - falls back to an OpenAI-compatible LLM enrichment call (NVIDIA Inference API by default) for required fields the deterministic path cannot resolve, with strict JSON contract and no enum invention - validates every output against the schemas before writing Schemas and configs co-located under .github/scripts/marketplace/: - metadata.schema.json (single source of truth for controlled vocabulary) - skills-sh.schema.json (output structure for skills.sh.json) - skills-subdomains.json (subdomain titles, descriptions, ordering) - metadata-exclusions.yaml (temporary withholding list; currently empty) Workflow (.github/workflows/generate-skill-metadata.yml): - pull_request: runs --check --no-ai; PRs must produce byte-stable output - workflow_dispatch and post-sync: regenerate with AI; opens an auto-PR on changes; opens or updates a tracking issue on validation failure Docs: - docs/metadata-generation.md (local usage, CI behavior, AI contract) - docs/metadata-generation-prd.md (design) - docs/components-d-product-primary-audit.md (one-time audit of the components.d/ name vs product.primary enum mismatch surfaced while building this; provides decision points for the team) Signed-off-by: Jim Eagan <jeagan@nvidia.com>
The docs/ directory is the Fern-published external documentation site (docs.nvidia.com/skills). Internal pipeline documentation does not belong there. - docs/metadata-generation.md → .github/scripts/marketplace/README.md (auto-renders next to the generator and its schemas). - Removed docs/metadata-generation-prd.md (kept locally; design doc not needed in tree once the pipeline is shipped). - Removed docs/components-d-product-primary-audit.md (one-time team audit, archived elsewhere). Updated the moved README's relative paths so links to sibling marketplace files use ./ and the workflow link uses ../../workflows/. Dropped the PRD cross-link from the README opening paragraph. Signed-off-by: Jim Eagan <jeagan@nvidia.com>
When a skill's SKILL.md `name` or `description` changes, the generator previously updated those fields in metadata.json but silently kept the five MVP classification fields from the baseline, never re-evaluating them against the new content. This commit teaches the AI client an "amend" mode: the existing values are passed in as context and the model is asked, per field, whether to keep the value verbatim or change it because the new content clearly warrants a different controlled value. The prompt biases toward preservation — only clear mismatches are amended — so byte-stability is the common case for routine wording edits, while genuine content shifts get the metadata they deserve. Skills classified as `unchanged` still trigger zero AI calls, and --no-ai still preserves the existing metadata for materially-changed skills as-is. Also drops the explicit `temperature: 0` from the API body. The task is strict controlled-vocabulary classification constrained by response_format=json_object, so the model's default temperature is fine, and omitting the field keeps us compatible with deployments (e.g. gpt-5.x) that reject any explicit value. Verified end-to-end against the live inference API on gpt-5.5 with both a fill-mode call and an amend-mode call. Signed-off-by: Jim Eagan <jeagan@nvidia.com>
…nt failures Three fixes to the metadata generation pipeline: 1. Group sequence in skills.sh.json was being alphabetized on every run, destroying curated subdomain ordering. Generator now iterates skills-subdomains.json keys in insertion order rather than sorting by title. 2. Subdomain group descriptions were being overwritten with diverged values. Updated skills-subdomains.json to use canonical descriptions and corrected the $comment to reflect key-order-based emission. 3. A single skill failing AI enrichment blocked output for all other skills (all-or-nothing write). Per-skill enrichment failures now go to a separate skill_warnings list; valid skills are written and the run exits 1 with a PARTIAL SUCCESS report. validate_inventory_round_trip updated to exclude intentionally skipped skills. Also adds a $comment to skills.sh.json marking it as generated and directing editors to skills-subdomains.json for ordering and description changes. Signed-off-by: Jim Eagan <jeagan@nvidia.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Onboarding type
components.d/<slug>.ymlfile)For new product onboarding — author affirmations
By submitting this PR, I confirm on behalf of my team:
.agents/skills/orskills/path used for new entries (or existing path retained for legacy entries percomponents.d/<slug>.yml)Reviewer checklist (OSS Skills PIC)
components.d/<slug>.ymlentry valid (required fields, uniquecatalog_dir, path exists in source repo, filename slug matches name)SKILL.mdfrontmatter spec-compliant (at least one sampled)All PRs
git commit -s).If you forgot, run
git rebase --signoff origin/main && git push --force-with-leaseto retroactively sign all commits in your branch.Other context (for non-onboarding PRs)
cc: @sayalinvidia @jasonnvidia @
Introduces an end-to-end generator for the catalog-wide
metadata.jsonandskills.sh.jsonfiles, plus the schemas, configs, workflow, and docs that support it.What it does
Generator (
.github/scripts/marketplace/generate-skill-metadata.py):skills/and parsesSKILL.mdfrontmattermetadata.jsonbaseline for byte-stable regen runs (unchanged skills produce zero AI calls)skill_warningslist; valid skills write regardless (partial-success exit code 1)Schemas and configs (
.github/scripts/marketplace/):metadata.schema.json— single source of truth for controlled vocabularyskills-sh.schema.json— output structure forskills.sh.jsonskills-subdomains.json— subdomain titles, descriptions, ordering (keys are in canonical curated sequence; generator preserves insertion order)metadata-exclusions.yaml— temporary withholding list (currently empty)Workflow (
.github/workflows/generate-skill-metadata.yml):pull_request: runs--check --no-ai; PRs must produce byte-stable outputworkflow_dispatchand post-sync: regenerates with AI; opens an auto-PR on changes; opens or updates a tracking issue on validation failureDocs (
.github/scripts/marketplace/README.md): local usage, CI behavior, AI contract.Setup required (repo admin)
Before the workflow can run against
NVIDIA/skills, a repo admin must configure the following in Settings → Secrets and variables → Actions:INFERENCE_API_KEYINFERENCE_MODELopenai/gpt-5.5)INFERENCE_API_URLTest plan
--check --no-aimode produces byte-stable output on an unmodified treeskills.sh.jsonmatchesskills-subdomains.jsoninsertion orderNVIDIA/skills(requires secrets configured by admin)