feat(taxonomy): alias hygiene + server version advisory (follow-ups) #51
Two taxonomy follow-ups:
1. Alias hygiene (fixes a regression that item 4's blind alias-union shipped):
the canonical generator now drops any alias that shadows a real skill's
canonical name or is assigned to >1 skill. The union had pasted real
skills (Data Science, ETL, Web Development, Natural Language Processing,
MariaDB, Container Orchestration, ...) on as aliases of OTHER skills, and
mapped backend/frontend/server-side to two skills each — so e.g. 'data
science' resolved nondeterministically to Machine Learning instead of the
Data Science skill. 34 such aliases pruned; legitimate synonyms (py,
python3, k8s, postgres, golang, ...) kept. Regenerated taxonomy.json.
2. Version handshake on the client: /vault-git/info may now carry a
taxonomy {version, lineage} block; gitsync parses it (backward-compatible:
absent on older servers → no advisory), and traitprint sync status compares
it against the bundled (lineage, version), advising 'upgrade to refresh'
when the same-lineage server is ahead; a different lineage reports
'different taxonomy, not version-comparable'.
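A minimal sketch of the client-side handshake in item 2, assuming the info payload is a JSON object whose optional `taxonomy` block holds a string `lineage` and an integer `version`. `TaxonomyId`, `parse_server_taxonomy`, `advisory`, and the lineage value `"canonical"` are hypothetical names for illustration, not the actual gitsync API.

```python
from __future__ import annotations

from typing import Any, NamedTuple, Optional


class TaxonomyId(NamedTuple):
    lineage: str
    version: int


def parse_server_taxonomy(info: dict[str, Any]) -> Optional[TaxonomyId]:
    """Return the server's (lineage, version), or None if it didn't send one.

    Older servers omit the `taxonomy` block; a malformed block is treated the
    same way so the caller simply shows no advisory.
    """
    block = info.get("taxonomy")
    if not isinstance(block, dict):
        return None
    lineage = block.get("lineage")
    version = block.get("version")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(lineage, str) or isinstance(version, bool) or not isinstance(version, int):
        return None
    return TaxonomyId(lineage, version)


def advisory(bundled: TaxonomyId, server: Optional[TaxonomyId]) -> Optional[str]:
    """Compare the bundled taxonomy against the server's; None means nothing to say."""
    if server is None:
        return None  # older server: no taxonomy block, no advisory
    if server.lineage != bundled.lineage:
        return "different taxonomy, not version-comparable"
    if server.version > bundled.version:
        return "upgrade to refresh"
    return None  # same lineage and the client is current (or ahead)


# Hypothetical lineage name; the real value comes from the bundled artifact.
bundled = TaxonomyId("canonical", 2)
print(advisory(bundled, parse_server_taxonomy({"taxonomy": {"lineage": "canonical", "version": 3}})))
# → upgrade to refresh
```

Treating a malformed block the same as an absent one keeps the parse backward-compatible: anything the client cannot interpret degrades to "no advisory" instead of an error.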
Tests: alias-hygiene invariant (0 shadows / 0 multi-mapped), advisory cases,
status JSON shape. Full suite 629 passed; ruff clean; mypy src unchanged
(pre-existing cli.py:38 only). Pairs with the Cloud /vault-git/info change.
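The alias-hygiene rule in item 1, and the invariant test that guards it, reduce to two set lookups. A minimal sketch, assuming each taxonomy entry is a dict with `name` and `aliases` keys; `prune_aliases`, `find_violations`, and `_owners` are hypothetical helpers, and the sample skills are illustrative, not taken from taxonomy.json.

```python
from __future__ import annotations

from collections import defaultdict


def _owners(entries: list[dict]) -> dict[str, set[str]]:
    """Map each lowercased alias to the set of skills that claim it."""
    owners: dict[str, set[str]] = defaultdict(set)
    for e in entries:
        for a in e["aliases"]:
            owners[a.lower()].add(e["name"])
    return owners


def prune_aliases(entries: list[dict]) -> tuple[list[dict], list[str]]:
    """Drop aliases that shadow a canonical name or belong to more than one skill."""
    names_lower = {e["name"].lower() for e in entries}
    assigned = _owners(entries)
    cleaned, dropped = [], []
    for e in entries:
        kept = [
            a for a in e["aliases"]
            if a.lower() not in names_lower and len(assigned[a.lower()]) == 1
        ]
        dropped.extend(a for a in e["aliases"] if a not in kept)
        cleaned.append({**e, "aliases": kept})  # copy; input is not mutated
    return cleaned, dropped


def find_violations(entries: list[dict]) -> tuple[list[str], list[str]]:
    """Invariant check: (aliases shadowing a canonical name, multi-mapped aliases)."""
    names_lower = {e["name"].lower() for e in entries}
    owners = _owners(entries)
    shadows = [a for a in owners if a in names_lower]
    multi = [a for a, skills in owners.items() if len(skills) > 1]
    return shadows, multi


skills = [
    {"name": "Machine Learning", "aliases": ["ML", "Data Science"]},
    {"name": "Data Science", "aliases": []},
    {"name": "Node.js", "aliases": ["node", "backend"]},
    {"name": "Django", "aliases": ["backend"]},
]
cleaned, dropped = prune_aliases(skills)
print(find_violations(skills))   # (['data science'], ['backend'])
print(find_violations(cleaned))  # ([], [])
```

After pruning, "data science" can only resolve to the Data Science skill, and "backend" no longer points at two skills at once, which is the deterministic behaviour the invariant test asserts.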
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b69d80060f
    kept = [
        a for a in e["aliases"]
        if a.lower() not in names_lower and len(assigned[a.lower()]) == 1
    ]
Bump taxonomy version for alias-pruned artifact
When this new hygiene pass prunes aliases, the generated taxonomy.json content changes, but CANONICAL_VERSION remains at 2 even though the script says to bump it whenever artifact content changes. The sync advisory added in this commit compares only (lineage, version), so a client/server pair with one side built before this pruning and the other after it will both report canonical v2 and sync status will say they are aligned even though exact skill resolution can differ for inputs like AWS, NLP, or Data Science. Please bump the artifact version with this content change.
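One way to make "bump CANONICAL_VERSION whenever artifact content changes" enforceable rather than a convention is a test that pins a content fingerprint to each released version. This is a sketch under that assumption, not something the PR ships; `fingerprint`, `version_bump_missing`, and the `pinned` map are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def fingerprint(artifact: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, no whitespace)."""
    blob = json.dumps(artifact, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def version_bump_missing(artifact: dict, version: int, pinned: dict[int, str]) -> bool:
    """True if `version` was already pinned to different content."""
    expected = pinned.get(version)
    return expected is not None and expected != fingerprint(artifact)


before = {"skills": [{"name": "Machine Learning", "aliases": ["ML", "Data Science"]}]}
after = {"skills": [{"name": "Machine Learning", "aliases": ["ML"]}]}
pinned = {2: fingerprint(before)}

print(version_bump_missing(after, 2, pinned))  # True: content changed, still v2
print(version_bump_missing(after, 3, pinned))  # False: new version, nothing pinned yet
```

Regenerating the pruned artifact while leaving the version at 2 trips the check, which is exactly the case flagged here; after bumping, the new version's fingerprint would be added to `pinned`.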
Summary
Two of the three taxonomy follow-ups (the third, artifact-driven seeding, needs a cross-repo decision — see below).
1. Alias hygiene — fixes a regression item 4 shipped
Item 4's blind alias-union pasted real canonical skills onto other skills as aliases (`Data Science`, `ETL`, `Web Development`, `Natural Language Processing`, `MariaDB`, `Container Orchestration`, … were each a real skill and an alias of a different one), and mapped `backend`/`frontend`/`server-side` to two skills each. So `find_exact("data science")` resolved nondeterministically to Machine Learning instead of the actual Data Science skill. The generator now drops any alias that shadows a canonical skill name or is assigned to >1 skill: 34 pruned, legit synonyms (`py`, `python3`, `k8s`, `postgres`, `golang`, …) kept. Regenerated `taxonomy.json`. New invariant test: 0 shadows / 0 multi-mapped.
2. Server version advisory (the refresh-detection half of follow-up 3)
`/vault-git/info` may now carry a `taxonomy {version, lineage}` block. `gitsync` parses it (backward-compatible: absent on older servers ⇒ no advisory), and `traitprint sync status` compares it against the bundled `(lineage, version)`: it advises "upgrade to refresh" when the same-lineage server is ahead, and reports "different taxonomy, not version-comparable" on a lineage mismatch. (Auto-download is deliberately out: it needs a hosting endpoint; this ships the detection + advisory.)
Pairs with `traitprint-cloud` adding the `taxonomy` block to `/vault-git/info` (server half).
The third follow-up (artifact-driven seeding): needs your call, not shipped
Making Cloud's seed regenerate from the canonical artifact (true single source) + a drift-check is genuinely architectural: it requires deciding where the canonical artifact physically lives across two repos (vendor-and-sync vs a published asset both fetch), and the artifact would need to carry Cloud's typed/weighted relationships (it currently only keeps Local's derived symmetric neighbors). I didn't want to pick a cross-repo sync mechanism or rewrite production seeding blind. Options written up in the next message.
Testing
Full suite 628 passed / 3 skipped; ruff clean; mypy `src/` unchanged (pre-existing `cli.py:38` only).
https://claude.ai/code/session_01PsAQUnoLH94f2cbK2dSpox
Generated by Claude Code