
feat(taxonomy): alias hygiene + server version advisory (follow-ups) #51

Merged: the-data-viking merged 1 commit into main from claude/serene-archimedes-knryzi on Jun 13, 2026


@the-data-viking

Contributor

Summary

Two of the three taxonomy follow-ups (the third, artifact-driven seeding, needs a cross-repo decision — see below).

1. Alias hygiene — fixes a regression item 4 shipped

Item 4's blind alias-union pasted real canonical skills onto other skills as aliases (Data Science, ETL, Web Development, Natural Language Processing, MariaDB, Container Orchestration, … were each a real skill and an alias of a different one), and mapped backend/frontend/server-side to two skills each. So find_exact("data science") resolved nondeterministically, sometimes landing on Machine Learning instead of the actual Data Science skill. The generator now drops any alias that shadows a canonical skill name or is assigned to >1 skill: 34 pruned, legit synonyms (py, python3, k8s, postgres, golang, …) kept. Regenerated taxonomy.json. New invariant test: 0 shadows / 0 multi-mapped.
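A minimal sketch of the pruning rule described above, built around the `kept = [...]` filter quoted in the review further down. The entry shape (`name`, `aliases`), the way `names_lower` and `assigned` are built, and the sample skills are assumptions for illustration, not the generator's actual code or data.

```python
from collections import defaultdict


def prune_aliases(entries):
    """Drop aliases that shadow a canonical name or are claimed by >1 skill."""
    # Lower-cased canonical names: an alias equal to one of these is a shadow.
    names_lower = {e["name"].lower() for e in entries}

    # For each lower-cased alias, the set of skills that claim it (assumed shape).
    assigned = defaultdict(set)
    for e in entries:
        for a in e["aliases"]:
            assigned[a.lower()].add(e["name"])

    pruned = []
    for e in entries:
        kept = [
            a for a in e["aliases"]
            if a.lower() not in names_lower and len(assigned[a.lower()]) == 1
        ]
        pruned.append({**e, "aliases": kept})
    return pruned


# Hypothetical sample data, not taken from taxonomy.json.
entries = [
    {"name": "Machine Learning", "aliases": ["ml", "Data Science"]},
    {"name": "Data Science", "aliases": ["ds"]},
    {"name": "Node.js", "aliases": ["node", "backend"]},
    {"name": "Django", "aliases": ["backend"]},
    {"name": "Python", "aliases": ["py", "python3"]},
]
result = {e["name"]: e["aliases"] for e in prune_aliases(entries)}
```

In this sample, "Data Science" is removed from Machine Learning because it shadows a canonical name, "backend" is removed from both skills that claim it, and py/python3 survive untouched.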

2. Server version advisory (the refresh-detection half of follow-up 3)

/vault-git/info may now carry a taxonomy {version, lineage} block. gitsync parses it (backward-compatible — absent on older servers ⇒ no advisory), and traitprint sync status compares it against the bundled (lineage, version): advises "upgrade to refresh" when the same-lineage server is ahead, and reports "different taxonomy, not version-comparable" on a lineage mismatch. (Auto-download is deliberately out — it needs a hosting endpoint; this ships the detection + advisory.)
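The comparison described above might look roughly like this. The function names, the `BUNDLED` value, the integer version type, and the payload layout beyond the `taxonomy {version, lineage}` block are assumptions for illustration; this is not the gitsync or traitprint implementation.

```python
from typing import Optional, Tuple

# Hypothetical bundled identity; the real values ship with the client artifact.
BUNDLED = ("canonical", 2)


def parse_taxonomy(info: dict) -> Optional[Tuple[str, int]]:
    """Return (lineage, version) from an info payload, or None if absent or malformed."""
    block = info.get("taxonomy")
    if not isinstance(block, dict):
        return None  # older server: no block, so no advisory
    lineage, version = block.get("lineage"), block.get("version")
    if not isinstance(lineage, str) or not isinstance(version, int):
        return None
    return lineage, version


def advisory(server: Optional[Tuple[str, int]],
             bundled: Tuple[str, int] = BUNDLED) -> Optional[str]:
    """Compare the server's taxonomy identity with the bundled one."""
    if server is None:
        return None
    s_lineage, s_version = server
    b_lineage, b_version = bundled
    if s_lineage != b_lineage:
        # Versions from different lineages cannot be ordered against each other.
        return "different taxonomy, not version-comparable"
    if s_version > b_version:
        return "upgrade to refresh"
    return None
```

Returning `None` for a missing or malformed block keeps older servers silent, which matches the backward-compatibility note above.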

Pairs with

traitprint-cloud adding the taxonomy block to /vault-git/info (server half).

The third follow-up (artifact-driven seeding) — needs your call, not shipped

Making Cloud's seed regenerate from the canonical artifact (true single source) + a drift-check is genuinely architectural: it requires deciding where the canonical artifact physically lives across two repos (vendor-and-sync vs a published asset both fetch), and the artifact would need to carry Cloud's typed/weighted relationships (it currently only keeps Local's derived symmetric neighbors). I didn't want to pick a cross-repo sync mechanism or rewrite production seeding blind. Options written up in the next message.

Testing

Full suite 628 passed / 3 skipped; ruff clean; mypy src/ unchanged (pre-existing cli.py:38 only).

https://claude.ai/code/session_01PsAQUnoLH94f2cbK2dSpox


Generated by Claude Code

Two taxonomy follow-ups:

1. Alias hygiene (fixes a regression item 4's blind alias-union shipped):
   the canonical generator now drops any alias that shadows a real skill's
   canonical name or is assigned to >1 skill. The union had pasted real
   skills (Data Science, ETL, Web Development, Natural Language Processing,
   MariaDB, Container Orchestration, ...) on as aliases of OTHER skills, and
   mapped backend/frontend/server-side to two skills each — so e.g. 'data
   science' resolved nondeterministically to Machine Learning instead of the
   Data Science skill. 34 such aliases pruned; legitimate synonyms (py,
   python3, k8s, postgres, golang, ...) kept. Regenerated taxonomy.json.

2. Version handshake on the client: /vault-git/info may now carry a
   taxonomy {version, lineage} block; gitsync parses it (backward-compatible:
   absent on older servers → no advisory), and traitprint sync status compares it
   against the bundled (lineage, version) and advises 'upgrade to refresh'
   when the same-lineage server is ahead (a different lineage reports
   'different taxonomy, not version-comparable').

Tests: alias-hygiene invariant (0 shadows / 0 multi-mapped), advisory cases,
status JSON shape. Full suite 629 passed; ruff clean; mypy src unchanged
(pre-existing cli.py:38 only). Pairs with the Cloud /vault-git/info change.
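The alias-hygiene invariant (0 shadows / 0 multi-mapped) could be checked along these lines. The helper name, the entry shape (`name`, `aliases`), and the sample data are assumptions; the actual test in the repository may be structured differently.

```python
from collections import Counter


def alias_violations(entries):
    """Return (shadows, multi_mapped) as sorted lists of lower-cased aliases."""
    names_lower = {e["name"].lower() for e in entries}
    # Count each alias once per skill, so a repeat inside one skill is not multi-mapped.
    counts = Counter(
        a for e in entries for a in {x.lower() for x in e["aliases"]}
    )
    shadows = sorted(a for a in counts if a in names_lower)
    multi_mapped = sorted(a for a, n in counts.items() if n > 1)
    return shadows, multi_mapped


# Hypothetical sample data: one clean set, one violating both rules.
clean = [
    {"name": "Python", "aliases": ["py", "python3"]},
    {"name": "Kubernetes", "aliases": ["k8s"]},
]
dirty = [
    {"name": "Machine Learning", "aliases": ["data science", "backend"]},
    {"name": "Data Science", "aliases": []},
    {"name": "Django", "aliases": ["Backend"]},
]
clean_report = alias_violations(clean)
dirty_report = alias_violations(dirty)
```

A test would then assert that both lists are empty for the generated artifact.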

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b69d80060f


Comment on lines +119 to +122
kept = [
    a for a in e["aliases"]
    if a.lower() not in names_lower and len(assigned[a.lower()]) == 1
]


[P2] Bump taxonomy version for alias-pruned artifact

When this new hygiene pass prunes aliases, the generated taxonomy.json content changes, but CANONICAL_VERSION remains at 2 even though the script says to bump it whenever artifact content changes. The sync advisory added in this commit compares only (lineage, version), so a client/server pair with one side built before this pruning and the other after it will both report canonical v2 and sync status will say they are aligned even though exact skill resolution can differ for inputs like AWS, NLP, or Data Science. Please bump the artifact version with this content change.
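The scenario in this comment can be illustrated with a small sketch. The artifact layout, the `identity` and `resolve` helpers, and the first-match lookup are assumptions standing in for the real artifact and `find_exact`; only the idea that the advisory compares (lineage, version) comes from the PR.

```python
def identity(artifact):
    """The pair the advisory compares."""
    return artifact["lineage"], artifact["version"]


def resolve(artifact, query):
    """Hypothetical first-match exact lookup over names and aliases."""
    q = query.lower()
    for skill in artifact["skills"]:
        if q == skill["name"].lower() or q in (a.lower() for a in skill["aliases"]):
            return skill["name"]
    return None


# Same lineage and version; only the alias content differs (hypothetical data).
before = {
    "lineage": "canonical",
    "version": 2,
    "skills": [
        {"name": "Machine Learning", "aliases": ["ml", "Data Science"]},
        {"name": "Data Science", "aliases": ["ds"]},
    ],
}
after = {
    "lineage": "canonical",
    "version": 2,
    "skills": [
        {"name": "Machine Learning", "aliases": ["ml"]},
        {"name": "Data Science", "aliases": ["ds"]},
    ],
}

same_identity = identity(before) == identity(after)
before_hit = resolve(before, "data science")
after_hit = resolve(after, "data science")
```

Here the two artifacts compare as aligned even though the same query resolves to different skills, which is the gap a version bump would close.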


@the-data-viking the-data-viking merged commit 85e37c3 into main Jun 13, 2026
4 checks passed
@the-data-viking the-data-viking deleted the claude/serene-archimedes-knryzi branch June 13, 2026 21:35

2 participants