fix(entity): restore person-shape exemption on the slug validation path #179

Merged
jack-arturo merged 2 commits into main from fix/entity-validator-slug-path-exemption
Jun 10, 2026

@jack-arturo

Member

Why

PR #178's applied review suggestion (commit e6876b0, "Potential fix for pull request finding") gated the person-shape exemption on the display value containing a space:

if " " in (value or "").strip() and len(parts) >= 2 and _has_person_name_shape(parts):

Stored entity tags only retain the slug (entity:people:jack-arturo), so validate_entity_tag(context=...) — the path scripts/lab/repair_entity_tags.py uses — never satisfies the guard, and the context-hint branch re-rejects every real person mentioned alongside data/projects/tooling.
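
The failure mode above can be reproduced in isolation. The following is a minimal sketch, not the code in automem/utils/entity_quality.py: `has_person_name_shape`, `exempt_with_space_guard`, and `exempt_plain` are hypothetical stand-ins, and the shape check (two to four alphabetic tokens) is an assumption made only so the example runs. It shows that a guard requiring a space can never pass for a slug, whatever the shape check says.

```python
import re


def split_tokens(value):
    # Split on whitespace or hyphens so display values and slugs tokenize alike.
    return [p for p in re.split(r"[\s\-]+", (value or "").strip()) if p]


def has_person_name_shape(parts):
    # Hypothetical stand-in for _has_person_name_shape (assumed rule:
    # two to four purely alphabetic tokens).
    return 2 <= len(parts) <= 4 and all(p.isalpha() for p in parts)


def exempt_with_space_guard(value):
    # Mirrors the shape of the guard quoted above: the exemption only
    # applies when the raw value contains a space.
    parts = split_tokens(value)
    return (
        " " in (value or "").strip()
        and len(parts) >= 2
        and has_person_name_shape(parts)
    )


def exempt_plain(value):
    # The restored behavior: no space requirement.
    parts = split_tokens(value)
    return len(parts) >= 2 and has_person_name_shape(parts)


for candidate in ("Mara Quinn", "jack-arturo"):
    print(candidate, exempt_with_space_guard(candidate), exempt_plain(candidate))
```

A slug such as jack-arturo contains no space, so the guarded variant returns False for it while the plain variant returns True; the spaced display value passes both, which is consistent with the existing tests not noticing the difference.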

Empirical impact (prod dry-run, read-only): 7,494 planned rejections with the guard vs ~6,106 expected with the exemption — ~1,390 legitimate person tags (jack-arturo ×51, zack-katz ×27, jason-coleman ×25, katie-keith ×14, ...) wrongly planned for removal. CI stayed green because every existing test exercised the spaced-value path (validate_entity_value("people", "Mara Quinn", ...)), never the slug path with context.

What

  • Restore the plain person-shape exemption (no space guard) in _looks_tool_or_org_like.
  • Address the original Copilot concern (entity:people:data-dog) deterministically: add "data" to _NON_PERSON_TECH_TOKENS, so brand-like person-shaped pairs are rejected by the per-token vocabulary check on every path (with or without context) — strictly stronger than the context-hint rejection the guard tried to preserve.
  • Regression tests for the slug path: real-person tags survive technical context via validate_entity_tag; data-dog is rejected with low_signal_people_slug without needing context.
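
To illustrate why a vocabulary check is path-independent, here is a rough sketch under stated assumptions. `NON_PERSON_TECH_TOKENS` and `people_slug_rejection` are hypothetical names for this example; only the "data" entry and the `low_signal_people_slug` reason code come from this PR, and the real `_NON_PERSON_TECH_TOKENS` set presumably holds more entries.

```python
# Illustrative subset: "data" is the token this PR adds; the real set's
# other contents are not shown here.
NON_PERSON_TECH_TOKENS = {"data"}


def people_slug_rejection(slug):
    """Return a rejection reason for a people slug, or None if it passes.

    The check looks only at the slug's own tokens, so the result is the
    same whether or not any surrounding context is supplied.
    """
    tokens = [t for t in slug.lower().split("-") if t]
    if any(t in NON_PERSON_TECH_TOKENS for t in tokens):
        return "low_signal_people_slug"
    return None


print(people_slug_rejection("data-dog"))
print(people_slug_rejection("jack-arturo"))
```

Because the function takes no context argument at all, a brand-like pair is rejected identically on the value path and the tag path, which is the property the space guard was unable to provide.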

Verification

  • pytest tests/: 490 passed, 12 skipped
  • Spot checks: jack-arturo/zack-katz/jason-coleman/katie-keith accepted with technical context on the tag path; data-dog, growthmath, claude-code-as-people still rejected

Part of the issue #72 production repair rollout (follow-up to #178).

🤖 Generated with Claude Code

PR #178's applied review suggestion (e6876b0) gated the person-shape
exemption in _looks_tool_or_org_like on the display value containing a
space. Stored entity tags only retain the slug, so validate_entity_tag
(the repair-script path) never satisfied the guard and real people
(jack-arturo, zack-katz, ...) were re-rejected by context hints — on the
production corpus the repair dry-run planned 7,494 rejections instead of
the expected ~6,100. CI missed it because every test exercised the
spaced-value path.

Restore the plain exemption and address the original review concern
(brand-like person-shaped pairs such as data-dog) deterministically by
adding "data" to _NON_PERSON_TECH_TOKENS, so the per-token vocabulary
check rejects them on every path, with or without context.

Adds slug-path regression tests so this gap stays covered.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 10, 2026 21:35

Copilot AI left a comment

Contributor

Pull request overview

This PR adjusts the entity-quality validator so that the “person-shaped name” exemption from context-hint rejection applies consistently on the slug/tag validation path (validate_entity_tag), preventing legitimate entity:people:* tags from being incorrectly rejected during repair runs.

Changes:

  • Restores the multi-token person-name exemption in _looks_tool_or_org_like for slug-only inputs (removes the “display value must contain a space” guard).
  • Adds "data" to _NON_PERSON_TECH_TOKENS so brand-like pairs such as data-dog are rejected deterministically via token vocabulary checks.
  • Adds regression tests covering slug/tag-path behavior (including validate_entity_tag).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| automem/utils/entity_quality.py | Ensures person-shaped multi-token slugs skip context-hint condemnation even when validated via validate_entity_tag; adds "data" to non-person tech tokens. |
| tests/test_entity_quality.py | Adds tests for tag-path acceptance/rejection; one test needs adjustment to actually reproduce the prior regression. |

Comment thread tests/test_entity_quality.py
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>