fix(entity): restore person-shape exemption on the slug validation path #179
Merged
PR #178's applied review suggestion (e6876b0) gated the person-shape exemption in _looks_tool_or_org_like on the display value containing a space. Stored entity tags only retain the slug, so validate_entity_tag (the repair-script path) never satisfied the guard and real people (jack-arturo, zack-katz, ...) were re-rejected by context hints. On the production corpus the repair dry-run planned 7,494 rejections instead of the expected ~6,100. CI missed it because every test exercised the spaced-value path.

Restore the plain exemption and address the original review concern (brand-like person-shaped pairs such as data-dog) deterministically by adding "data" to _NON_PERSON_TECH_TOKENS, so the per-token vocabulary check rejects them on every path, with or without context. Adds slug-path regression tests so this gap stays covered.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pull request overview
This PR adjusts the entity-quality validator so that the “person-shaped name” exemption from context-hint rejection applies consistently on the slug/tag validation path (validate_entity_tag), preventing legitimate entity:people:* tags from being incorrectly rejected during repair runs.
Changes:
- Restores the multi-token person-name exemption in _looks_tool_or_org_like for slug-only inputs (removes the “display value must contain a space” guard).
- Adds "data" to _NON_PERSON_TECH_TOKENS so brand-like pairs such as data-dog are rejected deterministically via token vocabulary checks.
- Adds regression tests covering slug/tag-path behavior (including validate_entity_tag).
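The interaction between the exemption, the removed space guard, and the vocabulary check can be sketched with a toy model. This is only an illustration under stated assumptions: the function names, both word sets, and the definition of "person-shaped" below are hypothetical stand-ins, not the code in automem/utils/entity_quality.py.

```python
import re

# Hypothetical vocabularies; the real sets are not shown in this PR.
NON_PERSON_TECH_TOKENS = frozenset({"data", "code", "cloud"})
CONTEXT_HINT_WORDS = frozenset({"database", "pipeline", "plugin"})


def split_tokens(text: str) -> list[str]:
    """Lowercase and split on whitespace, hyphens, and underscores."""
    return [tok for tok in re.split(r"[\s_-]+", text.lower()) if tok]


def is_person_shaped(value: str) -> bool:
    """Assumed shape: two or three purely alphabetic tokens."""
    toks = split_tokens(value)
    return 2 <= len(toks) <= 3 and all(tok.isalpha() for tok in toks)


def rejects_as_person(value: str, context: str = "", *, require_space: bool = False) -> bool:
    """Toy decision: True means the value is rejected as a person entity.

    require_space=True models the guarded behaviour described for e6876b0;
    require_space=False models the restored plain exemption.
    """
    # Per-token vocabulary check: applies on every path, context or not.
    if any(tok in NON_PERSON_TECH_TOKENS for tok in split_tokens(value)):
        return True
    # Person-shape exemption, optionally gated on a literal space.
    if is_person_shaped(value) and (" " in value or not require_space):
        return False
    # Context-hint rejection for everything that was not exempted.
    return any(word in CONTEXT_HINT_WORDS for word in split_tokens(context))


if __name__ == "__main__":
    ctx = "migrated the database pipeline"
    for value in ("Mara Quinn", "jack-arturo", "data-dog"):
        guarded = rejects_as_person(value, ctx, require_space=True)
        restored = rejects_as_person(value, ctx)
        print(f"{value!r}: guarded={guarded} restored={restored}")
```

In this model the slug jack-arturo is rejected only when the space guard is on, while data-dog is rejected in both modes and without any context, because the vocabulary check runs first.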
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| automem/utils/entity_quality.py | Ensures person-shaped multi-token slugs skip context-hint condemnation even when validated via validate_entity_tag; adds "data" to non-person tech tokens. |
| tests/test_entity_quality.py | Adds tests for tag-path acceptance/rejection; one test needs adjustment to actually reproduce the prior regression. |
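Whether a test reproduces the prior regression depends on which input shape reaches the guard. The sketch below shows why a spaced display value and its stored slug behave differently under a "contains a space" check. The slug rule and the helpers make_tag, parse_tag, and passes_space_guard are assumptions made for illustration; only the entity:people:... tag shape comes from the PR.

```python
import re


def slugify(display_value: str) -> str:
    """Hypothetical slug rule: lowercase, non-alphanumeric runs become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", display_value.lower()).strip("-")


def make_tag(category: str, display_value: str) -> str:
    """Build a stored tag of the form entity:<category>:<slug>."""
    return f"entity:{category}:{slugify(display_value)}"


def parse_tag(tag: str) -> tuple[str, str]:
    """Split a stored tag back into (category, slug)."""
    prefix, category, slug = tag.split(":", 2)
    if prefix != "entity":
        raise ValueError(f"not an entity tag: {tag!r}")
    return category, slug


def passes_space_guard(value: str) -> bool:
    """The kind of guard described in the PR: true only for spaced values."""
    return " " in value


display = "Mara Quinn"
tag = make_tag("people", display)      # 'entity:people:mara-quinn'
category, slug = parse_tag(tag)

print(passes_space_guard(display))     # True: the path existing tests covered
print(passes_space_guard(slug))        # False: the path the repair script takes
```

A test that only passes the display value never sees the second case, so a regression test has to start from the stored tag (or the bare slug) and supply context.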
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This was referenced Jun 10, 2026
Why
PR #178's applied review suggestion (commit e6876b0, "Potential fix for pull request finding") gated the person-shape exemption on the display value containing a space.
Stored entity tags only retain the slug (entity:people:jack-arturo), so validate_entity_tag(context=...), the path scripts/lab/repair_entity_tags.py uses, never satisfies the guard, and the context-hint branch re-rejects every real person mentioned alongside data/projects/tooling.

Empirical impact (prod dry-run, read-only): 7,494 planned rejections with the guard vs ~6,106 expected with the exemption, so ~1,390 legitimate person tags (jack-arturo ×51, zack-katz ×27, jason-coleman ×25, katie-keith ×14, ...) were wrongly planned for removal. CI stayed green because every existing test exercised the spaced-value path (validate_entity_value("people", "Mara Quinn", ...)), never the slug path with context.

What
- Restore the plain person-shape exemption in _looks_tool_or_org_like.
- Address the original review concern (entity:people:data-dog) deterministically: add "data" to _NON_PERSON_TECH_TOKENS, so brand-like person-shaped pairs are rejected by the per-token vocabulary check on every path (with or without context). This is strictly stronger than the context-hint rejection the guard tried to preserve.
- Add slug-path regression tests: real people are accepted with technical context through validate_entity_tag; data-dog is rejected with low_signal_people_slug without needing context.

Verification
- pytest tests/: 490 passed, 12 skipped
- jack-arturo/zack-katz/jason-coleman/katie-keith accepted with technical context on the tag path; data-dog, growthmath, claude-code-as-people still rejected

Part of the issue #72 production repair rollout (follow-up to #178).
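The dry-run figures quoted under Why can be sanity-checked with a few lines of arithmetic. The numbers are the ones stated in this PR (the expected count is itself approximate); the variable names are illustrative only.

```python
# Figures quoted in the PR description (prod dry-run, read-only).
planned_with_guard = 7_494        # rejections planned with the space guard
expected_with_exemption = 6_106   # approximate expected rejections

# Person tags that the guard would have wrongly removed.
wrongly_rejected = planned_with_guard - expected_with_exemption
print(wrongly_rejected)           # 1388, i.e. the "~1,390" quoted above

# The four named examples account for only a small part of that gap.
named_examples = {
    "jack-arturo": 51,
    "zack-katz": 27,
    "jason-coleman": 25,
    "katie-keith": 14,
}
named_total = sum(named_examples.values())
print(named_total)                # 117

# Share of the guarded plan that was spurious.
spurious_share = wrongly_rejected / planned_with_guard
print(f"{spurious_share:.1%}")
```

Roughly one planned rejection in five would have removed a legitimate person tag, which is why the mismatch was visible in the dry-run totals even though CI passed.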
🤖 Generated with Claude Code