feat(entity): harden identity cleanup and repair tooling #176
Merged
…tic memory
- Add Entity nodes to FalkorDB with identity, aliases, and REFERENCED_IN edges
- New identity consolidation task: dedup entities + synthesize 2-5 sentence identities via LLM
- Identity-aware recall: inject entity identities into recall responses
- Entity API: GET /entities, GET /entity/<slug>, GET /entities/merge-candidates, POST /entity/<slug>/merge
- Migration script: scripts/migrate_entity_nodes.py (idempotent, --dry-run support)
- Configurable: IDENTITY_SYNTHESIS_MODEL, CONSOLIDATION_IDENTITY_INTERVAL_SECONDS
- Batch queries throughout (UNWIND edges, single-query recall injection)
- Conditional synthesis: skip unchanged entities to save LLM calls
- 18 tests covering migration, dedup, merge, synthesis, and recall

- Add admin token auth on entity merge endpoint
- Validate limit parameter with 400 on invalid input
- Handle None LLM response content gracefully
- Remove redundant exc in logger.exception call
- Prefix unused variable with underscore in tests
- Add return type hint to connect_falkordb()
- Run black + isort formatting on all changed files

- Reject self-merges with 400 before calling merge_entities()
- Add missing 'from typing import Any' in migrate script
- Fix _review reference in test assertion (line 483)

Line length 100, isort profile=black per .pre-commit-config.yaml

- Route contract test: add entity endpoints to expected routes
- Recall: populate entities for tag/time-window queries without query_text
- Entity dedup: use strict > (not >=) for auto-merge overlap threshold
- Entity dedup: delete alias REFERENCED_IN edges after copying to canonical
- Identity synthesis: persist true ref count, not truncated sample size
- Scheduler: don't advance identity last_run when task was skipped

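The strict-comparison change to the auto-merge threshold can be illustrated with a minimal sketch. It assumes a threshold of 0.60 (the follow-up commit's docstring fix mentions ">60%") and assumes overlap is measured against the smaller entity's memory set; the PR does not spell out the metric, and the names `memory_overlap` and `should_auto_merge` are hypothetical.

```python
from typing import Set

# Assumed value, taken from the ">60%" wording in the follow-up commit.
AUTO_MERGE_OVERLAP_THRESHOLD = 0.60


def memory_overlap(a: Set[str], b: Set[str]) -> float:
    """Share of the smaller entity's memories also referenced by the other.

    This metric is an assumption for illustration; the real dedup logic may
    compute overlap differently.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def should_auto_merge(
    a: Set[str], b: Set[str], threshold: float = AUTO_MERGE_OVERLAP_THRESHOLD
) -> bool:
    # Strict ">" means a pair sitting exactly on the threshold is NOT merged.
    return memory_overlap(a, b) > threshold
```

With `>=`, two entities sharing exactly 3 of 5 memories would be merged automatically; with strict `>`, that borderline pair falls through and is not auto-merged.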
…ntity runs
- Update entity_dedup docstring to match actual >60% threshold
- Defensive parsing of ref count query result (handles unexpected row shapes)
- Don't advance identity scheduler on error (not just skip)
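The scheduler rule from the last two commits (leave `last_run` alone when the identity task is skipped or fails) could look roughly like the sketch below. `TaskState`, `run_if_due`, and the convention that a task returns `None` to signal a skip are all hypothetical; the actual scheduler in `automem/consolidation` may be structured differently.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaskState:
    last_run: float = 0.0  # epoch seconds of the last successful run


def run_if_due(
    state: TaskState,
    interval_seconds: float,
    task: Callable[[], Optional[dict]],
    now: Optional[float] = None,
) -> str:
    """Run `task` if its interval has elapsed; advance last_run only on success."""
    now = time.time() if now is None else now
    if now - state.last_run < interval_seconds:
        return "not_due"
    try:
        result = task()
    except Exception:
        # Error: keep last_run unchanged so the next tick retries the task.
        return "error"
    if result is None:
        # Skipped (hypothetical convention): also keep last_run unchanged.
        return "skipped"
    state.last_run = now
    return "ran"
```

Advancing `last_run` only on the success path means a skipped or failed identity run is retried on the next scheduler tick instead of waiting out a full interval.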
Pull request overview
This PR hardens entity identity inputs and introduces deterministic validation/canonicalization across extraction, enrichment, migration/dedup, recall payloads, and local repair tooling; it also expands local production-clone restore/repair workflows with safer, more performant batching and tunable Qdrant/FalkorDB restore behavior.
Changes:
- Add entity-quality validation + canonicalization utilities and apply them in extraction/enrichment, migration/dedup, and recall identity payload injection.
- Introduce identity consolidation (dedup + optional identity synthesis) and wire it into consolidation scheduling/config (disabled by default).
- Add/upgrade local restore & repair tooling (batched UNWIND restores, Qdrant gRPC/timeout/tuning options, lab clone script improvements) plus extensive test coverage.
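To make the "batched UNWIND restores" item concrete, here is a minimal sketch of splitting rows into fixed-size batches and pairing each batch with one parameterized query. The Cypher text, the `Memory` label, the property names, and the default batch size are illustrative assumptions, not the contents of `scripts/restore_from_backup.py`.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical query: one round trip writes a whole batch via UNWIND.
RESTORE_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (m:Memory {id: row.id}) "
    "SET m.content = row.content"
)


def chunked(rows: List[Dict[str, Any]], batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start : start + batch_size]


def build_restore_batches(
    rows: List[Dict[str, Any]], batch_size: int = 500
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (query, params) pairs, one per batch, ready for a graph client."""
    return [(RESTORE_QUERY, {"rows": batch}) for batch in chunked(rows, batch_size)]
```

Each `(query, params)` pair would then be handed to whatever graph client the restore script uses; sending one query per batch rather than one per row is what cuts the number of round trips.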
Reviewed changes
Copilot reviewed 39 out of 40 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/test_vector_size_safety.py | Adds tests for Qdrant timeout env propagation and payload-index skipping. |
| tests/test_repair_entity_tags.py | Comprehensive tests for local entity-tag repair planning/apply, batching, retries, and timeouts. |
| tests/test_regression_drift_scout_contract.py | Contract tests for the regression drift scout skill metadata/safety guarantees. |
| tests/test_identity_config.py | Tests that identity synthesis defaults to disabled and restores interval when enabled. |
| tests/test_entity_quality.py | Tests entity-quality rejection/canonicalization rules and avoids corpus-specific deny lists. |
| tests/test_entity_identities.py | Large integration-style suite for entity migration, dedup, merge, identity synthesis, and recall injection. |
| tests/test_backup_endpoint.py | Adds restore batching/retry/timeout tests and lab clone defaults assertions. |
| tests/test_api_endpoints.py | Tests vector overfetch behavior with tag filters and tag scoring ignoring generated entities. |
| tests/contracts/test_routes_contract.py | Adds entity API endpoints to the route contract. |
| tests/conftest.py | Extends Qdrant test stubs for optimizer/HNSW diff models and VectorParams(on_disk=...). |
| scripts/restore_from_backup.py | Adds batching, retries, timeouts, gRPC/tuning envs, and readiness waits for restores. |
| scripts/migrate_entity_nodes.py | New migration script to create first-class Entity nodes from entity:* tags with validation gates. |
| scripts/lab/clone_production.sh | Improves lab restore isolation/perf; supports API backup URL flow, gRPC port, and skip-api mode. |
| README.md | Updates Railway deployment description to include standalone graph viewer service group. |
| INSTALLATION.md | Documents adding the standalone graph viewer companion service and required env wiring. |
| docs/RAILWAY_DEPLOYMENT.md | Adds a dedicated “Standalone Graph Viewer” section and marks setup as completed in checklist. |
| docs/ENVIRONMENT_VARIABLES.md | Documents viewer CORS hardening and identity synthesis env vars. |
| docker-compose.yml | Adds gRPC port, configurable FalkorDB persistence args, and Qdrant timeout/payload-index toggles. |
| consolidation.py | Adds identity consolidation step and an OpenAI client helper for identity synthesis. |
| benchmarks/EXPERIMENT_LOG.md | Adds experiment log entries for recent main refresh and entity hardening run. |
| automem/utils/scoring.py | Excludes metadata.entities from recursive scoring traversal. |
| automem/utils/entity_quality.py | New deterministic entity validation/canonicalization module. |
| automem/utils/entity_extraction.py | Applies entity-quality validation and canonical display normalization to extracted entities. |
| automem/stores/runtime_clients.py | Passes Qdrant timeout from env; allows skipping payload indexes via env toggle. |
| automem/enrichment/runtime_orchestration.py | Validates/normalizes entities before tagging during enrichment/JIT enrichment. |
| automem/consolidation/runtime_scheduler.py | Includes identity step effects in scheduler emitted stats. |
| automem/consolidation/runtime_helpers.py | Adds identity interval override and includes identity in “full” mode task mapping. |
| automem/consolidation/runtime_bindings.py | Wires identity scheduler interval override into runtime creation. |
| automem/consolidation/identity_synthesis.py | New identity synthesis runner: current-state memory gathering + dedup + LLM identity storage. |
| automem/consolidation/entity_dedup.py | New entity dedup/merge logic using slug similarity + memory overlap with safeguards. |
| automem/config.py | Adds identity synthesis config vars and new task field mapping. |
| automem/api/runtime_bootstrap.py | Registers the new entity blueprint. |
| automem/api/recall.py | Adds entity identity injection in recall responses and adjusts vector candidate overfetch behavior with tag filters. |
| automem/api/entity.py | New Entity API endpoints (list/get/audit/merge/merge-candidates). |
| app.py | Wires identity interval config into consolidation runtime. |
| .gitignore | Ignores data/ directory. |
| .env.example | Documents viewer env vars and CORS configuration. |
| .dockerignore | Excludes additional local/lab/benchmark output directories from Docker build context. |
| .agents/skills/automem-regression-drift-scout/SKILL.md | Adds a read-only regression drift scout skill definition. |
This was referenced Jun 8, 2026
jack-arturo added a commit that referenced this pull request Jun 10, 2026
…nd event categories (#178)

## Summary

Production Gate-3 review of the entity-tag repair rollout (the human review of `rejected-tags.csv` before executing `repair_entity_tags.py` against production) surfaced three over-rejection classes in the deployed validator. Of 7,677 planned tag removals, ~1,600 were legitimate entities.

### What was wrong

1. **Real people rejected by the context-hint branch.** `_looks_tool_or_org_like` rejected any person whose memory content contained generic words (`data`, `project`, `platform`, `tool`...). In an engineering corpus that's nearly every memory: 725 distinct multi-token person names were condemned, including the corpus owner's own canonical entity.
2. **`code` as a primary fragment token** rejected real tool entities: `claude-code` (80 occurrences), `vs-code`.
3. **`events` and `opportunities` categories missing from `_CATEGORY_ALIASES`**: every such tag was dropped as `unknown_category`.

### The fix

- Person-shaped multi-token slugs skip the context-hint branch. CamelCase and tool/org suffix signals still apply (`growthmath`-style names are still rejected); single-token brand-like people (`automem`, `claude`) are still rejected contextually.
- `code` demoted to `_MARKDOWN_OR_CODE_SECONDARY_TOKENS` (needs a second code-ish token to reject). People slugs containing `code` and path/markdown fragments remain rejected via existing checks.
- `events`/`opportunities` (+ singular aliases) added to the category map.
- Backstop for noise the context branch used to catch: `_NON_PERSON_COMMON_TOKENS` (bottom-line / deck-today / email-highlights / claude-desktop class) and `pipeline` added to `_NON_PERSON_TECH_TOKENS`.
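The primary/secondary token distinction behind the `code` demotion can be sketched as follows. Only the placement of `code` in the secondary set comes from the commit message; the other set members, the names `PRIMARY_TOKENS`, `SECONDARY_TOKENS`, and `looks_like_code_fragment`, and the exact counting rule are assumptions for illustration, not the validator's actual contents.

```python
# Illustrative token sets: only `code` being secondary is taken from the
# commit message; every other member is a hypothetical placeholder.
PRIMARY_TOKENS = frozenset({"markdown", "snippet"})
SECONDARY_TOKENS = frozenset({"code", "block"})


def looks_like_code_fragment(slug: str) -> bool:
    """Reject on any primary token, or on two or more secondary tokens."""
    tokens = slug.lower().split("-")
    if any(token in PRIMARY_TOKENS for token in tokens):
        return True
    # A lone secondary token (e.g. the `code` in `claude-code`) is not enough.
    secondary_hits = sum(token in SECONDARY_TOKENS for token in tokens)
    return secondary_hits >= 2
```

Under this rule `claude-code` and `vs-code` pass because `code` is their only code-ish token, while a slug made of two secondary tokens is still rejected.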
### Empirical validation on a production clone (10,061 memories)

| | deployed validator | this PR |
|---|---|---|
| planned rejections | 7,677 | 6,106 |
| freed (all person/tool/event entities) | n/a | 1,603 across 743 distinct tags |
| newly caught (all generated noise) | n/a | 32 |

Every freed tag inspected by category: person names, `claude-code`/`vs-code`, `entity:events:*`, `entity:opportunities:*`. Every newly-rejected tag is generated noise (`good-plugins`, `chrome`, `claude-desktop`, `stream-deck`-as-person).

This also stops enrichment-time over-stripping: the same validator gates `_validated_entities()`, so new memories were silently losing these entities on every store since #176 deployed.

## Test Plan

- `pytest tests/` → 487 passed, 12 skipped (env-dependent)
- `make lint` → clean
- New tests: person names in technical context, code-suffixed tools, event/opportunity categories, common-word-pair people noise; all using synthetic names per the no-real-fixtures rule

Refs #72. Part of the staged production entity repair (Gate 3 of the rollout runbook).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
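The rejection counts reported in the empirical validation reconcile arithmetically. The short check below uses only figures quoted in the commit message; the variable names are illustrative.

```python
# Figures quoted in the commit message for the 10,061-memory production clone.
planned_before = 7_677  # planned rejections under the deployed validator
freed = 1_603           # legitimate tags no longer rejected
newly_caught = 32       # generated-noise tags newly rejected

# Freed tags leave the plan; newly caught tags enter it.
planned_after = planned_before - freed + newly_caught
assert planned_after == 6_106  # matches the "this PR" column

# Share of the original plan that turned out to be legitimate entities.
freed_share = freed / planned_before
print(f"{planned_after} planned rejections; {freed_share:.1%} of the original plan freed")
```

The freed share works out to roughly a fifth of the original plan, consistent with the "~1,600 of 7,677" figure in the summary.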
This was referenced Jun 10, 2026
Summary
Breaking Changes
None. Identity payload fields are additive; repair tooling is local/admin CLI only.
Related
Refs #72 and #124.
Test Plan
- `make test` -> 433 passed, 1 skipped, 25 deselected
- `make lint`
- `prod-api-20260608-220749`: `sync-only`, `reject-only`, `canonicalize-safe`, and post-migration all passed vector identity, zero bare-tag mutations, and no hard preserve regressions.
- `prod-api-20260609-102848`: `sync-only`, `reject-only`, and `canonicalize-safe` passed; post-migration had one LoCoMo near-tie top swap classified as review with identical top-5 membership and no hard preserve regression.