Skip to content

fix(loader): preserve blueprint_name and tags on file rename#219

Merged
devonjones merged 4 commits into
testfrom
fix/loader-rename-blueprint-name
May 5, 2026
Merged

fix(loader): preserve blueprint_name and tags on file rename#219
devonjones merged 4 commits into
testfrom
fix/loader-rename-blueprint-name

Conversation

@devonjones
Copy link
Copy Markdown
Collaborator

Summary

Fixes a long-standing bug in the incremental fixtures loader where renaming a file in Dropbox (same MD5, new path/basename) would leave the database row in a corrupt state: stale blueprint_name, stale search_text, no tags, no images, and deprecated = true.

This was discovered while loading the new aztlan series. A typo'd filename (atzlanaztlan) was caught and renamed in Dropbox, then the next fixture load corrupted all 27 affected floor rows. Investigation found the same pattern across other historical renames (5 dungeon_stone → aztlan column rows, 1 aztlan corner floor renamed 4x4 → 2x2, 7 building_facades gable rows, 25 rough_stone_ruined rows — 38 rows total).

What was happening

In _handle_addition's rename-within-same-fixture branch (incremental.py), when an MD5 conflict was rescued and detected as a rename:

  1. Only full_name and file_name were updated — blueprint_name and search_text were left pointing at the old file.
  2. Tags and images were correctly deleted-and-re-inserted from the new fixture data.
  3. Then the same row was hit by _handle_deprecation (because _find_missing_blueprints saw the old full_name was no longer in the fixture), which set deprecated = true and deleted the freshly re-inserted tags and images.

Compounding this, update_blueprint didn't include search_text in its allow-list, so even fixing the rename branch wouldn't have been enough on its own.

Changes

  • _handle_addition rename branch now also updates blueprint_name, recomputes search_text from the new basename + tag words, and clears deprecated.
  • The blueprint id is added to a per-load _renamed_blueprint_ids set; _handle_deprecation skips ids in this set so renamed rows aren't tombstoned in the same load.
  • update_blueprint's field list gains search_text.

Test plan

  • pytest tests/test_incremental_fixtures.py passes
  • Re-upload aztlan.json against a non-prod env and verify a renamed file ends up with: blueprint_name = file_name, deprecated = false, tags + images present, search_text containing the new tokens
  • Confirm the old buggy path is no longer reachable: a rename in a single load yields exactly one live row, not a deprecated tombstone

Note: prod data was already SQL-patched out of band (38 rows: synced blueprint_name to file_name, cleared deprecated, fixed search_text per category). The aztlan rows that lost tags/images still need a fixture re-upload; this PR is the prerequisite so that re-upload doesn't re-corrupt them.

🤖 Generated with Claude Code

When a file is renamed in Dropbox (same MD5, new path/basename), the
incremental fixtures loader was leaving blueprint_name + search_text
stale and re-deprecating the row in the same load — which then stripped
the freshly re-inserted tags and images via _handle_deprecation.

The rename branch in _handle_addition now also resyncs blueprint_name
and search_text to the new file basename and clears the deprecated
flag. The blueprint id is recorded so the subsequent deprecation pass
in the same load skips it instead of tombstoning the live row.

update_blueprint gains search_text in its allowed-fields list so the
loader can push the recomputed value through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@devonjones
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Collaborator Author

@devonjones devonjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review (Claude)

I've reviewed the changes in this PR. Here's my analysis:

Critical Issues

None. The fix is correct and addresses the documented corruption pattern.

Medium Issues

1. Latent bug: _handle_modification does not update search_text when tags change (openforge/db/fixtures/incremental.py:1029-1069)

This PR fixes the rename path, but the same class of bug exists in _handle_modification. The modify path is reached when MD5 and full_name are unchanged but tags, images, modified-time, size, or config differ (_has_significant_changes, lines 620-718). Tags feed into search_text via words in _blueprint_search_text. When tags change in modify, _munge_blueprint produces a bp_data dict with no search_text key, so update_blueprint's allow-list silently drops it (since the new search_text field is allow-listed but not present in the data) and the row's search_text stays stale.

This isn't introduced by this PR — it's pre-existing — but adding search_text to update_blueprint's allow-list now makes it trivially fixable here. Consider mirroring the rename branch's approach in _handle_modification:

# In _handle_modification, after _munge_blueprint:
words = self._get_words(modified_item)
bp_data["search_text"] = blueprint_sql._blueprint_search_text(bp_data, words)

If you'd rather scope this PR tightly to the rename bug, please file a follow-up issue — search consistency on tag edits is the same shape of corruption for users (search misses post-edit), just less visible than the rename case because the row isn't deprecated.

2. No test coverage for the rename path (tests/test_incremental_fixtures.py, tests/test_incremental.py)

I checked both incremental test files — neither has a test that exercises the same-MD5-rename branch in _handle_addition (no matches for rename or _renamed_blueprint_ids). Given the bug here corrupted ~38 production rows and was discovered by accident, this branch is worth pinning down. A unit test that:

  1. Inserts a blueprint with full_name tiles/foo/atzlan-1.stl and some tags/images.
  2. Runs apply_incremental_changes with a fixture containing the same MD5 but full_name tiles/foo/aztlan-1.stl.
  3. Asserts: one live row (not two), blueprint_name = "aztlan-1.stl", deprecated = false, tags + images present, search_text contains aztlan.

would lock in both the rename-resync fix and the deprecation-skip behavior. The PR description also lists this as an unchecked test-plan item — the fix is small enough that I'd push for the test to land alongside it.

3. Reaching across module boundary into blueprint_sql._blueprint_search_text (openforge/db/fixtures/incremental.py:924)

Calling a leading-underscore private function from another module is a convention break (the project's own CLAUDE.md documents _function_name as private). Two cleaner options:

  • Promote _blueprint_search_text to blueprint_search_text (public). It's a pure function with no side effects, and it's now used by two modules.
  • Better: wrap the rename update inside a new helper in blueprint_sql (e.g., rename_blueprint(curs, blueprint_id, new_full_name, words)) that handles the search_text recomputation internally. This keeps incremental.py from having to know about the search_text computation shape and prevents the call sites from drifting (cf. issue #1 above — if both modify and rename used the same helper, the latent bug would be fixed in one place).

I'd lean toward option 2 since it co-locates "what does it mean to update a blueprint's name" in one module.

Minor Issues

1. _renamed_blueprint_ids reset location is asymmetric with current_fixture_files (openforge/db/fixtures/incremental.py:398, 760)

current_fixture_files is reset in compare_fixture_data (the comparison phase). _renamed_blueprint_ids is reset in _apply_changes_with_cursor (the apply phase). Both are correct in the current call patterns, but the asymmetry means a future caller who runs compare → compare → apply would get the right behavior for one (current_fixture_files reflects the second compare) and the wrong behavior for the other (renamed_ids would be reset before the second compare's renamed rows are processed... actually the set is populated in _handle_addition during apply, not compare, so this is fine). Worth a one-line comment near the __init__ set to document that this state is per-apply and gets reset at the top of _apply_changes_with_cursor.

2. Comment style nit (openforge/db/fixtures/incremental.py:110-113, 808-810)

Inline em-dashes () are used in two new comments. Consistent with some existing comments in the file but not all. Not worth changing, just flagging in case the project has a style preference.

Positive Observations

  • Root-cause fix, not symptom suppression. The PR description traces the corruption through three interacting code paths (rescue-branch leaving stale fields, _find_missing_blueprints flagging the old name, _handle_deprecation stripping tags/images). The fix addresses the actual cause in each path rather than papering over it.
  • Per-load reset of _renamed_blueprint_ids. Resetting at the top of _apply_changes_with_cursor (line 760) correctly prevents leakage across fixture files when a single IncrementalFixturesLoader instance processes a directory in load_fixtures (openforge/db/fixtures/__init__.py:127-145). Good defensive instinct.
  • Search-text input shape consistency. _blueprint_search_text is called with {"blueprint_name": new_file_name} — matching how insert_blueprint calls it after _munge_blueprint sets bp["blueprint_name"] = data["file_metadata"]["file"] (the basename). Consistent indexing semantics between insert and rename — an easy thing to get wrong.
  • Allow-list extension is minimal. Just adding search_text to update_blueprint's fields list is the right surface area; no behavior change for existing callers since they don't pass it.
  • Dry-run safety. apply_incremental_changes returns early on dry_run=True (line 734-737) before _apply_changes_with_cursor, so the rename code path is never exercised in dry-run — _renamed_blueprint_ids stays empty, and no DB writes happen. Skipping deprecation is moot since deprecation also doesn't run. Correct by construction.
  • Comment explaining the deprecation skip. The inline comment at lines 808-817 explains why the skip exists, not just what it does. Future maintainers will need this — it's the kind of bug that's invisible without context.

Recommendations Summary

Must fix before merge:

None — the fix is correct as-is.

Should fix:

  1. Add a test for the same-MD5-rename branch (item Medium #2). The PR description even has it as an unchecked test-plan box; this is the regression test for the bug being fixed.
  2. Decide on _blueprint_search_text access (item Medium #3) — either promote to public or wrap in a blueprint_sql.rename_blueprint helper. The latter also opens the door to fixing item Medium #1 in one place.

Nice to have:

  1. File a follow-up (or fold into this PR) for the _handle_modification search_text staleness on tag changes (item Medium #1).
  2. One-line comment in __init__ documenting that _renamed_blueprint_ids is per-apply state (item Minor #1).

Overall Assessment

Solid bug fix that correctly addresses a real production data-corruption issue. The diagnosis in the PR description is excellent and the fix matches it. My main asks are a regression test for the rename branch and a small refactor to avoid the cross-module private-function call — neither blocks correctness. The latent _handle_modification search_text staleness deserves a follow-up regardless.

devonjones and others added 2 commits May 4, 2026 18:29
Addresses review feedback on PR #219:
- Promote _blueprint_search_text -> blueprint_search_text (public)
  to avoid cross-module access to a leading-underscore private.
- _handle_modification now recomputes search_text alongside other
  column updates, so tag edits keep search_text consistent (latent
  bug, same shape as the rename one).
- Add three regression tests for the same-MD5 rename branch:
  blueprint_name+search_text+deprecated resync, deprecation skip
  for renamed ids, and per-load reset of _renamed_blueprint_ids.
- Document _renamed_blueprint_ids as per-apply state.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@devonjones
Copy link
Copy Markdown
Collaborator Author

Response to Claude Review

Thanks for the thorough review. Pushed 2ad8ff4 (and a small chore cleanup 8d74ed9) addressing all three medium items:

Medium #1 (latent: _handle_modification doesn't update search_text): Fixed in this PR. _handle_modification now recomputes search_text via the public helper alongside the column update, so tag edits keep search consistent.

Medium #2 (no test coverage for the rename path): Added three regression tests in tests/test_incremental_fixtures.py:

  • test_rename_resyncs_blueprint_name_and_clears_deprecated — pins the resync of blueprint_name + search_text + deprecated on the same-MD5 rename branch.
  • test_renamed_id_skips_deprecation — verifies the deprecation step is a no-op for renamed ids (no mark_blueprint_deprecated, no delete_* side-effects).
  • test_apply_changes_resets_renamed_ids_per_load — guards against state leaking between fixtures when one loader instance processes a directory.

Medium #3 (cross-module access to _blueprint_search_text): Took option A — promoted _blueprint_search_textblueprint_search_text (public). Option B (a rename_blueprint helper that internalizes search_text) was tempting but would have moved the rename's tag/image delete-and-reinsert logic into the SQL module too, which felt like wrong layering — that orchestration belongs in the loader. Promoting the pure helper to public got us to one canonical search_text computation used by insert_blueprint, the rename branch, and now the modify branch.

Minor #1 (_renamed_blueprint_ids reset symmetry): Added a comment in __init__ documenting that this is per-apply state, reset at the top of _apply_changes_with_cursor.

Minor #2 (em-dash style): Won't fix — matches existing comment style in the file.

- _munge_blueprint reads file_metadata['file'], so add it to the
  fixture item used in the rename test.
- _apply_changes_with_cursor runs _link_deprecated_to_successors at
  the end which calls cursor.fetchall(); stub it.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@devonjones
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Collaborator Author

@devonjones devonjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review (Claude, Round 2)

I've reviewed the changes addressing round-1 feedback. Here's my analysis.

Round 1 Resolution

  1. Latent _handle_modification search_text bug — RESOLVED. incremental.py:1056-1060 now recomputes search_text from bp_data (the _munge_blueprint output) before calling update_blueprint. _munge_blueprint always populates blueprint_name (from data["name"] for config blueprints, or from file_metadata["file"] for file-based blueprints — utils.py:43, 52-53), so blueprint_search_text works correctly for both types. The inline comment explaining why the recompute is necessary is a nice touch.

  2. Test coverage for the rename branch — RESOLVED. Three new tests in tests/test_incremental_fixtures.py:524-635:

    • test_rename_resyncs_blueprint_name_and_clears_deprecated — asserts the four corruption-prone fields (full_name, file_name, blueprint_name, deprecated) plus search_text are all in the update payload, and that search_text contains the new tokens (aztlan, floor) and not the old (atzlan).
    • test_renamed_id_skips_deprecation — asserts mark_blueprint_deprecated, delete_all_blueprint_tags, and delete_images_for_blueprint are NOT called when the id is in the renamed set.
    • test_apply_changes_resets_renamed_ids_per_load — asserts the set is empty after an apply pass, even when seeded with stale state.
      These test observable behavior (DB calls, post-state) rather than internal mechanics. The first test directly pins the original bug shape (stale atzlan not appearing in search_text).
  3. _blueprint_search_textblueprint_search_text — RESOLVED. Promoted in blueprints.py:97, used internally by insert_blueprint at line 107, and externally by incremental.py:927, 1057. Confirmed via grep: no remaining references to the old _blueprint_search_text private name anywhere in the codebase, and no other blueprint_sql._* private access from outside the module.

New Issues

None blocking. One minor note:

  • Test brittleness is acceptable but worth flagging. test_rename_resyncs_blueprint_name_and_clears_deprecated patches six SQL boundary functions. This is unavoidable without a real DB (the conftest's clean_tables autouse fixture forces Postgres for unit tests), and the assertions still target observable behavior (the contents of the update payload). If the rename branch is later refactored to call a single helper that does the update + tag/image refresh, several patch targets will need updating — but the assertion shape would survive. Acceptable trade-off.

Other Observations

  • .beads/.gitignore addition is correct. The two added entries (.jsonl.lock, .migration-hint-ts) match the actual untracked files visible in the workspace (.beads/.jsonl.lock, .beads/.migration-hint-ts) and are runtime/lock files that shouldn't be tracked.

  • Audit of all update_blueprint callers in incremental.py is consistent:

    • Line 937 (rename branch): sets search_text explicitly. Correct.
    • Line 991 (consolidated_paths only): doesn't change name/tags, no search_text needed. Correct.
    • Line 1060 (modification): now sets search_text. Correct.

    No other path touches name/tags without refreshing search_text.

  • _renamed_blueprint_ids lifecycle is well-documented. The class-level docstring (lines 110-116) and the reset comment at line 762 make the per-apply scoping unambiguous, which matters because the loader instance can be reused across fixtures in a directory walk.

  • Config-blueprint search_text recompute is safe. _munge_blueprint sets blueprint_name = data.get("name") for config blueprints, and _compare_single_item already skips items without a name (line 494-500), so blueprint_search_text will never be handed a None blueprint_name in practice.

Overall Assessment

Ready to merge. All three round-1 items are properly resolved (not just superficially), no new issues introduced, and the regression tests pin down the original bug shape concretely enough that a future regression would fail loudly.

@devonjones
Copy link
Copy Markdown
Collaborator Author

Response to Claude Round 2

Thanks for the audit. All three round-1 items confirmed resolved with no new issues introduced. Will let one more reviewer round run to catch any final feedback before asking about merge.

@devonjones
Copy link
Copy Markdown
Collaborator Author

/gemini review

@devonjones devonjones merged commit 085b735 into test May 5, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant