Consolidate dot_prep into uniq_anchor; add regression tests #71

Merged
MariaNattestad merged 2 commits into main from consolidate-dotprep on Jun 23, 2026

@MariaNattestad MariaNattestad commented Jun 23, 2026

Owner

From Maria

Remove a duplicate step that Dot and Assemblytics had in common. They both did unique anchor filtering, so now that's only done once.
Also add pytest with a GitHub Action so that basic regression tests run automatically on all new PRs.

From Claude

  • Eliminates duplicate delta reads: dot_prep.py previously re-read the delta file twice and re-ran the full planesweep filtering that uniq_anchor.py had already done. Now uniq_anchor.run() builds and returns (reference_lengths, fields_by_query) during its existing second pass, which cli.py threads directly into index_for_dot().
  • Removes 281 lines of duplicate code from dot_prep.py (its own copies of summarize_planesweep, binary_search, getQueryRefCombinations, calculateUniqueness, writeFilteredDeltaFile).
  • Roughly 3-5 s faster per run (Drosophila: 16 s → 12.6 s; Human: 39 s → 34 s).
  • Fixed a latent bug in index_for_dot() where all_references_by_query used a stale ref variable for repetitive alignments instead of fields[6].
  • Regression tests: new tests/test_pipeline.py runs the full pipeline on the ecoli example (~5s) and does exact-match comparison against checked-in fixture files for all key text outputs.
  • CI workflow: .github/workflows/test.yml runs pytest on every push and PR.
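
The first bullet describes a filtering pass that hands its in-memory results to the indexing step instead of having that step re-read and re-filter the file. Below is a minimal sketch of that pattern. It is not the project's code: `run_filter`, `index_from_memory`, the tuple layout, and the `min_length` rule are all invented for illustration, and only the names `reference_lengths` and `fields_by_query` come from the PR text.

```python
from collections import defaultdict


def run_filter(alignments, min_length):
    """Hypothetical stand-in for the filtering pass.

    While deciding which alignments to keep, it also collects the two
    structures the downstream step needs, so no second read is required.
    """
    reference_lengths = {}
    fields_by_query = defaultdict(list)
    for ref, ref_len, query, start, end in alignments:
        if end - start < min_length:
            continue  # filtered out; never reaches the indexing step
        reference_lengths[ref] = ref_len
        fields_by_query[query].append((ref, start, end))
    return reference_lengths, dict(fields_by_query)


def index_from_memory(reference_lengths, fields_by_query):
    """Hypothetical stand-in for the indexing step.

    It consumes the structures returned above instead of parsing the
    input again.
    """
    return {
        query: sorted({(ref, reference_lengths[ref]) for ref, _, _ in fields})
        for query, fields in fields_by_query.items()
    }


# Toy records: (reference, reference length, query, start, end).
alignments = [
    ("chr1", 1000, "contigA", 0, 500),
    ("chr2", 2000, "contigA", 10, 20),  # shorter than min_length, dropped
    ("chr2", 2000, "contigB", 100, 900),
]

reference_lengths, fields_by_query = run_filter(alignments, min_length=100)
index = index_from_memory(reference_lengths, fields_by_query)
print(index)
# {'contigA': [('chr1', 1000)], 'contigB': [('chr2', 2000)]}
```

The caller (here the three lines at the bottom, in the PR `cli.py`) is the only place that knows about both steps, which keeps the filter and the indexer independent of each other.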

Test plan

  • CI passes (test workflow runs on this PR)
  • Load ecoli/drosophila/human delta in web app Dot tab and confirm visualization renders correctly

🤖 Generated with Claude Code

MariaNattestad and others added 2 commits June 22, 2026 21:46
dot_prep previously re-read the delta file twice and re-ran the full
planesweep filtering already done by uniq_anchor. Now uniq_anchor builds
and returns (reference_lengths, fields_by_query) during its existing
second pass, which cli.py threads directly into index_for_dot(). This
removes ~300 lines of duplicate code and shaves ~3-4s off every run.

Also fixed a latent bug in index_for_dot() where all_references_by_query
used a stale `ref` variable for repetitive alignments; now uses fields[6].

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
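
The stale-variable bug mentioned in the commit message above is a common pattern, and a generic sketch may make it concrete. This is not the actual `index_for_dot()` code, which the PR does not show. The row layout, the helper names, and the query column index are assumptions; only the use of `fields[6]` for the reference comes from the PR text.

```python
from collections import defaultdict

REF = 6    # reference column, as stated in the PR (fields[6])
QUERY = 7  # query column: an assumption made for this example


def make_row(ref, query):
    """Build a toy row with placeholder values in the unused columns."""
    return [0, 0, 0, 0, 0, 0, ref, query]


def references_by_query_buggy(unique_rows, repetitive_rows):
    refs = defaultdict(set)
    ref = None
    for fields in unique_rows:
        ref = fields[REF]
        refs[fields[QUERY]].add(ref)
    for fields in repetitive_rows:
        # Bug: `ref` still holds the value from the last unique row.
        refs[fields[QUERY]].add(ref)
    return dict(refs)


def references_by_query_fixed(unique_rows, repetitive_rows):
    refs = defaultdict(set)
    for fields in unique_rows:
        refs[fields[QUERY]].add(fields[REF])
    for fields in repetitive_rows:
        # Fix: read the reference from the current row.
        refs[fields[QUERY]].add(fields[REF])
    return dict(refs)


unique_rows = [make_row("chr1", "contigA")]
repetitive_rows = [make_row("chr2", "contigA")]

print(references_by_query_buggy(unique_rows, repetitive_rows))
# {'contigA': {'chr1'}}
print(references_by_query_fixed(unique_rows, repetitive_rows) == {"contigA": {"chr1", "chr2"}})
# True
```

The buggy version raises no error. It silently records the wrong reference, which is why a bug of this kind can stay latent until outputs are compared exactly.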
Runs the full pipeline on the ecoli example (fast, ~5s) and compares
key text outputs against checked-in fixtures. Covers structural variants
BED, coords files, dot visualization coords/index, and assembly stats.
GitHub Actions runs these on every push and PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
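
The exact-match fixture comparison described in the commit message above could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of `tests/test_pipeline.py`: `find_mismatches`, `fake_pipeline`, and the file names are hypothetical, and the fixtures are created inline here, whereas the PR checks them into the repository.

```python
import tempfile
from pathlib import Path


def find_mismatches(output_dir: Path, fixture_dir: Path) -> list[str]:
    """Return names of fixture files whose generated copy is missing or differs."""
    mismatches = []
    for expected in sorted(fixture_dir.iterdir()):
        actual = output_dir / expected.name
        if not actual.is_file() or actual.read_text() != expected.read_text():
            mismatches.append(expected.name)
    return mismatches


def fake_pipeline(output_dir: Path) -> None:
    """Hypothetical stand-in for running the real pipeline on a small example."""
    (output_dir / "variants.bed").write_text("chr1\t100\t200\tDeletion\n")
    (output_dir / "stats.txt").write_text("N50: 1000\n")


def test_outputs_match_fixtures(tmp_path: Path) -> None:
    """pytest-style test: tmp_path is the directory pytest would provide."""
    fixture_dir = tmp_path / "fixtures"
    output_dir = tmp_path / "out"
    fixture_dir.mkdir()
    output_dir.mkdir()
    # Inline fixtures for the sketch; a real suite would read checked-in files.
    (fixture_dir / "variants.bed").write_text("chr1\t100\t200\tDeletion\n")
    (fixture_dir / "stats.txt").write_text("N50: 1000\n")

    fake_pipeline(output_dir)
    assert find_mismatches(output_dir, fixture_dir) == []


# Direct run without pytest, using a temporary directory in place of tmp_path.
with tempfile.TemporaryDirectory() as tmp:
    test_outputs_match_fixtures(Path(tmp))
```

Returning the list of mismatching file names, rather than a bare boolean, means a failing assertion shows which outputs drifted.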
@MariaNattestad MariaNattestad merged commit a509ddf into main Jun 23, 2026
1 check passed
@MariaNattestad MariaNattestad deleted the consolidate-dotprep branch June 23, 2026 09:51
1 participant