Use CMAPLE in weekly-tree pipeline, not raxml + garli #1
Open
SamT123 wants to merge 27 commits into
These are changes from /home/slj38/ae which had not been committed to skepner/ae. I'm committing them, then making the changes necessary to use cmaple on top.
Use CMAPLE for initial tree, not raxml
fb5c216 to
1ed7815
Previously: first ancestor of sequence X with occurrence of mutation Y. But this had the problem of the trim being bad if that one sequence is placed outside of the clade we are aiming for (well, it fails because the trimmed tree contains too few sequences).
Now: find all nodes with occurrence of mutations which do have the mutations in 'required_mutations' and do not have mutations in 'forbidden mutations'. Then check there is only one such clade with > min_tips descendants, and trim to that clade.
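The new selection rule can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Node` class, `count_tips`, `find_clade_root` and the mutation labels are all hypothetical, and it assumes each node stores the mutations occurring on the branch leading to it and that every required mutation must be present on that branch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node (not the ae tree classes)."""
    name: str
    mutations: frozenset = frozenset()  # assumed: mutations on the branch leading to this node
    children: list = field(default_factory=list)


def count_tips(node):
    """Count leaves below node iteratively, so deep trees do not hit the recursion limit."""
    tips, stack = 0, [node]
    while stack:
        current = stack.pop()
        if current.children:
            stack.extend(current.children)
        else:
            tips += 1
    return tips


def find_clade_root(root, required_mutations, forbidden_mutations, min_tips):
    """Return the single node that has all required and no forbidden mutations
    and more than min_tips descendant tips; raise if there is not exactly one."""
    required, forbidden = set(required_mutations), set(forbidden_mutations)
    candidates, stack = [], [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if required <= node.mutations and not (forbidden & node.mutations):
            if count_tips(node) > min_tips:
                candidates.append(node)
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one clade with > {min_tips} tips, found {len(candidates)}")
    return candidates[0]


# Toy tree with placeholder mutation labels "A1B" (required) and "C2D" (forbidden).
big = Node("big", frozenset({"A1B"}), [Node("t1"), Node("t2"), Node("t3")])
stray = Node("stray", frozenset({"A1B"}))  # single misplaced sequence
other = Node("other", frozenset({"A1B", "C2D"}), [Node("u1"), Node("u2"), Node("u3")])
root = Node("root", children=[big, stray, other])

print(find_clade_root(root, {"A1B"}, {"C2D"}, min_tips=2).name)  # prints "big"
```

The misplaced single sequence (`stray`) no longer decides the trim: it matches the mutations but is discarded by the `min_tips` threshold, and the call fails loudly if more than one large clade matches.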
…orting
So test dirs don't get linked as previous
drserajames
referenced
this pull request
in drserajames/ae
Jun 12, 2026
Port the unblocked half of AD ssm-report init.py: create the report working-dir subdirs, copy the static templates, and generate the date-substituted report.json / setup.json that the assembly core consumes.
- init.py: init/init_dirs/copy_templates/make_report_json/compute_substitutions/find_previous_dir
- packaged templates: setup.json, index.html, README.org, root-gitignore, merges-index.html
- bin/ssm-report-init wrapper
Omits the site-specific infra it carried (albertine git-repo + hidb/seqdb/locationdb rsync, and the rr/sy/rename-report-on-server deploy scripts that shell out to ssm-make/ssh i19/syput) and the figure-settings half of init_settings (serum-coverage/geographic makers, blocked on map-draw #1).
Verified: scaffolds dirs+templates and writes substituted report.json/setup.json; date logic unit-checked (Northern/Southern season, teleconference selection, October year split); the generated 233-page report.json round-trips through the assembler (read_json + LatexReport ctor, correct ts_dates).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames
referenced
this pull request
in drserajames/ae
Jun 12, 2026
Port the tree-layout (node-position) computation from acmacs-tal as the first buildable slice of the TAL subsystem (#3). The drawing layer remains blocked on the Cairo Surface backend from subsystem #1; see cc/tal/PORTING.md.
- cc/tal/layout.{hh,cc}: ae::tal::compute_layout(Tree&) -> TreeLayout {height, max_cumulative, leaves[], inodes[]}. Faithful port of acmacs-tal Tree::compute_cumulative_vertical_offsets() (shown leaves stacked one per default_vertical_offset; inode at the midpoint of its first/last shown child; horizontal = cumulative edge, reusing ae::tree::Tree::calculate_cumulative()). Uses an iterative post-order so deep ladderized trees don't overflow the stack.
- cc/py/tal.cc: ae_backend.tal submodule exposing compute_layout + NodeLayout/TreeLayout. Registered in cc/py/module.{cc,hh}; sources wired in meson.build (commented "# --- tal (subsystem #3) ---" blocks, appended).
- cc/tal/PORTING.md: milestone-1 exploration: pipeline, Node model, the full LayoutElement->source-file map, the Surface dependency/blocker, phased port plan, and arm64 build/verify gotchas.
- cc/tal/test/: hand-verifiable Newick tree + test-layout.py.
Verify (arm64 build):
python3 cc/tal/test/test-layout.py -> OK: layout verified (height=5.0, max_cumulative=3.0, 5 leaves, 4 inodes)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
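For readers unfamiliar with the layout rule in this commit message, here is a rough Python sketch of the same idea. It is not the C++ port: `LayoutNode` and this `compute_layout` are hypothetical, every leaf is assumed to be shown, and the first leaf is assumed to sit one offset below the top.

```python
from dataclasses import dataclass, field


@dataclass
class LayoutNode:
    """Hypothetical node; cumulative and vertical are filled in by compute_layout."""
    name: str
    edge: float = 0.0
    children: list = field(default_factory=list)
    cumulative: float = 0.0
    vertical: float = 0.0


def compute_layout(root, default_vertical_offset=1.0):
    # Iterative pre-order walk: horizontal position = cumulative edge length from the root.
    order, stack = [], [(root, 0.0)]
    while stack:
        node, parent_cumulative = stack.pop()
        node.cumulative = parent_cumulative + node.edge
        order.append(node)
        # Push children reversed so the first child is visited first.
        for child in reversed(node.children):
            stack.append((child, node.cumulative))

    # Leaves are stacked one per default_vertical_offset, in tree order.
    height = 0.0
    for node in order:
        if not node.children:
            height += default_vertical_offset
            node.vertical = height

    # Reversed pre-order visits every child before its parent (a valid post-order),
    # so each inode can take the midpoint of its first and last child.
    for node in reversed(order):
        if node.children:
            node.vertical = (node.children[0].vertical + node.children[-1].vertical) / 2

    return {"height": height, "max_cumulative": max(n.cumulative for n in order)}


# Toy tree ((a:1,b:1):1,c:2)
root = LayoutNode("root", 0.0, [
    LayoutNode("ab", 1.0, [LayoutNode("a", 1.0), LayoutNode("b", 1.0)]),
    LayoutNode("c", 2.0),
])
summary = compute_layout(root)
print(summary, root.vertical)  # {'height': 3.0, 'max_cumulative': 2.0} 2.25
```

Using an explicit stack instead of recursion is what keeps deep ladderized trees from overflowing the call stack, which is the same concern behind the recursion-limit change in newick.cc.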
drserajames
referenced
this pull request
in drserajames/ae
Jun 12, 2026
Two more headless Phase-A pieces of the TAL subsystem (#3), both reusing data ae::tree already carries; drawing remains Phase B, blocked on subsystem #1.
- cc/tal/clades.{hh,cc}: ae::tal::compute_clade_sections(Tree&) -> [Clade{name, sections[]}]. Port of acmacs-tal Tree::make_clade_sections(): shown leaves grouped (in vertical order) into per-clade vertically-contiguous runs; a gap starts a new section. Reuses ae::tree::Leaf::clades.
- cc/tal/time-series.{hh,cc}: ae::tal::compute_time_series(Tree&, interval, start?, end?) -> TimeSeries{slots[], dated/undated/outside counts}. Ports the data side of acmacs-tal time-series.cc (year/month/week/day slot generation + per-slot leaf counts) using ae::date + C++20 <chrono>, without porting acmacs-base/time-series. Reuses ae::tree::Leaf::date.
- cc/py/tal.cc: expose compute_clade_sections / compute_time_series and their result types under ae_backend.tal. meson.build: clades.cc + time-series.cc added to sources_tal.
- cc/tal/test/: tree-clades.json (dated, clade-annotated phylo-tree-v3 tree) + test-clades.py + test-time-series.py.
Verify (arm64 build):
python3 cc/tal/test/test-clades.py -> OK: clade sections verified ...
python3 cc/tal/test/test-time-series.py -> OK: time series verified ...
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
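The clade-section grouping described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: this `compute_clade_sections` is a hypothetical stand-in for the C++ function, its input is a plain list of already-shown leaves in vertical order, and the clade names "A" and "B" are placeholders.

```python
def compute_clade_sections(leaves):
    """leaves: list of (name, set_of_clades) for shown leaves, in vertical order.

    Returns {clade: [(first_index, last_index), ...]}, one tuple per
    vertically contiguous run; a gap in the run starts a new section.
    """
    sections = {}
    for index, (_name, clades) in enumerate(leaves):
        for clade in clades:
            runs = sections.setdefault(clade, [])
            if runs and runs[-1][1] == index - 1:
                # The previous leaf was in the same clade: extend the current run.
                runs[-1] = (runs[-1][0], index)
            else:
                # First leaf of this clade, or a gap since its last leaf: new section.
                runs.append((index, index))
    return sections


leaves = [("a", {"A"}), ("b", {"A"}), ("c", {"B"}), ("d", {"A"})]
sections = compute_clade_sections(leaves)
print(sections)  # {'A': [(0, 1), (3, 3)], 'B': [(2, 2)]}
```

Clade "A" ends up with two sections because leaf "c" interrupts it, which is the "a gap starts a new section" rule.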
drserajames
referenced
this pull request
in drserajames/ae
Jun 12, 2026
First drawing slice of the TAL subsystem (#3), unblocked now that map-draw (#1) has reached M3 and the CairoPdf surface exposes line()/text().
- cc/tal/draw-tree.{hh,cc}: ae::tal::export_tree_pdf(Tree&, output, image_size, labels). Port of the leaf/inode loop in acmacs-tal DrawTree::draw: each node's horizontal edge segment (x scaled by cumulative edge) plus the vertical connector spanning each inode's first..last shown child; optional leaf labels. Reuses ae::tal::compute_layout (Phase A) and the ae::draw::CairoPdf surface from subsystem #1.
- cc/tal/tal-draw-main.cc + meson.build: new `tal-draw` executable (tal-draw [--labels] <tree.newick|tree.json[.xz]> <out.pdf> [size]). Cairo is linked only into this target, never into libae/ae_backend (mirrors chart-draw).
- cc/tal/test/test-draw-tree.sh: asserts the test trees render to valid PDFs.
Used the concrete CairoPdf directly rather than first extracting an abstract ae::draw::Surface (lowest-conflict path while map-draw is actively evolving CairoPdf); the abstraction is deferred (see cc/tal/PORTING.md).
Verify (arm64 build):
ninja -C build-arm64 tal-draw
sh cc/tal/test/test-draw-tree.sh -> OK: tal-draw renders valid PDFs
(a 20-leaf tree was also rasterised and eyeballed: correct topology, branch-length scaling and labels.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames
referenced
this pull request
in drserajames/ae
Jun 13, 2026
The antigenic-map figures the report embeds come from kateri (the Dart map viewer/PDF generator, driven over a socket via ae.utils.kateri), not the shelved C++ map-draw subsystem (#1). Update the package docs accordingly:
- report.py / labs.py / __init__.py / init.py: drop "blocked on map-draw / not-yet-ported map-draw subsystem" framing
- README.md: rewrite the dependency boundary as a per-figure table (maps -> kateri, trees -> TAL, geography -> no ae renderer yet, stat -> ae_backend counts) and restate the next milestone as a kateri-based figure pipeline
Docs only; no code behaviour change, package still imports clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames
referenced
this pull request
in drserajames/ae
Jun 13, 2026
The `cairo` dependency is consumed by tal-draw (via cc/draw/cairo-surface), not just the shelved map-draw chart-draw target. Relabel its comment so a future deletion of map-draw #1 doesn't remove the cairo dep and break tal-draw. No functional change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames
referenced
this pull request
in drserajames/ae
Jun 13, 2026
…t labels
Adds the remaining signature-page text elements to tal-draw, plus the two
surface primitives they need.
- cc/draw/cairo-surface.{hh,cc}: two new CairoPdf primitives (additive) —
rectangle() (filled/outlined, top-left anchored) and text_rotated() (text
rotated about an anchor; -90 reads upward). Shared surface; map-draw #1 is
shelved (see TODO §1 kateri course-correction) and these are append-only.
- cc/tal/draw-tree.{cc,hh} + tal-draw-main.cc: TreeDrawParameters gains
title/legend/aa_transitions; export_tree_pdf now draws a centred title
(--title=), a clade colour legend (--legend, filled-rect swatches), inode
aa-substitution labels (--aa-transitions; port of DrawAATransitions, reads
Inode::aa_transitions from the phylo-tree-v3 "A" field), and rotated year/
month labels under the time-series column.
- cc/tal/test/test-draw-tree.sh: adds an M3 invocation.
All text uses only the existing/new line()/text()/rectangle()/text_rotated()
primitives — no Pango.
Verify (arm64 build):
ninja -C build-arm64 tal-draw
sh cc/tal/test/test-draw-tree.sh -> OK: tal-draw renders valid PDFs
(a 24-leaf 3-clade tree with inode aa-transitions rendered as a full
signature-page-style figure and rasterised & eyeballed.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames
referenced
this pull request
in drserajames/ae
Jun 13, 2026
Verified ae_backend.chart_v3.Chart('test/chart1.ace') loads and export()
works (the ae.utils.kateri.send_chart path). Replace the "open chart_v3
import-abort" caveat on the figure-pipeline milestone with "fixed", and point
the geographic note at the cc/geo / geo-draw renderer being built (#1) instead
of "no ae renderer yet". Docs only.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames
referenced
this pull request
in drserajames/ae
Jun 14, 2026
…GRATION)
The bulk of the AD→ae port is now complete; update the stale status docs:
- CLAUDE.md: rewrite the "Porting from AD" section; subsystem table now reflects reality (map-draw shelved → kateri + cc/geo geo-draw; hidb/webserver/CLI done; TAL feature-complete; ssm-report engine consolidated). Add the note that antigenic maps live in kateri (driven via ae.utils.kateri), and that the chart engine now writes layout coords (set_coordinates / Layout.__setitem__ / set_unmovable). (Also carries the chart agent's complementary Layout-indexing doc edits, which are accurate.)
- TODO #4 row: all four figure families generate on ae; adjust ported (ae.adjust + kateri point-drag); remaining = a full assembled-report run + geo clade colouring (#1).
- py/ae/report/README.md + MIGRATION.md: headers updated from "in progress" / "planning" to "largely done" with the current remaining list.
Docs only.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames
referenced
this pull request
in drserajames/ae
Jun 14, 2026
…uild geo-pie
Closes the largest of the three gaps the 2026-0223 capstone attempt surfaced:
ae.report.geographic could not reproduce the report's geographic maps (it only
coloured by continent). The report's AD geographic-draw colours each antigen by
its geographic_coloring(subtype) aa/clade "apply" rules.
- geo-draw: build + verify the previously-uncommitted clade-pie work (sector
primitive, GeoWedge/LegendEntry, stable palette + legend). Synthetic-verified.
- geo-draw: new report-faithful coloring mode. GeoPoint gains per-point
outline_width; CairoPdf::circle() skips the stroke on transparent/zero outline;
geo-draw-main parses per-location "points":[{color,outline,outline_width,count}]
+ top-level point_size/density + per-period "title", and packs one dot per
antigen into AD's concentric rings (pack_colored_points = port of AD
GeographicMapWithPointsFromHidb::prepare).
- py/ae/report/geographic.py: make_geo(color_by="coloring",
colorings={subtype: geographic_coloring(subtype)}). _Coloring is a Python port
of AD ColoringByAminoAcid::color — ordered apply rules, sequenced rule sets fill
only, aa rule matched via seqdb SequenceAA.matches_all (incl. ! negation and -
deletions) overrides fill/outline/outline_width, later matches win, unmatched ->
default. Continent + clade-pie modes unchanged.
Verified on real H1 hidb (Dec 2023 window; local hidb is stale so the exact
2025-12 reference month can't be reproduced here): dots clade-coloured by the
report's exact palette, AD-style packed clusters per location, month-name title,
no on-map legend. No real surveillance data added to the repo.
Docs: MIGRATION.md gap analysis #1 marked closed; README/TODO updated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
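The "ordered apply rules, later matches win, unmatched -> default" behaviour described for _Coloring can be sketched as below. This is an assumption-laden illustration, not the ported code: `matches_all` here is a simplified stand-in for seqdb's SequenceAA.matches_all, the rule dictionary layout and the example sequences are hypothetical, and positions are taken to be 1-based.

```python
def matches_all(sequence, conditions):
    """True if every condition holds for the aligned amino-acid sequence.

    A condition is "<position><aa>", e.g. "2K"; a leading "!" negates it and
    "-" as the amino acid stands for a deletion at that position.
    """
    for condition in conditions:
        negate = condition.startswith("!")
        spec = condition[1:] if negate else condition
        position, amino_acid = int(spec[:-1]), spec[-1]
        present = position <= len(sequence) and sequence[position - 1] == amino_acid
        if present == negate:
            return False
    return True


def color_for(sequence, rules, default):
    """Apply rules in order; each matching rule overrides the keys it sets."""
    style = dict(default)
    for rule in rules:
        if matches_all(sequence, rule["aa"]):
            style.update(rule["style"])
    return style


default = {"fill": "grey", "outline": "black", "outline_width": 1}
rules = [
    {"aa": ["2K"], "style": {"fill": "blue"}},
    {"aa": ["2K", "!4D"], "style": {"fill": "red", "outline_width": 2}},
]

print(color_for("MKTN", rules, default))  # both rules match, the later one wins
print(color_for("MKTD", rules, default))  # only the first rule matches
print(color_for("MRTN", rules, default))  # nothing matches, default is returned
```

Because a matching rule only overrides the keys it names, a later rule can change the fill while leaving an earlier rule's outline in place.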
There are now enough sequences that raxml and/or garli fail (I haven't tried this myself; this information comes from Sarah & Poppy). These changes modify the pipeline to use CMAPLE (https://academic.oup.com/mbe/article/41/7/msae134/7700168), a fast, recent program that can handle many (>1 million) sequences.
Two other changes: (1) increasing the recursion limit in newick.cc to handle larger trees, and (2) committing some changes that were in Sarah's /home/slj38/ae directory.