
Use CMAPLE in weekly-tree pipeline, not raxml + garli #1

Open
SamT123 wants to merge 27 commits into skepner:main from acorg:main

SamT123 commented Jul 15, 2025

There are now enough sequences that raxml and/or garli fail (I haven't tried this myself; the information comes from Sarah & Poppy). These changes modify the pipeline to use CMAPLE (https://academic.oup.com/mbe/article/41/7/msae134/7700168), a recent, fast program that can handle very large datasets (>1 million sequences).

Two other changes: (1) increasing the recursion limit in newick.cc to handle larger trees, and (2) committing some changes that were in Sarah's /home/slj38/ae directory.

SamT123 added 5 commits July 15, 2025 09:52
These are changes from /home/slj38/ae that had not been committed to skepner/ae. I'm committing them first, then making the changes necessary to use cmaple on top.
Use CMAPLE for initial tree, not raxml
SamT123 force-pushed the main branch 3 times, most recently from fb5c216 to 1ed7815, July 16, 2025 23:54
SamT123 and others added 7 commits September 10, 2025 14:06
Previously: trim to the first ancestor of sequence X with an occurrence of mutation Y.

But this gave a bad trim if that one sequence was placed outside the clade we were aiming for (in practice the trim fails, because the trimmed tree contains too few sequences).

Now: find all nodes with an occurrence of mutations that have mutations in 'required_mutations' and none in 'forbidden_mutations'. Then check that there is only one such clade with > min_tips descendants, and trim to that clade.
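The new trimming rule can be sketched in Python as below. This is an illustration, not the pipeline's code: the `Node` class, the per-node `mutations` set (mutations occurring on the branch into that node), and reading "has mutations in 'required_mutations'" as "shares at least one mutation with it" are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Mutations occurring on the branch leading into this node (assumed representation).
    mutations: set = field(default_factory=set)
    children: list = field(default_factory=list)


def count_tips(node: Node) -> int:
    """Count leaves below `node` iteratively (no recursion on deep trees)."""
    tips, stack = 0, [node]
    while stack:
        current = stack.pop()
        if current.children:
            stack.extend(current.children)
        else:
            tips += 1
    return tips


def find_trim_clade(root: Node, required: set, forbidden: set, min_tips: int) -> Node:
    """Return the single node that carries a required mutation, carries no
    forbidden mutation, and has more than `min_tips` descendant tips."""
    candidates = []
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if node.mutations & required and not node.mutations & forbidden:
            if count_tips(node) > min_tips:
                candidates.append(node)
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one clade with > {min_tips} tips, found {len(candidates)}"
        )
    return candidates[0]
```

A stray sequence carrying the mutation now only yields a small candidate that falls below `min_tips`, so it no longer pulls the trim outside the intended clade.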
drserajames referenced this pull request in drserajames/ae Jun 12, 2026
Port the unblocked half of AD ssm-report init.py: create the report
working-dir subdirs, copy the static templates, and generate the
date-substituted report.json / setup.json that the assembly core consumes.

- init.py: init/init_dirs/copy_templates/make_report_json/
  compute_substitutions/find_previous_dir
- packaged templates: setup.json, index.html, README.org, root-gitignore,
  merges-index.html
- bin/ssm-report-init wrapper

Omits the site-specific infra it carried (albertine git-repo + hidb/seqdb/
locationdb rsync, and the rr/sy/rename-report-on-server deploy scripts that
shell out to ssm-make/ssh i19/syput) and the figure-settings half of
init_settings (serum-coverage/geographic makers, blocked on map-draw #1).

Verified: scaffolds dirs+templates and writes substituted report.json/
setup.json; date logic unit-checked (Northern/Southern season, teleconference
selection, October year split); the generated 233-page report.json round-trips
through the assembler (read_json + LatexReport ctor, correct ts_dates).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames referenced this pull request in drserajames/ae Jun 12, 2026
Port the tree-layout (node-position) computation from acmacs-tal as the first
buildable slice of the TAL subsystem (#3). The drawing layer remains blocked on
the Cairo Surface backend from subsystem #1 — see cc/tal/PORTING.md.

- cc/tal/layout.{hh,cc}: ae::tal::compute_layout(Tree&) -> TreeLayout
  {height, max_cumulative, leaves[], inodes[]}. Faithful port of acmacs-tal
  Tree::compute_cumulative_vertical_offsets() (shown leaves stacked one per
  default_vertical_offset; inode at the midpoint of its first/last shown child;
  horizontal = cumulative edge, reusing ae::tree::Tree::calculate_cumulative()).
  Uses an iterative post-order so deep ladderized trees don't overflow the stack.
- cc/py/tal.cc: ae_backend.tal submodule exposing compute_layout + NodeLayout/
  TreeLayout. Registered in cc/py/module.{cc,hh}; sources wired in meson.build
  (commented "# --- tal (subsystem #3) ---" blocks, appended).
- cc/tal/PORTING.md: milestone-1 exploration — pipeline, Node model, the full
  LayoutElement->source-file map, the Surface dependency/blocker, phased port plan,
  and arm64 build/verify gotchas.
- cc/tal/test/: hand-verifiable Newick tree + test-layout.py.
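A minimal Python sketch of the layout rules listed above, assuming all leaves are shown, leaf i sits at i × offset, and height = leaf count × offset (the exact origin and height convention of the acmacs-tal port may differ). Reversing a pre-order traversal gives a children-before-parent order without recursion, standing in for the iterative post-order. `TNode` and the returned dict are hypothetical stand-ins for ae::tree::Tree and TreeLayout.

```python
from dataclasses import dataclass, field


@dataclass
class TNode:
    name: str
    edge: float = 0.0  # branch length to the parent
    children: list = field(default_factory=list)
    x: float = 0.0     # cumulative edge length from the root
    y: float = 0.0     # vertical position


def compute_layout(root: TNode, offset: float = 1.0) -> dict:
    # Pre-order pass: each child's x is its parent's x plus its own edge.
    root.x = 0.0
    preorder, stack = [], [root]
    while stack:
        node = stack.pop()
        preorder.append(node)
        for child in reversed(node.children):  # so children pop left-to-right
            child.x = node.x + child.edge
            stack.append(child)

    # Leaves, in top-to-bottom order, are stacked one per `offset`.
    leaves = [n for n in preorder if not n.children]
    for i, leaf in enumerate(leaves):
        leaf.y = i * offset

    # Reversed pre-order visits every child before its parent, so each
    # inode can be centred between its first and last child.
    inodes = [n for n in reversed(preorder) if n.children]
    for inode in inodes:
        inode.y = (inode.children[0].y + inode.children[-1].y) / 2

    return {
        "height": len(leaves) * offset,
        "max_cumulative": max(n.x for n in preorder),
        "leaves": leaves,
        "inodes": inodes,
    }
```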

Verify (arm64 build):
  python3 cc/tal/test/test-layout.py
  -> OK: layout verified (height=5.0, max_cumulative=3.0, 5 leaves, 4 inodes)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames referenced this pull request in drserajames/ae Jun 12, 2026
Two more headless Phase-A pieces of the TAL subsystem (#3), both reusing data
ae::tree already carries; drawing remains Phase B, blocked on subsystem #1.

- cc/tal/clades.{hh,cc}: ae::tal::compute_clade_sections(Tree&) -> [Clade{name,
  sections[]}]. Port of acmacs-tal Tree::make_clade_sections() — shown leaves
  grouped (in vertical order) into per-clade vertically-contiguous runs; a gap
  starts a new section. Reuses ae::tree::Leaf::clades.
- cc/tal/time-series.{hh,cc}: ae::tal::compute_time_series(Tree&, interval,
  start?, end?) -> TimeSeries{slots[], dated/undated/outside counts}. Ports the
  data side of acmacs-tal time-series.cc (year/month/week/day slot generation +
  per-slot leaf counts) using ae::date + C++20 <chrono>, without porting
  acmacs-base/time-series. Reuses ae::tree::Leaf::date.
- cc/py/tal.cc: expose compute_clade_sections / compute_time_series and their
  result types under ae_backend.tal. meson.build: clades.cc + time-series.cc
  added to sources_tal.
- cc/tal/test/: tree-clades.json (dated, clade-annotated phylo-tree-v3 tree) +
  test-clades.py + test-time-series.py.
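The clade-section grouping can be sketched as follows, assuming each shown leaf carries a list of clade names and that any leaf without the clade breaks the run (the actual port may apply a different gap rule). The function name and the `{clade: [(first, last), ...]}` return shape are illustrative only.

```python
def compute_clade_sections(leaf_clades: list) -> dict:
    """leaf_clades[i] is the list of clade names of the i-th shown leaf,
    in vertical order. Returns {clade: [(first_index, last_index), ...]}."""
    sections: dict = {}
    last_seen: dict = {}
    for i, clades in enumerate(leaf_clades):
        for clade in clades:
            runs = sections.setdefault(clade, [])
            if runs and last_seen[clade] == i - 1:
                runs[-1] = (runs[-1][0], i)  # contiguous: extend the current section
            else:
                runs.append((i, i))          # first leaf or after a gap: new section
            last_seen[clade] = i
    return sections
```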

Verify (arm64 build):
  python3 cc/tal/test/test-clades.py       -> OK: clade sections verified ...
  python3 cc/tal/test/test-time-series.py  -> OK: time series verified ...
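For the monthly interval, the slot generation and per-slot leaf counts described above might look like this Python sketch. It assumes a half-open [start, end) range, `None` for undated leaves, and "dated" counting every leaf with a date (including those outside the range); these conventions are assumptions, not taken from the C++ code.

```python
from collections import Counter
from datetime import date


def month_slots(start: date, end: date) -> list:
    """First day of every month overlapping the half-open range [start, end)."""
    slots, current = [], date(start.year, start.month, 1)
    while current < end:
        slots.append(current)
        current = date(current.year + current.month // 12, current.month % 12 + 1, 1)
    return slots


def monthly_time_series(leaf_dates: list, start: date, end: date) -> dict:
    """Count leaves per month; None means the leaf is undated."""
    counts = Counter()
    undated = outside = 0
    for d in leaf_dates:
        if d is None:
            undated += 1
        elif not (start <= d < end):
            outside += 1
        else:
            counts[date(d.year, d.month, 1)] += 1
    return {
        "slots": [(slot, counts[slot]) for slot in month_slots(start, end)],
        "dated": len(leaf_dates) - undated,  # includes dates outside the range
        "undated": undated,
        "outside": outside,
    }
```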

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames referenced this pull request in drserajames/ae Jun 12, 2026
First drawing slice of the TAL subsystem (#3), unblocked now that map-draw (#1)
has reached M3 and the CairoPdf surface exposes line()/text().

- cc/tal/draw-tree.{hh,cc}: ae::tal::export_tree_pdf(Tree&, output, image_size,
  labels). Port of the leaf/inode loop in acmacs-tal DrawTree::draw — each node's
  horizontal edge segment (x scaled by cumulative edge) plus the vertical
  connector spanning each inode's first..last shown child; optional leaf labels.
  Reuses ae::tal::compute_layout (Phase A) and the ae::draw::CairoPdf surface
  from subsystem #1.
- cc/tal/tal-draw-main.cc + meson.build: new `tal-draw` executable
  (tal-draw [--labels] <tree.newick|tree.json[.xz]> <out.pdf> [size]). Cairo is
  linked only into this target, never into libae/ae_backend (mirrors chart-draw).
- cc/tal/test/test-draw-tree.sh: asserts the test trees render to valid PDFs.
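The drawing geometry (one horizontal edge per node, one vertical connector per inode) can be sketched in Python as a list of line segments computed from an already laid-out tree. `LaidOutNode` and `tree_segments` are hypothetical; each segment would then be handed to a line-drawing primitive such as the CairoPdf line() mentioned above, whose exact signature is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class LaidOutNode:
    x: float  # cumulative edge length, as produced by the layout step
    y: float  # vertical position, as produced by the layout step
    children: list = field(default_factory=list)


def tree_segments(root: LaidOutNode, width: float, max_cumulative: float,
                  y_scale: float = 1.0) -> list:
    """Return ((x0, y0), (x1, y1)) segments: one horizontal edge per node and
    one vertical connector per inode spanning its first..last child."""
    sx = width / max_cumulative if max_cumulative > 0 else 0.0
    segments = []
    stack = [(root, root.x)]  # (node, x of its parent); the root gets a zero-length edge
    while stack:
        node, parent_x = stack.pop()
        y = node.y * y_scale
        segments.append(((parent_x * sx, y), (node.x * sx, y)))
        if node.children:
            first, last = node.children[0], node.children[-1]
            segments.append(((node.x * sx, first.y * y_scale),
                             (node.x * sx, last.y * y_scale)))
            stack.extend((child, node.x) for child in node.children)
    return segments
```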

Used the concrete CairoPdf directly rather than first extracting an abstract
ae::draw::Surface — lowest-conflict path while map-draw is actively evolving
CairoPdf; the abstraction is deferred (see cc/tal/PORTING.md).

Verify (arm64 build):
  ninja -C build-arm64 tal-draw
  sh cc/tal/test/test-draw-tree.sh   -> OK: tal-draw renders valid PDFs
(a 20-leaf tree was also rasterised and eyeballed — correct topology, branch-
length scaling and labels.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames referenced this pull request in drserajames/ae Jun 13, 2026
The antigenic-map figures the report embeds come from kateri (the Dart map
viewer/PDF generator, driven over a socket via ae.utils.kateri), not the
shelved C++ map-draw subsystem (#1). Update the package docs accordingly:

- report.py / labs.py / __init__.py / init.py: drop "blocked on map-draw /
  not-yet-ported map-draw subsystem" framing
- README.md: rewrite the dependency boundary as a per-figure table (maps ->
  kateri, trees -> TAL, geography -> no ae renderer yet, stat -> ae_backend
  counts) and restate the next milestone as a kateri-based figure pipeline

Docs only; no code behaviour change, package still imports clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames referenced this pull request in drserajames/ae Jun 13, 2026
The `cairo` dependency is consumed by tal-draw (via cc/draw/cairo-surface),
not just the shelved map-draw chart-draw target. Relabel its comment so a
future deletion of map-draw #1 doesn't remove the cairo dep and break
tal-draw. No functional change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames referenced this pull request in drserajames/ae Jun 13, 2026
…t labels

Adds the remaining signature-page text elements to tal-draw, plus the two
surface primitives they need.

- cc/draw/cairo-surface.{hh,cc}: two new CairoPdf primitives (additive) —
  rectangle() (filled/outlined, top-left anchored) and text_rotated() (text
  rotated about an anchor; -90 reads upward). Shared surface; map-draw #1 is
  shelved (see TODO §1 kateri course-correction) and these are append-only.
- cc/tal/draw-tree.{cc,hh} + tal-draw-main.cc: TreeDrawParameters gains
  title/legend/aa_transitions; export_tree_pdf now draws a centred title
  (--title=), a clade colour legend (--legend, filled-rect swatches), inode
  aa-substitution labels (--aa-transitions; port of DrawAATransitions, reads
  Inode::aa_transitions from the phylo-tree-v3 "A" field), and rotated year/
  month labels under the time-series column.
- cc/tal/test/test-draw-tree.sh: adds an M3 invocation.

All text uses only the existing/new line()/text()/rectangle()/text_rotated()
primitives — no Pango.

Verify (arm64 build):
  ninja -C build-arm64 tal-draw
  sh cc/tal/test/test-draw-tree.sh   -> OK: tal-draw renders valid PDFs
(a 24-leaf 3-clade tree with inode aa-transitions rendered as a full
signature-page-style figure and rasterised & eyeballed.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames referenced this pull request in drserajames/ae Jun 13, 2026
Verified ae_backend.chart_v3.Chart('test/chart1.ace') loads and export()
works (the ae.utils.kateri.send_chart path). Replace the "open chart_v3
import-abort" caveat on the figure-pipeline milestone with "fixed", and point
the geographic note at the cc/geo / geo-draw renderer being built (#1) instead
of "no ae renderer yet". Docs only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames referenced this pull request in drserajames/ae Jun 14, 2026
…GRATION)

The bulk of the AD→ae port is now complete; update the stale status docs:
- CLAUDE.md: rewrite the "Porting from AD" section — subsystem table now reflects
  reality (map-draw shelved → kateri + cc/geo geo-draw; hidb/webserver/CLI done;
  TAL feature-complete; ssm-report engine consolidated). Add the note that
  antigenic maps live in kateri (driven via ae.utils.kateri), and that the chart
  engine now writes layout coords (set_coordinates / Layout.__setitem__ /
  set_unmovable). (Also carries the chart agent's complementary Layout-indexing
  doc edits, which are accurate.)
- TODO #4 row: all four figure families generate on ae; adjust ported
  (ae.adjust + kateri point-drag); remaining = a full assembled-report run +
  geo clade colouring (#1).
- py/ae/report/README.md + MIGRATION.md: headers updated from "in progress" /
  "planning" to "largely done" with the current remaining list.

Docs only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
drserajames referenced this pull request in drserajames/ae Jun 14, 2026
…uild geo-pie

Closes the largest of the three gaps the 2026-0223 capstone attempt surfaced:
ae.report.geographic could not reproduce the report's geographic maps (it only
coloured by continent). The report's AD geographic-draw colours each antigen by
its geographic_coloring(subtype) aa/clade "apply" rules.

- geo-draw: build + verify the previously-uncommitted clade-pie work (sector
  primitive, GeoWedge/LegendEntry, stable palette + legend). Synthetic-verified.
- geo-draw: new report-faithful coloring mode. GeoPoint gains per-point
  outline_width; CairoPdf::circle() skips the stroke on transparent/zero outline;
  geo-draw-main parses per-location "points":[{color,outline,outline_width,count}]
  + top-level point_size/density + per-period "title", and packs one dot per
  antigen into AD's concentric rings (pack_colored_points = port of AD
  GeographicMapWithPointsFromHidb::prepare).
- py/ae/report/geographic.py: make_geo(color_by="coloring",
  colorings={subtype: geographic_coloring(subtype)}). _Coloring is a Python port
  of AD ColoringByAminoAcid::color — ordered apply rules, sequenced rule sets fill
  only, aa rule matched via seqdb SequenceAA.matches_all (incl. ! negation and -
  deletions) overrides fill/outline/outline_width, later matches win, unmatched ->
  default. Continent + clade-pie modes unchanged.
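The ordered-rule colouring described above can be illustrated with a small Python sketch. Everything here is hypothetical: the rule dicts, the `Style` defaults, and the condition syntax ("145K", "!145K", "145-" for a gap in an aligned sequence) stand in for the real _Coloring and seqdb SequenceAA.matches_all, whose APIs are not reproduced.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Style:
    fill: str = "grey"
    outline: str = "black"
    outline_width: float = 1.0


def aa_matches_all(sequence: str, conditions: list) -> bool:
    """Hypothetical condition syntax: "145K" = aa K at 1-based position 145,
    "!145K" = anything but K there, "145-" = deletion (gap) at 145."""
    for condition in conditions:
        negate = condition.startswith("!")
        body = condition[1:] if negate else condition
        pos, aa = int(body[:-1]), body[-1]
        found = sequence[pos - 1] if 0 < pos <= len(sequence) else None
        if (found == aa) == negate:  # a plain miss or a negated hit fails
            return False
    return True


def color_antigen(sequence, rules: list, default: Style = Style()) -> Style:
    """Apply rules in order; later matches override earlier ones."""
    style = default
    for rule in rules:
        if sequence is None:
            break  # unsequenced antigens keep the default
        if rule.get("sequenced"):
            style = replace(style, fill=rule["fill"])  # sequenced rule: fill only
        elif aa_matches_all(sequence, rule["aa"]):
            overrides = {k: rule[k] for k in ("fill", "outline", "outline_width") if k in rule}
            style = replace(style, **overrides)
    return style
```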

Verified on real H1 hidb (Dec 2023 window; local hidb is stale so the exact
2025-12 reference month can't be reproduced here): dots clade-coloured by the
report's exact palette, AD-style packed clusters per location, month-name title,
no on-map legend. No real surveillance data added to the repo.

Docs: MIGRATION.md gap analysis #1 marked closed; README/TODO updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>