Skip to content

Apply mAIC to PartitionFinder#127

Merged
bqminh merged 59 commits into
masterfrom
huaiyan
Jun 18, 2026
Merged

Apply mAIC to PartitionFinder#127
bqminh merged 59 commits into
masterfrom
huaiyan

Conversation

@HuaiyanRen

Copy link
Copy Markdown
Collaborator

No description provided.

thomaskf and others added 30 commits July 2, 2025 18:24
…. 2. greedy algorithm added. 3. calculation works for keep_empty_seq. 4 pairs generated in previous iteration are reconsidered for merging. 5. multi-thread works properly. 6. avoid unnecessary warning.

Q-mixture model now can be estimated by iteratively adding components from one
@bqminh

bqminh commented Mar 5, 2026

Copy link
Copy Markdown
Member

Claude Code:

PR #127 — "Apply mAIC to PartitionFinder"

Author: HuaiyanRen | Branch: huaiyan → master | 36 commits | ~178 additions / ~1,108 deletions across 26 files

What It Does

The stated goal is to integrate marginal AIC (mAIC) into the PartitionFinder merging algorithm — specifically:

  1. Remove the !params.contain_nonrev guard so mAIC is computed even when non-reversible models are present
  2. Remove the outError() that blocked the "1-taxon intersection + non-reversible model" case in partitionmodel.cpp
  3. Refactor checkAbsentStates() from void to int to aggregate absent-state counts across partitions

In practice, the PR is a snapshot of a development branch that diverged from master ~8 months ago and has not been rebased. The majority of the diff is unintended reversion of features and bug fixes added to master in the interim.

Critical Issues (Block Merge)

1. Compilation error — ASSSERT typo
In main/phylotesting.cpp:
ASSSERT(check); // three S's — undefined macro, build fails
ASSERT is the correct macro. This is a build-breaking error.

2. Memory leak in checkAbsentStates()
double *state_freq = new double[num_states]; // allocated
if (seq_type == SEQ_POMO)
return 0; // leaked — never freed
The early PoMo return exits before delete[] state_freq.

3. Mathematically incorrect mAIC for non-reversible models
The PR removes the !params.contain_nonrev guard and also removes the outError() that protected against "1-taxon intersection + non-reversible model". The fallthrough now silently uses equilibrium state frequencies (log_state_freq[char_id]) as the marginal approximation — which is only valid for reversible models. For non-reversible models, root placement matters and the approximation produces biased marginal likelihoods with no warning.

4. Massive unintended regressions (~1,000 lines of production code removed)

The PR inadvertently drops everything added to master while huaiyan was diverged:

  • MrBayes export: entire --mrbayes / printMrBayesBlockFile() feature + all printMrBayesModelText() implementations across 5 model classes
  • AliSim: simulate_alignment() C API, --pop-size, --skip-bl-check, seed enforcement, branch-length safety check, double-free fix for first_insertion
  • FunDi: iterative convergence loop with fallback replaced by a single optimization call
  • Weighted NNI perturbation: --weighted-perturbation, branchLengthSafe(), computeRankOverE(), sampleIndexByWeights()
  • MixtureFinder: protein/multistate support (--force-aa-mix-finder), per-data-type initial frequency set selection (initFreqSet() removed, hardcoded "FO" used instead)
  • libiqtree C API: other_options, blfix, simulate_alignment(), iqtree_free() all removed (breaking API change)
  • Bug fixes: split.cpp fix for ntaxa % UINT_BITS == 0; alignment parsimony auto-switch for large datasets

Moderate Issues

5. checkAbsentStates() semantics changed
The severity logic is altered: count >= num_states-1 (all-but-one absent) now produces the same fatal error as all-gaps. Warning messages downgraded from outWarning() to cout <<, losing standard warning machinery. The BRLEN_FIX bypass has no documented rationale.

6. MixtureFinder initFreqSet() removal
Replacing per-seq_type frequency selection with hardcoded "FO" silently produces wrong starting frequency sets for codon data (should be F1X4,F3X4) and morphological data (should be FQ).

7. Spelling errors introduced

  • "not asigned by user" (missing 's') in phyloanalysis.cpp
  • "defaut: 5" (missing 'l') in tools.cpp help text

8. CI weakened
CI restricted to master pushes only; GCC 9 and GCC 12 test matrix entries removed.

What Is Valuable

The core mAIC contribution (return-count refactor of checkAbsentStates, per-partition aggregation, enabling mAIC for non-reversible models) is a legitimate and useful addition — it is just buried under a large unintended reversion.

Verdict

Not ready to merge. The branch needs to be rebased onto current master and submitted as a targeted PR containing only the mAIC-specific changes, retaining all features and bug fixes that master accumulated in the interim. At minimum, the ASSSERT typo and PoMo memory leak must be fixed, and the non-reversible mAIC path needs either a correct mathematical implementation or a restored guard with a clear roadmap comment.

HuaiyanRen and others added 24 commits March 12, 2026 14:23
…n matrix computation (L(x) = Σ_r π(r) · P(r→x|t))
…), for nonrev model (use artificial taxon strategy).
…gnment when applying mAIC within PartitionFinder.

@bqminh bqminh left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked locally

@bqminh bqminh merged commit f5c9ba4 into master Jun 18, 2026
8 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants