Skip to content

[Model] MaximumAcyclicAgreementForest #1046

@zhangjieqingithub

Description

@zhangjieqingithub

Motivation

In phylogenetics, when two gene trees on the same species set disagree, the disagreement can be explained by hybridization (reticulation) events. The hybridization number of two rooted binary phylogenetic trees $T_1, T_2$ is the minimum number of reticulations needed in any phylogenetic network that displays both trees. Baroni, Grünewald, Moulton, Semple (2005) proved this equals $|F| - 1$, where $F$ is a minimum-size maximum acyclic agreement forest. Computing MAAF is therefore the central exact-tractability question behind hybridization-network reconstruction, and is central to the FPT and approximation literature (Bordewich–Semple 2007; Whidden–Beiko–Zeh 2013).

Definition

  • Name: MaximumAcyclicAgreementForest (alias MAAF)

  • Reference: Baroni, Grünewald, Moulton, Semple, "Bounding the number of hybridisation events for a consistent evolutionary history", Journal of Mathematical Biology 51(2), 2005.

  • Formal definition: Given two rooted binary phylogenetic X-trees $T_1, T_2$ with common leaf set $X$, find a partition $F = {X_1,\dots,X_k}$ of $X$ such that

    1. for each block $X_i$, the subtrees $T_1|X_i$ and $T_2|X_i$ (restricted to $X_i$ with degree-2 vertices suppressed) are isomorphic, and the subtrees restricted to different blocks are edge-disjoint in both $T_1$ and $T_2$; and
    2. the ancestor digraph on ${X_1,\dots,X_k}$ — with arc $X_i \to X_j$ iff for some $t \in {1,2}$ the root of $T_t|X_i$ is an ancestor in $T_t$ of the root of $T_t|X_j$ — is acyclic.

    Objective: minimize $k = |F|$. The hybridization number of ${T_1, T_2}$ equals $k - 1$.

Variables

  • Count: $n = |X|$ (one variable per leaf)
  • Per-variable domain: ${1, 2, \dots, n}$ (block labels)
  • Meaning: variable $x_\ell$ is the index of the block of $F$ to which leaf $\ell \in X$ is assigned.

Schema

  • Type name: MaximumAcyclicAgreementForest
  • Variants: single default variant (pair of rooted binary trees, unweighted).
Field Type (math) Meaning
tree1 rooted binary tree with leaves labeled by $X$ first input tree $T_1$
tree2 rooted binary tree with leaves labeled by $X$ second input tree $T_2$ on the same leaf set
  • Size fields: num_leaves (= $n = |X|$).

Complexity

$O(3.18^{\texttt{num_leaves}} \cdot \texttt{num_leaves})$ — Whidden, Beiko, Zeh, "Fixed-Parameter Algorithms for Maximum Agreement Forests", SIAM Journal on Computing 42(4), 2013. The acyclic-forest branching gives hybridization number in FPT time in $k$; in the worst case $k = \Theta(n)$, yielding the stated exponential bound. https://epubs.siam.org/doi/10.1137/110845045

Extra Remark

The non-acyclic variant (drop condition (2)) is Maximum Agreement Forest (MAF); $|F_{\text{MAF}}| - 1$ equals the rooted-SPR distance between $T_1$ and $T_2$ (Bordewich–Semple 2005). MAAF and MAF coincide on many inputs but differ on "cycle" configurations, making the acyclicity constraint algorithmically essential.

Reduction Rule Crossref

How to solve

Brute-force (baseline): enumerate all $n^n$ label assignments and verify the two feasibility conditions.

A companion MAAF → ILP rule provides a solver path via the existing ILP backend. A companion MAAF → MinimumFeedbackVertexSet rule provides a structural reduction to directed feedback vertex set (the "cycle killer" reduction of van Iersel et al. 2012), enabling the ILP/QUBO solver chain already registered for MinimumFeedbackVertexSet.

Example Instance

The 4-leaf quartet swap (canonical example from Baroni et al. 2005):

  • $X = {a, b, c, d}$
  • $T_1 = ((a, b), (c, d))$
  • $T_2 = ((a, c), (b, d))$

Expected Outcome

No cherry ${x, y} \subseteq X$ is common to both trees (${a,b}$ is a cherry of $T_1$ but not $T_2$; ${a,c}$ is a cherry of $T_2$ but not $T_1$; etc.), so every non-singleton block $X_i \subseteq X$ would produce non-isomorphic restricted subtrees. The optimal acyclic agreement forest is therefore

$$F^\star = {{a}, {b}, {c}, {d}}, \qquad |F^\star| = 4,$$

and the hybridization number of $(T_1, T_2)$ equals $|F^\star| - 1 = 3$.

BibTeX

@article{BaroniGMS2005,
  author  = {Baroni, Magnus and Gr{\"u}newald, Stefan and Moulton, Vincent and Semple, Charles},
  title   = {Bounding the number of hybridisation events for a consistent evolutionary history},
  journal = {Journal of Mathematical Biology},
  volume  = {51},
  number  = {2},
  pages   = {171--182},
  year    = {2005},
  doi     = {10.1007/s00285-005-0315-9},
}

@article{WhiddenBZ2013,
  author  = {Whidden, Chris and Beiko, Robert G. and Zeh, Norbert},
  title   = {Fixed-Parameter Algorithms for Maximum Agreement Forests},
  journal = {SIAM Journal on Computing},
  volume  = {42},
  number  = {4},
  pages   = {1431--1466},
  year    = {2013},
  doi     = {10.1137/110845045},
}

@article{BordewichS2007,
  author  = {Bordewich, Magnus and Semple, Charles},
  title   = {Computing the hybridization number of two phylogenetic trees is fixed-parameter tractable},
  journal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics},
  volume  = {4},
  number  = {3},
  pages   = {458--466},
  year    = {2007},
  doi     = {10.1109/tcbb.2007.1019},
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    modelA model problem to be implemented.

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions