Skip to content

[Model] MaximumContactMapOverlap #1043

@zhangjieqingithub

Description

@zhangjieqingithub

Motivation

MaximumContactMapOverlap is a standard protein-structure comparison problem that is not currently in the catalog.

It is useful because it models contact-map alignment directly: given two ordered residue contact graphs, find an order-preserving alignment that maximizes the number of superimposed contacts. This is a core combinatorial formulation behind flexible protein-structure comparison, motif comparison, and contact-map based clustering.

Associated rule:

Definition

Name: MaximumContactMapOverlap
Aliases: CMO, MaxCMO
Reference:

  • Rumen Andonov, Noel Malod-Dognin, and Nicola Yanev, "Maximum Contact Map Overlap Revisited," Journal of Computational Biology 18(1):27-41, 2011. https://doi.org/10.1089/cmb.2009.0196
  • Wei Xie and Nikolaos V. Sahinidis, "A reduction-based exact algorithm for the contact map overlap problem," Journal of Computational Biology 14(5):637-654, 2007. https://doi.org/10.1089/cmb.2007.R007

Let
G_1 = (V_1, E_1) and G_2 = (V_2, E_2)
be two finite ordered contact maps. Each V_r is ordered as residue positions
0, 1, ..., n_r - 1, and each E_r is a simple undirected contact set on those vertices.

A solution is an order-preserving partial injective alignment
f : V_1 -> V_2 union {unmatched}.

The alignment is feasible iff:

  • each vertex of V_1 is either unmatched or mapped to one vertex of V_2,
  • no two vertices of V_1 are mapped to the same vertex of V_2,
  • if i < k in V_1 and both are matched, then f(i) < f(k) in V_2.

The objective is to maximize the number of preserved contacts:

|{ {i,k} in E_1 : i and k are both matched and {f(i), f(k)} in E_2 }|.

So this is a maximization problem.

Variables

Let n_1 = |V_1| and n_2 = |V_2|.

  • Count: n_1 variables
  • Per-variable domain: {0, 1, ..., n_2}
  • Meaning: value 0 means unmatched; value j + 1 means the vertex is matched to vertex j in V_2

A configuration is feasible iff its nonzero matched values are strictly increasing along the order of V_1. The objective value is the number of contacts of E_1 whose endpoints are mapped to endpoints of a contact in E_2.

Schema (data type)

Type name: MaximumContactMapOverlap

Field Mathematical type Description
num_vertices_1 integer Number of ordered residues/vertices in first contact map
contacts_1 set of unordered vertex pairs Scored contacts in first contact map
num_vertices_2 integer Number of ordered residues/vertices in second contact map
contacts_2 set of unordered vertex pairs Scored contacts in second contact map

Suggested size fields:

  • num_vertices_1
  • num_vertices_2
  • num_contacts_1
  • num_contacts_2

Input validation should require:

  • all contact endpoints are valid vertex indices,
  • no self-loops,
  • no parallel duplicate contacts after normalizing unordered pairs,
  • vertices are ordered implicitly by their indices.

Complexity

The direct brute-force baseline enumerates every map from V_1 to V_2 union {unmatched} and rejects those that are not order-preserving and injective.

This gives the conservative worst-case expression:

(num_vertices_2 + 1) ^ num_vertices_1.

This is a baseline enumeration bound, not a best-known exact algorithm bound. The CMO literature contains substantially stronger exact algorithms and integer-programming approaches.

Extra Remark

For biological contact maps, consecutive-residue contacts are often excluded before scoring because they are trivial backbone contacts. The clean implementation convention is that the input edge sets E_1 and E_2 already contain exactly the contacts intended to be scored. The objective should not contain an additional special case for adjacent residues.

The mathematical model is symmetric in the two contact maps, but the internal configuration is intentionally asymmetric:

graph 1 -> graph 2 or unmatched.

This keeps evaluation simple while representing every order-preserving alignment.

Reduction Rule Crossref

How to solve

  • It can be solved by brute force over order-preserving partial alignments.
  • It can be reduced exactly to ILP/bool via [Rule] MaximumContactMapOverlap to ILP #1044.
  • It also has specialized exact algorithms in the CMO literature.

Example Instance

Let the first contact map have vertices 0,1,2,3 and contacts:

E_1 = {{0,2}, {1,3}}.

Let the second contact map have vertices 0,1,2,3,4 and contacts:

E_2 = {{0,3}, {1,4}, {0,2}}.

Expected Outcome

One optimal alignment is:

Vertex in G_1 Image in G_2
0 0
1 1
2 3
3 4

This alignment is order-preserving: 0 < 1 < 3 < 4.

It preserves both contacts of G_1:

  • {0,2} maps to {0,3}, which is in E_2,
  • {1,3} maps to {1,4}, which is in E_2.

So the objective value is 2. Since G_1 has only two contacts, no alignment can preserve more than two contacts, so this is optimal.

BibTeX

@article{AndonovMalodDogninYanev2011CMO,
  author  = {Rumen Andonov and Noel Malod-Dognin and Nicola Yanev},
  title   = {Maximum Contact Map Overlap Revisited},
  journal = {Journal of Computational Biology},
  volume  = {18},
  number  = {1},
  pages   = {27--41},
  year    = {2011},
  doi     = {10.1089/cmb.2009.0196},
  url     = {https://doi.org/10.1089/cmb.2009.0196}
}

@article{XieSahinidis2007CMO,
  author  = {Wei Xie and Nikolaos V. Sahinidis},
  title   = {A Reduction-Based Exact Algorithm for the Contact Map Overlap Problem},
  journal = {Journal of Computational Biology},
  volume  = {14},
  number  = {5},
  pages   = {637--654},
  year    = {2007},
  doi     = {10.1089/cmb.2007.R007},
  url     = {https://doi.org/10.1089/cmb.2007.R007}
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    modelA model problem to be implemented.

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions