Skip to content

mummer reports inexact MEMs #235

@nrizzo

Description

@nrizzo

Hello,

with MUMmer 4.0.1 I am finding MEMs between a PacBio HiFi read (from the HG002 sequencing data, run m64004) and the T2T-CHM13v2.0 chromosome 22 (we are using the seed finding in ChainX, thanks to you and to the essaMEM/divsufsort authors for the code!).
However, I found one inexact match reported as well (see input_files.zip).

$ mummer -maxmatch -l 20 chr22.fa hg002_pacbio_hifi_read.fa | grep "12852871"
12852871     26367       116

$ seqtk subseq chr22.fa <(echo "chr22 12852870 $((12852870 + 116))")
>chr22:12852871-12852986 CP068256.2 Homo sapiens isolate CHM13 chromosome 22
aaagaagtttctgagaatgcttctgtcgagattttatatgaagatattcccgtttccaacgaaatcctgaaatctatccaaatatcccctcgcagattctacaaaaagagtgtttc

$ seqtk subseq hg002_pacbio_hifi_read.fa <(echo "m64004_210224_230828/171/ccs 26366 $((26366 + 116))")
>m64004_210224_230828/171/ccs:26367-26482
AAAGAAGTTTCTGAGAATGCTTCCGGGTAGTTTTTATGTGAAGATATTTCCTTTTCCACAATAGGCCTCAAAGTGCTCCAAATATCCAATTGCAGATTCTACAAAAAGAGTGTTTC

The coordinates seem to be the start of a MEM, which is reported correctly using a different length threshold, so I suspect a bug related to the LCP array:

$ mummer -maxmatch -l 15 chr22.fa hg002_pacbio_hifi_read.fa | grep "12852871"
12852871     26367        23

Best,
Nicola

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions