masking sites in target genome after alignment #303

@JeffWeinell

I have an alignment of 58 snake genomes, generated with Progressive Cactus and stored as a HAL file. For each genome in the alignment, I have a BED file listing positions, in ungapped genome coordinates, that I want hard-masked (replaced with Ns) in an updated alignment.

The example below illustrates what I am trying to do.

Input files that I have:

(1) An alignment (portrayed here as an alignment block with dummy data for simplicity).

genome1.seqABC  CATAATT----CACCACTCGCACCAGGACGAAAAACGTATTCTTgctgacgcgtttcttatt
genome2.seqXYZ  cataattcaTCCACCACTCGCAccagGACGAAAAACGT------gctgacgcgtttcttatt

(2) A BED file (dummy data) listing the regions of genome2, in ungapped coordinates, that I want hard-masked in the updated alignment.

seqXYZ	0	9
seqXYZ	22	26

Desired updated alignment

After hard-masking the target genome sites in the BED file, the updated alignment includes unmasked, soft-masked, and hard-masked sites:

genome1.seqABC  CATAATT----CACCACTCGCACCAGGACGAAAAACGTATTCTTgctgacgcgtttcttatt
genome2.seqXYZ  NNNNNNNNNTCCACCACTCGCANNNNGACGAAAAACGT------gctgacgcgtttcttatt
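
To make the coordinate handling in this example concrete, here is a minimal Python sketch of the masking applied to a single alignment row held as a plain string. It does not read or modify a HAL file. The helper names `parse_bed` and `mask_row` are hypothetical, and the sketch assumes the BED intervals are 0-based and half-open and that gaps are written as `-`.

```python
def parse_bed(text, seq_name):
    """Return (start, end) intervals for seq_name from BED-formatted text."""
    intervals = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != seq_name:
            continue  # skip blank lines and other sequences
        intervals.append((int(fields[1]), int(fields[2])))
    return intervals


def mask_row(aligned, intervals, gap="-"):
    """Hard-mask an aligned row using intervals in ungapped coordinates.

    Gap characters are copied through unchanged and do not advance the
    ungapped position counter.
    """
    out = []
    pos = 0  # position in the ungapped sequence
    for ch in aligned:
        if ch == gap:
            out.append(ch)
            continue
        if any(start <= pos < end for start, end in intervals):
            out.append("N")
        else:
            out.append(ch)
        pos += 1
    return "".join(out)


# Dummy data from the example above.
bed_text = "seqXYZ\t0\t9\nseqXYZ\t22\t26\n"
row = "cataattcaTCCACCACTCGCAccagGACGAAAAACGT------gctgacgcgtttcttatt"

masked = mask_row(row, parse_bed(bed_text, "seqXYZ"))
print(masked)
```

Because gaps do not advance the position counter, the same BED intervals stay valid even when the target row has gaps upstream of a masked region.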

I would greatly appreciate any help with how to solve this problem!

-Jeff
