“Con-hi” means “consensus-highlighter”.
The latest version is 4.1.a (2026-03-18 edition).
This program annotates low-coverage and high-coverage regions of sequences in fasta format using read mapping in BAM format.
- Target sequence(s) in fasta format.
- Read mapping in a sorted BAM file.
- Coverage threshold(s) for searching low-coverage and high-coverage regions.
- A GenBank or BED file with annotated low-coverage and high-coverage regions.
- Python 3.6 or later.
- Biopython package.
- samtools 1.13 or later is recommended. Versions from 1.11 to 1.12 are acceptable, but might calculate coverage inaccurately.
You can install Biopython with the following command:
pip3 install biopython
You can install samtools by downloading the latest release from the samtools page on GitHub. Then follow the instructions in the downloaded INSTALL file.
Basic usage is:
./con-hi.py -f <TARGET_FASTA> -b <MAPPING_BAM>
You can specify custom coverage threshold(s) by passing a comma-separated list of thresholds with options -c, -C and -X. For example, the following command will annotate:
- regions with coverage below 25 and regions with coverage below 55 (and also regions with zero coverage);
- regions with coverage greater than 1.5×M and greater than 2.0×M, where M is the median coverage.
./con-hi.py \
-f my_sequence.fasta -b my_mapping.sorted.bam \
-c 25,55 -X 1.5,2.0
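To make the meaning of the -X coefficients concrete, here is a minimal Python sketch of how a list of coefficients could be turned into absolute thresholds. It is an illustration only, not con-hi's actual code: the function name `upper_thresholds` and the sample coverage values are hypothetical, and it assumes M is the median over all per-position coverage values.

```python
from statistics import median


def upper_thresholds(coverages, coefficients):
    """Return one absolute threshold per coefficient: coefficient × median coverage."""
    m = median(coverages)  # M, assumed here to be the median over all positions
    return [coef * m for coef in coefficients]


# Hypothetical per-position coverage values (median is 20)
coverages = [10, 20, 20, 20, 30, 40, 90]
print(upper_thresholds(coverages, [1.5, 2.0]))  # [30.0, 40.0]
```

With these sample values, "-X 1.5,2.0" would correspond to annotating positions with coverage above 30 and above 40.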
-f or --target-fasta: *
File of target sequence(s) in fasta format.
-b or --bam: *
Sorted BAM file which contains mapping on target sequence(s).
-o or --outfile:
Output file.
Default value: 'highlighted_sequence.gbk'.
-O or --output-format:
Output format: `genbank` or `bed`.
Default: `genbank`.
-r or --target-seq-ids:
Comma-separated list of target sequence IDs to process.
Examples: "seq_1" or "seq_1,seq_9,seq_12".
A fasta sequence ID is the part of its header before the first space.
Default: process all target sequences.
-c or --lower-coverage-thresholds:
Comma-separated list of lower coverage threshold(s).
To annotate regions with coverage < c.
Default: 10.
To disable it, specify "-c off", and low-coverage regions won't be annotated.
-C or --upper-coverage-thresholds:
Comma-separated list of upper coverage threshold(s).
To annotate regions with coverage > C.
Default: 500.
To disable it, specify "-C off", and high-coverage regions won't be annotated.
-n or --no-zero-output:
Disable annotation of zero-coverage regions.
Disabled by default.
-X or --upper-coverage-coefficients:
Comma-separated list of coverage coefficient(s).
To annotate regions with coverage > 1.7×M,
where M is median coverage, specify "-X 1.7".
Default: 2.0.
To disable it, specify "-X off", and high-coverage regions won't be annotated.
-l or --min-feature-len:
Minimum length of a feature to output. Must be int >= 0.
Default: 5 bp.
--circular:
Target sequence is circular. Affects only the corresponding GenBank field.
Disabled by default.
--organism:
Organism name. Affects only corresponding GenBank field.
If it contains spaces, surround it with quotes (see Example 4).
Empty by default.
-k or --keep-temp-cov-file:
Don't delete the temporary TSV file "coverages.tsv" after the program finishes.
The program creates this file in the same directory where the "-o" file is located.
Default behaviour is to delete this file afterwards.
* - mandatory option
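The interplay of a lower threshold (-c) and the minimum feature length (-l) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the program's real implementation: the function name `find_low_coverage_regions`, the 0-based half-open coordinates, and the sample data are all hypothetical. It uses the strict comparison "coverage < c" given in the option description.

```python
def find_low_coverage_regions(coverages, threshold, min_len=5):
    """Return (start, end) pairs (0-based, end-exclusive) of runs where
    coverage < threshold and the run is at least min_len positions long."""
    regions = []
    start = None  # start of the current low-coverage run, if any
    for pos, cov in enumerate(coverages):
        if cov < threshold:
            if start is None:
                start = pos
        elif start is not None:
            # The run ended at pos; keep it only if it is long enough
            if pos - start >= min_len:
                regions.append((start, pos))
            start = None
    # Handle a run that extends to the end of the sequence
    if start is not None and len(coverages) - start >= min_len:
        regions.append((start, len(coverages)))
    return regions


# Hypothetical per-position coverage values
coverages = [30, 30, 4, 5, 6, 7, 8, 9, 30, 3, 2, 30]
print(find_low_coverage_regions(coverages, threshold=10, min_len=5))  # [(2, 8)]
```

In this example the two-position run at the end is below the threshold but shorter than the minimum feature length, so it is not reported.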
Annotate file my_sequence.fasta with default parameters according to mapping from file my_mapping.sorted.bam:
./con-hi.py -f my_sequence.fasta -b my_mapping.sorted.bam
Annotate regions with coverage below 25, regions with coverage below 50, and regions with zero coverage:
./con-hi.py -f my_sequence.fasta -b my_mapping.sorted.bam -c 25,50
Annotate regions with coverage above 25 and regions with coverage above 50:
./con-hi.py -f my_sequence.fasta -b my_mapping.sorted.bam -C 25,50
Annotate regions with coverage below 25 and regions with coverage below 50. Disable annotation of zero-coverage regions:
./con-hi.py -f my_sequence.fasta -b my_mapping.sorted.bam -c 25,50 -n
Specify the name of the organism for output file. The sequence is circular:
./con-hi.py -f my_sequence.fasta -b my_mapping.sorted.bam \
--circular --organism "Serratia marcescens"
Disable annotation of low-coverage regions (-c off). Annotate high-coverage regions with coverage above 1.7×M and above 2.4×M (where M is median coverage) using -X option:
./con-hi.py -f my_sequence.fasta -b my_mapping.sorted.bam \
-c off -X 1.7,2.4
Target file my_sequences.fasta contains the following sequences:
- a prokaryotic chromosome (sequence id chr);
- one high-copy plasmid (sequence id plasmid_H1);
- two low-copy plasmids (sequence ids plasmid_L1 and plasmid_L2).
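Sequence IDs such as these are what the -r option expects. Since a sequence ID is the part of the fasta header before the first space, the IDs in a file can be listed with a short Python sketch like the one below. The helper name `fasta_ids` and the sample header descriptions are hypothetical, and the sketch reads plain (uncompressed) fasta lines.

```python
def fasta_ids(lines):
    """Yield the sequence ID of every fasta header found in an iterable of lines."""
    for line in lines:
        if line.startswith(">"):
            # ID = header text after '>' up to the first space
            yield line[1:].rstrip("\r\n").split(" ", 1)[0]


# Hypothetical fasta content; the header descriptions are made up
example = [
    ">chr Hypothetical chromosome description\n",
    "ACGT\n",
    ">plasmid_H1 high-copy plasmid\n",
    "ACGT\n",
]
print(",".join(fasta_ids(example)))  # chr,plasmid_H1
```

To use it on a real file, pass an open file handle instead of the list, for example `fasta_ids(open("my_sequences.fasta"))`.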
One might expect that the more copies a replicon has, the higher its read coverage is. Use a coverage threshold of 20 for the chromosome, 50 for the high-copy plasmid, and 5 for the low-copy plasmids:
./con-hi.py -f my_sequences.fasta -b my_mapping.sorted.bam \
-r chr \
-c 20
./con-hi.py -f my_sequences.fasta -b my_mapping.sorted.bam \
-r plasmid_H1 \
-c 50
./con-hi.py -f my_sequences.fasta -b my_mapping.sorted.bam \
-r plasmid_L1,plasmid_L2 \
-c 5
Output results in BED format instead of GenBank:
./con-hi.py -f my_sequence.fasta -b my_mapping.sorted.bam -O bed