MOSTAR is a comprehensive bioinformatics pipeline for microbial analysis of whole-genome Oxford Nanopore sequencing data (ONT reads). The pipeline constructs highly polished genomes (using hybrid or non-hybrid assembly) and performs functional annotation, AMR profiling, ICE detection, and taxonomic classification, with built-in quality controls and an interactive HTML report. Because the pipeline bridges the gap between long-read and short-read technology, its name is inspired by the historic Stari Most (Old Bridge) of Mostar, a symbol of connection and cultural resilience.
MOSTAR has been developed and tested on S. aureus, B. fragilis, and H. influenzae strains, but will work with any bacterium as long as the correct genome size and ONT model are specified. The pipeline bundles some of the most widely used tools in bioinformatics and is designed to be a "one-stop shop" for most bacterial analyses. It also provides result and log files from every included tool.
- Long-read quality trimming (Filtlong)
- De novo assembly (Flye)
- ONT consensus polishing (Medaka)
- Genome rotation (circlator)
- AMR profiling (AMRFinder+)
- Interactive HTML report
- Short-read quality trimming (Fastp)
- Short-read alignment to ONT consensus (BWA-MEM)
- Short-read polishing (Polypolish)
- Taxonomic classification (Kraken2 / EMU)
- Functional annotation (Bakta)
- ICE detection — Integrative and Conjugative Elements (MacSyFinder / CONJScan)
- Plasmid-borne AMR cross-referencing (geNomad + AMRFinder+)
- Prophage detection and localisation (geNomad)
A successful run produces the following output: the final polished FASTA, the HTML report, and the individual output files and logs from all included tools.
Output_folder
|- amr_results
|  |- maps/ (Contains high-res .png circular genome maps)
|  |- AMR_Report.tsv
|- annotation
|- flye
|- ice_detection
|- intermediate
|- logs
|- medaka
|- taxonomy
|- amr_summary.html
|- MOSTAR_Final_Report.html
|- MOSTAR_Assembly.fasta
The installation has been designed to be as simple as possible. The included YML creates a separate conda environment with all required dependencies. The only manual step is downloading and configuring the databases. On some systems, geNomad can cause dependency problems. If the installation hangs, remove geNomad from the YML and install it separately.
# Download the repository
git clone https://github.com/nermze/mostar.git
# Change to MOSTAR dir
cd mostar
# Create mostar_env using supplied YML
conda env create -f environment.yml -v
conda activate mostar_env
# Install MOSTAR
python -m pip install .
# Test the install
mostar --help
# If you encounter installation problems, first remove geNomad from the YML, then do
conda env create -f environment.yml -v
conda install -c conda-forge -c bioconda genomad
# Use micromamba to install (much faster)
conda install micromamba
micromamba env create -f environment.yml -v
micromamba activate mostar_env
python -m pip install .
# Activate env (if not activated)
conda activate mostar_env
# Download AMRFinder+ database:
amrfinder -u
# Download and install CONJScan (required for MacSyFinder)
msf_data install CONJScan
# Download bakta database (Specify light or full)
bakta_db download --output <output-path> --type [light|full]
# Download Kraken2 database
# To download the small pre-built db (any Kraken2 compatible DB will also work)
mkdir -p ~/kraken2_db && cd ~/kraken2_db
wget https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_08gb_20240904.tar.gz
tar -xvzf k2_pluspf_08gb_20240904.tar.gz
# Download geNomad database in current directory (or specify path), approx 1.5Gb
genomad download-database .
# Required:
* ONT-reads
* Genome size
* Model
* Output
# Run MOSTAR in ONT-only mode, assemble genome and perform AMR analysis.
mostar --ont ont.fq.gz --genome-size [size] --output [dir] --model [model]
# Run MOSTAR in Hybrid mode, assemble genome and perform AMR analysis.
mostar --ont ont.fq.gz --genome-size [size] --output [dir] --model [model] --r1 R1.fq --r2 R2.fq
# The "Everything" Run (Taxonomy, Annotation, ICE, and Plasticity/Prophages):
mostar --ont ont_read.fastq.gz --r1 read1.fastq.gz --r2 read2.fastq.gz \
--genome-size 1.9m --output Output \
--kraken2-db kraken2_db_path \
--bakta-db db-light_path --ice \
--genomad-db genomad_db_path --plasticity

| Required | Tool/Name | Description |
|---|---|---|
| --ont | ONT Reads | Nanopore long-reads (.fastq.gz) |
| --genome-size | Genome Size | Estimated size (e.g., 2.1m, 500k) |
| --output | Output | Directory name for output files |
| --model | Model | Default: r1041_e82_400bps_sup_v5.2.0 |
| Options | | |
| --r1/--r2 | Illumina | Forward & Reverse short-reads (.fastq.gz) |
| --organism | AMRFinder+ | Organism (e.g., Escherichia, Staphylococcus) |
| --meta | Flye | Enable Meta-Genome mode, omit --genome-size [Default: disabled] |
| Annotation | | |
| --bakta-db | Bakta | Path to Bakta database |
| --bakta-ref | Bakta | Annotation reference sequence (.gff) |
| --complete | Bakta | Enable if sequence is complete (circular) [Default: disabled] |
| Mobile element Detection | | |
| --ice | MacSyFinder | Use with --bakta-db [Default: disabled] |
| --plasticity | geNomad | Plasticity and prophage tracker [Default: disabled] |
| Classification | | |
| --kraken2-db | Kraken2 | Requires path to pre-built Kraken2 database |
| --confidence | Kraken2 | Kraken2 confidence threshold [Default: 0.1] |
| Other | | |
| --cleanup | Cleanup | Delete intermediate files |
| --threads | Threads | Select number of threads |
| --help/-h | Help | Show help menu |
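The --genome-size examples in the table (2.1m, 500k) use a suffix shorthand. As a rough illustration of what such a value means in base pairs, here is a minimal sketch. The helper name `parse_genome_size` and the accepted suffixes (k, m, g) are assumptions made for this example; it is not MOSTAR's own parsing code.

```python
from decimal import Decimal

# Hypothetical helper, not part of MOSTAR: converts "2.1m" / "500k" to base pairs.
_SUFFIXES = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_genome_size(text: str) -> int:
    """Return the genome size in base pairs for strings such as '2.1m' or '500k'."""
    value = text.strip().lower()
    if not value:
        raise ValueError("empty genome size")
    multiplier = 1
    if value[-1] in _SUFFIXES:
        multiplier = _SUFFIXES[value[-1]]
        value = value[:-1]
    # Decimal avoids binary floating-point rounding when scaling values like 2.1.
    return int(Decimal(value) * multiplier)


print(parse_genome_size("2.1m"))  # 2100000
print(parse_genome_size("500k"))  # 500000
```

A quick conversion like this can help when comparing the value you pass to --genome-size with the total assembly length reported at the end of a run.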
The report features key run metrics from Medaka (and Polypolish in hybrid mode), including assembly statistics and the number of contigs. The report is dynamic and adapts to user input, since some steps, such as taxonomic classification and short-read polishing, are optional. If taxonomy has been enabled, the pipeline automatically passes the identified species ID on to AMRFinder+.
By including geNomad in the pipeline, MOSTAR also detects plasmid-borne AMR genes, as well as prophages and their locations.
The report also draws interactive genome maps that visualize AMR gene locations and direction, detected ICEs, and GC content.
If any ICEs are detected by MacSyFinder CONJScan, the pipeline also extracts their genomic coordinates from the annotation file provided by Bakta and visualizes them on the map. Notice how AMR genes are located on the ICE.
Finally, the report also features a detailed AMR table derived from NCBI AMRFinder+. Plasmid-borne genes are color-coded in red.
1. Fastp
2. Flye
3. Medaka
4. BWA
5. AMRFinder+
6. Bakta
7. Polypolish
8. Filtlong
9. Samtools
10. Minimap2
11. Kraken2
12. MacSyFinder
13. geNomad
14. Python3
If --model is not specified, MOSTAR defaults to r1041_e82_400bps_sup_v5.2.0, which corresponds to R10.4.1 flowcells basecalled with the Super Accuracy model at 400 bps. This default is appropriate for most modern ONT runs but must be changed if your data was generated on a different flowcell or basecalling configuration. Using the wrong model is one of the most common causes of poor polishing outcomes. To list all models available in your Medaka installation, run:
medaka tools list_models
The --ice module depends on protein sequences produced by Bakta to query the MacSyFinder CONJScan database. If --bakta-db is not provided, Bakta annotation is skipped and no .faa file will be produced, causing ICE detection to be silently bypassed. Always pair --ice with --bakta-db to ensure this module runs. If you see the warning No protein file found — skipping ICE detection, this is the cause.
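The pairing rule above can be checked before a long run is started. The following is a minimal sketch of such a pre-flight check. The function `preflight` is hypothetical and written only for illustration; the single rule it encodes (--ice needs --bakta-db) is the one described in the text, and it does not reflect how MOSTAR validates its arguments internally.

```python
import shlex
from typing import List


def preflight(argv: List[str]) -> List[str]:
    """Return human-readable problems found in a MOSTAR argument list."""
    problems = []
    # Rule taken from the text above: ICE detection needs Bakta's protein output.
    if "--ice" in argv and "--bakta-db" not in argv:
        problems.append(
            "--ice was given without --bakta-db; ICE detection would be skipped"
        )
    return problems


# Example command line (the file and directory names are placeholders).
command = "mostar --ont ont.fq.gz --genome-size 1.9m --output Output --ice"
issues = preflight(shlex.split(command))
for issue in issues:
    print("WARNING:", issue)
```

Because the pipeline bypasses ICE detection silently in this situation, a check of this kind is cheapest when run before submitting the job.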
When --kraken2-db is provided, MOSTAR uses the top-confidence Kraken2 hit to identify the organism and passes it to AMRFinder+ as the --organism flag, enabling species-specific point mutation screening in addition to gene-based resistance detection. Point mutation models are only available for a subset of clinically relevant organisms. If your organism is not supported, AMRFinder+ will still run in gene-detection mode without point mutations. To see all supported organisms, run:
amrfinder --list_organisms
If you know your organism and want to override automatic detection, or if you are running without a Kraken2 database, use:
--organism Klebsiella
Leave --organism unset if the organism is unknown; AMRFinder+ will still provide a complete gene-level resistome profile.
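To make the idea of a "top hit" concrete, the sketch below picks the species-level entry with the most clade-assigned reads from a Kraken2 report. It assumes the standard six-column, tab-separated report layout (percentage, clade reads, direct reads, rank code, taxid, name). The function `top_species` and the sample rows are invented for illustration, and MOSTAR's actual selection logic may differ.

```python
from typing import Optional

# Made-up example rows in the standard six-column Kraken2 report layout.
SAMPLE_ROWS = [
    ("2.00", "200", "200", "U", "0", "unclassified"),
    ("98.00", "9800", "50", "D", "2", "  Bacteria"),
    ("90.00", "9000", "300", "G", "1279", "    Staphylococcus"),
    ("85.00", "8500", "8500", "S", "1280", "      Staphylococcus aureus"),
    ("3.00", "300", "300", "S", "1282", "      Staphylococcus epidermidis"),
]
SAMPLE_REPORT = "\n".join("\t".join(row) for row in SAMPLE_ROWS)


def top_species(report_text: str) -> Optional[str]:
    """Return the species (rank code 'S') with the most clade reads, or None."""
    best_name, best_reads = None, -1
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rank = fields[3].strip()
        clade_reads = int(fields[1])
        if rank == "S" and clade_reads > best_reads:
            best_name, best_reads = fields[5].strip(), clade_reads
    return best_name


best = top_species(SAMPLE_REPORT)
print(best)  # Staphylococcus aureus
```

Inspecting the report this way is a simple sanity check when the organism reported by MOSTAR is unexpected, for example in a mixed culture.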
If your assembly is fragmented, missing expected genomic features, or producing an unusually high contig count, your sample may have uneven read depth. This is common in direct clinical extractions, environmental samples, mixed cultures, or plasmid-enriched preps. Re-run with the --meta flag to enable Flye's uneven-coverage assembly mode, which does not assume uniform depth across the genome:
mostar --ont reads.fq.gz --genome-size 5m --output outdir --meta
Note that --meta mode disables some of Flye's coverage-based error correction, so it should only be used when standard assembly fails or produces poor results.
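One quick way to judge fragmentation is to count the contigs in the final FASTA and compare the total length with the expected genome size. The sketch below does this with a small hypothetical helper, `fasta_stats`, applied to a tiny demo file; with a real run you would point it at MOSTAR_Assembly.fasta in the output folder.

```python
import os
import tempfile


def fasta_stats(path: str) -> dict:
    """Count contigs and summarise their lengths in a FASTA file."""
    lengths = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                lengths.append(0)  # a header line starts a new contig
            elif line and lengths:
                lengths[-1] += len(line)  # sequence lines may be wrapped
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths, default=0),
    }


# Demo on a tiny made-up FASTA file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fasta")
    with open(demo_path, "w") as out:
        out.write(">contig_1\nACGTACGT\nACGT\n>contig_2\nGGCC\n")
    stats = fasta_stats(demo_path)

print(stats)  # {'contigs': 2, 'total_bp': 16, 'longest_bp': 12}
```

What counts as an unusually high contig count depends on the organism, so treat the numbers as a prompt to look closer rather than a pass/fail threshold.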
If the hybrid polishing step reports mean read depth: 0.0x across all contigs, your Illumina reads are likely incomplete, truncated, or mismatched to the assembly. Verify that your R1/R2 files are complete and correctly paired before re-running. MOSTAR validates that these files exist and are non-empty at startup, but cannot detect partially downloaded or corrupted files. Check the read counts of both files to confirm that they match.
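A simple pairing check is to count the records in R1 and R2 and confirm the numbers agree. The sketch below assumes standard four-line FASTQ records; the helper `count_fastq_reads` is hypothetical and the demo files are generated on the fly, so substitute your own R1/R2 paths.

```python
import gzip
import os
import tempfile


def count_fastq_reads(path: str) -> int:
    """Count reads in a plain or gzipped FASTQ file with four-line records."""
    opener = gzip.open if path.endswith(".gz") else open
    # Reading a truncated gzip stream typically raises EOFError here.
    with opener(path, "rt") as handle:
        lines = sum(1 for _ in handle)
    if lines % 4:
        raise ValueError(f"{path}: line count {lines} is not a multiple of 4")
    return lines // 4


# Demo with two tiny made-up files standing in for R1 and R2.
record = "@read\nACGT\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    r1 = os.path.join(tmp, "R1.fastq.gz")
    r2 = os.path.join(tmp, "R2.fastq.gz")
    for path in (r1, r2):
        with gzip.open(path, "wt") as out:
            out.write(record * 2)
    r1_count, r2_count = count_fastq_reads(r1), count_fastq_reads(r2)

print(r1_count, r2_count, "paired" if r1_count == r2_count else "MISMATCH")
```

Equal counts do not prove that the reads belong to the same sample, but unequal counts or a read error are a clear sign that a file is truncated or mismatched.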
If the specified --output directory already exists from a previous run, MOSTAR will write into it and overwrite existing files without warning. If you want to preserve a previous result, rename the output directory before re-running or specify a new output path.
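If you script your runs, one way to avoid accidental overwrites is to choose an output path that does not exist yet. The sketch below appends a numeric suffix until the path is free. The helper `fresh_output_dir` and the `_1`, `_2` naming scheme are assumptions for this example and are not a MOSTAR feature.

```python
import tempfile
from pathlib import Path


def fresh_output_dir(base) -> Path:
    """Return base if it does not exist, otherwise base_1, base_2, ..."""
    candidate = Path(base)
    suffix = 1
    while candidate.exists():
        candidate = Path(f"{base}_{suffix}")
        suffix += 1
    return candidate


# Demo: "Output" already exists, so the helper proposes "Output_1".
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "Output"
    existing.mkdir()
    chosen_name = fresh_output_dir(existing).name

print(chosen_name)  # Output_1
```

The returned path can then be passed to --output, leaving the earlier results untouched.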
In hybrid mode, if the Medaka and Final assembly statistics are identical, Polypolish ran but made no changes. This is expected when short-read coverage is very low (typically below 5×) and does not indicate an error. Check logs/polypolish.log to confirm — the mean read depth per contig will be reported there.
Developed and maintained by Nermin Zecic (@nermze). For questions, bugs, or feature requests, please open an Issue.







