Galah - Scalable dereplication and MIMAG calculation for metagenome assembled genomes.
Documentation can be found at https://wwood.github.io/galah/.
Galah aims to be a scalable metagenome assembled genome (MAG) dereplication and quality assessment method. Dereplication clusters genomes together based on their average nucleotide identity (ANI), and chooses a single member of each cluster as the representative. Quality assessment results in a MIMAG quality score for each genome, based on its completeness, contamination and the presence of rRNA and tRNA genes.
# Install latest release via conda.
conda create -n galah -c bioconda -c conda-forge galah
For clustering and determining MIMAG quality scores:
galah process --genome-fasta-files /path/to/genome1.fna /path/to/genome2.fna \
--output-cluster-definition clusters.tsv \
--output-mimag-summary mimag.tsv
For clustering a set of genomes at 95% ANI:
galah cluster --genome-fasta-files /path/to/genome1.fna /path/to/genome2.fna \
--output-cluster-definition clusters.tsv
For clustering a set of contigs at 95% ANI:
galah cluster --cluster-contigs --small-genomes --genome-fasta-files /path/to/contigs.fna \
--output-cluster-definition clusters.tsv
For determining MIMAG quality scores for a set of genomes with CheckM2, Barrnap, and tRNAscan-SE:
galah analyse --genome-fasta-files /path/to/genome1.fna /path/to/genome2.fna \
--output-mimag-summary mimag.tsv
If you have any questions or need help, please open an issue.
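Once a cluster definition file such as clusters.tsv has been written by one of the clustering commands above, it can be summarised with standard shell tools. The sketch below assumes the file contains one representative<TAB>member pair per line; check this against your own output before relying on it. The function names count_clusters and list_cluster_sizes are hypothetical helpers, not part of Galah.

```shell
# Assumption: each line of the cluster definition file is
# "representative<TAB>member". These helpers are illustrative only
# and are not provided by Galah.

# Print the number of clusters (distinct representatives in column 1).
count_clusters() {
    cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Print "representative<TAB>cluster size" for each cluster, largest first.
list_cluster_sizes() {
    awk -F'\t' '{ n[$1]++ } END { for (r in n) print r "\t" n[r] }' "$1" \
        | sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

For example, `count_clusters clusters.tsv` prints how many representatives were chosen, and `list_cluster_sizes clusters.tsv | head` shows the largest clusters.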
Galah is developed by the Woodcroft lab at the Centre for Microbiome Research, School of Biomedical Sciences, QUT, with contributions from Samuel Aroney, Antônio Camargo, and Rhys Newell. It is licensed under GPL3 or later.
The source code is available at https://github.com/wwood/galah.
Aroney, S.T.N., Camargo, A.P., Tyson, G.W. and Woodcroft, B.J. Galah: More scalable dereplication for metagenome assembled genomes. Zenodo (2024). https://doi.org/10.5281/zenodo.13637856
