A high-performance allele caller for cgMLST/wgMLST schemas, inspired by and compatible with chewBBACA.
chewcall reimplements the AlleleCall algorithm from chewBBACA in Rust, replacing BLASTp with SIMD-accelerated exact Smith-Waterman protein alignment via parasail.
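parasail computes the classic Smith-Waterman local alignment score. The scalar sketch below illustrates that recurrence in Gotoh's affine-gap form. It is not chewcall's code and does not use parasail's API. It assumes a toy match/mismatch score in place of a protein substitution matrix such as BLOSUM62, and it charges a gap of length k as `gap_open + (k - 1) * gap_extend`. `Scoring` and `sw_score` are hypothetical names.

```rust
/// Toy scoring scheme (hypothetical); protein aligners normally use a
/// substitution matrix such as BLOSUM62 instead of match/mismatch scores.
struct Scoring {
    matched: i32,    // score for identical residues
    mismatch: i32,   // score for different residues (negative)
    gap_open: i32,   // penalty for the first residue of a gap
    gap_extend: i32, // penalty for each further residue of the same gap
}

/// Best local alignment score of `a` vs `b` (Smith-Waterman with Gotoh's
/// affine gaps), keeping only one row of each DP matrix in memory.
fn sw_score(a: &[u8], b: &[u8], s: &Scoring) -> i32 {
    let n = b.len();
    let neg = i32::MIN / 2; // "minus infinity" that cannot overflow when penalised
    let mut h_prev = vec![0i32; n + 1]; // H row i-1: best local score ending at each cell
    let mut f_prev = vec![neg; n + 1]; // F row i-1: best score ending in a gap in `b`
    let mut best = 0i32;
    for &ai in a {
        let mut h_cur = vec![0i32; n + 1];
        let mut f_cur = vec![neg; n + 1];
        let mut e = neg; // E: best score ending in a gap in `a`, carried along the row
        for j in 1..=n {
            e = (h_cur[j - 1] - s.gap_open).max(e - s.gap_extend);
            f_cur[j] = (h_prev[j] - s.gap_open).max(f_prev[j] - s.gap_extend);
            let sub = if ai == b[j - 1] { s.matched } else { s.mismatch };
            // Local alignment: a cell never drops below zero
            h_cur[j] = 0i32.max(h_prev[j - 1] + sub).max(e).max(f_cur[j]);
            best = best.max(h_cur[j]);
        }
        h_prev = h_cur;
        f_prev = f_cur;
    }
    best
}

fn main() {
    let s = Scoring { matched: 2, mismatch: -1, gap_open: 3, gap_extend: 1 };
    // Eight matches (16) minus one single-residue gap (3)
    println!("{}", sw_score(b"ACDEFGHIK", b"ACDEGHIK", &s)); // 13
}
```

parasail evaluates the same kind of recurrence but vectorizes it with SIMD (striped, scan or diagonal layouts), so many cells are filled per instruction while the score stays exact.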
- Rust 1.75+ (install via rustup)
- parasail (SIMD-accelerated Smith-Waterman library)
- Optional: CUDA 12+ and an NVIDIA GPU (for `--gpu` mode)
```bash
# Build parasail (one-time)
git clone https://github.com/jeffdaily/parasail.git
cd parasail && mkdir build && cd build
cmake .. && make -j$(nproc)
cd ../..

# Standard build
RUSTFLAGS="-C target-cpu=native" cargo build --release

# With GPU support (requires CUDA)
CUDA_HOME=/usr/local/cuda RUSTFLAGS="-C target-cpu=native" cargo build --release

# Run (parasail must be in LD_LIBRARY_PATH)
LD_LIBRARY_PATH=/path/to/parasail/build ./target/release/chewcall [OPTIONS]
```

The binary is at `target/release/chewcall`.
```bash
# Run allele calling (built-in CDS prediction via prodigal-rs)
chewcall \
  -i /path/to/genomes \
  -g /path/to/schema \
  -o /path/to/output \
  --cpu 8

# Or with pre-computed CDS (pyrodigal, for exact comparison with chewBBACA)
python predict_cds.py \
  -i /path/to/genomes \
  -g /path/to/schema \
  -o /path/to/cds_output

chewcall \
  -i /path/to/genomes \
  -g /path/to/schema \
  -o /path/to/output \
  --cpu 8 \
  --cds-input /path/to/cds_output
```

```
chewcall [OPTIONS] -i <INPUT> -g <SCHEMA> -o <OUTPUT>

Options:
  -i, --input <INPUT>             Input directory with genome FASTA files
  -g, --schema <SCHEMA>           Schema directory (chewBBACA format)
  -o, --output <OUTPUT>           Output directory
      --cpu <CPU>                 Number of CPU threads [default: 1]
      --cds-input <CDS_INPUT>     Pre-computed CDS directory (skip built-in prediction)
      --mode <MODE>               Alignment mode: "fast" (parasail) or "compatible" (BLAST) [default: fast]
      --blastp-path <PATH>        Path to blastp binary (required for --mode compatible)
      --gpu                       Use GPU (CUDA) for Smith-Waterman alignment
      --update-schema             Append novel alleles (INF) to schema FASTA files in place
      --bsr <BSR>                 BLAST Score Ratio threshold [default: 0.6]
      --size-threshold <SIZE>     Size threshold for ASM/ALM [default: 0.2]
      --min-length <MIN>          Minimum sequence length [default: 0]
  -t, --translation-table <TT>    Genetic code [default: 11]
      --prodigal-mode <MODE>      Prodigal mode: single or meta [default: single]
```
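The `--bsr` and `--size-threshold` defaults match chewBBACA's. If chewcall keeps chewBBACA's definitions (an assumption), a CDS matches a locus when its BLAST Score Ratio reaches `--bsr`. The BSR is the raw alignment score against a representative allele divided by that representative's self-alignment score. A match is then flagged ASM or ALM when its length falls more than `--size-threshold` (as a fraction) below or above the locus length mode. The sketch covers only those two checks. The real caller has more classes, for example EXC, NIPH and PLOT. `Call`, `bsr`, `length_mode` and `classify` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Outcome of the two checks sketched here (hypothetical type).
#[derive(Debug, PartialEq)]
enum Call {
    NoMatch,   // BSR below threshold: CDS is not assigned to this locus
    Asm,       // shorter than mode * (1 - size_threshold)
    Alm,       // longer than mode * (1 + size_threshold)
    Candidate, // within the size range; exact/novel (INF) allele checks would follow
}

/// BLAST Score Ratio: CDS-vs-representative score over representative self-score.
fn bsr(raw_score: i32, rep_self_score: i32) -> f64 {
    raw_score as f64 / rep_self_score as f64
}

/// Most frequent allele length; ties go to the shortest length (an assumption).
fn length_mode(lengths: &[usize]) -> Option<usize> {
    let mut counts: BTreeMap<usize, usize> = BTreeMap::new();
    for &len in lengths {
        *counts.entry(len).or_insert(0) += 1;
    }
    let mut best: Option<(usize, usize)> = None; // (length, count)
    for (&len, &count) in &counts {
        // BTreeMap iterates in ascending length, so a strict `>` keeps the shortest on ties
        if best.map_or(true, |(_, c)| count > c) {
            best = Some((len, count));
        }
    }
    best.map(|(len, _)| len)
}

/// Applies the BSR check, then the size check; lengths must share one unit (e.g. nt).
fn classify(
    raw_score: i32,
    rep_self_score: i32,
    cds_len: usize,
    mode: usize,
    bsr_threshold: f64,
    size_threshold: f64,
) -> Call {
    if bsr(raw_score, rep_self_score) < bsr_threshold {
        return Call::NoMatch;
    }
    let lower = mode as f64 * (1.0 - size_threshold);
    let upper = mode as f64 * (1.0 + size_threshold);
    let len = cds_len as f64;
    if len < lower {
        Call::Asm
    } else if len > upper {
        Call::Alm
    } else {
        Call::Candidate
    }
}

fn main() {
    let mode = length_mode(&[300, 300, 303, 297]).unwrap(); // 300
    // Defaults from the option list: --bsr 0.6, --size-threshold 0.2
    println!("{:?}", classify(90, 100, 210, mode, 0.6, 0.2)); // Asm (210 < 240)
}
```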
chewcall supports three CDS prediction modes:

- Built-in prodigal-rs (default): a pure Rust reimplementation of Prodigal 2.6.3 (single mode) with no external dependencies. It uses the `.trn` training file from the schema directory.
- Pre-computed CDS (`--cds-input`): reads CDS from a directory of FASTA files pre-computed with pyrodigal or prodigal.
- External prodigal (`--prodigal-path`): spawns prodigal as a subprocess for each genome.
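Whichever mode predicts the CDS, they are translated before protein alignment, and `-t/--translation-table` selects the genetic code (11, bacterial, by default). The sketch below is modelled on Biopython's `Seq.translate(table=11, cds=True)` rules. That chewcall applies the same rules is an assumption. The first codon must be a table-11 start and is read as methionine. The final codon must be a stop and is dropped. Internal stops are rejected. `translate_cds`, `codon_index` and the constants are hypothetical names. Ambiguous bases such as `N` are simply rejected here.

```rust
/// NCBI translation table 11 amino acids, indexed with bases ordered T, C, A, G:
/// index = 16 * first + 4 * second + third.
const TABLE_11_AA: &[u8] = b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
/// Start codons allowed by table 11.
const TABLE_11_STARTS: [&[u8; 3]; 7] = [b"TTG", b"CTG", b"ATT", b"ATC", b"ATA", b"ATG", b"GTG"];

/// Maps a codon to its table index; None for any base outside T/C/A/G.
fn codon_index(codon: &[u8]) -> Option<usize> {
    codon.iter().try_fold(0usize, |idx, &base| {
        let pos = b"TCAG".iter().position(|&b| b == base)?;
        Some(idx * 4 + pos)
    })
}

/// Translates a complete CDS (start codon .. stop codon) into a protein.
fn translate_cds(dna: &str) -> Result<String, String> {
    let seq = dna.trim().to_ascii_uppercase().into_bytes();
    if seq.len() % 3 != 0 || seq.len() < 6 {
        return Err(format!("length {} is not a whole number of codons (min. 2)", seq.len()));
    }
    let codons: Vec<&[u8]> = seq.chunks(3).collect();
    if !TABLE_11_STARTS.iter().any(|s| &s[..] == codons[0]) {
        return Err("first codon is not a table-11 start codon".to_string());
    }
    let last = codons.len() - 1;
    let mut protein = String::from("M"); // alternative starts are read as Met
    for (k, codon) in codons.iter().enumerate().skip(1) {
        let idx = codon_index(codon).ok_or("ambiguous or invalid base")?;
        let aa = TABLE_11_AA[idx] as char;
        match (aa == '*', k == last) {
            (true, true) => {} // terminal stop: dropped from the protein
            (false, true) => return Err("missing terminal stop codon".to_string()),
            (true, false) => return Err(format!("internal stop at codon {}", k + 1)),
            (false, false) => protein.push(aa),
        }
    }
    Ok(protein)
}

fn main() {
    // GTG is a valid bacterial start codon and is translated as M
    println!("{:?}", translate_cds("GTGAAATAA")); // Ok("MK")
}
```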
chewcall works with any schema in the standard chewBBACA format:
```
schema/
├── locus1.fasta              # Full allele sequences
├── locus2.fasta
├── short/
│   ├── locus1_short.fasta    # Representative alleles
│   └── locus2_short.fasta
└── *.trn                     # Prodigal training file
```
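From this layout a caller needs three things: the full allele file of each locus, the matching representative file under `short/`, and the training file. Below is a minimal standard-library sketch of locating them. It assumes the `<locus>.fasta` / `short/<locus>_short.fasta` naming shown in the tree. `Locus` and `scan_schema` are hypothetical names, and the files are only located, not parsed or validated.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One locus of a chewBBACA-format schema (hypothetical type).
#[derive(Debug)]
struct Locus {
    name: String,
    full: PathBuf,          // <schema>/<locus>.fasta: all known alleles
    short: Option<PathBuf>, // <schema>/short/<locus>_short.fasta, if present
}

/// Lists loci (sorted by name) and the lexicographically first `.trn` file.
fn scan_schema(dir: &Path) -> io::Result<(Vec<Locus>, Option<PathBuf>)> {
    let mut loci = Vec::new();
    let mut trn: Option<PathBuf> = None;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue; // skips the `short/` directory itself
        }
        let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("").to_string();
        if ext == "fasta" {
            // A path with a non-empty extension always has a stem
            let name = path.file_stem().unwrap().to_string_lossy().into_owned();
            let short = dir.join("short").join(format!("{name}_short.fasta"));
            loci.push(Locus { name, full: path, short: short.is_file().then_some(short) });
        } else if ext == "trn" && trn.as_ref().map_or(true, |t| path < *t) {
            trn = Some(path);
        }
    }
    loci.sort_by(|a, b| a.name.cmp(&b.name));
    Ok((loci, trn))
}

fn main() -> io::Result<()> {
    // Schema directory from the first argument (defaults to the current directory)
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let (loci, trn) = scan_schema(Path::new(&dir))?;
    for l in &loci {
        println!("{}\t{}\t{:?}", l.name, l.full.display(), l.short);
    }
    println!("training file: {:?}", trn);
    Ok(())
}
```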
Schemas can be downloaded from Chewie-NS or prepared with chewBBACA's PrepExternalSchema / CreateSchema. For example, to download a schema from Chewie-NS:

```bash
chewBBACA.py DownloadSchema -sp <species_id> -sc <schema_id> -o schema_dir
```

Output files written to the output directory (`-o`):

| File | Description |
|---|---|
| `results_alleles.tsv` | Allelic profiles (locus x genome matrix) |
| `results_alleles_hashed.tsv` | CRC32-hashed allelic profiles |
| `results_statistics.tsv` | Per-genome classification statistics |
| `loci_summary_stats.tsv` | Per-locus classification counts |
| `results_contigsInfo.tsv` | CDS coordinates on contigs |
| `novel_alleles.fasta` | Novel allele sequences (INF) |
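The hashed profile format is not spelled out above. chewBBACA's hashed profiles replace each allele identifier with a hash of the allele's DNA sequence, so profiles stay comparable regardless of allele numbering. If chewcall does the same (an assumption), a CRC-32 version could look like the sketch below. Stripping an `INF-` prefix, upper-casing the sequence and printing 8-digit lowercase hex are all assumptions. `hash_call` is a hypothetical name. Calls that are not allele identifiers, such as `LNF`, pass through unchanged.

```rust
use std::collections::HashMap;

/// Bitwise CRC-32 (IEEE 802.3; reflected polynomial 0xEDB88320), as used by zlib.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones when the low bit is set, otherwise zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Replaces an allele identifier with the CRC-32 of its DNA sequence (hypothetical).
fn hash_call(call: &str, alleles: &HashMap<String, String>) -> String {
    let id = call.strip_prefix("INF-").unwrap_or(call);
    match alleles.get(id) {
        Some(seq) => format!("{:08x}", crc32(seq.to_ascii_uppercase().as_bytes())),
        None => call.to_string(), // classification codes (LNF, ASM, ...) stay as-is
    }
}

fn main() {
    println!("{:08x}", crc32(b"123456789")); // cbf43926 (standard CRC-32 check value)
    let mut alleles = HashMap::new();
    alleles.insert("1".to_string(), "ATGAAATAA".to_string());
    for call in ["1", "INF-1", "LNF"] {
        println!("{call} -> {}", hash_call(call, &alleles));
    }
}
```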
GPL-3.0 — same as the original chewBBACA.
GenPat Team — Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise