intronIC is a bioinformatics tool for extracting and classifying intron sequences as U12-type (minor) or U2-type (major) using a support vector machine trained on position-weight matrix scores.
pip install intronIC
# Classify introns (default model loaded automatically)
intronIC -g genome.fa.gz -a annotation.gff3.gz -n species_name -p 8
# Extract sequences only (no classification)
intronIC extract -g genome.fa.gz -a annotation.gff3.gz -n species_name -p 8
# Train a custom model (optional - most users don't need this)
intronIC train -n my_model -p 8
# Quick installation test using bundled test data
intronIC test -p 4
# Or show where test data is located
intronIC test --show-only
- Changelog - Release notes and version history
For complete documentation, see the intronIC Wiki:
- Quick Start Guide - Installation, dependencies, resource usage
- Overview - Classification approach and scientific background
- Usage Info - Complete CLI reference
- Output Files - File formats and interpretation
- Technical Details - Algorithm and ML architecture
- Example Usage - Common workflows
- About - Background and motivation
- New 8D RBF SVM default model trained on expanded reference data (472 U12-type + 30,155 U2-type introns)
- Five new classification features: branch point offset, BPS motif sharpness, polypyrimidine tract metrics, and multi-site support scoring
- Reduced false positives: 0 confident false calls in C. elegans (was 2), 1 in Ascaris (was 47)
- See CHANGELOG.md for full release history
- RBF SVM classification with probability scores (0-100%) using 8 sequence-derived features
- Default pretrained model loaded automatically — works for virtually all species
- Streaming mode (default) for ~85% memory reduction on large genomes
- Parallel processing for improved performance (-p 8 recommended)
- Fast runtimes: ~6-10 minutes for human genome with default settings
- Comprehensive metadata including phase, position, parent gene/transcript
Most eukaryotic introns (~99.5%) are spliced by the major (U2-type) spliceosome, while a small fraction (~0.5%) are spliced by the minor (U12-type) spliceosome. U12-type introns have:
- Highly conserved TCCTTAAC branch point motif
- Terminal dinucleotides: AT-AC (~25%) or GT-AG (~75%)
- Functional importance and evolutionary conservation
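As a rough illustration of the sequence properties listed above, the sketch below reads an intron's terminal dinucleotides and looks for an exact copy of the TCCTTAAC consensus near its 3' end. It is not intronIC's code: the function names, the 40 nt search window, and the example sequence are assumptions made for illustration. Note that GT-AG termini are also typical of U2-type introns, so a boundary check like this cannot classify an intron on its own.

```python
BPS_CONSENSUS = "TCCTTAAC"        # branch point consensus quoted in the list above
U12_TERMINI = {"AT-AC", "GT-AG"}  # boundary pairs quoted in the list above


def terminal_dinucleotides(intron_seq: str) -> str:
    """Return the intron's first and last two bases, e.g. 'GT-AG'."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron sequence must be at least 4 nt long")
    return f"{seq[:2]}-{seq[-2:]}"


def motif_distance_from_3prime(intron_seq, motif=BPS_CONSENSUS, window=40):
    """Distance (nt) from the end of the last exact motif match to the 3' end.

    Only the final `window` bases are searched; the window size is an
    assumed value. Returns None when there is no exact match.
    """
    tail = intron_seq.upper()[-window:]
    start = tail.rfind(motif)
    if start == -1:
        return None
    return len(tail) - (start + len(motif))


example = "ATATCCTTGGTTCCTTAACTTGCAC"  # made-up sequence, not real data
print(terminal_dinucleotides(example))
print(terminal_dinucleotides(example) in U12_TERMINI)
print(motif_distance_from_3prime(example))
```

Exact string matching is far stricter than what a real classifier needs, since branch point sequences vary; that variability is what motivates the weighted scoring approach described next.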
intronIC identifies U12-type introns using:
- PWM Scoring: Apply position-weight matrices to 5' splice site, branch point, and 3' splice site regions
- Normalization: Convert raw scores to z-scores via robust scaling
- Feature Engineering: Compute composite features (multi-site corroboration, BP position, PPT metrics, BPS motif sharpness)
- SVM Classification: RBF SVM ensemble with balanced class weights outputs probability scores
For detailed algorithm description, see the Technical Details wiki page.
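The first two steps above (PWM scoring and normalization) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not intronIC's implementation: the function names, the pseudocount, the uniform 0.25 background, the toy training sites, and the choice of median/IQR as the "robust scaling" are all hypothetical. The feature engineering and SVM steps are not shown.

```python
import math
from statistics import median, quantiles

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position-weight matrix from equal-length sites.

    Assumes a uniform background frequency; bases outside ACGT are ignored.
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            base = site[pos].upper()
            if base in counts:
                counts[base] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def pwm_score(pwm, site):
    """Sum per-position log-odds; ambiguous bases (e.g. N) contribute 0."""
    return sum(col.get(base, 0.0) for col, base in zip(pwm, site.upper()))


def robust_z(values):
    """Scale values as (x - median) / IQR, one common robust alternative
    to mean/standard-deviation z-scores (assumed here for illustration)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; cannot scale")
    mid = median(values)
    return [(v - mid) / iqr for v in values]


# Toy data only: three made-up 5' splice site sequences for "training".
training_sites = ["ATATCC", "ATATCC", "GTATCC"]
pwm = build_pwm(training_sites)

candidates = ["ATATCC", "GTATCC", "GTAAGT", "GTGAGT", "CCCCCC"]
raw = [pwm_score(pwm, s) for s in candidates]
for site, z in zip(candidates, robust_z(raw)):
    print(site, round(z, 2))
```

Median and IQR are used here because a handful of extreme scores would otherwise distort a mean-based scale; whether intronIC uses this exact scaler is not stated in this text.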
If you use intronIC in your research, please cite:
Devlin C Moyer, Graham E Larue, Courtney E Hershberger, Scott W Roy, Richard A Padgett. Comprehensive database and evolutionary dynamics of U12-type introns. Nucleic Acids Research, Volume 48, Issue 13, 27 July 2020, Pages 7066–7078. https://doi.org/10.1093/nar/gkaa464
- Documentation: intronIC Wiki
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
git clone https://github.com/glarue/intronIC.git
cd intronIC
make install # Set up development environment
make test # Run tests
intronIC is released under the GNU General Public License v3.0.
