SpliceAI

Implementation of the SpliceAI models in PyTorch, with added Dropout layers.
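
The README does not say where the Dropout layers sit in the network or which drop probability they use. For readers new to the technique, here is a pure-Python sketch of the inverted-dropout scheme that `torch.nn.Dropout` applies. During training, each activation is zeroed with probability `p` and the survivors are scaled by `1/(1 - p)`. In evaluation mode the input passes through unchanged. The function name `inverted_dropout` is hypothetical, and this is an illustration on a list of floats, not the repository's model code.

```python
import random
from typing import List, Optional


def inverted_dropout(x: List[float], p: float, training: bool,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Zero each element with probability p and rescale survivors by 1/(1-p).

    In evaluation mode (training=False) the input is returned unchanged, so the
    expected activation is the same in both modes.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(x)
    if rng is None:
        rng = random.Random()
    scale = 1.0 / (1.0 - p)
    # Each element is dropped independently; kept elements are scaled up.
    return [0.0 if rng.random() < p else v * scale for v in x]


activations = [1.0, 2.0, 3.0, 4.0]
print(inverted_dropout(activations, p=0.5, training=False))  # [1.0, 2.0, 3.0, 4.0]
print(inverted_dropout(activations, p=0.5, training=True, rng=random.Random(0)))
```

Survivors are rescaled during training, so no correction is needed at inference time. This is why calling `model.eval()` in PyTorch is enough to turn dropout into an identity operation.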

The models were trained on transcripts from the hg38 human reference genome.

For genes with multiple isoforms, the isoform with the highest number of exons was chosen for training and testing.
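
The selection rule can be sketched as follows. The sketch assumes each transcript has already been parsed from the GFF annotations into a gene ID, a transcript ID and an exon count. The `Transcript` record and the `pick_isoform_per_gene` helper are hypothetical. Ties are broken by keeping the first transcript seen, which is also an assumption because the README does not specify tie handling.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transcript:
    transcript_id: str
    gene_id: str
    exon_count: int


def pick_isoform_per_gene(transcripts: List[Transcript]) -> Dict[str, Transcript]:
    """Keep one transcript per gene: the one with the most exons.

    On ties the earlier transcript wins (assumed tie-breaking rule).
    """
    best: Dict[str, Transcript] = {}
    for tx in transcripts:
        current = best.get(tx.gene_id)
        # Replace only on a strictly larger exon count, so ties keep the first.
        if current is None or tx.exon_count > current.exon_count:
            best[tx.gene_id] = tx
    return best


txs = [Transcript("T1", "G1", 5), Transcript("T2", "G1", 8), Transcript("T3", "G2", 3)]
chosen = pick_isoform_per_gene(txs)
print({gene: tx.transcript_id for gene, tx in chosen.items()})  # {'G1': 'T2', 'G2': 'T3'}
```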

Setup

Create conda/mamba environment

mamba env create -f environment.yml 
mamba activate spliceAI

Download datasets for training and testing the models

Download from Google Cloud Storage (recommended)

python3 utils/download_dataset.py
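
The training command in this README reads `data/datasets/train.pkl` and `data/datasets/val.pkl`. This README does not document the internal layout of those files. The hypothetical helper below therefore only loads a file with the standard-library `pickle` module and reports its top-level type and size as a first sanity check. It assumes the downloaded files end up at those paths. Only unpickle files from a source you trust, because loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def describe_pickle(path: str) -> str:
    """Load a pickled object and return a one-line summary of its top-level structure."""
    with Path(path).open("rb") as fh:
        obj = pickle.load(fh)  # only load files from a trusted source
    summary = type(obj).__name__
    try:
        summary += f" with {len(obj)} top-level entries"
    except TypeError:
        pass  # object has no length (e.g. a scalar)
    if isinstance(obj, dict):
        summary += f"; first keys: {sorted(map(str, obj))[:5]}"
    return summary


# Hypothetical usage, assuming the files live where the training command expects them:
# print(describe_pickle("data/datasets/train.pkl"))
```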

You can also recreate the datasets from scratch:

python3 utils/create_datasets.py \
        --fasta "" \
        --gff "" \
        --chrom "" \
        --flank 5000 \
        --outdir <Path to output directory>

--fasta: reference genome; leave as "" to download it
--gff: GFF annotations for the reference genome; leave as "" to download them
--chrom: chromosome ID map; leave as "" to download it
--flank: flanking sequence length on each side; use 5000 to prepare datasets for the largest SpliceAI model (SpliceAI_10k)
--outdir: output directory for the generated datasets
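
The `--flank` value follows from the receptive field of the network. The sketch below assumes the residual-block layout of the original SpliceAI architecture: four groups of four residual blocks, each block holding two dilated convolutions, with (kernel size, dilation) pairs (11, 1), (11, 4), (21, 10) and (41, 25). Each convolution with kernel size k and dilation d widens the context by (k - 1) * d. Under that layout the full stack sees 10,000 nt, which is 5,000 nt of flank on each side. Whether this repository uses exactly these hyperparameters is an assumption. The model names other than SpliceAI_10k are illustrative.

```python
from typing import List, Tuple

# (kernel_size, dilation) per residual-block group, assumed from the original
# SpliceAI architecture; each group holds 4 residual blocks of 2 convolutions.
RB_GROUPS: List[Tuple[int, int]] = [(11, 1), (11, 4), (21, 10), (41, 25)]
BLOCKS_PER_GROUP = 4
CONVS_PER_BLOCK = 2


def context_length(n_groups: int) -> int:
    """Total context (both sides) added by the first n_groups groups of convolutions.

    Each stride-1 convolution with kernel k and dilation d extends the
    receptive field by (k - 1) * d positions.
    """
    return sum(
        BLOCKS_PER_GROUP * CONVS_PER_BLOCK * (k - 1) * d
        for k, d in RB_GROUPS[:n_groups]
    )


for name, groups in [("SpliceAI_80nt", 1), ("SpliceAI_400nt", 2),
                     ("SpliceAI_2k", 3), ("SpliceAI_10k", 4)]:
    ctx = context_length(groups)
    print(f"{name}: context={ctx}, flank per side={ctx // 2}")
```

With all four groups this gives a context of 10000 and a flank of 5000, matching the value recommended for SpliceAI_10k.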

Train model

python3 model/train.py \
        --train_data data/datasets/train.pkl \
        --val_data data/datasets/val.pkl \
        --train_fraction 0.02 \
        --val_fraction 0.2 \
        --batch_size 256 \
        --num_epochs 200 \
        --learning_rate 0.001 \
        --flank 5000 \
        --seed 1809 \
        --output_dir <Path to output directory>

--train_fraction: fraction of the training set to use; set to 1.0 to train on the entire dataset
--val_fraction: fraction of the validation set to use; set to 1.0 to validate on the entire dataset
--flank: 5000 corresponds to the SpliceAI_10k model
--output_dir: directory where model checkpoints and the log file are saved
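
The README does not say how `model/train.py` applies `--train_fraction`, `--val_fraction` and `--seed`. One plausible approach is to draw a seeded random subset of the examples, so that the same fraction and seed always select the same data. The sketch below shows that idea with a hypothetical `subsample` helper. It is an assumption about the behaviour, not the script's actual code.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample(items: Sequence[T], fraction: float, seed: int) -> List[T]:
    """Return a reproducible random subset containing `fraction` of `items`.

    fraction=1.0 returns every item (in shuffled order). For any fraction > 0
    at least one item is kept, so a tiny fraction never yields an empty split.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n = max(1, int(len(items) * fraction)) if items else 0
    # A dedicated Random instance keeps the draw independent of global state.
    return random.Random(seed).sample(list(items), n)


train_ids = list(range(10_000))  # stand-in for training examples
subset = subsample(train_ids, fraction=0.02, seed=1809)
print(len(subset))  # 200
```

Using a private `random.Random(seed)` rather than the global generator means other code that consumes random numbers cannot change which examples are selected.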

