We will need to perform quality control on sequencing reads, BAM files generated by aligners and alignment statistics output by aligners.
Use the small BAM file provided at the following link to test out each of the quality control tools in your container locally.
You need to understand what inputs are required by each quality control tool, and the expected outputs.
FastQC
Requires sequencing reads as input.
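A minimal sketch of a FastQC run is shown below. The read file names and the fastqc_out directory are hypothetical placeholders, not files supplied with this tutorial; the helper fastqc_report is only there to show which report name to expect for each input.

```shell
# Hypothetical paired-end read files; substitute your own FASTQ files.
r1="sample_R1.fastq.gz"
r2="sample_R2.fastq.gz"
outdir="fastqc_out"

# FastQC expects the output directory to exist already.
mkdir -p "$outdir"

# Name of the HTML report FastQC writes for a given read file.
fastqc_report() {
    name=$(basename "$1")
    name=${name%.gz}
    name=${name%.fastq}
    name=${name%.fq}
    printf '%s/%s_fastqc.html\n' "$outdir" "$name"
}

# Only call FastQC when the tool and both inputs are actually present.
if command -v fastqc >/dev/null 2>&1 && [ -f "$r1" ] && [ -f "$r2" ]; then
    fastqc --outdir "$outdir" "$r1" "$r2"
else
    echo "fastqc or the read files were not found; nothing was run" >&2
fi

echo "expected reports:"
fastqc_report "$r1"
fastqc_report "$r2"
```

Each input also gets a matching _fastqc.zip archive next to the HTML report.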
Samtools
Requires as input a BAM file generated by aligners. The BAM file must be sorted and have an index file. Refer to samtools --help for documentation regarding the stats we want to include in our report: depth, flagstat, idxstats and stats.
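The sketch below sorts and indexes a BAM file and then runs the four samtools subcommands wanted for the report (depth, flagstat, idxstats and stats). The unsorted input name and the output file names are assumptions made for illustration; only the _sorted.bam name matches the examples used later in this section.

```shell
# Assumed name of the unsorted BAM file from the aligner.
bam="RAP1_UNINDUCED_REP2.Aligned.out.bam"
prefix="${bam%.bam}"
sorted="${prefix}_sorted.bam"

if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    # Coordinate-sort, then index (writes ${sorted}.bai).
    samtools sort -o "$sorted" "$bam"
    samtools index "$sorted"

    # Each subcommand prints to stdout, so redirect to a file per report.
    samtools depth    "$sorted" > "${prefix}.depth.txt"
    samtools flagstat "$sorted" > "${prefix}.flagstat.txt"
    samtools idxstats "$sorted" > "${prefix}.idxstats.txt"
    samtools stats    "$sorted" > "${prefix}.stats.txt"
else
    echo "samtools or $bam not found; nothing was run" >&2
fi
```

idxstats reads the index, which is one reason the BAM file has to be sorted and indexed first.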
QualiMap
Requires as input a BAM file generated by aligners. The BAM file must be sorted and have an index file. Refer to qualimap --help for documentation regarding the bamqc and rnaseq modules.
The rnaseq module requires additional input files such as a GTF file. You can use the following GTF file here, which corresponds to the test BAM file.
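As a rough sketch, the two QualiMap modules can be called as follows. The output directory names are assumptions, and the rnaseq call presumes genes.gtf has already had its first line removed as described in this section.

```shell
bam="RAP1_UNINDUCED_REP2.Aligned.out_sorted.bam"
gtf="genes.gtf"
# Strip the "_sorted.bam" suffix to get a sample prefix for the output folders.
prefix="${bam%_sorted.bam}"

if command -v qualimap >/dev/null 2>&1 && [ -f "$bam" ] && [ -f "$gtf" ]; then
    # bamqc only needs the BAM file.
    qualimap bamqc -bam "$bam" -outdir "${prefix}_bamqc"

    # rnaseq additionally needs the annotation in GTF format.
    qualimap rnaseq -bam "$bam" -gtf "$gtf" -outdir "${prefix}_rnaseq"
else
    echo "qualimap, $bam or $gtf not found; nothing was run" >&2
fi
```

Writing each module to its own directory keeps the two reports from overwriting each other.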
Remove the first line of the genes.gtf file, because its chromosome name is nonsensical:
tail -n +2 genes.gtf > tmp.gtf && mv tmp.gtf genes.gtf
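The command above drops one line every time it is run, so it is worth looking at the first record before and after. The sketch below does this on a toy GTF file written to a temporary directory; the records and the name bogus_chr are made up for illustration and are not taken from the real genes.gtf.

```shell
# Toy stand-in for genes.gtf; the first record has a made-up bad chromosome name.
tmpdir=$(mktemp -d)
gtf="$tmpdir/genes.gtf"
printf 'bogus_chr\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "g0";\n'  >  "$gtf"
printf 'I\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n'       >> "$gtf"
printf 'II\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g2";\n'      >> "$gtf"

# Inspect the chromosome column of the first record before deleting anything.
first_chrom=$(head -n 1 "$gtf" | cut -f 1)
echo "first chromosome before: $first_chrom"

# Same command as in the text, pointed at the toy file.
tail -n +2 "$gtf" > "$tmpdir/tmp.gtf" && mv "$tmpdir/tmp.gtf" "$gtf"

# Confirm that exactly one record was removed and the new first line looks sane.
remaining=$(wc -l < "$gtf" | tr -d ' ')
new_first=$(head -n 1 "$gtf" | cut -f 1)
echo "records left: $remaining, first chromosome after: $new_first"
```

On the real file, run the head and cut check first and only run the tail command once.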
RSeQC
RSeQC requires annotations in BED format for the -r flag, so convert the GTF file to a BED file. I had serious issues using gtf2bed, so use gffread as a workaround:
gffread -F --keep-exon-attrs genes.gtf --bed > genes.bed
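After the conversion, a quick column count can catch a truncated or malformed BED file. As far as I know, RSeQC expects the reference gene model in 12-column BED format, so the sketch below reports the smallest number of tab-separated columns found. It runs on a toy two-record BED file with made-up coordinates; point bed at your real genes.bed instead.

```shell
# Toy BED12 file with invented records, standing in for genes.bed.
tmpdir=$(mktemp -d)
bed="$tmpdir/genes.bed"
# Columns: chrom start end name score strand thickStart thickEnd rgb blockCount blockSizes blockStarts
printf 'I\t100\t500\ttx1\t0\t+\t100\t500\t0\t2\t100,100\t0,300\n' >  "$bed"
printf 'II\t50\t250\ttx2\t0\t-\t50\t250\t0\t1\t200\t0\n'          >> "$bed"

# Smallest number of tab-separated fields on any line.
min_cols=$(awk -F '\t' 'NR == 1 || NF < min { min = NF } END { print min }' "$bed")

if [ "$min_cols" -ge 12 ]; then
    echo "OK: every record has at least 12 columns"
else
    echo "WARNING: some records have only $min_cols columns" >&2
fi
```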
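Run the following RSeQC scripts. The letter after each script marks how its output is captured:
infer_experiment.py (A)
bam_stat.py (A)
inner_distance.py (PE only, otherwise empty files) (B)
read_distribution.py (A)
read_duplication.py (B)
junction_annotation.py (C)
junction_saturation.py (B)
The meaning of A/B/C: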
A: the Python script has no -o flag; redirect stdout to a .txt file using >:
infer_experiment.py -i RAP1_UNINDUCED_REP2.Aligned.out_sorted.bam -r genes.bed > RAP1_UNINDUCED_REP2.Aligned.out_infer_experiment.txt
B: the script has the -o flag; pass the file basename as its argument:
read_duplication.py -i RAP1_UNINDUCED_REP2.Aligned.out_sorted.bam -o RAP1_UNINDUCED_REP2.Aligned.out
C: the script has the -o flag, but stderr must also be redirected to a .txt file using 2> for MultiQC compatibility:
junction_annotation.py -i RAP1_UNINDUCED_REP2.Aligned.out_sorted.bam -r genes.bed -o RAP1_UNINDUCED_REP2.Aligned.out 2> RAP1_UNINDUCED_REP2.Aligned.out_junctions.txt
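The three patterns can be wrapped in small helper functions so that every RSeQC script is called the same way. This is only a sketch: the function names run_a, run_b, run_c and ready are made up here, and each call is skipped when the script or the input files are missing. junction_saturation.py is run with -o because, to my knowledge, it takes an output prefix; check its --help in your container before relying on that.

```shell
bam="RAP1_UNINDUCED_REP2.Aligned.out_sorted.bam"
bed="genes.bed"
# Basename passed to -o and used to name the redirected .txt files.
prefix="${bam%_sorted.bam}"

# True only when the script is on PATH and both input files exist.
ready() { command -v "$1" >/dev/null 2>&1 && [ -f "$bam" ] && [ -f "$bed" ]; }

# A: no -o flag, stdout goes to <prefix>_<label>.txt
run_a() {
    script=$1; label=$2; shift 2
    ready "$script" || { echo "skipping $script" >&2; return 0; }
    "$script" -i "$bam" "$@" > "${prefix}_${label}.txt"
}

# B: -o flag takes the file basename
run_b() {
    script=$1; shift
    ready "$script" || { echo "skipping $script" >&2; return 0; }
    "$script" -i "$bam" -o "$prefix" "$@"
}

# C: -o flag, plus stderr captured in <prefix>_<label>.txt for MultiQC
run_c() {
    script=$1; label=$2; shift 2
    ready "$script" || { echo "skipping $script" >&2; return 0; }
    "$script" -i "$bam" -o "$prefix" "$@" 2> "${prefix}_${label}.txt"
}

run_a infer_experiment.py    infer_experiment  -r "$bed"
run_a bam_stat.py            bam_stat
run_a read_distribution.py   read_distribution -r "$bed"
run_b inner_distance.py      -r "$bed"   # paired-end data only
run_b read_duplication.py
run_b junction_saturation.py -r "$bed"   # assumed to take -o, see note above
run_c junction_annotation.py junctions   -r "$bed"
```

The availability check sits inside each function, before the redirect, so a skipped script does not leave an empty .txt file behind.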