I ran STAR on my RNA-seq data against the T2T-CHM13 reference, and many reads were reported as unmapped, either "too short" or "other".
When I remapped the reads from the unmapped.out file with both --outFilterMatchNminOverLread and --outFilterScoreMinOverLread lowered to 0, the "too short" reads mapped, but a high percentage of reads were still "unmapped: other".
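For reference, the remapping step described above can be sketched as a small Python helper that assembles the STAR command line. This is only a sketch under assumptions: the index directory, the input file name, the output prefix, and the thread count are hypothetical placeholders, and single-end reads are assumed. Only the two filter options set to 0 come from my actual run.

```python
import shlex


def build_star_remap_command(genome_dir, unmapped_fastq, prefix, threads=8):
    """Assemble a STAR command that remaps previously unmapped reads.

    Both length-normalized filters are set to 0, as in the remapping step.
    All paths and the thread count are hypothetical placeholders.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", unmapped_fastq,
        # Do not require a minimum matched fraction of the read length.
        "--outFilterMatchNminOverLread", "0",
        # Do not require a minimum alignment score relative to read length.
        "--outFilterScoreMinOverLread", "0",
        # Write reads that remain unmapped to FASTQ for further inspection.
        "--outReadsUnmapped", "Fastx",
        "--outFileNamePrefix", prefix,
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_star_remap_command("chm13_star_index", "Unmapped.out.mate1", "remap_")
    print(shlex.join(cmd))
```

Printing the command instead of running it makes it easy to check the exact options before launching STAR.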
To understand what keeps these reads from mapping, I continued with only the "unmapped: other" reads (taken from the unmapped.out file after the short reads had been mapped).
Based on the FastQC report for the "unmapped: other" reads, I re-trimmed the file so that the reads are approximately 50 bases long with quality scores of ~32 across all bases. Running STAR again mapped only a negligible number of reads (99.6% were still "unmapped: other").
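A quick way to double-check the trimming result independently of FastQC is to compute the mean read length and mean base quality directly from the FASTQ file. The sketch below assumes plain four-line FASTQ records and Phred+33 quality encoding; the function name is made up for illustration.

```python
def summarize_fastq(handle, phred_offset=33):
    """Return (read count, mean read length, mean base quality).

    Assumes four-line FASTQ records and Phred+33 encoding by default.
    """
    n_reads = 0
    total_bases = 0
    total_quality = 0
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of file (or a trailing blank line)
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        n_reads += 1
        total_bases += len(sequence)
        total_quality += sum(ord(char) - phred_offset for char in quality)
    if total_bases == 0:
        return n_reads, 0.0, 0.0
    return n_reads, total_bases / n_reads, total_quality / total_bases
```

It can be called with any open text handle, for example `with open(path) as fh: summarize_fastq(fh)`, where `path` points to the re-trimmed FASTQ file.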
Running BLAST on the overrepresented sequences (from FastQC) showed that they are mostly rRNA sequences and that they match human.
I tried running with --twopassMode Basic and increasing --winAnchorMultimapNmax, but this only increased the number of "unmapped: other" reads. (Increasing --winAnchorMultimapNmax without --twopassMode Basic also increased the number of unmapped reads.)
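To compare the unmapped categories between runs with different settings, the summary statistics can be pulled out of each run's Log.final.out. The sketch below assumes that each statistic is written as a "name | value" line and that the unmapped percentages have names starting with "% of reads unmapped"; the function names are made up for illustration.

```python
def parse_star_log(text):
    """Parse 'name | value' lines of a STAR Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers and blank lines carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def unmapped_percentages(stats):
    """Keep only the unmapped-read percentages, converted to floats."""
    return {
        key: float(value.rstrip("%"))
        for key, value in stats.items()
        if key.startswith("% of reads unmapped")
    }


def compare_runs(log_paths):
    """Map each Log.final.out path to its unmapped percentages."""
    results = {}
    for path in log_paths:
        with open(path) as handle:
            results[path] = unmapped_percentages(parse_star_log(handle.read()))
    return results
```

Calling `compare_runs` with the Log.final.out paths of the runs with and without --twopassMode Basic and with different --winAnchorMultimapNmax values puts the "too short" and "other" percentages side by side.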
I would appreciate a direction in which to continue, or an explanation of why there are so many "unmapped: other" reads when the quality seems fine and, based on BLAST, there is no contamination.
Thank you!