bug fix for read duplication with nproc > 1 #80

Open
blake-bowen wants to merge 2 commits into timoast:master from blake-bowen:fix-read-duplication-bug

Conversation

@blake-bowen

Fixes #76.

The issue described in #76 also occurred with my data, causing downstream bamtofastq to fail on my sinto-filtered BAMs.

The bug occurs when nproc > 1 and appears to be caused by pysam.fetch(chrom, start, end) returning every read that overlaps the requested region, including reads that extend past start or end. A read spanning a chunk boundary is therefore returned for both adjacent chunks and written twice, once by each worker.

Fix: in _iterate_reads, skip reads with 0 <= reference_start < i[1] (mapped reads that start before the chunk start), so that each read is written only by the chunk in which it starts. The >= 0 check keeps unmapped reads, which have reference_start == -1.
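
To illustrate the idea without needing a BAM file, here is a minimal sketch that mimics the overlap behaviour with plain Python objects. The `Read` class, the toy `fetch` function, `starts_before_chunk`, and `iterate_reads` are all hypothetical stand-ins written for this example; they are not sinto's or pysam's actual code, and the intervals are assumed to be (chrom, start, end) tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    # Hypothetical stand-in for an aligned read; not a pysam object.
    name: str
    reference_start: int  # 0-based start; -1 is used for "no position"
    reference_end: int    # exclusive end


def fetch(reads, start, end):
    """Toy overlap query: yield every read that overlaps [start, end)."""
    for r in reads:
        if r.reference_start < end and r.reference_end > start:
            yield r


def starts_before_chunk(reference_start, chunk_start):
    """True for mapped reads that begin before the chunk start.

    The lower bound of 0 means a reference_start of -1 is never skipped.
    """
    return 0 <= reference_start < chunk_start


def iterate_reads(reads, intervals, dedupe):
    """Collect read names chunk by chunk, optionally applying the skip rule."""
    written = []
    for _chrom, start, end in intervals:
        for r in fetch(reads, start, end):
            if dedupe and starts_before_chunk(r.reference_start, start):
                continue  # this read belongs to an earlier chunk
            written.append(r.name)
    return written


# Read "b" spans the boundary at position 100 between the two chunks.
reads = [Read("a", 10, 60), Read("b", 90, 140), Read("c", 150, 190)]
intervals = [("chr1", 0, 100), ("chr1", 100, 200)]

without_fix = iterate_reads(reads, intervals, dedupe=False)
with_fix = iterate_reads(reads, intervals, dedupe=True)
print(without_fix)
print(with_fix)
```

Without the skip rule, the boundary-spanning read "b" appears once per chunk; with it, "b" is emitted only by the chunk containing its start position.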

Successfully merging this pull request may close these issues.

filterbarcodes creates duplicated bam lines
