BEDTOOLS

GFF3 to BED

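A BED file is very similar to a GFF3 file and can generally be generated by selecting a subset of GFF3 columns. The coordinate systems differ: GFF3 coordinates are 1-based and inclusive, whereas BED coordinates are 0-based and half-open (the end is exclusive). Converting therefore means subtracting 1 from the start and leaving the end unchanged.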

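# generate a BED3 file
# FS is set to a tab so that attribute values containing spaces stay within column 9
awk 'BEGIN {FS = OFS = "\t"} $3=="gene" {print $1,$4-1,$5}' in.gff3 > out.bed

# generate a BED4 file
# BED4 means the first 4 columns of the official BED format specification: chrom, chromStart, chromEnd, name
# here the whole GFF3 attributes column ($9) is used as the name
awk 'BEGIN {FS = OFS = "\t"} $3=="gene" {print $1,$4-1,$5,$9}' in.gff3 > out.bed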
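As a small worked example of the coordinate shift, the sketch below converts a made-up GFF3 record and uses only the `ID` attribute as the BED name instead of the whole attributes column. The file name `example.gff3`, the contig `chr1`, and the IDs are hypothetical and exist only for illustration.

```shell
# Hypothetical GFF3 records, for illustration only (tab-separated).
tmpdir=$(mktemp -d)
printf 'chr1\tsrc\tgene\t101\t200\t.\t+\t.\tID=gene1;Name=abc\n'     >  "$tmpdir/example.gff3"
printf 'chr1\tsrc\tmRNA\t101\t200\t.\t+\t.\tID=mrna1;Parent=gene1\n' >> "$tmpdir/example.gff3"

# 1-based inclusive [101,200] becomes 0-based half-open [100,200):
# only the start is shifted, so end - start is still the feature length (100 bp).
bed4=$(awk 'BEGIN {FS = OFS = "\t"}
    $3 == "gene" {
        name = "."                               # fallback if no ID attribute is present
        n = split($9, attr, ";")                 # attributes are separated by ";"
        for (i = 1; i <= n; i++)
            if (attr[i] ~ /^ID=/) name = substr(attr[i], 4)
        print $1, $4 - 1, $5, name
    }' "$tmpdir/example.gff3")

echo "$bed4"
rm -r "$tmpdir"
```

Only the `gene` record is printed, with start 100 and end 200.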

bedtools intersect

Report overlaps between two feature files

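# report overlaps between a BED and a BAM file
## for each overlap, report the part of the BED feature (-a) that is overlapped by a read alignment in BAM (-b)
## (to get the read alignments themselves, pass the BAM as -a and the BED as -b; the output is then BAM)
bedtools intersect -a <BED> -b <BAM>

## report each BED feature once, unchanged, if it overlaps at least one read alignment in BAM
bedtools intersect -a <BED> -b <BAM> -u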
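As a rough illustration of what counts as an overlap (this is not how bedtools is implemented), two 0-based, half-open intervals on the same contig overlap when each one starts before the other ends. The helper `overlap` and the coordinates below are hypothetical.

```shell
# overlap A_START A_END B_START B_END
# Prints the overlap length in bp between two 0-based, half-open intervals
# on the same contig (0 means no overlap). Hypothetical helper, illustration only.
overlap() {
    awk -v a_start="$1" -v a_end="$2" -v b_start="$3" -v b_end="$4" 'BEGIN {
        start = (a_start > b_start) ? a_start : b_start   # later of the two starts
        end   = (a_end < b_end) ? a_end : b_end           # earlier of the two ends
        len = end - start
        if (len < 0) len = 0
        print len
    }'
}

overlap 100 200 150 250   # partial overlap
overlap 100 200 200 300   # adjacent intervals share no base
```

Because the end coordinate is exclusive, the second pair touches but does not overlap.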

bedtools coverage

# report how BAM read alignments cover each BED feature
bedtools coverage -a <BED> -b <BAM>
# output: the original BED columns, followed by
# number_of_reads_overlapping_feature number_of_covered_bases length_of_feature fraction_of_feature_covered

# if BED and BAM are both position-sorted with the same chromosome order, use -sorted to make it much faster
bedtools coverage -a <BED> -b <BAM> -sorted
## if the BED is not sorted, use bedtools sort (see below); a BAM is sorted with samtools sort

# ensure that both <BED> and <BAM> have the exact same set of contigs,
# otherwise bedtools may complain
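The coverage columns lend themselves to simple filtering with awk. The sketch below keeps features that have at least half of their bases covered; the rows are made-up output for a BED3 input and the 0.5 threshold is an arbitrary assumption.

```shell
# Hypothetical `bedtools coverage` output for a BED3 input (tab-separated):
# contig, start, end, overlapping reads, covered bases, feature length, covered fraction.
coverage_rows=$(printf 'chr1\t100\t200\t12\t100\t100\t1.0000000\nchr1\t300\t500\t3\t50\t200\t0.2500000\nchr2\t0\t80\t0\t0\t80\t0.0000000\n')

# Keep features with at least half of their bases covered.
# With BED3 input the fraction is column 7; it moves right for wider BED inputs,
# so the last column ($NF) is used here.
well_covered=$(printf '%s\n' "$coverage_rows" \
    | awk 'BEGIN {FS = OFS = "\t"} $NF >= 0.5 {print $1, $2, $3, $NF}')

echo "$well_covered"
```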

bedtools sort

# sort a BED file according to a contig or chromosome order listed in a file
bedtools sort -i <BED> -g <contigs.list>
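One way to obtain such a contig list is to take it from the FASTA index of the reference, so that the order matches the assembly. This is a sketch under the assumption that a `.fai` index (as written by `samtools faidx`) is available; the index content and file names below are hypothetical.

```shell
# Hypothetical .fai content: name, length, offset, linebases, linewidth.
tmpdir=$(mktemp -d)
printf 'scaffold_2\t5000\t12\t60\t61\nscaffold_1\t3000\t5108\t60\t61\n' > "$tmpdir/genome.fa.fai"

# Contig names in FASTA order, one per line
cut -f1 "$tmpdir/genome.fa.fai" > "$tmpdir/contigs.list"

# Contig names plus lengths (the two-column "genome file" layout used by bedtools)
cut -f1,2 "$tmpdir/genome.fa.fai" > "$tmpdir/genome.txt"

contigs=$(cat "$tmpdir/contigs.list")
genome=$(cat "$tmpdir/genome.txt")
echo "$contigs"
rm -r "$tmpdir"
```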

bedtools genomecov

Calculate read depth from a position-sorted BAM file and store it as a bedGraph file

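# bedgraph = seqid, start, end, coverage (0-based, half-open intervals)
# -bg reports only regions with non-zero depth; use -bga to include zero-depth regions
bedtools genomecov -ibam <BAM> -bg > <BEDGRAPH>

# depth file = seqid, pos, depth (one line per position, 1-based)
bedtools genomecov -ibam <BAM> -d > <DEPTH>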
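The per-position depth file is easy to summarise with awk. The sketch below computes the mean depth per contig from made-up depth rows; the contig names and values are hypothetical.

```shell
# Hypothetical `genomecov -d` output: seqid, 1-based position, depth.
depth_rows=$(printf 'chr1\t1\t4\nchr1\t2\t6\nchr2\t1\t0\nchr2\t2\t3\nchr2\t3\t3\n')

# Sum the depth and count the positions per contig, then print the mean.
# awk does not guarantee the order of "for (c in sum)", hence the final sort.
mean_depth=$(printf '%s\n' "$depth_rows" \
    | awk 'BEGIN {FS = OFS = "\t"}
        { sum[$1] += $3; n[$1]++ }
        END { for (c in sum) print c, sum[c] / n[c] }' \
    | sort)

echo "$mean_depth"
```

Zero-depth positions are included in the mean because `-d` reports every position.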

Make a GC-content bedgraph file

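# seqkit sliding names each window <seqid>_sliding:<start>-<end> with 1-based, inclusive coordinates;
# the awk step shifts the start to 0-based for bedGraph and keeps the GC percentage (last column)
cat Ergobibamus_cyprinoides_CL.scaffolds.fa \
    | seqkit sliding -W 50 -s 25 \
    | seqkit fx2tab -n -g \
    | sed -E -e 's/_sliding:([0-9]+)-([0-9]+)/\t\1\t\2/' -e 's/\s+/\t/g' \
    | awk 'BEGIN {FS = OFS = "\t"} {print $1, $2-1, $3, $NF}' \
    > ergo_gc_content.bedgraph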
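To check the window-name parsing on a single record, the sketch below converts one made-up line into a bedGraph row. It assumes window names of the form `<seqid>_sliding:<start>-<end>` with 1-based, inclusive coordinates and the GC percentage in the last column; the sequence ID `scaffold-1` is hypothetical and contains a hyphen on purpose.

```shell
# Hypothetical record: window name, two empty columns, GC percentage.
window_row=$(printf 'scaffold-1_sliding:26-75\t\t\t42.00')

bedgraph_row=$(printf '%s\n' "$window_row" \
    | awk 'BEGIN {FS = OFS = "\t"}
        {
            name = $1
            sub(/_sliding:[0-9]+-[0-9]+$/, "", name)   # seqid without the window suffix
            coords = substr($1, length(name) + 10)     # text after "_sliding:" (9 characters)
            split(coords, pos, "-")
            print name, pos[1] - 1, pos[2], $NF        # shift the start to 0-based
        }')

echo "$bedgraph_row"
```

Anchoring the pattern on the `_sliding:` suffix keeps a hyphen inside the sequence ID from being mistaken for the coordinate separator.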