bam_vs_bed.py - count context that reads map to
- Tags
Genomics NGS Intervals BAM BED Counting
Purpose
This script takes as input a BAM file from an RNA-seq or similar experiment and a BED-formatted file. The BED file needs at least four columns; the fourth (name) column is used to group counts.
The script counts the number of alignments in the first input file that overlap each feature in the second file. Annotations in the BED file may overlap one another; each is counted independently.
Note that duplicate intervals will be counted multiple times. This situation can easily arise when building a set of genomic annotations based on a geneset with alternative transcripts. For example:
chr1 10000 20000 protein_coding # gene1, transcript1
chr1 10000 20000 protein_coding # gene1, transcript2
Any reads overlapping the interval chr1:10000-20000 will be counted twice into the protein_coding bin by bedtools. To avoid this, remove any duplicates from the bed file:
zcat input_with_duplicates.bed.gz | cgat bed2bed --merge-by-name | bgzip > input_without_duplicates.bed.gz
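The double counting described above can be illustrated with a small, self-contained sketch. It does not call bedtools or reproduce the script's implementation; the helpers `overlaps` and `count_by_name` and the example reads are hypothetical, and intervals are assumed to be half-open as in BED. Dropping exact duplicates with `set()` is only a stand-in for the `cgat bed2bed --merge-by-name` step, whose behaviour may differ for intervals that overlap without being identical.

```python
from collections import Counter


def overlaps(read, feature):
    """Return True if two half-open (chrom, start, end) intervals share a base."""
    return (
        read[0] == feature[0]
        and read[1] < feature[2]
        and feature[1] < read[2]
    )


def count_by_name(reads, features):
    """Count overlapping reads per feature name; every feature is counted independently."""
    counts = Counter()
    for chrom, start, end, name in features:
        for read in reads:
            if overlaps(read, (chrom, start, end)):
                counts[name] += 1
    return counts


# Hypothetical reads: two overlap chr1:10000-20000, one does not.
reads = [
    ("chr1", 10500, 10600),
    ("chr1", 19990, 20090),
    ("chr1", 30000, 30100),
]

# The same interval listed once per transcript, as in the example above.
features_dup = [
    ("chr1", 10000, 20000, "protein_coding"),
    ("chr1", 10000, 20000, "protein_coding"),
]

# Stand-in for deduplication: keep each identical (interval, name) record once.
features_unique = sorted(set(features_dup))

counts_dup = count_by_name(reads, features_dup)
counts_unique = count_by_name(reads, features_unique)

print(counts_dup["protein_coding"])     # 4: each of the two reads is counted twice
print(counts_unique["protein_coding"])  # 2: each read is counted once
```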
This script requires bedtools to be installed.
Options
- -a, --bam-file / -b, --bed-file
These are the input files. They can also be provided as positional arguments, with the BAM file first and the (gzipped or uncompressed) BED file second.
- -m, --min-overlap
With this option, a read is counted only if at least the given minimum fraction of the read overlaps a bed entry.
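As a rough illustration of what a minimum overlap fraction means, the sketch below computes the fraction of a read covered by a feature and compares it with a threshold. The function names are hypothetical, coordinates are assumed to be half-open, the read is treated as one contiguous interval (spliced alignments are ignored), and treating the threshold as inclusive is an assumption rather than documented behaviour of the script.

```python
def overlap_fraction(read_start, read_end, feat_start, feat_end):
    """Fraction of the read (half-open interval) that is covered by the feature."""
    read_length = read_end - read_start
    if read_length <= 0:
        raise ValueError("read must have positive length")
    shared = min(read_end, feat_end) - max(read_start, feat_start)
    return max(0, shared) / read_length


def passes_min_overlap(read_start, read_end, feat_start, feat_end, min_overlap):
    """True if the covered fraction reaches the threshold (inclusive, by assumption)."""
    fraction = overlap_fraction(read_start, read_end, feat_start, feat_end)
    return fraction >= min_overlap


# A 100 bp read that hangs 50 bp over the end of chr1:10000-20000.
fraction = overlap_fraction(19950, 20050, 10000, 20000)
print(fraction)                                                # 0.5
print(passes_min_overlap(19950, 20050, 10000, 20000, 0.5))     # True
print(passes_min_overlap(19950, 20050, 10000, 20000, 0.75))    # False
```

The fraction is taken relative to the read length rather than the feature length, which matches the wording of the option above.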
Example:
python bam_vs_bed.py in.bam in.bed.gz
Usage
Type:
cgat bam_vs_bed BAM BED [OPTIONS]
cgat bam_vs_bed --bam-file=BAM --bed-file=BED [OPTIONS]
where BAM is either a bam or bed file and BED is a bed file.
Type:
cgat bam_vs_bed --help
for command line help.
Command line options
usage: bam-vs-bed [-h] [--version] [-m MIN_OVERLAP] [-a bam] [-b bed] [-s]
[--assume-sorted] [--split-intervals] [--timeit TIMEIT_FILE]
[--timeit-name TIMEIT_NAME] [--timeit-header]
[--random-seed RANDOM_SEED] [-v LOGLEVEL]
[--log-config-filename LOG_CONFIG_FILENAME]
[--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
[-E STDERR] [-S STDOUT]