bam2bed.py - convert bam formatted file to bed formatted file

Tags

Genomics NGS Intervals BAM BED Conversion

Purpose

This tool converts BAM files into BED files supplying the intervals for each read in the BAM file. BAM files must have a corresponding index file ie. example.bam and example.bam.bai

For example:

samtools view example.bam

READ1    163    1      13040   15     76M    =      13183   219     ...
READ1    83     1      13183   7      76M    =      13040   -219    ...
READ2    147    1      13207   0      76M    =      13120   -163    ...

python bam2bed.py example.bam

1       13039   13115   READ1     15      +
1       13119   13195   READ2     0       +
1       13182   13258   READ1     7       -
1       13206   13282   READ2     0       -

By default, bam2bed outputs each read as a separate interval. With the option --merge-pairs paired-end reads are merged and output as a single interval. The strand is set according to the first read in a pair.

Usage

cgat bam2bed BAMFILE [--merge-pairs] [options]

operates on the file BAMFILE:

cgat bam2bed [--merge-pairs] [options]

operates on the stdin as does:

cgat bam2bed -I BAMFILE [--merge-pairs] [options]

To merge paired-end reads and output fragment interval ie. leftmost mapped base to rightmost mapped base:

cat example.bam | cgat bam2bed --merge-pairs

1       13119   13282   READ2     0       +
1       13039   13258   READ1     7       +

To use merge pairs on only a region of the genome use samtools view:

samtools view -ub example.bam 1:13000:13100 | cgat bam2bed --merge-pairs

Note that this will select fragments were the first read-in-pair is in the region.

Options

-m, --merge-pairs

Output one region per fragment rather than one region per read, thus a single region is create stretching from the start of the frist read in pair to the end of the second.

Read pairs that meet the following criteria are removed:

  • Reads where one of the pair is unmapped

  • Reads that are not paired

  • Reads where the pairs are mapped to different chromosomes

  • Reads where the the insert size is not between the max and min (see below)

Warning

Merged fragements are always returned on the +ve strand. Fragement end point is estimated as the alignment start position of the second-in-pair read + the length of the first-in-pair read. This may lead to inaccuracy if you have an intron-aware aligner.

--max-insert-size, --min-insert-size

The maximum and minimum size of the insert that is allowed when using the –merge-pairs option. Read pairs closer to gether or futher apart than the min and max repsectively are skipped.

-b, --bed-format

What format to output the results in. The first n columns of the bed file will be output.

Type:

python bam2bed.py --help

for command line help.

Command line options

usage: bam2bed [-h] [--version] [-m] [--max-insert-size MAX_INSERT_SIZE]
               [--min-insert-size MIN_INSERT_SIZE] [--bed-format {3,4,5,6}]
               [--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
               [--timeit-header] [--random-seed RANDOM_SEED] [-v LOGLEVEL]
               [--log-config-filename LOG_CONFIG_FILENAME]
               [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
               [-E STDERR] [-S STDOUT]
bam2bed: error: argument -?: expected one argument