bam2bed.py - convert BAM formatted file to BED formatted file
- Tags
Genomics NGS Intervals BAM BED Conversion
Purpose
This tool converts BAM files into BED files, supplying one interval for each read in the BAM file. BAM files must have a corresponding index file, i.e. example.bam and example.bam.bai.
For example:
samtools view example.bam
READ1 163 1 13040 15 76M = 13183 219 ...
READ1 83 1 13183 7 76M = 13040 -219 ...
READ2 147 1 13207 0 76M = 13120 -163 ...
python bam2bed.py example.bam
1 13039 13115 READ1 15 +
1 13119 13195 READ2 0 +
1 13182 13258 READ1 7 -
1 13206 13282 READ2 0 -
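The coordinate change in the example above (1-based SAM position to 0-based, half-open BED interval) can be sketched in a few lines of Python. This is only an illustration, not the code bam2bed runs: the function name `sam_to_bed` is hypothetical, the interval length is assumed to be the number of reference bases consumed by the CIGAR string, and the strand is assumed to come from the reverse-strand bit (0x10) of the SAM flag.

```python
import re

# CIGAR operations as defined in the SAM format; only some consume reference bases.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")


def sam_to_bed(name, flag, chrom, pos, mapq, cigar):
    """Turn the main fields of one SAM record into a BED6 row (hypothetical helper)."""
    start = pos - 1  # SAM positions are 1-based, BED starts are 0-based
    ref_length = sum(
        int(count) for count, op in _CIGAR_OP.findall(cigar) if op in _REF_CONSUMING
    )
    strand = "-" if flag & 0x10 else "+"  # 0x10 marks a reverse-strand alignment
    return (chrom, start, start + ref_length, name, mapq, strand)


if __name__ == "__main__":
    # First record of the samtools view output shown above.
    print("\t".join(map(str, sam_to_bed("READ1", 163, "1", 13040, 15, "76M"))))
```

Under these assumptions, the three records listed in the samtools view output give the READ1 and READ2 rows starting at 13039, 13182 and 13206 in the BED output.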
By default, bam2bed outputs each read as a separate interval. With the option --merge-pairs, paired-end reads are merged and output as a single interval. The strand is set according to the first read in a pair.
Usage
cgat bam2bed BAMFILE [--merge-pairs] [options]
operates on the file BAMFILE, whereas:
cgat bam2bed [--merge-pairs] [options]
operates on stdin, as does:
cgat bam2bed -I BAMFILE [--merge-pairs] [options]
To merge paired-end reads and output the fragment interval, i.e. leftmost mapped base to rightmost mapped base:
cat example.bam | cgat bam2bed --merge-pairs
1 13119 13282 READ2 0 +
1 13039 13258 READ1 7 +
To merge pairs in only one region of the genome, select the region with samtools view:
samtools view -ub example.bam 1:13000-13100 | cgat bam2bed --merge-pairs
Note that this will select fragments where the first read-in-pair is in the region.
Options
- -m, --merge-pairs
Output one region per fragment rather than one region per read, so that a single region is created stretching from the start of the first read in the pair to the end of the second.
Read pairs that meet any of the following criteria are removed:
Reads where one of the pair is unmapped
Reads that are not paired
Reads where the pairs are mapped to different chromosomes
Reads where the insert size is not between the min and max (see below)
Warning
Merged fragments are always returned on the +ve strand. The fragment end point is estimated as the alignment start position of the second-in-pair read + the length of the first-in-pair read. This may lead to inaccuracy if you use an intron-aware aligner.
- --max-insert-size, --min-insert-size
The maximum and minimum insert size allowed when using the --merge-pairs option. Read pairs closer together or further apart than the min and max respectively are skipped.
- -b, --bed-format
The BED format to output the results in. Only the first n columns of the BED file are output, where n is one of 3, 4, 5 or 6.
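The filtering and merging rules described for the options above can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool's implementation: `Mate` and `merge_pair` are hypothetical names, the fragment is taken as leftmost to rightmost mapped base of the two mates (the tool itself estimates the end point as described in the warning), the insert size is assumed to be the fragment length, the score is assumed to come from the first-in-pair read, and `None` for a limit is taken to mean "no limit".

```python
from typing import NamedTuple, Optional


class Mate(NamedTuple):
    """One read of a pair, already converted to 0-based BED coordinates."""
    chrom: Optional[str]  # None stands for an unmapped read (assumption of this sketch)
    start: int
    end: int
    mapq: int
    is_paired: bool


def merge_pair(name, first, second,
               min_insert_size=None, max_insert_size=None, bed_format=6):
    """Return one BED row for the fragment, or None if the pair is filtered out."""
    if not (first.is_paired and second.is_paired):
        return None  # reads that are not paired
    if first.chrom is None or second.chrom is None:
        return None  # one of the pair is unmapped
    if first.chrom != second.chrom:
        return None  # mates mapped to different chromosomes
    start = min(first.start, second.start)
    end = max(first.end, second.end)
    insert_size = end - start
    if min_insert_size is not None and insert_size < min_insert_size:
        return None  # mates closer together than the minimum
    if max_insert_size is not None and insert_size > max_insert_size:
        return None  # mates further apart than the maximum
    row = (first.chrom, start, end, name, first.mapq, "+")  # always the + strand
    return row[:bed_format]  # keep only the first n columns


if __name__ == "__main__":
    first = Mate("1", 13182, 13258, 7, True)    # READ1, first-in-pair
    second = Mate("1", 13039, 13115, 15, True)  # READ1, second-in-pair
    print(merge_pair("READ1", first, second))
```

With the READ1 mates from the example, the sketch yields the interval 13039 to 13258 on the + strand, and passing `bed_format=3` keeps only the chromosome, start and end columns.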
Type:
python bam2bed.py --help
for command line help.
Command line options
usage: bam2bed [-h] [--version] [-m] [--max-insert-size MAX_INSERT_SIZE]
               [--min-insert-size MIN_INSERT_SIZE] [--bed-format {3,4,5,6}]
               [--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
               [--timeit-header] [--random-seed RANDOM_SEED] [-v LOGLEVEL]
               [--log-config-filename LOG_CONFIG_FILENAME]
               [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
               [-E STDERR] [-S STDOUT]