bed2bed - manipulate bed files

Purpose

This script provides various methods for merging (by position, by name or by score), filtering and moving bed-formatted intervals, and outputs the results as a bed file.

It provides several methods, each with a set of options to control behaviour:

merge

+++++

Merge together overlapping or adjacent intervals. The basic

functionality is similar to bedtools merge, but with some additions:

\* Merging by name: specifying the --merge-by-name option means that only overlapping (or adjacent) intervals with the same value in the 4th column of the bed file will be merged.

\* Removing overlapping intervals with inconsistent names: set the --remove-inconsistent-names option.

.. caution:: Intervals of the same name will only be merged if they are consecutive in the bed file.

\* Only output merged intervals: by specifying the --merge-min-intervals=n option, only those intervals that were created by merging at least n intervals together will be output.

Intervals that are close but not overlapping can be merged by setting

--merge-distance to a non-zero value.
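
The merging rules above can be sketched in a few lines of Python. This is a minimal illustration and not the cgat implementation: the function name `merge_intervals` and its parameters are hypothetical stand-ins for --merge-distance, --merge-by-name and --merge-min-intervals, and the input is assumed to be sorted by contig and start position.

```python
from typing import List, Tuple

# (contig, start, end, name); coordinates are 0-based, half-open as in bed.
Interval = Tuple[str, int, int, str]


def merge_intervals(
    intervals: List[Interval],
    merge_distance: int = 0,
    by_name: bool = False,
    min_intervals: int = 1,
) -> List[Interval]:
    """Merge consecutive overlapping or adjacent intervals (illustrative only)."""
    merged: List[Interval] = []
    counts: List[int] = []  # number of input intervals behind each output
    for contig, start, end, name in intervals:
        if merged:
            last_contig, last_start, last_end, last_name = merged[-1]
            # A gap of 0 means adjacent, a negative gap means overlapping.
            close_enough = (
                contig == last_contig and start - last_end <= merge_distance
            )
            same_name = (not by_name) or name == last_name
            if close_enough and same_name:
                merged[-1] = (last_contig, last_start, max(last_end, end), last_name)
                counts[-1] += 1
                continue
        merged.append((contig, start, end, name))
        counts.append(1)
    # Keep only outputs built from at least `min_intervals` inputs.
    return [iv for iv, n in zip(merged, counts) if n >= min_intervals]
```

Because each interval is only compared with the most recent output interval, two intervals of the same name that are separated by an interval with a different name are not merged, which mirrors the caution above.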

bins

++++

Merges together overlapping or adjacent intervals only if they have

"similar" scores. Score similarity is assessed by creating a number of

score bins and assigning each interval to a bin. If two adjacent

intervals are in the same bin, the intervals are merged. Note that in

contrast to merge-by-name above, two intervals do not need to be

overlapping or within a certain distance to be merged.

There are several methods to create the bins:

\* equal-bases: Bins are created so that they contain the same number of bases. Specified by passing “equal-bases” to --binning-method. This is the default.

\* equal-intervals: Score bins are created so that each bin contains the same number of intervals. Specified by passing “equal-intervals” to --binning-method.

\* equal-range: Score bins are created so that each bin covers the same fraction of the total range of scores. Specified by passing “equal-range” to --binning-method.

\* bin-edges: Score bins can be specified manually by passing a comma-separated list of bin edges to --bin-edges.

The number of bins is specified by the --num-bins option, and the

default is 5.
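
To make the binning idea concrete, the sketch below computes bin edges for the equal-range and equal-intervals strategies and then merges consecutive intervals that fall into the same bin. It is an assumption-laden illustration rather than the code bed2bed runs: all function names are hypothetical, the exact edge conventions used by the tool are not documented here, and the equal-bases strategy is left out for brevity.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# (contig, start, end, score)
Scored = Tuple[str, int, int, float]


def equal_range_edges(scores: Sequence[float], num_bins: int) -> List[float]:
    """Inner edges that split the score range into equally wide bins."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(1, num_bins)]


def equal_intervals_edges(scores: Sequence[float], num_bins: int) -> List[float]:
    """Inner edges chosen so each bin holds roughly the same number of scores."""
    ordered = sorted(scores)
    return [ordered[(i * len(ordered)) // num_bins] for i in range(1, num_bins)]


def assign_bin(score: float, edges: Sequence[float]) -> int:
    """Index of the bin a score falls into, given sorted inner edges."""
    return bisect_right(edges, score)


def merge_by_bin(intervals: List[Scored], edges: Sequence[float]):
    """Merge consecutive same-contig intervals that share a score bin.

    The fourth field of each output tuple is the bin index (an assumption
    of this sketch, not a statement about the tool's output format).
    """
    merged = []
    for contig, start, end, score in intervals:
        bin_index = assign_bin(score, edges)
        if merged and merged[-1][0] == contig and merged[-1][3] == bin_index:
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end), bin_index)
        else:
            merged.append((contig, start, end, bin_index))
    return merged
```

Note that `merge_by_bin` never checks the distance between two intervals, in line with the statement above that intervals need not overlap or lie within a certain distance to be merged.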

block

+++++

Creates blocked bed12 outputs from a bed6, where intervals with the

same name are merged together to create a single bed12 entry.

.. Caution:: Input must be sorted so that entries of the same

name are together.
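
The conversion from per-exon bed6 lines to a single bed12 entry can be illustrated as follows. This is a hedged sketch, not the cgat code: `block_bed6` is a hypothetical name, and the choices of taking the score and strand from the first entry, setting thickStart/thickEnd to the full span and writing "0" for itemRgb are assumptions made for the example.

```python
from itertools import groupby
from typing import List, Tuple

# (contig, start, end, name, score, strand)
Bed6 = Tuple[str, int, int, str, int, str]


def block_bed6(records: List[Bed6]) -> List[tuple]:
    """Collapse consecutive bed6 records with the same name into bed12 tuples."""
    blocked = []
    # groupby only groups neighbouring records, hence the sorting requirement.
    for name, group in groupby(records, key=lambda r: r[3]):
        exons = sorted(group, key=lambda r: r[1])
        contig, score, strand = exons[0][0], exons[0][4], exons[0][5]
        chrom_start = exons[0][1]
        chrom_end = max(e[2] for e in exons)
        block_sizes = ",".join(str(e[2] - e[1]) for e in exons)
        # Block starts are relative to the start of the whole entry.
        block_starts = ",".join(str(e[1] - chrom_start) for e in exons)
        blocked.append(
            (contig, chrom_start, chrom_end, name, score, strand,
             chrom_start, chrom_end, "0", len(exons), block_sizes, block_starts)
        )
    return blocked
```

If entries with the same name are not adjacent in the input, this sketch emits one bed12 entry per run of identical names, which shows why the input must be sorted by name.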

filter-genome

+++++++++++++

Removes intervals that are on unknown contigs or extend off the 3' or

5' end of the contig. Requires a tab-separated input file to -g which

lists the contigs in the genome, plus their lengths.

sanitize-genome

+++++++++++++++

As above, but instead of removing intervals overlapping the ends of

contigs, truncates them. Also removes empty intervals.
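
The difference between filter-genome and sanitize-genome can be seen side by side in the sketch below. The function names and the `contig_sizes` dictionary (contig name to length) are hypothetical, and dropping intervals on unknown contigs in the sanitize case is an assumption based on the "as above" wording rather than confirmed behaviour.

```python
from typing import Dict, List, Tuple

# (contig, start, end)
Interval = Tuple[str, int, int]


def filter_genome(intervals: List[Interval], contig_sizes: Dict[str, int]) -> List[Interval]:
    """Keep only intervals that lie fully inside a known contig."""
    return [
        (contig, start, end)
        for contig, start, end in intervals
        if contig in contig_sizes and start >= 0 and end <= contig_sizes[contig]
    ]


def sanitize_genome(intervals: List[Interval], contig_sizes: Dict[str, int]) -> List[Interval]:
    """Truncate intervals to contig boundaries and drop empty results."""
    sanitized = []
    for contig, start, end in intervals:
        if contig not in contig_sizes:
            continue  # assumption: unknown contigs are removed, as in filter-genome
        start = max(start, 0)
        end = min(end, contig_sizes[contig])
        if end > start:  # intervals that become empty are discarded
            sanitized.append((contig, start, end))
    return sanitized
```

With a single 100-base contig, an interval spanning 90 to 120 is removed by `filter_genome` but kept as 90 to 100 by `sanitize_genome`.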

filter-names

++++++++++++

Outputs intervals whose names are in a list of desired names. Names are

supplied as a file with one name on each line.
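
As a rough illustration of this filter, the sketch below keeps bed lines whose 4th column appears in a set of names read one per line. The function name `filter_by_names` is hypothetical, and skipping lines with fewer than four columns is an assumption of the example.

```python
from typing import Iterable, List


def filter_by_names(bed_lines: Iterable[str], name_lines: Iterable[str]) -> List[str]:
    """Return the bed lines whose name column is in the list of wanted names."""
    # One name per line; blank lines are ignored.
    wanted = {line.strip() for line in name_lines if line.strip()}
    kept = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        # The name is the 4th bed column; lines without one cannot match.
        if len(fields) >= 4 and fields[3] in wanted:
            kept.append(line)
    return kept
```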

shift

+++++

Moves intervals by the specified amount, but will not allow them to be

shifted off the end of contigs. Thus if a shift would move the start

or end of an interval past the end of the contig, the interval is only moved as much as is

possible without doing this.
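
The clamping behaviour described above can be expressed as limiting the offset before it is applied. The sketch below is illustrative only: `shift_interval` is a hypothetical helper, `offset` stands in for the value given to --offset, and the interval is assumed to lie within the contig to begin with.

```python
from typing import Tuple


def shift_interval(start: int, end: int, offset: int, contig_size: int) -> Tuple[int, int]:
    """Shift an interval by `offset`, stopping at the contig boundaries."""
    if offset < 0:
        # Moving left: the start may not go below position 0.
        offset = max(offset, -start)
    else:
        # Moving right: the end may not exceed the contig length.
        offset = min(offset, contig_size - end)
    return start + offset, end + offset
```

The interval keeps its length in every case, because the same limited offset is applied to both coordinates.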

rename-chr

++++++++++

Renames chromosome names. Source and target names are supplied as a file

with two columns. Examples are available at:

https://github.com/dpryan79/ChromosomeMappings

Note that unmapped chromosomes are dropped from the output file.
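
A minimal sketch of the renaming step follows, assuming a tab-separated two-column mapping file as described above. Both function names are hypothetical, and treating a source name with an empty target as unmapped is an assumption of this example.

```python
from typing import Dict, Iterable, List, Tuple

# (contig, start, end)
Interval = Tuple[str, int, int]


def parse_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Read source-to-target contig names from tab-separated lines."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and sources without a target name.
        if len(fields) >= 2 and fields[0] and fields[1]:
            mapping[fields[0]] = fields[1]
    return mapping


def rename_contigs(intervals: List[Interval], mapping: Dict[str, str]) -> List[Interval]:
    """Rename contigs; intervals on unmapped contigs are dropped."""
    return [
        (mapping[contig], start, end)
        for contig, start, end in intervals
        if contig in mapping
    ]
```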

Other options

+++++++++++++

-g/--genome-file, -b/--bam-file: The filter-genome, sanitize-genome and shift methods require a genome in order to ensure that they do not place intervals outside the limits of contigs. This genome can be supplied either as a samtools- or cgat-indexed genome, or extracted from the header of a BAM file.

Examples

Merge overlapping or adjacent peaks from a ChIP-seq experiment where the intervals have the same name:

cat chip-peaks.bed | cgat bed2bed --method=merge --merge-by-name > chip-peaks-merged.bed

Merge adjacent ChIP-seq peaks if their scores are in the same quartile of all scores:

cat chip-peaks.bed | cgat bed2bed --method=bins --binning-method=equal-intervals --num-bins=4

Remove intervals that overlap the ends of a contig and those that are on a non-standard contig. Take the input intervals from a file rather than stdin. Note that hg19.fasta has been indexed with index_genome:

cgat bed2bed --method=filter-genome --genome-file=hg19.fasta -I chip-peaks.bed -O chip-peaks-sanitized.bed

Convert a bed file containing gene structures with one line per exon to a bed12 file with linked blocks representing the gene structure. Note the transparent use of compressed input and output files:

cgat bed2bed --method=block -I transcripts.bed.gz -O transcripts.blocked.bed.gz

Rename UCSC chromosomes to ENSEMBL:

cat ucsc.bed | cgat bed2bed --method=rename-chr --rename-chr-file=ucsc2ensembl.txt > ensembl.bed

Usage

cgat bed2bed --method=[METHOD] [OPTIONS]

Reads a bed file from stdin and applies the specified method.

Command line options

usage: bed2bed [-h]
               [-m {merge,filter-genome,bins,block,sanitize-genome,shift,extend,filter-names,rename-chr}]
               [--num-bins NUM_BINS] [--bin-edges BIN_EDGES]
               [--binning-method {equal-bases,equal-intervals,equal-range}]
               [--merge-distance MERGE_DISTANCE]
               [--merge-min-intervals MERGE_MIN_INTERVALS] [--merge-by-name]
               [--merge-and-resolve-blocks] [--merge-stranded]
               [--remove-inconsistent-names] [--offset OFFSET]
               [-g GENOME_FILE] [-b BAM_FILE] [--filter-names-file NAMES]
               [--rename-chr-file RENAME_CHR_FILE] [--timeit TIMEIT_FILE]
               [--timeit-name TIMEIT_NAME] [--timeit-header]
               [--random-seed RANDOM_SEED] [-v LOGLEVEL]
               [--log-config-filename LOG_CONFIG_FILENAME]
               [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
               [-E STDERR] [-S STDOUT]