beds2beds.py - decompose bed files

Tags

Genomics Intervals BED Manipulation

Purpose

This script decomposes a collection of input bed files into a collection of unions or intersections.

Options

Files are collected by a regular expression pattern given to the option --pattern-identifier.
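The sketch below shows how a regular expression can turn input file names into set identifiers. It assumes that the first capture group of the pattern names each file. The pattern `(.*)\.bed\.gz$` and the helper `file_identifier` are hypothetical examples, not necessarily the script's default or its code.

```python
import re

# Hypothetical pattern: the first capture group becomes the identifier.
# This is an illustrative assumption, not necessarily the script's default.
PATTERN = r"(.*)\.bed\.gz$"


def file_identifier(filename: str, pattern: str = PATTERN) -> str:
    """Return the first capture group of `pattern` matched against `filename`."""
    match = re.search(pattern, filename)
    if match is None:
        raise ValueError(f"{filename!r} does not match {pattern!r}")
    return match.group(1)


for name in ["polii.bed.gz", "tf1.bed.gz", "tf2.bed.gz"]:
    print(name, "->", file_identifier(name))
```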

The script behaviour is determined by the --method option with either of the following choices:

merged-combinations

Merge intervals across bed files and report only those that appear in every file.

unmerged-combinations

For each bed file, report the intervals that overlap with intervals in every other bed file.
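The two methods can be illustrated on toy data held in memory. This is a minimal sketch of one reading of the descriptions above, not the script's implementation. It does not use tabix and treats intervals as half-open (contig, start, end) tuples. It merges only intervals that actually overlap, not book-ended ones. The helpers `overlaps`, `merge`, `merged_combinations` and `unmerged_combinations` are hypothetical.

```python
from typing import Dict, List, Tuple

# (contig, start, end) with half-open coordinates, as in BED.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same contig share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap into single blocks."""
    merged: List[Interval] = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start < merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (contig, last_start, max(last_end, end))
        else:
            merged.append((contig, start, end))
    return merged


def merged_combinations(sets: Dict[str, List[Interval]]) -> List[Interval]:
    """Merge all intervals, then keep merged blocks hit by every input set."""
    pooled = [iv for ivs in sets.values() for iv in ivs]
    return [block for block in merge(pooled)
            if all(any(overlaps(block, iv) for iv in ivs)
                   for ivs in sets.values())]


def unmerged_combinations(
        sets: Dict[str, List[Interval]]) -> Dict[str, List[Interval]]:
    """For each set, keep its own intervals that overlap every other set."""
    result: Dict[str, List[Interval]] = {}
    for name, ivs in sets.items():
        others = [other for key, other in sets.items() if key != name]
        result[name] = [iv for iv in ivs
                        if all(any(overlaps(iv, x) for x in other)
                               for other in others)]
    return result


sets = {
    "a": [("chr1", 0, 100), ("chr1", 500, 600)],
    "b": [("chr1", 50, 150)],
    "c": [("chr1", 90, 200), ("chr1", 550, 560)],
}
print(merged_combinations(sets))    # [('chr1', 0, 200)]
print(unmerged_combinations(sets))  # each set keeps only its first interval
```

The merged method reports one block spanning all three overlapping inputs. The unmerged method keeps each file's own coordinates.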

If the --exclusive-overlap option is set, only exclusive overlaps are reported. These are intervals that overlap in a pairwise comparison but do not overlap with intervals in any of the other sets.
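A minimal in-memory sketch of the exclusive filter for one pairwise comparison follows. It assumes that the reported intervals are taken from the first set of the pair. The script may report them differently, for example as merged blocks. The helper `exclusive_overlap` is hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), half-open as in BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same contig share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def exclusive_overlap(sets: Dict[str, List[Interval]],
                      first: str, second: str) -> List[Interval]:
    """Intervals of `first` that overlap `second` but no interval of any other set."""
    rest = [ivs for name, ivs in sets.items() if name not in (first, second)]
    return [iv for iv in sets[first]
            if any(overlaps(iv, x) for x in sets[second])
            and not any(overlaps(iv, x) for ivs in rest for x in ivs)]


sets = {
    "polii": [("chr1", 0, 100), ("chr1", 300, 400)],
    "tf1": [("chr1", 50, 80), ("chr1", 350, 360)],
    "tf2": [("chr1", 60, 70)],
}
print(exclusive_overlap(sets, "polii", "tf1"))  # [('chr1', 300, 400)]
print(exclusive_overlap(sets, "polii", "tf2"))  # []
```

The PolII interval at 0-100 overlaps tf1 but is dropped because it also overlaps tf2.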

This script requires bed files indexed by tabix.

Usage

For example, suppose you have ChIP-Seq data for PolII and two transcription factors, tf1 and tf2:

zcat polii.bed.gz | head

chr17    1    100    8    1
chr19   -50    50    6    1
chr19    0    100    1    1
chr19    50   150    1    1
chr19   150   200    2    1
chr19   201   300    3    1

zcat tf1.bed.gz | head

chr1    35736     40736    ENST000004173240    -
chr1    60881     65881    ENST000005349900    +
chr1    64090     69090    ENST000003351370    +
chr1    362658    367658   ENST000004264060    +
chr1    622034    627034   ENST000003328310    -
chr1    716405    721405   ENST000003585330    +

The following statement will output four bed files:

python beds2beds.py polii.bed.gz tf1.bed.gz tf2.bed.gz
The four files contain intervals that

  1. have PolII and tf1 present,

  2. have PolII and tf2 present,

  3. have tf1 and tf2 present, or

  4. have PolII and tf1 and tf2 present.
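The four files correspond to every group of two or more input sets. The short sketch below uses Python's `itertools.combinations` to enumerate them. The function name `overlap_combinations` is hypothetical, and the output order is that of `itertools`, not necessarily the order in which the script writes its files.

```python
from itertools import combinations


def overlap_combinations(names):
    """All groups of two or more input sets, smallest groups first."""
    return [combo for size in range(2, len(names) + 1)
            for combo in combinations(names, size)]


for combo in overlap_combinations(["polii", "tf1", "tf2"]):
    print(" & ".join(combo))
```

With four input files the same enumeration yields 11 groups: 6 pairs, 4 triples and 1 group of all four.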

If the --exclusive-overlap option is set, three files will be output, containing intervals that

  1. have PolII and tf1 present but no tf2,

  2. have PolII and tf2 present but no tf1,

  3. have tf1 and tf2 present but no PolII.
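With --exclusive-overlap only pairs remain. Each pair comes with the sets that must show no overlap. A sketch of that bookkeeping follows, using the hypothetical helper `exclusive_pairs`:

```python
from itertools import combinations


def exclusive_pairs(names):
    """For each pair of sets, list the remaining sets that must be absent."""
    return [(pair, tuple(n for n in names if n not in pair))
            for pair in combinations(names, 2)]


for pair, absent in exclusive_pairs(["polii", "tf1", "tf2"]):
    print(f"{' & '.join(pair)} but not {', '.join(absent)}")
```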

Type:

python beds2beds.py --help

for command line help.

Command line options

usage: beds2beds [-h] [--version] [-e] [-p PATTERN_ID]
                 [-m {merged-combinations,unmerged-combinations}]
                 [--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
                 [--timeit-header] [--random-seed RANDOM_SEED] [-v LOGLEVEL]
                 [--log-config-filename LOG_CONFIG_FILENAME]
                 [--tracing {function}] [-? ?] [-P OUTPUT_FILENAME_PATTERN]
                 [-F] [-I STDIN] [-L STDLOG] [-E STDERR] [-S STDOUT]