beds2counts - compute overlap stats between multiple bed files

Tags

Genomics Intervals Comparison BED Counting

Purpose

This script takes multiple BED files, e.g. from multiple samples of the same experiment. It assesses the overlap between samples and outputs, for each merged interval, the number of samples in which that interval was found.

Example

For example, if the command:

cgat beds2counts a.bed b.bed c.bed > output.tsv

is run, where a.bed, b.bed and c.bed look like:

                 1         2         3         4
       012345678901234567890123456789012345678901234
a.bed: -------          -----               -------
b.bed:      -----        --
c.bed:  ---

Union: ----------       -----               -------

Then output.tsv will look like:

contig      start   end     count
chr1        0       10      3
chr1        17      22      2
chr1        37      44      1
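
The merge-and-count idea behind this example can be sketched in plain Python. This is only an illustration, not the script's actual implementation: the function name merge_and_count is hypothetical, and the coordinates are read off the diagram as half-open intervals on a single contig (a.bed: 0-7, 17-22, 37-44; b.bed: 5-10, 18-20; c.bed: 1-4). It also assumes that touching intervals are merged. Under these assumptions the first merged interval spans the whole first block of the Union row, 0-10.

```python
def merge_and_count(samples):
    """Merge intervals across samples and count the samples per merged interval.

    samples: dict mapping a sample name to a list of (contig, start, end)
    tuples with half-open coordinates.
    """
    # Pool all intervals, remembering which sample each one came from.
    tagged = sorted(
        (contig, start, end, name)
        for name, intervals in samples.items()
        for contig, start, end in intervals
    )

    merged = []  # each entry: [contig, start, end, set of sample names]
    for contig, start, end, name in tagged:
        last = merged[-1] if merged else None
        # Assumption: overlapping or touching intervals are merged.
        if last is not None and last[0] == contig and start <= last[2]:
            last[2] = max(last[2], end)
            last[3].add(name)
        else:
            merged.append([contig, start, end, {name}])

    # A sample is counted once per merged interval, however many of its
    # own intervals fall inside it.
    return [(c, s, e, len(names)) for c, s, e, names in merged]


# Coordinates read off the diagram (assumed to lie on chr1).
example = {
    "a.bed": [("chr1", 0, 7), ("chr1", 17, 22), ("chr1", 37, 44)],
    "b.bed": [("chr1", 5, 10), ("chr1", 18, 20)],
    "c.bed": [("chr1", 1, 4)],
}

print("contig", "start", "end", "count", sep="\t")
for contig, start, end, count in merge_and_count(example):
    print(contig, start, end, count, sep="\t")
# chr1  0   10  3
# chr1  17  22  2
# chr1  37  44  1
```

Collecting sample names in a set keeps the count at the number of distinct samples rather than the number of contributing intervals.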

Options

The only option other than the standard cgat options is -i, --bed-file. It allows the input files to be provided as a comma-separated list rather than as a space-delimited set of positional arguments. It is present purely for Galaxy compatibility.
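
How a comma-separated option and positional arguments might be combined into one file list can be sketched with argparse. This is a hypothetical stand-in, not the parser used by beds2counts: the function name collect_bed_files, the program name and the order in which the two sources are concatenated are all assumptions made for illustration.

```python
import argparse


def collect_bed_files(argv):
    """Return the BED file names given via --bed-file and/or positionally."""
    # Hypothetical parser that mimics the documented interface only.
    parser = argparse.ArgumentParser(prog="beds2counts-sketch")
    parser.add_argument(
        "-i", "--bed-file", dest="bed_file", action="append",
        default=None, metavar="bed",
        help="comma-separated list of BED files")
    parser.add_argument("beds", nargs="*", metavar="BED")
    args = parser.parse_args(argv)

    files = []
    # Split each option value on commas, skipping empty fragments.
    for value in args.bed_file or []:
        files.extend(name for name in value.split(",") if name)
    # Assumption: positional arguments are appended after the option values.
    files.extend(args.beds)
    return files


# Both invocation styles yield the same list of input files.
print(collect_bed_files(["--bed-file", "a.bed,b.bed,c.bed"]))
print(collect_bed_files(["a.bed", "b.bed", "c.bed"]))
```

A tool wrapper that can only pass a single string per parameter could then use the first form, while interactive use keeps the positional form.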

Usage

cgat beds2counts BED [BED ...] [OPTIONS]

Command line options

usage: beds2counts [-h] [--version] [--bed-file bed] [--timeit TIMEIT_FILE]
                   [--timeit-name TIMEIT_NAME] [--timeit-header]
                   [--random-seed RANDOM_SEED] [-v LOGLEVEL]
                   [--log-config-filename LOG_CONFIG_FILENAME]
                   [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
                   [-E STDERR] [-S STDOUT]

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --bed-file bed        supply list of bed files (default: [])

Script timing options:
  --timeit TIMEIT_FILE  store timing information in file. (default: None)
  --timeit-name TIMEIT_NAME
                        name in timing file for this class of jobs (default:
                        all)
  --timeit-header       add header for timing information. (default: None)

Common options:
  --random-seed RANDOM_SEED
                        random seed to initialize number generator with
                        (default: None)
  -v LOGLEVEL, --verbose LOGLEVEL
                        loglevel. The higher, the more output. (default: 1)
  --log-config-filename LOG_CONFIG_FILENAME
                        Configuration file for logger. (default: logging.yml)
  --tracing {function}  enable function tracing. (default: None)
  -? ?                  output short help (command line options only).
                        (default: None)

Input/output options:
  -I STDIN, --stdin STDIN
                        file to read stdin from. (default: <_io.TextIOWrapper
                        name='<stdin>' mode='r' encoding='UTF-8'>)
  -L STDLOG, --log STDLOG
                        file with logging information. (default:
                        <_io.TextIOWrapper name='<stdout>' mode='w'
                        encoding='UTF-8'>)
  -E STDERR, --error STDERR
                        file with error information. (default:
                        <_io.TextIOWrapper name='<stderr>' mode='w'
                        encoding='UTF-8'>)
  -S STDOUT, --stdout STDOUT
                        file where output is to go. (default:
                        <_io.TextIOWrapper name='<stdout>' mode='w'
                        encoding='UTF-8'>)