beds2counts - compute overlap stats between multiple bed files
Tags: Genomics Intervals Comparison BED Counting
Purpose
This script takes multiple BED files, for example from several samples of the same experiment. It merges the intervals across all files, assesses the overlap between samples, and outputs, for each merged interval, the number of samples in which that interval was found.
Example
For example, if the command:
cgat beds2counts a.bed b.bed c.bed > output.tsv
is run, where a.bed, b.bed and c.bed look like:
                 1         2         3         4
       012345678901234567890123456789012345678901234
a.bed: -------          -----               -------
b.bed:      -----        --
c.bed:  ---
Union: ----------       -----               -------
Then output.tsv will look like:
contig start end count
chr1 0 7 3
chr1 17 22 2
chr1 37 44 1
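The merge-and-count behaviour described under Purpose can be illustrated with a minimal Python sketch. This is not the cgat implementation: the function name `count_samples_per_merged_interval`, the in-memory input format, and the sample intervals are all hypothetical, and the sketch assumes half-open coordinates and that intervals which touch end-to-start are merged.

```python
from collections import defaultdict


def count_samples_per_merged_interval(samples):
    """Merge intervals across samples and count contributing samples.

    `samples` maps a sample name to a list of (contig, start, end) tuples.
    Assumption: coordinates are half-open, and intervals that overlap or
    touch (start == previous end) are merged into one interval.
    """
    by_contig = defaultdict(list)
    for name, intervals in samples.items():
        for contig, start, end in intervals:
            by_contig[contig].append((start, end, name))

    rows = []
    for contig in sorted(by_contig):
        cur_start = cur_end = None
        seen = set()  # samples contributing to the current merged interval
        for start, end, name in sorted(by_contig[contig]):
            if cur_end is not None and start <= cur_end:
                # Extends the current merged interval.
                cur_end = max(cur_end, end)
                seen.add(name)
            else:
                # Gap found: emit the finished interval and start a new one.
                if cur_end is not None:
                    rows.append((contig, cur_start, cur_end, len(seen)))
                cur_start, cur_end, seen = start, end, {name}
        if cur_end is not None:
            rows.append((contig, cur_start, cur_end, len(seen)))
    return rows


# Hypothetical input, not taken from the example files above.
example = {
    "x.bed": [("chr1", 0, 10), ("chr1", 30, 40)],
    "y.bed": [("chr1", 5, 15)],
    "z.bed": [("chr1", 12, 14), ("chr1", 50, 60)],
}

print("contig\tstart\tend\tcount")
for row in count_samples_per_merged_interval(example):
    print("\t".join(str(field) for field in row))
```

With this hypothetical input, the first merged interval (chr1, 0 to 15) is reported with a count of 3, because all three samples contribute an interval to it, while the other two merged intervals each come from a single sample.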
Options
The only option other than the standard cgat options is -i, --bed-file. It allows the input files to be provided to the option as a comma-separated list rather than as a space-delimited set of positional arguments. It is present purely for Galaxy compatibility.
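As a rough illustration of how a comma-separated option and positional arguments can feed the same list of input files, the following sketch uses Python's standard `argparse` module. It is an assumption about one possible approach, not the parser cgat actually uses; the function name `parse_bed_files` and the program name are hypothetical.

```python
import argparse


def parse_bed_files(argv):
    """Collect BED file names from positionals and a comma-separated option.

    Hypothetical helper: it mirrors the documented interface
    (BED [BED ...] and -i/--bed-file) but is not cgat's own code.
    """
    parser = argparse.ArgumentParser(prog="beds2counts-sketch")
    parser.add_argument("beds", nargs="*", metavar="BED",
                        help="BED files given as positional arguments")
    parser.add_argument("-i", "--bed-file", dest="bed_file", default="",
                        metavar="bed",
                        help="comma-separated list of BED files")
    args = parser.parse_args(argv)

    # Split the option value on commas, dropping empty entries.
    from_option = [name for name in args.bed_file.split(",") if name]
    return args.beds + from_option


# Both invocation styles yield the same list of input files.
print(parse_bed_files(["a.bed", "b.bed", "c.bed"]))
print(parse_bed_files(["--bed-file", "a.bed,b.bed,c.bed"]))
```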
Usage
cgat beds2counts BED [BED ...] [OPTIONS]
Command line options
usage: beds2counts [-h] [--version] [--bed-file bed] [--timeit TIMEIT_FILE]
[--timeit-name TIMEIT_NAME] [--timeit-header]
[--random-seed RANDOM_SEED] [-v LOGLEVEL]
[--log-config-filename LOG_CONFIG_FILENAME]
[--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
[-E STDERR] [-S STDOUT]
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--bed-file bed supply list of bed files (default: [])
Script timing options:
--timeit TIMEIT_FILE store timeing information in file. (default: None)
--timeit-name TIMEIT_NAME
name in timing file for this class of jobs (default:
all)
--timeit-header add header for timing information. (default: None)
Common options:
--random-seed RANDOM_SEED
random seed to initialize number generator with
(default: None)
-v LOGLEVEL, --verbose LOGLEVEL
loglevel. The higher, the more output. (default: 1)
--log-config-filename LOG_CONFIG_FILENAME
Configuration file for logger. (default: logging.yml)
--tracing {function} enable function tracing. (default: None)
-? ? output short help (command line options only.
(default: None)
Input/output options:
-I STDIN, --stdin STDIN
file to read stdin from. (default: <_io.TextIOWrapper
name='<stdin>' mode='r' encoding='UTF-8'>)
-L STDLOG, --log STDLOG
file with logging information. (default:
<_io.TextIOWrapper name='<stdout>' mode='w'
encoding='UTF-8'>)
-E STDERR, --error STDERR
file with error information. (default:
<_io.TextIOWrapper name='<stderr>' mode='w'
encoding='UTF-8'>)
-S STDOUT, --stdout STDOUT
file where output is to go. (default:
<_io.TextIOWrapper name='<stdout>' mode='w'
encoding='UTF-8'>)