bed2stats.py - summary of bed file contents

Tags

Genomics Intervals Summary BED

Purpose

This script takes a bed-formatted file as input and outputs the number of intervals and bases in the bed file. Counts can be subdivided by setting the --aggregate-by command line option:

contig
    output counts per contig (column 1)

name
    output counts grouped by the name field in the bed-formatted file (column 4)

track
    output counts per track in the bed-formatted file
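The counting described above can be sketched in a few lines of Python. This is a minimal illustration and not the script's actual implementation: the function name `summarize_bed` and its `aggregate_by` argument are hypothetical, the input is assumed to be tab-separated, and aggregation by track is left out for brevity.

```python
from collections import defaultdict


def summarize_bed(lines, aggregate_by="none"):
    """Return {key: (ncontigs, nintervals, nbases)} for BED lines.

    Hypothetical helper: aggregate_by is "none", "contig" or "name".
    """
    contigs = defaultdict(set)
    nintervals = defaultdict(int)
    nbases = defaultdict(int)

    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and track/browser header lines.
        if not line or line.startswith(("#", "track ", "browser ")):
            continue
        fields = line.split("\t")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])

        if aggregate_by == "contig":
            key = contig                      # column 1
        elif aggregate_by == "name":
            # column 4; "unnamed" is an assumed fallback for BED3 input
            key = fields[3] if len(fields) > 3 else "unnamed"
        else:
            key = "all"

        contigs[key].add(contig)
        nintervals[key] += 1
        # BED coordinates are 0-based and half-open, so length is end - start.
        nbases[key] += end - start

    return {k: (len(contigs[k]), nintervals[k], nbases[k]) for k in nintervals}


bed = ["chr1\t0\t50\ta", "chr1\t100\t150\tb", "chr2\t10\t60\ta"]
print(summarize_bed(bed))
print(summarize_bed(bed, aggregate_by="contig"))
```

Each key maps to the same three columns the script reports: the number of contigs, the number of intervals and the number of bases.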

Note that a count of bases usually only makes sense if the submitted intervals are non-overlapping.
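To see why overlaps matter, the sketch below compares a naive sum of interval lengths with a count taken after merging overlapping intervals. The function name `count_bases_merged` is hypothetical and this is not part of the script; it only illustrates how overlapping intervals are counted twice in a plain sum.

```python
def count_bases_merged(intervals):
    """Count bases covered by (contig, start, end) tuples, merging overlaps."""
    total = 0
    current = None  # the interval currently being extended
    for contig, start, end in sorted(intervals):
        if current and current[0] == contig and start <= current[2]:
            # Overlapping or adjacent on the same contig: extend the interval.
            current = (contig, current[1], max(current[2], end))
        else:
            if current:
                total += current[2] - current[1]
            current = (contig, start, end)
    if current:
        total += current[2] - current[1]
    return total


intervals = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 0, 10)]
naive = sum(end - start for _, start, end in intervals)
print(naive)                          # the overlap on chr1 is counted twice
print(count_bases_merged(intervals))  # each covered base is counted once
```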

If the option --add-percent is given, an additional column reports the percentage of the genome covered by intervals. This requires --genome-file to be given as well.
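The percentage is simply the number of covered bases relative to the genome size. The sketch below assumes a plain tab-separated table of contig names and sizes as a stand-in for the genome file; the format the script actually expects may differ, and both helper names are hypothetical.

```python
def read_contig_sizes(lines):
    """Parse 'contig<TAB>size' lines into a dict (assumed format)."""
    sizes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        contig, size = line.split("\t")[:2]
        sizes[contig] = int(size)
    return sizes


def percent_covered(nbases, contig_sizes):
    """Return the percentage of the genome covered by nbases."""
    genome_size = sum(contig_sizes.values())
    if genome_size == 0:
        raise ValueError("genome size is zero")
    return 100.0 * nbases / genome_size


sizes = read_contig_sizes(["chr1\t1000000", "chr2\t390000"])
print(percent_covered(27800, sizes))
```

Whether the script reports the percentage against the whole genome or per aggregation group is not stated here; this sketch uses the whole genome.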

Usage

To count the number of intervals, type:

cgat bed2stats < in.bed

track  ncontigs  nintervals  nbases
all    23        556         27800

To count per contig:

cgat bed2stats --aggregate=contig < in.bed

track  ncontigs  nintervals  nbases
chrX   1         11          550
chr13  1         12          600
chr12  1         37          1850

Type:

cgat bed2stats --help

for command line help.

Command line options

usage: bed2stats [-h] [-g GENOME_FILE] [-a {name,contig,track,none}] [-p]
                 [--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
                 [--timeit-header] [--random-seed RANDOM_SEED] [-v LOGLEVEL]
                 [--log-config-filename LOG_CONFIG_FILENAME]
                 [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
                 [-E STDERR] [-S STDOUT]