split_gff - split a gff file into chunks

Tags

Genomics Intervals Genesets GFF Manipulation

Purpose

Split a gff file into chunks. Overlapping entries are always written to the same chunk. Input is read from stdin unless otherwise specified, and it must be sorted by contig and start position.
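Because the splitting relies on sorted input, it can help to check the order before running the script. The following is a minimal Python sketch and is not part of split_gff itself. The function name is_position_sorted is hypothetical. The sketch assumes that "contig sorted" means all lines for a contig are adjacent, and that fields are whitespace-separated with the start position in the fourth column.

```python
def is_position_sorted(lines):
    """Return True if gff lines are grouped by contig and sorted by start.

    Assumption: a contig may appear only in one contiguous run of lines.
    """
    seen_contigs = set()
    last_contig, last_start = None, 0
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        contig, start = fields[0], int(fields[3])
        if contig != last_contig:
            # A contig that reappears after another one breaks the grouping.
            if contig in seen_contigs:
                return False
            seen_contigs.add(contig)
            last_contig, last_start = contig, start
        elif start < last_start:
            # Start positions must not decrease within a contig.
            return False
        else:
            last_start = start
    return True
```

A check like this could be run over the lines of the input file before passing it to the script.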

Options

-i, --min-chunk-size

This option specifies how big each chunk should be, in terms of the number of gff lines to be included. Because overlapping lines are always output to the same file, this should be considered a minimum size.

-n, --dry-run

This option tells the script not to actually write any files; instead, it outputs a list of the files that would be written.

Example

cgat split_gff -i 1 < in.gff

where in.gff looks like:

chr1 . exon 1 10 . + .
chr1 . exon 8 100 . + .
chr1 . exon 102 150 . + .

will produce two files that look like:

000001.chunk:
chr1 . exon 1 10 . + .
chr1 . exon 8 100 . + .

000002.chunk:
chr1 . exon 102 150 . + .
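The grouping behaviour in this example can be sketched in Python. This is an illustration under stated assumptions, not the script's actual implementation. The function name split_into_chunks and the (contig, start, end) tuple representation are hypothetical, and coordinates are treated as inclusive, so a record overlaps the current chunk when its start is less than or equal to the largest end seen so far on the same contig.

```python
def split_into_chunks(records, min_chunk_size):
    """Group sorted (contig, start, end) records into chunks.

    A chunk is closed only once it holds at least min_chunk_size records
    and the next record does not overlap anything already in it.
    """
    chunks = []
    current = []
    last_contig, last_end = None, 0
    for contig, start, end in records:
        no_overlap = contig != last_contig or start > last_end
        if len(current) >= min_chunk_size and no_overlap:
            chunks.append(current)
            current = []
        current.append((contig, start, end))
        if contig != last_contig:
            # New contig: restart tracking of the furthest end position.
            last_contig, last_end = contig, end
        else:
            last_end = max(last_end, end)
    if current:
        chunks.append(current)
    return chunks


# The three records from in.gff above, with a minimum chunk size of 1.
records = [("chr1", 1, 10), ("chr1", 8, 100), ("chr1", 102, 150)]
for number, chunk in enumerate(split_into_chunks(records, 1), start=1):
    # File names follow the six-digit pattern shown in the example.
    print(f"{number:06d}.chunk: {chunk}")
```

For the three records above, the sketch keeps the first two together because they overlap and places the third in its own chunk, matching the two files shown. With a larger minimum size, such as 5, all three records would end up in a single chunk.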

Usage

cgat split_gff [OPTIONS]

Will read a gff file from stdin and split into multiple gff files.

cgat split_gff -I GFF [OPTIONS]

Will read the gff file GFF and split into multiple gff files.

Command line options

usage: split-gff [-h] [-i MIN_CHUNK_SIZE] [-n]
                 [--output-filename-name OUTPUT_FILENAME_NAME]
                 [--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
                 [--timeit-header] [--random-seed RANDOM_SEED] [-v LOGLEVEL]
                 [--log-config-filename LOG_CONFIG_FILENAME]
                 [--tracing {function}] [-? ?] [-P OUTPUT_FILENAME_PATTERN]
                 [-F] [-I STDIN] [-L STDLOG] [-E STDERR] [-S STDOUT]