gff2stats.py - count features, etc. in gff file
- Tags
Genomics Intervals GFF GTF Summary
Purpose
This script generates summary statistics over features, source, gene_id and transcript_id in one or more gff or gtf formatted files.
Usage
Input is either a gff or gtf file; gtf input must be specified with the --is-gtf option.
Example:
python gff2stats.py --is-gtf example.gtf > example_sum.tsv
cat example.gtf
19 processed_transcript exon 66346 66509 . - . gene_id "ENSG00000225373"; transcript_id "ENST00000592209" ...
19 processed_transcript exon 60521 60747 . - . gene_id "ENSG00000225373"; transcript_id "ENST00000592209" ...
19 processed_transcript exon 60105 60162 . - . gene_id "ENSG00000225373"; transcript_id "ENST00000592209" ...
19 processed_transcript exon 66346 66416 . - . gene_id "ENSG00000225373"; transcript_id "ENST00000589741" ...
cat example_sum.tsv
track contigs strands features sources genes transcripts ...
stdin 1 2 4 23 2924 12752 ...
The counters used depend on the file type. For a gff file, the implemented counters are:
number of intervals per contig, strand, feature and source
For a gtf file, the additional implemented counters are:
number of genes, transcripts, single exon transcripts
summary statistics for exon numbers, exon sizes, intron sizes and transcript sizes
The output is a tab-separated table.
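The gff counters described above can be illustrated with a minimal Python sketch. It is not the script's actual implementation: the function names `count_intervals` and `summary_row` are hypothetical, the records are made up for illustration, and it assumes that the summary columns (contigs, strands, features, sources) report the number of distinct values seen, as the example output suggests.

```python
from collections import Counter

# Column positions in a tab-separated, nine-column GFF/GTF record.
CONTIG, SOURCE, FEATURE, STRAND = 0, 1, 2, 6
CATEGORIES = ("contig", "strand", "feature", "source")


def count_intervals(lines):
    """Tally the number of intervals per contig, strand, feature and source."""
    counts = {name: Counter() for name in CATEGORIES}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete nine-column record
        counts["contig"][fields[CONTIG]] += 1
        counts["strand"][fields[STRAND]] += 1
        counts["feature"][fields[FEATURE]] += 1
        counts["source"][fields[SOURCE]] += 1
    return counts


def summary_row(track, counts):
    """Format the number of distinct values per category as one TSV row.

    Assumption: the summary reports distinct values, not interval totals.
    """
    return "\t".join([track] + [str(len(counts[c])) for c in CATEGORIES])


# Made-up records, for illustration only.
records = [
    'chr1\tsrcA\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrcA\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrcB\tCDS\t120\t180\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
    "# a comment line",
]

counts = count_intervals(records)
print("\t".join(("track",) + tuple(c + "s" for c in CATEGORIES)))
print(summary_row("stdin", counts))
```

Keeping the per-value tallies in `Counter` objects means the same pass can report either the number of intervals per value or the number of distinct values.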
Options
The default action of gff2stats is to count over contigs, strands, features and sources. This assumes the input file is a gff file.
There is a single option for this script:
``--is-gtf``
The input file is in gtf format. The output will therefore contain summaries over exon numbers, exon sizes, intron sizes and transcript sizes in addition to the number of genes, transcripts and single exon transcripts.
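As a rough illustration of the gtf-specific summaries, the sketch below groups exons by `transcript_id` and derives exon counts, exon sizes, intron sizes and transcript sizes. It is a sketch under stated assumptions, not the script's own code: the helper names are hypothetical, the records are made up, transcript size is assumed to be the sum of exon lengths, and introns are assumed to be the gaps between consecutive non-overlapping exons.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(lines):
    """Collect (start, end) exon intervals per transcript_id."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            # GTF coordinates are 1-based and inclusive.
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    return exons


def transcript_summary(intervals):
    """Return exon count, exon sizes, intron sizes and transcript size."""
    intervals = sorted(intervals)
    exon_sizes = [end - start + 1 for start, end in intervals]
    # Assumption: an intron is the gap between two consecutive exons.
    intron_sizes = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:])
    ]
    return {
        "exons": len(intervals),
        "exon_sizes": exon_sizes,
        "intron_sizes": intron_sizes,
        # Assumption: transcript size is the summed exon length.
        "transcript_size": sum(exon_sizes),
    }


# Made-up records, for illustration only.
records = [
    'chr1\tsrcA\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrcA\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrcA\texon\t500\t600\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
]

summaries = {
    transcript: transcript_summary(intervals)
    for transcript, intervals in exons_by_transcript(records).items()
}
single_exon = sum(1 for s in summaries.values() if s["exons"] == 1)

for transcript, summary in sorted(summaries.items()):
    print(transcript, summary)
print("single exon transcripts:", single_exon)
```

Sorting the exons before computing gaps keeps the intron sizes independent of the order in which records appear in the file.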
Type:
python gff2stats.py --help
for command line help.
Command line options
usage: gff2stats [-h] [--version] [--is-gtf] [--timeit TIMEIT_FILE]
[--timeit-name TIMEIT_NAME] [--timeit-header]
[--random-seed RANDOM_SEED] [-v LOGLEVEL]
[--log-config-filename LOG_CONFIG_FILENAME]
[--tracing {function}] [-? ?] [-P OUTPUT_FILENAME_PATTERN]
[-F] [-I STDIN] [-L STDLOG] [-E STDERR] [-S STDOUT]