fastq2summary.py - compute summary stats for a fastq file

Tags

Genomics NGS Sequences FASTQ Annotation

Purpose

This script iterates over a fastq file and outputs summary statistics for the complete file.

The output is a tab-delimited text file with some of the following columns, depending on the options specified:

Column            Content
reads             total reads in file
bases             total bases in file
mean_length       mean read length
median_length     median read length
mean_quality      mean read quality
median_quality    median read quality
nfailed           number of bases below quality threshold
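The columns listed above can be illustrated with a minimal sketch. It is not the script's actual implementation. The function name `summarize_fastq`, the `offset` and `threshold` parameters, and their default values are hypothetical. The sketch also assumes four-line FASTQ records and that "read quality" means the mean Phred score of each read.

```python
import io
import statistics


def summarize_fastq(handle, offset=33, threshold=10):
    """Summarize a FASTQ stream with four lines per record.

    offset and threshold are illustrative assumptions, not the
    defaults of fastq2summary.py.
    """
    lengths = []         # one entry per read
    read_qualities = []  # mean Phred score per read
    nfailed = 0          # bases with a score below the threshold

    while True:
        header = handle.readline()
        if not header:
            break                    # end of file
        if not header.strip():
            continue                 # skip blank lines between records
        sequence = handle.readline().rstrip("\n")
        handle.readline()            # '+' separator line
        quality = handle.readline().rstrip("\n")

        scores = [ord(char) - offset for char in quality]
        lengths.append(len(sequence))
        if scores:
            read_qualities.append(sum(scores) / len(scores))
        nfailed += sum(1 for score in scores if score < threshold)

    def safe(func, values):
        # Avoid StatisticsError on empty input.
        return func(values) if values else 0

    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "mean_length": safe(statistics.mean, lengths),
        "median_length": safe(statistics.median, lengths),
        "mean_quality": safe(statistics.mean, read_qualities),
        "median_quality": safe(statistics.median, read_qualities),
        "nfailed": nfailed,
    }


if __name__ == "__main__":
    example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n!!\n")
    stats = summarize_fastq(example)
    # Print a header row and a value row, tab-delimited.
    print("\t".join(stats))
    print("\t".join(str(value) for value in stats.values()))
```

Holding all lengths and per-read qualities in memory keeps the median calculation simple. For very large files a streaming histogram would be the more economical choice.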

Usage

Example:

python fastq2summary.py --guess-format=sanger < in.fastq > out.tsv

In this example we know that the quality scores in our data are in sanger format. Because illumina-1.8 quality scores overlap almost completely with sanger scores, this option defaults to sanger qualities. In default mode the script may not be able to distinguish between such highly overlapping sets of quality scores.
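The following sketch shows why overlapping encodings cannot always be told apart. It is not the script's own detection logic. The `RANGES` table and the function `compatible_formats` are hypothetical, and the ASCII ranges are the commonly quoted ones for each encoding, not values taken from fastq2summary.py.

```python
# Commonly quoted ASCII code ranges per encoding (an assumption,
# not the ranges used internally by fastq2summary.py).
RANGES = {
    "sanger": (33, 73),
    "illumina-1.8": (33, 74),
    "solexa": (59, 104),
    "phred64": (64, 104),
}


def compatible_formats(quality_strings):
    """Return every format whose range covers all observed characters."""
    codes = [ord(char) for quality in quality_strings for char in quality]
    if not codes:
        return list(RANGES)  # no evidence, so nothing can be ruled out
    lowest, highest = min(codes), max(codes)
    return [
        name
        for name, (range_min, range_max) in RANGES.items()
        if range_min <= lowest and highest <= range_max
    ]


if __name__ == "__main__":
    # Characters '!' (33) to 'I' (73) fit both sanger and illumina-1.8.
    print(compatible_formats(["IIII", "!!"]))
    # Character 'h' (104) only fits the offset-64 encodings.
    print(compatible_formats(["hhhh"]))
```

When more than one format remains compatible, the ambiguity has to be resolved by a fixed preference or by naming the format explicitly, as in the example command above.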

Type:

python fastq2summary.py --help

for command line help.

Command line options

usage: fastq2summary [-h]
                     [--guess-format {sanger,solexa,phred64,illumina-1.8,integer}]
                     [-f {sanger,solexa,phred64,illumina-1.8,integer}]
                     [--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
                     [--timeit-header] [--random-seed RANDOM_SEED]
                     [-v LOGLEVEL] [--log-config-filename LOG_CONFIG_FILENAME]
                     [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
                     [-E STDERR] [-S STDOUT]