fastq2summary.py - compute summary stats for a fastq file

Tags: Genomics NGS Sequences FASTQ Annotation

Purpose

This script iterates over a fastq file and outputs summary statistics for the complete file.

The output is a tab-delimited text file with some of the following columns, depending on the options specified:

Column	Content
reads	total reads in file
bases	total bases in file
mean_length	mean read length
median_length	median read length
mean_quality	mean read quality
median_quality	median read quality
nfailed	number of bases below quality threshold
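To make the column definitions concrete, here is a minimal sketch of how most of these statistics could be computed. It is not the script's actual implementation: the function name `summarize_fastq` is hypothetical, records are assumed to span exactly four lines, qualities are assumed to be sanger-encoded (ASCII offset 33), and the quality columns are assumed to summarise per-read mean qualities. The `nfailed` column is left out because its threshold is not specified here.

```python
import io
import statistics


def summarize_fastq(handle, offset=33):
    """Summarise 4-line FASTQ records (hypothetical helper, sanger offset assumed)."""
    lengths = []
    read_qualities = []
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of file or trailing blank line
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        lengths.append(len(seq))
        if qual:
            # Assumption: a read's quality is the mean of its base qualities.
            read_qualities.append(sum(ord(c) - offset for c in qual) / len(qual))
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "mean_length": statistics.mean(lengths) if lengths else 0,
        "median_length": statistics.median(lengths) if lengths else 0,
        "mean_quality": statistics.mean(read_qualities) if read_qualities else 0,
        "median_quality": statistics.median(read_qualities) if read_qualities else 0,
    }


# Two toy reads: 'I' encodes quality 40 and '5' encodes quality 20 at offset 33.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n5I\n")
stats = summarize_fastq(example)
print("\t".join(stats))
print("\t".join(str(value) for value in stats.values()))
```

The two `print` calls emit a header row and a value row, mirroring the tab-delimited layout described above.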

Usage

Example:

python fastq2summary.py --guess-format=sanger < in.fastq > out.tsv

In this example we know that the quality scores in our data are sanger-encoded, so we state the format explicitly. Because the illumina-1.8 and sanger quality score ranges overlap almost entirely, this option defaults to sanger qualities. In default mode the script may not be able to distinguish between such highly overlapping encodings.

Type:

python fastq2summary.py --help

for command line help.

Command line options

usage: fastq2summary [-h]
                     [--guess-format {sanger,solexa,phred64,illumina-1.8,integer}]
                     [-f {sanger,solexa,phred64,illumina-1.8,integer}]
                     [--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
                     [--timeit-header] [--random-seed RANDOM_SEED]
                     [-v LOGLEVEL] [--log-config-filename LOG_CONFIG_FILENAME]
                     [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
                     [-E STDERR] [-S STDOUT]