fastq2summary.py - compute summary stats for a fastq file
Tags: Genomics NGS Sequences FASTQ Annotation
Purpose
This script iterates over a fastq file and outputs summary statistics for the complete file.
The output is a tab-delimited text file with some of the following columns, depending on the options specified:
Column         | Content
reads          | total reads in file
bases          | total bases in file
mean_length    | mean read length
median_length  | median read length
mean_quality   | mean read quality
median_quality | median read quality
nfailed        | number of bases below quality threshold
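As an illustration of how the columns above could be derived, here is a minimal sketch; it is not the script's actual implementation. It assumes four-line FASTQ records, a Sanger offset of 33, a hypothetical quality threshold of 30, and that mean_quality is the mean of the per-read mean qualities. The function name `fastq_summary` is invented for this example.

```python
import io
import statistics


def fastq_summary(handle, offset=33, threshold=30):
    """Summarise a FASTQ stream that uses four lines per record.

    offset and threshold are assumptions made for this sketch.
    """
    lengths = []      # read lengths
    qualities = []    # per-read mean quality
    nfailed = 0       # bases with quality below threshold
    while True:
        header = handle.readline()
        if not header:
            break                      # end of file
        if not header.strip():
            continue                   # skip blank lines
        sequence = handle.readline().rstrip("\n")
        handle.readline()              # '+' separator line
        quality = handle.readline().rstrip("\n")
        scores = [ord(c) - offset for c in quality]
        lengths.append(len(sequence))
        if scores:
            qualities.append(sum(scores) / len(scores))
        nfailed += sum(1 for s in scores if s < threshold)
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "mean_length": statistics.mean(lengths) if lengths else 0,
        "median_length": statistics.median(lengths) if lengths else 0,
        "mean_quality": statistics.mean(qualities) if qualities else 0,
        "median_quality": statistics.median(qualities) if qualities else 0,
        "nfailed": nfailed,
    }


# Two toy records: 'I' decodes to quality 40, '!' to quality 0.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n!!\n")
summary = fastq_summary(example)
print("\t".join(summary))
print("\t".join(str(v) for v in summary.values()))
```

The sketch prints a header row followed by one row of values, mirroring the tab-delimited layout described above.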
Usage
Example:
python fastq2summary.py --guess-format=sanger < in.fastq > out.tsv
In this example we know that our quality scores are encoded in sanger format. Because illumina-1.8 quality scores overlap almost entirely with sanger, this option defaults to sanger qualities. In default mode the script may not be able to distinguish between formats whose quality score ranges overlap heavily.
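To see why overlapping formats are hard to tell apart, consider this minimal sketch of range-based format guessing. It is an illustration only, not the script's detection logic. The ASCII ranges are the commonly cited conventions for each encoding and are an assumption here, the `integer` format is left out, and the names `RANGES` and `candidate_formats` are hypothetical.

```python
# Commonly cited ASCII code ranges per encoding (assumed for this sketch,
# not taken from fastq2summary.py).
RANGES = {
    "sanger": (33, 73),
    "illumina-1.8": (33, 74),
    "solexa": (59, 104),
    "phred64": (64, 104),
}


def candidate_formats(quality_strings):
    """Return every format whose range covers all observed characters."""
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return list(RANGES)            # no evidence: nothing can be excluded
    low, high = min(codes), max(codes)
    return [
        name
        for name, (start, end) in RANGES.items()
        if start <= low and high <= end
    ]


# Characters between '!' (33) and 'I' (73) fit both offset-33 formats,
# so the data alone cannot separate sanger from illumina-1.8.
print(candidate_formats(["IIII", "!!"]))
```

When more than one candidate remains, a tool has to fall back on a default or on a format supplied by the user.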
Type:
python fastq2summary.py --help
for command line help.
Command line options
usage: fastq2summary [-h]
                     [--guess-format {sanger,solexa,phred64,illumina-1.8,integer}]
                     [-f {sanger,solexa,phred64,illumina-1.8,integer}]
                     [--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
                     [--timeit-header] [--random-seed RANDOM_SEED]
                     [-v LOGLEVEL] [--log-config-filename LOG_CONFIG_FILENAME]
                     [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
                     [-E STDERR] [-S STDOUT]
fastq2summary: error: argument -?: expected one argument