fastq2table.py - compute stats on reads in fastq files
Tags: Genomics NGS Sequences FASTQ Annotation
Purpose
This script iterates over a fastq file and outputs summary statistics for each read.
The output is a tab-delimited text file with the following columns:
Column | Content
read | read identifier present in input fastq file
nfailed | number of bases in the read that fall below Q10
nN | number of ambiguous base calls (N)
nval | number of bases in the read
min | minimum base quality score for the read
max | maximum base quality score for the read
mean | mean base quality score for the read
median | median base quality score for the read
stddev | standard deviation of quality scores for the read
sum | sum of quality scores for the read
q1 | 25th percentile of quality scores for the read
q3 | 75th percentile of quality scores for the read
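Because the output is plain tab-delimited text, it can be post-processed with standard tools. The following is a minimal sketch, not part of cgat, that reads such a table with Python's `csv` module and reports reads whose mean quality falls below a cutoff. The function name `reads_below_mean` and the cutoff value are hypothetical; the two data rows are copied from the example output shown under Usage. Skipping lines that start with `#` is a defensive assumption in case the file carries comment lines.

```python
import csv
import io


def reads_below_mean(handle, min_mean):
    """Yield (read, mean) for rows whose mean quality is below min_mean."""
    # Assumption: ignore lines starting with '#', in case comment lines are present.
    rows = (line for line in handle if not line.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        mean = float(row["mean"])
        if mean < min_mean:
            yield row["read"], mean


# Stand-in for out.tsv, using the column names documented above.
example = io.StringIO(
    "read\tnfailed\tnN\tnval\tmin\tmax\tmean\tmedian\tstddev\tsum\tq1\tq3\n"
    "DHKW5DQ1:308:D28FGACXX:5:2211:8051:4398\t0\t0\t50\t16.0000\t41.0000"
    "\t37.2000\t38.0000\t4.4900\t1860.0000\t36.0000\t40.0000\n"
    "DHKW5DQ1:308:D28FGACXX:5:1315:15039:83265\t0\t0\t50\t24.0000\t41.0000"
    "\t37.0200\t38.0000\t3.5916\t1851.0000\t36.0000\t40.0000\n"
)

# The cutoff 37.1 is arbitrary and chosen only so that one row is selected.
hits = list(reads_below_mean(example, min_mean=37.1))
print(hits)
```

For a real file, `open("out.tsv")` would take the place of the `io.StringIO` stand-in.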
Usage
Example:
cgat fastq2table --guess-format=sanger < in.fastq > out.tsv
In this example we know that the quality scores in our data are encoded in sanger format, and we say so with --guess-format=sanger. Because illumina-1.8 quality scores overlap heavily with sanger ones, this option defaults to sanger qualities. Without it, in default mode, the script may be unable to distinguish between such highly overlapping quality score encodings.
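To illustrate why overlapping encodings are hard to tell apart, here is a minimal sketch of offset detection. It is not the logic used by fastq2table, and the helper `plausible_offsets` is hypothetical. It relies only on the usual conventions: sanger and illumina-1.8 encode qualities with ASCII offset 33, while solexa and phred64 use offset 64, and solexa scores can go as low as -5 (ASCII 59).

```python
def plausible_offsets(quality):
    """Return the ASCII offsets (33 and/or 64) that could have produced quality."""
    lowest = min(ord(c) for c in quality)
    offsets = []
    if lowest >= 33:
        # Compatible with sanger and illumina-1.8 (offset 33).
        offsets.append(33)
    if lowest >= 59:
        # Compatible with solexa/phred64 (offset 64, solexa minimum score -5).
        offsets.append(64)
    return offsets


# First ten quality characters of the first example read under Usage;
# the '1' (ASCII 49) rules out the offset-64 encodings.
print(plausible_offsets("B1=?DFDDHH"))
# A string made only of high characters stays ambiguous under this simple check.
print(plausible_offsets("hhhhgggg"))
```

Since sanger and illumina-1.8 share offset 33, a check like this cannot separate them, which is consistent with the need for an explicit hint such as --guess-format.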
If we provide two reads to the script:
@DHKW5DQ1:308:D28FGACXX:5:2211:8051:4398
ACAATGTCCTGATGTGAATGCCCCTACTATTCAGATCGCTTAGGGCATGC
+
B1=?DFDDHHFFHIJJIJGGIJGFIEE9CHIIFEGGIIJGIGIGIIDGHI
@DHKW5DQ1:308:D28FGACXX:5:1315:15039:83265
GAATGCCCCTACTATTCAGATCGCTTAGGGCATGCGTCGCATGTGAGTAA
+
@@@FDFFFHGHHHJIIIJIGHIJJIGHGHC9FBFBGHIIEGHIGC>F@FA
we get the following table as output:
read | nfailed | nN | nval | min | max | mean | median | stddev | sum | q1 | q3
DHKW5DQ1:308:D28FGACXX:5:2211:8051:4398 | 0 | 0 | 50 | 16.0000 | 41.0000 | 37.2000 | 38.0000 | 4.4900 | 1860.0000 | 36.0000 | 40.0000
DHKW5DQ1:308:D28FGACXX:5:1315:15039:83265 | 0 | 0 | 50 | 24.0000 | 41.0000 | 37.0200 | 38.0000 | 3.5916 | 1851.0000 | 36.0000 | 40.0000
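The values in the table above can be reproduced by hand. The sketch below is not the script's implementation; it is a standalone approximation built on several assumptions: qualities are decoded with the sanger offset of 33, nfailed is read as the number of bases with quality below 10, stddev is the population standard deviation, and the quartiles are taken as the sorted score at index `int(n * 0.25)` and `int(n * 0.75)`. The function name `read_stats` is hypothetical.

```python
import statistics


def read_stats(identifier, sequence, quality, offset=33, threshold=10):
    """Compute per-read summary statistics from one FASTQ record."""
    # Decode each quality character to a Phred score (assumed offset: 33).
    scores = sorted(ord(c) - offset for c in quality)
    n = len(scores)
    return {
        "read": identifier,
        "nfailed": sum(1 for s in scores if s < threshold),
        "nN": sequence.upper().count("N"),
        "nval": n,
        "min": scores[0],
        "max": scores[-1],
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # Assumption: population (not sample) standard deviation.
        "stddev": statistics.pstdev(scores),
        "sum": sum(scores),
        # Assumption: simple index-based quartiles, no interpolation.
        "q1": scores[int(n * 0.25)],
        "q3": scores[int(n * 0.75)],
    }


stats = read_stats(
    "DHKW5DQ1:308:D28FGACXX:5:2211:8051:4398",
    "ACAATGTCCTGATGTGAATGCCCCTACTATTCAGATCGCTTAGGGCATGC",
    "B1=?DFDDHHFFHIJJIJGGIJGFIEE9CHIIFEGGIIJGIGIGIIDGHI",
)
# Print one tab-delimited row with floats formatted to four decimals.
print("\t".join(
    f"{v:.4f}" if isinstance(v, float) else str(v) for v in stats.values()
))
```

Under these assumptions the computed values agree with both rows of the table, for example min 16, max 41, mean 37.2 and sum 1860 for the first read. Integer-valued columns print without decimals here, unlike the script's output.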
Type:
cgat fastq2table --help
for command line help.
Command line options
usage: fastq2table [-h] [--version]
[--guess-format {sanger,solexa,phred64,illumina-1.8,integer}]
[--target-format {sanger,solexa,phred64,illumina-1.8,integer}]
[--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
[--timeit-header] [--random-seed RANDOM_SEED] [-v LOGLEVEL]
[--log-config-filename LOG_CONFIG_FILENAME]
[--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
[-E STDERR] [-S STDOUT]