Fastq.py - methods for dealing with fastq files
This module provides an iterator over fastq-formatted files (iterate()). Additional iterators guess the quality score format (iterate_guess()) or convert the quality scores to another format (iterate_convert()) while iterating through a file. guessFormat() inspects a fastq file to guess its quality score format, and getOffset() returns the numeric offset used for quality score conversion for a particular quality score format.
Reference
- class Fastq.Record(identifier, seq, quals, format=None)
  Bases: object
  A record representing a fastq formatted record.
  - identifier
    Sequence identifier.
    - Type
      string
  - seq
    Sequence.
    - Type
      string
  - quals
    String representation of quality scores.
    - Type
      string
  - format
    Quality score format. Can be one of sanger, illumina-1.8, solexa or phred64.
    - Type
      string
  - guessFormat()
    Return the quality score format; may return several formats if the result is ambiguous.
  - guessDataType()
    Return the datatype, determined by inspecting the sequence for base calls or colorspace integers.
  - trim(trim3, trim5=0)
    Remove nucleotides and quality scores from the 3’ and 5’ ends.
  - trim5(trim5=0)
    Remove nucleotides and quality scores from the 5’ end.
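To illustrate what trimming does to a record, here is a minimal standalone sketch. It does not call the module; the helper name `trim_read` is hypothetical, and it assumes that `trim5` characters are dropped from the start and `trim3` characters from the end of both the sequence and the quality string.

```python
def trim_read(seq, quals, trim3, trim5=0):
    """Return (seq, quals) with trim5 characters removed from the 5' end
    and trim3 characters removed from the 3' end (hypothetical helper)."""
    if trim3 < 0 or trim5 < 0:
        raise ValueError("trim lengths must be non-negative")
    # Clamp the end so that over-trimming yields empty strings
    # instead of wrapping around through a negative index.
    end = max(len(seq) - trim3, trim5)
    # Slice sequence and qualities identically so they stay the same length.
    return seq[trim5:end], quals[trim5:end]


seq, quals = trim_read("ACGTACGT", "IIIIHHHH", trim3=2, trim5=1)
print(seq, quals)
```

Slicing both strings with the same bounds keeps the per-base correspondence between nucleotides and quality scores intact.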
  - toPhred()
    Return the qualities as a list of Phred scores.
  - fromPhred(quals, format)
    Set the qualities from a list of Phred scores.
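The round trip between a quality string and a list of Phred scores can be sketched as follows. This is a standalone illustration rather than the module's code: the function names `to_phred` and `from_phred` are hypothetical, and the default ASCII offset of 33 is the conventional value for Sanger-style encodings.

```python
def to_phred(quals, offset=33):
    """Decode a quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quals]


def from_phred(scores, offset=33):
    """Encode a list of integer Phred scores as a quality string."""
    return "".join(chr(score + offset) for score in scores)


scores = to_phred("II5!")
print(scores)
# Encoding the decoded scores reproduces the original string.
print(from_phred(scores))
```

With an offset of 64 the same two functions would handle a phred64-style string; only the offset changes, not the logic.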
- Fastq.iterate(infile)
  Iterate over the contents of a fastq file.
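For readers unfamiliar with the file layout, the following sketch shows one way to iterate over four-line fastq entries. It is not the implementation of iterate(); `SimpleRecord` and `iterate_fastq` are hypothetical names, and the sketch assumes each sequence and quality string occupies exactly one line.

```python
import io
from collections import namedtuple

# Hypothetical lightweight stand-in for Fastq.Record.
SimpleRecord = namedtuple("SimpleRecord", ["identifier", "seq", "quals"])


def iterate_fastq(infile):
    """Yield one SimpleRecord per four-line fastq entry."""
    while True:
        header = infile.readline()
        if not header:
            break  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' at start of record, got %r" % header)
        seq = infile.readline().rstrip("\r\n")
        separator = infile.readline()
        if not separator.startswith("+"):
            raise ValueError("expected '+' separator line")
        quals = infile.readline().rstrip("\r\n")
        if len(seq) != len(quals):
            raise ValueError("sequence and quality lengths differ")
        yield SimpleRecord(header[1:], seq, quals)


data = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n")
records = list(iterate_fastq(data))
print([r.identifier for r in records])
```

Because the function is a generator, records are produced one at a time and large files do not need to fit in memory.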
- Fastq.iterate_guess(infile, max_tries=10000, guess=None)
  Iterate over the contents of a fastq file.
  Guess the quality score format by looking at the first max_tries entries, then set the quality score format on each entry.
  - Parameters
    infile (File) – File or file-like object to iterate over.
    max_tries (int) – Number of records to examine for guessing the quality score format.
    guess (string) – Default format. This format will be chosen if the quality score format is ambiguous. The method checks that the guess is compatible with the records read so far.
  - Yields
    fastq – An object of type Record.
  - Raises
    ValueError – If the ranges of the fastq records are not compatible, are incompatible with guess, or are ambiguous.
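The way a default guess resolves an ambiguous result, and the three error cases listed under Raises, can be sketched as below. This is an illustration under stated assumptions, not the module's code: `resolve_format` is a hypothetical helper, and `candidates` stands for the set of formats found compatible with the records examined so far.

```python
def resolve_format(candidates, guess=None):
    """Pick one format from the compatible candidates, or raise ValueError."""
    if not candidates:
        # No format covers the observed quality range.
        raise ValueError("quality scores are not compatible with any format")
    if guess is not None:
        if guess not in candidates:
            raise ValueError(
                "guess %r is incompatible with %r" % (guess, sorted(candidates)))
        return guess
    if len(candidates) > 1:
        # Several formats fit and no default was supplied.
        raise ValueError(
            "ambiguous quality score format: %r" % sorted(candidates))
    return next(iter(candidates))


print(resolve_format({"sanger", "illumina-1.8"}, guess="sanger"))
print(resolve_format({"phred64"}))
```

Supplying a guess is therefore mainly useful for files whose quality range fits more than one format.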
- Fastq.iterate_convert(infile, format, max_tries=10000, guess=None)
  Iterate over the contents of a fastq file.
  The quality score format is guessed and all subsequent records are converted to format.
  - Parameters
    infile (File) – File or file-like object to iterate over.
    format (string) – Quality score format to convert all records into.
    max_tries (int) – Number of records to examine for guessing the quality score format.
    guess (string) – Default format. This format will be chosen if the quality score format is ambiguous. The method checks that the guess is compatible with the records read so far.
  - Yields
    fastq – An object of type Record.
  - Raises
    ValueError – If the ranges of the fastq records are not compatible, are incompatible with guess, or are ambiguous.
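Converting between two offset-based encodings amounts to shifting every quality character by the difference of the offsets. The sketch below illustrates this; `OFFSETS` and `convert_quals` are hypothetical names, and the offsets are the conventional values rather than values read from Fastq.py. The solexa format is left out on purpose because its scores are on a different scale and need more than a constant shift.

```python
# Assumed ASCII offsets (conventional values, not taken from Fastq.py).
OFFSETS = {"sanger": 33, "illumina-1.8": 33, "phred64": 64}


def convert_quals(quals, source, target):
    """Re-encode a quality string from the source to the target format."""
    shift = OFFSETS[target] - OFFSETS[source]
    converted = [ord(ch) + shift for ch in quals]
    # A value below the target offset means the source format was wrong.
    if min(converted, default=OFFSETS[target]) < OFFSETS[target]:
        raise ValueError("quality below the minimum of the target format")
    return "".join(chr(code) for code in converted)


print(convert_quals("hhTB", "phred64", "sanger"))
```

The range check catches the common mistake of treating an offset-33 file as offset-64, which would otherwise produce unprintable characters.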
- Fastq.guessFormat(infile, max_lines=10000, raises=True)
  Guess the quality score format of a fastq file.
  - Parameters
  - Returns
    formats – list of quality score formats compatible with the file
  - Return type
    list
  - Raises
    ValueError – If the ranges of the fastq records are not compatible.
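One common way to find the compatible formats is to compare the lowest and highest quality characters seen against the range each format allows. The sketch below shows that idea only; `RANGES` and `compatible_formats` are hypothetical names, and the ranges are the commonly quoted ASCII limits for each encoding, which may differ from the limits Fastq.py uses.

```python
# Assumed (lowest, highest) ASCII codes per format; conventional values,
# not copied from Fastq.py.
RANGES = {
    "sanger": (33, 73),
    "illumina-1.8": (33, 74),
    "solexa": (59, 104),
    "phred64": (64, 104),
}


def compatible_formats(quality_strings):
    """Return the sorted names of all formats whose range covers the data."""
    codes = [ord(ch) for quals in quality_strings for ch in quals]
    if not codes:
        return sorted(RANGES)  # no evidence, so nothing is excluded
    lowest, highest = min(codes), max(codes)
    return sorted(
        name for name, (fmin, fmax) in RANGES.items()
        if fmin <= lowest and highest <= fmax
    )


print(compatible_formats(["IIII", "!!5I"]))
print(compatible_formats(["hhhh", "BBTT"]))
```

Both example inputs remain ambiguous between two formats, which is why the function returns a list rather than a single name.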
- Fastq.guessDataType(infile, max_lines=10000, raises=True)
  Guess the datatype of a fastq file from [colourspace, basecalls].
  - Parameters
  - Returns
    formats – list of datatypes compatible with the file (should only ever be one!)
  - Return type
    list
  - Raises
    ValueError – If the ranges of the fastq records are not compatible.
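As a rough illustration of telling the two datatypes apart, the sketch below classifies a single read. It is not the module's logic: `guess_data_type` is a hypothetical helper, and it assumes colourspace reads consist of one leading base followed by the digits 0 to 3 (with `.` for a missing call), while base-call reads contain only A, C, G, T and N.

```python
def guess_data_type(seq):
    """Return 'colourspace' or 'basecalls' for one read (hypothetical helper)."""
    body = seq[1:]  # skip the leading primer base of a colourspace read
    if body and all(ch in "0123." for ch in body):
        return "colourspace"
    if seq and all(ch in "ACGTN" for ch in seq.upper()):
        return "basecalls"
    raise ValueError("cannot determine datatype of %r" % seq)


print(guess_data_type("T0120.3"))
print(guess_data_type("ACGTN"))
```

Applying such a check to many reads and collecting the distinct answers would give a list that normally holds a single datatype.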
- Fastq.getOffset(format, raises=True)
  Return the ASCII offset for a given format.
  If raises is set, a ValueError is raised when there is not a single offset; otherwise, the minimum offset is returned.
  - Returns
    offset – The quality score offset
  - Return type