FastaIterator.py - Iteration over fasta files

This module provides a simple iterator over fasta-formatted files. Unlike the Biopython iterator, the iterators in this module skip comment lines starting with “#”.

Note

Another way to access the information in fasta formatted files is through pysam.

Reference

class FastaIterator.FastaRecord(title, sequence, fold=False)

Bases: object

A single fasta record.

title

the title of the sequence

Type: string

sequence

the sequence

Type: string

fold

the number of bases per line when writing out

Type: int
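
The effect of fold can be illustrated with a minimal sketch. The helper format_record below is hypothetical and is not part of this module; it assumes that a false value for fold means the sequence is written on a single line, which the reference above does not state explicitly.

```python
def format_record(title, sequence, fold=False):
    """Hypothetical helper: render one record as fasta text.

    If fold is a positive integer, the sequence is split into lines
    of at most fold bases. Otherwise it is written on one line
    (an assumption, not confirmed by the module documentation).
    """
    if fold:
        lines = [sequence[i:i + fold] for i in range(0, len(sequence), fold)]
    else:
        lines = [sequence]
    return ">%s\n%s" % (title, "\n".join(lines))


folded = format_record("seq1", "ACGTACGTAC", fold=4)
unfolded = format_record("seq1", "ACGTACGTAC")
print(folded)
```

With fold=4, the ten bases are written as three lines of four, four, and two bases.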

class FastaIterator.FastaIterator(f, *args, **kwargs)

Bases: object

An iterator over fasta-formatted files.

Yields: FastaRecord

FastaIterator.iterate(infile, comment='#', fold=False)

iterate over fasta data in infile

Lines before the first fasta record (the first line starting with >) are ignored, as are lines starting with the comment character.

Parameters

infile (File) – the input file
comment (char) – comment character
fold (int) – the number of bases per line when writing out

Yields

FastaRecord
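
The parsing rules described above can be sketched as follows. This is a stand-in written for illustration, not the module's implementation: the names Record and iterate_sketch are hypothetical, the fold parameter is left out, and the sketch assumes that the leading > is stripped from the title and that sequence lines are concatenated without whitespace.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for FastaRecord, holding only title and sequence.
Record = namedtuple("Record", ["title", "sequence"])


def iterate_sketch(infile, comment="#"):
    """Yield records from a fasta-formatted file object.

    Comment lines are skipped anywhere in the file, and any line
    before the first '>' header is ignored.
    """
    title, chunks = None, []
    for line in infile:
        if line.startswith(comment):
            continue  # skip comment lines
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield Record(title, "".join(chunks))
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)  # sequence line of the current record
    if title is not None:
        yield Record(title, "".join(chunks))


demo = io.StringIO(
    "# a comment\nstray line\n>seq1 first\nACGT\n# another comment\nTTAA\n>seq2\nGGCC\n"
)
records = list(iterate_sketch(demo))
for record in records:
    print(record.title, record.sequence)
```

The stray line before the first header and both comment lines are dropped, so the demo input produces two records.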

FastaIterator.iterate_together(*args)

iterate synchronously over one or more fasta files.

The iteration finishes once any of the files is exhausted.

Parameters: *args – the fasta-formatted files to iterate over

Yields: tuple – a tuple of FastaRecord corresponding to the current record in each file.
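
Stopping as soon as any file is exhausted matches the behaviour of Python's built-in zip, so the synchronous iteration can be sketched with it. Whether the module itself uses zip is an assumption; read_records and iterate_together_sketch are hypothetical names, and the records here are plain (title, sequence) tuples rather than FastaRecord objects.

```python
import io


def read_records(infile):
    """Hypothetical minimal parser yielding (title, sequence) tuples."""
    title, chunks = None, []
    for line in infile:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def iterate_together_sketch(*infiles):
    """Yield one tuple of records per step, one record from each file."""
    # zip stops when the shortest of its inputs is exhausted.
    return zip(*(read_records(f) for f in infiles))


first = io.StringIO(">a1\nACGT\n>a2\nTTTT\n>a3\nCCCC\n")
second = io.StringIO(">b1\nGGGG\n>b2\nAAAA\n")
pairs = list(iterate_together_sketch(first, second))
for left, right in pairs:
    print(left[0], right[0])
```

The first input holds three records and the second holds two, so only two tuples are produced and record a3 is never paired.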

FastaIterator.count(filename)

Count the number of sequences in a fasta file.

This function uses the grep utility to count lines starting with >.

Parameters: filename (string) – The filename
Raises: OSError – If the file does not exist
Returns: The number of sequences in the file.
Return type: int
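
For environments without grep, the same count can be obtained in pure Python. The sketch below is an alternative written for illustration, not the module's implementation; count_sketch is a hypothetical name. Opening a missing file raises FileNotFoundError, which is a subclass of OSError, so the documented error behaviour is preserved.

```python
import os
import tempfile


def count_sketch(filename):
    """Count header lines (lines starting with '>') in a fasta file."""
    with open(filename) as infile:
        return sum(1 for line in infile if line.startswith(">"))


# Write a small temporary fasta file to demonstrate the count.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as handle:
    handle.write(">seq1\nACGT\n>seq2\nGGCC\n# comment\n>seq3\nTT\n")
    path = handle.name

n_sequences = count_sketch(path)
os.remove(path)
print(n_sequences)
```

Only header lines are counted, so comment lines and multi-line sequences do not affect the result.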