FastaIterator.py - Iteration over fasta files

This module provides a simple iterator over fasta-formatted files. Unlike the Biopython iterator, the iterators in this module skip comment lines starting with “#”.

Note

Another way to access the information in fasta formatted files is through pysam.

Reference

class FastaIterator.FastaRecord(title, sequence, fold=False)

Bases: object

a fasta record.

title

the title of the sequence

Type

string

sequence

the sequence

Type

string

fold

the number of bases per line when writing out

Type

int
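The effect of fold when a record is written out can be illustrated with a minimal sketch. The helpers fold_sequence and format_record below are hypothetical and are not part of this module; the sketch assumes that a false value for fold leaves the sequence on a single line.

```python
def fold_sequence(sequence, fold):
    """Hypothetical helper: split sequence into lines of at most `fold` bases."""
    if not fold:
        # Assumption: a false value for fold means "do not wrap".
        return sequence
    return "\n".join(
        sequence[i:i + fold] for i in range(0, len(sequence), fold)
    )


def format_record(title, sequence, fold=False):
    """Hypothetical helper: render a title and sequence as fasta text."""
    return ">%s\n%s" % (title, fold_sequence(sequence, fold))


# Ten bases folded at four bases per line give lines of 4, 4 and 2 bases.
print(format_record("seq1", "ACGTACGTAC", fold=4))
```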

class FastaIterator.FastaIterator(f, *args, **kwargs)

Bases: object

an iterator over fasta-formatted files.

Yields

FastaRecord

FastaIterator.iterate(infile, comment='#', fold=False)

iterate over fasta data in infile

Lines before the first fasta record (the first line starting with >) are ignored, as are lines starting with the comment character.

Parameters
  • infile (File) – the input file

  • comment (char) – comment character

  • fold (int) – the number of bases before line split when writing out

Yields

FastaRecord
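The skipping rules described above can be sketched in plain Python. This is an illustration of the documented behaviour under stated assumptions, not the module's own code: iterate_sketch and Record are hypothetical names, the title is assumed to be the header line without the leading >, and the fold parameter is left out.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for FastaRecord with the two documented attributes.
Record = namedtuple("Record", ["title", "sequence"])


def iterate_sketch(infile, comment="#"):
    """Yield Record tuples from an open fasta file (illustrative only)."""
    title, chunks = None, []
    for line in infile:
        line = line.rstrip("\n")
        if line.startswith(comment):
            continue  # comment lines are skipped wherever they occur
        if line.startswith(">"):
            if title is not None:
                yield Record(title, "".join(chunks))
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line.strip())
        # Lines seen while title is None precede the first record: ignored.
    if title is not None:
        yield Record(title, "".join(chunks))


data = io.StringIO(
    "stray line\n# a comment\n>seq1 first\nACGT\n# another comment\nTTAA\n>seq2\nGG\n"
)
records = list(iterate_sketch(data))
for record in records:
    print(record.title, record.sequence)
```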

FastaIterator.iterate_together(*args)

iterate synchronously over one or more fasta files.

The iteration finishes once any of the files is exhausted.

Parameters

args – the fasta-formatted files to be iterated upon

Yields

tuple – a tuple of FastaRecord corresponding to the current record in each file.
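The stopping rule (iteration ends once any file is exhausted) matches the behaviour of Python's built-in zip, so a rough sketch can be built on it. The names read_records and iterate_together_sketch are hypothetical, records are simplified to (title, sequence) tuples, and whether the module itself uses zip is an assumption.

```python
import io


def read_records(infile):
    """Hypothetical minimal parser: yield (title, sequence) tuples."""
    title, chunks = None, []
    for line in infile:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None and not line.startswith("#"):
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def iterate_together_sketch(*infiles):
    """Yield one tuple of records per step; stop at the shortest input."""
    # zip() ends as soon as any of its input iterators is exhausted.
    return zip(*(read_records(f) for f in infiles))


first = io.StringIO(">a1\nAC\n>a2\nGT\n>a3\nTT\n")
second = io.StringIO(">b1\nCC\n>b2\nGG\n")
pairs = list(iterate_together_sketch(first, second))
print(len(pairs))
```

Here the first input holds three records and the second holds two, so only two tuples are produced.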

FastaIterator.count(filename)

count number of sequences in fasta file.

This method uses the grep utility to count lines starting with >.

Parameters

filename (string) – The filename

Raises

OSError – If the file does not exist

Returns

The number of sequences in the file.

Return type

int
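For comparison, the same count can be obtained without grep. The sketch below is a pure-Python substitute rather than the module's method: count_sketch is a hypothetical name, and it counts lines starting with > by reading the file directly. A missing file raises FileNotFoundError, which is a subclass of OSError.

```python
import os
import tempfile


def count_sketch(filename):
    """Count header lines (lines starting with '>') in a fasta file."""
    with open(filename) as infile:
        return sum(1 for line in infile if line.startswith(">"))


# Write a small temporary fasta file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as handle:
    handle.write(">seq1\nACGT\n>seq2\nGG\n")
    path = handle.name

n_sequences = count_sketch(path)
os.remove(path)
print(n_sequences)
```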