fasta2fasta.py - operate on sequences

Tags

Sequences

Purpose

Perform operations (such as masking and renaming) on a stream of fasta formatted sequences.

Available edit operations are:

translate

translate sequences using the standard genetic code.

translate-to-stop

translate until first stop codon

truncate-at-stop

truncate sequence at first stop codon

back-translate

map peptide sequences back onto the corresponding nucleotide sequences (the reverse of translate). Requires a parameter with the filename of a second fasta file containing the peptide sequences.

mark-codons

add a space after each codon

apply-map

rename sequence identifiers using a given map. Requires a parameter with the filename of the map. The map is a tab-separated file mapping old to new names.

build-map

rename sequence identifiers numerically and save the mapping in a tab-separated file. Requires a parameter with the filename of the map. The map maps new to old names and will be newly created; any existing file of the same name will be overwritten.

pseudo-codons

translate, but keep the output in register with the codons

interleaved-codons

mix amino acids and codons

filter

remove sequences according to certain criteria. For example: --method=filter --filter-method=min-length=5 --filter-method=max-length=10

map-codons

remove-gaps

remove all gaps in the sequence

mask-stops

mask all stop codons

mask-seg

mask sequence by running seg

mask-bias

mask sequence by running bias

mask-codons

mask a codon sequence given a masked amino acid sequence. Requires a parameter with the filename of the masked amino acid sequences in fasta format.

mask-incomplete-codons

mask codons that are partially masked or gapped

mask-soft

combine hard-masked (NNN) sequences with unmasked sequences to generate soft-masked sequences (masked regions in lower case)

remove-stops

remove stop codons

upper

convert sequence to upper case

lower

convert sequence to lower case

reverse-complement

build the reverse complement

shuffle

shuffle each sequence

sample

select a given proportion of sequences (see --sample-proportion)
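The translation and codon-level operations above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the script's own implementation: the function names are hypothetical, input is assumed to be DNA in reading frame 1, and turning unknown codons into X, leaving the stop codon out of the translate-to-stop and truncate-at-stop results, and requiring equal-length inputs for mask-soft are all choices made for this sketch.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest, in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Complement map; characters not listed here (e.g. gaps) are left unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def codons(seq):
    """Split seq into complete codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(seq):
    """translate: codons not in the table (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(c.upper(), "X") for c in codons(seq))


def translate_to_stop(seq):
    """translate-to-stop: translate up to, but not including, the first stop."""
    return translate(seq).split("*", 1)[0]


def truncate_at_stop(seq):
    """truncate-at-stop: keep only the nucleotides before the first stop codon."""
    for i, codon in enumerate(codons(seq)):
        if CODON_TABLE.get(codon.upper()) == "*":
            return seq[:3 * i]
    return seq


def mark_codons(seq):
    """mark-codons: add a space after each codon."""
    return "".join(seq[i:i + 3] + " " for i in range(0, len(seq), 3))


def reverse_complement(seq):
    """reverse-complement: complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mask_soft(hard_masked, unmasked, mask_char="N"):
    """mask-soft: lower-case residues that are hard-masked in the other copy."""
    if len(hard_masked) != len(unmasked):
        raise ValueError("hard-masked and unmasked sequences differ in length")
    return "".join(
        u.lower() if h.upper() == mask_char else u
        for h, u in zip(hard_masked, unmasked)
    )
```

For example, translate("ATGGCCTAAGGG") gives "MA*G", translate_to_stop gives "MA" and truncate_at_stop gives "ATGGCC". Building the table from the compact 64-letter string avoids typing out every codon by hand.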

Parameters for the edit operations are given to the -p PARAMETERS option as a comma-separated list, in the order in which the edit operations are applied.
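As a minimal sketch of this ordering rule, the hypothetical helper below hands out the comma-separated values, left to right, to the operations that take a parameter. The set of parameter-taking operations is read off the descriptions above (back-translate, apply-map, build-map, mask-codons) and may not be complete; the file names are placeholders.

```python
# Operations whose descriptions above say they require a parameter.
# This set is an assumption taken from the documentation, not from the script.
NEEDS_PARAMETER = {"back-translate", "apply-map", "build-map", "mask-codons"}


def assign_parameters(methods, parameters):
    """Pair each operation with its parameter (or None), consuming the
    comma-separated parameter string strictly left to right."""
    values = iter(parameters.split(",") if parameters else [])
    assigned = []
    for method in methods:
        if method in NEEDS_PARAMETER:
            try:
                assigned.append((method, next(values)))
            except StopIteration:
                raise ValueError(f"no parameter left for {method!r}") from None
        else:
            # Operations such as upper or reverse-complement take no parameter.
            assigned.append((method, None))
    return assigned


plan = assign_parameters(
    ["apply-map", "upper", "mask-codons"], "names.tsv,masked_aa.fasta"
)
# plan == [("apply-map", "names.tsv"), ("upper", None),
#          ("mask-codons", "masked_aa.fasta")]
```

Operations without a parameter do not consume a value, so the second value goes to mask-codons even though upper sits between the two.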

Inclusion and exclusion patterns (--include-pattern, --exclude-pattern) are tested before any identifier mapping is applied.
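A minimal sketch of that ordering follows, assuming the patterns are regular expressions matched with re.search (the script may match differently) and the apply-map file has been read into a dict. read_map and select_and_rename are hypothetical names. Because the patterns see the original identifiers, a pattern written against the new names selects nothing.

```python
import re


def read_map(lines):
    """Parse tab-separated 'old<TAB>new' lines (the apply-map format) into a dict."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        old, new = line.split("\t")[:2]
        mapping[old] = new
    return mapping


def select_and_rename(records, include=None, exclude=None, id_map=None):
    """Yield (identifier, sequence) pairs.

    include/exclude are tested against the ORIGINAL identifier; renaming via
    id_map happens only afterwards. Identifiers missing from id_map are kept
    unchanged (an assumption of this sketch).
    """
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    for identifier, seq in records:
        if include_re is not None and not include_re.search(identifier):
            continue
        if exclude_re is not None and exclude_re.search(identifier):
            continue
        if id_map is not None:
            identifier = id_map.get(identifier, identifier)
        yield identifier, seq


records = [("gene1", "ACGT"), ("gene2", "GGCC"), ("ctrl1", "TTTT")]
id_map = read_map(["gene1\tseqA\n", "gene2\tseqB\n"])
selected = list(select_and_rename(records, include="^gene", exclude="2$", id_map=id_map))
# selected == [("seqA", "ACGT")]
```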

Usage

Example:

python fasta2fasta.py --method=translate < in.fasta > out.fasta

Type:

python fasta2fasta.py --help

for command line help.
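The example command streams fasta records from standard input to standard output. The sketch below shows the kind of reader and writer such a filter needs, including wrapping sequence lines in the spirit of the --fold-width option. The function names are hypothetical, and treating a width of 0 as "no wrapping" is an assumption, not documented behaviour of the script.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) pairs from fasta lines, e.g. sys.stdin."""
    title, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None and line:
            # Sequence lines before the first header are ignored.
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def format_fasta(title: str, seq: str, fold_width: int = 0) -> str:
    """Format one record; fold_width > 0 wraps the sequence at that width."""
    if fold_width > 0:
        body = "\n".join(seq[i:i + fold_width] for i in range(0, len(seq), fold_width))
    else:
        body = seq
    return f">{title}\n{body}\n"


# Example filter in the spirit of --method=upper (not run here):
#   import sys
#   for title, seq in read_fasta(sys.stdin):
#       sys.stdout.write(format_fasta(title, seq.upper(), fold_width=60))
```

Reading lazily with a generator keeps memory use flat, which matters when the tool sits in a shell pipeline over large files.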

Command line options

usage: fasta2fasta [-h] [--version]
                   [-m {translate,translate-to-stop,truncate-at-stop,back-translate,mark-codons,apply-map,build-map,pseudo-codons,filter,interleaved-codons,map-codons,remove-gaps,mask-seg,mask-bias,mask-codons,mask-incomplete-codons,mask-stops,mask-soft,map-identifier,nop,remove-stops,upper,lower,reverse-complement,sample,shuffle}]
                   [-p PARAMETERS] [-x]
                   [--sample-proportion SAMPLE_PROPORTION]
                   [--exclude-pattern EXCLUDE_PATTERN]
                   [--include-pattern INCLUDE_PATTERN]
                   [--filter-method FILTER_METHODS] [-t {aa,na}]
                   [-l TEMPLATE_IDENTIFIER] [--map-tsv-file MAP_TSV_FILE]
                   [--fold-width FOLD_WIDTH] [--timeit TIMEIT_FILE]
                   [--timeit-name TIMEIT_NAME] [--timeit-header]
                   [--random-seed RANDOM_SEED] [-v LOGLEVEL]
                   [--log-config-filename LOG_CONFIG_FILENAME]
                   [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
                   [-E STDERR] [-S STDOUT]
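Three of the options listed above, --filter-method, --sample-proportion and --random-seed, configure the filter, sample and shuffle operations. The following is a sketch of how those operations could behave, not the script's code: the helper names are hypothetical, only the two filter criteria from the filter example are handled, bounds are treated as inclusive, sampling keeps each record independently (rather than an exact count), and a seeded random.Random stands in for --random-seed.

```python
import random


def length_filter(filter_methods):
    """Build a predicate from --filter-method values such as 'min-length=5'."""
    min_len, max_len = 0, None
    for spec in filter_methods:
        name, _, value = spec.partition("=")
        if name == "min-length":
            min_len = int(value)
        elif name == "max-length":
            max_len = int(value)
        else:
            raise ValueError(f"unsupported filter method: {spec!r}")

    def keep(seq):
        # Inclusive bounds on sequence length.
        return len(seq) >= min_len and (max_len is None or len(seq) <= max_len)

    return keep


def sample_records(records, proportion, seed=None):
    """Keep each record independently with probability `proportion`."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    rng = random.Random(seed)
    return [record for record in records if rng.random() < proportion]


def shuffle_sequence(seq, rng):
    """Return a random permutation of the residues in seq."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)
```

With the same seed, sample_records returns the same subset on every run, which is the usual reason for exposing a seed option alongside random operations.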