fasta2variants.py - create sequence variants from a set of sequences
Tags: Genomics Sequences Variants Protein FASTA Transformation
Purpose
This script reads a collection of sequences in FASTA format and outputs a table of possible variants. For each position in a protein sequence, it reports the number of variants.
If the input sequences are nucleotide coding sequences (CDS), a weight is output for each variant, indicating the number of times that variant can arise from single nucleotide changes.
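The weighting described above can be illustrated with a minimal sketch. It is not the script's own implementation: the names `codon_variants` and `sequence_variants` are hypothetical, the standard genetic code is assumed, and it is an assumption of this sketch that synonymous changes are skipped and that changes to a stop codon are counted under `*`.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (T, C, A, G).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_variants(codon):
    """Return (reference amino acid, {variant amino acid: weight}).

    The weight is the number of single nucleotide changes in `codon` that
    produce the variant. Synonymous changes are skipped (an assumption).
    Codons containing characters other than A, C, G, T raise KeyError.
    """
    codon = codon.upper()
    reference = CODON_TABLE[codon]
    weights = Counter()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a change
            mutant = codon[:pos] + base + codon[pos + 1:]
            variant = CODON_TABLE[mutant]
            if variant != reference:
                weights[variant] += 1
    return reference, dict(weights)


def sequence_variants(cds):
    """Yield (1-based protein position, reference, weights) for each codon."""
    if len(cds) % 3 != 0:
        raise ValueError("length of sequence is not a multiple of 3")
    for start in range(0, len(cds), 3):
        reference, weights = codon_variants(cds[start:start + 3])
        yield start // 3 + 1, reference, weights
```

For the codon `ATG` (methionine), all nine single nucleotide changes are non-synonymous: three of them give isoleucine, two give leucine, and one each gives valine, threonine, lysine and arginine, so `codon_variants("ATG")` returns weights of 3, 2, 1, 1, 1 and 1 for those amino acids.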
Usage
Example:
python fasta2variants.py -I CCDS_nucleotide.current.fna.gz -L CDS.log -S CDS.output -c
This reads the CDS file given with -I, writes the log to CDS.log (-L) and the output table to CDS.output (-S), and, because of the -c option, counts variants based on single nucleotide changes.
Type:
python fasta2variants.py --help
for command line help.
Gzip-compressed (.gz) files and the usual FASTA file extensions (.fasta, .fna) are accepted. If the -c option is specified and the input is not a CDS sequence, the script raises an error ("length of sequence '<input_file>' is not a multiple of 3").
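The input handling described above can be sketched as follows. This is an illustration under stated assumptions, not the script's code: `open_fasta`, `iter_fasta` and `check_cds` are hypothetical helper names, compression is detected from the `.gz` suffix only, and the error text simply mirrors the message quoted above (whether the script reports the file name or the sequence identifier is not confirmed here).

```python
import gzip
import io


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_fasta(handle):
    """Yield (identifier, sequence) pairs from an open FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def check_cds(name, sequence):
    """Raise ValueError if the sequence cannot be split into whole codons."""
    if len(sequence) % 3 != 0:
        raise ValueError("length of sequence '%s' is not a multiple of 3" % name)


# Small in-memory demonstration; seq2 has 4 bases and fails the check.
records = list(iter_fasta(io.StringIO(">seq1\nATGAAA\nTGA\n>seq2\nATGA\n")))
```

With a real file, `open_fasta("CCDS_nucleotide.current.fna.gz")` would return a handle that can be passed directly to `iter_fasta`.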
Command line options
usage: fasta2variants [-h] [--version] [-c] [--timeit TIMEIT_FILE]
                      [--timeit-name TIMEIT_NAME] [--timeit-header]
                      [--random-seed RANDOM_SEED] [-v LOGLEVEL]
                      [--log-config-filename LOG_CONFIG_FILENAME]
                      [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
                      [-E STDERR] [-S STDOUT]