fasta2kmercontent.py¶
- Tags
Genomics Sequences FASTA Summary
Purpose¶
This script reads a fasta file from stdin and counts the occurrences of every kmer (k-nucleotide) in each contig in the file. The output is a tab-delimited table of kmer counts with one row per kmer and one column per contig:
kmer contig1 contig2 contig3 contig4
n1
n2
n3
where n1, n2, n3 are the kmers and contig1 to contig4 are the fasta entries.
The user specifies the kmer length to count. There are 4^k possible kmers of length k over the bases A, C, G and T, so the longer the kmer, the longer the script will take to run.
Note that the order of the output will not necessarily match the order of the input.
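The counting described above can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the function names `parse_fasta`, `count_kmers` and `kmer_table` are hypothetical, and the sketch assumes that overlapping kmers are counted and that lower-case bases are treated like upper-case ones.

```python
import itertools
from collections import Counter


def parse_fasta(lines):
    """Yield (title, sequence) pairs from an iterable of FASTA lines."""
    title, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        else:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def count_kmers(sequence, k):
    """Count overlapping kmers in one sequence (assumption: case-insensitive)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_table(lines, k, alphabet="ACGT"):
    """Return (contig titles, rows), one row per possible kmer."""
    counts = {title: count_kmers(seq, k) for title, seq in parse_fasta(lines)}
    titles = list(counts)
    rows = []
    # There are len(alphabet)**k possible kmers, so this loop (and the
    # table) grows exponentially with k.
    for kmer in map("".join, itertools.product(alphabet, repeat=k)):
        rows.append([kmer] + [counts[t][kmer] for t in titles])
    return titles, rows


if __name__ == "__main__":
    import sys

    titles, rows = kmer_table(sys.stdin, k=4)
    print("\t".join(["kmer"] + titles))
    for row in rows:
        print("\t".join(map(str, row)))
```

Because every possible kmer is enumerated, kmers that never occur still get a row of zeros, which matches the shape of the example output shown under Usage.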
Usage¶
Example:
zcat in.fasta.gz | head
>NODE_1_length_120_cov_4.233333
TCACGAGCACCGCTATTATCAGCAACTTTTAAGCGACTTTCTTGTTGAATCATTTCAATT
GTCTCCTTTTAGTTTTATTAGATAATAACAGCTTCTTCCACAACTTCTACAAGACGGAAG
CGTTTTGTAGCTGAAAGTGGGCGAGTTTCCATGATACGAAcgatATCGCC
>NODE_3_length_51_cov_33.000000
CGAGTTTCCATGATACGAAcgatATCGCCTTCTTTAGCAACGTTGTTTTCGTCATGTGCT
TTATATTTTTTAGAATAGTTGATACGTTTACCATAGACTGG
zcat in.fasta.gz | python fasta2kmercontent.py --kmer-size 4 > tetranucleotide_counts.tsv
head tetranucleotide_counts.tsv
kmer NODE_228_length_74_cov_506.432434 NODE_167_length_57_cov_138.438599
GTAC 0 0
TGCT 0 0
GTAA 2 0
CGAA 1 1
AAAT 1 0
CGAC 0 0
In this example, the number of occurrences of each tetranucleotide (four-nucleotide combination) is counted for each contig in in.fasta.gz.
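For downstream work, the tab-delimited output can be loaded with the standard library. The following is a minimal sketch, assuming the layout shown in the example above (a header row starting with `kmer`, then one row per kmer); the function name `read_kmer_table` is hypothetical and not part of the script.

```python
import csv
import io


def read_kmer_table(handle):
    """Read a kmer table into {contig: {kmer: value}}.

    Values are parsed as floats so the same reader works for both
    counts and proportions.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    contigs = header[1:]  # first column holds the kmer itself
    table = {contig: {} for contig in contigs}
    for row in reader:
        if not row:
            continue
        kmer, values = row[0], row[1:]
        for contig, value in zip(contigs, values):
            table[contig][kmer] = float(value)
    return table


# Small in-memory example using rows from the output shown above.
example = io.StringIO(
    "kmer\tNODE_228_length_74_cov_506.432434\tNODE_167_length_57_cov_138.438599\n"
    "GTAA\t2\t0\n"
    "CGAA\t1\t1\n"
)
kmer_values = read_kmer_table(example)
```

With a real file, `open("tetranucleotide_counts.tsv", newline="")` can be passed in place of the `io.StringIO` object.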
Alternative example:
zcat in.fasta.gz | python fasta2kmercontent.py --kmer-size 4 --output-proportion > tetranucleotide_proportions.tsv
In this example, for each contig in in.fasta.gz we return the proportion of
each four-base combination out of the total tetranucleotide occurrences.
--output-proportion overrides the count output.
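The relationship between counts and proportions can be illustrated with a short sketch. It assumes the proportion of a kmer is its count divided by the total number of kmer occurrences in the same contig, as described above; the function name `to_proportions` is hypothetical, and returning 0.0 for a contig with no kmers is a choice made here, not documented behaviour of the script.

```python
def to_proportions(counts):
    """Convert {kmer: count} for one contig into {kmer: proportion}."""
    total = sum(counts.values())
    if total == 0:
        # Assumption: a contig shorter than the kmer length has no
        # occurrences, so every proportion is reported as 0.0.
        return {kmer: 0.0 for kmer in counts}
    return {kmer: n / total for kmer, n in counts.items()}


# Three occurrences of AA and one of AC out of four in total.
proportions = to_proportions({"AA": 3, "AC": 1, "AG": 0})
```

Within one contig the proportions sum to 1 whenever at least one kmer was counted.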
Options¶
Two options control the behaviour of fasta2kmercontent.py: --kmer-size and --output-proportion.
--kmer-size
The kmer length to count over in the input fasta file.
--output-proportion
The output values are proportions rather than absolute counts.
Type:
python fasta2kmercontent.py --help
for command line help.
Command line options¶
usage: fasta2kmercontent [-h] [--version] [-k KMER] [-p]
                         [--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
                         [--timeit-header] [--random-seed RANDOM_SEED]
                         [-v LOGLEVEL]
                         [--log-config-filename LOG_CONFIG_FILENAME]
                         [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
                         [-E STDERR] [-S STDOUT]
fasta2kmercontent: error: argument -?: expected one argument