fastqs2fasta.py - interleave two fastq files
- Tags
Genomics NGS FASTQ FASTA Conversion
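Purpose
This script interleaves two FASTQ-formatted files (paired-end data) into a single FASTA-formatted file. In the output, each read 1 record is immediately followed by its read 2 mate; quality scores are dropped.
Both FASTQ files MUST be sorted by read identifier, so that the nth record in each file belongs to the same pair.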
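The interleaving described above can be sketched in a few lines of Python. This is an illustration only, not the CGAT implementation: the function names `read_fastq`, `interleave_to_fasta` and `interleave_files` are hypothetical, and the sketch assumes plain four-line FASTQ records (no wrapped sequences, no trailing blank lines) in files that are already sorted identically.

```python
import gzip
import sys
from itertools import zip_longest


def read_fastq(handle):
    """Yield (identifier, sequence) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, dropped because FASTA has no qualities
        if not header.startswith("@"):
            raise ValueError("malformed FASTQ header: %r" % header)
        yield header[1:].rstrip("\n"), sequence


def interleave_to_fasta(handle1, handle2, out):
    """Write read 1 followed by read 2 of each pair to `out` as FASTA."""
    pairs = zip_longest(read_fastq(handle1), read_fastq(handle2))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("input files contain different numbers of reads")
        for name, seq in (rec1, rec2):
            out.write(">%s\n%s\n" % (name, seq))


def interleave_files(path1, path2, out=sys.stdout):
    """Hypothetical convenience wrapper for gzip-compressed inputs."""
    with gzip.open(path1, "rt") as f1, gzip.open(path2, "rt") as f2:
        interleave_to_fasta(f1, f2, out)
```

Calling `interleave_files("in.fastq.1.gz", "in.fastq.2.gz")` would write the interleaved records to standard output. Note that the sketch pairs records purely by position, which is why both inputs have to be in the same order.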
Usage
For example:
cgat fastqs2fasta --first-fastq-file=in.fastq.1.gz --second-fastq-file=in.fastq.2.gz > out.fasta
If in.fastq.1.gz looks like this:
@r1_from_gi|387760314|ref|NC_017594.1|_Streptococcus_saliva_#0/1
TTCTTGTTGAATCATTTCAATTGTCTCCTTTTAGTTTTATTAGATAATAACAGCTTCTTCCACAACTTCT
+
??A???ABBDDBDDEDGGFGAFHHCHHIIIDIHGIFIH=HFICIHDHIHIFIFIIIIIIHFHIFHIHHHH
@r3_from_gi|315441696|ref|NC_014814.1|_Mycobacterium_gilvum_#0/1
ATGAACGCGGCCGAGCAACACCGCCACCACGTGAATCGGTGGTTCTACGACTGCCCGTCGGCCTTCCACC
+
A??A?B??BDBDDDBDGGFA>CFCFIIIIIIF;HFIGHCIGHIHHEHHHIIHHFDHH-HD-IDHHHGIHG
and in.fastq.2.gz looks like this:
@r1_from_gi|387760314|ref|NC_017594.1|_Streptococcus_saliva_#0/2
ACCTTCGTTTCCAAGGTGCAGCAGGTCAACTTGATCAAACTGCCCCTTTGAACGAAGTGAAAAAACAAAT
+
A????@BBDBDDADABGFGFFEHHHIEHHII@IIHIHHIDHCCIHIIIHHIEI5HIHFHIEHIH=CHHC)
@r3_from_gi|315441696|ref|NC_014814.1|_Mycobacterium_gilvum_#0/2
GGGAGCCTGCAGCGCCGCCGCGACTGCATCGCCGCGGCCGGCATCGTGGGATGGACGGTGCGTCAGACGC
+
???A?9BBDDD5@DDDGFFGFFHIIIHHIHBFHIIHIIHHH>HEIHHFI>FFHGIIHHHDHCCFIHFIHD
then the output will be (the input listings above show only the first two read pairs):
>r1_from_gi|387760314|ref|NC_017594.1|_Streptococcus_saliva_#0/1
TTCTTGTTGAATCATTTCAATTGTCTCCTTTTAGTTTTATTAGATAATAACAGCTTCTTCCACAACTTCT
>r1_from_gi|387760314|ref|NC_017594.1|_Streptococcus_saliva_#0/2
ACCTTCGTTTCCAAGGTGCAGCAGGTCAACTTGATCAAACTGCCCCTTTGAACGAAGTGAAAAAACAAAT
>r3_from_gi|315441696|ref|NC_014814.1|_Mycobacterium_gilvum_#0/1
ATGAACGCGGCCGAGCAACACCGCCACCACGTGAATCGGTGGTTCTACGACTGCCCGTCGGCCTTCCACC
>r3_from_gi|315441696|ref|NC_014814.1|_Mycobacterium_gilvum_#0/2
GGGAGCCTGCAGCGCCGCCGCGACTGCATCGCCGCGGCCGGCATCGTGGGATGGACGGTGCGTCAGACGC
>r4_from_gi|53711291|ref|NC_006347.1|_Bacteroides_fragilis_#0/1
GAGGGATCAGCCTGTTATCCCCGGAGTACCTTTTATCCTTTGAGcgatGTCCCTTCCATACGGAAACACC
>r4_from_gi|53711291|ref|NC_006347.1|_Bacteroides_fragilis_#0/2
CAACCGTGAGCTCAGTGAAATTGTAGTATCGGTGAAGATGCcgatTACCCGcgatGGGACGAAAAGACCC
>r5_from_gi|325297172|ref|NC_015164.1|_Bacteroides_salanitr_#0/1
TGCGGCGAAATACCAGCCCATGCCCCGTCCCCAGAATTCCTTGGAGCAGCCTTTGTGAGGTTCGGCTTTG
>r5_from_gi|325297172|ref|NC_015164.1|_Bacteroides_salanitr_#0/2
AACGGCACGCACAATGCCGACCGCTACAAAAAGGCTGCCGACTGGCTCCGCAATTACCTGGTGAACGACT
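In the output above, every `/1` record is immediately followed by the `/2` record with the same name. A quick way to confirm this for a given output file is sketched below. The helpers `mate_key` and `check_interleaved` are hypothetical names, not part of CGAT, and the check assumes the `/1` and `/2` suffix convention used in this example; other read-naming schemes would need a different `mate_key`.

```python
def mate_key(name):
    """Strip a trailing /1 or /2 so that both mates share one key."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def check_interleaved(lines):
    """Return the number of read pairs; raise ValueError on the first mismatch."""
    # Collect FASTA header names, without the leading '>'.
    headers = [line[1:].strip() for line in lines if line.startswith(">")]
    if len(headers) % 2:
        raise ValueError("odd number of records")
    # Compare each record at an even position with the one that follows it.
    for first, second in zip(headers[0::2], headers[1::2]):
        if mate_key(first) != mate_key(second):
            raise ValueError("unpaired records: %s / %s" % (first, second))
    return len(headers) // 2
```

For example, `check_interleaved(open("out.fasta"))` would return the number of pairs in the file produced by the command above, or raise an error if two adjacent records do not share a name.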
Type:
cgat fastqs2fasta --help
for command line help.
Command line options
usage: fastqs2fasta [-h] [--version] [-a FASTQ1] [-b FASTQ2]
[--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
[--timeit-header] [--random-seed RANDOM_SEED]
[-v LOGLEVEL] [--log-config-filename LOG_CONFIG_FILENAME]
[--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
[-E STDERR] [-S STDOUT]
fastqs2fasta: error: argument -?: expected one argument