fastqs2fasta.py - interleave two fastq files

Tags

Genomics NGS FASTQ FASTA Conversion

Purpose

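This script interleaves two fastq-formatted files (paired data) into a single fasta-formatted file. In the output, each read1 record is immediately followed by its read2 mate. Only the identifier and sequence of each read are written; quality scores are dropped.

Both fastq files MUST be sorted by read identifier.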
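
The sorting requirement can be checked before running the script. The following is a minimal sketch, not part of fastqs2fasta itself: the helper names `read_ids` and `is_sorted_by_id` are hypothetical, records are assumed to span exactly four lines, and "sorted" is assumed to mean plain lexicographic order of the identifier with any trailing /1 or /2 removed. The ordering the script actually expects is not stated here, so treat this as an illustration only.

```python
import io


def read_ids(handle):
    """Yield the identifier of each 4-line fastq record (hypothetical helper).

    A trailing /1 or /2 mate suffix is removed so that both files of a
    pair yield comparable identifiers.
    """
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # skip sequence, separator and quality lines
        header = line.rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header at line {i + 1}")
        name = header[1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def is_sorted_by_id(handle):
    """Return True if identifiers never decrease (assumed lexicographic order)."""
    previous = None
    for name in read_ids(handle):
        if previous is not None and name < previous:
            return False
        previous = name
    return True


# Small in-memory examples standing in for real files.
sorted_example = io.StringIO("@r1/1\nAC\n+\nII\n@r3/1\nGT\n+\nII\n")
unsorted_example = io.StringIO("@r3/1\nGT\n+\nII\n@r1/1\nAC\n+\nII\n")
sorted_ok = is_sorted_by_id(sorted_example)
unsorted_ok = is_sorted_by_id(unsorted_example)
```

For gzipped inputs, the handle could be opened with `gzip.open(path, "rt")` from the Python standard library.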

Usage

For example:

cgat fastqs2fasta --first-fastq-file=in.fastq.1.gz --second-fastq-file=in.fastq.2.gz > out.fasta

If in.fastq.1.gz looks like this:

@r1_from_gi|387760314|ref|NC_017594.1|_Streptococcus_saliva_#0/1
TTCTTGTTGAATCATTTCAATTGTCTCCTTTTAGTTTTATTAGATAATAACAGCTTCTTCCACAACTTCT
+
??A???ABBDDBDDEDGGFGAFHHCHHIIIDIHGIFIH=HFICIHDHIHIFIFIIIIIIHFHIFHIHHHH
@r3_from_gi|315441696|ref|NC_014814.1|_Mycobacterium_gilvum_#0/1
ATGAACGCGGCCGAGCAACACCGCCACCACGTGAATCGGTGGTTCTACGACTGCCCGTCGGCCTTCCACC
+
A??A?B??BDBDDDBDGGFA>CFCFIIIIIIF;HFIGHCIGHIHHEHHHIIHHFDHH-HD-IDHHHGIHG

and in.fastq.2.gz looks like this:

@r1_from_gi|387760314|ref|NC_017594.1|_Streptococcus_saliva_#0/2
ACCTTCGTTTCCAAGGTGCAGCAGGTCAACTTGATCAAACTGCCCCTTTGAACGAAGTGAAAAAACAAAT
+
A????@BBDBDDADABGFGFFEHHHIEHHII@IIHIHHIDHCCIHIIIHHIEI5HIHFHIEHIH=CHHC)
@r3_from_gi|315441696|ref|NC_014814.1|_Mycobacterium_gilvum_#0/2
GGGAGCCTGCAGCGCCGCCGCGACTGCATCGCCGCGGCCGGCATCGTGGGATGGACGGTGCGTCAGACGC
+
???A?9BBDDD5@DDDGFFGFFHIIIHHIHBFHIIHIIHHH>HEIHHFI>FFHGIIHHHDHCCFIHFIHD

then the output will be as follows (the input listings above are excerpts; the r4 and r5 reads are not shown there):

>r1_from_gi|387760314|ref|NC_017594.1|_Streptococcus_saliva_#0/1
TTCTTGTTGAATCATTTCAATTGTCTCCTTTTAGTTTTATTAGATAATAACAGCTTCTTCCACAACTTCT
>r1_from_gi|387760314|ref|NC_017594.1|_Streptococcus_saliva_#0/2
ACCTTCGTTTCCAAGGTGCAGCAGGTCAACTTGATCAAACTGCCCCTTTGAACGAAGTGAAAAAACAAAT
>r3_from_gi|315441696|ref|NC_014814.1|_Mycobacterium_gilvum_#0/1
ATGAACGCGGCCGAGCAACACCGCCACCACGTGAATCGGTGGTTCTACGACTGCCCGTCGGCCTTCCACC
>r3_from_gi|315441696|ref|NC_014814.1|_Mycobacterium_gilvum_#0/2
GGGAGCCTGCAGCGCCGCCGCGACTGCATCGCCGCGGCCGGCATCGTGGGATGGACGGTGCGTCAGACGC
>r4_from_gi|53711291|ref|NC_006347.1|_Bacteroides_fragilis_#0/1
GAGGGATCAGCCTGTTATCCCCGGAGTACCTTTTATCCTTTGAGcgatGTCCCTTCCATACGGAAACACC
>r4_from_gi|53711291|ref|NC_006347.1|_Bacteroides_fragilis_#0/2
CAACCGTGAGCTCAGTGAAATTGTAGTATCGGTGAAGATGCcgatTACCCGcgatGGGACGAAAAGACCC
>r5_from_gi|325297172|ref|NC_015164.1|_Bacteroides_salanitr_#0/1
TGCGGCGAAATACCAGCCCATGCCCCGTCCCCAGAATTCCTTGGAGCAGCCTTTGTGAGGTTCGGCTTTG
>r5_from_gi|325297172|ref|NC_015164.1|_Bacteroides_salanitr_#0/2
AACGGCACGCACAATGCCGACCGCTACAAAAAGGCTGCCGACTGGCTCCGCAATTACCTGGTGAACGACT
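
The conversion illustrated above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script's actual implementation: the function names `iter_fastq` and `interleave_to_fasta` are hypothetical, each record is assumed to span exactly four lines, and both inputs are assumed to list mates in the same order.

```python
import io


def iter_fastq(handle):
    """Yield (identifier, sequence) for each 4-line fastq record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored (fasta has no qualities)
        if not header.startswith("@"):
            raise ValueError(f"malformed fastq header: {header!r}")
        yield header[1:].rstrip("\n"), sequence


def interleave_to_fasta(first, second, out):
    """Write read1 then read2 for every pair, in fasta format.

    zip() stops at the shorter input, so unpaired trailing reads are
    silently dropped in this sketch.
    """
    for (id1, seq1), (id2, seq2) in zip(iter_fastq(first), iter_fastq(second)):
        out.write(f">{id1}\n{seq1}\n>{id2}\n{seq2}\n")


# Small in-memory examples standing in for in.fastq.1.gz and in.fastq.2.gz.
first = io.StringIO("@r1/1\nTTCT\n+\n??A?\n@r3/1\nATGA\n+\nA??A\n")
second = io.StringIO("@r1/2\nACCT\n+\nA???\n@r3/2\nGGGA\n+\n???A\n")
out = io.StringIO()
interleave_to_fasta(first, second, out)
result = out.getvalue()
```

With real data, the two inputs could be opened with `gzip.open(path, "rt")` and `out` could be `sys.stdout`, mirroring the shell redirection in the usage example.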

Type:

cgat fastqs2fasta --help

for command line help.

Command line options

usage: fastqs2fasta [-h] [--version] [-a FASTQ1] [-b FASTQ2]
                    [--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
                    [--timeit-header] [--random-seed RANDOM_SEED]
                    [-v LOGLEVEL] [--log-config-filename LOG_CONFIG_FILENAME]
                    [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
                    [-E STDERR] [-S STDOUT]