fastqs2fastqs.py - manipulate (merge/reconcile) fastq files

Tags

Genomics NGS FASTQ FASTQ Manipulation

Purpose

This script manipulates multiple fastq files and outputs new fastq files. This page documents the reconcile method; the command line options below also list a filter-by-sequence method.

reconcile

Reconcile reads from a pair of fastq files.

This method takes two fastq files and outputs two fastq files such that every read in the output is present in both output files.

The typical use case is two fastq files, holding the first and second reads of read pairs, that have been filtered independently, for example by quality score or by truncation. As a consequence, some reads may be missing from one file but not the other. The reconcile method outputs two files containing only the reads common to both inputs.

The two files must be sorted by read identifier.
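Sorting matters because, with both files ordered by identifier, the common reads can be found in a single streaming pass (a merge-join). The following is a minimal sketch of that idea, not the script's own code: the helper names `read_id` and `reconcile_sorted` are hypothetical, records are assumed to be plain four-line FASTQ entries, identifiers are compared as plain strings, and mates are assumed to share the first header token exactly.

```python
from typing import Iterable, Iterator, Tuple

# A FASTQ record as (header, sequence, plus-line, quality), without newlines.
Record = Tuple[str, str, str, str]


def read_id(record: Record) -> str:
    """Identifier = first whitespace-delimited header token, minus the '@'.

    Assumption: both mates carry this token unchanged; real data may need
    a suffix such as /1 or /2 stripped first.
    """
    return record[0][1:].split()[0]


def reconcile_sorted(
    records1: Iterable[Record], records2: Iterable[Record]
) -> Iterator[Tuple[Record, Record]]:
    """Yield (record1, record2) pairs whose identifiers occur in both inputs.

    Both inputs must be sorted by identifier in the same (string) order.
    Each input is read once, so memory use does not grow with file size.
    """
    it1, it2 = iter(records1), iter(records2)
    r1, r2 = next(it1, None), next(it2, None)
    while r1 is not None and r2 is not None:
        id1, id2 = read_id(r1), read_id(r2)
        if id1 == id2:
            yield r1, r2
            r1, r2 = next(it1, None), next(it2, None)
        elif id1 < id2:
            # id1 cannot appear later in the sorted file 2: drop it.
            r1 = next(it1, None)
        else:
            # id2 cannot appear later in the sorted file 1: drop it.
            r2 = next(it2, None)


file1 = [("@read1", "AAA", "+", "!!!"), ("@read2", "CCC", "+", "!!!"),
         ("@read4", "GGG", "+", "!!!")]
file2 = [("@read1", "AAA", "+", "!!!"), ("@read3", "TTT", "+", "!!!"),
         ("@read4", "GGG", "+", "!!!")]

for rec1, rec2 in reconcile_sorted(file1, file2):
    print(read_id(rec1))
# read1
# read4
```

If either file is out of order, a read present in both can be skipped, which is why unsorted input gives incomplete output under this approach.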

Example input; read2 is present only in file 1 and read3 only in file 2:

# File 1       # File 2

@read1         @read1
AAA            AAA
+              +
!!!            !!!
@read2         @read3
CCC            TTT
+              +
!!!            !!!
@read4         @read4
GGG            GGG
+              +
!!!            !!!

Example output, only the reads common to both files are output:

# File 1       # File 2

@read1         @read1
AAA            AAA
+              +
!!!            !!!
@read4         @read4
GGG            GGG
+              +
!!!            !!!
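The example above can be reproduced with a small, self-contained sketch. The helpers `parse_fastq` and `reconcile_by_set` are hypothetical and match records on the full header line; they collect identifiers in sets instead of doing a sorted merge, which is an alternative technique rather than what the script is documented to do.

```python
from typing import List, Tuple

Record = Tuple[str, str, str, str]

FILE1 = "@read1\nAAA\n+\n!!!\n@read2\nCCC\n+\n!!!\n@read4\nGGG\n+\n!!!\n"
FILE2 = "@read1\nAAA\n+\n!!!\n@read3\nTTT\n+\n!!!\n@read4\nGGG\n+\n!!!\n"


def parse_fastq(text: str) -> List[Record]:
    """Split plain four-line FASTQ text into (header, seq, plus, qual) records."""
    lines = text.splitlines()
    return [(lines[i], lines[i + 1], lines[i + 2], lines[i + 3])
            for i in range(0, len(lines), 4)]


def format_fastq(records: List[Record]) -> str:
    """Join records back into four-line FASTQ text."""
    return "".join("\n".join(r) + "\n" for r in records)


def reconcile_by_set(text1: str, text2: str) -> Tuple[str, str]:
    """Keep only records whose header line appears in both inputs.

    Output preserves the record order of each input file.
    """
    records1, records2 = parse_fastq(text1), parse_fastq(text2)
    common = {r[0] for r in records1} & {r[0] for r in records2}
    return (format_fastq([r for r in records1 if r[0] in common]),
            format_fastq([r for r in records2 if r[0] in common]))


out1, out2 = reconcile_by_set(FILE1, FILE2)
print(out1, end="")
```

Holding identifiers in memory removes the sorting requirement, at the cost of memory proportional to the number of reads, which can be significant for large NGS files.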

Usage

Example:

python fastqs2fastqs.py \
    --method=reconcile \
    --output-filename-pattern=myReads_reconciled.%s.fastq \
    myReads.1.fastq.gz myReads.2.fastq.gz

In this example, a pair of fastq files is reconciled by read identifier and written to two new fastq files named myReads_reconciled.1.fastq and myReads_reconciled.2.fastq.
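A sketch of how a pattern containing `%s` can be expanded into one output name per input file. The function names `output_filenames` and `open_output` are hypothetical, and compressing output whose name ends in `.gz` is an assumed convention here rather than documented script behaviour.

```python
import gzip
from typing import IO, List


def output_filenames(pattern: str, n_files: int = 2) -> List[str]:
    """Substitute 1..n_files for the %s placeholder in the pattern."""
    return [pattern % str(i) for i in range(1, n_files + 1)]


def open_output(filename: str) -> IO[str]:
    """Open for text writing; gzip-compress if the name ends in .gz (assumption)."""
    if filename.endswith(".gz"):
        return gzip.open(filename, "wt")
    return open(filename, "w")


print(output_filenames("myReads_reconciled.%s.fastq"))
# ['myReads_reconciled.1.fastq', 'myReads_reconciled.2.fastq']
```

Under this convention, ending the pattern in `.fastq.gz` would be the way to request compressed output.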

Type:

python fastqs2fastqs.py --help

for command line help.

Command line options

usage: fastqs2fastqs [-h] [--version] [-m {reconcile,filter-by-sequence}] [-c]
                     [-u] [--id-pattern-1 ID_PATTERN_1]
                     [--id-pattern-2 ID_PATTERN_2]
                     [--input-filename-fasta INPUT_FILENAME_FASTA]
                     [--filtering-kmer-size FILTERING_KMER_SIZE]
                     [--filtering-min-kmer-matches FILTERING_MIN_KMER_MATCHES]
                     [--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
                     [--timeit-header] [--random-seed RANDOM_SEED]
                     [-v LOGLEVEL] [--log-config-filename LOG_CONFIG_FILENAME]
                     [--tracing {function}] [-? ?]
                     [-P OUTPUT_FILENAME_PATTERN] [-F] [-I STDIN] [-L STDLOG]
                     [-E STDERR] [-S STDOUT]