rnaseq_junction_bam2bam.py - convert mappings against junctions to genomic coordinates

Tags

Genomics NGS Genesets

Purpose

This script takes as input a BAM file of reads mapped against a junction database and outputs a BAM-formatted file in genomic coordinates.

The contigs should be of the format <chromosome>|<start>|<exon-end>-<exon-start>|<end>|<splice>|<strand>.

<start> - 0-based coordinate of first base
<exon-end> - 0-based coordinate of last base in exon
<exon-start> - 0-based coordinate of first base in exon
<end> - 0-based coordinate of base after last base

Strand can be either fwd or rev, though sequences in the database and coordinates are all on the forward strand.

For example, chr1|1244933|1244982-1245060|1245110|GTAG|fwd translates to the intron chr1:1244983-1245060 in Python coordinates (0-based, half-open).
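The contig naming scheme above can be illustrated with a short parsing sketch. This is not the script's own code: the names Junction, parse_junction_contig, intron_interval and contig_to_genomic_blocks are hypothetical, and the sketch assumes that each contig sequence is the two exon flanks concatenated, so that a contig offset falls either before or after the junction.

```python
from typing import List, NamedTuple, Tuple


class Junction(NamedTuple):
    """Fields of a junction contig name (hypothetical helper type)."""
    chromosome: str
    start: int       # 0-based coordinate of first base
    exon_end: int    # 0-based coordinate of last base in the exon before the intron
    exon_start: int  # 0-based coordinate of first base in the exon after the intron
    end: int         # 0-based coordinate of base after last base
    splice: str
    strand: str


def parse_junction_contig(name: str) -> Junction:
    """Split <chromosome>|<start>|<exon-end>-<exon-start>|<end>|<splice>|<strand>."""
    chromosome, start, exons, end, splice, strand = name.split("|")
    exon_end, exon_start = exons.split("-")
    if strand not in ("fwd", "rev"):
        raise ValueError("unexpected strand: %s" % strand)
    return Junction(chromosome, int(start), int(exon_end),
                    int(exon_start), int(end), splice, strand)


def intron_interval(junction: Junction) -> Tuple[int, int]:
    """Return the intron as a 0-based, half-open (start, end) interval."""
    return junction.exon_end + 1, junction.exon_start


def contig_to_genomic_blocks(junction: Junction, offset: int,
                             length: int) -> List[Tuple[int, int]]:
    """Map a contig-local interval onto genomic blocks.

    Assumption: the contig is the first exon flank followed directly
    by the second exon flank, both on the forward strand.
    """
    first_len = junction.exon_end + 1 - junction.start
    read_end = offset + length
    blocks = []
    if offset < first_len:
        # Part of the interval lying in the exon before the intron.
        blocks.append((junction.start + offset,
                       junction.start + min(read_end, first_len)))
    if read_end > first_len:
        # Part of the interval lying in the exon after the intron.
        second_offset = max(offset, first_len) - first_len
        blocks.append((junction.exon_start + second_offset,
                       junction.exon_start + read_end - first_len))
    return blocks


junction = parse_junction_contig("chr1|1244933|1244982-1245060|1245110|GTAG|fwd")
print(intron_interval(junction))
print(contig_to_genomic_blocks(junction, 40, 20))
```

Under these assumptions, a 20-base alignment starting at contig offset 40 splits into two genomic blocks, one on each side of the intron; in a BAM record this would correspond to a match, a skipped region and another match.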

The input BAM file is expected to be sorted by read. Only the best matches are output for each read, where best is defined both in terms of the number of mismatches and the number of colour mismatches.
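The best-match selection can be sketched as follows. The text does not say how the two mismatch counts are combined, so this sketch assumes a lexicographic comparison (mismatches first, then colour mismatches). The function name best_matches and the tuple layout are hypothetical; real input would be BAM records rather than plain tuples.

```python
from itertools import groupby
from operator import itemgetter


def best_matches(alignments):
    """Yield only the best alignments of each read.

    alignments: iterable of (read_name, mismatches, colour_mismatches, payload)
    tuples in which all alignments of a read are adjacent.
    Assumption: "best" means the smallest (mismatches, colour_mismatches) pair.
    """
    for _, group in groupby(alignments, key=itemgetter(0)):
        group = list(group)
        best = min((a[1], a[2]) for a in group)
        for alignment in group:
            # Ties are all kept, so a read may have several best matches.
            if (alignment[1], alignment[2]) == best:
                yield alignment


alignments = [
    ("read1", 0, 1, "junction A"),
    ("read1", 0, 0, "junction B"),
    ("read1", 2, 0, "junction C"),
    ("read2", 1, 0, "junction D"),
    ("read2", 1, 0, "junction E"),
]
for alignment in best_matches(alignments):
    print(alignment)
```

Because groupby only merges adjacent items, the sketch relies on the input being sorted by read, as required above.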

Usage

Example:

cat input.bam | python rnaseq_junction_bam2bam.py - --log=log > output.bam

Type:

python rnaseq_junction_bam2bam.py --help

for command line help.

Command line options

usage: rnaseq-junction-bam2bam [-h] [--version] [-t FILENAME_GENOME_BAM]
                               [-s FILENAME_CONTIGS] [-o] [-i]
                               [-c REMOVE_CONTIGS] [-f] [-u]
                               [--timeit TIMEIT_FILE]
                               [--timeit-name TIMEIT_NAME] [--timeit-header]
                               [--random-seed RANDOM_SEED] [-v LOGLEVEL]
                               [--log-config-filename LOG_CONFIG_FILENAME]
                               [--tracing {function}] [-? ?] [-I STDIN]
                               [-L STDLOG] [-E STDERR] [-S STDOUT]