rnaseq_junction_bam2bam.py - convert mappings against junctions to genomic coordinates
Tags: Genomics NGS Genesets
Purpose
This script takes as input a BAM file of reads mapped against a junction database and outputs a BAM-formatted file in genomic coordinates.
The contigs should be of the format <chromosome>|<start>|<exon-end>-<exon-start>|<end>|<splice>|<strand>.
<start> - 0-based coordinate of first base
<exon-end> - 0-based coordinate of last base in exon
<exon-start> - 0-based coordinate of first base in exon
<end> - 0-based coordinate of base after last base
Strand can be either fwd or rev, though sequences in the database and coordinates are all on the forward strand.
For example, chr1|1244933|1244982-1245060|1245110|GTAG|fwd translates to the intron chr1:1244983-1245060 in Python coordinates (0-based, half-open).
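The contig naming scheme above can be illustrated with a minimal sketch. The names `Junction`, `parse_contig`, `intron` and `to_genomic` are hypothetical and are not taken from the script itself. The offset mapping additionally assumes that a junction contig's sequence is the upstream exon flank followed directly by the downstream exon flank, which is an inference from the field definitions and not something the documentation states.

```python
from typing import NamedTuple, Tuple


class Junction(NamedTuple):
    """Fields of a junction contig name (illustrative, not the script's own type)."""
    chromosome: str
    start: int       # 0-based coordinate of first base
    exon_end: int    # 0-based coordinate of last base in the upstream exon
    exon_start: int  # 0-based coordinate of first base in the downstream exon
    end: int         # 0-based coordinate of base after last base
    splice: str
    strand: str


def parse_contig(name: str) -> Junction:
    """Split <chromosome>|<start>|<exon-end>-<exon-start>|<end>|<splice>|<strand>."""
    chromosome, start, exons, end, splice, strand = name.split("|")
    exon_end, exon_start = exons.split("-")
    if strand not in ("fwd", "rev"):
        raise ValueError("unexpected strand: %r" % strand)
    return Junction(chromosome, int(start), int(exon_end),
                    int(exon_start), int(end), splice, strand)


def intron(junction: Junction) -> Tuple[int, int]:
    """Return the intron as a 0-based, half-open (start, end) interval."""
    return junction.exon_end + 1, junction.exon_start


def to_genomic(junction: Junction, offset: int) -> int:
    """Map a 0-based offset on the junction contig to a genomic coordinate.

    Assumption: the contig is the upstream flank immediately followed
    by the downstream flank, with the intron removed.
    """
    left_length = junction.exon_end - junction.start + 1
    total_length = left_length + (junction.end - junction.exon_start)
    if not 0 <= offset < total_length:
        raise ValueError("offset %d lies outside the contig" % offset)
    if offset < left_length:
        return junction.start + offset
    return junction.exon_start + (offset - left_length)


junction = parse_contig("chr1|1244933|1244982-1245060|1245110|GTAG|fwd")
intron_start, intron_end = intron(junction)
```

For the example contig, `intron(junction)` gives `(1244983, 1245060)`, matching the interval quoted above. Under the stated assumption, offsets 0 to 49 fall in the upstream flank and offsets 50 to 99 in the downstream flank.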
The input BAM file is expected to be sorted by read. Only the best matches are output for each read, where best is defined in terms of both the number of mismatches and the number of colour mismatches.
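The best-match selection can be sketched as follows. This is an illustration only: the `Alignment` record and `best_matches` function are hypothetical, and the documentation does not say how the two mismatch counts are combined. The sketch assumes they are compared as a pair, base mismatches first and colour mismatches second, and that sorting by read means all alignments of one read are adjacent in the input.

```python
from itertools import groupby
from typing import Iterable, Iterator, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal record for one alignment of a read."""
    read: str
    mismatches: int
    colour_mismatches: int
    contig: str


def best_matches(alignments: Iterable[Alignment]) -> Iterator[Alignment]:
    """Yield, for each read, every alignment that ties for the best score.

    Assumption: the score is the pair (mismatches, colour_mismatches),
    compared lexicographically, lower being better.
    """
    # groupby only merges adjacent records, hence the sorted-by-read requirement.
    for _, group in groupby(alignments, key=lambda a: a.read):
        candidates = list(group)
        best = min((a.mismatches, a.colour_mismatches) for a in candidates)
        for a in candidates:
            if (a.mismatches, a.colour_mismatches) == best:
                yield a


records = [
    Alignment("read1", 1, 0, "junctionA"),
    Alignment("read1", 0, 2, "junctionB"),
    Alignment("read1", 0, 1, "junctionC"),
    Alignment("read1", 0, 1, "junctionD"),
    Alignment("read2", 3, 0, "junctionA"),
]
kept = list(best_matches(records))
```

Here `kept` holds the two tied `read1` alignments on `junctionC` and `junctionD`, plus the single `read2` alignment. A different scoring rule, such as the sum of both counts, would only require changing the key used for `best`.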
Usage
Example:
cat input.bam | python rnaseq_junction_bam2bam.py - --log=log > output.bam
Type:
python rnaseq_junction_bam2bam.py --help
for command line help.
Command line options
usage: rnaseq-junction-bam2bam [-h] [--version] [-t FILENAME_GENOME_BAM]
[-s FILENAME_CONTIGS] [-o] [-i]
[-c REMOVE_CONTIGS] [-f] [-u]
[--timeit TIMEIT_FILE]
[--timeit-name TIMEIT_NAME] [--timeit-header]
[--random-seed RANDOM_SEED] [-v LOGLEVEL]
[--log-config-filename LOG_CONFIG_FILENAME]
[--tracing {function}] [-? ?] [-I STDIN]
[-L STDLOG] [-E STDERR] [-S STDOUT]