bam_vs_gtf.py - compare bam file against gene set

Tags

Genomics NGS Genesets BAM GTF Summary

Purpose

Compares RNA-Seq reads in a BAM file against reference exons to quantify exon overrun and underrun.

Documentation

This script is for validation purposes:
  • Exon overrun should be minimal - reads should not extend beyond known exons.

  • Spliced reads should link known exons.

Please note:
  • For unspliced reads, any bases extending beyond exon boundaries are counted.

  • For spliced reads, both parts of the read are examined for overlap.

    As a consequence, counts are doubled for spliced reads.

  • The script requires a list of non-overlapping exons as input.

  • For read counts to be correct, the NH (number of hits) tag needs to be set correctly.
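To make the notion of overrun concrete, here is a minimal sketch of how bases extending beyond exon boundaries could be counted for a single unspliced read. It is not the script's actual implementation: the function name `overrun_bases`, the half-open `(start, end)` coordinates, and the plain list of exons are assumptions made for illustration only.

```python
def overrun_bases(read_start, read_end, exons):
    """Count the bases of a read that fall outside the given exons.

    Hypothetical helper for illustration; not part of bam_vs_gtf.py.
    `exons` is a list of non-overlapping (start, end) intervals on one
    chromosome, using half-open coordinates (an assumption).
    """
    covered = 0
    for start, end in exons:
        # Length of the intersection between the read and this exon.
        overlap = min(read_end, end) - max(read_start, start)
        if overlap > 0:
            covered += overlap
    # Because the exons do not overlap, no read base is counted twice.
    return (read_end - read_start) - covered


exons = [(100, 200), (300, 400)]
print(overrun_bases(120, 180, exons))  # read inside an exon: 0
print(overrun_bases(190, 210, exons))  # read extends 10 bases past the exon: 10
```

The sketch also shows why the exons must not overlap: if they did, bases covered by two exons would be counted twice and the overrun would be underestimated.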

Usage

Example:

# Preview the BAM file using Samtools view
samtools view tests/bam_vs_gtf.py/small.bam | head
# Pipe input bam to script and specify gtf file as argument
cat tests/bam_vs_gtf.py/small.bam | cgat bam_vs_gtf.py --gtf-file=tests/bam_vs_gtf.py/hg19.chr19.gtf.gz

category                counts
spliced_bothoverlap     0
unspliced_overlap       0
unspliced_nooverrun     0
unspliced               207
unspliced_nooverlap     207
spliced_overrun         0
spliced_halfoverlap     0
spliced_exact           0
spliced_inexact         0
unspliced_overrun       0
spliced                 18
spliced_underrun        0
mapped                  225
unmapped                0
input                   225
spliced_nooverlap       18
spliced_ignored         0
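The counts above can be sanity-checked programmatically. The following is a minimal sketch that assumes the script writes one category and one count per line, separated by whitespace; the function name `parse_counts` is hypothetical. The two totals checked here hold for the example output shown above, and whether they hold for every input is an assumption.

```python
def parse_counts(text):
    """Parse two-column 'category counts' output into a dict.

    Hypothetical helper; assumes whitespace-separated columns and
    integer counts, as in the example output.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines and the header row.
        if len(fields) != 2 or fields[0] == "category":
            continue
        counts[fields[0]] = int(fields[1])
    return counts


example = """category\tcounts
unspliced\t207
spliced\t18
mapped\t225
unmapped\t0
input\t225
"""

counts = parse_counts(example)
# In the example, spliced and unspliced reads add up to the mapped reads,
# and mapped plus unmapped reads add up to the input reads.
assert counts["spliced"] + counts["unspliced"] == counts["mapped"]
assert counts["mapped"] + counts["unmapped"] == counts["input"]
print(counts)
```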

Type:

python bam_vs_gtf.py --help

for command line help.

Command line options

filename-exons / filename-gtf: a GTF-formatted file containing the genomic coordinates of a set of non-overlapping exons, for example from a reference genome annotation database (Ensembl, UCSC, etc.).
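Reference annotations often contain overlapping exons from different transcripts, so they usually need to be flattened before use. The sketch below shows one possible way to merge overlapping intervals on a single chromosome. The function name `merge_exons` and the half-open `(start, end)` tuples are assumptions for illustration (GTF itself uses 1-based, inclusive coordinates), and this is not necessarily how the CGAT tools prepare the exon set.

```python
def merge_exons(exons):
    """Merge overlapping or touching intervals into a sorted,
    non-overlapping list.

    Hypothetical helper; `exons` holds (start, end) tuples for one
    chromosome in half-open coordinates (an assumption).
    """
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


print(merge_exons([(300, 400), (100, 200), (150, 250)]))
# [(100, 250), (300, 400)]
```

In practice the merge would be applied separately for each chromosome, and the result written back out as GTF before being passed to the script.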

usage: bam-vs-gtf [-h] [--version] [-e gtf] [--timeit TIMEIT_FILE]
                  [--timeit-name TIMEIT_NAME] [--timeit-header]
                  [--random-seed RANDOM_SEED] [-v LOGLEVEL]
                  [--log-config-filename LOG_CONFIG_FILENAME]
                  [--tracing {function}] [-? ?] [-P OUTPUT_FILENAME_PATTERN]
                  [-F] [-I STDIN] [-L STDLOG] [-E STDERR] [-S STDOUT]