diff_bam.py - compare multiple bam files against each other¶
- Tags
Genomics NGS BAM Comparison
Purpose¶
Compare reads in multiple BAM files against each other.
Note
BAM files need to be sorted by read name. samtools sort does NOT work as it uses a custom comparison function (strnum_cmp) that is incompatible with the standard lexicographical order in python. See the example below on how to get sorted files.
This script is for validation purposes. It might take a while for large BAM files.
Usage¶
If you have two sorted sam or bam formatted files, type:
cgat diff_bam a.bam b.bam > out
If they are not sorted, you can use samtools sort to do an inplace sort:
cgat diff_bam <(samtools view -h a.bam | hsort 0 -k1,1)
<(samtools view -h b.bam | hsort 0 -k1,1)
The samtools -h
option outputs the header, and the hsort command
sorts without disturbing the header.
An example output looks like this:
read |
nlocations |
nmatched |
file1_nh |
file2_nh |
file1_loc |
file2_loc |
42YKVAAXX_HWI-EAS229_1:1:11:1659:174 |
1 |
2 |
2 |
2 |
0,0 |
0,0 |
42YKVAAXX_HWI-EAS229_1:1:11:166:1768 |
1 |
2 |
1 |
1 |
0 |
0 |
612UOAAXX_HWI-EAS229_1:1:97:147:1248 |
2 |
2 |
2 |
2 |
0,1 |
0,1 |
This reports for each read the number of locations that the read maps to in all files, the number of files that have matches found for the read. Then, for each file, it reports the number of matches and the locations it maps to (coded as integers, 0 the first location, 1 the second, …).
In the example above, the first read maps twice to 1 location in both files. This is a read occuring twice in the input file. The second read maps to the same one location in both files, while the third read maps to the two same locations in both input files.
Type:
python diff_bam.py --help
for command line help.
Documentation¶
For read counts to be correct the NH flag to be set correctly.
Command line options¶
usage: diff-bam [-h] [--version] [--header-names HEADERS]
[--timeit TIMEIT_FILE] [--timeit-name TIMEIT_NAME]
[--timeit-header] [--random-seed RANDOM_SEED] [-v LOGLEVEL]
[--log-config-filename LOG_CONFIG_FILENAME]
[--tracing {function}] [-? ?] [-P OUTPUT_FILENAME_PATTERN]
[-F] [-I STDIN] [-L STDLOG] [-E STDERR] [-S STDOUT]
diff-bam: error: argument -?: expected one argument