medip_merge_intervals.py - merge differentially methylated regions

Tags

Python

Purpose

This script takes the output of DESeq or edgeR and merges adjacent intervals that show a similar expression change.

Input is data like this:

contig start end treatment_name  treatment_mean  treatment_std   control_name    control_mean    control_std     pvalue  qvalue  l2fold  fold    significant     status
chr1 10000 11000        CD14    32.9785173324   0       CD4     41.7117152603   0       0.199805206526  1.0     0.338926100945  1.26481475319   0       OK
chr1 14000 15000        CD14    9.32978709019   0       CD4     9.31489982941   0       1.0     1.0     -0.00230390372974       0.998404330063  0       OK
chr1 15000 16000        CD14    9.04603350905   0       CD4     9.01484414416   0       1.0     1.0     -0.00498279072069       0.996552150193  0       OK
chr1 16000 17000        CD14    0.457565479197  0       CD4     0.14910378845   0       0.677265200643  1.0     -1.61766129852  0.325863281276  0       OK

The second and third windows would be merged because:

  1. Their methylation levels are within 10% of each other.

  2. Neither of them is differentially methylated.
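The merge test described above can be sketched in Python as follows. This is a minimal illustration, not the script's actual code: the function names `within_tolerance` and `can_merge`, the dictionary-based window representation, and the reading of "within 10%" as a relative difference against the larger of the two means are all assumptions. Adjacency is also assumed to mean that one window ends exactly where the next begins.

```python
def within_tolerance(a, b, tolerance=0.1):
    """True if a and b differ by at most `tolerance` relative to the larger one."""
    largest = max(abs(a), abs(b))
    if largest == 0:
        return True
    return abs(a - b) <= tolerance * largest


def can_merge(left, right, tolerance=0.1):
    """Decide whether two windows (hypothetical dict records) may be merged."""
    # Assumption: "adjacent" means same contig and touching coordinates.
    adjacent = (left["contig"] == right["contig"]
                and left["end"] == right["start"])
    # Assumption: windows must share the same significance call.
    same_call = left["significant"] == right["significant"]
    similar = (
        within_tolerance(left["treatment_mean"], right["treatment_mean"], tolerance)
        and within_tolerance(left["control_mean"], right["control_mean"], tolerance)
    )
    return adjacent and same_call and similar


# The second and third windows of the example input above.
second = {"contig": "chr1", "start": 14000, "end": 15000,
          "treatment_mean": 9.32978709019, "control_mean": 9.31489982941,
          "significant": 0}
third = {"contig": "chr1", "start": 15000, "end": 16000,
         "treatment_mean": 9.04603350905, "control_mean": 9.01484414416,
         "significant": 0}
print(can_merge(second, third))
```

Under these assumptions the third and fourth windows would not be merged, since their mean values differ by far more than 10%.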

When windows are merged, their values are aggregated as follows:

  • mean values: average

  • std values: max

  • pvalue: max

  • qvalue: max

  • fold: min/max (depending on enrichment/depletion)

  • l2fold: min/max (depending on enrichment/depletion)
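The aggregation rules listed above could look roughly like this in Python. It is a sketch under stated assumptions: the function name `aggregate` and the dict-based records are hypothetical, and the text does not say which of min or max applies to enrichment, so the sketch assumes the most extreme value in the direction of change (max for enrichment, min for depletion).

```python
from statistics import mean


def aggregate(windows):
    """Collapse a run of mergeable windows (list of dicts) into one record."""
    # Assumption: the run counts as enriched if its mean l2fold is non-negative.
    enriched = mean(w["l2fold"] for w in windows) >= 0
    # Assumption: max for enrichment, min for depletion.
    pick = max if enriched else min
    return {
        "contig": windows[0]["contig"],
        "start": windows[0]["start"],
        "end": windows[-1]["end"],
        # mean values: average
        "treatment_mean": mean(w["treatment_mean"] for w in windows),
        "control_mean": mean(w["control_mean"] for w in windows),
        # std, pvalue and qvalue: max
        "treatment_std": max(w["treatment_std"] for w in windows),
        "control_std": max(w["control_std"] for w in windows),
        "pvalue": max(w["pvalue"] for w in windows),
        "qvalue": max(w["qvalue"] for w in windows),
        # fold and l2fold: min or max, depending on direction
        "fold": pick(w["fold"] for w in windows),
        "l2fold": pick(w["l2fold"] for w in windows),
    }
```

Taking the maximum of the p-values and q-values is the conservative choice: the merged interval is never reported as more significant than its least significant member.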

The script outputs BED files with intervals that are potentially activated in one of the conditions. Windows with a positive fold change are collected in the treatment file, while windows with a negative fold change are collected in the control file.

For methylation analysis, it can be more informative to report windows that are depleted of signal rather than enriched. If the option --invert is given, windows with a negative l2fold change are therefore labeled as treatment: less methylation means that the region is “active” in the treatment condition.
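The labeling rule, including the effect of --invert, can be sketched as a small Python function. The name `label_window` and its parameters are hypothetical, and returning `None` for a fold change of exactly zero is an assumption, since the text does not cover that case.

```python
def label_window(l2fold, treatment, control, invert=False):
    """Return the condition a window is assigned to, based on the sign of l2fold."""
    if l2fold == 0:
        # Assumption: windows without any change are not assigned.
        return None
    positive = l2fold > 0
    if invert:
        # With --invert, depleted windows are reported for the treatment.
        positive = not positive
    return treatment if positive else control


print(label_window(-1.61766129852, "CD14", "CD4"))
print(label_window(-1.61766129852, "CD14", "CD4", invert=True))
```

The first call assigns the window to the control (CD4), the second to the treatment (CD14).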

Note that the input is assumed to be sorted by coordinate.
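Because the merge only looks at neighbouring rows, unsorted input would silently yield wrong intervals. A quick pre-check along the following lines may help; the function name `is_coordinate_sorted` is hypothetical, and the sketch assumes that "sorted by coordinate" means rows of a contig are contiguous and ordered by start position.

```python
def is_coordinate_sorted(windows):
    """Check an iterable of (contig, start) pairs for coordinate order."""
    seen = set()
    previous = None
    for contig, start in windows:
        if previous is not None and contig == previous[0]:
            # Same contig: start positions must not decrease.
            if start < previous[1]:
                return False
        else:
            # New contig: it must not have appeared earlier in the file.
            if contig in seen:
                return False
            seen.add(contig)
        previous = (contig, start)
    return True


print(is_coordinate_sorted([("chr1", 10000), ("chr1", 14000), ("chr2", 500)]))
```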

Usage

Type:

python medip_merge_intervals.py --help

for command line help.

Command line options

usage: medip-merge-intervals [-h] [--version] [-o MIN_OVERLAP]
                             [-w PATTERN_WINDOW] [-i] [--timeit TIMEIT_FILE]
                             [--timeit-name TIMEIT_NAME] [--timeit-header]
                             [--random-seed RANDOM_SEED] [-v LOGLEVEL]
                             [--log-config-filename LOG_CONFIG_FILENAME]
                             [--tracing {function}] [-? ?]
                             [-P OUTPUT_FILENAME_PATTERN] [-F] [-I STDIN]
                             [-L STDLOG] [-E STDERR] [-S STDOUT]