IndexedGenome.py - Random access to interval lists
This module provides a consistent front-end to various interval containers.
Two implementations are available:
- NCL
Nested containment lists as described in http://bioinformatics.oxfordjournals.org/content/23/11/1386.short. The implementation was taken from pygr.
- quicksect
Quicksect algorithm used in Galaxy. This requires python.bx to be installed. The benefit of quicksect is that it also allows quick retrieval of the intervals that are closest before or after a query.
The principal class is IndexedGenome, which uses NCL and stores a value associated with each interval. Quicksect is equivalent to IndexedGenome but uses quicksect. Simple is a light-weight version of IndexedGenome that does not store a value and thus saves space.
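The relationship between the three classes can be pictured as one front-end whose per-contig container is selected by a class attribute. The following is a minimal pure-Python sketch of that pattern, not the module's implementation: `NaiveIntervals`, `NaiveIntervalsNoValue`, `SketchIndex` and `SketchSimple` are hypothetical names, the linear scan merely stands in for NCL, and half-open coordinates are an assumption.

```python
class NaiveIntervals(object):
    """Hypothetical stand-in for an interval container (linear scan, not NCL)."""

    def __init__(self):
        self._items = []

    def add(self, start, end, value=None):
        self._items.append((start, end, value))

    def find(self, start, end):
        # Assumed half-open coordinates: [s, e) overlaps [start, end)
        # when each interval begins before the other one ends.
        return [(s, e, v) for s, e, v in self._items
                if s < end and start < e]


class NaiveIntervalsNoValue(NaiveIntervals):
    """Hypothetical container that keeps coordinates only."""

    def add(self, start, end, value=None):
        self._items.append((start, end))

    def find(self, start, end):
        return [(s, e) for s, e in self._items if s < end and start < e]


class SketchIndex(object):
    """Hypothetical front-end: one container per contig."""

    index_factory = NaiveIntervals  # subclasses swap the container here

    def __init__(self):
        self._by_contig = {}

    def add(self, contig, start, end, value=None):
        if contig not in self._by_contig:
            self._by_contig[contig] = self.index_factory()
        self._by_contig[contig].add(start, end, value)

    def get(self, contig, start, end):
        if contig not in self._by_contig:
            return []
        return self._by_contig[contig].find(start, end)


class SketchSimple(SketchIndex):
    """Hypothetical light-weight variant that stores no values."""

    index_factory = NaiveIntervalsNoValue


index = SketchIndex()
index.add("chr1", 100, 200, "geneA")
index.add("chr1", 500, 600, "geneB")
hits = index.get("chr1", 150, 550)

simple = SketchSimple()
simple.add("chr1", 100, 200)
simple_hits = simple.get("chr1", 150, 550)
```

Keeping the container choice in a single class attribute means a variant only has to override that attribute, which mirrors the `index_factory` entries listed in the reference.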
The basic usage is:
    from IndexedGenome import IndexedGenome

    index = IndexedGenome()
    # intervals: any iterable of (contig, start, end, value) tuples
    for contig, start, end, value in intervals:
        index.add(contig, start, end, value)

    print(index.contains("chr1", 1000, 2000))
    print(index.get("chr1", 10000, 20000))
The index is built in memory.
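The usage example expects `intervals` to yield `(contig, start, end, value)` tuples, and the module description does not say where they come from. One possible way to produce them is sketched below, assuming tab-separated, BED-like lines with the columns contig, start, end and an optional name. The helper `iter_intervals` and the sample lines are hypothetical and not part of the module.

```python
def iter_intervals(lines):
    """Yield (contig, start, end, value) from tab-separated text lines.

    Hypothetical helper; assumes columns contig, start, end, name.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        # The value column is optional in this sketch.
        value = fields[3] if len(fields) > 3 else None
        yield contig, start, end, value


example_lines = [
    "# contig\tstart\tend\tname",
    "chr1\t1000\t2000\tgeneA",
    "chr1\t15000\t18000\tgeneB",
    "chr2\t300\t400",
]
intervals = list(iter_intervals(example_lines))
```

Because the index is built in memory, a generator like this lets the file be read once and streamed straight into `add` without holding a second copy of the data.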
Reference
class IndexedGenome.IndexedGenome
    Bases: object

    Genome with indexed intervals.

    index_factory
        alias of cgat.NCL.NCL

    get(contig, start, end)
        Return intervals overlapping with key.

class IndexedGenome.Simple(*args, **kwargs)
    Bases: IndexedGenome.IndexedGenome

    Index intervals without storing a value.

    index_factory
        alias of cgat.NCL.NCLSimple

class IndexedGenome.Quicksect(*args, **kwargs)
    Bases: IndexedGenome.IndexedGenome

    Index intervals using quicksect.

    Permits finding closest interval in case there is no overlap.

    get(contig, start, end)
        Return intervals overlapping with key.

    before(contig, start, end, num_intervals=1, max_dist=2500)
        Get closest interval before start.

    after(contig, start, end, num_intervals=1, max_dist=2500)
        Get closest interval after end.
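
To make the idea of "closest before" and "closest after" concrete, here is a small pure-Python sketch. It does not use quicksect and is not the module's code: the functions `closest_before` and `closest_after` are hypothetical, and the reading of `num_intervals` (maximum number of hits returned) and `max_dist` (maximum gap between query and hit) is an assumption based on the parameter names only.

```python
def closest_before(intervals, start, num_intervals=1, max_dist=2500):
    """Return up to num_intervals intervals ending at or before start.

    Assumed semantics, not taken from the module: a candidate must end
    within max_dist of start; the nearest candidates come first.
    """
    candidates = [(s, e) for s, e in intervals
                  if e <= start and start - e <= max_dist]
    candidates.sort(key=lambda iv: start - iv[1])
    return candidates[:num_intervals]


def closest_after(intervals, end, num_intervals=1, max_dist=2500):
    """Return up to num_intervals intervals starting at or after end.

    Same assumed semantics as closest_before, mirrored.
    """
    candidates = [(s, e) for s, e in intervals
                  if s >= end and s - end <= max_dist]
    candidates.sort(key=lambda iv: iv[0] - end)
    return candidates[:num_intervals]


features = [(100, 200), (900, 950), (3000, 3100), (9000, 9500)]
query_start, query_end = 1000, 2000

nearest_before = closest_before(features, query_start)
nearest_after = closest_after(features, query_end)
two_before = closest_before(features, query_start, num_intervals=2)
far_after = closest_after(features, 3200)  # nothing within max_dist
```

The linear scan is only for illustration. The point of a tree-based index such as quicksect is to answer the same kind of question without visiting every interval.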