IndexedGenome.py - Random access to interval lists
This module provides a consistent front-end to various interval containers.
Two implementations are available:
- NCL
Nested containment lists as described in http://bioinformatics.oxfordjournals.org/content/23/11/1386.short. The implementation was taken from pygr.
- quicksect
The quicksect algorithm as used in Galaxy. This requires bx-python to be installed. The benefit of quicksect is that it also allows quick retrieval of the intervals that lie closest before or after a query.
The principal class is IndexedGenome, which uses NCL and stores
a value associated with each interval. Quicksect is equivalent
to IndexedGenome but uses quicksect. Simple is a
light-weight version of IndexedGenome that does not store a
value and thus saves space.
The basic usage is:
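from IndexedGenome import IndexedGenome
index = IndexedGenome()
# intervals is any iterable of (contig, start, end, value) tuples
for contig, start, end, value in intervals:
    index.add(contig, start, end, value)
# check for overlap, then retrieve the overlapping intervals
print(index.contains("chr1", 1000, 2000))
print(index.get("chr1", 10000, 20000))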
The index is built in memory.
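To make the call shape of add, contains and get concrete, the following is a minimal, self-contained sketch of an in-memory index. It is not the NCL implementation: ToyIndex is a hypothetical name, lookups are a plain linear scan, intervals are assumed to be half-open, and returning (start, end, value) tuples is an assumption made for illustration only.

```python
from collections import defaultdict


class ToyIndex(object):
    """Hypothetical stand-in for an interval index (linear scan, not NCL)."""

    def __init__(self):
        # one list of (start, end, value) tuples per contig
        self._intervals = defaultdict(list)

    def add(self, contig, start, end, value):
        self._intervals[contig].append((start, end, value))

    def get(self, contig, start, end):
        """Return all stored intervals overlapping [start, end)."""
        # dict.get avoids creating an entry when an unknown contig is queried
        return [
            (s, e, v)
            for s, e, v in self._intervals.get(contig, [])
            if s < end and start < e  # half-open overlap test (assumption)
        ]

    def contains(self, contig, start, end):
        """Return True if at least one stored interval overlaps the query."""
        return len(self.get(contig, start, end)) > 0


intervals = [
    ("chr1", 500, 1500, "geneA"),
    ("chr1", 3000, 4000, "geneB"),
    ("chr2", 100, 200, "geneC"),
]

index = ToyIndex()
for contig, start, end, value in intervals:
    index.add(contig, start, end, value)

print(index.contains("chr1", 1000, 2000))
print(index.get("chr1", 10000, 20000))
```

With these three toy intervals the first query overlaps geneA and prints True, while the second query hits nothing and prints an empty list. A real index replaces the linear scan with a structure such as a nested containment list so that lookups stay fast for large interval sets.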
Reference
class IndexedGenome.IndexedGenome
Bases: object
Genome with indexed intervals.
- index_factory
alias of cgat.NCL.NCL
- get(contig, start, end)
return intervals overlapping with key.
class IndexedGenome.Simple(*args, **kwargs)
Bases: IndexedGenome.IndexedGenome
index intervals without storing a value.
- index_factory
alias of cgat.NCL.NCLSimple
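The index_factory attribute suggests that each class selects its container type through a class attribute that subclasses override. The sketch below illustrates that general pattern only. All names (ListContainer, PositionsOnlyContainer, ToyGenome, ToySimple) are hypothetical, and creating one container per contig on first use is an assumption, not a description of the actual module.

```python
class ListContainer(object):
    """Hypothetical container that stores (start, end, value) tuples."""

    def __init__(self):
        self.items = []

    def add(self, start, end, value=None):
        self.items.append((start, end, value))


class PositionsOnlyContainer(ListContainer):
    """Hypothetical container that drops the value to save space."""

    def add(self, start, end, value=None):
        self.items.append((start, end))


class ToyGenome(object):
    # subclasses override this attribute to change the container type
    index_factory = ListContainer

    def __init__(self):
        self.contigs = {}

    def add(self, contig, start, end, value=None):
        if contig not in self.contigs:
            # assumption: one container per contig, created on first use
            self.contigs[contig] = self.index_factory()
        self.contigs[contig].add(start, end, value)


class ToySimple(ToyGenome):
    # same front-end, different container
    index_factory = PositionsOnlyContainer


full = ToyGenome()
full.add("chr1", 10, 20, "exon")

slim = ToySimple()
slim.add("chr1", 10, 20)

print(full.contigs["chr1"].items)
print(slim.contigs["chr1"].items)
```

The front-end code in ToyGenome never names a concrete container, so a subclass changes the storage behaviour by overriding a single attribute.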
class IndexedGenome.Quicksect(*args, **kwargs)
Bases: IndexedGenome.IndexedGenome
index intervals using quicksect.
Permits finding closest interval in case there is no overlap.
- get(contig, start, end)
return intervals overlapping with key.
- before(contig, start, end, num_intervals=1, max_dist=2500)
get closest interval before start.
- after(contig, start, end, num_intervals=1, max_dist=2500)
get closest interval after end.
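The roles of num_intervals and max_dist in before and after can be illustrated with a small sketch over a single contig. This is a linear-scan illustration, not the quicksect algorithm. The function names closest_before and closest_after are hypothetical, and the exact semantics shown here (distance measured from the query edge to the nearest edge of the candidate, results ordered nearest first) are assumptions.

```python
def closest_before(intervals, start, num_intervals=1, max_dist=2500):
    """Intervals ending at or before `start`, within max_dist, nearest first."""
    candidates = [
        (s, e) for s, e in intervals
        if e <= start and start - e <= max_dist
    ]
    # nearest first: the largest end coordinate is closest to the query start
    candidates.sort(key=lambda iv: iv[1], reverse=True)
    return candidates[:num_intervals]


def closest_after(intervals, end, num_intervals=1, max_dist=2500):
    """Intervals starting at or after `end`, within max_dist, nearest first."""
    candidates = [
        (s, e) for s, e in intervals
        if s >= end and s - end <= max_dist
    ]
    # nearest first: the smallest start coordinate is closest to the query end
    candidates.sort(key=lambda iv: iv[0])
    return candidates[:num_intervals]


# toy (start, end) intervals on one contig
chr1 = [(100, 200), (1000, 1100), (5000, 5100), (9000, 9100)]

print(closest_before(chr1, 3000))
print(closest_after(chr1, 3500))
print(closest_after(chr1, 6000))
print(closest_before(chr1, 3000, num_intervals=2, max_dist=5000))
```

In this toy data the query starting at 3000 finds (1000, 1100) as its nearest upstream interval, and the query ending at 6000 finds nothing downstream because the next interval starts 3000 bases away, beyond the default max_dist of 2500. Raising max_dist and num_intervals returns both upstream intervals, nearest first.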