Bed.py - Tools for working with bed files
This module contains methods for working with bed formatted files.
The principal class is Bed, which represents bed formatted entries. The method iterator() iterates over a bed file and is aware of UCSC track information that might be embedded in the file. Additional functions can process intervals (merge(), binIntervals(), setName(), etc).
The method readAndIndex() can build an in-memory index of a bed file for quick cross-referencing.
Reference
-
class Bed.Bed
Bases: object
an interval in bed format.
Coordinates are represented as 0-based, half-open intervals.
Fields in the record can be accessed as attributes or through a dictionary type access:
    print(b.contig)
    print(b["contig"])
Bed-formatted records can have a variable number of columns, with a minimum of 3. Accessing an optional attribute that is not present will raise an IndexError.
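The access pattern described above can be illustrated with a minimal sketch. `MiniBed` is a hypothetical stand-in written for this example and is not part of this module; it only mimics the documented behaviour (attribute and dictionary-style access, an IndexError for optional fields that are absent). The column order is the standard one for the first six bed fields.

```python
class MiniBed(object):
    """Hypothetical stand-in mimicking the documented access pattern."""

    # Standard order of the first six bed columns.
    FIELDS = ("contig", "start", "end", "name", "score", "strand")

    def __init__(self, line):
        self.fields = line.rstrip("\n").split("\t")
        if len(self.fields) < 3:
            raise ValueError("a bed record needs at least 3 columns")

    def __getitem__(self, key):
        idx = self.FIELDS.index(key)
        value = self.fields[idx]  # raises IndexError if the column is absent
        return int(value) if key in ("start", "end") else value

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        if key in self.FIELDS:
            return self[key]
        raise AttributeError(key)


b = MiniBed("chr1\t100\t200")
print(b.contig, b["contig"])
# With 0-based, half-open coordinates the length is simply end - start.
print(b.end - b.start)
try:
    b.name
except IndexError:
    print("name is not present in a 3-column record")
```

Because the coordinates are half-open, the interval above covers bases 100 to 199 and two intervals (100, 200) and (200, 300) do not overlap.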
-
contig (string) – Chromosome/contig.
-
name (string) – Name of the interval (optional).
-
strand (char) – Strand of the interval (optional).
-
thickStart
-
thickEnd
-
itemRGB
-
blockSizes (string) – Comma-separated list of sizes of the blocks (BED12).
-
blockStarts (string) – Comma-separated list of start positions of the blocks (BED12).
-
copy()
Return a new Bed object that is a copy of this one.
-
fromGTF(gff, is_gtf=False, name=None)
Fill fields from a GTF-formatted entry.
- Parameters
gff (a gff entry) – The object should contain the fields contig, start and end in 0-based, half-open coordinates.
name (bool) – If given, attempt to set the name attribute of the interval from this attribute of the gff object, such as gene_id or transcript_id.
-
toIntervals()
Return intervals for BED12 entries.
If the entry is not BED12, the whole region will be returned.
- Returns
intervals – A list of tuples (start,end) with the block coordinates in the Bed entry.
- Return type
list
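As a rough sketch of the conversion described above, the hypothetical helper `blocks_to_intervals` below (not a function of this module) expands the blockSizes and blockStarts strings into (start, end) tuples. It relies on the BED12 convention that block starts are offsets relative to the start of the interval, and it falls back to the whole region when no block information is given.

```python
def blocks_to_intervals(start, end, block_sizes=None, block_starts=None):
    """Hypothetical helper: return a list of (start, end) tuples."""
    if not block_sizes or not block_starts:
        # Not a BED12 entry: the whole region is the only interval.
        return [(start, end)]
    # BED12 lists are comma-separated and may carry a trailing comma.
    sizes = [int(x) for x in block_sizes.split(",") if x]
    offsets = [int(x) for x in block_starts.split(",") if x]
    # Block starts are relative to the start of the interval.
    return [(start + o, start + o + s) for o, s in zip(offsets, sizes)]


blocked = blocks_to_intervals(1000, 2000, "100,200,", "0,800,")
unblocked = blocks_to_intervals(1000, 2000)
print(blocked)
print(unblocked)
```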
-
fromIntervals(intervals)
Fill coordinates from a list of intervals.
If multiple intervals are provided and the entry is BED12, then the blocks are set automatically.
- Parameters
intervals (list) – List of tuples (start, end) with block coordinates.
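The reverse direction can be sketched in the same way. The hypothetical helper `intervals_to_blocks` (an illustration only, not this module's implementation) derives the overall start and end, the block count, and the comma-separated blockSizes and blockStarts strings from a list of (start, end) tuples, again assuming that block starts are stored relative to the interval start.

```python
def intervals_to_blocks(intervals):
    """Hypothetical helper: derive BED12 block fields from (start, end) tuples."""
    intervals = sorted(intervals)
    start = intervals[0][0]
    end = intervals[-1][1]
    sizes = ",".join(str(e - s) for s, e in intervals)
    # Offsets are relative to the start of the whole interval.
    offsets = ",".join(str(s - start) for s, e in intervals)
    return start, end, len(intervals), sizes, offsets


fields = intervals_to_blocks([(1800, 2000), (1000, 1100)])
print(fields)
```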
-
property columns
Return the number of columns in the bed entry.
-
-
Bed.iterator(infile)
Iterate over a bed formatted file.
Comments and empty lines are ignored. The iterator is track aware and will set the track attribute for the Bed objects it yields.
- Parameters
infile (File) –
- Yields
bed – Bed object
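The behaviour described for this function (skipping comments and empty lines, remembering the most recent track line) can be sketched without the module itself. `iterate_bed_lines` is a hypothetical stand-in; it yields plain tuples rather than Bed objects and keeps the raw track line, whereas the real function may parse the track definition differently.

```python
import io


def iterate_bed_lines(infile):
    """Hypothetical stand-in: yield (track, fields) for each data line."""
    track = None
    for line in infile:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore empty lines and comments
        if line.startswith("track"):
            track = line  # remember the current track for later records
            continue
        yield track, line.split("\t")


example = io.StringIO(
    "# a comment\n"
    "track name=peaks\n"
    "chr1\t10\t20\n"
    "\n"
    "chr1\t30\t40\n"
)
records = list(iterate_bed_lines(example))
for track, fields in records:
    print(track, fields)
```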
-
Bed.bed_iterator(infile)
Deprecated, use iterator().
-
Bed.setName(iterator)
Yield bed entries in which name is set to the record number if unset.
- Yields
bed – Bed object
-
Bed.grouped_iterator(iterator)
Yield bed results grouped by track.
Note that the iterator supplied needs to be sorted by the track attribute. This is usually the case in bed formatted files.
- Yields
bed – Bed object
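The sorting requirement can be illustrated with `itertools.groupby`, which only groups consecutive items that share a key. Whether this function is implemented with `groupby` is an assumption made for this sketch; the records are plain tuples invented for the example.

```python
from itertools import groupby

# (track, contig, start, end) tuples, already sorted by track.
records = [
    ("trackA", "chr1", 10, 20),
    ("trackA", "chr1", 30, 40),
    ("trackB", "chr2", 5, 15),
]

# groupby starts a new group whenever the key changes, so unsorted
# input would split one track into several groups.
grouped = [
    (track, list(group))
    for track, group in groupby(records, key=lambda r: r[0])
]
for track, group in grouped:
    print(track, len(group))
```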
-
Bed.blocked_iterator(iterator)
Yield blocked bed results.
Intervals with the same name are merged into a single entry. This method can be used to convert BED6 formatted entries to BED12. Note that the input iterator needs to be sorted by bed name.
- Yields
bed – Bed object
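A minimal sketch of this kind of merging, assuming plain (contig, start, end, name) tuples instead of Bed objects: consecutive records with the same name are collapsed into one entry whose blocks are the original intervals. The helper name `block_by_name` and the tuple layout are hypothetical.

```python
from itertools import groupby


def block_by_name(records):
    """Hypothetical helper: records are (contig, start, end, name), sorted by name."""
    for name, group in groupby(records, key=lambda r: r[3]):
        group = sorted(group, key=lambda r: r[1])
        contig = group[0][0]
        start = group[0][1]
        end = max(r[2] for r in group)
        sizes = [r[2] - r[1] for r in group]
        # Block starts are relative to the start of the merged entry.
        offsets = [r[1] - start for r in group]
        yield contig, start, end, name, len(group), sizes, offsets


exons = [
    ("chr1", 100, 150, "tx1"),
    ("chr1", 300, 400, "tx1"),
    ("chr1", 900, 950, "tx2"),
]
blocked = list(block_by_name(exons))
for entry in blocked:
    print(entry)
```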
-
Bed.readAndIndex(infile, with_values=False, per_track=False)
Read and index a bed formatted file in infile.
The index is not strand-aware.
- Parameters
- Returns
index – A dictionary of nested containment lists (NCL). Each key is a contig. If per_track is set, the dictionary has an additional first level for the track.
- Return type
-
Bed.binIntervals(iterator, num_bins=5, method='equal-bases', bin_edges=None)
Merge adjacent intervals by the score attribute.
This method takes all the intervals in the collection and builds a histogram of their scores. The partition into bins can use one of the following methods:
- equal-bases
merge intervals such that each bin contains an equal number of bases
- equal-intervals
merge intervals such that each bin contains an equal number of intervals
This method requires the fifth field (score) of the bed input file to be present.
- Parameters
iterator – Iterator yielding bed intervals
num_bins (int) – Number of bins to create in the histogram
method (string) – Binning method
bin_edges (list) – List of bin edges. These take precedence over method.
- Returns
intervals (list) – list of intervals (Bed)
bin_edges (list) – list of bin edges
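To make the equal-intervals idea concrete, the sketch below picks bin edges from the sorted scores so that each bin receives roughly the same number of intervals, then assigns each score to a bin. The helpers `equal_interval_edges` and `assign_bin` are hypothetical and only illustrate the partitioning step; how this function actually computes its edges and merges adjacent intervals is not shown here.

```python
import bisect


def equal_interval_edges(scores, num_bins):
    """Hypothetical helper: edges so bins hold roughly equal numbers of scores."""
    ordered = sorted(scores)
    n = len(ordered)
    # Lower edge of each bin, followed by the maximum as the final edge.
    edges = [ordered[(i * n) // num_bins] for i in range(num_bins)]
    edges.append(ordered[-1])
    return edges


def assign_bin(score, edges):
    """Return the index of the bin containing score (the last bin is closed)."""
    idx = bisect.bisect_right(edges, score) - 1
    return min(idx, len(edges) - 2)


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
edges = equal_interval_edges(scores, 5)
bins = [assign_bin(s, edges) for s in scores]
print(edges)
print(bins)
```

An equal-bases variant would weight each score by the length of its interval (end - start) before choosing the edges.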
-
Bed.merge(iterator)
Merge overlapping intervals and return a list of merged intervals.
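The general technique can be sketched as a sort followed by a single pass. `merge_overlapping` is a hypothetical helper working on plain (contig, start, end) tuples; it treats coordinates as half-open and, as an assumption of this sketch, keeps intervals that merely abut as separate entries. The real function may differ on that point.

```python
def merge_overlapping(intervals):
    """Hypothetical helper: intervals are half-open (contig, start, end) tuples."""
    merged = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start < merged[-1][2]:
            # Overlaps the previous interval on the same contig: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([contig, start, end])
    return [tuple(m) for m in merged]


result = merge_overlapping([
    ("chr1", 15, 30),
    ("chr1", 10, 20),
    ("chr1", 30, 40),
    ("chr2", 5, 8),
])
print(result)
```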