Bed.py - Tools for working with bed files

This module contains methods for working with bed formatted files.

Note

Another way to access the information in bed formatted files is through pysam.

The principal class is Bed, which represents bed formatted entries. The function iterator() iterates over a bed file and is aware of UCSC track information that might be embedded in the file. Additional functions process intervals (merge(), binIntervals(), setName(), etc.).

The method readAndIndex() can build an in-memory index of a bed-file for quick cross-referencing.

Reference

class Bed.Bed

Bases: object

an interval in bed format.

Coordinates are represented as 0-based, half-open intervals.

Fields in the record can be accessed as attributes or through a dictionary type access:

print(b.contig)
print(b["contig"])

Bed-formatted records can have a variable number of columns with a minimum of 3. Accessing an optional attribute that is not present will raise an IndexError.
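The access pattern described above can be sketched with a small stand-in class. MiniBed is hypothetical and is not part of this module; it only assumes tab-separated input and the standard order of the first six BED columns, and the real Bed class may be implemented differently.

```python
# Hypothetical stand-in for illustration only; not the module's Bed class.
class MiniBed(object):
    # Standard order of the first six BED columns.
    FIELDS = ("contig", "start", "end", "name", "score", "strand")

    def __init__(self, line):
        self.columns = line.rstrip("\n").split("\t")

    def __getitem__(self, key):
        index = self.FIELDS.index(key)
        # Raises IndexError if the optional column is absent.
        value = self.columns[index]
        return int(value) if key in ("start", "end") else value

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        if key in self.FIELDS:
            return self[key]
        raise AttributeError(key)


b = MiniBed("chr1\t100\t200\n")
print(b.contig, b["contig"])
# Length of a 0-based, half-open interval is simply end - start.
print(b.end - b.start)
try:
    b["name"]
except IndexError:
    print("name is not present in a BED3 record")
```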

contig

Chromosome/contig.

Type

string

start

Start position of the interval.

Type

int

end

End position of the interval.

Type

int

name

Name of the interval (optional).

Type

string

score

Score associated with interval (optional).

Type

float

strand

Strand of the interval (optional).

Type

char

thickStart
thickEnd
itemRGB
blockCount

Number of blocks for bed intervals spanning multiple blocks (BED12).

Type

int

blockSizes

Comma-separated list of sizes of the blocks (BED12).

Type

string

blockStarts

Comma-separated list of start positions of the blocks (BED12).

Type

string

copy()

Returns a new bed object that is a copy of this one

fromGTF(gff, is_gtf=False, name=None)

fill fields from gtf formatted entry

Parameters
  • gff (a gff entry.) – The object should contain the fields contig, start and end in 0-based, half-open coordinates.

  • name (string) – If given, attempt to set the name attribute of the interval from this attribute of the gff object, such as gene_id or transcript_id.

toIntervals()

return intervals for BED12 entries.

If the entry is not BED12, the whole region will be returned.

Returns

intervals – A list of tuples (start,end) with the block coordinates in the Bed entry.

Return type

list
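A minimal sketch of the conversion, assuming the usual BED12 convention that blockStarts are offsets relative to the start of the record. The function blocks_to_intervals is a hypothetical helper written for this example and is not the method's actual implementation.

```python
def blocks_to_intervals(start, end, block_sizes=None, block_starts=None):
    """Return a list of (start, end) tuples in 0-based, half-open coordinates."""
    if not block_sizes or not block_starts:
        # Not a BED12 record: the whole region is returned as one interval.
        return [(start, end)]
    # Tolerate the trailing comma that many BED12 files carry.
    sizes = [int(x) for x in block_sizes.split(",") if x]
    offsets = [int(x) for x in block_starts.split(",") if x]
    # Block starts are offsets from the record start (assumed convention).
    return [(start + o, start + o + s) for o, s in zip(offsets, sizes)]


print(blocks_to_intervals(1000, 1500, "100,200,", "0,300,"))
print(blocks_to_intervals(1000, 1500))
```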

fromIntervals(intervals)

Fill co-ordinates from list of intervals.

If multiple intervals are provided and entry is BED12 then the blocks are automatically set.

Parameters

intervals (list) – List of tuples (start, end) with block coordinates.
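The reverse direction can be sketched as follows. The helper intervals_to_blocks and the dictionary it returns are assumptions made for illustration; the sketch again assumes that block starts are stored relative to the record start and that the intervals do not overlap.

```python
def intervals_to_blocks(intervals):
    """Derive BED12-style block fields from a list of (start, end) tuples."""
    intervals = sorted(intervals)
    start = intervals[0][0]
    end = max(e for _, e in intervals)
    sizes = [e - s for s, e in intervals]
    # Offsets are measured from the start of the whole record.
    offsets = [s - start for s, _ in intervals]
    return {
        "start": start,
        "end": end,
        "blockCount": len(intervals),
        "blockSizes": ",".join(str(x) for x in sizes),
        "blockStarts": ",".join(str(x) for x in offsets),
    }


fields = intervals_to_blocks([(1300, 1500), (1000, 1100)])
print(fields)
```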

property columns

return number of columns in bed-entry.

class Bed.Track(line)

Bases: object

Bed track information.

Bed.iterator(infile)

iterate over a bed formatted file.

Comments and empty lines are ignored. The iterator is track aware and will set the track attribute for the Bed objects it yields.

Parameters

infile (File) – File object to read from.

Yields

bed – Bed object
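To illustrate what track-aware iteration means, here is a simplified sketch that works on an iterable of lines. The function iterate_bed_lines and the tuples it yields are hypothetical; the module's iterator() yields Bed objects and may treat track lines differently.

```python
def iterate_bed_lines(lines):
    """Yield (track, contig, start, end, extra_fields) for each data line."""
    track = None
    for line in lines:
        line = line.strip()
        # Comments and empty lines are ignored.
        if not line or line.startswith("#"):
            continue
        # A track line applies to all entries until the next track line.
        if line.split(None, 1)[0] == "track":
            track = line
            continue
        fields = line.split("\t")
        yield track, fields[0], int(fields[1]), int(fields[2]), fields[3:]


example = [
    "# a comment\n",
    "track name=peaks\n",
    "chr1\t10\t20\tp1\n",
    "\n",
    "track name=genes\n",
    "chr2\t5\t50\n",
]
for entry in iterate_bed_lines(example):
    print(entry)
```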

Bed.bed_iterator(infile)

Deprecated, use iterator().

Bed.setName(iterator)

yield bed entries in which name is set to the record number if unset.

Yields

bed – Bed object

Bed.grouped_iterator(iterator)

yield bed results grouped by track.

Note that the iterator supplied needs to be sorted by the track attribute. This is usually the case in bed formatted files.

Yields

bed – Bed object
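The sorting requirement can be illustrated with itertools.groupby, which only groups consecutive items that share a key. This is a sketch using plain dictionaries as hypothetical records; it is not the module's implementation, and the shape of what grouped_iterator() yields may differ.

```python
from itertools import groupby
from operator import itemgetter


def grouped_by_track(records):
    """Yield (track, list_of_records); input must be sorted by track."""
    for track, group in groupby(records, key=itemgetter("track")):
        yield track, list(group)


records = [
    {"track": "peaks", "contig": "chr1", "start": 10, "end": 20},
    {"track": "peaks", "contig": "chr1", "start": 30, "end": 40},
    {"track": "genes", "contig": "chr2", "start": 5, "end": 50},
]
for track, group in grouped_by_track(records):
    print(track, len(group))
```

If entries of the same track were not consecutive, the same track would be reported more than once, which is why the input needs to be sorted by track.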

Bed.blocked_iterator(iterator)

yield blocked bed results.

Intervals with the same name are merged into a single entry. This method can be used to convert BED6 formatted entries to BED12. Note that the input iterator needs to be sorted by bed name.

Yields

bed – Bed object
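The merging of same-named intervals into a blocked entry can be sketched as below. The function blocked and its tuple layout are hypothetical, and the sketch assumes that all intervals sharing a name lie on the same contig and do not overlap.

```python
from itertools import groupby


def blocked(records):
    """Merge (contig, start, end, name) tuples, sorted by name, into blocks."""
    for name, group in groupby(records, key=lambda r: r[3]):
        # Order the blocks of one entry by start position.
        group = sorted(group, key=lambda r: r[1])
        contig = group[0][0]
        start = group[0][1]
        end = max(r[2] for r in group)
        sizes = ",".join(str(r[2] - r[1]) for r in group)
        # Block starts are offsets relative to the merged entry's start.
        offsets = ",".join(str(r[1] - start) for r in group)
        yield (contig, start, end, name, len(group), sizes, offsets)


exons = [
    ("chr1", 100, 200, "tx1"),
    ("chr1", 400, 450, "tx1"),
    ("chr1", 900, 950, "tx2"),
]
for entry in blocked(exons):
    print(entry)
```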

Bed.readAndIndex(infile, with_values=False, per_track=False)

read and index a bed formatted file in infile.

The index is not strand-aware.

Parameters
  • infile (File) – File object to read from.

  • with_values (bool) – If True, store the actual bed entry. Otherwise, just the intervals are recorded and any additional fields will be ignored.

  • per_track (bool) – If True build indices per track.

Returns

index – A dictionary of nested containment lists (NCL). Each key is a contig. If per_track is set, the dictionary has an additional first level for the track.

Return type

dict
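The shape of such an index can be sketched with plain Python containers. Note that this sketch substitutes a sorted list with a linear scan for the nested containment list, so it shows the lookup semantics only, not the performance. The names build_index and find_overlaps are hypothetical.

```python
from collections import defaultdict


def build_index(records, with_values=False):
    """Index (contig, start, end, ...) tuples by contig."""
    index = defaultdict(list)
    for contig, start, end, *rest in records:
        # Keep extra fields only if requested, mirroring with_values.
        value = rest if with_values else None
        index[contig].append((start, end, value))
    for intervals in index.values():
        intervals.sort(key=lambda x: (x[0], x[1]))
    return dict(index)


def find_overlaps(index, contig, start, end):
    """Return indexed intervals overlapping the half-open range [start, end)."""
    return [(s, e, v) for s, e, v in index.get(contig, [])
            if s < end and e > start]


index = build_index([("chr1", 100, 200, "a"), ("chr1", 150, 300, "b"),
                     ("chr2", 0, 50, "c")], with_values=True)
print(find_overlaps(index, "chr1", 180, 190))
print(find_overlaps(index, "chr1", 300, 400))
```

Because coordinates are half-open, an interval ending at 300 does not overlap a query starting at 300.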

Bed.binIntervals(iterator, num_bins=5, method='equal-bases', bin_edges=None)

merge adjacent intervals by the score attribute.

This method takes all the intervals in the collection and builds a histogram of their scores. The partition into bins can use one of the following methods:

equal-bases

merge intervals such that each bin contains an equal number of bases

equal-intervals

merge intervals such that each bin contains an equal number of intervals

This method requires the fifth field (score) of the bed input file to be present.

Parameters
  • iterator – Iterator yielding bed intervals

  • num_bins (int) – Number of bins to create in the histogram

  • method (string) – Binning method

  • bin_edges (list) – List of bin edges. These take precedence over method.

Returns

  • intervals (list) – list of intervals (Bed)

  • bin_edges (list) – list of bin edges
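The difference between the two binning methods can be sketched as follows. This is one simple way to partition scores, written for illustration; assign_bins is a hypothetical helper and does not reproduce the module's algorithm or its merging of adjacent intervals. Intervals are ordered by score, and each is placed in a bin according to where the midpoint of its weight falls in the cumulative total, where the weight is the interval length for equal-bases and 1 for equal-intervals.

```python
def assign_bins(intervals, num_bins=5, method="equal-bases"):
    """Return (score, bin) pairs for a list of (start, end, score) tuples."""
    ordered = sorted(intervals, key=lambda x: x[2])
    if method == "equal-bases":
        weights = [end - start for start, end, _ in ordered]
    elif method == "equal-intervals":
        weights = [1] * len(ordered)
    else:
        raise ValueError("unknown method: %s" % method)
    total = float(sum(weights))
    result = []
    cumulative = 0.0
    for (start, end, score), weight in zip(ordered, weights):
        # Use the midpoint of this interval's share of the total weight.
        position = (cumulative + weight / 2.0) / total
        result.append((score, min(num_bins - 1, int(num_bins * position))))
        cumulative += weight
    return result


data = [(0, 30, 0.1), (30, 40, 0.5), (40, 50, 0.7), (50, 60, 0.9)]
print(assign_bins(data, num_bins=2, method="equal-bases"))
print(assign_bins(data, num_bins=2, method="equal-intervals"))
```

With these data, equal-bases puts the single 30-base interval in one bin and the three 10-base intervals in the other, while equal-intervals puts two intervals in each bin.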

Bed.merge(iterator)

merge overlapping intervals and return a list of merged intervals.
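A minimal sketch of merging overlapping intervals on plain (contig, start, end) tuples. The function merge_intervals is hypothetical. It assumes that only true overlaps are merged, so intervals that merely touch stay separate; whether the module's merge() also joins touching intervals is not stated here.

```python
def merge_intervals(records):
    """Merge overlapping (contig, start, end) tuples per contig."""
    merged = []
    for contig, start, end in sorted(records):
        last = merged[-1] if merged else None
        # Half-open coordinates: overlap requires start < previous end.
        if last and last[0] == contig and start < last[2]:
            last[2] = max(last[2], end)
        else:
            merged.append([contig, start, end])
    return [tuple(m) for m in merged]


print(merge_intervals([
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 300, 400),
    ("chr2", 10, 20),
]))
```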

Bed.getNumColumns(filename)

return number of fields in bed-file by looking at the first entry.

Returns

ncolumns – The number of columns. If the file is empty, 0 is returned.

Return type

int
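The idea of inspecting the first entry can be sketched on an iterable of lines. The helper count_columns is hypothetical: the function above takes a filename, and whether it skips comment and track lines is an assumption of this sketch.

```python
def count_columns(lines):
    """Return the number of tab-separated fields in the first data line."""
    for line in lines:
        stripped = line.strip()
        # Skip empty lines, comments and track lines (assumed behaviour).
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.split(None, 1)[0] == "track":
            continue
        return len(line.rstrip("\n").split("\t"))
    # No data line was found, for example in an empty file.
    return 0


print(count_columns(["track name=x\n", "chr1\t10\t20\tfeature\t0.5\t+\n"]))
print(count_columns([]))
```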