TreeTools.py - Tools for working with trees

This module contains functions to work with gene and/or species trees.

Reference

TreeTools.Newick2Nexus(infile)

convert newick formatted tree(s) into a nexus object.

Multiple trees are separated by a semicolon. Tree names can be given by fasta-style separators, i.e., lines starting with ‘>’.

If the token [&&NHX is found in the tree, it is assumed to be output from njtree and support values are added. Support values are added in the format taxon:support:branchlength

Parameters

infile (object) – Input data. Can be a file, a list of lines or a single line.

Returns

nexus

Return type

Bio.Nexus.Nexus
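
The input layout described above (semicolon-terminated trees, optional name lines starting with '>') can be illustrated with a small standalone parser. This is only a sketch of the file format and not the module's implementation; `split_named_trees` is a hypothetical helper, and it assumes that no semicolon occurs inside a tree.

```python
def split_named_trees(lines):
    """Split Newick input into (name, newick) pairs.

    Hypothetical helper that only illustrates the input layout:
    a line starting with '>' names the next tree, and every tree
    ends at a semicolon.
    """
    trees = []
    name = None
    pending = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # fasta-style name for the tree that follows
            name = line[1:].strip()
            continue
        pending += line
        # a single line may hold several semicolon-terminated trees
        while ";" in pending:
            tree, pending = pending.split(";", 1)
            trees.append((name, tree.strip() + ";"))
            name = None
    return trees


example = [
    ">tree1",
    "((A:1,B:1):0.5,C:2);",
    ">tree2",
    "(A:1,(B:1,C:1):0.5);",
]
named = split_named_trees(example)
```

Each resulting pair holds the optional name and one complete Newick string.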

TreeTools.Nexus2Newick(nexus, with_branchlengths=True, with_names=False, write_all_taxa=False)

convert nexus tree format to newick format.

Parameters
  • nexus (Bio.Nexus.Nexus) – The trees to output

  • with_branchlengths (bool) – If True, output branchlengths.

  • with_names (bool) – If True, add node names.

  • write_all_taxa (bool) – Output taxa for internal nodes.

Returns

output – Trees in Newick format.

Return type

string

TreeTools.Tree2Newick(tree, with_branch_lengths=True, write_all_taxa=False)

convert tree to newick format.

TreeTools.Newick2Tree(txt)

convert a newick formatted tree into a tree object.

TreeTools.WriteNexus(nexus, **kwargs)

write trees in nexus file format.

TreeTools.GetTaxa(tree)

retrieve all taxa of leaves in a tree.

TreeTools.GetTaxonomicNames(tree)

get list of taxa.

TreeTools.MapTaxa(tree, map_old2new, remove_unknown=False)

update taxa in tree to new taxa.

Parameters
  • tree (Tree) – The tree to update.

  • map_old2new (dict) – Dictionary mapping old taxa to new taxa.

  • remove_unknown (bool) – If true, taxa not in map_old2new will be removed.

TreeTools.Branchlength2Support(tree)

copy values stored as branchlength into support.

The branchlength property is not changed.

This step is necessary when support has been stored as branchlength (e.g. paup), and has thus been read in as branchlength.

TreeTools.Species2Genes(nexus, map_species2genes)

convert a species tree to a gene tree.

Parameters
  • nexus (Bio.Nexus.Nexus) – The trees to work on

  • map_species2genes (dict) – Dictionary mapping species names to gene names

TreeTools.Genes2Species(nexus, map_gene2species)

convert a gene tree into a species tree.

Parameters
  • nexus (Bio.Nexus.Nexus) – The trees to work on

  • map_gene2species (dict) – Dictionary mapping gene names to species names

TreeTools.BuildMapSpecies2Genes(genes, pattern_species='^([^|]+)[|]')

build a map of species to genes

This method assumes that gene names contain the species name and it can be extracted via a regular expression.

Parameters
  • genes (list) – List of genes

  • pattern_species (string) – Regular expression to extract species name from gene name.

Returns

  • map_species2genes (dict) – Mapping between species to one or more genes

  • map_gene2species (dict) – Mapping between a gene to the species
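
A minimal sketch of how the default pattern splits gene names such as `human|ENSG01` into species and gene. The function `build_maps` and the example gene names are hypothetical; raising an error for names that do not match the pattern is an assumption, and the module may treat such names differently.

```python
import re
from collections import defaultdict


def build_maps(genes, pattern_species=r"^([^|]+)[|]"):
    """Return (map_species2genes, map_gene2species) for a list of genes."""
    rx = re.compile(pattern_species)
    map_species2genes = defaultdict(list)
    map_gene2species = {}
    for gene in genes:
        match = rx.search(gene)
        if match is None:
            # assumption: unparseable names are treated as an error
            raise ValueError("no species found in %r" % gene)
        species = match.group(1)
        map_species2genes[species].append(gene)
        map_gene2species[gene] = species
    return dict(map_species2genes), map_gene2species


genes = ["human|ENSG01", "human|ENSG02", "mouse|ENSMUSG01"]
species2genes, gene2species = build_maps(genes)
```

The default pattern captures everything before the first `|`, so the species prefix must not itself contain that character.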

TreeTools.GetMonophyleticPairs(tree)

build list of monophyletic pairs in tree.

TreeTools.GetTaxaForSpecies(tree, species, pattern_species='^([^|]+)[|]')

get all taxa of a given species.

This method assumes that node labels contain the species name and it can be extracted via a regular expression.

Parameters
  • tree (Tree) – Tree to analyse

  • species (string) – Species to retrieve taxa for

  • pattern_species (string) – Regular expression to extract species name from gene name.

Returns

taxa – List of taxa from this species.

Return type

list

TreeTools.IsMonophyleticForSpecies(tree, species, pattern_species='^([^|]+)[|]')

check if a tree is monophyletic for a species.

This method assumes that node labels contain the species name and it can be extracted via a regular expression.

Parameters
  • tree (Tree) – Tree to analyse

  • species (string) – Species to check

  • pattern_species (string) – Regular expression to extract species name from gene name.

Returns

True if the tree is monophyletic for the species.

Return type

bool

TreeTools.IsMonophyleticForTaxa(tree, taxa, support=None)

check if a tree is monophyletic for a list of taxa.

Parameters
  • tree (Tree) – Tree to analyse

  • taxa (list) – List of taxa

  • support (float) – Minimum bootstrap support

Returns

True if the tree is monophyletic for the list of taxa.

Return type

bool
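
To make the notion of monophyly concrete, the following sketch checks it on a rooted tree written as nested tuples. It is not the module's implementation: it ignores the support threshold, does not reroot the tree, and both `leaves` and `is_monophyletic` are hypothetical names.

```python
def leaves(clade):
    """Return the set of leaf names below a clade (nested tuples of str)."""
    if isinstance(clade, str):
        return {clade}
    result = set()
    for child in clade:
        result |= leaves(child)
    return result


def is_monophyletic(tree, taxa):
    """True if some clade of the rooted tree contains exactly taxa."""
    target = set(taxa)
    if leaves(tree) == target:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, taxa) for child in tree)


tree = (("A", "B"), ("C", ("D", "E")))
```

Here `is_monophyletic(tree, ["D", "E"])` holds because D and E form their own clade, while B and C only share the root as ancestor and therefore are not monophyletic.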

TreeTools.GetLeaves(tree, node)

Return leaves in tree below node.

TreeTools.IsSingleSpecies(tree, node, pattern_species='^([^|]+)[|]')

True if taxa below node contain the same species.

TreeTools.Transcript2GeneTree(tree, map_transcript2gene, map_gene2transcripts)

convert a transcript tree into a gene tree.

supply a map for mapping transcripts to genes.

The procedure for converting a transcript tree into a gene tree:

If there are two genes, and they are monophyletic, no matter how many transcripts, the order is as follows:

1 Merge all nodes into two, one for each gene.

2 The distance between the genes is the minimum distance observed between two transcripts from different genes. Half of this will be set as the branch length from the gene leaves.

If this is not possible for a set of genes, the procedure will fail and not return a gene tree.
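
The branch length rule in step 2 can be sketched as follows. The distance table keyed by transcript pairs and the function name `gene_branch_length` are assumptions made for illustration; the module works on tree objects instead.

```python
from itertools import product


def gene_branch_length(distances, transcripts_a, transcripts_b):
    """Half of the minimum distance between transcripts of two genes.

    distances maps frozenset({t1, t2}) to the distance between two
    transcripts (hypothetical input format).
    """
    minimum = min(
        distances[frozenset((a, b))]
        for a, b in product(transcripts_a, transcripts_b)
    )
    return minimum / 2.0


# gene 1 has two transcripts, gene 2 has one
distances = {
    frozenset(("t1", "t3")): 1.0,
    frozenset(("t2", "t3")): 0.5,
}
length = gene_branch_length(distances, ["t1", "t2"], ["t3"])
```

With these values the closest pair is t2 and t3, so each gene leaf receives half of 0.5 as its branch length.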

TreeTools.MapTerminalTaxa(tree, mapping)

map taxa in leaves in all trees.

TreeTools.GetCommonAncestor(tree, taxa)

retrieve common ancestor for a list of taxa.

Reroot the tree and check if it is monophyletic. If it is, return the root; otherwise return -1.

TreeTools.TreeDFS(tree, node_id, pre_function=<function Nop>, descend_condition=<function Nop>, post_function=<function Nop>)

Depth-first (DFS) tree traversal starting at node_id.

Apply functions pre_function at first and post_function at last visit of a node.
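
The callback order can be illustrated with a standalone traversal over a dictionary of child lists. This is a sketch and not the module's code: `tree_dfs` is a hypothetical function, and the meaning given to `descend_condition` here (children are visited only if it returns true for the current node) is an assumption.

```python
def tree_dfs(children, node_id, pre_function=None,
             descend_condition=None, post_function=None):
    """Depth-first traversal calling hooks on first and last visit."""
    if pre_function is not None:
        pre_function(node_id)          # first visit of the node
    if descend_condition is None or descend_condition(node_id):
        for child in children.get(node_id, []):
            tree_dfs(children, child, pre_function,
                     descend_condition, post_function)
    if post_function is not None:
        post_function(node_id)         # last visit of the node


# node 0 is the root; nodes 2, 3 and 4 are leaves
children = {0: [1, 2], 1: [3, 4]}
pre_order, post_order = [], []
tree_dfs(children, 0,
         pre_function=pre_order.append,
         post_function=post_order.append)
```

The pre-order list records parents before their children, the post-order list records children before their parents.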

TreeTools.GetMaxIndex(tree)

get maximum node number.

TreeTools.GetBranchLengths(tree)

return an array with minimum and maximum branch length.

TreeTools.Reroot(tree, taxa)

reroot tree with taxa - the list of taxa does not need to be monophyletic.

TreeTools.GetSubsets(tree, node=None, with_decoration=True)

return subsets below a certain node including their height (distance from leaves) and branchlength

TreeTools.CountBranchPoints(tree, taxa)

count the number of branch points together with their distances for a given list of taxa.

return a list of branch points

TreeTools.IsCompatible(tree1, tree2)

check if two trees are compatible.

note: this will delete support information.

TreeTools.Tree2Graph(tree)

return tree as a list of edges in a graph.

TreeTools.Graph2Tree(links, label_ancestral_nodes=False)

build tree from list of links.

Assumption is that links always point from parent to child.
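
Under the parent-to-child assumption the root is the only node that never appears as a child. The sketch below uses plain `(parent, child)` tuples, which is an assumed link format, and the hypothetical function `links_to_children`; the module may expect additional fields such as branch lengths.

```python
def links_to_children(links):
    """Return (root, children) from a list of (parent, child) links."""
    children = {}
    has_parent = set()
    for parent, child in links:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])   # leaves get an empty list
        has_parent.add(child)
    roots = [node for node in children if node not in has_parent]
    if len(roots) != 1:
        raise ValueError("links do not describe a single rooted tree")
    return roots[0], children


links = [("root", "n1"), ("root", "C"), ("n1", "A"), ("n1", "B")]
root, children = links_to_children(links)
```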

TreeTools.GetAllNodes(tree)

return all nodes in the tree.

TreeTools.GetDistancesBetweenTaxa(tree, taxa1, taxa2)

get average branchlength between taxa1 and taxa2.

TreeTools.PruneTerminal(tree, taxon)

Prunes a terminal taxon from the tree.

id_of_previous_node = prune(tree, taxon)

If taxon is from a bifurcation, the connecting node will be collapsed and its branchlength added to the remaining terminal node. This might no longer be a meaningful value.

This is a direct copy of the method in Nexus.Trees.py. It is unclear why a separate method exists; there may have been a bug in Nexus.Trees.
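
The collapsing behaviour can be sketched on a tree stored as two dictionaries. This is an illustration under assumptions, not the code of this module or of Bio.Nexus: `prune_terminal` and the dictionary layout are hypothetical, and the case where the root itself is left with a single child is not handled.

```python
def prune_terminal(parent, branch_length, taxon):
    """Remove a leaf and collapse a parent left with a single child.

    parent maps each node to its parent; branch_length maps each node
    to the length of the branch above it. Both are modified in place.
    Returns the id of the node that preceded the pruned taxon.
    """
    if taxon not in parent:
        raise ValueError("taxon not found: %s" % taxon)
    if any(p == taxon for p in parent.values()):
        raise ValueError("not a terminal taxon: %s" % taxon)
    prev = parent.pop(taxon)
    branch_length.pop(taxon)
    remaining = [c for c, p in parent.items() if p == prev]
    if len(remaining) == 1 and prev in parent:
        # collapse prev: its child inherits prev's parent and branch
        child = remaining[0]
        parent[child] = parent.pop(prev)
        branch_length[child] += branch_length.pop(prev)
    return prev


parent = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}
branch_length = {"A": 1.0, "B": 2.0, "n1": 0.5, "C": 3.0}
previous = prune_terminal(parent, branch_length, "A")
```

After pruning A, node n1 is gone and B hangs directly below the root with the summed branch length.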

TreeTools.GetSubtree(tree, node_id)

return a copy of tree from node_id downwards.

TreeTools.Unroot(tree)

unroot tree.

TreeTools.GetSize(tree)

return the length of the tree. This is the maximum node_id + 1.

This quantity is useful for tree traversal while updating a container.

TreeTools.PruneTree(tree, taxa, keep_distance_to_root=False)

prune tree: keep only those taxa in list.

TreeTools.GetNodeMap(tree1, tree2)

map nodes between tree1 and tree2.

TreeTools.ReconciliateByRio(gene_tree, species_tree, extract_species, extract_gene=None, outgroup_species=None, min_branch_length=0.0)

Gene tree G and species tree S

If outgroup_species is given: trees will be cut off as soon as one of the outgroup species is part of a subtree. The corresponding node type will be out-paralog. Out-paralog relationship is cast upwards.

Input trees are rooted and binary.

Output: gene tree with duplication/speciation assigned to each node.

Initialization:

Number nodes in S in pre-order traversal (root = 1), such that child nodes are always larger than parent nodes.

For each external node g of G, set M(g) to the number of the external node in S with the matching species name.

Recursion:

Visit each internal node g of G in post-order traversal, (i.e. from leaves to root):

set a = M(g1) # g1 = first child of current node g
set b = M(g2) # g2 = second child of current node g

while a != b:
    if a > b:
          set a = parent of node a in species tree
    else:
          set b = parent of node b in species tree
set M(g) = a

if M(g) == M(g1) or M(g) == M(g2):
    g is duplication
else:
    g is speciation

The algorithm returns an array for each node with its type.
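
The core recursion above can be written as a short runnable sketch. It covers only the mapping and the duplication/speciation call; outgroups, alternative transcripts, masking and min_branch_length are left out. The nested-tuple gene tree, the numbered species tree and the function name `reconcile` are assumptions made for illustration.

```python
def reconcile(gene_tree, species_parent, species_number, extract_species):
    """Label internal nodes of a binary gene tree.

    gene_tree: nested 2-tuples with leaf names as strings.
    species_parent: parent of each species tree node, numbered in
        pre-order with root = 1.
    species_number: species name -> number of its external node.
    Returns a list of (node, type) in post-order.
    """
    events = []

    def visit(g):
        if isinstance(g, str):
            return species_number[extract_species(g)]
        m1 = visit(g[0])
        m2 = visit(g[1])
        a, b = m1, m2
        while a != b:
            # the larger number is further from the root: move it up
            if a > b:
                a = species_parent[a]
            else:
                b = species_parent[b]
        kind = "duplication" if a in (m1, m2) else "speciation"
        events.append((g, kind))
        return a

    visit(gene_tree)
    return events


# species tree ((human, mouse), fly) numbered in pre-order:
# 1 = root, 2 = (human, mouse), 3 = human, 4 = mouse, 5 = fly
species_parent = {2: 1, 3: 2, 4: 2, 5: 1}
species_number = {"human": 3, "mouse": 4, "fly": 5}
gene_tree = ((("human|a", "mouse|a"), ("human|b", "mouse|b")), "fly|a")
events = reconcile(gene_tree, species_parent, species_number,
                   lambda name: name.split("|")[0])
types = [kind for _, kind in events]
```

In this example the node joining the two human/mouse pairs maps to the same species node as both of its children, so it is labelled as a duplication.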

If extract_gene is given, the algorithm will label transcription nodes for alternative transcripts (duplications involving the same gene).

The algorithm has been extended to accommodate the following test cases:

Alternative transcripts

Alternative transcripts that span genes from other species are permitted, if at most one gene of the other species is involved.

To avoid over-counting of speciation events, the subtree with the fewest species is masked.

If the branch length of a node in the gene tree is shorter than min_branch_length, the resultant node is masked, because the topology might be dodgy.

TreeTools.CountDuplications(gene_tree, species_tree, node_types, extract_species, extract_gene=None)

count duplications.

given are gene and species tree and node types (duplication/speciation)

extract_species gives the species for an OTU in the gene tree

extract_gene gives the gene for an OTU in the gene tree. If not given, all transcripts are counted as unique.

TreeTools.GetParentNodeWhereTrue(node_id, tree, stop_function)

walk up in gene tree and stop where stop_function is true.

The walk finishes at the root.

returns tuple of node and distance.
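
A minimal sketch of such an upward walk on a tree stored as dictionaries. Whether the starting node itself is tested, and how distances are accumulated, are assumptions here; `parent_where_true` is a hypothetical name and not the module's function.

```python
def parent_where_true(node_id, parent, branch_length, stop_function):
    """Walk towards the root until stop_function(node) is true.

    parent maps each non-root node to its parent; branch_length maps
    each non-root node to the length of the branch above it.
    Returns (node, distance travelled from node_id).
    """
    current = node_id
    distance = 0.0
    while current in parent:          # the root has no parent entry
        distance += branch_length[current]
        current = parent[current]
        if stop_function(current):
            break
    return current, distance


parent = {1: 0, 2: 0, 3: 1, 4: 1}
branch_length = {1: 0.5, 2: 1.0, 3: 0.25, 4: 0.75}
at_node_1 = parent_where_true(3, parent, branch_length, lambda n: n == 1)
at_root = parent_where_true(3, parent, branch_length, lambda n: False)
```

If the stop function never returns true, the walk ends at the root and the distance is the full path length.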

TreeTools.GetChildNodesWhereTrue(node_id, tree, stop_function)

walk down in tree and stop where stop_function is true

The walk finishes at the leaves.

returns a list of tuples of nodes and distance.

TreeTools.GetDistanceToRoot(tree)

return list with distance to root for each node.

TreeTools.traverseGraph(graph, start, block=[])

traverse graph, but do not go past nodes in block.

TreeTools.convertTree2Graph(tree)

convert tree to a graph.

TreeTools.calculatePatternsFromTree(tree, sort_order)

calculate patterns from a tree.