TreeTools.py - Tools for working with trees
This module contains functions to work with gene and/or species trees.
Reference
TreeTools.Newick2Nexus(infile)
Convert Newick-formatted tree(s) into a Nexus object.
Multiple trees are separated by a semicolon. Tree names can be given by fasta-style separators, i.e., lines starting with ‘>’.
If the token [&&NHX is found in the tree, the tree is assumed to be njtree output and support values are added in the format taxon:support:branchlength.
- Parameters
infile (object) – Input data. Can be a file, a list of lines or a single line.
- Returns
nexus
- Return type
Bio.Nexus.Nexus
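The input conventions above can be illustrated with a small standalone splitter. These conventions are semicolon-terminated trees, optional names on fasta-style `>` lines, and input given as a string or as a list of lines. This is a sketch under assumptions, not the module's implementation. `split_newick_input` is a hypothetical helper that only returns `(name, newick)` pairs. It does not build a `Bio.Nexus.Nexus` object, ignores `[&&NHX` annotations, and does not handle semicolons inside quoted labels.

```python
from typing import Iterable, List, Optional, Tuple, Union


def split_newick_input(
    infile: Union[str, Iterable[str]]
) -> List[Tuple[Optional[str], str]]:
    """Split Newick input into (name, tree) pairs (hypothetical helper).

    Accepts a single string, a list of lines or an open file. Lines
    starting with '>' name the tree that follows; trees end with ';'.
    """
    lines = infile.splitlines() if isinstance(infile, str) else list(infile)

    trees: List[Tuple[Optional[str], str]] = []
    name: Optional[str] = None
    buffer: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # fasta-style separator: the rest of the line is the tree name
            name = line[1:].strip()
            continue
        buffer.append(line)
        text = "".join(buffer)
        # a single line may complete one or more trees
        while ";" in text:
            tree, text = text.split(";", 1)
            if tree.strip():
                trees.append((name, tree.strip() + ";"))
            name = None
        buffer = [text] if text else []
    return trees
```

In this sketch, a second tree on the same line as a named one gets `None` as its name. A tree spread over several lines is joined before it is split.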
TreeTools.Nexus2Newick(nexus, with_branchlengths=True, with_names=False, write_all_taxa=False)
Convert the trees in a Nexus object to Newick format.
TreeTools.Tree2Newick(tree, with_branch_lengths=True, write_all_taxa=False)
Convert a tree to Newick format.
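The following minimal recursive Newick writer illustrates how the `with_branch_lengths` flag changes the output. It assumes a hypothetical tree layout of plain dictionaries (node id to child ids, node id to label, node id to branch length) rather than a `Bio.Nexus` tree. It does not implement `write_all_taxa`.

```python
from typing import Dict, List


def to_newick(
    children: Dict[int, List[int]],
    names: Dict[int, str],
    lengths: Dict[int, float],
    root: int = 0,
    with_branch_lengths: bool = True,
) -> str:
    """Serialize a rooted tree held in dictionaries to a Newick string."""

    def render(node: int) -> str:
        kids = children.get(node, [])
        if kids:
            # internal node: parenthesized children, then optional label
            text = "(" + ",".join(render(k) for k in kids) + ")"
            text += names.get(node, "")
        else:
            text = names.get(node, "")
        # the root carries no branch length
        if with_branch_lengths and node != root and node in lengths:
            text += ":%g" % lengths[node]
        return text

    return render(root) + ";"
```

Consider a root `0` with leaf `a` and an internal node `2` whose leaves are `b` and `c`. With lengths 0.1, 0.5, 0.2 and 0.3, the sketch produces `(a:0.1,(b:0.2,c:0.3):0.5);`. With `with_branch_lengths=False` it produces `(a,(b,c));`.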
TreeTools.Newick2Tree(txt)
Convert Newick text into a tree.
TreeTools.WriteNexus(nexus, **kwargs)
Write trees in Nexus file format.
TreeTools.GetTaxa(tree)
Retrieve the taxa of all leaves in a tree.
TreeTools.GetTaxonomicNames(tree)
Get a list of taxa.
TreeTools.MapTaxa(tree, map_old2new, remove_unknown=False)
Update the taxa in a tree to new taxa.
TreeTools.Branchlength2Support(tree)
Copy values stored as branch lengths into the support values.
The branchlength property is not changed.
This step is necessary when support values have been stored as branch lengths (e.g. by PAUP) and have therefore been read in as branch lengths.
TreeTools.Species2Genes(nexus, map_species2genes)
Convert a species tree into a gene tree.
- Parameters
nexus (Bio.Nexus.Nexus) – The trees to work on
map_species2genes (dict) – Dictionary mapping species names to gene names
TreeTools.Genes2Species(nexus, map_gene2species)
Convert a gene tree into a species tree.
- Parameters
nexus (Bio.Nexus.Nexus) – The trees to work on
map_gene2species (dict) – Dictionary mapping gene names to species names
TreeTools.BuildMapSpecies2Genes(genes, pattern_species='^([^|]+)[|]')
Build a map of species to genes.
This method assumes that gene names contain the species name and that it can be extracted with a regular expression.
- Parameters
genes (list) – List of genes
pattern_species (string) – Regular expression to extract species name from gene name.
- Returns
map_species2genes (dict) – Mapping from each species to one or more genes
map_gene2species (dict) – Mapping from each gene to its species
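The regular-expression approach described above can be sketched with the standard `re` module. The function name `build_map_species2genes` is hypothetical. The sketch skips gene names that do not match `pattern_species`, which is an assumption about how the module treats unmatched names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def build_map_species2genes(
    genes: List[str], pattern_species: str = r"^([^|]+)[|]"
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Return (species -> genes, gene -> species) from gene names."""
    rx = re.compile(pattern_species)
    map_species2genes: Dict[str, List[str]] = defaultdict(list)
    map_gene2species: Dict[str, str] = {}
    for gene in genes:
        match = rx.search(gene)
        if match is None:
            # assumption: names without a species prefix are skipped
            continue
        species = match.group(1)  # first capture group = species name
        map_species2genes[species].append(gene)
        map_gene2species[gene] = species
    return dict(map_species2genes), map_gene2species
```

With the default pattern, `hsapiens|g1` maps to species `hsapiens`. A name without a `|`, such as `orphan`, does not match and is left out of both maps.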
TreeTools.GetMonophyleticPairs(tree)
Build a list of monophyletic pairs in the tree.
TreeTools.GetTaxaForSpecies(tree, species, pattern_species='^([^|]+)[|]')
Get all taxa of a given species.
This method assumes that node labels contain the species name and that it can be extracted with a regular expression.
- Parameters
tree – The tree to work on
species – Name of the species whose taxa are returned
pattern_species (string) – Regular expression to extract the species name from a node label.
- Returns
taxa – List of taxa from this species.
- Return type
list
TreeTools.IsMonophyleticForSpecies(tree, species, pattern_species='^([^|]+)[|]')
Check if a tree is monophyletic for a species.
This method assumes that node labels contain the species name and that it can be extracted with a regular expression.
TreeTools.IsMonophyleticForTaxa(tree, taxa, support=None)
Check if a tree is monophyletic for a list of taxa.
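A tree is monophyletic for a set of taxa when the leaves below the taxa's lowest common ancestor are exactly those taxa. The sketch below checks this on a hypothetical child-to-parent dictionary. It ignores the `support` parameter, whose semantics are not described here.

```python
from typing import Dict, Iterable, List, Set


def is_monophyletic(parent: Dict[str, str], taxa: Iterable[str]) -> bool:
    """True if `taxa` are exactly the leaves below their common ancestor.

    `parent` maps every non-root node to its parent (hypothetical layout).
    """
    wanted = set(taxa)

    def ancestors(node: str) -> List[str]:
        # path from node up to the root, node included
        path = [node]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    # lowest common ancestor: first node on one path shared by all paths
    paths = [ancestors(t) for t in wanted]
    shared: Set[str] = set(paths[0]).intersection(*map(set, paths[1:]))
    lca = next(node for node in paths[0] if node in shared)

    internal = set(parent.values())
    leaves = {n for n in parent if n not in internal}
    below = {leaf for leaf in leaves if lca in ancestors(leaf)}
    return below == wanted
```

In the tree `((a,b)x,(c,d)y)r`, the taxa `a` and `b` are monophyletic. The taxa `a` and `c` are not, because their common ancestor `r` also covers `b` and `d`.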
TreeTools.GetLeaves(tree, node)
Return the leaves in tree below node.
TreeTools.IsSingleSpecies(tree, node, pattern_species='^([^|]+)[|]')
Return True if all taxa below node belong to the same species.
TreeTools.Transcript2GeneTree(tree, map_transcript2gene, map_gene2transcripts)
Convert a transcript tree into a gene tree.
The maps map_transcript2gene and map_gene2transcripts supply the mapping between transcripts and genes.
The procedure for converting a transcript tree into a gene tree is as follows. If there are two genes and they are monophyletic, regardless of the number of transcripts:
1. Merge all nodes into two, one for each gene.
2. Set the distance between the genes to the minimum distance observed between two transcripts from different genes. Half of this distance is used as the branch length of each gene leaf.
If this is not possible for a set of genes, the procedure fails and does not return a gene tree.
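Step 2 can be illustrated in isolation. Given pairwise transcript distances, take the minimum over all cross-gene transcript pairs and halve it. The helper `gene_branch_length` and the symmetric distance dictionary are hypothetical. The sketch does not perform the node merging of step 1.

```python
from itertools import product
from typing import Dict, List, Tuple


def gene_branch_length(
    distances: Dict[Tuple[str, str], float],
    map_gene2transcripts: Dict[str, List[str]],
    gene1: str,
    gene2: str,
) -> float:
    """Half the minimum distance between any transcript of gene1 and of gene2."""

    def dist(x: str, y: str) -> float:
        # distances are symmetric; accept either key order
        return distances[(x, y)] if (x, y) in distances else distances[(y, x)]

    minimum = min(
        dist(t1, t2)
        for t1, t2 in product(map_gene2transcripts[gene1], map_gene2transcripts[gene2])
    )
    return minimum / 2.0
```

Within-gene distances are never consulted, because `product` only pairs transcripts of `gene1` with transcripts of `gene2`.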
TreeTools.MapTerminalTaxa(tree, mapping)
Map the taxa of the leaves in all trees.
TreeTools.GetCommonAncestor(tree, taxa)
Retrieve the common ancestor of a list of taxa.
The tree is rerooted and checked for monophyly. If it is monophyletic, the root is returned; otherwise, -1 is returned.
TreeTools.TreeDFS(tree, node_id, pre_function=<function Nop>, descend_condition=<function Nop>, post_function=<function Nop>)
Depth-first tree traversal starting at node_id.
pre_function is applied at the first visit of a node and post_function at the last visit.
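The pre-visit and post-visit pattern described above is a depth-first traversal. The following iterative sketch assumes a hypothetical `children` dictionary instead of a `Bio.Nexus` tree. Its default for `descend_condition` is to always descend, which may differ from the module's `Nop` default.

```python
from typing import Callable, Dict, List


def tree_dfs(
    children: Dict[int, List[int]],
    node_id: int,
    pre_function: Callable[[int], None] = lambda n: None,
    descend_condition: Callable[[int], bool] = lambda n: True,
    post_function: Callable[[int], None] = lambda n: None,
) -> None:
    """Depth-first traversal from node_id with pre- and post-visit hooks."""
    # iterative version avoids Python's recursion limit on deep trees;
    # the flag marks whether the node's subtree has already been handled
    stack = [(node_id, False)]
    while stack:
        node, subtree_done = stack.pop()
        if subtree_done:
            post_function(node)  # last visit
            continue
        pre_function(node)  # first visit
        stack.append((node, True))
        if descend_condition(node):
            # push in reverse so children are visited left to right
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
```

For `{0: [1, 2], 1: [3, 4]}`, the pre-visits occur in the order 0, 1, 3, 4, 2 and the post-visits in the order 3, 4, 1, 2, 0. When `descend_condition` is false for a node, that node is still pre- and post-visited, but its subtree is skipped.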
TreeTools.GetMaxIndex(tree)
Get the maximum node number.
TreeTools.GetBranchLengths(tree)
Return an array with the minimum and maximum branch length.
TreeTools.Reroot(tree, taxa)
Reroot the tree with taxa; the list of taxa does not need to be monophyletic.
TreeTools.GetSubsets(tree, node=None, with_decoration=True)
Return the subsets below a certain node, including their height (distance from leaves) and branch length.
TreeTools.CountBranchPoints(tree, taxa)
Count the number of branch points, together with their distances, for a given list of taxa.
Returns a list of branch points.
TreeTools.IsCompatible(tree1, tree2)
Check if two trees are compatible.
Note: this will delete support information.
TreeTools.Tree2Graph(tree)
Return the tree as a list of edges in a graph.
TreeTools.Graph2Tree(links, label_ancestral_nodes=False)
Build a tree from a list of links.
Links are assumed to always point from parent to child.
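The correspondence between a tree and its list of parent-to-child links can be sketched as a pair of conversions. Both helpers are hypothetical and operate on plain dictionaries. Neither implements `label_ancestral_nodes`. The root is identified as the only node that never appears as a child.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def tree_to_edges(children: Dict[str, List[str]], root: str) -> List[Edge]:
    """Return the (parent, child) links of a rooted tree."""
    edges: List[Edge] = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            edges.append((node, child))
            stack.append(child)
    return edges


def edges_to_tree(links: List[Edge]) -> Tuple[Dict[str, List[str]], str]:
    """Rebuild child lists from parent -> child links and locate the root."""
    children: Dict[str, List[str]] = {}
    has_parent: Set[str] = set()
    for parent, child in links:
        children.setdefault(parent, []).append(child)
        has_parent.add(child)
    # the root is the only parent that is never a child
    roots = [node for node in children if node not in has_parent]
    if len(roots) != 1:
        raise ValueError("links do not form a single rooted tree")
    return children, roots[0]
```

Because every link points from parent to child, the direction alone is enough to recover the root. Links that contain two or more parentless nodes are rejected.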
TreeTools.GetAllNodes(tree)
Return all nodes in the tree.
TreeTools.GetDistancesBetweenTaxa(tree, taxa1, taxa2)
Get the average branch length between taxa1 and taxa2.
TreeTools.PruneTerminal(tree, taxon)
Prune a terminal taxon from the tree.
Usage: id_of_previous_node = prune(tree, taxon). If taxon is part of a bifurcation, the connecting node is collapsed and its branch length is added to the remaining node. The resulting branch length may no longer be meaningful.
This is a direct copy of the method in Nexus.Trees.py. It is unclear why a separate method exists; possibly there was a bug in Nexus.Trees.
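The pruning-and-collapsing behavior described above can be sketched on plain dictionaries. `prune_terminal` is a hypothetical helper that follows the described rule. After the leaf is removed, a parent left with a single child is spliced out and its branch length is added to the surviving child. If that parent was the root, the survivor becomes the new root. The helper returns the id of the pruned taxon's former parent, which is one reading of `id_of_previous_node`.

```python
from typing import Dict, List


def prune_terminal(
    children: Dict[str, List[str]],
    parent: Dict[str, str],
    lengths: Dict[str, float],
    taxon: str,
) -> str:
    """Remove leaf `taxon` in place; collapse its parent if one child remains."""
    if children.get(taxon):
        raise ValueError("not a terminal taxon: %s" % taxon)
    prev = parent.pop(taxon)
    children[prev].remove(taxon)
    lengths.pop(taxon, None)
    if len(children[prev]) == 1:
        survivor = children.pop(prev)[0]
        if prev in parent:
            # splice prev out: the survivor inherits prev's branch length
            grandparent = parent.pop(prev)
            siblings = children[grandparent]
            siblings[siblings.index(prev)] = survivor
            parent[survivor] = grandparent
            lengths[survivor] = lengths.get(survivor, 0.0) + lengths.pop(prev, 0.0)
        else:
            # prev was the root: the survivor becomes the new root
            del parent[survivor]
            lengths[survivor] = 0.0
    return prev
```

Consider the tree `((a:1,b:2)x:0.5,c:3)r`. Pruning `b` removes `x` and attaches `a` directly to `r` with branch length 1.5. This summed value is exactly the kind of length the note above warns may no longer be meaningful.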
TreeTools.GetSubtree(tree, node_id)
Return a copy of the tree from node_id downwards.
TreeTools.Unroot(tree)
Unroot the tree.
TreeTools.GetSize(tree)
Return the size of the tree, i.e. the maximum node_id + 1.
This quantity is useful for sizing a container indexed by node_id that is updated during tree traversal.
TreeTools.PruneTree(tree, taxa, keep_distance_to_root=False)
Prune the tree, keeping only the taxa in the list.
TreeTools.GetNodeMap(tree1, tree2)
Map nodes between tree1 and tree2.
TreeTools.ReconciliateByRio(gene_tree, species_tree, extract_species, extract_gene=None, outgroup_species=None, min_branch_length=0.0)
Reconcile a gene tree G with a species tree S.
If outgroup_species is given, the trees are cut off as soon as one of the outgroup species is part of a subtree. The corresponding node type is out-paralog, and the out-paralog relationship is propagated upwards.
Input trees are rooted and binary.
Output: gene tree with duplication/speciation assigned to each node.
Initialization:
Number the nodes in S in a pre-order traversal (root = 1), so that child nodes always have larger numbers than their parents.
For each external node g of G, set M(g) to the number of the external node in S with the matching species name.
Recursion:
Visit each internal node g of G in a post-order traversal (i.e. from leaves to root):
    set a = M(g1)    # g1 = first child of current node g
    set b = M(g2)    # g2 = second child of current node g
    while a != b:
        if a > b:
            set a = parent of node a in species tree
        else:
            set b = parent of node b in species tree
    set M(g) = a
    if M(g) == M(g1) or M(g) == M(g2):
        g is duplication
    else:
        g is speciation
The algorithm returns an array with the type of each node.
If extract_gene is given, the algorithm will label transcription nodes for alternative transcripts (duplications involving the same gene).
The algorithm has been extended to accommodate the following test cases:
- Alternative transcripts
Alternative transcripts that span genes from other species are permitted if at most one gene of the other species is involved.
To avoid over-counting of speciation events, the subtree with the fewest species is masked.
If the branch length of a node in the gene tree is shorter than min_branch_length, the resulting node is masked because the topology may be unreliable.
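The initialization and recursion steps above translate fairly directly into Python. The sketch below covers only the core algorithm, with no outgroups, alternative transcripts or masking. It represents both trees as hypothetical child-list dictionaries and returns a dictionary of node types rather than an array.

```python
from typing import Callable, Dict, List


def reconcile_rio(
    gene_children: Dict[str, List[str]],
    gene_root: str,
    species_children: Dict[str, List[str]],
    species_root: str,
    extract_species: Callable[[str], str],
) -> Dict[str, str]:
    """Label internal gene-tree nodes as 'duplication' or 'speciation'.

    Core algorithm only. Both trees are rooted and binary; a leaf is a
    node without an entry in its *_children dictionary.
    """
    # Initialization: number species nodes in pre-order (root = 1),
    # so every child gets a larger number than its parent.
    number: Dict[str, int] = {}
    stack = [species_root]
    while stack:
        node = stack.pop()
        number[node] = len(number) + 1
        stack.extend(reversed(species_children.get(node, [])))
    species_parent: Dict[int, int] = {
        number[child]: number[node]
        for node, kids in species_children.items()
        for child in kids
    }

    # Reversed pre-order visits every child before its parent (post-order).
    order: List[str] = []
    stack = [gene_root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(gene_children.get(node, []))

    mapping: Dict[str, int] = {}  # M(g)
    node_types: Dict[str, str] = {}
    for g in reversed(order):
        kids = gene_children.get(g, [])
        if not kids:
            # external node: number of the matching species leaf
            mapping[g] = number[extract_species(g)]
            continue
        g1, g2 = kids
        a, b = mapping[g1], mapping[g2]
        while a != b:
            if a > b:
                a = species_parent[a]
            else:
                b = species_parent[b]
        mapping[g] = a
        if a in (mapping[g1], mapping[g2]):
            node_types[g] = "duplication"
        else:
            node_types[g] = "speciation"
    return node_types
```

Moving up from the node with the larger pre-order number is safe. An ancestor always has a smaller number than its descendants, so the larger of two distinct numbers can never be the common ancestor.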
TreeTools.CountDuplications(gene_tree, species_tree, node_types, extract_species, extract_gene=None)
Count duplications.
The inputs are the gene tree, the species tree and the node types (duplication/speciation).
extract_species gives the species for an OTU in the gene tree.
extract_gene gives the gene for an OTU in the gene tree. If it is not given, all transcripts are counted as unique.
TreeTools.GetParentNodeWhereTrue(node_id, tree, stop_function)
Walk up in the tree and stop where stop_function is true.
The walk finishes at the root.
Returns a tuple of node and distance.
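Walking towards the root while accumulating branch lengths can be sketched as follows. The dictionary layout and the name `walk_up_until` are hypothetical. Testing only ancestors against `stop_function`, and not the starting node, is an assumption.

```python
from typing import Callable, Dict, Tuple


def walk_up_until(
    node_id: int,
    parent: Dict[int, int],
    lengths: Dict[int, float],
    stop_function: Callable[[int], bool],
) -> Tuple[int, float]:
    """Walk towards the root; return (node, distance) where stop_function holds.

    The distance is the sum of the branch lengths traversed. If no ancestor
    satisfies stop_function, the walk ends at the root.
    """
    distance = 0.0
    node = node_id
    while node in parent:  # the root has no entry in `parent`
        distance += lengths.get(node, 0.0)
        node = parent[node]
        if stop_function(node):
            break
    return node, distance
```

If `stop_function` never returns true, the returned distance is the full distance from `node_id` to the root.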
TreeTools.GetChildNodesWhereTrue(node_id, tree, stop_function)
Walk down in the tree and stop where stop_function is true.
The walk finishes at the leaves.
Returns a list of tuples of nodes and distances.
TreeTools.GetDistanceToRoot(tree)
Return a list with the distance to the root for each node.
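Every node's distance to the root is its parent's distance plus its own branch length, so a single pre-order pass suffices. The sketch assumes integer node ids and a hypothetical dictionary layout. It returns a list indexed by node id, sized as the maximum node id + 1, as described for GetSize.

```python
from typing import Dict, List


def distance_to_root(
    children: Dict[int, List[int]], lengths: Dict[int, float], root: int
) -> List[float]:
    """Distance from the root for every node id (list index = node id)."""
    nodes = {root} | set(children) | {c for kids in children.values() for c in kids}
    distances = [0.0] * (max(nodes) + 1)  # size = maximum node id + 1
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            # a child's distance is its parent's distance plus its own branch
            distances[child] = distances[node] + lengths.get(child, 0.0)
            stack.append(child)
    return distances
```

Because parents are always processed before their children, each distance is computed once, in time linear in the number of nodes.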
TreeTools.traverseGraph(graph, start, block=[])
Traverse the graph from start without passing through the nodes in block.
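Reachability that avoids a set of blocked nodes can be sketched as a breadth-first search over an adjacency dictionary. The helper name, the adjacency layout and the breadth-first visiting order are assumptions. The sketch uses `block=None` rather than a mutable default list, a common Python idiom.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Optional, Set


def traverse_graph(
    graph: Dict[Hashable, List[Hashable]],
    start: Hashable,
    block: Optional[Iterable[Hashable]] = None,
) -> List[Hashable]:
    """Nodes reachable from start, in breadth-first order, avoiding `block`."""
    blocked: Set[Hashable] = set(block or ())
    visited = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour in visited or neighbour in blocked:
                continue  # never enter a blocked or already-seen node
            visited.add(neighbour)
            order.append(neighbour)
            queue.append(neighbour)
    return order
```

Blocking a node also cuts off everything reachable only through it. In the example graph `a-b-d` with an extra edge `a-c`, blocking `b` leaves only `a` and `c`.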
TreeTools.convertTree2Graph(tree)
Convert a tree to a graph.
TreeTools.calculatePatternsFromTree(tree, sort_order)
Calculate patterns from a tree.