Multiple sequence alignments (MSAs) can reveal a great deal about the relationships between proteins. One notable example is the map of kinome space published in 2002 (Figure 1).
Such images organize our thinking about the space of possible proteins/genes far better than long lists of aligned sequences. The image in Figure 1 was later revamped into what is now the popular ‘kinome poster’ (Figure 2).
Here we have created a script that produces similar dendrograms straight from multiple sequence alignment files (although clearly not as pretty as Fig 2!). It is not difficult to find software that will produce ‘a dendrogram’ from an MSA, but getting it to do the simple thing of annotating the nodes with colors, shapes, etc. according to the gene/sequence labels is more problematic. Node sizes might reflect the importance of given sequences, and colors can group them by family or tree branch. The script uses the Biopython module Phylo to construct a tree from an arbitrary MSA and networkx to draw it:
```python
# Treebeard.py
import matplotlib.pyplot as plt
import networkx
from networkx.drawing.nx_agraph import graphviz_layout  # requires pygraphviz
from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

# What color to give to the edges?
e_color = '#ccccff'

# What colors to give to the nodes with similar labels?
color_scheme = {'RSK': '#e60000', 'SGK': '#ffff00', 'PKC': '#32cd32',
                'DMPK': '#e600e6', 'NDR': '#3366ff', 'GRK': '#8080ff',
                'PKA': 'magenta', 'MAST': 'green', 'YANK': 'pink'}

# What sizes to give to the nodes with similar labels?
size_scheme = {'RSK': 200, 'SGK': 150, 'PKC': 350, 'DMPK': 400, 'NDR': 280,
               'GRK': 370, 'PKA': 325, 'MAST': 40, 'YANK': 200}


# Edit this to produce a custom label-to-color mapping
def label_colors(label):
    # Default color; if several keys occur in the label, the last one wins
    color_to_set = 'blue'
    for label_subname in color_scheme:
        if label_subname in label:
            color_to_set = color_scheme[label_subname]
    return color_to_set


# Edit this to produce a custom label-to-size mapping
def label_sizes(label):
    # Default size; if several keys occur in the label, the last one wins
    size_to_set = 20
    for label_subname in size_scheme:
        if label_subname in label:
            size_to_set = size_scheme[label_subname]
    return size_to_set


# Draw a tree built from the Clustal alignment stored in agc.aln
def draw_tree():
    # Load the default kinase alignment, expected in the same directory as this script
    aln = AlignIO.read('agc.aln', 'clustal')

    # Construct the unrooted neighbor-joining tree from identity distances
    calculator = DistanceCalculator('identity')
    dm = calculator.get_distance(aln)
    constructor = DistanceTreeConstructor()
    tree = constructor.nj(dm)

    G = Phylo.to_networkx(tree)
    node_sizes = []
    labels = {}
    node_colors = []
    for n in G:
        label = str(n)
        if 'Inner' in label:
            # Inner tree nodes: leave them unlabeled and very small
            node_sizes.append(1)
            labels[n] = ''
            node_colors.append(e_color)
        else:
            # Size of the node depends on the label
            node_sizes.append(label_sizes(label))
            # Set colors depending on our color scheme and label names
            node_colors.append(label_colors(label))
            # Set the label that will appear on each node
            labels[n] = label

    # Draw the tree given the info we provided
    pos = graphviz_layout(G)
    networkx.draw(G, pos, edge_color=e_color, node_size=node_sizes,
                  labels=labels, with_labels=True, node_color=node_colors)

    # To save the image, uncomment the next line (it must run before plt.show())
    # plt.savefig('example.png')
    plt.show()


if __name__ == '__main__':
    draw_tree()
```
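For readers curious about what `DistanceCalculator('identity')` and `constructor.nj(dm)` compute, here is a minimal pure-Python sketch of both steps: an identity-style distance (the fraction of aligned columns at which two sequences differ) followed by the textbook neighbor-joining algorithm. It is not Biopython's implementation: gaps are treated as ordinary characters here, tie-breaking and branch-length details may differ, negative branch lengths are not clamped, the toy sequences are made up, and the function names (`identity_distance`, `distance_table`, `neighbor_joining`) are hypothetical. Inner nodes are named `Inner1`, `Inner2`, ..., mimicking the `'Inner'` labels that `draw_tree()` filters on.

```python
from __future__ import annotations

from itertools import combinations


def identity_distance(seq1: str, seq2: str) -> float:
    """Fraction of aligned columns at which two sequences differ (gaps count as characters)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 1.0 - matches / len(seq1)


def distance_table(alignment: dict[str, str]) -> dict[str, dict[str, float]]:
    """Symmetric dict-of-dicts of pairwise identity distances."""
    d: dict[str, dict[str, float]] = {name: {} for name in alignment}
    for a, b in combinations(alignment, 2):
        d[a][b] = d[b][a] = identity_distance(alignment[a], alignment[b])
    return d


def neighbor_joining(dist: dict[str, dict[str, float]]) -> list[tuple[str, str, float]]:
    """Unrooted NJ tree from a symmetric distance table (needs >= 2 leaves).

    Returns the tree's edges as (node, node, branch_length) triples.
    """
    d = {a: dict(row) for a, row in dist.items()}  # copy; inner nodes get added
    nodes = sorted(d)
    edges: list[tuple[str, str, float]] = []
    inner_count = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all the others
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        inner_count += 1
        u = f"Inner{inner_count}"
        # Branch lengths from the new inner node u to the joined pair
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        # Distances from u to every remaining node
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes directly
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


if __name__ == "__main__":
    # Made-up toy alignment with kinase-like labels
    toy = {"PKA_a": "MKLVAG", "PKA_b": "MKLVSG",
           "PKC_a": "MRLIAG", "PKC_b": "MRLIAA"}
    for edge in neighbor_joining(distance_table(toy)):
        print(edge)
```

The returned triples could, for example, be passed to `networkx.Graph().add_weighted_edges_from(...)` and drawn in the same way the script draws the Biopython tree.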
We use the kinase alignment example to demonstrate how the script can be used. The kinase alignment we use (saved as agc.aln) can be found on the kinase.com website. We load the alignment and construct the unrooted tree using the Bio.Phylo module. Note that each line of the alignment begins with a sequence name; these names are the labels we use to define the colors and sizes of the nodes. Two simple placeholder functions, label_colors() and label_sizes(), perform this mapping, and looking at them should make it clear how to define your own custom labeling.
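As one illustration of custom labeling, the sketch below merges the two lookups into a single table returning `(color, size)` per label. The names `STYLE_SCHEME`, `DEFAULT_STYLE` and `style_for` are hypothetical; the values are copied from the script's `color_scheme` and `size_scheme`. One deliberate difference, offered as a design choice rather than a fix: in the script's loops the last matching key wins, whereas here the longest matching key wins, so a more specific family name you add later is not overridden by a shorter key it happens to contain.

```python
from __future__ import annotations

# Hypothetical combined table: family key -> (node color, node size).
# Values are copied from the script's color_scheme and size_scheme.
STYLE_SCHEME = {
    "RSK": ("#e60000", 200),
    "SGK": ("#ffff00", 150),
    "PKC": ("#32cd32", 350),
    "DMPK": ("#e600e6", 400),
    "NDR": ("#3366ff", 280),
    "GRK": ("#8080ff", 370),
    "PKA": ("magenta", 325),
    "MAST": ("green", 40),
    "YANK": ("pink", 200),
}
# Same defaults as label_colors() and label_sizes()
DEFAULT_STYLE = ("blue", 20)


def style_for(label: str) -> tuple[str, int]:
    """Return (color, size) for a sequence label, preferring the longest matching key."""
    matches = [key for key in STYLE_SCHEME if key in label]
    if not matches:
        return DEFAULT_STYLE
    return STYLE_SCHEME[max(matches, key=len)]


if __name__ == "__main__":
    # Made-up labels for illustration
    for label in ["PKACa", "Hs_DMPK1", "unrelated"]:
        print(label, style_for(label))
```

Inside `draw_tree()`, the two `append` calls for leaf nodes could then unpack a single `color, size = style_for(label)`.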
If you download the code and the alignment and run it with:
python Treebeard.py
you should see an image similar to Fig 3.