At group meeting last week I focussed, alongside some metaphysical speculation, on a project which has occupied the first half of my DPhil: exploring the structural preferences of both very old and very young protein superfamilies. This work is currently in preparation for publication, so I will give only a brief overview here and hopefully add the juicy details later. Feel free to contact me for more information.
Proteins are the molecular machinery of the cell. Their evolution is one of the most fundamental processes behind the diversity and complexity of life that we see around us today. Despite this diversity, protein domains (independent folding units) of known structure fall into just over 1,000 unique SCOP folds.
This project has sought to identify how populations of proteins at different stages of evolution explore their possible structure space.
Superfamily ages
Structural domains are clustered at different levels of similarity within the SCOP classification. At the superfamily level, this classification attempts to capture evolutionary relationships through structural and functional similarities, even where sequence divergence has occurred.
Evolutionary ages for these superfamilies are then estimated from their phylogenetic profiles across the tree of life. Each age is an estimate of when the structural ancestor of a superfamily first appeared.
The phylogenetic occurrence profiles are built from HMM-based predictions of superfamilies on completely sequenced genomes, taken from the SUPERFAMILY database. Given an occurrence profile and a phylogenetic tree (for robustness we consider several possible reconstructions of the tree of life), we use a maximum parsimony algorithm proposed by Mirkin et al. It estimates the simplest scenario of loss events (a domain lost on a lineage) and gain events (a domain gained) at the nodes of the tree that explains the occurrence profile. The age estimate is the height of the first (earliest) gain event, normalised between 0 (at the leaves of the tree) and 1 (at the root).
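To make the idea concrete, here is a minimal sketch of a gain/loss parsimony reconstruction on a toy tree. It is not our actual pipeline and not the Mirkin et al. implementation. It is a generic two-state parsimony (Sankoff-style dynamic programming) with several assumptions of my own: the names `Node`, `node_height` and `estimate_age` are hypothetical, gains and losses both cost 1 by default, a domain present at the root is charged one gain, ties are broken in favour of the older scenario, and the height of a node is taken as the number of edges to its deepest leaf divided by the same quantity for the root.

```python
from dataclasses import dataclass, field
from math import inf


@dataclass
class Node:
    """Hypothetical tree node; leaves are genomes, identified by name."""
    name: str
    children: list = field(default_factory=list)


def node_height(node):
    """Unnormalised height: number of edges down to the deepest leaf."""
    if not node.children:
        return 0
    return 1 + max(node_height(child) for child in node.children)


def estimate_age(root, present, gain_cost=1.0, loss_cost=1.0):
    """Return the normalised height of the earliest gain event.

    `present` is the set of leaf names on which the domain occurs.
    Returns None if the domain occurs on no leaf.
    """
    def step(parent_state, child_state):
        # Cost of the change along one edge (0 = absent, 1 = present).
        if parent_state == child_state:
            return 0.0
        return gain_cost if child_state == 1 else loss_cost

    cost = {}  # id(node) -> [cheapest cost if absent, cheapest cost if present]

    def fill(node):
        # Bottom-up pass: cheapest cost of the subtree for each node state.
        if not node.children:
            observed = 1 if node.name in present else 0
            cost[id(node)] = [0.0 if s == observed else inf for s in (0, 1)]
            return
        for child in node.children:
            fill(child)
        cost[id(node)] = [
            sum(min(cost[id(c)][t] + step(s, t) for t in (0, 1))
                for c in node.children)
            for s in (0, 1)
        ]

    fill(root)
    root_height = node_height(root) or 1  # avoid dividing by zero on a single leaf
    gain_heights = []

    def trace(node, state, parent_state):
        # Top-down pass: record a gain wherever absent turns into present.
        if state == 1 and parent_state == 0:
            gain_heights.append(node_height(node) / root_height)
        for child in node.children:
            # Cheapest child state; on a tie keep the parent's state (no event).
            best = min((0, 1), key=lambda t: (cost[id(child)][t] + step(state, t),
                                              t != state))
            trace(child, best, state)

    # Presence at the root counts as one gain; a tie favours the older scenario.
    root_state = 1 if cost[id(root)][1] + gain_cost <= cost[id(root)][0] else 0
    trace(root, root_state, 0)
    return max(gain_heights) if gain_heights else None


# Toy tree of life with four genomes: ((A, B), (C, D))
tree = Node("root", [
    Node("AB", [Node("A"), Node("B")]),
    Node("CD", [Node("C"), Node("D")]),
])

for profile in ({"A", "B", "C", "D"}, {"A", "B"}, {"A"}):
    print(sorted(profile), estimate_age(tree, profile))
```

On this toy tree a domain found in all four genomes is placed at the root (age 1), one found only in A and B is placed at their common ancestor (age 0.5), and one found only in A is a gain at the leaf (age 0). Changing the relative gain and loss costs can change the reconstructed scenario, so the costs chosen here should be read as illustrative only.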
We estimated ages for 1,962 SCOP superfamilies and compared several properties relating to their primary, secondary and tertiary structures, as well as their functions. In particular, we compared two populations of superfamilies: ancients, with an age of 1, and new-borns, with an age < 0.4. Full details of our results will hopefully be published shortly so watch this space!