For my journal club last week, I chose to look at a recent paper entitled “Addressing the Role of Conformational Diversity in Protein Structure Prediction”, by Palopoli et al [1]. In the study of proteins, structures are incredibly useful tools, offering information about how they carry out their function, and allowing informed decisions to be made in many areas (e.g. drug design). Since the experimental determination is difficult, however, the computational prediction of protein structures has become very important (and a number of us here at OPIG work on this!).
A problem, however, in both experimental structure determination and computational structure prediction, is that proteins are generally treated as static – the output of an X-ray crystallography experiment is a single structure, and in the majority of cases the goal of structure prediction is to produce one model that closely resembles the native structure. The accuracy of structure prediction algorithms is also normally measured by comparing the resulting model to a single, known experimentally-determined structure. The issue here is that proteins are not static – they are constantly moving and may adopt a number of different conformations; the structure observed experimentally is just a snapshot of that motion. The dynamics of a protein may even play an important role in its function; an example is haemoglobin, which after binding to oxygen changes conformation to increase affinity for further binding. It may be more appropriate, then, to represent a protein as an ensemble of structures, and not just one.
Conformational diversity helps the protein haemoglobin carry out its function (the transportation of oxygen in the blood). Haemoglobin has four subunits, each containing a haem group, shown in red. When oxygen binds to this group (blue), a histidine residue moves, shifting the position of an alpha helix. This movement is propagated throughout the entire structure, and increases the affinity for oxygen of the other subunits – binding therefore becomes increasingly easy (this is known as co-operative binding). Gif shown is from the PDB-101 Molecule of the Month series: S. Dutta and D. Goodsell, doi:10.2210/rcsb_pdb/mom_2003_5
How, though, could this be incorporated into protein structure prediction? This is the question being considered by the authors of this paper. They consider conformational diversity by looking at different conformers of the same protein – there are many proteins whose structures have been solved experimentally multiple times, and as such have a number of structures available in the PDB. Information about this is stored in a useful database called CoDNaS [2], which was developed by some of the authors of the paper under discussion. In some cases, there are model (or decoy) structures available for these proteins, generated by various structure prediction algorithms – for example, all models submitted for the CASP experiments [3], where the current accuracy of structure prediction is monitored through blind prediction, are freely available for download. The authors curated a collection of decoy sets for 91 different proteins for which multiple experimental structures are present in the PDB.
As mentioned previously, the accuracy of a model is normally evaluated by measuring its structural similarity to one known (or reference) structure – only one conformer of the protein is considered. The authors show that the model rankings achieved by this are highly dependent on the chosen reference structure. If the possible choices (i.e. the observed conformers) are quite similar the effect is small, but if there is a large difference, then two completely different decoys could be designated as the most accurate depending on which reference structure is used.
The key figure from this paper, in my opinion, is the one shown below. For the two most dissimilar experimentally-observed conformers for each protein in the set, the RMSD of the best decoy in relation to one conformer is plotted against the RMSD of the best decoy when measured against the other:
The straight line on this graph indicates what would be observed if there are decoys in the set that equally represent the two conformers; for example, if the best decoy with reference to conformer 1 has an RMSD of 1 Å, then there is also a decoy that is 1 Å away from conformer 2. Most points are on or near this line – this means that the sets of decoy structures are not biased towards one of the conformers. Therefore, structure prediction algorithms seem to be able to generate models for multiple conformations of proteins, and so the production of an ensemble of models is not an impossible dream. Several obstacles remain, however – although of equal distance to both conformers, the decoys could still be of poor quality; and decoy selection is often inaccurate, and so finding these multiple conformations amongst all others is a challenge.
[1] – Palopoli, N., Monzon, A. M., Parisi, G., and Fornasari, M. S. (2016). Addressing the Role of Conformational Diversity in Protein Structure Prediction. PLoS One, 11, e0154923.
[2] – Monzon, A. M., Juritz, E., Fornasari, S., and Parisi, G. (2013). CoDNaS: a database of conformational diversity in the native state of proteins. Bioinformatics, 29, 2512–2514.
[3] – Moult, J., Pedersen, J. T., Judson, R., and Fidelis, K. (1995). A Large-Scale Experiment to Assess Protein Structure Prediction Methods. Proteins, 23, ii–iv.