The similarity between two protein structures can be measured using TM-score (template modelling score). This can be particularly useful when examining the quality of a model, as compared to a target or template structure. One common method of comparing protein structures has been by calculating the root mean squared deviation (RMSD) from the distances of equivalent residues in both structures. An issue with this is that, as all residue pairs are weighted evenly, when the RMSD value is large, it becomes more sensitive to local structure deviation rather than to the global topology. Other established scoring functions, such as GDT-TS (1) and MaxSub (2) rely on finding substructures of the model, where all residues are within a certain threshold distance of the corresponding template residues. However, this threshold distance is subjective and therefore could not be used “as standard” for all proteins. A major disadvantage with all of these methods is that they display power-law dependence with the length of the protein.
TM-score (3) was developed in order to overcome this length dependence. It is a variation of the Levitt-Gerstein (LG) score, which weights shorter distances between corresponding residues more strongly than longer distances. This ensures there is more sensitivity to global topology rather than local structure deviations. TM-score is defined:
where Max is the maximum value after optimal superposition, LN is the length of the native structure, Lr is the length of the aligned residues to the template structure, di is the distance between the ith pair of residues and d0 is a scaling factor. In alternative scoring functions, including MaxSub, d0 is taken to be constant. TM-score uses the below equation to define d0:
which is an approximation of the average distance of corresponding residue pairs of random related proteins. This removes the dependence of TM-score on protein length.
The value of TM-score always lies between (0,1]; evaluations of TM-score distributions have shown that when the TM-score between two structures <0.17, the P–value is close to 1 and the protein structures are indistinguishable from random structure pairs. When the TM-score reaches 0.5, the P-value is vastly reduced and the structures are mostly in the same fold (4). Therefore it is suggested that TM-score may be useful not only in the automated assessment of protein structure predictions, but also to determine similar folds in protein topology classification.
- Zemla A, Venclovas Č, Moult J, Fidelis K. Processing and analysis of CASP3 protein structure predictions. Proteins Struct Funct Genet. 1999;37(SUPPL. 3):22–9.
- Siew N, Elofsson a, Rychlewski L, Fischer D. MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics. 2000;16(9):776–85.
- Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins [Internet]. 2004;57(4):702–10. Available from: http://www.ncbi.nlm.nih.gov/pubmed/15476259
- Xu J, Zhang Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics. 2010;26(7):889–95.