In this week’s journal club we discussed an excellent review paper by E. Pozharski, C. X. Weichenberger and B. Rupp investigating crystallographic approaches to protein-ligand complex elucidation. The paper assessed and highlighted the shortcomings of deposited PDB structures containing ligand-protein complexes. It then made suggestions for the community as a whole and for researchers making use of ligand-protein complexes in their work.
The paper discussed:
- The difficulties in protein-ligand complex elucidation
- The qualitative and quantitative tools available to assess the quality of protein-ligand structures
- The methods used in their analysis of selected PDB structures
- Some case studies visually demonstrating these issues
- Some practical conclusions for the crystallographic community
- Some practical conclusions for non-crystallographer users of protein-ligand complex structures from the PDB
The basic difficulties of ligand-protein complex elucidation
- Ligands frequently have less than 100% occupancy, sometimes significantly less, and thus inherently show up less clearly in the overall electron density.
- Ligands make a small contribution to the overall structure, so global quality measures such as R-factors are affected only minutely if the ligand portion of the structure is wrong.
- The original starting model needs to be used appropriately: the R-free test set (the reflections set aside for cross-validation) from the original apo model should be retained when refining the complex, to avoid model bias.
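The advice to reuse the apo model's R-free test set can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular program: `transfer_free_flags` and its parameters are hypothetical, and it assumes both datasets share the same space group and indexing, with Miller indices already reduced to the asymmetric unit. Reflections measured in both datasets keep their apo flag, so reflections the apo model was refined against never end up in the complex's test set; reflections present only for the complex are assigned at random.

```python
import random


def transfer_free_flags(apo_flags, complex_hkls, free_fraction=0.05, seed=0):
    """Carry the apo R-free test set over to a complex dataset (sketch).

    apo_flags: dict mapping a Miller index (h, k, l) to True if that
        reflection was in the apo test (free) set, False otherwise.
    complex_hkls: iterable of Miller indices measured for the complex.
    free_fraction: probability that a reflection new to the complex
        dataset is placed in the test set (hypothetical default).
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    flags = {}
    for hkl in complex_hkls:
        if hkl in apo_flags:
            # Shared reflection: keep the apo assignment unchanged.
            flags[hkl] = apo_flags[hkl]
        else:
            # New reflection (e.g. extra high-resolution data): assign randomly.
            flags[hkl] = rng.random() < free_fraction
    return flags


if __name__ == "__main__":
    apo = {(1, 0, 0): True, (0, 1, 0): False}
    print(transfer_free_flags(apo, [(1, 0, 0), (0, 1, 0), (2, 0, 0)]))
```

In practice a crystallographic suite would handle this step, including symmetry expansion and reindexing, which the sketch deliberately ignores.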
The following tools are available to inspect the agreement between protein structures and their associated experimental data:
- Visual inspection of the Fo-Fc and 2Fo-Fc maps, using software such as Coot, is essential to assess qualitatively whether a structure is justified by the evidence.
- Local measures of quality, for example the real-space correlation coefficient (RSCC)
- Their own tool, Twilight, which makes use of the above together with a global quality measure, the resolution
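The RSCC is a Pearson correlation between the observed and the model-calculated electron density, sampled at the same grid points around the ligand. A minimal sketch in plain Python follows; `rscc` is a hypothetical helper, and how the grid points are chosen (for example, within some radius of the ligand atoms) and how the calculated density is generated are assumptions left to the caller, since different programs make different choices there.

```python
import math


def rscc(rho_obs, rho_calc):
    """Real-space correlation coefficient between two density samples.

    rho_obs, rho_calc: equal-length sequences of density values sampled at
    the same grid points around the ligand (observed and model-calculated).
    """
    if len(rho_obs) != len(rho_calc) or not rho_obs:
        raise ValueError("need two non-empty samples of equal length")
    n = len(rho_obs)
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    # Covariance and variances about the means (the 1/n factors cancel).
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    if var_o == 0 or var_c == 0:
        raise ValueError("density sample has zero variance")
    return cov / math.sqrt(var_o * var_c)


if __name__ == "__main__":
    print(rscc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # → 1.0
```

Because it is a correlation, the RSCC is insensitive to the absolute scale of the two maps, which is one reason it is convenient as a local, per-ligand measure.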
Methods and results
In a separate publication they had analysed every PDB entry containing both ligands and deposited structure factors. In this sample, 7.6% of ligands had RSCC values below 0.6, the arbitrary cut-off they use to decide whether the experimental evidence supports the model coordinates.
In this publication they visually inspected a subset of these structures to assess in more detail how effective that arbitrary cut-off is and to identify the reasons for poor correlation. They found the following:
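Applying the 0.6 cut-off to a set of ligands and reporting the fraction flagged, as in the 7.6% figure above, amounts to a simple count. The sketch below uses made-up RSCC values purely for illustration; `fraction_below_cutoff` is a hypothetical helper, and the strict "less than" comparison follows the wording of the cut-off.

```python
RSCC_CUTOFF = 0.6  # the arbitrary cut-off described in the text


def fraction_below_cutoff(rscc_values, cutoff=RSCC_CUTOFF):
    """Return the fraction of ligands whose RSCC is strictly below the cut-off."""
    values = list(rscc_values)
    if not values:
        raise ValueError("no RSCC values supplied")
    flagged = sum(1 for v in values if v < cutoff)
    return flagged / len(values)


if __name__ == "__main__":
    # Hypothetical RSCC values for five ligands; two fall below 0.6.
    print(fraction_below_cutoff([0.95, 0.88, 0.55, 0.72, 0.41]))  # → 0.4
```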
(i) Ligands incorrectly identified as questionable, i.e. false positives (7.4%)
(ii) Incorrectly modelled ligands (5.2%)
(iii) Ligands with partially missing density (29.2%)
(iv) Glycosylation sites (31.3%)
(v) Ligands placed into electron density that is likely to originate from mother-liquor components
(vi) Incorrect ligand (4.7%)
(vii) Ligands that are entirely unjustified by the electron density (11.9%)
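Reading each percentage above as the share of the inspected, flagged ligands that fell into that category, the false-positive rate is simply the share of category (i). A minimal tally, assuming each inspected ligand has been given exactly one category label; the labels and both helpers here are hypothetical.

```python
from collections import Counter


def category_percentages(labels):
    """Percentage of inspected ligands assigned to each category label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels supplied")
    return {category: 100.0 * n / total for category, n in counts.items()}


def false_positive_rate(labels, justified_label="false positive"):
    """Percentage of flagged ligands that inspection showed to be justified."""
    return category_percentages(labels).get(justified_label, 0.0)


if __name__ == "__main__":
    labels = ["false positive", "partial density", "partial density", "glycosylation"]
    print(category_percentages(labels))
    # → {'false positive': 25.0, 'partial density': 50.0, 'glycosylation': 25.0}
```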
The first point to note is that the false-positive rate at an RSCC cut-off of 0.6 is 7.4%, which shows that this value alone is not sufficient to identify incorrect ligand coordinates accurately. The errors in all the other categories can be attributed to one, or a combination, of the following two factors:
- Inexperience: the crystallographer is unable to interpret the data in front of them
- Wilful denial: the crystallographer disregards the data in front of them in order to present the result they wanted to see
The paper observed that a disproportionate number of poorly supported ligand models were found at glycosylation sites. In some instances these questionable models were used to draw conclusions about the biochemistry of the protein in question. Interestingly, this echoes observations made almost a decade earlier, yet many of the examples in the Twilight paper date from 2008 or later. This indicates that the community as a whole is not reacting to the problem and needs further prodding.
Conclusions and suggestions
For inexperienced users looking at ligand-protein complexes from the PDB:
- Inspect the electron density map using Coot, if it is available, to determine qualitatively whether there is evidence for the ligand being there
- If using large numbers of ligand-protein complexes, use a script such as Twilight to find the RSCC value for each ligand, giving some confidence that the ligand is actually present as stated
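For users screening many entries, per-residue RSCC values are also reported in the wwPDB validation reports distributed with X-ray PDB entries. The sketch below assumes the XML layout of those reports, in which each residue appears as a `ModelledSubgroup` element with `resname`, `chain`, `resnum` and `rscc` attributes; check this against the files you actually download. It is not Twilight itself, and `low_rscc_ligands` and the inline example XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def low_rscc_ligands(xml_text, ligand_names, cutoff=0.6):
    """List ligand instances whose RSCC is below the cut-off.

    Assumes wwPDB-style validation XML in which each residue is a
    ModelledSubgroup element carrying resname, chain, resnum and rscc
    attributes (an assumption about the file layout).
    """
    root = ET.fromstring(xml_text)
    flagged = []
    for group in root.iter("ModelledSubgroup"):
        if group.get("resname") not in ligand_names:
            continue  # not one of the ligands of interest
        value = group.get("rscc")
        if value is None:
            continue  # no density-fit metric reported for this residue
        score = float(value)
        if score < cutoff:
            flagged.append((group.get("chain"), group.get("resnum"),
                            group.get("resname"), score))
    return flagged


# Hypothetical example input, not taken from a real entry.
EXAMPLE = """<wwPDB-validation-information>
  <ModelledSubgroup chain="A" resnum="401" resname="ATP" rscc="0.91"/>
  <ModelledSubgroup chain="B" resnum="402" resname="ATP" rscc="0.48"/>
  <ModelledSubgroup chain="A" resnum="12" resname="GLY" rscc="0.95"/>
</wwPDB-validation-information>"""

if __name__ == "__main__":
    print(low_rscc_ligands(EXAMPLE, {"ATP"}))  # → [('B', '402', 'ATP', 0.48)]
```

A flagged ligand is a prompt to look at the density, not a verdict: as the false-positive rate above shows, a low RSCC alone does not prove the model is wrong.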
For the crystallographic community:
- Improved training of crystallographers to ensure errors due to genuine misinterpretation of the underlying data are minimised
- Wider submission of electron-density maps; even if they are not made publicly available, they should form part of initial structure validation
- Software is easy to use, but its output is difficult to analyse