Protein structure determination is still dominated by X-ray diffraction. For diffraction studies, structural biologists need to grow and optimise protein crystals until they diffract to a usable, and ideally optimal, resolution. A purified protein sample is exposed to a number of crystallisation screens, each comprising a selection of chemical conditions designed to explore a reasonably wide area of crystallisation space.
Many crystallography labs routinely image the resulting crystallisation plates in large plate storage systems, which reduces the human interaction to viewing a set of usually 100-1000 images at various time points. This is still a slow and laborious process, and one well suited to machine learning approaches tailored to images. TexRank, a texton-analysis ranking software, was developed by Jia Tsing Ng in OPIG (Ng et al., 2014) and is used at the Structural Genomics Consortium (SGC). It ranks droplet images by the likely presence of crystals, which reduces the number of images that a human needs to search through and provides a quicker review process.
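To make the idea of ranking concrete, here is a minimal sketch of how a per-image score can shorten a review list. It is not TexRank's texton analysis: the `DropImage` type, the `rank_for_review` helper, and the scores are all hypothetical and stand in for whatever score a ranking tool produces.

```python
from typing import NamedTuple


class DropImage(NamedTuple):
    well: str     # plate well identifier (hypothetical naming)
    score: float  # hypothetical crystal-likelihood score, higher = more likely


def rank_for_review(images, top_n):
    """Return the top_n images sorted by descending score."""
    return sorted(images, key=lambda im: im.score, reverse=True)[:top_n]


# Invented scores, for illustration only.
images = [
    DropImage("A1", 0.12),
    DropImage("A2", 0.87),
    DropImage("B1", 0.45),
    DropImage("B2", 0.91),
]

shortlist = rank_for_review(images, top_n=2)
print([im.well for im in shortlist])
```

A reviewer would then look at the shortlist first and only go further down the ranking if nothing promising turns up.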
However, the ultimate aim is to further reduce or remove the human review step. The first step is to classify images, with the most important classification being whether a crystal is present. MARCO (Bruno et al., 2018) uses annotated images to train a deep convolutional neural network that classifies images into four categories:
- Crystals: 91% predicted
- Precipitate: 96.1% predicted
- Clear: 97.9% predicted
- Other: 69.6% predicted
This is typically better than human classification: when a human classifies two image sets, at the beginning and end of ~1000 crystal images, they are around 85% accurate (Snell et al., 2008).
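Per-category figures like those above can be computed from a labelled test set. The sketch below assumes that each percentage is the fraction of images of a given true category that the classifier assigned to that category; that reading, the `per_class_accuracy` helper, and the toy labels are assumptions for illustration, not MARCO's evaluation code or data.

```python
from collections import Counter

CLASSES = ("Crystals", "Precipitate", "Clear", "Other")


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of images in each true class that were predicted as that class."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    # Skip classes that do not occur in the test set to avoid dividing by zero.
    return {c: correct[c] / totals[c] for c in CLASSES if totals[c]}


# Toy labels, invented for illustration.
true_labels = ["Crystals", "Crystals", "Clear", "Clear",
               "Precipitate", "Other", "Other", "Precipitate"]
predicted = ["Crystals", "Precipitate", "Clear", "Clear",
             "Precipitate", "Clear", "Other", "Precipitate"]

for name, acc in per_class_accuracy(true_labels, predicted).items():
    print(f"{name}: {acc:.1%}")
```

Reporting accuracy per category rather than overall matters here because the categories are unlikely to be equally common, and missing a crystal is more costly than mislabelling a clear drop.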
Cinder (Crystallographic Tinder) is an app (Android & iOS) that collects human categorisations of crystals to produce a labelled set that can be used for further machine learning approaches to categorising images (Rosa et al., 2018). A user can swipe to classify an image into one of four categories. A learning mode (KInder) is supplied to teach new crystallographers how to classify a variety of image types. The app can also be used to score a user’s own plates (C3 facility users).
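One common way to turn many individual swipes into a labelled set is a majority vote per image. The sketch below illustrates that general technique only; it is not a description of how Cinder aggregates its data, and the `consensus_labels` helper, its thresholds, and the example swipes are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_labels(swipes, min_votes=3, min_agreement=0.6):
    """Keep an image's majority label only if enough users voted and agreed.

    swipes is an iterable of (image_id, label) pairs, one per user swipe.
    """
    votes = defaultdict(Counter)
    for image_id, label in swipes:
        votes[image_id][label] += 1

    consensus = {}
    for image_id, counts in votes.items():
        total = sum(counts.values())
        label, n = counts.most_common(1)[0]
        if total >= min_votes and n / total >= min_agreement:
            consensus[image_id] = label
    return consensus


# Invented swipes, for illustration only.
swipes = [
    ("img1", "Crystals"), ("img1", "Crystals"), ("img1", "Precipitate"),
    ("img2", "Clear"), ("img2", "Other"), ("img2", "Precipitate"),
    ("img3", "Clear"),
]
print(consensus_labels(swipes))
```

Images without a clear consensus (here `img2` and `img3`) are left out of the labelled set rather than given an unreliable label.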
Although identifying a crystal or precipitate in a drop is essential, reducing the human interaction will require further classification efforts. For example, a crystal screening drop may contain precipitate, crystals and microcrystals. Identifying these features hierarchically will be needed to further study whether that condition could be considered viable. Furthermore, following the potential crystallinity of a drop over time is important for determining whether a condition can be optimised to produce higher quality crystals. Classifying crystallisation outcomes would ideally be used to predict the conditions in which a protein may crystallise; however, this is still far from reality in the crystallisation community.
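As a rough illustration of following a drop over time, a drop can be represented as a series of observations, each holding the set of features seen at that time point, so that one image can carry several labels at once. The representation, the `first_appearance` helper, and the example time course below are hypothetical and only sketch the idea.

```python
def first_appearance(time_course, feature):
    """Return the earliest time (in hours) at which feature is seen, or None."""
    for hours, features in sorted(time_course, key=lambda obs: obs[0]):
        if feature in features:
            return hours
    return None


# Invented time course for one drop: (hours since setup, features observed).
drop = [
    (0, {"clear"}),
    (24, {"precipitate"}),
    (72, {"precipitate", "microcrystals"}),
    (168, {"precipitate", "crystals"}),
]

print(first_appearance(drop, "microcrystals"))
print(first_appearance(drop, "crystals"))
```

A drop that shows microcrystals early and crystals later might be a candidate for optimisation, whereas one that only ever shows precipitate might not; any such rule would need to be validated against real outcomes.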
References
- Ng, Jia Tsing et al. (2014) Using textons to rank crystallization droplets by the likely presence of crystals. Acta Crystallographica Section D: Biological Crystallography 70(10): 2702–2718.
- Snell, Edward H. et al. (2008) Establishing a training set through the visual analysis of crystallization trials. Part I: ∼150 000 images. Acta Crystallographica Section D: Biological Crystallography 64(11): 1123–1130.
- Bruno AE, Charbonneau P, Newman J, Snell EH, So DR, et al. (2018) Classification of crystallization outcomes using deep convolutional neural networks. PLOS ONE 13(6): e0198883. https://doi.org/10.1371/journal.pone.0198883
- Rosa, N., Ristic, M., Marshall, B. & Newman, J. (2018). Acta Cryst. F74, 410-418.