Category Archives: Protein Structure

OpenMM Setup: Start Simulating Proteins in 5 Minutes

Molecular dynamics (MD) simulations are a good way to explore the dynamical behaviour of a protein you are interested in. One common problem is that most MD engines have a relatively steep learning curve.

What if you just want to run a simple, one-off simulation with no fancy enhanced sampling methods? OpenMM Setup is a useful tool for exactly this. It is built on the open-source OpenMM engine and provides an easy-to-install (via conda) GUI that can have you running a simulation in under 5 minutes. Of course, running a simulation properly requires careful choice of parameters and familiarity with best practices. That is beyond the scope of this post, but there are many guides available. Now on to the good stuff: using OpenMM Setup!

When you first run OpenMM Setup, you’ll be greeted by a browser window asking you to choose a structure to use. This can be a crystal structure or a model. Remember, sometimes these will have problems that need fixing like missing density or charged, non-physiological termini that would lead to artefacts, so visual inspection of the input is key! You can then choose the force field and water model you want to use, and tell OpenMM to do some cleaning up of the structure. Here I am running the simulation on hen egg-white lysozyme:


Fragment Based Drug Discovery with Crystallographic Fragment Screening at XChem and Beyond

Disclaimer: I’m a current PhD student working on PanDDA 2 for Frank von Delft and Charlotte Deane, sponsored by Global Phasing, and some of this is my opinion. If a claim isn’t obviously from one of the references, I probably said it, so take it with a pinch of salt.

Fragment Based Drug Discovery

Principle

Fragment based drug discovery (FBDD) is a technique for finding lead compounds for medicinal chemistry. In FBDD a protein target of interest is identified for inhibition and a small library, typically of a few hundred compounds, is screened against it. Though these typically bind weakly, they can be used as a starting point for chemical elaboration towards something more lead-like. This approach is primarily contrasted with high throughput screening (HTS), in which an enormous number of larger, more complex molecules are screened in order to find ones which bind. The key idea is that the molecules in these HTS libraries can typically be broken down into a much smaller number of common substructures, or fragments, so screening these ought to be more informative: between them they describe more of the “chemical space” which interacts with the protein. Since it first appeared about 25 years ago, FBDD has delivered four drugs for clinical use and over 40 molecules to clinical trials.


Model validation in Crystallographic Fragment Screening

Fragment based drug discovery is a powerful technique for finding lead compounds for medicinal chemistry. Crystallographic fragment screening is particularly useful because it reveals not just whether a fragment binds, but also how it binds. This information allows for rational elaboration and merging of fragments.

However, this comes with a unique challenge: the confidence in the experimental readout, if and how a fragment binds, is tied to the quality of the crystallographic model that can be built. This intimately links crystallographic fragment screening to the general statistical idea of a “model”, and the statistical ideas of goodness of fit and overfitting.
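One widely used crystallographic measure of goodness of fit is the R-factor, the summed absolute difference between observed and calculated structure factor amplitudes divided by the summed observed amplitudes. It is usually reported both for the reflections used in refinement (R-work) and for a held-out set (R-free), and a large gap between the two is a common warning sign of overfitting. The sketch below illustrates the idea; the amplitudes are made up and the function name `r_factor` is an assumption for illustration, not taken from any particular package.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over a set of reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Made-up structure factor amplitudes, for illustration only.
work_obs = [100.0, 80.0, 60.0, 40.0]    # reflections used in refinement
work_calc = [98.0, 83.0, 58.0, 41.0]
free_obs = [90.0, 50.0]                 # held-out reflections
free_calc = [80.0, 58.0]

r_work = r_factor(work_obs, work_calc)
r_free = r_factor(free_obs, free_calc)
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f}, gap = {r_free - r_work:.3f}")
```

In this toy case the model fits the refinement reflections much better than the held-out ones, which is the pattern one would associate with overfitting.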


New review on BCR/antibody repertoire analysis out in MAbs!

In our latest immunoinformatics review, OPIG has teamed up with experienced antibody consultant Dr. Anthony Rees to outline the evidence for BCR/antibody repertoire convergence on common epitopes post-pathogen exposure, and all the ways we can go about detecting it from repertoire gene sequencing data. We highlight the new advances in the repertoire functional analysis field, including the role for OPIG’s latest tools for structure-aware antibody analytics: Structural Annotation of AntiBody repertoires+ (SAAB+), Paratyping, Ab-Ligity, Repertoire Structural Profiling & Structural Profiling of Antibodies to Cluster by Epitope (‘SPACE’).


Unraveling the role of entanglement in protein misfolding

Proteins that fail to fold correctly may populate misfolded conformations with disparate structure and function. Misfolding is the focus of intense research interest due to its putative and confirmed role in various diseases, including cystic fibrosis and neurodegenerative diseases such as Parkinson’s and Alzheimer’s (PMID: 16689923).

Many open questions about protein misfolding remain to be answered. For example, how do misfolded proteins evade cellular quality control mechanisms like chaperones to remain soluble but non-functional for long timescales? How long do misfolded states persist on average? How widespread is misfolding? Experiments indicate that misfolding can even be caused by synonymous mutations that alter the speed of protein translation but not the sequence of the protein produced (PMID: 23417067), introducing the additional puzzle of how the protein maintains a “memory” of its translation kinetics after synthesis is complete.

A series of four recent preprints (Preprints 1, 2, 3, and 4, see below) suggests that these questions can be answered by the partitioning of proteins into long-lived self-entangled conformations that are structurally similar to the native state but with perturbed function. Simulation of the synthesis, termination, and post-translational dynamics of a large dataset of E. coli proteins suggests that misfolding and entanglement are widespread, with two thirds of proteins misfolding some of the time (Preprint 1). Many misfolded conformations may bypass proteostasis machinery to remain soluble but non-functional due to their structural similarity to the native state. Critically, entanglement is associated with particularly long-lived misfolded states based on simulated folding kinetics.

Coarse-grain and all-atom simulation results indicate that these misfolded conformations interact with chaperones like GroEL and HtpG to a similar extent as does the native state (Preprint 2). These results suggest an explanation for why some protein always fails to refold while remaining soluble, even in the presence of multiple folding chaperones – it remains trapped in entangled conformations that resemble the native state and thus fail to recruit chaperones.

Finally, simulations indicate that changes to the translation kinetics of oligoribonuclease introduced by synonymous mutations cause a large change in its probability of entanglement at the dimerization interface (Preprint 3). These entanglements localized at the interface alter its ability to dimerize even after synthesis is complete. These simulations provide a structural explanation for how translation kinetics can have a long-timescale influence on protein behavior.

Together, these preprints suggest that misfolding into entangled conformations is a widespread phenomenon that may provide a consistent explanation for many unanswered questions in molecular biology. It should be noted that entanglement is not mutually exclusive with other types of misfolding, such as domain swapping, that may also occur in cells. Experimental validation of the existence of entangled conformations is a critical aspect of testing this hypothesis; for comparisons between simulation and experiment, see Preprint 4.

Preprint 1: https://www.biorxiv.org/content/10.1101/2021.08.18.456613v1

Preprint 2: https://www.biorxiv.org/content/10.1101/2021.08.18.456736v1

Preprint 3: https://www.biorxiv.org/content/10.1101/2021.10.26.465867v1

Preprint 4: https://www.biorxiv.org/content/10.1101/2021.08.18.456802v1

2021 likely to be a bumper year for therapeutic antibodies entering clinical trials; massive increase in new targets

Earlier this month the World Health Organisation (WHO) released Proposed International Nonproprietary Name List 125 (PL125), comprising the therapeutics entering clinical trials during the first half of 2021. We have just added this data to our Therapeutic Structural Antibody Database (Thera-SAbDab), bringing the total number of therapeutic antibodies recognised by the WHO to 711.

This is up from 651 at the end of 2020, a year which saw 89 new therapeutic antibodies introduced to the clinic. This rise of 60 in just the first half of 2021 bodes well for a record-breaking year of therapeutics entering trials.
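The arithmetic behind that comparison is easy to check directly. The snippet below uses only the three figures quoted above; the full-year projection naively assumes the second half of 2021 matches the first, which is an assumption made here for illustration and not a forecast from the WHO or Thera-SAbDab.

```python
total_end_2020 = 651   # therapeutic antibodies recognised at the end of 2020
total_mid_2021 = 711   # total after adding PL125 (first half of 2021)
new_in_2020 = 89       # new therapeutic antibodies over the whole of 2020

new_h1_2021 = total_mid_2021 - total_end_2020

# Naive projection (assumption): the second half of 2021 matches the first.
projected_2021 = 2 * new_h1_2021

print(f"New in first half of 2021: {new_h1_2021}")
print(f"Naive full-year projection: {projected_2021} (vs {new_in_2020} in 2020)")
```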


AlphaFold 2 is here: what’s behind the structure prediction miracle

Nature has now released that AlphaFold 2 paper, after eight long months of waiting. The main text reports more or less what we have known for nearly a year, with some added tidbits, although it is accompanied by a painstaking description of the architecture in the supplementary information. Perhaps more importantly, the authors have released the entirety of the code, including all details to run the pipeline, on GitHub. And there is no small print this time: you can run inference on any protein (I’ve checked!).

Have you not heard the news? Let me refresh your memory. In November 2020, a team of AI scientists from Google DeepMind indisputably won the 14th Critical Assessment of Structure Prediction (CASP14) competition, a biennial blind test where computational biologists try to predict the structure of several proteins whose structure has been determined experimentally but not publicly released. Their results were so astounding, and the problem so central to biology, that they took the entire world by surprise and left an entire discipline, computational biology, wondering what had just happened.


Automated intermolecular interaction detection using the ODDT Python Module

Detecting intermolecular interactions is often one of the first steps when assessing the binding mode of a ligand. This usually involves the human researcher opening up a molecular viewer and checking the orientations of the ligand and protein functional groups, sometimes aided by the viewer’s own interaction-detecting functionality. For looking at single-digit numbers of structures, this approach works fairly well, especially as more experienced researchers can spot cases where the automated interaction detection has failed. When analysing tens or hundreds of binding sites, however, an automated way of detecting and recording interaction information for downstream processing is needed. When I had to do this recently, I used an open-source Python module called ODDT (Open Drug Discovery Toolkit; its full documentation is available online).

My use case was fairly standard: starting with a list of holo protein structures as .pdb files and their corresponding ligands in .sdf format, I wanted to detect any hydrogen bonds between a ligand and its native protein crystal structure. Specifically, I needed the number and name of the interacting residue, its chain ID, and the name of the protein atom involved in the interaction. A general example on how to do this can be found in the ODDT documentation. Below, I show how I have used the code on PDB structure 1a9u.
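To make the kind of record described above concrete, here is a plain-Python sketch that needs no third-party packages. It is not ODDT's API or its detection algorithm, and it is not the code used on 1a9u: it applies a simple distance-only criterion (an assumed 3.5 Å cutoff between nitrogen or oxygen atoms), and the `Atom` class, the `find_hbond_candidates` function and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Atom:
    name: str          # atom name, e.g. "N" or "OG"
    element: str       # element symbol
    coords: tuple      # (x, y, z) in angstroms
    resname: str = ""  # residue name (protein atoms only)
    resnum: int = 0    # residue number (protein atoms only)
    chain: str = ""    # chain ID (protein atoms only)


POLAR = {"N", "O"}     # elements treated as possible donors/acceptors
CUTOFF = 3.5           # assumed heavy-atom distance cutoff in angstroms


def find_hbond_candidates(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Return one record per polar protein/ligand atom pair within the cutoff."""
    records = []
    for prot in protein_atoms:
        if prot.element not in POLAR:
            continue
        for lig in ligand_atoms:
            if lig.element not in POLAR:
                continue
            distance = dist(prot.coords, lig.coords)
            if distance <= cutoff:
                records.append({
                    "resnum": prot.resnum,
                    "resname": prot.resname,
                    "chain": prot.chain,
                    "protein_atom": prot.name,
                    "ligand_atom": lig.name,
                    "distance": round(distance, 2),
                })
    return records


# Toy coordinates, not taken from any real structure.
protein = [
    Atom("N", "N", (0.0, 0.0, 0.0), "SER", 10, "A"),
    Atom("CA", "C", (1.5, 0.0, 0.0), "SER", 10, "A"),
    Atom("OD1", "O", (10.0, 0.0, 0.0), "ASP", 25, "A"),
]
ligand = [Atom("N1", "N", (0.0, 3.0, 0.0)), Atom("C2", "C", (1.0, 3.0, 0.0))]

for record in find_hbond_candidates(protein, ligand):
    print(record)
```

A real hydrogen bond detector also checks donor and acceptor typing and the bond angles, so a distance-only filter like this one will over-report contacts.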


The Smallest Allosteric System

Allostery is still a poorly understood but very general mechanism in the protein world. In principle, an allosteric event occurs when a ligand (small or big) binds to a certain site of a protein and something (activity or function) changes at a different, distant site. A well-known example would be G-protein-coupled receptors, which transmit such an allosteric signal even across a membrane. But the two sites do not have to be that far apart. As part of the Protein Folding and Dynamics series, I recently watched a talk by Peter Hamm (Zurich), who presented work on an allosteric system that I thought was very interesting because it was small and, most importantly, controllable.

PDZ domains are peptide-binding domains, often part of multi-domain proteins. For the work presented, the researchers used the PDZ3 domain, which is a bit special in having an additional (third) C-terminal α-helix (the α3-helix) that packs against the other side of the binding pocket. Previous work (Petit et al. 2009) had shown that removal of the α3-helix changed ligand affinity but not PDZ structure; the major changes were instead entropic in nature. Peter Hamm’s group linked an azobenzene-derived photoswitch to that α3-helix: in its cis configuration the photoswitch stabilises the α3-helix, and in trans it destabilises it (see Figure 1).

Figure 1: PDZ3 domain (purple) and photoswitch (red) have different affinities for the peptide ligand (green), depending on the photoswitch’s isomerisation state (and temperature). From Bozovic, O., Jankovic, B. & Hamm, P. Sensing the allosteric force. Nat Commun 11, 5841 (2020). https://doi.org/10.1038/s41467-020-19689-7

The Coronavirus Antibody Database: 10 months on, 10x the data!

Back in May 2020, we released the Coronavirus Antibody Database (‘CoV-AbDab’) to capture molecular information on existing coronavirus-binding antibodies, and to track what we anticipated would be a boon of data on antibodies able to bind SARS-CoV-2. At the time, we had found around 300 relevant antibody sequences and a handful of solved crystal structures, most of which were characterised shortly after the SARS-CoV epidemic of 2003. We had no idea just how many SARS-CoV-2 binding antibody sequences would come to be released into the public domain…

10 months later (2nd March 2021), we have now tracked 2,673 coronavirus-binding antibodies, ~95% with full Fv sequence information and ~5% with solved structures. These datapoints originate from hundreds of independent studies reported in either the academic literature or patent filings.
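As a rough sanity check, the percentages above translate into approximate counts as follows. Both percentages and the May 2020 baseline of "around 300" sequences are approximate in the text, so the outputs are estimates only.

```python
total_tracked = 2673   # antibodies tracked as of 2nd March 2021
initial_count = 300    # "around 300" sequences at release (approximate)

with_full_fv = round(total_tracked * 0.95)     # ~95% with full Fv sequence
with_structure = round(total_tracked * 0.05)   # ~5% with solved structures
fold_increase = total_tracked / initial_count  # growth relative to May 2020

print(f"~{with_full_fv} antibodies with full Fv sequence information")
print(f"~{with_structure} antibodies with solved structures")
print(f"~{fold_increase:.1f}x the initial dataset")
```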

The entire contents of the CoV-AbDab database as of 2nd March 2021.