Category Archives: Uncategorized

Interesting Antibody Papers

This time round, one older paper, one recent paper. The older one talks about estimating how many H3s can there be in a human body based on sequencing of two individuals (they cap it at 9 million — not that much!). The more recent one is an attempt to define what makes a good antibody in terms of its developability properties (a battery of biophys assays on 150 therapeutic antibodies- amazing dataset to work with).

High resolution description of antibody heavy chain repertoires in (two) humans (Koralov lab at NYU). Here. Two individuals were sequenced and their VDJ frequencies measured. It is widely believed that the VDJ recombination events are largely independent and random. Here however they demonstrate some biases/interplay between the D and J regions. Since H3 falls on the VDJ junction, it might suggest that it affects the total choice of H3. Another quite important point is that they compared the productive vs nonproductive sequences (out of frame or with stop codons). If there were significant differences between the VDJ frequencies of productive vs nonproductive sequences, it would suggest selection at the VDJ frequency stage. However they do not see any significant differences here suggesting that VDJ combinations have little bearing on this initial selection step. Finally, they estimate the number of H3 in repertoire. The technique is interesting — they sample 1000 H3s from their set and see how many unique sequences it contributes. Each next sample contributes less and less unique sequences which leads to a log-decay curve. By doing so they get a rough estimate of when there will be no more new sequences added and hence an estimate of diversity (think why do this rather than counting the number of uniques!). They allow themselves to extrapolate this estimate to the whole organism by multiplying their blood sample by the total human body volume — they motivate this extrapolation by the fact that there were precious little overlaps between the two human subjects.

Biophysical landscape of clinical stage antibodies [here]. Paper from Adimab. Designing an antibody which binding its target is only a first step on the way to bring the drug on the market. The molecule needs to fulfill a variety of characteristics such as colloidal stability (does not aggregate or ‘clump up’), does not instantly clear from the organism (which is usually down to off target binding), is stable and can be expressed in reasonable quantities. In an effort to delineate what makes a good antibody, the authors take inspiration from earlier work on small molecules, namely the Lipinski Rules of Five. This set of rules describes what makes a ‘good’ small molecule drug, which was assessed by looking at ~2000 therapeutic drugs. The rules came down to certain numbers of hydrogen bond donors, acceptors, molecular weight & lipophilicity. Therefore, Jain et al would like a similar methodology, but for antibodies: give me an antibody and using methodology/rules we define, we will tell you either to carry on with development or maybe not. Since antibodies are far more complex and the data on therapeutic abs orders of magnitude smaller (around 50 therapeutic abs to date) Jain et al, had to devise a more nuanced approach than simply counting hb donors/acceptors mass etc. The underlying ‘good’ molecule data though is similar: they picked therapeutic antibodies and those in late clinical testing stages (2,3). This resulted in ~150 antibodies. So as to devise the benchmark ‘rules/methodology’, they went for a battery of assays to serve as a benchmark — if your ab raises too many red flags according to these assays, it’s not great (what constitutes a red flag to be defined). These assays were supposed to not be obscure and relatively easy to use as the point was that an arbitrary antibody can be relatively easy checked against them. The assays are a range of expression, cross reactivity, self reactivity, thermal stability etc. To define red flags, they run their therapeutic/clinical antibodies through the tests. To their surprise quite a lot of these molecules turn out to have quite ‘undesirable characteristics’. Following the Lipinski Rules, they define a red flag as being in the bottom 10th percentile of the assay values as evaluated on the therapeutic abs. They show that the antibodies which are approved or in more advanced clinical trials stages have less red flags. Therefore, the take-home messages from this paper: very nice dataset for any computational work, raising red flags does not disqualify you from being a therapeutic.

Interesting Antibody Papers

Hints how broadly neutralizing antibodies arise (paper here). (Haynes lab here) Antibodies can be developed to bind virtually any antigen. There is a stark difference however between the ‘binding’ antibodies and ‘neutralizing’ antibodies. Binding antibodies are those that make contact with the antigen and perhaps flag it for elimination. This is in contrast to neutralizing antibodies, whose binding eliminates the biological activity of the antigen. A special class of such neutralizing antibodies are ‘broad neutralizing antibodies’. These are molecules which are capable of neutralizing multiple strains of the antigen. Such broadly neutralizing antibodies are very important in the fight against highly malleable diseases such as Influenza or HIV.

The process how such antibodies arise is still poorly understood. In the manuscript of Williams et al., they make a link between the memory and plasma B cells of broadly neutralizing antibodies and find their common ancestor. The common ancestor turned out to be auto-reactive, which might suggest that some degree of tolerance is necessary to allow for broadly neutralizing abs (‘hit a lot of targets fatally’). From a more engineering perspective, they create chimeras of the plasma and memory b cells and demonstrate that they are much more powerful in neutralizing HIV.

Ineresting data: their crystal structures are different broadly neutralizing abs co-crystallized with the same antigen (altought small…). Good set for ab-specific docking or epitope prediction — beyond the other case like that in the PDB (lysozyme)! At the time of writing the structures were still on hold in the PDB so watch this space…

Interesting Antibody Papers

Below are two somewhat recent papers that are quite relevant to those doing ab-engineering. The first one takes a look at antibodies as a collection — software which better estimates a diversity of an antibody repertoire. The second one looks at each residue in more detail — it maps the mutational landscape of an entire antibody, showing a possible modulating switch for VL-CL interface.

Estimating the diversity of an antibody repertoire. (Arnaout Lab) paper here. High Throughput Sequencing (or next generation sequencing…) of antibody repertoires allows us to get snapshots of the overall antibody population. Since the antibody population ‘diversity’ is key to their ability to find a binder to virtually any antigen, it is desirable to quantify how ‘diverse’ the sample is as a way to see how broad you need to cast the net. Firstly however, we need to know what we mean by ‘diversity’. One way of looking at it is akin to considering ‘species diversity’, studied extensively in ecology. For example, you estimate the ‘richness’ of species in a sample of 100 rabbits, 10 wolves and 20 sheep. Diversity measures such as Simpson’s index or entropy were used to calculate how biased the diversity is towards one species. Here the sample is quite biased towards rabbits, however if instead we had 10 rabbits, 10 wolves and 10 sheep, the ‘diversity’ would be quite uniform. Back to antibodies: it is desirable to know if a given species of an antibody is more represented than others or if one is very underrepresented. This might indicate healthy vs unhealthy immune system, indicate antibodies carrying out an immune response (when there is more of a type of antibody which is directing the immune response). Problem: in an arbitrary sample of antibody sequences/reads tell me how diverse they are. We should be able to do this by estimating the number of cell clones that gave rise to the antibodies (referred to as clonality). People have been doing this by grouping sequences by CDR3 similarity. For example, sequences with CDR3 identical or more than >95% identity, are treated as the same cell — which is tantamount to being the same ‘species’. However since the number of diverse B cells in a human organism is huge, HTS only provides a sample of these. Therefore some rarer clones might be underrepresented or missing altogether. To address this issue, Arnaout and Kaplinsky developed a methodology called Recon which estimates the antibody sample diversity. It is based on the expectation-maximization algorithm: given a list of species and their numbers, iterate adding parameters until they have a good agreement between the fitted distributions and the given data. They have validated this methodology firstly on the simulated data and then on the DeKosky dataset. The code is available from here subject to their license agreement.

Thorough analysis of the mutational landscape of the entire antibody. [here]. (Germaine Fuh from Affinta/Genentech/Roche). The authors aimed to see how malleable the variable antibody domains are to mutations by introducing all possible modifications at each site in an example antibody. As the subject molecule they have used high-affinity, very stable anti-VEGF antibody G6.31. They argue that this antibody is a good representative of human antibodies (commonly used genes Vh3, Vk1) and that its optimized CDRs might indicate well any beneficial distal mutations. They confirm that the positions most resistant to mutation are the core ones responsible for maintaining the structure of the molecule. Most notably here, they have identified that Kabat L83 position correlates with VL-CL packing. This position is most frequently a phenylalanine and less frequently valine or alanine. This residue is usually spatially close to isoleucine at position LC-106. They have defined two conformations of L83F — in and out:

  1. Out: -50<X1-100 interface.
  2. In: 50<X1<180

Being in either of these positions correlates with the orientation of LC-106 in the elbow region. This in turn affects how big the VL-CL interface is (large elbow angle=small  tight interface; small elbow angle=large interface). The L83 position often undergoes somatic hypermutation, as does the LC-106 with the most common mutation being valine.

CCP4 Study Weekend 2017: From Data to Structure

This year’s CCP4 study weekend focused on providing an overview of the process and pipelines available, to take crystallographic diffraction data from spot intensities right through to structure. Therefore sessions included; processing diffraction data, phasing through molecular replacement and experimental techniques, automated model building and refinement. As well as updates to CCP4 and where is crystallography going to take us in the future?

Surrounding the meeting there was also a session for Macromolecular (MX) crystallography users of Diamond Light Source (DLS), which gave an update on the beamlines, and scientific software, as well as examples of how fragment screening at DLS has been used. The VMXi (Versatile Macromolecular X-tallography in-situ) beamline is being developed to image crystals that are forming in situ crystallisation plates. This should allow for crystallography to be optimized, as crystallization conditions can be screened, and data collected on experiments as they crystallise, especially helpful in cases where crystallisation has routinely led to non-diffracting crystals. VXMm is a micro/nanofocus MX beamline, which is in development, with a focus to get crystallographic from very small crystals (~300nm to 10 micron diameters, with a bias to the smaller size), thereby allowing crystallography of targets that have previously been hard to get sufficient crystals. Other updates included how technology developed for fast solid state data collection on x-ray free electron lasers (XFEL) can be used on synchrotron beamlines.

A slightly more in-depth discussion of two tools presented that were developed for use alongside and within CCP4, which might be of interest more broadly:

ConKit: A python interface for contact prediction tools

Contact prediction for proteins, at its simplest, involves estimating which residues within a certain certain spatial proximity of each other, given the sequence of the protein, or proteins (for complexes and interfaces). Two major types of contact prediction exist:

  • Evolutionary Coupling
  • Supervised machine learning
    • Using ab initio structure prediction tools, without sequence homologues, to predict which contacts exist, but with a much lower accuracy than evolutionary coupling.

fullscreen

ConKit is a python interface (API) for contact prediction tools, consisting of three major modules:

  • Core: A module for constructing hierarchies, thereby storing necessary data such as sequences in a parsable format.
    • Providing common functionality through functions that for example declare a contact as a false positive.
  • Application: Python wrappers for common contact prediction and sequence alignment applications
  • I/O: I/O interface for file reading, writing and conversions.

Contact prediction can be used in the crystallographic structure determination field, during unconventional molecular replacement, using a tool such as AMPLE. Molecular replacement is a computational strategy to solve the phase problem. In the typical case, by using homologous structures to determine an estimate a model of the protein, which best fits the experimental diffraction intensities, and thus estimate the phase. AMPLE utilises ab initio modeling (using Rosetta) to generate a model for the protein, contact prediction can provide input to this ab initio modeling, thereby making it more feasible to generate an appropriate structure, from which to solve the phase problem. Contact prediction can also be used to analyse known and unknown structures, to identify potential functional sites.

For more information: Talk given at CCP4 study weekend (Felix Simkovic), ConKit documentation

ACEDRG: Generating Crystallographic Restraints for Ligands

Small molecule ligands are present in many crystallographic structures, especially in drug development campaigns. Proteins are formed (almost exclusively) from a sequence containing a selection of 20 amino acids, this means there are well known restraints (for example: bond lengths, bond angles, torsion angles and rotamer position) for model building or refinement of amino acids. As ligands can be built from a much wider selection of chemical moieties, they have not previously been restrained as well during MX refinement. Ligands found in PDB depositions can be used as models for the model building/ refinement of ligands in new structures, however there are a limited number of ligands available (~23,000). Furthermore, the resolution of the ligands is limited to the resolution of the macro-molecular structure from which they are extracted.

ACEDRG utilises the crystallorgraphy open database (COD), a library of (>300,000) small molecules usually with atomic resolution data (often at least 0.84 Angstrom), to generate a dictionary of restraints to be used in refining the ligand. To create these restraints ACEDRG utilises the RDkit chemoinformatics package, generating a detailed descriptor of each atom of the ligands in COD. The descriptor utilises properties of each atom including the element name, number of bonds, environment of nearest neighbours, third degree neighbours that are aromatic ring systems. The descriptor, is stored alongside the electron density values from the COD.  When a ACEDRG query is generated, for each atom in the ligand, the atom type is compared to those for which a COD structure is available, the nearest match is then used to generate a series of restraints for the atom.

ACEDRG can take a molecular description (SMILES, SDF MOL, SYBYL MOL2) of your ligand, and generate appropriate restraints for refinement, (atom types, bond lengths and angles, torsion angles, planes and chirality centers) as a mmCIF file. These restraints can be generated for a number of different probable conformations for the ligand, such that it can be refined in these alternate conformations, then the refinement program  can use local scoring criteria to select the ligand conformation that best fits the observed electron density. ACEDRG can accessed through the CCP4i2 interface, and as a command line interface.

Hopefully a useful insight to some of the tools presented at the CCP4 Study weekend. For anyone looking for further information on the CCP4 Study weekend: Agenda, Recording of Sessions, Proceedings from previous years.

Transgenic Mosquitoes

At the meeting on November 15 I have covered a paper by Gantz et al. describing a method for creating transgenic mosquitoes expressing antibodies hindering the development of malaria parasites.

The immune system is commonly divided into two categories: innate and adaptive. The innate immune system consists of non-specific defence mechanisms such as epithelial barriers, macrophages etc. The innate system is present in virtually every living organism. The adaptive immune system is responsible for invader-specific defence response. Is consists of B and T lymphocytes and encompasses antibody production. As only vertebrates posses the adaptive immune system, mosquitoes do not naturally produce antibodies which hinders their ability to defend themselves against pathogens such as malaria.

In the study by Gantz et al. the authors inserted transgenes expressing three single-chain Fvs (m4B7, m2A10 and m1C3) into the previously-characterised chromosomal docking sites.

Figure 1: The RT-PCR experiments showing the scFv expression in different mosquito strains

RT-PCR was used to detect scFv transcripts in RNA isolated from the transgenic mosquitoes (see Figure 1). The experiments showed that the attP 44-C recipient line allowed expression of the transgenes coding for the scFvs.

The authors evaluated the impact of the modifications on the fitness of the mosquitoes. It was shown that the transgene expression does not reduce the lifespan of the mosquitoes, or their ability to procreate.

Expression of the scFvs targeted the parasite at both the early and late development stages. The transgenic mosquitoes displayed a significant reduction in the number of malaria sporozoites per infected female, in most cases completely inhibiting the sporozoite development.

Overall the study showed that it is possible to develop transgenic mosquitoes that are resistant to malaria. If this method was combined with a mechanism for a gene spread, the malaria-resistant mosquitoes could be released into the environment, helping to fight the spread of this disease.

End of an era?

The Era of Crystallography ends…

For over 100 years, crystallography has been used to determine the atom arrangements of molecules; specifically, it has become the workhorse of routine macromolecular structure solution, being responsible for over 90% of the atomic structures in the PDB. Whilst this achievement is impressive, in some ways it has come around despite the crystallographic method, rather than because of it…

The problem, generally, is this: to perform crystallography, you need crystals. Crystals require the spontaneous assembly of billions of molecules into a regular repeated arrangement. For proteins — large, complex, irregularly shaped molecules — this is not generally a natural state for them to exist in, and getting a protein to crystallise can be a difficult process (the notable exception is Lysozyme, which it is difficult NOT to crystallise, and there are subsequently currently ~1700 crystal structures of it in the PDB). Determining the conditions under which proteins will crystallise requires extensive screening: placing the protein into a variety of difference solutions, in the hope that in one of these, the protein will spontaneously self-assemble into (robust, homogeneous) crystals. As for membrane proteins, which… exist in membranes, crystallisation solutions are sort of ridiculous (clever, but ridiculous).

But even once a crystal is obtained (and assuming it is a “good” well-diffracting crystal), diffraction experiments alone are generally not enough to determine the atomic structure of the crystal. In a crystallographic experiment, only half of the data required to solve the structure of the crystal is measured — the amplitudes. The other half of the data — the phases — are not measured. This constitutes the “phase problem” of crystallography, and “causes some problems”: developing methods to solve the phase problem is essentially a field of its own.

…and the Era of Cryo-Electron Microscopy begins

Cryo-electron microscopy (cryo-EM; primers here and here), circumnavigates both of the problems with crystallography described above (although of course it has some of its own). Single-particles of the protein (or protein complex) are deposited onto grids and immobilised, removing the need for crystals altogether. Furthermore, the phases can be measured directly, removing the need to overcome the phase problem.

Cryo-EM is also really good for determining the structures of large complexes, which are normally out of the reach of crystallography, and although cryo-EM structures used to only be determined at low resolution, this is changing quickly with improved experimental hardware.

Cryo-Electron Microscopy is getting better and better every day. For structural biologists, it seems like it’s going to be difficult to avoid it. However, for crystallographers, don’t worry, there is hope.

What happens to the Human Immune Repertoire over time?

Last week during the group meeting we talked about a pre-print publication from the Ippolito group in Austin TX (here). The authors were monitoring the antibody repertoire from Bone Marrow Plasma Cells (80% of circulating abs) over a period of 6.5 years. For comparison they have monitored another individual over the period of 2.3 years. In a nutshell, the paper is like Picture 1 — just with antibodies

This is what the paper talks about in a nutshell. How the antibody repertoire looks like taken at different timepoints in an individual's lifetime.

This is what the paper talks about in a nutshell. How the antibody repertoire looks like taken at different timepoints in an individual’s lifetime.

.

The main question that they aimed to answer was: ‘Is the Human Antibody repertoire stable over time‘? It is plausible to think that there should be some ‘ground distribution’ of antibodies that are present over time which act as a default safety net. However we know that the antibody makeup can change radically especially when challenged by antigen. Therefore, it is interesting to ask, does the immune repertoire maintain a fairly stable distribution or not?

Firstly, it is necessary to define what we consider a stable distribution of the human antibody repertoire. The antibodies undergo the VDJ recombination as well as Somatic Hypermutation, meaning that the >10^10 estimated antibodies that a human is capable of producing have a very wide possible variation. In this publication the authors mostly focused on addressing this question by looking at how the usage of possible V, D and J genes and their combinations changes over time.

Seven snapshots of the immune repertoire were taken from the individual monitored over 6.5 years and two from the individual monitored over 2.3 years. Looking at the usage of the V, D and J genes over time, it appears that the proportion in each of the seven time points appears quite stable (Pic 2). Authors claim similar result looking at the combinations.  This would suggest that our antibody repertoire is biased to sample ‘similar’ antibodies over time. These frequencies were compared to the individual who was sampled over the period of 2.3 years and it appears that the differences might not be great between the two.

How the frequencies of V, D  and J genes change (not) over 6.5 years in a single individual

How the frequencies of V, D and J genes change (not) over 6.5 years in a single individual

It is a very interesting study which hints that we (humans) might be sampling the antibodies from a biased distribution — meaning that our bodies might have developed a well-defined safety net which is capable of raising an antibody towards an arbitrary antigen. It is an interesting starting point and to further check this hypothesis, it would be necessary to carry out such a study on multiple individuals (as a minimum to see if there are really no differences between us — which would at the same time hint that the repertoire do not change over time).

 

Rational Design of Antibody Protease Inhibitors

On the theme of my research area and my last presentation I talked at group meeting about a another success story in structurally designing an antibody by replicating a general protein binding site using grafted fragments involved in the original complex. The paper by Liu et al is important to me for two major reasons . Firstly they used an unconventional antibody for protein design, namely a bovine antibody which is known to have an extended CDR H3. Secondly the fragment was not grafted at the anchor points of the CDR loop.

Screen Shot 2016-07-06 at 10.28.03

SFTI-1 is a cyclic peptide and a known trypsin inhibitor. It’s structure is stabilised by a disulphide bridge. The bovine antibody is known to have an extended H3 loop which is essentially a long beta strand stalk with a knob domain at the end. Liu et al removed the knob domain and a portion of the B strand and grafted the acyclic version of the SFTI-1 to it. As I said above this result is very important because it shows we can graft a fragment/loop at places different then the anchor points of the CDR. This opens up the possibility for more diverse fragments to be grafted because of new anchor points,  and also because the fragment will sit further away from the other CDRs and the framework allowing more conformational space. To see how the designed antibody compares to the original peptide they measured the Kd and found a 4 fold increase (9.57 vs 13.3). They hypothesise that this is probably due to the fact that the extended beta strand on the antibody keeps the acyclic SFTI-1 peptide in a more stable conformation.

The problem with the bovine antibody is that if inserted in a human subject it would probably elicit an immune response from the native immune system. To humanise this antibody they found the human framework which shares the greatest sequence identity to the bovine antibody and then grafted the fragment on it. The human antibody does not have an extended CDR H3 and to decide what is the best place of grafting they tried various combinations again showing again that the fragments do not need to grafted exactly at the anchor points. Some of the resulting  human antibodies showed even more impressive Kds.

Screen Shot 2016-07-06 at 10.54.24

The best designed human antibody had a 0.79nM Kd, another 10-fold gain . Liu et al hypothesised that this might be due to the fact that the cognate protein forms contacts with residues on the other CDRs even though there is no crystal structure to show this. In order to test this hypothesis they mutated surface residues on the H2 and L1 loop to Alanine which resulted in a 6.7 fold decrease in affinity. The only comment I would have to this is that the mutations to the other CDRs might have destabilized the other CDRs on the antibody which could be the reason for the decrease in affinity.

Quantifying dispersion under varying instrument precision

Experimental errors are common at the moment of generating new data. Often this type of errors are simply due to the inability of the instrument to make precise measurements. In addition, different instruments can have different levels of precision, even-thought they are used to perform the same measurement. Take for example two balances and an object with a mass of 1kg. The first balance, when measuring this object different times might record values of 1.0083 and 1.0091, and the second balance might give values of 1.1074 and 0.9828. In this case the first balance has a higher precision as the difference between its measurements is smaller than the difference between the measurements of balance two.

In order to have some control over this error introduced by the level of precision of the different instruments, they are labelled with a measure of their precision 1/\sigma_i^2 or equivalently with their dispersion \sigma_i^2 .

Let’s assume that the type of information these instruments record is of the form X_i=C + \sigma_i Z,  where Z \sim N(0,1) is an error term, X_i its the value recorded by instrument i and where C is the fixed true quantity of interest the instrument  is trying to measure. But, what if C is not a fixed quantity? or what if the underlying phenomenon that is being measured is also stochastic like the measurement X_i. For example if we are measuring the weight of cattle at different times, or the length of a bacterial cell, or concentration of a given drug in an organism, in addition to the error that arises from the instruments; there is also some noise introduced by dynamical changes of the object that is being measured. In this scenario, the phenomenon of interest, can be given  by a random variable Y \sim N(\mu,S^2). Therefore the instruments would record quantities of the form X_i=Y + \sigma_i Z.

Under this case, estimating the value of \mu, the expected state of the phenomenon of interest is not a big challenge. Assume that there are x_1,x_2,...,x_n values observed from realisations of the variables X_i \sim N(\mu, \sigma_i^2 + S^2), which came from n different instruments. Here \sum x_i /n is still a good estimation of \mu as E(\sum X_i /n)=\mu.  Now, a more challenging problem is to infer what is the underlying variability of the phenomenon of interest Y. Under our previous setup, the problem is reduced to estimating S^2 as we are assuming Y \sim N(\mu,S^2) and that the instruments record values of the from X_i=Y + \sigma_i Z.

To estimate S^2 a standard maximum likelihood approach could be used, by considering the likelihood function:

f(x_1,x_2,..,x_n)= \prod  e^{-1/2 \times (x_i-\mu)^2 /(\sigma_i^2+S^2)} \times 1/\sqrt{2 \pi (\sigma_i^2+S^2) },

from which the maximum likelihood estimator of S^2 is given by the solution to

\sum [(X_i- \mu)^2 - (\sigma_i^2 + S^2)] /(\sigma_i^2 + S^2)^2 = 0.

Another more naive approach could use the following result

E[\sum (X_i-\sum X_i/n)^2] = (1-1/n) \sum \sigma_i^2 + (n-1) S^2

from which \hat{S^2}= (\sum (X_i-\sum X_i/n)^2 - ( (1-1/n )  \sum(\sigma_i^2) ) ) / (n-1).

Here are three simulation scenarios where 200 X_i values are taken from instruments of varying precision or variance \sigma_i^2, i=1,2,...,200 and where the variance of the phenomenon of interest S^2=1500. In the first scenario \sigma_i^2 are drawn from [10,1500^2], in the second from [10,1500^2 \times 3] and in the third from [10,1500^2 \times 5]. In each scenario the value of S_2 is estimated 1000 times taking each time another 200 realisations of X_i. The values estimated via the maximum likelihood approach are plotted in blue, and the values obtained by the alternative method are plotted in red. The true value of the S^2 is given by the red dashed line across all plots.

disp1 First simulation scenario where \sigma_i^2, i=1,2,...,200 in [10,1500^2]. The values of  \sigma_i^2 plotted in the histogram to the right. The 1000 estimations of S are shown by the blue (maximum likelihood) and red (alternative) histograms.

disp2 First simulation scenario where \sigma_i^2, i=1,2,...,200 in [10,1500^2 \times 3]. The values of \sigma_i^2 plotted in the histogram to the right. The 1000 estimations of S are shown by the blue (maximum likelihood) and red (alternative) histograms.

First simulation scenario where \sigma_i^2, i=1,2,...,200 in [10,1500^2 \times 5]. The values of \sigma_i^2 plotted in the histogram to the right. The 1000 estimations of S are shown by the blue (maximum likelihood) and red (alternative) histograms.

For recent advances in methods that deal with this kind of problems, you can look at:

Delaigle, A. and Hall, P. (2016), Methodology for non-parametric deconvolution when the error distribution is unknown. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78: 231–252. doi: 10.1111/rssb.12109

Conformational Variation of Protein Loops

Something many structural biologists (including us here in OPIG!) are guilty of is treating proteins as static, rigid structures. In reality, proteins are dynamic molecules, transitioning constantly from  conformation to conformation. Proteins are therefore more accurately represented by an ensemble of structures instead of just one.

In my research, I focus on loop structures, the regions of a protein that connect elements of secondary structure (α-helices and β-sheets). There are many examples in the PDB of proteins with identical sequences, but whose loops have different structures. In many cases, a protein’s function depends on the ability of its loops to adopt different conformations. For example, triosephosphate isomerase, which is an important enzyme in the glycolysis pathway, changes conformation upon ligand binding, shielding the active site from solvent and stabilising the intermediate compound so that catalysis can occur efficiently. Conformational variability helps triosephosphate isomerase to be what is known as a ‘perfect enzyme’; catalysis is limited only by the diffusion rate of the substrate.

Structure of the triosephosphate isomerase enzyme. When the substrate binds, a loop changes from an ‘open’ conformation (pink, PDB entry 1TPD) to a ‘closed’ one (green, 1TRD), which prevents solvent access to the active site and stabilises the intermediate compound of the reaction.

Structure of the triosephosphate isomerase enzyme. When the substrate binds, a loop changes from an ‘open’ conformation (pink, PDB entry 1TPD) to a ‘closed’ one (green, 1TRD), which prevents solvent access to the active site and stabilises the intermediate compound of the reaction.

An interesting example, especially for some of us here at OPIG, is the antibody SPE7. SPE7 is multispecific, meaning it is able to bind to multiple unrelated antigens. It achieves this through conformational diversity. Four binding site conformations have been found, two of which can be observed in its unbound state in equilibrium – one with a flat binding site, and another with a deep, narrow binding site [1].

An antibody that exists as two different structures in equilibrium - one with a shallow binding site (left, blue, PDB code 1OAQ) and one with a deep, narrow cleft (right, green, PDB 1OCW). Complementary determining regions are coloured in each case.

SPE7; an antibody that exists as two different structures in equilibrium – one with a shallow binding site (left, blue, PDB code 1OAQ) and one with a deep, narrow cleft (right, green, PDB 1OCW). Complementary determining regions are coloured in each case.

So when you’re dealing with crystal structures, beware! X-ray structures are averages – each atom position is an average of its position across all unit cells. In addition, thanks to factors such as crystal packing, the conformation that we see may not be representative of the protein in solution. The examples above demonstrate that the sequence/structure relationship is not as clear cut as we are often lead to believe. It is important to consider dynamics and conformational diversity, which may be required for protein function. Always bear in mind that the static appearance of an X-ray structure is not the reality!

[1] James, L C, Roversi, P and Tawfik, D S. Antibody Multispecificity Mediated by Conformational Diversity. Science (2003), 299, 1362-1367.