Monthly Archives: December 2016

Interesting Antibody Papers.

Below are several antibody papers that should be of interest to those dealing with antibody engineering, be it computational or experimental. The running motif in this post will be humanization, or the process of engineering a mouse antibody sequence which binds to a target to look ‘more human’ so as to reduce the immune response (if you need an early citation on this issue, here it is).

We present two papers which talk about antibody humanization directly, one from a structural point of view (Choi et al. 2015), the other highlighting issues facing antibody engineers mining databases for information (Martin & Rees, 2016). The third paper (Collins et al. 2015) takes a step back from the issues presented in the other papers and talks broadly about the nature of mouse sequences raised in the lab.

Humanization via structural means [here] (Bailey-Kellogg group). The authors introduce a novel methodology named CoDAH to facilitate humanization of antibodies. Their approach makes a tradeoff between sequence and structural humanization scores. The sequence score is the Human String Content (Lazar et al. 2007, Mol Immunol), which measures how similar the query (murine) sequence is to short stretches of human sequences (mostly germline). In line with the fact that T-cells are one of the main drivers of anti-biologics immunity, they define the sequence stretches to be 9-mers, as recognized by T-cells. For the structural score, they use rotameric energy as calculated by Amber. They demonstrate that constructs designed using their score express and retain affinity towards the target antigen; however, they do not appear to show that the new sequences are non-immunogenic.
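To make the 9-mer idea concrete, here is a minimal sketch of a Human String Content style score. It is a simplification and not the CoDAH implementation: it assumes the query and the human germline sequences are already aligned to the same length with no gaps, and the function name `human_string_content` and the toy sequences are made up for illustration.

```python
def human_string_content(query, germlines, k=9):
    """Simplified HSC-style score (an illustrative sketch, not the published code).

    For every k-mer window in the query, take the best identity count against
    the same window in any germline, then report the average as a percentage.
    Assumes all sequences are pre-aligned and of equal length.
    """
    if len(query) < k:
        raise ValueError("query is shorter than the window size")
    if not germlines or any(len(g) != len(query) for g in germlines):
        raise ValueError("germlines must be non-empty and aligned to the query")

    n_windows = len(query) - k + 1
    total_matches = 0
    for start in range(n_windows):
        window = query[start:start + k]
        # Best-matching human 9-mer at this position, over all germlines.
        total_matches += max(
            sum(a == b for a, b in zip(window, g[start:start + k]))
            for g in germlines
        )
    return 100.0 * total_matches / (n_windows * k)


# Toy sequences (hypothetical, far shorter than a real variable domain).
toy_query = "AAAAAAAAAA"
toy_germlines = ["CAAAAAAAAA", "AAAAAAAAAC"]
toy_score = human_string_content(toy_query, toy_germlines)
```

Note that in the toy case every window is fully matched by one germline or the other, so the score is maximal even though neither germline is identical to the query. That is the point of scoring short stretches: each 9-mer only needs to look human on its own.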

Extracting data from databases for humanization [here] (Martin group and Rees consulting). The main purpose of this manuscript is to warn potential antibody engineers of the pitfalls of species mis-annotations. They point out that in a routine ‘humanization’ pipeline, where we aim to find human sequences given a mouse sequence, a great number of seemingly good ‘human’ templates are not human at all (and they come from sources as diverse as IMGT and the PDB). This might lead to errors down the line if the engineer does not double check the annotations (unfortunate but true). Many such mis-annotations arise because the cells in which mouse antibodies are expressed are human cells, or because the sequences are chimeric; in either case the annotation does not read mouse or chimeric, but erroneously ‘human’. NB. Another thing to watch in this publication is that the authors are working on a sequence database of their own, EMBLIG, which is said to collect data from EMBL-ENA (the nucleotide repository from EMBL). Hopefully the authors will address in their database the issues that they point out here.

What can we say about antibodies produced by laboratory mice? [here] (Collins group). The authors of this manuscript address the fact that the now available High Throughput Sequencing (HTS) studies have largely overlooked mouse repertoires. Different mouse strains have different susceptibilities to diseases (Houpt, 2002, J Immunol), which might mean that you need to think twice about which mouse strain to choose for a given target. Current knowledge of the mouse antibody repertoire is based on the sequencing of two strains, BALB/c and C57BL/6. Here the authors apply HTS to the same two strains of laboratory mice (eight mice per strain) to get a better snapshot of antibody gene usage. Specifically, they pay close attention to the different gene combinations (VDJ) in the sequences that they obtain. The authors conclude that the repertoires of the two strains are strikingly different and quite restricted, which might mean that the laboratory mice were under very specific pressures (read inbred/overbred). All in all, the VDJ usage numbers reported in this publication are a useful reference for which sequence combinations antibody engineers might use.
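For readers who have not worked with repertoire data, the kind of VDJ usage table discussed above boils down to counting gene combinations. Below is a minimal sketch, assuming each sequence has already been annotated with its V, D and J gene by some external tool. The function name `vdj_usage` and the gene labels are placeholders, not data from the paper.

```python
from collections import Counter


def vdj_usage(rearrangements):
    """Return the fraction of sequences using each (V, D, J) combination.

    `rearrangements` is an iterable of (v_gene, d_gene, j_gene) tuples,
    one per sequenced antibody (annotation is assumed to be done upstream).
    """
    counts = Counter(rearrangements)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rearrangements supplied")
    return {combo: n / total for combo, n in counts.items()}


# Placeholder annotations for two hypothetical strains.
strain_a = [("V1", "D1", "J1"), ("V1", "D1", "J1"), ("V2", "D1", "J2"), ("V3", "D2", "J1")]
strain_b = [("V4", "D1", "J1"), ("V4", "D1", "J1"), ("V2", "D1", "J2"), ("V4", "D3", "J2")]

usage_a = vdj_usage(strain_a)
usage_b = vdj_usage(strain_b)

# Combinations observed in both strains: a crude view of repertoire overlap.
shared_combinations = set(usage_a) & set(usage_b)
```

Comparing the two dictionaries (or just the size of `shared_combinations`) gives a rough feel for how different two repertoires are. A real analysis would also need to account for sequencing depth and per-mouse variation.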

Addressing the Role of Conformational Diversity in Protein Structure Prediction

For my journal club last week, I chose to look at a recent paper entitled “Addressing the Role of Conformational Diversity in Protein Structure Prediction”, by Palopoli et al. [1]. In the study of proteins, structures are incredibly useful tools, offering information about how proteins carry out their function, and allowing informed decisions to be made in many areas (e.g. drug design). Since experimental structure determination is difficult, however, the computational prediction of protein structures has become very important (and a number of us here at OPIG work on this!).

A problem, however, in both experimental structure determination and computational structure prediction, is that proteins are generally treated as static – the output of an X-ray crystallography experiment is a single structure, and in the majority of cases the goal of structure prediction is to produce one model that closely resembles the native structure. The accuracy of structure prediction algorithms is also normally measured by comparing the resulting model to a single, known experimentally-determined structure. The issue here is that proteins are not static – they are constantly moving and may adopt a number of different conformations; the structure observed experimentally is just a snapshot of that motion. The dynamics of a protein may even play an important role in its function; an example is haemoglobin, which after binding to oxygen changes conformation to increase affinity for further binding. It may be more appropriate, then, to represent a protein as an ensemble of structures, and not just one.

Conformational diversity helps the protein haemoglobin carry out its function (the transportation of oxygen in the blood). Haemoglobin has four subunits, each containing a haem group, shown in red. When oxygen binds to this group (blue), a histidine residue moves, shifting the position of an alpha helix. This movement is propagated throughout the entire structure, and increases the affinity for oxygen of the other subunits – binding therefore becomes increasingly easy (this is known as co-operative binding). Gif shown is from the PDB-101 Molecule of the Month series: S. Dutta and D. Goodsell, doi:10.2210/rcsb_pdb/mom_2003_5

How, though, could this be incorporated into protein structure prediction? This is the question being considered by the authors of this paper. They consider conformational diversity by looking at different conformers of the same protein – there are many proteins whose structures have been solved experimentally multiple times, and as such have a number of structures available in the PDB. Information about this is stored in a useful database called CoDNaS [2], which was developed by some of the authors of the paper under discussion. In some cases, there are model (or decoy) structures available for these proteins, generated by various structure prediction algorithms – for example, all models submitted for the CASP experiments [3], where the current accuracy of structure prediction is monitored through blind prediction, are freely available for download. The authors curated a collection of decoy sets for 91 different proteins for which multiple experimental structures are present in the PDB.

As mentioned previously, the accuracy of a model is normally evaluated by measuring its structural similarity to one known (or reference) structure – only one conformer of the protein is considered. The authors show that the model rankings achieved by this are highly dependent on the chosen reference structure. If the possible choices (i.e. the observed conformers) are quite similar the effect is small, but if there is a large difference, then two completely different decoys could be designated as the most accurate depending on which reference structure is used.

The key figure from this paper, in my opinion, is the one shown below. For the two most dissimilar experimentally-observed conformers for each protein in the set, the RMSD of the best decoy in relation to one conformer is plotted against the RMSD of the best decoy when measured against the other:

The straight line on this graph indicates what would be observed if there are decoys in the set that equally represent the two conformers; for example, if the best decoy with reference to conformer 1 has an RMSD of 1 Å, then there is also a decoy that is 1 Å away from conformer 2. Most points are on or near this line – this means that the sets of decoy structures are not biased towards one of the conformers. Therefore, structure prediction algorithms seem to be able to generate models for multiple conformations of proteins, and so the production of an ensemble of models is not an impossible dream. Several obstacles remain, however – although of equal distance to both conformers, the decoys could still be of poor quality; and decoy selection is often inaccurate, and so finding these multiple conformations amongst all others is a challenge.
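The comparison behind each point on that plot can be sketched in a few lines. This is only an illustration under simplifying assumptions: the coordinates are toy values, all structures are assumed to be already superposed and to have the same atoms in the same order (a real analysis would superpose first), and the names `rmsd` and `best_decoy` are mine, not the authors'.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples, without superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def best_decoy(decoys, reference):
    """Return (name, rmsd) of the decoy closest to the given reference conformer."""
    scored = ((name, rmsd(coords, reference)) for name, coords in decoys.items())
    return min(scored, key=lambda pair: pair[1])


# Two toy "conformers" that differ in the position of the last atom.
conformer_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
conformer_2 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 2.0, 0.0)]

# A toy decoy set: d1 resembles conformer 1, d2 resembles conformer 2, d3 sits in between.
decoys = {
    "d1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.3, 0.0)],
    "d2": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.7, 0.0)],
    "d3": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)],
}

best_vs_1 = best_decoy(decoys, conformer_1)
best_vs_2 = best_decoy(decoys, conformer_2)
```

Here a different decoy wins depending on the reference conformer, which is the ranking effect described earlier. The two best RMSDs are nonetheless equal, so this toy protein would sit on the diagonal of the plot.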

[1] – Palopoli, N., Monzon, A. M., Parisi, G., and Fornasari, M. S. (2016). Addressing the Role of Conformational Diversity in Protein Structure Prediction. PLoS One, 11, e0154923.

[2] – Monzon, A. M., Juritz, E., Fornasari, S., and Parisi, G. (2013). CoDNaS: a database of conformational diversity in the native state of proteins. Bioinformatics, 29, 2512–2514.

[3] – Moult, J., Pedersen, J. T., Judson, R., and Fidelis, K. (1995). A Large-Scale Experiment to Assess Protein Structure Prediction Methods. Proteins, 23, ii–iv.

Transgenic Mosquitoes

At the meeting on November 15 I covered a paper by Gantz et al. describing a method for creating transgenic mosquitoes that express antibodies hindering the development of malaria parasites.

The immune system is commonly divided into two categories: innate and adaptive. The innate immune system consists of non-specific defence mechanisms such as epithelial barriers, macrophages etc. The innate system is present in virtually every living organism. The adaptive immune system is responsible for the invader-specific defence response. It consists of B and T lymphocytes and encompasses antibody production. As only vertebrates possess the adaptive immune system, mosquitoes do not naturally produce antibodies, which hinders their ability to defend themselves against pathogens such as malaria.

In the study by Gantz et al. the authors inserted transgenes expressing three single-chain Fvs (m4B7, m2A10 and m1C3) into the previously-characterised chromosomal docking sites.

Figure 1: The RT-PCR experiments showing the scFv expression in different mosquito strains

RT-PCR was used to detect scFv transcripts in RNA isolated from the transgenic mosquitoes (see Figure 1). The experiments showed that the attP 44-C recipient line allowed expression of the transgenes coding for the scFvs.

The authors evaluated the impact of the modifications on the fitness of the mosquitoes. It was shown that the transgene expression does not reduce the lifespan of the mosquitoes, or their ability to procreate.

Expression of the scFvs targeted the parasite at both the early and late development stages. The transgenic mosquitoes displayed a significant reduction in the number of malaria sporozoites per infected female, in most cases completely inhibiting the sporozoite development.

Overall the study showed that it is possible to develop transgenic mosquitoes that are resistant to malaria. If this method were combined with a mechanism for gene spread, the malaria-resistant mosquitoes could be released into the environment, helping to fight the spread of this disease.

Interesting Antibody Papers

De Novo H3 prediction by C-terminal kink-biasing (Gray Lab) [here].

The authors introduce an improvement to the prediction of CDR-H3 in the form of a constraint for de novo decoy generation. Working from the observation that 80% of CDR-H3 loops have a kinked C-terminus (Weitzner et al., 2015, Structure), they bias the loops to assume this conformation (they show that it does not force ALL loops to do so!). The constraint is in the form of a pseudo bond angle between the Cα atoms of the three C-terminal residues and a pseudo dihedral angle for the three C-terminal residues and one adjacent residue in the framework. The bias takes the form of a penalty score if the generated angle falls outside mean ± 1σ. They use a quite stringent H3 loop benchmark of only 49 loops. Using this constraint on this dataset improves prediction for the majority of the loops. They also demonstrate the utility of the score for full Fv homology modeling and Ab-Ag docking.
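The geometry of such a constraint is easy to sketch. The code below computes a pseudo bond angle from three Cα positions and a pseudo dihedral from four, then applies a penalty only when the value falls outside mean ± one standard deviation. It is an illustration and not the authors' implementation: the quadratic shape of the penalty, the function names and any mean or standard deviation values passed in are my assumptions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pseudo_angle(p1, p2, p3):
    """Angle in degrees at p2 formed by three points (e.g. consecutive C-alpha atoms)."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_theta = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def pseudo_dihedral(p1, p2, p3, p4):
    """Signed dihedral in degrees, in (-180, 180], defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def kink_penalty(value, mean, sd, weight=1.0, periodic=False):
    """Zero inside mean +/- sd, quadratic outside (the quadratic form is an assumption)."""
    diff = value - mean
    if periodic:
        # Wrap the difference into [-180, 180) so that -175 and 170 are 15 degrees apart.
        diff = (diff + 180.0) % 360.0 - 180.0
    excess = abs(diff) - sd
    return 0.0 if excess <= 0 else weight * excess ** 2
```

In use, the penalty for a decoy would be something like `kink_penalty(pseudo_angle(ca1, ca2, ca3), mean, sd)` plus `kink_penalty(pseudo_dihedral(ca0, ca1, ca2, ca3), mean, sd, periodic=True)`, with the means and standard deviations taken from the observed distribution in kinked loops. Because the score is zero inside the window, decoys are nudged towards the kink without a single exact geometry being forced on them.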

Therapeutic vs synthetic vs natural antibodies (Ofran Lab) [here].

The authors analyzed 137 Ab-Ag complexes from the PDB. Those from hybridomas were classified as ‘natural’ and those coming from synthetic libraries as ‘synthetic’. They demonstrate that synthetic constructs over-rely on H3 for the contacts the antibody forms with the antigen, whereas natural constructs share the paratope with H1 and H2 to a larger extent. This, together with their tool, CDRs analyzer (analysis of structural and biochemical properties of an Ab-Ag complex), can usefully inform the design of antibodies.

From the past: TABHU, tools for antibody humanization (Tramontano Lab) [here]. The authors have created a tool to aid antibody humanization. Given an antibody sequence, the system looks for the most suitable template in their extensive sequence database (DIGIT) and in germline sequences from IMGT. The templates are assessed on sequence similarity to the query and on the similarity of the ‘binding’ mode, which is assessed by their paratope prediction tool proABC. Once the template has been chosen, the user can produce a structural model of the sequence.

The Emerging Disorder-Function Paradigm

It’s rare to find a paper that connects all of the diverse areas of research of OPIG, but “The rules of disorder or why disorder rules” by Gsponer and Babu (2009) is one such paper. Protein folding, protein-protein interaction networks, protein loops (Schlessinger et al., 2007), and drug discovery all play a part in this story. What’s great about this paper is that it gives numerous examples of proteins and the evidence supporting that they are partially or completely unstructured. These are the so-called intrinsically unstructured proteins or IUPs, although more recently they are also being referred to as intrinsically disordered proteins, or IDPs. Intrinsically disordered regions (IDRs) “are polypeptide segments that do not contain sufficient hydrophobic amino acids to mediate co-operative folding” (Babu, 2016).

Such proteins contradict the classic “lock and key” hypothesis of Fischer, and challenge …

Oxford Protein Informatics Group

or "OPIG" to friends