Category Archives: Publications

Ten Simple Rules for a Successful Cross-Disciplinary Collaboration

The name of our research group (Oxford Protein Informatics Group) already indicates its cross-disciplinary character. In doing research of this type, we have acquired a lot of experience in working across the boundaries of research fields. Recently, one of our group members, Bernhard Knapp, became lead author of an article about guidelines for cross-disciplinary research. This article describes ten simple rules which you should consider when working across several disciplines. They include going to the other lab in person, understanding different reward models, having patience with the pace of other disciplines, and recognising the importance of synergy.

The ten rules article was even picked up by a journalist at “Times Higher Education” and discussed further in the newspaper.

Happy further interdisciplinary work!

SAS-5 assists in building centrioles of nematode worms Caenorhabditis elegans

We have recently published a paper in eLife describing the structural basis for the role of protein SAS-5 in initiating the formation of a new centriole, called a daughter centriole. But why do we care and why is this discovery important?

We humans, as multi-cellular organisms, are in constant need of new cells in our bodies. We need them to grow from an early embryo to an adult, and also to replace dead or damaged cells. Cells don’t just appear from nowhere but arise through a tightly controlled process called the cell cycle. At the core of the cell cycle lies the segregation of duplicated genetic material into two daughter cells. Pairs of chromosomes need to be pulled apart millions upon millions of times a day. Errors in this process can lead to cancer. To avoid this apocalyptic scenario, evolution supplied us with centrioles. These large molecular machines sprout microtubules radially to form characteristic asters, which then bind to individual chromosomes and pull them apart. To maintain continuity, centrioles duplicate once per cell cycle.

Like many large macromolecular assemblies, centrioles exhibit symmetry. A few unique proteins come in multiple copies to build this gigantic cylindrical molecular structure: 250 nm wide and 500 nm long (the size of a centriole in humans). The very core of the centriole looks like a 9-fold symmetrical stack of cartwheels, at the periphery of which microtubules are vertically installed. We study the protein composition of this fascinating structure in an effort to understand how a new centriole is assembled.

Molecular architecture of centrioles.

SAS-5 is an indispensable component in C. elegans centriole biogenesis. SAS-5 physically associates with another centriolar protein, called SAS-6, forming a complex which is required to build new centrioles. This process is regulated by phosphorylation events, allowing for subsequent recruitment of SAS-4 and microtubules. In most other systems SAS-6 forms a cartwheel (central tube in C. elegans), which forms the basis for the 9-fold symmetry of centrioles. Unlike SAS-6, SAS-5 exhibits strong spatial dynamics, shuttling between the cytoplasm and centrioles throughout the cell cycle. Although SAS-5 is an essential protein, depletion of which completely terminates centrosome-dependent cell division, its exact mechanistic role in this process remains obscure.

IN BRIEF: WHAT WE DID
Using X-ray crystallography and a range of biophysical techniques, we have determined the molecular architecture of SAS-5. We show that SAS-5 forms a complex oligomeric structure, mediated by two self-associating domains: a trimeric coiled coil and a novel globular dimeric Implico domain. Disruption of either domain leads to centriole duplication failure in worm embryos, indicating that large SAS-5 assemblies are necessary for function. We propose that SAS-5 provides multivalent attachment sites that are critical for promoting assembly of SAS-6 into a cartwheel, and thus centriole formation.

For details, check out our latest paper 10.7554/eLife.07410!

@kbrogala

Top panel: cartoon overview of the proposed mechanism of centriole formation. In the cytoplasm, SAS-5 exists at low concentrations as a dimer, and each of those dimers can stochastically bind two molecules of SAS-6. Once the SAS-5 / SAS-6 complex is targeted to the centrioles, it starts to self-oligomerise. This self-oligomerisation of SAS-5 allows the attached molecules of SAS-6 to form a cartwheel. Bottom panel: detailed overview of the proposed process of centriole formation. In the cytoplasm, where the concentration of SAS-5 is low, the strong Implico domain (SAS-5 Imp, ZZ shape) of SAS-5 holds the molecule in a dimeric form. Each SAS-5 protomer can bind (through the disordered linker) to the coiled coil of dimeric SAS-6. Once the SAS-5 / SAS-6 complex is targeted to the site where a daughter centriole is to be created, SAS-5 forms higher-order oligomers through self-oligomerisation of its coiled coil domain (SAS-5 CC, triple horizontal bar). Such a large oligomer of SAS-5 provides multiple attachment sites for SAS-6 dimers in a very confined space. This results in a burst of local concentration of SAS-6 through the avidity effect, allowing an otherwise weak oligomer of SAS-6 to also form larger species. Effectively, this seeds the growth of a cartwheel (or a spiral in C. elegans), which in turn serves as a template for a new centriole.

Natural Move Monte Carlo: Sampling Collective Motions in Proteins

Protein and RNA structures are built up in a hierarchical fashion: from linear chains and random coils (primary) to local substructures (secondary) that make up a subunit’s 3D geometry (tertiary) which in turn can interact with additional subunits to form homomeric or heteromeric multimers (quaternary). The metastable nature of the folded polymer enables it to carry out its function repeatedly while avoiding aggregation and degradation. These functions often rely on structural motions that involve multiple scales of conformational changes by moving residues, secondary structure elements, protein domains or even whole subunits collectively around a small set of degrees of freedom.

The modular architecture of antibodies makes them a good example of this phenomenon. Using MD simulations and fluorescence anisotropy experiments, Kortkhonjia et al. observed that Ig domain motions in their antibody of interest correlate on two levels: 1) with laterally neighbouring Ig domains (i.e. VH with VL and CH1 with CL) and 2) with their respective Fab and Fc regions.

Correlated motion between all residue pairs of an antibody during an MD simulation. The axes identify the residues whereas the colours light up as the correlation in motion increases. The individual Ig domains as well as the two Fabs and the Fc can be easily identified. ref: Kortkhonjia, et al., MAbs. Vol. 5. No. 2. Landes Bioscience, 2013.
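As a rough illustration of how such a correlation map can be computed, the sketch below builds a normalised displacement cross-correlation matrix from a trajectory. It uses only the Python standard library and is not the analysis pipeline of Kortkhonjia et al.; the function name `cross_correlation` and the toy trajectory are assumptions made for illustration, and the frames are assumed to be already superimposed.

```python
import math
import random


def cross_correlation(trajectory):
    """Return the N x N matrix of normalised displacement correlations.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    one per residue (e.g. C-alpha positions after superposition).
    """
    n_frames = len(trajectory)
    n_res = len(trajectory[0])
    # Mean position of each residue over the whole trajectory.
    mean = [
        tuple(sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3))
        for i in range(n_res)
    ]
    # Displacement of each residue from its mean position, per frame.
    disp = [
        [tuple(frame[i][k] - mean[i][k] for k in range(3)) for i in range(n_res)]
        for frame in trajectory
    ]

    def avg_dot(i, j):
        # Time average of the dot product of the two displacement vectors.
        return sum(sum(a * b for a, b in zip(d[i], d[j])) for d in disp) / n_frames

    corr = [[0.0] * n_res for _ in range(n_res)]
    for i in range(n_res):
        for j in range(n_res):
            denom = math.sqrt(avg_dot(i, i) * avg_dot(j, j))
            corr[i][j] = avg_dot(i, j) / denom if denom > 0 else 0.0
    return corr


# Toy trajectory: residues 0 and 1 move together, residue 2 moves the opposite way.
random.seed(0)
toy = []
for _ in range(200):
    shift = random.gauss(0.0, 1.0)
    toy.append([(shift, 0.0, 0.0), (1.0 + shift, 0.0, 0.0), (2.0 - shift, 0.0, 0.0)])
matrix = cross_correlation(toy)
```

In this toy case the entry for residues 0 and 1 is close to +1 (fully correlated) and the entry for residues 0 and 2 is close to -1 (anti-correlated). In a real antibody trajectory, blocks of high correlation along the diagonal would correspond to the individual Ig domains.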

This raises the question: can we exploit these molecular properties to reduce dimensionality and overcome energy barriers when sampling the functional motions of metastable proteins?

In 2012, Sim et al. published an approach that allows these collective motions (they call them “Natural Moves”) to be incorporated into simulation. Using simple RNA model structures, they showed that explicitly sampling large structural moves can significantly accelerate the sampling process in their Monte Carlo simulation. By gradually introducing degrees of freedom (DOFs) that move increasingly large substructures of the molecule, they managed to reduce the convergence time by several orders of magnitude. This can be ascribed to the resulting reduction of the search space, which narrows down the sampling window. Instead of sampling all possible conformations that a given polynucleotide chain may take, structural states that differ from the native state predominantly in tertiary structure are explored.

Reducing the conformational search space by introducing Natural Moves. A) Ω1 (residue-level flexibility) represents the cube, Ω2 (collective motions of helices) spans the plane and Ω3 (collective motions of Ω2 bodies) is shown as a line. B) By integrating multiple layers of Natural Moves the dimensionality is reduced. ref: Sim et al. (2012). PNAS 109(8), 2890–5. doi:10.1073/pnas.1119918109

It is important to stress, however, that in addition to these rigid body moves, local flexibility is maintained by preserving residue-level flexibility. Consequently, the authors argue, high energy barriers resulting from large structural rearrangements are reduced and the resulting energy landscape is smoothed. Therefore, entrapment in local energy minima becomes less likely and the acceptance rate of the Monte Carlo simulation is improved.
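To make the idea of mixing collective and local moves concrete, here is a toy Metropolis Monte Carlo sketch. It is not the implementation of Sim et al.: the one-dimensional bead model, the energy function, and all names and parameters (`energy`, `metropolis_step`, `run`, `p_natural`, the spring constants) are assumptions chosen purely for illustration. Two stiff segments are joined by a soft spring; a "natural move" translates a whole segment rigidly, while a local move displaces a single bead.

```python
import math
import random


def energy(x, segments, k_intra=100.0, k_inter=1.0, gap=3.0):
    """Toy energy: stiff springs inside segments, a soft spring between them."""
    e = 0.0
    for seg in segments:
        for a, b in zip(seg, seg[1:]):
            e += k_intra * (x[b] - x[a] - 1.0) ** 2
    for s, t in zip(segments, segments[1:]):
        e += k_inter * (x[t[0]] - x[s[-1]] - gap) ** 2
    return e


def metropolis_step(x, segments, rng, kT=1.0, p_natural=0.5, step=0.5):
    """Propose either a rigid segment move or a single-bead move, then accept/reject."""
    trial = list(x)
    delta = rng.uniform(-step, step)
    if rng.random() < p_natural:
        # Natural move: translate every bead of one segment together.
        for i in rng.choice(segments):
            trial[i] += delta
    else:
        # Local move: displace a single bead (residue-level flexibility).
        trial[rng.randrange(len(x))] += delta
    d_e = energy(trial, segments) - energy(x, segments)
    # Metropolis criterion.
    if d_e <= 0 or rng.random() < math.exp(-d_e / kT):
        return trial, True
    return x, False


def run(n_steps, p_natural, seed=1):
    """Run the toy simulation and return the final positions and acceptance rate."""
    rng = random.Random(seed)
    segments = [[0, 1, 2], [3, 4, 5]]
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # inter-segment gap starts at 1.0, optimum is 3.0
    accepted = 0
    for _ in range(n_steps):
        x, ok = metropolis_step(x, segments, rng, p_natural=p_natural)
        accepted += ok
    return x, accepted / n_steps


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        _, rate = run(2000, p_natural=p)
        print(f"p_natural={p}: acceptance rate {rate:.2f}")
```

In this toy model a rigid segment move leaves the stiff intra-segment springs untouched, so it only changes the soft inter-segment term and tends to be accepted far more often than a single-bead move of the same size. Setting `p_natural` between 0 and 1 keeps both kinds of move in play, mirroring the point above that local flexibility is retained alongside the collective moves.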

Although benchmarking of this method has mostly relied on case studies involving model RNA structures with near perfect symmetry, the method has a natural link to near-native protein structure sampling. Like RNA, proteins can be decomposed into local substructures that may be responsible for the main functional motions in a given protein. However, because protein motion is complex and experimental data are scarce, our understanding of protein dynamics is limited. This makes it a challenging task to identify suitable decompositions. As more dynamic data emerge from biophysical methods such as NMR spectroscopy, and as databases such as www.dynameomics.org are extended, we will be able to better approximate protein motions with Natural Moves.

In conclusion, when applied to suitable systems and when used with care, there is an opportunity to breathe life into the static macromolecules of the PDB, which may help to improve our understanding of the heterogeneous structural landscape and the functional motions of metastable proteins and nanomachines.

Journal Club: Human Germline Antibody Gene Segments Encode Polyspecific Antibodies

This week’s paper by Willis et al. sought to investigate how our limited antibody-encoding gene repertoire is able to recognise an unlimited array of antigens. There is a finite number of V, D, and J genes that encode our antibodies, yet the resulting repertoire still has the capacity to recognise an infinite number of antigens. Put simply, the authors’ notion is that an antibody from the germline (via V(D)J recombination; see entry by James) is able to adopt multiple conformations, thus allowing the antibody to bind multiple antigens.

Three antibodies derived from the germline gene 5-51*01 bind to very different antigens.

To test this hypothesis, the authors performed a multiple sequence alignment between the amino acid sequences of the mature antibodies and the germline sequence from which they are derived. If a position in even ONE mature antibody differed from the germline sequence, it was identified as a ‘variable’ position and allowed to be changed by Rosetta’s multi-state design (MSD) and single-state design (SSD) protocols.

Figure 1 from Willis et al., showing the pipeline: align mature antibodies (2XWT, 2B1A, 3HMX) to the germline sequence (5-51), identify ‘variable’ positions from the alignment, then allow Rosetta to change those residues.
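The selection rule for ‘variable’ positions can be sketched in a few lines of Python. This is only an illustration of the rule as described in this entry, not the authors’ code: the function name `variable_positions` is hypothetical, the sequences are made-up toy strings rather than real antibody sequences, and the input is assumed to be pre-aligned so that all sequences have equal length.

```python
def variable_positions(germline, matures):
    """Return 0-based alignment columns where any mature sequence differs from germline.

    germline: aligned germline sequence (string).
    matures: list of aligned mature sequences, each the same length as germline.
    """
    if any(len(seq) != len(germline) for seq in matures):
        raise ValueError("all sequences must come from the same alignment")
    variable = []
    for col, ref in enumerate(germline):
        # One differing mature antibody is enough to mark the column as variable.
        if any(seq[col] != ref for seq in matures):
            variable.append(col)
    return variable


# Toy, made-up aligned sequences (not real antibody sequences).
toy_germline = "EVQLVQSG"
toy_matures = ["EVQLVESG", "EVKLVQSG", "EVQLVQSG"]
toy_variable = variable_positions(toy_germline, toy_matures)
```

The columns returned here are the only positions a design protocol would then be allowed to mutate; all other positions stay fixed at the shared residue.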

Surprisingly, without any prior information about the germline sequence, MSD yielded a sequence that was closer to the germline sequence, whereas SSD for each mature antibody retained the mature sequence. In short, this indicates that the germline sequence is a harmonising sequence that can accommodate the conformations of each of the mature antibodies (as shown by MSD), whereas the mature sequence is the lowest-energy amino acid sequence for the particular antibody’s conformation (as shown by SSD).

To further demonstrate that the germline sequence is indeed the more ‘flexible’ sequence, the authors then aligned the mature antibodies and determined the deviation in ψ-ϕ angles at each of the variable positions that were used in the Rosetta study. They found that the ψ-ϕ angle deviation in the positions that recovered to the germline residue was much larger than the other variable positions along the antibody. In other words, for the positions that tend to return to the germline amino acid in MSD, the ψ-ϕ angles have a much larger degree of variation compared to the other variable positions, suggesting that the positions that returned to the germline amino acid are prone to lots of movement.
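Because backbone dihedral angles are periodic, their spread across structures is best measured with circular statistics rather than an ordinary standard deviation. The sketch below shows one way to do this with the circular standard deviation. Which deviation measure Willis et al. actually used is not stated in this entry, so treat the choice of measure, the function names (`circular_sd_deg`, `backbone_spread`) and the input layout as assumptions.

```python
import math


def circular_sd_deg(angles_deg):
    """Circular standard deviation (in degrees) of a list of angles in degrees."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    r = min(1.0, math.hypot(c, s))  # mean resultant length, clamped against rounding
    if r == 0.0:
        return math.inf  # angles cancel out completely around the circle
    return math.degrees(math.sqrt(-2.0 * math.log(r)))


def backbone_spread(phi_psi_per_structure):
    """Per-position spread of phi and psi across aligned structures.

    phi_psi_per_structure: one list per structure, each containing a
    (phi, psi) tuple in degrees for every aligned position.
    Returns a list of (phi_sd, psi_sd) tuples, one per position.
    """
    spreads = []
    for column in zip(*phi_psi_per_structure):
        phis = [phi for phi, _ in column]
        psis = [psi for _, psi in column]
        spreads.append((circular_sd_deg(phis), circular_sd_deg(psis)))
    return spreads
```

The circular measure matters near the ±180° wrap-around: the angles 170° and -170° are only 20° apart, which `circular_sd_deg` reflects, whereas a naive standard deviation would treat them as 340° apart.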

In addition to the many results that corroborate the findings mentioned in this entry, it’s neat that the authors took a ‘backwards’ spin on conventional antibody design. Most antibody design regimes aim to find amino acid(s) that give the antibody more ‘rigidity’, and hence mature its affinity, but this paper went against the norm to find the most FLEXIBLE antibody (the most likely germline predecessor*). Effectively, they argue that this type of protocol can be used to derive new antibodies that can bind to multiple antigens, thus increasing the versatility of antibodies as potential therapeutic agents.

[Publication] Arginine Methylation-Dependent Reader-Writer Interplay Governs Growth Control by E2F-1

If you are familiar with the reader, writer & eraser concepts or you are passionate about epigenetics and arginines, this recent publication might be of interest to you. The study addresses the transcription factor E2F-1, which plays a crucial role in the control of the cell cycle and is linked with cancer. Like yin and yang, it has opposing functional roles: to promote cell-cycle progression and to induce apoptosis. The results demonstrate that the biological outcome of E2F-1 activity is affected by arginine methylation marks. While asymmetric arginine methylation causes apoptosis, symmetric methylation results in proliferation. This reader-writer interplay determined by the two types of marks governs the function of E2F-1 and potentially the fate of the cell.

[Database] SAbDab – the Structural Antibody Database

An increasing proportion of our research at OPIG is about the structure and function of antibodies. Compared to other types of proteins, there is a large number of antibody structures publicly available in the PDB (approximately 1.8% of structures contain an antibody chain). For those of us working in the fields of antibody structure prediction, antibody-antigen docking and structure-based methods for therapeutic antibody design, this is great news!

However, we find that these data are not in a standard format with respect to antibody nomenclature. For instance, which chains are “heavy” chains and which are “light”? Which heavy and light chains pair? Is there an antigen present? If so, to which H-L pair does it bind? Which numbering system is used? And so on.

To address this problem, we have developed SAbDab: the Structural Antibody Database. Its primary aim is to make it easy to create antibody structure and antibody-antigen complex datasets for further analysis by researchers such as ourselves. These sets can be selected using a number of criteria (e.g. experimental method, species, presence of constant domains…) and redundancy filters can be applied over the sequences of both the antibody and antigen. Thanks to Jin, SAbDab now also includes associated curated affinity (Kd) values for around 190 antibody-antigen complexes. We hope this will serve as a benchmarking tool for antibody-antigen docking prediction algorithms.

Alternatively, the database can be used to inspect and compare properties of individual structures. For instance, we have recently published a method to characterise the orientation between the two antibody variable domains, VH and VL. Using the ABangle tool, users can select structures with a particular VH-VL orientation, visualise and quantify conformational changes (e.g. between bound and unbound forms) and inspect the pose of structures with certain amino acids at specific positions. Similarly, the CDR (complementarity determining region) search and clustering tools allow the antibody hyper-variable loops to be selected by length, type and canonical class, and their structures to be visualised or downloaded.

SAbDab also contains features such as the template search. This allows a user to submit the sequence of an antibody heavy chain, light chain, or both, and to find structures in the database that may offer good templates for a homology modelling protocol. Specific regions of the antibody can be isolated so that structures with a high sequence identity over, for example, the CDR H3 loop can be found. SAbDab’s weekly automatic updates ensure that it contains the latest available data. With each method of selection, the structure, a standardised and re-numbered version of the structure, and a summary file containing information about the antibody can be downloaded either individually or en masse as a dataset. SAbDab will continue to develop with new tools and features and is freely available at: opig.stats.ox.ac.uk/webapps/sabdab.

[Publication] Cloud computing in Molecular Modelling – a topical perspective

My ex-InhibOx colleagues (Simone Fulle, Garrett Morris, Paul Finn) and I have recently published a topical review on “The emerging role of cloud computing in molecular modelling” in the Journal of Molecular Graphics and Modelling. The paper starts with a gentle yet in-depth introduction to the field of cloud computing. The second part covers how cloud computing applies to molecular modelling (and the sort of tasks we can run in the cloud). The third and last part presents two practical case studies of cloud computations, one of which describes how we built a virtual library for use in virtual screening on AWS.

We hope that after reading this article the cloud will become a less nebulous affair! *pun intended*

As an addendum, I recently came across this paper “Teaching cloud computing: A software engineering perspective” (2013) on how to teach cloud computing at a graduate level. This work is relevant, because lots of universities are presently including cloud computing in their curricula.

[Publication] Effect of Single Amino Acid Substitution Observed in Cancer on Pim-1 Kinase Thermodynamic Stability and Structure

In this study we selected point mutations resulting in Pim-1 variants that are expressed in cancer tissues and reported in SNP databases, such as FastSNP and COSMIC. These Pim-1 variants have been comprehensively characterized to investigate the effect of single amino acid substitutions on Pim-1 thermal and thermodynamic stability and structure in solution. Our results indicate that the mutations observed in cancer tissues cause local changes of tertiary structure, but do not affect binding to type I kinase inhibitors.

This work was pioneered by researchers at the Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Rome, and served as an inspiration for one of my thesis chapters.

[Publication] Memoir: template-based structure prediction for membrane proteins

Congratulations to all involved in the Memoir publication in Nucleic Acids Research!

Memoir is a web server which builds homology models for membrane proteins. It is a web-enabled workflow combining some of OPIG’s software: MP-T, IMembrane, Medeller & Fread. The inputs are the sequence of the membrane protein you wish to model (the target) and a PDB file to use as a template.

Memoir may be found here and there is also a video tutorial narrated by Jamie. There is even a funny blooper of him practising, which I kept to celebrate this moment.

Happy modelling!