Uniformly sampled 3D rotation matrices

It’s not as simple as you’d think.

If you want to skip the small talk, the code is at the bottom. Sampling 2D rotations uniformly is simple: rotate by an angle from the uniform distribution \theta \sim U(0, 2\pi). Extending this idea to 3D rotations, we could sample each of the three Euler angles from the same uniform distribution \phi, \theta, \psi \sim U(0, 2\pi). This, however, gives more probability density to transformations which are clustered towards the poles:

Sampling Euler angles uniformly does not give an even distribution across the sphere.
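
As a point of comparison, here is a minimal sketch of one well-known way to draw uniform 3D rotations. It is not the method from the paper discussed in this post. It instead assumes the standard result that a normalised 4D Gaussian vector is a uniformly distributed unit quaternion, which is then converted to a rotation matrix. The function name `random_rotation_matrix` is made up for illustration, and only the Python standard library is used.

```python
import math
import random


def random_rotation_matrix(rng=random):
    """Return a 3x3 rotation matrix (nested lists) drawn uniformly from SO(3)."""
    # Four independent standard normals, once normalised, give a point that is
    # uniform on the unit 3-sphere, i.e. a uniformly random unit quaternion.
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)

    # Standard conversion from a unit quaternion (w, x, y, z) to a matrix.
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


# Seeded generator so the example is reproducible.
R = random_rotation_matrix(random.Random(0))
```

Unlike sampling three Euler angles independently, this does not favour any particular axis, because the Gaussian is spherically symmetric.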

In Fast Random Rotation Matrices (James Arvo, 1992), a method for generating uniform random 3D rotation matrices is outlined, the main steps being:

Continue reading

Making your Python tool as easy to install as possible

Have you ever tried to use someone else’s code and spent a whole day trying to install it? Have you ever decided not to use a tool because installing it was a massive pain? Both of those have happened to me and, to be honest, it is a massive shame. The authors may spend large amounts of time developing these tools and in the end, no one uses them because they can’t get them to work. So I have decided to try and make all code I develop as easy and painless as possible to install and use.

Continue reading

Linux Horror Stories and Protection Spells (Volume I)

Don’t get me wrong. I love Linux. After many years of using it, I ended up appreciating how flexible, potent, and even beautiful it is. However, using Linux has never been a bed of roses and every single Linux user that I know has had to deal with many problems since the very beginning. Indeed, I still remember how frustrating installing my first Linux machine was, especially after realizing that my network card was not working. Had I given up, I would never have written this post.

Although many of the problems that I faced while using Linux are related to updates and drivers (NVIDIA driver updates can be so painful that I will write another post about them in the future), I must admit that on many other occasions I was the only one responsible for such problems. Consequently, I want to warn the reader against a couple of the mistakes I made in the past and provide some tips on how to deal with them.

My worst nightmare: rm -r *

Continue reading

AlphaFold 2 is here: what’s behind the structure prediction miracle

Nature has now released the AlphaFold 2 paper, after eight long months of waiting. The main text reports more or less what we have known for nearly a year, with some added tidbits, although it is accompanied by a painstaking description of the architecture in the supplementary information. Perhaps more importantly, the authors have released the entirety of the code, including all details to run the pipeline, on GitHub. And there is no small print this time: you can run inference on any protein (I’ve checked!).

Have you not heard the news? Let me refresh your memory. In November 2020, a team of AI scientists from Google DeepMind indisputably won the 14th Critical Assessment of Structure Prediction (CASP14) competition, a biennial blind test where computational biologists try to predict the structure of several proteins whose structure has been determined experimentally but not publicly released. Their results were so astounding, and the problem so central to biology, that it took the entire world by surprise and left an entire discipline, computational biology, wondering what had just happened.

Continue reading

OPIGlets go Kayaking

The 1st of July was the day that the OPIGlets went kayaking!

Brennan very kindly offered to guide a kayaking session from the Oxford University Canoeing and Kayaking Club (OUCKC). There was great uptake from the group, with 10 members joining for a paddle.

The first task was to find a kayak long enough for Jack’s legs. Once he managed to wedge himself into the largest kayak available, we moved on to being pushed down the ramp one by one, hoping that this would not lead to an immediate capsize.

Continue reading

A to Z of Alternative Antibody Formats: Next-Generation Therapeutics

Do you know your diabodies from your zybodies?

Antibodies are a highly important class of therapeutics used to treat a range of diseases. Given their success as therapeutics, a wide variety of alternative antibody formats have been developed – these are driving the next generation of antibody therapeutics.

To note, this is not an exhaustive list but rather intended to demonstrate the range of existing antibody formats.

Inspired by this article in The Guardian: “Rachel Roddy’s A-Z of pasta”

Figure 1. Alternative Antibody Formats
Many of these figures were adapted from Spiess et al., 2015. Additionally, some of these formats have multiple variations or further possible forms (e.g., trispecific antibodies) – in these cases, one example is given here.

A – Antibodies

Antibodies – a fitting place to start this post. Antibodies are proteins produced by our immune systems to detect and protect against foreign pathogens. The ability of antibodies to bind molecules strongly and specifically – properties essential to their role in our immune defence – also makes them valuable candidates for therapeutics. Antibody therapies have been developed for the treatment of various diseases, including cancers and viral infections, and form a market estimated at over $100 billion [1].

Continue reading

One of my other hats – Covid-19 Response Director for UK Research and Innovation

The group asked me if I would tell them a little bit about one of my other hats at our regular Tuesday meeting, and this blog is about that.

In October 2019 I was seconded part-time to UKRI as the Deputy Executive Chair of the Engineering and Physical Sciences Research Council (EPSRC). What is UKRI (UK Research and Innovation)? It’s a non-departmental public body that funds research and innovation. It is made up of the seven disciplinary research councils (acronyms to please Tom – AHRC, BBSRC, EPSRC, ESRC, NERC, STFC and MRC), Research England, and the UK’s innovation agency, Innovate UK.

As Deputy Executive Chair of EPSRC I was helping with UKRI strategy, learning how a spending review round works, visiting universities to talk about how they could work better with UKRI – pretty much everything I was expecting to be doing. But like everyone, my world changed in early 2020.

Continue reading

How fast can a protein fold?

A protein’s folding time is the time required for it to reach its unique folded state starting from its unfolded ensemble. Globular, cytosolic proteins can only attain their intended biological function once they have folded. This means that protein folding times, which typically exceed the timescales of enzymatic reactions that proteins carry out by several orders of magnitude, are critical to determining when proteins become functional. Many scientists have worked tirelessly over the years to measure protein folding times, determine their theoretical bounds, and understand how they fit into biology. Here, I focus on one of the more interesting questions to fall out of this field over the years: how fast can a protein fold? Note that this is a very different question than asking “how fast do proteins fold?”

Continue reading

Out-of-distribution generalisation and scaffold splitting in molecular property prediction

The ability to successfully apply previously acquired knowledge to novel and unfamiliar situations is one of the main hallmarks of successful learning and general intelligence. This capability to effectively generalise is amongst the most desirable properties a prediction model (or a mind, for that matter) can have.

In supervised machine learning, the standard way to evaluate the generalisation power of a prediction model for a given task is to randomly split the whole available data set X into two sets – a training set X_{\text{train}} and a test set X_{\text{test}}. The model is then trained on the examples in the training set X_{\text{train}}, and afterwards its prediction abilities are measured on the untouched examples in the test set X_{\text{test}} via a suitable performance metric.
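
A minimal sketch of such a random split, using only the Python standard library, might look like the following. The function name `train_test_split`, the 20% test fraction, and the seed are illustrative assumptions rather than anything prescribed here; in practice a library routine would normally be used instead.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Randomly partition `examples` into (train, test) lists."""
    shuffled = list(examples)              # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled examples form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


X = list(range(10))  # stand-in for a real data set
X_train, X_test = train_test_split(X)
```

Every example lands in exactly one of the two sets, so the model never sees a test example during training.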

Since in this scenario the model has never seen any of the examples in X_{\text{test}} during training, its performance on X_{\text{test}} must be indicative of its performance on novel data X_{\text{new}} which it will encounter in the future. Right?

Continue reading

Automated intermolecular interaction detection using the ODDT Python Module

Detecting intermolecular interactions is often one of the first steps when assessing the binding mode of a ligand. This usually involves the human researcher opening up a molecular viewer and checking the orientations of the ligand and protein functional groups, sometimes aided by the viewer’s own interaction-detecting functionality. For looking at single-digit numbers of structures, this approach works fairly well, especially as more experienced researchers can spot cases where the automated interaction detection has failed. When analysing tens or hundreds of binding sites, however, an automated way of detecting and recording interaction information for downstream processing is needed. When I had to do this recently, I used an open-source Python module called ODDT (Open Drug Discovery Toolkit; its full documentation can be found here).

My use case was fairly standard: starting with a list of holo protein structures as .pdb files and their corresponding ligands in .sdf format, I wanted to detect any hydrogen bonds between a ligand and its native protein crystal structure. Specifically, I needed the number and name of the interacting residue, its chain ID, and the name of the protein atom involved in the interaction. A general example of how to do this can be found in the ODDT documentation. Below, I show how I have used the code on PDB structure 1a9u.

Continue reading