If you look at some of the biggest technology companies in the world, from Google and Facebook to hardware companies like Dell or even biotech unicorns like Oxford’s own Oxford Nanopore, all of them started on university campuses. If you are a researcher interested in finding out how to take the first steps to commercialise your research, here is a quick guide:
Peer Review: reviewing as an early career researcher
Peer review is an important component of academic research and publishing, but it can feel like an opaque process, especially for those not directly involved. I am very fortunate to have been able to participate in the peer review of multiple papers, despite being very early in my career, through support from my supervisors and a mentoring program run by Sense about Science with Nature Communications. Here are some of the things I have learned.
The Coronavirus Antibody Database: 10 months on, 10x the data!
Back in May 2020, we released the Coronavirus Antibody Database (‘CoV-AbDab’) to capture molecular information on existing coronavirus-binding antibodies, and to track what we anticipated would be a boon of data on antibodies able to bind SARS-CoV-2. At the time, we had found around 300 relevant antibody sequences and a handful of solved crystal structures, most of which were characterised shortly after the SARS-CoV epidemic of 2003. We had no idea just how many SARS-CoV-2 binding antibody sequences would come to be released into the public domain…
10 months later (2nd March 2021), we have now tracked 2,673 coronavirus-binding antibodies, ~95% with full Fv sequence information and ~5% with solved structures. These datapoints originate from hundreds of independent studies reported in either the academic literature or patent filings.

To Pickle, Or Not To Pickle? — Quickle!
Pickling in Python can be dangerous.

That’s where Quickle comes in, as long as you’re using Python 3.8 or later…
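To see why loading an untrusted pickle is risky, here is a minimal sketch using only the standard library `pickle` module. The `Payload` class is hypothetical and its "attack" is a harmless arithmetic expression. The point is only that `pickle.loads` will call whatever callable the byte stream names.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a malicious object."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments" and
        # replays that call on load. eval of a harmless expression is used
        # here; an attacker could just as easily name os.system.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes runs eval("6 * 7"); no Payload object is ever rebuilt.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Nothing in the byte stream has to correspond to a class the receiver expected, which is why pickles should only be loaded from trusted sources.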
On The Logic of GOing with Weisfeiler-Lehman
Recently, I was able to attend Martin Grohe’s talk on The Logic of Graph Neural Networks. Professor Grohe, of RWTH Aachen University, is a titan of the fields of logic and complexity theory. Even so, he is modest about his achievements, and I was tickled when it was pointed out to me that the theorem he refers to as “a little complex”, one of his crowning achievements, involves a four-hundred-page book of a proof.
The theorem relates to the Weisfeiler-Lehman (WL) algorithm, an algorithm for testing whether two graphs are equivalent (i.e. isomorphic). The algorithm has deep connections with combinatorics, complexity theory and first-order logic, a system of logic that is remarkably similar to the relations present in ontologies such as the Gene Ontology (GO), which is commonly used to compare and predict protein function. Kernelised methods and other WL-based metrics present a new and possibly logically “complete” way to compare the functions of proteins and infer their similarity.
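For readers who have not met it, the simplest (1-dimensional) form of the WL test can be sketched in a few lines of Python. This is an illustrative sketch, not code from the talk: the function names and the adjacency-dictionary graph format are assumptions made for the example. Each node starts with the same colour and is repeatedly recoloured by its own colour plus the multiset of its neighbours' colours. If the colour histograms of the two graphs ever differ, the graphs cannot be isomorphic.

```python
from collections import Counter


def refine(adjacency, colours, palette):
    """One WL round: recolour each node by (own colour, sorted neighbour colours)."""
    new_colours = {}
    for node, neighbours in adjacency.items():
        signature = (colours[node], tuple(sorted(colours[n] for n in neighbours)))
        # The palette is shared by both graphs, so equal signatures get equal ids.
        new_colours[node] = palette.setdefault(signature, len(palette))
    return new_colours


def wl_indistinguishable(adj_a, adj_b):
    """Return False if 1-WL proves the graphs differ, True if it cannot tell."""
    if len(adj_a) != len(adj_b):
        return False
    col_a = {v: 0 for v in adj_a}
    col_b = {v: 0 for v in adj_b}
    for _ in range(len(adj_a)):
        palette = {}
        col_a = refine(adj_a, col_a, palette)
        col_b = refine(adj_b, col_b, palette)
        if Counter(col_a.values()) != Counter(col_b.values()):
            return False
    return True


path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

print(wl_indistinguishable(path, triangle))
print(wl_indistinguishable(hexagon, two_triangles))
```

The second pair shows the known limitation of this simplest variant: a six-cycle and two disjoint triangles are not isomorphic, yet every node has degree two, so colour refinement never separates them. Higher-dimensional versions of the algorithm are more discriminating.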

Ten Simple Rules For Solving Any Problem
Welcome! Take three deep breaths, each time expel the air through your nose with force. Now you are ready for this adventure. Let us dive right in and reflect on the premise of this blog post.
I, personally, dislike the word “solve”. What does it even mean? And that is already the second time I have used that word. The word solve implies all kinds of nonsense, such as completion or the existence of a solution. Let us recast it as: new insights, positive reframing or simply “ah-ha!”. These “problems” can be anything too: emotional ones (external or internal), scientific or research ones, artistic ones, writing ones. If you feel like it, just call it a problem.
Whoever is writing this blog post, he certainly does talk a lot … We should really get going or we will run out of time. You know what, let us start again. Welcome to:
Plotly for interactive 3D plotting
An recently wrote a post on how to use the seaborn library. I really like seaborn and use it a lot for 2D plots. However, recently I have been dealing with 3D data and have found plotly to be best. When used in a Jupyter notebook, it allows you to easily generate interactive 3D plots. This is extremely useful for visualizing structural data.
Ribosome occupancy profiles are conserved between structurally and evolutionarily related yeast domains
Shameless plug for any OPIG blog readers to take a look at our recent publication in Bioinformatics. Consider giving it a read if the below summary grabs your attention.
Many proteins are now known to fold during their synthesis through the process known as co-translational folding. Translation is an inherently non-equilibrium process – one consequence of this fact is that the speed of translation can radically influence the ability of proteins to fold and function. In this paper we compare ribosome occupancy profiles between related domains in yeast to test the hypothesis that evolutionarily related proteins with similar native folds should tend to have similar translation speed profiles to preserve efficient co-translational folding. We find strong evidence in support of this hypothesis at the level of individual protein domains and across a set of 664 pairs of related domains for which we are able to compute high-quality ribosome occupancy profiles.
To find out more, view the Advance Article at Bioinformatics.
Seaborn 101
Seaborn is a Python data visualization library built on matplotlib (https://seaborn.pydata.org/). I would like to share some guidance/code to get started with drawing plots using this library! I will be using the dataset ‘flights’ from Seaborn (https://github.com/mwaskom/seaborn-data) as an example.
Calculating symmetrised small molecule RMSDs using graph automorphisms in Python with GEMMI and NetworkX
When a ring flips, how do we calculate RMSD?
This surprisingly simple question leads to a very interesting problem! If we take a benzene molecule, say, and rotate it 180 degrees, then we have the exact same molecule, but if we have a data structure in which our atoms are labelled, and we apply the same transformation to the atomic positions, the numbering does not reflect that symmetry. If we were then naively to calculate the RMSD it would be huge, despite the fact that the molecule is, chemically speaking, identical.
How can we make our RMSD calculations reflect these symmetries?
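One way to make the idea concrete is the toy sketch below. It assumes an idealised flat six-membered ring of identical atoms and uses only the standard library: it brute-forces the automorphisms of the bond graph with `itertools.permutations` rather than using GEMMI or NetworkX, and it does no superposition or element checking. All function names are hypothetical.

```python
import math
from itertools import permutations


def rmsd(coords_a, coords_b):
    """Plain RMSD between two equally ordered coordinate lists (no alignment)."""
    total = sum((a - b) ** 2 for p, q in zip(coords_a, coords_b) for a, b in zip(p, q))
    return math.sqrt(total / len(coords_a))


def automorphisms(n_atoms, bonds):
    """Yield every atom relabelling that maps each bond onto a bond (brute force)."""
    bond_set = {frozenset(b) for b in bonds}
    for perm in permutations(range(n_atoms)):
        if all(frozenset((perm[i], perm[j])) in bond_set for i, j in bonds):
            yield perm


def symmetric_rmsd(coords_a, coords_b, bonds):
    """Smallest RMSD over all symmetry-equivalent atom labellings of coords_b."""
    n = len(coords_a)
    return min(
        rmsd(coords_a, [coords_b[perm[i]] for i in range(n)])
        for perm in automorphisms(n, bonds)
    )


# Idealised ring: six atoms on a unit circle, each bonded to its two neighbours.
ring = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0) for k in range(6)]
bonds = [(k, (k + 1) % 6) for k in range(6)]

# Rotate by 180 degrees about the z axis while keeping the original atom labels.
flipped = [(-x, -y, z) for x, y, z in ring]

naive = rmsd(ring, flipped)
corrected = symmetric_rmsd(ring, flipped, bonds)
print(f"naive RMSD: {naive:.3f}, symmetry-corrected RMSD: {corrected:.3f}")
```

Enumerating all permutations is only feasible for a handful of atoms. For real ligands, a graph isomorphism routine that also matches element types is the practical choice.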