Commercialising your research: Where to start?

If you look at some of the biggest technology companies in the world, from Google and Facebook to hardware companies like Dell or even biotech unicorns like Oxford’s own Oxford Nanopore, all of them started on university campuses. If you are a researcher interested in finding out how to make the first steps to commercialise your research here is a quick guide:

Continue reading

Peer Review: reviewing as an early career researcher

Peer review is an important component of academic research and publishing, but it can feel like an opaque process, especially for those not directly involved. I am very fortunate to have been able to participate in the peer review of multiple papers, despite being very early in my career, through support from my supervisors and a mentoring program run by Sense about Science with Nature Communications. Here are some of the things I have learned.

Continue reading

The Coronavirus Antibody Database: 10 months on, 10x the data!

Back in May 2020, we released the Coronavirus Antibody Database (‘CoV-AbDab’) to capture molecular information on existing coronavirus-binding antibodies, and to track what we anticipated would be a boon of data on antibodies able to bind SARS-CoV-2. At the time, we had found around 300 relevant antibody sequences and a handful of solved crystal structures, most of which were characterised shortly after the SARS-CoV epidemic of 2003. We had no idea just how many SARS-CoV-2 binding antibody sequences would come to be released into the public domain…

10 months later (2nd March 2021), we now have tracked 2,673 coronavirus-binding antibodies, ~95% with full Fv sequence information and ~5% with solved structures. These datapoints originate from 100s of independent studies reported in either the academic literature or patent filings.

The entire contents CoV-AbDab database as of 2nd March 2021.
Continue reading

On The Logic of GOing with Weisfeiler-Lehman

Recently, I was able to attend Martin Grohe’s talk on The Logic of Graph Neural Networks. Professor Grohe of RWTH Aachen University, is a titan of the fields of Logic and Complexity theory. Even so, he is modest about his achievements, and I was tickled when it was pointed out to me that the theorem he refers to as “a little complex”, one of his crowning achievements, involves a four-hundred page long book of a proof.

The theorem relates to the Weisfeiler-Lehmann (WL) algorithm, an algorithm for determining whether two graphs are equivalent (i.e. isomorphic). The algorithm has deep connections with combinatorics, complexity theory and first order logic. A system of logic that is remarkably similar to the relations present in ontologies such as the Gene Ontology (GO), which is commonly used to compare and predict protein function. Kernelised methods and other WL-based metrics present a new and possibly logically “complete” way to potentially compare the functions of proteins and infer their similarity.

The Gene Ontology follows a simple set of rules, very similar to first order logic. From the GO Database Description
Continue reading

Ten Simple Rules For Solving Any Problem

Welcome! Take three deep breaths, each time expel the air through your nose with force. Now you are ready for this adventure. Let us dive right in and reflect on the premise of this blog post. 

I, personally, dislike the word “solve”. What does it even mean? And that is already the second time I have used that word. The word solve implies all kinds of nonsense, such as completion or the existence of a solution. Let us recast it as: new insights, positive reframing or simply “ah-ha!”. These “problems” can be anything too: emotional ones (external or internal), scientific or research ones, artistic ones, writing ones. If you feel like it, just call it a problem.

Whoever is writing this blog post, he certainly does talk a lot … We should really get going or we will run out of time. You know what, let us start again. Welcome to:

Continue reading

Plotly for interactive 3D plotting

An recently wrote a post on how to use the seaborn library. I really like seaborn and use it a lot for 2D plots. However, recently I have been dealing with 3D data and have found plotly to be best. When used in a jupyter notebook, it allows you to easily generate 3D interactive plots. This is extremely useful to visualize structural data.

Continue reading

Ribosome occupancy profiles are conserved between structurally and evolutionarily related yeast domains

Shameless plug for any OPIG blog readers to take a look at our recent publication in Bioinformatics. Consider giving it a read if the below summary grabs your attention.

Many proteins are now known to fold during their synthesis through the process known as co-translational folding. Translation is an inherently non-equilibrium process – one consequence of this fact is that the speed of translation can radically influence the ability of proteins to fold and function. In this paper we compare ribosome occupancy profiles between related domains in yeast to test the hypothesis that evolutionarily related proteins with similar native folds should tend to have similar translation speed profiles to preserve efficient co-translational folding. We find strong evidence in support of this hypothesis at the level of individual protein domains and across a set of 664 pairs of related domains for which we are able to compute high-quality ribosome occupancy profiles.

To find out more, view the Advance Article at Bioinformatics.

Calculating symmeterised small molecule RMSDs using graph automorphisms in python with GEMMI and NetworkX

When a ring flips, how do we calculate RMSD?

This surprisingly simple question leads to a very interesting problem! If we take a benzene molecule, say, and rotate it 180 degrees, then we have the exact same molecule, but if we have a data structure in which our atoms are labelled, and we apply the same transformation to the atomic positions, the numbering does not reflect that symmetry. If we were then naively to calculate the RMSD it would be huge, despite the fact that the molecule is, chemically speaking, identical.

How can we make our RMSD calculations reflect these symmetries?

Continue reading