Supercharge Your Literature Review With These Tools

When starting a new project, conducting a literature review of the field can be one of the most daunting prospects. Not only do you need to get through a mountain of research papers, you also need to work out which mountain of papers to get through. You don’t want to start a project only to realise a few weeks (or months!) in that you missed a key paper which would have completely changed the course of your research. Luckily, there are now several handy tools which can help speed up this process.

Continue reading

Thinking of going to a conference

As so many members of the group have never attended an in-person conference, I thought it might be worth answering the question “why do people attend conferences?”

First up, we should remember that flying around the world is not without cost to the planet, so all of us lucky enough to be able to travel should think hard every time before we choose to do so.

This means it’s really important to make sure that we know why we are going to any conference and maximise the benefits from attendance. Below are a few things to think about in terms of why you attend a conference and what to do when you are there, but this is definitely not a complete list, more a starter for four.

Continue reading

Am I better? Performance metrics unravelled

What’s the deal with all these numbers? Accuracy, Precision, Recall, Sensitivity, AUC and ROCs.

The basic stuff:

Given a method that produces a numerical outcome, either categorical (classification) or continuous (regression), we want to know how well our method did. Let’s start simple:

True positives (TP): You said something was a cow and it was in fact a cow – duh.

False positives (FP): You said it was a cow and it wasn’t – sad.

True negatives (TN): You said it was not a cow and it was not – good job.

False negatives (FN): You said it was not a cow but it was a cow – do better.

I can optimise these metrics artificially. Just call everything a cow and I have a 100% true positive rate. We are usually interested in a trade-off, something like the relative value of metrics. This gives us:
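To see why a single number is not enough, here is a minimal sketch of the "call everything a cow" trick in Python. It is an illustration only: the function names and the toy labels are made up for this example and do not come from the post.

```python
def confusion_counts(y_true, y_pred, positive="cow"):
    """Count TP, FP, TN and FN for one positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # said cow, was a cow
            else:
                fp += 1  # said cow, was not a cow
        else:
            if truth == positive:
                fn += 1  # said not a cow, was a cow
            else:
                tn += 1  # said not a cow, was not a cow
    return tp, fp, tn, fn


def true_positive_rate(tp, fn):
    """Fraction of real cows that were called cows (recall/sensitivity)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Fraction of things called cows that really were cows."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy data: two cows and three sheep.
y_true = ["cow", "sheep", "cow", "sheep", "sheep"]
# The "cheating" classifier calls everything a cow.
y_pred = ["cow"] * len(y_true)

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
tpr = true_positive_rate(tp, fn)
prec = precision(tp, fp)
print(f"TP={tp} FP={fp} TN={tn} FN={fn} TPR={tpr:.2f} precision={prec:.2f}")
```

The true positive rate is perfect, but every sheep has become a false positive, which is exactly what a second metric such as precision exposes.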

Continue reading

Code your own molecule sketcher in 4 easy steps!

Drawing molecules on your laptop usually requires access to proprietary software such as ChemDraw (link) or free websites such as PubChem’s online sketcher (link). However, if you are feeling adventurous, you can build your own sketcher in React/TypeScript using the Ketcher package!

Ketcher is an open-source package that allows easy implementation of a molecule sketcher into a web application. Unfortunately, it does require TypeScript so the script to run it cannot be imported directly into an HTML page. Therefore we will set up a simple React app to get it working.

The sketcher is very sleek and has a vast array of functionality, such as choosing any atom from the periodic table and being able to directly import molecules from either SMILES or Mol2/SDF file format into the sketcher. These molecules can then be edited and saved to a new file in the chemical file format of your choosing.

Continue reading

How to build a Python dictionary of residues for each molecule in PyMOL

Sometimes it can be handy to work with multiple structures in PyMOL using Python.

Here’s a snippet of code you might find useful: we iterate over all the α-carbon atoms in each molecule and append tuples such as (‘GLY’, 1) to a list. The dictionary, ‘reslist’, maps the name of each molecule (a string) to a list of its residue names and indices.

from pymol import cmd

# Create a list of all the objects, called 'mols':
mols = cmd.get_object_list('*')

# Create a dictionary that will return a list of residues given the
# name of the molecule object, starting from an empty list for each:
reslist = {m: [] for m in mols}

# Use PyMOL's iterate command to go over every α-carbon and append
# a tuple consisting of each residue's name ('resn') and index ('resi').
# Passing 'space' makes 'reslist' visible to the expression even when
# the script is not run in PyMOL's global namespace:
for m in mols:
    cmd.iterate('%s and name CA' % m,
                'reslist["%s"].append((resn, int(resi)))' % m,
                space={'reslist': reslist})

This script assumes you only have protein molecules loaded, and ignores things like chain ID and insertion codes.

Once you have your list of residues, you can use it with the cmd.align command, e.g., to align a particular residue to a reference structure.
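As a rough sketch of that last step, the helpers below turn entries of ‘reslist’ into PyMOL selection strings. The helper names, the object names ‘mobile’ and ‘ref’, and the toy residue lists are all hypothetical and chosen only for illustration; the code itself is plain Python, so it runs without PyMOL.

```python
def common_residues(reslist, mol_a, mol_b):
    """Return the (resn, resi) tuples of mol_a that also occur in mol_b."""
    in_b = set(reslist[mol_b])
    return [res for res in reslist[mol_a] if res in in_b]


def selection_for(mol, residues):
    """Build a PyMOL selection string for the given residues of one object."""
    # PyMOL accepts several residue indices joined by '+', e.g. 'resi 1+2'.
    indices = "+".join(str(resi) for _, resi in residues)
    return f"{mol} and resi {indices}"


# Hypothetical stand-in for the dictionary built by the script above.
reslist = {
    "ref": [("GLY", 1), ("ALA", 2), ("SER", 3)],
    "mobile": [("GLY", 1), ("ALA", 2), ("THR", 3)],
}

shared = common_residues(reslist, "mobile", "ref")
mobile_sel = selection_for("mobile", shared)
target_sel = selection_for("ref", shared)
print(mobile_sel)
print(target_sel)

# Inside a PyMOL session the two strings could then be passed on, e.g.:
#     cmd.align(mobile_sel, target_sel)
```

Like the script above, this ignores chain IDs and insertion codes, so it is only safe for single-chain structures with plain integer numbering.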

Tales of an OPIG Jamboree

Jamboree
(1) a large gathering, as of a political party or the teams of a sporting league, often including a program of speeches and entertainment;
(2) a large gathering of members of the Boy Scouts or Girl Scouts, usually nationwide or international in scope

Oxford Dictionary

This October marks twenty years since our supreme leader, Charlotte Deane, came to Oxford to start the first protein informatics group in this university.

Twenty years is a really long time, and at OPIG we like to celebrate things in style. From the beginning, it was clear that we would be doing what we know best: getting together, consuming lots of food and drinks, and perhaps talking about science. But, frankly, that’s what we do all the time. This simply wasn’t enough to celebrate two decades of scientific production. So Charlotte entrusted several of us with an ambitious goal: to reach out to our former members, and to ask them to join us, in Oxford, to celebrate two decades of protein informatics. And that’s what we did.

For two months, we painstakingly tracked down every person who has ever been part of our group, and attempted to gather their contact details to invite them to Oxford. Attempted to, for the most part. While LinkedIn gave us some early victories, some alumni had managed to cover their tracks very well, including one person we could only find after tracking down their three previous jobs. Nevertheless, after much digging, we managed to find updated contact details for every person who has ever passed through our lab, and nearly thirty of these alumni (almost 50% of them!) made their way to Oxford on October 8th* to hold the first OPIG Jamboree.

From the first student (Sanne Abeln, rightmost in the second row) to the most recent (Kate, whose hair can barely be seen on the leftmost third row), we are all here!
Continue reading

Llamas and nanobodies

Nanobodies are an exciting area of increasing interest in the biotherapeutics domain. They consist only of a heavy chain variable domain so are much smaller than conventional antibodies (about 1/10th of their mass) but despite this, manage to achieve comparable affinity for their targets, in addition to being more soluble and stable – good things come in small packages! Nanobodies are not naturally produced in humans but can be derived from camelids (VHHs) or sharks (vNARs) and then engineered to humanise them. For the rest of this blog post we will skip over the science entirely and learn how to draw a llama, a great example of a camelid species.

Graphormer: Merging GNNs and Transformers for Cheminformatics

This is my first OPIG blog! I’m going to start with a summary of the Graphormer, a Graph Neural Network (GNN) that borrows concepts from Transformers to boost performance on graph tasks. This post is largely based on the NeurIPS paper Do Transformers Really Perform Bad for Graph Representation? by Ying et al., which introduces the Graphormer, and which we read for our last deep learning journal club. The project has now been integrated as a Microsoft Research project.

I’ll start with a cheap and cheerful summary of Transformers and GNNs before diving into the changes in the Graphormer. Enjoy!

Continue reading

Using Conda environments with Flask and Apache

With the advent of ABlooper, we’ve recently introduced OpenMM as a new dependency for the SAbDab-SAbPred antibody modelling platform. By far the easiest way to install the OpenMM Python API is via Conda, so we’ve moved to Conda environments for the entire platform. This has made installation of the platform much easier, but introduces complications when it comes to running its web applications under Apache. In this post, I’ll briefly explain the reason for this, and provide a basic guide for running Flask apps using Conda environments under Apache.

Continue reading

Running code that fails with style

We have all been there, working on code that continuously fails while staring at a dull and colorless command-line. However, we are in luck, as there is a way to make the constant error messages look less depressing. If we change our shell to one which enables a colorful, themed command-line and fancy features like automatic text completion and web search, our code won’t just fail with ease, but also with style!

A shell is your command-line interpreter, meaning you use it to process commands and output their results on the command-line. The shell therefore also holds the power to add a little zest to the command-line. The most well-known shell is bash, which comes pre-installed on most UNIX systems. However, there exist many different shells, all with different pros and cons. The one we will focus on is called Z Shell, or zsh for short.

Zsh was initially only for UNIX and UNIX-like systems, but its popularity has made it available on most systems now. Like bash, zsh is extremely customizable, and their syntax is so similar that most bash commands will work in zsh. The benefit of zsh is that it comes with additional features, plugins and options, and open-source frameworks with large communities. The framework which we will look into is called Oh My Zsh.

Continue reading