Feeding a drove of hungry OPIGlets

In preparing for battle I have always found that plans are useless, but planning is indispensable.

Dwight D. Eisenhower

Following the previous post about OPIG retreat 2022, and having received numerous requests for recipes, I thought I’d document the process of ensuring that 24 people are kept fed and happy. Recipes at the foot of the post.

Disclaimer – these recipes are entirely my own interpretations, adapted where necessary to suit a range of dietary requirements. They are in no way authentic to any national cuisines and are not intended to be.

Disclaimer II: The Disclaiming – all measurements are approximate. I rarely write down recipes or use precise measurements. Taste as you go, and don’t be afraid to add more salt.

Continue reading

MM(PB/GB)SA – a quick start guide

The MMPBSA.py program distributed Open Source in the AmberTools21 package is a powerful tool for end-point free energy calculations on molecular dynamics simulations. In its simplest application, MMPBSA.py is used to calculate the free energy difference between the bound and unbound states of a protein-ligand complex. In order to use it, however, you need to have an Amber-compliant trajectory file, which means you need to set up and run your simulation fairly carefully.

While the Amber Manual and the MMPBSA tutorial provide lots of helpful information, putting everything together into a full pipeline taking you from structure to a free energy is another story. The goal for this guide is to provide a schematic you can follow to get started. This guide assumes you are familiar with molecular dynamics simulations and the theory of MMPBSA.

The easiest way I have found to do this, using only Open Source software, is:

(1) Download your raw PDB file. If you are lucky and it contains a complete set of heavy atoms (excepting perhaps a terminal OXT here and there, which tleap will add for you in step 3) you are good to go.

(2) Use the H++ webserver to determine the protonation states of each residue and add hydrogens as needed. This webserver is particularly convenient because it will allow you to directly download a PQR file that you can use to generate your starting topology and coordinates. Note that you have various options to choose the pH and internal/external dielectric constants for the calculation.

(3) Use tleap to generate your topology (prmtop) and coordinate (inpcrd) files for your simulations. Do not forget that you will need not only the prmtop for the solvated complex, but also a dry prmtop for each of the complex, receptor, and ligand. Load the PQR file from H++ and do not forget to set PBRadii *to the same value for all prmtops*. A typical tleap script for setting up your solvated complex would look something like:

Continue reading

Einops: Powerful library for tensor operations in deep learning

Tobias and I recently gave a talk at the OPIG retreat on tips for using PyTorch. For this we created a tutorial in a Google Colab notebook (link can be found here). I remember rambling about the advantages of implementing your own models against using other people’s code. Well, if I convinced you, einops is for you!

Basically, einops lets you perform operations on tensors using Einstein notation. This package comes with a number of advantages, a few of which I will try to summarise here:

Continue reading

OPIG Retreat 2022

Finally, after two years of social distancing, we were able to continue the tradition of OPIGtreat – a 2-3 day escape to the countryside for a packed schedule of talks and fun.

This year, the lovely YHA Wilderhope Manor in Shropshire was chosen by Lewis, our trip organizer. With a hostel in the middle of nowhere, with no phone signal, this trip promised to be an exciting get-away from our plugged-in lives at the university.

Continue reading

Women in Computing: past, present and what we can do to improve the future.

Computing is one of the only scientific fields which was once female-dominated. In the 30s and 40s, women made up the bulk of the workforce doing complex, tedious calculations in fields including ballistics, astrophysics, aeronautics (think Hidden Figures) and code-breaking. Engineers found that the female computers were far more reliable than they themselves were in doing such calculations [9]. As computing machines became available, there was no precedent set for the gender of a computer operator, and so the women previously doing the computing became the computer operators [10].

However, this was not to last. As computing became commercialised in the 50s, the skill required for computing work was starting to be recognised. As written in [1]:

“Software company System Development Corp. (SDC) contracted psychologists William Cannon and Dallis Perry to create an aptitude assessment for optimal programmers. Cannon and Perry interviewed 1,400 engineers — 1,200 of them men — and developed a “vocational interest scale,” a personality profile to predict the best potential programmers. Unsurprisingly given their male-dominated test group, Cannon and Perry’s assessment disproportionately identified men as the ideal candidates for engineering jobs. In particular, the test tended to eliminate extroverts and people who have empathy for others. Cannon and Perry’s paper concluded that typical programmers “don’t like people,” forming today’s now pervasive stereotype of a nerdy, anti-social coder.”

Continue reading

OpenMM Setup: Start Simulating Proteins in 5 Minutes

Molecular dynamics (MD) simulations are a good way to explore the dynamical behaviour of a protein you might be interested in. One common problem is that they often have a relatively steep learning curve when using most MD engines.

What if you just want to run a simple, one-off simulation with no fancy enhanced sampling methods? OpenMM Setup is a useful tool for exactly this. It is built on the open-source OpenMM engine and provides an easy-to-install (via conda) GUI that can have you running a simulation in less than 5 minutes. Of course, running a simulation requires careful setting of parameters and familiarity with best practices; this is beyond the scope of this post, but there are many guides out there that can easily be found. Now on to the good stuff: using OpenMM Setup!

When you first run OpenMM Setup, you’ll be greeted by a browser window asking you to choose a structure to use. This can be a crystal structure or a model. Remember, sometimes these will have problems that need fixing like missing density or charged, non-physiological termini that would lead to artefacts, so visual inspection of the input is key! You can then choose the force field and water model you want to use, and tell OpenMM to do some cleaning up of the structure. Here I am running the simulation on hen egg-white lysozyme:

Continue reading

How to prepare a molecule for RDKit

RDKit is very fussy when it comes to inputs in SDF format. Using the SDMolSupplier, we get a significant rate of failure even on curated datasets such as the PDBBind refined set. PyMOL has no such scruples, and with that, I present a function which has proved invaluable to me over the course of my DPhil. For reasons I have never bothered to explore, using PyMOL to convert from SDF into mol2 and back to SDF format again (adding in missing hydrogens along the way) will almost always make a molecule safe to import using RDKit:

from pathlib import Path
from pymol import cmd

def py_mollify(sdf, overwrite=False):
    """Use pymol to sanitise an SDF file for use in RDKit.

    Arguments:
        sdf: location of faulty sdf file
        overwrite: whether or not to overwrite the original sdf. If False,
            a new file will be written in the form <sdf_fname>_pymol.sdf

    Returns:
        Filename of the sanitised output: the original sdf filename if
        overwrite == True, else <sdf_fname>_pymol.sdf.
    """
    sdf = Path(sdf).expanduser().resolve()
    # Build output names from the file stem so that only the extension changes
    mol2_fname = sdf.with_name(sdf.stem + '_pymol.mol2')
    new_sdf_fname = sdf if overwrite else sdf.with_name(sdf.stem + '_pymol.sdf')
    # Start from an empty session so repeated calls do not mix molecules
    cmd.reinitialize()
    cmd.load(str(sdf))
    cmd.h_add('all')
    cmd.save(str(mol2_fname))
    cmd.reinitialize()
    cmd.load(str(mol2_fname))
    cmd.save(str(new_sdf_fname))
    return str(new_sdf_fname)
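
One way to use this in practice is as a fallback: try RDKit first, and only round-trip the file through PyMOL when parsing fails. The following is a minimal sketch of that idea rather than part of the original workflow. The helper names `read_first_mol` and `load_with_fallback` are hypothetical, and the reader and sanitiser are passed in as arguments so that `py_mollify` from above can be plugged in.

```python
def read_first_mol(sdf):
    """Return the first molecule in an SDF file, or None if RDKit cannot parse it."""
    from rdkit import Chem  # imported lazily so the helper below works without RDKit

    for mol in Chem.SDMolSupplier(str(sdf)):
        return mol  # SDMolSupplier yields None for records it fails to parse
    return None  # the file contained no records


def load_with_fallback(sdf, sanitise, read_mol=read_first_mol):
    """Try to read `sdf`; on failure, sanitise the file and try once more.

    sanitise: callable taking a filename and returning the cleaned filename,
        e.g. py_mollify (assumed to be defined as above).
    read_mol: callable returning a molecule, or None on failure.
    """
    mol = read_mol(sdf)
    if mol is None:
        mol = read_mol(sanitise(sdf))
    return mol
```

With `py_mollify` in scope, a call might look like `mol = load_with_fallback('ligand.sdf', py_mollify)`, where `ligand.sdf` is a placeholder filename. The result can still be None if the file remains unreadable after sanitising.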

How to Install Open Source PyMOL on Windows 10

It is possible to get an installer for the crystallographer’s favourite molecular visualization tool for Windows machines, if you are willing to pay a fee. Fortunately, Christoph Gohlke has made available free, pre-compiled Windows versions of the latest PyMOL software, along with all of its requirements; they are just not particularly straightforward to install. The PyMOLWiki offers a three-step guide on how to do this, and I will break it down to make it somewhat clearer.

1. Install the latest version of Python 3 for Windows

Download the Windows Installer (x-bit) for Python 3 from their website, x being your Windows architecture – 32 or 64.

Then, follow the instructions on how to install it. You can check if it has installed by running the following in PowerShell:

Continue reading

Making pwd redundant

I’m going to keep this one brief, because I am mid-confirmation-and-paper-writing madness. I have seen too many people – both beginners and seasoned veterans – wandering around their Linux filesystem blindfolded:

Isn’t it hideous?

Whenever you want to see where you are, you have to execute pwd (print working directory), which will print your absolute location to stdout. If you have many terminals open at the same time, it is easy to lose track of where you are, and every other command becomes pwd; surely, I hear you cry, there has to be a better way!

Well, fear not! With a little tinkering with ~/.bashrc, we can display the working directory as part of the special PS1 environment variable, responsible for how your username and computer are displayed above. Putting the following at the top of ~/.bashrc

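# Look up the current username by parsing the output of id
me=`id | awk -F\( '{print $2}' | awk -F\) '{print $1}'`
# Prompt: short hostname, {username}, then the working directory (re-evaluated at every prompt)
export PS1="`uname -n |  /bin/sed 's/\..*//'`{$me}:\$PWD$ "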

… saving, and starting a new terminal window results in:

Much better!

I haven’t used pwd in 3 years.

3 Key Questions to Think About When Designing Proteins Computationally

We have reached the era of design, not just ‘hunting’. Particularly exciting to me is the de novo design of proteins, which have a wide and ever increasing range of applications from therapeutics to consumer products, biomanufacturing to biomaterials. Protein design has been a) enabled by decades of research that contributed to our understanding of protein sequence, structure & function and b) accelerated by computational advances – capturing the information we have learned from proteins and representing it for computers and machine learning algorithms.

In this blog post, I will discuss three key methodological considerations for computational protein design:

  1. Sequence- vs structure-based design
  2. ML- vs physics-based design
  3. Target-agnostic vs target-aware design
Continue reading