Unreasonably faster notes, with command-line fuzzy search

A good note system should act like a second brain:

Accessible in seconds
Adding information should be frictionless
Searching should be exhaustive – if it’s there, you must find it

The benefits of such a note system are immense – never forget anything again! Search, perform the magic ritual of Copy Paste, and rejoice in the wisdom of your tried and tested past.

But how? Through the unreasonable effectiveness of interactive fuzzy search. This is how I have used Fuz, a terminal-based file fuzzy finder, for about 4 years.

Briefly, Fuz extracts all text within a directory using ripgrep, enables interactive fuzzy search with FZF, and returns you the selected item. As you type, the search results get narrowed down to a few matches. Files are opened at the exact line you found. And it’s FAST – 100,000 lines in half a second fast.

Using *Fuz* to quickly add a code-snippet in our note directory – then retrieving it with fuzzy-search. Here, on how to read FASTA files with Biopython, conveniently added to a file called biopython.py.

Continue reading →

Naga101: A Guide to Getting Started with (OPIG) Slurm Servers

Over the past months, I’ve been working with a few new members of OPIG, which left me answering (and asking) lots of questions about working with Slurm. In this blog post, I will try to cover key, practical basics to interacting with servers that are set up on Slurm.

Slurm is a workload manager or job scheduler for Linux, meaning that it helps with allocating resources (eg CPUs and GPUs) on a server to users’ jobs.

To note, all of the commands and files shown here are run from a so-called ‘head’ node, from which you access Slurm servers.

1. Entering an interactive session

Unlike many other servers, you cannot access a Slurm server via ‘ssh’. Instead, you can enter an interactive (or ‘debug’) session – which, in OPIG, is limited to 30 minutes – via the srun command. This is incredibly useful for copying files, setting up environments and checking that your code runs.

srun -p servername-debug --pty --nodes=1 --ntasks-per-node=1 -t 00:30:00 --wait=0 /bin/bash

2. Submitting jobs

While the srun command is easy and helpful, many of the jobs we want to run on a server will take longer than the debug queue time limit. You can submit a job, which can then run for a longer (although typically still capped) time but is not interactive, via sbatch.

Continue reading →

Coarse-grained models of antibody solutions

Various coarse-grained (CG) models have become increasingly common in studies of antibody-antibody interactions in solution. These models appear poised to enter development pipelines in the near future to help predict and understand how antibody-antibody interactions influence the suitability of a given monoclonal antibody (mAb) for mass production and delivery as an antibody therapy. This blog post is a non-exhaustive summary of some of the highlights I found during a recent literature search.

Continue reading →

Supercharge Your Literature Review With These Tools

When starting a new project, conducting a literature review of the field can be one of the most daunting prospects. Not only do you need to get through a mountain of research papers, you also need to work out which mountain of papers to get through. You don’t want to start a project only to realise a few weeks (or months!) in that you missed a key paper which would have completely changed the course of your research. Luckily, there are now several handy tools which can help speed up this process.

Continue reading →

Thinking of going to a conference

As so many members of the group have never attended an in-person conference, I thought it might be worth answering the question “why do people attend conferences?”

First- up, we should remember that flying around the world is not a zero cost to the planet, so all of us lucky enough to be able to travel should think hard every time before we choose to do so.

This means it’s really important to make sure that we know why we are going to any conference and maximise the benefits from attendance. Below are a few things to think about in terms of why you attend a conference and what to do when you are there, but this is definitely not a complete list, more a starter for four.

Continue reading →

Am I better? Performance metrics unravelled

What’s the deal with all these numbers? Accuracy, Precision, Recall, Sensitivity, AUC and ROCs.

The basic stuff:

Given a method that produces a numerical outcome either catagorical (classification) or continuous (regression), we want to know how well our method did. Let’s start simple:

True positives (TP): You said something was a cow and it was in fact a cow – duh.

False positives (FP): You said it was a cow and it wasn’t – sad.

True negative (TN): You said it was not a cow and it was not – good job.

False negative (FN): You said it was not a cow but it was a cow – do better.

I can optimise these metrics artificially. Just call everything a cow and I have a 100% true positive rate. We are usually interested in a trade-off, something like the relative value of metrics. This gives us:

Continue reading →

Code your own molecule sketcher in 4 easy steps!

Drawing molecules on your laptop usually requires access to proprietary software such as ChemDraw (link) or free websites such as PubChem’s online sketcher (link). However, if you are feeling adventurous, you can build your personal sketcher in React/Typescript using the Ketcher package!

Ketcher is an open-source package that allows easy implementation of a molecule sketcher into a web application. Unfortunately, it does require TypeScript so the script to run it cannot be imported directly into an HTML page. Therefore we will set up a simple React app to get it working.

The sketcher is very sleek and has a vast array of functionality, such as choosing any atom from the periodic table and being able to directly import molecules from either SMILES or Mol2/SDF file format into the sketcher. These molecules can then be edited and saved to a new file in the chemical file format of your choosing.

Continue reading →

How to build a Python dictionary of residues for each molecule in PyMOL

Sometimes it can be handy to work with multiple structures in PyMOL using Python.

Here’s a snippet of code you might find useful: we iterate over all the α-carbon atoms in a protein and append to a list tuples such as (‘GLY’, 1). The dictionary, ‘reslist’, returns a list of residue names and indices for each molecule, where the key is a string containing the name of the molecule.

from pymol import cmd

# Create a list of all the objects, called 'mpls':
mols = cmd.get_object_list('*')

# Create an empty dictionary that will return a list of residues
# given the name of the molecule object
reslist = {}

# Set the dictionaries to be empty lists
for m in mols:  reslist[m] = []

# Use PyMOL's iterate command to go over every α-Carbon and append 
# a tuple consisting of the each residue's residue name ('resn') and
# residue index ('resi '):
for m in mols:  cmd.iterate('%s and n. ca'%m, 'reslist["%s"].append((resn,int(resi)))'%m)

This script assumes you only have protein molecules loaded, and ignores things like chain ID and insertion codes.

Once you have your list of residues, you can use it with the cmd.align command, e.g., to align a particular residue to a reference structure.

Tales of an OPIG Jamboree

Jamboree
(1) a large gathering, as of a political party or the teams of a sporting league, often including a program of speeches and entertainment.;
(2) a large gathering of members of the Boy Scouts or Girl Scouts, usually nationwide or international in scope
Oxford Dictionary

This October marks twenty years since our supreme leader, Charlotte Deane, came to Oxford to start the first protein informatics group in this university.

Twenty years is a really long time, and at OPIG we like to celebrate things in style. From the beginning, it was clear that we would be doing what we know best: get together, consume lots of food and drinks, and perhaps talk about science. But, frankly, that’s what we do all the time. This simply wasn’t enough to celebrate two decades of scientific production. So Charlotte entrusted several of us with an ambitious goal: to reach out to our former members, and to ask them to join us, in Oxford, to celebrate two decades of protein informatics. And that’s what we did.

For two months, we painstakingly tracked down every person that has ever been part of our group, and attempted to gather their contact details to invite them to Oxford. Attempted to, for the most part. While LinkedIn gave us some early victories, some alumni had managed to cover their tracks very well, including one person we could only found after tracking down their three previous jobs. Nevertheless, after much digging, we managed to find updated contact details for every person that has ever passed by our lab, and nearly thirty of these former alumni (almost 50% of them!) made their way to Oxford on October 8th* to hold the first OPIG Jamboree.

From the first student (Sanne Abeln, rightmost in the second row) to the most recent (Kate, whose hair can barely be seen on the leftmost third row), we are all here!

Continue reading →

Llamas and nanobodies

Nanobodies are an exciting area of increasing interest in the biotherapeutics domain. They consist only of a heavy chain variable domain so are much smaller than conventional antibodies (about 1/10th of their mass) but despite this, manage to achieve comparable affinity for their targets, in addition to being more soluble and stable – good things come in small packages! Nanobodies are not naturally produced in humans but can be derived from camelids (VHHs) or sharks (vNARs) and then engineered to humanise them. For the rest of this blog post we will skip over the science entirely and learn how to draw a llama, a great example of a camelid species.

Oxford Protein Informatics Group

or "OPIG" to friends

Unreasonably faster notes, with command-line fuzzy search

Naga101: A Guide to Getting Started with (OPIG) Slurm Servers

Coarse-grained models of antibody solutions

Supercharge Your Literature Review With These Tools

Thinking of going to a conference

Am I better? Performance metrics unravelled

Code your own molecule sketcher in 4 easy steps!

How to build a Python dictionary of residues for each molecule in PyMOL

Llamas and nanobodies