Category Archives: How To

How to do things. doh.

How to make your own singularity container zero fuss!

In this blog post, I’ll show you guys how to make your own shiny container for your tool! Zero fuss(*) and in FOUR simple steps.

As an example, I will show how to make a singularity container for one of our public tools, ANARCI, the antibody numbering tool everyone in OPIG and external users are familiar with – If not, check the web app and the GitHub repo here and here.

(*) Provided you have your own Linux machine with sudo permissions, otherwise, you can’t do it – sorry. Same if you have a Mac or Windows – sorry again.
BUT, there are workarounds for these cases such as using the remote singularity builder here, for which you only need to sign up and create an account, and the use of Virtual Machines (VMs), as described here.

Continue reading →

Visualise with Weight and Biases

Understanding what’s going on when you’ve started training your shiny new ML model is hard enough. Will it work? Have I got the right parameters? Is it the data? Probably. Any tool that can help with that process is a Godsend. Weights and biases is a great tool to help you visualise and track your model throughout your production cycle. In this blog post, I’m going to detail some basics on how you can initialise and use it to visualise your next project.

Installation

To use weights and biases (wandb), you need to make an account. For individuals it is free, however, for team-oriented features, you will have to pay. Wandb can then be installed using pip or conda.

$ 	conda install -c conda-forge wandb

or 

$   pip install wandb

To initialise your project, import the package, sign in, and then use the following command using your chosen project name and username (if you want):

import wandb

wandb.login()

wandb.init(project='project1')

In addition to your project, you can also initialise a config dictionary with starting parameter values:

Continue reading →

Why can a man not lift himself by pulling up on his bootstrap hypothesis test?

This blogpost highlights a typical mistake when performing the bootstrap hypothesis test. Bootstrapping is a method of resampling data to estimate measures of variability, such as confidence intervals or variance.

In the simplest form of the bootstrap, assume you have a set of values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. You want to estimate the mean and variability of the mean using these data. The recipe is as follows:

Continue reading →

How to prepare a molecule for RDKit

RDKit is very fussy when it comes to inputs in SDF format. Using the SDMolSupplier, we get a significant rate of failure even on curated datasets such as the PDBBind refined set. Pymol has no such scruples, and with that, I present a function which has proved invaluable to me over the course of my DPhil. For reasons I have never bothered to explore, using pymol to convert from sdf, into mol2 and back to sdf format again (adding in missing hydrogens along the way) will almost always make a molecule safe to import using RDKit:

from pathlib import Path
from pymol import cmd

def py_mollify(sdf, overwrite=False):
    """Use pymol to sanitise an SDF file for use in RDKit.

    Arguments:
        sdf: location of faulty sdf file
        overwrite: whether or not to overwrite the original sdf. If False,
            a new file will be written in the form <sdf_fname>_pymol.sdf
            
    Returns:
        Original sdf filename if overwrite == False, else the filename of the
        sanitised output.
    """
    sdf = Path(sdf).expanduser().resolve()
    mol2_fname = str(sdf).replace('.sdf', '_pymol.mol2')
    new_sdf_fname = sdf if overwrite else str(sdf).replace('.sdf', '_pymol.sdf')
    cmd.load(str(sdf))
    cmd.h_add('all')
    cmd.save(mol2_fname)
    cmd.reinitialize()
    cmd.load(mol2_fname)
    cmd.save(str(new_sdf_fname))
    return new_sdf_fname

How to Install Open Source PyMOL on Windows 10

It is possible to get an installer for the crystallographer’s favourite molecular visualization tool for Windows machines, that is if you are willing to pay a fee. Fortunately, Christoph Gohlke has made available free, pre-compiled Windows versions of the latest PyMOL software, along with all of it’s requirements, it’s just not particularly straightforward to install. The PyMOLWiki offers a three-step guide on how to do this and I will break it down to make it somewhat clearer.

1. Install the latest version of Python 3 for Windows

Download the Windows Installer (x-bit) for Python 3 from their website, x being your Windows architecture – 32 or 64.

Then, follow the instructions on how to install it. You can check if it has installed by running the following in PowerShell:

Continue reading →

Making pwd redundant

I’m going to keep this one brief, because I am mid-confirmation-and-paper-writing madness. I have seen too many people – both beginners and seasoned veterans – wandering around their Linux filesystem blindfolded:

Isn’t it hideous?

Whenever you want to see where you are, you have to execute pwd (present working directory), which will print your absolute location to stdout. If you have many terminals open at the same time, it is easy to lose track of where you are, and every other command becomes pwd; surely, I hear you cry, there has to be a better way!

Well, fear not! With a little tinkering with ~/.bashrc, we can display the working directory as part of the special PS1 environment variable, responsible for how your username and computer are displayed above. Putting the following at the top of ~/.bashrc

me=`id | awk -F\( '{print $2}' | awk -F\) '{print $1}'`
export PS1="`uname -n |  /bin/sed 's/\..*//'`{$me}:\$PWD$ "

… saving, and starting a new termanal window results in:

Much better!

I haven’t used pwd in 3 years.

How to Blopig

Blopig has a wealth of knowledge, with everything from a Bayesian answer to the question “should ketchup be stored in the fridge?“* to the Nobel-Prize-Winner-approved analysis of AlphaFold2. Blopig runs on WordPress and uses blocks, components for adding different types of content to a post. These are blocks like paragraphs, headers, images, image galleries, and videos. Here are some hints and tips for getting the most out of WordPress.

One of the first blocks worth mentioning is the “Read More…” block…

Continue reading →

Python’s Data Classes

When writing code, you have inevitably needed to store data throughout your pipeline. In these cases you store your value, list or data frame as a variable to easily use it elsewhere in your code. However, sometimes your data has an awkward form, consisting of a number of different length lists or data of different types and sizes. While it is still doable to work with, and using tuples or dictionaries can help, accessing different elements in your data quickly becomes messy and it is less intuitive what your code is actually doing.

To solve the above stated problem, data classes were introduced as a new feature in Python 3.7. A data class is a regular Python class, but with certain methods already implemented for you. This makes them easy to create and removes a lot of boilerplate (repeated code) making them simpler, more intuitive and pretty. Further, as data classes are part of the standard library, you can directly import it without needing to install any external dependencies (noice).

With the sales pitch out of the way, let us look at how we can use data classes.

from dataclasses import dataclass
from typing import Any

@dataclass
class Antibody:
    vgene: str
    jgene: None
    sequence: Any = 'EVQ'

Continue reading →

Tracking Changes in LaTeX

Tracking changes in Microsoft Word is easy – we just click the ‘Track Changes’ button. It requires a little more work in LaTeX but here is a quick guide to doing it as painlessly as possible!

First, we want our original document to be stored in a *.tex file in an Overleaf project (as shown below)

Continue reading →

Solving WORDLE with grep

People seem to have become obsessed with wordle, just like they became obsessed with sudoku. After my initial burst of “oh a new game!” had waned, I was left thinking “my time is precious and this is exactly what we have computers for”. With this in mind, below is my quick and dirty way of solving these. I’m sure the regexp gurus amongst you will have a more elegant solution.

Step 1: Make sure you’ve got /usr/share/dict/words installed. This is just a huge list of words in a specific language and for me, this required installing the British words list.

sudo apt-get install wbritish

Step 2: Go to wordle

Step 3: Pick a random 5-letter word as your starting point. This is where grep and /usr/share/dict/words comes in:

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Category Archives: How To

How to make your own singularity container zero fuss!

Visualise with Weight and Biases

Why can a man not lift himself by pulling up on his bootstrap hypothesis test?

How to prepare a molecule for RDKit

How to Install Open Source PyMOL on Windows 10

1. Install the latest version of Python 3 for Windows

Making pwd redundant

How to Blopig

Python’s Data Classes

Tracking Changes in LaTeX

Solving WORDLE with grep