Docker is an excellent containerisation system, ideally suited to production servers. It lets each container do one small thing and do it well: for example, a large blog can be broken up into individually maintained containers for a web server, a database and (say) a WordPress instance. However, due to inherent security woes, Docker doesn’t play nicely with multi-tenanted machines, the kind that are the bread and butter of researchers and HPC users. That’s where Singularity steps in.
Category Archives: Code
Molecular dynamics analysis in MDAnalysis
Any opportunity to use rigorously tested and supported analysis tools rather than in-house code is, in my opinion, an opportunity you owe it to yourself to explore.
My preferred tool for analyzing the output of molecular dynamics (MD) simulations is MDAnalysis, a Python library that provides robust and easy-to-use tools for analyzing the most common files output by MD packages (including the PDB, DCD, COR, and XTC file formats). But, of course, MDAnalysis can analyze any PDB file, not just one output from an MD simulation. There may be an opportunity in your workflow to incorporate MDAnalysis to save time or to provide more robust error handling than whatever in-house code you currently use.
Using SLURM a little bit more efficiently
Your research group slurmified their servers? You basically have two options now.
Either you install all your necessary things on one of the slurm nodes within an interactive session, e.g.:
srun -p funkyserver-debug --pty --nodes=1 --ntasks-per-node=1 -t 00:10:00 --wait=0 /bin/bash
and always specify this node by adding the ‘#SBATCH --nodelist=funkyserver.cpu.do.work’ line to your sbatch scripts, or you set up some template scripts that help you install all your requirements on multiple nodes, so you can enjoy the benefits of the slurm system.
Here is how I did it; comments and suggestions welcome!
Step 1: Create an sbatch template file (e.g. sbatch_job_on_server.template_sh) on the submission node that does what you want. In the ‘#SBATCH --partition’ or ‘--nodelist’ lines, use a placeholder, e.g. ‘<server>’, instead of funkyserver.
For example, for installing the same conda environment on all nodes that you want to work on:
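As a rough sketch of how the ‘<server>’ placeholder from Step 1 could be filled in for several nodes: the template contents, the node names, the output file names and the environment.yml file below are all assumptions made for illustration, not the author's actual scripts.

```python
from pathlib import Path
import tempfile

# Hypothetical template; in practice this would be read from a file such as
# sbatch_job_on_server.template_sh. The conda step is an assumed example.
TEMPLATE = """#!/bin/bash
#SBATCH --job-name=setup_<server>
#SBATCH --nodelist=<server>
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

# Assumed setup step: build the same conda environment on this node.
conda env create -f environment.yml
"""


def render_template(template: str, server: str) -> str:
    """Return the template with every '<server>' placeholder filled in."""
    return template.replace("<server>", server)


def write_scripts(template: str, servers, out_dir):
    """Write one sbatch script per server and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for server in servers:
        path = out_dir / f"sbatch_job_on_{server}.sh"
        path.write_text(render_template(template, server))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical node names; replace with the nodes of your own cluster.
    servers = ["funkyserver1.cpu.do.work", "funkyserver2.cpu.do.work"]
    for script in write_scripts(TEMPLATE, servers, tempfile.mkdtemp()):
        # Print the submission commands rather than running them.
        print(f"sbatch {script}")
```

Each generated script can then be submitted with sbatch, so the same setup runs once on every node in the list.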
Functional Programming in Python
Introduction
The difficulty of reasoning about the behaviour of stateful programs, especially in concurrent environments, has led to increased interest in a programming paradigm called functional programming. This style emphasises the connection between programs and mathematics, encouraging code that is easy to understand and, in some critical cases, even possible to prove properties of.
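To make the contrast concrete, here is a minimal sketch, using only the standard library, of the same calculation written in a stateful style and in a functional style. The class and function names are made up for illustration.

```python
from functools import reduce


class RunningTotal:
    """Stateful style: each call changes hidden state, so the result of
    add(x) depends on everything that was added before."""

    def __init__(self):
        self.total = 0

    def add(self, x):
        self.total += x
        return self.total


def square(x):
    """Pure function: the result depends only on the argument."""
    return x * x


def sum_of_squares(values):
    """Functional style: combine pure functions with map and reduce.
    The same input always gives the same output and nothing is mutated."""
    return reduce(lambda acc, x: acc + x, map(square, values), 0)
```

Because sum_of_squares has no hidden state, it can be called from several threads or in any order without one call affecting another, which is much harder to guarantee for RunningTotal.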
Consistent plotting with ggplot
Unlike other OPIGlets (looking at you, Claire), I have neither the skill nor the patience to make good figures from scratch. And making good figures — as well as remaking, rescaling and adapting them — is incredibly important, because they play a huge role in the way we communicate our research. So how does an aesthetically impaired DPhil student do her plotting?
How to be a Bayesian – ft. a completely ridiculous example
Most of the stats we are exposed to in our formative years as statisticians are viewed through a frequentist lens. Bayesian methods are often viewed with scepticism, perhaps due in part to a lack of understanding over how to specify our prior distribution and perhaps due to uncertainty as to what we should do with the posterior once we’ve got it.
GitHub Link to Text Mining Tool
I have created a GitHub page to share some of the code that I used for text mining to extract HBV-related genetic information from PubMed Central. The code is easily adapted to search for sentences that match your own keywords, so please take a look if you are interested: https://github.com/angoto/HBV_Code.
Note: GitHub page is currently unavailable online, but will be accessible in due course.
SAbBox – the easy way to obtain our antibody tools
A significant part of the work we do here in OPIG revolves around antibodies, the proteins of the immune system that bind to and help remove any foreign entities that find their way into the body. Since antibodies can be developed that target basically anything, they have become extremely useful as therapeutics. In our research, we develop computational tools that can be incorporated into various points along the antibody discovery pipeline. These tools include our database of antibody structures, SAbDab, and a series of predictive tools (e.g. structural modelling algorithms like ABodyBuilder) which are known collectively as SAbPred.
A Gentle Introduction to the GPyOpt Module
Manually tuning hyperparameters in a neural network is slow and boring. Using Bayesian Optimisation to do it for you is slightly less slow, and you can go and do other things whilst it’s running. Susan recently highlighted some of the resources available to get to grips with GPyOpt. Below is a copy of a Jupyter Notebook where we walk through a couple of simple examples and hopefully shed a little bit of light on how the algorithm works.
Three things to help you get started on Bayesian Optimisation
In this blog post I will share with you the materials that I found most useful when I started doing some Bayesian Optimisation in my research. Bear in mind, I am a Chemist by training, so I approached this topic from a non-mathematical background (my eyes have to be persuaded to look at mathematical equations). Out of all the materials I have come across, I found these to be the most accessible.