Category Archives: Technical

Making pwd redundant

I’m going to keep this one brief, because I am mid-confirmation-and-paper-writing madness. I have seen too many people – both beginners and seasoned veterans – wandering around their Linux filesystem blindfolded:

Isn’t it hideous?

Whenever you want to see where you are, you have to execute pwd (present working directory), which will print your absolute location to stdout. If you have many terminals open at the same time, it is easy to lose track of where you are, and every other command becomes pwd; surely, I hear you cry, there has to be a better way!

Well, fear not! With a little tinkering with ~/.bashrc, we can display the working directory as part of the special PS1 environment variable, responsible for how your username and computer are displayed above. Putting the following at the top of ~/.bashrc

me=`id | awk -F\( '{print $2}' | awk -F\) '{print $1}'`
export PS1="`uname -n |  /bin/sed 's/\..*//'`{$me}:\$PWD$ "

… saving, and starting a new termanal window results in:

Much better!

I haven’t used pwd in 3 years.

How to estimate the inestimable

Back-of-the-envelope calculations are one of our chief tools as scientists. When you spend most of your time wondering if your latest measurement is correct, having a tool to check if the numbers make sense is simply priceless. If you are lucky, a good estimate might just avoid a costly or laborious measurement — this is very common in disciplines like chemical engineering, which a friend described as “the art of estimating numbers and plugging them into some variation of Bernoulli’s continuity equation”. Unsurprisingly, these Fermi problems are now common interview questions at major consultancy and tech companies, and have even started to go viral.

Last week, I thought I would ask my biochemistry students to solve a back-of-the-envelope problem as part of their tutorial work. Disguised as an enzyme catalysis problem, I asked them to estimate the energy of a single hydrogen bond. Needless to say, they were puzzled. Some of them asked if I had forgotten to include some information in the problem sheet. For some reason, Fermi problems seem to be less common in chemistry and biology that they are in physics of engineering. Of course, estimating the energy of a hydrogen bond is in many ways much harder than guessing the number of ping pong balls that fit a Boeing 747. Nobody has seen a hydrogen bond in the flesh. And our minds struggle to grasp the vast numbers present at the molecular level. Nevertheless, guesstimates are incredibly useful

Continue reading

New Antibody Therapeutic INNs will no longer end in “-mab”!

Happy 2022, Blopiggers!

My first post of the year is about another major change to the way the World Health Organisation will be assigning “International Non-proprietary Name”s (INNs) to antibody-based therapeutics. I haven’t seen this publicised widely, so I thought I’d share it here as it is an important consideration for anyone mining or exploiting this data.

Continue reading

A logical brain teaser to derail your afternoon

Brain teasers have a strange power. For many they evoke nothing more than a mild and transient sense of curiosity. But for a certain subset of people they create an irresistible intellectual temptation which even needs to actively be avoided at times as not to completely derail conversations and take over whole afternoons.

For better or worse, I am in the camp of people who are highly susceptible to brain teasers. I just love them too much. More than once in my lifetime I had to ask a friend not to tell me about a particular brain teaser they had heard about because I knew it would inevitably take over my mind and send me down an almost hypnotic spiral of thoughts whose only escape would be finding the solution.

While brain teasers can admittedly turn into ridiculously powerful distractions for some of us, they are not necessarily a waste of time. They have high recreational value and help the mind to enter a playful and creative state. They serve as mental gymnastics to directly train logical thinking skills, and logical thinking is arguably one of the most powerful transferable skills that exists. And last but not least, brain teasers are canonically used nowadays in job interviews at some of the worlds top employers (Google, Facebook, Microsoft, prestigious hedge funds, …).

In this post, I will present one of my favourite brain teasers to see if I can get you hooked. It is a slightly modified and self-contained version of the so-called pirate game. You can find the solution at the end of the page. Enjoy responsibly! Continue reading

Getting the PDB structures of compounds in ChEMBL

Recently I was dealing with a set of compounds with known target activities from the ChEMBL database, and I wanted to find out which of them also had PDB  crystal structures in complex with that target.

Referencing this manually is very easy for cases where we are interested in 2-3 compounds, but for any larger number, using the ChEMBL and PDB web services greatly reduces the number of clicks.

Continue reading

A handful of lesser known python libraries

There are more python libraries than you can shake a stick at, but here are a handful that don’t get much love and may save you some brain power, compute time or both.

Fire is a library which turns your normal python functions into command-line utilities without requiring more than a couple of additional lines of copy-and-paste code. Being able to immediately access your functions from the command line is amazingly helpful when you’re making quick and dirty utilities and saves needing to reach for the nuclear approach of using getopt.

Continue reading

Uniformly sampled 3D rotation matrices

It’s not as simple as you’d think.

If you want to skip the small talk, the code is at the bottom. Sampling 2D rotations uniformly is simple: rotate by an angle from the uniform distribution \theta \sim U(0, 2\pi). Extending this idea to 3D rotations, we could sample each of the three Euler angles from the same uniform distribution \phi, \theta, \psi \sim U(0, 2\pi). This, however, gives more probability density to transformations which are clustered towards the poles:

Sampling Euler angles uniformly does not give an even distribution across the sphere.

In Fast Random Rotation Matrices (James Avro, 1992), a method for uniform random 3D rotation matrices is outlined, the main steps being:

Continue reading

AlphaFold 2 is here: what’s behind the structure prediction miracle

Nature has now released that AlphaFold 2 paper, after eight long months of waiting. The main text reports more or less what we have known for nearly a year, with some added tidbits, although it is accompanied by a painstaking description of the architecture in the supplementary information. Perhaps more importantly, the authors have released the entirety of the code, including all details to run the pipeline, on Github. And there is no small print this time: you can run inference on any protein (I’ve checked!).

Have you not heard the news? Let me refresh your memory. In November 2020, a team of AI scientists from Google DeepMind  indisputably won the 14th Critical Assessment of Structural Prediction competition, a biennial blind test where computational biologists try to predict the structure of several proteins whose structure has been determined experimentally but not publicly released. Their results were so astounding, and the problem so central to biology, that it took the entire world by surprise and left an entire discipline, computational biology, wondering what had just happened.

Continue reading

Out-of-distribution generalisation and scaffold splitting in molecular property prediction

The ability to successfully apply previously acquired knowledge to novel and unfamiliar situations is one of the main hallmarks of successful learning and general intelligence. This capability to effectively generalise is amongst the most desirable properties a prediction model (or a mind, for that matter) can have.

In supervised machine learning, the standard way to evaluate the generalisation power of a prediction model for a given task is to randomly split the whole available data set X into two sets – a training set X_{\text{train}} and a test set X_{\text{test}}. The model is then subsequently trained on the examples in the training set X_{\text{train}} and afterwards its prediction abilities are measured on the untouched examples in the test set X_{\text{test}} via a suitable performance metric.

Since in this scenario the model has never seen any of the examples in X_{\text{test}} during training, its performance on X_{\text{test}} must be indicative of its performance on novel data X_{\text{new}} which it will encounter in the future. Right?

Continue reading

Singularity: a guide for the bewildered bioinformatician

Have you ever worked with a piece of software that is awfully difficult to set up? That legacy code written on FORTRAN 77, that other one that requires significant modifications to compile, or any of those that require a long-winded bash script with a thousand dependencies (which you also have to install!). Would it not be helpful if, when that red-eyed PhD student, that one that just spent three months writing up their thesis, says that they absolutely must use that server where you have installed all your stuff, you could just relocate to another one without trouble? Well, you may be able to do that now. You just need to use containerization.

The idea behind containerization is rather simple. The best way to ensure anyone can reproduce your work is to, well, ship your entire system to whomever needs to use it. You could, for example, pack up your desktop in a box, and ship it to your collaborators anywhere in the world. Unfortunately, this idea is quite unpractical, not only because of tedious logistics (ever had to deal with customs?), but also because suddenly you won’t be able to run your own pipeline. However, it is a good enough thought that at some point made a clever engineer wonder whether there was a way to ship an entire system without physically delivering the computer. And that’s exactly what they designed.

40ft x 8ft (9ft 6") One trip high cube shipping container bl
Best way to make sure your collaborators on the other side of the world can run your pipeline — just pack your desktop in one of these, and ship it away!
Continue reading