Pandas is one of my favourite data analysis tools working in Python! The data frames offer a lot of power and organization to any data analysis task. Here at OPIG we work with a lot of protein structure data coming from PDB files. In the following article I will go through an example of how I use pandas data frames to analyze PDB data.
Continue readingCategory Archives: Python
The stuff MDAnalysis didn’t implement: CPU Parallel HOLE conductance analysis
Some time ago, I needed to find a way to computationally estimate conductance values for every protein frame from several molecular dynamics (MD) trajectories.
In a previous post, I wrote about how to clean the resulting instant conductance timeseries from outliers. But, I never described how I generated these timeseries.
In this post, I will show how you can parallelise the computation of instant conductance given an MD trajectory. I will touch on the difficulties of this process. And why I had to implement a custom tool for it given that MDAnalysis
seems to already have implemented a routine of this sort. Finally, I will provide two Python scripts that you can easily adapt to run your parallel calculations – for which I’ll provide some important notes you don’t wanna skip.

Taking Equivariance in deep learning for a spin?
I recently went to Sheh Zaidi‘s brilliant introduction to Equivariance and Spherical Harmonics and I thought it would be useful to cement my understanding of it with a practical example. In this blog post I’m going to start with serotonin in two coordinate frames, and build a small equivariant neural network that featurises it.
Continue readingSome useful pandas functions
Pandas is one of the most used packages for data analysis in python. The library provides functionalities that allow to perfrom complex data manipulation operations in a few lines of code. However, as the number of functions provided is huge, it is impossible to keep track of all of them. More often than we’d like to admit we end up wiriting lines and lines of code only to later on discover that the same operation can be performed with a single pandas function.
To help avoiding this problem in the future, I will run through some of my favourite pandas functions and demonstrate their use on an example data set containing information of crystal structures in the PDB.
Continue readingLet your library design blosum
During the lead optimisation stage of the drug discovery pipeline, we might wish to make mutations to an initially identified binding antibody to improve properties such as developability, immunogenicity, and affinity.
There are many ways we could go about suggesting these mutations including using Large Language Models e.g. ESM and AbLang, or Inverse Folding methods e.g. ProteinMPNN and AntiFold. However, some of our recent work (soon to be pre-printed) has shown that classical non-Machine Learning approaches, such as BLOSUM, could also be worth considering at this stage.
Continue readingConverting pandas DataFrames into Publication-Ready Tables
Analysing, comparing and communicating the predictive performance of machine learning models is a crucial component of any empirical research effort. Pandas, a staple in the Python data analysis stack, not only helps with the data wrangling itself, but also provides efficient solutions for data presentation. Two of its lesser-known yet incredibly useful features are df.to_markdown()
and df.to_latex()
, which allow for a seamless transition from DataFrames to publication-ready tables. Here’s how you can use them!
Deploying a Flask app part II: using an Apache reverse proxy
I recently wrote about serving a Flask web application on localhost using gunicorn. This is sufficient to get an app up and running locally using a production-ready WSGI server, but we still need to add a HTTP proxy server in front to securely handle HTTP requests coming from external clients. Here we’ll cover configuring a simple reverse proxy using the Apache web server, though of course you could do the same with another HTTP server such as nginx.
Continue readingUnderstanding GPU parallelization in deep learning
Deep learning has proven to be the season’s favourite for biology: every other week, an interesting biological problem is solved by clever application of neural networks. Yet, as more challenges get cracked, modern research shifts more and more in the direction of larger models — meaning that increasing computational resources are required for training. Unsurprisingly, NVIDIA, the main manufacturer of GPUs, experienced a significant jump in their stock price earlier this year.
Access to compute is not enough to train good neural networks. As soon as multiple cards enter into play, researchers need to use a completely different paradigm where data and model weights are distributed across different devices — and sometimes even different computers. Though these tools start to be crucial for successful computational biology research, they are generally unknown to researchers. Hence, in this blogpost, I would like to provide a really brief introduction to multi-GPU training.
Continue readingDeploying a Flask app part I: the gunicorn WSGI server
Last year I wrote a post about deploying Flask apps with Apache/mod_wsgi when your app’s dependencies are installed in a conda environment. The year before, in the dark times, I wrote a post about the black magic invocations required to get multiple apps running stably using mod_wsgi. I’ve since moved away from mod_wsgi entirely and switched to running Flask apps from containers using the gunicorn WSGI server behind an Apache reverse proxy, which has made life immeasurably easier. In this post we’ll cover running a Flask app on localhost using gunicorn; in Part II we’ll run our app as a service using Singularity and deploy it to production using Apache as a HTTP proxy server.
Continue readingAI Can’t Believe It’s Not Butter
Recently, I’ve been using a Convolutional Neural Network (CNN), and other methods, to predict the binding affinity of antibodies from their sequence. However, nine months ago, I applied a CNN to a far more important task – distinguishing images of butter from margarine. Please check out the GitHub link below to learn moo-re.
https://github.com/lewis-chinery/AI_cant_believe_its_not_butter