Recently I have had the opportunity to get a closer look at the submission, review and promotion cycle for a typical academic paper. It was a great learning experience and led to an increase in the number of research papers, news articles, and reviews I read in preparation. However, on multiple occasions, I did think “I wish I could watch a 2 min video to explain this”. That got me thinking: why couldn’t I, and should I be able to?
Being Brief.
This is a blog post about using fewer words.
Using Singularity on Windows with WSL2
Previously on this blog, my colleagues Carlos and Eoin have extolled the many virtues of Singularity, which I will not repeat here. Instead, I’d like to talk about a rather interesting subject that was unexpectedly thrust upon me when my faithful Linux laptop started to show the early warning signs of critical existence failure: is there a good way to run a Singularity container on a pure Windows machine? It turns out that, with version 2 of the Windows Subsystem for Linux (WSL), there is.
Lessons in Scientific Code Deployment
So, I recently deployed my first piece of scientific code. Well, sort of. I made a GitHub repository with instructions on how to download, install and run it.
And then everyone broke it.
So, now having been on tech support duty for a few weeks, it seemed like a good idea to have a think about what I’ve learned.
Now, there is a big preface to this: the first and most important thing I learned is that I should do some reading on how to do this well. I have not yet done that reading, so this post isn’t so much going to offer any advice as catalogue my mistakes. Mistakes that will probably look extremely silly to anyone who has any familiarity with deployment, but might be interesting to anyone who doesn’t.
A surprising number of people really don’t want to touch the command line
As a programmer who spends the vast majority of their time on the command line, I find invoking programs from there very natural. As such, I very much underestimated the obstacle that even installing Anaconda, a few packages, and cloning the source code would be, even with instructions to copy and paste.
The issue is, if anything goes wrong, there is a good chance they don’t know whether it is my code or their environment breaking, which probably means they need to contact me about it (more on environments later).
Really, I probably could have saved myself an awful lot of support by making it installable, and more still by adding a GUI to guide people through using the program.
Python is a pain
So, the first thing I learned was something I’d kind of been warned about: deploying Python code is a pain in the butt. Especially for people who aren’t familiar with Python, managing Python environments is both tricky and overwhelmingly easy to break code with. Run a Python script from the wrong environment and it is going to fail: if you are lucky, with a failure to import a module; if you are unlucky, with a cryptic error due to, say, changes between Python versions.
Speaking of Python versions, developing in 3.9, not testing in 3.7, and then telling people to install 3.7 can result in a surprising number of surprisingly difficult bugs.
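One way to turn a cryptic version-related failure into a readable one is to check the interpreter version before doing anything else. The following is only a minimal sketch: the minimum version `(3, 7)` and the function name `check_python` are assumptions for illustration, not something taken from the code described in this post.

```python
import sys

# Assumed minimum version, for illustration only.
MIN_PYTHON = (3, 7)


def check_python(version_info=None, minimum=MIN_PYTHON):
    """Return None if the interpreter is new enough, else a readable message."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if current >= minimum:
        return None
    return (
        "This program needs Python {}.{} or newer, but it is running on "
        "Python {}.{}. Are you in the right environment?".format(
            minimum[0], minimum[1], current[0], current[1]
        )
    )


if __name__ == "__main__":
    message = check_python()
    if message:
        # Exit with the message instead of a traceback further down the line.
        sys.exit(message)
```

Running the check at the top of the entry-point script means a user in the wrong environment sees a single sentence they can act on, or at least paste into an email.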
The instructions weren’t clear enough
Scientific code, I think, generally caters an awful lot to expert users: people who really understand the model and are even willing to open the source code to figure out the implementation.
My first stab at documentation managed to be unclear both to the people who didn’t want to touch the command line and to those who were willing to open the source code because they wanted to do something spicy.
So yeah, good documentation is an acquired skill.
Distributed computing is a nightmare
In principle, distribution is terrific: get a library that will allow you to reduce running arbitrary Python code on multiple nodes to a simple map-like interface. On big clusters, like the ones a lot of scientists use, this can mean speed-ups of anywhere from 10 to even 1000 times.
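To make "a simple map-like interface" concrete, here is a small sketch using the standard library's `concurrent.futures` on a single machine. It is a stand-in, not the cluster setup discussed here: a distributed library would spread the same kind of call over many nodes. The function `simulate` and its squaring "workload" are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(parameter):
    """Hypothetical stand-in for one expensive, independent model run."""
    return parameter ** 2


def run_all(parameters, max_workers=4):
    """Apply simulate to every parameter through a map-like interface."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(simulate, parameters))


if __name__ == "__main__":
    print(run_all([1, 2, 3, 4]))
```

The appeal is that the calling code never changes shape: only the object providing `map` does. The pain described below comes from everything that has to be configured behind that object.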
The only problem is, everyone’s cluster is a special snowflake, and you can’t access most of them to fix things. This can make iteration with a non-programmer painfully slow.
Libraries don’t help as much as I’d have thought either: indeed, my experience of Dask and Dask Jobqueue has been a consistently uphill battle. Between the fact that my workload likes individual nodes sharing lots of memory and a few CPUs, and some truly arcane errors (one that broke in the msgpack code), I have generally considered (and even started) writing my own code to do this.
Active development doesn’t reach people
Code that is being updated several times a day with bug fixes can be great – but if people aren’t pulling and installing it, no-one is going to benefit. I’m seriously tempted to write some code to either auto-update on running or at least let folk know it has been updated.
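The "let folk know" half of that idea could look something like the sketch below. It only covers comparing two version strings; how the latest version would be fetched (a tag on the repository, a file on a server) is deliberately left out, and the names `parse_version` and `update_notice` are hypothetical.

```python
def parse_version(text):
    """Turn a simple dotted version such as 'v1.10.0' into (1, 10, 0)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def update_notice(installed, latest):
    """Return a message if 'latest' is newer than 'installed', else None."""
    if parse_version(latest) > parse_version(installed):
        return (
            "A newer version ({}) is available; you are running {}. "
            "Please pull and reinstall.".format(latest, installed)
        )
    return None


if __name__ == "__main__":
    # Hypothetical values; a real check would look the latest version up.
    notice = update_notice("1.2.0", "1.10.0")
    if notice:
        print(notice)
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.10.0" sorts before "1.2.0", which would hide exactly the updates the notice is meant to announce.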
Summary
In summary, a lot went wrong in my first stab at this. I have very much come to appreciate that good deployment is an art form, and I’ve got an awful lot of reading to do. In particular, the above problem areas really have eaten a lot of time that probably could have been used doing actual science with the code, so there is a good incentive to get it right.
New search features for the Structural Antibody Database (SAbDab)
Since its original publication in 2013, we have added several advanced search features to the Structural Antibody Database. This post aims to give an overview of some of these features.
Antibodies for gut or bad
Over the last two decades, there has been mounting evidence of the role of the gut microbiome (the collection of microorganisms in the GI tract) in metabolic disorder (Fan and Pedersen 2021) and more recently, in psychiatric illness (Morais, Schreiber, and Mazmanian 2021). The maintenance of the equilibrium of commensal bacteria and their proper compartmentalization and stratification in the gut is critical for health.
There are diverse factors regulating microbiota composition (microbiota homeostasis) (Macpherson and McCoy 2013). I am principally interested in the role of antibodies – the idea that antibodies participate in this process is controversial (Kubinak and Round 2016) because of the difficulty of controlling for the multiple confounding environmental variables that influence the microbiome, but there are theories as to how this happens. The process by which antibodies shape the microbiota was dubbed “antibody-mediated immunoselection” (AMIS) by Kubinak and Round (2016).
Former OPIGlets – where are they now?
Since OPIG began in 2003, 53 students* have managed to escape. But where are these glorious people now? I decided to find out, using my best detective skills (aka LinkedIn, Google and Twitter).
* I’m only including full members who have left the group, as per the former members list on the OPIG website
Where are they?
Firstly, the countries. OPIGlets are mostly still residing in the UK, primarily in the ‘golden triangle’ of London, Oxford and Cambridge. The US comes in second, followed closely by Germany. (Note: one former OPIGlet is in Malta, which is too small to be recognised in Geopandas, so just imagine it is shown on the world map below.)
2021 likely to be a bumper year for therapeutic antibodies entering clinical trials; massive increase in new targets
Earlier this month the World Health Organisation (WHO) released Proposed International Nonproprietary Name List 125 (PL125), comprising the therapeutics entering clinical trials during the first half of 2021. We have just added this data to our Therapeutic Structural Antibody Database (Thera-SAbDab), bringing the total number of therapeutic antibodies recognised by the WHO to 711.
This is up from 651 at the end of 2020, a year which saw 89 new therapeutic antibodies introduced to the clinic. This rise of 60 in just the first half of 2021 bodes well for a record-breaking year of therapeutics entering trials.
A Smattering of Olympic Trivia!
Tokyo 2020 is now firmly in our rearview mirror, and I for one will be sad to be deprived of the opportunity to wake up at 4AM to passionately cheer on someone I’ve never heard of in an event I know nothing about as they go for Gold. The heyday of amateurism in the Olympics may be long gone, but it’s never been better for the amateur fan, with 24/7, on-demand coverage, unprecedented access to the athletes via social media, and remote working offering the opportunity to watch the games on a second screen without worrying about one’s boss noticing (not that I would ever engage in such an irresponsible practice, in case my Supervisor is reading this…).
To indulge both my post-Olympics melancholy and my addiction to sports trivia, I’ve trawled the internet to find some interesting factoids related to the Summer Games and present them below for your mild enjoyment:
A handful of lesser known python libraries
There are more python libraries than you can shake a stick at, but here are a handful that don’t get much love and may save you some brain power, compute time or both.
Fire is a library which turns your normal python functions into command-line utilities without requiring more than a couple of additional lines of copy-and-paste code. Being able to immediately access your functions from the command line is amazingly helpful when you’re making quick and dirty utilities and saves needing to reach for the nuclear approach of using getopt.