Category Archives: Commentary

Coarse-grained models of antibody solutions

Coarse-grained (CG) models have become increasingly common in studies of antibody-antibody interactions in solution. These models appear poised to enter development pipelines in the near future to help predict and understand how antibody-antibody interactions influence the suitability of a given monoclonal antibody (mAb) for mass production and delivery as an antibody therapy. This blog post is a non-exhaustive summary of some of the highlights I found during a recent literature search.


Musings on Digital Nomaddery from Seoul

The languorous, muggy heat of the Korean afternoon sun was what greeted me after a 13-hour cattle-class flight from a cool, sensible Helsinki night. The goings-on in Ukraine, and the associated political turmoil, meant taking the scenic route – avoiding Russia and instead passing over Turkey, Kazakhstan and Mongolia – with legs contorted into unnatural positions and sleep an unattainable dream. Tired and disoriented, I relied less on Anna’s expert knowledge of the Korean language than on her patience for my jet-lag-induced bad mood and brain fog. We waited an hour for a bus to take us from Incheon airport to Yongsan central station in the heart of the capital. It was 35 °C.

I’ve been here for a month. Anna has found work, starting in November; I have found the need to modify my working habits. Gone are the comfortable, temperate offices on St Giles’, replaced by an ever-changing diorama of cafés, hotel rooms and libraries. Lugging around my enormous HP Pavilion, known affectionately by some as ‘The Dominator’, proved to be unsustainable.

It’s thesis-writing time for me, so any programming I do is just tinkering and tweaking and fixing the litany of bugs that Lucy Vost has so diligently exposed. I had planned to run Ubuntu on Parallels using my MacBook Air, but discovered to my dismay that a multitude of Conda packages, including PyTorch, are not supported on Apple’s M1 chip. That plan has been replaced by a combination of Anna’s old Intel MacBook Pro and a rewrite of my codebase to install and run without a GPU – adversity is the great innovator, as the saying goes.
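For the curious, the GPU-free rewrite mostly came down to making the code device-agnostic. A minimal sketch of the pattern in PyTorch – illustrative only, not my actual codebase:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU,
# so the same script runs on a CUDA workstation and a MacBook alike.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(128, 2).to(device)
batch = torch.randn(32, 128, device=device)

with torch.no_grad():
    logits = model(batch)  # runs wherever `device` points

print(logits.shape, device)
```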


Tidbits from YouGov Polls

Some recent verdicts from the British public, via YouGov polls:

  • The Queen (97%) is less well known than her husband Prince Philip (98%)
  • Liz Truss’ UK popularity rating (21%) is lower than George W. Bush’s (22%)
  • The most popular British dish is ‘Chips’ (84%) followed by ‘Fish and Chips’ (83%)
  • Oxford (55%) is a less popular university than Cambridge (58%)

So much for Aristotle’s ‘wisdom of crowds’!

Tackling horizontal and vertical limitations

A blog post about reviewing papers and preparing papers for publication.

We start with the following premise: all papers have limitations; there is not a single paper without them. A method may not be generally applicable, a result may not be completely justified by the data, or a theory may make restrictive assumptions. To cover every limitation would make a paper infinitely long, so we must stop somewhere.

A lot of limitations fall into the following scenario: the results or methods are presented, but they could have been extended in some way. Suppose we obtain results on a particular cell type using an immortalized cell line. Would the results still hold if we performed the experiments on primary or patient-derived cells? If the signal from the original cells was sufficiently robust, then we would hope so; however, we cannot be one hundred percent sure. A similar example is a method that can be applied to a certain type of data. It may be possible to extend the method to other data types, but this may require some new methodology.

I call this flavor of limitations vertical limitations. They are vertical in the sense that they build upon an already developed result in the manuscript. Certain journals will require that you tackle vertical limitations by adapting the original idea or method, to demonstrate broad appeal or to show that the idea could permeate multiple fields. Most of the time, however, the premise of an approach is not to keep extending it. It works. Leave it alone. Do not ask for more. An idea done well does not need more.


Sharing Data Responsibly: The FAIR Principles

So you’ve submitted your paper, made your code publicly available, and maybe even provided documentation to ensure somebody can reproduce your work. But what about the data your work is based on? Is that readily available to your readers, too?

Maybe it’s too large to put on GitHub alongside your code. Maybe it’s sensitive, or subject to GDPR restrictions, so you can’t just stick a download link on your website. Maybe it’s in a proprietary format that needs non-open software to read. There are many reasons sharing data can be less straightforward than sharing code, and often it’s not entirely clear what ‘best practices’ are for a given situation. Data management is a complicated topic, and to do it justice would require far more than a quick blog post. Instead, I’d like to focus on a single source of guidance that serves as a useful starting point for thinking about responsible data management: the FAIR principles (Findable, Accessible, Interoperable, Reusable).
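As a taster of what the principles ask for in practice, here is a minimal, purely illustrative metadata record – the field names are my own shorthand, not taken from any particular standard:

```python
import json

# Illustrative only: a FAIR-minded metadata record for a dataset.
# Field names are indicative shorthand, not from a specific schema.
metadata = {
    "identifier": "https://doi.org/10.xxxx/example",  # persistent ID (Findable)
    "title": "Example dataset accompanying the paper",
    "license": "CC-BY-4.0",                           # clear reuse terms (Reusable)
    "format": "text/csv",                             # open format (Interoperable)
    "access": "open download via the repository landing page",  # (Accessible)
}

print(json.dumps(metadata, indent=2))
```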


CryoEM is now the dominant technique for solving antibody structures

Last year, the Structural Antibody Database (SAbDab) listed a record-breaking 894 new antibody structures, driven in no small part by researchers’ continued efforts to understand SARS-CoV-2.

Fig. 1: The aggregate growth in antibody structure data (all methods) over time. Taken from http://opig.stats.ox.ac.uk/webapps/newsabdab/sabdab/stats/ on 25th May 2022.

In this blog post I wanted to highlight the major driving force behind this curve – the huge increase in cryo-electron microscopy (cryoEM) data – and the implications of this for the field of structure-based antibody informatics.
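If you want to see the trend for yourself, here is a hedged sketch of the kind of tally involved, starting from a SAbDab-style summary table. Note that the column names (`date`, `method`) are my assumptions about the export format, so check them before use:

```python
import pandas as pd

# Hedged sketch: count antibody structures per experimental method per year.
# Assumes a SAbDab-style summary TSV with "date" and "method" columns;
# these names are guesses about the export format, not confirmed fields.
df = pd.read_csv("sabdab_summary.tsv", sep="\t")
df["year"] = pd.to_datetime(df["date"]).dt.year

counts = (
    df.groupby(["year", "method"])
      .size()
      .unstack(fill_value=0)   # one column per method (X-ray, EM, NMR, ...)
)
print(counts.tail())           # recent years: watch the EM column grow
```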


Make your code do more, with less

When you wrangle data for a living, you start to wonder why everything takes so darn long. Through five years of introspection, I have come to conclude that two simple factors limit every computational project. One is, of course, your personal productivity. Your time of focused work, minus distractions (and yes, meetings figure here), times your energy and mental acuity. All those things you have little control over, unfortunately. But the second is the productivity of your code and tools. And this, in principle, is a variable that you have full control over.

Even quick calculations, when applied to tens of millions of sequences, can take quite some time!

This is a post about how to increase your productivity by helping you navigate all those instances when the progress bar does not seem to go fast enough. I want to discuss actionable tools to make your code run faster and generate more results, with less effort, in less time – instructions to tinker less and think more, so you can do the science you truly want to be doing. And, above all, I want to give out advice so counter-intuitive that you should absolutely consider following it.
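One classic instance of “more results, less time” is replacing an interpreted Python loop with a vectorised call into compiled code. A self-contained toy example (not taken from the full post):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(10_000_000)   # pretend: one score per sequence

# Interpreted loop: one Python-level operation per element.
start = time.perf_counter()
total = 0.0
for s in scores:
    total += s * s
loop_time = time.perf_counter() - start

# Vectorised: a single call into compiled NumPy code.
start = time.perf_counter()
total_vec = float(np.sum(scores * scores))
vec_time = time.perf_counter() - start

assert np.isclose(total, total_vec)
print(f"loop: {loop_time:.2f} s, vectorised: {vec_time:.3f} s")
```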


From code to molecules: The future of chemical synthesis

In June, after I finish my PhD, I will be joining Chemify, a new startup based in Glasgow that aims to make chemical synthesis universally accessible, reproducible and fully automated using AI and robotics. Having previously written about “Why you should care about startups as a researcher” and a quick guide on “Commercialising your research: Where to start?” on this blog, I am now heading to a science-based startup fresh out of university myself.

Chemify is a spinout from the University of Glasgow originating from the group of Prof. Lee Cronin. The core of the technology is the chemical programming language χDL (pronounced “chi DL”) which, in combination with a natural language processing AI that reads and understands chemical synthesis procedures, can be used to plan and autonomously execute chemical reactions on robotic hardware. The Cronin group has also already built the modular robotic hardware needed to carry out almost any chemical reaction: the “Chemputer”. Thanks to the flexibility of both the Chemputer and the χDL language, Chemify has already shown that the applications go far beyond simple synthesis, extending to drug formulation, the discovery of new materials and the optimisation of reaction conditions.
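To give a flavour of the underlying idea – written procedures becoming structured, machine-executable steps – here is a toy sketch. This is emphatically not Chemify’s code or real χDL syntax, just a hypothetical illustration:

```python
from dataclasses import dataclass

# Hypothetical illustration only: NOT Chemify's software or real χDL syntax.
# The concept: a sentence from a synthesis procedure becomes structured data
# that robotic hardware could, in principle, execute.
@dataclass
class Add:
    reagent: str
    volume_ml: float
    vessel: str

# "Add 10 mL of ethanol to the reactor."
step = Add(reagent="ethanol", volume_ml=10.0, vessel="reactor")
print(step)
```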

Armed with this transformational software and hardware, Chemify is now fully operational and is hiring exceptional talent into its labs in Glasgow. I am eager to see how smart, AI-driven automation like Chemify’s will change how small-scale chemical synthesis, and chemical discovery more broadly, is done in the future. I’m super excited to be part of the journey.

Feeding a drove of hungry OPIGlets

In preparing for battle I have always found that plans are useless, but planning is indispensable.

Dwight D. Eisenhower

Following the previous post about OPIG retreat 2022, and having received numerous requests for recipes, I thought I’d document the process of ensuring that 24 people are kept fed and happy. Recipes at the foot of the post.

Disclaimer – these recipes are entirely my own interpretations, adapted where necessary to suit a range of dietary requirements. They are in no way authentic to any national cuisines and are not intended to be.

Disclaimer II: The Disclaiming – all measurements are approximate. I rarely write down recipes or use precise measurements. Taste as you go, and don’t be afraid to add more salt.


Women in Computing: past, present and what we can do to improve the future.

Computing is one of the few scientific fields that was once female-dominated. In the 1930s and 1940s, women made up the bulk of the workforce doing complex, tedious calculations in fields including ballistics, astrophysics, aeronautics (think Hidden Figures) and code-breaking. Engineers found that the female computers were far more reliable at such calculations than the engineers themselves [9]. As computing machines became available, there was no precedent set for the gender of a computer operator, and so the women previously doing the computing became the computer operators [10].

However, this was not to last. As computing became commercialised in the 1950s, the skill required for computing work began to be recognised. As written in [1]:

“Software company System Development Corp. (SDC) contracted psychologists William Cannon and Dallis Perry to create an aptitude assessment for optimal programmers. Cannon and Perry interviewed 1,400 engineers — 1,200 of them men — and developed a “vocational interest scale,” a personality profile to predict the best potential programmers. Unsurprisingly given their male-dominated test group, Cannon and Perry’s assessment disproportionately identified men as the ideal candidates for engineering jobs. In particular, the test tended to eliminate extroverts and people who have empathy for others. Cannon and Perry’s paper concluded that typical programmers “don’t like people,” forming today’s now pervasive stereotype of a nerdy, anti-social coder.”
