I recently spoke at the Festival of Biologics 2021 conference in Basel (in-person, just in time!), and was lucky enough to be offered the chance to chair a session of talks. As this was the first time I’d ever been asked to do this, I asked Charlotte for some hints to make things go more smoothly. I found her advice very useful, so I thought I’d share it here for other first-time “chairers”!
Continue reading

New review on BCR/antibody repertoire analysis out in MAbs!
In our latest immunoinformatics review, OPIG has teamed up with experienced antibody consultant Dr. Anthony Rees to outline the evidence for BCR/antibody repertoire convergence on common epitopes post-pathogen exposure, and all the ways we can go about detecting it from repertoire gene sequencing data. We highlight the new advances in the repertoire functional analysis field, including the role for OPIG’s latest tools for structure-aware antibody analytics: Structural Annotation of AntiBody repertoires+ (SAAB+), Paratyping, Ab-Ligity, Repertoire Structural Profiling & Structural Profiling of Antibodies to Cluster by Epitope (‘SPACE’).
Continue reading

A logical brain teaser to derail your afternoon
Brain teasers have a strange power. For many they evoke nothing more than a mild and transient sense of curiosity. But for a certain subset of people they create an irresistible intellectual temptation, one that at times needs to be actively avoided so as not to completely derail conversations and take over whole afternoons.
For better or worse, I am in the camp of people who are highly susceptible to brain teasers. I just love them too much. More than once in my lifetime I have had to ask a friend not to tell me about a particular brain teaser they had heard, because I knew it would inevitably take over my mind and send me down an almost hypnotic spiral of thoughts from which the only escape would be finding the solution.
While brain teasers can admittedly turn into ridiculously powerful distractions for some of us, they are not necessarily a waste of time. They have high recreational value and help the mind to enter a playful and creative state. They serve as mental gymnastics that directly train logical thinking skills, and logical thinking is arguably one of the most powerful transferable skills there is. And last but not least, brain teasers are commonly used nowadays in job interviews at some of the world’s top employers (Google, Facebook, Microsoft, prestigious hedge funds, …).
In this post, I will present one of my favourite brain teasers to see if I can get you hooked. It is a slightly modified and self-contained version of the so-called pirate game. You can find the solution at the end of the page. Enjoy responsibly!
Continue reading

Benchmarks in De Novo Drug Design
I recently came across a review, “De novo molecular drug design benchmarking” by Lauren L. Grant and Clarissa S. Sit, in which they highlight recently proposed benchmarking methods including Fréchet ChemNet Distance [1], GuacaMol [2], and Molecular Sets (MOSES) [3], together with their current and potential future applications, as well as the next steps for validating benchmarking methods [4].
From this review, I particularly wanted to note the issues with current benchmarking methods and the points we should be aware of when using them to benchmark our own de novo molecular design methods. Goal-directed models are de novo molecular design methods that optimize for a particular scoring function [2].
Continue reading

An A-Z of Oxford
The 2021/22 academic year is now well underway in Oxford, which means a fresh batch of new students getting to grips with some of the bewildering terminology employed here, as well as prospective applicants for next year trying to figure out what on earth a college is and which one they should apply to. As a wizened final-year DPhil student, I decided to compile an A-Z of Oxford-related terms in the hope that someone might find it useful.
A – Ashmolean Museum
Britain’s first public museum, established all the way back in 1678. Home to exhibits covering Ancient Egypt to Modern Art and everything in between.

B – Battels
A termly bill students receive from their college which might cover things like charges for food and accommodation, or fines for not returning books to the library on time.
C – College
The 39 colleges are small educational institutions which together comprise the University of Oxford. Every student is a member of a college, each of which has its own set of facilities, including a dining hall, bar, library and student accommodation. Colleges also have their own student unions, called the Junior Common Room (for undergraduates) and Middle Common Room (for postgraduates), which are excellent places to socialise and meet people studying lots of different subjects.

Using normalized SuCOS scores.
If you are working in cheminformatics or utilise protein-ligand docking, then you should be aware of the SuCOS score, an open-source shape and chemical feature overlap metric designed by a former member of OPIG: Susan Leung.
The metric compares the 3D conformers of two ligands based on their shape overlap as well as their chemical feature overlap using the RDKit toolkit. Leung et al. show that SuCOS is able to select fewer false positives and false negatives when doing re-docking studies than other scoring metrics such as RMSD or Protein Ligand Interaction Fingerprints (PLIF) similarity scores and performs better at differentiating actives from decoys when tested on the DUD-E dataset.
Most importantly, SuCOS was designed with fragment-based drug discovery in mind, where a smaller fragment ligand is elaborated or combined with other fragments to create a larger molecule with, hopefully, stronger binding affinity. Unlike RMSD, for example, SuCOS can quickly calculate an overlap score between a small fragment and a larger molecule, giving chemists an idea of how the fragment elaboration might interact with the protein. However, the original SuCOS algorithm was not normalized and could produce scores greater than 1 in some cases.
I’ve uploaded a normalised version of the original SuCOS algorithm as a GitHub fork of Susan’s original repository. You can find the normalised SuCOS algorithm here.
Hopefully this is helpful for anyone using the SuCOS algorithm and for all docking enthusiasts who are interested in an alternative way to evaluate their docked poses.
An idea by any other name would smell as sweet.
A blog post about ideas.
Ideation is the formation of an idea, but how do we ideate?
The root of the word is “to see”, so when we have an idea we see something. In that moment of realization, we hold on to something quite abstract. Some describe it as a click or pattern or insight. This “seeing” is with the mind, however, not the eyes. Idea also implies sentiment or direction – a path one might say. It’s this last point that resonates with me most. When we are lost, in the sea of thoughts, most of the time the consequences are immediate (no consciousness required). However, sometimes we must pause and ideate. Our path, the next step, is unclear.
Continue reading

Monty Python
Every now and then I decide to overthink a problem I thought I understood and get confused – last week, it was the Monty Hall problem.
For those unfamiliar with the thought experiment, the basic premise is that you are on a game show and are presented with three doors. Behind one of the doors is a car, while behind the other two are goats.
With zero initial information, you make a guess as to which door you think the car is behind (we assume you have enough goats already). Before looking behind your chosen door, the host opens one of the remaining two doors and reveals a goat. The host then asks you if you would like to change your guess. What should you do?
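If you would rather check your intuition empirically than argue about it, the game described above can be simulated in a few lines. This is only a minimal sketch, and it rests on assumptions the setup leaves implicit: the host knows where the car is, always opens a goat door that you did not pick, and chooses uniformly at random when two such doors are available. The function names `play_round` and `win_rate` are made up for illustration.

```python
import random


def play_round(switch, rng):
    """Play one game; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    first_pick = rng.choice(doors)
    # Assumption: the host never opens the contestant's door or the car door.
    host_opens = rng.choice([d for d in doors if d != first_pick and d != car])
    if switch:
        # Exactly one door is neither the first pick nor the opened door.
        final_pick = next(d for d in doors if d not in (first_pick, host_opens))
    else:
        final_pick = first_pick
    return final_pick == car


def win_rate(switch, n_rounds=100_000, seed=0):
    """Estimate the probability of winning for a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play_round(switch, rng) for _ in range(n_rounds))
    return wins / n_rounds


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Running the script prints an estimated win rate for each strategy. Changing the rule the host uses to pick a door (for example, letting the host open a door at random, car included) is a one-line edit and a good way to see how much the answer depends on that assumption.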
Continue reading

Getting the PDB structures of compounds in ChEMBL
Recently I was dealing with a set of compounds with known target activities from the ChEMBL database, and I wanted to find out which of them also had PDB crystal structures in complex with that target.
Cross-referencing this manually is easy when we are interested in only 2-3 compounds, but for any larger number, using the ChEMBL and PDB web services greatly reduces the number of clicks.
Continue reading

Issues with graph neural networks: the cracks are where the light shines through
Deep convolutional neural networks have led to astonishing breakthroughs in the area of computer vision in recent years. The reason for the extraordinary performance of convolutional architectures in the image domain is their strong ability to extract informative high-level features from visual data. For prediction tasks on images, this has led to superhuman performance in a variety of applications and to an almost universal shift from classical feature engineering to differentiable feature learning.
Unfortunately, the picture is not quite as rosy yet in the area of molecular machine learning. Feature learning techniques which operate directly on raw molecular graphs without intermediate feature-engineering steps have only emerged in the last few years in the form of graph neural networks (GNNs). GNNs, however, still have not managed to definitively outcompete and replace more classical non-differentiable molecular representation methods such as extended-connectivity fingerprints (ECFPs). There is an increasing awareness in the computational chemistry community that GNNs have not quite lived up to the initial hype and still suffer from a number of technical limitations.
Continue reading