Online tools for drawing and visualizing molecules

I recently came across a nice tool for depicting multiple molecules called CDK Depict (thanks to Ruben for sending it to me), so I decided to explore what other web-based molecule visualization and drawing tools are available.

Continue reading →

Handy LaTeX syntax I Googled over the years

In an attempt to ease the transition from Word to LaTeX for some of my colleagues (*cough* Alex *cough*) this blog post covers some LaTeX tricks I use most frequently when preparing manuscripts. It’s pitched at someone who is already familiar with the basic syntax of paragraphs, figures and tables.

Continue reading →

Some useful pandas functions

Pandas is one of the most used packages for data analysis in python. The library provides functionalities that allow to perfrom complex data manipulation operations in a few lines of code. However, as the number of functions provided is huge, it is impossible to keep track of all of them. More often than we’d like to admit we end up wiriting lines and lines of code only to later on discover that the same operation can be performed with a single pandas function.

To help avoiding this problem in the future, I will run through some of my favourite pandas functions and demonstrate their use on an example data set containing information of crystal structures in the PDB.

Continue reading →

The Antibody Dictionary

Similar to getting lost in a language when moving country, you might encounter a language barrier when moving research fields. This dictionary will guide you in the complex world of immunoinformatics, with a focus on antibodies. Whether your main research will be in this field, you want to apply your machine learning model on antibodies, or you just want to understand the research performed in OPIG, this dictionary will get you started.

The Antibody Dictionary:

Affinity maturation: The optimisation process of naive antibodies to memory antibodies such that the antibody is optimised for a specific antigen.

Antibody: (immunoglobulin) a Y-shaped molecule important in the adaptive immune system. A canonical antibody consists of two identical heavy chains and two identical smaller light chains.

Continue reading →

Let your library design blosum

During the lead optimisation stage of the drug discovery pipeline, we might wish to make mutations to an initially identified binding antibody to improve properties such as developability, immunogenicity, and affinity.

There are many ways we could go about suggesting these mutations including using Large Language Models e.g. ESM and AbLang, or Inverse Folding methods e.g. ProteinMPNN and AntiFold. However, some of our recent work (soon to be pre-printed) has shown that classical non-Machine Learning approaches, such as BLOSUM, could also be worth considering at this stage.

Continue reading →

How to replace bike ball bearings when your steering sounds crunchy

Over the last few months my bicycle steering axle started freezing up, to the point where the first thing I did before getting on my bike in the morning was jerk the handlebars from side to side aggressively to loosen it up. It made atrocious guttural sounds and bangs when I did and navigating Oxford by bike was becoming more treacherous by the day as I swerved from left to right trying to wrestle my front wheel’s fork in the right direction. It was time to undertake some DIY…

Continue reading →

VHH -vs- VNAR

As one of the group’s resident nanobody enthusiasts, on the OPIG retreat this year I presented a talk on shark VNARs, their therapeutic potential and how they differ from VHHs. Here are some of the main points covered:

Continue reading →

Where do we go from here?

This is an experiment.

We will prompt a model with a piece of text and assess how well it understands the contents.

The model sits behind your eyes.

The data used to train this model is your entire life, up to and including this very moment.

Continue reading →

The workings of Fragmenstein’s RDKit neighbour-aware minimisation

Fragmenstein is a Python module that combine hits or position a derivative following given templates by being very strict in obeying them. This is done by creating a “monster”, a compound that has the atomic positions of the templates, which then reanimated by very strict energy minimisation. This is done in two steps, first in RDKit with an extracted frozen neighbourhood and then in PyRosetta within a flexible protein. The mapping for both combinations and placements are complicated, but I will focus here on a particular step the minimisation, primarily in answer to an enquiry, namely how does the RDKit minimisation work.

Continue reading →

Converting pandas DataFrames into Publication-Ready Tables

Analysing, comparing and communicating the predictive performance of machine learning models is a crucial component of any empirical research effort. Pandas, a staple in the Python data analysis stack, not only helps with the data wrangling itself, but also provides efficient solutions for data presentation. Two of its lesser-known yet incredibly useful features are df.to_markdown() and df.to_latex(), which allow for a seamless transition from DataFrames to publication-ready tables. Here’s how you can use them!

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Online tools for drawing and visualizing molecules

Handy LaTeX syntax I Googled over the years

Some useful pandas functions

The Antibody Dictionary

Let your library design blosum

How to replace bike ball bearings when your steering sounds crunchy

VHH -vs- VNAR

Where do we go from here?

The workings of Fragmenstein’s RDKit neighbour-aware minimisation

Converting pandas DataFrames into Publication-Ready Tables