Category Archives: Molecular Design

The “AI-ntibody” Competition: benchmarking in silico antibody screening/design

We recently contributed to a communication in Nature Biotechnology detailing an upcoming competition coordinated by Specifica to evaluate the relative performance of in vitro display and in silico methods at identifying target-specific antibody binders and performing downstream antibody candidate optimisation.

Following in the footsteps of tournaments such as the Critical Assessment of Structure Prediction (CASP), which have led to substantial breakthroughs in computational methods for biomolecular structure prediction, the AI-ntibody initiative seeks to establish a periodic benchmarking exercise for in silico antibody discovery/design methods. It should help to identify the most significant breakthroughs in the space and orient future methods’ development.

Continue reading →

Out of the box RDKit-valid is an imperfect metric: a review of the KekulizeException and nitrogen protonation to correct this

In deep learning based compound generation models the metric of fraction of RDKit-valid compounds is ubiquitous, but is problematic from the cheminformatics viewpoint as a large fraction may be driven by pyrrolic nitrogens (see below) rather than Texas carbons (carbon with 5 bonds like the Star of Texas). In RDKit, no error is more irksome that the KekulizeException or ValenceException from RDKit sanitisation. These are raised when the molecule is not correct. This would make the RDKit-valid a good metric, except for a small detail: the validity is as interpreted from the the stated implicit and explicit hydrogens and formal charges on the atoms, which most models do not assign. Therefore, a compound may not be RDKit-valid because it is actually impossible, like a Texas carbon, but in many cases it is because the formal charge or implicit hydrogen numbers of some atoms are incorrect. In both case, the major culprit is nitrogen. Herein I go through what they are and how to fix them, with a focus on aromatic nitrogens.

Continue reading →

Fine-tune generated molecular poses with a force field

Some molecular pose generation methods benefit from an energy relaxation post-processing step.

Predicted pose before energy minimization — Example of a small molecule pose before and after energy minimization. The pose before minimization is shown in white, the optimized prediction is shown in pink, and a crystal pose is shown as reference in light blue. Note how the aromatic rings are flattened and the leftmost bond is shortened by the optimization.

Here is a quick way to do this using OpenMM via a short script I prepared:

Continue reading →

Conference Summary: MGMS Adaptive Immune Receptors Meeting 2024

On 5th April 2024, over 60 researchers braved the train strikes and gusty weather to gather at Lady Margaret Hall in Oxford and engage in a day full of scientific talks, posters and discussions on the topic of adaptive immune receptor (AIR) analysis!

Continue reading →

A Simple Way to Quantify the Similarity Between Two Sets of Molecules

When designing machine learning algorithms with the aim of accelerating the discovery of novel and more effective therapeutics, we often care deeply about their ability to generalise to new regions of chemical space and accurately predict the properties of molecules that are structurally or functionally dissimilar to the ones we have already explored. To evaluate the performance of algorithms in such an out-of-distribution setting, it is essential that we are able to quantify the data shift that is induced by the train-test splits that we rely on to decide which model to deploy in production.

For our recent ICML 2023 paper Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions, we chose to quantify the distributional similarity between two sets of molecules through the Maximum Mean Discrepancy (MMD).

Continue reading →

A simple criterion can conceal a multitude of chemical and structural sins

We’ve been investigating deep learning-based protein-ligand docking methods which often claim to be able to generate ligand binding modes within 2Å RMSD of the experimental one. We found, however, this simple criterion can conceal a multitude of chemical and structural sins…

X-ray crystal structure of ligand in PDB ID 1t9b.

Intertwined rings of the ligand from 1t9b.

DeepDock attempted to generate the ligand binding mode from PDB ID 1t9b
(light blue carbons, left), but gave pretzeled rings instead (white carbons, right).

Continue reading →

What can you do with the OPIG Immunoinformatics Suite? v3.0

OPIG’s growing immunoinformatics team continues to develop and openly distribute a wide variety of databases and software packages for antibody/nanobody/T-cell receptor analysis. Below is a summary of all the latest updates (follows on from v1.0 and v2.0).

Continue reading →

9th Joint Sheffield Conference on Cheminformatics

Over the next few days, researchers from around the world will be gathering in Sheffield for the 9th Joint Sheffield Conference on Cheminformatics. As one of the organizers (wearing my Molecular Graphics and Modeling Society ‘hat’), I can say we have an exciting array of speakers and sessions:

De Novo Design
Open Science
Chemical Space
Physics-based Modelling
Machine Learning
Property Prediction
Virtual Screening
Case Studies
Molecular Representations

It has traditionally taken place every three years, but despite the global pandemic it is returning this year, once again in person in the excellent conference facilities at The Edge. You can download the full programme in iCal format, and here is the conference calendar:

Continue reading →

The State of Computational Protein Design

Last month, I had the privilege to attend the Keystone Symposium on Computational Design and Modeling of Biomolecules in beautiful Banff, Canada. This conference gave an incredible insight into the current state of the protein design field, as we are on the precipice of advances catalyzed by deep learning.

Here are my key takeaways from the conference:

Continue reading →

Be a computational chemist and you must be a jack of all trades

Being a jack of all trades brings to mind someone who has extensive multidisciplinary expertise and is equipped with many tools in their toolbox to solve different problems. A jack of all trades is a great succinct description for computational chemists in drug discovery.

Recently I had a great conversation with Dr. Arjun Narayanan, a Senior Research Scientist at Vertex Pharmaceuticals and a jack of all trades as a computational chemist. In this blog post, I’ll describe what he does as a computational chemist, the problems he solves, and the new tools he’s looking forward to adding to his toolbox.

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Category Archives: Molecular Design

The “AI-ntibody” Competition: benchmarking in silico antibody screening/design

Out of the box RDKit-valid is an imperfect metric: a review of the KekulizeException and nitrogen protonation to correct this

Fine-tune generated molecular poses with a force field

Conference Summary: MGMS Adaptive Immune Receptors Meeting 2024

A Simple Way to Quantify the Similarity Between Two Sets of Molecules

A simple criterion can conceal a multitude of chemical and structural sins

What can you do with the OPIG Immunoinformatics Suite? v3.0

9th Joint Sheffield Conference on Cheminformatics

The State of Computational Protein Design

Be a computational chemist and you must be a jack of all trades