Category Archives: Small Molecules

Watch out when using PDBbind!

Now that PDBbind 2020 has been released, I want to draw some attention to an issue with using the SDF files that are supplied in the PDBbind refined set 2020.

Normally, SDF files save the chirality information of compounds in the atom block of the file, which is shown below as a snippet of the full SDF file for the ligand of PDB entry 4qsv. The column that defines chirality is marked in red.
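To make the column concrete, here is a minimal sketch of how that field can be read with plain Python. It assumes the column in question is the atom stereo parity field of a V2000 atom line (characters 40 to 42), where 0 means no parity is recorded. The three atom lines are made-up placeholders that only mimic the layout; they are not taken from the 4qsv ligand file, and the function names are hypothetical.

```python
# Illustrative V2000 atom lines (NOT from the 4qsv ligand; coordinates are invented).
# Layout: x, y, z (10 chars each), a space, element symbol (3 chars),
# mass difference (2), charge (3), atom stereo parity (3), further flags.
ATOM_LINES = [
    "    1.2340   -0.5670    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.4510    0.1230    0.3400 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.0120    0.2210   -0.1500 O   0  0  0  0  0  0  0  0  0  0  0  0",
]


def atom_symbol(atom_line):
    """Return the element symbol stored in characters 32-34 of an atom line."""
    return atom_line[31:34].strip()


def atom_parity(atom_line):
    """Return the stereo parity field (characters 40-42) as an integer.

    0 = not stereo, 1 = odd, 2 = even, 3 = either or unmarked.
    """
    return int(atom_line[39:42])


def has_explicit_parity(atom_lines):
    """True if at least one atom carries a non-zero parity flag."""
    return any(atom_parity(line) != 0 for line in atom_lines)


for line in ATOM_LINES:
    print(atom_symbol(line), atom_parity(line))

print("explicit parity present:", has_explicit_parity(ATOM_LINES))
```

Fixed-width slicing is used rather than `str.split()` because neighbouring fields in an atom line can run together without a separating space.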

As you can see, all columns shown here are 0. The SDF files supplied by PDBbind for some reason do NOT encode chirality information explicitly. This will be a problem when using RDKit to read the molecule and transform it into a SMILES string. By using the following commands to read the ligand for 4qsv from PDBbind 2020 and write a SMILES string, we get:

Continue reading

Out-of-distribution generalisation and scaffold splitting in molecular property prediction

The ability to successfully apply previously acquired knowledge to novel and unfamiliar situations is one of the main hallmarks of successful learning and general intelligence. This capability to effectively generalise is amongst the most desirable properties a prediction model (or a mind, for that matter) can have.

In supervised machine learning, the standard way to evaluate the generalisation power of a prediction model for a given task is to randomly split the whole available data set X into two sets: a training set X_{\text{train}} and a test set X_{\text{test}}. The model is then trained on the examples in the training set X_{\text{train}}, and afterwards its prediction abilities are measured on the untouched examples in the test set X_{\text{test}} via a suitable performance metric.
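As a rough illustration of the random split described above, the following sketch shuffles a data set and cuts it into X_{\text{train}} and X_{\text{test}}. The function name, the 20% test fraction, and the fixed seed are assumptions chosen for the example, not values from the post.

```python
import random


def random_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(examples)       # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Ten placeholder "molecules", identified only by an index.
X = list(range(10))
X_train, X_test = random_split(X)

print("train:", sorted(X_train))
print("test: ", sorted(X_test))
```

Every example lands in exactly one of the two sets, so the model never sees a test example during training. The split says nothing, however, about how similar the test examples are to the training examples.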

Since in this scenario the model has never seen any of the examples in X_{\text{test}} during training, its performance on X_{\text{test}} must be indicative of its performance on novel data X_{\text{new}} which it will encounter in the future. Right?

Continue reading

Automated intermolecular interaction detection using the ODDT Python Module

Detecting intermolecular interactions is often one of the first steps when assessing the binding mode of a ligand. This usually involves the human researcher opening up a molecular viewer and checking the orientations of the ligand and protein functional groups, sometimes aided by the viewer’s own interaction-detection functionality. For looking at single-digit numbers of structures, this approach works fairly well, especially as more experienced researchers can spot cases where the automated interaction detection has failed. When analysing tens or hundreds of binding sites, however, an automated way of detecting and recording interaction information for downstream processing is needed. When I had to do this recently, I used an open-source Python module called ODDT (Open Drug Discovery Toolkit; its full documentation can be found here).

My use case was fairly standard: starting with a list of holo protein structures as .pdb files and their corresponding ligands in .sdf format, I wanted to detect any hydrogen bonds between a ligand and its native protein crystal structure. Specifically, I needed the number and name of the interacting residue, its chain ID, and the name of the protein atom involved in the interaction. A general example of how to do this can be found in the ODDT documentation. Below, I show how I have used the code on PDB structure 1a9u.

Continue reading

ORDER!: Returning bond order information to your docked poses

Common docking software, such as AutoDock Vina or AutoDock 4, requires the ligand and receptor files to be converted into the PDBQT format. Once a correct pose has been identified, that pose is also written out as a .pdbqt file.

Continue reading

Calculating symmetrised small molecule RMSDs using graph automorphisms in Python with GEMMI and NetworkX

When a ring flips, how do we calculate RMSD?

This surprisingly simple question leads to a very interesting problem! If we take a benzene molecule, say, and rotate it 180 degrees, then we have the exact same molecule, but if we have a data structure in which our atoms are labelled, and we apply the same transformation to the atomic positions, the numbering does not reflect that symmetry. If we were then naively to calculate the RMSD it would be huge, despite the fact that the molecule is, chemically speaking, identical.

How can we make our RMSD calculations reflect these symmetries?
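One way to picture the idea is the sketch below, which takes the minimum RMSD over every relabelling of the atoms that preserves the bonding pattern. It is an illustration under simplifying assumptions, not the method from the full post: the molecule is an idealised six-membered ring of identical atoms, so its automorphisms (six rotations and six reflections of the labels) are written out by hand rather than found by a graph search, and no structural alignment is performed. All function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Plain RMSD between two equally long lists of (x, y, z) tuples."""
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def ring_automorphisms(n):
    """All 2n label permutations that map an n-membered ring onto itself."""
    perms = []
    for shift in range(n):
        perms.append([(i + shift) % n for i in range(n)])   # rotations
        perms.append([(shift - i) % n for i in range(n)])   # reflections
    return perms


def symmetric_rmsd(reference, pose, automorphisms):
    """Smallest RMSD over all symmetry-equivalent atom labellings of the pose."""
    return min(
        rmsd(reference, [pose[j] for j in perm]) for perm in automorphisms
    )


# Idealised planar ring with a 1.4 Angstrom radius; atom k sits at 60*k degrees.
ring = [
    (1.4 * math.cos(math.radians(60 * k)), 1.4 * math.sin(math.radians(60 * k)), 0.0)
    for k in range(6)
]

# Rotate by 180 degrees about the z axis: (x, y, z) -> (-x, -y, z).
flipped = [(-x, -y, z) for x, y, z in ring]

naive = rmsd(ring, flipped)
corrected = symmetric_rmsd(ring, flipped, ring_automorphisms(6))
print(f"naive RMSD:     {naive:.3f}")
print(f"symmetric RMSD: {corrected:.3f}")
```

In the naive calculation every atom appears to have moved across the ring, while the symmetry-aware value is zero to within floating-point error. For a general molecule the list of automorphisms would have to come from the molecular graph itself.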

Continue reading

Fragment-to-Lead Successes in 2019

In this blog post, I want to highlight the excellent work by Jahnke and collaborators. For the past five years, they have published an annual perspective covering fragment-to-lead success stories from the previous year. Very helpfully, their work includes a table detailing the hit fragment(s) and lead molecule, together with key experimental results and parameters.

Continue reading

Curious About the Origins of Computerized Molecules? Free Webinar Dec 22…

After the stunning announcement at CASP14 that DeepMind’s AlphaFold 2 had successfully predicted the structures of proteins from their sequence alone, it’s hard to believe we began this journey by representing molecules with punched cards.

Image of a punched card, showing 80 columns and 12 rows, with particular rectangular holes representing the 1 bits of binary numbers. The upper right corner is cut at an angle, to facilitate feeding the card into a punched card reader. The column numbers are printed along the bottom. The words “IBM UNITED KINGDOM LIMITED” are printed along the very bottom. This card is line 12 from a Fortran program, “12 PIFRA=(A(JB,37)-A(JB,99))/A(JB,47) PUX 0430”. Image Credit: Pete Birkinshaw, Manchester, U.K. CC BY 2.0

Tales of carrying stacks of punched cards to the computer centre, with a line drawn diagonally on the side of the stack to help put them back in order should you trip and fall, seem like another universe, but this is what passed for the human-computer interface in much of the mid-20th century.

Continue reading

Understanding Conformational Entropy in Small Molecules

While entropy is a major driving force in many chemical changes and is a key component of the free energy of a molecule, it can be challenging to calculate with standard quantum thermochemical methods. For flexible molecules, the total entropy can be broken down into different components, including vibrational, translational, rotational and conformational entropy. The calculation of conformational entropy is the most time-consuming, as we have to sample all thermally-accessible conformers. Here, we attempt to understand the components that contribute to the conformational entropy of a molecule, and develop a physically-motivated statistical model to rapidly predict the conformational entropies of small molecules.
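To give a feel for the quantity involved, the sketch below computes a conformational entropy from a list of conformer energies using Boltzmann populations and the Gibbs entropy formula, S = -R Σ p_i ln p_i. This is one common definition, assumed here for illustration; it is not the statistical model developed in the post, and the function names and example energies are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol K)


def boltzmann_populations(rel_energies_kj_mol, temperature=298.15):
    """Normalised Boltzmann weights for conformers with relative energies in kJ/mol."""
    rt = R * temperature / 1000.0  # RT in kJ/mol
    weights = [math.exp(-energy / rt) for energy in rel_energies_kj_mol]
    total = sum(weights)
    return [w / total for w in weights]


def conformational_entropy(rel_energies_kj_mol, temperature=298.15):
    """Gibbs entropy of the conformer populations, in J/(mol K)."""
    populations = boltzmann_populations(rel_energies_kj_mol, temperature)
    return -R * sum(p * math.log(p) for p in populations if p > 0.0)


# Made-up relative energies (kJ/mol) for three conformers of a flexible molecule.
energies = [0.0, 2.0, 5.0]
print([round(p, 3) for p in boltzmann_populations(energies)])
print(round(conformational_entropy(energies), 3), "J/(mol K)")
```

Two equally populated conformers give R ln 2, and a single conformer gives zero. The expensive part in practice is finding all thermally accessible conformers and their energies, which this sketch takes as given.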

Continue reading