I recently came across a nice tool for depicting multiple molecules called CDK Depict (thanks to Ruben for sending it to me), so I decided to explore what other web-based molecule visualization and drawing tools are available.
Continue readingCategory Archives: Small Molecules
The workings of Fragmenstein’s RDKit neighbour-aware minimisation
Fragmenstein is a Python module that combine hits or position a derivative following given templates by being very strict in obeying them. This is done by creating a “monster”, a compound that has the atomic positions of the templates, which then reanimated by very strict energy minimisation. This is done in two steps, first in RDKit with an extracted frozen neighbourhood and then in PyRosetta within a flexible protein. The mapping for both combinations and placements are complicated, but I will focus here on a particular step the minimisation, primarily in answer to an enquiry, namely how does the RDKit minimisation work.
Continue readingDemystifying the thermodynamics of ligand binding
Chemoinformatics uses a curious jumble of terms from thermodynamics, wet-lab techniques and statistical terminology, which is at its most jarring, it could be argued, in machine learning. In some datasets one often sees pIC50, pEC50, pKi and pKD, in discussion sections a medchemist may talk casually of entropy, whereas in the world of molecular mechanics everything is internal energy. Herein I hope to address some common misconceptions and unify these concepts.
Continue readingConference feedback: AI in Chemistry 2023
Last month, a drift of OPIGlets attended the royal society of chemistry’s annual AI in chemistry conference. Co-organised by the group’s very own Garrett Morris and hosted in Churchill College, Cambridge, during a heatwave (!), the two days of conference featured aspects of artificial intelligence and deep machine learning methods to applications in chemistry. The programme included a mixture of keynote talks, panel discussion, oral presentations, flash presentations, posters and opportunities for open debate, networking and discussion amongst participants from academia and industry alike.
Continue readingA Simple Way to Quantify the Similarity Between Two Sets of Molecules
When designing machine learning algorithms with the aim of accelerating the discovery of novel and more effective therapeutics, we often care deeply about their ability to generalise to new regions of chemical space and accurately predict the properties of molecules that are structurally or functionally dissimilar to the ones we have already explored. To evaluate the performance of algorithms in such an out-of-distribution setting, it is essential that we are able to quantify the data shift that is induced by the train-test splits that we rely on to decide which model to deploy in production.
For our recent ICML 2023 paper Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions, we chose to quantify the distributional similarity between two sets of molecules through the Maximum Mean Discrepancy (MMD).
Continue reading9th Joint Sheffield Conference on Cheminformatics
Over the next few days, researchers from around the world will be gathering in Sheffield for the 9th Joint Sheffield Conference on Cheminformatics. As one of the organizers (wearing my Molecular Graphics and Modeling Society ‘hat’), I can say we have an exciting array of speakers and sessions:
- De Novo Design
- Open Science
- Chemical Space
- Physics-based Modelling
- Machine Learning
- Property Prediction
- Virtual Screening
- Case Studies
- Molecular Representations
It has traditionally taken place every three years, but despite the global pandemic it is returning this year, once again in person in the excellent conference facilities at The Edge. You can download the full programme in iCal format, and here is the conference calendar:
Continue readingCustomising MCS mapping in RDKit
Finding the parts in common between two molecules appears to be a straightforward, but actually is a maze of layers. The task, maximum common substructure (MCS) searching, in RDKit is done by Chem.rdFMCS.FindMCS
, which is highly customisable with lots of presets. What if one wanted to control in minute detail if a given atom X and is a match for atom Y? There is a way and this is how.
Pairwise sequence identity and Tanimoto similarity in PDBbind
In this post I will cover how to calculate sequence identity and Tanimoto similarity between any pairs of complexes in PDBbind 2020. I used RDKit in python for Tanimoto similarity and the MMseqs2 software for sequence identity calculations.
A few weeks back I wanted to cluster the protein-ligand complexes in PDBbind 2020, but to achieve this I first needed to precompute the sequence identity between all pairs sequences in PDBbind, and Tanimoto similarity between all pairs of ligands. PDBbind 2020 includes 19.443 complexes but there are much fewer distinct ligands and proteins than that. However, I kept things simple and calculated the similarities for all 19.443*19.443 pairs. Calculating the Tanimoto similarity is relatively easy thanks to the BulkTanimotoSimilarity function in RDKit. The following code should do the trick:
from rdkit.Chem import AllChem, MolFromMol2File from rdkit.DataStructs import BulkTanimotoSimilarity import numpy as np import os fps = [] for pdb in pdbs: mol = MolFromMol2File(os.path.join('data', pdb, f'{pdb}_ligand.mol2')) fps.append(AllChem.GetMorganFingerprint(mol, 3)) sims = [] for i in range(len(fps)): sims.append(BulkTanimotoSimilarity(fps[i],fps)) arr = np.array(sims) np.savez_compressed('data/tanimoto_similarity.npz', arr)
Sequence identity calculations in python with Biopandas turned out to be too slow for this amount of data so I used the ultra fast MMseqs2. The first step to running MMseqs2 is to create a .fasta file of all the sequences, which I call QUERY.fasta. This is what the first few lines look like:
Continue readingUseful small molecules blogs
I thought I’d share a list of some of the other blogs that have helped me during my PhD so far and may be useful to new starters (or those who may not have come across them before). This list is by no means exhaustive and I’m very open to other recommendations!
Continue readingBe a computational chemist and you must be a jack of all trades
Being a jack of all trades brings to mind someone who has extensive multidisciplinary expertise and is equipped with many tools in their toolbox to solve different problems. A jack of all trades is a great succinct description for computational chemists in drug discovery.
Recently I had a great conversation with Dr. Arjun Narayanan, a Senior Research Scientist at Vertex Pharmaceuticals and a jack of all trades as a computational chemist. In this blog post, I’ll describe what he does as a computational chemist, the problems he solves, and the new tools he’s looking forward to adding to his toolbox.