We are going virtual! Our next Comp Chem Kitchen, CCK-18, will be held as a Zoom Webinar on Friday, March 27, 2020, at 5-6 pm. We are delighted to announce that Prof. Andreas Bender from the University of Cambridge will be speaking, as well as Dr Vicky Hellon from F1000 Research. To attend the CCK-18 webinar, you must sign up for a free Eventbrite ticket (limit 100).
Category Archives: Cheminformatics
Visualisation of very large high-dimensional data sets as minimum spanning trees
Large high-dimensional data sets are frequently used in the chemical and biological sciences. For example, the ChEMBL database contains millions of bioactive molecules from the scientific literature, and their associated biological assay data are commonly used for drug discovery. Visualising such databases helps us understand the structure of the data.
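As a rough illustration of the idea in the title, here is a minimal sketch of building a minimum spanning tree over a handful of points with Prim's algorithm. The function name `minimum_spanning_tree`, the toy points, and the choice of Euclidean distance are all assumptions made for this example. The exact all-pairs approach shown here would not scale to millions of molecules and is not the method described in the post.

```python
import math

def minimum_spanning_tree(points):
    """Return MST edges (i, j, distance) over `points` using Prim's algorithm."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known distance from the tree to each point
    best_from = [-1] * n        # tree vertex that achieves that distance
    best_dist[0] = 0.0          # start growing the tree from point 0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] != -1:
            edges.append((best_from[u], u, best_dist[u]))
        # Update the cheapest connection for every point still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges

# Hypothetical toy "descriptor vectors"; real molecular fingerprints are far longer.
toy_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
mst_edges = minimum_spanning_tree(toy_points)
```

The resulting edge list connects all points with the smallest possible total distance, and that tree is what would then be laid out in two dimensions for visualisation.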
Bayesian Optimization and Correlated Torsion Angles—in Small Molecules
Our collaborator, Prof. Geoff Hutchison from the University of Pittsburgh, recently took part in the Royal Society of Chemistry’s 2020 Twitter Poster Conference to highlight the great work carried out by one of my DPhil students, Lucian Leung Chan, on the application of Bayesian optimization to conformer generation.
State of the art in AI for drug discovery: more wet-lab please
The medchem community has been rather skeptical of ML approaches for the drug discovery pipeline, especially those focused on the hit-to-lead optimization process. One of the main drivers is the way many ML publications benchmark their models: historic datasets are split into two parts, with the larger part used to train and the smaller to test ML models. To standardize that validation process, computational chemists have constructed benchmark datasets such as the DUD-E set, which is commonly used as a standard for protein-ligand binding classification tasks. Common criticism from medicinal chemists centers on the main problem associated with benchmark datasets: the absence of direct lab validation.
Effect of Debiasing Protein-Ligand binding data on Generalization
Virtual screening is a computational technique used in drug discovery to search libraries of small molecules in order to identify those structures that bind tightly and specifically to a given protein target. Many machine learning (ML) models have been proposed for virtual screening; however, it is not clear whether these models can truly predict molecular properties accurately across chemical space or simply overfit the training data. As chemical space contains clusters of molecules around scaffolds, memorising the properties of a few scaffolds can be sufficient to perform well, masking the fact that the model may not generalise beyond close analogues. Different debiasing algorithms have been introduced to address this problem. These algorithms systematically partition the data to reduce bias and provide a more accurate measure of model performance.
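To make the scaffold problem concrete, here is a minimal sketch of one simple way to partition data so that no scaffold appears in both the training and test sets. It is not one of the specific debiasing algorithms the post refers to. The function `scaffold_group_split`, the molecule ids, and the scaffold labels are hypothetical, and the labels are assumed to be precomputed.

```python
from collections import defaultdict

def scaffold_group_split(scaffold_by_molecule, test_fraction=0.2):
    """Split molecule ids so that no scaffold is shared between train and test.

    scaffold_by_molecule: dict mapping a molecule id to a scaffold label.
    The labels are assumed to be precomputed; this sketch does not derive them.
    """
    groups = defaultdict(list)
    for mol_id, scaffold in scaffold_by_molecule.items():
        groups[scaffold].append(mol_id)

    # Largest scaffold groups are considered first (ties broken by first id),
    # so the test set ends up holding the rarer scaffolds.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    train_cutoff = (1.0 - test_fraction) * len(scaffold_by_molecule)
    train, test = [], []
    for group in ordered:
        # A whole group goes to one side only, never both.
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        else:
            test.extend(group)
    return train, test

# Hypothetical toy data: molecule id -> scaffold label.
toy_scaffolds = {
    "m1": "benzene", "m2": "benzene", "m3": "benzene",
    "m4": "indole", "m5": "indole",
    "m6": "pyridine", "m7": "pyridine",
    "m8": "furan", "m9": "thiophene", "m10": "oxazole",
}
train_ids, test_ids = scaffold_group_split(toy_scaffolds, test_fraction=0.2)
```

Because whole scaffold groups are assigned to one side, a model evaluated on `test_ids` cannot score well simply by memorising scaffolds it saw during training.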
AutoDock 4 and AutoDock Vina
A recently released publication from Nguyen et al. in JCIM pointed out that while AutoDock Vina is faster, AutoDock 4 tends to correlate better with experimental binding affinity [1].
[This post has been edited to provide more information about the cited paper, as well as providing additional citations.]
Nguyen et al. selected 800 protein-ligand complexes for 47 protein targets that had both experimental PDB structures complexed with a ligand and associated binding affinity values.
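For readers unfamiliar with how such a comparison is scored, here is a minimal sketch of a Pearson correlation between predicted and experimental affinities. The helper `pearson_r` and every number below are made-up placeholders for illustration only. They are not values from the paper, and the paper's exact correlation metric is not stated in this excerpt.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Placeholder values only (kcal/mol-like numbers invented for this sketch).
experimental = [-9.1, -7.4, -6.2, -8.0, -5.5]
scores_method_a = [-8.7, -7.9, -6.0, -7.6, -6.1]
scores_method_b = [-7.2, -7.5, -6.9, -6.8, -7.1]

r_a = pearson_r(scores_method_a, experimental)
r_b = pearson_r(scores_method_b, experimental)
```

A value of r closer to 1 means the docking scores track the experimental affinities more closely. It says nothing about speed or about the quality of the predicted binding poses.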
Journal Club: Is our data biased, and should it be?
Last week I presented the above paper at group meeting. While a little different from a typical OPIG journal club paper, the data we have access to almost certainly suffers from the same range of (possible) biases explored in this paper.
NeurIPS 2019: Chemistry/Biology papers
NeurIPS is the largest machine learning conference (by number of participants), with over 8,000 in 2017. This year, the conference will be held in Vancouver, Canada from 8th-14th December.
Recently, the list of accepted papers was announced, with 1430 papers accepted. Here, I will highlight several of potential interest to the chem-/bio-informatics communities. Given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+f “molecule”).
When OPIGlets leave the office
Hi everyone,
My blogpost this time around is a list of conferences popular with OPIGlets. You are highly likely to see at least one of us attending or presenting at these meetings! I’ve tried to make it as exhaustive as possible (with thanks to Fergus Imrie!), listing conferences in upcoming chronological order.
(Most descriptions are slightly modified snippets taken from the official websites.)
BOKEI: Bayesian Optimization Using Knowledge of Correlated Torsions and Expected Improvement for Conformer Generation
In a previous blog post, we introduced the idea of Bayesian optimization and its application in finding the lowest-energy conformation of a given molecule [1]. Here, we extend this approach to incorporate knowledge of correlated torsions and accelerate the search.
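As background for the "Expected Improvement" in the title, here is a minimal sketch of the standard closed-form expected improvement for a minimisation problem with a Gaussian posterior. The function name and arguments are assumptions made for this example. The excerpt does not show how BOKEI itself computes or uses this quantity.

```python
import math

def expected_improvement(mean, std, best_so_far):
    """Expected improvement for minimisation under a Gaussian posterior.

    mean, std: posterior mean and standard deviation of the energy at a
    candidate point (assumed to come from a surrogate model).
    best_so_far: lowest energy observed so far.
    """
    if std <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(best_so_far - mean, 0.0)
    z = (best_so_far - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (best_so_far - mean) * cdf + std * pdf
```

In a Bayesian optimization loop, the next set of torsion angles to evaluate would typically be the candidate that maximises this value. That choice trades off a low predicted energy against a high uncertainty.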