One of the fundamentals of machine learning is avoiding the pitfall of training on your test set, but what if I told you that you could?
Author Archives: Fergus Imrie
Le Tour de Farce v11.0
“They don’t make them like they used to!”
With much experience of all things farcical, I was delighted to return just in time for the 2024 edition of OPIG’s Tour de Farce, which took place on 11th July. This year’s route was 8 miles long and took in four of the finest establishments Oxford has to offer (nothing “unusually conservative” to see here, Eoin).
Fragment-to-Lead Successes in 2019
In this blogpost, I want to highlight the excellent work by Jahnke and collaborators. For the past 5 years, they have published an annual perspective covering fragment-to-lead success stories from the previous year. Very helpfully, their work includes a table detailing the hit fragment(s) and lead molecule, together with key experimental results and parameters.
NeurIPS 2020: Chemistry / Biology papers
Another blog post, another look at accepted papers for a major ML conference. NeurIPS joins the other major machine learning conferences (and others) in moving online this year, running from 6th – 12th December 2020. In a continuation of past posts (ICML 2020, NeurIPS 2019), I will highlight several papers of potential interest to the chem-/bio-informatics communities.
The list of accepted papers can be found here, with 1,903 papers accepted out of 9,467 submissions (20% acceptance rate).
In addition to the main conference, there are several workshops highly related to the type of research undertaken in OPIG: Machine Learning in Structural Biology and Machine Learning for Molecules.
The usual caveat: given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+f “molecule”). If you find any I have missed, please reach out and I will update accordingly.
Learning from Biased Datasets
Both the beauty and the downfall of learning-based methods is that the data used for training will largely determine the quality of any model or system.
While there have been numerous algorithmic advances in recent years, the most successful applications of machine learning have been in areas where either (i) you can generate your own data in a fully understood environment (e.g. AlphaGo/AlphaZero), or (ii) data is so abundant that you’re essentially training on “everything” (e.g. GPT-2/GPT-3, CNNs trained on ImageNet).
This covers only a narrow range of applications, with most data not falling into one of these two categories. Unfortunately, when this is true (and even sometimes when you are in one of those rare cases) your data is almost certainly biased – you just may or may not know it.
ICML 2020: Chemistry / Biology papers
ICML is one of the largest machine learning conferences and, like many other conferences this year, is running virtually from 12th – 18th July.
The list of accepted papers can be found here, with 1,088 papers accepted out of 4,990 submissions (22% acceptance rate). Similar to my post on NeurIPS 2019 papers, I will highlight several of potential interest to the chem-/bio-informatics communities. As before, given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+f “molecule”).
DeLinker – Deep Generative Models for 3D Linker Design
*** Disclaimer: This blog post represents some shameless self-promotion. ***
I am delighted to announce that our latest work, DeLinker, was recently published in the Journal of Chemical Information and Modeling (link).
Journal Club: Is our data biased, and should it be?
Last week I presented the above paper at group meeting. While a little different from a typical OPIG journal club paper, the data we have access to almost certainly suffers from the same range of (possible) biases explored in this paper.
NeurIPS 2019: Chemistry/Biology papers
NeurIPS is the largest machine learning conference (by number of participants), with over 8,000 in 2017. This year, the conference will be held in Vancouver, Canada from 8th-14th December.
Recently, the list of accepted papers was announced, with 1,430 papers accepted. Here, I will highlight several of potential interest to the chem-/bio-informatics communities. Given the large number of papers, these were selected either by “accident” (i.e. I stumbled across them in one way or another) or through a basic search (e.g. Ctrl+f “molecule”).
Python Handout
Many OPIGlets use Jupyter extensively (in either Notebook or Lab flavour) to prototype and present their work. However, as projects progress, notebooks are frequently converted into regular Python files for a number of reasons, losing the notebook functionality.
Wouldn’t it be nice if we could combine some of the benefits of Jupyter notebooks (not least the ability to present both code & results naturally) with regular python files?
Enter Python Handout.
Python Handout was recently (5th August 2019) released by Danijar Hafner and allows Python scripts to be converted into handouts with Markdown comments and inline figures (see above picture).
Installation is via pip (pip3 install -U handout) and Python Handout supports Python 3 scripts.
While I’ve not used Handout much (yet), I will definitely be experimenting more in the coming weeks.