5th AI in Chemistry meeting

A few weeks ago we attended the RSC’s 5th AI in Chemistry conference at Churchill College, Cambridge. It featured research and discussions on a broad range of topics and was attended by a diverse set of more than 200 researchers from academia and industry, including six Opiglets.

One of the main research themes was the application of machine learning models for the prediction of molecular properties such as bioactivity and ADME-T. This included talks from Héléna Gaspar (BenevolentAI), Miriam Mathea (BASF) and Raquel Rodriguez-Perez (Novartis); an interesting sub-topic that emerged was leveraging the benefits of local versus global models.

There were also some interesting talks about combining machine learning and MD/QM. Julien Michel (EaStCHEM School of Chemistry) presented “Hybrid Alchemical Free Energy/Machine-Learning Methodologies for Drug Discovery”, where he showed how FEP can be massively accelerated by using machine learning. Adrian Roitberg showed in his talk “Machine Learning for Accurate Energies and Forces in Molecular Systems” how neural network potentials can be employed for simulations at a fraction of the cost.

Molecular design using generative models was another key theme. It featured a keynote talk from Charlotte, which showcased some of the work being done in OPIG, and a talk from Opiglet An Goto, “De novo Molecular Design in 3D using Available Reagents, Reactions, and Docking in Deep Reinforcement Learning for SARS-CoV-2 Main Protease”.

Another major theme was synthesis automation and retrosynthesis prediction, including talks from Teodoro Laino (IBM) and Connor Coley (MIT). Teodoro Laino’s talk discussed the application of language models for retrosynthesis prediction and generation of synthesis protocols to allow autonomous chemistry on a robotic platform. Connor Coley’s talk linked synthesis considerations to molecular design, demonstrating how we can design molecules within a chemical space constrained to what we can synthesize.

The problem of dataset biases was one of the most popular topics, as shown by the number of posters and talks that focused on it. One example was a talk from Lucian Chan (Astex), titled “3D pride without 2D prejudice: Bias-controlled multi-level generative models for structure-based ligand design”, which presented a new hierarchical generative model explicitly designed to reduce the impact of data bias in the generative process. Opiglet Leo Klarner presented the poster “Bias in the Benchmark: Systematic experimental errors in bioactivity databases confound multi-task and meta-learning algorithms”, which studies the prevalence and implications of experimental biases in commonly used datasets. He was awarded the best academic poster prize for it.

It was also interesting to hear from speakers who are using AI in other areas of chemistry beyond drug discovery. We heard from Kim Jelfs (Imperial College London), who spoke about developing molecular materials with desired properties (such as porosity) and their tool that also considers the synthesizability of such materials; Donal O’Shea (RCSI) spoke about conducting risk profiling for 180 flavour compounds that are added to vaping devices. Other interesting topics included the societal and environmental impact of machine learning in the context of cheminformatics research (Daniel Probst, EPFL).
