Understanding the synthesizability of molecules proposed by generative models

De novo molecular design is a computational technique for generating molecules with desired properties from scratch. Classical generative algorithms are based on genetic algorithms (GAs) and the iterative construction of molecules from molecular fragments. More recently, variational autoencoders (VAEs) and generative adversarial networks (GANs) have been developed for this task; however, the synthesizability of the proposed molecular structures remains an issue. Gao and Coley[1] analyzed the synthesizability of the molecules proposed by these de novo generative algorithms and discussed their strengths and weaknesses.

They used ASKCOS[2] to evaluate the synthesizability of the compounds in different libraries and showed that, among the libraries evaluated, MOSES[3] has the highest synthesizability (~90%), while GDB-17[4] has the lowest (~3.5%).
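To make the percentages above concrete, here is a minimal sketch that assumes the synthesizability rate of a library is the fraction of its compounds for which a retrosynthesis planner reports a route. The `route_found` callable and the `toy_planner` stand-in are hypothetical names introduced for illustration; they are not the ASKCOS API, and the length rule inside `toy_planner` has no chemical meaning.

```python
from typing import Callable, Iterable


def synthesizability_rate(smiles_list: Iterable[str],
                          route_found: Callable[[str], bool]) -> float:
    """Fraction of molecules for which the planner reports a route."""
    molecules = list(smiles_list)
    if not molecules:
        raise ValueError("need at least one molecule")
    solved = sum(1 for smi in molecules if route_found(smi))
    return solved / len(molecules)


def toy_planner(smiles: str) -> bool:
    """Hypothetical stand-in for a real planner: accepts short SMILES strings only."""
    return len(smiles) <= 10


# Four example SMILES strings; the third is longer than the toy cutoff.
library = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC"]
rate = synthesizability_rate(library, toy_planner)
```

In practice, `route_found` would wrap a call to a synthesis planner with a fixed search budget, so the resulting rate depends on the planner settings as well as on the library.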

They compared three different approaches to address the synthesizability issue: (1) post hoc filtering, (2) training set biasing, and (3) heuristic biasing. They showed that training set biasing helps improve performance in distribution learning (unoptimized molecular generation), but its effect is insignificant in goal-directed tasks (optimized molecular generation). Modifying the objective function with a heuristic score helps improve the synthesizability of the generated candidates, but may detract from the main objectives.
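Post hoc filtering and heuristic biasing can be illustrated with a small sketch. It assumes the heuristic enters the objective as a weighted additive term, which is one simple choice and not necessarily the form used by the authors. The function names, the weight, the threshold, and the score tables are all hypothetical; the numbers are made up for illustration and do not come from the paper.

```python
from typing import Callable, Dict, List

Scorer = Callable[[str], float]


def biased_objective(property_score: Scorer, synth_score: Scorer,
                     weight: float) -> Scorer:
    """Heuristic biasing: add a weighted synthesizability term to the objective."""
    def objective(smiles: str) -> float:
        return property_score(smiles) + weight * synth_score(smiles)
    return objective


def post_hoc_filter(candidates: List[str], synth_score: Scorer,
                    threshold: float) -> List[str]:
    """Post hoc filtering: keep only candidates whose score clears the threshold."""
    return [smi for smi in candidates if synth_score(smi) >= threshold]


# Made-up scores for two candidates (illustrative numbers, not from the paper).
PROPERTY: Dict[str, float] = {"CCO": 0.7, "C1CC2CC1C2": 0.9}
SYNTH: Dict[str, float] = {"CCO": 0.8, "C1CC2CC1C2": 0.1}

candidates = list(PROPERTY)
unbiased = biased_objective(PROPERTY.__getitem__, SYNTH.__getitem__, weight=0.0)
biased = biased_objective(PROPERTY.__getitem__, SYNTH.__getitem__, weight=1.0)

best_unbiased = max(candidates, key=unbiased)  # ranks by property score alone
best_biased = max(candidates, key=biased)      # ranks by property plus heuristic
kept = post_hoc_filter(candidates, SYNTH.__getitem__, threshold=0.5)
```

In this toy case the biased objective selects the candidate with the lower property score, which is the trade-off described above. Training set biasing is not sketched here because it changes the data a model is trained on, not how candidates are scored.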

The authors also mentioned different ways to improve the performance of the generative models, including biasing generation with reinforcement learning or designing new algorithms that are explicitly constrained by predictions of chemical reactivity.

In summary, the authors analyzed the synthesizability of the molecular structures proposed by generative models. Their analysis suggests that new algorithms are required to improve the utility of these models in real discovery workflows.

References:

  1. W.H. Gao and C.W. Coley. The Synthesizability of Molecules Proposed by Generative Models. J. Chem. Inf. Model. 2020
  2. C.W. Coley et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 2019 365 eaax1566
  3. D. Polykovskiy et al. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. arXiv 2018
  4. L. Ruddigkeit et al. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. J. Chem. Inf. Model. 2012, 52, 2864−2875
