Benchmarks in De Novo Drug Design

I recently came across the review “De novo molecular drug design benchmarking” by Lauren L. Grant and Clarissa S. Sit [4], which highlights recently proposed benchmarking methods, including the Fréchet ChemNet Distance [1], GuacaMol [2], and Molecular Sets (MOSES) [3], along with their current and potential future applications and the next steps for validating these benchmarks.

From this review, I particularly want to note the issues with current benchmarking methods and the points to keep in mind when using them to benchmark our own de novo molecular design methods. “Goal-directed models” below refers to de novo molecular design methods that optimize for a particular scoring function [2].
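
To make “optimizing for a particular scoring function” concrete, here is a minimal sketch of such an objective, assuming RDKit is installed; QED drug-likeness stands in as the scoring function purely for illustration, since GuacaMol’s actual goal-directed tasks define their own (often multi-property) objectives.

    # Minimal sketch of a goal-directed objective (illustrative only).
    # Assumes RDKit is installed; QED stands in for the scoring function.
    from rdkit import Chem
    from rdkit.Chem import QED

    def score(smiles: str) -> float:
        """Return a score in [0, 1]; invalid SMILES get 0."""
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return 0.0
        return QED.qed(mol)

    # A goal-directed generator proposes SMILES and tries to maximize score().
    print(score("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin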

  • The usefulness of GuacaMol’s distribution-learning benchmark is questionable due to the copy problem [5]: a model that merely adds a single carbon atom to a randomly selected training-set molecule (the AddCarbon model; a rough sketch of such a baseline follows this list) achieves perfect scores on novelty, validity, and uniqueness and a high score on KL divergence. Better metrics for quantifying novelty would therefore be beneficial, going beyond checking whether a generated SMILES string already exists in the training set.
  • A potential issue with GuacaMol’s goal-directed benchmarks [5]: it is difficult to capture every desired molecular quality in a single score, so optimizing against such objectives can yield molecules that are unstable, synthetically unrealistic, or built around highly uncommon substructures.
  • Models that score highly on GuacaMol and MOSES do not necessarily produce synthetically accessible molecules, especially in the goal-directed setting [6].
  • Lack of evaluation of efficacy for GuacaMol and MOSES [4]: we do not know whether medicinal chemists would agree with the scores these benchmarks assign.
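
To illustrate how easily the distribution-learning metrics can be gamed, below is a rough sketch of an AddCarbon-style baseline in the spirit of Renz et al. [5], assuming RDKit is installed; the insertion and retry details are illustrative, not a reproduction of the original code.

    # Rough sketch of an AddCarbon-style baseline (illustrative, not the
    # original implementation from [5]). Assumes RDKit is installed.
    import random
    from rdkit import Chem

    def add_carbon(training_smiles, max_tries=100):
        """Pick a training molecule and insert a 'C' at a random position,
        retrying until the result parses as a valid SMILES string."""
        for _ in range(max_tries):
            smi = random.choice(training_smiles)
            pos = random.randint(1, len(smi) - 1)
            candidate = smi[:pos] + "C" + smi[pos:]
            if Chem.MolFromSmiles(candidate) is not None:
                return candidate
        return random.choice(training_smiles)  # fall back to an outright copy

    # The output is valid, unique, and formally "novel", yet every molecule
    # is a trivial one-atom edit of the training data.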

De novo molecular design is a growing field, and so are its benchmarking methods; it is apparent that metrics such as synthetic accessibility and ratings by medicinal chemists could further improve future benchmarks.
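
As a taste of what a synthetic-accessibility metric looks like in practice, the sketch below computes the Ertl-Schuffenhauer SA score shipped in RDKit’s Contrib directory (loading it via RDConfig is one common approach; the exact path handling depends on your RDKit installation).

    # Minimal sketch: Ertl-Schuffenhauer synthetic accessibility (SA) score
    # via RDKit's Contrib module. Scores run from 1 (easy) to 10 (hard).
    import os
    import sys
    from rdkit import Chem
    from rdkit.Chem import RDConfig

    sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
    import sascorer  # provided under Contrib/SA_Score in the RDKit distribution

    mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    print(sascorer.calculateScore(mol))  # a low value, i.e. easy to synthesize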

References:
[1] K. Preuer, P. Renz, T. Unterthiner, S. Hochreiter and G. Klambauer, Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery, J. Chem. Inf. Model., 2018, 58(9), 1736–1741, DOI: 10.1021/acs.jcim.8b00234.
[2] N. Brown, M. Fiscato, M. H. S. Segler and A. C. Vaucher, GuacaMol: Benchmarking Models for de Novo Molecular Design, J. Chem. Inf. Model., 2019, 59(3), 1096–1108, DOI: 10.1021/acs.jcim.8b00839.
[3] D. Polykovskiy, A. Zhebrak, B. Sanchez-Lengeling, S. Golovanov, O. Tatanov, S. Belyaev, R. Kurbanov, A. Artamonov, V. Aladinskiy, M. Veselov, A. Kadurin, S. Johansson, H. Chen, S. Nikolenko, A. Aspuru-Guzik and A. Zhavoronkov, Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models, Front. Pharmacol., 2020, 11, 565644, DOI: 10.3389/fphar.2020.565644.
[4] L. L. Grant and C. S. Sit, De novo molecular drug design benchmarking, RSC Med. Chem., 2021, 12, 1273, DOI: 10.1039/D1MD00074H.
[5] P. Renz, D. Van Rompaey, J. K. Wegner, S. Hochreiter and G. Klambauer, On Failure Modes in Molecule Generation and Optimization, Drug Discovery Today: Technol., 2019, 32–33, 55–63, DOI: 10.1016/j.ddtec.2020.09.003.
[6] W. Gao and C. W. Coley, The Synthesizability of Molecules Proposed by Generative Models, J. Chem. Inf. Model., 2020, 60(12), 5714–5723, DOI: 10.1021/acs.jcim.0c00174.
