Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data

I’m delighted to report our collaboration (Ísak Valsson, Matthew Warren, Aniket Magarkar, Phil Biggin, & Charlotte Deane), on “Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data”, has been published in Nature’s Communications Chemistry (https://doi.org/10.1038/s42004-025-01428-y).

During his MSc dissertation project in the Department of Statistics, University of Oxford, OPIG member Ísak Valsson developed “AEV-PLIG”, an attention-based graph neural network (GNN) that predicts protein-ligand binding affinity. It featurizes a ligand’s atoms using Atomic Environment Vectors (AEVs) to describe the Protein-Ligand Interactions found in a 3D protein-ligand complex. AEV-PLIG is free and open source (BSD 3-Clause), available from GitHub at https://github.com/oxpig/AEV-PLIG, and forked at https://github.com/bigginlab/AEV-PLIG.
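For readers unfamiliar with AEVs: they are ANI-style symmetry functions that summarize each atom’s local chemical environment. Purely as an illustration (not the paper’s implementation — the actual AEV-PLIG featurization, element pairings, and hyperparameters are defined in the repository above), a single radial symmetry-function component can be sketched like this, with the shift and eta values below being arbitrary toy choices:

```python
import math

def cutoff(r, r_c=5.0):
    # Smooth cosine cutoff: 1 at r = 0, falling to 0 at r >= r_c,
    # so atoms beyond the cutoff radius contribute nothing.
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)

def radial_aev(distances, shifts=(1.0, 2.0, 3.0, 4.0), eta=4.0, r_c=5.0):
    # One radial symmetry-function value per shift R_s:
    #   G_s = sum_j exp(-eta * (r_ij - R_s)^2) * f_c(r_ij)
    # where r_ij are distances from the central atom to its neighbours.
    return [
        sum(math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c) for r in distances)
        for r_s in shifts
    ]

# Toy distances (angstroms) from one atom to its neighbours
aev = radial_aev([0.9, 2.1, 3.8, 6.0])
```

Each shift R_s probes a different distance shell around the central atom, and the cutoff guarantees that only the local environment is encoded; the 6.0 Å neighbour above contributes nothing.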

Ísak also developed a protein-ligand binding affinity prediction benchmark that is far more challenging than CASF-2016, called the “Out-Of-Distribution Test” (OOD Test), which is also available (https://github.com/isakvals/OOD-Test). It is designed to assess how well a method generalizes to ligands and proteins more dissimilar than those seen in its training set. AEV-PLIG achieved the best Pearson correlation coefficient on the OOD Test, as well as on another tough benchmark developed in OPIG called “0-Ligand Bias” (https://doi.org/10.1093/bioinformatics/btaf040), proving more accurate than RF-score, Pafnucy, OnionNet-2, PointVS, SIGN, and AEScore (Table 1).
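The OOD Test defines its own similarity measures and thresholds (see the repository above); purely to illustrate the underlying idea of filtering test cases by similarity to the training set, here is a toy sketch, where the `tanimoto` helper over fingerprint bit sets and the 0.4 threshold are my own illustrative choices:

```python
def tanimoto(a, b):
    # Tanimoto similarity between two fingerprints
    # represented as sets of "on" bits.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def ood_split(train_fps, candidate_fps, threshold=0.4):
    # Keep only candidates whose maximum similarity to any
    # training fingerprint falls below the threshold, i.e.
    # candidates that are "out of distribution".
    return [
        i for i, fp in enumerate(candidate_fps)
        if all(tanimoto(fp, t) < threshold for t in train_fps)
    ]

# A candidate identical to a training molecule is excluded;
# a dissimilar one is kept.
kept = ood_split([{1, 2, 3}], [{1, 2, 3}, {7, 8, 9}])
```

Applying the same idea with real chemical fingerprints and protein-level similarity yields held-out sets that probe genuine generalization rather than memorization.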

Matthew Warren showed that augmenting our training data (PDBbind v2020) with semi-synthetic data (BindingNet) boosted the performance of AEScore (https://github.com/RMeli/aescore), and we confirmed the same effect with AEV-PLIG.

Together, we found that with even more data (BindingDB), our augmented AEV-PLIG model’s prediction accuracy starts to approach that of Free Energy Perturbation (FEP+) for congeneric series of ligands that bind the same protein (see Figures 3 & 4 in our paper), yet it is ~400,000 times faster and uses a single GPU instead of several. We also showed that AEV-PLIG performed better on our FEP benchmark than the other ML-based scoring functions we evaluated (Table 1).

Another take-home: the performance of AEV-PLIG steadily improved as we increased the fraction of augmented training data, with no sign of leveling off (Figure S5). Whether this putative ‘scaling law’ continues to hold for protein-ligand binding affinity prediction remains to be seen, but the notion of a simpler model architecture with more physically relevant features, combined with more data, shows great promise…
