I’m delighted to report that our collaborative paper (with Ísak Valsson, Matthew Warren, Aniket Magarkar, Phil Biggin, & Charlotte Deane), “Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data”, has been published in Nature’s Communications Chemistry (https://doi.org/10.1038/s42004-025-01428-y).
![](https://i0.wp.com/www.blopig.com/blog/wp-content/uploads/2025/02/IMG_4982.webp?resize=625%2C362&ssl=1)
During his MSc dissertation project in the Department of Statistics, University of Oxford, OPIG member Ísak Valsson developed an attention-based graph neural network (GNN) for predicting protein-ligand binding affinity, called “AEV-PLIG”. It featurizes a ligand’s atoms using Atomic Environment Vectors (AEVs) to describe the Protein-Ligand Interactions found in a 3D protein-ligand complex. AEV-PLIG is free and open source (BSD 3-Clause), available from GitHub at https://github.com/oxpig/AEV-PLIG, and forked at https://github.com/bigginlab/AEV-PLIG.
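To give a flavour of what an atomic environment vector encodes, here is a minimal NumPy sketch of an ANI-style radial AEV for a single central atom. This is an illustration only, not AEV-PLIG’s actual featurization: the η, distance-shift, and cutoff values below are made up, and a real AEV concatenates one such vector per neighbouring element type (plus angular terms).

```python
import numpy as np

def cutoff(r, rc=5.0):
    # Smooth cosine cutoff: 1 at r = 0, falling to 0 at r >= rc,
    # so distant atoms contribute nothing to the environment vector.
    return np.where(r < rc, 0.5 * np.cos(np.pi * r / rc) + 0.5, 0.0)

def radial_aev(dists, eta=4.0, shifts=np.linspace(1.0, 5.0, 8), rc=5.0):
    # Radial AEV for one central atom: each shift Rs probes a distance
    # shell, and each Gaussian term counts neighbours near that shell.
    d = np.asarray(dists, float)[:, None]          # (n_neighbours, 1)
    terms = np.exp(-eta * (d - shifts) ** 2) * cutoff(d, rc)
    return terms.sum(axis=0)                       # (n_shifts,)

# Toy example: a ligand atom with two neighbours at 2.1 Å and 3.4 Å
vec = radial_aev([2.1, 3.4])
print(vec.round(3))
```

In AEV-PLIG these per-atom environment vectors become the node features that the attention-based GNN operates on.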
Ísak also developed a much more challenging protein-ligand binding affinity prediction benchmark than CASF-2016, called the “Out-Of-Distribution Test”, which is also available (https://github.com/isakvals/OOD-Test). It is designed to assess how well a method generalizes to ligands and proteins more dissimilar to those seen in its training set. AEV-PLIG performed best in terms of Pearson correlation coefficient on the OOD Test, and on another tough benchmark also developed in OPIG, called “0-Ligand Bias” (https://doi.org/10.1093/bioinformatics/btaf040). AEV-PLIG proved to be more accurate than RF-score, Pafnucy, OnionNet-2, PointVS, SIGN, and AEScore (Table 1).
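The Pearson correlation coefficient used to rank methods on these benchmarks is easy to compute from predicted and measured affinities. A self-contained sketch with toy numbers (not values from the paper):

```python
import numpy as np

def pearson_r(pred, true):
    # Pearson correlation between predicted and experimental affinities:
    # covariance of the two series divided by the product of their
    # standard deviations, giving a value in [-1, 1].
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    pc, tc = pred - pred.mean(), true - true.mean()
    return (pc * tc).sum() / np.sqrt((pc ** 2).sum() * (tc ** 2).sum())

# Toy predicted vs experimental pK values for five ligands
r = pearson_r([6.1, 7.0, 5.2, 8.3, 6.8], [6.0, 7.4, 5.0, 8.0, 7.1])
print(round(r, 3))
```

In practice one would use `scipy.stats.pearsonr`, which also returns a p-value; the hand-rolled version above just makes the metric explicit.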
Matthew Warren showed that augmenting our training data (PDBbind v2020) with semi-synthetic data (BindingNet) boosted the performance of AEScore (https://github.com/RMeli/aescore) and we confirmed this with AEV-PLIG.
Together, we found that with even more data (BindingDB), our augmented AEV-PLIG model’s prediction accuracy starts to approach that of Free Energy Perturbation (FEP+) for congeneric series of ligands that bind the same protein — see Figures 3 & 4 in our paper — yet we are ~400,000 times faster, while using a single GPU instead of several. We also showed that AEV-PLIG performed better on our FEP benchmark than the other ML-based scoring functions we examined (Table 1).
Another take-home: the performance of AEV-PLIG steadily improved as we increased the fraction of augmented training data, with no sign of levelling off (Figure S5). Whether this putative ‘scaling law’ holds more generally for protein-ligand binding affinity prediction remains to be seen, but the notion of a simpler model architecture with more physically-relevant features, combined with more data, shows great promise…