A simple criterion can conceal a multitude of chemical and structural sins

We’ve been investigating deep learning-based protein-ligand docking methods, which often claim to be able to generate ligand binding modes within 2Å RMSD of the experimental one. We found, however, that this simple criterion can conceal a multitude of chemical and structural sins…
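To make the criterion concrete, here is a minimal sketch of a 2Å RMSD check in plain Python. It assumes both poses list the same atoms in the same order and share a coordinate frame, and it applies no symmetry correction, so it is an illustration rather than how any particular docking benchmark computes RMSD. The toy coordinates and function names are made up for this example.

```python
import math

def rmsd(pred, ref):
    """RMSD between two poses given as lists of (x, y, z) tuples in Å.

    Assumes identical atom ordering and a shared coordinate frame;
    no superposition or symmetry correction is applied.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))

def within_threshold(pred, ref, threshold=2.0):
    """True if the predicted pose is within `threshold` Å RMSD of the reference."""
    return rmsd(pred, ref) <= threshold

# Toy reference pose: three atoms on a line, 1.5 Å apart.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
# Toy prediction: the middle atom has collapsed onto the first one.
pred = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (3.0, 0.0, 0.0)]

print(within_threshold(pred, ref))    # the pose passes the RMSD criterion
print(math.dist(pred[0], pred[1]))    # yet two atoms sit only 0.3 Å apart
```

The prediction passes the RMSD test even though two of its atoms are far closer than any chemical bond allows, which is exactly the kind of problem the criterion alone cannot see.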

DeepDock attempted to generate the ligand binding mode from PDB ID 1t9b
(light blue carbons, left), but gave pretzeled rings instead (white carbons, right).

If you’re interested in assessing the structural quality and chemical validity of predicted binding modes (and conformations) of small molecules, you might like to read about PoseBusters, the work of one of our DPhil students 🤓 Martin Buttenschoen, on arXiv.

Example PoseBusters waterfall plot showing the PoseBusters tests as filters for the TankBind predictions on the Astex Diverse data set. The leftmost (dotted) bar shows the number of complexes in the test set. The red bars show the number of predictions that fail with each additional test going from left to right. The rightmost (solid) bar indicates the number of predictions that pass all tests, i.e. those that are ‘PB-Valid’. For the 85 test cases in the Astex Diverse set, 50 (59%) predictions have an RMSD within 2Å and 5 (5.9%) pass all tests.
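The counts behind a waterfall plot like this can be sketched as a chain of filters applied one after another. The check names, thresholds and toy predictions below are hypothetical stand-ins chosen for illustration; they are not the actual PoseBusters test suite or its API.

```python
def waterfall(predictions, checks):
    """Apply checks in order and record how many predictions survive each step."""
    remaining = list(predictions)
    counts = [("all", len(remaining))]
    for name, check in checks:
        remaining = [p for p in remaining if check(p)]
        counts.append((name, len(remaining)))
    return counts

# Toy predictions with hypothetical, precomputed properties.
predictions = [
    {"rmsd": 1.2, "sanitizes": True, "min_dist": 1.4},
    {"rmsd": 1.8, "sanitizes": True, "min_dist": 0.4},
    {"rmsd": 3.5, "sanitizes": True, "min_dist": 1.5},
    {"rmsd": 0.9, "sanitizes": False, "min_dist": 1.3},
]

# Hypothetical checks, applied left to right as in the plot.
checks = [
    ("sanitizes", lambda p: p["sanitizes"]),
    ("no clashes", lambda p: p["min_dist"] >= 1.0),
    ("rmsd <= 2", lambda p: p["rmsd"] <= 2.0),
]

counts = waterfall(predictions, checks)
for name, n in counts:
    print(f"{name}: {n}")
```

The last count corresponds to the rightmost bar: the predictions that pass every check in the chain.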

Martin has developed a pip-installable Python package that’s easy to use, with friendly documentation.

You can also hear Martin speak at the upcoming RSC CICAG and RSC BMRC’s 6th “AI in Chemistry” Symposium at Churchill College, Cambridge—and there’s still time to register…

As one of the Co-Chairs of the organizing committee, I’d like to thank AstraZeneca and OpenBioSim for sponsoring this year’s #AIChem23. We have a fantastic line-up of speakers and poster presenters for what promises to be another exciting meeting at the intersection of AI and Chemistry.

It’s worth mentioning that four days after we posted our preprint on arXiv, another arXiv preprint was posted by Prof. Tom Blundell’s group describing a very similar tool called PoseCheck, which is designed to check small molecules produced by structure-based deep generative AI models.
