Last month a bunch of us attended the Sheffield Chemoinformatics Conference. We heard many great presentations, and many speakers invited us to check out their GitHub pages. I decided that now is the perfect time to try out some code that was shown by one presenter.
Peter Ertl from Novartis presented his work on The encyclopedia of functional groups. He presented a method that automatically detects functional groups without using a pre-defined list (which is what most other methods rely on). His method involves recursive searching through the molecule to identify groups of atoms that meet certain criteria. He used it to answer questions such as: how many functional groups are there, and which functional groups are most common in synthetic molecules versus bioactive molecules versus natural products? Since I, like many others in the group, am interested in fragment libraries (possibly due to a supervisor in common), I thought I could try it out on one of these.
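To make the "recursive searching" idea a bit more concrete, here is a heavily simplified sketch of how I picture it. This is not Peter Ertl's code: the toy molecule representation (a dict of heavy atoms plus a list of bonds), the function name `find_groups`, and the two marking criteria are my own assumptions for illustration, and the published algorithm uses more criteria than these.

```python
def find_groups(atoms, bonds):
    """Group connected 'marked' atoms of a toy molecule.

    atoms: dict mapping atom index -> element symbol (heavy atoms only)
    bonds: list of (i, j, order) tuples
    Simplified, assumed criteria (not the full published set):
      1. every heteroatom (anything that is not carbon) is marked
      2. a carbon with a double or triple bond to a heteroatom is marked
    """
    neighbours = {i: [] for i in atoms}
    for i, j, order in bonds:
        neighbours[i].append((j, order))
        neighbours[j].append((i, order))

    # Criterion 1: mark all heteroatoms
    marked = {i for i, element in atoms.items() if element != "C"}

    # Criterion 2: mark carbons multiply bonded to a heteroatom
    for i, element in atoms.items():
        if element == "C" and any(
            order > 1 and atoms[j] != "C" for j, order in neighbours[i]
        ):
            marked.add(i)

    groups = []
    seen = set()

    def collect(i, group):
        # Recursively walk to every marked atom connected to atom i
        seen.add(i)
        group.append(i)
        for j, _ in neighbours[i]:
            if j in marked and j not in seen:
                collect(j, group)

    for i in sorted(marked):
        if i not in seen:
            group = []
            collect(i, group)
            groups.append(sorted(group))
    return groups


# Acetic acid, CH3-C(=O)-OH: atoms 1, 2 and 3 form one group
acetic_acid = find_groups(
    {0: "C", 1: "C", 2: "O", 3: "O"},
    [(0, 1, 1), (1, 2, 2), (1, 3, 1)],
)

# 2-aminoethanol, N-C-C-O: the amine and the alcohol stay separate
aminoethanol = find_groups(
    {0: "N", 1: "C", 2: "C", 3: "O"},
    [(0, 1, 1), (1, 2, 1), (2, 3, 1)],
)
```

The point of the sketch is that no list of known groups is needed: atoms are marked by simple rules, and whatever marked atoms end up connected become one functional group.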
I was also inspired by the recent blog post by Elliot where he looked at the differences in library properties between MiniFrag, DSiP and Fraglites. Wouldn’t it be interesting to look at the differences in functional group compositions of the libraries? Hence I have made a start by calculating the functional groups in the original DSPL library.
You can look at my Jupyter notebook here, where I use Peter Ertl’s ifg module to find the most common functional groups in the original DSPL library. The top 10 are shown below.
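The counting step itself is simple. As a rough sketch (not the notebook's actual code), assume each molecule has already been reduced to a list of functional group labels by whatever detection step you use; the function name `most_common_groups` and the labels in the example are made up for illustration and are not real DSPL results.

```python
from collections import Counter


def most_common_groups(groups_per_molecule, n=10):
    """Tally functional group labels across a library and return the top n.

    groups_per_molecule: iterable of lists, one list of labels per molecule.
    Every occurrence is counted, so a group that appears twice in one
    molecule contributes two to its total.
    """
    counts = Counter()
    for groups in groups_per_molecule:
        counts.update(groups)
    return counts.most_common(n)


# Hypothetical labels for three made-up fragments (illustration only)
example_library = [
    ["amide", "aryl ether"],
    ["amide", "sulfonamide"],
    ["amide", "aryl ether", "nitrile"],
]
top_two = most_common_groups(example_library, n=2)
```

If you would rather count how many molecules contain a group at least once, wrapping each list in `set()` before calling `counts.update` would do it.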
Interestingly, if we compare this result to the most common functional groups from the ChEMBL database, there are definitely some differences. For example, fluoro- and chloro- substituents appear high in the most common groups from the ChEMBL database but don’t appear in the top ten from the original DSPL.
Ref: Ertl, P. (2017). An algorithm to identify functional groups in organic molecules. Journal of Cheminformatics, 9(1), 36. https://doi.org/10.1186/s13321-017-0225-z