Recently I’ve been thinking a lot about how to decompose a compound into smaller fragments for retrosynthetic purposes. My question is: given a compound, can I recover building blocks that are likely to react together to produce it, simply by breaking the bonds most likely to have been formed in a reaction? One approach is a nearly 15-year-old method called breaking of retrosynthetically interesting chemical substructures (BRICS). Here I’ll explore how well BRICS reflects synthetic accessibility.
The method identifies 16 different chemical environments indicated by link atoms of different types. The 16 different environments are depicted below in a figure from the original authors Degen et al.
In the figure, every line between fragments indicates that a bond can be formed between the two Ln link atoms on those fragments. In the Ln notation, n is the number of the chemical environment; since there are 16 environments, n ranges from 1 to 16. Circles indicate rings of various sizes, and the atom label ‘X’ stands for any of the elements C, N, O, or S.
RDKit offers an easy implementation of BRICS decomposition in its BRICS module. First, let’s import all the packages we need.
from rdkit import Chem
from rdkit.Chem import BRICS
from rdkit.Chem import Draw
import random
from rdkit.Chem import RDConfig
import os
import sys
sys.path.append(os.path.join(RDConfig.RDContribDir, 'SA_Score'))
import sascorer
Let’s fragment this compound, propranolol, an antihypertensive agent.
m = Chem.MolFromSmiles('CC(NCC(O)COC1=CC=CC2=CC=CC=C21)C')
m
Now we can fragment the compound according to the BRICS rules, using the BRICS module from rdkit.Chem.
frags = list(Chem.BRICS.BRICSDecompose(m))
mols = [Chem.MolFromSmiles(x) for x in frags]
Draw.MolsToGridImage(mols, molsPerRow=4, subImgSize=(200, 200))
Notice that the numbers on the dummy atoms follow the notation from the original paper. Dummy atom 16 always extends off an aromatic ring system. Dummy atom 4 always connects to a carbon atom which is also attached to either a carbon or hydrogen atom. Dummy atom 3 always connects to an oxygen and dummy atom 5 always connects to an sp3 nitrogen atom.
A nice feature of RDKit is the recomposition of fragments with the BRICS.BRICSBuild function. Let’s look at the possible combinations.
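To make the label bookkeeping concrete, here is a minimal sketch that pulls the numeric labels out of BRICS fragment SMILES with a regular expression and tallies them. The fragment strings are hand-written stand-ins that mirror the labels discussed above, not copied RDKit output, and the notes in `LABEL_NOTES` simply restate this post's reading of the environments. The helper names are hypothetical.

```python
import re
from collections import Counter

# Dummy atoms in BRICS fragment SMILES look like "[16*]": a numeric
# label (the environment number) followed by the wildcard atom "*".
DUMMY_LABEL = re.compile(r"\[(\d+)\*\]")

# Informal notes restating the text above (not an official RDKit table).
LABEL_NOTES = {
    3: "attached to an oxygen",
    4: "attached to a carbon",
    5: "attached to an sp3 nitrogen",
    16: "attached to an aromatic ring system",
}

def dummy_labels(fragment_smiles):
    """Return the environment numbers found in one fragment SMILES."""
    return [int(n) for n in DUMMY_LABEL.findall(fragment_smiles)]

def tally_labels(fragments):
    """Count how often each environment number occurs across fragments."""
    counts = Counter()
    for smi in fragments:
        counts.update(dummy_labels(smi))
    return counts

# Hypothetical fragments, hand-written for illustration only.
example_frags = [
    "[16*]c1cccc2ccccc12",
    "[3*]O[3*]",
    "[4*]CC(O)C[4*]",
    "[5*]N[5*]",
    "[4*]C(C)C",
]
for label, count in sorted(tally_labels(example_frags).items()):
    print(label, count, LABEL_NOTES.get(label, "see Degen et al."))
```

The same `tally_labels` call should work on the `frags` list produced earlier, since `BRICSDecompose` returns SMILES strings by default.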
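build = BRICS.BRICSBuild(mols)
# Seed before pulling from the generator: BRICSBuild shuffles via the random module by default
random.seed(90)
prods = [next(build) for x in range(17)]
# BRICSBuild products are not sanitized; sanitize before drawing or scoring
for prod in prods:
    Chem.SanitizeMol(prod)
Draw.MolsToGridImage(prods, molsPerRow=4, subImgSize=(200, 200))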
Here I generated 17 compounds. I had to try different ranges to find the maximum number of combinations the generator can return for these fragments. Some of these compounds look difficult to synthesize (for example, the two ring systems joined by a single bond). Let’s score them with the synthetic accessibility (SA) scorer from the RDKit Contrib directory.
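The trial-and-error over ranges can be avoided. A minimal sketch, assuming only that `BRICSBuild` behaves like any other Python generator: `itertools.islice` takes at most the requested number of items and stops quietly when the generator runs dry, whereas `next()` raises `StopIteration`. The helper name `take_up_to` and the stand-in generator are hypothetical, used so the snippet runs without RDKit.

```python
from itertools import islice

def take_up_to(generator, limit):
    """Collect at most `limit` items, stopping early if the generator is exhausted."""
    return list(islice(generator, limit))

def stand_in_build(n_products):
    """Hypothetical stand-in for a product generator such as BRICS.BRICSBuild(mols)."""
    for i in range(n_products):
        yield f"product_{i}"

# Ask for far more items than exist; no StopIteration escapes.
prods_demo = take_up_to(stand_in_build(17), 1000)
print(len(prods_demo))  # how many items the generator could actually supply

# With RDKit this would read (not run here):
#   prods = take_up_to(BRICS.BRICSBuild(mols), 1000)
```

Keeping an explicit cap is a deliberate choice: with a larger fragment set the number of recombinations can grow quickly, so draining the generator without a limit may be impractical.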
sa_scores = [sascorer.calculateScore(x) for x in prods]
Draw.MolsToGridImage(prods, molsPerRow=4, subImgSize=(200, 200), legends=["%.2f"%x for x in sa_scores])
What’s the synthetic accessibility of the original compound as computed by an SA scorer?
Draw.MolsToGridImage([m], legends=["%.2f"%sascorer.calculateScore(m)])
avg_sa = sum(sa_scores)/len(sa_scores)
avg_sa
The average SA score of the recomposed compounds is 2.06, lower than the original compound’s score of 2.30. The SA score ranges from 1 (easiest to make) to 10 (hardest to make). A lower score for the recomposed compounds makes intuitive sense because they are assembled directly from building blocks, which are not complex structures.
The SA score includes a complexity penalty that raises the score based on ring complexity (spiro rings and ring fusions), ring size (rings with more than 8 atoms are penalized), a high number of stereocenters, and molecular size (molecules with more atoms are penalized).
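To give a feel for how these terms add up, here is a rough sketch of a complexity penalty built from the four ingredients listed above. The functional forms (a slowly growing size term, log-scaled counts, a flat macrocycle term) follow my recollection of the RDKit Contrib script and should be treated as assumptions; check them against the `sascorer` source before relying on them. The function name and the feature counts are hypothetical, and the full SA score also includes a fragment-contribution term that is not modeled here.

```python
import math

def complexity_penalty(n_atoms, n_stereo=0, n_spiro=0, n_bridgehead=0, n_macrocycles=0):
    """Sum of size, stereo, ring-complexity and macrocycle penalties (larger = harder)."""
    size_penalty = n_atoms ** 1.005 - n_atoms      # grows slowly with atom count
    stereo_penalty = math.log10(n_stereo + 1)      # stereocenters
    spiro_penalty = math.log10(n_spiro + 1)        # spiro atoms
    bridge_penalty = math.log10(n_bridgehead + 1)  # bridgehead / ring-junction atoms
    # Flat penalty once any ring has more than 8 atoms (assumed form).
    macro_penalty = math.log10(2) if n_macrocycles > 0 else 0.0
    return size_penalty + stereo_penalty + spiro_penalty + bridge_penalty + macro_penalty

# Hypothetical atom and feature counts, chosen only to show the trend.
plain = complexity_penalty(n_atoms=19)
with_stereo = complexity_penalty(n_atoms=19, n_stereo=1)
with_spiro = complexity_penalty(n_atoms=19, n_stereo=1, n_spiro=1)
print(round(plain, 3), round(with_stereo, 3), round(with_spiro, 3))
```

Under these assumed forms, each added feature raises the penalty, and the log scaling means the first stereocenter or spiro atom costs more than each later one.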
Theoretically, compounds that can be fragmented with BRICS should be more synthetically accessible, because the cleaved bonds are ones commonly formed in many reactions. So are there compounds that BRICS cannot cleave? And is such a compound’s predicted SA score worse (higher) because it can’t be fragmented by these rules?
One compound it cannot cleave is below:
m = Chem.MolFromSmiles('O=c1ccc(c[nH]1)C1NCCc2ccc3OCCOc3c12')
m
frags = list(Chem.BRICS.BRICSDecompose(m))
frags
The only fragment returned is the original compound itself, meaning the BRICS rules identified no retrosynthetic cuts. This is a little perplexing to me: according to the BRICS rules, the bond between the carbon alpha to the secondary amine and the heterocycle should be cleavable, yet here it is not broken.
This compound has an SA score of 3.33, the highest of any compound we’ve looked at. Running it through PostEra’s Manifold gives 12 proposed synthesis routes, and the proposed building blocks are not fragments of the original compound at all. See this Mannich reaction:
In conclusion, it’s nice that you can easily fragment a compound into likely reactants using BRICS rules, but the results should be taken with a grain of salt. If a molecule can be fragmented by BRICS rules, it is likely synthetically accessible from those fragments, but its SA score will not necessarily reflect that. Likewise, a compound that cannot be broken down into fragments by reaction rules is not necessarily synthetically inaccessible.
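One way to dig into this is to ask RDKit directly which bonds it considers cleavable. The sketch below uses `BRICS.FindBRICSBonds`, which yields each matching bond as a pair of atom indices together with the pair of environment labels. The helper name `brics_bonds` is hypothetical. Since the decomposition above returned only the parent, I would expect an empty list for the second compound. A possible explanation, which I have not confirmed, is that the ring carbon does not match RDKit's pattern for any environment that is allowed to pair with an aromatic carbon.

```python
from rdkit import Chem
from rdkit.Chem import BRICS

def brics_bonds(smiles):
    """List the bonds BRICS would cleave as ((atom_i, atom_j), (label_i, label_j)) tuples."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return list(BRICS.FindBRICSBonds(mol))

propranolol = "CC(NCC(O)COC1=CC=CC2=CC=CC=C21)C"
uncleaved = "O=c1ccc(c[nH]1)C1NCCc2ccc3OCCOc3c12"

for name, smi in [("propranolol", propranolol), ("uncleaved", uncleaved)]:
    bonds = brics_bonds(smi)
    # Each entry pairs two atom indices with the two BRICS environment labels.
    print(name, len(bonds), bonds)
```

Comparing the atom indices in the propranolol output with the fragments drawn earlier shows which rule fired for each cut, which makes it easier to see which environment the second compound fails to match.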