Have you ever needed to find a reaction SMARTS pattern for a certain reaction but don’t have it already written out? Do you have a reaction SMARTS pattern but need to test it on a set of reactants and products to make sure it transforms them correctly and doesn’t allow for odd reactants to work? I recently did and I spent some time developing functions that can:
- Generate a reaction SMARTS for a reaction given two reactants, a product, and a reaction name.
- Check the reaction SMARTS on a list of reactants and products that have the same reaction name.
What is SMARTS?
SMARTS stands for SMiles ARbitrary Target Specification, and is a language used for pattern searching in molecules. It was invented by Daylight and they have much more information on their site here. Reaction SMARTS are SMARTS used to define reactions, they will usually have this basic structure:
[reactant_1].[reactant_N]>>[product_1].[product_N]
For the rest of the functions you’ll need to install and import the neccessary libraries.
!pip install reaction-utils rxnmapper rdkit from rxnutils.chem.reaction import ChemicalReaction from rxnmapper import RXNMapper from rdkit import Chem from rdkit.Chem import AllChem, DataStructs, Draw
Example 1: Generate a reaction SMARTS given the reactants and product
Let’s say we need a reaction smarts for the bimolecular reaction, reductive amination. I have these two reactants and product.
reactant1 = Chem.MolFromSmiles('O=Cc1ccc2c(c1)OCO2') reactant2 = Chem.MolFromSmiles('Cc1nnc2c(N)nccn12') product = Chem.MolFromSmiles('Cc1nnc2c(NCc3ccc4c(c3)OCO4)nccn12') captions = ['Reactant 1', 'Reactant 2', 'Product'] Draw.MolsToGridImage([reactant1, reactant2, product], molsPerRow=3, subImgSize=(300,300), legends=captions)
Now lets define the make_rxn_template
function, which takes the two reactant mol objects, product mol object, and radius for the reaction template search as inputs. The radius argument specifies the radius of the template. It uses RXNmapper which labels exact atoms in the reactants that are kept in the product. Steph (another lovely OPIG member) wrote a helpful post on this tool, check it out!
This make_rxn_template
function outputs the SMARTS reaction template.
def make_rxn_template(reactants: list, product: Chem.Mol, radius=1): """ Given mol objects of two reactants and product, make a SMARTS reaction template. """ reactants = [Chem.MolToSmiles(reactant) for reactant in reactants] product = Chem.MolToSmiles(product) # Make string of reaction reaction = f"{reactants[0]}.{reactants[1]}>>{product}" rxnmapper = RXNMapper() # Get atom mapped forward template res = rxnmapper.get_attention_guided_atom_maps([reaction])[0] # Convert to ChemicalReaction object rxn = ChemicalReaction(res['mapped_rxn']) # Get reaction templates with radius mapping = rxn.generate_reaction_template(radius=radius, expand_ring=False, expand_hetero=False) forward_template = mapping[0].smarts return forward_template
# create reaction SMARTS first_reductive_amination_mapping = make_rxn_template(reactants, product, radius=1) print(first_reductive_amination_mapping)
'O=[CH;D2;+0:1]-[c:2].[#7;a:3]:[c:4](-[NH2;D1;+0:5]):[c:6](:[#7;a:7]):[#7;a:8]>>[#7;a:3]:[c:4](-[NH;D2;+0:5]-[CH2;D2;+0:1]-[c:2]):[c:6](:[#7;a:7]):[#7;a:8]'
Great, that is our reaction SMARTS for reductive amination! To check if this reaction SMARTS will work for other reductive amination examples lets test it in Example 2.
Example 2: Check a reaction SMARTS on many reaction examples
Now we need to check if this SMARTS works for other examples of this same reaction in our dataset. I have a dataframe of 10 other reductive amination examples that looks like this:
product | reactants | name |
---|---|---|
Cc1nnc2c(NCc3ccc4c(c3)OCO4)nccn12 | (O=Cc1ccc2c(c1)OCO2, Cc1nnc2c(N)nccn12) | Reductive_amination |
O[C@@H](CNCc1nnc2ccccn12)c1ccc2c(c1)OCO2 | (O=Cc1nnc2ccccn12, NCC(O)c1ccc2c(c1)OCO2) | Reductive_amination |
Cc1cc(C(=O)N[C@H](C)[C@H](C)NCc2ccccn2)no1 | (O=Cc1ccccn1, Cc1cc(C(=O)NC(C)C(C)N)no1) | Reductive_amination |
CCOC(=O)C[C@@H](c1cc(C)on1)N1CCCCC1 | (CCOC(=O)CC(=O)c1cc(C)on1, C1CCNCC1) | Reductive_amination |
C[C@@H](F)CCNC1CCN(C(=O)CCN2CCCCC2)CC1 | (O=C1CCN(C(=O)CCN2CCCCC2)CC1, CC(F)CCN) | Reductive_amination |
c1cncc([C@@H]2CCC[C@@H](NCc3ccc4c(c3)OCO4)C2)c1 | (O=C1CCCC(c2cccnc2)C1, NCc1ccc2c(c1)OCO2) | Reductive_amination |
COc1cccc(CN[C@H](Cc2ccc3c(c2)OCO3)C(=O)O)n1 | (NC(Cc1ccc2c(c1)OCO2)C(=O)O, COc1cccc(C=O)n1) | Reductive_amination |
Cc1cc(C(=O)NCC[C@@H](C)NCC(=O)N2CCCCC2)no1 | (NCC(=O)N1CCCCC1, CC(=O)CCNC(=O)c1cc(C)on1) | Reductive_amination |
FC(F)(F)C(F)(F)[C@@H]1CCCN(CCCN2CCCCC2)C1 | (O=CCCN1CCCCC1, FC(F)(F)C(F)(F)C1CCCNC1) | Reductive_amination |
CC(C)(C)c1cncc(CN[C@@H](CN2CCCCC2)C(=O)O)c1 | (NC(CN1CCCCC1)C(=O)O, CC(C)(C)c1cncc(C=O)c1) | Reductive_amination |
Here I define the following functions to test my generated reductive amination template on my examples. Basically check_smarts
is the entry function that takes the dataframe of the product and reactants as SMILES as input. For each row in the dataframe check_rxn_template_works
applies the given reaction template to reactants and product using check_rxn
. Both combinations of reactants are checked since the order is specific for each reaction template.
The check_rxn
function applies the reaction SMARTS to the reactants and if a product is found, that product is checked if it matches the original product by tanimoto similarity = 1. There are three possible outputs from check_rxn
:
- The reaction SMARTS is applicable and produces the expected product with Tanimoto similarity of 1.0.
- The reaction SMARTS is applicable but does not produce any of the expected products with Tanimoto similarity of 1.0.
- The reaction SMARTS is not applicable to the provided reactants, therefore there is no product.
""" This code block contains functions for reaction enumeration and reaction mapping. NOTE: This is intended to be run in a jupyter notebook, hence the use of the display() functionality. """ def remove_chirality(smiles: str): return smiles.replace('@@', '').replace('@', '').replace('/', '').replace('\\', '') def check_rxn(rxn: AllChem.ChemicalReaction, reactants: list, products: list, verbose=False): """ Check if reaction template works for a combination of reactants using Tanimoto similarity to compare against a list of products. INPUT: rxn: AllChem.ReactionSmarts object reactants: list of mol objects of reactants products: list of smiles of possible products """ predicted_products_flat = [] if rxn.RunReactants(reactants): predicted_products_list = rxn.RunReactants(reactants) predicted_products_flat = [remove_chirality(Chem.MolToSmiles(mol)) for product_set in predicted_products_list for mol in product_set] # Check Tanimoto similarity with each product in the provided list for predicted_smiles in predicted_products_flat: predicted_product_mol = Chem.MolFromSmiles(predicted_smiles) predicted_product_fp = AllChem.GetMorganFingerprint(predicted_product_mol, 2) for product_smiles in products: product_mol = Chem.MolFromSmiles(product_smiles) product_fp = AllChem.GetMorganFingerprint(product_mol, 2) similarity = DataStructs.TanimotoSimilarity(product_fp, predicted_product_fp) if similarity == 1.0: if verbose: print("The reaction SMARTS is applicable and produces the expected product with Tanimoto similarity of 1.0.") return 1, predicted_products_flat if verbose: print("The reaction SMARTS is applicable but does not produce any of the expected products with Tanimoto similarity of 1.0.") return -1, predicted_products_flat else: if verbose: print("The reaction SMARTS is not applicable to the provided reactants, therefore there is no product.") return 0, predicted_products_flat def check_rxn_template_works(rxn_temp: str, reactants: list, products: list, verbose=False): """ Check if reaction template works. LOOKS AT BOTH COMBINATIONS OF REACTANTS. """ rxn = AllChem.ReactionFromSmarts(rxn_temp) reactants = [Chem.MolFromSmiles(reactant) for reactant in reactants] products = [remove_chirality(product) for product in products] AllChem.SanitizeRxn(rxn) outcome = 0 # Check if the number of reactants is applicable if len(reactants) != len(rxn.GetReactants()): if verbose: print("The reaction SMARTS is not applicable to the provided reactants.") return 0, [] # Check if the reaction is applicable to both combinations of reactants if verbose: print('Outcome of first combo:') outcome, predicted_products_flat = check_rxn(reactants=reactants, rxn=rxn, products=products, verbose=verbose) if len(reactants) == 1: return outcome, predicted_products_flat reactants2 = [reactants[1], reactants[0]] if verbose: print('Outcome of second combo:') outcome2, predicted_products_flat2 = check_rxn(reactants=reactants2, rxn=rxn, products=products, verbose=verbose) predicted_products_flat2 if outcome == 1 or outcome2 == 1: outcome = 1 predicted_products_flat.extend(predicted_products_flat2) return outcome, predicted_products_flat if outcome == -1 or outcome2 == -1: outcome = -1 predicted_products_flat.extend(predicted_products_flat2) return outcome, predicted_products_flat else: outcome = 0 predicted_products_flat.extend(predicted_products_flat2) return outcome, predicted_products_flat def check_smarts(df: pd.DataFrame, rxn_smarts: str): success = 0 failure = 0 total = len(df) for index, row in df.iterrows(): reactants = row['reactants'] product = [row['product']] print('ROW:', index) outcome = check_rxn_template_works(rxn_smarts, reactants, product, verbose=True) if outcome[0] != 1: failure += 1 if outcome[0] == -1: print('FAILED: THE PREDICTED PRODUCT(S) IS/ARE WRONG') for i in outcome[1]: display(Chem.MolFromSmiles(i)) if outcome[0] == 0: print('FAILED: THE REACTION SMARTS IS NOT APPLICABLE TO THE PROVIDED REACTANTS') print('REACTANT(S) BELOW:') display(Chem.MolFromSmiles(reactants[0])) display(Chem.MolFromSmiles(reactants[1])) print('PROPER PRODUCT(S) BELOW') products = [Chem.MolFromSmiles(i) for i in product] for i in products: display(i) else: success += 1 print('SUCCESS: THE PREDICTED PRODUCT(S) IS/ARE RIGHT') for i in outcome[1]: display(Chem.MolFromSmiles(i)) print(f'SUCCESS: {success} / {total} ({(success/len(df)*100)}%)') print(f'FAILURE: {failure} / {total} ({failure/len(df)*100}%)')
check_smarts(df, first_reductive_amination_mapping)
Etc…
These results mean the reaction template only worked for one example (the one it was originally generated for) and did not work for any other examples 😥. While not a great result, this is most likely due to its high specificity.
After some manual editing using the helpful SMARTS visualization tool SMARTS PLUS, made by the University of Hamburg, we have a more general reductive amination template.
first_reductive_amination = 'O=[CH;D2;+0:1]-[c:2].[#7;a:3]:[c:4](-[NH2;D1;+0:5]):[c:6](:[#7;a:7]):[#7;a:8]>>[#7;a:3]:[c:4](-[NH;D2;+0:5]-[CH2;D2;+0:1]-[c:2]):[c:6](:[#7;a:7]):[#7;a:8]' general_reductive_amination_mapping = '[#6:2](=[#8])(-[#6:1]).[#7;H2,H1:3]>>[#6:2](-[#6:1])-[#7:3]'
check_smarts(df, general_reductive_amination_mapping)
Success!!
Acknowledgements
Please check out these tools that were indespensible for this!
- RXNutils: https://molecularai.github.io/reaction_utils/
- RXNmapper: https://github.com/rxn4chemistry/rxnmapper
- SMARTS PLUS: smarts.plus
- And as always, RDKit: https://github.com/rdkit/rdkit