Over the past few years I have explored different data visualization strategies with the goal of rapidly communicating information to medicinal chemists. I have recently fallen in love with “molecule networks” as an intuitive and interactive data visualization strategy. This blog gives a brief tutorial on how to start generating your own molecule networks.
Visualizing one molecule
To generate the networks I use PyVis. The package is simple to use, and they have a great tutorial. There’s only two steps to generating a network: add some nodes, then add some edges.
To add a node containing a molecule, use the code below. Here I have written the function
which takes an RDKit molecule and creates an image. A node can then be defined using generate_molecule_image()
net.add_node(node_id,shape="circularImage", image=image_path)
. Then just visualize the network using
. net.show()
There are multiple keyword arguments that can be passed in to change the appearance of nodes. We will see more in the rest of this post, but the first two I share are label
(text displayed under the node), and title
(text displayed when you hover your mouse over the node)
Note: Sometimes molecules are not visualized in Jupyter notebook. If you have a network in Jupyter which shows nodes but no molecules, try opening the .html in a browser.
import pandas as pd import numpy as np import base64 from PIL import Image from pyvis.network import Network from rdkit import Chem from rdkit.Chem.Draw import rdMolDraw2D from rdkit.Chem.rdRGroupDecomposition import RGroupDecompose from rdkit.Chem import rdRGroupDecomposition from rdkit import RDLogger RDLogger.DisableLog('rdApp.*') # Function to generate a molecule image as a base64 string def generate_molecule_image(mol): drawer = rdMolDraw2D.MolDraw2DCairo(300, 300) # Image size (300x300) rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol) drawer.FinishDrawing() image_data = drawer.GetDrawingText() return base64.b64encode(image_data).decode("utf-8") # Create a PyVis network net = Network(notebook=True) # Load RDKit molecule smiles = "CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C" mol = Chem.MolFromSmiles(smiles) # Generate image of molecule mol_image_base64 = generate_molecule_image(mol) # Add node to network # - Image is added using shape="circularImage" or shape="image" net.add_node(1, label="My molecule", title="More information", shape="circularImage", image=f"data:image/png;base64,{mol_image_base64}") # Visualize the network net.show("single_molecule_network.html")
Adding an edge
An edge is defined between two nodes by using net.add_edge(from_node_id, to_node_id)
. In the example below I have created two nodes with indexes 1 and 2 respectively, and added an edge between them using net.add_edge(1, 2)
.
Some more customization options: Nodes can be customized with color
(set color of node), and borderWidth
(border thickness of the node). Edges can be customized with label
(text shown on edge), title
(text shown when hovering), and color
(color of edge).
More customization options exist, please see PyVis documentation.
# Create a PyVis network net = Network(height="800px", width="1200px", notebook=True) # Load RDKit molecules smiles1 = "CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C" smiles2 = "CN1CC(=O)N2[C@@H](C1=O)CC3=C([C@H]2C4=CC5=C(C=C4)OCO5)NC6=CC=CC=C36" mol1 = Chem.MolFromSmiles(smiles1) mol2 = Chem.MolFromSmiles(smiles2) # Generate images of molecules mol1_image_base64 = generate_molecule_image(mol1) mol2_image_base64 = generate_molecule_image(mol2) # Add molecules to network # - In this network molecules are indexed by indices 1 and 2. Index can also be a string. net.add_node(1, label="Sildenafil", title="Potency = XXX", shape="circularImage", image=f"data:image/png;base64,{mol1_image_base64}", color="red", borderWidth=3) net.add_node(2, label="Tadalafil", title="Potency = YYY", shape="circularImage", image=f"data:image/png;base64,{mol2_image_base64}", color="blue", borderWidth=3) # Make edge between molecules # - Edge is formed between nodes of index 1 and 2. net.add_edge(1, 2, label="PDE5", title="Protein target", color="purple") # Visualize the network net.show("two_molecule_network.html")
Visualizing set of elaborations and color by score
Now that we can add nodes and edges, all that is left is to automatically generate some networks. In this example I have defined an arbitrary core smiles, a list of possible R group, and a list of random scores.
To generate the network I loop through each R group, add a node to the network, and add an edge to my core molecule. Here I introduce my simple function get_color()
which maps a value to a color according to a heat map. For each R group in my network I color code the node using this score. You can imagine this can be used for visualizing potency or other molecular properties.
import random import matplotlib.pyplot as plt import matplotlib.colors as mcolors # Function to map a score to a heatmap color def get_color(score, vmin=-1, vmax=1): cmap = plt.cm.coolwarm # Use a heatmap color map norm = mcolors.Normalize(vmin=vmin, vmax=vmax) # Normalize scores between 0 and 1 rgba = cmap(norm(score)) hex_color = mcolors.rgb2hex(rgba) # Convert RGBA to hex color return hex_color # Molecule data core_smiles = "C1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)[*:1]))C" r_group_smiles = ["*C","*O","*C(=O)N","*c1nocc1","*C(F)(F)F","*c1ccncc1","*OCC",] scores = [random.uniform(-1, 1) for s in r_group_smiles] # Create a PyVis network and add the node with the molecule image net = Network(height="700px", width="1000px", notebook=True) # Add Core molecule to network core_mol = Chem.MolFromSmiles(core_smiles) mol_image_base64 = generate_molecule_image(core_mol) net.add_node("Core", label="Core", title="Core", shape="circularImage", image=f"data:image/png;base64,{mol_image_base64}") # --- Add R groups --- for idx, (smiles, score) in enumerate(zip(r_group_smiles, scores)): r_mol = Chem.MolFromSmiles(smiles) color = get_color(score) mol_image_base64 = generate_molecule_image(r_mol) net.add_node(smiles, label="", title=smiles, shape="circularImage", image=f"data:image/png;base64,{mol_image_base64}", color=color, borderWidth=3) net.add_edge(smiles, "Core", width=3) # Visualize the graph net.show("elaboration_network.html")
Visualizing Free Wilson coefficients
Finally, I want to show a more complex example to demonstrate a possible use case for this visualization. Here I run the beginning of a Free Wilson analysis (a classic cheminformatics analysis that everyone should know). I have taken the analysis code and example directly from Pat Walters’ tutorial. For those not familiar with this method, we take a set of ligands which all share a core, decompose all the R groups, then run a simple regression model to predict which combination of R group would lead to optimal properties.
In this case I have run only the first half of the analysis which runs the regressions and generates R groups along with their coefficients describing how positively they contribute to potency (a larger positive coefficient is better).
I use the molecule network to visualize all R groups explored in this analysis. I have added two layers to my network: the first isolates the vector on the core, and the second shows all R groups elaborated at this position. The R groups are color coded by their contribution to potency, with dark red being more favorable and dark blue being least favorable R groups.
#------------ Do free wilson calculation -------------- # I only run the first half of the calculation which obtained coefficients for each R group # The second part involves enumerating combinations of R group to identify promising combinations # This section of code is taken from Pat Walters' Free Wilson Analysis # https://practicalcheminformatics.blogspot.com/2018/05/free-wilson-analysis.html # https://github.com/PatWalters/practical_cheminformatics_tutorials # https://colab.research.google.com/github/PatWalters/practical_cheminformatics_tutorials/blob/main/sar_analysis/free_wilson.ipynb from sklearn.preprocessing import OneHotEncoder from sklearn.linear_model import Ridge # Get dataset from Pat Walters input_filename = "https://raw.githubusercontent.com/PatWalters/practical_cheminformatics_tutorials/main/data/CHEMBL313_sel.smi" df = pd.read_csv(input_filename) df['mol'] = df.SMILES.apply(Chem.MolFromSmiles) # Define core molecule core_smiles = "c1ccc(C2CC3CCC(C2)N3)cc1" core_mol = Chem.MolFromSmiles(core_smiles) # Run R group decomposition and construct DataFrame ps = rdRGroupDecomposition.RGroupDecompositionParameters() ps.allowMultipleRGroupsOnUnlabelled = True match, miss = RGroupDecompose(core_mol,df.mol.values,asSmiles=True, options=ps) rgroup_df = pd.DataFrame(match) core_df = pd.DataFrame({"mol" : [Chem.MolFromSmiles(x) for x in rgroup_df.Core.unique()]}) # Get unique R groups unique_list = [] print("Total unique R groups") for r in rgroup_df.columns[1:]: num_rgroups = len(rgroup_df[r].unique()) print(f" {r} {num_rgroups}") unique_list.append(rgroup_df[r].unique()) # Featurise and run ridge regression enc = OneHotEncoder(categories=unique_list,sparse_output=False) one_hot_mat = enc.fit_transform(rgroup_df.values[:,1:]) ridge = Ridge() ridge.fit(one_hot_mat,df.pIC50) # Extract data from regression rg_df_dict = {} start = 0 rgroup_names = rgroup_df.columns[1:] for rg,name in zip(enc.categories_,rgroup_names): rg_mol_list = [Chem.MolFromSmiles(x) for x in rg] coef_list = ridge.coef_[start:start+len(rg)] start += len(rg) rg_df = pd.DataFrame({"smiles":rg, "mol": rg_mol_list, "coef": coef_list}) rg_df.sort_values("coef",inplace=True) rg_df_dict[name] = rg_df # ----------- Visualising coefficients as Graph------------------- # Initalise network net = Network(height="1000px", width="1900px", notebook=True) # Add core molecule node core_image_base64 = generate_molecule_image(core_df.iloc[0]["mol"]) net.add_node("Core", label="Core", title="Core", shape="circularImage", image=f"data:image/png;base64,{core_image_base64}", borderWidth=3, color="grey") for r in rg_df_dict: net.add_node(r, label=r, title=r, color="grey") net.add_edge("Core", r, width=3, color="grey", label=r) # Connect core to vector for index, row in rg_df_dict[r].iterrows(): node_id = f"{row['smiles']}" score = row['coef'] score_string = str(round(score, 3)) color = get_color(score) mol_image = generate_molecule_image(row["mol"]) net.add_node(node_id, label=score_string, title=node_id, shape="circularImage", image=f"data:image/png;base64,{mol_image}", borderWidth=3, color=color) net.add_edge(r, node_id, width=3, color=color) # Connect core to vector net.show("free_wilson_coefficient_network.html")
If you run this method yourself, the generated networks will be interactive. In this case I have moved around the nodes so that they are arranged by score. You can also quickly see that the most favorable group in the R3 position is the ester, and second most favorable is isoxazole. You can also see an interesting insight that adding any methyl groups to the isoxazole appears unfavorable.
This is only one small example of what this network visualization can do. I have used this method for making SAR visualization tools, and the paper rdScaffoldNetwork: The Scaffold Network Implementation in RDKit used PyVis for visualizing scaffold networks.
Happy networking!
– Nicholas Runcie