Molecule Networks: data visualization using PyVis

Over the past few years I have explored different data visualization strategies with the goal of rapidly communicating information to medicinal chemists. I have recently fallen in love with “molecule networks” as an intuitive and interactive data visualization strategy. This blog gives a brief tutorial on how to start generating your own molecule networks.

Visualizing one molecule

To generate the networks I use PyVis. The package is simple to use, and they have a great tutorial. There’s only two steps to generating a network: add some nodes, then add some edges.

To add a node containing a molecule, use the code below. Here I have written the function generate_molecule_image() which takes an RDKit molecule and creates an image. A node can then be defined using net.add_node(node_id,shape="circularImage", image=image_path). Then just visualize the network using net.show().

There are multiple keyword arguments that can be passed in to change the appearance of nodes. We will see more in the rest of this post, but the first two I share are label (text displayed under the node), and title (text displayed when you hover your mouse over the node)

Note: Sometimes molecules are not visualized in Jupyter notebook. If you have a network in Jupyter which shows nodes but no molecules, try opening the .html in a browser.

import pandas as pd
import numpy as np
import base64
from PIL import Image
from pyvis.network import Network

from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit.Chem.rdRGroupDecomposition import RGroupDecompose
from rdkit.Chem import rdRGroupDecomposition
from rdkit import RDLogger
RDLogger.DisableLog('rdApp.*')


# Function to generate a molecule image as a base64 string
def generate_molecule_image(mol):
    drawer = rdMolDraw2D.MolDraw2DCairo(300, 300)  # Image size (300x300)
    rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol)
    drawer.FinishDrawing()
    image_data = drawer.GetDrawingText()
    return base64.b64encode(image_data).decode("utf-8")

# Create a PyVis network
net = Network(notebook=True)

# Load RDKit molecule
smiles = "CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C"
mol = Chem.MolFromSmiles(smiles)

# Generate image of molecule
mol_image_base64 = generate_molecule_image(mol)

# Add node to network
# - Image is added using shape="circularImage" or shape="image"
net.add_node(1, 
             label="My molecule", 
             title="More information", 
             shape="circularImage", 
             image=f"data:image/png;base64,{mol_image_base64}")

# Visualize the network
net.show("single_molecule_network.html")

Adding an edge

An edge is defined between two nodes by using net.add_edge(from_node_id, to_node_id). In the example below I have created two nodes with indexes 1 and 2 respectively, and added an edge between them using net.add_edge(1, 2).

Some more customization options: Nodes can be customized with color (set color of node), and borderWidth (border thickness of the node). Edges can be customized with label (text shown on edge), title (text shown when hovering), and color (color of edge).

More customization options exist, please see PyVis documentation.

# Create a PyVis network
net = Network(height="800px", width="1200px", notebook=True)

# Load RDKit molecules
smiles1 = "CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C"
smiles2 = "CN1CC(=O)N2[C@@H](C1=O)CC3=C([C@H]2C4=CC5=C(C=C4)OCO5)NC6=CC=CC=C36"
mol1 = Chem.MolFromSmiles(smiles1)
mol2 = Chem.MolFromSmiles(smiles2)

# Generate images of molecules
mol1_image_base64 = generate_molecule_image(mol1)
mol2_image_base64 = generate_molecule_image(mol2)

# Add molecules to network
# - In this network molecules are indexed by indices 1 and 2. Index can also be a string.
net.add_node(1, 
             label="Sildenafil", 
             title="Potency = XXX", 
             shape="circularImage", 
             image=f"data:image/png;base64,{mol1_image_base64}", 
             color="red", 
             borderWidth=3)

net.add_node(2, 
             label="Tadalafil", 
             title="Potency = YYY", 
             shape="circularImage", 
             image=f"data:image/png;base64,{mol2_image_base64}", 
             color="blue", 
             borderWidth=3)

# Make edge between molecules
# - Edge is formed between nodes of index 1 and 2.
net.add_edge(1, 2, label="PDE5", title="Protein target", color="purple")

# Visualize the network
net.show("two_molecule_network.html")

Visualizing set of elaborations and color by score

Now that we can add nodes and edges, all that is left is to automatically generate some networks. In this example I have defined an arbitrary core smiles, a list of possible R group, and a list of random scores.

To generate the network I loop through each R group, add a node to the network, and add an edge to my core molecule. Here I introduce my simple function get_color() which maps a value to a color according to a heat map. For each R group in my network I color code the node using this score. You can imagine this can be used for visualizing potency or other molecular properties.

import random
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# Function to map a score to a heatmap color
def get_color(score, vmin=-1, vmax=1):
    cmap = plt.cm.coolwarm  # Use a heatmap color map
    norm = mcolors.Normalize(vmin=vmin, vmax=vmax)  # Normalize scores between 0 and 1
    rgba = cmap(norm(score))
    hex_color = mcolors.rgb2hex(rgba)  # Convert RGBA to hex color
    return hex_color
    

# Molecule data
core_smiles = "C1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)[*:1]))C"
r_group_smiles = ["*C","*O","*C(=O)N","*c1nocc1","*C(F)(F)F","*c1ccncc1","*OCC",]
scores = [random.uniform(-1, 1) for s in r_group_smiles]

# Create a PyVis network and add the node with the molecule image
net = Network(height="700px", width="1000px", notebook=True)

# Add Core molecule to network
core_mol = Chem.MolFromSmiles(core_smiles)
mol_image_base64 = generate_molecule_image(core_mol)
net.add_node("Core", label="Core", title="Core", shape="circularImage", image=f"data:image/png;base64,{mol_image_base64}")

# --- Add R groups ---
for idx, (smiles, score) in enumerate(zip(r_group_smiles, scores)):
    r_mol = Chem.MolFromSmiles(smiles)
    color = get_color(score)
    
    mol_image_base64 = generate_molecule_image(r_mol)
    net.add_node(smiles, label="", title=smiles, shape="circularImage", image=f"data:image/png;base64,{mol_image_base64}", color=color, borderWidth=3)
    net.add_edge(smiles, "Core", width=3)

# Visualize the graph
net.show("elaboration_network.html")

Visualizing Free Wilson coefficients

Finally, I want to show a more complex example to demonstrate a possible use case for this visualization. Here I run the beginning of a Free Wilson analysis (a classic cheminformatics analysis that everyone should know). I have taken the analysis code and example directly from Pat Walters’ tutorial. For those not familiar with this method, we take a set of ligands which all share a core, decompose all the R groups, then run a simple regression model to predict which combination of R group would lead to optimal properties.

In this case I have run only the first half of the analysis which runs the regressions and generates R groups along with their coefficients describing how positively they contribute to potency (a larger positive coefficient is better).

I use the molecule network to visualize all R groups explored in this analysis. I have added two layers to my network: the first isolates the vector on the core, and the second shows all R groups elaborated at this position. The R groups are color coded by their contribution to potency, with dark red being more favorable and dark blue being least favorable R groups.

#------------ Do free wilson calculation --------------
# I only run the first half of the calculation which obtained coefficients for each R group
# The second part involves enumerating combinations of R group to identify promising combinations
# This section of code is taken from Pat Walters' Free Wilson Analysis
# https://practicalcheminformatics.blogspot.com/2018/05/free-wilson-analysis.html
# https://github.com/PatWalters/practical_cheminformatics_tutorials
# https://colab.research.google.com/github/PatWalters/practical_cheminformatics_tutorials/blob/main/sar_analysis/free_wilson.ipynb

from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import Ridge

# Get dataset from Pat Walters
input_filename = "https://raw.githubusercontent.com/PatWalters/practical_cheminformatics_tutorials/main/data/CHEMBL313_sel.smi"
df = pd.read_csv(input_filename)
df['mol'] = df.SMILES.apply(Chem.MolFromSmiles)

# Define core molecule
core_smiles = "c1ccc(C2CC3CCC(C2)N3)cc1"
core_mol = Chem.MolFromSmiles(core_smiles)

# Run R group decomposition and construct DataFrame
ps = rdRGroupDecomposition.RGroupDecompositionParameters()
ps.allowMultipleRGroupsOnUnlabelled = True
match, miss = RGroupDecompose(core_mol,df.mol.values,asSmiles=True, options=ps)
rgroup_df = pd.DataFrame(match)
core_df = pd.DataFrame({"mol" : [Chem.MolFromSmiles(x) for x in rgroup_df.Core.unique()]})

# Get unique R groups
unique_list = []
print("Total unique R groups")
for r in rgroup_df.columns[1:]:
    num_rgroups = len(rgroup_df[r].unique())
    print(f" {r} {num_rgroups}")
    unique_list.append(rgroup_df[r].unique())

# Featurise and run ridge regression
enc = OneHotEncoder(categories=unique_list,sparse_output=False)
one_hot_mat = enc.fit_transform(rgroup_df.values[:,1:])
ridge = Ridge()
ridge.fit(one_hot_mat,df.pIC50)

# Extract data from regression
rg_df_dict = {}
start = 0
rgroup_names = rgroup_df.columns[1:]
for rg,name in zip(enc.categories_,rgroup_names):
    rg_mol_list = [Chem.MolFromSmiles(x) for x in rg]
    coef_list = ridge.coef_[start:start+len(rg)]
    start += len(rg)
    rg_df = pd.DataFrame({"smiles":rg, "mol": rg_mol_list, "coef": coef_list})
    rg_df.sort_values("coef",inplace=True)
    rg_df_dict[name] = rg_df


# ----------- Visualising coefficients as Graph-------------------
# Initalise network
net = Network(height="1000px", width="1900px", notebook=True)

# Add core molecule node
core_image_base64 = generate_molecule_image(core_df.iloc[0]["mol"])
net.add_node("Core", label="Core", title="Core", shape="circularImage", image=f"data:image/png;base64,{core_image_base64}", borderWidth=3, color="grey")

for r in rg_df_dict:
    net.add_node(r, label=r, title=r, color="grey")
    net.add_edge("Core", r, width=3, color="grey", label=r)  # Connect core to vector

    for index, row in rg_df_dict[r].iterrows():       
        node_id = f"{row['smiles']}"
        score = row['coef']
        score_string = str(round(score, 3))
        color = get_color(score)
        mol_image = generate_molecule_image(row["mol"])
        
        net.add_node(node_id,
                     label=score_string,
                     title=node_id,
                     shape="circularImage",
                     image=f"data:image/png;base64,{mol_image}",
                     borderWidth=3,
                     color=color)

        net.add_edge(r, node_id, width=3, color=color)  # Connect core to vector
        
net.show("free_wilson_coefficient_network.html")

If you run this method yourself, the generated networks will be interactive. In this case I have moved around the nodes so that they are arranged by score. You can also quickly see that the most favorable group in the R3 position is the ester, and second most favorable is isoxazole. You can also see an interesting insight that adding any methyl groups to the isoxazole appears unfavorable.

This is only one small example of what this network visualization can do. I have used this method for making SAR visualization tools, and the paper rdScaffoldNetwork: The Scaffold Network Implementation in RDKit used PyVis for visualizing scaffold networks.

Happy networking!
– Nicholas Runcie

Author