Accelerating Simulations with CUDA MPS: An OpenMM Implementation Guide
Introduction
High-performance molecular dynamics simulations often require running concurrent simulations on GPUs. However, traditional GPU resource allocation can lead to inefficient utilization when running multiple processes, with users often resorting to multiple GPUs to achieve this. While parallelising across nodes can improve time to solution, many processes require coordination and hence communication, which quickly becomes a bottleneck. This is exacerbated with more powerful hardware, as communication within a node for a single simulation on a single GPU can also become a bottleneck. This problem has been addressed for CPU parallelism with multiprocessing and multithreading, but it was previously challenging to do efficiently on GPUs.
NVIDIA’s Multi-Process Service (MPS) offers a solution by enabling efficient and easy sharing of GPU resources among multiple processes with just a few commands. In this blog post, we’ll explore how to implement CUDA MPS with Python multiprocessing and OpenMM to accelerate molecular dynamics simulations.
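As a rough sketch of how the pieces fit together, the snippet below starts the MPS control daemon, caps each client process at a share of the GPU and fans replicas out with Python multiprocessing. The `nvidia-cuda-mps-control` command and the `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable belong to NVIDIA's MPS tooling; the helper names, the even split of the GPU and the `run_replica` placeholder (where an OpenMM simulation would be built and run) are assumptions made for illustration only.

```python
import multiprocessing as mp
import os
import shutil
import subprocess


def mps_thread_percentage(n_workers):
    """Evenly split the GPU between workers (an illustrative policy, not a rule)."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return max(1, 100 // n_workers)


def start_mps_daemon():
    """Start the MPS control daemon if the NVIDIA binary is on PATH."""
    if shutil.which("nvidia-cuda-mps-control") is None:
        return False  # MPS not available; processes fall back to time-slicing
    subprocess.run(["nvidia-cuda-mps-control", "-d"], check=True)
    return True


def stop_mps_daemon():
    """Ask a running MPS control daemon to shut down."""
    subprocess.run(["nvidia-cuda-mps-control"], input="quit\n", text=True, check=True)


def run_replica(replica_id):
    """Hypothetical worker: build and run one OpenMM simulation here."""
    # Placeholder so the sketch stays runnable without a GPU or OpenMM.
    return replica_id


def run_all(n_workers):
    """Launch one process per replica; children inherit the MPS limit."""
    os.environ["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(
        mps_thread_percentage(n_workers)
    )
    # "spawn" avoids forking a parent that may already hold a CUDA context.
    ctx = mp.get_context("spawn")
    with ctx.Pool(n_workers) as pool:
        return pool.map(run_replica, range(n_workers))
```

A script would typically call `start_mps_daemon()`, then `run_all(4)` from under an `if __name__ == "__main__":` guard, and finally `stop_mps_daemon()` if the daemon was started.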
I first stumbled upon OPIG blogs through a post on ligand-binding thermodynamics, which refreshed my understanding of some thermodynamics concepts from undergrad, bringing me face-to-face with the concept that made most molecular physics students break out in cold sweats: Entropy. Entropy is that perplexing measure of disorder and randomness in a system. In the context of molecular dynamics (MD) simulations, it quantifies the conformational freedom and disorder within protein molecules, which becomes particularly relevant when calculating binding free energies.
In MD, MM/GBSA and MM/PBSA are fancy terms for trying to predict how strongly molecules stick together and are the go-to methods for binding free energy calculations. MM/PBSA uses the Poisson–Boltzmann (PB) equation to account for solvent polarisation and ionic effects accurately but at a high computational cost. MM/GBSA approximates PB using the Generalised Born (GB) model, offering faster calculations suitable for large systems, though with reduced accuracy. Consider MM/PBSA as the careful accountant who considers every detail but takes forever, while MM/GBSA is its faster, slightly less accurate coworker who gets the job done when you’re in a hurry.
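For reference, the end-point estimate that both methods share is usually written as below. The notation is the standard textbook one rather than anything taken from the original post: \Delta E_{\text{MM}} is the gas-phase molecular mechanics energy change, \Delta G_{\text{solv}} is the solvation free energy change (a PB or GB polar term plus a surface-area nonpolar term), and -T\Delta S is the conformational entropy contribution.

```latex
\Delta G_{\text{bind}}
  = \langle G_{\text{complex}} \rangle
  - \langle G_{\text{receptor}} \rangle
  - \langle G_{\text{ligand}} \rangle
  \approx \Delta E_{\text{MM}} + \Delta G_{\text{solv}} - T\Delta S
```

The last term is the entropy contribution, and it is the one most often dropped to save compute.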
Like many before me, I made the classic error of ignoring entropy, assuming that entropy changes that were similar across systems being compared would have their terms cancel out and could be neglected. This would simplify calculations and ease computational constraints (in other words it was too complicated, and I had deadlines breathing down my neck). This worked fine… until it didn’t. The wake-up call came during a project studying metal-isocitrate complexes in IDH1. For context, IDH1 is a homodimer with a flexible ‘hinge’ region that becomes unstable without its corresponding subunit, giving rise to very high fluctuations. By ignoring entropy in this unstable system, I managed to generate binding free energy results that violated several laws of thermodynamics and would make Clausius roll in his grave.
For a long time, crystallographers and subsequently the authors of AlphaFold2 had you believe that proteins are a static group of atoms written to a .pdb file. Turns out this was a HOAX. If you don’t want to miss out on the latest trend of working with dynamic structural ensembles of proteins, this blog post is exactly right for you. MDAnalysis is a Python package which, as the name says, was designed to analyse molecular dynamics simulations and lets you work with trajectories of protein structures easily.
The spatial or 3D structure of a molecule is particularly relevant to modeling its activity in QSAR. The 3D structural information affects molecular properties and chemical reactivities, and thus it is important to incorporate it in deep learning models built for molecules. A key aspect of the spatial structure of molecules is the flexible distribution of their constituent atoms, known as conformation. Given the temperature of a molecular system, the probability of each of its possible conformations is defined by its formation energy, and this follows a Boltzmann distribution [McQuarrie and Simon, 1997]. The Boltzmann distribution tells us the probability of a certain conformation given its potential energy. The different conformations of a molecule could result in different properties and activity. Therefore, it is imperative to consider multiple conformers in molecular deep learning to ensure that the notion of conformational flexibility is embedded in the model developed. The model should also be able to capture the Boltzmann distribution of the potential energy related to the conformers.
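To make the Boltzmann weighting concrete, here is a minimal sketch that turns a list of conformer energies into populations. The function name, the choice of kcal/mol units and the example energies are assumptions for illustration; the formula itself is the usual p_i = exp(-E_i / kT) / Z.

```python
import math

# Gas constant in kcal/(mol*K), so energies are assumed to be molar, in kcal/mol.
R_KCAL = 0.0019872041


def boltzmann_weights(energies_kcal, temperature=298.15):
    """Return the Boltzmann population of each conformer at the given temperature."""
    beta = 1.0 / (R_KCAL * temperature)
    # Shift by the minimum energy for numerical stability; the weights are unchanged.
    e_min = min(energies_kcal)
    factors = [math.exp(-beta * (e - e_min)) for e in energies_kcal]
    partition = sum(factors)
    return [f / partition for f in factors]


# Hypothetical relative energies of three conformers, in kcal/mol.
weights = boltzmann_weights([0.0, 0.5, 2.0])
```

Lower-energy conformers receive larger weights, and the weights always sum to one, so they can be used directly to average a per-conformer property.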
On 5th April 2024, over 60 researchers braved the train strikes and gusty weather to gather at Lady Margaret Hall in Oxford and engage in a day full of scientific talks, posters and discussions on the topic of adaptive immune receptor (AIR) analysis!
Some time ago, I needed to find a way to computationally estimate conductance values for every protein frame from several molecular dynamics (MD) trajectories.
In a previous post, I wrote about how to clean the resulting instant conductance timeseries from outliers. But I never described how I generated these timeseries.
In this post, I will show how you can parallelise the computation of instant conductance given an MD trajectory. I will touch on the difficulties of this process and explain why I had to implement a custom tool for it, given that MDAnalysis seems to already have a routine of this sort. Finally, I will provide two Python scripts that you can easily adapt to run your parallel calculations – for which I’ll provide some important notes you don’t wanna skip.
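The general pattern behind this kind of per-frame parallelism can be sketched as follows: split the frame indices into contiguous blocks, hand each block to a worker process, and stitch the results back together in order. This is not the author's script; `analyse_frame` is a hypothetical stand-in for the conductance calculation, and the other names are invented for the example.

```python
from concurrent.futures import ProcessPoolExecutor


def chunk_frames(n_frames, n_chunks):
    """Split range(n_frames) into up to n_chunks contiguous (start, stop) blocks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(n_frames, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        if stop > start:  # skip empty blocks when there are more chunks than frames
            bounds.append((start, stop))
        start = stop
    return bounds


def analyse_frame(frame_index):
    """Hypothetical per-frame analysis; replace with the real calculation."""
    return float(frame_index)


def process_chunk(bounds):
    """Analyse one block of frames; each worker would open its own trajectory reader."""
    start, stop = bounds
    return [(i, analyse_frame(i)) for i in range(start, stop)]


def run_parallel(n_frames, n_workers):
    """Analyse all frames across n_workers processes, preserving frame order."""
    chunks = chunk_frames(n_frames, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in the order of the input chunks.
        return [item for block in pool.map(process_chunk, chunks) for item in block]
```

Calling `run_parallel(1000, 8)` from under an `if __name__ == "__main__":` guard would return a list of `(frame_index, value)` pairs in frame order.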
Chemoinformatics uses a curious jumble of terms from thermodynamics, wet-lab techniques and statistical terminology, which is at its most jarring, it could be argued, in machine learning. In some datasets one often sees pIC50, pEC50, pKi and pKD; in discussion sections a medchemist may talk casually of entropy; whereas in the world of molecular mechanics everything is internal energy. Herein I hope to address some common misconceptions and unify these concepts.
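As a small illustration of how two of these quantities relate, the sketch below converts a molar concentration to its "p" value and a dissociation constant to a standard binding free energy. The function names are made up for the example, and the free energy assumes a 1 M standard state and units of kcal/mol.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.0019872041


def p_value(concentration_molar):
    """Negative log10 of a molar concentration, as in pIC50, pEC50, pKi or pKD."""
    return -math.log10(concentration_molar)


def delta_g_from_kd(kd_molar, temperature=298.15):
    """Standard binding free energy in kcal/mol, assuming a 1 M standard state."""
    return R_KCAL * temperature * math.log(kd_molar)


# A hypothetical 1 nM binder.
pkd = p_value(1e-9)
dg = delta_g_from_kd(1e-9)
```

Note that only a true dissociation constant maps onto a free energy this way; an IC50 or EC50 depends on assay conditions, so treating its "p" value as a free energy is an approximation.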
Being a jack of all trades brings to mind someone who has extensive multidisciplinary expertise and is equipped with many tools in their toolbox to solve different problems. A jack of all trades is a great succinct description for computational chemists in drug discovery.
Recently I had a great conversation with Dr. Arjun Narayanan, a Senior Research Scientist at Vertex Pharmaceuticals and a jack of all trades as a computational chemist. In this blog post, I’ll describe what he does as a computational chemist, the problems he solves, and the new tools he’s looking forward to adding to his toolbox.
The program cpptraj, written by Daniel Roe (https://github.com/Amber-MD/cpptraj) and distributed Open Source with the AmberTools package (https://ambermd.org/AmberTools.php), is a powerful tool for analysis of molecular dynamics simulations. In addition to all of the expected analyses like Root Mean Square Deviation and native contacts, cpptraj also includes a suite of vector algebra functions.
While this vector algebra functionality is fairly well known and easy to find in the documentation, I think it is less well known that cpptraj can write trajectories of the computed vectors. These trajectories can then be loaded into Visual Molecular Dynamics (VMD) alongside the analysed trajectory and played as a movie. This functionality is a valuable tool for debugging your vector calculations to make sure they are doing precisely what you intend. It may also prove useful for generating visualizations of vectors alongside molecular structures for publications.
The cpptraj script below reads in an Amber parameter file and coordinate file and then calculates the angle between two planes.
The first plane is defined by two vectors in the plane of the guanidino group of a R65 residue (v1 and v2); the second plane is defined by two vectors between CA atoms of amino acids in the alpha helix containing R65 (v3 and v4). The first two vectormath calls determine the normal vectors to the planes and the final vectormath line computes the angle between the normal vectors. Taken together, these commands compute the angle between the arginine side chain and a plane passing through the CA atoms of the alpha helix. Let’s check that the vectors {v1, v2, v3, v4} are being computed correctly.
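The geometry described here can be cross-checked independently of cpptraj. The sketch below is plain Python rather than cpptraj syntax: it takes two in-plane vectors for each plane, forms the normals with cross products and returns the angle between them. The function names and the example vectors are assumptions for illustration.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def plane_angle_degrees(v1, v2, v3, v4):
    """Angle between the plane spanned by (v1, v2) and the plane spanned by (v3, v4)."""
    n1 = cross(v1, v2)  # normal to the first plane
    n2 = cross(v3, v4)  # normal to the second plane
    cos_theta = dot(n1, n2) / (math.sqrt(dot(n1, n1)) * math.sqrt(dot(n2, n2)))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# The xy plane against the xz plane.
angle = plane_angle_degrees((1, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1))
```

The sign of each normal depends on the order of the two vectors in the cross product, so swapping v1 and v2 turns an angle of x into 180 - x; this is exactly the kind of thing that is easy to get wrong and worth inspecting visually.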
The resulting vector trajectory vectors.mol2 can be loaded directly into VMD without a topology. Note that in this case we only analyzed a single frame, but you can run this same procedure on DCD files, too. This is what I get when I load the vectors into VMD alongside the structure:
I hope this vector trajectory functionality will be helpful to a few people who like to neurotically check their analyses like I do. You can download the example prmtop and rst7 files below. Note that you should rename them to remove the extra “.txt” file extension before attempting to use them for anything.
The information in this blog post is adapted from an Amber Archive post from Daniel Roe, dated 30-Oct-2018: http://archive.ambermd.org/201811/0058.html
Have you ever had an annoying dataset that looks something like this?
or even worse, just several of them
In this blog post, I will introduce basic techniques you can use and implement with Python to identify and clean outliers. The objective will be to get something more eye-pleasing (and mostly less troublesome for further data analysis) like this
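One basic technique of this kind, shown here as an example rather than as the method the post goes on to use, is to flag points whose modified z-score (based on the median and the median absolute deviation) exceeds a threshold. The function name, the example data and the 3.5 cutoff are assumptions; the cutoff is a commonly quoted rule of thumb.

```python
import statistics


def mad_outlier_mask(values, threshold=3.5):
    """Return a list of booleans, True where a value is flagged as an outlier."""
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        # At least half the values equal the median; the score is undefined.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [0.6745 * d / mad > threshold for d in deviations]


# Hypothetical noisy series with one obvious spike.
series = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]
mask = mad_outlier_mask(series)
cleaned = [v for v, is_outlier in zip(series, mask) if not is_outlier]
```

Because the median and MAD are barely affected by a few extreme values, this tends to behave better than a mean and standard deviation cutoff on spiky data.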