I first stumbled upon OPIG blogs through a post on ligand-binding thermodynamics, which refreshed my understanding of some thermodynamics concepts from undergrad and brought me face-to-face with the concept that makes most molecular physics students break out in cold sweats: entropy. Entropy is that perplexing measure of disorder and randomness in a system. In the context of molecular dynamics (MD) simulations, it quantifies the conformational freedom and disorder within protein molecules, which becomes particularly relevant when calculating binding free energies.
In MD, MM/GBSA and MM/PBSA are fancy terms for trying to predict how strongly molecules stick together, and they are the go-to methods for binding free energy calculations. MM/PBSA uses the Poisson–Boltzmann (PB) equation to account for solvent polarisation and ionic effects accurately, but at a high computational cost. MM/GBSA, meanwhile, approximates PB using the Generalised Born (GB) model, offering faster calculations suitable for large systems, though with reduced accuracy. Consider MM/PBSA as the careful accountant who considers every detail but takes forever, while MM/GBSA is its faster, slightly less accurate coworker who gets the job done when you’re in a hurry.
Like many before me, I made the classic error of ignoring entropy, assuming that, because entropy changes were similar across the systems being compared, their terms would cancel out and could be neglected. This would simplify calculations and ease computational constraints (in other words, it was too complicated, and I had deadlines breathing down my neck). This worked fine… until it didn’t. The wake-up call came during a project studying metal-isocitrate complexes in IDH1. For context, IDH1 is a homodimer with a flexible ‘hinge’ region that becomes unstable without its corresponding subunit, giving rise to very high fluctuations. By ignoring entropy in this unstable system, I managed to generate binding free energy results that violated several laws of thermodynamics and would make Clausius roll in his grave.
My then-supervisor commented:
Funny enough, a year after finishing my master’s, an old lab mate messaged me on LinkedIn (yes, LinkedIn) to chat about entropic terms:
So, I revisited the theory, application, and implementation of entropic contributions in free energy calculations, focusing on their use in MM/PBSA and MM/GBSA methods, and here provide a high-level overview. More detailed reviews can be found in the works of Homeyer and Gohlke (2012), Steinbrecher and Labahn (2010), and Miller et al. (2008).
Fundamentals of Molecular Dynamics and Mechanics
We first need to understand MD simulation. This is a method that numerically integrates Newton’s equations of motion to model the microscopic behaviour of molecules. It updates atomic positions and velocities over discrete time steps using algorithms like the Verlet and Velocity Verlet methods (these algorithms are explained in more detail on my GitHub repo). By incorporating thermostats and barostats, MD simulations can operate under various ensembles (NVE, NVT, NPT), enabling the study of systems under different thermodynamic conditions.
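To make the integration loop concrete, here is a minimal Velocity Verlet sketch in Python. This is a toy implementation, not production MD code; the harmonic force in the example is just an illustrative stand-in for a real force field.

```python
import numpy as np

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations of motion with the Velocity Verlet scheme."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # recompute forces at new positions
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
    return x, v

# Toy system: 1D harmonic oscillator with k = m = 1, so the period is 2*pi
x, v = velocity_verlet(np.array([1.0]), np.array([0.0]),
                       force=lambda x: -x, mass=1.0,
                       dt=0.01, n_steps=628)
# After roughly one period the particle returns near x = 1, v = 0
```

In a real MD engine the force callable would evaluate the full force-field gradient, and thermostats/barostats would modify the velocity update.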
MD simulations are built on the principles of molecular mechanics (MM) – computational methods that model molecular systems using classical physics. This approach calculates molecular structures, energies, and properties by treating atoms as particles and bonds as springs. At its heart are force fields – mathematical functions and parameters that describe the system’s potential energy. The total potential energy (E) of a system is expressed as the sum of individual energy terms representing different interactions:
E = Ebonded + Enon-bonded
The Ebonded interactions include bond stretching (modelled using Hooke’s Law or Morse potential), angle bending (energy from deviations in equilibrium bond angles), and dihedral angles (energy from rotation around bonds). The Enon-bonded interactions consist of van der Waals forces (modelled using Lennard-Jones potential) and electrostatic interactions (calculated using Coulomb’s Law).
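As a concrete illustration of the non-bonded terms, here is a minimal Python sketch of the 12-6 Lennard-Jones potential and Coulomb’s law. The parameter values in the example (roughly argon-like σ and ε, and a Coulomb constant in kcal·Å/(mol·e²)) are purely illustrative, not taken from any particular force field.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones potential: short-range repulsion plus dispersion."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def coulomb(r, q1, q2, k=332.06):
    """Coulomb's law; k in kcal*Angstrom/(mol*e^2) gives energies in kcal/mol."""
    return k * q1 * q2 / r

# The LJ minimum sits at r = 2^(1/6) * sigma with well depth -epsilon
r_min = 2.0 ** (1.0 / 6.0) * 3.4
e_min = lennard_jones(r_min, epsilon=0.238, sigma=3.4)  # approximately -0.238
```

Summing these pairwise terms over all atom pairs (with appropriate cutoffs and exclusions) gives the Enon-bonded contribution above.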
The potential energy functions in force fields are parameterised using experimental data or high-level quantum mechanical calculations. Key parameters include force constants, equilibrium bond lengths, and partial charges – all critical for accurate simulations. However, MM does not account for electronic polarisation or bond breaking/forming. Additionally, modelling chemical reactions requires specialised force fields, and overall accuracy depends heavily on parameter quality.
While MM and MD simulations provide detailed insights into molecular structures and dynamics, they cannot accurately compute thermodynamic properties like free energies. Advanced computational methods such as Free Energy Perturbation (FEP), replica exchange FEP, and thermodynamic integration are needed to estimate these properties, particularly binding free energies, but these become computationally demanding for large systems and often need intermediate steps to converge properly.
End-state methods for binding free energy calculations
End-state free energy methods provide a more efficient alternative by eliminating the need to simulate these intermediate states. Using implicit solvent models further reduces computational demands by removing fluctuations from explicit solvent molecules. Consequently, end-state methods are widely used in computational studies. These methods are frequently used for two analyses: assessing the relative stability of multiple conformations within a system and determining the binding free energy of a non-covalently bound receptor-ligand complex, as shown in thermodynamic cycles in Figure 1. Stability calculations compare free energies of different conformations to determine their relative stabilities. For instance, when a biomolecule transitions from conformation A to B, the free energy change is calculated using equation 1:
ΔGtransition (A to B), solvated = ΔGsolvated, B – ΔGsolvated, A (1)
Similarly, to determine binding free energies, we calculate the difference between the free energy of the bound complex and the sum of the free energies of the unbound receptor and ligand using equation 2:
ΔGbinding (solvated) = ΔGsolvated, complex – (ΔGsolvated, receptor + ΔGsolvated, ligand) (2)
Think of it like figuring out how much you save on rent when two people share an apartment versus living separately, subtracting individual expenses from the combined cost. The changes in free energy for each term in equations 1 and 2 are estimated using equation 3:
ΔGtotal (solvated) = Egas + ΔGsolvation – T × Ssolute (3)
In equation 3, ΔGsolvation represents a true free energy because the solvent’s effects have been averaged out by using an implicit solvent model. Typically, the gas-phase energies (Egas) are derived from MM energies provided by force fields. The solvation energies (ΔGsolvation) are calculated using an implicit solvent model, and the entropy contribution (S) is estimated using established approximations mentioned below.
Figure 1: Thermodynamic cycle for binding free energy calculation, showing ligand and receptor in solvated (top) and vacuum (bottom) states. Free energies of binding in solution (ΔGsolvated binding) and in vacuum (ΔGvacuum binding) are illustrated, facilitating the calculation of binding free energy in solution.
The free energy of solvation is further broken down into electrostatic and nonpolar contributions. Several implicit solvent models are available to compute solvation free energies, including the PB/GB models mentioned above and the Reference Interaction Site Model (RISM). Note that, so far, these energy equations describe only single-point energies of one structural snapshot of the solute—kind of like judging a movie based on a single scene (hopefully one without spoilers). In practice, however, end-state calculations estimate these energies by averaging over an ensemble of representative structures generated by the MD simulations.
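A toy sketch of how equations 2 and 3 combine in an end-state workflow: average each energy term over the ensemble of snapshots, then take the complex-minus-parts difference. All per-frame numbers below are invented purely for illustration.

```python
import numpy as np

def average_free_energy(E_gas, dG_solv, S_solute, T=298.15):
    """Equation 3, averaged over snapshots:
    <G_solvated> = <E_gas> + <dG_solvation> - T * <S_solute>."""
    return np.mean(E_gas) + np.mean(dG_solv) - T * np.mean(S_solute)

def binding_free_energy(G_complex, G_receptor, G_ligand):
    """Equation 2: dG_bind = G_complex - (G_receptor + G_ligand)."""
    return G_complex - (G_receptor + G_ligand)

# Invented per-frame values (kcal/mol; entropies in kcal/(mol*K)),
# three snapshots per species
G_c = average_free_energy([-520.0, -518.0, -522.0], [-310.0, -312.0, -311.0], [0.95, 0.96, 0.94])
G_r = average_free_energy([-400.0, -401.0, -399.0], [-250.0, -251.0, -249.0], [0.80, 0.81, 0.79])
G_l = average_free_energy([-50.0, -51.0, -49.0], [-30.0, -31.0, -29.0], [0.10, 0.11, 0.09])
dG_bind = binding_free_energy(G_c, G_r, G_l)
```

In production runs (e.g. AMBER’s MMPBSA.py), the same bookkeeping happens over hundreds or thousands of frames, with the solvation and entropy terms computed by the implicit solvent model and the approximations described below.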
Accounting for Entropy (S)
In end-state calculations for relative stability and binding free energies, implicit solvent models often overlook certain contributions to the solute’s entropy. By treating biological systems as rigid rotors, translational and rotational entropies can be calculated using standard formulas from statistical mechanics. The vibrational entropy can be approximated using either normal mode analysis (NMA) or quasi-harmonic analysis (QHA).
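For the translational part, the rigid-rotor/ideal-gas treatment reduces to the Sackur–Tetrode equation. Here is a small Python sketch in SI units, assuming an ideal gas at 1 atm; the argon value at the end is a standard textbook check (its gas-phase entropy is almost purely translational).

```python
import numpy as np

# Physical constants (SI)
k_B = 1.380649e-23    # Boltzmann constant, J/K
h   = 6.62607015e-34  # Planck constant, J*s
N_A = 6.02214076e23   # Avogadro number, 1/mol
R   = k_B * N_A       # gas constant, J/(mol*K)

def translational_entropy(mass_amu, T=298.15, p=101325.0):
    """Sackur-Tetrode translational entropy of an ideal gas, in J/(mol*K)."""
    m = mass_amu * 1.66053906660e-27                 # amu -> kg
    # Translational partition function per molecule times the volume kT/p
    q = (2.0 * np.pi * m * k_B * T / h ** 2) ** 1.5 * (k_B * T / p)
    return R * (np.log(q) + 2.5)

# Argon at 298.15 K: the textbook value is about 154.8 J/(mol*K)
S_Ar = translational_entropy(39.948)
```

The rotational entropy follows analogously from the rigid-rotor partition function using the molecule’s principal moments of inertia.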
NMA involves calculating vibrational frequencies at local minima on the potential energy surface. This method is computationally intensive for large systems because it requires minimising each frame, constructing the Hessian matrix, and diagonalising it to obtain vibrational frequencies (eigenvalues). The computational cost scales approximately with the number of atoms N as (3N)^m, where m = 3 reflects the cubic scaling of diagonalising a 3N×3N matrix.
QHA is less demanding computationally but necessitates many ensemble frames to accurately estimate the total entropy, which increases the computational load of the initial simulation. This method approximates the eigenvalues of the mass-weighted covariance matrix, derived from the ensemble, as frequencies of global, orthogonal motions, allowing vibrational entropies to be calculated using standard formulas.
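A rough sketch of that QHA recipe, assuming the trajectory has already been aligned and mass-weighted (both assumptions on my part) and working in SI units: build the covariance matrix, map its eigenvalues to frequencies via ω = √(k_BT/λ), and sum the quantum harmonic-oscillator entropy per mode.

```python
import numpy as np

k_B  = 1.380649e-23     # Boltzmann constant, J/K
hbar = 1.054571817e-34  # reduced Planck constant, J*s

def quasi_harmonic_entropy(mw_coords, T=300.0):
    """Quasi-harmonic vibrational entropy (J/K, per molecule) from an
    ensemble of mass-weighted coordinates, shape (n_frames, 3N)."""
    cov = np.cov(mw_coords, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)
    # Discard near-zero eigenvalues (overall translation/rotation, noise)
    eigvals = eigvals[eigvals > 1e-12 * eigvals.max()]
    omega = np.sqrt(k_B * T / eigvals)     # quasi-harmonic frequencies
    x = hbar * omega / (k_B * T)
    # Standard quantum harmonic-oscillator entropy, summed over modes
    return k_B * np.sum(x / np.expm1(x) - np.log1p(-np.exp(-x)))

# Synthetic 'mass-weighted' displacements with a fixed spread, as a sanity check
rng = np.random.default_rng(0)
coords = rng.normal(scale=1e-11, size=(5000, 2))
S_vib = quasi_harmonic_entropy(coords, T=300.0)
```

Real implementations (e.g. cpptraj) handle the alignment, mass-weighting, and removal of global translation/rotation explicitly; this sketch only shows the core covariance-to-entropy step.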
A Standard NMA Implementation
Since I have more hands-on experience with NMA, I will focus on its implementation details over QHA. A more detailed application can be found on the AMBER tutorials.
Implementing NMA in MD simulations involves crucial steps to ensure accurate results. It begins with the system’s energy minimisation to reach an equilibrium geometry. Common algorithms, including steepest descent, conjugate gradient, and Newton-Raphson methods, are used to do this, and you can see an implementation of these here. This step is followed by constructing the Hessian matrix containing second derivatives of the potential energy. This can be calculated by analytical derivatives or finite differences, which involves slightly displacing atoms and computing changes in forces (very expensive!). The next steps involve mass-weighting to form a dynamical matrix, which is then diagonalised to obtain eigenvalues and eigenvectors representing vibrational frequencies and atomic displacement patterns. I tend to use AMBER, but several software packages like GROMACS and NAMD/VMD facilitate NMA calculations, though the choice depends on system size and computational resources.
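The Hessian-building, mass-weighting, and diagonalisation steps above can be sketched on a toy system. This finite-difference version is purely illustrative (real packages like AMBER use analytical second derivatives); the 1D two-atom spring at the end is an invented example where the answer is known analytically.

```python
import numpy as np

def numerical_hessian(x0, potential, h=1e-5):
    """Hessian of the potential via central finite differences of the gradient."""
    n = x0.size
    def grad(x, d=1e-6):
        g = np.zeros(n)
        for i in range(n):
            xp, xm = x.copy(), x.copy()
            xp[i] += d; xm[i] -= d
            g[i] = (potential(xp) - potential(xm)) / (2 * d)
        return g
    H = np.zeros((n, n))
    for i in range(n):
        xp, xm = x0.copy(), x0.copy()
        xp[i] += h; xm[i] -= h
        H[:, i] = (grad(xp) - grad(xm)) / (2 * h)
    return 0.5 * (H + H.T)  # symmetrise against round-off

def normal_modes(H, masses):
    """Mass-weight the Hessian into a dynamical matrix and diagonalise it.
    Eigenvalues are squared angular frequencies; eigenvectors are the modes."""
    inv_sqrt_m = 1.0 / np.sqrt(np.repeat(masses, len(H) // len(masses)))
    D = H * np.outer(inv_sqrt_m, inv_sqrt_m)
    return np.linalg.eigh(D)

# Toy 1D 'diatomic': two unit masses joined by a spring with k = 1
V = lambda x: 0.5 * (x[1] - x[0] - 1.0) ** 2
H = numerical_hessian(np.array([0.0, 1.0]), V)
w2, modes = normal_modes(H, np.array([1.0, 1.0]))
# Expect one zero mode (overall translation) and one with omega^2 = k*(1/m1 + 1/m2) = 2
```

For a 3N-dimensional biomolecular Hessian the same pipeline applies, with six near-zero eigenvalues for overall translation and rotation and the rest feeding the vibrational entropy formula.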
Limitations and Challenges of NMA
Outside of its ridiculous computational costs, NMA is limited because it assumes atoms vibrate around the equilibrium position with small amplitudes. But at higher temperatures or in flexible molecules, anharmonicity becomes significant, and the harmonic approximation may fail. Essentially, NMA works like predicting how a swing moves, but it makes a big assumption: it thinks the swing only moves in small, perfect back-and-forth motions, like a pendulum. But real molecules don’t always behave so neatly; just like how a kid might swing wildly or twist the swing chains, molecules can make bigger, messier movements, especially when they get hot or are naturally flexible. When this happens, the simple back-and-forth prediction fails, meaning we might miss important details about how they work in real life.
NMA also tends not to account for solvent effects because including every water molecule in calculations is like trying to count every raindrop in a storm. You could either pretend the water is there as a general influence (implicit) or include actual water molecules (explicit) – but this makes everything much slower to calculate. Finally, there are mathematical hiccups. Sometimes, the calculations give impossible answers, like negative frequencies, or different movements get tangled together in the math. Over time, small math errors pile up – eventually, things blow up. Researchers try to fix these issues by using fancier math like QHA and anharmonic corrections to account for these wild movements, but it’s a bit like trying to write rules for playground chaos. It’s possible but much more complicated than describing simple swinging.
Despite its limitations, NMA remains a valued tool for understanding molecular systems and tends to effectively bridge the gap between molecular structure and dynamic behaviour. However, make sure to carefully consider its constraints when applying it to your simulation.
Conclusion
To conclude, I hope to have clarified that while MM/PBSA and MM/GBSA are powerful tools, their accuracy fundamentally depends on how we handle entropy. Yes, these calculations are computationally expensive, complicated, and can technically be skipped, but they’re essential for meaningful results. So, my advice is simple: respect entropy from the beginning. Your future self (and your supervisor) will thank you.