Chemoinformatics uses a curious jumble of terms from thermodynamics, wet-lab techniques and statistics, which is at its most jarring, it could be argued, in machine learning. In datasets one often sees pIC50, pEC50, pKi and pKD side by side; in discussion sections a medchemist may talk casually of entropy; whereas in the world of molecular mechanics everything is internal energy. Herein I hope to address some common misconceptions and unify these concepts.
Dissociation constant and Gibbs free energy
The dissociation constant, KD, is the ratio of the dissociation rate constant, koff, over the association rate constant, kon ($K_D = k_{off}/k_{on}$). The former describes a single entity (the protein–ligand complex) becoming two, whereas the latter describes two different entities, each at its own concentration, becoming one. Hence KD is not a pure number but a concentration (in M), for which lower is better. For enzymes, the protein–substrate complex can not only dissociate but can also turn over with a catalytic rate constant (kcat). The Michaelis constant (KM) is similar to the dissociation constant, except that the numerator (the one-becomes-two term) is a sum of two exits, koff and kcat: $K_M = (k_{off} + k_{cat})/k_{on}$.
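As a minimal sketch of the two ratios (the function names and the rate constants are hypothetical, picked only to make the arithmetic easy), note that KM can never be smaller than KD, because kcat only adds to the numerator:

```python
def dissociation_constant(k_off: float, k_on: float) -> float:
    """KD = k_off / k_on, with k_off in s^-1 and k_on in M^-1 s^-1, giving M."""
    return k_off / k_on


def michaelis_constant(k_off: float, k_cat: float, k_on: float) -> float:
    """KM = (k_off + k_cat) / k_on: the one-becomes-two numerator has two exits."""
    return (k_off + k_cat) / k_on


# Hypothetical enzyme-substrate rate constants (illustrative only).
k_on = 1e7     # M^-1 s^-1, association (two become one)
k_off = 100.0  # s^-1, dissociation (one becomes two)
k_cat = 50.0   # s^-1, catalytic turnover (another way for one to become two)

kd = dissociation_constant(k_off, k_on)
km = michaelis_constant(k_off, k_cat, k_on)
print(f"KD = {kd * 1e6:.1f} uM, KM = {km * 1e6:.1f} uM")  # KD = 10.0 uM, KM = 15.0 uM
```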
The change in Gibbs free energy on binding (ΔG, a potential, so negative is good) is a different but related quantity. It combines enthalpy (ΔH) and entropy (ΔS): $\Delta G = \Delta H - T\Delta S$.
The relationship between ΔG and the dissociation constant is:

$$\Delta G = k_B T \ln\frac{K_D}{C^0}$$
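Here C0 is a fudge factor of 1 M (the standard concentration) to make the units match, and the Boltzmann constant (kB) times temperature is a very common scaling factor, of which RT is the molar version. [For a more complete exposition see Hall et al. 2020.] Comparing this to the Arrhenius equation or its variant, the Eyring equation, where a Gibbs potential is converted to a rate constant, you see something similar: there is a logarithmic relationship between the dissociation constant and the Gibbs free energy, just as there is between a rate constant and the Gibbs energy of activation (ΔG‡). Just for the sake of accuracy, with h being Planck's constant:

$$k = \frac{k_B T}{h}\, e^{-\Delta G^{\ddagger}/k_B T} \qquad\qquad K_D = C^0\, e^{\Delta G/k_B T}$$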
This is why dissociation constants don’t behave linearly and their logarithm is often used instead (pKD). (Parenthetically, this and the other p transformations are taken on values in molar, so the unit prefix is lost: 1 mM gives a log of −3, and the sign is then flipped, giving pKD = 3.)
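The conversions can be sketched in Python as follows, assuming T = 298.15 K, C0 = 1 M and energies in kcal/mol (so the molar RT replaces kBT). The helper names and the unit table are my own, not from any library. The last line shows why pKD is a convenient scale: every log unit corresponds to a fixed RT ln 10 of free energy.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K), i.e. N_A * k_B
C0 = 1.0              # standard concentration, 1 M

# Molar unit prefixes: p-values are taken on concentrations expressed in M.
PREFIX = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_molar(value: float, unit: str) -> float:
    return value * PREFIX[unit]


def p_value(value: float, unit: str) -> float:
    """pKD, pKi, pIC50 ...: -log10 of the concentration in molar."""
    return -math.log10(to_molar(value, unit))


def dg_from_kd(kd_molar: float, temperature: float = 298.15) -> float:
    """Binding free energy in kcal/mol: dG = RT ln(KD / C0)."""
    return R_KCAL * temperature * math.log(kd_molar / C0)


def kd_from_dg(dg: float, temperature: float = 298.15) -> float:
    """Inverse: KD = C0 exp(dG / RT)."""
    return C0 * math.exp(dg / (R_KCAL * temperature))


print(p_value(1, "mM"))                          # 3.0
print(round(dg_from_kd(to_molar(1, "nM")), 1))   # -12.3
# Each unit of pKD is worth RT ln(10), about 1.36 kcal/mol at 25 °C.
print(round(R_KCAL * 298.15 * math.log(10), 2))  # 1.36
```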
Temperature appears in the above equations, which is annoying, because dissociation constants are temperature dependent. The temperature dependence is not simple but is described by the van ’t Hoff equation. Enzymes are flexible, so catalysis changes with temperature, with heat capacity making an appearance in Macromolecular Rate Theory. So temperature and macromolecular kinetics very much fall under “It’s complicated”, but the dependence does exist (personally I think it’s fascinating).
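A rough sketch of the integrated van ’t Hoff equation, written for KD rather than the association constant (hence the sign), assuming a constant binding enthalpy, i.e. ignoring exactly the heat-capacity effects that make the real picture complicated. The ligand parameters are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # kcal/(mol K)


def kd_at_temperature(kd_ref: float, t_ref: float, t_new: float, dh_bind: float) -> float:
    """Integrated van 't Hoff equation, rewritten for a dissociation constant.

    ln(KD_new / KD_ref) = (dH_bind / R) * (1/T_new - 1/T_ref)

    dh_bind is the binding enthalpy in kcal/mol (negative = exothermic).
    Assumes dh_bind is constant over the range, i.e. no heat-capacity change,
    which is often not true for proteins.
    """
    return kd_ref * math.exp(dh_bind / R_KCAL * (1.0 / t_new - 1.0 / t_ref))


# Hypothetical ligand: KD = 1 uM at 25 °C, exothermic binding of -10 kcal/mol.
kd_37 = kd_at_temperature(1e-6, 298.15, 310.15, -10.0)
print(f"KD at 37 °C ≈ {kd_37 * 1e6:.1f} uM")  # KD at 37 °C ≈ 1.9 uM
```

With exothermic binding, the hypothetical KD roughly doubles between 25 °C and 37 °C.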
pIC50, pEC50 and pKD
Unfortunately, measuring KD directly is typically done by isothermal titration calorimetry (ITC), which, albeit accurate and giving both the affinity and the enthalpy, is laborious in both set-up and analysis, so quicker assays are generally performed, such as inhibition assays either in vitro or in vivo. In vitro, with a competitive inhibitor of an enzyme one gets a larger apparent Michaelis constant (KM), while the catalytic rate (kcat) is unaffected. The dissociation constant of an inhibitor (Ki) is independent of the native substrate, but the concentration of inhibitor causing 50% inhibition (IC50) is not. The Cheng–Prusoff equation defines the relationship between IC50 and the inhibition constant (Ki): for a competitive inhibitor, $K_i = \mathrm{IC}_{50}/(1 + [S]/K_M)$, i.e. the IC50 grows with the substrate concentration and has to be divided by that factor to give Ki.
This means that IC50 values from assays run under different conditions cannot be compared directly.
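To see why, here is a small sketch assuming simple competitive inhibition and Michaelis–Menten kinetics. The IC50 is found numerically from the rate law, without using Cheng–Prusoff, and it shifts with the substrate concentration even though Ki is fixed. All parameter values are hypothetical.

```python
import math


def rate(s: float, i: float, vmax: float, km: float, ki: float) -> float:
    """Michaelis-Menten rate with a competitive inhibitor.

    The apparent KM is KM * (1 + [I]/Ki); Vmax (i.e. kcat) is unchanged.
    """
    return vmax * s / (km * (1.0 + i / ki) + s)


def ic50(s: float, vmax: float, km: float, ki: float,
         lo: float = 1e-15, hi: float = 1.0, iters: int = 200) -> float:
    """Inhibitor concentration (M) that halves the rate, by bisection in log space."""
    target = 0.5 * rate(s, 0.0, vmax, km, ki)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if rate(s, mid, vmax, km, ki) > target:
            lo = mid  # still above half-rate: need more inhibitor
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical inhibitor with Ki = 10 nM against an enzyme with KM = 10 uM.
KI, KM, VMAX = 10e-9, 10e-6, 1.0
for s in (1e-6, 10e-6, 100e-6):
    print(f"[S] = {s * 1e6:5.0f} uM -> IC50 = {ic50(s, VMAX, KM, KI) * 1e9:.0f} nM")
```

The values it finds agree with IC50 = Ki (1 + [S]/KM), which is the Cheng–Prusoff relationship rearranged.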
This also means that pIC50 and pKD are not synonyms, even if some datasets lump them together as pK values. pEC50 has all sorts of bio-stuff going on (such as membrane permeability), so likewise one should not compare it with the others.
In most cases assays are performed with a substrate concentration close to the Michaelis constant, so the inhibitor’s dissociation constant (Ki) is about half the IC50! Parenthetically, most enzymes operate with a Michaelis constant close to the physiological substrate concentration, a useful fact for back-of-the-envelope calculations.
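The back-of-the-envelope case is easy to sketch, assuming competitive inhibition and a hypothetical assay run at [S] = KM. The Cheng–Prusoff correction then halves the IC50, which on the log scale is a shift of only about 0.3 units:

```python
import math


def cheng_prusoff_ki(ic50: float, s: float, km: float) -> float:
    """Competitive inhibition: Ki = IC50 / (1 + [S]/KM)."""
    return ic50 / (1.0 + s / km)


# Hypothetical assay run at [S] = KM with a measured IC50 of 100 nM.
km = 10e-6
ic50 = 100e-9
ki = cheng_prusoff_ki(ic50, s=km, km=km)

pic50 = -math.log10(ic50)
pki = -math.log10(ki)
print(f"Ki = {ki * 1e9:.0f} nM, pIC50 = {pic50:.2f}, pKi = {pki:.2f}")
# At [S] = KM the two p-values differ by log10(2), about 0.3 log units.
```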
Gibbs, enthalpy and internal energy
Another point easily fumbled is the relationship between the different energies. The easiest to get right is that Josiah Willard Gibbs went by Willard, not Josiah; I don’t know what you call it, it’s some weird olden-days custom.
A second one is that differences in Gibbs free energy, enthalpy and internal energy are differences in potentials, so negative is good: a rock is energetically happier when it has rolled down a hill. The difference is taken as $G_\text{bound} - G_\text{unbound}$, or $G_\text{mutant} - G_\text{wildtype}$, etc., although some tools flip the sign of ∆G to be positive in order to be less confusing, which makes everything more confusing.
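A small sketch of the sign convention for a mutant versus wild type, assuming T = 298.15 K and hypothetical KDs; the function name is my own:

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol at 25 °C


def ddg_binding(kd_mutant: float, kd_wildtype: float, rt: float = RT) -> float:
    """ddG = dG_mutant - dG_wildtype = RT ln(KD_mutant / KD_wildtype).

    Negative means the mutant binds more tightly (downhill is good).
    """
    return rt * math.log(kd_mutant / kd_wildtype)


# Hypothetical KDs: the mutation weakens binding tenfold, 10 nM -> 100 nM.
ddg = ddg_binding(100e-9, 10e-9)
print(f"ddG = {ddg:+.2f} kcal/mol")  # positive: uphill, i.e. worse binding

# A tool using the flipped, 'positive is good' convention would report -ddg,
# so negate such values before mixing them with the convention above.
flipped_report = -ddg
```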
The second law of thermodynamics introduces entropy, and Gibbs free energy combines it with enthalpy. Entropy (∆S) is important for protein–ligand binding as I will get to below, but first enthalpy. When a protein structure is energy minimised (in silico) the temperature is effectively set to zero, so the entropic term (−T∆S) vanishes, Gibbs free energy and enthalpy are the same, and the state with the lower enthalpy/Gibbs is found. Or is it? Enthalpy (H) is the internal energy (U) plus pressure times volume ($H = U + pV$), the volume being the space taken up by the ligand and protein; however, as the pV term is essentially the same in the bound and unbound states, we can ignore it.
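A quick numerical sketch of both points, with hypothetical values: at T = 0 the −T∆S term disappears so ∆G equals ∆H, and the p∆V term for an assumed volume change at 1 atm is tiny compared with typical binding enthalpies:

```python
def gibbs(dh: float, ds: float, temperature: float) -> float:
    """dG = dH - T dS, with dH in kcal/mol and dS in kcal/(mol K)."""
    return dh - temperature * ds


# Hypothetical binding: dH = -10 kcal/mol, dS = -10 cal/(mol K).
dh, ds = -10.0, -0.010
print(gibbs(dh, ds, 298.15))  # entropy penalty of about 3 kcal/mol at 25 °C
print(gibbs(dh, ds, 0.0))     # -10.0: at T = 0 Gibbs equals enthalpy

# Why p dV can be ignored: 1 atm times an assumed 10 cm^3/mol volume change.
p = 101325.0                 # Pa
dv = 10e-6                   # m^3/mol (hypothetical)
p_dv_kcal = p * dv / 4184.0  # J/mol -> kcal/mol
print(f"p dV = {p_dv_kcal:.1e} kcal/mol")  # about 2.4e-04, negligible next to dH
```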
This U notation is seen in the documentation of forcefields, i.e. the sum of different terms (potentials) approximating physical properties (along with fudge factors, in the case of hybrid ones, to make the values match). Normally these are:
- a harmonic stretch term,
- a harmonic bend (angle) term,
- a cosine torsion term,
- and non-bonded interaction terms formed of the Lennard-Jones potential and a Coulombic term.
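A minimal sketch of such a sum in Python, assuming AMBER-like functional forms (other forcefields differ in details such as factors of 1/2); the parameter values are hypothetical and not taken from any real forcefield:

```python
import math

COULOMB_K = 332.0637  # kcal Å/(mol e^2); the exact constant differs slightly between forcefields


def bond_stretch(r, r0, k):
    """Harmonic stretch, AMBER-style k (r - r0)^2 (the 1/2 is folded into k)."""
    return k * (r - r0) ** 2


def angle_bend(theta, theta0, k):
    """Harmonic bend in the angle (radians)."""
    return k * (theta - theta0) ** 2


def torsion(phi, v_n, n, gamma):
    """Cosine (Fourier) torsion term: (V_n / 2) [1 + cos(n phi - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones: 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q_i, q_j, dielectric=1.0):
    """Coulomb term between fixed partial charges (in e), distance in Å."""
    return COULOMB_K * q_i * q_j / (dielectric * r)


# Hypothetical parameters, only to show that U is a plain sum of terms.
u_total = (
    bond_stretch(1.53, 1.526, 310.0)
    + angle_bend(math.radians(112.0), math.radians(109.5), 40.0)
    + torsion(math.radians(60.0), 1.4, 3, 0.0)
    + lennard_jones(3.8, 0.1094, 3.4)
    + coulomb(3.8, -0.3, 0.3)
)
print(f"U = {u_total:.3f} kcal/mol")
```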
Many variants exist, my favourites are:
- a sum of exponentials to replace the LJ equation, recently described by the Daniel Cole group (similar to a Morse potential, as it is an extension of it),
- the Drude oscillator, a charged auxiliary particle tethered by a spring to an atom to express polarisability (e.g. of aromatic rings), because partial charges are otherwise fixed in classical molecular mechanics,
- the Dreiding H-bond term, which has a cosine term ($\cos^4\theta$, with θ the donor–hydrogen–acceptor angle) to diminish the contribution of H-bonds that deviate from a linear 180°.
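A minimal sketch of a DREIDING-style H-bond term, assuming the 12-10 radial form with the cos⁴θ angular factor and hypothetical parameters; it shows how quickly the energy falls off as the angle deviates from linear:

```python
import math


def dreiding_hbond(r_da: float, theta_dha_deg: float, d_hb: float, r_hb: float) -> float:
    """DREIDING-style H-bond: D_hb [5 (R_hb/R_DA)^12 - 6 (R_hb/R_DA)^10] cos^4(theta).

    r_da: donor-acceptor distance (Å); theta_dha_deg: donor-H...acceptor angle.
    The angle cut-offs applied in practical implementations are omitted here.
    """
    x = r_hb / r_da
    radial = d_hb * (5.0 * x ** 12 - 6.0 * x ** 10)
    angular = math.cos(math.radians(theta_dha_deg)) ** 4
    return radial * angular


# Hypothetical well depth (kcal/mol) and equilibrium distance (Å), for illustration only.
D_HB, R_HB = 4.0, 2.75
for angle in (180.0, 160.0, 150.0, 120.0):
    print(f"{angle:5.0f} deg -> {dreiding_hbond(R_HB, angle, D_HB, R_HB):6.2f} kcal/mol")
```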
The fact that there are so many variants shows that molecular mechanics equations are an approximation.
Entropy!
Even if internal energy (U) or enthalpy (H) can be elegantly modelled, there is its messy counterpart, entropy. Entropy is all about maximising the number of microstates in a system, i.e. increasing the ways the energy can be distributed among its translational, rotational and vibrational degrees of freedom (cf. equipartition theorem).
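A crude sketch of the microstate-counting view, assuming Boltzmann’s S = kB ln W (written per mole with R) and the idealisation that each rotatable bond has three equally likely rotamers, all frozen out on binding. Real rotamer populations are neither equal nor fully frozen, so treat the numbers only as an order of magnitude:

```python
import math

R_KCAL = 1.987204e-3  # kcal/(mol K); molar version of k_B


def molar_entropy_change(w_before: float, w_after: float) -> float:
    """Boltzmann: S = R ln W per mole, so dS = R ln(W_after / W_before)."""
    return R_KCAL * math.log(w_after / w_before)


def entropic_penalty(w_before: float, w_after: float, temperature: float = 298.15) -> float:
    """The -T dS contribution to dG in kcal/mol (positive = unfavourable)."""
    return -temperature * molar_entropy_change(w_before, w_after)


# Idealised example: each rotatable bond has 3 equally likely rotamers and
# binding freezes it into one.
for n_bonds in (1, 2, 3):
    print(n_bonds, round(entropic_penalty(3 ** n_bonds, 1), 2))
```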
This has some interesting effects:
- water displacement (desolvation): when a water is displaced the entropy increases, worth about 1.7 kcal/mol (as T∆S), unless it was a “frustrated water”, for which the value is lower,
- rigidification (conformational entropy): a ligand (or flexible sidechain) loses rotational degrees of freedom on binding, so entropy decreases, costing about 1 kcal/mol (−T∆S becomes positive, making the Gibbs potential worse),
- hydrophobic effect: the interatomic Coulombic term used in the internal energy above is minor when applied to groups with near-neutral partial charges, but when water comes into the picture entropy makes a big stand, because the waters bond with each other to avoid the hydrophobic patch, thus reducing their rotational freedom. This can go both ways: a hydrophobic ligand, or a ligand that masks hydrophobic residues. The nicest small-molecule example of the latter is from Young 2007, where a greasy isopropyl group makes the Ki go from 39 µM to 1 nM. But in biology, protein folding is hands-down the most impressive example.
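As a worked check of how large the Young 2007 effect is in free-energy terms, assuming T ≈ 298 K (RT ≈ 0.593 kcal/mol) and treating both Ki values as dissociation constants (the superscript labels are mine):

$$\Delta\Delta G = RT \ln\frac{K_i^{\text{isopropyl}}}{K_i^{\text{parent}}} \approx 0.593\ \text{kcal/mol} \times \ln\frac{1\ \text{nM}}{39\ \mu\text{M}} \approx 0.593 \times (-10.57) \approx -6.3\ \text{kcal/mol}$$

That is roughly 6 kcal/mol from a single isopropyl group.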
To make matters more complex, there’s also induced fit, wherein the entropy of the system may diminish, but the enthalpy released on binding is so great that it compensates. In summary, entropy and water do play a part in binding. In vacuo calculations are effectively only enthalpy. An implicit solvation model, such as the Generalized Born model, does account for solvent effects but is basically an enthalpic-like term: it is not temperature dependent. There are variants such as GBSA models, which add a surface-area term to better account for the hydrophobic effect, but the solvation shell is still not properly modelled. In other words, waters are important.
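To make the point about Generalized Born concrete, here is a minimal sketch of the Still-style GB polar solvation energy (the functional form I am assuming). Real implementations also compute the Born radii and usually add a surface-area term for GBSA; here the Born radii are hypothetical inputs. Note that temperature appears nowhere: the solvent enters only through a fixed dielectric constant.

```python
import math

COULOMB_K = 332.0637  # kcal Å/(mol e^2), approximate


def f_gb(r: float, r_i: float, r_j: float) -> float:
    """Still et al. smoothing function; reduces to the Born radius when i == j (r = 0)."""
    return math.sqrt(r * r + r_i * r_j * math.exp(-r * r / (4.0 * r_i * r_j)))


def gb_polar_energy(coords, charges, born_radii, eps_in=1.0, eps_out=78.5):
    """Generalized Born polar solvation energy (kcal/mol), summed over all i, j pairs."""
    prefactor = -0.5 * COULOMB_K * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    for xi, qi, ri in zip(coords, charges, born_radii):
        for xj, qj, rj in zip(coords, charges, born_radii):
            total += qi * qj / f_gb(math.dist(xi, xj), ri, rj)
    return prefactor * total


# Hypothetical single ion and ion pair (charges in e, positions and radii in Å).
ion = gb_polar_energy([(0.0, 0.0, 0.0)], [1.0], [2.0])
pair = gb_polar_energy([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], [1.0, -1.0], [2.0, 2.0])
print(f"ion: {ion:.1f} kcal/mol, ion pair: {pair:.1f} kcal/mol")
```

For a single ion the double sum reduces to its self term, giving the classic Born solvation energy; for the ion pair the opposite charges partly cancel, so the pair is less strongly solvated than two isolated ions.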
The most accurate computational technique to calculate the ∆G of a bound ligand involves running MD simulations (an FEP calculation) and applying a formula, the Zwanzig equation $\Delta G = -k_B T \ln \langle e^{-\Delta U/k_B T} \rangle_0$, which is akin to an averaged log-sum-exp smoothed minimum, to effectively get the free energy across a trajectory. That is because binding is a dynamic process and not a snapshot, even if crystallography makes it seem that way. That is, the crystallographic state is not the sole truth.
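A minimal sketch of the Zwanzig estimator as a numerically stable log-mean-exp, assuming the ∆U values have already been evaluated on frames sampled from the initial state (the sample values are hypothetical). Real FEP protocols split the transformation into several λ windows and often use estimators such as BAR or MBAR instead of a single exponential average.

```python
import math


def zwanzig_dg(delta_u, kt):
    """Zwanzig free energy perturbation: dG = -kT ln < exp(-dU / kT) >_0.

    delta_u: energy differences U_1 - U_0 evaluated on frames sampled from state 0.
    Computed as a numerically stable log-mean-exp.
    """
    x = [-du / kt for du in delta_u]
    m = max(x)
    log_mean_exp = m + math.log(sum(math.exp(xi - m) for xi in x) / len(x))
    return -kt * log_mean_exp


KT = 0.5925  # kcal/mol at about 25 °C
samples = [1.2, 0.4, 2.5, 0.9, 3.1]  # hypothetical dU values from a trajectory, kcal/mol
print(round(zwanzig_dg(samples, KT), 3), "vs plain mean", round(sum(samples) / len(samples), 3))
```

The result always lies between the smallest ∆U and the plain average, which is why it behaves like a soft minimum of ∆U: low-energy frames dominate the exponential average.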
For fragments enthalpy is said to dominate, while for larger ligands entropy plays a larger role. This is a guideline, and entropy works both ways (desolvation and the hydrophobic effect vs. rigidification); in fact, the penalty from rigidification might prevent fragments from binding.
Relationship between kon and koff and entropy and enthalpy
There is somewhat of a relationship between dissociation rate and enthalpy, and between association rate and entropy, but it is not at all clear cut. The dissociation rate constant is related to the stability of the complex and is influenced by the strength of the interactions (enthalpic contributions) holding the complex together and by the entropy change upon dissociation. The association rate constant typically has contributions from entropic factors, such as the diffusion of molecules towards each other and the orientation and conformational adjustments necessary for binding (induced fit), but also from enthalpic interactions, such as the formation of the transition state leading to the bound complex.
Conclusion
In terms of binding, the internal energy (approx. enthalpy) from interactions is not the whole story: entropy can play a role, with rigidification causing a penalty, while water displacement and the masking of hydrophobic surfaces cause a gain. This is why Gibbs free energy determines binding. However, whereas ΔG is in turn directly proportional to the logarithm of the dissociation constant, the pIC50 also depends on the assay substrate concentration.