Cross-linking mass-spectrometry: a guide to conformational confusions.

In the age of highly accurate structure prediction methods, I have seen more and more use of cross-linking mass-spectrometry (XL-MS) and I wanted to understand its limitations more carefully. This is a guide to interpreting the data rather than to performing the experiment.

XL-MS is an analytical technique for identifying and characterizing protein-protein interactions (PPIs) and protein structures, and it is increasingly used to “validate” structure predictions. The main idea is to capture the 3D structure by covalently connecting residues that are proximal. The method is applied to proteins in solution using a typical proteomics workflow: proteins are cross-linked, digested and then subjected to LC-MS/MS. However, there are some experimental complications that need to be understood in order to appreciate the method’s limitations.

Firstly, the XL reaction is applied sub-stoichiometrically, and any cross-linkable residue may have several valid reaction partners in its proximity (other residues, water, etc.). The result is a mixture of potential products, so the cross-linked peptides have a much lower abundance than the non-cross-linked peptides from the same protein. Typically, one can improve the analysis by enriching for cross-linked peptides with offline chromatography, such as size-exclusion chromatography. Cross-linked peptides are then identified based on their fragmentation pattern in the MS.

A linkage is assigned to a residue pair, resulting in a distance restraint defined by the length of the cross-linker used. These distance restraints report on 3D folding, topology, and PPIs.
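As a minimal sketch of what such a restraint looks like in practice, the snippet below checks whether a linked residue pair sits within a distance cutoff in a structure. The coordinates, the residue numbers and the 30 Å Cα-Cα cutoff are all illustrative assumptions, not values from any particular experiment; the appropriate cutoff depends on the cross-linker and on how much side-chain and backbone flexibility one is willing to allow.

```python
import math

# Hypothetical C-alpha coordinates in angstroms, keyed by residue number.
ca_coords = {
    10: (0.0, 0.0, 0.0),
    45: (12.0, 9.0, 8.0),
    120: (30.0, 24.0, 20.0),
}

# Assumed upper bound on the C-alpha to C-alpha distance for the linker.
MAX_CA_DISTANCE = 30.0


def crosslink_distance(res_a, res_b, coords):
    """Straight-line (Euclidean) distance between two residues' C-alpha atoms."""
    return math.dist(coords[res_a], coords[res_b])


def is_satisfied(res_a, res_b, coords, cutoff=MAX_CA_DISTANCE):
    """True if the linked pair lies within the cutoff in this structure."""
    return crosslink_distance(res_a, res_b, coords) <= cutoff


for pair in [(10, 45), (10, 120)]:
    d = crosslink_distance(*pair, ca_coords)
    print(pair, round(d, 1), is_satisfied(*pair, ca_coords))
```

Note that this treats the restraint as an upper bound only: a cross-link says two residues came close enough to react, not that they sit at exactly the linker length.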

Let’s first understand the sub-stoichiometry of the XL reaction. This typically means that each molecule will only have a couple of cross-links. However, the results are combined over many molecules (a necessity of performing the experiment via mass-spectrometry), so the data represent a conformational average. Cross-links can then be mapped to a static protein structure. Care must be taken here: does a Euclidean straight line represent reality? Perhaps the solvent accessible surface distance (SASD) along the protein surface is better. However, this starts to become circular if we need the protein structure to calculate this distance. The question then arises of what happens if the XL distance is inconsistent with the static protein structure. The first option is that there was a false discovery, so checking the confidence level, the spectrum and whether the sequence matches other areas of the protein would be sensible. Most software reports results only at a high-confidence level, but errors are still there. If we’re confident in our data then the conformational average argument comes into play: the inconsistency is suggestive of protein flexibility or dynamics.
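To make the conformational average argument concrete, here is a rough sketch that classifies each cross-link by how many members of an ensemble place the pair within a cutoff. Everything in it is hypothetical: the two "conformers", their Cα coordinates, the residue numbers and the 30 Å Euclidean cutoff are made up for illustration, and a real analysis might prefer SASD to the straight-line distance used here.

```python
import math

CUTOFF = 30.0  # assumed C-alpha to C-alpha upper bound in angstroms

# Hypothetical ensemble: each conformer maps residue number -> C-alpha xyz.
ensemble = [
    # an "open" state
    {10: (0.0, 0.0, 0.0), 45: (12.0, 9.0, 8.0),
     120: (30.0, 24.0, 20.0), 200: (60.0, 0.0, 0.0)},
    # a "closed" state
    {10: (0.0, 0.0, 0.0), 45: (14.0, 9.0, 8.0),
     120: (16.0, 12.0, 15.0), 200: (60.0, 0.0, 0.0)},
]

crosslinks = [(10, 45), (10, 120), (10, 200)]


def classify(link, conformers, cutoff=CUTOFF):
    """Label a cross-link by how many conformers satisfy the cutoff."""
    a, b = link
    hits = [math.dist(c[a], c[b]) <= cutoff for c in conformers]
    if all(hits):
        return "satisfied in all conformers"
    if any(hits):
        return "satisfied in some conformers"
    return "violated in all conformers"


labels = {link: classify(link, ensemble) for link in crosslinks}
for link, label in labels.items():
    print(link, label)
```

A link that is satisfied in only some conformers is the interesting case: it looks like a violation against one static structure but is consistent with the ensemble. A link violated everywhere is either a false discovery or a sign that the ensemble is missing a state, and the code cannot tell these apart.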

If one can separate out the different conformations (before or after cross-linking), one can apply quantitative MS approaches to compare them. There are too many approaches to detail carefully, but one idea is to use isotope-labeled cross-linkers, so that each conformational state is linked to a particular isotopologue of the cross-linker. The samples can then be mixed and analyzed together by MS. Good-quality data are typically harder to obtain for quantitative analysis than for the identification-only approach.
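A toy version of the isotope-label comparison might look like the following. It assumes each cross-linked residue pair has already been quantified as a summed intensity for the light and the heavy isotopologue; the residue names, the intensities and the two-fold threshold are invented for illustration, and a real workflow would also need replicates, normalization and a label-swap control.

```python
import math

# Hypothetical summed intensities per cross-linked residue pair.
# "light" = state A, cross-linked with the unlabeled reagent.
# "heavy" = state B, cross-linked with the isotope-labeled reagent.
intensities = {
    ("K10", "K45"): {"light": 2.0e6, "heavy": 2.1e6},
    ("K10", "K120"): {"light": 4.0e5, "heavy": 3.2e6},
    ("K45", "K120"): {"light": 1.6e6, "heavy": 2.0e5},
}


def log2_ratio(light, heavy):
    """log2(heavy / light): positive means enriched in state B."""
    return math.log2(heavy / light)


def flag_changes(data, threshold=1.0):
    """Label each link by whether its |log2 ratio| reaches the threshold.

    The default threshold of 1.0 (a two-fold change) is an arbitrary
    assumption for this sketch.
    """
    result = {}
    for link, values in data.items():
        ratio = log2_ratio(values["light"], values["heavy"])
        if ratio >= threshold:
            result[link] = "enriched in state B"
        elif ratio <= -threshold:
            result[link] = "enriched in state A"
        else:
            result[link] = "unchanged"
    return result


changes = flag_changes(intensities)
for link, label in changes.items():
    print(link, label)
```

Links that stay near a ratio of one act as an internal reference for regions that do not move, while links with a large ratio point to residue pairs whose proximity differs between the two states.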

Another approach is photoactivated cross-linking. This is a two-step approach in which an NHS ester group first reacts with the protein at defined positions. Subsequent UV activation forms a linkage to amino acids within reachable distance. A cool approach is to use non-canonical photoactivatable amino acids, so-called photo-AAs. These can be incorporated into a protein and so can be used to side-step the accessibility challenges. For example, photo-leucine (which is a leucine with a diazirine side-chain if I’ve understood correctly!) has been used to map conformations of binders in situ.

One thing that is unclear from the literature (that I could find!) is the relative stability of cross-linkers. If a cross-linker is unstable over the lifetime of a conformation then what exactly is it reporting on? The different approaches could lead to a conformational average, a slight bias towards characterizing certain conformations, or just one snapshot of one conformation. Papers are often unclear about these interpretation details, and it’s typical to brush them under the figurative conformational carpet. In-cell cross-linking data may be further complicated by the cellular context, since the conformational average is then also taken over subcellular localisation (insert head-exploding emoji).

I’d love to see photo-AAs used to follow the dynamics of a particular protein through the changes induced by a cellular process. Perhaps even proteome-wide.

I refer the reader to the collected works of Juri Rappsilber for better thoughts; mistakes are my own.
