CryoEM is now the dominant technique for solving antibody structures

Last year, the Structural Antibody Database (SAbDab) listed a record-breaking 894 new antibody structures, driven in no small part by researchers' continued efforts to understand SARS-CoV-2.

Fig. 1: The aggregate growth in antibody structure data (all methods) over time. Taken from http://opig.stats.ox.ac.uk/webapps/newsabdab/sabdab/stats/ on 25th May 2022.

In this blog post I wanted to highlight the major driving force behind this curve – the huge increase in cryo electron microscopy (cryoEM) data – and the implications of this for the field of structure-based antibody informatics.

Below is a quick plot of antibody structures solved per year by X-ray crystallography (XRC) and by cryoEM (Fig. 2).

Fig. 2: X-ray crystal structure depositions versus cryo-electron microscopy structure depositions in SAbDab. 2022’s figures are estimated through linear extrapolation based on today’s figures.

Indeed, in the years preceding the SARS-CoV-2 pandemic, the number of antibody structures solved by XRC was at best stagnating, if not declining. This is consistent with the statistics for the Protein Data Bank across all proteins (XRC trend, Cryo trend). Meanwhile, the number of antibodies solved by cryoEM has been rising steadily since the advent of the technique, historically growing at an exponential rate with a doubling every two years. The pandemic accelerated this trend: from 112 solved in 2019 to 199 solved in 2020 [+77.7%] and then to 444 solved in 2021 [+123.1%]!
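As a rough sanity check on the growth figures above, the sketch below converts a doubling time into the annual growth rate it implies and compares that with the 2020 to 2021 step quoted in the text. The function names are illustrative and are not part of any SAbDab tooling.

```python
def annual_growth_from_doubling(doubling_time_years: float) -> float:
    """Fractional growth per year implied by a fixed doubling time."""
    return 2 ** (1 / doubling_time_years) - 1


def pct_change(previous: float, current: float) -> float:
    """Year-on-year change, as a percentage of the previous year's count."""
    return 100 * (current - previous) / previous


# Growth per year implied by a doubling every two years.
baseline = 100 * annual_growth_from_doubling(2.0)

# cryoEM antibody structure counts quoted in the text for 2020 and 2021.
observed = pct_change(199, 444)

print(f"implied by two-year doubling: +{baseline:.1f}% per year")
print(f"observed 2020 -> 2021:        +{observed:.1f}%")
```

A two-year doubling corresponds to roughly 41% growth per year, so the 2020 to 2021 jump was around three times the historical pace.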

Judging by the ratio of structures released so far this year, we have finally reached the tipping point where more new antibody structures are being solved by cryoEM than by X-ray crystallography. Linear extrapolation of these figures would forecast a roughly flat year for X-ray structures (432 [-4% year-on-year]), and another growth year for cryoEM (573 [+29.1% year-on-year]).
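The post does not spell out exactly how the extrapolation was done, so the following is only a minimal sketch of one plausible approach: scaling a year-to-date count by the fraction of the year elapsed, assuming depositions arrive at a constant rate. The year-to-date count used here is a made-up placeholder, not a SAbDab figure.

```python
from datetime import date


def extrapolate_full_year(count_so_far: int, as_of: date) -> float:
    """Scale a year-to-date count to a full year, assuming a constant rate."""
    start = date(as_of.year, 1, 1)
    end = date(as_of.year + 1, 1, 1)
    days_elapsed = (as_of - start).days + 1  # include the as_of day itself
    days_in_year = (end - start).days        # handles leap years
    return count_so_far * days_in_year / days_elapsed


# Hypothetical year-to-date count, for illustration only.
ytd_count = 200
estimate = extrapolate_full_year(ytd_count, date(2022, 5, 25))
print(f"{ytd_count} structures by 25 May -> about {estimate:.0f} for the full year")
```

Because structure releases tend to arrive in batches, an estimate like this is best read as a rough guide and not as a forecast.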

There are many reasons for the growing popularity of cryoEM; see this Nature article for an op-ed on the topic. Primary motivations include the ability to characterise larger intact complexes than XRC (e.g. membrane proteins, multimeric states), and an ability to study broader sets of protein classes – not just those that tend to crystallise. For antibodies, it is the former reason that has been most influential of late, as seen in the recent slew of structures of antibodies engaging full-length SARS-CoV-2 spike protein trimers. It is essential to include all the subunits to characterise epitopes that bridge two repeating units, or those that only bind in specific mixed multimeric conformations (e.g. one Receptor Binding Domain-up, two Receptor Binding Domains-down, etc.).

While the increasing use of cryoEM brings with it opportunities to generally characterise a broader range of antibody epitope regions, this data is less ideal for use in structure-based antibody informatics applications that require precise input data. While headline “resolution” values of cryoEM structures are improving over time (approaching a mean value of 3.5Å on the latest antibody structures, Fig. 3), this only captures approximate global consistency of the Fourier shells, and the fit of the atomic model to the electron density can vary widely from locale to locale within the same structure. Therefore there is still high uncertainty about when exactly cryoEM structures can tell us useful information about antibody-antigen interaction patterns or about loop backbone conformations, meaning they are typically filtered out when building train-test-validate sets regardless of ‘resolution’.

Fig. 3: The trend in headline resolution (±1sd) of CryoEM-solved antibody-antigen structures over time. By contrast, X-ray crystallography resolution had peaked by 2013 at an average of around 2.5Å (blue dashed line). Since the methods of measuring resolution are different for the two experimental techniques, direct numerical comparisons should be made with caution.
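To make the filtering practice described above concrete, here is a minimal sketch of a dataset filter that admits only crystal structures below a resolution cutoff. The record fields, the placeholder entry IDs and the 2.5Å cutoff are all assumptions for illustration; the method labels follow the strings the PDB uses for experimental method.

```python
from dataclasses import dataclass


@dataclass
class StructureEntry:
    entry_id: str      # placeholder IDs below, not real PDB codes
    method: str        # e.g. "X-RAY DIFFRACTION" or "ELECTRON MICROSCOPY"
    resolution: float  # headline resolution in angstroms


def keep_for_training(entry, max_resolution=2.5,
                      allowed_methods=("X-RAY DIFFRACTION",)):
    """Keep an entry only if its method is allowed and it meets the cutoff."""
    return entry.method in allowed_methods and entry.resolution <= max_resolution


# Hypothetical entries, for illustration only.
entries = [
    StructureEntry("ex01", "X-RAY DIFFRACTION", 2.1),
    StructureEntry("ex02", "X-RAY DIFFRACTION", 3.2),
    StructureEntry("ex03", "ELECTRON MICROSCOPY", 2.4),
    StructureEntry("ex04", "ELECTRON MICROSCOPY", 3.6),
]

kept = [e.entry_id for e in entries if keep_for_training(e)]
print(kept)
```

Note that the hypothetical cryoEM entry at 2.4Å is dropped even though its headline resolution beats the cutoff, which is exactly the "regardless of resolution" behaviour described above. A filter based on local quality would need a per-region metric in place of the single headline value.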

But for the role of structural biology in tackling the SARS-CoV-2 pandemic, we would likely be in a steady year-on-year decline in new XRC data. Despite the growth in SAbDab’s contents, we would be getting incrementally fewer and fewer new structures that (on today’s threshold requirements) could be used for developing algorithms for structure prediction/complementarity prediction.

It is true that sheer amount is often second in value to diversity; 100 solved structures of 20 distinct antibodies can be worth less to general prediction tasks than 50 solved structures of 50 distinct antibodies, especially if the antibodies are very different from anything else yet solved. It is possible that even if XRC data grows at an ever-slower rate post-pandemic, the richness of new antigen binding contexts could still add a lot to algorithm performance.
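The quantity-versus-diversity point can be illustrated with a toy count of distinct antibodies in two datasets. This sketch treats two structures as the same antibody only if their heavy and light chain sequences are identical, which is a simplifying assumption; a real pipeline would more likely cluster by sequence identity. The sequences are synthetic labels, not real antibody sequences.

```python
def count_distinct_antibodies(structures):
    """Count unique (heavy, light) sequence pairs among the structures."""
    return len({(s["heavy"], s["light"]) for s in structures})


# 100 structures cycling through 20 distinct antibodies (synthetic labels).
redundant = [{"heavy": f"H{i % 20}", "light": f"L{i % 20}"} for i in range(100)]

# 50 structures, each of a different antibody.
diverse = [{"heavy": f"H{i}", "light": f"L{i}"} for i in range(50)]

for name, dataset in [("redundant", redundant), ("diverse", diverse)]:
    n_distinct = count_distinct_antibodies(dataset)
    print(f"{name}: {len(dataset)} structures, {n_distinct} distinct antibodies")
```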

However, I think it’s important that as a field we are aware of and continue to track this trend. To me, it seems likely that the era of N(antibodies solved by cryo) > N(antibodies solved by XRC) is here to stay, especially as cryoEM continues to trend towards achieving finer-quality densities (Fig. 3). It may soon be time to conduct studies to assess whether structure-based antibody informatics can avail of any of today’s cryoEM data, or, more explicitly, to define local quality thresholds that should be surpassed before we can trust the data for prediction tasks.
