The Coronavirus Antibody Database: 10 months on, 10x the data!

Back in May 2020, we released the Coronavirus Antibody Database (‘CoV-AbDab’) to capture molecular information on existing coronavirus-binding antibodies, and to track what we anticipated would be a boon of data on antibodies able to bind SARS-CoV-2. At the time, we had found around 300 relevant antibody sequences and a handful of solved crystal structures, most of which were characterised shortly after the SARS-CoV epidemic of 2003. We had no idea just how many SARS-CoV-2 binding antibody sequences would come to be released into the public domain…

10 months later (2nd March 2021), we have now tracked 2,673 coronavirus-binding antibodies, ~95% with full Fv sequence information and ~5% with solved structures. These datapoints originate from hundreds of independent studies reported in either the academic literature or patent filings.

The entire contents of the CoV-AbDab database as of 2nd March 2021.

The vast majority of these are complementary to SARS-CoV-2: 2,390/2,673 (89.4%), and of these 773/2,390 (32.3%) bind sufficiently strongly to a neutralising epitope to inhibit viral replication, primarily within the receptor-binding domain (RBD, 699/773). The remaining 74 bind elsewhere on the spike protein, primarily to the N-terminal domain, but also to the S2 domain and the S1/S2 furin cleavage site.
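For readers who want to check the proportions quoted above, here is a minimal sketch in Python. It uses only the counts stated in this post; the variable names and the `share` helper are our own illustrative choices and are not part of any CoV-AbDab tooling.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts quoted in the post (CoV-AbDab snapshot of 2nd March 2021).
total_antibodies = 2673   # all coronavirus-binding antibodies tracked
sars_cov2_binders = 2390  # antibodies that bind SARS-CoV-2
neutralising = 773        # SARS-CoV-2 binders reported as neutralising
rbd_neutralising = 699    # neutralisers that target the RBD

# Neutralisers that bind elsewhere on the spike protein.
non_rbd_neutralising = neutralising - rbd_neutralising

print(f"SARS-CoV-2 binders: {share(sars_cov2_binders, total_antibodies)}%")
print(f"Neutralising:       {share(neutralising, sars_cov2_binders)}%")
print(f"RBD-targeting:      {share(rbd_neutralising, neutralising)}%")
print(f"Non-RBD count:      {non_rbd_neutralising}")
```

The same helper could be reused on a later snapshot of the database by swapping in updated counts.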

This implies a broad range of neutralisation mechanisms, including direct competition for the ACE-2 binding site, physical inhibition of the transition from RBD-down to RBD-up, blocking of the transition of the spike protein to its membrane fusion state, and allosteric conformational modification. Capturing such a large spread of neutralising antibodies is necessary to rapidly identify which SARS-CoV-2 protective immunoglobulins are present in the deep-sequenced response/memory repertoires of infected or vaccinated individuals.

Many of the antibodies in CoV-AbDab are also cross-reactive with other epidemic/seasonal coronaviruses. This knowledge could be key to prophylactic design, thinking forward to protective medicines that may help to contain future epidemic betacoronavirus outbreaks.

All these antibodies are cross-reactive between SARS-CoV-1 and SARS-CoV-2, and have the ability to neutralise (to different extents) one or both viruses.

Across binders to all coronaviruses, 1,636 antibodies from a wide range of genetic origins have been found to engage the receptor-binding domain. This is an unprecedented number of confirmed complementary molecules against a relatively small antigenic region. CoV-AbDab therefore represents an enormous opportunity for the antibody function prediction community, where often a limited diversity of “true binder” data precludes learning more general determinants of complementarity. Critical assessments of just how diverse a set of antibodies is required at training time to learn generalisable features about binders to a particular epitope may now be possible with the scale of data in CoV-AbDab.

It is important to recognise that CoV-AbDab reflects the biases of the experiments performed on SARS-CoV-2 to date. Binders to the spike protein, and in particular to neutralising epitopes in the RBD, are likely to be overrepresented relative to their expected share of a natural infection response. Most antibodies have been ‘baited’ from natural response repertoires (this is a positive bias from the perspective of the drug discovery community, as they are typically more developable than antibodies sourced from high-throughput in vitro assays). Finally, all studies referenced in CoV-AbDab use antigens from wildtype/early variants of SARS-CoV-2 as baits. Binders to these viruses may differ from binders to the latest dominant strains. Future papers relevant to CoV-AbDab are likely to investigate the effect of antigenic drift on immunogenicity, and so the database will probably have to adapt to communicate which strain of SARS-CoV-2 was tested in each study.

Nonetheless, even with its current biases, the data in CoV-AbDab represents a huge opportunity and we expect it to grow still further in the months to come (we are tracking 60+ papers with expected data). We’re eager to see what follow-on investigations may result from this unique dataset.
