Back in May 2020, we released the Coronavirus Antibody Database (‘CoV-AbDab’) to capture molecular information on existing coronavirus-binding antibodies, and to track what we anticipated would be a boom of data on antibodies able to bind SARS-CoV-2. At the time, we had found around 300 relevant antibody sequences and a handful of solved crystal structures, most of which were characterised shortly after the SARS-CoV epidemic of 2003. We had no idea just how many SARS-CoV-2 binding antibody sequences would come to be released into the public domain…
Ten months later (2nd March 2021), we have now tracked 2,673 coronavirus-binding antibodies, ~95% with full Fv sequence information and ~5% with solved structures. These datapoints originate from hundreds of independent studies reported in either the academic literature or patent filings.
The vast majority of these (2,390/2,673, 89.4%) are complementary to SARS-CoV-2. Of these, 773/2,390 (32.3%) bind sufficiently strongly to a neutralising epitope to inhibit viral replication, primarily within the receptor-binding domain (RBD, 699/773). The remaining 74 bind elsewhere on the spike protein, primarily to the N-terminal domain, but also to the S2 domain and the S1/S2 furin cleavage site.
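As a quick sanity check, the proportions quoted above can be recomputed from the raw counts. The snippet below is only a minimal sketch: the variable and function names are our own, and the counts are hard-coded from the text rather than read from the database.

```python
# Counts quoted in the text (CoV-AbDab snapshot, 2nd March 2021).
TOTAL_ANTIBODIES = 2673
SARS_COV_2_BINDERS = 2390
NEUTRALISING = 773
NEUTRALISING_RBD = 699


def percent(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of all tracked antibodies that bind SARS-CoV-2.
sars2_share = percent(SARS_COV_2_BINDERS, TOTAL_ANTIBODIES)

# Share of SARS-CoV-2 binders that are neutralising.
neutralising_share = percent(NEUTRALISING, SARS_COV_2_BINDERS)

# Neutralisers that bind outside the RBD (NTD, S2, S1/S2 cleavage site).
non_rbd_neutralisers = NEUTRALISING - NEUTRALISING_RBD

print(f"SARS-CoV-2 binders: {sars2_share}% of all entries")
print(f"Neutralising: {neutralising_share}% of SARS-CoV-2 binders")
print(f"Non-RBD neutralisers: {non_rbd_neutralisers}")
```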
This implies a broad range of neutralisation mechanisms, including direct competition for the ACE-2 binding site, physical inhibition of the transition from RBD-down to RBD-up, blocking of the transition of the spike protein to its membrane fusion state, and allosteric conformational modification. Capturing such a large spread of neutralising antibodies is necessary to rapidly identify which SARS-CoV-2 protective immunoglobulins are present in the deep-sequenced response/memory repertoires of infected or vaccinated individuals.
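To make the repertoire-matching idea concrete, here is a minimal sketch of one way to flag repertoire sequences that resemble known neutralising antibodies. It assumes a simple clonotype-style rule (same heavy-chain V gene, equal-length CDRH3, and at least 80% CDRH3 identity); that rule, the threshold, the `Antibody` class, and the helper names are all illustrative assumptions rather than anything CoV-AbDab prescribes, and the records are made-up placeholders, not real database entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antibody:
    name: str
    v_gene: str   # heavy-chain V gene assignment
    cdrh3: str    # CDRH3 amino acid sequence


def cdrh3_identity(a, b):
    """Fraction of identical positions; 0.0 if the lengths differ."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_matches(repertoire, references, min_identity=0.8):
    """Pair repertoire sequences with reference antibodies that share a
    V gene and pass the (assumed) CDRH3 identity threshold."""
    matches = []
    for seq in repertoire:
        for ref in references:
            if seq.v_gene != ref.v_gene:
                continue
            identity = cdrh3_identity(seq.cdrh3, ref.cdrh3)
            if identity >= min_identity:
                matches.append((seq.name, ref.name, identity))
    return matches


# Hypothetical placeholder records, NOT real CoV-AbDab entries.
references = [Antibody("ref_A", "IGHV3-53", "ARDLGYYGMDV")]
repertoire = [
    Antibody("read_1", "IGHV3-53", "ARDLGYYGLDV"),  # one CDRH3 mismatch
    Antibody("read_2", "IGHV1-69", "ARDLGYYGMDV"),  # different V gene
    Antibody("read_3", "IGHV3-53", "ARGGSW"),       # different CDRH3 length
]

matches = find_matches(repertoire, references)
for read_name, ref_name, identity in matches:
    print(f"{read_name} resembles {ref_name} (CDRH3 identity {identity:.2f})")
```

The broader the set of reference antibodies, the more distinct V gene and CDRH3 combinations a search like this can recognise, which is why a wide spread of neutralisers matters.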
Many of the antibodies in CoV-AbDab are also cross-reactive with other epidemic/seasonal coronaviruses. This knowledge could be key to prophylactic design, thinking forward to protective medicines that may help to contain future epidemic betacoronavirus outbreaks.
Across binders to all coronaviruses, 1,636 antibodies from a wide range of genetic origins have been found to engage the receptor-binding domain. This is an unprecedented number of confirmed complementary molecules against a relatively small antigenic region. CoV-AbDab therefore represents an enormous opportunity for the antibody function prediction community, where a limited diversity of “true binder” data often precludes learning more general determinants of complementarity. The scale of data now in CoV-AbDab may make possible critical assessments of just how diverse a set of antibodies is required at training time to learn generalisable features of binders to a particular epitope.
It is important to recognise that CoV-AbDab reflects the biases of the experiments performed on SARS-CoV-2 to date. Binders to the spike protein, and in particular to neutralising epitopes in the RBD, are likely to be overrepresented relative to their expected proportion of a natural infection response. Most antibodies have been ‘baited’ from natural response repertoires (this is a positive bias from the perspective of the drug discovery community, as such antibodies are typically more developable than those sourced from high-throughput in vitro assays). Finally, all studies referenced in CoV-AbDab use antigens from wildtype/early variants of SARS-CoV-2 as baits. Binders to these viruses may differ from binders to the latest dominant strains. Future papers relevant to CoV-AbDab are likely to investigate the role of antigenic drift on immunogenicity, and so the database will probably have to adapt to communicate which strain of SARS-CoV-2 was tested in each study.
Nonetheless, even with its current biases, the data in CoV-AbDab represents a huge opportunity and we expect it to grow still further in the months to come (we are tracking 60+ papers with expected data). We’re eager to see what follow-on investigations may result from this unique dataset.