Since my last blogpost on this topic back in 2018, OPIG has expanded its range of tools for antibody/BCR analysis. Here is an updated summary of the OPIG antibody databases and immunoinformatics tools.
NB: Several of our databases/tools [SAbDab, Thera-SAbDab, ABodyBuilder, PEARS, FREAD, Sphinx, ANARCI, Antibody iPatch, EpiPred, SCALOP, TAP] are now packaged as a VirtualBox virtual machine called SAbBox. SAbBox is available under a free academic or a paid commercial license:
SAbBox Academic License: https://process.innovation.ox.ac.uk/software/p/15303a/sabbox-academic/1
SAbBox Commercial License: https://process.innovation.ox.ac.uk/software/p/15303/sabbox/1
Databases
1. The Structural Antibody Database (SAbDab)
Link: http://opig.stats.ox.ac.uk/webapps/newsabdab/sabdab/
Paper: https://academic.oup.com/nar/article/42/D1/D1140/1044118
SAbDab mines the PDB for antibody/nanobody structures, annotating them with metadata. It can be searched for:
- A PDB code
- PDB codes that match a series of metadata (resolution cutoffs, species, bound/unbound, has affinity data etc.)
- CDRs that match a series of metadata
- PDB codes with a particular VH-VL orientation
2. The Therapeutic Structural Antibody Database (Thera-SAbDab)
Link: http://opig.stats.ox.ac.uk/webapps/newsabdab/therasabdab/search/
Paper: https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkz827/5573951
Thera-SAbDab mines the WHO International Non-proprietary Name Publications and SAbDab to provide the latest sequence and structural data and metadata for all therapeutic antibody/nanobody formats. Sequence and attribute searches are available.
3. The Coronavirus Antibody Database (CoV-AbDab)
Link: http://opig.stats.ox.ac.uk/webapps/covabdab/
Paper: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa739/5893556
CoV-AbDab mines the scientific literature (papers, preprints) and patents to pool together sequences of all antibodies/nanobodies proven by experimental assay to bind at least one coronavirus (e.g. SARS-CoV, MERS-CoV, SARS-CoV-2) antigen. Structural information is also provided through mining of SAbDab, and homology models are built for full Fv sequences for which no solved structures currently exist. Sequence and attribute searches are available.
4. Observed Antibody Space (OAS)
Link: http://opig.stats.ox.ac.uk/webapps/oas/
Required Input: N/A (Database)
Paper: http://www.jimmunol.org/content/201/8/2502
OAS (Observed Antibody Space) is a quality-filtered, consistently annotated database of full-chain BCR/antibody sequence data. All sequences are provided pre-numbered in the IMGT numbering scheme and annotated with potential liabilities. Every dataset is freely downloadable; most downloads fully comply with the AIRR Community's Minimal Information about Adaptive Immune Receptor Repertoire (MiAIRR) standard, and the remainder will be updated in the coming months. Here you can:
- Filter sequencing data by study or filter the data across studies
- Look at snapshots of the immune repertoire in specific disease states (e.g. healthy, day 7 after vaccination, HIV positive)
- Analyse different isotype properties
- Analyse different species properties, and much more…
A recent development is that OAS has been adapted to include single cell VDJ sequencing data; see more in Aleks Kovaltsuk’s latest blogpost: https://www.blopig.com/blog/2020/09/adding-paired-bcr-data-to-oas/
BCR/Antibody Structural Modelling Tools
5. VH-VL Orientation (AbAngle)
Webserver: http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/abangle/
Required Input: Variable domain (Fv) structure
Paper: http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/abangle/
AbAngle can characterise Fv VH/VL orientation through a combination of 5 angles and 1 distance measurement.
6. Loop Canonical Form Classifier (SCALOP)
Code: https://github.com/oxpig/SCALOP
Webserver: http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/scalop
Required Input: (Paired/Separate Chain) Antibody Variable Domain Sequence(s)
Paper: https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty877/5132697
Five of the CDRs (CDRH1-2, CDRL1-3) are found to fall into distinct, clusterable, canonical structures. SCALOP uses environment-specific substitution matrices to assign likely canonical form from sequence alone. Its high fidelity and speed ensure that this analysis can be performed even on very large datasets (e.g. as part of SAAB+).
7. BCR/Antibody Loop Homology Modelling (FREAD)
Webserver: http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/fread/
Required Input: CDR Sequence + Framework structure (on which the loop will be grafted)
Paper: https://pubs.rsc.org/en/content/articlelanding/2011/MB/c1mb05223c
FREAD uses Environment-Specific Substitution Tables to evaluate a score for how compatible the dihedral angles of each template loop would be for the target CDR sequence. It then ranks putative loop templates by ease of graftability onto the chosen framework region (least clash ranks first). FREAD was originally designed for general loop modelling, but can be made CDR-specific by building separate antibody CDR loop template databases.
8. Hybrid CDRH3 Modelling (Sphinx H3)
Webserver: http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/sphinx
Required Inputs: CDR Sequence + Framework structure (on which the loop will be grafted)
Paper: https://academic.oup.com/bioinformatics/article/33/9/1346/2908432
Sphinx is useful when your homology modeller (e.g. FREAD) cannot find a close template match for your loop of interest. It combines shorter, sequence-similar template data with ab initio methodology to fill in the gaps. It then returns its decoys ranked using the SOAP-loop algorithm.
9. Side-Chain Modelling (PEARS)
Webserver: http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/pears
Required Inputs: Antibody Variable Domain Structure + Corresponding Sequence (with side chains to be remodelled in capital letters)
Paper: https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.25453
PEARS remodels side chains by using Gaussian Mixture Models to predict the most probable rotamer for each remodelled residue, given its position in the antibody sequence.
10. Full Fv Modelling Pipeline (ABodyBuilder)
Webserver: http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/abodybuilder/
Required Inputs: (Paired chain) Antibody Variable Domain Sequence (will model Nanobodies if only heavy chain supplied)
Paper: https://www.tandfonline.com/doi/full/10.1080/19420862.2016.1205773
ABodyBuilder chains together ANARCI – FREAD – Modeller/Sphinx (if FREAD fails to find a good loop template) – PEARS as a pipeline to create antibody models from sequence data. It also reports likely model accuracy for each region of the structure. Typical runtime is just over 30s for most antibodies.
BCR/Antibody Informatics Tools
11. BCR/Antibody Sequence Numbering (ANARCI)
Code: https://github.com/oxpig/ANARCI
Webserver: http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/anarci/
Required Input: (Separate Chain) Antibody Variable Domain Sequence(s)
Paper: https://academic.oup.com/bioinformatics/article/32/2/298/1743894
Consistent use of a numbering scheme is essential to quickly identify CDR regions or to compare between multiple antibodies. ANARCI uses Hidden Markov Models to align your sequences to germlines of known numbering, and rapidly returns them numbered in the scheme of choice (IMGT, Chothia, Kabat, Martin).
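To illustrate why consistent numbering makes CDR identification quick, here is a minimal sketch that slices the CDRs out of an already-numbered chain. It assumes IMGT numbering, where the CDRs occupy positions 27-38, 56-65 and 105-117. The input layout (a list of ((position, insertion code), residue) pairs with "-" for gaps), the function name `extract_cdrs` and the toy data are assumptions for illustration only and are not part of ANARCI itself.

```python
# IMGT CDR boundaries (inclusive) under the IMGT numbering scheme.
IMGT_CDRS = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}


def extract_cdrs(numbered):
    """Return the CDR sequences of one IMGT-numbered chain.

    `numbered` is a list of ((position, insertion_code), residue) pairs,
    with "-" marking gap positions (hypothetical input layout).
    """
    cdrs = {}
    for name, (start, end) in IMGT_CDRS.items():
        cdrs[name] = "".join(
            aa
            for (pos, _ins), aa in numbered
            if start <= pos <= end and aa != "-"
        )
    return cdrs


# Toy input: a handful of positions only, not a real antibody sequence.
toy = [
    ((26, " "), "S"), ((27, " "), "G"), ((28, " "), "F"),
    ((29, " "), "-"), ((38, " "), "Y"), ((39, " "), "M"),
    ((105, " "), "A"), ((106, " "), "R"), ((111, "A"), "D"),
    ((117, " "), "Y"), ((118, " "), "W"),
]
cdrs = extract_cdrs(toy)
print(cdrs)
```

Because every chain shares the same position labels, the same slice works across antibodies of different loop lengths; insertions simply appear as extra entries carrying an insertion code.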
12. BCR-seq Dataset Error Annotation (ABOSS)
Code: http://opig.stats.ox.ac.uk/resources
Required Input: (Separate Chain) Antibody Variable Domain Sequences
Paper: https://www.jimmunol.org/content/early/2018/11/02/jimmunol.1800669
ABOSS highlights potential sequencing errors in BCR-seq datasets. Sequences that do not align to germlines (see ANARCI), have IMGT CDRH3 lengths ≥ 37, possess indels in the canonical CDRs or framework regions, start at IMGT position 24 or later, or have a J gene with sequence identity < 50% to known IMGT germlines are removed. The estimated error rate for your dataset is then calculated based on how often the C23-C104 (IMGT numbering) conserved disulfide bridge is missing from your data. Sequences with residues seen at a given position less frequently than the estimated error rate are then filtered out of the dataset.
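The removal criteria and the error-rate estimate described above can be sketched roughly as follows. This is not the ABOSS implementation: the `NumberedRead` record, its field names and the toy reads are hypothetical, and the error rate is simplified here to the fraction of retained reads lacking a cysteine at IMGT position 23 or 104. The final per-position frequency filter is not shown.

```python
from dataclasses import dataclass


@dataclass
class NumberedRead:
    """Hypothetical record for one IMGT-numbered heavy-chain read."""
    residues: dict            # IMGT position (int) -> amino acid
    cdrh3_length: int         # IMGT CDRH3 length
    j_identity: float         # fractional identity to the closest J germline
    aligned: bool = True      # did the read align to a germline?
    has_indels: bool = False  # indels in canonical CDRs or frameworks?


def passes_hard_filters(read):
    """Apply the removal criteria listed in the text to one read."""
    if not read.aligned or read.has_indels:
        return False
    if read.cdrh3_length >= 37:
        return False
    if min(read.residues) >= 24:   # read starts at IMGT position 24 or later
        return False
    if read.j_identity < 0.5:
        return False
    return True


def estimate_error_rate(reads):
    """Fraction of reads missing Cys at IMGT 23 or 104 (a simplification)."""
    if not reads:
        return 0.0
    missing = sum(
        1 for r in reads
        if r.residues.get(23) != "C" or r.residues.get(104) != "C"
    )
    return missing / len(reads)


# Toy reads (made up for illustration).
reads = [
    NumberedRead({1: "Q", 23: "C", 104: "C"}, 12, 0.95),
    NumberedRead({1: "Q", 23: "S", 104: "C"}, 14, 0.90),  # Cys23 missing
    NumberedRead({30: "F", 104: "C"}, 10, 0.92),           # starts too late
    NumberedRead({1: "E", 23: "C", 104: "C"}, 40, 0.97),   # CDRH3 too long
]
kept = [r for r in reads if passes_hard_filters(r)]
error_rate = estimate_error_rate(kept)
print(len(kept), error_rate)
```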
13. BCR-seq Dataset Structural Annotation (SAAB+)
Code: https://github.com/oxpig/saab_plus
Required Inputs: (Separate Chain) BCR/Antibody Sequences
Paper: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007636
SAAB+ (developed from the original SAAB software, see https://www.frontiersin.org/articles/10.3389/fimmu.2018.01698/full) rapidly annotates BCR-seq datasets with structural features (e.g. CDRH1-2, CDRL1-3 canonical forms; closest CDRH3 template in the PDB).
14. BCR-seq Fv Structural Diversity Assessment (Repertoire Structural Profiling)
Required Inputs: BCR-seq Dataset (VH + VL, separate or paired)
Preprint: https://www.biorxiv.org/content/10.1101/2020.03.17.993444v2
Repertoire Structural Profiling converts BCR-seq datasets containing VH and VL reads (paired or unpaired) into the maximum diversity of modellable Fv structures that can currently be derived from those sequences. Once fully structurally modelled, these sets of Fv topologies are termed “Antibody Model Libraries”. These Antibody Model Libraries can be used as a geometric basis set for in vitro/in silico screening library development.
15. BCR/Antibody Paratope Prediction (Antibody iPatch)
Webserver: http://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/ABipatch.php
Required Inputs: Antibody and Antigen structures
Paper: https://academic.oup.com/peds/article/26/10/621/1512673
Antibody iPatch uses contact prediction, antibody binding profiles (derived from PDB antibody-antigen complexes), and the supplied antibody and antigen structures to rank antibody CDR residues based on their propensity to form part of the paratope [the region of the antibody that engages the antigen]. The inverse protocol, EpiPred (where antigen residues are ranked by their likelihood of forming part of the epitope), is available here: http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/epipred/
16. Predicted Paratope Clustering for Functional Annotation (Paratyping)
Code: http://opig.stats.ox.ac.uk/resources
Required Inputs: BCR/Antibody VH sequence
Preprint: https://www.biorxiv.org/content/10.1101/2020.06.02.121129v1
Paratyping uses the Parapred tool to mark up predicted paratope residues and then clusters BCR/antibody sequences by sequence identity over these residues. This tool is an orthogonal approach to clonotyping for repertoire functional analysis.
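As a rough illustration of clustering by identity over predicted paratope residues, the sketch below takes paratopes that have already been predicted and numbered, and groups antibodies greedily. Everything here is an assumption made for the example: the position labels, the normalisation by the smaller paratope, the 0.75 threshold and the greedy single-pass strategy. The preprint should be consulted for the actual procedure.

```python
def paratope_identity(p1, p2):
    """Identity over predicted paratope residues.

    p1 and p2 map a numbered position (e.g. "H107") to the residue
    predicted to be in the paratope. Normalising by the smaller
    paratope is an assumption made for this sketch.
    """
    if not p1 or not p2:
        return 0.0
    shared = sum(1 for pos, aa in p1.items() if p2.get(pos) == aa)
    return shared / min(len(p1), len(p2))


def greedy_cluster(paratopes, threshold=0.75):
    """Put each antibody in the first cluster whose representative it
    matches at or above `threshold`; otherwise start a new cluster."""
    clusters = []  # list of (representative_name, [member_names])
    for name, paratope in paratopes.items():
        for rep, members in clusters:
            if paratope_identity(paratopes[rep], paratope) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((name, [name]))
    return [members for _rep, members in clusters]


# Hypothetical predicted paratopes for three antibodies.
paratopes = {
    "ab1": {"H35": "Y", "H57": "S", "H107": "R", "H108": "D"},
    "ab2": {"H35": "Y", "H57": "T", "H107": "R", "H108": "D"},
    "ab3": {"H36": "W", "H58": "N", "H109": "G"},
}
clusters = greedy_cluster(paratopes)
print(clusters)
```

Unlike clonotyping, nothing in this comparison looks at V/J gene assignment or full CDRH3 identity, which is why the two approaches can group the same repertoire differently.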
17. Paratope Structural and Chemical Similarity Assessment (Ab-Ligity)
Code: http://opig.stats.ox.ac.uk/resources
Required Inputs: BCR/Antibody VH or Fv sequence
Preprint: https://www.biorxiv.org/content/10.1101/2020.03.24.004051v2
Ab-Ligity uses Parapred to predict paratope residues and ABodyBuilder to model the antibody Fv structure. It then converts these paratope residues and their coordinates into a hash-table representation that captures both structural and chemical features. A sufficiently high Tversky index between two hash tables suggests that the two antibodies have paratopes similar enough that they might be functionally equivalent.
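For readers unfamiliar with the Tversky index, it is defined as S(X, Y) = |X ∩ Y| / (|X ∩ Y| + α|X − Y| + β|Y − X|). Below is a minimal sketch over hash tables of feature counts, using Python's `Counter` as the multiset. The feature keys and the choice α = β = 0.5 are illustrative assumptions and are not taken from Ab-Ligity.

```python
from collections import Counter


def tversky_index(x, y, alpha=0.5, beta=0.5):
    """Tversky index between two multisets given as Counters.

    S = |X & Y| / (|X & Y| + alpha * |X - Y| + beta * |Y - X|)
    """
    common = sum((x & y).values())   # multiset intersection
    only_x = sum((x - y).values())   # counts present only in x
    only_y = sum((y - x).values())   # counts present only in y
    denom = common + alpha * only_x + beta * only_y
    return common / denom if denom else 0.0


# Hypothetical hashed paratope features (key -> number of occurrences).
a = Counter({"aro-pos-12": 2, "pol-neg-8": 1, "aro-aro-6": 1})
b = Counter({"aro-pos-12": 1, "pol-neg-8": 1, "ali-pol-10": 2})
score = tversky_index(a, b)
print(score)
```

With α = β = 0.5 the index reduces to the Dice coefficient, and with α = β = 1 to the Tanimoto (Jaccard) coefficient; unequal weights make the measure asymmetric.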
18. Therapeutic Antibody Developability Assessment (TAP)
Webserver: http://opig.stats.ox.ac.uk/webapps/newsabdab/sabpred/tap
Required Input: Antibody Variable Domain Sequence (Paired Chains)
Paper: https://www.pnas.org/content/116/10/4025
TAP builds a model of your antibody variable domain sequence (via ABodyBuilder) and calculates several surface properties, extreme values of which are linked to poor therapeutic developability outcomes. It then compares these values to a set of therapeutic antibodies that reached Phase II of clinical trials and flags outlying candidates.
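The flagging idea can be sketched as a comparison of each candidate property against the spread seen in a reference set. The property names, the numbers and the simple min/max rule below are all hypothetical; TAP's actual metrics and thresholds are described in the paper.

```python
def flag_outliers(candidate, reference):
    """Return the properties for which the candidate falls outside the
    [min, max] range observed in the reference set."""
    flags = []
    for prop, value in candidate.items():
        ref_values = reference[prop]
        if value < min(ref_values) or value > max(ref_values):
            flags.append(prop)
    return flags


# Made-up reference values standing in for a set of clinical-stage antibodies.
reference = {
    "cdr_length": [40, 45, 50, 55],
    "hydrophobic_patch": [90.0, 120.0, 150.0],
}
candidate = {"cdr_length": 48, "hydrophobic_patch": 175.0}
flags = flag_outliers(candidate, reference)
print(flags)
```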