I recently had the opportunity to present my work on antibody virtual screening at the 2021 ISMB/ECCB virtual conference. In this blog post, I want to summarise two research projects presented in the same 3DSIG immunoinformatics session. They highlight two different avenues of approaching epitope prediction (and immunoinformatics questions in general): structure-based (Epitope3D) and sequence-based (SeRenDIP-CE).
Epitope3D
Presented by Bruna Moreira da Silva, Epitope3D is an epitope prediction tool developed at the University of Melbourne. It uses graph-based signatures and is an example of a structure-based approach to epitope prediction. It was developed and trained using a curated, non-redundant set of 200 antigen structures with marked epitopes.
Each residue is labelled according to whether it belongs to the epitope and is characterised by a graph-based signature of its neighbourhood, which represents the geometry and chemical composition of the residue's environment. This makes the approach similar to recent attempts to utilise geometric features in protein model assessment and property prediction.
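To make the idea of a neighbourhood signature more concrete, here is a minimal sketch of one way such a descriptor could be computed. It is not the Epitope3D implementation: the coarse residue classes, the distance cutoffs, the use of a single coordinate per residue and the function name residue_signature are all assumptions made for illustration.

```python
import math

# Coarse, simplified chemical classes (an assumption for this sketch;
# the real tool's typing scheme may be quite different).
CLASSES = ("hydrophobic", "polar", "positive", "negative")
RESIDUE_CLASS = {}
for cls, names in {
    "hydrophobic": "ALA VAL LEU ILE MET PHE TRP PRO GLY",
    "polar": "SER THR ASN GLN TYR CYS HIS",
    "positive": "LYS ARG",
    "negative": "ASP GLU",
}.items():
    for name in names.split():
        RESIDUE_CLASS[name] = cls


def residue_signature(index, residues, cutoffs=(6.0, 9.0, 12.0)):
    """Count neighbours of each chemical class within each distance cutoff.

    residues is a list of (three_letter_name, (x, y, z)) tuples, e.g. one
    representative atom per residue. The counts are cumulative: a neighbour
    within 6 A is also counted at the 9 A and 12 A cutoffs.
    """
    _, centre = residues[index]
    signature = []
    for cutoff in cutoffs:
        counts = dict.fromkeys(CLASSES, 0)
        for j, (name, coord) in enumerate(residues):
            if j != index and math.dist(centre, coord) <= cutoff:
                counts[RESIDUE_CLASS[name]] += 1
        signature.extend(counts[c] for c in CLASSES)
    return signature


# Toy example with made-up coordinates (in Angstroms).
toy_residues = [
    ("LYS", (0.0, 0.0, 0.0)),
    ("ASP", (5.0, 0.0, 0.0)),
    ("LEU", (0.0, 8.0, 0.0)),
    ("SER", (0.0, 0.0, 11.0)),
    ("ALA", (20.0, 0.0, 0.0)),
]
signature = residue_signature(0, toy_residues)
print(signature)
```

Each residue ends up with a fixed-length vector (here 3 cutoffs x 4 classes = 12 numbers) that can be paired with its epitope/non-epitope label and passed to any standard classifier.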
The resulting signatures are used as input to an AdaBoost classifier, which is tested against a set of 45 held-out antigens. In a comparison study against several other epitope prediction tools, Epitope3D boasted impressive classification performance. It will be interesting to see further evaluation of this tool.
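For readers unfamiliar with AdaBoost, the following is a minimal pure-Python sketch of the algorithm using decision stumps as weak learners. It only illustrates the general technique; the weak learners, hyperparameters and features used by Epitope3D are not covered here, and the toy data and function names are made up.

```python
import math


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump.

    Labels y are +1 (e.g. epitope) or -1 (non-epitope); w are sample weights.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign


def adaboost_fit(X, y, n_rounds=5):
    """Train an ensemble of (alpha, stump) pairs with sample reweighting."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        # Clamp the error to avoid division by zero or log of zero.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy feature vectors (made up); in practice these would be residue signatures.
X = [[0.1, 1.0], [0.2, 0.0], [0.8, 1.0], [0.9, 0.0]]
y = [-1, -1, 1, 1]
model = adaboost_fit(X, y)
train_preds = [adaboost_predict(model, row) for row in X]
print(train_preds)
```

In practice one would reach for an established library implementation rather than hand-rolling the boosting loop; the sketch is only meant to show how each round re-weights the residues that the previous weak learners got wrong.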
SeRenDIP-CE
On the sequence side, Dr Anton Feenstra presented SeRenDIP-CE, a random forest method for predicting epitopes from sequence.
This method is based on deriving a range of features from the antigen amino-acid sequence (172 features across a sliding window of 9 amino acids, incorporating MSA-derived information) and training a random forest using those features to predict epitopes on a set of antigens collected from our very own Structural Antibody Database (SAbDab).
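As a rough illustration of what sliding-window featurisation looks like, the sketch below builds one feature row per residue from a window of 9 amino acids centred on it. It is an assumption-laden toy: it uses a single stand-in feature (the Kyte-Doolittle hydropathy scale) rather than SeRenDIP-CE's actual 172 features, ignores MSA-derived information entirely, and the function name window_features and the padding scheme are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in feature.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence, window=9, pad="X"):
    """Return one feature row per residue from a centred sliding window.

    The sequence is padded at both ends so that terminal residues also get
    a full-length window; padding positions contribute 0.0.
    """
    half = window // 2
    padded = pad * half + sequence + pad * half
    rows = []
    for i in range(len(sequence)):
        segment = padded[i:i + window]
        rows.append([HYDROPATHY.get(aa, 0.0) for aa in segment])
    return rows


# Toy sequence (made up).
features = window_features("ACDEFGHIKLM")
print(len(features), len(features[0]))
```

Each row can then be paired with a per-residue epitope label and used to train a classifier such as a random forest.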
Interestingly, SeRenDIP-CE made use of a transfer learning approach of sorts: its training set combines data from antibody-antigen interactions with data from general hetero-dimer protein-protein interactions (PPIs), the latter used for the authors' previous SeRenDIP model. (I say transfer learning of sorts because the model is trained once on the full combined dataset, rather than being trained on the hetero-dimer dataset first and then updated with the more specific antigen dataset.) The authors reported that this training procedure achieved considerably better results than training on either the antigen dataset or the hetero-dimer dataset alone.
This has some interesting implications for general development of machine learning methods for antibody-antigen interactions, as it implies that, despite the dissimilarity of binding modes in antibody-antigen interactions compared to general PPIs, antibody machine learning methods can benefit from dataset augmentation from such PPI datasets.