In keeping with the other posts in recent weeks, and to provide a certain continuity, this post also focusses on antibodies. If you have read the last few weeks’ posts, you may wish to skip the first few paragraphs; otherwise things may get repetitive…
Antibodies are key components of the immune system, with almost limitless potential variability. This means that the immune system is capable of producing antibodies that can bind to almost any target. Antibodies exhibit very high specificity and very high affinity towards their targets, which makes them excellent at their job: marking their targets (antigens) to identify them to the rest of the immune system, either for modification or destruction.
(left) The immunoglobulin G (IgG) structure, the most common antibody isotype. It is formed of four chains, two heavy and two light. The binding regions of the antibody are at the ends of the variable domains VH and VL, located at the ends of the heavy and light chains respectively. (right) The VH domain. At the end of both the VH and the VL domains are three hypervariable loops (CDRs) that account for most of the structural variability of the binding site. The CDRs are highlighted in red. The rest of the domain (coloured in cyan) is known as the framework.
Over the past few years, the use of antibodies as therapeutic agents has increased. It is now at the point where we are beginning to computationally design antibodies to bind to specific targets. Whether they are designed to target cancer cells or viruses, the task of designing the CDRs to complement the antigen perfectly is a very difficult one. Computationally, the best way of predicting the affinity of an antibody for an antigen is through the use of docking programs.
For best results, high-resolution, very accurate models of both the antibody and the antigen are needed. This is because, experimentally, small changes in the antibody’s sequence can produce large changes in affinity.
Many antibody modelling protocols currently exist, including WAM, PIGS, and RosettaAntibody. These use a variety of approaches. WAM and PIGS use homology modelling approaches to model the framework, augmented with expert knowledge-based rules to model the CDRs. RosettaAntibody also uses homology modelling to model the framework of the antibody, but then uses the Rosetta protocol to perform an exploration of the conformational space to find the lowest energy conformation.
However, several problems remain. The orientation between the VH domain and the VL domain has been shown to be instrumental in the high binding affinity of the antibody. Mutations to framework residues that change the orientation of the VH and VL domains have been shown to cause significant changes to the binding affinity.
Because multi-chain modelling currently has no general solution, the usual approach is to copy the VH-VL orientation from the template antibody to the target antibody. (The three protocols above do perform some degree of orientation optimisation using conserved residues at the VH-VL interface.)
However, before we consider how to model the VH-VL interface, we must first build the VH and the VL separately. All of the domain folds in the IgG structure are very similar, consisting of two anti-parallel beta sheets sandwiched together. These beta sheets are very well conserved. The VH domain is harder to model because it contains CDR H3, the longest and most structurally variable of the six CDRs, so we may as well start there…
Framework structural alignment of 605 non-redundant VHs (made non-redundant at 95% sequence identity). The beta sheet cores are very well conserved, but the loops exhibit more structural variability (although not that much by general protein standards…). The stumps where the CDRs have been removed are labelled.
But even before we start modelling the VH, how hard is the homology modelling problem likely to be for the average VH sequence that we come across? We extracted all of the VH sequences from the IMGT database (72,482 sequences) and, for each one, found the structure in SAbDab (Structural Antibody Database) with the highest sequence identity to it. This is the structure that would generally be used as the template for modelling. Results below…
Most of the sequences have a best template with over 70% sequence identity, so modelling them with low RMSDs (< 1 Å) should be possible. However, some sequences only have templates with lower sequence identity. These could be problematic…
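As a rough illustration of the template search described above, here is a minimal Python sketch. It assumes the sequences have already been aligned to a common length (for example via a shared numbering scheme), with "-" marking gaps, and it defines identity as the fraction of matching residues over columns where both sequences have a residue. The function names, the toy sequences and the template labels are all hypothetical; this is not the pipeline used to produce the results above.

```python
def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap.

    Assumes `a` and `b` are pre-aligned to the same length, with '-' as the gap character.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return matches / len(compared)


def best_template(query: str, templates: dict) -> tuple:
    """Return (template_id, identity) for the template most identical to `query`."""
    scored = ((tid, sequence_identity(query, seq)) for tid, seq in templates.items())
    return max(scored, key=lambda pair: pair[1])


# Toy, hypothetical aligned fragments (not real database entries).
query = "EVQLVESGGG"
templates = {
    "tmpl_a": "QVQLVQSGAE",
    "tmpl_b": "EVQLLESG-G",
}
best_id, best_identity = best_template(query, templates)
```

Ignoring gapped columns keeps the sketch short, but it can flatter templates with many insertions or deletions; dividing by the full query length instead would be a stricter choice.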
When we are analysing the accuracy of our models, we often generate models for which we have experimentally derived crystal structures, and then compare them. But a crystal structure is not necessarily the native conformation of the protein, and some of the solvents added to aid the crystallisation could well distort the structure in some small (or possibly large) way. Or perhaps the protein is just flexible, and so we wouldn’t expect it to adopt just one conformation.
Again using SAbDab to help generate our datasets, we found the maximum variation (backbone RMSD) between sequence-identical VH domains, for the framework region only. How different can 100% identical sequences get? Again, results are below…
We see that even for 100% identical domains, the conformations can differ enough to give a significant RMSD. The largest difference, an RMSD of 1.4 Å (PDB entries 4fqc and 4fq1), is due to a completely different conformation for one of the framework loops.
So, although antibody modelling is easy in some respects (high conservation, a large number of available structures for templates), it is not just a matter of getting it ‘close’, or even ‘good’. It’s about getting it as near to perfect as possible… (even though perfect may be ~0.4 Å RMSD over the framework…)
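For readers who want to try this kind of comparison themselves, here is a minimal sketch of an RMSD calculation after optimal superposition, using the Kabsch algorithm with NumPy. It assumes you have already extracted two equal-length arrays of matched backbone atom coordinates for the framework residues; the function name and the synthetic coordinates are hypothetical, and the post does not state which superposition method was actually used.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Rows of P and Q must correspond to the same atoms in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Kabsch: SVD of the covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip one axis if needed so that we get a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


# Synthetic check: a rotated and translated copy should superpose almost exactly.
rng = np.random.default_rng(0)
coords = rng.normal(size=(20, 3))
theta = 0.7
Rz = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
moved = coords @ Rz.T + np.array([1.0, -2.0, 3.0])
rmsd_rigid = kabsch_rmsd(coords, moved)
```

To reproduce the kind of analysis described here, one would call this on every pair of sequence-identical framework coordinate sets and keep the maximum.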
Watch this space…
“Perfection is not attainable, but if we chase perfection we can catch excellence.”
(Vince Lombardi)