Fragment Based Drug Discovery with Crystallographic Fragment Screening at XChem and Beyond

Disclaimer: I’m a current PhD student working on PanDDA 2 for Frank von Delft and Charlotte Deane, and sponsored by Global Phasing, and some of this is my opinion – if it isn’t obvious in one of the references I probably said it so take it with a pinch of salt

Fragment Based Drug Discovery

Principle

Fragment-based drug discovery (FBDD) is a technique for finding lead compounds for medicinal chemistry. In FBDD a protein target of interest is identified for inhibition and a small library, typically of a few hundred compounds, is screened against it. Though these fragments typically bind weakly, they can be used as starting points for chemical elaboration towards something more lead-like. This approach is primarily contrasted with high-throughput screening (HTS), in which an enormous number of larger, more complex molecules are screened in order to find ones which bind. The key insight is that the molecules in HTS libraries can typically be broken down into a much smaller number of common substructures, fragments, so screening these ought to be more informative: between them they describe more of the “chemical space” which interacts with the protein. Since it first appeared about 25 years ago, FBDD has delivered four drugs for clinical use and over 40 molecules to clinical trials.

References:

Libraries 

Critical to the success of a fragment screening campaign are the fragments screened. These must meet a variety of criteria to be useful: they must be representative of the chemical motifs that are likely to bind, they must be small enough to minimise unfavourable interactions and steric clashes, and they must be suitable for further chemical elaboration. Furthermore, ready availability is typically a practical requirement, especially for academic and small industrial users.

A variety of libraries are commonly used today, including the DSi-Poised library, the F2X-Universal library, FragMAXlib, the EU-OPENSCREEN fragment library and a whole family of commercial libraries.

References:

Methods

Many experimental techniques are available for performing fragment-based drug discovery, each with its own advantages and disadvantages. The main parameters to consider are the range of detectable affinities, the ability to quantify those affinities, the amount of information provided about the protein-ligand interaction, protein consumption, throughput and technical difficulty.

The main techniques used today are surface plasmon resonance (SPR), isothermal titration calorimetry (ITC), thermal shift assay (TSA), microscale thermophoresis (MST), nuclear magnetic resonance (NMR) and X-ray crystallography. As we’re mostly interested in crystallography at XChem I won’t detail the others, but other techniques are being investigated for orthogonal validation here!

References:

Follow Up Design

FBDD typically includes a further step before conventional medicinal chemistry: fragment elaboration. Fragments typically make very poor binders, so three common approaches are used to achieve the potency required of a lead compound: growing, merging and linking. This process will typically produce many follow-up compounds for a second round of experimental screening.

In growing, small chemical changes are made to the fragment. When structural information is available, this is typically guided by the structure of the protein. In merging, fragments with partial overlaps in space are combined in the hope of capturing the favourable interactions of both. In linking, non-overlapping but nearby fragments are combined by joining them with a new chemical moiety, a linker. Although the latter two approaches can in principle be performed without structural information on the ligand’s binding mode, you’re then basically guessing which fragments are suitable for combination.
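To make the merging idea concrete, here is a toy, purely geometric sketch (not any real elaboration tool, and the fragments and coordinates are made up): given two fragments as atom lists in the same crystal frame, atoms of the second fragment that nearly coincide with a matching atom of the first are treated as the shared substructure and kept only once.

```python
from math import dist

def merge_fragments(frag_a, frag_b, tol=1.0):
    """Naively merge two spatially overlapping fragments.

    Each fragment is a list of (element, (x, y, z)) atoms. An atom of
    frag_b within `tol` Angstroms of a same-element atom of frag_a is
    considered shared and not duplicated.
    """
    merged = list(frag_a)
    for elem, xyz in frag_b:
        shared = any(elem == e and dist(xyz, p) < tol for e, p in frag_a)
        if not shared:
            merged.append((elem, xyz))
    return merged

# Two hypothetical fragments sharing a carbon near the origin:
frag_a = [("C", (0.0, 0.0, 0.0)), ("N", (1.4, 0.0, 0.0))]
frag_b = [("C", (0.1, 0.0, 0.0)), ("O", (0.0, 1.4, 0.0))]

merged = merge_fragments(frag_a, frag_b)
# The shared carbon is kept once, so the merge has 3 atoms, not 4.
```

Real tools of course reason about bonds, chemistry and synthesisability, not just coordinates, but the core intuition of exploiting spatial overlap is the same.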

With this new generation of compounds another round of experimental screening is typically undertaken to determine which of them have bound, and whether they have achieved acceptable affinities to be considered as leads.

References:

Crystallographic Fragment Screening

Among the experimental techniques, crystallographic fragment based drug discovery provides uniquely powerful insight into the binding mode (with the possible exception of some NMR approaches). Furthermore it is sensitive to fragments with a larger range of affinities than other common approaches. This makes it by far the most powerful of the techniques for guiding rational fragment elaboration.

These advantages, however, are not without limitations. First and foremost are the crystals: a consistent crystal system that tolerates the soaking of high concentrations of fragments and diffracts to a reasonably high resolution (<2.5 Å) is required. Relative to other FBDD methods, it also requires significant amounts of protein. The analysis of results is also more complicated than in most methods (again with the exception of some NMR methods), with the ability to detect a binder requiring explicit construction of a model of its binding mode. Now that point might be a bit controversial: how compelling does a model need to be in order to be considered “real”? When do we believe a blob is the fragment we soaked, not just noise or a different ligand? These aren’t questions with easy answers.

Other significant limitations exist, however today these can be largely overcome with the correct setup. Historically throughput was a significant issue, although robotics and modern beamlines have dramatically accelerated this for those facilities with access to them. Partial occupancy binders have been very difficult to model, although PanDDA analysis typically greatly simplifies this. Even the heavy capital requirements can be overcome through platforms like XChem that provide their services to smaller, academic users.

Outside of FBDD, fragment methods have also been applied productively in basic science. Fragment screens have been used to probe protein dynamics, as they often introduce conformational changes on binding. They have also been used in functional studies, where previously unknown substrates of proteins have been discovered by comparison to fragments known to bind.

References:

The XChem platform

Overview

Diamond’s FBDD platform is XChem. It was the first in the world to offer routine medium-throughput crystallographic fragment screening to academia and industry. Today more than a hundred projects on diverse targets have been through the XChem pipeline, and between them users have raised more than five million pounds for progressing hits from XChem screens.

References:

Pre-screen

Typically projects are not advanced to XChem until a reliable crystal system has been demonstrated. That said, the final preparation of crystals for the screen, that is to say the soaking of fragments and sample harvesting, is typically performed in house. This relies on two key pieces of infrastructure: the Echo acoustic dispenser and the crystal shifter.

Simply soaking the correct fragments into the crystals can be enormously time consuming and error prone. Fortunately, the Echo acoustic dispenser completely automates this process. The device uses ultrasonic pulses to launch droplets of fragment-containing solution, of very precise volumes, into the crystal medium.
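Because acoustic dispensers fire droplets of a fixed volume, setting up a soak reduces to simple arithmetic. A back-of-the-envelope sketch, with entirely hypothetical numbers (the droplet size, stock concentration and drop volume here are illustrative, not the actual XChem protocol):

```python
import math

DROPLET_NL = 2.5  # assumed fixed droplet volume, in nanolitres

def droplets_needed(stock_mm, target_mm, drop_volume_nl):
    """Droplets of fragment stock needed to hit the target concentration
    in a crystallisation drop (ignoring the volume the transfer adds)."""
    transfer_nl = target_mm * drop_volume_nl / stock_mm
    return math.ceil(transfer_nl / DROPLET_NL)

# e.g. a 500 mM DMSO stock, aiming for 50 mM in a 200 nL drop:
# transfer = 50 * 200 / 500 = 20 nL, i.e. 8 droplets of 2.5 nL.
n = droplets_needed(500, 50, 200)
```

The value of the robot is that it performs this calculation-and-transfer for hundreds of crystals, each with a different fragment, without pipetting errors or transcription mistakes.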

Mounting the number of crystals required for an XChem screen on pins can also become prohibitively time consuming, as can keeping records of the process. To help manage this, the shifter was developed, largely on the premise that humans are still better at fishing crystals than robots, but that every other part of the process can be better handled by machines.

References:

Beamline

XChem operates on Diamond’s I04-1 beamline. I04-1 is optimised for speed and reliability, as little variation in beam parameters is needed for crystallographic fragment screening. I04-1 achieves its automated high throughput with off-the-shelf robotics, and its reliability is a product of a simple, fixed-wavelength monochromatic setup. Like other MX beamlines at Diamond, I04-1 uses a Pilatus detector for the actual data collection. As a result of a shortened undulator, I04-1 has only a fraction of the flux of its neighbouring beamlines, but this is little impediment for the experiments typically conducted there.

References:

Analysis

Data from the detector is processed automatically with DIALS, xia2, autoPROC and STARANISO. From these results, the final datasets for further analysis are selected based on assessed data quality and similarity to a reference model.
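The kind of selection involved can be sketched as a simple filter: keep datasets that processed to a reasonable resolution and whose unit cell is close to the reference crystal form. The field names, thresholds and cell values below are illustrative, not XChem’s actual criteria:

```python
# Hypothetical reference unit cell lengths (a, b, c) of the crystal form:
REFERENCE_CELL = (52.1, 61.3, 78.9)

def acceptable(dataset, max_res=2.5, cell_tol=0.02):
    """Keep a dataset if its resolution is good enough and its unit
    cell lengths are within a fractional tolerance of the reference."""
    cell_ok = all(
        abs(x - r) / r < cell_tol
        for x, r in zip(dataset["cell"], REFERENCE_CELL)
    )
    return dataset["resolution"] <= max_res and cell_ok

datasets = [
    {"name": "x0101", "resolution": 1.8, "cell": (52.0, 61.5, 79.0)},
    {"name": "x0102", "resolution": 3.1, "cell": (52.2, 61.2, 78.8)},  # too low res
    {"name": "x0103", "resolution": 2.0, "cell": (55.0, 61.3, 78.9)},  # wrong form
]
kept = [d["name"] for d in datasets if acceptable(d)]
# kept == ["x0101"]
```

Real pipelines compare against a full reference model (space group, R-factors, isomorphism), but the principle is the same: only datasets comparable to the reference are worth carrying into the statistical analysis.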

One of the biggest innovations to come out of I04-1 is the PanDDA algorithm, developed by Nick Pearce. Determination of binding in these experiments is limited by the need to model partial occupancy fragments. This is typically extremely challenging: even spotting the relevant blob in contoured electron density can be almost impossible, let alone modelling it with any confidence. PanDDA addresses both of these issues by detecting clusters of outlying electron density and producing background-corrected “event maps”, which attempt to approximate what those blobs would look like if they were a fragment bound at full occupancy.
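A minimal numerical sketch of the two ideas, on a toy 1D “map” of density values (real maps are 3D grids, and the numbers here are invented): first, flag grid points whose density is an outlier against the distribution over many ground-state datasets; second, compute a background-corrected event map as event = (obs − (1 − f) · ground) / f, where f is the estimated bound-state fraction (PanDDA parameterises this via the background density correction factor, BDC, which corresponds to 1 − f).

```python
from statistics import mean, stdev

ground_datasets = [  # density at 5 grid points in 4 apo datasets
    [0.1, 0.2, 0.1, 0.3, 0.2],
    [0.2, 0.1, 0.2, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.2, 0.2],
    [0.2, 0.1, 0.2, 0.3, 0.1],
]
obs = [0.15, 0.16, 0.9, 0.25, 0.15]  # dataset with a weak binder at point 2

# Per-grid-point statistics of the ground-state ensemble:
ground_mean = [mean(col) for col in zip(*ground_datasets)]
ground_std = [stdev(col) for col in zip(*ground_datasets)]

# Z-map: how unusual is each point of the observed map?
z_map = [(o - m) / s for o, m, s in zip(obs, ground_mean, ground_std)]
outliers = [i for i, z in enumerate(z_map) if z > 3.0]

# Background-corrected event map at an assumed bound fraction f:
f = 0.3
event_map = [(o - (1 - f) * g) / f for o, g in zip(obs, ground_mean)]
```

Only point 2 stands out in the Z-map, and the event map boosts its density towards what a fully occupied fragment would produce, which is what makes modelling it tractable. The real algorithm additionally clusters outliers in 3D, estimates f per event and handles map alignment and scaling.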

References:

Follow Up Design

As fragment elaboration typically requires immediate follow-up experiments to confirm binding, and does not typically concern itself too much with medicinal suitability, it makes sense to make this chemistry part of the screen. A number of tools, particularly from around Oxford, are used at XChem for this, including Fragmenstein, DeLinker and fragment binding hotspots.

Fragment hotspots are another recent technique used to drive fragment elaboration. In this technique, binding information from the CSD is used to estimate the propensity of parts of the protein’s surface to form a given type of interaction, and hence to derive a map of where favourable contacts can be made. This allows chemists to design compounds, based on the initial hits, that target these regions. Furthermore, recent advances have even allowed designing for selectivity over related proteins.

Fragmenstein is another tool, primarily for merging fragments, but it is also capable of positioning follow-ups inspired by fragments with known binding locations. It takes a minimum common substructure approach, but differs interestingly in imposing the strong restraint that the relative positions of the fragment atoms in their binding modes are preserved.

DeLinker, on the other hand, targets the linking problem. The program treats linking as a graph problem, and is hence able to leverage deep generative graph neural networks to propose chemical linkers.

References:

COVID Moonshot

Without a doubt, the largest project that XChem has been involved in to date is the COVID Moonshot. Diamond Light Source’s contribution to this global effort to develop novel small-molecule therapeutics for SARS-CoV-2 was primarily through the XChem platform. Since the beginning of the pandemic, data from more than a thousand crystals has been collected at XChem and used to drive the consortium’s compound design.

Perhaps most remarkably, XChem was able to go from getting crystals to over 600 datasets screened in under 72 hours. Having been at the beamline during this period, helping with PanDDA analyses of the initial results, I have to say it was pretty amazing to see up close.

References:

Other Platforms

There is a range of other fragment screening facilities around Europe, and the links between them and XChem are long and productive. FragMAX is a fragment screening platform operating on the BioMAX beamline at MAX IV. It works on similar principles to XChem, also using the crystal shifter and beamline crystal-mounting robotics, and it has its own data analysis application: FragMAXapp. The Helmholtz-Zentrum Berlin also offers a fragment screening platform at BESSY II, similar to the FragMAX offering (it even uses FragMAXapp). The high-throughput screening lab at EMBL Grenoble offers another fragment screening platform, based on their CrystalDirect technology, developed to offer a fully automated pipeline from web-based crystallisation trials through to collection.

On the commercial side, Astex offers crystallographic fragment screening through their Pyramid discovery platform, alongside other screening techniques such as NMR and ITC. I couldn’t find many specifics on what that looks like internally, however.

References:

Future for crystallographic FBDD at Diamond

K04

With Diamond-II requiring the current I04-1 beamline to be decommissioned, the opportunity is being taken to build a new beamline for XChem: K04. This will enable an order-of-magnitude increase in throughput, which will help manage the massive oversubscription but also create room for more high-risk and exploratory experiments. Furthermore, the much higher energies that will be available with the rebuild and the synchrotron upgrades will make fragment screens on membrane proteins, crystals of protein-protein complexes and other currently poorly handled systems feasible. No doubt these more ambitious systems will bring with them a slew of new edge cases for analysis as well.

References:

Advances in robotic synthesis 

XChem is currently exploring ways to accelerate follow-up screening through the use of robotic synthesis. Typically the number of follow-up compounds that can be practically tested is at most in the tens, even though the use of poised fragments means that the easily accessible chemical space is much larger than this. Robotic synthesis would open up the potential to generate and screen hundreds of follow-ups, potentially producing much more promising candidates and yielding additional useful information for medicinal chemistry on what modifications a compound will tolerate while still binding. While Opentrons have made this kind of robotic chemistry much more accessible, there are still hurdles to be overcome. The intended method of screening without purification also complicates analysis: when the soak contains multiple chemically similar compounds, the prior that informs modelling is much weaker, and statistical model comparison may become necessary, not to mention the potential need for new techniques to analyse other biophysical assays of reaction mixtures.

Advances in analysis

The greatest challenge for analysing future experiments at XChem is, in my opinion, one of scale. More crystals screened means every step of analysis becomes performance-critical, and must be robust to every edge case crystallography can throw at it. Moreover, the sheer volume of data, and its storage and handling, is likely to become a major challenge, requiring sophisticated summaries and models, and massive computational infrastructure, to be fully leveraged.

Beyond pragmatic concerns, this level of scale presents both opportunities and challenges for current analysis algorithms. Any algorithm whose maximum memory requirements scale with the number of datasets will likely need reconsidering. Crude mixtures, heterogeneity in crystal systems and follow-up screens will all need to be routinely handled. On the opportunities side, the scale and repeatability of this data is likely to open up enormous possibilities for machine learning and other statistical analysis, particularly in the detection and automated modelling of fragment binding events. All of these have been key aims of my own work on the PanDDA algorithm.

Summary

Crystallographic fragment screening at XChem and other facilities has only begun in earnest in the past few years, but the promising candidates heading into trials and the growing body of publications suggest it is set to become one of the most powerful forms of novel lead discovery.

There are significant challenges and opportunities ahead, at XChem and other fragment screening facilities. Some are more general to crystallography: partial occupancy modelling, getting reliably diffracting crystal systems fast and quantifying confidence in binding when evidence from a single dataset is weak. Others, such as the design and automated synthesis of follow ups, are more specific to FBDD. Either way, there is a lot to work on.

Author