This paper describes the complete XChem process for crystal-based fragment screening, from applying for access through all subsequent steps to data dissemination.
In fragment-based drug discovery, hundreds or often thousands of compounds smaller than ~300 Da are tested against the protein of interest to identify chemical entities that can be developed into potent drug candidates. Since the compounds are small, interactions are weak, and the screening method must therefore be highly sensitive; moreover, structural information tends to be crucial for elaborating these hits into lead-like compounds. Therefore, protein crystallography has always been a gold-standard technique, yet historically too challenging to find widespread use as a primary screen.
Initial XChem experiments were demonstrated in 2014 and then trialed with academic and industrial collaborators to validate the process. Since then, a large research effort and significant beamtime have streamlined sample preparation, developed a fragment library with rapid follow-up possibilities, automated and improved the capability of the I04-1 beamline for unattended data collection, and implemented new tools for data management, analysis and hit identification.
XChem is now a facility for large-scale crystallographic fragment screening, supporting the entire crystals-to-deposition process, and accessible to academic and industrial users worldwide. The peer-reviewed academic user program has been actively developed since 2016, to accommodate projects from as broad a scientific scope as possible, including well-validated as well as exploratory projects. Academic access is allocated through biannual calls for peer-reviewed proposals, and proprietary work is arranged by Diamond’s Industrial Liaison group. This workflow has already been routinely applied to over a hundred targets from diverse therapeutic areas, and effectively identifies weak binders (1%-30% hit rate), which both serve as high-quality starting points for compound design and provide extensive structural information on binding sites. The resilience of the process was demonstrated by continued screening of SARS-CoV-2 targets during the COVID-19 pandemic, including a 3-week turn-around for the main protease.
Fragment-Based Drug Discovery (FBDD) is a widely used strategy for lead discovery; since its emergence 25 years ago, it has delivered four drugs for clinical use, and more than 40 molecules have been advanced to clinical trials1,2,3. Fragments are small chemical entities, usually with a molecular weight of 300 Da or less. They are selected for their low chemical complexity, which provides good starting points for development of highly ligand-efficient inhibitors with excellent physicochemical properties. Their size means that they sample the binding landscape of proteins more thoroughly than libraries of larger drug- or lead-like compounds, and thus also reveal hot spots and putative allosteric sites. Combined with structural information, fragments provide a detailed map of the potential molecular interactions between protein and ligand. Nevertheless, reliably detecting and validating those entities, which tend to bind weakly to the target protein, requires an array of robust and sensitive biophysical screening methods such as Surface Plasmon Resonance (SPR), Nuclear Magnetic Resonance (NMR), or Isothermal Titration Calorimetry (ITC)4,5.
X-ray crystallography is an essential part of the FBDD toolkit: it is sensitive enough to identify weak binders and directly yields structural information about the interactions at a molecular level. It is complementary to other biophysics screens and usually essential for progressing fragment hits to lead compounds; it requires high quality crystal systems, meaning that crystallization is highly reproducible, and crystals ideally diffract to better than 2.8 Å resolution.
Historically, it has been very difficult to use crystallography as a primary fragment screen6,7,8, whether in academia or in industry. Meanwhile, synchrotrons have achieved order-of-magnitude improvements in robotics, automation9,10,11 and detector technology12,13. Combined with equally accelerated computing power and data-processing algorithms14,15,16, these mean that complete diffraction datasets can be measured in seconds, and large numbers of them entirely unattended, as pioneered at LillyCAT7 and later MASSIF17,18 (European Synchrotron Radiation Facility (ESRF)). This led synchrotrons to develop highly streamlined platforms that make crystal-based fragment screening as a primary screen accessible to a wide user community (XChem at Diamond; CrystalDirect at EMBL/ESRF19; BESSY at Helmholtz-Zentrum Berlin20; FragMax at MaxIV21).
This paper documents the protocols that constitute the XChem platform for fragment screening by X-ray crystallography, from sample preparation to the final structural results of 3D-modeled hits. The pipeline (Figure 1) required developing new approaches to crystal identification22, soaking23, and harvesting24, as well as data management software25 and an algorithmic approach to identifying fragments26 that is now widely used in the community. The crystal harvesting technology is now sold by a vendor (see Table of Materials), and the open availability of the tools has allowed other synchrotrons to adapt them to set up equivalent platforms21. Ongoing projects address data analysis, model completion, and data dissemination through the Fragalysis platform27. The sample preparation laboratory is adjacent to beamline I04-1, simplifying the logistics of transferring hundreds of frozen samples to the beamline, and dedicated beamtime on I04-1 allows rapid X-ray feedback to guide the campaign.
XChem is an integral part of Diamond's user program, with two calls per year (early April and October). The peer-review process has been refined in consultation with experts in drug discovery from Academia and Industry. Along with a strong science case, the proposal process28 requires applicants to self-assess not only the readiness of the crystal system, but also their expertise in biochemical and orthogonal biophysical methods and capacity to progress screening hits through follow-up chemistry. The modes of access have also evolved to accommodate the multidisciplinary user community:
Tier 1 (single project) is for projects at the exploratory stage; hit validation tools (biophysical or biochemical) and follow-up strategies need not be in place. If accepted, the project is granted a reduced number of beamtime shifts, enough for proof of concept.
Tier 2 (single project) is for well-validated projects and requires downstream tools and follow-up strategies to be in place. If accepted, the project is allocated enough beamtime for a full fragment screening campaign. Single projects (Tier 1 or Tier 2) are to be completed within the 6 months of the allocation period (either April to September or October to March).
Block Allocation Group (BAG) is for a consortium of groups and projects, where a robust target selection and prioritization process is in place within the BAG, along with a clear follow-up pipeline. BAGs must have at least one fully XChem-trained expert (superuser), who coordinates their activities with Diamond staff and trains the BAG members. The allocated number of beamtime shifts is defined by the number of scientifically strong projects in the BAG and is re-evaluated per allocation period based on the BAG's report. The access is available for 2 years.
The XChem experiment is divided into three stages, with a decision point for each: solvent tolerance test, pre-screen, and main screen (Figure 2). The solvent tolerance test helps define the soaking parameters: the amount of solvent (DMSO, ethylene glycol, or other cryoprotectants if needed) the crystal system can tolerate, and for how long. Solvent concentrations typically range from 5%-30%, tested over at least two time points. Diffraction data are collected and compared to the base diffraction of the crystal system; this determines the soaking parameters for the following stage. For the pre-screen, 100-150 compounds are soaked using the conditions determined in the solvent test; its purpose is to confirm that the crystals can tolerate the compounds in those conditions. If needed, the cryoprotectant is subsequently added to the drops already containing the fragments. The success criterion is that 80% or more of the crystals survive well enough to yield diffraction data of good and consistent quality; if this fails, soaking conditions are usually revised by altering the soak time or solvent concentration. Following a successful pre-screen, the rest of the compounds chosen for the experiment can be set up using the final parameters.
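The pre-screen decision point can be illustrated with a short sketch. It assumes, for illustration only, that "good quality" is judged by a resolution cutoff (the 2.8 Å default is borrowed from the resolution guideline given elsewhere in this paper) and that a crystal yielding no usable dataset is recorded as None. The function name and its parameters are hypothetical and are not part of any XChem software.

```python
from typing import Optional, Sequence


def prescreen_passes(resolutions: Sequence[Optional[float]],
                     resolution_cutoff: float = 2.8,
                     required_fraction: float = 0.8) -> bool:
    """Return True if enough soaked crystals still diffract acceptably.

    resolutions: one entry per soaked crystal, in Angstrom; None means the
    crystal did not yield a usable dataset.
    resolution_cutoff: assumed proxy for "good quality" (hypothetical default).
    required_fraction: fraction of crystals that must pass (0.8 = 80%).
    """
    if not resolutions:
        raise ValueError("no crystals were recorded")
    # A smaller number means better resolution, so "good" is <= cutoff.
    good = sum(1 for r in resolutions
               if r is not None and r <= resolution_cutoff)
    return good / len(resolutions) >= required_fraction


# Ten soaked crystals: eight diffract well, one is lost, one is too weak.
print(prescreen_passes([2.0] * 8 + [None, 3.5]))
```

In practice the consistency of the data would also be inspected; the sketch only captures the 80% survival threshold.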
The DSI-poised library (see Table of Materials) was purposely designed to allow rapid follow-up progression using poised chemistry29 and has been the facility's workhorse library. It is available to users at a concentration of 500 mM in DMSO. Academic users can also access other libraries provided by collaborators (over 2,000 compounds in total) at concentrations of 100-500 mM in DMSO (a full list can be found on the website28). Much of the overall collection is also available in ethylene glycol, for crystal systems that do not tolerate DMSO. Users can also bring their own libraries, provided they are in plates compatible with the acoustic liquid handling system (see Table of Materials).
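The relationship between stock concentration, solvent fraction, and final compound concentration in the drop can be made concrete with a little arithmetic. The sketch below assumes that a volume v of stock is added to a drop of volume V, so that the final solvent fraction is f = v / (V + v) and the final compound concentration is the stock concentration multiplied by f. The drop volume used in the example and the 2.5 nL dispensing increment are assumptions for illustration, and the function name is hypothetical.

```python
def soak_plan(drop_volume_nl: float, stock_mm: float,
              target_solvent_fraction: float, increment_nl: float = 2.5):
    """Estimate the stock volume to dispense into a crystallization drop.

    Returns (dispensed volume in nL, actual solvent fraction,
    final compound concentration in mM).
    increment_nl is an assumed dispensing step size, not a documented value.
    """
    if not 0 < target_solvent_fraction < 1:
        raise ValueError("target_solvent_fraction must be between 0 and 1")
    # Solve f = v / (V + v) for v.
    ideal = target_solvent_fraction * drop_volume_nl / (1 - target_solvent_fraction)
    # Round to the nearest whole number of dispensing increments.
    dispensed = round(ideal / increment_nl) * increment_nl
    actual_fraction = dispensed / (drop_volume_nl + dispensed)
    return dispensed, actual_fraction, stock_mm * actual_fraction


# Hypothetical 200 nL drop, 500 mM stock in DMSO, aiming for 20% solvent.
volume, fraction, concentration = soak_plan(200.0, 500.0, 0.20)
print(f"{volume:.1f} nL -> {fraction:.1%} solvent, {concentration:.0f} mM compound")
```

Under these assumptions, a 500 mM stock soaked at 20% solvent gives roughly 100 mM compound in the drop, which illustrates why high stock concentrations are needed to drive binding of weak fragments.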
For all three steps of the experiment (solvent characterization, pre-screen or full screen), the following sample preparation procedures are identical (Figure 3): selection of the compound dispensing location through imaging and targeting of crystallization drops with TeXRank22; dispensing into drops using the acoustic liquid dispensing system for both solvent and compounds23; efficient harvesting of the crystals using the Crystal shifter24; and upload of sample information into the beamline database (ISPyB). The current interface for experiment design and execution is an Excel-based application (SoakDB), which generates the necessary input files for the different equipment of the platform, and tracks and records all results in an SQLite database. Barcode scanners are used at various stages throughout the process to help track samples and this data is added to the database.
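The idea of tracking every sample through soaking and harvesting in an SQLite database can be sketched as follows. The table layout, column names, and status values are hypothetical and chosen only for illustration; they do not reproduce the actual SoakDB schema.

```python
import sqlite3


def create_tracking_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a minimal, hypothetical sample-tracking table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS samples (
               sample_id       TEXT PRIMARY KEY,
               plate_barcode   TEXT NOT NULL,
               well            TEXT NOT NULL,
               compound_id     TEXT,
               solvent_percent REAL,
               soak_minutes    REAL,
               puck_barcode    TEXT,
               pin_position    INTEGER,
               status          TEXT NOT NULL DEFAULT 'soaked')"""
    )
    return conn


def record_soak(conn, sample_id, plate_barcode, well,
                compound_id, solvent_percent, soak_minutes):
    """Register a drop once compound or solvent has been dispensed into it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO samples (sample_id, plate_barcode, well, compound_id,"
            " solvent_percent, soak_minutes) VALUES (?, ?, ?, ?, ?, ?)",
            (sample_id, plate_barcode, well, compound_id,
             solvent_percent, soak_minutes),
        )


def record_harvest(conn, sample_id, puck_barcode, pin_position):
    """Attach the scanned puck barcode and pin position to a soaked sample."""
    with conn:
        cur = conn.execute(
            "UPDATE samples SET puck_barcode = ?, pin_position = ?,"
            " status = 'mounted' WHERE sample_id = ?",
            (puck_barcode, pin_position, sample_id),
        )
    if cur.rowcount != 1:
        raise KeyError(sample_id)


conn = create_tracking_db()
record_soak(conn, "target-x0001", "PLATE001", "A01a", "cmpd-001", 20.0, 60.0)
record_harvest(conn, "target-x0001", "PUCK042", 1)
print(conn.execute("SELECT sample_id, status FROM samples").fetchall())
```

Keeping one row per sample, keyed by a stable identifier, is what allows barcode scans at later stages to be joined back to the soaking conditions.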
Diffraction data are collected in unattended mode using dedicated beamtime on beamline I04-1. Two centering modes are available, namely, optical and X-ray based17. For needle and rod-shaped crystals, X-ray centering is advised, whereas chunkier crystals generally support optical mode, which is faster and, therefore, allows more samples to be collected in the allotted beamtime. Depending on the resolution of the crystals (established before entering the platform), data collection uses either 60 s or 15 s total exposure. Data collection during the solvent test stage usually informs which combination will work best with the performance of beamline I04-1.
The large volume of data analysis is managed through XChemExplorer (XCE)25, which can also be used to launch the hit identification step using PanDDA26. XCE is a data management and workflow tool that supports large-scale analysis of protein-ligand structures (Figure 4); it reads any of the auto-processing results from data collected at Diamond Light Source (DIALS16, Xia214, AutoPROC30, and STARANISO31) and auto-selects one of the results based on data quality and similarity to a reference model. The reference model must be representative of the crystal system used for XChem screening, and must include all waters or other solvent molecules, as well as all co-factors, ligands, and alternative conformations visible in crystals soaked with solvent only. The quality of this reference model will directly impact the amount of work required during the model building and refinement stage. PanDDA is used to analyze all the data and identify binding sites. It aligns structures to a reference structure, calculates the statistical maps, identifies events, and calculates event maps26,32. In the PanDDA paradigm, it is neither necessary nor desirable to build the full crystallographic model; what must be modeled is only the view of the protein where a fragment is bound (the bound-state model), so the focus need be only on building the ligand and surrounding residues/solvent molecules according to the event map32.
1. Project proposal submission
2. Preparation for the visit
3. Fragment screening experiment
4. Data collection
NOTE: Data is collected in an unattended mode and managed by the XChem/beamline team.
5. Data analysis
6. Depositing the data
NOTE: All datasets from a fragment screen and the ground-state model used to generate the PanDDA event maps can be deposited in the PDB using group depositions.
The XChem pipeline for fragment screening by X-ray crystallography has been extensively streamlined, enabling its uptake by the scientific community (Figure 5). This process has been validated on over 150 screening campaigns, with a hit rate varying between 1% and 30%47,48,49,50,51,52, and by many repeat users. Crystal systems that are not suitable (low resolution, inconsistent in crystallization or in diffraction quality) or cannot tolerate either DMSO or ethylene glycol are eliminated early in the process, saving time, effort, and resources. Successful campaigns provide a three-dimensional map of potential interaction sites on the target protein; a typical outcome is the XChem screen of the main protease of SARS-CoV-2 (Figure 6). Typically, fragment hits are found in: (a) known sites of interest, such as enzyme active sites and sub-pockets48; (b) putative allosteric sites, for example, in protein-protein interactions53; (c) crystal packing interfaces, generally considered as false positives (Figure 6). This structural data generally provides a basis for merging, linking, or growing fragment hits into lead-like small molecules1,3.
Figure 1: The XChem pipeline. The platform is represented schematically from project proposal through sample preparation, data collection, and hit identification.
Figure 2: Screening strategy. The workflow indicates the purpose of each milestone, the experiment's requirements, and the decision points.
Figure 3: Sample preparation workflow. Critical steps for the sample preparation are represented with information from each step being recorded in an SQLite database.
Figure 4: Data analysis using XCE. Critical steps in the data analysis are represented by a workflow diagram with the relevant software packages.
Figure 5: Evolution of the XChem user program. The chart demonstrates the uptake and consolidation of the user program from 2015 through to 2019, with the creation of BAGs in 2019 and the resilience of the platform through the COVID-19 pandemic in 2020.
Figure 6: Representative results of XChem fragment screen. The SARS-CoV-2 main protease (Mpro) dimer is shown as a surface, with active site hits shown in yellow, putative allosteric hits shown in magenta, and surface/crystal-packing artefacts shown in green. The figure was made using Chimera and Mpro PDB entries from group deposition G_1002156.
The process outlined in this paper has been extensively tested by the user community and the adaptability of the protocols described here is key for handling the wide variety of projects typically encountered on the platform. However, a few pre-requisites of the crystal system are necessary.
For any fragment screening campaign carried out using X-ray crystallography, a reproducible and robust crystal system is critical. As the standard XChem protocol involves addition of the fragment directly to the crystal drop, optimization should focus on the number of drops containing high-quality crystals rather than the total number of crystals. If a drop contains multiple crystals, the additional crystals are effectively redundant, although they may make harvesting easier. Furthermore, transferring the crystallization protocol from the home institute to onsite facilities can be challenging. This is generally best achieved using crystal seeding to promote reproducible nucleation54; it is therefore good practice for users to provide seed stocks along with their protein and crystallization solutions.
To ensure good compound solubility and to support the high soaking concentrations intended to drive binding of weak fragments, fragment libraries are provided in organic solvents, specifically DMSO and ethylene glycol. Provision of two different solvents gives users an alternative for crystals that do not tolerate DMSO at all, or where it occludes the binding of fragments in a site of interest. Users can supply alternative libraries in aqueous buffer: compounds will dispense well provided they are completely dissolved and formatted in plates compatible with the liquid dispensing robot.
For projects where it is not possible to find an appropriate organic solvent that would both solubilize the library and be tolerated by the crystal system, an alternative procedure is to use dried compounds as established at BESSY55.
In the community, there is a long-standing question about whether compounds can be soaked into crystals grown in crystallization conditions containing high salt concentrations. In practice, more precipitation of the compounds and rapid formation of salt crystals are observed at the harvesting stage, which is mitigated by applying a humid environment around the harvesting area. Generally, screening campaigns in crystal systems from high salt crystallization conditions give a comparable hit rate to low salt conditions.
The initial stages of the XChem process (solvent tolerance testing and pre-screen) are relatively small-scale and quick experiments, but they allow a clear go/no-go decision for the project. Most painfully, alternative crystal systems will need to be found if neither solvent is tolerated, or if the pre-screen results in a very low hit rate. In contrast, if they are successful, the results directly inform the soaking conditions to use for the screening experiment and the best strategy for data collection. Since the quality of the data, especially the resolution, will affect the quality of the electron density for hit identification and analysis, the aim is to soak at the highest possible compound concentration that does not have a deleterious effect on diffraction quality (with the majority of datasets (~80%) diffracting to a resolution of 2.8 Å or better).
The data analysis process is streamlined within XChemExplorer, which relies on the PanDDA software for the detection of weak binders and allows users to quickly visualize and review the outcomes of the screening campaign. XChemExplorer imports data processing results from the packages available at Diamond (DIALS16, autoPROC30, STARANISO31, and Xia214) with resolution limits determined by the standard method for each package (i.e., CC1/2 = 0.3). By default, dataset selection is based on a score calculated from I/sigI, completeness, and the number of unique reflections, but specific processing results can be selected for use either globally or for individual samples25. Data are also excluded from analysis by PanDDA based on criteria including resolution, Rfree, and difference in unit cell volume between reference and target data (defaults are 3.5 Å, 0.4, and 12%, respectively), so that poorly diffracting, mis-centered, or mis-indexed crystals do not affect the analysis.
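The exclusion criteria described above can be expressed as a simple filter. The sketch below uses the stated defaults (3.5 Å, 0.4, and 12%) and assumes, for illustration, that a dataset is excluded only when a value strictly exceeds its threshold and that the unit cell volume difference is measured relative to the reference. The class and function names are hypothetical and are not taken from PanDDA or XChemExplorer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    name: str
    resolution: float   # high-resolution limit in Angstrom
    r_free: float
    cell_volume: float  # unit cell volume in cubic Angstrom


def exclusion_reasons(ds: Dataset, reference_volume: float,
                      max_resolution: float = 3.5,
                      max_r_free: float = 0.4,
                      max_volume_diff: float = 0.12) -> List[str]:
    """Return the reasons a dataset would be excluded (empty list = keep)."""
    reasons = []
    if ds.resolution > max_resolution:  # larger number = worse resolution
        reasons.append("resolution")
    if ds.r_free > max_r_free:
        reasons.append("r_free")
    relative_diff = abs(ds.cell_volume - reference_volume) / reference_volume
    if relative_diff > max_volume_diff:
        reasons.append("cell_volume")
    return reasons


reference_volume = 500000.0  # hypothetical reference unit cell volume
datasets = [
    Dataset("x0001", 1.8, 0.24, 505000.0),
    Dataset("x0002", 3.9, 0.31, 498000.0),  # diffracts too poorly
    Dataset("x0003", 2.1, 0.27, 600000.0),  # possibly mis-indexed
]
for ds in datasets:
    print(ds.name, exclusion_reasons(ds, reference_volume) or "keep")
```

Returning the list of reasons rather than a single boolean makes it easier to see why a crystal was dropped, for example to distinguish poor diffraction from a likely mis-indexing.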
The PanDDA algorithm takes advantage of the substantial number of datasets collected during a fragment campaign to detect partial-occupancy ligands that are not visible in standard crystallographic maps. Initially, PanDDA uses data collected during the solvent tolerance testing and pre-screen steps to prepare an average density map, which is then used to create a ground-state model. As this model will be used for all subsequent analysis steps, it is vital that it accurately represents the un-liganded protein under the conditions used for the fragment screen. PanDDA then uses a statistical analysis to identify bound ligands, generating an event map for the bound state of the crystal. An event map is generated by subtracting the unbound fraction of the crystal from the partial-occupancy dataset and presents what would be observed if the ligand were bound at full occupancy. Even fragments that appear clear in conventional 2mFo-DFc maps might be mismodeled if the event maps are not consulted32. While PanDDA is a powerful method for identifying datasets that differ from the average maps (which is usually indicative of fragment binding), and metrics such as RSCC, RSZD, B-factor ratio, and RMSD during refinement are provided for the user's benefit, the user is ultimately responsible for deciding whether the observed density accurately depicts the expected ligand and the most suitable conformation.
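The subtraction that produces an event map can be illustrated numerically. The sketch below is not the PanDDA implementation: it assumes that the dataset map and the ground-state map are already aligned, on the same grid and scale, and flattened to plain lists, and that the unbound fraction has already been estimated. The rescaling by 1 / (1 - unbound fraction) is included as an assumption so that the result matches the description of what would be seen at full occupancy.

```python
from typing import List, Sequence


def event_map(dataset_map: Sequence[float],
              ground_state_map: Sequence[float],
              unbound_fraction: float,
              rescale: bool = True) -> List[float]:
    """Subtract the ground-state contribution from a partial-occupancy map.

    unbound_fraction: estimated fraction of the crystal in the ground state.
    rescale: if True, divide by the bound fraction to approximate
    full-occupancy density (an assumption of this sketch).
    """
    if len(dataset_map) != len(ground_state_map):
        raise ValueError("maps must be sampled on the same grid")
    if not 0 <= unbound_fraction < 1:
        raise ValueError("unbound_fraction must be in [0, 1)")
    scale = 1.0 / (1.0 - unbound_fraction) if rescale else 1.0
    return [(d - unbound_fraction * g) * scale
            for d, g in zip(dataset_map, ground_state_map)]


# Toy example: the ligand adds density only at the last grid point and
# is bound in 25% of the crystal.
ground = [1.0, 2.0, 0.0]
bound = [1.0, 2.0, 4.0]
observed = [0.75 * g + 0.25 * b for g, b in zip(ground, bound)]
print(observed)
print(event_map(observed, ground, unbound_fraction=0.75))
```

In the toy example, the ligand density is diluted fourfold in the observed map and is restored in the event map, which is why weak, partial-occupancy binders become interpretable.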
Following data analysis and refinement, it is possible for all users to simultaneously deposit multiple structures in the Protein Data Bank (PDB) using XChemExplorer. For each fragment screen, two group depositions are made. The first deposition contains all fragment-bound models, with coefficients for calculating PanDDA event maps included in mmCIF files. The second deposition provides the accompanying ground-state model, along with the measured structure factors of all datasets of the experiment: this data can be used to reproduce the PanDDA analysis, and for developing future algorithms. As for the structures of the hits, when fragment occupancy is low, refinement is better behaved if models are a composite of the ligand-bound and confounding ground-state structures32; nevertheless, the practice is to deposit only the bound-state fractions, since the full composite models are in general complex and difficult to interpret. As a result, some quality indicators recalculated by the PDB (in particular, R/Rfree) are sometimes slightly elevated. It is also possible to provide all raw data using platforms such as Zenodo56, although this is not currently supported by the XChem pipeline.
Overall, since the platform began operating in 2016, fragment ligands have been identified in over 95% of the targets using this procedure. Experience from the many projects that XChem has supported was distilled into best practice for crystal preparation33, while a fragment library was developed that implemented the poised concept for aiding fragment progression29, also helping establish the practice of making library composition public. The platform has demonstrated the importance of well-maintained infrastructure and documented processes, detailed here, and made it possible to evaluate other fragment libraries57,58, to compare libraries48, and to inform the design of the collaborative EUOpenscreen-DRIVE library59,60.
The authors have nothing to disclose.
This work represents a large joint effort between the Diamond Light Source and the Structural Genomics Consortium. The authors would like to acknowledge Diamond's various support groups and MX group for their contribution to the automation of the I04-1 beamline and for providing streamlined data collection and auto-processing pipelines, which are commonly run across all MX beamlines. They would also like to thank the SGC PX group for their resilience in being the first users to test the setup, and Evotec for being the first serious industrial user. This work was supported by iNEXT-Discovery (Grant 871037) funded by the Horizon 2020 program of the European Commission.
Name | Company | Catalog number | Comments |
DSI-poised library | Enamine | DSI-896 | fragment library |
Echo 550 and 650 series | Beckman-Coulter | | acoustic dispensing system |
Echo microplates | Beckman-Coulter | 001-12380; 001-8768; 001-6025 | 1536-well and 384-well microplates |
Shifter | Oxford Lab Technology | | harvesting device |
Microplate centrifuge with a swing-out rotor | Sigma | model 11121 | microplate centrifuge |
3-drops crystallisation plates | Swissci | 3W96T-UVP | Crystallisation plates |
Formulatrix plate imager and Rockmaker software | Formulatrix | | Crystallisation plates imaging device |