This protocol introduces the tools available for modeling small-molecule ligands in cryoEM maps of macromolecules.
Deciphering the protein-ligand interactions in a macromolecular complex is crucial for understanding the molecular mechanism, underlying biological processes, and drug development. In recent years, cryogenic sample electron microscopy (cryoEM) has emerged as a powerful technique to determine the structures of macromolecules and to investigate the mode of ligand binding at near-atomic resolution. Identifying and modeling non-protein molecules in cryoEM maps is often challenging due to anisotropic resolution across the molecule of interest and inherent noise in the data. In this article, the readers are introduced to various software and methods currently used for ligand identification, model building, and refinement of atomic coordinates using selected macromolecules. One of the simplest ways to identify the presence of a ligand, as illustrated with the enolase enzyme, is to subtract the two maps obtained with and without the ligand. The extra density of the ligand is likely to stand out in the difference map even at a higher threshold. There are instances, as shown in the case of metabotropic Glutamate receptor mGlu5, when such simple difference maps cannot be generated. The recently introduced method of deriving the Fo-Fc omit map can serve as a tool for validating and demonstrating the presence of the ligand. Finally, using the well-studied β-galactosidase as an example, the effect of resolution on modeling the ligands and solvent molecules in cryoEM maps is analyzed, and an outlook on how cryoEM can be used in drug discovery is presented.
Cells accomplish their functions by carrying out innumerable chemical reactions simultaneously and independently, each meticulously regulated to ensure their survival and adaptability in response to environmental cues. This is achieved by molecular recognition, which enables biomolecules, especially proteins, to form transient or stable complexes with other macromolecules as well as small molecules or ligands1. Thus, protein-ligand interactions are fundamental to all processes in biology, which include the regulation of protein expression and activity, the recognition of substrates and cofactors by enzymes, as well as how cells perceive and relay signals1,2. A better understanding of the kinetic, thermodynamic, and structural properties of the protein-ligand complex reveals the molecular basis of ligand interaction and also facilitates rational drug design by optimizing drug interaction and specificity. An economical and faster approach to studying protein-ligand interaction is to use molecular docking, which is a computational method that virtually screens a diverse range of small molecules and predicts the binding mode and affinity of these ligands to target proteins3. However, experimental evidence from high-resolution structures determined by X-ray Diffraction (XRD), Nuclear Magnetic Resonance (NMR), or electron cryomicroscopy (cryoEM) provides the essential proof for such predictions and aids in the development of newer and more effective activators or inhibitors for a given target. This article uses the abbreviation 'cryoEM' as the technique is commonly referred to. However, there is an ongoing debate on choosing the correct nomenclature, and recently, the term cryogenic-sample Electron Microscopy (cryoEM) has been proposed to indicate that the sample is at cryogenic temperature and imaged with electrons4. Similarly, the maps derived from cryoEM have been called electron potential, electrostatic potential, or Coulomb potential, and for simplicity, here we use cryoEM maps5,6,7,8,9,10.
Although XRD has been the gold standard technique in high-resolution structure determination of protein-ligand complexes, post resolution-revolution11 cryoEM has gained momentum, as indicated by the surge of Coulomb potential maps or cryoEM maps deposited in the Electron Microscopy Database (EMDB)12,13 over the last few years14. Owing to advancements in sample preparation, imaging, and data processing methods, the number of Protein Data Bank (PDB)14 depositions employing cryoEM increased from 0.7% to 17% between 2010 and 2020, with approximately 50% of reported structures in 2020 being determined at 3.5 Å resolution or better15,16. CryoEM has been rapidly adopted by the structural biology community, including the pharmaceutical industry, as it allows the study of flexible and non-crystalline biological macromolecules, especially membrane proteins and multi-protein complexes, at near-atomic resolution, overcoming the process of crystallization and obtaining well-diffracting crystals required for high-resolution structure determination by XRD.
Accurate modeling of the ligand in the cryoEM map is paramount, as it serves as a blueprint of the protein-ligand complex at a molecular level. There are several automated ligand-building tools used in X-ray crystallography that depend on the shape and topology of the ligand density in order to fit or build the ligand into the electron density17,18,19,20. Nevertheless, if the resolution is lower than 3 Å, these approaches tend to produce less desirable outcomes because the topological features on which they depend for recognition and building become less defined. In many instances, these methods have proven ineffective in accurately modeling ligands into cryoEM maps, as these maps have been determined in the low-to-medium resolution range, typically between 3.5 Å-5 Å17.
The first step in 3D structure determination of a protein-ligand complex by cryoEM involves either co-purifying the ligand with the protein (when the ligand has a high binding affinity to the protein) or incubating the protein solution with the ligand for a specific duration before grid preparation. Subsequently, a small sample volume is placed onto a plasma-cleaned holey TEM grid, followed by flash freezing in liquid ethane and ultimately imaging with a cryo-TEM. The 2D projection images from hundreds of thousands to millions of individual particles are averaged to reconstruct a 3-dimensional (3D) Coulomb potential map of the macromolecule. Identifying and modeling ligands and solvent molecules in these maps pose significant challenges in many cases due to the anisotropic resolution across the map (i.e., the resolution is not uniform across the macromolecule), flexibility in the region where the ligand is bound, and the noise in the data. Many of the modeling, refinement, and visualization tools that were developed for XRD are now being adapted for use in cryoEM for the same purposes18, 19, 20, 21. In this article, an overview of various methods and software currently used to identify ligands, build models, and refine the coordinates derived from cryoEM is presented. A step-by-step protocol has been provided to illustrate the processes involved in modeling ligands using specific protein-ligand complexes with varying resolution and complexity.
The first step in modeling ligands in cryoEM maps includes the identification of the ligand density (non-protein) in the map. If ligand binding does not induce any conformational change in the protein, then calculating a simple difference map between the protein-ligand complex and the apo-protein essentially highlights the regions of extra density, suggesting the presence of the ligand. Such differences can be observed immediately, as it just requires two maps, and even intermediate maps during the process of 3D refinement can be used to check if the ligand is present. Additionally, if the resolution is high enough (<3.0 Å), then the difference map can also provide insights into the location of water molecules as well as ions interacting with the ligand and the protein residues.
In the absence of the apo-protein map, it is now possible to use Servalcat22, which is available as a standalone tool and has also been integrated into the CCP-EM software suite23,24 as part of the Refmac refinement and in CCP4 8.0 release25,26. Servalcat allows the calculation of an FSC weighted difference (Fo-Fc) map using the unsharpened half-maps and the apoprotein model as input. The Fo-Fc omit map represents the disparity between the experimental map (Fo) and the map derived from the model (Fc). In the absence of a ligand in the model, a positive density in a Fo-Fc map that overlaps with the experimental EM map typically suggests the presence of the ligand. The assumption here is that the protein chain is well-fitted in the map, and the remaining positive density indicates the location of the ligand. However, it is important to meticulously examine whether the positive density stems from modeling inaccuracies, such as the wrong rotamer of a protein side-chain.
The second step involves obtaining or creating a cartesian coordinate file of the ligand with well-defined geometry from the available chemical information. Standard ligands (for example, ATP and NADP+) that are already available in the CCP4 monomer library can be used for refinement by retrieving the coordinate and geometry files via their monomer accession code. However, for unknown or non-standard ligands, various tools are available to create the geometry files. Some of the examples include the eLBOW27 – (electronic ligand builder and optimization workbench) in Phenix28, Lidia – an in-built tool in Coot29, JLigand/ACEDRG30,31, CCP-EM23,24, Ligprep32-a module of Glide within the Schrodinger suite. The ligand coordinate file is then fitted in the density, guided by both the experimental cryoEM map and the difference map in Coot. This is followed by real-space refinement in Phenix28 or reciprocal refinement in Refmac33. A Linux workstation or a laptop equipped with a good graphics card and the above-mentioned software is required. Most of these programs are included in various suites. CCP-EM24 and Phenix28 are freely available to academic users and include a variety of tools that are used in this article, including Coot, Refmac533,34,35,36, Servalcat, phenix.real_space_refine, etc. Similarly, Chimera37, and ChimeraX38 provide free licenses to academic users.
1. Modeling phosphoenolpyruvate (PEP) in enolase from Mycobacterium tuberculosis
2. Modeling of ligands in metabotropic Glutamate receptor mGlu5
3. Modeling the inhibitor, deoxygalacto-nojirimycin (DGN), and solvent molecules in a high-resolution sharpened map of β-galactosidase
4. Effect of resolution on ligand modeling in β-galactosidase
Example 1
The enzyme enolase from M. tuberculosis catalyzes the penultimate step of glycolysis and converts 2-phosphoglycerate to phosphoenolpyruvate (PEP), which is an essential intermediate for several metabolic pathways44,45. CryoEM data for the apo-enolase and PEP-bound enolase samples were collected at the same pixel size of 1.07 Å, and image processing was performed with Relion 3.146,47. The apo-enolase and the PEP-enolase structures were determined at 3.1 Å and 3.2 Å, respectively48. The maps and models were deposited in EMDB and PDB49,50 (EMD-30988, EMD-30989, PDB-7e4x, and PDB-7e51). The cryoEM map of the enzyme shows that it is an octamer in the solution (Figure 1A). In order to identify the ligand density in the PEP-enolase map, the unsharpened maps of the apoenzyme and PEP-bound enzyme were selected, and a difference map was calculated in ChimeraX, by subtracting the PEP-enolase map from the apoenzyme map. A distinct density (green) at a high threshold was observed, which suggested the presence of the ligand (Figure 1B). Modeling the protein chain in the unsharpened map clearly indicated that the extra density is present in the active site of the protein (Figure 1C). The ligand, PEP, was then modeled in the B-factor sharpened map using Coot, and the protein+ligand model was refined in real space with Phenix. Two Mg2+ ions were modeled in the density observed in the vicinity of the ligand (Figure 1D). The ligand, PEP, adopts a similar orientation as observed in other enolase homologs, and several active site residues, such as Lys-386, Arg-364, form hydrogen bond interactions with the ligand PEP. The Mg2+ ions form metal coordination bonds with Asp-241, Glu-283, Asp-310, and the phosphate of PEP (Figure 1D).
Example 2
In the absence of an available apo-protein structure or if the protein undergoes a large conformational change, calculating the difference maps as described above is not possible. In 2021, Garib Murshudov's group at the Laboratory of Molecular Biology, Cambridge, introduced Servalcat22, which implements a refinement workflow using Refmac and also calculates a Fo-Fc difference map after refinement. A positive Fo-Fc difference density suggests the presence of molecules/ligands that were not included in the model during refinement, essentially an omit map. However, it is recommended to first assess the fit of the model to the map in general and then evaluate the difference density map.
To illustrate the use of Servalcat/Refmac, mGlu5, a dimeric G-protein coupled receptor that binds to the neurotransmitter, L-Glutamate was chosen. Upon binding of the agonist, L-quisqualate, the extracellular domain reorients, which triggers the rotation of the 7TM, bringing them closer to stabilize the activated state. Thus, there is a large conformational change observed between the apo/antagonist vs. agonist bound states51 (Figure 2A and Figure 2E). The two half-maps for both the agonist (EMD-31536) and the antagonist (EMD-31537) bound complexes were obtained from EMDB, and the cryoEM maps show varied resolution across the molecule and better-resolved extracellular domain. Subsequently, these were utilized as inputs in Servalcat along with the apo-protein as the model to compute the difference or Fo-Fc map for each dataset. This map distinctly showed the presence of various ligand molecules (non-protein). The resolution estimated by FSC (Fourier Shell correlation) for the agonist and antagonist bound complexes were 3.8 Å and 4.0 Å, respectively. In the case of the agonist-bound mGlu5, the Servalcat Fo-Fc difference map showed the presence of both the agonist (L-quisqualate) (Figure 2B) and N-AcetylGlucosamine (NAG) (Figure 2C) in the ECD of the receptor (due to lower resolution of the TMD, here we focus only in the ECD and the top of the TMD). Protein residues, including Tyr-64, Trp-100, Ser-151, and Thr-175, are seen to interact with the agonist. Density near the Asn-210 residue suggested the presence of N-acetyl glucosamine (Figure 2C). A density consistent with cholesteryl hemisuccinate, which was added during the purification of mGlu5, was observed near the top of transmembrane helix 1 (Figure 2D). As the resolution is moderate and the ligand, L-quisqualate, can be placed in different orientations, prior structure of the extracellular domain with the ligand (PDB-6N50) was used as a guide to model the ligand. Antagonist binding stabilizes the open or resting state of the receptor (Figure 2E). A density consistent with the antagonist LY341495 was observed at the hinge of lobe I and lobe II of the venus fly trap domain in the ECD. The antagonist interacts with residues similar to those of the agonist. Stacking interaction between Tyr-223 in lobe II with the antagonist stabilizes the receptor in an open state (Figure 2F). Similar to the agonist structure, glycosylation or the presence of N-acetylglucosamine moiety was observed near Asn-210 (Figure 2G).
Example 3
The third example elucidates the protocol to model fragment-sized or small-sized ligands and solvent molecules in high-resolution CryoEM maps. Fragment-based Drug Discovery (FBDD) has emerged as a powerful and innovative method in the development of target-based novel therapeutics in various disease areas, making it a promising avenue in pharma R&D52,53. FBDD begins with screening and careful selection of small, highly soluble, low-molecular-weight fragments of molecules that bind to specific target proteins or biomolecules of interest. Determining the structures of these protein-fragment complexes reveals the mode of binding of these fragments, which serves as a guide to designing larger and more complex drug-like molecules with increasing affinity and specificity towards the target protein54. However, this method demands a high-resolution ligand density to accurately determine the pose and correctly place the functional groups of the ligand15.
β-galactosidase, one of the first high-resolution structures to be determined from the advancements in cryoEM technology, is a well-studied 450kDa homotetrameric enzyme that catalyzes the hydrolysis of lactose to glucose and galactose55. To showcase the use of cryoEM in FBDD, Astex, UK, determined the structure of β-galactosidase with a fragment-sized inhibitor, deoxygalacto-nojirimycin (DGN) bound in the active site (EMDB-10563, PDB:6tsh)56. This dataset is used to illustrate the protocol to model ligands and solvents unambiguously in high-resolution maps. To showcase the effect of resolution on ligand modeling and visualization, the maps were filtered to 3.0 Å and 3.5 Å in the post-processing step in Relion. This highlights the quality of the map density at different resolutions and underscores the need for higher resolution for modeling ligands and solvents.
The enzyme is a tetramer with D2 symmetry in solution (Figure 3A). Difference map (between map and model), as computed by Servalcat, suggested the presence of DGN and several solvent molecules in the active site of the enzyme (Figure 3B). At an estimated resolution of 2.3 Å, the density showed high-resolution features, which helped in the accurate modeling of the inhibitor in the protein active site. Interactions between DGN and Tyr-503 and His-540 were observed (Figure 3C). The difference density also suggested the presence of solvent molecules that interact with DGN as well as the protein residues. Mg2+ and several water molecules were modeled in the density (Figure 3D). Metal coordination bonds between Mg2+ and Glu-416, Glu-461, and several water molecules are observed (Figure 3D). Mg2+ was seen to interact with DGN via a water molecule.
At a lower resolution of 3.5 Å and 3.0 Å, the ligand density resembles a blob and lacks high-resolution features crucial for accurate modeling of the ligand (Figure 4A,B). Density for water molecules was almost non-existent at these resolutions. In summary, with increasing resolution, especially higher than 3.0 Å, the density enabled the modeling of a greater number of water molecules (Figure 4C,D). The correct positioning of the ligand's chiral centre became achievable at ~2.3 Å (Figure 4C,D) due to the presence of distinct features in the map that guided the placement as well as modeling of water molecules and Mg2+. In comparison, the density for Mg2+ remained discernible across the resolution range (Figure 4A–C).
Figure 1: Modeling the ligand phosphoenolpyruvate in M. tuberculosis enolase. (A) shows the B-factor sharpened map of the PEP-bound enolase enzyme. The map suggests that enolase is octameric in solution, and each monomer in the map is colored differently. (B) displays an unsharpened cryoEM map of apo-Enolase enzyme in grey with the difference map (between PEP-bound and apo-Enolase unsharpened maps) overlaid in green, suggesting the presence of ligand, PEP. This is a demonstration map to show the presence of ligands. (C) displays the fit of the enolase model in the unsharpened cryoEM map, highlighting the position of the difference density relative to the protein. The protein model is shown in cartoon representation and colored in Chainbow. This figure shows that the extra density (green) is present in the active site of each monomer. (D) shows the ligand, PEP, encased in the B-factor sharpened map, colored blue. Additionally, density for two Mg2+ ions was also observed, which forms metal coordination bonds with several active site residues, including Ser-42, Asp 241, Glu-283, Asp-310, and ligand atoms. The ligand makes hydrogen bond interactions with Lys-386, Lys-335, and Arg-364. The protein residues are shown in stick representation, and Mg2+ ions are shown as purple spheres. The figures in panels (A–D) were generated with Pymol. Please click here to view a larger version of this figure.
Figure 2: Identification, modeling, and visualization of various ligands in mGlu5 receptor. The Fo-Fc omit maps were obtained using Servalcat. (A) shows mGlu5 receptor dimer structure (PDB-7fd8) in cartoon representation, with each monomer colored in teal and wheat, respectively, and bound to the agonist L-quisqualate. All the ligands that were identified in the extracellular and on the top of the transmembrane domain of the receptor are encased in the Fo-Fc omit map, colored green, and contoured at 6σ.(B) highlights the fit of the agonist, L-quisqualate encased in the difference map. L-quisqualate interacts with several mGlu5 residues including Tyr-64, Trp-100, Ser-151, Thr-175 and Gly-280. The H-bond interactions are represented by red dashes. In (C), an additional density is evident near Asn-210, which is present in the receptor's extracellular domain, and N-acetylglucosamine molecule (NAG) was modeled in this density. For clarity, the NAG is not linked to Asn in the current figure. (D) shows the difference in density for cholesterol hemisuccinate (CHS) in green near the lipid-exposed surface of the receptor. The CHS molecule, represented in the sticks, was modeled in this density. (E) shows mGlu5 receptor structure (PDB-7fd9) bound to the antagonist LY341495 in cartoon representation. In (F), an extra density in the Fo-Fc difference map located at the hinge between lobe I and lobe II of the extracellular domain indicates the presence of the antagonist. Key residues (Tyr-64, Trp-100, Ser-152, Ser-173, Thr-175, and Tyr-223) around the antagonist are shown in stick representations, and potential hydrogen bond interactions with the antagonist are shown in red dashes. (G) shows the difference in density suggestive of the presence of NAG molecule near Asn-210 in the antagonist-bound structure (note, the NAG is not linked with Asn for clarity). The figures were generated with Pymol. Please click here to view a larger version of this figure.
Figure 3: Identification, modeling, and refinement of a small inhibitor and solvent molecules in the high-resolution map of β-galactosidase (EMD-10563). (A) shows the β-galactosidase model resolved to 2.3 Å (PDB: 6tsh) in cartoon representation, where each monomer is colored distinctly. The grey box highlights the ligand binding site. (B) displays the Fo-Fc difference density (from Servalcat) in green mesh in the active site of the enzyme. The difference in density suggests the presence of the inhibitor (DGN) and several solvent molecules in the active site. (C) demonstrates that guided by the Fo-Fc map, the inhibitor deoxygalacto-nojirimycin (DGN) is modeled in the active site. This modeled ligand is depicted in stick format and encased in the Fo density (by Servalcat), which is colored in blue_mesh. Hydrogen bond interactions between DGN and several protein residues, including Tyr-503 and His-540, are observed. The additional Fo-Fc difference density around the ligand (green) is indicative of the solvent molecules. (D) The map shows several solvent molecules, including water and Mg2+, represented as red and purple spheres, respectively, are modeled in the active site after ensuring that each solvent molecule is bonded to either protein (Glu-416, His-418, and Glu-461) or ligand residues. The water molecules and Mg2+ are encased in the Fo density (blue mesh). Mg2+ is seen to interact with the ligand, DGN via a water molecule. The Fo-Fc map (green mesh) in panels (B,C) is contoured at 6σ, whereas the Fo density – blue mesh in C and D (from the Servalcat refinement after modeling) is contoured at 3σ. The figures in panels (A–D) were generated with Pymol. Please click here to view a larger version of this figure.
Figure 4: Effect of resolution on ligand modeling in β-galactosidase. Half-maps from EMD-10563 were used as input in the Relion post-process step, and the combined post-process maps were filtered to resolutions 2.3 Å, 3.0 Å, and 3.5 Å with different B-factors. The map shown in all the panels is contoured at 6σ. For clarity, only the protein backbone is shown with no side chains, ligand or solvent molecules in panels A, B, and C. (A) The map is filtered to 3.5 Å, and the active site of β-galactosidase is shown. A blob resembling the ligand, DGN, is seen at this resolution, accompanied by a few smaller blobs in the vicinity. Modeling the ligand in the correct orientation proves challenging due to the lack of distinct features in the map. (B) The map filtered to 3.0 Å resolution is shown. Here, the ligand blob becomes slightly more defined but still lacks features in general. A few more small blobs suggestive of solvent molecules are also observed. (C) The map filtered to 2.3 Å resolution reveals the ligand density with distinct features, notably revealing the chair conformation of the iminosugar. A significant number of small blobs corresponding to water molecules are observed at this resolution. The automatic B-factor estimation/sharpening in Relion post-process gives a value of -18 Å2 for the maps filtered to 3 Å and 3.5 Å, while the value is -52 Å2 for the map filtered to 2.3 Å. Different B-factor sharpening of EM maps can also be performed with Coot and useful in model building. (D) The panel illustrates that the ligand DGN (stick representation), Mg2+, and water molecules (as spheres) as modeled in the active site and the sharpened map at 2.3 Å, shown in blue mesh surrounding these atoms. The figures in panels (A–D) were generated with Pymol. Please click here to view a larger version of this figure.
The improvements in microscope hardware and software have resulted in an increase in the number of cryoEM structures in recent years. Although the highest resolution achieved at the moment in single particle cryoEM is 1.2 Å57,58,59, the majority of the structures are being determined around 3-4 Å resolution. Modeling ligands in medium to low-resolution maps can be tricky and often fraught with ambiguity. Given the widespread use of cryoEM in both academia and the pharma industry for translational research and drug discovery, ensuring that ligands are modeled correctly and without ambiguity is essential. Thus, it is prudent to quantify the resolvability of ligand atoms by calculating Q-scores60, which is now available in the EMDB as a metric to evaluate the quality of the maps and the fit of the model as well as in Chimera.
In the first example, ChimeraX was used to compute the difference map in real space between the apo and the ligand-bound maps in the M. tuberculosis enolase enzyme. The additional density at a high threshold suggests the presence of the ligand, phosphoenolpyruvate, in the active site and, Mg2+ bound to the ligand. It is important to note that, in this case, the map is of medium resolution (3.2 Å), and water molecules cannot be modeled confidently (Figure 1). The limitation associated with this method is that it can only be applied when ligand binding does not induce significant conformational changes in the protein. Map normalization was not performed in this case since both the apo and ligand-bound datasets were acquired at the same pixel size and processed with identical parameters in Relion. However, it is worth noting that when comparing maps generated by different reconstruction programs, such as Relion46,47 and CryoSparc61, or with different quality, normalization of the maps becomes essential before meaningful comparisons can be made.
The next example is mGlu5, which undergoes large molecular reorganization upon agonist binding, as evident from the cryoEM structures51,62 (Figure 2). In this scenario, it is not feasible to compute a simple difference map between the unbound (apo) and ligand-bound receptors because of substantial differences between the maps. Here, Servalcat, which uses unsharpened and unweighted half-maps as input for refinement in reciprocal space and subsequently computes a difference map between the experimental map and the map derived from the model, was utilized. At a high threshold, one can visualize differences and act as a guide for correction and improvement of the model. Several unmodeled blobs in the extracellular and near the transmembrane domain of mGlu5 were observed and used as a guide to model ligands (Figure 2).
The third example shows how resolution (2.3 Å) plays a crucial role in map interpretation and modeling of a fragment-sized inhibitor in β-galactosidase. Here, the challenge was to identify a very small ligand (<200 Da) in the Servalcat difference map amidst the inherent noise in the data and model it accurately. In addition to the high global resolution determined using Fourier Shell Correlation (FSC), the local resolution specific to the ligand was also sufficiently high to ensure accurate placement of the chiral centres of the ligands (Figure 3). Density for solvent molecules was observed in the difference map throughout the enzyme and especially surrounding the ligand. The effect of resolution on modeling ligands and solvent atoms was also demonstrated (Figure 4). It is important to exercise caution when modeling water or solvent molecules because, on occasion, noise at a low threshold may resemble water or solvent molecules, leading to potential misinterpretations.
Another important consideration is that the cryoEM maps alone may not be sufficient to accurately identify a metal ion on its own. Additional biophysical methods like Extended X-ray Absorption Fine Structure (EXAFS) or Energy-Dispersive X-ray Spectroscopy (EDX) are often necessary to confirm the presence and identity of the metal ion. In both enolase and β-galactosidase enzymes, Mg2+ was modeled due to the wealth of information already available on these proteins, confirming the identity of the metal ion. Also, the coordination of the metal ions in these cases, exemplified by the classic octahedral geometry and the near-ideal coordination distances of Mg2+, provided substantial proof of their identity.
In general, a few important considerations should be taken into account while modeling ligands in cryoEM maps. To begin with, choosing the correct map for ligand identification and modeling is important. In all cases, an unsharpened and unweighted map or half-map is advisable over a sharpened and weighted map to visualize the ligand density, as sharpening and weighting could result in under-sharpened or over-sharpened (noise due to series termination) regions in the map. This may result in a suboptimal ligand density, and the use of different B-factor sharpening can be employed to evaluate the density during model building in Coot. Although there is a risk in using the sharpened map to identify ligands in a cryoEM map, the unsharpened map may not show the full detail of the ligand density but can be used for demonstration purposes, as shown in Figure 1B,C.
The pose of the modeled ligand should be validated, especially in cases where the data is weak. As shown here for mGlu5, the local resolution varies across the cryoEM map, and unbiased modeling of the ligand can be challenging. Servalcat can be used as a valuable tool for detecting potential inaccuracies in protein and ligand modeling22.
Compositional heterogeneity can be present in a protein-ligand complex where only a specific population may have the ligand present (if the ligand is of low molecular weight, then the classification step might not remove the heterogeneity). Nevertheless, it is important to perform the 3D classification63 step iteratively during image processing prior to ligand modeling and verify if the density of the ligand improves. If multiple copies of protein are present, caution should be exercised when applying symmetry across the map during initial model generation and refinement, as this may average out the ligand density in all the symmetry-related molecules. Symmetry should only be imposed after the map has been thoroughly inspected to confirm the presence of ligand density in all the protein chains.
Depending on the state (crystal or solution) and the location (buried or surface), the atoms can be dynamic, and in model refinement, this is referred to as the atomic displacement parameter (ADP). Along with the difference map, which provides visual clues to possible inaccuracies in the model, ADP values can be used to evaluate the accuracy of the ligands after refinement64,65,66,67. Typically, the ligand should have similar ADP values as the surrounding residues, i.e., if they are stably bound and modeled accurately. However, ligands at the periphery or atoms of a ligand (like lipids) that are far from macromolecules can have higher ADP values. In addition to refining the coordinates, both Refmac33,34 and Phenix allow refinement of ADP values28, 68. In Refmac, the Mott-Bethe approximation is used to calculate the electron scattering factor of individual atoms during map calculation. In the recent version of Phenix, individual B-factor refinement, similar to crystallography, has been introduced to account for atom disorder. Very often, in the refined models derived from cryoEM, a wide range of ADP values are observed (sometimes values close to zero), and even the Q-score that is used in EMDB for evaluating the fit of the model with the map depends on the primary map deposited and the nature of B-factor sharpening60. In cryoEM model building, multiple maps are often used, and thus, the maps used in modeling and refinement should clearly be mentioned in the methods, as due to anisotropic resolution in many macromolecules, one single map may not be sufficient to explain all the details.
In modeling of ligands in macromolecules, one of the main limitations of cryoEM maps (like in crystallography) is that if the ligand binding site is of low resolution or if the bound ligand is dynamic, then ascertaining the correct conformation may prove difficult. In addition, most cryoEM structures have resolutions below 3 Å, and the representation of water molecules in the maps is limited, making it difficult to assess the role of hydration in ligand or drug binding (as shown here with the examples of enolase and mGluR). Computational methods can be used in combination with cryoEM data to address these limitations69. In contrast to crystallography, only the model undergoes refinement, not the map. Presently, the only method to indicate the presence of a ligand or ensure accurate modeling is by generating omit maps (using tools like Servalcat). Thus, there are a number of tools to help researchers build and evaluate the model, but there are several areas in model refinement where newer approaches or modifications of current approaches can be expected in the near future.
In this article, we have focused on the current approaches for modeling ligands, which include manual inspection of the ligand density, generating the ligand geometry file, followed by modeling the ligand in the cryoEM map. This is an exciting period in structural biology and drug discovery, as direct electron detectors with faster frame rates and the use of faster data acquisition70 have resulted in the obtaining of high-resolution maps (<2.8 Å) of several macromolecules often bound with small molecule ligands in a relatively short time. The recent introduction of automated ligand modeling tools such as GEMspot69 in the Schrodinger suite and EMERALD17 in the Rosetta suite, which attempts to find the most probable bound pose of the ligand while factoring in the experimental cryoEM data, holds promise to streamline and automate this process. Similar to X-ray crystallography, it is envisioned that identifying the binding modes of small-molecule ligands by cryoEM, perhaps two or more in a single day, will become a realistic possibility.
The authors have nothing to disclose.
SJ is a recipient of the PhD studentship from DAE-TIFR, and the funding is acknowledged. KRV acknowledges DBT B-Life grant DBT/PR12422/MED/31/287/2014 and the support of the Department of Atomic Energy, Government of India, under Project Identification No. RTI4006.
CCP4-8.0 | Consortium of several institutes | https://www.ccp4.ac.uk | Free for academic users and includes Coot and list of tools developed for X-ray crystallography |
CCP-EM | Consortium of several institutes | https://www.ccpem.ac.uk/download.php | Free for academic users and includes Coot, Relion and many others |
Coot | Paul Emsley, LMB, Cambridge | https://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/ | General software for model building but also available with other suites described above |
DockinMap (Phenix) | Consortium of several institutes | https://phenix-online.org/documentation/reference/dock_in_map.html | Software inside the Phenix suite for docking model into cryoEM maps |
Electron Microscopy Data Bank | Consortium of several institutes | https://www.ebi.ac.uk/emdb/ | Public Repository for Electron Microscopy maps |
Falcon | Thermo Fisher Scientific | https://assets.thermofisher.com/TFS-Assets/MSD/Technical-Notes/Falcon-3EC-Datasheet.pdf | Commercial, camera from Thermo Fisher |
Phenix | Consortium of several institutes | https://phenix-online.org/download | Free for academic users and includes Coot |
Protein Data Bank | Consortium of several institutes | https://rcsb.org | Public database of macromolecular structures |
Pymol | Schrodinger | https://pymol.org/2/ | Molecular viusalization tool. Educational version is free but comes with limitation. The full version can be obtained with a small fee. |
Relion | MRC-LMB, Cambridge | https://relion.readthedocs.io/en/release-4.0/Installation.html | Software for cryoEM image processing, also available with CCP-EM |
Titan Krios | Thermo Fisher Scientific | https://www.thermofisher.com/in/en/home/electron-microscopy/products/transmission-electron-microscopes/krios-g4-cryo-tem.html?cid=msd_ls_xbu_xmkt_tem-krios_285811_gl_pso_gaw_tpne1c& gad_source=1&gclid=CjwKCAiA-P-rBhBEEiwAQEXhHyw5c8MKThmdA AkZesWC4FYQSwIQRk ZApkj08MfYG040DtiiuL8 RihoCebEQAvD_BwE |
Commercial, cryoTEM from Thermo Fisher |
UCSF Chimera | UCSF, USA | https://www.cgl.ucsf.edu/chimera/download.html | General purpose software for display, analysis and more |
UCSF Chimera X | UCSF, USA | https://www.cgl.ucsf.edu/chimerax/ | General purpose software for display, analysis and more |