Summary

Modeling Ligands into Maps Derived from Electron Cryomicroscopy

Published: July 19, 2024
doi:

Summary

This protocol introduces the tools available for modeling small-molecule ligands in cryoEM maps of macromolecules.

Abstract

Deciphering the protein-ligand interactions in a macromolecular complex is crucial for understanding the molecular mechanism, underlying biological processes, and drug development. In recent years, cryogenic sample electron microscopy (cryoEM) has emerged as a powerful technique to determine the structures of macromolecules and to investigate the mode of ligand binding at near-atomic resolution. Identifying and modeling non-protein molecules in cryoEM maps is often challenging due to anisotropic resolution across the molecule of interest and inherent noise in the data. In this article, the readers are introduced to various software and methods currently used for ligand identification, model building, and refinement of atomic coordinates using selected macromolecules. One of the simplest ways to identify the presence of a ligand, as illustrated with the enolase enzyme, is to subtract the two maps obtained with and without the ligand. The extra density of the ligand is likely to stand out in the difference map even at a higher threshold. There are instances, as shown in the case of metabotropic Glutamate receptor mGlu5, when such simple difference maps cannot be generated. The recently introduced method of deriving the Fo-Fc omit map can serve as a tool for validating and demonstrating the presence of the ligand. Finally, using the well-studied β-galactosidase as an example, the effect of resolution on modeling the ligands and solvent molecules in cryoEM maps is analyzed, and an outlook on how cryoEM can be used in drug discovery is presented.

Introduction

Cells accomplish their functions by carrying out innumerable chemical reactions simultaneously and independently, each meticulously regulated to ensure their survival and adaptability in response to environmental cues. This is achieved by molecular recognition, which enables biomolecules, especially proteins, to form transient or stable complexes with other macromolecules as well as small molecules or ligands1. Thus, protein-ligand interactions are fundamental to all processes in biology, which include the regulation of protein expression and activity, the recognition of substrates and cofactors by enzymes, as well as how cells perceive and relay signals1,2. A better understanding of the kinetic, thermodynamic, and structural properties of the protein-ligand complex reveals the molecular basis of ligand interaction and also facilitates rational drug design by optimizing drug interaction and specificity. An economical and faster approach to studying protein-ligand interaction is to use molecular docking, which is a computational method that virtually screens a diverse range of small molecules and predicts the binding mode and affinity of these ligands to target proteins3. However, experimental evidence from high-resolution structures determined by X-ray Diffraction (XRD), Nuclear Magnetic Resonance (NMR), or electron cryomicroscopy (cryoEM) provides the essential proof for such predictions and aids in the development of newer and more effective activators or inhibitors for a given target. This article uses the abbreviation 'cryoEM' as the technique is commonly referred to. However, there is an ongoing debate on choosing the correct nomenclature, and recently, the term cryogenic-sample Electron Microscopy (cryoEM) has been proposed to indicate that the sample is at cryogenic temperature and imaged with electrons4. Similarly, the maps derived from cryoEM have been called electron potential, electrostatic potential, or Coulomb potential, and for simplicity, here we use cryoEM maps5,6,7,8,9,10.

Although XRD has been the gold standard technique in high-resolution structure determination of protein-ligand complexes, post resolution-revolution11 cryoEM has gained momentum, as indicated by the surge of Coulomb potential maps or cryoEM maps deposited in the Electron Microscopy Database (EMDB)12,13 over the last few years14. Owing to advancements in sample preparation, imaging, and data processing methods, the number of Protein Data Bank (PDB)14 depositions employing cryoEM increased from 0.7% to 17% between 2010 and 2020, with approximately 50% of reported structures in 2020 being determined at 3.5 Å resolution or better15,16. CryoEM has been rapidly adopted by the structural biology community, including the pharmaceutical industry, as it allows the study of flexible and non-crystalline biological macromolecules, especially membrane proteins and multi-protein complexes, at near-atomic resolution, overcoming the process of crystallization and obtaining well-diffracting crystals required for high-resolution structure determination by XRD.

Accurate modeling of the ligand in the cryoEM map is paramount, as it serves as a blueprint of the protein-ligand complex at a molecular level. There are several automated ligand-building tools used in X-ray crystallography that depend on the shape and topology of the ligand density in order to fit or build the ligand into the electron density17,18,19,20. Nevertheless, if the resolution is lower than 3 Å, these approaches tend to produce less desirable outcomes because the topological features on which they depend for recognition and building become less defined. In many instances, these methods have proven ineffective in accurately modeling ligands into cryoEM maps, as these maps have been determined in the low-to-medium resolution range, typically between 3.5 Å-5 Å17.

The first step in 3D structure determination of a protein-ligand complex by cryoEM involves either co-purifying the ligand with the protein (when the ligand has a high binding affinity to the protein) or incubating the protein solution with the ligand for a specific duration before grid preparation. Subsequently, a small sample volume is placed onto a plasma-cleaned holey TEM grid, followed by flash freezing in liquid ethane and ultimately imaging with a cryo-TEM. The 2D projection images from hundreds of thousands to millions of individual particles are averaged to reconstruct a 3-dimensional (3D) Coulomb potential map of the macromolecule. Identifying and modeling ligands and solvent molecules in these maps pose significant challenges in many cases due to the anisotropic resolution across the map (i.e., the resolution is not uniform across the macromolecule), flexibility in the region where the ligand is bound, and the noise in the data. Many of the modeling, refinement, and visualization tools that were developed for XRD are now being adapted for use in cryoEM for the same purposes18, 19, 20, 21. In this article, an overview of various methods and software currently used to identify ligands, build models, and refine the coordinates derived from cryoEM is presented. A step-by-step protocol has been provided to illustrate the processes involved in modeling ligands using specific protein-ligand complexes with varying resolution and complexity.

The first step in modeling ligands in cryoEM maps includes the identification of the ligand density (non-protein) in the map. If ligand binding does not induce any conformational change in the protein, then calculating a simple difference map between the protein-ligand complex and the apo-protein essentially highlights the regions of extra density, suggesting the presence of the ligand. Such differences can be observed immediately, as it just requires two maps, and even intermediate maps during the process of 3D refinement can be used to check if the ligand is present. Additionally, if the resolution is high enough (<3.0 Å), then the difference map can also provide insights into the location of water molecules as well as ions interacting with the ligand and the protein residues.

In the absence of the apo-protein map, it is now possible to use Servalcat22, which is available as a standalone tool and has also been integrated into the CCP-EM software suite23,24 as part of the Refmac refinement and in CCP4 8.0 release25,26. Servalcat allows the calculation of an FSC weighted difference (Fo-Fc) map using the unsharpened half-maps and the apoprotein model as input. The Fo-Fc omit map represents the disparity between the experimental map (Fo) and the map derived from the model (Fc). In the absence of a ligand in the model, a positive density in a Fo-Fc map that overlaps with the experimental EM map typically suggests the presence of the ligand. The assumption here is that the protein chain is well-fitted in the map, and the remaining positive density indicates the location of the ligand. However, it is important to meticulously examine whether the positive density stems from modeling inaccuracies, such as the wrong rotamer of a protein side-chain.

The second step involves obtaining or creating a cartesian coordinate file of the ligand with well-defined geometry from the available chemical information. Standard ligands (for example, ATP and NADP+) that are already available in the CCP4 monomer library can be used for refinement by retrieving the coordinate and geometry files via their monomer accession code. However, for unknown or non-standard ligands, various tools are available to create the geometry files. Some of the examples include the eLBOW27 – (electronic ligand builder and optimization workbench) in Phenix28, Lidia – an in-built tool in Coot29, JLigand/ACEDRG30,31, CCP-EM23,24, Ligprep32-a module of Glide within the Schrodinger suite. The ligand coordinate file is then fitted in the density, guided by both the experimental cryoEM map and the difference map in Coot. This is followed by real-space refinement in Phenix28 or reciprocal refinement in Refmac33. A Linux workstation or a laptop equipped with a good graphics card and the above-mentioned software is required. Most of these programs are included in various suites. CCP-EM24 and Phenix28 are freely available to academic users and include a variety of tools that are used in this article, including Coot, Refmac533,34,35,36, Servalcat, phenix.real_space_refine, etc. Similarly, Chimera37, and ChimeraX38 provide free licenses to academic users.

Protocol

1. Modeling phosphoenolpyruvate (PEP) in enolase from Mycobacterium tuberculosis

  1. Identification of ligand density in the cryoEM map of PEP-enolase complex
    1. Download the unsharpened half-maps (emd_30988_additional_1.map and emd_30988_additional_2.map) of apo-enolase from the additional data in EMDB (see Table of Materials).
    2. Open ChimeraX (see Table of Materials). Open the half-maps of apo-enolase by clicking on Open from the toolbar and selecting the file names. Type vop add #1 #2 in the command line to combine both the half-maps to obtain the apo-enolase unsharpened map.
      NOTE: #1 and #2 denote the two half maps for the apo-enolase enzyme, as mentioned in step 1.1.1.
    3. Rename the combined map (ChimeraX map ID: #3) by typing rename #3 Apo-enolase-unsharpened-map. Color the map gray from the model panel. The map indicates that enolase is octameric in the solution with D4 symmetry.
    4. Repeat steps 1.1.1-1.1.3 using the following half-maps emd_30989_additional_1.map (map ID: 4) and emd_30989_additional_2.map (map ID:5) to obtain PEP-enolase-unsharpened-map (map ID: #6).
    5. In order to compute a difference map, first fit the two maps (PEP-enolase and apo-enolase unsharpened maps from step 1.1.3 and step 1.1.4) onto each other by clicking on Fit (map in map) in the Map panel (toolbar) in ChimeraX.
      NOTE: Normalization is not performed between the two maps. In some cases, normalization may be required to bring the maps to the same scale.
    6. Subtract the PEP-enolase map from the apo-enolase map by typing vop subtract #6 #3 in the command line.
      NOTE: #6 = PEP-enolase unsharpened map, #3 = Apo-enolase unsharpened map.
    7. Color the subtracted map (map ID:7) in green, denoting positive density (as per the convention in X-ray crystallography)39.
    8. Download the coordinates of M. tuberculosis octameric enolase (PDB: 7e4x) from the Protein Data Bank (PDB), click on open from the toolbar, and select the filename.
    9. Rename the model 7e4x.pdb by typing rename #8 Apo_enolase.pdb in the command line. #8 denotes the Apo_enolase.pdb in ChimeraX.
    10. Select the model from the model panel. Click on Right Mouse from the toolbar. Click on Move molecule to position the model in proximity to the PEP-enolase map (#6) and align the model with respect to the map. Fit this model into the PEP-enolase-unsharpened-map by typing fit #8 in #6 in the command line. Here, the map is numbered 6, and the model is numbered 8.
      NOTE: In some cases, the local fit in ChimeraX may not be enough to align the model. In these cases, an additional step of global fitting (DockinMap, see Table of Materials) may be performed in Phenix.
    11. Save the fitted model by typing save Apo-enolase.pdb #8 in the command line.
  2. Modeling and refinement of PEP in the B-factor sharpened cryoEM map of PEP-enolase complex
    1. Open "coot" (see Table of Materials) from the terminal by typing ./Coot &.
    2. Display "Apo-enolase.pdb" by clicking on File > Open Coordinates Apo-enolase.pdb. The coordinate file is displayed by bonds (color by atom).
    3. Download the PEP-bound Enolase sharpened map, i.e., emd_30989.map from EMDB. Display the same by clicking on File > Open Map > emd_30989.map . Set the threshold to 7.00 σ by scrolling the middle mouse button.
    4. Find unmodeled blobs by clicking on Validate > Unmodeled blobs > Find Blobs.
    5. Locate the unmodeled ligand density near the active site residues, Ser 42, Lys-386 and Arg-364 of the enolase model.
    6. Get the PEP monomer model file from the Coot monomer library by clicking on File > Get Monomer and entering PEP as the 3 Letter code.
    7. Move the PEP molecule to the density using the Rotate/Translate- Zone/Chain/Molecule option in the sidebar menu. Use real-space refinement in Coot to fit the PEP molecule in the density by clicking on Real Space Refine Zone in the sidebar menu. Merge the fitted ligand with the Apo-enolase.pdb by using the merge molecule option in the edit tab. Save the model as PEP-Enolase.pdb.
      NOTE: One can also use ligand> Jiggle-Fit Ligand option to fit the ligand as well.
    8. To add ligands to the rest of the monomers in the coordinate file, click on Calculate > NCS Tools > NCS ligands. A separate window named Find NCS-Related Ligands will appear. For the Protein with NCS option, select the coordinate file Apo-enolase.pdb and Chain ID A as NCS Master Chain.
      1. For the molecule containing ligand, select the Apo-enolase.pdb as the coordinate file and Chain ID, J, and Residue Number 1 a 1. Click on Find Candidate Positions. A window named Fitted Ligands will appear with a list of candidate positions. Evaluate the fitting by clicking on the individual candidate ligands and analyze the fit visually.
        NOTE: In Servalcat/Refmac, only a monomer model can be provided, and the symmetry used in reconstruction can be given to obtain an expanded model.
    9. Additional density for the solvent molecules was also observed near the ligand. High-resolution crystal structures of enolase from various homologs suggest the presence of two Mg2+ ions40,41, which probably stabilizes the negative charge of the reaction intermediate. Model two Mg2+ ions in the active site density by clicking on place atom at the pointer and selecting MG from the Pointer Atom type list.
      NOTE: The bond geometry and distance may be used as a guide to select the metal ion. In the case of Mg2+ bound to oxygen atoms in protein residues, the bond distance varies between 2.1 Å -2.4 Å, with an octahedral geometry. However, in the case of low-resolution maps, the coordination sphere is often incomplete, and the distance may vary as well because of limitations in the resolution.
    10. Repeat step 1.2.8 in order to add the Mg2+ ions in the symmetry-related monomers.
    11. Save the model by clicking on File > Save coordinates > Select Filename > and typing Enolase+PEP+Mg.pdb.
    12. Open Phenix GUI (see Table of Materials). Run a real space refinement job with the Enolase+PEP+Mg.pdb and the emd_30989.map as the input model and the map, respectively, using default parameters. This step is done iteratively with Coot to achieve good map-model fit and geometry.
      NOTE: The unsharpened difference map is used to demonstrate the presence of ligand, such as in a figure. For modeling purposes, the B-factor sharpened map has been used and is recommended. The B-factor sharpened map can also be used to calculate the difference map for demonstration purposes as opposed to the unsharpened map. However, if the resolution is lower in the region where the ligand is bound, then the single B-factor used for sharpening the entire map may not clearly reveal the ligand density.
  3. Visualization and figure generation of the modeled ligand
    1. Open the refined Enolase+PEP+Mg model from the final Phenix refinement job in PyMOL42 (see Table of Materials) by clicking on File > Open and selecting the file name.
    2. Load the emd-30989.map (PEP-bound sharpened map) by clicking on File > Open and selecting the filename.
    3. Rename the map as PEP-sharpened by clicking on actions > rename against the emd_30989 object and typing PEP-sharpened.
      NOTE: In most cases, for example enolase, the maps when opened in pymol are normalized as the normalize map option is checked by default in pymol. As cryoEM maps are centered on a larger box, the normalisation will include everything in the box. Thus, normalisation can be turned off before opening in pymol.
    4. Select the ligands by clicking on display > sequence.
    5. Tipo isomesh mesh_ligands, PEP-sharpened, 6.0, sele, carve=3.0 in the command line and press enter.
    6. Color the mesh_ligand blue by clicking on the last box, C, next to the mesh_ligand object.
    7. Display the active site residues, Ser-42, Asp-241, Glu-283, Asp-310, Arg-364 and Lys-386 interacting with the PEP and Mg2+ in stick and sphere representation, respectively, and the enolase enzyme in cartoon representation.
    8. Ray trace by typing ray 3600, 3600 to generate publication quality images and save as a png file by clicking on File > save and choosing PEP.png as the file name.

2. Modeling of ligands in metabotropic Glutamate receptor mGlu5

  1. Identifying and modeling the ligand density in the cryoEM map of agonist and antagonist-bound mGlu5
    1. Download the two half-maps along with their corresponding coordinate files for each of the agonist bound (EMD-31536, 7fd8.pdb) and the antagonist bound (EMD-31537,7fd9.pdb) mGlu5 complexes.
    2. Open ChimeraX
      1. Open the coordinate file, 7fd8.pdb by clicking open and selecting 7fd8.pdb.
      2. Tipo delete ligand in the command line and press Enter.
      3. Similarly, use the command delete/B to delete the chain B.
        NOTE: Ligands and chains can be deleted by using the drop-down menu at the top using select and actions > atoms/bonds > delete.
      4. Save the unliganded pdb by typing the following in the command line: save 7fd8_noligand_chainA.pdb #1. #1 denotes the model ID.
      5. Repeat the steps 2.1.2.1-2.1.2.4 for 7fd9.pdb.
    3. Open CCP-EM. Create a new project named mGlu5 and specify the project directory and username.
      1. Open Relion from the options on the left panel.
        1. Go to the Mask Creation tab.
        2. Provide one of the half-maps for EMD-31536 as the input.
          NOTE: Relion accepts maps with .mrc as a suffix, and the EMD maps have .map suffix. The map files can be opened in ChimeraX for examination and saved in the .mrc format.
        3. In the mask tab, provide pixel size as 0.89 Å and initial binarization threshold as 0.007.
        4. Run the job with the alias 31536 (or any other name that is easy to follow).
        5. Similarly, create a mask for the EMD – 31537 half map.
      2. Open Refmac Servalcat program from the options in the left panel in CCPEM.
        1. Provide the name as mGlu5_agonist.
        2. Import the 7fd8_noligand_chainA .pdb file as described in the model section 2.1.2.4. Provide the location of the two half maps and the corresponding mask created in Relion.
        3. Specify the resolution as 3.8 Å.
        4. Go to refinement settings > Strict Symmetry and mention C2 as the Relion point group symmetry.
        5. Start the program by pressing the start button at the top.
    4. Once the job is finished, open the result in Coot (option available at the top in the result section). This will display a diffmap.mtz in Coot by default, where the colors red and green indicate negative and positive densities, respectively.
      1. Go to Display manager > properties of the difference map labeled DELFWT PHDELWT and change the contour level to 4.0 (absolute value).
        NOTE: The choice of the threshold depends on the map and resolution.
      2. Hide the map labeled FWT PHWT and the molecule labeled refined.pdb.
      3. Move the map around using the mouse to visualize the positive density (colored green) and bring the larger blob to the centre.
      4. Go to File > Get monomer, type QUS, and hit enter.
      5. Go to Calculate > Modeling > Rigid Body Fit molecule and double-click on the QUS monomer to adjust the molecule to fit in the Fo-Fc density.
      6. Use the merge molecule option in the edit tab to add the QUS molecule to the coordinate file, refined.pdb.
      7. Repeat steps 2.1.4.3-2.1.4.6 to fit the ligand with monomer code NAG and CHS in the smaller blob in the extracellular and at the top of the transmembrane domain, respectively.
      8. Save the coordinate as refined_ligands.pdb.
        NOTE: The resolution of the transmembrane (TM) domain is lower, and thus, it is not discussed here.
  2. Refinement of the mGlu5 liganded structure
    1. Open CCPEM and clone the previous refinement job (step 2.1.3.2) and run it with the refined_ligands.pdb and sharpened map (emd. 31536.mrc) as input, using default parameters and C2 symmetry.
      NOTE: If the program doesn't recognize the ligand, then the CIF files need to be provided. These CIF files can be downloaded from the PDB or generated using various programs (see step 3.1.1 for details).
  3. Visualization and figure generation in PyMOL
    1. Open the refined_expanded.pdb model from the latest Refmac refinement job in PyMOL.
    2. Go to File > Open and browse to the Refmac job directory to load the diffmap_normalised_fofc.mrc (difference map from Servalcat job in step 2.1.3.2) file.
      NOTE: If the .mrc extension files are not visible while browsing, then change the file type to all files and browse.
    3. Select the ligands using display > sequence.
    4. Go to Actions button and rename the selection as ligands.
    5. Type the following in the command line and hit enter: isomesh mesh_ligands, diffmap_normalised_fofc, 6.0, ligands, carve=2.5.
      NOTE: The mesh width can be adjusted. Here, 0.2 mesh width is used, and this is set by using the following command: set mesh_width=0.2.
    6. Right-click on one of the monomer and go to chain > color to specify the color of each monomer. Color the ligand and mesh using the last box labeled c in the ligand and mesh_ligands objects. The mGlu5 receptor dimer is shown in cartoon representation, and the monomers are colored in teal and wheat. Ligands are shown in stick, and the mesh contouring the ligands is colored green.
    7. Use the actions > orient/ center/zoom option against the objects and adjust manually to visualize the mesh and clearly. Select the nearby residues that could be interacting with the ligand, for example, Tyr64, Trp100 for QUS, and as required, display their side chain or main chain or both as stick representation.
    8. Ray trace to generate high-resolution images and save them as a png file.
      NOTE: The above steps are repeated for the antagonist bound map EMD-31537 and the corresponding coordinate file PDB-7fd9.

3. Modeling the inhibitor, deoxygalacto-nojirimycin (DGN), and solvent molecules in a high-resolution sharpened map of β-galactosidase

  1. Identification of the ligand, DGN, using the half-maps from cryoEM
    1. Preparation of unknown Ligand dictionary for modeling [Deoxygalacto-nojirimycin(DGN)]
      NOTE: The Deoxygalacto-nojirimycin dictionary file is included in the CCP4 monomer library as DGJ (3 letter ligand ID). Nevertheless, here it is considered as a new and unknown ligand to illustrate the process of generating dictionary files for unknown ligands, DGN is used as the 3 letter code.
      1. Open the compound summary for deoxygalacto-nojirimycin (DGN) in the PubChem webpage and copy the smiles string for the compound deoxygalacto-nojirimycin.
      2. Open Phenix > Ligands > eLBOW.
      3. Specify the smiles string as input.
      4. Specify appropriate ligand building optimization method. Here, simple optimization is used.
      5. Paste the chemical smiles string in the ligand definition section.
      6. Provide a job title, output file prefix, and ligand ID appropriately and press run.
        NOTE: The output of the job contains a coordinate file, DGN.pdb, and a dictionary file, DGN.cif file. Jligand29 can also be used to make the cif file.
    2. Download the β-galactosidase coordinate file (6tsh.pdb) from the PDB and remove the coordinates for the protein chains (B, C, D), ligand, DGN, and other solvent molecules by using a text editor or delete chain/zone option in Coot. Save the coordinate file as apo-betaGal_chainA.pdb.
      NOTE: ChimeraX or Pymol can also be used to remove the ligand/solvent molecule.
    3. Download the half-maps (emd_10563_half_map_1.map and emd_10563_half_map_2.map) from additional data of the EMD-10563 entry from EMDB.
    4. Open CCPEM and use Relion to create the mask for the map at a threshold of 0.01 (as described in steps 2.1.3.1.1-2.1.3.1.4).
    5. Run Servalcat (within the CCPEM suite as described in step 2) with the half-maps obtained in step 3.1.3 and the apo-model in step 3.1.2 and D2 symmetry.
      NOTE: The model has to be aligned to one of the half-maps or full maps for locating the ligand density. This can be done by fitting the model into the map using ChimeraX and then saving the coordinates relative to the half-map. In this example, half-maps and the primary map in EMDB have different box sizes/grids, and the model needs to be placed in the half-map.
    6. Open Coot.
      1. Open DGN.cif for the ligand dictionary as generated in step 3.1.1 with Phenix eLBOW. This is done by going to File > Import CIF dictionary and browsing the eLBOW job directory.
      2. Open the protein and ligand coordinate files, apo-betaGal_chainA.pdb from step 3.1.2 and DGN.pdb from step 3.1.6 respectively.
      3. Auto-open the diffmap.mtz file from the Servalcat job (step 3.1.5) and visualize the map labeled DELFWT PHDELWT in Coot. Contour the map at a threshold of ~ 4 absolute value.
        NOTE: It is important to check the map statistics by typing header map.mrc or using tools in Chimera. All the necessary information on the pixel size, box size, minimum, mean, and maximum density, and rms deviation from mean density can be obtained. It is crucial to check the pixel size and box size of the cryoEM maps. The 3D map represents the protein-ligand map volume in a grid of 3D voxels. In each voxel, the electron potential value or the probability of finding the electron at that location is stored. When a certain threshold is chosen, then the voxels, which have a density lower than the specified value, are set to zero. This helps to minimize the experimental noise in the map and visualize the map features better43. Thus, the map should be contoured at a threshold where the map features are easily visible and the noise in the map is minimum.
      4. Go to Ligand > Find Ligands. In the new window that appears, select ligand, the difference map, and the apo-betaGal_chainA.pdb model. Leave the other options at their default settings and click Find Ligands to start the search.
      5. A list of top hits showing potential ligand density is displayed with a copy placed in each of the densities. Manually check all the hits where the ligand has been placed. Keep the most probable hits and remove the ambiguous ones.
        NOTE: The difference in map density at a high threshold clearly suggests the presence of the ligand in the active site.
  2. Modeling the ligand DGN and solvent molecules
    1. Perform a real space refinement in Coot to fit the ligand (DGN.pdb) in the difference density map, and then merge the ligand coordinates with the apo-betaGal_chainA.pdb and save it as betaGal_chainA+ligand.pdb.
      NOTE: The ligands that have been placed in regions of other chains can be removed and are not merged while saving the merged coordinate file. The ligands in other chains can be generated by NCS Ligands as shown in example 1, step 1.2.8, or in Refmac/Servalcat using symmetry expansion.
    2. Run another Servalcat job as above (step 3.1.5) with betaGal_chainA+ligand.pdb as input model and the half-maps (from step 3.1.3) with D2 symmetry.
    3. Difference density (from the Servalcat job at step 3.2.2) surrounding the modeled ligand suggests the presence of the solvent molecules coordinating with the ligand molecule. The central density close to the ligand appears to be too big for water molecules.
    4. Based on previous biochemical and structural data indicating the presence of Mg2+ at that position, the Mg2+ ion is modeled here by clicking the place atom option and specifying the atom as MG.
    5. Model water molecules in the difference density in the active site that indicate the presence of solvent molecule by clicking on the place atom at pointer and selecting water.
      NOTE: Coot also has the option to pick water molecules automatically. Here, since the focus is only on the active site, water molecules are added manually.
    6. Merge the coordinates for Mg2+ and water with the betaGal+ligand.pdb file and save it as betaGal+ligand+solvent_chainA.pdb.
    7. With the betaGal+ligand+solvent_chainA.pdb, perform Refmac refinement in Servalcat with D2 symmetry.
      NOTE: The output difference map from this job can serve as a means to validate whether the ligand has been accurately modeled, as the difference map should show minimal residual positive or negative density.
  3. Visualization and figure generation with PyMOL
    NOTE: The steps outlined below provide a few examples of making figures using models and maps that can be used for illustrating the presence of ligands and solvents.
    1. Open the refined_expanded.pdb from the last Servalcat job (step 3.2.6) with the ligand and solvent modeled.
    2. Select different chains separately to color them magenta, yellow, green, and teal, as described in example 2.
    3. Go to Display > Sequence, make separate selections for the water molecules, magnesium, and DGN, and rename the objects as water, MG, and DGN, respectively.
    4. Display MG and water molecules as spheres by selecting the objects, right-clicking, and show > sphere. Similarly, display the ligand in stick representation and color the ligands and solvents appropriately. In this case, the ligand has been colored yellow by a heteroatom, the magnesium ion is colored purple, and water molecules are colored red.
    5. Select DGN, MG, and water molecules. Right-click and select actions > copy to object and name it as ligands.
    6. Open the diffmap_normalised_fo.mrc from the Servalcat job in step 3.2.6 and rename as ligand_fo.mrc. Type the following command: isomesh mesh_ligands_fo, ligand_fo, 3, ligands, carve=2.
      NOTE: The carve option of 2 Å is applied around the atoms, and depending on the resolution and quality of the maps, values of 3-5 can be used. Furthermore, when opening cryoEM maps in PyMOL, it is advisable to set normalize_ccp4_maps to off.
    7. Use the actions > orient/ zoom options and adjust manually to visualize the ligands and nearby interacting residues (wherever applicable) as shown in step 2.3.7.
    8. Ray trace these scenes to save high-resolution images using the command ray 3600, 3600.
    9. Save the scene as .png file.
      NOTE: The following steps are optional and outline the process to visualize (1) the difference density (Fo-Fc) for the unmodeled ligand, DGN, (2) the density (Fo) of the modeled ligand, DGN as well as the difference density (Fo-Fc) for unmodeled solvents and lastly for the (3) the density (Fo) for the modeled ligand, DGN and the solvent molecules in the active site.
    10. To demonstrate the presence of the ligand, DGN, and surrounding solvent molecules using the difference map, open diffmap_normalised_fofc.mrc from the Servalcat job in step 3.1.5. Rename the map file as unliganded_fofc.mrc by clicking actions > rename next to the map object. Type the following in the command line: isomesh mesh_ligands_fofc, unliganded_fofc, ligands, level=6, carve=2.
    11. To demonstrate the density of the modeled ligand, DGN, and the surrounding unmodeled solvent density, open the diffmap_normalised_fo.mrc as well as diffmap_normalised_fofc.mrc from the servalcat job in step 3.2.2. Rename the diffmap_normalised_fo.mrc as DGN_fo and the diffmap_normalised_fofc.mrc as DGN_unmodelled_solvents_fofc.
    12. Select MG and water from Display > Sequence, right-click and select actions > copy to object, and name them as solvents.
    13. Type the following in the command line: isomesh mesh_DGN_fo, DGN_fo, DGN, level=3, carve=2; isomesh mesh_solvent_fofc, DGN_unmodelled_solvents_fofc, solvents, level=6, carve=2.
      NOTE: The mesh_DGN_fo will show the density of the ligand, and the mesh_solvent_fofc will show the omit density for the MG and water.
    14. Finally, to visualize the modeled ligands as well as the surrounding solvent molecules in the active site, open the diffmap_normalised_fo.mrc from the Servalcat job in step 3.2.6 and rename it as ligand_fo.mrc.
    15. Type the following in the command line: isomesh mesh_ligand_fo, ligand_fo, selection=ligands, level=3, carve=2.
      NOTE: Here, we used diffmap_normalised_fo.mrc, but the final sharpened map can be used to show the density of ligands and the surrounding residues. It is a good practice to mention the type of map used, B-factor sharpening values in the figure legend.
    16. The mesh width and sphere size can be set by typing the following commands: set mesh_width=0.2 and set sphere_scale=0.25 to make the sphere size smaller.
    17. Color the fo mesh blue and the fofc mesh green.
    18. Use the actions > orient/ zoom options and adjust manually to visualize the ligands and nearby interacting residues (wherever applicable), as shown in step 2.3.7.
    19. Ray trace these scenes to save high-resolution images using the command ray 3600, 3600.
    20. Save the individual scenes as .png files.

4. Effect of resolution on ligand modeling in β-galactosidase

  1. Open Terminal and go to the directory containing the two half-maps.
  2. Open the two half-maps (step 3.1.3) in .map format in ChimeraX, visualize them and save them as emd10563_half1_1_unfil.mrc and emd10563_half2_1_unfil.mrc.
  3. The mask created in step 3.1.4 can be used as an input.
  4. Run a PostProcessing job in Relion using half-maps and masks from steps 4.2-4.3, with the default parameters.
  5. Repeat step 4.4, now with a Skip FSC-weighing enabled and specifying an ad-hoc low-pass filter of 3 Å.
  6. Repeat the step 4.4 with an ad-hoc low-pass filter of 3.5 Å.
  7. Open the maps (postprocess.mrc) from steps 4.4-4.6, filter at different resolutions, and the model coordinate, 6tsh.pdb (aligned to the half-maps in ChimeraX) file from step 3.1.2 in PyMOL. Rename the different maps as 3.0, 3.5, and 2.3 as defined by their resolutions.
    NOTE: One can confirm the low-pass filtering of the maps by visualizing them in ChimeraX or Coot.
  8. Visualise the effect of resolution on the density of ligand and solvent molecules by creating a mesh of 2 Å around the ligand and solvent molecules at a threshold of 6σ as described in step 3.3.6 with maps filtered to different resolutions.

Representative Results

Example 1
The enzyme enolase from M. tuberculosis catalyzes the penultimate step of glycolysis and converts 2-phosphoglycerate to phosphoenolpyruvate (PEP), which is an essential intermediate for several metabolic pathways44,45. CryoEM data for the apo-enolase and PEP-bound enolase samples were collected at the same pixel size of 1.07 Å, and image processing was performed with Relion 3.146,47. The apo-enolase and the PEP-enolase structures were determined at 3.1 Å and 3.2 Å, respectively48. The maps and models were deposited in EMDB and PDB49,50 (EMD-30988, EMD-30989, PDB-7e4x, and PDB-7e51). The cryoEM map of the enzyme shows that it is an octamer in the solution (Figure 1A). In order to identify the ligand density in the PEP-enolase map, the unsharpened maps of the apoenzyme and PEP-bound enzyme were selected, and a difference map was calculated in ChimeraX, by subtracting the PEP-enolase map from the apoenzyme map. A distinct density (green) at a high threshold was observed, which suggested the presence of the ligand (Figure 1B). Modeling the protein chain in the unsharpened map clearly indicated that the extra density is present in the active site of the protein (Figure 1C). The ligand, PEP, was then modeled in the B-factor sharpened map using Coot, and the protein+ligand model was refined in real space with Phenix. Two Mg2+ ions were modeled in the density observed in the vicinity of the ligand (Figure 1D). The ligand, PEP, adopts a similar orientation as observed in other enolase homologs, and several active site residues, such as Lys-386, Arg-364, form hydrogen bond interactions with the ligand PEP. The Mg2+ ions form metal coordination bonds with Asp-241, Glu-283, Asp-310, and the phosphate of PEP (Figure 1D).

Example 2
In the absence of an available apo-protein structure or if the protein undergoes a large conformational change, calculating the difference maps as described above is not possible. In 2021, Garib Murshudov's group at the Laboratory of Molecular Biology, Cambridge, introduced Servalcat22, which implements a refinement workflow using Refmac and also calculates a Fo-Fc difference map after refinement. A positive Fo-Fc difference density suggests the presence of molecules/ligands that were not included in the model during refinement, essentially an omit map. However, it is recommended to first assess the fit of the model to the map in general and then evaluate the difference density map.

To illustrate the use of Servalcat/Refmac, mGlu5, a dimeric G-protein coupled receptor that binds to the neurotransmitter, L-Glutamate was chosen. Upon binding of the agonist, L-quisqualate, the extracellular domain reorients, which triggers the rotation of the 7TM, bringing them closer to stabilize the activated state. Thus, there is a large conformational change observed between the apo/antagonist vs. agonist bound states51 (Figure 2A and Figure 2E). The two half-maps for both the agonist (EMD-31536) and the antagonist (EMD-31537) bound complexes were obtained from EMDB, and the cryoEM maps show varied resolution across the molecule and better-resolved extracellular domain. Subsequently, these were utilized as inputs in Servalcat along with the apo-protein as the model to compute the difference or Fo-Fc map for each dataset. This map distinctly showed the presence of various ligand molecules (non-protein). The resolution estimated by FSC (Fourier Shell correlation) for the agonist and antagonist bound complexes were 3.8 Å and 4.0 Å, respectively. In the case of the agonist-bound mGlu5, the Servalcat Fo-Fc difference map showed the presence of both the agonist (L-quisqualate) (Figure 2B) and N-AcetylGlucosamine (NAG) (Figure 2C) in the ECD of the receptor (due to lower resolution of the TMD, here we focus only in the ECD and the top of the TMD). Protein residues, including Tyr-64, Trp-100, Ser-151, and Thr-175, are seen to interact with the agonist. Density near the Asn-210 residue suggested the presence of N-acetyl glucosamine (Figure 2C). A density consistent with cholesteryl hemisuccinate, which was added during the purification of mGlu5, was observed near the top of transmembrane helix 1 (Figure 2D). As the resolution is moderate and the ligand, L-quisqualate, can be placed in different orientations, prior structure of the extracellular domain with the ligand (PDB-6N50) was used as a guide to model the ligand. Antagonist binding stabilizes the open or resting state of the receptor (Figure 2E). A density consistent with the antagonist LY341495 was observed at the hinge of lobe I and lobe II of the venus fly trap domain in the ECD. The antagonist interacts with residues similar to those of the agonist. Stacking interaction between Tyr-223 in lobe II with the antagonist stabilizes the receptor in an open state (Figure 2F). Similar to the agonist structure, glycosylation or the presence of N-acetylglucosamine moiety was observed near Asn-210 (Figure 2G).

Example 3
The third example elucidates the protocol to model fragment-sized or small-sized ligands and solvent molecules in high-resolution CryoEM maps. Fragment-based Drug Discovery (FBDD) has emerged as a powerful and innovative method in the development of target-based novel therapeutics in various disease areas, making it a promising avenue in pharma R&D52,53. FBDD begins with screening and careful selection of small, highly soluble, low-molecular-weight fragments of molecules that bind to specific target proteins or biomolecules of interest. Determining the structures of these protein-fragment complexes reveals the mode of binding of these fragments, which serves as a guide to designing larger and more complex drug-like molecules with increasing affinity and specificity towards the target protein54. However, this method demands a high-resolution ligand density to accurately determine the pose and correctly place the functional groups of the ligand15.

β-galactosidase, one of the first high-resolution structures to be determined from the advancements in cryoEM technology, is a well-studied 450kDa homotetrameric enzyme that catalyzes the hydrolysis of lactose to glucose and galactose55. To showcase the use of cryoEM in FBDD, Astex, UK, determined the structure of β-galactosidase with a fragment-sized inhibitor, deoxygalacto-nojirimycin (DGN) bound in the active site (EMDB-10563, PDB:6tsh)56. This dataset is used to illustrate the protocol to model ligands and solvents unambiguously in high-resolution maps. To showcase the effect of resolution on ligand modeling and visualization, the maps were filtered to 3.0 Å and 3.5 Å in the post-processing step in Relion. This highlights the quality of the map density at different resolutions and underscores the need for higher resolution for modeling ligands and solvents.

The enzyme is a tetramer with D2 symmetry in solution (Figure 3A). Difference map (between map and model), as computed by Servalcat, suggested the presence of DGN and several solvent molecules in the active site of the enzyme (Figure 3B). At an estimated resolution of 2.3 Å, the density showed high-resolution features, which helped in the accurate modeling of the inhibitor in the protein active site. Interactions between DGN and Tyr-503 and His-540 were observed (Figure 3C). The difference density also suggested the presence of solvent molecules that interact with DGN as well as the protein residues. Mg2+ and several water molecules were modeled in the density (Figure 3D). Metal coordination bonds between Mg2+ and Glu-416, Glu-461, and several water molecules are observed (Figure 3D). Mg2+ was seen to interact with DGN via a water molecule.

At a lower resolution of 3.5 Å and 3.0 Å, the ligand density resembles a blob and lacks high-resolution features crucial for accurate modeling of the ligand (Figure 4A,B). Density for water molecules was almost non-existent at these resolutions. In summary, with increasing resolution, especially higher than 3.0 Å, the density enabled the modeling of a greater number of water molecules (Figure 4C,D). The correct positioning of the ligand's chiral centre became achievable at ~2.3 Å (Figure 4C,D) due to the presence of distinct features in the map that guided the placement as well as modeling of water molecules and Mg2+. In comparison, the density for Mg2+ remained discernible across the resolution range (Figure 4AC).

Figure 1
Figure 1: Modeling the ligand phosphoenolpyruvate in M. tuberculosis enolase. (A) shows the B-factor sharpened map of the PEP-bound enolase enzyme. The map suggests that enolase is octameric in solution, and each monomer in the map is colored differently. (B) displays an unsharpened cryoEM map of apo-Enolase enzyme in grey with the difference map (between PEP-bound and apo-Enolase unsharpened maps) overlaid in green, suggesting the presence of ligand, PEP. This is a demonstration map to show the presence of ligands. (C) displays the fit of the enolase model in the unsharpened cryoEM map, highlighting the position of the difference density relative to the protein. The protein model is shown in cartoon representation and colored in Chainbow. This figure shows that the extra density (green) is present in the active site of each monomer. (D) shows the ligand, PEP, encased in the B-factor sharpened map, colored blue. Additionally, density for two Mg2+ ions was also observed, which forms metal coordination bonds with several active site residues, including Ser-42, Asp 241, Glu-283, Asp-310, and ligand atoms. The ligand makes hydrogen bond interactions with Lys-386, Lys-335, and Arg-364. The protein residues are shown in stick representation, and Mg2+ ions are shown as purple spheres. The figures in panels (AD) were generated with Pymol. Please click here to view a larger version of this figure.

Figure 2
Figure 2: Identification, modeling, and visualization of various ligands in mGlu5 receptor. The Fo-Fc omit maps were obtained using Servalcat. (A) shows mGlu5 receptor dimer structure (PDB-7fd8) in cartoon representation, with each monomer colored in teal and wheat, respectively, and bound to the agonist L-quisqualate. All the ligands that were identified in the extracellular and on the top of the transmembrane domain of the receptor are encased in the Fo-Fc omit map, colored green, and contoured at 6σ.(B) highlights the fit of the agonist, L-quisqualate encased in the difference map. L-quisqualate interacts with several mGlu5 residues including Tyr-64, Trp-100, Ser-151, Thr-175 and Gly-280. The H-bond interactions are represented by red dashes. In (C), an additional density is evident near Asn-210, which is present in the receptor's extracellular domain, and N-acetylglucosamine molecule (NAG) was modeled in this density. For clarity, the NAG is not linked to Asn in the current figure. (D) shows the difference in density for cholesterol hemisuccinate (CHS) in green near the lipid-exposed surface of the receptor. The CHS molecule, represented in the sticks, was modeled in this density. (E) shows mGlu5 receptor structure (PDB-7fd9) bound to the antagonist LY341495 in cartoon representation. In (F), an extra density in the Fo-Fc difference map located at the hinge between lobe I and lobe II of the extracellular domain indicates the presence of the antagonist. Key residues (Tyr-64, Trp-100, Ser-152, Ser-173, Thr-175, and Tyr-223) around the antagonist are shown in stick representations, and potential hydrogen bond interactions with the antagonist are shown in red dashes. (G) shows the difference in density suggestive of the presence of NAG molecule near Asn-210 in the antagonist-bound structure (note, the NAG is not linked with Asn for clarity). The figures were generated with Pymol. Please click here to view a larger version of this figure.

Figure 3
Figure 3: Identification, modeling, and refinement of a small inhibitor and solvent molecules in the high-resolution map of β-galactosidase (EMD-10563). (A) shows the β-galactosidase model resolved to 2.3 Å (PDB: 6tsh) in cartoon representation, where each monomer is colored distinctly. The grey box highlights the ligand binding site. (B) displays the Fo-Fc difference density (from Servalcat) in green mesh in the active site of the enzyme. The difference in density suggests the presence of the inhibitor (DGN) and several solvent molecules in the active site. (C) demonstrates that guided by the Fo-Fc map, the inhibitor deoxygalacto-nojirimycin (DGN) is modeled in the active site. This modeled ligand is depicted in stick format and encased in the Fo density (by Servalcat), which is colored in blue_mesh. Hydrogen bond interactions between DGN and several protein residues, including Tyr-503 and His-540, are observed. The additional Fo-Fc difference density around the ligand (green) is indicative of the solvent molecules. (D) The map shows several solvent molecules, including water and Mg2+, represented as red and purple spheres, respectively, are modeled in the active site after ensuring that each solvent molecule is bonded to either protein (Glu-416, His-418, and Glu-461) or ligand residues. The water molecules and Mg2+ are encased in the Fo density (blue mesh). Mg2+ is seen to interact with the ligand, DGN via a water molecule. The Fo-Fc map (green mesh) in panels (B,C) is contoured at 6σ, whereas the Fo density – blue mesh in C and D (from the Servalcat refinement after modeling) is contoured at 3σ. The figures in panels (AD) were generated with Pymol. Please click here to view a larger version of this figure.

Figure 4
Figure 4: Effect of resolution on ligand modeling in β-galactosidase. Half-maps from EMD-10563 were used as input in the Relion post-process step, and the combined post-process maps were filtered to resolutions 2.3 Å, 3.0 Å, and 3.5 Å with different B-factors. The map shown in all the panels is contoured at 6σ. For clarity, only the protein backbone is shown with no side chains, ligand or solvent molecules in panels A, B, and C. (A) The map is filtered to 3.5 Å, and the active site of β-galactosidase is shown. A blob resembling the ligand, DGN, is seen at this resolution, accompanied by a few smaller blobs in the vicinity. Modeling the ligand in the correct orientation proves challenging due to the lack of distinct features in the map. (B) The map filtered to 3.0 Å resolution is shown. Here, the ligand blob becomes slightly more defined but still lacks features in general. A few more small blobs suggestive of solvent molecules are also observed. (C) The map filtered to 2.3 Å resolution reveals the ligand density with distinct features, notably revealing the chair conformation of the iminosugar. A significant number of small blobs corresponding to water molecules are observed at this resolution. The automatic B-factor estimation/sharpening in Relion post-process gives a value of -18 Å2 for the maps filtered to 3 Å and 3.5 Å, while the value is -52 Å2 for the map filtered to 2.3 Å. Different B-factor sharpening of EM maps can also be performed with Coot and useful in model building. (D) The panel illustrates that the ligand DGN (stick representation), Mg2+, and water molecules (as spheres) as modeled in the active site and the sharpened map at 2.3 Å, shown in blue mesh surrounding these atoms. The figures in panels (AD) were generated with Pymol. Please click here to view a larger version of this figure.

Discussion

The improvements in microscope hardware and software have resulted in an increase in the number of cryoEM structures in recent years. Although the highest resolution achieved at the moment in single particle cryoEM is 1.2 Å57,58,59, the majority of the structures are being determined around 3-4 Å resolution. Modeling ligands in medium to low-resolution maps can be tricky and often fraught with ambiguity. Given the widespread use of cryoEM in both academia and the pharma industry for translational research and drug discovery, ensuring that ligands are modeled correctly and without ambiguity is essential. Thus, it is prudent to quantify the resolvability of ligand atoms by calculating Q-scores60, which is now available in the EMDB as a metric to evaluate the quality of the maps and the fit of the model as well as in Chimera.

In the first example, ChimeraX was used to compute the difference map in real space between the apo and the ligand-bound maps in the M. tuberculosis enolase enzyme. The additional density at a high threshold suggests the presence of the ligand, phosphoenolpyruvate, in the active site and, Mg2+ bound to the ligand. It is important to note that, in this case, the map is of medium resolution (3.2 Å), and water molecules cannot be modeled confidently (Figure 1). The limitation associated with this method is that it can only be applied when ligand binding does not induce significant conformational changes in the protein. Map normalization was not performed in this case since both the apo and ligand-bound datasets were acquired at the same pixel size and processed with identical parameters in Relion. However, it is worth noting that when comparing maps generated by different reconstruction programs, such as Relion46,47 and CryoSparc61, or with different quality, normalization of the maps becomes essential before meaningful comparisons can be made.

The next example is mGlu5, which undergoes large molecular reorganization upon agonist binding, as evident from the cryoEM structures51,62 (Figure 2). In this scenario, it is not feasible to compute a simple difference map between the unbound (apo) and ligand-bound receptors because of substantial differences between the maps. Here, Servalcat, which uses unsharpened and unweighted half-maps as input for refinement in reciprocal space and subsequently computes a difference map between the experimental map and the map derived from the model, was utilized. At a high threshold, one can visualize differences and act as a guide for correction and improvement of the model. Several unmodeled blobs in the extracellular and near the transmembrane domain of mGlu5 were observed and used as a guide to model ligands (Figure 2).

The third example shows how resolution (2.3 Å) plays a crucial role in map interpretation and modeling of a fragment-sized inhibitor in β-galactosidase. Here, the challenge was to identify a very small ligand (<200 Da) in the Servalcat difference map amidst the inherent noise in the data and model it accurately. In addition to the high global resolution determined using Fourier Shell Correlation (FSC), the local resolution specific to the ligand was also sufficiently high to ensure accurate placement of the chiral centres of the ligands (Figure 3). Density for solvent molecules was observed in the difference map throughout the enzyme and especially surrounding the ligand. The effect of resolution on modeling ligands and solvent atoms was also demonstrated (Figure 4). It is important to exercise caution when modeling water or solvent molecules because, on occasion, noise at a low threshold may resemble water or solvent molecules, leading to potential misinterpretations.

Another important consideration is that the cryoEM maps alone may not be sufficient to accurately identify a metal ion on its own. Additional biophysical methods like Extended X-ray Absorption Fine Structure (EXAFS) or Energy-Dispersive X-ray Spectroscopy (EDX) are often necessary to confirm the presence and identity of the metal ion. In both enolase and β-galactosidase enzymes, Mg2+ was modeled due to the wealth of information already available on these proteins, confirming the identity of the metal ion. Also, the coordination of the metal ions in these cases, exemplified by the classic octahedral geometry and the near-ideal coordination distances of Mg2+, provided substantial proof of their identity.

In general, a few important considerations should be taken into account while modeling ligands in cryoEM maps. To begin with, choosing the correct map for ligand identification and modeling is important. In all cases, an unsharpened and unweighted map or half-map is advisable over a sharpened and weighted map to visualize the ligand density, as sharpening and weighting could result in under-sharpened or over-sharpened (noise due to series termination) regions in the map. This may result in a suboptimal ligand density, and the use of different B-factor sharpening can be employed to evaluate the density during model building in Coot. Although there is a risk in using the sharpened map to identify ligands in a cryoEM map, the unsharpened map may not show the full detail of the ligand density but can be used for demonstration purposes, as shown in Figure 1B,C.

The pose of the modeled ligand should be validated, especially in cases where the data is weak. As shown here for mGlu5, the local resolution varies across the cryoEM map, and unbiased modeling of the ligand can be challenging. Servalcat can be used as a valuable tool for detecting potential inaccuracies in protein and ligand modeling22.

Compositional heterogeneity can be present in a protein-ligand complex where only a specific population may have the ligand present (if the ligand is of low molecular weight, then the classification step might not remove the heterogeneity). Nevertheless, it is important to perform the 3D classification63 step iteratively during image processing prior to ligand modeling and verify if the density of the ligand improves. If multiple copies of protein are present, caution should be exercised when applying symmetry across the map during initial model generation and refinement, as this may average out the ligand density in all the symmetry-related molecules. Symmetry should only be imposed after the map has been thoroughly inspected to confirm the presence of ligand density in all the protein chains.

Depending on the state (crystal or solution) and the location (buried or surface), the atoms can be dynamic, and in model refinement, this is referred to as the atomic displacement parameter (ADP). Along with the difference map, which provides visual clues to possible inaccuracies in the model, ADP values can be used to evaluate the accuracy of the ligands after refinement64,65,66,67. Typically, the ligand should have similar ADP values as the surrounding residues, i.e., if they are stably bound and modeled accurately. However, ligands at the periphery or atoms of a ligand (like lipids) that are far from macromolecules can have higher ADP values. In addition to refining the coordinates, both Refmac33,34 and Phenix allow refinement of ADP values28, 68. In Refmac, the Mott-Bethe approximation is used to calculate the electron scattering factor of individual atoms during map calculation. In the recent version of Phenix, individual B-factor refinement, similar to crystallography, has been introduced to account for atom disorder. Very often, in the refined models derived from cryoEM, a wide range of ADP values are observed (sometimes values close to zero), and even the Q-score that is used in EMDB for evaluating the fit of the model with the map depends on the primary map deposited and the nature of B-factor sharpening60. In cryoEM model building, multiple maps are often used, and thus, the maps used in modeling and refinement should clearly be mentioned in the methods, as due to anisotropic resolution in many macromolecules, one single map may not be sufficient to explain all the details.

In modeling of ligands in macromolecules, one of the main limitations of cryoEM maps (like in crystallography) is that if the ligand binding site is of low resolution or if the bound ligand is dynamic, then ascertaining the correct conformation may prove difficult. In addition, most cryoEM structures have resolutions below 3 Å, and the representation of water molecules in the maps is limited, making it difficult to assess the role of hydration in ligand or drug binding (as shown here with the examples of enolase and mGluR). Computational methods can be used in combination with cryoEM data to address these limitations69. In contrast to crystallography, only the model undergoes refinement, not the map. Presently, the only method to indicate the presence of a ligand or ensure accurate modeling is by generating omit maps (using tools like Servalcat). Thus, there are a number of tools to help researchers build and evaluate the model, but there are several areas in model refinement where newer approaches or modifications of current approaches can be expected in the near future.

In this article, we have focused on the current approaches for modeling ligands, which include manual inspection of the ligand density, generating the ligand geometry file, followed by modeling the ligand in the cryoEM map. This is an exciting period in structural biology and drug discovery, as direct electron detectors with faster frame rates and the use of faster data acquisition70 have resulted in the obtaining of high-resolution maps (<2.8 Å) of several macromolecules often bound with small molecule ligands in a relatively short time. The recent introduction of automated ligand modeling tools such as GEMspot69 in the Schrodinger suite and EMERALD17 in the Rosetta suite, which attempts to find the most probable bound pose of the ligand while factoring in the experimental cryoEM data, holds promise to streamline and automate this process. Similar to X-ray crystallography, it is envisioned that identifying the binding modes of small-molecule ligands by cryoEM, perhaps two or more in a single day, will become a realistic possibility.

Divulgaciones

The authors have nothing to disclose.

Acknowledgements

SJ is a recipient of the PhD studentship from DAE-TIFR, and the funding is acknowledged. KRV acknowledges DBT B-Life grant DBT/PR12422/MED/31/287/2014 and the support of the Department of Atomic Energy, Government of India, under Project Identification No. RTI4006.

Materials

CCP4-8.0 Consortium of several institutes https://www.ccp4.ac.uk Free for academic users and includes Coot and list of tools developed for X-ray crystallography
CCP-EM Consortium of several institutes https://www.ccpem.ac.uk/download.php Free for academic users and includes Coot, Relion and many others
Coot  Paul Emsley, LMB, Cambridge https://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/ General software for model building but also available with other suites described above
DockinMap (Phenix) Consortium of several institutes https://phenix-online.org/documentation/reference/dock_in_map.html Software inside the Phenix suite for docking model into cryoEM maps
Electron Microscopy Data Bank  Consortium of several institutes https://www.ebi.ac.uk/emdb/ Public Repository for Electron Microscopy maps
Falcon Thermo Fisher Scientific  https://assets.thermofisher.com/TFS-Assets/MSD/Technical-Notes/Falcon-3EC-Datasheet.pdf Commercial, camera from Thermo Fisher 
Phenix  Consortium of several institutes https://phenix-online.org/download Free for academic users and includes Coot
Protein Data Bank Consortium of several institutes https://rcsb.org Public database of macromolecular structures
Pymol Schrodinger https://pymol.org/2/ Molecular viusalization tool. Educational version is free but comes with limitation. The full version can be obtained with a small fee.
Relion MRC-LMB, Cambridge https://relion.readthedocs.io/en/release-4.0/Installation.html Software for cryoEM image processing, also available with CCP-EM
Titan Krios Thermo Fisher Scientific  https://www.thermofisher.com/in/en/home/electron-microscopy/products/transmission-electron-microscopes/krios-g4-cryo-tem.html?cid=msd_ls_xbu_xmkt_tem-krios_285811_gl_pso_gaw_tpne1c&
gad_source=1&gclid=CjwKCAiA-P-rBhBEEiwAQEXhHyw5c8MKThmdA
AkZesWC4FYQSwIQRk
ZApkj08MfYG040DtiiuL8
RihoCebEQAvD_BwE
Commercial, cryoTEM from Thermo Fisher
UCSF Chimera UCSF, USA https://www.cgl.ucsf.edu/chimera/download.html General purpose software for display, analysis and more
UCSF Chimera X UCSF, USA https://www.cgl.ucsf.edu/chimerax/ General purpose software for display, analysis and more

Referencias

  1. Steinbrecher, T., Labahn, A. Towards accurate free energy calculations in ligand protein-binding studies. Curr Med Chem. 17, 767-785 (2010).
  2. Fu, Y., Zhao, J., Chen, Z. Insights into the molecular mechanisms of protein-ligand interactions by molecular docking and molecular dynamics simulation: a case of oligopeptide binding protein. Comput Math Methods Med. 4, 3502514 (2018).
  3. Grinter, S. Z., Zou, X. Challenges, applications, and recent advances of protein-ligand docking in structure-based drug design. Molecules. 19 (7), 10150-10176 (2014).
  4. Henderson, R., Hasnain, S. Cryo-EM’: electron cryomicroscopy, cryo electron microscopy or something else. IUCrJ. 10 (5), 519-520 (2023).
  5. Rosenthal, P. B. A potential difference for single-particle cryo-EM. IUCrJ. 6, 988-989 (2019).
  6. Wang, J., Moore, P. B. On the interpretation of electron microscopic maps of biological macromolecules. Prot Sci. 26 (1), 122-129 (2017).
  7. Murshudov, G. N. Refinement of atomic structures against cryo-EM Maps. Meth Enzymol. 579, 277-305 (2016).
  8. Kulik, M., Chodkiewicz, M. L., Dominiak, P. M. Theoretical 3D electron diffraction electrostatic potential maps of proteins modeled with a multipolar pseudoatom data bank. Acta Crystallogr D Struct Biol. 78 (8), 1010-1020 (2022).
  9. Yonekura, K., Kato, K., Ogasawara, M., Tomita, M., Toyoshima, C. Electron crystallography of ultrathin 3D protein crystals: Atomic model with charges. Proc Natl Acad Sci U S A. 112 (11), 3368-3373 (2015).
  10. Mu, X., Gillman, C., Nguyen, C., Gonen, T. An overview of microcrystal electron diffraction (MicroED). Annu Rev Biochem. 90, 431-450 (2021).
  11. Kühlbrandt, W. The resolution revolution. Science. 343 (6178), 1443-1444 (2014).
  12. Lawson, C. L., et al. EMDataBank unified data resource for 3DEM. Nucleic Acids Res. 44, 396-403 (2016).
  13. Lees, J. A., Dias, J. M., Han, S. Applications of Cryo-EM in small molecule and biologics drug design. Biochem Soc Trans. 49 (6), 2627-2638 (2021).
  14. Robertson, M. J., Meyerowitz, J. G., Skiniotis, G. Drug discovery in the era of cryo-electron microscopy. Trends Biochem Sci. 47 (2), 124-135 (2022).
  15. Muenks, A., Zepeda, S., Zhou, G., Veesler, D., DiMaio, F. Automatic and accurate ligand structure determination guided by cryo-electron microscopy maps. Nat Commun. 14, 1164 (2023).
  16. Oldfield, T. J. X-ligand: An application for the automated addition of flexible ligands into electron density. Acta Crystallogr D Biol Crystallogr. 57 (5), 696-705 (2001).
  17. Zwart, P. H., Langer, G. G., Lamzin, V. S. Modelling bound ligands in protein crystal structures. Acta Crystallogr D Biol Crystallogr. 60, 2230-2239 (2004).
  18. Terwilliger, T. C., Klei, H., Adams, P. D., Moriarty, N. W., Cohn, J. D. Automated ligand fitting by core-fragment fitting and extension into density. Acta Crystallogr D Biol Crystallogr. 62 (8), 915-922 (2006).
  19. Casañal, A., Lohkamp, B., Emsley, P. Current developments in Coot for macromolecular model building of Electron Cryo-microscopy and Crystallographic Data. Prot Sci. 29 (4), 1069-1078 (2020).
  20. Yamashita, K., Palmer, C. M., Burnley, T., Murshudov, G. N. Cryo-EM single-particle structure refinement and map calculation using Servalcat. Acta Crystallogr D Struct Biol. 77 (10), 1282-1291 (2021).
  21. Wood, C. Collaborative computational project for electron cryo-microscopy. Acta Crystallogr D Biol Crystallogr. 71 (1), 123-126 (2015).
  22. Burnley, T., Palmer, C. M., Winn, M. Recent developments in the CCP-EM software suite. Acta Crystallogr D Struct Biol. 73 (6), 469-477 (2017).
  23. Potterton, E., McNicholas, S., Krissinel, E., Cowtan, K., Noble, M. The CCP4 molecular-graphics project. Acta Crystallogr D Biol Crystallogr. 58 (11), 1955-1957 (2002).
  24. Agirre, J. The CCP4 suite: Integrative software for macromolecular crystallography. Acta Crystallogr D Struct Biol. 79 (6), 449-461 (2023).
  25. Moriarty, N. W., Grosse-Kunstleve, R. W., Adams, P. D. Electronic ligand builder and optimization workbench (eLBOW): A tool for ligand coordinate and restraint generation. Acta Crystallogr D Biol Crystallogr. 65 (10), 1074-1080 (2009).
  26. Adams, P. D. A comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr. 66 (2), 213-221 (2010).
  27. Emsley, P., Cowtan, K. Coot: Model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr. 60 (12), 2126-2132 (2004).
  28. Debreczeni, J. &. #. 2. 0. 1. ;., Emsley, P. Handling ligands with Coot. Acta Crystallogr D Biol Crystallogr. 68 (4), 425-430 (2012).
  29. Long, F., et al. AceDRG: A stereochemical description generator for ligands. Acta Crystallogr D Struct Biol. 73 (2), 112-122 (2017).
  30. Schrödinger. LigPrep. Schrödinger Release 2022-1: Schrödinger, LLC. , (2022).
  31. Brown, A. Tools for macromolecular model building and refinement into electron cryo-microscopy reconstructions. Acta Crystallogr D Biol Crystallogr. 71 (1), 136-153 (2015).
  32. Murshudov, G. N., Vagin, A. A., Dodson, E. J. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr. 53, 240-255 (1997).
  33. Kovalevskiy, O., Nicholls, R. A., Long, F., Carlon, A., Murshudov, G. N. Overview of refinement procedures within REFMAC 5: Utilizing data from different sources. Acta Crystallogr D Struct Biol. 74 (3), 215-227 (2018).
  34. Winn, M. D., et al. Overview of the CCP4 suite and current developments. Acta Crystallogr D Biol Crystallogr. 67 (4), 235-242 (2011).
  35. Pettersen, E. F., et al. UCSF Chimera – A visualization system for exploratory research and analysis. J Comput Chem. 25 (13), 1605-1612 (2004).
  36. Pettersen, E. F., et al. UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Prot Sci. 30 (1), 70-82 (2021).
  37. Lamb, A. L., Kappock, T. J., Silvaggi, N. R. You are lost without a map: Navigating the sea of protein structures. Biochim Biophys Acta Proteins Proteom. 1854 (4), 256-268 (2015).
  38. Ehinger, S., Schubert, W. D., Bergmann, S., Hammerschmidt, S., Heinz, D. W. Plasmin(ogen)-binding α-Enolase from streptococcus pneumoniae: Crystal structure and evaluation of plasmin(ogen)-binding sites. J Mol Biol. 343 (4), 997-1005 (2004).
  39. Tjia-Fleck, S., Readnour, B. M., Ayinuola, Y. A., Castellino, F. J. High-Resolution single-particle cryo-EM hydrated structure of streptococcus pyogenes enolase offers insights into its function as a plasminogen receptor. Bioquímica. 62 (3), 735-746 (2023).
  40. Pfab, J., Si, D. Automated threshold selection for cryo-EM density maps. ACM-BCB 2019. Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. , 161-166 (2019).
  41. Rahi, A., et al. Enolase of Mycobacterium tuberculosis is a surface exposed plasminogen binding protein. Biochim Biophys Acta Gen Subj. 1861 (1), 3355-3364 (2017).
  42. Henderson, B., Martin, A. Bacterial virulence in the moonlight: Multitasking bacterial moonlighting proteins are virulence determinants in infectious disease. Infect Immun. 79 (9), 3476-3491 (2011).
  43. Scheres, S. H. W. A bayesian view on cryo-EM structure determination. J Mol Biol. 415 (2), 406-418 (2012).
  44. Zivanov, J., et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. Elife. 7, 42166 (2018).
  45. Mohammed, A., et al. Structural snapshots of Mycobacterium tuberculosis enolase reveal dual mode of 2PG binding and its implication in enzyme catalysis. IUCrJ. 10 (6), 738-753 (2023).
  46. Berman, H. M., et al. The Protein Data Bank. Nucleic Acids Res. 28 (1), 3-21 (2000).
  47. Nasrallah, C. Agonists and allosteric modulators promote signaling from different metabotropic glutamate receptor 5 conformations. Cell Rep. 36 (9), 109648 (2021).
  48. Doak, B. C., Norton, R. S., Scanlon, M. J. The ways and means of fragment-based drug design. Pharmacol Ther. 167, 28-37 (2016).
  49. Baker, M. Fragment-based lead discovery grows up. Nat Rev Drug Discov. 12 (1), 5-7 (2013).
  50. Kirsch, P., Hartman, A. M., Hirsch, A. K. H., Empting, M. Concepts and core principles of fragment-based drug design. Molecules. 24 (3), 4309 (2019).
  51. Bartesaghi, A., et al. 2.2 Å resolution cryo-EM structure of β-galactosidase in complex with a cell-permeant inhibitor. Science. 348 (6239), 1147-1151 (2015).
  52. Saur, M. Fragment-based drug discovery using cryo-EM. Drug Discov Today. 25 (3), 485-490 (2020).
  53. Maki-Yonekura, S., Kawakami, K., Takaba, K., Hamaguchi, T., Yonekura, K. Measurement of charges and chemical bonding in a cryo-EM structure. Commun Chem. 6 (1), 98 (2023).
  54. Yip, K. M., Fischer, N., Paknia, E., Chari, A., Stark, H. Atomic-resolution protein structure determination by cryo-EM. Nature. 587 (7832), 157-161 (2020).
  55. Nakane, T. Single-particle cryo-EM at atomic resolution. Nature. 587 (7832), 152-156 (2020).
  56. Pintilie, G. Measurement of atom resolvability in cryo-EM maps with Q-scores. Nat Methods. 17 (3), 328-334 (2020).
  57. Punjani, A., Rubinstein, J. L., Fleet, D. J., Brubaker, M. A. CryoSPARC: Algorithms for rapid unsupervised cryo-EM structure determination. Nat Methods. 14 (3), 290-296 (2017).
  58. Koehl, A. Structural insights into the activation of metabotropic glutamate receptors. Nature. 566 (7742), 79-84 (2019).
  59. Scheres, S. H. W. Classification of structural heterogeneity by maximum-likelihood methods. Meth Enzymol. 482, 295-320 (2010).
  60. Carugo, O. Atomic displacement parameters in structural biology. Amino Acids. 50 (7), 775-786 (2018).
  61. Wlodawer, A., Li, M., Dauter, Z. High-resolution Cryo-EM maps and models: A crystallographer’s perspective. Structure. 25 (10), 1589-1597 (2017).
  62. Masmaliyeva, R. C., Babai, K. H., Murshudov, G. N. Local and global analysis of macromolecular atomic displacement parameters. Acta Crystallogr D Struct Biol. 76 (10), 926-937 (2020).
  63. Merritt, E. A. To B or not to B: A question of resolution. Acta Crystallogr D Biol Crystallogr. 68 (4), 468-477 (2012).
  64. Afonine, P. V., et al. Real-space refinement in PHENIX for cryo-EM and crystallography. Acta Crystallogr D Struct Biol. 74 (60), 531-544 (2018).
  65. Roberts on, M. J., van Zundert, G. C. P., Borrelli, K., Skiniotis, G. GemSpot: A pipeline for robust modeling of ligands into Cryo-EM maps. Structure. 28 (6), 707-716 (2020).
  66. Weis, F., Hagen, W. J. H. Combining high throughput and high quality for cryo-electron microscopy data collection. Acta Crystallogr D Struct Biol. 76 (8), 724-728 (2020).

Play Video

Citar este artículo
Jha, S., Bose, S., Vinothkumar, K. R. Modeling Ligands into Maps Derived from Electron Cryomicroscopy. J. Vis. Exp. (209), e66310, doi:10.3791/66310 (2024).

View Video