This paper presents a protocol for processing cryo-EM images using the software suite SPHIRE. The protocol can be applied to nearly all single particle EM projects that target near-atomic resolution.
SPHIRE (SPARX for High-Resolution Electron Microscopy) is a novel open-source, user-friendly software suite for the semi-automated processing of single particle electron cryo-microscopy (cryo-EM) data. The protocol presented here describes in detail how to obtain a near-atomic resolution structure starting from cryo-EM micrograph movies by guiding users through all steps of the single particle structure determination pipeline. These steps are controlled from the new SPHIRE graphical user interface and require minimum user intervention. Using this protocol, a 3.5 Å structure of TcdA1, a Tc toxin complex from Photorhabdus luminescens, was derived from only 9,500 single particles. This streamlined approach will help novice users without extensive processing experience or a priori structural information to obtain noise-free and unbiased atomic models of their purified macromolecular complexes in their native state.
After the development of the direct electron detector technology, the remarkable progress in single particle cryo-EM is currently reshaping structural biology 1. Compared with X-ray crystallography, this technique requires only a small amount of protein material without the need for crystallization, while simultaneously posing fewer restrictions regarding purity of the sample and still allowing determination of structures at near-atomic resolution. Importantly, different compositions or states can now be computationally separated and structure determination of the different conformations can be carried out at an unprecedented level of detail. Recently, density maps of challenging molecules could be produced at resolutions allowing de novo model building and thus a deep understanding of their mode of action 2,3,4,5.
A wide variety of image processing software packages are available in the 3DEM (3D Electron Microscopy) community (https://en.wikibooks.org/wiki/Software_Tools_For_Molecular_Microscopy) and most of them are under continuous development. Near-atomic resolution has been achieved for proteins exhibiting various molecular weights and symmetries with several different software packages, including EMAN2 6, IMAGIC 7, FREALIGN 8, RELION 9, SPIDER 10, and SPARX 11. Each package requires a different level of user expertise and provides a different level of user guidance, automation and extensibility. Moreover, whereas some programs provide complete environments to facilitate all steps of image analysis, others are designed to optimize specific tasks, such as the refinement of alignment parameters starting from a known reference structure. More recently, several platforms have been developed, including APPION 12 and SCIPION 13, that provide a single processing pipeline which integrates approaches and protocols from the different software packages listed above.
To contribute to the current development of cryo-EM, SPARX was re-developed into a new stand-alone and complete platform for single particle analysis, called SPHIRE (SPARX for High-Resolution Electron Microscopy). In order to increase accessibility of the technique for new researchers in the field and to cope with the large amount of data produced by modern fully-automated high-end electron microscopes, the processing pipeline was redesigned and simplified by introducing an easy-to-use Graphical User Interface (GUI) and automating the major steps of the workflow. Moreover, new algorithms were added to allow fast, reproducible and automated structure determination from cryo-EM images. Furthermore, validation by reproducibility was introduced in order to avoid common artifacts produced during refinement and heterogeneity analysis.
Although the program was extensively modified, its appreciated core features were maintained: straightforward open-source code, the modern object-oriented design and Python interfaces for all basic functions. Thus, it was not changed into a black box program, enabling users to study and easily modify the Python code, to create additional applications or modify the overall workflow. This is especially useful for non-standard cryo-EM projects.
Here we present a protocol for obtaining a near-atomic resolution density map from cryo-EM images using the GUI of SPHIRE. It describes in detail all steps required to generate a density map from raw cryo-EM direct detector movies and is not restricted to any particular macromolecule type. This protocol primarily intends to guide newcomers in the field through the workflow and provide important information about crucial steps of the processing as well as some of the possible pitfalls and obstacles. More advanced features and the theoretical background behind SPHIRE will be described elsewhere.
NOTE: To follow this protocol, it is necessary to properly install SPHIRE on a system with an MPI installation (currently, a Linux cluster). Download SPHIRE and the TcdA1 dataset from http://www.sphire.mpg.de and follow the installation instructions: http://sphire.mpg.de/wiki/doku.php?id=howto:download. This procedure also installs EMAN2. SPHIRE currently uses EMAN2’s e2boxer for particle selection and e2display for displaying image files. For dose-weighted motion correction of the raw micrograph movies, SPHIRE uses unblur 14. Download the program and follow the installation instructions (http://grigoriefflab.janelia.org/unblur, Grigorieff lab). For interactive visualization of the resulting structures, the protocol will use the molecular graphics program Chimera 15 (https://www.cgl.ucsf.edu/chimera/download.html). A nice tutorial to get familiar with the features used throughout this protocol can be found here: https://www.cgl.ucsf.edu/chimera/data/tutorials/eman07/chimera-eman-2007.html. Instructions on how to submit a parallel job to a cluster from the SPHIRE GUI can be found here: http://sphire.mpg.de/wiki/doku.php?id=howto:submissions. The overall organization of the SPHIRE GUI and the major steps of the workflow performed throughout this protocol are illustrated in Figure 1.
1. PROJECT: Set Constant Parameter Values for This Project
2. MOVIE: Align the Frames of Each Movie Micrograph to Correct the Overall Motion of the Sample
3. CTER: Estimate the Defocus and Astigmatism Parameters of the CTF
4. WINDOW: Extract Particles from the Dose-weighted Average Micrographs
5. ISAC: Classification of Particle Images in 2D
6. VIPER: Calculate an Initial 3D Model
7. MERIDIEN: Refine the Initial 3D Volume
8. SORT3D: Sort 3D Heterogeneity by Focusing on the Highly Variable Regions
9. LOCALRES: Estimate the Local Resolution of the Final 3D Volume
The protocol described above was executed starting from 112 direct detector movies of the A component of the Photorhabdus luminescens Tc complex (TcdA1) 20,21,22. This dataset was recorded on a Cs-corrected electron cryo-microscope with a high-brightness field emission gun (XFEG), operated at an acceleration voltage of 300 kV. The images were acquired automatically with a total dose of 60 e⁻/Å² at a pixel size of 1.14 Å on the specimen scale. After alignment of the movie frames (Protocol Step 2), the resulting motion-corrected averages had isotropic Thon rings extending to high resolution (Figure 2a). The individual particles were easily visible and well separated (Figure 2b). Particles were then picked using the swarm tool of e2boxer 18 (Protocol Step 4.1). In this case, an appropriate threshold was set using the more selective option (Figure 2c). The 112 digital micrographs yielded 9,652 particles. The majority of the extracted images (Protocol Step 4.2) contained well-defined particles and their box size was ~1.5 times larger than the particle size, as recommended (Figure 2d). Next, using ISAC, a 2D heterogeneity analysis was performed (Protocol Step 5). It yielded 98 class averages (Figure 3a). Using these 2D class averages, an ab initio model was calculated using VIPER (Protocol Step 6) at intermediate resolution (Figure 3b). This model shows excellent agreement with the crystal structure of TcdA1 previously solved at 3.9 Å resolution 22 (Figure 3c). This ab initio model was used as an initial template for the 3D refinement (MERIDIEN), yielding a 3.5 Å (0.143 criterion) reconstruction (Protocol Step 7) from only ~40,000 asymmetric units (Figure 4). This near-atomic resolution map was obtained within 24 h, using up to 96 CPUs for the steps of the workflow that benefit from multiple cores.
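The 0.143 criterion quoted above can be illustrated with a minimal Python sketch. This is not SPHIRE's implementation: the function name resolution_at_threshold and the FSC values are hypothetical, and the sketch assumes the Fourier shell correlation curve is available as paired lists of spatial frequency (in 1/Å, increasing) and FSC value. The resolution is taken as the reciprocal of the frequency at which the curve first drops below the threshold, using linear interpolation between the two bracketing samples.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Return the resolution (in Å) where the FSC first falls below threshold.

    freqs: spatial frequencies in 1/Å, strictly increasing.
    fsc:   FSC value at each frequency.
    Returns None if the curve never crosses below the threshold.
    """
    if len(freqs) != len(fsc):
        raise ValueError("freqs and fsc must have the same length")
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two bracketing samples.
            t = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            f_cross = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


# Hypothetical FSC curve (illustrative values only, not measured data).
example_freqs = [0.05, 0.10, 0.20, 0.25, 0.30, 0.35]
example_fsc = [1.00, 0.98, 0.80, 0.50, 0.10, 0.02]
print(resolution_at_threshold(example_freqs, example_fsc))
```

Returning None when the curve never crosses the threshold makes it explicit that, in such a case, the resolution is limited by the sampling rather than by the data.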
For the 3D variability analysis (Protocol step 8), only 2,000 particle images per group were used in step 8.3.3 (i.e. the process starts with 5 initial 3D groups) and 200 images for the smallest group size in step 8.3.4 due to the small number of particles (~10,000). The analysis revealed localized flexibility mainly at the N-terminal region of the complex that contains the His tag used for purification (Figure 5a). Indeed, twelve N-terminal residues and the His tag were not resolved in the previously published crystal structure of TcdA1 22 and this most probably disordered region remained unresolved in the present cryo-EM density, likely due to its flexibility. Additional variability was detected at the receptor-binding domains and the BC-binding domain (Figure 5a). Given the overall satisfactory resolution of the structure and the rather small size of the dataset, this heterogeneity was judged to be tolerable and a focused 3D classification 23 was therefore not performed. Finally, the local resolution of the final density map was computed (Protocol step 9.1, Figure 5b) and the sharpened 3D map was locally filtered (Protocol step 9.2). A volume of this quality can be used for de novo model building using Coot 24 or any other refinement tool (Figure 6).
Figure 1: Image Processing using SPHIRE. (a) The GUI of the SPHIRE software package. A specific step of the workflow can be activated by selecting the respective pictogram on the left side of the GUI ("workflow step"). The commands and utilities associated with this step of the workflow will appear in the central area of the GUI. After selecting one of the commands, the respective parameters are shown on the right area of the GUI. Advanced parameters usually do not require modification of the preset default values. (b) Stages in the workflow of single particle image processing using the SPHIRE GUI.
Figure 2: Motion Correction and Particle Extraction. (a, b) Typical high-quality, low-dose, drift-corrected digital micrograph recorded at a defocus of 1.7 µm. Note the isotropic Thon rings extending to a resolution of 2.7 Å in the power spectrum (a) and the well discernible particles in the 2D image (b). (c) Particle selection using e2boxer. Green circles indicate selected particles. (d) Typical raw particles extracted from the dose-weighted micrograph. Scale bars = 20 nm.
Figure 3: 2D Clustering and Initial Model Generation. (a) Gallery of 2D class averages, with the majority representing side views of the particle. Scale bar = 20 nm. (b) Ab initio 3D map of TcdA1 obtained using RVIPER from the reference-free class averages. (c) Rigid-body fitting of the TcdA1 crystal structure (ribbons) (pdb-id 1VW1) into the initial cryo-EM density (transparent gray).
Figure 4: Cryo-EM 3D Structure of TcdA1. (a, b) Final 3.5 Å density map of TcdA1 computed using ~9,500 particle images: (a) side and (b) top view. (c) Representative areas of the cryo-EM density for an α-helix and a β-sheet.
Figure 5: Variability Analysis and Local Resolution. (a) Surface of the sharpened TcdA1 cryo-EM map (gray) and the variability map (green). For better clarity, the variability map was low-pass filtered to 30 Å. (b) Surface rendering of the TcdA1 sharpened cryo-EM map colored according to local resolution (Å). Note the topological agreement between areas of high variability and low local resolution.
Figure 6: 3D Model Building of TcdA1 using Coot. Representative regions of the cryo-EM density and the atomic model are shown for an α-helix. The atomic model was built de novo using Coot.
Single particle cryo-EM has developed rapidly in recent years and delivered numerous atomic resolution structures of macromolecular complexes of major biological significance 25. To support the large number of novice users currently entering the field, we developed the single particle image analysis platform SPHIRE and present here a walk-through protocol for the entire workflow, including movie alignment, particle picking, CTF estimation, initial model calculation, 2D and 3D heterogeneity analysis, high-resolution 3D refinement, and local resolution estimation and filtering.
The protocol described here is intended as a short guide to 3D structure determination using cryo-EM micrographs of the protein of interest and with the help of computational tools provided by the stand-alone GUI of SPHIRE.
The main feature of the workflow is that most of the procedures need to be run only once, since they rely on the concept of validation by reproducibility 19 and do not require parameter tweaking. This automatic validation mechanism is a main advantage of SPHIRE over other software packages since the results tend to be objective as well as reproducible and, most importantly, obtainable at an acceptable computational cost. In addition, the pipeline provides a wealth of diagnostic information for experienced users to conduct further independent validation and assessment with their own methods. Nevertheless, a novice user who has at least an elementary theoretical background in structural biology and electron microscopy should be able to obtain near-atomic resolution structures using their own data and the automated validation procedures.
However, obtaining a near-atomic resolution structure is not always straightforward and the result will highly depend on the quality of the sample and the input data. For the procedures presented here, it is assumed that a sufficient number of high quality unaligned raw EM movies are available, with their averages showing clearly discernible, homogeneous and randomly oriented single particles. In general, there are no restrictions regarding symmetry, size or overall shape of the molecule, but a low molecular weight can be a limiting factor, especially when the protein has a featureless globular shape. Usually, analysis of larger, well-ordered particles with high point-group symmetry is less demanding. Therefore, it is strongly recommended that novice users first run the present protocol with a well-characterized cryo-EM dataset. Either the SPHIRE tutorial data (http://sphire.mpg.de) or one of the EMPIAR submitted datasets (https://www.ebi.ac.uk/pdbe/emdb/empiar/) with raw movies is a good starting point.
When processing their own data, users will very likely find that some datasets or some of the images do not satisfy certain quality criteria. In this context, in addition to the automated stability and reproducibility checks performed by the program for major steps of the workflow, it is still recommended that users visually inspect the results at certain "checkpoints" of the protocol, especially if the final reconstruction is not satisfactory.
The first visual inspection can be done at the micrograph level after the movie alignment (Protocol step 2) and the CTF estimation (Protocol step 3). The resulting motion-corrected averages should show clearly discernible and well-separated single particles and their power spectra should show clearly discernible, isotropic Thon rings. The spatial frequency to which they are visible defines, in most cases, the highest resolution to which the structure can in principle be ultimately determined. Examples of a motion-corrected average of sufficient quality and its power spectrum are shown in the section "Representative results". Outlier images that might have a negative impact on the final result can be removed with the help of SPHIRE's Drift and CTF assessment GUI tools (http://sphire.mpg.de/wiki/doku.php).
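Independently of how far the Thon rings extend, the sampling of the images sets a hard limit: by the Nyquist criterion, the finest detail that can be represented corresponds to twice the pixel size. The short Python sketch below applies this to the pixel size reported for the TcdA1 dataset and checks whether a given Thon-ring extent lies within that limit. The function names are hypothetical and introduced here only for illustration.

```python
def nyquist_resolution(pixel_size_angstrom):
    """Finest resolution (Å) representable at a given pixel size (Å/pixel)."""
    if pixel_size_angstrom <= 0:
        raise ValueError("pixel size must be positive")
    return 2.0 * pixel_size_angstrom


def within_nyquist(resolution_angstrom, pixel_size_angstrom):
    """True if the stated resolution is not finer than the Nyquist limit."""
    return resolution_angstrom >= nyquist_resolution(pixel_size_angstrom)


pixel_size = 1.14  # Å/pixel, as reported for the TcdA1 dataset
print(nyquist_resolution(pixel_size))
print(within_nyquist(2.7, pixel_size))  # Thon rings reported to 2.7 Å
```

A check of this kind is useful when micrographs are binned before processing, because binning increases the effective pixel size and therefore coarsens the Nyquist limit.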
With regard to particle screening, the crucial step in the SPHIRE pipeline is the 2D classification using ISAC (Protocol step 5.2). Here, the user should verify that the reproducible 2D class averages identified automatically by the program adopt a range of orientations sufficient to quasi-evenly cover the angular space. If the quality of the class averages is not satisfactory (noisy and/or blurry images) and/or the number of reproducible class averages is very low, consider improving the auto-picking quality, or optimizing dataset imaging or sample preparation. In most cases, it is not possible to calculate a reliable reconstruction from a dataset that does not yield good 2D class averages. Examples of high quality 2D class averages are shown in the section "Representative results".
At least 100 class averages are required to obtain a reliable initial 3D model using RVIPER in an automated manner (Protocol step 6.1). For this step, the user should select the averages with the highest quality and include as many different orientations of the particle as possible. The quality of the initial model is critical for the success of the subsequent high-resolution 3D refinement.
In other software packages, 3D classification is sometimes performed to remove "bad" particles 8,9. However, in SPHIRE most of these particles are automatically eliminated already during 2D classification using ISAC. Thus, it is recommended to perform the computationally intensive step of 3D sorting only if the reconstruction and the 3D variability analysis indicate heterogeneity of the dataset.
Most importantly, the user should always carefully inspect the resulting 3D volumes (Protocol step 9.3) and confirm that the features of the respective density agree well with the nominal resolution. At a resolution of <9 Å, rod-like densities corresponding to α-helices become visible. At a resolution of <4.5 Å, densities corresponding to strands in β-sheets are normally well separated and bulky amino acids become visible. A high-resolution map (<3 Å) should show clearly discernible side chains, thus allowing building of an accurate atomic model.
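These rules of thumb can be encoded as a simple lookup, for example as a checklist during visual inspection of a map. The Python sketch below is illustrative only: the function name expected_features is hypothetical, and the thresholds are exactly those quoted in the preceding paragraph.

```python
def expected_features(resolution_angstrom):
    """List the map features expected at a given nominal resolution (Å)."""
    if resolution_angstrom <= 0:
        raise ValueError("resolution must be positive")
    features = []
    if resolution_angstrom < 9.0:
        features.append("rod-like densities for alpha-helices")
    if resolution_angstrom < 4.5:
        features.append("separated beta-strands and bulky amino acids")
    if resolution_angstrom < 3.0:
        features.append("clearly discernible side chains")
    return features


# Nominal resolution reported for the TcdA1 map in this protocol.
for feature in expected_features(3.5):
    print(feature)
```

If a map with a nominal resolution of 3.5 Å does not show separated β-strands, the discrepancy suggests that the nominal resolution is overestimated and that the refinement should be re-examined.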
Results obtained to date demonstrate that, with the help of SPHIRE's automated reproducibility tests and minimal visual inspections, the present protocol is generally applicable to any type of single particle cryo-EM project. Representative results of each processing step are shown for the reconstruction of the TcdA1 toxin of Photorhabdus luminescens 21, which has been solved to near-atomic resolution. Density maps of similar quality can be used to construct reliable atomic models by de novo backbone tracing as well as reciprocal or real-space refinement, and thus provide a solid structural framework for the understanding of complex molecular mechanisms.
ACCESSION CODES:
The coordinates for the EM structure and the unprocessed movies have been deposited in the Electron Microscopy Data Bank and the Electron Microscopy Pilot Image Archive under accession numbers EMD-3645 and EMPIAR-10089, respectively.
The authors have nothing to disclose.
We thank D. Roderer for providing us with TcdA1 micrographs. We thank Steve Ludtke for his ongoing support of EMAN2 infrastructure. This work was supported by funds from the Max Planck Society (to S.R.), the European Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) (grant no. 615984) (to S.R.) and a grant from the National Institutes of Health, R01 GM60635 (to P.A.P.).
Name | Company | Comments
SPHIRE | Max Planck Institute of Molecular Physiology, Dortmund and Houston Medical School, Houston, Texas | http://sphire.mpg.de
UCSF Chimera | University of California, San Francisco | http://www.cgl.ucsf.edu/chimera/
Unblur | Janelia Farm Research Campus, Ashburn | http://grigoriefflab.janelia.org/unblur
Coot | MRC Laboratory of Molecular Biology, Cambridge | http://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/
EMAN2 | Baylor College of Medicine, Houston | http://blake.bcm.edu/emanwiki/EMAN2
Computing Cluster with 1824 cores | Max Planck Institute of Molecular Physiology | Linux Cluster with 76 nodes, each with 2 Processors Xeon E5-2670v3 12C 2.30 GHz and 128 GB RAM
TITAN KRIOS electron microscope | FEI | 300 kV, Cs correction, XFEG
Falcon II direct electron detector | FEI |
EPU (automated data acquisition software) | FEI | https://www.fei.com/software/epu/