An advanced particle selection method for cryo-EM, namely CryoSieve, improves density map resolution by removing a majority of particles in final stacks, as demonstrated through its application on a real-world dataset.
Over the past decade, advancements in technology and methodology within the field of cryogenic electron microscopy (cryo-EM) single-particle analysis (SPA) have substantially improved our capacity for high-resolution structural examination of biological macromolecules. This advancement has ushered in a new era of molecular insights, replacing X-ray crystallography as the dominant method and providing answers to longstanding questions in biology. Since cryo-EM does not depend on crystallization, which is a significant limitation of X-ray crystallography, it captures particles of varying quality. Consequently, the selection of particles is crucial, as the quality of the selected particles directly influences the resolution of the reconstructed density map. An innovative iterative approach for particle selection, termed CryoSieve, significantly improves the quality of reconstructed density maps by effectively reducing the number of particles in the final stack. Experimental evidence shows that this method can eliminate the majority of particles in final stacks, resulting in a notable enhancement in the quality of density maps. This article outlines the detailed workflow of this approach and showcases its application on a real-world dataset.
Cryogenic electron microscopy (cryo-EM) single-particle analysis (SPA) has become a dominant method to determine high-resolution three-dimensional density maps of biological macromolecules. Owing to a series of technological innovations1,2,3,4,5,6, collectively termed the resolution revolution7, cryo-EM can determine the structures of biological macromolecules at up to atomic resolution and at an unprecedented rate. This breakthrough marks the beginning of a new era in molecular insights, overtaking X-ray crystallography as the predominant technique and answering longstanding biological questions.
Cryo-EM SPA diverges from X-ray crystallography by not requiring the crystallization of biological macromolecules. Instead, a solution containing the target biological macromolecules is rapidly frozen in vitreous ice and then imaged with an electron beam to produce a series of micrographs8. Subsequently, particle-picking algorithms are used to extract individual raw particles from these micrographs4,9,10,11,12. Because cryo-EM does not depend on crystallization, many of the extracted particles are damaged or in undesired conformational states, necessitating multiple rounds of particle selection to achieve a high-resolution density map. In cryo-EM SPA image processing, particle selection is therefore crucial for obtaining high-resolution density maps13.
In cryo-EM SPA, standard particle selection methods include two-dimensional (2D) and three-dimensional (3D) classification14. 2D classification categorizes particles into a predefined number of groups, yielding an average image and an estimated 2D resolution for each class. Researchers can then visually inspect these classes, removing particles from lower resolution groups to use the remaining ones in reconstructions aimed at achieving higher resolution. Once particle poses are established using refinement algorithms, researchers will proceed with 3D classification, clustering particles into multiple classes. This enables visual inspection of the reconstructed density map for each class, allowing for the exclusion of undesirable particles, such as those from undesired conformations. Following multiple rounds of classification, a final stack comprising relatively high-quality particles is obtained. These final stacks are instrumental in producing atomic or near-atomic resolution density maps.
Zhu and her colleagues have demonstrated that further particle selection can be conducted on these final stacks15. CryoSieve15, an innovative iterative method for particle selection, can be applied to enhance the quality of the final density map by significantly reducing the number of particles. While other particle sorting criteria and software, such as the normalized cross-correlation (NCC) method16, the angular graph consistency (AGC) approach17, and non-alignment classification5, are currently in use within the field, this method has been shown to outperform these algorithms in terms of effectiveness.
In this study, we present a detailed guide to the entire process. As a case study, we applied this new method to the dataset of the influenza hemagglutinin trimer (EMPIAR entry: 10097)18, which includes 130,000 particles in its final stack. Our procedure discarded about 73.8% of the particles from the final stack of this dataset, improving the resolution of the reconstructed density map from 4.11 Å to 3.62 Å. In addition to the influenza hemagglutinin trimer, results from multiple datasets, covering a variety of resolutions and molecular weights, are presented in an earlier publication15.
1. Installation
2. Particle sieving
3. Finding the optimal iteration
In this protocol, we utilized the influenza hemagglutinin trimer dataset (EMPIAR entry: 10097) as a demonstration of the efficacy of this process. Due to the preferred orientation of the sample, data acquisition required tilting at 40°. The protein exhibits C3 symmetry and has a molecular weight of 150 kDa.
We applied the protocol described above to the final particle stack. CryoSieve progressively removed 20% of the remaining particles in each iteration, giving retention ratios of 80.0%, 64.0%, 51.2%, and so on. As depicted in Figure 1 and Figure 2, the resolution of the retained particles initially improved but eventually deteriorated. The 6th iteration was identified as the optimal subset, containing the fewest particles while achieving the highest resolution. The algorithm thus identified a subset of particles comprising only 26.2% of the original stack, improving the resolution from 4.19 Å to 3.62 Å (re-estimated by CryoSPARC), as shown in Figure 2. Density maps before and after using CryoSieve are compared in Figure 3. The model-to-map Fourier Shell Correlation (FSC) curves and half-map FSC curves of the reconstructed density maps before and after applying the method are also shown (Figure 3A-B). Raw density maps and sharp density maps were compared at equivalent contour levels (Figure 3C). The side chains in the sharp density maps were also compared, showing the improvement in the reconstructed density maps. The estimated Rosenthal-Henderson B-factor was also adopted as a criterion of particle quality19. After removing the majority of particles in the final stack, the Rosenthal-Henderson B-factor decreased from 226.9 Å² to 146.2 Å² (Figure 3D). Local resolution, local B-factor20, and ResLog21 were also used for comparison, indicating that CryoSieve indeed enhances the quality of both the density maps and the particles (Figure 4).
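The retention ratios quoted above follow a simple geometric schedule, and the choice of the optimal iteration can be expressed as "the latest iteration that still attains the best resolution". The following is a minimal sketch of that arithmetic only, not of CryoSieve itself. The function names, the default of eight iterations, and the tie-breaking rule are assumptions made for illustration; the per-iteration resolutions must come from the user's own refinement jobs.

```python
def retention_schedule(total_particles, removal_fraction=0.2, n_iterations=8):
    """Return (iteration, retention_ratio, approx_particle_count) tuples.

    Each iteration removes `removal_fraction` of the particles that remain,
    so the retention ratio after iteration k is (1 - removal_fraction) ** k.
    n_iterations=8 is an arbitrary illustrative default.
    """
    schedule = []
    for k in range(1, n_iterations + 1):
        ratio = (1.0 - removal_fraction) ** k
        schedule.append((k, ratio, round(total_particles * ratio)))
    return schedule


def pick_optimal_iteration(resolutions_angstrom):
    """Return the 1-based iteration with the best (smallest) resolution.

    resolutions_angstrom[i] is the resolution in angstroms after iteration
    i + 1. Ties are broken in favour of the later iteration, because it
    retains fewer particles (an assumed rule for this sketch).
    """
    best = min(resolutions_angstrom)
    last_index = max(i for i, r in enumerate(resolutions_angstrom) if r == best)
    return last_index + 1


# Schedule for a final stack of 130,000 particles with 20% removed per iteration.
for k, ratio, count in retention_schedule(130_000):
    print(f"iteration {k}: {ratio:6.1%} retained (~{count} particles)")
```

With these settings, iteration 6 corresponds to a retention ratio of 0.8^6, or about 26.2% of the original stack, which matches the subset reported for this dataset.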
Figure 1: Resolutions of each iteration. Resolutions that have been reported are highlighted in red boxes.
Figure 2: Resolutions of each iteration. Resolutions identified by homogeneous refinement jobs are highlighted in red boxes.
Figure 3: Density maps. (A) Comparison of the model-to-map FSC curves of the reconstructed density maps before and after using CryoSieve. The y-axis represents FSC, while the x-axis represents resolution. The red dashed line marks the FSC threshold of 0.5. The vertical dashed lines indicate the resolution of the density maps at the 0.5 threshold. (B) Half-map FSC curves of the reconstructed density maps before and after using CryoSieve, obtained via CryoSPARC. The y-axis represents FSC, while the x-axis represents resolution. (C) Raw density maps and sharp density maps are shown for both the CryoSieve-retained particles and the complete set of particles in the final stack. An equivalent contour level of 0.65 was applied to the raw density maps, and an equivalent contour level of 0.84 was applied to the sharp density maps. The sharp density maps were obtained directly from CryoSPARC. They were auto-postprocessed: first FSC-weighted (based on the FSCs given by CryoSPARC) and then sharpened using the auto-determined B-factors (232.0 Å² for all particles in the final stack and 160.8 Å² for CryoSieve). The side chains in the sharp density maps are compared, with atomic models shown for reference. Red arrows highlight the improved regions. (D) The estimated Rosenthal-Henderson B-factor is shown for both the CryoSieve-retained particles and the complete set of particles in the final stack. The y-axis represents the number of particles used, and the x-axis represents the reciprocal of the square of the resolution. Moving from top to bottom, each point represents half the particles of the previous one. The resolutions were determined by refinement. B-factors were determined using a least-squares approximation of the measured points, as shown by the fitting curves. The estimated Rosenthal-Henderson B-factors are indicated in the legends: orange represents particles retained by CryoSieve, while blue denotes all particles in the final stack.
Figure 4: Comparison of various metrics of density maps. (A) Comparison of local resolution maps before and after using CryoSieve, obtained by CryoSPARC. The local resolution ranges between 7 Å (red) and 3.5 Å (blue). (B) Comparison of density maps before and after using CryoSieve, colored with the local B-factor map obtained by LocBFactor using a resolution range of 20-3.5 Å. (C) Comparison of ResLog plots before and after using CryoSieve, obtained by CryoSPARC.
Supplementary Figure 1: Using commands nvidia-smi and conda -V to verify the prerequisites. If the prerequisites are met, typing the command nvidia-smi will display the GPU driver version, the CUDA version, and the status of the GPU cards. Similarly, entering the command conda -V should correctly display the installed version of Conda.
Supplementary Figure 2: The process of creating new GPU-acceleration environments. The screen displays the output generated by the command used to create the Conda environment.
Supplementary Figure 3: Installation of CryoSieve in the GPU-acceleration environment. After activating the newly created Conda environment, the screen displays the output resulting from executing the command to install CryoSieve using Pip.
Supplementary Figure 4: Help information.
Supplementary Figure 5: Running process. Upon executing CryoSieve through the command line, the screen then displays information regarding the running process.
Supplementary Figure 6: The configuration of CryoSPARC's jobs. (A) Import particle stack. (B) Import 3D volumes. (C-D) Homogeneous refinement.
Supplementary File 1: Options of CryoSieve.
Supplementary File 2: Processing time and minimal requirements for running CryoSieve.
Supplementary File 3: Generation of initial model by CryoSPARC.
Supplementary File 4: Rationale for disabling force re-do GS split.
Supplementary File 5: Options of cryosieve-csrefine.
Supplementary File 6: Options of cryosieve-csrhbfactor.
Cryo-EM stands as a pivotal technique for elucidating the structures of biological molecules. After data collection on the microscope, particles are extracted from the micrographs and then classified in multiple stages to compile the final stack. A common challenge is the predominance of particles that are damaged or in undesired conformations, which calls for repeated particle selection to attain high-resolution density maps. This makes particle selection a critical step in cryo-EM SPA for achieving high-quality density maps. Existing particle selection techniques include the statistical non-tilt validation algorithm22, the z-score-based approach23, and the angular accuracy estimation method24.
CryoSieve emerges as a valuable tool in this context, adept at eliminating a significant number of extraneous particles from the final stack. This reduction not only enhances the reconstruction's computational efficiency but also streamlines the process. It offers a comprehensive suite for particle selection, where the extent of particle discard and the consequent improvement in resolution largely hinge on the initial data quality and the methodologies employed in data processing.
In this manuscript, we have presented a complete workflow of particle sieving using the real-world dataset of the influenza hemagglutinin trimer (EMPIAR entry: 10097). The steps covered and discussed here can be summarized as particle sieving and pose re-estimation. The final 3D reconstructed volume achieved a resolution of 3.62 Å, and side chains in alpha-helices were clearer in the post-processed volume than in the published density map.
CryoSieve is an open-source method which is available on GitHub (https://github.com/mxhulab/cryosieve). A detailed tutorial can also be found on its homepage. Users can install and use it by following the tutorial. Additionally, two modules, cryosieve-csrefine and cryosieve-csrhbfactor, are provided. The cryosieve-csrefine module is specifically crafted to automate the sequential execution of various operations within CryoSPARC (Supplementary File 5). These operations include importing particle stacks and conducting ab initio, homogeneous refinement, or non-uniform refinement jobs. On the other hand, the cryosieve-csrhbfactor module is designed to automate the determination of the Rosenthal-Henderson B-factor by leveraging the capabilities of cryosieve-csrefine (Supplementary File 6).
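To make the quantity estimated by cryosieve-csrhbfactor more concrete, the sketch below fits the Rosenthal-Henderson relation ln(N) = (B/2)(1/d²) + c by ordinary least squares, where N is the number of particles and d is the resolution in Å, so that B is twice the fitted slope. This is an illustrative reimplementation under stated assumptions, not the module's actual code. The function name is hypothetical, and the input in the example is synthetic data generated from a known B-factor rather than values measured on EMPIAR-10097.

```python
import math


def rosenthal_henderson_bfactor(particle_counts, resolutions_angstrom):
    """Estimate the Rosenthal-Henderson B-factor (in square angstroms).

    Fits ln(N) against 1/d^2 with an ordinary least-squares line and
    returns twice the slope. Hypothetical helper for illustration only.
    """
    if len(particle_counts) != len(resolutions_angstrom) or len(particle_counts) < 2:
        raise ValueError("need at least two (particle count, resolution) pairs")

    xs = [1.0 / d ** 2 for d in resolutions_angstrom]  # 1/d^2
    ys = [math.log(n) for n in particle_counts]        # ln(N)

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("resolutions must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx  # slope equals B / 2
    return 2.0 * slope


# Synthetic check: generate particle counts from a known B-factor,
# then confirm that the fit recovers it. These are NOT measured values.
true_b = 150.0
resolutions = [3.6, 3.8, 4.0, 4.3]
counts = [math.exp(true_b / 2.0 / d ** 2 + math.log(1000.0)) for d in resolutions]
print(round(rosenthal_henderson_bfactor(counts, resolutions), 1))
```

In practice, the (particle count, resolution) pairs would come from refinements of successively halved particle subsets, as in Figure 3D. A lower fitted B-factor indicates that fewer particles are needed to reach a given resolution.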
Presently, this method's application is confined to single-conformation scenarios. Consequently, when particles represent multiple conformations, its capabilities are limited. Users are advised to first perform 3D classification to separate particles of different conformations before employing CryoSieve for refined particle selection. Moreover, although the method can filter out over 50% of particles from the final stack, the origins of these discarded particles and the underlying reasons for their negligible contribution to reconstruction quality remain unclear. This gap in understanding calls for additional research.
Three existing methods of particle sorting or particle sieving are available for comparison. First, cisTEM4 reports a score for each particle image after 3D refinement, and users can sort particles by this score to decide which to discard. Second, the angular graph consistency (AGC) approach17 discards misaligned particles. Third, non-alignment classification5 is a traditional way to discard particles using 3D classification. We compared the quality of the particles retained by these methods with those retained by CryoSieve and found that the particles retained by CryoSieve are of higher quality15. The method presented here significantly outperforms the alternatives and reaches a given resolution with the smallest number of particles.
As demonstrated in the results, the majority of particles in a cryo-EM final stack do not contribute to density map reconstruction. In other words, among all particles gathered during image acquisition, only a select few, namely the finest subset, actually contribute to the final reconstruction. Consequently, the ratio of this final subset to the total number of collected particles could serve as a quantitative metric for assessing sample quality. The higher this ratio, the better the sample quality. Despite technical advancements that have made cryo-EM more accessible to structural biologists, sample preparation remains a major bottleneck in the workflow. Scientists and engineers are thus focusing their efforts on this challenge25. In single-particle analysis (SPA), sample preparation consists of two crucial steps: sample optimization and grid preparation. The former involves purifying the specimen while maintaining its optimal biochemical state. The latter entails preparing the sample for analysis in the microscope, including chemical or plasma treatment of the grid, sample deposition, and vitrification. Numerous techniques have been proposed to address macromolecular instability, but the efficacy of one approach over another depends on the sample's characteristics25,26. Currently, grid preparation results are heavily influenced by the user's expertise and experience, which can make the process time-consuming and challenging27,28. The numerous variables encountered in sample and grid preparation make it difficult to establish cause-and-effect relationships, as researchers can only assess the sample at the molecular level using the microscope. As a result, quantitative statistics from comparisons of different sample and grid preparation protocols are still lacking, and a systematic approach is necessary to investigate trends and understand the fundamental mechanisms of sample behavior29.
The authors have nothing to disclose.
This work was supported by Shenzhen Academy of Research and Translation (to M.H.), the Advanced Innovation Center for Structural Biology (to M.H.), the Beijing Frontier Research Center for Biological Structure (to M.H.), the National Key R&D Program of China (No.2021YFA1001300) (to C.B.), the National Natural Science Foundation of China (No.12271291) (to C.B.), and the National Natural Science Foundation of China (No.12071244) (to Z.S.).
CryoSPARC | Structura Biotechnology Inc. Toronto, Canada | CryoSPARC (Cryo-EM Single Particle Ab-Initio Reconstruction and Classification) is a state of the art HPC software solution for complete processing of single-particle cryo-electron microscopy (cryo-EM) data. CryoSPARC is useful for solving cryo-EM structures of membrane proteins, viruses, complexes, flexible molecules, small particles, phase plate data and negative stain data. | |
EMPIAR-10097 Dataset | https://ftp.ebi.ac.uk/empiar/world_availability/10097/data/Particle-Stack/T40_HA_130K-Equalized-Particle-Stack.mrcs | This dataset comprises single-particle cryo-EM data of the Influenza Hemagglutinin trimer, characterized by its highly preferred orientation, collected using a 40-degree tilted collection strategy. | |
initial.mrc | https://github.com/mxhulab/cryosieve-demos/tree/master/EMPIAR-10097 | ||
mask.mrc | https://github.com/mxhulab/cryosieve-demos/tree/master/EMPIAR-10097 | ||
RELION | 4.0-beta-2 | RELION (REgularised LIkelihood OptimisatioN) is an open-source software for cryo-electron microscopy (cryo-EM) data processing, particularly for refining macromolecular structures. Utilizing a Bayesian approach, it excels in separating signal from noise, enabling high-resolution structure determination. RELION supports single-particle analysis, tomography, and sub-tomogram averaging, and has become widely used in structural biology due to its effectiveness and user-friendly interface. | |
T40_HA_130K-Equalized_run-data_CryoSPARC_refined.star | https://github.com/mxhulab/cryosieve-demos/tree/master/EMPIAR-10097 | Metadata file for the final stack of particles from EMPIAR-10097 |