Single-particle analysis in cryo-electron microscopy is one of the main techniques used to determine the structure of biological ensembles at high resolution. Scipion provides the tools to create the whole pipeline to process the information acquired by the microscope and achieve a 3D reconstruction of the biological specimen.
Cryo-electron microscopy has become one of the most important tools in biological research to reveal the structural information of macromolecules at near-atomic resolution. In single-particle analysis, the vitrified sample is imaged by an electron beam and the detectors at the end of the microscope column produce movies of that sample. These movies contain thousands of images of identical particles in random orientations. The data need to go through an image processing workflow with multiple steps to obtain the final 3D reconstructed volume. The goal of the image processing workflow is to identify the acquisition parameters in order to reconstruct the specimen under study. Scipion provides all the tools to create this workflow using several image processing packages in an integrative framework, also allowing the traceability of the results. This article presents and discusses the whole image processing workflow in Scipion with data from a real test case, giving all the details necessary to go from the movies obtained by the microscope to a final high-resolution 3D reconstruction. It also discusses the power of consensus tools, which combine methods and confirm results at every step of the workflow, improving the accuracy of the obtained results.
In cryo-electron microscopy (cryo-EM), single particle analysis (SPA) of vitrified frozen-hydrated specimens is one of the most widely used and successful imaging variants for biological macromolecules, as it allows us to understand molecular interactions and the function of biological ensembles¹. This is thanks to the recent advances in this imaging technique that gave rise to the “resolution revolution”² and have allowed the successful determination of biological 3D structures at near-atomic resolution. To date, the highest resolution achieved in SPA cryo-EM is 1.15 Å for apoferritin³ (EMDB entry: 11668). These technological advances comprise improvements in sample preparation⁴, image acquisition⁵, and image processing methods⁶. This article focuses on this last point.
Briefly, the goal of the image processing methods is to identify all the acquisition parameters in order to invert the imaging process of the microscope and recover the 3D structure of the biological specimen under study. These parameters are the gain of the camera, the beam-induced movement, the aberrations of the microscope (mainly the defocus), the 3D angular orientation and translation of each particle, and the conformational state when the specimen undergoes conformational changes. However, the number of parameters is very high, and cryo-EM requires low-dose images to avoid radiation damage, which significantly reduces the signal-to-noise ratio (SNR) of the acquired images. Thus, the problem cannot be solved unequivocally and all the parameters can only be estimated. Along the image processing workflow, the correct parameters should be identified and the remaining ones discarded to finally obtain a high-resolution 3D reconstruction.
The data generated by the microscope are gathered in frames. Put simply, when electron-counting detectors are used, a frame contains the number of electrons that have arrived at each position (pixel) in the image. For a particular field of view, several frames are collected, and this set is called a movie. As low electron doses are used to avoid radiation damage that could destroy the sample, the SNR is very low and the frames of the same movie need to be averaged to obtain an image revealing structural information about the sample. However, a simple average is not enough: during the imaging time the sample can suffer shifts and other kinds of beam-induced movement that need to be compensated. The shift-compensated and averaged frames yield a micrograph.
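The idea of compensating shifts before averaging can be illustrated with a minimal sketch. It assumes a single global, integer-pixel translation per frame, estimated by phase correlation against the first frame; this is a simplification for illustration and not the algorithm used by any of the packages integrated in Scipion, which also handle sub-pixel and local motion. The function names `estimate_shift` and `align_and_average` are hypothetical.

```python
import numpy as np


def estimate_shift(reference, frame):
    """Return the integer (dy, dx) roll that maps `frame` back onto `reference`."""
    cross_power = np.fft.fft2(reference) * np.conj(np.fft.fft2(frame))
    # Keep only the phase so that the correlation peak is sharp.
    cross_power /= np.abs(cross_power) + 1e-12
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks beyond half the image size correspond to negative shifts.
    return tuple(
        int(p) if p <= size // 2 else int(p) - size
        for p, size in zip(peak, correlation.shape)
    )


def align_and_average(frames):
    """Align every frame to the first one and average them into a micrograph."""
    reference = frames[0]
    aligned = [
        np.roll(frame, estimate_shift(reference, frame), axis=(0, 1))
        for frame in frames
    ]
    return np.mean(aligned, axis=0)
```

Note that `np.roll` applies a circular shift, which is acceptable for a toy example but would wrap image borders in real data.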
Once the micrographs are obtained, we need to estimate, for each of them, the aberrations introduced by the microscope. These are described by the Contrast Transfer Function (CTF), which represents the changes in the contrast of the micrograph as a function of frequency. Then, the particles can be selected and extracted, which is called particle picking. Every particle should be a small image containing only one copy of the specimen under study. There are three families of algorithms for particle picking: 1) those that only use some basic parameterization of the appearance of the particle to find particles in the whole set of micrographs (e.g., particle size), 2) those that learn what the particles look like from the user or a pretrained set, and 3) those that use image templates. Each family has different properties that will be shown later.
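As an illustration of how contrast varies with frequency, the following sketch evaluates a commonly used one-dimensional CTF model that depends on defocus, spherical aberration, and amplitude contrast. Sign conventions differ between packages, so this should be read as one possible convention and not as the exact model of Ctffind or Xmipp. The default values (300 kV, Cs of 2.0 mm, amplitude contrast of 0.1) and the function names are assumptions chosen for the example.

```python
import numpy as np


def electron_wavelength(voltage_kv):
    """Relativistic electron wavelength in Å for an accelerating voltage in kV."""
    volts = voltage_kv * 1e3
    return 12.2643 / np.sqrt(volts + 0.97845e-6 * volts**2)


def ctf_1d(freqs, defocus_a, voltage_kv=300.0, cs_mm=2.0, amp_contrast=0.1):
    """Evaluate a simple CTF model at spatial frequencies `freqs` (in 1/Å).

    `defocus_a` is the defocus in Å (positive for underfocus in this convention).
    """
    lam = electron_wavelength(voltage_kv)
    cs_a = cs_mm * 1e7  # millimetres to Å
    # Phase shift introduced by defocus and spherical aberration.
    chi = np.pi * lam * defocus_a * freqs**2 - 0.5 * np.pi * cs_a * lam**3 * freqs**4
    return -(np.sqrt(1.0 - amp_contrast**2) * np.sin(chi) + amp_contrast * np.cos(chi))


# Example: CTF up to the Nyquist frequency for a pixel size of 1.34 Å.
freqs = np.linspace(0.0, 1.0 / (2 * 1.34), 200)
curve = ctf_1d(freqs, defocus_a=20000.0)
```

The oscillations of `curve` between positive and negative values are the contrast inversions that CTF estimation has to locate for every micrograph.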
The extracted set of particles found in the micrographs will be used in a 2D classification process that has two goals: 1) cleaning the set of particles by discarding the subset containing pure noise images, overlapping particles, or other artifacts, and 2) providing class averages that can be used as initial information to calculate an initial 3D volume.
The 3D initial volume calculation is the next crucial step. The problem of obtaining the 3D structure can be seen as an optimization problem in a multidimensional solution landscape, where the global minimum is the best 3D volume that represents the original structure, but where several local minima representing suboptimal solutions exist and it is very easy to get trapped in them. The initial volume represents the starting point for the search, so a bad initial volume estimation could prevent us from finding the global minimum. From the initial volume, a 3D classification step will help to discover different conformational states and to clean the set of particles again; the goal is to obtain a structurally homogeneous population of particles. After that, a 3D refinement step will refine the angular and translation parameters of every particle to get the best possible 3D volume.
Finally, in the last steps, the obtained 3D reconstruction can be sharpened and polished. Sharpening is the process of boosting the high frequencies of the reconstructed volume, and polishing is a step that further refines some parameters, such as the CTF or the beam-induced movement compensation, at the level of particles. Also, some validation procedures can be used to better understand the resolution achieved at the end of the workflow.
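To make the notion of "boosting the high frequencies" concrete, the sketch below applies a global B-factor weighting in Fourier space, a classical form of sharpening. This is an assumption made for illustration: it is not the local method used later in this workflow, and the function name `sharpen` and the default B-factor of -50 Å² are hypothetical choices.

```python
import numpy as np


def sharpen(volume, pixel_size, b_factor=-50.0):
    """Weight Fourier amplitudes by exp(-B * s^2 / 4), with s in 1/Å.

    A negative B-factor amplifies high spatial frequencies; B = 0 leaves the
    volume unchanged.
    """
    freqs = [np.fft.fftfreq(n, d=pixel_size) for n in volume.shape]
    fz, fy, fx = np.meshgrid(*freqs, indexing="ij")
    s2 = fx**2 + fy**2 + fz**2
    weights = np.exp(-b_factor * s2 / 4.0)
    return np.fft.ifftn(np.fft.fftn(volume) * weights).real
```

In practice such a weighting is usually combined with a low-pass filter at the estimated resolution, since frequencies beyond that limit contain mostly noise that would otherwise be amplified as well.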
After all these steps, the tracing and docking processes⁷ will help to give a biological meaning to the obtained 3D reconstruction, by building atomic models de novo or fitting existing models. If high resolution is achieved, these processes will reveal the positions of the biological structures, even of individual atoms, in our structure.
Scipion⁸ allows creating the whole workflow combining the most relevant image processing packages in an integrative way. Xmipp⁹, Relion¹⁰, CryoSPARC¹¹, Eman¹², Spider¹³, Cryolo¹⁴, Ctffind¹⁵, CCP4¹⁶, Phenix¹⁷, and many more packages can be included in Scipion. Also, it incorporates all the necessary tools to support integration, interoperability, traceability, and reproducibility, allowing full tracking of the entire image-processing workflow⁸.
One of the most powerful tools that Scipion offers is consensus, which means comparing the results obtained with several methods in one step of the processing and combining the information conveyed by the different methods to generate a more accurate output. This can help to boost the performance and improve the quality of the estimated parameters. Note that a simpler workflow can be built without consensus methods; however, we have seen the power of this tool²²,²⁵ and the workflow presented in this manuscript will use it in several steps.
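A very simple form of consensus can be sketched for particle coordinates: keep only the positions on which two independent pickers agree within a given radius. This is a deliberately minimal illustration under that assumption; the consensus tools used in this workflow are more elaborate (for instance, they may rely on a neural network to score candidates). The function name `consensus_coordinates` and the example coordinates are hypothetical.

```python
import numpy as np


def consensus_coordinates(coords_a, coords_b, radius):
    """Return the averaged positions of picks found by both methods.

    A pick from method A is kept when its nearest pick from method B lies
    within `radius` pixels.
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    if len(coords_a) == 0 or len(coords_b) == 0:
        return np.empty((0, 2))
    # Pairwise Euclidean distances between all picks of A and all picks of B.
    dists = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)
    agreed = dists[np.arange(len(coords_a)), nearest] <= radius
    return 0.5 * (coords_a[agreed] + coords_b[nearest[agreed]])


# Example with made-up coordinates from two hypothetical pickers.
picks_a = [(100, 100), (300, 300), (500, 50)]
picks_b = [(104, 97), (302, 305), (900, 900)]
agreed_picks = consensus_coordinates(picks_a, picks_b, radius=10)
```

Here only the first two picks survive, because the third pick of each method has no counterpart within 10 pixels in the other set.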
All the steps summarized in the previous paragraphs will be explained in detail in the following section and combined in a complete workflow using Scipion. How to use the consensus tools to achieve a higher agreement in the generated outputs will also be shown. To that end, the example dataset of the Plasmodium falciparum 80S Ribosome has been chosen (EMPIAR entry: 10028, EMDB entry: 2660). The dataset consists of 600 movies of 16 frames of size 4096×4096 pixels at a pixel size of 1.34 Å, taken on an FEI POLARA 300 microscope with an FEI FALCON II camera. The resolution reported at EMDB is 3.2 Å¹⁸.
1. Creating a project in Scipion and importing the data
2. Movie alignment: from movies to micrographs
3. CTF estimation: calculating the aberrations of the microscope
4. Particle picking: finding particles in the micrographs
5. 2D classification: grouping similar particles together
6. Initial volume estimation: building the first guess of the 3D volume
7. 3D classification: discovering conformational states
8. 3D refinement: refining angular assignments of a homogeneous population
9. Evaluation and post-processing
We have used the dataset of the Plasmodium falciparum 80S Ribosome (EMPIAR entry: 10028, EMDB entry: 2660) to conduct the test. With the Scipion protocol presented in the previous section, a high-resolution 3D reconstructed volume of the macromolecule has been achieved in this particular example, starting from the information gathered by the microscope, which consists of very noisy images containing 2D projections of the specimen in any orientation.
The main results obtained after running the whole protocol are presented in Figure 10, Figure 11, and Figure 12. Figure 10 represents the obtained 3D volume before post-processing. In Figure 10a, the FSC curve indicates a resolution of 3 Å, which is very close to the Nyquist limit (with a pixel size of 1.34 Å, the Nyquist limit is 2.68 Å). Figure 10b shows some slices of the reconstructed 3D volume with a high level of detail and well-defined structures. Figure 11 presents the results of the local resolution analysis of the obtained 3D volume. Most of the voxels in the structure achieve a resolution below 3 Å, mainly those located in the central part of the structure. However, the outer part shows worse resolutions, which is consistent with the blurring that appears in those areas in the slices of Figure 10b. Figure 12 shows the same 3D map after post-processing, which highlights the higher frequencies of the volume, revealing more details and improving the representation; this can be seen especially in the 3D representation in Figure 12c.
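The relationship between the FSC curve, the reported resolution, and the Nyquist limit can be sketched as follows. The code assumes two cubic half-maps of equal size and uses a threshold of 0.143, which is a common choice but an assumption here, since the threshold is not stated above. The function names are hypothetical, and real implementations typically add masking and other corrections.

```python
import numpy as np


def fsc_curve(half1, half2):
    """Fourier Shell Correlation between two cubic half-maps, per integer shell."""
    f1 = np.fft.fftshift(np.fft.fftn(half1))
    f2 = np.fft.fftshift(np.fft.fftn(half2))
    n = half1.shape[0]
    grid = np.indices(half1.shape) - n // 2
    shells = np.sqrt((grid**2).sum(axis=0)).round().astype(int).ravel()
    cross = np.bincount(shells, weights=(f1 * np.conj(f2)).real.ravel())
    power1 = np.bincount(shells, weights=(np.abs(f1) ** 2).ravel())
    power2 = np.bincount(shells, weights=(np.abs(f2) ** 2).ravel())
    last = n // 2  # shells beyond Nyquist are incomplete and are dropped
    return cross[: last + 1] / np.sqrt(power1[: last + 1] * power2[: last + 1])


def nyquist_limit(pixel_size):
    """Best resolution (in Å) representable at a given pixel size (in Å)."""
    return 2.0 * pixel_size


def resolution_at_threshold(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (in Å) of the first shell whose FSC falls below `threshold`."""
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0:
        return nyquist_limit(pixel_size)
    shell = max(int(below[0]), 1)
    return box_size * pixel_size / shell
```

For the pixel size of this dataset, `nyquist_limit(1.34)` gives 2.68 Å, which is the hard limit that no FSC-based estimate can surpass without changing the sampling.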
In Figure 14, Chimera²⁶ was used to see a 3D representation of the obtained volume (Figure 14a), the post-processed volume (Figure 14b), and the resolution map (Figure 14c), colored with the color code of the local resolutions. This can give even more information about the obtained structure. This tool is very useful to gain insight into the quality of the obtained volume, as very small details can be seen in the whole 3D context of the structure. When the achieved resolution is sufficient, even some biochemical parts of the structure can be found (e.g., alpha-helices in Figure 14d). In this figure, the high resolution achieved in all the central parts of the 3D structure should be highlighted; it can be seen as the dark blue areas in Figure 14c.
All the previous results were achieved thanks to a good performance of the whole protocol, but this might not always be the case. There are several ways to identify bad behavior. In the most general case, it shows up as a structure with low resolution that is not able to evolve to a better one. One example of this is presented in Figure 15. A blurred volume (Figure 15c) results in an FSC curve that falls at low resolution (Figure 15a) and in a poor histogram of the local resolution estimation (Figure 15b). This example was generated using a 3D refinement method with incorrect input data, as the method expected some specific properties in the input set of particles that the particles do not fulfill. As can be seen, it is always very important to know how the different methods expect to receive the data and to prepare the data properly. In general, when an output like the one in Figure 15 is obtained, there might be a problem in the processing workflow or the underlying data.
There are several checkpoints along the workflow that can be analyzed to know whether the protocol is evolving properly. For example, right after picking, several of the methods discussed earlier can rank the particles and give a score to each of them. When bad particles are present, these methods allow us to identify and remove them. Also, the 2D classification can be a good indicator of a bad set of particles. Figure 16 shows an example of such a bad set. Figure 16a shows good classes containing some details of the structure, while Figure 16b shows bad classes, which are noisy or uncentered; in the latter case, it can be seen that the picking was incorrect and two particles seem to appear together. Another checkpoint is the initial volume estimation. Figure 17 shows an example of good (Figure 17a) and bad (Figure 17b) initial estimations. The bad estimation was created using an incorrect setup for the method. It must be taken into account that all the setups should be done carefully, choosing every parameter appropriately according to the data being analyzed. Without a map containing some minimal structural information, the subsequent refinement will be unable to obtain a good reconstruction.
When the problem is a bad acquisition, in which the movies do not preserve structural information, it will be impossible to extract good particles from them and process them successfully. In that case, more movies should be collected to get a high-resolution 3D reconstruction. If this is not the case, there are several ways to manage problems along the processing workflow. If the picking is not good enough, there are several ways to try to fix it, e.g., repeating the picking, using different methods, or manually picking more particles to help the methods learn from them. During the 2D classification, if just a few classes are good, consider repeating the picking process as well. In the initial volume estimation, try to use several methods if some of them give inaccurate results. The same applies to the 3D refinement. Following this reasoning, several consensus tools have been presented in this manuscript, which can be very useful to avoid problems and continue the processing with accurate data. By using a consensus among several methods, we can discard data that are difficult to pick, classify, align, etc., which is probably an indicator of poor data. Conversely, if several methods agree on the generated output, these data probably contain valuable information with which to continue processing.
We encourage the reader to download more datasets, to try to process them following the recommendations presented in this manuscript, and to create a similar workflow combining processing packages using Scipion. Trying to process a dataset is the best way to learn the power of the state-of-the-art processing tools available in cryo-EM, to know the best rules to overcome the possible drawbacks appearing during the processing, and to boost the performance of the available methods in each specific test case.
Figure 1. Movie alignment result. (a) The main window of the results, with a list of all the micrographs generated and additional information: the power spectral density, the trajectory of the estimated alignment in polar coordinates, the same in cartesian coordinates, the filename of the generated micrograph. (b) The alignment trajectory represented in cartesian coordinates. (c) The generated micrograph.
Figure 2. CTF estimation with Ctffind result. The main window with the results includes a figure with the estimated PSD (in a corner) along with the PSD coming from the data, and several defocus parameters.
Figure 3. Manual picking windows with Xmipp. (a) The main window with the list of micrographs to process and some other parameters. (b) Manually picking particles inside a region of a micrograph. (c) and (d) Automatically picked particles to be supervised to create a set of training particles for the Xmipp auto picking method.
Figure 4. Deep consensus picking with Xmipp result. The parameter zScoreDeepLearning gives weight to the goodness of a particle and it is key to discovering bad particles. (a) The lowest zScore values are associated with artifacts. (b) The highest zScore values are associated with particles containing the macromolecule.
Figure 5. 2D classification with CryoSPARC result. The classes generated (averages of subsets of particles coming from the same orientation) are shown. Several good classes (with some level of detail) are selected in red, and some bad classes (noisy and uncentered) are left unselected.
Figure 6. 3D initial volume with swarm consensus result. A view of the 3D initial volume obtained after running the consensus tool xmipp3 – swarm consensus, using the previous 3D initial volume estimations of Xmipp and Relion. (a) The volume is represented by slices. (b) 3D visualization of the volume.
Figure 7. Refinement of a 3D initial volume with Relion result. (a) FSC curve obtained, crossing the threshold at approximately 4.5 Å. (b) Angular coverage shown as the upper view of the 3D sphere. In this case, as there is no symmetry, the assigned particles should cover the whole sphere. (c) Refined volume represented by slices.
Figure 8. 3D alignment based on deep learning with Xmipp result. The results generated by xmipp3 – deep align method for 3D alignment. (a) The angular assignment for every particle in the form of transformation matrix. (b) The angular coverage.
Figure 9. 3D alignment consensus result. (a) List of particles with the obtained differences in shift and angles parameters. (b) Plot of the angular differences per particle. (c) Plot of the shift difference per particle.
Figure 10. Final iteration of 3D refinement result. (a) FSC curve. (b) Obtained volume at full resolution by slices.
Figure 11. Local resolution analysis with Xmipp result. Results of the method xmipp3 – local MonoRes. (a) Some representative slices colored with the resolution value per voxel, as indicated in the color code. (b) Local resolution histogram.
Figure 12. Sharpening with Xmipp result. Results of xmipp3 – localdeblur sharpening method. (a) List of obtained volumes per iteration. (b) 3D volume obtained after the last iteration represented by slices. (c) A 3D representation of the final volume.
Figure 13. Validate overfitting tool in Xmipp result. Results of xmipp3 – validation overfitting. The green line corresponds to reconstruction from data, the red line from noise. (a) Inverse of the squared resolution with the logarithm of the number of particles. (b) Resolution with the number of particles.
Figure 14. Several 3D representations of the obtained volume. (a) Pre-processed volume. (b) Post-processed volume. (c) Local resolution: dark blue voxels are those with higher resolution (2.75 Å) and dark red voxels are those with lower resolution (10.05 Å). (d) Zoom into the post-processed volume, where an alpha-helix (red oval) can be seen.
Figure 15. Example of a bad 3D reconstruction. (a) FSC curve with a sharp fall and crossing the threshold at low resolution. (b) Local resolution histogram. (c) 3D volume by slices.
Figure 16. Example of 2D classes. (a) Good classes showing some level of detail. (b) Bad classes containing noise and artifacts (upper part obtained with Xmipp, lower with CryoSPARC).
Figure 17. Example of 3D initial volume with different qualities. (a) Good initial volume where the shape of the macromolecule can be observed. (b) Bad initial volume where the obtained shape is completely different from the expected one.
Supplemental Figure 1. Creating a Scipion project. Window displayed by Scipion where an old project can be selected or a new one can be created giving a name and a location for that project.
Supplemental Figure 2. Import movies method. Window displayed by Scipion when pwem – import movies is open. Here, the main acquisition parameters must be included to make the movies available for processing in Scipion.
Supplemental Figure 3. Movie alignment method. Window displayed by Scipion when xmipp3 – optical alignment is used. The input movies, the range of frames considered for alignment, and some other parameters to process the movies should be filled.
Supplemental Figure 4. CTF estimation method with Ctffind. The form in Scipion with all the necessary fields to run the program Ctffind.
Supplemental Figure 5. Wizard in Scipion. A wizard to help the user fill in some parameters in the form. In this case, the wizard completes the resolution field in the grigoriefflab – ctffind method.
Supplemental Figure 6. CTF refinement method with Xmipp. The form of xmipp3 – ctf estimation with all the parameters to make a refinement of a previously estimated CTF.
Supplemental Figure 7. Preprocess micrographs method. The form of xmipp3 – preprocess micrographs, which allows carrying out some operations on them. In this example, Remove bad pixels and Downsample micrographs are the useful ones.
Supplemental Figure 8. Picking method with Cryolo. The form to run the Cryolo picking method using a pretrained network.
Supplemental Figure 9. Consensus picking method with Xmipp. The form of xmipp3 – deep consensus picking based on deep learning to calculate a consensus of coordinates, using a pretrained network over several sets of coordinates obtained with different picking methods.
Supplemental Figure 10. Extract particles method. Input and preprocess tabs of xmipp3 – extract particles.
Supplemental Figure 11. 3D initial volume method with Xmipp. The form of the method xmipp3 – reconstruct significant to obtain an initial 3D map. The Input and Criteria tabs are shown.
Supplemental Figure 12. Resize volume method. The form to make a crop or resize of a volume. In this example, this method is used to generate a full size volume after xmipp3 – reconstruct significant.
Supplemental Figure 13. 3D initial volume with Relion result. A view of the obtained 3D initial volume with relion – 3D initial model method by slices.
Supplemental Figure 14. Refinement of the initial volume with Relion. The form of the method relion – 3D auto-refine. In this example, it was used to refine an initial volume estimated after consensus. The Input and Reference 3D map tabs are shown.
Supplemental Figure 15. 3D classification method. Form of relion – 3D classification. The tabs Input, Reference 3D map, and Optimisation are shown.
Supplemental Figure 16. 3D alignment based on a deep learning method. The form opened for the method xmipp3 – deep align. Here it is necessary to train a network with a training set, then that network will predict the angular assignment per particle.
Supplemental Figure 17. 3D refinement method. Form of the xmipp3 – highres method. Tabs Input and Angular assignment are shown.
Supplemental Figure 18. First iteration of 3D refinement result. (a) FSC curve. (b) Obtained volume (of a smaller size than the full resolution) represented as slices.
Supplemental Figure 19. First iteration of 3D refinement correlation analysis. A new window appears by clicking on the bar icon in the upper part of the window with the list of particles. In Plot columns window a histogram of the desired estimated parameter can be created.
Supplemental Figure 20. Validation overfitting tool. Form of xmipp3 – validate overfitting method.
Currently, cryo-EM is a key tool to reveal the 3D structure of biological samples. When good data are collected with the microscope, the available processing tools allow us to obtain a 3D reconstruction of the macromolecule under study. Cryo-EM data processing is able to achieve near-atomic resolution, which is key to understanding the functional behavior of a macromolecule and is also crucial in drug discovery.
Scipion is software that allows creating the whole workflow combining the most relevant image processing packages in an integrative way, which helps the traceability and reproducibility of the entire image-processing workflow. Scipion provides a very complete set of tools to carry out the processing; however, obtaining high-resolution reconstructions depends completely on the quality of the acquired data and on how these data are processed.
To get a high-resolution 3D reconstruction, the first requirement is to obtain good movies from the microscope, which preserve structural information to high resolution. If this is not the case, the workflow will not be able to extract high-definition information from the data. Then, a successful processing workflow should be able to extract particles that really correspond to the structure and to find the orientations of these particles in 3D space. If any of the steps in the workflow fails, the quality of the reconstructed volume will be degraded. Scipion allows using different packages in any of the processing steps, which helps to find the most adequate approach to process the data. Moreover, because many packages are available, consensus tools can be used; these boost accuracy by finding an agreement among the outputs estimated by different methods. The Representative Results section also discussed in detail several validation tools, how to identify accurate and inaccurate results at every step of the workflow in order to detect potential problems, and how to try to solve them. There are several checkpoints along the protocol that can help to determine whether the protocol is running properly. Some of the most relevant are picking, 2D classification, initial volume estimation, and 3D alignment. Checking the inputs, repeating the step with a different method, and using consensus are options available in Scipion that the user can apply to find solutions when issues appear.
Regarding previous approaches to package integration in the cryo-EM field, Appion³¹ is the only one that allows real integration of different software packages. However, Appion is tightly connected with Leginon³², a system for automated collection of images from electron microscopes. The main difference is that, in Scipion, the data model and storage are less coupled. As a result, to create a new protocol in Scipion, only a Python script needs to be developed, whereas in Appion the developer must write the script and also change the underlying database. In summary, Scipion was developed to simplify maintenance and extensibility.
We have presented in this manuscript a complete workflow for Cryo-EM processing, using the real case dataset of the Plasmodium falciparum 80S Ribosome (EMPIAR entry: 10028, EMDB entry: 2660). The steps covered and discussed here can be summarized as movie alignment, CTF estimation, particle picking, 2D classification, initial map estimation, 3D classification, 3D refinement, evaluation, and post-processing. Different packages have been used and consensus tools were applied in several of these steps. The final 3D reconstructed volume achieved a resolution of 3 Å and, in the post-processed volume, some secondary structures can be distinguished, like alpha-helices, which helps to describe how atoms are arranged in space.
The workflow presented in this manuscript shows how Scipion can be used to combine different cryo-EM packages in a straightforward and integrative way to simplify the processing and, at the same time, obtain more reliable results.
In the future, the development of new methods and packages will keep growing, and software like Scipion that easily integrates all of them will be even more important for researchers. Consensus approaches will be even more relevant then, when plenty of methods with different bases will be available, helping to obtain more accurate estimations of all the parameters involved in the reconstruction process in cryo-EM. Tracking and reproducibility are key in the research process and are easier to achieve with Scipion thanks to its common framework for the execution of complete workflows.
The authors have nothing to disclose.
The authors would like to acknowledge financial support from: The Spanish Ministry of Science and Innovation through Grants: PID2019-104757RB-I00/AEI/10.13039/501100011033, the "Comunidad Autónoma de Madrid" through Grant: S2017/BMD-3817, Instituto de Salud Carlos III, PT17/0009/0010 (ISCIII-SGEFI/ERDF), European Union (EU) and Horizon 2020 through grant: INSTRUCT – ULTRA (INFRADEV-03-2016-2017, Proposal: 731005), EOSC Life (INFRAEOSC-04-2018, Proposal: 824087), iNEXT – Discovery (Proposal: 871037), and HighResCells (ERC – 2018 – SyG, Proposal: 810057). The project that gave rise to these results received the support of a fellowship from "la Caixa" Foundation (ID 100010434). The fellowship code is LCF/BQ/DI18/11660021. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 713673. The authors acknowledge the support and the use of resources of Instruct, a Landmark ESFRI project.