This protocol describes the process of applying seven different automated segmentation tools to structural T1-weighted MRI scans to delineate grey matter regions that can be used for the quantification of grey matter volume.
Within neuroimaging research, a number of recent studies have discussed the impact of between-study differences in volumetric findings that are thought to result from the use of different segmentation tools to generate brain volumes. Here, processing pipelines for seven automated tools that can be used to segment grey matter within the brain are presented. The protocol provides an initial step for researchers aiming to find the most accurate method for generating grey matter volumes from T1-weighted MRI scans. Steps to undertake detailed visual quality control are also included in the manuscript. This protocol covers a range of potential segmentation tools and encourages users to compare the performance of these tools within a subset of their data before selecting one to apply to a full cohort. Furthermore, the protocol may be further generalized to the segmentation of other brain regions.
Neuroimaging is widely used in both clinical and research settings. There is a current move to improve the reproducibility of studies that quantify brain volume from magnetic resonance imaging (MRI) scans; thus, it is important that investigators share experiences of using available MRI tools for segmenting MRI scans into regional volumes, to improve the standardization and optimization of methods1. This protocol provides a step-by-step guide to using seven different tools to segment the cortical grey matter (CGM; grey matter which excludes subcortical regions) from T1-weighted MRI scans. These tools were previously used in a methodological comparison of segmentation methods2, which demonstrated variable performance between tools on a Huntington's disease cohort. Since the performance of these tools is thought to vary among different datasets, it is important for researchers to test a number of tools before selecting only one to apply to their dataset.
Grey matter (GM) volume is regularly used as a measure of brain morphology. Volumetric measures are generally reliable and able to discriminate between healthy controls and clinical groups3. The volume of different tissue types of brain regions is most often calculated using automated software tools that identify these tissue types. Thus, to create high quality delineations (segmentations) of the GM, accurate delineation of the white matter (WM) and cerebrospinal fluid (CSF) is critical in achieving accuracy of the GM region. There are a number of automated tools that may be used for performing GM segmentation, and each requires different processing steps and results in a different output. A number of studies have applied the tools to different datasets to compare them with one another, and some have optimized specific tools1,4,5,6,7,8,9,10,11. Previous work has demonstrated that variability between volumetric tools can result in inconsistencies within the literature when studying brain volume, and these differences have been suggested as driving factors for false conclusions being drawn about neurological conditions1.
Recently, a comparison of different segmentation tools in a cohort that included both healthy control participants and participants with Huntington's disease was performed. Huntington's disease is a genetic neurodegenerative disease with a typical onset in adulthood. Gradual atrophy of subcortical and CGM is a prominent and well-studied neuropathological feature of the disease. The results demonstrated variable performance of seven segmentation tools that were applied to the cohort, supporting previous work that demonstrated variability in findings depending on the software used to calculate brain volumes from MRI scans. This protocol provides information on the processing used in Johnson et al. (2017)2 that encourages careful methodological selection of the most appropriate tools for use in neuroimaging. This manual covers the segmentation of GM volume but does not cover the segmentation of lesions, such as those seen in multiple sclerosis.
NOTE: Ensure that all images are in NIfTI format. Conversion to NIfTI is not covered here.
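As a quick pre-flight check for the NIfTI requirement above, the following is a minimal sketch that inspects the file header rather than trusting the file extension. It assumes single-file NIfTI-1 images (a 348-byte header ending in the magic string "n+1" or "ni1"); NIfTI-2 files are not handled. The function names are hypothetical and are not part of any of the tools described in this protocol.

```python
import gzip
import struct

NIFTI1_HEADER_SIZE = 348  # size in bytes of a NIfTI-1 header


def has_nifti1_signature(header):
    """Return True if `header` (bytes) carries a NIfTI-1 header signature."""
    if len(header) < NIFTI1_HEADER_SIZE:
        return False
    # The first field, sizeof_hdr, is a 4-byte integer that must equal 348.
    # Both byte orders are accepted because files may be written either way.
    size_le = struct.unpack("<i", header[:4])[0]
    size_be = struct.unpack(">i", header[:4])[0]
    if NIFTI1_HEADER_SIZE not in (size_le, size_be):
        return False
    # The magic string occupies the last four bytes of the header.
    return bytes(header[344:348]) in (b"n+1\x00", b"ni1\x00")


def looks_like_nifti1(path):
    """Read the header of a .nii or .nii.gz file and check its signature."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        return has_nifti1_signature(handle.read(NIFTI1_HEADER_SIZE))
```

Calling looks_like_nifti1 on each input scan before processing flags files that were renamed to .nii without actually being converted.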
1. Segmentation via SPM 8: Unified Segment
NOTE: This procedure is performed via the SPM8 GUI which operates within Matlab. The SPM8 guide provides further detail and can be found at: http://www.fil.ion.ucl.ac.uk/spm/doc/spm8_manual.pdf.
2. Segmentation via SPM 8: New Segment
NOTE: This procedure is performed via the SPM8 GUI. The SPM8 guide provides further detail and can be found at: http://www.fil.ion.ucl.ac.uk/spm/doc/spm8_manual.pdf. Make sure that SPM8 is installed and set in the software path. Open the SPM software, which is typically done by typing "spm" at the command line. This opens a graphical user interface (GUI) window with a range of options that can be selected to perform analysis.
3. Segmentation via SPM 12: Segment
NOTE: This procedure is performed via the SPM12 GUI. The SPM12 guide provides further detail and can be found at: http://www.fil.ion.ucl.ac.uk/spm/doc/manual.pdf.
4. Segmentation via FSL FAST
NOTE: This procedure is done in the command line. The FSL guide provides further detail and can be found at: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki.
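As an illustration of what a command-line FAST run can look like, the sketch below builds (but does not execute) a brain extraction command followed by a three-class FAST command for one scan. The file names and the helper function are hypothetical, and the options shown (default BET settings, three tissue classes) are assumptions rather than the settings used in Johnson et al.2; check them against the FSL documentation before use.

```python
import shlex


def fast_commands(t1_path, out_base, n_classes=3):
    """Build a BET + FAST command pair for one T1 scan without running it."""
    brain = out_base + "_brain"
    # BET removes non-brain tissue; FAST expects a brain-extracted image.
    bet_cmd = ["bet", t1_path, brain]
    # -t 1 declares a T1-weighted input, -n sets the number of tissue
    # classes, and -o sets the basename used for the output files.
    fast_cmd = ["fast", "-t", "1", "-n", str(n_classes), "-o", out_base, brain]
    return [bet_cmd, fast_cmd]


if __name__ == "__main__":
    # Hypothetical file names, printed for inspection only.
    for cmd in fast_commands("sub01_T1.nii", "sub01"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check paths for a whole cohort before anything is run; each list can then be passed to subprocess.run(cmd, check=True). Because the quality of brain extraction strongly affects FAST output, the BET step is the one most likely to need tuning.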
5. Segmentation via FreeSurfer
NOTE: This procedure is done in the command line. The FreeSurfer guide provides further detail and can be found at: https://surfer.nmr.mgh.harvard.edu/.
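For orientation, a standard FreeSurfer run is started with the recon-all command. The sketch below only assembles that command for one subject; the subject ID, paths, and helper name are hypothetical, and the full default stream (-all) is an assumption rather than a statement of the exact options used in Johnson et al.2.

```python
import shlex


def recon_all_command(t1_path, subject_id, subjects_dir=None):
    """Build a recon-all command for one subject without running it."""
    # -i names the input T1 volume, -s names the subject, and -all requests
    # the complete default processing stream.
    cmd = ["recon-all", "-i", t1_path, "-s", subject_id, "-all"]
    if subjects_dir is not None:
        # -sd overrides the SUBJECTS_DIR environment variable.
        cmd += ["-sd", subjects_dir]
    return cmd


if __name__ == "__main__":
    # Hypothetical subject and paths, printed for inspection only.
    print(shlex.join(recon_all_command("sub01_T1.nii", "sub01", "/data/fs")))
```

A full recon-all run typically takes several hours per scan, so it is worth confirming the command on one subject before queuing a cohort.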
6. Segmentation via ANTs
NOTE: This procedure is done in the command line. ANTs is more complex than the other tools, and the procedure explained here could be further optimised for each cohort to improve the results. ANTs documentation can be found at: http://stnava.github.io/ANTsDoc/. There are two ways to segment the images into tissue classes as described below.
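To give a sense of the command-line form of an ANTs run, the sketch below assembles a brain extraction command followed by an Atropos-based three-class segmentation that uses the resulting brain mask. This is one possible arrangement under stated assumptions, not necessarily either of the two approaches referred to above: the template files, output prefix, helper name, and number of classes are all hypothetical, and the scripts shown (antsBrainExtraction.sh and antsAtroposN4.sh) should be checked against the ANTs documentation for the installed version.

```python
import shlex


def ants_commands(t1, template, template_prob_mask, out_prefix, n_classes=3):
    """Build brain extraction and Atropos commands without running them."""
    # Brain extraction needs a template (-e) and its brain probability
    # mask (-m); -d gives the image dimension and -a the input image.
    extract = [
        "antsBrainExtraction.sh", "-d", "3", "-a", t1,
        "-e", template, "-m", template_prob_mask, "-o", out_prefix,
    ]
    # antsBrainExtraction.sh writes its mask under this name.
    mask = out_prefix + "BrainExtractionMask.nii.gz"
    # Segmentation is restricted to the mask (-x); -c sets the class count.
    segment = [
        "antsAtroposN4.sh", "-d", "3", "-a", t1,
        "-x", mask, "-c", str(n_classes), "-o", out_prefix + "Seg",
    ]
    return [extract, segment]


if __name__ == "__main__":
    # Hypothetical file names, printed for inspection only.
    for cmd in ants_commands("sub01_T1.nii", "template.nii.gz",
                             "template_prob_mask.nii.gz", "sub01_"):
        print(shlex.join(cmd))
```

Because the mask passed with -x determines which voxels are segmented, a poor brain extraction propagates directly into the tissue maps; replacing the mask path with an edited mask is the simplest point of intervention.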
7. Segmentation via MALP-EM
8. Visual Quality Control
NOTE: Visual quality control should be performed on all segmented regions to be used in the analysis. Quality control ensures that the segmentations are of a high standard and represent reliable segmentation of the CGM. To perform quality control, each scan is opened and overlaid on the original T1 to compare the generated region to the CGM visible on the scan.
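The overlay step can be scripted so that every scan opens with the same display settings. The sketch below builds an FSLeyes command that loads the T1 as the base image and the GM region as a red-yellow overlay at 40% opacity, matching the display described in the figure legends. The use of FSLeyes here, the file names, and the helper name are assumptions; any viewer that supports overlays can be used instead.

```python
import shlex


def fsleyes_overlay_command(t1_path, gm_path, cmap="red-yellow",
                            alpha_percent=40):
    """Build an FSLeyes command showing a GM map overlaid on its T1."""
    if not 0 <= alpha_percent <= 100:
        raise ValueError("alpha_percent must be between 0 and 100")
    # Display options placed after a file apply to that file only, so the
    # colour map (-cm) and opacity (-a, in percent) affect the GM overlay.
    return ["fsleyes", t1_path, gm_path,
            "-cm", cmap, "-a", str(alpha_percent)]


if __name__ == "__main__":
    # Hypothetical file names, printed for inspection only.
    print(shlex.join(fsleyes_overlay_command("sub01_T1.nii", "sub01_GM.nii")))
```

Using identical colour and opacity settings for every scan makes it easier to apply QC criteria consistently across raters and tools.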
Average brain volumes for 20 control participants, along with demographic information, are shown in Table 1. These act as a guide for expected values when using these tools. Results should be viewed in the context of the original T1.nii image. All GM regions should be inspected as per the steps described in section 8. When performing visual QC, it is important to directly compare the GM regions to the T1 scan by viewing them overlaid on the T1.
Regions should be rejected for gross errors as shown in Figure 1. Sometimes these errors result if processing was run incorrectly, or if the brain was poorly positioned within the field of view. To correct these errors, the native T1 scans can be rigidly re-aligned to standard space and segmentation can be re-attempted. The rate of failures will vary depending on the quality of the data and the tools used, as well as on how failure is classified. In the current study, the rate of total failures resulting in rejection was < 5% for all tools, but less severe errors were consistently seen across a number of tools. FSL FAST, SPM 8 New Segment and FreeSurfer had errors (but not failures) in > 50% of scans for this cohort. This error rate was quantified by examining the notes taken during the visual QC process, with errors included if they were seen as a reasonable departure from the expected regions, as shown in Figures 2-6. It is important to note that these tools have been validated on other datasets and result in much lower error rates3,8. While these errors could possibly be improved via manual intervention or inclusion of a mask at brain extraction, since SPM New Segment and MALP-EM resulted in a lower error rate for this dataset, these tools would be used instead. Masks can be applied before processing within ANTs and MALP-EM, and after processing for SPM (all versions) and FSL FAST.
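Failure and error rates of the kind reported above can be tallied directly from structured QC notes. The sketch below is one way to do this, assuming each QC note has been reduced to a (tool, outcome) pair with outcome recorded as "pass", "error" (a visible departure that does not warrant rejection), or "fail" (rejection). The labels, function name, and example records are hypothetical and are not data from this study.

```python
from collections import Counter, defaultdict

VALID_OUTCOMES = ("pass", "error", "fail")


def qc_rates(records):
    """Return per-tool scan counts and error/failure percentages."""
    counts = defaultdict(Counter)
    for tool, outcome in records:
        if outcome not in VALID_OUTCOMES:
            raise ValueError(f"unknown QC outcome: {outcome!r}")
        counts[tool][outcome] += 1
    rates = {}
    for tool, tally in counts.items():
        total = sum(tally.values())
        rates[tool] = {
            "n": total,
            "error_pct": 100.0 * tally["error"] / total,
            "fail_pct": 100.0 * tally["fail"] / total,
        }
    return rates


if __name__ == "__main__":
    # Invented example records for illustration only.
    notes = [("tool_A", "pass"), ("tool_A", "error"),
             ("tool_A", "error"), ("tool_A", "fail"),
             ("tool_B", "pass"), ("tool_B", "pass")]
    for tool, summary in qc_rates(notes).items():
        print(tool, summary)
```

Recording QC outcomes with a fixed vocabulary, rather than free text alone, makes the rates reproducible and comparable between tools.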
More minor errors are shown in Figures 2-6. By testing different segmentation tools on a dataset before application to the whole cohort, the tool that performs best on that dataset can be selected for analysis. When performing QC, a procedure should be developed for choosing to reject, edit, or accept segmentations. Common errors seen for the seven tools are described here, with examples shown in Figures 2-6. Errors in segmentation such as these can often be corrected with the addition of a mask in the processing stream or editing the regions. However, regions with extensive over- or under-estimation of the cortex may need to be rejected from analysis. Strict criteria should be developed and followed when making this decision. These steps are not covered in this protocol and will vary from dataset to dataset.
Generally, when performing visual QC, it is important to pay particular attention to temporal and occipital regions, as these are areas that show the most consistent errors. Figure 2 shows examples of good and bad temporal segmentations, and Figure 3 shows examples of good and bad occipital segmentations. Figure 4 shows another common issue that occurs in all tools, in which non-brain tissue is classified as CGM in superior slices of the brain. Figure 5 displays another issue seen in a number of segmentations where regions of the CGM are excluded from the segmentation. This often occurs in superior slices of the brain, as seen in Figure 5.
SPM8 Unified Segment commonly resulted in poor temporal delineation, with the segmented GM region spilling into non-brain tissue surrounding the temporal lobes. Spillage into the occipital lobe is common, while under-estimation of the frontal lobes is also seen in a number of regions. For SPM8 New Segment, poor temporal delineation and occipital spillage were also common. Using this version of SPM also results in voxels within the skull and dura being classified as GM in nearly all segmentations. SPM12 was improved compared to earlier versions of SPM, with the temporal lobe segmentations improved and less spillage in other regions. ANTs showed highly variable performance on this cohort, with the initial brain extraction determining the quality of segmentation. It is important to pay particular attention to the external boundaries, and if brain extraction is poor using ANTs, then the brain mask included in the Atropos command can be improved. Issues with over-estimation of the GM in the temporal and occipital lobes were again common. MALP-EM showed fewer issues with over-estimation of the temporal and occipital lobes, although there was under-estimation of the cortex in a number of cases. This can be improved by inclusion of a brain mask in the pipeline. FSL FAST segmentations were highly variable, due to the variable performance of BET brain extraction on the data from this cohort. Again, issues within occipital and temporal lobes were common; however, these can be improved with optimization of brain extraction. Finally, FreeSurfer volumetric regions are often tight along the GM/CSF boundary, typically excluding some regions of GM in the outer boundary (Figure 6). As with other tools, spillage outside of the GM is prevalent within the temporal and occipital lobes. Figure 7 shows an example of a good segmentation, displayed in FSLview, that had no errors in segmentation.
Manual editing of the regions can often be performed to improve regions, although this is not covered here.
Figure 1: Example of a failed segmentation displayed on a T1 scan. This segmentation should be re-processed and excluded from analysis if it cannot be improved.
Figure 2: Examples of the performance of different tools on the temporal lobe on a T1 scan. (A) The T1 scan without a segmentation. (B) The T1 scan with an example of a good regional delineation (MALP-EM). (C) The T1 scan with an example of a good regional delineation (FreeSurfer). (D) The T1 scan with an example of a poor regional delineation, showing spillage in the left and right temporal lobes (SPM 8 New Segment). (E) The T1 scan with an example of a poor regional delineation, showing spillage in the left and right temporal lobes (FSL FAST). The scans are viewed in FSLeyes with the T1 scan as a base image and the GM region as an overlay. In this figure, the GM regions are viewed as red-yellow with an opacity of 0.4. The color gradient represents partial volume of voxels, with voxels that are more yellow having a higher PVE estimate (more likely to be GM) and those that are red having a lower PVE estimate (less likely to be GM).
Figure 3: Examples of the performance of different tools on the occipital lobe on a T1 scan. (A) The T1 scan without a segmentation. (B) The T1 scan with an example of a good regional delineation (MALP-EM). (C) The T1 scan with an example of a poor occipital lobe delineation with spillage into the dura in the medial section of the region (SPM 8 Unified Segment). (D) The T1 scan with an example of a poor occipital lobe delineation with spillage into the dura in the medial and superior sections of the region (SPM 8 New Segment). (E) The T1 scan with an example of a poor occipital lobe delineation with spillage into the dura in the medial and superior sections of the region (FSL FAST). The scans are viewed in FSLeyes with the T1 scan as a base image, and the GM region as an overlay. In this figure, the GM regions are viewed as red-yellow with an opacity of 0.4. The color gradient represents partial volume of voxels, with voxels that are more yellow having a higher PVE estimate (more likely to be GM) and those that are red having a lower PVE estimate (less likely to be GM).
Figure 4: Example of a GM region spilled into the dura, displayed in an FSLview window (in sagittal, coronal, and axial views). The blue region highlights spillage into the dura.
Figure 5: Example of a GM region that has excluded regions of the CGM from segmentation. This region is displayed in an FSLview window, in sagittal, coronal, and axial views. The axial view best shows the regions that have been excluded from segmentation.
Figure 6: Example of a FreeSurfer GM region that is very tight along the GM/CSF boundary, displayed in FreeView. The coronal window in the top left best displays the under-estimation in the CGM in this region.
Figure 7: Example of a well-delineated MALP-EM region on a T1 brain scan. The region shows no issues with over- or under-estimation of the CGM in any region.
Table 1: Demographic information and average GM volumes (mL) for 20 control participants from the TRACK-HD study, segmented using the seven tools described here.
Recently, research has demonstrated that the use of different volumetric methods may have important implications for neuroimaging studies1,2. By publishing protocols that help guide novice users in how to apply different neuroimaging tools, as well as how to perform QC on the results output by these tools, researchers may select the best method to apply to their dataset.
While most steps in this SOP can be adjusted to suit the data and researcher requirements, one of the most critical processes presented here is the detailed visual quality control. Visual QC should be performed on all segmentations output by these tools and is essential for the accurate measurement of CGM. The QC steps taken to ensure high-quality segmentations have been developed after the examination of thousands of CGM regions. By comparing different tools via visual examination, the most accurate method can be found for each dataset.
For each tool, there are different options that can be used to optimize segmentation on each dataset. It is often preferable to realign all scans to native space prior to segmentation, as this can reduce errors in segmentation; however, this is not essential. The regions output by each tool also differ, with some including only cortical GM and some also including subcortical regions. In addition, some tools output partial volume estimates (PVE) and some output discrete tissue maps. While volume extraction is not covered here, and discussion of the difference between PVE and discrete tissue maps is beyond the scope of this standard operating procedure (SOP), PVE maps are generally accepted as a more reliable measure12. This SOP provides information on the processing used in Johnson et al. (2017)2 to segment and QC the scans; however, there may be more appropriate selections for other users depending on the quality of their images, and further processing such as the application of masks to limit regions to cortical GM may be required. All segmentations can be performed in native space.
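Although volume extraction is outside the scope of this SOP, a small worked example may help show why PVE and discrete maps can give different volumes for the same region. The sketch below is purely illustrative: it treats a map as a flat list of voxel values, takes the voxel dimensions in mm as given, and uses a hypothetical threshold of 0.5 to mimic a discrete map. It is not the volume extraction method used in Johnson et al.2.

```python
def volume_ml(values, voxel_dims_mm, threshold=None):
    """Return the volume in mL represented by a list of voxel values.

    With threshold=None, each voxel contributes its PVE fraction (0 to 1).
    With a threshold, each voxel at or above it counts as one whole voxel,
    which mimics a discrete tissue map.
    """
    dx, dy, dz = voxel_dims_mm
    voxel_ml = (dx * dy * dz) / 1000.0  # 1 mL = 1000 cubic millimetres
    if threshold is None:
        return sum(values) * voxel_ml
    return sum(1 for v in values if v >= threshold) * voxel_ml


if __name__ == "__main__":
    # Invented PVE values for four voxels of 1 mm isotropic resolution.
    pve = [1.0, 0.5, 0.25, 0.0]
    print("PVE-weighted volume (mL):", volume_ml(pve, (1.0, 1.0, 1.0)))
    print("Discrete volume (mL):", volume_ml(pve, (1.0, 1.0, 1.0), 0.5))
```

In this invented example the PVE-weighted sum counts 1.75 voxels, whereas thresholding counts 2 whole voxels; at the cortical boundary, where many voxels contain mixed tissue, such differences accumulate.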
This protocol provides example pipelines for seven different methods that can be used to segment the CGM from T1 MRI scans. These examples largely follow the default pipelines that are recommended for each software, and it is important to note that further optimization of these pipelines may be necessary for the successful segmentation of a region on different scans. Some tools, such as MALP-EM, have limited options and are likely better for users who are new to neuroimaging. Other tools, including ANTs, can undergo detailed optimization, and the protocol presented here represents one possible application of this software. Additional options, such as the use of masks to limit calculation of the volumes, are also possible for most tools.
It is important to note that not all tools can be used on every operating system. SPM and ANTs are compatible with Windows, Mac, and Linux systems, FSL is compatible with Mac and Linux systems, and MALP-EM and FreeSurfer are compatible with Linux systems (or a Linux virtual machine running on a Windows/Mac PC).
This protocol covers the steps that can be used to perform segmentation and quality control (QC) on 3D T1-weighted MRI scans to generate CGM regions. However, the protocol assumes that images are 3D T1 images in NIfTI format (.nii extension). In the analysis performed by Johnson et al.2, images were already bias-corrected using the N3 procedure13. This protocol also assumes that the software has been downloaded and installed on a Linux machine as per the instructions provided by each tool. The software packages compared here include SPM814, SPM12, FSL15, FreeSurfer16,17, ANTs18, and MALP-EM19.
This SOP covers a range of segmentation techniques; however, there are other options available for segmenting structural T1 scans. These methods were selected for previous comparison by Johnson et al.2 based on their frequency of use within Huntington's disease research. However, every tool performs differently in each dataset, and segmentation tools not covered here may be appropriate for other datasets and research groups.
These tools are widely used within neuroimaging research. As software updates are created for these tools, it is likely that the output of each segmentation method will undergo significant changes over time. However, emphasis should remain on the process of visual QC to ensure that high-quality segmentations are used in neuroimaging studies.
The authors have nothing to disclose.
We wish to thank all those at the CHDI/High Q Foundation responsible for the TRACK-HD study; in particular, Beth Borowsky, Allan Tobin, Daniel van Kammen, Ethan Signer, and Sherry Lifer. The authors also wish to extend their gratitude to the TRACK-HD study participants and their families. This work was undertaken at UCLH/UCL, which received a proportion of funding from the Department of Health's National Institute for Health Research Biomedical Research Centres funding scheme. S.J.T. acknowledges support of the National Institute for Health Research through the Dementias and Neurodegenerative Research Network, DeNDRoN.
TRACK-HD Investigators:
C. Campbell, M. Campbell, I. Labuschagne, C. Milchman, J. Stout, Monash University, Melbourne, VIC, Australia; A. Coleman, R. Dar Santos, J. Decolongon, B. R. Leavitt, A. Sturrock, University of British Columbia, Vancouver, BC, Canada; A. Durr, C. Jauffret, D. Justo, S. Lehericy, C. Marelli, K. Nigaud, R. Valabrègue, ICM Institute, Paris, France; N. Bechtel, S. Bohlen, R. Reilmann, University of Münster, Münster, Germany; B. Landwehrmeyer, University of Ulm, Ulm, Germany; S. J. A. van den Bogaard, E. M. Dumas, J. van der Grond, E. P. 't Hart, R. A. Roos, Leiden University Medical Center, Leiden, Netherlands; N. Arran, J. Callaghan, D. Craufurd, C. Stopford, University of Manchester, Manchester, United Kingdom; D. M. Cash, IXICO, London, United Kingdom; H. Crawford, N. C. Fox, S. Gregory, G. Owen, N. Z. Hobbs, N. Lahiri, I. Malone, J. Read, M. J. Say, D. Whitehead, E. Wild, University College London, London, United Kingdom; C. Frost, R. Jones, London School of Hygiene and Tropical Medicine, London, United Kingdom; E. Axelson, H. J. Johnson, D. Langbehn, University of Iowa, IA, United States; and S. Queller, C. Campbell, Indiana University, IN, United States.