The current article describes the basics of multivariate analysis and contrasts it to the more commonly used voxel-wise univariate analysis. Both types of analysis are applied to a clinical-neuroscience data set. Supplementary split-half simulations show better replication of the multivariate results in independent data sets.
Representative Results
Univariate performance. The results can be seen in detail in Figure 3. The area of the largest AD-related FDG deficit was found in the superior temporal gyrus, Brodmann area 38. The area under the ROC curve in the derivation sample was AUC = 0.90. This contrast generalized well to the replication sample, with an area under the ROC curve of AUC = 0.84.
Multivariate performance. The results can be seen in detail in Figure 4. Areas with positive loadings, hinting at a relative preservation of signal in the face of disease, were found in the cerebellum, while associated signal loss was found in the parietotemporal and frontal areas and in the posterior cingulate gyrus. The areas under the ROC curves in the derivation and replication samples were slightly better than those of the univariate marker, at 0.96 and 0.88, respectively.
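The areas under the ROC curve quoted above can be computed from any one-dimensional marker, whether a single-voxel FDG value or a subject factor score. The following is a minimal sketch, not the authors' code: it uses the standard rank interpretation of the AUC (the probability that a randomly chosen patient scores higher than a randomly chosen control). The function name `roc_auc` and the example numbers are hypothetical.

```python
import numpy as np

def roc_auc(scores_patients, scores_controls):
    """AUC = P(patient score > control score); ties count as one half."""
    p = np.asarray(scores_patients, dtype=float)[:, None]
    c = np.asarray(scores_controls, dtype=float)[None, :]
    # Compare every patient with every control and average the outcomes.
    return float(np.mean((p > c) + 0.5 * (p == c)))

# Hypothetical marker values, for illustration only.
patients = [2.1, 1.4, 0.9, 1.8]
controls = [0.2, 1.0, -0.3, 0.5]
print(roc_auc(patients, controls))  # 15 of 16 pairs are ordered correctly: 0.9375
```

For a marker that is lower in patients, such as the FDG signal at a deficit location, the scores would be negated before calling the function.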
Split-sample simulations. The results can be seen in detail in Figure 5. The figure shows that the multivariate marker gives better replication of diagnostic performance than the univariate marker. The mean total error rate for the multivariate marker is 0.203, whereas for the univariate marker it is 0.307.
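The logic of such a split-sample simulation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used for Figure 5: the data are synthetic, the decision threshold is assumed to be the midpoint between the two group means in the derivation half, and the total error rate is assumed to be the fraction of misclassified subjects in the replication half. All function names are hypothetical, and only a simple univariate marker is shown; a multivariate marker would be passed in through the same `fit_marker` argument.

```python
import numpy as np

def fit_univariate_marker(Y, is_patient):
    """Pick the voxel with the largest patient deficit; return a scoring function."""
    deficit = Y[~is_patient].mean(axis=0) - Y[is_patient].mean(axis=0)
    voxel = int(np.argmax(deficit))
    # Negate the signal so that higher scores point towards the patient group.
    return lambda data: -data[:, voxel]

def total_error_rate(score_fn, Y_der, lab_der, Y_rep, lab_rep):
    """Fix a threshold in the derivation half, count errors in the replication half."""
    s_der = score_fn(Y_der)
    # Assumed decision rule: midpoint between the derivation group means.
    threshold = 0.5 * (s_der[lab_der].mean() + s_der[~lab_der].mean())
    predicted = score_fn(Y_rep) > threshold
    return float(np.mean(predicted != lab_rep))

def split_sample_simulation(Y, is_patient, fit_marker, n_splits=1000, seed=0):
    """Return mean and standard deviation of the replication error over random splits."""
    rng = np.random.default_rng(seed)
    patients = np.flatnonzero(is_patient)
    controls = np.flatnonzero(~is_patient)
    errors = np.empty(n_splits)
    for i in range(n_splits):
        # Split each group in half at random so both halves stay balanced.
        p = rng.permutation(patients)
        c = rng.permutation(controls)
        der = np.concatenate([p[: len(p) // 2], c[: len(c) // 2]])
        rep = np.concatenate([p[len(p) // 2:], c[len(c) // 2:]])
        score_fn = fit_marker(Y[der], is_patient[der])
        errors[i] = total_error_rate(
            score_fn, Y[der], is_patient[der], Y[rep], is_patient[rep]
        )
    return errors.mean(), errors.std()

# Synthetic example: 40 patients, 40 controls, 200 voxels, deficit in 10 voxels.
rng = np.random.default_rng(1)
is_patient = np.array([True] * 40 + [False] * 40)
Y = rng.standard_normal((80, 200))
Y[is_patient, :10] -= 2.0

mean_err, sd_err = split_sample_simulation(
    Y, is_patient, fit_univariate_marker, n_splits=200
)
print(f"replication error: mean={mean_err:.3f}, sd={sd_err:.3f}")
```

The key design point is that the marker and its threshold are derived only from the derivation half, so the error measured in the replication half reflects generalization rather than fit.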
Figure 1. This simple figure describes the difference between univariate and multivariate analytic strategies, using a hypothetical 3-dimensional data set. On the left side, there is no correlation between the 3 variables plotted. On the right side, in contrast, one can see a major source of variance indicating a positive correlation between all three voxels. A univariate analysis that just considered mean values on a voxel-by-voxel basis could not tell any difference between these two scenarios. Multivariate analysis, in contrast, identifies the major sources of variance in the data (red arrow) before proceeding to construct neural activation patterns from these sources.
Figure 2. This slide shows, in simplified form, the basic accomplishment of any multivariate analysis of neuroimaging data. The data array Y(s,x), which depends on a subject index s and a voxel index x indicating the location of the voxel in the brain, is decomposed into a sum of several terms. The first term is the product of a purely subject-dependent factor score, ssf(s), and a purely voxel-dependent covariance pattern, v(x). The second term is a subject- and voxel-dependent noise term, e(s,x), which captures activation that cannot be accounted for by the covariance pattern. The two graphics below the equation give an example of the subject scaling factor and the covariance pattern. Every participant manifests the covariance pattern, just to a different degree, as shown by the subject factor score. Rather than having to keep track of every voxel’s behavior separately, the covariance pattern and its subject expression provide a parsimonious summary of the major source of variance. As the subject scaling factor increases in magnitude, the areas denoted in blue in the covariance pattern decrease their associated activation, while the areas indicated in red simultaneously increase their associated activation. The subject factor score can be correlated with external variables of interest, like subject age or behavioral performance in a cognitive task, and no correction for multiple comparisons has to be applied to this correlation.
Several techniques for such a decomposition exist; the most common one, and our technique of choice, is Principal Components Analysis (PCA). Note that subject scaling factors can be obtained by projecting the covariance pattern into any data set of equal dimensionality, not just the data set that produced the covariance pattern in the first place. This makes covariance patterns suitable for testing whether brain-behavioral relationships that were observed in one data set can be replicated in a different data set.
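As a rough illustration of the decomposition Y(s,x) = ssf(s) v(x) + e(s,x) and of the projection step, the following Python sketch derives a single covariance pattern from one synthetic data set with PCA and then applies it to a second, independent data set. It is a sketch under simplifying assumptions, not the authors' implementation: only the first principal component is used, the PCA is computed via a singular value decomposition, and all names (`derive_pattern`, `subject_scores`, `simulate`) and the data are hypothetical.

```python
import numpy as np

def derive_pattern(Y):
    """First principal covariance pattern v(x) of a subjects-by-voxels array."""
    # Remove the voxel-wise mean across subjects so the SVD captures covariance.
    Yc = Y - Y.mean(axis=0, keepdims=True)
    # Rows of Vt are unit-norm voxel patterns, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Vt[0]

def subject_scores(Y, pattern):
    """Project a pattern into a data set: one scaling factor ssf(s) per subject."""
    return Y @ pattern

rng = np.random.default_rng(0)
n_voxels = 500
true_pattern = rng.standard_normal(n_voxels)
true_pattern /= np.linalg.norm(true_pattern)

def simulate(n_subjects):
    """Synthetic data: one pattern expressed to a different degree per subject, plus noise."""
    ssf = 10.0 * rng.standard_normal(n_subjects)
    noise = rng.standard_normal((n_subjects, n_voxels))
    return ssf, np.outer(ssf, true_pattern) + noise

_, Y_derivation = simulate(40)
ssf_replication, Y_replication = simulate(40)

# Derive the pattern in one data set, then apply it prospectively to the other.
pattern = derive_pattern(Y_derivation)
scores = subject_scores(Y_replication, pattern)

# The sign of a principal component is arbitrary, so compare absolute correlation.
corr = np.corrcoef(scores, ssf_replication)[0, 1]
print(f"|correlation| with the simulated scaling factors: {abs(corr):.2f}")
```

Because the projection is a single dot product per subject, the second data set only needs the same voxel dimensionality as the first; it plays no role in deriving the pattern.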
Figure 3. This figure shows the result of the univariate analysis. In the lower left panel, the FDG signal values are plotted for the area that shows the largest AD-related deficit in the derivation sample. Its MNI coordinates are X = 2 mm, Y = -48 mm, Z = 30 mm (Precuneus/PCG, Brodmann Area 31). The lower right panel shows the FDG signal at this very location in the replication sample. One can appreciate that the FDG differences between AD patients and controls in the replication sample, while still significant overall, are reduced, with more overlap between the groups.
Figure 4. This figure shows the results of the multivariate analysis. In the top panel, we display several axial slices that show significantly positively and negatively weighted areas (p<0.001) in the covariance pattern in red and blue, respectively. Note that we scaled every scan by its global mean value, so red and blue colors indicate relative rather than absolute increases and decreases of PET signal with disease severity. Red areas thus hint at relative preservation in the face of the disease, while blue indicates a loss of signal as a consequence of the disease. Red areas are mainly found in the cerebellum, while blue areas appear in the posterior cingulate gyrus and in parietotemporal and frontal regions. Lower left panel: the subject factor scores of the AD-related covariance pattern are displayed in the derivation sample. Higher subject scores are found for the AD patients. Lower right panel: the subject factor scores resulting from the prospective application of the AD-related covariance pattern to the replication sample are plotted here. One can appreciate a slight worsening of the diagnostic contrast with increased overlap in the replication sample, but the generalization of the diagnostic efficacy is noticeably better than in the univariate case.
Figure 5. This figure shows the results of the 1,000 split-sample simulations. Listed are means and standard deviations of the univariate and multivariate diagnostic error rates in the replication samples. One can appreciate that the multivariate marker’s generalization of performance is considerably better, although somewhat more variable than the univariate marker’s.
We hope to have given the viewer a flavor of the basics of multivariate analysis; interested viewers are encouraged to check out our website. A few choices of parameters in the multivariate analysis were made that can be subject to considerable debate. We spared the discussion of these issues in this article to avoid distraction from the major points. First, we chose the first 6 Principal Components to construct our AD-related covariance pattern. There are theoretical reasons for this choice that we did not discuss 4. The particular choice of 6 Principal Components, though, is not critical for our argument: one can choose anywhere in the range from 2 to 20 PCs and still obtain superior generalization performance of the multivariate marker in the split-sample simulations. The results are similarly robust with respect to the choice of numbers of subjects in derivation and replication samples. We chose 20 subjects for both groups in the replication sample, but this was purely for mathematical convenience to speed up the computations. Our results about the relative merits of both techniques would hold similarly if the numbers of subjects in the derivation samples were increased.
Second, we only presented the most basic kind of multivariate analysis. Considerable elaborations are feasible that could boost the generalization performance even more: techniques borrowed from the machine-learning literature, linear and non-linear transformations prior to the PCA, and various other refinements. For simplicity, we have not touched on these possibilities in this article.
The authors have nothing to disclose.
The author is grateful for NIH grant support:
NIH/NIBIB 5R01EB006204-03 Multivariate approaches to neuroimaging analysis
NIH/NIA 5R01AG026114-02 Early AD Detection with ASL MRI & Covariance Analysis
ADNI: Imaging data was provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (NIH U01AG024904). Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Abbott, AstraZeneca AB, Bayer Schering Pharma AG, Bristol-Myers Squibb, Eisai Global Clinical Development, Elan Corporation, Genentech, GE Healthcare, GlaxoSmithKline, Innogenetics, Johnson and Johnson, Eli Lilly and Co., Medpace, Inc., Merck and Co., Inc., Novartis AG, Pfizer Inc, F. Hoffman-La Roche, Schering-Plough, Synarc, Inc., and Wyeth, as well as non-profit partners the Alzheimer’s Association and Alzheimer’s Drug Discovery Foundation, with participation from the U.S. Food and Drug Administration. Private sector contributions to ADNI are facilitated by the Foundation for the National Institutes of Health (http://www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles. This research was also supported by NIH grants P30 AG010129, K01 AG030514, and the Dana Foundation.