Non-targeted metabolite profiling by ultra performance liquid chromatography coupled with mass spectrometry (UPLC-MS) is a powerful technique to investigate metabolism. This article outlines a typical workflow utilized for non-targeted metabolite profiling of serum including sample organization and preparation, data acquisition, data analysis, quality control, and metabolite identification.
Non-targeted metabolite profiling by ultra performance liquid chromatography coupled with mass spectrometry (UPLC-MS) is a powerful technique to investigate metabolism. The approach offers an unbiased and in-depth analysis that can enable the development of diagnostic tests and novel therapies and further our understanding of disease processes. The inherent chemical diversity of the metabolome creates significant analytical challenges, and no single experimental approach can detect all metabolites. Additionally, the biological variation in individual metabolism and the dependence of metabolism on environmental factors necessitate large sample numbers to achieve the statistical power required for meaningful biological interpretation. To address these challenges, this tutorial outlines an analytical workflow for large-scale non-targeted metabolite profiling of serum by UPLC-MS. The procedure includes guidelines for sample organization and preparation, data acquisition, quality control, and metabolite identification. It will enable reliable acquisition of data for large experiments and provide a starting point for laboratories new to non-targeted metabolite profiling by UPLC-MS.
The term “metabolomics” can encompass many things. For example, a metabolomics experiment can be performed using a variety of analytical platforms, such as NMR and gas or liquid chromatography coupled with mass spectrometry. Furthermore, metabolomics experiments can be performed in a targeted or non-targeted manner, or a combination of both. A targeted metabolomics experiment involves directed analysis of a panel of molecules important to the biological question at hand (e.g. targeting the small molecules of the TCA cycle allows accurate quantitation of that pathway). In this situation, the biological hypothesis dictates the choice of metabolites to be targeted, and the analytical steps are optimized for the detection of these molecules. In contrast, a non-targeted metabolomics experiment is hypothesis generating. In this case, the experiment is performed in a broad and unbiased manner to enable detection of as many metabolites as possible. The results from a non-targeted experiment drive the next step of the research, which in many cases may involve a targeted metabolomics workflow. The two approaches can also be combined: the experiment is performed in a non-targeted manner while a panel of known molecules is monitored concurrently within the data.
The tutorial presented here focuses specifically on non-targeted metabolite profiling of serum. As described above, the non-targeted approach provides an unbiased view of the detectable metabolites, can generate large amounts of information, and ultimately allows for novel discoveries. The use of this approach, specifically employing ultra performance liquid chromatography coupled with mass spectrometry (UPLC-MS), is becoming widespread 1, 2, 3 and involves the following steps: (1) experimental design, (2) sample collection, (3) sample preparation, (4) data acquisition by UPLC-MS, (5) data pre-processing (peak detection, integration, alignment, and normalization), (6) statistical data analysis (both uni- and multivariate), (7) metabolite identification, and (8) biological interpretation.
Currently, there are no established standard methods for UPLC-MS based non-targeted metabolite profiling and the subsequent data pre-processing steps. This lack of standardization is due in part to one of the primary analytical challenges of metabolite profiling: the chemical diversity of the metabolome. Because of this diversity, no single extraction method or mass spectrometry acquisition method can provide comprehensive coverage of all metabolites in a single analysis. In principle, metabolite coverage can be maximized by using multiple extractions (e.g. aqueous, methanol, chloroform:methanol, etc.) coupled with various chromatographic conditions (e.g. reverse phase, HILIC, etc.) and various ionization modes (e.g. positive ion, negative ion, chemical ionization, etc.). Often, however, researchers do not have a pre-determined bias toward a specific chemical class, and thus the expense of performing multiple extractions and instrument acquisitions is not warranted, especially for large-scale experiments. The video tutorial presented here was therefore designed to provide a general procedure for large-scale non-targeted metabolite profiling of serum by UPLC-MS. It will enable new and established laboratories to perform these types of experiments and provide the building blocks upon which they can expand the approach to other sample types, specific chemical classes, or targeted analysis. Specifically, this protocol includes the steps of serum sample preparation, sample organization for large-scale studies, UPLC-MS data acquisition, quality control (QC) procedures, and metabolite identification. Strategies for data pre-processing and statistical analysis are also presented.
The protocol does not focus on experimental design, sample collection, or biological data interpretation, as these are outside the scope of this tutorial. However, many resources on these topics exist in the literature, and the authors encourage researchers new to metabolomics to explore them thoroughly 4, 5, 6, 7, 8, 9. In particular, experimental design is critical to the success of a non-targeted metabolomics experiment. Factors such as appropriate biological replication and consistency in sample collection procedures (e.g. time on bench, storage temperature, storage time, freeze-thaw cycles, etc.) must be considered to ensure a viable study and to support appropriate biological interpretation of the data.
1. Sample Organization
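A common element of sample organization for large studies is randomizing the injection order so that biological groups are not confounded with run time, and interleaving QC injections at regular intervals. The sketch below is a minimal illustration under stated assumptions: the QC interval (`qc_every`), the seed, and the placeholder label `"QC"` are hypothetical choices, not values taken from this protocol.

```python
import random


def build_run_order(sample_ids, qc_every=10, seed=2013):
    """Return a randomized injection list with a QC injection every `qc_every` samples.

    `qc_every` and `seed` are hypothetical defaults chosen for illustration.
    Using a fixed seed makes the run order reproducible and auditable.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)

    order = ["QC"]  # open the batch with a QC injection
    for i, sample in enumerate(shuffled, start=1):
        order.append(sample)
        if i % qc_every == 0:
            order.append("QC")
    if order[-1] != "QC":
        order.append("QC")  # close the batch with a QC injection
    return order


if __name__ == "__main__":
    samples = [f"S{n:03d}" for n in range(1, 26)]
    for position, injection in enumerate(build_run_order(samples), start=1):
        print(position, injection)
```

Recording the seed alongside the sample list lets the exact run order be regenerated later if an injection sequence needs to be audited.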
2. Serum and Quality Control (QC) Sample Preparation Procedures
3. UPLC-MS Data Acquisition
Time (min) | %A | %B | curve |
0.0 | 100 | 0 | |
0.1 | 100 | 0 | 6 |
1.0 | 60 | 40 | 6 |
3.0 | 30 | 70 | 6 |
11.0 | 0 | 100 | 6 |
17.0 | 0 | 100 | 6 |
17.1 | 100 | 0 | 6 |
23.0 | 100 | 0 | 6 |
Time (min) | %A | %B | curve |
0.0 | 100 | 0 | |
1 | 100 | 0 | 6 |
13 | 5 | 95 | 6 |
16 | 5 | 95 | 6 |
16.05 | 100 | 0 | 6 |
20 | 100 | 0 | 6 |
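The two gradient tables above can be checked by computing the mobile-phase composition at any time point. This sketch assumes, as is conventional for Waters gradient tables, that curve 6 denotes a linear ramp from the previous row. The table constants are transcribed from the rows above, and the names `GRADIENT_1`, `GRADIENT_2`, and `percent_b` are hypothetical.

```python
from bisect import bisect_right

# (time_min, %A, %B) rows transcribed from the first and second gradient tables.
GRADIENT_1 = [(0.0, 100, 0), (0.1, 100, 0), (1.0, 60, 40), (3.0, 30, 70),
              (11.0, 0, 100), (17.0, 0, 100), (17.1, 100, 0), (23.0, 100, 0)]
GRADIENT_2 = [(0.0, 100, 0), (1.0, 100, 0), (13.0, 5, 95), (16.0, 5, 95),
              (16.05, 100, 0), (20.0, 100, 0)]


def percent_b(table, t):
    """Percent mobile phase B at time t (min), assuming every segment is linear (curve 6)."""
    times = [row[0] for row in table]
    if t <= times[0]:
        return float(table[0][2])
    if t >= times[-1]:
        return float(table[-1][2])
    i = bisect_right(times, t)  # table[i-1] time <= t < table[i] time
    t0, _, b0 = table[i - 1]
    t1, _, b1 = table[i]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0.5, 2.0, 7.0, 12.0):
        print(t, percent_b(GRADIENT_1, t), percent_b(GRADIENT_2, t))
```

Because %A + %B = 100 in every row, %A at any time is simply 100 minus the returned value.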
4. Peak Detection, Integration, Alignment, and Normalization
A variety of options exist for performing these steps, including freeware and vendor-specific tools. This step has been skipped in the video tutorial, but our approach is described below as an example.
For each sample, the analytical data generated from this type of experiment are a profile of ions, each described by a retention time, an m/z value, and a spectral intensity. A non-targeted metabolite profiling experiment comprises many samples, so peak detection, retention time alignment, and normalization must be performed to enable subsequent statistical analysis of the dataset.
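To make the alignment idea concrete, the sketch below groups (m/z, retention time, intensity) peaks from several samples into shared features when both m/z (in ppm) and retention time fall within tolerance, then builds a sample-by-feature table. The tolerances are hypothetical, and the total-intensity normalization is a simple stand-in chosen for illustration; it is not the authors' normalization strategy or any vendor tool's algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    mz: float   # m/z of the peak that seeded the feature (no centroid updating)
    rt: float   # retention time (min) of the seeding peak
    intensities: dict = field(default_factory=dict)  # sample name -> intensity


def align(samples, ppm_tol=10.0, rt_tol=0.1):
    """Group peaks from several samples into shared features.

    samples: {sample_name: [(mz, rt_min, intensity), ...]}
    A peak joins the first existing feature within ppm_tol and rt_tol (minutes)
    that has no peak from the same sample yet; otherwise it seeds a new feature.
    """
    features = []
    for name, peaks in samples.items():
        for mz, rt, intensity in sorted(peaks):
            for feat in features:
                if (abs(mz - feat.mz) / feat.mz * 1e6 <= ppm_tol
                        and abs(rt - feat.rt) <= rt_tol
                        and name not in feat.intensities):
                    feat.intensities[name] = intensity
                    break
            else:
                features.append(Feature(mz, rt, {name: intensity}))
    return features


def feature_table(features, sample_names):
    """Rows = samples, columns = features; a feature absent from a sample is 0.0."""
    return {s: [f.intensities.get(s, 0.0) for f in features] for s in sample_names}


def normalize_total(table):
    """Scale each sample so its summed intensity equals the mean total across samples."""
    totals = {s: sum(row) for s, row in table.items()}
    target = sum(totals.values()) / len(totals)
    return {s: ([v * target / totals[s] for v in row] if totals[s] else list(row))
            for s, row in table.items()}
```

A production pipeline would also correct retention-time drift before grouping and handle missing values more carefully than filling with zero.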
5. Statistical Analysis
For metabolomics experiments, both multivariate and univariate statistical techniques are necessary for data interpretation. Two commonly used techniques are Principal Component Analysis (PCA, multivariate) and Analysis of Variance (ANOVA, univariate). There are multiple ways to approach statistical analysis of the data, and both open source and commercial software tools are available. This step has been skipped in the video tutorial, but our approach is described below as an example.
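As an illustration of the univariate step, the sketch below computes a one-way ANOVA F statistic for every feature in a sample-by-feature table, comparing experimental groups. It uses only the standard library; the table layout and the function names are hypothetical. Converting F to a p-value requires the F distribution, which `scipy.stats.f_oneway` provides alongside the statistic.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a single feature.

    groups: list of lists, one list of intensities per experimental group.
    F = (between-group sum of squares / (k - 1)) / (within-group sum of squares / (n - k)).
    Raises ZeroDivisionError if every group has zero within-group variance.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def anova_per_feature(table, group_of):
    """F statistic for every feature column.

    table: {sample: [intensity per feature]}; group_of: {sample: group label}.
    """
    labels = sorted(set(group_of.values()))
    n_features = len(next(iter(table.values())))
    return [
        one_way_anova_f([[table[s][j] for s in table if group_of[s] == lab]
                         for lab in labels])
        for j in range(n_features)
    ]
```

Features with large F values are candidates for the metabolite identification step; with thousands of features, the resulting p-values would normally be corrected for multiple testing.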
6. Metabolite Identification
The workflow presented here is very general and can be applied to results from any instrument platform. An alternative strategy is presented in the Discussion section.
The basic analytical steps of a non-targeted metabolite profiling experiment by UPLC-MS are outlined in Figure 1. The raw data for each sample can be visualized as a base peak chromatogram. Figure 2 shows an example base peak chromatogram of a serum sample analyzed with gradient option (a) in the tutorial. Following statistical analysis as described above, metabolite identification is attempted for all statistically significant molecular features. Confident identification (level 1) requires matching the chromatographic retention time, accurate mass, and fragmentation pattern of the experimental molecular feature to those of an authentic standard of the putative metabolite, all acquired on the same analytical instrumentation. Figure 3 shows an example of a level 1 metabolite identification of caffeine in human serum.
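The accurate-mass part of a level 1 match can be worked through for caffeine (C8H10N4O2). The sketch below computes the theoretical [M+H]+ m/z from monoisotopic element masses and the ppm error of the 195.0878 ion shown in Figure 3. The acceptance windows in `ms1_rt_match` (5 ppm, 0.1 min) are hypothetical placeholders, and a full level 1 identification also requires the fragmentation comparison.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element, and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646688


def mh_plus(formula):
    """[M+H]+ m/z for a neutral formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items()) + PROTON


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def ms1_rt_match(obs_mz, obs_rt, std_mz, std_rt, ppm_tol=5.0, rt_tol=0.1):
    """Accurate-mass and retention-time agreement with an authentic standard.

    ppm_tol and rt_tol (minutes) are hypothetical acceptance windows.
    """
    return abs(ppm_error(obs_mz, std_mz)) <= ppm_tol and abs(obs_rt - std_rt) <= rt_tol


CAFFEINE = {"C": 8, "H": 10, "N": 4, "O": 2}

if __name__ == "__main__":
    theoretical = mh_plus(CAFFEINE)
    print(f"theoretical [M+H]+ = {theoretical:.4f}")  # 195.0877
    print(f"error of 195.0878 = {ppm_error(195.0878, theoretical):.2f} ppm")  # 0.76
```

An error below 1 ppm is consistent with the accurate-mass criterion, but on its own it cannot distinguish isomers, which is why retention time and fragmentation must also match the standard.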
The protocol presented here focuses on sample organization and preparation, UPLC-MS data acquisition and quality control, and analytical metabolite identification. Peak detection, alignment, and normalization, as well as statistical analysis, are critically important to the output of a non-targeted metabolite profiling experiment; however, we chose to skip them in the video tutorial because there are multiple options and tools available for performing these tasks. Many of the tools are vendor specific, although freeware options are available. An example of our approach to these steps is presented in the protocol above.
Figure 1. Analytical workflow for a non-targeted metabolite profiling experiment. Steps include: sample organization and preparation, UPLC-MS data acquisition, quality control, data pre-processing and statistical analysis, and metabolite identification.
Figure 2. Example UPLC-MS base peak chromatogram of human serum using gradient option a as described in the protocol. Peaks represent the base peak (largest) ion signal at each point in time.
Figure 3. Level 1 metabolite identification of caffeine in human serum based on chromatographic retention time, accurate mass, and fragmentation pattern. Extracted ion chromatogram of caffeine (parent ion m/z 195.0878) from an authentic standard (A) and an experimental serum sample (B). Fragmentation spectra of caffeine from an authentic standard (C) and an experimental serum sample (D).
This tutorial is meant to serve as a starting point for conducting large-scale non-targeted metabolite profiling by UPLC-MS. The workflow focuses on metabolites that can be extracted with an aqueous methanol solvent, retained on a C8 or C18 UPLC column, and detected as positive ions. When there is no pre-determined bias toward a specific metabolite class and a hypothesis-generating global profile is desired, this protocol is valuable because it detects a large percentage of serum metabolites from a variety of chemical classes (i.e. moderately polar to non-polar compounds). If budget and time allow, metabolite coverage can be readily expanded by incorporating additional extraction methods, chromatographic separations, and ionization modes 13, 14, 15.
Data stability over time is an important analytical consideration for large-scale experiments in which data may be collected over many weeks or months. Given the unbiased and holistic nature of the approach, incorporating an internal standard (or even a panel of internal standards) for data normalization is not appropriate and may in fact skew the data, because the assumption that an internal standard is representative of the entire metabolome is invalid. Additionally, given the natural variation inherent to serum, the use of an endogenous internal control should be approached with caution. Proper sample preparation and randomization, combined with the normalization strategy described in the protocol, can ensure comparability across the dataset. In addition, tracking easily quantifiable QC metrics during data acquisition (e.g. retention time, sensitivity, and mass accuracy) ensures analytical stability.
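The QC metrics named above (retention time, sensitivity, and mass accuracy) lend themselves to an automated check on each QC injection. The sketch below assumes one tracked ion in the QC sample with a known reference m/z and uses the first three QC injections as the baseline; the `QCInjection` record, the baseline rule, and all tolerances are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class QCInjection:
    injection: int
    rt_min: float        # retention time of the tracked QC ion
    observed_mz: float   # measured m/z of that ion
    intensity: float     # peak area or height of that ion


def flag_qc_drift(runs, reference_mz, rt_tol=0.1, ppm_tol=5.0, min_rel_intensity=0.5):
    """Return (injection, reasons) for QC injections that drift out of tolerance.

    Baselines are medians of the first three QC injections. All tolerances are
    hypothetical and would be set per instrument and method.
    """
    baseline = runs[:3]
    rt0 = median(r.rt_min for r in baseline)
    i0 = median(r.intensity for r in baseline)
    flagged = []
    for r in runs:
        reasons = []
        if abs(r.rt_min - rt0) > rt_tol:
            reasons.append("retention time")
        if abs(r.observed_mz - reference_mz) / reference_mz * 1e6 > ppm_tol:
            reasons.append("mass accuracy")
        if r.intensity < min_rel_intensity * i0:
            reasons.append("sensitivity")
        if reasons:
            flagged.append((r.injection, reasons))
    return flagged
```

Flagging during acquisition, rather than after the batch finishes, allows the run to be paused for maintenance or recalibration before many samples are affected.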
While the unbiased nature of non-targeted metabolite profiling allows for the generation of large amounts of data and the potential for novel discoveries, the approach is substantially limited by the challenge of metabolite identification. Small molecule spectral libraries are well developed for gas chromatography (GC)-MS applications using electron impact ionization but have been slow to develop for LC-MS small molecule applications. Unlike peptides, which fragment predictably, metabolites are structurally diverse, which makes their behavior in the mass spectrometer less predictable; this has greatly inhibited the generation of theoretical spectral databases (as used in proteomic applications). Recently, some progress has been made in generating small molecule LC-MS spectral libraries through the analysis of authentic standards 10, 16, and these libraries continue to grow. However, given the complexity of the metabolome, the variation of the metabolome between biological sources, and the variation in molecule fragmentation with MS conditions, generating comprehensive spectral libraries through the analysis of authentic standards is a slow and tedious process. Thus, molecular identification remains a significant bottleneck in non-targeted LC-MS based metabolite profiling studies 1, 17.
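Searching an experimental MS/MS spectrum against a library of authentic-standard spectra requires a similarity score. Cosine similarity between centroided spectra is one common choice; the sketch below shows it under simple assumptions (greedy peak pairing within a fixed m/z tolerance, no intensity or m/z weighting). The function names, the tolerance, and the `{name: spectrum}` library layout are hypothetical, and real libraries and search tools differ in their scoring details.

```python
from math import sqrt


def cosine_similarity(query, reference, mz_tol=0.01):
    """Cosine score between two centroided spectra given as [(mz, intensity), ...].

    Reference peaks are paired greedily (most intense first) with the most intense
    unused query peak within mz_tol; unmatched peaks contribute only to the norms.
    """
    used = set()
    dot = 0.0
    for r_mz, r_int in sorted(reference, key=lambda p: -p[1]):
        best = None
        for idx, (q_mz, q_int) in enumerate(query):
            if idx not in used and abs(q_mz - r_mz) <= mz_tol:
                if best is None or q_int > query[best][1]:
                    best = idx
        if best is not None:
            used.add(best)
            dot += r_int * query[best][1]
    norm_q = sqrt(sum(inten * inten for _, inten in query))
    norm_r = sqrt(sum(inten * inten for _, inten in reference))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0


def search_library(query, library, mz_tol=0.01):
    """Rank library entries {name: spectrum} by cosine similarity to the query."""
    scores = [(name, cosine_similarity(query, spec, mz_tol)) for name, spec in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A score near 1 indicates that the spectra share peaks with similar relative intensities; because fragmentation varies with MS conditions, scores are most meaningful when library spectra were acquired under comparable settings.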
An alternative to the metabolite identification workflow described in the protocol is to incorporate indiscriminant MS/MS acquisition (idMS/MS) into the initial metabolite profiling workflow. In this approach, MS and MS/MS acquisitions are performed in parallel in alternating scans, and the MS/MS acquisition is indiscriminant, i.e. no precursor ion isolation is employed. Thus, using an idMS/MS workflow, both MS and MS/MS signals are acquired for every sample, eliminating the need for subsequent targeted MS/MS acquisition. The challenge of this approach is the assignment of precursor-product ion relationships. A workflow for this analysis was recently described by our group and can be implemented in R 18. The output of this workflow is a reconstructed MS/MS spectrum for each detectable metabolite, which can be compared against an authentic standard and/or searched against public or in-house spectral databases. The advantage of this approach is that it does not require additional mass spectrometry experiments to support metabolite identification. Mass spectrometers from Waters can perform this type of data acquisition in what is called MSE mode. Alternative approaches are possible for tandem MS/MS instruments from other vendors; for example, two injections of each sample can be performed, one at low collision energy and a second at high collision energy. If using a non-tandem instrument (e.g. ESI-TOF), the data can be collected using two injections at two different cone voltage settings, where the higher cone voltage induces increased in-source fragmentation. Several instrument platforms allow scan-to-scan parameter switching, which often includes both collision energy and cone voltage settings 18.
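To illustrate why precursor-product assignment is feasible without precursor isolation, the sketch below uses one simple heuristic: a product ion and its precursor should co-elute, and their intensities should co-vary across samples. This is a simplified illustration only, not the published R workflow cited above (reference 18). The data layout, function names, and thresholds (`rt_tol`, `min_r`) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def assign_products(precursors, products, rt_tol=0.05, min_r=0.9):
    """Attach each idMS/MS product ion to its best-correlated co-eluting MS precursor.

    precursors / products: {ion_id: (rt_min, [intensity in each sample])}.
    A product is a candidate for a precursor when the two co-elute within rt_tol
    and their intensities correlate across samples with r >= min_r.
    Returns {precursor_id: [product_id, ...]}, i.e. a reconstructed spectrum per precursor.
    """
    spectra = {pid: [] for pid in precursors}
    for prod_id, (prod_rt, prod_int) in products.items():
        best, best_r = None, min_r
        for pre_id, (pre_rt, pre_int) in precursors.items():
            if abs(prod_rt - pre_rt) <= rt_tol:
                r = pearson(prod_int, pre_int)
                if r >= best_r:
                    best, best_r = pre_id, r
        if best is not None:
            spectra[best].append(prod_id)
    return spectra
```

The correlation criterion is what makes large sample numbers an asset here: with more samples, co-varying ions from the same metabolite are easier to separate from co-eluting but unrelated ions.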
The workflow for metabolite identification presented in the protocol and the alternative strategy described above both focus on assigning molecular features to known metabolites. The significant challenge of unknown metabolite identification therefore remains. At this time, the key strategy for structural elucidation of unknown metabolites remains isolation and concentration of the metabolites followed by analysis by NMR 17. This is not practically feasible for a large number of metabolites, so the molecular identification of unknowns represents a major analytical hurdle that will continue to be a priority for the field. One current effort in this area is the development of in-silico spectral libraries of small molecule metabolites. Kangas et al. recently published a tool for the generation of in-silico spectra of lipid compounds 19. Work such as this has the potential to significantly expand the coverage of MS/MS spectral libraries to all known biological compounds, with the caveat that some compound classes will be more difficult to predict than lipids.
The authors have nothing to disclose.
The presented tutorial was performed and developed within the Proteomics and Metabolomics Facility at Colorado State University which is partially funded by the CSU Research Administration Resources for Scholarly Projects.
Name of Reagent/Material | Company | Catalog Number | Comments |
96 well plates – 500 μl wells | VWR | 40002-020 | These are used for sample preparation |
96 well plate mats | VWR | 89026-514 | These are used for sample preparation |
96 well plates – 350 μl wells | Waters Corporation | WAT058943 | These are used for sample injection |
96 well plate mats | Waters Corporation | 186000857 | These are used for sample injection |
96 well plate heat seals | Waters Corporation | 186002789 | These can be used for sample injection or long term storage |
96 well plate heat sealer | Waters Corporation | 186002786 | |
LC-MS grade methanol | Fluka | 34966 | |
LC-MS grade acetonitrile | Fluka | 34967 | |
LC-MS grade water | Fluka | 39253 | |
LC-MS grade formic acid | Fluka | 56302 | |
Multichannel electronic pipettor | VWR | 89000-674 | |
Pipette tips | Eclipse (purchased through Light Labs) | B-5061/B-4061 | |
Chilled centrifuge – Allegra X-12R | Beckman Coulter | N/A – contact Beckman Coulter | |
Acquity Ultra Performance Liquid Chromatography (UPLC) System | Waters Corporation | N/A – contact Waters Corporation | |
UPLC C8 column (gradient option a) | Waters Corporation | 186002876 | |
UPLC T3 column (gradient option b) | Waters Corporation | 186003536 | |
Xevo G2 Q-TOF mass spectrometer | Waters Corporation | N/A – contact Waters Corporation | |