This protocol is for the extraction and concentration of protein and DNA from microbial biomass collected from seawater, followed by the generation of tryptic peptides suitable for tandem mass spectrometry-based proteomic analysis.
Meta-omic technologies such as metagenomics, metatranscriptomics and metaproteomics can aid in the understanding of microbial community structure and metabolism. Although powerful, metagenomics alone can only elucidate functional potential. On the other hand, metaproteomics enables the description of the expressed in situ metabolism and function of a community. Here we describe a protocol for cell lysis, protein and DNA isolation, as well as protein digestion and peptide extraction from marine microbial cells collected on a cartridge filter unit (such as the Sterivex filter unit) and preserved in an RNA stabilization solution (like RNAlater). In mass spectrometry-based proteomics studies, the identification of peptides and proteins is performed by comparing peptide tandem mass spectra to a database of translated nucleotide sequences. Including the metagenome of a sample in the search database increases the number of peptides and proteins that can be identified from the mass spectra. Hence, in this protocol DNA is isolated from the same filter and can subsequently be used for metagenomic analysis.
Microorganisms are ubiquitous and play essential roles in Earth’s biogeochemical cycles 1. Currently, there are numerous molecular approaches available for characterizing microbial community structure and function. Most common is the analysis of 16S rRNA gene sequences PCR-amplified from environmental DNA 2–4. A disadvantage of 16S rRNA gene analysis is that it only provides information on phylogenetic identity and community structure, with little information on metabolic function. In contrast, approaches such as metagenomics, metatranscriptomics and metaproteomics provide information on community structure and metabolism. Metagenomics, or the analysis of the gene content of an assemblage of organisms, provides information about the structure and functional potential of the community 5–8. Although powerful, this functional potential may not correspond to the metabolic activities of the organisms. An organism’s genotype is represented by its genes, each of which can be transcribed to RNA and further translated to protein, resulting in a phenotype. Thus, to aid in the understanding of microbial functional activity in an environment, post-genomic analysis should be performed 9. Metatranscriptomics, or the analysis of RNA transcripts, is useful because it reveals which genes are transcribed in any given environment. However, mRNA levels do not always match their corresponding protein levels due to translational regulation, RNA half-life, and the fact that multiple protein copies can be generated for every mRNA 10.
For these reasons, metaproteomics is now recognized as an important tool for environmental microbiology. Common metaproteomic analyses use a shotgun proteomic approach where the near full complement of proteins in a complex sample is purified and analyzed simultaneously, usually through enzymatic digestion into peptides and analysis on a mass spectrometer. Subsequent tandem mass spectrometry (MS/MS) “peptide fingerprinting” is used to determine the peptide sequence and potential protein of origin by protein database searching (for a review see 11). Proteomic work has come a long way in the past 25 years thanks to the increase in genomic data availability and the increase in the sensitivity and accuracy of mass spectrometers, allowing for high-throughput protein identification and quantification 11,12. Since proteins are the final product of gene expression, metaproteomic data can help determine which organisms are active in any given environment and what proteins they are expressing. This is advantageous when trying to determine how a particular set of environmental variables will affect the phenotype of an organism or community. Early on, MS/MS-based metaproteomic studies in the ocean were used to identify specific proteins in targeted microbial lineages, with the first study focusing on the light-driven proton pump proteorhodopsin in SAR11 marine bacteria 13. More recently, comparative metaproteomic analyses have elucidated differential protein expression patterns between complex communities. Examples include the identification of temporal shifts in metabolism in the coastal Northwest Atlantic Ocean 14 or the Antarctic Peninsula 5. Other studies have described variations in protein expression patterns across spatial scales, for instance, along a geographical transect from a low-nutrient ocean gyre to a highly productive coastal upwelling system 15. For further reviews of metaproteomics we recommend Schneider et al. (2010) 9 and Williams et al. (2014) 16.
Targeted proteomics has also been employed in recent years to quantify expression of specific metabolic pathways in the environment 17,18.
There are three main phases in metaproteomic analysis. The first phase is sample preparation, which includes sample collection, cell lysis and concentration of protein. Sample collection in marine microbiology often entails the filtration of seawater through a pre-filter to remove larger eukaryotic cells, particles and particle-associated bacteria, followed by filtration for the capture of free-living microbial cells, commonly with the use of a 0.22 µm cartridge filter unit 19,20. These filters are encased in a plastic cylinder, so a cell lysis and protein extraction protocol that can be performed within the filter unit would be a valuable tool. Once biomass is obtained, the cells must be lysed to allow for protein extraction. Several methods can be employed, including guanidine-HCl lysis 21 and sodium dodecyl sulfate (SDS)-based lysis methods. Although detergents like SDS are very efficient at disrupting membranes and solubilizing many protein types, concentrations as low as 0.1% can interfere with downstream protein digestion and MS analysis 22. Of major concern are the negative effects of SDS on trypsin digestion efficiency and on the resolving power of reverse-phase liquid chromatography, as well as ion suppression or accumulation inside the ion source 23.
The second phase is fractionation and analysis, where proteins are subjected to enzymatic digestion followed by LC MS/MS analysis, resulting in an m/z fragmentation pattern that can be used to ascertain the primary amino acid sequence of the initial tryptic peptide. Various digestion methods can be performed depending on the types of detergents used, as well as the downstream mass spectrometry workflow. In our protocol, 1-D PAGE followed by removal of SDS from the gel is used to remove any detergent contamination. The analysis of proteins that are difficult to solubilize, such as membrane proteins, requires the use of high concentrations of SDS or other detergents. This leads to compatibility issues with SDS-gel electrophoresis. If the objective of a study requires the solubilization of these hard-to-solubilize proteins, the tube-gel system can be used 22,24. The tube-gel method incorporates proteins within the gel matrix without the use of electrophoresis. Subsequently, any detergents used for solubilization are removed before protein digestion.
The third phase is the bioinformatic analysis. In this phase the MS/MS peptide data are searched against a database of translated nucleotide sequences to determine which peptides and proteins are present in the sample. The identification of peptides depends on the database they are searched against. Marine metaproteomic data are commonly searched against databases composed of reference genomes, metagenomic data such as the Global Ocean Sampling dataset 25, as well as single cell amplified genomes from uncultivated lineages 26,27. Protein identification can also be increased by including metagenomic sequences from the same sample from which the metaproteomic data were derived 5.
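To illustrate why identification depends on the search database, the following minimal Python sketch performs an in silico tryptic digest on a toy database and looks up observed peptides in it. It assumes the commonly used simplified trypsin rule (cleavage after K or R, except when the next residue is P) with no missed cleavages. The protein identifiers and sequences are invented for illustration, and exact sequence lookup stands in for the spectrum scoring that real search tools perform.

```python
def tryptic_digest(sequence, min_length=6):
    """Split a protein sequence after K or R unless the next residue is P.

    Simplified rule with no missed cleavages; peptides shorter than
    min_length are discarded (the cutoff is an illustrative assumption).
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return [p for p in peptides if len(p) >= min_length]


def build_peptide_index(database):
    """Map each theoretical tryptic peptide to the proteins containing it."""
    index = {}
    for protein_id, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            index.setdefault(peptide, set()).add(protein_id)
    return index


def identify(observed_peptides, index):
    """Return the matching protein IDs (possibly none) for each peptide."""
    return {p: sorted(index.get(p, ())) for p in observed_peptides}


# Hypothetical toy databases; sequences are made up for this example.
reference_db = {"ref_protein_A": "MSTAVLENPGLGRKLSDEELAAIQK"}
metagenome_db = {"metag_orf_1": "MKPLTTGEWVNAIRDFSGYTLPEK"}
observed = ["LSDEELAAIQK", "DFSGYTLPEK"]

reference_only = identify(observed, build_peptide_index(reference_db))
combined = identify(
    observed, build_peptide_index({**reference_db, **metagenome_db})
)
print("reference only:", reference_only)
print("reference + metagenome:", combined)
```

In this toy case the second peptide is matched only once the sample-matched sequences are added to the database, which is the effect described above.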
Here we provide a protocol for the generation of peptides suitable for MS/MS-based analysis from microbial biomass collected by filtration and stored in an RNA stabilization solution. The protocol described here allows for DNA and protein to be isolated from the same sample so that all steps leading up to the protein and DNA precipitations are identical. From a practical perspective, less filtration is required since a single filter serves for both protein and DNA extraction. We would also like to acknowledge that this protocol was created through the combination, adaptation and modification of two previously published protocols. The cell lysis steps are adapted from Saito et al. (2011) 28 and the in-gel trypsin digest component is adapted from Shevchenko et al. (2007) 29.
1. Prepare Reagents
2. Perform Cell Lysis in Cartridge Filter Unit with Cells Preserved in RNA Stabilization Solution
3. Protein Precipitation
4. DNA Precipitation
5. SDS-PAGE Gel of Proteins
6. In-gel Trypsin Digest and Peptide Extraction
Note: All steps from here until 6.10 are performed in a biological safety cabinet to minimize contamination.
As a demonstration, we performed the protocol on two seawater samples collected from the surface and the chlorophyll maximum of the coastal ocean in Northern Canada. While at sea, 6-7 L of seawater was passed through a 3 µm GF/D prefilter, then microbial cells were collected onto a 0.22 µm cartridge filter unit following the protocol of Walsh et al. 20. Cells were immediately stored in an RNA stabilization solution until further processing. Upon returning to the lab, we performed the protocol as it is presented here. The concentrated cell lysate was divided; protein was precipitated from 90% of the volume, while DNA was precipitated from the remaining 10% of the volume. We recovered 24-26 µg of protein and 250-308 ng of high quality DNA from these samples (Figure 1). After the in-gel trypsin digestion and peptide extraction, we subjected the peptides to MS/MS analysis using a nano-LC coupled to the Orbitrap Elite mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). From the peptides, we generated over 23,000 MS/MS spectra per sample. Peptides and proteins were then identified by searching these spectra against a custom in-house sequence database using the PEAKS bioinformatics tool (BSI, Waterloo, ON, Canada). The database was composed of predicted proteins from marine reference genomes and metagenomes. The search resulted in the identification of around 1,000 peptides and 700-800 proteins for each sample. Naturally, these results are dependent on microbial cell abundance, MS instrumentation, and protein search database and algorithms. Nonetheless, these results demonstrate that this protocol has the potential to produce adequate tryptic peptides suitable for identifying hundreds of proteins in the environment. Moreover, since metagenomic libraries can be constructed from as little as 100 ng of DNA 30, this protocol also has potential to provide adequate quantities of DNA to generate matched metagenomic-metaproteomic datasets.
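The figures reported above lend themselves to a quick sanity check. The Python sketch below back-calculates the approximate DNA and protein content of the undivided lysate from the 90%/10% split, compares the recovered DNA with the 100 ng library threshold, and computes the ratio of identified peptides to acquired spectra. It assumes that analytes are evenly distributed in the lysate and ignores losses during precipitation; the function and variable names are illustrative, and 23,000 spectra is used as a round lower bound.

```python
PROTEIN_FRACTION = 0.90      # share of lysate used for protein precipitation
DNA_FRACTION = 0.10          # share of lysate used for DNA precipitation
MIN_LIBRARY_DNA_NG = 100     # minimum DNA for a metagenomic library (ref. 30)


def scale_to_whole_lysate(recovered, fraction):
    """Estimate what the whole lysate would have yielded (no losses assumed)."""
    return recovered / fraction


def peptides_per_spectrum(n_peptides, n_spectra):
    """Ratio of identified peptides to acquired MS/MS spectra."""
    return n_peptides / n_spectra


for dna_ng in (250, 308):
    whole = scale_to_whole_lysate(dna_ng, DNA_FRACTION)
    margin = dna_ng / MIN_LIBRARY_DNA_NG
    print(f"DNA: {dna_ng} ng recovered, ~{whole:.0f} ng in whole lysate, "
          f"{margin:.1f}x the library minimum")

for protein_ug in (24, 26):
    whole = scale_to_whole_lysate(protein_ug, PROTEIN_FRACTION)
    print(f"Protein: {protein_ug} ug recovered, ~{whole:.1f} ug in whole lysate")

ratio = peptides_per_spectrum(1000, 23000)
print(f"Identified peptides per spectrum: {ratio:.1%}")
```

Even with only a tenth of the lysate devoted to DNA, the recovered amount exceeds the stated library minimum by a factor of about 2.5 to 3 under these assumptions.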
Taxonomic and functional composition of the metaproteomes was analyzed using a combination of BLASTp and the MEGAN (Metagenome analyzer) software package 31,32 (Figure 2). Proteins assigned to Alpha-proteobacteria were the most highly represented in the dataset, the vast majority of which were assigned to the SAR11 clade. The Rhodobacterales clade of Alpha-proteobacteria was also highly represented and identified most often in surface waters. Proteins assigned to Bacteroidetes were evenly distributed between the surface and chlorophyll maximum, but Flavobacteria proteins were identified to a greater degree at the chlorophyll maximum. Gamma-proteobacterial proteins were evenly distributed throughout the water column while Beta-proteobacterial proteins were found predominantly at the surface. From a functional perspective, a wide range of metabolic pathways were identified. Vertical structuring of these metabolic pathways was apparent. For example, proteins associated with amino acid metabolism, carbohydrate metabolism and prokaryotic carbon fixation pathways were identified primarily at the surface, and nitrogen metabolism was found exclusively at the surface. Photosynthetic carbon fixation proteins were observed primarily at the chlorophyll maximum while proteins involved in photosynthesis were identified evenly between the surface and chlorophyll maximum. These results demonstrate that a wide variety of proteins from a diversity of microbial taxa can be detected using the protocol presented here.
Figure 1: Genomic DNA from 2 depths at Arctic station S633. The first lane contains 4 µl of a 1 kb DNA ladder, lane 2 contains 3 µl of genomic DNA extraction from S633_2 m, lane 3 contains 3 µl of genomic DNA extraction from S633_20 m and lanes 4-6 contain 0.5 µl (85 ng), 2 µl (333 ng) and 4 µl (667 ng) of HindIII digested lambda DNA.
Figure 2: Taxonomic and functional analysis of 2 depths at Arctic station S633. Taxonomic diversity comparison of the Arctic station S633 surface and chlorophyll maximum waters (A) created using MEGAN. Functional diversity comparison of the Arctic station S633 surface and chlorophyll maximum waters (B) created using MEGAN to query against the KEGG database.
Sample preservation is key to metaproteomic studies and previous work demonstrated that an RNA stabilization solution is a useful storage buffer for storing cells prior to protein extraction 28. Ideally, samples would be preserved in situ to negate shifts in protein expression during handling 33,34. In fact, in situ sampling and fixation technologies have been developed, which allow for the autonomous collection and preservation of samples by ship-deployed instruments. However, access to these technologies is not always feasible. In the common case that it is not, samples should be preserved as soon as possible after collection.
Here we present a protocol for extracting protein from cells stored in RNA stabilization solution and collected on a cartridge filter unit, which is commonly used in aquatic microbiology. The protocol includes cell lysis using an SDS-lysis solution and heating, followed by a protein concentration step using centrifugal ultrafiltration units that doubles as a necessary desalting step. It must be noted that the concentrating and desalting steps cannot be overlooked. We found that a minimum of three buffer exchange steps was required to desalt our concentrate. Due to the high salt concentration of the RNA stabilization solution, if proper desalting does not occur, too much salt will be precipitated during the overnight protein precipitation step and the desalting and precipitation steps will have to be repeated. Additionally, if desalting is not properly performed, the 1D-PAGE will not work and the samples will be lost.
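The benefit of repeated buffer exchange follows from simple dilution arithmetic: each cycle of refilling and re-concentrating dilutes the retained salt by the ratio of the retained volume to the refilled volume. The Python sketch below illustrates this, assuming that salt passes freely through the membrane and that mixing is complete. The volumes used (0.25 ml retained, 4 ml after refilling) are hypothetical placeholders and are not taken from this protocol.

```python
def remaining_salt_fraction(retained_ml, refilled_ml, n_exchanges):
    """Fraction of the starting salt concentration left after n exchanges.

    Each exchange tops the retentate up from retained_ml to refilled_ml
    with salt-free buffer and then concentrates back to retained_ml.
    Idealized model: salt is not retained by the membrane.
    """
    if retained_ml <= 0 or refilled_ml < retained_ml or n_exchanges < 0:
        raise ValueError("volumes must be positive and refilled >= retained")
    return (retained_ml / refilled_ml) ** n_exchanges


# Hypothetical volumes chosen only to show the trend.
for n in range(1, 4):
    fraction = remaining_salt_fraction(0.25, 4.0, n)
    print(f"after {n} exchange(s): {fraction:.4%} of the starting salt")
```

Because the reduction is multiplicative, each additional exchange lowers the residual salt by the same factor, which is consistent with several exchanges being needed for a high-salt storage solution.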
Next, the concentrated lysate was divided so that both protein precipitation and DNA precipitation could be performed. This is useful as it is often desirable that metagenomic and metaproteomic data be generated from the same samples. If a protein is not represented in the protein sequence database then the peptide will not be identified. Including the genomic data from the same sample as the proteomic data reduces the risk of not being able to identify a protein due to its absence from the database.
Although this protocol was optimized for use with cartridge filter units and validated on coastal ocean microbial communities, it can be adapted for use with other types of environmental samples and filters. However, the success of this protocol depends on an adequate amount of starting biomass. Therefore, in aquatic ecosystems where biomass may be very low, we recommend increasing the volume of water filtered accordingly.
The authors have nothing to disclose.
The authors would like to acknowledge Marcos Di Falco for his expertise and advice with the preparation of the samples for nano-LC MS/MS as well as Dr. Zoran Minic from the University of Regina for the LC MS/MS analysis. This work was supported by NSERC (DG402214-2011) and CRC (950-221184) funding. D.C. was supported by Concordia Institute for Water, Energy, and Sustainable Systems and FQRNT.
Name | Company | Catalog Number | Comments |
Sterivex-GP 0.22 μm filter unit | Millipore | SVGP01050 | Sampling |
RNAlater Stabilization Solution | Ambion | AM7021 | Sampling |
Tris | Bio Basic | 77-86-1 or TB0196-500G | Protein Extraction/ SDS PAGE gel |
DTT | Sigma-Aldrich | D0632-1G | Protein Extraction |
SDS | Bio Basic | 15-21-3 | Protein Extraction/ SDS PAGE gel |
EDTA | Bio Basic | 6381-92-6 | Protein Extraction |
Glycerol | Fisher Scientific | 56-81-5 | Protein Extraction |
10K Amicon Filter | Millipore | UFC801024 | Protein Extraction |
Methanol | Sigma-Aldrich | 179337-4L | Protein Precipitation |
Acetone | Fisher Scientific | 67-64-1 | Protein Precipitation |
MPC Protein Precipitation reagent | Epicentre | mmP03750 | DNA Precipitation |
2-Propanol | Fisher Scientific | 67-63-0 | DNA Precipitation |
Qubit dsDNA BR Assay kit | Life Technologies | Q32850 | DNA Quantification |
Qubit Protein Assay kit | Life Technologies | Q33211 | Protein Quantification |
Sucrose | Bio Basic | 57-50-1 | SDS PAGE gel |
TEMED | Bio-Rad | 161-0800 | SDS PAGE gel |
APS | Bio-Rad | 161-0700 | SDS PAGE gel |
30% Acrylamide | Bio-Rad | 161-0158 | SDS PAGE gel |
SimplyBlue SafeStain | Invitrogen | LC6060 | SDS PAGE gel |
Glycine | Bio-Rad | 161-0718 | SDS PAGE gel |
β-mercaptoethanol | Bio Basic | 60-24-2 | SDS PAGE gel |
Laemmli Sample Buffer | Bio-Rad | 161-0737 | SDS PAGE gel |
Precision Plus Protein Kaleidoscope Ladder | Bio-Rad | 161-0375EDU | SDS PAGE gel |
Acetonitrile | VWR | CABDH6044-4 | In-gel Trypsin digest |
NH4HCO3 | Bio Basic | 1066-33-7 | In-gel Trypsin digest |
DTT | Sigma-Aldrich | D0632-1G | In-gel Trypsin digest |
Formic Acid | Sigma-Aldrich | F0507-500ML | In-gel Trypsin digest |
HPLC grade H2O | Sigma-Aldrich | 270733-4L | In-gel Trypsin digest |
Iodoacetamide | Bio Basic | 144-48-9 | In-gel Trypsin digest |
Trypsin | Promega | V5111 | In-gel Trypsin digest |
Protein LoBind Tube 1.5 ml | Eppendorf | 22431081 | In-gel Trypsin digest |
2 ml ROBO vial 9 mm | Canadian Life Science | VT009/C395SB | In-gel Trypsin digest |
PP BM insert, No spring | Canadian Life Science | 4025P-631 | In-gel Trypsin digest |