Using the cystic fibrosis airway as an example, the manuscript presents a comprehensive workflow comprising a combination of metagenomic and metatranscriptomic approaches to characterize the microbial and viral communities in animal-associated samples.
The accessibility of high-throughput sequencing has revolutionized many fields of biology. To better understand host-associated viral and microbial communities, a comprehensive workflow for DNA and RNA extraction was developed. The workflow concurrently generates viral and microbial metagenomes, as well as metatranscriptomes, from a single sample for next-generation sequencing. Coupling these approaches provides an overview of both the taxonomic composition and the functions encoded by the community. The methods are demonstrated on Cystic Fibrosis (CF) sputum, a problematic sample type because it is exceptionally viscous and contains high amounts of mucins, free neutrophil DNA, and other unknown contaminants. The protocols described here address these problems and recover viral and microbial DNA with minimal human DNA contamination. To complement the metagenomic studies, a metatranscriptomics protocol was optimized to recover both microbial and host mRNA with relatively few ribosomal RNA (rRNA) sequences. An overview of the data characteristics is presented as a reference for assessing the success of the methods. Additional CF sputum samples were also collected to (i) evaluate the consistency of the microbiome profiles across seven consecutive days within a single patient, and (ii) compare the metagenomic approach with 16S ribosomal RNA gene-based sequencing. The results showed that daily fluctuation of microbial profiles without antibiotic perturbation was minimal, and that the taxonomic profiles of the common CF-associated bacteria were highly similar between the 16S rDNA libraries and the metagenomes generated from hypotonic lysis (HL)-derived DNA.
However, the differences between the 16S rDNA taxonomic profiles generated from total DNA and from HL-derived DNA suggest that hypotonic lysis and the washing steps remove not only human-derived DNA but also microbial-derived extracellular DNA, which may misrepresent the actual microbial profiles.
Viral and microbial communities associated with the human body have been investigated extensively in the past decade through the application of sequencing technologies1,2. These studies have established the importance of microbes in human health and disease. The major initiative was the Human Microbiome Project, which describes the bacteria (and some archaea) residing on human skin and within the oral cavity, airways, urogenital tract, and gastrointestinal tract3. Further microbiome studies of healthy human airways using bronchoalveolar lavage (BAL)4,5 and nasopharyngeal swabs4 have shown that the lung can act as an environmental sampling device, resulting in transient microbial colonization of the airways. However, microbial colonization of impaired airway surfaces can lead to severe and chronic lung infections, such as those seen in Cystic Fibrosis (CF) patients.
CF is a lethal genetic disease caused by mutations in the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) gene6. These mutations give rise to defective CFTR proteins that in turn impair transepithelial ion transport across the apical surface of the epithelium. The disease affects multiple organ systems, but most of the mortality and morbidity is attributable to CF lung disease7. The CF lung provides a unique ecosystem for microbial colonization. The defect in ion transport causes mucus to build up in the CF airways, creating microenvironments consisting of aerobic, microaerophilic, and anaerobic compartments anchored by a static, nutrient-rich mucosal surface. This environment facilitates the colonization and proliferation of microbes, including viruses, bacteria, and fungi. Acute and chronic pulmonary microbial infections lead to constant but ineffective immune responses, resulting in extensive airway remodeling, loss of pulmonary capacity, and ultimately pulmonary failure.
Bacterial communities associated with the CF lung have been well described using both culture-dependent and culture-independent approaches, including 16S ribosomal RNA (rRNA) gene sequencing8 and shotgun metagenomics9,10. The 16S rRNA-based approach can characterize a wide range of microbial species and capture broad shifts in community diversity. However, its resolution in defining the communities is limited (summarized in Claesson et al. 201011), and predictions of metabolic potential are restricted to the general functions known for the identified taxa. Therefore, 16S rRNA gene sequencing alone lacks the taxonomic and functional resolution needed to analyze the diverse microbial communities present in CF lungs. The metagenomic approach described here complements the 16S rRNA-based approach, addresses these limitations, and provides a relatively effective way to analyze both the taxonomy and the genetic content of microbial communities in CF lungs.
Microbial DNA isolated from animal-associated samples often contains a large amount of host DNA. CF sputum or lung tissue samples usually contain a large amount of human DNA released by neutrophils during the immune response, often more than 99% of the total DNA12–14. Although some intact human cells may be present, most of this DNA is free in solution or adsorbed to the surface of microbes. In addition, exceptionally viscous mucus plugs, cellular debris, and other unknown contaminants further complicate the isolation of microbial cells. Several methods were tested for depleting these samples of human DNA, including Percoll gradients to separate human from microbial cells15, treatment with DNase I, ethidium bromide monoazide to selectively degrade human DNA16, and the MolYsis kit, all with limited success. To date, the most effective microbial DNA purification procedure for CF sputum has been a modification of the process described by Breitenstein et al. (1995)17. This approach, herein called the hypotonic lysis (HL) method, combines β-mercaptoethanol to reduce mucin disulfide bonds, hypotonic lysis of eukaryotic cells, and DNase I treatment of soluble DNA9. Despite the lack of alternatives, the HL method raises some concerns regarding (i) possible biases resulting from unwanted lysis of microbes and (ii) whether the observed fluctuations in community composition9,10 are artifacts of variation in sample processing. In addition to generating shotgun metagenomes, we address these issues by comparing the 16S rRNA gene profiles of total DNA and of microbial DNA extracted with the HL method, using the same set of sputum samples collected from a single patient across seven consecutive days.
Compared to microbial communities, viral communities associated with animals have been characterized less extensively18,19. The viral communities in CF airways have been characterized only minimally20–22. The first metagenomic study characterizing the DNA viral communities in CF airways showed that most viruses associated with CF lungs are phages20. The metabolic potential of phages in CF and non-CF individuals differed significantly. Specifically, the phage communities in CF individuals carried genes reflecting bacterial host adaptation to the physiology of CF airways, as well as bacterial virulence20. Subsequent metagenomic studies of viruses in CF lung tissue demonstrated distinct spatial heterogeneity of viral communities among anatomical regions22. In addition, CF lung tissue harbored the lowest viral diversity observed to date in any ecosystem22. Most viruses identified were phages with the potential to infect CF pathogens. However, eukaryotic viruses such as herpesviruses, adenoviruses, and human papillomaviruses (HPV) were also detected. In one case, in which cysts were observed in the lung tissue during dissection, more than 99% of a human papillomavirus genome was recovered, even though the patient was never diagnosed with a pulmonary papilloma or carcinoma. This suggests that the viral diversity present not only reflects the severity of tissue damage but may also reveal an underlying, uncharacterized disease. The protocols described here provide a simple yet powerful way to isolate viral-like particles (VLPs) from samples that contain large amounts of thick mucus, host and microbial cells, free DNA, and cell debris.
Complementing metagenomics, metatranscriptomics is used to monitor the dynamics of gene expression across the microbial community and the host9,23. In this case, both microbial and host mRNA need to be preferentially selected. Because bacterial mRNAs are not polyadenylated, an oligo-dT-based mRNA pull-down method cannot be used. For the same reason, polyadenylation-dependent RNA amplification is unsuitable for host-associated samples known to contain large amounts of eukaryotic mRNA, because it would preferentially amplify host transcripts. Many animal-associated samples, including CF sputum, contain a high density of cells in addition to large amounts of cellular debris and nucleases, including RNases. Therefore, another challenge is to prevent extensive RNA degradation during metatranscriptome processing. In most cases, total RNA extracted from CF sputum is partially degraded, limiting the downstream applications and utility of the derived RNA. In recent years, several approaches for rRNA depletion have been developed and adapted into commercially available kits. However, the efficacy of these approaches is limited, especially when working with partially degraded rRNA9,24. The methods employed here allow the retrieval of partially degraded total RNA suitable for efficient downstream removal of total rRNA. A direct comparison of two different kits for rRNA removal from partially degraded total RNA was presented by Lim et al. (2012)9.
Overall, the goal of this manuscript is to provide a complete set of protocols (Figure 1) to generate viral and microbial shotgun metagenomes, and a metatranscriptome, from a single animal-associated sample, using induced sputum as an example. The molecular laboratory workflow should include separate pre- and post-amplification areas to minimize cross-contamination. The methods are easily adaptable to other sample types such as tissue22, nasopharyngeal and oropharyngeal swabs25, bronchoalveolar lavage (BAL), and coral (unpublished data). Each sample should be processed immediately upon collection, especially when microbial metagenomics and metatranscriptomics studies are desired. Freezing limits the isolation of intact microbial cells for microbial metagenomes, as it can disrupt cell integrity. Freezing does not preclude metatranscriptomics and viral isolation, but the freeze-thaw process may affect the quality of the RNA and the amount of viral particles recovered. It is important to note that induced sputum has served as the primary sample source in many studies of adult CF patients and other chronic pulmonary diseases26,27, because BAL can be too invasive. In our studies, sputum samples were collected with a careful and consistent sampling method, i.e., following a mouthwash and rinsing of the oral cavity with sterile saline solution, to minimize contamination of the sputum samples by oral microbes.
NOTE: Induced sputum samples were collected in accordance with the University of California Institutional Review Board (HRPP 081500) and San Diego State University Institutional Review Board (SDSU IRB#2121), by the research coordinator of the University of California, San Diego (UCSD) adult CF clinic.
1. Sample Collection and Pre-treatment (Pre-treat Samples within 30 Min After Collection)
2. Generating Viral Metagenome
3. Generating Microbial Metagenome
4. Generating Metatranscriptome
Viral Metagenomes
CF sputum is exceptionally viscous and contains high amounts of mucin and free DNA (Figure 2A); density gradient ultracentrifugation facilitates the elimination of host-derived DNA (Figure 2B). Results from a previous study9, comprising eight viromes generated with the presented workflow, are summarized in Table 1. Seven samples (CF1-D, CF1-E, CF1-F, CF4-B, CF4-C, CF5-A, and CF5-B; Table 1) were processed as described in Section 2. The resulting viromes contained few human-derived sequences (0.02%-3.7%), with one exception (CF4-C, 70%). Sample CF4-A was not subjected to density gradient ultracentrifugation, and its virome contained >97% human-derived sequences (Table 1). Figure 2 shows an example of epifluorescence microscopy images of a typical CF sputum sample before (Figure 2A) and after (Figure 2C) density gradient ultracentrifugation. Distinct VLPs, and no large particles, were observed in the micrographs following density gradient separation. After VLP DNA extraction, bacterial contamination is often assessed by 16S rDNA PCR amplification before the VLP DNA is sequenced.
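The host-derived percentages in Table 1 are ratios of host reads to preprocessed reads. The Python sketch below recomputes them from the Table 1 counts and flags viromes above a cutoff. The 5% cutoff and the function names are illustrative assumptions, not part of the published workflow.

```python
# (host-derived reads, preprocessed reads) per virome, copied from Table 1.
TABLE1 = {
    "CF1-D": (240, 109_389),
    "CF1-E": (526, 73_624),
    "CF1-F": (28, 67_070),
    "CF4-A": (79_774, 82_011),
    "CF4-B": (13, 68_617),
    "CF4-C": (797, 1_137),
    "CF5-A": (585, 215_808),
    "CF5-B": (5_859, 158_432),
}


def host_fraction(host_reads: int, preprocessed_reads: int) -> float:
    """Percentage of preprocessed reads classified as host-derived."""
    return 100.0 * host_reads / preprocessed_reads


def flag_contaminated(table, threshold_pct: float = 5.0):
    """Sample IDs whose host fraction exceeds a (hypothetical) cutoff."""
    return sorted(
        sample
        for sample, (host, pre) in table.items()
        if host_fraction(host, pre) > threshold_pct
    )


if __name__ == "__main__":
    for sample, (host, pre) in TABLE1.items():
        print(f"{sample}: {host_fraction(host, pre):.2f}% host-derived")
    print("Above cutoff:", flag_contaminated(TABLE1))
```

With these counts, only CF4-A (no density gradient) and CF4-C exceed the assumed cutoff.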
Microbial Metagenomes
The seven sputum samples presented here were collected from a single CF patient on seven consecutive days. The patient started oral antibiotics (ciprofloxacin and doxycycline) on Day 3, after that day's sputum sample was collected. Each sputum sample from this patient was 15 ml in volume throughout the 7 days; therefore, PBS was not added to the samples. The goal of this sampling was to evaluate the protocols presented in this workflow by (i) evaluating the daily fluctuation of microbial community structure, and (ii) comparing the microbial community structure and resolution obtained by metagenomics and by 16S rDNA sequencing. Therefore, both total DNA and HL-DNA were extracted from each sample.
The HL-DNA concentration of each sputum sample after DNA extraction is presented in Table 2. The total yield of HL-DNA ranged from 210 ng to >5 μg. Illumina sequencing libraries were generated from 1 ng of starting material for each sample (Figure 3). The characteristics of the metagenomic data are presented in Table 2. All but one library yielded more than 1 million sequences, and more than 85% of the sequences in each library were retained as high quality after preprocessing with the PRINSEQ29 software. All datasets were first preprocessed to remove duplicates and low-quality sequences (minimum quality score of 25), and human-derived sequences were then identified and removed using DeconSeq30. The amount of human-derived sequence contamination is highly dependent on the sample properties; here, human-derived sequences made up 14-46% of the preprocessed reads (Table 2). The preprocessed sequences were then annotated using the Metaphlan31 pipeline and the MG-RAST32 server.
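The duplicate-removal and quality-filtering step can be sketched in Python as below. This is a simplified stand-in written for illustration, not PRINSEQ's implementation: it assumes Phred+33 quality encoding, treats only exact sequence matches as duplicates, and applies the minimum score of 25 to each read's mean quality. The subsequent DeconSeq screening for human reads is not reproduced.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (read ID, nucleotide sequence, quality string).
Read = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def preprocess(reads: Iterable[Read], min_mean_quality: float = 25.0) -> Iterator[Read]:
    """Yield reads whose mean quality meets the cutoff, keeping only the first
    copy of each exact-duplicate sequence among the reads that pass."""
    seen = set()
    for read_id, seq, qual in reads:
        if mean_phred(qual) < min_mean_quality:
            continue  # low-quality read: discard
        if seq in seen:
            continue  # exact duplicate of a read already kept: discard
        seen.add(seq)
        yield read_id, seq, qual


if __name__ == "__main__":
    demo = [
        ("r1", "ACGT", "IIII"),  # 'I' encodes Phred 40
        ("r2", "ACGT", "IIII"),  # exact duplicate of r1
        ("r3", "GGGG", "####"),  # '#' encodes Phred 2, fails the cutoff
    ]
    print([read_id for read_id, _, _ in preprocess(demo)])  # ['r1']
```

Filtering on quality before deduplication is a design choice here: a high-quality copy of a sequence is kept even if an earlier copy of it failed the cutoff.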
In addition to metagenomes, 16S rDNA amplicon libraries were generated from both the total DNA and the HL-DNA using primers targeting approximately 300 bp of the V1-V2 variable region of the 16S rRNA gene33,34. PCR products from individual samples were normalized and pooled, then sequenced on the Illumina MiSeq platform with 500-cycle paired-end chemistry. Paired-end 16S rDNA amplicon sequences were sorted by sample via barcodes using a Python script, and the paired reads were assembled using phrap35,36. Assembled sequence ends were trimmed until the average quality score within a 5 nt window was ≥20. Potential chimeras were then removed using Uchime37 against a chimera-free subset of the SILVA38 reference sequences. Taxonomy was assigned to the high-quality reads with SINA39 (version 1.2.11) using the 418,497 bacterial sequences from the SILVA38 database. Sequences with identical taxonomic assignments were clustered to produce Operational Taxonomic Units (OTUs). This process generated 1,655,278 sequences for 16 samples (average: 103,455 sequences/sample; min: 72,603; max: 127,113). The median Good's coverage score, a measure of sequencing completeness, was ≥99.9%. The software package Explicet40 (v2.9.4, www.explicet.org) was used for analysis and figure generation. Alpha-diversity (intra-sample) and beta-diversity (inter-sample) were calculated in Explicet at a rarefaction depth of 72,603 sequences with 100 bootstrap re-samplings.
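Three of the calculations named above can be sketched in Python: sliding-window end trimming, Good's coverage (C = 1 - F1/N, where F1 is the number of singleton OTUs and N the number of reads), and rarefied richness. These are illustrative re-implementations, not the code used by phrap, Explicet, or the authors; the exact trimming rule and the use of subsampling without replacement for rarefaction are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def trim_ends(seq: str, quals: Sequence[int], window: int = 5,
              min_avg: float = 20.0) -> Tuple[str, List[int]]:
    """Trim bases from each end until the terminal window's mean quality is
    >= min_avg. Returns an empty read if fewer than `window` bases remain."""
    start, end = 0, len(seq)
    # 5' end: advance while the leading window is below the threshold.
    while end - start >= window and sum(quals[start:start + window]) / window < min_avg:
        start += 1
    # 3' end: retreat while the trailing window is below the threshold.
    while end - start >= window and sum(quals[end - window:end]) / window < min_avg:
        end -= 1
    if end - start < window:
        return "", []
    return seq[start:end], list(quals[start:end])


def goods_coverage(otu_counts: Sequence[int]) -> float:
    """Good's coverage: 1 - (singleton OTUs / total reads)."""
    n = sum(otu_counts)
    if n == 0:
        raise ValueError("sample has no reads")
    singletons = sum(1 for count in otu_counts if count == 1)
    return 1.0 - singletons / n


def rarefied_richness(otu_counts: Sequence[int], depth: int,
                      iterations: int = 100, seed: int = 0) -> float:
    """Mean number of OTUs observed when `depth` reads are drawn without
    replacement, averaged over `iterations` random subsamples."""
    pool = [otu for otu, count in enumerate(otu_counts) for _ in range(count)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds library size")
    rng = random.Random(seed)
    observed = [len(set(rng.sample(pool, depth))) for _ in range(iterations)]
    return sum(observed) / iterations
```

Applied to this dataset, each sample's OTU count vector would be rarefied with `depth=72_603`, the size of the smallest library, so that richness is compared at equal sequencing effort.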
The first question addressed by this study was whether hypotonic lysis preferentially selects for (i.e., preferentially retains or lyses) particular groups of microbes. After the first hypotonic lysis, re-suspended pelleted cells were subsampled from the first two samples (CF1-1A* and CF1-2A*) for comparison with the same samples after the second hypotonic lysis (CF1-1 and CF1-2). All samples were treated identically: DNase I treatment, followed by DNA extraction and the sequencing pipeline. As shown in Figure 4, the microbial profiles of the subsamples are highly similar to those of the samples after two hypotonic lysis treatments. In addition, the second hypotonic lysis increases the fraction of non-human sequences within the metagenomes by 6-17 percentage points (Table 2).
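The 6-17 range refers to percentage points of processed reads. A short Python check using the Table 2 counts (helper names are hypothetical) shows that the unrounded gains are about 6.6 and 16.5 percentage points; the 6-17 range follows from the rounded percentages reported in the table.

```python
# (processed reads, non-human reads), copied from Table 2.
TABLE2 = {
    "CF1-1A*": (937_688, 691_541),  # subsample after the first hypotonic lysis
    "CF1-1": (1_958_910, 1_574_520),
    "CF1-2A*": (588_106, 407_530),  # subsample after the first hypotonic lysis
    "CF1-2": (1_697_010, 1_455_174),
}


def non_human_pct(processed: int, non_human: int) -> float:
    """Non-human reads as a percentage of processed reads."""
    return 100.0 * non_human / processed


def gain_after_second_lysis(single: str, double: str) -> float:
    """Percentage-point change in non-human reads from one to two lysis rounds."""
    return non_human_pct(*TABLE2[double]) - non_human_pct(*TABLE2[single])


if __name__ == "__main__":
    for single, double in [("CF1-1A*", "CF1-1"), ("CF1-2A*", "CF1-2")]:
        gain = gain_after_second_lysis(single, double)
        print(f"{double}: +{gain:.1f} percentage points")
```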
To test for differences in microbial composition between metagenomic and 16S rDNA-based profiling, and for changes before and after hypotonic lysis that might explain the differences previously seen between our studies and those of others, bacterial 16S rDNA sequencing libraries were generated from both the total DNA and the HL-derived DNA (Figure 4B). At the genus level, the taxonomic profiles of the common CF-associated bacteria, such as Pseudomonas, Stenotrophomonas, Prevotella, Veillonella, and Streptococcus, were highly similar between the 16S rDNA libraries and the metagenomes generated from the HL-derived DNA. However, Rothia was less abundant in the 16S rDNA libraries than in the metagenomic libraries. When the 16S rDNA taxonomic profiles generated from total DNA and HL-derived DNA were compared, Pseudomonas was differentially represented in the total DNA relative to the HL-derived DNA starting from Day 3.
Metatranscriptomes
Typically, the total RNA extracted from CF sputum is partially degraded, with sizes ranging from 25 to 4,000 nt (Figures 5A and 5C). The representative results presented here were previously published in Lim et al. 20129. The fraction of rRNA within the non-depleted metatranscriptomes ranged from 27% to 83% and varied across samples (Table 3; data extracted from Lim et al.9). Depletion with the Ribo-Zero kit decreased the relative abundance of rRNA to 1-5%, with the exception of sample CF1-F. The variation in the effectiveness of rRNA removal could reflect the quality of the extracted RNA, or differences in the microbial community present and hence in the accessibility of rRNAs for probe hybridization9. The electropherograms of a successful (Figure 5B) and an unsuccessful (Figure 5D) rRNA removal using the Ribo-Zero rRNA removal kit differ in that rRNA peaks remain visible after unsuccessful removal.
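Table 3 expresses the rRNA removed as a percentage of the rRNA fraction in the non-depleted aliquot. Assuming that each fraction is total rRNA reads divided by preprocessed reads, the Python sketch below reproduces the reported 95%, 5%, 97%, and 91% after rounding. The names are illustrative, not from the source.

```python
# For each sample: ((rRNA reads, preprocessed reads) without depletion,
#                   (rRNA reads, preprocessed reads) after Ribo-Zero), from Table 3.
TABLE3 = {
    "CF1-D": ((1_737, 2_088), (91, 1_991)),
    "CF1-F": ((29_499, 40_876), (17_267, 25_238)),
    "CF4-B": ((5_285, 19_728), (291, 32_737)),
    "CF4-C": ((16_371, 31_791), (1_761, 36_172)),
}


def rrna_fraction(rrna_reads: int, total_reads: int) -> float:
    """Fraction of preprocessed reads that are rRNA."""
    return rrna_reads / total_reads


def pct_rrna_removed(untreated, depleted) -> float:
    """rRNA removed, as a percentage of the rRNA fraction before depletion."""
    before = rrna_fraction(*untreated)
    after = rrna_fraction(*depleted)
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for sample, (untreated, depleted) in TABLE3.items():
        print(f"{sample}: {pct_rrna_removed(untreated, depleted):.1f}% rRNA removed")
```

Comparing fractions rather than raw counts matters here, because the depleted and non-depleted libraries differ in size (for example, 40,876 versus 25,238 reads for CF1-F).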
The size range of the cDNA libraries often reflects the size range of the starting RNA sample. The cDNA libraries presented here were generated with a whole transcriptome amplification kit (WTA2) after rRNA depletion, followed by Roche-454 sequencing library preparation9. The resulting cDNA fragments range from 50 to 4,000 bp (Figures 5E and 5F), and the size distribution is highly consistent across samples (Lim et al. 2012)9. Platform-specific RNA-Seq library preparation kits now offer alternatives that combine cDNA synthesis and sequencing library preparation under optimized conditions. One option recommended to date is the ScriptSeq Complete Gold Kit, which combines the rRNA removal reagents recommended above with an RNA-Seq library preparation kit.
Figure 1: Workflow for the preparation of host-associated samples, such as sputum sample, for virome, microbiome, and metatranscriptome sequencing.
Figure 2: Cesium chloride density gradient ultracentrifugation facilitates the elimination of extracellular DNA and large particles (A) and allows optimal isolation of viral-like particles from CF sputum. One milliliter of each gradient solution is layered on top of the previous one before loading the pre-treated sample (B). Following particle isolation and purification, epifluorescence microscopy with a nucleic acid dye such as SYBR Gold is used to verify the presence and purity of viral particles in samples. Distinct viral-like particles (C; white arrow) were observed following density gradient separation of a CF sputum sample.
Figure 3: Example of the size distribution of Nextera XT libraries generated from 1 ng of HL-DNA to produce CF sputum microbiomes. Library normalization, pooling, and loading were performed according to the manufacturer's protocol without modification.
Figure 4: Taxonomic analysis of the microbial communities in nine samples collected longitudinally from one CF patient. (A) Microbial profiles based on the metagenomic libraries generated from hypotonic lysis method-based DNA. Species assignment was based on the Metaphlan pipeline, following data preprocessing that removed duplicates, low-quality sequences, and sequences with homology to human sequences. To show that the two-step hypotonic lysis did not preferentially select for particular groups of microbes, subsamples (*) taken after the first hypotonic lysis were included. (B) Microbial profiles based on sequencing of the V1-V2 region of the 16S rRNA gene from total DNA (T) and hypotonic lysis method-based DNA (HL). These data have not been previously published.
Figure 5: Examples of Agilent 2100 Bioanalyzer electropherograms of RNA (A-D) and cDNA (E-F) generated for the metatranscriptomic libraries, using RNA Pico and high-sensitivity dsDNA chips, respectively. (A) and (C) show electropherograms before rRNA removal. The electropherograms of a successful (B) and an unsuccessful (D) rRNA removal using a total rRNA removal kit differ in that rRNA peaks remain visible after unsuccessful removal. The size range of the cDNA (E-F) generated using the whole-transcriptome amplification kit (Sigma-Aldrich) is similar to that of the starting rRNA-depleted RNA and highly consistent across the two samples.
Sample | CF1-D | CF1-E | CF1-F | CF4-A | CF4-B | CF4-C | CF5-A | CF5-B |
Total number of reads | 224,859 | 87,891 | 106,189 | 93,301 | 140,020 | 1,558 | 272,552 | 217,438 |
Preprocessed reads^a (% of total reads) | 109,389 (49%) | 73,624 (84%) | 67,070 (63%) | 82,011 (88%) | 68,617 (49%) | 1,137 (73%) | 215,808 (79%) | 158,432 (73%) |
Number of bases | 47,239,573 | 33,351,525 | 28,922,479 | 27,667,695 | 29,386,841 | 243,986 | 95,205,805 | 69,581,811 |
Mean read length (bp) | 432 | 453 | 431 | 337 | 428 | 215 | 441 | 439 |
Host sequences^b | 240 (0.21%) | 526 (0.71%) | 28 (0.04%) | 79,774 (97.27%) | 13 (0.02%) | 797 (70.10%) | 585 (0.27%) | 5,859 (3.70%) |
Viral hits^c | 7,214 (6.59%) | 23,550 (31.99%) | 4,070 (6.07%) | 737 (0.90%) | 4,642 (6.77%) | 22 (1.93%) | 6,466 (3.00%) | 5,981 (3.78%) |
Unassigned reads^d | 103,888 (94.97%) | 60,490 (82.16%) | 32,780 (48.87%) | 1,935 (2.36%) | 68,440 (99.74%) | 311 (27.35%) | 105,612 (48.94%) | 119,551 (75.46%) |
a Reads after data preprocessing by PRINSEQ29.
b Human reads identified by DeconSeq30 plus reads with a best BLASTn hit (NCBI nucleotide database) to the phylum Chordata.
c tBLASTx hits against an in-house viral genome database. The percentage was calculated using the total number of preprocessed reads.
d Reads with no BLASTn hit against the NCBI nucleotide database. The percentage was calculated using the total number of preprocessed reads. Some reads with no BLASTn hit against the NCBI nucleotide database were identified as viral at the protein level in the tBLASTx analysis.
Table 1: Library characteristics of eight viromes generated from sputum samples using the presented workflow. This table is extracted from Lim et al. (2012)9. Seven samples (CF1-D, CF1-E, CF1-F, CF4-B, CF4-C, CF5-A, and CF5-B) were processed as described in Section 2 and generated viromes that contained few human-derived sequences (0.02%-3.7%), with one exception (CF4-C, 70%). Sample CF4-A was not subjected to density gradient ultracentrifugation, and its virome contained >97% human-derived sequences.
Sample | Concentration (ng/μl) | Total yield (ng) | Total no. reads (raw^a) | Total no. reads (processed^b) | Non-human sequences (%) |
CF1-1A* | 2.3 | 230 | 1,098,454 | 937,688 | 691,541 (74%) |
CF1-1 | 13 | 1,300 | 2,212,756 | 1,958,910 | 1,574,520 (80%) |
CF1-2A* | 2.1 | 210 | 672,878 | 588,106 | 407,530 (69%) |
CF1-2 | 5.2 | 520 | 1,944,012 | 1,697,010 | 1,455,174 (86%) |
CF1-3 | 28.8 | 2,880 | 1,048,304 | 896,756 | 560,852 (63%) |
CF1-4 | 24.1 | 2,410 | 1,154,922 | 984,702 | 621,098 (63%) |
CF1-5 | 33.6 | 3,360 | 1,029,622 | 888,630 | 481,548 (54%) |
CF1-6 | 43.2 | 4,320 | 1,434,016 | 1,256,504 | 725,858 (58%) |
CF1-7 | 57.8 | 5,780 | 1,000,174 | 872,036 | 565,376 (65%) |
* 1 ml of sample was subsampled from CF1-1 and CF1-2 following the first hypotonic lysis step (Step 3.1.5), before the second hypotonic lysis. The cells were spun down as described in Step 3.1.7 and processed through the remaining protocol without modification.
a Unprocessed Illumina reads from a 2 x 300 bp MiSeq sequencing run.
b Reads were assessed, trimmed, and removed based on quality and length as described for the microbial metagenomes above.
Table 2: Characteristics of microbiomes generated from sputum samples using the presented workflow. The DNA concentration of each sample in 100 μl of elution buffer (5 mM Tris/HCl, pH 8.5) and the characteristics of the sequence data are presented. One nanogram of DNA was used to generate each library with the Nextera XT library preparation kit.
Sample (treatment) | CF1-D (none) | CF1-D (Ribo-Zero) | CF1-F (none) | CF1-F (Ribo-Zero) | CF4-B (none) | CF4-B (Ribo-Zero) | CF4-C (none) | CF4-C (Ribo-Zero) |
Preprocessed reads | 2,088 | 1,991 | 40,876 | 25,238 | 19,728 | 32,737 | 31,791 | 36,172 |
Mean read length (bp) | 275 | 245 | 262 | 270 | 233 | 259 | 240 | 267 |
Total rRNA reads | 1,737 (83.2%) | 91 (4.6%) | 29,499 (72.2%) | 17,267 (68.4%) | 5,285 (26.8%) | 291 (0.9%) | 16,371 (51.5%) | 1,761 (4.9%) |
Microbial rRNA | 1,414 (67.7%) | 32 (1.6%) | 19,978 (48.9%) | 12,035 (47.7%) | 23 (0.1%) | 227 (0.7%) | 6,916 (21.8%) | 1,076 (3.0%) |
Eukaryotic rRNA | 323 (15.5%) | 59 (3.0%) | 9,520 (23.3%) | 5,232 (20.7%) | 5,262 (26.7%) | 64 (0.2%) | 9,455 (29.7%) | 683 (1.9%) |
% rRNA removed* | 0% | 95% | 0% | 5% | 0% | 97% | 0% | 91% |
Non-rRNA reads | 351 (16.8%) | 1,900 (95.4%) | 11,377 (27.8%) | 7,971 (31.6%) | 14,443 (73.2%) | 32,446 (99.1%) | 15,420 (48.5%) | 34,411 (95.1%) |
Total NR hits | 102 (4.9%) | 691 (34.7%) | 3,327 (8.1%) | 2,857 (11.3%) | 4,938 (25.0%) | 10,751 (32.8%) | 5,905 (18.6%) | 15,766 (43.6%) |
NR hits: eukaryotic | 74 | 407 | 2,790 | 2,524 | 4,614 | 10,227 | 4,553 | 8,274 |
NR hits: bacterial | 26 | 283 | 520 | 312 | 287 | 471 | 1,326 | 7,442 |
Unassigned reads | 249 (11.9%) | 1,209 (60.7%) | 8,050 (19.7%) | 5,114 (20.3%) | 9,505 (48.2%) | 21,695 (66.3%) | 9,515 (29.9%) | 18,645 (51.5%) |
*The amount of rRNA removed expressed as a percentage of the amount present in the non-depleted aliquot.
Table 3: Library characteristics of the metatranscriptomes with and without rRNA depletion. The data are extracted from Lim et al. (2012)9, which also compares other rRNA removal kits and examines the effect of cDNA nebulization before sequencing library preparation.
Viral Metagenomics
Viral particles are concentrated using polyethylene glycol (PEG) precipitation or small-volume concentrators. In some cases, concentration may not be needed, but pre-filtration or low-speed centrifugation steps are used to remove eukaryotic and microbial cells. Viral lysates are then further enriched and purified using density gradient ultracentrifugation9,41 or small-pore filters (e.g., 0.45 μm) to remove eukaryotic and large microbial cells25. Density gradient ultracentrifugation is typically performed with dense but inert solutions such as sucrose or cesium chloride to isolate and concentrate viral particles41. Physical separation is based on the size and buoyant density of viral particles. Therefore, proper choice of filter pore size and rigorous preparation of gradients are essential to isolate specific viral communities, because the physical recovery of VLPs determines which community is isolated41 (i.e., viral particles that do not pass through the filter or do not fall within the extraction density will not be detected in the metagenome). After viral isolation and concentration, contaminating non-viral genomic material may remain in the sample, both as free nucleic acids and as microbial and eukaryotic cells. Therefore, it is critical to verify the purity of viral particles in samples (Figure 2). A chloroform treatment is commonly used to lyse remaining cells, followed by nuclease treatment to degrade free nucleic acids before nucleic acid extraction.
A caveat of the presented workflow is the use of density gradient separation to isolate viral particles, as it may exclude enveloped viral particles that are too buoyant to enter the CsCl gradient. An alternative “catch-all” method is to omit the density gradient separation and isolate the community DNA from 0.45 μm filtrates treated with chloroform and DNase I. This approach is also appropriate for small sample volumes, such as those from swabs or blood plasma. However, it may result in contamination by chloroform-resistant bacteria and higher amounts of DNase I-resistant extracellular DNA.
Current sequencing protocols require 1 ng to 1 μg of nucleic acids for sequencing library preparation, and higher DNA yields provide a wider choice of sequencing options. The DNA concentration of generated viromes ranges from below the detection limit to more than 200 ng/μl. When the amount of viral nucleic acid recovered is insufficient for direct sequencing library preparation, nucleic acid amplification is essential. Linker amplification shotgun libraries (LASLs)2,42,43 and whole genome amplification based on multiple displacement amplification (MDA) are the two methods most commonly used to generate sufficient DNA for sequencing. MDA methods, such as those based on Phi29 DNA polymerase, are known to suffer from amplification biases and may preferentially amplify ssDNA and circular DNA, resulting in non-quantitative taxonomic and functional characterization44,45. An optimized version of the LASL approach has been shown to introduce only minimal bias, to offer higher sensitivity (for small amounts of starting material), and to be easily adapted to different sequencing platforms43. However, the approach has many steps, requires specialized equipment to minimize DNA loss, and is limited to dsDNA templates. In our laboratory, this approach has been successfully adapted to amplify both detectable and undetectable amounts of DNA extracted from bronchoalveolar lavage-, coral-, and seawater-derived VLPs (unpublished data and Hurwitz et al.46).
Developing data analysis pipelines has classically been one of the most challenging aspects of viral metagenomics, because viral communities are highly diverse and largely unknown. While there are an estimated 10^8 viral genotypes in the biosphere, current viral databases contain ~4,000 viral genomes, only about 0.004% (roughly 1/25,000) of this estimated total viral diversity. Therefore, similarity-based searches (such as BLAST47) for taxonomic and functional assignment in viral metagenomes face inherent limitations: many sequences have no significant similarity to genomes in the database and are therefore classified as unknown. Although homology-based searches remain the most widely used approach for assigning taxonomy and function to sequence data, alternative database-independent approaches have been developed48-50. Fancello et al.51 provide a complete review of computational tools and algorithms used in viral metagenomics.
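How similarity searches leave many viral reads unclassified can be illustrated with a small Python sketch that reads BLAST tabular output (`-outfmt 6`, whose eleventh column is the e-value) and labels reads without a significant hit as unknown. The e-value cutoff of 1e-5 and the function names are illustrative assumptions, not parameters from the source.

```python
import csv
from typing import Dict, Iterable, List, Sequence, Tuple

EVALUE_COL = 10  # 0-based index of the e-value column in BLAST -outfmt 6 rows


def load_outfmt6(path: str) -> List[List[str]]:
    """Read a tab-separated BLAST tabular (-outfmt 6) file into rows."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


def best_hits(rows: Iterable[Sequence[str]], max_evalue: float = 1e-5) -> Dict[str, str]:
    """Map each query to the subject of its lowest-e-value hit with e-value <= max_evalue."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        query, subject, evalue = row[0], row[1], float(row[EVALUE_COL])
        if evalue > max_evalue:
            continue  # not significant under the assumed cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def classify(read_ids: Iterable[str], rows: Iterable[Sequence[str]],
             max_evalue: float = 1e-5) -> Dict[str, str]:
    """Assign each read its best significant hit, or 'unknown' if it has none."""
    hits = best_hits(rows, max_evalue)
    return {read_id: hits.get(read_id, "unknown") for read_id in read_ids}
```

Reads absent from the BLAST output entirely (no hit at all) and reads with only weak hits both end up as `unknown`, which mirrors how database sparsity inflates the unassigned fraction in Table 1.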
Microbial Metagenomics
Typically, the total amount of DNA extracted from hypotonic lysis-treated microbial communities (HL-DNA) ranges from 20 ng to 5 μg. The yield is highly dependent on the patient's health status and the amount of sputum collected, which explains the variation in the total yield of HL-DNA extracted in this study (Table 2). Generating good-quality sequence data depends critically on the quality of the sequencing libraries. Figure 2 shows a typical size range for sequencing libraries generated from CF sputum-derived microbial DNA using an enzymatic DNA fragmentation procedure. The optimal library size depends on the sequencing platform and application; if necessary, fragmentation can therefore be optimized through alternative approaches such as sonication or nebulization. In addition to the representative results presented here, the success of this method on CF sputum collected from multiple patients across multiple time points is also illustrated in Lim et al. (2012)9 and Lim et al. (2014)10.
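Library size matters in practice partly because sequencers are loaded by molarity, not mass, and the mass-to-molarity conversion depends on fragment length. The sketch below uses the standard approximation of 660 g/mol per double-stranded base pair. The concentration and fragment sizes in the example are hypothetical and are not values from Figure 2.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate molar mass of one dsDNA base pair


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/ul) to nanomolar (nM).

    nM = (ng/ul * 1e6) / (660 g/mol/bp * mean fragment length in bp)
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


if __name__ == "__main__":
    # Same mass concentration, two hypothetical library sizes:
    print(round(library_molarity_nm(2.0, 400), 2))  # shorter fragments: higher nM
    print(round(library_molarity_nm(2.0, 800), 2))  # doubling length halves nM
```

The second example shows why a library that is larger than expected will be under-loaded if its concentration is read only in ng/μl.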
Previous studies9,10 suggest that every patient harbors a unique microbial community that shifts over time: the major players within the community persist, while fluctuations are likely due to perturbations such as antibiotic treatment. Whether such fluctuations also occur daily without external perturbation, or instead arise from sampling and sample processing, remains an open question. Based on the HL-DNA metagenomic and 16S rDNA amplicon analyses, the 7-day longitudinal sampling showed that the daily fluctuation of microbial profiles without antibiotic perturbation (Days 1, 2, and 3) was minimal (Figures 3A and 3B). After oral antibiotics were introduced immediately following the Day 3 sampling, changes in the community profile became apparent on Day 4. Although ciprofloxacin targets a broad spectrum of known bacterial pathogens such as P. aeruginosa, Staphylococcus aureus, and Streptococcus pneumoniae, the treatment increased the relative abundance of P. aeruginosa while decreasing that of Streptococcus spp. and P. melaninogenica. By Day 6, the community had gradually recovered toward its initial structure. These results suggest that fluctuations of microbial profiles within a single patient are more likely driven by community perturbations in the airways, such as antibiotic treatment, than by day-to-day variation or sample processing.
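One way to put a number on "minimal daily fluctuation" is to compute a dissimilarity between consecutive relative-abundance profiles. The sketch below uses Bray-Curtis dissimilarity (0 for identical profiles, 1 for profiles with no shared taxa). The choice of metric is an assumption for illustration, not necessarily the one used for Figure 3. The day profiles are invented numbers, not data from this study.

```python
from typing import Dict, List

Profile = Dict[str, float]  # taxon -> relative abundance


def bray_curtis(a: Profile, b: Profile) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles.

    Taxa missing from one profile are treated as having abundance 0.
    """
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def consecutive_dissimilarities(profiles: List[Profile]) -> List[float]:
    """Dissimilarity between each day and the following day."""
    return [bray_curtis(a, b) for a, b in zip(profiles, profiles[1:])]


if __name__ == "__main__":
    # Hypothetical profiles: a stable pre-treatment pair and a post-antibiotic shift.
    day1 = {"Pseudomonas": 0.60, "Streptococcus": 0.30, "Prevotella": 0.10}
    day2 = {"Pseudomonas": 0.62, "Streptococcus": 0.28, "Prevotella": 0.10}
    day4 = {"Pseudomonas": 0.90, "Streptococcus": 0.05, "Prevotella": 0.05}
    print(round(bray_curtis(day1, day2), 3))  # small value: stable community
    print(round(bray_curtis(day1, day4), 3))  # larger value: perturbed community
```

A small day-to-day dissimilarity before treatment, followed by a jump after treatment, is the pattern the text describes qualitatively.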
Given the consistency between the microbial profiles of the 16S rDNA libraries and the metagenomes from HL-derived DNA, we ruled out biases originating from the 16S rRNA primers used in this study. One possible explanation for the differences between the 16S rDNA taxonomic profiles generated from total DNA and from HL-derived DNA (Figure 3B) is the presence of large amounts of Pseudomonas spp. extracellular DNA after the antibiotic treatment. This is supported by the observation that these differences were most apparent on Day 7, three days after the antibiotic treatment, which targets Pseudomonas spp. among other bacteria. Ciprofloxacin is commonly used as first-line treatment in patients with CF and chronic P. aeruginosa infection, even though its spectrum of activity includes most CF-associated pathogens. We hypothesize that the antibiotic treatment eradicated susceptible community members, including Streptococcus spp., thereby creating a niche filled by resistant P. aeruginosa. P. aeruginosa may gain resistance by expanding its biofilm communities, and extracellular DNA has been shown to be the main structural support of its biofilm architecture52. Even as the community structure recovered, this extracellular DNA may have remained in the CF sputum. Therefore, these data suggest that the hypotonic lysis and washing steps presented in this workflow may help remove not only human-derived DNA but also microbial-derived extracellular DNA that could misrepresent the actual microbial profiles.
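The total-DNA versus HL-DNA comparison can be summarized per taxon as a log2 ratio of relative abundances. Taxa that are strongly over-represented in total DNA are candidates for extracellular-DNA inflation. A high ratio is consistent with this explanation but does not prove it. The sketch below is hypothetical: the pseudocount, the 2-fold (log2 ≥ 1) cutoff, and the example abundances are assumptions, not values from Figure 3B.

```python
import math
from typing import Dict, List

Profile = Dict[str, float]  # taxon -> relative abundance


def log2_ratios(total_dna: Profile, hl_dna: Profile,
                pseudocount: float = 1e-4) -> Dict[str, float]:
    """log2(total-DNA abundance / HL-DNA abundance) for every taxon.

    The pseudocount avoids division by zero for taxa absent from one profile.
    """
    taxa = set(total_dna) | set(hl_dna)
    return {
        t: math.log2((total_dna.get(t, 0.0) + pseudocount)
                     / (hl_dna.get(t, 0.0) + pseudocount))
        for t in taxa
    }


def overrepresented_in_total(total_dna: Profile, hl_dna: Profile,
                             min_log2: float = 1.0) -> List[str]:
    """Taxa at least 2**min_log2-fold more abundant in total DNA than HL-DNA."""
    ratios = log2_ratios(total_dna, hl_dna)
    return sorted(t for t, r in ratios.items() if r >= min_log2)


if __name__ == "__main__":
    # Invented example profiles for one post-treatment time point.
    total = {"Pseudomonas": 0.80, "Streptococcus": 0.15, "Prevotella": 0.05}
    hl = {"Pseudomonas": 0.35, "Streptococcus": 0.45, "Prevotella": 0.20}
    print(overrepresented_in_total(total, hl))
```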
Metatranscriptomics
A high quality metatranscriptome should contain relatively few ribosomal RNA (rRNA) sequences and represent an unbiased sampling of the community transcripts (mRNA). Due to the short half-life and limited amount of mRNA, it is critical that the protocol, as presented here, minimizes sample handling to maximize the number of transcripts recovered.
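Whether a metatranscriptome "contains relatively few rRNA sequences" is usually judged from the fraction of reads that an rRNA-screening step flags. The sketch below assumes that such a step has already produced a set of rRNA read IDs. No specific tool or output format is implied, and the read IDs are made up.

```python
from typing import Iterable, List


def rrna_fraction(read_ids: Iterable[str], rrna_ids: Iterable[str]) -> float:
    """Fraction of reads whose IDs were flagged as rRNA by an upstream screen."""
    reads = list(read_ids)
    if not reads:
        return 0.0
    rrna = set(rrna_ids)
    return sum(1 for r in reads if r in rrna) / len(reads)


def non_rrna_reads(read_ids: Iterable[str], rrna_ids: Iterable[str]) -> List[str]:
    """Reads remaining for mRNA analysis after removing flagged rRNA reads."""
    rrna = set(rrna_ids)
    return [r for r in read_ids if r not in rrna]


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(10)]  # hypothetical read IDs
    flagged = {"read0", "read1"}             # hypothetical rRNA hits
    print(rrna_fraction(reads, flagged))     # fraction of rRNA reads
    print(len(non_rrna_reads(reads, flagged)))
```

Comparing this fraction before and after an rRNA depletion step gives a simple measure of depletion efficiency.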
In recent years, several approaches for rRNA depletion have been developed and adapted into commercially available kits. These include MICROBEnrich, Ribo-Zero, and sample-specific subtractive hybridization53, which are based on oligonucleotide hybridization, and the mRNA-ONLY kit, which is based on exonuclease activity targeting RNA with a 5' monophosphate. In addition, several approaches for mRNA enrichment are available, such as the MessageAmp II-Bacteria Kit, which preferentially polyadenylates and amplifies linear RNA. Some of these methods (e.g., mRNA-ONLY, MICROBExpress, and MessageAmp) are used in combination for optimal efficiency. However, the efficacy of all of these approaches is limited, especially when working with partially degraded rRNA, as is often observed in total RNA extracted from CF samples. Polyadenylation-dependent RNA amplification cannot be used to generate metatranscriptomes consisting of both eukaryotic and prokaryotic mRNA. In addition, the poly(A) tails added to the sequences may reduce the amount of useful sequence data: regions with homopolymer stretches tend to have lower quality scores, causing a significant number of reads to be filtered out by sequencing and post-sequencing software, and the average useful read length after trimming off poly(A) tails is reduced significantly54.
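The loss of useful read length caused by poly(A) tails can be illustrated with a simple 3' trimming step. The sketch below removes only exact terminal runs of A of at least `min_run` bases. Production trimmers also tolerate sequencing errors within the homopolymer, so this is a simplified stand-in, and the `min_run` default of 10 is an arbitrary assumption.

```python
from typing import List, Optional, Tuple


def trim_poly_a(seq: str, qual: Optional[str] = None,
                min_run: int = 10) -> Tuple[str, Optional[str]]:
    """Remove a 3' poly(A) run of at least min_run bases (exact matches only).

    The quality string, if given, is truncated to the same length as the read.
    """
    stripped = seq.rstrip("Aa")
    if len(seq) - len(stripped) >= min_run:
        seq = stripped
        if qual is not None:
            qual = qual[:len(stripped)]
    return seq, qual


def mean_useful_length(reads: List[str], min_run: int = 10) -> float:
    """Average read length remaining after poly(A) trimming."""
    if not reads:
        return 0.0
    return sum(len(trim_poly_a(r, min_run=min_run)[0]) for r in reads) / len(reads)


if __name__ == "__main__":
    reads = ["ACGTGCT" + "A" * 12, "ACGTAAA"]  # hypothetical reads
    print([len(r) for r in reads])            # raw lengths
    print(mean_useful_length(reads))          # mean length after trimming
```

Here the first read loses more than half its length. This shows on a small scale how polyadenylation-based enrichment can shrink the average useful read length.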
For complex CF microbial communities and partially degraded RNA (Figures 4A and 4C), our previous study showed that hybridization capture with the Ribo-Zero Gold kit was more effective at removing both human and microbial rRNA than combined treatments using other kits9 (Table 3). The resulting data allow concurrent analysis of both human host and microbial transcripts. Depending on the yield and quality of the RNA, as well as the choice of sequencing platform, many of these steps, including cDNA synthesis, can be streamlined into sequencing library preparation. For example, Ribo-Zero-treated RNA can be used to make metatranscriptome sequencing libraries with the ScriptSeq RNA-Seq Library Preparation Kit.
Metagenomic analysis of animal-associated communities provides a comprehensive representation of the overall functional entity that includes the host and its associated communities. The workflow presented here is adaptable to a variety of complex animal-associated samples, especially those that contain thick mucus, high amounts of cell debris, extracellular DNA, protein and glycoprotein complexes, and host cells in addition to the desired viral and microbial particles. Even though viral and microbial particles may be lost at every step, particle isolation and purification are essential to minimize the amount of host DNA. While metagenomic data reveal the metabolic potential of the communities examined, metatranscriptomics complements them by revealing the differential expression of encoded functions9. A comprehensive assessment of the genomic and transcriptomic data has yielded new insights into the dynamics of community interactions and may facilitate the development of improved therapies9,10,55.
The authors have nothing to disclose.
This work was supported by the National Institutes of Health (1 R01 GM095384-01) awarded to Forest Rohwer. We thank Epicentre, an Illumina company, for providing early access to Ribo-Zero Epidemiology kits. We thank Mark Hatay for the design and production of the ultracentrifugation tube holder. We thank Andreas Haas and Benjamin Knowles for critical readings and discussions of the manuscript, and Lauren Paul for assisting with the filming process.
Name of Material/Equipment | Company | Catalog Number | Comments/Description |
Guanidine isothiocyanate-based RNA lysis buffer | Life Technologies | 10296-028 | TRIzol LS Reagent was used in this study |
Silica beads; 0.1 to 0.15 mm | Cole-Parmer | YO-36270-62 | Zirconia-Silicate Beads were used in this study |
Dithiothreitol Molecular Grade (Dry Powder) | Promega | V3155 | Can be purchased from any other company |
Sterile syringe filter; 0.45 µm pore size hydrophilic PVDF membrane | Millipore | SLHV033RS | |
DNase I enzyme | Calbiochem | 260913 | |
Sterile syringe filter; 0.02 µm pore size | FisherScientific | 09-926-13 | Diameter: 25mm |
Ultra-Clear Centrifuge Tubes | Beckman | 344059 | 9/16 × 3½ in. (14 × 90 mm); suitable for the SW41 Ti rotor and VLP density gradient ultracentrifugation |
SW41 Ti Rotor | Beckman | 333790 | |
SS-34 Fixed Angle Rotor | Thermo Scientific | 28020 | |
Phi29 DNA polymerase | Monserate Biotech | 4001 | 10 U/µl |
Phi29 Random Hexamer | Thermo Scientific | S0181 | Previously bought from Fidelity Systems |
Alumina matrix filter with annular polypropylene support ring; 0.02 µm pore size | FisherScientific | 09-926-34 | Whatman Anodisc Filter Membranes were used in this study; Diameter 25mm |
SYBR Gold Nucleic Acid Gel Stain | Life Technologies | S-11494 | |
2-Mercaptoethanol | Sigma-Aldrich | M6250-100ML | |
RNase-free DNase I | NEB | M0303S | Can be purchased from any other company |
Glycogen, RNA grade | FisherScientific | FERR0551 | Can be purchased from any other company |
Oak Ridge high-speed centrifuge tubes | FisherScientific | 05-562-16B | For large volume DNA extraction procedure, suitable for SS-34 fixed-angle rotor |
Total rRNA removal kit | Epicentre | MRZE724 | ScriptSeq Complete Gold Kit (Epidemiology) can be used to couple rRNA removal with sequencing library preparation |
Cesium chloride | FisherScientific | BP1595-500 | |
Hexadecyltrimethylammonium bromide (CTAB) | Sigma-Aldrich | H5882-100G | |