We describe an efficient, robust, and cost-effective method for extracting nucleic acid from swabs for characterization of bacterial communities using 16S rRNA gene amplicon sequencing. The method allows for a common processing approach for multiple sample types and accommodates a number of downstream analytic processes.
There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a persistent source of variability across studies. Here we describe an efficient, robust, and cost-effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple sample types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information.
The human lower reproductive tract, gastrointestinal system, respiratory tract, and skin are colonized by complex bacterial communities that are critical for maintaining tissue homeostasis and supporting the health of the host1. For instance, certain lactobacilli create an inhospitable environment for pathogens by acidifying the vaginal vault, producing antimicrobial effectors and modulating local host immunity2-4. The growing appreciation for the bacterial microbiome's importance has also increased interest in characterizing bacterial communities in many clinical contexts. Here we describe a method to determine the composition of the bacterial microbiome from genital swabs. The protocol can be readily modified for stool samples and swabs collected from other anatomical locations and other host species.
Due to the inherent limitations in the number of samples that can be collected and stored from a given study participant, this protocol was designed to extract DNA, RNA, and potentially even protein from a single swab using an adapted phenol-chloroform based bead-beating method5,6. The combination of physical disruption of bacterial cell walls with bead-beating and chemical disruption with detergents allows rapid lysis of Gram-positive, Gram-negative, and acid-fast bacteria without additional enzymatic digestion steps. To obtain high quality RNA, it is recommended to use dry swabs that were kept at or below 4 °C immediately after collection and during transport to the laboratory (if applicable), and stored long-term at -80 °C.
To determine the bacterial microbiome within a given sample, this procedure utilizes 16S rRNA gene amplicon sequencing, which is currently the most cost-effective means to comprehensively assign bacterial taxonomy and perform relative quantification. Alternative methods include targeted qPCR7, custom microarrays8, and whole-genome sequencing9. The 16S rRNA gene contains nine hypervariable regions, and there is no consensus regarding the optimal V region to sequence for vaginal microbiome studies. Here we use the 515F/806R primer set and build on the pipeline designed by Caporaso et al.10-12. Caporaso et al.'s 515F/806R primer set enables multiplexing of hundreds of samples on a single sequencing run due to the availability of thousands of validated barcoded primers and compatibility with Illumina sequencing platforms. Unlike the Human Microbiome Project's 27F/338R primer set13, 515F/806R also effectively amplifies Bifidobacteriaceae and thus accurately captures Gardnerella vaginalis, an important member of the vaginal microbial community in some women. Alternatively, a 338F/806R primer pair has been successfully used for pyrosequencing of vaginal samples14 and a 515F/926R primer pair has recently become available for next-generation sequencing12.
Finally, this protocol provides basic instructions to perform 16S amplicon analysis using the Quantitative Insights into Microbial Ecology (QIIME) software package15. Successful implementation of the QIIME commands described here yields a table containing bacterial taxonomic abundances for each sample. Many additional quality control steps, taxonomic classification methods, and analysis steps can be incorporated into the analysis, as described in detail on the QIIME website (http://qiime.org/index.html). If the analysis will be performed on an Apple computer, the MacQIIME package16 provides easy installation of QIIME and its dependencies. Alternative software packages for 16S rRNA gene sequence analysis include Mothur17 and UPARSE18.
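As a rough illustration of what "relative quantification" means for the resulting abundance table, the following minimal Python sketch converts raw read counts for one sample into fractions. The function name and the example counts are hypothetical and are not part of QIIME or of this protocol's output.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical read counts for a single sample (illustrative values only).
example_counts = {"Lactobacillus": 750, "Gardnerella": 200, "Prevotella": 50}
example_fractions = relative_abundance(example_counts)
```

Because the values are fractions of the reads in each sample, they describe community composition rather than absolute bacterial load.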
The study protocol was approved by and followed the guidelines of the Biomedical Research Ethics Committee of the University of KwaZulu-Natal (Durban, South Africa) and the Massachusetts General Hospital Institutional Review Board (2012P001812/MGH; Boston, MA).
1. Extraction of Total Nucleic Acid from Cervicovaginal Swabs
Note: Perform nucleic acid extractions in sets of 16 samples or fewer. The protocol as written below assumes samples are processed in sets of 12. If performing multiple rounds of extractions, serially number the extraction batches and record each sample's extraction batch number as well as other sample information (include metadata such as the participant's ID number, age, date/time of swab collection, hormonal contraceptive type, sexually transmitted infection testing results, etc.) in Table 1.
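The batch bookkeeping described in this note can be sketched in a few lines of Python. This is only an illustration, assuming samples are processed in the order listed; the function name and sample IDs are hypothetical.

```python
from typing import Dict, List


def assign_batches(sample_ids: List[str], batch_size: int = 12) -> Dict[str, int]:
    """Assign serially numbered batches (1, 2, 3, ...) to samples in list order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {sid: i // batch_size + 1 for i, sid in enumerate(sample_ids)}


# Hypothetical sample IDs; 25 samples fall into three batches of up to 12.
batches = assign_batches([f"S{i}" for i in range(25)])
```

The resulting batch numbers can then be copied into the corresponding column of the mapping file.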
2. PCR Amplification of the 16S rRNA Gene V4 Hypervariable Region
Note: Perform the PCR amplification in sets of 12 samples or fewer to minimize the risk of contamination and human error. If performing multiple rounds of amplification, serially number the amplification batches and record each sample's amplification batch number in Table 1.
3. Library Pooling and High-Throughput Sequencing
4. Sequence Analysis
Note: Outlined here is a basic pipeline for sequence analysis using the QIIME 1.8.0 software package. For simplicity, the provided commands assume that the mapping file is called mapping.txt, the 12 bp index read file is called index.fastq, and the 300 bp sequencing read file is called sequences.fastq. Install QIIME or MacQIIME16 and familiarize yourself with the basics of UNIX to execute these commands. Read the complete guide to QIIME at:
The general overview of the protocol, which enables the determination of relative bacterial abundances from a swab using 16S rRNA gene sequencing, is shown in Figure 1. The protocol has been optimized for human vaginal swabs, but can be easily adapted for most mucosal sampling sites and other hosts. Figure 2 demonstrates the high-quality DNA and RNA that can be isolated using the bead-beating protocol. Figure 3 illustrates a successful PCR amplification of 12 samples, where each amplification with a sample yielded a single strong band of the correct size and no water control yielded a band. Figure 4 illustrates the quantification of the final library pool prior to sequencing. Figure 5 shows a typical sequence quality profile after a single-end 300 bp MiSeq run.
Figure 1. Schematic Overview of the Protocol. First, nucleic acid is extracted from a swab by bead-beating in a buffered solution containing phenol, chloroform, and isoamyl alcohol. Variable region 4 of the 16S rRNA gene is then amplified from the resulting nucleic acid using PCR. PCR amplicons from up to hundreds of samples are then combined and sequenced on a single run. The resulting sequences are matched to a reference database to determine relative bacterial abundances. The entire protocol can be performed in approximately three days.
Figure 2. High-quality Nucleic Acid Extracted Using the Phenol:Chloroform Bead Beating Method. (A) DNA quality, as assessed using a spectrophotometer. An A260/A280 ratio between 1.8 and 2.0 indicates pure nucleic acid that is not contaminated with phenol or protein. (B) After a column clean-up, this protocol can yield high-quality RNA, indicated by strong 16S and 23S rRNA peaks. (C) RNA degradation can occur if the sample is not kept cold after collection (during transport and storage) or if RNases are present during processing.
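The purity criterion in panel A can be expressed as a simple check. The sketch below is illustrative only; the function name is hypothetical, and the 1.8 to 2.0 window is taken directly from the caption.

```python
def purity_ok(a260: float, a280: float, low: float = 1.8, high: float = 2.0) -> bool:
    """Return True when the A260/A280 ratio falls inside the accepted window."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return low <= a260 / a280 <= high
```

For example, `purity_ok(1.9, 1.0)` passes, whereas `purity_ok(1.5, 1.0)` would flag a sample for possible phenol or protein carry-over.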
Figure 3. Confirmation of Successful 16S rRNA Gene Amplification Using the 515F and Barcoded 806R Primer Set. (Top) Gel electrophoresis is used to confirm the presence of a single band around 380 base pairs in every sample that was amplified with template. The absence of a band indicates unsuccessful amplification; this is usually due to human error, and the PCR for that sample should be repeated. (Bottom) No template (water) controls run in parallel with the same primer pair should not have a band present. The presence of a band in the water control indicates contaminated reagents; discard the reagents that may be contaminated and repeat the PCR amplifications of both the template and water control for that primer pair.
Figure 4. Quantification of the Final Library Pool Concentration and Validation of the Library Size. After pooling the individual sample amplicons, the concentration of the final library pool must be determined. The library pool must then be further diluted to achieve a 2 nM concentration.
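The dilution to 2 nM can be worked through with a short calculation. The Python sketch below assumes the commonly used approximation of 660 g/mol per base pair of double-stranded DNA for converting a mass concentration to molarity; the function names and the 20 µL final volume are hypothetical choices for illustration, not values from the protocol.

```python
from typing import Tuple


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: float,
                    da_per_bp: float = 660.0) -> float:
    """Convert a dsDNA concentration in ng/uL to nM (assumes ~660 g/mol per bp)."""
    return conc_ng_per_ul / (da_per_bp * length_bp) * 1e6


def dilution_volumes(stock_nM: float, target_nM: float = 2.0,
                     final_volume_ul: float = 20.0) -> Tuple[float, float]:
    """Return (uL of stock, uL of diluent) from C1 * V1 = C2 * V2."""
    if stock_nM < target_nM:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_nM * final_volume_ul / stock_nM
    return stock_ul, final_volume_ul - stock_ul
```

For instance, a pool measured at 10 nM would need 4 µL of stock plus 16 µL of diluent to give 20 µL at 2 nM.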
Figure 5. Representative Bar Plot of the Sequence Quality Scores at Each Position of the Read. It is normal for the sequence quality to drop after 200 base pairs, but the average quality score should remain above 30.
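A per-position quality profile of this kind can be approximated directly from the quality lines of a FASTQ file. The sketch below is a minimal illustration, assuming Phred+33 encoding; the function names are hypothetical, and this is not the tool used to generate the figure.

```python
from typing import List


def mean_quality_by_position(quality_lines: List[str], offset: int = 33) -> List[float]:
    """Average Phred score at each read position (assumes Phred+33 encoding)."""
    max_len = max((len(q) for q in quality_lines), default=0)
    sums = [0] * max_len
    counts = [0] * max_len
    for q in quality_lines:
        for i, ch in enumerate(q):
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def positions_below(means: List[float], threshold: float = 30.0) -> List[int]:
    """Zero-based positions whose mean quality falls below the threshold."""
    return [i for i, m in enumerate(means) if m < threshold]
```

Positions returned by `positions_below` indicate where the mean score has dropped under the threshold of 30 mentioned in the caption.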
#SampleID | BarcodeSequence | LinkerPrimerSequence | rcbcPrimer | SampleType | ExtractionBatch | AmplificationPlate | Description |
#An example mapping file can be found at: http://qiime.org/_static/Examples/File_Formats/Example_Mapping_File.txt ||||||||
AG2350 | TCCCTTGTCTCC | CCGGACTACHVGGGTWTCTAAT | rcbc000 | Cervical Swab | 1 | A | |
Table 1. Mapping File Template. Creating an accurate and thorough mapping file is critical for successfully executing the protocol. The mapping file is not only required for executing QIIME, but it also enables the researcher to maintain the link between the sample barcode and metadata, to analyze the data for any systematic biases (e.g., batch-to-batch variation), and to determine interesting correlations between the metadata and bacterial populations. A bare-bones mapping file is provided, but users are encouraged to add as many columns containing metadata as possible. Examples of additional metadata for a vaginal swab include the participant's age, date/time of swab collection, hormonal contraceptive type (if applicable), sexually transmitted infection testing results, etc.
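Before running the analysis, it can help to screen the mapping file for the most common bookkeeping errors. The following Python sketch is an informal pre-check and not a replacement for QIIME's own mapping file validation. It assumes a tab-delimited file whose header uses the QIIME field names `#SampleID` and `BarcodeSequence`; the function name and the set of checks are hypothetical.

```python
import csv
import io
from typing import List

IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def check_mapping(text: str, barcode_length: int = 12) -> List[str]:
    """Return human-readable problems found in a tab-delimited mapping file."""
    problems: List[str] = []
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows or rows[0][0] != "#SampleID":
        return ["first header field must be #SampleID"]
    header = rows[0]
    if "BarcodeSequence" not in header:
        return ["missing BarcodeSequence column"]
    bc_col = header.index("BarcodeSequence")
    seen_ids, seen_barcodes = set(), set()
    for row in rows[1:]:
        if row[0].startswith("#"):
            continue  # skip comment lines
        if len(row) <= bc_col:
            problems.append(f"{row[0]}: row is missing the barcode field")
            continue
        sample_id, barcode = row[0], row[bc_col].upper()
        if sample_id in seen_ids:
            problems.append(f"{sample_id}: duplicate sample ID")
        if barcode in seen_barcodes:
            problems.append(f"{sample_id}: duplicate barcode {barcode}")
        if len(barcode) != barcode_length or not set(barcode) <= IUPAC_DNA:
            problems.append(f"{sample_id}: invalid barcode {barcode!r}")
        seen_ids.add(sample_id)
        seen_barcodes.add(barcode)
    return problems
```

An empty list means none of these particular problems were found; duplicate barcodes are worth catching early because they make two samples indistinguishable after pooling.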
Supplemental File 1. List of Barcoded Reverse Primer Sequences10. The first three columns can be used to complete the mapping file, and the last column provides the entire primer sequence for ordering purposes.
Here we describe a protocol for the identification and characterization of relative bacterial abundances within a human vaginal swab. This protocol can easily be adapted for other sample types, such as stool and swabs of other body sites, and for samples collected from a wide variety of sources. The extraction of nucleic acid by bead-beating in a buffered solution of phenol and chloroform allows for isolation of both DNA and RNA, which is particularly important when working with precious samples collected through clinical studies. The isolated bacterial DNA is excellent for bacterial taxonomic identification and genomic assembly, while the simultaneous collection of RNA provides the opportunity to determine functional bacterial, host, and viral contributions through RNA-seq. The described protocol uses a validated one-step primer set that has been successfully deployed on a wide range of sample types, including human, canine, and environmental samples10. The availability of thousands of barcoded primers enables multiplexing of samples and tremendous savings on sequencing costs. The complete cost (including all reagents, a single sequencing run, and primers but not equipment) is about $20 per sample when 200 samples are multiplexed. Additionally, there is very high reproducibility when multiple swabs from the same sample site are processed independently through the entire pipeline. Overall, the protocol is cost efficient, flexible, reliable, and repeatable.
The nucleic acid extraction portion of this protocol is limited by the safety precautions required when working with phenol and chloroform, and the challenges of automating the pipeline to a high-throughput, 96-well plate format. Additionally, the vigorous bead beating used for mechanical lysis shears the bacterial DNA to approximately 6 kilobase fragments; if longer DNA fragments are required for downstream applications, the duration of bead beating should be shortened. The limitations of the bacterial identification portion of this protocol are inherent to any method that relies on 16S rRNA gene sequencing. 16S rRNA sequencing is ideal for bacterial identification to the genus and even species level, but rarely provides strain level identification. While the V4 variable region of the 16S rRNA gene provides robust discrimination amongst most bacterial species11, additional computational methods such as Oligotyping31 may need to be used to precisely identify certain species, such as Lactobacillus crispatus. Finally, information about the precise bacterial functional capabilities within a particular sample cannot be determined by 16S rRNA gene sequencing alone, though this protocol enables extraction of whole genome DNA and RNA that can be used towards this purpose.
The most critical step to ensuring success with this protocol is taking great care to prevent contamination during sample collection, nucleic acid extraction, and PCR amplification. Ensure sterility at the time of sample collection by wearing clean gloves and using sterile swabs, tubes, and scissors. To assess for contamination of the collection materials, collect negative control swabs by placing additional unused swabs directly into transport tubes at the time of sampling. In the lab, perform all pre-amplification steps in a sterilized hood containing only decontaminated supplies and using only molecular grade, DNA-free reagents. During nucleic acid extraction, prevent cross-contamination by using new sterile forceps and fresh gloves with each sample, and keeping all tubes closed unless in use. Processing unused swabs in parallel ensures sterility of both the sample collection and nucleic acid extraction; the unused swabs should not yield a pellet after isopropanol precipitation and ethanol washing. If a pellet does appear, perform 16S rRNA gene amplification to determine a possible source of the contamination (e.g., the presence of Streptococcus or Staphylococcus would indicate skin contamination). Additionally, perform PCR amplifications with no template control reactions in parallel to ensure that the PCR reagents and reactions have not been contaminated. If a band appears in a no template control, discard the reagents and repeat the amplification with fresh reagents. Taking these precautions will ensure successful sequencing of the bacteria of interest.
The PCR amplification step tends to require the most troubleshooting. Amplifying in sets of twelve samples provides a balance between efficiency and consistency. The complete absence of bands across all samples in a given amplification set indicates a systematic failure, e.g., forgetting to add a reagent or incorrectly programming the thermocycler. The absence of a band from a few samples is usually due to human error, and the amplifications should be re-run with the same pairing of sample and reverse primer. In the case of continued absence of a band, the sample can be re-amplified using a reverse primer with a different barcode. Repeated amplification failures with multiple reverse primers may indicate an inhibitor present in the sample. In that case, cleaning the DNA with a column will often remove inhibitors without significantly altering relative bacterial abundances. If multiple bands result after amplification, re-amplify the sample with a different reverse primer barcode.
In addition to preventing environmental contamination and ensuring amplification of a single specific product, successful sequencing relies on care when preparing the library pool. The goal is to combine equimolar amounts of each sample's amplicons to ensure approximately the same number of sequencing reads per sample. If the nucleic acid concentrations prior to amplification are comparable, simply adding equal volumes of each sample's amplicons is sufficient when creating the library pool. However, if the nucleic acid concentrations are vastly different and added in equal volume, the sample with the low nucleic acid concentration will be poorly represented with a low number of reads. In this case, it is possible to add a higher volume of the amplicons from the low concentration sample based upon the relative intensity of the gel band. Alternatively, it is possible to more rigorously remove primers from the individual amplicons, quantify individual sample's amplicon concentration using a fluorometric dsDNA quantification kit, and precisely combine equimolar amounts of each sample.
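When individual amplicon concentrations have been measured fluorometrically, the pooling volumes follow from a simple division. The sketch below assumes that all amplicons have approximately the same length, so that equal mass corresponds to equal molar amounts; the function name and the 50 ng per sample target are hypothetical and should be adapted to the available material.

```python
from typing import Dict


def pooling_volumes(conc_ng_per_ul: Dict[str, float],
                    ng_per_sample: float = 50.0) -> Dict[str, float]:
    """Volume (uL) of each amplicon needed to contribute the same mass to the pool."""
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volumes[sample] = ng_per_sample / conc
    return volumes
```

For example, a sample at 25 ng/µL would contribute 2 µL and a sample at 5 ng/µL would contribute 10 µL, so both add 50 ng to the pool.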
Once a well-balanced amplicon pool is generated, it becomes critical to carefully measure the pool's concentration. Subsequent careful dilution and spike-in with PhiX to increase the read complexity is critical for achieving optimal sequencing results. High-throughput sequencers that use sequencing by synthesis are very sensitive to the cluster density on the flow cell. Loading a library pool that is too concentrated will result in overclustering, with lower quality scores, lower data output, and inaccurate demultiplexing32. Loading a library pool that is too dilute will also result in low data output. Carefully quantifying the library pool prior to sequencing will ensure optimal results.
16S rRNA gene sequencing provides a comprehensive assessment of the bacteria present within a given sample and is a critical first step in hypothesis generation. The presence of a rich set of metadata further enables the researcher to test associations between particular bacterial species and important biological factors. Furthermore, the same 16S information can be used to infer bacterial functions using tools such as PICRUSt33. The ultimate goal is to use 16S characterization to identify novel associations that can be further tested and validated in model systems, adding to our growing understanding of the impact of the bacterial microbiome on human health and disease.
The authors have nothing to disclose.
We would like to thank Elizabeth Byrne, David Gootenberg, and Christina Gosmann for critical feedback on the protocol; Megan Baldridge, Scott Handley, Cindy Monaco, and Jason Norman for sample preparation guidance and demonstrations; Wendy Garrett, Curtis Huttenhower, Skip Virgin, and Bruce Walker for protocol advice and fruitful discussions; and Jessica Hoisington-Lopez for sequencing support. This work was supported by the Bill and Melinda Gates Foundation and the NIAID (1R01AI111918). D.S.K. received additional support from the Burroughs Wellcome Fund. M.N.A. was supported by award number T32GM007753 from the NIGMS, and the Paul and Daisy Soros Fellowship. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIGMS or the NIH.
Equipment: | |||
Mini-Beadbeater-16 | BioSpec | 607 | |
PCR workstation | Any PCR hood can be used, e.g., the AirClean 600. | ||
Thermocycler | Any thermal cycler with a heated lid can be used, e.g., MJ Research PTC-200. | ||
Electrophoresis system | Any electrophoresis system can be used, e.g., the Thermo Scientific Owl EasyCast B1 Mini Gel Electrophoresis system. | ||
Nanodrop | Thermo Scientific | 2000C | Any other DNA quantification method will be sufficient |
Bioanalyzer | Agilent | 2100 | An alternative is the Agilent 2200 TapeStation Instrument. Not absolutely necessary but very helpful. |
MiSeq or HiSeq | Illumina | ||
Name | Company | Catalog Number | Comments |
Materials: | |||
Catch-All Sample Collection swab | Epibio | QEC89100 | Other swabs can be used but the Catch-All swab is recommended by the Human Microbiome Project. |
ELIMINase | Fisher | 04-355-31 | |
SteriFlip 50 mL filtration device (0.22 µm) | EMD Millipore | SCGP00525 | |
0.1 mm glass beads | BioSpec | 11079101 | |
2 mL screw-cap tubes | Sarstedt | 72.694.006 | For bead beating |
UltraPure 5M NaCl | Life Technologies | 24740-011 | Molecular Biology Grade |
1 M Tris-HCl | Ambion (Invitrogen) | AM9856 | Molecular Biology Grade |
0.5 M EDTA | Ambion (Invitrogen) | AM9260G | Molecular Biology Grade |
Sodium Dodecyl Sulfate, 20% Solution | Fisher | BP1311-200 | Molecular Biology Grade |
UltraPure DNase/RNase-free distilled water | Ambion | 10977-015 | Molecular Biology Grade, for buffer preparation |
2-Propanol BioReagent, for molecular biology, ≥99.5% | Sigma | I9516-500ML | Molecular Biology Grade |
Phenol:Chloroform:IAA, 25:24:1 | Invitrogen | AM9730 | Warning: Toxic |
3 M Sodium Acetate, pH 5.5 | Life Technologies | AM9740 | Molecular Biology Grade |
Disposable sterile polystyrene forceps, PS | Cole Parmer | EW-06443-20 | |
1.5 mL, clear, PCR clean tubes | Eppendorf | 22364120 | |
PCR grade water | MoBio | 17000-11 | For PCR |
Phusion High-Fidelity DNA Polymerase | New England Biolabs | M0530S | |
dNTP mix | Sigma | D7295-0.5mL | |
0.2 mL PCR 8-tube with attached clear flat caps, natural | USA Scientific | 1492-3900 | Any 8-tube strips that are DNase, RNase, DNA, and PCR inhibitor free will work |
Agarose | BioExpress | E-3121-25 | |
50X TAE buffer | Lonza | 51216 | |
DNA gel stain | Invitrogen | S33102 | |
6X DNA Loading Dye | Thermo (Fisher) | R0611 | |
50bp GeneRuler Ladder | Thermo (Fisher) | SM0373 | |
AllPrep DNA/RNA kit | Qiagen | 80284 | |
UltraClean PCR Clean-up Kit | MoBio | 12500-100 | |
Quant-iT PicoGreen dsDNA Assay Kit | Thermo Fisher Scientific | P11496 | An alternative is Qubit Fluorometric Quantification (Life Technologies) |
Name | Company | Catalog Number | Comments |
Primers: | |||
515F (forward primer) 5'-AATGATACGGCGACCACCGAGATCTACACTATGGTAATTGTGTGCCAGCMGCCGCGGTAA-3' | Order at 100 nmole; Purification: Standard Desalting. Resuspend at 100 µM. **Critical: primers must be resuspended with MoBio PCR Grade Water (see above) in a hood to avoid contamination.** | ||
Reverse primers, see the Supplemental Code File and: ftp://ftp.metagenomics.anl.gov/data/misc/EMP/SupplementaryFile1_barcoded_primers_515F_806R.txt | IDT is recommended | If ordering large sets of primers, order as a 96-well plate at the 100 nmole scale. Resuspend at 100 µM. Full directions for primer ordering and resuspension at http://www.earthmicrobiome.org/files/2013/04/EMP_primer_ordering_and_resuspension.doc. **Critical: primers must be resuspended with MoBio PCR Grade Water (see above) in a hood to avoid contamination.** |
Read 1 Sequencing Primer 5'-TAT GGT AAT TGT GTG CCA GCM GCC GCG GTA A-3' | 25 nmole; Purification: Standard Desalting. Resuspend at 100 µM. | ||
Read 2 Sequencing Primer 5'-AGT CAG TCA GCC GGA CTA CHV GGG TWT CTA AT-3' | 25 nmole; Purification: Standard Desalting. Resuspend at 100 µM. | ||
Index Sequencing Primer 5'-ATT AGA WAC CCB DGT AGT CCG GCT GAC TGA CT-3' | 25 nmole; Purification: Standard Desalting. Resuspend at 100 µM. | ||
PhiX Control v3 | Illumina | FC-110-3001 | Required if performing the sequencing in-house. If the sequencing will be performed by a third-party sequencing center, they will already have PhiX. |
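Before ordering, the primer sequences listed above can be sanity-checked computationally. As one example, the Index Sequencing Primer is the reverse complement of the Read 2 Sequencing Primer, which the following Python sketch confirms while handling the degenerate IUPAC bases (H, V, W, B, D, M) present in these primers. The function name is hypothetical.

```python
# IUPAC nucleotide codes mapped to their complements (e.g., H = A/C/T pairs with D = A/G/T).
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, including IUPAC degenerate bases."""
    cleaned = seq.upper().replace(" ", "")
    return cleaned.translate(_COMPLEMENT)[::-1]


read2_primer = "AGT CAG TCA GCC GGA CTA CHV GGG TWT CTA AT"
index_primer = "ATT AGA WAC CCB DGT AGT CCG GCT GAC TGA CT"
primers_agree = reverse_complement(read2_primer) == index_primer.replace(" ", "")
```

A mismatch here would suggest a transcription error in one of the two sequences.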