A semi-automated workflow is presented for targeted sequencing of 16S rRNA from human milk and other low-biomass sample types.
Studies of microbial communities have become widespread with the development of relatively inexpensive, rapid, and high throughput sequencing. However, as with all these technologies, reproducible results depend on a laboratory workflow that incorporates appropriate precautions and controls. This is particularly important with low-biomass samples where contaminating bacterial DNA can generate misleading results. This article details a semi-automated workflow to identify microbes from human breast milk samples using targeted sequencing of the 16S ribosomal RNA (rRNA) V4 region on a low- to mid-throughput scale. The protocol describes sample preparation from whole milk including: sample lysis, nucleic acid extraction, amplification of the V4 region of the 16S rRNA gene, and library preparation with quality control measures. Importantly, the protocol and discussion consider issues that are salient to the preparation and analysis of low-biomass samples including appropriate positive and negative controls, PCR inhibitor removal, sample contamination by environmental, reagent, or experimental sources, and experimental best practices designed to ensure reproducibility. While the protocol as described is specific to human milk samples, it is adaptable to numerous low- and high-biomass sample types, including samples collected on swabs, frozen neat, or stabilized in a preservation buffer.
The microbial communities that colonize humans are believed to be critically important to human health and disease, influencing metabolism, immune development, susceptibility to disease, and responses to vaccination and drug therapy1,2. Efforts to understand the influence of the microbiota on human health currently emphasize the identification of microbes associated with defined anatomic compartments (e.g., skin, gut, oral cavity), as well as localized sites within these compartments3,4. Underpinning these investigative efforts is the rapid emergence and increased accessibility of next-generation sequencing (NGS) technologies that provide a massively parallel platform for analysis of the microbial genetic content (microbiome) of a sample. For many physiological samples, the associated microbiome is both complex and abundant (e.g., stool), but for some samples the microbiome is represented by low microbial biomass (e.g., human milk, lower respiratory tract), where sensitivity, experimental artefacts, and possible contamination become major issues. The common challenges of microbiome studies and appropriate experimental design have been the subject of multiple review articles5,6,7,8.
Presented herein is a robust NGS experimental pipeline based on targeted sequencing of the 16S rRNA V4 region9 to characterize the microbiome of human milk. Microbiome analysis of human milk is complicated not only by an inherently low microbial biomass10, but additionally by high levels of human DNA background11,12,13,14 and potential carryover of PCR inhibitors15,16 in extracted nucleic acid. This protocol relies on commercially available extraction kits and semi-automated platforms that can help minimize variability across sample preparation batches. It incorporates a well-defined bacterial mock community that is processed alongside samples as a quality control to validate each step in the protocol and provide an independent metric of pipeline robustness. Although the protocol as described is specific to human milk samples, it is readily adaptable to other sample types including stool, rectal, vaginal, skin, areolar, and oral swabs10,17, and can serve as a starting point for researchers who wish to perform microbiome analyses.
For all protocol steps, proper personal protective equipment (PPE) must be worn, and stringent contamination prevention approaches need to be taken. Work must flow from pre-amplification work areas to post-amplification work areas to minimize contamination of samples. All supplies used are sterile and free of RNase, DNase, DNA, and pyrogens. All pipette tips are filtered. A flowchart of the protocol steps is provided (Figure 1).
1. Sample Lysis
NOTE: Sample lysis and nucleic acid extraction are performed using a DNA/RNA extraction kit in a clean-room environment where both engineering and procedural controls are in place to minimize the introduction of environmental bacteria to the samples.
2. Isolate DNA/RNA
3. Targeted 16S PCR Set-up
NOTE: The set-up for the 16S PCR is carried out in a designated pre-amplification workspace located within the clean-room. The reagents and samples are prepared and then loaded onto a liquid handler to perform the PCR for each sample in triplicate (30 samples, which include true samples and extraction positive and negative controls, plus 2 PCR water controls in triplicate, for a total of 96 combined samples and controls). Once the PCR reactions are assembled and sealed, the sample plate is transferred to a thermal cycler located in a post-amplification area for cycling.
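The plate arithmetic in the note above (30 samples plus 2 PCR water controls, each in triplicate, filling 96 wells) can be sketched as follows. This is a minimal illustration only: the function name, the sample identifiers, and the row-wise placement of replicates in adjacent wells are assumptions, not the layout used by the liquid handler in this protocol.

```python
from itertools import product

ROWS = "ABCDEFGH"          # 8 rows of a standard 96-well plate
COLS = range(1, 13)        # 12 columns
REPLICATES = 3             # each sample/control is amplified in triplicate

def plate_layout(sample_ids, water_controls=("PCR_water_1", "PCR_water_2")):
    """Map wells to (entry, replicate number).

    Hypothetical helper: replicates are placed in adjacent wells, row by row.
    """
    entries = list(sample_ids) + list(water_controls)
    wells = [f"{row}{col}" for row, col in product(ROWS, COLS)]
    if len(entries) * REPLICATES > len(wells):
        raise ValueError("too many samples and controls for one 96-well plate")
    layout = {}
    for i, entry in enumerate(entries):
        for rep in range(REPLICATES):
            layout[wells[i * REPLICATES + rep]] = (entry, rep + 1)
    return layout

# 30 samples (true samples plus extraction controls) and 2 PCR water controls
layout = plate_layout([f"S{i:02d}" for i in range(1, 31)])
```

With 30 sample identifiers the layout fills the plate exactly; fewer samples simply leave the trailing wells unassigned.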
4. Targeted 16S Post-PCR Quality Control Using Tape-based Platform for Gel Electrophoresis
NOTE: Post-PCR quality control (QC) and all subsequent steps are carried out in a designated post-amplification area of the lab. The DNA is analyzed in an automated DNA/RNA fragment analyzer.
5. Library Calculation, Pooling, Clean-up, and QC
The protocol presented here includes important quality control (QC) steps to ensure that the data generated meet benchmarks for protocol sensitivity, specificity, and contamination control. The protocol's first QC step follows PCR amplification of the 16S V4 region (Figure 2). One µL of PCR product from each sample was analyzed by electrophoresis to confirm that it was within the expected size range of 315 – 450 bp (Figure 2, red arrow). Some human milk samples generated lower amounts of specific product (Figure 2A, compare lanes 3 and 9 – 11 with lanes 4 – 8), suggesting either low levels of extracted microbial DNA in those samples, or carry-over of PCR inhibitors during extraction. For samples that produce less than 2.0 nM of product in the 315 – 450 bp range (Figure 2A, lane 7), PCR inhibitor cleanup is carried out using a single step kit and the sample is re-amplified. The success rate for recovery of sample amplification after cleanup is approximately 40%. Quantitation of specific product for each sample (Figure 2B) is essential for determining the volume of that sample required for equimolar pooling of samples for sequencing. A pooled library for targeted sequencing is usually dominated by a specific PCR product (Figure 3). If there is a significant amount of non-specific product in the library, a gel-purification step should be added to the workflow.
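The pooling calculation described above can be sketched in a few lines. The 2.0 nM threshold comes from the text; the target amount per sample (20 fmol), the sample names, and the function names are illustrative assumptions. The conversion from ng/µL to nM uses the standard approximation of 660 g/mol per base pair of double-stranded DNA, and the volume follows from the fact that 1 nM equals 1 fmol/µL.

```python
AVG_BP_MASS = 660.0   # g/mol per bp of dsDNA (standard approximation)
MIN_NM = 2.0          # below this, the text calls for PCR inhibitor cleanup

def ng_per_ul_to_nm(conc_ng_ul, mean_size_bp):
    """Convert a dsDNA concentration in ng/µL to molarity in nM."""
    return conc_ng_ul * 1e6 / (AVG_BP_MASS * mean_size_bp)

def pooling_volumes(molarity_nm, target_fmol=20.0):
    """Return (µL to pool per sample, samples needing cleanup).

    target_fmol is an assumed per-sample amount, not a value from the protocol.
    """
    volumes, needs_cleanup = {}, []
    for sample, nm in molarity_nm.items():
        if nm < MIN_NM:
            needs_cleanup.append(sample)
        else:
            volumes[sample] = target_fmol / nm   # 1 nM == 1 fmol/µL
    return volumes, needs_cleanup

# Hypothetical molarities read from the peak-region table of each sample
volumes, needs_cleanup = pooling_volumes(
    {"milk_03": 20.0, "milk_04": 5.0, "milk_07": 0.8}
)
```

Samples with more specific product contribute a smaller volume, so every pooled sample is represented by the same number of amplicon molecules.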
In the example presented in Figure 2A, faint bands are observed for buffer controls (BC; lanes 2 and 12) and the PCR water negative control (PC; lane 1), indicating possible environmental or reagent contamination. Such bands are not uncommon and typically represent low amounts of PCR product (i.e., <1 nM) and produce few read counts during sequencing (<1,000). Representative sequencing results (Figure 4) confirm that these samples do indeed have very low sequencing read counts (Figure 4A, lanes 1 and 11; Figure 4B, Buffer and PCR Water lanes) and, importantly, the taxa composition for the control samples is distinct from the human milk samples (Figure 4A; compare lanes 1 and 11 with lanes 2 – 10). High read counts in the negative controls, together with significant overlap in taxa composition between controls and samples, suggest cross-contamination and the need for improved contamination control.
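The two criteria above (read count and taxa overlap) could be checked programmatically along the following lines. The 1,000-read figure is taken from the text; the overlap measure (fraction of control reads assigned to taxa also found in the samples), its 0.5 cutoff, the function names, and the example taxa and counts are all assumptions made for illustration.

```python
MAX_CONTROL_READS = 1000    # negative controls typically yield fewer reads (from the text)
MAX_SHARED_FRACTION = 0.5   # assumed cutoff for "significant overlap"

def shared_read_fraction(control_counts, sample_taxa):
    """Fraction of a control's reads assigned to taxa also seen in the samples."""
    total = sum(control_counts.values())
    if total == 0:
        return 0.0
    shared = sum(n for taxon, n in control_counts.items() if taxon in sample_taxa)
    return shared / total

def control_is_suspect(control_counts, sample_taxa):
    """Flag a control that has BOTH high read counts and high taxa overlap."""
    total = sum(control_counts.values())
    overlap = shared_read_fraction(control_counts, sample_taxa)
    return total >= MAX_CONTROL_READS and overlap > MAX_SHARED_FRACTION

# Illustrative data only
sample_taxa = {"Streptococcus", "Staphylococcus", "Bifidobacterium"}
clean_buffer = {"Ralstonia": 300, "Bradyrhizobium": 150}
suspect_buffer = {"Streptococcus": 4000, "Staphylococcus": 2500, "Ralstonia": 500}

clean_flag = control_is_suspect(clean_buffer, sample_taxa)
suspect_flag = control_is_suspect(suspect_buffer, sample_taxa)
```

A flagged control is a prompt to inspect the batch, not an automatic verdict; the appropriate cutoffs depend on sequencing depth and sample type.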
Sequencing results (Figure 4) demonstrate high diversity in the taxa associated with the human milk microbiome and variability in the number of sequencing read counts for each sample (Figure 4A, lanes 2 – 10). In contrast, the sequencing results for the bacterial mock that was processed along with the human milk samples demonstrated taxa composition and read counts that were comparable to results obtained for the mock in previous workflow runs (compare Figure 4A, lane 12 with Figure 4B, mock lanes). The consistent results for the mock lanes suggest that the observed variability for the human milk samples is an authentic experimental result, and not a function of intrinsic workflow variability.
Figure 1: Flow chart of the Targeted 16S Sequencing Pipeline.
Figure 2: Quality control analysis of 16S V4 amplicons. (A) Gel image of 16S V4 amplicons resolved by electrophoresis using an automated DNA/RNA fragment analyzer. 16S V4 amplicons were generated according to Caporaso et al.9, and one µL of each PCR product was analyzed using high sensitivity DNA reagents according to the manufacturer's guidelines. Most human milk samples (lanes 3 – 6 and 8 – 11) and the bacterial mock (lane 13) produced a primary PCR product at the expected size of approximately 400 bp (red arrow). The human milk sample in lane 7 failed to produce a significant amount of specific product and was subject to cleanup and re-amplification. Minimal product was detected for the PCR negative control (PC, lane 1) and the lysis buffer negative controls (BC, lanes 2 and 12), indicating minimal contamination in the analyzed samples. MW, molecular weight markers: upper red and lower green bars identify the 1,500 bp and 25 bp size markers, respectively, in each lane. (B) Top: electropherogram of lane 3 from the gel in (A). The primary PCR product falls within the peak region defined by the red vertical bars and comprises fragments ranging in size from 299 – 497 bp, resulting in an average PCR product size of 396 bp. Gating is done on a slightly wider range than the anticipated amplicon size (in this case 315 – 450 bp) to be sure to include the entire sample peak. The lower and upper marker peaks correspond to fragment sizes of 25 bp and 1,500 bp, respectively. Bottom: chart summarizing the size parameters for the peak region, the concentration in ng/µL of the PCR product within the peak region, and the molarity in nM for the specific PCR product. This information is then used to calculate how much of each sample will be pooled in an equimolar library for sequencing (see Sample Calculation).
Figure 3: Electropherogram of a pooled and concentrated sequencing library. Equimolar amounts of individual samples to be sequenced were combined into a pooled library. The library was then cleaned and concentrated to a total volume of 50 µL using a silica-membrane-based PCR clean up kit. Final preparation of the library for sequencing on the next generation sequencer was conducted according to the manufacturer's protocol. This library was successfully sequenced despite the presence of additional bands. If there is a concern about PCR products outside the expected size range, the manufacturer's protocol suggests the addition of a gel size selection step. This QC step is not usually performed.
Figure 4: Evaluation of negative and positive controls. (A) Relative abundances of bacterial taxa of an extraction batch with controls and human milk samples. As a QC measure, compositions of each extraction batch as loaded on the automated DNA/RNA purification instrument are generated immediately following a sequencing run. Numbers under each sample bar indicate the number of filtered reads for the respective sample. The compositions of the buffer controls are distinct from that of the human milk samples. (B) Relative abundances of bacterial taxa in buffer, mock, and PCR controls. Number of reads and composition are evaluated for all negative (buffer and PCR water) and positive (bacterial mock) controls. The compositions of the buffer and water vary, but the mock community remains quite stable.
Targeted next-generation sequencing of 16S rRNA is a widely used, rapid technique for microbiome characterization18. However, many factors, including batch effects, environmental contamination, sample cross-contamination, sensitivity, and reproducibility can adversely affect experimental results and confound their interpretation7,19,20. To best facilitate robust 16S analyses, microbiome workflows must incorporate good experimental design, the use of appropriate controls, spatial segregation of workflow steps, and application of best practices. The protocol described here incorporates each of these parameters and provides important experimental tools to address the challenges above and implement a 16S workflow for diverse samples.
Good experimental design is critical for 16S microbiome analyses. This includes proper collection and storage of samples, as well as selection of 16S primers appropriate for the region of interest. For example, the V4 region (515F/806R) is selected for human milk because it has good amplification of Bifidobacterium, which plays an important role in development of the neonatal gut microbiome21. Other primer sets (e.g., 27F/338R, 515F/926R) may be more appropriate for studies of other microbial communities. An important note is that the annealing temperature for the targeted 16S PCR and the expected amplicon size may vary based on primer selection.
The protocol may also be modified based on the results of the QC steps incorporated in the workflow. Several troubleshooting options exist when little or no DNA is detected following the targeted 16S PCR amplification step: 1) the sample can be put through a PCR inhibitor removal step (amplification following PCR inhibitor removal using a single step kit, performed per the manufacturer's protocol, is successful approximately 40% of the time); 2) more extracted DNA can be added to the targeted 16S PCR; or 3) a new and potentially larger aliquot of milk can be processed, if available. If there is a concern about PCR products outside the expected size range following the silica-membrane-based purification of the library, a gel purification step can be added. The QC steps are also critical to determine if there is evidence of contamination, which is discussed in detail below. If significant contamination is detected, then depending on where the contamination was introduced, either the PCR can be repeated or, if necessary, the sample can be re-extracted. Fortunately, with good laboratory practices, these are rare events. Finally, while this protocol is written to highlight caveats in the amplification of low biomass samples, and specifically human milk, it can easily be modified for the amplification of oral, rectal, vaginal, and skin swabs or sponges, as well as stool. If other sample types are chosen, consideration should be given to which extraction kits are optimal for the specific sample type.
Batch effects due to kits, reagents, or sequencing runs are important sources of variability in microbiome studies. DNA extraction kits, along with other reagents, possess low levels of bacterial DNA, which may vary substantially by lot20,22,23,24. For a large project, using a single lot of kits and reagents, and selecting kits designed to minimize kit contamination, may simplify analysis7. Samples of both subjects and controls are processed side by side. It is best if all the samples for a single study, both subject and control, can be incorporated into a single sequencing run. If a large number of samples are to be processed in batches, samples that are representative of both subjects and controls are included in each batch and processed together. It is also important to organize batch processing to minimize contamination of low-biomass samples (e.g., human milk) by samples that are high biomass (e.g., stool). In such cases, process low biomass sample types first, and then high biomass samples for the same study.
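One simple way to keep subjects and controls represented in every batch is to deal each group out to the batches in turn. The sketch below illustrates that idea only; the function name, the group labels, and the sample identifiers are hypothetical, and a real study design may need to balance additional variables (collection date, site, reagent lot).

```python
def assign_batches(samples, n_batches):
    """Distribute (sample_id, group) pairs so each group is spread across batches.

    Round-robin assignment within each group; an illustrative scheme,
    not the allocation procedure used by the authors.
    """
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    by_group = {}
    for sample_id, group in samples:
        by_group.setdefault(group, []).append(sample_id)
    batches = [[] for _ in range(n_batches)]
    for group in sorted(by_group):
        for i, sample_id in enumerate(by_group[group]):
            batches[i % n_batches].append((sample_id, group))
    return batches

# Hypothetical study with four subject samples and four control samples
samples = [(f"P{i:02d}", "subject") for i in range(1, 5)]
samples += [(f"C{i:02d}", "control") for i in range(1, 5)]
batches = assign_batches(samples, n_batches=2)
```

Each resulting batch contains samples from both groups, so a batch-specific artifact cannot be mistaken for a difference between subjects and controls.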
Low biomass samples pose unique challenges to microbiome studies, as contamination from the environment, reagents, instruments, and the researcher can make it difficult to distinguish between authentic community members present in low abundance7,25,26 and those that are artificially introduced to the sample through the experimental process19. The workflow described here incorporates important experimental negative controls at both the sample preparation step (buffer-only lysis control), and the PCR step (PCR water control) (see Figure 4). These controls help identify contamination sources and facilitate effective corrective measures at the bench or in silico27,28,29,30. Negative controls are carefully evaluated and reported with the study results7.
To minimize contamination, spatial segregation of experimental activities into a clean pre-PCR amplification area and post-amplification area is important. Optimally, the clean room has both an area for PCR master mix preparation and a sample preparation/addition of master mix area, and may incorporate a separate dead air box or a biological safety cabinet housing dedicated consumables and small equipment needed for the master mix preparation. The clean room design incorporates a positive airflow system with high efficiency particulate air (HEPA) filtration. Use of personal protective equipment is essential to maintaining a controlled, low-microbial environment, and includes hairnets, lab coats, gloves, and shoe covers. Kits/reagents and samples are ideally stored in separate dedicated refrigerators/freezers. PCR setup is also carried out in the clean room in designated workstations; clear separation of primer stocks and reagents from extracted DNA is maintained until samples are loaded on the automated pipetting platform. Once a PCR setup is complete, the plate is transferred to the post-amplification area and loaded onto a thermal cycler.
Work must flow in one direction only, from clean areas to post-amplification areas; there is no retrograde movement of reagents, instruments, or supplies from post-amplification areas to the clean area. Personnel who have entered any of the post-amplification areas are barred from entry to the clean area until the next day. In addition to the above workflow considerations, cleaning protocols must be implemented in both clean and post-amplification areas to minimize nucleic acid contamination of work surfaces and instrumentation. If physical barriers or separate rooms are not possible, all efforts must be taken to set up the work in areas as far apart as possible.
In addition to contamination, microbiome studies are challenged by sensitivity, variability, and reproducibility31. This protocol addresses these issues by incorporating a defined bacterial mock community that is extracted, amplified, and sequenced along with each batch of samples (see Figure 4B). This control provides a constant internal reference that evaluates the reproducibility of the experimental results generated, and can be used to troubleshoot problems that arise. For example, the quality of the extracted mock DNA can provide a metric for effective sample lysis and DNA extraction, which genomic DNA controls miss. Quality control of PCR amplicons for the mock sample can also indicate PCR efficiency and specificity. Furthermore, because the mock comprises multiple bacteria types, the relative sensitivity for a processed batch of samples can be inferred from the representation of taxa in the sequencing results for the mock sample. An ideal mock community will evaluate the ability to detect key bacterial species in the compartment being analyzed, and therefore the composition of the mock community may need to vary by study. As shown in Figure 4A, there is considerable variability among sample sequencing results, but the sequencing results for the bacterial mock community are highly reproducible (see Figure 4B).
While the mock community in Figure 4 is a unique mixture of 33 strains from a combination of commercially available and local clinical isolates, a commercially available mock community has recently been developed32.
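The run-to-run comparison of the mock community can be made quantitative. The sketch below uses Bray-Curtis dissimilarity between relative-abundance profiles and lists expected mock members that were not detected; the choice of Bray-Curtis, the function names, and the placeholder taxon names and counts are assumptions for illustration, not the metric used by the authors.

```python
def relative_abundance(counts):
    """Convert read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}

def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity: 0 means identical, 1 means no shared taxa."""
    taxa = set(profile_a) | set(profile_b)
    num = sum(abs(profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) for t in taxa)
    den = sum(profile_a.get(t, 0.0) + profile_b.get(t, 0.0) for t in taxa)
    return num / den

def missing_taxa(expected_taxa, observed_counts):
    """Expected mock members with zero reads (a rough sensitivity check)."""
    return sorted(t for t in expected_taxa if observed_counts.get(t, 0) == 0)

# Placeholder taxa and counts for the same mock processed in two runs
mock_run1 = {"taxon_A": 50, "taxon_B": 30, "taxon_C": 20}
mock_run2 = {"taxon_A": 45, "taxon_B": 35, "taxon_C": 20}

dissimilarity = bray_curtis(relative_abundance(mock_run1),
                            relative_abundance(mock_run2))
not_detected = missing_taxa(["taxon_A", "taxon_B", "taxon_C", "taxon_D"], mock_run1)
```

A low dissimilarity between mock runs supports the interpretation that variability among the study samples is biological, while a missing expected taxon points to a lysis, extraction, or amplification problem in that batch.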
Although the workflow described here is limited in its ability to broadly address reproducibility across different microbiome studies, it does provide an important experimental approach that allows researchers to incorporate appropriate experimental controls and monitor reproducibility within their own results.
The authors have nothing to disclose.
We would like to thank Helty Adisetiyo, PhD and Shangxin Yang, PhD for the development of the protocol. Overall support for the International Maternal Pediatric Adolescent AIDS Clinical Trials Group (IMPAACT) was provided by the National Institute of Allergy and Infectious Diseases (NIAID) of the National Institutes of Health (NIH) under Award Numbers UM1AI068632 (IMPAACT LOC), UM1AI068616 (IMPAACT SDMC) and UM1AI106716 (IMPAACT LC), with co-funding from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) and the National Institute of Mental Health (NIMH). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Name | Company | Catalog Number | Comments |
AllPrep RNA/DNA Mini Kit | Qiagen | 80204 | DNA/RNA extraction kit |
Eliminase | Fisher Scientific | 435532 | RNase, DNase, DNA decontaminant |
Thermo Mixer | Fisher Scientific | | temperature-controlled vortexer |
Buffer RLT plus | Qiagen | 1053393 | guanidinium thiocyanate lysis buffer/ Part of Allprep kit |
ß-Mercaptoethanol | Sigma Aldrich | 63689-25ML-F | ß-ME is a reducing agent that will irreversibly denature RNases by reducing disulfide bonds |
LME Beads | MP Biomedicals | 116914050 | bead tube |
QIAgen TissueLyzer | Qiagen | 85300 | automated sample disruptor adapter set |
QIAshredder column | Qiagen | 79654 | |
QIAgen RB tube | | | manufacturer's microcentrifuge tube in kit |
QIAcube and related plasticware | Qiagen | 9001292 | automated DNA/RNA purification instrument |
DNA exitus plus | Applichem | A7089 | non-enzymatic decontamination solution |
EB Buffer | Qiagen | 19086 | elution buffer |
QIAgility and related plasticware | Qiagen | 9001532 | robotic liquid handler |
PCR water | MO BIO | 17000- | |
5PRIME HotMasterMix | Quantabio | 2200400 | |
Barcoded reverse primers | Eurofin | No Catalog #'s | designed and ordered |
96 well PCR plate | USA scientific | 1402-9708 | |
Tapestation 2200 and related plasticware | Agilent | G2964AA | automated DNA/RNA fragment analyzer |
D1000 reagents for Tapestation | Agilent | 5067-5585 | Sample buffer and ladder are part of this kit |
OneStep PCR Inhibitor Removal Kit | Zymo Research | 50444470 | PCR inhibitor removal is done per the manufacturer's instructions. |
QIAquick PCR Purification Kit | Qiagen | 28104 | DNA clean up kit: silica-membrane-based purification of PCR products |
Qubit dsDNA HS Assay Kit | Thermo Fisher | Q32854 | dimethylsulfoxide-based dilution buffer and dye are part of this kit. |
Qubit Fluorometer | Thermo Fisher | Q33216 | |
NanoDrop | Thermo Fisher | | microvolume spectrophotometer |
MiSeq 300 V2 kit | Illumina | 15033624/15033626 | |
MiSeq | Illumina | No Catalog #'s | next generation sequencer |