Characterizing microbial communities has been a longstanding goal in environmental microbiology. Next-generation sequencing methods now allow microbial communities to be characterized at unprecedented depth with minimal cost and labor. We detail here our approach to sequencing bacterial 16S ribosomal RNA genes using a benchtop sequencer.
One of the major questions in microbial ecology is “who is there?” This question can be answered using various tools, but one of the long-standing gold standards is to sequence 16S ribosomal RNA (rRNA) gene amplicons generated by domain-level PCR amplification of genomic DNA. Traditionally, this was done by cloning the PCR amplicons and sequencing them by the Sanger (capillary electrophoresis) method. The advent of next-generation sequencing has greatly simplified 16S rRNA gene sequencing and tremendously increased its depth. The introduction of benchtop sequencers now allows small labs to perform their 16S rRNA gene sequencing in-house in a matter of days. Here, an approach for 16S rRNA gene amplicon sequencing using a benchtop next-generation sequencer is detailed. The environmental DNA is first amplified by PCR using primers that contain sequencing adapters and barcodes. The amplicons are then coupled to spherical particles via emulsion PCR. The particles are loaded onto a disposable chip, which is then inserted into the sequencer for the sequencing run. The reads are retrieved in FASTQ format and quality-filtered, and their barcodes are used to assign each read to its sample. The filtered and binned reads are then further analyzed using publicly available tools. An example analysis is given in which the reads were classified with a taxonomy-finding algorithm within the software package Mothur. The method outlined here is simple, inexpensive and straightforward, and should help smaller labs take advantage of the ongoing genomic revolution.
Metagenomic sequencing is a very powerful technology, as it targets the entirety of the genetic information contained in an environmental sample. There are different flavors of metagenomic sequencing, including shotgun sequencing, large-insert libraries and amplicon sequencing. Amplicon sequencing has the advantage of being relatively inexpensive and fast, and it produces reads from a single genomic region that can generally be aligned. In addition, the data analysis workflow for amplicon sequencing is mostly standardized. However, because it is based on PCR, it suffers from all the biases related to incomplete specificity, incomplete coverage and primer bias1,2, which makes this approach semi-quantitative at best. Several genomic regions can be targeted for amplicon sequencing, including functional genes, but the most popular option is to use a marker gene such as the 16S rRNA gene to generate a community profile. Traditionally, 16S rRNA gene amplicon sequencing was carried out using labor-intensive techniques that included cloning in E. coli, colony picking and plasmid extraction followed by Sanger sequencing of the isolated plasmids; consequently, most studies analyzed fewer than 100 clones per sample. Next-generation sequencing brought two major advances: massive parallelization of the sequencing reactions and, most importantly, clonal separation of templates without the need to insert gene fragments into a host. This has tremendously simplified the sequencing of 16S rRNA gene amplicons, which is again a routine feature of many environmental microbiology studies, resulting in a “renaissance” for 16S rRNA gene amplicon sequencing3.
Since the advent of Roche 454 sequencing in 2005 4, several other next-generation sequencing technologies have appeared on the market (e.g., Illumina, SOLiD, PacBio). More recently, the introduction of benchtop sequencers has brought to small labs a sequencing capacity once exclusive to large sequencing centers. Five benchtop machines are currently available: the 454 GS Junior, the Ion Torrent Personal Genome Machine (PGM) and Proton, and the Illumina MiSeq and NextSeq 500. While these sequencers offer fewer reads per run and fewer bases per dollar than most full-scale sequencers, they are more flexible and rapid, and their low acquisition and run costs make them affordable for small academic laboratories. Benchtop sequencers are particularly well suited for amplicon, small-genome and low-complexity metagenome sequencing in environmental microbiology, because such studies generally do not require extreme sequencing depth. For example, it is generally agreed that for 16S rRNA gene sequencing studies the number of reads per sample is not paramount, as ~1,000 reads can reveal the same patterns as multi-million-read datasets5. That said, benchtop next-generation sequencers still generate large amounts of sequence data, with maximal yields of ~35 Mbp (454 GS Junior), ~2 Gbp (Ion Torrent PGM), ~10-15 Gbp (Ion Torrent Proton), ~10 Gbp (Illumina MiSeq) and ~100 Gbp (Illumina NextSeq 500), which is more than enough for most environmental microbiology studies.
Next-generation sequencing of 16S rRNA amplicons using benchtop sequencers has recently been applied to a wide variety of environments. For example, the Ion Torrent PGM has been used for community analyses of uranium mine tailings with particularly high pH and low permeability6, recirculating aquaculture systems7, hydrocarbon-contaminated Arctic soils8,9, sediments and biofilms from the Athabasca River affected by oil sands mining10,11, the rhizosphere of willows planted in contaminated soils12, human and animal bodies13-16 and anaerobic digesters17.
In this contribution we detail our approach to sequence 16S rRNA gene amplicons in-house using a benchtop next-generation sequencer (the Ion Torrent PGM). After DNA extraction, 16S rRNA genes are amplified using domain-level bacterial primers that contain sequencing adapters and unique, sample-specific sequences (barcodes). The amplicons are purified, quantified and pooled at an equimolar ratio. The pooled samples are then clonally amplified in an emulsion PCR and sequenced. Resulting sequences are analyzed using publicly available bioinformatics tools (e.g., Mothur).
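The fusion primer design described above amounts to joining sequence parts in a fixed 5'-to-3' order. The sketch below assumes the layout adapter + barcode + optional linker + gene-specific primer; the function name `fusion_primer` and the adapter string are hypothetical placeholders (the adapter is NOT a real sequencing adapter and must be replaced by the sequence specified for your sequencing chemistry). The forward primer and barcodes are taken from Table 1.

```python
# IUPAC nucleotide codes accepted in primer sequences (degenerate bases allowed).
IUPAC_CODES = set("ACGTRYSWKMBDHVN")


def fusion_primer(adapter: str, barcode: str, gene_primer: str, linker: str = "") -> str:
    """Assemble a fusion primer 5'->3': adapter + barcode + linker + gene-specific primer.

    The part order is an assumption for illustration; check the layout
    required by your sequencing chemistry.
    """
    seq = "".join(part.upper() for part in (adapter, barcode, linker, gene_primer))
    invalid = set(seq) - IUPAC_CODES
    if invalid:
        raise ValueError(f"non-IUPAC characters in primer: {sorted(invalid)}")
    return seq


# Forward 16S rRNA gene primer and three barcodes from Table 1.
FORWARD_PRIMER = "TACGGRAGGCAGCAG"
BARCODES = {
    "Sample01": "CTAAGGTAAC",
    "Sample02": "TAAGGAGAAC",
    "Sample03": "AAGAGGATTC",
}
# Hypothetical placeholder adapter: NOT a real sequencing adapter.
PLACEHOLDER_ADAPTER = "ACGTACGTACGT"

if __name__ == "__main__":
    for sample, barcode in BARCODES.items():
        primer = fusion_primer(PLACEHOLDER_ADAPTER, barcode, FORWARD_PRIMER)
        print(sample, primer, len(primer))
```

Validating against the IUPAC alphabet catches typos before primers are ordered; degenerate bases such as R in the forward primer are accepted.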
1. 16S rRNA Gene Amplicon Library Preparation by the Fusion Method
2. Amplicon Purification, Quantification and Pooling
3. Emulsion PCR and Sequencing
4. Basic Sequence Data Analysis
After 25 cycles of PCR amplification and gel purification, amplicon yields are usually 0.2-10.0 ng in 50 µl of water. This may vary widely depending on the starting DNA concentration, the type of sample and the purification kit used. The number of PCR cycles should be kept as low as possible to limit chimera formation and amplification bias, keeping in mind that all samples should be amplified with the same number of cycles. To minimize the number of polyclonal reads and empty spheres and maximize the number of reads, the Qubit ratio should be between 0.1 and 0.3 and the FAM fluorescence should be above 200. Using a 314 chip on an Ion Torrent PGM, the average output is around 0.3-0.5 million good-quality reads after filtering in Mothur. Table 2 shows a typical breakdown of the number of reads after each step of the procedure for a run containing 36 multiplexed environmental samples amplified with primers targeting the V3-V4 region of the 16S rRNA gene and analyzed using Mothur.
In Mothur, the trim.seqs command generates a “*.trim.fasta” file containing the sequences that passed the quality filters and a “*.scrap.fasta” file containing the sequences that did not, with the reason for rejection given in each sequence header. When barcodes are supplied in the “oligos” file, this command also generates a “*.groups” file that records the sample (group) membership of every sequence based on its barcode. The classify.seqs command generates a “*.tax.summary” file that can be opened in Excel. This file summarizes the taxonomic affiliations (in rows) for each sample (in columns) and can be used for downstream statistical analyses and to visualize community composition at various taxonomic levels. The “*.taxonomy” file contains the detailed taxonomic affiliation of each sequence.
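The “*.groups” file produced by trim.seqs is a convenient place to check how evenly reads were distributed among samples. A minimal sketch, assuming the file holds one whitespace-separated `sequence_name group` pair per line; the function names are hypothetical helpers, not Mothur commands.

```python
import statistics
from collections import Counter
from typing import Iterable


def count_reads_per_group(lines: Iterable[str]) -> Counter:
    """Count reads per sample from the lines of a Mothur *.groups file.

    Assumed format: one 'sequence_name<whitespace>group' pair per line.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1
    return counts


def summarize_counts(counts: Counter) -> dict:
    """Summary statistics of reads per sample (number of samples, total, min, max, mean)."""
    values = list(counts.values())
    return {
        "samples": len(values),
        "total": sum(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


if __name__ == "__main__":
    # Inline demo lines standing in for a real *.groups file.
    demo = ["read1\tSample01", "read2\tSample01", "read3\tSample02", ""]
    per_sample = count_reads_per_group(demo)
    print(dict(per_sample))
    print(summarize_counts(per_sample))
```

For a real file, pass an open handle, e.g. `with open(path) as fh: count_reads_per_group(fh)`, since iterating a file yields its lines.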
The average community composition at the phylum/class level across all 36 samples is shown in Figure 1.
Figure 1. Average community composition at the phylum/class level across all samples.
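An average composition such as the one in Figure 1 can be computed from the “*.tax.summary” file by normalizing each sample to relative abundances and then averaging, so every sample carries equal weight. This is a sketch under stated assumptions: the file is assumed to be tab-delimited with the columns `taxlevel`, `rankID`, `taxon`, `daughterlevels` and `total` followed by one column per sample, and the phylum level is assumed to be `taxlevel` 2 (root 0, domain 1); check both against your own file. Figure 1 mixes phylum and class levels, whereas this sketch handles one level at a time.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO

# Columns assumed to precede the per-sample columns in a *.tax.summary file.
FIXED_COLUMNS = {"taxlevel", "rankID", "taxon", "daughterlevels", "total"}


def mean_relative_abundance(handle: TextIO, level: int) -> Dict[str, float]:
    """Average per-sample relative abundance of each taxon at one taxonomic level."""
    reader = csv.DictReader(handle, delimiter="\t")
    samples = [c for c in reader.fieldnames if c not in FIXED_COLUMNS]
    counts: Dict[str, Dict[str, int]] = defaultdict(dict)  # taxon -> sample -> reads
    for row in reader:
        if int(row["taxlevel"]) != level:
            continue
        for s in samples:
            # Accumulate, so identically named taxa (e.g. "unclassified") are merged.
            counts[row["taxon"]][s] = counts[row["taxon"]].get(s, 0) + int(row[s])
    totals = {s: sum(per.get(s, 0) for per in counts.values()) for s in samples}
    used = [s for s in samples if totals[s] > 0]  # ignore samples without reads
    if not used:
        return {}
    means = {
        taxon: sum(per.get(s, 0) / totals[s] for s in used) / len(used)
        for taxon, per in counts.items()
    }
    return dict(sorted(means.items(), key=lambda kv: kv[1], reverse=True))


# Tiny hypothetical tax.summary with two samples.
DEMO = (
    "taxlevel\trankID\ttaxon\tdaughterlevels\ttotal\tS1\tS2\n"
    "0\t0\tRoot\t1\t20\t10\t10\n"
    "1\t0.1\tBacteria\t2\t20\t10\t10\n"
    "2\t0.1.1\tProteobacteria\t1\t12\t8\t4\n"
    "2\t0.1.2\tFirmicutes\t1\t8\t2\t6\n"
)

if __name__ == "__main__":
    print(mean_relative_abundance(io.StringIO(DEMO), level=2))
```

Averaging relative abundances, rather than pooling raw counts, prevents samples with many reads from dominating the average.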
forward | TACGGRAGGCAGCAG | |
barcode | CTAAGGTAAC | Sample01 |
barcode | TAAGGAGAAC | Sample02 |
barcode | AAGAGGATTC | Sample03 |
barcode | TACCAAGATC | Sample04 |
barcode | CAGAAGGAAC | Sample05 |
barcode | CTGCAAGTTC | Sample06 |
barcode | TTCGTGATTC | Sample07 |
barcode | TTCCGATAAC | Sample08 |
barcode | TGAGCGGAAC | Sample09 |
barcode | CTGACCGAAC | Sample10 |
Table 1. Example “oligos” file for use in Mothur.
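The “oligos” file in Table 1 drives read demultiplexing: a read is assigned to a sample when it starts with that sample's barcode immediately followed by the forward primer, whose degenerate bases (R = A or G) must be honored. The sketch below is a simplified illustration with hypothetical helper names; it uses exact matching only, whereas Mothur's trim.seqs can tolerate mismatches through its `bdiffs` and `pdiffs` options. It also reports the minimum pairwise Hamming distance among barcodes, a quick check of how many sequencing errors would be needed to turn one barcode into another.

```python
from itertools import combinations
from typing import Dict, Iterable, Optional, Tuple

# IUPAC degenerate nucleotide codes -> bases they match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_oligos(lines: Iterable[str]) -> Tuple[str, Dict[str, str]]:
    """Read the 'forward' and 'barcode' lines of a Mothur-style oligos file."""
    forward, barcodes = "", {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0].lower() == "forward":
            forward = fields[1].upper()
        elif fields[0].lower() == "barcode":
            barcodes[fields[1].upper()] = fields[2]
    return forward, barcodes


def min_hamming(barcodes: Iterable[str]) -> int:
    """Smallest pairwise Hamming distance among equal-length barcodes."""
    codes = list(barcodes)
    if len({len(c) for c in codes}) != 1:
        raise ValueError("barcodes must all have the same length")
    return min(sum(a != b for a, b in zip(x, y)) for x, y in combinations(codes, 2))


def primer_matches(primer: str, seq: str) -> bool:
    """True if seq starts with primer, honoring degenerate IUPAC bases."""
    return len(seq) >= len(primer) and all(b in IUPAC[p] for p, b in zip(primer, seq))


def assign_read(read: str, forward: str,
                barcodes: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (sample, read without barcode and primer), or None if unassigned."""
    read = read.upper()
    for barcode, sample in barcodes.items():
        rest = read[len(barcode):]
        if read.startswith(barcode) and primer_matches(forward, rest):
            return sample, rest[len(forward):]
    return None


OLIGOS = """forward\tTACGGRAGGCAGCAG
barcode\tCTAAGGTAAC\tSample01
barcode\tTAAGGAGAAC\tSample02
barcode\tAAGAGGATTC\tSample03
"""

if __name__ == "__main__":
    fwd, bcs = parse_oligos(OLIGOS.splitlines())
    print("minimum barcode distance:", min_hamming(bcs))
    print(assign_read("CTAAGGTAAC" + "TACGGGAGGCAGCAG" + "CCTGATTCAG", fwd, bcs))
```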
Step | Count | % of previous step | Avg. per sample |
Number of wells | 1,262,519 | – | 35,070 |
Wells with beads | 1,114,108 | 88.20% | 30,947 |
Beads with templates | 1,112,746 | 99.90% | 30,910 |
Monoclonal beads | 826,805 | 74.30% | 22,967 |
Good quality reads (Output from the sequencer) | 782,204 | 94.60% | 21,728 |
Pass Mothur filters (min. avg. quality score of 20 over a 50 bp window, min. length of 150 bp) | 372,168 | 47.60% | 10,338 |
Classified at the phylum level in GreenGenes (50% confidence threshold) | 342,171 | 91.90% | 9,505 |
Classified at the family level in GreenGenes (50% confidence threshold) | 316,512 | 92.50% | 8,792 |
Classified at the genus level in GreenGenes (50% confidence threshold) | 289,899 | 91.60% | 8,053 |
Table 2. Number of reads produced from a typical run for 36 environmental samples multiplexed on one Ion Torrent 314 chip.
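The largest loss in Table 2 occurs at the Mothur quality filter (minimum average quality score of 20 over a 50 bp window, minimum length of 150 bp). A generic sliding-window filter of this kind can be sketched as follows. This is an illustration under an assumed truncation rule (the read is cut at the start of the first window whose mean quality falls below the threshold); Mothur's trim.seqs implements this filtering through its own options (`qwindowaverage`, `qwindowsize`, `minlength`), and its exact truncation point may differ. FASTQ qualities are assumed to use Phred+33 encoding.

```python
from typing import List, Optional


def phred33(quality_line: str) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in quality_line]


def window_cut(quals: List[int], window: int = 50, min_avg: float = 20.0) -> int:
    """Length to keep: cut at the start of the first window whose mean is below min_avg.

    A read shorter than one window is kept whole only if its overall mean passes.
    """
    if len(quals) < window:
        return len(quals) if quals and sum(quals) / len(quals) >= min_avg else 0
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


def quality_filter(seq: str, quals: List[int], window: int = 50,
                   min_avg: float = 20.0, min_len: int = 150) -> Optional[str]:
    """Truncate a read by window quality, then discard it if shorter than min_len."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    trimmed = seq[:window_cut(quals, window, min_avg)]
    return trimmed if len(trimmed) >= min_len else None


if __name__ == "__main__":
    good = quality_filter("A" * 200, [30] * 200)
    decaying = quality_filter("A" * 200, [30] * 100 + [10] * 100)
    print(len(good) if good else None, decaying)
```

In the second demo read, quality drops after 100 bases, so the read is truncated to 76 bases and then rejected by the 150 bp length cutoff; this combination of truncation and minimum length is why many raw reads fail the filter.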
The method presented here is straightforward and inexpensive, and should allow many laboratories to access the power of metagenomic sequencing. Although this varies with the sequencing platform, very little hands-on time is required once the libraries are constructed, as most of the process is automated. For the sequencing platform used here (Ion Torrent PGM), the complete procedure can be performed within two working days. At the time of writing (September 2013), the reagent costs for the example detailed above were as follows: PCR amplification of 36 samples, $25; gel purification and PicoGreen DNA quantitation of 36 samples, $125; emulsion PCR of one pooled amplicon sample, $150; and sequencing reagents, $250. This totals $550, or about $15 per sample and $0.0015 per quality-filtered read. These costs do not include the instrument service contract, instrument depreciation, technician salary or laboratory space.
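The per-sample and per-read figures above follow directly from the reagent costs, the 36 multiplexed samples and the 372,168 reads that passed the Mothur filters in Table 2. The short calculation below reproduces them; `cost_summary` is a hypothetical helper name.

```python
from typing import Dict, Tuple

# Reagent costs (USD, September 2013) for one run of 36 samples, from the text.
COSTS = {
    "PCR amplification (36 samples)": 25,
    "gel purification + PicoGreen quantitation (36 samples)": 125,
    "emulsion PCR (one pooled sample)": 150,
    "sequencing reagents": 250,
}


def cost_summary(costs: Dict[str, float], n_samples: int,
                 n_reads: int) -> Tuple[float, float, float]:
    """Return (total cost, cost per sample, cost per quality-filtered read)."""
    total = sum(costs.values())
    return total, total / n_samples, total / n_reads


if __name__ == "__main__":
    # 372,168 reads passed the Mothur filters in the run shown in Table 2.
    total, per_sample, per_read = cost_summary(COSTS, n_samples=36, n_reads=372_168)
    print(f"total ${total}, ${per_sample:.2f} per sample, ${per_read:.4f} per read")
    # prints: total $550, $15.28 per sample, $0.0015 per read
```

Because the per-read cost depends on how many reads survive filtering, it shifts from run to run even when reagent costs are fixed.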
One of the most important steps is to pool all the products at an equimolar ratio in order to retrieve a similar number of reads for each sample. PicoGreen quantification was used here, but other, less accurate methods might also be suitable (e.g., UV or gel-based quantification). Even with the most accurate quantification and pooling, the number of reads per sample varies; in the typical run detailed in Table 2, it ranged from 4,380 to 32,750 reads, with an average of 10,338 reads. When processing a large number of samples (more than 40-50), single-column gel purification can be replaced by gel purification in plates or by purification using beads with a stringent size cutoff (e.g., AMPure beads).
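Equimolar pooling means adding to the pool the volume of each amplicon that contains the same number of molecules. A minimal sketch, assuming an average mass of 660 g/mol per base pair of double-stranded DNA (a common approximation; some protocols use 650) and a single amplicon length for all samples, which is reasonable when every sample targets the same region. The concentrations and the ~500 bp length in the demo are hypothetical, and the 50 µl cap mirrors the elution volume mentioned in the results.

```python
from typing import Dict

# Average mass of one base pair of dsDNA in g/mol: assumed approximation.
BP_MASS_G_PER_MOL = 660.0


def fmol_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration from ng/ul to fmol/ul."""
    return conc_ng_per_ul * 1e6 / (length_bp * BP_MASS_G_PER_MOL)


def pooling_volumes(conc_ng_per_ul: Dict[str, float], length_bp: int,
                    target_fmol: float, max_volume_ul: float = 50.0) -> Dict[str, float]:
    """Volume (ul) of each amplicon that contributes target_fmol to the pool."""
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        molar = fmol_per_ul(conc, length_bp)
        if molar <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volume = target_fmol / molar
        if volume > max_volume_ul:
            raise ValueError(f"{sample}: needs {volume:.1f} ul; lower target_fmol")
        volumes[sample] = volume
    return volumes


if __name__ == "__main__":
    # Hypothetical PicoGreen readings (ng/ul) and an assumed ~500 bp amplicon.
    concs = {"Sample01": 0.05, "Sample02": 0.12, "Sample03": 0.20}
    for sample, vol in pooling_volumes(concs, length_bp=500, target_fmol=2.0).items():
        print(f"{sample}: {vol:.1f} ul")
```

The most dilute sample limits the achievable target, so `target_fmol` is best chosen from that sample's concentration and remaining volume.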
To date, 454 is the most widely used next-generation sequencing technology for the 16S rRNA gene. The Ion Torrent sequencing technology used in this protocol is conceptually very similar to 454, and both technologies are prone to the same types of sequencing errors. Not surprisingly, Ion Torrent sequencing was shown to give results very similar to those of 454 sequencing10. Recently, many researchers have also explored the use of Illumina technology for 16S rRNA gene amplicon sequencing18,19. The current protocol could easily be adapted to other benchtop sequencers such as the Illumina MiSeq or the 454 GS Junior by changing the fusion primer sequences to match the adapters and barcodes required by these technologies, as in the method recently described for the Illumina MiSeq19. Alternatively, researchers could follow steps 1 and 2 of the protocol detailed here and send the pooled amplicons to a sequencing center, where the emulsion PCR and sequencing would be performed.
Here, the 16S rRNA gene reads were trimmed and classified using Mothur, but many other analyses can be performed on 16S rRNA gene amplicons. For instance, beta diversity can be evaluated by calculating the UniFrac distance between each pair of samples using the procedure outlined at http://unifrac.colorado.edu/ (ref. 20). Alpha diversity indices and the number of operational taxonomic units (OTUs) in each sample can be calculated after reducing sequencing noise with tools available within QIIME, such as AmpliconNoise21, or with the procedure outlined by Huse et al.22 and available within Mothur.
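Once reads are clustered into OTUs, alpha diversity indices are simple functions of the OTU counts of each sample. Because read numbers vary several-fold between samples (4,380 to 32,750 in the run of Table 2), a common practice is to subsample (rarefy) every sample to the same depth before comparing them. The sketch below shows the textbook Shannon and inverse Simpson formulas and random subsampling without replacement; the function names are hypothetical, and Mothur's own commands for these calculations (summary.single, sub.sample) may use bias-corrected variants that give slightly different values.

```python
import math
import random
from typing import List


def shannon(counts: List[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def inverse_simpson(counts: List[int]) -> float:
    """Inverse Simpson index 1 / sum(p_i^2), without small-sample correction."""
    n = sum(counts)
    return 1.0 / sum((c / n) ** 2 for c in counts)


def rarefy(counts: List[int], depth: int, seed: int = 0) -> List[int]:
    """Subsample an OTU count vector to `depth` reads without replacement."""
    reads = [otu for otu, c in enumerate(counts) for _ in range(c)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    sampled = [0] * len(counts)
    for otu in random.Random(seed).sample(reads, depth):
        sampled[otu] += 1
    return sampled


if __name__ == "__main__":
    otu_counts = [120, 60, 30, 10, 5, 1]  # hypothetical OTU counts for one sample
    sub = rarefy(otu_counts, depth=100)
    print("observed OTUs:", sum(1 for c in sub if c > 0))
    print("Shannon:", round(shannon(sub), 3),
          "inverse Simpson:", round(inverse_simpson(sub), 3))
```

Fixing the random seed makes the subsampling reproducible; repeating it with several seeds gives an idea of how much the indices depend on the particular subsample.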
The primers used here amplify variable regions 3 and 4 (V3-V4) of the 16S rRNA gene, but many other regions could be targeted. In the present study, 16S rRNA genes were amplified from plant material, and the primers were chosen to avoid amplifying the chloroplast 16S rRNA gene23,24. A wide variety of other primers are available, which vary in terms of product length, taxonomic power and usefulness25,26. However, in all cases, 200-400 bp reads of the 16S rRNA gene cannot be reliably classified at the species level, and analyses are limited to the genus and higher taxonomic levels. Other genes, such as cpn60 and rpoB, could be more appropriate if species-level information is needed27,28. Drastic future drops in the cost of sequencing and increases in the power of analytical tools might make it feasible to replace 16S rRNA gene sequencing with shotgun metagenomics, but until then 16S rRNA gene sequencing remains the gold standard of environmental microbiology.
The authors have nothing to disclose.
The development of the method presented here was supported by various sources of funding, including Genome Canada and Genome Quebec, the Environment Canada STAGE program and internal NRC funds.
Reagent | Company | Catalog Number |
Ion 314 Chip Kit v2 | Life Technologies | 4482261 |
Ion PGM Sequencing 200 Kit v2 | Life Technologies | 4482006 |
Ion PGM Template OT2 200 Kit | Life Technologies | 4480974 |
HotStarTaq Plus Master Mix Kit | Qiagen | 203646 |
Primers and probes | IDT | NA |
Qiaquick Gel Extraction Kit | Qiagen | 28704 |
BSA 20 mg/ml | Roche | 10711454001 |
Dynabeads MyOne Streptavidin C1 | Life Technologies | 65001 |