Nanopore technology for sequencing biomolecules has wide applications in the life sciences, including identification of pathogens, food safety monitoring, genomic analysis, metagenomic environmental monitoring, and characterization of bacterial antibiotic resistance. In this article, the procedure for metagenomic soil DNA sequencing for species identification using the nanopore sequencing technology is demonstrated.
This article describes the steps for construction of a DNA library from soil, preparation and use of the nanopore flow cell, and analysis of the DNA sequences identified using computer software. Nanopore DNA sequencing is a flexible technique that allows for rapid microbial genome sequencing to identify bacterial and viral species, to characterize bacterial strains, and to detect genetic mutations that confer resistance to antibiotics. The advantages of nanopore sequencing (NS) for life sciences include its low complexity, reduced cost, and rapid real-time sequencing of purified genomic DNA, PCR amplicons, cDNA samples, or RNA. NS is an example of “strand sequencing”, which involves sequencing DNA by guiding a single-stranded DNA molecule through a nanopore that is inserted into a synthetic polymer membrane. An electrical current is applied across the membrane, and as the individual bases pass through the nanopore, each of the four nucleotide bases disrupts the current to a different degree. Each nucleotide is identified by detecting this characteristic modulation of the electrical current. The NS system consists of a handheld, USB-powered portable device and a disposable flow cell that contains a nanopore array. The portable device plugs into a standard laptop computer that reads and records the DNA sequence using computer software.
The goal of this procedure is to demonstrate the steps required to prepare an environmental DNA library for sequencing, to use a nanopore flow cell sequencing device, and to analyze the generated DNA sequences using the system software and the National Center for Biotechnology Information (NCBI) bioinformatics tools to identify microbial species in soil. Currently, most DNA sequencing platforms require a major investment in technical training and complex instrumentation, which is not feasible in resource-poor environments or in field applications. The nanopore sequencing (NS) platform eliminates these issues with a cost-effective, simple-to-use library preparation protocol and a portable device to sequence and analyze a variety of nucleic acids1,3. We have incorporated the NS platform into several lab classes for master's degree students.
Nanopore technology for sequencing biomolecules has demonstrated wide applications in the life sciences, including identification of bacterial and viral pathogens1,2,6, environmental biodiversity studies, food safety monitoring, genomic analysis3,5, and characterization of bacterial antibiotic resistance4. NS is a fast and accurate method to sequence nucleic acids that is based on the principle of "strand sequencing": individual nucleotide bases are detected by the electrical disruption they cause when single-stranded DNA passes through a nanopore inserted into an electrified synthetic polymer membrane. The steps involved in preparing DNA for NS include genome fragmentation, end repair and 3' dA-tailing of genomic fragments, adapter and tether annealing to the DNA, DNA library purification, and loading the library into the nanopore flow cell device. Fragmentation of the genome into ~8 kb fragments is accomplished by centrifuging 1 – 2 µg of genomic DNA through a g-tube fragmentation tube. The fragmented genomic ends are then repaired and tailed with poly dA using a commercially available kit. Single-stranded adapter sequences, which are compatible with the nanopore motor protein, are added to the DNA ends and are used to guide the DNA through the nanopore (Figure 1). The tether sequences are required for DNA purification and for localizing the DNA molecules to the pore membrane. The hairpin is generated by ligating a hairpin adapter to one end of the dA-tailed library. The hairpin structures at the DNA ends allow reading of the sense and antisense strands as the DNA passes through the nanopore (Figure 2). The prepared genomic library is then purified from the reaction using streptavidin beads and a magnetic field, and the sample is loaded into the nanopore flow cell for analysis.
The sequenced DNA is assessed for quality, and the sequencing reads that are acceptable for analysis are then subjected to several bioinformatics tools to identify microbes. The sequences are converted from FAST5 to FASTQ format. In the FASTQ format, the sequences can then be used in BLAST analysis.
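As an illustration of what a FASTQ record contains, the following is a minimal Python sketch that reads four-line FASTQ records, computes a simple per-read mean quality, and writes a read out as FASTA text for pasting into a BLAST query box. It is not part of the published protocol. The function names, the Phred+33 quality encoding, and the example bases are assumptions made for illustration, and averaging Phred scores directly is a simplification of how read quality is usually summarized.

```python
from io import StringIO
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ entry (header, bases, '+', qualities)."""
    while True:
        header = handle.readline().strip()
        if not header:
            return  # end of input (or a blank line)
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().strip()
        if len(header) < 2 or not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"Malformed FASTQ record: {header!r}")
        yield FastqRecord(header[1:].split()[0], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores (assumes Phred+33 encoding)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def to_fasta(record: FastqRecord, width: int = 60) -> str:
    """Format a record as FASTA text, wrapping the sequence at `width` bases."""
    seq = record.sequence
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{record.read_id}\n" + "\n".join(lines) + "\n"


# Hypothetical record: the read name and bases are made up for illustration.
EXAMPLE = "@read_803 hypothetical description\nACGTACGT\n+\nIIIIIIII\n"

for rec in parse_fastq(StringIO(EXAMPLE)):
    print(to_fasta(rec), end="")
    print("mean Phred:", mean_phred(rec.quality))
```

The parser is a generator, so a large FASTQ file can be streamed one read at a time without holding the whole run in memory.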
Note: Metagenomic DNA is purified from soil (Baltimore County, Maryland) using a commercially available soil genomic isolation kit (see Table of Materials). When measured with a UV spectrophotometer (see Table of Materials), the purified genomic DNA should have a 260/280 (nm) ratio >1.8 and a 260/230 ratio between 2.0 – 2.2 to ensure that the sample is free of contaminants. The amount of genomic DNA required for NS ranges from 200 ng to 2 µg.
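The acceptance criteria in the note above can be written as a small check. The following Python sketch is illustrative only: the function name and the returned keys are hypothetical, and the thresholds are the ones stated in the note (260/280 ratio >1.8, 260/230 ratio between 2.0 and 2.2, and 200 ng to 2 µg of DNA).

```python
def check_dna_purity(a260: float, a280: float, a230: float, mass_ng: float) -> dict:
    """Compare absorbance ratios and DNA mass against the thresholds in the note."""
    if a280 <= 0 or a230 <= 0:
        raise ValueError("A280 and A230 must be positive")
    ratio_260_280 = a260 / a280
    ratio_260_230 = a260 / a230
    checks = {
        "ratio_260_280": ratio_260_280,
        "ratio_260_230": ratio_260_230,
        "pass_260_280": ratio_260_280 > 1.8,          # 260/280 ratio >1.8
        "pass_260_230": 2.0 <= ratio_260_230 <= 2.2,  # 260/230 ratio of 2.0 to 2.2
        "pass_mass": 200 <= mass_ng <= 2000,          # 200 ng to 2 µg
    }
    checks["ok"] = (
        checks["pass_260_280"] and checks["pass_260_230"] and checks["pass_mass"]
    )
    return checks


# Hypothetical absorbance readings, for illustration only.
result = check_dna_purity(a260=1.0, a280=0.5, a230=0.48, mass_ng=1000)
print(result["ok"])
```

Returning the individual pass flags, and not only a single verdict, makes it easy to see which criterion a sample failed.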
1. Rapid Library Preparation Method (Short Protocol)
NOTE: This is a short protocol. See Table of Materials for the rapid sequencing kit.
2. Ligation Sequencing Protocol (Long Protocol): Metagenomic DNA Fragmentation
Note: See Table of Materials for the ligation sequencing kit.
3. Fragmented Genomic DNA End-preparation
4. Adapter and Tether Addition to End-prepped Genomic DNA Fragments
5. Magnetic Bead Preparation
6. Library Purification
7. Elution of Library from Magnet Beads
8. Starting a Run/Quality Control
9. Starting a Sequencing Run
10. Loading the Library
11. Starting a Sequencing Software Protocol Script
12. Analysis of Run and Results
Note: During the sequencing run, the data can be monitored using the "VIEW REPORT" program in the desktop agent. During the run or after it is complete, files are available in FAST5 format in the data → reads → pass folder (default). If this folder is open while sequencing is active, the computer may freeze.
The experimental design and nanopore technology provided a fast and inexpensive method for students to sequence soil DNA. The representative run passed the quality control parameters with more than 800 pores available for sequencing. The run resulted in over 125,000 reads available for study with a median sequence length of 5.38 kb. Sequences are given a quality score, and only those sequences with acceptable scores were then analyzed. Because the output of the sequencing reaction is in FAST5 format, which is not accepted by programs such as BLAST (NCBI), the sequences were viewed in the HDF viewer, which converts the sequences to FASTQ format, which is compatible with BLAST analysis. In future iterations of the software, the data will be available in FASTQ format, eliminating the need for the HDF viewer. See Supplemental Figures 1, 2, and 3.
Fifty sequences of over 350 nucleotides in length were subjected to analysis by BLASTN (nucleotide query against a nucleotide database) or BLASTX (nucleotide query translated into amino acid sequence and searched against a protein database) (NCBI) to identify organisms in the soil sample. Given the time constraints of the class and that the focus of the course is on laboratory methods and not bioinformatics, we have the students analyze sequences within these parameters. Our bioinformatics classes can use the data for development of pipeline analysis tools. This level of bioinformatics analysis is not covered in the laboratory-based class. Table 1 is a list of the organisms that were identified with e-values of less than 4e-04. Generally, e-values less than 4e-04 indicate strong similarity between the input sequence (query) and the matched sequence. The organisms identified by BLASTN (against the non-redundant (nr) database) are from a wide variety of species and are available for further genomic and protein analyses. This sample of soil, which came from a garden in Baltimore County, MD, had never been tested, so there were no previous data from which to determine what organisms might be in the soil. BLASTN analysis (against the nr database) also indicated that there were sequences with no similarities in the nr database. To determine if these are truly novel sequences, further study is needed. Analysis of several sequences by BLASTX revealed several proteins from organisms not represented in the BLASTN analysis. Table 2 lists several possible proteins, including endopeptidases and ATPase-like proteins. Using this library of sequences, students have the opportunity to use more sophisticated bioinformatics tools to perform further analysis, if directed by the instructor.
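For classes that run BLAST locally instead of through the NCBI web interface, the e-value cutoff described above could be applied programmatically. The following Python sketch assumes BLAST+ tabular output (output format 6) with its default twelve columns. That input format, the function name, and the example hit lines are assumptions for illustration and are not part of the procedure described here.

```python
import csv
from io import StringIO
from typing import Dict, TextIO

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle: TextIO, max_evalue: float = 4e-4) -> Dict[str, dict]:
    """Keep, for each query read, the hit with the lowest e-value below max_evalue."""
    best: Dict[str, dict] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue >= max_evalue:
            continue  # the text keeps only e-values less than the cutoff
        current = best.get(hit["qseqid"])
        if current is None or evalue < float(current["evalue"]):
            best[hit["qseqid"]] = hit
    return best


# Hypothetical hits: read and subject names are placeholders, not real results.
SAMPLE = (
    "read_1\tsubject_A\t98.5\t400\t6\t0\t1\t400\t10\t409\t1e-139\t500\n"
    "read_1\tsubject_B\t90.0\t380\t38\t0\t1\t380\t5\t384\t3e-06\t60\n"
    "read_2\tsubject_C\t80.0\t50\t10\t0\t1\t50\t1\t50\t5e-04\t30\n"
)

for query, hit in best_hits(StringIO(SAMPLE)).items():
    print(query, hit["sseqid"], hit["evalue"])
```

The cutoff is a parameter so that an instructor can tighten or relax it without editing the function.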
Figure 1: Genomic library preparation. Steps for preparation of genomic DNA library for sequencing using nanopore technology. This figure has been modified with the necessary permissions.
Figure 2. Schematic of DNA sequencing using a nanopore. DNA passing through a nanopore membrane with electrical signal generation and base identification. This figure has been modified with the necessary permissions.
Organism | e value |
Streptomyces sp. | 3.00E-06 |
Nocardioides sp. | 5.00E-04 |
Gordonia sp. | 5.00E-04 |
Hyphomicrobium nitrativorens | 1.00E-139 |
Starkeya novella | 1.00E-31 |
Hyphomicrobium denitrificans | 1.00E-30 |
Filomicrobium sp. | 5.00E-17 |
Pseudomonas sp. | 2.00E-15 |
Rhodothermus marinus | 1.00E-17 |
Turneriella parva | 2.00E-24 |
E. coli | 0.00E+00 |
Bradyrhizobium sp. | 3.00E-30 |
Gemmatirosa kalamazoonesis | 1.00E-20 |
Burkholderia sp. | 8.00E-10 |
Sphingomonas sp. | 8.00E-10 |
Cellulomonas sp. | 6.00E-11 |
Table 1: Selected results of BLASTN analysis. Soil metagenomic DNA sequences were subjected to a BLASTN similarity search against the NCBI non-redundant nucleotide database. The data reported have e-values of 4e-04 or less.
Protein | Organism |
Hypothetical protein | Acidobacteria bacterium |
SAM dependent methyltransferase | H. denitrificans |
ATPase | Jannaschia sp. |
Endopeptidase | Shigella sonnei |
Lysis protein | E. coli |
Table 2: Selected results of BLASTX analysis. Soil metagenomic DNA sequences were subjected to a BLASTX similarity search against the NCBI non-redundant protein database.
Supplemental Figure 1: Platform QC results. The results of the platform QC are presented. In the upper right corner are the results of the QC for active pores. Each quadrant of the flow cell is tested for active pores. The main page is dynamic and changes as each quadrant is tested. In this run, 1,414 single pores were detected.
Supplemental Figure 2: Converting FAST5 files to FASTQ files. The right side shows the output of the data from the sequencing run. Each line represents an individual sequence. The left side shows the sequence highlighted on the right side, displayed in the HDF viewer, which converts the sequence to a FASTQ file that can be used for further analysis.
Supplemental Figure 3: Example of FASTQ sequence in HDF viewer. This sequence (read 803) is in a FASTQ file which converts the FAST5 data into nucleotides.
Many next-generation sequencing methods have been developed, and each depends on sequencing by synthesis, but the detection platforms for identifying nucleotides differ. NS, the most recent addition to the marketplace, uses a different method entirely, which does not require a sequencing reaction or labeled nucleotides. This method takes advantage of the charge differential of already incorporated nucleotides as they pass through an electrified pore. Each nucleotide is identified by the modulation of the electrical current caused by the different bases as they pass through the nanopore. This multiplexed system allows the user to sequence many fragments at a time. Sequencing both strands of the DNA significantly increases the accuracy of the sequence, and the sequencing software, which can be loaded onto a laptop computer, processes the signals and provides the sequence information that can be analyzed.
In pyrosequencing, a sequencing by synthesis (SBS) method, detection of the specific base incorporated into the template depends on the luciferase assay and the generation of chemiluminescent signals8. In ion semiconductor sequencing, the released hydrogen ion, which decreases the pH, is detected by an ion sensor9. Single molecule real-time sequencing depends on the zero mode wave guide (ZMW)10, which illuminates a fluorescent molecule tagged to the incorporated nucleotide for detection. SBS uses a unique method to amplify the target DNA such that clusters of unique sequences are generated13. Detection of the added nucleotide is achieved when the fluorescence of the tagged nucleotide is recorded. NS, on the other hand, has unique advantages over other methods in that it requires limited technical resources, is portable, produces long sequencing reads, requires no prior DNA amplification, and can be operated at a reduced cost compared to other methods. Our students found the newer, rapid library prep protocol to be straightforward and amenable to a three hour lab class. Some of the issues we encountered were bubbles in the flow cell that were difficult to remove, the need for significant computer resources (one terabyte of storage), data output in FAST5 format, and the limited shelf life of the sequencing flow cell before it deteriorates. In addition, other disadvantages of the NS ligation sequencing protocol (long protocol) are that it requires several library preparation steps, requires expertise in molecular biology techniques, and generates reduced sequencing fidelity when compared to some sequencing methodologies3. However, the new rapid library preparation kit requires only 10 min for library preparation and has demonstrated a reduced sequencing error rate. The new library preparation method was very amenable for use in a lab class.
There are several critical steps in the protocol, particularly in the QC of the flow cell. These include performing an initial QC within five days of receipt of the flow cell and using the flow cell within 8 weeks. Although we have used flow cells that were beyond 8 weeks, the number of open/active pores is greatly reduced. It is important that experiments are planned to fit a timeline where maximum use of the flow cells is achieved. We have used the cleaning protocol and reused the flow cells with success.
Soil represents an untapped genetic reservoir of microbial diversity that can be investigated by metagenomics. For example, one gram of soil is estimated to contain between 10^7 and 10^9 prokaryotic cells14. Moreover, soil organisms are a main source of novel natural products, enzymes, and antibiotics. Thus, soil metagenomic DNA sequence analysis represents a valuable instructional tool for students at every level of education. The NS technology's ease of use and low cost make this system a very effective teaching tool. Students can sequence environmental samples and, upon completion of the sequencing, use available bioinformatics tools to identify and characterize microbes and metagenomic sequences in test samples. Using the NS technology, students gain true hands-on experience with sequencing, which has until now been out of reach for laboratory courses because of the advanced technical expertise and high reagent, equipment, and maintenance costs of other sequencing platforms. Recently, one of our students (J. Harrison, personal communication) reported the use of this technology in an environmental monitoring project of farm soils. We expect that there will be many more applications for this technology in the education space.
The authors have nothing to disclose.
This project was supported in part by Johns Hopkins University, Office of the Provost through the Gateway Science Initiative.
Thermal Cycler | LifeECO | BTC42096 | |
Covaris g-TUBE | Covaris | 520079 | |
NEBNext End Repair Module | New England Biolabs | E7546 | |
Eppendorf LoBind Centrifuge Tubes | Sigma-Aldrich | Z666505 | |
NEB Blunt/TA Ligase Master Mix | New England Biolabs | M0367 | |
MyOne C1 Streptavidin Beads | Thermo Fisher | 65001 | |
Eppendorf microcentrifuge | Eppendorf | Model 5420 | |
Nanodrop UV spectrophotometer | Thermo Fisher | ND-2000 | Model 2000 |
Belly Dancer Orbital Shaker | Sigma-Aldrich | Z768499 | |
Power Soil DNA Isolation Kit | MO BIO | 12888-50 | |
Ligation Sequencing Kit 2D | Oxford Nanopore | SQK-LSK208 | |
End-Prep Reaction Buffer and enzyme mix (NEB Blunt/TA Ligase Master Mix) | New England Biolabs | M0367 | |
AMPure XP Magnetic Beads | Beckman Coulter | A63880 | |
Basic Starter Pack | Oxford Nanopore | Includes MinION Sequencing Device and Flow Cell | |
MinKNOW software | Oxford Nanopore | ||
Rapid Sequencing Kit for Genomic DNA | Oxford Nanopore | SQK-RAD002 | Includes Running Buffer with Fuel (RBF), Fragmentation Mix (FRM), Rapid Adapter (RAD), Library Loading Beads (LLB) |