Nanopore sequencing is a novel technology that allows cost-effective sequencing in remote locations and resource-poor settings. Here, we present a protocol for sequencing of mRNAs from whole blood that is compatible with such conditions.
Sequencing in remote locations and resource-poor settings presents unique challenges. Nanopore sequencing can be used successfully under such conditions, as highlighted by its deployment to West Africa during the recent Ebola virus epidemic. In addition to its practical advantages (low cost, ease of equipment transport and use), this technology also provides fundamental advantages over second-generation sequencing approaches, particularly the very long read length, the ability to directly sequence RNA, and the real-time availability of data. Raw read accuracy is lower than with other sequencing platforms, which represents the main limitation of this technology; however, this can be partially mitigated by the high read depth generated. Here, we present a field-compatible protocol for sequencing of the mRNAs encoding Niemann-Pick C1, which is the cellular receptor for ebolaviruses. This protocol encompasses extraction of RNA from animal blood samples, followed by RT-PCR for target enrichment, barcoding, library preparation, and the sequencing run itself, and can be easily adapted for use with other DNA or RNA targets.
Sequencing is a powerful and important tool in biological and biomedical research. It allows analysis of genomes, genetic variations, and RNA expression profiles, and thus plays an important role in the investigation of human and animal diseases alike1,2. Sanger sequencing, one of the oldest methods available for DNA sequencing, is still routinely used to this day and has been a cornerstone of molecular biology. Over the past 50 years, this technology has been improved to achieve read lengths of more than 1,000 nt and an accuracy as high as 99.999%1. However, Sanger sequencing also has limitations. Sequencing a larger set of samples or analyzing whole genomes with this method is time-consuming and expensive1,3. Second-generation (next-generation) DNA sequencing methods such as 454 pyrosequencing and Illumina technology have significantly reduced the cost and workload required for sequencing in the last decade, and have led to a tremendous increase in the amount of biological sequence information available4. Nevertheless, individual sequencing runs using these second-generation technologies are expensive, and sequencing under field conditions is challenging, as the necessary equipment is bulky and fragile (similar to Sanger sequencing devices), and often has to be calibrated and serviced by specially trained personnel. Also, for many of the second-generation technologies read lengths are rather limited, which often makes downstream bioinformatics analysis of these data challenging.
Third-generation sequencing using pocket-sized nanopore sequencing devices (see Table of Materials) can serve as an alternative to these established sequencing platforms. In these devices a single-stranded DNA or RNA molecule passes through a nanopore simultaneously with an ionic current that is then measured by a sensor (Figure 1). As the strand traverses the nanopore, the modulation of the current by the nucleotides present in the pore at any given time is detected, and computationally back-translated into the nucleotide sequence5. Because of this operational principle, nanopore sequencing allows both the generation of very long reads (close to 1 × 10⁶ nucleotides6) and the analysis of sequencing data in real time. Barcoding is possible by attaching defined nucleotide sequences to the nucleic acids in a sample, which allows analysis of multiple samples in a single sequencing run, thus increasing sample throughput and lowering per sample costs. Due to their high portability and ease of use, nanopore sequencing devices have been used successfully in the field during the recent Ebola virus disease epidemic in West Africa, highlighting their suitability for rapid deployment into remote regions7,8.
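The barcoding principle described above can be illustrated with a toy demultiplexer. This is only a minimal sketch under strong assumptions: the barcode sequences are invented for illustration (they are not those of Table 2), the barcode is assumed to sit at the very start of each read, and mismatches are counted position by position. Because nanopore errors are dominated by insertions and deletions, dedicated tools instead use gapped alignment and search a window at the read ends.

```python
# Toy demultiplexer: assign each read to the barcode that best matches its start.
# All sequences below are made up for illustration only.

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read, barcodes, max_error_rate=0.2):
    """Return the name of the best-matching barcode, or None if unassigned."""
    best_name, best_dist = None, None
    for name, seq in barcodes.items():
        if len(read) < len(seq):
            continue  # read too short to carry this barcode
        dist = hamming(read[:len(seq)], seq)
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    if best_name is None:
        return None
    # Tolerate a fixed fraction of mismatches relative to the barcode length
    allowed = int(max_error_rate * len(barcodes[best_name]))
    return best_name if best_dist <= allowed else None


BARCODES = {"BC01": "AAGAAAGTTGTC", "BC02": "TCGATTCCGTTT"}  # hypothetical
reads = [
    "AAGAAAGTTGTCGGGCATTACG",  # exact BC01 prefix
    "TCGATTCCGATTGGGCATTACG",  # BC02 prefix with one mismatch
    "GGGGGGGGGGGGGGGGGGGG",    # no recognizable barcode
]
assignments = [assign_barcode(r, BARCODES) for r in reads]
print(assignments)
```

Reads that match no barcode within the tolerance are left unassigned rather than forced into the closest bin, which mirrors the fact that only a fraction of reads is typically assigned to a sample.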
Here, we describe a detailed field-compatible protocol for the sequencing of mRNA encoding the Niemann-Pick C1 (NPC1) protein, which is the obligate entry receptor for filoviruses such as ebolaviruses, and has been shown to limit species susceptibility to these viruses9,10. The protocol encompasses extraction of total RNA from blood samples, specific amplification of NPC1 mRNA by RT-PCR, barcoding of samples, library preparation, and sequencing with a nanopore sequencing device. Data analysis is not covered in detail due to space limitations, although some basic directions are provided in the representative results. The interested reader is referred to a previous publication11 for a more detailed description of the workflow we used, and to publications by others12,13,14 for detailed information regarding the analysis tools used in this workflow.
Samples were collected following the Njala University Institutional Review Board (NUIRB) protocol no. IRB00008861/FWA00018924.
1. RNA Extraction from Blood Samples
2. Reverse Transcription of NPC1 mRNA into cDNA
3. Amplification of the NPC1 Open Reading Frame
4. Barcoding of NPC1 Amplicons
5. Library Preparation
6. Quality Check of Flow Cell
7. Loading the Flow Cell and Starting the Sequencing Run
In a representative experiment to test the presented protocol, we extracted RNA from 10 blood samples from five animal species (goat, sheep, swine, dog, and cattle; two individuals per species) (Table 3). RNA yields and quality following extraction can vary widely, in particular due to differences in sample handling and storage. In our representative experiment, we observed RNA concentrations between 43 and 543 ng/µL (Table 3). After amplification by RT-PCR, gel analysis of the NPC1 PCR products also showed varying outcomes (Figure 2), with markedly weaker bands for samples BC01 and BC02 (both goat). These differences were most likely caused by differences in sample quality, although differences in PCR efficiency due to differences in primer binding to the NPC1 gene of different species cannot be excluded. However, these differences in yield and/or amplification efficiency did not markedly impact the overall sequencing outcome. Further, an additional non-specific PCR product occurred in sample BC10 (cattle). Unlike in Sanger sequencing, such non-specific products do not negatively influence the results of nanopore sequencing, as the corresponding reads are discarded during mapping of the obtained reads to a reference sequence as part of the data analysis.
Prior to each sequencing run, a quality check of the flow cell to be used is strongly recommended, with a minimum requirement of 800 total pores. In our representative experiment, this quality check returned 1,102 pores available for sequencing. Since the data are provided in real time and can be analyzed immediately, the length of a sequencing run can be adjusted for the individual application (i.e., until sufficient sequencing data is produced for the desired analysis). In our experiments, sequencing runs are typically performed overnight, and in the case of our representative experiment we obtained approximately 1.4 million reads during such a 14 h run.
Depending on the type of data analysis to be performed, it can be advisable to process only a subset of the obtained reads. In the case of our representative experiment, a subset of 10,000 reads was selected for further analysis. To this end, the fastq files generated during the sequencing run were further processed in an Ubuntu 18.04 LTS environment, and demultiplexed using flexbar v3.0.3 with parameters optimized for demultiplexing of nanopore sequencing data (barcode-tail-length 300, barcode-error-rate 0.2, barcode-gap-penalty -1)12. After demultiplexing, read mapping and consensus generation can be done using a number of different tools, but a detailed discussion of the bioinformatics aspect of nanopore sequencing goes beyond the scope of this manuscript. In the case of our representative results, read mapping to a reference sequence was performed using Geneious 10.2.3. Of the 10,000 reads analyzed, 5,457 showed a length between 1,750 and 2,000 nucleotides, which matches the expected size of the PCR fragments amplified as part of our workflow (1,769 nt, Figure 3). An additional peak in the length distribution of reads was observed between 250 and 500 nucleotides, which can be attributed to non-specific PCR products. Demultiplexing of reads allowed the assignment of 87.6% of the reads to one of the 10 barcodes/samples analyzed (Figure 4). The proportion of demultiplexed reads for each barcode ranged from 3.4% for barcode 1 to 16.9% for barcode 10; however, due to the overall large number of reads this still allowed meaningful consensus calling with a high read depth even for these lower abundance barcode datasets. Indeed, mapping of the sorted reads to a reference sequence of NPC1 resulted in between 31.7% (barcode 2) and 100% (barcodes 7 and 8) of reads mapping to the reference, giving a read depth of more than 90 reads at any position for each sample. This is more than adequate to allow confident consensus base-calling with a negligible error rate.
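A read-length distribution of the kind summarized above can be tabulated directly from a fastq file. The following is a minimal sketch, not the workflow used for the representative results: it assumes the common four-lines-per-record fastq layout, a bin width of 250 nt chosen to match the intervals quoted in the text, and a tiny made-up in-memory input in place of a real file.

```python
from collections import Counter
import io


def read_lengths(fastq_handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            yield len(line.strip())


def bin_lengths(lengths, bin_width=250):
    """Count reads per length interval; keys are the lower bound of each bin."""
    return Counter((n // bin_width) * bin_width for n in lengths)


def fake_record(name, base, length):
    """Build one made-up FASTQ record (illustration only)."""
    return f"@{name}\n{base * length}\n+\n{'I' * length}\n"


# Made-up example data; with real data one would pass open("reads.fastq") instead,
# where the file name is a placeholder.
example = io.StringIO(
    fake_record("r1", "A", 1769)
    + fake_record("r2", "C", 300)
    + fake_record("r3", "G", 1900)
)

histogram = bin_lengths(read_lengths(example))
for lower in sorted(histogram):
    print(f"{lower}-{lower + 249} nt: {histogram[lower]} reads")
```

In this toy input, the two reads close to the expected amplicon size fall into the 1,750 to 1,999 nt bin, while the short read falls into the 250 to 499 nt bin, analogous to the non-specific product peak.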
Figure 1: Schematic representation of DNA sequencing using nanopore technology. A single-stranded DNA molecule passes through a nanopore embedded in an electrically resistant membrane, with a helicase regulating the transition speed. An ionic current simultaneously passes through the pore and is continuously measured. Modulations of the current caused by the nucleotides present in the pore are detected and computationally back-translated into the nucleotide sequence of the DNA strand.
Figure 2: Amplification of PCR products of Niemann-Pick C1 from mRNA. mRNA was isolated from goat (BC01 and 02), sheep (BC03 and 04), swine (BC05 and 06), dog (BC07 and 08), and cattle (BC09 and 10). Nested PCR products were separated in a 0.8% agarose gel in 1x TAE buffer (prepared from 50x TAE buffer: 242.28 g of Tris base, 57.1 mL of glacial acetic acid, 100 mL of 0.5 M EDTA, dH2O to 1 L, pH adjusted to 8.0) for 45 min at 100 V and stained with SYBR Safe.
Figure 3: Read-length distribution of 10,000 reads from the representative experiment. The number of reads obtained having a given read length interval is indicated.
Figure 4: Distribution of reads after demultiplexing. The number and percentage of demultiplexed (grey) and mapped reads (black) for each barcode are shown.
Table 1: Overview of primer sets used. Initial amplification of target sequences was performed with Primer Set 1. Primer Set 2 was then used for nested amplification and adapter addition. Adapters are indicated in red.
Table 2: Overview of barcode sequences. Individual barcodes were used to identify each sequenced sample.
Table 3: RNA concentrations obtained following extraction from blood samples sequenced in the representative experiment. The RNA concentrations of two individuals from each of five species are shown, and the ratios of the optical densities at 260/280 nm and 260/230 nm are indicated.
Over the last two decades, sequencing of biological samples has become an increasingly important aspect of studies in a wide range of subject areas. The development of second-generation sequencing systems based on the sequencing of a dense array of DNA features using iterative cycles of enzymatic manipulation and image-based data acquisition1 has dramatically increased throughput compared to the traditional Sanger sequencing technique, and allows analysis of multiple samples as well as various nucleic acid species in a given sample in parallel4. However, for most of the commonly used second-generation systems, only short reads are produced, and all platforms rely on sensitive, bulky, and expensive equipment3,4.
In contrast to second-generation sequencing platforms, the sequencing device used in this protocol is based on nanopore technology. Here a single-stranded nucleic acid molecule passes through a nanopore, resulting in modulation of an ionic current that is also flowing through the same nanopore, and which can be measured and back-translated to infer the sequence of the nucleic acid molecule. This third-generation sequencing approach imparts a number of advantages over other approaches. The main advantages that are directly related to the unique working principle of this technology are the extremely long read length produced (read lengths of up to 8.8 × 10⁵ nucleotides have been reported6), the ability to sequence not only DNA but also RNA directly, which was recently demonstrated for a complete influenza virus genome17, and the ability to analyze data in real time as they are being generated, which allows rapid metagenomic detection of pathogens within minutes18. Additional practical advantages are the extremely small size of the nanopore sequencing device, allowing its use in any laboratory or on field missions to remote locations19,20, and the low price in comparison to other sequencing platforms. In terms of running costs, currently a new flow cell is required for each sequencing run, which results in costs of about $1,100 per run for the flow cell and library preparation reagents. These costs can be reduced in some cases by washing and reusing the flow cell, or by barcoding and sequencing multiple samples in a single run. Also, a novel type of flow cell is currently being beta-tested by a small number of laboratories, which will require the use of a flow cell adaptor (called a “flongle”), and should significantly reduce flow cell price and thus running costs.
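The effect of barcoding on running costs is simple arithmetic, sketched below. The function name is hypothetical, the run cost is the approximate figure quoted in the text, and the calculation deliberately ignores per-sample costs such as RNA extraction, RT-PCR, and barcoding reagents, so it should be read as a rough lower bound only.

```python
def flow_cell_cost_per_sample(run_cost_usd: float, n_samples: int) -> float:
    """Shared run cost divided evenly across the barcoded samples in one run."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return run_cost_usd / n_samples


RUN_COST_USD = 1100  # approximate flow cell + library preparation cost from the text

for n in (1, 10):
    cost = flow_cell_cost_per_sample(RUN_COST_USD, n)
    print(f"{n} sample(s) per run: about {cost:.0f} USD each")
```

With 10 barcoded samples per run, as in the representative experiment, the shared cost per sample drops to roughly one tenth of the single-sample figure.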
The major shortcoming of nanopore sequencing remains its accuracy, with single read accuracies in the range of 83 to 86% being reported6,21,22, and most of the inaccuracies being caused by insertions/deletions (indels)5,21. However, high read depth can compensate for these inaccuracies, and a recent study suggested, based on theoretical considerations, that a read depth of >10 might increase overall accuracy to >99.8%21. Nevertheless, further improvements in accuracy will be needed, particularly if analysis is to be performed on a single molecule level rather than on a consensus sequence level. The use of 1D² technology as described in this protocol, which is based on the addition of the 1D² and barcode adapters (cf. section 5.5) that result in both strands of a single DNA molecule being sequenced by the same nanopore, increases read accuracy since information from both DNA strands can be used for sequence determination. Further, a workaround strategy that can be pursued in order to combine the advantages of nanopore sequencing (particularly long read length) with the higher accuracy of other sequencing technologies is to use nanopore sequencing information as a scaffold, which is then polished using sequencing data from other platforms6.
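The intuition that read depth compensates for low single-read accuracy can be illustrated with a simple majority-vote model. This is a sketch under stated assumptions and not the model used in the cited study: each read is treated as independently correct or incorrect at a given position with a fixed probability, and systematic errors, indels, and ties between alternative wrong bases are ignored. Real consensus accuracy will therefore differ from these idealized values.

```python
from math import comb


def majority_vote_accuracy(per_read_accuracy: float, depth: int) -> float:
    """Probability that strictly more than half of `depth` independent reads are correct."""
    p = per_read_accuracy
    needed = depth // 2 + 1  # smallest number of correct reads forming a strict majority
    return sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(needed, depth + 1)
    )


# 0.85 lies within the 83 to 86% single-read accuracy range quoted in the text
for depth in (1, 5, 11, 31, 91):
    print(depth, majority_vote_accuracy(0.85, depth))
```

Under these assumptions the per-position consensus accuracy rises quickly with depth, which is consistent with the qualitative point made above, although the exact figures depend heavily on the error model.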
The most critical factor for the success of the protocol presented here is sample quality, and particularly the amount and quality of the extracted RNA. Proper storage and prompt extraction of the RNA help in achieving an adequate RNA yield. The use of appropriate blood collection tubes allows the storage of blood samples for up to one month, but blood clotting can be an issue, particularly when samples are being stored at elevated temperatures, which can be the case under field conditions. The second critical step is the amplification of target sequences, and particularly under field conditions PCR reactions often perform less well than under standard laboratory conditions7. Careful primer design and optimization are therefore paramount to achieve robust amplification. Additionally, nested PCR approaches and touchdown PCR, as used in this protocol, can increase both specificity and sensitivity of target gene amplification4,7. Indeed, in our experience with this technology in Liberia and Guinea, nested protocols were required under field conditions with field samples even for primer sets which allowed amplification of targets from laboratory samples and under laboratory conditions with a single round of PCR (reference 7 and unpublished results).
In contrast to these more critical steps, library preparation and the sequencing run itself are rather robust procedures. However, under field conditions practical issues such as the availability of certain pieces of equipment can be problematic. For example, a UV spectrophotometer is needed to determine DNA concentrations prior to library preparation of barcoded samples. However, should such a device not be available under field conditions, an equal volume of each sample can simply be combined to make up the 45 µL required for library preparation, with differences in sample input material then usually being mitigated by the large number of reads. Similarly, the need for internet connectivity for the sequencing run can be an issue. Base-calling no longer has to be performed online and can be done locally, and the remaining requirement for connectivity can be lifted by the manufacturer under certain circumstances if required.
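The two pooling options mentioned above can be written out as a short calculation. This is a minimal sketch under assumptions: the concentrations are invented, the goal of the concentration-based option is assumed to be an equal DNA mass per sample within the 45 µL total (the exact input amounts are specified in the protocol steps, not here), and practical pipetting limits for very small volumes are ignored.

```python
def equal_mass_volumes(conc_ng_per_ul, total_volume_ul=45.0):
    """Volumes (µL) that give the same DNA mass per sample and sum to total_volume_ul."""
    if any(c <= 0 for c in conc_ng_per_ul.values()):
        raise ValueError("all concentrations must be positive")
    # Equal mass means volume is inversely proportional to concentration
    inverse = {name: 1.0 / c for name, c in conc_ng_per_ul.items()}
    scale = total_volume_ul / sum(inverse.values())
    return {name: scale * inv for name, inv in inverse.items()}


def equal_volume(sample_names, total_volume_ul=45.0):
    """Fallback without a spectrophotometer: the same volume of every sample."""
    share = total_volume_ul / len(sample_names)
    return {name: share for name in sample_names}


concentrations = {"BC01": 20.0, "BC02": 40.0, "BC03": 80.0}  # hypothetical ng/µL
volumes = equal_mass_volumes(concentrations)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} µL ({vol * concentrations[name]:.0f} ng)")

print(equal_volume([f"BC{i:02d}" for i in range(1, 11)]))
```

For 10 barcoded samples, the equal-volume fallback corresponds to 4.5 µL per sample.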
In summary, the presented protocol allows relatively low-cost sequencing in locations with no access to traditional sequencing equipment, including in remote locations. It can easily be adapted to any target RNA or DNA, thus allowing researchers to answer numerous biological questions.
The authors have nothing to disclose.
The authors thank Allison Groseth for critical reading of the manuscript. This work was financially supported by the German Federal Ministry of Food and Agriculture (BMEL) based on a decision of the Parliament of the Federal Republic of Germany through the Federal Office for Agriculture and Food (BLE).
Name of material/equipment | Company | Catalog number | Comments/description |
1D² adapter, barcode adapter mix, ABB buffer, elution buffer, RBF buffer, LBB beads | Oxford Nanopore Technologies | SQK-LSK308 | 1D² Sequencing Kit |
blood collection tube with DNA/RNA stabilizing reagent | Zymo Research | R1150 | DNA/RNA Shield – Blood Collection Tube |
blunt/TA ligase master mix | New England Biolabs | M0367S | Blunt/TA Ligase Master Mix |
DNA-low binding reaction tube | Eppendorf | 30108051 | DNA LoBind Tube |
DNase buffer and DNase | ThermoFisher Scientific | 11766050 | SuperScript™ IV VILO™ Master Mix with ezDNase™ Enzyme |
flow cell | Oxford Nanopore Technologies | FLO-MIN105.24 | flow cell R9.4 |
hot start high fidelity DNA polymerase | New England Biolabs | M0493L | Q5 Hot Start High-Fidelity DNA Polymerase (500 U) |
magnetic beads | Beckman Coulter | A63881 | Agencourt AMPure XP beads |
magnetic rack | ThermoFisher Scientific | 12321D | DynaMag-2 Magnet |
nanopore sequencing device | Oxford Nanopore Technologies | – | MinION Mk 1B |
PCR barcoding kit | Oxford Nanopore Technologies | EXP-PBC001 | PCR Barcoding Kit I (R9) |
reverse transcriptase master mix | ThermoFisher Scientific | 11766050 | SuperScript™ IV VILO™ Master Mix with ezDNase™ Enzyme |
RNA purification spin column, DNA/RNA prep buffer, DNA/RNA wash buffer, DNase I, DNA digestion buffer | Zymo Research | R1151 | Quick-DNA/RNA Blood Tube Kit |
rotating mixer | ThermoFisher Scientific | 15920D | HulaMixer Sample Mixer |
Taq DNA polymerase | New England Biolabs | M0287S | LongAmp Taq 2X Master Mix |
Ultra II End-prep kit | New England Biolabs | E7546S | NEBNext Ultra II End-Repair/dA-tailing Module |
UV spectrophotometer | Implen | – | NanoPhotometer |
vacuum manifold | Zymo Research | S7000 | EZ-Vac Vacuum Manifold |