We previously validated a protocol for amplicon-based whole genome Usutu virus (USUV) sequencing on a nanopore sequencing platform. Here, we describe the methods used in more detail and determine the error rate of the nanopore R10 flow cell.
Whole genome sequencing can be used to characterize and trace viral outbreaks. Nanopore-based whole genome sequencing protocols have been described for several different viruses. These protocols use overlapping amplicons, which can be designed to target a specific virus or a group of genetically related viruses. In addition to confirming the presence of a virus, sequencing can be used for genomic epidemiology studies, to track viruses and unravel origins, reservoirs and modes of transmission. For such applications, it is crucial to understand the possible effects of the error rate associated with the platform used. Routine application in clinical and public health settings requires that this is documented with every important change in the protocol. Previously, a protocol for whole genome Usutu virus sequencing on the nanopore sequencing platform was validated (R9.4 flow cell) by direct comparison to Illumina sequencing. Here, we describe the method used to determine the required read coverage, using the comparison between the R10 flow cell and Illumina sequencing as an example.
Fast developments in third-generation sequencing technologies allow us to move towards close to real-time sequencing during viral outbreaks. This timely availability of genetic information can be useful to determine the origin and evolution of viral pathogens. The gold standards in the field of next-generation sequencing, however, are still the second-generation sequencers. These rely on specific and time-consuming techniques such as clonal amplification during an emulsion PCR or clonal bridge amplification. The third-generation sequencers are cheaper, hand-held and come with simplified library preparation methods. In particular, the small size of the sequencing device and the low purchase price make it an interesting candidate for deployable, fieldable sequencing. This was seen, for instance, during the Ebola virus outbreak in Sierra Leone and during the ongoing arbovirus outbreak investigations in Brazil1,2,3. However, the reported high error rate4 might limit the applications for which nanopore sequencing can be used.
Nanopore sequencing is evolving quickly, and new products come to market on a regular basis. Examples are the 1D squared kits, which enable sequencing of both strands of the DNA molecule and thereby increase the accuracy of the called bases5, and the R10 flow cell, which measures the change in current at two different instances in the pore6. In addition, improved bioinformatic tools, such as updated basecallers, will improve the accuracy of basecalling7. One of the most frequently used basecallers, Albacore, has been updated at least 12 times in a 9-month period5. Recently, the manufacturer also released a novel basecaller called flip-flop, which is implemented in the default nanopore software8. Together, these improvements will lead to more accurate sequences and will decrease the error rate of the nanopore sequencer.
Usutu virus (USUV) is a mosquito-borne arbovirus of the family Flaviviridae with a positive-stranded RNA genome of around 11,000 nucleotides. USUV mainly affects great grey owls and blackbirds9,10, although other bird species are also susceptible to USUV infection11. Recently, USUV was also identified in rodents and shrews, although their potential role in transmission of the virus remains unknown12. In humans, asymptomatic infections have been described in blood donors13,14,15,16, while USUV infections have also been reported to be associated with encephalitis or meningo-encephalitis17,18. In the Netherlands, USUV was first detected in wild birds10 in 2016 and in asymptomatic blood donors14 in 2018. Since the initial detection of USUV, outbreaks have been reported during the subsequent years, and surveillance, including whole genome sequencing, is currently ongoing to monitor the emergence and spread of an arbovirus in a previously naïve population.
Similar to what has been described for other viruses, such as Ebola virus, Zika virus and yellow fever virus3,19,20, we have developed a primer set to sequence full-length USUV21. This polymerase chain reaction (PCR)-based approach allows the recovery of full-length USUV genomes from highly host-contaminated sample types, such as brain tissue, in samples with a Ct value of up to around 32. The benefits of an amplicon-based sequencing approach compared to metagenomic sequencing are a higher sensitivity and a higher specificity. Its limitations are that the target sequences need to be similar enough to design primers that fit all strains, and that the primers are based on our current knowledge of the virus diversity.
Given the constant developments and improvements in third-generation sequencing, there is a need to evaluate the error rate of the sequencer on a regular basis. Here, we describe a method to evaluate the performance of nanopore sequencing directly against Illumina sequencing, using USUV as an example. This method is applied to sequences generated with the latest R10 flow cell, and basecalling is performed with the latest version of the flip-flop basecaller.
NOTE: List of software tools to be used: usearch v11.0.667; muscle v3.8.1551; porechop 0.2.4; cutadapt 2.5; minimap2 2.16-r922; samtools 1.9; trimmomatic 0.39; bbmap 38.33; spades v3.13.1; kma-1.2.8
1. Primer design
2. Multiplex PCR
3. Data analysis to generate consensus sequences from nanopore data
4. Analysis of the Illumina data
5. Determining the required read coverage to compensate for the error profile in nanopore sequencing using Illumina data as gold standard
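The idea behind this step can be illustrated with a minimal Python sketch. It is not the authors' script: the function names, the simple majority-vote consensus, the random choice of positions, and the toy pileup with an invented 10% per-base error are all assumptions made for illustration. The sketch repeatedly draws a fixed number of read bases at a position, calls a consensus, and counts how often that consensus disagrees with the gold-standard (Illumina) base.

```python
import random
from collections import Counter

def consensus_base(bases):
    """Return the most frequent base among the sampled reads (simple majority)."""
    return Counter(bases).most_common(1)[0][0]

def count_errors(pileup, reference, coverage, n_samples, rng):
    """Count wrong consensus calls over n_samples random subsamples.

    pileup[i] is the list of bases observed at position i in the nanopore reads;
    reference is the gold-standard sequence (standing in for the Illumina consensus).
    """
    errors = 0
    for _ in range(n_samples):
        pos = rng.randrange(len(reference))            # pick a random position
        sampled = rng.sample(pileup[pos], coverage)    # draw `coverage` reads without replacement
        if consensus_base(sampled) != reference[pos]:
            errors += 1
    return errors

# Toy data (invented): 60 reads per position, about 10% of the bases miscalled.
rng = random.Random(0)
reference = "ACGTACGTAC"
pileup = [
    [ref if rng.random() > 0.10 else rng.choice([b for b in "ACGT" if b != ref])
     for _ in range(60)]
    for ref in reference
]

for coverage in (10, 20, 30, 40, 50):
    errors = count_errors(pileup, reference, coverage, n_samples=1000, rng=rng)
    print(f"{coverage}x: {errors} consensus errors in 1000 random samples")
```

In a real analysis the pileup would come from the nanopore read alignment and the reference from the Illumina consensus; the coverage at which the error count reaches zero gives the required read coverage.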
Recently, a new version of the flow cell (R10) was released, together with improvements to the basecaller used to convert the electrical current signal into DNA sequences (the so-called flip-flop basecaller). Therefore, we re-sequenced USUV from brain tissue of a USUV-positive owl that was previously sequenced on an R9.4 flow cell and on an Illumina MiSeq instrument21. Here, we describe the method used to determine the required read coverage for reliable consensus calling by direct comparison to Illumina sequencing.
Using the newer flow cell in combination with the flip-flop basecaller, we show that a read coverage of 40× yields results identical to Illumina sequencing. A read coverage of 30× results in an error rate of 0.0002%, which corresponds to one error in every 585,000 nucleotides sequenced, while a read coverage of 20× results in one error in every 63,529 nucleotides sequenced. A read coverage of 10× results in one error in every 3,312 nucleotides sequenced, meaning that more than three nucleotides per full USUV genome are called incorrectly. With a read coverage of 30× or higher, no indels were observed. A read coverage of 20× resulted in one indel position, while a read coverage of 10× resulted in indels at 29 positions. An overview of the error rate using different read coverage cut-offs is shown in Table 1.
Coverage | Errors iteration 1 | Error rate iteration 1 | Indels iteration 1 | Errors iteration 2 | Error rate iteration 2 | Indels iteration 2 | Errors iteration 3 | Error rate iteration 3 | Indels iteration 3 |
10× | 100 | 0.0274% | 4 | 116 | 0.0297% | 18 | 110 | 0.0282% | 7 |
20× | 4 | 0.0010% | 0 | 6 | 0.0015% | 1 | 7 | 0.0018% | 0 |
30× | 2 | 0.0005% | 0 | 0 | 0.0000% | 0 | 0 | 0.0000% | 0 |
40× | 0 | 0.0000% | 0 | 0 | 0.0000% | 0 | 0 | 0.0000% | 0 |
50× | 0 | 0.0000% | 0 | 0 | 0.0000% | 0 | 0 | 0.0000% | 0 |
Table 1: Overview of the error rate of nanopore sequencing. Each iteration represents one thousand random samples.
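The conversions quoted in the text (one error per N nucleotides, percentage error rate, and errors per genome) follow from simple arithmetic. The short Python sketch below reproduces them as a check; the function names are illustrative, and the genome length of 11,000 nucleotides is the approximate value given in the introduction.

```python
GENOME_LENGTH = 11_000  # approximate USUV genome length stated in the text

def error_rate_percent(nt_per_error):
    """Convert 'one error in every N nucleotides' into a percentage."""
    return 100.0 / nt_per_error

def expected_errors_per_genome(nt_per_error, genome_length=GENOME_LENGTH):
    """Expected number of miscalled nucleotides in one full genome."""
    return genome_length / nt_per_error

# Values quoted in the text: nucleotides sequenced per error at each coverage.
nt_per_error = {"30x": 585_000, "20x": 63_529, "10x": 3_312}

for coverage, n in nt_per_error.items():
    print(coverage,
          f"{error_rate_percent(n):.4f}%",
          f"{expected_errors_per_genome(n):.2f} errors per genome")

# Indel positions at 10x in the three iterations of Table 1.
indels_10x = [4, 18, 7]
print("indel positions at 10x:", sum(indels_10x))
```

At 10× coverage, 11,000 / 3,312 is roughly 3.3, which is the "more than three nucleotides per full USUV genome" stated in the text, and the three 10× indel counts sum to the 29 positions reported.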
Supplementary File 1: Random selection.
Nanopore sequencing is constantly evolving, and therefore there is a need for methods to monitor the error rate. Here, we describe a workflow to monitor the error rate of the nanopore sequencer. This can be useful after the release of a new flow cell or a new version of the basecaller. It can also be useful for users who want to set up and validate their own sequencing protocol.
Different software and alignment tools can yield different results33. In this manuscript, we aimed to use freely available software packages that are commonly used and have clear documentation. In some cases, preference might be given to commercial tools, which generally have a more user-friendly interface but have to be paid for. In the future, this method can be applied to the same sample when major modifications in sequencing technology or basecalling software are introduced. Preferably, this should be done after each update of the basecaller or flow cell; however, given the speed of current developments, it can also be done only after major updates.
The reduction in the sequencing error rate allows a higher number of samples to be multiplexed. Nanopore sequencing is thereby getting closer to replacing conventional real-time PCRs for diagnostic assays, which is already the case for influenza virus diagnostics. In addition, the reduction of the error rate increases the usability of this sequencing technique, for instance for the determination of minor variants and for high-throughput unbiased metagenomic sequencing.
A critical requirement of the protocol is that closely related, reliable reference sequences are available. The primers are based on the current knowledge about virus diversity and might need to be updated periodically. Another critical point when setting up an amplicon-based sequencing approach is the balancing of the primer concentrations to obtain an even amplicon depth. This enables the multiplexing of more samples on a sequencing run and results in a significant cost reduction.
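As a rough illustration of what checking amplicon balance might look like, the Python sketch below compares the mean depth of each amplicon with the median across amplicons and flags amplicons below a minimum depth. The function name, the amplicon names and the depth values are hypothetical; the default threshold of 40 is taken from the coverage at which nanopore and Illumina results were identical in this study.

```python
from statistics import median

def amplicon_balance(depths, min_depth=40):
    """Summarize how evenly amplicons are covered.

    depths maps an amplicon name to its mean read depth. For each amplicon the
    result gives the depth relative to the median amplicon and whether the
    depth falls below min_depth.
    """
    med = median(depths.values())
    return {
        name: {
            "relative_to_median": depth / med,
            "below_min_depth": depth < min_depth,
        }
        for name, depth in depths.items()
    }

# Hypothetical mean depths for four amplicons of one sample.
example = {"amplicon_1": 850, "amplicon_2": 120, "amplicon_3": 25, "amplicon_4": 400}

for name, stats in amplicon_balance(example).items():
    flag = "LOW" if stats["below_min_depth"] else "ok"
    print(f"{name}: {stats['relative_to_median']:.2f}x median depth ({flag})")
```

Amplicons far below the median are candidates for adjusting the primer concentration, since the weakest amplicon determines how many reads a sample needs and therefore how many samples fit on one run.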
The authors have nothing to disclose.
This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 643476 (COMPARE).
Agencourt AMPure XP beads | Beckman Coulter | A63881 | |
dNTPs | Qiagen | 201900 | |
FLO-MIN106 R10 flowcell | Nanopore | R10 flowcell | |
KAPA Hyperplus library preparation kit | Roche | 7962436001 | |
Library Loading Bead Kit | Nanopore | EXP-LLB001 | |
Ligation Sequencing Kit 1D | Nanopore | SQK-LSK109 | |
Native Barcoding Kit 1D 1-12 | Nanopore | EXP-NBD103 | |
Native Barcoding Kit 1D 13-24 | Nanopore | EXP-NBD104 | |
NEB Blunt/TA Ligase Master Mix | NEB | M0367S | |
NEB Next Quick Ligation Module | NEB | E6056 | |
NEB Next Ultra II End Repair / dA-Tailing Module | NEB | E7546S | |
Protoscript II Reverse Transcriptase | NEB | M0368X | |
Q5 High-Fidelity polymerase | NEB | M0491 | |
Qubit dsDNA HS Assay kit | Thermo Fisher | Q32851 | |
Random Primers | Promega | C1181 | |
RNAsin Ribonuclease Inhibitor | Promega | N2111 |