We present a detailed small RNA library preparation protocol with less bias than standard methods and an increased sensitivity for 2'-O-methyl RNAs. This protocol can be followed using homemade reagents to save cost or using kits for convenience.
The study of small RNAs (sRNAs) by next-generation sequencing (NGS) is challenged by bias issues during library preparation. Several types of sRNA such as plant microRNAs (miRNAs) carry a 2'-O-methyl (2'-OMe) modification at their 3' terminal nucleotide. This modification adds another difficulty as it inhibits 3' adapter ligation. We previously demonstrated that modified versions of the 'TruSeq (TS)' protocol have less bias and an improved detection of 2'-OMe RNAs. Here we describe in detail protocol 'TS5', which showed the best overall performance. TS5 can be followed either using homemade reagents or reagents from the TS kit, with equal performance.
Small RNAs (sRNAs) are involved in the control of a diversity of biological processes1. Eukaryotic regulatory sRNAs are typically between 20 and 30 nt in size; the three major types are microRNAs (miRNA), piwi-interacting RNAs (piRNA) and small interfering RNAs (siRNA). Aberrant miRNA expression levels have been implicated in a variety of diseases2. This underscores the importance of miRNAs in health and disease and the requirement for accurate, quantitative research tools to detect sRNAs in general.
Next-generation sequencing (NGS) is a widely used method to study sRNAs. The main advantages of NGS compared with other approaches, such as quantitative PCR (qPCR) or microarray techniques, are that it does not need a priori knowledge of the sRNA sequences and can therefore be used to discover novel RNAs, and that it suffers less from background signal and saturation effects. Further, it can detect single nucleotide differences and has a higher throughput than microarrays. However, NGS also has some drawbacks; the cost of a sequencing run remains relatively high and the multistep process required to convert a sample into a library for sequencing may introduce biases. In a typical sRNA library preparation process, a 3' adapter is first ligated to the sRNA (often gel-purified from total RNA) using a truncated version of RNA ligase 2 (RNL2) and a preadenylated 3' adapter (Figure 1) in the absence of ATP. This increases the efficiency of sRNA-adapter ligation and reduces side reactions such as sRNA circularization or concatemerization. Subsequently, a 5' adapter is ligated by RNA ligase 1 (RNL1), followed by reverse transcription (RT) and PCR amplification. All these steps may introduce bias3,4. Consequently, read numbers may not reflect actual sRNA expression levels, leading to artificial, method-dependent expression patterns. Specific sRNAs may be either over- or underrepresented in a library, and strongly underrepresented sRNAs may escape detection. The situation is particularly complicated with plant miRNAs, siRNAs in insects and plants, and piRNAs in insects, nematodes and mammals, in which the 3' terminal nucleotide has a 2'-O-methyl (2'-OMe) modification1. This modification strongly inhibits 3' adapter ligation5, making library preparation for these types of RNA a difficult task.
Previous work demonstrated that adapter ligation introduces serious bias, due to RNA sequence/structure effects6,7,8,9,10,11. Steps downstream of adapter ligation such as reverse transcription and PCR do not significantly contribute to bias6,11,12. Ligation bias is likely due to the fact that adapter molecules with a given sequence will interact with sRNA molecules in the reaction mixture to form co-folds that may lead to either favorable or unfavorable configurations for ligation (Figure 2). Data from Sorefan et al7 suggest that RNL1 prefers a single stranded context, while RNL2 prefers a double strand for ligation. The fact that the adapter/sRNA co-fold structures are determined by the specific adapter and sRNA sequences explains why specific sRNAs are over- or underrepresented with a given adapter set. It is also important to note that within a series of sRNA libraries to be compared, the same adapter sequences should be used. Indeed, it has previously been observed that changing adapters by the introduction of different barcode sequences alters miRNA profiles in sequencing libraries9,13.
Randomization of adapter sequences near the ligation junction likely reduces these biases. Sorefan and colleagues7 used adapters with 4 random nucleotides at their extremities, designated "High Definition" (HD) adapters, and showed that the use of these adapters led to libraries that better reflect true sRNA expression levels. More recent work confirmed these observations and revealed that the randomized region does not need to be adjacent to the ligation junction11. This novel type of adapter was named the "MidRand" adapter. Together, these results demonstrate that improved adapter design can reduce bias.
Instead of modifying the adapters, bias can be suppressed through the optimisation of reaction conditions. Polyethylene glycol (PEG), a macromolecular crowding agent known to increase ligation efficiency14, has been shown to significantly reduce bias15,16. Based on these results, several "low bias" kits appeared on the market. These include kits that use PEG in the ligation reactions, either in combination with classical adapters or HD adapters. Other kits avoid ligation altogether, and use 3' polyadenylation and template switching for 3' and 5' adapter addition, respectively12. In yet another strategy, 3' adapter ligation is followed by a circularization step, thus omitting 5' adapter ligation17.
In a previous study, we searched for a sRNA library preparation protocol with the lowest possible levels of bias and the best detection of 2'-OMe RNAs12. We tested some of the above-mentioned 'low bias' kits, which had a better detection of 2'-OMe RNAs than the standard protocol (TS). Surprisingly however, upon modification (the use of randomized adapters, PEG in the ligation reactions and removal of excess 3' adapter by purification) the latter outperformed the other protocols for the detection of 2'-OMe RNAs. Here, we describe in detail a protocol based on the TS protocol, 'TS5', which had the best overall detection of 2'-OMe RNAs. The protocol can be followed using reagents from the TS kit and one reagent from the 'Nf' kit or, to save money, using homemade reagents, with equal performance. We also provide a detailed protocol for the purification of sRNA from total RNA and the preparation of preadenylated 3' adapter.
1. Isolation of small RNAs
2. Preparation of preadenylated 3' HD adapter
NOTE: Preadenylation of 3' HD adapter was done in a manner similar to the protocol described by Chen et al18. Note that preadenylated adapter can be ordered directly (/5rApp/ modification), but this is quite expensive.
3. Library preparation – Protocol TS5
NOTE: We present here the modified TS protocol 'TS5' that we described previously12 and that can be performed either with reagents from the kit or with self-provided reagents. It should be noted that we obtained similar or even slightly better results with a different protocol, 'TS7'. However, with TS7 it is more difficult to eliminate adapter dimers. We have therefore preferred to describe TS5 in detail, but TS7 can be followed by simply replacing the adapters. For TS7 use the 'MidRand-Like (MRL)' adapter sequences (Table 1). Note that here the randomized regions are in the middle of the adapters. Primers for reverse transcription and PCR will hybridize to the sequences downstream of the randomized region in the 3' adapter and upstream of the randomized region in the 5' adapter. Sequencing will start from the first randomized nucleotide in the 5' adapter.
4. Data analysis
NOTE: The data analysis procedure described below is based on the Linux operating system Ubuntu 16.04 LTS.
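As an illustration of the adapter trimming that TS5 reads require, the following is a minimal Python sketch and not the pipeline used in this work. It assumes that each read starts at the first randomized nucleotide of the 5' HD adapter (as stated for TS7 in section 3, and assumed here to hold for TS5 as well), so that a read has the layout 4 random nt, sRNA insert, 4 random nt, then the constant part of the 3' HD adapter from Table 1. The function name `trim_ts5_read` and the parameters `min_adapter_match` and `min_insert` are hypothetical, and only exact adapter matches are handled; dedicated trimming tools tolerate sequencing errors and are preferable for real data.

```python
# Constant part of the 3' HD (TS5) adapter, taken from Table 1.
ADAPTER_3_CONSTANT = "TGGAATTCTCGGGTGCCAAGG"
# Number of randomized nucleotides at the extremity of each HD adapter.
N_RANDOM = 4


def trim_ts5_read(read, min_adapter_match=10, min_insert=15):
    """Return the sRNA insert of a TS5 read, or None if it cannot be trimmed.

    Assumed read layout (an assumption, see the lead-in):
    NNNN + insert + NNNN + constant 3' adapter (+ further sequence).
    min_adapter_match and min_insert are illustrative thresholds.
    """
    read = read.upper()
    # Look for the first bases of the constant adapter part (exact match only).
    seed = ADAPTER_3_CONSTANT[:min_adapter_match]
    pos = read.find(seed, N_RANDOM)
    if pos == -1:
        return None  # no adapter found: read is discarded
    # The adapter may be truncated by the end of the read; whatever part is
    # present must agree with the expected adapter sequence.
    tail = read[pos:pos + len(ADAPTER_3_CONSTANT)]
    if not ADAPTER_3_CONSTANT.startswith(tail):
        return None
    # Remove the 4 random nt on each side of the insert.
    insert = read[N_RANDOM:pos - N_RANDOM]
    if len(insert) < min_insert:
        return None  # adapter dimer or insert too short to be informative
    return insert
```

A read derived from an adapter dimer (8 random nt directly followed by the 3' adapter) yields an empty insert and is therefore rejected by the `min_insert` filter.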
Critical steps are the isolation of the small RNA fraction of the starting total RNA material (Figure 3) and the desired final library product (Figure 4). Both steps involve polyacrylamide gel purification; small RNA is isolated from 15% TBE urea gels, while the final libraries are isolated from 6% native TBE gels. Small RNA isolated from gel can be analyzed on a small RNA capillary electrophoresis chip (Table of Materials; Figure 3B). This will allow users to estimate the amount of small RNA recovered and the proportion of miRNA in the preparation.
Gel purification of the final library product is a delicate step as a number of additional products are formed that migrate close to the desired library. It is important not to overload the gel, as this increases the risk of contaminating the library with other species such as adapter dimers. As an example (Figure 4), increasing amounts of PCR-amplified library (from B. napus RNA) were loaded on the gel and the product corresponding to the expected size (150 bp) was cut out (Figure 4A). After elution, the purified library was checked on a capillary gel electrophoresis chip; in addition to the expected 150 bp product, an increasing proportion of a 130 bp species, corresponding to adapter dimers, was observed as increasing amounts of PCR product were loaded (Figure 4B,C).
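The expected sizes of the library and of the adapter dimer can be estimated from the oligonucleotide sequences in Table 1. The sketch below is a hedged illustration: it assumes that the universal P5 primer anneals to the 5' end of the 5' HD adapter, that the P7-index primer anneals to the constant part of the 3' HD adapter, and that a typical miRNA insert is 22 nt long. The function names are hypothetical. The computed sizes (126 bp without insert, 148 bp with a 22 nt insert) are consistent with the ~130 bp and ~150 bp species observed on gel, and show why the two products migrate so close together.

```python
# Sequences from Table 1 (N = randomized or index position).
P5_PRIMER = "AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA"
P7_PRIMER = "CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA"
ADAPTER_5 = "GTTCAGAGTTCTACAGTCCGACGATC" + "NNNN"  # 5' HD adapter
ADAPTER_3 = "NNNN" + "TGGAATTCTCGGGTGCCAAGG"       # 3' HD adapter


def revcomp(seq):
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def overlap(left, right):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def library_top_strand(insert):
    """Top strand of the PCR product for a given insert ('' = adapter dimer)."""
    ligated = ADAPTER_5 + insert + ADAPTER_3
    p7_rc = revcomp(P7_PRIMER)
    # Part of the P5 primer that extends upstream of the 5' adapter.
    left = P5_PRIMER[:len(P5_PRIMER) - overlap(P5_PRIMER, ADAPTER_5)]
    # Part of the P7-index primer that extends downstream of the 3' adapter.
    right = p7_rc[overlap(ADAPTER_3, p7_rc):]
    return left + ligated + right


dimer_bp = len(library_top_strand(""))           # adapter dimer, no insert
library_bp = len(library_top_strand("N" * 22))   # assumed 22 nt miRNA insert
print(dimer_bp, library_bp)
```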
We tested whether protocol TS5 performs similarly with homemade reagents as with reagents from the kits. To this end, we prepared libraries from a mix of synthetic small RNAs 1-6, each present without or with a 2'-OMe modification, as done in our previous study12. Figure 5 shows the proportion of reads corresponding to each of these RNAs obtained previously and with the new libraries made using reagents from the TS and Nf kits or from other suppliers. As can be seen, very similar results were obtained.
Figure 6 shows a comparison of the performance of protocol TS5 with the standard TS protocol for the detection of plant (A. thaliana and B. napus) miRNAs, which are 2'-OMe modified, and of unmodified human miRNAs. We also tested the detection of piRNAs, which are 2'-OMe modified, in human samples. As can be seen, TS5 performs significantly better than TS for the detection of 2'-OMe RNAs but not for unmodified RNAs. Nevertheless, even though TS5 does not detect a larger number of unmodified sRNAs, the read numbers obtained with TS5 probably reflect true expression levels better than those obtained with TS, owing to lower levels of bias.
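The two quantities compared in Figure 6, the proportion of reads assigned to known miRNAs and the number of distinct miRNAs detected at a fixed read depth, can be sketched as follows. This is a simplified illustration and not the analysis used for the figure: it assumes exact matching of trimmed inserts against mature miRNA sequences, whereas a read mapper would normally be used. The function name `mirna_detection`, its parameters and the reference dictionary are hypothetical. Reference sequences written as RNA (with U) are converted to the DNA alphabet of the reads.

```python
import random
from collections import Counter


def mirna_detection(inserts, reference, n_reads, seed=0):
    """Proportion of reads matching known miRNAs and number of miRNAs detected.

    inserts   : list of trimmed read sequences (DNA alphabet)
    reference : dict mapping a miRNA name to its mature sequence (RNA or DNA)
    n_reads   : depth to subsample to, so that libraries are compared at
                equal depth
    """
    rng = random.Random(seed)
    # Subsample only if the library is deeper than the requested depth.
    if len(inserts) > n_reads:
        sample = rng.sample(inserts, n_reads)
    else:
        sample = list(inserts)
    # Index the reference by sequence, converting U to T. If several names
    # share one mature sequence, only the last name is kept in this sketch.
    seq_to_name = {
        seq.upper().replace("U", "T"): name for name, seq in reference.items()
    }
    counts = Counter(seq_to_name[s] for s in sample if s in seq_to_name)
    proportion = sum(counts.values()) / len(sample) if sample else 0.0
    return proportion, len(counts), counts
```

Fixing `n_reads` for all libraries of a species mirrors the use of a constant number of reads per comparison; without it, deeper libraries would appear to detect more miRNAs simply because more reads were examined.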
Figure 1. Schematic representation of the sRNA library preparation workflow for Illumina sequencing. First, sRNAs are isolated by a gel purification step. Here, the size range of miRNAs is indicated but any other size range could be selected, depending on the RNAs of interest. A quality control (QC) step is then performed to check the quality and quantity of isolated sRNA. During library preparation, a preadenylated (App) 3' adapter is first ligated to the sRNA. Then, a 5' adapter is ligated. Subsequently reverse transcription is performed using a primer complementary to the 3' adapter, followed by PCR amplification, during which the Illumina P5, P7, and index ('In') sequences are added. The resulting library is gel purified, followed by a quality control step. Then the library is sequenced, and data are analyzed.
Figure 2. Bias due to sequence-dependent adapter-sRNA co-folding. In the adapter ligation mixture, adapters will cofold with sRNAs in a manner that depends on the sequence of the adapter and the sRNA. Thus, different sRNAs (illustrated by examples a, b, and c; indicated by different shades of green) will cofold differently with a given adapter (the 3' adapter is indicated in red, the 5' adapter is indicated in blue). The black arrows indicate ligation junctions. This may lead to a favorable or unfavorable context for ligation. As RNL2 appears to prefer a double stranded environment, 3' ligation is expected to be more efficient for RNAs a and b than for RNA c. With RNL1 having a preference for a single stranded region around the ligation junction, 5' adapter ligation may be most efficient for RNA a, followed by b and least efficient for c. Together this may result in an overrepresentation of sRNA a, an intermediate representation of RNA b and an underrepresentation of RNA c in the final library, even if the three RNAs are present at equal amounts in the original RNA sample. Note that in a similar fashion, a given sRNA will be represented differently when changing the adapter sequence (e.g., by adding different barcodes).
Figure 3. Isolation of small RNA and quality control. (A). Electrophoretic separation of Brassica napus total RNA (10 µg) on a 15% TBE urea denaturing polyacrylamide gel. A small RNA ladder (see Table of Materials) was run alongside as a molecular size marker. After migration the gel was stained and the RNA was visualized on a transilluminator. The region from 17 to 29 nucleotides was cut out (indicated by a red rectangle) and RNA was eluted. (B). The quality of the purified RNA was checked by capillary gel electrophoresis. Note that this analysis provides information on the proportion of miRNA in the sample (93% in this case).
Figure 4. Gel purification of a B. napus small RNA library prepared following protocol TS5 and quality control. (A). Increasing amounts of a PCR-amplified library from B. napus small RNA were loaded on a 6% native TBE gel; 2.5 µL (a), 5 µL (b), 10 µL (c) or 20 µL (d) PCR product. A 50 bp ladder was run alongside. PCR products migrating at the expected 150 bp position were isolated (red rectangle), and DNA was eluted and purified. (B). Quality control of the purified library; gel representation. (C). Electropherogram representation of the same analysis. As can be seen, the 150 bp product is increasingly contaminated with adapter dimers (~130 bp) as larger amounts of PCR product are loaded on the gel.
Figure 5. Protocol TS5 performs similarly with reagents from the TS and Nf kits or with reagents from other suppliers. Histograms representing the percentage of the total numbers of raw reads (before trimming) corresponding to RNA(OMe)1-6 with protocol TS5 followed with reagents from the TS and Nf kits (blue bars), or reagents from other suppliers (orange bars). For comparison, results from our previous study are shown by grey bars. The total numbers of reads corresponding to RNA1-6 (total RNA) or RNA-OMe1-6 (total RNA-OMe) are shown as well. Shown are the mean values of at least two independent experiments. Error bars represent standard deviations. Part of this figure has been modified from Dard-Dascot et al, 201812.
Figure 6. Comparison of miRNA detection of protocol TS5 with the classical TS method. (A). The proportion of reads mapping to A. thaliana, B. napus, or H. sapiens miRNAs in miRBase was determined. We also mapped the reads of the human libraries to piRBase for piRNA detection. Note that A. thaliana and B. napus miRNAs, as well as human piRNAs, are 2'-OMe modified, in contrast to human miRNAs. Shown are the mean values of at least two independent experiments and the error bars represent standard deviations. (B). Numbers of miRNAs (or piRNAs) identified. We determined the numbers of known miRNAs from the different species identified with protocols TS or TS5. Note that for A. thaliana, B. napus, or H. sapiens, 427, 92, and 2588 miRNAs have been registered in miRBase, respectively. 0.5 million reads from the TS or TS5 libraries were mapped to B. napus miRNAs in miRBase, and 1 million reads were mapped to the A. thaliana or human databases. Shown are the mean values of at least two independent experiments with standard deviations represented by error bars. This figure has been modified from Dard-Dascot et al, 201812.
Name oligonucleotide | 5' modification | 3' modification | Sequence 5' to 3' | Purification
5' HD (TS5) adapter | 5AmMC6 | | [5AmMC6]GTTCAGAGTTCTACAGTCCGACGATCNrNrNrN (note that this oligo is a DNA-RNA chimera; the three 3' terminal nts are RNA) | HPLC
3' HD (TS5) adapter | phosphate | 3AmMO | [Phos]rNrNrNrNTGGAATTCTCGGGTGCCAAGG[3AmMO] (note that this oligo is a DNA-RNA chimera; the four 5' terminal nts are RNA) | HPLC
5' MRL (TS7) adapter | 5AmMC6 | | [5AmMC6]GTTCAGAGTTCTACAGTCCGACGATCNNNNrArCrGrArUrArC (note that this oligo is a DNA-RNA chimera; the seven 3' terminal nts are RNA) | HPLC
3' MRL (TS7) adapter | phosphate | 3AmMO | [Phos]GTATCGTNNNNNNTGGAATTCTCGG[3AmMO] | HPLC
RT primer | | | GCCTTGGCACCCGAGAATTCCA | HPLC
Universal P5 primer | | | AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA | HPLC
P7-index primer | | | CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA (NNNNNN = index) | HPLC
index 1 | | | CGTGAT |
index 2 | | | ACATCG |
index 3 | | | GCCTAA |
index 4 | | | TGGTCA |
index 5 | | | CACTGT |
index 6 | | | ATTGGC |
index 7 | | | GATCTG |
index 8 | | | TCAAGT |
index 9 | | | CTGATC |
index 10 | | | AAGCTA |
index 11 | | | GTAGCC |
index 12 | | | TACAAG |
Table 1. Oligonucleotides used with this protocol.
Small RNA library preparation remains challenging due to bias, mainly introduced during adapter ligation steps. RNAs with a 2'-OMe modification at their 3' end such as plant miRNAs, piRNA in insects, nematodes and mammals, and small interfering RNAs (siRNA) in insects and plants are particularly difficult to study because the 2'-OMe modification inhibits 3' adapter ligation. A number of solutions have been proposed in the literature to improve sRNA library preparation protocols, but most commercially available kits are still based on the classical TS protocol, which has severe bias. A few 'low bias' kits exist, however, including the Nf kit with randomized adapters and PEG in the ligation reactions, and a few kits appeared recently that avoid adapter ligation altogether12. We reported previously that Nf detects more different miRNAs than the standard TS protocol, but protocol 'S' (without adapter ligation) performed relatively poorly due to a significant formation of side-products12. Surprisingly, upon modification the TS protocols had a more sensitive detection of 2'-OMe sRNAs than the other protocols, but not of normal sRNAs. We chose here to describe in detail the TS5 protocol, in which the adapters are randomized at their extremities, PEG is used in the ligation reactions and excess 3' adapter is eliminated by purification on beads. It should be noted here that a different protocol (TS7), using MRL adapters may perform slightly better than the TS5 protocol. However, as there are only minor differences between the two and because with TS7, it is more difficult to separate the desired library product from adapter dimers, we preferred here to describe in detail the TS5 protocol. However, users can, if desired, replace the TS5 adapters by TS7 adapters. Note that these are slightly longer leading to a final library product of ~170 bp rather than ~150 bp.
Performing protocol TS5 or TS7 with 'home-made' materials substantially reduces cost. However, quality may be more variable with home-made materials; home-made pre-adenylated 3' adapter in particular may vary in quality because the efficiency of pre-adenylation varies. It is therefore recommended to prepare a large stock and, if a new stock is prepared, to compare its performance with the previous one. A control RNA sample can be used for this purpose.
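One possible way to compare two adapter stocks is to prepare a library from the same control RNA with each stock and to look for sRNAs whose normalized read counts differ markedly. The following Python sketch illustrates this idea under stated assumptions: the function name `compare_stocks`, the pseudocount and the two-fold threshold (`max_abs_log2=1.0`) are arbitrary choices made for this example and are not part of the protocol.

```python
import math


def compare_stocks(counts_old, counts_new, pseudocount=1.0, max_abs_log2=1.0):
    """Flag sRNAs whose reads-per-million differ between two control libraries.

    counts_old / counts_new : dict mapping an sRNA name to its raw read count
    in the control library made with the previous / new adapter stock.
    Returns a dict of sRNA name -> log2(new / old) for flagged sRNAs only.
    """
    total_old = sum(counts_old.values())
    total_new = sum(counts_new.values())
    if total_old == 0 or total_new == 0:
        raise ValueError("both libraries must contain reads")
    flagged = {}
    for name in sorted(set(counts_old) | set(counts_new)):
        # Normalize to reads per million; the pseudocount avoids log(0).
        rpm_old = (counts_old.get(name, 0) + pseudocount) / total_old * 1e6
        rpm_new = (counts_new.get(name, 0) + pseudocount) / total_new * 1e6
        log2_ratio = math.log2(rpm_new / rpm_old)
        if abs(log2_ratio) > max_abs_log2:
            flagged[name] = log2_ratio
    return flagged
```

An empty result suggests that the new stock reproduces the profile of the previous one within the chosen threshold; replicate libraries would be needed to judge whether a flagged difference exceeds normal run-to-run variation.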
A disadvantage of the protocols described herein is the relatively strong formation of side-products and the difficulty of separating the desired library from these products. Care must be taken not to overload the acrylamide gel for purification. Recently, modified adapters were developed that have a strongly reduced tendency to form dimers21. A 2'-OMe modification at the 3' end of the 5' adapter combined with a methylphosphonate modification at the 5' extremity of the 3' adapter efficiently suppressed adapter dimers. It will be interesting to test such modified adapters in the protocol described here.
The gel purification steps in protocol TS5 are relatively labor-intensive. If the use of modified adapters efficiently reduces the formation of adapter dimers, gel purification of the final library may not be necessary anymore. In addition, as an alternative to the gel purification step to isolate small RNA (step 1), a strategy using magnetic beads to enrich for small RNAs exists (https://ls.beckmancoulter.co.jp/files/appli_note/Supplemental_Protocol_for_miRNA_.pdf). We have not used this method ourselves, but it is worth testing and if it works well it could significantly simplify the protocol.
In conclusion, while protocol TS5 could be further improved, it performs better than commercially available kits, at least those tested in our previous comparative analysis, for the detection of 2'-OMe sRNAs. It can be followed using home-made materials, allowing significant cost reduction. For convenience, and perhaps more consistent performance, reagents from the TS and Nf kits can be used.
The authors have nothing to disclose.
This work was supported by the National Center for Scientific Research (CNRS), The French Alternative Energies and Atomic Energy Commission (CEA) and Paris-Sud University. All library preparation, Illumina sequencing and bioinformatics analyses for this study were performed at the I2BC Next-Generation Sequencing (NGS) facility. The members of the I2BC NGS facility are acknowledged for critical reading of the manuscript and helpful suggestions.
2100 Bioanalyzer Instrument | Agilent | G2939BA | |
Acid-Phenol:Chloroform, pH 4.5 (with IAA, 125:24:1) | ThermoFisher | AM9720 | |
Adenosine 5'-Triphosphate (ATP) | New England Biolabs | P0756S | |
Agencourt AMPure XP beads | Beckman Coulter | A63880 | |
Bioanalyzer High Sensitivity DNA Kit | Agilent | 5067-4626 | |
Bioanalyzer Small RNA Kit | Agilent | 5067-1548 | |
Corning Costar Spin-X centrifuge tube filters | Sigma Aldrich | CLS8162-96EA | |
Dark Reader transilluminator | various suppliers | ||
HotStart PCR Kit, with dNTPs | Kapa Biosystems | KK2501 | |
NEXTflex small RNA-seq V3 kit | BIOO Scientific | NOVA-5132-05 | optional |
Novex TBE gels 6% | ThermoFisher | EC6265BOX | |
Novex TBE Urea gels 15% | ThermoFisher | EC6885BOX | |
QIAquick Nucleotide Removal Kit | Qiagen | 28304 | |
Qubit 4 Quantitation Starter Kit | ThermoFisher | Q33227 | |
Qubit ssDNA Assay Kit | ThermoFisher | Q10212 | |
RNA Gel Loading Dye (2X) | ThermoFisher | R0641 | |
RNase Inhibitor, Murine | New England Biolabs | M0314S | |
SuperScript IV Reverse Transcriptase | ThermoFisher | 18090200 | |
SYBR Gold Nucleic Acid Gel Stain | ThermoFisher | S11494 | |
T4 RNA Ligase 1 (ssRNA Ligase) | New England Biolabs | M0204S | |
T4 RNA Ligase 2, truncated | New England Biolabs | M0242S | |
TrackIt 50 bp DNA ladder | ThermoFisher | 10488043 | |
TruSeq Small RNA Library Prep Kit | Illumina | RS-200-0012/24/36/48 | optional |
UltraPure Glycogen | ThermoFisher | 10814010 | |
XCell SureLock Mini-Cell | ThermoFisher | EI0001 | |
ZR small RNA ladder | Zymo Research | R1090 | |
Note on the TruSeq Small RNA Library Prep Kit catalog number: the last two numbers correspond to the set of indexes.