Here, we describe a step-by-step strategy for isolating small RNAs, enriching for microRNAs, and preparing samples for high-throughput sequencing. We then describe how to process sequence reads and align them to microRNAs, using open source tools.
Half of all human transcripts are thought to be regulated by microRNAs. Therefore, quantifying microRNA expression can reveal underlying mechanisms in disease states and provide therapeutic targets and biomarkers. Here, we detail how to accurately quantify microRNAs. Briefly, this method describes isolating microRNAs, ligating them to adaptors suitable for high-throughput sequencing, amplifying the final products, and preparing a sample library. Then, we explain how to align the obtained sequencing reads to microRNA hairpins, and quantify, normalize, and calculate their differential expression. Versatile and robust, this combined experimental workflow and bioinformatic analysis enables users to begin with tissue extraction and finish with microRNA quantification.
MicroRNAs were first discovered in 1993¹, and nearly 2000 are now estimated to be present in the human genome². MicroRNAs are small non-coding RNAs that are typically 21-24 nucleotides long. They are post-transcriptional regulators of gene expression, often binding to complementary sites in the 3'-untranslated region (3'-UTR) of target genes to repress protein expression and degrade mRNA. Quantifying microRNAs can give valuable insight into gene expression, and several protocols have been developed for this purpose³.
We have developed a defined, reproducible, and long-standing protocol for small RNA sequencing, and for analyzing normalized reads using open source bioinformatics tools. Importantly, our protocol enables the simultaneous identification of both endogenous microRNAs and exogenously delivered constructs that produce microRNA-like species, while minimizing reads that map to other small RNA species, including ribosomal RNAs (rRNAs), transfer RNA-derived small RNAs (tsRNAs), repeat-derived small RNAs, and mRNA degradation products. Fortunately, microRNAs are 5'-phosphorylated and 2'-3' hydroxylated⁴, a feature that can be leveraged to separate them from these other small RNAs and mRNA degradation products. Several commercial options exist for microRNA cloning and sequencing that are often quicker and easier to multiplex; however, the proprietary nature of kit reagents and their frequent modifications make comparing sample runs challenging. Our strategy is optimized to collect only RNAs of the correct size for microRNAs through acrylamide and agarose gel purification steps. In this protocol, we also describe a procedure for aligning sequence reads to microRNAs using open source tools. This set of instructions will be especially useful for novice informatics users, regardless of whether our library preparation method or a commercial method is used.
This protocol has been used in several published studies. For example, it was used to identify the mechanism by which the Dicer enzyme cleaves small hairpin RNAs at a distance of two nucleotides from the internal loop of the stem-loop structure, the so-called "loop-counting rule"⁵. We also followed these methods to identify the relative abundance of delivered small hairpin RNAs (shRNAs) expressed from recombinant adeno-associated viral vectors (rAAVs), and to identify the threshold of shRNA expression that can be tolerated before liver toxicity associated with excess shRNA expression occurs⁶. Using this protocol, we also identified microRNAs in the liver that respond to the absence of microRNA-122, a highly expressed hepatic microRNA, while also characterizing the degradation pattern of this microRNA⁷. Because we have used our protocol consistently in numerous experiments, we have been able to observe sample preparations longitudinally and see that there are no discernible batch effects.
In sharing this protocol, our goal is to enable users to generate high quality, reproducible quantification of microRNAs in virtually any tissue or cell line, using affordable equipment and reagents, and free bioinformatics tools.
Animal experiments were authorized by the Institutional Animal Care and Use Committee of the University of Washington.
Small RNA library preparation
1. RNA isolation
2. 3' adaptor ligation
3. 5' linker ligation
4. Reverse transcription (RT)
5. PCR amplification
6. Agarose gel purification
Small RNA sequence alignment and bioinformatics
7. Data upload
8. Adaptor removal, barcode sort, and trim
9. Alignment of reads to microRNAs
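As a rough illustration of what step 8 involves, the following Python sketch sorts reads by barcode and trims the 3' linker. It is a sketch under stated assumptions, not the tool used in the protocol: it assumes each read begins with the 4-nt barcode from the 5' adaptor (written as DNA, with U read as T), followed by the small RNA insert and then the 3' linker 1 sequence from Table 1. The function name `demultiplex_and_trim`, the exact-match strategy, and the 17-28 nt length filter (borrowed from the gel size window) are hypothetical choices.

```python
# Sequences below are derived from Table 1; the read layout is an assumption.
LINKER_3P = "CTGTAGGCACCATCAAT"  # 3' linker 1, without the rApp / NH2 modifications
BARCODES = {"TATG": "bc7", "TCAT": "bc17"}  # barcode7 and barcode17 as DNA (U -> T)
MIN_LEN, MAX_LEN = 17, 28  # hypothetical filter matching the excised gel window


def demultiplex_and_trim(read, barcodes=BARCODES, linker=LINKER_3P):
    """Return (sample, insert) for a read, or None if it cannot be assigned."""
    sample = barcodes.get(read[:4])  # exact match on the first 4 nt
    if sample is None:
        return None
    body = read[4:]
    linker_start = body.find(linker)  # exact match only; no mismatches tolerated
    if linker_start == -1:
        return None
    insert = body[:linker_start]
    if not MIN_LEN <= len(insert) <= MAX_LEN:
        return None
    return sample, insert


# Example read: barcode + a 22-nt insert (the mature miR-122-5p sequence,
# used purely as an illustration) + linker + trailing bases.
example_read = "TATG" + "TGGAGTGTGACAATGGTGTTTG" + LINKER_3P + "ATCTCGT"
print(demultiplex_and_trim(example_read))
```

A production pipeline would normally allow for sequencing errors in the barcode and linker; exact matching is used here only to keep the logic easy to follow.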
Schematic of steps involved in library preparation
An overall schematic of small RNA extraction, sequencing, and alignment is outlined in Figure 2.
Liver samples from one male and one female mouse were collected and snap frozen in liquid nitrogen. Total RNA was extracted and evaluated for quality and concentration.
Small RNA sequencing yields sufficient RNA for sequencing
3 μg of RNA from two independent RNA extractions were used as starting material for small RNA sequencing. Samples were run on an acrylamide gel and cut out between size markers corresponding to 17-28 nt of RNA (Figure 1A). Samples were chopped into fragments for RNA isolation (Figure 1B) and transferred to a low-retention 1.5 mL centrifuge tube (Figure 1C). Barcodes bc7 and bc17 (Table 1) were ligated to the 5' end of the small RNA. Small RNA libraries were PCR amplified using 22 cycles of PCR to yield 8.0 and 11.2 ng/μL product, respectively. Samples were pooled and a 10 nM pooled sample was submitted for high-throughput sequencing using a 50 bp read length.
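The step from a measured concentration (ng/μL) to a 10 nM pool relies on a standard mass-to-molarity conversion for double-stranded DNA. The sketch below shows that arithmetic in Python. The average mass of about 660 g/mol per base pair is a common approximation, and the library length must be supplied by the user; the 135 bp used in the example is only an estimate for a ~22 nt insert flanked by the Table 1 primers and should be checked against the gel. The function names are hypothetical.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def library_nanomolar(conc_ng_per_ul, length_bp):
    """Convert a dsDNA concentration in ng/uL to nM for a given fragment length."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def dilution_volumes(stock_nm, target_nm, final_ul):
    """Return (stock volume, diluent volume) in uL to reach target_nm in final_ul."""
    if target_nm > stock_nm:
        raise ValueError("stock is more dilute than the target concentration")
    stock_ul = final_ul * target_nm / stock_nm
    return stock_ul, final_ul - stock_ul


# Example with the yields reported above and an ASSUMED library length of 135 bp.
for conc in (8.0, 11.2):
    stock = library_nanomolar(conc, length_bp=135)
    print(conc, "ng/uL ->", round(stock, 1), "nM;", dilution_volumes(stock, 10.0, 20.0))
```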
MiR-122 is the most abundant microRNA in the mouse liver
After barcode sorting, 851,931 reads contained barcodes from liver sample 1 and 650,154 from liver sample 2. Of these reads, 83.5% and 90.0% mapped to microRNAs, respectively, with the remaining reads mapping to rRNAs (1.8% and 0.6%, respectively), tRNAs, and mRNA degradation fragments. After alignment to human microRNA hairpins, we observed strong concordance between microRNA read counts in each replicate (R² = 0.998; Figure 3). A total of 306 microRNA species were detected, with the greatest number of reads mapping to miR-122 (Supplemental Table 4). MicroRNA abundance was similar between male and female liver samples.
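The normalization and concordance check described here can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code behind Figure 3: it assumes reads per million (RPM) is computed against the total of mapped microRNA reads, and that R² is the squared Pearson correlation of log10-transformed RPM values with a pseudocount of 1 (the text does not state whether the reported R² used raw or log-scaled values). The counts in the example are invented.

```python
import math


def reads_per_million(counts):
    """Scale raw counts so that they sum to one million."""
    total = sum(counts.values())
    return {name: n * 1e6 / total for name, n in counts.items()}


def r_squared_log10(rpm_a, rpm_b, pseudocount=1.0):
    """Squared Pearson correlation of log10(RPM + pseudocount) over shared names."""
    shared = sorted(set(rpm_a) & set(rpm_b))
    xs = [math.log10(rpm_a[m] + pseudocount) for m in shared]
    ys = [math.log10(rpm_b[m] + pseudocount) for m in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant values")
    return sxy * sxy / (sxx * syy)


# Invented counts for two samples, for illustration only.
sample_1 = {"miR-122": 8000, "miR-21": 1500, "let-7a": 500}
sample_2 = {"miR-122": 7000, "miR-21": 1200, "let-7a": 450}
print(r_squared_log10(reads_per_million(sample_1), reads_per_million(sample_2)))
```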
Figure 1. Extraction of small RNAs from an acrylamide gel. (A) Acrylamide gel and region that is cut corresponding to the size of microRNAs. (B) Gel pieces before and after cutting into smaller fragments. (C) Process for transferring gel fragments into siliconized tubes. (D) PCR reaction on low-melt agarose gel demonstrating correct cloned product compared with linker-linker product and unsaturated (22 cycles) versus saturated (24 cycles) samples.
Figure 2. Schematic of protocol. A timeline showing the major steps involved in the procedure.
Figure 3. Reproducibility of results from two independent RNA extractions. Scatterplot of microRNA read counts from a male mouse liver (x-axis) compared with a female mouse liver (y-axis) using a log10-based scale. Each point represents the reads per million (RPM) mapped microRNA count for each individual microRNA.
Primer | Sequence
3' linker 1 | rAppCTGTAGGCACCATCAAT–NH2
lower size marker | rArUrCrGrCrArUrGrCrUrGrArCrGrUrArCrUrArGGTAACCGCATCATGCGTC
upper size marker | rArArUrCrArGrCrGrGrArUrUrGrCrArUrGrArArCrGrUrArCrArUrArGGTAACCGCATCATGCGTC
barcode1 | /5AmMC6/ACGCTCTTCCGATCTrArGrCrG
barcode2 | /5AmMC6/ACGCTCTTCCGATCTrCrGrUrC
barcode3 | /5AmMC6/ACGCTCTTCCGATCTrCrUrGrG
barcode4 | /5AmMC6/ACGCTCTTCCGATCTrArCrUrU
barcode5 | /5AmMC6/ACGCTCTTCCGATCTrGrGrGrU
barcode6 | /5AmMC6/ACGCTCTTCCGATCTrGrUrUrA
barcode7 | /5AmMC6/ACGCTCTTCCGATCTrUrArUrG
barcode8 | /5AmMC6/ACGCTCTTCCGATCTrUrCrGrC
barcode9 | /5AmMC6/ACGCTCTTCCGATCTrGrCrArG
barcode10 | /5AmMC6/ACGCTCTTCCGATCTrArUrArC
barcode11 | /5AmMC6/ACGCTCTTCCGATCTrUrUrCrU
barcode12 | /5AmMC6/ACGCTCTTCCGATCTrCrArArU
barcode13 | /5AmMC6/ACGCTCTTCCGATCTrArArGrA
barcode14 | /5AmMC6/ACGCTCTTCCGATCTrUrGrArA
barcode15 | /5AmMC6/ACGCTCTTCCGATCTrUrGrGrG
barcode16 | /5AmMC6/ACGCTCTTCCGATCTrArUrUrG
barcode17 | /5AmMC6/ACGCTCTTCCGATCTrUrCrArU
barcode18 | /5AmMC6/ACGCTCTTCCGATCTrGrUrArU
RT primer | ATTGATGGTGCCTACAG
PCR primer F | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
PCR primer R | CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTATTGATGGTGCCTACAG
Table 1. List of primers.
Supplemental Table 1. List of barcode sequences.
Supplemental Table 2. Curated list of mouse microRNA precursor sequences.
Supplemental Table 3. Curated list of human microRNA precursor sequences.
Supplemental Table 4. Raw and normalized microRNA read counts.
Despite the identification of microRNAs over 20 years ago¹³, the process of microRNA sequencing remains laborious and requires specialized equipment, hindering laboratories from routinely adopting in-house protocols¹⁴. Other techniques, such as microRNA microarrays and multiplexed expression panels, can also evaluate many microRNAs simultaneously; however, these approaches are limited in that they only quantify the microRNAs present in their probe set. Because of this, they miss important features of small RNA sequencing, such as the identification of novel microRNAs and of microRNA isoforms, whose nucleotide changes can have important biological functions⁶,⁷,¹⁵.
When starting a new experiment, using a commercial vendor is often easiest because they offer technical support and ease of use. Several commercial options are available for microRNA sequencing, which can be multiplexed to reduce the workload when processing large numbers (>100) of samples. These commercial kits are continually being improved, which is both an advantage and a disadvantage. On the one hand, the companies that make these kits have developed novel microRNA capture methods, for example, through circularization of their 5' and 3' ends prior to sequencing or using degenerate linkers with random sequences at each end to reduce ligation bias. They have also developed methods to remove adaptor-dimers, for example, through ligation of double-stranded adaptors or hybridization of complementary oligonucleotides. On the other hand, commercial kits recommend against modification or alteration of any step. Therefore, if any updates are made to a kit, it is difficult or impossible to compare data derived from old and new versions of kits, as well as data derived from kits from different commercial vendors. Here, we have described a protocol that has staying power in the face of commercial alternatives. Our focus on gel purification steps, although they add time to the protocol, enables consistent microRNA capture and reproducibility over the many years we have used it. Several evaluations of the reproducibility between commercial kits and in-house protocols have been made, and we refer the reader to a few of these studies¹⁶,¹⁷,¹⁸,¹⁹. Importantly, the steps we outline for bioinformatics analysis of microRNAs can be employed regardless of the choice of kit or in-house protocol.
MicroRNA sequencing is often troubled by the choice of barcodes: in some instances, the ligation efficiency of the various barcodes may not be equivalent, leading to biased distributions of sequences in the samples²⁰. It is now recommended to use degenerate bases at the 5' and 3' ends to minimize ligation biases of specific microRNAs²¹,²². In this protocol, we have not observed these ligation issues and have observed consistent readouts for technical and biological replicates evaluated with different barcodes⁵,⁶,⁷,²³, but it is important to be aware of them. Methods to avoid ligation bias include incorporating index sequences into the PCR primers or adding one or more random RNA nucleotides at the 3' end of the 5' adaptor sequence (Table 1). Introducing one or more synthetic spike-in RNAs, such as the C. elegans miR-39 microRNA²⁴, is also an option for normalization purposes, which is critical for low-yield situations, like quantifying microRNAs from exosomes. Likewise, for RNA ligation, we have successfully used T4 RNA ligase 1, but ligation with less bias has been demonstrated for a truncated form of T4 RNA ligase 2²⁵. Finally, SuperScript III and IV are alternative reverse transcription enzymes that we have used without issue.
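Where a spike-in such as cel-miR-39 is used, one simple way to normalize is to express each microRNA count relative to the spike-in count from the same library. The Python sketch below illustrates that idea only; it assumes the same amount of spike-in was added to every sample, and the function name, the spike-in label, and the counts are hypothetical.

```python
def normalize_to_spike_in(counts, spike_name="cel-miR-39"):
    """Return each microRNA count divided by the spike-in count of the same sample."""
    spike_reads = counts.get(spike_name, 0)
    if spike_reads == 0:
        raise ValueError("no spike-in reads found; cannot normalize this sample")
    return {
        name: n / spike_reads
        for name, n in counts.items()
        if name != spike_name  # the spike-in itself is not reported
    }


# Invented counts for a low-yield sample, for illustration only.
exosome_counts = {"cel-miR-39": 200, "miR-122": 50, "miR-21": 400}
print(normalize_to_spike_in(exosome_counts))
```

Because the ratio depends on how much spike-in RNA was added and at which step, values are only comparable between samples prepared identically.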
The choice of microRNA database can also influence the final normalized results. A challenge with curating microRNA databases is that several novel small RNAs listed as microRNAs are actually fragments of repeat elements and not bona fide microRNAs². Efforts have been made to retire microRNAs that do not conform to standard criteria, so that the next version of a microRNA database is more refined; however, the next iteration also contains new candidates that need confirmation. When repeat-derived microRNAs are included in alignments, they can skew the results and overwhelm data from existing microRNAs. Therefore, the use of well-curated datasets of microRNAs from different species is essential²⁶,²⁷. We have experienced greatest reproducibility when aligning small RNA sequencing reads to curated lists of conserved microRNAs and have included these hairpins in Supplementary Table 2 and Supplementary Table 3. These lists match estimates of about 500 high confidence microRNAs in the human genome²,²⁶.
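To make concrete how the reference list shapes the counts, the following Python sketch assigns trimmed reads to hairpins by exact substring matching. This is a deliberately simplified stand-in for a real short-read aligner and is not the tool used in the protocol; the hairpin names and sequences are invented, and both reads and hairpins are assumed to be written in the DNA alphabet.

```python
from collections import Counter


def count_hairpin_matches(reads, hairpins):
    """Count reads that occur as an exact substring of exactly one hairpin."""
    counts = Counter()
    for read in reads:
        hits = [name for name, seq in hairpins.items() if read in seq]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["ambiguous"] += 1  # read matches more than one hairpin
        else:
            counts["unmapped"] += 1
    return counts


# Invented hairpins built around two example inserts, for illustration only.
insert_a = "TGGAGTGTGACAATGGTGTTTG"
insert_b = "TAGCTTATCAGACTGATGTTGA"
curated = {
    "hairpin-A": "GGCT" + insert_a + "CCAAGTC",
    "hairpin-B": "CCA" + insert_b + "CTGGTT",
}
reads = [insert_a, insert_a, insert_b, "ACGTACGTACGTACGTACGT"]
print(count_hairpin_matches(reads, curated))
```

Adding a repeat-derived entry to the reference dictionary would draw reads into a new category or into the ambiguous bin, which is the kind of skew that a curated list is meant to avoid.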
As with any technique, results should be confirmed with an orthogonal approach. We have successfully reproduced small RNA sequencing results with small RNA northern blots that incorporate radiolabeled probes to confirm the size and relative abundance of candidate microRNAs⁶,⁷,²³. Quantitative PCR of microRNAs using split-read sequencing, and confirmation of target mRNA changes using qPCR and western blotting are other options for validation.
In summary, we have provided a method to isolate and sequence microRNAs and perform alignments against existing microRNA databases. The affordability of the reagents and equipment and the use of open source tools for analysis should make this protocol accessible to anyone. Finally, this protocol can be used in any tissue or cell line to yield highly reproducible, high-quality reads.
The authors have nothing to disclose.
We would like to thank members of the laboratories of Andrew Fire and Mark Kay for guidance and suggestions.
Name | Company | Catalog Number | Comments |
100 bp DNA ladder | NEB | N3231 | |
19:1 bis-acrylamide | Millipore Sigma | A9926 | |
25 bp DNA step ladder | Promega | G4511 | |
Acid phenol/chloroform | ThermoFisher | AM9720 | |
Acrylamide RNA loading dye | ThermoFisher | R0641 | |
Ammonium persulfate (APS) | Biorad | 161-0700 | |
Bioanalyzer instrument | Agilent | G2991AA | For assessing RNA quality and concentration |
Chloroform | Fisher Scientific | C298-500 | |
Ethanol (100%) | Sigma | E7023 | |
Gel Loading Buffer II | ThermoFisher | AM8547 | |
GlycoBlue | ThermoFisher | AM9516 | Blue color helps in visualizing pellet |
HCl | Sigma | 320331 | |
KOH | Sigma | P5958 | |
Maxi Vertical Gel Box 20 x 20cm | Genesee | 45-109 | |
miRVana microRNA isolation kit | ThermoFisher | AM1560 | |
miSeq system | Illumina | SY-410-1003 | For generating small RNA sequencing data |
NaCl | Fisher Scientific | S271-500 | |
Nusieve low-melting agarose | Lonza | 50081 | |
Parafilm (laboratory sealing film) | Millipore Sigma | P7793 | |
Poly-ethylene glycol 8000 | NEB | included with M0204 | |
ProtoScript II First strand cDNA Synthesis Kit | NEB | E6560S | |
QIAquick Gel Extraction kit | Qiagen | 28704 | |
Qubit Fluorometer | ThermoFisher | Q33226 | For quantifying DNA concentration |
Qubit RNA HS Assay kit | ThermoFisher | Q32855 | |
Razor Blades | Fisher Scientific | 12640 | |
Siliconized Low-Retention 1.5 ml tubes | Fisher Scientific | 02-681-331 | |
T4 RNA ligase 1 | NEB | M0204 | |
T4 RNA Ligase 2, truncated | NEB | M0242S | |
TapeStation | Agilent | G2939BA | For assessing RNA quality and concentration |
Taq DNA Polymerase | NEB | M0273X | |
TEMED | Biorad | 161-0800 | |
Tris Base pH 7.5 | Sigma | 10708976001 | |
Tris-buffered EDTA | Sigma | T9285 | |
Trizol | ThermoFisher | 15596026 | |
UltraPure Ethidium bromide (10 mg/ml) | Invitrogen | 15585-011 | |
Universal miRNA cloning linker | NEB | S1315S | |
Urea | Sigma | U5378 |