Circular RNAs (circRNAs) are non-coding RNAs that may have roles in transcriptional regulation and mediating interactions between proteins. Following assessment of different parameters for construction of circRNA sequencing libraries, a protocol was compiled utilizing stranded total RNA library preparation with RNase R pre-treatment and is presented here.
Circular RNAs (circRNAs) are a class of non-coding RNAs involved in functions including microRNA (miRNA) regulation, mediation of protein-protein interactions, and regulation of parental gene transcription. In classical next-generation RNA sequencing (RNA-seq), circRNAs are typically overlooked as a result of poly-A selection during construction of mRNA libraries, or are found at very low abundance, and are therefore difficult to isolate and detect. Here, a circRNA library construction protocol was optimized by comparing library preparation kits, pre-treatment options, and various total RNA input amounts. Two commercially available whole transcriptome library preparation kits were tested, with and without RNase R pre-treatment, using variable amounts of total RNA input (1 to 4 µg). Lastly, multiple tissue types (liver, lung, lymph node, and pancreas) and multiple brain regions (cerebellum, inferior parietal lobe, middle temporal gyrus, occipital cortex, and superior frontal gyrus) were compared to evaluate circRNA abundance across tissue types. Analysis of the generated RNA-seq data using six different circRNA detection tools (find_circ, CIRI, Mapsplice, KNIFE, DCC, and CIRCexplorer) revealed that a stranded total RNA library preparation kit with RNase R pre-treatment and 4 µg RNA input is the optimal method for identifying the highest relative number of circRNAs. Consistent with previous findings, the highest enrichment of circRNAs was observed in brain tissues compared to other tissue types.
Circular RNAs (circRNAs) are endogenous, non-coding RNAs that have gained attention given their pervasive expression in the eukaryotic transcriptome1,2,3. They are formed when exons back-splice to each other and hence were initially considered to be splicing artifacts4,5. However, recent studies have demonstrated that circRNAs exhibit cell type-, tissue-, and developmental stage-specific expression3,6 and are evolutionarily conserved2,3. Furthermore, they are involved in mediation of protein-protein interactions7, microRNA (miRNA) binding3,8,9,10, and regulation of parental gene transcription11.
In classical RNA sequencing (RNA-seq), circRNAs may be completely lost during library construction as a result of poly-A selection for mRNAs or may be difficult to isolate given their low abundance. However, recent circRNA characterization studies have incorporated a pre-treatment step using RNase R in order to enrich for circRNAs2,12,13. RNase R is an exoribonuclease that digests linear RNAs, leaving behind circular RNA structures. CircRNA enrichment protocols were optimized by generating and comparing data from two commercially available whole transcriptome library construction kits, with and without an RNase R pre-treatment step, and using varying amounts of total RNA input (1 to 4 µg). The optimized protocol was next used to evaluate the abundance of circRNAs across five different brain regions (cerebellum [BC], inferior parietal lobe [IP], middle temporal gyrus [MG], occipital cortex [OC], and superior frontal gyrus [SF]) and four other tissue types (liver [LV], lung [LU], lymph node [LN], and pancreas [PA]). RNA-seq libraries were paired-end sequenced and data was analyzed using six different circRNA prediction algorithms: find_circ3, CIRI14, Mapsplice15, KNIFE16, DCC17, and CIRCexplorer18. Based on this analysis, the highest number of unique circRNAs was detected when using a stranded total RNA library preparation kit with RNase R pre-treatment and 4 µg total input RNA. The optimized protocol is described here. As previously reported19,20, the highest enrichment of circRNAs was observed in the brain compared to other tissue types.
This research has been performed in compliance with all institutional, national and international guidelines for human welfare. Brain tissues were obtained from the Banner Sun Health Research Institute Brain and Body Donation Program in Sun City, AZ. The operations of the Brain and Body Donation Program are approved by the Western Institutional Review Board (WIRB protocol #20120821). All subjects or their legal representatives signed the informed consent. Commercial (non-brain) biospecimens were purchased from Proteogenex.
1. RNase R Treatment
NOTE: In the following steps, the reaction volume is adjusted to a total volume of 50 µL. This is the minimum sample volume to be used in the RNA cleanup & concentrator kit (see Table of Materials). Additionally, the optimized protocol described here is for an input amount of 4 µg of total RNA. A longer incubation time for RNase R treatment is recommended for an input amount >4 µg.
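As a rough aid for setting up the 50 µL reaction, the volume of RNA stock that delivers the 4 µg input can be computed from the measured concentration, with water making up the remainder. The sketch below is only illustrative: the function name, the example concentration, and the `other_reagents_ul` parameter (the combined volume of buffer and enzyme, which is not specified in this section) are assumptions, not part of the protocol.

```python
def rnase_r_reaction_volumes(rna_conc_ng_per_ul, input_ug=4.0,
                             total_ul=50.0, other_reagents_ul=0.0):
    """Return (rna_ul, water_ul) for a reaction of total_ul microliters.

    rna_conc_ng_per_ul : measured concentration of the total RNA stock.
    input_ug           : desired RNA input (4 ug in the optimized protocol).
    other_reagents_ul  : hypothetical combined volume of buffer and enzyme;
                         take the real value from the reagent instructions.
    """
    if rna_conc_ng_per_ul <= 0:
        raise ValueError("RNA concentration must be positive")
    # Convert micrograms to nanograms, then divide by ng/uL to get uL.
    rna_ul = input_ug * 1000.0 / rna_conc_ng_per_ul
    water_ul = total_ul - rna_ul - other_reagents_ul
    if water_ul < 0:
        raise ValueError("RNA stock is too dilute to fit in the reaction volume")
    return rna_ul, water_ul


# Example with made-up numbers: a 200 ng/uL stock and 6 uL of other reagents.
rna_ul, water_ul = rnase_r_reaction_volumes(200.0, other_reagents_ul=6.0)
print(f"RNA: {rna_ul:.1f} uL, water: {water_ul:.1f} uL")
```

If the function raises because the stock is too dilute, the sample would need to be concentrated before treatment.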
2. Purifying RNA Using an RNA Cleanup and Concentrator Kit
NOTE: When using high quality RNA (RIN>8, DV200>80%), RNase R treatment may result in loss of approximately 60% of RNA. Using a 4 µg input, it is estimated that 2–2.5 µg of treated RNA is left after section 1.
3. circRNA Library Prep
NOTE: See Table of Materials for kit, which contains most reagents used in this section.
4. Data Analysis Workflow
Data generated using a commercially available universal control RNA (UC) and two library preparation kits, both of which include a ribo-depletion step in their protocols, was first assessed. Using the analytical workflow described in section 4 (Data Analysis Workflow), a higher number of circRNAs was detected overall in the TruSeq datasets than in the Kapa datasets (Figure 1). Although the ribosomal RNA (rRNA) percentages were comparable between the two kits and below 6% for the lower input amounts (1 and 2 µg), the Kapa datasets had higher rRNA content for the 4, 5, and 10 µg inputs (Table 2). Hence, based on the number of detected circRNAs and rRNA depletion efficiency, further experiments were performed using the TruSeq kit.
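Because the libraries differ in sequencing depth, the circRNA counts plotted in Figure 1 are normalized to the number of mapped reads, per million. A minimal sketch of that normalization is given below. The function name and the example numbers are made up for illustration, and mapped read counts would come from the aligner output (Table 1 lists total reads only).

```python
def circrnas_per_million_mapped(n_circrnas, mapped_reads):
    """Number of detected circRNAs per million mapped reads."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return n_circrnas / (mapped_reads / 1_000_000)


# Hypothetical example: 5,000 circRNAs detected in a library
# with 80 million mapped reads.
print(circrnas_per_million_mapped(5000, 80_000_000))
```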
Next, the significance of RNase R pre-treatment was tested by comparing the data generated from RNase R pre-treated and non-pre-treated libraries. To this end, total RNA was extracted from the MG of healthy elderly individuals and sequencing data generated from libraries with (N = 3) and without (N = 3) pre-treatment using RNase R23 was compared. A higher number of circRNAs was consistently identified in the pre-treated libraries compared to the non-pre-treated ones (Figure 2). This is expected since pre-treatment removes linear RNAs, thus enriching for circRNA species.
Thirdly, the amount of input RNA that would be optimal for detecting a higher diversity of circRNAs was tested. Libraries were prepared using 1, 2, and 4 µg of total input RNA extracted from the MG, OC, and SF brain regions, as well as UC RNA. Comparing the abundance of circRNAs detected from each library, the highest diversity of circRNA species was observed when using 4 µg input RNA compared to 2 and 1 µg (Figure 3), as reflected by the number of unique circRNAs identified. One caveat to note is that although various incubation times during RNase R treatment were not tested, a trend whereby an increasing number of circRNAs was detected across total RNA inputs of 1 to 4 µg was observed when controlling for all other parameters.
This optimized protocol was then applied across multiple tissue types to compare circRNA abundances. Five brain regions, including BC, MG, OC, IP, and SF, from four healthy elderly individuals, were tested, along with four other tissue types, including LV, LU, LN and PA, from six healthy donors. Overall, a higher abundance of circRNAs was observed in the brain compared to other tissue types (Figure 4), as has been previously reported19,20.
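Figure 4 counts circRNAs that were detected by at least three of the six tools. One simple way to compute such a consensus set is sketched below. It assumes that each tool's calls have already been converted to a common representation, here a hypothetical (chromosome, start, end, strand) tuple for the back-splice junction, with coordinates harmonized across tools beforehand. The toy calls are invented purely for illustration.

```python
from collections import Counter


def consensus_circrnas(calls_by_tool, min_tools=3):
    """Return the junctions reported by at least min_tools tools.

    calls_by_tool maps a tool name to an iterable of junctions, each a
    hashable (chrom, start, end, strand) tuple in a shared coordinate system.
    """
    support = Counter()
    for calls in calls_by_tool.values():
        # set() ensures each tool contributes at most one vote per junction.
        support.update(set(calls))
    return {junction for junction, n in support.items() if n >= min_tools}


# Toy data (not real results): four invented junctions a-d.
a = ("chr1", 1000, 2000, "+")
b = ("chr2", 500, 900, "-")
c = ("chr3", 300, 700, "+")
d = ("chr4", 100, 400, "-")
toy_calls = {
    "find_circ": [a, b],
    "CIRI": [a, b, c],
    "Mapsplice": [a],
    "KNIFE": [b],
    "DCC": [a, c],
    "CIRCexplorer": [d],
}
print(sorted(consensus_circrnas(toy_calls)))
```

In this toy example only junctions `a` (four tools) and `b` (three tools) pass the threshold. Requiring agreement among several tools trades some sensitivity for fewer tool-specific false positives.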
Figure 1: CircRNA detection using TruSeq vs. Kapa total RNA kits. Sequencing data was generated for UC RNA using two separate total RNA library preparation kits, each with 1, 2, 4, 5, and 10 µg input RNA and ribonuclease R (RNase R) pre-treatment. The number of circRNAs detected by the tools in each sample was normalized to the number of mapped reads, per million (Y-axis).
Figure 2: CircRNA detection with and without RNase R pre-treatment. Sequencing data generated using the TruSeq kit was used to compare the impact of RNase R pre-treatment. RNA was extracted from the middle temporal gyrus (MG) of healthy elderly controls for this analysis. The normalized number of circRNAs detected (Y-axis) was calculated similarly to Figure 1. RNase R+ = pre-treated with RNase R, RNase R- = not pre-treated with RNase R.
Figure 3: CircRNA detection using varying amounts of input RNA. The number of unique circRNAs detected when using 1, 2, and 4 µg of input RNA was compared for RNA extracted from MG, occipital cortex (OC), and superior frontal gyrus (SF), as well as for UC RNA. All libraries were constructed using the TruSeq kit with RNase R pre-treatment. The normalized number of circRNAs detected (Y-axis) was calculated similarly to Figure 1.
Figure 4: CircRNA detection in brain vs. other tissue types. CircRNA enriched datasets using RNA extracted from various brain regions including cerebellum (BC), inferior parietal lobe (IP), MG, OC, and SF, as well as four other tissue types including liver (LV), lung (LU), lymph node (LN), and pancreas (PA) were generated. CircRNA enrichment was carried out using the Illumina TruSeq kit with RNase R pre-treatment and 4 µg of total input RNA. Box plots represent the number of circRNAs detected by at least three of the six tools across the samples from each brain region/tissue type.
Test # | Parameter evaluated | Test conditions | Sample source | Input amounts/conditions/samples tested | Total number of sequencing reads |
1 | Library preparation kit | Illumina TruSeq Stranded Total RNA vs. Roche Kapa Total RNA kits | UC | TruSeq: 1 µg | 89,146,128 |
 | | | | TruSeq: 2 µg | 79,390,202 |
 | | | | TruSeq: 4 µg | 66,612,238 |
 | | | | TruSeq: 5 µg | 78,856,902 |
 | | | | TruSeq: 10 µg | 66,106,874 |
 | | | | Kapa: 1 µg | 88,395,496 |
 | | | | Kapa: 2 µg | 106,659,272 |
 | | | | Kapa: 4 µg | 106,234,954 |
 | | | | Kapa: 5 µg | 74,775,914 |
 | | | | Kapa: 10 µg | 110,068,504 |
2 | Pre-treatment | RNase R pre-treated vs. non-pre-treated | MG | Pair1: MG_1 (RNase R+) | 107,609,934 |
 | | | | Pair1: MG_5 (RNase R-) | 96,215,516 |
 | | | | Pair2: MG_2 (RNase R+) | 96,840,790 |
 | | | | Pair2: MG_6 (RNase R-) | 101,609,754 |
 | | | | Pair3: MG_3 (RNase R+) | 111,576,344 |
 | | | | Pair3: MG_7 (RNase R-) | 111,314,114 |
3 | Total RNA input | 1 µg vs. 2 µg vs. 4 µg | MG, OC, SF, UC | MG: 1 µg | 120,094,758 |
 | | | | MG: 2 µg | 116,475,728 |
 | | | | MG: 4 µg | 121,315,232 |
 | | | | OC: 1 µg | 111,118,120 |
 | | | | OC: 2 µg | 115,325,492 |
 | | | | OC: 4 µg | 114,913,266 |
 | | | | SF: 1 µg | 122,724,142 |
 | | | | SF: 2 µg | 93,933,288 |
 | | | | SF: 4 µg | 123,331,474 |
 | | | | UC: 1 µg | 92,448,120 |
 | | | | UC: 2 µg | 125,815,354 |
 | | | | UC: 4 µg | 125,692,534 |
4 | Tissue types | Brain regions vs. other tissue types | BC, MG, OC, SF, IP, LU, LV, LN, PA | BC_1 | 107,208,904 |
 | | | | BC_2 | 111,833,362 |
 | | | | BC_3 | 96,125,856 |
 | | | | BC_4 | 96,277,094 |
 | | | | IP_1 | 98,656,506 |
 | | | | IP_2 | 113,595,746 |
 | | | | IP_3 | 128,781,536 |
 | | | | IP_4 | 95,981,446 |
 | | | | MG_1 | 107,609,934 |
 | | | | MG_2 | 96,840,790 |
 | | | | MG_3 | 111,576,344 |
 | | | | MG_4 | 100,539,028 |
 | | | | OC_1 | 88,547,042 |
 | | | | OC_2 | 120,983,142 |
 | | | | OC_3 | 102,655,452 |
 | | | | OC_4 | 108,449,330 |
 | | | | SF_1 | 87,621,824 |
 | | | | SF_2 | 145,057,894 |
 | | | | SF_3 | 110,152,030 |
 | | | | SF_4 | 96,724,472 |
 | | | | LN_1 | 150,294,816 |
 | | | | LN_2 | 83,330,187 |
 | | | | LN_3 | 113,096,032 |
 | | | | LN_4 | 107,838,278 |
 | | | | LU_1 | 160,527,595 |
 | | | | LU_2 | 89,430,799 |
 | | | | LU_3 | 91,451,858 |
 | | | | LV_1 | 97,218,369 |
 | | | | LV_2 | 105,416,880 |
 | | | | LV_3 | 88,653,148 |
 | | | | LV_4 | 86,102,943 |
 | | | | LV_5 | 128,788,483 |
 | | | | LV_6 | 118,776,622 |
 | | | | PA_1 | 87,920,160 |
 | | | | PA_2 | 78,236,741 |
 | | | | PA_3 | 102,124,209 |
 | | | | PA_4 | 115,322,926 |
Table 1: Sample and test conditions summary. Summary of all parameters and test conditions evaluated in this study, along with the total number of sequencing reads generated for each sample. UC = universal control, MG = middle temporal gyrus, OC = occipital cortex, BC = cerebellum, IP = inferior parietal lobe, SF = superior frontal gyrus, LU = lung, LV = liver, LN = lymph node, PA = pancreas, RNase R = ribonuclease R, RNase R+ = pre-treated with RNase R, RNase R- = not pre-treated with RNase R.
Sample source | Input amounts/samples tested | Percent rRNA |
UC | TruSeq: 1 µg | 5.53% |
 | TruSeq: 2 µg | 4.11% |
 | TruSeq: 4 µg | 4.38% |
 | TruSeq: 5 µg | 3.21% |
 | TruSeq: 10 µg | 3.74% |
 | Kapa: 1 µg | 5.57% |
 | Kapa: 2 µg | 4.56% |
 | Kapa: 4 µg | 9.67% |
 | Kapa: 5 µg | 12.69% |
 | Kapa: 10 µg | 15.59% |
Table 2: rRNA percentages in TruSeq vs. Kapa libraries.
In this study, two commercially available library preparation kits, pre-treatment options, and input RNA amounts were tested in order to optimize a circRNA enrichment protocol for construction of circRNA sequencing libraries. Based on this study’s assessments, a number of key aspects and critical steps in creating circRNA sequencing libraries are apparent. Our evaluation confirms the utility of RNase R pre-treatment, as reflected by the increased number of circRNAs detected. Overall, a higher diversity of circRNAs was observed when using the Illumina TruSeq library kit with RNase R pre-treatment and 4 µg of input RNA. These results align with previous findings that the RNase R enrichment step is beneficial for detection of circRNAs2.
Additional key aspects of circRNA library construction include the amount of total RNA that is available for sequencing as well as the type of tissue from which the RNA is extracted. Although a 4 µg input of total RNA was found to yield the highest number of detected circRNAs, the majority of RNA-seq studies utilize ≤1 µg of total RNA, and obtaining higher amounts may be challenging, particularly for analysis of human specimens. Identification of circRNAs remains feasible for lower input amounts, but it is relevant to acknowledge that the specificity of the analysis may be impacted. This study further highlights the higher number of circRNAs that are detected in human brain compared to other tissues, as previously reported19,20. It is thus critical to acknowledge the differential expression of circRNAs across different tissue types. Furthermore, additional research in the context of disease will be important for shedding light on how circRNAs may be involved in pathogenic processes.
The performance of the two assessed RNA library preparation kits also highlights that although different commercially-available kits may demonstrate significant similarities, differences are still observed when analyzing circRNAs. Two major findings from this comparison include decreased rRNA depletion and a lower number of circRNAs identified using one approach. While one possibility is that a higher abundance of rRNA in a sample may interfere with creating sequence-able circRNA library molecules, this finding emphasizes the need to assess seemingly similar kits, particularly when reagents are proprietary.
Although the data presented here provides insights into the existence and abundance of circRNAs in various tissue types, this study has a few technical limitations. Firstly, while RNase R treatment reduces the population of linear RNAs in a sample, it is not well understood if this depletion step introduces any biases in circRNA detection and whether it may deplete circRNAs. Previous studies have reported that in some cases, circRNAs are sensitive to RNase R2,24,25. Secondly, it is unclear if increasing the total RNA input above 4 µg will result in a linear increase in the number of identified circRNAs. As previously mentioned, available total RNA is often limited in research studies so lower input amounts were considered here. Of note, circRNAs can still be detected when using lower inputs but it is important to acknowledge that lower inputs are associated with detection of a lower number of circRNAs. Thirdly, the optimized protocol presented here utilizes RNAs extracted from a specific set of tissues. Given the variable distribution of circRNA expression across different tissue types, the association between total RNA input amounts and the number of identified circRNAs may differ across tissues.
With increasing interest in understanding the biological role of circRNAs, new strategies are also being developed to better enable characterization and identification of circRNAs. One new bioinformatics approach enables identification of circRNAs that may be lowly expressed through reconstruction of full-length circRNAs, and also enables quantification of expression of specific circRNA isoforms26. This approach takes advantage of features described as reverse overlap (RO) reads that may occur on the 3’ or 5’ ends of circRNA library molecules. Development of new strategies for identifying circRNAs, encompassing both laboratory approaches and bioinformatics tools, will contribute to the field’s understanding of the function and impact of circRNAs.
The authors have nothing to disclose.
We are grateful to the Banner Sun Health Research Institute Brain and Body Donation Program (BBDP) of Sun City, Arizona for the provision of human brain tissues. The BBDP has been supported by the National Institute of Neurological Disorders and Stroke (U24 NS072026 National Brain and Tissue Resource for Parkinson’s Disease and Related Disorders), the National Institute on Aging (P30AG19610 Arizona Alzheimer’s Disease Core Center), the Arizona Department of Health Services (contract 211002, Arizona Alzheimer’s Research Center), the Arizona Biomedical Research Commission (contracts 4001, 0011, 05-901 and 1001 to the Arizona Parkinson's Disease Consortium) and the Michael J. Fox Foundation for Parkinson’s Research27. This study was also supported by the DHS and the State of Arizona (ADHS grant # ADHS14-052688). We also thank Andrea Schmitt (Banner Research) and Cynthia Lechuga (TGen) for administrative support.
Name of material/equipment | Company | Catalog number | Comments |
1000 µL pipette tips | Rainin | GP-L1000F | |
20 µL pipette tips | Rainin | SR L 10F | |
200 µL pipette tips | Rainin | SR L 200F | |
2200 TapeStation Accessories (foil covers) | Agilent Technologies | 5067-5154 | |
2200 TapeStation Accessories (tips) | Agilent Technologies | 5067-5153 | |
Adhesive Film for Microplates | VWR | 60941-064 | |
AMPure XP Beads 450 mL | Beckman Coulter | A63882 | PCR purification |
Eppendorf twin.tec 96-Well PCR Plates | VWR | 951020401 | |
High Sensitivity D1000 reagents | Agilent Technologies | 5067-5585 | |
High Sensitivity D1000 ScreenTape | Agilent Technologies | 5067-5584 | |
HiSeq 2500 Sequencing System | Illumina | SY-401-2501 | |
HiSeq 3000/4000 PE Cluster Kit | Illumina | PE-410-1001 | |
HiSeq 3000/4000 SBS Kit (150 cycles) | Illumina | FC-410-1002 | |
HiSeq 4000 Sequencing System | Illumina | SY-401-4001 | |
HiSeq PE Rapid Cluster Kit v2 | Illumina | PE-402-4002 | |
HiSeq Rapid SBS Kit v2 (50 cycle) | Illumina | FC-402-4022 | |
Kapa Total RNA Kit | Roche | KK8400 | |
Molecular biology grade ethanol | Fisher Scientific | BP28184 | |
Qubit Assay Tubes | Supply Center by Thermo Fisher | Q32856 | |
Qubit dsDNA High Sense Assay Kit | Supply Center by Thermo Fisher | Q32854 | |
RNA cleanup and concentrator – 5 | Zymo | RCC-100 | Contains purification columns, collection tubes |
RNAClean XP beads | Beckman Coulter Genomics | | RNA Cleanup beads |
RNase R | Lucigen | RNR07250 | |
SuperScript II Reverse Transcriptase 10,000 units | ThermoFisher (LifeTech) | 18064014 | |
TapeStation 2200 | Agilent Technologies | | Nucleic Acid analyzer |
TElowE | VWR | 10128-588 | |
TruSeq Stranded Total RNA Library Prep Kit | Illumina | 20020596 | Kit used in section 3 |
Two-Compartment Divided Tray | VWR | 3054-1004 | |
UltraPure Water | Supply Center by Thermo Fisher | 10977-015 | |
Universal control RNA | Agilent | 740000 | |