Here, we detail the method of Sequencing of Psoralen crosslinked, Ligated, and Selected Hybrids (SPLASH), which enables genome-wide mapping of intramolecular and intermolecular RNA-RNA interactions in vivo. SPLASH can be applied to study RNA interactomes of organisms including yeast, bacteria and humans.
Knowing how RNAs interact with themselves and with other RNAs is key to understanding RNA-based gene regulation in the cell. While RNA-RNA interactions such as microRNA-mRNA interactions have been shown to regulate gene expression, the full extent of RNA-RNA interactions in the cell is still unknown. Previous methods for studying RNA interactions have focused primarily on subsets of RNAs that interact with a particular protein or RNA species. Here, we detail a method named Sequencing of Psoralen crosslinked, Ligated, and Selected Hybrids (SPLASH) that captures RNA interactions genome-wide, in vivo and in an unbiased manner. SPLASH uses in vivo crosslinking, proximity ligation and high-throughput sequencing to identify intramolecular and intermolecular RNA base-pairing partners globally. SPLASH can be applied to different organisms, including bacteria, yeast and human cells, and to diverse cellular conditions, to help understand the dynamics of RNA organization in different cellular contexts. The experimental SPLASH protocol takes about 5 days to complete, and the computational workflow takes about 7 days.
Studying how macromolecules fold and interact with each other is key to understanding gene regulation in the cell. While much effort in the past decade has focused on how DNA and proteins contribute to gene regulation, relatively less is known about post-transcriptional regulation of gene expression. RNA carries information both in its linear sequence and in its secondary and tertiary structure1. Its ability to base pair with itself and with other RNAs is important for its function in vivo. Recent advances in high-throughput RNA secondary structure probing have provided valuable insights into the locations of double- and single-stranded regions in the transcriptome2,3,4,5,6,7,8; however, information on the pairing partners is still largely missing. To determine which RNA region interacts with which other region in the transcriptome, we need global pair-wise information.
Mapping pair-wise RNA interactions in a global, unbiased manner has traditionally been a major challenge. While previous approaches, such as CLASH9, hiCLIP10 and RAP11, have been used to identify RNA interactions on a large scale, these techniques typically map RNA base pairing only for the subset of RNAs that interact with a particular protein or RNA species. A more recent method for studying global RNA interactions, RPL12, does not stabilize RNA interactions in vivo and hence may capture only a subset of in vivo interactions. To overcome these challenges, we and others developed genome-wide, unbiased strategies to map RNA interactomes in vivo using modified versions of the crosslinker psoralen13,14,15. In this protocol, we describe in detail how to perform Sequencing of Psoralen crosslinked, Ligated, and Selected Hybrids (SPLASH), which uses biotinylated psoralen to crosslink base-paired RNAs in vivo, followed by proximity ligation and high-throughput sequencing to identify RNA base-pairing partners genome-wide (Figure 1)15.
In this manuscript, we describe the steps to perform SPLASH using cultured adherent cells, in this case HeLa cells. The same protocol can be readily adapted to mammalian suspension cells and to yeast and bacterial cells. Briefly, HeLa cells are treated with biotinylated psoralen and irradiated at 365 nm to crosslink base-paired RNAs in vivo. The RNA is then extracted from the cells, fragmented and enriched for crosslinked regions using streptavidin beads. Interacting RNA fragments are ligated together by proximity ligation and made into a cDNA library for deep sequencing. After sequencing, the chimeric reads are mapped onto the transcriptome/genome to identify the RNA regions that are paired with each other. We have used SPLASH to identify thousands of RNA interactions in vivo in yeast and different human cells, including intramolecular and intermolecular base pairing in diverse classes of RNAs, such as snoRNAs, lncRNAs and mRNAs, providing a glimpse into the structural organization and interaction patterns of RNAs in the cell.
1. Treatment of HeLa Cells with Biotinylated Psoralen and RNA Extraction
2. RNA Fragmentation
3. RNA Size Selection and Elution
4. Enrichment of RNA Crosslinking Regions
5. Proximity Ligation
6. Reverse Crosslinking of Biotinylated Psoralen
7. Reverse Transcription and cDNA Circularization
8. PCR Amplification (Small Scale PCR)
9. PCR Amplification (Large Scale PCR) and Purification
Figure 1 depicts the schematic of the SPLASH workflow. After the cells are treated with biotinylated psoralen in the presence of 0.01% digitonin and UV crosslinked, total RNA is extracted from the cells and a dot blot is performed to confirm that biotinylated psoralen has crosslinked efficiently to the RNA (Figure 2). We use biotinylated 20-base oligos as positive controls to titrate the amount of biotinylated psoralen added to the cells, such that approximately 1 in every 150 bases is crosslinked.
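To make the titration concrete, the dot blot standards can be turned into an approximate crosslink density. The Python sketch below is a hypothetical illustration, not part of the published protocol. It assumes that the signal is linear in biotin amount over the range spotted, that each 20-base oligo standard carries one biotin, that each biotin on the RNA counts as one crosslink (monoadducts would also be counted), and that an RNA nucleotide weighs about 340 g/mol. The function names and example intensities are made up.

```python
# Hypothetical sketch: estimate bases per crosslink from dot-blot intensities.
# Assumptions (not from the protocol): signal is linear in biotin amount,
# each 20-base oligo standard carries one biotin, each biotin on the RNA
# counts as one crosslink, and an RNA nucleotide weighs ~340 g/mol.

AVG_NT_MASS = 340.0  # g/mol per nucleotide (assumed average)


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bases_per_crosslink(oligo_fmol, oligo_signal, rna_ng, rna_signal):
    """Convert one RNA dot's signal into an average spacing between crosslinks."""
    slope, intercept = fit_line(oligo_fmol, oligo_signal)
    biotin_fmol = (rna_signal - intercept) / slope  # read off the standard curve
    nt_fmol = rna_ng / AVG_NT_MASS * 1e6            # ng / (g/mol) = nmol; x1e6 -> fmol
    return nt_fmol / biotin_fmol


if __name__ == "__main__":
    # Made-up standards and RNA dot (20 ng), for illustration only.
    standards_fmol = [100, 200, 400, 800]
    standards_signal = [25, 45, 85, 165]
    print(round(bases_per_crosslink(standards_fmol, standards_signal, 20, 83.43)))
```

Only RNA dots whose signal falls inside the range of the oligo standards should be used, since the linear fit says nothing about saturated or extrapolated intensities.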
As PCR duplication events become more frequent with increasing numbers of amplification cycles, we perform a small scale PCR amplification with different cycle numbers to determine the lowest number of cycles that provides enough material for deep sequencing. In an efficient library preparation, we are able to amplify a cDNA sequencing library from 1.5 µg of size-selected RNA input in fewer than 15 cycles of PCR amplification (Figure 3). The library is then sequenced with 2x 150 bp paired-end reads on a high-throughput sequencer, and the reads are processed according to the computational pipeline in Figure 4. The end result is a list of filtered chimeras representing both intramolecular and intermolecular RNA-RNA interactions in the transcriptome (Table 11).
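When moving from the small scale test PCR to the large scale reaction with more cDNA template, the cycle number can be estimated before it is confirmed on a gel. The sketch below is a rule of thumb under an idealized assumption, not a step of the published protocol: amplification is taken to be exponential, so increasing the template by a factor k saves about log(k)/log(1 + efficiency) cycles. The function name and the default efficiency of 1.0 (perfect doubling) are hypothetical choices.

```python
import math


def large_scale_cycles(small_scale_cycles, small_template_ul, large_template_ul,
                       efficiency=1.0):
    """Estimate PCR cycles for the large scale reaction.

    Assumes exponential amplification in which each cycle multiplies the product
    by (1 + efficiency); efficiency=1.0 means perfect doubling (an idealization).
    Scaling the template by a factor k saves log(k) / log(1 + efficiency) cycles.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    fold = large_template_ul / small_template_ul
    saved = math.log(fold) / math.log(1 + efficiency)
    # Round up so the library is not under-amplified; never go below 1 cycle.
    return max(1, math.ceil(small_scale_cycles - saved))


if __name__ == "__main__":
    # Hypothetical: clear band at 15 cycles with 1 µL cDNA, scaled to 10 µL cDNA.
    print(large_scale_cycles(15, 1, 10))
```

Rounding up errs toward slightly more amplification; the estimate should still be checked on an agarose gel, since real reactions rarely amplify at perfect efficiency.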
Figure 1: Schematic of the experimental workflow of SPLASH15. Pair-wise RNA interactions inside the cell are crosslinked using biotinylated psoralen under UV light at 365 nm. Crosslinked RNA is then extracted and fragmented to around 100 bases. Interacting regions that contain biotinylated psoralen crosslinks are enriched by binding to streptavidin beads and ligated together. After the crosslinks are reversed under UV light at 254 nm, the chimeric RNAs are cloned into a cDNA library for deep sequencing.
Figure 2: Testing the crosslinking efficiency of biotinylated psoralen by dot blotting15. Dot blot of RNA crosslinked in vivo with biotinylated psoralen in the presence of 0.01% digitonin. Different amounts of crosslinked RNA (20-2,000 ng) are spotted on the membrane, alongside different amounts of a biotinylated 20-base oligo as positive controls.
Figure 3: Small scale library amplification for SPLASH. 1 µL of cDNA from the RT reaction is used for small scale PCR. Reactions are taken out at 10, 15 and 20 cycles and run on a 3% agarose gel to determine the minimum number of amplification cycles required for library generation. In this specific example, 10 cycles of amplification using 10 µL of cDNA in a large scale PCR reaction will generate enough amplicons for deep sequencing.
Figure 4: Schematic of the computational workflow of SPLASH. Reads from high-throughput sequencing are mapped to the transcriptome, and the analysis pipeline filters out poor-quality reads, PCR duplicates, reads with multiple mappings and splice-junction reads. The final output is a list of chimeras representing both intra- and intermolecular interactions in the transcriptome.
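The final step of the workflow, sorting chimeras into intramolecular and intermolecular interactions, can be illustrated with a short Python sketch. This is a hypothetical simplification, not the logic of `find_chimeras.py`: a chimera is called intramolecular when both arms map to the same transcript and intermolecular otherwise. The `Chimera` record and its field names are assumptions loosely modeled on the columns described in Table 11. Note that a same-transcript chimera cannot, on its own, distinguish pairing within one molecule from pairing between two copies of the same RNA.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chimera:
    """One chimeric read: RNA 1 is the left arm, RNA 2 the right arm.

    Field names are hypothetical, loosely following the Table 11 columns.
    """
    rna1: str
    rna1_start: int
    rna1_end: int
    rna2: str
    rna2_start: int
    rna2_end: int


def classify(chimera: Chimera) -> str:
    """Both arms on the same transcript -> intramolecular, else intermolecular."""
    return "intramolecular" if chimera.rna1 == chimera.rna2 else "intermolecular"


def summarize(chimeras) -> Counter:
    """Count chimeras in each class."""
    return Counter(classify(c) for c in chimeras)


if __name__ == "__main__":
    reads = [
        Chimera("RNA_A", 10, 60, "RNA_A", 300, 350),  # hypothetical transcripts
        Chimera("RNA_A", 500, 540, "RNA_B", 20, 70),
        Chimera("RNA_C", 5, 45, "RNA_C", 900, 950),
    ]
    print(summarize(reads))
```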
Reagents | Volume (µL) | Final |
Nuclease free water | 64 | |
10x T4 PNK buffer | 8 | 1x |
RNase Inhibitor (20 U/µL) | 4 | 1 U/µL |
T4 PNK (10 U/µL) | 4 | 0.5 U/µL |
Total | 80 |
Table 1: Reagents for 3' end repair.
Reagents | Volume (µL) | Final |
PNK reaction | 80 | |
Nuclease free water | 3 | |
10x T4 PNK buffer | 2 | 1x |
10 mM ATP | 10 | 1 mM |
T4 PNK (10 U/µL) | 5 | 0.5 U/µL |
Total | 100 |
Table 2: Reagents for 5' end repair.
Reagents | Volume (μL) | Final |
PNK reaction | 100 | |
10x T4 RNA ligase buffer | 6 | 1x |
10 mM ATP | 6 | 1 mM |
RNase inhibitor (20 U/µL) | 4 | 0.5 U/µL |
T4 Rnl1 (10 U/µL) | 40 | 2.5 U/µL |
Nuclease free water | 4 | |
Total | 160 |
Table 3: Reagents for proximity ligation.
Component | Amount per reaction (µL) | Final |
RNA and linker | 5 | |
T4 Rnl2 buffer (10x) | 1 | 1x |
PEG 8000 (50%, wt/vol) | 3 | 15% (wt/vol) |
RNase inhibitor (20 U/µL) | 0.5 | 1 U/µL |
T4 Rnl2 (tr) (200U/µL) | 0.5 | 10 U/µL |
Total | 10 |
Table 4: Reagents for adaptor ligation.
Component | Amount per reaction (µL) | Final |
Ligation and primer | 6 | |
First-strand buffer (5x) | 2 | 1x |
dNTPs (10 mM) | 0.5 | 0.5 mM |
DTT (0.1 M) | 0.5 | 5 mM |
RNase inhibitor (20 U/µL) | 0.5 | 1 U/µL |
Reverse transcriptase (200 U/µL) | 0.5 | 10 U/µL |
Total | 10 |
Table 5: Reagents for reverse transcription.
Component | Amount per reaction (µL) | Final |
First-strand cDNA | 6 | |
Single strand DNA ligase buffer (10x) | 1 | 1x |
Betaine (5 M) | 2 | 1 M |
MnCl2 (50 mM) | 0.5 | 2.5 mM |
Single strand DNA ligase (100U/µL) | 0.5 | 5 U/µL |
Total | 10 |
Table 6: Reagents for circularization of cDNA.
Component | Amount per reaction (µL) | Final |
Ligated DNA Product | 1 | |
Nuclease free Water | 10.5 | |
High-Fidelity DNA polymerase 2x master mix | 12.5 | 1x |
Universal PCR Primer (10 µM) | 0.5 | 0.2 µM |
Index (X) Primer (10 µM) | 0.5 | 0.2 µM |
Total | 25 |
Table 7: Reagents used for small scale PCR.
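The final concentrations in these reaction tables follow C1 × V1 = C2 × V2, so the final concentration of each component is its stock concentration times its volume, divided by the total reaction volume. A small Python helper like the one below can recompute a recipe, which is a quick way to double-check a table before scaling it up. The dictionary layout and function name are our own illustrative choices; the volumes and stocks are transcribed from Table 7.

```python
# Each entry: component -> (volume in µL, stock concentration or None, unit).
# Volumes and stocks transcribed from the small scale PCR recipe (Table 7).
SMALL_SCALE_PCR = {
    "Ligated DNA Product": (1.0, None, ""),
    "Nuclease free Water": (10.5, None, ""),
    "High-Fidelity DNA polymerase 2x master mix": (12.5, 2.0, "x"),
    "Universal PCR Primer": (0.5, 10.0, "µM"),
    "Index (X) Primer": (0.5, 10.0, "µM"),
}


def final_concentrations(recipe):
    """Apply C1 * V1 = C2 * V2 to every component with a known stock."""
    total = sum(volume for volume, _, _ in recipe.values())
    finals = {
        name: (stock * volume / total, unit)
        for name, (volume, stock, unit) in recipe.items()
        if stock is not None
    }
    return total, finals


if __name__ == "__main__":
    total, finals = final_concentrations(SMALL_SCALE_PCR)
    print(f"Total volume: {total} µL")
    for name, (conc, unit) in finals.items():
        print(f"{name}: {conc:g} {unit}")
```

The same dictionary format works for the other reaction tables; only the entries change.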
Step | Temperature | Time | Cycles |
Initial Denaturation | 98 °C | 30 s | 1 |
Denaturation | 98 °C | 10 s | |
Annealing | 65 °C | 30 s | 25 |
Extension | 72 °C | 30 s | |
Final Extension | 72 °C | 5 min | 1 |
Hold | 4 °C | ∞ |
Table 8: Conditions used for small scale PCR.
Component | Amount per reaction (µL) | Final |
Ligated DNA Product | 5 | |
Nuclease free Water | 6.5 | |
High-Fidelity DNA polymerase 2x master mix | 12.5 | 1x |
Universal PCR Primer (10 µM) | 0.5 | 0.2 µM |
Index (X) Primer (10 µM) | 0.5 | 0.2 µM |
Total | 25 |
Table 9: Reagents used for large scale PCR.
Step | Temperature | Time | Cycles |
Initial Denaturation | 98 °C | 30 s | 1 |
Denaturation | 98 °C | 10 s | |
Annealing | 65 °C | 30 s | Y |
Extension | 72 °C | 30 s | |
Final Extension | 72 °C | 5 min | 1 |
Hold | 4 °C | ∞ |
Table 10: Conditions used for large scale PCR. Y is the number of amplification cycles determined from the small scale PCR.
Table 11: Analysis output of the computational pipeline. "SAM flag split read 1" = 0 indicates that the read is mapped to the positive strand of the transcriptome. "RNA 1 identity" refers to the identity of the RNA on the left side of the chimera. "RNA 1 start position" and "RNA 1 end position" refer to the start and end positions along RNA 1 to which the read is mapped. "Mapping score split read 1" refers to the mapping score of the read mapped to RNA 1. "Cigar split read 1" gives the number of bases soft-clipped from the read ("S") and the number of bases mapped ("M"). "Split read 1" is the sequence mapped to RNA 1. The same nomenclature is used for the right side of the chimera, named RNA 2.
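The "Cigar" and "SAM flag" columns follow the standard SAM conventions, so they can be decoded with a few lines of code. The Python sketch below is an illustrative helper, not part of the SPLASH pipeline: it sums the CIGAR operation lengths to recover the soft-clipped ("S") and mapped ("M") base counts for one split read, and reads the strand from bit 0x10 of the SAM flag. The function names are hypothetical; an unavailable CIGAR ("*") is rejected with a `ValueError`.

```python
import re

_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def cigar_counts(cigar):
    """Sum operation lengths in a SAM CIGAR string, e.g. '45S105M'."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    counts = {}
    for length, op in _CIGAR_TOKEN.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    return counts


def clipped_and_mapped(cigar):
    """Return (soft-clipped bases 'S', aligned bases 'M') for one split read."""
    counts = cigar_counts(cigar)
    return counts.get("S", 0), counts.get("M", 0)


def strand(sam_flag):
    """Bit 0x10 of the SAM flag marks a reverse-strand alignment."""
    return "-" if sam_flag & 0x10 else "+"


if __name__ == "__main__":
    print(clipped_and_mapped("45S105M"), strand(0))
    print(clipped_and_mapped("60M90S"), strand(16))
```

In a chimeric 150-base read, the two split reads typically complement each other: the bases soft-clipped from one arm are the bases mapped by the other.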
Here, we describe in detail the experimental and computational workflow for SPLASH, a method for identifying pair-wise RNA interactions genome-wide. We have used SPLASH in bacterial, yeast and human cultures and anticipate that the strategy can be widely applied to diverse organisms under different cellular states. A critical step in the protocol is to start with at least 20 µg of crosslinked RNA, to have adequate material for downstream processing. The RNA is then fragmented to approximately 100 bases and size-selected by PAGE. These steps are important for preferentially enriching ligated chimeras, rather than monomers, during library generation. We typically recover around 1.5 µg of RNA after fragmentation and the first size selection, and this RNA is converted into a cDNA library according to the SPLASH workflow. We find that this amount of starting RNA generates enough material for high-throughput sequencing with fewer than 15 cycles of PCR amplification. Because the number of PCR duplicates increases dramatically with the number of PCR cycles, keeping amplification cycles low is critical for extracting useful, unique chimeras in the downstream analysis.
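One way to see the cost of over-amplification is to measure how many read pairs are exact copies of one another. The Python sketch below collapses read pairs with identical read 1 and read 2 sequences and reports the duplicate fraction. It is a simplified illustration and not necessarily what `rem_dups.py` does; exact-sequence matching will, for example, miss duplicates that differ by a sequencing error. The function name and example reads are hypothetical.

```python
def deduplicate(read_pairs):
    """Collapse read pairs with identical (read 1, read 2) sequences.

    read_pairs: list of (read1_sequence, read2_sequence) tuples.
    Returns the unique pairs (first occurrence kept, input order preserved)
    and the fraction of input pairs removed as duplicates.
    """
    seen = set()
    unique = []
    for pair in read_pairs:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    if not read_pairs:
        return unique, 0.0
    return unique, 1 - len(unique) / len(read_pairs)


if __name__ == "__main__":
    # Toy read pairs, for illustration only.
    pairs = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("GGCC", "AATT"), ("ACGT", "TTGA")]
    unique, dup_rate = deduplicate(pairs)
    print(len(unique), dup_rate)
```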
In contrast to other genome-wide psoralen-based strategies, SPLASH uses a biotinylated version of psoralen to crosslink base-paired RNA fragments to each other. Because we titrated the crosslinking to approximately one crosslink per 150 bases, we can enrich for crosslinked RNA regions using streptavidin beads after RNA fragmentation. Furthermore, performing enzymatic reactions while the RNAs are bound to beads makes buffer exchanges and washes convenient. However, one limitation of the strategy is that biotinylated psoralen penetrates cells less efficiently than psoralen or 4′-aminomethyl trioxsalen (AMT). It is therefore critical to ensure that biotinylated psoralen has entered the cells and crosslinked the RNAs efficiently. We routinely perform dot blots on crosslinked, extracted RNA, together with biotinylated oligos as positive controls, to ensure that the RNA is properly crosslinked. If crosslinking is weak, strategies to permeabilize the cell membrane, such as adding a low concentration of digitonin (0.01%) for 5 min, can be used to help biotinylated psoralen enter the cells. As psoralen absorbs at the same wavelength as nucleic acids (UV 260 nm), we use fluorometric quantification, rather than UV absorbance, to quantify crosslinked RNA.
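The link between the crosslinking density and the need for streptavidin enrichment can be illustrated with a back-of-the-envelope calculation. Under a simplifying Poisson model, which is our assumption and not a claim of the protocol, a fragment of length L with one crosslink per d bases carries on average L/d crosslinks, and the probability that it carries at least one is 1 − exp(−L/d). Psoralen actually favors uridines in double-stranded regions, so real crosslinks are not uniformly spread; the function name is hypothetical.

```python
import math


def fraction_with_crosslink(fragment_len, bases_per_crosslink):
    """P(fragment carries >= 1 crosslink) under a Poisson model.

    Simplification: crosslinks are treated as independent and uniformly spread,
    although psoralen actually favors uridines in double-stranded regions.
    """
    expected = fragment_len / bases_per_crosslink
    return 1 - math.exp(-expected)


if __name__ == "__main__":
    # ~100-nt fragments with roughly one crosslink per 150 nt.
    print(f"{fraction_with_crosslink(100, 150):.2f}")
```

With 100-base fragments and one crosslink per 150 bases, this model gives about 0.49, suggesting that roughly half of the fragments would carry no crosslink and can be removed by the streptavidin pulldown.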
One limitation of psoralen-based crosslinking strategies is that psoralen preferentially crosslinks at uridines (U). As such, U-poor base-pairing regions might be missed during crosslinking. Hence, while detecting a crosslinking event between two strands provides evidence that an interaction is occurring, a lack of crosslinking does not equate to a lack of interaction. With our SPLASH protocol, we capture very few microRNA-mRNA interactions and low amounts of lncRNA interactions. As microRNA-bound mRNAs are likely to be downregulated and lncRNAs are typically lowly expressed, their poor representation in our data is likely due primarily to insufficient sequencing depth. We typically sequence at least 200 million paired-end reads per human transcriptome library for each replicate. At this depth, most of our sequencing reads fall on fairly abundant RNAs. We anticipate that enrichment strategies for specific RNA populations will greatly enhance the signal for these relatively low-abundance RNAs. Another challenge in analyzing chimeric data is distinguishing true interacting chimeras from chimeras generated by splicing. To prevent splicing reads from contaminating our chimera list, we filter out all reads that are close to known annotated splice sites in the transcriptome. We anticipate that with more complete splicing annotations, the final list of chimeras will become even more accurate.
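The splice-site filter described above can be sketched as a nearest-neighbor lookup in a sorted list of annotated splice-site coordinates. The Python code below is a hypothetical illustration, not the pipeline's implementation: the input tuple layout, the choice to discard a chimera if either arm's junction coordinate is close to a splice site, and the default window of 10 bases are all assumptions, since the protocol does not state the distance used.

```python
import bisect


def near_splice_site(position, sorted_sites, window):
    """True if any annotated splice site lies within +/- window of position."""
    i = bisect.bisect_left(sorted_sites, position - window)
    return i < len(sorted_sites) and sorted_sites[i] <= position + window


def filter_splice_chimeras(chimeras, splice_sites, window=10):
    """Drop chimeras whose junction lies near an annotated splice site.

    chimeras: iterable of (chrom1, junction1, chrom2, junction2) tuples, where
    junctionN is the genomic coordinate of arm N at the chimeric junction.
    splice_sites: dict mapping chromosome -> sorted list of splice-site coordinates.
    window: hypothetical distance threshold in bases.
    """
    kept = []
    for chrom1, pos1, chrom2, pos2 in chimeras:
        if near_splice_site(pos1, splice_sites.get(chrom1, []), window):
            continue
        if near_splice_site(pos2, splice_sites.get(chrom2, []), window):
            continue
        kept.append((chrom1, pos1, chrom2, pos2))
    return kept


if __name__ == "__main__":
    sites = {"chr1": [1000, 5000]}  # toy annotation
    chimeras = [("chr1", 1004, "chr1", 3000), ("chr1", 2000, "chr2", 700)]
    print(filter_splice_chimeras(chimeras, sites))
```

Binary search keeps each lookup logarithmic in the number of annotated sites, which matters when hundreds of millions of reads are checked against a genome-wide annotation.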
Unlike previous strategies that focused on RNA interactions specific to a single RNA species or a single RNA-binding protein, SPLASH maps RNA-RNA interactions genome-wide for all RNAs, enabling us to study large scale RNA interaction networks and to identify novel intramolecular and intermolecular RNA interactions. Using SPLASH, we obtained thousands of intramolecular and intermolecular interactions in the human and yeast transcriptomes, providing a glimpse into the organization and dynamics of the RNA interactome in vivo. Advances in integrating these long-range RNA interaction data into RNA structure modeling algorithms are likely to refine our current models of RNA organization in vivo. Incorporating SPLASH data into intermolecular RNA prediction algorithms, such as snoRNA prediction programs, could also improve the accuracy of these predictions. We anticipate that future uses of SPLASH on other complex organisms and dynamic systems will continue to shed light on the intricacies of RNA-based gene regulation in biology.
The authors have nothing to disclose.
We thank members of the Wan lab and the Nagarajan lab for informative discussions. N. Nagarajan is supported by funding from A*STAR. Y. Wan is supported by funding from A*STAR and the Society in Science-Branco Weiss Fellowship.
1 Kb Plus DNA Ladder | Life Technologies Holdings Pte Ltd | 10787026 | DNA ladder |
10 bp DNA ladder | Life Technologies Holdings Pte Ltd | 10821-015 | DNA ladder |
20 % SDS solution | First BASE | BUF-2052-1L | |
20x SSC | First BASE | BUF-3050-20X1L | |
3.0 M Sodium Acetate Solution | First BASE | BUF-1151-1L-pH5.2 | Required for nucleic acid precipitation |
40% Acrylamide/Bis Solution, 19:1 | Bio-Rad | 1610145 | TBE Urea gel component |
Ambion Buffer Kit | Life Technologies Holdings Pte Ltd | AM9010 | |
Ammonium Persulfate, Molecular Grade | Promega | V3131 | TBE Urea gel component |
Bromophenol Blue | Sigma-Aldrich | B0126-25G | |
Chloroform | Merck | 1.02445.1000 | RNA extraction |
Single strand DNA ligase | Epicentre | CL9025K | CircLigase II ssDNA Ligase |
Centrifuge tube filters | Sigma-Aldrich | CLS8160-96EA | Costar Spin-X centrifuge tube filters |
D5628-1G DIGITONIN CRYSTALLINE | Sigma-Aldrich | D5628-1G | For cell treatment |
Dark Reader Transilluminator | Clare Chemical Research | Dark Reader DR89X Transilluminator | Blue light transilluminator |
DNA Gel Loading Dye (6X) | Life Technologies Holdings Pte Ltd | R0611 | Required for agarose gel electrophoresis |
Dulbecco's Modified Eagle Medium | Pan BioTech | P04-03500 | For HeLa cell culture |
Streptavidin magnetic beads | Life Technologies Holdings Pte Ltd | 65002 | Dynabeads MyOne Streptavidin C1 |
Magnetic stand for 15ml tubes | Life Technologies Holdings Pte Ltd | 12301D | DynaMag-15 |
Magnetic stand for 1.5 ml tubes | Life Technologies Holdings Pte Ltd | 12321D | DynaMag-2 |
ThermoMixer | Eppendorf | 5382 000.015 | Eppendorf ThermoMixer C |
Biotinylated psoralen | Life Technologies Holdings Pte Ltd | 29986 | EZ-Link Psoralen-PEG3-Biotin |
F8T5/BL | Hitachi | F8T5/BL | 365 nm UV bulb |
Fetal Bovine Serum | Life Technologies Holdings Pte Ltd | 10270106 | Components of HeLa medium |
Formamide | Promega | H5052 | Component in hybridization buffer |
G8T5 | Sankyo-Denki | G8T5 | 254 nm UV bulb |
Glycogen | Life Technologies Holdings Pte Ltd | 10814010 | Required for nucleic acid precipitation |
Nanodrop | Life Technologies Holdings Pte Ltd | Nanodrop 2000 | Spectrophotometer for nucleic acid quantification |
Nuclease free water | First BASE | BUF-1180-500ml | |
Penicillin Streptomycin | Life Technologies Holdings Pte Ltd | 15140122 | Components of HeLa medium |
High-Fidelity DNA polymerase (2x) | Life Technologies Holdings Pte Ltd | F531L | Phusion High-Fidelity PCR Master Mix with HF Buffer (2x) |
Primers Set 1 | New England Biolabs | E7335L | PCR primers |
DNA Gel Extraction Kit | QIAgen | 28106 | QIAquick Gel Extraction Kit |
Fluorometric Quantification kit | Life Technologies Holdings Pte Ltd | Q32854 | Qubit dsDNA HS Assay Kit |
RNA clean up kit | Qiagen | 74106 | RNeasy Mini Kit Silica-membrane column |
RNase Inhibitor | Life Technologies Holdings Pte Ltd | AM2696 | SUPERase In |
Reverse transcriptase | Life Technologies Holdings Pte Ltd | 18080400 | SuperScript III First-Strand Synthesis SuperMix |
Nucleic Acid Gel Stain | Life Technologies Holdings Pte Ltd | S11494 | SYBR GOLD NUCLEIC ACID 500 UL |
T4 Polynucleotide Kinase | New England Biolabs | M0201L | End repair enzyme |
T4 RNA Ligase 1 | New England Biolabs | M0204L | Enzyme for proximity ligation |
T4 RNA Ligase 2, truncated KQ | New England Biolabs | M0373L | Enzyme for adaptor ligation |
Temed | Bio-Rad | 1610801 | TBE Urea gel component |
Guanidinium thiocyanate-phenol-chloroform | Life Technologies Holdings Pte Ltd | 15596018 | TRIzol® Reagent for RNA extraction |
Urea | First BASE | BIO-2070-5kg | |
UV crosslinker | Stratagene | 400072 | UV Stratalinker 1800 |
UV Transilluminator | UVP | 95-0417-01 | For visualizing of bands |
Xylene Cyanol FF | Sigma-Aldrich | X4126-10G | |
DNA cleanup kit | Zymo Research | D4004 | Zymo DNA concentrator-5 |
List of software required | |||
FastQC software | 0.11.4 | http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ | |
SeqPrep software | 1.0.7 | https://github.com/jstjohn/SeqPrep | |
BWA software | 0.7.12 | http://bio-bwa.sourceforge.net/ | |
SAMTOOLS software | 1.2.1 | http://www.htslib.org/ | |
PULLSEQ software | 1.0.2 | https://github.com/bcthomas/pullseq | |
STAR software | 2.5.0c | https://github.com/alexdobin/STAR | |
rem_dups.py PYTHON script | PYTHON 2.7.11 | https://github.com/CSB5/splash/tree/master/src | |
find_chimeras.py PYTHON script | PYTHON 2.7.11 | https://github.com/CSB5/splash/tree/master/src | |
pickJunctionReads.awk AWK script | AWK GNU 4.1.2 | https://github.com/CSB5/splash/tree/master/src | |
Buffer composition | |||
Elution buffer: | |||
0.3 M sodium acetate | |||
Lysis Buffer: | |||
50 mM Tris-Cl pH 7.0 | |||
10 mM EDTA | |||
1% SDS | |||
Always add SUPERase In fresh before use, except when washing beads | |||
Proteinase K Buffer | |||
100 mM NaCl | |||
10 mM Tris-Cl pH 7.0 | |||
1 mM EDTA | |||
0.5% SDS | |||
Hybridization Buffer | |||
750 mM NaCl | |||
1% SDS | |||
50 mM Tris-Cl pH 7.0 | |||
1 mM EDTA | |||
15% formamide (store in the dark at 4 °C) | |||
Always add SUPERase In fresh before use | |||
2x SSC Wash Buffer | |||
2x saline-sodium citrate (SSC) (diluted from 20x SSC Invitrogen stock) | |||
0.5% SDS | |||
2x RNA fragmentation buffer | |||
18 mM MgCl2 | |||
450 mM KCl | |||
300 mM Tris-Cl pH 8.3 | |||
2x RNA loading Dye: | |||
95% Formamide | |||
0.02% SDS | |||
0.02% bromophenol blue | |||
0.01% Xylene Cyanol | |||
1 mM EDTA | |||
3' RNA adapter sequence | /5rApp/CTGTAGGCACCATCAAT/3ddC/ | ||
RT primer sequence | AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGC/iSp18/CACTCA/iSp18/TTCAGACGTGTGCTCTTCCGATCTATTGATGGTGCCTACAG |
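The 3' end of the RT primer is designed to anneal to the ligated 3' RNA adapter, and this can be checked quickly by computing the reverse complement of the adapter. The Python sketch below is an illustrative check, not a protocol step: it copies the core sequences listed above, ignores the 5' adenylation and 3' dideoxy-C modifications of the adapter, and takes the segment after the last /iSp18/ spacer as the primer's 3'-most DNA sequence. The helper names are hypothetical.

```python
# Illustrative check that the RT primer's 3' end is complementary to the 3' adapter.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Reverse complement of a DNA/RNA sequence, returned as DNA."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Core sequences copied from the list above; adapter modifications omitted.
ADAPTER = "CTGTAGGCACCATCAAT"
RT_PRIMER = ("AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGC/iSp18/CACTCA/iSp18/"
             "TTCAGACGTGTGCTCTTCCGATCTATTGATGGTGCCTACAG")

# The 3'-most DNA segment follows the last /iSp18/ spacer.
rt_3prime = RT_PRIMER.split("/")[-1]
anneals = rt_3prime.endswith(reverse_complement(ADAPTER))

if __name__ == "__main__":
    print(reverse_complement(ADAPTER), anneals)
```

The same check is useful whenever an adapter or primer is reordered or redesigned, since a mismatch at the primer's 3' end would prevent reverse transcription from initiating on the adapter.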