We describe a protocol to identify RNA-binding proteins and map their RNA-binding regions in live cells using UV-mediated photocrosslinking and mass spectrometry.
Noncoding RNAs play important roles in several nuclear processes, including regulating gene expression, chromatin structure, and DNA repair. In most cases, the action of noncoding RNAs is mediated by proteins whose functions are in turn regulated by these interactions with noncoding RNAs. Consistent with this, a growing number of proteins involved in nuclear functions have been reported to bind RNA and in a few cases the RNA-binding regions of these proteins have been mapped, often through laborious, candidate-based methods.
Here, we report a detailed protocol to perform a high-throughput, proteome-wide unbiased identification of RNA-binding proteins and their RNA-binding regions. The methodology relies on the incorporation of a photoreactive uridine analog in the cellular RNA, followed by UV-mediated protein-RNA crosslinking, and mass spectrometry analyses to reveal RNA-crosslinked peptides within the proteome. Although we describe the procedure for mouse embryonic stem cells, the protocol should be easily adapted to a variety of cultured cells.
The purpose of the RBR-ID method is to identify novel RNA-binding proteins (RBPs) and map their RNA-binding regions (RBRs) with peptide-level resolution to facilitate the design of RNA-binding mutants and the investigation of the biological and biochemical functions of protein-RNA interactions.
RNA is unique among biomolecules as it can both act as a messenger carrying genetic information and also fold into complex three-dimensional structures with biochemical functions more akin to those of proteins1,2. A growing body of evidence suggests that noncoding RNAs (ncRNAs) play important roles in various gene regulatory and epigenetic pathways3,4,5 and, typically, these regulatory functions are mediated in concert with proteins that interact specifically with a given RNA. Of particular relevance, a set of interacting proteins was recently identified for the intensely studied long ncRNA (lncRNA) Xist, providing valuable insight into how this lncRNA mediates X-chromosome inactivation in female cells6,7,8. Notably, several of these Xist-interacting proteins do not contain any canonical RNA-binding domains9, and therefore their RNA-binding activity could not be predicted in silico based on their primary sequence alone. Considering that thousands of lncRNAs are expressed in any given cell10, it is reasonable to assume that many of them might act via interactions with yet to be discovered RNA-binding proteins (RBPs). An experimental strategy to identify these novel RBPs would therefore greatly facilitate the task of dissecting the biological function of ncRNAs.
Previous attempts to identify RBPs empirically have relied on polyA+ RNA selection coupled to mass spectrometry (MS)11,12,13,14,15. Although these experiments added many proteins to the list of putative RBPs, by design they could only detect proteins bound to polyadenylated transcripts. However, most small RNAs and many lncRNAs are not polyadenylated16,17, and their interacting proteins would likely have been missed in these experiments. A recent study applied machine learning to protein-protein interactome databases to identify proteins that co-purified with multiple known RBPs and showed that these recurrent RBP partners were more likely to possess RNA-binding activities18. However, this approach relies on mining existing large interaction databases and can only identify proteins that can be co-purified in non-denaturing conditions with known RBPs, thus excluding from the analysis insoluble, membrane-embedded, and scarce proteins.
The identification of a protein as a bona fide RBP often does not automatically yield information on the biological and/or biochemical function of the protein-RNA interaction. To address this point, it is typically desirable to identify the protein domain and amino acid residues involved in the interaction so that specific mutants can be designed to test the function of RNA binding in the context of each novel RBP19,20. Previous efforts by our group and others have used recombinant protein fragments and deletion mutants to identify RNA binding regions (RBRs)19,20,21,22; however, such approaches are labor-intensive and incompatible with high-throughput analyses. More recently, a study described an experimental strategy to map RNA-binding activities in a high-throughput fashion using mass spectrometry23; however, this approach relied on a double polyA+ RNA selection, and thus carried the same limitations as the RBP identification approaches described above.
We developed a technique, termed RNA-binding region identification (RBR-ID), which exploits protein-RNA photocrosslinking and quantitative mass spectrometry to identify proteins and protein regions interacting with RNA in live cells without making assumptions about the polyadenylation status of the RNA, thus including RBPs bound to polyA- RNAs24. Moreover, this method relies exclusively on crosslinking and has no requirements for protein solubility or accessibility; it is thus suitable for mapping RNA-binding activities within membranes (e.g., the nuclear envelope) or poorly soluble compartments (e.g., the nuclear matrix). We describe the experimental steps to perform RBR-ID on the nuclei of mouse embryonic stem cells (mESCs), but with minor modifications this protocol should be suitable for a variety of cell types, provided that they can efficiently incorporate 4SU from the culture medium.
1. Culture and Expansion of mESCs
NOTE: Mouse embryonic stem cells are easy to culture and can be quickly expanded to the large numbers required by biochemical experiments thanks to their fast cycling time. Healthy mESCs double every 12 h.
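As a rough planning aid, the 12 h doubling time quoted above can be converted into an estimate of expansion time. The sketch below assumes ideal exponential growth with no lag phase and no cell loss at passaging; the function name and the example cell numbers are illustrative and are not part of the protocol.

```python
import math

def hours_to_reach(target_cells, seeded_cells, doubling_time_h=12.0):
    """Hours of ideal exponential growth needed to go from seeded_cells to target_cells."""
    if seeded_cells <= 0 or target_cells <= 0:
        raise ValueError("cell numbers must be positive")
    if target_cells <= seeded_cells:
        return 0.0
    # Number of doublings is log2(target / seeded); each takes doubling_time_h hours.
    return doubling_time_h * math.log2(target_cells / seeded_cells)

# Hypothetical example: expanding 1 million cells to 16 million requires 4 doublings.
print(hours_to_reach(16e6, 1e6))
```

Real cultures will take longer than this estimate whenever cells recover slowly after passaging or approach confluence.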
2. Crosslink of Protein–RNA Interactions in Live Cells
NOTE: RNA-protein crosslinking is mediated by the photo-activatable ribonucleoside analog 4-thiouridine (4SU). 4SU has an absorbance maximum at a longer wavelength than endogenous nucleotides and is incorporated only into RNA; therefore, intermediate-wavelength UVB can be used to selectively crosslink RNA to proteins25,26. UVB irradiation of 4SU-treated cells leads to covalent crosslinks between 4SU-containing RNA and amino acids, with a reported preference for Tyr, Trp, Met, Lys, and Cys27.
3. Isolation of Nuclei
NOTE: Nuclei are isolated to remove cytoplasmic proteins and increase coverage of nuclear proteins. This step can be replaced with other forms of cellular fractionation to study RBPs in different cellular compartments.
4. Lysis of Nuclei
NOTE: Crosslinked nuclei are lysed in a mass spectrometry-compatible buffer to release proteins and protein-RNA complexes.
5. Trypsin Digestion
NOTE: Proteins are digested to generate peptides suitable for bottom-up mass spectrometry (MS) analysis.
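To illustrate which peptides a tryptic digest is expected to produce, the sketch below performs an in silico digestion. It assumes the commonly used rule that trypsin cleaves C-terminal to Lys or Arg unless the next residue is Pro; this rule, the function name, and the example sequence are assumptions for illustration and are not taken from the protocol.

```python
def tryptic_peptides(sequence, missed_cleavages=0, min_length=1):
    """Return (start_position, peptide) tuples for an in silico tryptic digest.

    Assumed rule: cleave after K or R, except when the next residue is P.
    start_position is 1-based relative to the protein sequence.
    """
    seq = sequence.upper()
    # Collect cut positions (indices where a new peptide begins).
    cuts = [0]
    for i, residue in enumerate(seq[:-1]):
        if residue in "KR" and seq[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(seq))

    peptides = []
    for start_idx in range(len(cuts) - 1):
        for n_missed in range(missed_cleavages + 1):
            end_idx = start_idx + 1 + n_missed
            if end_idx >= len(cuts):
                break
            peptide = seq[cuts[start_idx]:cuts[end_idx]]
            if len(peptide) >= min_length:
                peptides.append((cuts[start_idx] + 1, peptide))
    return peptides

# Hypothetical sequence: the K at position 3 is followed by P and is not cleaved.
for start, peptide in tryptic_peptides("AAKPGRMMKC"):
    print(start, peptide)
```

Keeping the start position of each peptide makes it possible to map peptide-level measurements back onto the primary sequence of the protein later in the analysis.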
6. Desalting of Peptides
7. Removal of Crosslinked RNA
NOTE: Treat peptides with nuclease to remove crosslinked RNA.
8. Nano Liquid Chromatography, Mass Spectrometry, and Raw Data Processing
NOTE: Because 4SU-crosslinking changes the mass of a peptide, the ions of the crosslinked peptide do not count toward the intensity of the non-crosslinked peptide during LC-MS/MS, which therefore appears decreased by crosslinking. The degree and consistency of this decrease reflect the extent of protein-RNA crosslinking for each peptide24.
Figure 1 depicts the RBR-ID workflow. Because of the relatively low crosslinking efficiency of this technique, it is very important to consider both the depletion level and the consistency of the observed effect (P-value) across biological replicates. Figure 2 shows a volcano plot of RBR-ID results. Peptides that overlap RNA recognition motif (RRM) domains show highly consistent depletion, and RRM domains can therefore be used as a positive control for RBR-ID analysis.
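The two quantities plotted in a volcano plot, the depletion level and its P-value, can be computed per peptide as sketched below. This is a minimal illustration using only the Python standard library: it assumes an unpaired, equal-variance Student's t-test on already normalized replicate intensities, and the intensity values are invented for the example. In practice a statistics library (for example `scipy.stats.ttest_ind`) would normally be used; the numerical integration here loses precision for extremely small P-values.

```python
import math
from statistics import mean, variance

def log2_fold_change(plus_4su, minus_4su):
    """log2 ratio of mean +4SU to mean -4SU intensity (negative = depletion)."""
    return math.log2(mean(plus_4su) / mean(minus_4su))

def two_sided_p(t, df, steps=20000):
    """Two-sided P-value from the t distribution via Simpson integration (steps must be even)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)
    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)

def student_t_test(a, b):
    """Unpaired, equal-variance Student's t-test; returns (t, two-sided P-value)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)

# Invented normalized intensities for one peptide across three biological replicates.
plus_4su = [80.0, 85.0, 78.0]
minus_4su = [100.0, 104.0, 98.0]
t_stat, p_value = student_t_test(plus_4su, minus_4su)
print(log2_fold_change(plus_4su, minus_4su), t_stat, p_value)
```

A peptide with a modest but reproducible depletion can reach a lower P-value than one with a large but variable depletion, which is why both axes are considered together.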
RBR-ID can be used to map RBRs in live cells. An RBR-ID score can be calculated for each peptide to estimate its RNA-binding potential: $\text{RBR-ID score} = -\log_2\left(\frac{\text{normalized +4SU intensity}}{\text{normalized -4SU intensity}}\right) \times \left(\log_{10}(P\text{-value})\right)^2$. Figure 3 shows a heatmap of RBR-ID scores projected on the surface of the spliceosomal subunit U1-70k. The bright red color in the vicinity of the RNA contact, as determined from the crystal structure, indicates correct identification of the protein-RNA interaction.
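The score defined above translates directly into a short function. The sketch below is an illustration only; the function and argument names are chosen here and the example numbers are invented.

```python
import math

def rbr_id_score(norm_plus_4su, norm_minus_4su, p_value):
    """RBR-ID score: -log2(+4SU / -4SU) multiplied by (log10(P-value)) squared."""
    if norm_plus_4su <= 0 or norm_minus_4su <= 0:
        raise ValueError("intensities must be positive")
    if not 0 < p_value <= 1:
        raise ValueError("P-value must be in (0, 1]")
    log_fold = math.log2(norm_plus_4su / norm_minus_4su)
    return -log_fold * math.log10(p_value) ** 2

# Invented example: a two-fold depletion with P = 0.01 gives a score of about 4.
print(rbr_id_score(50.0, 100.0, 0.01))
```

Because the squared term is never negative, the sign of the score follows the direction of the change: depleted peptides score positive, while peptides whose intensity increases upon 4SU treatment score negative.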
Upon identification of novel RBPs and their RBRs, it is highly recommended that their RNA-binding activity be confirmed by an independent method. In our previous study24, we identified TET2 as a novel RBP and mapped the RNA-binding activity to a C-terminal region, as shown in Figure 4A by plotting the RBR-ID score along the primary sequence of the protein. Figure 4B shows that the requirement for this novel RBR could be verified by performing PAR-CLIP26 using the WT sequence and comparing the signal to that of a mutant lacking the predicted RBR. Additional controls and a more detailed explanation of this validation experiment are available in He et al. 201624.
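Plotting scores along the primary sequence requires converting peptide-level scores into residue-level values. The sketch below shows one possible way to do this; it assumes that each residue receives the mean score of all peptides covering it and that the profile is then smoothed with a simple moving average. Both choices, along with the function names and example values, are assumptions made for illustration and may differ from the procedure used to generate Figure 4A.

```python
def residue_scores(protein_length, peptides):
    """Average peptide scores per residue.

    peptides: iterable of (start, end, score) with 1-based inclusive positions.
    Residues not covered by any peptide receive 0.0.
    """
    totals = [0.0] * protein_length
    counts = [0] * protein_length
    for start, end, score in peptides:
        for pos in range(max(start, 1), min(end, protein_length) + 1):
            totals[pos - 1] += score
            counts[pos - 1] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

def moving_average(values, window=5):
    """Centered moving average; the window shrinks at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Invented example: two overlapping peptides on a 6-residue protein.
profile = residue_scores(6, [(1, 3, 2.0), (3, 4, 4.0)])
print(profile)
print(moving_average(profile, window=3))
```

Assigning 0.0 to uncovered residues is a simplification; treating them as missing values may be preferable when sequence coverage is sparse.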
Figure 1: RBR-ID overview. Mouse ESCs are treated with 4SU or not (1-2) and irradiated with 312 nm UV (3). Nuclei are isolated (4) and extracts digested with protease and nuclease, yielding both crosslinked and uncrosslinked peptides (5). Covalent RNA adducts at crosslink sites alter the peptide mass, leading to a corresponding decrease in intensity of the uncrosslinked peptide's mass spectrum (6, arrow).
Figure 2: Proteome-wide RBR-ID in mESCs. Volcano plots showing log-fold changes in peptide intensities on the x axis and Student's t-test P-values on the y axis for ± 4SU treatments (312 nm). Peptides overlapping annotated RRM domains are in blue. An RNA-binding peptide from HNRNPC is highlighted in red. Previously published in He et al. 201624.
Figure 3: RBR-ID maps the sites of protein–RNA interactions. Zoomed-in regions of the crystal structure of U1 snRNP (PDB ID: 4PJO31) showing protein surfaces color-coded according to their RBR-ID score and interacting RNAs for U1-70K. Previously published in He et al. 201624.
Figure 4: Validation of the RBR of TET2. (A) Primary sequence and known domains for TET2 (top); smoothed residue-level RBR-ID score plotted along the primary sequence (middle); and scheme of epitope-tagged catalytic domain fragment (CD) and RBR-deleted (CDΔRBR) constructs used for validation (bottom). (B) PAR-CLIP of transiently expressed TET2 CD and CDΔRBR in HEK293 cells. Autoradiography for 32P-labeled RNA (top) and control western blot (bottom). Previously published in He et al. 201624.
We describe a detailed experimental protocol to perform RBR-ID in mESCs and, with appropriate modifications, in any cell that can incorporate 4SU into RNA. Other cell types may require optimization of the approach to ensure a sufficient signal-to-noise ratio. Additionally, while the protocol described herein focuses on the examination of nuclear RBPs, the RBR-ID technology should be easily adapted to different cellular compartments, such as the cytosol or specific organelles, by use of different fractionation strategies. Parameters that may require optimization include 4SU concentration and incorporation time as well as the energy of UV crosslinking. These parameters are important to ensure efficient formation of RNA-protein crosslinks and to yield a satisfactory signal-to-noise ratio. We have shown that 312 nm UVB light is much more efficient at crosslinking 4SU-containing RNAs to proteins than the 365 nm UVA typically used in PAR-CLIP32. Direct comparison of RBR-ID performed with 312 nm and 365 nm demonstrated that 312 nm UVB resulted in the identification of a much larger portion of known RBPs24.
Two other methods are available to perform proteome-wide RBP identification. One, called RBDmap, is similar to RBR-ID in that it utilizes UV crosslinking and MS. RBDmap was used to identify RBPs and map their RNA-binding activities in HeLa cells23. This method relies on positive identification of peptides adjacent to the RNA-binding sites and on two sequential oligo-dT pull-downs, suggesting that it might have a lower false positive rate than RBR-ID. However, the pull-downs allow only the identification of RBPs that bind to polyA+ RNA and require large amounts of input material, 10–100 times more than RBR-ID. RBR-ID can be performed with as little as 5 µg of protein per biological replicate (~2 µg per technical replicate), which can be collected from as few as 250,000 cells.
The second method, termed SONAR, relies on data mining of large protein-protein interaction databases to detect proteins that frequently co-purify with known RBPs and whose interaction with these RBPs might therefore be mediated by RNA18. This approach is very powerful and allows for the identification of novel RBPs without performing any wet lab experiments, provided that suitable databases are available; however, it cannot reveal the interaction sites and is therefore complementary to RBR-ID rather than an alternative to it.
In summary, our RBR-ID technique can identify novel RBPs and map their RBRs with limited input sample amounts and without requiring any biochemical purification of the protein-RNA crosslinked complexes. The technique makes no assumptions about the type of RNA bound by the identified RBPs and can therefore map the RNA-binding activity of proteins that bind to non-polyadenylated RNAs, many of which are noncoding, suggesting that RBR-ID might prove a useful technique to study the regulatory roles of noncoding RNAs.
The authors have nothing to disclose.
R.B. was supported by the Searle Scholars Program, the W.W. Smith Foundation (C1404), and the March of Dimes Foundation (1-FY-15-344). B.A.G. acknowledges support from NIH grants R01GM110174 and R01AI118891, as well as DOD grant BC123187P1. R.W.-T. was supported by NIH training grant T32GM008216.
Name | Company | Catalog Number | Comments |
KnockOut DMEM | Fisher Scientific | 10829018 | |
Fetal bovine serum, qualified, US origin | Fisher Scientific | 26140079 | |
L-Glutamine solution 200 mM | Sigma | G7513 | |
Penicillin-Streptomycin solution | Sigma | P0781 | |
MEM Non-essential Amino Acid Solution (100×) | Sigma | M7145 | |
2-Mercaptoethanol | Sigma | M3148 | |
ESGRO Leukemia Inhibitory Factor (LIF) | EMD Millipore | ESG1106 | |
CHIR99021 | Tocris | 4423 | |
PD0325901 | Sigma | PZ0162 | |
Gelatin solution, 2% in water | Sigma | G1393 | |
4-thiouridine | Sigma | T4509 | 50 mM stock in water |
Spectrolinker XL-1500 | Fisher Scientific | 11-992-90 | |
Phenylmethanesulfonyl fluoride | Sigma | 78830 | |
IGEPAL CO-630 | Sigma | 542334 | Commercial form of octyl-phenoxy-polyethoxy-ethanol detergent |
Iodoacetamide | Sigma | I6125 | |
Trypsin, sequencing grade | Promega | V5111 | |
Empore solid phase extraction disk | 3M | 66883 | |
OLIGO R3 Reversed-Phase Resin | Fisher Scientific | 1133903 | |
Benzonase | Sigma | E8263 | High purity nuclease |
Sonic Dismembrator Model 100 | Fisher Scientific | discontinued | updated with FB505110 |
HPLC grade acetonitrile | Fisher Chemical | A955-4 | |
HPLC grade water | Fisher Scientific | W6 4 | |
TFA | Fisher Scientific | A11650 | |
Ammonium Bicarbonate | Sigma | A6141 | |
Acetic Acid | Sigma | 49199 | |
Formic Acid | Sigma | F0507 | |
ReproSil-Pur 18-AQ | Dr. Maisch GmbH HPLC | r13.aq.0003 | Packing material for HPLC column |
Capillary for nano columns (75 µm) | Molex | 1068150017 | |
MaxQuant software | Max Planck Institute for Biochemistry | | Can perform chromatographic alignment of multiple MS runs |