R-loops constitute a prevalent class of transcription-driven non-B DNA structures that occur in all genomes depending on both DNA sequence and topological favorability. In recent years, R-loops have been implicated in a variety of adaptive and maladaptive roles and have been linked to genomic instability in the context of human disorders. As a consequence, the accurate mapping of these structures in genomes is of high interest to many investigators. DRIP-seq (DNA:RNA Immunoprecipitation followed by high throughput sequencing) is described here. It is a robust and reproducible technique that permits accurate and semi-quantitative mapping of R-loops. A recent iteration of the method is also described in which fragmentation is accomplished using sonication (sDRIP-seq), which allows strand-specific and high-resolution mapping of R-loops. sDRIP-seq thus addresses some of the common limitations of the DRIP-seq method in terms of resolution and strandedness, making it a method of choice for R-loop mapping.
R-loops are three-stranded nucleic acid structures that form primarily during transcription upon hybridization of the nascent RNA transcript to the template DNA strand. This results in the formation of an RNA:DNA hybrid and causes the displacement of the non-template DNA strand in a single-stranded looped state. Biochemical reconstitution1,2,3,4 and mathematical modeling5, in combination with other biophysical measurements6,7, have established that R-loops are more likely to occur over regions that exhibit specific favorable characteristics. For instance, regions that display strand asymmetry in the distribution of guanines (G) and cytosines (C) such that the RNA is G-rich, a property called positive GC skew, are favored to form R-loops when transcribed owing to the higher thermodynamic stability of the DNA:RNA hybrid compared to the DNA duplex8. Regions that have evolved positive GC skew, such as the early portions of many eukaryotic genes4,9,10,11, are prone to forming R-loops in vitro and in vivo3,4,12. Negative DNA superhelical stress also greatly favors structure formation13,14 because R-loops efficiently absorb such topological stresses and return the surrounding DNA fiber to a favorable relaxed state5,15.
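To make the notion of positive GC skew concrete, the following minimal Python sketch computes skew with the commonly used definition (G - C)/(G + C) in sliding windows. The sequence is assumed to be the non-template strand read in the direction of transcription, which matches the RNA sequence. The function names, window size, and toy sequence are illustrative assumptions and are not part of the published analyses.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_gc_skew(seq: str, window: int = 100, step: int = 50):
    """Yield (window_start, skew) for every full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_skew(seq[start:start + window])


if __name__ == "__main__":
    # Toy non-template strand (hypothetical): G-rich half followed by C-rich half.
    toy = "GGGAGGGTGG" * 10 + "CCCTCCCACC" * 10
    for start, skew in sliding_gc_skew(toy, window=100, step=50):
        # Positive values indicate a G-rich RNA, i.e. positive GC skew.
        print(start, round(skew, 2))
```

Windows with positive values would correspond to regions where the RNA is G-rich; the threshold at which a region is called GC-skewed is not specified here and would need to be taken from the cited studies.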
Historically, R-loop structures were considered to result from rare, spontaneous, entanglements of RNA with DNA during transcription. However, the development of DNA:RNA immunoprecipitation (DRIP) coupled with high-throughput DNA sequencing (DRIP-seq) allowed the first genome-wide mapping of R-loops and revealed that those structures are far more prevalent than expected in human cells4,16. R-loops occur over tens of thousands of conserved, transcribed, genic hotspots in mammalian genomes, with a predilection for GC-skewed CpG islands overlapping the first intron of genes and the terminal regions of numerous genes17. Overall, R-loops collectively occupy 3%-5% of the genome in human cells, consistent with measurements in other organisms, including yeasts, plants, flies, and mice18,19,20,21,22.
Analysis of R-loop forming hotspots in human cells revealed that such regions associate with specific chromatin signatures23. R-loops, in general, are found over regions with lower nucleosome occupancy and higher RNA polymerase density. At promoters, R-loops associate with increased recruitment of two co-transcriptionally deposited histone modifications, H3K4me1 and H3K36me317. At gene termini, R-loops associate with closely arranged genes that undergo efficient transcription termination17, consistent with prior observations24. R-loops were also shown to participate in the initiation of DNA replication at the replication origins of bacteriophage, plasmid, mitochondrial, and the yeast genomes25,26,27,28,29,30,31. In addition, 76% of R-loop-prone human CpG island promoters function as early, constitutive replication origins32,33,34,35, further reinforcing the connections between R-loops and replication origins. Collectively, these studies suggest that R-loops represent a novel type of biological signal that can trigger specific biological outputs in a context-dependent manner23.
Early on, R-loops were shown to form at class switch sequences during the process of immunoglobulin class switch recombination3,36,37. Such programmed R-loops are thought to initiate class switch recombination through the introduction of double-stranded DNA breaks38. Since then, harmful R-loop formation, generally understood to result from excessive R-loop formation, has been linked to genomic instability and processes such as hyper-recombination, transcription-replication collisions, and replication and transcriptional stress (for review39,40,41,42,43). As a consequence, improved mapping of R-loop structures represents an exciting and essential challenge to better decipher the distribution and function of these structures in health and disease.
DNA:RNA immunoprecipitation (DRIP) relies on the high affinity of the S9.6 monoclonal antibody for DNA:RNA hybrids44. DRIP-seq permits robust genome-wide profiling of R-loop formation4,45. While useful, this technique suffers from limited resolution because restriction enzymes are used to achieve gentle DNA fragmentation. In addition, DRIP-seq does not provide information on the directionality of R-loop formation. Here, we report a variant of DRIP-seq that permits the mapping of R-loops at high resolution in a strand-specific manner. This method relies on sonication to fragment the genome prior to immunoprecipitation and is thus called sDRIP-seq (sonication DNA:RNA immunoprecipitation coupled with high throughput sequencing) (Figure 1). The use of sonication permits an increased resolution and limits restriction enzyme-linked fragmentation biases observed in DRIP-seq approaches46. sDRIP-seq produces R-loop maps that are in strong agreement with the results from both DRIP-seq and the previously described high-resolution DRIPc-seq method in which sequencing libraries are built from the RNA strands of immunoprecipitated R-loop structures45.
Faced with a plethora of methods to choose from, users may wonder which particular DRIP-based approach is preferable for their needs. We offer the following advice. DRIP-seq, despite its limitations, is technically easiest and is the most robust (highest yields) of all three methods discussed here; it thus remains broadly useful. Numerous DRIP-seq datasets have been published, which provide a useful comparison point for new datasets. Finally, the bioinformatic analysis pipeline is simpler as the data are not stranded. It is recommended that new users begin honing their R-loop mapping skills with DRIP followed by quantitative polymerase chain reaction (qPCR) and DRIP-seq. sDRIP-seq represents a slightly higher degree of technical difficulty: the yields are slightly reduced due to sonication (discussed below) and the sequencing library process is slightly more complex. Yet, the gain of strandedness and higher resolution is invaluable. It is noted that sDRIP-seq will capture both two-stranded RNA:DNA hybrids and three-stranded R-loops. Due to the library construction steps, DRIP-seq will not capture two-stranded RNA:DNA hybrids. DRIPc-seq is the most technically demanding and requires a higher amount of starting material. In return, it offers the highest resolution and strandedness. Because sequencing libraries are built from the RNA moiety of R-loops or hybrids, DRIPc-seq may suffer from possible RNA contamination, especially since S9.6 possesses residual affinity for dsRNA19,47,48. sDRIP-seq permits strand-specific, high resolution mapping without worries about RNA contamination since sequencing libraries are derived from DNA strands. Overall, these three methods remain useful and present differing degrees of complexity and slightly different caveats. All three, however, produce highly congruent datasets48 and are highly sensitive to RNase H pre-treatment, which represents an essential control to ensure signal specificity45,49.
It is noted that given the size selection imposed on sequencing libraries, small hybrids (estimated <75 bp), such as those forming transiently around lagging strand DNA replication priming sites (Okazaki primers) will be excluded. Similarly, since all DRIP methods involve DNA fragmentation, unstable R-loops that require negative DNA supercoiling for their stability will be lost5. Thus, DRIP approaches may underestimate R-loop loads, especially for short, unstable R-loops that may be best captured using in vivo approaches45,48. It is noted that R-loops can also be profiled in an S9.6-independent manner at deep coverage, high-resolution, and in a strand-specific manner on single DNA molecules after sodium bisulfite treatment12. Additionally, strategies using a catalytically inactive RNase H1 enzyme have been employed to map native R-loops in vivo, highlighting short, unstable R-loops that form primarily at paused promoters50,51,52.
The following protocol is optimized for the human Ntera-2 cell line grown in culture, but it has been successfully adapted without modification to a range of other human cell lines (HEK293, K562, HeLa, U2OS) and primary cells (fibroblasts, B-cells), as well as to other organisms with small modifications (mice, flies).
1. Cell harvest and lysis
2. DNA extraction
3. DNA fragmentation
NOTE: For restriction enzyme-based DRIP-seq, follow step 3.1. For sonication-based DRIP-seq, skip to step 3.2.
4. S9.6 immunoprecipitation
NOTE: The immunoprecipitation steps are similar regardless of whether DNA was fragmented through REs or sonication.
5. Pre-library step for sonicated DNA only
NOTE: Sonication leads the displaced ssDNA strand of R-loops to break. Thus, three-stranded R-loop structures are converted into two-stranded DNA:RNA hybrids upon sonication. As a result, these DNA:RNA hybrids must be converted back to double-stranded DNA prior to library construction. Here, a second strand synthesis step is employed. An alternative approach that has been successfully used is to instead perform a single-stranded DNA ligation followed by a second strand synthesis53.
6. Pre-library sonication step for RE DNA only
NOTE: DRIP leads to the recovery of RE fragments that are often kilobases in length and thus not suited for immediate library construction.
7. Library construction
8. Quality control
DRIP as well as sDRIP can be analyzed through qPCR (Figure 2A) and/or sequencing (Figure 2B). After the immunoprecipitation step, the quality of the experiment must be first confirmed by qPCR on positive and negative control loci, as well as with RNase H-treated controls. Primers corresponding to frequently used loci in multiple human cell lines are provided in Table 2. The results from qPCR should be displayed as a percentage of input, which corresponds to the percentage of cells carrying an R-loop at the time of the lysis for a given locus. In a successful DRIP experiment, the yield for negative loci should be less than 0.1% whereas positive loci can vary from 1% to over 10% for highly transcribed loci such as RPL13A (Figure 2A). For sDRIP, yields are typically lower (20%-50%) as judged by DRIP-qPCR but appear to affect recovery uniformly such that no particular subset of R-loops is affected more than another. As a result, maps derived from DRIP, sDRIP, and DRIPc are in good agreement (Figure 2B). qPCR data can also be displayed as fold enrichment of the percentage of input for positive loci over negative loci, thus assessing the specificity of the experiment. Fold enrichments typically range from a minimum of 10-fold to over 200-fold depending on the loci chosen for analysis. When precise quantification is required across multiple samples representing gene knockdowns, knockouts, or various pharmacological treatments, the use of spike-in controls to normalize inter-sample experimental variation is highly encouraged. Such spike-ins can correspond to synthetic hybrids53 or genomes of unrelated species54.
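As an illustration of the calculations described above, the sketch below converts qPCR Ct values into a percentage of input and into fold enrichment of a positive over a negative locus. It assumes the formula commonly used for ChIP-style qPCR, near-100% primer efficiency (a factor of 2 per cycle), and a known fraction of the material set aside as input. The function names and all Ct values are hypothetical and are not taken from this protocol.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input recovered by the immunoprecipitation.

    input_fraction: fraction of the IP starting material saved as input
    (e.g. 0.1 for 10%). Assumes a doubling of product per PCR cycle.
    """
    if not 0.0 < input_fraction <= 1.0:
        raise ValueError("input_fraction must be in (0, 1]")
    # Ct the input would show if 100% of the material had been measured.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


def fold_enrichment(pct_positive: float, pct_negative: float) -> float:
    """Ratio of percent input at a positive locus over a negative locus."""
    if pct_negative <= 0.0:
        raise ValueError("pct_negative must be positive")
    return pct_positive / pct_negative


if __name__ == "__main__":
    # Hypothetical Ct values for one positive and one negative locus.
    pos = percent_input(ct_ip=24.0, ct_input=22.0, input_fraction=0.1)
    neg = percent_input(ct_ip=30.0, ct_input=22.0, input_fraction=0.1)
    print(f"positive locus: {pos:.3f}% of input")
    print(f"negative locus: {neg:.3f}% of input")
    print(f"fold enrichment: {fold_enrichment(pos, neg):.1f}")
```

The same functions can be applied to RNase H-treated controls, where the percentage of input at positive loci would be expected to drop sharply.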
DRIP and sDRIP materials can be sequenced using single or paired-end sequencing strategies. Data can be extracted and analyzed similarly to most ChIP data using standard computational pipelines (see45 for DRIP-relevant information). After adapter trimming and removal of PCR duplicates, reads can be mapped to a reference genome and uploaded to a genome browser. A typical expected output of DRIP and sDRIP is shown in Figure 2B. The DRIP output is represented by a single green track as it does not allow strand specificity, whereas sDRIP shows R-loop mapping to the positive and negative strands, indicated in red and blue, respectively. Control tracks corresponding to a sample pre-treated with RNase H show a clear reduction of signals, confirming the specificity of the technique for RNA:DNA hybrid-derived materials. The gains in resolution permitted by sDRIP are clearly illustrated when comparing the sizes of input DNA material (Figure 2C). The reproducibility of sDRIP-seq, along with the global impact of RNase H1 pre-treatment and the correlation between sDRIP-seq and DRIPc-seq, are depicted by XY plots in Figure 2D.
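The XY comparisons mentioned above can be summarized numerically. The following sketch computes a Pearson correlation between two tracks after the genome has been divided into bins and reads counted per bin. The log2 transform, the pseudocount, and the toy counts are assumptions made for illustration; the source does not specify how the plots in Figure 2D were generated.

```python
import math


def log_signal(counts, pseudocount: float = 1.0):
    """log2-transform per-bin read counts; the pseudocount avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant track")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-bin read counts for two replicates and an RNase H control.
    rep1 = [5, 120, 300, 8, 45, 2, 210]
    rep2 = [7, 110, 280, 10, 50, 3, 190]
    rnase_h = [4, 9, 12, 6, 5, 2, 10]
    print("replicates:", round(pearson(log_signal(rep1), log_signal(rep2)), 3))
    print("vs RNase H:", round(pearson(log_signal(rep1), log_signal(rnase_h)), 3))
```

A correlation coefficient does not by itself capture the loss of signal after RNase H treatment; comparing total or per-bin signal between treated and untreated samples would be needed for that.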
Figure 1: Overview of the DRIP-seq and sDRIP-seq procedures. Both approaches start with the same DNA extraction steps developed to preserve R-loops (RNA strands within R-loops are represented by squiggly lines). For DRIP-seq, the genome is fragmented using restriction enzymes, often resulting in kilobase-size fragments within which shorter R-loops are embedded. For sDRIP-seq, the genome is fragmented via sonication, which results in smaller fragments and the shearing and loss of the displaced single-strand of R-loops (indicated by dashed lines). Following immunoprecipitation with the S9.6 antibody, DRIP leads to the recovery of three-stranded R-loops embedded within restriction fragments, while sDRIP recovers two-stranded RNA:DNA hybrids with little flanking DNA, ensuring higher resolution. For sDRIP, a library construction step must be included to convert RNA:DNA hybrids back to duplex DNA. As shown here, this is an opportunity to build strand-specific libraries. As detailed in the protocol itself, exogenous treatment with RNase H represents a key control for the specificity of both procedures; these controls are not shown here.
Figure 2: Results of R-loop mapping strategies. (A) qPCR results from successful immunoprecipitations using the DRIP and sDRIP methods (corresponding to qPCR check step 4.13). Results are from two independent experiments from human Ntera-2 cells at a negative locus and two positive loci, including the highly R-loop-prone RPL13A locus and the moderately R-loop-prone TFPT locus. The y-axis indicates the yield of the immunoprecipitation as a percentage of the input DNA. Note that the recovery is slightly more robust for DRIP than sDRIP. (B) The results of R-loop mapping conducted in human Ntera-2 cells are shown over a region centered around the CCND1 and neighboring ORAOV1 genes. The first two tracks correspond to DRIP-seq results, without and with RNase H treatment, respectively. The positions of the restriction enzyme sites used to fragment the genome are shown at the top. The next six tracks represent the results of strand-specific sDRIP-seq, broken down between (+) and (-) strands (two replicates each) and pre-treated with RNase H, or not, as indicated. The last four tracks represent the results of R-loop mapping via the high-resolution strand-specific DRIPc-seq method (Sanz et al., 2016; Sanz and Chedin, 2019), where libraries are built from the RNA strands of R-loops. As can be clearly seen, the CCND1 and ORAOV1 genes lead to R-loop formation on the (+) and (-) strands, respectively, consistent with their directionality. RNase H treatment abolishes the signal, as expected. (C) Input DNA materials after restriction enzyme fragmentation (left) and sonication (right) are shown after the materials were separated by agarose gel electrophoresis. The DNA ladder corresponds to a 100 bp ladder and the 500 bp band is highlighted by an asterisk. (D) XY signal correlation plots are shown to illustrate the reproducibility of sDRIP-seq (left), the overall sensitivity of sDRIP-seq to RNase H1 pre-treatment (middle), and the global correlation between sDRIP-seq and DRIPc-seq (right).
All data are from Ntera-2 human cells.
Table 1: PCR program settings. The duration and temperature settings for the PCR cycles are listed.
Table 2: Primers used for qPCR validation in human cell lines. All sequences are listed in the 5' to 3' direction.
Described here are two protocols to map R-loop structures in potentially any organism using the S9.6 antibody. DRIP-seq represents the first genome-wide R-loop mapping technique developed. It is an easy, robust, and reproducible technique that allows one to map the distribution of R-loops along any genome. The second technique, termed sDRIP-seq, is also robust and reproducible but achieves higher resolution and strand-specificity owing to the inclusion of a sonication step and a stranded sequencing library construction protocol. Both techniques are highly sensitive to RNase H treatment prior to immunoprecipitation, confirming that the signal is principally derived from genuine RNA:DNA hybrids. Finally, when comparing immunoprecipitation yields between R-loop positive and R-loop negative loci, both techniques offer up to a 100-fold difference in several human cell lines, providing high specificity mapping with low background.
When considering which method to implement, it is useful to consider their respective strengths and limitations. As previously noted, DRIP-seq produces maps with a lower resolution and does not give information on the strandedness of R-loop formation. The lower resolution is mainly a product of the use of REs to fragment the genome. This gentle method is best at preserving R-loops, thereby allowing unsurpassed recovery of such structures, and making DRIP-seq very robust. To circumvent the issue of limited resolution while preserving high recovery, RE cocktails can be adapted and/or maps resulting from different RE cocktails can be combined to improve resolution16. A technique using 4 bp cutters has been developed to improve the resolution of DRIP-seq and may achieve strand-specific mapping22,55, although the resulting datasets have not yet been systematically compared to other human datasets. It is important to note that in RE-based approaches, larger fragments tend to be recovered more efficiently because they can carry multiple R-loop forming regions. This bias must be taken into account when analyzing DRIP-seq datasets. Similarly, peak calling for DRIP-seq data must be ultimately translated into R-loop-positive RE fragments, since it is these fragments that are immunoprecipitated and the position of R-loops within these fragments cannot be inferred. In general, it is recommended that users first adopt RE-based DRIP-seq to learn the method and build their confidence in achieving the yields documented in Figure 2A. sDRIP-seq typically results in lower yields, which could result in maps with lower signal-to-noise ratios in untrained hands. In return, the use of sonication to fragment the genome offers a great improvement in resolution since the non-R-looped portions that typically constitute the majority of RE fragments will be broken off, allowing S9.6 to principally retrieve the R-looped portions (Figure 1).
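The translation of DRIP-seq peaks into R-loop-positive RE fragments mentioned above can be sketched as a simple interval overlap. The example below is a minimal illustration for a single chromosome, using half-open coordinates; the cut-site positions, peak coordinates, and function names are hypothetical, and a real analysis would derive cut sites from the recognition sequences of the RE cocktail actually used.

```python
from bisect import bisect_right


def fragments_from_cut_sites(cut_sites, chrom_length: int):
    """Turn cut positions into half-open (start, end) restriction fragments."""
    inner = sorted({s for s in cut_sites if 0 < s < chrom_length})
    bounds = [0] + inner + [chrom_length]
    return list(zip(bounds[:-1], bounds[1:]))


def positive_fragments(peaks, fragments):
    """Return the sorted fragments that overlap at least one peak.

    peaks and fragments are half-open (start, end) intervals on the same
    chromosome; fragments must be sorted and non-overlapping.
    """
    starts = [start for start, _ in fragments]
    hits = set()
    for peak_start, peak_end in peaks:
        # Last fragment whose start is <= peak_start (or the first fragment).
        i = max(bisect_right(starts, peak_start) - 1, 0)
        while i < len(fragments) and fragments[i][0] < peak_end:
            if fragments[i][1] > peak_start:
                hits.add(fragments[i])
            i += 1
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical cut sites and called peaks on a 10 kb toy chromosome.
    frags = fragments_from_cut_sites([1000, 5000, 5600], chrom_length=10000)
    peaks = [(1200, 1500), (4900, 5100)]
    for frag in positive_fragments(peaks, frags):
        print(frag, "length:", frag[1] - frag[0])
```

Reporting fragment length alongside each positive fragment makes it easier to keep the recovery bias toward larger fragments in view during downstream analysis.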
It is worth noting that sonication causes the displaced ssDNA strand of R-loops to break. It is therefore essential to add a second strand synthesis after immunoprecipitating sonicated DNA:RNA hybrids, which will convert these hybrids back to dsDNA, prior to building sequencing libraries. Without this step, the only fragments that can be ligated to dsDNA adapters will be background dsDNA fragments; thus, the resulting maps will be devoid of any signal. Strand-specificity provides numerous further benefits to the understanding of R-loop formation mechanisms, making sDRIP-seq a method of choice for the study of R-loops.
Importantly, maps obtained via DRIP-seq and sDRIP-seq represent the average distribution of R-loops through a cell population; thus, the length and position of individual R-loops cannot be addressed with those techniques. For this, an independent and complementary method termed single-molecule R-loop footprinting (SMRF-seq)12 can be leveraged to reveal individual R-loops at high-resolution in a strand-specific manner. Assessment of R-loop formation using SMRF-seq over 20 different loci, including independently of S9.6, revealed a strong agreement between the collection of individual R-loop footprints and the population average distribution gathered by DRIP-based approaches12, lending strong support to DRIP-based approaches. It is also important to consider that R-loop mapping data provide only a snapshot of R-loop genomic distribution and do not provide information on the dynamics of R-loop formation, stability, and resolution. DRIP approaches, combined with specific drug treatments and an evaluation of R-loop distributions through time series, can nonetheless be deployed to address these parameters17,53. The limitations of R-loop profiling methodologies are particularly important to keep in mind when the goal is to characterize altered R-loop distributions in response to genetic, environmental, or pharmacological perturbations. In addition to those already described above, it is key to consider any possible change to nascent transcription since such changes will inherently cause R-loop changes owing to the co-transcriptional nature of these structures. These issues and guidelines for developing rigorous R-loop mapping approaches have been extensively discussed48,56 and readers are encouraged to refer to these studies.
The authors have nothing to disclose.
Work in the Chedin lab is supported by a grant from the National Institutes of Health (R01 GM120607).
15 mL tube High density Maxtract phase lock gel | Qiagen | 129065 | |
2 mL tube phase lock gel light | VWR | 10847-800 | |
Agarose A/G beads | ThermoFisher Scientific | 20421 | |
Agencourt AMPure XP beads | Beckman Coulter | A63881 | |
AmpErase Uracil N-glycosylase | ThermoFisher Scientific | N8080096 | |
Index adapters | Illumina | Corresponds to the TruSeq Single indexes | |
Klenow fragment (3’ to 5’ exo-) | New England BioLabs | M0212S | |
NEBNext End repair module | New England BioLabs | E6050 | |
PCR primers for library amplification | PCR primer 1.0 P5 (5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGA 3’) | | |
PCR primers for library amplification | PCR primer 2.0 P7 (5’ CAAGCAGAAGACGGCATACGAGAT 3’) | | |
Phenol/Chloroform Isoamyl alcohol 25:24:1 | Affymetrix | 75831-400ML | |
Phusion Flash High-Fidelity PCR master mix | ThermoFisher Scientific | F548S | |
Quick Ligation Kit | New England BioLabs | M2200S | |
Ribonuclease H | New England BioLabs | M0297S | |
S9.6 Antibody | Kerafast | ENH001 | These three sources are equivalent |
S9.6 Antibody | Millipore/Sigma | MABE1095 | |
S9.6 Antibody | Abcam | ab234957 | |