R-loops constitute a prevalent class of transcription-driven non-B DNA structures that occur in all genomes depending on both DNA sequence and topological favorability. In recent years, R-loops have been implicated in a variety of adaptive and maladaptive roles and have been linked to genomic instability in the context of human disorders. As a consequence, the accurate mapping of these structures in genomes is of high interest to many investigators. DRIP-seq (DNA:RNA Immunoprecipitation followed by high throughput sequencing) is described here. It is a robust and reproducible technique that permits accurate and semi-quantitative mapping of R-loops. A recent iteration of the method is also described in which fragmentation is accomplished using sonication (sDRIP-seq), which allows strand-specific and high-resolution mapping of R-loops. sDRIP-seq thus addresses some of the common limitations of the DRIP-seq method in terms of resolution and strandedness, making it a method of choice for R-loop mapping.
R-loops are three-stranded nucleic acid structures that form primarily during transcription upon hybridization of the nascent RNA transcript to the template DNA strand. This results in the formation of an RNA:DNA hybrid and causes the displacement of the non-template DNA strand in a single-stranded looped state. Biochemical reconstitution [1,2,3,4] and mathematical modeling [5], in combination with other biophysical measurements [6,7], have established that R-loops are more likely to occur over regions that exhibit specific favorable characteristics. For instance, regions that display strand asymmetry in the distribution of guanines (G) and cytosines (C) such that the RNA is G-rich, a property called positive GC skew, are favored to form R-loops when transcribed owing to the higher thermodynamic stability of the DNA:RNA hybrid compared to the DNA duplex [8]. Regions that have evolved positive GC skew, such as the early portions of many eukaryotic genes [4,9,10,11], are prone to forming R-loops in vitro and in vivo [3,4,12]. Negative DNA superhelical stress also greatly favors structure formation [13,14] because R-loops efficiently absorb such topological stresses and return the surrounding DNA fiber to a favorable relaxed state [5,15].
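The GC skew property described above can be made concrete with a small calculation. The sketch below is illustrative only and is not part of the protocol: it assumes the common definition of GC skew as (G - C)/(G + C), computed on the non-template strand (which carries the same sequence as the RNA), and the function names, window size, and step are hypothetical choices.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one sequence; 0.0 if it contains no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq: str, window: int = 100, step: int = 50):
    """Yield (start, skew) for sliding windows along the non-template strand.

    A sequence shorter than `window` is scored as a single window.
    """
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, gc_skew(seq[start:start + window])


if __name__ == "__main__":
    # Toy sequence: a G-rich stretch followed by a C-rich stretch.
    toy = "GGGGAGGGTG" * 10 + "CCCCTCCCAC" * 10
    for start, skew in windowed_gc_skew(toy, window=50, step=50):
        print(start, round(skew, 2))
```

Under this definition, windows with a skew above zero are G-rich on the RNA-like strand; any threshold for calling a region "GC-skewed" would need to be chosen and validated by the user.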
Historically, R-loop structures were considered to result from rare, spontaneous entanglements of RNA with DNA during transcription. However, the development of DNA:RNA immunoprecipitation (DRIP) coupled with high-throughput DNA sequencing (DRIP-seq) allowed the first genome-wide mapping of R-loops and revealed that these structures are far more prevalent than expected in human cells [4,16]. R-loops occur over tens of thousands of conserved, transcribed, genic hotspots in mammalian genomes, with a predilection for GC-skewed CpG islands overlapping the first exon of genes and the terminal regions of numerous genes [17]. Overall, R-loops collectively occupy 3%-5% of the genome in human cells, consistent with measurements in other organisms, including yeasts, plants, flies, and mice [18,19,20,21,22].
Analysis of R-loop-forming hotspots in human cells revealed that such regions associate with specific chromatin signatures [23]. R-loops, in general, are found over regions with lower nucleosome occupancy and higher RNA polymerase density. At promoters, R-loops associate with increased recruitment of two co-transcriptionally deposited histone modifications, H3K4me1 and H3K36me3 [17]. At gene termini, R-loops associate with closely arranged genes that undergo efficient transcription termination [17], consistent with prior observations [24]. R-loops were also shown to participate in the initiation of DNA replication at the replication origins of bacteriophage, plasmid, mitochondrial, and yeast genomes [25,26,27,28,29,30,31]. In addition, 76% of R-loop-prone human CpG island promoters function as early, constitutive replication origins [32,33,34,35], further reinforcing the connections between R-loops and replication origins. Collectively, these studies suggest that R-loops represent a novel type of biological signal that can trigger specific biological outputs in a context-dependent manner [23].
Early on, R-loops were shown to form at class switch sequences during the process of immunoglobulin class switch recombination [3,36,37]. Such programmed R-loops are thought to initiate class switch recombination through the introduction of double-stranded DNA breaks [38]. Since then, harmful R-loops, generally understood to result from excessive R-loop formation, have been linked to genomic instability and to processes such as hyper-recombination, transcription-replication collisions, and replication and transcriptional stress (for review, see [39,40,41,42,43]). As a consequence, improved mapping of R-loop structures represents an exciting and essential challenge to better decipher the distribution and function of these structures in health and disease.
DNA:RNA immunoprecipitation (DRIP) relies on the high affinity of the S9.6 monoclonal antibody for DNA:RNA hybrids [44]. DRIP-seq permits robust genome-wide profiling of R-loop formation [4,45]. While useful, this technique suffers from limited resolution because restriction enzymes are used to achieve gentle DNA fragmentation. In addition, DRIP-seq does not provide information on the directionality of R-loop formation. Here, we report a variant of DRIP-seq that permits the mapping of R-loops at high resolution in a strand-specific manner. This method relies on sonication to fragment the genome prior to immunoprecipitation and is thus called sDRIP-seq (sonication DNA:RNA immunoprecipitation coupled with high-throughput sequencing) (Figure 1). The use of sonication increases resolution and limits the restriction enzyme-linked fragmentation biases observed in DRIP-seq approaches [46]. sDRIP-seq produces R-loop maps that are in strong agreement with the results from both DRIP-seq and the previously described high-resolution DRIPc-seq method, in which sequencing libraries are built from the RNA strands of immunoprecipitated R-loop structures [45].
Faced with a plethora of methods to choose from, users may wonder which particular DRIP-based approach is preferable for their needs. We offer the following advice. DRIP-seq, despite its limitations, is technically the easiest and the most robust (highest yields) of the three methods discussed here; it thus remains broadly useful. Numerous DRIP-seq datasets have been published, which provide a useful comparison point for new datasets. Finally, the bioinformatic analysis pipeline is simpler because the data are not stranded. It is recommended that new users begin honing their R-loop mapping skills with DRIP followed by quantitative polymerase chain reaction (qPCR) and DRIP-seq. sDRIP-seq represents a slightly higher degree of technical difficulty: the yields are slightly reduced due to sonication (discussed below) and the sequencing library process is slightly more complex. Yet, the gain of strandedness and higher resolution is invaluable. It is noted that sDRIP-seq will capture both two-stranded RNA:DNA hybrids and three-stranded R-loops. Due to the library construction steps, DRIP-seq will not capture two-stranded RNA:DNA hybrids. DRIPc-seq is the most technically demanding and requires larger amounts of starting material. In return, it offers the highest resolution and strandedness. Because sequencing libraries are built from the RNA moiety of R-loops or hybrids, DRIPc-seq may suffer from possible RNA contamination, especially since S9.6 possesses residual affinity for dsRNA [19,47,48]. sDRIP-seq permits strand-specific, high-resolution mapping without worries about RNA contamination since sequencing libraries are derived from DNA strands. Overall, these three methods remain useful and present differing degrees of complexity and slightly different caveats. All three, however, produce highly congruent datasets [48] and are highly sensitive to RNase H pre-treatment, which represents an essential control to ensure signal specificity [45,49].
It is noted that, given the size selection imposed on sequencing libraries, small hybrids (estimated <75 bp), such as those forming transiently around lagging strand DNA replication priming sites (Okazaki primers), will be excluded. Similarly, since all DRIP methods involve DNA fragmentation, unstable R-loops that require negative DNA supercoiling for their stability will be lost [5]. Thus, DRIP approaches may underestimate R-loop loads, especially for short, unstable R-loops that may be best captured using in vivo approaches [45,48]. It is noted that R-loops can also be profiled in an S9.6-independent manner, at deep coverage, at high resolution, and in a strand-specific manner, on single DNA molecules after sodium bisulfite treatment [12]. Additionally, strategies using a catalytically inactive RNase H1 enzyme have been employed to map native R-loops in vivo, highlighting short, unstable R-loops that form primarily at paused promoters [50,51,52].
Described here are two protocols to map R-loop structures in potentially any organism using the S9.6 antibody. DRIP-seq represents the first genome-wide R-loop mapping technique developed. It is an easy, robust, and reproducible technique that allows one to map the distribution of R-loops along any genome. The second technique, termed sDRIP-seq, is also robust and reproducible but achieves higher resolution and strand-specificity owing to the inclusion of a sonication step and a stranded sequencing library construction p…
The authors have nothing to disclose.
Work in the Chedin lab is supported by a grant from the National Institutes of Health (R01 GM120607).
15 mL tube High density Maxtract phase lock gel | Qiagen | 129065 | |
2 mL tube phase lock gel light | VWR | 10847-800 | |
Agarose A/G beads | ThermoFisher Scientific | 20421 | |
Agencourt AMPure XP beads | Beckman Coulter | A63881 | |
AmpErase Uracil N-glycosylase | ThermoFisher Scientific | N8080096 | |
Index adapters | Illumina | Corresponds to the TruSeq Single indexes | |
Klenow fragment (3’ to 5’ exo-) | New England BioLabs | M0212S | |
NEBNext End repair module | New England BioLabs | E6050 | |
PCR primers for library amplification | primer 1.0 P5 (5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGA 3’) | | |
PCR primers for library amplification | PCR primer 2.0 P7 (5’ CAAGCAGAAGACGGCATACGAGAT 3’) | | |
Phenol/Chloroform Isoamyl alcohol 25:24:1 | Affymetrix | 75831-400ML | |
Phusion Flash High-Fidelity PCR master mix | ThermoFisher Scientific | F548S | |
Quick Ligation Kit | New England BioLabs | M2200S | |
Ribonuclease H | New England BioLabs | M0297S | |
S9.6 Antibody | Kerafast | ENH001 | These three sources are equivalent |
S9.6 Antibody | Millipore/Sigma | MABE1095 | |
S9.6 Antibody | Abcam | ab234957 |