Here we describe a method to simultaneously quantitate ribonucleotides and map them genome-wide at single-nucleotide resolution in highly intact DNA, combining enzymatic cleavage of genomic DNA with alkaline hydrolysis and subsequent 5´-end sequencing.
Established approaches to estimate the number of ribonucleotides present in a genome are limited to quantitating ribonucleotide incorporation on short synthetic DNA fragments or plasmid templates and extrapolating the results to the whole genome. Alternatively, the number of ribonucleotides present in a genome may be estimated using alkaline gels or Southern blots. More recent in vivo approaches employ Next-generation sequencing to map ribonucleotides genome-wide, providing the position and identity of embedded ribonucleotides. However, they do not quantitate the number of ribonucleotides incorporated into a genome. Here we describe how to simultaneously map and quantitate the ribonucleotides incorporated into human mitochondrial DNA in vivo by Next-generation sequencing. We use highly intact DNA, introduce sequence-specific double-strand breaks by digesting it with an endonuclease, and subsequently hydrolyze the DNA at incorporated ribonucleotides with alkali. The resulting ends are ligated to adapters and sequenced on a Next-generation sequencing machine. The absolute number of ribonucleotides can be calculated by dividing the number of reads outside the recognition sites of the sequence-specific endonuclease by the average number of reads per recognition site. This protocol may also be used to map and quantitate free nicks in DNA and can be adapted to map other DNA lesions that can be processed to 5´-OH or 5´-phosphate ends. Furthermore, the method can be applied to any organism for which a suitable reference genome is available. This protocol therefore provides an important tool to study DNA replication, 5´-end processing, DNA damage, and DNA repair.
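The quantitation rule stated above reduces to one division. The following Python sketch assumes that reads have already been aligned and their 5´-ends assigned either to endonuclease cut sites or to other positions, and that digestion is complete; the helper name `ribonucleotides_per_genome` and the counts are hypothetical, not the authors' pipeline or data.

```python
from statistics import mean


def ribonucleotides_per_genome(site_counts: list[int], other_reads: int) -> float:
    """Estimate 5'-ends (ribonucleotides plus free nicks) per genome copy.

    Hypothetical helper. site_counts holds the reads whose 5'-ends fall exactly
    on a cut site, one entry per site and strand; other_reads is the sum of all
    remaining reads on both strands. Assuming complete digestion, every genome
    copy contributes one end per site and strand, so the mean per-site count
    approximates the reads contributed by one genome copy.
    """
    per_site = mean(site_counts)
    if per_site == 0:
        raise ValueError("no reads at cut sites; digestion or library failed")
    return other_reads / per_site


# Illustrative numbers only, not measured data.
koh = ribonucleotides_per_genome([100, 120, 80, 100], other_reads=5_000)
kcl = ribonucleotides_per_genome([100, 120, 80, 100], other_reads=400)
print(koh, kcl)  # 50.0 4.0
```

Applying the same ratio to a KCl control library estimates free 5´-ends per genome, which can be compared with the KOH value as a background.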
In a eukaryotic cell, the concentration of ribonucleotides (rNTPs) is much higher than the concentration of deoxyribonucleotides (dNTPs)1. DNA polymerases discriminate against ribonucleotides, but this discrimination is not perfect; as a consequence, ribonucleotides may be incorporated into genomes in place of deoxyribonucleotides during DNA replication. Ribonucleotides may be the most common non-canonical nucleotides incorporated into the genome2. Most of these ribonucleotides are removed during Okazaki fragment maturation by RNase H2-initiated ribonucleotide excision repair (RER) or by Topoisomerase 1 (reviewed in reference3). Ribonucleotides that are not removed remain stably incorporated in the DNA2,4 and may affect it in both harmful and beneficial ways (reviewed in reference5). Besides acting as positive signals, for example in mating-type switching in Schizosaccharomyces pombe6 and in marking the nascent DNA strand during mismatch repair (MMR)7,8, ribonucleotides affect the structure9 and stability of the surrounding DNA due to the 2´-hydroxyl group of their ribose10, which can result in replication stress and genome instability11. The abundance of ribonucleotides in genomic DNA (gDNA), their relevance in replication and repair mechanisms, and their implications for genome stability motivate investigating their precise location and frequency genome-wide.
RNase H2 activity has not been found in human mitochondria, and ribonucleotides are therefore not efficiently removed from mitochondrial DNA (mtDNA). Several pathways are involved in supplying nucleotides to human mitochondria. To investigate whether disturbances in the mitochondrial nucleotide pool cause an elevated number of ribonucleotides in human mtDNA, we developed a protocol to map and quantitate these ribonucleotides in human mtDNA isolated from fibroblasts, HeLa cells, and patient cell lines12.
Most in vitro approaches (reviewed in reference13) to determine the selectivity of DNA polymerases against rNTPs are based on single ribonucleotide insertion or primer extension experiments in which competing rNTPs are included in the reaction mix, allowing the identification or relative quantitation of ribonucleotide incorporation in short DNA templates. Such quantitative approaches on short sequences may not reflect dNTP and rNTP pools at cellular concentrations; they therefore provide insight into polymerase selectivity but are of limited significance for whole genomes. The relative amount of ribonucleotides incorporated during replication of a longer DNA template, such as a plasmid, can be visualized on a sequencing gel using radiolabeled dNTPs and hydrolyzing the DNA in an alkaline milieu14. Furthermore, gDNA has been analyzed on Southern blots following alkaline hydrolysis, allowing strand-specific probing and determination of absolute rates of ribonucleotide incorporation in vivo15. These approaches allow a relative comparison of incorporation frequency but give no insight into the position or identity of the incorporated ribonucleotides. More recent approaches to analyze the ribonucleotide content of gDNA in vivo, such as HydEn-Seq16, Ribose-Seq17, Pu-Seq18, or emRiboSeq19, exploit the sensitivity of embedded ribonucleotides to alkaline hydrolysis or RNase H2 cleavage and employ Next-generation sequencing to identify ribonucleotides genome-wide. These methods do not, however, provide the absolute incorporation frequency of the detected ribonucleotides. By adding a step of sequence-specific enzymatic cleavage to the HydEn-seq protocol, the method we describe here extends the information gained from a sequencing approach, allowing simultaneous mapping and quantitation of embedded ribonucleotides12.
This method is applicable to virtually any organism, provided that highly intact DNA can be extracted and a suitable reference genome is available. The method could also be adapted to quantitate and locate any lesion that a nuclease can cleave to leave a 5´-phosphate or a 5´-OH end.
To map and quantitate ribonucleotides in genomic DNA, the method combines cleavage by a sequence-specific endonuclease, which generates 5´-phosphate ends at the endonuclease recognition sites, with alkaline hydrolysis, which generates 5´-OH ends on the nucleotides immediately 3´ of embedded ribonucleotides. Because all free ends are subsequently ligated to adapters and sequenced by Next-generation sequencing, it is essential to use highly intact DNA and to avoid random fragmentation during DNA extraction and library preparation. Normalizing the reads at ribonucleotides to the reads at endonuclease cleavage sites allows simultaneous quantitation and mapping of the detected ribonucleotides. Free 5´-ends are detected in control experiments in which alkaline hydrolysis is replaced by treatment with KCl. The resulting data reveal the location and number of ribonucleotides and allow analyses of ribonucleotide content and incorporation frequency.
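Assigning 5´-ends to cut sites requires knowing where the endonuclease leaves ends on each strand. HincII recognizes GTYRAC and cuts bluntly between Y and R. The Python sketch below assumes a linear reference string, 0-based coordinates, and the convention that each read's 5´-end is reported as a forward-strand coordinate plus a strand sign; the function names are hypothetical, and converting sequencer output to this convention depends on the library layout.

```python
import re

# HincII recognizes 5'-GTYRAC-3' (Y = C/T, R = A/G) and cuts bluntly: GTY^RAC.
# The zero-width lookahead also reports overlapping sites.
HINCII = re.compile(r"(?=GT[CT][AG]AC)")


def hincii_end_positions(seq: str) -> tuple[set[int], set[int]]:
    """0-based forward-strand coordinates of HincII-generated 5'-ends.

    Forward strand: the 5'-end is the first base after the cut (site start + 3).
    Reverse strand: the site is palindromic and the cut blunt, so the
    reverse-strand 5'-end lies opposite the last base before the cut
    (site start + 2). The sequence is treated as linear, so a site spanning
    the origin of a circular genome such as mtDNA is missed by this sketch.
    """
    fwd, rev = set(), set()
    for m in HINCII.finditer(seq.upper()):
        fwd.add(m.start() + 3)
        rev.add(m.start() + 2)
    return fwd, rev


def split_ends(ends, fwd_sites: set[int], rev_sites: set[int]) -> tuple[int, int]:
    """Count 5'-ends at HincII sites versus elsewhere.

    ends: iterable of (strand, position), strand '+' or '-', position in
    forward-strand coordinates using the convention of hincii_end_positions.
    """
    at_site = elsewhere = 0
    for strand, pos in ends:
        sites = fwd_sites if strand == "+" else rev_sites
        if pos in sites:
            at_site += 1
        else:
            elsewhere += 1
    return at_site, elsewhere


fwd, rev = hincii_end_positions("AAGTTAACAAGTCGACAA")
print(sorted(fwd), sorted(rev))  # [5, 13] [4, 12]
print(split_ends([("+", 5), ("-", 4), ("+", 7)], fwd, rev))  # (2, 1)
```

The fraction `at_site / (at_site + elsewhere)` corresponds to the share of 5´-ends at cut sites that serves as a digestion check in the Representative Results.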
This protocol is outlined in Figure 1 and comprises the isolation of gDNA, digestion with a restriction enzyme that provides the reference for quantitating ribonucleotides, treatment with alkali to hydrolyze the phosphodiester bonds at ribonucleotides incorporated into the gDNA, phosphorylation of free 5´-OH ends, ssDNA ligation of adapters, second-strand synthesis, and PCR amplification before sequencing.
1. Adapters and Index Primers
2. Growth and Harvest of Cells
3. DNA Purification and Quantitation
4. HincII Treatment and Alkaline Hydrolysis
5. 5´ End Phosphorylation
6. ssDNA Ligation
7. Second-strand Synthesis
8. PCR Amplification and Library Quantitation
9. Library Analysis and Pooling
10. Sequencing
11. Data Analysis
To illustrate the method, representative data were generated from human mitochondrial DNA isolated from HeLa cells12. Figure 2B shows the aggregated reads at all HincII sites in the heavy (HS) and light (LS) strands of human mtDNA after KCl treatment (left panels). Around 70% of all detected 5´-ends localize to the cut sites, demonstrating highly efficient HincII digestion. Treating libraries with KOH to hydrolyze the DNA at embedded ribonucleotides decreases the fraction of reads at HincII sites to about 40% (Figure 2B, right panels). This is expected because large numbers of 5´-ends are generated at sites of ribonucleotide incorporation, and it indicates sufficient library quality. Figure 2C shows the localization and frequency of free 5´-ends (green) after KCl treatment and of reads generated by HydEn-seq (magenta) after KOH treatment, which detect both free 5´-ends and ends generated by alkaline hydrolysis at ribonucleotides. Free 5´-ends and ribonucleotides on the HS of human mtDNA are shown in the left panel and those on the LS in the right panel. The raw reads at ribonucleotides (Figure 2D, upper panel) and at HincII sites (lower panel) show 14-fold and 31-fold higher coverage, respectively, of the LS relative to the HS, whereas no similar bias was observed for nuclear DNA. This strand bias may be explained by the distinct base composition of the two strands and illustrates the importance of normalizing to reads at HincII sites.
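Normalization has to be done strand by strand when coverage differs between strands. In the Python sketch below, the counts are hypothetical numbers chosen to mirror the 14-fold and 31-fold light-strand excess described above, and it is assumed that a coverage bias affects ends at ribonucleotides and at HincII sites on the same strand alike; `normalize_by_strand` is an illustrative name, not part of a published pipeline.

```python
def normalize_by_strand(rn_reads: dict[str, int], site_reads: dict[str, int],
                        n_sites: int) -> dict[str, float]:
    """Ribonucleotide reads on each strand per mean cut-site read of that strand.

    HincII sites are palindromic, so both strands carry the same number of
    sites and a single n_sites suffices. Under the stated assumption, dividing
    by the strand's own site coverage cancels a strand-specific coverage bias.
    """
    return {s: rn_reads[s] / (site_reads[s] / n_sites) for s in rn_reads}


# Hypothetical counts (not measured data) with a light-strand excess.
rn = {"H": 1_000, "L": 14_000}
sites = {"H": 200, "L": 6_200}
norm = normalize_by_strand(rn, sites, n_sites=10)
print(rn["L"] / rn["H"], sites["L"] / sites["H"])  # raw LS/HS ratios: 14.0 31.0
print(norm)
```

In this toy case the raw 14-fold light-strand excess disappears after normalization, illustrating why raw read counts alone cannot be compared between strands.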
Normalizing read counts to the reads at HincII sites gives a quantitative measure of the number of ribonucleotides per mitochondrial genome (Figure 3A). As illustrated in Figure 3B, the reads for each ribonucleotide after KOH treatment, normalized to the base composition of each strand, give ratios different from 1, indicating a non-random distribution of reads; this suggests a distinct ribonucleotide pattern and high library quality. The ratio is unaffected by prior digestion with HincII, verifying the cleavage specificity of the enzyme. Normalizing the reads at embedded ribonucleotides both to those at HincII cleavage sites and to the nucleotide content of the genome yields a quantitative measure of how many of each ribonucleotide are incorporated per 1,000 complementary bases (Figure 3C).
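The two composition-based measures can be sketched as follows. This Python sketch assumes that "per 1,000 complementary bases" means per 1,000 positions of the same base on the strand carrying the ribonucleotide (equivalently, per 1,000 complementary bases on the template strand) and that `rn_counts` already holds the reads assigned to each ribonucleotide identity on one strand; the function names and toy data are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def composition_ratio(rn_counts: dict[str, int], strand_seq: str) -> dict[str, float]:
    """Share of each ribonucleotide identity divided by that base's share of the strand.

    A ratio of 1 for every base is what a random, composition-driven
    distribution of ribonucleotides would give.
    """
    comp = Counter(strand_seq.upper())
    total_rn = sum(rn_counts[b] for b in BASES)
    total_nt = sum(comp[b] for b in BASES)
    return {b: (rn_counts[b] / total_rn) / (comp[b] / total_nt) for b in BASES}


def per_1000_bases(rn_counts: dict[str, int], mean_site_reads: float,
                   strand_seq: str) -> dict[str, float]:
    """Ribonucleotides of each identity per genome, per 1,000 positions of that base.

    Reads are converted to events per genome copy by dividing by the mean reads
    per HincII site, then scaled by how often the base occurs on the strand
    (assumed meaning of "complementary bases").
    """
    comp = Counter(strand_seq.upper())
    return {b: 1000 * (rn_counts[b] / mean_site_reads) / comp[b] for b in BASES}


# Toy strand with equal base composition; counts are illustrative only.
strand = "A" * 250 + "C" * 250 + "G" * 250 + "T" * 250
counts = {"A": 10, "C": 10, "G": 10, "T": 10}
print(composition_ratio(counts, strand))  # every ratio is 1.0
print(per_1000_bases(counts, mean_site_reads=5, strand_seq=strand))  # 8.0 for each base
```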
Figure 1: Schematic for DNA Processing and Library Preparation. (1) Genomic DNA is cleaved by HincII, generating blunt ends at HincII sites (black arrowhead) that serve for normalization in the subsequent quantitation of ribonucleotides. (2) The DNA is treated with KOH to hydrolyze it at ribonucleotide sites, leaving a 2´,3´-cyclic phosphate (red pentagon) on the 3´-ends and free 5´-OH ends. (3) The 5´-OH ends are phosphorylated by T4 Polynucleotide Kinase (3´-phosphatase minus). (4) All 5´-ends carrying a phosphate group are ligated to the ARC140 oligonucleotide by T4 RNA ligase. (5) The second strand is synthesized by T7 DNA Polymerase using the ARC76-77 oligonucleotides containing a random N6 sequence. (6) The library is amplified by a high-fidelity DNA Polymerase using ARC49 and one of the ARC78 to ARC107 index primers, each containing a unique barcode for multiplexing. (7) 5´-ends are located by paired-end sequencing.
Figure 2: Method validation. (A) Representative electropherograms generated with an automated electrophoresis system to assess the quality of libraries treated with KOH or KCl. (B) Aggregated signal at HincII sites in the heavy (HS) and light (LS) strands of human mtDNA after KCl (left panels) or KOH (right panels) treatment. (C) Circos plot of free 5´-ends (green) and of HydEn-Seq reads (free 5´-ends and ribonucleotides, magenta) on the HS (left panel) and LS (right panel) of human mtDNA. Peaks are normalized to reads per million, and the maximum peak is scaled to the maximum number of reads in the HydEn-seq library. (D) Aggregated raw reads at ribonucleotides (upper panel) and HincII sites (lower panel) on the heavy (H) and light (L) strands of human mtDNA (Mito.) or on the reverse (RV) and forward (FW) strands of nuclear (Nuc.) DNA. Panels B, C, and D are adapted from reference12. Error bars represent the standard error of the mean.
Figure 3: Representative Results. (A) Relative number of ribonucleotides normalized to reads at HincII sites for KOH-treated libraries on the heavy (H) or light (L) strand. (B) Ratio of ribonucleotide identity to mtDNA genome composition for KOH-treated (KOH) and HincII-cleaved, KOH-treated (HincII+KOH) libraries on the heavy (H) or light (L) strand of mtDNA. (C) Ribonucleotide frequency per 1,000 complementary bases for HincII- and KOH-treated libraries on the heavy (H) or light (L) strand of mtDNA. Figures are adapted from reference12. Error bars represent the standard error of the mean.
Name | Sequence | |||
ARC49 | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT | |||
ARC76 | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNN*N*N | |||
ARC77 | AGATCGGAAGAGCACACGTCTGAACTCCAGTC*A*C | |||
ARC78 | CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC84 | CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC85 | CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC86 | CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC87 | CAAGCAGAAGACGGCATACGAGATCACTGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC88 | CAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC89 | CAAGCAGAAGACGGCATACGAGATGATCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC90 | CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC91 | CAAGCAGAAGACGGCATACGAGATCTGATCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC93 | CAAGCAGAAGACGGCATACGAGATAAGCTAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC94 | CAAGCAGAAGACGGCATACGAGATGTAGCCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC95 | CAAGCAGAAGACGGCATACGAGATTACAAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC96 | CAAGCAGAAGACGGCATACGAGATTGTTGACTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC97 | CAAGCAGAAGACGGCATACGAGATACGGAACTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC98 | CAAGCAGAAGACGGCATACGAGATTCTGACATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC99 | CAAGCAGAAGACGGCATACGAGATCGGGACGGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC100 | CAAGCAGAAGACGGCATACGAGATGTGCGGACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC101 | CAAGCAGAAGACGGCATACGAGATCGTTTCACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC102 | CAAGCAGAAGACGGCATACGAGATAAGGCCACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC103 | CAAGCAGAAGACGGCATACGAGATTCCGAAACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC104 | CAAGCAGAAGACGGCATACGAGATTACGTACGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC105 | CAAGCAGAAGACGGCATACGAGATATCCACTCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC106 | CAAGCAGAAGACGGCATACGAGATATATCAGTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC107 | CAAGCAGAAGACGGCATACGAGATAAAGGAATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT | |||
ARC140 | /5AmMC6/ACACTCTTTCCCTACACGACGCTCTTCCGATCT |
Table 1: Oligonucleotides. Listed are the oligonucleotides used for HydEn-Seq. In the index primers ARC78 to ARC107, the index (barcode) is the 6- or 8-nt segment between the constant sequences CAAGCAGAAGACGGCATACGAGAT and GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT. * indicates a phosphorothioate bond. ARC140 carries a 5´-amino modifier on a C6 linker (/5AmMC6/) instead of a 5´-OH group; this modification reduces the formation of ARC140 concatemers during ligation.
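The index of each primer in Table 1 can also be recovered programmatically from its constant flanks, which is convenient when building a demultiplexing sample sheet. The Python sketch below reads the flanking sequences off Table 1; the function names are hypothetical. On Illumina instruments the i7 index read typically reports the reverse complement of the index as written in the primer (for example, ATCACG for ARC78), so check which orientation a given sample sheet expects.

```python
# Constant segments shared by the index primers ARC78 to ARC107 (Table 1).
P7_HEAD = "CAAGCAGAAGACGGCATACGAGAT"
P7_TAIL = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extract_index(primer: str) -> str:
    """Return the index segment between the constant head and tail of a primer."""
    if not (primer.startswith(P7_HEAD) and primer.endswith(P7_TAIL)):
        raise ValueError("primer does not carry the expected constant segments")
    return primer[len(P7_HEAD):len(primer) - len(P7_TAIL)]


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


primers = {
    "ARC78": "CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
    "ARC96": "CAAGCAGAAGACGGCATACGAGATTGTTGACTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
}
indices = {name: extract_index(seq) for name, seq in primers.items()}
print(indices)  # {'ARC78': 'CGTGAT', 'ARC96': 'TGTTGACT'}
print({name: revcomp(i) for name, i in indices.items()})
```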
Here we present a technique to simultaneously map and quantify ribonucleotides in gDNA, and in mtDNA in particular, by adding DNA cleavage at sequence-specific sites in the genome to the established HydEn-seq protocol. Although this study focuses on human mtDNA, the HydEn-seq method was originally developed in Saccharomyces cerevisiae, illustrating that the approach can be transferred to other organisms12,16.
To obtain reliable results with this approach, note the following critical steps: (A) Because sequencing adapters ligate to all available 5´-ends, it is crucial to work with highly intact DNA. Libraries should preferably be prepared immediately after DNA isolation; otherwise, the DNA can be stored at -20 °C. Avoid prolonged storage of DNA in the refrigerator and repeated freeze-thaw cycles. (B) To generate suitable libraries, it is crucial to perform the KOH treatment in an incubation oven rather than a heating block, which ensures homogeneous heating of the whole sample and quantitative hydrolysis. (C) It is also critical to check library quality before pooling and sequencing. The DNA should be quantified and analyzed using an automated electrophoresis system to ensure adequate amounts of library DNA, confirm appropriate fragment sizes, and check for primer dimers.
For meaningful data analysis, note also that the informative value of this method depends on appropriate controls to assess background counts and sequence or strand biases. In KCl-treated samples digested only with the sequence-specific endonuclease, we routinely find close to 70% of the mapped 5´-ends at the cut sites (Figure 2B, left panels). In addition, it is important to confirm that endonuclease treatment does not affect the overall detection of incorporated ribonucleotides by comparing HincII-treated and untreated samples (Figure 3B). In these experiments, we used HincII to introduce site-specific cuts, but other high-fidelity restriction enzymes could also be used.
The protocol could be adapted to study other types of DNA lesions that can be processed to 5´-phosphate or 5´-OH ends. The accuracy of the results depends on the specificity of this processing and requires suitable controls (e.g., wild type or untreated samples) for verification. Moreover, when adapting this method to other applications or organisms, note that in its current setup the method requires about 1 µg of input DNA per library. Because the number of ends depends on the number of embedded ribonucleotides, which varies between organisms and mutants, samples with fewer ribonucleotides would require more input DNA to generate enough ends for library construction. Conversely, samples with many more ribonucleotides would require less input DNA to maintain optimal conditions for ligation, second-strand synthesis, and PCR amplification. Notably, the libraries constructed as described in this protocol also covered the nuclear genome (Figure 2D); only the data analysis was restricted to mtDNA. This illustrates that the method also captures larger genomes with moderately lower ribonucleotide frequencies.
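The scaling of input DNA with ribonucleotide frequency can be made concrete with a rough calculation. The Python sketch below assumes an average mass of 650 g/mol per base pair, a uniform ribonucleotide frequency, and that every ribonucleotide yields one ligatable end; it ignores ligation and PCR capacity and, for mtDNA, the mitochondrial fraction of total DNA. The function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # 1/mol
BP_MASS = 650.0           # g/mol per base pair; a common approximation (assumption)


def expected_ends(input_ug: float, rn_per_kb: float) -> float:
    """Ribonucleotide-derived 5'-ends expected from a given DNA mass.

    rn_per_kb: ribonucleotides per 1,000 bp of duplex DNA, assumed uniform.
    """
    base_pairs = input_ug * 1e-6 / BP_MASS * AVOGADRO
    return base_pairs * rn_per_kb / 1000


def input_for_same_ends(ref_input_ug: float, ref_rn_per_kb: float,
                        new_rn_per_kb: float) -> float:
    """DNA mass that yields as many ends as the reference condition."""
    return ref_input_ug * ref_rn_per_kb / new_rn_per_kb


print(f"{expected_ends(1.0, 1.0):.3g}")     # 9.26e+11 ends from 1 ug at 1 rN per kb
print(input_for_same_ends(1.0, 1.0, 0.1))  # 10.0 ug for a 10-fold lower frequency
```

Because expected ends scale linearly with both mass and frequency, a 10-fold lower ribonucleotide frequency calls for roughly 10-fold more input to yield a similar number of ends, within the practical limits of the downstream reactions.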
Certain limitations of this method should be taken into account. Although the method should, in theory, be applicable to virtually any organism, a suitable reference genome is necessary for aligning the reads. Furthermore, the results represent reads pooled from a large number of cells, so ribonucleotide incorporation patterns specific to a subset of cells cannot be identified. When ribonucleotides are mapped in larger genomes with very few ribonucleotides, it may be challenging to discriminate them from random nicks, and appropriate controls are therefore needed.
The method we describe here extends the available in vivo techniques such as HydEn-Seq16, Ribose-Seq17, Pu-Seq18, or emRiboSeq19. These approaches exploit the sensitivity of embedded ribonucleotides to alkaline hydrolysis or RNase H2 cleavage and use Next-generation sequencing to identify ribonucleotides genome-wide, which allows their mapping and a comparison of relative incorporation. By additionally cleaving the DNA at specific sequences, as described above, the reads at ribonucleotides can be normalized to the reads at those cleavage sites, which allows not only the identification and mapping of ribonucleotides but also their quantitation per genome copy. Applying our technique in the context of diseases related to DNA replication, DNA repair, and translesion synthesis (TLS) could provide a deeper understanding of the role of ribonucleotides in the underlying molecular mechanisms and in genome integrity in general.
The authors have nothing to disclose.
This study was supported by grants from the Swedish Research Council (www.vr.se) to ARC (2014-6466) and from the Swedish Foundation for Strategic Research (www.stratresearch.se) to ARC (ICA14-0060). Chalmers University of Technology provided financial support to MKME during this work. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
10x T4 Polynucleotide Kinase Reaction Buffer | New England Biolabs | B0201S | |
10x T4 RNA Ligase Reaction Buffer | New England Biolabs | B0216L | |
1x PBS | Medicago | 09-9400-100 | dissolve 1 tablet in H2O to a final volume of 1 L |
2-Propanol | Sigma-Aldrich | 33539-1L-GL-R | |
2100 Bioanalyzer | Agilent Technologies | G2940CA | |
50 mL Centrifuge Tube | VWR | 525-0610 | |
Adenosine 5'-Triphosphate (ATP, 10 mM) | New England Biolabs | P0756S | dilute with EB to 2 mM |
Agilent DNA 1000 Kit | Agilent Technologies | 5067-1504 | |
BSA, Molecular Biology Grade (20 mg/mL) | New England Biolabs | B9000S | dilute with nuclease-free H2O to 1 mg/mL |
Buffer EB | QIAGEN | 19086 | referred to as EB |
CleanPCR paramagnetic beads | CleanNA | CPCR-0050 | |
Deoxynucleotide (dNTP) Solution Mix (10 mM each) | New England Biolabs | N0447L | dilute with EB to 2 mM |
DMEM, high glucose, GlutaMAX Supplement | Gibco | 61965026 | |
DynaMag 96 Side | Thermo Fisher | 12331D | |
Ethanol 99.5% analytical grade | Solveco | 1395 | dilute with milliQ water to 70% |
Ethylenediaminetetraacetic acid solution (EDTA, 0.5 M) | Sigma-Aldrich | 03690-100ML | |
Fetal bovine serum | Gibco | 10500056 | |
HEPES buffer pH 8.0 (1 M) sterile BC | AppliChem | A6906,0125 | |
Hexammine cobalt(III) chloride (CoCl3(NH3)6) | Sigma-Aldrich | H7891-5G | dissolve in nuclease-free H2O for 10 M solution, sterile filter. CAUTION: carcinogenic, sensitizing and hazardous to aquatic environment. |
HincII | New England Biolabs | R0103S | supplied with NEBuffer 3.1 |
Hybridiser HB-1D | Techne | FHB4DD | |
KAPA HiFi HotStart ReadyMix (2X) | Kapa Biosystems | KK2602 | |
Lysis buffer | | | 50 mM EDTA, 20 mM HEPES, 75 mM NaCl, 200 µg/mL Proteinase K, 1% SDS |
Micro tube 1.5 mL | Sarstedt | 72.690.001 | |
Microcentrifuge 5424R | Eppendorf | 5404000014 | |
Microcentrifuge MiniStar silverline | VWR | 521-2844 | |
Multiply µStripPro 0.2 mL tube | Sarstedt | 72.991.992 | |
Nuclease-free water | Ambion | AM9937 | |
Phenol – chloroform – isoamyl alcohol (25:24:1) | Sigma-Aldrich | 77617-500ML | |
Potassium chloride (KCl) | VWR | 26764.232 | dissolve in nuclease-free H2O for 3 M solution, sterile filter |
Potassium hydroxide (KOH) | VWR | 26668.296 | dissolve in nuclease-free H2O for 3 M solution, sterile filter |
Proteinase K | Ambion | AM2546 | |
Qubit 3.0 Fluorometer | Invitrogen | Q33216 | |
Qubit Assay Tubes | Invitrogen | Q32856 | |
Qubit dsDNA BR Assay Kit | Invitrogen | Q32850 | CAUTION: Contains flammable and toxic components |
Qubit dsDNA HS Assay Kit | Invitrogen | Q32851 | CAUTION: Contains flammable and toxic components |
Refrigerated Centrifuge 4K15 | Sigma Laboratory Centrifuges | No. 10740 | |
SDS Solution, 10% | Invitrogen | 15553-035 | |
Sodium acetate buffer solution, pH 5.2, 3 M (NaAc) | Sigma-Aldrich | S7899 | |
Sodium chloride (NaCl) | VWR | 27810.295 | dissolve in nuclease-free H2O for 5 M solution, sterile filter |
T100 Thermal Cycler | Bio-Rad | 1861096 | |
T4 Polynucleotide Kinase (3' phosphatase minus) | New England Biolabs | M0236L | |
T4 RNA Ligase 1 (ssRNA Ligase) | New England Biolabs | M0204L | supplied with PEG 8000 (50%) |
T7 DNA Polymerase (unmodified) | New England Biolabs | M0274S | supplied with 10x T7 DNA Polymerase Reaction Buffer |
TE Buffer | Invitrogen | 12090015 | |
ThermoMixer F2.0 | Eppendorf | 5387000013 |