We describe chromatin endogenous cleavage coupled with high-throughput sequencing (ChEC-seq), a chromatin immunoprecipitation (ChIP)-orthogonal method for mapping protein binding sites genome-wide with micrococcal nuclease (MNase) fusion proteins.
Genome-wide mapping of protein-DNA interactions is critical for understanding gene regulation, chromatin remodeling, and other chromatin-resident processes. Formaldehyde crosslinking followed by chromatin immunoprecipitation and high-throughput sequencing (X-ChIP-seq) has been used to gain many valuable insights into genome biology. However, X-ChIP-seq has notable limitations linked to crosslinking and sonication. Native ChIP avoids these drawbacks by omitting crosslinking, but often results in poor recovery of chromatin-bound proteins. In addition, all ChIP-based methods are subject to antibody quality considerations. Enzymatic methods, in which a protein of interest is fused to a DNA-modifying enzyme, provide an alternative means of mapping protein-DNA interactions. We recently combined one such method, chromatin endogenous cleavage (ChEC), with high-throughput sequencing as ChEC-seq. ChEC-seq relies on fusion of a chromatin-associated protein of interest to micrococcal nuclease (MNase); the fusion is expressed in living cells, and targeted DNA cleavage is triggered by the addition of calcium after cell permeabilization. ChEC-seq is not based on immunoprecipitation and so circumvents potential concerns with crosslinking, sonication, chromatin solubilization, and antibody quality while providing high-resolution mapping with minimal background signal. We envision that ChEC-seq will be a powerful counterpart to ChIP, providing an independent means by which to both validate ChIP-seq findings and discover new insights into genomic regulation.
Mapping the binding sites of transcription factors (TFs), chromatin remodelers, and other chromatin-associated regulatory factors is key to understanding all chromatin-based processes. While chromatin immunoprecipitation and high-throughput sequencing (ChIP-seq) approaches have been used to gain many important insights into genome biology, they have notable limitations. We recently introduced an alternative method, termed chromatin endogenous cleavage and high-throughput sequencing (ChEC-seq)1, to circumvent these drawbacks.
ChIP-seq is most often performed with an initial formaldehyde crosslinking step (X-ChIP-seq) to preserve protein-DNA interactions. However, a number of recent studies have indicated that X-ChIP-seq captures transient or nonspecific protein-DNA interactions2,3,4,5,6,7,8, giving rise to false positive binding sites. In addition, sonication, commonly used to fragment chromatin in X-ChIP-seq experiments, preferentially shears regions of open chromatin, leading to biased recovery of fragments from these regions9,10. Sonication also yields a heterogeneous mixture of fragment lengths, ultimately limiting binding site resolution, though the addition of an exonuclease digestion step can greatly improve resolution11,12. Native ChIP methods such as occupied regions of genomes from affinity-purified naturally isolated chromatin (ORGANIC)13 do not use crosslinking and fragment chromatin with micrococcal nuclease (MNase), alleviating potential biases associated with formaldehyde crosslinking and sonication. However, the solubility of many chromatin-bound proteins under the relatively mild conditions required for native chromatin extraction is poor, potentially leading to reduced dynamic range and/or false negatives14.
While various iterations of ChIP-seq are most commonly used for genome-wide mapping of protein-DNA interactions, several mapping techniques based on fusion of proteins of interest to various DNA-modifying enzymes have also been implemented. One such approach is DNA adenine methyltransferase identification (DamID)15, wherein a chromatin-binding protein of interest is genetically fused to Dam and this fusion is expressed in cells or animals, resulting in methylation of GATC sequences proximal to binding sites for the protein. DamID is advantageous in that it does not rely upon immunoprecipitation and so avoids crosslinking, antibodies, and chromatin solubilization. It is also performed in vivo. However, the resolution of DamID is limited to the kilobase scale, and the methylating activity of the Dam fusion protein is constitutive. A second method based on enzymatic fusion is Calling Card-seq16, which employs fusion of a factor of interest to a transposase, directing site-specific integration of transposons. Like DamID, Calling Card-seq is not based on immunoprecipitation and thus has similar advantages, with the added benefit of increased resolution. However, Calling Card-seq may be limited by sequence biases of transposases and is also reliant on the presence of restriction sites close to transposon insertion sites.
A third enzymatic fusion method, developed in the Laemmli lab, is chromatin endogenous cleavage (ChEC)17. In ChEC, a fusion between a chromatin-associated protein and MNase is expressed in cells, and upon calcium addition to activate MNase, DNA is cleaved proximal to binding sites for the tagged factor (Figure 1). In conjunction with Southern blotting, ChEC has been used to characterize chromatin structure and protein binding at a number of individual loci in yeast17,18, and has been combined with low-resolution microarray analysis to probe the interaction of nuclear pore components with the yeast genome19. ChEC offers benefits similar to those of DamID and Calling Card-seq, and its resolution approaches the single-base-pair level when cleavage sites are analyzed by primer extension19. ChEC is also controllable: robust DNA cleavage by MNase requires millimolar calcium, so MNase remains inactive at the low free calcium concentrations observed in live yeast cells20.
Previously, we postulated that combining ChEC with high-throughput sequencing (ChEC-seq) would provide high-resolution maps of TF binding sites. Indeed, ChEC-seq generated high-resolution maps of the budding yeast general regulatory factors (GRFs) Abf1, Rap1, and Reb1 across the genome1. We have also successfully applied ChEC-seq to the modular Mediator complex, a conserved, essential global transcriptional coactivator21, expanding the applicability of ChEC-seq to megadalton-size complexes that do not directly contact DNA and may be difficult to map by ChIP-based methods. ChEC-seq is a powerful method both for independent validation of ChIP-seq results and generation of new insights into the regulation of chromatin-resident processes. Here, we present a step-by-step protocol for the implementation of this method in budding yeast.
1. Generation of Yeast Strains
2. ChEC
3. Size Selection
NOTE: The goal of size selection is to remove multi-kilobase fragments of genomic DNA from the sample to be sequenced and enrich fragments of ~150 bp (approximately the size of nucleosomal DNA) or smaller. In sequencing data, fragments <400 bp are enriched, with a notable peak around the size of nucleosomal DNA (~150 bp) and a peak or broad distribution of subnucleosomal fragments.
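To check whether a sequenced library shows the size profile described in this note, fragment lengths can be summarized against the <400 bp cutoff and split into nucleosomal and subnucleosomal classes. The following is a minimal sketch, not part of the published protocol: the helper name `summarize_fragment_sizes` is hypothetical, and the 120-180 bp "nucleosomal" window and 10 bp binning are illustrative assumptions. Lengths are assumed to be supplied as a plain list of integers (for example, insert sizes of properly paired reads).

```python
from collections import Counter


def summarize_fragment_sizes(lengths, max_len=400, nuc_window=(120, 180)):
    """Summarize paired-end fragment lengths (bp) from a ChEC-seq library.

    max_len mirrors the <400 bp enrichment described in the protocol note;
    the nucleosomal window (120-180 bp) is an illustrative assumption.
    """
    if not lengths:
        raise ValueError("no fragment lengths supplied")
    short = [x for x in lengths if x < max_len]
    lo, hi = nuc_window
    # Bin short fragments into 10 bp bins to find the most populated size class
    bins = Counter((x // 10) * 10 for x in short)
    return {
        "fraction_below_max": len(short) / len(lengths),
        "nucleosomal": sum(1 for x in short if lo <= x <= hi),
        "subnucleosomal": sum(1 for x in short if x < lo),
        "mode_bin_bp": bins.most_common(1)[0][0] if bins else None,
    }


if __name__ == "__main__":
    # Toy lengths only, not real data
    demo = [150, 152, 148, 60, 70, 80, 155, 2000, 5000, 150]
    print(summarize_fragment_sizes(demo))
```

A library matching the note would show most fragments below 400 bp, a mode near the nucleosomal size, and a substantial subnucleosomal count.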
In a successful ChEC experiment, agarose gel electrophoresis of the DNA will reveal a calcium-dependent increase in DNA fragmentation over time, visible as smearing and eventual complete digestion of genomic DNA. In some cases, a ladder of bands similar to that seen with a traditional MNase digestion is observed after extended digestion. This is the case for ChEC analysis of Reb1, a general regulatory factor that binds nucleosome-depleted regions (NDRs) (Figure 2). We have found that, for GRFs, extended digestion leads to loss of signal at rapidly cleaved binding sites1. Ideally, a sample in which the genomic DNA band is reduced in size and displays high-molecular-weight smearing, such as the 30 s and 1 min time points for Reb1 ChEC, should be used for sequencing to avoid time-dependent loss of signal.
Visualization of fragment ends from a Reb1 ChEC-seq experiment reveals robust enrichment of Reb1 over background, assessed both with the Reb1-MNase strain without added calcium and with the free MNase strain after the same duration of calcium treatment (30 s) (Figure 3). Similarly, 1 min after calcium addition, we observe substantial DNA cleavage by a fusion of the Mediator subunit Med8 with MNase, but not by free MNase driven by the MED8 promoter (Figure 3).
To compare Reb1 ChEC-seq data to ChIP-seq, we obtained a list of 1,991 Reb1 peaks determined by ORGANIC13. We centered these peaks on their Reb1 motifs and calculated the average fragment end count at each base position in a 100 bp window around the motif midpoint. We observed a striking asymmetry in cleavage, with the majority of fragment ends mapping to the upstream side of the motif (Figure 4).
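The motif-centered average described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function name `motif_centered_average` is hypothetical, fragment-end counts are assumed to be held as one per-base list per chromosome, and each motif is assumed to be given as `(chrom, midpoint, strand)` so that minus-strand motifs can be flipped and "upstream" always falls at negative offsets.

```python
def motif_centered_average(end_counts, motifs, half_window=50):
    """Average fragment-end count at each offset from motif midpoints.

    end_counts: dict mapping chromosome -> list of per-base fragment-end
        counts (0-based positions).
    motifs: iterable of (chrom, midpoint, strand) tuples. Minus-strand motifs
        are flipped so that negative offsets are always upstream of the motif.
    Returns 2 * half_window + 1 averages (offsets -50..+50 by default,
    i.e. a 100 bp window around the midpoint).
    """
    width = 2 * half_window + 1
    totals = [0.0] * width
    used = 0
    for chrom, mid, strand in motifs:
        track = end_counts[chrom]
        # Skip motifs whose window would run off the chromosome
        if mid - half_window < 0 or mid + half_window >= len(track):
            continue
        for i in range(width):
            offset = i - half_window
            pos = mid + offset if strand == "+" else mid - offset
            totals[i] += track[pos]
        used += 1
    if used == 0:
        raise ValueError("no motif windows fit inside the supplied tracks")
    return [t / used for t in totals]
```

Plotting the returned list against offsets from -50 to +50 gives an average plot of the kind shown in Figure 4; an excess of signal at negative offsets corresponds to upstream cleavage.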
Figure 1: Schematic of the ChEC-seq Method. A chromatin-associated protein (CAP)-MNase fusion is expressed in yeast cells. The protein binds to DNA but does not generate cleavage above background levels due to the very low free calcium in the nucleus. Upon permeabilization of cells with digitonin and addition of millimolar calcium, CAP-MNase fusions bound to the genome cleave DNA, releasing small fragments. These fragments are then purified, sequenced, and mapped back to the genome, giving peaks of fragment ends proximal to binding sites for the CAP-MNase fusion.
Figure 2: Agarose Gel Electrophoresis of DNA from a Reb1 ChEC Experiment. A 5 µL aliquot of DNA from each ChEC time point was analyzed on a 1.5% TAE-agarose gel prior to size selection. This shows progressive digestion of genomic DNA by the Reb1-MNase fusion.
Figure 3: Genome Browser Snapshots of Reb1-MNase and Free MNase ChEC-seq Experiments. IGV views of fragment end signal for Reb1 and free MNase ChEC-seq 30 s after calcium addition and Med8 and free MNase ChEC-seq 1 min after calcium addition along a representative segment of the yeast genome. Datasets were normalized by dividing the number of fragment ends mapped to each base position by the total number of fragment ends mapped and multiplying by the total number of bases mapped.
Figure 4: Enrichment of Reb1-MNase-released Fragment Ends around Reb1 ORGANIC Sites. Average plot of 30 s Reb1 and free MNase ChEC-seq fragment end signal around 1,991 Reb1 motifs determined by ORGANIC13. Data were normalized as in Figure 3.
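The normalization in the Figure 3 legend divides each per-base count by the dataset's total fragment ends and multiplies by a fixed scale, so that datasets sequenced to different depths become comparable. A minimal sketch follows; the function name `normalize_fragment_ends` is hypothetical, and the scale factor (the legend's "total number of bases mapped") is taken as an input rather than computed, since its exact derivation is not spelled out here.

```python
def normalize_fragment_ends(tracks, scale):
    """Scale per-base fragment-end counts as described in the Figure 3 legend.

    tracks: dict mapping chromosome -> list of fragment ends per base.
    scale: multiplier supplied by the caller (the legend's 'total number of
        bases mapped'); it is not derived here.
    Each value becomes count / (total fragment ends in the dataset) * scale.
    """
    total = sum(sum(track) for track in tracks.values())
    if total == 0:
        raise ValueError("dataset contains no fragment ends")
    return {chrom: [c / total * scale for c in track]
            for chrom, track in tracks.items()}


if __name__ == "__main__":
    shallow = {"chrI": [1, 3], "chrII": [0, 4]}
    deep = {"chrI": [2, 6], "chrII": [0, 8]}  # same profile at twice the depth
    print(normalize_fragment_ends(shallow, 16))
    print(normalize_fragment_ends(deep, 16))
```

Because the values in each normalized dataset sum to the same scale, the two toy tracks, which differ only in sequencing depth, yield identical normalized values.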
Plasmid | Yeast selectable marker | Notes | Addgene plasmid number |
pGZ108 | kanMX6 | 3xFLAG-MNase tagging, 33 aa linker | 70231 |
pGZ109 | HIS3MX6 | 3xFLAG-MNase tagging, 33 aa linker | 70232 |
pGZ110 | TRP1 | 3xFLAG-MNase tagging, 33 aa linker | 70233 |
pGZ136 | URA3 | Expresses 3xFLAG-MNase-SV40 NLS under the control of the REB1 promoter | 72273 |
pGZ173 | kanMX6 | MNase tagging, 8 aa linker | 70234 |
Table 1: Details of ChEC Plasmids. All tagging vectors are based on pFA6a vectors and so are compatible with the commonly used F2/R1 tagging primer pairs. The tagging cassette consists of a linker of the indicated length, a 3xFLAG epitope to facilitate detection by western blotting (except in the case of pGZ173, where the linker has been shortened to remove the 3xFLAG tag), the mature chain of MNase (GenBank P00644, aa 83 – 231), and the indicated selectable marker.
Reagent | Volume | [Final] |
5x PCR buffer | 10 µL | 1x (2 mM MgCl2) |
10 mM dNTP mix (2.5 mM each dNTP) | 1 µL | 200 µM (50 µM each dNTP) |
10 µM F2 primer | 2.5 µL | 0.5 µM |
10 µM R1 primer | 2.5 µL | 0.5 µM |
pGZ108/109/110/172 (1-5 ng/µL) | 1 µL | |
2 U/µL hot start high-fidelity polymerase | 0.5 µL | 1 U |
ddH2O | 32.5 µL | |
Table 2: Reaction Mixture for PCR Amplification of 3xFLAG-MNase Tagging Cassettes. F2/R1 primer sequences can be found at http://yeastgfp.yeastgenome.org/yeastGFPOligoSequence.txt.
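The final concentrations in Table 2 follow from C_final = C_stock × V_added / V_total. The sketch below checks that arithmetic and computes the water volume. It treats the listed volumes as microliters (they sum to 50, consistent with a 50 µL reaction), and the helper names and unit labels (buffer in "x", dNTPs per nucleotide in µM, primers in µM) are assumptions for illustration.

```python
def final_concentration(stock, volume_ul, total_ul):
    """C_final = C_stock * V_added / V_total (C1V1 = C2V2)."""
    return stock * volume_ul / total_ul


def water_to_add(volumes_ul, total_ul):
    """ddH2O needed to bring the listed components up to total_ul."""
    water = total_ul - sum(volumes_ul)
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    return water


TOTAL_UL = 50.0
# Table 2 components as (stock, volume in uL). Units are assumptions:
# buffer in 'x', dNTPs per nucleotide in uM, primers in uM.
COMPONENTS = {
    "buffer_x": (5.0, 10.0),
    "dntp_each_uM": (2500.0, 1.0),
    "f2_primer_uM": (10.0, 2.5),
    "r1_primer_uM": (10.0, 2.5),
}
TEMPLATE_UL = 1.0
POLYMERASE_UL, POLYMERASE_U_PER_UL = 0.5, 2.0

finals = {name: final_concentration(s, v, TOTAL_UL)
          for name, (s, v) in COMPONENTS.items()}
water_ul = water_to_add(
    [v for _, v in COMPONENTS.values()] + [TEMPLATE_UL, POLYMERASE_UL], TOTAL_UL
)
polymerase_units = POLYMERASE_U_PER_UL * POLYMERASE_UL

if __name__ == "__main__":
    print(finals, water_ul, polymerase_units)
```

The same two functions apply to the colony PCR mixture in Table 4, whose listed volumes also sum to 50 and leave 41.5 for water.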
Temperature (°C) | Time | Cycles |
98 | 30 s | 1 |
98 | 10 s | 25 |
55 | 30 s | 25 |
72 | 1 min 15 s | 25 |
72 | 2 min | 1 |
10 | Forever | Hold |
Table 3: Thermal Cycling Conditions for PCR Amplification of 3xFLAG-MNase Tagging Cassettes. Incubation temperature and extension time may need to be adjusted based on the DNA polymerase used.
Reagent | Volume | [Final] |
10x Taq buffer | 5 µL | 1x (1.5 mM MgCl2) |
10 mM dNTP mix (2.5 mM each dNTP) | 1 µL | 200 µM (50 µM each dNTP) |
10 µM Check primer | 1 µL | 0.2 µM |
10 µM MNase-R or negative control primer | 1 µL | 0.2 µM |
5 U/µL Taq polymerase | 0.5 µL | 2.5 U |
ddH2O | 41.5 µL | |
Table 4: Reaction Mixture for Colony PCR Confirmation of MNase Tagging. Check primer sequences can be found at http://yeastgfp.yeastgenome.org/yeastGFPOligoSequence.txt. The check primer is used as the forward primer along with a reverse primer within MNase (5'-TTGTGCAGCTTCTTGGTAC-3') and a reverse primer ∼500 base pairs (bp) downstream of the respective gene as a negative control.
Temperature (°C) | Time | Cycles |
95 | 5 min | 1 |
95 | 30 s | 35 |
55 | 30 s | 35 |
68 | 1 min | 35 |
68 | 5 min | 1 |
10 | Forever | Hold |
Table 5: Thermal Cycling Conditions for Colony PCR Confirmation of MNase Tagging. Incubation temperature and extension time may need to be adjusted based on the DNA polymerase used.
Reagent | Volume | [Final] |
1 M Tris, pH 7.5 | 1.5 mL | 15 mM |
1 M KCl | 8 mL | 80 mM |
0.2 M EGTA | 50 µL | 0.1 mM |
ddH2O | Fill to 100 mL |
Table 6: Recipe for Buffer A. Prior to use, add half of a protease inhibitor tablet, 50 μL of 100 mM PMSF (1 mM final), 5 μL of 200 mM spermine (0.2 mM final), and 2.5 μL of 1 M spermidine (0.5 mM final) per 5 mL of solution.
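The Table 6 note gives supplement volumes per 5 mL of Buffer A; for other working volumes they scale linearly. The sketch below does that scaling and recomputes the final concentrations. The dictionary and function names are hypothetical, and, like the table note, the final-concentration calculation ignores the roughly 1% volume that the supplements themselves add.

```python
# Per 5 mL of Buffer A (Table 6 note): supplement -> (volume in uL, stock in mM)
PER_5_ML = {
    "PMSF": (50.0, 100.0),
    "spermine": (5.0, 200.0),
    "spermidine": (2.5, 1000.0),
}
TABLETS_PER_5_ML = 0.5  # half a protease inhibitor tablet


def supplements_for(buffer_ml):
    """Scale the per-5 mL additions linearly to buffer_ml of Buffer A.

    Returns (dict of supplement volumes in uL, number of inhibitor tablets).
    """
    factor = buffer_ml / 5.0
    volumes = {name: vol * factor for name, (vol, _) in PER_5_ML.items()}
    return volumes, TABLETS_PER_5_ML * factor


def final_mM(name, buffer_ml):
    """Final concentration in mM, ignoring the volume the supplements add."""
    vol_ul, stock_mM = PER_5_ML[name]
    added_ul = vol_ul * buffer_ml / 5.0
    return stock_mM * added_ul / (buffer_ml * 1000.0)


if __name__ == "__main__":
    volumes, tablets = supplements_for(10.0)
    print(volumes, tablets)
```

For 10 mL of buffer, this gives 100 µL PMSF, 10 µL spermine, 5 µL spermidine, and one tablet, with final concentrations unchanged at 1, 0.2, and 0.5 mM.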
Reagent | Volume | [Final] |
5 M NaCl | 8 mL | 400 mM |
0.5 M EDTA | 4 mL | 20 mM |
0.2 M EGTA | 2 mL | 4 mM |
ddH2O | Fill to 100 mL |
Table 7: Recipe for 2x Stop Solution. Combine 90 μL of stop solution with 10 μL of 10% SDS in a microfuge tube for each time point to be taken (see step 2.4).
We have shown that ChEC-seq can map diverse classes of yeast proteins on chromatin, and anticipate that it will be broadly applicable to different families of TFs and other chromatin-binding factors in yeast. ChEC-seq is advantageous in that it does not require crosslinking, chromatin solubilization, or antibodies. Thus, ChEC-seq avoids artifacts that can affect X-ChIP-seq, such as the hyper-ChIPable artifact3,4, as well as those that can affect native ChIP, such as false negatives due to incomplete protein solubilization14. The major drawback of ChEC-seq, as with all enzymatic fusion methods, is the requirement for a fusion protein. This can be achieved quickly in budding yeast, but is more laborious in metazoans. Another limitation of ChEC-seq is that it cannot be implemented as-is for profiling of histone modifications. However, a modification-binding domain could be fused to MNase and expressed, enabling histone modification mapping with ChEC-seq. In this vein, ChIP-seq using epitope-tagged histone modification-binding domains has been used to map genome-wide patterns of histone modification26.
The use of a free MNase control is important in establishing the specificity of ChEC-seq results. The free MNase strain bears MNase tagged with 3xFLAG and a simian virus 40 nuclear localization signal (SV40 NLS) under the control of the endogenous promoter for the gene of interest. It is conceptually similar to the unfused Dam control used in DamID studies27 and controls for nonspecific cleavage due to chromatin accessibility. As a free MNase control for TF ChEC-seq, we integrated 3xFLAG-MNase-SV40 NLS driven by the REB1 promoter at the ura3 locus1. For Mediator ChEC-seq, we expressed 3xFLAG-MNase-SV40 NLS under the control of the MED8 promoter in a non-integrating plasmid vector maintained in selective medium21. In both cases, very little cleavage by free MNase was observed. Thus, a single control strain wherein 3xFLAG-MNase-SV40 NLS is driven by a relatively robust promoter should be appropriate for most experiments.
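The text does not prescribe a scoring procedure for comparing a fusion to its free MNase control. As one simple illustration (a swapped-in technique, not the authors' method), signal can be summed over candidate regions in both tracks and expressed as a log2 ratio with a pseudocount. The function name, the region format, and the pseudocount of 1 are assumptions, and both tracks are assumed to have been normalized to the same scale beforehand.

```python
import math


def log2_enrichment(fusion, free, regions, pseudocount=1.0):
    """log2 ratio of fusion to free-MNase signal summed over each region.

    fusion, free: dict chrom -> list of per-base normalized fragment ends
        (both tracks assumed normalized to the same scale).
    regions: iterable of (chrom, start, end), 0-based, end-exclusive.
    The pseudocount avoids division by zero where free MNase gives no signal.
    """
    scores = []
    for chrom, start, end in regions:
        f = sum(fusion[chrom][start:end])
        b = sum(free[chrom][start:end])
        scores.append(math.log2((f + pseudocount) / (b + pseudocount)))
    return scores


if __name__ == "__main__":
    fusion = {"chrI": [0, 7, 0, 0]}
    free = {"chrI": [0, 0, 1, 0]}
    print(log2_enrichment(fusion, free, [("chrI", 0, 4), ("chrI", 2, 4)]))
```

Positive scores indicate cleavage by the fusion above what free MNase produces at the same accessible region; the pseudocount choice affects how strongly low-count regions are scored.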
When planning a ChEC experiment, the structure of the protein of interest may be an important consideration. For instance, we found that appending MNase to the C-terminus of Reb1 gave an asymmetric cleavage pattern around Reb1 motifs, while expression of Reb1 with N-terminal MNase gave a symmetric cleavage pattern1. We attribute these different cleavage patterns to the structure of Reb1. The DNA binding domain of Reb1 is located at its C-terminus; thus, the C-terminus is structured and close to DNA, which limits the range of motion of C-terminally fused MNase and confines cleavage predominantly to one side of the motif. In contrast, the C-terminus of Abf1 is relatively unstructured and may thus increase the reach of MNase, allowing cleavage on both sides of Abf1 binding sites. In the case of Rap1, another yeast GRF, we found that shortening the linker between the C-terminus and MNase from 33 to 8 aa was sufficient to severely reduce cleavage, perhaps because the C-terminal portion of Rap1 is proposed to lie at a distance from DNA28. Such structural considerations are also likely to be important when mapping proteins present in large complexes. For our ChEC-seq analysis of Mediator binding to the yeast genome21, we selected subunits whose C-termini were structurally predicted to be exposed rather than buried within the complex. Of the three subunits fused to MNase, one did not robustly cleave DNA, suggesting that either steric constraints or interactions with other DNA binding factors attenuated DNA cleavage.
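To compare cleavage patterns between N- and C-terminal fusions, a motif-centered profile can be reduced to a single asymmetry score. This is a hypothetical metric introduced for illustration; the source describes the asymmetry only qualitatively. The sketch assumes an odd-length profile ordered from upstream to downstream with the motif midpoint at the center index, which is excluded from both sums.

```python
def cleavage_asymmetry(profile):
    """(upstream - downstream) / (upstream + downstream) for a motif-centered profile.

    profile: odd-length list of average fragment-end counts ordered from the
        most upstream offset to the most downstream, with the motif midpoint
        at the center index (excluded from both sums).
    Returns +1 for purely upstream cleavage, -1 for purely downstream,
    and 0 for perfectly balanced cleavage.
    """
    if len(profile) % 2 == 0:
        raise ValueError("profile must have odd length with the motif at the center")
    center = len(profile) // 2
    up = sum(profile[:center])
    down = sum(profile[center + 1:])
    if up + down == 0:
        raise ValueError("no signal flanking the motif")
    return (up - down) / (up + down)


if __name__ == "__main__":
    print(cleavage_asymmetry([3, 1, 0, 0, 0]))  # upstream-biased toy profile
    print(cleavage_asymmetry([1, 0, 5, 0, 1]))  # symmetric toy profile
```

Under this metric, a fusion whose profile resembles the C-terminal Reb1 pattern would score toward +1 or -1, while a symmetric pattern would score near 0.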
We found that, when all time points were considered together, ChEC-seq detects 4-12 times more peaks for the yeast GRFs Abf1, Rap1, and Reb1 than various ChIP approaches1. Analysis of ChEC-seq data for these GRFs revealed two temporally distinct classes of binding sites, distinguished by the kinetics of MNase cleavage1. The first, termed 'fast,' displayed maximal cleavage <1 min after calcium addition and generally contained robust matches to known consensus motifs. The second, termed 'slow,' took several minutes to reach appreciable levels of cleavage and was depleted of motif matches. Both classes of sites were, on average, enriched for binding in ChIP datasets, though they were not necessarily called as peaks in those ChIP studies. The majority of sites for a given factor were slow sites. We speculate that fast sites represent high-affinity transcription factor binding sites, while slow sites represent loci transiently sampled during binding site scanning. The existence of two classes of binding sites separated by the kinetics of MNase cleavage may explain why, in many ChIP-seq datasets, a large number of peaks for a factor with a well-established DNA binding specificity do not contain a consensus motif29: formaldehyde crosslinking, performed for 10-15 min in most ChIP protocols, can repeatedly capture transient or opportunistic interactions of a factor with the genome, leading to inflation of signal. Indeed, comparison of ChIP-exo and live cell imaging data suggests that this is the case2. Notably, we did not observe such a dramatic time-dependent decrease in Mediator ChEC-seq signals21; Mediator ChEC-seq signal remained relatively constant from 30 s to 20 min. Given these observations, we recommend performing short-duration ChEC-seq to recover high-confidence binding sites.
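The fast/slow distinction can be illustrated with a simplified rule: a site is "fast" if its maximal cleavage occurs before 1 min. This is an assumption for illustration only. The original analysis may have used a different classification procedure, and a real analysis would also need to require that a site is cleaved above background. The function names are hypothetical, and time courses are assumed to be dicts mapping seconds after calcium addition to normalized signal.

```python
def classify_site(signal_by_time, fast_cutoff_s=60):
    """Label a site 'fast' or 'slow' from its cleavage time course.

    signal_by_time: dict mapping seconds after calcium addition -> normalized
        cleavage signal at the site.
    A site is 'fast' if its maximal signal occurs before fast_cutoff_s
    (the text's '<1 min'); ties go to the earliest time point.
    """
    if not signal_by_time:
        raise ValueError("no time points supplied")
    # max() returns the first maximal item, so iterating sorted times breaks ties early
    t_max = max(sorted(signal_by_time), key=signal_by_time.get)
    return "fast" if t_max < fast_cutoff_s else "slow"


def fraction_slow(sites):
    """Fraction of sites (dict name -> time course) classified as 'slow'."""
    if not sites:
        raise ValueError("no sites supplied")
    labels = [classify_site(tc) for tc in sites.values()]
    return labels.count("slow") / len(labels)


if __name__ == "__main__":
    sites = {
        "site_a": {30: 5.0, 60: 3.0, 300: 1.0},   # peaks early
        "site_b": {30: 0.5, 60: 1.0, 300: 4.0},   # builds up over minutes
    }
    print({name: classify_site(tc) for name, tc in sites.items()})
    print(fraction_slow(sites))
```

Restricting downstream analysis to sites labeled "fast" in early time points is one way to act on the recommendation to favor short-duration ChEC-seq.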
We anticipate that ChEC-seq can be adapted to non-yeast systems. The concentration of free calcium in unstimulated mammalian cells is 50-300 nM30,32, far below the threshold for efficient MNase activation. The limiting step for establishment of ChEC-seq in metazoan systems is thus the generation of fusion proteins, which remains far more laborious in metazoan than yeast cells. However, endogenous tagging of metazoan genes has been greatly facilitated by the advent of CRISPR-based genome engineering33, and so this limitation should be readily surmounted. Alternatively, MNase fusion proteins could be expressed at low levels from plasmids. This approach has been quite successful for DamID in Drosophila34,35 and human36 cells and thus might also facilitate ChEC-seq in metazoan cells.
The authors have nothing to disclose.
We thank Moustafa Saleh and Jay Tourigny for critical reading of the manuscript and Steven Hahn and Steven Henikoff for mentorship and support during the development of ChEC-seq and its application to the Mediator complex. S.G. is supported by NIH grants R01GM053451 and R01GM075114 and G.E.Z. is supported by Indiana University startup funds.
dNTPs | NEB | N0447 | |
Q5 high-fidelity DNA polymerase | NEB | M0491L | Other high-fidelity DNA polymerases, such as Phusion, may be used for cassette amplification. |
TrackIt 1 Kb Plus DNA ladder | ThermoFisher Scientific | 10488085 | |
Taq DNA polymerase | NEB | M0273L | |
cOmplete Mini EDTA-free protease inhibitor cocktail | Sigma-Aldrich | 11836170001 | It is important that an EDTA-free protease inhibitor mix is used, so as not to inhibit MNase cleavage by chelation of Ca2+. |
PMSF | ACROS Organics | AC215740010 | |
Digitonin, High Purity | EMD Millipore | 300410-250MG | Make a 2% stock by dissolving 20 mg digitonin in 1 mL DMSO with vigorous vortexing. |
Proteinase K, 20 mg/mL | Invitrogen | 25530049 | |
RNase A, 10 mg/mL | ThermoFisher Scientific | EN0531 | |
Ampure XP beads | Beckman Coulter | A63880 | Ampure-like beads can be generated using a published protocol (ref 24). |
MagneSphere magnetic rack | Promega | Z5342 |