A protocol for mapping the three-dimensional genome organization with nucleosome resolution using the genome-wide chromosome conformation capture method Micro-C-XL is presented here.
Three-dimensional (3D) chromosome organization is a major factor in genome regulation and cell-type specification. For example, cis-regulatory elements, known as enhancers, are thought to regulate the activity of distal promoters via interaction in 3D space. Genome-wide chromosome conformation capture (3C)-technologies, such as Hi-C, have transformed our understanding of how genomes are organized in cells. The current understanding of 3D genome organization is limited by the resolution with which the topological organization of chromosomes in 3D space can be resolved. Micro-C-XL measures chromosome folding with resolution at the level of the nucleosome, the basic unit of chromatin, by utilizing micrococcal nuclease (MNase) to fragment genomes during the chromosome conformation capture protocol. This results in an improved signal-to-noise ratio in the measurements, thus facilitating the better detection of insulation sites and chromosome loops compared to other genome-wide 3D technologies. A visually supported, detailed, step-by-step protocol for preparing high-quality Micro-C-XL samples from mammalian cells is presented in this article.
Micro-C-XL is a genome-wide technique to measure 3D genome conformation with nucleosome resolution. Micro-C-XL builds on the widely used proximity ligation-based Hi-C technology, which has transformed our understanding of how 3D genomes are organized1. Micro-C-XL and its first iteration, Micro-C, were initially developed in Saccharomyces cerevisiae2,3 and later adapted to mammalian cell systems, for which the protocol has demonstrated its full potential in detecting short-range features of the 3D genome, such as chromosome loops and insulation sites. This version is based on recent mammalian Micro-C-XL publications4,5. As Micro-C-XL supersedes Micro-C, Micro-C-XL is henceforth referred to as Micro-C in the manuscript.
The major differences between Micro-C and Hi-C6 are as follows: 1) genome fragmentation with micrococcal nuclease (MNase) compared to restriction enzymes and 2) additional crosslinkers with larger atomic spacing between the reactive groups compared to only formaldehyde. Both steps contribute significantly to the improved signal-to-noise ratio of Micro-C compared to conventional Hi-C. The fragmentation size limits the resolution to which the 3D genome organization can be resolved during the proximity ligation protocol. MNase is a nuclease that preferentially digests accessible DNA and leaves nucleosomal-protected DNA intact. Nucleosome footprinting using MNase-sequencing has shown that nucleosomes fully cover most eukaryotic genomes7. As nucleosomes are distributed throughout the genome with an average spacing of 160-220 bp, depending on species and cell type, MNase is the ideal enzyme for high-resolution mapping of the genome architecture.
The use of an additional crosslinker in combination with formaldehyde (FA) in the Micro-C method additionally improves the signal-to-noise ratio2,8. Amine-specific crosslinkers with longer atomic spacers between the reactive groups facilitate protein-protein crosslinks. These are typically disuccinimidyl glutarate (DSG) or ethylene glycol bis-succinimidyl succinate (EGS) with 7.7 Å and 16.1 Å spacers, respectively. The reduction of noise through EGS or DSG is particularly apparent in experiments with high fragmentation rates, such as Micro-C, and presumably occurs due to a reduction in the rate of random ligation events8.
A recently developed Hi-C 3.0 protocol that utilizes EGS/DSG crosslinking and multiple combinations of restriction enzymes reduces noise in Hi-C experiments and significantly improves the detection of chromosome loops and insulation sites8,9. Still, a side-by-side comparison of various interaction data features found that Micro-C had superior detection of short-range features, such as chromosome loops and insulation sites, compared to both Hi-C 3.0 and conventional Hi-C8. However, Hi-C 3.0 does improve the detection of short-range features and maintains strong detection of genome compartmentalization compared to conventional Hi-C. In summary, the choice of a chromosome conformation capture method should be determined by the objective and biological question.
Here, we provide a step-by-step protocol for successful Micro-C experiments that can unravel 3D genome organization.
1. Cell culture and crosslinking
2. MNase titration
NOTE: An MNase titration is necessary to determine the optimal MNase concentration before the preparative digestion of the double-crosslinked cells.
3. Preparative MNase digestion
4. DNA end processing and proximity ligation
5. Di-nucleosomal DNA purification and size selection
6. Preparation of the streptavidin beads
7. Streptavidin pull-down and on-bead library preparation
8. Estimation of the required PCR cycles
NOTE: Estimating the required PCR cycles for library amplification is advisable. Typically, a Micro-C library requires 8-15 cycles of PCR. Although the step is not essential, it helps avoid over-amplification and reduces the risk of PCR duplicates.
9. Sequencing library amplification
10. DNA sequencing and data processing
The successful preparation of Micro-C libraries can be evaluated at several steps of the protocol. The most important step is the choice of a proper MNase digestion degree. Therefore, the MNase concentration must be titrated to consistently yield 70%-90% mono-nucleosomes relative to di-nucleosomes for every sample. It is important to note that chromatin digestion differs between eu- and hetero-chromatin, with MNase digesting heterochromatin less efficiently. Thus, the optimal digestion degree depends on the chromatin region of interest and the cell type studied, since the relative proportion of eu- and hetero-chromatin is cell type-specific. It is therefore advisable to carefully titrate the MNase concentration required and to first evaluate the Micro-C experiment's success by low-input sequencing.
A typical MNase titration pattern of chromatin treated with decreasing amounts of MNase is shown in Figure 1A. Here, chromatin from 250,000 cells per reaction is digested with a four-fold serial dilution of MNase. The highest concentration (10 U of MNase, Lane 2) shows over-digested chromatin almost exclusively consisting of mono-nucleosomal DNA (~150 bp). Notably, the center of the mono-nucleosomal band runs lower in the agarose gel compared to the corresponding bands in the samples with reduced MNase concentrations, indicating an over-digestion of nucleosomal DNA. Over-digested nucleosomes are inefficiently ligated in the proximity ligation reaction; therefore, the sample in Lane 2 is suboptimal for Micro-C experiments. Lane 3 (2.5 U of MNase) displays an almost appropriate digestion degree for Micro-C experiments. Here, the mono-nucleosomal band is the dominant species, and the subnucleosomal smear, indicative of over-digested nucleosomes, is reduced but still present. The digestion degree in Lane 4 (0.625 U of MNase) is an ideal condition for a Micro-C experiment in this titration example. A clear mono-nucleosomal band without sub-nucleosomal DNA is present. The band intensity for the mono- and di-nucleosome DNA is almost equal, indicating a mono-nucleosome yield of 66% or higher. It is worth noting that the di-nucleosomal DNA is approximately twice the size of the mono-nucleosomal DNA (~320 bp vs. ~150 bp), so its band intensity per mole of DNA is twice as high compared to its mono-nucleosomal counterpart. The digestion degree in Lane 5 (0.156 U of MNase) shows under-digested chromatin with almost no nucleosomal DNA, and this, therefore, represents a suboptimal sample.
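The relationship between band intensity and molar yield described above can be sketched in a few lines of Python. This is only an illustration: it assumes that band intensity is proportional to DNA mass, and the function name and default fragment sizes (150 bp and 320 bp, taken from the text) are chosen here for the example.

```python
def mono_nucleosome_mole_fraction(mono_intensity, di_intensity,
                                  mono_bp=150, di_bp=320):
    """Estimate the molar fraction of mono-nucleosomes from two band intensities.

    Assumption: band intensity scales with DNA mass, so dividing by the
    fragment length gives a value proportional to the number of molecules.
    """
    mono_molecules = mono_intensity / mono_bp
    di_molecules = di_intensity / di_bp
    return mono_molecules / (mono_molecules + di_molecules)


# Equal band intensities, as described for Lane 4 of the titration example.
equal_bands = mono_nucleosome_mole_fraction(1.0, 1.0)
print(f"Mono-nucleosome mole fraction: {equal_bands:.1%}")
```

With equal intensities and the stated fragment sizes, the estimate lands slightly above two-thirds, in line with the "66% or higher" reading of Lane 4.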
In conclusion, in this example, the digestion of 2.5 × 10⁵ mouse ES cells with 0.625 U of MNase (corresponding to 2.5 U of MNase for 1 × 10⁶ cells in 200 µL) offers the most promising starting point for preparative digestions in Micro-C experiments. However, an intermediate MNase concentration between the conditions used for the samples in Lane 3 and Lane 4 (corresponding to 5 U of MNase for 1 × 10⁶ cells in 200 µL) should also be considered. Importantly, chromatin digestion with MNase cannot be scaled linearly, and it is not recommended to upscale the preparative digestion more than 4x. To prepare Micro-C libraries from more than 1 × 10⁶ cells, it is advisable to digest the chromatin in aliquots of 1 × 10⁶ cells and pool them after MNase inactivation.
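As a rough planning aid, the scaling rule above can be written as a small Python helper. It is a sketch under stated assumptions: the function name and parameters are hypothetical, the dose is scaled proportionally only up to the 4x limit (1 × 10⁶ cells per aliquot), and larger samples are split into equal aliquots.

```python
import math


def plan_preparative_digestion(total_cells, titration_units=0.625,
                               titration_cells=250_000,
                               max_cells_per_aliquot=1_000_000):
    """Return (number of aliquots, cells per aliquot, MNase units per aliquot).

    Assumption: the titrated dose is scaled proportionally within one
    aliquot, and no aliquot exceeds max_cells_per_aliquot.
    """
    n_aliquots = math.ceil(total_cells / max_cells_per_aliquot)
    cells_per_aliquot = total_cells / n_aliquots
    units_per_aliquot = titration_units * cells_per_aliquot / titration_cells
    return n_aliquots, cells_per_aliquot, units_per_aliquot


# Example: 5 million cells, using the Lane 4 condition from the titration.
plan = plan_preparative_digestion(5_000_000)
print(plan)
```

For 5 × 10⁶ cells, this yields five aliquots of 1 × 10⁶ cells, each digested with 2.5 U of MNase and pooled after MNase inactivation.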
To assess the success of the proximity ligation protocol, the input control, which is MNase-digested and not proximity ligated (step 3.8), should be compared to the proximity-ligated sample (step 5.3) by 1.5% agarose gel electrophoresis (Figure 1B). The proximity-ligated mono-nucleosome band has an approximate size of 300 bp, similar to that of di-nucleosomes. Therefore, the mono- to di-nucleosomal band signal ratio should shift from predominantly mono-nucleosomes (Lane 1) toward di-nucleosomes (Lane 3 and Lane 4). As the di-nucleosomal DNA is excised from the agarose gel and purified in this step, splitting the samples into multiple lanes is advisable to avoid overloading.
Assessing the quality and quantity of the prepared sequencing library by minimal PCR is recommended. Here, DNA from 1 µL of beads (1/20 of the total sample) is amplified for 16 cycles in a 10 µL PCR reaction. The total yield of the minimal PCR library typically ranges from 50-500 ng after 16 PCR cycles. In theory, this corresponds to a 1-10 µg library from the remaining 19 µL sample if it were also amplified for 16 cycles. It is recommended to use the minimum number of PCR cycles needed to generate a library of approximately 100 ng from the total DNA. Assuming exponential amplification in the PCR, the theoretical yield of DNA obtained from the 19 µL input at 16 cycles can be divided successively by two to calculate the number of PCR cycles required to generate a 100 ng library. For example, a 100 ng yield from 1 µL after 16 cycles corresponds to a 1,900 ng yield amplified from 19 µL. In this scenario, 12 cycles should ideally generate a sequencing library of approximately 119 ng from the total DNA (1,900 ng/[2 × 2 × 2 × 2] ≈ 119 ng). The remaining 9 µL sample from the minimal PCR can then be used to assess the quality of the library by agarose gel electrophoresis (Figure 1C). Visualization should show one distinct band at 420 bp and no bands for adaptor dimers (120 bp). Smaller fragments may also appear; these correspond to unused PCR primers.
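The cycle estimate described above can be reproduced with a short Python sketch. It assumes ideal doubling per cycle, and the function name and parameters are hypothetical choices for this illustration; real amplification efficiency is lower, so the result should be treated as a starting point.

```python
import math


def estimate_pcr_cycles(test_yield_ng, test_cycles=16, test_input_ul=1.0,
                        remaining_input_ul=19.0, target_ng=100.0):
    """Return (cycles, expected_yield_ng) for the fewest cycles reaching target_ng.

    Assumption: the yield doubles with every PCR cycle.
    """
    # Projected yield if the remaining sample were amplified like the test PCR.
    projected_ng = test_yield_ng * remaining_input_ul / test_input_ul
    # Number of doublings that can be dropped while staying at or above target.
    # A negative value means extra cycles are needed.
    cycles_to_drop = math.floor(math.log2(projected_ng / target_ng))
    cycles = test_cycles - cycles_to_drop
    expected_ng = projected_ng / 2 ** cycles_to_drop
    return cycles, expected_ng


# Worked example from the text: 100 ng from 1 µL of beads after 16 cycles.
cycles, expected_ng = estimate_pcr_cycles(100.0)
print(cycles, expected_ng)
```

For the worked example, the helper returns 12 cycles with an expected yield of 118.75 ng, matching the successive division by two described in the text.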
Next, analyzing and confirming successful Micro-C sample preparation by low-input sequencing is recommended before committing to resource-intensive deep sequencing. Typically, libraries are sequenced to a depth of 5 × 10⁶ to 1 × 10⁷ reads and evaluated based on the following criteria: the sequencing read duplication rate, the cis- versus trans-chromosomal interaction rate, and the sequencing read orientation frequency. The Micro-C libraries are processed with Distiller, a full-service pipeline that processes the data from sequencing read files (Fastq format) to read-pairs files (Bedpe format) and scalable interaction matrices (Cool and Mcool formats) using cooler, pairtools, and cooltools10,11,12. The pipeline also generates a summary file that is ideal for assessing the quality of the Micro-C libraries10 (https://github.com/open2c/distiller-nf). The PCR duplication rate provides information on the sequencing library complexity and can be extracted from the *.stats file generated. High-quality Micro-C libraries have PCR duplication rates below 5%-10% when generated from 5 million or more cells. Notably, some sequencing platforms generate PCR duplicates during cluster formation independent of the sequencing library complexity. Figure 2A shows the relative duplication rates of two experiments: one that we consider a good sample and one bad sample. In this example, both samples displayed acceptable mapping rates.
The next criteria for assessing the quality of Micro-C libraries are the cis versus trans ratio and the read orientation frequencies. Within the nucleus, chromosomes occupy individual chromosome territories and, thus, rarely interact with other chromosomes. A high rate of detected trans-chromosomal interactions therefore indicates a high rate of random ligations. At this level of analysis, the bad sample showed a high rate of trans-chromosomal interactions compared to the good sample (Figure 2B). For Micro-C, a cis-chromosomal interaction rate of 70% or higher is desirable.
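The two thresholds discussed above can be checked programmatically once the summary counts are available. The following Python sketch is a hedged illustration: it assumes a tab-separated key/value summary with the keys `total_mapped`, `total_dups`, `cis`, and `trans` (modeled on pairtools-style statistics, and to be verified against the actual *.stats file), and the counts in the example are invented for demonstration only.

```python
def parse_stats(text):
    """Parse tab-separated 'key<TAB>value' lines into a dict of numbers."""
    stats = {}
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        key, value = fields
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # skip non-numeric entries
    return stats


def library_qc(stats, max_dup_rate=0.10, min_cis_rate=0.70):
    """Compute duplication and cis rates and compare them to the thresholds.

    Assumption: the key names below exist in the summary file.
    """
    dup_rate = stats["total_dups"] / stats["total_mapped"]
    cis_rate = stats["cis"] / (stats["cis"] + stats["trans"])
    return {
        "dup_rate": dup_rate,
        "cis_rate": cis_rate,
        "dup_ok": dup_rate <= max_dup_rate,
        "cis_ok": cis_rate >= min_cis_rate,
    }


# Invented counts for illustration only, not data from this study.
example = (
    "total_mapped\t8000000\n"
    "total_dups\t400000\n"
    "cis\t6080000\n"
    "trans\t1520000\n"
)
report = library_qc(parse_stats(example))
print(report)
```

The default thresholds mirror the upper bound of the 5%-10% duplication range and the 70% cis rate stated in the text; both can be tightened per experiment.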
A Micro-C library has a fragment size similar to the di-nucleosomal DNA band, which can co-purify with the proximity-ligated sample and contaminate the experiment. These contaminants are always cis-chromosomal interactions. Therefore, it is important to also evaluate the read orientation rates. The rate of di-nucleosomal contamination can be estimated by low-input sequencing. Di-nucleosomal DNA stems from two neighboring nucleosomes that have not been cleaved by MNase. Thus, the resulting sequencing reads will always display a forward-reverse read orientation (F and R), and the distance between the read pairs will be around 320 bp. Proximity-ligated fragments, in comparison, can be ligated in four orientations, yielding read pairs with F-R, R-R, R-F, and F-F orientations, ideally with equal abundance (Figure 2C). In addition, they display various distances between the two reads of a pair. To estimate the quantity of di-nucleosomal contaminants, the frequency of read orientations can be calculated from the *.stats files generated by Distiller (Figure 2D). Notably, in this work, the fraction of F-R reads (red) was higher in the bad sample compared to the good sample, and this became more apparent when the read orientations were stratified by distance (Figure 2E). The contribution of di-nucleosomal fragments to the F-R fraction becomes evident when the read pairs are stratified into pairs with distances <562 bp or ≥562 bp. Here, the fraction of read pairs with distances <562 bp is dominated by F-R reads, whereas the fraction with distances ≥562 bp displays an even distribution between the four possible orientations, indicating that the global over-representation of F-R reads stems from di-nucleosomal contaminants. The choice of 562 bp as the threshold for subsetting is defined by the binning in the *.stats file generated. Although not necessary for this quality control, more defined subsetting can be achieved by extracting the distances from the *.pairs file, which is also generated by Distiller. It is important to note that di-nucleosomal reads do not decrease the quality of the Micro-C sample, as they can be identified and ignored during data processing. However, they do not contain valuable information about 3D interactions, and they dilute the informative reads.
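The distance-stratified orientation check can be sketched in Python as follows. This is an illustration under assumptions: the key layout `dist_freq/<start>-<end>/<strands>` and the mapping of `+-` to F-R are modeled on pairtools-style statistics and should be confirmed against the actual *.stats file, and the counts in the example are invented.

```python
# Assumed mapping from strand pairs to the orientation labels used in the text.
ORIENTATIONS = {"+-": "F-R", "-+": "R-F", "++": "F-F", "--": "R-R"}


def orientation_fractions(stats, threshold_bp=562):
    """Return orientation fractions for read pairs below and above a distance.

    Assumption: keys look like 'dist_freq/<start>-<end>/<strands>', with an
    open-ended last bin written as '<start>+'.
    """
    counts = {
        "short": dict.fromkeys(ORIENTATIONS.values(), 0.0),
        "long": dict.fromkeys(ORIENTATIONS.values(), 0.0),
    }
    for key, count in stats.items():
        parts = key.split("/")
        if len(parts) != 3 or parts[0] != "dist_freq" or parts[2] not in ORIENTATIONS:
            continue  # ignore keys that are not distance/orientation bins
        bin_start = int(parts[1].split("-")[0].rstrip("+"))
        group = "short" if bin_start < threshold_bp else "long"
        counts[group][ORIENTATIONS[parts[2]]] += count
    fractions = {}
    for group, by_orientation in counts.items():
        total = sum(by_orientation.values())
        fractions[group] = {
            label: (value / total if total else 0.0)
            for label, value in by_orientation.items()
        }
    return fractions


# Invented counts for illustration only, not data from this study.
toy_stats = {
    "dist_freq/316-562/+-": 700, "dist_freq/316-562/-+": 100,
    "dist_freq/316-562/++": 100, "dist_freq/316-562/--": 100,
    "dist_freq/562-1000/+-": 250, "dist_freq/562-1000/-+": 250,
    "dist_freq/562-1000/++": 250, "dist_freq/562-1000/--": 250,
    "cis": 2000,
}
result = orientation_fractions(toy_stats)
print(result["short"]["F-R"], result["long"]["F-R"])
```

An F-R fraction well above 25% in the short-distance group, together with a balanced long-distance group, is the pattern the text attributes to di-nucleosomal contaminants.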
Thus, careful MNase titration and thorough quality control with low-input sequencing are the best tools to optimize the quality of Micro-C experiments.
Figure 1: Intermediate stages of the Micro-C protocol. (A) Agarose gel electrophoresis of chromatin from 2.5 × 10⁵ mouse ES cells digested with varying MNase concentrations. The mono-, di-, and tri-nucleosomal bands are indicated by arrows. M: DNA ladder (Lane 1 and Lane 6); 10 U of MNase per 250,000 cells (Lane 2); 2.5 U of MNase per 250,000 cells (Lane 3); 0.625 U of MNase per 250,000 cells (Lane 4); 0.156 U of MNase per 250,000 cells (Lane 5). (B) The 1.0% agarose gel electrophoresis of the Micro-C prepared samples (Lane 3 and Lane 4) and the MNase-digested input control (Lane 1). Lane 1 and Lane 2 (M: DNA ladder) are enhanced to emphasize the relative change in mono- to di-nucleosomal fragment intensity. The mono- and di-nucleosomal bands are indicated by arrows. The di-nucleosomal band in the proximity-ligated sample combines di-nucleosomal and Micro-C library DNA. (C) The 1.0% agarose gel electrophoresis of the Micro-C sequencing libraries amplified from 1 µL sample to evaluate the quality. Lane 1 (M): DNA ladder; Lane 2 (S): Micro-C library. (D) Fragment Analyzer trace of the final Micro-C library.
Figure 2: Sample statistics for the low-input sequencing of a good sample and a bad sample. (A) Bar graph of the percentage of mapped (green) and unmapped (red) reads. (B) Normalized fraction of reads mapping cis- and trans-chromosomal interactions. The data sets were normalized to the cis-mapping reads. The cis-mapping reads were stratified by the distance between the first and the second reads of the paired-end sequenced samples: ≤1 kbp (yellow), >1 kbp and ≤10 kbp (orange), and >10 kbp (red). (C) Schematic of the potential molecular species with the di-nucleosomal sizes. (D) Percentages of read-pair orientations of all the reads of the good sample and the bad sample. (E) Same as panel (D) but stratified by distances (left, <562 bp and right, ≥562 bp).
Components | 1x | 4.4x |
10x NEBuffer 2.1 | 10 µL | 44 µL |
100 mM ATP | 2 µL | 8.8 µL |
100 mM DTT | 5 µL | 22 µL |
H2O | 68 µL | 299.2 µL |
10 U/µL T4 PNK | 5 µL | 22 µL |
Total | 90 µL | 396 µL |
Table 1: Micro-C master mix 1. Composition of the master mix for the end chewing reaction.
Components | 1x | 4.4x |
1 mM Biotin-dATP | 10 µL | 44 µL |
1 mM Biotin-dCTP | 10 µL | 44 µL |
10 mM mix of dTTP and dGTP | 1 µL | 4.4 µL |
10x T4 DNA Ligase Buffer | 5 µL | 22 µL |
200x BSA | 0.25 µL | 1.1 µL |
H2O | 23.75 µL | 104.5 µL |
Table 2: Micro-C master mix 2. Composition of the master mix for the end labeling reaction.
Components | 1x | 4.4x |
10x NEB T4 Ligase reaction buffer | 50 µL | 220 µL |
H2O | 422.5 µL | 1859 µL |
T4 DNA Ligase | 25 µL | 110 µL |
Table 3: Micro-C master mix 3. Composition of the master mix for the proximity ligation reaction.
Components | 1x | 4.4x |
10x NEBuffer 1.1 | 20 µL | 88 µL |
H2O | 180 µL | 792 µL |
ExoIII nuclease | 10 µL | 44 µL |
Table 4: Micro-C master mix 4. Composition of the master mix for the biotin removal reaction.
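The 4.4x columns in Tables 1-4 can be recomputed for other sample numbers with a short Python sketch. It rests on an assumption that is not stated in the tables: that 4.4x represents four reactions plus a 10% pipetting excess. The function name and the `excess` parameter are hypothetical.

```python
def scale_master_mix(recipe_ul, n_reactions, excess=0.10):
    """Scale per-reaction volumes (in µL) to a master mix with pipetting excess.

    Assumption: the excess is applied uniformly to every component.
    """
    factor = n_reactions * (1 + excess)
    scaled = {name: round(volume * factor, 2) for name, volume in recipe_ul.items()}
    scaled["Total"] = round(sum(recipe_ul.values()) * factor, 2)
    return scaled


# Per-reaction volumes from Table 1 (master mix 1).
master_mix_1 = {
    "10x NEBuffer 2.1": 10,
    "100 mM ATP": 2,
    "100 mM DTT": 5,
    "H2O": 68,
    "10 U/µL T4 PNK": 5,
}
scaled_mix_1 = scale_master_mix(master_mix_1, n_reactions=4)
print(scaled_mix_1)
```

With four reactions and the default excess, the output reproduces the 4.4x column of Table 1, including the 396 µL total.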
The success of a Micro-C experiment depends on a few critical steps in the protocol that need to be carefully executed. First, crosslinking with the additional crosslinker DSG or EGS can lead to the aggregation of cells, depending on the cell type. Adding 0.1%-0.5% BSA to the crosslinking reaction significantly reduces the aggregation without affecting the crosslinking efficiency. Inefficient crosslinking can result in increased rates of trans-chromosomal interactions that are indicative of random ligations. The second, and most crucial, step in this protocol is the digestion of chromatin with MNase. Suboptimal chromatin digestion leads to inefficient proximity ligation (over-digestion) or increased rates of non-proximity-ligated di-nucleosomes (under-digestion). The efficiency of the ligation reaction can be evaluated by agarose gel electrophoresis (Figure 1B) but is best estimated by low-input sequencing. If the low-input sequencing reveals either a high duplication rate (inefficient ligation) or increased di-nucleosome rates, the MNase digestion degree should be re-evaluated. Notably, the loss of sample when executing the protocol can lead to reduced library complexity. The concentration of a sample is best evaluated after DNA purification (step 5.3) or by minimal PCR (step 8). The total yield of DNA from 5 × 10⁶ mammalian cells after DNA purification is typically >2 µg. The DNA concentration should be checked after the MNase digestion, ExoIII digestion, and DNA purification. Endogenous nucleases, the abundance of which is cell type-specific and species-specific, can be a source of DNA degradation. In addition, column-based DNA purification can lead to sample loss due to incompatibility with the SDS from deproteination reactions. Ethanol precipitation can be considered if the DNA concentration is low at this step.
As Micro-C requires sample-specific MNase titration, it is challenging to apply Micro-C to small cell populations, such as small organs from various model organisms, embryos, single cells, organoids, or patient biopsies. Here, Hi-C 3.0 offers a well-established alternative using an endpoint reaction by sequence-specific restriction endonucleases8,9.
Micro-C is a widely applicable high-resolution chromosome conformation technology with a high dynamic range and a high signal-to-noise ratio, which makes it particularly suitable for investigating short-range chromosome features4,5,8, such as chromosome loops. The resolution of Micro-C allows for efficiently capturing promoter-enhancer loops, which are beyond the detection limit of Hi-C, enabling a more detailed analysis of the relationship between genome organization and regulation13,14,15. Furthermore, DNA capture strategies have recently been combined with Micro-C to increase the locus-specific resolution of the targeted genomic loci to unprecedented levels, revealing novel insights into the ultrastructure of the 3D genome16,17,18. In summary, we envision that Micro-C and its derivatives will be a key technology for dissecting the role of the 3D genome in transcriptional regulation and, consequently, cell type differentiation and maintenance.
The authors have nothing to disclose.
We thank Christl Gaubitz and Kathleen Stewart-Morgan for their critical reading of the manuscript. We thank Anja Groth and the Groth lab for their support in establishing our lab. We thank the staff of the CPR/reNEW Genomics Platform for support: H. Wollmann, M. Michaut, and A. Kalvisa. The Novo Nordisk Foundation Center for Stem Cell Medicine (reNEW) is supported by the Novo Nordisk Foundation grant number NNF21CC0073729. The Novo Nordisk Foundation Center for Protein Research (CPR) is supported by the Novo Nordisk Foundation grant number NNF14CC0001. We thank the Brickman lab at the Novo Nordisk Center for Stem Cell Medicine, reNEW Copenhagen, for the mouse ES cells.
Name | Company | Catalog Number | Comments |
1 mM Biotin dATP | Jena Bioscience | NU-835-Bio14-S | |
1 mM Biotin dCTP | Jena Bioscience | NU-809-BioX-S | |
10 mM dGTP | NEB | N0442S | |
10 mM dTTP | NEB | N0443S | |
10 U/µL T4 PNK | NEB | M0201L | |
100 U/µL Exonuclease III | NEB | M0206L | |
10x NEBuffer 1.1 | NEB | B7001S | |
10x NEBuffer 2.1 | NEB | B7202S | |
10x T4 DNA Ligase buffer | NEB | B0202A | |
1x DPBS w/o Mg2+ and Ca2+ | ThermoFisher | 14190144 | |
1x LIF | |||
2-Mercaptoethanol 50 mM | Gibco | 31350010 | 0.1 mM β-mercaptoethanol |
37% Formaldehyde | Sigma Aldrich | 252549-500ML | Caution. See manufacturer's MSDS |
400 U/µL T4 DNA Ligase | NEB | M0202L | |
5 U/µL Klenow Fragment | NEB | M0210L | |
Agarose | BIO-RAD | 1613102 | Caution. See manufacturer's MSDS |
BSA 20mg/ml | NEB | B9000S | |
CaCl2 | |||
cell counter | |||
Dimethyl Sulfoxide (DMSO) | Sigma Aldrich | D8418-100ML | Caution. See manufacturer's MSDS |
Dynabeads MyOne Streptavidin C1 | Invitrogen | 65001 | |
DynaMag-2 Magnet | Invitrogen | 12321D | referred to as: magnet for 1.5 mL tubes |
DynaMag-PCR Magnet | Invitrogen | 492025 | referred to as: magnet for PCR tubes |
EDTA Ultrapure 0.5M pH 8.0 | Invitrogen | 15575-038 | |
EGTA Ultrapure 0.5M pH 8.0 | BioWorld | 40121266-1 | |
Ethanol 96% | VWR Chemicals | 20824365 | quality control system |
Ethidium Bromide | Invitrogen | 15585-011 | |
Ethylene glycol bis(succinimidyl succinate) (EGS) | ThermoFisher | 21565 | |
Fetal Bovine Serum | Sigma Aldrich | F7524 | 15% FBS |
Gel Loading dye purple (6X) | NEB | B7024S | |
Glycine | PanReac AppliChem | A1067.0500 | |
Halt Protease inhibitor (100x) | ThermoFisher | 78430 | Caution. See manufacturer's MSDS |
IGEPAL CA-630 (NP-40) | Sigma Aldrich | 18896-50ML | |
MgCl2 1 M | Invitrogen | AM9530G | |
Micrococcal Nuclease (MNase) | Worthington | LS004798 | |
mouse embryonic stem cells | |||
NaCl | Sigma Aldrich | S9888-1KG | |
NEBNext Multiplex Oligos for Illumina (Dual Index primers) | NEB | E7600S | amplification primers for sequencing libraries |
NEBNext Ultra II DNA library prep kit for Illumina | NEB | E7645L | sequencing library preparation kit |
NEBNext Ultra II Q5 Master mix | NEB | M0544S | Caution. See manufacturer's MSDS |
Non-Essential Amino Acids Solution | Gibco | 11140050 | 1x NEAA |
Penicillin-Streptomycin (10,000 U/mL) | Gibco | 15140148 | 1% Pen-Strep |
Proteinase K (40 mg/mL) | GoldBio | P-480-1 | Caution. See manufacturer's MSDS |
QIAquick Gel extraction kit | QIAgen | 28706 | referred to as: DNA gel elution kit |
QIAquick PCR purification kit | QIAgen | 28106 | referred to as: commercial DNA purification kit |
Qubit dsDNA HS Assay kit | Invitrogen | Q32854 | high sensitivity DNA quantification instrument |
Quick load purple 1kb plus DNA Ladder | NEB | N0550S | |
SPRIselect size selection beads | Beckman Coulter | B23319 | paramagnetic beads |
ThermoMixer C | Eppendorf | 5382000015 | referred to as: thermomixer |
Tris | Merck | 10708976001 | |
Trypsin | |||
Tween20 | Sigma Aldrich | P7949-100ML | |
Ultrapure 10% SDS | Invitrogen | 15553-035 | |
Ultrapure Phenol Chloroform Isoamyl Alcohol (PCI) | Invitrogen | 15593-031 | |
Fragment Analyzer |