Hi-C 3.0 is an improved Hi-C protocol that combines formaldehyde and disuccinimidyl glutarate crosslinkers with a cocktail of DpnII and DdeI restriction enzymes to increase the signal-to-noise ratio and the resolution of chromatin interaction detection.
Chromosome conformation capture (3C) is used to detect three-dimensional chromatin interactions. Typically, chemical crosslinking with formaldehyde (FA) is used to fix chromatin interactions. Then, chromatin digestion with a restriction enzyme and subsequent religation of fragment ends converts three-dimensional (3D) proximity into unique ligation products. Finally, after reversal of crosslinks, protein removal, and DNA isolation, DNA is sheared and prepared for high-throughput sequencing. The frequency of proximity ligation of pairs of loci is a measure of the frequency of their colocalization in three-dimensional space in a cell population.
A sequenced Hi-C library provides genome-wide information on interaction frequencies between all pairs of loci. The resolution and precision of Hi-C rely on efficient crosslinking that maintains chromatin contacts and on frequent, uniform fragmentation of the chromatin. This paper describes an improved in situ Hi-C protocol, Hi-C 3.0, that increases the efficiency of crosslinking by combining two crosslinkers (formaldehyde [FA] and disuccinimidyl glutarate [DSG]), followed by finer digestion using two restriction enzymes (DpnII and DdeI). Hi-C 3.0 is a single protocol for the accurate quantification of genome folding features at smaller scales such as loops and topologically associating domains (TADs), as well as features at larger nucleus-wide scales such as compartments.
Chromosome conformation capture has been used since 2002 (ref. 1). Fundamentally, every conformation capture variant relies on the fixation of DNA-protein and protein-protein interactions to preserve 3D chromatin organization. This is followed by DNA fragmentation, usually by restriction digestion, and, finally, religation of nearby DNA ends to convert spatially proximal loci into unique covalent DNA sequences. Initial 3C protocols used PCR to sample specific, "one-to-one" interactions. Subsequent 4C assays allowed the detection of "one-to-all" interactions2, while 5C detected "many-to-many" interactions3. Chromosome conformation capture came to full fruition after implementing next-generation, high-throughput sequencing (NGS), which allowed detection of "all-to-all" genomic interactions using genome-wide Hi-C4 and comparable techniques such as 3C-seq5, TCC6, and Micro-C7,8 (see also review by Denker and De Laat9).
In Hi-C, biotinylated nucleotides are used to mark 5′ overhangs after digestion and before ligation (Figure 1). This allows for the selection of properly digested and religated fragments using streptavidin-coated beads, setting it apart from GCC10. An important update to the Hi-C protocol was implemented by Rao et al.11, who performed the digestion and religation in intact nuclei (i.e., in situ) to reduce spurious ligation products. Moreover, substituting HindIII digestion with MboI (or DpnII) digestion reduced the fragment size and increased the resolution potential of Hi-C. This increase allowed the detection of relatively small-scale structures and a more precise genomic localization of points of contact, such as DNA loops between small cis-elements, e.g., loops between CTCF-bound sites generated by loop extrusion11,12. However, this potential comes at a cost. First, a two-fold increase in resolution requires a four-fold (2²) increase in sequencing reads13. Second, the small fragment sizes increase the possibility of mistaking undigested neighboring fragments for digested and religated fragments14. As mentioned, in Hi-C, digested and religated fragments differ from undigested fragments by the presence of biotin at the ligation junction. However, proper biotin removal from unligated ends is required to ensure that only ligation junctions are pulled down14,15.
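The quadratic relationship between resolution and sequencing depth can be illustrated with a short calculation. This is a minimal sketch, assuming that the goal is to keep the average number of read pairs per matrix cell constant; the function name and the bin sizes are hypothetical and only serve the example.

```python
def relative_read_requirement(old_bin_bp: int, new_bin_bp: int) -> float:
    """Fold change in read pairs needed to keep coverage per matrix cell constant.

    Assumption: the number of bin pairs scales with (genome size / bin size)^2,
    so the required depth scales with the square of the fold change in resolution.
    """
    fold_resolution = old_bin_bp / new_bin_bp
    return fold_resolution ** 2


# Halving the bin size (a two-fold increase in resolution) quadruples the need.
print(relative_read_requirement(10_000, 5_000))  # 4.0
# Going from 10 kb to 1 kb bins is a ten-fold increase in resolution.
print(relative_read_requirement(10_000, 1_000))  # 100.0
```

In practice, the usable depth also depends on mapping rate, duplicate rate, and the fraction of cis reads, so this number is best read as a lower bound on the additional sequencing effort.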
With the decreasing cost of NGS, it becomes feasible to study chromosome folding in greater detail. To decrease the size of DNA fragments, and thereby increase resolution, the Hi-C protocol can be adapted to use more frequently cutting restriction enzymes16 or to use combinations of restriction enzymes17,18,19. Alternatively, MNase7,8 in Micro-C and DNase in DNase Hi-C20 can be titrated to achieve optimal digestion.
A recent systematic evaluation of the fundamentals of 3C methods showed that the detection of chromosome folding features at every length scale greatly improved with sequential crosslinking with 1% FA followed by 3 mM DSG17. Furthermore, this evaluation showed that Hi-C with HindIII digestion was the best option for detecting large-scale folding features, such as compartments, and that Micro-C was superior at detecting small-scale folding features such as DNA loops. These results led to the development of a single, high-resolution "Hi-C 3.0" strategy that uses the combination of FA and DSG crosslinkers followed by double digestion with DpnII and DdeI endonucleases21. Hi-C 3.0 provides an effective strategy for general use because it accurately detects folding features across all length scales17. The experimental portion of the Hi-C 3.0 protocol is detailed here, and typical results that can be expected after sequencing are shown.
Figure 1: Hi-C procedure in six steps. Cells are fixed first with FA, and then DSG (1). Then, lysis precedes a double digestion with DdeI and DpnII (2). Biotin is added by overhang fill-in and proximal blunted ends are ligated (3) before DNA purification (4). Biotin is removed from unligated ends before sonication and size selection (5). Finally, pull-down of biotin allows for adapter ligation and library amplification by PCR (6). Abbreviations: FA = formaldehyde; DSG = disuccinimidyl glutarate; B = Biotin.
1. Fixation by crosslinking
2. Chromosome conformation capture
Table 1: Digestion reagents.
Table 2: Biotin fill-in reagents. *Note that changing enzymes may require different buffers and biotinylated dNTPs.
Table 3: Ligation mix reagents. Abbreviation: BSA = bovine serum albumin.
Table 4: Gel loading parameters for quality and size selection assessment.
Figure 2: Agarose gel showing typical post-DNA purification quality control results. (A) The CI control should show a band of high-molecular-weight DNA. (B) The DC and Hi-C samples show a range of DNA sizes. The Hi-C sample, having been ligated into larger fragments, should be of higher molecular weight than the DC. The concentration range of markers allows for generating a standard curve. Note that, in this example, the CI was loaded on a separate gel, but it is advised to load and run all samples and controls together. Abbreviations: CI = chromatin integrity; DC = digestion control; Hi-C = proximity-ligated.
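The standard curve mentioned in this caption can be sketched as a linear fit of band intensity against the known marker amounts. This is a minimal sketch under stated assumptions: the marker amounts and intensities below are hypothetical, the helper names are not part of the protocol, and a linear relationship between intensity and DNA amount is assumed to hold over the loaded range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def estimate_amount(intensity, slope, intercept):
    """Invert the standard curve to estimate the DNA amount in a sample lane."""
    return (intensity - intercept) / slope


# Hypothetical marker lanes: loaded amount (ng) and measured band intensity.
marker_ng = [50, 100, 200, 400]
marker_intensity = [1000, 2000, 4000, 8000]

slope, intercept = fit_line(marker_ng, marker_intensity)
sample_ng = estimate_amount(3000, slope, intercept)  # hypothetical sample lane
print(round(sample_ng, 1))
```

Sample intensities that fall outside the range covered by the markers should be treated with caution, because the fit is only supported within that range.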
3. Hi-C sequencing library preparation
Table 5: Biotin removal reagents and temperatures.
Table 6: Parameters for sonication.
Table 7: End repair reagents and temperatures.
Table 8: A-tailing reagents and temperatures.
Table 9: PCR primers and paired-end adapter oligos with reagents for annealing. Abbreviation: 5PHOS = 5' phosphate. Asterisks indicate phosphorothioated DNA bases. #Combine an indexed oligo with the Universal oligo to anneal into an indexed adapter.
Table 10: Adapter ligation reagents.
Table 11: PCR reagents and cycling parameters.
Figure 3: Agarose gel showing typical post-size selection results. The Upper and Lower Fractions for four samples (numbered 1-4) of DpnII-DdeI Hi-C are shown. The first lane for each sample contains the Upper Fraction, derived from a 0.8x magnetic beads mixture, and the second and third lanes contain a dilution of the Lower Fraction derived from a 1.1x magnetic beads mixture.
Figure 4: Agarose gel with PCR titration results. Starting from 5 cycles of PCR, samples are taken after every 2 cycles (5, 7, 9, and 11 cycles) for each of four libraries. Based on this figure, 6 cycles was chosen as the optimal cycle number for each sample.
Figure 5: Final PCR products. After cleaning and size selection, PCR products (Hi-C) were loaded next to a ClaI-digested fraction of the same library (ClaI). ClaI-digested fragments indicate the presence of sought-after DpnII-DpnII ligations. Note that ClaI does not digest DpnII-DdeI junctions and, therefore, not all ligations will contribute to a size reduction from this restriction.
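The reasoning behind the ClaI check can be followed at the sequence level. The sketch below assumes the standard recognition sites (DpnII: ^GATC; DdeI: C^TNAG; ClaI: AT^CGAT) and models only the bases contributed by the filled-in ends themselves; flanking genomic sequence is not considered, and the dictionary and function names are hypothetical.

```python
# Top-strand bases contributed by each side of a blunt-end ligation junction
# after fill-in of the 5' overhang. "N" stands for any base in the DdeI site.
FILLED_LEFT_END = {"DpnII": "GATC", "DdeI": "CTNA"}   # 3' end of upstream fragment
FILLED_RIGHT_END = {"DpnII": "GATC", "DdeI": "TNAG"}  # 5' end of downstream fragment
CLAI_SITE = "ATCGAT"


def junction(left_enzyme: str, right_enzyme: str) -> str:
    """Sequence formed when two filled-in ends are ligated."""
    return FILLED_LEFT_END[left_enzyme] + FILLED_RIGHT_END[right_enzyme]


def has_clai_site(left_enzyme: str, right_enzyme: str) -> bool:
    """True if the junction itself contains a ClaI recognition site."""
    return CLAI_SITE in junction(left_enzyme, right_enzyme)


for left in ("DpnII", "DdeI"):
    for right in ("DpnII", "DdeI"):
        print(left, right, junction(left, right), has_clai_site(left, right))
```

Only the DpnII-DpnII junction (GATCGATC) contains ATCGAT, which is consistent with the note that junctions involving DdeI ends do not contribute to the size shift after ClaI digestion.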
The figures in this manuscript were generated from a separate, replicate experiment of the one published previously by Lafontaine et al.21. After obtaining high-throughput sequencing data, tools from the Open Chromatin Collective (Open2C: https://github.com/open2c) were used to process the Hi-C data. A similar pipeline can be found on the data portal of the 4D Nucleome project (https://data.4dnucleome.org/resources/data-analysis/hi_c-processing-pipeline). Briefly, the Nextflow pipeline distiller (https://github.com/open2c/distiller-nf) was used to (1) align the sequences of Hi-C molecules to the reference genome, (2) parse the .sam alignments into files of Hi-C pairs, (3) filter PCR duplicates, and (4) aggregate pairs into binned matrices of Hi-C interactions. These HDF5-formatted matrices, called coolers, can then be (1) viewed on a HiGlass server (https://higlass.io/) and (2) analyzed using a large set of open-source computational tools present in the "cooltools" collection maintained by the Open Chromatin Collective (https://github.com/open2c/cooltools) to extract and quantify folding features such as compartments, TADs, and loops.
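To make the aggregation step (4) concrete, the following is a minimal sketch of how pairs of mapped positions can be counted into fixed-size bins. It is not the distiller or cooler implementation; the function name, the example positions, and the 10 kb bin size are assumptions made for illustration, and only a single chromosome is considered.

```python
from collections import Counter


def bin_pairs(pairs, bin_size):
    """Count contacts per (bin1, bin2) for pairs on one chromosome.

    pairs: iterable of (pos1, pos2) in base pairs.
    Returns a sparse upper-triangular matrix as a Counter keyed by bin indices.
    """
    counts = Counter()
    for pos1, pos2 in pairs:
        # Sort the bin indices so that each contact is stored once (bin1 <= bin2).
        bin1, bin2 = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(bin1, bin2)] += 1
    return counts


# Hypothetical cis pairs (positions in bp) binned at 10 kb.
example_pairs = [(1_200, 15_400), (3_000, 18_000), (25_000, 2_000), (12_000, 13_000)]
matrix = bin_pairs(example_pairs, bin_size=10_000)
print(sorted(matrix.items()))
# [((0, 1), 2), ((0, 2), 1), ((1, 1), 1)]
```

Storing only the non-zero, upper-triangular entries keeps the representation sparse, which matters because most bin pairs in a genome-wide matrix receive no reads at high resolution.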
Some quality indicators of the Hi-C 3.0 libraries can be assessed immediately after mapping read pairs to a reference genome, using a few simple metrics. First, typically ~50% of sequenced read pairs can be uniquely mapped for human cells. Due to the polymeric nature of chromosomes, most of these mapped reads (~60%-90%) represent interactions within a chromosome (cis), with interaction frequencies rapidly decaying with increasing genomic distance (distance-dependent decay). The distance-dependent decay can be visualized best in a "scaling plot", which shows the contact probability (per chromosome arm) as a function of genomic distance. We found that the use of different crosslinkers and enzymes can alter the distance-dependent decay at long- and short-range distances17. The addition of DSG crosslinking increases the detectability of interactions at short distances when combined with enzymes that produce smaller fragments, such as MNase and the combination of DpnII and DdeI (Figure 6A).
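Both indicators described above, the cis fraction and the distance-dependent decay, can be derived directly from a list of mapped pairs. The sketch below is a simplified illustration, not the cooltools implementation: the tuple layout, function names, and example pairs are assumptions, and the histogram reports raw counts per logarithmic distance bin rather than a normalized contact probability.

```python
import math
from collections import Counter


def cis_fraction(pairs):
    """Fraction of pairs with both ends on the same chromosome.

    pairs: list of (chrom1, pos1, chrom2, pos2).
    """
    cis = sum(1 for chrom1, _, chrom2, _ in pairs if chrom1 == chrom2)
    return cis / len(pairs)


def distance_histogram(pairs):
    """Count cis pairs per order of magnitude of genomic separation.

    Key k collects separations s with 10**k <= s < 10**(k + 1) bp.
    """
    hist = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        separation = abs(pos2 - pos1)
        if chrom1 != chrom2 or separation == 0:
            continue  # trans pairs and zero-distance pairs are skipped
        hist[math.floor(math.log10(separation))] += 1
    return hist


# Hypothetical mapped pairs.
example = [
    ("chr3", 1_000, "chr3", 6_000),      # 5 kb apart
    ("chr3", 1_000, "chr3", 51_000),     # 50 kb apart
    ("chr3", 100, "chr3", 3_100),        # 3 kb apart
    ("chr3", 500, "chr5", 900),          # trans
    ("chr1", 0, "chr1", 2_000_000),      # 2 Mb apart
]
print(cis_fraction(example))
print(sorted(distance_histogram(example).items()))
```

A proper scaling plot additionally divides each count by the number of locus pairs available at that separation and by the bin width, and is computed per chromosome arm.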
Distance-dependent decay can also be observed directly from 2D interaction matrices: interactions become less frequent the further they lie from the central diagonal (Figure 6B). Additionally, genomic folding features, such as compartments, TADs, and loops, can be identified from Hi-C matrices and scaling plots as deviations from the general genome-wide average distance-dependent decay. Importantly, crosslinking with DSG in addition to FA decreases random ligations, which are not constrained by the polymeric nature of chromosomes and are, therefore, more likely to occur between chromosomes (in trans) (Figure 6C). Reducing random ligation leads to increased signal-to-noise ratios, especially for interchromosomal and very long-range (>10-50 Mb) intrachromosomal interactions.
Figure 6: Representative results of mapped and filtered Hi-C libraries. (A) Scaling plots with contact probability and its derivative for various enzymes, ordered by fragment length (top) and crosslinking with either FA or FA + DSG (bottom). Digestion with MNase (Micro-C) or DpnII-DdeI (Hi-C 3.0) significantly increases short-range contacts (top), as does adding DSG to FA (bottom). (B) Columns show Hi-C heatmaps of DpnII digestion after just FA crosslinking and DpnII or DdeI digestion after FA + DSG crosslinking. White arrows show increasing strength of "dots" after DSG crosslinking and DdeI digestion, implying better detection of DNA loops. Rows show different parts of chromosome 3 at increasing resolution, aligning with panel C: top row: entire chromosome 3 (0-198,295,559 bp); middle row: 186-196 Mb; bottom row: 191.0-191.5 Mb. (C) Coverage graphs for the regions depicted in B. Black arrows show the lower coverage (% cis reads) for FA-only crosslinking. Abbreviations: FA = formaldehyde; DSG = disuccinimidyl glutarate; chr = chromosome.
Not all mapped reads are useful. A second quality indicator is the number of PCR duplicates. Exact duplicate reads are highly unlikely to occur by chance after ligation and sonication. Thus, such reads likely resulted from PCR amplification and need to be filtered out. Duplicates often arise when too many PCR cycles are required to amplify low complexity libraries. Generally, for Hi-C, most libraries only need 5-8 cycles of final PCR amplification, as determined by titration PCR (see step 3.9; Figure 4). However, libraries with sufficient complexity can be obtained even after 14 cycles of PCR amplification.
Another category of duplicate reads, so-called optical duplicates, can arise from the amplification process on Illumina sequencing platforms that use patterned flow cells (such as the HiSeq 4000). Optical duplicates arise either from overloading the flow cell, which causes (large) clusters to be called as two separate clusters, or from local reclustering of the original paired-end molecule after a first round of PCR. Because both types of optical duplicates are local, they can be identified and distinguished from PCR duplicates by their location on the flow cell. Whereas libraries with >15% PCR duplicates would need to be regenerated, libraries with optical duplicates can be reloaded after optimizing the loading process.
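The distinction between PCR and optical duplicates can be sketched as follows. This is an illustrative simplification, not the algorithm used by any specific deduplication tool: the record layout, the function name, and the pixel-distance cutoff are assumptions, and each duplicate is compared only with the first read pair seen at the same mapped position.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    key: tuple  # (chrom1, pos1, strand1, chrom2, pos2, strand2)
    tile: int   # flow cell tile the cluster was read from
    x: int      # cluster coordinates on the tile (pixels)
    y: int


def classify_duplicates(read_pairs, max_pixel_distance=2500):
    """Count unique pairs, PCR duplicates, and optical duplicates.

    Assumption: a duplicate on the same tile and within max_pixel_distance of
    the first copy is called optical; every other duplicate is called PCR.
    """
    groups = defaultdict(list)
    for read_pair in read_pairs:
        groups[read_pair.key].append(read_pair)

    counts = {"unique": 0, "pcr": 0, "optical": 0}
    for members in groups.values():
        first = members[0]
        counts["unique"] += 1
        for dup in members[1:]:
            is_local = (dup.tile == first.tile
                        and abs(dup.x - first.x) <= max_pixel_distance
                        and abs(dup.y - first.y) <= max_pixel_distance)
            counts["optical" if is_local else "pcr"] += 1
    return counts


# Hypothetical reads: three copies of one pair and one unrelated pair.
a = ("chr3", 100, "+", "chr3", 5_000, "-")
b = ("chr3", 200, "+", "chr5", 700, "+")
reads = [ReadPair(a, 1101, 1000, 1000), ReadPair(a, 1101, 1200, 900),
         ReadPair(a, 2204, 50, 50), ReadPair(b, 1101, 10, 10)]
counts = classify_duplicates(reads)
pcr_rate = counts["pcr"] / len(reads)
print(counts, pcr_rate, "regenerate library" if pcr_rate > 0.15 else "ok")
```

The 15% cutoff in the last line mirrors the guideline in the text; the appropriate pixel distance depends on the sequencing platform and should be taken from the documentation of the deduplication tool in use.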
Supplemental Table S1: Buffers and solutions.
Critical steps for cell handling
Although it is possible to use a lower number of input cells, this protocol has been optimized for ~5 × 10⁶ cells per sequencing lane (~400 M reads) to ensure proper complexity after deep sequencing. Cells are best counted prior to fixation. For the generation of ultradeep libraries, we generally multiply the number of lanes (and cells) until the desired read depth is reached. For optimal fixation, serum-containing medium should be replaced with PBS prior to FA fixation, and fixative solutions should be added immediately and without concentration gradients15,22. For cell harvesting, scraping is preferred over trypsinization, because the transition from a flatter to a spherical shape after trypsinization could affect the nuclear conformation. After the addition of DSG, loose and clumpy cell pellets are easily lost. Be careful when handling cells at this stage and add up to 0.05% BSA to decrease clumping.
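The scaling of lanes and cells with the desired read depth can be written out as a small calculation. This is a minimal sketch that takes the figures above (~5 × 10⁶ cells and ~400 M reads per lane) at face value; the constant and function names are hypothetical, and actual yields per lane vary between instruments and runs.

```python
import math

CELLS_PER_LANE = 5_000_000    # ~5 x 10^6 cells per sequencing lane (from the text)
READS_PER_LANE = 400_000_000  # ~400 M reads per lane (from the text)


def plan_input(target_reads: int) -> tuple[int, int]:
    """Return (lanes, cells) needed to reach the desired number of reads."""
    lanes = math.ceil(target_reads / READS_PER_LANE)
    return lanes, lanes * CELLS_PER_LANE


print(plan_input(400_000_000))    # (1, 5000000)
print(plan_input(1_000_000_000))  # (3, 15000000)
```

Rounding up to whole lanes gives a conservative estimate; the number of usable reads after mapping and duplicate filtering will be lower than the raw read count.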
Modifications to the method
This protocol was developed using human cells17. Yet, based on experience with chromosome conformation capture, this protocol should work for most eukaryotic cells. For a significantly lower input (~1 × 10⁶ cells), we advise using half the volumes for the lysis and conformation capture procedures (steps 2.1-2.4). This would also allow DNA isolation (step 2.5) to be performed in a tabletop centrifuge with 1.7 mL tubes, which could improve pelleting for low DNA concentrations. The quantification of DNA (step 2.6.6) will indicate how to proceed. For low amounts of isolated DNA (1-5 µg), we suggest skipping the size selection (step 3.3) and proceeding with biotin removal after reducing the volume from 130 µL to ~45 µL with a centrifugal filter unit (CFU).
This protocol was developed specifically to ensure high-quality data after sequential crosslinking with FA and DSG and digestion with DpnII and DdeI. However, alternative crosslinking strategies such as FA followed by EGS (ethylene glycol bis(succinimidyl succinate)), which is also used in ChIP-seq23 and ChIA-PET24, might work equally well17. Similarly, different enzyme combinations, such as DpnII and HinfI18 or MboI, MseI, and NlaIII19, can be used for digestion. When adapting enzyme combinations, be sure to use biotinylated nucleotides that can fill in the specific 5' overhangs and use the optimal buffer for each cocktail. DpnII comes with its own buffer, and the enzyme manufacturer recommends a specific buffer for DdeI digestion. Yet, for the double digestion with DpnII and DdeI in this protocol, Restriction Buffer is recommended because it is rated at 100% activity for both enzymes.
Troubleshooting conformation capture
The three key steps in chromosome conformation capture (crosslinking, digestion, and religation) have all been performed before the results can be visualized on a gel. To determine the quality of each of these three steps and discern where problems could have arisen, aliquots before (CI) and after digestion (DC) are taken and loaded on the gel along with the ligated Hi-C sample (Figure 2). This gel is used to determine the quality of the Hi-C sample and whether it is worth continuing the protocol. Without the CI and DC, it is difficult to pinpoint potential suboptimal step(s). It is worth noting that suboptimal ligation could be due to a problem in the ligation itself, the fill-in, or a problem with crosslinking. To troubleshoot crosslinking, be sure not to use more than 1 × 10⁷ cells per library and start with fresh crosslinking reagents and clean cells (i.e., rinsed with PBS). For ligation, make sure cells and ligation mixture are kept on ice. Add T4 DNA ligase just before the 4 h incubation at 16 °C and mix well.
Troubleshooting library preparation
If more than 10 PCR cycles are needed or no PCR product can be seen on gel after PCR titration (Figure 4), there are a few options to save the Hi-C sample. Working back from the PCR titration, the first option is to try the PCR again. If there is still not enough product, it is possible to attempt another round of A-tailing and adapter ligation (step 3.6) after washing the beads twice with 1x TLE buffer. After this additional A-tailing and adapter ligation, one can proceed to the PCR titration as before. If there is still no product, the last option is to resonicate the 0.8x fraction from step 3.3 and proceed from there.
Limitations and advantages of Hi-C 3.0
It is important to realize that Hi-C is a population-based method that captures the average frequency of interactions between pairs of loci in the cell population. Some computational analyses are designed to disentangle combinations of conformations from a population25, but in principle, Hi-C is blind to differences between cells. Although it is possible to perform single-cell Hi-C26,27 and computational inferences can be made28, single-cell Hi-C is not suitable for obtaining ultrahigh-resolution 3C information. An additional limitation of Hi-C is that it only detects pairwise interactions. To detect multicontact interactions, one can either use frequent cutters combined with short-read sequencing (Illumina)16 or perform multicontact 3C29 or 4C30, using long-read sequencing from PacBio or Oxford Nanopore platforms. Hi-C derivatives to specifically detect contacts between and along sister chromatids have also been developed31,32.
Although Hi-C19 and Micro-C33 can be used to generate contact maps at subkilobase resolutions, both require a large number of sequencing reads, and this can become a costly undertaking. To get to similar or even higher resolution without the costs, enrichment for specific genomic regions (Capture-C34) or specific protein interactions (ChIA-PET35, PLAC-seq36, HiChIP37) can be applied. The strength and downside of these enrichment applications is that only a limited number of interactions are sampled. With such enrichments, the global aspect of Hi-C (and the option of global normalization) is lost.
Importance and potential applications of Hi-C 3.0
This protocol was designed to enable high-resolution, ultradeep 3C while simultaneously detecting large-scale folding features such as TADs and compartments17 (Figure 6). This protocol starts with 5 × 10⁶ cells per tube for each Hi-C library, which should be more than enough material to sequence one or two lanes on a flow cell to obtain up to 1 billion paired-end reads. For ultradeep sequencing, multiple tubes of 5 × 10⁶ cells should be prepared, depending on the number of mapped reads and PCR duplicates. At the highest resolution (<1 kb), looping interactions are mostly found between CTCF sites, but promoter-enhancer interactions can also be detected. Readers can refer to Akgol Oksuz et al.17 for a detailed description of the data analysis.
The authors have nothing to disclose.
We would like to thank Denis Lafontaine for protocol development and Sergey Venev for bioinformatic assistance. This work was supported by a grant from the National Institutes of Health Common Fund 4D Nucleome Program to J.D. (U54-DK107980, UM1-HG011536). J.D. is an investigator of the Howard Hughes Medical Institute.
This article is subject to HHMI's Open Access to Publications policy. HHMI lab heads have previously granted a nonexclusive CC BY 4.0 license to the public and a sublicensable license to HHMI in their research articles. Pursuant to those licenses, the author-accepted manuscript of this article can be made freely available under a CC BY 4.0 license immediately upon publication.
Name | Company | Catalog Number | Comments |
1 kb Ladder | New England Biolabs | N3232L | |
Agarose | Invitrogen | 16500100 | |
Agencourt AMPure XP magnetic beads, 60 mL | Beckman Coulter | A63881 | |
Amicon Ultra-0.5 Centrifugal Filter Unit (CFU) | EMD Millipore | UFC500396 | |
Annealing Buffer (5x) | See recipe in supplemental materials | ||
ATP 10 mM | ThermoFisher | R0441 | |
Avanti J-25i High Speed Refrigerated Ultracentrifuge | Beckman Coulter | ||
Beckman ultracentrifuge tube 35 mL | Beckman Coulter | 357002 | |
Binding Buffer (2x) | See recipe in supplemental materials | ||
biotin-14-dATP 0.4 mM | Invitrogen | 19524-016 | |
BSA 10 mg/mL | New England Biolabs | B9000S | dilute from 20 mg/mL |
Cell scraper | Falcon | 353089 | |
Cell scraper | Corning | 3008 | |
Conical polypropylene tubes 50 mL | Denville | C1062-P | |
Conical tube 15 mL | Denville | C1017-P | |
Covaris micro tube AFA fiber with snap-cap 130 µL | Covaris | 520045/520077 | |
Covaris Sonicator | Covaris | E220/E220evolution/M220 | |
Culture flask 175 cm2 | Falcon | 353112 | |
Culture plates 150 mm x 25 mm | Corning | 430599 | |
dATP 1 mM | Invitrogen | 56172 | |
dATP 10 mM | Invitrogen | 56172 | |
dCTP 10 mM | Invitrogen | 56173 | |
DdeI | New England Biolabs | R0175L | |
dGTP 10 mM | Invitrogen | 56174 | |
DMSO | Sigma | D2650-5x10ML | |
dNTP mix 25 mM | Invitrogen | 10297117 | |
Dounce homogenizer | DWK Life Sciences | 8853010002/8853030002 | |
DPBS | Gibco | 14190-144 | |
DpnII | New England Biolabs | R0543M | |
DSG | ThermoScientific | 20593 | |
dTTP 10 mM | Invitrogen | 56175 | |
Ethanol 70% | Fisher | A409-4 | Diluted from 100% |
Ethidium Bromide | Fisher | BP1302-10 | |
Formaldehyde (37%) | Fisher | BP531-500 | |
Gel loading dye (6x) | New England Biolabs | B7024S | |
Glycine in ultrapure water 2.5 M | Sigma | G8898-1KG | |
HBSS | Gibco | 14025-092 | |
Igepal CA-630 detergent | MP Biomedicals | 198596 | |
Klenow DNA polymerase 5 U/µL | New England Biolabs | M0210L | |
Klenow Fragment 3–>5’ exo-, 5 U/µL | New England Biolabs | M0212L | |
ligation buffer (10x) | New England Biolabs | B7203S | |
Liquid nitrogen | |||
LoBind microcentrifuge tube 1.7 mL | Eppendorf | 22431021 | |
Low Molecular Weight DNA Ladder | New England Biolabs | N3233L | |
Lysis buffer | See recipe in supplemental materials | ||
Magnetic Particle separator | ThermoFisher | 12321D | |
Microfuge tubes 1.7 mL | Axygen | MCT-175-C | |
MyOne Streptavidin C1 beads | Invitrogen | 65001 | |
NEBuffer 2.1 (10x) | New England Biolabs | B7002S | |
NEBuffer 3.1 (10x) | New England Biolabs | B7203S | |
PBS | Gibco | 70013-032 | |
PCR (strip) tubes | Biorad | TBS0201/ TCS0803 | |
PCR thermocycler | Biorad | T100 | |
Pfu Ultra II Buffer (10x) | Agilent | Comes with Pfu Ultra | |
PfuUltra II Fusion HS DNA Polymerase | Agilent | 600674 | |
Phase lock tube 15 mL | Qiagen | 129065 | |
Phase lock tubes 2 mL | Qiagen | 129056 | |
Phenol:chloroform:isoamyl alcohol | Invitrogen | 15593-049 | |
Protease inhibitor cocktail | ThermoFisher | 78440 | |
Proteinase K in ultrapure water 10 mg/mL | Invitrogen | 25530-031 | |
Refrigerated Centrifuge | Eppendorf | 5810R | |
RNase A, DNase and protease-free 10 mg/mL | Thermo Scientific | EN0531 | |
Rotator | Argos technologies | EW-04397-40 or rocking platform | |
SDS 1% | Fisher | BP13111 | |
Sodium acetate pH = 5.2, 3 M | Sigma | ||
Sub-Cell GT Horizontal Electrophoresis System | Biorad | 1704401 | |
T4 DNA ligase 1 U/µL | Invitrogen | 100004817 | |
T4 DNA polymerase | New England Biolabs | M0203L | |
T4 DNA polymerase 3 U/µL | New England Biolabs | M0203L | |
T4 ligation buffer (5x) | Invitrogen | Y90001 | |
T4 polynucleotide kinase 10 U/µL | New England Biolabs | M0201L | |
Tabletop centrifuge | Eppendorf | 5425 | |
TBE buffer | See recipe in supplemental materials | ||
Tris Low EDTA Buffer (TLE) | See recipe in supplemental materials | ||
Triton X-100 (10%) | Sigma | 93443 | |
TruSeq adapter oligos | Integrated DNA Technologies (IDT) | https://www.idtdna.com/site/order/oligoentry | 250 nmole and HPLC purified |
Tween 20 detergent | Fisher | 9005-64-5 | |
Tween Wash Buffer | See recipe in supplemental materials | ||
Vortex | Scientific Industries | (G560)SI-0236 |