Gene expression is regulated by interactions of gene promoters with distal regulatory elements. Here, we describe how low input Capture Hi-C (liCHi-C) allows the identification of these interactions in rare cell types, in which they were previously unmeasurable.
Spatiotemporal gene transcription is tightly regulated by distal regulatory elements, such as enhancers and silencers, which rely on physical proximity to their target gene promoters to control transcription. Although these regulatory elements are relatively easy to identify, their target genes are difficult to predict, since most of these elements are cell-type specific and may be separated from their targets by hundreds of kilobases in the linear genome sequence, skipping over other, non-target genes. For several years, Promoter Capture Hi-C (PCHi-C) has been the gold standard for associating distal regulatory elements with their target genes. However, PCHi-C requires millions of cells, prohibiting the study of rare cell populations such as those commonly obtained from primary tissues. To overcome this limitation, we developed low input Capture Hi-C (liCHi-C), a cost-effective and customizable method to identify the repertoire of distal regulatory elements controlling each gene of the genome. liCHi-C relies on an experimental and computational framework similar to that of PCHi-C, but by minimizing tube changes, modifying reagent concentrations and volumes, and swapping or eliminating steps, it minimizes material loss during library construction. Collectively, liCHi-C enables the study of gene regulation and spatiotemporal genome organization in the context of developmental biology and cellular function.
Temporal gene expression drives cell differentiation and, ultimately, organism development, and its alteration is closely related to a wide range of diseases1,2,3,4,5. Gene transcription is finely regulated by regulatory elements, which can be classified as proximal (i.e., gene promoters) or distal (e.g., enhancers or silencers). Distal elements are frequently located far from their target genes and physically interact with them through chromatin looping to modulate gene expression6,7,8.
The criteria for identifying distal regulatory regions in the genome are widely agreed upon: these regions harbor specific histone modifications9,10,11 and contain specific transcription factor recognition motifs, acting as recruitment platforms for these factors12,13,14. In addition, enhancers and super-enhancers15,16 have low nucleosome occupancy17,18 and are transcribed into non-coding eRNAs19,20.
Nonetheless, the target genes of each distal regulatory element are more difficult to predict. More often than not, interactions between distal regulatory elements and their targets are cell-type- and stimulus-specific21,22 and span hundreds of kilobases, bridging over other genes in either direction23,24,25; the elements can even be located within intronic regions of their target gene or of other, non-target genes26,27. Furthermore, a single distal regulatory element can control more than one gene at the same time, and a single gene can be controlled by several elements28,29. This positional complexity hinders pinpointing regulatory associations between elements and genes, and therefore most targets of each regulatory element in every cell type remain unknown.
In recent years, there has been a boom in the development of chromosome conformation capture (3C) techniques for studying chromatin interactions. The most widely used of them, Hi-C, generates a map of the interactions between all fragments of a cell's genome30. However, to detect significant interactions at the restriction fragment level, Hi-C relies on ultra-deep sequencing, which prohibits its routine use to study the regulatory landscape of individual genes. To overcome this economic limitation, several enrichment-based 3C techniques have emerged, such as ChIA-PET31, HiChIP32, and its low-input counterpart HiCuT33. These techniques use antibodies to enrich for genome-wide interactions mediated by a specific protein. Nonetheless, the defining feature of these techniques is also the bane of their application: users depend on the availability of high-quality antibodies for the protein of interest and cannot compare conditions in which the binding of the protein is dynamic.
Promoter Capture Hi-C (PCHi-C) is another enrichment-based 3C technique that circumvents these limitations34,35. By employing a biotinylated RNA probe enrichment system, PCHi-C generates genome-wide, high-resolution libraries of the genomic regions interacting with 28,650 annotated human or 27,595 annotated mouse gene promoters, also known as the promoter interactome. This approach detects significant long-range interactions of both active and inactive promoters at restriction-fragment resolution and robustly compares promoter interactomes between any conditions, independently of the dynamics of histone modifications or protein binding. PCHi-C has been widely used in recent years to identify promoter interactome reorganizations during cell differentiation36,37, identify the mechanism of action of transcription factors38,39, and discover new potential genes and pathways deregulated in disease by non-coding variants40,41,42,43,44,45,46,47,48, alongside new driver non-coding mutations49,50. In addition, simply by modifying the capture system, this technique can be customized to the biological question to interrogate any interactome (e.g., the enhancer interactome51 or the interactome of a collection of non-coding alterations41,52).
However, PCHi-C relies on a minimum of 20 million cells to perform the technique, which prevents the study of scarce cell populations such as the ones often used in developmental biology and clinical applications. For this reason, we have developed low input Capture Hi-C (liCHi-C), a new cost-effective and customizable method based on the experimental framework of PCHi-C to generate high-resolution promoter interactomes with low-cell input. By performing the experiment with minimal tube changes, swapping or eliminating steps from the original PCHi-C protocol, drastically reducing reaction volumes, and modifying reagent concentrations, library complexity is maximized and it is possible to generate high-quality libraries with as little as 50,000 cells53.
Low input Capture Hi-C (liCHi-C) has been benchmarked against PCHi-C and used to elucidate promoter interactome rewiring during human hematopoietic cell differentiation, to discover potential new disease-associated genes and pathways deregulated by non-coding alterations, and to detect chromosomal abnormalities53. Here, we detail the step-by-step protocol and the quality controls performed throughout, up to the generation of the final libraries and their computational analysis.
To minimize material loss, (1) work with DNA low-binding tubes and tips (see Table of Materials), (2) dispense reagents onto the tube wall instead of introducing the tip into the sample, and (3) when possible, mix the sample by inversion instead of pipetting up and down, and spin it down afterward to recover the sample.
1. Cell fixation
2. Lysis and digestion
3. Ligation and decrosslinking
4. DNA purification
5. Optional quality controls
6. Sonication
7. End-repair
8. Biotin pull-down
9. dATP-tailing, adapter ligation, and PCR amplification
10. Library capture
11. Biotin pull-down and PCR amplification
liCHi-C can generate high-quality, high-resolution genome-wide promoter interactome libraries from as few as 50,000 cells53. Besides drastically reducing reaction volumes and using DNA low-binding plasticware throughout the protocol, this is accomplished by removing unnecessary steps of the original protocol in which significant material losses occur. These include the phenol purification after decrosslinking, the biotin removal, and the subsequent phenol-chloroform purification and ethanol precipitation. In addition, reorganizing the steps of the Hi-C library preparation (biotin pull-down, A-tailing, adapter ligation, PCR amplification, and double-sided paramagnetic bead selection, which also serves as the PCR product purification) removes yet another unnecessary DNA purification step. An overview of the experimental workflow is shown in Figure 1A.
Several quality controls are performed throughout the protocol to assess library quality. The first is the calculation of genome digestion efficiency; values over 80% are considered acceptable (Table 3). We suggest checking the digestion efficiency of the cell type in a separate experiment to avoid losing a significant amount of material from a single liCHi-C experiment. Second, before sonication and end-repair, we recommend checking the sensitivity of interaction detection by amplifying cell-type-invariant chromatin interactions by conventional PCR (Figure 1B). If the specific product is detected, a third control assesses the efficiency of biotin fill-in and ligation by differentially digesting one of the previously obtained PCR products with HindIII and NheI (Figure 1C,D). When a digested HindIII restriction site is filled in and blunt-end ligated to another one, a new NheI restriction site is generated instead of the original HindIII site. Therefore, the PCR amplicon should only be digested when NheI is present. Finally, just before and after the capture, the concentration and size distribution of the library should be checked by automated electrophoresis. Pre-capture library amplification should aim to obtain 500-1,000 ng of Hi-C material, the amount needed to perform the RNA probe capture, since excessive amplification of either the pre- or post-capture library leads to a high percentage of PCR duplicates and the consequent loss of sequencing reads during analysis. Libraries can be reamplified under the same conditions if not enough material is obtained in the first, conservative PCR amplification. The amount of post-capture library material can vary but, as a rule of thumb, it should be approximately 10- to 20-fold lower than before the capture.
The size distribution of the library should center around 450-550 bp (Figure 2A,B) and should not change between pre- and post-capture libraries. Together, passing these controls indicates that high-quality liCHi-C libraries have been generated.
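The logic of the fill-in and ligation control can be checked in silico. The following is a minimal Python sketch on idealized top-strand sequences; the flanking sequences and the function names `filled_in_junction` and `cut_positions` are hypothetical, for illustration only. It shows that filling in two HindIII ends (A^AGCTT, which leaves a 5'-AGCT overhang) and blunt-ligating them yields the junction AAGCTAGCTT, which contains an NheI site (GCTAGC) but no longer contains a HindIII site. Biotin incorporation during fill-in is not modeled.

```python
"""In silico sketch of the HindIII fill-in/ligation control (idealized, top strand only)."""

HINDIII_SITE = "AAGCTT"  # HindIII cuts A^AGCTT, leaving a 5'-AGCT overhang
NHEI_SITE = "GCTAGC"     # NheI cuts G^CTAGC


def filled_in_junction(left_flank: str, right_flank: str) -> str:
    """Top-strand sequence after two HindIII ends are filled in and blunt-ligated.

    `left_flank` is the sequence upstream of the HindIII site on one fragment and
    `right_flank` the sequence downstream of the site on the other fragment.
    """
    # Left fragment: the top strand ends in ...A; fill-in copies the AGCT overhang.
    left_end = left_flank + "A" + "AGCT"
    # Right fragment: the top strand already starts with the AGCT overhang followed
    # by T; fill-in of the bottom strand makes this end blunt.
    right_end = "AGCT" + "T" + right_flank
    return left_end + right_end


def cut_positions(sequence: str, site: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of `site`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


if __name__ == "__main__":
    junction = filled_in_junction("CCCC", "GGGG")
    print(junction)                               # CCCCAAGCTAGCTTGGGG
    print(cut_positions(junction, NHEI_SITE))     # [6]
    print(cut_positions(junction, HINDIII_SITE))  # []
```

An amplicon spanning such a junction is therefore expected to be cut by NheI but not by HindIII, which is the pattern described for the fill-in and ligation control.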
Finished liCHi-C libraries are then sequenced with paired-end reads of at least 100 bp and analyzed. Raw sequencing data53 are processed with the HiCUP pipeline56 for mapping and filtering out artifacts. Ideally, the HiCUP report shows 5- to 10-fold more cis (within the same chromosome) than trans (between different chromosomes) read pairs, as previously described for in-nucleus ligation Hi-C57 (Figure 3B). More than 100 million unique, valid reads after the removal of PCR duplicates is enough to proceed to the next step of the analysis (Figure 3B), which is assessing the capture efficiency. Read pairs in which neither end maps to a restriction fragment captured by the RNA probe enrichment system are discarded, keeping only those representing the promoter interactome of the cell, i.e., read pairs in which at least one end maps to a restriction fragment containing one or more gene promoters (Figure 3C). Ideally, these should represent more than 60% of the read pairs.
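The cis/trans ratio and the capture efficiency described above are simple proportions over read pairs. The sketch below computes them, assuming a hypothetical simplified representation in which each end of a read pair has already been assigned a chromosome and a genome-wide restriction fragment ID, and the baited (promoter-containing) fragments are given as a set of IDs, analogous to the bait map used by CHiCAGO. The `ReadPair` class and the function names are illustrative, not part of HiCUP or CHiCAGO.

```python
"""Sketch of cis/trans and capture-efficiency summaries over simplified read pairs."""
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical simplified read pair: chromosome and fragment ID of each end."""
    chrom1: str
    frag1: int
    chrom2: str
    frag2: int


def cis_trans_ratio(pairs: list[ReadPair]) -> float:
    """Ratio of cis (same chromosome) to trans (different chromosomes) read pairs."""
    cis = sum(1 for p in pairs if p.chrom1 == p.chrom2)
    trans = len(pairs) - cis
    return cis / trans if trans else float("inf")


def is_captured(pair: ReadPair, baited: set[int]) -> bool:
    """True if at least one end maps to a baited (promoter-containing) fragment."""
    return pair.frag1 in baited or pair.frag2 in baited


def capture_efficiency(pairs: list[ReadPair], baited: set[int]) -> float:
    """Percentage of read pairs with at least one end on a baited fragment."""
    if not pairs:
        return 0.0
    captured = sum(1 for p in pairs if is_captured(p, baited))
    return 100.0 * captured / len(pairs)


def keep_captured(pairs: list[ReadPair], baited: set[int]) -> list[ReadPair]:
    """Discard read pairs in which neither end maps to a baited fragment."""
    return [p for p in pairs if is_captured(p, baited)]


if __name__ == "__main__":
    pairs = [
        ReadPair("chr1", 1, "chr1", 5),
        ReadPair("chr1", 2, "chr1", 7),
        ReadPair("chr2", 10, "chr2", 11),
        ReadPair("chr1", 3, "chr3", 20),
    ]
    baited = {1, 10}
    print(cis_trans_ratio(pairs))             # 3.0
    print(capture_efficiency(pairs, baited))  # 50.0
```

In a real analysis these values come from the HiCUP report and the capture design; the sketch only makes explicit that a read pair counts as captured when either end lies on a baited fragment.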
Finally, significant interactions are called with the CHiCAGO pipeline, as previously described58,59. Two or more biological replicates are needed to obtain the final set of significant promoter interactions. Data quality can also be validated by principal component analysis (PCA): biological replicates should cluster together, and different cell types should separate. For instance, a PCA of liCHi-C datasets from four primary cell types of the human hematopoietic tree (common myeloid progenitors, monocytes, megakaryocytes, and erythroblasts) shows that liCHi-C libraries cluster in a fashion that reflects the developmental trajectory (Figure 3D). A closer examination of the significant interactions detected in the four cell types reveals that promoter interactomes are cell-type specific and dynamic during cell development. For example, the promoter of AHSP, which encodes a key chaperone synthesized in erythroid precursors that oversees the correct folding of hemoglobin60,61,62, gains interactions with potentially active regulatory elements (i.e., H3K27ac- and H3K4me1-enriched regions) in erythroblasts but not in other cell types (Figure 4). This demonstrates that liCHi-C can uncover potential regulatory interactions in rare cell types.
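The replicate-clustering check can be sketched as a PCA on a samples x interactions matrix of CHiCAGO scores. This minimal Python sketch assumes numpy is installed and that the scores have already been arranged into such a matrix (the input layout, the function name `pca_scores`, and the toy values are hypothetical); it centers each interaction and projects the samples onto the leading principal components via a singular value decomposition.

```python
"""Minimal PCA sketch for checking replicate clustering (assumes numpy is installed)."""
import numpy as np


def pca_scores(matrix, n_components: int = 2):
    """Project samples (rows) onto their first principal components.

    `matrix` is samples x interactions, e.g. CHiCAGO scores for interactions
    called significant in at least one sample (hypothetical input layout).
    Returns (scores, explained_variance_ratio).
    """
    x = np.asarray(matrix, dtype=float)
    centered = x - x.mean(axis=0)  # center each interaction (column)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # principal component scores
    total = np.sum(s**2)
    explained = s**2 / total if total > 0 else np.zeros_like(s)
    return scores, explained[:n_components]


if __name__ == "__main__":
    samples = ["A_rep1", "A_rep2", "B_rep1", "B_rep2"]
    toy_scores = [  # made-up values: two cell types, two replicates each
        [5.0, 5.0, 0.0, 0.0],
        [5.0, 4.5, 0.0, 0.5],
        [0.0, 0.0, 5.0, 5.0],
        [0.5, 0.0, 5.0, 4.5],
    ]
    scores, explained = pca_scores(toy_scores)
    for name, (pc1, pc2) in zip(samples, scores):
        print(f"{name}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
    print("explained variance ratio:", np.round(explained, 3))
```

The signs of principal components are arbitrary; what matters is that replicates of the same cell type lie close together and different cell types separate, as in Figure 3D.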
Figure 1: Protocol overview and quality controls of the sample before sonication. (A) Schematic overview of the liCHi-C protocol divided by days. "B" symbols and blue hands represent biotin molecules and steps at which the protocol can be safely paused for an extended period, respectively. (B) Representative results of the 3C interaction controls. Interaction sets for both human (left) and mouse (right) are shown. Expected bands are marked in dark blue, and an unspecific band is marked in light blue. (C) Representative fill-in and ligation controls using the "Dekker" human interaction primer pair. The band is cut only in lanes 2 and 3, where NheI is added. (D) Schematic representation of the generation of a new NheI restriction site during fill-in and ligation of a HindIII restriction site.
Figure 2: Representative automated electrophoresis profiles from libraries pre- and post-capture. (A) Automated electrophoresis profile from a library just before capture. The total amount of DNA obtained is 994 ng (49.7 ng/µL in 20 µL). (B) High-sensitivity automated electrophoresis profile from a finished liCHi-C library. The sample is loaded diluted 1:2 to preserve as much material as possible. The total amount of DNA obtained is 61.2 ng (1.53 ng/µL × 20 µL × 2 to account for the dilution).
Figure 3: Representative HiCUP pipeline output and sample replicate clustering by PCA. (A) Classification of the validity of read pairs by percentages and total counts. Invalid read pairs are subclassified by the experimental artifact type. (B) Deduplication percentages and classification of the interaction types. Cis interactions are further subclassified as cis-close (less than 10 kb) and cis-far (more than 10 kb). (C) Percentage of capture efficiency. Captured read pairs are further subclassified according to whether one end, the other, or both ends contain one or more promoters. (D) Principal component analysis of CHiCAGO significant interaction scores from both replicates of liCHi-C libraries from common myeloid progenitors (CMP), erythroblasts, megakaryocytes, and monocytes.
Figure 4: AHSP interaction landscape in human primary hematopoietic cells. Representative example of the AHSP promoter-centered interaction landscape in common myeloid progenitors (CMP), erythroblasts (Ery), megakaryocytes (MK), and monocytes (Mon) as seen in the WashU Epigenome Browser63. Arcs represent significant interactions. The dark blue shade marks the AHSP gene promoter, while the light blue shades overlap potentially active regulatory elements that interact specifically with the AHSP promoter in erythroblasts.
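The yield arithmetic in Figure 2 can be combined with the rules of thumb stated for the capture (500-1,000 ng before capture; roughly 10- to 20-fold less afterward) into a small check. This is a minimal Python sketch; the function names are hypothetical, and treating the rules of thumb as hard cut-offs is an assumption made only for illustration.

```python
"""Sketch of library yield calculations and rule-of-thumb checks around the capture."""


def library_yield_ng(conc_ng_per_ul: float, volume_ul: float,
                     dilution_factor: float = 1.0) -> float:
    """Total DNA (ng) = measured concentration x volume x dilution factor."""
    return conc_ng_per_ul * volume_ul * dilution_factor


def check_capture(pre_ng: float, post_ng: float,
                  pre_range=(500.0, 1000.0), fold_range=(10.0, 20.0)) -> dict:
    """Compare pre- and post-capture yields with the rules of thumb in the text."""
    fold = pre_ng / post_ng
    return {
        "pre_capture_ok": pre_range[0] <= pre_ng <= pre_range[1],
        "fold_reduction": fold,
        "fold_ok": fold_range[0] <= fold <= fold_range[1],
    }


if __name__ == "__main__":
    pre = library_yield_ng(49.7, 20)                      # Figure 2A
    post = library_yield_ng(1.53, 20, dilution_factor=2)  # Figure 2B, loaded diluted 1:2
    print(f"pre-capture: {pre:.1f} ng, post-capture: {post:.1f} ng")
    # → pre-capture: 994.0 ng, post-capture: 61.2 ng
    print(check_capture(pre, post))
```

For the libraries in Figure 2, the fold reduction is about 16, within the expected 10- to 20-fold range.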
Table 1: Buffer composition and preparation.
Table 2: PCR primers and adapter sequences.
Table 3: Example of the calculation for the digestion efficiency.
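Table 3 gives the protocol's own example of the digestion efficiency calculation. Purely as an illustration, the sketch below uses one widely used qPCR-based formulation from the 3C literature, in which a primer pair spanning a restriction site (R) is compared with a control primer pair without a site (C) in digested and undigested samples: digestion (%) = 100 − 100 / 2^((Ct_R − Ct_C)digested − (Ct_R − Ct_C)undigested). Whether Table 3 uses this exact formulation is an assumption; the function name and Ct values are hypothetical, and the formula assumes 100% PCR efficiency (template doubling every cycle).

```python
"""Sketch of a qPCR-based digestion efficiency calculation (assumed formulation)."""


def digestion_efficiency(ct_site_digested: float, ct_control_digested: float,
                         ct_site_undigested: float, ct_control_undigested: float) -> float:
    """Percentage of restriction sites cut, assuming perfect doubling per PCR cycle."""
    delta_digested = ct_site_digested - ct_control_digested
    delta_undigested = ct_site_undigested - ct_control_undigested
    # Fraction of intact (uncut) template relative to the undigested sample.
    intact_fraction = 1.0 / 2 ** (delta_digested - delta_undigested)
    return 100.0 - 100.0 * intact_fraction


if __name__ == "__main__":
    efficiency = digestion_efficiency(25.0, 22.0, 22.0, 22.0)  # made-up Ct values
    print(f"{efficiency:.1f}%")                  # 87.5%
    print("acceptable:", efficiency > 80.0)      # acceptable: True
```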
liCHi-C generates high-resolution promoter interactome libraries using an experimental framework similar to that of PCHi-C, but with a vastly reduced cell number. This is achieved largely by eliminating unnecessary steps, such as phenol purification and biotin removal. In the classical in-nucleus ligation Hi-C protocol57 and its derivative technique PCHi-C, biotin is removed from the ends of non-ligated restriction fragments to avoid pulling down DNA fragments that are uninformative. Skipping this step and its subsequent DNA purification does not significantly reduce the percentage of valid reads (Figure 3A), while it eliminates a DNA purification, a step in which material is lost. Reorganizing the Hi-C library preparation after sonication skips yet another unnecessary purification by using the double-sided bead selection as the purification step itself. All of this improves the performance of the whole protocol by minimizing tube changes and, together with the reduced reaction volumes, adjusted reagent concentrations, and the use of DNA low-binding plasticware, allows the generation of high-quality libraries from as few as 50,000 cells. Keep in mind that the starting cell number partly determines the number of significant interactions detected, because it limits library complexity. Although libraries generated with 50,000 cells retain the cell-type-specific and invariant topological features of more complex libraries53, we recommend using, if possible, more than 100,000 cells per biological replicate to capture a higher number of significant promoter interactions.
The resolution of the interactions detected by 3C techniques is essentially determined by the restriction enzyme used. Here, liCHi-C is described using HindIII, a 6 bp-cutting enzyme that gives a theoretical mean resolution of 4,096 bp. The restriction enzyme can be replaced in liCHi-C, for example, by a 4 bp-cutting enzyme or even by micrococcal nuclease, thus increasing the resolution of the significant interactions detected. Replacing HindIII with the 4 bp-cutting enzyme MboI to generate liCHi-C libraries has been reported to deliver excellent results, detecting nearly twice as many interactions in total, although the mean linear distance of the detected interactions was roughly halved53.
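The 4,096 bp figure follows from the probability of a specific n bp recognition sequence occurring at a given position in a random sequence with equal base frequencies, 1/4^n. The short Python sketch below makes that arithmetic explicit; the function names and the 3.0 Gb genome size are illustrative assumptions, and real genomes deviate from this idealized model because of uneven base composition.

```python
"""Idealized expected fragment size for an n bp recognition site (equal base frequencies)."""


def expected_fragment_length(site_length: int) -> int:
    """Mean spacing between sites: one site every 4**n bp in a random sequence."""
    return 4 ** site_length


def expected_fragment_count(genome_size_bp: float, site_length: int) -> float:
    """Approximate number of restriction fragments under the same idealized model."""
    return genome_size_bp / expected_fragment_length(site_length)


if __name__ == "__main__":
    genome_bp = 3.0e9  # assumed, approximate genome size for illustration
    for enzyme, n in [("HindIII", 6), ("MboI", 4)]:
        length = expected_fragment_length(n)
        count = expected_fragment_count(genome_bp, n)
        print(f"{enzyme}: ~{length:,} bp per fragment, ~{count:,.0f} fragments")
    # → HindIII: ~4,096 bp per fragment, ~732,422 fragments
    # → MboI: ~256 bp per fragment, ~11,718,750 fragments
```

Under this model, moving from a 6 bp to a 4 bp cutter shrinks the mean fragment 16-fold, which is why 4 bp-cutting enzymes increase the resolution of the detected interactions.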
Regarding the RNA probe enrichment system itself, one of the main advantages of this type of capture over antibody-based approaches, such as HiChIP32 or HiCuT33, among others, is the ability to compare conditions independently of protein binding, without relying on the availability of a working antibody for the protein of interest. Furthermore, the RNA enrichment system can be tailored to capture any set of regions across the genome to suit each investigator's needs (probe design is discussed elsewhere35,64).
In addition to antibody-based enrichment methods, several outstanding single-cell (such as scHi-C65, Dip-C66, or Sci-Hi-C67, among others) and low-input methods (such as Low-C68 or Easy Hi-C69) to investigate 3D genome architecture have been developed in recent years. However, these generate sparse, low-resolution contact maps that do not allow the identification of contacts between distal regulatory elements and their target genes. liCHi-C overcomes this limitation, opening the possibility of studying promoter-centric genome architecture in scarce cell types and of advancing our understanding of cellular and developmental biology and disease development.
Despite all of its features, liCHi-C is not without limitations. First, processing the raw sequencing data is not trivial, and solid computational skills are needed to analyze the data and interpret the results. Moreover, liCHi-C does not discriminate between functional and structural interactions; liCHi-C data must be integrated with epigenetic data and/or functional analyses to validate potential functional interactions between gene promoters and their target regulatory elements. Lastly, library complexity is sacrificed when working at the lower end of cell numbers. This is reflected in the number of unique interactions detected compared to liCHi-C libraries from higher cell numbers and in the percentage of reads removed as PCR duplicates, which can reach 80%. However, low-cell-number liCHi-C libraries retain the topological features of higher-cell-number libraries in a more focal manner53, demonstrating that it is feasible to generate liCHi-C libraries that recapitulate the cell's promoter interactome with as few as 50,000 cells.
Overall, liCHi-C is a cost-effective and customizable method to generate high-quality and high-resolution promoter interactome libraries in scarce cell types. It is the first low-input method to map the promoter interactome and call significant loops at restriction-fragment resolution. We foresee that this new tool, like its predecessor PCHi-C, will provide new insights into cell differentiation and organism development, both in health and disease.
The authors have nothing to disclose.
We thank the rest of the members from the Javierre lab for their feedback on the manuscript. We thank CERCA Program, Generalitat de Catalunya, and the Josep Carreras Foundation for institutional support. This work was financed by FEDER/Spanish Ministry of Science and Innovation (RTI2018-094788-A-I00), the European Hematology Association (4823998), and the Spanish Association against Cancer (AECC) LABAE21981JAVI. BMJ is funded by La Caixa Banking Foundation Junior Leader project (LCF/BQ/PI19/11690001), LR is funded by an AGAUR FI fellowship (2019FI-B00017), and LT-D is funded by an FPI Fellowship (PRE2019-088005). We thank the biochemistry and molecular biology PhD program from the Universitat Autònoma de Barcelona for its support. None of the funders were involved at any point in the experimental design or manuscript writing.
0.4 mM Biotin-14-dATP | Invitrogen | 19524-016 | |
0.5 M EDTA pH 8.0 | Invitrogen | AM9260G | |
1 M Tris pH 8.0 | Invitrogen | AM9855G | |
10x NEBuffer 2 | New England Biolabs | B7002S | Referenced as restriction buffer 2 in the manuscript |
10x PBS | Fisher Scientific | BP3994 | |
10x T4 DNA ligase reaction buffer | New England Biolabs | B0202S | |
16% formaldehyde solution (w/v), methanol-free | Thermo Scientific | 28908 | |
20 mg/mL Bovine Serum Albumin | New England Biolabs | B9000S | |
5 M NaCl | Invitrogen | AM9760G | |
5PRIME Phase Lock Gel Light tubes | Quantabio | 2302820 | For phenol-chloroform purification in section 4 (DNA purification). Phase Lock Gel (PLG) tubes are commercial tubes designed to maximize DNA recovery after phenol-chloroform purification while avoiding carryover of contaminants from the organic phase; they contain a resin of intermediate density that settles between the organic and aqueous phases and isolates them. Spin PLG tubes at 12,000 x g for 30 s before use to ensure that the resin is at the bottom of the tube |
Adapters and PCR primers for library amplification | Integrated DNA Technologies | – | Bought as individual primers with PAGE purification for NGS |
Cell scrapers | Nunc | 179693 | Or any other brand |
Centrifuge (fixed-angle rotor for 1.5 mL tubes) | Any brand | ||
CHiCAGO R package | 1.14.0 | ||
CleanNGS beads | CleanNA | CNGS-0050 | |
dATP, dCTP, dGTP, dTTP | Promega | U120A, U121A, U122A, U123A | Or any other brand |
DNA LoBind tube, 1.5 mL | Eppendorf | 30108051 | |
DNA LoBind tube, 2 mL | Eppendorf | 30108078 | |
DNA polymerase I large (Klenow) fragment 5000 units/mL | New England Biolabs | M0210L | |
Dynabeads MyOne Streptavidin C1 beads | Invitrogen | 65002 | For biotin pull-down of the pre-captured library in section 8 (biotin pull-down) |
Dynabeads MyOne Streptavidin T1 beads | Invitrogen | 65602 | For biotin pull-down of the post-captured library in section 11 (biotin pull-down and PCR amplification) |
DynaMag-2 | Invitrogen | 12321D | Or any other magnet suitable for 1.5 mL tubes |
Ethanol absolute | VWR | 20821.321 | |
FBS, qualified | Gibco | 10270-106 | Or any other brand |
Glycine | Fisher BioReagents | BP381-1 | |
GlycoBlue Coprecipitant | Invitrogen | AM9515 | Used for DNA coprecipitation in section 4 (DNA purification) |
HiCUP | 0.8.2 | ||
HindIII, 100 U/µL | New England Biolabs | R0104T | |
IGEPAL CA-630 | Sigma-Aldrich | I8896-50ML | |
Klenow EXO- 5000 units/mL | New England Biolabs | M0212L | |
Low-retention filter tips (10 µL, 20 µL, 200 µL and 1000 µL) | ZeroTip | PMT233010, PMT252020, PMT231200, PMT252000 | |
M220 Focused-ultrasonicator | Covaris | 500295 | |
Micro TUBE AFA Fiber Pre-slit snap cap 6 x 16 mm vials | Covaris | 520045 | For sonication in section 6 (sonication) |
NheI-HF, 100 U/µL | New England Biolabs | R3131M | |
Nuclease-free molecular biology grade water | Sigma-Aldrich | W4502 | |
PCR primers for quality controls | Integrated DNA Technologies | – | |
PCR strips and caps | Agilent Technologies | 410022, 401425 | |
Phenol: Chloroform: Isoamyl Alcohol 25:24:1, Saturated with 10 mM Tris, pH 8.0, 1 mM EDTA | Sigma-Aldrich | P3803 | |
Phusion High-Fidelity PCR Master Mix with HF Buffer | New England Biolabs | M0531L | For amplification of the library in sections 9 (dATP-tailing, adapter ligation and PCR amplification) and 11 (biotin pull-down and PCR amplification) |
Protease inhibitor cocktail (EDTA-free) | Roche | 11873580001 | |
Proteinase K, recombinant, PCR grade | Roche | 3115836001 | |
Qubit 1x dsDNA High Sensitivity kit | Invitrogen | Q33230 | For DNA quantification after precipitation in section 4 (DNA purification) |
Qubit assay tubes | Invitrogen | Q32856 | |
rCutsmart buffer | New England Biolabs | B6004S | |
RPMI Medium 1640 1x + GlutaMAX | Gibco | 61870-010 | Or any other brand |
SDS – Solution 10% for molecular biology | PanReac AppliChem | A0676 | |
Sodium acetate pH 5.2 | Sigma-Aldrich | S7899-100ML | |
SureSelect custom 3-5.9 Mb library | Agilent Technologies | 5190-4831 | Custom designed mouse or human capture system, used for the capture |
SureSelect Target Enrichment Box 1 | Agilent Technologies | 5190-8645 | Used for the capture |
SureSelect Target Enrichment Kit ILM PE Full Adapter | Agilent Technologies | 931107 | Used for the capture |
T4 DNA ligase 1 U/µL | Invitrogen | 15224025 | For ligation in section 3 (ligation and decrosslinking) |
T4 DNA ligase 2,000,000 units/mL | New England Biolabs | M0202T | For ligation in section 9 (dATP-tailing, adapter ligation and PCR amplification) |
T4 DNA polymerase 3000 units/mL | New England Biolabs | M0203L | |
T4 PNK 10000 units/mL | New England Biolabs | M0201L | |
Tapestation 4200 instrument | Agilent Technologies | | For automated electrophoresis in section 9 (dATP-tailing, adapter ligation, and PCR amplification) and section 11 (Biotin pull-down and PCR amplification). Any other automated electrophoresis system is valid |
Tapestation reagents | Agilent Technologies | 5067-5582, 5067-5583, 5067-5584, 5067-5585 | For automated electrophoresis in section 9 (dATP-tailing, adapter ligation, and PCR amplification) and section 11 (Biotin pull-down and PCR amplification). Any other automated electrophoresis system is valid |
Triton X-100 for molecular biology | PanReac AppliChem | A4975 | |
Tween 20 | Sigma-Aldrich | P9416-50ML | |