DNA regulatory elements, such as enhancers, control gene expression by physically contacting target gene promoters, often through long-range chromosomal interactions spanning large genomic distances. Promoter Capture Hi-C (PCHi-C) identifies significant interactions between promoters and distal regions, enabling the assignment of potential regulatory sequences to their target genes.
The three-dimensional organization of the genome is linked to its function. For example, regulatory elements such as transcriptional enhancers control the spatio-temporal expression of their target genes through physical contact, often bridging considerable (in some cases hundreds of kilobases) genomic distances and bypassing nearby genes. The human genome harbors an estimated one million enhancers, the vast majority of which have unknown gene targets. Assigning distal regulatory regions to their target genes is thus crucial to understand gene expression control. We developed Promoter Capture Hi-C (PCHi-C) to enable the genome-wide detection of distal promoter-interacting regions (PIRs), for all promoters in a single experiment. In PCHi-C, highly complex Hi-C libraries are specifically enriched for promoter sequences through in-solution hybrid selection with thousands of biotinylated RNA baits complementary to the ends of all promoter-containing restriction fragments. The aim is to then pull-down promoter sequences and their frequent interaction partners such as enhancers and other potential regulatory elements. After high-throughput paired-end sequencing, a statistical test is applied to each promoter-ligated restriction fragment to identify significant PIRs at the restriction fragment level. We have used PCHi-C to generate an atlas of long-range promoter interactions in dozens of human and mouse cell types. These promoter interactome maps have contributed to a greater understanding of mammalian gene expression control by assigning putative regulatory regions to their target genes and revealing preferential spatial promoter-promoter interaction networks. This information also has high relevance to understanding human genetic disease and the identification of potential disease genes, by linking non-coding disease-associated sequence variants in or near control sequences to their target genes.
Accumulating evidence suggests that the three-dimensional organization of the genome plays an important functional role in a range of nuclear processes, including gene activation1,2,3, repression4,5,6,7,8, recombination9,10, DNA repair11, DNA replication12,13, and cellular senescence14. Distant enhancers are found in close spatial proximity to the promoters they regulate15,16,17, which is essential for proper spatio-temporal gene expression control. Enhancer deletions show that distal enhancers are essential for target gene transcription18,19,20,21,22, and 'forced chromatin looping' demonstrates that engineered tethering between an enhancer and its target promoter in the Hbb locus is sufficient to drive transcriptional activation23. Further, genome rearrangements that bring genes under the control of ectopic enhancers can result in inappropriate gene activation and disease24,25,26. Together, these examples illustrate that promoter-enhancer interactions are essential for gene control and require tight regulation to ensure appropriate gene expression. The human and mouse genomes are each estimated to harbor around one million enhancers. For the vast majority of these enhancers, target genes are unknown, and the 'rules of engagement' between promoters and enhancers are poorly understood. Assigning transcriptional enhancers to their target genes thus remains a major challenge in deciphering mammalian gene expression control.
Our understanding of three-dimensional genome architecture has been revolutionized by the introduction of 3C27 (chromosome conformation capture) and its variants28,29,30,31. The most powerful of these techniques, Hi-C (high-throughput chromosome conformation capture), is designed to identify the entire ensemble of chromosomal interactions within a cell population. Hi-C libraries, typically generated from millions of cells, are highly complex with an estimated 10^11 independent ligation products between ~4 kb fragments in the human genome32. As a consequence, reliable and reproducible identification of interactions between individual restriction fragments (such as those containing a promoter or enhancer) from Hi-C data is not feasible unless Hi-C libraries are subjected to ultra-deep sequencing, which is not an economically viable solution for laboratories preparing Hi-C libraries routinely. To circumvent this shortcoming, we developed Promoter Capture Hi-C to specifically enrich promoter-containing ligation products from Hi-C libraries. We focused on promoters for two reasons. First, promoter-enhancer contacts have been shown to be crucial for proper gene expression levels in numerous studies (see references above), and second, as promoters are largely invariant between cell types, the same capture bait system can be used to interrogate the regulatory circuitry across multiple cell types and conditions. Our approach relies on in-solution hybridization of Hi-C libraries with tens of thousands of biotinylated RNA 120mers complementary to promoter-containing Hi-C ligation products and subsequent capture on streptavidin-coated magnetic beads. This results in PCHi-C libraries with much reduced complexity compared to the original Hi-C library, focusing only on the identification of fragments that are ligated to promoters at significantly high frequencies.
We have used PCHi-C in a number of human and mouse cell types to contribute to a better understanding of gene expression control by uncovering long-range distal promoter interacting regions with putative regulatory function, as well as non-random promoter-promoter contacts in the three-dimensional space of the nucleus. The studies have mapped hundreds of thousands of promoter-enhancer contacts across numerous cell types33,34,35,36,37,38,39, identified Polycomb Repressive Complex-mediated spatial genome organization in mouse embryonic stem cells7, demonstrated large-scale rewiring of promoter interactomes during cellular differentiation37,38,39, and linked non-coding disease-associated sequence variants to gene promoters35.
PCHi-C is an ideally suited method to map the genome-wide ensemble of DNA sequences interacting with promoters. Related approaches, such as Capture Hi-C of continuous genomic regions (see Discussion) are the method of choice to obtain high-resolution interaction profiles for selected genomic regions. PCHi-C and Capture Hi-C are extremely similar from an experimental point of view (the only difference is the choice of capture system), so that the advice and guidelines we provide are applicable to both approaches. Here, we present a detailed description of PCHi-C. We outline the rationale and design of a PCHi-C experiment, provide a step-by-step PCHi-C library generation protocol, and illustrate how the quality of PCHi-C libraries can be monitored at various steps in the protocol to yield high-quality data.
1. Formaldehyde Fixation
2. Cell Lysis
3. HindIII Digestion
4. Biotinylation of Restriction Fragment Overhangs
5. In-nucleus Ligation
6. Crosslink Reversal
7. DNA Purification
8. Quality Controls
9. DNA Fragmentation
10. Double-sided SPRI-bead Size Selection
11. Biotin/Streptavidin Pull-down of Ligation Products
12. End Repair and Removal of Biotin at Non-ligated DNA Ends
13. dATP Tailing
14. Adapter Ligation
15. Hi-C Library Amplification
16. Hybrid In-solution Capture
NOTE: Blocker and buffer (SHS1-4) solutions below are from the SureSelect kit (see Table of Materials).
17. Isolation of Promoter Fragment-containing Ligation Products
NOTE: We recommend performing the following steps with the SureSelect adapter kit and library (see Table of Materials).
18. PCHi-C Library Amplification
Promoter Capture Hi-C has been used to enrich mouse7,34,36,39 and human33,35,37,38 Hi-C libraries for promoter interactions. A similar protocol (named HiCap) has been described by the Sandberg group40. Figure 1A shows the schematic workflow for Promoter Capture Hi-C. In the protocol described here, Hi-C libraries are generated using in-nucleus ligation41, which results in a significantly reduced number of spurious ligation products42. For PCHi-C, highly complex mouse or human Hi-C libraries are subjected to in-solution hybridization and capture using 39,021 biotinylated RNAs complementary to 22,225 mouse promoter-containing HindIII restriction fragments, or 37,608 biotinylated RNAs targeting 22,076 human promoter-containing HindIII restriction fragments, respectively. Promoter-containing restriction fragments can be targeted at either or both ends by individual biotinylated RNAs (Figure 1B). We found that capture of both ends improved coverage of individual promoters (Figure 1C; raw sequence reads) nearly two-fold, as expected. Thus, whenever possible (i.e., in non-repetitive regions), we advise using biotinylated RNAs complementary to both ends of each restriction fragment to be captured.
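The bait and fragment counts quoted above imply roughly how many promoter fragments are targeted at both ends. The following minimal sketch works this out under one explicit assumption that the text does not state outright: every targeted fragment carries exactly one or two baits. The function name is hypothetical and used for illustration only.

```python
# Bait and fragment counts as quoted in the text above.
CAPTURE_SYSTEMS = {
    "mouse": {"baits": 39_021, "fragments": 22_225},
    "human": {"baits": 37_608, "fragments": 22_076},
}


def two_ended_fragments(baits: int, fragments: int) -> int:
    """Number of fragments baited at both ends.

    ASSUMPTION: each fragment has either one or two baits, so every bait
    beyond the first per fragment marks a fragment targeted at both ends.
    """
    extra = baits - fragments
    if not 0 <= extra <= fragments:
        raise ValueError("counts are inconsistent with 1 or 2 baits per fragment")
    return extra


for species, counts in CAPTURE_SYSTEMS.items():
    both = two_ended_fragments(counts["baits"], counts["fragments"])
    share = 100.0 * both / counts["fragments"]
    print(f"{species}: {both} fragments baited at both ends ({share:.1f}%)")
```

Under this assumption, roughly three quarters of the mouse fragments and about 70% of the human fragments would be baited at both ends; the remainder presumably have one end in repetitive sequence.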
To assess PCHi-C library quality at an early stage during library preparation, we perform two controls after DNA ligation and purification, as previously described31. The first is to use specific primer pairs to amplify ligation products as in 3C27. We use primer pairs (Table 1) to amplify cell-type invariant long-range ligation products, such as between the Myc gene and its known enhancers located approximately 2 Mb away (Figure 2A) or between genes of the Hist1 locus (separated by 1.5 Mb), and between two regions located in close linear proximity ('short-range control').
The second quality control is carried out to determine the efficiency of biotin incorporation during Klenow-mediated fill-in of restriction site overhangs with biotin-dATP. Successful Klenow fill-in and subsequent blunt-end ligation results in the disappearance of the original restriction site between the DNA molecules of a ligation product, and in the case of HindIII in the formation of a new NheI recognition site (Figure 2B). The ratio of the HindIII to NheI digested ligation product is a direct readout of biotin incorporation efficiency. A poor quality Hi-C library will show a high level of HindIII digestion, whereas high-quality libraries have near-complete NheI digestion of ligation products (Figure 2B).
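The appearance of the NheI site can be checked at the sequence level. The sketch below is an illustration only (the function name is hypothetical): it builds the top-strand sequence of a ligation junction between two HindIII-cut ends, with and without Klenow fill-in, using the known recognition sequences of HindIII (AAGCTT, cut after the first A) and NheI (GCTAGC).

```python
HINDIII_SITE = "AAGCTT"  # cut as A^AGCTT, leaving a 5'-AGCT overhang
NHEI_SITE = "GCTAGC"
CUT_OFFSET = 1           # HindIII cuts after the first base of its site


def ligation_junction(filled_in: bool) -> str:
    """Top-strand sequence across a junction of two HindIII-cut ends."""
    left_recessed = HINDIII_SITE[:CUT_OFFSET]          # "A"
    overhang = HINDIII_SITE[CUT_OFFSET:CUT_OFFSET + 4]  # "AGCT"
    right_end = HINDIII_SITE[CUT_OFFSET:]               # "AGCTT"
    if filled_in:
        # Klenow extends the recessed 3' end across the overhang,
        # and the two blunt ends are then ligated.
        left_end = left_recessed + overhang             # "AAGCT"
    else:
        # Sticky ends re-anneal: the overhang is shared, not duplicated.
        left_end = left_recessed
    return left_end + right_end


for filled in (False, True):
    junction = ligation_junction(filled)
    print(
        f"fill-in={filled}: {junction} "
        f"HindIII={'yes' if HINDIII_SITE in junction else 'no'} "
        f"NheI={'yes' if NHEI_SITE in junction else 'no'}"
    )
```

Without fill-in the junction regenerates AAGCTT and remains HindIII-sensitive; with fill-in it reads AAGCTAGCTT, which has lost the HindIII site and gained GCTAGC.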
After Hi-C library preparation (i.e., after biotin-streptavidin pull down of size-selected Hi-C ligation products, adapter ligation and pre-capture PCR), the integrity and size distribution of the Hi-C library is assessed by Bioanalyzer (Figure 2C). The same control is carried out at the end of PCHi-C library preparation (i.e., after hybridization capture of promoter-containing ligation products and post-capture PCR). Comparison of the Hi-C and PCHi-C Bioanalyzer profiles shows that as expected, Hi-C libraries are much more concentrated than the corresponding PCHi-C libraries, but the size distribution of the libraries is highly similar, indicating that the capture step in PCHi-C does not introduce a size bias (Figure 2C, D).
After paired-end sequencing, the PCHi-C reads are mapped, quality controlled and filtered using the HiCUP pipeline43. High-quality PCHi-C libraries contain 70–90% 'valid pairs' (i.e., paired-end sequence reads between two restriction fragments that are not neighboring on the linear genomic map; Figure 3A, B). Using the in-nucleus ligation protocol41,42, the percentage of trans read pairs (i.e., paired-end sequence reads between two restriction fragments that are located on different chromosomes) is usually low, between 5 and 25%, reflecting the existence of chromosome territories and indicating good library quality. Direct comparison of the percentage of 'valid pairs' between Hi-C libraries and their corresponding PCHi-C libraries35 shows that in all cases the percentage of valid pairs is higher in the PCHi-C libraries (Figure 3B). This is accompanied by a reduction in the percentage of non-valid 'same fragment internal' reads in PCHi-C (Figure 3C). This is expected, as the capture step not only enriches for promoter-containing ligation products, but also for restriction fragment ends, due to the position of the capture oligos on the restriction fragments (see Figure 1B).
After HiCUP filtering, we determine the capture efficiency. PCHi-C libraries contain three types of valid sequence reads after HiCUP filtering:
1.) Promoter: genome reads (i.e., reads between a captured promoter fragment and a non-promoter HindIII restriction fragment anywhere in the genome)
2.) Promoter: promoter reads (reads between two captured promoter fragments)
3.) Genome: genome reads (background Hi-C ligation products where neither of the ligation product partners maps to a captured promoter). These are discarded prior to downstream analyses.
High-quality PCHi-C libraries have capture efficiencies (sum of categories 1 and 2 above) of 65–90% (Figure 3D). A direct comparison to Hi-C libraries shows that PCHi-C results in a ~15-fold enrichment for promoter-containing ligation products (Figure 3D), in some cases 17-fold. This is close to the hypothetical maximum (19.6-fold) enrichment for PCHi-C, which is dependent on the percentage of the genome restriction fragments covered by the capture system. Greater enrichment can be achieved by designing capture systems targeting fewer restriction fragments44,45,46.
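The three read categories and the capture efficiency can be illustrated with a short sketch. This is not the HiCUP pipeline or the authors' analysis code; the function names, fragment identifiers and toy numbers are hypothetical, and read pairs are assumed to be already reduced to pairs of restriction fragment IDs.

```python
from collections import Counter

CATEGORIES = ("genome:genome", "promoter:genome", "promoter:promoter")


def classify_pair(frag_a: str, frag_b: str, baited: set) -> str:
    """Assign a valid read pair to one of the three categories."""
    n_baited = (frag_a in baited) + (frag_b in baited)
    return CATEGORIES[n_baited]


def capture_efficiency(pairs, baited: set) -> float:
    """Percentage of valid pairs with at least one baited promoter fragment."""
    counts = Counter(classify_pair(a, b, baited) for a, b in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no read pairs supplied")
    captured = counts["promoter:genome"] + counts["promoter:promoter"]
    return 100.0 * captured / total


def fold_enrichment(pchic_pct: float, hic_pct: float) -> float:
    """Enrichment of promoter-containing pairs in PCHi-C relative to Hi-C."""
    return pchic_pct / hic_pct


# Toy data: P* are baited promoter fragments, G* are other fragments.
baited_fragments = {"P1", "P2"}
toy_pairs = [("P1", "G1"), ("P1", "P2"), ("G1", "G2"), ("G3", "P2")]

efficiency = capture_efficiency(toy_pairs, baited_fragments)
print(f"capture efficiency: {efficiency:.1f}%")
# Hypothetical Hi-C baseline of 5% promoter-containing pairs.
print(f"fold enrichment: {fold_enrichment(efficiency, 5.0):.1f}")
```

In this framing, and assuming the maximum corresponds to a capture efficiency of 100%, the largest possible enrichment is simply 100 divided by the baseline percentage in the uncaptured Hi-C library.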
Analysis of promoter interactomes demonstrates cell type and lineage-specificity33,34,35, with pronounced changes during cellular differentiation37,38,39. Figures 4 and 5 show examples of lineage specificity and differentiation dynamics at specific promoters. For example, ALAD is constitutively expressed in all cells but its expression is upregulated in erythroblasts47. The ALAD promoter contacts several distal fragments in all hematopoietic cells and engages in additional interactions specifically in erythroblasts (Figure 4). IL-8 shows no statistically significant interactions in B cells, very few interactions in T cells, but dozens of interactions in cells of the myeloid lineage, including cell-type specific interactions in monocytes, neutrophils and megakaryocytes (Figure 5). These examples demonstrate how PCHi-C can be used to unravel cell-type specific interactomes and identify promoter-interacting regions with regulatory potential.
Figure 1: Promoter Capture Hi-C rationale and capture bait design. (A) Schematic workflow of PCHi-C. In-nucleus ligation Hi-C41,42 (I) is followed by in-solution hybridization with biotinylated RNA baits (II) targeting the restriction fragments of all human (depicted here) or mouse gene promoters (III). (B) Bait design for PCHi-C. Biotinylated RNA capture baits (red curved lines) are designed against the ends of promoter-containing restriction fragments (grey; note that the promoter sequences themselves (red) are only targeted by the RNA capture baits if they are located at restriction fragment ends). Ligation products consisting of promoter-containing restriction fragments (grey) and their interacting restriction fragments (yellow and green) are isolated through sequence-complementarity hybridization between RNA bait and DNA target, and subsequent biotin-streptavidin pulldown, as shown in A. (C) Comparison of PCHi-C capture efficiency for promoter-containing restriction fragments targeted by one RNA bait capture probe vs two RNA bait capture probes (see schematic in B).
Figure 2: PCHi-C pre-sequencing quality controls. (A) Left, schematic of spatial juxtaposition between promoter and PIR, resulting in a Hi-C ligation product consisting of a promoter-containing restriction fragment (grey; promoter sequence in red) and a PIR restriction fragment (yellow). Right, DNA gel electrophoresis showing examples of Hi-C ligation products amplified using specific primer pairs (as depicted in schematic on the left). (B) Left, representative examples of HindIII, NheI and HindIII/NheI restriction digests of Hi-C ligation products (PCR products shown in A). Right, schematic of DNA sequence after Hi-C ligation following unsuccessful (top) or successful (bottom) dNTP Klenow fill-in of restriction junctions and subsequent ligation. (C) Representative Hi-C library bioanalyzer profile (1/5 dilution). (D) Representative PCHi-C library bioanalyzer profile (no dilution).
Figure 3: PCHi-C post-sequencing quality controls. (A) Comparison of percentage valid sequence read pairs after HiCUP43 processing in PCHi-C vs corresponding Hi-C libraries (data from Javierre et al., 201635). (B) Representative HiCUP PCHi-C result showing valid read pairs, and other sequence categories that are discarded prior to downstream analyses (data from Javierre et al., 201635). (C) Comparison of percentage 'same fragment internal' reads after HiCUP processing in PCHi-C vs corresponding Hi-C libraries (data from Javierre et al., 201635). (D) Comparison of percentage sequence reads involving baited promoter fragments (capture efficiency) in PCHi-C vs corresponding Hi-C libraries (data from Javierre et al., 201635).
Figure 4: ALAD PCHi-C profile in human hematopoietic cells. Promoter interactions of myeloid cell types are shown as blue arches, and promoter interactions of lymphoid cell types are shown as purple arches. Erythroblast-specific interactions are indicated by red arrows (data from Javierre et al., 201635).
Figure 5: IL8 PCHi-C profile in human hematopoietic cells. Promoter interactions of myeloid cell types are shown as blue arches, and promoter interactions of lymphoid cell types are shown as purple arches. Monocyte-specific interactions are indicated by green arrows, neutrophil-specific interactions are indicated by red arrows, and a megakaryocyte-specific interaction is indicated by a brown arrow (data from Javierre et al., 201635).
Human | ||||||||
Name | Sequence | Chromosome | Strand | Start GRCh38/hg38 | End GRCh38/hg38 | Primer combinations to test 3C interactions and biotin incorporation | ||
hs AHF64 Dekker | GCATGCATTAGCCTCTGCTGTTCTCTGAAATC | 11 | + | 116803960 | 116803991 | use in combination with hs AHF66 Dekker | ||
hs AHF66 Dekker | CTGTCCAAGTACATTCCTGTTCACAAACCC | 11 | + | 116810219 | 116810248 | use in combination with hs AHF64 Dekker | ||
hs MYC locus | GGAGAACCGGTAATGGCAAA | 8 | – | 127733814 | 127733833 | use in combination with hs MYC +1820 or hs MYC -538 | ||
hs MYC +1820 | AAAATGCCCATTTCCTTCTCC | 8 | + | 129554527 | 129554547 | use in combination with hs MYC locus | ||
hs MYC -538 | TGCCTGATGGATAGTGCTTTC | 8 | – | 127195696 | 127195716 | use in combination with hs MYC locus | ||
hs HIST1 F | AAGCAGGAAAAGGCATAGCA | 6 | + | 26207174 | 26207193 | use in combination with hs HIST1 R | ||
hs HIST1 R | TCTTGGGTTGTGGGACTTTC | 6 | + | 27771575 | 27771594 | use in combination with hs HIST1 F | ||
Mouse | ||||||||
Sequence | Chromosome | Strand | Start GRCm38/mm10 | End GRCm38/mm10 | Primer combinations to test 3C interactions and biotin incorporation | |||
TCATGAGTTCCCCACATCTTTG | 8 | + | 84841090 | 84841111 | use in combination with mm Calr2 | |||
CTGTGGGCACCAGATGTGTAAAT | 8 | + | 84848519 | 84848541 | use in combination with mm Calr1 | |||
TATCAAGGGTGCCCGTCACCTTCAGC | 6 | + | 125163098 | 125163123 | use in combination with Gapdh4 Dekker | |||
GGGCTTTTATAGCACGGTTATAAAGT | 6 | + | 125163774 | 125163799 | use in combination with Gapdh3 Dekker | |||
GGAGGAGGGAAAAGGAGTGATT | 6 | + | 52212829 | 52212850 | use in combination with mm Hoxa13 | |||
CAGGCATTATTTGCTGAGAACG | 6 | – | 52253490 | 52253511 | use in combination with mm Hoxa7 | |||
GGGTAATGGTGTCACTAACTGG | 13 | + | 23571284 | 23571305 | use in combination with mm Hist1h3e or mm Hist1h4i | |||
GGGTTTGATGAGTTGGTGAAG | 13 | + | 23566541 | 23566561 | use in combination with mm Hist1h2ae | |||
TTGGGCCAAAGCCTATATGA | 13 | + | 22043085 | 22043104 | use in combination with mm Hist1h2ae |
Table 1: Primer sequences for quality control of human and mouse Hi-C libraries.
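As a quick sanity check of the human primer pairs, the linear genomic separation between partners can be computed from the start coordinates in Table 1. The sketch below is illustrative only; the dictionary layout and function name are hypothetical, while the coordinates are copied from the table (GRCh38/hg38).

```python
# (chromosome, start coordinate) for the human primers in Table 1.
PRIMERS = {
    "hs AHF64 Dekker": ("11", 116_803_960),
    "hs AHF66 Dekker": ("11", 116_810_219),
    "hs MYC locus": ("8", 127_733_814),
    "hs MYC +1820": ("8", 129_554_527),
    "hs MYC -538": ("8", 127_195_696),
    "hs HIST1 F": ("6", 26_207_174),
    "hs HIST1 R": ("6", 27_771_575),
}

# Primer combinations as listed in the last column of Table 1.
PAIRS = [
    ("hs AHF64 Dekker", "hs AHF66 Dekker"),
    ("hs MYC locus", "hs MYC +1820"),
    ("hs MYC locus", "hs MYC -538"),
    ("hs HIST1 F", "hs HIST1 R"),
]


def separation_bp(primer_a: str, primer_b: str) -> int:
    """Linear distance in bp between the start positions of two primers."""
    chrom_a, start_a = PRIMERS[primer_a]
    chrom_b, start_b = PRIMERS[primer_b]
    if chrom_a != chrom_b:
        raise ValueError("primers are on different chromosomes")
    return abs(start_a - start_b)


for a, b in PAIRS:
    print(f"{a} / {b}: {separation_bp(a, b) / 1000:.1f} kb")
```

The MYC pairs come out at about 1,821 kb and 538 kb, matching the primer names, and the HIST1 pair at about 1.56 Mb. The AHF64/AHF66 pair is only about 6.3 kb apart, which is consistent with it serving as a short-range control, although the table itself does not label it as such.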
Modular design of Promoter Capture Hi-C
Promoter Capture Hi-C is designed to specifically enrich Hi-C libraries for interactions involving promoters. These interactions comprise only a subset of ligation products present in a Hi-C library.
Capture Hi-C can easily be modified to enrich Hi-C libraries for any genomic region or regions of interest by changing the capture system. Capture regions can be continuous genomic segments44,45,46,48, enhancers that have been identified in PCHi-C ('Reverse Capture Hi-C'35), or DNase I hypersensitive sites49. The size of the capture system can be adjusted depending on the experimental scope. For example, Dryden et al. target 519 bait fragments in three gene deserts associated with breast cancer44. The capture system by Martin et al. targets both continuous genomic segments ('Region Capture': 211 genomic regions in total; 2,131 restriction fragments) and selected promoters (3,857 gene promoters)45.
SureSelect libraries are available in different size ranges: 1 kb to 499 kb (5190-4806), 500 kb to 2.9 Mb (5190-4816), and 3 Mb to 5.9 Mb (5190-4831). As each individual capture biotin-RNA is 120 nucleotides long, these capture systems accommodate a maximum of 4,158, 24,166 and 49,166 individual capture probes, respectively. This corresponds to 2,079, 12,083, and 24,583 targeted restriction fragments, respectively. Note that the numbers for restriction fragments are lower bounds based on the assumption that two individual capture probes can be designed for every restriction fragment; in reality, because of repetitive sequences, this will not be the case for every restriction fragment (see also Figure 1B, C), so a constant number of available capture probes can target a higher number of restriction fragments.
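The probe and fragment numbers above follow from simple integer arithmetic, sketched here for anyone sizing a custom capture system. The function names are hypothetical; the 120-nucleotide bait length and the upper size limits of each library tier are taken from the text.

```python
BAIT_LENGTH_NT = 120

# Upper size limit of each library tier, in bp, as quoted in the text.
LIBRARY_TIERS_BP = {
    "1 kb to 499 kb": 499_000,
    "500 kb to 2.9 Mb": 2_900_000,
    "3 Mb to 5.9 Mb": 5_900_000,
}


def max_probes(capacity_bp: int) -> int:
    """Largest number of whole 120-nt baits that fit into the capacity."""
    return capacity_bp // BAIT_LENGTH_NT


def min_fragments(capacity_bp: int) -> int:
    """Lower bound on targeted fragments, assuming two baits per fragment."""
    return max_probes(capacity_bp) // 2


for tier, capacity in LIBRARY_TIERS_BP.items():
    print(
        f"{tier}: up to {max_probes(capacity):,} probes, "
        f"at least {min_fragments(capacity):,} fragments"
    )
```

Because some fragments can only be baited at one end, the real number of targetable fragments lies between `min_fragments` and `max_probes` for a given tier.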
The protocol described here is based on the use of a restriction enzyme with a 6 bp recognition site to uncover long-range interactions. Using a restriction enzyme with a 4 bp recognition site for greater resolution of more proximal interactions is also possible40,49.
Limitations of PCHi-C
One inherent limitation of all chromosome conformation capture assays is that their resolution is determined by the restriction enzyme used for the library generation. Interactions that occur between DNA elements located on the same restriction fragment are invisible to 'C-type' assays. Further, in PCHi-C, in some cases more than one transcription start site can be located on the same promoter-containing restriction fragment, and PIRs in some cases harbor both active and repressive histone marks, making it difficult to pinpoint which regulatory elements mediate the interactions, and to predict the regulatory output of promoter interactions. Using restriction enzymes with 4 bp recognition sites mitigates this issue but comes at the expense of vastly increased Hi-C library complexity (Hi-C libraries generated with 4 bp recognition site restriction enzymes are at least 100 times more complex than Hi-C libraries generated with 6 bp recognition site restriction enzymes), and the associated costs for next generation sequencing.
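A back-of-the-envelope calculation shows why a 4 bp cutter inflates library complexity so strongly. The sketch below rests on assumptions not made in the text: a random sequence with equal base frequencies (so a site of length k occurs on average every 4^k bp), a haploid human genome of roughly 3.1 Gb, and complexity taken as the number of possible pairwise fragment combinations. Function names are hypothetical.

```python
HUMAN_GENOME_BP = 3_100_000_000  # ASSUMPTION: approximate haploid genome size


def expected_fragment_length(site_bp: int) -> int:
    """Mean fragment length for a cutter with a site of `site_bp` bases,
    ASSUMING random sequence with equal base frequencies."""
    return 4 ** site_bp


def possible_pairs(genome_bp: int, site_bp: int) -> int:
    """Number of distinct fragment pairs that could in principle be ligated."""
    n_fragments = genome_bp // expected_fragment_length(site_bp)
    return n_fragments * (n_fragments - 1) // 2


def relative_complexity(site_small: int, site_large: int) -> float:
    """Fold increase in pairwise combinations when switching cutters.

    Fragment number scales inversely with fragment length, and pairwise
    combinations scale with the square of the fragment number.
    """
    ratio = expected_fragment_length(site_large) / expected_fragment_length(site_small)
    return ratio ** 2


print(f"6 bp cutter, mean fragment: {expected_fragment_length(6)} bp")
print(f"4 bp cutter, mean fragment: {expected_fragment_length(4)} bp")
print(f"possible pairs (6 bp cutter): {possible_pairs(HUMAN_GENOME_BP, 6):.2e}")
print(f"complexity ratio 4 bp vs 6 bp: {relative_complexity(4, 6):.0f}x")
```

Under these simplifications the 4 bp cutter gives 16 times more fragments and therefore 256 times more possible pairs, in line with the "at least 100 times" stated above, and the pair count for a 6 bp cutter falls in the same order of magnitude as the estimate given in the introduction.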
Another limitation is that the current PCHi-C protocol requires millions of cells as starting material, precluding the analysis of promoter interactions in rare cell types. A modified version of PCHi-C to enable the interrogation of promoter contacts in cell populations with 10,000 to 100,000 cells (for example cells during early embryonic development or hematopoietic stem cells) would therefore be a valuable addition to the Capture Hi-C toolbox.
Finally, like all methods that rely on formaldehyde fixation, PCHi-C only records interactions that are 'frozen' at the time point of fixation. Thus, to study the kinetics and dynamics of promoter interactions, methods such as super-resolution live cell microscopy are required alongside PCHi-C.
Methods to dissect spatial chromosome organization at high resolution
The vast complexity of chromosomal interaction libraries prohibits the reliable identification of interaction products between two specific restriction fragments with statistical significance. To circumvent this problem, sequence capture has been used to enrich either Hi-C33,34,40,44 or 3C50,51 libraries for specific interactions. The major advantage of using Hi-C libraries over 3C libraries for the enrichment step is that Hi-C, unlike 3C, includes an enrichment step for genuine ligation products. As a consequence, the percentage of valid reads in PCHi-C libraries is approximately 10-fold higher than in Capture-C libraries50, which contained around 5–8% valid reads after HiCUP filtering. Sahlen et al. have directly compared Capture-C to HiCap, which like PCHi-C uses Hi-C libraries for capture enrichment, in contrast to Capture-C which uses 3C libraries. Consistent with our findings, they found that Capture-C libraries are mainly composed of un-ligated fragments40. In addition, HiCap libraries had a higher complexity than Capture-C libraries40.
A variant of Capture-C, called next-generation Capture-C52 (NG Capture-C), uses one oligo per restriction fragment end, as previously established in PCHi-C33,34, instead of the overlapping probes used in the original Capture-C protocol50. This modestly increases the percentage of valid reads compared to Capture-C, but NG Capture-C employs two sequential rounds of capture enrichment and a relatively high number of PCR cycles (20 to 24 cycles in total, compared to typically 11 cycles for PCHi-C), which inevitably results in higher numbers of sequence duplicates and lower library complexity. In trial experiments during the optimization of PCHi-C, we found that the percentage of unique (i.e., not duplicated) read pairs was only around 15% when we used 19 PCR cycles (13 cycles pre-capture + 6 cycles post-capture; data not shown), whereas a lower number of PCR cycles typically yields 75–90% unique read pairs. Thus, reducing the number of PCR cycles substantially increases the amount of informative sequence data.
A recent method combines ChIP with Hi-C to focus on chromosomal interactions mediated by a specific protein of interest (HiChIP53). Compared to ChIA-PET54, which is based on a similar rationale, HiChIP data contains a higher number of informative sequence reads, allowing for higher-confidence interaction calling53. It will be very interesting to compare corresponding HiChIP and Capture Hi-C data sets directly once they become available (for example, HiChIP using an antibody against the cohesin subunit Smc1a53 with Capture Hi-C for all Smc1a-bound restriction fragments). One inherent difference between these two approaches is that Capture Hi-C does not rely on chromatin immunoprecipitation, and therefore is capable of interrogating chromosomal interactions irrespective of protein occupancy. This enables comparison of 3D genome organization in the presence or absence of specific factor binding, as has been used to identify PRC1 as a key regulator of mouse ESC spatial genome architecture7.
PCHi-C and GWAS
Genome-wide association studies (GWAS) have revealed that greater than 95% of disease-associated sequence variants are located in non-coding regions of the genome, often at great distances to protein-coding genes55. GWAS variants are often found in close proximity to DNase I hypersensitive sites, which is a hallmark of sequences with potential regulatory activity. PCHi-C and Capture Hi-C have been used extensively to link promoters to GWAS risk loci implicated in breast cancer44, colorectal cancer48, and autoimmune disease35,45,46. A PCHi-C study on 17 different human hematopoietic cell types found SNPs associated with autoimmune disease were enriched in PIRs in lymphoid cells, whereas sequence variants associated with platelet and red blood cell specific traits were predominantly found in the macrophages and erythroblasts, respectively35,56. Thus, tissue-type specific promoter interactomes uncovered by PCHi-C may help to understand the function of non-coding disease-associated sequence variants and identify new potential disease genes for therapeutic intervention.
Characteristics of promoter-interacting regions
Several lines of evidence link promoter interactomes to gene expression control. First, several PCHi-C studies have demonstrated that genomic regions interacting with promoters of (highly) expressed genes are enriched in marks associated with enhancer activity, such as H3K27 acetylation and p300 binding33,34,37. We found a positive correlation between gene expression level and the number of interacting enhancers, suggesting that additive effects of enhancers result in increased gene expression levels34,35. Second, naturally occurring expression quantitative trait loci (eQTLs) are enriched in PIRs that are connected to the same genes whose expression is affected by the eQTLs35. Third, by integrating TRIP57 and PCHi-C data, Cairns et al. found that TRIP reporter genes mapping to PIRs in mouse ESCs show stronger reporter gene expression than reporter genes at integration sites in non-promoter-interacting regions58, indicating that PIRs possess transcriptional regulatory activity. Together, these findings suggest that promoter interactomes uncovered by PCHi-C in various mouse and human cell types include key regulatory modules for gene expression control.
It is worth noting that enhancers represent only a small fraction (~20%) of all PIRs uncovered by PCHi-C33,34. Other PIRs could have structural or topological roles rather than direct transcriptional regulatory functions. However, there is also evidence that PCHi-C may uncover DNA elements with regulatory function that do not harbor classical enhancer marks. In a human lymphoid cell line, the BRD7 promoter was found to interact with a region devoid of enhancer marks that was shown to possess enhancer activity in reporter gene assays33. Regulatory elements with similar characteristics may be more abundant than currently appreciated. For example, a CRISPR-based screen for regulatory DNA elements identified unmarked regulatory elements (UREs) that control gene expression but are devoid of enhancer marks59.
In other cases, PIRs have been shown to harbor chromatin marks associated with transcriptional repression. PIRs and interacting promoters bound by PRC1 in mouse ESCs were engaged in an extensive spatial network of repressed genes bearing the repressive mark H3K27me3 (ref. 7). In human lymphoblastoid cells, a distant element interacting with the BCL6 promoter repressed transgene reporter gene expression33, suggesting that it may function to repress BCL6 transcription in its native context.
PIRs enriched for occupancy of the chromatin insulator protein CTCF in human ESCs and NECs37 may represent yet another class of PIRs. Collectively, these results suggest that PIRs harbor a collection of gene regulatory activities yet to be functionally characterized.
The authors have nothing to disclose.
We thank Valeriya Malysheva for critical reading of the manuscript and expert help with Figure 1. This work was supported by the Medical Research Council, UK (MR/L007150/1) and the UK Biotechnology and Biological Sciences Research Council (BB/J004480/1).
16% (vol/vol) paraformaldehyde solution | Agar Scientific | R1026 | |
Dulbecco's Modified Eagle Medium (DMEM) 1x | Life Technologies | 41965-039 | |
Fetal bovine serum (FBS) sterile filtered | Sigma | F9665 | |
Low-retention filter tips | Starlab | S1180-3810, S1180-1810, S1180-8810 and S1182-1830 | |
10x PBS pH 7.4 | Life Technologies | 70011-036 | |
Molecular biology grade water | Sigma-Aldrich | W4502 | |
1 M Tris-HCl pH 8.0 | Life Technologies | 15568-025 | |
IGEPAL CA-630 | Sigma-Aldrich | I8896 | |
5 M NaCl | Life Technologies | 24740-011 | |
Protease inhibitor cocktail (EDTA-free) | Roche Diagnostics | 11873580001 | |
Restriction buffer 2 (10x NEBuffer 2) | New England Biolabs | B7002 | |
DNA LoBind tube, 1.5 mL | Eppendorf | 0030 108.051 | |
DNA LoBind tube, 2 mL | Eppendorf | 30108078 | |
20% (wt/vol) SDS | Bio-Rad Laboratories | 161-0418 | |
20% (vol/vol) Triton X-100 | Sigma-Aldrich | T8787 | |
HindIII, 100 U/uL | New England Biolabs | R0104 | |
10 mM dCTP | Life Technologies | 18253-013 | |
10 mM dGTP | Life Technologies | 18254-011 | |
10 mM dTTP | Life Technologies | 18255-018 | |
0.4 mM Biotin-14-dATP | Life Technologies | 19524-016 | |
DNA polymerase I large (Klenow) fragment 5000 units/mL | New England Biolabs | M0210 | |
10x T4 DNA ligase reaction buffer | New England Biolabs | B0202 | |
100x 10mg/ml Bovine Serum Albumin | New England Biolabs | B9001 | |
T4 DNA ligase, 1 U/μL | Invitrogen | 15224-025 | |
RNase A | Roche | 10109142001 | |
Proteinase K, recombinant, PCR grade | Roche | 3115836001 | |
20 000×g 50 ml centrifuge tube | VWR | 525-0156 | |
0.5 M EDTA pH 8.0 | Life Technologies | 15575-020 | |
Phenol pH 8.0 | Sigma | P4557 | |
Phenol: Chloroform: Isoamyl Alcohol 25:24:1 | Sigma | P3803 | |
Sodium acetate pH 5.2 | Sigma | S7899 | |
Quant-iT PicoGreen | Invitrogen | P7589 | |
QIAquick Gel Extraction Kit | Qiagen | 28704 | |
QIAquick PCR Purification Kit | Qiagen | 28104 | |
Restriction buffer 2.1 (10x NEBuffer 2.1) | New England Biolabs | B7202 | |
NheI, 100U/uL | New England Biolabs | R0131 | |
Micro TUBE AFA Fiber Pre-slit snap cap 6x16mm vials | Covaris | 520045 | For sonication |
SPRI beads (Agencourt AMPure XP) | Beckman Coulter | A63881 | |
Dynabeads MyOne Streptavidin C1 beads | Invitrogen | 65001 | |
Tween 20 | Sigma | P9416 | |
10 mM dATP | Life Technologies | 18252-015 | |
T4 DNA polymerase 3000 units/mL | New England Biolabs | M0203 | |
T4 PNK 10000 units/mL | New England Biolabs | M0201 | |
Klenow exo minus 5000 units/mL | New England Biolabs | M0212 | |
Quick ligation reaction buffer | New England Biolabs | B6058 | |
NEB DNA Quick ligase | New England Biolabs | M2200 | |
PE adapter 1.0 (5'-P-GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG-3') | Illumina | | |
PE adapter 2.0 (5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3') | Illumina | | |
NEB Phusion PCR kit | New England Biolabs | M0530 | |
PE PCR primer 1.0 (5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3') | Illumina | | |
PE PCR primer 2.0 (5'-CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT-3') | Illumina | | |
PCR strips | Agilent Technologies | 410022 and 401425 | |
SureSelect SSEL TE Reagent ILM PE full adaptor kit | Agilent Technologies | 931108 | |
SureSelect custom 3-5.9 Mb library | Agilent Technologies | 5190-4831 | custom design mouse or human PCHi-C system |
Dynabeads MyOne Streptavidin T1 beads | Invitrogen | 65601 | |
E220 high-performance focused ultra-sonicator | Covaris | E220