The identification of physical interactions between genes and regulatory elements is challenging but has been facilitated by chromosome conformation capture methods. This modification to the 4C-seq protocol mitigates PCR bias by minimizing over-amplification of PCR templates and maximizes the mappability of reads by incorporating an additional restriction enzyme digest step.
The identification of regulatory elements for a given target gene poses a significant technical challenge owing to the variability in the positioning and effect sizes of regulatory elements relative to a target gene. Some progress has been made with the bioinformatic prediction of the existence and function of proximal epigenetic modifications associated with activated gene expression using conserved transcription factor binding sites. Chromatin conformation capture studies have revolutionized our ability to discover physical chromatin contacts between sequences and even within an entire genome. Circular chromatin conformation capture coupled with next-generation sequencing (4C-seq), in particular, is designed to discover all possible physical chromatin interactions for a given sequence of interest (viewpoint), such as a target gene or a regulatory enhancer. Current 4C-seq strategies directly sequence from within the viewpoint but require numerous and diverse viewpoints to be simultaneously sequenced to avoid the technical challenges of uniform base calling (imaging) with next-generation sequencing platforms. This volume of experiments may not be practical for many laboratories. Here, we report a modified approach to the 4C-seq protocol that incorporates both an additional restriction enzyme digest and qPCR-based amplification steps that are designed to facilitate a greater capture of diverse sequence reads and mitigate the potential for PCR bias, respectively. Our modified 4C method is amenable to the standard molecular biology lab for assessing chromatin architecture.
The identification of regulatory elements for gene expression has been facilitated by the Encyclopedia of DNA Elements (ENCODE) Project that comprehensively annotated functional activity for 80% of the human genome1,2. The identification of the sites for in vivo transcription factor binding, DNaseI hypersensitivity, and epigenetic histone and DNA methylation modifications in individual cell types paved the way for the functional analyses of candidate regulatory elements for target gene expression. Armed with these findings, we are faced with the challenge of determining the functional interconnectivity between regulatory elements and genes. Specifically, what is the relationship between a given target gene and its enhancer(s)? The chromatin conformation capture (3C) method directly addresses this question by identifying physical, and likely functional, interactions between a region of interest and candidate interacting sequences through captured events in fixed chromatin3. As our understanding of chromatin interactions has increased, however, it is clear that the investigation of preselected candidate loci is insufficient to provide a complete understanding of gene-enhancer interactions. For example, ENCODE used the high-throughput chromosomal conformation capture carbon copy (5C) method to examine a small portion of the human genome (1%, pilot set of 44 loci) and reported complex interconnectivity of the loci. Genes and enhancers with identified interactions averaged 2–4 different interacting partners, many of which were hundreds of kilobases away in linear space4. Further, Li et al. used Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) to analyze whole-genome promoter interactions and found that 65% of RNA polymerase II binding sites were involved in chromatin interactions. Some of these interactions resulted in large, multi-gene complexes spanning hundreds of kilobases of genomic distance and containing, on average, 8-9 genes each5. 
Together, these findings highlight the need for unbiased whole-genome methods for interrogating chromatin interactions. Some of these methods are reviewed in Schmitt et al.6.
More recent methods for chromatin conformation capture studies coupled with next-generation sequencing (Hi-C and 4C-seq) enable the discovery of unknown sequences interacting with a region of interest6. Specifically, circular chromosome conformation capture with next-generation sequencing (4C-seq) was developed to identify loci interacting with a sequence of interest in an unbiased manner7 by sequencing DNA from captured chromatin proximal to the region of interest in 3D space. Briefly, chromatin is fixed to preserve its native protein-DNA interactions, cleaved with a restriction enzyme, and subsequently ligated under dilute conditions to capture biologically relevant "tangles" of interacting loci (Figure 1). The cross-links are reversed to remove the protein, thus leaving the DNA available for additional cleavage with a second restriction enzyme. A final ligation generates smaller circles of interacting loci. Primers to the sequence of interest are then used to generate an amplified library of unknown sequences from the circularized fragments, followed by downstream next-generation sequencing.
The protocol presented here, which focuses on sample preparation, makes two major alterations to existing 4C-seq methods8,9,10,11,12. First, it uses a qPCR-based method to empirically determine the optimal number of amplification cycles for 4C-seq library preparation steps and thus mitigates the potential for PCR bias stemming from over-amplification of libraries. Second, it uses an additional restriction digest step to reduce the uniformity of known "bait" sequences, which hinders accurate base calling by the sequencing instrument, and hence maximizes the unique, informative sequence in each read. Other protocols circumvent this issue by pooling many (12-15)8 4C-seq libraries with different bait sequences and/or restriction sites, a volume of experiments that may not be achievable in many laboratories. The modifications presented here allow a small number of experiments, samples, and/or replicates to be indexed and pooled into a single lane.
1. Restriction Enzyme Selection
2. Design and Test Primers for Inverse PCR and Digestion Efficiency qPCR
3. Collection of Cells
4. Formaldehyde Cross-linking of Cells to Preserve Chromatin Interactions
5. Cell Lysis
6. First Restriction Digestion
7. First Ligation
8. Reverse Cross-linking and Isolate Chromatin
9. Second Restriction Digestion: Trimming the Circles
NOTE: This step creates smaller circles to minimize the overrepresentation of smaller captured fragments due to PCR bias in downstream amplification steps.
10. Second Ligation and DNA Purification
11. PCR Amplify Unknown Interacting Sequences by Inverse PCR
12. Third Restriction Digestion: Trim off Bait Sequences
NOTE: This step removes non-informative bait sequences from the inverse PCR products to maximize informative captured sequences in the downstream sequencing steps. To monitor digest efficiency, a “digest monitor”15 is digested in parallel using equivalent DNA and enzyme concentrations. If RE1 and RE2 are incompatible for simultaneous digestion (for example, due to different optimal incubation temperatures or reaction buffers), this step must be done as a sequential digest, which is not ideal.
13. Preparation of Sequencing Library
14. Sequencing and Analysis of Sequencing Data
15. Analysis of Sequence Data
Primary human keratinocytes were isolated from 2–3 discarded neonatal foreskins, pooled, and cultured in KSFM supplemented with 30 µg/mL bovine pituitary extract, 0.26 ng/mL recombinant human epidermal growth factor, and 0.09 mM calcium chloride (CaCl2) at 37 °C, 5% carbon dioxide. The cells were split into two flasks, and one flask was differentiated by the addition of CaCl2 to a final concentration of 1.2 mM for 72 h. 10⁷ cells each from proliferating and differentiated keratinocyte populations were fixed in 1% formaldehyde for 10 min at room temperature. Separately, K562 cells were grown in RPMI supplemented with 10% fetal bovine serum at 37 °C, 5% carbon dioxide. 10⁷ cells were fixed in 1% formaldehyde for 10 min at room temperature.
The cells were lysed, and the fixed chromatin was digested with HindIII and ligated as described above. Cross-links were reversed, and the DNA was purified and digested with CviQI. DNA was ligated and used as the template for inverse PCR amplification. Primers used were: 5'-GATCAGGAGGGACTGGAACTTG/5'-CCTCCCTTCACATCTTAGAATG. 1 µg of purified inverse PCR product was digested overnight with CviQI in parallel with 225 ng of RE digest monitor and column-purified. Purified DNA was then digested overnight with HindIII in parallel with 125 ng of RE digest monitor and column-purified. 250 ng of purified double-digested inverse PCR product was used for library preparation. Briefly, the ends were repaired by conversion to 5'-phosphorylated blunt ends, followed by DNA column purification. Ends were A-tailed for 20 min at 72 °C in a 50 µL reaction with 1 U Taq polymerase and 200 µM dATP and column-purified. Compatible sequencing library prep adapters were ligated at a 10:1 (adapter:DNA) ratio using T4 DNA ligase in a 30 µL reaction at room temperature for 15 min, and the DNA was column-purified. Libraries were run on a 2% agarose gel, gel slices were cut from 120 bp to the top of the ladder to remove adapters, and libraries were purified. 200 pg of each library was evaluated in 10 µL qPCR reactions containing indexing primers to determine the optimal number of PCR amplification cycles. Libraries were amplified to add indices using the number of cycles determined by qPCR and sequenced on a HiSeq2500 to obtain 1×50 reads.
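As an illustration of the 10:1 adapter:DNA ratio mentioned above, the following minimal sketch converts a DNA mass to picomoles and scales it by the ratio. It assumes the ratio is molar, uses the common approximation of 650 g/mol per base pair of double-stranded DNA, and takes a hypothetical mean fragment length of 500 bp (the text does not report one). The function names are illustrative only.

```python
def dsdna_pmol(mass_ng: float, length_bp: float,
               g_per_mol_per_bp: float = 650.0) -> float:
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules.

    Assumes an average mass of ~650 g/mol per base pair (an approximation).
    """
    if mass_ng < 0 or length_bp <= 0:
        raise ValueError("mass must be non-negative and length positive")
    # ng -> g is a factor of 1e-9 and mol -> pmol is 1e12, net factor 1e3.
    return mass_ng * 1e3 / (length_bp * g_per_mol_per_bp)


def adapter_pmol_needed(dna_ng: float, mean_length_bp: float,
                        ratio: float = 10.0) -> float:
    """Picomoles of adapter for a given molar adapter:DNA ratio."""
    return ratio * dsdna_pmol(dna_ng, mean_length_bp)


# 250 ng of product, with a hypothetical mean fragment length of 500 bp.
print(round(dsdna_pmol(250, 500), 3))
print(round(adapter_pmol_needed(250, 500), 2))
```

Because the result scales inversely with fragment length, the mean length of the digested inverse PCR product should be estimated (for example, from a gel) before relying on such a calculation.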
Reads were demultiplexed and trimmed. First, sequencing adapters were removed. Subsequently, the sequence from the beginning of each primer to the restriction site was trimmed. This enables the mapping of inverse PCR products that were not fully digested prior to library preparation. Importantly, including sequence between the primer binding site and the restriction site prevents mapping of non-specifically-amplified PCR product. Trimmed reads were mapped to hg38 using BWA. Figure 7 shows reads mapping to the region surrounding the viewpoint.
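The primer-to-restriction-site trimming described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the data shown: the function name, the toy sequences, exact prefix matching without mismatches, the choice to keep the restriction site in the trimmed read, and the choice to discard reads that carry the primer but not the expected downstream bait sequence are all assumptions.

```python
from typing import Optional


def trim_bait(read: str, primer: str, spacer: str, site: str) -> Optional[str]:
    """Remove known bait sequence from the 5' end of a read.

    primer: inverse PCR primer sequence
    spacer: known viewpoint sequence between the primer and the restriction site
    site:   restriction enzyme recognition sequence
    Returns the trimmed read, or None when the read looks non-specific.
    """
    read = read.upper()
    bait = (primer + spacer).upper()
    site = site.upper()
    if read.startswith(bait + site):
        # Undigested inverse PCR product: drop the bait, keep the site.
        return read[len(bait):]
    if read.startswith(primer.upper()):
        # Primer present without the expected viewpoint sequence:
        # treated here as non-specific amplification and discarded.
        return None
    # No primer at the 5' end: bait assumed already removed by the digest.
    return read


# Toy sequences (hypothetical, not the viewpoint used in this study).
PRIMER, SPACER, SITE = "ACGTACGTAC", "TTGG", "AAGCTT"
print(trim_bait(PRIMER + SPACER + SITE + "GGGCCC", PRIMER, SPACER, SITE))
print(trim_bait(PRIMER + "CCCCAAGCTTGGG", PRIMER, SPACER, SITE))
print(trim_bait("AAGCTTGGGCCC", PRIMER, SPACER, SITE))
```

Requiring the spacer as well as the primer is what distinguishes true viewpoint-derived products from off-target priming events in this sketch.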
Figure 1: Schematic of 4C-seq workflow. Chromatin is cross-linked to preserve protein-DNA contacts, digested with RE1, and ligated to link interacting loci. Cross-links are reversed, and DNA is digested with RE2 and ligated. Unknown interacting sequences are amplified using primers that bind to the region of interest, and PCR products are digested with RE1 and RE2 to trim known sequences. Amplified DNA of unknown sequence is used in a sequencing library prep, in which the adapters are ligated and the library is PCR-amplified and sequenced.
Figure 2: Schematics of primer designs. (A) Binding sites for inverse PCR primers. Primers are oriented "outward" (i.e., the 5' primer is a "reverse" primer, complementary to the plus strand, while the 3' primer is a "forward" primer, complementary to the minus strand) and bind within 50 bp of each restriction site (indicated by red shading). (B) Binding sites for qPCR primers for determining restriction enzyme digest efficiency. One primer pair flanks each restriction enzyme site. A control primer set amplifies a sequence that does not contain a site for either enzyme and is used to normalize Ct values for DNA input.
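One common way to turn the Ct values from the primer sets in Figure 2B into a digestion efficiency is a ΔΔCt calculation, sketched below. The formula, the assumption of 100% amplification efficiency (a doubling per cycle), and the function and argument names are assumptions for illustration and are not taken from the protocol text.

```python
def percent_digested(ct_site_cut: float, ct_ctrl_cut: float,
                     ct_site_uncut: float, ct_ctrl_uncut: float) -> float:
    """Estimate restriction digest efficiency (%) from qPCR Ct values.

    ct_site_*: Ct of the primer pair flanking the restriction site
    ct_ctrl_*: Ct of the control primer pair (no site; normalizes DNA input)
    *_cut / *_uncut: digested sample / undigested reference sample
    """
    delta_cut = ct_site_cut - ct_ctrl_cut        # input-normalized, digested
    delta_uncut = ct_site_uncut - ct_ctrl_uncut  # input-normalized, undigested
    ddct = delta_cut - delta_uncut
    fraction_uncut = 2.0 ** (-ddct)              # intact template remaining
    return 100.0 * (1.0 - fraction_uncut)


# Hypothetical Ct values: the site amplicon shifts 5 cycles after digestion.
print(percent_digested(28.0, 22.0, 23.0, 22.0))  # 96.875
```

A shift of one cycle corresponds to roughly half of the sites remaining uncut under these assumptions, so small Ct differences should be interpreted with care.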
Figure 3: Agarose gel electrophoresis to assess the ligation efficiency for RE1-digested chromatin. Uncut, HindIII-digested, and ligated samples were treated with proteinase K and heated to reverse cross-links, phenol:chloroform extracted, EtOH precipitated, and resuspended in H2O. Purified DNA was run on a 0.6% agarose gel and bands visualized by ethidium bromide staining with comparison to 1 kb plus ladder.
Figure 4: 4C template titration for inverse PCR. Serial dilutions were made of 4C templates and used in inverse PCR reactions. PCR products were run on 1.5% agarose gel and visualized by ethidium bromide staining alongside 1 kb plus ladder. NTC denotes no-template control reaction. Note a correlative increase in amplification with an increase in template concentration. This representative amplification was conducted on chromatin cleaved with HindIII as RE1 and CviQI as RE2, and primer sequences used for inverse PCR amplification were 5'-GATCAGGAGGGACTGGAACTTG/5'- CCTCCCTTCACATCTTAGAATG.
Figure 5: qPCR-mediated determination of amplification cycles needed to amplify template for inverse PCR. 4C templates were amplified in reactions containing 1x SYBR Green and 1x ROX in a real-time thermocycler. Peak fluorescence was determined and the number of cycles required to reach ¼ of peak fluorescence was calculated. This number of cycles was used to amplify the 4C template for inverse PCR.
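The cycle-number rule described in Figure 5 can be expressed as a short calculation. The sketch below assumes a list of per-cycle fluorescence readings that have already been baseline-corrected and ROX-normalized by the instrument software, and it returns the first cycle at or above the threshold; the function name and the synthetic curve are illustrative only.

```python
from typing import Sequence


def cycles_to_fraction_of_peak(fluorescence: Sequence[float],
                               fraction: float = 0.25) -> int:
    """Return the first cycle (1-indexed) reaching `fraction` of peak signal.

    fluorescence[i] is the reading at the end of cycle i + 1.
    """
    if not fluorescence:
        raise ValueError("no fluorescence readings supplied")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = fraction * max(fluorescence)
    for cycle, value in enumerate(fluorescence, start=1):
        if value >= threshold:
            return cycle
    raise RuntimeError("threshold never reached")  # unreachable for valid input


# Synthetic amplification curve (arbitrary units), plateauing at 4.0.
curve = [0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 3.5, 4.0, 4.0]
print(cycles_to_fraction_of_peak(curve))  # 5
```

Stopping at a quarter of the plateau keeps the preparative PCR in the exponential phase, which is the rationale for using qPCR to limit over-amplification.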
Figure 6: Agarose gel electrophoresis of digested RE monitor indicating sufficient digestion. RE digest monitor was amplified from human genomic DNA using the primers F: 5'-TCCTATCCCTGGTCTGTCTTAT and R: 5'-CCACATTGGTCCTTCTAGTCTTC and purified. 225 ng of monitor was digested with 15 U CviQI in a 50 µL reaction at 25 °C overnight and run on a 1.5% agarose gel alongside 1 kb plus ladder. Bands were visualized with ethidium bromide staining. Uncut monitor is 2515 bp; expected fragment sizes are 132, 343, 488, 539, and 1013 bp.
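Expected fragment sizes such as those listed for the digest monitor can be predicted with a simple in silico digest of a linear sequence. The sketch below assumes the CviQI recognition sequence G^TAC (cut after the first base) and uses a toy sequence, since the monitor amplicon sequence is not given here; the function name is illustrative. The final check only confirms that the fragment sizes in the caption sum to the uncut length.

```python
from typing import List


def digest_fragment_lengths(seq: str, site: str = "GTAC",
                            cut_offset: int = 1) -> List[int]:
    """Fragment lengths (5' to 3') after complete digestion of a linear DNA.

    site:       recognition sequence (default assumes CviQI, G^TAC)
    cut_offset: cut position within the site on the top strand
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy = "AAGTACCCGTACT"  # hypothetical 13-bp sequence with two GTAC sites
print(digest_fragment_lengths(toy))  # [3, 6, 4]

# Sanity check on the caption: fragment sizes sum to the uncut monitor length.
caption_fragments = [132, 343, 488, 539, 1013]
assert sum(caption_fragments) == 2515
```

The same function can be used with "AAGCTT" and a cut offset of 1 to model HindIII (A^AGCTT), again as an assumption to verify against the enzyme supplier's documentation.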
Figure 7: Representative genomic track of read coverage within 5 kb of the viewpoint region. Reads were trimmed and mapped to hg38 using BWA. The majority of reads (blue peaks) align at HindIII or CviQI sites adjacent to HindIII sites, as expected. The viewpoint region is highlighted in red.
Reagent | Amount for 1x |
10x Expand Long Template Buffer 1 | 2.5 µL |
dNTPs (10 mM) | 0.5 µL |
Forward primer | 35 pmol |
Reverse primer | 35 pmol |
Expand Long Template Polymerase (5 U/µL) | 0.35 µL |
DNA | |
Nuclease-free water | to 25 µL |
Table 1: PCR reaction mixture for the amplification of 4C template (Step 11.1).
Reagent | Amount for 1x |
10x Expand Long Template Buffer 1 | 1.5 µL |
dNTPs (10 mM) | 0.3 µL |
Forward primer | 21 pmol |
Reverse primer | 21 pmol |
100x SYBR Green I | 0.15 µL |
50x ROX | 0.3 µL |
Expand Long Template Polymerase (5 U/µL) | 0.21 µL |
DNA | 100 ng |
Nuclease-free water | to 15 µL |
Table 2: qPCR reaction mixture for the determination of the number of amplification cycles for inverse PCR (Step 11.3.1).
Reagent | Amount for 1x |
10x Expand Long Template Buffer 1 | 80 µL |
dNTPs (10 mM) | 16 µL |
Forward primer | 1.12 nmol |
Reverse primer | 1.12 nmol |
Expand Long Template Polymerase (5 U/µL) | 11.2 µL |
DNA | 3.2 µg |
Nuclease-free water | to 800 µL |
Table 3: PCR reaction mixture for the final inverse PCR amplification of 4C template (Step 11.4).
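The amounts in Table 3 correspond to the 25 µL recipe in Table 1 scaled to an 800 µL total volume (a factor of 32). A minimal sketch of that scaling is shown below; the dictionary keys and function name are illustrative, and the DNA amount is omitted because Table 1 does not specify one.

```python
from typing import Dict

# Per-reaction amounts from Table 1 (25 uL reaction).
TABLE1_PER_REACTION: Dict[str, float] = {
    "10x buffer (uL)": 2.5,
    "dNTPs, 10 mM (uL)": 0.5,
    "forward primer (pmol)": 35.0,
    "reverse primer (pmol)": 35.0,
    "polymerase, 5 U/uL (uL)": 0.35,
}


def scale_mix(per_reaction: Dict[str, float], factor: float) -> Dict[str, float]:
    """Multiply every component of a reaction mix by a scale factor."""
    if factor <= 0:
        raise ValueError("scale factor must be positive")
    # Round to 3 decimals to suppress floating-point noise in pipetting volumes.
    return {name: round(amount * factor, 3)
            for name, amount in per_reaction.items()}


factor = 800 / 25  # total preparative volume / single-reaction volume = 32
for name, amount in scale_mix(TABLE1_PER_REACTION, factor).items():
    print(f"{name}: {amount}")
```

Preparing the scaled mix as a single master mix before aliquoting helps keep replicate reactions uniform.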
Reagent | Amount for 1x |
5x Phusion HF buffer | 2 µL |
dNTPs (10 mM) | 0.2 µL |
Multiplexing Primer 1.0 | 5 pmol |
Multiplexing Primer 2.0 | 0.1 pmol |
Index primer | 5 pmol |
100x SYBR Green I | 0.1 µL |
50x ROX | 0.2 µL |
Phusion polymerase | 0.1 µL |
DNA | 2 µL |
Nuclease-free water | to 10 µL |
Table 4: qPCR reaction mixture for the determination of the number of amplification cycles for sequencing library prep (Step 13.3).
Reagent | Amount for 1x |
5x Phusion HF buffer | 10 µL |
dNTPs (10 mM) | 1 µL |
Multiplexing Primer 1.0 | 25 pmol |
Multiplexing Primer 2.0 | 0.5 pmol |
Index primer | 25 pmol |
Phusion polymerase | 0.5 µL |
DNA | 10 µL |
Nuclease-free water | to 50 µL |
Table 5: PCR reaction mixture for the amplification of libraries for sequencing (Step 13.4).
4C results have the potential to reveal chromatin interactions that can identify previously unknown regulatory elements and/or target genes that are important in a specific biological context24,25,26. However, technical hurdles may limit the data obtained from these experiments. PCR bias stemming from over-amplification of template in 4C protocols is likely. This protocol addresses this issue by utilizing qPCR to determine the optimal number of amplification cycles in an objective manner. In addition, the removal of the bait sequences from amplified inverse PCR products by restriction digest can facilitate the identification of chromatin interactions for two reasons. First, it reduces the length of non-informative (bait) base pairs from the material to be sequenced. Second, and as a result, it increases the likelihood that more reads will be generated from diverse sequences (a property required for accurate base calling) and thus more informative interacting sequences can be mapped. Other protocols require the pooling of many libraries using different bait sequences and/or restriction enzymes or require increasing the phiX concentration of the sequenced sample to circumvent the sequence uniformity issue for accurate base calling. This method allows multiple samples with the same bait sequence to be pooled into a single sequencing lane without occupying valuable sequencing capacity with excess phiX.
Cell fixation and lysis are critical early steps, both of which may require optimization for particular cell types. Insufficient fixation will fail to preserve specific contacts between a region of interest and its interacting sequences, yielding uninformative data dominated by noise. In contrast, over-fixation will decrease the ability of restriction enzymes to cleave chromatin, resulting in fewer informative ligation events. Both fixative concentration and incubation time can be altered to optimize this variable. Similarly, insufficient cell lysis reduces access of restriction enzymes to chromatin, again reducing the number of informative ligation events. In our hands, human primary keratinocytes lysed most effectively using a combination of hypotonic conditions and detergent. Other cell types might require different lysis conditions, which will have to be determined empirically. Efficient lysis can be identified by microscopy dye exclusion methods such as Trypan Blue staining.
One limitation of the 4C method is that the results can only represent a population average. With a heterogeneous cell population, it can be difficult to distinguish true interactions from noise due to biological variability. While the use of a cell line or the sorting of cells to produce a homogeneous cell population is predicted to generate clearer signals, variability between individual cells is still a possible source of noise. Recent advances in single-cell sequencing technologies have the potential to overcome this problem. Additionally, the validation of population-based 4C results may be performed using methods such as droplet digital PCR or FISH to determine if these results are reflected at the single-cell level.
The authors have nothing to disclose.
This work was supported by NIAMS (R01AR065523).
Name | Company | Catalog Number | Comments |
HindIII | NEB | R0104S | |
CviQI | NEB | R0639S | |
DNA oligonucleotide primers | IDT | To be designed by the reader | |
50 mL conical centrifuge tubes | Fisher Scientific | 06-443-19 | |
1.7 mL microcentrifuge tubes | MidSci | AVSS1700 | |
Phosphate buffered saline | Thermo Fisher | 14190-136 | |
Formaldehyde, methanol free | Electron Microscopy Sciences | 15710 | |
Nutator | VWR | 15172-203 | |
Glycine | JT Baker | 4059-00 | |
Benchtop centrifuge | |||
Refrigerated microcentrifuge | |||
Ethylenediaminetetraacetic acid (EDTA) | Sigma-Aldrich | ED2SS | |
20% SDS solution | Sigma-Aldrich | 05030 | |
Trypan Blue | Thermo Fisher | 15250061 | |
Glass slides | Fisher Scientific | 12-550-143 | |
Cover slips | VWR | 16004-094 | |
Light microscope | |||
Triton X-100 | Alfa Aesar | A16046 | |
Shaking heat block | |||
2M Tris-HCl | Quality Biological | 351-048-101 | |
Proteinase K | NEB | P8107S | |
Phenol:chloroform:isoamyl alcohol (25:24:1) | Sigma-Aldrich | P2069 | |
Sodium acetate | Sigma-Aldrich | M5661 | |
20 mg/mL glycogen | Thermo Fisher | R0561 | |
Ethanol | Fisher Scientific | 04-355-223 | |
Nuclease-free water | Fisher Scientific | MT-46-000-CM | |
qPCR cycler | Thermo Fisher | 4453536 | |
qPCR plates | Thermo Fisher | 4309849 | |
Thermocycler | Thermo Fisher | 4375786 | |
PCR strip tubes | MidSci | AVSST-FL | |
1M Magnesium chloride | Quality Biological | 351-033-721 | |
Dithiothreitol | Sigma-Aldrich | 43815 | |
Adenosine triphosphate | Sigma-Aldrich | A2383 | |
T4 DNA Ligase | NEB | M0202S | |
Agarose | Sigma-Aldrich | A6013 | |
RNase A | Thermo Fisher | EN0531 | |
Qiaquick PCR purification kit | Qiagen | 28104 | |
MinElute PCR Purification kit | Qiagen | 28004 | |
Spectrophotometer | |||
Expand Long Template PCR System | Sigma-Aldrich | 11681834001 | |
dNTP mix | Thermo Fisher | R0191 | |
SYBR Green I | Sigma-Aldrich | S9430 | |
ROX | BioRad | 172-5858 | |
Sodium chloride | Sigma-Aldrich | S5886 | |
End-It DNA End-Repair Kit | Lucigen | ER0720 | |
LigaFast Rapid DNA Ligation System | Promega | M8221 | |
SYBR Safe | Thermo Fisher | S33102 | |
Taq Polymerase | NEB | M0267S | |
UltraSieve Agarose | IBI Scientific | IB70054 | |
Qiaquick Gel Extraction Kit | Qiagen | 28704 |