Here, we present an adaptation of the chromosome conformation capture (3C) technique in detail, with an emphasis on undergraduate involvement and learning.
Chromosome conformation capture (3C) is a powerful tool that has spawned a family of similar techniques (e.g., Hi-C, 4C, and 5C, referred to here as 3C techniques) that provide detailed information on the three-dimensional organization of chromatin. The 3C techniques have been used in a wide range of studies, from monitoring the changes in chromatin organization in cancer cells to identifying enhancer contacts made with gene promoters. While many of the studies using these techniques ask big genome-wide questions with intricate sample types (e.g., single-cell analysis), what is often lost is that the 3C techniques are grounded in basic molecular biology methods that are applicable to a broad range of studies. By addressing tightly focused questions of chromatin organization, this cutting-edge technique can be used to enhance the undergraduate research and teaching lab experience. This paper presents a 3C protocol and provides adaptations and points of emphasis for implementation at primarily undergraduate institutions in undergraduate research and teaching experiences.
An organism's genome not only holds all the genes required for function but also all the instructions on how and when to use them. This makes regulating access to the genome one of the most important functions of the cell. There are many mechanisms to control gene function; however, at its base level, gene regulation comes down to the ability of regulatory transcription factors (trans-factors) to bind to their specific DNA sequences (cis-regulatory sequences). This is not an innate ability; instead, it is governed by the organization/structure of the genome in the nucleus, which controls the availability/exposure of the cis-regulatory sequences to the trans-factors1,2,3. If the trans-factors cannot find their cis-regulatory sequences, then the trans-factors cannot perform their regulatory tasks. This has made understanding how genomes are organized in the nucleus an important source of inquiry.
It is widely accepted that during interphase, eukaryotic chromosomes in the nucleus occupy their own domain anchored to the nuclear lamina and nuclear matrix (Figure 1), thus making the chromosome more like a slice of pizza than a noodle on a plate of spaghetti. Chromosomes are partially condensed by protein-DNA interactions (chromatin) that twist and loop portions of the chromosome. Through electron microscopy, three-dimensional DNA fluorescence in situ hybridization (FISH), and DNA tagging techniques (e.g., fluorescent tagging and artificial DNA methylation), inactive domains of chromatin have been found to be packed tightly along the nuclear periphery4,5,6, while portions of active, less condensed chromatin are found in the interior of the nucleus7,8,9,10. These experiments provide a wide-angle view of chromosome dynamics but do little to capture the changes that occur locally around the gene promoters observed in DNase11,12 and nucleosome13,14,15 studies.
The key to unlocking higher-resolution chromatin dynamics was the formulation of the 3D chromosome mapping technique, 3C. The 3C technique itself comprises four main steps: crosslinking of chromatin, chromatin digestion by restriction enzymes, chromatin ligation, and DNA purification (Figure 2). The new artificial DNA fragments generated by this process can then be characterized to reveal the close physical association between linearly distant pieces of DNA16. The 3C technique became the basis for the creation of multiple spin-off techniques that utilize the initial steps of 3C to ask broader genome-wide questions (e.g., Hi-C, 4C, ChIP-C). This family of 3C techniques has identified that chromosomes are organized into multiple discrete units termed topologically associated domains (TADs). TADs are encoded in the genome and are defined by chromatin loops flanked by unlooped boundaries16,17,18,19. The TAD boundaries are maintained by two evolutionarily conserved and ubiquitous factors, CCCTC-binding factor (CTCF) and cohesin, which prevent loops within separate TADs from interacting16,20. The loops are mediated by the interaction of trans-factors with their regulatory sequences, as well as by CTCF and cohesin21.
Though many studies using 3C technologies ask broad genome-wide questions and employ complicated sample collection techniques, the formulation of the 3C technique is based on basic molecular biology techniques. This makes 3C intriguing for deployment in both undergraduate research and teaching labs. The 3C technique can be employed for smaller focused questions and is inherently flexible to scaling up or down (single genes22, chromosomes16, and/or genomes18) depending on the focus and direction of the questions asked. This technique has also been applied to a wide range of model systems7,16,19,23 and has been proven to be versatile in its use. This makes 3C an excellent technique for undergraduates in that students can gain experience in common molecular biology techniques while also gaining valuable experience in answering directed questions.
Presented here is an adapted protocol for 3C library preparation based on previously published protocols24,25,26,27. This protocol has been optimized for approximately 1 × 10⁷ cells, though it has generated 3C libraries with as few as 1 × 10⁵ cells. This protocol has proven to be versatile and has been used to generate 3C libraries from zebrafish embryos, zebrafish cell lines, and young-adult (YA) Caenorhabditis elegans (roundworm). The protocol should also be appropriate for mammalian cell lines and, with further adaptation, yeast.
The goal of these adaptations is to make 3C more accessible for undergraduates. Care has been taken to use techniques that are similar to those that can be accomplished in an undergraduate teaching laboratory. The 3C technique provides many learning opportunities for undergraduates to learn basic molecular biology techniques that will benefit their development at the bench, in the classroom, and in their endeavors after graduation.
1. Primer design
NOTE: 3C primer design tools are available online28. Alternatively, custom primers can be designed by the students (see below).
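For students designing custom primers, a first step is locating the restriction sites in the region of interest. The sketch below is illustrative only and is not part of the published protocol: it scans a sequence for the DpnII recognition sequence (GATC) and reports which sites fall between a forward and a reverse primer. The function names, the toy sequence, and the coordinate convention (0-based positions) are assumptions made for this example.

```python
import re

DPNII_SITE = "GATC"  # DpnII recognition sequence


def find_sites(sequence, site=DPNII_SITE):
    """Return the 0-based start position of every occurrence of `site`."""
    seq = sequence.upper()
    # A lookahead is used so that overlapping matches are never skipped.
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def sites_between(sequence, fwd_end, rvs_start, site=DPNII_SITE):
    """Sites lying entirely between the end of the forward primer
    (`fwd_end`) and the start of the reverse primer (`rvs_start`)."""
    return [
        pos
        for pos in find_sites(sequence, site)
        if fwd_end <= pos and pos + len(site) <= rvs_start
    ]


# Hypothetical toy region; real input would be the genomic sequence
# surrounding the locus of interest.
region = "ttacgGATCaaatttGATCcc"
all_sites = find_sites(region)
flanked = sites_between(region, fwd_end=5, rvs_start=12)
print(all_sites, flanked)
```

A test primer pair would be expected to flank at least one site, whereas a control primer pair would be expected to flank none.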
2. Day 1
NOTE: The protocol can be paused (frozen at −20 °C) after chromatin cross-linking and after the nuclei collection. The steps take, on average, 5-6 h with undergraduates.
3. Day 2
NOTE: On average, it takes undergraduates 5 h to complete these steps.
4. Day 3
NOTE: On average, it takes undergraduates 15-30 min to complete these steps. After the overnight incubation, the samples can be frozen.
5. Day 4
NOTE: On average, it takes undergraduates 4-5 h to complete these steps.
6. Day 5
NOTE: On average, it takes undergraduates 1-2 h to complete these steps.
This procedure will produce one experimental 3C sample and two control samples (undigested and digested). Using these three samples, qPCR was performed. From these results, the digestion efficiency was calculated (equation 1) and recorded (Table 1). From these calculations, it was determined that the 3C sample had an approximately 88% digestion efficiency (average of Table 1) across the seven genomic loci tested.
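Equation 1 is not reproduced in this section, so the following is only a sketch of one commonly used way to estimate digestion efficiency from qPCR Ct values, and it may differ from the authors' exact equation. It assumes that each test primer set (spanning a DpnII site) is normalized to a control primer set (no DpnII site) in both the digested and undigested samples. The function name and the example Ct values are hypothetical.

```python
def percent_digestion(ct_test_dig, ct_ctrl_dig, ct_test_und, ct_ctrl_und):
    """Estimate percent digestion at one restriction site.

    Assumed formulation (not taken from the source text):
        % digestion = 100 - 100 / 2 ** (dCt_digested - dCt_undigested)
    where dCt = Ct(test primers) - Ct(control primers) in each sample.
    """
    delta_dig = ct_test_dig - ct_ctrl_dig
    delta_und = ct_test_und - ct_ctrl_und
    return 100.0 - 100.0 / 2 ** (delta_dig - delta_und)


# Hypothetical Ct values: the test product appears 3 cycles later
# (relative to the control primers) after digestion.
example = percent_digestion(
    ct_test_dig=27.0, ct_ctrl_dig=22.0, ct_test_und=24.0, ct_ctrl_und=22.0
)
print(f"{example:.1f}% digested")
```

Under this formulation, a three-cycle shift corresponds to an eightfold drop in uncut template, or 87.5% digestion.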
Next, the samples were tested for the presence of long-range chromatin contacts between the different genomic loci using combinations of loci-specific primers (Table 2) and qPCR. Using these results, the product abundances relative to a control primer set were calculated (equation 2) and graphed for comparison (Figure 4). These data indicated that 8 of the 10 reactions were conditional positives for long-range interactions.
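Equation 2 is likewise not shown in this section. As a hedged sketch, the abundance of a test product relative to the control primer set can be expressed with the standard ΔCt relationship, which assumes roughly 100% PCR efficiency for both primer sets. The helper that flags conditional positives simply encodes the criterion described in the text (more product in the 3C sample than in either control). All names and Ct values below are hypothetical.

```python
def relative_abundance(ct_test, ct_control):
    """Abundance of the test product relative to the control primer set,
    assuming a doubling of product per cycle: 2 ** -(Ct_test - Ct_control)."""
    return 2.0 ** -(ct_test - ct_control)


def is_conditional_positive(rel_3c, rel_undigested, rel_digested):
    """True when the 3C sample shows more product than both controls."""
    return rel_3c > rel_undigested and rel_3c > rel_digested


# Hypothetical Ct values for one primer combination in the three samples.
rel_3c = relative_abundance(ct_test=28.0, ct_control=25.0)
rel_und = relative_abundance(ct_test=35.0, ct_control=25.0)
rel_dig = relative_abundance(ct_test=34.0, ct_control=25.0)
print(rel_3c, is_conditional_positive(rel_3c, rel_und, rel_dig))
```

A conditional positive identified this way still requires gel purification and Sanger sequencing before it can be treated as a validated contact.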
The PCR reactions were then run on an agarose gel. The expected PCR products for the eight conditional positives and one negative reaction were gel-purified and sent for Sanger sequencing. The results for representative positive (blue arrow, blue box) and negative (red arrow, red box) reactions are shown (Figure 5).
Figure 1: Chromosome structure in the nucleus. Hypothetical chromosome organization inside the nucleus. (A) Nuclear envelope, black lines; (B) nuclear lamina, orange; (C) heterochromatin, compacted lines; (D) euchromatin, loose loops.
Figure 2: Schematic of the 3C protocol. Distant portions of linear chromosomes (blue, yellow, and pink) are brought close together within the nucleus through regulatory loops. (A) The loop structures are mediated by transcription factors (grey circle and black star); these interactions are preserved through chemical crosslinking. (B) The loops are broken through enzymatic digestion (black lines). (C) Distant chromatin pieces are ligated together through the sticky ends created by the digestion. (D) DNA is purified from protein. (E) The sequence within the fragments is identified and mapped back to the genome.
Figure 3: The 3C primer scheme. The 3C primers are designed around restriction sites at varying distances from the genomic location of interest. The pink arrows represent the experimental primer sets surrounding a DpnII site. The experimental primers can be mixed and matched to assess chromatin looping in the region. The orange primers represent a negative control without a DpnII site.
Figure 4: Representative qPCR data for a 3C experiment. Relative abundance of the 3C test primer sets. (A) Control graph showing the relative abundance of the product in the undigested control (blue), digested control (grey), and 3C sample (orange). (B) The 3C experiment; the relative abundance of the product from combinations of test primers in the undigested control (blue), digested control (grey), and 3C sample (orange).
Figure 5: Visualization of the 3C qPCR products. Top, gel with the qPCR endpoint products with the samples indicated. Bottom, representative Sanger sequence trace files for the reactions indicated. The orange sample is an example of a false positive (see Figure 4, r38/34), and the blue sample is an example of a true positive with the DpnII site indicated above the trace.
Site | % digestion |
r8 | 86.98 |
r3 | 88.44 |
r38 | 89.64 |
r34 | 87.55 |
r33 | 87.85 |
r14 | 86.97 |
r47 | 89.45 |
Table 1: Calculated digestion efficiency.
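The average digestion efficiency quoted in the results can be checked directly from the values in Table 1. The short sketch below does only that; the variable names are chosen for this example.

```python
from statistics import mean

# Percent digestion per site, copied from Table 1.
digestion = {
    "r8": 86.98,
    "r3": 88.44,
    "r38": 89.64,
    "r34": 87.55,
    "r33": 87.85,
    "r14": 86.97,
    "r47": 89.45,
}

average = mean(digestion.values())
lowest = min(digestion, key=digestion.get)
highest = max(digestion, key=digestion.get)
print(f"mean = {average:.2f}%, range = {lowest} to {highest}")
```

The mean works out to about 88.1%, consistent with the approximately 88% reported for the seven loci.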
Name | Sequence | Genomic locus |
r3 FWD | ACGCAAGTAAAATTCTGGTTTTTGACC | chrX:11361475 |
r3 RVS | TTTCCTGAGCTCTAACCATGTTTGC | chrX:11361561 |
r38 FWD | TTACTTCTGAAGTAATCTTTTCTTATCCCC | chrX:5859700 |
r38 RVS | AGACGAGCTGATTAAAAGTAGTTGAGAG | chrX:5859775 |
r34 FWD | ATTTGTGGATTGCGTGGAGACG | chrX:5429702 |
r34 RVS | AATAATCCTCTTAACAAACGTGGCC | chrX:5429777 |
r33 FWD | AAGAGTTGTCCAAAATAAATTGAGCTAAC | chrX:6296704 |
r33 RVS | TTCAGAAAAGTAAACTTTGACTTGGAACG | chrX:6296807 |
r14 FWD | AATTATCGATTTTTCCATCGCGCAG | chrX:8036367 |
r14 RVS | ATTTCAATGAAAATGTAAAAATGTTCCTTC | chrX:8036427 |
r47 FWD | ATCTAGACTTGATAATATTTGTGTGTCCTC | chrX:9464939 |
r47 RVS | AAGTTCTGCAACTGTTAGATGAATAACAC | chrX:9465064 |
r8 FWD | GAGAATGTTGTTCTGTAACTGAAAACTTG | chrX:11094257 |
r8 RVS | TTACGAAATTTGGTAGTTTTGGACC | chrX:11094362 |
Control primer FWD | CAATCGTCTCGCTCACTTGTC | chrX:7608049 |
Control primer RVS | GATGTGAGCAACAAGGCACC | chrX:7608166 |
Table 2: Primers for the representative 3C experiment.
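To give students a sense of how far apart the assayed loci are along the linear chromosome, the coordinates in Table 2 can be compared directly. The sketch below uses the forward-primer coordinates as a stand-in for each locus position; whether each coordinate marks the start or the end of a primer is not stated in the table, so the distances should be read as approximate. The function names are hypothetical.

```python
from itertools import combinations

# Forward-primer coordinates from Table 2 (all on chrX).
loci = {
    "r3": "chrX:11361475",
    "r38": "chrX:5859700",
    "r34": "chrX:5429702",
    "r33": "chrX:6296704",
    "r14": "chrX:8036367",
    "r47": "chrX:9464939",
    "r8": "chrX:11094257",
}


def parse_locus(locus):
    """Split 'chrX:12345' into ('chrX', 12345)."""
    chrom, pos = locus.split(":")
    return chrom, int(pos)


def linear_distance(name_a, name_b):
    """Approximate linear separation (bp) of two loci on the same chromosome."""
    chrom_a, pos_a = parse_locus(loci[name_a])
    chrom_b, pos_b = parse_locus(loci[name_b])
    if chrom_a != chrom_b:
        raise ValueError("loci are on different chromosomes")
    return abs(pos_a - pos_b)


# All possible pairings of the seven loci, sorted by separation.
pairs = sorted(combinations(loci, 2), key=lambda p: linear_distance(*p))
for a, b in pairs:
    print(f"{a}/{b}: {linear_distance(a, b):,} bp")
```

For example, r38 and r34 (the pairing referenced in Figure 5) are separated by roughly 430 kb by this measure.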
3C is a powerful technique that is rooted in basic molecular techniques. It is this foundation of fundamental tools that makes 3C such an intriguing technique to use with undergraduates. With so many recent studies observing chromatin dynamics on such a broad scale, using these results to devise a narrow-focused experiment on a single gene or genomic region has the potential to create a unique and impactful experiment in undergraduate research. Often, experiments like these are considered too advanced for undergraduates, but with careful planning, they are easily achievable. It is important to note that the assays designed to probe the chromatin connections captured by the 3C library can vary from semi-quantitative endpoint PCR to whole-genome sequencing. In fact, data from the first 3C paper16 were generated from qPCR. This wide range of assays can all be used because all 3C technologies produce the same product: a library of DNA fragments representing 3D connections in the nucleus.
Presented here is an adaptation of a more flexible and accommodating protocol that is a better fit for undergraduate researchers. The pausing periods listed above imply overnight delays; however, these pauses can extend over weekends and, in the case of the cells and nuclei, for weeks. The most crucial consideration is when the work will get done. Often in protocols, there are time-sensitive steps when pausing is not an option. Outside of a few points (day 1 and day 2), there are many places to stop and freeze the sample. These are critical when working with undergraduates where the schedules and timings of lab work need to be flexible. In addition to engineering these stops into the protocol, undergraduates are encouraged to work in pairs or even small groups of three or four. Groups work well for this protocol, as the students can support each other and create a buddy system so everyone is working safely. Lab work is also more fun with others involved. With groups, students can also work on a variety of questions focused on the organization of chromatin while still performing the same protocol. Thus, even while students are working on separate projects, the protocol links their efforts, and because of this, they can support each other.
Other adaptations are meant to work around the fact that certain specialized tools and equipment are not necessarily found in all undergraduate institutions. These pieces of equipment include but are not limited to qPCR thermocyclers, gel documentation systems, and nanovolume spectrophotometers. Indeed, these pieces of equipment are convenient but are not a requirement. Here, the classic method of 3C is also described in the primer design portion; this involves identifying a genomic locus of interest and, from that, assessing other genomic loci for chromatin contact points further away. This technique also works well if a published dataset is used, such as a dataset using Hi-C, where known positive (connecting) and negative (non-connecting) loci are identified. Designing experiments using these published data sets is another great adaptation for teaching labs, as the chance for the successful identification of chromatin connections is usually greater. In addition, the research article can be discussed in class and be used as a reference.
This protocol uses a modified qPCR approach to visualize the 3C product formation. Controls are essential to the success of the 3C technique. Each experiment uses both sample controls and primer controls to determine the completion of the 3C procedure. The sample controls include an undigested control (genomic DNA) and a digested control. The undigested control determines the baseline signal for the primer sets and is used with the cross-linked digestion control to determine the digestion efficiency. It is expected that there will be a drop in product for any primers directed across a restriction site. Comparing this value to the undigested control provides an indication of how well the sample was digested.
The primers for the PCR include a control primer and test primers. The control primer is a primer set that is near the genomic region being assayed and does not contain a restriction site. This provides the baseline for determining the abundance of the test primer PCR products. Test primers are forward and reverse primers that flank a restriction site for a particular genomic locus of interest (Figure 3). Reactions using these primer sets are compared to determine the digestion efficiency, as the product abundance should drop if the restriction site has been cut. In determining the chromatin organization, one test primer from one locus is paired with another test primer from a different genomic locus to determine if these two loci are close together in 3D space. In that case, the expectation is that a PCR product would only be found using the 3C sample as the template.
It is important to note that even validated primers have the tendency to fail (Figure 4: r14 primer set). In addition, PCR products are frequently identified in control reactions and in reactions in which a chromatin connection is not predicted (such as the digested control, since it is not ligated). These instances are sequenced and either fail Sanger QC or return without a defined sequence (Figure 5). Additionally, traditional 3C experiments generate a "control template," an uncross-linked, digested, and ligated DNA sample, which represents all the possible ligated fragments that can be produced with a given amount of DNA. The "control template" plays an important role in comparing the intensities of qPCR signals between two genomic loci to determine if the signal represents a true interaction or just a random association. Creating a "control template" can be problematic, as a large portion of the chromatin being assayed must be captured in the form of an artificial chromosome and processed along with the 3C samples. Securing such a construct may not be feasible, and creating one could be outside the scope of a semester project. Due to these difficulties, we suggest using a control primer. The control primer does not replace all of the functionality of the "control template" but still provides the opportunity to analyze the data to make "presence" or "absence" determinations.
When performing qPCR, using equal amounts of the sample is important. This should be determined, even if using a nanovolume spectrophotometer such as a NanoDrop, by generating a standard curve from genomic DNA of a known concentration and fitting the 3C samples to that line. These amounts should be recorded and used in subsequent PCRs. The quality of the PCR reaction is also important. As the PCR runs in qPCR, the product abundance is measured using fluorescence and recorded. This recording is accessible in the amplification plot. Once the program has finished, it is important to check the amplification plot and ensure that the reactions (except for the no template controls) have three phases: a baseline, an exponential, and a plateau/saturation phase. It is important to check that the reactions have an exponential phase, in particular for setting the threshold (see below). Additionally, for serially diluted samples, there should be a shift in Ct values consistent with the dilution of the sample (the highest concentration will have the lowest Ct values, while the lowest concentration will have the highest Ct values). Samples that do not reflect this change in the amplification plot require a new dilution or indicate a larger issue with the 3C sample formation. Finally, while generating the standard curve, the PCR software will compute the PCR efficiency and the R² value. The PCR efficiency should be greater than 90%, and the R² value should be greater than 0.99. If either of these conditions is not met, it is likely that something is wrong with the sample or the PCR primers.
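qPCR software normally reports the standard-curve statistics, but computing them by hand can help students see where the numbers come from. The sketch below is a minimal illustration, not the authors' analysis: it fits Ct against log10(concentration) by least squares, derives the PCR efficiency from the slope using the standard relationship efficiency = 10^(−1/slope) − 1, and interpolates an unknown sample from the fitted line. The dilution series and Ct values are hypothetical.

```python
import math


def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct versus log10(concentration).

    Returns (slope, intercept, r_squared, efficiency_percent).
    """
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ct_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    efficiency = (10 ** (-1 / slope) - 1) * 100
    return slope, intercept, r_squared, efficiency


def interpolate_concentration(ct, slope, intercept):
    """Concentration of an unknown sample read off the fitted line."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical tenfold dilution series of genomic DNA (ng) and its Ct values.
standards = [100.0, 10.0, 1.0, 0.1]
cts = [20.00, 23.32, 26.64, 29.96]

slope, intercept, r2, eff = fit_standard_curve(standards, cts)
passes = eff > 90 and r2 > 0.99  # acceptance criteria stated in the text
unknown = interpolate_concentration(23.32, slope, intercept)
print(f"slope={slope:.2f}, R2={r2:.3f}, efficiency={eff:.1f}%, ok={passes}")
```

A slope near −3.32 corresponds to a doubling of product each cycle, which is why each tenfold dilution is expected to shift the Ct by a little over three cycles.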
Post qPCR, the percent digestion and the presence of 3C interactions can be calculated using the qPCR Ct for each reaction. To determine these, first, the threshold for the PCR reaction must be set. This is normally done using the software that comes with the qPCR machine. Setting the threshold will define the concentration of PCR product that will be used to compare the sample Ct values. The threshold should bisect the amplification curves of the PCR reactions in the exponential phase of amplification. Only PCR reactions with exponential amplification can be compared (in this case, the control primers and the test primer reactions), as this is the only way to ensure the reactions are amplifying DNA at the same rate and can be compared faithfully. When analyzing the 3C graphs, conditionally positive reactions are identified as those with more product in the 3C sample than in the control samples (the genomic control and the digestion control; Figure 4B). However, these samples must be further validated using Sanger sequencing following the gel purification of the PCR product.
After Sanger sequencing, samples that pass the QC can be analyzed using BLAT. The goal of this analysis is to determine if the sample has the sequence of both target genomic loci flanking the restriction site (DpnII in the case of this protocol). If both sequences are identified, then the 3C fragment can be considered validated. If the results of the BLAT search do not return the expected sequence, this may indicate that one or both primers are not optimal, resulting in a false positive qPCR result. The trace files for the false positive samples will have undefined base peaks, and the sequence reports will contain mostly "n" base calls.
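The logic of this validation step can be sketched in a few lines. The example below is a simplified stand-in, not a replacement for a BLAT search: it reports the fraction of "n" base calls in a read and checks, by exact substring matching, whether sequence from two different loci sits on either side of a DpnII site (GATC). Real reads may be reverse-complemented or contain mismatches, which this sketch does not handle. The function names and the toy sequences are hypothetical.

```python
def n_fraction(read):
    """Fraction of base calls in the read that are 'N' (uncalled)."""
    bases = read.strip().upper()
    if not bases:
        return 1.0
    return bases.count("N") / len(bases)


def junction_supported(read, flank_a, flank_b, site="GATC"):
    """True when some occurrence of `site` in the read has sequence from
    one locus on its left and sequence from the other locus on its right."""
    r, a, b = read.upper(), flank_a.upper(), flank_b.upper()
    start = r.find(site)
    while start != -1:
        left, right = r[:start], r[start + len(site):]
        if (a in left and b in right) or (b in left and a in right):
            return True
        start = r.find(site, start + 1)
    return False


# Hypothetical short sequences standing in for the two target loci.
locus_a = "ACGTTGCA"
locus_b = "TTGACCAA"

good_read = locus_a + "GATC" + locus_b   # locus A | DpnII site | locus B
same_locus = locus_a + "GATC" + locus_a  # both sides from one locus
poor_read = "NNNNANNN"                   # mostly uncalled bases

print(junction_supported(good_read, locus_a, locus_b))
print(junction_supported(same_locus, locus_a, locus_b))
print(n_fraction(poor_read))
```

A read dominated by "n" calls would be set aside before any junction check, mirroring the QC step described above.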
Sanger validation is essential, as false positives from artifactual PCR product formation are possible. These false positives can be identified when the sequencing products do not have the expected target sequence or a DpnII site characteristic of a proper 3C fragment (Figure 5). The sequencing of the PCR fragments also provides another data point for the experiment and drives home to the students that the 3C technique is identifying distant genomic loci that are coming together in 3D space within the nuclei.
The 3C technique provides a wealth of foundational molecular techniques for undergraduates in a flexible, straightforward procedure. This 3C technique is also a launching point for the other 3C techniques that incorporate next-generation sequencing (NGS). These types of experiments can expose undergraduates to important aspects of bioinformatics and are rooted in the basic principles outlined here. Undergraduate experience and involvement are key to their success and development as young scientists. By providing these opportunities, undergraduates can strengthen their understanding of basic principles while building their confidence to tackle cutting-edge techniques and questions.
The authors have nothing to disclose.
This work was supported in part by the Rhode Island Institutional Development Award (IDeA) Network of Biomedical Research Excellence from the National Institute of General Medical Sciences of the National Institutes of Health under grant number P20GM103430 and the Bryant Center of Health and Behavioral Sciences.
Name | Company | Catalog Number | Comments
37% Formaldehyde | MilliporeSigma | F8775 |
100% Ethanol | MilliporeSigma | E7023 |
CaCl2 | MP Biomedicals | 215350280 |
chloroform | MilliporeSigma | C0549 |
cOmplete, EDTA-free Protease Inhibitor Cocktail | MilliporeSigma | COEDTAF-RO | mixed to 50x in water. Diluted to 1x in Sucrose buffer and GB buffer fresh
Dithiothreitol (DTT) | MilliporeSigma | D0632 | 1 M stock diluted to 500 µM in Sucrose buffer and GB buffer fresh
DpnII | NEB | R0543M |
Glycerol | MilliporeSigma | G9012 |
glycine | MilliporeSigma | G8898 |
glycogen | MilliporeSigma | 10901393001 |
HEPES | MilliporeSigma | H3375 |
KCl | MilliporeSigma | P3911 |
KH2PO4 | MilliporeSigma | P5655 |
methyl green pyronin | MilliporeSigma | HT70116 |
MgAc2 | Thermo Scientific | 1222530 |
Na2HPO4 | MilliporeSigma | S5136 |
NaCl | MilliporeSigma | S9888 |
phenol-chloroform | MilliporeSigma | P3803 |
Pronase | MilliporeSigma | 11459643001 |
Proteinase K | IBI Scientific | IB05406 |
qPCR Ready mix (Phire Taq etc) | MilliporeSigma | KCQS07 |
RNase A | MilliporeSigma | R6148 |
Sodium Acetate | MilliporeSigma | S2889 |
sodium dodecyl sulfate (SDS) | MilliporeSigma | L3771 |
Sucrose | MilliporeSigma | S0389 |
T4 DNA Ligase | Promega | M1804 |
Tris-HCl | MilliporeSigma | 108319 |
Triton X-100 | MilliporeSigma | T9284 |
Trypsin-EDTA | MilliporeSigma | T4049 |