This work describes a protocol for the generation of high-resolution in situ Hi-C libraries from tightly staged pre-gastrulation Drosophila melanogaster embryos.
Investigating the three-dimensional architecture of chromatin offers invaluable insight into the mechanisms of gene regulation. Here, we describe a protocol for performing the chromatin conformation capture technique in situ Hi-C on staged Drosophila melanogaster embryo populations. The result is a library for next-generation sequencing that allows the mapping of all chromatin interactions that occur in the nucleus in a single experiment. Embryo sorting is done manually using a fluorescent stereo microscope and a transgenic fly line containing a nuclear marker. Using this technique, embryo populations from each nuclear division cycle, and with defined cell cycle status, can be obtained with very high purity. The protocol may also be adapted to sort older embryos beyond gastrulation. Sorted embryos are used as inputs for in situ Hi-C. All experiments, including sequencing library preparation, can be completed in five days. The protocol has low input requirements and works reliably using 20 blastoderm stage embryos as input material. After sequencing, the data can be processed into genome-wide chromatin interaction maps that can be analyzed using a wide range of available tools to gain information about topologically associating domain (TAD) structure, chromatin loops, and chromatin compartments during Drosophila development.
Chromatin conformation capture (3C) has emerged as an exceptionally useful method to study the topology of chromatin in the nucleus1. The 3C variant Hi-C allows measuring the contact frequencies of all chromatin interactions that occur in the nucleus in a single experiment2. Application of Hi-C has played an important role in the discovery and characterization of many fundamental principles of chromatin organization, such as TADs, compartments, and loops3,4,5.
Studies of chromatin architecture in the context of developmental transitions and cell differentiation are increasingly used to unravel the mechanisms of gene regulation during these processes6,7,8,9. One of the model organisms of great interest is Drosophila melanogaster, whose development and genome are well characterized. However, few studies have investigated chromatin architecture in Drosophila outside of in vitro tissue culture settings10,11. In embryos 16–18 h post fertilization, TADs and compartments reminiscent of similar structures in mammals were identified10, which raises the question of what role they play in gene regulation during Drosophila embryo development. Such studies are technically challenging, especially in the early stages of development prior to gastrulation. Before gastrulation, Drosophila embryos undergo 13 synchronous nuclear divisions that proceed at an extremely rapid pace of 8–60 min per cycle12,13. In addition, the lack of visual features that distinguish the different stages makes it difficult to obtain tightly staged embryo material in sufficient quantities.
To develop a protocol that allows studying chromatin architecture in early Drosophila development at nuclear cycle resolution, we combined two existing techniques: in situ Hi-C, which allows the generation of high-resolution whole-genome contact maps5, and embryo staging using a transgenic Drosophila line expressing an eGFP-PCNA transgene13,14. The transgene product localizes to the nucleus during interphase and disperses throughout the syncytial blastoderm during mitosis. Using this property, it is possible to easily distinguish different stages by their nuclear density and mitotic embryos by the dispersion of the GFP signal.
Together, these techniques enable studying the three-dimensional structure of chromatin at high resolution from as few as 20 Drosophila embryos. This protocol includes the instructions for harvesting and sorting Drosophila embryos to obtain populations of embryos from a single nuclear division cycle. It further describes how the obtained embryos are used to perform in situ Hi-C. The end result is a nucleotide library suitable for sequencing on next-generation sequencing machines. The resulting sequencing reads can then be processed into detailed chromatin interaction maps covering the entire Drosophila genome.
1. Drosophila Embryo Collection
NOTE: An equivalent embryo collection can be performed as shown in a previous publication15.
2. Embryo Fixation
NOTE: Optimal fixation conditions, primarily the concentrations of detergent and formaldehyde and the duration of fixation, need to be empirically determined to fit the stage of the embryos. For stages around the syncytial blastoderm, final concentrations of 0.5% Triton X-100 and 1.8% formaldehyde in the aqueous phase work well. For later stages beyond embryo stage 9, further optimization of these parameters may be necessary. All solutions used during fixation and sorting should contain protease inhibitors.
3. Embryo Sorting
NOTE: Sorting can be done on any fluorescent stereo microscope equipped with a GFP filter at 60–80X magnification.
4. In Situ Hi-C
5. Sequencing Library Preparation
NOTE: All library steps are done using components from a commercial DNA library preparation kit (see Table of Materials). However, alternative kits or other reagents may be substituted. Precipitate tends to form in the library preparation reagents during freezer storage. It is therefore important to make sure that all precipitate is dissolved before using the reagents.
Sorted embryo populations at nuclear cycle 12, 13, and 14 (corresponding to 1:30, 1:45, and 2:10 hours post fertilization (hpf), respectively12) and at 3–4 hpf were obtained according to the procedures described in the protocol. By taking pictures of the eGFP-PCNA signal of each sorted embryo batch, it is possible to document the precise stage and cell cycle status of every single embryo that is used in downstream experiments. Example pictures of embryos from sorted populations are shown in Figure 1B-E. The output of the in situ Hi-C protocol is a nucleotide library ready to be sequenced on next-generation sequencing machines. For this purpose, a final library concentration of at least 2–4 nM is usually required. Using the recommended amounts of input material, this concentration is reliably achieved (Table 1).
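Fluorometric quantification reports a mass concentration, whereas the requirement above is stated as a molarity. The following is a minimal sketch of the usual conversion, assuming an average mass of 660 g/mol per base pair of double-stranded DNA and a mean fragment length taken from the fragment size trace. The function name and the example values are illustrative and not part of the protocol.

```python
def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA mass concentration (ng/µl) into a molarity (nM).

    Assumption: 660 g/mol per base pair, the commonly used average for dsDNA.
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    # ng/µl equals 1e-3 g/l; mol/l to nmol/l is a factor of 1e9, giving 1e6 overall.
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


# Hypothetical measurement: 1.5 ng/µl at a mean fragment size of 500 bp.
example_nM = library_molarity_nM(1.5, 500)
print(f"{example_nM:.2f} nM")
```

Because the result scales inversely with fragment length, the mean size should come from the actual size distribution of the library rather than from a nominal value.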
The expected size distribution of DNA fragments after size selection is 300–600 bp, with a maximum at around 500 bp (Figure 2A), depending on the exact shearing and size selection parameters. For sequencing, we recommend paired-end reads of at least 75 bp length to minimize the number of unmappable restriction fragments in the genome. High-resolution maps with 1–2 kb bin size can be obtained from 400 million reads. We recommend sequencing multiple biological replicates at a lower depth of ~150 million reads each, instead of sequencing a single replicate at very high depth. This allows assessment of the biological variation and leads to a lower number of reads discarded due to PCR duplication. For visual representation, the replicates can be combined. Before committing to sequencing a sample at high depth, we recommend running samples using shallow sequencing (a few million reads per sample) to determine basic library quality parameters as in Figure 2B.
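To get a feeling for how sequencing depth relates to bin size, the sketch below counts the bins at a given resolution and the average number of read ends per bin. The genome size of 140 Mb is a rough assumption used only for illustration, and the calculation ignores reads lost to filtering, so the values are upper bounds rather than expected yields.

```python
import math

# Assumption for illustration only: approximate size of the Drosophila genome.
ASSUMED_GENOME_SIZE_BP = 140_000_000


def binning_summary(genome_size_bp, bin_size_bp, read_pairs):
    """Return the number of bins and the mean number of read ends per bin."""
    n_bins = math.ceil(genome_size_bp / bin_size_bp)
    # Every read pair contributes two ends, each of which falls into one bin.
    mean_ends_per_bin = 2 * read_pairs / n_bins
    return {"n_bins": n_bins, "mean_read_ends_per_bin": mean_ends_per_bin}


for bin_size in (1_000, 2_000, 10_000):
    summary = binning_summary(ASSUMED_GENOME_SIZE_BP, bin_size, 400_000_000)
    print(bin_size, summary["n_bins"], round(summary["mean_read_ends_per_bin"]))
```

The number of possible bin pairs grows with the square of the number of bins, which is why halving the bin size requires considerably more than twice the reads for a comparably filled contact matrix.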
Analysis of Hi-C data requires significant computational resources and bioinformatics expertise. As a rough overview, the paired reads are mapped independently to the reference genome, the resulting alignments are filtered for quality and orientation, then a matrix of contacts at a given bin resolution or fragment level can be generated from the filtered alignments. The contact matrix is the basis for all further downstream analysis exploring TADs, loops, and compartments. For the initial analysis of the sequencing reads, several bioinformatics pipelines are available that enable processing of raw reads into contact matrices without much specialized bioinformatics knowledge18,19,20,21,22,23. How further analysis is carried out depends largely on the exact biological question under study and might require significant experience in programming and scripting in R or Python. However, several tools and algorithms to call TADs are available5,24,25,26,27,28, as well as software to analyze and explore Hi-C data in the web browser and as stand-alone desktop applications29,30,31,32.
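The binning step in this overview can be illustrated with a toy example. The sketch below assumes that read pairs have already been mapped and filtered and are available as pairs of positions on a single chromosome. The function name and the input layout are hypothetical, and real pipelines use sparse storage and handle all chromosomes.

```python
def contact_matrix(pairs, chrom_length_bp, bin_size_bp):
    """Bin intra-chromosomal read pairs into a symmetric contact matrix.

    `pairs` is an iterable of (pos1, pos2) tuples with 0-based positions in bp,
    assumed to be already filtered for mapping quality and orientation.
    """
    n_bins = -(-chrom_length_bp // bin_size_bp)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size_bp, pos2 // bin_size_bp
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror the contact to keep the matrix symmetric
    return matrix


# Toy data: four read pairs on a 10 kb chromosome, binned at 1 kb.
toy_pairs = [(120, 4_800), (150, 4_900), (2_300, 2_700), (9_100, 300)]
m = contact_matrix(toy_pairs, chrom_length_bp=10_000, bin_size_bp=1_000)
print(m[0][4], m[2][2], m[9][0])
```

A dense list of lists is used here only for readability; at 1–2 kb bin size a genome-wide matrix is far too large for this and would need a sparse representation.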
Once processed, the quality of the library can be assessed using several metrics (Figure 2B). First, the rate of PCR duplicates, which is the fraction of sequenced read pairs arising from the same original molecule, should be as low as possible to limit the number of wasted sequence reads. However, even libraries with >40% PCR duplication can be processed into high-quality contact maps if the duplicates are filtered. Second, the fraction of read pairs filtered out because of their orientation, as described previously4, should consistently be lower than 10% of aligned read pairs.
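Both metrics can be computed from simple counts. The sketch below treats read pairs with identical mapped coordinates and strands as duplicates, which is a common convention but an assumption here rather than a description of the pipeline used for Figure 2B. The tuple layout and function names are hypothetical.

```python
def duplicate_rate(pairs):
    """Fraction of read pairs that repeat an earlier pair exactly.

    Each pair is a hashable tuple such as
    (chrom1, pos1, strand1, chrom2, pos2, strand2).
    """
    seen = set()
    duplicates = 0
    for pair in pairs:
        if pair in seen:
            duplicates += 1
        else:
            seen.add(pair)
    return duplicates / len(pairs) if pairs else 0.0


def orientation_filter_ok(n_filtered, n_aligned, threshold=0.10):
    """Check that orientation-filtered pairs stay below the 10% guideline."""
    return n_filtered / n_aligned < threshold


# Toy data: five aligned pairs, two of which repeat the first one.
toy = [
    ("2L", 100, "+", "2L", 5_000, "-"),
    ("2L", 100, "+", "2L", 5_000, "-"),
    ("2L", 250, "-", "2L", 9_000, "+"),
    ("2L", 100, "+", "2L", 5_000, "-"),
    ("3R", 700, "+", "3R", 1_200, "+"),
]
print(duplicate_rate(toy), orientation_filter_ok(8_000, 100_000))
```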
During pre-gastrular development of Drosophila between nuclear cycles 12 and 14, the nuclear architecture is drastically remodeled33 (Figure 3). At nuclear cycle 12, few TADs are detected, and the overall distribution of contacts is very smooth without many discernible features. This changes dramatically at nuclear cycles 13 and 14, when TADs become increasingly prominent and unspecific long-range contacts are depleted.
Figure 1: Representative pictures of eGFP-PCNA embryos during sorting. (A) eGFP-PCNA signal from an unsorted population of embryos after 60 min collection and 2 h incubation at 25 °C. (B-E) Examples of embryos from sorted populations at nuclear cycle 12 (B), nuclear cycle 13 (C), nuclear cycle 14 (D), and from embryos undergoing synchronous mitosis (E). Scale bars = 200 µm.
Figure 2: Examples of in situ Hi-C library quality metrics. (A) Bioanalyzer traces showing the distribution of DNA fragment sizes from a successful Hi-C library (Library 1, top) and from a library that displays a peak of fragments that are too large for sequencing (Library 2, bottom). Library 2 was successfully sequenced, but even larger amounts of undesired DNA fragments may lead to decreased sequencing yields. (B) Filtering statistics of two Hi-C libraries: displayed is the number of aligned read pairs that are excluded from further analysis due to read orientation and distance (inward, outward)4 or PCR duplication (duplicate). In each bar, the number of reads passing the filter (remaining) and failing (filtered) are plotted. The percentage of reads passing the filter is additionally shown as text.
Figure 3: Hi-C interaction maps from staged embryos. Hi-C interaction maps are binned at 10 kb resolution and balanced as described before33. Shown is a region on chromosome 2L.
Library | Stage | Number of embryos | Amount DNA before shearing (ng) | PCR cycles | Final library concentration (nM) |
1 | nuclear cycle 12 | 71 | 46 | 12 | 28.2 |
2 | nuclear cycle 12 | 46 | 40 | 12 | 22.2 |
3 | nuclear cycle 12 | 60 | 13 | 13 | 12.3 |
4 | nuclear cycle 13 | 36 | 39 | 12 | 22.2 |
5 | nuclear cycle 13 | 35 | 10 | 12 | 5.0 |
6 | nuclear cycle 13 | 48 | 18 | 12 | 8.7 |
7 | nuclear cycle 14 | 33 | 30 | 12 | 39.8 |
8 | nuclear cycle 14 | 24 | 36 | 12 | 20.4 |
9 | nuclear cycle 14 | 14 | 8 | 12 | 4.2 |
10 | 3-4 hpf | 17 | 30 | 12 | 24.0 |
11 | 3-4 hpf | 18 | 42 | 11 | 19.1 |
12 | 3-4 hpf | 22 | 63 | 11 | 48.4 |
Table 1: List of representative sequencing library statistics. For each library in the list, the number of embryos that were used for its generation, the amount of total DNA before biotin pulldown and shearing measured by Qubit, the number of PCR cycles used for amplification, and the final concentration of the sequencing library after purification and size selection are indicated.
The protocol presented here is very effective at generating high-quality maps of the chromatin architecture in early Drosophila embryos. Compared to an earlier protocol34, the approach described here uses an up-to-date in situ Hi-C procedure5, resulting in quicker processing, higher resolution, and less reagent usage. The overall procedure including the in situ Hi-C protocol is expected to work on a wide range of stages and experimental systems besides Drosophila. Since the protocol has a low input requirement, it could also be used on isolated cell populations. In Drosophila, when using the protocol for embryos outside the range described here, some parameters, in particular the fixation of the material, might need to be adjusted. Since older embryos develop a highly impermeable cuticle, raising the concentration of formaldehyde and prolonging fixation may be appropriate. For collection of embryos at stages other than nuclear cycle 14, the incubation times of embryos at 25 °C in step 1.4 need to be adjusted as follows: nuclear cycle 12, 70 min; nuclear cycle 13, 90 min; 3–4 hpf, 3:30 h.
During the 13 cleavage divisions (stages 1–4), the nuclear density roughly doubles with each division. The nuclei can easily be identified by their bright GFP fluorescence. During mitosis, eGFP-PCNA is not located in the nucleus, and its signal is dispersed throughout the embryo. This feature makes it possible to identify embryos that are undergoing a synchronous cleavage division. For studying chromatin conformation, these mitotic embryos are usually not desirable, since the mitotic organization of chromatin is drastically different from the interphase organization35. It is possible to adapt the protocol to specifically select embryos undergoing a synchronous mitotic division. In this case, only embryos with dispersed, non-nuclear distribution of eGFP-PCNA should be kept, and all other embryos should be discarded. Since the nuclear density cannot be determined in these embryos, alternative methods to stage embryos by their morphology viewed in transmitted light microscopy must be employed. The presence of pole cells and nuclei at the embryo periphery indicates that the embryo has completed at least nuclear cycle 9, whereas nuclear cycle 14 is indicated by visible cellularization at the periphery12.
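The doubling of nuclear density that underlies the sorting can be written down as a simple idealized count. The sketch below assumes that every nucleus divides in every cycle, so the numbers are theoretical values rather than observed counts. Real embryos deviate from them, and only the roughly twofold step between consecutive cycles matters for staging.

```python
def ideal_nuclei_count(nuclear_cycle):
    """Idealized number of nuclei in a given nuclear cycle.

    Assumption: cycle 1 has a single nucleus and every nucleus divides once
    per cycle, so the count doubles with each division.
    """
    if nuclear_cycle < 1:
        raise ValueError("nuclear cycle must be 1 or higher")
    return 2 ** (nuclear_cycle - 1)


for cycle in (12, 13, 14):
    print(cycle, ideal_nuclei_count(cycle))
```

In this idealization, an embryo in nuclear cycle 13 carries twice as many nuclei as one in cycle 12, which is the difference that is judged by eye under the stereo microscope.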
Hi-C experiments can be successfully performed using a wide selection of restriction enzymes5. Current approaches typically use enzymes that recognize either a 4-base site, such as MboI, or a 6-base site, such as HindIII. The advantage of 4-base cutters over 6-base cutters is that they offer higher potential resolution, given enough sequencing depth, and a more even coverage of restriction sites across the genome. There is no clear advantage in choosing one 4-base cutter over another5,23,36,37. The two most commonly used enzymes, MboI and DpnII, both recognize the same GATC site. DpnII is less sensitive to CpG methylation, a difference that is of no concern in Drosophila. The protocol presented here can also be successfully completed using DpnII as the restriction enzyme. In this case, the restriction enzyme and buffer in section 4.2 have to be adjusted for DpnII compatibility according to the manufacturer's recommendations.
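The resolution argument for 4-base cutters follows from how often a recognition site is expected to occur. The sketch below estimates the average spacing of sites under the simplifying assumption of uniform, independent base composition and locates GATC sites in a short made-up sequence. Real genomes deviate from this assumption, so the numbers are only rough guides.

```python
def expected_site_spacing_bp(site_length):
    """Expected distance between sites, assuming all four bases are equally likely."""
    return 4 ** site_length


def find_sites(sequence, site="GATC"):
    """Return the 0-based start positions of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


print(expected_site_spacing_bp(4), expected_site_spacing_bp(6))
# Made-up sequence for illustration only.
print(find_sites("AAGATCTTGATCGATC"))
```

Under this assumption a 4-base site occurs on average every 256 bp and a 6-base site every 4,096 bp, which is why 4-base cutters permit smaller bin sizes when sequencing depth is sufficient.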
If the fragment size of the sequencing library deviates significantly from the range shown in Figure 2A, cluster formation during sequencing may be less efficient or fail completely. In this case, the size distribution after shearing should be checked and shearing parameters adjusted accordingly. Peaks of very small (<100 bp) or very large (>1,000 bp) DNA fragments indicate problems with size selection, such as carryover of beads or supernatant that should have been discarded. Libraries with small peaks at these undesirable sizes, such as Library 2 in Figure 2A, can often still be sequenced successfully with only a minor decrease in clustering efficiency.
High rates of PCR duplication should be avoided because they drastically reduce the number of usable sequence reads. The rate of PCR duplicates depends on the amount of input material: less input yields a less complex library, in which the same molecules are sequenced repeatedly. Using more input therefore usually alleviates problems with PCR duplication.
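The relationship between input material and duplication can be illustrated with a simple sampling model. The sketch below assumes that reads are drawn at random, with replacement, from a pool of equally represented unique molecules. This is a common simplification and not a description of real library behavior, and the complexity values in the example are hypothetical.

```python
import math


def expected_duplicate_fraction(unique_molecules, sequenced_pairs):
    """Expected duplicate fraction under random sampling with replacement.

    Assumption: every unique molecule in the library is equally likely to be
    sequenced, so the expected number of distinct molecules observed is
    C * (1 - exp(-N / C)) for complexity C and N sequenced read pairs.
    """
    if unique_molecules <= 0 or sequenced_pairs <= 0:
        raise ValueError("both arguments must be positive")
    expected_unique = unique_molecules * (
        1.0 - math.exp(-sequenced_pairs / unique_molecules)
    )
    return 1.0 - expected_unique / sequenced_pairs


# Hypothetical complexities, each sequenced to 150 million read pairs.
for complexity in (1e8, 1e9):
    print(f"{complexity:.0e}", round(expected_duplicate_fraction(complexity, 1.5e8), 3))
```

In this model, a tenfold more complex library sequenced to the same depth shows a much lower duplicate fraction, in line with the recommendation to increase the input and to spread sequencing over several replicates.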
Higher numbers of reads filtered due to read orientation (Figure 2B) indicate insufficient digestion, which can be the result of using too little enzyme, too much input material, or incomplete homogenization of the embryos.
The authors have nothing to disclose.
This research was funded by the Max Planck Society. C.B.H. was supported by a fellowship from the International Max Planck Research School – Molecular Biomedicine. We thank Shelby Blythe and Eric Wieschaus for kindly providing the eGFP-PCNA Drosophila melanogaster line.
Name | Company | Catalog Number | Comments |
Biotin-14-dATP | Life Technologies | 19524016 | |
MboI | New England Biolabs | R0147L | |
DNA Polymerase I Klenow Fragment | New England Biolabs | M0210L | |
T4 DNA Ligase | Thermo Fisher | EL0012 | T4 DNA Ligase Buffer included |
T4 DNA Polymerase | New England Biolabs | M0203L | |
Proteinase K | AppliChem | A4392 | |
GlycoBlue | Life Technologies | AM9516 | |
Complete Ultra EDTA-free protease inhibitors | Roche | 5892791001 | |
NEBNext Multiplex Oligos for Illumina (Index Primers Set 1) | New England Biolabs | E7335 | Sequencing Adaptor, Forward (unindexed) PCR primer and Reverse (indexed) PCR primer and USER enzyme used in the Library preparation section are components of this kit |
NEBNext Ultra II DNA Library Prep Kit | New England Biolabs | E7645 | End Prep Enzyme Mix, End Prep Reaction Buffer, Ligation Enhancer, Ligation Master Mix and Polymerase Master Mix used in the Library preparation section are components of this kit |
Covaris S2 AFA System | Covaris | ||
DNA LoBind Tubes, 1.5 mL | Eppendorf | 0030108051 | |
Falcon cell strainer 100 µm | Corning | 352360 | Embryo collection baskets |
37% formaldehyde | VWR | 437536C | |
Heptane | AppliChem | 122062.1612 | |
M165 FC fluorescent stereo microscope | Leica | ||
M165 FC DFC camera | Leica | ||
Metal micro pestle | Carl Roth | P985.1 | Used to lyse embryos in step 4.1.4 |
RNase A | AppliChem | A3832,0050 | |
Dynabeads MyOne Streptavidin C1 | Life Technologies | 65002 | Streptavidin coated magnetic beads |
Ampure XP beads | Beckman Coulter | A63881 | |
Qubit 3.0 Fluorometer | Thermo Fisher Scientific | Q33216 | |
Qubit assay tubes | Thermo Fisher Scientific | Q32856 | |
Qubit dsDNA HS Assay Kit | Thermo Fisher Scientific | Q32854 | |
Phosphate buffered saline (PBS) | Sigma-Aldrich | P4417 | |
eGFP-PCNA flies | Gift from S. Blythe and E. Wieschaus | ||
Sodium hypochlorite 13% | Thermo Fisher | AC219255000 | |
Triton X-100 | AppliChem | A4975 | |
Tris buffer pH 8.0 (1 M) for molecular biology | AppliChem | A4577 | |
NaCl | AppliChem | A2942 | |
IGEPAL CA-630 | Sigma-Aldrich | I8896 | |
1.5 mL microcentrifuge tubes | Greiner Bio-One | 616201 | |
SDS for molecular biology | AppliChem | A2263 | |
10x CutSmart buffer | New England Biolabs | B7204S | Restriction enzyme buffer |
PCR Nucleotide Mix | Sigma-Aldrich | 11814362001 | Unmodified dCTP, dGTP, dTTP |
BSA, Molecular Biology Grade | New England Biolabs | B9000S | |
EDTA 0.5M solution for molecular biology | AppliChem | A4892 | |
Sodium acetate 3M pH 5.2 | Sigma-Aldrich | S7899 | |
DynaMag-2 Magnet | Life Technologies | 12321D | Magnetic stand |
Intelli-Mixer RM-2L | Omnilab | 5729802 | Rotator |
ThermoMixer F1.5 | Eppendorf | 5384000012 | Mixer |
Small Embryo Collection Cages | Flystuff.com | 59-100 | Egg collection cage |
Centrifuge 5424 R | Eppendorf | 5404000413 | |
C1000 Touch Thermal Cycler | Bio-Rad | 1851148 | |
PCR tube strips | Greiner Bio-One | 673275 | |
NEBuffer 2.1 | New England Biolabs | B7202S | T4 DNA Polymerase buffer |