Bisulfite amplicon sequencing (BSAS) is a method for quantifying cytosine methylation in targeted genomic regions of interest. This method uses bisulfite conversion paired with PCR amplification of target regions prior to next-generation sequencing to produce absolute quantitation of DNA methylation at a base-specific level.
The role of epigenetic processes in the control of gene expression has been known for a number of years. DNA methylation at cytosine residues is of particular interest for epigenetic studies because it has been shown to be both a long-lasting and a dynamic regulator of gene expression. Efforts to examine epigenetic changes in health and disease have been hindered by the lack of high-throughput, quantitatively accurate methods. With the advent and popularization of next-generation sequencing (NGS) technologies, these tools are now being applied to epigenomics in addition to existing genomic and transcriptomic methodologies. For epigenetic investigations of cytosine methylation in which regions of interest, such as specific gene promoters or CpG islands, have been identified and significant numbers of samples must be examined with high quantitative accuracy, we have developed a method called Bisulfite Amplicon Sequencing (BSAS). This method combines bisulfite conversion with targeted amplification of regions of interest, transposome-mediated library construction, and benchtop NGS. BSAS offers a rapid and efficient method for analysis of up to 10 kb of targeted regions in up to 96 samples at a time and can be performed by most research groups with basic molecular biology skills. The results provide absolute, base-specific quantitation of cytosine methylation. BSAS can be applied to any genomic region from any DNA source. The method is useful for hypothesis-testing studies of target regions of interest, as well as for confirmation of regions identified in genome-wide methylation analyses such as whole genome bisulfite sequencing, reduced representation bisulfite sequencing, and methylated DNA immunoprecipitation sequencing.
It has been over a half-century since the first report of naturally occurring DNA modifications in the form of cytosine methylation1. 5-Methylcytosine (5-mC) is a modified cytosine base carrying a methyl group on the fifth carbon (C5) of the pyrimidine ring; in mammalian genomes it occurs most frequently in the CpG dinucleotide motif. The presence of 5-mC in gene promoters is generally associated with transcriptional repression, while its absence is associated with transcriptional activity2.
Tremendous advances have been made in our understanding of the role of DNA methylation in development3, transgenerational propagation of epigenetic profiles4, cancer pathogenesis5,6, and a number of other research areas. Many of these advances have come in the last few years as new methodologies have been developed to profile DNA methylation7. With the advent and popularization of next-generation sequencing (NGS) techniques, there are now a number of methods to profile DNA methylation across a whole genome or large regions of a genome8. These approaches make possible discovery studies that quantify DNA methylation patterns and differences in DNA methylation. In addition to these hypothesis-generating approaches, there is a significant need for targeted, hypothesis-testing methods of DNA methylation quantitation.
Pyrosequencing, methylation-specific PCR, and direct Sanger sequencing of bisulfite-converted DNA have been the most widely used methods for analysis of targeted regions (e.g., the promoter region of a single gene or a CpG island)9-12. All of these methods rely on bisulfite conversion, a chemical deamination reaction in which unmethylated cytosines are converted into uracils, while methylated cytosines remain intact13,14. PCR amplification of bisulfite-converted DNA replaces the uracils with thymines15, allowing differential methylation to be read out as a base difference. While highly useful, these methods are limited by low quantitative accuracy, short read length, and low sample throughput. To address these limitations, we developed a method we have termed Bisulfite Amplicon Sequencing (BSAS)16. The goal of this method is to analyze target genomic regions of interest in a large number of samples with high quantitative accuracy.
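The net effect of bisulfite conversion followed by PCR on a single strand can be modeled in a few lines. The Python sketch below assumes 100% conversion efficiency and models only the given (top) strand: every cytosine not listed as methylated is reported as T, and methylated cytosines stay C. The function name and inputs are hypothetical illustrations, not part of the BSAS protocol.

```python
def bisulfite_readout(seq: str, methylated: set[int]) -> str:
    """Return a strand as it reads after bisulfite conversion and PCR.

    Assumes complete conversion: unmethylated C -> U, which PCR copies as T.
    Cytosines at the 0-based positions in `methylated` are protected and stay C.
    Only the supplied (top) strand is modeled.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")  # deaminated to uracil, amplified as thymine
        else:
            out.append(base)  # methylated C and non-C bases are unchanged
    return "".join(out)


# Hypothetical example: two CpGs (positions 1 and 5); only position 5 is methylated.
print(bisulfite_readout("ACGTCCGA", {5}))  # ATGTTCGA
```

The methylated CpG at position 5 still reads as C, while the unmethylated CpG at position 1 and the non-CpG cytosine at position 4 read as T, which is the base difference that all bisulfite methods exploit.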
BSAS uses elements of existing methods (bisulfite conversion and region-specific PCR amplification) and combines them with simple transposome-mediated NGS library construction17 and benchtop NGS18 (Figure 1). This approach provides a more quantitative, higher-throughput method for examining cytosine methylation in any region of interest. Here we present the method in detail and describe approaches for troubleshooting and quality checking the BSAS process.
1. Nucleic Acid Isolation from Tissue
NOTE: Co-isolation of DNA and RNA from the same tissue sample provides DNA for epigenetic analysis and paired RNA for gene expression analysis. The example given here uses column purification from experimental tissue, but alternative isolation methods suited to other sample types can be employed. The primary requirement is highly purified nucleic acid free of contaminating protein and organic solvents.
2. Target Identification and Primer Design
3. Bisulfite Specific PCR Optimization
NOTE: A number of commercial kits are available for bisulfite conversion of genomic DNA. Select the kit or protocol that best suits the planned experiment.
4. NGS Library Preparation and Library QC
NOTE: BSAS NGS library preparation uses a simplified transposome-mediated protocol with dual indexing (see Materials List for details on library preparation kit selection). This method provides an extremely rapid, high-throughput route to library construction that has been validated and shows little to no bias in bisulfite sequencing applications16,19. Dual indexing allows more samples to be multiplexed than single indexing. Other library preparation methods may be possible but have not been tested.
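Dual indexing multiplies rather than adds: with m distinct i7 and n distinct i5 index primers, each sample can receive a unique (i7, i5) pair, giving m × n combinations from only m + n primers. The Python sketch below assumes a hypothetical set of 12 i7 and 8 i5 primers with placeholder names and maps them onto a 96-well plate (i7 by column, i5 by row); it does not describe the contents of any particular commercial index kit.

```python
from itertools import product


def dual_index_pairs(i7: list[str], i5: list[str]) -> list[tuple[str, str]]:
    """All (i7, i5) pairings; each unique pair can tag one sample in a pool."""
    return list(product(i7, i5))


def plate_layout(i7: list[str], i5: list[str]) -> dict[str, tuple[str, str]]:
    """Assign i7 primers to plate columns and i5 primers to rows (up to 8 x 12)."""
    rows = "ABCDEFGH"
    if len(i5) > len(rows) or len(i7) > 12:
        raise ValueError("layout assumes at most 8 i5 and 12 i7 primers")
    return {
        f"{rows[r]}{c + 1}": (i7[c], i5[r])
        for r in range(len(i5))
        for c in range(len(i7))
    }


# Hypothetical primer sets with placeholder names: 12 i7 and 8 i5.
i7 = [f"i7_{n:02d}" for n in range(1, 13)]
i5 = [f"i5_{n:02d}" for n in range(1, 9)]
pairs = dual_index_pairs(i7, i5)
layout = plate_layout(i7, i5)
print(len(i7) + len(i5), "primers ->", len(pairs), "unique dual-index combinations")
print("well A1:", layout["A1"], "well H12:", layout["H12"])
```

Here 20 primers label 96 samples; with single indexing, the same 20 primers would label only 20.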
5. Sequencing Libraries Using a Benchtop Sequencer
6. Methylation Quantitation Data Analysis
NOTE: Multiple commercial and open-source programs are available for NGS data analysis. Details on running each program can be found with its software package. General instructions are given at each step, along with the specific commands for the software package used here (see Materials List for details on software used in this protocol).
Equation 1. Bisulfite conversion efficiency.
NOTE: While mammalian genomes contain very low amounts of non-CpG cytosine methylation (with the possible exception of stem cells20), plant genomes contain a significant amount. To determine bisulfite conversion efficiency in plant genomes, a suitable route is to sequence a portion of the mitochondrial or chloroplast genome, or known unmethylated genes present in plant genomes21, and calculate conversion efficiency as described above. Bisulfite conversion efficiency should be ≥98% in most cases; the unconverted Cs may arise from non-CpG methylation.
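One common definition of bisulfite conversion efficiency, assumed for this sketch (it may differ in detail from Equation 1), is the percentage of cytosines in contexts expected to be unmethylated that read as T: efficiency = T / (C + T) × 100, where C and T are base calls at non-CpG reference cytosines (mammals) or at cytosines in a known unmethylated control region such as chloroplast DNA (plants). The Python sketch below applies that assumed definition to pooled base counts and checks the result against the ≥98% guideline; the function names and example counts are hypothetical.

```python
def conversion_efficiency(c_count: int, t_count: int) -> float:
    """Percent of expected-unmethylated cytosines that read as T.

    Assumed definition: efficiency = T / (C + T) * 100, with C and T counted
    at reference cytosines outside CpG context (or in a known unmethylated
    control region).
    """
    total = c_count + t_count
    if total == 0:
        raise ValueError("no C or T calls at the selected cytosine positions")
    return 100.0 * t_count / total


def passes_threshold(efficiency: float, minimum: float = 98.0) -> bool:
    """Compare against the >=98% guideline given in the text."""
    return efficiency >= minimum


# Hypothetical counts pooled over all non-CpG cytosines in one amplicon.
eff = conversion_efficiency(c_count=150, t_count=9850)
print(f"{eff:.1f}% converted, passes: {passes_threshold(eff)}")  # 98.5% converted, passes: True
```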
BSAS reads properly aligned to the converted reference sequence will resemble Figure 5. CpG dinucleotides can be clearly identified, and methylation states can be estimated from the base calls at CpG sites in the mapped reads. For example, with methylation controls, 0% methylation results in all reads mapped to CpG sites containing T (Figure 5A), whereas 100% methylation results in all reads mapped to CpG sites containing C (Figure 5B).
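Estimating methylation from an alignment amounts to tallying, at each CpG cytosine of the converted reference, how many overlapping reads call C (methylated) versus T (unmethylated), then reporting C / (C + T). The Python sketch below assumes reads are already aligned without gaps and supplied as hypothetical (0-based start, sequence) pairs on the same strand as the reference; real analysis software also handles indels, clipping, and both strands, which this toy example does not.

```python
from collections import Counter
from typing import Optional


def cpg_positions(ref: str) -> list[int]:
    """0-based positions of the C in each CG dinucleotide of the reference."""
    ref = ref.upper()
    return [i for i in range(len(ref) - 1) if ref[i] == "C" and ref[i + 1] == "G"]


def cpg_base_counts(ref: str, reads: list[tuple[int, str]]) -> dict[int, Counter]:
    """Tally base calls from gaplessly aligned reads at each CpG cytosine."""
    counts = {pos: Counter() for pos in cpg_positions(ref)}
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if pos in counts:
                counts[pos][base] += 1
    return counts


def methylation_percent(counts: Counter) -> Optional[float]:
    """C / (C + T) * 100 at one site; None when there are no C or T calls."""
    c, t = counts["C"], counts["T"]
    return 100.0 * c / (c + t) if c + t else None


# Hypothetical converted reference (CpGs at 2 and 6) and three aligned reads.
ref = "ATCGTTCGAT"
reads = [(0, "ATCGTTCGAT"), (0, "ATTGTTTGAT"), (2, "CGTTTGAT")]
for pos, tally in cpg_base_counts(ref, reads).items():
    print(pos, f"{methylation_percent(tally):.1f}%")
# 2 66.7%
# 6 33.3%
```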
Quantitation of cytosine frequency, C/(C+T), at CpG sites yields the methylation frequency in the original sample. A representative standard curve generated from whole genome methylation controls (n = 3/methylation ratio) shows linearity of methylation quantitation as well as precision of quantitation at each methylation ratio (Figure 6). As an end-to-end example of the method, RNA and DNA were co-isolated from mouse cerebellum and retina (n = 4/group). Expression of rhodopsin, which is selectively expressed in retinal tissue, was measured using qPCR and was detected only in retina (Figure 7A). Using BSAS, CpG methylation levels were quantified in the rhodopsin promoter region. Cumulative methylation levels across the promoter region were >80% in the cerebellum compared to <15% in the retina (p < 0.001, parametric t-test) (Figure 7B). Because BSAS quantifies methylation at individual sites, methylation levels can be compared on a CpG-specific basis across any given genomic region. CpG methylation levels across the rhodopsin promoter were significantly higher in cerebellar samples than in retina samples (p < 0.001, parametric t-test at each CpG site) (Figure 7C).
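Linearity of a standard curve such as the one in Figure 6 can be summarized by an ordinary least-squares fit of quantified against expected methylation, reporting slope, intercept, and R²; a slope near 1 and an intercept near 0 suggest little quantitation bias. The Python sketch below uses only the standard library. The input values in the usage lines are hypothetical placeholders, not the data behind Figure 6.

```python
from statistics import mean


def fit_standard_curve(expected: list[float], observed: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of observed = slope * expected + intercept.

    Returns (slope, intercept, r_squared). Replicates can be passed as
    repeated expected values. R^2 is NaN if all observed values are equal.
    """
    if len(expected) != len(observed) or len(expected) < 2:
        raise ValueError("need at least two paired points")
    mx, my = mean(expected), mean(observed)
    sxx = sum((x - mx) ** 2 for x in expected)
    if sxx == 0:
        raise ValueError("expected values must not all be identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(expected, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(expected, observed))
    ss_tot = sum((y - my) ** 2 for y in observed)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return slope, intercept, r_squared


# Hypothetical replicate means, for illustration only (not Figure 6 data).
expected = [0, 25, 50, 75, 100]
observed = [1.2, 24.1, 49.5, 76.0, 98.8]
slope, intercept, r2 = fit_standard_curve(expected, observed)
print(f"slope={slope:.3f} intercept={intercept:.2f} R^2={r2:.4f}")
```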
Figure 1: Schematic of BSAS method. In this method, genomic DNA is bisulfite converted to change unmethylated cytosines to uracils. Subsequently, during PCR, these uracils are replaced by thymines. PCR amplification is directed against regions of interest and highly enriches for just these sequences. The resulting PCR amplicons are made into dual-indexed libraries through a simple tagmentation process. The libraries are then sequenced on a benchtop next-generation sequencer, the sequencing reads are mapped to the in silico converted reference sequence, and percent cytosine methylation is determined in a base-specific manner.
Figure 2: In silico bisulfite conversion of region of interest. Any region of interest from any genomic reference can be selected for in silico bisulfite conversion. Non-CpG cytosines are replaced with thymines in the reference sequence, while CpG cytosines remain cytosines in the bisulfite-converted reference.
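The conversion rule in Figure 2 can be applied programmatically. The Python sketch below converts the top strand of a reference: each C is kept if the next base is G and is otherwise replaced by T. It assumes a plain DNA string and handles one strand at a time; for the bottom strand, the same rule would be applied to the reverse complement. Both function names are hypothetical.

```python
def in_silico_convert(ref: str) -> str:
    """Top-strand in silico bisulfite conversion of a reference sequence.

    Non-CpG cytosines become T; the C of each CG dinucleotide is retained so
    that C (methylated) or T (unmethylated) calls can be scored at CpG sites.
    """
    ref = ref.upper()
    converted = []
    for i, base in enumerate(ref):
        next_is_g = i + 1 < len(ref) and ref[i + 1] == "G"
        converted.append("T" if base == "C" and not next_is_g else base)
    return "".join(converted)


def reverse_complement(seq: str) -> str:
    """Reverse complement, so the bottom strand can be converted with the same rule."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


ref = "ACGTCCGAC"
print(in_silico_convert(ref))                      # ACGTTCGAT
print(in_silico_convert(reverse_complement(ref)))  # bottom strand, converted
```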
Figure 3: Primer placement. Bisulfite PCR primers are designed against an in silico bisulfite-converted reference sequence. Primers should be designed so that they do not overlap CG dinucleotides or lie adjacent to them.
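The placement rule in Figure 3 can be screened automatically. The Python sketch below flags a candidate primer, given by its 0-based start and length on the converted top-strand reference, if any CG dinucleotide overlaps it or lies within `flank` bases of either end. The one-base default flank (set `flank=0` to flag only overlaps) and the function names are assumptions for illustration; primer melting temperature, length, and specificity checks are outside its scope.

```python
def cpg_sites(ref: str) -> set[int]:
    """0-based positions of the C in each CG dinucleotide."""
    ref = ref.upper()
    return {i for i in range(len(ref) - 1) if ref[i] == "C" and ref[i + 1] == "G"}


def primer_conflicts(ref: str, start: int, length: int, flank: int = 1) -> list[int]:
    """CpG positions that overlap the primer or fall within `flank` bases of it.

    A CpG occupies positions (p, p + 1); it conflicts if that pair intersects
    the half-open window [start - flank, start + length + flank).
    """
    lo, hi = start - flank, start + length + flank
    return sorted(p for p in cpg_sites(ref) if p + 1 >= lo and p < hi)


# Hypothetical converted reference with a single CpG at position 3.
ref = "ATTCGATTTTTTTTTTTTAA"
print(primer_conflicts(ref, 5, 10))  # [3]  (CpG ends right next to the primer)
print(primer_conflicts(ref, 6, 10))  # []   (one base of clearance)
```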
Figure 4: Examples of high- and poor-quality PCR amplicons and sequencing libraries. Representative electropherogram traces and gel show (A) an ideal amplicon for transposome-mediated library generation, with a single high-concentration product. (B) The library generated from this amplicon shows high concentration and an even size distribution. (C) Poor-quality PCR amplicons can be small, low in concentration, and contain multiple products and/or primer dimers. These poor-quality amplicons lead to (D) failed library generation with low concentration and a small size distribution.
Figure 5: Examples of sequencing outputs from methylation controls. (A) In silico converted reference sequence with CpG sites highlighted in red. Reads from the 0% methylation control mapped to the selected region show thymines (green highlight) at CpG sites. (B) Mapped reads from the 100% methylation control show cytosines (blue highlight) at CpG sites.
Figure 6: Methylation quantitation across a range of methylation controls. Whole genome methylation controls mixed at methylation ratios from 0% to 100% (n = 3/ratio) were quantified using BSAS. Plotting expected versus quantified methylation demonstrates linearity of methylation control quantitation. Quantitation was highly precise across all controls.
Figure 7: Example of paired DNA methylation and gene expression analysis from tissue samples. (A) Rhodopsin relative mRNA expression from mouse cerebellar and retinal tissue. (B) Rhodopsin average promoter methylation quantified by BSAS between cerebellar and retinal DNA (***p < 0.001, parametric t-test). (C) CpG site-specific methylation levels across the rhodopsin promoter (-244 to +6, relative to transcription start site [TSS]) in mouse cerebellum and retina DNA.
BSAS allows focused, accurate DNA methylation quantitation with high throughput in both the number of samples and the number of targets. In addition, library generation by transposome-mediated tagmentation requires less input DNA, is quicker, and involves fewer steps than traditional shearing followed by adapter ligation. These improvements over existing methods allow precise base-level DNA methylation analysis of any target of interest. Moreover, BSAS provides a new method for confirming regions of interest (differentially methylated regions (DMRs), CpG islands, regulatory regions) that have been identified using whole genome bisulfite22 or capture23 approaches. Orthogonal confirmation and more quantitatively precise methods improve the analytical rigor of epigenomic studies. The high read depth achieved with BSAS yields a narrow confidence interval and therefore precise quantitation16. Additionally, the ability to run many samples in a single experiment can increase the sample size for confirmation of whole genome results, increasing statistical power, which is a weakness of many current epigenetic studies. Importantly, BSAS provides base-specific CpG methylation quantitation, which allows the generation of testable hypotheses for epigenetic research, for example, that increased methylation at a specific response element decreases binding of a specific transcription factor. BSAS, combined with paired mRNA expression data, provides a way to measure both epigenetic regulation and mRNA expression within a single piece of tissue or cell population. Paired measurements of regulation and expression also increase the scientific rigor and impact of the findings.
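The link between read depth and precision can be illustrated by treating the C calls at one CpG site as a binomial sample of the n reads covering it. The Python sketch below computes a 95% Wilson score interval, a standard binomial interval chosen here for illustration (it is not stated to be the statistic used in reference 16), and shows how the interval narrows with depth at a fixed 50% methylation level. It ignores PCR and sequencing error, so real uncertainty is larger than these intervals suggest.

```python
from math import sqrt


def wilson_interval(methylated: int, depth: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the methylated fraction at one CpG site.

    methylated: reads calling C; depth: reads calling C or T; z = 1.96 gives ~95%.
    """
    if depth <= 0 or not 0 <= methylated <= depth:
        raise ValueError("require depth > 0 and 0 <= methylated <= depth")
    p = methylated / depth
    denom = 1 + z**2 / depth
    center = (p + z**2 / (2 * depth)) / denom
    half = z * sqrt(p * (1 - p) / depth + z**2 / (4 * depth**2)) / denom
    return center - half, center + half


# Interval width at 50% methylation for increasing (hypothetical) depths.
for depth in (10, 100, 1000, 10000):
    lo, hi = wilson_interval(depth // 2, depth)
    print(f"depth {depth:>5}: 95% CI {100 * lo:.1f}-{100 * hi:.1f}%")
```

At depths above 1,000×, the interval around a 50% estimate is only a few percentage points wide, whereas at 10× it spans tens of percentage points.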
Critical steps within the BSAS protocol include primer design, amplicon optimization, and library generation/QC. PCR bias is introduced if primers are poorly designed and amplify templates at different efficiencies8. Methylation standards, such as enzymatically generated 100% and 0% methylated cytosine standards, can be used to determine the degree, if any, of methylation quantitation bias and are a proper methodological control when quantifying methylation16. Similarly, care should be taken when optimizing PCR reactions to generate a single high-quality amplicon. If no amplicon is produced using the guidelines detailed in the protocol, optimization steps include increasing the PCR cycle number or the bsDNA input amount. If there are multiple PCR products, including primer dimers, optimization steps include decreasing the bsDNA input amount, decreasing the PCR primer concentrations, increasing the annealing temperature, or decreasing the PCR cycle number. One or a combination of these optimization procedures should yield a single high-quality amplicon. Alternatively, redesigning primers is a good approach for primer sets that do not yield a single amplicon. For regions where redesign is not suitable or possible, degenerate primer sets containing both cytosines and thymines at CpG sites in forward primers, and adenines and guanines at CpG sites in reverse primers, may work. However, these primer sets need further optimization to ensure that no PCR bias is occurring, a process outside the scope of this protocol. High-quality libraries are essential for success. Ensure that QC measures are carried out to determine the size, quality, and quantity of libraries before sequencing; sequencing reactions are prone to failure when low-quality libraries are loaded. Bias in methylation quantitation can be mitigated by ensuring that methylation frequencies are reproduced in both forward and reverse sequencing reads.
Additionally, quality scores of bases aligned to CpG sites should be ≥Q30 for high confidence in quantitation. Although others have previously shown little to no bias in tagmentation reactions of bisulfite-converted DNA19, meeting the metrics described above confirms low-bias methylation quantitation.
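The two checks above, ≥Q30 base quality at CpG sites and agreement between forward and reverse reads, can be combined as in the Python sketch below. It assumes Phred+33 quality encoding (used by current Illumina FASTQ output) and that base calls at one CpG site have already been extracted as hypothetical (base, quality character, orientation) tuples with orientation "F" or "R". The 5-percentage-point concordance tolerance is an illustrative choice, not a value from this protocol.

```python
from typing import Iterable, Optional


def phred(qual_char: str) -> int:
    """Decode a Phred+33 quality character (e.g. 'I' -> 40, '?' -> 30)."""
    return ord(qual_char) - 33


def site_methylation(calls: Iterable[tuple[str, str, str]], min_q: int = 30) -> dict[str, Optional[float]]:
    """Percent methylation, C / (C + T), per read orientation at one CpG site.

    Calls below `min_q` or other than C/T are discarded; orientation must be
    'F' or 'R'. Returns None for an orientation with no usable calls.
    """
    tallies = {"F": [0, 0], "R": [0, 0]}  # [C count, T count]
    for base, qual_char, orientation in calls:
        if phred(qual_char) < min_q or base not in ("C", "T"):
            continue
        tallies[orientation][0 if base == "C" else 1] += 1
    return {o: (100.0 * c / (c + t) if c + t else None) for o, (c, t) in tallies.items()}


def concordant(per_read: dict[str, Optional[float]], tolerance: float = 5.0) -> bool:
    """True when both orientations have data and differ by <= tolerance points."""
    f, r = per_read["F"], per_read["R"]
    return f is not None and r is not None and abs(f - r) <= tolerance


# Hypothetical calls at one CpG; the '#' (Q2) call is filtered out.
calls = [("C", "I", "F"), ("C", "I", "F"), ("T", "I", "F"), ("T", "#", "F"),
         ("C", "I", "R"), ("C", "?", "R"), ("T", "I", "R")]
result = site_methylation(calls)
print(result, "concordant:", concordant(result))
```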
BSAS currently measures methylation at CpG sites; however, future expansions of this method will allow paired measurement of 5-hmC at CpG sites by adding experimental steps that distinguish 5-mC from 5-hmC, such as treatment with potassium perruthenate before bisulfite conversion24. This will extend the utility of BSAS by providing paired methylation (both 5-mC and 5-hmC) and expression data to determine the roles of DNA methylation in regulating gene expression. Transposome-mediated library preparation of PCR amplicons does result in decreased sequencing depth at the ends of amplified regions. Therefore, regions of high interest should be designed to lie in the middle of amplicons. However, because BSAS generates read depths of >1,000×, the quantitative accuracy of 5-mC measurements near the ends of amplified regions remains high.
The authors have nothing to disclose.
The authors thank Colleen Van Kirk for mouse retina and cerebellum samples, and Peter Gregory for figure generation. The authors also thank Dr. Allison Gillaspy and the Laboratory for Molecular Biology and Cytometry Research core for access to the MiSeq. This work was supported by NIH grants DA029405, EY021716, and AG026607 and the Donald W. Reynolds Foundation.
Name | Company | Cat # | Comments |
Agilent 2100 Bioanalyzer | Agilent | G29393AA | |
Agilent RNA 6000 Nano kit | Agilent | 5067-1511 | |
Agilent High Sensitivity DNA kit | Agilent | 5067-4626 | |
Typhoon 9200 | Amersham/GE Healthcare Life Sciences | ||
Agencourt AMPure XP – PCR Purification | Beckman Coulter | A63880 | 5 mL |
PowerPac Basic Power Supply | Bio-Rad | 164-5050 | |
twin.tech PCR plate 96 | Eppendorf | 951020362 | semi-skirted |
Heatsealing Film | Eppendorf | 0030127838 | |
Heatsealing Foil | Eppendorf | 0030127854 | |
Heat Sealer | Eppendorf | 5390000.024 | |
0.1-10 µL Tips | Eppendorf | 0030073.002 | |
2-200 µL Tips | Eppendorf | 0030073.045 | |
50-1000 µL Tips | Eppendorf | 0030073.100 | |
MixMate | Eppendorf | 5353000.014 | |
1.5 mL Microcentrifuge Tubes | Eppendorf | 0030121.023 | |
Centrifuge 5424 | Eppendorf | 5424000.010 | |
2.0 mL Microcentrifuge Tube | Eppendorf | 022363352 | |
Nextera XT DNA Sample Preparation Kit | Illumina | FC-131-1024 | 24 rxn |
Nextera XT Index Kit | Illumina | FC-131-1001 | 24 indexes |
MiSeq Desktop Sequencer | Illumina | ||
MiSeq Reagent Kit v2 | Illumina | MS-102-2002 | 300 cycles |
KAPA SYBR FAST Universal qPCR kit | KAPA Biosystems | KK4824 | |
Quant-iT PicoGreen dsDNA Assay kit | Life Technologies | P7589 | |
Quant-iT RiboGreen RNA Assay kit | Life Technologies | R11490 | |
ProFlex 96-well PCR system | Life Technologies | 4484075 | |
Novex 6% TBE DNA gel | Life Technologies | EC6261BOX | 10-well |
Novex TBE Running Buffer | Life Technologies | LC6675 | 5X (1 L) |
Novex Hi-Density TBE Sample Buffer | Life Technologies | LC6678 | 5X (10 mL) |
1 Kb Plus DNA Ladder | Life Technologies | 10787-018 | |
Xcell SureLock Mini-Cell | Life Technologies | EI0001 | |
UltraPure 10mg/mL Ethidium Bromide | Life Technologies | 15585-011 | |
7900HT Fast Real-Time PCR system | Life Technologies | 4329001 | 384-well block |
DynaMag-96 Side Skirted | Life Technologies | 12027 | |
SpectraMax M2 Multi-Mode Microplate Reader | Molecular Devices/VWR | 89429-532 | |
AllPrep DNA/RNA Mini Kit | Qiagen | 80204 | |
Stainless Steel Beads | Qiagen | 69989 | 5 mm |
CLC Genomics Workbench | Qiagen | | Software for methylation pipeline |
QIAQuick PCR Purification kit | Qiagen | 28104 | 50 rxn |
TissueLyser II | Retsch/Qiagen | 85300 | |
Nuclease-Free Water | Sigma | W4502 | |
TRIS hydrochloride | Sigma | PHG0002-100G | |
Tween-20 | Sigma | P9416-50ML | |
NanoDrop 2000 | Thermo Scientific | ||
384-well Full-Skirted Plate, Standard | Thermo Scientific | AB-1384 | |
Absolute qPCR Seal | Thermo Scientific | AB-1170 | |
EZ DNA Methylation-Lightning kit | Zymo Research | D5030 | 50 rxn |
ZymoTaq DNA Polymerase | Zymo Research | E2001 | 50 rxn |