Here we present a protocol for investigating genome-wide DNA methylation in large-scale clinical patient screening studies using Methyl-Binding DNA Capture sequencing (MBDCap-seq or MBD-seq) and the subsequent bioinformatics analysis pipeline.
DNA methylation is an essential epigenetic modification responsible for the precise regulation of genes required for stable development and differentiation of different tissue types. Dysregulation of this process is often a hallmark of diseases such as cancer. Here, we outline a recent sequencing technique, Methyl-Binding DNA Capture sequencing (MBDCap-seq), used to quantify methylation in normal and disease tissues from large patient cohorts. We describe a detailed protocol for this affinity enrichment approach along with a bioinformatics pipeline to achieve optimal quantification. This technique has been used to sequence samples from hundreds of patients across various cancer types as part of the 1,000 methylome project (Cancer Methylome System).
Epigenetic regulation of genes through DNA methylation is one of the essential mechanisms required to determine the cell fate by stable differentiation of different tissue types in the body1. Dysregulation of this process has been known to cause various diseases including cancer2.
This process mainly involves the addition of methyl groups to cytosine residues in the CpG dinucleotides of DNA3. Several techniques are currently used to investigate this mechanism, each with its own advantages, as outlined in many studies2-8. Here we discuss one of these techniques, Methyl-Binding DNA Capture sequencing (MBDCap-seq), which uses affinity enrichment to identify methylated regions of the DNA. The technique relies on the methyl-binding ability of the MBD2 protein to enrich for genomic DNA fragments containing methylated CpG sites. We use a commercial methylated DNA enrichment kit to isolate these methylated regions. Our laboratory has screened hundreds of patient samples using this technique, and here we provide a comprehensive, optimized protocol that can be used to investigate large patient cohorts.
As with any next-generation sequencing technology, MBDCap-seq requires a specific bioinformatics approach to accurately quantify the levels of methylation across samples. Many recent studies have sought to optimize the normalization and analysis of the sequencing data9,10. In this protocol, we demonstrate one of these methods, which applies a unique read recovery approach (LONUT) followed by linear normalization of each sample to enable unbiased comparisons across a large number of patient samples.
All tissues are obtained following approval by the Institutional Review Board committee and after all participants have consented to both molecular analyses and follow-up studies. The protocols are approved by the Human Studies Committee at the University of Texas Health Science Center at San Antonio.
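The linear normalization idea can be illustrated with a minimal sketch. It scales the binned read counts of each sample by a single factor so that all samples are compared as if they had the same number of uniquely mapped reads. The function name, the target of 20 million reads, and the toy counts are assumptions made for illustration only, and the LONUT read recovery step is not reproduced here.

```python
def linear_normalize(bin_counts, total_unique_reads, target_reads=20_000_000):
    """Scale per-bin read counts by one linear factor per sample.

    bin_counts: read counts per genomic bin for one sample.
    total_unique_reads: uniquely mapped reads in that sample.
    target_reads: common library size (an assumed value, not from the protocol).
    """
    if total_unique_reads <= 0:
        raise ValueError("total_unique_reads must be positive")
    scale = target_reads / total_unique_reads
    return [count * scale for count in bin_counts]


# Toy example: two samples with the same methylation profile but
# different sequencing depths become comparable after scaling.
sample_a = linear_normalize([10, 40, 0], total_unique_reads=25_000_000)
sample_b = linear_normalize([5, 20, 0], total_unique_reads=12_500_000)
```

Because the scaling is a single multiplication per sample, the relative differences between bins within a sample are unchanged; only the between-sample depth difference is removed.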
1. Methyl-binding DNA Capture (MBDCap)
2. Sequencing
3. Bioinformatics Analysis
Note: Process the raw fastq files obtained from sequencing further to perform quality control and to map the short DNA sequences (reads) to the genome.
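As a rough illustration of the quality-control part of this note, the sketch below reads FASTQ records and keeps only reads whose mean base quality passes a cutoff. It assumes four-line records, Phred+33 quality encoding, and a cutoff of 20; none of these values come from the protocol, and in practice dedicated QC software and a short-read aligner would handle these steps.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_qc(seq, qual, min_mean_quality=20):
    """Keep reads with matching lengths and a mean quality above the cutoff."""
    return len(seq) == len(qual) and len(qual) > 0 and mean_phred(qual) >= min_mean_quality


# Toy input: read1 has high qualities ('I' = Q40), read2 has low ones ('#' = Q2).
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGTACGT\n+\n########\n"
)
kept = [name for name, seq, qual in read_fastq(example) if passes_qc(seq, qual)]
```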
We have used MBDCap-seq to study DNA methylation alterations in a large number of patients with diverse cancer types, including breast12, endometrial13, prostate14, and liver cancers, among others. Here we present selected results from the recently published breast cancer study12. In that study, we used a whole-genome sequencing approach to identify CpG islands that are differentially methylated in tumors relative to normal tissue across different genomic regions. Of the CpG islands located in gene promoters (n = 13,081), 19.5% showed differential methylation in tumors compared to normal tissue. Similarly, of 6,959 intragenic CpG islands, 55.2% showed differential methylation. Of 4,847 intergenic CpG islands, 28.1% were differentially methylated, and of 5,454 gene promoters without CpG islands, 1.8% were differentially methylated (Figure 1A). Figure 1B shows representative examples of these regions. The heatmap depicts the methylated regions across 77 breast tumors, 10 normal breast tissues, and 38 breast cancer cell lines. Quantifying methylation as described in the bioinformatics protocol enables us to perform statistical tests across these patient samples to identify significant methylation differences in the whole population or in a subpopulation. This analysis provides testable targets for further investigation of the epigenetic mechanisms in these patient samples. Once these targets are identified, the levels of methylation can be further quantified and validated using pyrosequencing. MBDCap-seq provides a genomic resolution of about 100 to 200 bp for the targets. To obtain higher resolution, individual CpG sites within these regions can be selected for pyrosequencing quantification.
We use this approach to quantify and validate the methylation differences observed in the MBDCap-seq data at a higher resolution and to compare individual patient groups. Figure 2 shows the quantification of methylation in two endometrial cancer subgroups, nonrecurrent (NR) and recurrent (R). The figure depicts the level of DNA methylation for each patient in the two groups at different CpG sites in the promoter CpG island of the identified target gene SFRP1.
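For orientation, the percentages reported above can be converted back into approximate region counts. The sketch below does only that arithmetic; because the published percentages are rounded, the resulting counts are approximations rather than figures reported in the study, and the category labels follow the Figure 1 legend.

```python
# (regions investigated, percent differentially methylated) per category
regions = {
    "promoter CpG islands": (13_081, 19.5),
    "intragenic CpG islands": (6_959, 55.2),
    "intergenic CpG islands": (4_847, 28.1),
    "non-CGI promoters": (5_454, 1.8),
}

# Approximate number of differentially methylated regions in each category.
approx_counts = {
    name: round(total * percent / 100)
    for name, (total, percent) in regions.items()
}

for name, count in approx_counts.items():
    print(f"{name}: about {count} of {regions[name][0]}")
```

This gives roughly 2,551 promoter, 3,841 intragenic, and 1,362 intergenic CpG islands, and about 98 non-CGI promoters.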
Figure 1: DNA hypermethylation in breast cancer samples relative to normal breast tissue in promoter and non-promoter CpG islands. Methyl capture sequencing (MBD-seq) was used to generate DNA methylation profiles of the genomes of breast tumors (n = 77) and normal breast tissue (n = 10). (A) Pie charts demonstrate differential methylation in promoter, intragenic, and intergenic CGIs as well as non-CGI promoter regions. (B) Example loci showing promoter CGI, intragenic, intergenic, and non-CGI promoter regions. Dashed squares highlight regions corresponding to breast cancer hypermethylation.
Figure 2: Quantification of DNA methylation using pyrosequencing of CpG sites in an identified target gene. DNA methylation of SFRP 5 promoter region in recurrent and non-recurrent endometrial carcinoma patients detected by pyrosequencing. Each site represents one CpG site (R: Recurrent, n = 21; NR: Non-Recurrent, n = 71). The plots show the mean, and error bars show the standard error of the mean (SEM).
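The summary statistics named in the Figure 2 legend can be computed as in the minimal sketch below, which returns the mean and the standard error of the mean (sample standard deviation divided by the square root of the group size) for one CpG site in one patient group. The methylation percentages used are made-up toy values, not patient data.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, SEM) for one CpG site within one patient group."""
    if len(values) < 2:
        raise ValueError("at least two values are needed to estimate the SEM")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Toy methylation percentages for four hypothetical patients at one CpG site.
toy_group = [10.0, 20.0, 30.0, 40.0]
group_mean, group_sem = mean_and_sem(toy_group)
```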
The MBDCap-seq technique is an affinity enrichment approach3 and is considered a cost-effective alternative when investigating cohorts with a large number of patients15. The pipeline presented here describes a comprehensive approach from sample procurement to data analysis and interpretation. One of the most important steps is setting up a PCR amplification procedure that improves amplification efficiency for the GC-enriched regions of the genome, as this is where DNA methylation occurs. It is also essential to ensure that, after sequencing, each sample has at least 20 million reads that map uniquely to the genome. This coverage is expected to provide sufficient enrichment or sequence depth to map the entire genome12. If this coverage is not met in the first round of sequencing for a particular sample and more DNA for that sample from part one is available, another round of sequencing can be run as in part two. The resulting reads can be merged with the reads from the first round to achieve the required coverage. The overall experiment should be designed to include biological controls (e.g., normal tissue) to account for bias from factors such as copy-number variations15.
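The coverage check described above can be expressed as a small decision rule. The sketch below assumes that the uniquely mapped read counts from separate sequencing rounds of the same sample can simply be added after merging; the function name and the toy read counts are illustrative assumptions.

```python
MIN_UNIQUE_READS = 20_000_000  # coverage requirement stated in the text


def needs_resequencing(unique_reads_per_round, threshold=MIN_UNIQUE_READS):
    """Return True if the merged rounds still fall short of the threshold."""
    return sum(unique_reads_per_round) < threshold


# Toy example: a first round with 14 million uniquely mapped reads is
# insufficient, but merging a second round of 9 million reaches the target.
first_round_only = needs_resequencing([14_000_000])
after_merge = needs_resequencing([14_000_000, 9_000_000])
```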
Our bioinformatics approach to investigating the DNA methylation observed in patient samples provides candidate target genes that show differential methylation enrichment in core promoter regions, promoter shore regions, and various combinations of these. Although MBDCap-seq reaches a resolution of about 100 bp at best, dividing these regions into core and shore as described in the protocol enables us to identify potential target regions for further investigation of the regulatory roles of differential methylation enrichment and for future mechanistic studies. Here we also describe a way to visualize these identified regions using tornado plots, which provide an overview of the differential enrichment patterns.
Other techniques are based on the chemical conversion of unmethylated cytosine to uracil through deamination by sodium bisulfite, and they provide more comprehensive coverage of DNA methylation at single-nucleotide resolution. After bisulfite conversion, the DNA is usually sequenced using pyrosequencing for target-based experiments or by whole-genome shotgun bisulfite sequencing (BS-seq). These techniques address the main limitation of the technique described here, as MBDCap-seq achieves a resolution of about 100 bp at best. However, although these approaches are more comprehensive, they are relatively expensive, and the cost rises steeply as the depth of sequencing or the number of patients increases. Commonly used bisulfite microarray platforms, on the other hand, are cost-effective but provide relatively low coverage, with probes investigating regions close to gene promoters rather than the whole genome. MBDCap-seq provides a balance between these high-cost sequencing techniques and low-cost methylation arrays16. We are using this protocol to investigate DNA methylation in a large number of patient cohorts as part of the Cancer Methylome System project10, and it can be used in future studies involving large patient cohorts.
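The data behind a tornado plot can be sketched as a matrix with one row per region and one column per bin around the region center, with rows sorted by total signal. In the sketch below, the 2 kb half-window, the 100 bp bin size, the input format, and the gene names are all hypothetical choices for illustration; the core and shore definitions from the protocol are not encoded here.

```python
def tornado_matrix(read_offsets_by_region, half_window=2000, bin_size=100):
    """Build rows of binned read counts around each region center.

    read_offsets_by_region: {region name: [read midpoint offsets in bp
    relative to the region center]} (hypothetical input format).
    Returns a list of (name, counts) sorted by total signal, highest first.
    """
    n_bins = (2 * half_window) // bin_size
    rows = []
    for name, offsets in read_offsets_by_region.items():
        counts = [0] * n_bins
        for offset in offsets:
            # Ignore reads that fall outside the plotted window.
            if -half_window <= offset < half_window:
                counts[(offset + half_window) // bin_size] += 1
        rows.append((name, counts))
    rows.sort(key=lambda row: sum(row[1]), reverse=True)
    return rows


# Toy offsets for two hypothetical regions.
toy_regions = {
    "geneA": [-50, 10, 120],
    "geneB": [-1900, 5, 30, 60, 1999, 2500],
}
matrix = tornado_matrix(toy_regions)
```

Each row can then be drawn as one line of a heatmap, so that regions with the strongest enrichment appear at the top and the signal around the center forms the characteristic funnel shape.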
The authors have nothing to disclose.
This work is supported by CPRIT Research Training Award RP140105, and partially supported by US National Institutes of Health (NIH) grant R01 GM114142 and by the William & Ella Owens Medical Research Foundation.
Name | Company | Catalog Number | Comments |
MethylMiner DNA enrichment Kit | Invitrogen | ME10025 | |
Dynabeads M-280 Streptavidin | Invitrogen | 112-05D | |
Bioruptor Plus Sonication Device | Diagenode | B01020001 | |
3 M sodium acetate pH 5.2 | Sigma | S7899 | 100 ml |
SPRIworks Fragment Library System I | Beckman Coulter | A50100 | Fully automated library construction system |
Adapter Primers | Bioo Scientific | 514104 | PCR primer mix |
Qubit | Invitrogen | Q32854 | Fluorometric Quantitation System |
PCR master mix | KAPA scientific | KK2621 | PCR master mix |
AMPure XP | Beckman Coulter | A63881 | PCR Purification beads |
EB Buffer | Qiagen | 19086 | |
HiSeq 2000 Sequencing System | Illumina | | |