This work describes an optimized methyl-CpG-binding domain (MBD) sequencing protocol and a computational pipeline to identify differentially methylated CpG-rich regions in chronic lymphocytic leukemia (CLL) patients.
The role of long noncoding RNAs (lncRNAs) in cancer is coming to the forefront due to growing interest in understanding their mechanistic functions during cancer development and progression. Despite this, the global epigenetic regulation of lncRNAs and repetitive sequences in cancer has not been well investigated, particularly in chronic lymphocytic leukemia (CLL). This study focuses on a unique approach: the immunoprecipitation-based capture of double-stranded, methylated DNA fragments using methyl-binding domain (MBD) proteins, followed by next-generation sequencing (MBD-seq). CLL patient samples belonging to two prognostic subgroups (5 IGHV-mutated samples + 5 IGHV-unmutated samples) were used in this study. Analysis revealed 5,800 hypermethylated and 12,570 hypomethylated CLL-specific differentially methylated genes (cllDMGs) compared to normal healthy controls. Importantly, these results identified several CLL-specific, differentially methylated lncRNAs, repetitive elements, and protein-coding genes with potential prognostic value. This work outlines a detailed MBD-seq protocol and a bioinformatics pipeline developed for the comprehensive analysis of global methylation profiles in highly CpG-rich regions using CLL patient samples. Finally, to corroborate the findings from the MBD-seq protocol, a protein-coding gene and an lncRNA were validated using pyrosequencing, a highly quantitative method for analyzing CpG methylation levels.
The use of next-generation sequencing techniques to analyze global DNA methylation profiles has become increasingly popular in recent years. Genome-wide methylation assays, including microarray- and non-microarray-based methods, have been developed based on the following: the bisulfite conversion of genomic DNA, methylation-sensitive restriction enzyme digestions, and the immunoprecipitation of methylated DNA using methyl CpG-specific antibodies.
Aberrant DNA methylation is one of the hallmarks of leukemia and lymphomas, including chronic lymphocytic leukemia (CLL). Earlier, several groups including ours characterized the DNA methylation profiles of different CLL prognostic subgroups and normal, healthy B-cell controls using the bisulfite conversion of genomic DNA, followed by microarray-based methods or whole-genome sequencing1,2,3,4. The bisulfite conversion of genomic DNA leads to the deamination of unmodified cytosines to uracil, while the modified (methylated) cytosines remain unchanged. Once converted, the methylation status of the DNA can be determined by PCR amplification and sequencing using different quantitative or qualitative methods, such as microarray-based or whole-genome bisulfite sequencing (WGBS). Although bisulfite conversion-based methods have many advantages and are widely used in different cancer types to analyze DNA methylation levels, there are a few drawbacks associated with this technique. WGBS allows single-base-pair resolution with lower amounts of DNA and is the most suitable option for analyzing a large number of samples. However, this method fails to differentiate between 5mC and 5hmC modifications in the genome5,6. Additionally, microarray-based methods do not offer complete coverage of the genome.
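The conversion logic described above can be illustrated with a minimal Python sketch. It is not part of the published pipeline: the function names, the toy sequence, and the chosen methylated position are all hypothetical, and the model considers only the top strand after PCR, where converted uracils are read as thymines.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate the read-out of a top-strand sequence after bisulfite treatment and PCR.

    Unmodified C is deaminated to U and read as T after amplification.
    A C at a protected (methylated) position stays C. Note that this toy
    model, like the chemistry itself, cannot tell 5mC from 5hmC: both
    would simply be listed as protected positions.
    """
    protected = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted):
    """For every C in the reference, report True if it survived conversion."""
    return {
        i: converted[i] == "C"
        for i, base in enumerate(reference.upper())
        if base == "C"
    }


# Hypothetical 8-bp fragment with one methylated CpG cytosine at index 1.
reference = "ACGTCCGA"
converted = bisulfite_convert(reference, methylated_positions=[1])
calls = call_methylation(reference, converted)
print(converted)
print(calls)
```

Comparing the converted read with the reference recovers the methylation state at each cytosine, which is the principle that both array-based and WGBS read-outs rely on.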
In a recent study from our laboratory7, immunoprecipitation-based methods, rather than bisulfite conversion, were used to identify highly CpG-rich, differentially methylated regions on a global scale in CLL patients and normal healthy controls. In methyl-CpG-binding domain (MBD) next-generation sequencing (MBD-seq), the enrichment of double-stranded fragmented DNA depends upon the degree of CpG methylation. This method can overcome the drawbacks of the bisulfite conversion method and can also provide genome-wide coverage of CpG methylation in an unbiased and PCR-independent manner. Additionally, unlike bisulfite conversion-based microarray methods, MBD-seq can be used to analyze the methylation status of repetitive elements, such as long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), and long terminal repeats (LTRs). However, compared to bisulfite conversion methods, an MBD-seq protocol requires a relatively large amount of input DNA. Also, the quality of the sequencing reads and the data depends on the specificity, affinity, and quality of the antibodies used.
The current study presents a detailed MBD-seq protocol to enrich methylated DNA for next-generation sequencing. It uses a commercial methylated DNA enrichment kit (listed in the Materials Table), as well as a computational pipeline to visualize and interpret methylation sequencing data and to identify CLL-specific hyper- and hypomethylated regions compared to normal healthy controls. The method exploits the ability of the MBD of the human MBD2 protein to bind methylated CpGs in order to extract DNA enriched in methylated CpGs, which is then subjected to high-throughput sequencing.
Ethical approval for collecting the CLL samples was granted on 2007-05-21, with the following registration number: EPN Gbg dnr 239/07. All CLL patients were diagnosed according to recently revised criteria8, and the samples were collected at the time of diagnosis. The patients in the study were included from different hematology departments in the western part of Sweden after written consent had been obtained. Only CLL peripheral blood mononuclear cell (PBMC) samples containing ≥70% leukemic cells were selected for this study.
1. Preparations
2. Genomic DNA Extraction and Sonication
3. Bead Preparation Prior to Binding with MBD-biotin Protein
4. Binding the MBD-biotin Protein to the Washed Beads
5. Binding MBD-biotin Beads with Fragmented Genomic DNA
6. Removing the Unbound DNA and Eluting the Methylated DNA from the Beads
7. Ethanol Precipitation and the Enrichment of Methylated DNA
8. Bioinformatics Analysis Method 1: Identifying CLL-associated Differentially Methylated Regions (cllDMRs)
9. Bioinformatics Analysis Method 2: Identifying CLL-associated Significantly Differentially Methylated Regions (cll sigDMRs)
MBD-seq was recently performed on CLL patients and matched, normal, healthy controls to identify CLL-specific differentially hyper- and hypomethylated genes7. The experimental and bioinformatic pipelines used for analyzing the data generated from CLL and normal healthy samples are shown in Figure 1A and 1B. These analyses identified several CLL-specific differentially methylated regions (cllDMRs), which were significantly hyper- or hypomethylated in IGHV-mutated and IGHV-unmutated samples compared to control samples, with a p-value <0.00001. Figure 2A shows all the cllDMRs obtained from both the normal B-cell and the normal PBMC comparisons. All the cllDMRs were mapped to different classes of protein-coding and noncoding genes, as shown in Figure 2B. Importantly, in this analysis, the CLL patient samples were compared independently with two different normal controls: sorted B cells and PBMCs. Interestingly, the analysis resulted in a large overlap of common CLL-specific differentially methylated genes (cllDMGs; 851 hypermethylated and 2,061 hypomethylated) between the normal B-cell control and normal PBMC control comparisons (Figure 2C), suggesting that these cllDMGs may be CLL signature genes with significant roles in disease pathogenesis. To strengthen the analyzed data, several CpG sites located in one hypermethylated protein-coding gene, SKOR1, and one hypomethylated lncRNA, AC012065.7, were validated using a pyrosequencing method, as described in earlier publications7,17,18 (Figure 3A and B). Figure 4 shows sheared DNA after the sonication protocol. The fragmented DNA ranges between 150 and 300 bp, making it ideal for sequencing purposes.
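The thresholding and overlap steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the published pipeline: the record layout, the function name, and the gene labels (GENE_A to GENE_E) are hypothetical placeholders, and only the p-value cutoff of 0.00001 is taken from the text.

```python
P_CUTOFF = 1e-5  # significance threshold quoted in the text (p < 0.00001)


def significant_genes(dmrs, direction, cutoff=P_CUTOFF):
    """Return genes with at least one DMR of the given direction below the cutoff."""
    return {
        d["gene"]
        for d in dmrs
        if d["direction"] == direction and d["p_value"] < cutoff
    }


# Toy DMR calls from two independent comparisons (all values are made up).
cll_vs_bcell = [
    {"gene": "GENE_A", "direction": "hyper", "p_value": 1e-7},
    {"gene": "GENE_B", "direction": "hyper", "p_value": 1e-6},
    {"gene": "GENE_C", "direction": "hypo", "p_value": 1e-8},
    {"gene": "GENE_D", "direction": "hyper", "p_value": 1e-3},  # not significant
]
cll_vs_pbmc = [
    {"gene": "GENE_A", "direction": "hyper", "p_value": 1e-9},
    {"gene": "GENE_B", "direction": "hyper", "p_value": 1e-4},  # not significant
    {"gene": "GENE_C", "direction": "hypo", "p_value": 1e-6},
    {"gene": "GENE_E", "direction": "hypo", "p_value": 1e-7},
]

# Genes called in both comparisons correspond to the Venn-diagram intersections.
common_hyper = significant_genes(cll_vs_bcell, "hyper") & significant_genes(cll_vs_pbmc, "hyper")
common_hypo = significant_genes(cll_vs_bcell, "hypo") & significant_genes(cll_vs_pbmc, "hypo")
print(sorted(common_hyper), sorted(common_hypo))
```

Keeping the two comparisons separate until the final set intersection mirrors the independent design described in the text, so a gene counts as common only if it passes the cutoff against both controls.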
Figure 1: Overview of the Work Flow Used in this Study. This figure, obtained from our recent publication7, shows the design of the overall analysis used to identify the differentially methylated regions (DMRs) in chronic lymphocytic leukemia (CLL) patient samples. (a) Experimental design of the MBD-based enrichment of methylated DNA. (b) The bioinformatics analysis pipeline used to identify CLL-specific differentially methylated regions (cllDMRs).
Figure 2: Hypermethylated and Hypomethylated cllDMRs and cllDMGs of CLL Patient Samples Compared to Normal, Healthy, Sorted B Cells7. (a) All the chronic lymphocytic leukemia (CLL)-associated differentially methylated regions with significant p-values (< 0.00001) obtained from comparing the IGHV-mutated and -unmutated CLL samples to normal sorted B cells (upper panel) and normal PBMC samples (lower panel). The enrichments shown in the heatmap were within a ± 3-kb window from the differentially methylated regions (DMRs). (b) Venn diagram showing the overlap of CLL-specific differentially methylated genes (cllDMGs; hypermethylated and hypomethylated) between IGHV-mutated and IGHV-unmutated groups. The pie chart represents the percentage of genes belonging to different classifications, such as protein-coding, long noncoding RNA (lncRNA), pseudogenes, antisense, and other noncoding RNAs. (c) Venn diagram showing the overlap of common differentially methylated genes (IGHV-mutated and IGHV-unmutated prognostic groups) between B-cell and PBMC comparisons. The left panel of the Venn diagram shows the overlap for hypermethylated genes and the right panel for hypomethylated genes.
Figure 3: Validation of the DNA Methylation Status for Individual CpG Sites on the Selected Significantly Differentially Methylated Target Genes. Pyrosequencing data for the quantification of the percentage of DNA methylation for two selected genes using an independent chronic lymphocytic leukemia (CLL) sample cohort containing 54 samples and 6 normal, healthy, age-matched B-cell samples. (A) The boxplots represent the percentage of methylation for 3 individual CpG sites of the hypermethylated SKOR1 gene using pyrosequencing. (B) The boxplots show the degree of methylation for 5 individual CpG sites of the hypomethylated AC012065.7 gene using pyrosequencing. The boxes indicate the interquartile range (25-75%), while the inner horizontal line indicates the median value. The whiskers represent the minimum and maximum values. The p-values corresponding to the degree of differential methylation of CLL samples over normal B cells are shown in the boxplots for each individual CpG site.
Figure 4: Shearing of Genomic DNA. Representative results of four chronic lymphocytic leukemia (CLL) DNA samples after sonication, along with one CLL sample before sonication (0 min), run on a 2% pre-cast agarose gel, stained, and visualized under UV light. The DNA size ladder is indicated in lane 1.
MBD-seq is a cost-effective, immunoprecipitation-based technique that can be used to study methylation patterns with complete genome-wide coverage. Both MeDIP-seq (methylated DNA immunoprecipitation followed by sequencing) and MBD-seq result in the enrichment of CpG-rich methylated DNA. However, MBD-seq shows a higher affinity for highly CpG-rich regions than MeDIP-seq19. Using a methyl-binding enrichment kit, one can fractionate the DNA into high-CpG- and low-CpG-rich regions by eluting DNA with high-salt and low-salt buffers, respectively. In this investigation, only a single-fraction elution was performed to capture highly enriched CpG-rich regions covering most of the CpG islands.
MBD-seq can be a powerful alternative to WGBS, which is commonly used in CLL and other leukemia investigations. Even though MBD-seq requires relatively higher amounts of input DNA compared to the bisulfite conversion-based methods, it allows for the investigation of global methylation changes that are specific only for 5mC modifications, without any PCR amplification bias created after the conversion. Thus, MBD-seq is an ideal method to address cllDMGs, which could be potential epigenetic-based CLL signature genes with prognostic value.
The crucial factors for performing this method are the quality and the size range of the sonicated DNA fragments used in the binding reaction. All samples showed fragment sizes between 150 and 300 bp (Figure 4), resulting in good-quality sequencing results, with around 25-33 million unique reads for each sample mapping to the genome after data filtering and cleaning.
Finally, the methylation status of hyper- and hypomethylated cllDMGs has been validated for a few genes using a pyrosequencing method in an independent sample cohort. This method gives the percentage of DNA methylation at individual CpG sites from the ratio of C to T, based on the amounts of C and T incorporated during the sequence extension. The hypermethylated cllDMGs showed high percentages of methylation in CLL samples compared to the normal samples. Likewise, the hypomethylated cllDMGs showed the opposite. Pyrosequencing data for two cllDMGs are shown in Figure 3.
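As a rough illustration of how a per-site methylation percentage follows from the C and T incorporation signals, the short Python sketch below uses the commonly applied relation %methylation = 100 × C / (C + T). The function name and the signal values are hypothetical, and the exact calculation performed by the pyrosequencing software used in the study is not stated here, so this should be read as an assumption rather than the instrument's method.

```python
def percent_methylation(c_signal, t_signal):
    """Percentage of methylation at one CpG site from C and T incorporation signals.

    After bisulfite treatment, a methylated cytosine is read as C and an
    unmethylated one as T, so the methylated fraction is C / (C + T).
    """
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("C and T signals must sum to a positive value")
    return 100.0 * c_signal / total


# Hypothetical peak heights (arbitrary units) for two CpG sites of one sample.
site_signals = [(30.0, 10.0), (5.0, 15.0)]
per_site = [percent_methylation(c, t) for c, t in site_signals]
print(per_site)
```

Because the value is computed independently for each CpG site, neighboring sites in the same amplicon can be compared directly between CLL and normal samples, as in the boxplots of Figure 3.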
Compared to array-based methods, this method allows for a wider examination of sequences across the genome, including previously annotated protein-coding regions and non-annotated sequences spanning repetitive elements and lncRNAs. Thus, this approach identified several novel cllDMGs spanning lncRNAs and repetitive elements in CLL patients. Since these investigations have not been performed previously in CLL patient samples, this study serves as a valuable resource for identifying CLL-associated differentially methylated regions in different prognostic subgroups. These cllDMGs may serve as novel biomarkers and targets for epigenetic drug therapy.
The authors have nothing to disclose.
This study was supported by the Swedish Research Council, the Swedish Cancer Society, the Knut and Alice Wallenberg Foundation (KAW), and FoU Västra Götalandsregionen.
Name | Company | Catalog Number | Comments |
DNeasy Blood and Tissue Kit | Qiagen | 69504 | |
Lymphoprep solution | Axis-Shield | 1114544 | |
NanoDrop 2000 | Thermo Fisher Scientific | | |
TE buffer pH 8 | Sigma-Aldrich | 93283 | |
Bioruptor standard sonication device | Diagenode | UCD-200 | |
TPX Bioruptor tubes 1.5 mL | Diagenode | C30010010-300 | |
3 M sodium acetate | Diagenode | C03030002 | |
E-Gel iBase Safe Imager combo kit | Thermo Fisher Scientific | G6465EU | |
E-Gel 2% agarose gels | Thermo Fisher Scientific | G441002 | |
MethylMiner Methylated DNA Enrichment Kit | Thermo Fisher Scientific | ME10025 | |
Labquake tube shaker/rotators | Thermo Fisher Scientific | 415110 | |
Dynal MPC-S | Thermo Fisher Scientific | A13346 | |
Vortex mixer | VWR | 12620-848 | |
Absolute ethanol | Any company | | |
70% ethanol | Any company | | |
DNase-free water | Milli-Q | | |
DNA precipitant (3 M sodium acetate) | Diagenode | C03030002 | |
Safe-seal 1.5 mL Eppendorf tubes | Eppendorf | 4036-3204 | |
Qubit dsDNA HS Assay Kit | Thermo Fisher Scientific | Q32851 | |
Qubit 0.5 mL tubes | Thermo Fisher Scientific | Q32856 | |
Qubit | Thermo Fisher Scientific | Q32866 | |
Illumina HiSeq 2000 platform | Illumina | | |
Water bath | Grant | | |
Heat block | Grant | | |
Tube rotator | Labquake | | |