The present study describes the workflow to manage DNA methylation data obtained by microarray technologies. The protocol demonstrates steps from sample preparation to data analysis. All procedures are described in detail, and the video shows the significant steps.
Obesity is directly connected to lifestyle and has been associated with DNA methylation changes that may alter adipogenesis and lipid storage, contributing to the development of the disease. We demonstrate a complete protocol, from participant selection to epigenetic data analysis, for patients with and without obesity. All steps of the protocol were tested and validated in a pilot study. Thirty-two women participated in the study: 15 were classified with obesity according to Body Mass Index (BMI) (45.1 ± 5.4 kg/m2), and 17 were classified without obesity (22.6 ± 1.8 kg/m2). In the group with obesity, 564 CpG sites related to fat mass were identified by linear regression analysis. These CpG sites were in promoter regions. The differential analysis found 470 hypomethylated and 94 hypermethylated CpG sites in individuals with obesity. The pathways most enriched among the hypomethylated sites were RUNX, WNT signaling, and response to hypoxia. The hypermethylated pathways were related to insulin secretion, glucagon signaling, and Ca2+. We conclude that the protocol effectively identified DNA methylation patterns and trait-related DNA methylation. These patterns could be associated with altered gene expression, affecting adipogenesis and lipid storage. Our results support the view that an obesogenic lifestyle could promote epigenetic changes in human DNA.
Large-scale omics technologies have been increasingly used in studies of chronic diseases. An interesting feature of these methods is that the large amount of data they generate is available to the scientific community. Therefore, a demand to standardize the protocols has arisen to allow technical comparison between studies. The present study suggests the standardization of a protocol to obtain and analyze DNA methylation data, using a pilot study as an applied example.
A positive energy balance predominates in modern human lifestyles, leading to an excessive accumulation of adipose tissue and, consequently, the development of obesity1. Many factors have increased obesity rates, such as sedentarism, high-calorie diets, and stressful routines. The World Health Organization (WHO) estimated that 1.9 billion adults were obese in 2016, which means that more than 20% of the world's population has over 30 kg/m2 BMI2. The most recent update of 2018 revealed that the prevalence of obesity in the United States of America (USA) was higher than 42%3.
Epigenetics is the structural adaptation of chromosomal regions to register, signal, or perpetuate altered activity states4. DNA methylation is a reversible chemical modification at cytosine-guanine dinucleotide sites (CpG sites), forming 5-methylcytosine-pG (5mCpG). It can modulate gene expression by regulating the access of the transcription machinery to the DNA5,6,7,8. In this context, it is essential to understand which CpG sites are associated with obesity-related traits9. Many factors can support or prevent site-specific DNA methylation. Key enzymes for this process, such as DNA methyltransferases10 (DNMTs) and ten-eleven translocation enzymes (TETs), can promote DNA methylation or demethylation under environmental exposures11.
Considering the growing interest in DNA methylation studies over the last years, choosing the most appropriate analysis strategy to precisely answer each question has been an essential concern of researchers12,13,14. The 450K DNA methylation array is the most popular method, used in more than 360 publications14 for determining the DNA methylation profile. It can determine the methylation of up to 485,000 CpGs located in 99% of known genes15. However, this array has been discontinued and replaced with the EPIC, covering 850,000 CpG sites. The present protocol can be applied for both 450K and EPIC16,17,18.
The protocol is presented step-by-step in Figure 1 and comprises the following steps: population selection, sampling, experiment preparation, DNA methylation pipeline, and bioinformatics analysis. A pilot study performed in our laboratory is demonstrated here to illustrate the steps of the proposed protocol.
Figure 1: Schematic of the presented protocol.
The ethics committee of Ribeirão Preto Medical School University Hospital of the University of São Paulo (HCRP-USP) approved the study (CAAE:14275319.7.0000.5440). The participants signed the consent form, and all procedures were conducted following the Declaration of Helsinki.
1. Population and sampling
2. Anthropometry and body composition
Table 1: Population characteristics. All variables were normally distributed (p > 0.05, Shapiro-Wilk test), and differences were evaluated using an independent t-test, in which p < 0.05 was considered significant. *p < 0.0001.
3. Collection of biological material
4. DNA extraction
5. Preparation before methylation analysis
6. DNA methylation pipeline
NOTE: The DNA methylation experiment is divided into 4 days (Figure 1). Follow the manufacturer's recommendations and specifications to obtain accurate results.
7. Bioinformatics analysis
NOTE: To analyze EPIC arrays, the arguments that specify the microarray type must be changed (e.g., arraytype = "450k" should be replaced by arraytype = "EPIC"). The author of the ChAMP pipeline describes in detail all fields that should be modified for each array type on the Bioconductor page of the package30.
After all ChAMP filters were applied (unqualified CpGs, non-CG probes, CpGs near SNPs, multi-hit sites, and CpGs on the X and Y chromosomes), 409,887 probes remained for analysis; the scheme is represented in Figure 2.
Figure 2: Pipeline of the bioinformatics analysis.
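The filtering cascade can be pictured as a series of exclusion rules applied to each probe. The following is a minimal Python sketch of that idea, not the ChAMP implementation (which runs in R): the `Probe` record, its field names, the example probes, and the 0.05 detection p-value cutoff are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical probe annotation record (field names are assumptions)."""
    probe_id: str       # "cg..." for CpG probes, "ch..." for non-CG probes
    detection_p: float  # detection p-value; a high value means an unreliable signal
    near_snp: bool      # probe overlaps or is adjacent to a known SNP
    multi_hit: bool     # probe maps to more than one genomic location
    chromosome: str     # e.g. "1", "X", "Y"


def keep_probe(probe: Probe, detp_cutoff: float = 0.05) -> bool:
    """Return True only if the probe survives every filter."""
    if probe.detection_p > detp_cutoff:      # unqualified (low-quality) probe
        return False
    if not probe.probe_id.startswith("cg"):  # non-CG probe
        return False
    if probe.near_snp or probe.multi_hit:    # SNP-related or multi-hit probe
        return False
    if probe.chromosome in {"X", "Y"}:       # sex-chromosome probe
        return False
    return True


# Made-up probes, one per filter, plus one that passes.
probes = [
    Probe("cg0001", 0.001, False, False, "1"),
    Probe("cg0002", 0.200, False, False, "2"),
    Probe("ch0003", 0.001, False, False, "3"),
    Probe("cg0004", 0.001, True, False, "4"),
    Probe("cg0005", 0.001, False, True, "5"),
    Probe("cg0006", 0.001, False, False, "X"),
]
kept = [p.probe_id for p in probes if keep_probe(p)]
print(kept)
```

Keeping each rule as a separate early return makes it easy to count how many probes each filter removes, which is the kind of tally summarized in Figure 2.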
Furthermore, the density plot revealed that all samples had similar densities of the beta distribution related to the quality control steps. This analysis evaluates the distribution of beta values and points out if there are any samples that need to be excluded. In this batch, no samples were excluded based on this analysis.
Figure 3: Beta values density plot obtained with the ChAMP package.
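For readers unfamiliar with beta values, the quantity plotted in the density plot can be sketched as follows. This is an illustrative Python example under stated assumptions: the offset of 100 is the commonly used Illumina default, the intensities are made up, and the function names are hypothetical rather than part of ChAMP.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Beta = M / (M + U + offset), bounded between 0 and 1."""
    return meth / (meth + unmeth + offset)


def m_value(beta: float) -> float:
    """Logit-style transform of beta, often preferred for statistical testing."""
    return math.log2(beta / (1.0 - beta))


# Made-up methylated and unmethylated signal intensities for a single probe.
b = beta_value(meth=4500.0, unmeth=400.0)
print(round(b, 3), round(m_value(b), 3))
```

A beta value near 1 indicates a mostly methylated site and a value near 0 a mostly unmethylated one, which is why a density plot of good-quality samples typically shows two peaks near the extremes.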
The singular value decomposition (SVD) analysis was used to identify the principal components that significantly influenced the variability of the DNA methylation data. The data revealed that BMI, waist circumference (WC) (p < 1 × 10^-5), and fat mass (FM) (p < 0.05) had a significant effect on the data variability (Figure 4).
Figure 4: Singular value decomposition analysis.
Cell type estimation revealed that both natural killer cells (NK) and B cells were higher in obese women (Figure 5).
Figure 5: Cell fractions estimated by Houseman's method. Gran: granulocyte. CD4T: helper T cell, lymphocyte. CD8T: cytotoxic T cell, lymphocyte. Mono: monocyte. B Cell: B lymphocyte. NK: natural killer cells, lymphocytes. *p < 0.05.
DNA methylation levels between obese and non-obese women differed before and after cell-type correction. Before the DNA methylation data were corrected for cell types, 43,463 differentially methylated positions (DMPs) were observed; 3,329 CpGs remained significant after correction. Of these, 445 CpG sites were in intergenic regions (IGR) and 2,884 were in genic regions, with the largest share in the promoter region (n = 1,438). The distribution across genic regions was TSS1500 (n = 612), TSS200 (n = 826), 5'UTR (n = 390), first exon (n = 273), body (n = 724), and 3'UTR (n = 59). Considering Δβ values <-0.05 and >0.05, 162 CpGs were hypomethylated and 576 were hypermethylated in obese compared to non-obese individuals (Figure 6). The data are available in the GEO database under the register code GSE166611.
Figure 6: Differentially methylated positions.
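The Δβ thresholding and the regional tallies reported above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: it assumes Δβ is computed as the obese mean minus the non-obese mean, the function name and region labels are hypothetical, and only the counts are taken from the text.

```python
def classify_dmp(delta_beta: float, threshold: float = 0.05) -> str:
    """Label a CpG by its methylation difference (assumed obese minus non-obese)."""
    if delta_beta > threshold:
        return "hypermethylated"
    if delta_beta < -threshold:
        return "hypomethylated"
    return "below threshold"


# Region counts as reported in the text for the 3,329 significant CpGs.
genic_counts = {
    "TSS1500": 612,
    "TSS200": 826,
    "5'UTR": 390,
    "first exon": 273,
    "body": 724,
    "3'UTR": 59,
}
intergenic = 445

promoter = genic_counts["TSS1500"] + genic_counts["TSS200"]
genic = sum(genic_counts.values())
total = genic + intergenic

print(promoter, genic, total)
print(classify_dmp(0.08), classify_dmp(-0.06), classify_dmp(0.01))
```

The promoter count here is the sum of the TSS1500 and TSS200 categories, and the genic and intergenic counts add up to the total number of significant CpGs.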
Methylation studies make it possible both to evaluate differential methylation between groups and to find CpG sites associated with specific traits. For fat mass, 13,222 associated CpG sites were found. A total of 6,159 CpGs in the promoter region were related to fat mass, with 470 hypomethylated and 94 hypermethylated (Figure 7), and the respective genes were used for functional enrichment analysis (Table 2).
Figure 7: Promoter regions of hypomethylated and hypermethylated genes, independently related to fat mass.
Table 2: Functional enrichment of the genes from the differentially methylated CpG sites.
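The trait-related analysis relies on linear regression between a trait and the methylation level of each CpG site. The exact model and covariates are not specified in this section, so the Python sketch below assumes the simplest case, a one-predictor ordinary least squares fit for a single CpG, with made-up fat mass and beta values and a hypothetical helper name.

```python
def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data: fat mass (kg) and beta values of one CpG in four participants.
fat_mass = [20.0, 30.0, 40.0, 50.0]
beta = [0.60, 0.55, 0.50, 0.45]

slope, intercept = ols_fit(fat_mass, beta)
print(slope, intercept)
```

A negative slope, as in this toy example, would correspond to a site whose methylation decreases as fat mass increases. In practice the fit is repeated for every CpG and the resulting p-values are corrected for multiple testing.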
Supplemental Material 1.
DNA methylation arrays are the most used methods to assess DNA methylation due to their cost-benefit ratio14. The present study described a detailed protocol using a commercially available microarray platform to evaluate DNA methylation in a pilot study performed in a Brazilian cohort. The results obtained from the pilot study confirmed the effectiveness of the protocol. Figure 3 shows the sample comparability and the complete bisulfite conversion32.
As a quality control step, the ChAMP algorithm recommended the exclusion of some CpG sites during the filtering process. The aim of excluding probes is to improve data analysis and eliminate bias. The low-quality CpGs (detection p-values above 0.05) were removed to reduce experimental noise in the dataset. The remaining targets passed the density plot analysis. Zhou33 described the importance of filtering CpGs near SNPs to avoid mismatches, misinterpretation of the methylation of polymorphic cytosines, and color-channel switching in type I probes34. Also, as XY chromosomes are differentially impacted by imprinting, Heiss and Just35 reinforced the importance of filtering those probes because, in females, hybridization problems may be confounding factors35.
The DMAPs expiration date, the formamide opening date, the analytical quality of the absolute ethanol, and total leucocyte counts are considered critical steps in the protocol.
Furthermore, according to our observations, the cell-type estimation is essential in performing the bioinformatics analysis. The Houseman method performs the cell type estimation as described in Tian's study30. This method is based on 473 specific CpG sites that can predict the percentages of the most important cell types, such as granulocytes, monocytes, B cells, and T cells36. We used the reference-based correction recommended in the ChAMP package (champ.refbase, with the result stored as "myRefbase"). After the estimation, the ChAMP algorithm adjusts the beta values and eliminates this bias from the dataset. This step is crucial in studies focused on obesity because white blood cell composition differs considerably in this population due to their chronic inflammatory state.
Regarding method modification and troubleshooting, the only change was replacing the original cap mat with a common PCR seal. After each centrifugation process, the seal was replaced with a new one. We could not use the standard heat sealing and adapted it using aluminum foil around the plate.
Although commercial assays have been considered a gold standard for epigenetic studies, one limitation of the protocol could be its dependence on reagents and equipment from a single brand37,38,39,40. Another limitation is the lack of indicators that allow the correct progress of the experiment to be verified41.
The standardization of the present protocol provides a useful guide for epigenetic research, reducing human errors during the process and allowing successful data analysis and comparability between different studies.
According to our results, DNA methylation experiments are suitable for studies comparing individuals with and without obesity43. Also, the proposed bioinformatics analysis provided high-quality data and could be considered in large-scale studies.
Using the SVD analysis, we identified that the obesity-related traits (BMI, WC, and FM) influenced the variability in DNA methylation data. Notably, the cell-type estimation indicated that both natural killer cells (NK) and B cells were higher in women with obesity than in women without obesity (Figure 5). The higher counts of those cells could be explained by the low-grade inflammatory state of these individuals44. We observed that patients with obesity have hypo- and hypermethylated CpGs in promoter regions of genes associated with fat mass. Most of the sites were hypomethylated, which could be related to the increased levels of reactive oxygen species (ROS) in these individuals. This oxidative stress condition may promote guanine oxidation at the dinucleotide site, forming 8-hydroxy-2'-deoxyguanosine (8-OHdG), resulting in a 5mCp-8-OHdG dinucleotide site and causing the recruitment of TET enzymes. All these events could be responsible for promoting DNA hypomethylation and hypermethylation by different mechanisms of action45.
In addition, the rate of adipogenesis seems to increase in individuals with obesity, with approximately 10% of new cells to old cells46,47. Epigenetic contributions, particularly those driven by the obesogenic environment, can alter the cells' proliferation and differentiation rates, favoring the development of fat mass48. Epigenetic changes can also affect adipogenic programs, facilitating or restricting their development. Primary transcription factors (PPARγ or C/EBPα), or multiprotein complexes assembled in downstream promoter regions that include or exclude epigenetic modifying enzymes, regulate gene expression through hyper- or hypomethylation45. The PPARγ pathway has been previously described to alter the WNT pathway, which had genes enriched in this study. Although it is still unknown how WNT signaling occurs during adipogenesis, recent studies have reported that it might have essential roles in adipocyte metabolism, particularly under obesogenic conditions49.
The authors have nothing to disclose.
We would like to thank Yuan Tian, Ph.D. (tian.yuan@ucl.ac.uk) for being available to answer all questions about the ChAMP package. We also thank Guilherme Telles, MSc (guilherme.telles@usp.br) for his contribution to both the technical and scientific aspects of this paper; he made important suggestions regarding epigenetics and video capturing and formatting techniques. Consumables funding: São Paulo Research Foundation (FAPESP) (#2018/24069-3) and National Council for Scientific and Technological Development (CNPq: #408292/2018-0). Personal funding: (FAPESP: #2014/16740-6) and Academic Excellence Program from Coordination for Higher Education Staff Development (CAPES: 88882.180020/2018-01). The data will be made publicly and freely available without restriction. Address correspondence to NYN (e-mail: nataliayumi@usp.br) or CBN (e-mail: carla@fmrp.usp.br).
Absolute ethanol | J.T. Baker | B5924-03 | |
Agarose gel | Kasvi | K9-9100 | |
Electric bioimpedance | Quantum BIA 450 Q – RJL System | | |
Ethylenediaminetetraacetic acid (EDTA) | Corning | 46-000-CI | |
EZ DNA Methylation-Gold kit | ZymoResearch, Irvine, CA, USA | D5001 | |
Formamide | Sigma | F9037 | |
FMS—Fragmentation solution | Illumina | 11203428 | Supplied Reagents |
HumanMethylation450 BeadChip | Illumina | | |
Maxwell Instrument | Promega, Brazil | AS4500 | |
MA1—Multi-Sample Amplification 1 Mix | Illumina | 11202880 | Supplied Reagents |
MicroAmp Optical Adhesive Film | Thermo Fisher Scientific | 201703982 | |
MSM—Multi-Sample Amplification Master Mix | Illumina | 11203410 | Supplied Reagents |
NaOH | F. MAIA | 114700 | |
PB1—Reagent used to prepare BeadChips for hybridization | Illumina | 11291245 | Supplied Reagents |
PB2—Humidifying buffer used during hybridization | Illumina | 11191130 | Supplied Reagents |
2-propanol | Emsure | 1096341000 | |
RA1—Resuspension, hybridization, and wash solution | Illumina | 11292441 | Supplied Reagents |
RPM—Random Primer Mix | Illumina | 15010230 | Supplied Reagents |
STM—Superior Two-Color Master Mix | Illumina | 11288046 | Supplied Reagents |
TEM—Two-Color Extension Master Mix | Illumina | 11208309 | Supplied Reagents |
Ultrapure EDTA | Invitrogen | 155576-028 | |
96-Well Reaction Plate with Barcode (0.1mL) | ByoSystems | 4346906 | |
96-Well Reaction Plate with Barcode (0.8mL) | Thermo Fisher Scientific | AB-0859 | |
XC1—XStain BeadChip solution 1 | Illumina | 11208288 | Supplied Reagents |
XC2—XStain BeadChip solution 2 | Illumina | 11208296 | Supplied Reagents |
XC3—XStain BeadChip solution 3 | Illumina | 11208392 | Supplied Reagents |
XC4—XStain BeadChip solution 4 | Illumina | 11208430 | Supplied Reagents |