Here, we present a protocol using RNA-seq to monitor mRNA levels over time during the hypoxic response of S. cerevisiae cells. This method can be adapted to analyze gene expression during any cellular response.
Complex changes in gene expression typically mediate a large portion of a cellular response. Each gene may change expression with unique kinetics as the gene is regulated by the particular timing of one of many stimuli, signaling pathways or secondary effects. In order to capture the entire gene expression response to hypoxia in the yeast S. cerevisiae, RNA-seq analysis was used to monitor the mRNA levels of all genes at specific times after exposure to hypoxia. Hypoxia was established by growing cells in ~100% N2 gas. Importantly, unlike other hypoxic studies, ergosterol and unsaturated fatty acids were not added to the media because these metabolites affect gene expression. Time points were chosen in the range of 0 – 4 h after hypoxia because that period captures the major changes in gene expression. At each time point, mid-log hypoxic cells were quickly filtered and frozen, limiting exposure to O2 and concomitant changes in gene expression. Total RNA was extracted from cells and used to enrich for mRNA, which was then converted to cDNA. From this cDNA, multiplex libraries were created and eight or more samples were sequenced in one lane of a next-generation sequencer. A post-sequencing pipeline is described, which includes quality base trimming, read mapping and determining the number of reads per gene. DESeq2 within the R statistical environment was used to identify genes that change significantly at any one of the hypoxic time points. Analysis of three biological replicates revealed high reproducibility, genes of differing kinetics and a large number of expected O2-regulated genes. These methods can be used to study how the cells of various organisms respond to hypoxia over time and adapted to study gene expression during other cellular responses.
Many organisms respond to hypoxia, or low O2, by altering gene expression 1,2,3. This response helps cells cope with the lack of a substrate critical for aerobic respiration and for several biosynthetic reactions, but also with a changing redox state 4. Several microarray studies performed in S. cerevisiae show that the mRNA levels of hundreds of genes change in response to hypoxia 5,6,7,8,9,10,11,12. Recently, RNA-seq was used to characterize gene expression changes over time during hypoxia 13. Here, the experimental details are presented and discussed.
Hypoxia can be achieved in various ways, each producing a different level of O2. Here, hypoxia was established by continuously flowing ultra-high-purity N2 into flasks, which lowers [O2] dissolved immediately with reproducible kinetics 10. It is possible that there are some O2 molecules present that contribute to metabolism and gene expression but this environment is considered very close to anaerobic. In the absence of O2, yeast cells cannot biosynthesize heme, ergosterol and unsaturated fatty acids 4,12,14. Thus, previous studies have included these metabolites when growing yeast without oxygen 5,10,15. However, many hypoxic responses are mediated by depletion of these metabolites and thus replenishing them reverses the hypoxic gene expression responses 12,16. In order to mimic natural hypoxia, these metabolites were not added to the media. In the short time that cells were exposed to hypoxia without the presence of these essential metabolites, there was no noticeable increase in cell death (data not shown) nor a prolonged stress response 13.
The response is also dependent upon the strain and its genotype. Especially important are the alleles of the known regulators of hypoxic responses 2. The S288C strain background is highly desired so that results can be compared to the other genomic studies performed with this strain. However, S288C contains a partial loss-of-function allele of the HAP1 gene 17, a transcriptional regulator critical for the hypoxic response. This allele was repaired in S288C using a wildtype copy from the Σ1278b strain background 11.
Gene expression is highly dependent upon the cellular environment. Thus, when performing genome-wide mRNA analysis, it is important to maintain a constant environment while varying another parameter such as time, stimulus, or genotype. To achieve highly reproducible results, consider these three practices for the study and all of its biological or technical replicates. First, the same experimenter(s) should carry out the study, since technical practices may vary across experimenters. Second, the same batch of ingredients should be used in the growth media as each batch has a slightly different composition that can affect gene expression. Third, to minimize cell cycle effects, each time point should consist of asynchronous cells in the mid-log phase of growth (1 – 2 x 107 cells/mL).
When characterizing a complex response like the gene expression response to hypoxia, a time course is advantageous for determining the kinetics of various events. Specific time points should be chosen that will capture the major changes of the response. In this study, time points between 0 and 4 h were observed, because past experiments revealed widespread changes in gene expression during this period 13.
To measure global gene expression, RNA-seq was used 18,19. This method uses next-generation sequencing to determine the relative abundance of each gene's transcript. Compared to DNA microarray analysis, RNA-seq exhibits higher sensitivity (to detect less abundant transcripts), a greater dynamic range (to measure greater fold changes) and superior reproducibility (to accurately follow gene expression over time). Typically, most cellular RNA is ribosomal RNA so many methods have been developed to enrich for specific RNA species 20. Here, poly-T beads were used to purify poly-A-containing mRNA transcripts, though the various commercially- available rRNA depletion kits could also be effective in mRNA enrichment.
Here, the S. cerevisiae gene expression response to hypoxia was characterized. Cells were exposed to hypoxia and then sampled at eight time points (0, 5, 10, 30, 60, 120, 180 and 240 min). To confirm reproducibility and to identify statistically changed transcripts, three biological replicates were performed. RNA was extracted by mechanical disruption and column purification, and then processed for RNA-seq analysis. The post-sequencing pipeline is described and programming scripts are provided that allow exact replication of the analyses performed. Specifically, Trimmomatic 21, TopHat2 22, HTseq 23, the R statistical environment 24, and the DESeq2 package 25 were used to process the RNA-seq data and to identify 607 genes that change significantly during hypoxia. Principal component analysis (PCA) and gene expression of the replicates indicated the reproducibility of the technique. Clustering and heatmaps revealed wide-ranging expression kinetics, while gene ontology (GO) analysis showed that many cellular processes, like aerobic respiration, are enriched in the set of oxygen-regulated genes.
1. Inducing Hypoxia
2. RNA Extraction
3. Determining RNA Concentration and Quality
4. RNA-seq Analysis
The hypoxia time course and RNA-seq analysis were performed independently three times. To examine the reproducibility of the three replicates, gene expression data for all genes was analyzed using Principal Component Analysis (PCA). Figure 2 shows how the samples change over the first two principal components, which together represent 58.9% of the variability. This analysis indicated that each time course exhibits similar changes (as depicted by the similar shape of each curve). In addition, the last two replicates were more similar to each other than to the first replicate. This is consistent with the fact that the last two replicates were performed by the same person using the same batches of media components, while the first replicate was performed by a different person and batches. This finding highlights the fact that consistency of technique and chemicals is an important factor in obtaining high reproducibility. Lastly, PCA analysis is useful in assessing whether the chosen time points capture the major changes that occur during the time course. Focusing on one replicate curve, there are larger distances between the earlier samples, indicating large changes in gene expression, and smaller distances between the later samples, indicating smaller changes and possibly the end of the switch from aerobic to hypoxic gene expression.
Next, the DESeq2 package in R25 was used to identify genes showing a significant change compared to time 0 (p-values in Supplementary Table 1). Of these significant genes, 607 changed more than 4-fold (fold changes in Supplementary Table 1). The expression patterns of these genes were clustered, so that similarly-regulated genes were grouped together, and then displayed in a heatmap (Figure 3). This figure shows that expression changes are highly reproducible across replicates. Also, genes exhibit variable kinetics with both increases and decreases in expression. Many genes exhibit a peak increase or decrease at 30 min, likely due to a transient stress response 13.
Figure 4 depicts the expression of two downregulated and two upregulated genes previously found to be O2-regulated. The genes change expression reproducibly, with similar curves between the biological replicates. The gene with the least variability between replicates is DAN1, with overlapping curves. The gene with the most variability is CYC1, perhaps because expression of this gene is naturally variable. This figure also illustrates that large fold changes (up to 1,024-fold) were detected.
To identify the cellular processes enriched in the set of 607 O2-regulated genes, GO Term enrichment was performed (Supplementary Table 2). As expected, many O2-regulated genes play a role in cellular respiration, other aspects of metabolism, and in ribosomal processes 13.
Sample# | Time in Hypoxia | mL culture + mL YPD | ||
2 | 5 min | 20 mL + 0 mL | ||
3 | 10 min | 20 mL + 0 mL | ||
4 | 30 min | 20 mL + 0 mL | ||
5 | 60 min | 10 mL + 10 mL | ||
6 | 120 min | 10 mL + 10 mL | ||
7 | 180 min | 5 mL + 15 mL | ||
8 | 240 min | 4 mL + 16 mL |
Table 1. Cell Dilutions Performed at Time 0.
These dilutions have been experimentally determined to result in mid-log concentration (1 – 2 × 107 cells/mL) after the indicated time in N2.
Supplementary Table 1. Expression and Statistical Data for all Genes.
This table contains the systematic name, the adjusted p-values for seven DESeq2 tests (i.e., 5 min vs. 0 min, 10 min vs. 0 min, etc.), the minimum adjusted p-value, and the log2 maximum fold-change. Please click here to download this file.
Supplementary Table 2. GO Term Enrichment Results.
This table shows GO Processes, their representation in the set of 607 O2-regulated genes and their representation in the genome. Then, the chi-square test was used to identify GO Terms that were significantly over- or under-represented in the set of 607 genes. GO enrichment analysis was performed using GO Slim Mapper at SGD 28. Please click here to download this file.
Figure 1. Hypoxia Set-up for Taking Time Points without Disrupting Gas Flow to Later Time Points.
It is important that all of the connections remain secure, as described in the protocol. Note the location of the short (9 cm) and long (17 cm) glass tubes. Please click here to view a larger version of this figure.
Figure 2. Principal Component Analysis (PCA) Shows the Relationship Between the Three Biological Replicates.
Each curve has the 0 min to 240 min samples in order. The log2 normalized gene expression for all genes was used in the PCA. PC1 accounts for 34.7% of the total variance and PC2 accounts for 24.2%. The black line is the first replicate, the red line is the second, and the blue line is the third. Note that the arrows are being used to point out the 240 min time-point in different samples. Principal component analysis (PCA) was carried out in R by using the prcomp function (Q-mode, or singular value decomposition) on the log2-transformed matrix containing genes in columns and time points in rows 29. Please click here to view a larger version of this figure.
Figure 3. Heatmap Showing Expression of the 607 Genes Identified as O2-regulated.
These genes were significant in at least one of the DESeq2 tests and changed expression by more than 4-fold. Shown is log2(relative expression). Red = increased expression. Green = decreased expression. Intensity of color is proportional to degree of change. Before creating the heatmap in TreeView 30, gene expression was k-means clustered with k=10 in Cluster 3.0 31. The key below the heatmap shows the scale. Please click here to view a larger version of this figure.
Figure 4. Relative mRNA Levels of Four Genes Over Time in the Three Replicates.
The black line is the first replicate, the red line is the second, and the blue line is the third. Common/systematic gene names are CYC1/YJR048W, ERG5/YMR015C, DAN1/YJR150C, and NCE103/YNL036W. Please click here to view a larger version of this figure.
Supplemental File 1. fastq_pipeline.sh
Please click here to download this file.
Supplemental File 2. time_course_script.R
Please click here to download this file.
In this study, the mRNA levels for all genes was measured during hypoxia in the yeast S. cerevisiae. The goal was to analyze how global gene expression changes due to growth in a controlled near-anoxic environment. Several steps were taken to ensure that the method described here was carefully controlled and reproducible. First, cells were exposed to a precisely defined hypoxic environment: 99.999% N2 in rich media (YPD). Other studies of hypoxia have closed off the flask or tube to air 32, used the hypoxia-mimetic cobalt chloride 33, or employed hypoxia-inducing anaerobic chambers or bags 34. Each of these methods may result in varying levels of hypoxia or other unintended environmental changes. It would be useful to study gene expression while systematically varying the gas mixture (e.g., 90% N2 and 10% O2), as yeast in the wild are likely to experience less-severe hypoxic conditions. Second, ergosterol and unsaturated fatty acids (e.g., Tween 80) were not added to the culture media, since they influence gene expression as discussed in the Introduction. Third, to minimize gene expression changes caused by different phases of growth, empirically-determined dilutions were performed so that when a culture reached a time point, it was in the mid-log phase of growth (~1 – 2 x 107 cells/mL). Fourth, the quick filtering and freezing (~15 s) of cells that have been removed from hypoxia is critical, as gene expression changes can occur in <5 min of exposure to O2, as occurs when cells are collected by centrifugation (data not shown). The time that cells are exposed to O2 could be reduced by performing the growth and harvesting of cells in an anoxic hood or chamber. Fifth, an appropriate strain was chosen for this study; to be consistent with other genomic studies, the common S288C lab strain background was employed. However, the S288C strains contains a mutant hap1 allele and thus was repaired 11. This allowed the study of genes like CYC1 that respond to O2 levels via the Hap1 transcription factor 11.
Time points were chosen because they exhibited large gene expression changes in previous hypoxia experiments. The results shown here can inform future experiments. For example, it would be helpful to use a higher resolution of time points during intervals that show large changes (e.g., 0 to 30 min hypoxia) to more finely capture the diverse kinetics. Additionally, later time points, beyond the time period assayed here (i.e., 4 h of hypoxia) may reveal further expression changes.
RNA-seq was used to measure mRNA levels because its reproducibility and wide dynamic range allow measurement of large fold changes 18,35. RNA was extracted by mechanical disruption of the cells using vigorously shaken beads and purified on a column. Other RNA purification methods could be used, such as hot phenol 36. Ideally, the RNA concentration will be in the range of 200 ng/µL – 1000 ng/µL since the RNA concentration will be diluted to 100 ng/µL for reverse transcription and library preparation. The RNA concentration should be measured with a fluorometer and RNA-specific dye; a UV spectrophotometer is limited because it measures the total concentration of nucleic acids, consisting of both RNA and DNA. The quality of the RNA is determined by gel electrophoresis; ribosomal RNA bands should be well-defined on the gel and smearing indicates RNA degradation. Here, mRNA transcripts were enriched, but there are diverse methods for isolating different RNA types 20 that may also be important in the response.
To save resources, between 8 and 12 samples were multiplexed and sequenced together in one lane, resulting in an average of at least 693 reads per gene for each sample, sufficient to reproducibly determine the number of reads per gene. In fact, more than 12 samples could be multiplexed in one lane, likely resulting in adequate sampling of most genes. In order to calculate the average reads per gene, the total number of reads that mapped to genes (in the HTSeq output file) was divided by the number of genes provided to HTSeq (in this study, 7140). When estimating the maximum number of samples that can be multiplexed in one lane, consider genes of interest that exhibit the lowest expression (i.e., lowest normalized counts), using data from this and other RNA-seq studies. If the normalized read counts for a gene vary significantly across replicates, then the number of reads may be too low, suggesting that less samples should be multiplexed per sequencing lane.
Next-generation sequencing will generate FASTQ files containing the reads and other information. If multiplexing was performed, a barcode file and an additional FASTQ file may be generated. These files can be downloaded for local analysis, as in this protocol. Alternatively, there are several online next-gen sequence analysis tools for those not familiar with command-line functions or R. In this regard, we recommend Galaxy (https://usegalaxy.org/) 37 and GenomeSpace (http://www.genomespace.org/). In the present protocol, two scripts that have been written by the authors are used for FASTQ processing and expression analysis. The scripts are well-annotated and can be edited for different experimental designs and computer setups. Also, the two scripts and the “Materials” Table provide websites for downloading the tools used in these scripts. The first script, “fastq_pipeline.sh”, is a shell script to be run at the command line in a UNIX/Linux terminal. This script de-multiplexes the original FASTQ file containing all of the reads obtained from one lane of the sequencer, thus generating one FASTQ file for each sample present in that lane. Next, the reads in each FASTQ file are quality trimmed using Trimmomatic with default settings 21. The trimmed reads are then mapped to the S. cerevisiae S288C reference genome (SGD R64-1-1_20110203) using TopHat2 with default settings 22. The script finally uses HTSeq to determine the number of reads that map to each annotated feature 19.
The second script, "time_course_script.R", has also been written by the authors and is run within the R statistical environment 24. The script initially imports into R the read count data generated by HTSeq. Each sample is assumed to be a time point and thus is organized relative to the other time points. The raw read counts are then imported into the R package, DESeq2 25, in order to identify genes that are differentially expressed at any of the time points relative to the first (i.e., aerobic, time 0). Because of its flexibility, DESeq2 can be used to perform other statistical tests, such as whether expression changes as a function of time 13. Alternatively, other tools employing parametric and non-parametric statistics for RNA-seq read count data could be used 20. The script then normalizes the read counts to control for the degree of sequencing of each sample, and log2 transforms the counts. These processed reads are used to perform PCA analysis, to determine the maximum fold-change for each gene, and to create the expression graphs in Figure 4.
An important issue is how best to employ statistics to identify genes that respond significantly to hypoxia. At least three biological replicates of the time course are necessary to calculate the natural variability of each gene and thus determine genes that respond to hypoxia more significantly than expected by chance. Alternatively, it is possible to successfully identify genes that respond over time, without biological replicates 13,25,38,39. The main drawback to these methods is that they do not take into account the biological variability of each gene. However, if RNA-seq replicates are not performed, individual gene expression changes could be verified by performing RT-qPCR with multiple biological replicates 13.
There are two primary benefits to assessing multiple time points of a response. First, as discussed earlier (especially in Figure 2), a time course identifies the intervals exhibiting large expression changes, informing whether additional time points are required to improve resolution of kinetics or to observe later events of the response. Second, a time course allows the differentiation of each gene's expression kinetics. This helps with identifying unique signaling pathways that act at specific times during the response, and gives insight into the timing of initiation and termination of signaling. Specifically, the clustering of genes by expression pattern that was performed here resulted in sets of co-regulated genes. The promoters of a gene set may share a transcription factor binding site, suggesting that the transcription factor becomes active when the genes change expression.
When studying a cellular response, the observed variability is ideally due to the stimulus (e.g., hypoxia) and not to other experimental variables. However, as seen in Figure 2, the individual researcher and/or the batches of the media components make major contributions to the variability in gene expression. Thus, these parameters should be considered in order to achieve high reproducibility and minimal experimental variability. As little time as possible should separate each replicate, because experimental variables are more likely to change as time passes.
In conclusion, this method can be used to accurately monitor genome-wide alterations in mRNA levels (or other events such as histone modifications or protein levels) during a hypoxia time course. The method can be adapted for other organisms, especially microbial species that can be grown in liquid culture. Such genome-wide time-course analyses will complement studies of cell morphology, protein activity and metabolism. Together, these data will lead to a richer understanding of a cellular response.
The authors have nothing to disclose.
We thank the Lewis-Sigler Institute for Integrative Genomics Sequencing Core Facility at Princeton University for technical advice and for RNA library preparation and sequencing. This work was supported by grants from Rowan University and NIH NIGMS R15GM113187 to M.J.H.
Enclosed dry incubator | Thermo Scientific | MaxQ 4000 | Set at 30°C. Modify the door to allow entry of one Tygon tube. Alternatively, use the New Brunswick G25 incubator, which contains a tube port. Do not use an open-air water shaker, as condensation will collect in the tubes between flasks, possibly cross-contaminating cultures. |
Micro-analysis Filter Holder | Millipore | XX1002530 | 25mm diameter, stainless steel support, no. 5 perforated silicone stopper mounts in standard 125mL filtering flask |
Strong vacuum | Edwards | E-LAB 2 | The “house” vacuum may be too weak. Alternatively, use an electric-power portable vacuum pump like the one listed here. |
1000-mL flask | To act as vacuum trap. | ||
~2-foot lengths of Heavy Wall Vacuum Tubing, inner diameter 3/8 in, outer diameter 7/8 in | Tygon | 38TT | Two pieces: the first connects vacuum to trap, and the second connects trap to filter system. |
High-pressure N2 gas tank | 99.999% purity, >1000psi, with a regulator and gas flow controller | ||
Autoclaved 500mL flask | Opening covered with aluminum foil. One for each yeast strain. | ||
Autoclaved 250mL flasks | Openings covered with aluminum foil. One for each time point plus two for water traps. | ||
Flask stoppers (size 6, two holes with 5mm diameter) | Sterilized with 70% ethanol. One for each flask. | ||
Glass tubing, length 9 cm or 17 cm, inner diameter 2 mm, outer diameter 5 mm | Sterilized with 70% ethanol. Two tubes for each flask. Place into the holes of each stopper. See Figure 1 for placement of 9- vs 17- cm tubes. | ||
~25-cm lengths of plastic tubing, inner diameter 5 mm | Tygon | E-3603 | One piece for each flask. Sterilized with 70% ethanol. |
Sterile filter discs | Millipore | HAWP02500 | 25mm diameter, 0.45 µm pore size, one for each time point |
Sterile dH2O (~100mL) | |||
1-mL cuvettes | For measuring OD600 (i.e., cell concentration) | ||
50-mL sterile centrifuge tubes | One for each time point | ||
Clean and sterile tweezers | |||
liquid nitrogen | For freezing cells | ||
acid-washed beads | Sigma | G8772 | Keep at 4°C for lysing cells |
Qiagen RNeasy Mini Kit | Qiagen | 74104 | For RNA column purification |
Qiagen RLT buffer | Prepare by adding 10µL of β-Mercaptoethanol per 1mL of RLT buffer, keep at 4C. | ||
2-mL collection tubes | Qiagen | included in the Qiagen Rneasy Mini Kit | |
Buffer RPE | Qiagen | included in the Qiagen Rneasy Mini Kit | |
Buffer RW1 | Qiagen | included in the Qiagen Rneasy Mini Kit | |
DNase I stock and working solutions | Qiagen | 79254 | The DNase I enzyme comes as lyophilized powder in a glass vial. Using a sterile needle and syringe, inject 550µL of RNase-free water (provided in Qiagen kit) into the vial. Mix by gently inverting the bottle. To avoid denaturing the enzyme, do not vortex. Using a pipet, remove this stock solution from the vial and store in freezer (-20°C) in single-use aliquots (80µL each). The stock solution should not be thawed and refrozen. |
Buffer RDD | Qiagen | included in the Qiagen DNase Kit | |
Ice cold 2-mL screw-cap tubes | For lysing cells during RNA extraction | ||
bead mill homogenizer | Biospec Mini-Beadbeater-24 | 112011 | Keep in cold room |
Bacto Peptone | BD | DF0118 | for liquid YPD media |
Bacto Yeast Extract | BD | DF0886 | for liquid YPD media |
glucose | Fisher | D16 | for liquid YPD media |
Qubit assay tubes | Thermo Fisher | Q32856 | for measuring nucleic acid concentration |
Quant-iTTM dsDNA BR Assay Kit | Thermo Fisher | Q32853 | for measuring nucleic acid concentration |
Quant-iTTM RNA Assay Kit | Thermo Fisher | Q32855 | for measuring nucleic acid concentration |
Qubit Fluorometer | Thermo Fisher | Q33216 | for measuring nucleic acid concentration |
Commercial electrophoresis system | Agilent | Bioanalyzer 2100 | for measuring nucleic acid quality |
Next-generation sequencer | Illumina | HiSeq 2500 | for sequencing libraries |
automated liquid handling system | Wafergen | Apollo 324 | for creating sequencing libraries |
PrepX PolyA mRNA Isolation Kit | Wafergen | 400047 | for isolating mRNA from total RNA |
PrepX RNA SEQ for Illumina Library Kit | Wafergen | 400039 | for creating strand-specific sequencing libraries from total RNA |
Barcode Splitter | https://toolshed.g2.bx.psu.edu/repository?repository_id=7119c4f7a89efa57&changeset_revision=e7b7cdc1834d | ||
Samtools, which includes the gzip command | http://www.htslib.org/download/ | ||
Trimmomatic | http://www.usadellab.org/cms/?page=trimmomatic | ||
Bowtie2 (installed before TopHat) | http://bowtie-bio.sourceforge.net/bowtie2/index.shtml | ||
TopHat | https://ccb.jhu.edu/software/tophat/index.shtml | ||
HTSeq | http://www-huber.embl.de/HTSeq/doc/overview.html | ||
R (installed before R Studio) | https://cran.rstudio.com | ||
R Studio (free version) | https://www.rstudio.com/products/rstudio/download/ |