We describe a method to sort single mammalian cells and to quantify the expression of up to 96 target genes of interest in each cell. This method includes the use of internal qPCR standards to enable the estimation of absolute transcript counts.
Gene expression measurements from bulk populations of cells can obscure the considerable transcriptomic variation of individual cells within those populations. Single-cell gene expression measurements can help assess the role of noise in gene expression, identify correlations in the expression of pairs of genes, and reveal subpopulations of cells that respond differently to a stimulus. Here, we describe a procedure to measure the expression of up to 96 genes in single mammalian cells isolated from a population growing in tissue culture. Cells are sorted into lysis buffer by fluorescence-activated cell sorting (FACS), and the mRNA species of interest are reverse-transcribed and amplified. Gene expression is then measured using a microfluidic real-time PCR machine, which performs up to 96 qPCR assays on up to 96 samples at a time. We also describe the generation and use of PCR amplicon standards to enable the estimation of the absolute number of each transcript. Compared with other methods of measuring gene expression in single cells, this approach allows for the quantification of more distinct transcripts than RNA FISH at a lower cost than RNA-Seq.
Individual cells in a population can show widely differing responses to a uniform physiological stimulus1,2,3,4. The genetic variation of cells in a population is one mechanism for this variety of responses, but there are also several non-genetic factors that can increase the variability of responses, even in a clonal population of cells. For example, the levels of individual proteins and other important signaling molecules can vary on a cell-by-cell basis, giving rise to variation in downstream gene expression profiles. Additionally, gene activation can occur in short-duration bursts of transcripts5,6 that may be limited to a relatively small number of transcripts per burst7,8,9. Such stochasticity in gene activation can greatly contribute to variability in biological responses and can provide a selective advantage in microorganisms10 and in mammalian cells1,2 responding to a physiological stimulus. Due to both genetic and non-genetic sources of variation, the gene expression profile of any given cell in response to a stimulus may differ greatly from the average gene expression profile obtained from the measurement of the bulk response. Determining the extent to which individual cells show variability in response to a stimulus requires techniques for the isolation of individual cells, the measurement of the expression levels for transcripts of interest, and the computational analysis of the resulting expression data.
There are several approaches for assaying gene expression in single cells, covering a wide range of costs, numbers of transcripts probed, and accuracy of quantification. For example, single-cell RNA-Seq offers broad transcript coverage and can quantify thousands of distinct transcripts in individual cells, most reliably for highly expressed genes; however, the cost associated with such sequencing depth can be prohibitive, although costs continue to decrease. Conversely, single-molecule RNA fluorescence in situ hybridization (smRNA FISH) offers precise quantification of transcripts for even low-expressing genes at a reasonable cost per gene of interest; however, only a small number of target genes can be assayed in a given cell by this approach. Quantitative PCR-based assays, described in this protocol, provide a middle ground between these techniques. These assays employ a microfluidic real-time PCR machine to quantify up to 96 transcripts of interest at a time in up to 96 cells. While each of the aforementioned methods has requisite hardware costs, the cost of any individual qPCR assay is relatively low. This protocol is adapted from one suggested by a manufacturer of a microfluidic real-time PCR machine (Protocol ADP 41, Fluidigm). To enable the estimation of the absolute number of each transcript in a PCR-based approach, we have expanded the protocol to make use of internal controls of prepared target gene amplicons that can be used across multiple experiments.
As an example of this technique, the quantification of the expression of genes regulated by the tumor suppressor p53 in MCF-7 human breast carcinoma cells is described11. The cells are challenged with a chemical agent that induces DNA double-strand breaks. Previous studies have shown that the p53 response to DNA double-strand breaks exhibits a great deal of heterogeneity in individual cells, both in terms of p53 levels12 and in the activation of distinct target genes11. Furthermore, p53 regulates the expression of over 100 well-characterized target genes involved in numerous downstream pathways, including cell cycle arrest, apoptosis, and senescence13,14. Since the p53-mediated response in each cell is both complex and variable, the analysis of the system benefits from an approach in which nearly 100 target genes can be probed simultaneously in individual cells, such as that described below. With slight modifications (such as alternative methods for single-cell isolation and lysis), the protocol can be readily adapted to study a wide range of mammalian cell types, transcripts, and cellular responses.
With proper advance preparation, a round of cell sorting and gene expression measurement can be conducted according to this protocol over a period of three days. The following timing is suggested: in advance, select the transcripts of interest, identify and validate the primer pairs that amplify the cDNA from those transcripts, and prepare the standards and primer mixes using those primers. On Day 1, following cell treatment, harvest and sort the cells, perform reverse transcription and specific target amplification, and treat the samples with an exonuclease to remove unincorporated primers. On Day 2, perform quality control on sorted cells using qPCR. Finally, on Day 3, measure the gene expression in the sorted cells using microfluidic qPCR. Figure 1 summarizes the steps involved.
1. Advance Preparation
2. Treatment
3. Lysis Buffer Preparation for Cell Sorting
NOTE: Making a single plate of lysis buffer takes about 1 hr. It is advisable to make and sort multiple plates, as cell sorting can be inefficient and yield many wells with no detectable cell.
4. Cell Sorter Setup
5. Cell Harvest and Sorting
6. Exonuclease Treatment
7. Sample Dilution
8. Sort Quality Control Using qPCR
NOTE: Because cell sorting is not perfectly efficient, this step is necessary to identify which wells of the sorted plate actually received a cell. These samples can then be used for further analysis.
9. Gene Expression Measurement Using Microfluidic qPCR
NOTE: For every step in this section, pipette only to the first stop to minimize the formation of bubbles in the reagents.
10. Data Analysis
A general overview of the protocol is shown in Figure 1, including steps for cell treatment, the isolation of single cells by FACS, the generation and pre-amplification of cDNA libraries from single-cell lysates, the confirmation of single-cell cDNA libraries in sorted wells, and the measurement of gene expression by qPCR.
In preparation for single-cell isolation and gene expression analysis, it is necessary to first identify valid oligonucleotide primer pairs for each target gene of interest. Figure 2 shows two examples of primer quality-control methods. Melt curves are generated following qPCR with the primer pair being tested. Valid primers result in a melt curve with a single peak (Figure 2A, green curve). If multiple peaks (Figure 2A, blue curve) or a curve with a pronounced shoulder (Figure 2A, red curve) are present, this indicates the presence of multiple PCR products, and the primers should be redesigned. As a second quality-control method, agarose gel electrophoresis can be used to visually verify that a single band of the correct size is present for each primer pair tested (Figure 2B). Multiple bands or bands of the wrong size indicate that the primers are not valid (Figure 2B).
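The melt-curve criterion above (a single peak passes, multiple peaks fail) can be screened automatically before visual inspection. The following is a minimal Python sketch, not part of the protocol: the function name `count_melt_peaks`, the `min_rel_height` threshold, and the example curves are all hypothetical, and the input is assumed to be the negative derivative of fluorescence with respect to temperature (-dF/dT) sampled at evenly spaced temperatures.

```python
def count_melt_peaks(neg_dfdt, min_rel_height=0.2):
    """Count local maxima in a -dF/dT melt curve.

    Only maxima reaching at least min_rel_height (a hypothetical cutoff)
    times the tallest value are counted, to ignore small ripples.
    """
    top = max(neg_dfdt)
    if top <= 0:
        return 0
    peaks = 0
    for i in range(1, len(neg_dfdt) - 1):
        # A point is a local maximum if it rises from the left and does not rise to the right.
        is_local_max = neg_dfdt[i - 1] < neg_dfdt[i] >= neg_dfdt[i + 1]
        if is_local_max and neg_dfdt[i] >= min_rel_height * top:
            peaks += 1
    return peaks


# Made-up curves for illustration only.
single_peak = [0, 1, 3, 8, 15, 8, 3, 1, 0]
double_peak = [0, 2, 6, 3, 2, 5, 14, 6, 1]

print(count_melt_peaks(single_peak))  # one product expected
print(count_melt_peaks(double_peak))  # more than one product suspected
```

A simple maximum count like this would not detect a shoulder, which has no separate local maximum, and real curves would likely need smoothing first. It is a pre-filter at most, and the gel check remains a complementary test.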
Following primer validation and the generation of libraries of qPCR standards from the correct amplicons, cell sorting can be performed. An example sorting scheme is indicated in Figure 3. Each sorted plate should include wells containing a dilution series of amplicon standards, with 10, 100, 1,000, or 10,000 copies of each target amplicon. It is also recommended to include wells into which no cells, 10 cells, or 100 cells are sorted, in addition to the single-cell wells. Due to sorting inefficiency, it is often necessary to identify the wells of the plate in which single cells have successfully been deposited. For this validation step, following the procedure for reverse transcription and specific target amplification, qPCR should be performed to measure the expression of a housekeeping gene (Figure 4). The wells for the amplicon standards (Figure 4, black curves) should show evenly spaced amplification curves. There should also be clear separation in the curves corresponding to the "no-cell" wells (Figure 4; red curves), "10-cell" wells (Figure 4; blue curves), and "100-cell" wells (Figure 4; purple curves). A clear separation should also occur for the "1-cell" wells corresponding to successful cell deposition (Figure 4; green amplification curves with Ct values less than those for the no-cell controls but greater than those for the 10-cell controls) versus unsuccessful cell deposition (Figure 4; green curves with Ct values near those for the no-cell controls). Once wells with a single cell lysate have been identified, cDNA from the wells (potentially from multiple plates) can be transferred to a microfluidic qPCR chip, with one single-cell lysate per well. For example, Figure 5 shows a hypothetical array that combines cDNA from single cells from three different sorted 96-well plates into a single new plate for qPCR analysis.
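The well-calling logic described above (1-cell wells with Ct values below the no-cell controls but above the 10-cell controls are treated as successful depositions) can be sketched in Python. This is an illustrative sketch only: the function name, the `margin` parameter, and all Ct values are hypothetical, and the protocol itself does not prescribe numeric thresholds.

```python
from statistics import mean


def call_single_cell_wells(one_cell_ct, no_cell_ct, ten_cell_ct, margin=1.0):
    """Label each 1-cell well by comparing its housekeeping-gene Ct to controls.

    margin (in cycles) is a hypothetical safety gap, not a protocol value.
    A Ct of None stands for a well with no detectable amplification.
    """
    no_cell_ref = mean(no_cell_ct)    # background Ct (high Ct = little template)
    ten_cell_ref = mean(ten_cell_ct)  # Ct observed for the 10-cell controls
    calls = {}
    for well, ct in one_cell_ct.items():
        if ct is None or ct >= no_cell_ref - margin:
            calls[well] = "empty"      # indistinguishable from no-cell wells
        elif ct <= ten_cell_ref:
            calls[well] = "multiple?"  # at least as much template as the 10-cell wells
        else:
            calls[well] = "single"     # between the two control groups
    return calls


# Made-up Ct values for illustration only.
calls = call_single_cell_wells(
    one_cell_ct={"A1": 21.5, "A2": 29.8, "A3": 22.1, "A4": None, "A5": 17.9},
    no_cell_ct=[30.2, 29.9, 30.5],
    ten_cell_ct=[18.4, 18.0, 18.2],
)
print(calls)
```

Because sort inefficiency can bring the 10-cell controls close to the highest-expressing 1-cell wells, a "multiple?" call is best treated as a prompt to inspect the amplification curves rather than as a firm exclusion.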
To analyze the microfluidic qPCR results, the samples that likely did not receive a cell should first be removed, as indicated by many missing or zero gene expression measurements (Figure 7, rows indicated by arrows) or unusually low housekeeping gene expression. For each assay, any measurements with melt peaks that do not match those of the other samples in the same assay should also be removed. After removing bad samples and measurements, linear regression can be performed on the measurements corresponding to the amplicon standards to estimate the absolute count of each transcript of interest in each single-cell sample. Figure 8 shows a visualization of these gene expression estimates, measured in units of molecules × reverse transcription efficiency, as beeswarm and violin plots. In this example, untreated cells show a broad range of CDKN1A expression levels, with many cells not expressing CDKN1A at all and with others expressing the gene at levels spanning two orders of magnitude. After treatment with neocarzinostatin (NCS) for 3 hr, most cells express CDKN1A at much higher levels, and the variability decreases to roughly a single order of magnitude. As the duration of treatment increases, the average level of CDKN1A expression decreases and the variability in expression expands.
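The regression step can be illustrated with a short Python sketch. It assumes the common standard-curve form in which Ct is linear in log10 of the input copy number; the text above only specifies linear regression on the standards, so the exact model, the function names, and the Ct values below are assumptions made for illustration.

```python
import math


def fit_standard_curve(copies, ct):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(ct) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, ct))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def estimate_copies(ct, slope, intercept):
    """Invert the standard curve to turn a sample Ct into a copy estimate."""
    return 10 ** ((ct - intercept) / slope)


# Copy numbers follow the dilution series in the text; Ct values are made up.
std_copies = [10, 100, 1000, 10000]
std_ct = [24.0, 20.7, 17.4, 14.1]
slope, intercept = fit_standard_curve(std_copies, std_ct)

sample_ct = 19.05  # hypothetical single-cell measurement for one assay
sample_copies = estimate_copies(sample_ct, slope, intercept)
# Estimates below the lowest standard are extrapolated rather than interpolated.
is_extrapolated = sample_copies < min(std_copies)
print(slope, intercept, sample_copies, is_extrapolated)
```

In practice, one curve would be fitted per assay (per primer pair), since each amplicon can amplify with a different efficiency. Because the standards are DNA amplicons, the resulting values for cell samples are in units of molecules × reverse transcription efficiency.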
Figure 1: Overview of the steps in single-cell gene expression profiling. Treated cells are harvested and sorted via FACS directly into lysis buffer. mRNA species of interest are reverse-transcribed, and the resulting cDNA is amplified by PCR. Housekeeping gene expression in each sample is measured by qPCR to determine which samples actually received a cell. In samples passing this quality-control test, the expression of up to 96 genes is measured using microfluidic qPCR. This figure has been modified from a previous publication11.
Figure 2: Example results of primer testing. (A) Melt curves from qPCR. The blue melt curve (TNFRSF10D) has multiple peaks and the red melt curve (TRIM22) has a "shoulder," which amounts to a second peak close to the first peak. These indicate that the primers used in those reactions amplify multiple targets and need to be redesigned. The green melt curve (SCN3B) has a single peak, consistent with primers that amplify a single target. (B) Amplified DNA run on an agarose gel. The lanes for SCN3B, TRIM22, and TRPM2 contain multiple bands; the primers used to make those DNA amplicons amplify multiple targets and need to be redesigned. The lane for TNFRSF10D has a single band, consistent with primers that amplify a single target. It is worth noting that the primers for SCN3B pass the melt curve test but fail the gel test, while those for TNFRSF10D pass the gel test but fail the melt curve test; these two methods of primer validation are complementary. In this example, one can conclude that all four primer sets need to be redesigned.
Figure 3: Plate map for cell sorting. This plate map includes 1-cell wells (red), 10- and 100-cell wells (darker red), standards (green), and no-cell wells (white). Including two replicates of each standard on each plate helps to compensate for variability in standards. Including 10-cell and 100-cell wells is helpful as a sanity check for gene expression. The no-cell wells serve as a negative control.
Figure 4: Example of qPCR amplification curves from sort quality control measuring GAPDH expression. No-cell samples (red) show a background level of expression. 1-cell samples (green) divide into two distinct groups, one with high expression and one with low expression similar to the no-cell samples. The high-expression samples most likely received a real cell during sorting; the low-expression samples most likely did not. 10-cell (blue) and 100-cell (purple) samples show higher expression than the 1-cell samples; the 10-cell samples are close to the highest-expressing 1-cell samples because of sort inefficiency. Standards (black) are evenly distributed, as expected.
Figure 5: Example of Sample Mixes plate map. This plate map includes 1-cell samples from three different plates (rows A-G), standards mixed from all three plates (H1-H8), no-cell controls (H9-H10), and no-template controls for qPCR (H11-H12).
Figure 6: Diagram of microfluidic qPCR chip with the order of loading (1, 2, 3…). The notch (A1) is the top left corner of the plate.
Figure 7: Heat map of the qPCR results. Each row represents a 1-cell sample, standard, or control; each column represents a measured transcript. Rows with an unusual number of black squares or Xs (arrows) likely indicate samples that did not receive an actual cell.
Figure 8: Beeswarm and violin plots. In a beeswarm plot (blue points), each point represents gene expression in a particular cell. In a violin plot (gray shape), the width of the "violin" represents the frequency of gene expression measurements around a particular level in the population of cells analyzed. Light blue bars represent the lowest standard. Measurements above the blue bars are interpolated between standards, while measurements within the blue bars are extrapolated. The data shown here are measurements of CDKN1A expression in MCF-7 cells expressing Venus-tagged p53 treated with neocarzinostatin (NCS) for the indicated times in order to induce DNA double-strand breaks. This figure has been modified from a previous publication11.
Condition | Temperature | Time |
Hold | 50 °C | 15 min |
Hold | 95 °C | 2 min |
20 cycles, step 1 | 95 °C | 15 sec |
20 cycles, step 2 | 60 °C | 4 min |
Hold | 4 °C | ∞ |
Table 1: Reverse transcription and specific target amplification (RTSTA) thermal cycling program.
Condition | Temperature | Time |
Hold | 37 °C | 30 min |
Hold | 80 °C | 15 min |
Hold | 4 °C | ∞ |
Table 2: Exonuclease I treatment (EXOI) thermal cycling program.
Component | Volume/well (µl) | 96 volumes w/ 10% overage (µl) |
2x reaction mix | 5.00 | 528.00 |
Reverse transcriptase/polymerase mix | 0.20 | 21.12 |
10x STA Primer Mix | 1.00 | 105.60 |
2.64 U/µl (dilute) RNase inhibitor | 0.02 | 2.00 |
Nuclease-free water | 0.78 | 82.48 |
Total | 7.00 | 739.20 |
Table 3: Components of the lysis buffer for the cells.
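The scaled volumes in Table 3 follow from multiplying each per-well volume by 96 wells plus 10% overage (a factor of 105.6). A minimal Python sketch of that arithmetic is given below; the function name `master_mix` and its parameters are hypothetical, and the per-well volumes are taken from Table 3.

```python
def master_mix(per_well_ul, n_wells=96, overage=0.10):
    """Scale per-well volumes (µl) to a master mix with fractional overage."""
    factor = n_wells * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_well_ul.items()}


# Per-well volumes (µl) from Table 3.
lysis_per_well = {
    "2x reaction mix": 5.00,
    "Reverse transcriptase/polymerase mix": 0.20,
    "10x STA Primer Mix": 1.00,
    "RNase inhibitor (dilute)": 0.02,
    "Nuclease-free water": 0.78,
}

mix = master_mix(lysis_per_well)
for name, vol in mix.items():
    print(f"{name}: {vol:.2f} µl")
print(f"Total: {sum(mix.values()):.2f} µl")
```

Straight scaling gives about 2.11 µl of RNase inhibitor and 82.37 µl of water, whereas Table 3 lists 2.00 µl and 82.48 µl. This presumably reflects rounding the inhibitor to a convenient pipetting volume and adjusting the water, since the total of 739.20 µl is the same either way. The same function can be applied to the per-well volumes in Table 4.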
Component | Volume/well (µl) | 96 volumes w/ 10% overage (µl) |
Nuclease-free water | 2.52 | 266.00 |
Exonuclease I Reaction Buffer (10x) | 0.36 | 38.00 |
Exonuclease I (20 U/µl) | 0.72 | 76.00 |
Total | 3.60 | 380.00 |
Table 4: Components of the exonuclease mix to treat the amplified cDNA samples.
We have presented a method for isolating individual mammalian cells from a population of adherent cells grown in culture and for assaying the expression of up to 96 genes in each cell. Good advance preparation is critical for this method to work well. In particular, designing and testing primer pairs specific to the transcripts of interest (steps 1.2-1.3) are time-consuming but important steps, as the primers determine the quality of the single-cell measurements. Once reliable primer pairs have been obtained, they are used to amplify cDNA from the transcripts of interest; the amplicons are then combined in equimolar amounts to make standards (step 1.4). This step is critical, as the standards are required to estimate the absolute transcript counts; standards should be made once, aliquoted, and used for all subsequent rounds of cell sorting and gene expression measurements. Likewise, on the day that the cells are sorted, it is especially important to dilute the standards carefully using low-binding pipette tips and microcentrifuge tubes when making plates of lysis buffer (steps 3.7 and 3.11). Aiming the cell sorter (step 4.2) is another critical step; this must be done very carefully in order for the cells to successfully hit the target of 9 µl of lysis buffer in a PCR plate.
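Combining amplicons in equimolar amounts requires converting each purified amplicon's mass concentration into a copy-number concentration. The Python sketch below shows one common way to do this. It assumes an average mass of roughly 650 g/mol per base pair of double-stranded DNA (a widely used approximation, not a value from this protocol), and the function names, amplicon lengths, and concentrations are hypothetical.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one dsDNA base pair


def copies_per_ul(conc_ng_per_ul, amplicon_length_bp):
    """Convert a dsDNA concentration (ng/µl) to copies/µl."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    grams_per_mole = amplicon_length_bp * BP_MASS_G_PER_MOL
    return grams_per_ul / grams_per_mole * AVOGADRO


def volume_for_copies(target_copies, conc_ng_per_ul, amplicon_length_bp):
    """Volume (µl) of stock that contains target_copies molecules."""
    return target_copies / copies_per_ul(conc_ng_per_ul, amplicon_length_bp)


# Hypothetical amplicons: name -> (measured ng/µl, length in bp).
amplicons = {"geneA": (1.0, 100), "geneB": (2.5, 150)}
target = 1e9  # copies of each amplicon to pool (illustrative)
for name, (conc, length) in amplicons.items():
    print(name, f"{volume_for_copies(target, conc, length):.4f} µl")
```

Pooling the volumes computed this way would give a stock containing the same number of copies of every amplicon, which can then be serially diluted to the copy numbers used for the standards.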
There are several modifications to the protocol that can be made to adapt it to a variety of research needs. Here, we focused on a DNA binding dye-based qPCR approach, which provides a relatively affordable method for quantifying gene expression. However, this method has the potential for creating a high background signal, since any amplified DNA, even that of off-target sequences, results in a fluorescent signal. Such a background can be minimized by ensuring that the primer pair used to amplify a specific target yields a melt curve with a single peak and a single PCR product of the correct size (Figure 2). If the background is still a concern after using such quality-control methods, a probe-based qPCR approach can be used to minimize off-target detection, albeit at a greater cost of reagents. Another option would be a nested PCR approach, in which one set of primers is used in the RTSTA step to reverse-transcribe and amplify a large region of the transcript and a second set of primers, which amplifies a smaller region contained within that large region, is used to measure gene expression by qPCR. This approach has been shown to improve the specificity of DNA binding dye-based qPCR23.
Several methods are available for isolating individual cells for gene expression analysis. The choice of cell isolation method depends on several factors, including the availability of equipment (such as a fluorescence-activated cell sorter), the source of the cells to be evaluated (such as established cell lines versus primary cells from a tissue), and the speed with which the cells must be isolated. In the example presented in this protocol, we used FACS of a cell line expressing a clearly detectable fluorescently tagged protein. FACS has the benefits of rare cell detection capabilities and relatively rapid cell isolation. One challenge of the method is that the deposition of the cells of interest into the requisite small volume of lysis buffer in a 96-well PCR plate can be inefficient and may require the optimization of the cell sorting geometry to ensure a successful isolation. Alternative approaches that may lead to greater precision in cell isolation at lower speeds and/or higher costs include the micropipetting of individual cells, the laser capture microdissection of specific cells from a tumor sample, or the use of microfluidic systems (e.g., the Fluidigm C1 system).
One major benefit of the protocol described here is that the internal controls, a dilution series of purified PCR amplicons, enable the estimation of the absolute number of each transcript in each cell. Without these standards, only relative levels of gene expression can be obtained based on calculations derived from the Ct values. However, with these standards, absolute transcript counts can be estimated, ideally through interpolation within the range of precisely quantified amplicon standards (Figure 8). As with any PCR-based method of gene expression quantification, accurate absolute quantification requires knowledge of the efficiency of each process, including reverse transcription.
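The dependence on reverse transcription efficiency can be written out explicitly. In the sketch below, all symbols are introduced here for illustration and do not appear in the protocol: $\hat{N}_g$ is the standard-curve estimate for gene $g$, $N_g$ is the true number of mRNA molecules in the cell, and $\eta_{\mathrm{RT},g}$ is the (generally unknown) fraction of those molecules converted to cDNA.

```latex
\hat{N}_g = \eta_{\mathrm{RT},g}\, N_g
\quad\Longrightarrow\quad
N_g = \frac{\hat{N}_g}{\eta_{\mathrm{RT},g}},
\qquad 0 < \eta_{\mathrm{RT},g} \le 1 .
```

As a purely hypothetical example, an estimate of $\hat{N}_g = 50$ with $\eta_{\mathrm{RT},g} = 0.5$ would correspond to $N_g = 100$ transcripts. Without an independent measurement of $\eta_{\mathrm{RT},g}$, the standard-curve estimate is best read as a lower bound on the true transcript count.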
A current challenge in quantifying transcript levels in single cells is that the limit of detection for many methods is estimated to be 10-100 mRNAs per cell24, preventing the reliable quantification of a significant percentage of the transcriptome. As an alternative approach, one can use smRNA FISH to quantify targets of particular interest25,26, including those with very few transcripts. Advances in smRNA FISH have expanded the number of distinct transcripts that can be detected at once27 and the number of cells that can be profiled at once28, given the appropriate equipment. Although smRNA FISH has its own limitations (potential false-positive and false-negative transcript detection, restrictions to only a few target genes per cell, the cost of equipment and reagents), it may provide a powerful method to complement and validate results from a qPCR-based analysis of a subset of target genes of interest.
The authors have nothing to disclose.
We would like to thank V. Kapoor in the CCR ETIB Flow Cytometry Core for her aid in performing the cell sorting during the development of this protocol. We also thank M. Raffeld and the CCR LP Molecular Diagnostics Unit and J. Zhu and the NHLBI DNA Sequencing and Genomics Core for their aid in performing the qPCR during the development of this protocol. This research was supported by the Intramural Program of the NIH.
Name of Material/Equipment | Company | Catalog Number | Comments |
RNeasy Plus Mini Kit | Qiagen | 74134 | |
High Capacity cDNA Reverse Transcription Kit with RNase Inhibitor | ThermoFisher | 4374966 | |
Phusion High-Fidelity DNA Polymerase | New England BioLabs | M0530S | |
QIAquick Gel Extraction Kit | Qiagen | 28704 | |
Quant-iT High-Sensitivity dsDNA Assay Kit | ThermoFisher | Q33120 | |
2.0-mL low adhesion microcentrifuge tubes | USA Scientific | 1420-2600 | |
DNA Suspension Buffer | Teknova | T0221 | |
Axygen 0.2-mL Maxymum Recovery Thin Wall PCR Tubes | Corning | PCR-02-L-C | |
GE 96.96 Dynamic Array DNA Binding Dye Sample & Assay Loading Reagent Kit | Fluidigm | 100-3415 | |
HyClone RPMI 1640 media | GE Healthcare Life Sciences | SH30027.01 | |
Fetal Bovine Serum, Certified (US) | ThermoFisher | 16000-044 | |
Antibiotic-Antimycotic Solution | Corning | 30-004-CI | |
Neocarzinostatin | Sigma | N9162 | |
ELIMINase | Decon Labs | 1101 | |
SUPERase-In | ThermoFisher | AM2696 | |
CellsDirect One-Step qRT-PCR Kit | ThermoFisher | 11753500 | |
E. coli DNA | Affymetrix | 14380 10 MG | |
ThermalSeal Sealing Film, Sterile | Excel Scientific | STR-THER-PLT | |
BD FACSAria IIu | BD Biosciences | | |
HyClone Trypsin 0.05% | GE Healthcare Life Sciences | SH30236.01 | |
PBS, 1x | Corning | 21-040-CV | |
Falcon 40µm Cell Strainer | Corning | 352340 | |
Exonuclease I | New England BioLabs | M0293S | |
SsoFast EvaGreen Supermix with Low ROX | Bio-Rad | 172-5210 | |
96.96 Dynamic Array IFC for Gene Expression (microfluidic qPCR chip) | Fluidigm | BMK-M-96.96 | |
IFC Controller HX (loading machine) | Fluidigm | | |
BioMark or BioMark HD (microfluidic qPCR machine) | Fluidigm | | |
Real-Time PCR Analysis software | Fluidigm | | |
MATLAB software | MathWorks | | |