This protocol describes the general processes and quality control checks necessary for preparing healthy adult mammalian single cells for droplet-based, high-throughput single cell RNA-Seq preparations. Sequencing parameters, read alignment, and downstream single-cell bioinformatic analysis are also provided.
The analysis of single cell gene expression across thousands of individual cells within a tissue or microenvironment is a valuable tool for identifying cell composition, discriminating functional states, and uncovering the molecular pathways underlying observed tissue functions and animal behaviors. However, the isolation of intact, healthy single cells from adult mammalian tissues for subsequent downstream single cell molecular analysis can be challenging. This protocol describes the general processes and quality control checks necessary to obtain high-quality adult single cell preparations from the nervous system or skin that enable subsequent unbiased single cell RNA sequencing and analysis. Guidelines for downstream bioinformatic analysis are also provided.
With the development of high-throughput single cell technology1,2 and advancements in user-friendly bioinformatics tools over the last decade3, a new field of high-resolution gene expression analysis has emerged: single-cell RNA sequencing (scRNA-Seq). The study of single cell gene expression was first developed to identify heterogeneity within defined cell populations, such as in stem cells or cancer cells, or to identify rare populations of cells4,5, which were unattainable using traditional bulk RNA sequencing techniques. Bioinformatic tools have enabled the identification of novel sub-populations (Seurat)2, visualization of the order of cells along a pseudotime space (Monocle)6, definition of active signaling networks within or between populations (SCENIC)7, and prediction of the assembly of single cells in an artificial 3D space (Seurat, and more)8. With these new and exciting analyses available to the scientific community, scRNA-Seq is fast becoming the new standard approach for gene expression analysis.
Despite the vast potential of scRNA-Seq, the technical skillsets required to produce a clean dataset and to accurately interpret results can be challenging to newcomers. Here, a basic but comprehensive protocol is presented, starting from the isolation of single cells from whole primary tissues and ending with the visualization and presentation of data for publication (Figure 1). First, the isolation of healthy single cells can be challenging, as different tissues vary in their degree of sensitivity to enzymatic digestion and subsequent mechanical dissociation. This protocol provides guidance in these isolation steps and identifies important quality control checkpoints throughout the process. Second, understanding the compatibility and requirements between single cell technology and next-generation sequencing can be confusing. This protocol provides guidelines to implement a user-friendly, droplet-based single-cell barcoding platform and perform sequencing. Finally, computer programming is an important prerequisite for analyzing single-cell transcriptomic datasets. This protocol provides resources for getting started with the R programming language and provides guidance on implementing two popular scRNA-Seq-specific R packages. Together, this protocol can guide newcomers in performing scRNA-Seq analysis for obtaining clear, interpretable results. This protocol can be adapted to most tissues in the mouse and, importantly, could be modified for use with other organisms, including human tissue. Adjustments depending on the tissue and user will be required.
There are several considerations to keep in mind while following this protocol:
1) Following all quality control guidelines in Steps 1 and 2 of this protocol is recommended to ensure a viable single cell suspension of all cells within the sample of interest while ensuring accurate total cell number counts (summarized in Figure 2). Once this is achieved, and if all the optimized conditions are followed, the quality control steps can be dropped to save time, which preserves RNA quality and reduces cell loss. Confirming successful isolation of high viability single cells from the tissue of interest is highly recommended before any downstream processing.
2) Since some cell types are more sensitive than others to stress, excessive dissociation techniques can inadvertently bias the population, therefore confounding downstream analysis. Gentle dissociation without unnecessary cellular shearing and digestion is critical for achieving high cellular yields and an accurate representation of tissue composition. Shear forces occur during the trituration, FACS and resuspension steps.
3) As with any RNA work, it is best to introduce as little additional RNase into the sample as possible during preparation. This will help maintain high-quality RNA. Use ribonuclease inhibitor solutions with rinsing to clean tools and any equipment that is not RNase-free, but avoid DEPC-treated products.
4) Perform preparations as quickly as possible. This will help maintain high-quality RNA and reduce cell death. Depending on the tissue dissection length and animal number, consider starting multiple dissections/preparations at the same time.
5) Prepare cells on ice when possible to maintain high-quality RNA, reduce cell death, and slow cell signaling and transcriptional activity. Although ice-cold processing is ideal for most cell types, some cell types (e.g., neutrophils) perform better when processed at room temperature.
6) Avoid calcium, magnesium, EDTA, and DEPC-treated products during cell preparation.
All protocols described here are in accordance with and approved by the University of Calgary’s Animal Care Committee.
1. Dissociating Tissue (Day 1)
2. Isolating Viable and Healthy Cells (Day 1)
3. GEM (Gel Bead in Emulsion) Generation and Barcoding (Day 1)
NOTE: Steps 3-6 of this protocol are designed to be used in conjunction with the most common microdroplet-based single-cell platform, manufactured by 10X Genomics. Detailed guidelines for Steps 3 and 4 are outlined in the manufacturer’s protocol (Refer to the Chromium Single Cell 3' protocol)11,12 and must be followed in conjunction with this protocol. For best results, Step 3 must be completed immediately after dissociation (Step 1) and cell isolation (Step 2) steps on day 1 of this protocol.
4. Clean-Up, Amplification, Library Construction and Library Quantification (Day 2 Onwards)
NOTE: Detailed guidelines for Step 4 are outlined in the manufacturer's protocol11,12 and must be followed in conjunction with this protocol.
5. Library Sequencing (Day 3 Onwards)
NOTE: The single-cell transcriptome barcoding platform used in this protocol generates Illumina-compatible paired-end libraries beginning and ending with P5 and P7 sequences. Although the minimum depth needed to resolve cell-type identity can be as low as 10,000–50,000 reads/cell15,16, ~100,000 reads/cell is recommended as an optimal cost-coverage trade-off for adult in vivo cells (keeping in mind that some cell types or minimally activated cell states will reach saturation at 30,000–50,000 reads/cell).
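The depth recommendation above translates into a simple read budget per sample. The following Python sketch is illustrative only: the function names and the default of 300 million reads per lane are assumptions that do not come from this protocol, so the lane yield should be replaced with the value quoted by the sequencing facility.

```python
# Sketch of a read-budget calculation; all names and defaults are illustrative.

def reads_required(n_cells, reads_per_cell=100_000):
    """Total reads needed to reach the target depth for one sample."""
    return n_cells * reads_per_cell


def lanes_required(n_cells, reads_per_cell=100_000, reads_per_lane=300_000_000):
    """Lanes needed for one sample; reads_per_lane is an ASSUMED lane yield."""
    return reads_required(n_cells, reads_per_cell) / reads_per_lane


def pool_fractions(cells_per_sample):
    """Share of a pooled run for each sample, proportional to its cell count."""
    total = sum(cells_per_sample.values())
    return {name: n / total for name, n in cells_per_sample.items()}


if __name__ == "__main__":
    # Hypothetical cell estimates for two samples.
    estimates = {"sample_A": 3000, "sample_B": 6000}
    for name, n in estimates.items():
        print(name, reads_required(n), round(lanes_required(n), 2))
    print(pool_fractions(estimates))
```

Pooling in proportion to the estimated cell number keeps the depth per cell roughly equal across samples, which is the reason a cell estimate is needed before the deep sequencing run.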
6. Processing Read Files
NOTE: Sequencing a single cell 3' library using this protocol generates raw data in binary base call (BCL) format. The Cell Ranger package is used to generate text-based FASTQ files from BCL files and to perform genomic and transcriptomic alignments, gene counts, demultiplexing, and aggregation of samples. In this section, the key steps that enable users to download raw BCL data from a sequencing facility and generate filtered gene-barcode matrices ready for downstream bioinformatics are presented.
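As a rough orientation, the Cell Ranger stages named above (FASTQ generation, counting, and aggregation) correspond to the `mkfastq`, `count`, and `aggr` subcommands. The Python sketch below only assembles and prints the command lines without running them. All paths, sample names, and helper function names are hypothetical, and the available flags differ between Cell Ranger versions, so the manufacturer's documentation for the installed version should be checked before use.

```python
import shlex

# Helper names, paths, and IDs below are placeholders, not part of the protocol.

def mkfastq_cmd(run_id, bcl_run_dir, samplesheet_csv):
    """BCL -> FASTQ conversion and demultiplexing."""
    return ["cellranger", "mkfastq", f"--id={run_id}",
            f"--run={bcl_run_dir}", f"--csv={samplesheet_csv}"]


def count_cmd(sample_id, transcriptome_dir, fastq_dir, sample_name,
              expect_cells=None):
    """Alignment and gene counting for one sample."""
    cmd = ["cellranger", "count", f"--id={sample_id}",
           f"--transcriptome={transcriptome_dir}",
           f"--fastqs={fastq_dir}", f"--sample={sample_name}"]
    if expect_cells is not None:
        # Optional hint, e.g. from a shallow-sequencing cell estimate.
        cmd.append(f"--expect-cells={expect_cells}")
    return cmd


def aggr_cmd(agg_id, libraries_csv):
    """Aggregation of several counted samples into one matrix."""
    return ["cellranger", "aggr", f"--id={agg_id}", f"--csv={libraries_csv}"]


if __name__ == "__main__":
    commands = [
        mkfastq_cmd("run1", "/data/bcl/run1", "samplesheet.csv"),
        count_cmd("sample1", "/refs/mouse_reference", "run1/outs/fastq_path",
                  "sample1", expect_cells=3000),
        aggr_cmd("all_samples", "libraries.csv"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))  # print only; nothing is executed here
```

Building each command as a list keeps arguments intact if the commands are later passed to a job scheduler or `subprocess.run` on an institutional server.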
7. Advanced Analysis of scRNA-Seq Datasets
NOTE: A complete scRNA-Seq tools database can be found at scRNA-tools3,27. Below is a framework for unsupervised cell clustering using Seurat2 and pseudotemporal ordering using Monocle6. Although much of this work can be done on a local computer, the following steps assume that computation will be completed using an institutional server.
8. NCBI’s GEO and SRA Submissions
NOTE: Since easy access to raw sequencing files ensures reproducibility and enables reanalysis, accessioned submissions to publicly available online repositories are recommended or required prior to manuscript submission. The National Center for Biotechnology Information's (NCBI) Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) are publicly accessible data repositories for high-throughput sequencing data35,36.
The repertoire of open source packages designed to analyze scRNA-Seq datasets has increased dramatically40, with the majority of these packages using R-based languages3. Here, representative results using two of these packages are presented: assessing unsupervised grouping of single cells based on gene expression, and ordering single cells along a trajectory in order to resolve cell heterogeneity and deconstruct biological processes.
Figure 4 illustrates the use of Seurat for pre-processing quality checks and downstream bioinformatics analysis. First, filtering deviant cells out of the analysis is essential for quality control. This was done using violin plots (Figure 4a) and scatter plots (Figure 4b) to visualize the percentage of mitochondrial genes, number of genes (nGene), and number of UMIs (nUMI) in order to identify cell doublets and outliers. Any cell with a clear outlier number of genes, UMIs, or percentage of mitochondrial genes was removed using Seurat's FilterCells function. Since Seurat uses principal component (PC) analysis scores to cluster cells, determining which statistically significant PCs to include is a critical step. Elbow plots (Figure 4c) were used for PC selection, in which PCs beyond the plateau of the 'standard deviation of PC' axis were excluded. The clustering resolution was also varied to demonstrate that the number of clusters can be changed, ranging from 0.4 (low resolution leading to fewer cell clusters, Figure 4d) to 4 (high resolution leading to more cell clusters, Figure 4e). At low resolution, it is likely that each cluster represents a defined cell type, whereas at high resolution clusters may also represent subtypes or transitional states of a cell population. In this instance, low-resolution cluster settings were used for further analysis with expression heatmaps (using Seurat's DoHeatmap function) to identify the most highly expressed genes in a given cluster (Figure 4f). The most highly expressed genes were identified by assessing differential expression in a given cluster versus all other clusters combined, demonstrating that each cluster was uniquely represented by defined genes. Additionally, individual candidate genes can be visualized on tSNE plots using Seurat's FeaturePlot function (Figure 4g). This allowed for deciphering whether there were clusters that represented macrophages. Using FeaturePlot, we found that both clusters 2 and 4 expressed Cd68, a pan-macrophage marker.
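The cell-filtering logic described above can be sketched in plain Python to show what the thresholds do. This is not Seurat's FilterCells implementation: the class, the function names, and every threshold value are hypothetical. In practice, cutoffs are chosen by inspecting the violin and scatter plots for each dataset.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_genes: int      # number of detected genes (nGene)
    n_umi: int        # number of UMIs (nUMI)
    pct_mito: float   # percentage of transcripts from mitochondrial genes


def passes_qc(cell, min_genes=200, max_genes=2500, max_umi=15000,
              max_pct_mito=5.0):
    """Return True if the cell lies inside all (ASSUMED) thresholds."""
    return (min_genes <= cell.n_genes <= max_genes
            and cell.n_umi <= max_umi
            and cell.pct_mito <= max_pct_mito)


def filter_cells(cells, **thresholds):
    """Keep only the cells that pass every QC threshold."""
    return [c for c in cells if passes_qc(c, **thresholds)]


if __name__ == "__main__":
    cells = [
        CellQC("AAAC", 1200, 4000, 2.1),    # typical cell
        CellQC("CCGT", 5200, 30000, 1.8),   # gene/UMI outlier, possible doublet
        CellQC("GGTA", 900, 2500, 18.0),    # high mitochondrial fraction
        CellQC("TTAG", 90, 150, 3.0),       # too few genes detected
    ]
    kept = filter_cells(cells)
    print([c.barcode for c in kept])
```

An upper bound on genes and UMIs targets doublets, while the mitochondrial cutoff targets stressed or dying cells; both are trade-offs, since overly strict limits can also remove genuinely large or metabolically active cell types.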
The Monocle package was used for corroborating cell clusters identified in Seurat, and for building cell trajectories, or pseudotemporal ordering, to recapitulate biological processes (Figure 5). Pseudotemporal ordering can be used for samples where single-cell expression profiles are expected to follow a biological time course. Cells can be ordered along a pseudotemporal continuum to resolve intermediate states and bifurcation points between two alternative cell fates, and to identify gene signatures underlying acquisition of each fate. First, similar to Seurat's filtration, poor-quality cells were removed such that the distribution of mRNA across all cells was log normal and fell between the upper and lower bounds identified in Figure 5a. Then, using Monocle's newCellTypeHierarchy function, single cells were classified and counted using known lineage marker genes (Figure 5b, 5c). For example, cells expressing PDGF receptor alpha or Fibroblast Specific Protein 1 were assigned to Cell Type #1 to create a criterion for defining fibroblasts. Next, this population (Cell Type #1) was assessed to decipher fibroblast trajectories. To do this, Monocle's differentialGeneTest function was utilized, which compared the cells representing the extreme states within the population and found differential genes for ordering the remaining cells in the population (Figure 5d). By applying manifold learning methods (a type of non-linear dimensionality reduction) across all cells, each cell was assigned a coordinate along a pseudotemporal path. This trajectory was then visualized by cell state (Figure 5e) and pseudotime (Figure 5f).
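One common way to set upper and lower bounds on a roughly log-normal mRNA distribution is to take the mean plus or minus a number of standard deviations on the log10 scale. The Python sketch below illustrates that idea only. It is an assumption for illustration, not necessarily how the bounds in Figure 5a were chosen, and the function names and example counts are hypothetical.

```python
import math
import statistics


def mrna_bounds(total_mrna, n_sd=2.0):
    """Lower and upper bounds as mean +/- n_sd standard deviations of log10 counts."""
    logs = [math.log10(t) for t in total_mrna if t > 0]
    mu = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return 10 ** (mu - n_sd * sd), 10 ** (mu + n_sd * sd)


def within_bounds(total_mrna, n_sd=2.0):
    """Flag each cell as inside (True) or outside (False) the bounds."""
    lower, upper = mrna_bounds(total_mrna, n_sd)
    return [lower < t < upper for t in total_mrna]


if __name__ == "__main__":
    # Hypothetical per-cell mRNA totals (inferred from UMI counts).
    totals = [1000, 1200, 900, 1100, 1000, 950, 1050, 100000]
    lower, upper = mrna_bounds(totals)
    print(round(lower), round(upper))
    print(within_bounds(totals))
```

Working on the log scale keeps a few very large cells or doublets from dominating the spread estimate as strongly as they would on the raw scale, although with small samples a single extreme value still widens the bounds noticeably.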
Figure 1: Flow chart. Steps from whole animal preparation to analyzing single cell RNA-Seq datasets to submitting final datasets to a publicly available repository. Gel Beads in Emulsion (GEMs) refer to beads with barcoded oligonucleotides which encapsulate thousands of single cells.
Figure 2: Creating viable single cell suspension from nerve tissue. (a) Cartoon overview of quality control checks. (b) Cells and debris with cells still incorporated in debris (red arrows). (c) Cells released from debris (red arrows). (d) Cell isolation by FACS. P0: debris fraction; P1: cell-like fraction; P3: exclusion of doublets; P4: viability dye (Sytox Orange) negative fraction. (e) No viability dye control. (f) Image of P0 fraction representing isolated debris. (g) Image of P4 fraction representing isolated viable cells (red arrows). (b), (c), (f), and (g) had nuclear dye added 20 minutes before imaging. Scale bars: 80 µm.
Figure 3: Shallow sequencing predicts the number of recovered cells in 10X processed samples. (a) An example (Sample 1.6) of a MiSeq-generated csv listing cell barcodes and their corresponding UMI counts as determined by confidently mapped reads. (b) Barcode rank plot for Sample 1.6 shows one significant drop in UMI count as a function of cell barcodes. The dashed and solid lines represent the cutoff between cells and background as determined by visual inspection. (c) Cell barcodes observed using the Cell Ranger pipeline post-HiSeq reveal that shallow sequencing accurately approximated the number of cells for Sample 1.6. (d) An example of a flow-cell set-up based on shallow sequencing derived cell estimates. For Sample 1.6, since shallow sequencing predicted 3480 cells, 1.17 lanes were assigned to ensure >100,000 reads per cell sequencing coverage in HiSeq. Note: All lanes must add to 100%.
Figure 4: Quality control and bioinformatics of single-cell RNA-Seq dataset using Seurat R package. (a) Plots of quality control metrics, which include number of genes, number of unique molecular identifiers (UMIs), and the percentage of transcripts mapping to the mitochondrial genome. (b) Sample gene plots detecting cells with deviant levels of mitochondrial transcripts and UMIs. (c) Sample elbow plot used for ad hoc determination of statistically significant PCs. The dashed and dot-dashed lines represent the cutoff where a clear "elbow" becomes apparent in the graph. PC dimensions before this elbow are included in downstream analysis. (d, e) Graph-based cell clusters visualized at two different resolutions in a low-dimensional space using a tSNE plot. (f) Top marker genes (yellow) for each cluster visualized on an expression heatmap using Seurat's DoHeatmap function. (g) Visualizing marker expression of, for example, the Cd68 gene representing macrophages (purple) using Seurat's FeaturePlot function. This suggests that clusters 2 and 4 (in panel d) of this dataset represent macrophages.
Figure 5: Cell categorization and ordering along pseudotemporal trajectory using Monocle toolkit. (a) Inspecting the distribution of mRNA (inferred from UMI counts) across all cells in a sample. Only cells with mRNA between 0 and ~20,000 were used for downstream analysis. (b, c) Assigning and counting cell types based on known lineage cell markers. For example, cells expressing PDGF receptor alpha or Fibroblast Specific Protein 1 were assigned to Cell Type #1 representing pan-fibroblasts using Monocle's newCellTypeHierarchy function. The number of different cell types can be visualized as a pie chart (b) and as a table (c). (d) Using Cell Type #1 (fibroblasts) as an example, the genes used for ordering cells can be visualized using a scatter plot that shows gene dispersion vs. mean expression. The red curve shows the cutoff for genes used for ordering, calculated by the mean-variance model using Monocle's estimateDispersions function. Genes that meet this cutoff were used for downstream pseudotime ordering. (e, f) Visualization of cell trajectories in a reduced two-dimensional space colored by the cell's "State" (e), and by Monocle-assigned "Pseudotime" (f).
This protocol demonstrates how the appropriate preparation of single cells can uncover the transcriptional heterogeneity of thousands of single cells and discriminate functional states or unique cellular identities within a tissue. The protocol does not require fluorescent reporter proteins or transgenic tools and can be applied to the isolation of single cells from various tissues of interest, including those from humans, keeping in mind that each tissue is unique and that this protocol will require some degree of adjustment.
The diverse and highly dynamic transcriptional programs within cells have emphasized the value of single-cell genomics. Aside from isolating high-quality RNA, a critical sample preparation step necessary for high-quality datasets is ensuring that cells are completely released from tissue and that cells are healthy and intact. This is relatively straightforward for collecting cells that are easily released, such as circulating cells, or in tissues where cells are loosely retained, such as lymphoid tissues. But this can be challenging for other adult tissues, due to the highly developed cellular architecture spanning large distances, the surrounding extracellular matrix, and the often-rigid cytoskeletal proteins involved in maintaining cell structure. Even with appropriate dissociation techniques for the full release of cells, the rigorous and often lengthy processing required could alter mRNA quality and cell integrity. In addition, the high temperatures used for enzyme-assisted dissociation also affect transcriptional signatures29,30. The intent of the protocol is to present quality control checks, using tissues such as the myelinated adult nerve and the extracellular matrix-rich adult skin, to demonstrate how optimization can help to overcome these obstacles.
A major consideration when designing any scRNA-Seq experiment is the choice of sequencing depth. Sequencing can be highly multiplexed, and read depth can range from very low with Drop-Seq2 to as high as 5 million reads/cell14 with a full-length RNA-Seq method such as Smart-Seq. Most scRNA-Seq experiments can detect moderate-to-high expression transcripts with sequencing as low as 10,000 reads/cell, which is usually sufficient for cell type classification41,42. Shallow sequencing saves on sequencing costs when trying to detect rare cell populations across complex tissues, where thousands of cells may be needed to confidently ascribe rare populations. But shallow sequencing is not adequate when detailed information on gene expression and processes associated with subtle transcriptional signatures is necessary. Currently, it is estimated that the large majority of genes in a cell are detected with 500,000 reads/cell, but this can vary depending on the protocol and tissue type43,44. While full-length transcript sequencing circumvents the need for assembly and can, therefore, detect novel or rare splice variants, sequencing costs often limit scaling such approaches to examine thousands of cells comprising a complex tissue system. In contrast, 3' tagged single-cell libraries such as the ones described in this protocol typically have lower complexity and require shallower sequencing. It is important to note that libraries generated using the described protocol can be sequenced on one of five supported sequencers: 1) NovaSeq, 2) HiSeq 3000/4000, 3) HiSeq 2500 Rapid Run and High Output, 4) NextSeq 500/550, and 5) MiSeq.
An alternative approach that reduces the need for delicate tissue and cell handling, yet maintains some of the benefits of single cell RNA-Seq, is the analysis of RNA from single nuclei45. This approach allows more rapid processing, which reduces RNA degradation, and permits more extreme measures to ensure adequate release of nuclei. It thus likely allows for a more confident capture of the transcriptional profiles representing all cells within a given tissue. It would, of course, only capture a portion of the transcriptional activity present within a given cell, so whether this approach is appropriate depends on the experimental objectives.
Besides the complete characterization of cellular identities within a given tissue, one of the most valuable analyses for scRNA-Seq datasets is the assessment of intermediate transcriptional states across 'defined' cell populations. These intermediary states can impart insights into the lineage relationships between cells within identified populations, which was not possible with traditional bulk RNA-Seq approaches. Several scRNA-Seq bioinformatic tools have now been developed to elucidate this. Such tools can assess the processes involved in, for example, cancer cells transitioning to an oncogenic/metastatic state, stem cells maturing into diverse terminal fates, or immune cells shuttling between active and quiescent states. Subtle transcriptome differences in cells may also be indicative of lineage biases, which recently developed bioinformatic tools like FateID can infer47. Since the distinctions between transitioning cells can be difficult to ascertain when the transcriptional differences are subtle, deeper sequencing may be necessary46. Fortunately, if further probing of the dataset is of interest, coverage of a shallowly sequenced library can be increased by re-running the library on another flow cell.
Taken together, this protocol provides an easy-to-adapt workflow that enables users to transcriptionally profile hundreds to thousands of single cells within one experiment. The final quality of an scRNA-Seq dataset relies on optimized cell isolation, flow cytometry, cDNA library generation, and interpretation of raw gene-barcode matrices. To this end, this protocol provides a comprehensive overview of all key steps that can be easily modified to enable studies of diverse tissue types.
The authors have nothing to disclose.
We acknowledge the support staff at the UCDNA Services Facility, as well as the Animal Care facility staff at the University of Calgary. We thank Matt Workentine for his bioinformatics support and Jens Durruthy for his technical support. This work was funded by a CIHR grant (R.M. and J.B.), a CIHR New Investigator Award to J.B., and an Alberta Children's Health Research Institute Fellowship (J.S.).
Products | | | |
RNAse out | Biosciences | 786-70 | |
Pentobarbital sodium | Euthanyl | 50 mg/kg | |
HBSS | Gibco | 14175-095 | |
Dispase 5 U/ml | StemCell Technologies | 7913 | 5 mg/ml |
Collagenase-4 125 CDU/mg | Sigma-Aldrich | C5138 | 2 mg/ml |
DNAse | Sigma-Aldrich | DN25 | 10 mg/ml |
BSA | Sigma-Aldrich | A7906 | |
15 ml Narrow bottom tube VWR® High-Performance Centrifuge Tubes | VWR | 89039-666 | |
Sytox Orange Viability Dye | Molecular Probes | 11320972 | 1.3 nM/µl |
Nuc Blue Live ReadyProbes | Invitrogen | R37605 | |
Agilent Bioanalyzer High Sensitivity DNA Reagents | Agilent | 5067-4626 | |
Kapa DNA Quantification Kit | Kapa Biosystems | KK4844 | |
Equipment | | | |
BD FACSAria III | BD Biosciences | | |
Agilent Bioanalyzer Platform | Agilent | | |
Illumina® HiSeq 4000 | Illumina | | |
Illumina® MiSeq SR50 | Illumina | | |
Software | | | |
Cell Ranger | 10x Genomics | support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome | |
Loupe Cell Browser | 10x Genomics | support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest | |
R | | https://anaconda.org/r/r | |