Single cell sequencing is an increasingly popular and accessible tool for addressing genomic changes at high resolution. We provide a protocol that uses single cell sequencing to identify copy number alterations in single cells.
Detection of genomic changes at single cell resolution is important for characterizing genetic heterogeneity and evolution in normal tissues, cancers, and microbial populations. Traditional methods for assessing genetic heterogeneity have been limited by low resolution, low sensitivity, and/or low specificity. Single cell sequencing has emerged as a powerful tool for detecting genetic heterogeneity with high resolution, high sensitivity and, when appropriately analyzed, high specificity. Here we provide a protocol for the isolation, whole genome amplification, sequencing, and analysis of single cells. Our approach allows for the reliable identification of megabase-scale copy number variants in single cells. However, aspects of this protocol can also be applied to investigate other types of genetic alterations in single cells.
Alterations in DNA copy number can range in size from several base pairs (copy number variants, CNVs) to entire chromosomes (aneuploidy). Copy number alterations affecting large regions of the genome can have significant phenotypic consequences by altering the expression of up to thousands of genes1,2. CNVs that are present in all cells of a population can be detected by bulk sequencing or microarray-based methods3,4. However, populations can also be genetically heterogeneous, with CNVs existing in a subset of the population or even in single cells. Genetic heterogeneity is common in cancer, where it drives tumor evolution, and is also present in normal tissues, with unknown consequences5,6,7,8,9,10.
Traditionally, genetic heterogeneity was assessed either by cytologic approaches or bulk sequencing. Cytologic approaches, such as fluorescence in situ hybridization (FISH), chromosome spreads, and spectral karyotyping (SKY), have the benefit of identifying alterations present in individual cells but have high error rates due to artifacts of hybridization and spreading11. These approaches are also limited in their resolution, revealing only copy number changes spanning several megabases. Sequencing or microarray analysis of bulk DNA, though higher in accuracy and resolution, is less sensitive: to detect genetic heterogeneity by population-based approaches, the variants must be present in a substantial fraction of cells in the population. The emergence of methods to amplify the genomic DNA from single cells has made it possible to sequence the genome of single cells. Single cell sequencing has the advantages of high resolution, high sensitivity, and, when appropriate quality control methods are applied, high accuracy12.
Here, we describe a method for detecting megabase-scale copy number alterations in single cells. We isolate single cells by microaspiration, amplify genomic DNA using linker-adaptor PCR, prepare libraries for next-generation sequencing, and detect copy number variants by both hidden Markov model and circular binary segmentation.
1. Isolating Single Cells
2. Whole Genome Amplification
3. Sequencing
4. Data Analysis
NOTE: A Unix-based environment is required to run the programs and scripts in this section. Install the software mentioned in the protocol following their installation guides. All scripts can be found at https://sourceforge.net/projects/singlecellseqcnv/.
The assembled aspirator should appear similar to the one in Figure 1A. The needle should be drawn such that it is sufficiently wide to accommodate a single cell but not so wide that a large volume is drawn up with the single cell. Aspirating single cells is easiest when there are between one and five cells in a 10X field (Figure 1B).
If whole genome amplification is successful, the sample will appear as a smear on an agarose gel (Figure 2, lanes 1, 2, 4, 5, 6, and 7). A faint or absent smear indicates a failed amplification reaction and the sample should not be sequenced (Figure 2, lanes 3 and 8).
Following library preparation, the fragment size distribution of the samples should be assessed by capillary electrophoresis on a fragment analyzer. Successfully prepared libraries will have a fairly even distribution of fragment sizes from 150 to 900 bp (Figure 3A,B). Failed library preparation will result in a fragment size distribution dominated by small fragments, and such libraries should not be sequenced (Figure 3C).
Processing sequencing data through the hidden Markov model (HMMcopy) and circular binary segmentation (DNAcopy) will parse the genome of each cell into segments of estimated copy number. These segments can then be filtered to identify those with an estimated copy number consistent with gain or loss in a single cell (Table 1). These filtered segments from HMMcopy and DNAcopy should then be overlapped to identify high-confidence CNVs.
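The overlap step can be illustrated with a minimal Python sketch. This is not one of the scripts distributed with the protocol; the `Segment` type and the `overlap_segments` function are hypothetical names introduced here, and the sketch assumes that start and end coordinates can be treated as inclusive. The input values are the autosomal and chrX rows of Table 1 (the chrY rows are omitted for brevity).

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int  # first base of the segment (assumed inclusive)
    end: int    # last base of the segment (assumed inclusive)


def overlap_segments(a_segments, b_segments):
    """Return the regions shared by a segment in a_segments and one in b_segments."""
    shared = []
    for a in a_segments:
        for b in b_segments:
            if a.chrom != b.chrom:
                continue
            start, end = max(a.start, b.start), min(a.end, b.end)
            if start <= end:  # the two intervals share at least one base
                shared.append(Segment(a.chrom, start, end))
    return shared


# Filtered segments for sample D15-4998, copied from Table 1 (chrY rows omitted).
hmmcopy_segments = [
    Segment("chr8", 144_500_001, 146_500_000),
    Segment("chr10", 67_000_001, 134_500_000),
    Segment("chr19", 1, 20_000_000),
]
dnacopy_segments = [
    Segment("chr10", 62_612_945, 129_971_511),
    Segment("chr19", 0, 28_416_392),
    Segment("chrX", 51_659_160, 95_343_369),
]

high_confidence = overlap_segments(hmmcopy_segments, dnacopy_segments)
for seg in high_confidence:
    size_mb = (seg.end - seg.start + 1) / 1e6
    print(f"{seg.chrom}\t{seg.start}\t{seg.end}\t{size_mb:.1f} Mb")
```

With these inputs the sketch keeps only the chr10 and chr19 regions, since the chr8 and chrX segments are each reported by one algorithm only. A real analysis would likely also require that both algorithms agree on the direction of the change (gain or loss) before intersecting the segments.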
Figure 1. Single cell isolation. (A) An assembled microaspirator. (B) A 10X field showing dissociated cells (arrows) and microaspirator needle (lower right corner). Aspirating single cells is easiest when there are between one and five cells in a 10X field.
Figure 2. Whole genome amplification. Agarose gel electrophoresis of 5 µl of whole genome amplification products. Samples that successfully amplify will appear as bright smears from 100 bp to 1 kb (lanes 1, 2, 4, 5, 6, and 7) and can be sequenced. Samples that do not successfully amplify will produce faint smears or no smear (lanes 3 and 8) and should not be sequenced.
Figure 3. Library preparation. Representative results from a fragment analyzer. The graphs show fragment size (in bp) on the X axis and relative fluorescence units (RFU) on the Y axis. To the right of each graph is a simulated gel lane. (A) Results for an ideal sample, with an even distribution between 150 and 900 bp and without sharp peaks or bias toward one side. This sample is acceptable for sequencing. (B) Results from a suboptimal sample, with a size distribution skewed toward lower fragment sizes. While this is not optimal, the sample can still be sequenced. (C) Results from a failed sample, with predominantly small fragment sizes. This is likely caused by prolonged incubation during the tagmentation step of library preparation. This sample should not be sequenced.
HMMcopy
Sample | Segment | Chr | Start | End | State | Median
D15-4998 | 23 | chr8 | 144,500,001 | 146,500,000 | 6 | 0.5008794
D15-4998 | 29 | chr10 | 67,000,001 | 134,500,000 | 6 | 0.4031945
D15-4998 | 52 | chr19 | 1 | 20,000,000 | 6 | 0.4616884
D15-4998 | 57 | chrY | 1 | 59,500,000 | 2 | -1.506532
DNAcopy
Sample | Chrom | Start bin | End bin | Start | End | Seg mean | Seg med
D15-4998 | chr10 | 88 | 197 | 62,612,945 | 129,971,511 | 1.4688 | 0.157
D15-4998 | chr19 | 0 | 31 | 0 | 28,416,392 | 1.4141 | 0.1674
D15-4998 | chrX | 77 | 126 | 51,659,160 | 95,343,369 | 1.3548 | 0.1874
D15-4998 | chrY | 0 | 14 | 0 | 23,805,358 | -2.7004 | 0.3591
Table 1. Data analysis. Filtered segments generated by HMMcopy and DNAcopy from a single cell. Overlapping these two results reveals a gain on chromosome 10 from 67 to 130 Mb as well as a gain on chromosome 19 from 0 to 20 Mb. We have found that gains on the proximal portion of chromosome 19 are an artifact of single cell sequencing12.
Traditionally, identifying CNVs and aneuploidy at the level of single cells required cytologic methods such as FISH and SKY. Now, single cell sequencing has emerged as an alternative approach for such questions. Single cell sequencing has advantages over FISH and SKY as it is both genome-wide and high resolution. Moreover, when appropriate quality control methods are applied, single cell sequencing can provide a more reliable assessment of CNVs and aneuploidy as it is not susceptible to the hybridization and spreading artifacts inherent to FISH and SKY. However, many of the recent applications of single cell sequencing have not been substantiated by thorough assessment of the sensitivity and specificity of the methods and analyses. Indeed, some of the analytic approaches used by other studies are associated with high frequencies (>50%) of false positive CNV calls12. The approach we describe has been rigorously tested using cells of known CNV burden in order to determine true and false discovery rates and optimize the sensitivity and specificity of CNV detection12. Using the quality control and analytic approaches described in this protocol, approximately 20% of 5 Mb gains, 75% of 5 Mb losses, and all CNVs exceeding 10 Mb can be detected. Though determining the false discovery rate of single cell sequencing is difficult, we have estimated it to be less than 25%. This protocol can be applied to cells from a variety of sources, the scripts can be modified to adjust the resolution of CNV detection, and the protocol can be adapted to identify other types of genomic alterations.
There are a variety of means of dissociating fresh tissues into single cells, and many publications describe procedures optimized for specific tissues, such as skin24 and brain25. We prefer to isolate single cells by microaspiration as it allows for visual assessment of each cell to be sequenced. However, it is also possible to isolate single cells by fluorescence-activated cell sorting (FACS)26 and microfluidic devices27. If single cell isolation and whole genome amplification are performed manually, it is reasonable to isolate and amplify up to forty cells in a single sitting. In order to obtain high quality single cell sequencing data, it is crucial that the amplification of single cell genomes is uniform and complete. We find that the quality of the isolated single cells, as well as the efficiency of lysis and amplification, has a significant impact on the quality of sequencing data. As such, the cells should be harvested from their native environment just prior to isolation, and whole genome amplification should begin immediately after cells are isolated. Moreover, the lysis and fragmentation procedure should be followed exactly as described in steps 2.3-2.6.
The algorithms can be adjusted to change the resolution of CNV detection, with opposing effects on sensitivity and specificity12. It is also possible to adjust the thresholds to detect whole-chromosome aneuploidy in the setting of tetraploidy11. However, we find that our approach is limited to detecting CNVs greater than 5 Mb, as noise introduced during whole genome amplification complicates the detection of smaller variants12. Future improvements in whole genome amplification approaches should ultimately enhance the resolution of CNV detection using single cell sequencing.
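The trade-off between resolution and noise can be illustrated by counting reads in fixed-width genomic bins. The following Python sketch is only an illustration under stated assumptions, not the scripts distributed with this protocol: the function names `bin_counts` and `log2_ratios`, the toy read positions, and the use of the median bin count as the reference are all hypothetical choices made for this example.

```python
import math
from statistics import median


def bin_counts(positions, chrom_length, bin_size):
    """Count 1-based read start positions in consecutive fixed-width bins."""
    n_bins = math.ceil(chrom_length / bin_size)
    counts = [0] * n_bins
    for pos in positions:
        counts[(pos - 1) // bin_size] += 1
    return counts


def log2_ratios(counts):
    """Log2 of each bin count relative to the median count (None for empty bins)."""
    med = median(counts)
    return [math.log2(c / med) if c > 0 and med > 0 else None for c in counts]


# Toy data (hypothetical): nine read starts on a 2 Mb chromosome,
# with one extra read in the last 500 kb.
positions = [100, 200, 600_000, 700_000, 1_100_000, 1_200_000,
             1_600_000, 1_700_000, 1_800_000]

fine = bin_counts(positions, 2_000_000, 500_000)      # four 500 kb bins
coarse = bin_counts(positions, 2_000_000, 1_000_000)  # two 1 Mb bins

print(fine, log2_ratios(fine))
print(coarse, log2_ratios(coarse))
```

In this toy case the excess of reads in the last 500 kb stands out in the fine bins but is diluted in the coarse bins. With real data the opposite pressure also applies: smaller bins contain fewer reads each, so their counts are noisier, which is one reason why increasing resolution tends to cost specificity.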
Single cell sequencing allows for investigation of not only copy number alterations but also single nucleotide variations28,29 and structural variation30. Our protocol for single cell isolation can be applied to answer these other questions. However, the choice of whole genome amplification method depends on the specific application. The method described in this protocol, which is based on polymerase chain reaction, is best suited for detecting copy number alterations because it is associated with lower levels of amplification bias32. For investigating other types of genomic alterations, such as single nucleotide polymorphisms, other methods of whole genome amplification are believed to be more suitable31,32.
The authors have nothing to disclose.
We thank Stuart Levine for comments on this manuscript. This work was supported by the National Institutes of Health Grant GM056800 and the Kathy and Curt Marble Cancer Research Fund to Angelika Amon and in part by the Koch Institute Support Grant P30-CA14051. Angelika Amon is also an investigator of the Howard Hughes Medical Institute and the Glenn Foundation for Biomedical Research. K.A.K. is supported by the NIGMS Training Grant T32GM007753.
Aspirator tube assemblies for calibrated microcapillary pipettes | Sigma | A5177 | |
PVC tubing (ID 3/16", OD 5/16", ~1 foot) | VWR | 89068-500 | |
PVC tubing (ID 5/16", OD 7/16", ~6 inches) | VWR | 89068-508 | |
Acrodisc Syringe Filter with HT Tuffryn Membrane (diameter 25 mm, pore size 0.2 µm) | VWR | 28144-040 | |
Serological pipette (5 ml) | BioExpress | P-2837-5 | Any plastic 5 mL pipette should suffice |
In-Line Water Trap for Oxygen Use | Amazon.com | 700220210813 | |
Capillary Melting Point Tubes (ID 0.8-1.1 mm, 100 mm length) | VWR | 34502-99 | |
Modeling clay | VWR | 470156-850 | |
Petri dish (150 mm diameter) | VWR | 25384-326 | Surface should be non-adherent to facilitate aspiration of single cells |
Hard-Shell Full-Height 96-Well Semi-Skirted PCR Plates | Bio-Rad | HSS9601 | Any 96-well PCR plate should suffice |
96-well tissue culture plate | VWR | 62406-081 | Any 96-well tissue culture plate should suffice |
GenomePlex Single Cell Whole Genome Amplification Kit | Sigma | WGA4 | |
Microseal 'A' Film | Bio-Rad | MSA5001 | |
Mini plate spinner | Thomas Scientific | 1225Z37 | |
Thermal cycler | Bio-Rad | 1861096 | Any thermal cycler should suffice so long as it can accommodate a 96-well PCR plate |
Agencourt Ampure Beads | Beckman Coulter | A63880 | |
Dynamag-2 Magnet | Thermo Fisher Scientific | 12321D | Any similar magnetic tube strip should suffice |
Nextera XT DNA Library Preparation Kit | Illumina | FC-131-1096 | |
Nextera XT Index Kit | Illumina | FC-121-1012 | |
Complete kit (optimized for Roche LightCycler 480) | Kapa Biosystems | KK4845 | This kit is optimized for the Roche LightCycler 480 real-time PCR machine. If using a different machine, substitute a kit optimized for your machine. |