Single-cell sequencing reveals genotypic heterogeneity in biological systems, but current technologies lack the throughput necessary for the deep profiling of community composition and function. Here, we describe a microfluidic workflow for sequencing >50,000 single-cell genomes from diverse cell populations.
Sequencing technologies have undergone a paradigm shift from bulk to single-cell resolution in response to an evolving understanding of the role of cellular heterogeneity in biological systems. However, single-cell sequencing of large populations has been hampered by limitations in processing genomes for sequencing. In this paper, we describe a method for single-cell genome sequencing (SiC-seq) which uses droplet microfluidics to isolate, amplify, and barcode the genomes of single cells. Cell encapsulation in microgels allows the compartmentalized purification and tagmentation of DNA, while a microfluidic merger efficiently pairs each genome with a unique single-cell oligonucleotide barcode, allowing >50,000 single cells to be sequenced per run. The sequencing data is demultiplexed by barcode, generating groups of reads originating from single cells. As a high-throughput and low-bias method of single-cell sequencing, SiC-seq will enable a broader range of genomic studies targeted at diverse cell populations.
The genome serves as a blueprint of cellular identity and function, containing the entirety of an organism's coding potential. An understanding of cellular biology at the genome level can explain the observed phenotypic diversity within heterogeneous cell populations. This heterogeneity is apparent in biological systems and has broad implications for human health and disease. For example, gene copy number variations among tumor cells are linked to the evolution and spread of cancer1,2. In bacterial infections, pathogenicity islands present in a small fraction of genomes can be horizontally transferred and lead to the proliferation of antibiotic-resistant bacteria3,4. A primary challenge in studying genomes at the single-cell level is the low quantities of DNA available, as well as the need to analyze thousands of cells to sample the full diversity of genotypes. For these reasons, limitations in experimental throughput have hindered the effectiveness of single-cell studies, biasing results towards the most abundant cells. Single-cell isolation techniques such as flow sorting5,6, optical tweezers7, embedment in bulk gels8, and microfluidics9 are capable of processing hundreds of cells for sequencing; however, this represents only a small fraction of most samples. A method for single-cell genome sequencing with substantially higher throughput would allow deeper and more complete profiling of cell populations, thereby elucidating the role of genotypic diversity within these communities.
Droplet microfluidics enables the high-throughput manipulation of cells and biological reagents within millions of picoliter-scale reactors. To date, microdroplet technologies have been used to study differential expression patterns among cells from heterogeneous tissues10,11,12, deeply sequence long molecules13,14,15, and conduct chromatin immunoprecipitation sequencing (ChIP-seq) analyses on single cells16. Indeed, microdroplets are capable of high-throughput, compartmentalized operations, making them amenable to applications in single-cell genomics. The development of this technology presents its own technological challenges, however. Cells must be lysed, and their genomes purified and amplified, with minimal bias to uniformly sample cell populations. Additionally, unlike polyadenylated mRNA transcripts in mammalian cells, there is no comparable molecular motif in the genome to facilitate the capture of the target nucleic acid. For these reasons, single-cell genome sequencing has been difficult to implement in microdroplet platforms.
In this work, we provide a detailed protocol of our previously reported single-cell microfluidic approach capable of sequencing the genomes of tens of thousands of cells in a single experiment17. With this technology, called SiC-seq, bacterial cells are encapsulated in micron-scale hydrogels and individually lysed, tagmented, and merged with a microdroplet containing a unique oligonucleotide barcode, which is spliced onto the cell's genomic DNA via a single overlap extension polymerase chain reaction (PCR). The hydrogels serve as isolated containers in which high-molecular-weight genomic DNA is sterically encased, allowing smaller molecules such as detergents and lytic enzymes to access and purify DNA prior to barcoding18. This protocol processes >50,000 single cells in a matter of hours, resulting in a barcoded library ready for sequencing. Following the sequencing, the reads are demultiplexed according to their single-cell barcode sequence, resulting in a dataset comprised of millions of reads, each with a cellular index.
1. Microfluidic Device Fabrication
2. Encapsulation of Cells in Agarose Microgels
NOTE: See Figure 2A.
3. Breaking and Washing the Agarose Microgels
4. Lysis of Cells in Agarose via Lytic Enzymes
5. Detergent-based Microgel Treatment
6. Generating Barcode Droplets by Digital PCR
7. Tagmentation of Genomic DNA in Droplets
NOTE: See Figure 2B.
8. Single-cell Barcoding by Microfluidic Double Merger
NOTE: See Figure 2C.
9. Library Preparation and Sequencing
10. Single-cell Data Analysis
NOTE: Custom Python scripts for quality control and preliminary analysis of SiC-seq data can be downloaded from https://www.github.com/AbateLab/SiC-seq.
The SiC-seq experimental workflow uses 3 PDMS microfluidic devices fabricated using a soft lithography procedure (Figure 1). A co-flow dropmaker (Figure 3A) generates 25 µm digital barcode droplets for labeling genomic DNA with a unique single-cell identifier. The barcode oligonucleotides consist of a 15 bp degenerate sequence flanked by PCR handles for amplification (Table 2, BAR primer). The barcodes are diluted to a femtomolar concentration to achieve single-molecule encapsulation, so that the vast majority of droplets receive either 0 or 1 barcode fragment. The droplets containing a barcode are amplified, yielding many copies of double-stranded barcode amplicons. A nucleic acid stain is used to verify the successful amplification and quantify the encapsulation rate of the barcode fragments (Figure 4B). The microgels are generated by co-flowing a bacterial cell suspension and a molten agarose gel at equal flow rates (Figure 2A). The agarose is prepared at twice the desired final concentration, as the co-flow dropmaking process effectively dilutes the aqueous solutions by a factor of 2. As the agarose cools, it solidifies into a 25 µm diameter microgel occupying the spherical volume of the droplet.
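As a rough check on the "femtomolar" barcode dilution, the following minimal Python sketch estimates the volume of a 25 µm spherical droplet and the bulk concentration that would give a mean loading of 0.1 barcode molecules per droplet. The 0.1 target is taken from the Poisson loading guidance given elsewhere in this article; the function names are illustrative and are not part of the published SiC-seq scripts.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def droplet_volume_liters(diameter_um: float) -> float:
    """Volume of a spherical droplet, in liters, from its diameter in micrometers."""
    radius_m = diameter_um * 1e-6 / 2.0
    volume_m3 = (4.0 / 3.0) * math.pi * radius_m ** 3
    return volume_m3 * 1000.0  # 1 cubic meter = 1000 liters


def concentration_for_loading(mean_per_droplet: float, diameter_um: float) -> float:
    """Molar concentration giving the requested mean number of molecules per droplet."""
    return mean_per_droplet / (AVOGADRO * droplet_volume_liters(diameter_um))


volume_pl = droplet_volume_liters(25.0) * 1e12            # liters -> picoliters
conc_fm = concentration_for_loading(0.1, 25.0) * 1e15     # molar -> femtomolar

print(f"Droplet volume: {volume_pl:.2f} pL")               # about 8.18 pL
print(f"Barcode concentration for 0.1/droplet: {conc_fm:.1f} fM")  # about 20.3 fM
```

Under these assumptions, the required concentration is on the order of tens of femtomolar, consistent with the dilution described above. In practice, the measured encapsulation rate (Figure 4B) rather than this estimate should guide the final dilution.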
A series of wash and lysis steps purifies the high-molecular-weight genomic DNA in the microgels (Figure 2B). After breaking the emulsions, aqueous washes are carried out in large volumes to dilute out trace organic solvents which can inhibit the downstream enzymatic treatments. The washed microgels are observed under a microscope to verify the cell encapsulation rate (Figure 4A). A cocktail of enzymes with broad lytic activity is added to the microgel suspension to digest the cell walls of the bacteria and eukaryotic microbes19. A second treatment with Proteinase K and detergent degrades the proteins and solubilizes cell debris.
Tagmentation of the purified DNA is carried out in droplets to avoid potential cross-contamination resulting from the diffusion of small tagmented DNA fragments between the microgels18. A droplet encapsulation device (Figure 3B) compartmentalizes each microgel with a buffer and tagmentation enzyme, which simultaneously fragments double-stranded DNA while also "tagging" it with a preloaded oligonucleotide20. The microgels are loaded into the droplets as close-packed particles, achieving encapsulation rates approaching 1 microgel for every drop with few doublets21 (Figure 4C).
In the final step of the microfluidic workflow (Figure 2C), a device performs a double merger operation combining 1 barcode drop, 1 microgel-containing drop, and the amplification mix in a controlled two-step process. First, a droplet containing PCR reagent is paired and merged with a barcode drop in the region shown in yellow (Figure 5). Saltwater electrodes in the microfluidic channel produce a high electric field gradient which triggers the droplet merger. In a similar manner, the first merged droplet is paired with a microgel droplet and merged a second time in the region shown in red. The droplets are collected and thermal cycled off-chip in a single-overlap extension (SOE) PCR. The overlapping complementary ends of the barcode and the tagmented genomic DNA allow fusion and exponential amplification of only properly barcoded constructs.
The sequencing data is first filtered by read quality and then parsed by grouping the reads according to their 15-bp single-cell barcode sequence. For a barcode group to be considered valid, it should contain a minimum number of reads; this thresholding limits the analysis to cells with a useful amount of sequencing data and removes the PCR-mutated barcode "orphans" from the dataset. In this sample run, the minimum is set to 7.5 kbp per group (50 reads of 150 bp each). A histogram of the barcode counts versus the group size shows that a significant portion of the valid barcode groups is just above the threshold size (Figure 6A).
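The grouping and thresholding described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' published scripts: it assumes that quality filtering has already been done and that each read arrives as a hypothetical (barcode, sequence) pair, since how the barcode is extracted from the raw reads is not covered here.

```python
from collections import defaultdict

MIN_READS = 50        # threshold used in the sample run
READ_LENGTH_BP = 150  # read length used in the sample run


def group_reads_by_barcode(reads):
    """Group read sequences by their single-cell barcode.

    `reads` is an iterable of (barcode, sequence) tuples (an assumed input format).
    """
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


def filter_valid_groups(groups, min_reads=MIN_READS):
    """Keep only barcode groups holding at least `min_reads` reads."""
    return {bc: seqs for bc, seqs in groups.items() if len(seqs) >= min_reads}


# Toy input: one well-populated barcode and one likely "orphan" barcode.
toy_reads = [("A" * 15, "ACGT")] * 60 + [("C" * 15, "TTTT")] * 3
valid = filter_valid_groups(group_reads_by_barcode(toy_reads))

print(f"Threshold: {MIN_READS * READ_LENGTH_BP} bp per group")
print(f"Valid barcode groups: {len(valid)}")
```

In this toy input, only the 60-read group passes the threshold; the 3-read group is discarded in the same way that small orphan groups are removed from a real dataset.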
In a control experiment where the microbial community composition is known, the purity and relative abundance metrics are used to evaluate the quality of a SiC-seq run. Here, a synthetic 10-cell community consisting of 3 gram-negative bacteria, 5 gram-positive bacteria, and 2 yeasts is analyzed. The purity of a given barcode group is defined as the number of reads mapping to the most common genome in the group divided by the total number of reads in the group. The vast majority of the barcode groups have purities greater than 0.95 (Figure 6B). Relative abundance of the cell types is calculated by counting the raw reads and by counting the barcode groups, where each group is assigned a cell type corresponding to the consensus of its member reads (Figure 6C). The abundances of reads and barcode groups track in roughly equal proportions, indicating that the cell populations are being sampled such that certain species are not contained in disproportionately small or large barcode groups. Plotting the aggregate coverage of all barcode groups from a single species indicates a high coverage across the entire genome, with few or no dropout regions (Figure 7). The uniformity of coverage can be verified with a frequency distribution of normalized coverage values, with most values centered around the average (Figure 7, inset).
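The purity and relative abundance metrics defined above can be illustrated with a short Python sketch. It assumes that each read has already been mapped to a reference genome, so that a barcode group is simply a list of genome labels, and it treats the "consensus" cell type as the most common label in the group. Both are simplifying assumptions, and the species labels are placeholders.

```python
from collections import Counter


def group_purity(genome_hits):
    """Return (most common genome, purity) for one non-empty barcode group.

    Purity = reads mapping to the most common genome / total reads in the group.
    """
    counts = Counter(genome_hits)
    top_genome, top_count = counts.most_common(1)[0]
    return top_genome, top_count / len(genome_hits)


def relative_abundance(groups):
    """Relative abundance per genome, counted by reads and by barcode groups."""
    read_counts = Counter()
    group_counts = Counter()
    for hits in groups.values():
        read_counts.update(hits)              # every read counts once
        consensus, _ = group_purity(hits)     # group is assigned its majority genome
        group_counts[consensus] += 1
    total_reads = sum(read_counts.values())
    total_groups = sum(group_counts.values())
    by_reads = {g: n / total_reads for g, n in read_counts.items()}
    by_groups = {g: n / total_groups for g, n in group_counts.items()}
    return by_reads, by_groups


# Toy data with placeholder species labels.
toy_groups = {
    "barcode_1": ["species_A"] * 19 + ["species_B"],
    "barcode_2": ["species_B"] * 20,
}
print(group_purity(toy_groups["barcode_1"]))
print(relative_abundance(toy_groups))
```

In the toy data, the first group has a purity of 0.95, and the two counting methods give similar abundances (0.475 vs. 0.5 for species_A), which is the pattern expected when barcode group sizes are consistent across species.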
Figure 1: Fabrication of microfluidic devices by photolithography. (A) Master molds with a single feature height are fabricated by spin coating a layer of SU-8 photoresist onto a silicon wafer. The photoresist is then patterned with a photolithographic mask and UV light, crosslinking the exposed SU-8. Finally, uncrosslinked SU-8 is dissolved in a developer bath. The resulting mold is used to cast PDMS which is bonded to a glass slide to produce the complete microfluidic device. (B) For a double-layer device, the fabrication similarly begins with spin coating and exposure steps. These steps are then repeated to create a two-layer device.
Figure 2: Overview of the SiC-seq workflow. (A) A microbial suspension is co-flowed with molten agarose in a dropmaker device to encapsulate single cells in microgels. (B) The microgels are subjected to a series of washes to purify the bacterial genomic DNA. Lytic enzymes digest the cell walls of gram-positive bacteria and yeasts, and detergent solubilizes the cellular debris. The microgels are re-encapsulated into droplets for the tagmentation to reduce cross-contamination. (C) The microfluidic merger combines a digital PCR barcode, a tagmented microgel genome, and an amplification mix at a rate >1 kHz. Off-chip SOE-PCR splices a unique single-cell barcode onto the tagmented genome and selectively amplifies fully barcoded constructs.
Figure 3: Microfluidic devices for dropmaking and microgel re-encapsulation. (A) This panel shows a co-flow dropmaker (25 µm feature height). Cells and molten agarose are introduced into the device at equal flow rates to produce 25 µm droplets at a 25 µm x 25 µm junction. For the digital barcode dropmaking, the cell inlet is plugged, and a PCR mix is introduced into the agarose inlet. (B) This panel shows a microgel re-encapsulation device (25 µm feature height). The microgels flow into a funnel-shaped inlet to maintain their close-packed ordering and receive a volume of tagmentation mix prior to the re-encapsulation at a 25 µm x 30 µm junction.
Figure 4: Micrographs of droplets and microgels. (A) This panel shows washed 25 µm microgels prior to enzymatic lysis. The bacteria are fluorescently stained for the quantification of the encapsulation rate. Poisson loading statistics dictate that the cells should be encapsulated at a rate of 1 in 10 drops or less to minimize the frequency of multiple-encapsulation events. (B) This panel shows a fluorescence microscopy image of 25 µm digital barcode droplets treated with a nucleic acid stain. The droplets containing amplified barcode fragments produce a strong fluorescence signal. (C) This panel shows microgels re-encapsulated in 50 µm drops. The close-packing of the microgels allows for encapsulation rates approaching 1 gel per drop with few doublets.
Figure 5: Microfluidic double merger device for single-cell genome barcoding. A two-step merger operation pairs barcode droplets with tagmented genomes at high throughput. A droplet of PCR mix is first generated and merged with a barcode droplet in the region shown in yellow using saltwater electrodes. Next, a droplet containing a microgel is introduced and merged a second time in the region shown in red. Oil inlets allow for precise control of the spacing between the reinjected droplets. The barcode reinjection chamber and its spacer oil are placed on the shorter 25 µm layer, shaded in blue. All other device features belong to the thicker layer, which has a total height of 45 µm.
Figure 6: Barcode group metrics for a 10-cell synthetic microbial community. (A) This panel shows the distribution of the barcode group sizes. The number of groups of a given size decreases exponentially as the group size increases. A minimum threshold of 7.5 kbp per group limits the analysis to groups with a sufficient amount of information and removes the PCR-mutated sequence "orphans." (B) This panel shows the distribution of barcode group purities. The vast majority (>90%) of groups are of very high purity (>95%). (C) This panel shows the relative abundance of 10 species calculated at the read and barcode group level. The 2 counting methods produce similar results, indicating that the barcode group sizes are consistent across species.
Figure 7: Aggregate genomic coverage of Bacillus subtilis barcode groups. The reads from all the barcode groups mapping to the bacterium B. subtilis (N = 9,398) are pooled and analyzed in aggregate. A circular coverage map illustrates the coverage uniformity of the SiC-seq reads, with no observable dropout regions. A dashed line around the circumference indicates the average coverage (5.55x). The inset histogram of the relative coverage frequencies shows that the bulk of the bases are covered at a depth near the genome-wide average, represented by the dashed line.
Device | 1st Layer Height (µm) | 1st Layer Spin Speed (rpm) | 2nd Layer Height (µm) | 2nd Layer Spin Speed (rpm) |
Co-flow dropmaker | 25 | 4000 | N/A | N/A |
Gel re-encapsulator | 25 | 4000 | N/A | N/A |
Double merger | 25 | 4000 | 20 | 5000 |
Table 1: Microfluidic device fabrication parameters. This table shows a listing of the microfluidic devices used in the SiC-seq workflow with their required speeds for photoresist spin coating (based on the manufacturer's specifications for SU-8 3025).
Label | Sequence (5' > 3') |
BAR | GCAGCTGGCGTAATAGCGAGTACAATCTGCTCTGATGCCGCATAGNNNNNNNNNNNNNNNTAAGCCAGCCCCGACACT |
DNA_BAR | CTGTCTCTTATACACATCTCCGAGCCCACGAGACGTGTCGGGGCTGGCTTA |
P7_BAR | CAAGCAGAAGACGGCATACGAGATCAGCTGGCGTAATAGCG |
P5_DNA | AATGATACGGCGACCACCGAGATCTACACTCGTCGGCAGCGTC |
I7_READ | GCCCACGAGACGTGTCGGGGCTGGCTTA |
Table 2: Primer sequences.
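The BAR primer carries 15 degenerate (N) positions. As a hedged back-of-the-envelope check on barcode uniqueness, the Python sketch below computes the size of the barcode space and the expected number of cell pairs that would share a barcode by chance in a run of 50,000 cells. It assumes that every position is independently and uniformly A, C, G, or T, which real oligonucleotide synthesis and PCR only approximate, so the result should be read as an idealized estimate.

```python
def barcode_space(n_degenerate: int) -> int:
    """Number of distinct sequences for n fully degenerate positions (4 bases each)."""
    return 4 ** n_degenerate


def expected_colliding_pairs(n_cells: int, n_barcodes: int) -> float:
    """Expected number of cell pairs sharing a barcode under uniform random assignment."""
    n_pairs = n_cells * (n_cells - 1) / 2
    return n_pairs / n_barcodes


space = barcode_space(15)
pairs = expected_colliding_pairs(50_000, space)

print(f"Possible 15-nt barcodes: {space:,}")
print(f"Expected colliding pairs among 50,000 cells: {pairs:.2f}")
```

Under the uniformity assumption, the barcode space exceeds 1 billion sequences, and only about 1 pair of cells out of 50,000 would be expected to share a barcode by chance.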
Supplemental File 1: Please click here to download this file.
Supplemental File 2: Please click here to download this file.
The SiC-seq microfluidic workflow produces single-cell genome sequencing data from tens of thousands of bacterial cells. Digital barcodes spliced onto the genomes of microgel-encapsulated cells allow for the in silico deconvolution of NGS data into groups of barcoded reads originating from the same cell. A control experiment with a microbial community of known composition is necessary for evaluating the purity of the barcode groups. A large fraction of low-purity groups indicates that the cell encapsulation rate is too high or that there is significant droplet cross-contamination occurring during the microfluidic processing steps. According to Poisson statistics, the barcodes and cells should be encapsulated at a target ratio of 1 particle for every 10 drops to limit the rate of multiple encapsulation events to less than 5% of all non-empty droplets. A higher encapsulation rate raises the fraction of doublets, so verifying the encapsulation ratio during the dropmaking process is critically important. Users should be particularly cautious of the encapsulation of multiple cells in a single microgel because reads from different cells sharing the same barcode sequence cannot be bioinformatically separated. In the case that 1 cell receives 2 different barcodes, the barcode group purity is unaffected, although the abundance metrics are skewed when counting by barcode sequence.
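The Poisson loading argument above can be verified numerically. The following Python sketch computes, for a mean loading of λ particles per drop, the fraction of non-empty droplets that contain more than one particle. The function names are illustrative, and the calculation assumes ideal Poisson loading with no cell clumping.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k particles in a droplet at mean loading lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def multiplet_fraction(lam: float) -> float:
    """Fraction of non-empty droplets that hold 2 or more particles."""
    p_empty = poisson_pmf(0, lam)
    p_single = poisson_pmf(1, lam)
    return (1.0 - p_empty - p_single) / (1.0 - p_empty)


for lam in (0.05, 0.1, 0.2, 0.5):
    occupied = 1.0 - poisson_pmf(0, lam)
    print(f"lambda = {lam:<4}: {occupied:.1%} of drops occupied, "
          f"{multiplet_fraction(lam):.1%} of those are multiplets")
```

At λ = 0.1, about 4.9% of non-empty droplets are multiplets, in agreement with the 5% limit stated above; doubling the loading to λ = 0.2 roughly doubles that fraction.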
Droplet cross-contamination may also arise due to suboptimal merger conditions. During a successful operation, the microfluidic merger device (Figure 5) can controllably pair 1 barcode droplet with 1 microgel and a volume of PCR reagent. Non-ideal flow rates will result in droplet pairing at incorrect ratios: 1 barcode could be paired with 2 microgels, for example. All flow rates listed in the protocol are intended to be estimates and may need to be adjusted depending on slight variations in the device geometry and droplet sizes. Users with access to cameras with high-speed recording capabilities (>10,000 frames/s) should verify correct droplet merging at the beginning and over the course of the microfluidic operation. Users without access to a high-speed camera can collect a small volume of the merged output and manually measure the droplet sizes under a microscope. The droplet size should be uniform: an excess of unmerged barcode or microgel drops indicates that the reinjection rates should be reduced accordingly.
Several general precautions should be taken when handling microgels and microdroplets to preserve their integrity. Microgels, though mechanically robust, must be sufficiently cooled prior to the breaking and washing steps to ensure complete gelation. Non-spherical microgels are an indication that the agarose was not given adequate time to solidify. When washing microgels, spin the suspensions down at the required speeds to avoid a loss of product. Agarose hydrogel has a refractive index closely matching that of water and may be difficult to see in a tube22, so users should carefully identify the gel-liquid boundary prior to aspiration. Water-in-oil droplets are susceptible to coalescence caused by the build-up of static charge23 on laboratory gloves and tubing. For this reason, we recommend loading the droplet reinjection syringes with bare hands and treating all the reinjection lines with an anti-static gun prior to priming the pumps. Large coalesced droplets can be removed by slowly rotating the emulsions in a syringe and manually aspirating the larger drops, which accumulate near the top due to their larger buoyant force.
SiC-seq is the first technology to demonstrate single-cell genome sequencing of >50,000 bacterial cells. This platform offers significant advantages in throughput over existing approaches and enables a deeper sampling of heterogeneous microbial communities. To date, microfluidic technologies for single-cell genome sequencing have employed microchambers9 and microwells24 for cell isolation and amplification, but with throughputs in the range of only tens to hundreds of cells. The flow sorting of single cells into well plates5,6 requires no specialized microfluidic instrumentation but has a similarly low throughput. Given that soil and water samples from the environment commonly have alpha diversities of >1,000 at the species level25,26, SiC-seq is highly advantageous by virtue of its ability to sample a far greater number of organisms. The SiC-seq workflow is adaptable to cell inputs from laboratory culture, the natural environment, or a living host. A cell sample need only be in an aqueous suspension and free of large particles (>10 µm) to be suitable for microfluidic encapsulation. For example, the method has been previously applied to a sample of seawater using a series of wash and filtering steps to pre-process the cells prior to encapsulation17.
The SiC-seq protocol generates a relatively sparse amount of sequencing data from each single cell and may not be suitable for all applications. Some bioinformatics algorithms such as de novo genome assembly or single nucleotide variant (SNV) calling require higher coverage depths to work effectively. Instead, barcode groups can be clustered in silico by taxonomic binning methods27 so that algorithms can be applied on larger sets of reads. The relatively low overall barcoding efficiency of the SiC-seq workflow may also present challenges in cases where the availability of the input sample is low. SiC-seq relies on a Poisson-distributed barcode encapsulation step; therefore, only approximately 10% of the cells receive a molecular barcode and are amplified during the final library preparation step. While this is comparable to other microdroplet-based barcoding schemes10, users working with precious cell samples may have difficulty achieving an adequate library yield for sequencing and may need to increase the number of PCR cycles in the final amplification step. Another potential solution for users with microfluidic expertise is to sort positive barcode droplets after the digital PCR step, thus bringing the overall barcoding efficiency to >85%28.
A potential future direction for SiC-seq technology is adapting the workflow for use with mammalian cells, paving the way for new clinical single-cell studies. As an example, an analysis of the copy number variation among single cancer cells may further our understanding of the role of heterogeneity in cancer pathology2. Alternatively, integrating SiC-seq with existing methods to probe and enrich DNA sequences of interest29 would enable the targeted single-cell sequencing of subpopulations or rare strains of cells. With environmental samples, genes from within a known metabolic pathway could be targeted and analyzed contextually alongside neighboring genes to identify novel genomic islands. From within a human host environment, low-titer pathogenic bacteria samples could be isolated and sequenced at the single-cell level to examine more closely their genotypic origins of virulence.
The authors have nothing to disclose.
This work was supported by the National Science Foundation through a CAREER Award (grant number DBI-1253293); the National Institutes of Health (NIH) (grant numbers HG007233-01, R01-EB019453-01, 1R21HG007233, DP2-AR068129-01, R01-HG008978); and the Defense Advanced Research Projects Agency Living Foundries Program (contract numbers HR0011-12-C-0065, N66001-12-C-4211, HR0011-12-C-0066).
3" silicon wafers, P type, virgin test grade | University Wafers | 447 | |
SU-8 3025 photoresist | Microchem | 17030192 | |
Spin coater | Specialty Coating Systems | G3P-8 | |
Photomasks | CadArt Services | (custom) | See Supplemental Files for mask designs |
PGMEA developer | Sigma-Aldrich | 484431 | |
Isopropanol | Sigma-Aldrich | 109827 | |
Sylgard 184 silicone elastomer kit | Krayden | 4019862 | |
Degassing chamber | Bel-Art | 42025 | |
0.75 mm biopsy punch | World Precision Instruments | 504529 | |
Glass microscope slides (75 mm x 50 mm) | Corning | 294775X50 | |
Aquapel (hydrophobic glass treatment) | Pittsburgh Glass Works | 47100 | |
PE-2 polyethylene tubing | Scientific Commodities | B31695-PE/2 | |
1 mL syringes | BD | 309628 | |
27 gauge needles | BD | 305109 | |
Syringe pump | New Era Pump Systems | NE-501 | |
Novec HFE-7500 fluorinated oil (HFE) | 3M | 98-0212-2928-5 | |
FC-40 fluorinated oil | Sigma-Aldrich | F9755 | |
PEG-PFPE surfactant | Ran Biotechnologies | 008-FluoroSurfactant | |
Space heater | Lasko | CD09250 | |
Agarose, low gelling temperature | Sigma-Aldrich | a9414 | |
TE (10X) | Rockland | mb-007 | |
PBS 1X, pH 7.4 | E&K Scientific Products | EK-65083 | |
OptiPrep (density gradient medium) | Sigma-Aldrich | d1556 | |
1H,1H,2H-Perfluoro-1-Octanol (PFO) | Sigma-Aldrich | 370533 | |
Span 80 (sorbitan monooleate) | Sigma-Aldrich | s6760 | |
Hexane | Sigma-Aldrich | 139386 | |
Tween 20 (polysorbate 20) | Sigma-Aldrich | p2287 | |
Lysozyme Type IV | MP Biomedicals | 195303 | |
Mutanolysin | Sigma-Aldrich | M9901 | |
Zymolyase (yeast lytic enzyme) | Zymo Research | e1004 | |
Lysostaphin | Sigma-Aldrich | L7386 | |
Sodium chloride | Sigma-Aldrich | S9888 | |
EDTA | Sigma-Aldrich | E6758 | |
Tris-HCl, pH 7.5, 1M | Invitrogen | 15567-027 | |
Dithiothreitol (DTT) | Teknova | d9750 | |
Lithium dodecyl sulfate | Sigma-Aldrich | L9781 | |
Proteinase K | New England Biolabs | P8107S | |
Ethanol, 200 Proof (100%) | Koptec | V1001 | |
SYBR Green I (nucleic acid stain) | Invitrogen | S7563 | |
PEG 6k | Sigma-Aldrich | 81260 | |
Triton X-100 (octylphenol ethoxylate) | Sigma-Aldrich | t8787 | |
Nextera DNA Library Prep Kit | Illumina | FC-121-1030 | |
Phusion Hot Start Flex Master Mix (High-Fidelity Hot Start Master Mix) | New England Biolabs | m05365 | |
Platinum Multiplex PCR Master Mix (Taq Master Mix) | Applied Biosystems | 4464263 | |
Warmstart 2.0 Bst Polymerase (isothermal polymerase) | New England Biolabs | m0538m | |
NT buffer from Nextera XT kit (neutralization buffer) | Illumina | FC-131-1024 | |
Cold cathode fluorescent inverter | (custom) | (custom) | |
DC power supply | Mastech | HY1503D | |
Zerostat 3 anti-static gun | Milty | 5036694022153 | |
3D-printed centrifuge syringe holder | (custom) | (custom) | See Supplemental Files for 3D print file |
Zymo DNA Clean & Concentrator-5 | Zymo Research | D4003 |