Described here is a protocol for tagging endogenously expressed proteins with fluorescent tags in human induced pluripotent stem cells using CRISPR/Cas9. Putatively edited cells are enriched by fluorescence activated cell sorting and clonal cell lines are generated.
A protocol is presented for generating human induced pluripotent stem cells (hiPSCs) that express endogenous proteins fused to in-frame N- or C-terminal fluorescent tags. The prokaryotic CRISPR/Cas9 system (clustered regularly interspaced short palindromic repeats/CRISPR-associated 9) may be used to introduce large exogenous sequences into genomic loci via homology directed repair (HDR). To achieve the desired knock-in, this protocol employs the ribonucleoprotein (RNP)-based approach where wild type Streptococcus pyogenes Cas9 protein, synthetic 2-part guide RNA (gRNA), and a donor template plasmid are delivered to the cells via electroporation. Putatively edited cells expressing the fluorescently tagged proteins are enriched by fluorescence activated cell sorting (FACS). Clonal lines are then generated and can be analyzed for precise editing outcomes. By introducing the fluorescent tag at the genomic locus of the gene of interest, the resulting subcellular localization and dynamics of the fusion protein can be studied under endogenous regulatory control, a key improvement over conventional overexpression systems. The use of hiPSCs as a model system for gene tagging provides the opportunity to study the tagged proteins in diploid, nontransformed cells. Since hiPSCs can be differentiated into multiple cell types, this approach provides the opportunity to create and study tagged proteins in a variety of isogenic cellular contexts.
The use of genome-editing strategies, especially CRISPR/Cas9, to study cellular processes is becoming increasingly accessible and valuable1,2,3,4,5,6,7. One of the many applications of CRISPR/Cas9 is the introduction (via homology directed repair (HDR)) of large exogenous sequences such as GFP into specific genomic loci that then serve as reporters for the activity of a gene or protein product8. This technique can be used to join a fluorescent protein sequence to an endogenous open reading frame where the resulting endogenously regulated fusion protein can be used to visualize the subcellular localization and dynamics of the protein of interest5,6,9,10,11. While endogenously tagged proteins offer many benefits compared to overexpression systems, inserting large sequences into the human genome is an inefficient process typically demanding a selection or enrichment strategy to obtain a population of cells that can be easily studied5,12.
This protocol describes the insertion of a DNA sequence encoding a fluorescent protein (FP) into a desired genomic locus. The protocol includes design and delivery of the donor template plasmid, and the ribonucleoprotein (RNP) complex (wild type S. pyogenes Cas9 protein combined with synthetic CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA)). Also described is the enrichment of putatively edited cells via fluorescence activated cell sorting (FACS) and the clonal cell line generation process. To date, this method has been used to generate hiPSC lines with either monoallelic or (rarely) bi-allelic green fluorescent protein (GFP) tags labeling twenty-five proteins representing major cellular structures. The resulting edited cells from these efforts have been confirmed to have the expected genetic insertion, express a correctly localizing fusion protein, and maintain pluripotency and a stable karyotype12 (and unpublished data). This method has also been used to generate multiple other single and dual (two different proteins tagged in the same cell) edited populations of hiPSCs (unpublished data).
Human iPSCs derived from a healthy donor were chosen for these genome-editing efforts because, unlike many conventional cell lines, they are diploid, karyotypically stable, non-transformed, and proliferative. These properties provide an attractive model for studying fundamental cell biology and disease modeling. Furthermore, the differentiation potential of hiPSCs provides the opportunity to study multiple developmental stages in parallel across various lineages and cell types using isogenic cells including organoids, tissues and "disease in a dish" models13,14,15. While this protocol was developed for hiPSCs (WTC line), it may be informative for the development of protocols using other mammalian cell lines.
1. In Silico Design of crRNA and Donor Template Plasmid for FP Knock-in
2. Ribonucleoprotein (RNP) Transfection for CRISPR/Cas9 Mediated Knock-in in hiPSCs
NOTE: In this protocol, the term 'gRNA' describes synthetic crRNA and tracrRNA properly re-suspended, quantified, and pre-complexed per manufacturer's instructions (see Table of Materials). Supplement all media with 1% Penicillin Streptomycin. General culturing guidelines of the WTC hiPSC line are described in more detail at the Allen Cell Explorer18,19. WTC hiPSCs are used in this protocol, but with proper transfection optimization, electroporation of RNP and donor template plasmid may be successfully adapted to other cell types.
3. FACS-Enrichment of Putatively Edited hiPSCs
NOTE: When sorting stem cells, adapt instrument settings to promote cell survival as suggested in the Discussion. Briefly, use the largest nozzle possible (130 µm), a low flow-rate (≤ 24 µL/min), preservative-free sheath fluid (such as saline, see Table of Materials), and low sample pressure (10 psi).
4. Generating Putatively Edited Clonal hiPSC Lines
5. Cryopreservation of Clonal Cell Lines in 96-well Plate Format
The goal of this experiment was to fuse mEGFP (monomeric enhanced GFP) to the nuclear lamin B1 protein by introducing the mEGFP sequence to the 5ʹ end of the LMNB1 gene (N-terminus of the protein). A linker (amino acid sequence SGLRSRAQAS) was chosen based on previous cDNA constructs from the Michael Davidson Fluorescent Protein Collection21. Because the crRNA binding region in the donor template plasmid for each candidate crRNA was disrupted after the in silico insertion of mEGFP and the linker sequence, no point mutations needed to be made to disrupt potential crRNA recognition and cleavage by Cas9 of the donor sequence (Figure 1). The donor sequence contained 1 kb homology arms flanking both ends of the mEGFP-linker sequence. The resulting 2,734 bp of DNA was cloned into a pUC57 backbone, sequence verified, and the resulting donor template plasmid was purified using an endotoxin-free maxi prep. The donor template plasmid and RNP complex were transfected, putatively edited cells were enriched, and the localization of the mEGFP-nuclear lamin B1 fusion protein was confirmed by fluorescence microscopy (Figure 2). Only results from the crRNA1 transfection are described here, although both crRNA sequences produced putatively edited populations12.
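The in silico check described above (confirming that inserting the tag and linker disrupts the crRNA binding region in the donor) can be sketched as follows. This is a minimal illustration, not the authors' design pipeline: the function names `build_donor` and `target_persists`, the toy homology arms, and the toy insert are all hypothetical, and the check assumes an exact 20-nt match followed by an NGG PAM. Cas9 can tolerate some mismatches, so a negative result here does not rule out residual recognition.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_donor(five_arm: str, insert: str, three_arm: str) -> str:
    """Concatenate 5' homology arm, tag+linker insert, and 3' homology arm."""
    return (five_arm + insert + three_arm).upper()


def target_persists(seq: str, protospacer: str) -> bool:
    """True if the exact protospacer followed by an NGG PAM occurs on either strand.

    Assumption: exact matching only; mismatch-tolerant binding is not modeled.
    """
    pattern = re.compile(re.escape(protospacer.upper()) + "[ACGT]GG")
    seq = seq.upper()
    return bool(pattern.search(seq) or pattern.search(revcomp(seq)))


# Toy sequences (hypothetical, far shorter than the 1 kb arms used in the protocol).
five_arm = "TTGACCTGAACATG"        # ends at the start codon (ATG)
three_arm = "GCTACTCCAATGGTTCA"    # begins at the second codon
insert = "GTGTCTAAAGGAGAAGAA"      # placeholder for tag + linker, not real mEGFP

genomic = five_arm + three_arm
protospacer = genomic[4:24]        # 20-nt target spanning the insertion site

in_frame = len(insert) % 3 == 0    # the insert must not shift the reading frame
donor = build_donor(five_arm, insert, three_arm)

site_in_genome = target_persists(genomic, protospacer)
site_in_donor = target_persists(donor, protospacer)
```

If `site_in_donor` were true, the donor would need point mutations in the PAM or seed region before use, as described in the Discussion.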
When compared to the negative control, which contained no gRNA, Cas9 protein, or donor template plasmid in the electroporation reaction (buffer only), the LMNB1 crRNA1 transfected cells contained 0.95% mEGFP-positive cells representing the putatively edited mEGFP-nuclear lamin B1 population (Figure 3a). This result was within the range of knock-in efficiencies observed across many genomic loci using this method, as previously reported12.
The mEGFP-positive cells were isolated by FACS and imaged by live microscopy to confirm expected localization of the mEGFP-nuclear lamin B1 fusion protein to the nuclear envelope. After FACS-enrichment, approximately 90% of sorted cells from the LMNB1 crRNA1 population were mEGFP-positive (as determined by microscopy), suggesting that some mEGFP-negative cells co-purified with the mEGFP-positive cells during the sorting procedure. This was an acceptable level of enrichment that allowed for picking of 96 clones that could then be genetically screened for successful editing. Generally, an enrichment is considered successful when at least 50% of the sorted population is GFP-positive.
The majority of the sorted cell population displayed fluorescence at the nuclear envelope (nuclear periphery) in nondividing cells and at an extended nuclear lamina within the cytoplasm during mitosis, providing confidence in the correct genomic edit at the LMNB1 locus. The enriched population contained cells with either bright or dim signal. This difference in signal intensity may reflect a combination of correct and incorrect editing outcomes and highlights the utility of generating a genetically validated clonal line for further studies (see Discussion) (Figure 3b)12. After clonal line generation, genetically validated cells showed uniform GFP intensity in microscopy experiments (Figure 3c).
Figure 1. Design strategy for N-terminus GFP tagging of LMNB1 gene. The GFP tag was designed for N-terminal insertion 5ʹ of the first exon of LMNB1 located on chromosome 5. Both 5ʹ and 3ʹ homology arms are 1 kb each and meet between the start codon (ATG) and the second codon (homology arms only partially represented in figure). Two candidate crRNAs were designed to guide Cas9 to cleave as close to the intended insertion site as possible, while still being unique in the genome. Sequence for mEGFP and an amino acid linker were inserted just 3ʹ of the start codon (mEGFP and linker sequence not to scale).
Figure 2. Workflow for producing endogenously tagged hiPSC clonal lines. Transfection components, including the Cas9/crRNA/tracrRNA RNP complex (shown as red Cas9 protein with gold crRNA and purple tracrRNA), the donor template plasmid containing the homology arms (HAs, shown in gold), and FP+linker sequence (shown in green), were electroporated. After 4 days, the FP-positive putatively edited cells were enriched by FACS and expanded as a population by seeding all of the sorted cells into a single well of a 96-well plate (~1,000 cells), and then expanded in culture until a working population of several million cells could be assayed as the "enriched population" (see Protocol step 3.8). The yield of FP-positive cells differs by experiment due to variable rates of HDR12; a successful enrichment may typically include ~300-5,000 cells after transfection of approximately 1.6 × 10^6 hiPSCs. Preliminary imaging studies confirmed the signal and localization of the fusion protein in the enriched population. Colonies were manually picked into a 96-well plate for expansion and cryopreservation. Further genomic quality control screening using droplet digital PCR (ddPCR) and other PCR-based assays was then used to identify properly edited clones, as previously described12.
Figure 3. Enrichment of putatively edited cell populations. (A) Flow cytometry plots of the LMNB1 edited cells four days post-transfection. The y-axis displays GFP intensity and the x-axis displays forward scatter. Sorting gates were set based on the buffer-only control. Since hiPSCs are sensitive to perturbation, live/dead stain was omitted and a very conservative FSC/SSC gate was used instead. (B) After enrichment, the population of LMNB1 crRNA1 edited cells showed ~90% of cells with GFP localizing to the nuclear envelope (expected LMNB1 localization). The population contained cells of varying GFP intensity as well as some GFP-negative cells. Scale bars are 10 microns. (C) After clonal line generation, cells showed a uniform GFP intensity, with some cell-cycle dependent differences. Scale bar is 20 microns.
The method presented here for generating endogenously regulated fluorescent protein fusions in hiPSCs is a versatile and powerful approach for generating gene edited cell lines with applications ranging from live cell imaging to various functional studies and "disease in a dish" models using patient-derived hiPSC lines13,14,15. While this method has been used to introduce large FP tags to the N- or C-terminus of endogenous proteins, it could potentially be used to introduce other tags or small genetic changes to model or correct disease-causing mutations22,23. For smaller inserts, the size of the homology arm may be reduced, but the general approach to editing presented in this method may still apply24,25. While the use of hiPSCs is strongly encouraged for their vast utility, with careful optimization, this protocol may be adapted to edit the genomes of other mammalian cell lines.
When identifying a gene of interest for FP tagging, transcript abundance estimates (from microarray or RNA-Seq data) are a good starting point for assessing whether a gene or isoform of interest is expressed, although transcript levels do not always correlate with protein levels. The FACS-enrichment strategy described here will work best for genes that are at least moderately well expressed in the cell type of interest. This strategy has also been successful in selecting for fusions that show punctate and/or discrete localization patterns such as centrin, desmoplakin, and paxillin where the signal to background ratio is very low12,19. Genes that are not highly expressed or are only expressed in derivative cell types may require additional selection strategies.
The starting point for crRNA and donor template plasmid designs used in human cell lines should be the human reference genome (GRCh38). Because the genomes of different cell lines can vary within the same species, and because CRISPR/Cas9 is sequence-specific, it is extremely helpful to identify cell line-specific variants (single nucleotide polymorphisms or insertions/deletions (indels)) that differ from the reference genome and incorporate these into the design. This ensures that crRNAs will be compatible with the host genome and that the donor template plasmid homology arms will retain any cell-line specific variants. A suggested strategy is to incorporate homozygous variants into crRNAs and donor template plasmid homology arms during the design process. Incorporating heterozygous variants is optional. The specific reagents used for large knock-in experiments and other key considerations for this protocol are discussed below.
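As a rough illustration of the suggested strategy, the sketch below substitutes homozygous cell line variants into a reference window before crRNAs and homology arms are designed, and skips heterozygous ones. The `Variant` record, its 0-based coordinate convention, and the function name are assumptions made for this example; real variant calls would typically come from a VCF file and would need coordinate conversion.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int           # 0-based offset within the reference window (assumed convention)
    ref: str           # allele in the reference genome
    alt: str           # allele observed in the cell line
    homozygous: bool   # True if both alleles of the cell line carry alt


def apply_homozygous_variants(reference: str, variants: list[Variant]) -> str:
    """Return the reference window with homozygous variants substituted in.

    Heterozygous variants are skipped, since incorporating them is optional.
    Overlapping variants are not handled in this sketch.
    """
    seq = reference.upper()
    # Work right to left so indels do not shift the coordinates of upstream variants.
    for v in sorted(variants, key=lambda v: v.pos, reverse=True):
        if not v.homozygous:
            continue
        if seq[v.pos:v.pos + len(v.ref)] != v.ref.upper():
            raise ValueError(f"reference allele mismatch at offset {v.pos}")
        seq = seq[:v.pos] + v.alt.upper() + seq[v.pos + len(v.ref):]
    return seq


# Toy example: one homozygous SNP, one heterozygous SNP, one homozygous 1 bp deletion.
window = "ACGTACGTAC"
variants = [
    Variant(2, "G", "A", True),
    Variant(5, "C", "T", False),
    Variant(7, "TA", "T", True),
]
designed = apply_homozygous_variants(window, variants)
print(designed)
```

The reference allele check is included because a mismatch usually signals a coordinate or strand error, which is easier to catch here than after a donor plasmid has been synthesized.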
Cas9 Protein
The primary benefit of using Cas9 protein is that introducing the Cas9 and gRNA as an RNP complex has been shown to result in a limited duration of nuclease activity compared to plasmid-based approaches where expression of the Cas9 and gRNA may continue for days and lead to greater on- and off-target activity26,27. An additional benefit of using Cas9 protein is that it is readily available to cleave once inside the cells. This contrasts with more conventional methods of using Cas9 mRNA or Cas9/gRNA plasmid that require transcription, translation and protein processing26,28. Wild type S. pyogenes Cas9 protein is now available from many commercial sources.
Guide RNA
There are many publicly available tools for finding crRNA targets near the desired FP insertion site that have zero or few predicted off-targets in the host genome29,30,31,32. Efficiencies in HDR and the precision of the HDR outcome vary widely between crRNA targets used at a given locus12. For this reason, testing several crRNAs (2-4 and preferably within 50 bp of the desired insertion site) per locus is recommended as this may increase the probability of a successful editing experiment. Current possibilities for delivering gRNA include synthetic 2-part crRNA and tracrRNA, synthetic single gRNAs (sgRNAs), in vitro transcribed sgRNAs, or delivering a plasmid to cells expressing the sgRNA from a U6 promoter. This protocol was not optimized for high cleavage activity. Unmodified 2-part crRNA and tracrRNA (see Table of Materials) were used with the goal of generating mono-allelic FP-tagged cell lines while causing the least potential perturbation to the cells.
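To make the "within 50 bp of the desired insertion site" guideline concrete, the following sketch lists candidate 20-nt targets with an NGG PAM on either strand and ranks them by distance between the predicted cut site and the insertion site. It is only an illustration under stated assumptions: the function name and the 0-based coordinate convention are hypothetical, the cut is assumed to fall 3 bp upstream of the PAM, and no off-target prediction is performed, so the publicly available design tools cited above are still needed to assess uniqueness in the host genome.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidates(seq: str, insert_pos: int, max_dist: int = 50) -> list[dict]:
    """List 20-nt targets with an NGG PAM whose cut site is within max_dist of insert_pos.

    Coordinates are 0-based on the + strand. A cut position is the index of the
    first base to the right of the predicted break (assumed 3 bp 5' of the PAM).
    """
    seq = seq.upper()
    hits = []
    # + strand: 20-nt protospacer immediately followed by NGG.
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", seq):
        i = m.start()
        cut = i + 17
        if abs(cut - insert_pos) <= max_dist:
            hits.append({"strand": "+", "protospacer": m.group(1),
                         "pam": seq[i + 20:i + 23], "cut": cut})
    # - strand: appears on the + strand as CCN followed by 20 nt.
    for m in re.finditer(r"(?=CC[ACGT]([ACGT]{20}))", seq):
        j = m.start()
        cut = j + 6
        if abs(cut - insert_pos) <= max_dist:
            hits.append({"strand": "-", "protospacer": revcomp(m.group(1)),
                         "pam": revcomp(seq[j:j + 3]), "cut": cut})
    # Closest cut sites first.
    return sorted(hits, key=lambda h: abs(h["cut"] - insert_pos))


# Toy locus with a single + strand target; its reverse complement has a - strand target.
toy = "TTTT" + "ACGT" * 5 + "AGG" + "TTTTTTTT"
plus_hits = find_candidates(toy, insert_pos=20)
minus_hits = find_candidates(revcomp(toy), insert_pos=14)
```

Ranking by cut-to-insertion distance only narrows the list; testing several of the top candidates per locus remains advisable because HDR efficiency and precision vary between crRNA targets.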
Donor Template Plasmid
Because some of the homology arm sequence provided in the donor template plasmid will be incorporated into the host genome during the HDR event, point mutations to the crRNA recognition sites should be introduced to prevent further cleavage by Cas9 following HDR. Often the simplest disruptive change is to mutate the PAM sequence. Because some non-canonical PAM sequences can still be recognized by wild type S. pyogenes Cas9, it is best to avoid mutating the PAM to a sequence of the form NGG, NAG, or NGA33. When mutating the homology arm, avoid non-synonymous mutations and the introduction of rare codons. If a synonymous change to the PAM sequence is not possible, consider making three synonymous point mutations in the seed region (10 bp proximal to PAM) of the crRNA binding site. Extreme care should be taken when making these changes in the 5ʹ untranslated region (UTR) since these regions may contain important regulatory sequences. Consulting a genetic conservation database such as the UCSC Genome Browser's Comparative Genomics tracks can provide guidance in these cases, as changes to non-conserved bases may be better tolerated than changes to highly conserved bases17. Sometimes the mere insertion of the FP sequence is enough to disrupt the crRNA binding site (as in Figure 1); however, the newly appended sequence should be checked for the persistence of crRNA binding and PAM sequences.
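The two constraints on a PAM mutation discussed above (the new PAM should not read NGG, NAG, or NGA, and the change should be synonymous) can be checked with a short script. The sketch below is illustrative only: the function names are hypothetical, the input is assumed to be an in-frame coding sequence with the PAM on the coding strand, and rare codon usage is not evaluated.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; trailing partial codons are ignored."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))


def pam_still_recognized(pam: str) -> bool:
    """True if a 3-nt PAM has the form NGG, NAG, or NGA."""
    return pam.upper()[1:] in {"GG", "AG", "GA"}


def check_pam_mutation(cds: str, pam_start: int, new_pam: str) -> dict:
    """Evaluate replacing the 3-nt PAM at cds[pam_start:pam_start + 3] with new_pam.

    Assumptions: cds is in frame from index 0 and the PAM lies on the coding strand.
    """
    if len(new_pam) != 3:
        raise ValueError("new_pam must be 3 nt long")
    cds = cds.upper()
    mutated = cds[:pam_start] + new_pam.upper() + cds[pam_start + 3:]
    return {
        "mutated": mutated,
        "pam_disrupted": not pam_still_recognized(new_pam),
        "synonymous": translate(cds) == translate(mutated),
    }


# Toy coding sequence ATG CTG GAA (Met-Leu-Glu) with the PAM "TGG" at offset 4.
cds = "ATGCTGGAA"
good = check_pam_mutation(cds, 4, "TCG")   # CTG -> CTC, still Leu; PAM becomes TCG
bad = check_pam_mutation(cds, 4, "TAG")    # CTG -> CTA, still Leu; PAM becomes NAG
```

In this toy case both changes are synonymous, but only the first removes a recognizable PAM, which is why the two properties are reported separately.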
Amino acid linkers between the FP and the native protein are recommended to conserve the function of the fusion protein34. Often an amino acid linker may be chosen for its particular charge or size. If a cDNA fusion with a design similar to the targeted endogenous fusion protein has been well studied, that same linker sequence can be used for the CRISPR/Cas9 knock-in experiment12,19. If such information is unavailable, a short linker such as GTSGGS has also been used successfully12. Other studies have demonstrated success with a generic small 3-amino acid linker sequence for a variety of targets35.
Transfection and FACS Enrichment
Many commercially available transfection reagents are formulated for delivery of certain types of molecules to cells, whereas an electroporation system can be used to deliver reagents with a wide range of size, charge, and composition. In addition to being a common transfection method for hard-to-transfect cells like hiPSCs, electroporation also carries the benefit of delivering all three components required for CRISPR/Cas9-mediated FP knock-in (Cas9 protein, gRNA, and donor template plasmid), as described in this method. Electroporation was found to produce the best results when compared to other commercially available reagents when developing this method (data not shown), and has also been used by others for RNP delivery26,28,36.
When using this protocol for editing hiPSCs, special care should be taken to ensure gentle handling of the cells before and after the gene editing process for optimal cell survival and minimal spontaneous differentiation. In particular, the FACS enrichment methods should be adapted for sorting stem cells by using the largest nozzle possible (130 µm), a low flow-rate (≤24 µL/min), preservative-free sheath fluid (such as saline, see Table of Materials), and low sample pressure (10 psi). Instead of single cell sorting, which results in suboptimal viability in stem cells, the FACS-enriched hiPSCs are sorted in bulk and expanded as a population to optimize cell viability and stem cell integrity. However, single cell sorting may be appropriate for less sensitive cell types. To promote cell survival, cells are returned to culture no longer than one hour after harvesting for the FACS enrichment and kept at room temperature throughout the sorting process. For some cell types, cell survival may also be enhanced by incubating cells on ice (4°C) throughout the sorting process.
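The sorting parameters recommended above can be captured as a simple pre-sort checklist. The sketch below is a hypothetical helper, not part of any instrument software: the `SortSettings` fields and the function name are invented for illustration, and treating the values from the text (130 µm, 24 µL/min, 10 psi, one hour) as hard limits is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SortSettings:
    nozzle_um: int                  # nozzle diameter in µm
    flow_rate_ul_per_min: float     # sample flow rate in µL/min
    sample_pressure_psi: float      # sample pressure in psi
    sheath_preservative_free: bool  # True for preservative-free sheath fluid such as saline
    minutes_out_of_culture: float   # time from harvest until cells are back in culture


def check_sort_settings(s: SortSettings) -> list[str]:
    """Return warnings for settings that deviate from the values given in the text."""
    warnings = []
    if s.nozzle_um < 130:
        warnings.append("nozzle is smaller than 130 µm")
    if s.flow_rate_ul_per_min > 24:
        warnings.append("flow rate exceeds 24 µL/min")
    if s.sample_pressure_psi > 10:
        warnings.append("sample pressure exceeds 10 psi")
    if not s.sheath_preservative_free:
        warnings.append("sheath fluid contains preservative")
    if s.minutes_out_of_culture > 60:
        warnings.append("cells are out of culture for more than one hour")
    return warnings


gentle = SortSettings(130, 20, 10, True, 45)
harsh = SortSettings(100, 40, 20, False, 90)
gentle_warnings = check_sort_settings(gentle)
harsh_warnings = check_sort_settings(harsh)
```

For less sensitive cell types, the limits could reasonably be relaxed, for example to allow single cell sorting conditions.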
The bulk expansion of FP-positive cells provides an opportunity to evaluate the population by imaging analysis for fusion protein localization prior to generating clonal lines. While the resulting enriched population of cells may be sufficient for some studies, these populations frequently display FP signal of varying intensity. The isolated clonal lines have uniform signal (Figure 3), making them more appropriate for functional experiments12.
Clonal Cell Line Generation
Throughout the editing and clonal line generation process, it is important to monitor cell morphology. hiPSC colonies grown in feeder-free conditions should exhibit smooth edges and an even, well-packed center12,18,19. Differentiated cells should be observed in less than 5% of the culture. When picking individual colonies, choose those that exhibit good morphology. During the 96-well plate passaging events, check clones for morphology and discontinue those that have overgrown as this may lead to differentiation or be an indication of genetic instability.
Generation of clonal cell lines allows for genetic confirmation of precise editing, which is important because Cas9-induced double strand breaks in the genome are often repaired imprecisely despite incorporation of the tag at the desired locus. Previously described PCR-based assays showed that cumulatively across ten unique genomic loci many (45%) of the FP-expressing clones suffered from donor plasmid backbone integration at the targeted locus or (rarely) randomly in the genome12. Additionally, 23% of GFP-positive clones (n=177) across ten unique loci were found to harbor mutations at or near the anticipated crRNA cutting site in the untagged allele, most likely due to NHEJ12. This genetic analysis of many clonal lines (~100 clones/edit) underscored the importance of genetic validation that is not possible in a cell population since FP-expression and expected fusion protein localization alone do not guarantee precise editing12. Additionally, these PCR-based assays cannot be performed on an enriched population of cells with any certainty, warranting the need for clonal line generation before meaningful analysis can be completed. Genetic confirmation of the inserted FP tag and verification of the genetic integrity of the unedited allele (in a mono-allelic edited clone) are both necessary to ensure precise editing at the targeted locus beyond tag expression.
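One way to organize the genetic screening results described above is to sort clones into categories based on tag copy number, backbone copy number, and the sequence of the untagged allele. The sketch below is a hypothetical decision scheme, not the published assay: the `CloneAssay` fields, the category labels, and especially the copy number tolerance of 0.2 are assumptions chosen for illustration and would need to be calibrated against real assay data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloneAssay:
    clone_id: str
    tag_copies: float                    # FP tag copies per genome (e.g. from ddPCR)
    backbone_copies: float               # donor plasmid backbone copies per genome
    untagged_allele_wt: Optional[bool]   # None if the untagged allele was not sequenced


def classify(c: CloneAssay, tol: float = 0.2) -> str:
    """Assign a screening category; tol is a hypothetical copy number tolerance."""
    if c.backbone_copies > tol:
        return "reject: backbone integration"
    if abs(c.tag_copies - 2) <= tol:
        return "candidate: bi-allelic"
    if abs(c.tag_copies - 1) <= tol:
        if c.untagged_allele_wt is True:
            return "candidate: mono-allelic"
        if c.untagged_allele_wt is False:
            return "reject: untagged allele mutated"
        return "pending: sequence untagged allele"
    return "reject: unexpected tag copy number"


# Hypothetical example clones.
clones = [
    CloneAssay("c01", 1.02, 0.00, True),
    CloneAssay("c02", 0.98, 1.05, True),
    CloneAssay("c03", 1.00, 0.03, False),
    CloneAssay("c04", 2.05, 0.00, None),
    CloneAssay("c05", 1.00, 0.00, None),
]
results = {c.clone_id: classify(c) for c in clones}
```

Backbone integration is tested first because such clones are excluded regardless of tag copy number, and a mono-allelic clone stays pending until the untagged allele has been checked for mutations.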
A low rate of bi-allelic edits and lack of off-target mutations (as assayed by Sanger and exome sequencing) have been observed to date using this method (unpublished data)12. This is consistent with previous studies describing the use of short-lasting RNP for CRISPR/Cas9 experiments26,27. The lack of clonal cell lines with bi-allelic edits may also be locus specific or due to the inability of the cell to tolerate two tagged copies of an essential protein as suggested from previously published experiments where putative bi-allelic edited cells were observed for one locus (LMNB1), but not another (TUBA1B)12. Bi-allelic fully validated clonal cell lines have been generated using this method to tag ST6 beta-galactoside alpha-2,6-sialyltransferase 1 (ST6GAL1), and RAB5A member RAS oncogene family (RAB5A) with mEGFP19.
Beyond confirming precision of the edit in the genome, there are a variety of quality control assays that can be used to further characterize the clonal line and identify clones that fulfill all stem cell, genomic, and cell biological criteria for use in future studies. Cell biological and functional assays can be used to confirm appropriate expression, localization, and function of the fusion protein12. The comparison to unedited parental controls will help evaluate the influence of the editing process on localization, dynamics, and function. Other assays such as growth analysis and tests for genomic stability can also help determine if the tagged protein is perturbing to the cell. When using hiPSC in this protocol, evaluation of pluripotency markers and differentiation potential can be critical in determining a clone that is valuable for downstream studies12. Because extended culture of hiPSC has been shown to lead to genetic instability, monitoring the growth rate and karyotype of clonal cell lines is also important12,37. However, the final intended use of the edited cells will ultimately determine the level and breadth of quality control analysis and will vary based on the application.
The authors have nothing to disclose.
We thank Daphne Dambournet for many insightful discussions and advice on gene editing, Thao Do for illustration, Angelique Nelson for critical reading of the manuscript, and Andrew Tucker for generating the mEGFP-tagged Lamin B1 cell line. We wish to acknowledge the Stem Cells and Gene Editing and Assay Development teams at the Allen Institute for Cell Science for their contributions to the gene editing and quality control process. The WTC line that we used to create our gene-edited cell line was provided by the Bruce R. Conklin Laboratory at the Gladstone Institutes and UCSF. We thank the Allen Institute for Cell Science founder, Paul G. Allen, for his vision, encouragement, and support.
Geneious R9 | Biomatters, or similar | | bioinformatics software for in silico donor plasmid design |
TE Buffer pH 8.0 | IDT, or similar | 11-01-02-05 | |
HERAcell VIOS 160i CO2 incubator, or similar | ThermoFisher Scientific, or similar | 51030408 | |
Pipettes (1000 µL, 200 µL, 20 µL, 10 µL, 2 µL) | Rainin, or similar | ||
Pipette tips (1000 µL, 200 µL, 20 µL) | Rainin, or similar | ||
Multi-channel Pipette (200 µL) | Rainin, or similar | ||
Serological pipettes (25 mL, 10 mL, 5 mL) | Costar, or similar | ||
BRAND 8-Channel Manifold for Quiksip, Autoclavable | Millipore Sigma, or similar | BR704526-1EA | use with non-filtered pipet tips, such as Molecular BioProducts Low Retention Pipet Tips, Pure 10, below |
Molecular BioProducts Low Retention Pipet Tips, Pure 10 | Thermo Fisher, or similar | 3501-05 | |
XP2 Pipette Controller | Drummond, or similar | 4-000-501 | |
Disposable Pasteur Pipets | VWR, or similar | 53300-567 | |
Class II, Type A2 Biological Safety Cabinet | CELLGARD, or similar | NU-481 | |
Matrigel Matrix, Growth Factor Reduced | Corning | 354230 | lot tested before use with hiPSC |
DMEM/F12 (-phenol red) | Gibco | 11039-021 | cold, for diluting Matrigel 1:30 |
mTeSR1 Complete Media | StemCell Technologies | 85850 | recommended growth media for WTC hiPSC line |
Penicillin-streptomycin | Gibco | 15070-063 | |
WTC hiPSC line | Coriell | GM25256 | the hiPSC line used in this protocol is available through Coriell |
Tissue culture dish 100 mm | Falcon | 353003 | |
6-well Cell Culture Plate | CELLSTAR | 657160 | |
StemPro Accutase | Gibco | A11105-01 | |
Dulbecco's Phosphate Buffered Saline (DPBS) no calcium, no magnesium | Gibco | 14190144 | |
15 mL polystyrene conical | Sarstedt | 62.554.100 | |
Y-27632 (ROCK Inhibitor) | StemCell Technologies | 72308 | |
Edit-R CRISPR-Cas9 Synthetic crRNA, unmodified (custom sequence) | Dharmacon | Custom0247 | |
Edit-R CRISPR-Cas9 Synthetic tracrRNA | Dharmacon | U-002005-05 | |
Recombinant wild-type Streptococcus pyogenes Cas9-NLS purified protein, 40 µM | University of California-Berkeley QB3 Macrolab | ||
Custom donor plasmid (PriorityGENE) | Genewiz | | donor insert design was synthesized and cloned into pUC57 backbone by Genewiz |
DNA LoBind Tube 1.5 mL | Eppendorf | 22431021 | |
NucleoBond Xtra Maxi EF | Clontech | 740424.50 | |
Neon Transfection System | ThermoFisher Scientific | MPK5000 | |
Neon Transfection System, 100 µL kit | ThermoFisher Scientific | MPK10096 | |
5 mL Polystyrene Round-bottom Tube with Cell Strainer Cap | Falcon | 352235 | |
15 mL High Clarity Polypropylene Conical Tube | Falcon | 352196 | |
FACSAriaIII Fusion | BD Biosciences | 656700 | |
FACSDiva software | BD Biosciences | ||
FlowJo version 10.2 | TreeStar | ||
NERL Blood Bank Saline | ThermoFisher Scientific | 8504 | used as preservative-free FACS Buffer |
Olympus SZX7 Stereo Microscope, or similar | Olympus, or similar | ||
Tissue culture plate, 96-well | Falcon | 353072 | |
96 well Cell Culture Plate, V-Bottom | CELLSTAR | 351180 | |
CryoStor CS10 | Sigma | C2874-100ML | used as cryopreservation buffer for cells in 96-well plate format |
Parafilm | Bemis | PM-996 | |
24 well Cell Culture Plate | CELLSTAR | 662160 |