This manuscript describes a genome-scale cell-based screening approach to identify extracellular receptor-ligand interactions.
Intercellular communication mediated by direct interactions between membrane-embedded cell surface receptors is crucial for the normal development and functioning of multicellular organisms. Detecting these interactions remains technically challenging, however. This manuscript describes a systematic genome-scale CRISPR/Cas9 knockout genetic screening approach that reveals cellular pathways required for specific cell surface recognition events. This assay utilizes recombinant proteins produced in a mammalian protein expression system as avid binding probes to identify interaction partners in a cell-based genetic screen. This method can be used to identify the genes necessary for cell surface interactions detected by recombinant binding probes corresponding to the ectodomains of membrane-embedded receptors. Importantly, given the genome-scale nature of this approach, it also has the advantage of not only identifying the direct receptor but also the cellular components that are required for the presentation of the receptor at the cell surface, thereby providing valuable insights into the biology of the receptor.
Extracellular interactions by cell surface receptor proteins direct important biological processes such as tissue organization, host-pathogen recognition, and immune regulation. Investigating these interactions is of interest to the wider biomedical community, because membrane receptors are actionable targets of systemically delivered therapeutics such as monoclonal antibodies. Despite their importance, studying these interactions remains technically challenging. This is mainly because membrane-embedded receptors are amphipathic, making them difficult to isolate from biological membranes for biochemical manipulation, and their interactions are typified by weak interaction affinities (KDs in the µM-mM range)1. Consequently, many commonly used methods are unsuitable to detect this class of protein interactions1,2.
A range of methods has been developed to specifically investigate extracellular receptor-ligand interactions that take their unique biochemical properties into consideration3. A number of these approaches involve expressing the entire ectodomain of a receptor as a soluble recombinant protein in mammalian or insect cell-based systems to ensure that these proteins contain posttranslational modifications that are structurally important, such as glycans and disulfide bonds. To overcome the low-affinity binding, the ectodomains are often oligomerized to increase their binding avidity. Avid protein ectodomains have been successfully used as binding probes to identify interaction partners in direct recombinant protein-protein interaction screens4,5,6,7. While broadly successful, recombinant protein-based methods require that the ectodomain of a membrane receptor be produced as a soluble protein. These methods are therefore generally applicable only to proteins that contain a contiguous extracellular region (e.g., single-pass type I, type II, or GPI-anchored) and are not generally suitable for receptor complexes and membrane proteins that span the membrane multiple times.
Expression cloning techniques in which a library of complementary DNAs (cDNAs) is transfected into cells and tested for a gain-of-binding phenotype have also been used to identify extracellular protein-protein interactions8. The availability of large collections of cloned and sequenced cDNA expression plasmids in recent years has facilitated methods in which cell lines overexpressing cDNAs encoding cell surface receptors are screened for the binding of recombinant proteins to identify interactions9,10. The cDNA overexpression-based approaches, unlike recombinant protein-based methods, afford the possibility to identify interactions in the context of the plasma membrane. However, the success of using cDNA expression constructs depends on the cells' ability to overexpress the protein in the correctly folded form, which often requires cellular accessory factors such as transporters and chaperones, as well as correct oligomeric assembly. Transfecting a single cDNA might therefore not be enough to achieve cell surface expression.
Screening techniques using cDNA constructs or recombinant protein probes are resource-intensive and require large collections of cDNA or recombinant protein libraries. Specifically designed mass spectrometry-based methods that do not require the assembly of large libraries have been utilized recently to identify extracellular interactions. However, these techniques require chemical manipulation of the cell surface, which can alter the biochemical nature of the molecules present on the surface of the cells, and they are currently only applicable to interactions mediated by glycosylated proteins11,12. The majority of the currently available methods also focus heavily on the interactions between proteins while largely ignoring the contribution from the membrane microenvironment, including molecules such as glycans, lipids, and cholesterol.
The recent development of highly efficient biallelic targeting using CRISPR-based approaches has enabled the generation of genome-scale libraries of cells lacking defined genes in a single pool. These libraries can be screened in a systematic and unbiased way to identify cellular components involved in different contexts, including dissecting cellular signaling processes, identifying perturbations that confer resistance to drugs, toxins, and pathogens, and determining the specificity of antibodies13,14,15,16. Here, we describe a genome-scale CRISPR-based knockout cell screening assay that provides an alternative to the current biochemical approaches to identify extracellular receptor-ligand interactions. This approach of identifying interactions mediated by membrane receptors by genetic screens is particularly suitable for researchers who have a focused interest in individual ligands because it avoids the need to compile large libraries of cDNAs or recombinant proteins.
This assay consists of three major steps: 1) Highly avid recombinant protein binding probes consisting of the extracellular regions of a receptor of interest are produced and used in fluorescence-based binding assays analyzed by flow cytometry; 2) The binding assays are used to identify a cell line that expresses the interaction partner of the recombinant protein probe; 3) A Cas9-expressing version of the cell line that interacts with the protein of interest is produced and a genome-scale CRISPR/Cas9-based knockout screen is performed (Figure 1). In this genetic screen, binding of a recombinant protein to cell lines is used as a measurable phenotype: cells within the knockout library that have lost the ability to bind the probe are sorted using fluorescence-activated cell sorting (FACS), and the genes that caused the loss of the binding phenotype are identified by sequencing. In principle, the genes encoding the receptor responsible for binding the avid probe and those required for its cell surface display are identified.
The first step of this protocol involves the production of avid recombinant protein probes representing the ectodomain of the membrane-bound receptors. These receptors are known to frequently retain their extracellular binding functions when their ectodomains are expressed as a recombinant soluble protein1. For a protein of interest, soluble recombinant proteins can be produced in any suitable eukaryotic protein expression system and in any format, provided that the protein can be oligomerized for increased binding avidity and contains tags that can be used in fluorescence-based binding assays analyzed by flow cytometry (e.g., FLAG-tag, biotin-tag). Detailed protocols for the production of soluble ectodomains of membrane receptors using the HEK293 protein expression system, as well as different multimerization techniques and the protein expression constructs for the production of both pentameric proteins and monomeric proteins, have been previously described1,17. The protocol here describes the steps for generating fluorescent avid probes from monomeric biotinylated proteins by conjugating them to streptavidin conjugated to a fluorochrome (e.g., phycoerythrin, or PE). These probes can be used directly in cell-based binding assays and have the advantage of not requiring a secondary antibody for detection. General protocols for performing genome-scale screens have already been described20,21; thus, the protocol mainly focuses on the specifics of performing flow cytometry-based recombinant protein binding screens with the CRISPR/Cas9 knockout screening system and the Human V1 ("Yusa") library18.
1. Production and purification of biotinylated His-tagged proteins
2. Quantification and oligomerization of monomeric biotinylated protein
NOTE: To increase the binding avidity, oligomerize biotinylated monomeric proteins on tetrameric streptavidin-PE before using them in binding assays. Achieve optimal conjugation ratios of monomeric proteins and tetrameric streptavidin-PE by testing a dilution series of biotinylated monomers against a fixed concentration of streptavidin and by empirically establishing the minimum dilution at which no excess biotinylated monomers can be detected.
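The titration readout described in the note above can be expressed as a small calculation. The following is a minimal sketch, not part of the published protocol: the function name, the absorbance readings, and the background cutoff are all hypothetical, and it assumes the ELISA signal reflects free biotinylated monomer remaining after preincubation with a fixed amount of streptavidin-PE.

```python
from typing import Optional, Sequence, Tuple


def minimum_saturating_dilution(
    readings: Sequence[Tuple[float, float]],
    background: float,
) -> Optional[float]:
    """Return the smallest dilution factor with no detectable excess monomer.

    readings: (dilution_factor, absorbance) pairs measured after preincubating
        each dilution with a fixed amount of streptavidin-PE.
    background: absorbance at or below which no free monomer is considered
        detectable (hypothetical cutoff, e.g., taken from a no-protein control).
    """
    best = None
    # Walk from the most dilute sample towards the most concentrated one and
    # stop at the first dilution where excess biotinylated monomer appears.
    for dilution, absorbance in sorted(readings, reverse=True):
        if absorbance > background:
            break
        best = dilution
    return best


# Hypothetical readings for 1:2 to 1:64 dilutions of a biotinylated monomer.
example_readings = [(2, 1.20), (4, 0.85), (8, 0.31), (16, 0.06), (32, 0.05), (64, 0.05)]
chosen_dilution = minimum_saturating_dilution(example_readings, background=0.10)
print(chosen_dilution)
```

With these made-up readings, the 1:16 dilution is the most concentrated sample that leaves no detectable free monomer. The function returns None if every dilution still shows excess monomer, in which case the series would need to be extended.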
3. Flow cytometry-based cell binding assays
4. Determining binding contributions from heat labile epitopes and heparan sulphate sidechains
NOTE: The activity of many proteins is heat labile, so loss of binding activity following heat treatment is encouraging. It is advised to determine the contribution from negatively charged glycosaminoglycans, mainly heparan sulphate (HS), in mediating binding of the recombinant proteins. This is because the binding by HS in the cell binding assay described here can be additive rather than codependent on other receptors19. This means that the observed binding could be entirely mediated by HS side chains of cell surface proteoglycans and not by a specific receptor. Binding to HS on the cell surface is not necessarily nonspecific, but rather a property of a protein, which is useful to know before performing a full genetic screen.
5. Selection of cell lines stably expressing Cas9
NOTE: Before the cell line that binds the probe of interest can be used in CRISPR screening, it must first be engineered to express the Cas9 nuclease and a highly active clone selected19.
6. Selecting high Cas9-activity clones
NOTE: Polyclonal Cas9 can be used to successfully perform genetic screens; however, selecting a clone with high Cas9 activity improves the screening results18.
7. Generation of genome-wide CRISPR-Cas9 screening knockout library
8. Genetic screening for cell surface binding
9. Genomic DNA extraction and first PCR for gRNA enrichment
10. Second round of PCR for index barcoding and sequencing
11. Bioinformatics analysis to identify the receptor and related pathways
Sequencing data from two representative genome-scale knockout screens for the identification of the binding partners of human TNFRSF9 and P. falciparum RH5, performed in NCI-SNU-1 and HEK293 cells, respectively, are provided (Supplementary Table 1). The binding behavior of RH5 was affected by both heparan sulphate and its known receptor BSG24 (Figure 3C), whereas TNFRSF9 specifically bound to its known receptor TNFSF9 and did not lose binding upon preincubation with soluble heparin. Protein 3 in Figure 3B represents TNFRSF9.
For both cell lines, the distributions of gRNAs in the control mutant library at three time points (9, 14, and 16 days posttransduction) are also provided (Supplementary Table 1). The gRNA distribution revealed that the library complexity was maintained throughout the course of the experiment (Figure 5A). The genetic screen for the identification of the ligand for TNFRSF9 was performed on day 14 posttransduction, whereas that for RH5 was performed on day 9 posttransduction. The technical quality of the screens was assessed by comparing the distribution of observed fold-changes of gRNAs targeting a reference set of nonessential genes to the distribution for a reference set of essential genes22 (Figure 5B). In addition, pathway-level enrichment revealed that expected essential pathways were identified and significantly enriched in the "drop-out" population when comparing the control sample to the original plasmid library. An example with the day 14 NCI-SNU-1 sample is depicted in Figure 5C.
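The quality check on essential versus nonessential genes can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not the analysis pipeline used for the figures: the gRNA identifiers and read counts are hypothetical, counts are scaled to reads per million, and a pseudocount of 1 is an arbitrary choice to avoid dividing by zero.

```python
import math
from statistics import median
from typing import Dict, Iterable


def log2_fold_changes(
    control: Dict[str, int],
    plasmid: Dict[str, int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Per-gRNA log2 fold change of the control library versus the plasmid library."""
    control_total = sum(control.values())
    plasmid_total = sum(plasmid.values())
    changes = {}
    for grna, plasmid_count in plasmid.items():
        # Scale to reads per million so sequencing depth does not shift the values.
        control_rpm = control.get(grna, 0) / control_total * 1e6
        plasmid_rpm = plasmid_count / plasmid_total * 1e6
        changes[grna] = math.log2((control_rpm + pseudocount) / (plasmid_rpm + pseudocount))
    return changes


def median_change(changes: Dict[str, float], grnas: Iterable[str]) -> float:
    """Median log2 fold change over a reference set of gRNAs."""
    return median(changes[g] for g in grnas if g in changes)


# Hypothetical counts; "ess" and "non" mark placeholder essential/nonessential gRNAs.
plasmid_counts = {"ess_g1": 500, "ess_g2": 480, "non_g1": 510,
                  "non_g2": 490, "non_g3": 505, "non_g4": 515}
control_counts = {"ess_g1": 60, "ess_g2": 90, "non_g1": 520,
                  "non_g2": 470, "non_g3": 500, "non_g4": 530}

changes = log2_fold_changes(control_counts, plasmid_counts)
essential_median = median_change(changes, ["ess_g1", "ess_g2"])
nonessential_median = median_change(changes, ["non_g1", "non_g2", "non_g3", "non_g4"])
print(essential_median < nonessential_median)
```

In a screen of acceptable technical quality, the essential-gene distribution should sit clearly to the left of the nonessential one. In this toy example the nonessential median is pushed above zero only because two of six gRNAs drop out; in a genome-scale library, essential genes are a small fraction and the nonessential distribution stays near zero.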
The distribution of the gRNAs in the control versus sorted population was compared using the test function of MAGeCK to identify the receptor from the phenotypic screens (see Supplementary Table 1 for the gene summary output from MAGeCK). The modified RRA score reported by MAGeCK in the gene-level analysis is plotted against the genes ranked by p values. The RRA score in MAGeCK measures the extent to which the gRNAs targeting a gene are ranked consistently higher than expected. In the screen for TNFRSF9, the top hit was TNFSF9, which is a known binding partner of TNFRSF9 (Figure 5D). In addition, a number of genes related to the TP53 pathway were also identified. In the case of RH5, in addition to the known receptor (BSG) and the gene required for the production of the sulfated GAGs (SLC35B2), an additional gene (SLC16A1) was also identified (Figure 5E). SLC16A1 is a chaperone required for trafficking BSG to the surface of cells25. Together, these results demonstrate the ability of the screen to identify directly interacting receptors and the cellular components required for that receptor to be expressed on the surface of the cells in a functional form.
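Candidate hits can be pulled from a MAGeCK gene summary with a few lines of code. The sketch below is illustrative only: it assumes a tab-delimited file with the positive-selection columns named `pos|score` and `pos|fdr`, as in MAGeCK's gene summary output, and the gene names and values in the example are placeholders rather than results from the screens described here.

```python
import csv
import io
from typing import List, Tuple


def enriched_genes(gene_summary_text: str, fdr_cutoff: float = 0.05) -> List[Tuple[str, float, float]]:
    """Return (gene, RRA score, FDR) for genes enriched in the sorted population.

    Genes are ordered by ascending positive-selection RRA score, so the
    strongest candidates come first.
    """
    reader = csv.DictReader(io.StringIO(gene_summary_text), delimiter="\t")
    hits = []
    for row in reader:
        fdr = float(row["pos|fdr"])
        if fdr <= fdr_cutoff:
            hits.append((row["id"], float(row["pos|score"]), fdr))
    return sorted(hits, key=lambda hit: hit[1])


# Placeholder rows showing a subset of the gene summary columns.
example_rows = [
    ["id", "num", "pos|score", "pos|p-value", "pos|fdr", "pos|rank"],
    ["GENE_A", "5", "1.2e-10", "2.5e-07", "0.0012", "1"],
    ["GENE_B", "5", "3.4e-08", "7.4e-07", "0.0049", "2"],
    ["GENE_C", "5", "2.1e-04", "1.1e-03", "0.21", "3"],
    ["GENE_D", "5", "0.45", "0.52", "0.98", "4"],
]
example_text = "\n".join("\t".join(row) for row in example_rows)

hits = enriched_genes(example_text, fdr_cutoff=0.05)
print([gene for gene, _, _ in hits])
```

Relaxing `fdr_cutoff` (for example to 0.25) widens the list to include weaker candidates, which can be useful when looking for additional members of a pathway rather than only the top-ranked receptor.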
Figure 1: Overview of the genetic screening approach to identify cell surface receptors. This assay consists of three major steps: First, recombinant proteins representing the ectodomain of cell surface receptors are expressed in a cell line that can add structurally critical posttranslational modifications, such as HEK293 cells. Monomeric protein ectodomains are oligomerized by conjugating to streptavidin-PE to increase their binding avidity. Second, these avid probes are used in cellular binding assays, where bright staining of the cell lines, indicated by a prominent shift in PE fluorescence (in green) compared to a negative control protein (in black), demonstrates the presence of a cell surface binding partner. Third, receptor-positive Cas9-expressing cell lines are selected and genome-scale screening using gRNAs targeting the vast majority of protein-coding genes is performed. When generating mutant libraries, a transduction efficiency of 30% is commonly used. At this efficiency, assuming that integration events follow a Poisson distribution, most transduced cells receive a single gRNA, so that the resultant phenotype can be attributed to a specific knockout. The BFP marker expressed by the transduced cells is used to select cells containing gRNAs using FACS. Phenotypic screens are performed between 9-16 days posttransduction. On the day of the screen, the total mutant cell population is divided into two. One half is kept as the control population and the other half is selected for recombinant protein binding. The cells from the mutant library that are no longer able to bind the recombinant protein are sorted using FACS, and the enrichment of gRNAs in the sorted versus control population is used to identify genes required for cell surface binding of the labeled avid probe. Steps in the protocol that require considerable time are indicated. This figure has been modified from Sharma et al.19.
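The Poisson reasoning behind the 30% transduction efficiency can be made explicit. The following is a minimal sketch, assuming that the number of lentiviral integrations per cell follows a Poisson distribution; the function name is illustrative and not taken from the protocol.

```python
import math


def single_integration_fraction(transduction_efficiency: float) -> float:
    """Fraction of transduced cells expected to carry exactly one gRNA.

    Assumes integrations per cell follow a Poisson distribution.
    """
    if not 0.0 < transduction_efficiency < 1.0:
        raise ValueError("transduction efficiency must be between 0 and 1")
    # P(0 integrations) = exp(-moi) = 1 - efficiency, so solve for the mean (moi).
    moi = -math.log(1.0 - transduction_efficiency)
    p_exactly_one = moi * math.exp(-moi)
    # Condition on the cell being transduced (at least one integration).
    return p_exactly_one / transduction_efficiency


print(round(single_integration_fraction(0.30), 3))  # 0.832
```

Under this assumption, roughly 83% of BFP-positive cells carry a single gRNA at 30% efficiency. Raising the efficiency increases the share of cells with multiple gRNAs, which makes it harder to attribute a phenotype to one knockout.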
Figure 2: Establishing the ratios of biotinylated protein to streptavidin-PE using an ELISA-based method. An example of the streptavidin-PE conjugation strategy used to generate an avid probe from a biotinylated monomeric protein. A dilution series of biotinylated monomers was incubated against a fixed concentration of streptavidin. The minimum dilution at which no excess biotinylated monomers could be detected was determined by ELISA. ELISA was performed with or without preincubating a range of protein dilutions with 10 ng of streptavidin-PE. In the presence of streptavidin-PE, the minimum dilution at which no signal was detected was identified (circled in black), and the amount of protein required for saturation was calculated to generate a 10x stock solution with 4 µg/mL streptavidin-PE.
Figure 3: Representative binding of proteins to cell lines. (A) Protein binding to cell lines had a clear increase in cell-associated fluorescence compared to the control sample. Heat treatment (80 °C for 10 min) of recombinant protein abrogated all binding back to a negative control, demonstrating that the binding behavior was dependent on correctly folded protein. (B) Different classes of protein binding behavior to cell surfaces; dependence on GAGs. From left to right, the proteins can be classified into three types: Protein type 1 only adsorbs to HS. These proteins lose their binding after preincubation with heparin concentrations over 0.2 mg/mL. Protein type 2 binds to HS in addition to a specific receptor. These proteins lose partial binding in the preblocking experiments. Protein type 3 does not bind HS. These proteins do not lose binding compared to parental lines. (C) An example of a protein (i.e., RH5) that binds to HS and a specific receptor in an additive manner. Targeting either the receptor (i.e., BSG) or enzymes required for HS synthesis (e.g., SLC35B2, EXTL3) only partially reduces the binding of RH5 to cells relative to controls. Transduced polyclonal lines can be used in such experiments to establish binding behavior. This figure has been modified from Sharma et al.19.
Figure 4: Selecting clonal cell lines with high Cas9 activity. The genome-editing efficiency of both polyclonal and clonal NCI-SNU-1 lines was assessed using the GFP-BFP reporter system, in which cells were transduced with lentivirus carrying a reporter plasmid encoding either a gRNA targeting GFP or no gRNA (i.e., "empty"). A schematic is depicted. Flow cytometry was used to measure both BFP and GFP expression after transduction, and the profiles were compared to an uninfected control. Loss of GFP expression was used as a proxy for Cas9 activity, whereas BFP expression marked transduced cells. The profiles for uninfected cells and cells infected with the empty construct looked similar for all clones. Representative profiles are depicted in the left panel. All five clones of the NCI-SNU-1 cell line showed a higher loss of GFP compared to the polyclonal line (right panel), with clone 4 showing the highest efficiency and the lowest refractory population. This figure has been modified from Sharma et al.19.
Figure 5: Representative results from genetic screens for the identification of the cell surface binding partners. (A) Cumulative distribution function plots comparing the gRNA abundance in the plasmid library to the mutant libraries of HEK-293-E and NCI-SNU-1 cells on days 9, 14, and 16 posttransduction. For any given number, the cumulative distribution function reports the percentage of datapoints that were below that threshold. The small shift of the mutant cell population compared to the original plasmid population represents the depletion of a subset of gRNAs compared to the plasmid library. (B) Distribution of log-fold changes in genes that have been previously categorized as being essential (red) or nonessential (black) in the HEK293 and NCI-SNU-1 cell lines. The distribution of fold-changes for nonessential genes centered at ~0, whereas that for essential genes shifted to the left towards negative fold changes. (C) Significantly enriched pathways in genes depleted in the NCI-SNU-1 mutant control population 14 days posttransduction. Expected known cell-essential pathways were identified. (D) Robust Rank Algorithm (RRA)-score for genes that were enriched in the sorted cells that had lost the ability to bind the TNFRSF9 probe. Genes were ranked according to the RRA-score. The known interaction partner TNFSF9 and genes related to the TP53 pathway (labeled in red) were identified in the screen. (E) Rank-ordered RRA-scores for genes identified from gRNA enrichment analysis as required for RH5 binding to HEK293 cells (left panel). SLC35B2 and SLC16A1 were identified within a false-discovery-rate (FDR) threshold of 5%. Two additional genes in the HS biosynthesis pathway (i.e., EXTL3 and NDST1) were identified within an FDR of 25%. Schematic depicting the general GAG biosynthesis pathway with the relevant genes mapped to the corresponding steps (right panel). Genes required for the commitment to chondroitin sulphate biogenesis (i.e., CSGALNACT1/2) were not identified in the screen.
This figure has been modified from Sharma et al.19.
Plasmid name | Plasmid # | Use |
Protein expression construct: CD200RCD4d3+4-bio-linker-his | Addgene: 36153 | Production of recombinant Protein with CD4d3+4, biotin and 6-his tags. |
pMD2.G | Addgene: 12259 | VSV-G envelope expressing plasmid; production of lentivirus |
psPAX2 | Addgene: 12260 | Lentiviral packaging plasmid, production of lentivirus |
Cas9-construct: pKLV2-EF1a-Cas9Bsd-W | Addgene: 68343 | Production of constitutively expressing Cas9 line |
gRNA expression construct: pKLV2-U6gRNA5(BbsI)-PGKpuro2ABFP-W | Addgene: 67974 | CRISPR gRNA expression vector with an improved scaffold and puro/BFP markers |
Human Improved Genome-wide Knockout CRISPR Library | Addgene: 67989 | A gRNA library against 18,010 human genes, designed for use in lentivirus. |
GFP-BFP construct: pKLV2-U6gRNA5(gGFP)-PGKBFP2AGFP-W | Addgene: 67980 | Cas9 activity reporter with BFP and GFP. |
Empty construct: pKLV2-U6gRNA5(empty)-PGKBFP2AGFP-W | Addgene: 67979 | Cas9 activity reporter (control) with BFP and GFP. |
Table 1: Plasmids used in this approach.
Buffer name | Components |
HBS (10X) | 1.5 M NaCl and 200 mM HEPES in Milli-Q water, adjust to pH 7.4 |
PBS (10X) | 80 g NaCl, 2 g KCl, 14.4 g Na2HPO4 and 2.4 g KH2PO4 in Milli-Q water, adjust to pH 7.4 |
Sodium Phosphate Buffer (80 mM stock) | 7.1 g Na2HPO4.2H2O, 5.55 g NaH2PO4, adjust to pH 7.4 |
His-purification binding buffer | 20 mM Sodium Phosphate Buffer, 0.5 M NaCl and 20 mM Imidazole, adjust to pH 7.4 |
His-purification elution buffer | 20 mM Sodium Phosphate Buffer, 0.5 M NaCl and 400 mM Imidazole, adjust to pH 7.4 |
Diethanolamine buffer | 10% diethanolamine and 0.5 mM MgCl2 in Milli-Q water, adjust to pH 9.2 |
D10 | DMEM, 1% penicillin-streptomycin (100 units/mL) and 10% heat inactivated FBS |
Table 2: Buffers required for this study.
Components | 10-cm dish | 6-well plate |
293FT cells | 70–80% confluent | 70–80% confluent |
Transfection compatible media (Opti-MEM) (Step 5.1.2) | 3 mL | 500 µL |
Transfection compatible media (Opti-MEM) (Step 5.1.4) | 5 mL | 2 mL |
Lentiviral transfer vector | 3 µg | 0.5 µg |
psPAX2 (see Table 1) | 7.4 µg | 1.2 µg |
pMD2.G (see Table 1) | 1.6 µg | 0.25 µg |
PLUS reagent | 12 µL | 2 µL |
Lipofectamine LTX | 36 µL | 6 µL |
D10 (Step 7.1.7) | 5 mL | 1.5 mL |
D10 (Step 7.1.8 and 7.1.10) | 8 mL | 2 mL |
Table 3: Amounts and volumes of reagents for lentivirus packaging mix.
Table 4: Primer sequences for amplifying gRNA and NGS.
Reagent | Volume per reaction | Master mix (x38) |
Q5 Hot Start High-Fidelity 2x | 25 μL | 950 μL |
Primer (L1/U1) mix (10 μM each) | 1 μL | 38 μL |
Genomic DNA (1 mg/mL) | 2 μL | 76 μL |
H2O | 22 μL | 836 μL |
Total | 50 μL | 1900 μL |
Table 5: PCR for the amplification of gRNAs from high complexity samples.
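Scaling the per-reaction volumes to a master mix is simple arithmetic that is easy to check in code. This is a minimal sketch using the per-reaction volumes and the x38 multiplier listed in the table; the function and variable names are illustrative.

```python
from typing import Dict


def master_mix(per_reaction_ul: Dict[str, float], n_reactions: int) -> Dict[str, float]:
    """Scale per-reaction volumes (in microliters) to a master mix."""
    return {reagent: volume * n_reactions for reagent, volume in per_reaction_ul.items()}


# Per-reaction volumes for the first PCR, in microliters.
PER_REACTION_UL = {
    "Q5 Hot Start High-Fidelity 2x": 25,
    "Primer (L1/U1) mix (10 uM each)": 1,
    "Genomic DNA (1 mg/mL)": 2,
    "H2O": 22,
}

mix = master_mix(PER_REACTION_UL, n_reactions=38)
for reagent, volume in mix.items():
    print(f"{reagent}: {volume} uL")
print(f"Total: {sum(mix.values())} uL")
```

Checking that the scaled volumes sum to the per-reaction total multiplied by the number of reactions (here 50 μL x 38 = 1900 μL) is a quick way to catch transcription errors before pipetting.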
Cycle number | Denature | Annealing | Extension |
1 | 98 °C, 30 s | | |
2-24 | 98 °C, 10 s | 61 °C, 15 s | 72 °C, 20 s |
25 | | | 72 °C, 2 min |
Table 6: PCR conditions for the first PCR.
Reagent | Volume per reaction |
KAPA HiFi HotStart ReadyMix | 25 μL |
Primer (PE1.0/index primer) mix (5 μM each) | 2 μL |
First PCR product (40 pg/μL) | 5 μL |
H2O | 18 μL |
Total | 50 μL |
Table 7: PCR for the index tagging of sgRNAs from genetic screens.
Cycle number | Denature | Annealing | Extension |
1 | 98 °C, 30 s | | |
2-15 | 98 °C, 10 s | 66 °C, 15 s | 72 °C, 20 s |
16 | | | 72 °C, 5 min |
Table 8: PCR conditions for second PCR.
Supplementary Figure S1: A guide to drawing gates for sorting the nonbinding population. (A) An ideal protein candidate for screening should have a clear shift of binding population compared to the control population and the binding should be retained on cells lacking machinery for HS biosynthesis. A heparin blocking experiment can be used in place of testing on SLC35B2 targeted cell lines. (B) Cells lacking the surface staining from the protein ectodomain but expressing BFP fluorescence from lentiviral transduction were collected. The cells displayed are from a screen for the identification of receptor for GABBR222. This figure has been modified from Sharma et al.19.
Supplementary Figure S2: Cell surface glycoprotein transcriptomics based PCA plot using RNA-seq data from over 1,000 cancer cell lines. Cell lines from Cell Model Passport27 were clustered using K-means clustering according to the FPKM values of ~1,500 cell surface glycoproteins. Representative cell lines from each cluster are labeled. Cluster 5 was entirely composed of cell lines of hematopoietic origin (also see Supplementary Table 2).
Supplementary Figure S3: Essentiality scores for KEGG-annotated protein export and N-linked glycosylation genes from Project Score. Adjusted Bayes-essentiality scores for ~330 cell lines (columns, not labeled) are plotted for genes of protein export and N-linked glycosylation pathway (X-axis). Scores higher than 0 represent significant depletion in the mutant population compared to the original plasmid library. The genes can be divided into three distinct clusters that represent different levels of essentiality in the cell lines. This clustering can be used to decide the day of sorting. If the screen is performed at a late time point (day 16), it is possible that genes that are known to be essential for cells (clusters 1 and 3) will not be identified.
Supplementary Table 1: Raw count files and MAGeCK-generated gene_summary files for the representative genetic screens.
Supplementary Table 2: Clustering of cell lines according to the expression of cell surface receptors.
A CRISPR-based screening strategy to identify genes encoding cellular components involved in cellular recognition is described. A similar approach using CRISPR activation also provides a genetic alternative to identify directly interacting receptors of recombinant proteins without the need to generate large protein libraries26. However, one major advantage of this approach is that it is applicable to interactions mediated by surface molecules natively displayed on the cell and does not depend on the overexpression of receptors, which can influence the binding avidity of the receptor. Unlike other methods, therefore, this technique makes no assumptions regarding the biochemical nature or cell biology of the receptors and provides an opportunity to study interactions mediated by proteins that are normally difficult to study using biochemical approaches, such as very large proteins, or those that traverse the membrane multiple times or form complexes with other proteins, and molecules other than proteins such as glycans, glycolipids, and phospholipids. Given the genome-scale nature of the method, this approach also has the advantage of not only identifying the receptor but also additional cellular components that are required for the binding event, thereby providing insights into the cell biology of the receptor.
One of the major limitations of this method when using it to identify the receptor of an orphan protein is the initial requirement to identify a cell line that binds to the protein. This is not always easy, and identifying a cell line that displays a binding phenotype and is also permissive to genetic screens can be the rate-limiting step for deploying this assay. Some cell lines tend to bind to more proteins than others. This is especially relevant for proteins that bind HS, because these proteins tend to bind to any cell line that displays HS side chains, irrespective of the native binding context. Additionally, we have observed that upregulation of syndecans (i.e., proteoglycans that contain HS) in cell lines leads to increased binding of HS-binding proteins26. This could be a factor to take into consideration when selecting the cell line for screening. It is also important to note that binding to HS in this assay is additive rather than codependent and does not interfere with binding to a specific receptor19. Consequently, any binding that is observed could be mediated solely by HS. In such a scenario, the heparin blocking approach described here can identify this behavior without the need to perform a full genetic screen.
A useful resource for choosing cell lines is Cell Model Passport, which contains genomics, transcriptomics, and culture condition information for ~1,000 cancer cell lines27. Depending on the biological context, cells can be chosen based on their expression profiles. To aid the selection of cell lines, we clustered ~1,000 cell lines in Cell Model Passport according to the expression of ~1,500 preannotated human cell surface glycoproteins28 (Supplementary Figure 2; cluster information for each cell line together with growth conditions are provided in Supplementary Table 2). When testing the binding of a protein with unknown function, it is useful to select a panel of representative cell lines from each cluster to increase the chance of covering a wide range of receptors. Given a choice, it is recommended to choose cell lines that are easy to culture and easy to transduce. As these cell lines will be used in genome-scale screening, it is preferable that they can be grown easily in large quantities and are permissive to lentiviral transduction, because it is the most commonly available method for delivery of sgRNA for CRISPR-based genetic screening in the later steps.
Generally, the phenotype selections are carried out in a single sort. However, whether this is sufficient depends on the brightness of the stained cell population compared to the control. Iterative rounds of selection could be adopted for scenarios in which the signal-to-noise ratio of the desired phenotype is low, or when the aim of the screen is to identify mutants that have strong phenotypes. When using an iterative selection approach for FACS-based screens, it is important to consider that the sorting process can cause cell death, mainly due to the shear force of the sorter. Thus, not all collected cells will be represented in the next round of sorting.
Library complexity is a very important factor in performing successful genetic screens, especially for negative selection screens, because the extent of depletion in these can only be determined by comparing results to what was present in the starting library. For negative selection screens, it is common to maintain libraries at 500-1,000x complexity. Positive selection screens, however, are more robust to library size, because in such screens only a small number of mutants are expected to be selected for a particular phenotype. Therefore, in the positive selection screen described here, the library size can be decreased to 50-100x complexity without compromising the quality of the screen. In addition, in these screens it is also possible to use a control library for a given cell line on a given day as a "general control" for all samples sorted on that day for that cell line. This will reduce the number of control libraries that need to be produced and sequenced.
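The coverage figures above translate directly into cell numbers. The following is a minimal sketch for planning purposes; the library size of 90,000 gRNAs is a round number assumed purely for illustration and should be replaced with the actual number of gRNAs in the library being used, and the function names are hypothetical.

```python
import math


def cells_required(n_grnas: int, coverage: int) -> int:
    """Number of transduced cells needed to represent each gRNA `coverage` times."""
    return n_grnas * coverage


def cells_to_transduce(n_grnas: int, coverage: int, transduction_efficiency: float) -> int:
    """Number of cells to expose to virus, given the fraction that become transduced."""
    if not 0.0 < transduction_efficiency <= 1.0:
        raise ValueError("transduction efficiency must be in (0, 1]")
    return math.ceil(cells_required(n_grnas, coverage) / transduction_efficiency)


# Assumed library size for illustration only; substitute the real gRNA count.
N_GRNAS = 90_000

for coverage in (50, 100, 500, 1000):
    maintained = cells_required(N_GRNAS, coverage)
    starting = cells_to_transduce(N_GRNAS, coverage, transduction_efficiency=0.30)
    print(f"{coverage}x: maintain {maintained:,} cells; transduce about {starting:,} cells")
```

The comparison shows why the lower coverage tolerated by positive selection screens matters in practice: going from 500x to 100x reduces the number of cells that must be cultured, stained, and passed through the sorter fivefold.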
Another important consideration for using this approach is the limitation of loss-of-function screens in identifying genes that are essential for in vitro cell growth. The timing of the screens is crucial in this regard, as the longer the mutant cells are kept in culture, the higher the likelihood that cells with mutations in essential genes become nonviable and are no longer represented in the mutant library. The recent genetic screens performed as a part of the Project Score initiative in over 300 cell lines show that multiple genes in the KEGG-annotated protein secretion and N-glycosylation pathways are often identified as being essential for a number of cell lines (Supplementary Figure 3)29. This can be taken into consideration when designing screens if the effect of genes required for proliferation and viability is to be investigated in the context of the cellular recognition process. In this case, carrying out screens at an early time point (e.g., day 9 posttransduction) would generally be appropriate. However, if the approach is used to identify a few targets with strong effect sizes rather than general cellular pathways, it might be appropriate to perform screens at a later time point (e.g., day 15-16 posttransduction).
The results from the screening are very robust; in eight recombinant protein binding screens performed in the past, the cell surface receptor was the top hit in every case19. When using this approach to identify an interaction partner, one should therefore expect the receptor and the factors contributing to its presentation on the cell surface to be identified with high statistical confidence. Once the screen is performed and a hit is validated using a single gRNA knockout, further follow-ups can be performed using existing biochemical methods such as AVEXIS4 and direct saturable binding of purified proteins using surface plasmon resonance. The approach described here is applicable to all proteins for which it is possible to generate a soluble recombinant binding probe.
In summary, this is a genome-scale CRISPR knockout approach to identify interactions mediated by cell surface proteins. This method is generally applicable to identifying cellular pathways required for cell surface recognition in a wide range of biological contexts, including recognition between an organism's own cells (e.g., neural and immunological recognition) as well as between host cells and pathogen proteins. This method provides a genetic alternative to biochemical approaches designed for receptor identification, and because it does not require any prior assumptions regarding the biochemical nature or cell biology of the receptors, it has great potential to make completely unexpected discoveries.
The authors have nothing to disclose.
This work was supported by the Wellcome Trust grant number 206194 awarded to GJW. We thank the Cytometry Core facility: Bee Ling Ng, Jennifer Graham, Sam Thompson, and Christopher Hall for help with FACS.
Anti-mouse alkaline phosphatase | Sigma | A4656 | |
Blasticidin | Chem-Cruz | SC-204655 | |
Blood & Cell Culture DNA Maxi Kit | Qiagen | 13362 | |
BSA | Sigma | A9647-100G | |
Diethanolamine | Sigma | 398179 | |
DMEM | Gibco | 31966-021 | |
DNeasy Blood and Tissue kit | Qiagen | 69504 | |
DynaMag-96 Side Magnet | Invitrogen | 12331D | |
HEK293T packaging cells | ATCC | CRL-3216 | |
Heparin | Sigma | H4784-1G | |
KAPA HiFi HotStart ReadyMix | Kapa | KK2602 | |
Lipofectamine LTX with PLUS reagent | Invitrogen | 15338100 | |
MoFlo XDP cell sorter | BD | | |
Ni2+-NTA agarose beads | Jena Bioscience | AC-501-25 | |
OPTI-MEM | Life Technologies | 31985-070 | |
OX-68 antibody | AbD Serotec | MCA1022R | |
p-nitrophenyl phosphate | Sigma | 1040-506 | |
PD-10 desalting columns | GE healthcare | 17085101 | |
Polybrene | Millipore | TR-1003-G | |
Polypropylene tubes with 5 mL bed volume | Qiagen | 34964 | |
Proteinase K, recombinant, PCR Grade | Roche | 3115879001 | |
Puromycin | Gibco | A11138-03 | |
Q5 Hot Start High-Fidelity 2× Master Mix | NEB | M0494L | |
QIAquick PCR purification kit | Qiagen | 28104 | |
SFCA filter | Nalgene | 190-2545 | |
Sony cell sorter | Sony | | |
SPRI beads (Agencourt AMPure XP beads) | Beckman | A63881 | |
Streptavidin-coated microtitre plates | Nalgene | 734-1284 | |
Streptavidin-PE | Biolegend | 405204 | |