Described is a proteomics workflow for identifying protein interaction partners in a nuclear subcellular fraction using immunoaffinity enrichment of a protein of interest and label-free mass spectrometry. The workflow includes subcellular fractionation, immunoprecipitation, filter-aided sample preparation, offline cleanup, mass spectrometry, and a downstream bioinformatics pipeline.
Immunoaffinity purification mass spectrometry (IP-MS) has emerged as a robust quantitative method for identifying protein-protein interactions. This publication presents a complete interaction proteomics workflow designed to identify low-abundance protein-protein interactions in the nucleus, which could also be applied to other subcellular compartments. The workflow includes subcellular fractionation, immunoprecipitation, sample preparation, offline cleanup, single-shot label-free mass spectrometry, and downstream computational analysis and data visualization. By immunoprecipitating endogenous proteins from fractionated subcellular compartments, the protocol is optimized for detecting compartmentalized, low-abundance interactions that are difficult to identify from whole cell lysates (e.g., transcription factor interactions in the nucleus). The sample preparation pipeline outlined here provides detailed instructions for preparing HeLa cell nuclear extract, immunoaffinity purification of the endogenous bait protein, and quantitative mass spectrometry analysis. We also discuss methodological considerations for performing large-scale immunoprecipitation in mass spectrometry-based interaction profiling experiments and provide guidelines for evaluating data quality to distinguish true positive protein interactions from nonspecific interactions. The approach is demonstrated here by investigating the nuclear interactome of the CMGC kinase DYRK1A, a low-abundance protein kinase with poorly defined interactions within the nucleus.
The human proteome exhibits vast structural and biochemical diversity through the formation of stable multisubunit complexes and transient protein-protein interactions. Accordingly, investigations into molecular mechanisms commonly require the identification of interaction partners for a protein of interest. Recent advances in affinity purification protocols and the advent of high-resolution, fast-scanning mass spectrometry instrumentation have made it possible to map protein interaction landscapes in a single unbiased experiment.
Protein interaction protocols commonly employ ectopic expression systems with affinity-tagged fusion constructs to identify protein interactions without requiring high-quality antibodies that recognize a protein of interest1,2. However, epitope tag-based methods have several drawbacks. Physical interactions with the epitope tag itself may lead to the detection of nonspecific copurifying proteins3. Additionally, fusing an epitope tag to the N- or C-terminus of a protein may block native protein-protein interactions or disrupt protein folding, promoting non-physiological conformations4. Furthermore, ectopic expression systems typically express the bait protein at supraphysiological concentrations, which can result in the identification of artifactual protein interactions, particularly for dosage-sensitive genes5. To circumvent these issues, the endogenous bait protein can be immunoprecipitated together with its interacting prey proteins, provided that a high-quality antibody recognizing the native protein is available.
Provided here is an interaction proteomics workflow for detecting the nuclear interactome of an endogenous protein, using the CMGC protein kinase DYRK1A as an example. Disruption of DYRK1A copy number, activity level, or expression can cause severe intellectual disability in humans and embryonic lethality in mice6,7,8,9. DYRK1A exhibits dynamic spatiotemporal regulation10 and compartmentalized protein interactions11,12, which calls for approaches capable of detecting low-abundance interaction partners specific to different subcellular compartments.
This protocol comprises cellular fractionation of human HeLa cells into cytosol and nucleoplasm fractions, immunoprecipitation, and sample preparation for mass spectrometry, followed by an overview of a bioinformatic pipeline for evaluating data quality and visualizing results, with R scripts provided for analysis and visualization (Figure 1). All proteomics software packages used in this workflow are freely available for download or accessible through a web interface. In-depth tutorials and instructions on the software and computational methods are available at the links provided.
NOTE: All buffer compositions and protease mixtures are outlined in Table 1.
1. Preparation of cells
NOTE: A starting material of 1-10 mg nuclear lysate per replicate is recommended for this immunoprecipitation mass spectrometry (IP-MS) approach. Cell quantities are given for 1 mg nuclear immunoprecipitations performed in triplicate, plus triplicate controls.
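The lysate requirement in the note above reduces to simple arithmetic. The Python sketch below is a hypothetical planning helper (`total_lysate_mg` is not part of the protocol's supplemental R scripts) and assumes that every bait IP and every control receives the same lysate mass. Converting lysate mass into cell numbers depends on the cell line and extraction yield, so that step is deliberately left out.

```python
# Hypothetical planning helper (not part of the protocol's supplemental R scripts).
def total_lysate_mg(mg_per_ip: float, n_bait_ips: int = 3, n_controls: int = 3) -> float:
    """Total nuclear lysate (mg) when every bait IP and every control receives
    the same input mass, as assumed in this sketch."""
    if not 1.0 <= mg_per_ip <= 10.0:
        # The protocol recommends 1-10 mg of nuclear lysate per replicate.
        raise ValueError("mg_per_ip should be within the recommended 1-10 mg range")
    return mg_per_ip * (n_bait_ips + n_controls)


if __name__ == "__main__":
    print(total_lysate_mg(1.0))   # triplicate IPs + triplicate controls at 1 mg -> 6.0
    print(total_lysate_mg(10.0))  # same design at the 10 mg upper bound -> 60.0
```

Keeping the control count as a separate parameter makes it easy to plan designs with more controls than bait IPs.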
2. Preparation of nuclear extract
NOTE: Protease and phosphatase inhibitors should be added to the fractionation buffers within 30 min of use.
3. Validation of subcellular fractionation
4. Immunoprecipitation of endogenous nuclear bait protein
NOTE: Use low-retention tubes from this point on to reduce nonspecific binding to the tubes during sample handling and avoid unnecessary sample loss. Additionally, use LC-MS grade H2O to prepare buffers for all remaining steps.
5. Sample preparation
NOTE: Insulin spiked into the immunoprecipitation elution samples aids in the recovery of proteins during trichloroacetic acid (TCA) precipitation and sample processing, which is important for low abundance endogenous bait proteins.
6. LC/MS system suitability
NOTE: Because affinity-purified samples are small in scale and generally low in protein abundance, it is critical that the LC/MS platform operate at maximal sensitivity and robustness.
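One way to track the sensitivity and robustness called for above is to compare each run of a standard digest against earlier runs. The Python sketch below assumes that the HeLa QC tryptic digest from the materials list serves as the standard and that the number of identified proteins is the monitored metric. The function name and the 20% tolerance are hypothetical placeholders, not acceptance criteria from this protocol.

```python
from statistics import median


def passes_suitability(history: list[int], latest: int, max_drop: float = 0.2) -> bool:
    """Return True if the latest QC digest run identified at least (1 - max_drop)
    times the median number of proteins identified in prior QC runs.

    The 20% default tolerance is a hypothetical placeholder, not a protocol value.
    """
    if not history:
        raise ValueError("at least one prior QC run is needed as a baseline")
    baseline = median(history)
    return latest >= (1.0 - max_drop) * baseline


if __name__ == "__main__":
    prior_runs = [4100, 4250, 4180]  # made-up protein-ID counts for illustration
    print(passes_suitability(prior_runs, 3900))  # True: above 80% of the 4180 median
    print(passes_suitability(prior_runs, 3000))  # False: below 80% of the 4180 median
```

Using the median of prior runs as the baseline keeps a single unusually good or bad run from shifting the reference too far.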
7. Data processing
8. Data visualization
NOTE: Many programs can effectively visualize proteomics data (e.g., R, Perseus, Cytoscape, STRING-DB). Analyzing the connectivity among high-confidence hits and the functional enrichment of these interactors can be a useful strategy for prioritizing hits for further validation and functional characterization.
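Functional enrichment of high-confidence interactors is commonly scored with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The Python sketch below is illustrative only: the choice of background (for example, all proteins quantified in the experiment) is an assumption, and in practice p values should be corrected for the number of annotation terms tested. Tools such as STRING-DB perform this analysis directly.

```python
from math import comb


def hypergeom_sf(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, annotated, drawn).

    population: proteins in the background (assumed: all proteins quantified)
    annotated:  background proteins carrying the annotation term
    drawn:      high-confidence interactors
    k:          high-confidence interactors carrying the term
    """
    upper = min(annotated, drawn)
    # Sum the exact probability of observing i annotated hits for i = k..upper.
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, drawn)


if __name__ == "__main__":
    # Made-up counts: 2,000 background proteins, 40 carry a term,
    # 14 high-confidence interactors, 5 of which carry the term.
    print(hypergeom_sf(5, population=2000, annotated=40, drawn=14))
```

The sum is computed with exact integers before the final division, so small tail probabilities are not lost to floating-point cancellation.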
The majority of protein mass identified in an IP-MS experiment consists of nonspecific proteins. Thus, a key challenge in an IP-MS experiment is determining which proteins are high-confidence interactors and which are nonspecific. To demonstrate the key parameters used to evaluate data quality, triplicate immunoprecipitations from 5 mg of HeLa nuclear extract were analyzed against a beads-only control. The first internal check of IP-MS reliability is whether the bait protein ranks among the most highly enriched proteins by both fold change over control and SAINT probability. In this case, the bait DYRK1A ranked among the top three enriched proteins over the control (Figure 2A,B). In a nuclear interactome study of DYRK1A using four independent antibodies, an FC-A cutoff of >3.00 combined with a SAINT probability cutoff of >0.7 provided a stringent threshold that identified both novel and previously validated interactors22. When these cutoffs were applied to this experiment, the high-confidence interactors separated clearly from the >95% of copurified proteins classified as nonspecific (Figure 2A,B). Applying both a fold-change enrichment threshold and a probability threshold increases stringency by requiring consistently high enrichment of each protein across biological replicates.
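The dual-threshold logic and the bait-rank check described above can be sketched in Python as follows. This assumes that FC-A and SAINT scores have already been exported from the CRAPome workflow into per-protein records; the `PreyScore` record and the function names are hypothetical, and the default cutoffs are the FC-A > 3.00 and SAINT > 0.7 values quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class PreyScore:
    gene: str
    fc_a: float   # CRAPome fold change over control (FC-A)
    saint: float  # SAINT probability


def high_confidence(scores, fc_cutoff=3.0, saint_cutoff=0.7):
    """Preys passing BOTH cutoffs (strict '>' as in the text)."""
    return [s for s in scores if s.fc_a > fc_cutoff and s.saint > saint_cutoff]


def bait_rank(scores, bait, metric):
    """1-based rank of the bait when all proteins are sorted by `metric`, highest first."""
    ordered = sorted(scores, key=lambda s: getattr(s, metric), reverse=True)
    return [s.gene for s in ordered].index(bait) + 1


if __name__ == "__main__":
    # Made-up values; PREY_A..PREY_C are placeholder names, not real results.
    scores = [
        PreyScore("DYRK1A", 40.0, 1.00),
        PreyScore("PREY_A", 5.2, 0.95),
        PreyScore("PREY_B", 4.1, 0.40),  # passes FC-A only
        PreyScore("PREY_C", 1.3, 0.10),  # nonspecific
    ]
    print([s.gene for s in high_confidence(scores)])  # ['DYRK1A', 'PREY_A']
    print(bait_rank(scores, "DYRK1A", "fc_a"), bait_rank(scores, "DYRK1A", "saint"))  # 1 1
```

Setting `saint_cutoff=0.0` reproduces an FC-A-only filter, the looser criterion discussed for the Cytoscape network.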
In addition to statistical scoring, the CRAPome analysis workflow maps previously reported interactions onto bait-prey data23. While this mapping can be useful for setting thresholds between high- and low-confidence interactions, previously reported interactions can score poorly by FC-A and SAINT probability, possibly indicating that many known interactions of a given bait exist only in specific cell types, contexts, or organelles. For the example DYRK1A dataset, FC-A values for iRef interactors were as low as 0.45, representing very low enrichment over control (Figure 2C). To avoid inflating false positives, statistical thresholds should be set to prioritize stringency over the reduction of false negatives. Notably, the detection of these interactions was independent of protein abundance (Figure 2C): the calculated absolute copy number of each iRef interactor in HeLa cells showed no correlation with its detection level by IP-MS24.
Cytoscape is an effective tool for visualizing multiple layers of interaction data19. In the DYRK1A immunoprecipitation experiment described here, combining FC-A > 3.0 with SAINT > 0.7 reduced the list of high-confidence interactors to six proteins (Figure 2D). However, applying the FC-A > 3.0 cutoff alone added eight more proteins to the network. These additional proteins are highly connected to the interactors already in the network, suggesting that they participate in similar complexes or functional roles. To capture this connectivity, STRING-DB evidence of protein-protein interactions was integrated into the network as blue dashed lines20. While this single-bait, triplicate experiment samples only part of the full DYRK1A interaction network, additional baits and replicates, together with the integration of large public datasets, can expand the network of high-confidence interactions. Because of such differences between experimental designs, statistical cutoffs are specific to each experiment and must be evaluated thoroughly.
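The two edge layers described above can be combined into a simple interaction format (SIF) file, which Cytoscape imports as a network. The Python sketch below assumes STRING scores on a 0-1 scale (STRING bulk download files typically report scores multiplied by 1,000, so divide those by 1,000 first) and uses 0.150, STRING's low-confidence setting, to match Figure 2D. The interaction labels `ipms` and `string` are hypothetical names that can be mapped to solid and dashed edge styles in Cytoscape.

```python
def build_sif(bait, interactors, string_edges, min_score=0.150):
    """Cytoscape SIF lines: bait-prey edges from IP-MS plus prey-prey STRING edges.

    interactors:  high-confidence prey gene names
    string_edges: iterable of (gene_a, gene_b, score) with scores on a 0-1 scale
    """
    preys = set(interactors)
    lines = [f"{bait}\tipms\t{prey}" for prey in sorted(preys)]
    seen = set()
    for a, b, score in string_edges:
        pair = tuple(sorted((a, b)))
        # Keep each prey-prey pair once, and only if both ends are preys in the network.
        if a != b and a in preys and b in preys and score > min_score and pair not in seen:
            seen.add(pair)
            lines.append(f"{pair[0]}\tstring\t{pair[1]}")
    return lines


if __name__ == "__main__":
    # Placeholder names and scores, for illustration only.
    edges = [("PREY_A", "PREY_B", 0.62), ("PREY_B", "PREY_A", 0.62), ("PREY_A", "PREY_C", 0.09)]
    # Redirect this output to a .sif file for import into Cytoscape.
    print("\n".join(build_sif("DYRK1A", ["PREY_A", "PREY_B", "PREY_C"], edges)))
```

Deduplicating on the sorted gene pair matters because STRING lists each interaction in both directions.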
Figure 1: Representative proteomics workflow for subcellular IP-MS. Cells are grown in either 4 L round bottom flasks or 15 cm tissue culture dishes and harvested at the same time for subcellular fractionation. Cells are fractionated into cytosolic, nuclear, and nuclear pellet fractions, and immunoprecipitations are performed from 1-10 mg of nuclear lysate using one or more antibodies recognizing the same bait. Filter-aided sample preparation (FASP) and offline sample cleanup are performed prior to single-shot mass spectrometry. A downstream computational pipeline processes the data into interpretable interaction data.
Figure 2: Representative data for a single-bait, single-antibody IP-MS experiment. (A) FC-A and SAINT probability output from the CRAPome analysis workflow for an optimal experiment using a single antibody against the kinase DYRK1A (n = 3). Beads-only controls were used for comparison. Red solid lines represent cutoffs set at FC-A > 3.00 and SAINT > 0.7. (B) MaxQuant protein abundance estimates (iBAQ) vs. log2 ratio of protein abundance in the DYRK1A IP to control, colored by adjusted p value range from empirical Bayes analysis of the label-free intensities. (C) FC-A and estimated copy number of proteins listed as interacting proteins in the iRef database23,24. (D) Cytoscape network visualization of DYRK1A interactors. Blue nodes = FC-A > 3.00 and SAINT > 0.7. Orange nodes = FC-A > 3.00 only. Black edges = proteins identified as interactors in the IP-MS experiment. Blue dashed edges = STRING-DB interactions between prey proteins (confidence > 0.150).
Protease inhibitor (PI) mixture | |
Reagent | Final Concentration |
Sodium Metabisulfite | 1 mM |
Benzamidine | 1 mM |
Dithiothreitol (DTT) | 1 mM |
Phenylmethanesulfonyl fluoride (PMSF) | 0.25 mM |
Phosphatase Inhibitor (PhI) mixture | |
Reagent | Final Concentration |
Microcystin LR | 1 µM |
Sodium Orthovanadate | 0.1 mM |
Sodium fluoride | 5 mM |
Subcellular fractionation Buffers: | |
Buffer A pH 7.9 | |
Reagent | Final Concentration |
HEPES | 10 mM |
MgCl2 | 1.5 mM |
KCl | 10 mM |
Buffer B pH 7.9 | |
Reagent | Final Concentration |
HEPES | 20 mM |
MgCl2 | 1.5 mM |
NaCl | 420 mM |
Ethylenediaminetetraacetic acid (EDTA) | 0.4 mM |
Glycerol | 25% (v/v) |
Buffer C pH 7.9 | |
Reagent | Final Concentration |
HEPES | 20 mM |
MgCl2 | 2 mM |
KCl | 100 mM |
Ethylenediaminetetraacetic acid (EDTA) | 0.4 mM |
Glycerol | 20% (v/v) |
Immunoprecipitation Buffers: | |
IP Buffer 1 | |
Reagent | Final Concentration |
HEPES | 20 mM |
KCl | 150 mM |
EDTA | 0.1 mM |
NP-40 | 0.1% (v/v) |
Glycerol | 10% (v/v) |
IP Buffer 2 | |
Reagent | Final Concentration |
HEPES | 20 mM |
KCl | 500 mM |
EDTA | 0.1 mM |
NP-40 | 0.1% (v/v) |
Glycerol | 10% (v/v) |
SDS Alkylation Buffer pH 8.5 | |
Reagent | Final Concentration |
SDS | 4% (w/v) |
Chloroacetamide | 40 mM |
TCEP | 10 mM |
Tris | 100 mM |
UA pH 8.5 | |
Reagent | Final Concentration |
Urea | 8 M |
Tris | 0.1 M |
* use HPLC grade H2O |
Table 1: Buffer compositions
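Preparing the buffers in Table 1 from concentrated stocks uses the dilution relation C1V1 = C2V2. The Python sketch below computes stock and water volumes for the millimolar entries; the stock concentrations in the example are assumptions chosen for illustration, not values from this protocol. Percentage entries such as glycerol are simply the stated fraction of the final volume, and the sketch does not handle pH adjustment or the protease and phosphatase inhibitors, which are added within 30 min of use.

```python
def stock_volumes_ul(final_mm: dict, stock_mm: dict, total_ml: float) -> dict:
    """Stock volumes (uL) for `total_ml` of buffer via C1V1 = C2V2.

    final_mm: final concentrations in mM (from Table 1)
    stock_mm: stock concentrations in mM (hypothetical, lab-specific)
    Returns a dict of reagent -> uL, plus 'H2O' to bring the buffer to volume.
    """
    total_ul = total_ml * 1000.0
    volumes = {}
    for reagent, c_final in final_mm.items():
        c_stock = stock_mm[reagent]
        if c_final > c_stock:
            raise ValueError(f"{reagent}: final concentration exceeds stock")
        volumes[reagent] = c_final * total_ul / c_stock  # V1 = C2 * V2 / C1
    volumes["H2O"] = total_ul - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    # Buffer A (Table 1) from assumed stocks: 1 M HEPES, 1 M MgCl2, 2 M KCl.
    buffer_a = {"HEPES": 10.0, "MgCl2": 1.5, "KCl": 10.0}
    stocks = {"HEPES": 1000.0, "MgCl2": 1000.0, "KCl": 2000.0}
    print(stock_volumes_ul(buffer_a, stocks, total_ml=10.0))
    # {'HEPES': 100.0, 'MgCl2': 15.0, 'KCl': 50.0, 'H2O': 9835.0}
```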
Supplemental Coding Files.
The proteomics workflow outlined here provides an effective method for identifying high-confidence protein interactors for a protein of interest. The approach decreases sample complexity through subcellular fractionation and increases the identification of interaction partners through robust sample preparation, offline sample cleanup, and stringent quality control of the LC-MS system. The downstream data analysis described here allows for a simple statistical evaluation of the proteins that copurify with the bait. However, because of the many experimental variables (scale, cell line, antibody choice), each experiment requires its own cutoffs and considerations regarding data visualization and enrichment.
The first design consideration in an IP-MS experiment is the selection of antibodies for copurifying the protein of interest along with its interacting partners. While commercial antibodies have expanded to cover larger portions of the human proteome over the past several decades, reagents remain limited for many proteins. Furthermore, antibodies validated for applications such as western blot detection may be incapable of selectively enriching the target protein in an immunoprecipitation experiment. Before conducting a large-scale interaction proteomics experiment, it is suggested to perform an IP from a 90% confluent 10 cm dish, or an equivalent cell number, and probe for the target protein by western blotting. If more than one antibody is available for immunoprecipitation, it is also suggested to select multiple antibodies recognizing epitopes in different regions of the protein, because the binding of an antibody to a bait protein can occlude the interface required for binding of putative interacting partners. Using a second epitope on the bait protein can therefore increase the coverage of the interaction profile identified by a mass spectrometry-based experiment.
A second major consideration is the selection of an appropriate control for distinguishing, among the proteins that copurify with the bait, high-confidence interactions from low-confidence or nonspecific ones. The most stringent control for an IP-MS experiment is to perform the immunoprecipitation from a CRISPR knockout (KO) cell line of the bait. Such a control makes it possible to identify and filter out nonspecific proteins that bind directly to the antibody rather than to the bait protein. When generating a CRISPR KO cell line for each bait protein is not feasible, an IgG-bead control of the same isotype as the bait antibody can be used. In experiments employing a panel of antibodies raised in multiple species, a beads-only control can be appropriate but will increase the rate of false positives identified as high-confidence interactors.
Selection of the cell line used in an IP-MS experiment depends on several key factors. Protein expression and localization are largely dependent on cell type. While RNA expression estimates are available for most genes in many commonly used cell lines, protein expression correlates poorly with RNA expression and must be determined experimentally25. Cell lines that express the bait protein at very low copy number should be avoided, because they may require drastic increases in cell culture scale. Note, however, that sample preparation can be optimized for the detection of very low-abundance proteins. The filter-aided sample preparation (FASP) method, while robust, can result in the loss of more than 50% of the peptides in a sample. Single-Pot Solid-Phase-enhanced Sample Preparation (SP3) is an efficient method of preparing samples for mass spectrometry analysis that minimizes sample loss26. The increased recovery of SP3 can make it a useful alternative in this workflow for quantifying proteins near the limit of detection.
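The practical effect of sample loss on input requirements is a single division. A minimal Python sketch, assuming that loss during a preparation step can be summarized as one fractional recovery value (a simplification; `required_input_ug` is a hypothetical helper):

```python
def required_input_ug(target_ug: float, recovery: float) -> float:
    """Input protein (ug) needed so that `target_ug` remains after a step
    with the given fractional recovery (0 < recovery <= 1)."""
    if not 0.0 < recovery <= 1.0:
        raise ValueError("recovery must be in (0, 1]")
    return target_ug / recovery


if __name__ == "__main__":
    print(required_input_ug(2.0, 0.5))  # 50% recovery doubles the required input -> 4.0
```

With FASP losses above 50%, more than twice the target amount must enter the step; a method with higher recovery, such as SP3, lowers that requirement accordingly.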
This proteomics workflow has been applied to many nuclear baits, including kinases, E3 ubiquitin ligases, and scaffolding members of multisubunit complexes. Provided that the antibody reagents are properly validated, successful execution of this workflow will result in the detection of high-confidence nuclear interaction partners for a protein of interest.
The authors have nothing to disclose.
This work was supported by a Grand Challenge grant to W.M.O. from the Linda Crnic Institute for Down syndrome and by a DARPA cooperative agreement 13-34-RTA-FP-007. We would like to thank Jesse Kurland and Kira Cozzolino for their contributions in reading and commenting on the manuscript.
0.25% Trypsin, 0.1% EDTA | Thermo Fisher Scientific | 25200056 | |
1.5 ml low-retention microcentrifuge tubes | Fisher Scientific | 02-681-320 | |
4-20% Mini PROTEAN TGX Precast Protein Gels | Bio-Rad | 4561096 | |
acetone (HPLC) | Thermo Fisher Scientific | A949SK-4 | |
Amicon Ultra 0.5 ml 30k filter column | Millipore Sigma | UFC503096 | |
Benzamidine | Sigma-Aldrich | 12072 | |
benzonase | Sigma-Aldrich | E1014 | |
Chloroacetamide | Sigma-Aldrich | C0267 | |
Dialysis tubing closure | Carolina Biological Supply Company | 684239 | |
DTT | Sigma-Aldrich | 10197777001 | |
EDTA | Sigma-Aldrich | EDS | |
GAPDH antibody | Santa Cruz Biotechnology | Sc-47724 | |
Glycerol | Fisher Scientific | 887845 | |
Glycine | Sigma-Aldrich | G8898 | |
HeLa QC tryptic digest | Pierce | 88329 | |
HEPES | Fisher Scientific | AAJ1692630 | |
insulin | Thermo Fisher Scientific | 12585014 | |
iodoacetamide | Sigma-Aldrich | I1149 | |
KONTES Dounce homogenizer 7 ml | VWR | KT885300-0007 | |
Large Clearance pestle 7 ml | VWR | KT885301-0007 | |
Lysyl endopeptidase C | VWR | 125-05061 | |
Magnesium Chloride | Sigma-Aldrich | 208337 | |
Microcystin | Enzo Life Sciences | ALX-350-012-C100 | |
Nonidet P 40 Substitute solution | Sigma-Aldrich | 98379 | |
p84 antibody | GeneTex | GTX70220 | |
Phosphate Buffered Saline | |||
Pierce BCA Protein Assay Kit | Thermo Fisher Scientific | 23227 | |
Pierce BSA Protein Digest, MS grade | Thermo Fisher Scientific | 88341 | LCMS QC |
Pierce C18 spin columns | Thermo Fisher Scientific | PI-89873 | |
Pierce Trypsin Protease, MS Grade | Thermo Fisher Scientific | 90057 | For mass spectrometry sample prep |
PMSF | Sigma-Aldrich | P7626 | |
Potassium Chloride | Sigma-Aldrich | P9541 | |
Protein A Sepharose CL-4B | GE Healthcare Bio-Sciences | 17-0780-01 | |
Protein G Sepharose 4 Fast Flow | GE Healthcare Bio-Sciences | 17-0618-01 | |
SDS | Sigma-Aldrich | L3771 | |
Silica emitter tip | Pico TIP | FS360-20-10 | |
Small Clearance pestle 7 ml | VWR | KT885302-0007 | |
Sodium Chloride | Sigma-Aldrich | S3014 | |
Sodium Fluoride | Sigma-Aldrich | 201154 | |
Sodium metabisulfite | Sigma-Aldrich | 31448 | |
Sodium orthovanadate | Sigma-Aldrich | S6508 | |
Spectra/Por 8 kDa 24 mm dialysis tubing | Thomas Scientific | 3787K17 | |
TC Dish 150, Standard | Sarstedt | 83.3903 | Tissue culture dish for adherent cells |
TCA | Sigma-Aldrich | T9159 | |
TCEP | Thermo Scientific | PG82080 | |
TFA | Thermo Fisher Scientific | 28904 | |
Thermo Scientific Orbitrap Fusion MS | Thermo Fisher Scientific | ||
Trizma Base | Sigma-Aldrich | T6066 | |
Urea | Thermo Fisher Scientific | 29700 | |
Waters ACQUITY M-Class UPLC | Waters | ||
Waters ACQUITY UPLC M-Class Column, Reversed-Phase Spherical Hybrid (1.7 µm, 75 µm x 250 mm) | Waters | 186007484 | nanoflow C18 column |