A detailed protocol is described for the separation, identification, and characterization of proteoforms in protein samples using capillary zone electrophoresis-electrospray ionization-tandem mass spectrometry (CZE-ESI-MS/MS). The protocol can be used for the high-resolution characterization of proteoforms in simple protein samples and the large-scale identification of proteoforms in complex proteome samples.
Capillary zone electrophoresis-electrospray ionization-tandem mass spectrometry (CZE-ESI-MS/MS) has been recognized as a useful tool for top-down proteomics, which aims to characterize proteoforms in complex proteomes. However, the application of CZE-MS/MS to large-scale top-down proteomics has been impeded by the low sample-loading capacity and narrow separation window of CZE. Here, a protocol is described using CZE-MS/MS with a microliter-scale sample-loading volume and a 90-min separation window for large-scale top-down proteomics. The CZE-MS/MS platform is based on a linear polyacrylamide (LPA)-coated separation capillary with extremely low electroosmotic flow, a dynamic pH-junction-based online sample concentration method with a high efficiency for protein stacking, an electro-kinetically pumped sheath flow CE-MS interface with extremely high sensitivity, and an Orbitrap mass spectrometer with high mass resolution and scan speed. The platform can be used for the high-resolution characterization of simple intact protein samples and the large-scale characterization of proteoforms in various complex proteomes. As an example, a highly efficient separation of a standard protein mixture and a highly sensitive detection of many impurities using the platform are demonstrated. As another example, this platform can produce over 500 proteoform and 190 protein identifications from an Escherichia coli proteome in a single CZE-MS/MS run.
Top-down proteomics (TDP) aims for the large-scale characterization of proteoforms within a proteome. TDP relies on the effective liquid-phase separation of intact proteins before electrospray ionization-tandem mass spectrometry (ESI-MS/MS) analysis due to the high complexity and large concentration dynamic range of the proteome1,2,3,4,5. Capillary zone electrophoresis (CZE) is a powerful technique for the separation of biomolecules based on their size-to-charge ratios6. CZE is relatively simple, requiring only an open-tubular fused-silica capillary, a background electrolyte (BGE), and a power supply. A sample of intact proteins can be loaded into the capillary using pressure or voltage, and separation is initiated by immersing both ends of the capillary in the BGE and applying a high voltage. CZE can approach ultra-high separation efficiency (more than one million theoretical plates) for the separation of biomolecules7. CZE-MS has a drastically higher sensitivity than the widely used reversed-phase liquid chromatography (RPLC)-MS for the analysis of intact proteins8. Although CZE-MS has great potential for large-scale top-down proteomics, its wide application in proteomics has been impeded by several issues, including a low sample-loading capacity and a narrow separation window. The typical sample loading volume in CZE is about 1% of the total capillary volume, which usually corresponds to less than 100 nL9,10,11. The separation window of CZE is usually less than 30 min due to the strong electroosmotic flow (EOF)9,10. These issues limit the ability of CZE-MS/MS to identify a large number of proteoforms, and low-abundance proteoforms in particular, from a complex proteome.
Much effort has been made to improve the sample loading volume of CZE via online sample concentration methods (e.g., solid-phase microextraction [SPME]12,13, field-enhanced sample stacking [FESS]9,11,14, and dynamic pH junction15,16,17,18). FESS and dynamic pH junction are simpler than SPME, requiring only a significant difference between the sample buffer and the BGE in conductivity (FESS) or pH (dynamic pH junction). FESS employs a sample buffer with much lower conductivity than the BGE, leading to a stacking of analytes on the boundary between the sample zone and the BGE zone in the capillary. Dynamic pH junction utilizes a basic sample plug (e.g., 50 mM ammonium bicarbonate, pH 8) with an acidic BGE (e.g., 5% [v/v] acetic acid, pH 2.4) on both sides of the sample plug. Upon application of a high positive voltage at the injection end of the capillary, titration of the basic sample plug occurs, focusing the analytes into a tight plug before they undergo CZE separation. Recently, the Sun group systematically compared FESS and dynamic pH junction for the online stacking of intact proteins, demonstrating that dynamic pH junction could produce much better performance than FESS for the online concentration of intact proteins when the sample injection volume was 25% of the total capillary volume19.
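The loading-volume figures quoted above can be checked with simple geometry. The following is a minimal sketch, assuming a cylindrical bore and the capillary dimensions given elsewhere in this document (50 µm i.d. in the materials list, 1-m length and a 500-nL injection in the figure captions); the function names are illustrative only.

```python
import math


def capillary_volume_nl(inner_diameter_um: float, length_m: float) -> float:
    """Return the bore volume of a cylindrical capillary in nanoliters."""
    radius_m = inner_diameter_um * 1e-6 / 2.0
    volume_m3 = math.pi * radius_m ** 2 * length_m
    return volume_m3 * 1e12  # 1 m^3 = 1e12 nL


def injected_fraction(injection_nl: float, inner_diameter_um: float, length_m: float) -> float:
    """Return the injected volume as a fraction of the total capillary volume."""
    return injection_nl / capillary_volume_nl(inner_diameter_um, length_m)


total_nl = capillary_volume_nl(50.0, 1.0)          # roughly 2 µL in total
conventional_nl = 0.01 * total_nl                  # the "about 1%" conventional load
stacked_fraction = injected_fraction(500.0, 50.0, 1.0)

print(f"total capillary volume: {total_nl:.0f} nL")
print(f"1% of capillary volume: {conventional_nl:.1f} nL")
print(f"500 nL injection: {100 * stacked_fraction:.0f}% of capillary volume")
```

Under these assumptions, a 1% load is on the order of 20 nL, while a 500-nL injection fills about a quarter of the capillary, which is consistent with the 25% figure cited for the dynamic pH junction comparison.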
Neutrally coated separation capillaries (e.g., linear polyacrylamide [LPA]) have been employed to reduce the EOF in the capillary, slowing down the CZE separation and widening the separation window20,21. Recently, the Dovichi group developed a simple procedure for the preparation of stable LPA coating on the inner wall of capillaries, utilizing ammonium persulfate (APS) as the initiator and temperature (50 °C) for free radical production and polymerization22. Very recently, the Sun group employed the LPA-coated separation capillary and the dynamic pH junction method for the CZE separation of intact proteins, reaching a microliter-scale sample loading volume and a 90-min separation window19. This CZE system opens the door to using CZE-MS/MS for large-scale top-down proteomics.
CZE-MS requires a highly robust and sensitive interface to couple CZE to MS. Three CE-MS interfaces have been well developed and commercialized: the co-axial sheath-flow interface23, the sheathless interface using a porous tip as the ESI emitter24, and the electro-kinetically pumped sheath flow interface25,26. CZE-MS/MS based on the electro-kinetically pumped sheath flow interface has reached a low zeptomole peptide detection limit9, over 10,000 peptide identifications (IDs) from the HeLa cell proteome in a single run14, a fast characterization of intact proteins11, and highly stable and reproducible analyses of biomolecules26. Recently, the LPA-coated separation capillary, the dynamic pH junction method, and the electro-kinetically pumped sheath flow interface were used for large-scale top-down proteomics of an Escherichia coli (E. coli) proteome19,27. The CZE-MS/MS platform reached over 500 proteoform IDs in a single run19 and nearly 6,000 proteoform IDs via coupling with size-exclusion chromatography (SEC)-RPLC fractionation27. The results clearly show the capability of CZE-MS/MS for large-scale top-down proteomics.
Herein, a detailed procedure for using CZE-MS/MS for large-scale top-down proteomics is described. The CZE-MS/MS system employs the LPA-coated capillary to reduce the EOF in the capillary, the dynamic pH junction method for the online concentration of proteins, the electro-kinetically pumped sheath flow interface for coupling CZE to MS, an Orbitrap mass spectrometer for the collection of MS and MS/MS spectra of proteins, and the TopPIC (TOP-Down Mass Spectrometry-Based Proteoform Identification and Characterization) software for proteoform ID via database search.
1. Preparation of LPA Coating on the Inner Wall of the Separation Capillary
2. Etching of the Capillary with Hydrofluoric Acid
CAUTION: Use appropriate safety procedures while handling hydrofluoric acid (HF) solutions. All HF-related operations need to be done in a chemical hood. Before any HF-related operation, make sure that 2.5% calcium gluconate gel is available for use in the case of exposure. Double gloves are required: a typical nitrile glove inside and a heavy neoprene glove outside. Wear a lab coat and chemical safety goggles. After the HF operations, keep liquid and solid hazardous waste separate. The liquid HF waste must be neutralized immediately with a high-concentration sodium hydroxide solution for temporary storage before waste pick-up. The solid HF waste needs to be temporarily stored in a lidded plastic container that is lined with two thick, one-gallon plastic Ziploc bags. Both the solid and liquid waste must be labeled properly.
3. Preparation of the Samples
4. Set-up of the CZE-MS/MS System and Analysis of the Samples
5. Database Search of the Collected Raw Files with the TopPIC Software
Figure 1 shows a diagram of the dynamic pH-junction-based CZE-ESI-MS system used in the experiment. A long plug of the sample in a basic buffer is injected into an LPA-coated separation capillary filled with an acidic BGE. After applying high voltages I and II, the analytes in the sample zone will be concentrated via the dynamic pH junction method. To evaluate the performance of the CZE-MS system, a standard protein mixture (cytochrome c, lysozyme, β-casein, myoglobin, CA, and BSA) is typically analyzed. The representative electropherogram for the standard protein mixture is shown in Figure 2A. The standard protein mixture is typically run at least in duplicate to evaluate the separation efficiency and the reproducibility of the system. The separation efficiency can be evaluated with the number of theoretical plates of some proteins, as shown in Figure 2B. The reproducibility can be evaluated by the relative standard deviations of protein intensity and migration time. Figure 3A shows a zoomed-in view of an electropherogram of the E. coli protein sample analyzed by the dynamic pH-junction-based CZE-MS/MS. The normalized-level (NL) protein intensity should be on the scale of 10^8 if 1 µg of E. coli proteins is loaded for the analysis with a quadrupole-Orbitrap mass spectrometer. A zoomed-in view of the electropherogram can be used to assess the separation window of the system. In this case, the separation window is 80 – 90 min. Figure 3B shows an example PrSM, including the general corresponding proteoform information, protein sequence, observed fragmentation pattern, and modifications. The very low E-Value (2.11E-48) and Spectral FDR (0) suggest the high confidence of the proteoform ID. The high number of matched fragment ions (60) further indicates the high confidence of the ID. The observed fragmentation pattern shows that the fragmentation of the proteoform is highly efficient, covering the termini and middle part of the proteoform. The N-terminal cleavage of three amino acids (MTM) is determined through the database search.
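The two system-suitability metrics mentioned above (number of theoretical plates and relative standard deviation) can be computed as in the following sketch. It assumes Gaussian peaks, so that N = 8 ln 2 × (migration time / full width at half maximum)^2, and uses the sample standard deviation for the RSD; the numeric inputs are hypothetical placeholders and are not values from this study.

```python
import math
import statistics
from typing import Sequence


def theoretical_plates(migration_time_min: float, fwhm_min: float) -> float:
    """Plate number for a Gaussian peak from its migration time and FWHM."""
    if fwhm_min <= 0:
        raise ValueError("peak width must be positive")
    return 8.0 * math.log(2.0) * (migration_time_min / fwhm_min) ** 2


def rsd_percent(values: Sequence[float]) -> float:
    """Relative standard deviation (sample standard deviation / mean) in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical duplicate runs of one standard protein (illustration only).
migration_times_min = [50.0, 50.6]
peak_fwhm_min = 0.2

plates = theoretical_plates(migration_times_min[0], peak_fwhm_min)
print(f"N = {plates:.3g} theoretical plates")
print(f"migration time RSD = {rsd_percent(migration_times_min):.2f}%")
```

The same `rsd_percent` helper can be applied to peak intensities from replicate runs; with only two replicates the RSD is a coarse estimate, so more runs give a more meaningful number.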
Figure 1: Diagram of the dynamic pH-junction-based CZE-ESI-MS system. The sample is dissolved in 50 mM ammonium bicarbonate, pH 8. The BGE is 5% or 10% (v/v) acetic acid, pH ~2.4 or ~2.2. A 1-m LPA-coated capillary is used for separation. An electro-kinetically pumped sheath flow interface is used to couple CZE to MS. A quadrupole-Orbitrap mass spectrometer is used. High voltage I (HV I) is provided by the power supply integrated into the CE autosampler for separation. High voltage II (HV II) is provided by a separate power supply for electrospray. The mass spectrometer is grounded. The distance between the orifice of the emitter and the entrance of the mass spectrometer is ~2 mm. The distance between the end of the etched capillary in the emitter and the orifice of the emitter is less than 500 µm. The size of the orifice of the electrospray emitter is 20 – 40 µm.
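As a rough plausibility check on the BGE pH values quoted in the caption, the pH of dilute acetic acid can be estimated from the weak-acid equilibrium. This sketch assumes a glacial acetic acid density of 1.049 g/mL, a molar mass of 60.05 g/mol, a pKa of 4.76, additive volumes, and unit activity coefficients; these constants are textbook values and not taken from the protocol, and a measured pH may differ slightly.

```python
import math

ACETIC_ACID_DENSITY_G_PER_ML = 1.049   # assumed density of glacial acetic acid
ACETIC_ACID_MOLAR_MASS_G_PER_MOL = 60.05
ACETIC_ACID_PKA = 4.76                 # assumed pKa at about 25 °C


def acetic_acid_molarity(percent_v_v: float) -> float:
    """Convert a % (v/v) acetic acid solution to mol/L, assuming additive volumes."""
    grams_per_liter = percent_v_v / 100.0 * 1000.0 * ACETIC_ACID_DENSITY_G_PER_ML
    return grams_per_liter / ACETIC_ACID_MOLAR_MASS_G_PER_MOL


def weak_acid_ph(molarity: float, pka: float = ACETIC_ACID_PKA) -> float:
    """Solve Ka = x^2 / (C - x) for x = [H+] and return the pH."""
    ka = 10.0 ** (-pka)
    h_conc = (-ka + math.sqrt(ka * ka + 4.0 * ka * molarity)) / 2.0
    return -math.log10(h_conc)


for percent in (5.0, 10.0):
    conc = acetic_acid_molarity(percent)
    print(f"{percent:.0f}% (v/v): {conc:.2f} M, estimated pH {weak_acid_ph(conc):.2f}")
```

With these assumptions the estimates come out near pH 2.4 for 5% and pH 2.3 for 10% acetic acid, in reasonable agreement with the approximate values given in the caption.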
Figure 2: Data of a standard protein mixture analyzed by the dynamic pH-junction-based CZE-MS. (A) This panel shows a base peak electropherogram. (B) This panel shows the number of theoretical plates (N) of three proteins. The sample injection volume was 500 nL. HV I was 30 kV for separation, and HV II was 2.2 kV for electrospray. The BGE was 5% (v/v) acetic acid, pH 2.4. The sample was in 50 mM ammonium bicarbonate (pH 8) and contained cytochrome c (0.01 mg/mL), lysozyme (0.01 mg/mL), β-casein (0.04 mg/mL), myoglobin (0.01 mg/mL), carbonic anhydrase (CA, 0.05 mg/mL), and bovine serum albumin (BSA, 0.1 mg/mL). No MS/MS spectra were acquired for the standard protein mixture sample.
Figure 3: Data of the E. coli proteome analyzed by the dynamic pH-junction-based CZE-MS/MS. (A) This panel shows a zoomed-in view of the base peak electropherogram of the E. coli protein sample. (B) This panel shows an example PrSM identified through the TopPIC database search of the acquired MS/MS spectra. The general corresponding proteoform information, protein sequence, observed fragmentation pattern, and modifications are presented. The matched fragment ions from an individual PrSM can also be viewed if needed. HV I was 20 kV for separation, and HV II was 2.2 kV for electrospray. The BGE was 10% (v/v) acetic acid, pH ~2.2. The sample was in 50 mM ammonium bicarbonate (pH 8) and the protein concentration was 2 mg/mL. The sample injection volume was 500 nL. A top8 DDA method was used to acquire the data.
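The mass of protein loaded per run follows directly from the sample concentration and injection volume given in the captions of Figures 2 and 3. A minimal sketch of that arithmetic is shown here; the helper name and the dictionary layout are illustrative, while the concentrations and the 500-nL injection volume are those stated in the captions.

```python
def loaded_mass_ng(concentration_mg_per_ml: float, injection_volume_nl: float) -> float:
    """Return the loaded protein mass in nanograms.

    1 mg/mL equals 1 ng/nL, so mass (ng) = concentration (mg/mL) x volume (nL).
    """
    return concentration_mg_per_ml * injection_volume_nl


INJECTION_VOLUME_NL = 500.0

# E. coli proteome sample (Figure 3): 2 mg/mL.
ecoli_ng = loaded_mass_ng(2.0, INJECTION_VOLUME_NL)
print(f"E. coli proteins loaded: {ecoli_ng / 1000.0:.1f} µg")

# Standard protein mixture (Figure 2), concentrations in mg/mL.
standard_mix = {
    "cytochrome c": 0.01,
    "lysozyme": 0.01,
    "beta-casein": 0.04,
    "myoglobin": 0.01,
    "carbonic anhydrase": 0.05,
    "bovine serum albumin": 0.1,
}
for name, conc in standard_mix.items():
    print(f"{name}: {loaded_mass_ng(conc, INJECTION_VOLUME_NL):.0f} ng")
```

The E. coli result corresponds to the 1 µg loading amount referred to in the text, and each standard protein is loaded at the level of a few to a few tens of nanograms.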
Here we provide a detailed protocol to use CZE-MS/MS for the high-resolution characterization of proteoforms in simple protein samples and for the large-scale identification of proteoforms in complex proteome samples. A diagram of the CZE-ESI-MS/MS system is shown in Figure 1. There are four critical steps in the protocol. First, the preparation of a high-quality LPA coating on the inner wall of the separation capillary is extremely important. An LPA-coated separation capillary can reduce the EOF in the capillary, widen the separation window of CZE, and reduce protein adsorption on its inner wall19. The polymerization reaction for making the LPA coating is performed by filling the capillary with a mixture of acrylamide (monomer) and ammonium persulfate (polymerization initiator), followed by heating the capillary in a water bath to trigger the reaction22. The timing in the water bath is critical. Too short a reaction time will lead to an incomplete reaction and poor coating. We typically allow the reaction to go up to 50 minutes, because a longer reaction period gives a better coating. With longer reaction periods, however, the capillary may become blocked, and an HPLC pump might need to be used to push out the polymer inside the capillary with water. One LPA-coated capillary can be used continuously for about one week, based on the literature data and our experience22.
The second critical step is coupling CZE to MS with the electro-kinetically pumped sheath flow CE-MS interface25,26. The distance between the end of the separation capillary in the ESI emitter and the emitter orifice affects the sensitivity of CZE-MS significantly, and a shorter distance produces a higher sensitivity25. The end of the separation capillary needs to be etched with HF, to reduce its outer diameter from 360 µm to less than 100 µm, allowing the end of the capillary to be pushed close to the emitter orifice. The distance between the emitter orifice and mass spectrometer entrance has been optimized to reach the best sensitivity and good stability26. We typically keep the distance around 2 mm.
The third critical step is the CZE-MS and MS/MS analysis with the dynamic pH-junction-based online sample stacking19. The pH values of the sample buffer and the BGE need to be significantly different in order to ensure efficient sample stacking. The protein concentration in the sample needs to be appropriate. If the protein concentration is too high, the proteins can precipitate in the capillary during sample stacking. The protein concentrations of the samples used for Figures 2 and 3 can be considered typical examples. The last critical step is the database search with TopPIC for proteoform IDs30. Multiple file conversion steps are required before the database search with TopPIC can be performed. A proper FDR filter is necessary to ensure the quality of the identified proteoforms. We typically use a 1% spectrum-level FDR to filter the database search results. We note that the top-down proteomics initiative is a very good resource for top-down proteomics methods and data analysis software32,33.
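To illustrate what a 1% spectrum-level FDR filter does, the following is a generic target-decoy sketch. It is not TopPIC's implementation, and the `PrSM` record and its fields are hypothetical; it assumes each spectrum match carries an E-value and a flag marking hits to a decoy database, estimates FDR as decoys/targets, and keeps the largest E-value-ranked set that stays at or below the threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrSM:
    """Hypothetical proteoform-spectrum match record (not a TopPIC data type)."""
    spectrum_id: int
    e_value: float
    is_decoy: bool


def filter_by_spectrum_fdr(prsms: List[PrSM], max_fdr: float = 0.01) -> List[PrSM]:
    """Keep target PrSMs in the largest E-value-ranked prefix with FDR <= max_fdr."""
    ranked = sorted(prsms, key=lambda p: p.e_value)
    targets = 0
    decoys = 0
    keep_count = 0  # length of the best-scoring prefix that passes the threshold
    for index, prsm in enumerate(ranked, start=1):
        if prsm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR of the prefix = decoy hits / target hits.
        if targets > 0 and decoys / targets <= max_fdr:
            keep_count = index
    return [p for p in ranked[:keep_count] if not p.is_decoy]
```

A call such as `filter_by_spectrum_fdr(all_prsms, max_fdr=0.01)` would return only the target matches that survive the 1% cutoff. In practice the search software reports its own FDR estimates, so a sketch like this is mainly useful for understanding or post-processing exported results.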
We must note that some parts of the protocol may need to be modified slightly and some troubleshooting may be required. The time of polymerization in the water bath for the preparation of the LPA coating may vary between labs. The time for HF etching of the separation capillary may need to be slightly modified because the temperature differs between labs. When setting up the CE-MS interface, make sure the electrospray is stable before threading the separation capillary into the electrospray emitter. If no electrospray or an unstable electrospray is observed, check the interface to make sure there are no big bubbles in the emitter, in the T, or in the tubing that is used to connect the sheath buffer vial and the T. In addition, the distance between the orifice of the emitter and the mass spectrometer entrance can affect the electrospray stability and is usually about 2 mm. The electrospray voltage can also influence the spray stability. For 20- to 40-µm emitters, 2 – 2.2 kV is usually sufficient. As described in the previous paragraph, the protein concentration in the sample needs to be appropriate. If the protein concentration is too high, the proteins can precipitate in the capillary during sample stacking. Pay attention to the current profile on the CE autosampler computer. If the current is zero at the very beginning of the CZE run, it means that a plug of air was injected into the capillary during the sample injection step. Make sure the volume of the sample in the vial is large enough. If the current is normal at the beginning and suddenly becomes zero during the run, it usually means that the protein concentration is too high.
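The current-profile troubleshooting rules above can be expressed as a small check on an exported current trace. This is only a sketch: it assumes the trace is available as a list of current readings in microamperes, and the zero threshold and the number of initial points examined are arbitrary placeholder values that would need tuning for a given instrument.

```python
from typing import Sequence

AIR_PLUG = "zero current from the start: an air plug was likely injected; check the sample volume in the vial"
PRECIPITATION = "current dropped to zero mid-run: the protein concentration may be too high"
NORMAL = "no zero-current event detected"


def diagnose_current_trace(
    current_ua: Sequence[float],
    zero_threshold_ua: float = 0.1,  # placeholder: readings at or below this count as zero
    start_points: int = 5,           # placeholder: readings treated as "the very beginning"
) -> str:
    """Classify a CE current trace using the two rules described in the text."""
    if len(current_ua) == 0:
        raise ValueError("empty current trace")
    head = current_ua[:start_points]
    # Rule 1: current is zero at the very beginning of the run.
    if all(abs(c) <= zero_threshold_ua for c in head):
        return AIR_PLUG
    # Rule 2: current starts normally, then falls to zero during the run.
    if any(abs(c) <= zero_threshold_ua for c in current_ua[start_points:]):
        return PRECIPITATION
    return NORMAL


print(diagnose_current_trace([3.1, 3.0, 3.0, 2.9, 3.0, 3.0, 0.0, 0.0]))
```

The example call returns the mid-run message because the current is normal for the first readings and then falls to zero.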
CZE-MS/MS based on this protocol enables the high-resolution characterization of simple protein samples and the large-scale top-down proteomics of complex proteomes. As an example, the CZE-MS achieved the high-resolution characterization of a standard protein mixture containing six proteins19. As shown in Figure 2, the CZE-MS clearly separated the six proteins with a high separation efficiency, revealed three forms of β-casein, and detected many impurities. As another example, single-shot CZE-MS/MS reproducibly yielded over 500 proteoform IDs and 190 protein IDs from the E. coli proteome, using only 1 µg of E. coli proteins with a 1% spectrum-level FDR19. The CZE-MS/MS increased the number of proteoform IDs from a complex proteome at least threefold compared with previous single-shot CZE-MS/MS studies. As shown in Figure 3A, the CZE-MS/MS simultaneously reached a 500-nL sample-loading volume and a 90-min separation window for the analysis of the E. coli proteome19. The TopPIC software can provide comprehensive information about the identified proteoforms (Figure 3B). The CZE-MS/MS system provides the proteomics community with a useful tool for large-scale and high-resolution top-down proteomics.
CZE-MS/MS still has some limitations for deep top-down proteomics. Although we demonstrated that the CZE-MS/MS can reach over 500 proteoform IDs from the E. coli proteome in a single run, the number of proteoform IDs from single-shot CZE-MS/MS is only roughly 50% of that from state-of-the-art RPLC-MS/MS34,35. In this protocol, only HCD is used for protein fragmentation, leading to limited fragmentation coverage of the identified proteoforms. Several improvements can be made to the CZE-MS/MS protocol to improve both the number and quality of the proteoform IDs from single-shot CZE-MS/MS. First, increasing the length of the separation capillary should provide a wider separation window, leading to more proteoform IDs. Second, using a mass spectrometer with a much higher resolving power, faster scan rate, and multiple fragmentation methods (e.g., HCD36, electron transfer dissociation [ETD]37, and ultraviolet photodissociation [UVPD]38) should improve both the number of proteoform IDs and the fragmentation coverage of the identified proteoforms.
The authors have nothing to disclose.
The authors thank Heedeok Hong’s group at the Department of Chemistry, Michigan State University, for kindly providing the Escherichia coli cells for the experiments. The authors acknowledge support from the National Institute of General Medical Sciences, National Institutes of Health (NIH), through Grant R01GM118470 (to X. Liu) and Grant R01GM125991 (to L. Sun and X. Liu).
Name | Company | Catalog Number | Comments |
Fused silica capillary | Polymicro Technologies | 1068150017 | 50 µm i.d. 360 µm o.d. |
Sodium hydroxide pellets | Macron Fine Chemicals | 7708-10 | Corrosive |
LC-MS grade water | Fisher Scientific | W6-1 | |
Hydrochloric acid | Fisher Scientific | SA48-1 | Corrosive |
Methanol | Fisher Scientific | A456-4 | Toxic, Health Hazard |
3-(Trimethoxysilyl)propyl methacrylate | Sigma-Aldrich | M6514 | Moisture and heat sensitive |
Hydrofluoric acid | Acros Organics | 423805000 | Highly toxic |
Acrylamide | Acros Organics | 164855000 | Toxic, health hazard |
Ammonium persulfate | Sigma-Aldrich | A3678 | Health hazard, Oxidizer |
Lysozyme | Sigma-Aldrich | L6876 | |
Cytochrome C | Sigma-Aldrich | C7752 | |
Myoglobin | Sigma-Aldrich | M1882 | |
β-casein | Sigma-Aldrich | C6905 | |
Carbonic anhydrase | Sigma-Aldrich | C3934 | |
Bovine serum albumin | Sigma-Aldrich | A2153 | |
Urea | Alfa Aesar | 36428-36 | |
DL-Dithiothreitol | Sigma-Aldrich | D0632 | Health Hazard |
Iodoacetamide | Fisher Scientific | AC122270250 | Health Hazard |
Formic Acid | Fisher Scientific | A117-50 | Corrosive, Health Hazard |
C4 trap column | Sepax Technologies | 110043-4001C | 3 µm particles, 300 Å pores, 4.0 mm i.d. 10 mm long |
Acetonitrile | Fisher Scientific | A998SK-4 | Toxic, Oxidizer |
Ammonium bicarbonate | Sigma-Aldrich | 1066-33-7 | |
Nalgene rapid-flow filters | Thermo Scientific | 126-0020 | 0.2 µm CN membrane, and 50 mm diameter |
E. coli cells | K-12 MG1655 | ||
Dulbecco's phosphate-buffered saline | Sigma-Aldrich | D8537 | |
BCA assay | Thermo Scientific | 23250 | |
Acetone | Fisher Scientific | A11-1 | |
HPLC system for protein desalting | Agilent | 1260 Infinity II | |
Acetic Acid | Fisher Scientific | A38-212 | |
CE autosampler | CMP Scientific | ECE-001 | |
Electro-kinetically pumped sheath flow interface | CMP Scientific | ||
Q Exactive HF Hybrid Quadrupole-Orbitrap Mass Spectrometer | Thermo Fisher Scientific | ||
Sutter Flaming/Brown micropipette puller | Sutter Instruments | P-1000 | |
Ultrasonic cell disruptor for cell lysis | Branson | 101063196 | Model S-250A |
Vacuum concentrator | Thermo Fisher Scientific | SPD131DDA-115 | |