Targeted cross-linking mass spectrometry creates quaternary protein structure models from mass spectrometry data acquired with up to three different acquisition protocols. When the method is executed as a simplified workflow on the Cheetah-MS web server, the results are reported in a Jupyter Notebook. Here, we demonstrate the technical aspects of how the Jupyter Notebook can be extended for a more in-depth analysis.
Protein-protein interactions can be challenging to study yet provide insights into how biological systems function. Targeted cross-linking mass spectrometry (TX-MS), a method combining quaternary protein structure modeling and chemical cross-linking mass spectrometry, creates high-accuracy structure models using data obtained from complex, unfractionated samples. This removes one of the major obstacles to protein complex structure analysis because the proteins of interest no longer need to be purified in large quantities. The Cheetah-MS web server was developed to make the simplified version of the protocol more accessible to the community. From the tandem MS/MS data, Cheetah-MS generates a Jupyter Notebook, a graphical report summarizing the most important analysis results. Extending the Jupyter Notebook can yield more in-depth insights and a better understanding of the model and the mass spectrometry data supporting it. The technical protocol presented here demonstrates some of the most common extensions and explains what information can be obtained. It contains blocks that help analyze tandem MS/MS acquisition data and the overall impact of the detected cross-links (XLs) on the reported quaternary models. The results of such analyses can be applied to structural models that are embedded in the notebook using NGLView.
Protein-protein interactions underpin the structure and function of biological systems. Having access to quaternary structures of proteins can provide insights into how two or more proteins interact to form high-order structures. Unfortunately, obtaining quaternary structures remains challenging; this is reflected in the comparatively small number of Protein DataBank (PDB) entries1 containing more than one polypeptide. Protein-protein interactions can be studied with technologies such as X-ray crystallography, NMR, and cryo-EM, but obtaining a sufficient amount of purified protein under conditions where the methods can be applied can be time-consuming.
Chemical cross-linking mass spectrometry was developed to obtain experimental data on protein-protein interactions with fewer restrictions on sample preparation as mass spectrometry can be used to acquire data on arbitrarily complex samples2,3,4,5,6,7,8,9. However, the combinatorial nature of the data analysis and the relatively small number of cross-linked peptides require that the samples be fractionated before analysis. To address this shortcoming, we developed TX-MS, a method that combines computational modeling with chemical cross-linking mass spectrometry10. TX-MS can be used on arbitrarily complex samples and is significantly more sensitive compared to previous methods10. It accomplishes this by scoring all data associated with a given protein-protein interaction as a set instead of interpreting each MS spectrum independently. TX-MS also uses up to three different MS acquisition protocols: high-resolution MS1 (hrMS1), data-dependent acquisition (DDA), and data-independent acquisition (DIA), further providing opportunities to identify a cross-linked peptide by combining multiple observations. The TX-MS computational workflow is complex for several reasons. First, it relies on multiple MS analysis software programs11,12,13 to create protein structure models14,15. Second, the amount of data can be considerable. Third, the modeling step can consume significant amounts of computer processing power.
Consequently, TX-MS is best used as an automated, simplified computational workflow through the Cheetah-MS web server16, which runs on large computational infrastructures such as computer clouds or clusters. To facilitate the interpretation of the results, we produced an interactive Jupyter Notebook17. Here, we demonstrate how the Jupyter Notebook report can be extended to yield a more in-depth analysis of a given result.
1. Submit workflow at https://txms.org.
2. Run Cheetah-MS.
NOTE: Convert the vendor-specific formats to mzML or MGF using the ProteoWizard MSConvert software19.
3. Install JupyterHub.
4. Download the report.
5. Extend the report.
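The format conversion mentioned in the NOTE under step 2 can be scripted when many vendor files need to be converted. The following is a minimal sketch, assuming the ProteoWizard msconvert command-line tool is installed and on the PATH; the helper name msconvert_command and the file and folder names are hypothetical and only serve as illustration.

```python
import shlex
from pathlib import Path


def msconvert_command(raw_file, out_dir, fmt="mzML"):
    """Build an msconvert command line for one vendor file.

    fmt is either "mzML" or "mgf", the two formats named in the protocol.
    """
    if fmt not in ("mzML", "mgf"):
        raise ValueError(f"unsupported output format: {fmt}")
    return ["msconvert", str(Path(raw_file)), f"--{fmt}", "-o", str(Path(out_dir))]


# Hypothetical file and folder names, for illustration only.
cmd = msconvert_command("sample_DDA.raw", "converted")
print(shlex.join(cmd))
# To actually run the conversion (requires ProteoWizard on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems with file names that contain spaces.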
TX-MS provides structural outputs supported by MS-derived experimental constraints. It works by combining different MS data acquisition types with computational modeling. Therefore, it is helpful to parse each type of MS data separately and to visualize the output structure. Supplementary Data 1 contains an example notebook that can parse DDA and DIA data produced as TX-MS output. Users can select the XL of interest. Running the notebook then displays the MS2 spectrum of that XL, with different colors discriminating between fragments related to the first peptide, the second peptide, and the combinatorial fragment ions. The XL can also be mapped to the structure using the NGLView widget embedded in the Jupyter Notebook.
Another cell in this notebook can help users to parse and visualize DIA data. However, visualizing DIA data is more difficult because the analyzed data need to be prepared in the correct format.
Figure 1 shows an example structure of M1 and albumin with top XLs mapped on the structure. TX-MS obtained all XLs after parsing hrMS1, DDA, and DIA data, and the RosettaDock protocol provided the computational models.
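Mapping XLs onto a structure, as described above, could be sketched in a notebook cell as follows. This is an illustration rather than the code used in the report: the helper names, the cross-link tuples, and the file name model.pdb are hypothetical, and the snippet assumes that NGLView accepts the distance representation with atom pairs given as NGL selection strings. The red lines with black labels mirror the color scheme of Figure 1.

```python
def crosslink_to_atom_pair(crosslink):
    """Convert ((chain, residue), (chain, residue)) into two NGL selections.

    The selections point at the C-alpha atoms, e.g. "45:A.CA".
    """
    (chain_a, res_a), (chain_b, res_b) = crosslink
    return [f"{res_a}:{chain_a}.CA", f"{res_b}:{chain_b}.CA"]


def show_crosslinks(pdb_path, crosslinks):
    """Return an NGLView widget with the cross-links drawn as distances."""
    import nglview as nv  # imported here so the helper above has no dependency

    view = nv.show_structure_file(pdb_path)
    view.add_representation(
        "distance",
        atom_pair=[crosslink_to_atom_pair(xl) for xl in crosslinks],
        color="red",
        label_color="black",
    )
    return view


# Hypothetical cross-links between chain A and chain B.
example_xls = [(("A", 45), ("B", 212)), (("A", 98), ("B", 351))]
print([crosslink_to_atom_pair(xl) for xl in example_xls])
# In a notebook cell: show_crosslinks("model.pdb", example_xls)
```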
As this report is a Jupyter Notebook, any valid Python code can be added to new notebook cells. For example, the code below will create a histogram over the MS2 counts, indicating how well supported each cross-link is by the underlying data.
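import seaborn as sns
# ms2 is the table of MS2 observations already loaded in the report notebook
sns.histplot(ms2['count']);  # histplot replaces the deprecated distplot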
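Beyond plotting, the same counts can be used to rank cross-links by how well they are supported. The sketch below works on plain Python records so that it runs without the report data; the column name xl and the example values are hypothetical, and in the notebook such records might be obtained with ms2.to_dict("records") if ms2 is a pandas DataFrame.

```python
def rank_by_support(records, min_count=1):
    """Keep cross-links with at least min_count MS2 observations, best first."""
    kept = [r for r in records if r["count"] >= min_count]
    return sorted(kept, key=lambda r: r["count"], reverse=True)


# Hypothetical rows; in the report these might come from ms2.to_dict("records").
example = [
    {"xl": "K45-K212", "count": 7},
    {"xl": "K98-K351", "count": 1},
    {"xl": "K12-K300", "count": 3},
]
for row in rank_by_support(example, min_count=2):
    print(row["xl"], row["count"])
```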
Figure 1: Structural model of Streptococcus pyogenes M1 protein and human albumin with XLs mapped on the structure. The M1 protein is shown in gray and constitutes a homodimer. The six albumin molecules are presented as pairs in various shades of blue. Cross-links and distances are given in red with black text.
Supplementary File. Jupyter notebook data.
Modern computational workflows are often complex, with multiple tools from many different vendors, complex interdependencies, high data volumes, and multifaceted results. Consequently, it is increasingly difficult to accurately document all the steps required to obtain a result, which in turn makes the result hard to reproduce. Here, we demonstrate a general strategy that combines the ease of an automated workflow that produces a generic report with the flexibility to customize the report in a reproducible fashion.
Three requirements need to be fulfilled for the protocol to work. First, the proteins selected for analysis need to interact in such a manner that the chemical cross-linking experiment can produce cross-linked species at a sufficiently high concentration to be detected by the mass spectrometer; detection limits differ between mass spectrometers and also depend on the acquisition protocol and the choice of cross-linking reagent. The current version of the TX-MS protocol only allows for DSS, a lysine-lysine homobifunctional cross-linking reagent, primarily because the machine learning step might need to be adjusted for other reagents. The Cheetah-MS web server relaxes this limitation by supporting two more cross-linking reagents, although all three are non-cleavable. Second, the two proteins need either to have an experimentally determined structure or to be modeled using comparative modeling or de novo techniques. Not all proteins can be modeled, but a combination of improved software and a constant deposition of experimental structures in the PDB expands the number of proteins that can be modeled. Third, the interacting proteins should remain sufficiently similar in their bound and unbound states so that the docking algorithms used by TX-MS and Cheetah-MS can create quaternary structures of adequate quality to enable scoring. This requirement is relatively vague, as acceptable quality is highly system-dependent; smaller proteins of known structure are generally easier to compare than larger proteins of unknown structure.
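Because DSS links lysine to lysine, the set of candidate inter-protein cross-links for a pair of interactors can be enumerated directly from the two sequences. The sketch below is a simplification under stated assumptions: it considers only lysine side chains (not protein N-termini), uses toy sequences rather than real proteins, and the function names are hypothetical and not part of TX-MS or Cheetah-MS.

```python
from itertools import product


def lysine_positions(sequence):
    """Return 1-based positions of lysine (K) residues in a sequence."""
    return [i for i, aa in enumerate(sequence.upper(), start=1) if aa == "K"]


def candidate_inter_links(seq_a, seq_b):
    """Enumerate all lysine-lysine pairs between two proteins."""
    return list(product(lysine_positions(seq_a), lysine_positions(seq_b)))


# Toy sequences, not real proteins.
pairs = candidate_inter_links("MKTAYKLA", "GKSSK")
print(len(pairs), pairs)
```

The number of candidate pairs grows with the product of the lysine counts, which illustrates why the search space becomes large for full-length proteins.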
In case of a negative result, first check that TX-MS found intra-links, that is, cross-links between residues that are part of the same polypeptide chain. If none are discovered, the most likely explanation is that something went wrong with the sample preparation or the data acquisition. If multiple distance constraints do not support the models, visually inspect the models to ensure that the conformation is supported by cross-linked residues. In a well-supported conformation, there is no obvious way to pivot one of the interactors without disrupting at least one cross-link. If there are cross-links longer than the permitted distance for the given cross-linking reagent, try to improve the modeling of the interactors by incorporating cross-linking data.
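The check for cross-links that exceed the permitted distance can be automated. The following is a minimal sketch, not the scoring used by TX-MS: it measures C-alpha to C-alpha distances from standard fixed-column PDB ATOM records, and the default cutoff of 30 angstroms for DSS is an assumption that should be adjusted to the reagent and distance definition in use. All function names, coordinates, and cross-links are hypothetical.

```python
import math


def ca_coordinates(pdb_text):
    """Map (chain, residue number) to C-alpha coordinates from PDB ATOM records."""
    coords = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def overlong_crosslinks(coords, crosslinks, max_distance=30.0):
    """Return (cross-link, distance) for links longer than max_distance in angstroms.

    The 30 angstrom default is an assumed C-alpha cutoff for DSS; adjust as needed.
    """
    flagged = []
    for res_a, res_b in crosslinks:
        distance = math.dist(coords[res_a], coords[res_b])
        if distance > max_distance:
            flagged.append(((res_a, res_b), round(distance, 1)))
    return flagged


# Hypothetical coordinates and cross-links; in practice one might use
# coords = ca_coordinates(open("model.pdb").read())
coords = {
    ("A", 45): (0.0, 0.0, 0.0),
    ("B", 212): (10.0, 0.0, 0.0),
    ("B", 351): (0.0, 40.0, 0.0),
}
links = [(("A", 45), ("B", 212)), (("A", 45), ("B", 351))]
print(overlong_crosslinks(coords, links))
```

Cross-links flagged this way are candidates for the visual inspection described above before the models of the interactors are revised.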
It is possible to use alternative software applications to accomplish equivalent results provided that the sensitivity of the chosen software is comparable to the sensitivity of TX-MS. For example, there are online versions of RosettaDock, HADDOCK, and others. It is also possible to analyze chemical cross-linking data through xQuest/xProphet5,6, pLink7, and SIM-XL26.
We are continuously applying TX-MS and Cheetah-MS to new projects27,28,29, thereby improving the reports produced by these approaches to allow for a more detailed analysis of results without making the reports larger.
The authors have nothing to disclose.
This work was supported by the Foundation of Knut and Alice Wallenberg (grant no. 2016.0023) and the Swiss National Science Foundation (grant no. P2ZHP3_191289). In addition, we thank S3IT, University of Zurich, for its computational infrastructure and technical support.
Two Protein DataBank files of the proteins of interest. | N/A | N/A | Example files available on txms.org and zenodo.org, DOI 10.5281/zenodo.3361621 |
An mzML data file acquired on a sample where the proteins of interest were cross-linked. | N/A | N/A | Example files available on txms.org and zenodo.org, DOI 10.5281/zenodo.3361621 |