This article describes the use of a software application, mAbScale, for the calculation of masses for monoclonal antibody-based protein therapeutics.
Biotherapeutic masses are a means of verifying identity and structural integrity. Mass spectrometry (MS) of intact proteins or protein subunits provides an easy analytical tool for different stages of biopharmaceutical development. The protein’s identity is confirmed when the experimental mass from MS is within a pre-defined mass error range of the theoretical mass. While several computational tools exist for the calculation of protein and peptide molecular weights, they either were not designed for direct application to biotherapeutic entities, have access limitations due to paid licenses, or require uploading protein sequences to host servers.
We have developed a modular mass calculation routine that enables the easy determination of the average or monoisotopic masses and elemental compositions of therapeutic glycoproteins, including monoclonal antibodies (mAb), bispecific antibodies (bsAb), and antibody-drug conjugates (ADC). The modular nature of this Python-based calculation framework will allow the extension of this platform to other modalities such as vaccines, fusion proteins, and oligonucleotides in the future, and this framework could also be useful for the interrogation of top-down mass spectrometry data. By creating an open-source standalone desktop application with a graphical user interface (GUI), we hope to overcome the restrictions around use in environments where proprietary information cannot be uploaded to web-based tools. This article describes the algorithms and application of this tool, mAbScale, to different antibody-based therapeutic modalities.
Over the past two decades, biotherapeutics have evolved to become a mainstay of the modern pharmaceutical industry. The SARS-CoV-2 pandemic and other life-threatening conditions have further increased the need for the faster and broader development of biopharmaceutical molecules1,2,3.
The biotherapeutic molecular weight is critical for the identification of the molecule, in combination with other analytical assays. The intact and reduced subunit masses are used throughout the discovery and development lifecycles as part of control strategies aimed at maintaining the quality, as described in the QTPP (Quality Target Product Profile)4.
Analytical development in the biopharmaceutical industry relies heavily on mass measurements for intact mass analysis and deep characterization using peptide mapping or multi-attribute method (MAM) monitoring. At the center of these techniques utilizing modern mass spectrometry (MS) platforms is the ability to provide high-resolution accurate mass (HR/AM) measurements. Most HR/AM instruments yield mass accuracies in the range of 0.5-5 ppm, which scale with the mass range. The ability to measure masses accurately for intact large molecules enables the quick and confident identification of large-molecule therapeutics. As isotopic resolution cannot be attained using typical experimental conditions for large molecules (>10 kDa), average masses must be calculated for comparison and identification5,6.
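To make the scaling of ppm accuracy with mass concrete, the following minimal Python sketch converts a relative accuracy in ppm into an absolute window in Da and back. The function names and the 150,000 Da example mass are illustrative assumptions and are not part of mAbScale.

```python
def ppm_to_da(ppm, mass_da):
    """Convert a relative mass accuracy (ppm) to an absolute window (Da)."""
    return ppm * mass_da / 1e6


def ppm_error(measured_da, theoretical_da):
    """Signed error of a measurement, in ppm of the theoretical mass."""
    return (measured_da - theoretical_da) / theoretical_da * 1e6


# Hypothetical intact mass of about 150 kDa: 5 ppm corresponds to 0.75 Da.
window_da = ppm_to_da(5, 150_000)
```

The same 5 ppm applied to a 25 kDa subunit gives a window of 0.125 Da, which shows how the absolute error grows with the size of the molecule.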
A typical intact or subunit protein mass spectrum represents the overall proteoform profile, which contains composite information on the various molecular forms resulting from post-translational modifications (PTM) and any primary structure differences, such as clips or sequence variants. The relatively easy and high-throughput nature of these measurements makes them attractive for characterization and as in-process monitoring controls7,8. Data analysis for these experiments usually requires the user to define the search space for molecular forms (range of PTMs or other molecular forms). For glycosylated proteins, this search space is largely driven by glycoform heterogeneity. Combinations of multiple PTMs, disulfide bond configurations, and other variations along the primary structure make calculating all the possible molecular forms a tedious task. Therefore, the manual calculation of the possible molecular forms is a time- and resource-consuming process with a high potential for human error.
Here, we present a mass calculation tool that was developed considering the most important features of biotherapeutic molecules, such as mAbs, bsAbs, ADCs, etc. The tool allows the easy incorporation of search-space variables for the consistent calculation of masses and elemental compositions. The modular nature of this tool will enable it to be further developed and applied to mass calculation and mass matching for other modalities.
The GUI module allows the user to specify the input for the mass calculation, as shown in Figure 1; specifically, the user enters single-letter amino acid sequences for light and heavy antibody chains. Common modifications for heavy-chain N-terminal cyclization and C-terminal lysine clipping are included as check boxes. Further, the chemical formula/elemental composition can be added/subtracted from these protein chains through the respective Chem Mod text box. This allows the user the flexibility to add an elemental composition that includes multiple post-translational modifications or a small-molecule payload in the case of an ADC. As most therapeutic mAbs are engineered to remove the glycosylation sites in the light chain, glycosylation in the light chain is left optional and can be specified using a check box on the GUI.
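The idea of adding or subtracting an elemental composition can be sketched in a few lines of Python. This is a minimal sketch, not the mAbScale implementation: the exact syntax accepted by the Chem Mod text box is not described here, so the parser below assumes plain element-count notation (for example, C6H12N2O) without parentheses or charges, and the function names are hypothetical.

```python
import re


def parse_formula(formula):
    """Parse a simple formula such as 'C6H12N2O' into an element-count dict.

    Assumption: element symbols followed by optional integer counts only.
    """
    composition = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        composition[element] = composition.get(element, 0) + (int(count) if count else 1)
    return composition


def apply_modification(composition, formula, add=True):
    """Return a new composition with the formula added (or subtracted)."""
    sign = 1 if add else -1
    updated = dict(composition)
    for element, count in parse_formula(formula).items():
        updated[element] = updated.get(element, 0) + sign * count
    return updated


# Hypothetical chain composition with a water loss applied to it.
chain = {"C": 10, "H": 20, "N": 3, "O": 4}
dehydrated = apply_modification(chain, "H2O", add=False)
```

Keeping modifications as elemental compositions rather than as mass offsets means that average and monoisotopic masses can both be derived later from the same bookkeeping.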
A typical variation on intact mass analysis for antibodies is a reduced subunit mass analysis, where the light chain is detached from the heavy chain by reducing the interchain disulfide bonds. Depending on the strength of the reducing agent used, the intrachain disulfide bonds may or may not be cleaved. Users can enter the total number of disulfide bonds depending on the IgG subtype or in the case of a cysteine-conjugated ADC9.
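Each disulfide bond forms from two cysteine thiols with the loss of two hydrogen atoms, so the number of bonds entered translates directly into a hydrogen adjustment. The sketch below illustrates that bookkeeping under stated assumptions: the function names are hypothetical, and the hydrogen average mass is a commonly tabulated value that may differ from the default used by the tool.

```python
H_AVERAGE_MASS = 1.00794  # assumed value; the tool's own element table may differ


def adjust_for_disulfides(composition, n_disulfides):
    """Remove two hydrogens per disulfide bond from an elemental composition."""
    adjusted = dict(composition)
    adjusted["H"] = adjusted.get("H", 0) - 2 * n_disulfides
    return adjusted


def disulfide_mass_shift(n_disulfides, h_mass=H_AVERAGE_MASS):
    """Average mass change (Da) caused by forming n disulfide bonds."""
    return -2 * n_disulfides * h_mass


# An IgG1 is commonly described as having 16 disulfide bonds
# (12 intrachain and 4 interchain), i.e. 32 fewer hydrogens than
# the fully reduced chains.
shift = disulfide_mass_shift(16)
```

For 16 bonds the shift is roughly -32.25 Da, which is larger than the reduced-subunit acceptance window quoted later in this article, so the bond count cannot be ignored.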
The application calculates masses in a bottom-up manner, in which the elemental compositions are first calculated for the individual heavy chains and light chains. Next, heavy chain (HC) N-terminal cyclization and C-terminal Lys-clipping are accounted for by adjusting the calculated elemental compositions. Any specified chemical modifications are then applied to the heavy and/or light chains. Depending on the type of analysis and the disulfide-bond patterns specified by the user, the number of hydrogens is adjusted for the two polypeptide chains. The glycosylated HC and light chain (LC) (optional) masses are calculated based on the user's input. Finally, multiple HC and LC masses are combined, and the disulfide bond numbers are automatically updated for the intact mass calculation.
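The first two steps of this bottom-up calculation can be sketched as follows. This is a minimal illustration rather than the mAbScale source: the function and argument names are hypothetical, the residue table lists standard amino acid residue compositions (amino acid minus water), and cyclization is modeled as loss of NH3 from N-terminal Gln or loss of H2O from N-terminal Glu.

```python
ELEMENTS = ("C", "H", "N", "O", "S")

# Residue compositions (amino acid minus H2O) as (C, H, N, O, S).
RESIDUES = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}


def chain_composition(sequence, n_term_cyclization=False, c_term_lys_clip=False):
    """Elemental composition of a linear, fully reduced polypeptide chain."""
    seq = sequence.upper()
    if c_term_lys_clip and seq.endswith("K"):
        seq = seq[:-1]  # drop the C-terminal lysine residue
    totals = dict.fromkeys(ELEMENTS, 0)
    for residue in seq:
        for element, count in zip(ELEMENTS, RESIDUES[residue]):
            totals[element] += count
    # Add one water for the free N- and C-termini.
    totals["H"] += 2
    totals["O"] += 1
    if n_term_cyclization:
        if seq[0] == "Q":    # pyroglutamate from Gln: loss of NH3
            totals["N"] -= 1
            totals["H"] -= 3
        elif seq[0] == "E":  # pyroglutamate from Glu: loss of H2O
            totals["H"] -= 2
            totals["O"] -= 1
    return totals


# Toy dipeptide instead of a real antibody chain.
dipeptide = chain_composition("GK")
```

Chemical modifications, disulfide hydrogens, and glycans would then be layered onto the returned dictionary before it is converted to a mass.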
With larger molecules such as intact proteins, monoisotopic masses cannot be measured due to the additive mass defect when using mass spectrometers with typical resolving power. Instead, nominal or average masses are measured or reported5,10,11,12,13. The average elemental masses can vary based on the source used for the curated masses14,15. While the differences in elemental masses may be small, they can add up to significant values for large-molecule molecular weight calculations. The average elemental masses used by default in the software application are shown in Supplementary Table 1. For regulated environments like the biopharmaceutical research and development (R&D) field, it is important to maintain consistent molecular masses because changes in masses may imply changes to the molecular entity during regulatory filings. To enable consistency in the use of elemental masses, a dictionary of elemental masses is included with the software tool as a comma-separated value (csv) text file: Element_Mass.csv (Supplementary Coding File 1). Similarly, a curated list of glycan compositions typically seen on mAbs is included: Glycan.csv (Supplementary Coding File 2). Both files are saved in the same folder location as the executable application and can be modified by the user to use a specific elemental mass list or glycan library.
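A minimal sketch of how an element-mass table can drive an average-mass calculation is shown below. The column names (Element, Mass) and the numeric values are assumptions made for illustration; the actual layout and defaults of Element_Mass.csv should be taken from the supplementary file.

```python
import csv
import io

# Stand-in for Element_Mass.csv; header names and values are assumed.
SAMPLE_CSV = """Element,Mass
C,12.0107
H,1.00794
N,14.0067
O,15.9994
S,32.065
"""


def load_element_masses(handle):
    """Read an element-to-average-mass mapping from an open CSV handle."""
    return {row["Element"]: float(row["Mass"]) for row in csv.DictReader(handle)}


def average_mass(composition, element_masses):
    """Sum element counts multiplied by their average masses."""
    return sum(element_masses[element] * count for element, count in composition.items())


element_masses = load_element_masses(io.StringIO(SAMPLE_CSV))
water_mass = average_mass({"H": 2, "O": 1}, element_masses)
```

With a file on disk, `open("Element_Mass.csv", newline="")` would replace the `io.StringIO` object, provided the header matches the assumed column names. As a rough illustration of why the source of the table matters, a 0.0003 Da difference in the average mass of carbon across a hypothetical 6,500 carbon atoms shifts the total by about 2 Da.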
Figure 1: GUI for the mAbScale application. The GUI module allows the user to specify the input for the mass calculation. The user enters single-letter amino acid sequences for the light and heavy antibody chains. Common modifications for the heavy-chain N-terminal cyclization and C-terminal lysine clipping are included as check boxes. Chemical formulas/elemental compositions can be added/subtracted through the respective Chem Mod text box.
The high-level workflow for mAbScale is shown in Figure 2. Each step has more sophisticated inner decision branches, loops, and combinatorics. A detailed algorithmic workflow describing the calculation process is presented in Supplementary Figure 1. The application output is saved in a spreadsheet format in the user-selected folder. The output file consists of multiple separate worksheets, which can be categorized as the user input, molecular weight calculations, and references for the average isotopic mass derivations (example output is provided in supplemental tables). The user input worksheets include the protein amino acid sequences and other information entered by the user, averaged elemental masses, and glycan masses, which are used to calculate the elemental composition and different molecular weights. The molecular weight calculation sheets include the chemical composition of various forms, the reduced mass with and without glycosylation and chemical modification, and the intact mass with and without glycosylation and chemical modification. Sheets containing half-antibody masses will be generated automatically if the user enters two different HCs and/or two different LCs in the user input page, since half-antibodies are primary impurities that need to be identified and quantified relative to the desired heterodimer. The source code for mAbScale can be accessed through the following repository: https://github.com/kkhatri99/mAbScale.
Figure 2: Overview of the steps involved in the calculation of elemental compositions and masses using the application. Color coding can be used to link to the process flow described in Supplementary Figure 1.
1. Opening the mAbScale application
2. Sequence entry
3. Specifying the number of disulfide bonds
4. Setting the output folder and running the application
Several mAbs were selected to represent different antibody formats. A commercially available mAb standard was selected to represent a conventional mAb with identical heavy chains, identical light chains, and one N-linked glycosylation site in the Fc region. A mAb with an additional light chain N-linked glycosylation, a bispecific mAb, and an antibody-drug conjugate (ADC) mAb were also chosen to widen the application usage. The chemical composition, calculated mass, measured mass, and mass error of these example mAbs are summarized in Table 1. The protein chemical compositions and calculated masses reported by mAbScale were confirmed by GPMAW16, a program for protein and peptide primary structure analysis.
For the intact mass analysis, the mAb samples were diluted to 1 mg/mL using LC-MS grade water and injected for analysis. For the reduced analysis, the samples were first treated with dithiothreitol and incubated at 37 °C for 15 min to cleave the inter-chain disulfide bonds. All the samples were analyzed using an Acquity UPLC system coupled to a mass spectrometer. A BEH 200 SEC column was employed for online desalting and the separation of the heavy and light chains using an isocratic method with water/acetonitrile (65:35) and 0.1% TFA as the mobile phase. The mass spectrometer was operated in positive ion mode, and the data were acquired with a scan range of 700-5,000 m/z.
The intact and reduced workflows of Protein Metrics, Inc. (PMi) Byos were used to process the intact and reduced raw spectra, respectively. The protein mass range was set to 143,000-163,000 Da for the intact mass deconvolution, 47,000-53,000 Da for the HC mass deconvolution, and 20,000-27,000 Da for the LC mass deconvolution. For the automated mass/peak-picking, the minimum difference between the mass peaks was set to 15 Da, and the maximum number of mass peaks was limited to 10. A list of expected glycans was entered/selected for the mass matching tab, and the upper limit for the mass matching tolerance was set to 10 Da.
The mass errors between the calculated and measured masses were small and fell within the normal mass error acceptance criteria (≤10 Da for intact mAbs and ≤5 Da for reduced heavy and light chains), suggesting that the calculated masses were accurate17.
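The acceptance check described here amounts to a simple comparison against a Da threshold chosen by analysis type. The sketch below uses the limits quoted above; the dictionary keys, the function name, and the example masses are hypothetical and are not values from Table 1.

```python
# Acceptance limits in Da, as quoted in the text above.
TOLERANCE_DA = {"intact": 10.0, "reduced_hc": 5.0, "reduced_lc": 5.0}


def mass_matches(measured_da, calculated_da, analysis):
    """Return (within_limit, signed_error_da) for one measured mass."""
    error = measured_da - calculated_da
    return abs(error) <= TOLERANCE_DA[analysis], error


# Hypothetical intact example: a 3.3 Da error passes the 10 Da limit.
ok_intact, err_intact = mass_matches(148060.3, 148057.0, "intact")

# Hypothetical heavy-chain example: a 6 Da error fails the 5 Da limit.
ok_hc, err_hc = mass_matches(50006.0, 50000.0, "reduced_hc")
```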
For the calculation of the ADC theoretical masses, a chemical modification with the linker/payload elemental composition can be added to specific mAb subunits. However, only the molecular weight of one drug load ratio mass will be included in the output. The composite molecular mass of antibodies with different drug load ratios must be added manually by the user. These capabilities could be added in a later version of mAbScale or could be modified with community support, given the open-source nature of this project.
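Until multiple drug loads are supported directly, the series of drug-loaded masses can be tabulated outside the tool. The following minimal sketch assumes a simple additive model in which each conjugated drug contributes a fixed net mass; the function name, the drug loads, and the numeric values are all hypothetical. Any hydrogens displaced on conjugation would need to be reflected in the net linker/payload mass supplied.

```python
def dar_mass_series(antibody_mass, net_payload_mass, drug_loads=(0, 2, 4, 6, 8)):
    """Map each drug load to antibody mass plus that many payload additions."""
    return {load: antibody_mass + load * net_payload_mass for load in drug_loads}


# Hypothetical masses in Da, chosen only to keep the arithmetic easy to follow.
series = dar_mass_series(148000.0, 1000.0)
for load, mass in series.items():
    print(f"drug load {load}: {mass:.1f} Da")
```

The even drug loads used as the default here are only an example; the appropriate set depends on the conjugation chemistry of the molecule.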
Table 1: Comparison of the calculated and measured masses for various mAb subunits and molecular forms. The chemical compositions, calculated masses, measured masses, and mass errors of example mAbs are summarized in this table.
Supplementary Figure 1: Detailed algorithmic workflow for mAbScale.
Supplementary Table 1: The calculated average elemental masses used in mAbScale14,15.
Supplementary Coding File 1: List of elemental masses.
Supplementary Coding File 2: List of glycans.
Supplementary Coding File 3: Bundled application- mAbScale executable.
mAbScale provides an intuitive user interface with the flexibility to alter the building blocks for mass and elemental calculations. The users are expected to have a basic understanding of the target molecule to use the application, derive correct masses, and interpret the results. For example, the intact or reduced mass output sheet can be overwhelming due to the numerous rows of intact or reduced masses, since the default glycan database contains 88 N-linked glycans that are commonly found in the Fc portion of therapeutic antibodies, and the application calculates all the possible glycoform masses that are included in the database18,19. While most therapeutic mAbs are engineered to remove glycosylation in the Fab region, some mAbs might retain this glycosylation site, and this could further increase the total number of glycosylated proteoforms. Users are advised to curate a glycan database that focuses on the most appropriate glycoforms for a given molecule to reduce the complexity of the output and to better align the results with the measured masses for mass peak identification.
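The growth of the output with the size of the glycan library can be illustrated with a short sketch. It assumes that glycoforms of an intact antibody are counted as unordered combinations of glycans across the glycosylation sites, which may not be how mAbScale enumerates them. The three glycan masses are approximate average values used only for illustration, and the base mass is hypothetical.

```python
import math
from itertools import combinations_with_replacement

# Approximate average glycan masses (Da); illustrative, not from Glycan.csv.
GLYCANS = {"G0F": 1445.3, "G1F": 1607.5, "G2F": 1769.6}


def glycoform_masses(base_mass, glycans, n_sites=2):
    """Masses for every unordered assignment of glycans to n_sites sites."""
    forms = {}
    for combo in combinations_with_replacement(sorted(glycans), n_sites):
        forms["/".join(combo)] = base_mass + sum(glycans[name] for name in combo)
    return forms


def n_glycoforms(n_glycans, n_sites=2):
    """Number of unordered glycan combinations across n_sites sites."""
    return math.comb(n_glycans + n_sites - 1, n_sites)


forms = glycoform_masses(145000.0, GLYCANS)  # hypothetical aglycosylated mass
count_default_library = n_glycoforms(88)     # 88 glycans on two sites
```

Under this counting assumption, three glycans on two sites give six glycoforms, whereas 88 glycans give 3,916, which is why trimming the library to the glycans expected for a given molecule makes the output far easier to review.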
The level of complexity increases further with bsAbs due to the heterogeneity of the light and heavy chains. This software application generates all the possible permutations and combinations with the provided LC and HC sequences and glycoforms to allow for the generation of all the potential by-products from the mispairing or incomplete pairing of the antibody subunits, such as half-antibodies. This leaves it up to the user to filter out the most appropriate proteoforms for their use. The software output divides glycosylated and non-glycosylated outputs into separate worksheets, which makes it easier for the user to review. The intact and reduced molecular masses are also segregated, and all possible half-antibody combinations for bsAbs are listed in a dedicated worksheet to further simplify the uptake of the processed results.
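One way to picture the combinatorics for a bispecific antibody is to enumerate half-antibodies as heavy/light chain pairs and full assemblies as unordered pairs of half-antibodies. The sketch below follows that assumption; the actual enumeration logic in mAbScale is described in Supplementary Figure 1 and may differ, and the chain labels and function name are hypothetical.

```python
from itertools import combinations_with_replacement, product


def enumerate_assemblies(heavy_chains, light_chains):
    """List half-antibodies (HC, LC) and unordered pairs of them."""
    half_antibodies = list(product(heavy_chains, light_chains))
    full_antibodies = list(combinations_with_replacement(half_antibodies, 2))
    return half_antibodies, full_antibodies


halves, fulls = enumerate_assemblies(["HC1", "HC2"], ["LC1", "LC2"])
print(len(halves), "half-antibodies;", len(fulls), "full assemblies")
```

With two distinct heavy chains and two distinct light chains this yields 4 half-antibodies and 10 full assemblies, only one of which is the intended heterodimer with correctly paired light chains. Each assembly would then be multiplied by the glycoform combinations, which is why filtering by the user remains necessary.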
A limitation of the current software version is that the application calculates the ADC masses with only one drug-to-antibody ratio at one time, since the payload chemical structure is entered in the Heavy Chain Chem Mod and Light Chain Chem Mod text boxes. For each drug-to-antibody ratio (DAR), the elemental composition needs to be entered by the user for recalculation.
The ability to calculate masses for intact proteins is provided by several applications, but they either require a commercial license to be purchased or are web-based tools that require the protein sequences to be uploaded16,20,21. These applications offer very limited flexibility to the user for adding custom chemical modifications or easily incorporating intramolecular bonds, such as disulfides. Further, the value of web-based applications is limited when proprietary and confidential information is involved, such as in pharmaceutical development or other controlled environments, because the biotherapeutic sequence information cannot be uploaded to external servers. Consequently, researchers must rely on either manual calculations or programmatic routines that are less flexible, difficult to disseminate, and could lead to inconsistencies.
We have developed an open-source framework for the calculation of molecular mass and elemental composition with a focus on alleviating the restrictions associated with the existing applications. The standalone desktop application with a GUI will overcome the restrictions associated with uploading proprietary information to external servers and enable easy access for users. This tool can be used for the most common biotherapeutic modalities, including mAbs, bsAbs, and ADCs. Further, the range of modifications and source elemental masses can be easily customized to fit the user's needs. The flexible nature of this workflow will allow future development to include applications to other therapeutic modalities, like non-mAb protein therapeutics, multi-subunit vaccines, and oligonucleotides or mRNA. By making this framework open-source, we hope to engage the community in further development and adaptation to other modalities, as well as in adding more features, such as the calculation of theoretical fragments for top-down MS data interrogation.
The authors have nothing to disclose.
The authors thank Robert Schuster for assistance with data verification.
Acquity UPLC system | Waters Corp., Milford, MA | N/A | Modular system |
Antibody-drug conjugate (ADC) | GlaxoSmithKline | N/A | Proprietary molecule |
BEH 200 SEC column | Waters Corp., Milford, MA | 176003904 | |
Bispecific mAb | GlaxoSmithKline | N/A | Proprietary molecule |
Byos | Protein Metrics, Cupertino, CA | https://proteinmetrics.com/byos/ | Version 4.5 |
GPMAW | GPMAW | http://www.gpmaw.com/ | |
LC-MS grade water | Thermo Fisher Scientific, Waltham, MA | W6-1 | |
mAb standard | Waters Corp., Milford, MA | 186009125 | Waters Humanized mAb Mass Check Standard |
mAbScale | GlaxoSmithKline | Apache License, Version 2.0 | |
Xevo G2 Q-TOF mass spectrometer | Waters Corp., Milford, MA | N/A | Modular system |