IDBac is an open-source mass spectrometry-based bioinformatics pipeline that integrates data from both intact protein and specialized metabolite spectra, collected on cell material scraped from bacterial colonies. The pipeline allows researchers to rapidly organize hundreds to thousands of bacterial colonies into putative taxonomic groups, and further differentiate them based on specialized metabolite production.
In order to visualize the relationship between bacterial phylogeny and specialized metabolite production of bacterial colonies growing on nutrient agar, we developed IDBac, a low-cost and high-throughput matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) bioinformatics pipeline. IDBac software is designed for non-experts, is freely available, and is capable of analyzing from a few to thousands of bacterial colonies. Here, we present procedures for the preparation of bacterial colonies for MALDI-TOF MS analysis, MS instrument operation, and data processing and visualization in IDBac. In particular, we instruct users on how to cluster bacteria into dendrograms based on protein MS fingerprints and interactively create Metabolite Association Networks (MANs) from specialized metabolite data.
A major barrier for researchers who study bacterial function is the difficulty of quickly and simultaneously assessing the taxonomic identity of a microorganism and its capacity to produce specialized metabolites. This has prevented significant advances in understanding the relationship between bacterial phylogeny and specialized metabolite production in the majority of bacteria isolated from the environment. Although MS-based methods that use protein fingerprints to group and identify bacteria are well described1,2,3,4, these studies have generally been performed on small groups of isolates, in a species-specific manner. Importantly, information on specialized metabolite production, a major driver of microbial function in the environment, has remained unincorporated in these studies. Silva et al.5 recently provided a comprehensive history detailing the underuse of MALDI-TOF MS to analyze specialized metabolites and the shortage of software to relieve current bioinformatics bottlenecks. In order to address these shortcomings, we created IDBac, a bioinformatics pipeline that integrates both linear and reflectron modes of MALDI-TOF MS6. This allows users to rapidly visualize and differentiate bacterial isolates based on protein and specialized metabolite MS fingerprints, acquired in linear and reflectron mode, respectively.
IDBac is cost-effective, high-throughput, and designed for the lay user. It is freely available (chasemc.github.io/IDBac), and only requires access to a MALDI-TOF mass spectrometer (reflectron mode will be required for specialized metabolite analysis). Sample preparation relies on the simple “extended direct transfer” method7,8 and data are collected with consecutive linear and reflectron acquisitions on a single MALDI-target spot. With IDBac, it is possible to analyze the putative phylogeny and specialized metabolite production of hundreds of colonies in under four hours, including sample preparation, data acquisition, and data visualization. This presents a significant time and cost advantage over traditional methods of identifying bacteria (such as gene sequencing), and analyzing metabolic output (liquid chromatography-mass spectrometry [LCMS] and similar chromatographic methods).
Using data obtained in linear mode analysis, IDBac employs hierarchical clustering to represent the relatedness of protein spectra. Since the spectra mostly represent ionized ribosomal proteins, they provide a representation of the phylogenetic diversity present in a sample. In addition, IDBac incorporates reflectron mode data to display specialized metabolite fingerprints as Metabolite Association Networks (MANs). MANs are bipartite networks that allow for easy visualization of shared and unique metabolite production between bacterial isolates. The IDBac platform allows researchers to analyze both protein and specialized metabolite data in tandem but also individually if only one data-type is acquired. Importantly, IDBac processes raw data from Bruker and Xiamen instruments, as well as txt, tab, csv, mzXML, and mzML. This eliminates the need for manual conversion and formatting of data sets, and significantly reduces the risk of user error or mishandling of MS data.
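The bipartite structure of a MAN can be illustrated with a minimal Python sketch. It is not IDBac's implementation: the helper names, the 0.5 Da bin width, and the m/z values are hypothetical and chosen only to show how sample nodes connect to shared and unique peak nodes.

```python
from collections import defaultdict


def build_man(peaks_by_sample, bin_width=0.5):
    """Return sorted bipartite edges (sample, binned m/z).

    Hypothetical helper: peaks are binned by rounding m/z to the nearest
    multiple of bin_width so that near-identical masses share one node.
    """
    edges = set()
    for sample, mzs in peaks_by_sample.items():
        for mz in mzs:
            node = round(mz / bin_width) * bin_width
            edges.add((sample, node))
    return sorted(edges)


def shared_and_unique(edges):
    """Split peak nodes into those linked to several samples and to one."""
    samples_per_peak = defaultdict(set)
    for sample, node in edges:
        samples_per_peak[node].add(sample)
    shared = {n: sorted(s) for n, s in samples_per_peak.items() if len(s) > 1}
    unique = {n: next(iter(s)) for n, s in samples_per_peak.items() if len(s) == 1}
    return shared, unique


# Made-up reflectron-mode peak lists (m/z only), for illustration.
peaks = {
    "isolate_A": [1030.6, 1044.7, 1058.6],
    "isolate_B": [1030.7, 1074.6],
}
edges = build_man(peaks)
shared, unique = shared_and_unique(edges)
# shared -> {1030.5: ['isolate_A', 'isolate_B']}
```

In this toy network the node at 1030.5 connects both isolates, while the remaining three peak nodes each connect to a single isolate. This is the shared-versus-unique pattern a MAN is meant to make visible.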
1. Preparation of MALDI matrix
2. Preparation of MALDI target plates
NOTE: See Sauer et al.7, for more details.
Figure 1: MALDI-target plate showing two different isolates before adding formic acid and MALDI matrix (top 3 spots – Bacillus sp.; bottom 3 spots – Streptomyces sp.). For both, column 3 represents excess sample; column 2 represents the appropriate amount of sample; column 1 represents insufficient sample for MALDI analysis.
3. Data acquisition
NOTE: The general parameters for data acquisition are listed in Table 1.
Parameter | Protein | Specialized Metabolite |
Mass Start (Da) | 1920 | 60 |
Mass End (Da) | 21000 | 2700 |
Mass Deflection (Da) | 1900 | 50 |
Shots | 500 | 1000 |
Frequency (Hz) | 2000 | 2000 |
Laser Size | Large | Medium |
MaxStdDev (ppm) | 300 | 30 |
Table 1: General parameters for data acquisition.
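For downstream scripting, the values in Table 1 can be kept in a small lookup structure. The sketch below is only an illustration: the dictionary keys and the `in_window` helper are hypothetical names, and the numbers are copied from Table 1.

```python
# Acquisition parameters transcribed from Table 1 (key names are hypothetical).
ACQUISITION = {
    "protein": {
        "mass_start_da": 1920,
        "mass_end_da": 21000,
        "mass_deflection_da": 1900,
        "shots": 500,
        "frequency_hz": 2000,
        "laser_size": "Large",
        "max_std_dev_ppm": 300,
    },
    "specialized_metabolite": {
        "mass_start_da": 60,
        "mass_end_da": 2700,
        "mass_deflection_da": 50,
        "shots": 1000,
        "frequency_hz": 2000,
        "laser_size": "Medium",
        "max_std_dev_ppm": 30,
    },
}


def in_window(mz, mode):
    """Return True if an m/z value lies inside the acquisition window for mode."""
    params = ACQUISITION[mode]
    return params["mass_start_da"] <= mz <= params["mass_end_da"]


# The two windows overlap between 1920 and 2700 Da.
print(in_window(5000, "protein"))                 # True
print(in_window(5000, "specialized_metabolite"))  # False
print(in_window(2000, "specialized_metabolite"))  # True
```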
Figure 2: Example protein spectra displaying the effect of modifying laser power and detector gain. Spectra quality is best in panel A, and decreases until insufficient spectra quality in panels C and D. While the spectrum in panel B may result in useable peaks, panel A displays optimal data.
Figure 3: Example specialized metabolite spectra displaying the effect of modifying laser power and detector gain. Spectra quality is best in panel A and decreases until insufficient spectra quality in panels C and D. While the spectrum in panel B may result in useable peaks, panel A displays optimal data.
4. Cleaning the MALDI target plate (adapted from Sauer et al.7)
5. Installing the IDBac Software
6. Starting with Raw Data
NOTE: Detailed explanations and instructions for each data processing step are embedded within IDBac; however, the main analyses and interactive inputs are described below.
Figure 4: IDBac data conversion and preprocessing step. IDBac converts raw spectra into the open mzML format and stores mzML, peak lists, and sample information in a database for each experiment.
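The idea of one database per experiment, holding peak lists alongside sample information, can be sketched with Python's built-in `sqlite3` module. This is a conceptual illustration only: the table layout, column names, and helper functions are assumptions and do not reflect IDBac's actual schema.

```python
import json
import sqlite3


def create_experiment(path=":memory:"):
    """Open a database for one experiment (hypothetical two-table schema)."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, genus TEXT)")
    con.execute(
        "CREATE TABLE peaks ("
        "sample_id TEXT REFERENCES samples(sample_id), "
        "mode TEXT, peaks_json TEXT)"
    )
    return con


def add_sample(con, sample_id, mode, peaks, genus=None):
    """Store sample information and one peak list (as JSON) for a sample."""
    con.execute("INSERT OR IGNORE INTO samples VALUES (?, ?)", (sample_id, genus))
    con.execute(
        "INSERT INTO peaks VALUES (?, ?, ?)", (sample_id, mode, json.dumps(peaks))
    )
    con.commit()


def load_peaks(con, sample_id, mode):
    """Return the stored peak list, or an empty list if none exists."""
    row = con.execute(
        "SELECT peaks_json FROM peaks WHERE sample_id = ? AND mode = ?",
        (sample_id, mode),
    ).fetchone()
    return json.loads(row[0]) if row else []


# Made-up [m/z, intensity] pairs for illustration.
con = create_experiment()
add_sample(con, "isolate_A", "protein", [[4305.2, 120.0], [6410.8, 85.5]], genus="Bacillus")
print(load_peaks(con, "isolate_A", "protein"))
```

Keeping sample information in its own table means that metadata such as taxonomic identity can be edited later without touching the stored peak lists.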
7. Work with previous experiments
Figure 5: "Work with Previous Experiments" page. Use IDBac’s “Work with Previous Experiments” page to select an experiment to analyze or modify.
Figure 6: Input sample information. Within the “Work with Previous Experiments” page users can input information about samples such as taxonomic identity, collection location, isolation conditions, etc.
Figure 7: Transfer data. The “Work with Previous Experiments” page contains the option to transfer data between existing experiments and to new experiments.
8. Setting up protein data analysis and creating mirror plots
Figure 8: Choose how peaks are retained for analysis. After selecting an experiment to analyze, visiting the “Protein Data Analysis” page and subsequently opening the “Choose how Peaks are Retained for Analysis” menu allows users to choose settings like signal-to-noise ratio for retaining peaks. The displayed mirror plot (or dendrogram) will automatically update to reflect the chosen settings.
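The effect of a signal-to-noise setting on peak retention can be shown with a short Python sketch. It assumes a single, already-estimated noise level per spectrum. The function name, the default threshold of 4, and the peak values are hypothetical and are not taken from IDBac.

```python
def retain_peaks(peaks, noise, min_snr=4.0):
    """Keep (m/z, intensity) peaks whose intensity/noise ratio meets min_snr.

    peaks: list of (mz, intensity) tuples
    noise: estimated baseline noise level of the spectrum (assumed given)
    """
    if noise <= 0:
        raise ValueError("noise must be positive")
    return [(mz, inten) for mz, inten in peaks if inten / noise >= min_snr]


# Made-up peaks with a noise level of 10: the ratios are 5.0, 40.0, and 3.9.
spectrum = [(3000.0, 50.0), (4500.0, 400.0), (6100.0, 39.0)]

print(retain_peaks(spectrum, noise=10.0))             # two peaks kept
print(retain_peaks(spectrum, noise=10.0, min_snr=6))  # only the strongest peak kept
```

Raising the threshold trades sensitivity for robustness: weak but real peaks are discarded along with noise. This is why it helps to inspect the mirror plot while adjusting the setting.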
9. Clustering samples using protein data
Figure 9: Select samples from the chosen experiment to include within the displayed dendrogram.
10. Customizing the protein dendrogram
Figure 10: Adjust the dendrogram. IDBac provides a few options for modifying how the dendrogram looks; these are found within the “Adjust the Dendrogram” menu. Options include coloring branches and labels by k-means or by “cutting” the dendrogram at a user-provided height.
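What “cutting” a dendrogram at a given height does can be sketched in plain Python. The example assumes cosine distance and average linkage, which are illustrative choices rather than a statement of IDBac's defaults. The function names and the four toy intensity vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty spectrum as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def cluster_at_height(vectors, cut_height):
    """Average-linkage agglomerative clustering, cut at cut_height.

    Merging stops once the closest pair of clusters is farther apart than
    cut_height. Because average linkage merges at non-decreasing heights,
    this gives the same groups as cutting the full dendrogram there.
    """
    clusters = [[name] for name in vectors]

    def average_linkage(c1, c2):
        total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = average_linkage(clusters[i], clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > cut_height:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Made-up binned protein intensities for four isolates.
spectra = {
    "iso1": [1.0, 0.8, 0.0, 0.0],
    "iso2": [0.9, 1.0, 0.0, 0.1],
    "iso3": [0.0, 0.1, 1.0, 0.7],
    "iso4": [0.0, 0.0, 0.8, 1.0],
}
print(cluster_at_height(spectra, cut_height=0.5))
# [['iso1', 'iso2'], ['iso3', 'iso4']]
```

A lower cut height yields more, smaller groups and a higher one yields fewer, larger groups, which is the behavior seen when the threshold line on the dendrogram is moved.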
Figure 11: Incorporate info about samples. Within the “Adjust the Dendrogram” menu is the option “Incorporate info about samples”. Selecting this will allow plotting information about samples next to the dendrogram. Sample information is input within the “Work with Previous Experiments” page.
11. Insert samples from a separate experiment into the dendrogram
Figure 12: Insert Samples from Another Experiment menu. Sometimes it is helpful to compare samples from another experiment. Use the “Insert Samples from Another Experiment” menu to choose samples to include within the currently-displayed dendrogram.
12. Analyzing specialized metabolite data and metabolite association networks (MANs)
Figure 13: “Small Molecule Data Analysis” page. If a dendrogram was created from protein spectra, it will be displayed within the “Small Molecule Data Analysis” page. This page will also display Metabolite Association Networks (MANs) and Principal Component Analysis (PCA) for small molecule data.
13. Sharing data
We analyzed six strains of Micromonospora chokoriensis and two strains of Bacillus subtilis, which were previously characterized6, using data available at DOI: 10.5281/zenodo.2574096. Following directions in the Starting with Raw Data tab, we selected the option Click here to convert Bruker files and followed the IDBac-provided instructions for each dataset (Figure 14).
After the automated conversion and preprocessing/peak-picking steps were completed, we proceeded to create a new combined IDBac experiment by transferring samples from the two experiments into a single experiment containing both Bacillus and Micromonospora samples (Figure 15). The resulting analysis involved comparing protein spectra using mirror plots, as pictured in Figure 16, which was useful for evaluating spectra quality and adjusting peak-picking settings. Figure 17 displays a screenshot of the protein clustering results with default settings selected. The dendrogram was colored by adjusting the threshold on the plot (appears as a dotted line). Of note is the clear separation between genera, with the M. chokoriensis and B. subtilis isolates clustering separately.
Figure 18, Figure 19, and Figure 20 highlight the ability to generate MANs of user-selected regions by clicking and dragging across the protein dendrogram. With this we were able to rapidly create MANs to compare only the B. subtilis strains (Figure 18), only the M. chokoriensis strains (Figure 19), and all the strains simultaneously (Figure 20). The primary function of these networks is to provide researchers with a broad overview of the degree of specialized metabolite overlap between bacteria. With these data in hand, researchers now have the capacity to make informed decisions from only a small amount of material scraped from a bacterial colony.
Figure 14: Spectra processing. Downloaded Bruker autoFlex spectra were converted and processed using IDBac.
Figure 15: Combined IDBac experiment. Because the Micromonospora and Bacillus spectra were collected on different MALDI target plates, the two experiments were subsequently combined into a single experiment, “Bacillus_Micromonsopora”. This was done within the “Work with Previous Experiments” tab, following directions within the menu “Transfer samples from previous experiments to new/other experiments”.
Figure 16: Comparison. Micromonospora and Bacillus spectra were compared using the mirror plots within the “Protein Data Analysis” page. Ultimately, default peak settings were chosen.
Figure 17: Hierarchical clustering. Hierarchical clustering, using default settings, correctly grouped Bacillus and Micromonospora isolates. The dendrogram was colored by “cutting” it at an arbitrary height (displayed as a dashed line), and 100 bootstraps were used to show confidence in branching.
Figure 18: MAN created by selecting the Bacillus sp. strains from the protein dendrogram showed differential production of specialized metabolites.
Figure 19: MAN created by selecting the six Micromonospora sp. strains from the protein dendrogram showed differential production of specialized metabolites.
Figure 20: MAN of Bacillus sp. and Micromonospora sp. strains showing a differential production of specialized metabolites.
The IDBac protocol details bacterial protein and specialized metabolite data acquisition and analysis of up to 384 bacterial isolates in 4 h by a single researcher. With IDBac there is no need to extract DNA from bacterial isolates or generate specialized metabolite extracts from liquid fermentation broths and analyze them using chromatographic methods. Instead, protein and specialized metabolite data are gathered by simply spreading material from bacterial colonies directly onto a MALDI target plate. This greatly reduces the time and cost associated with alternative techniques such as 16S rRNA gene sequencing and LCMS9.
It is important to add a matrix blank and calibration spots to the MALDI plate, and we recommend using an appropriate number of replicates to ensure reproducibility and statistical confidence. The number of replicates is experiment-dependent. For example, if a user intends to differentiate thousands of colonies from a collection of environmental diversity plates, fewer replicates may be necessary (our lab collects three technical replicates per colony). Alternatively, if a user wishes to create a custom database of strains from specific bacterial taxa to rapidly determine sub-species classifications of unknown isolates, then more replicates are appropriate (our lab collects eight biological replicates per strain).
IDBac is a tool for rapidly differentiating highly-related bacterial isolates based on putative taxonomic information and specialized metabolite production. It can complement or serve as a precursor to orthogonal methods such as in-depth genetic analyses, studies involving metabolite production and function, or characterization of specialized metabolite structure by Nuclear Magnetic Resonance spectroscopy and/or LC-MS/MS.
Specialized metabolite production (IDBac MANs) is highly susceptible to bacterial growth conditions, especially the choice of growth medium, which is a potential limitation of the method. However, these traits may be exploited by the user, as IDBac can readily generate MANs showing the differences in specialized metabolite production under a variety of growth conditions. It is important to note that while specialized metabolite fingerprints may vary by growth condition, we have previously shown that protein fingerprints remain relatively stable across these variables (see Clark et al.6). When dealing with environmental diversity plates, we recommend purifying bacterial isolates prior to analysis in order to reduce possible contributions from neighboring bacterial cross-talk.
Finally, the lack of a searchable public database of protein MS fingerprints is a major shortcoming in the use of this method to classify unknown environmental bacteria. We created IDBac with this in mind, and included automated conversion of data into a community-accepted open-source format (mzML)10,11,12 and designed the software to allow searching, sharing, and creation of custom databases. We are in the process of creating a large public database (>10,000 fully characterized strains), which will allow for the classification of some isolates to the species-level, including links to GenBank accession numbers when available.
IDBac is open source, and the code is available for anyone to customize to their data analysis and visualization needs. We recommend that users consult the extensive body of literature (Sauer et al.7, Silva et al.5) to help support and design their experimental goals. We host a forum for discussion at: https://groups.google.com/forum/#!forum/idbac and a means to report issues with the software at: https://github.com/chasemc/IDBacApp/issues.
The authors have nothing to disclose.
This work was supported by National Institute of General Medical Sciences Grant R01 GM125943, National Geographic Grant CP-044R-17; Icelandic Research Fund Grant 152336-051; and University of Illinois at Chicago startup funds. Also, we thank the following contributors: Dr. Amanda Bulman for assistance with MALDI-TOF MS protein acquisition parameters; Dr. Terry Moore and Dr. Atul Jain for recrystallizing alpha-cyano-4-hydroxycinnamic acid matrix (CHCA).
Name | Company | Catalog Number | Comments |
Acetonitrile | Fisher | 60-002-65 | LC-MS Ultra CHROMASOLV |
Autoflex Speed LEF MALDI-TOF instrument | Bruker Daltonics | ||
Bruker Daltonics Bacterial test standard | Fisher | NC0884024 | Bruker Daltonics 8604530 |
Bruker Peptide Calibration standard | Fisher | NC9846988 | Bruker Daltonics 8206195 |
Formic Acid | Fisher Chemical | A117-50 | 99.5+%, Optima LC/MS Grade |
MALDI-TOF target Plate | Bruker Daltonics | ||
Methanol | Fisher Chemical | A456-500 | Optima LC/MS Grade |
Toothpicks | any is ok | ||
Trifluoroacetic acid | Fisher | AC293810010 | 99.5%, for biochemistry, ACROS Organics |
Water | VWR | 7732-18-5 | LC-MS |
α-Cyano-4-hydroxycinnamic acid | Sigma | 28166-41-8 | (C2020-25G) ≥98% (TLC), powder |