Cytofast is a visualization tool used to analyze output from clustering. Cytofast can be used to compare two clustering methods: FlowSOM and Cytosplore. Cytofast can rapidly generate a quantitative and qualitative overview of mass cytometry data and highlight the main differences between different clustering algorithms.
The complexity of data generated by mass cytometry has necessitated new tools to rapidly visualize analytic outcomes. Clustering methods like Cytosplore or FlowSOM are used for the visualization and identification of cell clusters. For downstream analysis, a newly developed R package, Cytofast, can generate a rapid visualization of results from clustering methods. Cytofast takes into account the phenotypic characterization of cell clusters, calculates the cell cluster abundance, then quantitatively compares groups. This protocol explains how to apply Cytofast to mass cytometry data describing modulation of the immune system in the tumor microenvironment (i.e., the natural killer [NK] cell response) upon tumor challenge followed by immunotherapy (PD-L1 blockade). The usefulness of Cytofast with FlowSOM and Cytosplore is demonstrated. Cytofast rapidly generates visual representations of group-related immune cell clusters and correlations with immune system composition. Differences are observed in the clustering analysis, but separation between groups is visible with both clustering methods. Cytofast visually shows the patterns induced by PD-L1 treatment that include a higher abundance of activated NK cell subsets, expressing a higher intensity of activation markers (i.e., CD54 or CD11c).
Mass cytometry (cytometry by time-of-flight, or CyTOF) allows detection of a wide range of intracellular or extracellular biomarkers in millions of single cells. The high-dimensional nature of mass cytometry data necessitates dedicated analysis tools, such as the cell clustering techniques SPADE1, FlowMaps2, FlowSOM3, Phenograph4, VorteX5, and scaffold maps6. In addition, various dimensionality reduction-based techniques have been developed (e.g., principal component analysis [PCA]7, t-distributed stochastic neighbor embedding [t-SNE]8, hierarchical stochastic neighbor embedding [HSNE]9, uniform manifold approximation and projection [UMAP]10, and diffusion maps11) to improve the speed, interpretation, and visualization of high-dimensional datasets.
Downstream analysis of high-dimensional flow and mass cytometric data often lacks automatic processes to perform statistical tests on cluster frequency and links with clinical outcomes. Previously, we developed an R-based workflow known as Cytofast12, which allows for visual and quantitative downstream analyses of clustering techniques by Cytosplore or FlowSOM.
The protocol described here clarifies the use of Cytofast in R and shows how to generate quantitative and qualitative heatmaps and graphs. Furthermore, it facilitates the determination of connections between observed immune phenotypes and clinical outcomes. This report also describes the analysis of a specific mass cytometry dataset using two different clustering procedures: FlowSOM and Cytosplore. By using Cytofast with both clustering methods, it is shown that the activation phenotype of NK cells is influenced by PD-L1 immune checkpoint blockade.
All animal experiments were approved by the Animal Experiments Committee of LUMC and were executed according to the animal experimentation guidelines of the LUMC in compliance with the guidelines of Dutch and European committees.
NOTE: For experimental set-up, C57BL/6 mice were subcutaneously inoculated on the right flank with the murine colon tumor MC38 at a concentration of 0.3 x 10^6 cells/200 µL of phosphate-buffered saline (PBS). After 10 days, when tumors were palpable, the mice were treated with PD-L1 blocking antibodies (clone MIH-5, 200 µg/mouse, intra-peritoneal injection) or were mock-treated. The tumors were resected 3 days after PD-L1 injection, processed ex vivo, and analyzed by CyTOF mass cytometry using 38 markers13.
1. Equipment and Software for Data Analysis
NOTE: Use a computer running Windows 7 or newer with an i5 processor at 2.4 GHz or equivalent, 6 GB of installed RAM, and 10 GB of free hard drive space. The R package Cytofast builds on existing packages, mainly flowCore, pheatmap, and ggplot2. The command lines to be executed in R are included in the protocol. Resources for learning R can be found at https://education.rstudio.com/.
2. Creating Clusters
NOTE: To showcase the two clustering methods Cytosplore and FlowSOM with Cytofast, the NK cells (CD161+) in the tumor micro-environment 3 days after PD-L1 treatment are analyzed.
3. Visualization: Post-processing Clustering Analysis
NOTE: This step is common to both clustering methods. Therefore, it can be performed after clustering with either FlowSOM or Cytosplore.
The Cytofast workflow (Figure 1) provides a quantitative and qualitative overview of data originally clustered by analysis software (i.e., FlowSOM or Cytosplore). Cytofast produces several outputs, including a heatmap of all clusters identified in the analysis, based on marker expression (Figure 2 and Figure 3). The dendrogram on the top represents the hierarchical similarity between the identified clusters. The lower panel displays another heatmap showing the relative quantity of the corresponding subsets in each sample. The dendrogram on the right shows the similarity between samples and is based on hierarchical clustering performed on the Euclidean distances between samples. The combined heatmaps are shown for Cytosplore followed by Cytofast in Figure 2 and for FlowSOM followed by Cytofast in Figure 3. Cytofast can also present the data quantitatively and display the results in boxplots (using the cytoBoxplots function), as shown in Figure 4 and Figure 5.
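The two quantities behind the sample heatmap and its dendrogram, relative cluster abundance and Euclidean distance between samples, can be illustrated with a minimal sketch. It is written in plain Python for illustration only and does not use or reproduce the Cytofast API; the sample names, cluster names, and counts are hypothetical.

```python
import math

# Hypothetical raw cell counts per cluster for each sample (illustrative values only).
counts = {
    "PBS_1":  {"c1": 50, "c2": 30, "c3": 20},
    "PBS_2":  {"c1": 45, "c2": 35, "c3": 20},
    "PDL1_1": {"c1": 20, "c2": 30, "c3": 50},
}

def to_frequencies(sample_counts):
    """Convert raw counts to percentages of the sample's total cells."""
    total = sum(sample_counts.values())
    return {cluster: 100.0 * n / total for cluster, n in sample_counts.items()}

def euclidean(a, b):
    """Euclidean distance between two samples over the same set of clusters."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in a))

# Relative abundance of each cluster within each sample.
freqs = {sample: to_frequencies(c) for sample, c in counts.items()}

# Pairwise distances between samples; hierarchical clustering would start from these.
samples = sorted(freqs)
dist = {(s, t): euclidean(freqs[s], freqs[t]) for s in samples for t in samples}

for s in samples:
    print(s, [round(dist[(s, t)], 2) for t in samples])
```

In this toy example the two PBS samples lie closer to each other than either does to the PD-L1 sample, which is the kind of structure the dendrogram on the right summarizes.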
Similar clusters were found between the two different methods (e.g., cluster 8 from Cytosplore corresponds to cluster 10 from FlowSOM), and co-expression of some inhibitory markers like PD-1 and LAG-3 was still visible in both methods. Both clustering methods allowed discrimination between PD-L1 and PBS treated mice. In contrast, some differences between the methods can be highlighted. FlowSOM identifies 2 clusters (MHC-II+), whereas Cytosplore shows only one cluster (MHC-II+dim). This is due to the initial gating strategy, in which NK cells were manually gated on CD161+ cells and then further processed by FlowSOM. Cytosplore, however, automatically gated cells from the CD45+ population on the first HSNE level, which were then clustered in a higher hierarchical level. Thus, Cytosplore defined the NK cell subsets more precisely than manual gating focused on CD161. Nevertheless, hierarchical clustering of the samples was preserved, as shown in the dendrogram on the right, indicating that segregation between the two groups (PD-L1 and PBS) was not dependent on the chosen clustering method.
The number of clusters can be manually defined using both methods. Cytofast enables the user to assess the heterogeneity of their data and can provide insight into how to choose the number of clusters into which the data should be divided. Other features are included in the Cytofast package, such as the msiPlot function (step 3.4.2), which shows the median signal intensity (MSI) plot of every marker per group (Figure 6 and Figure 7). This function allows detection of global changes, such as increases in the expression of CD54 or CD11c in NK cells of the PD-L1-treated group. Optional features can be added to the Cytofast output, such as displaying data in bar graphs and other methods of data representation. The latter requires additional ggplot2 tools in R.
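The idea of a median signal intensity per marker and per group can be sketched as follows. This is a plain-Python illustration of the general computation, not the implementation of msiPlot; the intensity values are hypothetical and assumed to be already transformed.

```python
from statistics import median

# Hypothetical per-cell records: (group, {marker: transformed intensity}).
cells = [
    ("PBS",   {"CD54": 1.0, "CD11c": 0.5}),
    ("PBS",   {"CD54": 1.2, "CD11c": 0.7}),
    ("PBS",   {"CD54": 1.4, "CD11c": 0.6}),
    ("PD-L1", {"CD54": 2.0, "CD11c": 1.5}),
    ("PD-L1", {"CD54": 2.4, "CD11c": 1.1}),
    ("PD-L1", {"CD54": 2.2, "CD11c": 1.3}),
]

def median_signal_intensity(cells):
    """Return the median intensity for every (group, marker) pair."""
    grouped = {}
    for group, markers in cells:
        for marker, value in markers.items():
            grouped.setdefault((group, marker), []).append(value)
    return {key: median(values) for key, values in grouped.items()}

msi = median_signal_intensity(cells)
for (group, marker), value in sorted(msi.items()):
    print(group, marker, value)
```

A higher median for CD54 or CD11c in one group than in the other is the type of global shift that an MSI plot is meant to reveal.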
Figure 1: Workflow of Cytofast package. The data were generated by mass cytometry from a tumor 3 days after treatment with immunotherapy or left untreated. Two different clustering techniques were compared: Cytosplore and FlowSOM. Cytofast was used to visualize differences between the two techniques.
Figure 2: Cluster overview and cluster abundance per group as analyzed by Cytofast following Cytosplore. Heatmap of all NK cell clusters (CD161+ cells defined automatically by Cytosplore), which were identified 3 days after immunotherapy (PD-L1). Data shown are based on Cytosplore clustering and pooled from the untreated and PD-L1 treated groups. Levels of ArcSinh5-transformed marker expression are displayed on a rainbow scale. On the lower panel, the relative abundance of each sample is represented by the green-to-purple scale. The dendrogram on the right represents the similarity between samples based on subset frequencies. The frequency scale represents the dispersion of the mean. A low or a high frequency is represented by a green or purple color, respectively.
Figure 3: Cluster overview and cluster abundance per group as analyzed by Cytofast following FlowSOM. Heatmap of all NK cell clusters (pre-gated on CD161+ events), which were identified 3 days after immunotherapy (PD-L1). Data shown are based on FlowSOM clustering and pooled from the untreated and PD-L1 treated groups. Levels of ArcSinh5-transformed marker expression are displayed on a rainbow scale. On the lower panel, the relative abundance of each sample is represented by the green-to-purple scale. The dendrogram on the right represents the similarity between samples based on subset frequencies. The frequency scale represents the dispersion of the mean. A low or a high frequency is represented by a green or purple color, respectively.
Figure 4: Cytofast representation with boxplots of the clusters defined by Cytosplore. The frequency of each cluster is represented in a boxplot, separated into the two groups (PBS and PD-L1). One individual dot corresponds to one mouse.
Figure 5: Cytofast representation with boxplots of the clusters defined by FlowSOM. The frequency of each cluster is represented in a boxplot, separated into the two groups (PBS and PD-L1). One individual dot corresponds to one mouse.
Figure 6: Distribution of signal intensity plots from NK cells automatically gated by Cytosplore. The distribution of signal intensities is shown in a histogram for three specific markers: CD45, CD11c, and CD54.
Figure 7: Distribution of signal intensity plots from NK cells automatically gated by FlowSOM. The distribution of signal intensities is shown in a histogram for three specific markers: CD45, CD11c, and CD54, segregated by groups PBS and PD-L1.
Supplementary Files 1.1–1.8.
Supplementary Files 2.1–2.10.
Supplementary File 3.
Supplementary Files 4.1–4.8.
Supplementary File 5.
Cytofast is a rapid computational tool that provides a quick and global exploration of cytometric data by highlighting and quantifying treatment-specific cellular subsets. The protocol described aims to further process clustering analyses performed with Cytosplore or FlowSOM. Output from other clustering tools is also suitable for Cytofast, provided that each cell has been assigned to a subset. Cytofast itself is not a clustering method and therefore requires a clustering procedure before use.
The analysis performed here showed that certain CD161+ NK cell subsets in the tumor microenvironment were sensitive to PD-L1 blockade. This was evidenced by changes in their phenotype and abundance, which were observed using both Cytosplore and FlowSOM as clustering methods. Both methods distinguished the main NK cell cluster (CD11b+ NKG2A+), but with different frequencies (15%–20% for Cytosplore, 30%–40% for FlowSOM). These differences in abundance did not affect the global pattern, because both dendrograms displayed in the right panels of Figure 2 and Figure 3 showed similar results. By using Cytofast, it is thus possible (independent of the clustering method chosen) to segregate PD-L1-treated and untreated mice based on analyses of NK cell cluster phenotype and abundance.
Depending on the recorded parameters, modifications to the protocol are needed. Specifically, certain parameters such as time and background must be removed while performing the clustering analysis. In addition, it is important that each cell is assigned to a subset. The cfData function will simply add the raw cell counts per cluster per sample into the cfList. From this step, the cytoheatmap can be built as explained in section 3.
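The two preparatory points above, excluding non-marker parameters and tallying raw cell counts per cluster per sample, can be sketched as follows. This is a plain-Python illustration rather than Cytofast code; the channel names "Time" and "Background", the event layout, and the helper names are assumptions made for the example.

```python
from collections import Counter

# Assumed names of recorded parameters that are not phenotypic markers.
NON_MARKER_CHANNELS = {"Time", "Background"}
# Bookkeeping keys used in this example's event records.
METADATA_KEYS = {"sample", "cluster"}

# Hypothetical events: one dict per cell, already assigned to a cluster.
events = [
    {"sample": "PBS_1",  "cluster": "c1", "Time": 10.0, "Background": 0.1, "CD161": 3.2, "CD54": 1.1},
    {"sample": "PBS_1",  "cluster": "c1", "Time": 11.0, "Background": 0.2, "CD161": 3.0, "CD54": 1.3},
    {"sample": "PBS_1",  "cluster": "c2", "Time": 12.0, "Background": 0.1, "CD161": 2.8, "CD54": 0.9},
    {"sample": "PDL1_1", "cluster": "c2", "Time": 10.5, "Background": 0.3, "CD161": 3.1, "CD54": 2.2},
    {"sample": "PDL1_1", "cluster": "c2", "Time": 11.5, "Background": 0.2, "CD161": 2.9, "CD54": 2.4},
]

def marker_channels(event):
    """Keep only marker measurements, dropping Time, Background, and metadata."""
    excluded = NON_MARKER_CHANNELS | METADATA_KEYS
    return {name: value for name, value in event.items() if name not in excluded}

def count_cells(events):
    """Tally raw cell counts per (sample, cluster); every cell must have a cluster."""
    counts = Counter()
    for event in events:
        if event.get("cluster") is None:
            raise ValueError("every cell must be assigned to a cluster")
        counts[(event["sample"], event["cluster"])] += 1
    return counts

cell_counts = count_cells(events)
print(marker_channels(events[0]))
print(dict(cell_counts))
```

Raising an error for an unassigned cell reflects the requirement that each cell belongs to a subset before counts are computed.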
Cytofast has been successfully used as a visualization and quantification tool to compare different clustering methods13. This R package is also compatible with advanced features, such as the globaltest14, which can test associations between groups of clusters using clinical variables. In the future, the globaltest tool and other algorithms can be integrated with Cytofast for more in-depth visualization and quantification.
The authors have nothing to disclose.
We acknowledge funding from the European Commission of an H2020 MSCA award under proposal number 675743 (ISPIC). We thank Tetje van der Sluis and Iris Pardieck for testing the protocol.