This protocol describes a high-throughput workflow for artificial intelligence-driven segmentation of pathology-confirmed regions of interest from stained, thin tissue section images for enrichment of histology-resolved cell populations using laser microdissection. This strategy includes a novel algorithm enabling the transfer of demarcations denoting cell populations of interest directly to laser microscopes.
The tumor microenvironment (TME) represents a complex ecosystem composed of dozens of distinct cell types, including tumor, stroma, and immune cell populations. To characterize proteome-level variation and tumor heterogeneity at scale, high-throughput methods are needed to selectively isolate discrete cellular populations in solid tumor malignancies. This protocol describes a high-throughput workflow, enabled by artificial intelligence (AI), that segments images of hematoxylin and eosin (H&E)-stained, thin tissue sections into pathology-confirmed regions of interest for selective harvest of histology-resolved cell populations using laser microdissection (LMD). This strategy includes a novel algorithm that transfers regions denoting cell populations of interest, annotated using digital image software, directly to laser microscopes, making collections more facile. The workflow was implemented successfully, demonstrating the utility of this harmonized method for selectively harvesting tumor cell populations from the TME for quantitative, multiplexed proteomic analysis by high-resolution mass spectrometry. This strategy fully integrates with routine histopathology review, leverages digital image analysis to support enrichment of cellular populations of interest, and is fully generalizable, enabling harmonized harvests of cell populations from the TME for multiomic analyses.
The TME represents a complex ecosystem populated by a highly diverse array of cell types, such as tumor cells, stromal cells, immune cells, endothelial cells, other mesenchymal cell types, and adipocytes, along with a complex extracellular matrix1. This cellular ecosystem varies within and across different disease organ sites, resulting in complex tumor heterogeneity2,3. Recent studies have shown that heterogeneous tumors and tumors with low tumor cellularity (low purity) often correlate with poor disease prognosis2,3.
To understand the molecular interplay between tumor and non-tumor cell populations within the TME at scale, standardized and high-throughput strategies are necessary to selectively harvest distinct cellular populations of interest for downstream multiomic analysis. Quantitative proteomics represents a rapidly evolving and increasingly important technique to further the understanding of cancer biology. To date, most proteomic studies have analyzed proteins extracted from whole tumor tissue preparations (e.g., cryopulverized tissue), leaving proteome-level heterogeneity in the TME poorly understood4,5,6.
The development of sample collection strategies that seamlessly integrate with and harness information from clinical pathology workflows will enable a new generation of histology-resolved proteomics that are highly complementary to gold-standard, diagnostic pathology workflows. LMD enables direct and selective collection of cellular subpopulations or regions of interest (ROIs) through microscopic inspection of histologically stained tissue thin sections7. Recent major advances in digital pathology and AI-enabled analysis have demonstrated the ability to identify unique compositional features and ROIs within the TME in an automated fashion, many of which correlate with molecular alterations and clinical disease features, such as resistance to therapy and disease prognosis8.
The workflow described in this protocol leverages commercial software to selectively annotate tumor ROIs within digital histopathology images and uses software tools developed in-house to transfer these tumor ROIs to laser microscopes for automated collection of discrete cellular populations of interest. The collections integrate seamlessly with downstream multiomic analysis workflows. This integrated strategy substantially reduces LMD operator time and minimizes the time tissues must spend at ambient temperature. The integration of automated feature selection and LMD harvest with high-throughput quantitative proteomics is demonstrated through a differential analysis of the TME from two representative epithelial ovarian cancer histologic subtypes, high-grade serous ovarian cancer (HGSOC) and ovarian clear cell carcinoma (OCCC).
All study protocols were approved for use under a Western IRB-approved protocol "An Integrated Molecular Analysis of Endometrial and Ovarian Cancer to Identify and Validate Clinically Informative Biomarkers" deemed exempt under US Federal regulation 45 CFR 46.102(f). All experimental protocols involving human data in this study were in accordance with the Declaration of Helsinki. Informed consent was obtained from all subjects involved in the study.
CAUTION: The following reagents used throughout the protocol are known or suspected carcinogens and/or contain hazardous materials: ethanol, DEPC water, Mayer's Hematoxylin solution, Eosin Y solution, methanol, acetonitrile, and formic acid. Proper handling, as described in the respective safety data sheets (SDS), and use of appropriate personal protective equipment (PPE) is mandatory.
1. Generating the default shape list data (.sld) file containing calibrator fiducials
NOTE: The protocol steps described in this section are specific to use with an inverted laser microscope and the associated software (see the Table of Materials). Creation of a default .sld file is only necessary once per laser microscope. The resultant file can be used for cutting fiducials into all PEN slides used thereafter. Approximate time: 5 min (once only).
2. LMD slide(s) preparation
NOTE: The protocol steps described in this section are specific to use with an inverted laser microscope and the associated software (see the Table of Materials). Approximate time: 5 min.
3. Tissue staining
NOTE: Approximate time: 30 min.
4. Slide imaging
NOTE: The protocol steps described in this section are specific to slides scanned with the slide scanner listed in the Table of Materials, with the resultant images saved as .svs files. Any scanner and associated software can be used, provided they generate image files in a format that the image analysis software (see the Table of Materials) can open. Supported formats (including pyramidal TIFF variants) are JPG, TIF, MRXS, QPTIFF, component TIFF, SVS, AFI, SCN, LIF, DCM, OME.TIFF, ND2, VSI, NDPI, NDPIS, CZI, BIF, KFB, and ISYNTAX. Approximate time: 5 min.
5. Automated feature selection using image analysis software
6. Laser microdissection
NOTE: The protocol steps described in this section are specific to use with an inverted laser microscope and the associated software (see the Table of Materials). Approximate time: 2 h; case dependent.
7. Protein digestion by pressure cycling technology (PCT)
NOTE: Approximate time: 4 h (3 h without vacuum centrifuge drying time).
8. Tandem mass tag (TMT) labeling and EasyPep cleanup
NOTE: Approximate time: 7 h 20 min (2 h 20 min without vacuum centrifuge drying time).
9. TMT multiplex sample fractionation and pooling
NOTE: Approximate time: 3 h 30 min (1 h 30 min without vacuum centrifuge drying time).
10. Liquid chromatography tandem mass spectrometry (LC-MS/MS)
NOTE: Approximate time: Instrument method and experimental design dependent.
11. Bioinformatic data analysis
NOTE: Approximate time: Experimental design dependent.
Fresh-frozen tissue thin sections from two HGSOC and two OCCC patients were analyzed using this integrated workflow for AI-driven tissue ROI identification, segmentation, LMD, and quantitative proteomic analysis (Figure 1). Representative H&E-stained tissue sections for each tumor were reviewed by a board-certified pathologist; tumor cellularity ranged from 70% to 99%. Tissues were thin-sectioned onto PEN membrane slides that had been precut with calibrator fiducials (Supplemental File 1). These fiducials allow the positional information of annotations generated in the image analysis software (see the Table of Materials) to be integrated with the Cartesian coordinate system of the LMD software. Following H&E staining, high-resolution (20x) images of the PEN slides containing the tissue plus calibrators were captured (Supplemental File 2).
Tumor and stromal cell populations in the micrographs were segmented using image analysis software (see the Table of Materials) for selective harvest by LMD; harvests representing the entire tissue thin section (i.e., whole tumor tissue) were also collected (Figure 1). Indiscriminate annotations for whole tumor tissue collections were generated by partitioning the entire tissue section into tiles of 500 µm², leaving a 40 µm gap between tiles to maintain PEN membrane integrity and prevent the membrane from curling during LMD. On slides for histology-resolved LMD enrichment, the AI classifier in the image analysis software (see the Table of Materials) was trained to discriminate between tumor cells, stromal cells, and the blank glass slide background. Representative tumor, stroma, and blank glass regions were manually highlighted, and the classifier tool was then used to segment these ROIs throughout the entire tissue section. The segmented layers representing whole tissue, tumor epithelium, and stroma were saved separately as individual .annotation files (Supplemental File 3 and Supplemental File 6). In a separate copy of the image file (without the partitioned ROI annotations), a short line was drawn from the centermost tip of each of the three fiducial calibrators; these lines were saved as a .annotation file with the same file name as the corresponding LMD annotation layer file, appended with the suffix "_calib" (Supplemental File 4). These lines were used to co-register the position of the PEN membrane calibrators with the annotation shape list data drawn in the image analysis software.
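The tiling geometry for whole tumor collections can be illustrated with a minimal Python sketch. It is not the partitioning tool of the image analysis software: it assumes square tiles laid over a rectangular bounding box of the tissue, leaves the tile side length (`tile_um`) as a free parameter, and uses the 40 µm gap from the text as the default spacing. The `Tile` and `partition` names are hypothetical, and a real implementation would also discard tiles that fall on blank glass.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tile:
    """Axis-aligned square tile; all coordinates in micrometres."""
    x0: float
    y0: float
    x1: float
    y1: float

    def polygon(self) -> List[Tuple[float, float]]:
        # Four corner vertices, suitable for export as a closed annotation.
        return [(self.x0, self.y0), (self.x1, self.y0),
                (self.x1, self.y1), (self.x0, self.y1)]


def partition(width_um: float, height_um: float,
              tile_um: float, gap_um: float = 40.0) -> List[Tile]:
    """Cover a width x height bounding box with square tiles spaced gap_um apart.

    Only whole tiles are returned; the uncut strips between them keep the PEN
    membrane connected so that it does not curl during cutting.
    """
    if tile_um <= 0 or gap_um < 0:
        raise ValueError("tile_um must be > 0 and gap_um must be >= 0")
    pitch = tile_um + gap_um  # origin-to-origin distance of adjacent tiles
    # k tiles fit along an axis when k * tile_um + (k - 1) * gap_um <= length.
    n_cols = int((width_um + gap_um) // pitch)
    n_rows = int((height_um + gap_um) // pitch)
    return [Tile(c * pitch, r * pitch, c * pitch + tile_um, r * pitch + tile_um)
            for r in range(n_rows) for c in range(n_cols)]
```

For example, `partition(2000, 1500, tile_um=200)` returns 48 tiles (8 columns by 6 rows), each separated from its neighbours by 40 µm of uncut membrane.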
The present study provides two Python algorithms, "Malleator" and "Dapọ", to support this AI-driven LMD workflow; both are available at https://github.com/GYNCOE/Mitchell.et.al.2022. The Malleator algorithm extracts the Cartesian coordinates of all individual annotations (tissue ROIs and calibrators) from the paired .annotation files and merges them into a single Extensible Markup Language (XML) import file (Supplemental File 5). Specifically, Malleator takes a parent folder as input, searches all of its subfolders, and generates a merged .xml file for every subfolder that does not already contain one. In doing so, it merges all annotation layers from the image analysis software (see the Table of Materials) into a single layer and converts the AI-generated shape list data, saved in the proprietary .annotation file format, into the .xml format compatible with the LMD software. The resulting .xml file is then imported into the LMD software. Slight manual adjustments to the alignment of the annotations are necessary after import; this step also registers the vertical (z-plane) position of the slide stage on the laser microscope. The Dapọ algorithm is used specifically for LMD-enriched collections. Because the image analysis software automatically assigns partitioned tiles to individual annotation layers, Dapọ merges all partitioned tiles into a single annotation layer before the Classifier tool is applied, thereby reducing the Classifier run time for LMD-enriched collections.
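To make the merge step concrete, the Python sketch below mimics the behavior described for Malleator using only the standard library. It is not the published code (the GitHub repository above holds that), and two formats are assumptions: the input .annotation files are treated as XML in which each region lists its vertices as `<V X=... Y=...>` elements under a `<Vertices>` element, and the output element names (`ImageData`, `X_CalibrationPoint_n`, `ShapeCount`, `Shape_n`, and so on) are placeholders for whatever schema the LMD software actually requires. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Tuple

Point = Tuple[float, float]


def read_regions(annotation_file: Path) -> List[List[Point]]:
    """Return the vertex list of every region in an .annotation file.

    ASSUMPTION: XML with <V X="..." Y="..."/> elements under <Vertices>.
    Regions from all annotation layers are flattened into one list,
    which is the "merge into a single layer" step.
    """
    root = ET.parse(annotation_file).getroot()
    shapes = []
    for vertices in root.iter("Vertices"):
        pts = [(float(v.get("X")), float(v.get("Y"))) for v in vertices.iter("V")]
        if pts:
            shapes.append(pts)
    return shapes


def build_lmd_xml(shapes: List[List[Point]], calib_lines: List[List[Point]]) -> ET.Element:
    """Assemble one import tree: calibration points first, then every shape.

    ASSUMPTION: placeholder element names, not a verified LMD schema. Each
    calibrator line contributes its first vertex (the fiducial tip).
    """
    root = ET.Element("ImageData")
    for i, line in enumerate(calib_lines, start=1):
        x, y = line[0]
        ET.SubElement(root, f"X_CalibrationPoint_{i}").text = str(x)
        ET.SubElement(root, f"Y_CalibrationPoint_{i}").text = str(y)
    ET.SubElement(root, "ShapeCount").text = str(len(shapes))
    for s, pts in enumerate(shapes, start=1):
        shape = ET.SubElement(root, f"Shape_{s}")
        ET.SubElement(shape, "PointCount").text = str(len(pts))
        for p, (x, y) in enumerate(pts, start=1):
            ET.SubElement(shape, f"X_{p}").text = str(x)
            ET.SubElement(shape, f"Y_{p}").text = str(y)
    return root


def convert_tree(parent: Path) -> List[Path]:
    """Visit every subfolder of parent and write a merged .xml where none exists."""
    written = []
    for folder in sorted(p for p in parent.rglob("*") if p.is_dir()):
        if any(folder.glob("*.xml")):
            continue  # folder already converted; skip it
        for calib in folder.glob("*_calib.annotation"):
            # Pair NAME_calib.annotation with its layer file NAME.annotation.
            stem = calib.name[: -len("_calib.annotation")]
            layer = folder / f"{stem}.annotation"
            if not layer.exists():
                continue
            tree = ET.ElementTree(build_lmd_xml(read_regions(layer), read_regions(calib)))
            out = folder / f"{stem}.xml"
            tree.write(out, encoding="utf-8", xml_declaration=True)
            written.append(out)
    return written
```

In this sketch, a second call to `convert_tree` on the same parent folder returns an empty list, because every converted subfolder now contains an .xml file.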
The whole tumor and LMD-enriched tissue samples were digested, labeled with TMT reagents, multiplexed, fractionated offline, and analyzed by quantitative MS-based proteomics as previously described9. The mean peptide yield (43-60 µg) and recovery (0.46-0.59 µg/mm²) for samples harvested using this AI-driven workflow were comparable with previous reports9,10. A total of 5,971 proteins were co-quantified across all samples (Supplemental Table S1). Unsupervised hierarchical clustering of the 100 most variable proteins segregated the LMD-enriched tumor and whole tumor samples by histotype (HGSOC vs. OCCC) (Figure 2A), similar to previous findings11. By contrast, the LMD-enriched stroma samples from both HGSOC and OCCC clustered together, separately from the LMD-enriched tumor and whole tumor samples. Of the 5,971 quantified proteins, 215 were significantly altered (LIMMA adj. p < 0.05) between whole tumor collections from HGSOC and OCCC specimens (Supplemental Table S2). These altered proteins were compared with the proteins that Hughes et al. identified as differentiating HGSOC and OCCC tumor tissue11. Of the 76 signature proteins quantified by Hughes et al., 57 were co-quantified in this dataset, and their HGSOC vs. OCCC fold changes were significantly correlated between the two studies (Spearman's rho = 0.644, p < 0.001) (Figure 2B).
Figure 1: Summary of the integrated workflow for automated tissue region of interest selection for laser microdissection for downstream quantitative proteomics. Calibration fiducials are cut onto PEN membrane slides to co-register positional orientation data from AI-derived segments of tissue ROI in the image analysis software, HALO, with horizontal positioning on the LMD microscope. The Malleator algorithm is used to merge the annotated segmentation data across all annotation layers for a slide with the _calib reference file, and to convert it to a .xml file compatible with the LMD software. LMD-harvested tissue for proteomic analysis is digested and analyzed by high-throughput quantitative proteomics as previously described9. Abbreviations: LMD = laser microdissection; ROI = region of interest; TMT = tandem mass tag; Quant. = quantification; Ident. = identification; LC-MS/MS = liquid chromatography-tandem mass spectrometry.
Figure 2: Analysis of the proteins in LMD-enriched and whole tumor samples. (A) Unsupervised hierarchical cluster analysis of the 100 most variably abundant proteins in HGSOC and OCCC LMD-enriched and whole tumor samples. (B) Correlation of log2 fold-change protein abundances between HGSOC and OCCC whole tumor harvests in the present study (Mitchell et al., x-axis) and a similar study by Hughes et al. (y-axis)11. Abbreviations: LMD = laser microdissection; HGSOC = high-grade serous ovarian cancer; OCCC = ovarian clear cell carcinoma; log2FC = log2-transformed fold change in protein abundance.
Supplemental Table S1: Abundances of the 5,971 proteins co-quantified across all LMD-enriched and whole tumor samples from HGSOC and OCCC tissue specimens. Abbreviations: LMD = laser microdissection; HGSOC = high-grade serous ovarian cancer; OCCC = ovarian clear cell carcinoma.
Supplemental Table S2: The 215 proteins differentially expressed between whole tumor collections from HGSOC and OCCC specimens (LIMMA adj. p < 0.05). Abbreviations: HGSOC = high-grade serous ovarian cancer; OCCC = ovarian clear cell carcinoma.
Supplemental File 1: Representative shape list data (.sld) file containing standard calibrator fiducials for four slide positions. The file can be imported into the LMD software.
Supplemental File 2: Representative .svs image file of an H&E-stained high-resolution (20x) tissue section. The file can be opened and viewed using image analysis software or LMD software. Abbreviations: H&E = hematoxylin and eosin; LMD = laser microdissection.
Supplemental File 3: Representative .annotation file of partitioned whole tumor segments. The file can be imported into image analysis software.
Supplemental File 4: Representative _calib.annotation file of calibrator fiducial segments. The coordinate information describes the orientation and position of the short calibrator lines drawn from each arrowhead fiducial. The file can be imported into image analysis software.
Supplemental File 5: Representative extensible markup language (.xml) file generated by the Malleator algorithm. The file can be imported into the laser microdissection software.
Supplemental File 6: Representative .annotation file of partitioned AI-classified segments for LMD-enriched collections. The file can be imported into image analysis software. Abbreviations: AI = artificial intelligence; LMD = laser microdissection.
While multiple precedent studies have aimed to develop and/or improve workflows for enriching target cellular subpopulations from formalin-fixed paraffin-embedded (FFPE) and/or fresh-frozen tissues, along with methodologies for maintaining sample quality during processing9,12,13,14,15, there remains a substantial need for automated strategies that prepare clinical tissue specimens for molecular analyses with lower variability and higher reproducibility. This workflow describes a standardized, semiautomated protocol that integrates existing image analysis software tools (see the Table of Materials) for histology-resolved harvest of discrete cell populations by LMD from clinical tissue specimens.
Spatially resolved LMD enrichment of ROIs capturing discrete cell populations represents a next-generation tissue processing step prior to multiomic analyses, improving molecular identification and characterization and facilitating cell-selective biomarker discovery. This protocol improves upon existing methodologies by reducing the often lengthy exposure of tissue sections to the ambient environment that accompanies manual segmentation of ROIs by a histologist, which can take more than 1-2 h before LMD collection. Instead, this workflow allows ROIs to be preidentified by AI-guided classification and segmentation. Limiting tissue dwell time should reduce spurious variation in measurements of highly labile molecular targets, such as phosphopeptides and mRNA, and in antibody-based analytical techniques that rely on a target protein being in its native conformation for detection.
Cutting clean calibrator fiducials onto the PEN membrane slide that are clearly visible in the scanned slide image is one of the key components enabling integration of the image analysis software (see the Table of Materials) with the LMD workflow. Ensuring that each calibrator has a precise ("clean") point at the bottom of its "V" shape allows a precise point to be selected in the image analysis software from which to draw the calibrator lines, as described in steps 5.1.6 and 5.2.13. Aligning these points during import into the LMD software is critical for properly overlaying the annotations (made possible by generating a compatible .xml file with the "Malleator" and/or "Dapọ" algorithms) onto the relevant tissue ROI on the physical LMD slide. Even when the alignment is precise upon import, all shapes must be selected and collectively dragged and dropped into place in the LMD software to register the vertical (z-plane) position of the slide stage on the laser microscope. Minor adjustments to the positioning of the annotations over the tissue ROI can also be made during this step, if needed.
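Three calibrators suffice because three non-collinear point pairs determine a two-dimensional affine transform (translation, rotation, scaling, and shear) exactly: u = a·x + b·y + c and v = d·x + e·y + f. The Python sketch below solves for that transform from calibrator tips in image coordinates and the same tips in stage coordinates using Cramer's rule. It only illustrates the geometry of co-registration; it is not a description of how the LMD software aligns imported shapes internally, and the function names and coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    # Cofactor expansion of a 3x3 determinant along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def affine_from_calibrators(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Return (a, b, c, d, e, f) with u = a*x + b*y + c and v = d*x + e*y + f.

    src: three calibrator tips in image coordinates.
    dst: the same three tips in stage coordinates.
    """
    if len(src) != 3 or len(dst) != 3:
        raise ValueError("exactly three calibrator points are required")
    m = [[float(x), float(y), 1.0] for x, y in src]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("calibrator points are collinear")

    def solve(rhs: List[float]) -> List[float]:
        # Cramer's rule: replace one column with the right-hand side at a time.
        coeffs = []
        for col in range(3):
            mc = [row[:] for row in m]
            for r in range(3):
                mc[r][col] = rhs[r]
            coeffs.append(_det3(mc) / det)
        return coeffs

    a, b, c = solve([u for u, _ in dst])
    d, e, f = solve([v for _, v in dst])
    return a, b, c, d, e, f


def map_point(t: Affine, pt: Point) -> Point:
    """Map one annotation vertex from image to stage coordinates."""
    a, b, c, d, e, f = t
    x, y = pt
    return (a * x + b * y + c, d * x + e * y + f)
```

For example, with tips at (0, 0), (100, 0), and (0, 100) in the image and at (10, 20), (60, 20), and (10, 70) on the stage, the solved transform halves all distances and shifts by (10, 20), so `map_point(t, (50, 50))` gives (35.0, 45.0). The manual drag-and-drop described above is still needed to register the z-plane, which a 2D transform cannot capture.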
A limitation of the current version of the Malleator algorithm is that it is not compatible with the predefined annotation shape tools provided by the image analysis software (see the Table of Materials); future versions of the algorithm will aim to improve this compatibility. The .annotation file for shapes drawn with these tools contains only two pairs of x and y coordinates per annotation, without the complete outline around those points. As a result, such annotations are currently converted during import into straight lines defined by only two points. Tissue ROI segments must therefore be defined manually for successful conversion to XML format and LMD import. This can be done either by drawing an individual free-hand polygon annotation around each target area or, if desired, by drawing approximately circular or rectangular free-hand annotations across all tissue ROI segments; both approaches are compatible with this workflow.
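One possible workaround, sketched below in Python, is to expand each predefined shape into an explicit polygon before conversion so that it behaves like a free-hand annotation. This is hypothetical and not part of Malleator: it assumes that the two stored coordinate pairs are opposite corners of the shape's bounding box, an interpretation of the .annotation format that has not been verified, and the default vertex count for ellipses is an arbitrary choice.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rectangle_to_polygon(p1: Point, p2: Point) -> List[Point]:
    """Four corners of the axis-aligned rectangle spanned by two opposite corners.

    ASSUMPTION: p1 and p2 are the two coordinate pairs stored for the shape.
    """
    xs = sorted((p1[0], p2[0]))
    ys = sorted((p1[1], p2[1]))
    return [(xs[0], ys[0]), (xs[1], ys[0]), (xs[1], ys[1]), (xs[0], ys[1])]


def ellipse_to_polygon(p1: Point, p2: Point, n: int = 36) -> List[Point]:
    """n vertices on the ellipse inscribed in the box spanned by p1 and p2."""
    if n < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    cx, cy = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2  # box centre
    rx, ry = abs(p2[0] - p1[0]) / 2, abs(p2[1] - p1[1]) / 2  # semi-axes
    return [(cx + rx * math.cos(2 * math.pi * k / n),
             cy + ry * math.sin(2 * math.pi * k / n)) for k in range(n)]
```

A larger `n` traces a curved outline more faithfully at the cost of more points per shape in the exported file.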
While the workflow presented here was demonstrated for proteomic analysis of fresh-frozen human cancer tissue specimens, this AI-driven LMD workflow can equally be applied to FFPE tissues, non-cancerous tissue types, and tissues from non-human sources. It can also support other downstream molecular profiling workflows, including transcriptomic, genomic, or phosphoproteomic analyses. The workflow can further leverage other capabilities of the image analysis software (see the Table of Materials), such as cell counting or other analytical modules, including the "Multiplex IHC" module or the "Tissue Microarray (TMA) Add-on". Future applications may also benefit from predefining the number of cells per ROI segment, thereby ensuring equivalent cellular inputs across multiple collections, or from alternative methods of defining cellular ROIs, such as immunohistochemistry or cell sociology.
The authors have nothing to disclose.
Funding for this project was provided in part by the Defense Health Program (HU0001-16-2-0006 and HU0001-16-2-00014) to the Uniformed Services University for the Gynecologic Cancer Center of Excellence. The sponsors had no role in the design, execution, interpretation, or writing of the study. Disclaimer: The views expressed herein are those of the authors and do not reflect the official policy of the Department of Army/Navy/Air Force, Department of Defense, or U.S. Government.
Name of Material/Equipment | Company | Catalog Number | Comments |
1260 Infinity II System | Agilent Technologies Inc | | Offline LC system |
96 MicroCaps (150 µL) in bulk | Pressure Biosciences Inc | MC150-96 | |
96 MicroPestles in bulk | Pressure Biosciences Inc | MP-96 | |
96 MicroTubes in bulk (no caps) | Pressure Biosciences Inc | MT-96 | |
9 mm MS Certified Clear Screw Thread Kits | Fisher Scientific | 03-060-058 | Sample vial for offline LC fractionation and mass spectrometry |
Acetonitrile, Optima LC/MS Grade | Fisher Chemical | A995-4 | Mobile phase solvent |
Aperio AT2 | Leica Microsystems | 23AT2100 | Slide scanner |
Axygen PCR Tubes with 0.5 mL Flat Cap | Fisher Scientific | 14-222-292 | Sample tubes; size fits PCT tubes and thermocycler |
Barocycler 2320EXT | Pressure Biosciences Inc | 2320-EXT | Barocycler |
BCA Protein Assay Kit | Fisher Scientific | PI23225 | |
cOmplete, Mini, EDTA-free Protease Inhibitor Cocktail | Roche | 11836170001 | |
Easy-nLC 1200 | Thermo Fisher Scientific | | Liquid chromatography |
EasyPep Maxi Sample Prep Kit | Thermo Fisher Scientific | NCI5734 | Post-label sample cleanup column |
EASY-SPRAY C18 2UM 50CM X 75 | Fisher Scientific | ES903 | Analytical column |
Eosin Y Solution Aqueous | Sigma Aldrich | HT110216 | |
Formic Acid, 99+ % | Thermo Fisher Scientific | 28905 | Mobile phase additive |
ggplot2 version 3.3.5 | CRAN | https://cran.r-project.org/web/packages/ggplot2/ | |
HALO | Indica Labs | | Image analysis software |
IDLE (Integrated Development and Learning Environment) | Python Software Foundation | | |
iheatmapr version 0.5.1 | CRAN | https://cran.r-project.org/web/packages/iheatmapr/ | |
iRT Kit | Biognosys | Ki-3002-1 | LC-MS QA/QC standard |
limma version 3.42.2 | Bioconductor | https://bioconductor.org/packages/release/bioc/html/limma.html | |
LMD Scanning stage Ultra LMT350 | Leica Microsystems | 11888453 | LMD stage model outfitted with PCT tube holder |
LMD7 (software version 8.2.3.7603) | Leica Microsystems | | LMD apparatus (microscope, laser, camera, PC, tablet) |
Mascot Server | Matrix Science | | Data analysis software |
Mass Spec-Compatible Human Protein Extract, Digest | Promega | V6951 | LC-MS QA/QC standard |
Mayer’s Hematoxylin Solution | Sigma Aldrich | MHS32 | |
PEN Membrane Glass Slides | Leica Microsystems | 11532918 | |
Peptide Retention Time Calibration Mixture | Thermo Fisher Scientific | 88321 | LC-MS QA/QC standard |
Phosphatase Inhibitor Cocktail 2 | Sigma Aldrich | P5726 | |
Phosphatase Inhibitor Cocktail 3 | Sigma Aldrich | P0044 | |
Pierce LTQ Velos ESI Positive Ion Calibration Solution | Thermo Fisher Scientific | 88323 | Instrument calibration solution |
PM100 C18 3UM 75UMX20MM NV 2PK | Fisher Scientific | 164535 | Pre-column |
Proteome Discoverer | Thermo Fisher Scientific | OPTON-31040 | Data analysis software |
Python | Python Software Foundation | | |
Q Exactive HF-X | Thermo Fisher Scientific | | Mass spectrometer |
R version 3.6.0 | CRAN | https://cran-archive.r-project.org/bin/windows/base/old/2.6.2/ | |
RColorBrewer version 1.1-2 | CRAN | https://cran.r-project.org/web/packages/RColorBrewer/ | |
Soluble Smart Digest Kit | Thermo Fisher Scientific | 3251711 | Digestion reagent |
TMTpro 16plex Label Reagent Set | Thermo Fisher Scientific | A44520 | Isobaric TMT labeling reagents |
Veriti 60 well thermal cycler | Applied Biosystems | 4384638 | Thermocycler |
Water, Optima LC/MS Grade | Fisher Chemical | W6-4 | Mobile phase solvent |
ZORBAX Extend 300 C18, 2.1 x 12.5 mm, 5 µm, guard cartridge (ZGC) | Agilent Technologies Inc | 821125-932 | Offline LC trap column |
ZORBAX Extend 300 C18, 2.1 x 150 mm, 3.5 µm | Agilent Technologies Inc | 763750-902 | Offline LC analytical column |