Histopathology involves the analysis of tissue samples to diagnose diseases such as cancer. This analysis is a time-consuming procedure, performed manually by medical experts, namely pathologists. Computational pathology aims to develop automatic methods for analyzing Whole Slide Images (WSIs), i.e., digitized histopathology slides, and has shown accurate performance in image analysis. Although the number of available WSIs is increasing, the capacity of medical experts to manually analyze them is not expanding proportionally. This paper presents a fully automatic pipeline to classify lung cancer WSIs into four classes: Small Cell Lung Cancer (SCLC), non-small cell lung cancer divided into LUng ADenocarcinoma (LUAD) and LUng Squamous cell Carcinoma (LUSC), and normal tissue. The pipeline includes a self-supervised algorithm for pre-training the model and Multiple Instance Learning (MIL) for WSI classification. The model is trained on 2,226 WSIs and obtains an AUC of 0.8558 ± 0.0051 and a weighted F1-score of 0.6537 ± 0.0237 for the 4-class classification on the test set. The capability of the model to generalize was evaluated by testing it on the public The Cancer Genome Atlas (TCGA) dataset for LUAD versus LUSC classification, where the model obtained an AUC of 0.9433 ± 0.0198 and a weighted F1-score of 0.7726 ± 0.0438.
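As an illustration of how the MIL stage of such a pipeline can aggregate patch-level features into a slide-level prediction, the sketch below implements attention-based MIL pooling in PyTorch. The feature dimension, hidden size, and module structure are assumptions for the example, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Aggregate a bag of patch embeddings into one slide-level prediction."""
    def __init__(self, feat_dim=512, hidden_dim=128, n_classes=4):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, bag):                                  # bag: (n_patches, feat_dim)
        weights = torch.softmax(self.attention(bag), dim=0)  # normalize over patches
        slide_embedding = (weights * bag).sum(dim=0)         # weighted mean: (feat_dim,)
        return self.classifier(slide_embedding)              # logits: (n_classes,)

# Example: 1,000 patch embeddings from a self-supervised encoder.
logits = AttentionMIL()(torch.randn(1000, 512))
```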
With a prevalence of 1-2%, Celiac Disease (CD) is one of the most common genetic and autoimmune diseases, induced by the intake of gluten in genetically predisposed persons. Diagnosing CD involves the analysis of duodenal biopsies to determine the condition of the small intestine. In this study, we propose a single-scale pipeline and the combination of two single-scale pipelines, forming a multi-scale approach, to accurately classify CD signs in histopathology whole slide images with automatically generated labels. The automatic classification of CD signs in histopathological images of these biopsies has not been extensively studied, resulting in the absence of standardized guidelines or best practices for this purpose. To fill this gap, we evaluated different magnifications and architectures, including a pre-trained MoCo v2 model, for both the single- and multi-scale approaches. Furthermore, for the multi-scale approach, methods for aggregating feature vectors from several magnifications are explored. For the single-scale pipeline we achieved an AUC of 0.9975 and a weighted F1-score of 0.9680, while for the multi-scale pipeline an AUC of 0.9966 and a weighted F1-score of 0.9250 were achieved. On large datasets, no significant differences were observed; however, with only 10% of the dataset, the multi-scale framework significantly outperforms the single-scale framework. Moreover, the multi-scale approach requires only half of the dataset and half of the time compared to the best single-scale result to identify the optimal model. In conclusion, the multi-scale framework emerges as a highly efficient solution, capable of delivering superior results with minimal data and resource demands.
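To make the feature-aggregation idea concrete, here is a minimal sketch of two simple strategies for combining per-region feature vectors from two magnifications; the encoder, dimensions, and mode names are illustrative assumptions, not the configurations evaluated in the study.

```python
import torch
import torch.nn as nn

def aggregate_multiscale(feats_10x, feats_20x, mode="concat"):
    """Combine feature vectors of the same region at two magnifications."""
    if mode == "concat":          # stack the scales side by side
        return torch.cat([feats_10x, feats_20x], dim=-1)
    if mode == "mean":            # element-wise averaging (same dims required)
        return (feats_10x + feats_20x) / 2
    raise ValueError(f"unknown mode: {mode}")

# A region encoded at each scale, e.g. by a pre-trained MoCo v2 backbone.
fused = aggregate_multiscale(torch.randn(2048), torch.randn(2048))
classifier = nn.Linear(fused.shape[-1], 2)   # e.g. CD signs vs. normal
```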
Diet, lifestyle, and an aging population have led to a rise in diseases, some of which manifest visibly in the eyes and can be analyzed by relatively simple means, such as Optical Coherence Tomography (OCT) scans. This article presents a comparative study examining transfer learning methods for classifying retinal OCT scans. The study focuses on the classification of several retinal alterations, such as Age-related Macular Degeneration (AMD), Choroidal Neovascularization (CNV), Diabetic Macular Edema (DME), and normal cases. The approach was evaluated on a large dataset of labeled OCT scans. In this work, we use CNN architectures such as VGG16, VGG19, ResNet50, MobileNet, InceptionV3 and Xception, with weights pre-trained on ImageNet and then fine-tuned on the domain-specific data. The results indicate that transfer learning is a powerful tool for the multi-class classification of retinal OCT scans.
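A minimal sketch of this transfer-learning recipe with torchvision (one of several frameworks that could be used, and not necessarily the authors' setup): load ImageNet weights, optionally freeze the convolutional base, and replace the classification head for the four OCT classes.

```python
import torch.nn as nn
from torchvision import models

# Load a VGG16 backbone with ImageNet weights.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Optionally freeze the convolutional base, fine-tuning only the head.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the 1000-class ImageNet head with a 4-class one
# (AMD, CNV, DME, normal).
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 4)
```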
In this work, we propose a deep learning system for weakly supervised object detection in digital pathology whole slide images. We designed the system to be organ- and object-agnostic, and to be adapted on-the-fly to detect novel objects based on a few examples provided by the user. We tested our method on the detection of healthy glands in colon biopsies and ductal carcinoma in situ (DCIS) of the breast, showing that (1) the same system is capable of adapting to detect requested objects with high accuracy, namely 87% accuracy assessed on 582 detections in colon tissue, and 93% accuracy assessed on 163 DCIS detections in breast tissue; (2) in some settings, the system is capable of retrieving similar cases with few to no false positives (i.e., precision equal to 1.00); (3) the performance of the system can benefit from previously detected objects with high confidence, which can be reused in new searches in an iterative fashion.
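One way to picture the few-example adaptation is nearest-neighbour retrieval in an embedding space: the user's exemplars are embedded, and candidate regions are ranked by their best cosine similarity to any exemplar. The sketch below is an assumed mechanism for illustration only, not the authors' detection architecture.

```python
import torch
import torch.nn.functional as F

def retrieve(query_feats, candidate_feats, top_k=10):
    """Rank candidate patches by max cosine similarity to any user example."""
    q = F.normalize(query_feats, dim=-1)      # (n_examples, d)
    c = F.normalize(candidate_feats, dim=-1)  # (n_candidates, d)
    sims = c @ q.T                            # (n_candidates, n_examples)
    scores = sims.max(dim=1).values           # best matching exemplar per candidate
    return torch.topk(scores, top_k)          # top scores and candidate indices

# 5 user-provided exemplars against 10,000 candidate region embeddings.
scores, idx = retrieve(torch.randn(5, 512), torch.randn(10000, 512))
```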
Hematoxylin and Eosin (H&E) is one of the main tissue stains used in histopathology to discriminate between nuclei and extracellular material while performing a visual analysis of the tissue. However, histopathology slides are often characterized by stain color heterogeneity, due to different tissue preparation settings at different pathology institutes. Stain color heterogeneity poses challenges for machine learning-based computational analysis, increasing the difficulty of producing consistent diagnostic results and systems that generalize well. In other words, it is challenging for a deep learning architecture to generalize on stain color heterogeneous data when the data are acquired at several centers, particularly if test data come from a center not present in the training data. In this paper, several methods that deal with stain color heterogeneity are compared regarding their capability to solve center-dependent heterogeneity. Systematic and extensive experimentation is performed on a normal versus tumor tissue classification problem. Stain color normalization and augmentation procedures are used while training a convolutional neural network (CNN) to generalize on unseen data from several centers. The performance is compared on an internal test set (test data from the same pathology institutes as the training set) and an external test set (test data from institutes not included in the training set), which also allows measuring generalization performance. An improved performance is observed when the predictions of the two best-performing stain color normalization methods with augmentation are aggregated: on the external test set, an average AUC of 0.892 ± 0.021 and F1-score of 0.817 ± 0.032 are observed, compared to the baseline values of 0.860 ± 0.027 and 0.772 ± 0.024, respectively.
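As a concrete example of the stain-augmentation family compared here, the sketch below perturbs a patch in the Hematoxylin-Eosin-DAB (HED) color space with scikit-image; the perturbation ranges are illustrative assumptions rather than values from the paper.

```python
import numpy as np
from skimage.color import rgb2hed, hed2rgb

def hed_augment(rgb, sigma=0.05, bias=0.05, rng=np.random.default_rng()):
    """Randomly scale and shift each HED stain channel of an RGB patch."""
    hed = rgb2hed(rgb)                                   # RGB -> stain space
    alpha = rng.uniform(1 - sigma, 1 + sigma, size=3)    # per-channel scale
    beta = rng.uniform(-bias, bias, size=3)              # per-channel shift
    hed = hed * alpha + beta
    return np.clip(hed2rgb(hed), 0, 1)                   # back to RGB

patch = np.random.rand(224, 224, 3)   # stand-in for an H&E patch in [0, 1]
augmented = hed_augment(patch)
```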
Prostate cancer (PCa) is one of the most frequent cancers in men. Its grading is required before initiating its treatment. The Gleason Score (GS) aims at describing and measuring the regularity in gland patterns observed by a pathologist on the microscopic or digital images of prostate biopsies and prostatectomies. Deep Learning (DL) models are the state-of-the-art computer vision techniques for Gleason grading, learning high-level features with high classification power. However, for obtaining robust models with clinical-grade performance, a large number of local annotations are needed. Previous research showed that it is feasible to detect low- and high-grade PCa from digitized tissue slides relying only on the less expensive report-level (weakly supervised) labels, thus global rather than local labels. Despite this, few articles focus on classifying the finer-grained GS classes with weakly supervised models. The objective of this paper is to compare weakly supervised strategies for classification of the five GS classes from the whole slide image, using the global diagnostic label from the pathology reports as the only source of supervision. We compare different models trained on handcrafted features, shallow and deep learning representations. The training and evaluation are done on the publicly available TCGA-PRAD dataset, comprising 341 whole slide images of radical prostatectomies, where small patches are extracted within tissue areas and assigned the global report label as ground truth. Our results show that DL networks and class-wise data augmentation outperform other strategies and their combinations, reaching a kappa score of κ = 0.44, which could be further improved with a larger dataset or by combining both strongly and weakly supervised models.
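The weak supervision setup, in which every extracted patch inherits its slide's global report label, can be sketched as a simple PyTorch dataset wrapper; the directory layout and label mapping below are assumptions for illustration.

```python
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class WeaklyLabeledPatches(Dataset):
    """Assign each patch the report-level label of the slide it came from."""
    def __init__(self, patch_dir, slide_labels, transform=None):
        # Assumed layout: patch_dir/<slide_id>/<patch>.png,
        # with slide_labels mapping slide_id -> Gleason class index.
        self.items = [(p, slide_labels[p.parent.name])
                      for p in Path(patch_dir).glob("*/*.png")]
        self.transform = transform

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        path, label = self.items[i]
        img = Image.open(path).convert("RGB")
        return (self.transform(img) if self.transform else img), label
```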
The overall lower survival rate of patients with rare cancers can be explained, among other factors, by the limitations resulting from the scarce information available about them. Large biomedical data repositories, such as PubMed Central Open Access (PMC-OA), have been made freely available to the scientific community and could be exploited to advance the clinical assessment of these diseases. A multimodal approach using visual deep learning and natural language processing methods was developed to mine 15,028 light microscopy images of human rare cancers. The resulting data set is expected to foster the development of novel clinical research in this field and help researchers build resources for machine learning.
Grading whole slide images (WSIs) from patient tissue samples is an important task in digital pathology, particularly for diagnosis and treatment planning. However, this visual inspection task, performed by pathologists, is inherently subjective and has limited reproducibility. Moreover, grading of WSIs is time-consuming and expensive. Designing a robust and automatic solution for quantitative decision support can improve the objectivity and reproducibility of this task. This paper presents a fully automatic pipeline for tumor proliferation assessment based on mitosis counting. The approach consists of three steps: i) region of interest (ROI) selection based on tumor color characteristics, ii) mitosis counting using a deep network-based detector, and iii) grade prediction from the ROI mitosis counts. The full strategy was submitted and evaluated during the Tumor Proliferation Assessment Challenge (TUPAC) 2016. TUPAC was the first digital pathology challenge to grade whole slide images, thus more closely mimicking a real-case scenario. The pipeline is extremely fast and obtained the 2nd place for the tumor proliferation assessment task and the 3rd place in the mitosis counting task, among 17 participants. The performance of this fully automatic method is similar to that of pathologists, demonstrating the potential of automatic solutions for decision support.
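Step iii) of such a pipeline can be as simple as thresholding the aggregated mitosis count; the sketch below shows the idea, with placeholder cut-offs that are not the values used by the paper or the challenge.

```python
def grade_from_mitosis_count(count, cutoffs=(7, 14)):
    """Map a mitosis count over the selected ROIs to a proliferation grade.

    The cut-offs are illustrative placeholders; clinically used values
    depend on the microscope field area and the grading protocol.
    """
    if count <= cutoffs[0]:
        return 1
    if count <= cutoffs[1]:
        return 2
    return 3

assert grade_from_mitosis_count(5) == 1
assert grade_from_mitosis_count(20) == 3
```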
The Gleason grading system was developed for assessing prostate histopathology slides and is correlated with the outcome and incidence of relapse in prostate cancer. Although this grading is part of a standard protocol performed by pathologists, visual inspection of whole slide images (WSIs) has an inherent subjectivity when evaluated by different pathologists. Computer-aided pathology has been proposed to generate an objective and reproducible assessment that can help pathologists in their evaluation of new tissue samples. Deep convolutional neural networks are a promising approach for the automatic classification of histopathology images and can hierarchically learn subtle visual features from the data. However, a large number of manual annotations from pathologists are commonly required to obtain sufficient statistical generalization when training new models that can evaluate the daily generated large amounts of pathology data. A fully automatic approach that detects prostatectomy WSIs with high-grade Gleason score is proposed. We evaluate the performance of various deep learning architectures, training them with patches extracted from automatically generated regions of interest rather than from manually segmented ones. Relevant parameters for training the deep learning model, such as the size and number of patches as well as the inclusion or not of data augmentation, are compared between the tested deep learning architectures. 235 prostate tissue WSIs with their pathology reports from the publicly available TCGA data set were used. An accuracy of 78% was obtained on a balanced set of 46 unseen test images with different Gleason grades in a 2-class decision: high vs. low Gleason grade. Grades 7-8, which represent the boundary decision of the proposed task, were particularly well classified. The method is scalable to larger data sets, with straightforward re-training of the model to include data from multiple sources, scanners and acquisition techniques. Automatically generated heatmaps for the WSIs could be useful for improving the selection of patches when training networks on big data sets and to guide the visual inspection of these images.
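As a rough picture of how regions of interest might be generated automatically, the sketch below builds a tissue mask from a low-resolution slide view with Otsu thresholding and keeps patch positions that are mostly tissue; this is one plausible approach, not necessarily the paper's exact procedure.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu

def tissue_patch_coords(thumbnail, patch=224, min_tissue=0.5):
    """Yield top-left coordinates of patches whose area is mostly tissue."""
    gray = rgb2gray(thumbnail)                  # (H, W) in [0, 1]
    mask = gray < threshold_otsu(gray)          # tissue is darker than glass
    h, w = mask.shape
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            if mask[y:y + patch, x:x + patch].mean() >= min_tissue:
                yield y, x

thumb = np.random.rand(1024, 1024, 3)           # stand-in for a WSI thumbnail
coords = list(tissue_patch_coords(thumb))
```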