1. Introduction

Hematoxylin and eosin (H&E) staining is a general staining method commonly performed in pathological diagnosis. Hematoxylin stains the cell nuclei, whereas eosin stains the cytoplasm. Morphological features, such as cell and tissue structure, shape, color, and texture, are evaluated to determine the pathological diagnosis, and H&E staining is followed by the classification of the histological type and differentiation grade. Various proteins within the cells related to targeted therapy are visualized via immunohistochemical (IHC) reactions. IHC staining has been used to visualize the estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2 (HER2), and Ki-67 in patients with breast cancer. In IHC, these proteins are visualized using 3,3′-diaminobenzidine (DAB), and the nuclei are visualized via counterstaining with hematoxylin.1 The IHC staining method plays a vital role in the pathological diagnosis of cancers; however, it is more expensive and complicated than H&E staining.

Whole slide imaging (WSI) technology has revolutionized pathology diagnosis. Tissue slides are digitized into high-resolution images using microscopic scanners (Fig. 1), and the laborious process of manual quantitation is replaced with efficient automated algorithms. Figure 2 presents the H&E-stained and IHC-stained WSIs of a uterine corpus specimen. Additionally, storing digital images allows the application of image analysis technology and artificial intelligence, including deep learning. Powered by the evolution of computing hardware such as the graphics processing unit (GPU), deep learning has achieved impressive outcomes in computer vision. Previous studies have established the ability of deep learning techniques to perform pathology image analysis tasks, such as the classification of histological types and differentiation grades of cancer, the detection of mitotic cells in tissues, and the segmentation of tumors.

The Ki-67 protein is expressed during the G1, S, G2, and M phases of the cell cycle but not during the quiescent phase (G0).1,2 Consequently, Ki-67 has been used as a biomarker to assess the proliferative ability of malignant cells and determine the malignancy of cancer.3 The labeling index (LI), also known as the proliferation index,4 is one of the crucial diagnostic parameters calculated from the IHC expression of Ki-67. The LI is the ratio between the number of IHC-positive nuclei and the total number of nuclei within the tumor. After obtaining the whole slide images, pathologists select regions of interest (ROIs) from hotspots containing sufficient expression areas for quantification. An ROI should contain around 1000 to 1200 cell nuclei for the calculation of the LI.5 However, manual measurement and evaluation of the LI is labor-intensive. Therefore, a technique for converting H&E-stained specimens into their IHC-stained counterparts, which can be used for automatic quantitation, was developed in this study using deep learning and image processing methods. Because the Ki-67 protein is expressed during the active phases of the cell cycle, its expression can be inferred from the morphological and texture features visualized via H&E staining,1,2,6–8 and deep learning methods could facilitate this process.
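For concreteness, the LI referred to throughout this paper can be written as follows; the counts in this small example are illustrative only and are not taken from the study data.

```latex
\mathrm{LI} \;=\; \frac{N_{+}}{N_{+} + N_{-}},
\qquad\text{e.g.,}\quad N_{+}=300,\; N_{-}=700
\;\;\Rightarrow\;\; \mathrm{LI}=\frac{300}{1000}=0.30\;(30\%).
```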
U-Net, a deep learning model for image segmentation, was used in this study to generate digital hematoxylin-3,3′-diaminobenzidine (H-DAB) IHC stains with accurate nuclear positivity. The proposed method was applied to the WSIs of patients with endometrial adenocarcinoma. Previous studies have evaluated the utility of digital IHC staining using deep learning.9 However, this is the first study to depict the correlation between the LI of digital IHC staining and physical IHC staining and to analyze the cross-case generalization of Ki-67 digital staining models in uterine corpus endometrial carcinoma (UCEC).

2. Related Work

2.1. Encoder–Decoder Models

Deep learning methods have been widely used for generative computer vision tasks, and the encoder–decoder architecture is one popular paradigm. The encoder accepts an input image and projects it to a high-dimensional feature space with a relatively low spatial resolution and abundant semantic information. The decoder recovers, from the encoded tensors, an output containing task-specific information, such as a segmentation of objects or an image with an altered style.10 The fully convolutional network11 (FCN) was designed for semantic segmentation in general scenes and was the first network to produce pixel-to-pixel translation of images using convolutional layers only. Compared with its predecessors, the FCN contains no dense layers; instead, it introduces upsampling operations to decode, from the feature maps encoded by the convolutional layers, an output with a resolution identical to that of the input. U-Net is widely used for medical image segmentation12 and for generative computer vision tasks. Compared with the FCN, U-Net inserts skip connections between the encoder and decoder layers, which aggregate information from different scales and yield fine-grained results.

2.2. Classification of Ki-67-Positive Nuclei Using Hand-Crafted Features

Cells in different phases of the cell cycle possess distinct morphological and texture features.13 Kimura et al.6 used a support vector machine (SVM) to classify Ki-67-positive and Ki-67-negative single nuclei cropped from endometrial adenocarcinoma specimens. The nuclei were extracted and divided equally into positive and negative groups. The signal intensities, texture features represented by the gray-level co-occurrence matrix,14 morphological features, and chromatin distributions of each nucleus were classified with a linear SVM,15 which achieved an accuracy of 85%. These studies suggest that the proliferation status of cells is correlated with their morphological and texture characteristics. Therefore, it may be possible to translate an H&E-stained specimen into its IHC counterpart by analyzing the features of the nuclei and identifying the proliferating nuclei that should be marked with the DAB component.

2.3. Digital Staining

Digital staining enables the visualization of tissue regions by analyzing their features with algorithms instead of physical pigments. Traditionally, digital staining has been realized by analyzing the spectral characteristics of the tissue.16 Advances in deep learning have facilitated further research on transforming stain types with neural networks: by leveraging the visual features of different tissue regions, the colors of the corresponding pigments are assigned to each tissue area.
For example, Chang et al.17 proposed transforming H&E into an immunofluorescence stain using the Pix2Pix model,18 and Xu et al.19 and Quiros et al.20 used adversarial networks to generate realistic stained specimen samples. De Haan et al.21 used GAN-based methods to transfer the H&E stain to Masson's trichrome, Jones, and PAS stains. However, these are histological stains corresponding to human-visible structures, such as membranes or fibers, and do not reveal molecule-level activities. Mercan et al.22 utilized the Cycle-GAN23 to map an image of an H&E-stained breast specimen to its phosphohistone H3 (pHH3) stain counterpart and revealed the presence of mitotic cells in the tissue. Li et al.24 used a U-Net with Gaussian-weighted masks of cell centroids to distinguish mitotic cells and revealed a correlation between the visual patterns in H&E images and the cell cycle information revealed by pHH3. Note that their problem setting is similar to that of the present work; however, pHH3 is expressed only during the G2 and mitosis phases, whereas Ki-67 is expressed during all active phases of the cell cycle.25 Moreover, mitotic cells in H&E-stained specimens can be distinguished visually, whereas Ki-67-positive cells cannot be directly observed. Utilizing features related to Ki-67 expression is therefore more challenging, as the visual characteristics are relatively subtle during nonmitotic phases.

Three highly related studies are introduced herein. Liu et al.26 used ResNet-1527 to classify manually annotated nucleus patches in neuroendocrine tumors; the network was reformed into an FCN to generate a heatmap of positive nuclei, and a strong correlation was observed between the positive pixel area ratios in the prediction and the ground truth. Liu et al.28 used a Cycle-GAN-like model on serial sections of neuroendocrine cancer and breast cancer to generate digital Ki-67 stains and obtained a strong correlation of the Ki-67-positive area. Martino et al.29 used the Pix2Pix model to predict Ki-67 positivity in H&E-stained oral squamous cell carcinoma tissues and reported a strong correlation of the LI. This precedent research has shown the possibility of stain conversion with generative models. However, the generalization of such models has not been elucidated, especially in terms of the reliable derivation of nucleus-level diagnostic metrics, such as the LI, in intercase scenarios; the cross-case performance of these models remains unclear. Additionally, the FCN sacrifices image resolution as it downsamples the image, whereas a U-Net-based generator can preserve the resolution of the input.

In this study, we used cross-case schemes to quantitatively evaluate the generalization gap of U-Net-based Ki-67 LI prediction across cases. We also present the results of deriving stain density maps from the optical density (OD) image instead of the RGB image. Compared with the RGB space, using OD images to train the U-Net improved the correlation of the LI under both the intraslide and cross-case scenarios. This is the first report to depict the correlation between the LIs of the digital IHC stain and the physical stain under a cross-case condition for UCEC.

3. Methodology

3.1. Overview

We used the U-Net12 to directly predict the digital staining images in the OD or RGB space. Both models were trained in an end-to-end manner. The stain density maps were calculated from the OD using the color unmixing technique (see Sec. 3.2).
As for the physical processing of the specimens, a section of a physical specimen was first stained manually with H&E. After this section was scanned and digitized as an H&E-stained WSI, we destained the very section and manually applied the IHC method to it. Finally, we scanned the IHC-stained physical specimen and used its WSI as the physical ground truth. After scanning and spatial registration, we extracted the OD of the stains in the IHC image using the color unmixing method. The H&E–IHC image pairs were used to train the U-Net, as shown in Fig. 3. Four input–output color space combinations were used in the present study: OD–OD, RGB–RGB, RGB–OD, and OD–RGB, where the OD images separate the stains into individual channels. For inference, we used the trained model either to predict the OD image of the stains and convert the output OD to RGB or to infer the RGB image of the Ki-67 IHC staining directly. The matrix for color unmixing was reused during the OD-to-RGB conversion of the output.

3.2. Color Unmixing

The color unmixing method separates the stains in an RGB image based on their absorption characteristics.30 Each pigment in a physically stained specimen has its own absorption coefficients for the R, G, and B lights. Thus, it is assumed that the OD values (absorbances) of the RGB components can be represented by a linear combination of the stain amounts.31 In the case of H&E staining, the OD image mentioned in the previous section consists of the stain density maps of hematoxylin and eosin and the map of the background component, where the term "stain density" represents the amount of stain estimated in each pixel. Since the absorption characteristic of the background is unknown and the image has only three channels, we approximately consider a residual component as the background. The linear mixing relationship is then given by

$$\mathbf{y} = V\,\mathbf{c}, \qquad (1)$$

where $\mathbf{c} = (c_H, c_E, c_r)^T$ is a vector of the stain densities of hematoxylin and eosin and the intensity of the residual component, and $\mathbf{y} = (y_R, y_G, y_B)^T$ is a vector of the OD values of the R, G, and B components. $V$ is a matrix of the absorption coefficients of hematoxylin and eosin and the coefficients for the residual component. $V$ is sometimes called the stain matrix, and if $V_{HE}$ is the stain matrix for the H&E stain, we have

$$V_{HE} = \begin{pmatrix} \mathbf{v}_H & \mathbf{v}_E & \mathbf{v}_r \end{pmatrix}, \qquad (2)$$

where $\mathbf{v}_s$ is the absorption coefficient vector of stain $s$, with $s = H$, $E$, and the residual coefficient vector $\mathbf{v}_r$ is obtained by the cross product $\mathbf{v}_H \times \mathbf{v}_E$. The exact definition of $\mathbf{v}_s$ and its derivation are described at the end of this section. The actual values of the coefficients in Eq. (2) were obtained from the study conducted by Ruifrok et al.,30 although they may not be suitable for the slides used in the present study owing to variations in the absorption characteristics caused by the chemical conditions of the stain, the staining time, and the specimen transmittance.32 Quantitation and alleviation of such biases will be investigated in future studies.

Similarly, the OD image of H-DAB staining consists of the stain density maps of hematoxylin and DAB and the map of the residual component. The eosin component in Eq. (2) is substituted with DAB; thus, the residual coefficient vector is $\mathbf{v}_r = \mathbf{v}_H \times \mathbf{v}_{DAB}$, and the stain matrix for H-DAB staining becomes

$$V_{H\text{-}DAB} = \begin{pmatrix} \mathbf{v}_H & \mathbf{v}_{DAB} & \mathbf{v}_r \end{pmatrix}. \qquad (3)$$

Following the Beer–Lambert law,33 the pixel values $I_k$, which represent the light intensities recorded by the sensor, were normalized by the maximum intensity $I_0$, which can be obtained from glass regions, and converted to the OD with an element-wise division followed by a logarithm, as shown in Eq. (4):

$$y_k = -\log_{10}\!\left(\frac{I_k}{I_0}\right), \qquad k \in \{R, G, B\}. \qquad (4)$$

The density of each stain was then calculated from the OD values of the R, G, and B channels.
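To make the conversion and unmixing above concrete, the following NumPy sketch builds a stain matrix as in Eqs. (2) and (3), converts an RGB tile to OD as in Eq. (4), and inverts the matrix to obtain per-stain density maps. The function names are ours, and the absorption coefficients shown are the commonly quoted Ruifrok–Johnston values for hematoxylin and DAB, which are not necessarily the exact values used in this study.

```python
import numpy as np

# Commonly quoted Ruifrok-Johnston OD (absorption) coefficient vectors for
# hematoxylin and DAB; the paper's exact coefficients may differ.
V_H   = np.array([0.650, 0.704, 0.286])
V_DAB = np.array([0.268, 0.570, 0.776])


def build_stain_matrix(v_a, v_b):
    """Stack two stain vectors and a residual vector (their cross product),
    each normalized to unit length, as the columns of the stain matrix V."""
    v_res = np.cross(v_a, v_b)
    cols = [v / np.linalg.norm(v) for v in (v_a, v_b, v_res)]
    return np.stack(cols, axis=1)              # shape (3, 3), one column per stain


def rgb_to_od(rgb, i0=255.0, eps=1.0):
    """Element-wise Beer-Lambert conversion, Eq. (4): y = -log10(I / I0)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return -np.log10(np.clip(rgb, eps, None) / i0)


def unmix(od, stain_matrix):
    """Per-pixel stain densities c = V^{-1} y (Eqs. (1) and (5))."""
    inv = np.linalg.inv(stain_matrix)
    return np.einsum("ij,hwj->hwi", inv, od)   # (H, W, 3) density maps


def densities_to_rgb(densities, stain_matrix, i0=255.0):
    """Recompose an RGB image from density maps by inverting Eq. (4)."""
    od = np.einsum("ij,hwj->hwi", stain_matrix, densities)
    return np.clip(i0 * 10.0 ** (-od), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # Random array as a stand-in for an H-DAB IHC tile.
    tile = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)
    V_HDAB = build_stain_matrix(V_H, V_DAB)
    density = unmix(rgb_to_od(tile), V_HDAB)   # channels: hematoxylin, DAB, residual
    rgb_back = densities_to_rgb(density, V_HDAB)
```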
For example, Eq. (5) shows the calculation of the hematoxylin and eosin densities from an H&E-stained RGB image:

$$\begin{pmatrix} c_H \\ c_E \\ c_r \end{pmatrix} = V_{HE}^{-1} \begin{pmatrix} -\log_{10} \hat{I}_R \\ -\log_{10} \hat{I}_G \\ -\log_{10} \hat{I}_B \end{pmatrix}, \qquad (5)$$

where $V_{HE}^{-1}$ is the inverse of the stain OD matrix $V_{HE}$; $c_H$, $c_E$, and $c_r$ are the stain densities of the hematoxylin, eosin, and residual components, respectively; and $\hat{I}_R$, $\hat{I}_G$, and $\hat{I}_B$ are the normalized intensities of the R, G, and B channels for each pixel. Figure 4 presents an example of a color-unmixed IHC stain, where the residual channel denotes the component orthogonal to the hematoxylin and DAB stains. The positive nuclei are clearly stained in the DAB channel, and the negative cells can be distinguished in the hematoxylin channel.

The exact definition of $\mathbf{v}_s$ was derived according to the Beer–Lambert law. Let $A_{s,\lambda}$ denote the absorbance of a sample that contains the material (stain) $s$ for wavelength $\lambda$, $\varepsilon_{s,\lambda}$ denote the molar absorption coefficient of the stain $s$, $C_s$ denote the molar concentration of the stain, and $l$ denote the optical path length in the sample. Thus,

$$A_{s,\lambda} = \varepsilon_{s,\lambda}\, C_s\, l.$$

If we neglect the scattering in the material, the intensity of the transmitted light with wavelength $\lambda$ in a region purely stained with $s$ is given by

$$I_{s,\lambda} = I_0\, 10^{-\varepsilon_{s,\lambda} C_s l},$$

where $I_0$ is the light intensity incident on the material. Another approximation is to consider the wavelength only with the R, G, and B color channels. The OD of a single-stained sample then corresponds to the absorbance, $y_{s,k} = A_{s,k}$ with $k \in \{R, G, B\}$. In WSI, $C_s l$ represents the amount of molecules in the effective cross section that corresponds to a single pixel. However, it is difficult to quantify the absolute amount of molecules, and we do not need the absolute value of the material concentration. Now let us consider an arbitrary constant $\alpha$ for normalization. Then, we have

$$y_{s,k} = \varepsilon_{s,k}\, C_s\, l = \frac{\varepsilon_{s,k}}{\alpha}\,(\alpha\, C_s\, l) = v_{s,k}\, q_s,$$

where $v_{s,k} = \varepsilon_{s,k}/\alpha$ denotes the relative absorption coefficient after normalization, and we define the relative amount of molecules $q_s = \alpha\, C_s\, l$, whereas $\mathbf{v}_s = (v_{s,R}, v_{s,G}, v_{s,B})^T$. If we select pixels purely stained with the stain $s$ and obtain their OD vectors, we can determine the absorption coefficient vector $\mathbf{v}_s$ by normalizing them.

3.3. Spatial Alignment

Since we washed out the H&E stain and subsequently applied the H-DAB stain to visualize the Ki-67 positivity of the corresponding nuclear regions, the IHC and H&E images of one specimen have a location misalignment caused by rescanning. Therefore, image registration of the H&E and IHC WSIs was performed. The registration was based on the estimation of an affine matrix, i.e., the transformation from the biased image to the reference image. The implementation of Marzahl et al.,34 which uses ORB features and FLANN matching,35 was applied in this study. Figure 5 shows an example of the spatial alignment.

3.4. U-Net

Figure 6 shows the U-Net implementation. The numbers on top of each convolutional module indicate the number of filters of the output convolutional layer in that module. The input shape of the network was fixed during training and arbitrary for inference. All convolutional layers use the same filter size, except for the output layer, which generates a three-channel output image, and all downsampling and upsampling operations use the same rate. DenseNet-121,36 pretrained on the ImageNet dataset,37 was used as the backbone of our network. The model and pretrained weights were adapted from the implementation in the segmentation models38 code base.

3.5. Training

The mean absolute error (MAE) was used as the loss function to train the U-Net, defined as

$$L_{\mathrm{MAE}} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|,$$

where $\hat{y}_i$ and $y_i$ correspond to the prediction and the ground truth, respectively, and $N$ is the number of pixels in a minibatch.
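A minimal sketch of how such a generator could be assembled and compiled is shown below. It assumes the qubvel/segmentation_models package cited in Sec. 3.4, and the input shape and learning rate are placeholders rather than the settings used in this study.

```python
import tensorflow as tf
import segmentation_models as sm

sm.set_framework("tf.keras")

# U-Net generator with a DenseNet-121 encoder pretrained on ImageNet, predicting a
# three-channel image (e.g., hematoxylin, DAB, and residual density maps). The linear
# output activation leaves the predicted OD values unbounded.
model = sm.Unet(
    backbone_name="densenet121",
    input_shape=(None, None, 3),  # placeholder; fixed-size tiles are used for training
    classes=3,
    activation="linear",
    encoder_weights="imagenet",
)

# Mean absolute error over all pixels in the minibatch, as defined in Sec. 3.5.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # placeholder learning rate
    loss=tf.keras.losses.MeanAbsoluteError(),
)
```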
As shown in Fig. 3, in all schemes, the images in the color spaces of the U-Net's input and output were prepared in advance; i.e., when the U-Net predicts OD, we calculated and backpropagated the loss between the output stain density map and a precomputed ground truth rather than computing the loss against the RGB IHC stain after postprocessing. The same stain matrices and maximum intensity $I_0$ were used for all training and predictions.

To validate the generalization ability of our method, we designed two different schemes for training and validation. The intraslide training and validation scheme included all WSIs in the training set, and randomly sampled regions in each case were used for validation. As shown in Fig. 7, the green grids represent the data shards used for training, whereas the yellow grids represent the data used for validation. Intraslide inference can be used to generate digital staining of tiles when annotation and IHC staining of other tiles in the same tissue are available, and this scheme was used frequently in previous reports. However, the similarities in tissue structure and staining condition in the intraslide scheme may introduce biases, thereby preventing generalization to other cases; this gap is experimentally presented in Sec. 5.3. The cross-case validation scheme was used to test the model's generalization ability across cases. That is, we took sixteen cases in each grade from the dataset for training and left three cases in each grade for validation. In this sixfold validation, no information from any region of the testing cases was involved in training, and the effectiveness of the models' cross-case prediction could be qualitatively shown.

4. Experiment

4.1. Hardware and Software

As Table 1 shows, we used TensorFlow 2.039 as the basic framework for neural network construction and data processing. QuPath40 was adopted as a third-party tool for annotation and evaluation. All experiments were performed on an Nvidia DGX workstation with quad V100 GPUs, each with 32 GB of memory. A batch size of 64 was used for each GPU. The base learning rate of the Adam41 optimizer was scaled by the number of GPUs operating in parallel,42 and four GPUs were utilized for the training. The Adam hyperparameters β1, β2, and ε were fixed across all experiments, and no weight decay was used. We used a U-Net with a DenseNet-121 backbone, i.e., DenseNet-121-based encoder layers, and the model was trained for 50 epochs, which took approximately 11 h. All inference results were obtained with the models at epoch 50, and the test results of the cross-case models were generated using the model of the corresponding fold. The same U-Net architecture and backbone were used as the generator when training the GAN-based Pix2Pix and Cycle-GAN models, and these models were trained for the same number of steps.
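As an illustration of the learning-rate scaling described above, the sketch below scales the Adam base learning rate by the number of replicas under tf.distribute.MirroredStrategy; the base rate and the stand-in model are placeholders, not the exact settings of this study.

```python
import tensorflow as tf

BASE_LR = 1e-4        # placeholder base learning rate
PER_GPU_BATCH = 64    # batch size per GPU, as stated in Sec. 4.1

strategy = tf.distribute.MirroredStrategy()   # one replica per visible GPU
num_replicas = strategy.num_replicas_in_sync

global_batch_size = PER_GPU_BATCH * num_replicas
scaled_lr = BASE_LR * num_replicas            # scale the base rate by the number of parallel GPUs
print(num_replicas, global_batch_size, scaled_lr)

with strategy.scope():
    # Trivial stand-in model; the actual generator is the U-Net of Sec. 3.4.
    model = tf.keras.Sequential(
        [tf.keras.layers.Conv2D(3, 3, padding="same", input_shape=(None, None, 3))]
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=scaled_lr),
        loss=tf.keras.losses.MeanAbsoluteError(),
    )
```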
Table 1. Summary of the experiment settings.

The objective of Pix2Pix is shown in the following equation:

$$L_{G} = \mathbb{E}\!\left[\, \| D(\hat{y}) - 1 \|_{2}^{2} \,\right] + \lambda\, \mathbb{E}\!\left[\, \| y - \hat{y} \|_{1} \,\right],$$

where $\hat{y} = G(x)$ is the generated IHC patch, $y$ is the ground-truth IHC patch, $x$ is the input H&E patch, $G$ is the generator, $D$ is the discriminator, the first term is the adversarial loss, the second term is the L1 loss, and $\mathbb{E}$ is the mathematical expectation calculated by averaging over a minibatch. $\|\cdot\|_{2}$ and $\|\cdot\|_{1}$ are the L2 norm and the L1 norm, respectively. The weight factor λ followed the setting of the original Pix2Pix.18 The Pix2Pix model was likewise trained for 50 epochs.

The objective of Cycle-GAN is shown in the following equation:

$$L = L_{adv}(G, D_{Y}) + L_{adv}(F, D_{X}) + \lambda_{cyc}\, L_{cyc}(G, F) + \lambda_{id}\, L_{id}(G, F),$$

where $x$ is the H&E patch, $y$ is the ground-truth IHC patch, $G$ is the generator converting an H&E patch to IHC, $F$ is the generator converting an IHC patch to H&E, $D_{Y}$ is the discriminator for generated IHC patches, and $D_{X}$ is the discriminator for generated H&E patches. $L_{adv}$ is the adversarial loss; $L_{cyc}$ is the cycle consistency loss, such that $F(G(x))$, the output of the IHC-to-H&E generator, approximates the ground-truth H&E image, and vice versa for $G(F(y))$, the output of the H&E-to-IHC generator; and $L_{id}$ is the identity loss, such that the H&E-to-IHC generator does not change an IHC input, $G(y) \approx y$, and vice versa for the IHC-to-H&E generator, $F(x) \approx x$. The weight factors $\lambda_{cyc}$ and $\lambda_{id}$ followed the settings of the original Cycle-GAN.23

The RGB–RGB color space was used to train the Cycle-GAN models owing to their heavy computation. The learning rate of Adam was set separately for Pix2Pix and Cycle-GAN, and β1 was set to 0.5. The remaining hyperparameters of the GAN-based methods were identical to those of the proposed method. The Cycle-GAN was likewise trained for 50 epochs. On average, inference of the U-Net generator in all models required 2.9 s with a CPU and 1.5 s with a GPU for a tile in the test set.

4.2. Dataset

4.2.1. Pathology specimens

To acquire the original pathological data of paired H&E and IHC stains, we used specimens of UCEC diagnosed at Shinshu University Hospital. Fifty-seven cases classified as G1, G2, and G3 according to the International Federation of Gynecology and Obstetrics (FIGO) classifications43 were used in this study; each grade comprised 19 cases, each contributing one specimen. The H&E-stained specimens were decolorized after scanning, and the IHC reaction for Ki-67 was performed on the same specimens. This process reveals the positive reactions of the nuclei in the IHC stain for the same sections whose fundamental morphological and texture features are shown in the H&E-stained images. The IHC reaction was performed using the Novolink Polymer method (Leica Biosystems, Nussloch, Germany). The primary antibody against the Ki-67 protein (clone: MIB-1, Dako, Santa Clara, California, USA) was allowed to react at room temperature for 1 h. The IHC reaction products were visualized with a DAB substrate chromogen as deep brown, and Ki-67-negative nuclei were stained blue with Mayer's hematoxylin, thereby yielding high visual contrast. Both the H&E-stained and IHC-stained specimens were scanned using a whole slide scanner (NanoZoomer 2.0-HT, Hamamatsu Photonics Corp., Shizuoka, Japan) with a 40× objective lens, and the WSIs were aligned subsequently. Thus, 57 pairs of H&E-stained and IHC-stained WSIs of the physical specimens were obtained.

4.2.2. Sampling and preprocessing

Manual registration of all 57 cases was laborious and infeasible; we used affine matrix estimation from ORB features and FLANN matching instead. The window size for keypoint extraction was fixed, and the maximum number of features was set to 131,072. Registration was performed on WSIs downsampled to 32,768 pixels in width. The registration error was evaluated as the MAE of the x and y coordinates over ninety landmark points manually set in nine cases. The average registration errors were 6.4 and 3.8 pixels in the x and y coordinates, respectively.
This error was comparable with that of manual registration for the same images; thus, automatic registration was considered acceptable (Fig. 8). To extract the ROIs and build the dataset, tiles were sampled according to the blue ratio of the downsampled WSIs. Regions with a higher blue ratio were considered to contain concentrated tumor cells stained with hematoxylin and are therefore suitable for training.44 After preprocessing, 7370 samples were extracted from the 57 pairs of WSIs. We randomly selected six samples from each WSI in advance and used them for testing. As a result, we obtained 7028 sets of H&E–IHC tile pairs in the OD and RGB color spaces for training and validation and 342 sets for testing. The tile pairs in the training and validation splits were selected randomly at runtime with a fixed random seed, and the tiles were cropped to the network input size in the training phase.

4.3. Evaluation Metrics

4.3.1. Labeling index

We evaluate the Pearson correlation of the Ki-67 LI calculated from the digital staining results and the corresponding physical stains. The Ki-67 LI is the proportion of Ki-67-positive cells in a tumor region. The calculation of the LI is shown in Eq. (11), where $N_{+}$ is the number of positive nuclei in the tumor regions and $N_{-}$ is the number of negative nuclei:

$$\mathrm{LI} = \frac{N_{+}}{N_{+} + N_{-}}. \qquad (11)$$

As shown in Fig. 3, six patches excluded from the training and validation sets were sampled from each WSI. Identical postprocessing parameters were set in the QuPath quantitation software for nucleus counting in all experiments. The derived labeling indices may vary according to the parameter settings and the selection of evaluation regions.

4.3.2. Image similarity

Image similarity metrics that are commonly used in image processing tasks were evaluated to compare the proposed method with the baselines comprehensively. We report the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM)45 of the digital stain. Let $\hat{y}$ and $y$ denote the prediction and the ground-truth images, respectively. The PSNR is defined as

$$\mathrm{PSNR}(\hat{y}, y) = 10 \log_{10} \frac{\max(y)^{2}}{\frac{1}{N}\sum_{i=1}^{N} (\hat{y}_{i} - y_{i})^{2}},$$

where $\max(\cdot)$ is the maximum value function and $N$ is the number of pixels. The SSIM is defined as

$$\mathrm{SSIM}(\hat{y}, y) = \frac{(2\mu_{\hat{y}}\mu_{y} + c_{1})(2\sigma_{\hat{y}y} + c_{2})}{(\mu_{\hat{y}}^{2} + \mu_{y}^{2} + c_{1})(\sigma_{\hat{y}}^{2} + \sigma_{y}^{2} + c_{2})},$$

where $\mu_{\hat{y}}$ and $\mu_{y}$ are the pixel sample means, $\sigma_{\hat{y}}$ and $\sigma_{y}$ are the standard deviations, $\sigma_{\hat{y}y}$ is the cross correlation of $\hat{y}$ and $y$, and $c_{1}$ and $c_{2}$ are small factors for numerical stabilization. Because registration errors exist in the preprocessing of our dataset, there are location misalignments between the H&E image and the IHC image. When evaluating the similarity between the digital IHC stain, which is generated from the physical H&E stain, and the physical IHC stain, the translation sensitivity of PSNR and SSIM results in lower, biased scores. Therefore, we also report the complex wavelet SSIM (CW-SSIM),46 which computes the similarity of images in the frequency domain and alleviates the effect of registration errors. We computed the average CW-SSIM over the channels of the digital and physical IHC staining images.

5. Result

5.1. Visual Result

We report the results using the OD–OD, RGB–RGB, OD–RGB, and RGB–OD color spaces under the intraslide and cross-case schemes. Figures 9 and 10 show tiles of the H&E specimens, the corresponding physical IHC stains, and the digital stains generated by the U-Net; Figs. 11 and 12 show the results of the GAN-based models. The intraslide models generated results with faithful colors and precise Ki-67 positivity predictions. In contrast, the cross-case models exhibited artifacts, such as local blurring and color variations.
In general, the distribution of the Ki-67-positive nuclei was correlated with that of the physical staining; however, the advantages and disadvantages of the color space combinations could not be determined qualitatively. Pix2Pix generated images with acceptable colors and positivity, whereas the Cycle-GAN model failed to produce a meaningful Ki-67-positive cell distribution even under the intraslide scheme, and the color of its generated images differed significantly from the ground truth.

5.2. Similarity Metrics

Table 2 presents the pixel-level PSNR, SSIM, and CW-SSIM of the U-Net-based, end-to-end models with different input/output color space combinations for the intraslide and cross-case experiments. "O" and "R" correspond to the OD space and RGB space, respectively; for example, "OR" means the model uses an OD input and an RGB output to train the generator. The differences between the color space combinations were not prominent in general. The slightly higher scores of the RGB–RGB model indicate potentially better color and structural fidelity of the resulting images; however, a nucleus-level comparison necessitates further evaluation. In terms of the image similarity metrics, the Pix2Pix model achieved scores comparable with those of the U-Net-based models. We report only the results of the Cycle-GAN with RGB–RGB generators; experiments with other color space combinations were not conducted owing to the obvious visual and quantitative inferiority.
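For reference, the per-tile PSNR and SSIM reported in Table 2 can be computed with scikit-image along the following lines. CW-SSIM is omitted because it is not part of scikit-image, the tile loading is only a placeholder, and the channel_axis argument assumes scikit-image 0.19 or later.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def tile_similarity(pred_rgb, gt_rgb):
    """PSNR and SSIM between a digital-stain tile and its physical IHC tile.

    Both inputs are uint8 RGB arrays of identical shape (H, W, 3).
    """
    pred = np.asarray(pred_rgb, dtype=np.uint8)
    gt = np.asarray(gt_rgb, dtype=np.uint8)
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
    return psnr, ssim


# Placeholder tiles; in practice these come from the generated and physical WSIs.
pred = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
gt = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
print(tile_similarity(pred, gt))
```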
Table 2. Similarity metrics of different models with various color space combinations.

5.3. Quantitation of Labeling Index

Figures 13 and 14 present the Pearson correlation and Bland–Altman plots of the LI derived using the U-Net, respectively. The U-Net yielded a strong, statistically significant correlation when the grade of each case was not considered, i.e., under the intraslide scheme. However, the digital staining models trained with the cross-case scheme were not equally correlated with the physical IHC staining. The differences in the mean values were also quantitated and visualized with Bland–Altman plots. The statistical analysis results are summarized in Tables 3 and 4, wherein agreement means that there is no significant difference between the mean values of the LIs of the digital and physical stains according to a two-sided t-test. The p-values of the Pearson correlation and the two-sided t-test were computed, and the output of the model was considered consistent with the physical stain when the t-test revealed an insignificant difference. We also quantified the error with the MAE of the LI. The results of the intraslide models were consistent with the physical stain, indicating a strong correlation. Although weaker, a correlation was also observed for the cross-case validation (CCV) models, indicating the potential utility of digital Ki-67 staining after considerable improvement of the technology. The Bland–Altman plots revealed negative biases, indicating the necessity of alleviating false negatives, especially in high-grade cases. As shown in Figs. 15 and 16, the Pearson correlations of Pix2Pix and Cycle-GAN failed to outperform the U-Net under any training scheme.
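The statistics behind Tables 3 and 4 can be reproduced with SciPy along the following lines; the LI arrays here are placeholders, not the study data.

```python
import numpy as np
from scipy.stats import pearsonr, ttest_rel

# Placeholder labeling indices (%) for matched evaluation regions; in the paper
# these come from QuPath nucleus counts on the digital and physical stains.
li_digital = np.array([12.0, 30.5, 55.2, 70.1, 41.3])
li_physical = np.array([14.5, 33.0, 58.8, 75.4, 43.0])

r, p_corr = pearsonr(li_digital, li_physical)          # Pearson correlation and its p-value
t_stat, p_ttest = ttest_rel(li_digital, li_physical)   # two-sided paired t-test on the means
mae = np.mean(np.abs(li_digital - li_physical))        # MAE of the LI

# Bland-Altman quantities: mean difference (bias) and 95% limits of agreement.
diff = li_digital - li_physical
bias = diff.mean()
loa = (bias - 1.96 * diff.std(ddof=1), bias + 1.96 * diff.std(ddof=1))

print(f"r={r:.3f} (p={p_corr:.3g}), t-test p={p_ttest:.3g}, MAE={mae:.2f}, "
      f"bias={bias:.2f}, LoA={loa}")
```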
Table 3. Statistical evaluation results, intraslide scheme. p-values smaller than 10^-9 are shown as 0.

Note: bold values represent the best result obtained for each metric.

Table 4. Statistical evaluation results, cross-case validation.
Note: bold values represent the best result obtained for each metric.

6. Discussion

The two main features of the proposed method are as follows. First, it intuitively reveals the positivity of cells on the resulting images, and the resolution of the generated digital stain is higher than that of the FCN-based method.26 The generated digital stains, wherein the textures of chromatin and stromal tissues are preserved, are more intelligible for pathologists. Second, the proposed method utilizes the color unmixing method to separate the stains, thereby enabling direct supervision of the Ki-67-positive regions in the DAB channel without necessitating the manual annotation of each nucleus. The preprocessing procedure based on color unmixing facilitates the explicit extraction of Ki-67-positive nuclei, even from low-quality stains, whereas methods based on generative models do not focus on such semantics and harm the explainability of the model.

OD–OD inference yielded the highest correlation with the ground truth under both the intraslide and cross-case schemes. It is presumed that the color difference of the output image affects the results of nucleus quantification, as the effective features for distinguishing positive nuclei are mainly textural and chromatic. Thus, calculating the OD for the input and output can provide clearer supervision of positive pixels and address the variation of each stain separately. Unmixing the stain channels will also facilitate stain intensity adjustment and color normalization of WSIs, and using such per-stain labels might contribute to the cross-case generalization of digital staining models.

The primary limitation of the current method is the difficulty in generalizing the high prediction precision to cross-case scenarios. This generalization gap may be attributed to color differences and redundant global information. With the stains in WSIs separated into OD channels, it would be feasible to normalize the staining intensities during training, which is part of our future work. The U-Net accepts input images containing a tissue region rather than a single nucleus and therefore involves global characteristics, such as glandular structures and specific patterns of cell swarms, during training. Such features vary from case to case and can hinder the cross-case inference of our models.

There have been previous studies on digital staining; however, practical evaluations for clinical applications have not been conducted, and the evaluation has primarily relied on assessing the visual similarity between digital images and their physically stained counterparts. As the purpose of IHC is to evaluate protein expression, such evaluations are of limited value unless the performance of digital IHC is assessed with a clinically relevant index. The result of evaluating with the LI is therefore valuable in positioning the potential for clinical application. The OD of DAB is also utilized as a diagnostic index and remains a challenge for future work.

Two evaluation schemes were compared in this study. Naturally, neither scheme used the same patches for training and validation. However, in the intraslide scheme, the training and validation sets included images from the same slide. A high correlation was observed in that case, whereas the correlation decreased remarkably when the training and validation data were separated by case. It should be noted that only a single slide from each case was used in this study.
If multiple slides are created for a single case, they should be considered intraslide data and treated accordingly, even though they are different slides.

The trending GAN-based domain transfer models, particularly Cycle-GAN and Pix2Pix, did not exhibit superiority in the nucleus quantitation task, although competitive pixel-level image similarity metrics were observed with Pix2Pix. The Cycle-GAN failed to yield a correlated result owing to the lack of effective fidelity supervision, such as an L1 loss. A cross-case generalization gap was also observed for the Pix2Pix model, and its nucleus-level LI correlation was even lower than that of the RGB–RGB U-Net baseline. Thus, methods with a direct fidelity loss, such as L1, are preferred over purely generative frameworks.

It is essential to refer to the CCV evaluation and strive for further improvement to facilitate the wider application of deep learning in clinical practice. Previous studies have not specified whether the CCV or intraslide scheme was used; however, it is crucial to state the training scheme explicitly, as it can lead to a significant difference in the results. On the other hand, there might be use cases resembling the intraslide scheme, although realizing them for digital staining is currently challenging and requires ingenuity. In such unique use cases, the results obtained from the intraslide evaluation can serve as a reference.

7. Conclusion

We propose a digital staining model that utilizes the OD of stains and converts an image of a hematoxylin-eosin stain into its hematoxylin-DAB counterpart. We examined the correlation between the digital stain and the physical stain with the Ki-67 LI, a diagnostic metric widely used in clinical practice for cell proliferation assessment. The algorithm was evaluated with 57 WSIs, and the results indicate that the U-Net can generate a realistic digital stain that correlates fairly well with the ground truth. We tested combinations of the OD and RGB color spaces for the input and output; conversion from the OD of H&E to the OD of IHC yielded the highest correlation among the tested choices. Correlation and bias analyses revealed a tendency toward underestimation of the LI and toward false negatives. A comparison of the CCV and intraslide training schemes showed that the correlation coefficients of the LI were 0.66 and 0.98, respectively. The accuracy of CCV must be enhanced to enable the application of digital staining technology; namely, the model's generalizability across cases must be improved. In some other publications, it is unclear whether the evaluation was conducted across cases. This study demonstrated a high correlation for the intraslide scheme but a considerably lower correlation for the CCV scheme; thus, the agreement of diagnostic metrics, such as the LI, should be evaluated via case-based cross validation, or the evaluation scheme should be clearly stated in the report. Although the current model could not yield a diagnostically precise digital stain for every specimen, a significant correlation was observed even during cross-case evaluation. Digital stains will assist pathologists in identifying the expression of Ki-67 in specimens and determining the malignancy of neoplasms.

Code and Data Availability

The programs are available at https://github.com/jic-titech/ki67, where readers can also access the raw data, including the evaluation images and the model weights, for reproducing the experimental results presented in this paper.
Acknowledgments

This study was approved by the Committee for Medical Ethics of Shinshu University, School of Medicine. This work was supported by Support for Pioneering Research Initiated by the Next Generation of the Japan Science and Technology Agency (JST SPRING) (Grant No. JPMJSP2106). This work was conducted in part under a project subsidized by the New Energy and Industrial Technology Development Organization (No. JPNP20006). We would like to acknowledge Editage for editing the English language in this paper.

References

1. T. Scholzen and J. Gerdes, "The Ki-67 protein: from the known and the unknown," J. Cell. Physiol. 182(3), 311–322 (2000). https://doi.org/10.1002/(SICI)1097-4652(200003)182:3<311::AID-JCP1>3.0.CO;2-9
2. S. Uxa et al., "Ki-67 gene expression," Cell Death Differ. 28(12), 3357–3370 (2021). https://doi.org/10.1038/s41418-021-00823-x
3. J. Gerdes et al., "Cell cycle analysis of a cell proliferation-associated human nuclear antigen defined by the monoclonal antibody Ki-67," J. Immunol. 133(4), 1710–1715 (1984). https://doi.org/10.4049/jimmunol.133.4.1710
4. L. Fulawka et al., "Assessment of Ki-67 proliferation index with deep learning in DCIS (ductal carcinoma in situ)," Sci. Rep. 12(1), 3166 (2022). https://doi.org/10.1038/s41598-022-06555-3
5. N. Kato et al., "Immunohistochemical expression of cyclin E in endometrial adenocarcinoma (endometrioid type) and its clinicopathological significance," J. Cancer Res. Clin. Oncol. 129(4), 222–226 (2003). https://doi.org/10.1007/s00432-003-0426-x
6. F. Kimura et al., "Detection of Ki67 expression by analyzing texture of hematoxylin-and-eosin-stained images, the effectiveness of signal intensity, and co-occurrence matrix features," Anal. Quant. Cytopathol. Histopathol. 40(1), 9–19 (2018).
7. S. Watanabe et al., "Analysis of nuclear chromatin distribution in cervical glandular abnormalities," Acta Cytol. 48(4), 505–513 (2004). https://doi.org/10.1159/000326412
8. M. Sobecki et al., "Cell-cycle regulation accounts for variability in Ki-67 expression levels," Cancer Res. 77(10), 2722–2734 (2017). https://doi.org/10.1158/0008-5472.CAN-16-0707
9. B. Bai et al., "Deep learning-enabled virtual histological staining of biological samples," Light Sci. Appl. 12(1), 57 (2023). https://doi.org/10.1038/s41377-023-01104-7
10. S. Minaee et al., "Image segmentation using deep learning: a survey," IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3523–3542 (2021). https://doi.org/10.1109/TPAMI.2021.3059968
11. E. Shelhamer, J. Long, and T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017). https://doi.org/10.1109/TPAMI.2016.2572683
12. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," Lect. Notes Comput. Sci. 9351, 234–241 (2015). https://doi.org/10.1007/978-3-319-24574-4_28
13. T. J. Fuchs and J. M. Buhmann, "Computational pathology: challenges and promises for tissue analysis," Comput. Med. Imaging Graphics 35(7–8), 515–530 (2011). https://doi.org/10.1016/j.compmedimag.2011.02.006
14. R. M. Haralick, K. Shanmugam, and I. H. Dinstein, "Textural features for image classification," IEEE Trans. Syst. Man Cybern. SMC-3(6), 610–621 (1973). https://doi.org/10.1109/TSMC.1973.4309314
15. C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn. 20, 273–297 (1995). https://doi.org/10.1007/BF00994018
16. P. A. Bautista et al., "Digital staining for multispectral images of pathological tissue specimens based on combined classification of spectral transmittance," Comput. Med. Imaging Graphics 29(8), 649–657 (2005). https://doi.org/10.1016/j.compmedimag.2005.09.003
17. Y. H. Chang et al., "SHIFT: speedy histopathological-to-immunofluorescent translation of whole slide images using conditional generative adversarial networks," Proc. SPIE 10581, 1058105 (2018). https://doi.org/10.1117/12.2293249
18. P. Isola et al., "Image-to-image translation with conditional adversarial networks," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., pp. 1125–1134 (2018).
19. Z. Xu et al., "GAN-based virtual re-staining: a promising solution for whole slide image analysis," (2019).
20. A. C. Quiros, R. Murray-Smith, and K. Yuan, "PathologyGAN: learning deep representations of cancer tissue," (2019).
21. K. de Haan et al., "Deep learning-based transformation of H&E stained tissues into special stains," Nat. Commun. 12(1), 4884 (2021). https://doi.org/10.1038/s41467-021-25221-2
22. C. Mercan et al., "Virtual staining for mitosis detection in breast histopathology," in Proc. Int. Symp. Biomed. Imaging, pp. 1770–1774 (2020).
23. J.-Y. Zhu et al., "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proc. IEEE Int. Conf. Comput. Vision, pp. 2223–2232 (2017).
24. J. Li et al., "U-Net based mitosis detection from H&E-stained images with the semi-automatic annotation using pHH3 IHC-stained images," Proc. SPIE Image Process. 12032, 669–681 (2022).
25. P. S. Nielsen et al., "Proliferation indices of phosphohistone H3 and Ki67: strong prognostic markers in a consecutive cohort with stage I/II melanoma," Mod. Pathol. 26(3), 404–413 (2013). https://doi.org/10.1038/modpathol.2012.188
26. Y. Liu et al., "Predict Ki-67 positive cells in H&E-stained images using deep learning independently from IHC-stained images," Front. Mol. Biosci. 7, 183 (2020). https://doi.org/10.3389/fmolb.2020.00183
27. K. He et al., "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
28. S. Liu et al., "Unpaired stain transfer using pathology-consistent constrained generative adversarial networks," IEEE Trans. Med. Imaging 40(8), 1977–1989 (2021). https://doi.org/10.1109/TMI.2021.3069874
29. F. Martino et al., "A deep learning model to predict Ki-67 positivity in oral squamous cell carcinoma," J. Pathol. Inf. 15, 100354 (2024). https://doi.org/10.1016/j.jpi.2023.100354
30. A. C. Ruifrok et al., "Quantification of histochemical staining by color deconvolution," Anal. Quant. Cytol. Histol. 23(4), 291–299 (2001).
31. I. Oshina and J. Spigulis, "Beer–Lambert law for optical tissue diagnostics: current state of the art and the main limitations," J. Biomed. Opt. 26(10), 100901 (2021). https://doi.org/10.1117/1.JBO.26.10.100901
32. Y. Murakami et al., "Color correction for automatic fibrosis quantification in liver biopsy specimens," J. Pathol. Inf. 4(1), 36 (2013). https://doi.org/10.4103/2153-3539.124009
33. D. F. Swinehart, "The Beer-Lambert law," J. Chem. Educ. 39(7), 333 (1962). https://doi.org/10.1021/ed039p333
34. C. Marzahl et al., "Robust quad-tree based registration of whole slide images," in MICCAI Workshop Comput. Pathol. (COMPAY) (2021).
35. E. Rublee et al., "ORB: an efficient alternative to SIFT or SURF," in Int. Conf. Comput. Vision, pp. 2564–2571 (2011).
36. G. Huang et al., "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR), pp. 2261–2269 (2017). https://doi.org/10.1109/CVPR.2017.243
37. J. Deng et al., "ImageNet: a large-scale hierarchical image database," in IEEE Conf. Comput. Vision and Pattern Recognit., pp. 248–255 (2009). https://doi.org/10.1109/CVPR.2009.5206848
38. P. Iakubovskii, "Segmentation models," (2019). https://github.com/qubvel/segmentation_models
39. M. Abadi et al., "TensorFlow: large-scale machine learning on heterogeneous systems," (2015). https://tensorflow.org
40. P. Bankhead et al., "QuPath: open source software for digital pathology image analysis," Sci. Rep. 7(1), 16878 (2017). https://doi.org/10.1038/s41598-017-17204-5
41. D. P. Kingma and J. L. Ba, "Adam: a method for stochastic optimization," in 3rd Int. Conf. Learn. Represent. (ICLR) (2015).
42. S. L. Smith et al., "Don't decay the learning rate, increase the batch size," in 6th Int. Conf. Learn. Represent. (ICLR), pp. 1–11 (2018).
43. S. Pecorelli, "Revised FIGO staging for carcinoma of the vulva, cervix, and endometrium," Int. J. Gynaecol. Obstetr. 105(2), 103–104 (2009). https://doi.org/10.1016/j.ijgo.2009.02.012
44. H. Chang, L. Loss, and B. Parvin, "Nuclear segmentation in H&E sections via multi-reference graph cut (MRGC)," in Int. Symp. Biomed. Imaging (ISBI), pp. 1–4 (2012).
45. Z. Wang et al., "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process. 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
46. Z. Wang and E. P. Simoncelli, "Translation insensitive image similarity in complex wavelet domain," in Proc. IEEE Int. Conf. Acoust. Speech and Signal Process. (ICASSP), pp. 573–576 (2005). https://doi.org/10.1109/ICASSP.2005.1415469
Biography

Cunyuan Ji is a PhD candidate at the School of Engineering of Tokyo Institute of Technology. He graduated from the School of Optoelectronic Science and Engineering at the University of Electronic Science and Technology of China and received his Master of Engineering degree from Tokyo Institute of Technology. His research interests include pathology image analysis, deep learning, and explainable AI.

Takumi Urata completed his master's degree in the Department of Health Sciences of the Clinical Laboratory Sciences Division at Shinshu University. He is currently a PhD candidate in the Department of Information at Tokyo Institute of Technology, where he is conducting research on the relationship between the expression levels of DNA replication-related proteins and endometrial carcinoma and its precursors. His main research areas are deep learning, medical image analysis, and biomedical engineering.

Fumikazu Kimura received his PhD in medical science from the Department of Clinical Cytology of the Graduate School of Medical Sciences at Kitasato University in 2010. He was a research fellow of the Global Scientific Information and Computing Center at Tokyo Institute of Technology from 2010 to 2015. Since 2015, he has been in the Department of Biomedical Laboratory Sciences of the School of Health Sciences at Shinshu University, where he is currently a junior associate professor.

Keiko Ishii received her medical doctor degree and her PhD from Shinshu University School of Medicine. She is working as a pathologist in the Division of Diagnostic Pathology of Okaya City Hospital. Her research includes lobular endocervical glandular hyperplasia and gynecological pathology.

Takeshi Uehara has been an associate professor in the Department of Laboratory Medicine at Shinshu University School of Medicine since 2014 and has served at Shinshu University Hospital as the director of Diagnostic Pathology (since 2015), the director of Laboratory Medicine (since 2022), and the director of the Blood Transfusion Center (since 2023). He graduated from Shinshu University School of Medicine in 1997 and has held various academic positions.

Kenji Suzuki worked in the Department of Radiology at the University of Chicago as an assistant professor and at the Illinois Institute of Technology as a tenured associate professor. He is currently a tenured professor and the founding director of the Biomedical Artificial Intelligence Research Unit, Tokyo Institute of Technology, Japan. He has published more than 395 papers (including 125 peer-reviewed journal papers), and his H-index is 61. He has been actively researching deep learning in medical imaging and AI-aided diagnosis for the past 25 years.

Saori Takeyama is an assistant professor in the Department of Information and Communications Engineering of the School of Engineering at Tokyo Institute of Technology. She received her BE degree in engineering in 2016 and her ME and PhD degrees in information and communications engineering in 2018 and 2021, respectively, from Tokyo Institute of Technology. From 2018 to 2021, she was a research fellow (DC1) of the Japan Society for the Promotion of Science.

Masahiro Yamaguchi is a professor at the School of Engineering of Tokyo Institute of Technology. He has been a faculty member of the same institute since 1989 and became a full professor in 2011. His research includes color and multispectral imaging, holography, pathology image analysis, and computational imaging.
He was the editor-in-chief of Optical Review, published by the Optical Society of Japan, from 2020 to 2022, and is currently the convener of CIE RF-01 "Spectral Imaging."