The performance of Deep Learning (DL) segmentation algorithms is routinely determined using quantitative metrics such as the Dice score and the Hausdorff distance. However, these metrics show low concordance with human perception of segmentation quality. Successful collaboration between healthcare professionals and DL segmentation algorithms will require a detailed understanding of how experts assess segmentation quality. Here, we present the results of a study on expert quality perception of brain tumor segmentations generated by a DL algorithm from brain MR images. Eight expert medical professionals were asked to grade the quality of segmentations on a scale from 1 (worst) to 4 (best), yielding four ratings per case for a dataset of 60 cases. We observed low inter-rater agreement among all raters (Krippendorff’s alpha: 0.34), potentially a result of different internal cutoffs for the quality ratings. Several factors, including the volume of the segmentation and model uncertainty, were associated with high disagreement between raters. Furthermore, the correlations between the ratings and commonly used quantitative segmentation quality metrics ranged from none to moderate. We conclude that, similar to the inter-rater variability observed for manual brain tumor segmentation, segmentation quality ratings are prone to variability due to the ambiguity of tumor boundaries and individual perceptual differences. Clearer guidelines for quality evaluation could help mitigate these differences. Importantly, existing technical metrics do not capture the clinical perception of segmentation quality. A better understanding of expert quality perception is expected to support the design of more human-centered DL algorithms for integration into the clinical workflow.
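For reference, the sketch below illustrates how the two quantitative metrics named above are commonly computed for binary segmentation masks. The function names, the toy masks, and the use of SciPy's directed_hausdorff are illustrative assumptions, not details taken from the study.

    # Minimal sketch of the quantitative metrics referenced above: the Dice
    # score and the symmetric Hausdorff distance between a predicted and a
    # reference binary segmentation mask. The toy masks are hypothetical.
    import numpy as np
    from scipy.spatial.distance import directed_hausdorff

    def dice_score(pred: np.ndarray, ref: np.ndarray) -> float:
        """Dice = 2|P & R| / (|P| + |R|) for binary masks."""
        pred, ref = pred.astype(bool), ref.astype(bool)
        denom = pred.sum() + ref.sum()
        if denom == 0:                      # both masks empty: define as perfect
            return 1.0
        return 2.0 * np.logical_and(pred, ref).sum() / denom

    def hausdorff_distance(pred: np.ndarray, ref: np.ndarray) -> float:
        """Symmetric Hausdorff distance between the foreground voxel sets."""
        p = np.argwhere(pred)               # (K, 2) foreground coordinates
        r = np.argwhere(ref)
        return max(directed_hausdorff(p, r)[0], directed_hausdorff(r, p)[0])

    if __name__ == "__main__":
        ref = np.zeros((64, 64), dtype=bool)
        ref[20:40, 20:40] = True            # toy "tumor" reference
        pred = np.zeros_like(ref)
        pred[22:42, 22:42] = True           # slightly shifted prediction
        print(f"Dice:      {dice_score(pred, ref):.3f}")
        print(f"Hausdorff: {hausdorff_distance(pred, ref):.2f} px")

As the abstract notes, even high values of such metrics need not coincide with expert judgments of quality, which is why the study compares them against human ratings.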
The communication of reliable uncertainty estimates is crucial in the effort towards increasing trust in Deep Learning applications for medical image analysis. Importantly, reliable uncertainty estimates should remain stable under naturally occurring domain shifts. In this study, we evaluate the relationship between epistemic uncertainty and segmentation quality under domain shift in two clinical contexts: optic disc segmentation in retinal photographs and brain tumor segmentation from multi-modal brain MRI. Specifically, we assess the behavior of two epistemic uncertainty metrics derived from (i) a single UNet’s sigmoid predictions, (ii) deep ensembles, and (iii) Monte Carlo dropout UNets, each trained with both soft Dice and weighted cross-entropy loss. Domain shifts were modeled by excluding a group with a known characteristic (glaucoma for optic disc segmentation and low-grade glioma for brain tumor segmentation) from model development and using the excluded data as additional, domain-shifted test data. While the performance of all models dropped slightly on the domain-shifted test data compared to the in-domain test set, the Pearson correlation coefficient between the uncertainty metrics and the Dice scores of the segmentations did not change. However, we did observe differences between the segmentation tasks in the performance of two quality assessment applications based on epistemic uncertainty. We introduce a new metric, the empirical strength distribution, to better describe the strength of the relationship between segmentation performance and epistemic uncertainty at the dataset level. We found that failures of the studied quality assessment applications were largely caused by shifts in the empirical strength distributions between the training, in-domain, and domain-shifted test datasets. In conclusion, quality assessment tools based on a strong relationship between epistemic uncertainty and segmentation quality can be stable under small domain shifts. Developers should thoroughly evaluate the strength of this relationship on all available data and, if possible, under domain shift to ensure the validity of these uncertainty estimates on unseen data.
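As a rough illustration of the setup described above, the sketch below shows one common way to derive an image-level epistemic uncertainty metric from Monte Carlo dropout and to quantify its relationship with Dice scores via the Pearson correlation. The model interface, the aggregation into a single scalar, and the placeholder numbers are hypothetical assumptions; the study's exact metrics may differ.

    # Minimal sketch, assuming a PyTorch segmentation model with dropout
    # layers. Uncertainty is summarized as the mean per-voxel standard
    # deviation of sigmoid outputs over T stochastic forward passes.
    import numpy as np
    import torch
    from scipy.stats import pearsonr

    def mc_dropout_uncertainty(model, image: torch.Tensor, n_samples: int = 20) -> float:
        """Image-level epistemic uncertainty from Monte Carlo dropout."""
        model.train()                       # keep dropout active at inference
        with torch.no_grad():
            probs = torch.stack(
                [torch.sigmoid(model(image)) for _ in range(n_samples)]
            )                               # shape: (T, 1, H, W)
        return probs.std(dim=0).mean().item()

    # Dataset-level strength of the uncertainty-quality relationship:
    # correlate per-case uncertainties with per-case Dice scores.
    # These arrays are placeholders, not results from the study.
    uncertainties = np.array([0.02, 0.08, 0.05, 0.12, 0.03])
    dice_scores   = np.array([0.91, 0.62, 0.78, 0.48, 0.88])
    r, p = pearsonr(uncertainties, dice_scores)
    print(f"Pearson r = {r:.2f} (p = {p:.3f})")  # expect a negative correlation

In this framing, the empirical strength distribution introduced in the abstract characterizes how strongly such correlations hold across a dataset, rather than reporting a single coefficient.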