18 June 2024
Synthetic vs real: exploring the impact of synthetic data on medical image classification
José Carlos Moreno-Tagle, Jimena Olveres, Boris Escalante-Ramírez
Abstract
This study explores the impact of synthetic medical images, generated with Stable Diffusion, on neural network training for lung condition classification. Diverse state-of-the-art vision models were trained on a hybrid dataset combining real and synthetic images. The networks learned effectively from synthetic data: their performance was similar or superior to that of models trained purely on real images, as long as training was carried out under equal conditions (same architecture, same number of epochs, same training style, and same input image resolution). We selected ConvNext-small as our test architecture. Its best performance when trained on a hybrid dataset (synthetic and real images) was 89%, compared with 85% when trained on purely real images. These results were obtained on an external validation dataset curated by a radiologist. However, hybrid models seem to reach a performance limit when different training techniques are explored. In contrast, a simpler architecture trained only on real images can take advantage of more complex training regimes to raise its final performance. In this regard, our best hybrid-trained model (ConvNext-small) achieved an external validation accuracy of 87%, while ResNet-34 trained only on real images attained a validation accuracy of 93%. Both models were evaluated on the real-image-only dataset provided by the radiologist. The study concludes by comparing the performance of our top AI models with that of radiologists.
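The abstract does not describe how the hybrid dataset was assembled or how the runs were configured, so the following is only a minimal sketch of the idea, not the authors' pipeline. It assumes a hybrid training set is built by keeping every real sample and adding randomly chosen synthetic ones up to a target fraction, and that "equal conditions" means both runs share one fixed configuration. All names (`Sample`, `TrainConfig`, `build_hybrid`, `accuracy`), the file paths, and the numeric settings are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    path: str        # hypothetical image path
    label: str       # hypothetical class label
    synthetic: bool  # True for generated images, False for real ones


@dataclass(frozen=True)
class TrainConfig:
    # Settings held identical across the real-only and hybrid runs.
    # The values used below are placeholders, not taken from the paper.
    architecture: str
    epochs: int
    resolution: int


def build_hybrid(real, synthetic, synthetic_fraction, seed=0):
    """Keep all real samples and add synthetic ones so that they make up
    roughly `synthetic_fraction` of the returned, shuffled list."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_syn / (n_real + n_syn) = fraction for n_syn,
    # capped by the number of synthetic samples available.
    wanted = round(len(real) * synthetic_fraction / (1.0 - synthetic_fraction))
    n_syn = min(wanted, len(synthetic))
    hybrid = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(hybrid)
    return hybrid


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Both training runs would reuse this one configuration object.
shared_config = TrainConfig(architecture="convnext_small", epochs=10, resolution=224)
```

In this sketch, evaluation would use real images only, mirroring the external validation set described above: `accuracy` would be called on predictions for that held-out set for both the real-only and the hybrid-trained model.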
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
José Carlos Moreno-Tagle, Jimena Olveres, and Boris Escalante-Ramírez "Synthetic vs real: exploring the impact of synthetic data on medical image classification", Proc. SPIE 12998, Optics, Photonics, and Digital Technologies for Imaging Applications VIII, 1299802 (18 June 2024); https://doi.org/10.1117/12.3022261
KEYWORDS
Data modeling
Performance modeling
Medical imaging
Artificial intelligence
Diffusion
Image classification
Computed tomography