Towards efficient diagnostics: refining vision transformers for medical image multi-label classification

Garrett I. Cayce; Benjamin M. Hand; Aidan G. Kurz; Colleen P. Bailey

doi:10.1117/12.3013977

Published: 7 June 2024

Proceedings Volume 13043, Anomaly Detection and Imaging with X-Rays (ADIX) IX; 130430L (2024) https://doi.org/10.1117/12.3013977
Event: SPIE Defense + Commercial Sensing, 2024, National Harbor, Maryland, United States

Abstract

Medical imaging, including chest x-rays, is an important tool in modern healthcare: it enables early and accurate disease diagnosis and facilitates timely interventions. By capturing images of critical internal organs such as the lungs and heart, x-rays allow doctors to make informed diagnoses and treatment decisions, especially for respiratory and cardiac conditions. Early and accurate diagnosis, particularly when multiple pathologies are present, greatly affects patient outcomes by enabling timely and specific treatment. Multi-label classification has therefore become increasingly important in medical imaging, since several pathologies can be present within a single x-ray. While traditional convolutional neural networks (CNNs) have played a pivotal role in improving the accuracy of x-ray diagnoses, the growing complexity of multi-label imaging demands more sophisticated methods. Vision Transformers (ViTs) have emerged as a promising approach to medical image classification, showing that they can effectively process x-ray images and identify the pathologies within them. Although traditional ViTs perform well, they have significant drawbacks. Most ViT models use a large number of parameters, often ranging from millions to billions. Such parameter-intensive designs, while powerful, are computationally heavy. This increases resource requirements and raises concerns about feasibility and scalability in real-world, time-sensitive healthcare settings. We propose a novel Vision Transformer architecture that aims to classify multi-label x-ray images effectively while significantly improving the efficiency of ViT-based multi-label medical image classification. By optimizing model architectures and exploring techniques for parameter reduction, we seek to develop more streamlined and resource-efficient approaches without completely sacrificing the efficacy of these methods. Our work aims to bridge the gap between cutting-edge technology and practical healthcare applications, promising a more efficient and accessible future for medical image analysis.
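To make the "millions to billions" parameter claim concrete, the following is a minimal sketch that counts the learnable parameters of a standard ViT encoder with a linear multi-label head. It is not the architecture proposed in the paper, which the abstract does not specify. The function name `vit_param_count`, the default of 14 output labels, and the two configurations compared are illustrative assumptions; the layer layout follows the common ViT design (patch embedding, class token, learned position embedding, pre-norm blocks with a 4x MLP).

```python
def vit_param_count(image_size=224, patch_size=16, channels=3,
                    dim=768, depth=12, mlp_ratio=4, num_labels=14):
    """Count learnable parameters of a standard ViT with a linear head.

    Hypothetical helper for illustration; num_labels=14 is an assumed
    label count, not a value taken from the abstract.
    """
    num_patches = (image_size // patch_size) ** 2

    # Linear projection of flattened patches (weights + bias).
    patch_embed = patch_size * patch_size * channels * dim + dim
    cls_token = dim
    # One learned position vector per patch plus one for the class token.
    pos_embed = (num_patches + 1) * dim

    layer_norm = 2 * dim  # scale and shift
    # Fused query/key/value projection plus output projection.
    # The number of heads does not change the parameter count.
    attention = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)
    hidden = mlp_ratio * dim
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)
    block = 2 * layer_norm + attention + mlp

    # One logit per label; multi-label models typically apply an
    # independent sigmoid to each logit rather than a softmax.
    head = dim * num_labels + num_labels

    return (patch_embed + cls_token + pos_embed
            + depth * block + layer_norm + head)


if __name__ == "__main__":
    wide = vit_param_count(dim=768)    # ViT-Base-sized width
    narrow = vit_param_count(dim=192)  # ViT-Tiny-sized width
    print(f"dim=768: {wide / 1e6:.1f}M parameters")
    print(f"dim=192: {narrow / 1e6:.1f}M parameters")
    print(f"reduction factor: {wide / narrow:.1f}x")
```

Because the attention and MLP terms grow with the square of the embedding width, narrowing the model is one of the more direct ways to reduce parameters; whether the authors use this or another technique is not stated in the abstract.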

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Garrett I. Cayce, Benjamin M. Hand, Aidan G. Kurz, and Colleen P. Bailey "Towards efficient diagnostics: refining vision transformers for medical image multi-label classification", Proc. SPIE 13043, Anomaly Detection and Imaging with X-Rays (ADIX) IX, 130430L (7 June 2024); https://doi.org/10.1117/12.3013977
