The training and inference performance optimization of Resnet50 on CUDA RTX 4090 GPU using DALI and AMP

JunJie Lin

doi:10.1117/12.3032945

5 July 2024

Proceedings Volume 13184, Third International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024); 131844B (2024) https://doi.org/10.1117/12.3032945
Event: 3rd International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024), 2024, Kuala Lumpur, Malaysia

Abstract

As a deep residual network model, Resnet50 is of considerable practical importance in image classification, target recognition, and image semantic recognition. In this paper, an Nvidia RTX 4090 GPU is used to conduct detailed performance testing and bottleneck analysis of Resnet50 training and inference, including the computation delay and the data processing delay at different batch sizes. To verify the overall acceleration of Resnet50, we apply two optimization methods on top of GPU computing acceleration: the first uses mixed precision to improve GPU training and inference efficiency, and the second uses DALI to optimize data preprocessing and reduce data loading delay. The experimental results show that at a batch size of 256, mixed precision is about 90% faster than FP32 in computation, but the overall performance improvement is not obvious. When mixed precision and DALI are used together to optimize both GPU computing and data loading, overall performance improves by 1.4 times for training and 2.5 times for inference. These results show that mixed precision alone cannot improve the overall computing efficiency of the system, because the cost of data loading frequently limits end-to-end performance. Therefore, end users obtain a significant speedup only when GPU computation and data loading delay are optimized at the same time. This work is useful for evaluating and improving the computational acceleration performance of GPU-based deep neural networks.
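The bottleneck argument in the abstract can be illustrated with a small timing model. The sketch below is not the paper's measurement code: it assumes a simplified pipeline in which data loading and GPU computation overlap, so the time per batch is set by the slower stage. The function names and all millisecond values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def step_time(load_ms: float, compute_ms: float) -> float:
    """Per-batch time when loading and compute overlap (assumed model):
    the slower of the two stages sets the pace."""
    return max(load_ms, compute_ms)


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of baseline time to optimized time."""
    return baseline_ms / optimized_ms


# Hypothetical per-batch costs, NOT taken from the paper.
slow_load_ms = 200.0                     # CPU-side preprocessing and loading
fast_load_ms = 50.0                      # loading after a faster data pipeline
fp32_compute_ms = 100.0                  # GPU compute in FP32
amp_compute_ms = fp32_compute_ms / 1.9   # compute about 90% faster

baseline = step_time(slow_load_ms, fp32_compute_ms)
amp_only = step_time(slow_load_ms, amp_compute_ms)
amp_and_fast_load = step_time(fast_load_ms, amp_compute_ms)

# Faster compute alone changes nothing while loading is the bottleneck.
print("mixed precision only:", speedup(baseline, amp_only))
# Speeding up both stages is what shortens the end-to-end step.
print("mixed precision + faster loading:", speedup(baseline, amp_and_fast_load))
```

Under these assumed numbers, the mixed-precision-only case shows no end-to-end gain because the loading stage still dominates each step, which mirrors the qualitative conclusion of the abstract.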

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

JunJie Lin "The training and inference performance optimization of Resnet50 on CUDA RTX 4090 GPU using DALI and AMP", Proc. SPIE 13184, Third International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024), 131844B (5 July 2024); https://doi.org/10.1117/12.3032945

PROCEEDINGS
8 PAGES

KEYWORDS

Performance modeling

Mathematical optimization

Data modeling

Deep learning

Data processing

Neural networks

Artificial intelligence
