Paper
5 July 2024
The training and inference performance optimization of Resnet50 on CUDA RTX 4090 GPU using DALI and AMP
JunJie Lin
Proceedings Volume 13184, Third International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024); 131844B (2024) https://doi.org/10.1117/12.3032945
Event: 3rd International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024), 2024, Kuala Lumpur, Malaysia
Abstract
As a deep residual network model, ResNet50 is of considerable practical importance in image classification, target recognition, and image semantic recognition. In this paper, an NVIDIA RTX 4090 GPU is used to conduct detailed performance testing and bottleneck analysis of ResNet50 training and inference, including the computation delay and data processing delay at different batch sizes. To verify the overall acceleration of ResNet50, we apply two optimization methods on top of GPU computing acceleration: the first uses mixed precision to improve GPU training and inference efficiency, and the second uses DALI to optimize data preprocessing and reduce data loading delay. The experimental results show that at a batch size of 256, mixed precision improves GPU computation performance by about 90% compared with FP32, but the overall performance improvement is not obvious. When mixed precision and DALI are used together to optimize both GPU computation and data loading, overall training and inference performance improve by 1.4 and 2.5 times, respectively. These results show that mixed precision alone cannot improve the overall computing efficiency of the system, because the cost of data loading frequently limits end-to-end performance. Only by optimizing GPU computation and data loading delay at the same time can end users obtain a significant speedup. This paper contributes to evaluating and improving the computational acceleration performance of GPU-based deep neural networks.
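The bottleneck argument in the abstract can be illustrated with a small timing model. The sketch below is not the paper's measurement code. It assumes that data loading and GPU computation overlap, so the time per batch is set by the slower of the two stages. The timings (120 ms load, 100 ms compute) and the 3x loading speedup are hypothetical values chosen only for illustration, and reading "about 90%" as a 1.9x compute speedup is also an assumption. The function names are invented for this example.

```python
def step_time(load_ms: float, compute_ms: float) -> float:
    """Per-batch time when loading and compute overlap (assumed pipeline model):
    the slower stage determines the step time."""
    return max(load_ms, compute_ms)


def end_to_end_speedup(load_ms: float, compute_ms: float,
                       compute_speedup: float, load_speedup: float) -> float:
    """Overall speedup after accelerating each stage by the given factor."""
    baseline = step_time(load_ms, compute_ms)
    optimized = step_time(load_ms / load_speedup, compute_ms / compute_speedup)
    return baseline / optimized


# Hypothetical per-batch timings, NOT taken from the paper.
LOAD_MS = 120.0     # data loading and preprocessing
COMPUTE_MS = 100.0  # FP32 forward/backward pass on the GPU

# Faster compute only (mixed precision, assumed 1.9x): loading still dominates.
amp_only = end_to_end_speedup(LOAD_MS, COMPUTE_MS,
                              compute_speedup=1.9, load_speedup=1.0)

# Faster compute and faster loading (assumed 3x loading speedup).
amp_and_loader = end_to_end_speedup(LOAD_MS, COMPUTE_MS,
                                    compute_speedup=1.9, load_speedup=3.0)

print(f"compute speedup only:      {amp_only:.2f}x")
print(f"compute + loading speedup: {amp_and_loader:.2f}x")
```

Under these assumed numbers, accelerating computation alone leaves the step time unchanged because data loading remains the slower stage. The end-to-end time drops only when both stages are accelerated, which is the qualitative pattern the abstract reports.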
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
JunJie Lin "The training and inference performance optimization of Resnet50 on CUDA RTX 4090 GPU using DALI and AMP", Proc. SPIE 13184, Third International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024), 131844B (5 July 2024); https://doi.org/10.1117/12.3032945
KEYWORDS
Performance modeling
Mathematical optimization
Data modeling
Deep learning
Data processing
Neural networks
Artificial intelligence