Convolutional neural networks (CNNs) and transformers excel at extracting local and global features, respectively, and both kinds of feature are important for no-reference image quality assessment (NR-IQA). We therefore propose a CNN–transformer dual-stream parallel fusion network for NR-IQA that simultaneously extracts local and global hierarchical features related to image quality. In addition, given the importance of saliency in NR-IQA, we propose a saliency-guided CNN–transformer feature fusion module to fuse and refine the hierarchical features extracted by the dual-stream network. Finally, the high-level features of the two streams are fused by a local–global cross-attention module to better model the interaction between local and global information in the image, and a quality prediction module with evaluation and weight branches produces the quality score of the distorted image. To comprehensively evaluate our model, we conducted experiments on six standard image quality assessment datasets; the results show that our model achieves better quality prediction performance and generalization ability than previous representative NR-IQA models.
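To make the two final stages concrete, the sketch below illustrates (not the authors' actual implementation, which the abstract does not detail) how a local–global cross-attention step and an evaluation/weight two-branch quality head might operate on feature matrices. All shapes, weight matrices, and the single-head attention form are hypothetical assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, Wq, Wk, Wv):
    # One direction of local<->global cross-attention:
    # `queries` attend over `keys_values` (single head, for illustration).
    Q = queries @ Wq
    K = keys_values @ Wk
    V = keys_values @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

d = 8
local = rng.normal(size=(16, d))    # hypothetical CNN patch features
global_ = rng.normal(size=(4, d))   # hypothetical transformer token features
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Enrich local features with global context (and vice versa).
local_ctx = cross_attention(local, global_, Wq, Wk, Wv)
global_ctx = cross_attention(global_, local, Wq, Wk, Wv)

# Two-branch quality head: the evaluation branch scores each region,
# the weight branch estimates its importance; the final score is a
# weighted mean of regional scores.
w_eval = rng.normal(size=(d,))
w_weight = rng.normal(size=(d,))
scores = local_ctx @ w_eval                       # per-region quality
weights = softmax(local_ctx @ w_weight, axis=0)   # per-region importance
quality = float(weights @ scores)                 # scalar quality score
print(quality)
```

The weighted-mean form of the head reflects the common NR-IQA design in which spatially salient regions contribute more to the final score than uniform background regions.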
Keywords: image quality; transformers; feature extraction; neural networks; data modeling; distortion; performance modeling