Convolutional neural networks (CNNs) and transformers excel at extracting local and global features, respectively, and both kinds of features are important for the no-reference image quality assessment (NR-IQA) task. We therefore propose a CNN–transformer dual-stream parallel fusion network for NR-IQA that simultaneously extracts local and global hierarchical features related to image quality. In addition, considering the importance of saliency in NR-IQA, a saliency-guided CNN–transformer feature fusion module is proposed to fuse and refine the hierarchical features extracted by the dual-stream network. Finally, the high-level features of the two streams are fused by a local–global cross-attention module to better model the interaction between local and global information in the image, and a quality prediction module with evaluation and weight branches produces the quality score of the distorted image. To comprehensively evaluate the model, we conducted experiments on six standard image quality assessment datasets; the results show that our model achieves better quality prediction performance and generalization ability than previous representative NR-IQA models.
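For illustration, a minimal sketch of what such a local–global cross-attention fusion could look like in PyTorch; the module name, head count, single fusion layer, and dimensions are our assumptions, not the authors' exact design:

```python
import torch
import torch.nn as nn

class LocalGlobalCrossAttention(nn.Module):
    """Hypothetical sketch: CNN (local) tokens attend to transformer
    (global) tokens and vice versa, then the two results are fused."""

    def __init__(self, dim):
        super().__init__()
        self.attn_l2g = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.attn_g2l = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, local_feat, global_feat):
        # local_feat, global_feat: (B, N, dim) token sequences
        # Local tokens query global context; global tokens query local detail.
        l_enh, _ = self.attn_l2g(local_feat, global_feat, global_feat)
        g_enh, _ = self.attn_g2l(global_feat, local_feat, local_feat)
        return self.fuse(torch.cat([l_enh, g_enh], dim=-1))

# Usage: fuse flattened CNN feature maps with transformer tokens.
cnn_tokens = torch.randn(2, 196, 256)
vit_tokens = torch.randn(2, 196, 256)
fused = LocalGlobalCrossAttention(256)(cnn_tokens, vit_tokens)  # (2, 196, 256)
```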
Under unfavorable conditions, fused infrared and visible images often lack edge contrast and detail. To address this issue, we propose an edge-oriented unrolling network comprising a feature extraction network and a feature fusion network. In our approach, the original infrared/visible image pair and their separately enhanced versions are combined as the input to provide richer prior information. First, the feature extraction network consists of four independent, iterative edge-oriented unrolling feature extraction networks built on the edge-oriented deep unrolling residual module (EURM), in which the convolutions of the EURM are replaced with edge-oriented convolution blocks to strengthen edge features. Then, a convolutional feature fusion network with a differential structure is proposed to obtain the final fusion result, using concatenation to map multidimensional features. In addition, the loss function of the fusion network is optimized to balance multiple features with significant differences and thereby achieve a better visual effect. Experimental results on multiple datasets demonstrate that the proposed method produces competitive fusion images, as evaluated both subjectively and objectively, with balanced luminance, sharper edges, and better detail.
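One plausible reading of an edge-oriented convolution block is a learnable convolution paired with a fixed edge-extraction branch. The sketch below uses Sobel filters for that branch; this is an illustrative assumption, not the paper's exact EURM design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeOrientedConv(nn.Module):
    """Hypothetical sketch: a learnable 3x3 convolution augmented with a
    fixed Sobel branch so edge responses are emphasized in the output."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        # One Sobel pair per input channel, applied depthwise (not trained).
        kernel = torch.stack([sobel_x, sobel_y]).unsqueeze(1).repeat(in_ch, 1, 1, 1)
        self.register_buffer("sobel", kernel)  # (2*in_ch, 1, 3, 3)
        self.edge_proj = nn.Conv2d(2 * in_ch, out_ch, 1)
        self.in_ch = in_ch

    def forward(self, x):
        edges = F.conv2d(x, self.sobel, padding=1, groups=self.in_ch)
        return F.relu(self.conv(x) + self.edge_proj(edges))

x = torch.randn(1, 16, 64, 64)
y = EdgeOrientedConv(16, 32)(x)  # (1, 32, 64, 64)
```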
This paper proposes a blind image quality assessment model based on a three-stream network and improved attention mechanisms (TSAIQA). The inputs of the three streams are the distorted image, a pseudoreference image produced by an improved generative adversarial network (GAN), and the gradient map of the distorted image. The distorted-image stream focuses on holistic quality-related features, the pseudoreference stream supplements features lost to distortion, and the gradient stream explicitly extracts quality-related structural features. In addition, spatial and channel attention mechanisms combining first- and second-order information are proposed and applied to the three-stream network to effectively refine spatial- and channel-level features. Finally, the fused three-stream features are fed to a quality regression network to predict image quality. To demonstrate the effectiveness of the proposed model, experiments are conducted on four classical IQA databases and two new large-scale databases. The results show that our TSAIQA model outperforms state-of-the-art IQA methods and confirm the effectiveness of the proposed network structure and attention mechanisms.
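A common way to combine first- and second-order information in channel attention is to pool both the per-channel mean and standard deviation; the sketch below illustrates that idea under our own assumptions and is not the exact TSAIQA module:

```python
import torch
import torch.nn as nn

class FirstSecondOrderChannelAttention(nn.Module):
    """Hypothetical sketch: channel attention driven by per-channel mean
    (first-order) and standard deviation (second-order) statistics."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        mean = x.mean(dim=(2, 3))                                 # first-order
        std = x.var(dim=(2, 3), unbiased=False).add(1e-6).sqrt()  # second-order
        w = self.mlp(torch.cat([mean, std], dim=1)).view(b, c, 1, 1)
        return x * w  # reweight channels

x = torch.randn(2, 64, 32, 32)
out = FirstSecondOrderChannelAttention(64)(x)  # (2, 64, 32, 32)
```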
We propose a saliency-enhanced two-stream convolutional network (SETNet) for no-reference image quality assessment. SETNet contains two subnetworks, an image stream and a saliency stream. The image stream focuses on the whole image content, while the saliency stream explicitly guides the network to learn the spatially salient features that are most attractive to humans. In addition, a spatial attention module and a dilated-convolution-based channel attention module are employed to refine multi-level features in the spatial and channel dimensions. Finally, a fusion strategy is proposed to integrate image-stream and saliency-stream features at corresponding layers, and the final quality score is predicted from the multi-level integrated features with a weighting strategy. Experimental results on four synthetic-distortion datasets and two authentic-distortion datasets show that SETNet achieves higher prediction accuracy and better generalization ability than several representative methods.
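The weighting strategy can be read as each feature level predicting both a quality score and a confidence weight, with the final score given by their weight-normalized average; the following sketch illustrates that interpretation (module name and dimensions are hypothetical):

```python
import torch
import torch.nn as nn

class WeightedQualityHead(nn.Module):
    """Hypothetical sketch: each feature level yields a score and a weight;
    the final quality is the weight-normalized average of the level scores."""

    def __init__(self, dims):
        super().__init__()
        self.score_heads = nn.ModuleList(nn.Linear(d, 1) for d in dims)
        self.weight_heads = nn.ModuleList(nn.Linear(d, 1) for d in dims)

    def forward(self, feats):
        # feats: list of globally pooled level features, each (B, dims[i])
        scores = torch.cat([h(f) for h, f in zip(self.score_heads, feats)], dim=1)
        weights = torch.cat([h(f) for h, f in zip(self.weight_heads, feats)], dim=1)
        weights = torch.softmax(weights, dim=1)   # normalize across levels
        return (scores * weights).sum(dim=1)      # (B,) final quality score

feats = [torch.randn(2, 128), torch.randn(2, 256), torch.randn(2, 512)]
q = WeightedQualityHead([128, 256, 512])(feats)  # (2,)
```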