Over the years, U-Net has become a predominant model for retinal vessel image segmentation. However, its limited receptive field and the inductive biases inherent in convolutional operations make it difficult to capture long-range dependencies. Transformer-based techniques have recently been integrated into the U-Net architecture to overcome this limitation, but the self-attention mechanism of Transformers demands substantial computational resources, increasing computational complexity and the risk of overfitting. To address these challenges, we propose MobileViTv2-ResUNet, a model that combines a lightweight Transformer with a CNN for precise retinal vessel segmentation, adopting U-Net as the overall framework. First, in the encoding phase, MobileViTv2 blocks replace the traditional convolutional modules for feature extraction, and inverted residual blocks perform the downsampling operations, reducing computational complexity while enhancing the network's representation and generalization capabilities. An ASPP module between the encoder and decoder then fuses feature information from different scales. Finally, in the decoding phase, we integrate our LeakyRes module, which prevents the "neuron death" (dying-ReLU) phenomenon and thereby improves segmentation accuracy. We validate MobileViTv2-ResUNet on the public HRF and STARE datasets. Experimental results show that it outperforms most existing state-of-the-art algorithms, particularly on images with anomalies, vessel bifurcations, and hard-to-segment microvessels.
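The abstract does not spell out the internal design of the LeakyRes module, so the following is only a minimal sketch of the underlying idea: a standard residual block in which LeakyReLU replaces ReLU, keeping a small gradient on negative inputs so units cannot permanently stop firing. The class name LeakyResBlock, the layer arrangement, and the negative_slope default are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class LeakyResBlock(nn.Module):
    """Hypothetical sketch of a LeakyRes-style residual block.

    LeakyReLU passes a small gradient for negative inputs, so neurons
    avoid the "dying ReLU" (neuron death) failure mode that the
    LeakyRes module is described as preventing.
    """

    def __init__(self, in_channels: int, out_channels: int,
                 negative_slope: float = 0.01):  # slope value is an assumption
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.act = nn.LeakyReLU(negative_slope, inplace=True)
        # 1x1 projection on the skip path when channel counts differ.
        self.skip = (
            nn.Identity()
            if in_channels == out_channels
            else nn.Conv2d(in_channels, out_channels, 1, bias=False)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.skip(x)
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Residual connection followed by the leaky activation.
        return self.act(out + residual)


if __name__ == "__main__":
    block = LeakyResBlock(64, 32)
    y = block(torch.randn(1, 64, 48, 48))
    print(y.shape)  # torch.Size([1, 32, 48, 48])
```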