Skin lesion segmentation is complicated by several challenging phenomena, such as blurred edges, hair occlusion, circular field-of-view artifacts, and diagnostic markers. To address these problems, we propose the Gated Axial Transformer with Comprehensive Attention (CA-GAT) for skin lesion segmentation. First, the U-Net encoder-decoder structure is used as the main framework, with the encoder built primarily from axial transformer layers. The axial attention mechanism efficiently captures long-range dependencies while greatly reducing computational complexity, and the gating mechanism allows the model to learn accurate positional encodings without pre-training on large-scale datasets. Second, a triple attention mechanism is introduced into the decoder, enabling the model to better distinguish lesion boundaries. Finally, the Local-Global (LoGo) training strategy enables the model to exploit contextual information to exclude external interference while further improving performance. We conducted experiments on the ISIC 2018 dataset. Compared with U-Net, CA-Net, MedT, and CA-GAT without LoGo, the Dice coefficient of our model increases by 6.3%, 0.46%, 1.2%, and 1.49%, respectively, and the other metrics also improve. These experiments indicate that CA-GAT achieves favorable segmentation performance.
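The complexity saving of axial attention comes from factorizing 2D self-attention into two 1D passes: attending along the height axis, then the width axis, which costs O(HW(H+W)) instead of O((HW)^2) for full attention. The following minimal NumPy sketch illustrates this factorization only; the function names and the scalar `gate` are our simplification (the paper's gates are learned parameters on the positional terms), and positional encodings and multi-head projections are omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x, gate=1.0):
    """Toy axial attention over a feature map x of shape (H, W, C).

    Full 2D self-attention compares all H*W position pairs, costing
    O((H*W)^2). Axial attention instead runs 1D self-attention along
    the height axis, then along the width axis: O(H*W*(H+W)).
    `gate` is a stand-in scalar for the learned gating the paper uses
    to down-weight unreliable (positional) terms.
    """
    H, W, C = x.shape
    # Pass 1: for each column, attend over its H positions.
    out = np.empty_like(x)
    for w in range(W):
        col = x[:, w, :]                            # (H, C)
        scores = gate * (col @ col.T) / np.sqrt(C)  # (H, H)
        out[:, w, :] = softmax(scores) @ col
    # Pass 2: for each row of the height-attended map, attend over W.
    out2 = np.empty_like(out)
    for h in range(H):
        row = out[h]                                # (W, C)
        scores = gate * (row @ row.T) / np.sqrt(C)  # (W, W)
        out2[h] = softmax(scores) @ row
    return out2
```

Because attention produces convex combinations of the input rows, a spatially constant feature map passes through unchanged, which is a quick sanity check for the sketch.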