Brain tumors are among the most dangerous diseases, and automated brain tumor segmentation is particularly important for their diagnosis and treatment. Traditional brain tumor segmentation methods mostly rely on UNet or its variants, and their segmentation performance depends heavily on the quality of feature extraction. Recently, the diffusion probabilistic model (DPM) has received considerable attention and achieved remarkable success in medical image segmentation. However, existing DPM-based brain tumor segmentation methods do not exploit the complementary information between multimodal MRI sequences, and they all constrain the generation of the DPM using the original images. In this work, we propose a DPM-based brain tumor segmentation method that consists of a DPM, an uncertainty generation module, and a collaborative module. The collaborative module takes the multimodal MRI input and dynamically provides conditional constraints for the DPM, allowing it to capture more detailed brain tumor features. Since previous works largely ignore the influence of the DPM's uncertainty on the results, we propose an uncertainty generation module that calculates the uncertainty of each DPM step and assigns a corresponding uncertainty weight. The results of each step are fused according to the inferred uncertainty weights to obtain the final segmentation. The proposed method achieves 89.32% and 87.82% Dice scores on the BraTS2020 and BraTS2021 datasets, respectively, verifying its effectiveness.
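A minimal sketch of the kind of uncertainty-weighted fusion the abstract describes: per-step segmentation estimates are weighted by an uncertainty score and summed. The entropy-based uncertainty, the tensor layout, and the function name are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: uncertainty-weighted fusion of per-step DPM segmentations.
# The entropy-based weighting and all names are assumptions for illustration.
import torch

def fuse_dpm_steps(step_probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Fuse per-step segmentation probability maps by inverse predictive entropy.

    step_probs: (T, B, C, H, W) softmax probabilities from T denoising steps.
    Returns a fused (B, C, H, W) probability map.
    """
    # Predictive entropy per step, averaged over classes and pixels -> (T, B)
    entropy = -(step_probs * (step_probs + eps).log()).sum(dim=2).mean(dim=(-2, -1))
    # Lower entropy (lower uncertainty) receives a higher fusion weight.
    weights = torch.softmax(-entropy, dim=0)         # (T, B)
    weights = weights.view(*weights.shape, 1, 1, 1)  # broadcast over C, H, W
    return (weights * step_probs).sum(dim=0)
```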
Existing methods for human pose estimation in videos often rely on sampling strategies to select frames for estimation. Common approaches include uniform sparse sampling and keyframe selection. However, the former attends only to fixed frame positions and therefore misses dynamic information, while the latter incurs high computational costs by processing every frame. To address these issues, we propose an efficient and effective pose estimation framework, the Joint Misalignment-aware Bilateral Detection Network (J-BDNet). Our framework incorporates a Bilateral Dynamic Attention Module (BDA) that uses knowledge distillation for efficiency. BDA detects dynamic information in both the left and right halves of a video segment, guiding the sampling process. In addition, a bilateral recursive sampling strategy built on BDA extracts more spatiotemporal dependencies from pose data, reducing computational costs without increasing how often the pose estimator is invoked. Moreover, we improve the robustness of the existing denoising network by randomly exchanging body-joint positions in the pose data. Experiments demonstrate the performance of our framework under heavy occlusion, spatial blur, and illumination variations, and it achieves state-of-the-art performance on the Sub-JHMDB dataset.
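A rough sketch of a bilateral recursive sampling strategy of the kind outlined above: a dynamics score decides which half of a segment receives more of the sampling budget. Here a simple frame-difference proxy stands in for the BDA module; the scoring rule, budget split, and all names are assumptions, not the paper's method.

```python
# Hypothetical sketch of bilateral recursive frame sampling. A frame-difference
# proxy replaces the BDA module; this is not the paper's implementation.
import numpy as np

def dynamics_score(frames: np.ndarray) -> float:
    """Mean absolute inter-frame difference as a crude motion proxy."""
    if len(frames) < 2:
        return 0.0
    return float(np.abs(np.diff(frames.astype(np.float32), axis=0)).mean())

def bilateral_sample(frames: np.ndarray, start: int, end: int, budget: int) -> list[int]:
    """Recursively pick up to `budget` frame indices, favoring the more dynamic half."""
    if budget <= 0 or end - start < 1:
        return []
    mid = (start + end) // 2
    picked = [mid]
    if budget == 1 or end - start < 3:
        return picked
    left, right = frames[start:mid], frames[mid:end]
    # The half with more motion gets the larger share of the remaining budget.
    if dynamics_score(left) >= dynamics_score(right):
        more, less = (start, mid), (mid, end)
    else:
        more, less = (mid, end), (start, mid)
    picked += bilateral_sample(frames, *more, budget=budget // 2)
    picked += bilateral_sample(frames, *less, budget=(budget - 1) // 2)
    return sorted(set(picked))
```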
Medical image segmentation aims to assign pixels to different regions according to their corresponding tissues or organs in a medical image. In recent years, owing to the Transformer's outstanding performance in computer vision, various vision Transformers have been applied to this task. However, these models often suffer from the quadratic complexity of self-attention and from limited multi-scale information interaction. In this paper, we propose a novel dual attention and pyramid-aware network, DAPFormer, to address these limitations. It combines efficient attention and channel attention into a dual attention mechanism that captures spatial and inter-channel relationships in the feature dimensions while maintaining computational efficiency. Additionally, we redesign the skip connections with a pyramid-aware module that models cross-scale dependencies and handles complex scale variations. Experiments on multi-organ, cardiac, and skin lesion segmentation datasets demonstrate that DAPFormer outperforms state-of-the-art methods.
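A minimal sketch of a dual attention block that pairs linear-complexity efficient attention with squeeze-and-excitation-style channel attention, as the abstract describes at a high level. Layer names, the residual combination, and all dimensions are assumptions, not the DAPFormer implementation.

```python
# Hypothetical dual attention block: efficient (spatial) attention + channel attention.
# All design choices here are assumptions for illustration only.
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(), nn.Linear(dim // reduction, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) flattened spatial tokens.
        q = torch.softmax(self.q(x), dim=-1)      # softmax over channels
        k = torch.softmax(self.k(x), dim=1)       # softmax over tokens
        context = k.transpose(1, 2) @ self.v(x)   # (B, C, C): linear in N
        spatial = q @ context                     # efficient spatial attention
        # Channel attention: squeeze over tokens, excite per channel.
        channel = torch.sigmoid(self.channel_mlp(x.mean(dim=1))).unsqueeze(1) * x
        return x + spatial + channel
```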
Affective image analysis aims to understand the sentiment conveyed by images. The challenge is to learn a discriminative representation that bridges the affective gap between low-level features and high-level emotions. Most existing studies bridge this gap either by carefully designing deep models that learn global representations in one shot or by identifying image emotion from features extracted at different levels of the model; they ignore the fact that both local regions of an image and the relationships between them affect emotional representation learning. This paper develops an affective image analysis method based on an aesthetics-fusion hybrid attention network (AFHA). A modular hybrid attention block is designed to extract image emotion features and model long-range dependencies within images. By stacking hybrid attention blocks in a ResNet-style architecture, we obtain an affective representation backbone. Furthermore, since image emotion is inseparable from aesthetics, we employ a modified ResNet to extract image aesthetics. Finally, a fusion strategy assesses the image's emotion jointly with the aesthetics it conveys. Experiments demonstrate the close relationship between emotion and aesthetics, and our approach is highly competitive with existing methods on image sentiment analysis datasets.
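A minimal sketch of one way the two-branch design could be fused: concatenating features from an emotion backbone and an aesthetics backbone before classification. The backbone choices, feature sizes, and concatenation-based fusion are assumptions, not the AFHA implementation.

```python
# Hypothetical two-branch emotion/aesthetics fusion by feature concatenation.
# Backbones and fusion rule are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class EmotionAestheticsFusion(nn.Module):
    def __init__(self, num_emotions: int = 8):
        super().__init__()
        self.emotion_branch = models.resnet50(weights=None)
        self.emotion_branch.fc = nn.Identity()    # 2048-d emotion features
        self.aesthetic_branch = models.resnet18(weights=None)
        self.aesthetic_branch.fc = nn.Identity()  # 512-d aesthetic features
        self.classifier = nn.Linear(2048 + 512, num_emotions)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.emotion_branch(image), self.aesthetic_branch(image)], dim=1)
        return self.classifier(fused)
```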