Depth estimation and semantic segmentation are crucial for visual perception and scene understanding. Multi-task learning, which captures shared features across multiple tasks within a scene, is often applied to depth estimation and semantic segmentation to jointly improve accuracy. In this paper, a deformable attention-guided network for multi-task learning is proposed to enhance the accuracy of both depth estimation and semantic segmentation. The network architecture consists of a shared encoder, initial prediction modules, deformable attention modules and decoders. RGB images are first input into the shared encoder to extract generic representations for the different tasks. These shared feature maps are then decoupled into depth, semantic, edge and surface-normal features in the initial prediction module. At each stage, attention is applied to the depth and semantic features under the guidance of fused features in the deformable attention module. The decoders upsample each attention-enhanced feature map and output the final predictions. The proposed model achieves an mIoU of 44.25% and an RMSE of 0.5183, outperforming the single-task baseline, the multi-task baseline and a state-of-the-art multi-task learning model.
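The pipeline described above (shared encoder → task decoupling → fusion-guided attention → per-task decoders) can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: all module names, channel counts and class counts are assumptions, and the guidance module below uses plain gated spatial attention as a stand-in for the paper's deformable attention.

```python
import torch
import torch.nn as nn

class InitialPred(nn.Module):
    """Decouples shared features into task-specific feature maps
    (depth, semantic, edge, surface normal), as in the abstract."""
    def __init__(self, ch):
        super().__init__()
        self.heads = nn.ModuleDict({
            t: nn.Conv2d(ch, ch, 3, padding=1)
            for t in ("depth", "semantic", "edge", "normal")
        })

    def forward(self, x):
        return {t: head(x) for t, head in self.heads.items()}

class GuidedAttention(nn.Module):
    """Stand-in for the deformable attention module: re-weights a task
    feature map under the guidance of the fused multi-task features.
    (Simple sigmoid gating here, not true deformable attention.)"""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, task_feat, fused_feat):
        attn = self.gate(torch.cat([task_feat, fused_feat], dim=1))
        return task_feat * attn + task_feat  # residual re-weighting

class MultiTaskNet(nn.Module):
    def __init__(self, ch=16, num_classes=21):  # hypothetical sizes
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU())
        self.initial = InitialPred(ch)
        self.fuse = nn.Conv2d(4 * ch, ch, 1)  # fuse the four task features
        self.attn = GuidedAttention(ch)
        self.depth_dec = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(ch, 1, 3, padding=1))
        self.seg_dec = nn.Sequential(
            nn.Upsample(scale_factor=2),
            nn.Conv2d(ch, num_classes, 3, padding=1))

    def forward(self, rgb):
        shared = self.encoder(rgb)                 # generic representation
        feats = self.initial(shared)               # per-task decoupling
        fused = self.fuse(torch.cat(list(feats.values()), dim=1))
        depth = self.attn(feats["depth"], fused)   # guidance-enhanced
        sem = self.attn(feats["semantic"], fused)
        return self.depth_dec(depth), self.seg_dec(sem)
```

A forward pass on a 64×64 RGB image returns a 1-channel depth map and a `num_classes`-channel segmentation map at the input resolution; the edge and surface-normal branches serve only as auxiliary guidance in this sketch.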
In low-light conditions lacking sufficient visual information, visual SLAM (Simultaneous Localization and Mapping) becomes considerably challenging. To address this issue, we propose a visual SLAM method based on Near-Infrared (NIR) illumination, which operates effectively in complete darkness while remaining unobtrusive to the human eye. The approach estimates the camera's motion and pose from NIR imagery, achieving simultaneous localization and mapping. Experiments with quantitative and qualitative analysis demonstrate the effectiveness of the method, particularly in low-light environments. The findings also indicate that the NIR-based SLAM system performs on par with its visible-light counterpart, and in indoor settings it outperforms the visible-light system, suggesting that NIR-based SLAM could be a viable solution for robust camera pose estimation.