Poster + Paper
Real-time deep learning semantic segmentation for 3-D augmented reality
28 November 2023
V. Voronin, E. Semenishchev, A. Zelensky, M. Zhdanova, N. Gapon
Abstract
Augmented reality is a visualization technology that displays information by overlaying virtual images on the real world. In many cases, augmented reality requires recognition of the current scene, yet extracting foreground objects from video in real time on limited hardware, such as a smartphone, is demanding. An augmented reality system must maintain a model of the environment that indicates where objects should be detected or masks applied. One way to recognize a scene without prior information is to use semantic segmentation. This article proposes a new neural network architecture for efficient semantic image segmentation in the task of building augmented reality. The developed architecture combines ShuffleNet V2 with the Dense Prediction Cell (DPC), achieving good performance through a balance between predictive accuracy and efficiency. First, the ShuffleNet V2 backbone extracts features from RGB images. The resulting feature maps are then passed to a DeepLab V3+ Dense Prediction Cell encoder. At the final stage, the features are decoded by bilinear interpolation to produce segmentation masks. The augmented reality construction algorithm is based on the ARCore framework and the OpenGL interface for embedded systems (OpenGL ES). The proposed approach recognizes scene objects in augmented reality via semantic segmentation, providing information in real time. The implementation shows that detected objects can be tracked in 3-D space using visual-inertial odometry without constantly updating the environment model. The frequency of object detection and semantic mask generation can therefore be reduced, saving battery and processing power, which is critical for mobile and embedded systems. The semantic information provided by this solution can be used in autonomous driving, robotics navigation, localization, and scene recognition under limited resources.
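The decoding step mentioned above can be illustrated with a minimal sketch: low-resolution per-class score maps are upsampled by bilinear interpolation and converted to a segmentation mask via per-pixel argmax. This is not the paper's implementation; the tiny score maps and function names below are made-up stand-ins for real network output.

```python
# Sketch only: bilinear decoding of per-class score maps into a segmentation
# mask, as described in the abstract. Pure Python, no frameworks assumed.

def bilinear_upsample(grid, out_h, out_w):
    """Upsample a 2-D list-of-lists to (out_h, out_w) via bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map output coordinates back into input space (align-corners convention).
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bot = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bot * wy)
        out.append(row)
    return out

def decode_masks(class_scores, out_h, out_w):
    """Upsample each per-class score map, then take the per-pixel argmax."""
    up = [bilinear_upsample(g, out_h, out_w) for g in class_scores]
    return [[max(range(len(up)), key=lambda c: up[c][i][j])
             for j in range(out_w)] for i in range(out_h)]

# Two hypothetical 2x2 score maps: class 0 strong on the left, class 1 on the right.
scores = [
    [[1.0, 0.0], [1.0, 0.0]],   # class 0
    [[0.0, 1.0], [0.0, 1.0]],   # class 1
]
mask = decode_masks(scores, 4, 4)  # each row: [0, 0, 1, 1]
```

In a real pipeline the score maps come from the DPC encoder and the upsampling target is the camera-frame resolution; frameworks perform this step on the GPU rather than in Python loops.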
In augmented reality, the proposed approach can remove objects from a scene, draw attention to objects, or supply scene-recognition results to application logic. The experimental results confirm the high efficiency of the proposed method compared with state-of-the-art techniques for real-time 3-D augmented reality construction.
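The battery-saving scheme the abstract describes — running the segmentation network only occasionally and letting visual-inertial tracking keep the last mask anchored in 3-D between runs — can be sketched as follows. This is an assumption-laden illustration, not the paper's code: `segment`, `track_with_odometry`, and the keyframe interval are hypothetical stand-ins.

```python
# Sketch only: reuse anchored segmentation masks between keyframes so the
# expensive network runs infrequently, as the abstract suggests for mobile AR.

SEGMENT_EVERY = 10  # hypothetical interval: run the network once per 10 frames

def segment(frame):
    """Stand-in for the ShuffleNet V2 + DPC network; returns a fake mask record."""
    return {"mask_for_frame": frame}

def track_with_odometry(anchored, frame):
    """Stand-in for ARCore-style visual-inertial tracking of an anchored mask."""
    return {"reused_from": anchored["mask_for_frame"], "shown_at": frame}

def process(frames):
    results, anchored, network_runs = [], None, 0
    for idx, frame in enumerate(frames):
        if anchored is None or idx % SEGMENT_EVERY == 0:
            anchored = segment(frame)       # expensive: fresh semantic mask
            network_runs += 1
            results.append(anchored)
        else:
            # cheap: 3-D pose tracking keeps the old mask aligned with the scene
            results.append(track_with_odometry(anchored, frame))
    return results, network_runs

results, runs = process(list(range(25)))  # network runs 3 times (frames 0, 10, 20)
```

The design point is that tracking an anchor is far cheaper than a forward pass, so lowering the keyframe rate trades mask freshness for battery life on mobile and embedded hardware.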
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
V. Voronin, E. Semenishchev, A. Zelensky, M. Zhdanova, and N. Gapon "Real‐time deep learning semantic segmentation for 3-D augmented reality", Proc. SPIE 12772, Real-time Photonic Measurements, Data Management, and Processing VII, 127720L (28 November 2023); https://doi.org/10.1117/12.2691152
KEYWORDS
Augmented reality
Image segmentation
Semantics
Neural networks
RGB color model
Video
Detection and tracking algorithms