Effectively recognizing human gestures from varying viewpoints plays a fundamental role in successful collaboration between humans and robots. Deep learning approaches have achieved promising performance in gesture recognition, but they are data-hungry and require large-scale labeled data, which is often inaccessible in practical settings. Synthetic data, on the other hand, can be easily obtained from simulators with fine-grained annotations in multiple modalities. Existing state-of-the-art approaches have shown promising results using synthetic data, but a large performance gap remains between models trained on synthetic data and those trained on real data. To learn domain-invariant feature representations, we propose a novel approach that jointly takes RGB videos and 3D meshes as inputs to perform robust action recognition. We empirically validate our model on the RoCoG-v2 dataset, which consists of a variety of real and synthetic videos of gestures from ground and air perspectives. We show that our model trained on synthetic data outperforms state-of-the-art models trained in the same setting, as well as models trained on real data.
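To make the joint RGB-and-mesh idea concrete, below is a minimal PyTorch sketch of a two-stream network that fuses video features with 3D mesh features. This is not the authors' released architecture: the backbone stubs, the feature dimensions, the SMPL-style 6,890-vertex mesh input, and the gesture-class count are all illustrative assumptions.

import torch
import torch.nn as nn

class TwoStreamGestureNet(nn.Module):
    def __init__(self, num_classes: int, rgb_dim: int = 512, mesh_dim: int = 256):
        super().__init__()
        # RGB stream: a tiny 3D conv stub standing in for a pretrained video backbone.
        self.rgb_stream = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(32, rgb_dim),
        )
        # Mesh stream: an MLP over flattened mesh vertices (6890 = SMPL vertex count).
        self.mesh_stream = nn.Sequential(
            nn.Linear(6890 * 3, mesh_dim),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(rgb_dim + mesh_dim, num_classes)

    def forward(self, rgb: torch.Tensor, mesh: torch.Tensor) -> torch.Tensor:
        # rgb: (B, 3, T, H, W) video clip; mesh: (B, 6890 * 3) vertices, averaged over frames.
        fused = torch.cat([self.rgb_stream(rgb), self.mesh_stream(mesh)], dim=1)
        return self.classifier(fused)

# Usage: 7 gesture classes is an assumption, not the documented RoCoG-v2 count.
model = TwoStreamGestureNet(num_classes=7)
logits = model(torch.randn(2, 3, 16, 112, 112), torch.randn(2, 6890 * 3))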
Effective communication and control of a team of humans and robots is critical for a number of DoD operations and scenarios. Ideally, humans would communicate with their robot teammates using nonverbal cues (i.e., gestures) that work reliably in a variety of austere environments and from different vantage points. A major challenge is that traditional gesture recognition algorithms based on deep learning require large amounts of data to achieve robust performance across a variety of conditions. Our approach focuses on reducing the need for "hard-to-acquire" real data by using synthetically generated gestures in combination with synthetic-to-real domain adaptation techniques. We also apply these algorithms to improve the robustness and accuracy of gesture recognition under viewpoint shifts (i.e., air to ground). Our approach leverages the soon-to-be-released Robot Control Gestures (RoCoG-v2) dataset, consisting of corresponding real and synthetic videos from ground and aerial viewpoints. We first demonstrate real-time performance of the algorithm running on low-SWaP edge hardware. Next, we demonstrate the ability to accurately classify gestures from different viewpoints with varying backgrounds representative of DoD environments. Finally, we show the ability to use the inferred gestures to control a team of Boston Dynamics Spot robots; the inferred gestures control the formation of the robot team as well as coordinate the robots' behavior. Our expectation is that the domain adaptation techniques will significantly reduce the need for real-world data and improve gesture recognition robustness and accuracy using synthetic data.
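As an illustration of the final step, the following sketch maps an inferred gesture label to a formation command broadcast to a robot team. The gesture labels, command names, and robot.send_command interface are hypothetical placeholders; actual control of Spot robots would go through the Boston Dynamics Spot SDK, which is omitted here.

GESTURE_TO_COMMAND = {
    "advance": "formation_forward",
    "halt": "formation_stop",
    "rally": "formation_regroup",
    "move_in_reverse": "formation_reverse",
}

def dispatch(gesture_label: str, robots) -> None:
    # Broadcast the mapped formation command to every robot in the team.
    command = GESTURE_TO_COMMAND.get(gesture_label)
    if command is None:
        return  # unrecognized gesture: take no action
    for robot in robots:
        robot.send_command(command)  # hypothetical robot interface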
Automatic target recognition (ATR) technology is likely to play an increasingly prevalent role in maintaining situational awareness on the modern battlefield. Advances in deep learning have enabled considerable progress in the development of ATR algorithms; however, these algorithms require large amounts of high-quality annotated data to train, which is often the main bottleneck. Synthetic data offers a potential solution to this problem, especially given the recent proliferation of tools and techniques to synthesize custom data. Here, we focus on visible-domain ATR from the perspective of a small drone, a domain of growing importance to the Army. We describe custom simulators built to generate synthetic data for multiple targets in a variety of environments. We describe a field experiment in which we compared a baseline YOLOv5 model, trained on large, generic, off-the-shelf public datasets, with a model augmented with specialized synthetic data; both models were deployed on a VOXL platform in a small drone. Our results showed a considerable performance boost of over 40% in target detection accuracy (average precision with at least 50% overlap) when using synthetic data. We discuss the value of synthetic data for this domain, the opportunities it creates, and the novel challenges it introduces.
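For readers unfamiliar with the quoted metric, the sketch below illustrates the "at least 50% overlap" criterion: a detection counts as a true positive only when its intersection-over-union (IoU) with an unmatched ground-truth box is at least 0.5. The (x1, y1, x2, y2) box format and the greedy matching are simplifying assumptions, not the exact evaluation code used in the experiment; full average precision additionally averages precision over score thresholds, which is omitted here.

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_at_50(detections, ground_truth):
    # detections: list of ((x1, y1, x2, y2), score); ground_truth: list of boxes.
    matched, true_positives = set(), 0
    for box, _score in sorted(detections, key=lambda d: -d[1]):
        # Greedily match each detection to the best unused ground-truth box.
        best, best_iou = None, 0.5  # only overlaps of at least 0.5 count
        for i, gt in enumerate(ground_truth):
            if i not in matched and iou(box, gt) >= best_iou:
                best, best_iou = i, iou(box, gt)
        if best is not None:
            matched.add(best)
            true_positives += 1
    return true_positives / max(len(detections), 1)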
Recent years have seen impressive progress in Automatic Target Recognition (ATR) technology, in both the visible and non-visible spectra, which introduces an important challenge to the Army: understanding gaps in ATR algorithms' feature space to inform design methodology. To tackle this challenge, we combine synthetic data with adversarial learning techniques to explore the feature space of Machine Learning (ML) algorithms. Adversarial learning, however, requires large amounts of training data covering diverse target poses, lighting, and environmental conditions, and collecting and labeling this real training data is often the main bottleneck. The problem is exacerbated in the infrared (IR) domain, given the unique challenges posed by material and thermal variation. Here, we present a solution based on a simulator that supports the generation of physically accurate, custom synthetic IR training data; this data is then leveraged to systematically study weaknesses in YOLOv5, a state-of-the-art ATR algorithm that is often used in practice. We present results showing that this approach can yield critical insight into algorithm weaknesses, with practical consequences for the design of defense mechanisms against ATR technology as well as for improved training of ML algorithms to reduce feature-space vulnerabilities.
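The abstract does not specify the attack used, but the idea of probing feature-space weaknesses can be illustrated with the classic fast gradient sign method (FGSM); the sketch below is an illustrative stand-in in which model, loss_fn, and epsilon are placeholders, not the paper's method.

import torch

def fgsm_probe(model, image, target, loss_fn, epsilon=0.03):
    # One signed-gradient step: the cheapest probe of feature-space sensitivity.
    image = image.clone().requires_grad_(True)
    loss = loss_fn(model(image), target)  # e.g., the detector's training loss
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()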
Conference Committee Involvement (5)
Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II
22 April 2024 | National Harbor, Maryland, United States
Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications
1 May 2023 | Orlando, Florida, United States
Virtual, Augmented, and Mixed Reality (XR) Technology for Multi-Domain Operations III
4 April 2022 | Orlando, Florida, United States
Virtual, Augmented, and Mixed Reality (XR) Technology for Multi-Domain Operations II
12 April 2021 | Online Only, Florida, United States
Virtual, Augmented, and Mixed Reality (XR) Technology for Multi-Domain Operations
27 April 2020 | Online Only, California, United States