Real-time human action recognition from aerial videos using autozoom and synthetic data
7 June 2024
Abstract
In this paper, we propose a novel approach for real-time human action recognition (HAR) on resource-constrained UAVs. Our approach tackles the limited availability of labeled UAV video data (compared to ground-based datasets) by incorporating synthetic data augmentation to improve the performance of a lightweight action recognition model. This combined strategy offers a robust and efficient solution for UAV-based HAR. We evaluate our method on the RoCoG-v2 and UAV-Human datasets, showing a notable increase in top-1 accuracy across all scenarios on RoCoG-v2: a 9.1% improvement when training with synthetic data only, 6.9% with real data only, and the largest improvement, 11.8%, with a combined approach. Additionally, using an X3D backbone further improves accuracy on the UAV-Human dataset by 5.5%. Our models, deployed on a Qualcomm Robotics RB5 platform, achieve real-time predictions at approximately 10 frames per second (fps) and demonstrate a superior trade-off between performance and inference rate on both low-power edge devices and high-end desktops.
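To make the combined training strategy concrete, below is a minimal, hypothetical PyTorch sketch that fine-tunes a PyTorchVideo X3D-S backbone on a concatenation of real and synthetic clip datasets. This is not the authors' released code: the RandomClipDataset stand-ins, the class count NUM_CLASSES, the choice of the x3d_s variant, and all hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's released code): fine-tuning an X3D
# backbone on a mix of real and synthetic UAV clips, mirroring the combined
# training strategy described in the abstract.
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader, Dataset

NUM_CLASSES = 7  # assumed class count, for illustration only


class RandomClipDataset(Dataset):
    """Stand-in for a real or synthetic video dataset.

    Yields (clip, label) pairs with clips shaped (C, T, H, W); X3D-S expects
    13 frames at 160x160. Replace with an actual loader for RoCoG-v2 or
    UAV-Human clips.
    """

    def __init__(self, num_samples: int):
        self.num_samples = num_samples

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, idx: int):
        clip = torch.randn(3, 13, 160, 160)
        label = torch.randint(NUM_CLASSES, ()).item()
        return clip, label


# X3D-S backbone from PyTorchVideo's torch.hub entry point, pretrained on
# Kinetics; the final projection layer is swapped for the target classes.
model = torch.hub.load("facebookresearch/pytorchvideo", "x3d_s", pretrained=True)
model.blocks[-1].proj = nn.Linear(model.blocks[-1].proj.in_features, NUM_CLASSES)

# Combined strategy: concatenate real and synthetic datasets so each
# mini-batch can draw clips from both sources.
real_data = RandomClipDataset(num_samples=8)       # placeholder for real clips
synthetic_data = RandomClipDataset(num_samples=8)  # placeholder for synthetic clips
loader = DataLoader(ConcatDataset([real_data, synthetic_data]),
                    batch_size=4, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for clips, labels in loader:  # single pass; wrap in an epoch loop for real training
    optimizer.zero_grad()
    loss = criterion(model(clips), labels)
    loss.backward()
    optimizer.step()
```

Concatenating the two sources lets shuffled mini-batches mix real and synthetic clips. For an edge deployment such as the Qualcomm Robotics RB5 mentioned above, a model along these lines would additionally need export and quantization through the device toolchain, which is outside the scope of this sketch.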
© (2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Ruiqi Xian, Bryan I. Vogel, Celso M. De Melo, Andre V. Harrison, and Dinesh Manocha "Real-time human action recognition from aerial videos using autozoom and synthetic data", Proc. SPIE 13035, Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 130350I (7 June 2024); https://doi.org/10.1117/12.3013547
KEYWORDS: Video, Action recognition, Unmanned aerial vehicles, Data modeling, Education and training, Detection and tracking algorithms, Performance modeling