This paper presents YEEHaD, an Extremely Efficient Hand Detection approach based on the YOLO architecture. We replace the Cross Stage Partial (CSP) blocks of YOLOv5 with HG blocks, which combine lightweight convolutions with the squeeze-and-excitation technique, enhancing detection efficiency without compromising performance. YEEHaD is computationally efficient, requiring under 3 GFLOPs and fewer than 1.1M parameters. We conduct extensive evaluations of YEEHaD’s performance on two public datasets and provide manual annotations of hand locations for about 40K frames from the NVGesture dataset. Its detection accuracy is comparable to that of heavier versions of YOLOv5, achieving 99.45 mAP@0.5 on the Hagrid dataset and 99.43 mAP@0.5 on the NVGesture dataset. Additionally, we analyze YEEHaD’s performance on standard desktop GPUs and two GPU-embedded devices. Our model can run at 220 FPS on a standard 2080Ti GPU. This adaptability is explored in depth, showcasing its potential in different hardware environments. Finally, we explore the possibility of fine-tuning YEEHaD for hand gesture recognition (HGR), offering insights into the balance between efficiency and effectiveness in HGR applications.
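For readers unfamiliar with the mAP@0.5 figures quoted above: under this criterion a predicted box is typically counted as a correct detection when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching test, not the authors' evaluation code; the function names `box_iou` and `is_true_positive` and the `(x1, y1, x2, y2)` corner format are assumptions made for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes, each given as (x1, y1, x2, y2).

    The corner format is an assumption for this sketch.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """Hypothetical helper: does `pred` match `gt` at the given IoU threshold?"""
    return box_iou(pred, gt) >= threshold
```

A full mAP computation additionally ranks detections by confidence and averages precision over recall levels; this sketch covers only the per-box matching step.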
Hand segmentation is usually treated as a pixel-wise binary classification problem, in which the foreground hand is to be recognized in an input image. However, we envision that finger-level hand segmentation is more useful for applications such as hand gesture and sign language recognition. Therefore, in this paper, we compare five state-of-the-art (SOTA) real-time semantic segmentation methods on the task of finger-level hand segmentation. To that end, we introduce two subsets totaling 1,000 images, manually annotated pixel-wise and selected from newly proposed datasets for hand gesture and word-level sign language recognition. With these subsets, we evaluate the accuracy of the recent SOTA methods DABNet, FastSCNN, FC-HardNet, FASSDNet, and DDRNet. Since each subset has relatively few images (500), we introduce a simple yet effective loss function to train with synthetic data that includes the same annotations. Finally, we present a real-time performance evaluation of the five algorithms on the NVIDIA Jetson family of GPU-powered embedded systems, including Jetson Xavier NX, Jetson TX2, and Jetson Nano.