Hands play a significant role in human interaction with objects and can enhance our understanding of the activity being performed; they are the focal point of egocentric vision. Accurately segmenting hands from the background and from other objects in a scene enables the understanding of hand movements and gestures. In this paper, we propose a hand segmentation algorithm that leverages deep learning techniques to achieve improved segmentation results.
Our approach applies optimization techniques that enhance the accuracy of the mask produced by our optimized model (RefineNet-Pix), while also reducing the computational resources required. We trained the model on the Pascal VOC 2012 dataset and evaluated it on the test images of the publicly available EgoHands dataset.
We compared the results against the baseline model for performance evaluation. The enhanced model improves on the baseline by over 5%, achieving better segmentation accuracy across mean intersection over union (mIoU), pixel accuracy (pixelAcc.), and mean accuracy (meanAcc.), with a shorter processing time. This improved model can be used in several computer vision applications, including object manipulation, activity recognition, rehabilitation therapy, and assistive technologies.
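For concreteness, the three reported metrics can be computed from a pixel-level confusion matrix. The sketch below is a minimal illustration, assuming a matrix `C` where `C[i, j]` counts pixels of true class `i` predicted as class `j`; the names and example values are illustrative, not taken from the paper's code.

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray) -> dict:
    """conf: (num_classes, num_classes) pixel-level confusion matrix."""
    tp = np.diag(conf).astype(float)        # correctly labeled pixels per class
    gt_total = conf.sum(axis=1)             # ground-truth pixels per class
    pred_total = conf.sum(axis=0)           # predicted pixels per class

    # Pixel accuracy: fraction of all pixels labeled correctly.
    pixel_acc = tp.sum() / conf.sum()

    # Mean accuracy: per-class recall, averaged over classes.
    mean_acc = np.mean(tp / np.maximum(gt_total, 1))

    # Mean IoU: intersection (tp) over union (gt + pred - tp), averaged.
    union = gt_total + pred_total - tp
    miou = np.mean(tp / np.maximum(union, 1))

    return {"pixelAcc": pixel_acc, "meanAcc": mean_acc, "mIoU": miou}

# Example: a 2-class (background vs. hand) confusion matrix.
conf = np.array([[9500, 200],
                 [300, 1000]])
print(segmentation_metrics(conf))
```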
Semantic segmentation is a high-level computer vision task that associates each pixel of an image with a semantic (class) label. Fine semantic segmentation is a pixel-level task that provides the detailed information needed to identify the region of the object of interest. Hands are one of the main channels of communication, enabling human-object and human-environment interaction, and in egocentric videos they are ubiquitous and at the center of vision and activity, hence our interest in hand segmentation. Fine semantic segmentation of hands locates, identifies, and groups together the pixels belonging to the hands under a hand semantic label. We performed fine semantic segmentation of hands by improving the architecture of a state-of-the-art deep convolutional neural network (RefineNet). We achieve a finer and more accurate result by amending how high- and low-level features are obtained and combined, and how pixels are grouped for pixel-level classification. We performed this task on a public egocentric video dataset (EgoHands) and evaluated our model's (RefineNet-Pix) performance using an established pixel-level metric, mean precision (mPrecision). Compared with the baseline reported in Urooj's work, our accuracy exceeds the benchmark of 87.9%. Our finer and more accurate semantic segmentation maintains good performance under varied lighting conditions and complex backgrounds, making it suitable for both indoor and outdoor environments. Fine hand semantic segmentation can be applied in image analysis, medical systems (with a focus on understanding hand motion for prediction, diagnosis, and monitoring), hand gesture recognition (human-computer interaction and action understanding), and robotics (grasping and manipulating objects).
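To illustrate the kind of high/low-level feature fusion this abstract refers to, here is a minimal sketch of a RefineNet-style multi-resolution fusion step in PyTorch. The module name, channel widths, and resolutions are assumptions for illustration; this is not the paper's RefineNet-Pix implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionFusion(nn.Module):
    """Fuse a coarse, semantically rich path with a finer low-level path."""

    def __init__(self, low_channels: int, high_channels: int, out_channels: int):
        super().__init__()
        # Project both paths to a common channel width before merging.
        self.low_conv = nn.Conv2d(low_channels, out_channels, 3, padding=1, bias=False)
        self.high_conv = nn.Conv2d(high_channels, out_channels, 3, padding=1, bias=False)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample the coarse high-level path to the fine path's spatial
        # resolution, then fuse by element-wise summation.
        high = self.high_conv(high)
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        return self.low_conv(low) + high

# Example: fuse a 1/4-resolution low-level map with a 1/16-resolution
# high-level map for a 256x256 input.
fuse = MultiResolutionFusion(low_channels=256, high_channels=512, out_channels=256)
low = torch.randn(1, 256, 64, 64)
high = torch.randn(1, 512, 16, 16)
print(fuse(low, high).shape)  # torch.Size([1, 256, 64, 64])
```

Summation-based fusion of upsampled coarse features with fine features is the standard way RefineNet-style decoders recover spatial detail lost in the encoder's downsampling path.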
Good health and functional ability are important for individuals to lead fulfilling mental, psychological, and social lives. Diseases such as dementia cause irreversible damage and a decline in cognition, function, and behavior, which translates into difficulty performing daily tasks independently. Studies have shown that assessment of instrumental activities of daily living (IADLs) correlates with a person's cognitive and functional status, and biomechanical markers such as hand movement and use have been analyzed with artificial intelligence (AI). We present an optimized AI algorithm for hand detection in the analysis of egocentric video recordings. This improved algorithm takes a probabilistic approach to detecting hand regions in egocentric videos; the detections then feed the human functional pattern recognition process. To evaluate our proposal, we use a dataset containing four functional patterns organized into four classes, based on the prehensile patterns of the hands (strength-precision) and on the kinematics of the instruments (displacement-handling). This work was inspired by previous work by our group, in which biomechanical markers were analyzed during the performance of IADL activities to recognize human functional patterns. Our proposal achieved an accuracy of 87.5% in recognizing strength-precision and displacement-handling movement patterns when evaluating the test database on both segmented and non-segmented videos, with only a single video changing its classification between the two subsets. This has great potential for the development of technological tools toward an automated model to support the diagnosis of early Alzheimer's disease.
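The abstract does not specify the probabilistic hand-detection model, so as a hedged illustration of the general idea, the sketch below scores each pixel with a skin likelihood and keeps regions above a threshold, one common probabilistic approach. The YCrCb chrominance center, spread, and threshold are generic heuristics, not the paper's actual parameters.

```python
import cv2
import numpy as np

def hand_probability_mask(frame_bgr: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """Return a binary mask of pixels whose skin likelihood exceeds thresh."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb).astype(float)
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    # Gaussian-like likelihood centered on a typical skin chrominance
    # (Cr ~ 150, Cb ~ 110); the spread of 15 is an illustrative assumption.
    prob = np.exp(-(((cr - 150) / 15) ** 2 + ((cb - 110) / 15) ** 2) / 2)
    return (prob > thresh).astype(np.uint8)
```

In a pipeline like the one described, a mask such as this would be computed per frame and passed downstream to the functional pattern recognition stage.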