2D multi-person pose estimation is a well-studied problem in understanding humans in images. It involves keypoint detection, which requires detecting and localizing points of interest (human joints). Multi-person pose estimation remains challenging because of occlusion of body parts, the non-rigidity of the human body, the variable number of persons in an image, and variation in scale. The most common existing approach to keypoint detection is heatmap-based regression, which has several drawbacks: its precision depends on the resolution of the output heatmap; pre-processing and post-processing are computationally costly for high-resolution heatmaps; and the overlapping heatmap signals of spatially close keypoints cannot be distinguished. Heatmap-free pose estimation has therefore emerged to tackle these problems, with KAPAO and YOLO-Pose as representative examples. Both use YOLO for keypoint detection, since YOLO is an extremely fast object detection method with high accuracy. A graph consists of a collection of nodes and a collection of edges that connect the nodes. A human pose can be represented as a graph in which human joints are the nodes and the connections between them are the edges that outline the pose. Graph neural networks (GNNs) are designed for graph-structured data. Inspired by these observations, we introduce a YOLO-based GNN, a heatmap-free approach to 2D multi-person pose estimation. A YOLO-based network is leveraged for keypoint detection, and the detected keypoints and their connections are then re-arranged and refined by a GNN. We tested our framework on the COCO-2017 dataset, and preliminary results show superior performance in accuracy and efficiency.
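To make the graph view of a pose concrete, the following minimal sketch encodes one person's keypoints as nodes and limb connections as edges, then applies a single round of neighbor averaging. It is an illustration only and not the refinement network described here: the keypoint order and edge list follow the skeleton convention commonly distributed with COCO keypoint annotations (taken as an assumption), and the names `build_adjacency` and `mean_aggregate` are hypothetical.

```python
# Illustrative sketch only. The edge list follows the skeleton convention
# commonly shipped with COCO keypoint annotations (assumed, 0-indexed here);
# function names are hypothetical and this is NOT the paper's GNN.

COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

SKELETON_EDGES = [
    (15, 13), (13, 11), (16, 14), (14, 12), (11, 12),
    (5, 11), (6, 12), (5, 6), (5, 7), (6, 8), (7, 9), (8, 10),
    (1, 2), (0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6),
]


def build_adjacency(num_nodes, edges):
    """Return a symmetric 0/1 adjacency matrix for an undirected graph."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


def mean_aggregate(coords, adj):
    """Average each joint's (x, y) with its neighbors' coordinates.

    This is the simplest form of message passing: every node is replaced
    by the mean of itself and its adjacent nodes. A learned GNN layer
    would add trainable weights and non-linearities on top of this step.
    """
    refined = []
    for i, (x, y) in enumerate(coords):
        neighbors = [j for j, connected in enumerate(adj[i]) if connected]
        xs = [x] + [coords[j][0] for j in neighbors]
        ys = [y] + [coords[j][1] for j in neighbors]
        refined.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return refined


ADJ = build_adjacency(len(COCO_KEYPOINTS), SKELETON_EDGES)
```

In this representation, a detector's raw keypoint predictions supply the node features (here just coordinates), while the skeleton supplies the fixed edge structure over which information is exchanged between neighboring joints.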