Deep High-Resolution Representation Learning for Human Pose Estimation |
Ke Sun     Bin Xiao     Dong Liu     Jingdong Wang |
       |
This paper addresses human pose estimation, with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from the low-resolution representations produced by a high-to-low resolution network. Our proposed network instead maintains high-resolution representations throughout the whole process. |
We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions so that each of the high-to-low resolution representations repeatedly receives information from the other parallel representations, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through superior pose estimation results on two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. In addition, we show the superiority of our network in pose tracking on the PoseTrack dataset. |
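The repeated multi-scale fusion described above can be illustrated with a minimal sketch of a single fusion step. This is not the released implementation. It assumes single-channel 2D maps, replaces the learned downsampling and upsampling layers with fixed 2x2 average pooling and nearest-neighbor copying, and assumes that fusion is a plain element-wise sum. All function names (`downsample`, `upsample`, `resize`, `fuse`) are hypothetical.

```python
"""Sketch of one multi-scale fusion step across parallel resolution branches.

Assumptions (illustrative only): single-channel 2D maps stored as nested
lists, fixed 2x2 average pooling instead of learned strided convolutions,
nearest-neighbor upsampling instead of learned upsampling layers, and
fusion by element-wise summation.
"""


def downsample(x):
    """Halve the resolution with 2x2 average pooling."""
    h, w = len(x) // 2, len(x[0]) // 2
    return [
        [
            (x[2 * i][2 * j] + x[2 * i][2 * j + 1]
             + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]


def upsample(x):
    """Double the resolution by repeating every value in a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def resize(x, src, dst):
    """Move a map from branch src to branch dst (0 is the highest resolution)."""
    for _ in range(dst - src):  # target is coarser: downsample repeatedly
        x = downsample(x)
    for _ in range(src - dst):  # target is finer: upsample repeatedly
        x = upsample(x)
    return x


def add(a, b):
    """Element-wise sum of two maps with the same shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse(branches):
    """Return one fused map per branch: the sum of all branches resized to it."""
    fused = []
    for dst in range(len(branches)):
        total = resize(branches[0], 0, dst)
        for src in range(1, len(branches)):
            total = add(total, resize(branches[src], src, dst))
        fused.append(total)
    return fused


# Three parallel branches at 8x8, 4x4, and 2x2 resolution.
branches = [
    [[1.0] * 8 for _ in range(8)],
    [[2.0] * 4 for _ in range(4)],
    [[3.0] * 2 for _ in range(2)],
]
fused = fuse(branches)
shapes = [(len(m), len(m[0])) for m in fused]
```

Each branch keeps its own resolution after fusion, so the highest-resolution map is never discarded and rebuilt; it only accumulates information from the coarser branches. Applying `fuse` several times in sequence mimics the repeated exchange between parallel subnetworks.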
The code and models are publicly available on GitHub. |
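Figure: Representative existing pose estimation networks that recover high resolution from a high-to-low resolution network: (a) Hourglass [1]; (b) Cascaded pyramid networks [2]; (c) Simple baseline [3]; (d) Combination with dilated convolutions [4]. |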
    |
    |
[1]  | A. Newell, K. Yang, and J. Deng. Stacked hourglass networks for human pose estimation. ECCV, pages 483–499, 2016. |
[2]  | Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, and J. Sun. Cascaded pyramid network for multi-person pose estimation. CoRR, abs/1711.07319, 2017. |
[3]  | B. Xiao, H. Wu, and Y. Wei. Simple baselines for human pose estimation and tracking. ECCV, pages 472–487, 2018. |
[4]  | E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. Schiele. DeeperCut: A deeper, stronger, and faster multi-person pose estimation model. ECCV, pages 34–50, 2016. |