Deep High-Resolution Representation Learning for Human Pose Estimation
Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang
In this paper, we address the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations throughout the whole process.
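To make "maintains high-resolution representations throughout the whole process" concrete, the following minimal Python sketch lists the parallel branches in each stage. It assumes four stages, a first branch at 1/4 of the input resolution, and a base width of 32 channels that doubles whenever the resolution halves (roughly an HRNet-W32-style configuration; treat these numbers as illustrative assumptions). It ignores the stem and the per-stage block details. The point is that the 1/4-resolution branch appears in every stage, so it is never rebuilt from a lower-resolution one.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Branch:
    stride: int    # downsampling factor relative to the input image
    channels: int  # number of feature channels in this branch


def hrnet_layout(num_stages: int = 4, base_stride: int = 4,
                 base_channels: int = 32) -> List[List[Branch]]:
    """Return the parallel branches run in each stage (illustrative sketch).

    Stage s (1-indexed) runs s branches. Branch b halves the resolution and
    doubles the channels of branch b - 1, while branch 0 keeps the highest
    resolution in every stage.
    """
    return [
        [Branch(base_stride * 2 ** b, base_channels * 2 ** b) for b in range(s)]
        for s in range(1, num_stages + 1)
    ]


if __name__ == "__main__":
    for s, branches in enumerate(hrnet_layout(), start=1):
        desc = ", ".join(f"1/{br.stride} x {br.channels}ch" for br in branches)
        print(f"stage {s}: {desc}")
```

A high-to-low network would instead shrink a single representation stage by stage and upsample it at the end; here the widest row of the printout simply grows by one lower-resolution branch per stage.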
We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from the other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through superior pose estimation results on two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. In addition, we show the superiority of our network in pose tracking on the PoseTrack dataset.
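The repeated multi-scale fusion described above can be sketched as an exchange step in which every output branch sums the representations of all input branches after resampling them to its own resolution. This is a simplified Python illustration, not the released implementation: it uses single-channel maps stored as nested lists and no learned weights, stands in 2x2 average pooling for the strided convolutions a real network would use to downsample, and uses plain nearest-neighbour upsampling without any channel-matching convolution. Branch k is assumed to have 1/2^k the resolution of branch 0.

```python
from typing import List

Map = List[List[float]]  # single-channel feature map: a list of rows


def upsample(x: Map, factor: int) -> Map:
    """Nearest-neighbour upsampling by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in x for _ in range(factor)]


def downsample2(x: Map) -> Map:
    """Halve the resolution with 2x2 average pooling (stand-in for a strided conv)."""
    h, w = len(x), len(x[0])
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def resample(x: Map, src: int, dst: int) -> Map:
    """Bring a map from branch `src` to the resolution of branch `dst`."""
    if src == dst:
        return x
    if src > dst:  # lower resolution -> higher resolution
        return upsample(x, 2 ** (src - dst))
    for _ in range(dst - src):  # higher resolution -> lower, one halving at a time
        x = downsample2(x)
    return x


def add(a: Map, b: Map) -> Map:
    """Element-wise sum of two maps with the same shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse(branches: List[Map]) -> List[Map]:
    """One exchange step: output branch j = sum over i of resample(branch i -> j)."""
    fused = []
    for dst in range(len(branches)):
        out = resample(branches[0], 0, dst)
        for src in range(1, len(branches)):
            out = add(out, resample(branches[src], src, dst))
        fused.append(out)
    return fused


if __name__ == "__main__":
    high = [[1.0] * 4 for _ in range(4)]  # branch 0: 4x4
    low = [[1.0, 2.0], [3.0, 4.0]]        # branch 1: 2x2
    for k, m in enumerate(fuse([high, low])):
        print(f"branch {k}: {m}")
```

Stacking several such steps is what lets the high-resolution branch repeatedly absorb information from the coarser branches (and vice versa) instead of receiving it only once at the end.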
The code and models are publicly available on GitHub.
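The abstract's claim that richer high-resolution representations yield more precise keypoint heatmaps depends on how a heatmap is turned into a coordinate. Here is a minimal Python sketch of that decoding step. It assumes one heatmap per keypoint at 1/4 of the input resolution (stride 4), maps cell indices back to input pixels by simple scaling (ignoring half-pixel offsets and any affine crop transform), and applies a quarter-pixel shift toward the larger neighbour. That shift is a common heuristic in heatmap-based pose estimation and is included here as an assumption, not as this project's confirmed post-processing. The names `decode_heatmap`, `stride`, and `refine` are hypothetical.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # one keypoint's heatmap: a list of rows


def decode_heatmap(hm: Heatmap, stride: int = 4,
                   refine: bool = True) -> Tuple[float, float, float]:
    """Return (x, y, score) in input-image pixels for a single keypoint heatmap."""
    h, w = len(hm), len(hm[0])
    # Index of the maximum response; max() keeps the first maximum on ties.
    k = max(range(h * w), key=lambda idx: hm[idx // w][idx % w])
    y, x = divmod(k, w)
    score = hm[y][x]
    fx, fy = float(x), float(y)
    if refine:
        # Shift a quarter cell toward the larger neighbour on each axis
        # (only when both neighbours exist).
        if 0 < x < w - 1:
            dx = hm[y][x + 1] - hm[y][x - 1]
            fx += 0.25 * ((dx > 0) - (dx < 0))
        if 0 < y < h - 1:
            dy = hm[y + 1][x] - hm[y - 1][x]
            fy += 0.25 * ((dy > 0) - (dy < 0))
    # Scale heatmap-cell coordinates back to input-image pixels.
    return fx * stride, fy * stride, score


if __name__ == "__main__":
    hm = [[0.0] * 5 for _ in range(5)]
    hm[2][3], hm[2][4], hm[2][2] = 1.0, 0.5, 0.2  # peak with a stronger right neighbour
    hm[1][3], hm[3][3] = 0.3, 0.3                 # symmetric vertical neighbours
    print(decode_heatmap(hm))                     # (13.0, 8.0, 1.0)
```

Because each heatmap cell covers `stride` input pixels, an argmax alone quantizes locations to that grid; a higher-resolution heatmap (smaller stride) or sub-cell refinement reduces this quantization error.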
Figure: (a) Hourglass [1]; (b) Cascaded pyramid networks [2]; (c) Simple baseline [3]; (d) Combined with dilated convolutions [4].
We have released the training and testing code and the pretrained models on GitHub.
Pose estimation | Semantic segmentation | Face alignment | Image classification | Object detection
[1] A. Newell, K. Yang, and J. Deng. Stacked hourglass networks for human pose estimation. ECCV, pages 483–499, 2016.
[2] Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, and J. Sun. Cascaded pyramid network for multi-person pose estimation. CoRR, abs/1711.07319, 2017.
[3] B. Xiao, H. Wu, and Y. Wei. Simple baselines for human pose estimation and tracking. ECCV, pages 472–487, 2018.
[4] E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. Schiele. DeeperCut: A deeper, stronger, and faster multi-person pose estimation model. ECCV, pages 34–50, 2016.