Deep High-Resolution Representation Learning for Human Pose Estimation

Ke Sun     Bin Xiao     Dong Liu     Jingdong Wang
      

Abstract

In this paper, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process.
We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through superior pose estimation results on two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. In addition, we show the superiority of our network in pose tracking on the PoseTrack dataset.
The code and models are publicly available on GitHub.
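
The parallel multi-resolution design described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the released implementation: learned convolutions are replaced here by fixed average pooling (for downsampling) and nearest-neighbour upsampling, feature maps are plain 2D lists with a single channel, and every function name (`downsample`, `upsample`, `resize`, `fuse`, `add_branch`) is hypothetical. The sketch only shows the data flow: start from one high-resolution branch, add one lower-resolution branch per stage, and let every branch receive information from all the others at each fusion.

```python
# Illustrative sketch only: fixed pooling/upsampling stand in for the learned
# convolutions a real network would use. All names here are hypothetical.

def downsample(fmap, factor):
    """Average-pool a 2D map (list of lists) by an integer factor."""
    h, w = len(fmap) // factor, len(fmap[0]) // factor
    return [
        [
            sum(fmap[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / (factor * factor)
            for j in range(w)
        ]
        for i in range(h)
    ]

def upsample(fmap, factor):
    """Nearest-neighbour upsample a 2D map by an integer factor."""
    return [
        [fmap[i // factor][j // factor] for j in range(len(fmap[0]) * factor)]
        for i in range(len(fmap) * factor)
    ]

def resize(fmap, src_level, dst_level):
    """Bring a map from branch src_level to the resolution of branch dst_level.
    Branch k is assumed to have 1 / 2**k of the resolution of branch 0."""
    if src_level == dst_level:
        return [row[:] for row in fmap]
    if src_level < dst_level:
        return downsample(fmap, 2 ** (dst_level - src_level))
    return upsample(fmap, 2 ** (src_level - dst_level))

def fuse(branches):
    """One multi-scale fusion: each branch becomes the sum of all branches,
    every one of them resized to that branch's own resolution."""
    fused = []
    for dst in range(len(branches)):
        resized = [resize(b, src, dst) for src, b in enumerate(branches)]
        h, w = len(resized[0]), len(resized[0][0])
        fused.append(
            [[sum(r[i][j] for r in resized) for j in range(w)] for i in range(h)]
        )
    return fused

def add_branch(branches):
    """Start a new, lower-resolution branch from the current lowest one."""
    return branches + [downsample(branches[-1], 2)]

# Stage 1: a single high-resolution branch (8x8 here); each later stage adds
# one lower-resolution branch and fuses across all parallel branches.
branches = [[[float(i * 8 + j) for j in range(8)] for i in range(8)]]
for _ in range(3):
    branches = fuse(add_branch(branches))

shapes = [(len(b), len(b[0])) for b in branches]
print(shapes)
```

Note that the highest-resolution branch keeps its 8x8 size through every stage, which is the property the abstract emphasises; in this toy version the pose heatmap would be read from that first branch.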

Summary of existing solutions

(a) Hourglass [1]; (b) Cascaded pyramid networks [2]; (c) Simple baseline [3]; (d) Combined with dilated convolutions [4].
   
   

Code

We have released the training and testing code and the pretrained model on GitHub.

Other applications

Pose estimation, semantic segmentation, face alignment, image classification, object detection

Citation

@inproceedings{SunXLW19,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
  booktitle={CVPR},
  year={2019}
}

References

[1]  A. Newell, K. Yang, and J. Deng. Stacked hourglass networks for human pose estimation. ECCV, pages 483–499, 2016.
[2]  Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, and J. Sun. Cascaded pyramid network for multi-person pose estimation. CoRR, abs/1711.07319, 2017.
[3]  B. Xiao, H. Wu, and Y. Wei. Simple baselines for human pose estimation and tracking. ECCV, pages 472–487, 2018.
[4]  E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. Schiele. DeeperCut: A deeper, stronger, and faster multi-person pose estimation model. ECCV, pages 34–50, 2016.
