Deep, Robust and Single Shot 3D Multi-Person Human Pose Estimation from Monocular Images
Abstract
In this paper, we propose a new single-shot method for multi-person 3D pose estimation from monocular RGB images. Our model jointly learns to locate the human joints in the image, to estimate their 3D coordinates, and to group these predictions into full human skeletons. Our approach leverages and extends the Stacked Hourglass Network and its multi-scale feature learning to handle multi-person scenes. In particular, we exploit Occlusion-Robust Pose Maps (ORPM) to fully describe several 3D human poses even under strong occlusions or cropping. Joint grouping and human pose estimation for an arbitrary number of people are then performed using associative embedding. We evaluate our method on the challenging CMU Panoptic dataset and demonstrate that it outperforms the state of the art.