Structured Output Prediction and Learning for Deep Monocular 3D Human Pose Estimation

Kinauer, S; Güler, RA; Chandra, S; Kokkinos, I (2017) Structured Output Prediction and Learning for Deep Monocular 3D Human Pose Estimation. In: Pelillo, M and Hancock, ER (eds.) Proceedings of the International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), 11th International Conference, Venice, Italy, 30 October to 1 November 2017 (pp. 34-48). Springer Nature: Cham, Switzerland. Green open access.


Abstract

In this work we address the problem of estimating 3D human pose from a single RGB image by blending a feed-forward CNN with a graphical model that couples the 3D positions of parts. The CNN populates a volumetric output space that represents the possible positions of 3D human joints, and also regresses the estimated displacements between pairs of parts. These constitute the ‘unary’ and ‘pairwise’ terms, respectively, of the energy of a graphical model that resides in a 3D label space and delivers an optimal 3D pose configuration at its output. The CNN is trained on the Human3.6M 3D human pose dataset, and the graphical model is trained jointly with the CNN in an end-to-end manner, allowing us to exploit both the discriminative power of CNNs and the top-down information pertaining to human pose. We (a) introduce memory-efficient methods for getting accurate voxel estimates for parts by blending quantization with regression, (b) employ efficient structured prediction algorithms for 3D pose estimation using branch-and-bound, and (c) develop a framework for qualitative and quantitative comparison of competing graphical models. We evaluate our work on the Human3.6M dataset, demonstrating that exploiting the structure of the human pose in 3D yields systematic gains.
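
The two ideas named in the abstract, blending quantization with regression and an energy built from unary and pairwise terms, can be illustrated with a toy sketch. It is not the authors' implementation. Every name here (`decode_coordinate`, `pose_energy`, `minimise_brute_force`), the squared-error form of the pairwise cost, and the part names are assumptions made for illustration, and the exhaustive search stands in for the branch-and-bound algorithm the paper uses.

```python
from itertools import product


def decode_coordinate(bin_scores, residuals, bin_size):
    """Blend quantization with regression along one axis (illustrative only).

    bin_scores: per-bin classification scores (the coarse, quantized estimate).
    residuals:  per-bin regressed offset from the bin centre, in bin units.
    Returns a continuous coordinate in the same units as bin_size.
    """
    best = max(range(len(bin_scores)), key=lambda i: bin_scores[i])
    return (best + 0.5 + residuals[best]) * bin_size


def pose_energy(labels, unary, edges, offsets, weight=1.0):
    """Unary costs plus pairwise displacement costs for one labelling.

    labels:  dict part -> (x, y, z) voxel index
    unary:   dict part -> dict voxel -> cost (lower is better)
    edges:   list of (parent, child) part pairs
    offsets: dict (parent, child) -> expected displacement (dx, dy, dz)
    The squared-error pairwise cost is an assumption, not taken from the paper.
    """
    energy = sum(unary[part][labels[part]] for part in labels)
    for parent, child in edges:
        expected = offsets[(parent, child)]
        actual = tuple(c - p for p, c in zip(labels[parent], labels[child]))
        energy += weight * sum((a - e) ** 2 for a, e in zip(actual, expected))
    return energy


def minimise_brute_force(parts, voxels, unary, edges, offsets):
    """Exhaustive minimisation; feasible only for toy label spaces."""
    best_labels, best_energy = None, float("inf")
    for assignment in product(voxels, repeat=len(parts)):
        labels = dict(zip(parts, assignment))
        energy = pose_energy(labels, unary, edges, offsets)
        if energy < best_energy:
            best_labels, best_energy = labels, energy
    return best_labels, best_energy


# Toy example with two hypothetical parts and two candidate voxels.
parts = ["hip", "knee"]
voxels = [(0, 0, 0), (0, 1, 0)]
unary = {
    "hip": {(0, 0, 0): 0.0, (0, 1, 0): 1.0},
    "knee": {(0, 0, 0): 0.5, (0, 1, 0): 0.6},
}
edges = [("hip", "knee")]
offsets = {("hip", "knee"): (0, 1, 0)}

best_labels, best_energy = minimise_brute_force(parts, voxels, unary, edges, offsets)
```

In this toy setup the unary costs alone would place the knee at voxel (0, 0, 0), but the pairwise term moves it to (0, 1, 0), which is the kind of structural correction the abstract attributes to the graphical model.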

Type: Proceedings paper
Title: Structured Output Prediction and Learning for Deep Monocular 3D Human Pose Estimation
Event: 11th International Conference, EMMCVPR 2017, Venice, Italy, 30 October to 1 November 2017
ISBN-13: 978-3-319-78198-3
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/978-3-319-78199-0
Publisher version: https://doi.org/10.1007/978-3-319-78199-0
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10060979
