Learning Latent Representations of 3D Human Pose with Deep Neural Networks

I. Katircioglu; B. Tekin; M. Salzmann; Vincent Lepetit; Pascal Fua

doi:10.1007/s11263-018-1066-6

I. Katircioglu, B. Tekin^*, M. Salzmann, Vincent Lepetit, Pascal Fua

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Most recent approaches to monocular 3D pose estimation rely on Deep Learning. They either train a Convolutional Neural Network to directly regress from an image to a 3D pose, which ignores the dependencies between human joints, or model these dependencies via a max-margin structured learning framework, which involves a high computational cost at inference time. In this paper, we introduce a Deep Learning regression architecture for structured prediction of 3D human pose from monocular images or 2D joint location heatmaps that relies on an overcomplete autoencoder to learn a high-dimensional latent pose representation and accounts for joint dependencies. We further propose an efficient Long Short-Term Memory network to enforce temporal consistency on 3D pose predictions. We demonstrate that our approach achieves state-of-the-art performance both in terms of structure preservation and prediction accuracy on standard 3D human pose estimation benchmarks.
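As a rough illustration of what "overcomplete" means in the abstract, the sketch below encodes a flattened 3D pose into a latent code with more dimensions than the pose itself and then decodes it back. It is not the authors' implementation: the 17-joint skeleton, the latent size of 128, the ReLU units, the tied decoder weights, and the random untrained weights are all assumptions made for illustration, and the function names are hypothetical.

```python
import random

NUM_JOINTS = 17            # assumption: a 17-joint skeleton
POSE_DIM = NUM_JOINTS * 3  # flattened (x, y, z) coordinates per joint
LATENT_DIM = 128           # assumption: any value > POSE_DIM makes the code overcomplete


def init_weights(rows, cols, rng, scale=0.05):
    """Return a rows x cols matrix of small Gaussian random weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def encode(pose, weights, bias):
    """Map a pose vector to a higher-dimensional latent code (ReLU units)."""
    return [max(0.0, sum(w * x for w, x in zip(row, pose)) + b)
            for row, b in zip(weights, bias)]


def decode(latent, weights, bias):
    """Map the latent code back to pose space with the transposed (tied) weights."""
    return [sum(weights[j][i] * latent[j] for j in range(len(latent))) + bias[i]
            for i in range(len(bias))]


def reconstruction_error(pose, recon):
    """Mean squared error between a pose and its reconstruction."""
    return sum((p - r) ** 2 for p, r in zip(pose, recon)) / len(pose)


rng = random.Random(0)
W = init_weights(LATENT_DIM, POSE_DIM, rng)
b_enc = [0.0] * LATENT_DIM
b_dec = [0.0] * POSE_DIM

# A random stand-in for a real 3D pose; the weights are untrained.
pose = [rng.uniform(-1.0, 1.0) for _ in range(POSE_DIM)]
latent = encode(pose, W, b_enc)
recon = decode(latent, W, b_dec)
print(len(pose), len(latent), len(recon))
```

In a trained model, the reconstruction error would be minimized over a set of poses so that the latent code captures dependencies between joints. Here the weights are random, so only the shapes are meaningful.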

Original language	English
Pages (from-to)	1326-1341
Journal	International Journal of Computer Vision
Volume	126
DOIs	https://doi.org/10.1007/s11263-018-1066-6
Publication status	Published - 2018
Externally published	Yes

Cite this

@article{01b6c84c943b48fcb0d7f461edb1279c,

title = "Learning Latent Representations of 3D Human Pose with Deep Neural Networks",

abstract = "Most recent approaches to monocular 3D pose estimation rely on Deep Learning. They either train a Convolutional Neural Network to directly regress from an image to a 3D pose, which ignores the dependencies between human joints, or model these dependencies via a max-margin structured learning framework, which involves a high computational cost at inference time. In this paper, we introduce a Deep Learning regression architecture for structured prediction of 3D human pose from monocular images or 2D joint location heatmaps that relies on an overcomplete autoencoder to learn a high-dimensional latent pose representation and accounts for joint dependencies. We further propose an efficient Long Short-Term Memory network to enforce temporal consistency on 3D pose predictions. We demonstrate that our approach achieves state-of-the-art performance both in terms of structure preservation and prediction accuracy on standard 3D human pose estimation benchmarks.",

author = "I. Katircioglu and B. Tekin and M. Salzmann and Vincent Lepetit and Pascal Fua",

year = "2018",

doi = "10.1007/s11263-018-1066-6",

language = "English",

volume = "126",

pages = "1326--1341",

journal = "International Journal of Computer Vision",

issn = "0920-5691",

publisher = "Springer Vieweg",

}

TY - JOUR

T1 - Learning Latent Representations of 3D Human Pose with Deep Neural Networks

AU - Katircioglu, I.

AU - Tekin, B.

AU - Salzmann, M.

AU - Lepetit, Vincent

AU - Fua, Pascal

PY - 2018

Y1 - 2018

AB - Most recent approaches to monocular 3D pose estimation rely on Deep Learning. They either train a Convolutional Neural Network to directly regress from an image to a 3D pose, which ignores the dependencies between human joints, or model these dependencies via a max-margin structured learning framework, which involves a high computational cost at inference time. In this paper, we introduce a Deep Learning regression architecture for structured prediction of 3D human pose from monocular images or 2D joint location heatmaps that relies on an overcomplete autoencoder to learn a high-dimensional latent pose representation and accounts for joint dependencies. We further propose an efficient Long Short-Term Memory network to enforce temporal consistency on 3D pose predictions. We demonstrate that our approach achieves state-of-the-art performance both in terms of structure preservation and prediction accuracy on standard 3D human pose estimation benchmarks.

U2 - 10.1007/s11263-018-1066-6

DO - 10.1007/s11263-018-1066-6

M3 - Article

SN - 0920-5691

VL - 126

SP - 1326

EP - 1341

JO - International Journal of Computer Vision

JF - International Journal of Computer Vision

ER -
