Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

Anil Armagan, Guillermo Garcia-Hernando, Seungryul Baek, Shreyas Hampali Shivakumar, Mahdi Rad, Zhaohui Zhang, Shipeng Xie, MingXiu Chen, Boshen Zhang, Fu Xiong, Yang Xiao, Zhiguo Cao, Junsong Yuan, Pengfei Ren, Weiting Huang, Haifeng Sun, Marek Hrúz, Jakub Kanis, Zdeněk Krňoul, Qingfu Wan, Shile Li, Linlin Yang, Dongheui Lee, Angela Yao, Weiguo Zhou, Sijia Mei, Yunhui Liu, Adrian Spurr, Umar Iqbal, Pavlo Molchanov, Philippe Weinzaepfel, Romain Brégier, Gregory Rogez, Vincent Lepetit, Tae-Kyun Kim

Publication: Contribution to book/report/conference proceedings › Conference paper

Abstract

We study how well different types of approaches generalise in the task of 3D hand pose estimation under single-hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently infeasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge (HANDS'19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More precisely, HANDS'19 is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, in the presence or absence of objects; (b) to assess the generalisation abilities w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of a synthetic hand model to fill the gaps of current datasets. Through the challenge, the overall accuracy improved dramatically over the baseline, especially on extrapolation tasks, from 27 mm to 13 mm mean joint error. Our analyses highlight the impacts of data pre-processing, ensemble approaches, the use of a parametric 3D hand model (MANO), and different HPE methods/backbones.
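The headline improvement is reported as mean joint error: the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames. A minimal Python sketch of this metric (the 21-joint hand layout and millimetre units are assumptions based on common hand pose benchmarks, not details taken from this record):

    import numpy as np

    def mean_joint_error_mm(pred, gt):
        """Average Euclidean distance between predicted and ground-truth joints.

        pred, gt: arrays of shape (N, 21, 3) with 3D joint positions in mm,
        for N frames and an assumed 21-joint hand skeleton.
        """
        per_joint = np.linalg.norm(pred - gt, axis=-1)  # (N, 21) distances in mm
        return float(per_joint.mean())

    # Usage: perturb ground truth by ~10 mm of noise and measure the error.
    rng = np.random.default_rng(0)
    gt = rng.uniform(-100.0, 100.0, size=(5, 21, 3))
    pred = gt + rng.normal(scale=10.0, size=gt.shape)
    print(f"mean joint error: {mean_joint_error_mm(pred, gt):.1f} mm")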

Original language: English
Title: Computer Vision – ECCV 2020 - 16th European Conference, Glasgow, 2020, Proceedings
Editors: Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
Pages: 85-101
Number of pages: 17
DOIs
Publication status: Published - 23 Aug 2020
Event: 16th European Conference on Computer Vision: ECCV 2020 - Virtual, Glasgow, United Kingdom
Duration: 23 Aug 2020 – 28 Aug 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12368 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th European Conference on Computer Vision
Abbreviated title: ECCV 2020
Country: United Kingdom
City: Virtual, Glasgow
Period: 23/08/20 – 28/08/20

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
