HOnnotate: A Method for 3D Annotation of Hand and Object Poses

Shreyas Hampali Shivakumar, Mahdi Rad, Markus Oberweger, Vincent Lepetit

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

We propose a method for annotating images of a hand manipulating an object with the 3D poses of both the hand and the object, together with a dataset created using this method. Our motivation is the current lack of annotated real images for this problem, as estimating the 3D poses is challenging, mostly because of the mutual occlusions between the hand and the object. To tackle this challenge, we capture sequences with one or several RGB-D cameras and jointly optimize the 3D hand and object poses over all the frames simultaneously. This method allows us to automatically annotate each frame with accurate estimates of the poses, despite large mutual occlusions. With this method, we created HO-3D, the first markerless dataset of color images with 3D annotations for both the hand and object. This dataset is currently made of 77,558 frames, 68 sequences, 10 persons, and 10 objects. Using our dataset, we develop a single RGB image-based method to predict the hand pose when interacting with objects under severe occlusions and show it generalizes to objects not seen in the dataset.
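As a rough sketch of the kind of multi-frame objective the abstract describes, the joint optimization over the per-frame hand poses h_t and object poses o_t can be written roughly as follows (the notation and the specific energy terms are illustrative assumptions, not the paper's exact formulation):

% Illustrative joint hand-object pose objective over a sequence of T frames
% captured by C RGB-D cameras; notation assumed for illustration only.
\min_{\{h_t\},\{o_t\}} \;
      \sum_{t=1}^{T} \sum_{c=1}^{C} E_{\text{data}}\big(D_{t,c},\, h_t,\, o_t\big)
  \;+\; \lambda_{\text{temp}} \sum_{t=2}^{T} \big( \lVert h_t - h_{t-1} \rVert^2 + \lVert o_t - o_{t-1} \rVert^2 \big)
  \;+\; \lambda_{\text{int}} \sum_{t=1}^{T} E_{\text{inter}}\big(h_t,\, o_t\big)

Here E_data measures how well the rendered hand and object models explain the RGB-D observation D_{t,c} from camera c at frame t, the temporal term favors smooth motion across frames, and E_inter penalizes interpenetration between hand and object; optimizing over all frames jointly is what lets frames with fewer occlusions constrain the heavily occluded ones.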

Original language: English
Title of host publication: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Pages: 3193-3203
Number of pages: 11
DOIs
Publication status: Published - Jun 2020
Event: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020) - Virtual, United States
Duration: 14 Jun 2020 - 19 Jun 2020

Conference

Conference: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Abbreviated title: CVPR 2020
Country/Territory: United States
City: Virtual
Period: 14/06/20 - 19/06/20

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
