Real-time Gesture Animation Generation from Speech for Virtual Human Interaction

Manuel Rebol, Christian Gütl, Krzysztof Pietroszek

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

We propose a real-time system for synthesizing gestures directly from speech. Our data-driven approach models the speech-gesture relationship with generative adversarial networks (GANs). We exploit the large amount of speaker video data available online to train our 3D gesture model. The model generates speaker-specific gestures from consecutive two-second chunks of audio input, and we animate the predicted gestures on a virtual avatar. We achieve a delay of under three seconds between audio input and gesture animation.
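As a purely illustrative aid (not the authors' implementation), the sketch below shows how such a pipeline could be wired up in Python with PyTorch: consecutive two-second audio chunks are fed to a generator network that predicts a short sequence of 3D joint positions, which are then handed to an avatar animation callback. The network architecture, the names GestureGenerator and animate_avatar, and constants such as FRAMES_PER_CHUNK and NUM_JOINTS are assumptions made for this sketch; the paper's actual GAN model and training details are in the publication itself.

# Minimal sketch (not the authors' code): a streaming loop that feeds
# consecutive two-second audio chunks to a speech-to-gesture generator
# and hands the predicted poses to an avatar animation callback.
# GestureGenerator, animate_avatar, FRAMES_PER_CHUNK, and NUM_JOINTS
# are illustrative assumptions, not APIs from the paper.

import numpy as np
import torch
import torch.nn as nn

SAMPLE_RATE = 16_000          # assumed audio sampling rate
CHUNK_SECONDS = 2             # chunk length used in the paper
FRAMES_PER_CHUNK = 30         # assumed pose frames predicted per chunk
NUM_JOINTS = 15               # assumed upper-body joint count

class GestureGenerator(nn.Module):
    """Toy generator: raw audio chunk -> sequence of 3D joint positions."""
    def __init__(self):
        super().__init__()
        # Convolutional audio encoder followed by a recurrent pose decoder.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=64, stride=16), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool1d(FRAMES_PER_CHUNK),
        )
        self.decoder = nn.GRU(64, 128, batch_first=True)
        self.head = nn.Linear(128, NUM_JOINTS * 3)

    def forward(self, audio):                      # audio: (B, samples)
        feats = self.encoder(audio.unsqueeze(1))   # (B, 64, frames)
        seq, _ = self.decoder(feats.transpose(1, 2))
        poses = self.head(seq)                     # (B, frames, joints*3)
        return poses.reshape(-1, FRAMES_PER_CHUNK, NUM_JOINTS, 3)

def animate_avatar(poses):
    """Placeholder for the engine-side avatar animation step."""
    print("animating", tuple(poses.shape), "pose frames")

generator = GestureGenerator().eval()
with torch.no_grad():
    for _ in range(3):                             # stand-in for a live mic feed
        chunk = torch.from_numpy(
            np.random.randn(SAMPLE_RATE * CHUNK_SECONDS).astype(np.float32))
        poses = generator(chunk.unsqueeze(0))      # (1, frames, joints, 3)
        animate_avatar(poses[0])

In a real deployment the random chunks would come from a microphone buffer, and keeping the per-chunk inference time below the two-second chunk duration is what makes the reported sub-three-second end-to-end delay plausible.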

Original language: English
Title of host publication: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021
Publisher: Association for Computing Machinery
Pages: 327-344
ISBN (Electronic): 9781450380959
DOIs
Publication status: Published - 8 May 2021
Event: 2021 CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths, CHI 2021 - Virtual, Online, Japan
Duration: 8 May 2021 – 13 May 2021

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 2021 CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths
Country/Territory: Japan
City: Virtual, Online
Period: 8/05/21 – 13/05/21

Keywords

  • Animation
  • Gestures
  • NUI

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Graphics and Computer-Aided Design
  • Software

Fields of Expertise

  • Information, Communication & Computing
