Evaluating Spatial Configuration Constrained CNNs for Localizing Facial and Body Pose Landmarks

Christian Payer, Darko Stern, Martin Urschler

Research output: Contribution to conference, Paper, Research, peer-review

Abstract

Landmark localization is a widely used task in medical image analysis and computer vision. We recently proposed a CNN architecture, formulated in a heatmap regression framework, that learns on its own to split the localization task into two simpler sub-problems: one component produces locally accurate but ambiguous predictions, while the other improves robustness by incorporating the spatial configuration of landmarks to remove ambiguities. Our SpatialConfiguration-Net (SCN) learns this simplification by multiplying the heatmap predictions of its two components and by training the network in an end-to-end manner, thus achieving regularization similar to, e.g., a hand-crafted Markov Random Field model. While we have previously shown localization results solely on data from 2D and 3D medical imaging modalities, in this work we study how well the SCN generalizes to computer vision problems. To this end, we evaluate accuracy and robustness on a facial alignment task, where we improve upon state-of-the-art methods, and on a human body pose estimation task, where our results are in line with the recent state of the art.
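
The multiplication of the two component heatmaps can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: in the SCN both heatmaps are predicted by learned CNN components, whereas here they are synthesized from Gaussians purely for illustration. The function names (`gaussian_heatmap`, `combine_heatmaps`, `peak`), the image size, the landmark coordinates, and the sigma values are all assumptions.

```python
import numpy as np


def gaussian_heatmap(shape, center, sigma):
    """Return a 2D Gaussian blob with peak value 1 at `center` (row, col)."""
    rows, cols = np.indices(shape)
    sq_dist = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def combine_heatmaps(local, spatial):
    """Element-wise product of the two component heatmaps."""
    return local * spatial


def peak(heatmap):
    """Coordinates (row, col) of the maximum heatmap response."""
    idx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return tuple(int(i) for i in idx)


# Hypothetical setup: one true landmark and one similar-looking structure.
shape = (64, 64)
true_landmark = (20, 15)
lookalike = (20, 48)

# Stand-in for the locally accurate but ambiguous component: two sharp
# responses, with the wrong one slightly stronger.
local = (0.9 * gaussian_heatmap(shape, true_landmark, sigma=1.5)
         + 1.0 * gaussian_heatmap(shape, lookalike, sigma=1.5))

# Stand-in for the spatial configuration component: a blurry response that
# is a few pixels off, but lies in the correct region of the image.
spatial = gaussian_heatmap(shape, (23, 18), sigma=8.0)

combined = combine_heatmaps(local, spatial)

print("local peak:   ", peak(local))
print("spatial peak: ", peak(spatial))
print("combined peak:", peak(combined))
```

In this toy setup the sharp heatmap alone selects the similar-looking structure and the blurry heatmap alone is imprecise, while their product keeps the sharp response only where the coarse one is also high. How the two components are actually trained end-to-end is not covered by this sketch.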
Original language: English
Publication status: Published - 2019
Event: 2019 International Conference on Image and Vision Computing New Zealand (IVCNZ)
Duration: 2 Dec 2019 to 4 Dec 2019

Fingerprint

Computer vision
Medical imaging
Image analysis

Cite this

Payer, C., Stern, D., & Urschler, M. (2019). Evaluating Spatial Configuration Constrained CNNs for Localizing Facial and Body Pose Landmarks. Paper presented at 2019 International Conference on Image and Vision Computing New Zealand (IVCNZ).

@conference{54f286b3dc5c4dc6976f695be8ccc003,
title = "Evaluating Spatial Configuration Constrained CNNs for Localizing Facial and Body Pose Landmarks",
abstract = "Landmark localization is a widely used task required in medical image analysis and computer vision applications. Formulated in a heatmap regression framework, we have recently proposed a CNN architecture that learns on its own to split the localization task into two simpler sub-problems, dedicating one component to locally accurate but ambiguous predictions, while the other component improves robustness by incorporating the spatial configuration of landmarks to remove ambiguities. We learn this simplification in our SpatialConfiguration-Net (SCN) by multiplying the heatmap predictions of its two components and by training the network in an end-to-end manner, thus achieving regularization similar to e.g. a hand-crafted Markov Random Field model. While we have previously shown localization results solely on data from 2D and 3D medical imaging modalities, in this work our aim is to study the generalization capabilities of our SpatialConfiguration-Net to computer vision problems. Therefore, we evaluate our performance both in terms of accuracy and robustness on a facial alignment task, where we improve upon the state-of-the-art methods, as well as on a human body pose estimation task, where we demonstrate results in line with the recent state-of-the-art.",
author = "Christian Payer and Darko Stern and Martin Urschler",
year = "2019",
language = "English",
note = "2019 International Conference on Image and Vision Computing New Zealand (IVCNZ) ; Conference date: 02-12-2019 Through 04-12-2019",

}

TY - CONF

T1 - Evaluating Spatial Configuration Constrained CNNs for Localizing Facial and Body Pose Landmarks

AU - Payer, Christian

AU - Stern, Darko

AU - Urschler, Martin

PY - 2019

Y1 - 2019

AB - Landmark localization is a widely used task required in medical image analysis and computer vision applications. Formulated in a heatmap regression framework, we have recently proposed a CNN architecture that learns on its own to split the localization task into two simpler sub-problems, dedicating one component to locally accurate but ambiguous predictions, while the other component improves robustness by incorporating the spatial configuration of landmarks to remove ambiguities. We learn this simplification in our SpatialConfiguration-Net (SCN) by multiplying the heatmap predictions of its two components and by training the network in an end-to-end manner, thus achieving regularization similar to e.g. a hand-crafted Markov Random Field model. While we have previously shown localization results solely on data from 2D and 3D medical imaging modalities, in this work our aim is to study the generalization capabilities of our SpatialConfiguration-Net to computer vision problems. Therefore, we evaluate our performance both in terms of accuracy and robustness on a facial alignment task, where we improve upon the state-of-the-art methods, as well as on a human body pose estimation task, where we demonstrate results in line with the recent state-of-the-art.

M3 - Paper

ER -