Synthesizing human-like sketches from natural images using a conditional convolutional decoder

Moritz Daniel Kampelmühler; Axel Pinz

doi:10.1109/WACV45572.2020.9093440

Moritz Daniel Kampelmühler*, Axel Pinz

*Corresponding author for this work

Publication: Contribution to book/report/conference proceedings › Conference paper › Peer-reviewed

Abstract

Humans are able to precisely communicate diverse concepts by employing sketches, a highly reduced and abstract shape based representation of visual content. We propose, for the first time, a fully convolutional end-to-end architecture that is able to synthesize human-like sketches of objects in natural images with potentially cluttered background. To enable an architecture to learn this highly abstract mapping, we employ the following key components: (1) a fully convolutional encoder-decoder structure, (2) a perceptual similarity loss function operating in an abstract feature space and (3) conditioning of the decoder on the label of the object that shall be sketched. Given the combination of these architectural concepts, we can train our structure in an end-to-end supervised fashion on a collection of sketch-image pairs. The generated sketches of our architecture can be classified with 85.6% Top-5 accuracy and we verify their visual quality via a user study. We find that deep features as a perceptual similarity metric enable image translation with large domain gaps and our findings further show that convolutional neural networks trained on image classification tasks implicitly learn to encode shape information.
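The abstract reports a Top-5 classification accuracy for the generated sketches. As a minimal sketch of how such a figure is commonly computed, assuming the usual definition (a sample counts as correct when its true label is among the five highest-scoring classes), the following Python function may help. The function name `top_k_accuracy` and the toy scores are hypothetical illustrations and are not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Assumption: `scores[i][c]` is a classifier's score for class `c` on sample `i`,
    and `labels[i]` is the index of the true class of sample `i`.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, cut to the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Hypothetical toy data: three samples, three classes.
toy_scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]
toy_labels = [0, 2, 0]

acc_top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
acc_top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

On the toy data, only the first sample is correct at k=1, while the first two are correct at k=2, which illustrates why Top-5 accuracy is always at least as high as Top-1 accuracy.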

Original language	English
Title of host publication	Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
Pages	3192-3200
Number of pages	9
ISBN (electronic)	9781728165530
DOIs	https://doi.org/10.1109/WACV45572.2020.9093440
Publication status	Published - March 2020
Event	2020 IEEE/CVF Winter Conference on Applications of Computer Vision: WACV 2020 - Snowmass Village, United States; duration: 1 March 2020 → 5 March 2020

Publication series

Name	Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020

Conference

Conference	2020 IEEE/CVF Winter Conference on Applications of Computer Vision
Abbreviated title	WACV 2020
Country/Territory	United States
City	Snowmass Village
Period	1 March 2020 → 5 March 2020

ASJC Scopus subject areas

Computer Vision and Pattern Recognition
Computer Science Applications

Access to document

10.1109/WACV45572.2020.9093440

http://openaccess.thecvf.com/content_WACV_2020/html/Kampelmuhler_Synthesizing_human-like_sketches_from_natural_images_using_a_conditional_convolutional_WACV_2020_paper.html

Other files and links

http://www.scopus.com/inward/record.url?scp=85085482934&partnerID=8YFLogxK

Cite this

Kampelmühler, M. D., & Pinz, A. (2020). Synthesizing human-like sketches from natural images using a conditional convolutional decoder. In Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020 (pp. 3192-3200). Article 9093440 (Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020). https://doi.org/10.1109/WACV45572.2020.9093440

Synthesizing human-like sketches from natural images using a conditional convolutional decoder. / Kampelmühler, Moritz Daniel; Pinz, Axel.
Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020. 2020. pp. 3192-3200, 9093440 (Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020).

Kampelmühler, MD & Pinz, A 2020, Synthesizing human-like sketches from natural images using a conditional convolutional decoder. in Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020., 9093440, Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020, pp. 3192-3200, 2020 IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, Colorado, United States, 1/03/20. https://doi.org/10.1109/WACV45572.2020.9093440

@inproceedings{072530fd58f5496fab27de962f0163a5,

title = "Synthesizing human-like sketches from natural images using a conditional convolutional decoder",

abstract = "Humans are able to precisely communicate diverse concepts by employing sketches, a highly reduced and abstract shape based representation of visual content. We propose, for the first time, a fully convolutional end-to-end architecture that is able to synthesize human-like sketches of objects in natural images with potentially cluttered background. To enable an architecture to learn this highly abstract mapping, we employ the following key components: (1) a fully convolutional encoder-decoder structure, (2) a perceptual similarity loss function operating in an abstract feature space and (3) conditioning of the decoder on the label of the object that shall be sketched. Given the combination of these architectural concepts, we can train our structure in an end-to-end supervised fashion on a collection of sketch-image pairs. The generated sketches of our architecture can be classified with 85.6% Top-5 accuracy and we verify their visual quality via a user study. We find that deep features as a perceptual similarity metric enable image translation with large domain gaps and our findings further show that convolutional neural networks trained on image classification tasks implicitly learn to encode shape information.",

author = "Kampelm{\"u}hler, {Moritz Daniel} and Axel Pinz",

year = "2020",

month = mar,

doi = "10.1109/WACV45572.2020.9093440",

language = "English",

series = "Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020",

pages = "3192--3200",

booktitle = "Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020",

note = "wacv2020 : WACV 2020, WACV 2020 ; Conference date: 01-03-2020 Through 05-03-2020",

}

TY - GEN

T1 - Synthesizing human-like sketches from natural images using a conditional convolutional decoder

AU - Kampelmühler, Moritz Daniel

AU - Pinz, Axel

PY - 2020/3

Y1 - 2020/3

N2 - Humans are able to precisely communicate diverse concepts by employing sketches, a highly reduced and abstract shape based representation of visual content. We propose, for the first time, a fully convolutional end-to-end architecture that is able to synthesize human-like sketches of objects in natural images with potentially cluttered background. To enable an architecture to learn this highly abstract mapping, we employ the following key components: (1) a fully convolutional encoder-decoder structure, (2) a perceptual similarity loss function operating in an abstract feature space and (3) conditioning of the decoder on the label of the object that shall be sketched. Given the combination of these architectural concepts, we can train our structure in an end-to-end supervised fashion on a collection of sketch-image pairs. The generated sketches of our architecture can be classified with 85.6% Top-5 accuracy and we verify their visual quality via a user study. We find that deep features as a perceptual similarity metric enable image translation with large domain gaps and our findings further show that convolutional neural networks trained on image classification tasks implicitly learn to encode shape information.

AB - Humans are able to precisely communicate diverse concepts by employing sketches, a highly reduced and abstract shape based representation of visual content. We propose, for the first time, a fully convolutional end-to-end architecture that is able to synthesize human-like sketches of objects in natural images with potentially cluttered background. To enable an architecture to learn this highly abstract mapping, we employ the following key components: (1) a fully convolutional encoder-decoder structure, (2) a perceptual similarity loss function operating in an abstract feature space and (3) conditioning of the decoder on the label of the object that shall be sketched. Given the combination of these architectural concepts, we can train our structure in an end-to-end supervised fashion on a collection of sketch-image pairs. The generated sketches of our architecture can be classified with 85.6% Top-5 accuracy and we verify their visual quality via a user study. We find that deep features as a perceptual similarity metric enable image translation with large domain gaps and our findings further show that convolutional neural networks trained on image classification tasks implicitly learn to encode shape information.

UR - http://www.scopus.com/inward/record.url?scp=85085482934&partnerID=8YFLogxK

U2 - 10.1109/WACV45572.2020.9093440

DO - 10.1109/WACV45572.2020.9093440

M3 - Conference paper

T3 - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020

SP - 3192

EP - 3200

BT - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020

T2 - wacv2020

Y2 - 1 March 2020 through 5 March 2020

ER -
