Synthesizing human-like sketches from natural images using a conditional convolutional decoder

Moritz Daniel Kampelmühler; Axel Pinz

doi:10.1109/WACV45572.2020.9093440

Moritz Daniel Kampelmühler^*, Axel Pinz

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Humans are able to precisely communicate diverse concepts by employing sketches, a highly reduced and abstract shape based representation of visual content. We propose, for the first time, a fully convolutional end-to-end architecture that is able to synthesize human-like sketches of objects in natural images with potentially cluttered background. To enable an architecture to learn this highly abstract mapping, we employ the following key components: (1) a fully convolutional encoder-decoder structure, (2) a perceptual similarity loss function operating in an abstract feature space and (3) conditioning of the decoder on the label of the object that shall be sketched. Given the combination of these architectural concepts, we can train our structure in an end-to-end supervised fashion on a collection of sketch-image pairs. The generated sketches of our architecture can be classified with 85.6% Top-5 accuracy and we verify their visual quality via a user study. We find that deep features as a perceptual similarity metric enable image translation with large domain gaps and our findings further show that convolutional neural networks trained on image classification tasks implicitly learn to encode shape information.
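The abstract reports the recognizability of the generated sketches as a Top-5 accuracy. As a reading aid, the following is a minimal sketch of how such a metric is commonly computed, assuming a classifier that returns one score per class for each sketch. The function name `top_k_accuracy` and the toy scores and labels are illustrative assumptions, not the authors' evaluation code or data.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical classifier scores for four sketches over seven classes.
toy_scores = [
    [0.05, 0.40, 0.20, 0.15, 0.10, 0.07, 0.03],
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],
    [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.02],
    [0.02, 0.05, 0.08, 0.10, 0.20, 0.25, 0.30],
]
# Hypothetical ground-truth class index for each sketch.
toy_labels = [1, 6, 4, 0]

top5 = top_k_accuracy(toy_scores, toy_labels, k=5)
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
```

Under this definition, a sketch counts as correctly classified whenever its true class appears anywhere among the five highest-scoring predictions, so Top-5 accuracy is never lower than Top-1 accuracy on the same data.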

Original language	English
Title of host publication	Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
Pages	3192-3200
Number of pages	9
ISBN (Electronic)	9781728165530
DOIs	https://doi.org/10.1109/WACV45572.2020.9093440
Publication status	Published - Mar 2020
Event	wacv2020: WACV 2020 - Snowmass Village, United States Duration: 1 Mar 2020 → 5 Mar 2020

Publication series

Name	Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020

Conference

Conference	wacv2020
Abbreviated title	WACV 2020
Country/Territory	United States
City	Snowmass Village
Period	1/03/20 → 5/03/20

ASJC Scopus subject areas

Computer Vision and Pattern Recognition
Computer Science Applications

Access to Document

10.1109/WACV45572.2020.9093440

http://openaccess.thecvf.com/content_WACV_2020/html/Kampelmuhler_Synthesizing_human-like_sketches_from_natural_images_using_a_conditional_convolutional_WACV_2020_paper.html

Cite this

Kampelmühler, M. D., & Pinz, A. (2020). Synthesizing human-like sketches from natural images using a conditional convolutional decoder. In Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020 (pp. 3192-3200). Article 9093440 (Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020). https://doi.org/10.1109/WACV45572.2020.9093440

Synthesizing human-like sketches from natural images using a conditional convolutional decoder. / Kampelmühler, Moritz Daniel; Pinz, Axel.
Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020. 2020. p. 3192-3200 9093440 (Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020).

Kampelmühler, MD & Pinz, A 2020, Synthesizing human-like sketches from natural images using a conditional convolutional decoder. in Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020., 9093440, Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020, pp. 3192-3200, wacv2020, Snowmass Village, Colorado, United States, 1/03/20. https://doi.org/10.1109/WACV45572.2020.9093440

@inproceedings{072530fd58f5496fab27de962f0163a5,

title = "Synthesizing human-like sketches from natural images using a conditional convolutional decoder",

abstract = "Humans are able to precisely communicate diverse concepts by employing sketches, a highly reduced and abstract shape based representation of visual content. We propose, for the first time, a fully convolutional end-to-end architecture that is able to synthesize human-like sketches of objects in natural images with potentially cluttered background. To enable an architecture to learn this highly abstract mapping, we employ the following key components: (1) a fully convolutional encoder-decoder structure, (2) a perceptual similarity loss function operating in an abstract feature space and (3) conditioning of the decoder on the label of the object that shall be sketched. Given the combination of these architectural concepts, we can train our structure in an end-to-end supervised fashion on a collection of sketch-image pairs. The generated sketches of our architecture can be classified with 85.6% Top-5 accuracy and we verify their visual quality via a user study. We find that deep features as a perceptual similarity metric enable image translation with large domain gaps and our findings further show that convolutional neural networks trained on image classification tasks implicitly learn to encode shape information.",

author = "Kampelm{\"u}hler, Moritz Daniel and Pinz, Axel",

year = "2020",

month = mar,

doi = "10.1109/WACV45572.2020.9093440",

language = "English",

series = "Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020",

pages = "3192--3200",

booktitle = "Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020",

note = "WACV 2020; Conference date: 01-03-2020 through 05-03-2020",

}

TY - GEN

T1 - Synthesizing human-like sketches from natural images using a conditional convolutional decoder

AU - Kampelmühler, Moritz Daniel

AU - Pinz, Axel

PY - 2020/3

Y1 - 2020/3

N2 - Humans are able to precisely communicate diverse concepts by employing sketches, a highly reduced and abstract shape based representation of visual content. We propose, for the first time, a fully convolutional end-to-end architecture that is able to synthesize human-like sketches of objects in natural images with potentially cluttered background. To enable an architecture to learn this highly abstract mapping, we employ the following key components: (1) a fully convolutional encoder-decoder structure, (2) a perceptual similarity loss function operating in an abstract feature space and (3) conditioning of the decoder on the label of the object that shall be sketched. Given the combination of these architectural concepts, we can train our structure in an end-to-end supervised fashion on a collection of sketch-image pairs. The generated sketches of our architecture can be classified with 85.6% Top-5 accuracy and we verify their visual quality via a user study. We find that deep features as a perceptual similarity metric enable image translation with large domain gaps and our findings further show that convolutional neural networks trained on image classification tasks implicitly learn to encode shape information.

AB - Humans are able to precisely communicate diverse concepts by employing sketches, a highly reduced and abstract shape based representation of visual content. We propose, for the first time, a fully convolutional end-to-end architecture that is able to synthesize human-like sketches of objects in natural images with potentially cluttered background. To enable an architecture to learn this highly abstract mapping, we employ the following key components: (1) a fully convolutional encoder-decoder structure, (2) a perceptual similarity loss function operating in an abstract feature space and (3) conditioning of the decoder on the label of the object that shall be sketched. Given the combination of these architectural concepts, we can train our structure in an end-to-end supervised fashion on a collection of sketch-image pairs. The generated sketches of our architecture can be classified with 85.6% Top-5 accuracy and we verify their visual quality via a user study. We find that deep features as a perceptual similarity metric enable image translation with large domain gaps and our findings further show that convolutional neural networks trained on image classification tasks implicitly learn to encode shape information.

UR - http://www.scopus.com/inward/record.url?scp=85085482934&partnerID=8YFLogxK

U2 - 10.1109/WACV45572.2020.9093440

DO - 10.1109/WACV45572.2020.9093440

M3 - Conference paper

T3 - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020

SP - 3192

EP - 3200

BT - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020

T2 - wacv2020

Y2 - 1 March 2020 through 5 March 2020

ER -