Representing Objects in Video as Space-Time Volumes by Combining Top-Down and Bottom-Up Processes

Publication: Contribution to book/report/conference proceedings, conference paper, peer-reviewed

Abstract

As top-down approaches to object recognition in video become more powerful, a structured way to combine them with bottom-up grouping processes becomes feasible. When done right, the resulting representation can describe objects and their decomposition into parts at appropriate spatio-temporal scales. We propose a method that uses a modern object detector to focus on salient structures in video, and a dense optical flow estimator to supplement feature extraction. From these structures we extract space-time volumes of interest (STVIs) by smoothing in a spatio-temporal Gaussian scale space that guides bottom-up grouping. The resulting novel representation enables us to analyze and visualize the decomposition of an object into meaningful parts while preserving temporal object continuity. Our experimental validation is twofold. First, we achieve competitive results on a common video object segmentation benchmark. Second, we extend this benchmark with high-quality object part annotations, DAVIS Parts, on which we establish a strong baseline by showing that our method yields spatio-temporally meaningful object parts. Our new representation will support applications that require high-level space-time reasoning at the parts level.
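
The abstract does not give implementation details, so the following is only a minimal sketch of what smoothing in a spatio-temporal Gaussian scale space can look like. It assumes that per-frame detector responses have already been stacked into a T x H x W array, and it applies a separable Gaussian with one scale along time and another along the two image axes. The function names, the border handling, and the truncation radius are illustrative assumptions, not the authors' code.

```python
import numpy as np


def gaussian_kernel(sigma, truncate=3.0):
    """Normalized 1-D Gaussian kernel, cut off at about truncate * sigma."""
    radius = max(1, int(truncate * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_axis(volume, sigma, axis):
    """Convolve one axis of the volume with a Gaussian of scale sigma."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    # Replicate border values so the output keeps the input shape
    # (assumed border handling; other choices are possible).
    pad = [(0, 0)] * volume.ndim
    pad[axis] = (radius, radius)
    padded = np.pad(volume, pad, mode="edge")
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
    )


def smooth_space_time(volume, sigma_t, sigma_xy):
    """Separable Gaussian smoothing of a (T, H, W) volume.

    sigma_t is the temporal scale in frames, sigma_xy the spatial
    scale in pixels.
    """
    out = np.asarray(volume, dtype=float)
    if out.ndim != 3:
        raise ValueError("expected a (T, H, W) volume")
    for axis, sigma in enumerate((sigma_t, sigma_xy, sigma_xy)):
        out = smooth_axis(out, sigma, axis)
    return out


def scale_space(volume, scales):
    """Smooth the volume at each (sigma_t, sigma_xy) pair in scales."""
    return [smooth_space_time(volume, s_t, s_xy) for s_t, s_xy in scales]
```

Calling `scale_space(responses, [(1.0, 2.0), (2.0, 4.0)])` on a hypothetical `responses` array would return one smoothed volume per scale pair. How such volumes are then thresholded or grouped into STVIs is described in the paper itself and is not reproduced here.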

Original language: English
Title: Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
Pages: 1903-1911
Number of pages: 9
ISBN (electronic): 9781728165530
Publication status: Published - 1 March 2020
Event: 2020 IEEE/CVF Winter Conference on Applications of Computer Vision: WACV 2020 - Snowmass Village, United States
Duration: 1 March 2020 to 5 March 2020

Publication series

Name: Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020

Conference

Conference: 2020 IEEE/CVF Winter Conference on Applications of Computer Vision
Short title: WACV 2020
Country/Territory: United States
Location: Snowmass Village
Period: 1/03/20 to 5/03/20

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Computer Science Applications
