Combining Descriptive and Discriminative Information for Person Re-Identification

Martin Hirzer

Research output: Thesis / Doctoral Thesis / Research

Abstract

A central task in many visual surveillance scenarios is person re-identification, i.e., recognizing an individual person across a network of spatially disjoint cameras. This is a very hard task for human operators and even harder for automated systems due to challenges such as changes in viewpoint, pose, and illumination. To cope with these difficulties, most existing methods either try to find a suitable description of a person’s appearance or learn a discriminative model. Since these two representational strategies capture largely complementary information, in this thesis we propose to exploit both directions.
In particular, we first introduce an application-focused approach that integrates a descriptive and a discriminative person model into a single system. Given a specific query person, we initially run a fast, descriptive stage, in which appearance is captured by a set of region covariance descriptors. This allows us to quickly provide a preliminary search result to a human operator. In a second stage, the operator can then refine this result by applying a discriminatively learned person model, which is based on boosting for feature selection. In this way, we can take advantage of both the time efficiency of the descriptive model and the improved accuracy of the discriminative model.
The second part of this thesis is devoted to metric learning, a relatively new direction in the field of person re-identification. Although it provides a very elegant and mathematically principled fusion of descriptive and discriminative techniques, most existing metric learning approaches are not adapted to the task at hand and additionally suffer from high computational costs. Hence, in our work, we address these shortcomings and develop methods that are not only much more efficient but also less prone to over-fitting, thus enhancing their practical applicability in realistic, large-scale camera networks.
To demonstrate the benefits of our combined strategy, we present results on several publicly available benchmark datasets of different complexity. We show that having two complementary information cues capturing diverse aspects of a person’s appearance is advantageous for the given problem, and that metric learning can achieve state-of-the-art or even better performance while requiring much less computational power than many other person re-identification approaches.
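
The region covariance descriptor mentioned in the abstract can be illustrated with a minimal sketch: each pixel in a region is mapped to a feature vector, and the region is summarized by the covariance matrix of those vectors. The feature choice below (x, y, intensity), the function name `region_covariance`, and the toy image are assumptions made for illustration only; the abstract does not state which features or regions the thesis uses.

```python
# Illustrative sketch only. The per-pixel features (x, y, intensity) are an
# assumption; the abstract does not specify the feature set used in the thesis.

def region_covariance(image, top, left, height, width):
    """Return the sample covariance matrix of per-pixel features in a region.

    `image` is a list of rows of grayscale intensities.
    """
    features = [
        (float(x), float(y), float(image[y][x]))
        for y in range(top, top + height)
        for x in range(left, left + width)
    ]
    n = len(features)
    if n < 2:
        raise ValueError("region must contain at least two pixels")
    d = len(features[0])
    # Mean of each feature dimension over the region.
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    # Accumulate outer products of the mean-centered feature vectors.
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (f[i] - mean[i]) * (f[j] - mean[j])
    # Unbiased estimate: divide by n - 1.
    return [[c / (n - 1) for c in row] for row in cov]


# Hypothetical 3x3 grayscale patch whose intensity grows with x and y.
image = [
    [10, 20, 30],
    [20, 30, 40],
    [30, 40, 50],
]
descriptor = region_covariance(image, top=0, left=0, height=3, width=3)
```

The resulting matrix is small and has a fixed size regardless of the region's dimensions, which is one reason such descriptors are cheap to compute and compare.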
Translated title of the contribution: Kombination von Deskriptiven und Diskriminativen Informationen zur Personenwiedererkennung
Original language: English
Publication status: Published - 2014

Fields of Expertise

  • Information, Communication & Computing

Cite this

Combining Descriptive and Discriminative Information for Person Re-Identification. / Hirzer, Martin.

2014. 139 p.

@phdthesis{351a09843c4b4767a046f6b888d2c814,
title = "Combining Descriptive and Discriminative Information for Person Re-Identification",
abstract = "A central task in many visual surveillance scenarios is person re-identification, i.e., recognizing an individual person across a network of spatially disjoint cameras. This is a very hard task for human operators and even harder for automated systems due to challenges such as changes in viewpoint, pose, and illumination. To cope with these difficulties, most existing methods either try to find a suitable description of a person’s appearance or learn a discriminative model. Since these two representational strategies capture largely complementary information, in this thesis we propose to exploit both directions. In particular, we first introduce an application-focused approach that integrates a descriptive and a discriminative person model into a single system. Given a specific query person, we initially run a fast, descriptive stage, in which appearance is captured by a set of region covariance descriptors. This allows us to quickly provide a preliminary search result to a human operator. In a second stage, the operator can then refine this result by applying a discriminatively learned person model, which is based on boosting for feature selection. In this way, we can take advantage of both the time efficiency of the descriptive model and the improved accuracy of the discriminative model. The second part of this thesis is devoted to metric learning, a relatively new direction in the field of person re-identification. Although it provides a very elegant and mathematically principled fusion of descriptive and discriminative techniques, most existing metric learning approaches are not adapted to the task at hand and additionally suffer from high computational costs. Hence, in our work, we address these shortcomings and develop methods that are not only much more efficient but also less prone to over-fitting, thus enhancing their practical applicability in realistic, large-scale camera networks. To demonstrate the benefits of our combined strategy, we present results on several publicly available benchmark datasets of different complexity. We show that having two complementary information cues capturing diverse aspects of a person’s appearance is advantageous for the given problem, and that metric learning can achieve state-of-the-art or even better performance while requiring much less computational power than many other person re-identification approaches.",
author = "Martin Hirzer",
year = "2014",
language = "English",

}

TY - THES

T1 - Combining Descriptive and Discriminative Information for Person Re-Identification

AU - Hirzer, Martin

PY - 2014

Y1 - 2014

N2 - A central task in many visual surveillance scenarios is person re-identification, i.e., recognizing an individual person across a network of spatially disjoint cameras. This is a very hard task for human operators and even harder for automated systems due to challenges such as changes in viewpoint, pose, and illumination. To cope with these difficulties, most existing methods either try to find a suitable description of a person’s appearance or learn a discriminative model. Since these two representational strategies capture largely complementary information, in this thesis we propose to exploit both directions. In particular, we first introduce an application-focused approach that integrates a descriptive and a discriminative person model into a single system. Given a specific query person, we initially run a fast, descriptive stage, in which appearance is captured by a set of region covariance descriptors. This allows us to quickly provide a preliminary search result to a human operator. In a second stage, the operator can then refine this result by applying a discriminatively learned person model, which is based on boosting for feature selection. In this way, we can take advantage of both the time efficiency of the descriptive model and the improved accuracy of the discriminative model. The second part of this thesis is devoted to metric learning, a relatively new direction in the field of person re-identification. Although it provides a very elegant and mathematically principled fusion of descriptive and discriminative techniques, most existing metric learning approaches are not adapted to the task at hand and additionally suffer from high computational costs. Hence, in our work, we address these shortcomings and develop methods that are not only much more efficient but also less prone to over-fitting, thus enhancing their practical applicability in realistic, large-scale camera networks. To demonstrate the benefits of our combined strategy, we present results on several publicly available benchmark datasets of different complexity. We show that having two complementary information cues capturing diverse aspects of a person’s appearance is advantageous for the given problem, and that metric learning can achieve state-of-the-art or even better performance while requiring much less computational power than many other person re-identification approaches.

M3 - Doctoral Thesis

ER -