Combining Descriptive and Discriminative Information for Person Re-Identification

Martin Hirzer

Translated title: Kombination von Deskriptiven und Diskriminativen Informationen zur Personenwiedererkennung

Institut für Maschinelles Sehen und Darstellen (7100)

Publication: Thesis › Doctoral Thesis

Abstract

Re-identifying people across camera networks is one of the core tasks of many visual surveillance systems. Starting from a sighting of a sought person in one camera image, all appearances of the same person in the other cameras of the network should be found as quickly as possible. This is an extremely difficult task for humans and automated systems alike, because the image of a person can differ greatly between two cameras, for example due to changes in viewpoint, pose, and illumination. Many existing systems for automated person search therefore either adopt a descriptive strategy, i.e., try to generate a robust, holistic description of a person, or pursue a discriminative approach in order to extract specific details of a particular person. Since these two complementary directions can capture quite different aspects of a person image, we propose in this thesis to use both for person search.
To achieve this, we first present an application-oriented approach that unites both strategies in a single system. When a person is selected for search, we begin with a fast, descriptive search procedure in which various visual features are captured by a covariance descriptor. This enables our system to present a first search result to the user very quickly. If necessary, this result can then be refined further in a second step. For this we use a boosting-based, discriminative search procedure. For the overall system, this two-stage procedure means that we can exploit both the lower computation time of the descriptive model and the higher accuracy of the discriminative model.
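As a rough illustration of the covariance description used in the descriptive stage, the following minimal sketch computes the covariance matrix of per-pixel feature vectors over a rectangular image region. The feature set (x, y, intensity), the function names, and the toy image are assumptions made for this example; the abstract does not state which features or regions the thesis actually uses.

```python
def pixel_features(image, top, left, height, width):
    """Collect one (x, y, intensity) vector per pixel of a rectangular region.

    `image` is a grayscale image given as a list of rows. The choice of
    features is an assumption for illustration only.
    """
    return [
        (float(x), float(y), float(image[y][x]))
        for y in range(top, top + height)
        for x in range(left, left + width)
    ]


def region_covariance(features):
    """Return the d x d sample covariance matrix of d-dimensional vectors."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    d = len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        diff = [f[k] - mean[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    # Unbiased estimate: divide by n - 1.
    return [[c / (n - 1) for c in row] for row in cov]


# Toy 3 x 3 image whose intensity grows linearly with x and y.
image = [
    [10, 20, 30],
    [20, 30, 40],
    [30, 40, 50],
]
descriptor = region_covariance(pixel_features(image, 0, 0, 3, 3))
```

The descriptor has a fixed size (d x d) regardless of how many pixels the region contains, which is one reason such a description is cheap to store and compare. How two covariance matrices are compared is left out of this sketch.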
In the second part of this thesis we deal with various metric learning methods, a relatively new research area in visual person re-identification. Although metric learning methods allow a very elegant and mathematically principled combination of descriptive and discriminative techniques, most existing approaches are not adapted to the specific challenges that arise in person search and, in addition, require substantial computing power. To remove these limitations and thus increase the practical applicability of the learning methods, we investigate in this work methods for learning metrics that are not only much more efficient but also more robust than existing approaches.
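To make the idea of a learned metric concrete, here is a minimal sketch of how a gallery could be ranked once a metric is available. It assumes the metric takes the common Mahalanobis form d(x, y) = (x - y)^T M (x - y); the abstract does not specify this form. The matrix `M`, the feature vectors, and the function names are hypothetical, and `M` is fixed by hand here rather than learned.

```python
def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) for a square matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    return sum(diff[i] * M[i][j] * diff[j] for i in range(d) for j in range(d))


def rank_gallery(query, gallery, M):
    """Return gallery keys sorted from most to least similar to the query."""
    return sorted(gallery, key=lambda name: mahalanobis_sq(query, gallery[name], M))


# Hypothetical 2-D descriptors; the second dimension is assumed to be noisy
# across cameras, so the hand-set metric down-weights it.
query = [1.0, 5.0]
gallery = {"a": [1.2, 9.0], "b": [2.5, 5.0], "c": [1.0, 5.5]}

identity = [[1.0, 0.0], [0.0, 1.0]]   # plain squared Euclidean distance
M = [[1.0, 0.0], [0.0, 0.01]]         # stand-in for a learned metric

euclidean_ranking = rank_gallery(query, gallery, identity)
metric_ranking = rank_gallery(query, gallery, M)
```

With the identity matrix, entry "a" is ranked last because of its large difference in the second dimension. Under the hand-set `M` that difference counts far less and "a" moves ahead of "b". A metric learning method would estimate `M` from labeled image pairs instead of fixing it manually.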
In the final part we demonstrate the benefits of our combined strategy on several publicly available person datasets of varying complexity. The results show that the complementary aspects described by descriptive and discriminative models can be combined to great advantage. This holds in particular for metric learning methods, which not only achieve excellent results but are also many times more efficient than other approaches in the field of visual person re-identification.

Translated title	Kombination von Deskriptiven und Diskriminativen Informationen zur Personenwiedererkennung
Original language	English
Publication status	Published - 2014

Fields of Expertise

Information, Communication & Computing

Cite this

@phdthesis{351a09843c4b4767a046f6b888d2c814,

title = "Combining Descriptive and Discriminative Information for Person Re-Identification",

abstract = "A central task in many visual surveillance scenarios is person re-identification, i.e., recognizing an individual person across a network of spatially disjoint cameras. This is a very hard task for human operators and even harder for automated systems due to several challenges such as changes in viewpoint, pose, and illumination. To cope with these difficulties, most existing methods either try to find a suitable description of a person{\textquoteright}s appearance or learn a discriminative model. Since these different representational strategies capture a large extent of complementary information, in this thesis, we propose to exploit both directions. In particular, we first introduce an application-focused approach of integrating a descriptive and a discriminative person model into a single system. Given a specific query person, we initially run a fast, descriptive stage, where appearance is captured by a set of region covariance descriptors. This allows us to quickly provide a preliminary search result to a human operator. In a second stage, the operator can then refine the thus obtained result by applying a discriminatively learned person model, which is based on boosting for feature selection. In this way, we can take advantage of both, the time efficiency of the descriptive as well as the improved accuracy of the discriminative model. The second part of this thesis is devoted to metric learning, a relatively new direction in the field of person re-identification. Although it provides a very elegant and mathematically principled fusion of descriptive and discriminative techniques, most existing metric learning approaches are not adapted to the task at hand and additionally suffer from high computational costs.
Hence, in our work, we address these shortcomings and develop methods that are not only much more efficient, but also less prone to over-fitting, thus, enhancing their practical applicability in realistic, large-scale camera networks. In order to demonstrate the benefits of our combined strategy, we present results on several publicly available benchmark datasets of different complexity. We show that having two complementary information cues capturing diverse aspects of a person{\textquoteright}s appearance is advantageous for the given problem, and that metric learning can achieve state-of-the-art or even better performance, however, requiring much less computational power compared to many other person re-identification approaches.",

author = "Martin Hirzer",

year = "2014",

language = "English",

}

TY - BOOK

T1 - Combining Descriptive and Discriminative Information for Person Re-Identification

AU - Hirzer, Martin

PY - 2014

Y1 - 2014

N2 - A central task in many visual surveillance scenarios is person re-identification, i.e., recognizing an individual person across a network of spatially disjoint cameras. This is a very hard task for human operators and even harder for automated systems due to several challenges such as changes in viewpoint, pose, and illumination. To cope with these difficulties, most existing methods either try to find a suitable description of a person’s appearance or learn a discriminative model. Since these different representational strategies capture a large extent of complementary information, in this thesis, we propose to exploit both directions. In particular, we first introduce an application-focused approach of integrating a descriptive and a discriminative person model into a single system. Given a specific query person, we initially run a fast, descriptive stage, where appearance is captured by a set of region covariance descriptors. This allows us to quickly provide a preliminary search result to a human operator. In a second stage, the operator can then refine the thus obtained result by applying a discriminatively learned person model, which is based on boosting for feature selection. In this way, we can take advantage of both, the time efficiency of the descriptive as well as the improved accuracy of the discriminative model. The second part of this thesis is devoted to metric learning, a relatively new direction in the field of person re-identification. Although it provides a very elegant and mathematically principled fusion of descriptive and discriminative techniques, most existing metric learning approaches are not adapted to the task at hand and additionally suffer from high computational costs.
Hence, in our work, we address these shortcomings and develop methods that are not only much more efficient, but also less prone to over-fitting, thus, enhancing their practical applicability in realistic, large-scale camera networks. In order to demonstrate the benefits of our combined strategy, we present results on several publicly available benchmark datasets of different complexity. We show that having two complementary information cues capturing diverse aspects of a person’s appearance is advantageous for the given problem, and that metric learning can achieve state-of-the-art or even better performance, however, requiring much less computational power compared to many other person re-identification approaches.

M3 - Doctoral Thesis

ER -