Abstract
This paper presents Dynamically Pooled Complementary Features (DPCF), a unified approach to dynamic scene recognition that analyzes a short video clip in terms of its spatial, temporal and color properties. The complementarity of these properties is preserved through all main steps of processing, including primitive feature extraction, coding and pooling. In the feature extraction step, spatial orientations capture static appearance, spatiotemporal oriented energies capture image dynamics and color statistics capture chromatic information. Subsequently, primitive features are encoded into a mid-level representation that has been learned for the task of dynamic scene recognition. Finally, a novel dynamic spacetime pyramid is introduced. This dynamic pooling approach can handle both global and local motion by adapting to the temporal structure, as guided by pooling energies. The resulting system provides online recognition of dynamic scenes; it is thoroughly evaluated on the two current benchmark datasets and yields the best results to date on both. In-depth analysis reveals the benefits of explicitly modeling feature complementarity in combination with the dynamic spacetime pyramid, indicating that this unified approach should be well suited to many areas of video analysis.
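
The idea of pooling guided by energies can be illustrated with a minimal sketch. This is not the paper's dynamic spacetime pyramid, whose details the abstract does not give; it is a generic energy-weighted max pooling written under stated assumptions. The function name `energy_weighted_pool`, the per-location weight `energy` in [0, 1], and the two-channel (motion-weighted, static-weighted) output are all hypothetical choices made for illustration.

```python
import numpy as np

def energy_weighted_pool(codes, energy):
    """Pool encoded features into two channels using a per-location weight.

    codes:  (N, K) array, one K-dimensional encoded feature per spacetime location.
    energy: (N,) array in [0, 1]; a hypothetical "pooling energy" where larger
            values mark locations dominated by motion (an assumption, not the
            paper's definition).
    Returns a (2K,) vector: [motion-weighted max, static-weighted max].
    """
    codes = np.asarray(codes, dtype=float)
    # Clamp weights to [0, 1] and add an axis so they broadcast over K.
    w = np.clip(np.asarray(energy, dtype=float), 0.0, 1.0)[:, None]
    dynamic = (w * codes).max(axis=0)          # emphasizes moving locations
    static = ((1.0 - w) * codes).max(axis=0)   # emphasizes stationary locations
    return np.concatenate([dynamic, static])

# Toy input: three locations, two code dimensions.
codes = np.array([[1.0, 0.0],
                  [0.0, 2.0],
                  [3.0, 1.0]])
energy = np.array([0.0, 1.0, 0.5])
pooled = energy_weighted_pool(codes, energy)
```

Keeping the two weighted channels separate, rather than merging them, is one simple way to keep motion-dominated and static evidence from overwriting each other in the pooled descriptor.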
Original language | English |
---|---|
Pages (from-to) | 2389 - 2401 |
Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
Volume | 38 |
Issue number | 12 |
DOIs | |
Publication status | Published - 2016 |
Keywords
- Dynamic scenes
- feature representations
- visual spacetime
- image dynamics
- spatiotemporal orientation
Fields of Expertise
- Information, Communication & Computing