Multimedia data has a rich and complex structure of inter- and intra-document references and can be an extremely valuable source of information. However, this potential remains severely limited until effective methods for semantic extraction and for semantic-based cross-media exploration and retrieval are devised. Today’s leading-edge techniques in this area work well for low-level feature extraction (e.g. colour histograms), but they focus on narrow aspects of isolated collections of multimedia data and deal only with single media types. MISTRAL pursues a radically new line of research: it will extract a large variety of semantically relevant metadata from one media type and integrate it closely with semantic concepts derived from other media types. Eventually, the results of this cross-media semantic integration will also be fed back into the semantic extraction processes for the individual media types in order to improve the quality of their results. MISTRAL will focus on innovative, semantic-based cross-media exploration and retrieval techniques that employ concepts at different semantic levels, and it addresses the specifics of multimedia data in a global, networked context by employing semantic web technologies. The MISTRAL results for semantic-based multimedia retrieval will contribute to a significant improvement of today’s human-computer interaction in multimedia retrieval and exploration applications. New types of functionality include, but are not limited to:
* cross-media automatic detection of objects in multimedia data: For example, if the audio stream of a video contains barking at the same time as a particular constellation of video features appears, the system can automatically label those video features as the object “dog”.
* semantically enriched cross-media queries: A sample query could be “find all videos with a barking dog in the background and playing children in the foreground”.
* cross-media synchronisation: The idea is to synchronise independent types of media according to the extracted semantic concepts. For example, if users see somebody walking in a video, they should also hear footsteps from a matching audio source.
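
The first functionality above, cross-media object detection, can be illustrated with a minimal sketch. It is not the MISTRAL implementation: the `Segment` type, the `FUSION_RULES` table, the concept name `four_legged_shape` and the `min_overlap` threshold are all assumptions made for illustration. The sketch only shows the general idea that an audio concept and a video concept which overlap in time can be fused into a higher-level object label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    media: str     # "audio", "video" or "cross-media"
    concept: str   # extracted label, e.g. "barking"
    start: float   # start time in seconds
    end: float     # end time in seconds


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the temporal overlap between two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


# Hypothetical rule table: (audio concept, video concept) -> fused object label.
FUSION_RULES = {("barking", "four_legged_shape"): "dog"}


def fuse(segments, min_overlap=0.5):
    """Fuse co-occurring audio and video concepts into cross-media labels."""
    audio = [s for s in segments if s.media == "audio"]
    video = [s for s in segments if s.media == "video"]
    fused = []
    for a in audio:
        for v in video:
            label = FUSION_RULES.get((a.concept, v.concept))
            if label is not None and overlap(a, v) >= min_overlap:
                # The fused segment covers only the shared time interval.
                fused.append(Segment("cross-media", label,
                                     max(a.start, v.start),
                                     min(a.end, v.end)))
    return fused


# Illustrative input: one barking sound and two candidate video regions.
segments = [
    Segment("audio", "barking", 2.0, 5.0),
    Segment("video", "four_legged_shape", 3.0, 8.0),
    Segment("video", "four_legged_shape", 20.0, 22.0),
]
detections = fuse(segments)
```

In this toy input only the first video region overlaps the barking sound, so only that region is labelled “dog”. A fixed rule table is used here purely to keep the example short; a real system would more plausibly learn such associations from data.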