NEXTWRAP - Next generation web wrapper technologies

Project: Research project

Description

Ontology Engineering in the Context of Data Extraction
This part surveys existing approaches to ontology engineering in order to develop a basic framework that is designed for reuse.
Ontology-based Intelligent Extraction
New methods for data extraction from non-HTML documents, in particular from unstructured formats, are studied. The research focuses on two formats, PDF and plain text, the latter mainly in the context of 3270 applications.
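As an illustration of the plain-text case, the following minimal sketch cuts fields out of a text dump of a terminal screen by position. It is not the project's method: the field layout, the field names and the sample screen are hypothetical, and it assumes that each field sits at a fixed row and column, which is a simplification.

```python
# Hypothetical field layout: field name -> (row, column, length), zero-based.
FIELD_LAYOUT = {
    "customer_id": (0, 13, 6),
    "name": (1, 13, 20),
}


def extract_fields(screen_text, layout):
    """Cut fixed-position fields out of a plain-text screen dump."""
    rows = screen_text.split("\n")
    record = {}
    for field, (row, col, length) in layout.items():
        # Rows missing from the dump yield an empty field instead of an error.
        line = rows[row] if row < len(rows) else ""
        record[field] = line[col:col + length].strip()
    return record


# Sample screen (invented); labels are padded so values start at column 13.
SCREEN = "\n".join([
    "CUSTOMER ID:".ljust(13) + "004711",
    "NAME:".ljust(13) + "Jane Example",
])

record = extract_fields(SCREEN, FIELD_LAYOUT)
```

A position-based layout like this breaks as soon as the screen is rearranged, which is one reason to look for extraction methods guided by an ontology rather than by coordinates.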
Novel Semantic Technologies in Wrapping
The main goal of this part is to map data instances extracted from sources such as HTML documents to ontologies expressed in languages such as RDF Schema or OWL. The declarative, logic-based language Elog of the Visual Wrapper is ideally suited for tight integration with ontology repositories. Existing RDF repositories such as Jena, Sesame and KAON and various RDF query languages are analyzed, and the APIs of these libraries are studied to explore how the Lixto Visual Wrapper can be connected to them.
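The mapping step can be pictured with a small sketch that turns one extracted record into RDF statements in N-Triples syntax, a line-based format that RDF repositories can typically load. It does not use Elog or the API of any of the repositories named above; the namespace, class name and property names are assumptions made for the example.

```python
EX = "http://example.org/wrapper#"  # hypothetical namespace for the example
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_ntriples(subject_id, class_name, record):
    """Return one rdf:type triple plus one literal triple per extracted field."""
    subject = f"<{EX}{subject_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{EX}{class_name}> ."]
    for prop, value in sorted(record.items()):
        literal = escape_literal(str(value))
        lines.append(f'{subject} <{EX}{prop}> "{literal}" .')
    return lines


# Invented record, standing in for the output of a wrapper.
triples = record_to_ntriples(
    "offer1", "Offer", {"title": 'The "Best" Book', "price": "19.90"}
)
```

Every value is emitted as a plain string literal here; a real mapping would also have to decide on datatypes and on which fields refer to other resources rather than to literals.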
Wrapper Adaptation
The goal of this part is to study automatic and semi-automatic repair technologies that adapt a wrapper to major structural changes in the underlying Web sites.
Human-Machine Communication: htmlButler
htmlButler is intended to be a commodity client-server tool through which general web users can visually specify an area of interest on a Web page and be notified by email when that area changes.
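The change-detection idea can be sketched as follows. This is not htmlButler's implementation: it assumes the area of interest is identified by an element `id` (a stand-in for the visual selection), compares only the text inside that element, and leaves out fetching the page and sending the email.

```python
import hashlib
from html.parser import HTMLParser

# Elements without an end tag; they must not affect the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class RegionText(HTMLParser):
    """Collect the text inside the element that carries a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # 0 means "outside the watched region"
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def region_fingerprint(html, element_id):
    """Hash the whitespace-normalized text of the watched region."""
    parser = RegionText(element_id)
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(old_html, new_html, element_id):
    """True if the watched region's text differs between two page versions."""
    return (region_fingerprint(old_html, element_id)
            != region_fingerprint(new_html, element_id))
```

Restricting the comparison to one region means that changes elsewhere on the page, such as rotating advertisements, do not trigger a notification.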
Status: Completion date
Actual start/end: 1/01/05 to 31/03/07