Assessing the Quality of Web Content

Elisabeth Lex, Inayat Khan, Horst Bischof, Michael Granitzer

Research output: Working paper › Preprint

Abstract

This paper describes our approach to the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual quality task (German and French). In our approach, we create an ensemble of three classifiers, each trained on a different feature set, to make predictions for unseen Web hosts. Our final NDCG on the whole test set is 0.575 for Task 1, 0.852 for Task 2, and 0.81 (French) and 0.77 (German) for Task 3, which ranks second in the ECML/PKDD Discovery Challenge 2010.
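
As a rough illustration of the two ideas named in the abstract, the sketch below combines per-host scores from three classifiers and evaluates the resulting ranking with NDCG. It is not the authors' implementation: the abstract does not say how the ensemble members are combined, so unweighted score averaging is an assumption, as is the linear-gain, log2-discount form of DCG. The function names `average_scores`, `dcg`, and `ndcg` are hypothetical.

```python
import math
from typing import List, Sequence


def average_scores(per_classifier_scores: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-host scores from several classifiers by unweighted averaging.

    Assumption: the abstract does not state the combination rule; averaging is
    used here only as a simple stand-in.
    """
    n_models = len(per_classifier_scores)
    # zip(*...) groups the scores that all classifiers gave to the same host.
    return [sum(host_scores) / n_models for host_scores in zip(*per_classifier_scores)]


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain with linear gain and log2 discount (assumed form)."""
    # enumerate() is 0-based, so position 1 gets a discount of log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(scores: Sequence[float], relevances: Sequence[float]) -> float:
    """NDCG of the ranking induced by sorting hosts by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    # With no relevant hosts at all, NDCG is defined here as 0.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy data: three classifiers (one per feature set) scoring three hosts.
scores_by_classifier = [
    [0.9, 0.2, 0.5],
    [0.8, 0.4, 0.3],
    [0.7, 0.3, 0.4],
]
combined = average_scores(scores_by_classifier)
true_relevance = [2, 0, 1]  # made-up labels, not challenge data
print(ndcg(combined, true_relevance))
```

In this toy case the averaged scores rank the hosts in the same order as their labels, so the ranking is ideal. The numbers are invented for illustration and bear no relation to the challenge results reported above.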
Original language: English
Publication status: Published - 12 Jun 2014

Publication series

Name: arXiv.org e-Print archive
Publisher: Cornell University Library

Keywords

  • information quality
  • classification
  • machine learning
  • big data

ASJC Scopus subject areas

  • Computer Science (all)

Fields of Expertise

  • Information, Communication & Computing
